diff --git a/README.md b/README.md
index 31d6fa18..04121af6 100644
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@ The Personal Cancer Genome Reporter (PCGR) is a stand-alone software package int
[![Documentation Status](https://readthedocs.org/projects/pcgr/badge/?version=latest)](http://pcgr.readthedocs.io/en/latest/?badge=latest)
-### Annotation resources included in PCGR (v0.3)
+### Annotation resources included in PCGR (v0.3.2)
* [VEP v85](http://www.ensembl.org/info/docs/tools/vep/index.html) - Variant Effect Predictor release 85 (GENCODE v19 as the gene reference dataset)
* [COSMIC v80](http://cancer.sanger.ac.uk/cosmic/) - Catalogue of somatic mutations in cancer (February 2017)
@@ -53,16 +53,16 @@ A local installation of Python (it has been tested with [version 2.7.13](https:/
#### STEP 2: Download PCGR
-April 14th 2017: New release (0.3.1)
+April 19th 2017: New release (0.3.2)
-1. Download and unpack the [latest release (0.3.1)](https://github.com/sigven/pcgr/releases/latest)
+1. Download and unpack the [latest release (0.3.2)](https://github.com/sigven/pcgr/releases/latest)
2. Download and unpack the data bundle (approx. 17Gb) in the PCGR directory
- * Download [the latest data bundle](https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/) from Google Drive to `~/pcgr-X.X` (replace _X.X_ with the version number, e.g `~/pcgr-0.3.1`)
+ * Download [the latest data bundle](https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/) from Google Drive to `~/pcgr-X.X` (replace _X.X_ with the version number, e.g. `~/pcgr-0.3.2`)
* Unpack the data bundle, e.g. through the following Unix command: `gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -`
A _data/_ folder within the _pcgr-X.X_ software folder should now have been produced
-3. Pull the [PCGR Docker image (0.3.1)](https://hub.docker.com/r/sigven/pcgr/) from DockerHub (3.1Gb):
- * `docker pull sigven/pcgr:0.3.1` (PCGR annotation engine)
+3. Pull the [PCGR Docker image (0.3.2)](https://hub.docker.com/r/sigven/pcgr/) from DockerHub (3.1Gb):
+ * `docker pull sigven/pcgr:0.3.2` (PCGR annotation engine)
#### STEP 3: Input preprocessing
@@ -112,7 +112,7 @@ A tumor sample report is generated by calling the Python script __pcgr.py__ in t
positional arguments:
pcgr_dir PCGR base directory with accompanying data directory,
- e.g. ~/pcgr-0.3.1
+ e.g. ~/pcgr-0.3.2
output_dir Output directory
sample_id Tumor sample/cancer genome identifier - prefix for
output files
@@ -146,7 +146,7 @@ A tumor sample report is generated by calling the Python script __pcgr.py__ in t
The _examples_ folder contain sample files from TCGA. A report for a colorectal tumor case can be generated through the following command:
-`python pcgr.py --input_vcf tumor_sample.COAD.vcf.gz --input_cna_segments tumor_sample.COAD.cna.tsv ~/pcgr-0.3.1 ~/pcgr-0.3.1/examples tumor_sample.COAD`
+`python pcgr.py --input_vcf tumor_sample.COAD.vcf.gz --input_cna_segments tumor_sample.COAD.cna.tsv ~/pcgr-0.3.2 ~/pcgr-0.3.2/examples tumor_sample.COAD`
This command will run the Docker-based PCGR workflow and produce the following output files in the _examples_ folder:
diff --git a/docs/_build/doctrees/annotation_resources.doctree b/docs/_build/doctrees/annotation_resources.doctree
index c6afe1fc..f619ca2b 100644
Binary files a/docs/_build/doctrees/annotation_resources.doctree and b/docs/_build/doctrees/annotation_resources.doctree differ
diff --git a/docs/_build/doctrees/environment.pickle b/docs/_build/doctrees/environment.pickle
index b362ed94..9f14d7d0 100644
Binary files a/docs/_build/doctrees/environment.pickle and b/docs/_build/doctrees/environment.pickle differ
diff --git a/docs/_build/doctrees/getting_started.doctree b/docs/_build/doctrees/getting_started.doctree
index ca54be24..8177a7a9 100644
Binary files a/docs/_build/doctrees/getting_started.doctree and b/docs/_build/doctrees/getting_started.doctree differ
diff --git a/docs/_build/doctrees/output.doctree b/docs/_build/doctrees/output.doctree
index 3d9a5384..31d6ddef 100644
Binary files a/docs/_build/doctrees/output.doctree and b/docs/_build/doctrees/output.doctree differ
diff --git a/docs/_build/html/_sources/annotation_resources.rst.txt b/docs/_build/html/_sources/annotation_resources.rst.txt
index 02c5d8d6..fe8a3d12 100644
--- a/docs/_build/html/_sources/annotation_resources.rst.txt
+++ b/docs/_build/html/_sources/annotation_resources.rst.txt
@@ -79,11 +79,11 @@ A requirement for all variant annotation datasets used in PCGR is that
they have been mapped unambiguously to the human genome (GRCh37). For
most datasets this is already the case (i.e. dbSNP, COSMIC, ClinVar
etc.). A significant proportion of variants in the annotation datasets
-related to clinical interpretation, CIViC and CBMDB, are however not
+related to clinical interpretation, CIViC and CBMDB, is however not
mapped to the genome. Whenever possible, we have utilized
`TransVar `__ to
identify the actual genomic variants (e.g. *g.chr7:140453136A>T*) that
-corresponds to variants reported with other HGVS nomenclature (e.g.
+correspond to variants reported with other HGVS nomenclature (e.g.
*p.V600E*).
Other data quality concerns
@@ -91,7 +91,7 @@ Other data quality concerns
**Clinical biomarkers**
-Clinical biomarkers included in PCGR is limited to the following:
+Clinical biomarkers included in PCGR are limited to the following:
- Markers reported at the variant level (e.g. **BRAF p.V600E**)
- Markers reported at the codon level (e.g. **KRAS p.G12**)
diff --git a/docs/_build/html/_sources/getting_started.rst.txt b/docs/_build/html/_sources/getting_started.rst.txt
index 35ddf1c0..cce4c2d5 100644
--- a/docs/_build/html/_sources/getting_started.rst.txt
+++ b/docs/_build/html/_sources/getting_started.rst.txt
@@ -42,10 +42,10 @@ terminal window.
Download PCGR
^^^^^^^^^^^^^
-**April 14th 2017**: New release (0.3.1)
+**April 19th 2017**: New release (0.3.2)
- Download and unpack the `latest release
- (0.3.1) `__
+ (0.3.2) `__
- Download and unpack the data bundle (approx. 17Gb) in the PCGR
directory
@@ -53,7 +53,7 @@ Download PCGR
- Download `the latest data
bundle `__
from Google Drive to ``~/pcgr-X.X`` (replace *X.X* with the
- version number, e.g. ``~/pcgr-0.3.1``)
+ version number, e.g. ``~/pcgr-0.3.2``)
- Decompress and untar the bundle, e.g. through the following Unix
command:
``gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -``
@@ -62,10 +62,10 @@ Download PCGR
have been produced
- Pull the `PCGR Docker image -
- 0.3.1 `__ from DockerHub
+ 0.3.2 `__ from DockerHub
(3.1Gb) :
- - ``docker pull sigven/pcgr:0.3.1`` (PCGR annotation engine)
+ - ``docker pull sigven/pcgr:0.3.2`` (PCGR annotation engine)
Run test - generation of clinical report for a cancer genome
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -89,7 +89,7 @@ A tumor sample report is generated by calling the Python script
positional arguments:
pcgr_dir PCGR base directory with accompanying data directory,
- e.g. ~/pcgr-0.3
+ e.g. ~/pcgr-0.3.2
output_dir Output directory
sample_id Tumor sample/cancer genome identifier - prefix for
output files
@@ -125,7 +125,7 @@ sequenced within TCGA. A report for a colorectal tumor case can be
generated by running the following command in your terminal window:
``python pcgr.py --input_vcf examples/tumor_sample.COAD.vcf.gz --input_cna_segments``
-``examples/tumor_sample.COAD.cna.tsv ~/pcgr-0.3.1 ~/pcgr-0.3.1/examples tumor_sample.COAD``
+``examples/tumor_sample.COAD.cna.tsv ~/pcgr-0.3.2 ~/pcgr-0.3.2/examples tumor_sample.COAD``
This command will run the Docker-based PCGR workflow and produce the
following output files in the *examples* folder:
diff --git a/docs/_build/html/_sources/output.rst.txt b/docs/_build/html/_sources/output.rst.txt
index d52b9134..e59a1c9d 100644
--- a/docs/_build/html/_sources/output.rst.txt
+++ b/docs/_build/html/_sources/output.rst.txt
@@ -36,17 +36,24 @@ work properly:
`tabix `__
- 'chr' must be stripped from the chromosome names
-**IMPORTANT NOTE**: Considering the VCF output for the `numerous somatic
-SNV/InDel callers `__ that have been
-developed, we have a experienced a general lack of uniformity and
-robustness for the representation of somatic variant genotype data (e.g.
-variant allelic depths (tumor/normal), genotype quality etc.). In the
-output results provided within the current version of PCGR, we are
+**IMPORTANT NOTE 1**: Considering the VCF output for the `numerous
+somatic SNV/InDel callers `__ that
+have been developed, we have experienced a general lack of uniformity
+and robustness for the representation of somatic variant genotype data
+(e.g. variant allelic depths (tumor/normal), genotype quality etc.). In
+the output results provided within the current version of PCGR, we are
considering PASSed variants only, and variant genotype data (i.e. as
found in the VCF SAMPLE columns) are not handled or parsed. As improved
standards for this matter may emerge, we will strive to include this
information in the annotated output files.
+**IMPORTANT NOTE 2**: PCGR generates a number of VCF INFO annotation
+tags that are appended to the query VCF. We therefore encourage users
+to submit query VCF files that have not been annotated by other means,
+i.e. VCF files that come directly from variant calling. Otherwise, the
+query VCF file is likely to contain INFO tags that coincide with those
+produced by PCGR.
+
Copy number segments
^^^^^^^^^^^^^^^^^^^^
diff --git a/docs/_build/html/annotation_resources.html b/docs/_build/html/annotation_resources.html
index 2c9db25f..b11b548b 100644
--- a/docs/_build/html/annotation_resources.html
+++ b/docs/_build/html/annotation_resources.html
@@ -234,17 +234,17 @@ Genome mapping to
identify the actual genomic variants (e.g. g.chr7:140453136A>T) that
-corresponds to variants reported with other HGVS nomenclature (e.g.
+correspond to variants reported with other HGVS nomenclature (e.g.
p.V600E).
Copy number segments
diff --git a/docs/_build/html/searchindex.js b/docs/_build/html/searchindex.js
index 713fb367..e0375803 100644
--- a/docs/_build/html/searchindex.js
+++ b/docs/_build/html/searchindex.js
@@ -1 +1 @@
-Search.setIndex({docnames:["about","annotation_resources","getting_started","index","output"],envversion:50,filenames:["about.rst","annotation_resources.rst","getting_started.rst","index.rst","output.rst"],objects:{},objnames:{},objtypes:{},terms:{"1000g":4,"1000genom":1,"12th":[],"140453136a":1,"14th":2,"16gb":[],"17gb":2,"17th":[],"1gb":2,"1kg":[],"2016_03":[],"2016_09":1,"2020plu":2,"28th":[],"2gb":[],"5gb":2,"5th":1,"8th":1,"abstract":1,"case":[1,2,4],"class":4,"default":2,"function":[2,3,4],"import":[0,2,4],"new":2,"public":4,"short":[1,4],CDS:4,EAS:4,For:[1,2,4],IDs:4,POS:[],SAS:4,The:[0,1,2,4],There:0,These:[],_strong:[],abber:3,aberr:[0,2,4],about:3,abov:4,accept:4,acceptor:4,access:[0,4],accompani:2,accord:[2,4],acid:4,acquir:0,across:4,action:4,actual:1,ada:[],adapt:4,add:4,adddit:4,addit:[0,4],adenoma:1,adjust:4,advanc:2,af_norm:[],af_tumor:[],affect:4,affected_spl:[],affili:0,afr:4,afr_af_1kg:4,afr_af_exac:4,afr_af_gnomad:4,african:4,after:4,aggreg:4,aid:0,algorithm:4,align:4,all:[0,1],allel:4,allele_num:4,alon:0,alreadi:[1,2],also:1,alt:4,alter:[2,4],altern:4,american:4,amino:4,amino_acid:4,among:[],amplif:2,amr:4,amr_af_1kg:4,amr_af_exac:4,amr_af_gnomad:4,analys:2,analysi:0,analyz:4,ani:[2,4],annot:[0,2,3],annotation_resourc:[],antineoplast:[1,4],antineoplastic_drug_interact:4,antineoplastic_drugs_dgidb:4,appli:4,applic:[0,4],appri:4,approv:4,approx:2,april:[1,2],argument:2,asian:4,assembl:4,assign:2,associ:2,attach:[0,4],aug:[],b147:1,base:[2,3,4],basic:[0,2,3,4],been:[0,1,2,4],below:4,benign:[],best:4,betweeen:1,bgzip:4,bind:4,biocomput:4,bioconductor:4,biologi:0,biomark:[0,1,2],biotyp:4,block:4,block_substitut:[],bm_citat:4,bm_clinical_signific:4,bm_disease_nam:4,bm_drug_nam:4,bm_evidence_direct:4,bm_evidence_level:4,bm_evidence_typ:4,bm_rate:4,bool:[],boost:4,both:[0,4],braf:1,breast:4,browser:4,build:4,bundl:[0,1,2],cadd:4,call:2,call_confid:[],caller:4,can:[0,2,4],cancer:4,cancer_census_germlin:4,cancer_census_somat:4,cancer_mutation_hotspot:4,c
ancer_typ:4,cancerhotspot:4,candid:4,canon:4,cap:4,caption:[],care:0,carri:4,catalog:[1,4],catalogu:1,categori:4,caus:4,causal:4,cbmdb:1,cbmdb_id:4,ccd:4,cdna:4,cdna_posit:4,cds:[],cds_chang:4,cds_end_nf:4,cds_posit:4,cds_start_nf:4,cell:4,cell_typ:4,cellular:[],cencu:1,censu:4,challeng:0,chang:[2,4],check:2,chr1:4,chr7:1,chr:4,chrom:4,chrome:4,chromosom:4,citat:[],cite:4,civic:[1,4],civic_id:4,civic_id_2:4,classif:4,clin:[],clin_sig:4,clinic:[0,3],clinvar:[1,4],clinvar_msid:4,clinvar_pmid:4,clinvar_sig:4,clinvar_variant_origin:4,cluster:4,cna:2,cna_seg:[2,4],cnminor:[],cntotal:[],cnv:[],coad:2,code:3,codon:[1,4],codon_numb:4,cohort:1,collect:1,colorect:[2,4],column:4,com:[],come:1,command:2,common:4,complet:[2,4],complex:0,composit:4,comprehens:0,compress:4,comput:2,concern:3,confer:1,confid:[],confirmed_somat:1,consensu:4,consequ:3,consid:4,consortium:[0,1,4],constitut:4,contain:[0,2,4],content:[3,4],context:0,contribut:[2,4],convent:4,coordin:4,copi:[0,2,3],correctli:4,correspond:1,cosmic:[1,4],cosmic_cancer_type_al:4,cosmic_cancer_type_gw:4,cosmic_codon_count_gw:4,cosmic_codon_frac_gw:4,cosmic_consequ:4,cosmic_count_gw:4,cosmic_drug_resist:4,cosmic_fathmm_pr:4,cosmic_mutation_id:4,cosmic_sample_sourc:4,cosmic_site_histolog:4,cosmic_vartyp:4,count:4,cover:4,cpu:2,creat:[],criteria:1,csq:4,curat:[1,4],current:4,damag:[],data:[0,2,3,4],databas:3,databundl:2,dataset:3,date:0,dbnsfp:[1,4],dbnsfp_consensus_lr:4,dbnsfp_consensus_svm:4,dbsnp:[1,4],dbsnp_mappingstatu:4,dbsnp_submiss:4,dbsnp_valid:4,dbsnpbuildid:4,dbsnprsid:4,dec:1,decompos:4,decompress:2,deconstructsig:4,dedic:[],defin:4,delet:[2,4],delin:4,denot:4,depend:0,depth:4,deriv:4,describ:4,descript:4,determin:[],develop:[0,4],dgidb:1,diagnosi:2,diagnost:[0,4],differ:[2,4],direct:[],directori:2,discov:1,diseas:4,distanc:4,distribut:4,dna:4,doc:[],docker:3,dockerhub:2,docm:1,docm_diseas:4,docm_pmid:4,document:[],domain:[3,4],done:4,donor:4,download:[],downstream:2,dp_normal:[],dp_tumor:[],drive:2,driver:[1,4],dru
g:[1,2,4],dure:2,each:4,eas_af_1kg:4,eas_af_exac:4,eas_af_gnomad:4,east:4,effect:[0,3],effect_predict:4,either:[1,4],emerg:4,end:4,engin:2,ensembl:[0,4],ensembl_gene_id:4,ensembl_transcript_id:4,ensp:4,entrez:4,entrez_id:4,error:2,estim:2,etc:[1,4],etiolog:[2,4],eur:4,eur_af_1kg:4,european:4,event:4,evid:[2,4],exac:[1,4],exampl:[2,4],exist:[2,4],existing_vari:4,exit:2,exom:[1,4],exon:[1,4],experi:4,experienc:4,experiment:4,expert:0,explor:4,extend:0,facet:[],factor:4,fail:2,fall:4,fals:2,famili:1,fathmm:4,fathmm_mkl:4,fda:1,featur:[3,4],feature_typ:4,feb:[],februai:1,februari:1,figur:0,file:[2,4],fin:[],fin_af_exac:4,fin_af_gnomad:4,find:[0,4],finnish:4,firefox:4,first:4,flag:[2,4],flag_pick_allel:4,flank:4,flexibl:0,focu:[],folder:2,follow:[1,2,4],forc:2,force_overwrit:2,fork:2,form:0,format:0,found:[1,2,4],four:4,frac:[],fraction:[],fraction_mut:4,frameshift:4,frequenc:3,from:[0,1,2,4],g12:1,gain:4,gencod:[1,4],gencode_tag:4,gencode_transcript_typ:4,gencode_v19:4,gene:[0,2,3],gene_biotyp:4,gene_nam:4,gene_pheno:4,gene_symbol:4,gener:[0,3,4],genet:1,genindex:[],genom:4,genome_vers:4,genomic_chang:4,genotyp:4,germlin:1,gerp:4,get:3,getting_start:[],given:4,global:4,global_af_1kg:4,global_af_exac:4,global_af_gnomad:4,gnomad:1,googl:[2,4],grch37:[1,2,4],great:0,guidelin:[1,4],gwa:4,gwas_catalog_pmid:4,gwas_catalog_trait_uri:4,gz_:[],gzip:2,handl:4,has:[0,2,4],have:[0,1,2,4],hdiv:4,help:[2,4],here:4,hgnc:[],hgnc_id:4,hgv:[1,4],hgvs_offset:4,hgvsc:4,hgvsp:4,hgvsp_short:4,high:4,high_inf_po:4,higher:[],highlight:0,histolog:[1,4],hit:4,homozyg:2,hospit:0,host:2,hotspot:[1,4],how:4,howev:1,html:[0,2,3],http:4,human:[1,4],humdiv:[],hvar:4,icgc:[1,4],icgc_project:4,identifi:[1,2,4],iii:0,imag:[0,2],impact:4,implic:4,improv:4,includ:[1,4],incomplet:4,indel:[0,2,3],index:4,indic:4,individu:[0,4],inf:[],infer:4,inferenti:4,info:4,inform:1,initi:4,input:[2,3],input_cna_seg:2,input_vcf:2,insert:4,insilico:3,instal:0,institut:0,instruct:2,integr:[0,4],intend:0,interact:[1,2,3],int
ern:1,interpret:[0,1,2],interrog:0,intersect:4,intogen:[1,4],intogen_driv:4,intogen_driver_mut:4,intro:[],intron:4,isoform:4,isol:0,item:[2,4],its:4,jan:4,june:[],kit:1,knowledg:[0,3],knowledgebas:1,known:[1,2,4],kra:1,lack:4,larg:[],latest:2,least:[],length:4,level:[0,1,4],librari:0,lies:4,like:[],limit:1,line:[],link:4,linux:2,list:[],literatur:[1,4],log:[2,4],logist:4,logr:4,logr_threshold_amplif:2,logr_threshold_homozygous_delet:2,lost:4,low:4,lrt:4,mac:2,machin:[2,4],maf:2,mai:[1,4],make:[],malign:1,mani:4,map:3,mappabl:4,mappingstatu:[],march:1,marker:1,master:[],match:4,matter:4,maxdepth:[],measur:4,memori:2,messag:2,met:4,minim:2,minimum:[],minor:[],missens:4,mix:4,mkdir:[],mkl:[],modifi:4,modindex:[],modul:[],most:[0,1,4],motif:4,motif_nam:4,motif_po:4,motif_score_chang:4,motiffeatur:4,mozilla:4,mrna:4,msid:[],multipl:4,must:[1,2,4],mut:[],mutat:[0,1,2],mutational_signatur:[2,4],mutationassessor:4,mutationtast:4,mutect:[],mutpr:4,mutsigcv:2,name:4,navig:0,nccn:1,ncgc:[],need:0,nfe:[],nfe_af_exac:4,nfe_af_gnomad:4,nomenclatur:1,non:[1,4],none:2,normal:4,norwegian:0,notat:4,note:[3,4],nov:4,novemb:1,novo:4,now:2,nucleotid:[2,4],num:[],num_vcfanno_process:2,num_vep_fork:2,number:[0,2,3],numer:4,observ:4,obtain:4,oct:[],offset:[],oncogen:[1,4],oncolog:[0,2],oncologist:0,oncoscor:4,one:4,onli:[0,1,4],ontolog:4,option:[2,4],order:4,org:4,organ:[2,4],origin:4,oslo:0,osx:[],oth:[],oth_af_exac:4,oth_af_gnomad:4,other:[2,3],out:4,output:[2,3],output_dir:2,overlap:[2,4],overview:[],overwrit:2,packag:[0,4],page:[],pair:4,pars:4,part:[1,4],particular:4,pass:4,pcgr:[1,3,4],pcgr_dir:2,pcgr_directori:[],pcgreport:[],percent:4,person:2,pfam:1,phase3:1,phase:4,pheno:4,phenotyp:4,phred:[],pick:4,pipelin:4,platform:2,pmid:4,point:4,polyp:1,polyphen2:4,portrai:4,pose:0,posit:[2,4],possibl:1,potenti:4,pre:4,precis:[0,2],pred:[],predict:[3,4],predictor:[0,1],predispos:4,predisposit:4,prefer:2,prefix:2,prerequisit:3,present:[0,4],primari:4,princip:4,prinicip:[],priorit:0,process:[
2,4],produc:[0,2],product:4,profil:4,prognosi:2,prognost:[0,4],program:2,project:4,properli:4,proport:1,propos:4,proposed_aetiolog:4,prot:4,protein:3,protein_chang:4,protein_domain:4,protein_posit:4,provean:4,provid:4,pubm:4,pull:2,python:[],qualiti:[3,4],queri:[2,4],quickstart:[],ram:2,rang:4,rate:4,ratio:[2,4],raw:4,recommend:4,record:4,ref:[],refer:[1,4],reflect:4,refseq:4,refseq_match:4,refut:4,regress:4,regulatori:4,regulatoryfeatur:4,rel:4,relat:[0,1],releas:[1,2,4],relev:[0,2,4],replac:2,report:1,reported_in_another_cancer_sample_as_somat:1,repres:2,represent:4,requir:[0,1,2,4],research:0,resist:[2,4],resourc:[0,2,3],respect:4,restart:2,result:[0,2,4],retriev:[0,4],revel:4,rich:2,robust:4,root:[],rsid:4,run:[3,4],run_pcgr:[],safari:4,sampl:[1,2,4],sample_id:[2,4],sample_pair_identifi:[],sampleid:4,sas_af_1kg:4,sas_af_exac:4,sas_af_gnomad:4,satisfi:1,scale:4,scarciti:0,scientif:1,scientist:0,score:4,screen:4,script:2,search:[],segment:2,segment_end:4,segment_length:4,segment_mean:4,segment_start:4,sensit:[2,4],sep:[],separ:2,sequenc:[1,2,4],set:[0,2,4],sever:0,shift:4,shortest:4,should:2,show:[2,4],sift:4,sig:[],sigantur:[],signatur:2,signature_id:4,signific:[1,4],sigven:2,similarli:2,singl:4,site:[1,4],snv:[0,2,3],snvs_indel:[2,4],softwar:[0,2],somat:[0,1,2,3],sort:4,sourc:4,south:4,specif:4,sphinx:[],splice:4,splice_site_effect_ada:4,splice_site_effect_bool:[],splice_site_effect_rf:4,split:4,stabl:4,stand:0,standard:4,star:4,start:[3,4],statement:4,statist:1,statu:[1,4],step:[],stop:4,strand:4,strelka:[],strip:4,strive:4,strong:[],strongli:4,structur:4,studi:4,subject:4,submiss:4,subset:1,substitut:[],subtyp:4,support:4,suppressor:[1,4],svm:[],swiss:4,swissprot:[1,4],symbol:4,symbol_sourc:4,synonym:[1,4],systemat:0,tab:2,tabix:4,tabl:3,tag:4,take:2,taken:0,tar:2,target:4,tcga:[2,4],technolog:3,termin:2,test:[3,4],test_sampl:[],tfbp:4,tgz:2,thei:1,therapeut:[0,4],therapi:4,thi:[1,2,4],through:[0,2,4],throughput:[],thu:0,tier:[0,2,4],tier_descript:4,toctre:[],
todo:[],tool:[0,2],toolbar:2,total:4,trait:4,transcript:[2,4],transcript_end:4,transcript_overlap_perc:4,transcript_start:4,transvar:1,treatment:4,trembl:4,treshold:2,trial:[1,4],trust:4,tsgene:[1,4],tsgene_oncogen:4,tsl:4,tsv:2,tumor:[0,1,2,4],tumor_sampl:2,tumor_suppressor:4,tumor_typ:4,tumorigenesi:4,two:[2,4],type:[2,4],unambigu:1,unannot:4,underli:[2,4],uniform:4,uniparc:4,uniprot:[1,4],uniprot_featur:4,uniprot_id:4,uniprotkb:4,uniqu:4,univers:0,unix:2,unpack:2,untar:2,upcom:4,upon:[],upper:4,uri:4,usag:2,use:[2,3],used:[1,2],user:4,using:[0,2,4],util:3,v15:[],v19:[1,4],v22:[],v23:1,v30:[],v31:1,v600e:1,v78:[],v80:1,v85:1,valid:4,valu:2,variabl:4,variant:[0,2,3],variant_class:4,variat:4,variou:4,vartyp:[],vcf:2,vcf_sample_id:4,vcfanno:[0,2],vcfbreakmulti:4,vcflib:4,vcftool:4,vector:4,vep:[0,1,2],vep_all_consequ:4,veri:2,version:[2,4],view:4,virtual:2,weak:[],weak_mutect:[],weak_strelka:[],weight:4,what:3,whenev:1,where:4,whether:4,which:[0,1,2,4],why:3,wide:[1,4],window:2,within:[1,2,4],work:4,workflow:[0,2,4],working_directori:[],wtsi:4,xvf:2,you:2,your:2,yyyymmdd:2},titles:["About","Annotation resources","Getting started","Welcome to Personal Cancer Genome Reporter’s documentation!","Input & 
output"],titleterms:{"function":1,abber:4,about:0,all:4,among:4,annot:[1,4],associ:4,base:[0,1],basic:1,biomark:4,both:[],call:4,cancer:[0,1,2,3],clinic:[1,2,4],code:[1,4],concern:1,consequ:[1,4],copi:4,data:1,databas:[1,4],dataset:1,differ:[],dna:[],docker:[0,2],document:3,domain:1,download:2,drug:[],effect:[1,4],etc:[],exampl:[],featur:1,format:4,frequenc:[1,4],gene:[1,4],gener:2,genom:[0,1,2,3],germlin:4,get:2,hotspot:[],html:4,includ:[],indel:4,indic:[],inform:4,input:4,insilico:1,instal:2,interact:4,introduct:[],knowledg:1,list:4,map:1,marker:[],mutat:4,ncgc:[],note:1,number:4,oncovarexplor:[],other:[1,4],output:4,packag:[],pcgr:[0,2],person:[0,3],predict:1,preprocess:[],prerequisit:2,protein:[1,4],python:2,qualiti:1,report:[0,2,3,4],resourc:1,run:2,segment:4,sensit:[],separ:4,signatur:4,snv:4,somat:4,sourc:[],start:2,tab:4,tabl:[],technolog:0,test:2,tsv:4,tumor:[],type:[],use:0,util:1,valu:4,variant:[1,4],variat:[],vcf:4,vep:4,welcom:3,what:0,why:0}})
\ No newline at end of file
+Search.setIndex({docnames:["about","annotation_resources","getting_started","index","output"],envversion:50,filenames:["about.rst","annotation_resources.rst","getting_started.rst","index.rst","output.rst"],objects:{},objnames:{},objtypes:{},terms:{"1000g":4,"1000genom":1,"12th":[],"140453136a":1,"14th":[],"16gb":[],"17gb":2,"17th":[],"19th":2,"1gb":2,"1kg":[],"2016_03":[],"2016_09":1,"2020plu":2,"28th":[],"2gb":[],"5gb":2,"5th":1,"8th":1,"abstract":1,"case":[1,2,4],"class":4,"default":2,"function":[2,3,4],"import":[0,2,4],"new":2,"public":4,"short":[1,4],CDS:4,EAS:4,For:[1,2,4],IDs:4,POS:[],SAS:4,The:[0,1,2,4],There:0,These:[],_strong:[],abber:3,aberr:[0,2,4],about:3,abov:4,accept:4,acceptor:4,access:[0,4],accompani:2,accord:[2,4],acid:4,acquir:0,across:4,action:4,actual:1,ada:[],adapt:4,add:4,adddit:4,addit:[0,4],adenoma:1,adjust:4,advanc:2,af_norm:[],af_tumor:[],affect:4,affected_spl:[],affili:0,afr:4,afr_af_1kg:4,afr_af_exac:4,afr_af_gnomad:4,african:4,after:4,aggreg:4,aid:0,algorithm:4,align:4,all:[0,1],allel:4,allele_num:4,alon:0,alreadi:[1,2],also:1,alt:4,alter:[2,4],altern:4,american:4,amino:4,amino_acid:4,among:[],amplif:2,amr:4,amr_af_1kg:4,amr_af_exac:4,amr_af_gnomad:4,analys:2,analysi:0,analyz:4,ani:[2,4],annot:[0,2,3],annotation_resourc:[],antineoplast:[1,4],antineoplastic_drug_interact:4,antineoplastic_drugs_dgidb:4,append:4,appli:4,applic:[0,4],appri:4,approv:4,approx:2,april:[1,2],argument:2,asian:4,assembl:4,assign:2,associ:2,attach:[0,4],aug:[],b147:1,base:[2,3,4],basic:[0,2,3,4],been:[0,1,2,4],below:4,benign:[],best:4,betweeen:1,bgzip:4,bind:4,biocomput:4,bioconductor:4,biologi:0,biomark:[0,1,2],biotyp:4,block:4,block_substitut:[],bm_citat:4,bm_clinical_signific:4,bm_disease_nam:4,bm_drug_nam:4,bm_evidence_direct:4,bm_evidence_level:4,bm_evidence_typ:4,bm_rate:4,bool:[],boost:4,both:[0,4],braf:1,breast:4,browser:4,build:4,bundl:[0,1,2],cadd:4,call:2,call_confid:[],caller:4,can:[0,2,4],cancer:4,cancer_census_germlin:4,cancer_census_somat:4,cancer_m
utation_hotspot:4,cancer_typ:4,cancerhotspot:4,candid:4,canon:4,cap:4,caption:[],care:0,carri:4,catalog:[1,4],catalogu:1,categori:4,caus:4,causal:4,cbmdb:1,cbmdb_id:4,ccd:4,cdna:4,cdna_posit:4,cds:[],cds_chang:4,cds_end_nf:4,cds_posit:4,cds_start_nf:4,cell:4,cell_typ:4,cellular:[],cencu:1,censu:4,challeng:0,chang:[2,4],check:2,chr1:4,chr7:1,chr:4,chrom:4,chrome:4,chromosom:4,citat:[],cite:4,civic:[1,4],civic_id:4,civic_id_2:4,classif:4,clin:[],clin_sig:4,clinic:[0,3],clinvar:[1,4],clinvar_msid:4,clinvar_pmid:4,clinvar_sig:4,clinvar_variant_origin:4,cluster:4,cna:2,cna_seg:[2,4],cnminor:[],cntotal:[],cnv:[],coad:2,code:3,codon:[1,4],codon_numb:4,cohort:1,coincid:4,collect:1,colorect:[2,4],column:4,com:[],come:[1,4],command:2,common:4,complet:[2,4],complex:0,composit:4,comprehens:0,compress:4,comput:2,concern:3,confer:1,confid:[],confirmed_somat:1,consensu:4,consequ:3,consid:4,consortium:[0,1,4],constitut:4,contain:[0,2,4],content:[3,4],context:0,contribut:[2,4],convent:4,coordin:4,copi:[0,2,3],correctli:4,correspond:1,cosmic:[1,4],cosmic_cancer_type_al:4,cosmic_cancer_type_gw:4,cosmic_codon_count_gw:4,cosmic_codon_frac_gw:4,cosmic_consequ:4,cosmic_count_gw:4,cosmic_drug_resist:4,cosmic_fathmm_pr:4,cosmic_mutation_id:4,cosmic_sample_sourc:4,cosmic_site_histolog:4,cosmic_vartyp:4,count:4,cover:4,cpu:2,creat:[],criteria:1,csq:4,curat:[1,4],current:4,damag:[],data:[0,2,3,4],databas:3,databundl:2,dataset:3,date:0,dbnsfp:[1,4],dbnsfp_consensus_lr:4,dbnsfp_consensus_svm:4,dbsnp:[1,4],dbsnp_mappingstatu:4,dbsnp_submiss:4,dbsnp_valid:4,dbsnpbuildid:4,dbsnprsid:4,dec:1,decompos:4,decompress:2,deconstructsig:4,dedic:[],defin:4,delet:[2,4],delin:4,denot:4,depend:0,depth:4,deriv:4,describ:4,descript:4,determin:[],develop:[0,4],dgidb:1,diagnosi:2,diagnost:[0,4],differ:[2,4],direct:[],directli:4,directori:2,discov:1,diseas:4,distanc:4,distribut:4,dna:4,doc:[],docker:3,dockerhub:2,docm:1,docm_diseas:4,docm_pmid:4,document:[],domain:[3,4],done:4,donor:4,download:[],downstream:2,dp_no
rmal:[],dp_tumor:[],drive:2,driver:[1,4],drug:[1,2,4],dure:2,each:4,eas_af_1kg:4,eas_af_exac:4,eas_af_gnomad:4,east:4,effect:[0,3],effect_predict:4,either:[1,4],emerg:4,encourag:4,end:4,engin:2,ensembl:[0,4],ensembl_gene_id:4,ensembl_transcript_id:4,ensp:4,entrez:4,entrez_id:4,error:2,estim:2,etc:[1,4],etiolog:[2,4],eur:4,eur_af_1kg:4,european:4,event:4,evid:[2,4],exac:[1,4],exampl:[2,4],exist:[2,4],existing_vari:4,exit:2,exom:[1,4],exon:[1,4],experi:4,experienc:4,experiment:4,expert:0,explor:4,extend:0,facet:[],factor:4,fail:2,fall:4,fals:2,famili:1,fathmm:4,fathmm_mkl:4,fda:1,featur:[3,4],feature_typ:4,feb:[],februai:1,februari:1,figur:0,file:[2,4],fin:[],fin_af_exac:4,fin_af_gnomad:4,find:[0,4],finnish:4,firefox:4,first:4,flag:[2,4],flag_pick_allel:4,flank:4,flexibl:0,focu:[],folder:2,follow:[1,2,4],forc:2,force_overwrit:2,fork:2,form:0,format:0,found:[1,2,4],four:4,frac:[],fraction:[],fraction_mut:4,frameshift:4,frequenc:3,from:[0,1,2,4],g12:1,gain:4,gencod:[1,4],gencode_tag:4,gencode_transcript_typ:4,gencode_v19:4,gene:[0,2,3],gene_biotyp:4,gene_nam:4,gene_pheno:4,gene_symbol:4,gener:[0,3,4],genet:1,genindex:[],genom:4,genome_vers:4,genomic_chang:4,genotyp:4,germlin:1,gerp:4,get:3,getting_start:[],given:4,global:4,global_af_1kg:4,global_af_exac:4,global_af_gnomad:4,gnomad:1,googl:[2,4],grch37:[1,2,4],great:0,guidelin:[1,4],gwa:4,gwas_catalog_pmid:4,gwas_catalog_trait_uri:4,gz_:[],gzip:2,handl:4,has:[0,2,4],have:[0,1,2,4],hdiv:4,help:[2,4],here:4,hgnc:[],hgnc_id:4,hgv:[1,4],hgvs_offset:4,hgvsc:4,hgvsp:4,hgvsp_short:4,high:4,high_inf_po:4,higher:[],highlight:0,histolog:[1,4],hit:4,homozyg:2,hospit:0,host:2,hotspot:[1,4],how:4,howev:1,html:[0,2,3],http:4,human:[1,4],humdiv:[],hvar:4,icgc:[1,4],icgc_project:4,identifi:[1,2,4],iii:0,imag:[0,2],impact:4,implic:4,improv:4,includ:[1,4],incomplet:4,indel:[0,2,3],index:4,indic:4,individu:[0,4],inf:[],infer:4,inferenti:4,info:4,inform:1,initi:4,input:[2,3],input_cna_seg:2,input_vcf:2,insert:4,insilico:3,instal:0,institut:
0,instruct:2,integr:[0,4],intend:0,interact:[1,2,3],intern:1,interpret:[0,1,2],interrog:0,intersect:4,intogen:[1,4],intogen_driv:4,intogen_driver_mut:4,intro:[],intron:4,isoform:4,isol:0,item:[2,4],its:4,jan:4,june:[],kit:1,knowledg:[0,3],knowledgebas:1,known:[1,2,4],kra:1,lack:4,larg:[],latest:2,least:[],length:4,level:[0,1,4],librari:0,lies:4,like:4,limit:1,line:[],link:4,linux:2,list:[],literatur:[1,4],log:[2,4],logist:4,logr:4,logr_threshold_amplif:2,logr_threshold_homozygous_delet:2,lost:4,low:4,lrt:4,mac:2,machin:[2,4],maf:2,mai:[1,4],make:[],malign:1,mani:4,map:3,mappabl:4,mappingstatu:[],march:1,marker:1,master:[],match:4,matter:4,maxdepth:[],mean:4,measur:4,memori:2,messag:2,met:4,minim:2,minimum:[],minor:[],missens:4,mix:4,mkdir:[],mkl:[],modifi:4,modindex:[],modul:[],most:[0,1,4],motif:4,motif_nam:4,motif_po:4,motif_score_chang:4,motiffeatur:4,mozilla:4,mrna:4,msid:[],multipl:4,must:[1,2,4],mut:[],mutat:[0,1,2],mutational_signatur:[2,4],mutationassessor:4,mutationtast:4,mutect:[],mutpr:4,mutsigcv:2,name:4,navig:0,nccn:1,ncgc:[],need:0,nfe:[],nfe_af_exac:4,nfe_af_gnomad:4,nomenclatur:1,non:[1,4],none:2,normal:4,norwegian:0,notat:4,note:[3,4],nov:4,novemb:1,novo:4,now:2,nucleotid:[2,4],num:[],num_vcfanno_process:2,num_vep_fork:2,number:[0,2,3],numer:4,observ:4,obtain:4,oct:[],offset:[],oncogen:[1,4],oncolog:[0,2],oncologist:0,oncoscor:4,one:4,onli:[0,1,4],ontolog:4,option:[2,4],order:4,org:4,organ:[2,4],origin:4,oslo:0,osx:[],oth:[],oth_af_exac:4,oth_af_gnomad:4,other:[2,3],out:4,output:[2,3],output_dir:2,overlap:[2,4],overview:[],overwrit:2,packag:[0,4],page:[],pair:4,pars:4,part:[1,4],particular:4,pass:4,pcgr:[1,3,4],pcgr_dir:2,pcgr_directori:[],pcgreport:[],percent:4,person:2,pfam:1,phase3:1,phase:4,pheno:4,phenotyp:4,phred:[],pick:4,pipelin:4,platform:2,pmid:4,point:4,polyp:1,polyphen2:4,portrai:4,pose:0,posit:[2,4],possibl:1,potenti:4,pre:4,precis:[0,2],pred:[],predict:[3,4],predictor:[0,1],predispos:4,predisposit:4,prefer:2,prefix:2,prerequisit:3,pres
ent:[0,4],primari:4,princip:4,prinicip:[],priorit:0,process:[2,4],produc:[0,2,4],product:4,profil:4,prognosi:2,prognost:[0,4],program:2,project:4,properli:4,proport:1,propos:4,proposed_aetiolog:4,prot:4,protein:3,protein_chang:4,protein_domain:4,protein_posit:4,provean:4,provid:4,pubm:4,pull:2,python:[],qualiti:[3,4],queri:[2,4],quickstart:[],ram:2,rang:4,rate:4,rather:4,ratio:[2,4],raw:4,recommend:4,record:4,ref:[],refer:[1,4],reflect:4,refseq:4,refseq_match:4,refut:4,regress:4,regulatori:4,regulatoryfeatur:4,rel:4,relat:[0,1],releas:[1,2,4],relev:[0,2,4],replac:2,report:1,reported_in_another_cancer_sample_as_somat:1,repres:2,represent:4,requir:[0,1,2,4],research:0,resist:[2,4],resourc:[0,2,3],respect:4,restart:2,result:[0,2,4],retriev:[0,4],revel:4,rich:2,robust:4,root:[],rsid:4,run:[3,4],run_pcgr:[],safari:4,sampl:[1,2,4],sample_id:[2,4],sample_pair_identifi:[],sampleid:4,sas_af_1kg:4,sas_af_exac:4,sas_af_gnomad:4,satisfi:1,scale:4,scarciti:0,scientif:1,scientist:0,score:4,screen:4,script:2,search:[],segment:2,segment_end:4,segment_length:4,segment_mean:4,segment_start:4,sensit:[2,4],sep:[],separ:2,sequenc:[1,2,4],set:[0,2,4],sever:0,shift:4,shortest:4,should:2,show:[2,4],sift:4,sig:[],sigantur:[],signatur:2,signature_id:4,signific:[1,4],sigven:2,similarli:2,singl:4,site:[1,4],snv:[0,2,3],snvs_indel:[2,4],softwar:[0,2],somat:[0,1,2,3],sort:4,sourc:4,south:4,specif:4,sphinx:[],splice:4,splice_site_effect_ada:4,splice_site_effect_bool:[],splice_site_effect_rf:4,split:4,stabl:4,stand:0,standard:4,star:4,start:[3,4],statement:4,statist:1,statu:[1,4],step:[],stop:4,strand:4,strelka:[],strip:4,strive:4,strong:[],strongli:4,structur:4,studi:4,subject:4,submiss:4,submit:4,subset:1,substitut:[],subtyp:4,support:4,suppressor:[1,4],svm:[],swiss:4,swissprot:[1,4],symbol:4,symbol_sourc:4,synonym:[1,4],systemat:0,tab:2,tabix:4,tabl:3,tag:4,take:2,taken:0,tar:2,target:4,tcga:[2,4],technolog:3,termin:2,test:[3,4],test_sampl:[],tfbp:4,tgz:2,thei:1,therapeut:[0,4],therapi:4,theref
or:4,thi:[1,2,4],those:4,through:[0,2,4],throughput:[],thu:0,tier:[0,2,4],tier_descript:4,toctre:[],todo:[],tool:[0,2],toolbar:2,total:4,trait:4,transcript:[2,4],transcript_end:4,transcript_overlap_perc:4,transcript_start:4,transvar:1,treatment:4,trembl:4,treshold:2,trial:[1,4],trust:4,tsgene:[1,4],tsgene_oncogen:4,tsl:4,tsv:2,tumor:[0,1,2,4],tumor_sampl:2,tumor_suppressor:4,tumor_typ:4,tumorigenesi:4,two:[2,4],type:[2,4],unambigu:1,unannot:4,underli:[2,4],uniform:4,uniparc:4,uniprot:[1,4],uniprot_featur:4,uniprot_id:4,uniprotkb:4,uniqu:4,univers:0,unix:2,unpack:2,untar:2,upcom:4,upon:[],upper:4,uri:4,usag:2,use:[2,3],used:[1,2],user:4,using:[0,2,4],util:3,v15:[],v19:[1,4],v22:[],v23:1,v30:[],v31:1,v600e:1,v78:[],v80:1,v85:1,valid:4,valu:2,variabl:4,variant:[0,2,3],variant_class:4,variat:4,variou:4,vartyp:[],vcf:2,vcf_sample_id:4,vcfanno:[0,2],vcfbreakmulti:4,vcflib:4,vcftool:4,vector:4,vep:[0,1,2],vep_all_consequ:4,veri:2,version:[2,4],view:4,virtual:2,weak:[],weak_mutect:[],weak_strelka:[],weight:4,what:3,whenev:1,where:4,whether:4,which:[0,1,2,4],why:3,wide:[1,4],window:2,within:[1,2,4],work:4,workflow:[0,2,4],working_directori:[],wtsi:4,xvf:2,you:2,your:2,yyyymmdd:2},titles:["About","Annotation resources","Getting started","Welcome to Personal Cancer Genome Reporter’s documentation!","Input & 
output"],titleterms:{"function":1,abber:4,about:0,all:4,among:4,annot:[1,4],associ:4,base:[0,1],basic:1,biomark:4,both:[],call:4,cancer:[0,1,2,3],clinic:[1,2,4],code:[1,4],concern:1,consequ:[1,4],copi:4,data:1,databas:[1,4],dataset:1,differ:[],dna:[],docker:[0,2],document:3,domain:1,download:2,drug:[],effect:[1,4],etc:[],exampl:[],featur:1,format:4,frequenc:[1,4],gene:[1,4],gener:2,genom:[0,1,2,3],germlin:4,get:2,hotspot:[],html:4,includ:[],indel:4,indic:[],inform:4,input:4,insilico:1,instal:2,interact:4,introduct:[],knowledg:1,list:4,map:1,marker:[],mutat:4,ncgc:[],note:1,number:4,oncovarexplor:[],other:[1,4],output:4,packag:[],pcgr:[0,2],person:[0,3],predict:1,preprocess:[],prerequisit:2,protein:[1,4],python:2,qualiti:1,report:[0,2,3,4],resourc:1,run:2,segment:4,sensit:[],separ:4,signatur:4,snv:4,somat:4,sourc:[],start:2,tab:4,tabl:[],technolog:0,test:2,tsv:4,tumor:[],type:[],use:0,util:1,valu:4,variant:[1,4],variat:[],vcf:4,vep:4,welcom:3,what:0,why:0}})
\ No newline at end of file
diff --git a/docs/annotation_resources.md b/docs/annotation_resources.md
index f02514bc..24fb9d8c 100644
--- a/docs/annotation_resources.md
+++ b/docs/annotation_resources.md
@@ -35,13 +35,13 @@
### Genome mapping
-A requirement for all variant annotation datasets used in PCGR is that they have been mapped unambiguously to the human genome (GRCh37). For most datasets this is already the case (i.e. dbSNP, COSMIC, ClinVar etc.). A significant proportion of variants in the annotation datasets related to clinical interpretation, CIViC and CBMDB, are however not mapped to the genome. Whenever possible, we have utilized [TransVar](http://bioinformatics.mdanderson.org/transvarweb/) to identify the actual genomic variants (e.g. _g.chr7:140453136A>T_) that corresponds to variants reported with other HGVS nomenclature (e.g. _p.V600E_).
+A requirement for all variant annotation datasets used in PCGR is that they have been mapped unambiguously to the human genome (GRCh37). For most datasets this is already the case (i.e. dbSNP, COSMIC, ClinVar etc.). A significant proportion of variants in the annotation datasets related to clinical interpretation, CIViC and CBMDB, is however not mapped to the genome. Whenever possible, we have utilized [TransVar](http://bioinformatics.mdanderson.org/transvarweb/) to identify the actual genomic variants (e.g. _g.chr7:140453136A>T_) that correspond to variants reported with other HGVS nomenclature (e.g. _p.V600E_).
### Other data quality concerns
__Clinical biomarkers__
-Clinical biomarkers included in PCGR is limited to the following:
+Clinical biomarkers included in PCGR are limited to the following:
* Markers reported at the variant level (e.g. __BRAF p.V600E__)
* Markers reported at the codon level (e.g. __KRAS p.G12__)
diff --git a/docs/annotation_resources.rst b/docs/annotation_resources.rst
index 02c5d8d6..fe8a3d12 100644
--- a/docs/annotation_resources.rst
+++ b/docs/annotation_resources.rst
@@ -79,11 +79,11 @@ A requirement for all variant annotation datasets used in PCGR is that
they have been mapped unambiguously to the human genome (GRCh37). For
most datasets this is already the case (i.e. dbSNP, COSMIC, ClinVar
etc.). A significant proportion of variants in the annotation datasets
-related to clinical interpretation, CIViC and CBMDB, are however not
+related to clinical interpretation, CIViC and CBMDB, is however not
mapped to the genome. Whenever possible, we have utilized
`TransVar
<http://bioinformatics.mdanderson.org/transvarweb/>`__ to
identify the actual genomic variants (e.g. *g.chr7:140453136A>T*) that
-corresponds to variants reported with other HGVS nomenclature (e.g.
+correspond to variants reported with other HGVS nomenclature (e.g.
*p.V600E*).
Other data quality concerns
@@ -91,7 +91,7 @@ Other data quality concerns
**Clinical biomarkers**
-Clinical biomarkers included in PCGR is limited to the following:
+Clinical biomarkers included in PCGR are limited to the following:
- Markers reported at the variant level (e.g. **BRAF p.V600E**)
- Markers reported at the codon level (e.g. **KRAS p.G12**)
diff --git a/docs/getting_started.md b/docs/getting_started.md
index 3e800218..6906b861 100644
--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@@ -23,18 +23,18 @@ An installation of Python (version 2.7.13) is required to run PCGR. Check that P
#### Download PCGR
-__April 14th 2017__: New release (0.3.1)
+__April 19th 2017__: New release (0.3.2)
-* Download and unpack the [latest release (0.3.1)](https://github.com/sigven/pcgr/releases/latest)
+* Download and unpack the [latest release (0.3.2)](https://github.com/sigven/pcgr/releases/latest)
* Download and unpack the data bundle (approx. 17Gb) in the PCGR directory
- * Download [the latest data bundle](https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/) from Google Drive to `~/pcgr-X.X` (replace _X.X_ with the version number, e.g. `~/pcgr-0.3.1`)
+ * Download [the latest data bundle](https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/) from Google Drive to `~/pcgr-X.X` (replace _X.X_ with the version number, e.g. `~/pcgr-0.3.2`)
* Decompress and untar the bundle, e.g. through the following Unix command: `gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -`
A _data/_ folder within the _pcgr-X.X_ software folder should now have been produced
-* Pull the [PCGR Docker image - 0.3.1](https://hub.docker.com/r/sigven/pcgr/) from DockerHub (3.1Gb) :
- * `docker pull sigven/pcgr:0.3.1` (PCGR annotation engine)
+* Pull the [PCGR Docker image - 0.3.2](https://hub.docker.com/r/sigven/pcgr/) from DockerHub (3.1Gb) :
+ * `docker pull sigven/pcgr:0.3.2` (PCGR annotation engine)
### Run test - generation of clinical report for a cancer genome
@@ -55,7 +55,7 @@ A tumor sample report is generated by calling the Python script __pcgr.py__, whi
positional arguments:
pcgr_dir PCGR base directory with accompanying data directory,
- e.g. ~/pcgr-0.3
+ e.g. ~/pcgr-0.3.2
output_dir Output directory
sample_id Tumor sample/cancer genome identifier - prefix for
output files
@@ -90,7 +90,7 @@ A tumor sample report is generated by calling the Python script __pcgr.py__, whi
The _examples_ folder contain input files from two tumor samples sequenced within TCGA. A report for a colorectal tumor case can be generated by running the following command in your terminal window:
`python pcgr.py --input_vcf examples/tumor_sample.COAD.vcf.gz --input_cna_segments `
-`examples/tumor_sample.COAD.cna.tsv ~/pcgr-0.3.1 ~/pcgr-0.3.1/examples tumor_sample.COAD`
+`examples/tumor_sample.COAD.cna.tsv ~/pcgr-0.3.2 ~/pcgr-0.3.2/examples tumor_sample.COAD`
This command will run the Docker-based PCGR workflow and produce the following output files in the _examples_ folder:
diff --git a/docs/getting_started.rst b/docs/getting_started.rst
index 35ddf1c0..cce4c2d5 100644
--- a/docs/getting_started.rst
+++ b/docs/getting_started.rst
@@ -42,10 +42,10 @@ terminal window.
Download PCGR
^^^^^^^^^^^^^
-**April 14th 2017**: New release (0.3.1)
+**April 19th 2017**: New release (0.3.2)
- Download and unpack the `latest release
- (0.3.1) <https://github.com/sigven/pcgr/releases/latest>`__
+ (0.3.2) <https://github.com/sigven/pcgr/releases/latest>`__
- Download and unpack the data bundle (approx. 17Gb) in the PCGR
directory
@@ -53,7 +53,7 @@ Download PCGR
- Download `the latest data
bundle <https://drive.google.com/file/d/0B8aYD2TJ472mQjZOMmg4djZfT1k/>`__
from Google Drive to ``~/pcgr-X.X`` (replace *X.X* with the
- version number, e.g. ``~/pcgr-0.3.1``)
+ version number, e.g. ``~/pcgr-0.3.2``)
- Decompress and untar the bundle, e.g. through the following Unix
command:
``gzip -dc pcgr.databundle.GRCh37.YYYYMMDD.tgz | tar xvf -``
@@ -62,10 +62,10 @@ Download PCGR
have been produced
- Pull the `PCGR Docker image -
- 0.3.1 <https://hub.docker.com/r/sigven/pcgr/>`__ from DockerHub
+ 0.3.2 <https://hub.docker.com/r/sigven/pcgr/>`__ from DockerHub
(3.1Gb) :
- - ``docker pull sigven/pcgr:0.3.1`` (PCGR annotation engine)
+ - ``docker pull sigven/pcgr:0.3.2`` (PCGR annotation engine)
Run test - generation of clinical report for a cancer genome
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -89,7 +89,7 @@ A tumor sample report is generated by calling the Python script
positional arguments:
pcgr_dir PCGR base directory with accompanying data directory,
- e.g. ~/pcgr-0.3
+ e.g. ~/pcgr-0.3.2
output_dir Output directory
sample_id Tumor sample/cancer genome identifier - prefix for
output files
@@ -125,7 +125,7 @@ sequenced within TCGA. A report for a colorectal tumor case can be
generated by running the following command in your terminal window:
``python pcgr.py --input_vcf examples/tumor_sample.COAD.vcf.gz --input_cna_segments``
-``examples/tumor_sample.COAD.cna.tsv ~/pcgr-0.3.1 ~/pcgr-0.3.1/examples tumor_sample.COAD``
+``examples/tumor_sample.COAD.cna.tsv ~/pcgr-0.3.2 ~/pcgr-0.3.2/examples tumor_sample.COAD``
This command will run the Docker-based PCGR workflow and produce the
following output files in the *examples* folder:
diff --git a/docs/output.md b/docs/output.md
index 1bb64b22..02caeb1f 100644
--- a/docs/output.md
+++ b/docs/output.md
@@ -20,7 +20,9 @@ The following requirements __MUST__ be met by the input VCF for PCGR to work pro
* We __strongly__ recommend that the input VCF is compressed and indexed using [bgzip](http://www.htslib.org/doc/tabix.html) and [tabix](http://www.htslib.org/doc/tabix.html)
* 'chr' must be stripped from the chromosome names
-__IMPORTANT NOTE__: Considering the VCF output for the [numerous somatic SNV/InDel callers](https://www.biostars.org/p/19104/) that have been developed, we have a experienced a general lack of uniformity and robustness for the representation of somatic variant genotype data (e.g. variant allelic depths (tumor/normal), genotype quality etc.). In the output results provided within the current version of PCGR, we are considering PASSed variants only, and variant genotype data (i.e. as found in the VCF SAMPLE columns) are not handled or parsed. As improved standards for this matter may emerge, we will strive to include this information in the annotated output files.
+__IMPORTANT NOTE 1__: Considering the VCF output for the [numerous somatic SNV/InDel callers](https://www.biostars.org/p/19104/) that have been developed, we have experienced a general lack of uniformity and robustness for the representation of somatic variant genotype data (e.g. variant allelic depths (tumor/normal), genotype quality etc.). In the output results provided within the current version of PCGR, we are considering PASSed variants only, and variant genotype data (i.e. as found in the VCF SAMPLE columns) are not handled or parsed. As improved standards for this matter may emerge, we will strive to include this information in the annotated output files.
+
+__IMPORTANT NOTE 2__: PCGR generates a number of VCF INFO annotation tags that are appended to the query VCF. We therefore encourage users to submit query VCF files that have not been annotated by other means, but rather come directly from variant calling. Otherwise, the query VCF file is likely to contain INFO tags that coincide with those produced by PCGR.
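
The check suggested by the note above can be sketched in a few lines of Python. This is a minimal illustration, not part of PCGR: the function names are hypothetical, and the set of reserved tag names must be supplied by the caller (the two names in the usage note are placeholders, not a confirmed list of PCGR tags).

```python
import gzip
import re

# Matches a VCF meta-information line that declares an INFO tag and
# captures the tag ID.
INFO_ID_PATTERN = re.compile(r"^##INFO=<ID=([^,>]+)")


def info_tags_in_header(lines):
    """Collect the INFO tag IDs declared in the VCF meta-information lines."""
    tags = set()
    for line in lines:
        if not line.startswith("##"):
            break  # meta-information ends at the '#CHROM' column header line
        match = INFO_ID_PATTERN.match(line)
        if match:
            tags.add(match.group(1))
    return tags


def colliding_info_tags(vcf_path, reserved_tags):
    """Return the INFO tags in a (possibly compressed) VCF that are reserved."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        return info_tags_in_header(handle) & set(reserved_tags)
```

For instance, `colliding_info_tags("query.vcf.gz", {"CIVIC_ID", "CBMDB_ID"})` would return whichever of the two placeholder names are already declared in the header of a hypothetical `query.vcf.gz`. An empty result suggests the file has not been annotated with those tags.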
#### Copy number segments
diff --git a/docs/output.rst b/docs/output.rst
index d52b9134..e59a1c9d 100644
--- a/docs/output.rst
+++ b/docs/output.rst
@@ -36,17 +36,24 @@ work properly:
`tabix <http://www.htslib.org/doc/tabix.html>`__
- 'chr' must be stripped from the chromosome names
-**IMPORTANT NOTE**: Considering the VCF output for the `numerous somatic
-SNV/InDel callers <https://www.biostars.org/p/19104/>`__ that have been
-developed, we have a experienced a general lack of uniformity and
-robustness for the representation of somatic variant genotype data (e.g.
-variant allelic depths (tumor/normal), genotype quality etc.). In the
-output results provided within the current version of PCGR, we are
+**IMPORTANT NOTE 1**: Considering the VCF output for the `numerous
+somatic SNV/InDel callers <https://www.biostars.org/p/19104/>`__ that
+have been developed, we have experienced a general lack of uniformity
+and robustness for the representation of somatic variant genotype data
+(e.g. variant allelic depths (tumor/normal), genotype quality etc.). In
+the output results provided within the current version of PCGR, we are
considering PASSed variants only, and variant genotype data (i.e. as
found in the VCF SAMPLE columns) are not handled or parsed. As improved
standards for this matter may emerge, we will strive to include this
information in the annotated output files.
+**IMPORTANT NOTE 2**: PCGR generates a number of VCF INFO annotation
+tags that are appended to the query VCF. We therefore encourage users
+to submit query VCF files that have not been annotated by other means,
+but rather come directly from variant calling. Otherwise, the query VCF
+file is likely to contain INFO tags that coincide with those produced
+by PCGR.
+
Copy number segments
^^^^^^^^^^^^^^^^^^^^
diff --git a/pcgr.py b/pcgr.py
index 7b1167e7..5b594dfe 100755
--- a/pcgr.py
+++ b/pcgr.py
@@ -8,6 +8,8 @@
import logging
import sys
+version = '0.3.2'
+
def __main__():
parser = argparse.ArgumentParser(description='Personal Cancer Genome Reporter (PCGR) workflow for clinical interpretation of somatic nucleotide variants and copy number aberration segments',formatter_class=argparse.ArgumentDefaultsHelpFormatter)
@@ -18,12 +20,12 @@ def __main__():
parser.add_argument('--num_vcfanno_processes', dest = "num_vcfanno_processes", default=4, type=int, help='Number of processes used during vcfanno annotation')
parser.add_argument('--num_vep_forks', dest = "num_vep_forks", default=4, type=int, help='Number of forks (--forks option in VEP) used during VEP annotation')
parser.add_argument('--force_overwrite', action = "store_true", help='By default, the script will fail with an error if any output file already exists. You can force the overwrite of existing result files by using this flag')
- parser.add_argument('--version', action='version', version='%(prog)s 0.3.1')
+ parser.add_argument('--version', action='version', version='%(prog)s ' + str(version))
parser.add_argument('pcgr_dir',help='PCGR base directory with accompanying data directory, e.g. ~/pcgr-0.3.1')
parser.add_argument('output_dir',help='Output directory')
parser.add_argument('sample_id',help="Tumor sample/cancer genome identifier - prefix for output files")
- docker_image_version = 'sigven/pcgr:0.3.1'
+ docker_image_version = 'sigven/pcgr:' + str(version)
args = parser.parse_args()
overwrite = 0
@@ -199,7 +201,7 @@ def run_pcgr(host_directories, docker_image_version, logR_threshold_amplificatio
## verify VCF and CNA segment file
logger = getlogger('pcgr-validate-input')
logger.info("STEP 0: Validate input data")
- vcf_validate_command = str(docker_command_run1) + "pcgr_check_input.py " + str(input_vcf_docker) + " " + str(input_cna_segments_docker) + "\""
+ vcf_validate_command = str(docker_command_run1) + "pcgr_check_input.py /data " + str(input_vcf_docker) + " " + str(input_cna_segments_docker) + "\""
check_subprocess(vcf_validate_command)
logger.info('Finished')
@@ -252,7 +254,6 @@ def run_pcgr(host_directories, docker_image_version, logR_threshold_amplificatio
logger = getlogger('pcgr-writer')
logger.info("STEP 4: Generation of output files")
pcgr_report_command = str(docker_command_run1) + "/pcgr.R /workdir/output " + str(output_vcf) + " " + str(input_cna_segments_docker) + " " + str(sample_id) + " " + str(logR_threshold_amplification) + " " + str(logR_threshold_homozygous_deletion) + "\""
- #print str(pcgr_report_command)
check_subprocess(pcgr_report_command)
logger.info("Finished")
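
The change above moves the release number into a single `version` variable that feeds both the `--version` flag and the Docker image tag. A minimal sketch of the same pattern, extended so that the example path in the `pcgr_dir` help text is derived from the same constant, could look as follows. The names `PCGR_VERSION`, `build_parser` and `docker_image` are hypothetical and do not appear in `pcgr.py`.

```python
import argparse

# Single source of truth for the release number (hypothetical constant name).
PCGR_VERSION = "0.3.2"


def build_parser(version=PCGR_VERSION):
    """Build an argument parser whose version-dependent strings share one constant."""
    parser = argparse.ArgumentParser(prog="pcgr.py")
    parser.add_argument("--version", action="version",
                        version="%(prog)s " + version)
    parser.add_argument("pcgr_dir",
                        help="PCGR base directory with accompanying data "
                             "directory, e.g. ~/pcgr-" + version)
    return parser


def docker_image(version=PCGR_VERSION):
    """Return the Docker image tag that matches the script version."""
    return "sigven/pcgr:" + version
```

With this layout, bumping the release only requires editing `PCGR_VERSION`; the help text, the `--version` output and the image tag then stay in step.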
diff --git a/src/R/pcgrr2/.Rhistory b/src/R/pcgrr2/.Rhistory
index b069d275..b1190f86 100644
--- a/src/R/pcgrr2/.Rhistory
+++ b/src/R/pcgrr2/.Rhistory
@@ -1,366 +1,3 @@
-library(pcgrr)
-library(pcgrr2)
-df <- data.frame('CIVIC_ID' = character())
-df <- rbind(df, data.frame('CIVIC_ID' = 'EID1'))
-df
-df <- rbind(df, data.frame('CIVIC_ID' = 'EID2'))
-df
-stringr::str_detect(df$CIVIC_ID,"EID")
-if(stringr::str_detect(df$CIVIC_ID,"EID")){}
-if(any(stringr::str_detect(df$CIVIC_ID,"EID"))){}
-if(any(stringr::str_detect(df$CIVIC_ID,"EID"))){
-cat('balle\n')
-}
-library(pcgrr2)
-pfam <- list('version' = 'Pfam 31.0 (March 2017)')
-pfam$version
-pfam <- list('version' = 'v31.0 (March 2017)')
-uniprot <- list('version' = 'release 2017_03')
-pfam <- list('version' = 'v31.0 (March 2017)')
-uniprot <- list('version' = 'release 2017_03')
-database_versions <- list('pfam' = pfam, 'uniprot' = uniprot)
-database_versions$pfam$version
-?install.packages
-help(install)
-library(pcgrr2)
-help(install.packages)
-help("install")
-library(devtools)
-help(install)
-setwd('/Users/sigven/research/docker/pcgr/examples')
-suppressWarnings(suppressPackageStartupMessages(library(pcgrr2)))
-suppressWarnings(suppressPackageStartupMessages(library(magrittr)))
-suppressWarnings(suppressPackageStartupMessages(library(BSgenome.Hsapiens.UCSC.hg19)))
-suppressWarnings(suppressPackageStartupMessages(library(deconstructSigs)))
-project_directory <- getwd()
-query_vcf <- 'TCGA-A6-2686-01A-01D-1408-10_TCGA-A6-2686-10A-01D-2188-10.pcgr.vcf.gz'
-sample_name <- 'TCGA-A6-2686-01A-01D-1408-10_TCGA-A6-2686-10A-01D-2188-10'
-load('../data/rda/pcgr_data.rda')
-sample_calls <- pcgrr2::get_calls(query_vcf, sample_id = sample_name)
-sort(colnames(sample_calls))
-minimum_n_signature_analysis <- 50
-signatures_limit <- 6
-tier1_report <- FALSE
-tier2_report <- FALSE
-tier3_report <- FALSE
-tier4_report <- FALSE
-tier5_report <- FALSE
-clinical_evidence_items_tier1A <- data.frame()
-clinical_evidence_items_tier1B <- data.frame()
-clinical_evidence_items_tier1C <- data.frame()
-variants_tier1_display <- data.frame()
-variants_tier2_display <- data.frame()
-variants_tier3_display <- data.frame()
-variants_tier4_display <- data.frame()
-variants_tier5_display <- data.frame()
-signature_report <- FALSE
-missing_signature_data <- FALSE
-signature_call_set <- data.frame()
-sample_calls_coding <- sample_calls %>% dplyr::filter(stringr::str_detect(CONSEQUENCE,"stop_gained|stop_lost|start_lost|frameshift_variant|missense_variant|splice_donor|splice_acceptor|inframe_deletion|inframe_insertion"))
-rlogging::message(paste0("Number of coding variants: ",nrow(sample_calls_coding)))
-sample_calls_noncoding <- sample_calls %>% dplyr::filter(!stringr::str_detect(CONSEQUENCE,"stop_gained|stop_lost|start_lost|frameshift_variant|missense_variant|splice_donor|splice_acceptor|inframe_deletion|inframe_insertion"))
-rlogging::message(paste0("Number of noncoding variants: ",nrow(sample_calls_noncoding)))
-#sample_stats_plot_all <- OncoVarReporter::plot_call_statistics(sample_calls,"Somatic calls - all")
-#sample_stats_plot_coding <- OncoVarReporter::plot_call_statistics(sample_calls_coding,"Somatic calls - coding")
-min_variants_for_signature <- minimum_n_signature_analysis
-signature_data <- NULL
-tsv_variants <- NULL
-tsv_biomarkers <- NULL
-if(any(grepl(paste0("VARIANT_CLASS$"),names(sample_calls)))){
-if(nrow(sample_calls[sample_calls$VARIANT_CLASS == 'SNV',]) >= min_variants_for_signature){
-signature_call_set <- sample_calls[sample_calls$VARIANT_CLASS == 'SNV',]
-signature_call_set <- dplyr::filter(signature_call_set, CHROM != 'MT')
-signature_call_set$VCF_SAMPLE_ID <- sample_name
-signature_report <- TRUE
-mut_signature_contributions <- pcgrr2::signature_contributions_single_sample(signature_call_set, sample_name = sample_name, signatures_limit = signatures_limit)
-signature_columns <- as.numeric(stringr::str_replace(as.character(mut_signature_contributions$which_signatures_df[mut_signature_contributions$which_signatures_df$signature_id != 'unknown',]$signature_id),"S",""))
-weight_df <- data.frame('Signature_ID' = as.character(mut_signature_contributions$which_signatures_df$signature_id), 'Weight' = round(as.numeric(mut_signature_contributions$which_signatures_df$weight),digits=3), stringsAsFactors = F)
-cancertypes_aetiologies <- pcgr_data$signatures_aetiologies[signature_columns,]
-signatures_cancertypes_aetiologies <- dplyr::left_join(cancertypes_aetiologies,weight_df,by=c("Signature_ID")) %>% dplyr::arrange(desc(Weight))
-signatures_cancertypes_aetiologies <- signatures_cancertypes_aetiologies[,c("Signature_ID","Weight","Cancer_types","Proposed_aetiology","Comments")]
-signature_data <- list('signature_call_set' = signature_call_set, 'mut_signature_contributions' = mut_signature_contributions, 'signatures_cancertypes_aetiologies' = signatures_cancertypes_aetiologies)
-}
-else{
-if(nrow(sample_calls[sample_calls$VARIANT_CLASS == 'SNV',]) > 0){
-signature_call_set <- sample_calls[sample_calls$VARIANT_CLASS == 'SNV',]
-signature_call_set <- dplyr::filter(signature_call_set, CHROM != 'MT')
-}
-rlogging::message(paste0("Too few variants (n = ",nrow(signature_call_set),") for reconstruction of mutational signatures by deconstructSigs"))
-missing_signature_data <- TRUE
-signature_data <- list('signature_call_set' = signature_call_set)
-}
-}
-vcf_data_df <- sample_calls_coding
-mapping <- 'exact'
-variant_origin <- 'Somatic Mutation'
-if("pubmed_html_link" %in% colnames(pcgr_data$civic_biomarkers)){
-pcgr_data$civic_biomarkers <- dplyr::rename(pcgr_data$civic_biomarkers, citation = pubmed_html_link)
-}
-if("evidence_description" %in% colnames(pcgr_data$civic_biomarkers)){
-pcgr_data$civic_biomarkers <- dplyr::rename(pcgr_data$civic_biomarkers, description = evidence_description)
-}
-if("pubmed_html_link" %in% colnames(pcgr_data$cbmdb_biomarkers)){
-pcgr_data$cbmdb_biomarkers <- dplyr::rename(pcgr_data$cbmdb_biomarkers, citation = pubmed_html_link)
-}
-if("evidence_description" %in% colnames(pcgr_data$cbmdb_biomarkers)){
-pcgr_data$cbmdb_biomarkers <- dplyr::rename(pcgr_data$cbmdb_biomarkers, description = evidence_description)
-}
-clinical_evidence_items <- data.frame()
-biomarker_descriptions <- data.frame()
-pcgr_data$cbmdb_biomarkers <- dplyr::filter(pcgr_data$cbmdb_biomarkers, is.na(variant_origin) | variant_origin == variant_origin)
-pcgr_data$civic_biomarkers <- dplyr::filter(pcgr_data$civic_biomarkers, is.na(variant_origin) | variant_origin == variant_origin)
-vcf_data_df_civic <- vcf_data_df %>% dplyr::filter(!is.na(CIVIC_ID))
-if(nrow(vcf_data_df_civic) > 0){
-tmp <- dplyr::select(vcf_data_df_civic,CIVIC_ID,VAR_ID)
-tmp <- tmp %>% tidyr::separate_rows(CIVIC_ID,sep=",")
-vcf_data_df_civic <- merge(tmp,dplyr::select(vcf_data_df_civic,-c(CIVIC_ID)),by.x = "VAR_ID",by.y = "VAR_ID")
-civic_calls <- dplyr::select(vcf_data_df_civic,dplyr::one_of(pcgr_data$pcgr_all_annotation_columns))
-eitems <- NULL
-if(any(stringr::str_detect(civic_calls$CIVIC_ID,"EID"))){
-eitems <- dplyr::left_join(civic_calls,dplyr::filter(dplyr::select(pcgr_data$civic_biomarkers,-c(civic_exon,civic_consequence,civic_codon,transvar_id,civic_id)),alteration_type == 'MUT'),by=c("CIVIC_ID" = "evidence_id"))
-}
-else{
-eitems <- dplyr::left_join(civic_calls,dplyr::filter(dplyr::select(pcgr_data$civic_biomarkers,-c(civic_exon,civic_consequence,civic_codon,transvar_id)),alteration_type == 'MUT'),by=c("CIVIC_ID" = "civic_id"))
-}
-names(eitems) <- toupper(names(eitems))
-eitems$BIOMARKER_MAPPING <- 'exact'
-bm_descriptions <- data.frame('description' = eitems$BIOMARKER_DESCRIPTION)
-biomarker_descriptions <- rbind(biomarker_descriptions, bm_descriptions)
-clinical_evidence_items <- rbind(clinical_evidence_items, eitems)
-}
-clinical_evidence_items
-vcf_data_df_cbmdb <- vcf_data_df %>% dplyr::filter(is.na(CIVIC_ID) & !is.na(CBMDB_ID))
-vcf_data_df_cbmdb
-colnames(clinical_evidence_items)
-pcgr_all_annotation_columns_reduced <- pcgr_data$pcgr_all_annotation_columns[-which(pcgr_data$pcgr_all_annotation_columns == 'EXON' | pcgr_data$pcgr_all_annotation_columns == 'CIVIC_ID' | pcgr_data$pcgr_all_annotation_columns == 'CIVIC_ID_2' | pcgr_data$pcgr_all_annotation_columns == 'CBMDB_ID')]
-pcgr_all_annotation_columns_reduced
-all_tier1_tags <- c(pcgr_data$pcgr_all_annotation_columns_reduced,c("CLINICAL_SIGNIFICANCE","EVIDENCE_LEVEL","EVIDENCE_TYPE","EVIDENCE_DIRECTION","DISEASE_NAME","DESCRIPTION","CITATION","DRUG_NAMES","RATING"))
-clinical_evidence_items <- dplyr::select(clinical_evidence_items, dplyr::one_of(all_tier1_tags))
-unique_variants <- clinical_evidence_items %>% dplyr::select(SYMBOL,CONSEQUENCE,PROTEIN_CHANGE,CDS_CHANGE) %>% dplyr::distinct()
-clinical_evidence_items
-all_tier1_tags
-pcgr_all_annotation_columns_reduced <- pcgr_data$pcgr_all_annotation_columns[-which(pcgr_data$pcgr_all_annotation_columns == 'EXON' | pcgr_data$pcgr_all_annotation_columns == 'CIVIC_ID' | pcgr_data$pcgr_all_annotation_columns == 'CIVIC_ID_2' | pcgr_data$pcgr_all_annotation_columns == 'CBMDB_ID')]
-pcgr_all_annotation_columns_reduced
-all_tier1_tags <- c(pcgr_all_annotation_columns_reduced,c("CLINICAL_SIGNIFICANCE","EVIDENCE_LEVEL","EVIDENCE_TYPE","EVIDENCE_DIRECTION","DISEASE_NAME","DESCRIPTION","CITATION","DRUG_NAMES","RATING"))
-all_tier1_tags
-report_data <- pcgrr2::generate_report_data(sample_calls, sample_name = sample_name, minimum_n_signature_analysis = 50, signatures_limit = signatures_limit)
-report_data <- pcgrr2::generate_report_data(sample_calls, sample_name = sample_name, minimum_n_signature_analysis = 50, signatures_limit = signatures_limit)
-pcgrr2::generate_pcg_report(project_directory = getwd(), query_vcf = 'TCGA-A6-2686-01A-01D-1408-10_TCGA-A6-2686-10A-01D-2188-10.pcgr.vcf.gz' sample_name = sample_name, signatures_limit = 6)
-pcgrr2::generate_pcg_report(project_directory = getwd(), query_vcf = 'TCGA-A6-2686-01A-01D-1408-10_TCGA-A6-2686-10A-01D-2188-10.pcgr.vcf.gz', sample_name = sample_name, signatures_limit = 6)
-library(dplyr)
-help(left_join)
-vcf_gz_file <- '../../../examples/tumor_sample.BRCA.vcf.gz'
-library(BSgenome.Hsapiens.UCSC.hg19)
-library(magrittr)
-library(deconstructSigs)
-library(data.table)
-library(ggplot2)
-library(rlogging)
-rlogging::message(paste0("Reading and parsing VEP/vcfanno-annotated VCF file - ",vcf_gz_file))
-vcf_data_vr <- VariantAnnotation::readVcfAsVRanges(vcf_gz_file,genome = "hg19")
-vcf_data_vr <- vcf_data_vr[!is.na(vcf_data_vr$GT) & !(vcf_data_vr$GT == '.'),]
-vcf_data_vr <- pcgrr2::postprocess_vranges_info(vcf_data_vr)
-vcf_data_df <- as.data.frame(vcf_data_vr)
-vcf_data_df$GENOME_VERSION <- 'GRCh37'
-vcf_data_df <- dplyr::rename(vcf_data_df, CHROM = seqnames, POS = start, REF = ref, ALT = alt, CONSEQUENCE = Consequence, PROTEIN_CHANGE = HGVSp_short)
-vcf_data_df$GENOMIC_CHANGE <- paste(paste(paste(paste0("g.chr",vcf_data_df$CHROM),vcf_data_df$POS,sep=":"),vcf_data_df$REF,sep=":"),vcf_data_df$ALT,sep=">")
-vcf_data_df <- pcgrr2::add_pfam_domain_links(vcf_data_df)
-vcf_data_df <- pcgrr2::add_swissprot_feature_descriptions(vcf_data_df)
-vcf_data_df <- pcgrr2::add_read_support(vcf_data_df)
-rlogging::message("Extending annotation descriptions related to Database of Curated Mutations (DoCM)")
-vcf_data_df <- dplyr::left_join(vcf_data_df, pcgr_data$docm_literature, by=c("VAR_ID"))
-gencode_xref <- dplyr::rename(pcgr_data$gene_xref, Gene = ensembl_gene_id, GENENAME = name, ENTREZ_ID = entrezgene)
-gencode_xref <- gencode_xref %>% dplyr::filter(!is.na(Gene)) %>% dplyr::select(Gene,GENENAME,ENTREZ_ID) %>% dplyr::distinct()
-gencode_xref$GENENAME <- stringr::str_replace(gencode_xref$GENENAME," \\[.{1,}$","")
-gencode_xref$ENTREZ_ID <- as.character(gencode_xref$ENTREZ_ID)
-gencode_xref <- dplyr::filter(gencode_xref, !is.na(GENENAME) & !is.na(ENTREZ_ID))
-load('../../../data/rda/pcgr_data.rda')
-vcf_data_df$GENOME_VERSION <- 'GRCh37'
-vcf_data_df <- dplyr::rename(vcf_data_df, CHROM = seqnames, POS = start, REF = ref, ALT = alt, CONSEQUENCE = Consequence, PROTEIN_CHANGE = HGVSp_short)
-vcf_data_df$GENOMIC_CHANGE <- paste(paste(paste(paste0("g.chr",vcf_data_df$CHROM),vcf_data_df$POS,sep=":"),vcf_data_df$REF,sep=":"),vcf_data_df$ALT,sep=">")
-vcf_data_df <- pcgrr2::add_pfam_domain_links(vcf_data_df)
-vcf_data_df <- pcgrr2::add_swissprot_feature_descriptions(vcf_data_df)
-vcf_data_df <- pcgrr2::add_read_support(vcf_data_df)
-rlogging::message("Extending annotation descriptions related to Database of Curated Mutations (DoCM)")
-vcf_data_df <- dplyr::left_join(vcf_data_df, pcgr_data$docm_literature, by=c("VAR_ID"))
-gencode_xref <- dplyr::rename(pcgr_data$gene_xref, Gene = ensembl_gene_id, GENENAME = name, ENTREZ_ID = entrezgene)
-gencode_xref <- gencode_xref %>% dplyr::filter(!is.na(Gene)) %>% dplyr::select(Gene,GENENAME,ENTREZ_ID) %>% dplyr::distinct()
-gencode_xref$GENENAME <- stringr::str_replace(gencode_xref$GENENAME," \\[.{1,}$","")
-gencode_xref$ENTREZ_ID <- as.character(gencode_xref$ENTREZ_ID)
-gencode_xref <- dplyr::filter(gencode_xref, !is.na(GENENAME) & !is.na(ENTREZ_ID))
-dplyr::filter(gencode_xref, ENTREZ_ID == '79465')
-str(gencode_xref)
-str(pcgr_data$kegg_gene_pathway_links)
-dplyr::filter(pcgr_data$kegg_gene_pathway_links, gene_id == '79465')
-nrow(pcgr_data$kegg_gene_pathway_links)
-project_directory <- '/Users/sigven/research/docker/pcgr/examples'
-query_vcf <- 'tumor_sample.COAD.vcf.gz'
-library(pcgrr2)
-setwd(project_directory)
-vcf_gz_file <- 'EMN-3661-G1_EMN-3661-T2_MuTect_SNVs.vcf.gz'
-vcf_data_vr <- VariantAnnotation::readVcfAsVRanges(vcf_gz_file,genome = "hg19")
-head(vcf_data_vr)
-geno(vcf_data_vr)
-library(VariantAnnotation)
-geno(vcf_data_vr)
-vcf_data_vr
-mcols(vcf_data_vr)
-info(vcf_data_vr)
-help(VRanges)
-softFilterMatrix(vcf_data_vr)
-head(softFilterMatrix(vcf_data_vr))
-names(softFilterMatrix(vcf_data_vr))
-colnames(softFilterMatrix(vcf_data_vr))
-head(vcf_data_vr)
-vcf_data_vr[1463294,]
-vcf_data_vr[1463294]
-ranges(vcf_data_vr)
-ranges(vcf_data_vr)[vcf_data_vr[1463294]]
-ranges(vcf_data_vr)[1463294]
-unique(softFilterMatrix(vcf_data_vr))
-vcf_gz_file <- 'EMN-3661-G1_EMN-3661-T2.strelka.snv.vcf.gz'
-vcf_data_vr <- VariantAnnotation::readVcfAsVRanges(vcf_gz_file,genome = "hg19")
-unique(softFilterMatrix(vcf_data_vr))
-vcf_data_vr <- VariantAnnotation::readVcf(vcf_gz_file,genome = "hg19")
-head(vcf_data_vr)
-rowranges(vcf_data_vr)
-vcf_data_vr$`1:1431051_G/T
-``
-`
-`
-''
-`
-`
-`
-vcf_data_vr <- VariantAnnotation::readVcfAsVRanges(vcf_gz_file,genome = "hg19")
-head(vcf_data_vr)
-length(vcf_data_vr)
-length(softFilterMatrix)
-length(softFilterMatrix(vcf_data_vr))
-unique(softFilterMatrix(vcf_data_vr))
-mcols(vcf_data_vranges)
-library(GenomicRanges)
-mcols(vcf_data_vranges)
-GenomicRanges::mcols(vcf_data_vranges)
-GenomicRanges::mcols(vcf_data_vr)
-help("readVcfAsVRanges")
-called(vcf_data_vr)
-head(vcf_data_vr, 30)
-head(called(vcf_data_vr),20)
-vcf_data_vr <- vcf_data_vr[called(vcf_data_vr)]
-length(vcf_data_vr)
-vcf_gz_file <- 'tumor_sample.BRCA.vcf.gz'
-vcf_data_vr <- VariantAnnotation::readVcfAsVRanges(vcf_gz_file,genome = "hg19")
-length(vcf_data_vr)
-vcf_data_vr <- vcf_data_vr[called(vcf_data_vr)]
-length(vcf_data_vr)
-help(called)
-VRanges::called
-library(VRanges)
-library(pcgrr2)
-library(rlogging)
-knit_with_parameters('~/research/test_md.Rmd')
-help(as.numeric)
-tmp <- c(-1.234,NA)
-tmp
-tmp <- as.numeric(tmp)
-tmp
-cna_file <- '/Users/sigven/research/docker/pcgr/examples/test.cna.tsv'
-logR_threshold_amplification <- 0.8
-logR_threshold_homozygous_deletion <- -0.8
-cna_df <- read.table(file=cna_file,header = T,stringsAsFactors = F,comment.char="", quote="")
-cna_df <- dplyr::rename(cna_df, chromosome = Chromosome, LogR = Segment_Mean, segment_start = Start, segment_end = End) %>% dplyr::distinct()
-library(magrittr)
-cna_df <- dplyr::rename(cna_df, chromosome = Chromosome, LogR = Segment_Mean, segment_start = Start, segment_end = End) %>% dplyr::distinct()
-cna_df <- cna_df %>% dplyr::filter(!is.na(LogR))
-cna_df$LogR <- as.numeric(cna_df$LogR)
-str(cna_df)
-load('/Users/sigven/research/docker/pcgr/data/rda/pcgr_data.rda')
-cna_gr <- GenomicRanges::makeGRangesFromDataFrame(cna_df, keep.extra.columns = T, seqinfo = pcgr_data$seqinfo_hg19, seqnames.field = 'chromosome',start.field = 'segment_start', end.field = 'segment_end', ignore.strand = T, starts.in.df.are.0based = T)
-hits <- GenomicRanges::findOverlaps(cna_gr, pcgr_data$ensembl_genes_gr, type="any", select="all")
-ranges <- pcgr_data$ensembl_genes_gr[subjectHits(hits)]
-mcols(ranges) <- c(mcols(ranges),mcols(cna_gr[queryHits(hits)]))
-library(GenomicRanges)
-cna_gr <- GenomicRanges::makeGRangesFromDataFrame(cna_df, keep.extra.columns = T, seqinfo = pcgr_data$seqinfo_hg19, seqnames.field = 'chromosome',start.field = 'segment_start', end.field = 'segment_end', ignore.strand = T, starts.in.df.are.0based = T)
-hits <- GenomicRanges::findOverlaps(cna_gr, pcgr_data$ensembl_genes_gr, type="any", select="all")
-ranges <- pcgr_data$ensembl_genes_gr[subjectHits(hits)]
-mcols(ranges) <- c(mcols(ranges),mcols(cna_gr[queryHits(hits)]))
-help("subjectHits")
-df <- as.data.frame(mcols(ranges))
-df$segment_start <- start(ranges(cna_gr[queryHits(hits)]))
-df$segment_end <- end(ranges(cna_gr[queryHits(hits)]))
-df$segment_length <- paste(round((as.numeric((df$segment_end - df$segment_start)/1000000)),digits = 3),"Mb")
-df$transcript_start <- start(ranges)
-df$transcript_end <- end(ranges)
-df$chrom <- as.character(seqnames(ranges))
-df <- as.data.frame(df %>% dplyr::rowwise() %>% dplyr::mutate(transcript_overlap_percent = round(as.numeric((min(transcript_end,segment_end) - max(segment_start,transcript_start)) / (transcript_end - transcript_start)) * 100, digits = 2)))
-df$segment_link <- paste0("",paste0(df$chrom,':',df$segment_start,'-',df$segment_end),"")
-df_print <- df
-df_print <- dplyr::select(df_print,chrom,segment_start,segment_end,segment_length,LogR,ensembl_gene_id,symbol,ensembl_transcript_id,transcript_start,transcript_end,transcript_overlap_percent,name,gene_biotype,cancer_census_germline,cancer_census_somatic,tsgene,ts_oncogene,intogen_drivers,antineoplastic_drugs_dgidb,gencode_transcript_type,gencode_tag,gencode_v19)
-chrOrder <- c(as.character(paste0('chr',c(1:22))),"chrX","chrY")
-df_print$chrom <- factor(df_print$chrom, levels=chrOrder)
-df_print <- df_print[order(df_print$chrom),]
-df_print$segment_start <- as.integer(df_print$segment_start)
-df_print$segment_end <- as.integer(df_print$segment_end)
-df_print_sorted <- NULL
-for(chrom in chrOrder){
-if(nrow(df_print[df_print$chrom == chrom,]) > 0){
-chrom_regions <- df_print[df_print$chrom == chrom,]
-chrom_regions_sorted <- chrom_regions[with(chrom_regions, order(segment_start, segment_end)),]
-df_print_sorted <- rbind(df_print_sorted, chrom_regions_sorted)
-}
-}
-df_print_sorted$cancer_census_somatic <- stringr::str_replace_all(df_print_sorted$cancer_census_somatic,"&",", ")
-df_print_sorted$cancer_census_germline <- stringr::str_replace_all(df_print_sorted$cancer_census_germline,"&",", ")
-df_print_sorted$antineoplastic_drugs_dgidb <- stringr::str_replace_all(df_print_sorted$antineoplastic_drugs_dgidb,"&",", ")
-df <- dplyr::select(df, -ensembl_transcript_id) %>% dplyr::filter(gene_biotype == 'protein_coding') %>% dplyr::distinct()
-df$cancer_census_somatic <- stringr::str_replace_all(df$cancer_census_somatic,"&",", ")
-df <- dplyr::rename(df, ANTINEOPLASTIC_DRUG_INTERACTION = antineoplastic_drugs_dgidb)
-df$VAR_ID <- rep(1:nrow(df))
-df <- pcgrr2::annotate_variant_link(df, vardb = 'DGIDB')
-df <- dplyr::rename(df, ONCOGENE = ts_oncogene, TUMOR_SUPPRESSOR = tsgene, ENTREZ_ID = entrezgene, CANCER_CENSUS_SOMATIC = cancer_census_somatic, GENE = symbol, CHROMOSOME = chrom, GENENAME = name, ANTINEOPLASTIC_DRUG_INTERACTIONS = DGIDBLINK, SEGMENT_LENGTH = segment_length, SEGMENT = segment_link, TRANSCRIPT_OVERLAP = transcript_overlap_percent)
-df$ENTREZ_ID <- as.character(df$ENTREZ_ID)
-df <- dplyr::left_join(df,pcgr_data$kegg_gene_pathway_links, by=c("ENTREZ_ID" = "gene_id"))
-df <- dplyr::rename(df, KEGG_PATHWAY = kegg_pathway_urls)
-df <- pcgrr2::annotate_variant_link(df, vardb = 'NCBI_GENE')
-df <- dplyr::rename(df, GENE_NAME = NCBI_GENE_LINK)
-df <- dplyr::select(df, CHROMOSOME, GENE, GENE_NAME, CANCER_CENSUS_SOMATIC, KEGG_PATHWAY, TUMOR_SUPPRESSOR, ONCOGENE, ANTINEOPLASTIC_DRUG_INTERACTIONS,SEGMENT_LENGTH, SEGMENT, gencode_transcript_type,LogR, TRANSCRIPT_OVERLAP) %>% dplyr::distinct()
-df <- df %>% dplyr::distinct()
-segments <- NULL
-segments <- dplyr::select(df, SEGMENT, SEGMENT_LENGTH, LogR) %>% dplyr::distinct()
-df$SEGMENT
-load('/Users/sigven/research/docker/pcgr/data/rda/pcgr_data.rda')
-logR_threshold_homozygous_deletion <- -0.8
-logR_threshold_amplification <- 0.8
-cna_file <- '/Users/sigven/research/docker/pcgr/examples/test.cna.tsv'
-cna_df <- read.table(file=cna_file,header = T,stringsAsFactors = F,comment.char="", quote="")
-cna_df <- dplyr::rename(cna_df, chromosome = Chromosome, LogR = Segment_Mean, segment_start = Start, segment_end = End) %>% dplyr::distinct()
-if(!any(stringr::str_detect(cna_df$chromosome,"chr"))){
-cna_df$chromosome <- paste0("chr",cna_df$chromosome)
-}
-cna_df <- cna_df %>% dplyr::filter(!is.na(LogR))
-cna_df$LogR <- as.numeric(cna_df$LogR)
-cna_gr <- GenomicRanges::makeGRangesFromDataFrame(cna_df, keep.extra.columns = T, seqinfo = pcgr_data$seqinfo_hg19, seqnames.field = 'chromosome',start.field = 'segment_start', end.field = 'segment_end', ignore.strand = T, starts.in.df.are.0based = T)
-hits <- GenomicRanges::findOverlaps(cna_gr, pcgr_data$ensembl_genes_gr, type="any", select="all")
-ranges <- pcgr_data$ensembl_genes_gr[subjectHits(hits)]
-mcols(ranges) <- c(mcols(ranges),mcols(cna_gr[queryHits(hits)]))
-df <- as.data.frame(mcols(ranges))
-df$segment_start <- start(ranges(cna_gr[queryHits(hits)]))
-df$segment_end <- end(ranges(cna_gr[queryHits(hits)]))
-df$segment_length <- paste(round((as.numeric((df$segment_end - df$segment_start)/1000000)),digits = 3),"Mb")
-head(df)
-dplyr::select(df, segment_start, segment_end,LogR) %>% dplyr::distinct() %>% nrow()
-head(cna_gr)
-cna_df <- read.table(file=cna_file,header = T,stringsAsFactors = F,comment.char="", quote="")
-cna_df <- dplyr::rename(cna_df, chromosome = Chromosome, LogR = Segment_Mean, segment_start = Start, segment_end = End) %>% dplyr::distinct()
-if(!any(stringr::str_detect(cna_df$chromosome,"chr"))){
-cna_df$chromosome <- paste0("chr",cna_df$chromosome)
-}
-cna_df <- cna_df %>% dplyr::filter(!is.na(LogR))
-cna_df$LogR <- as.numeric(cna_df$LogR)
-cna_segments <- cna_df
-cna_segments$segment_link <- paste0("",paste0(cna_segments$chromosome,':',cna_segments$segment_start,'-',cna_segments$segment_end),"")
-cna_segments$segment_length <- paste(round((as.numeric((cna_segments$segment_end - cna_segments$segment_start)/1000000)),digits = 3),"Mb")
cna_segments <- dplyr::rename(cna_segments, SEGMENT_LENGTH = segment_length, SEGMENT = segment_link)
cna_segments <- dplyr::select(cna_segments, SEGMENT, SEGMENT_LENGTH, LogR) %>% dplyr::distinct()
cna_segments_filtered <- dplyr::filter(segments, LogR >= logR_threshold_amplification | LogR <= logR_threshold_homozygous_deletion)
@@ -409,3 +46,467 @@ pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = q
pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, logR_threshold_amplification = 0.8, logR_threshold_homozygous_deletion = -0.8, cna_segments_tsv = cna_segments_tsv)
pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, logR_threshold_amplification = 0.8, logR_threshold_homozygous_deletion = -0.8, cna_segments_tsv = cna_segments_tsv)
pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, logR_threshold_amplification = 0.8, logR_threshold_homozygous_deletion = -0.8, cna_segments_tsv = cna_segments_tsv)
+library(rmarkdown)
+help(render)
+library(pcgrr2)
+library(pcgrr2)
+project_directory <- '/Users/sigven/research/docker/pcgr/output'
+setwd(project_directory)
+query_vcf <- 'tumor_sample.BRCA.pcgr.vcf.gz'
+cna_segments_tsv <- '../examples/tumor_sample.BRCA.cna.tsv'
+sample_name <- 'tumor_sample.BRCA'
+print_biomarkers <- TRUE
+print_tier_variants <- TRUE
+print_mutational_signatures <- TRUE
+print_cna_segments <- TRUE
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+load('/Users/sigven/research/docker/pcgr/data/rda/pcgr_data.rda')
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+suppressWarnings(suppressPackageStartupMessages(library(pcgrr2)))
+suppressWarnings(suppressPackageStartupMessages(library(magrittr)))
+suppressWarnings(suppressPackageStartupMessages(library(BSgenome.Hsapiens.UCSC.hg19)))
+suppressWarnings(suppressPackageStartupMessages(library(deconstructSigs)))
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, sample_name = sample_name)
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+query_vcf
+getw()
+getwd()
+project_directory
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+sample_calls <- pcgrr2::get_calls(query_vcf, sample_id = sample_name)
+setwd('/Users/sigven/research/docker/pcgr/output')
+query_vcf <- 'tumor_sample.BRCA.pcgr.vcf.gz'
+sample_name <- 'tumor_sample.BRCA'
+sample_calls <- pcgrr2::get_calls(query_vcf, sample_id = sample_name)
+load('/Users/sigven/research/docker/pcgr/data/rda/pcgr_data.rda')
+suppressWarnings(suppressPackageStartupMessages(library(pcgrr2)))
+suppressWarnings(suppressPackageStartupMessages(library(magrittr)))
+suppressWarnings(suppressPackageStartupMessages(library(BSgenome.Hsapiens.UCSC.hg19)))
+suppressWarnings(suppressPackageStartupMessages(library(deconstructSigs)))
+cna_segments_tsv <- '../examples/tumor_sample.BRCA.cna.tsv'
+project_directory <- getwd()
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+getwd()
+setwd('/Users/sigven/research/docker/pcgr/output')
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+detach("package:pcgrr2", unload=TRUE)
+getwd()
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+getwd()
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = NULL, sample_name = sample_name)
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = NULL, sample_name = sample_name)
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = 'None', 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = 'None', 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+library(pcgrr2)
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = 'None', 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = 'None', 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+rmarkdown::render('test.Rmd', params = list(tier1_report = TRUE))
+rmarkdown::render('test.Rmd', params = list(tier1_report = TRUE))
+rmarkdown::render('test.Rmd', params = list(tier1_report = FALSE))
+getwd()
+cna_segments_tsv <- '/Users/sigven/research/docker/pcgr/examples/tumor_sample.BRCA.cna.tsv'
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = 'None', 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = 'None', 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = 'None', 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+library(pcgrr2)
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = 'None', 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+getwd()
+setwd('/Users/sigven/research/docker/pcgr/output')
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = 'tumor_sample.BRCA.pcgr.vcf.gz', 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+install.packages(c("curl", "DBI", "dendextend", "jsonlite", "lattice", "matrixStats", "psych", "readr", "rmarkdown", "shiny", "sjmisc", "sourcetools", "stringi", "survival", "tibble", "vegan", "viridis", "viridisLite", "XML"))
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+install.packages(c("curl", "DBI", "dendextend", "jsonlite", "lattice", "matrixStats", "psych", "readr", "rmarkdown", "shiny", "sjmisc", "sourcetools", "stringi", "survival", "tibble", "vegan", "viridis", "viridisLite", "XML"))
+install.packages(c("lattice", "survival"), lib="/Library/Frameworks/R.framework/Versions/3.3/Resources/library")
+install.packages(c("curl", "DBI", "dendextend", "jsonlite", "lattice", "matrixStats", "psych", "readr", "rmarkdown", "shiny", "sjmisc", "sourcetools", "stringi", "survival", "tibble", "vegan", "viridis", "viridisLite", "XML"))
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = 'tumor_sample.BRCA.pcgr.vcf.gz', 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = 'tumor_sample.BRCA.pcgr.vcf.gz', 0.8, -0.8, cna_segments_tsv = NULL, sample_name = sample_name)
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = 'tumor_sample.BRCA.pcgr.vcf.gz', 0.8, -0.8, cna_segments_tsv = NULL, sample_name = sample_name)
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = 'tumor_sample.BRCA.pcgr.vcf.gz', 0.8, -0.8, cna_segments_tsv = NULL, sample_name = sample_name)
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = 'tumor_sample.BRCA.pcgr.vcf.gz', 0.8, -0.8, cna_segments_tsv = NULL, sample_name = sample_name)
+report_data <- list(tier1_report = FALSE, tier2_report = FALSE, tier3_report = FALSE, tier4_report = FALSE, tier5_report = FALSE, signature_report = FALSE, missing_signature_data = FALSE, cna_report_oncogene_gain = FALSE, cna_report_tsgene_loss = FALSE, cna_report_biomarkers = FALSE, cna_report_segments = FALSE)
+report_data
+sample_calls <- pcgrr2::get_calls(query_vcf, sample_id = sample_name)
+report_data <- pcgrr2::generate_report_data(sample_calls, sample_name = sample_name, minimum_n_signature_analysis = 50, signatures_limit = signatures_limit)
+signatures_limit <- 6
+sample_calls <- pcgrr2::get_calls(query_vcf, sample_id = sample_name)
+report_data <- pcgrr2::generate_report_data(sample_calls, sample_name = sample_name, minimum_n_signature_analysis = 50, signatures_limit = signatures_limit)
+report_data
+report_data$tier1_report
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = 'tumor_sample.BRCA.pcgr.vcf.gz', 0.8, -0.8, cna_segments_tsv = NULL, sample_name = sample_name)
+params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report, tier2_report = report_data$tier2_report, tier3_report = report_data$tier3_report, tier4_report = report_data$tier4_report, tier5_report = report_data$tier5_report, cna_report_tsgene_loss = report_data$cna_report_tsgene_loss, cna_report_oncogene_gain = report_data$cna_report_oncogene_gain, cna_report_biomarkers = report_data$cna_report_biomarkers, cna_report_segments = report_data$cna_report_segments, signature_report = report_data$signature_report)
+params
+logR_threshold_homozygous_deletion <- -0.8
+logR_threshold_amplification <- 0.8
+params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report, tier2_report = report_data$tier2_report, tier3_report = report_data$tier3_report, tier4_report = report_data$tier4_report, tier5_report = report_data$tier5_report, cna_report_tsgene_loss = report_data$cna_report_tsgene_loss, cna_report_oncogene_gain = report_data$cna_report_oncogene_gain, cna_report_biomarkers = report_data$cna_report_biomarkers, cna_report_segments = report_data$cna_report_segments, signature_report = report_data$signature_report)
+params
+params$tier1_report
+sample_calls <- pcgrr2::get_calls(query_vcf, sample_id = sample_name)
+report_data <- pcgrr2::generate_report_data(sample_calls, sample_name = sample_name, minimum_n_signature_analysis = 50, signatures_limit = signatures_limit)
+report_data$sample_name <- sample_name
+report_data$cna_report_tsgene_loss <- FALSE
+report_data$cna_report_oncogene_gain <- FALSE
+report_data$cna_report_biomarkers <- FALSE
+report_data$cna_report_segments <- FALSE
+report_data$cna_report_tsgene_loss
+params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report, tier2_report = report_data$tier2_report, tier3_report = report_data$tier3_report, tier4_report = report_data$tier4_report, tier5_report = report_data$tier5_report, cna_report_tsgene_loss = report_data$cna_report_tsgene_loss, cna_report_oncogene_gain = report_data$cna_report_oncogene_gain, cna_report_biomarkers = report_data$cna_report_biomarkers, cna_report_segments = report_data$cna_report_segments, signature_report = report_data$signature_report)
+params
+help(render)
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = 'tumor_sample.BRCA.pcgr.vcf.gz', 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = 'tumor_sample.BRCA.pcgr.vcf.gz', 0.8, -0.8, cna_segments_tsv = NULL, sample_name = sample_name)
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = 'tumor_sample.BRCA.pcgr.vcf.gz', 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+query_vcf <- paste0(project_directory,'/',query_vcf)
+query_vcf
+setwd('/Users/sigven/research/docker')
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = 'tumor_sample.BRCA.pcgr.vcf.gz', 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+sessionInfo()
+getwd()
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+sessionInfo()
+report_data <- list(tier1_report = FALSE, tier2_report = FALSE, tier3_report = FALSE, tier4_report = FALSE, tier5_report = FALSE, signature_report = FALSE, missing_signature_data = FALSE, cna_report_oncogene_gain = FALSE, cna_report_tsgene_loss = FALSE, cna_report_biomarkers = FALSE, cna_report_segments = FALSE, missing_signature_data = FALSE)
+sample_calls <- pcgrr2::get_calls(query_vcf, sample_id = sample_name)
+report_data <- pcgrr2::generate_report_data(sample_calls, sample_name = sample_name, minimum_n_signature_analysis = 50, signatures_limit = signatures_limit)
+report_data$sample_name <- sample_name
+report_data$cna_report_tsgene_loss <- TRUE
+report_data$cna_report_oncogene_gain <- TRUE
+report_data$cna_report_biomarkers <- TRUE
+report_data$cna_report_segments <- TRUE
+cna_data <- pcgrr2::cna_segment_annotation(cna_segments_tsv, logR_threshold_amplification, logR_threshold_homozygous_deletion, format='tcga')
+rmarkdown::render(system.file("templates","report.Rmd", package="pcgrr2"), output_file = paste0(sample_name,'.pcgr.html'), output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report, tier2_report = report_data$tier2_report, tier3_report = report_data$tier3_report, tier4_report = report_data$tier4_report, tier5_report = report_data$tier5_report, cna_report_tsgene_loss = report_data$cna_report_tsgene_loss, cna_report_oncogene_gain = report_data$cna_report_oncogene_gain, cna_report_biomarkers = report_data$cna_report_biomarkers, cna_report_segments = report_data$cna_report_segments, signature_report = report_data$signature_report, missing_signature_data = report_data$missing_signature_data)
+)
+rm(params)
+rmarkdown::render(system.file("templates","report.Rmd", package="pcgrr2"), output_file = paste0(sample_name,'.pcgr.html'), output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report, tier2_report = report_data$tier2_report, tier3_report = report_data$tier3_report, tier4_report = report_data$tier4_report, tier5_report = report_data$tier5_report, cna_report_tsgene_loss = report_data$cna_report_tsgene_loss, cna_report_oncogene_gain = report_data$cna_report_oncogene_gain, cna_report_biomarkers = report_data$cna_report_biomarkers, cna_report_segments = report_data$cna_report_segments, signature_report = report_data$signature_report, missing_signature_data = report_data$missing_signature_data))
+library(knitr)
+help("opts_chunk")
+getwd()
+rmarkdown::render(system.file("templates","Test.Rmd", package="pcgrr2"), output_file = 'test.html', output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report))
+getwd()
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+rmarkdown::render(system.file("templates","Test.Rmd", package="pcgrr2"), output_file = 'test.html', output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report))
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+rmarkdown::render(system.file("templates","Test.Rmd", package="pcgrr2"), output_file = 'test.html', output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report))
+rmarkdown::render(system.file("templates","Test.Rmd", package="pcgrr2"), output_file = 'test.html', output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = TRUE))
+rmarkdown::render(system.file("templates","Test.Rmd", package="pcgrr2"), output_file = 'test.html', output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = FALSE))
+cna_data <- NULL
+nrow(cna_data$ranked_segments)
+help(knit)
+rmarkdown::render(system.file("templates","Test.Rmd", package="pcgrr2"), output_file = 'test.html', output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = FALSE),envir = globalenv())
+rmarkdown::render(system.file("templates","report.Rmd", package="pcgrr2"), output_file = paste0(sample_name,'.pcgr.html'), output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report, tier2_report = report_data$tier2_report, tier3_report = report_data$tier3_report, tier4_report = report_data$tier4_report, tier5_report = report_data$tier5_report, cna_report_tsgene_loss = report_data$cna_report_tsgene_loss, cna_report_oncogene_gain = report_data$cna_report_oncogene_gain, cna_report_biomarkers = report_data$cna_report_biomarkers, cna_report_segments = report_data$cna_report_segments, signature_report = report_data$signature_report, missing_signature_data = report_data$missing_signature_data))
+report_data$clinical_evidence_items_tier1A
+colnames(report_data$clinical_evidence_items_tier1A)
+sample_calls <- pcgrr2::get_calls(query_vcf, sample_id = sample_name)
+report_data <- pcgrr2::generate_report_data(sample_calls, sample_name = sample_name, minimum_n_signature_analysis = 50, signatures_limit = signatures_limit)
+report_data$sample_name <- sample_name
+colnames(report_data$clinical_evidence_items_tier1A)
+rmarkdown::render(system.file("templates","report.Rmd", package="pcgrr2"), output_file = paste0(sample_name,'.pcgr.html'), output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report, tier2_report = report_data$tier2_report, tier3_report = report_data$tier3_report, tier4_report = report_data$tier4_report, tier5_report = report_data$tier5_report, cna_report_tsgene_loss = report_data$cna_report_tsgene_loss, cna_report_oncogene_gain = report_data$cna_report_oncogene_gain, cna_report_biomarkers = report_data$cna_report_biomarkers, cna_report_segments = report_data$cna_report_segments, signature_report = report_data$signature_report, missing_signature_data = report_data$missing_signature_data))
+cna_data <- data.frame()
+cna_data$ranked_segments <- NULL
+cna_data
+cna_data <- list(ranked_segments = data.frame(), oncogene_amplified = data.frame(), tsgene_homozygous_deletion = data.frame(),cna_df_for_print = data.frame(), cna_biomarkers = data.frame(), cna_biomarker_segments = data.frame())
+cna_data
+rmarkdown::render(system.file("templates","report.Rmd", package="pcgrr2"), output_file = paste0(sample_name,'.pcgr.html'), output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report, tier2_report = report_data$tier2_report, tier3_report = report_data$tier3_report, tier4_report = report_data$tier4_report, tier5_report = report_data$tier5_report, cna_report_tsgene_loss = report_data$cna_report_tsgene_loss, cna_report_oncogene_gain = report_data$cna_report_oncogene_gain, cna_report_biomarkers = report_data$cna_report_biomarkers, cna_report_segments = report_data$cna_report_segments, signature_report = report_data$signature_report, missing_signature_data = report_data$missing_signature_data))
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+sample_calls <- pcgrr2::get_calls(query_vcf, sample_id = sample_name)
+report_data <- pcgrr2::generate_report_data(sample_calls, sample_name = sample_name, minimum_n_signature_analysis = 50, signatures_limit = signatures_limit)
+report_data$sample_name <- sample_name
+report_data$cna_report_tsgene_loss <- FALSE
+report_data$cna_report_oncogene_gain <- FALSE
+report_data$cna_report_biomarkers <- FALSE
+report_data$cna_report_segments <- FALSE
+cna_data <- list(ranked_segments = data.frame(), oncogene_amplified = data.frame(), tsgene_homozygous_deletion = data.frame(),cna_df_for_print = data.frame(), cna_biomarkers = data.frame(), cna_biomarker_segments = data.frame())
+report_data$tier1_report
+rmarkdown::render(system.file("templates","report.Rmd", package="pcgrr2"), output_file = paste0(sample_name,'.pcgr.html'), output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report, tier2_report = report_data$tier2_report, tier3_report = report_data$tier3_report, tier4_report = report_data$tier4_report, tier5_report = report_data$tier5_report, cna_report_tsgene_loss = report_data$cna_report_tsgene_loss, cna_report_oncogene_gain = report_data$cna_report_oncogene_gain, cna_report_biomarkers = report_data$cna_report_biomarkers, cna_report_segments = report_data$cna_report_segments, signature_report = report_data$signature_report, missing_signature_data = report_data$missing_signature_data))
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+rmarkdown::render(system.file("templates","report.Rmd", package="pcgrr2"), output_file = paste0(sample_name,'.pcgr.html'), output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report, tier2_report = report_data$tier2_report, tier3_report = report_data$tier3_report, tier4_report = report_data$tier4_report, tier5_report = report_data$tier5_report, cna_report_tsgene_loss = report_data$cna_report_tsgene_loss, cna_report_oncogene_gain = report_data$cna_report_oncogene_gain, cna_report_biomarkers = report_data$cna_report_biomarkers, cna_report_segments = report_data$cna_report_segments, signature_report = report_data$signature_report, missing_signature_data = report_data$missing_signature_data))
+sample_calls <- pcgrr2::get_calls(query_vcf, sample_id = sample_name)
+report_data <- pcgrr2::generate_report_data(sample_calls, sample_name = sample_name, minimum_n_signature_analysis = 50, signatures_limit = signatures_limit)
+report_data$sample_name <- sample_name
+rmarkdown::render(system.file("templates","report.Rmd", package="pcgrr2"), output_file = paste0(sample_name,'.pcgr.html'), output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report, tier2_report = report_data$tier2_report, tier3_report = report_data$tier3_report, tier4_report = report_data$tier4_report, tier5_report = report_data$tier5_report, cna_report_tsgene_loss = report_data$cna_report_tsgene_loss, cna_report_oncogene_gain = report_data$cna_report_oncogene_gain, cna_report_biomarkers = report_data$cna_report_biomarkers, cna_report_segments = report_data$cna_report_segments, signature_report = report_data$signature_report, missing_signature_data = report_data$missing_signature_data))
+sample_calls <- pcgrr2::get_calls(query_vcf, sample_id = sample_name)
+report_data <- pcgrr2::generate_report_data(sample_calls, sample_name = sample_name, minimum_n_signature_analysis = 50, signatures_limit = signatures_limit)
+report_data$sample_name <- sample_name
+cna_data <- list(ranked_segments = data.frame(), oncogene_amplified = data.frame(), tsgene_homozygous_deletion = data.frame(),cna_df_for_print = data.frame(), cna_biomarkers = data.frame(), cna_biomarker_segments = data.frame())
+rmarkdown::render(system.file("templates","report.Rmd", package="pcgrr2"), output_file = paste0(sample_name,'.pcgr.html'), output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report, tier2_report = report_data$tier2_report, tier3_report = report_data$tier3_report, tier4_report = report_data$tier4_report, tier5_report = report_data$tier5_report, cna_report_tsgene_loss = report_data$cna_report_tsgene_loss, cna_report_oncogene_gain = report_data$cna_report_oncogene_gain, cna_report_biomarkers = report_data$cna_report_biomarkers, cna_report_segments = report_data$cna_report_segments, signature_report = report_data$signature_report, missing_signature_data = report_data$missing_signature_data))
+cna_data
+cna_data$ranked_segments
+nrow(cna_data$ranked_segments)
+report_data$tier1_report
+cna_data$ranked_segments
+myOptions <- list(paging = FALSE, searching = FALSE, caching = FALSE)
+if(nrow(cna_data$ranked_segments) >= 10){
+  myOptions <- list(paging = TRUE, pageLength = 10, searching = FALSE, caching = FALSE)
+}
+if(nrow(cna_data$ranked_segments) > 0){
+  DT::datatable(cna_data$ranked_segments, options = myOptions, escape = FALSE, extensions = "Responsive") %>% DT::formatStyle('LogR', color = 'white', backgroundColor = DT::styleInterval(logR_threshold_homozygous_deletion, c('red', '#009E73')))
+}
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = sample_name)
+pcgrr2::generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = NULL, sample_name = sample_name)
+report_data <- list(tier1_report = FALSE, tier2_report = FALSE, tier3_report = FALSE, tier4_report = FALSE, tier5_report = FALSE, signature_report = FALSE, missing_signature_data = FALSE, cna_report_oncogene_gain = FALSE, cna_report_tsgene_loss = FALSE, cna_report_biomarkers = FALSE, cna_report_segments = FALSE)
+sample_calls <- pcgrr2::get_calls(query_vcf, sample_id = sample_name)
+report_data <- pcgrr2::generate_report_data(sample_calls, sample_name = sample_name, minimum_n_signature_analysis = 50, signatures_limit = signatures_limit)
+report_data$sample_name <- sample_name
+report_data$cna_report_tsgene_loss <- FALSE
+report_data$cna_report_oncogene_gain <- FALSE
+report_data$cna_report_biomarkers <- FALSE
+report_data$cna_report_segments <- FALSE
+cna_data <- list(ranked_segments = data.frame(), oncogene_amplified = data.frame(), tsgene_homozygous_deletion = data.frame(),cna_df_for_print = data.frame(), cna_biomarkers = data.frame(), cna_biomarker_segments = data.frame())
+cna_segments_tsv
+cna_segments_tsv <- NULL
+tmp <- list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report, tier2_report = report_data$tier2_report, tier3_report = report_data$tier3_report, tier4_report = report_data$tier4_report, tier5_report = report_data$tier5_report, cna_report_tsgene_loss = report_data$cna_report_tsgene_loss, cna_report_oncogene_gain = report_data$cna_report_oncogene_gain, cna_report_biomarkers = report_data$cna_report_biomarkers, cna_report_segments = report_data$cna_report_segments, signature_report = report_data$signature_report, missing_signature_data = report_data$missing_signature_data)
+tmp
+rmarkdown::render(system.file("templates","report.Rmd", package="pcgrr2"), output_file = paste0(sample_name,'.pcgr.html'), output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report, tier2_report = report_data$tier2_report, tier3_report = report_data$tier3_report, tier4_report = report_data$tier4_report, tier5_report = report_data$tier5_report, cna_report_tsgene_loss = report_data$cna_report_tsgene_loss, cna_report_oncogene_gain = report_data$cna_report_oncogene_gain, cna_report_biomarkers = report_data$cna_report_biomarkers, cna_report_segments = report_data$cna_report_segments, signature_report = report_data$signature_report, missing_signature_data = report_data$missing_signature_data))
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+report_data <- list(tier1_report = FALSE, tier2_report = FALSE, tier3_report = FALSE, tier4_report = FALSE, tier5_report = FALSE, signature_report = FALSE, missing_signature_data = FALSE, cna_report_oncogene_gain = FALSE, cna_report_tsgene_loss = FALSE, cna_report_biomarkers = FALSE, cna_report_segments = FALSE)
+sample_calls <- pcgrr2::get_calls(query_vcf, sample_id = sample_name)
+report_data <- pcgrr2::generate_report_data(sample_calls, sample_name = sample_name, minimum_n_signature_analysis = 50, signatures_limit = signatures_limit)
+report_data$sample_name <- sample_name
+report_data$cna_report_tsgene_loss <- FALSE
+report_data$cna_report_oncogene_gain <- FALSE
+report_data$cna_report_biomarkers <- FALSE
+report_data$cna_report_segments <- FALSE
+cna_data <- list(ranked_segments = data.frame(), oncogene_amplified = data.frame(), tsgene_homozygous_deletion = data.frame(),cna_df_for_print = data.frame(), cna_biomarkers = data.frame(), cna_biomarker_segments = data.frame())
+rmarkdown::render(system.file("templates","report.Rmd", package="pcgrr2"), output_file = paste0(sample_name,'.pcgr.html'), output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report, tier2_report = report_data$tier2_report, tier3_report = report_data$tier3_report, tier4_report = report_data$tier4_report, tier5_report = report_data$tier5_report, cna_report_tsgene_loss = report_data$cna_report_tsgene_loss, cna_report_oncogene_gain = report_data$cna_report_oncogene_gain, cna_report_biomarkers = report_data$cna_report_biomarkers, cna_report_segments = report_data$cna_report_segments, signature_report = report_data$signature_report, missing_signature_data = report_data$missing_signature_data))
+report_data <- list(tier1_report = FALSE, tier2_report = FALSE, tier3_report = FALSE, tier4_report = FALSE, tier5_report = FALSE, signature_report = FALSE, missing_signature_data = FALSE, cna_report_oncogene_gain = FALSE, cna_report_tsgene_loss = FALSE, cna_report_biomarkers = FALSE, cna_report_segments = FALSE)
+tier_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.snvs_indels.tiers.tsv')
+msig_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.mutational_signatures.tsv')
+biomarker_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.snvs_indels.biomarkers.tsv')
+cna_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.cna_segments.tsv')
+maf_fname <- paste0(project_directory, '/',sample_name,'.pcgr.maf')
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+sample_calls <- pcgrr2::get_calls(query_vcf, sample_id = sample_name)
+report_data <- pcgrr2::generate_report_data(sample_calls, sample_name = sample_name, minimum_n_signature_analysis = 50, signatures_limit = signatures_limit)
+report_data$sample_name <- sample_name
+report_data$cna_report_tsgene_loss <- FALSE
+report_data$cna_report_oncogene_gain <- FALSE
+report_data$cna_report_biomarkers <- FALSE
+report_data$cna_report_segments <- FALSE
+cna_data <- list(ranked_segments = data.frame(), oncogene_amplified = data.frame(), tsgene_homozygous_deletion = data.frame(),cna_df_for_print = data.frame(), cna_biomarkers = data.frame(), cna_biomarker_segments = data.frame())
+rmarkdown::render(system.file("templates","report.Rmd", package="pcgrr2"), output_file = paste0(sample_name,'.pcgr.html'), output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report, tier2_report = report_data$tier2_report, tier3_report = report_data$tier3_report, tier4_report = report_data$tier4_report, tier5_report = report_data$tier5_report, cna_report_tsgene_loss = report_data$cna_report_tsgene_loss, cna_report_oncogene_gain = report_data$cna_report_oncogene_gain, cna_report_biomarkers = report_data$cna_report_biomarkers, cna_report_segments = report_data$cna_report_segments, signature_report = report_data$signature_report, missing_signature_data = report_data$missing_signature_data))
+library(pcgrr2)
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+report_data <- list(tier1_report = FALSE, tier2_report = FALSE, tier3_report = FALSE, tier4_report = FALSE, tier5_report = FALSE, signature_report = FALSE, missing_signature_data = FALSE, cna_report_oncogene_gain = FALSE, cna_report_tsgene_loss = FALSE, cna_report_biomarkers = FALSE, cna_report_segments = FALSE)
+tier_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.snvs_indels.tiers.tsv')
+msig_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.mutational_signatures.tsv')
+biomarker_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.snvs_indels.biomarkers.tsv')
+cna_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.cna_segments.tsv')
+maf_fname <- paste0(project_directory, '/',sample_name,'.pcgr.maf')
+project_directory <- '/Users/sigven/research/docker/pcgr/output'
+report_data <- list(tier1_report = FALSE, tier2_report = FALSE, tier3_report = FALSE, tier4_report = FALSE, tier5_report = FALSE, signature_report = FALSE, missing_signature_data = FALSE, cna_report_oncogene_gain = FALSE, cna_report_tsgene_loss = FALSE, cna_report_biomarkers = FALSE, cna_report_segments = FALSE)
+tier_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.snvs_indels.tiers.tsv')
+msig_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.mutational_signatures.tsv')
+biomarker_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.snvs_indels.biomarkers.tsv')
+cna_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.cna_segments.tsv')
+maf_fname <- paste0(project_directory, '/',sample_name,'.pcgr.maf')
+sample_name <- 'tumor_sample.BRCA'
+report_data <- list(tier1_report = FALSE, tier2_report = FALSE, tier3_report = FALSE, tier4_report = FALSE, tier5_report = FALSE, signature_report = FALSE, missing_signature_data = FALSE, cna_report_oncogene_gain = FALSE, cna_report_tsgene_loss = FALSE, cna_report_biomarkers = FALSE, cna_report_segments = FALSE)
+tier_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.snvs_indels.tiers.tsv')
+msig_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.mutational_signatures.tsv')
+biomarker_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.snvs_indels.biomarkers.tsv')
+cna_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.cna_segments.tsv')
+maf_fname <- paste0(project_directory, '/',sample_name,'.pcgr.maf')
+setwd(project_directory)
+query_vcf <- 'tumor_sample.BRCA.pcgr.vcf.gz'
+cna_segments_tsv <- NULL
+signatures_limit <- 6
+logR_threshold_amplification <- 0.8
+logR_threshold_homozygous_deletion <- -0.8
+sample_calls <- pcgrr2::get_calls(query_vcf, sample_id = sample_name)
+report_data <- pcgrr2::generate_report_data(sample_calls, sample_name = sample_name, minimum_n_signature_analysis = 50, signatures_limit = signatures_limit)
+report_data$sample_name <- sample_name
+load('../data/rda/pcgr_data.rda')
+sample_calls <- pcgrr2::get_calls(query_vcf, sample_id = sample_name)
+report_data <- pcgrr2::generate_report_data(sample_calls, sample_name = sample_name, minimum_n_signature_analysis = 50, signatures_limit = signatures_limit)
+report_data$sample_name <- sample_name
+report_data$cna_report_tsgene_loss <- FALSE
+report_data$cna_report_oncogene_gain <- FALSE
+report_data$cna_report_biomarkers <- FALSE
+report_data$cna_report_segments <- FALSE
+cna_data <- list(ranked_segments = data.frame(), oncogene_amplified = data.frame(), tsgene_homozygous_deletion = data.frame(),cna_df_for_print = data.frame(), cna_biomarkers = data.frame(), cna_biomarker_segments = data.frame())
+rmarkdown::render(system.file("templates","report.Rmd", package="pcgrr2"), output_file = paste0(sample_name,'.pcgr.html'), output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report, tier2_report = report_data$tier2_report, tier3_report = report_data$tier3_report, tier4_report = report_data$tier4_report, tier5_report = report_data$tier5_report, cna_report_tsgene_loss = report_data$cna_report_tsgene_loss, cna_report_oncogene_gain = report_data$cna_report_oncogene_gain, cna_report_biomarkers = report_data$cna_report_biomarkers, cna_report_segments = report_data$cna_report_segments, signature_report = report_data$signature_report, missing_signature_data = report_data$missing_signature_data))
+detach("package:pcgrr2", unload=TRUE)
+report_data <- list(tier1_report = FALSE, tier2_report = FALSE, tier3_report = FALSE, tier4_report = FALSE, tier5_report = FALSE, signature_report = FALSE, missing_signature_data = FALSE, cna_report_oncogene_gain = FALSE, cna_report_tsgene_loss = FALSE, cna_report_biomarkers = FALSE, cna_report_segments = FALSE)
+tier_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.snvs_indels.tiers.tsv')
+msig_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.mutational_signatures.tsv')
+biomarker_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.snvs_indels.biomarkers.tsv')
+cna_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.cna_segments.tsv')
+maf_fname <- paste0(project_directory, '/',sample_name,'.pcgr.maf')
+sample_calls <- pcgrr2::get_calls(query_vcf, sample_id = sample_name)
+report_data <- pcgrr2::generate_report_data(sample_calls, sample_name = sample_name, minimum_n_signature_analysis = 50, signatures_limit = signatures_limit)
+report_data$sample_name <- sample_name
+report_data$cna_report_tsgene_loss <- FALSE
+report_data$cna_report_oncogene_gain <- FALSE
+report_data$cna_report_biomarkers <- FALSE
+report_data$cna_report_segments <- FALSE
+cna_data <- list(ranked_segments = data.frame(), oncogene_amplified = data.frame(), tsgene_homozygous_deletion = data.frame(),cna_df_for_print = data.frame(), cna_biomarkers = data.frame(), cna_biomarker_segments = data.frame())
+rmarkdown::render(system.file("templates","report.Rmd", package="pcgrr2"), output_file = paste0(sample_name,'.pcgr.html'), output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report, tier2_report = report_data$tier2_report, tier3_report = report_data$tier3_report, tier4_report = report_data$tier4_report, tier5_report = report_data$tier5_report, cna_report_tsgene_loss = report_data$cna_report_tsgene_loss, cna_report_oncogene_gain = report_data$cna_report_oncogene_gain, cna_report_biomarkers = report_data$cna_report_biomarkers, cna_report_segments = report_data$cna_report_segments, signature_report = report_data$signature_report, missing_signature_data = report_data$missing_signature_data))
+cna_segments_tsv <- '/Users/sigven/research/docker/pcgr/examples/tumor_sample.BRCA.cna.tsv'
+project_directory <- '/Users/sigven/research/docker/pcgr/output'
+cna_segments_tsv <- '/Users/sigven/research/docker/pcgr/examples/tumor_sample.BRCA.cna.tsv'
+signatures_limit <- 6
+load('../data/rda/pcgr_data.rda')
+signatures_limit <- 6
+setwd(project_directory)
+query_vcf <- 'tumor_sample.BRCA.pcgr.vcf.gz'
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+logR_threshold_amplification <- 0.8
+logR_threshold_homozygous_deletion <- -0.8
+report_data <- list(tier1_report = FALSE, tier2_report = FALSE, tier3_report = FALSE, tier4_report = FALSE, tier5_report = FALSE, signature_report = FALSE, missing_signature_data = FALSE, cna_report_oncogene_gain = FALSE, cna_report_tsgene_loss = FALSE, cna_report_biomarkers = FALSE, cna_report_segments = FALSE)
+tier_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.snvs_indels.tiers.tsv')
+msig_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.mutational_signatures.tsv')
+biomarker_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.snvs_indels.biomarkers.tsv')
+cna_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.cna_segments.tsv')
+maf_fname <- paste0(project_directory, '/',sample_name,'.pcgr.maf')
+sample_name <- 'tumor_sample.BRCA'
+report_data <- list(tier1_report = FALSE, tier2_report = FALSE, tier3_report = FALSE, tier4_report = FALSE, tier5_report = FALSE, signature_report = FALSE, missing_signature_data = FALSE, cna_report_oncogene_gain = FALSE, cna_report_tsgene_loss = FALSE, cna_report_biomarkers = FALSE, cna_report_segments = FALSE)
+tier_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.snvs_indels.tiers.tsv')
+msig_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.mutational_signatures.tsv')
+biomarker_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.snvs_indels.biomarkers.tsv')
+cna_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.cna_segments.tsv')
+maf_fname <- paste0(project_directory, '/',sample_name,'.pcgr.maf')
+sample_calls <- pcgrr2::get_calls(query_vcf, sample_id = sample_name)
+report_data <- pcgrr2::generate_report_data(sample_calls, sample_name = sample_name, minimum_n_signature_analysis = 50, signatures_limit = signatures_limit)
+report_data$sample_name <- sample_name
+report_data$cna_report_tsgene_loss <- FALSE
+report_data$cna_report_oncogene_gain <- FALSE
+report_data$cna_report_biomarkers <- FALSE
+report_data$cna_report_segments <- FALSE
+cna_data <- list(ranked_segments = data.frame(), oncogene_amplified = data.frame(), tsgene_homozygous_deletion = data.frame(),cna_df_for_print = data.frame(), cna_biomarkers = data.frame(), cna_biomarker_segments = data.frame())
+rmarkdown::render(system.file("templates","report.Rmd", package="pcgrr2"), output_file = paste0(sample_name,'.pcgr.html'), output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report, tier2_report = report_data$tier2_report, tier3_report = report_data$tier3_report, tier4_report = report_data$tier4_report, tier5_report = report_data$tier5_report, cna_report_tsgene_loss = report_data$cna_report_tsgene_loss, cna_report_oncogene_gain = report_data$cna_report_oncogene_gain, cna_report_biomarkers = report_data$cna_report_biomarkers, cna_report_segments = report_data$cna_report_segments, signature_report = report_data$signature_report, missing_signature_data = report_data$missing_signature_data))
+rm(eval_tier1)
+rm(eval_tier2)
+rm(eval_tier3)
+rm(eval_tier4)
+rm(eval_tier5)
+rm(eval_cna_biomarker)
+rm(eval_cna_gain)
+rm(eval_cna_loss)
+rm(eval_cna_segments)
+rm(eval_missing_signature_data)
+rm(eval_signature_report)
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+rmarkdown::render(system.file("templates","report.Rmd", package="pcgrr2"), output_file = paste0(sample_name,'.pcgr.html'), output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report, tier2_report = report_data$tier2_report, tier3_report = report_data$tier3_report, tier4_report = report_data$tier4_report, tier5_report = report_data$tier5_report, cna_report_tsgene_loss = report_data$cna_report_tsgene_loss, cna_report_oncogene_gain = report_data$cna_report_oncogene_gain, cna_report_biomarkers = report_data$cna_report_biomarkers, cna_report_segments = report_data$cna_report_segments, signature_report = report_data$signature_report, missing_signature_data = report_data$missing_signature_data))
+report_data <- list(tier1_report = FALSE, tier2_report = FALSE, tier3_report = FALSE, tier4_report = FALSE, tier5_report = FALSE, signature_report = FALSE, missing_signature_data = FALSE, cna_report_oncogene_gain = FALSE, cna_report_tsgene_loss = FALSE, cna_report_biomarkers = FALSE, cna_report_segments = FALSE)
+tier_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.snvs_indels.tiers.tsv')
+msig_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.mutational_signatures.tsv')
+biomarker_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.snvs_indels.biomarkers.tsv')
+cna_tsv_fname <- paste0(project_directory, '/',sample_name,'.pcgr.cna_segments.tsv')
+maf_fname <- paste0(project_directory, '/',sample_name,'.pcgr.maf')
+sample_calls <- pcgrr2::get_calls(query_vcf, sample_id = sample_name)
+report_data <- pcgrr2::generate_report_data(sample_calls, sample_name = sample_name, minimum_n_signature_analysis = 50, signatures_limit = signatures_limit)
+report_data$sample_name <- sample_name
+rm(eval_tier1)
+rm(eval_tier2)
+rm(eval_tier3)
+rm(eval_tier4)
+rm(eval_cna_biomarker)
+rm(eval_cna_gain)
+rm(eval_cna_loss)
+rm(eval_cna_segments)
+rm(eval_tier5)
+rm(eval_signature_report)
+rm(eval_missing_signature_data)
+report_data$cna_report_tsgene_loss <- FALSE
+report_data$cna_report_oncogene_gain <- FALSE
+report_data$cna_report_biomarkers <- FALSE
+report_data$cna_report_segments <- FALSE
+rmarkdown::render(system.file("templates","report.Rmd", package="pcgrr2"), output_file = paste0(sample_name,'.pcgr.html'), output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report, tier2_report = report_data$tier2_report, tier3_report = report_data$tier3_report, tier4_report = report_data$tier4_report, tier5_report = report_data$tier5_report, cna_report_tsgene_loss = report_data$cna_report_tsgene_loss, cna_report_oncogene_gain = report_data$cna_report_oncogene_gain, cna_report_biomarkers = report_data$cna_report_biomarkers, cna_report_segments = report_data$cna_report_segments, signature_report = report_data$signature_report, missing_signature_data = report_data$missing_signature_data))
+rmarkdown::render(system.file("templates","Test.Rmd", package="pcgrr2"), output_file = 'test.html', output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report))
+library(pcgrr2)
+rmarkdown::render(system.file("templates","Test.Rmd", package="pcgrr2"), output_file = 'test.html', output_dir = project_directory, intermediates_dir = project_directory, params = list(logR_threshold_amplification = logR_threshold_amplification, logR_threshold_homozygous_deletion = logR_threshold_homozygous_deletion, tier1_report = report_data$tier1_report))
+project_directory <- '~/research/docker/pcgr/output'
+rmarkdown::render(system.file("templates","Test.Rmd", package="pcgrr2"), output_file = 'test.html', output_dir = project_directory, intermediates_dir = project_directory, params = list(tier1_report = F))
+library(pcgrr2)
+load('../data/rda/pcgr_data.rda')
+sample_name <- 'tumor_sample.BRCA'
+query_vcf <- 'tumor_sample.BRCA.pcgr.vcf.gz'
+getwd()
+cna_segments_tsv <- '~/research/docker/pcgr/examples/tumor_sample.BRCA.cna.tsv'
+generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = 'tumor_sample.BRCA')
+generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = 'tumor_sample.BRCA')
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+help(globalenv)
+assign("eval_tier1", FALSE, envir = globalenv())
+help(render)
+help("parent.env")
+generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = 'tumor_sample.BRCA')
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = 'tumor_sample.BRCA')
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = 'tumor_sample.BRCA')
+detach("package:pcgrr2", unload=TRUE)
+library("pcgrr2", lib.loc="~/Library/R/3.3/library")
+generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = 'tumor_sample.BRCA')
+eval_tier1 <- FALSE
+eval_tier2 <- FALSE
+eval_tier3 <- FALSE
+eval_tier4 <- FALSE
+eval_tier5 <- FALSE
+eval_signature_report <- FALSE
+eval_missing_signature_data <- FALSE
+eval_cna_segments <- FALSE
+eval_cna_loss <- FALSE
+eval_cna_gain <- FALSE
+eval_cna_biomarker <- FALSE
+generate_pcg_report(project_directory = project_directory, query_vcf = query_vcf, 0.8, -0.8, cna_segments_tsv = cna_segments_tsv, sample_name = 'tumor_sample.BRCA')
+cna_biomarkers <- data.frame()
+nrow(cna_biomarkers)
+nrow(cna_biomarkers)
diff --git a/src/R/pcgrr2/R/utils.R b/src/R/pcgrr2/R/utils.R
index 1291b1e8..3b6e9780 100644
--- a/src/R/pcgrr2/R/utils.R
+++ b/src/R/pcgrr2/R/utils.R
@@ -89,7 +89,7 @@ tier_to_maf <- function(tier_df){
#' Function that retrieves relative estimates of known somatic signatures from a single tumor
#'
-#' @param mut_data data frame with somatic mutations (VCF_SAMPLE_ID, CHROM, POS, REF, ALT)
+#' @param mut_data data frame with somatic mutations (VCF_SAMPLE_ID, chrom, pos, ref, alt)
#' @param sample_name sample name
#' @param signatures_limit max number of contributing signatures
#'
@@ -97,7 +97,7 @@ tier_to_maf <- function(tier_df){
signature_contributions_single_sample <- function(mut_data, sample_name, signatures_limit = 6){
n_muts = nrow(mut_data)
rlogging::message(paste0("Identifying weighted contributions of known mutational signatures using deconstructSigs (n = ",n_muts," SNVs)"))
- sigs.input <- deconstructSigs::mut.to.sigs.input(mut.ref = mut_data, sample.id = "VCF_SAMPLE_ID",chr = "CHROM",pos = "POS", ref = "REF", alt = "ALT")
+ sigs.input <- deconstructSigs::mut.to.sigs.input(mut.ref = mut_data, sample.id = "VCF_SAMPLE_ID",chr = "chrom",pos = "pos", ref = "ref", alt = "alt")
sample_1 <- deconstructSigs::whichSignatures(tumor.ref = sigs.input, sample.id = sample_name, signatures.limit = signatures_limit, signatures.ref = signatures.cosmic,contexts.needed = T,tri.counts.method = 'exome')
nonzero_signatures <- sample_1$weights[which(colSums(sample_1$weights != 0) > 0)]
n <- 1
@@ -486,7 +486,7 @@ generate_report_data <- function(sample_calls, sample_name = NULL, minimum_n_sig
if(any(grepl(paste0("VARIANT_CLASS$"),names(sample_calls)))){
if(nrow(sample_calls[sample_calls$VARIANT_CLASS == 'SNV',]) >= min_variants_for_signature){
signature_call_set <- sample_calls[sample_calls$VARIANT_CLASS == 'SNV',]
- signature_call_set <- dplyr::filter(signature_call_set, CHROM != 'MT')
+ signature_call_set <- dplyr::filter(signature_call_set, chrom != 'MT')
signature_call_set$VCF_SAMPLE_ID <- sample_name
signature_report <- TRUE
@@ -506,7 +506,7 @@ generate_report_data <- function(sample_calls, sample_name = NULL, minimum_n_sig
else{
if(nrow(sample_calls[sample_calls$VARIANT_CLASS == 'SNV',]) > 0){
signature_call_set <- sample_calls[sample_calls$VARIANT_CLASS == 'SNV',]
- signature_call_set <- dplyr::filter(signature_call_set, CHROM != 'MT')
+ signature_call_set <- dplyr::filter(signature_call_set, chrom != 'MT')
}
rlogging::message(paste0("Too few variants (n = ",nrow(signature_call_set),") for reconstruction of mutational signatures by deconstructSigs"))
missing_signature_data <- TRUE
@@ -1135,13 +1135,13 @@ get_calls <- function(vcf_gz_file, sample_id = NULL){
# for (col in c('AF_TUMOR','AF_NORMAL')){
# vcf_data_df[col] <- numeric(nrow(vcf_data_df))
# }
- vcf_data_df <- dplyr::rename(vcf_data_df, CHROM = seqnames, POS = start, REF = ref, ALT = alt, CONSEQUENCE = Consequence, PROTEIN_CHANGE = HGVSp_short)
+ vcf_data_df <- dplyr::rename(vcf_data_df, chrom = seqnames, pos = start, CONSEQUENCE = Consequence, PROTEIN_CHANGE = HGVSp_short)
return(vcf_data_df)
}
vcf_data_df$GENOME_VERSION <- 'GRCh37'
- vcf_data_df <- dplyr::rename(vcf_data_df, CHROM = seqnames, POS = start, REF = ref, ALT = alt, CONSEQUENCE = Consequence, PROTEIN_CHANGE = HGVSp_short)
- vcf_data_df$GENOMIC_CHANGE <- paste(paste(paste(paste0("g.chr",vcf_data_df$CHROM),vcf_data_df$POS,sep=":"),vcf_data_df$REF,sep=":"),vcf_data_df$ALT,sep=">")
+ vcf_data_df <- dplyr::rename(vcf_data_df, chrom = seqnames, pos = start, CONSEQUENCE = Consequence, PROTEIN_CHANGE = HGVSp_short)
+ vcf_data_df$GENOMIC_CHANGE <- paste(paste(paste(paste0("g.chr",vcf_data_df$chrom),vcf_data_df$pos,sep=":"),vcf_data_df$ref,sep=":"),vcf_data_df$alt,sep=">")
vcf_data_df <- pcgrr2::add_pfam_domain_links(vcf_data_df)
vcf_data_df <- pcgrr2::add_swissprot_feature_descriptions(vcf_data_df)
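Reviewer note: the hunk above switches the coordinate columns to lower-case `chrom`/`pos` (keeping `ref`/`alt` under the names they already have) and rebuilds `GENOMIC_CHANGE` from them. The nested `paste()` calls are easier to read as a single format string. The following is a minimal Python restatement of that string layout, not code from the package; the function name and the example values are hypothetical.

```python
def genomic_change(chrom, pos, ref, alt):
    """Build a 'g.chr<chrom>:<pos>:<ref>><alt>' label, mirroring the nested paste() calls."""
    return "g.chr{}:{}:{}>{}".format(chrom, pos, ref, alt)


# Illustrative values only (not taken from any real call set)
print(genomic_change("1", 12345, "A", "T"))
```

Any downstream code that still reads the upper-case `CHROM`/`POS`/`REF`/`ALT` columns would need the same rename.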
diff --git a/src/R/pcgrr2/inst/templates/mutational_signature.Rmd b/src/R/pcgrr2/inst/templates/mutational_signature.Rmd
index 393767de..91328f1e 100644
--- a/src/R/pcgrr2/inst/templates/mutational_signature.Rmd
+++ b/src/R/pcgrr2/inst/templates/mutational_signature.Rmd
@@ -9,11 +9,11 @@ A total of __n = `r nrow(report_data$signature_data$signature_call_set)`__ SNVs
Given an input tumor profile and reference input signatures (i.e. [30 mutational signatures detected by Sanger/COSMIC](http://cancer.sanger.ac.uk/cosmic/signatures)), deconstructSigs iteratively infers the weighted contributions of each reference signature until an empirically chosen error threshold is reached. In the plots below, the _top panel_ is the tumor mutational profile displaying the fraction of mutations found in each trinucleotide context, the _middle panel_ is the reconstructed mutational profile created by multiplying the calculated weights by the signatures, and the _bottom panel_ is the error between the tumor mutational profile and reconstructed mutational profile. The piechart shows the relative contribution of each signature in the sample.
-```{r sigplot, echo=F, fig.width=12,fig.height = 11}
+```{r sigplot, echo=F, fig.width=12,fig.height = 12, dpi=200}
deconstructSigs::plotSignatures(report_data$signature_data$mut_signature_contributions$which_signatures_obj)
```
-```{r sigpie, echo=F, fig.width=12,fig.height = 6}
+```{r sigpie, echo=F, fig.width=12,fig.height = 6, dpi=200}
deconstructSigs::makePie(report_data$signature_data$mut_signature_contributions$which_signatures_obj)
```
diff --git a/src/R/pcgrr2/man/signature_contributions_single_sample.Rd b/src/R/pcgrr2/man/signature_contributions_single_sample.Rd
index 37bf8ddd..9c37875b 100644
--- a/src/R/pcgrr2/man/signature_contributions_single_sample.Rd
+++ b/src/R/pcgrr2/man/signature_contributions_single_sample.Rd
@@ -8,7 +8,7 @@ signature_contributions_single_sample(mut_data, sample_name,
signatures_limit = 6)
}
\arguments{
-\item{mut_data}{data frame with somatic mutations (VCF_SAMPLE_ID, CHROM, POS, REF, ALT)}
+\item{mut_data}{data frame with somatic mutations (VCF_SAMPLE_ID, chrom, pos, ref, alt)}
\item{sample_name}{sample name}
diff --git a/src/R/pcgrr2_0.1.0.tar.gz b/src/R/pcgrr2_0.1.0.tar.gz
index e14a4d29..d48df661 100644
Binary files a/src/R/pcgrr2_0.1.0.tar.gz and b/src/R/pcgrr2_0.1.0.tar.gz differ
diff --git a/src/pcgr/lib/pcgr.py b/src/pcgr/lib/pcgr.py
deleted file mode 100755
index 10a1c09e..00000000
--- a/src/pcgr/lib/pcgr.py
+++ /dev/null
@@ -1,24 +0,0 @@
-#!/usr/bin/env python
-
-import os,re,sys
-import logging
-
-
-def getlogger(logger_name):
- logger = logging.getLogger(logger_name)
- logger.setLevel(logging.DEBUG)
-
- # create console handler and set level to debug
- ch = logging.StreamHandler(sys.stdout)
- ch.setLevel(logging.DEBUG)
-
- # add ch to logger
- logger.addHandler(ch)
-
- # create formatter
- formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s", "20%y-%m-%d %H:%M:%S")
-
- #add formatter to ch
- ch.setFormatter(formatter)
-
- return logger
diff --git a/src/pcgr/lib/pcgrutils.py b/src/pcgr/lib/pcgrutils.py
new file mode 100755
index 00000000..988d48d4
--- /dev/null
+++ b/src/pcgr/lib/pcgrutils.py
@@ -0,0 +1,302 @@
+#!/usr/bin/env python
+
+import os,re,sys
+import csv
+import logging
+import gzip
+from bx.intervals.intersection import IntervalTree
+
+csv.field_size_limit(500 * 1024 * 1024)
+
+def read_infotag_file(vcf_info_tags_tsv):
+ """
+ Function that reads a VCF info tag file that denotes annotation tags produced by PCGR.
+ An example of the VCF info tag file is the following:
+
+ tag number type description
+ Consequence . String "Impact modifier for the consequence type (picked by VEP's --flag_pick_allele option)."
+
+ A dictionary is returned, with the tag as the key, and the full dictionary record as the value
+ """
+ info_tag_xref = {} ##dictionary returned
+ if not os.path.exists(vcf_info_tags_tsv):
+ return info_tag_xref
+ with open(vcf_info_tags_tsv, 'rb') as tsvfile:
+ reader = csv.DictReader(tsvfile, delimiter='\t')
+ for rec in reader:
+ if not info_tag_xref.has_key(rec['tag']):
+ info_tag_xref[rec['tag']] = rec
+
+ return info_tag_xref
+
+def index_cancer_hotspots(cancer_hotspot_fname):
+ """
+ returns a dictionary of dictionaries, with gene symbols and codons as the respective keys
+ Each entry is associated with the full dictionary record of actual hotspot annotations (Q-value, variants, tumor types etc)
+ """
+ hotspot_xref = {} ##dictionary returned
+
+ if not os.path.exists(cancer_hotspot_fname):
+ return hotspot_xref
+ with open(cancer_hotspot_fname, 'rb') as tsvfile:
+ ch_reader = csv.DictReader(tsvfile, delimiter='\t', quotechar='#')
+ for rec in ch_reader:
+ if 'splice' in rec['Codon']:
+ continue
+ gene = str(rec['Hugo Symbol']).upper()
+ codon = str(re.sub(r'[A-Z]','',rec['Codon']))
+ if not hotspot_xref.has_key(gene):
+ hotspot_xref[gene] = {}
+ hotspot_xref[gene][codon] = rec
+ return hotspot_xref
+
+
+def index_gene_transcripts(gene_transcript_annotations_fname, index = 'ensGene_transcript'):
+ """
+ Function that parses the gene transcript annotation file and returns a dictionary with Ensembl transcript
+ ID as key (e.g. ENST000.. ) and an array of transcripts as elements
+
+ each transcript element is a dictionary record with annotation types as keys (e.g. symbol, name etc)
+ """
+ transcript_xref = {} ##dictionary returned
+ with gzip.open(gene_transcript_annotations_fname, 'rb') as tsvfile:
+ greader = csv.DictReader(tsvfile, delimiter='\t', quotechar='|')
+ for rec in greader:
+ if index == 'ensGene_transcript' and not rec['ensembl_transcript_id'].startswith('ENST'):
+ continue
+ #if not (rec['gencode_tag'] == 'NA' or rec['gencode_tag'] == 'basic'):
+ #continue
+ if not transcript_xref.has_key(rec['ensembl_transcript_id']):
+ transcript_xref[rec['ensembl_transcript_id']] = []
+ transcript_xref[rec['ensembl_transcript_id']].append(rec)
+ return transcript_xref
+
+
+def map_cancer_hotspots(cancer_hotspot_xref, vcfrec_vep_info_tags, vcfrec_protein_info_tags):
+ """
+ Function that matches the annotations of a VCF record (VEP_INFO + PROTEIN_INFO) against the cancer hotspot dictionary and
+ records any hit in vcfrec_protein_info_tags['CANCER_MUTATION_HOTSPOT'] as 'symbol|codon|Q-value' (updated in place; nothing is returned)
+ """
+
+ for alt_allele in vcfrec_vep_info_tags['Feature'].keys():
+ hotspot_hits = {}
+ symbol = vcfrec_vep_info_tags['SYMBOL'][alt_allele]
+ consequence = vcfrec_vep_info_tags['Consequence'][alt_allele]
+ if vcfrec_protein_info_tags['PROTEIN_POSITIONS'].has_key(alt_allele):
+ if 'missense_variant' in consequence or 'stop_gained' in consequence:
+ if cancer_hotspot_xref.has_key(symbol):
+ for codon in vcfrec_protein_info_tags['PROTEIN_POSITIONS'][alt_allele].keys():
+ if cancer_hotspot_xref[symbol].has_key(str(codon)):
+ cancer_hotspot_description = str(symbol) + '|' + str(cancer_hotspot_xref[symbol][str(codon)]['Codon']) + '|' + str(cancer_hotspot_xref[symbol][str(codon)]['Q-value'])
+ vcfrec_protein_info_tags['CANCER_MUTATION_HOTSPOT'][alt_allele] = cancer_hotspot_description
+
+
+def index_uniprot(uniprot_fname, index = 'ensGene_transcript'):
+
+ """
+ Function that creates a dictionary of UniProt annotations using ENSEMBL gene transcripts as keys
+ """
+
+ uniprot_xref = {}
+ with gzip.open(uniprot_fname, 'rb') as tsvfile:
+ upreader = csv.DictReader(tsvfile, delimiter='\t', quotechar='|')
+ for rec in upreader:
+ if index == 'ensGene_transcript' and not rec['transcript_id'].startswith('ENST'):
+ continue
+ if not uniprot_xref.has_key(rec['transcript_id']):
+ uniprot_xref[rec['transcript_id']] = []
+ uniprot_xref[rec['transcript_id']].append(rec)
+ return uniprot_xref
+
+def index_pfam(pfam_fname):
+
+ """
+ Function that builds one interval tree per UniProt ID. Individual PFAM domains and their corresponding amino acid positions are
+ added to the tree, enabling rapid lookup and annotation of gene variants/amino acid positions
+ """
+ # dictionary mapping uniprot ids to interval trees
+ pfam = dict()
+
+ # parse the UniProt-PFAM annotations file (tsv) and build the interval trees
+ with gzip.open(pfam_fname, 'r') as annotations_file:
+ reader = csv.DictReader(annotations_file, delimiter='\t', quotechar = '|')
+ for rec in reader:
+ # one interval tree per uniprot ID
+ if rec['uniprot_id'] in pfam:
+ tree = pfam[rec['uniprot_id']]
+ else:
+ # first time we've encountered this uniprot ID, create an interval tree
+ tree = IntervalTree()
+ pfam[rec['uniprot_id']] = tree
+
+ # index the feature
+ tree.add(int(rec['aa_start']), int(rec['aa_stop']), rec['pfam_id'])
+
+ return pfam
+
+
+def index_uniprot_features(uniprot_feature_fname):
+ """
+ Function that adds one interval tree per uniprot ID, individual UniProt features and their corresponding amino acid positions (e.g. active sites etc) are
+ appended to the tree, enabling rapid lookup and annotation for gene variants/amino acid positions
+ """
+
+ uniprot_features = dict()
+
+ observed_feats = {}
+ # parse the UniProt feature annotations file (tsv) and build the interval trees
+ with gzip.open(uniprot_feature_fname, 'rb') as annotations_file:
+ reader = csv.DictReader(annotations_file, delimiter='\t', quotechar = '|')
+ for rec in reader:
+ if re.match(r'(CA_BIND|ZN_FING|DNA_BIND|NP_BIND|REGION|MOTIF|ACT_SITE|METAL|BINDING|SITE|MOD_RES|NON_STD|CARBOHYD|DISULFID|CROSSLNK|MUTAGEN)',rec['feature_type']) != None:
+ # one interval tree per uniprot ID
+ if rec['uniprot_id'] in uniprot_features:
+ tree = uniprot_features[rec['uniprot_id']]
+ else:
+ # first time we've encountered this uniprot ID, create an interval tree
+ tree = IntervalTree()
+ uniprot_features[rec['uniprot_id']] = tree
+
+ feat_key = str(rec['uniprot_id']) + '_' + str(rec['key'])
+ if not observed_feats.has_key(feat_key):
+ # index the feature
+ tree.add(int(rec['aa_start']), int(rec['aa_stop']), rec['key'].replace('#',':'))
+ observed_feats[feat_key] = 1
+
+ return uniprot_features
+
+def get_uniprot_data_by_transcript(up_xref, transcript_id, csq):
+
+ uniprot_mappings = None
+ uniprot_ids = {}
+ symbols = {}
+ seqmatches = {}
+ if up_xref.has_key(transcript_id):
+ uniprot_mappings = up_xref[transcript_id]
+ if not uniprot_mappings is None:
+ if len(uniprot_mappings) > 1:
+ for k in uniprot_mappings:
+ if k['symbol'] == csq['SYMBOL']:
+ uniprot_ids[k['uniprot_id']] = 1
+ seqmatches[k['uniprot_seq_match']] = 1
+ else:
+ uniprot_ids[uniprot_mappings[0]['uniprot_id']] = 1
+ seqmatches[uniprot_mappings[0]['uniprot_seq_match']] = 1
+
+ csq['UNIPROT_ID'] = '&'.join(uniprot_ids.keys())
+ csq['SEQ_MATCH'] = '&'.join(seqmatches.keys())
+
+
+
+def get_domains_features_by_aapos(xref, csq, qtype = 'domain'):
+
+ """
+ Function that finds protein features/protein domains based on the position given as input (i.e. csq['AA_position']) in the given protein (e.g. csq['UNIPROT_ID'])
+ """
+
+ if csq['UNIPROT_ID'] != '' and csq['AA_position'] != '' and csq['SEQ_MATCH'] != '.':
+ if xref.has_key(csq['UNIPROT_ID']):
+ start = None
+ stop = None
+ if '-' in csq['AA_position']:
+ start, stop = csq['AA_position'].split('-')
+ else:
+ start = csq['AA_position']
+ stop = start
+
+ if not start is None and not stop is None and (csq['SEQ_MATCH'] == 'IDENTICAL_SEQUENCE' or csq['SEQ_MATCH'] == 'IDENTICAL_LENGTH'):
+ if qtype == 'domain':
+ csq['DOMAIN'] = '&'.join(xref[csq['UNIPROT_ID']].find(int(start), int(stop)))
+ elif qtype == 'feature':
+ csq['UNIPROT_FEATURE'] = '&'.join(xref[csq['UNIPROT_ID']].find(int(start), int(stop)))
+
+
+def index_pfam_names(pfam_domains_fname, ignore_versions = False):
+ """
+ Function that indexes PFAM protein domain names, found in 'pfam_domains_fname' with the following format:
+
+ pfam_id url name
+ PF02671.20 Paired amphipathic helix repeat Paired amphipathic helix repeat
+ PF09810.8 Exonuclease V - a 5' deoxyribonuclease Exonuclease V - a 5' deoxyribonuclease
+
+ A dictionary is returned, with pfam_id as the key and the domain name as the value
+ """
+ pfam_domain_names = {}
+ with gzip.open(pfam_domains_fname, 'rb') as tsvfile:
+ pfam_reader = csv.DictReader(tsvfile, delimiter='\t', quotechar='|')
+ for rec in pfam_reader:
+ if ignore_versions is True:
+ pfam_domain_names[re.sub(r'\.[0-9]{1,}$','',rec['pfam_id'])] = rec['name']
+ else:
+ pfam_domain_names[rec['pfam_id']] = rec['name']
+ return pfam_domain_names
+
+
+def index_uniprot_feature_names(sp_features_fname):
+ """
+ Function that indexes UniProt/KB function features, found in 'sp_features_fname' with the following format:
+
+ feature_type uniprot_id description aa_start aa_stop key type_description
+ CHAIN SERC_HUMAN Phosphoserine aminotransferase 1 370 CHAIN#1#370 Polypeptide chain in the mature protein
+ REGION SERC_HUMAN Pyridoxal phosphate binding 79 80 REGION#79#80 Region of interest
+
+ 'key' denotes the feature type and amino acid position(s) of the feature
+
+ A dictionary is returned, with uniprot_id + key as the dictionary key, e.g. SERC_HUMAN:REGION:79:80
+ and the full dictionary record as the value
+ """
+ swissprot_features = {}
+ with gzip.open(sp_features_fname, 'rb') as tsvfile:
+ sp_reader = csv.DictReader(tsvfile, delimiter='\t', quotechar='|')
+ for rec in sp_reader:
+ sp_key = rec['uniprot_id'] + ':' + re.sub(r'#',':',rec['key'])
+ swissprot_features[sp_key] = rec
+ return swissprot_features
+
+
+def index_clinvar(clinvar_tsv_fname):
+ clinvar_xref = {}
+ with gzip.open(clinvar_tsv_fname, 'rb') as tsvfile:
+ cv_reader = csv.DictReader(tsvfile, delimiter='\t')
+ for rec in cv_reader:
+
+ unique_traits = {}
+ traits = ''
+ traits = rec['all_traits']
+ for t in traits.split(';'):
+ t_lc = str(t).lower()
+ unique_traits[t_lc] = 1
+ origin = ''
+ origin = rec['origin']
+
+ traits_curated = ';'.join(unique_traits.keys())
+ traits_origin = traits_curated + ' - ' + str(origin)
+
+ clinvar_xref[rec['measureset_id']] = {}
+ clinvar_xref[rec['measureset_id']]['phenotype_origin'] = traits_origin
+ if rec['symbol'] == '-' or rec['symbol'] == 'more than 10':
+ rec['symbol'] = 'NA'
+ clinvar_xref[rec['measureset_id']]['genesymbol'] = rec['symbol']
+
+ return clinvar_xref
+
+
+def getlogger(logger_name):
+ logger = logging.getLogger(logger_name)
+ logger.setLevel(logging.DEBUG)
+
+ # create console handler and set level to debug
+ ch = logging.StreamHandler(sys.stdout)
+ ch.setLevel(logging.DEBUG)
+
+ # add ch to logger
+ logger.addHandler(ch)
+
+ # create formatter
+ formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s", "20%y-%m-%d %H:%M:%S")
+
+ #add formatter to ch
+ ch.setFormatter(formatter)
+
+ return logger
+
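Reviewer note: `get_domains_features_by_aapos` in the new `pcgrutils.py` splits `csq['AA_position']` into a start/stop pair and queries a per-protein interval tree. The sketch below restates that lookup in Python 3 under explicit assumptions: a plain list scan stands in for `bx.intervals.intersection.IntervalTree`, overlap is treated as closed-interval, unparseable positions are skipped rather than raising, and all identifiers (`PROT_X`, `PF_A`, `PF_B`) are hypothetical.

```python
def parse_aa_position(aa_position):
    """Return (start, stop) as ints for 'N' or 'N-M'; None if empty or unparseable."""
    if not aa_position:
        return None
    parts = aa_position.split('-')
    try:
        return int(parts[0]), int(parts[-1])
    except ValueError:
        # assumption of this sketch: skip positions that are not plain integers
        return None


def find_overlapping(intervals, start, stop):
    """Linear-scan stand-in for an interval tree query (closed intervals assumed)."""
    return [label for (lo, hi, label) in intervals if lo <= stop and start <= hi]


def annotate_domains(domain_index, csq):
    """Set csq['DOMAIN'] when the protein is indexed and the sequence match is trusted."""
    if csq.get('SEQ_MATCH') not in ('IDENTICAL_SEQUENCE', 'IDENTICAL_LENGTH'):
        return
    span = parse_aa_position(csq.get('AA_position', ''))
    intervals = domain_index.get(csq.get('UNIPROT_ID', ''))
    if span is None or intervals is None:
        return
    csq['DOMAIN'] = '&'.join(find_overlapping(intervals, span[0], span[1]))


# Hypothetical index: protein ID -> list of (aa_start, aa_stop, domain_id)
domain_index = {'PROT_X': [(10, 50, 'PF_A'), (60, 120, 'PF_B')]}
csq = {'UNIPROT_ID': 'PROT_X', 'AA_position': '45-65', 'SEQ_MATCH': 'IDENTICAL_SEQUENCE'}
annotate_domains(domain_index, csq)
print(csq['DOMAIN'])
```

A real interval tree keeps the query fast for proteins with many features; the list scan is only meant to make the overlap rule visible.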
diff --git a/src/pcgr/lib/transcript.py b/src/pcgr/lib/transcript.py
deleted file mode 100755
index 50e4f8c5..00000000
--- a/src/pcgr/lib/transcript.py
+++ /dev/null
@@ -1,37 +0,0 @@
-#!/usr/bin/env python
-
-import os,re,sys
-import logging
-import csv
-import gzip
-from bx.intervals.intersection import IntervalTree
-
-csv.field_size_limit(500 * 1024 * 1024)
-
-
-def index_gene(gene_annotations_file_path, index = 'ensGene_transcript'):
-
- gene_xref = {}
- with gzip.open(gene_annotations_file_path, 'rb') as tsvfile:
- greader = csv.DictReader(tsvfile, delimiter='\t', quotechar='|')
- for rec in greader:
- if index != 'symbol':
- transcript_id = rec['refseq_mrna']
- if index == 'ensGene_transcript':
- transcript_id = rec['ensembl_transcript_id']
- if index == 'refGene_transcript' and not (transcript_id.startswith('NM_') or transcript_id.startswith('NR_')):
- continue
- if index == 'ensGene_transcript' and not transcript_id.startswith('ENST'):
- continue
- if not gene_xref.has_key(transcript_id):
- gene_xref[transcript_id] = []
- gene_xref[transcript_id].append(rec)
- else:
- if not rec['gene_biotype'] == 'LRG_gene':
- if not gene_xref.has_key(rec['symbol']):
- gene_xref[rec['symbol']] = []
- gene_xref[rec['symbol']].append(rec)
- if not gene_xref.has_key(rec['ensembl_gene_id']):
- gene_xref[rec['ensembl_gene_id']] = []
- gene_xref[rec['ensembl_gene_id']].append(rec)
- return gene_xref
diff --git a/src/pcgr/lib/uniprot.py b/src/pcgr/lib/uniprot.py
deleted file mode 100755
index f4dcd112..00000000
--- a/src/pcgr/lib/uniprot.py
+++ /dev/null
@@ -1,140 +0,0 @@
-#!/usr/bin/env python
-
-import os,re,sys
-import logging
-import csv
-import gzip
-from bx.intervals.intersection import IntervalTree
-
-csv.field_size_limit(500 * 1024 * 1024)
-
-
-def index_uniprot(uniprot_file_path, index = 'refGene_transcript'):
-
- uniprot_xref = {}
- with gzip.open(uniprot_file_path, 'rb') as tsvfile:
- upreader = csv.DictReader(tsvfile, delimiter='\t', quotechar='|')
- for rec in upreader:
- if index == 'refGene_transcript' and not (rec['transcript_id'].startswith('NM_') or rec['transcript_id'].startswith('NR_')):
- continue
- if index == 'ensGene_transcript' and not rec['transcript_id'].startswith('ENST'):
- continue
- if not uniprot_xref.has_key(rec['transcript_id']):
- uniprot_xref[rec['transcript_id']] = []
- uniprot_xref[rec['transcript_id']].append(rec)
- return uniprot_xref
-
-def index_pfam(pfam_file_path):
-
- # dictionary mapping uniprot_ids to interval trees
- pfam = dict()
-
- # parse the UniProt-PFAM annotations file (tsv) and build the interval trees
- with gzip.open(pfam_file_path, 'r') as annotations_file:
- reader = csv.DictReader(annotations_file, delimiter='\t', quotechar = '|')
- for rec in reader:
- # one interval tree per uniprot ID
- if rec['uniprot_id'] in pfam:
- tree = pfam[rec['uniprot_id']]
- else:
- # first time we've encountered this chromosome, create an interval tree
- tree = IntervalTree()
- pfam[rec['uniprot_id']] = tree
-
- # index the feature
- tree.add(int(rec['aa_start']), int(rec['aa_stop']), rec['pfam_id'])
-
- return pfam
-
-
-def index_uniprot_features(uniprot_feature_file_path):
- uniprot_features = dict()
-
-
- observed_feats = {}
- # parse the UniProt-PFAM annotations file (tsv) and build the interval trees
- with gzip.open(uniprot_feature_file_path, 'rb') as annotations_file:
- reader = csv.DictReader(annotations_file, delimiter='\t', quotechar = '|')
- for rec in reader:
- if re.match(r'(CA_BIND|ZN_FING|DNA_BIND|NP_BIND|REGION|MOTIF|ACT_SITE|METAL|BINDING|SITE|MOD_RES|NON_STD|CARBOHYD|DISULFID|CROSSLNK|MUTAGEN)',rec['feature_type']) != None:
- # one interval tree per uniprot ID
- if rec['uniprot_id'] in uniprot_features:
- tree = uniprot_features[rec['uniprot_id']]
- else:
- # first time we've encountered this chromosome, create an interval tree
- tree = IntervalTree()
- uniprot_features[rec['uniprot_id']] = tree
-
- feat_key = str(rec['uniprot_id']) + '_' + str(rec['key'])
- if not observed_feats.has_key(feat_key):
- # index the feature
- tree.add(int(rec['aa_start']), int(rec['aa_stop']), rec['key'].replace('#',':'))
- observed_feats[feat_key] = 1
-
- return uniprot_features
-
-def get_uniprot_data_by_transcript(up_xref, transcript_id, csq):
-
- uniprot_mappings = None
- uniprot_ids = {}
- symbols = {}
- seqmatches = {}
- if up_xref.has_key(transcript_id):
- uniprot_mappings = up_xref[transcript_id]
- if not uniprot_mappings is None:
- if len(uniprot_mappings) > 1:
- for k in uniprot_mappings:
- if k['symbol'] == csq['SYMBOL']:
- uniprot_ids[k['uniprot_id']] = 1
- seqmatches[k['uniprot_seq_match']] = 1
- else:
- uniprot_ids[uniprot_mappings[0]['uniprot_id']] = 1
- seqmatches[uniprot_mappings[0]['uniprot_seq_match']] = 1
-
- csq['UNIPROT_ID'] = '&'.join(uniprot_ids.keys())
- csq['SEQ_MATCH'] = '&'.join(seqmatches.keys())
-
-
-
-def get_domains_features_by_aapos(xref, csq, qtype = 'domain'):
- if csq['UNIPROT_ID'] != '' and csq['AA_position'] != '' and csq['SEQ_MATCH'] != '.':
- if xref.has_key(csq['UNIPROT_ID']):
- start = None
- stop = None
- if '-' in csq['AA_position']:
- start, stop = csq['AA_position'].split('-')
- else:
- start = csq['AA_position']
- stop = start
-
- if not start is None and not stop is None and (csq['SEQ_MATCH'] == 'IDENTICAL_SEQUENCE' or csq['SEQ_MATCH'] == 'IDENTICAL_LENGTH'):
- if qtype == 'domain':
- csq['DOMAIN'] = '&'.join(xref[csq['UNIPROT_ID']].find(int(start), int(stop)))
- elif qtype == 'feature':
- csq['UNIPROT_FEATURE'] = '&'.join(xref[csq['UNIPROT_ID']].find(int(start), int(stop)))
-
-
-def index_pfam_names(pfam_domains_file_path, ignore_versions = False):
-
- pfam_domain_names = {}
- with gzip.open(pfam_domains_file_path, 'rb') as tsvfile:
- pfam_reader = csv.DictReader(tsvfile, delimiter='\t', quotechar='|')
- for rec in pfam_reader:
- if ignore_versions is True:
- pfam_domain_names[re.sub(r'\.[0-9]{1,}$','',rec['pfam_id'])] = rec['name']
- else:
- pfam_domain_names[rec['pfam_id']] = rec['name']
- return pfam_domain_names
-
-
-def index_uniprot_feature_names(sp_features_file_path):
-
- swissprot_features = {}
- with gzip.open(sp_features_file_path, 'rb') as tsvfile:
- sp_reader = csv.DictReader(tsvfile, delimiter='\t', quotechar='|')
- for rec in sp_reader:
- sp_key = rec['uniprot_id'] + ':' + re.sub(r'#',':',rec['key'])
- swissprot_features[sp_key] = rec
- return swissprot_features
-
-
diff --git a/src/pcgr/pcgr_check_input.py b/src/pcgr/pcgr_check_input.py
index e080661b..8502daa3 100755
--- a/src/pcgr/pcgr_check_input.py
+++ b/src/pcgr/pcgr_check_input.py
@@ -7,43 +7,29 @@
import subprocess
import logging
import sys
+import pcgrutils
import pandas as np
+import cyvcf
from cyvcf2 import VCF
def __main__():
parser = argparse.ArgumentParser(description='Verify input data for PCGR')
+ parser.add_argument('pcgr_dir',help='Docker location of PCGR base directory with accompanying data directory, e.g. /data')
parser.add_argument('input_vcf', help='VCF input file with somatic query variants (SNVs/InDels)')
parser.add_argument('input_cna_segments', help='Somatic copy number query segments (tab-separated values)')
args = parser.parse_args()
- ret = verify_input(args.input_vcf, args.input_cna_segments)
+ ret = verify_pcgr_input(args.pcgr_dir, args.input_vcf, args.input_cna_segments)
if ret != 0:
sys.exit(-1)
-def getlogger(logger_name):
- logger = logging.getLogger(logger_name)
- logger.setLevel(logging.DEBUG)
-
- # create console handler and set level to debug
- ch = logging.StreamHandler(sys.stdout)
- ch.setLevel(logging.DEBUG)
-
- # add ch to logger
- logger.addHandler(ch)
-
- # create formatter
- formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s", "20%y-%m-%d %H:%M:%S")
-
- #add formatter to ch
- ch.setFormatter(formatter)
-
- return logger
-
-
def is_valid_cna_segment_file(cna_segment_file, logger):
+ """
+ Function that checks whether the CNA segment file adheres to the correct format
+ """
cna_reader = csv.DictReader(open(cna_segment_file,'r'), delimiter='\t')
- if not ('Chromosome' in cna_reader.fieldnames and 'Segment_Mean' in cna_reader.fieldnames and 'Start' in cna_reader.fieldnames and 'End' in cna_reader.fieldnames):
+ if not ('Chromosome' in cna_reader.fieldnames and 'Segment_Mean' in cna_reader.fieldnames and 'Start' in cna_reader.fieldnames and 'End' in cna_reader.fieldnames): ## check that required columns are present
error_message_cnv = "Copy number segment file (" + str(cna_segment_file) + ") is missing required column(s): 'Chromosome', 'Start', 'End', or 'Segment_Mean'"
error_message_cnv2 = "Column names present in file: " + str(cna_reader.fieldnames)
logger.error('')
@@ -53,29 +39,29 @@ def is_valid_cna_segment_file(cna_segment_file, logger):
return -1
cna_dataframe = np.read_csv(cna_segment_file, sep="\t")
- if not cna_dataframe['Start'].dtype.kind in 'i':
+ if not cna_dataframe['Start'].dtype.kind in 'i': ## check that 'Start' is of type integer
logger.error('')
logger.error('\'Start\' column of copy number segment file contains non-integer values')
logger.error('')
return -1
- if not cna_dataframe['End'].dtype.kind in 'i':
+ if not cna_dataframe['End'].dtype.kind in 'i': ## check that 'End' is of type integer
logger.error('')
logger.error('\'End\' column of copy number segment file contains non-integer values')
logger.error('')
return -1
- if not cna_dataframe['Segment_Mean'].dtype.kind in 'if':
+ if not cna_dataframe['Segment_Mean'].dtype.kind in 'if': ## check that 'Segment_Mean' is of type integer/float
logger.error('')
logger.error('\'Segment_Mean\' column of copy number segment file contains non-numerical values')
logger.error('')
return -1
for rec in cna_reader:
- if int(rec['End']) < int(rec['Start']):
+ if int(rec['End']) < int(rec['Start']): ## check that 'End' is never less than 'Start'
logger.error('')
logger.error('Detected wrongly formatted chromosomal segment - \'Start\' is greater than \'End\' (' + str(rec['Chromosome']) + ':' + str(rec['Start']) + '-' + str(rec['End']) + ')')
logger.error('')
return -1
- if rec['End'] < 1 or rec['Start'] < 1:
+ if int(rec['End']) < 1 or int(rec['Start']) < 1: ## check that 'Start' and 'End' are always positive (csv yields strings, so cast before comparing)
logger.error('')
logger.error('Detected wrongly formatted chromosomal segment - \'Start\' or \'End\' is less than or equal to zero (' + str(rec['Chromosome']) + ':' + str(rec['Start']) + '-' + str(rec['End']) + ')')
logger.error('')
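Reviewer note: the checks in `is_valid_cna_segment_file` (required columns, numeric types, positive coordinates, `End` not before `Start`) can be exercised without a file on disk. The following is a minimal Python 3 sketch using only the standard library. It is not the function from this patch: it collects error strings instead of logging and returning -1, and the name `check_cna_segments` is hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = ('Chromosome', 'Start', 'End', 'Segment_Mean')


def check_cna_segments(handle):
    """Return a list of error strings; an empty list means every segment passed."""
    reader = csv.DictReader(handle, delimiter='\t')
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return ['missing required column(s): ' + ', '.join(missing)]
    errors = []
    for rec in reader:
        segment = '{}:{}-{}'.format(rec['Chromosome'], rec['Start'], rec['End'])
        try:
            # csv yields strings, so cast explicitly before any comparison
            start, end = int(rec['Start']), int(rec['End'])
            float(rec['Segment_Mean'])
        except (TypeError, ValueError):
            errors.append('non-numeric value in segment ' + segment)
            continue
        if start < 1 or end < 1:
            errors.append("'Start' or 'End' is less than 1 in segment " + segment)
        elif end < start:
            errors.append("'Start' is greater than 'End' in segment " + segment)
    return errors


example = io.StringIO(
    "Chromosome\tStart\tEnd\tSegment_Mean\n"
    "1\t100\t200\t0.5\n"
    "2\t300\t250\t-0.9\n"
    "3\t0\t10\t0.1\n"
)
for message in check_cna_segments(example):
    print(message)
```

Casting before the `< 1` comparison matters because `csv.DictReader` returns every field as a string.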
@@ -86,19 +72,21 @@ def is_valid_cna_segment_file(cna_segment_file, logger):
def is_valid_vcf(vcf_validator_output_file):
-
+ """
+ Function that reads the output file of EBIvariation/vcf-validator and reports potential errors and validation status
+ """
valid_vcf = -1
ret = {}
if os.path.exists(vcf_validator_output_file):
f = open(vcf_validator_output_file,'r')
error_messages = []
for line in f:
- if not re.search(r' \(warning\)$|^Reading from ',line.rstrip()):
+ if not re.search(r' \(warning\)$|^Reading from ',line.rstrip()): ## ignore warnings
if line.startswith('Line '):
error_messages.append(line.rstrip())
- if line.endswith('the input file is valid'):
+ if line.rstrip().endswith('the input file is valid'): ## valid VCF (strip the trailing newline before matching)
valid_vcf = 1
- if line.endswith('the input file is not valid'):
+ if line.rstrip().endswith('the input file is not valid'): ## non-valid VCF (strip the trailing newline before matching)
valid_vcf = 0
f.close()
os.system('rm -f ' + str(vcf_validator_output_file))
@@ -107,10 +95,51 @@ def is_valid_vcf(vcf_validator_output_file):
return ret
-def verify_input(input_vcf, input_cna_segments):
+def check_existing_vcf_info_tags(input_vcf, pcgr_directory, logger):
+
+ """
+ Function that compares the INFO tags in the query VCF and the INFO tags generated by PCGR
+ If any tags coincide, an error is returned
+ """
- logger = getlogger('pcgr-check-input')
+ vep_infotags_desc = pcgrutils.read_infotag_file(os.path.join(pcgr_directory,'data','vep_infotags.tsv'))
+ pcgr_infotags_desc = pcgrutils.read_infotag_file(os.path.join(pcgr_directory,'data','pcgr_infotags.tsv'))
+
+ vcfanno_tags = {}
+ for db in ['intogen_driver_mut','dbsnp','oneKG','docm','exac','gnomad','civic','cbmdb','dbnsfp','clinvar','icgc','cosmic']:
+ vcfanno_tag_file = os.path.join(pcgr_directory,'data',str(db),str(db) + '.vcfanno.vcf_info_tags.txt')
+ try:
+ f = open(vcfanno_tag_file, 'r')
+ for line in f:
+ if line.startswith('##INFO'):
+ tag = re.sub(r'##INFO= 0):
if k == 'ONCOGENE' or k == 'TUMOR_SUPPRESSOR':
all_info_vals.append(str(k))
else:
all_info_vals.append(str(k) + '=' + ','.join(values))
- for k in extended_protein_info_tags.keys():
+ for k in pcgr_protein_info_tags.keys():
values = []
if k == 'PROTEIN_POSITIONS':
continue
- for alt_allele in sorted(extended_protein_info_tags[k].keys()):
+ for alt_allele in sorted(pcgr_protein_info_tags[k].keys()):
if k == 'UNIPROT_FEATURE':
- if len(extended_protein_info_tags[k][alt_allele].keys()) != 0:
- values.append('&'.join(extended_protein_info_tags[k][alt_allele].keys()))
+ if len(pcgr_protein_info_tags[k][alt_allele].keys()) != 0:
+ values.append('&'.join(pcgr_protein_info_tags[k][alt_allele].keys()))
else:
- if extended_protein_info_tags[k][alt_allele] != '.':
- values.append(extended_protein_info_tags[k][alt_allele])
+ if pcgr_protein_info_tags[k][alt_allele] != '.':
+ values.append(pcgr_protein_info_tags[k][alt_allele])
if(len(values) > 0):
all_info_vals.append(str(k) + '=' + ','.join(values))
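Reviewer note: the loops above turn per-allele tag dictionaries into INFO entries: flag-style tags (`ONCOGENE`, `TUMOR_SUPPRESSOR`) are emitted as a bare key, other tags as `key=v1,v2`, and `'.'` values are dropped. The sketch below restates that rule in Python 3. It is an illustration under assumptions: the function name is hypothetical, tags are sorted only to make the output deterministic, and the final `;` join follows the VCF convention for INFO fields rather than anything shown in this hunk.

```python
FLAG_TAGS = ('ONCOGENE', 'TUMOR_SUPPRESSOR')


def build_info_string(tag_values):
    """tag_values maps INFO tag -> {alt_allele: value}; '.' marks a missing value."""
    entries = []
    for tag in sorted(tag_values):
        per_allele = tag_values[tag]
        values = [per_allele[alt] for alt in sorted(per_allele) if per_allele[alt] != '.']
        if not values:
            continue  # nothing to report for this tag
        if tag in FLAG_TAGS:
            entries.append(tag)  # flag: presence alone carries the information
        else:
            entries.append(tag + '=' + ','.join(values))
    return ';'.join(entries)


# Hypothetical tag values for a record with alternate alleles 'A' and 'T'
example = {
    'DOMAIN': {'A': 'PF_A', 'T': '.'},
    'ONCOGENE': {'A': '1'},
    'UNIPROT_ID': {'A': '.'},
}
print(build_info_string(example))
```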
diff --git a/src/pcgr/pcgr_vcfanno.py b/src/pcgr/pcgr_vcfanno.py
index 4618e439..8e857854 100755
--- a/src/pcgr/pcgr_vcfanno.py
+++ b/src/pcgr/pcgr_vcfanno.py
@@ -40,6 +40,9 @@ def __main__():
run_vcfanno(args.num_processes, args.query_vcf, query_info_tags, vcfheader_file, args.pcgr_dir, conf_fname, args.out_vcf, args.cosmic, args.icgc, args.exac, args.docm, args.intogen_driver_mut, args.clinvar, args.dbsnp, args.dbnsfp, args.oneKG, args.civic, args.cbmdb, args.gnomad)
def run_vcfanno(num_processes, query_vcf, query_info_tags, vcfheader_file, pcgr_directory, conf_fname, output_vcf, cosmic, icgc, exac, docm, intogen_driver_mut, clinvar, dbsnp, dbnsfp, oneKG, civic, cbmdb, gnomad):
+ """
+ Function that annotates a VCF file with vcfanno against a user-defined set of germline and somatic VCF files
+ """
pcgr_db_directory = pcgr_directory + '/data'
if cosmic is True:
@@ -148,12 +151,17 @@ def run_vcfanno(num_processes, query_vcf, query_info_tags, vcfheader_file, pcgr_
return 0
def append_to_vcf_header(pcgr_db_directory, dbsource, vcfheader_file):
-
+ """
+ Function that appends the VCF header information for a given 'dbsource' (containing INFO tag formats/descriptions, and dbsource version)
+ """
vcf_info_tags_file = str(pcgr_db_directory) + '/' + str(dbsource) + '/' + str(dbsource) + '.vcfanno.vcf_info_tags.txt'
os.system('cat ' + str(vcf_info_tags_file) + ' >> ' + str(vcfheader_file))
def append_to_conf_file(dbsource, pcgr_db_directory, conf_fname):
+ """
+ Function that appends data to a vcfanno conf file ('conf_fname') for the user-defined 'dbsource'. The dbsource defines the set of tags that will be appended during annotation
+ """
fh = open(conf_fname,'a')
if dbsource != 'civic':
fh.write('[[annotation]]\n')