GRAPES stands for "Genomic Rearrangement Analysis of Panel Enrichment Sequencing". It is yet another SV and CNV caller, but with new interesting features (Keep reading!)
- SV and CNV calling for targeted sequencing: using data from on-target, off-target and breakpoint-informative reads.
- Clustering of samples with highly correlated read depth for improved CNV calling specificity.
- Dynamic update of sample references using SQLite.
- New scoring metric to evaluate the confidence of each CNV.
- Improved single-exon CNV detection.
- Improved speed performance.
- BAF extraction on the fly.
- Several bug fixes.
There is a docker repository available at:
docker pull bdolmo/grapes
And run commands with:
docker run -it bdolmo/grapes:latest GRAPES
GRAPES will only work on Unix-based systems and supports only human genome. Before building and installing GRAPES, make sure you have available on path:
- Perl
- R (>= 3.3). In addition Rscript must be accessible
- g++ compiler (>= 4.7)
- Boost C++ library
- openMP C++ API
- BEDtools
- SAMtools
- Tabix
- macs2
- SQLite3
- GNU core utils: wget, awk, sort, cat, grep, head, tail, sed, cut, paste, uniq, wc and mv. macOS users: need to install Homebrew (https://brew.sh/) and the latest XCode.
- Parallel::ForkManager
- Sort::Key::Natural
- Statistics::Descriptive
- JSON::MaybeXS
- Excel::Writer::XLSX
You can install them through CPAN:
cpan Parallel::ForkManager Sort::Key::Natural Statistics::Descriptive JSON::MaybeXS Excel::Writer::XLSX
Then you can download and install the latest release:
git clone --recursive https://github.com/bdolmo/GRAPES.git
cd GRAPES
./INSTALL.PL
INSTALL.PL will compile all C++ code along with the required R packages for segmentation and plotting. In addition it will download both 75-mer and 100-mer mappability tracks for GRCh37 and GRCh38 and gnomaAD annotation files.
GRAPES wes --pooled <bam_dir> --outdir <output_dir> --bed <roi> --fasta <genome_fa> --genome <hg19/hg38> --all
(docker)
docker run -t -i \
-v $HOME/BAM_FOLDER:/bam_folder \
-v $HOME/BED_FOLDER:/bed_folder \
-v $HOME/GENOME_FOLDER:/genome_folder \
-v $HOME/OUTPUT_FOLDER:/output_folder \
-it bdolmo/grapes:latest GRAPES wes \
-all -pooled /bam_folder/ -b /bed_folder/targets.bed -f /genome_folder/genome.fa -g <hg19/hg38> -o /out_dir/ -t 4
GRAPES wes --cases <bam_dir> --control <bam_dir> --outdir <output_dir> --bed <roi> --fasta <genome_fa> --genome <hg19/hg38> --all
(docker)
docker run -t -i \
-v $HOME/CASE_FOLDER:/case_folder \
-v $HOME/CONTROL_FOLDER:/control_folder \
-v $HOME/BED_FOLDER:/bed_folder \
-v $HOME/GENOME_FOLDER:/genome_folder \
-v $HOME/OUTPUT_FOLDER:/output_folder \
-it bdolmo/grapes:latest GRAPES wes \
-all -cases /case_folder/ -controls /control_folder/ -b /bed_folder/targets.bed -f /genome_folder/genome.fa -g <hg19/hg38> -o /out_dir/ -t 4
-o,--outdir STRING Output directory
-f,--fasta STRING Output directory
-g,--genome STRING Genome reference in FASTA format
-r,--genome_version STRING Genome version. Choose: hg19, hg38 (default = hg19)
-b,--bed STRING Regions file in BED format
-t,--threads INT Number of CPUs (default = 1)
--all Perform all steps below
--refdir Input directory where DB references will be stored.
--breakpoint Perform Breakpoint analysis
--nobreakpoint Turn off breakpoint analysis
--extract Extract Depth, GC and Mappability
--offtarget Perform Off-target analysis
--noofftarget Turn off offtarget analysis
--buildref Build a reference from a pool of samples
--callcnv Segment and call CNVs
--nocallcnv Turn off CNV calling
--normalize Normalize read depth. Choose from 'median', 'PCA'. (default='median')
--plotsingleexon Plot single exon CNVs
--noplotcnv Turn off CNV plotting
--plotlargecnv Plot segmented CNVs
--noplotcnv Turn off CNV plotting
--plotscatter Plot genome-wide CNV scatter plot
--vaf Include Variant-Allele Frequency (VAF) analysis
--novaf Turn off VAF analysis
--samtools Default if --vaf set. Perform variant call with samtools
--freebayes if --vaf set, perform variant call with freebayes
--annotate Annotate VCF
--noannotate Turn off VCF annotation
--filtervcf Filter low qual VCF entries
--nofiltervcf Turn off VCF filtering
--reporthtml Write results to an HTML file (Only for gene panels)
--noreporthtml Turn off report HTML creation
--filterdiscordantonly Filter discordant-only SV predictions (default = true)
--nofilterdiscordantonly Turn off discordant-pair only filtering
--verbose Print sub-command messages
--mincorr FLOAT Minimum pairwise-correlation to build a reference set (default = 0.91)
--minrefsize INT Default minimum number of samples to build a single baseline (default = 2)
--maxrefsize INT Default maximum number of samples to build a single baseline (default = 15)
--minzscore FLOAT Minimum Z-score required to output a CNV prediction (default = 2.58)
--pcavariance FLOAT Variance to remove when normalizing by PCA (default = 0.7)
--lowerdelcutoff FLOAT Lower-bound deletion cutoff ratio (default = 0.35)
--upperdelcutoff FLOAT Upper-bound deletion cutoff ratio (default = 0.71)
--lowerdupcutoff FLOAT Lower-bound duplication cutoff ratio (default = 1.24)
--minofftargetreads INT Default minimum number of offtarget reads required to trigger off-target analysis (default = 1e6)
--minofftargetsd FLOAT Default minimum number of std.dev from off-target rartios trigger off-target analysis (default = 0.2)
--minsvsize INT Minimum SV size to report a breakpoint call (default = 15)
--maxsvsize INT Maximum SV size to report a breakpoint call (default = 5000000)
--mindiscordants INT Minimum number of discordant read pairs (default = 5)
--mindiscordantssd INT Minimum std.deviations from the mean insert size to consider discordant pairs (default = 10)
--breakreads INT Minimum number of break reads (default = 5)
GRAPES annotate --input_vcf <vcf> --output_dir <output_dir> --r_overlap <reciprocal_overlap>
-i,--input_vcf STRING Input VCF file
-o,--output_dir STRING Output directory (default=.)
-l,--r_overlap FLOAT Reciprocal overlap fraction (default=0.5)