WORKFLOW: Hybrid Assembly

Rauf Salamzade edited this page Dec 22, 2020 · 1 revision

The Hybrid Assembly workflow generates hybrid Nanopore+Illumina assemblies in parallel for multiple samples. By default it produces two assemblies per sample: one where the Illumina sequencing data is subsampled and one using the full Illumina sequencing dataset.
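
The size of the subsampled Illumina set is controlled by the `read_subsampling` parameter, which is meant to correspond to 100X coverage. The arithmetic behind such a value can be sketched as follows. This is only an illustration: the 5 Mb genome size is borrowed from the default `genomeSize=5m` in `canu_options`, and the 200 bases per read (or per read pair) is an assumption, not something this page states. The function name is hypothetical and not part of seQuoia.

```python
import math


def reads_for_coverage(genome_size_bp: int, coverage: float, bases_per_read: int) -> int:
    """Return the number of reads needed to reach the target coverage.

    coverage = (number of reads * bases per read) / genome size,
    so number of reads = coverage * genome size / bases per read.
    """
    return math.ceil(coverage * genome_size_bp / bases_per_read)


if __name__ == "__main__":
    # Assumed values: 5 Mb genome, 200 bases contributed per read (or read pair).
    print(reads_for_coverage(genome_size_bp=5_000_000, coverage=100, bases_per_read=200))
```

Under these assumed values the formula gives 2500000, which matches the default. For a genome of a different size or a different read length, the same calculation suggests how `read_subsampling` might be adjusted.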

| Parameter Identifier | Value Type / Default | Parameter Description |
|---|---|---|
| read_subsampling | Integer. 2500000 | The number of Illumina reads to subsample for the assembly built from subsampled Illumina data. Should correspond to 100X coverage. |
| nanomerge_memory | Integer. 24 | The memory (in GB) to use for merging Nanopore FASTQs. |
| nanomerge_timelimit | String. 02:00:00 | The time limit for merging Nanopore FASTQs. |
| nanotrim_memory | Integer. 24 | The memory (in GB) to use for Porechop to trim adapters from Nanopore FASTQ. |
| nanotrim_timelimit | String. 10:00:00 | The time limit for trimming Nanopore FASTQ. |
| fastqfilter_options | String. -l 3000 -L 20000 -b 300000000 | Options for subsampling of Nanopore FASTQ using the fastqfilter script in /path/to/seQuoia/scripts/ |
| nanosubsample_memory | Integer. 32 | The memory (in GB) to use for fastqfilter subsampling of Nanopore FASTQ. |
| nanosubsample_timelimit | String. 10:00:00 | The time limit for subsampling of Nanopore FASTQ. |
| unicycler_options | String. --mode normal --verbosity 2 | Options for running Unicycler hybrid assembly. |
| unicycler_threads | Integer. 4 | The number of cores/threads to use for running Unicycler. |
| unicycler_memory | Integer. 16 | The memory (in GB per core/thread) to use for running Unicycler. |
| unicycler_timelimit | String. 48:00:00 | The time limit for running Unicycler. |
| pilonpolishing_max_iterations | Integer. 10 | The maximum number of iterations to perform during iterative assembly refinement with Pilon. |
| pilonpolishing_threads | Integer. 4 | The number of cores/threads to provide Pilon for iterative refinement. |
| pilonpolishing_memory | Integer. 8 | The memory (in GB per core/thread) to use for running Pilon for iterative refinement. |
| pilonpolishing_timelimit | String. 24:00:00 | The time limit for running Pilon-based iterative assembly refinement. |
| run_guinan | Boolean. False | Whether to run the GAEMR/Guinan suite as a secondary sanity check for adapters making it into the assembly. Currently only works on Broad servers. |
| contig_size_filter | Integer. 0 | The length threshold below which contigs are filtered out. |
| run_gaemr_ont | Boolean. False | Whether to run GAEMR with Nanopore read coverage accounted for. Currently only works on Broad servers. |
| gaemr_formatter_options | String. -g 1 -c 100 -r | Options to use for GAEMR assembly formatting/preparation for QC analysis. |
| gaemr_qc_options | String. --force --analyze_rna | Options to use for running GAEMR assembly QC analysis. |
| gaemr_threads | Integer. 2 | The number of cores/threads to provide GAEMR. |
| gaemr_memory | Integer. 32 | The memory (in GB per core/thread) to use for running GAEMR. |
| gaemr_timelimit | String. 24:00:00 | The time limit for running GAEMR assembly QC. |
| run_canu | Boolean. False | Whether to create a third assembly using only Nanopore data with canu. |
| canu_options | String. stopOnReadQuality=false genomeSize=5m | Options to run canu with. |
| canu_threads | Integer. 4 | The number of cores/threads to provide canu for assembly. |
| canu_memory | Integer. 24 | The memory (in GB per core/thread) to use for running canu. |
| canu_timelimit | String. 07:00:00:00 | The time limit for running canu assembly. |
| run_nanopolish | Boolean. False | Whether to perform nanopolish refinement of the canu assembly. |
| nanopolish_options | String. | Options for nanopolish variants. |
| nanopolish_threads | Integer. 16 | The number of cores/threads to provide nanopolish. |
| nanopolish_memory | Integer. 10 | The memory (in GB per core/thread) to use for nanopolish. |
| nanopolish_timelimit | String. 72:00:00 | The time limit for running nanopolish assembly refinement. |
| run_pilon_on_canu | Boolean. True | Whether to run Pilon refinement on the canu assembly. |
| run_reorganize | Boolean. True | Whether to restructure/organize the directory structure for each sample after all steps have run. |
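
Several memory parameters above are given per core/thread, so the total memory a step requests is the product of its thread count and its per-thread memory. The sketch below works this out for the default values. It assumes the totals are simply threads multiplied by per-thread memory, as the parameter descriptions suggest; how seQuoia actually submits jobs to a scheduler is not shown on this page. The names `STEP_DEFAULTS`, `total_memory` and `summarize` are hypothetical.

```python
# Defaults copied from the parameter table: (threads, memory per core/thread).
STEP_DEFAULTS = {
    "unicycler": (4, 16),
    "pilonpolishing": (4, 8),
    "gaemr": (2, 32),
    "canu": (4, 24),
    "nanopolish": (16, 10),
}


def total_memory(threads: int, memory_per_thread: int) -> int:
    """Total memory for a step when memory is specified per core/thread."""
    return threads * memory_per_thread


def summarize(defaults: dict) -> dict:
    """Map each step name to its total memory request."""
    return {step: total_memory(t, m) for step, (t, m) in defaults.items()}


if __name__ == "__main__":
    for step, total in summarize(STEP_DEFAULTS).items():
        print(f"{step}: {total} total")
```

With the defaults, nanopolish is the largest request (16 threads at 10 each), so raising `nanopolish_threads` without lowering `nanopolish_memory` may exceed what a single node can offer. The nanomerge, nanotrim and nanosubsample memory parameters are not described as per-thread and are left out of the sketch.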