Below we describe the pipeline used for the analysis and give a brief description of each script. For more details, see the scripts themselves, which contain usage information. Note that paths to various external tools are hardcoded: these tools need to be installed and the paths updated before running the analysis (see `../TOOLS.md` for details). In most cases, the scripts whose names start with `run_exp_` were the ones actually run to perform the experiments for all the different compression parameters; the remaining scripts are called from within these scripts.

- `generate_lossy_fast5.py`: Generate new fast5 files with the raw signal replaced by a lossily compressed version (called in `run_exp_generate_fast5.sh`). This is also used to perform VBZ lossless compression and compute the compressed sizes.

- `basecall_guppy.sh`: Perform basecalling with guppy (hac/fast) (called in `run_exp_basecall_analysis_guppy.sh`).
- `basecall_bonito.sh`: Perform basecalling with bonito (called in `run_exp_basecall_analysis_bonito.sh`).
- `fix_fastq_bonito.py`: Fix the fastq files generated by seqtk when using bonito. Bonito produces a fasta file, which we convert to fastq with seqtk; however, when basecalling produces an empty sequence, seqtk writes an invalid fastq file (called in `basecall_bonito.sh`).
- `analysis_basecall_accuracy.sh`: Perform basecalling accuracy analysis (called in `run_exp_basecall_analysis*.sh`).
- `read_length_identity.py`: Perform basecalling analysis (called in `analysis_basecall_accuracy.sh`).
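
The empty-sequence problem described for `fix_fastq_bonito.py` can be illustrated with a small sketch. This is not the script's actual logic: it is a hypothetical stand-alone fasta-to-fastq converter that always writes exactly four lines per record, so a read with an empty sequence still yields a structurally valid fastq record. The function names and the placeholder quality character `!` are assumptions made for illustration.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; the sequence may be an empty string."""
    name = None
    chunks = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name = line[1:]
            chunks = []
        elif name is not None:
            chunks.append(line.strip())
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_fastq(src: TextIO, dst: TextIO, qual_char: str = "!") -> int:
    """Write every fasta record as a four-line fastq record.

    qual_char is a placeholder quality (hypothetical choice), repeated once
    per base. An empty sequence gives empty sequence and quality lines, but
    the record keeps its four-line shape. Returns the number of records.
    """
    count = 0
    for name, seq in read_fasta(src):
        dst.write(f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n")
        count += 1
    return count


# Small in-memory demonstration; r2 has an empty sequence.
demo_in = io.StringIO(">r1\nACGT\n>r2\n>r3\nGG\nTT\n")
demo_out = io.StringIO()
fasta_to_fastq(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Whether empty reads should be kept as empty records or dropped depends on what the downstream tools accept; the sketch keeps them so that read counts stay unchanged.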

- `assembly_guppy.sh`: Perform assembly with Flye+Rebaler+Medaka for guppy basecalled reads (called in `run_exp_assembly_guppy.sh`).
- `assembly_bonito.sh`: Perform assembly with Flye+Rebaler+Medaka for bonito basecalled reads (called in `run_exp_assembly_bonito.sh`).
- `analysis_assembly.sh`: Perform assembly/consensus accuracy analysis (called in `run_exp_assembly_analysis.sh`).
- `chop_up_assembly.py`, `medians.py` and `read_length_identity.py`: Called in `analysis_assembly.sh` to actually do the evaluation. `analysis_assembly.sh` also relies on `fastmer.py`, which is installed as described in `../TOOLS.md`.
- `subsample_fastqs.sh`: Subsample a fastq file.
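
Judging only from the script names, the assembly evaluation plausibly chops the assembly into pieces, aligns each piece to the reference, and summarises per-piece identity with a median. The sketch below illustrates that idea and is not the code in `chop_up_assembly.py`, `medians.py` or `read_length_identity.py`. It assumes the alignments are available as PAF lines (as produced by minimap2, where column 10 is the number of matching bases and column 11 the alignment block length); the piece size and all function names are hypothetical.

```python
from statistics import median
from typing import Iterable, List, Tuple


def chop_sequence(name: str, seq: str, piece_size: int) -> List[Tuple[str, str]]:
    """Split one sequence into consecutive pieces of at most piece_size bases."""
    pieces = []
    for start in range(0, len(seq), piece_size):
        end = min(start + piece_size, len(seq))
        pieces.append((f"{name}_{start}-{end}", seq[start:end]))
    return pieces


def paf_identity(paf_line: str) -> Tuple[str, float]:
    """Return (query name, percent identity) for one PAF alignment line."""
    fields = paf_line.rstrip("\n").split("\t")
    matches = int(fields[9])     # PAF column 10: number of matching bases
    block_len = int(fields[10])  # PAF column 11: alignment block length
    return fields[0], 100.0 * matches / block_len


def median_identity(paf_lines: Iterable[str]) -> float:
    """Median percent identity over all non-empty PAF lines."""
    identities = [paf_identity(line)[1] for line in paf_lines if line.strip()]
    return median(identities)
```

For example, `chop_sequence("contig1", "ACGTACGTAC", 4)` gives three pieces, the last one shorter than the others. A median is less sensitive than a mean to a few badly aligned pieces, which is one reason to prefer it for this kind of summary.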

- `megalodon_modcall.sh`: Perform methylation calling (used by `run_exp_megalodon_modcall.sh`).
- `evaluate_methylation_calls.py`: Evaluate methylation calls.