7. seQuoia and Customized Object Oriented Programming
Rauf Salamzade edited this page Dec 22, 2020
·
1 revision
FastqAnalyzer Testing Notebook¶
Loading the FastqAnalyzer Classes¶
In [1]:
# Using Python 3.6
import logging
import os
import sys

# Load the seQc branch into your Python path.
sys.path.append("/gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/")

# Module with useful functions.
from seQuoia.other import usefulFunctions as uF
# Import the Fastq (single-end) and FastqPaired (paired-end) classes.
from seQuoia.classes.FastqAnalyzer import Fastq, FastqPaired
Specify the Input Files¶
In [2]:
# Dummy FASTQ example files (each one should contain 10,000 reads).
paired_end_R1_fastq = "/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/inputs/dummy_R1.fastq.gz"
paired_end_R2_fastq = "/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/inputs/dummy_R2.fastq.gz"
sample_name = "DummySample"

# Set up the workspace and create a logging object using the uF module.
workspace = "/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial"
# os.path.join is used because workspace has no trailing slash.
log_file = os.path.join(workspace, "FastQC.log")
logger = uF.createLoggerObject(log_file)

# The following is only done to show the logging captured in the Jupyter notebook as pink cells.
handler = logging.StreamHandler(sys.stderr)
formatter = logging.Formatter('%(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
logger.handlers = [handler]
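The internals of `uF.createLoggerObject` are not shown on this page. As a rough, standard-library-only sketch of what such a helper might do, the following builds a logger that writes to stderr and, optionally, to a file. The function name `make_task_logger` and its arguments are hypothetical and are not part of seQuoia; only the logger name `task_logger` and the format string are taken from the notebook.

```python
import logging
import sys


def make_task_logger(log_file=None, name="task_logger"):
    """Hypothetical helper: build a logger that writes to stderr and, optionally, a file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Start clean so re-running a notebook cell does not duplicate every message.
    logger.handlers = []
    formatter = logging.Formatter("%(name)s - %(levelname)s - %(message)s")

    stream_handler = logging.StreamHandler(sys.stderr)
    stream_handler.setFormatter(formatter)
    logger.addHandler(stream_handler)

    if log_file is not None:
        file_handler = logging.FileHandler(log_file)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)
    return logger


demo_logger = make_task_logger()
demo_logger.info("Logger is ready")
```

Resetting `logger.handlers` is a deliberate choice for notebooks, where the same cell is often executed several times.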
Let's Test All Single-End FASTQ Commands¶
In [3]:
# The following creates a Fastq object meant for dealing with a single FASTQ file.
FastqObject = Fastq(paired_end_R1_fastq, "DummySample", logger)
In [4]:
# Create a symlink of the file to take care of any downstream naming issues. Always recommended!
# Note: the symlink is a soft link, so removing or renaming it leaves the original data untouched.
FastqObject.create_symlink(workspace, change_reference=True)
Out[4]:
'/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.fastq.gz'
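To make the naming idea concrete: the input file is called `dummy_R1.fastq.gz`, while the link returned above is named after the sample. A minimal sketch of that step, assuming nothing about how `create_symlink` is actually written (the helper `link_fastq` is hypothetical), could look like this:

```python
import os
import tempfile


def link_fastq(source_fastq, workspace, sample_name):
    """Hypothetical helper: symlink a FASTQ into the workspace under a normalized name."""
    suffix = ".fastq.gz" if source_fastq.endswith(".gz") else ".fastq"
    link_path = os.path.join(workspace, sample_name + suffix)
    if os.path.lexists(link_path):
        os.remove(link_path)  # replace a stale link left over from an earlier run
    os.symlink(os.path.abspath(source_fastq), link_path)
    return link_path


# Demonstration in a throwaway directory with an empty placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "dummy_R1.fastq.gz")
    open(original, "wb").close()
    demo_workspace = os.path.join(tmp, "work")
    os.mkdir(demo_workspace)
    demo_link = link_fastq(original, demo_workspace, "DummySample")
    demo_link_name = os.path.basename(demo_link)
    demo_is_link = os.path.islink(demo_link)

print(demo_link_name, demo_is_link)
```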
In [5]:
# Create a new instance (copy) of the FASTQ file in the specified working directory.
FastqObject.create_new_instance(workspace, change_reference=False, compress=True)
task_logger - INFO - Creating new instance of FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.fastq.gz in working directory /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial
task_logger - INFO - Successfully created new instance of FASTQ file: /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.copy.fastq.gz.
Out[5]:
'/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.copy.fastq.gz'
In [6]:
# Run validation on FastqObject to make sure that the FASTQ file is indeed a FASTQ file. Uses fqtools.
FastqObject.validate()
task_logger - INFO - ----------------------------------------------------------------------
task_logger - INFO - Validating FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.fastq.gz using fqtools installation 2.0 (main_env)
task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/fqtools.sh validate /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.fastq.gz
task_logger - INFO - Was able to successfully validate file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.fastq.gz as a FASTQ
Out[6]:
True
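The notebook delegates validation to fqtools. Purely for illustration, here is a small pure-Python check of the basic four-line FASTQ record structure. It is a simplified stand-in, not what fqtools does internally, and the function names are made up for this sketch. It also reads the whole file into memory, which is fine for small test files only.

```python
import gzip
import io


def is_valid_fastq(handle):
    """Return True if a text handle holds only well-formed four-line FASTQ records."""
    lines = [line.rstrip("\r\n") for line in handle]
    if not lines or len(lines) % 4 != 0:
        return False
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            return False
        if len(seq) != len(qual):  # one quality character per base
            return False
    return True


def validate_fastq_file(path):
    """Open a plain or gzipped FASTQ file and check its record structure."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return is_valid_fastq(handle)


good = io.StringIO("@read1\nACGT\n+\nIIII\n")
bad = io.StringIO("@read1\nACGT\n+\nIII\n")  # quality string is one character short
good_result = is_valid_fastq(good)
bad_result = is_valid_fastq(bad)
print(good_result, bad_result)
```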
In [7]:
# Run QC analysis using the widely used FastQC program!
FastqObject.run_qc(workspace)
task_logger - INFO - ----------------------------------------------------------------------
task_logger - INFO - Running FastQC on FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.fastq.gz using installation 0.11.8 (main_env)
task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/fastqc.sh --quiet -o /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/ -t 1 /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.fastq.gz
task_logger - INFO - FastQC ran successfully! The results can be found at: /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample_fastqc.zip
Out[7]:
'/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample_fastqc.zip'
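`run_qc` returns the path of the FastQC zip archive. One way to peek at the per-module PASS/WARN/FAIL calls is to read the `summary.txt` file that FastQC places inside that archive. The helper below is a sketch using only the standard library; `read_fastqc_summary` is a hypothetical name, and the tiny archive it is tested on is synthetic, built in memory so that the example runs without a real FastQC result.

```python
import io
import zipfile


def read_fastqc_summary(zip_source):
    """Return {module name: PASS/WARN/FAIL} from the summary.txt inside a FastQC zip."""
    statuses = {}
    with zipfile.ZipFile(zip_source) as archive:
        summary_name = next(n for n in archive.namelist() if n.endswith("summary.txt"))
        with archive.open(summary_name) as handle:
            for raw_line in handle:
                fields = raw_line.decode("utf-8").rstrip("\r\n").split("\t")
                if len(fields) >= 2:
                    statuses[fields[1]] = fields[0]
    return statuses


# Synthetic archive with made-up statuses, for illustration only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "DummySample_fastqc/summary.txt",
        "PASS\tBasic Statistics\tDummySample.fastq.gz\n"
        "WARN\tPer base sequence content\tDummySample.fastq.gz\n",
    )
buffer.seek(0)
demo_statuses = read_fastqc_summary(buffer)
print(demo_statuses)
```

With a real result, the path returned by `run_qc` could be passed in place of the in-memory buffer.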
In [8]:
# Adapters are unknown for this sample, so no extra options are passed and Trim Galore runs with its defaults.
FastqObject.trim_galore_adapter_trim(workspace, options="", change_reference=True)
task_logger - INFO - ----------------------------------------------------------------------
task_logger - INFO - Running trim_galore on FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.fastq.gz using installation 0.5.0 (main_env)
task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/trim_galore.sh --gzip -o /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/ /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.fastq.gz
task_logger - INFO - trim galore finished successfully!
task_logger - INFO - Fastq Object changed reference from /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.fastq.gz to /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.adapter-trim.fastq.gz
Out[8]:
'/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.adapter-trim.fastq.gz'
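The `change_reference` flag is worth a closer look: with `change_reference=False` (as in the copy step) the object keeps pointing at its current file, whereas with `change_reference=True` (as in the adapter trimming step) later calls operate on the new result. The toy class below mimics only that bookkeeping. `TrackedFile` and `derive` are hypothetical names, no tool is actually run, and the real seQuoia classes may be organized differently.

```python
import posixpath


class TrackedFile:
    """Hypothetical stand-in for an object that tracks its current FASTQ path."""

    def __init__(self, path, sample_name):
        self.path = path
        self.sample_name = sample_name

    def derive(self, suffix, change_reference=False):
        """Pretend a step wrote <sample_name>.<suffix>.fastq.gz next to the input."""
        directory = posixpath.dirname(self.path)
        result = posixpath.join(directory, "%s.%s.fastq.gz" % (self.sample_name, suffix))
        if change_reference:
            self.path = result  # later steps now operate on the new file
        return result


tracked = TrackedFile("/work/DummySample.fastq.gz", "DummySample")
copy_path = tracked.derive("copy", change_reference=False)
path_after_copy = tracked.path
trimmed_path = tracked.derive("adapter-trim", change_reference=True)
path_after_trim = tracked.path
print(path_after_copy)
print(path_after_trim)
```

Returning the result path in both cases lets the caller use the output even when the object itself is left pointing at the old file.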
In [9]:
# Trimmomatic has really nice quality trimming functionality. Here we apply some common cutoffs.
FastqObject.quality_trim(workspace, options="LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:25", change_reference=True)
print("Current sample name: %s" % FastqObject.sample_name)
task_logger - INFO - ----------------------------------------------------------------------
task_logger - INFO - Running Trimmomatic on FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.adapter-trim.fastq.gz using installation 0.38 (main_env)
task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/trimmomatic.sh SE -threads 1 /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.adapter-trim.fastq.gz /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.quality-trim.fastq.gz LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:25
task_logger - INFO - Trimmomatic finished successfully!
task_logger - INFO - Fastq Object changed reference from /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.adapter-trim.fastq.gz to /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.quality-trim.fastq.gz
Current sample name: DummySample
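For readers unfamiliar with the option string `LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:25`, here is a simplified sketch of what each step means when applied to a list of Phred quality scores. It is an approximation for intuition only and does not reproduce Trimmomatic's exact implementation; the function name and its keyword arguments are made up for this example.

```python
def quality_trim(quals, leading=3, trailing=3, window=4, window_mean=15, minlen=25):
    """Return (start, end) of the kept slice, or None if the read is dropped.

    Simplified illustration of LEADING / TRAILING / SLIDINGWINDOW / MINLEN.
    """
    start, end = 0, len(quals)
    # LEADING: drop bases from the start while their quality is below the threshold.
    while start < end and quals[start] < leading:
        start += 1
    # TRAILING: drop bases from the end while their quality is below the threshold.
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW: cut the read at the first window whose mean quality is too low.
    for i in range(start, max(start, end - window + 1)):
        if sum(quals[i:i + window]) / window < window_mean:
            end = i
            break
    # MINLEN: discard reads that end up shorter than the minimum length.
    if end - start < minlen:
        return None
    return start, end


# Made-up quality profile: two poor bases, thirty good ones, then a low-quality tail.
example_quals = [2, 2] + [30] * 30 + [5] * 6 + [2, 2]
kept_slice = quality_trim(example_quals)
short_read = quality_trim([30] * 10)
print(kept_slice, short_read)
```

In this made-up example the read is cut where the four-base window mean first drops below 15, and the ten-base read is discarded by the minimum-length filter.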
In [10]:
# Let us subsample the reads as some of the operations ahead are computationally intensive!
FastqObject.subsample(workspace, reads=1000, compress=True, change_reference=True)

# We could also do this by the number of bases, instead of the number of reads.
FastqObject.downsample(workspace, bases=300000000, compress=True, change_reference=False)
task_logger - INFO - ---------------------------------------------------------------------- task_logger - INFO - Subsampling 1000 reads from the sample /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.quality-trim.fastq.gz using seqtk version 1.3 (main_env). task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/seqtk_sample.sh -s100 /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.quality-trim.fastq.gz 1000 task_logger - INFO - seqtk ran successfully! Resulting FASTQ can be found at: /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.subsampled.fastq task_logger - INFO - Successfully compressed resulting FASTQ file. task_logger - INFO - ---------------------------------------------------------------------- task_logger - INFO - Subsampling 300000000 bases from the sample /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.subsampled.fastq.gz using fastqfilter version current-version-in-scripts (main_env). task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/fastqfilter.sh -b 300000000 -o /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.subsampled.fastq /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.subsampled.fastq.gz task_logger - INFO - fastqfilter ran successfully! Resulting FASTQ can be found at: /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.subsampled.fastq task_logger - INFO - Successfully compressed resulting FASTQ file.
Out[10]:
'/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.subsampled.fastq.gz'
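The second call caps the data by total bases rather than by read count. How fastqfilter chooses reads is not shown on this page, so the sketch below is only a guess at the general idea: it keeps reads in order until the base budget would be exceeded. The function name and the keep-in-order strategy are assumptions, and the reads are made up.

```python
def downsample_by_bases(sequences, max_bases):
    """Keep sequences in order until adding another would exceed the base budget."""
    kept = []
    total = 0
    for seq in sequences:
        if total + len(seq) > max_bases:
            break
        kept.append(seq)
        total += len(seq)
    return kept


reads = ["ACGT" * 25] * 1000  # 1,000 made-up reads of 100 bases each
kept_all = downsample_by_bases(reads, 300000000)
kept_few = downsample_by_bases(reads, 250)
print(len(kept_all), len(kept_few))
```

Note that after subsampling to 1,000 short reads, a budget of 300,000,000 bases is far larger than the data, so a base cap like this one would presumably leave the reads unchanged.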
In [11]:
# SortMeRNA, written by the Knight lab, does not accept gzipped FASTQ files, so a decompressed local copy is created first.
FastqObject.filter_ribo_rna(workspace, "/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/", change_reference=True)
print("Current sample name: %s" % FastqObject.sample_name)
task_logger - INFO - ---------------------------------------------------------------------- task_logger - INFO - Running SortMeRNA on FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.subsampled.fastq.gz using installation 2.1 (main_env) task_logger - WARNING - FASTQ files are not compressed, time to create local instances. task_logger - INFO - Creating new instance of FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.subsampled.fastq.gz in working directory /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/ task_logger - INFO - Successfully created new instance of FASTQ file: /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.copy.fastq.gz. task_logger - INFO - Sucessfully uncompressed new FASTQ instance /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.copy.fastq! task_logger - INFO - SortMeRNA ribosomal databases/references could be found at /gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/ task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/sortmerna.sh --ref 
/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/rRNA_databases/silva-bac-16s-id90.fasta,/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/index/silva-bac-16s-db:/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/rRNA_databases/silva-bac-23s-id98.fasta,/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/index/silva-bac-23s-db:/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/rRNA_databases/silva-arc-16s-id95.fasta,/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/index/silva-arc-16s-db:/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/rRNA_databases/silva-arc-23s-id98.fasta,/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/index/silva-arc-23s-db:/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/rRNA_databases/silva-euk-18s-id95.fasta,/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/index/silva-euk-18s-db:/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/rRNA_databases/silva-euk-28s-id98.fasta,/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/index/silva-euk-28s:/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/rRNA_databases/rfam-5s-database-id98.fasta,/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/index/rfam-5s-db:/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/rRNA_databases/rfam-5.8s-database-id98.fasta,/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/./index/rfam-5.8s-db --reads /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.copy.fastq --num_alignments 1 -a 1 --fastx --aligned /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.rna-aligned --other /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.rna-removed --log -v task_logger - INFO - Successfully ran SortMeRNA! task_logger - INFO - Successfully compressed resulting FASTQ file.
Current sample name: DummySample
In [12]:
# KneadData is a trimming/QC pipeline by the Huttenhower lab for metagenomic data.
FastqObject.run_kneaddata(workspace, change_reference=False)
task_logger - INFO - ----------------------------------------------------------------------
task_logger - INFO - Running KneadData on FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.rna-removed.fastq.gz using installation 0.7.2 (hut_env)
task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/kneaddata.sh -t 1 --input /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.rna-removed.fastq.gz --output /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/ --output-prefix DummySample_kneaddata
task_logger - INFO - kneaddata finished successfully!
Out[12]:
'/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.kneaddata.fastq.gz'
In [13]:
# Error correction is useful prior to calling variants or constructing assemblies, to overcome sequencing errors.
# Here we use BayesHammer, which is part of the SPAdes package.
FastqObject.error_correction(workspace, change_reference=False, compress=True)
task_logger - INFO - ----------------------------------------------------------------------
task_logger - INFO - Running FreeBayes on FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.rna-removed.fastq.gz using installation 3.13.0 (main_env)
task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/spades.sh -t 1 -m 40 --only-error-correction -s /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.rna-removed.fastq.gz -o /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/
task_logger - INFO - BayesHammer for Error Corrections finished successfully!
Out[13]:
'/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.bayeshammer.fastq.gz'
In [14]:
# Centrifuge is the new Kraken and allows for fairly speedy and sensitive profiling of reads into taxonomic bins.
FastqObject.bin_taxonomically(workspace, "/gsap/garage-bacterial/Users/Rauf/local_databases/Centrifuge/indices_04-02-2018/abvh")
task_logger - INFO - ---------------------------------------------------------------------- task_logger - INFO - Running Centrifuge on FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.rna-removed.fastq.gz using installation 1.0.4_beta (main_env) task_logger - WARNING - FASTQ files are not compressed, time to create local instances. task_logger - INFO - Creating new instance of FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.rna-removed.fastq.gz in working directory /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/ task_logger - INFO - Successfully created new instance of FASTQ file: /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.copy.fastq.gz. task_logger - INFO - Sucessfully uncompressed new FASTQ instance /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.copy.fastq! task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/centrifuge.sh -p 1 -q -x /gsap/garage-bacterial/Users/Rauf/local_databases/Centrifuge/indices_04-02-2018/abvh -U /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.copy.fastq -S /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample_centrifuge_results.txt --report-file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample_centrifuge_report.tsv task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/centrifuge_kreport.sh -x /gsap/garage-bacterial/Users/Rauf/local_databases/Centrifuge/indices_04-02-2018/abvh /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample_centrifuge_results.txt task_logger - INFO - Successfully ran centrifuge and generated kraken-like report file!
Out[14]:
['/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample_centrifuge_results.txt', '/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample_centrifuge_report.tsv', '/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample_centrifuge_kraken_report.txt']
In [15]:
# MetaPhlAn for more accurate taxonomic profiling.
FastqObject.run_metaphlan(workspace)
task_logger - INFO - ----------------------------------------------------------------------
task_logger - INFO - Running Centrifuge on FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.copy.fastq using installation 2.7.7 (hut_env)
task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/metaphlan2.sh /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.copy.fastq --input_type fastq --nproc 1
task_logger - INFO - Successfully ran Metaphlan2!
Out[15]:
'/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/DummySample.profiled_metagenome.txt'
Let's Test All Paired-End FASTQ Commands¶
In [16]:
# The following creates a FastqPaired object meant for dealing with a pair of paired-end FASTQ files.
FastqObject = FastqPaired(paired_end_R1_fastq, paired_end_R2_fastq, "PairedDummySample", logger)
In [17]:
# Create symlinks of the files to take care of any downstream naming issues. Always recommended!
# Note: the symlinks are soft links, so removing or renaming them leaves the original data untouched.
FastqObject.create_symlink(workspace, change_reference=True)
task_logger - INFO - Creating symlinks of FASTQ files /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/inputs/dummy_R1.fastq.gz and /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/inputs/dummy_R2.fastq.gz in working directory /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/
task_logger - INFO - Successfully created new instances of FASTQ files
Out[17]:
['/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.fastq.gz', '/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.fastq.gz']
In [18]:
# Create new instances (copies) of the FASTQ files in the specified working directory.
FastqObject.create_new_instance(workspace, change_reference=False, compress=True)
print(FastqObject.sample_name)
print(FastqObject.fastqFrw.sample_name)
print(FastqObject.fastqRev.sample_name)
task_logger - INFO - Creating new instance of FASTQ files /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.fastq.gz and /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.fastq.gz in working directory /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/ task_logger - INFO - Creating new instance of FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.fastq.gz in working directory /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/ task_logger - INFO - Successfully created new instance of FASTQ file: /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.copy.fastq.gz. task_logger - INFO - Creating new instance of FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.fastq.gz in working directory /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/ task_logger - INFO - Successfully created new instance of FASTQ file: /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.copy.fastq.gz. task_logger - INFO - Successfully created new instances of FASTQ files
PairedDummySample
PairedDummySample_R1
PairedDummySample_R2
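The three names printed above hint at how the paired-end object is put together: it appears to hold one single-end object per mate (`fastqFrw` and `fastqRev`), each with its own `_R1` or `_R2` sample name. A minimal sketch of that kind of composition follows. The class names `SingleEnd` and `PairedEnd` and the `describe` method are hypothetical; only the attribute names `fastqFrw`, `fastqRev`, and `sample_name` come from the notebook, and the real classes may differ.

```python
class SingleEnd:
    """Hypothetical single-end wrapper: one FASTQ path plus a sample name."""

    def __init__(self, fastq, sample_name):
        self.fastq = fastq
        self.sample_name = sample_name

    def describe(self):
        return "%s -> %s" % (self.sample_name, self.fastq)


class PairedEnd:
    """Hypothetical paired-end wrapper built from two SingleEnd objects."""

    def __init__(self, forward_fastq, reverse_fastq, sample_name):
        self.sample_name = sample_name
        self.fastqFrw = SingleEnd(forward_fastq, sample_name + "_R1")
        self.fastqRev = SingleEnd(reverse_fastq, sample_name + "_R2")

    def describe(self):
        # Delegate to each mate so the single-end logic is written only once.
        return [self.fastqFrw.describe(), self.fastqRev.describe()]


pair = PairedEnd("dummy_R1.fastq.gz", "dummy_R2.fastq.gz", "PairedDummySample")
print(pair.sample_name)
print(pair.fastqFrw.sample_name)
print(pair.fastqRev.sample_name)
```

Delegation of this kind would also explain why several paired-end log blocks in this notebook contain the single-end messages once per mate.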
In [19]:
# Run validation on FastqObject to make sure that both FASTQ files are indeed FASTQ files. Uses fqtools.
FastqObject.validate()
task_logger - INFO - ********************************************************************** task_logger - INFO - Validating forward FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.fastq.gz and reverse FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.fastq.gz. task_logger - INFO - ---------------------------------------------------------------------- task_logger - INFO - Validating FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.fastq.gz using fqtools installation 2.0 (main_env) task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/fqtools.sh validate /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.fastq.gz task_logger - INFO - Was able to successfully validate file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.fastq.gz as a FASTQ task_logger - INFO - ---------------------------------------------------------------------- task_logger - INFO - Validating FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.fastq.gz using fqtools installation 2.0 (main_env) task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/fqtools.sh validate /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.fastq.gz task_logger - INFO - Was able to successfully validate file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.fastq.gz as a FASTQ task_logger - INFO - Successfully validated both forward and reverse files as being FASTQs!
Out[19]:
True
In [20]:
# Adapters are unknown for this sample, so no extra options are passed and Trim Galore runs with its defaults.
FastqObject.trim_galore_adapter_trim(workspace, options="", change_reference=True)
print(FastqObject.fastqFrw.fastq)
print(FastqObject.fastqRev.fastq)
print(FastqObject.fastqRev.sample_name)
print(FastqObject.fastqFrw.sample_name)
print(FastqObject.sample_name)
task_logger - INFO - ---------------------------------------------------------------------- task_logger - INFO - Running trim_galore on FASTQ files /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.fastq.gz and /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.fastq.gz using installation 0.5.0 (main_env) task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/trim_galore.sh --paired --gzip -o /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/ /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.fastq.gz /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.fastq.gz task_logger - INFO - trim galore finished successfully! task_logger - INFO - Fastq Objects changed reference
/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.adapter-trim.fastq.gz
/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.adapter-trim.fastq.gz
PairedDummySample_R2
PairedDummySample_R1
PairedDummySample
In [21]:
# Run QC analysis using the famous FastQC program!
FastqObject.run_qc(workspace)
task_logger - INFO - ********************************************************************** task_logger - INFO - Running FastQC on FASTQ files /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.adapter-trim.fastq.gz and /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.adapter-trim.fastq.gz task_logger - INFO - ---------------------------------------------------------------------- task_logger - INFO - Running FastQC on FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.adapter-trim.fastq.gz using installation 0.11.8 (main_env) task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/fastqc.sh --quiet -o /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/ -t 1 /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.adapter-trim.fastq.gz task_logger - INFO - FastQC ran successfully! The results can be found at: /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1_fastqc.zip task_logger - INFO - ---------------------------------------------------------------------- task_logger - INFO - Running FastQC on FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.adapter-trim.fastq.gz using installation 0.11.8 (main_env) task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/fastqc.sh --quiet -o /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/ -t 1 /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.adapter-trim.fastq.gz task_logger - INFO - FastQC ran successfully! 
The results can be found at: /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2_fastqc.zip task_logger - INFO - Succesfully ran FastQC on both forward and reverse read files. Results for the forward readset can be found at %s. Results for the reverse readset can be found %s.
Out[21]:
['/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1_fastqc.zip', '/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2_fastqc.zip']
In [22]:
# Adapters are unknown for this sample, so no extra options are passed and Trim Galore runs with its defaults.
FastqObject.trim_galore_adapter_trim(workspace, options="", change_reference=True)
task_logger - INFO - ---------------------------------------------------------------------- task_logger - INFO - Running trim_galore on FASTQ files /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.adapter-trim.fastq.gz and /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.adapter-trim.fastq.gz using installation 0.5.0 (main_env) task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/trim_galore.sh --paired --gzip -o /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/ /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.adapter-trim.fastq.gz /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.adapter-trim.fastq.gz task_logger - INFO - trim galore finished successfully! task_logger - INFO - Fastq Objects changed reference
Out[22]:
['/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.adapter-trim.fastq.gz', '/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.adapter-trim.fastq.gz']
In [23]:
# Trimmomatic has really nice quality trimming functionality. Here we apply some common cutoffs.
FastqObject.quality_trim(workspace, options="LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:25", change_reference=True)
print("Current sample name: %s" % FastqObject.sample_name)
task_logger - INFO - **********************************************************************
task_logger - INFO - Running Trimmomatic on FASTQ files /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.adapter-trim.fastq.gz and /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.adapter-trim.fastq.gz using installation 0.38 (main_env)
task_logger - INFO - Trimmomatic successfully ran on paired-end FASTQ set, now attempting to merge unpaired/paired file sets ...
task_logger - INFO - Successfully ran Trimmomatic and merged paired/un-paired result files.
Current sample name: PairedDummySample
In [24]:
# Let us subsample the reads as some of the operations ahead are computationally intensive!
FastqObject.subsample(workspace, reads=1000, compress=True, change_reference=True)

# We could also do this by the number of bases, instead of the number of reads.
FastqObject.downsample(workspace, bases=300000000, compress=True, change_reference=False)
task_logger - INFO - ********************************************************************** task_logger - INFO - Creating subsampled FASTQs for /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.quality-trim.fastq.gz and /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.quality-trim.fastq.gz. task_logger - INFO - ---------------------------------------------------------------------- task_logger - INFO - Subsampling 1000 reads from the sample /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.quality-trim.fastq.gz using seqtk version 1.3 (main_env). task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/seqtk_sample.sh -s100 /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.quality-trim.fastq.gz 1000 task_logger - INFO - seqtk ran successfully! Resulting FASTQ can be found at: /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.subsampled.fastq task_logger - INFO - Successfully compressed resulting FASTQ file. task_logger - INFO - ---------------------------------------------------------------------- task_logger - INFO - Subsampling 1000 reads from the sample /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.quality-trim.fastq.gz using seqtk version 1.3 (main_env). task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/seqtk_sample.sh -s100 /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.quality-trim.fastq.gz 1000 task_logger - INFO - seqtk ran successfully! 
Resulting FASTQ can be found at: /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.subsampled.fastq task_logger - INFO - Successfully compressed resulting FASTQ file. task_logger - INFO - Successfully subsampled FASTQ files task_logger - INFO - ********************************************************************** task_logger - INFO - Creating subsampled FASTQs for /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.subsampled.fastq.gz and /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.subsampled.fastq.gz. task_logger - INFO - ---------------------------------------------------------------------- task_logger - INFO - Subsampling 300000000 bases from the sample /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.subsampled.fastq.gz using fastqfilter version current-version-in-scripts (main_env). task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/fastqfilter.sh -b 300000000 -o /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.subsampled.fastq /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.subsampled.fastq.gz task_logger - INFO - fastqfilter ran successfully! Resulting FASTQ can be found at: /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.subsampled.fastq task_logger - INFO - Successfully compressed resulting FASTQ file. task_logger - INFO - ---------------------------------------------------------------------- task_logger - INFO - Subsampling 300000000 bases from the sample /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.subsampled.fastq.gz using fastqfilter version current-version-in-scripts (main_env). 
task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/fastqfilter.sh -b 300000000 -o /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.subsampled.fastq /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.subsampled.fastq.gz task_logger - INFO - fastqfilter ran successfully! Resulting FASTQ can be found at: /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.subsampled.fastq task_logger - INFO - Successfully compressed resulting FASTQ file. task_logger - INFO - Successfully subsampled FASTQ files
Out[24]:
['/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.subsampled.fastq.gz', '/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.subsampled.fastq.gz']
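The log above shows both mates being subsampled with the same seed (`-s100`), which is what keeps R1 and R2 in sync. The sketch below illustrates that idea only; it is not seqtk's implementation, and `pick_read_indices` and the toy read names are hypothetical.

```python
import random

def pick_read_indices(total_reads, keep, seed):
    """Return a sorted list of `keep` read positions chosen with a fixed seed."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(total_reads), keep))

# Toy mates: read i in R1 is paired with read i in R2.
r1 = ["read%d/1" % i for i in range(10)]
r2 = ["read%d/2" % i for i in range(10)]

# Using the same seed for both files selects the same positions in each.
idx_r1 = pick_read_indices(len(r1), 4, seed=100)
idx_r2 = pick_read_indices(len(r2), 4, seed=100)
sub_r1 = [r1[i] for i in idx_r1]
sub_r2 = [r2[i] for i in idx_r2]

# Every kept R1 read still has its mate at the same position in R2.
pairs_in_sync = all(a.split("/")[0] == b.split("/")[0] for a, b in zip(sub_r1, sub_r2))
print(pairs_in_sync)
```

With different seeds the two index lists would generally differ and the mates would no longer line up.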
In [25]:
# SortMeRNA, written by the Knight lab, does not accept gzipped FASTQ files, so uncompressed local copies are created first.
FastqObject.filter_ribo_rna(workspace, "/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/", change_reference=True)
task_logger - INFO - ********************************************************************** task_logger - INFO - Running SortMeRNA on FASTQ files /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.subsampled.fastq.gz and /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.subsampled.fastq.gz using installation 2.1 (main_env) task_logger - INFO - SortMeRNA ribosomal databases/references could be found at /gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/ task_logger - WARNING - FASTQ files are not compressed, uh time to create local instances. task_logger - INFO - Creating new instance of FASTQ files /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.subsampled.fastq.gz and /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.subsampled.fastq.gz in working directory /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/ task_logger - INFO - Creating new instance of FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.subsampled.fastq.gz in working directory /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/ task_logger - INFO - Successfully created new instance of FASTQ file: /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.copy.fastq.gz. task_logger - INFO - Sucessfully uncompressed new FASTQ instance /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.copy.fastq! 
task_logger - INFO - Creating new instance of FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.subsampled.fastq.gz in working directory /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/ task_logger - INFO - Successfully created new instance of FASTQ file: /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.copy.fastq.gz. task_logger - INFO - Sucessfully uncompressed new FASTQ instance /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.copy.fastq! task_logger - INFO - Successfully created new instances of FASTQ files task_logger - INFO - ---------------------------------------------------------------------- task_logger - INFO - Running SortMeRNA on FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/merged_input.fastq using installation 2.1 (main_env) task_logger - INFO - Awesome! FASTQ files are already compressed! task_logger - INFO - SortMeRNA ribosomal databases/references could be found at /gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/ task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/sortmerna.sh --ref 
/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/rRNA_databases/silva-bac-16s-id90.fasta,/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/index/silva-bac-16s-db:/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/rRNA_databases/silva-bac-23s-id98.fasta,/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/index/silva-bac-23s-db:/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/rRNA_databases/silva-arc-16s-id95.fasta,/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/index/silva-arc-16s-db:/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/rRNA_databases/silva-arc-23s-id98.fasta,/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/index/silva-arc-23s-db:/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/rRNA_databases/silva-euk-18s-id95.fasta,/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/index/silva-euk-18s-db:/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/rRNA_databases/silva-euk-28s-id98.fasta,/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/index/silva-euk-28s:/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/rRNA_databases/rfam-5s-database-id98.fasta,/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/index/rfam-5s-db:/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/rRNA_databases/rfam-5.8s-database-id98.fasta,/gsap/garage-bacterial/Users/Rauf/local_databases/SortMeRNA/./index/rfam-5.8s-db --reads /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/merged_input.fastq --num_alignments 1 -a 1 --fastx --aligned /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_Merged.rna-aligned --other /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_Merged.rna-removed --log -v --paired_in task_logger - INFO - Successfully ran SortMeRNA! 
task_logger - INFO - Running fastq-pair on FASTQ files /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.copy.fastq and /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.copy.fastq using installation 1.0 (main_env) task_logger - INFO - Successfully ran SortMeRNA and paired end read set.
Out[25]:
['/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.rna-removed.fastq.gz', '/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.rna-removed.fastq.gz']
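According to the log, the gzipped inputs are copied into the working directory and uncompressed before SortMeRNA runs. A minimal sketch of that kind of step, using only the standard library, might look like the following. The helper name `make_uncompressed_copy` and the toy file are assumptions for illustration, not seQuoia's actual code.

```python
import gzip
import os
import shutil
import tempfile

def make_uncompressed_copy(gz_path, out_dir):
    """Write an uncompressed copy of a gzipped FASTQ into out_dir and return its path."""
    base = os.path.basename(gz_path)
    if base.endswith(".gz"):
        base = base[:-3]
    out_path = os.path.join(out_dir, base)
    # Stream the data so large files are never held fully in memory.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path

# Toy demonstration with a single made-up read.
workdir = tempfile.mkdtemp()
toy_gz = os.path.join(workdir, "toy_R1.fastq.gz")
with gzip.open(toy_gz, "wt") as handle:
    handle.write("@read1\nACGT\n+\nIIII\n")

plain_path = make_uncompressed_copy(toy_gz, workdir)
with open(plain_path) as handle:
    first_line = handle.readline().strip()
print(first_line)
```

The original compressed file is left untouched, so the source data cannot be altered by the downstream tool.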
In [26]:
# KneadData is a trimming QC pipeline/program by the Huttenhower lab for metagenomic data.
FastqObject.run_kneaddata(workspace, change_reference=False)
task_logger - INFO - ---------------------------------------------------------------------- task_logger - INFO - Running KneadData on FASTQ files /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.rna-removed.fastq.gz and /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.rna-removed.fastq.gz using installation 0.7.2 (hut_env) task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/kneaddata.sh -t 1 --input /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.rna-removed.fastq.gz --input /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.rna-removed.fastq.gz --output /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/ --output-prefix PairedDummySample_kneaddata task_logger - INFO - kneaddata finished successfully!
Out[26]:
['/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.kneaddata.fastq.gz', '/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.kneaddata.fastq.gz']
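Because KneadData was run with `change_reference=False`, the object keeps pointing at the rna-removed files, which is why the later steps in this notebook log those files as their inputs. The toy class below sketches that pattern under stated assumptions: `TrackedFastq` and `run_step` are hypothetical names and do not reflect how `FastqPaired` is implemented.

```python
class TrackedFastq:
    """Hypothetical stand-in for an object that tracks its current FASTQ file."""

    def __init__(self, path):
        self.path = path

    def run_step(self, label, change_reference=False):
        # Pretend a tool wrote a new file named after the processing step.
        sample = self.path.split(".")[0]
        result = "%s.%s.fastq.gz" % (sample, label)
        # Only repoint the object at the new file when explicitly asked to.
        if change_reference:
            self.path = result
        return result

fq = TrackedFastq("Sample_R1.fastq.gz")
fq.run_step("rna-removed", change_reference=True)
kneaddata_out = fq.run_step("kneaddata", change_reference=False)

print(kneaddata_out)  # path of the new result
print(fq.path)        # still the rna-removed file
```

Returning the result path in both cases lets the caller use the new file without committing every later step to it.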
In [27]:
# Error correction is useful prior to calling variants or constructing assemblies to overcome sequencing errors.
# Here we use BayesHammer, which is part of the SPAdes package.
FastqObject.error_correction(workspace, change_reference=False, compress=True)
task_logger - INFO - ---------------------------------------------------------------------- task_logger - INFO - Running FreeBayes on FASTQ files /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.rna-removed.fastq.gz and /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.rna-removed.fastq.gz using installation 3.13.0 (main_env) task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/spades.sh --only-error-correction -t 1 -m 40 -1 /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.rna-removed.fastq.gz -2 /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.rna-removed.fastq.gz -o /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/ task_logger - INFO - BayesHammer for Error Corrections finished successfully!
Out[27]:
True
In [28]:
# Centrifuge is the new Kraken and allows for fairly speedy and sensitive profiling of reads into taxonomic bins.
FastqObject.bin_taxonomically(workspace, "/gsap/garage-bacterial/Users/Rauf/local_databases/Centrifuge/indices_04-02-2018/abvh")
task_logger - INFO - ********************************************************************** task_logger - INFO - Running Centrifuge on FASTQ files /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.rna-removed.fastq.gz and /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.rna-removed.fastq.gz using installation 1.0.4_beta (main_env) task_logger - WARNING - FASTQ files are not compressed, uh time to create local instances. task_logger - INFO - Creating new instance of FASTQ files /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.rna-removed.fastq.gz and /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.rna-removed.fastq.gz in working directory /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/ task_logger - INFO - Creating new instance of FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.rna-removed.fastq.gz in working directory /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/ task_logger - INFO - Successfully created new instance of FASTQ file: /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.copy.fastq.gz. task_logger - INFO - Sucessfully uncompressed new FASTQ instance /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.copy.fastq! task_logger - INFO - Creating new instance of FASTQ file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.rna-removed.fastq.gz in working directory /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/ task_logger - INFO - Successfully created new instance of FASTQ file: /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.copy.fastq.gz. 
task_logger - INFO - Sucessfully uncompressed new FASTQ instance /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.copy.fastq! task_logger - INFO - Successfully created new instances of FASTQ files task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/centrifuge.sh -p 1 -q -x /gsap/garage-bacterial/Users/Rauf/local_databases/Centrifuge/indices_04-02-2018/abvh -1 /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.copy.fastq -2 /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.copy.fastq -S /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_centrifuge_results.txt --report-file /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_centrifuge_report.tsv task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/centrifuge_kreport.sh -x /gsap/garage-bacterial/Users/Rauf/local_databases/Centrifuge/indices_04-02-2018/abvh /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_centrifuge_results.txt task_logger - INFO - Successfully ran centrifuge and generated kraken-like report file!
Out[28]:
['/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_centrifuge_results.txt', '/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_centrifuge_report.tsv', '/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_centrifuge_kraken_report.txt']
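The `_centrifuge_report.tsv` file can be inspected with a few lines of Python. The sketch below assumes the standard Centrifuge report header (`name`, `taxID`, `taxRank`, `genomeSize`, `numReads`, `numUniqueReads`, `abundance`). The helper `top_taxa` is hypothetical, and the rows are invented toy values, not results from this run.

```python
import csv
import io

def top_taxa(report_handle, n=3):
    """Return the n taxa with the most uniquely assigned reads as (name, count) tuples."""
    reader = csv.DictReader(report_handle, delimiter="\t")
    rows = sorted(reader, key=lambda row: int(row["numUniqueReads"]), reverse=True)
    return [(row["name"], int(row["numUniqueReads"])) for row in rows[:n]]

# Toy report with made-up numbers, standing in for the real TSV file.
toy_report = io.StringIO(
    "name\ttaxID\ttaxRank\tgenomeSize\tnumReads\tnumUniqueReads\tabundance\n"
    "Escherichia coli\t562\tspecies\t5000000\t100\t40\t0.1\n"
    "Staphylococcus aureus\t1280\tspecies\t2800000\t900\t850\t0.9\n"
)

result = top_taxa(toy_report, n=1)
print(result)
```

For a real run, the `io.StringIO` object would be replaced by `open(...)` on the report path returned in `Out[28]`.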
In [29]:
# MetaPhlAn for more accurate taxonomic profiling.
FastqObject.run_metaphlan(workspace)
task_logger - INFO - ---------------------------------------------------------------------- task_logger - INFO - Running Metaphlan2 on FASTQ file using installation 2.7.7 (hut_env) task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/metaphlan2.sh /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.rna-removed.fastq.gz,/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.rna-removed.fastq.gz --bowtie2out PairedDummySample.bowtie2.bz2 --input_type fastq --nproc 1 task_logger - INFO - Successfully ran Metaphlan2!
Out[29]:
'/gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample.profiled_metagenome.txt'
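The profile written by MetaPhlAn2 is a tab-separated table of clade names and relative abundances, with `#`-prefixed header lines. As a rough sketch, species-level entries can be pulled out as shown below. The helper `species_abundances` is hypothetical and the toy lines are invented, not taken from this sample.

```python
def species_abundances(profile_lines):
    """Map species names to relative abundance from MetaPhlAn2-style profile lines."""
    species = {}
    for line in profile_lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        clade, abundance = line.rstrip("\n").split("\t")[:2]
        # The last taxonomic level decides whether this row is a species.
        last_level = clade.split("|")[-1]
        if last_level.startswith("s__"):
            species[last_level[3:]] = float(abundance)
    return species

# Toy profile with made-up values.
toy_profile = [
    "#SampleID\tMetaphlan2_Analysis\n",
    "k__Bacteria\t100.0\n",
    "k__Bacteria|p__Firmicutes\t100.0\n",
    "k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales|f__Staphylococcaceae|g__Staphylococcus|s__Staphylococcus_aureus\t100.0\n",
]

profile = species_abundances(toy_profile)
print(profile)
```

Rows at higher ranks (kingdom, phylum, and so on) are ignored, so the abundances are not double counted.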
In [30]:
# Run AMR prediction using ShortBRED and ARIBA.
FastqObject.shortbred_amrp(workspace, "/gsap/garage-bacterial/Users/Rauf/local_databases/AMR/ShortBRED/ShortBRED_CARD_2017_markers.faa")
FastqObject.ariba(workspace, "ARIBA_AMRP", "/gsap/archive-bacterial/Rauf/LSARP/Mykrobe_vs_ARIBA/Create_Ariba_Database/The_Easy_Way/ARIBA_Staph_AMR_db/")
task_logger - INFO - ---------------------------------------------------------------------- task_logger - INFO - Running shortBRED on FASTQ files /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.rna-removed.fastq.gz and /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.rna-removed.fastq.gz using installation 0.9.5 (hut_env) task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/shortbred_quantify.sh --threads 1 --markers /gsap/garage-bacterial/Users/Rauf/local_databases/AMR/ShortBRED/ShortBRED_CARD_2017_markers.faa --wgs /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/wgs.fna --result /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_shortBRED_results.txt --tmp /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/shortbred_tmpdir/ task_logger - INFO - shortBRED for AMRP finished successfully! 
task_logger - INFO - ---------------------------------------------------------------------- task_logger - INFO - Running ARIBA on FASTQ files /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.rna-removed.fastq.gz and /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.rna-removed.fastq.gz using installation 2.13.3 (ariba_env) task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/ariba.sh run /gsap/archive-bacterial/Rauf/LSARP/Mykrobe_vs_ARIBA/Create_Ariba_Database/The_Easy_Way/ARIBA_Staph_AMR_db/ /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.rna-removed.fastq.gz /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.rna-removed.fastq.gz /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/ARIBA_AMRP/ task_logger - INFO - ARIBA finished successfully!
Out[30]:
True
In [31]:
# Run MLST using ARIBA.
FastqObject.ariba(workspace, "ARIBA_MLST", "/gsap/garage-bacterial/Users/Rauf/local_databases/ARIBA_DBs/MLST/Staphylococcus_aureus/get_mlst/ref_db/")
task_logger - INFO - ---------------------------------------------------------------------- task_logger - INFO - Running ARIBA on FASTQ files /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.rna-removed.fastq.gz and /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.rna-removed.fastq.gz using installation 2.13.3 (ariba_env) task_logger - INFO - Executing the command: /bin/bash /gsap/garage-bacterial/Users/Rauf/git_repos/seQuoia/seQuoia/external_wrappers/ariba.sh run /gsap/garage-bacterial/Users/Rauf/local_databases/ARIBA_DBs/MLST/Staphylococcus_aureus/get_mlst/ref_db/ /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R1.rna-removed.fastq.gz /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/PairedDummySample_R2.rna-removed.fastq.gz /gsap/garage-bacterial/Projects/seQc/FastqAnalyzer_Jupyter_Tutorial/ARIBA_MLST/ task_logger - INFO - ARIBA finished successfully!
Out[31]:
True