Automated Bulk Segregant Analysis (ABSA) is a pipeline that streamlines the Bulk Segregant Analysis (BSA) workflow for genomic data from Cryptosporidium genetic crosses. It chains together a series of bioinformatics tools and processes, improving efficiency, reproducibility, and scalability. The pipeline is a streamlined implementation of Dr. Xue Li's BSA analysis.

Bulk segregant analysis is a technique for identifying the genetic loci that underlie phenotypic trait differences. The basic approach is to compare two pools of individuals, sampled from an interbred population, that come from opposing tails of the phenotypic distribution. ABSA takes raw sequencing reads as input and carries them all the way through to allele frequency table generation and plotting for all valid SNPs.
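To make the pool comparison concrete, here is a minimal sketch of the per-SNP quantity that BSA typically looks at: the difference in alternate-allele frequency between the two bulks. The `SnpCounts` layout, the function names, and the read counts are all hypothetical and chosen for illustration; they are not taken from ABSA's code or output tables.

```python
from dataclasses import dataclass


@dataclass
class SnpCounts:
    """Read counts for one SNP in two bulks (hypothetical layout)."""
    chrom: str
    pos: int
    ref_a: int  # reference-allele reads in bulk A
    alt_a: int  # alternate-allele reads in bulk A
    ref_b: int  # reference-allele reads in bulk B
    alt_b: int  # alternate-allele reads in bulk B


def alt_frequency(ref: int, alt: int) -> float:
    """Fraction of reads that carry the alternate allele."""
    total = ref + alt
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt / total


def delta_frequency(snp: SnpCounts) -> float:
    """Alternate-allele frequency in bulk A minus that in bulk B."""
    return alt_frequency(snp.ref_a, snp.alt_a) - alt_frequency(snp.ref_b, snp.alt_b)


# Made-up counts: the first SNP is balanced, the second is skewed between bulks.
snps = [
    SnpCounts("chr1", 1000, ref_a=48, alt_a=52, ref_b=50, alt_b=50),
    SnpCounts("chr1", 2000, ref_a=10, alt_a=90, ref_b=80, alt_b=20),
]
for snp in snps:
    print(snp.chrom, snp.pos, round(delta_frequency(snp), 3))
```

A SNP linked to the selected trait is expected to show a large frequency difference between bulks, while unlinked SNPs should stay near zero.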
- Read Trimming: Utilizes Trim Galore for trimming sequencing reads.
- Read Mapping: Employs BWA MEM for aligning reads to a reference genome.
- BAM Processing: SAM/BAM processing using SAMTools and GATK4.
- Variant Calling: Calls variants using GATK4 HaplotypeCaller.
- SNP Filtering: Filters SNPs using vcffilter and GATK4 SelectVariants.
- Post-Processing: Generates plots and tables for downstream analysis.
- Organized Outputs: Automatically organizes output table and plot files into designated folders (`tables` and `plots`).
- Comprehensive Logging: Logs detailed workflow progress and errors to both the console and a log file (`AutomatedBSA.log`).
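The tool chain listed above can be sketched as a sequence of command lines. This is only an illustration of how such steps usually fit together, not ABSA's actual invocation: the function name `plan_commands`, the intermediate file names, the read-group string, and the `work` directory are assumptions, and the options the pipeline really passes may differ. The GATK4 BAM-processing step and the `vcffilter` step are left out because the text does not state their settings. The sketch only builds and prints the commands; it does not run them, and it assumes the reference has already been indexed for BWA, SAMTools, and GATK.

```python
import shlex
from pathlib import Path


def plan_commands(sample, r1, r2, ref, threads=4, workdir="work"):
    """Return one argv list per step; nothing is executed here."""
    work = Path(workdir)
    # With --basename, Trim Galore names paired outputs <name>_val_1/_val_2.
    trimmed_1 = str(work / f"{sample}_val_1.fq.gz")
    trimmed_2 = str(work / f"{sample}_val_2.fq.gz")
    sam = str(work / f"{sample}.sam")
    bam = str(work / f"{sample}.sorted.bam")
    raw_vcf = str(work / f"{sample}.vcf.gz")
    snp_vcf = str(work / f"{sample}.snps.vcf.gz")
    # GATK needs a read group; this particular string is an assumption.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        ["trim_galore", "--paired", "--basename", sample, "-o", str(work), r1, r2],
        ["bwa", "mem", "-t", str(threads), "-R", read_group, "-o", sam,
         ref, trimmed_1, trimmed_2],
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        ["samtools", "index", bam],
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", raw_vcf],
        ["gatk", "SelectVariants", "-R", ref, "-V", raw_vcf,
         "--select-type-to-include", "SNP", "-O", snp_vcf],
    ]


# Hypothetical sample and file names, printed as copy-pasteable shell lines.
for argv in plan_commands("bulkA", "raw_reads/bulkA_R1.fastq.gz",
                          "raw_reads/bulkA_R2.fastq.gz", "reference.fasta"):
    print(shlex.join(argv))
```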
Ensure the following bioinformatics tools are installed and accessible in your system's `$PATH`:
- Trim Galore
- BWA
- SAMTools
- GATK4
- vcffilter
- R
- Python 3.10.11: The pipeline has been tested with Python version 3.10.11.
- Python packages: pandas, pyfiglet, colorama, tqdm
- R packages: ggplot2, readr
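Before a first run it can help to confirm that everything above is actually reachable. The following is a small sketch, not part of ABSA itself: the executable names in `REQUIRED_TOOLS` are the names these tools are commonly installed under and are an assumption about your system. R packages are not checked here.

```python
import importlib.util
import shutil

# Assumed executable names; adjust if your installation uses different ones.
REQUIRED_TOOLS = ["trim_galore", "bwa", "samtools", "gatk", "vcffilter", "Rscript"]
REQUIRED_PY_PACKAGES = ["pandas", "pyfiglet", "colorama", "tqdm"]


def missing_tools(tools):
    """Return the executables that cannot be found on $PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def missing_modules(modules):
    """Return the Python packages that cannot be imported."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


tools = missing_tools(REQUIRED_TOOLS)
packages = missing_modules(REQUIRED_PY_PACKAGES)
print("Missing tools:", ", ".join(tools) or "none")
print("Missing Python packages:", ", ".join(packages) or "none")
```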
- `git clone https://github.com/ruicatxiao/Automated_Bulk_Segregant_Analysis.git`
- `chmod u+x AutomatedBSA.py`
- `chmod u+x bin/scatter_plot_snp_location.py`
- `chmod u+x bin/BSA_R_Preprocessing.R`
Follow the format of the samplesheet.csv file provided in the repo: place your read files in the raw_reads folder and update samplesheet.csv to match. You do not need to provide absolute paths to the read files.
Before running, you should have a reference genome in FASTA format, a samplesheet.csv, and the raw_reads folder containing all reads.
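A quick consistency check between the samplesheet and the raw_reads folder can catch typos before a long run. The sketch below deliberately avoids assuming any column names, since the samplesheet layout is defined by the file in the repo: it treats every cell that ends in a FASTQ-style suffix as a read file name and looks for it in raw_reads. The function name `missing_reads` and the suffix list are assumptions made for illustration.

```python
import csv
from pathlib import Path

# Assumed read-file suffixes; extend if your files are named differently.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def missing_reads(samplesheet, reads_dir="raw_reads"):
    """Return read file names listed in the samplesheet but absent from reads_dir."""
    missing = []
    with open(samplesheet, newline="") as handle:
        for row in csv.reader(handle):
            for cell in row:
                name = cell.strip()
                if not name.lower().endswith(FASTQ_SUFFIXES):
                    continue  # not a read file, e.g. a sample name or header
                # Only the file name is used, so relative prefixes are ignored.
                if not (Path(reads_dir) / Path(name).name).is_file():
                    missing.append(name)
    return missing


if Path("samplesheet.csv").exists():
    absent = missing_reads("samplesheet.csv")
    print("Missing read files:", ", ".join(absent) or "none")
```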
python3 AutomatedBSA.py \
    --ref <GENOME_REFERENCE.fasta> \
    --sample samplesheet.csv \
    --threads <NUMBER_OF_CPU_THREADS>
- The CpBGF genome is provided by default. Replace it with any other Cryptosporidium genome as needed.
All final output tables are located in the "tables" folder. All generated plot files are located in the "plots" folder.
- Outlier SNP removal via median absolute deviation (MAD)
- Tricube smoothing of the SNP frequency distribution
- QTLseqr processing, G' calculation and plotting
- nf-core adaptation to remove local dependency requirements
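Two of the items above, MAD-based outlier removal and tricube smoothing, are standard techniques that can be sketched generically. The code below is an illustration under stated assumptions and does not describe how ABSA implements or will implement them: the outlier rule uses the common modified z-score with a cutoff of 3.5, the smoother is a plain tricube-weighted average within a half-window given in base pairs, and the function names and example values are made up.

```python
from statistics import median


def mad_keep_mask(values, threshold=3.5):
    """Return True for values to keep, based on the modified z-score."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [True] * len(values)  # no spread, so nothing can be flagged
    # 0.6745 rescales the MAD to be comparable with a standard deviation.
    return [abs(0.6745 * (v - med) / mad) <= threshold for v in values]


def tricube_smooth(positions, values, window):
    """Tricube-weighted average of values within `window` of each position."""
    smoothed = []
    for centre in positions:
        weighted_sum = 0.0
        weight_total = 0.0
        for pos, val in zip(positions, values):
            d = abs(pos - centre) / window
            if d < 1:
                w = (1 - d ** 3) ** 3  # tricube kernel
                weighted_sum += w * val
                weight_total += w
        # The centre point always has weight 1, so weight_total is never 0.
        smoothed.append(weighted_sum / weight_total)
    return smoothed


# Made-up allele frequencies with one obvious outlier at position 300.
positions = [100, 200, 300, 400, 500]
freqs = [0.50, 0.52, 0.98, 0.49, 0.51]
keep = mad_keep_mask(freqs)
kept_pos = [p for p, k in zip(positions, keep) if k]
kept_freq = [f for f, k in zip(freqs, keep) if k]
print(kept_pos)
print([round(x, 3) for x in tricube_smooth(kept_pos, kept_freq, window=250)])
```

This smoother compares every pair of SNPs, which is fine for a sketch but slow on genome-scale data; a sliding window over position-sorted SNPs would be the usual refinement.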
ABSA was conceptualized by Sebastian Shaw and Rui Xiao. The pipeline was developed and implemented by Rui Xiao.
We thank Xue Li for implementing the original BSA, on which ABSA is based.
=====PLACE HOLDER FOR STRIPEN GROUP PUBLICATION CITATION USING ABSA=======
Brenneman KV, Li X, Kumar S, Delgado E, Checkley LA, Shoue DA, Reyes A, Abatiyow BA, Haile MT, Tripura R, Peto T, Lek D, Button-Simons KA, Kappe SHI, Dhorda M, Nosten F, Nkhoma SC, Cheeseman IH, Vaughan AM, Ferdig MT, Anderson TJC. Optimizing bulk segregant analysis of drug resistance using Plasmodium falciparum genetic crosses conducted in humanized mice. iScience. 2022 Mar 16;25(4):104095. doi: 10.1016/j.isci.2022.104095. PMID: 35372813; PMCID: PMC8971943.