Barcode Phylogenetics
✨ NOTE: Pipeline is under active development ✍️
This script is required when an expected species ID does not match the result species/genus ID from a sequence similarity search against the BOLD database. It requires a merged barcode sequence as input, along with an expected species ID (EXID) and a result species/genus ID (REXID). Using the BOLD API it pulls down barcode data for EXID and REXID, creates a multiple sequence alignment, and uses maximum likelihood estimation to construct a phylogenetic tree. The output is a pdf image of the tree, where the identification of the barcode query can be confidently assigned in a phylogenetic context.
- python3
- Mafft - install
- IQTree - install
- R v3 - install
- Toytree - install
- Python modules;
pandas, os, glob, argparse, subprocess, ete3, Bio, joblib
. If you get an error for any of these, install usingpip
before running script. - R libraries;
ggtree v1.10
from bioconductor,getopt
Clone repository locally using git clone https://github.com/PeterMulhair/BarPhy.git
Note this script is built to run on command line on a linux system
BarPhy can be run in two ways:
- With an excel sheet or csv file of barcode results as input (see the excel sheet in data/)
python barcode_queries.py --barcode Barcoding_results.xlsx
- With a fasta file containing a barcode sequence named as EXID and a query species/genus to search against
python barcode_queries.py --query EXID.fasta REXID
The script also requires certain directories. To run the --query
version, place your fasta files in a directory called queries/
Output
The output folder consists of a number of files including raw fasta, MSA, and tree output files. The tree image file, ending in .pdf, is what you want to check to see where your barcode query fits in the tree.
Using the fasta file from queries/
you can ID the barcode sequences using the --query
version (the --barcode
version can be run using the excel sheet in data/
)
Then run the script:
$ python barcode_queries.py --query queries/Melangyna_labiatarum.fasta Melangyna_compositarum
OR to search against the REXID genus rather than species:
$ python barcode_queries.py --query queries/Melangyna_labiatarum.fasta Melangyna_
NOTE for --query
version the input is required as the species bionomial name, or genus, separated by an underscore eg. Drosophila_melanogaster or Drosophila_
Output
For --query
runs, the query species will be labelled with _query
and coloured red in the output tree image (labelled with _DToL
in --barcode
version).