Oct. 25th, 2019
The goal of this exercise is to learn how to use a Python script to retrieve PANTHER annotation data or perform a statistical overrepresentation test through an Application Programming Interface (API).
The script is maintained in the following GitHub repository:
https://github.com/pantherdb/pantherapi-pyclient
If you have a GitHub account, you can clone the repository to your computer. If not, you can simply download it.
PANTHER API Service
The PANTHER API is an interface that allows clients to access PANTHER data and tools. Users can call it directly from the command line, or embed the calls in scripts and programs (Perl, Python, R, etc.).
Example client code for calling the PANTHER API services can be found in the pantherapi-pyclient repository, which can be set up as follows:
$ git clone https://github.com/pantherdb/pantherapi-pyclient.git
$ cd pantherapi-pyclient
$ python3 -m venv env
$ . env/bin/activate (bash) or source env/bin/activate.csh (C-shell or tcsh)
$ pip install -r requirements.txt
$ python3 pthr_go_annots.py --service SERVICE --params_file PARAMS_FILE --seq_id_file SEQ_ID_FILE
Currently, there are three options for service types (--service or -s).
- enrich -- This is the statistical overrepresentation test on a list of genes.
- geneinfo -- This call provides GO and pathway annotations for the uploaded genes.
- ortholog -- This call returns the orthologs of the uploaded genes. A maximum of 10 genes can be uploaded. The orthologs can be from a specified genome, or from all genomes in the PANTHER database (132 total).
The parameter files (in JSON format) are in the params/ folder. They should be edited according to the uploaded data and the type of call.
enrich.json
This file should be used when enrich is specified as the service type. There are four items to be specified in this file.
- "organism": "9606" --specify an organism with a taxon ID. (see Appendix on How to find a taxon ID?)
- "annotDataSet": "GO:0008150" --specify an annotation data set. (see Appendix on How to find the ID for supported annotation dataset?)
- "enrichmentTestType": "FISHER" --enter either FISHER (for Fisher's exact test) or BINOMIAL (for the binomial distribution test)
- "correction": "FDR" --specify the multiple-testing correction method (FDR, BONFERRONI, or NONE)
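As a minimal sketch of the four settings above, the following Python snippet builds an enrich parameter dictionary and rejects values outside the options listed. The helper name `make_enrich_params` is hypothetical and is not part of the pantherapi-pyclient repository, and the check assumes the values listed above are the complete set.

```python
import json

# Allowed values as listed in this document; treat them as an assumption
# if the PANTHER service has changed since it was written.
TEST_TYPES = {"FISHER", "BINOMIAL"}
CORRECTIONS = {"FDR", "BONFERRONI", "NONE"}


def make_enrich_params(organism, annot_data_set, test_type="FISHER", correction="FDR"):
    """Hypothetical helper: build the content of an enrich.json-style file."""
    if test_type not in TEST_TYPES:
        raise ValueError(f"enrichmentTestType must be one of {sorted(TEST_TYPES)}")
    if correction not in CORRECTIONS:
        raise ValueError(f"correction must be one of {sorted(CORRECTIONS)}")
    return {
        "organism": str(organism),  # taxon ID, stored as a string as in the example
        "annotDataSet": annot_data_set,
        "enrichmentTestType": test_type,
        "correction": correction,
    }


params = make_enrich_params(9606, "GO:0008150")
print(json.dumps(params, indent=2))
```

The printed JSON could be saved to a file in params/ and passed to the script with -p.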
geneinfo.json
This file should be used when geneinfo is specified as the service type. The organism taxon ID needs to be specified to match the uploaded data.
ortholog.json
This file should be used when ortholog is specified as the service type. There are three items to be specified:
- "organism": "9606" -- specify the organism of the uploaded genes
- "orthologType": "LDO" -- specify the type of ortholog, e.g., LDO (for least divergent ortholog), or all.
- "targetOrganism": "10090,7227" -- specify the taxon IDs for the target organisms, separated by a comma.
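Before calling a service, it can help to confirm that a parameter file contains the keys described above. The sketch below is one way to do that; `REQUIRED_KEYS` and `missing_keys` are hypothetical names, and the key set for geneinfo assumes the taxon ID is stored under "organism" as in the other two files.

```python
import json

# Keys named in this document for each service's parameter file.
# The "organism" key for geneinfo is an assumption based on the other files.
REQUIRED_KEYS = {
    "enrich": {"organism", "annotDataSet", "enrichmentTestType", "correction"},
    "geneinfo": {"organism"},
    "ortholog": {"organism", "orthologType", "targetOrganism"},
}


def missing_keys(service, params):
    """Hypothetical helper: return the documented keys absent from params."""
    return sorted(REQUIRED_KEYS[service] - set(params))


# Example content in the shape described for ortholog.json.
ortholog_text = '{"organism": "9606", "orthologType": "LDO", "targetOrganism": "10090,7227"}'
ortholog_params = json.loads(ortholog_text)

print(missing_keys("ortholog", ortholog_params))  # keys missing for an ortholog call
print(missing_keys("enrich", ortholog_params))    # keys an enrich call would still need
```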
The sequence ID file should be a simple text file (.txt) with one gene identifier per line. Please visit the following page for the supported IDs.
www.pantherdb.org/tips/tips_batchIdSearch_supportedId.jsp
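The file format described above can be checked with a few lines of Python before submitting. This is only a sketch: `read_ids` and `check_id_count` are hypothetical helpers, the identifiers are placeholders rather than real gene IDs, and the limit of 10 is the one stated in this document for the ortholog service.

```python
import os
import tempfile

MAX_ORTHOLOG_GENES = 10  # limit stated in this document for the ortholog service


def read_ids(path):
    """Hypothetical helper: read one identifier per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def check_id_count(service, ids):
    """Raise ValueError if the list is empty or too long for the chosen service."""
    if not ids:
        raise ValueError("the identifier file is empty")
    if service == "ortholog" and len(ids) > MAX_ORTHOLOG_GENES:
        raise ValueError(
            f"ortholog accepts at most {MAX_ORTHOLOG_GENES} genes, got {len(ids)}"
        )


# Demonstration with a temporary file; GENE_A and GENE_B are placeholders.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("GENE_A\n\nGENE_B\n")
ids = read_ids(tmp.name)
os.remove(tmp.name)

check_id_count("ortholog", ids)
print(ids)
```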
$ python3 pthr_go_annots.py -h
usage: pthr_go_annots.py [-h] [-s SERVICE] [-p PARAMS_FILE] [-f SEQ_ID_FILE]
optional arguments:
  -h, --help            show this help message and exit
  -s SERVICE, --service SERVICE
                        Panther API service to call (e.g. 'enrich',
                        'geneinfo', 'ortholog')
  -p PARAMS_FILE, --params_file PARAMS_FILE
                        File path to request parameters JSON file
  -f SEQ_ID_FILE, --seq_id_file SEQ_ID_FILE
                        File path to list of sequence identifiers
Examples:
$ python3 pthr_go_annots.py -s geneinfo -p params/geneinfo.json -f resources/test_ids.txt
$ python3 pthr_go_annots.py -s enrich -p params/enrich.json -f resources/test_ids.txt
$ python3 pthr_go_annots.py -s ortholog -p params/ortholog.json -f resources/test_ids_ortholog.txt
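The three example calls above can also be assembled from Python, for instance to run them from another script. The sketch below only builds and prints the argument lists; `EXAMPLES` and `build_command` are hypothetical names, and the file paths are the ones used in the examples.

```python
import shlex
import sys

# Service name -> (params file, ID file), copied from the examples above.
EXAMPLES = {
    "geneinfo": ("params/geneinfo.json", "resources/test_ids.txt"),
    "enrich": ("params/enrich.json", "resources/test_ids.txt"),
    "ortholog": ("params/ortholog.json", "resources/test_ids_ortholog.txt"),
}


def build_command(service, python=sys.executable):
    """Hypothetical helper: assemble the argument list for one service."""
    params_file, id_file = EXAMPLES[service]
    return [python, "pthr_go_annots.py", "-s", service, "-p", params_file, "-f", id_file]


for service in EXAMPLES:
    # Printing only; nothing is executed here.
    print(shlex.join(build_command(service, python="python3")))
```

To actually run a call, the list returned by `build_command` could be passed to `subprocess.run(..., check=True)` from inside the cloned repository with the virtual environment active.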
There are three ways to find the exact taxon IDs for genomes supported by PANTHER.
- Go to the PANTHER Open API site (http://panthertest3.med.usc.edu:8083/services/tryItOut.jsp?url=%2Fservices%2Fapi%2Fpanther), and use the /supportedgenomes service.
- Go directly to the API link page (http://panthertest3.med.usc.edu:8083/services/oai/pantherdb/supportedgenomes).
- Run the following command: curl -X POST "http://panthertest3.med.usc.edu:8083/services/oai/pantherdb/supportedgenomes" -H "accept: application/json"
Use the taxon ID that corresponds to the genome named in the ‘name’ field.
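Once the JSON response has been saved or loaded, the name-to-taxon-ID lookup can be sketched as below. The exact response layout is not described in this document, so the code walks the whole structure instead of relying on a fixed path. The `taxon_id` key, the nesting of `sample`, and the genome names in it are assumptions made for illustration; `find_genomes` is a hypothetical helper.

```python
def find_genomes(node):
    """Yield (name, taxon_id) pairs from any dict that holds both keys."""
    if isinstance(node, dict):
        if "name" in node and "taxon_id" in node:
            yield node["name"], node["taxon_id"]
        for value in node.values():
            yield from find_genomes(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_genomes(item)


# Assumed, abbreviated response shape; not copied from the service.
sample = {"search": {"output": {"genomes": {"genome": [
    {"name": "human", "taxon_id": 9606},
    {"name": "mouse", "taxon_id": 10090},
]}}}}

taxon_by_name = dict(find_genomes(sample))
print(taxon_by_name["human"])
```

If the real response uses a different key for the taxon ID, only the two string literals in `find_genomes` need to change.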
There are three similar ways to find the IDs or text needed for the supported annotation dataset.
- Go to the PANTHER Open API site (http://panthertest3.med.usc.edu:8083/services/tryItOut.jsp?url=%2Fservices%2Fapi%2Fpanther), and use the /supportedannotdatasets service.
- Go directly to the API link page (http://panthertest3.med.usc.edu:8083/services/oai/pantherdb/supportedannotdatasets).
- Run the following command: curl -X POST "http://panthertest3.med.usc.edu:8083/services/oai/pantherdb/supportedannotdatasets" -H "accept: application/json"
Use the text in the ‘id’ field for the parameter files.
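The curl command above can be mirrored with the Python standard library. The following is a sketch under the assumption that the test host shown in this document is still reachable; `build_request` and `fetch_json` are hypothetical helper names, and only the request is built here, so no network access is needed to try it.

```python
import json
import urllib.request

# URL taken from the curl command above; the host may change between releases.
URL = "http://panthertest3.med.usc.edu:8083/services/oai/pantherdb/supportedannotdatasets"


def build_request(url=URL):
    """Mirror the curl call: POST with an 'accept: application/json' header."""
    return urllib.request.Request(
        url, method="POST", headers={"accept": "application/json"}
    )


def fetch_json(url=URL, timeout=30):
    """Send the request and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.load(response)


request = build_request()
print(request.get_method(), request.full_url)
```

Calling `fetch_json()` would return the parsed response, in which the ‘id’ values can be looked up for the parameter files.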