There are multiple releases of the database. Usually, you should use the latest release, unless you are trying to reproduce a previous analysis. You can download the entire database here or just a subset.
The FASTA header format
The database is a FASTA file with headers in the following format:
“name”: The binomial species name of the organism with spaces replaced by underscores.
“strain”: The name of the strain/isolate if available. If the strain is not available, it is left empty.
“ncbi_acc”: The NCBI accession number for the sequence submitted to GenBank. Note that the version number (the number at the end, after the period) is not included.
“ncbi_taxid”: The NCBI taxonomy id. This can be looked up using the NCBI accession number.
“oodb_id”: This is the unique numeric ID specific to OomyceteDB.
“taxonomy”: The taxonomic classification separated by semicolons. This classification is curated by us and is not the taxonomic classification from NCBI associated with the NCBI taxid.
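The fields above could be pulled out of a header with a small helper, as in the rough Python sketch below. It rests on an assumption: the text does not show the separator between fields, so the code assumes key=value pairs joined by "|" (exposed as the hypothetical sep parameter). The names parse_header, taxonomy_ranks and strip_accession_version are illustrative and are not part of any OomyceteDB tooling.

```python
from typing import Dict, List

# Field names listed in the header description above.
FIELDS = ("name", "strain", "ncbi_acc", "ncbi_taxid", "oodb_id", "taxonomy")


def parse_header(header: str, sep: str = "|") -> Dict[str, str]:
    """Parse an OomyceteDB-style FASTA header into a dict.

    ASSUMPTION: fields are written as key=value and joined by `sep`
    (default "|"); the real separator is not stated in the text.
    """
    text = header[1:] if header.startswith(">") else header
    record: Dict[str, str] = {}
    for part in text.strip().split(sep):
        # Split on the first "=" only, so values may contain "=" or ";".
        key, _, value = part.partition("=")
        record[key.strip()] = value.strip()
    # Absent fields are stored as empty strings, like an unknown strain.
    for field in FIELDS:
        record.setdefault(field, "")
    return record


def taxonomy_ranks(record: Dict[str, str]) -> List[str]:
    """Split the semicolon-separated taxonomy into its ranks."""
    return [rank for rank in record["taxonomy"].split(";") if rank]


def strip_accession_version(accession: str) -> str:
    """Drop a trailing version, e.g. 'AB123456.1' -> 'AB123456',
    so a versioned GenBank accession can be compared with ncbi_acc."""
    return accession.split(".", 1)[0]
```

For an invented header such as ">name=Phytophthora_infestans|strain=|ncbi_acc=AB000000|ncbi_taxid=0|oodb_id=0|taxonomy=Oomycota;Peronosporales;Phytophthora", parse_header returns an empty string for strain, and taxonomy_ranks returns the three ranks in order. The values are placeholders, not a real record.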
The text above is quoted from http://oomycetedb.cgrb.oregonstate.edu/search.html.
However, the downloads I made in the last few days still have spaces in the species name (in both the name= and taxonomy= fields), contrary to the example, e.g.:
This is important because most FASTA parsers take the first word of the header as the identifier, i.e. they break at the first space.
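The sketch below makes the problem concrete. It mimics the first-word convention (Biopython's Bio.SeqIO, for example, uses the header text up to the first whitespace as the record id) and shows one possible local workaround: replacing spaces with underscores in the name= and taxonomy= fields. It assumes the same hypothetical key=value layout joined by "|". The functions first_word_id and underscore_species, and the example header, are illustrative only.

```python
def first_word_id(header: str) -> str:
    """Mimic the common parser convention: the record ID is the header
    text up to the first whitespace (the leading '>' is dropped)."""
    text = header[1:] if header.startswith(">") else header
    parts = text.split(None, 1)
    return parts[0] if parts else ""


def underscore_species(header: str, sep: str = "|") -> str:
    """Replace spaces with underscores inside the name= and taxonomy= fields.

    ASSUMPTION: fields are key=value pairs joined by `sep`; other fields
    (e.g. strain) are left untouched.
    """
    prefix = ">" if header.startswith(">") else ""
    text = header[len(prefix):].rstrip("\n")
    fixed = []
    for part in text.split(sep):
        key, eq, value = part.partition("=")
        if key in ("name", "taxonomy"):
            value = value.replace(" ", "_")
        fixed.append(key + eq + value)
    return prefix + sep.join(fixed)


bad = ">name=Phytophthora infestans|strain=|taxonomy=Oomycota;Phytophthora infestans"
print(first_word_id(bad))                      # name=Phytophthora
print(first_word_id(underscore_species(bad)))  # the whole header, minus '>'
```

A strain containing spaces would still split the identifier with this sketch. Extending the key check to include strain is a one-line change if that matters for a given download.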