VADR produces qualifier with invalid value #53

Open
taltman opened this issue Dec 21, 2021 · 4 comments

@taltman

taltman commented Dec 21, 2021

VADR produces features with the /exception qualifier to specify ribosomal slippage:

>Feature NODE_1_length_19663_cov_257.252269
<1      12777   gene
                        gene    ORF1ab
<1      5687    CDS
5687    12777
                        product ORF1ab polyprotein
                        exception       ribosomal slippage
                        codon_start     3
                        protein_id      NODE_1_length_19663_cov_257.252269_1

But according to the INSDC specs, this is an invalid value:

https://www.insdc.org/documents/feature_table.html#7.3

Qualifier       /exception=
Definition      indicates that the coding region cannot be translated using
                standard biological rules
...
                - must not be used for ribosomal slippage, instead use join operator, 
                  e.g.: CDS   join(486..1784,1787..4810)
                              /note="ribosomal slip on tttt sequence at 1784..1787"

This causes problems when trying to submit genomes annotated with VADR to ENA.
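A minimal Python sketch of the join-operator form the INSDC text asks for: it collapses the CDS intervals of a .tbl feature into a single INSDC location string. It assumes plus-strand intervals only (as with ORF1ab here) and that the partial markers `<` and `>` can be copied over unchanged; the function name `insdc_join_location` is hypothetical.

```python
from typing import List, Tuple


def insdc_join_location(intervals: List[Tuple[str, str]]) -> str:
    """Build an INSDC location from .tbl-style (start, stop) string pairs.

    Plus strand only; '<' and '>' partial markers are copied unchanged.
    """
    if not intervals:
        raise ValueError("no intervals given")
    parts = []
    for start, stop in intervals:
        # Strip partial markers only for the strand check.
        if int(start.lstrip("<>")) > int(stop.lstrip("<>")):
            raise ValueError("minus-strand intervals are not handled in this sketch")
        parts.append(f"{start}..{stop}")
    if len(parts) == 1:
        return parts[0]
    return "join(" + ",".join(parts) + ")"


# The two CDS intervals of the ORF1ab feature in the table above.
print(insdc_join_location([("<1", "5687"), ("5687", "12777")]))
# join(<1..5687,5687..12777)
```

The shared position 5687 in both intervals is kept as-is: the overlap is how the slippage site shows up in the location itself.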

@taltman
Author

taltman commented Dec 21, 2021

This is what is desired:

Qualifier       /ribosomal_slippage
Definition      during protein translation, certain sequences can program
                ribosomes to change to an alternative reading frame by a 
                mechanism known as ribosomal slippage 
Value format    none 
Example         /ribosomal_slippage 
Comment         a join operator,e.g.: [join(486..1784,1787..4810)] should be used 
                in the CDS spans to indicate the location of ribosomal_slippage 
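Once the CDS location is expressed as a join, the qualifier change itself is small. The Python sketch below assumes the target is an EMBL-style flat file (one of the inputs Webin-CLI accepts); the helper names are hypothetical, the set of unquoted numeric qualifiers is an assumption, and long lines are not wrapped at 80 columns. It swaps /exception="ribosomal slippage" for the valueless /ribosomal_slippage and renders the feature with the key at column 6 and the location and qualifiers at column 22.

```python
from typing import List, Optional, Tuple

# A qualifier is (name, value); value is None for valueless qualifiers.
Qualifier = Tuple[str, Optional[str]]

# Assumption: qualifiers whose values are written without quotes.
UNQUOTED = {"codon_start", "transl_table"}


def convert_slippage(quals: List[Qualifier]) -> List[Qualifier]:
    """Replace /exception="ribosomal slippage" with a valueless /ribosomal_slippage."""
    out: List[Qualifier] = []
    for name, value in quals:
        if name == "exception" and value == "ribosomal slippage":
            out.append(("ribosomal_slippage", None))
        else:
            out.append((name, value))
    return out


def embl_ft_lines(key: str, location: str, quals: List[Qualifier]) -> List[str]:
    """Render one feature as EMBL-style FT lines (no 80-column wrapping)."""
    # Feature key starts at column 6, location at column 22.
    lines = [f"FT   {key:<16}{location}"]
    for name, value in quals:
        if value is None:
            text = f"/{name}"
        elif name in UNQUOTED:
            text = f"/{name}={value}"
        else:
            # Embedded double quotes are doubled inside quoted values.
            escaped = value.replace('"', '""')
            text = f'/{name}="{escaped}"'
        # Qualifiers also start at column 22.
        lines.append("FT" + " " * 19 + text)
    return lines


quals = [
    ("product", "ORF1ab polyprotein"),
    ("exception", "ribosomal slippage"),
    ("codon_start", "3"),
]
for line in embl_ft_lines("CDS", "join(<1..5687,5687..12777)", convert_slippage(quals)):
    print(line)
```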

@nawrockie
Member

That's actually a different format from the .tbl file that vadr creates, even though both are (confusingly) called feature tables. The format of vadr's output 'feature tables' is described here: https://www.ncbi.nlm.nih.gov/genbank/feature_table/

The vadr format is a useful file format for submissions to GenBank. There may be ways to convert it to a format that ENA accepts for submissions, but I'm not sure what those formats are. The vadr .ftr output files may also be relatively easy to parse and reformat into an accepted ENA format.
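As a starting point for such a conversion, here is a minimal Python sketch that parses the tab-delimited five-column .tbl layout described on the NCBI page above (a `>Feature SeqId` header; `start<TAB>stop<TAB>key` to open a feature; `start<TAB>stop` for extra intervals; three empty columns, then qualifier name and optional value). It targets .tbl rather than .ftr because the .ftr layout is not described here; the `Feature` class and `parse_tbl` name are hypothetical, and malformed input is not validated.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Feature:
    key: str
    intervals: List[Tuple[str, str]] = field(default_factory=list)
    quals: List[Tuple[str, Optional[str]]] = field(default_factory=list)


def parse_tbl(text: str) -> Dict[str, List[Feature]]:
    """Parse five-column feature table text into a list of features per sequence."""
    tables: Dict[str, List[Feature]] = {}
    features: List[Feature] = []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if raw.startswith(">Feature"):
            seq_id = raw.split(maxsplit=1)[1].strip()
            features = tables.setdefault(seq_id, [])
            continue
        cols = raw.split("\t")
        if cols[0] and len(cols) >= 3 and cols[2]:
            # start, stop, feature key: a new feature begins.
            features.append(Feature(cols[2], [(cols[0], cols[1])]))
        elif cols[0]:
            # start, stop only: another interval of the current feature.
            features[-1].intervals.append((cols[0], cols[1]))
        else:
            # Qualifier line: three empty columns, then name and optional value.
            value = cols[4] if len(cols) > 4 and cols[4] else None
            features[-1].quals.append((cols[3], value))
    return tables


example = (
    ">Feature NODE_1\n"
    "<1\t5687\tCDS\n"
    "5687\t12777\n"
    "\t\t\tproduct\tORF1ab polyprotein\n"
    "\t\t\texception\tribosomal slippage\n"
)
cds = parse_tbl(example)["NODE_1"][0]
print(cds.intervals)  # [('<1', '5687'), ('5687', '12777')]
```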

@taltman
Author

taltman commented Dec 27, 2021

Hi @nawrockie,

So far I've used VADR to submit nine CoV genomes with remote homology to SARS-CoV-2 to ENA. The latest version of DARTH will include scripts that turn VADR output into a format that can be submitted to ENA using their Webin-CLI tool.

I looked at the GenBank feature_table page, but it doesn't discuss ribosomal slippage or how to encode it correctly (the page seems more concerned with syntax). INSDC is a collaboration between GenBank, DDBJ, and EMBL, so why wouldn't the semantics defined by INSDC apply to GenBank?

@nawrockie
Member

VADR outputs the .tbl file format because that was the format preferred by the GenBank indexers for viral submissions when VADR was developed. In the GenBank submission pipeline, the vadr .tbl file is then converted to 'asn' (ASN.1) format using tbl2asn, and that ASN.1 is what is loaded into the GenBank database.

> The latest version of DARTH will have the scripts to turn VADR output into a format that can be submitted to ENA using their Webin-CLI tool.

That sounds good. Have you finished developing the format conversion tool/script, or do you still have a problem with ribosomal_slippage?
