to_vrs returns valid results when hgvs position is greater than ref sequence length #449

Open
larrybabb opened this issue Jun 19, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@larrybabb
Collaborator

I'm not exactly sure what is happening here, but it seems wrong and is easily reproducible. When the "same as reference" HGVS syntax is used, as shown below, you can supply a position greater than the length of the sequence and it will still return a result with a state of state: { type="LiteralSequenceExpression", value="" }. You can test it with the following expression:

NM_000412.5:c.1930=

The transcript NM_000412.5 is 1947 residues long, but its CDS region is 19..1596, so I'm assuming c.1929 is really reference position 1929 + (19 - 1), or 1947, thus pointing to the last residue in the sequence. So I would assume anything greater than c.1929 would fail. But as I write this out, I'm thinking that c.1929 should really fail as well, since the coding sequence only runs from ref seq residue 19 through 1596. This means the c. positions would go from c.1 = 19 to c.1577 = (1596 - 19). If this tracks, then I would assume any c. position greater than c.1577 should fail. Of course, someone could still use the HGVS positional syntax to reference sequence further into the 3' UTR, e.g. c.*200, which would indicate an additional 200 residues into the 3' UTR past the stop codon of the coding sequence.

In this transcript, the last position that is valid using c. nomenclature would be 1947 (ref seq length) - 1596 (last CDS position) = 351, or NM_000412.5:c.*351=.
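
The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not vrs-python code: the function name c_to_transcript_pos and the utr3 flag are hypothetical, and it assumes the CDS bounds 19..1596 are 1-based positions on the transcript, as the offset of (19 - 1) used above implies.

```python
# Hypothetical helper, not part of vrs-python or the hgvs package.
# Assumes cds_start and cds_end are 1-based transcript positions.

TRANSCRIPT_LEN = 1947          # length of NM_000412.5 as stated in this issue
CDS_START, CDS_END = 19, 1596  # CDS region as stated in this issue


def c_to_transcript_pos(c_pos: int, cds_start: int, cds_end: int,
                        utr3: bool = False) -> int:
    """Map a c. position to a 1-based transcript position.

    utr3=True means the position was written as c.*N, i.e. N residues
    past the last CDS position.
    """
    if utr3:
        return cds_end + c_pos
    return c_pos + (cds_start - 1)


# c.1929 lands on the last residue of the transcript.
assert c_to_transcript_pos(1929, CDS_START, CDS_END) == TRANSCRIPT_LEN
# c.1930 lands one residue past the end of the transcript.
assert c_to_transcript_pos(1930, CDS_START, CDS_END) == TRANSCRIPT_LEN + 1
# c.*351 also lands on the last residue of the transcript.
assert c_to_transcript_pos(351, CDS_START, CDS_END, utr3=True) == TRANSCRIPT_LEN
```

Under these assumptions, c.1930 maps to transcript position 1948, which is why it should not produce a valid result.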

@larrybabb larrybabb added the bug Something isn't working label Jun 19, 2023
@larrybabb
Collaborator Author

I haven't tested whether or not this happens on genomic or other sequence types. But I would suspect it does.

@korikuzma
Member

korikuzma commented Jun 19, 2023

@larrybabb Good catch. I always forget about the CDS start site with coding DNA. When performing index checks, I forgot to add the CDS start site to the position for c. coordinate types.
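
A minimal sketch of the difference, assuming the index check simply compares a position against the sequence length. The function names naive_in_bounds and offset_in_bounds are hypothetical and do not reflect the actual vrs-python implementation; cds_start is assumed to be the 1-based transcript position of c.1.

```python
# Hypothetical illustration of the index check described above;
# not the actual vrs-python code.


def naive_in_bounds(c_pos: int, seq_len: int) -> bool:
    """Compare the raw c. position against the sequence length."""
    return 1 <= c_pos <= seq_len


def offset_in_bounds(c_pos: int, cds_start: int, seq_len: int) -> bool:
    """Shift the c. position by the CDS start before comparing."""
    transcript_pos = c_pos + (cds_start - 1)
    return 1 <= transcript_pos <= seq_len


SEQ_LEN = 1947  # NM_000412.5 length as reported in this issue
CDS_START = 19  # CDS start as reported in this issue

# The raw check accepts c.1930 because 1930 <= 1947.
assert naive_in_bounds(1930, SEQ_LEN)
# With the CDS offset applied, c.1930 becomes 1948 and is rejected.
assert not offset_in_bounds(1930, CDS_START, SEQ_LEN)
# c.1929 becomes 1947, the last residue, and still passes this check.
assert offset_in_bounds(1929, CDS_START, SEQ_LEN)
```

This sketch only checks against the end of the transcript; whether positions past the end of the CDS should also require the c.* form is a separate question raised earlier in the thread.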
