Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move variant.definingContext.expressions to variant.members #309

Closed
korikuzma opened this issue Mar 27, 2024 · 20 comments · Fixed by #404
Closed

Move variant.definingContext.expressions to variant.members #309

korikuzma opened this issue Mar 27, 2024 · 20 comments · Fixed by #404
Assignees
Labels
enhancement New feature or request priority:medium Medium priority

Comments

@korikuzma
Copy link
Member

After initial talk with @ahwagner , I should move variant.definingContext.expressions into variant.members. For CIViC variants, the genomic hgvs expression should already be there.

Screenshot 2024-01-08 at 10 03 41 AM

Originally posted by @korikuzma in #279 (comment)

@korikuzma korikuzma added enhancement New feature or request priority:medium Medium priority labels Mar 27, 2024
Copy link

This issue is stale because it has been open 45 days with no activity. Please make a comment for triaging or closing the issue.

@github-actions github-actions bot added the stale label May 11, 2024
@korikuzma korikuzma removed the stale label May 13, 2024
@korikuzma korikuzma self-assigned this Sep 5, 2024
@korikuzma
Copy link
Member Author

@DanielPuthawala @ahwagner @larrybabb
Here is the current example I have. For each expression in definingContext.expressions (currently only using HGVS expressions), we try to create a VRS variation object. If successful, we add the VRS variation object to the members field. My questions are:

  1. Do we need to have members.expressions populated?
    a. If so, do we use all expressions from the definingContext.expressions? If not, do we just use the label (which is always an HGVS expression) to create an expression? Do we do something else?
  2. When creating normalized VRS variation objects, if multiple expressions return the same VRS object would we just create a single member with no label and expressions would be the HGVS expressions used to generate the VRS object?

@DanielPuthawala
Copy link

Forgive me Kori, but can you bring me up to speed with some context here? Is this a question about how we interface between VRS and Cat-VRS in the context of metaKB entries?

@korikuzma
Copy link
Member Author

@DanielPuthawala Yes

@korikuzma
Copy link
Member Author

@DanielPuthawala I will be working on updating the CIViC + MOA CDMs (which leverage VRS/Cat-VRS/VA-Spec) for plenary. There is this example in cat-vrs, but it doesn't quite show what I need to do for members/expressions.

@ahwagner
Copy link
Member

ahwagner commented Sep 6, 2024

Here is the current example I have. For each expression in definingContext.expressions (currently only using HGVS expressions), we try to create a VRS variation object. If successful, we add the VRS variation object to the members field. My questions are:

  1. Do we need to have members.expressions populated?

It isn't a requirement, but I think would be more informative than using the label field. I recommend we do so.

a. If so, do we use all expressions from the definingContext.expressions?

I'm not sure I understand this. The defining context is one variant with one expression; the members associated with the categorical variant are each different variants from the defining context, and each should have one HGVS expression.

  1. When creating normalized VRS variation objects, if multiple expressions return the same VRS object would we just create a single member with no label and expressions would be the HGVS expressions used to generate the VRS object?

To answer this, would you clarify what you mean when by "normalized VRS variation objects"? Does this mean VRS normalization (VOCA)? Or does this mean with the liftover and transcript selection methods we implemented in the Variation Normalization service?

@korikuzma
Copy link
Member Author

Another question.... In my example, there is a member that has the same VRS variation as the definingContext. Does this make sense? Should we have the protein variant be the definingContext and not a member?

@ahwagner
Copy link
Member

ahwagner commented Sep 6, 2024

Another question.... In my example, there is a member that has the same VRS variation as the definingContext. Does this make sense? Should we have the protein variant be the definingContext and not a member?

It is technically both, but redundant to specify that. I would keep as defining context only.

@korikuzma
Copy link
Member Author

korikuzma commented Sep 6, 2024

Here is the current example I have. For each expression in definingContext.expressions (currently only using HGVS expressions), we try to create a VRS variation object. If successful, we add the VRS variation object to the members field. My questions are:

  1. Do we need to have members.expressions populated?

It isn't a requirement, but I think would be more informative than using the label field. I recommend we do so.

🫡

a. If so, do we use all expressions from the definingContext.expressions?

I'm not sure I understand this. The defining context is one variant with one expression; the members associated with the categorical variant are each different variants from the defining context, and each should have one HGVS expression.

That answers my question!

  1. When creating normalized VRS variation objects, if multiple expressions return the same VRS object would we just create a single member with no label and expressions would be the HGVS expressions used to generate the VRS object?

To answer this, would you clarify what you mean when by "normalized VRS variation objects"? Does this mean VRS normalization (VOCA)? Or does this mean with the liftover and transcript selection methods we implemented in the Variation Normalization service?

Both -- VOCA + liftover/transcript selection. We use Variation Normalizer to normalize expressions

@DanielPuthawala
Copy link

That is the case because the catvar is specified at the protein level in this case, so you can end up with a synonymous assayed protein variant as the catvar. But any assayed nucleotide variants couldn't end up with synonymous hashes, and so would only show up in members.

So it's possible for an assayed and categorical variant to share the same ID, but isn't necessarily meaningful, just indicative of synonymy. Assayed variants show up in members.

@korikuzma
Copy link
Member Author

I'm not sure I understand this. The defining context is one variant with one expression;

@ahwagner Wait, how would we handle this variant? NP_000133.1:p.Ala391Glu and ENSP00000414914.2:p.Ala391Glu are provided and return the same VRS computed identifier. Wouldn't we want to store both expressions?

@DanielPuthawala
Copy link

DanielPuthawala commented Sep 6, 2024

I think those should be different relations, right @ahwagner? Because they the same variant just represented in different systems.

@korikuzma
Copy link
Member Author

I think those should be different relations, right @ahwagner? Because they the same variant just represented in different systems.

@DanielPuthawala when you say 'relations', are you talking about mappings?

@korikuzma
Copy link
Member Author

This example in cat-vrs also has multiple expressions for a definingContext: https://github.com/ga4gh/cat-vrs/blob/1.x/examples/canonicalAllele-ex1.yaml#L30-L36

@DanielPuthawala
Copy link

No, At least in Cat-VRS, relations and mappings are performing different functions. A relation points to equivalent representations in the system, such as HGVS vs SPDI, or on liftover between GRCh 37 to 38. On the other hand, mappings point to equivalent entries between systems, like the same thing in CIViC, ClinVar, and MOA.

@DanielPuthawala
Copy link

This example in cat-vrs also has multiple expressions for a definingContext: https://github.com/ga4gh/cat-vrs/blob/1.x/examples/canonicalAllele-ex1.yaml#L30-L36

Oh... I think these are from the VRSatile times. I need to update them.

@ahwagner
Copy link
Member

ahwagner commented Sep 6, 2024

From #309 (comment):

Both -- VOCA + liftover/transcript selection. We use Variation Normalizer to normalize expressions

Okay. I think we should discuss what this looks like in MetaKB. Don't have time to diagram now but will follow-up on this as a separate thread.

From #309 (comment):

I'm not sure I understand this. The defining context is one variant with one expression;

@ahwagner Wait, how would we handle this variant? NP_000133.1:p.Ala391Glu and ENSP00000414914.2:p.Ala391Glu are provided and return the same VRS computed identifier. Wouldn't we want to store both expressions?

Hey, yes, you're right. Wasn't thinking about RefSeq/Ensembl when I said they should each of have one HGVS expression. Yes, these could both be valid expressions, and we should support both if easy to do. But if not easy (e.g. requiring extensive work in UTA or SeqRepo), just start with RefSeq.

Regarding other, non-HGVS types, if it is quick to add support for SPDI (should be simple) and/or gnomAD (slightly more complicated) expression types, that is good. But not required.

From #309 (comment) and related:

For relations vs. mappings, I usually speak about these terms in the context of the SKOS standard definitions. Yes, relations are the semantic relationships between concepts within a vocabulary, and mappings are the relationships between concepts across vocabularies. I don't think either apply with respect to this particular issue; the various expressions for a given contextual variant are simply part of an expressions array. I expect we will have concept relations inheriting from SKOS relations in MetaKB; to follow on another issue.

@korikuzma
Copy link
Member Author

From #309 (comment):

Both -- VOCA + liftover/transcript selection. We use Variation Normalizer to normalize expressions

Okay. I think we should discuss what this looks like in MetaKB. Don't have time to diagram now but will follow-up on this as a separate thread.

👍

From #309 (comment):

I'm not sure I understand this. The defining context is one variant with one expression;

@ahwagner Wait, how would we handle this variant? NP_000133.1:p.Ala391Glu and ENSP00000414914.2:p.Ala391Glu are provided and return the same VRS computed identifier. Wouldn't we want to store both expressions?

Hey, yes, you're right. Wasn't thinking about RefSeq/Ensembl when I said they should each of have one HGVS expression. Yes, these could both be valid expressions, and we should support both if easy to do. But if not easy (e.g. requiring extensive work in UTA or SeqRepo), just start with RefSeq.

Regarding other, non-HGVS types, if it is quick to add support for SPDI (should be simple) and/or gnomAD (slightly more complicated) expression types, that is good. But not required.

@ahwagner we are pulling this information (HGVS expressions) directly from CIViC HGVS Descriptions. I'll save creating gnomAD/SPDI expressions from the Representative Variant Coordinates for a future issue.

From #309 (comment) and related:

For relations vs. mappings, I usually speak about these terms in the context of the SKOS standard definitions. Yes, relations are the semantic relationships between concepts within a vocabulary, and mappings are the relationships between concepts across vocabularies. I don't think either apply with respect to this particular issue; the various expressions for a given contextual variant are simply part of an expressions array. I expect we will have concept relations inheriting from SKOS relations in MetaKB; to follow on another issue.

👍

@korikuzma
Copy link
Member Author

This is also being addressed in #388

@korikuzma korikuzma linked a pull request Nov 14, 2024 that will close this issue
korikuzma added a commit that referenced this issue Nov 20, 2024
close #388 , #395, #309

* Breaking changes
  * Update/add ga4gh packages
    * Add `ga4gh.cat_vrs` + `ga4gh.va_spec` packages 
* The Pydantic models in these packages replace the manually created
models in `metakb/schemas/annotation.py`,
`metakb/schemas/categorical_variation.py`, and
`metakb/schemas/variation_statement.py`. (#388)
    * `ga4gh.vrs` version bumped
* The models in all ga4gh packages caused breaking changes (mainly
renaming) to the evidence models (such as #395)
* Represent categorical variation members and constraints properly
(#309)
  * Standardize representing normalizer data in extensions
* The `extension.name` will always be `vicc_normalizer_data` and value
will contain `id`, `label`, and optional `mondo_id` (for disease)
* Simplify assertion checks in tests
Copy link

Closed by #404.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request priority:medium Medium priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants