Bug?: Subclass sync: Why many deletions? (DO & NCIT) #734

Open
joeflack4 opened this issue Dec 19, 2024 · 7 comments
Labels
bug Something isn't working


@joeflack4
Contributor

Overview

Many deletions of subclass evidence are showing up in these runs:
Subclass sync run from Dec - monarch-initiative/mondo#8503
Subclass sync run from Nov - monarch-initiative/mondo#8432

Note that there is no -confirmed file for NCIT.

@joeflack4
Contributor Author

joeflack4 commented Dec 21, 2024

DO removals

I examined 4 cases and determined that they were appropriate deletions of subclass evidence. I identified 3 possible types of cases.

1. Redundant is_a declaration

These do not show up in the subclass sync because the subclass relation (SCR) is asserted as direct in mondo-edit, but the reasoner has determined it to be indirect, so it is no longer a direct is_a in mondo.owl.

MONDO:0000193 is_a MONDO:0005151 (del: DOID:0090139)

-is_a: MONDO:0005151 {source="DOID:0090139", source="Orphanet:168588/inferred"} ! endocrine system disorder
+is_a: MONDO:0005151 ! endocrine system disorder

Current state in mondo-edit.obo:

id: MONDO:0000193
name: cortisone reductase deficiency
xref: DOID:0090139 {source="MONDO:equivalentTo"}
is_a: MONDO:0005039 {source="https://orcid.org/0000-0001-9310-0163"} ! reproductive system disorder
is_a: MONDO:0005151 ! endocrine system disorder
is_a: MONDO:0015898 {source="Orphanet:168588"} ! adrenogenital syndrome

Current state in mondo.owl:

    <owl:Class rdf:about="http://purl.obolibrary.org/obo/MONDO_0000193">
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/MONDO_0002525"/>
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/MONDO_0005039"/>
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/MONDO_0015898"/>

Observe that is_a MONDO:0005151 changed to MONDO:0002525.
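To illustrate what "reasoned to be indirect" means for asserted parents, here is a minimal Python sketch with toy term IDs (hypothetical, not Mondo terms). It flags asserted parents that are already implied through another asserted parent. This is a simplification: an OWL reasoner can also derive new, more specific parents from logical axioms, and this sketch does not attempt that.

```python
from collections import deque


def ancestors(term, parents):
    """Return every transitive is_a ancestor of `term` (excluding `term`).

    `parents` maps a term ID to the set of its asserted direct parents.
    """
    seen = set()
    queue = deque(parents.get(term, ()))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(parents.get(parent, ()))
    return seen


def redundant_parents(term, parents):
    """Asserted parents of `term` that are also reachable via another asserted parent.

    Such an is_a is entailed by a more specific one, so once the hierarchy is
    reasoned over it is no longer a direct subclass relation.
    """
    direct = parents.get(term, set())
    return {
        p
        for p in direct
        if any(p in ancestors(q, parents) for q in direct if q != p)
    }


# Toy hierarchy (hypothetical IDs): "child" asserts both "general" and
# "specific", but "specific" is itself a subclass of "general".
toy = {"child": {"general", "specific"}, "specific": {"general"}}
print(redundant_parents("child", toy))  # {'general'}
```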

@joeflack4
Contributor Author

joeflack4 commented Dec 21, 2024

2. Disagreement in parentage

MONDO:0000009 is_a MONDO:0002243 (del: DOID:2218)

  • why deleted
    • The Mondo term's mapped source term has a different parent from the source term mapped to the Mondo parent.
  • actual source subclasses
    • DOID:2218 is_a DOID:1247
  • exact matches
    • MONDO:0000009 -> DOID:2218
    • MONDO:0002243 -> DOID:2213
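The "why deleted" check above can be sketched in Python using the IDs from this case. The function name and the dictionary inputs are hypothetical. The real subclass sync pipeline may also consider indirect (ancestor) relations in the source, whereas this sketch checks direct source parents only.

```python
def is_subclass_confirmed(child, parent, exact_matches, source_parents):
    """Return True if the source supports the Mondo axiom `child is_a parent`.

    exact_matches:  Mondo ID -> exactly matching source ID (hypothetical input)
    source_parents: source ID -> set of its direct parents in the source
    Only direct source parents are checked here; this is a simplification.
    """
    child_src = exact_matches.get(child)
    parent_src = exact_matches.get(parent)
    if child_src is None or parent_src is None:
        # Without a mapping on both sides the axiom cannot be confirmed.
        return False
    return parent_src in source_parents.get(child_src, set())


# Data from the case above.
exact_matches = {"MONDO:0000009": "DOID:2218", "MONDO:0002243": "DOID:2213"}
source_parents = {"DOID:2218": {"DOID:1247"}}
print(is_subclass_confirmed("MONDO:0000009", "MONDO:0002243",
                            exact_matches, source_parents))  # False
```

DOID:2218's only listed parent is DOID:1247, not DOID:2213, so the evidence for MONDO:0000009 is_a MONDO:0002243 is not confirmed.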

@joeflack4
Contributor Author

3. Missing entries in mondo.sssom.tsv

This could also happen, but I have not yet observed such cases.
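Cases of this type could be detected ahead of time by listing the Mondo terms that lack an exact match to the source. The following is a minimal Python sketch. It assumes mondo.sssom.tsv follows the standard SSSOM/TSV layout: a `#`-prefixed metadata block, then `subject_id`, `predicate_id` and `object_id` columns. It also assumes Mondo IDs are the subjects and `skos:exactMatch` is the predicate for exact matches. The function names and the sample rows are hypothetical.

```python
import csv


def exact_matches_for_prefix(sssom_tsv, prefix):
    """Map Mondo subject IDs to source IDs from skos:exactMatch rows.

    Assumes SSSOM/TSV layout: '#'-prefixed metadata lines, then a header
    row containing subject_id, predicate_id and object_id. If a subject
    has several matches with the same prefix, the last one wins.
    """
    rows = [line for line in sssom_tsv.splitlines() if not line.startswith("#")]
    matches = {}
    for row in csv.DictReader(rows, delimiter="\t"):
        if (row["predicate_id"] == "skos:exactMatch"
                and row["object_id"].startswith(prefix + ":")):
            matches[row["subject_id"]] = row["object_id"]
    return matches


def unmapped_terms(mondo_ids, matches):
    """Mondo IDs with no exact match; their subclass evidence cannot be confirmed."""
    return sorted(term for term in mondo_ids if term not in matches)


# Made-up sample; MONDO:9999999 is a hypothetical unmapped term.
sample = (
    "# mapping_set_id: example\n"
    "subject_id\tpredicate_id\tobject_id\n"
    "MONDO:0000009\tskos:exactMatch\tDOID:2218\n"
)
matches = exact_matches_for_prefix(sample, "DOID")
print(unmapped_terms({"MONDO:0000009", "MONDO:9999999"}, matches))  # ['MONDO:9999999']
```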

@joeflack4
Contributor Author

joeflack4 commented Dec 21, 2024

NCIT removals

Edit: Trish's Analysis

@matentzn We observed some deletions of NCIT subclass evidence when running the subclass sync pipeline. Many of these appear to be caused by owl:Class declarations missing from the component, even though these declarations do appear in the mirror. Any idea why they are removed? Is it a bug, or intentional?

Example: NCIT:C118172
component-download-ncit.owl.owl:

    <owl:Class rdf:about="http://purl.obolibrary.org/obo/NCIT_C118172">
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/NCIT_C34588"/>
        ...
        <rdfs:label>Nocturnal Enuresis</rdfs:label>
    </owl:Class>

components/ncit.owl:

    <rdf:Description rdf:about="http://purl.obolibrary.org/obo/NCIT_C118172">
        <obo:IAO_0000115>Urination during sleep.</obo:IAO_0000115>
        <oboInOwl:hasExactSynonym>Bedwetting</oboInOwl:hasExactSynonym>
        <oboInOwl:hasExactSynonym>Nocturnal Enuresis</oboInOwl:hasExactSynonym>
        <oboInOwl:hasExactSynonym>Sleep Enuresis</oboInOwl:hasExactSynonym>
        <oboInOwl:inSubset rdf:resource="http://purl.obolibrary.org/obo/NCIT_C118464"/>
        <oboInOwl:inSubset rdf:resource="http://purl.obolibrary.org/obo/NCIT_C189762"/>
        <oboInOwl:inSubset rdf:resource="http://purl.obolibrary.org/obo/NCIT_C90259"/>
        <rdfs:label>Nocturnal Enuresis</rdfs:label>
    </rdf:Description>
The Makefile goal that builds the NCIT component:

$(COMPONENTSDIR)/ncit.owl: $(TMPDIR)/ncit_relevant_signature.txt | component-download-ncit.owl
	if [ $(SKIP_HUGE) = false ] && [ $(COMP) = true ]; then $(ROBOT) remove -i $(TMPDIR)/component-download-ncit.owl.owl --select imports \
		rename --mappings config/property-map.sssom.tsv --allow-missing-entities true --allow-duplicates true \
		query \
			--update ../sparql/rm_xref_by_prefix.ru \
			--update ../sparql/exact_syn_from_label.ru \
		remove -T $(TMPDIR)/ncit_relevant_signature.txt --select complement --select "classes individuals" --trim false \
			--drop-axiom-annotations NCIT:P378 \
			--drop-axiom-annotations NCIT:P383 \
			--drop-axiom-annotations NCIT:P384 \
		remove -T config/properties.txt --select complement --select properties --trim true \
		remove --term "http://purl.obolibrary.org/obo/NCIT_C179199" --axioms "equivalent" \
		annotate --ontology-iri $(URIBASE)/mondo/sources/ncit.owl --version-iri $(URIBASE)/mondo/sources/$(TODAY)/ncit.owl -o $@; fi
  1. Why are the owl:Class declarations removed and replaced by rdf:Description elements?
    • Does it have something to do with remove -T $(TMPDIR)/ncit_relevant_signature.txt --select complement --select "classes individuals" --trim false?
  2. Why are the subClassOf axioms removed?
    • I suppose because these can only appear on owl:Class instances.
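Spot checks can miss terms, so it may help to diff the named rdfs:subClassOf parents of each owl:Class between the mirror and the component. The following is a minimal Python sketch using only the standard library. It assumes both files are RDF/XML with classes as direct children of rdf:RDF, as in the excerpts above. The function names are hypothetical, and anonymous superclasses (restrictions) are ignored.

```python
import xml.etree.ElementTree as ET

NS = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]


def class_parents(rdfxml):
    """Map each top-level owl:Class IRI to its named rdfs:subClassOf parents.

    Subjects serialized as rdf:Description (no owl:Class typing) are skipped,
    and anonymous superclasses (nested restrictions) are ignored.
    """
    root = ET.fromstring(rdfxml)
    result = {}
    for cls in root.findall("owl:Class", NS):
        iri = cls.get(RDF_ABOUT)
        if iri is None:
            continue
        result[iri] = {
            sc.get(RDF_RESOURCE)
            for sc in cls.findall("rdfs:subClassOf", NS)
            if sc.get(RDF_RESOURCE) is not None
        }
    return result


def lost_subclass_axioms(mirror_xml, component_xml):
    """Named parents present in the mirror but missing from the component."""
    mirror = class_parents(mirror_xml)
    component = class_parents(component_xml)
    lost = {}
    for iri, parents in mirror.items():
        missing = parents - component.get(iri, set())
        if missing:
            lost[iri] = missing
    return lost
```

Given the two NCIT:C118172 excerpts above, this would report NCIT_C34588 as a lost parent of NCIT_C118172. The full NCIT mirror is large, so `ET.iterparse` would likely be a more memory-friendly choice than `ET.fromstring` for real runs.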

@matentzn
Member

Remember that we only care about the neoplasm branch. For NCIT, do not look for evidence outside the neoplasm branch, at least not in the way things are currently set up! I am pretty sure we have discussed this before, so please remember: NCIT = NCIT neoplasm branch. Bedwetting is almost certainly not in there!

So if all the dropped evidence is to subclass relationships outside the neoplasm branch, nothing to worry about. If we want such evidence, that requires an issue and some planning in the next call!
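The criterion above can be checked mechanically: collect the dropped (child, parent) NCIT pairs and keep only those whose child lies under the neoplasm root. If the result is empty, all dropped evidence is outside the branch. This is a minimal Python sketch with hypothetical names. The branch root is a parameter, because the NCIT neoplasm root ID is not given in this thread and should come from the pipeline configuration.

```python
from collections import deque


def descendants(root, children):
    """All terms at or below `root`, given a parent -> children map."""
    seen = {root}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children.get(term, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def dropped_inside_branch(dropped, branch_root, source_parents):
    """Return dropped (child, parent) pairs whose child lies in the branch.

    source_parents: source ID -> set of its direct parents (hypothetical input)
    """
    children = {}
    for child, parents in source_parents.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)
    branch = descendants(branch_root, children)
    return [(c, p) for (c, p) in dropped if c in branch]


# Toy IDs (hypothetical): A and B sit under ROOT; X sits under an unrelated Y.
source_parents = {"A": {"ROOT"}, "B": {"A"}, "X": {"Y"}}
dropped = [("B", "A"), ("X", "Y")]
print(dropped_inside_branch(dropped, "ROOT", source_parents))  # [('B', 'A')]
```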

@twhetzel
Contributor

Ok, it needs to be confirmed that the only subclass evidence being dropped is for classes outside the neoplasm branch. Is this the design that Sabrina agreed to?

@matentzn
Member

matentzn commented Dec 23, 2024

Sabrina agreed that the only part of NCIT we are syncing is the neoplasm branch, so I assume it's clear we are not syncing anything outside it - including subclass confirmations!
