ISSUE-125 Use wikidata to provide skos:definition to owl:Class'es #208

Closed
wants to merge 2 commits into from

lewismc
Member

@lewismc lewismc commented Jul 23, 2020

This PR addresses all prior suggestions made in the related PRs #205, #203, #202, #201, and #200.

@smrgeoinfo it also removes all spurious definitions as you had highlighted.

@brandonnodnarb this addresses all of the issues you pointed out.

Please review and provide any feedback. Thanks

@lewismc lewismc requested a review from brandonnodnarb July 23, 2020 18:26
@lewismc lewismc linked an issue Jul 23, 2020 that may be closed by this pull request
@lewismc lewismc added this to the 3.6.0 milestone Jul 23, 2020
@lewismc lewismc self-assigned this Jul 23, 2020
@smrgeoinfo
Collaborator

smrgeoinfo commented Jul 23, 2020

I did not do a complete pass through all the suggested Wikidata definition mappings. I suspect there are other defs that should be rejected... I can work on that but it will take a few days.

@lewismc
Member Author

lewismc commented Jul 23, 2020

@smrgeoinfo ... no problem. I am in no rush to close this issue. If and when you can review, please do. Thank you

@brandonnodnarb
Member

link to spreadsheet for convenience

I'll start from the last entry and work my way up (as I have time this weekend).

@wdduncan

@lewismc

Have you looked at Chris Mungall's sparql-prog tool for wikidata?: https://github.com/cmungall/sparqlprog_wikidata

It may help mine data from Wikidata.
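For reference, mining definitions by label can be sketched with a plain SPARQL query against the public Wikidata Query Service. This is a minimal sketch and not the sparqlprog tool mentioned above; the function names and the default `limit` are assumptions made for illustration, and nothing here sends a request.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_description_query(label: str, lang: str = "en", limit: int = 10) -> str:
    """Build a SPARQL query returning items whose label equals `label`,
    together with their description in the same language.
    (Hypothetical helper, for illustration only.)"""
    # Escape backslashes and double quotes so the label is a valid string literal.
    safe = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX schema: <http://schema.org/>\n"
        "SELECT ?item ?description WHERE {\n"
        f'  ?item rdfs:label "{safe}"@{lang} ;\n'
        "        schema:description ?description .\n"
        f'  FILTER(LANG(?description) = "{lang}")\n'
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(label: str) -> str:
    """Return a GET URL for the query; fetching it is left to the caller."""
    params = {"query": build_description_query(label), "format": "json"}
    return WDQS_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_description_query("ocean"))
```

Note that a query like this matches on the label alone, so it returns every Wikidata item sharing that label, whatever its domain.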

@lewismc
Member Author

lewismc commented Jul 25, 2020

Hi @wdduncan

Have you looked at Chris Mungall's

Yeah I did previously. Great piece of kit! I think that post-'this issue' I'll approach @cmungall again and see if we can re-run/update some of his previous efforts in this area. Thanks for dropping in.

@rrovetto
Collaborator

rrovetto commented Jul 26, 2020

I added #210 for some recommendations.
So far I spent a few hours reviewing and adding input to the spreadsheet.
If an online spreadsheet is preferred to adding issues, let me know.
I also made this doc to summarize meta considerations/method.
Will continue as able.

@lewismc
Member Author

lewismc commented Jul 26, 2020

Excellent @rrovetto thank you so much. I'll incorporate these into the solution when I get a minute.

@brandonnodnarb
Member

@lewismc are the previous iterations of SWEET available on COR? I ask because, from memory, there were previously Wikipedia definitions in SWEET. I'm not sure which version, or why they were removed, but they may be useful for disambiguation in this task.

@brandonnodnarb
Member

brandonnodnarb commented Aug 19, 2020

Just had a look --- apparently I still have a local copy of at least a few previous versions of SWEET :) On a quick grep through, it appears that there was text in the rdfs:comment tags/files with a [Wikipedia] reference (textual; no link). Stats attached (tsv using txt file type for github conformance).

There is no link, version, or any other information, but there is text which could be compared using a similarity function which may, at minimum, rule out the non-domain faff.

Does this help, or hurt? :)
SWEET_wikipedia_refs_stats.txt
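
The comparison suggested above could look roughly like the following. This is a minimal sketch, assuming the legacy rdfs:comment text and the candidate Wikidata descriptions are already available as plain strings; it uses `difflib.SequenceMatcher` from the standard library as the similarity function, and the threshold and sample strings are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] after case and whitespace normalisation."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def rank_candidates(legacy_comment: str, candidates: list[str], threshold: float = 0.5):
    """Score each candidate definition against the legacy comment text and
    return (score, candidate) pairs at or above the threshold, best first."""
    scored = [(similarity(legacy_comment, c), c) for c in candidates]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


if __name__ == "__main__":
    # Hypothetical data: one on-domain and one off-domain description.
    legacy = "A large body of salt water"
    candidates = [
        "2016 studio album by a rock band",
        "large body of salt water",
    ]
    for score, text in rank_candidates(legacy, candidates, threshold=0.0):
        print(f"{score:.2f}  {text}")
```

A character-level ratio is crude; a token- or embedding-based measure may separate domain and non-domain text better, but the filtering step would keep the same shape.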

@lewismc
Member Author

lewismc commented Aug 20, 2020

@brandonnodnarb

are the previous iterations of SWEET available on COR?

Yes, for any given resource just navigate to the versions pulldown and click on whichever version you wish to view : )

@lewismc
Member Author

lewismc commented Sep 4, 2020

Hi folks, any further comments here? Thank you

@brandonnodnarb
Member

It's still on my radar. Haven't had time to dig into it properly.

@pbuttigieg
Collaborator

Discussed on today's SemTech call.

General feeling: as long as this doesn't overwrite the Semantic Harmonization work, and the issue of label/domain matches (see below) is dealt with, we can move forward.

Things should be clear as long as we're clear who (e.g. Wikidata, ENVO) is making the definitional claim (e.g. by annotating the definition annotation property).
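
One way to record who is making the definitional claim is an OWL 2 axiom annotation on the skos:definition triple. The sketch below only formats Turtle text; the helper name, the example class IRI, the placeholder Wikidata identifier, and the choice of `dcterms:source` as the provenance property are all assumptions for illustration, not what this PR does.

```python
# Prefix declarations for the standard OWL, SKOS and Dublin Core Terms namespaces.
PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
)


def definition_with_source(class_iri: str, definition: str, source_iri: str, lang: str = "en") -> str:
    """Return Turtle asserting a skos:definition on a class, plus an owl:Axiom
    annotation stating where the definition came from.
    (Hypothetical helper; dcterms:source is an assumed choice of property.)"""
    # Escape backslashes and double quotes for a Turtle string literal.
    text = definition.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f'<{class_iri}> skos:definition "{text}"@{lang} .\n'
        "[] a owl:Axiom ;\n"
        f"   owl:annotatedSource <{class_iri}> ;\n"
        "   owl:annotatedProperty skos:definition ;\n"
        f'   owl:annotatedTarget "{text}"@{lang} ;\n'
        f"   dcterms:source <{source_iri}> .\n"
    )


if __name__ == "__main__":
    print(PREFIXES)
    print(definition_with_source(
        "http://example.org/sweet/Ocean",            # hypothetical class IRI
        "large body of salt water",                  # made-up definition text
        "http://www.wikidata.org/entity/Q_EXAMPLE",  # placeholder, not a real item
    ))
```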

The issue of a lack of domain matching in favour of simple label matching presents a major issue - some are simply wrong.
Attempting to match class hierarchies in Wikidata / SWEET is likely to be helpful - note the semantics here are constrained to structured labels.
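
A hierarchy check of this kind could be sketched as below: a label match is kept only if the ancestor labels on both sides overlap. This is a minimal sketch, assuming the SWEET ancestors and the Wikidata superclass labels have already been collected; the data structure, the Jaccard measure, the threshold, and the sample labels are hypothetical.

```python
def normalise(label: str) -> str:
    """Lower-case and collapse whitespace so labels compare consistently."""
    return " ".join(label.lower().split())


def ancestor_overlap(sweet_ancestors: list[str], wikidata_ancestors: list[str]) -> float:
    """Jaccard overlap between two sets of ancestor labels (0.0 if either is empty)."""
    a = {normalise(x) for x in sweet_ancestors}
    b = {normalise(x) for x in wikidata_ancestors}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def domain_matches(sweet_ancestors: list[str], candidates: list[dict], min_overlap: float = 0.2) -> list[dict]:
    """Keep only candidates whose ancestor labels overlap enough with the SWEET class's."""
    return [
        c for c in candidates
        if ancestor_overlap(sweet_ancestors, c["ancestors"]) >= min_overlap
    ]


if __name__ == "__main__":
    # Hypothetical example: two items share the label "plume".
    sweet_ancestors = ["Phenomena", "Fluid Phenomena"]
    candidates = [
        {"id": "candidate-1", "ancestors": ["feather", "animal body part"]},
        {"id": "candidate-2", "ancestors": ["fluid phenomena", "physical phenomenon"]},
    ]
    print([c["id"] for c in domain_matches(sweet_ancestors, candidates)])
```

Exact label overlap is strict; in practice the ancestor labels would probably need the same kind of fuzzy matching as the class labels themselves.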

Splitting this PR into realm-by-realm tasks may be a better way to focus work and spot issues in a contained space.
There should be a human review process to curate the auto-population, or the definitions should be kept in an experimental or development branch. This branch can be coupled with a pre-release version (e.g. like the OBO *-edit.owl files).

@lewismc lewismc added the esipwinter2021 label (Work which will feature at the ESIP Winter 2021 meeting) Jan 13, 2021
@lewismc
Member Author

lewismc commented Jan 29, 2021

Now that #211 is resolved, I will revisit this issue and resolve the conflicts.

@brandonnodnarb
Member

If interested, we could sort out #218 first and then work on developing a more intelligent filter for matching and generating results.

@lewismc
Member Author

lewismc commented Jan 29, 2021 via email

@brandonnodnarb
Member

Now that #218 is closed and the PR (#246) is merged into master, I'd like to revisit the approach. This likely has broader implications for automating (or semi-automating) the addition of definitions from multiple resources.
Related to #225

@lewismc
Member Author

lewismc commented May 28, 2021

Nice work @brandonnodnarb I'm happy to close this one off and regenerate the PR.
I'll try and get a PR together over the weekend.

@lewismc lewismc closed this May 28, 2021
@lewismc lewismc deleted the ISSUE-125 branch May 28, 2021 04:34
@lewismc
Member Author

lewismc commented May 28, 2021

I think we should also push 3.5.0 once we have merged this into the master branch.

Labels
enhancement, esipwinter2021 (Work which will feature at the ESIP Winter 2021 meeting)
6 participants