ISSUE-125 Use wikidata to provide skos:definition to owl:Class'es #203

Closed
wants to merge 1 commit into from

lewismc
Member

@lewismc lewismc commented Jul 20, 2020

PR replaces #202

Please let me know feedback.

@graybeal there should be no conflicts on the definitions.

@pbuttigieg
Collaborator

Are the dcterms:creator annotations (e.g. here) at the class level or the definition level? In either case, it's somewhat misleading unless @lewismc created the classes de novo.

Otherwise, this is a pretty good SKOS level solution to filter Wikidata for the SWEET user base.

However, I think the wikidata defs should be superseded once we get expert input / definitions from activities like the Semantic Harmonization Cluster's cryohackathons (@rduerr). We need a way to identify those activities as definition sources.

@graybeal
Collaborator

@pbuttigieg the annotations are at the definition level, so I'm not sure why you think that's misleading. The definition is declared with a blank node, and the blank node has 4 annotations including dcterms:creator, so it seems clear to me that it applies to the definition that was previously declared. (Otherwise its subject would be :HumanActivity, right?)

On the other hand, I'm not sure why all the prefixes were replaced by just the colons. I guess it's just one more default parsing pattern simplification to master…? It actually makes things a little less clear to me, but if it works for others I'll come around.

I do agree that the cryohackathon activity needs its own identifier to use as the source for any definitions it supplies; that will be a big win! But I don't agree that wikidata defs should be superseded unless there is a consensus that they are faulty.

The argument I am putting forward is that SWEET is not going to become authoritative as to what is the "best" definition. Even when definitions come from experts, sooner or later there are going to be other experts who have created their own favorite way of defining the world. Even if you think having "one answer to rule them all" is the better principle, as a practical matter I don't think the SWEET team will be spending its time wisely if it puts itself in the role of adjudicating which sources, including which expert teams, should have their definitions supersede other definitions.

@graybeal
Collaborator

Lewis, big picture I'm ready to approve (or whatever it is) the change.

But at a detailed level, after 10 minutes I've found two individual cases that need rejection, and one that is arguable:

I think given that many issues in 1.2 files, it is best to wait, so as to not include too many conspicuously wrong entries.

It's hard to parse in the existing format, so I can't really justify another 5 hours or so reviewing these one at a time. But if someone can put them into a spreadsheet (just need concept IRI and definition strings), maybe on Google sheets is best, I could probably review all of them in an hour, and mark them up for rejecting or further evaluating. And it would make it easy for others to check my work.

(Sorry, I could maybe build that table in an hour, or maybe not, but I need to get some other stuff done for a little bit.)

@lewismc
Member Author

lewismc commented Jul 20, 2020

On the other hand, I'm not sure why all the prefixes were replaced by just the colons. I guess it's just one more default parsing pattern simplification to master…? It actually makes things a little less clear to me, but if it works for others I'll come around.

I kinda agree here as well. This shorthand writing method is the default in the OWLAPI Java codebase as well.

You may have also noticed that some prefixes are removed...I'm working on addressing some of these issues but that is another issue.

@brandonnodnarb
Member

brandonnodnarb commented Jul 20, 2020

Thoughts from an admittedly quick review:

  1. this is pretty rad.

  2. The lack of prefixes should be a ttl shorthand for base. I.e. no need to prefix self. This seems to be consistent from my spot check, but I have not looked thoroughly.

  3. It may be appropriate to use skos:related for the automated linkage, with a more refined semantics --- e.g. skos:closeMatch or skos:exactMatch (or whatever) --- reserved for verified relationships. I'm assuming that any/all verification is going to be a human task, at least for the time being, and this could be an easy way to find/query "verified" definitions versus automated linkages without conflating it with accuracy, efficacy, etc.

  4. I propose creating a separate graph/file for contributions --- i.e. contributions.ttl --- which contains all the provenance.... metadata: Contributions, creators, edits, etc.

  5. Following from 4, it probably also makes sense then to create a contributors.ttl file where any and everyone can add their desired dc: info, probably similar to what is currently described on the recognition page.

  6. Following from 3 and 5, something along the lines of a sweet:verifiedBy relation may also be appropriate. Perhaps as a type of prov:Activity?

  7. The matching is entirely based on string matching against the labels, correct? At any rate, it may be beneficial, particularly for verification, if there were a table listing: realm | NL label | wikidata link | wikidata def

If such a table existed it may make verification a bit more straightforward as it can be filtered and read/interpreted without the surrounding...faff.

@lewismc is this type of table/csv easily outputted from your scala code, or is there a need for a separate filtering script or SPARQL query?
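A minimal sketch of the review table proposed in point 7, assuming the matches are already available as in-memory records. The helper name `write_review_table`, the column names, the extra `review` column, and the sample record are all hypothetical; none of them are taken from the Scala code in this PR, and the Wikidata link and definition shown are placeholders rather than real Wikidata content.

```python
import csv
import io

# Column order follows the table proposed in point 7 (names are assumptions).
COLUMNS = ["realm", "label", "wikidata_link", "wikidata_def"]


def write_review_table(rows, out):
    """Write one CSV row per matched concept, plus an empty 'review' column."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS + ["review"])
    writer.writeheader()
    for row in rows:
        record = {name: row.get(name, "") for name in COLUMNS}
        record["review"] = ""  # left blank for a human reviewer to fill in
        writer.writerow(record)


# Placeholder record: illustrative values only, not real Wikidata content.
sample = [{
    "realm": "human",
    "label": "Human Activity",
    "wikidata_link": "https://www.wikidata.org/entity/Q0",
    "wikidata_def": "placeholder definition, with a comma",
}]

buffer = io.StringIO()
write_review_table(sample, buffer)
table_text = buffer.getvalue()
```

Writing through the `csv` module rather than joining strings by hand keeps definitions that contain commas or quotes intact when the file is imported into a spreadsheet.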

@brandonnodnarb
Member

ach. Ok, so 4-6 in my previous comment should probably be their own issues for discussion. If any of you think they deserve discussion I can add them as issues with at least a minimal proposal.

@brandonnodnarb
Member

brandonnodnarb commented Jul 20, 2020

I would approve this for a 'development' branch.

In a related note, would stable and development branches make sense for this, hopefully burgeoning, crowd? :)

EDIT: added the latter here --- #204

@lewismc
Member Author

lewismc commented Jul 20, 2020

The matching is entirely based on string matching against the labels, correct?

Correct

@lewismc is this type of table/csv easily outputted from your scala code, or is there a need for a separate filtering script or SPARQL query?

I will go ahead and create the table as John also requested it.

@rrovetto
Collaborator

rrovetto commented Jul 20, 2020

"But if someone can put them into a spreadsheet (just need concept IRI and definition strings), maybe on Google sheets is best, I could probably review all of them in an hour, and mark them up for rejecting or further evaluating. And it would make it easy for others to check my work."

update: exported the module https://github.com/lewismc/sweet/blob/3f6ede0dfbfd6c6c5e1b47b13011a12c08224c98/src/human.ttl for example but some content wasn't displayed. Trying another way.

@lewismc
Member Author

lewismc commented Jul 21, 2020

@rrovetto

To confirm, is this the correct source?...

Correct source for what? That is a collection of all of the file-level base URIs for the entire SWEET ontology suite.

@dr-shorthair
Collaborator

Attempting to load in TopBraid so I can run the SPARQL: a lot of errors from mis-formatted xsd:dateTime and xsd:dateTimeStamp :-(
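One way to locate literals like these before loading the files is a plain-text scan. The sketch below is a rough check under several assumptions: it only looks at literals written in the prefixed form `"..."^^xsd:dateTime` or `"..."^^xsd:dateTimeStamp` on a single line, it checks the lexical pattern but not calendar validity (for example, February 30 would pass), and the function name `find_bad_datetimes` and the sample triples (including the use of `dcterms:created`) are invented for illustration, not copied from the SWEET files.

```python
import re

# Lexical form of xsd:dateTime; the trailing timezone group is optional.
DATETIME = re.compile(
    r"-?(?:[1-9]\d{3,}|0\d{3})-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])"
    r"T(?:(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d(?:\.\d+)?|24:00:00(?:\.0+)?)"
    r"(Z|[+-](?:(?:0\d|1[0-3]):[0-5]\d|14:00))?"
)

# Typed literals written as "..."^^xsd:dateTime or "..."^^xsd:dateTimeStamp.
LITERAL = re.compile(r'"([^"]*)"\^\^xsd:(dateTimeStamp|dateTime)\b')


def find_bad_datetimes(turtle_text):
    """Return (line number, datatype, value) for each suspicious literal."""
    bad = []
    for line_no, line in enumerate(turtle_text.splitlines(), start=1):
        for value, datatype in LITERAL.findall(line):
            match = DATETIME.fullmatch(value)
            # xsd:dateTimeStamp additionally requires an explicit timezone.
            needs_tz = datatype == "dateTimeStamp"
            if match is None or (needs_tz and match.group(1) is None):
                bad.append((line_no, datatype, value))
    return bad


# Invented sample lines; the second and third are deliberately malformed.
sample = '''
:a dcterms:created "2020-07-20T10:15:30Z"^^xsd:dateTimeStamp .
:b dcterms:created "2020-07-20"^^xsd:dateTime .
:c dcterms:created "2020-07-20T10:15:30"^^xsd:dateTimeStamp .
'''
problems = find_bad_datetimes(sample)
```

A full RDF parser would give more reliable results than a regular-expression scan; this version is only meant to produce a quick list of line numbers to inspect.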

@dr-shorthair
Collaborator

in phenCryo and realmCryo ...

6 participants