Proposal to include a property for "type of data" contained in a dataset #1548

Open
svituz opened this issue Nov 23, 2022 · 6 comments
Labels: dcat:theme, dcat, feedback (issues stemming from external feedback to the WG), future-work (issue deferred to the next standardization round)

Comments

svituz commented Nov 23, 2022

I think it would be useful for Dataset to have a property that indicates the type of data it contains (such as dcat:collectedData or dcat:typeOfData).
For example, for a Dataset with clinical data, such a property could take its values from controlled vocabularies or ontologies (e.g. LOINC, SNOMED) to express concepts such as "Laboratory Exams", "Vital Signs Observation", etc.
I think this would increase the findability of datasets, in other domains as well.

Any thoughts about this?

@rob-metalinkage
Contributor

This speaks to the ambiguity in the semantics of dcterms:conformsTo... does it relate to the subject or the resource the subject describes?

Any data service is going to have at least 5 different conformance aspects, including access method, data model, data content type and service level.

Possibly an aspect-oriented qualified relationship is needed.

Dimensions of data may be separate aspects, or a compound aspect. RDF Data Cube provides quite a powerful starting point for this.

andrea-perego added the dcat and feedback labels on Nov 23, 2022
andrea-perego added this to the DCAT3 CR milestone on Nov 23, 2022
@andrea-perego
Contributor

@svituz, your requirement is not completely clear to me.

Is it about conformity and data structure definition, as per @rob-metalinkage's comment? Or is it about the classification of a dataset, which in DCAT is done via dcat:theme?

It would be useful if you could provide a full example, ideally also with its RDF representation.

svituz commented Nov 23, 2022

@andrea-perego let's say I have a Dataset that collects data about a clinical study on a specific disease (the theme). The data in the Dataset consists of laboratory exams. I think these are two different concepts.
An RDF example would be:

:studyDataset1 a dcat:Dataset ;
    dcat:theme icd10:C50 ;
    dcat:collectedData obo:NCIT_C25294 .

This RDF would describe, for example, a Dataset containing laboratory exams from clinical cases of breast cancer.

You can have another Dataset with the same theme (the disease) but a different type of data, for example digital X-ray:

:studyDataset2 a dcat:Dataset ;
    dcat:theme icd10:C50 ;
    dcat:collectedData obo:NCIT_C18001 .
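
To illustrate the findability argument, here is a minimal Python sketch that filters the two datasets by theme and by type of data. Plain tuples stand in for an RDF store, `find_datasets` and `STUDY_TRIPLES` are hypothetical names, and dcat:collectedData is the property proposed in this issue, not an existing DCAT term.

```python
# Triples mirroring the two example datasets above (prefixed names kept as strings).
# Assumption: dcat:collectedData is the property proposed in this issue, not part of DCAT.
STUDY_TRIPLES = {
    (":studyDataset1", "rdf:type", "dcat:Dataset"),
    (":studyDataset1", "dcat:theme", "icd10:C50"),
    (":studyDataset1", "dcat:collectedData", "obo:NCIT_C25294"),
    (":studyDataset2", "rdf:type", "dcat:Dataset"),
    (":studyDataset2", "dcat:theme", "icd10:C50"),
    (":studyDataset2", "dcat:collectedData", "obo:NCIT_C18001"),
}


def find_datasets(triples, theme=None, data_type=None):
    """Return the sorted identifiers of datasets matching the optional filters."""
    # Start from every subject typed as dcat:Dataset.
    datasets = {s for s, p, o in triples if p == "rdf:type" and o == "dcat:Dataset"}
    if theme is not None:
        # Keep only datasets classified under the requested theme.
        datasets &= {s for s, p, o in triples if p == "dcat:theme" and o == theme}
    if data_type is not None:
        # Keep only datasets declaring the requested type of data.
        datasets &= {
            s for s, p, o in triples
            if p == "dcat:collectedData" and o == data_type
        }
    return sorted(datasets)
```

Filtering on the theme alone returns both datasets, while adding the data type narrows the result to one of them, which is the distinction the proposed property is meant to capture.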

Even if dcterms:conformsTo relates to the resource and not the subject (I had assumed the second scenario), I see it as applicable for indicating a model that describes the structure of the laboratory exams, not for stating that the Dataset contains laboratory exams.

I hope this clarifies my issue and that it makes sense. I searched extensively in DCAT (and outside of it) for a way to solve this problem but couldn't find a satisfying solution.

If you're interested in the context, here you can read about it

davebrowning added the future-work label on Feb 13, 2023
@init-dcat-ap-de

Could you use the PROV Ontology?

@rob-metalinkage
Contributor

I'll circle back to my comment: there are multiple aspects. Do you want a property for every possible one, or a flexible mechanism to support documentation?

A well-known ontology could provide predicates (such as prov:wasGeneratedBy).

In this case you need to describe the data structure (typically container organisation or a custom application schema, and specialised profiles thereof), the data dimensions (e.g. RDF Data Cube), the nature of the data elements within the containers, etc.

Profiles of DCAT supporting available descriptive ontologies would be better than half-implementing this via a limited set of properties.

@bertvannuffelen

@svituz when reading your case I get the feeling it could be resolved, as @rob-metalinkage mentioned, by creating a proper DCAT profile.

For your specific profiling case, DCAT has 3 options for classifications:

  • keywords, which are literals
  • subjects, which are concepts usually drawn from some classification
  • themes, which are a special kind of subject

In the DCAT-AP ecosystem we regularly encounter this need to classify datasets according to domain-specific requirements.
There are two strategies here:

  • an aggregating one: use dct:subject directly
  • a precise, distinctive one: use a subproperty of dct:subject

The second option is the safest when dealing with datasets that must be documented by multiple DCAT profiles. It means that in your domain you can express a specific set of constraints on that subproperty, and that any other user of the metadata can treat it as if it were dct:subject.

:studyDataset1 a dcat:Dataset ;
    dcat:theme icd10:C50 ;
    profile:collectionOrigin obo:NCIT_C25294 .

profile:collectionOrigin rdfs:subPropertyOf dct:subject .
profile:collectionOrigin rdfs:comment "The origin of data collection"@en .
profile:collectionOrigin skos:note "This is indicated using the NCIT classification"@en .

You can use this pattern for as many classifications as you want without losing compatibility with DCAT (because of the subPropertyOf relationship).
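
As a rough illustration of why the subPropertyOf relationship preserves compatibility, the following Python sketch applies the RDFS subproperty rule (rdfs7) to the example above: a consumer that only knows dct:subject still finds the classification once the rule has been applied. Plain tuples stand in for an RDF store, and `apply_subproperty_entailment` and `PROFILE_TRIPLES` are hypothetical names; a real deployment would more likely rely on an RDFS-aware triple store or reasoner.

```python
# Triples mirroring the profile example above (prefixed names kept as strings).
PROFILE_TRIPLES = {
    (":studyDataset1", "rdf:type", "dcat:Dataset"),
    (":studyDataset1", "dcat:theme", "icd10:C50"),
    (":studyDataset1", "profile:collectionOrigin", "obo:NCIT_C25294"),
    ("profile:collectionOrigin", "rdfs:subPropertyOf", "dct:subject"),
}


def apply_subproperty_entailment(triples):
    """Return the triples plus those entailed by rdfs:subPropertyOf (rule rdfs7)."""
    # Map each property to the super-properties declared for it in the data.
    supers = {}
    for s, p, o in triples:
        if p == "rdfs:subPropertyOf":
            supers.setdefault(s, set()).add(o)

    inferred = set(triples)
    changed = True
    # Repeat until no new triple appears, so chains of subproperties are covered.
    while changed:
        changed = False
        for s, p, o in list(inferred):
            for super_property in supers.get(p, ()):
                entailed = (s, super_property, o)
                if entailed not in inferred:
                    inferred.add(entailed)
                    changed = True
    return inferred
```

After applying the rule, the set contains the triple (":studyDataset1", "dct:subject", "obo:NCIT_C25294") alongside the original profile-specific statement, so generic DCAT tooling and profile-aware tooling can both use the same metadata.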


6 participants