Proposal to include a property for "type of data" contained in a dataset #1548

svituz · 2022-11-23T14:05:20Z

I think it would be useful for Dataset to have a property that indicates the type of data it contains (such as dcat:collectedData or dcat:typeOfData).
For example, for a Dataset with clinical data, such a property could take its values from controlled vocabularies or ontologies (e.g. LOINC, SNOMED) to express concepts such as "Laboratory Exams", "Vital Signs Observation", etc.
I think it would increase the findability of datasets, in other domains as well.

Any thoughts about this?

rob-metalinkage · 2022-11-23T19:02:04Z

This speaks to the ambiguity in the semantics of dcterms:conformsTo... does it relate to the subject or the resource the subject describes?

Any data service is going to have at least 5 different conformance aspects, e.g. access method, data model, data content type and service level.

Possibly an aspect oriented qualified relationship is needed.

Dimensions of data may be separate aspects, or a compound aspect. The RDF Data Cube vocabulary provides quite a powerful starting point for this.

andrea-perego · 2022-11-23T21:24:49Z

@svituz, your requirement is not completely clear to me.

Is it about conformity and data structure definition, as per @rob-metalinkage's comment? Or is it about the classification of a dataset, which in DCAT is done via dcat:theme?

It would be useful if you could provide a full example, ideally also with its RDF representation.

svituz · 2022-11-23T22:10:43Z

@andrea-perego let's say I have a Dataset that collects data about a clinical study regarding a specific disease (the theme). The data you can find in the Dataset contains laboratory exams. I think these are two different concepts.
The RDF example would be:

:studyDataset1 a dcat:Dataset ;
    dcat:theme icd10:C50 ;
    dcat:collectedData obo:NCIT_C25294 .

This RDF would describe a Dataset containing laboratory exams of clinical cases with breast cancer, for example.

You can have another one with the same theme (the disease) but a different type of data, for example digital X-ray:

:studyDataset2 a dcat:Dataset ;
    dcat:theme icd10:C50 ;
    dcat:collectedData obo:NCIT_C18001 .
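
To make the findability argument concrete, here is a minimal Python sketch that treats the two descriptions above as plain triples and filters them by theme and by type of data. It assumes the proposed dcat:collectedData property, which is not part of DCAT. The helper name find_datasets is hypothetical, and a real catalogue would query an RDF store with SPARQL rather than scan tuples.

```python
# Triples from the two examples above, as (subject, predicate, object) strings.
# dcat:collectedData is the property proposed in this issue, not a DCAT term.
TRIPLES = [
    (":studyDataset1", "rdf:type", "dcat:Dataset"),
    (":studyDataset1", "dcat:theme", "icd10:C50"),
    (":studyDataset1", "dcat:collectedData", "obo:NCIT_C25294"),
    (":studyDataset2", "rdf:type", "dcat:Dataset"),
    (":studyDataset2", "dcat:theme", "icd10:C50"),
    (":studyDataset2", "dcat:collectedData", "obo:NCIT_C18001"),
]


def find_datasets(triples, criteria):
    """Return the sorted subjects that carry every (predicate, object) pair in criteria."""
    facts = set(triples)
    subjects = {s for s, _, _ in facts}
    return sorted(
        s for s in subjects
        if all((s, p, o) in facts for p, o in criteria.items())
    )


# The theme alone cannot tell the two datasets apart.
by_theme = find_datasets(TRIPLES, {"dcat:theme": "icd10:C50"})
print(by_theme)  # [':studyDataset1', ':studyDataset2']

# Adding the type of data narrows the result to a single dataset.
by_theme_and_data = find_datasets(
    TRIPLES,
    {"dcat:theme": "icd10:C50", "dcat:collectedData": "obo:NCIT_C25294"},
)
print(by_theme_and_data)  # [':studyDataset1']
```

The theme alone returns both datasets; only the additional type-of-data criterion separates them, which is the gap this proposal describes.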

Even if dcterms:conformsTo relates to the resource and not to the subject (I had assumed the second scenario), I see it as applicable for indicating a model that describes the structure of the laboratory exams, not for saying that the dataset contains laboratory exams.

I hope this clarifies my issue and that it makes sense. I searched extensively in DCAT (and outside of it) for a solution to this issue but couldn't find a satisfying one.

If you're interested in the context, here you can read about it

init-dcat-ap-de · 2023-08-09T09:00:28Z

Could you use the PROV Ontology?

rob-metalinkage · 2023-08-11T01:15:48Z

I'll circle back to my comment: there are multiple aspects. Do you want a property for every possible one, or a flexible mechanism to support documentation?

A well-known ontology could provide predicates (such as prov:wasGeneratedBy).

In this case you need to describe data structure: typically container organisation or custom application schema (and specialised profiles thereof), data dimensions (e.g. RDF Data Cube), nature of data elements within the containers, etc.

Profiles of DCAT supporting available descriptive ontologies would be better than half-implementing via a limited set of properties.

bertvannuffelen · 2024-04-29T08:05:12Z

@svituz when reading your case I get the feeling it could be resolved, as @rob-metalinkage mentioned, by creating a proper DCAT profile.

For your specific profiling case, DCAT has 3 options for classifications:

keywords, which are literals
subjects, which are concepts usually expected to come from some classification
themes, which are a special kind of subject

In the DCAT-AP ecosystem we regularly encounter the need to classify datasets according to domain-specific needs.
There are two strategies here:

an aggregation one: use dct:subject directly
a precise, distinctive one: use a subproperty of dct:subject

The second option is the safest when datasets must be documented by multiple DCAT profiles. It means that in your domain you can express a specific set of constraints on that property, and that any other user of that metadata can use it as if it were dct:subject.

:studyDataset1 a dcat:Dataset ;
    dcat:theme icd10:C50 ;
    profile:collectionOrigin obo:NCIT_C25294 .

profile:collectionOrigin rdfs:subPropertyOf dct:subject .
profile:collectionOrigin rdfs:comment "The origin of data collection"@en .
profile:collectionOrigin skos:note "This is indicated using the NCIT classification"@en .

You can use this pattern for as many classifications as you want without losing compatibility with DCAT (because of the subPropertyOf relationship).
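
The compatibility claim rests on the RDFS subproperty rule: if p is rdfs:subPropertyOf q, every statement "s p o" also entails "s q o". Below is a minimal Python sketch of that entailment over plain tuples, using the profile:collectionOrigin property from the example above. The function name infer_superproperties is illustrative only; a real deployment would more likely rely on an RDFS-aware triple store or reasoner.

```python
SUBPROPERTY = "rdfs:subPropertyOf"


def infer_superproperties(triples):
    """Apply the rule (s p o), (p rdfs:subPropertyOf q) => (s q o) until nothing new appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        # Map each property to its directly declared super-properties.
        parents = {}
        for s, p, o in inferred:
            if p == SUBPROPERTY:
                parents.setdefault(s, set()).add(o)
        # Restate every triple with each super-property of its predicate.
        for s, p, o in list(inferred):
            for q in parents.get(p, ()):
                if (s, q, o) not in inferred:
                    inferred.add((s, q, o))
                    changed = True
    return inferred


# The profile-specific description from the example above, as plain tuples.
data = [
    (":studyDataset1", "rdf:type", "dcat:Dataset"),
    (":studyDataset1", "dcat:theme", "icd10:C50"),
    (":studyDataset1", "profile:collectionOrigin", "obo:NCIT_C25294"),
    ("profile:collectionOrigin", SUBPROPERTY, "dct:subject"),
]

closure = infer_superproperties(data)

# A consumer that only knows dct:subject still finds the classification.
subjects = sorted(
    o for s, p, o in closure
    if s == ":studyDataset1" and p == "dct:subject"
)
print(subjects)  # ['obo:NCIT_C25294']
```

The original profile:collectionOrigin statement stays in the data, so profile-aware consumers keep the precise meaning while generic ones fall back to dct:subject.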

andrea-perego added the dcat and feedback (Issues stemming from external feedback to the WG) labels Nov 23, 2022

andrea-perego added this to the DCAT3 CR milestone Nov 23, 2022

andrea-perego added the dcat:theme label Nov 23, 2022

andrea-perego mentioned this issue Nov 23, 2022

Update ack section for CR #1549

Closed

davebrowning added the future-work (issue deferred to the next standardization round) label Feb 13, 2023

davebrowning modified the milestones: DCAT3 CR, DCAT Future Priority Work Feb 13, 2023
