#14 #13 add/ update READMEs to explain csv generation and cleaning
SArndt-TIB committed Jun 19, 2024
1 parent dca4d23 commit d862729
Showing 2 changed files with 36 additions and 22 deletions.
45 changes: 23 additions & 22 deletions README.md
@@ -9,52 +9,53 @@ We decided to build upon this work and build and RDF based ontology, for the *DF

![](./docs/dfgfo-hierarchies.png)

## Ontology


## Ontology
* **Ontology TTL**: [dfgfo.ttl](./dfgfo.ttl)
* **Ontology IRI**: https://github.com/tibonto/dfgfo/
* **Ontology IRI**: <https://github.com/tibonto/dfgfo/>
* **Ontology PURL**: <https://raw.githubusercontent.com/tibonto/DFG-Fachsystematik-Ontology/main/dfgfo.ttl>
* **ontology prefix/id**: `dfgfo`


## Create/update ontology
## Create/update ontology

**The [dfgfo.ttl](./dfgfo.ttl) ontology file is created by the [scripts/create_ontology.py](./scripts/create_ontology.py) Python script**, which
* parses the DFG classification system encoded in [csv/Fachsystematik_2020-2024.csv](./csv/Fachsystematik_2020-2024.csv) (in EN/DE)

* parses the DFG classification system encoded in csv/Fachsystematik_20XX-20XX.csv (in EN/DE) (cf. directory [csv/](/csv/) and [csv/README.md](/csv/README.md))
* encodes each of the DFG's classification subjects (in .csv cells) into RDF graph triples
* of type `owl:Class`
* with `rdfs:label` in EN and `skos:altLabel` in DE
* subsumed under the parent subject with `rdfs:subClassOf` according to the DFG Classification hierarchy
* of type `owl:Class`
* with `rdfs:label` in EN and `skos:altLabel` in DE
* subsumed under the parent subject with `rdfs:subClassOf` according to the DFG Classification hierarchy
* parses the metadata triples from [metadata.ttl](./metadata.ttl) into a graph
* joins metadata and DFG classification graphs into [dfgfo.ttl](./dfgfo.ttl)
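
The encoding step listed above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library and plain tuples. It is not the code of `create_ontology.py`, which may well use an RDF library instead. Only `Subject Number` is a column name mentioned in this repository; `Subject`, `Fach`, and `Parent Number`, as well as the sample rows and the way class IRIs are formed, are assumptions made for the example.

```python
import csv
import io

BASE = "https://github.com/tibonto/dfgfo/"  # ontology IRI given in this README

# Hypothetical sample data; the real CSV layout may differ.
SAMPLE = """Subject Number,Subject,Fach,Parent Number
1,Humanities and Social Sciences,Geistes- und Sozialwissenschaften,
11,Humanities,Geisteswissenschaften,1
"""


def rows_to_triples(rows):
    """Return (subject, predicate, object) tuples for each classification row."""
    triples = []
    for row in rows:
        subject = BASE + row["Subject Number"]  # assumed IRI pattern
        triples.append((subject, "rdf:type", "owl:Class"))
        triples.append((subject, "rdfs:label", (row["Subject"], "en")))
        triples.append((subject, "skos:altLabel", (row["Fach"], "de")))
        # Top-level subjects have no parent, so no subClassOf triple is added.
        if row["Parent Number"]:
            parent = BASE + row["Parent Number"]
            triples.append((subject, "rdfs:subClassOf", parent))
    return triples


triples = rows_to_triples(csv.DictReader(io.StringIO(SAMPLE)))
```

Each row yields one class with an English label and a German alternative label; rows with a parent additionally get a `rdfs:subClassOf` triple.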


**Run**
### Run

Create a python3 Virtual Environment

Install requirements `pip install -r scripts/requirements.txt`

Run script to create ontology `python scripts/create_ontology.py`. Make sure to use end of line sequence `LF` for [/csv/Fachsystematik_2020-2024.csv](/csv/Fachsystematik_2020-2024.csv).
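
If the CSV was saved with Windows or classic Mac line endings, one possible way to convert it to `LF` is sketched below. The helper names `to_lf` and `normalize_file` are hypothetical and not part of this repository; most editors can also switch the end of line sequence directly.

```python
from pathlib import Path


def to_lf(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def normalize_file(path: str) -> None:
    """Rewrite the file at `path` in place with LF line endings."""
    file_path = Path(path)
    file_path.write_bytes(to_lf(file_path.read_bytes()))
```

Note that this also converts line breaks inside quoted cells, so it is worth reviewing the file afterwards.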


## Other scripts

* [scripts/parse_csv.py](./scripts/parse_csv.py) parses the CSV and ensures that the columns `Subject Number` and `Fachnummer` have the same values

## Ontology contributions:
Contributions are welcome.

At every push or pull_request a [ROBOT report](http://robot.obolibrary.org/report) and [ROBOT validate OWL DL profile](http://robot.obolibrary.org/validate-profile)test will be run from [.github/workflows/main.yml](.github/workflows/main.yml).
## Ontology contributions

Contributions are welcome.

At every push or pull_request a [ROBOT report](http://robot.obolibrary.org/report) and [ROBOT validate OWL DL profile](http://robot.obolibrary.org/validate-profile) test will be run from [.github/workflows/main.yml](.github/workflows/main.yml).

## DFG Classification of Scientific Disciplines

* [PDF(en)](https://www.dfg.de/download/pdf/dfg_im_profil/gremien/fachkollegien/amtsperiode_2020_2024/fachsystematik_2020-2024_en_grafik.pdf)
* [PDF(de)](https://www.dfg.de/download/pdf/dfg_im_profil/gremien/fachkollegien/amtsperiode_2020_2024/fachsystematik_2020-2024_de_grafik.pdf)
* [HTML page](https://www.dfg.de/en/dfg_profile/statutory_bodies/review_boards/subject_areas/index.jsp)
* [Edited CSV - combining both German and English labels](./csv/Fachsystematik_2020-2024.csv) (this repo)


* [HTML page](https://www.dfg.de/en/research-funding/proposal-funding-process/interdisciplinarity/subject-area-structure)
* PDFs
* 2020-2024
* [PDF(en)](https://www.dfg.de/download/pdf/dfg_im_profil/gremien/fachkollegien/amtsperiode_2020_2024/fachsystematik_2020-2024_en_grafik.pdf)
* [PDF(de)](https://www.dfg.de/download/pdf/dfg_im_profil/gremien/fachkollegien/amtsperiode_2020_2024/fachsystematik_2020-2024_de_grafik.pdf)
* 2024-2028
* [PDF(en)](https://www.dfg.de/resource/blob/331950/85717c3edb9ea8bd453d5110849865d3/fachsystematik-2024-2028-en-data.pdf)
* [PDF(de)](https://www.dfg.de/resource/blob/331944/33422f091e941592cdc355038a865e03/fachsystematik-2024-2028-de-data.pdf)
* Edited CSV - combining both German and English labels
* [2020-2024](/csv/2020-2024/Fachsystematik_2020-2024.csv) (this repo)
* [2024-2028](/csv/2024-2028/Fachsystematik_2024-2028.csv) (this repo)
13 changes: 13 additions & 0 deletions csv/README.md
@@ -0,0 +1,13 @@
# Tabular input for ontology creation

The DFG Fächerklassifikation is published in .pdf format by DFG. Fortunately, the DFG also made available some tabular data in the form of two .xlsx files - one containing the German language version of the Fachsystematik, the other containing the English language version of the Fachsystematik. In order to create the ontology file, we need to process the data with a script that requires a .csv file as input.

## Creating the .csv input for [create_ontology.py](/scripts/create_ontology.py) - some irregularities explained

The ontology is created with [create_ontology.py](/scripts/create_ontology.py), which requires a .csv file as input. The .csv file is manually created from both .xlsx files provided by DFG. The .xlsx files contain line breaks and merged cells. To prepare the .csv file, merged cells need to be unmerged, and empty cells need to be filled down with the respective values.
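
The fill-down step can also be scripted once the merged cells have been unmerged. The following is a minimal sketch, assuming the table has already been read into lists of strings; the function `fill_down`, the column positions, and the sample rows are illustrative and not taken from this repository.

```python
def fill_down(rows, columns):
    """Copy the last non-empty value downwards in the given column indices."""
    last_seen = {}
    filled = []
    for row in rows:
        new_row = list(row)  # leave the input rows untouched
        for index in columns:
            if new_row[index].strip():
                last_seen[index] = new_row[index]
            else:
                new_row[index] = last_seen.get(index, "")
        filled.append(new_row)
    return filled


# Hypothetical rows: the first two columns came from a merged cell.
rows = [
    ["1", "Humanities and Social Sciences", "11", "Humanities"],
    ["", "", "12", "Social and Behavioural Sciences"],
]
filled = fill_down(rows, columns=[0, 1])
```

Restricting the fill to explicitly named columns avoids accidentally filling cells that are meant to be empty.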

The cells also contain line breaks and trailing white spaces, which may vary between versions. This is a problem for [create_ontology.py](/scripts/create_ontology.py): the script may not work with new versions of the Fachsystematik unless the table is cleaned up first, e.g. by removing unexpected line breaks and new trailing white spaces, until the script can parse the whole file.
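
One way to approach this kind of cleanup is sketched below. The helper `clean_cell` is hypothetical and not part of this repository. It simply collapses every run of whitespace, including line breaks, into a single space, so words that were hyphenated across a line break in the .xlsx file would still need a manual check.

```python
import re


def clean_cell(value: str) -> str:
    """Collapse line breaks and repeated whitespace, then trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


cleaned = clean_cell("Social and\nBehavioural Sciences  ")
```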

## Checking the alignment of German and English versions in the .csv file

The ontology can only be created properly if the English and German versions of the Fachsystematik align exactly in the .csv file. This can be tested with [parse_csv.py](/scripts/parse_csv.py).
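
The idea behind such a check can be illustrated as follows. This is a sketch and not the code of `parse_csv.py`: it compares the `Subject Number` and `Fachnummer` columns row by row and reports any pair that differs. The other column names and the sample rows, including the deliberate mismatch, are made up for the example.

```python
import csv
import io

# Hypothetical sample with one deliberately misaligned row.
SAMPLE = """Subject Number,Subject,Fachnummer,Fach
101,Ancient Cultures,101,Alte Kulturen
102,History,103,Geschichtswissenschaften
"""


def misaligned(rows):
    """Return (Subject Number, Fachnummer) pairs that do not match."""
    return [
        (row["Subject Number"], row["Fachnummer"])
        for row in rows
        if row["Subject Number"].strip() != row["Fachnummer"].strip()
    ]


problems = misaligned(csv.DictReader(io.StringIO(SAMPLE)))
```

An empty result would indicate that both language versions line up on every row.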
