DCAT-AP for Wikibase

Note

This repo stems from the time before the tool was integrated into the Wikimedia dumping infrastructure. The up-to-date repository can be found at wikimedia/operations-dumps-dcat.


A project aimed at generating a DCAT-AP document for Wikibase installations in general and Wikidata in particular.

Takes into account access through:

  • Content negotiation (various formats)
  • MediaWiki API (various formats)
  • Entity dumps, e.g. JSON, TTL (assumed to be compressed)

An example result can be found at lokal-profil / dcatap.rdf. The live DCAT-AP description of Wikidata can be found here.

To use

  1. Copy config.example.json to config.json and change the contents to match your installation. Refer to the Config section below for an explanation of the individual configuration parameters.
  2. Copy catalog.example.json to a suitable place (e.g. on-wiki) and update the translations to fit your Wikibase installation. Set its location as the catalog-i18n value in the config file.
  3. Create the dcatap.rdf file by running php DCAT.php, or php DCAT.php --config="<path_1>" --dumpDir="<path_2>" --outputDir="<path_3>", where each option can be left out. The options are:
    1. --config is the relative path to the JSON file containing the configuration; defaults to ./config.json
    2. --dumpDir is the relative path to the directory containing the dumps (if any); defaults to the directory parameter in the config file
    3. --outputDir is the relative path to the directory where the dcatap.rdf file should be created; defaults to the directory parameter in the config file
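As an illustration of how these defaults interact, here is a minimal Python sketch (not the actual DCAT.php code) that resolves the three paths. It assumes config.json is a JSON object with a top-level directory key, as described in the Config section; the function name resolve_paths is hypothetical.

```python
import argparse
import json


def resolve_paths(argv=None):
    """Hypothetical sketch of how the DCAT.php options could be resolved.

    --config defaults to ./config.json; --dumpDir and --outputDir fall back
    to the "directory" parameter of the loaded config file.
    """
    parser = argparse.ArgumentParser(description="Sketch of DCAT.php options")
    parser.add_argument("--config", default="./config.json")
    parser.add_argument("--dumpDir", default=None)
    parser.add_argument("--outputDir", default=None)
    args = parser.parse_args(argv)

    # Load the configuration; "directory" is the shared fallback path.
    with open(args.config, encoding="utf-8") as handle:
        config = json.load(handle)

    dump_dir = args.dumpDir if args.dumpDir is not None else config["directory"]
    output_dir = args.outputDir if args.outputDir is not None else config["directory"]
    return args.config, dump_dir, output_dir
```

For example, resolve_paths(["--outputDir=out"]) would read ./config.json, take the dump directory from its directory key, and use out as the output directory.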

Translations

  • Translations which are generic to the tool are handled by Intuition and should be translated through translatewiki.net.
  • Translations which are specific to a project/catalog are added to the location specified in the catalog-i18n parameter of the config file.

Config

Below is a key-by-key explanation of the config file.

  • directory: Relative path to the directory containing the dump subdirectories (if any) and where the final DCAT file is written.
  • api-enabled: (Boolean) Is API access activated for the MediaWiki installation?
  • dumps-enabled: (Boolean) Is JSON dump generation activated for the Wikibase installation?
  • uri: URL used as the basis for RDF identifiers, e.g. http://www.example.org/about
  • catalog-homepage: URL for the homepage of the Wikibase installation, e.g. http://www.example.org
  • catalog-issued: ISO date at which the Wikibase installation was first issued, e.g. 2000-12-24
  • catalog-license: License of the catalog, i.e. of the DCAT file itself (not the contents of the Wikibase installation), e.g. http://creativecommons.org/publicdomain/zero/1.0/
  • catalog-i18n: URL or path to json file containing i18n strings for catalog title and description. Can be an on-wiki page, e.g. https://www.example.org/w/index.php?title=MediaWiki:DCAT.json&action=raw
  • keywords: (array) List of keywords applicable to all of the datasets
  • themes: (array) List of thematic IDs in accordance with EuroVoc, e.g. 2191 for http://eurovoc.europa.eu/2191
  • publisher:
    • name: Name of the publisher
    • homepage: URL of the homepage of the publisher
    • email: Contact e-mail address for the publisher; this should be a function address (a shared, role-based mailbox rather than a personal one)
    • publisherType: Publisher type according to ADMS, e.g. NonProfitOrganisation
  • contactPoint:
    • name: Name of the contact point
    • email: E-mail address for the contact point; this should ideally be a function address (a shared, role-based mailbox rather than a personal one)
    • vcardType: Type of contact point, either Organization or Individual
  • ld-info:
  • api-info:
    • accessURL: URL to the MediaWiki API endpoint of the wiki, e.g. http://www.example.org/w/api.php
    • mediatype: (object) List of non-deprecated formats available through the API; see ld-info:mediatype above for formatting
    • license: See ld-info:license above
  • dump-info:
    • accessURL: URL to the directory where the .json.gz files reside ($1 is replaced on the fly by the actual filename), e.g. http://example.org/dumps/$1
    • mediatype: (object) List of media types, e.g. {"json": "application/json"}
    • compression: (object) List of compression formats, mapping each name to its file ending, e.g. {"gzip": "gz"}
    • license: See ld-info:license above
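To show how some of these keys fit together, the following Python sketch expands the themes IDs into EuroVoc URIs and builds dump download URLs by substituting $1 in dump-info:accessURL. It is a hypothetical illustration, not the tool's own logic: the function names, the example dump basename "example-all" and the <basename>.<format>.<ending> file-naming pattern are assumptions, and the config dict reuses only the example values given above.

```python
from itertools import product

EUROVOC_BASE = "http://eurovoc.europa.eu/"


def theme_uris(config):
    """Expand the numeric EuroVoc ids in "themes" into full URIs."""
    return [EUROVOC_BASE + str(theme_id) for theme_id in config.get("themes", [])]


def dump_download_urls(config, dump_basename):
    """Hypothetical sketch: one download URL per (mediatype, compression) pair.

    ASSUMPTION: dump files are named <basename>.<format-key>.<file-ending>,
    e.g. "example-all.json.gz". "$1" in dump-info:accessURL is replaced by
    that filename.
    """
    if not config.get("dumps-enabled", False):
        return {}
    info = config["dump-info"]
    urls = {}
    # Iterating over the dicts yields their keys, e.g. "json" and "gzip".
    for fmt, compression in product(info["mediatype"], info["compression"]):
        ending = info["compression"][compression]
        filename = f"{dump_basename}.{fmt}.{ending}"
        urls[(fmt, compression)] = info["accessURL"].replace("$1", filename)
    return urls


# Example config built only from the example values in this README.
example_config = {
    "dumps-enabled": True,
    "themes": ["2191"],
    "dump-info": {
        "accessURL": "http://example.org/dumps/$1",
        "mediatype": {"json": "application/json"},
        "compression": {"gzip": "gz"},
        "license": "http://creativecommons.org/publicdomain/zero/1.0/",
    },
}

if __name__ == "__main__":
    print(theme_uris(example_config))
    # {('json', 'gzip'): 'http://example.org/dumps/example-all.json.gz'}
    print(dump_download_urls(example_config, "example-all"))
```

Keying the result by (format, compression) keeps one entry per distribution, so adding another entry to mediatype or compression in the config multiplies the generated URLs accordingly.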