diff --git a/README.md b/README.md
index 287b566..f83597f 100644
--- a/README.md
+++ b/README.md
@@ -12,47 +12,57 @@ Birds are services providing processes for specific thematic subjects. For examp
 Birdhouse provides tools (cookiecutter template, birdy client, docker, ...) to make it easier to build, use and deploy new thematic birds.
-## Build the docs
+# Climate Services Information Systems
-Clone the docs repo from GitHub:
-```console
-git clone https://github.com/bird-house/birdhouse2-docs.git
+The following sections provide instructions, guidelines and background on **Climate Services Information Systems (CSIS)**.
-cd birdhouse2-docs
-```
-Create conda environment:
-```console
-conda env create
-conda activate birdhouse2-docs
-```
+## FAIR Climate Services
-Build the docs:
-```console
-mkdocs build
+Climate datasets rapidly grow in volume and complexity, and creating climate products requires high bandwidth, massive storage and large compute resources. For some regions, low bandwidth constitutes a real obstacle to developing climate services. Data volume also hinders reproducibility, because very few institutions have the means to archive original data sets over the long term. Moreover, typical climate products often aggregate multiple sources of information, yet mechanisms to systematically track and document the provenance of all these data are only emerging. So although there is a general consensus that climate information should follow the **FAIR Principles** {cite:p}`Wilkinson2016,mons2017`, that is, be **findable, accessible, interoperable, and reusable**, a number of obstacles hinder progress. The following principles can help set up efficient climate services information systems, and show how the four FAIR Principles apply not only to data but also to analytical processes.
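+Tracking provenance can start with something as small as recording a content checksum for every input file of a product. The sketch below shows the idea in Python; the function name `fingerprint` and the record fields are illustrative assumptions, not part of any Birdhouse API:

```python
import hashlib
from pathlib import Path


def fingerprint(path):
    """Return a minimal provenance record for one input file:
    its name, size in bytes, and a SHA-256 content checksum.
    Illustrative sketch only, not a Birdhouse API."""
    data = Path(path).read_bytes()
    return {
        "file": Path(path).name,
        "bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
```

+Storing such records alongside a climate product makes it possible to verify later that the same inputs were used, even when the original files have moved.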
-# html output is in folder site/
-```
+### Findable
+Findability is the basic requirement for data and product usage, and it is already a difficult obstacle: ensuring findable data means time-intensive work for the data provider. On the production level, finding algorithms requires open-source software with thorough documentation.
-Or serve the docs for your browser:
-```console
-mkdocs serve
-```
+**Finding data:**
+Finding data requires a structured data repository and, if possible, the assignment of a globally unique and persistent identifier (such as a DOI or Handle), describing the data with rich metadata, and making sure the data is findable through discovery portals or search clients. It is recommended to establish a data repository that collects and manages core input and output data, enabling coordinated provisioning and sharing of data with a focus on sustainable storage and management of core data collections. Depending on the importance of the data, a certified long-term archive can be maintained. Identifying the core data collections to be managed in centralized repositories can be supported by tools such as the [Research Data Management Organiser (RDMO)](https://rdmorganiser.github.io/).
-## Adding new pages
+**Finding algorithms:**
+In free and open-source software for geospatial (FOSS4G) development workflows, independent developer groups collaborate in a win-win situation, ensuring a high-quality software product {cite:p}`Bahamdain2015`. Public repositories enable efficient participation and knowledge sharing {cite:p}`Bejoy2010`. A high certainty and quality of scientific evidence is needed for information used in a juridical context to regulate the conflict between economic development and environmental protection {cite:p}`Brown2019`. Therefore, backend solutions that provide climate information for decision makers need to be as error-free as possible.
The challenge of high-quality software solutions is illustrated by Linus's law: "given enough eyeballs, all bugs are shallow" {cite:p}`Raymond2001`.
-Please read the docs at [mkdocs](https://www.mkdocs.org/) and also at [mkdocs-material](https://squidfunk.github.io/mkdocs-material/).
+### Accessible
+**Access to data:**
+For data users, the prevailing *modus operandi* has traditionally been to download raw data locally to conduct analyses. As data volume grows, bandwidth and local storage capacity limit the type of science that individual scientists can perform.
-There is a [user guide](https://www.mkdocs.org/user-guide/writing-your-docs/) on mkdocs describing how to add new pages with markdown.
+**Access to algorithms:**
+A high certainty and quality of scientific evidence is needed for information used in a juridical context to regulate the conflict between economic development and environmental protection {cite:p}`Brown2019`. Therefore, backend solutions that provide climate information for decision makers need to be as *error-free* as possible. The challenge of high-quality software solutions is illustrated by Linus's law: "given enough eyeballs, all bugs are shallow" {cite:p}`Raymond2001`. In free and open-source software for geospatial (FOSS4G) development workflows, independent developer groups collaborate in a win-win situation, ensuring a high-quality software product {cite:p}`Bahamdain2015`. Public repositories enable efficient participation and knowledge sharing {cite:p}`Bejoy2010`.
-## Convert rst to markdown
+### Interoperable
+Following the UN-GGIM recommendations (2020) on 'Implementation and adoption of standards for the global geospatial information community', climate data should be organized according to these [UN-GGIM recommendations](http://ggim.un.org/meetings/GGIM-committee/10th-Session/documents/E-C.20-2020-33-Add_1-Implementation-and-Adoption-of-Standards-21Jul2020.pdf).
+Interoperability needs to be respected on two levels:
-You can convert `rst` files to *markdown* using [pandoc](https://pandoc.org/).
+**Interoperable data:**
+data should follow the agreed conventions regarding metadata.
-```console
-pandoc tutorial.rst -t markdown -o tutorial.md
-```
+**Interoperable structures:**
+OGC standardisation also enables communication between the services of climate services information systems.
-Probably some edits are neccessary after the conversion.
+### Reusable
+Reusability is a major aspect in avoiding duplication of work and fostering the dynamics of providing high-quality products.
+**Reusable data:**
+The data should maintain its initial richness. The description of essential, recommended, and optional metadata elements should be machine-processable and verifiable, use should be easy, and data should be citable to sustain data sharing and recognize the value of data. Output data from one service can be post-processed by another service that provides other components.
+
+**Reusable algorithms:**
+Contrary to running analysis code on a local machine, users of remote services have no direct control over the software they are running.
+
+**Reproducibility:**
+This implies that results might not be easily reproducible if earlier versions of services are not available anymore. This puts an additional burden on scientists to carefully monitor the versions of all the remote services used in an analysis in order to explain discrepancies between results. Similar issues occur with data versions.
If a scientist used version 1 for an analysis, there is no guarantee the source data will be archived over the long term if it has been superseded by version 2. In practice, climate services use ensembles of simulations, meaning that typical climate products aggregate hundreds or thousands of files, whose versions should ideally be tracked up until the final graphic or table. This capability to uniquely identify simulation files, errata and updates is available in CMIP6 {cite:p}`Stockhause2017,Weigel2013`, but it is the responsibility of climate service providers to embed this information into the products they develop.
+
+
+
diff --git a/docs/guide_docs.md b/docs/guide_docs.md
new file mode 100644
index 0000000..db9bf1d
--- /dev/null
+++ b/docs/guide_docs.md
@@ -0,0 +1,46 @@
+# Guidelines
+
+
+## Build the docs
+
+Clone the docs repo from GitHub:
+```console
+git clone https://github.com/bird-house/birdhouse2-docs.git
+
+cd birdhouse2-docs
+```
+
+Create the conda environment:
+```console
+conda env create
+conda activate birdhouse2-docs
+```
+
+Build the docs:
+```console
+mkdocs build
+
+# html output is in folder site/
+```
+
+Or serve the docs to view them in your browser:
+```console
+mkdocs serve
+```
+
+## Adding new pages
+
+Please read the docs at [mkdocs](https://www.mkdocs.org/) and also at [mkdocs-material](https://squidfunk.github.io/mkdocs-material/).
+
+There is a [user guide](https://www.mkdocs.org/user-guide/writing-your-docs/) on mkdocs describing how to add new pages with markdown.
+
+## Convert rst to markdown
+
+You can convert `rst` files to *markdown* using [pandoc](https://pandoc.org/).
+
+```console
+pandoc tutorial.rst -t markdown -o tutorial.md
+```
+
+Some edits are probably necessary after the conversion.
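+To convert a whole directory of `rst` files, the pandoc invocation can be generated per file. A small Python sketch, assuming `pandoc` is on the PATH; the helper `pandoc_commands` is hypothetical, not part of pandoc or Birdhouse:

```python
from pathlib import Path


def pandoc_commands(src_dir):
    """Build one pandoc invocation per .rst file in src_dir,
    writing a .md file next to each source (hypothetical helper)."""
    cmds = []
    for rst in sorted(Path(src_dir).glob("*.rst")):
        # pandoc <file>.rst -t markdown -o <file>.md
        cmds.append(["pandoc", str(rst), "-t", "markdown",
                     "-o", str(rst.with_suffix(".md"))])
    return cmds
```

+Each command list can be passed to `subprocess.run`; as noted above, the converted files usually still need some manual editing.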
+ diff --git a/docs/guide_fairclimateservices.md b/docs/guide_fairclimateservices.md deleted file mode 100644 index a6c061c..0000000 --- a/docs/guide_fairclimateservices.md +++ /dev/null @@ -1,42 +0,0 @@ -# FAIR Climate Services - -Climate datasets rapidly grow in volume and complexity and creating climate products requires high bandwidth, massive storage and large compute resources. For some regions, low bandwidth constitutes a real obstacle to developing climate services. Data volume also hinders reproducibility because very few institutions have the means to archive original data sets over the long term. Moreover, typical climate products often aggregate multiple sources of information, yet mechanisms to systematically track and document the provenance of all these data are only emerging. So although there is a general consensus that climate information should follow the **FAIR Principles** {cite:p}`Wilkinson2016,mons2017`, that is be **findable, accessible, interoperable, and reusable**, a number of obstacles hinder progress. The following principles can help set up efficient climate services information systems, and show how the four FAIR Principles not only apply to data, but also to analytical processes. - -### Findable -Findable is the basic requirement for data and product usage and already an difficult obstacle with time intensive work for the data provider ensuring find-able data. On the production level finding algorithms requires open source software with intensive documentation. - -**Finding data:** -Finding data requires a structured data repository and if possible an assigning of a globally unique and eternally persistent identifier (like a DOI or Handle), describing the data with rich metadata, and making sure it is find-able through discovery portals of search clients. 
It is recommended to establish data repository collecting and managing core input and output data enabling coordinated provisioning and sharing of data focusing on sustainable storage and management of core data collections. Depending on data importance a certified long-term archive can be managed. The identification of core data collections to be managed in centralized repositories might be realized with e.g the Research Data Management Organiser (RDMO) tool. https://rdmorganiser.github.io/ - -**Finding algorithms:** -In free and open source for geospatial (FOSS4G) developments workflows, independent developer groups are collaborating in a win-win situation and ensuring a high-quality software product {cite:p}`Bahamdain2015`. Public repositories enabling a work efficient participating and knowledge sharing approach {cite:p}`Bejoy2010`. A high certainty and quality of scientific evidence is needed for information in a juridical context to regulate the conflict between economic development and environmental protection {cite:p}`Brown2019`. Therefor backend solutions to provide climate information for decision makers, need to be as much as possible 'error free'. The challenge of high-quality software solutions is illustrated with Linus's law that "given enough eyeballs, all bugs are shallow". {cite:p}`Raymond2001`. - -### Accessible -**Access to data:** -For data users, the prevailing *modus operandi* has traditionally been to download raw data locally to conduct analyses. As data volume grows, bandwidth and local storage capacity limits the type of science that individual scientists can perform. - -**Access to algorithms:** -A high certainty and quality of scientific evidence is needed for information in a juridical context to regulate the conflict between economic development and environmental protection {cite:p}`Brown2019`. Therefor backend solutions to provide climate information for decision makers, need to be as much as possible *error free*. 
The challenge of high-quality software solutions is illustrated with Linus's law that "given enough eyeballs, all bugs are shallow". {cite:p}`Raymond2001`. In free and open source for geospatial (FOSS4G) developments workflows, independent developer groups are collaborating in a win-win situation and ensuring a high-quality software product {cite:p}`Bahamdain2015`. Public repositories enabling a work efficient participating and knowledge sharing approach {cite:p}`Bejoy2010`. - -### Interoperable -Following the UNGGIM recommendations (2020) about 'Implementation and adoption of standards for the global geospatial information community' climate data should be organized following this [UNGIM recommendations](http://ggim.un.org/meetings/GGIM-committee/10th-Session/documents/E-C.20-2020-33-Add_1-Implementation-and-Adoption-of-Standards-21Jul2020.pdf). -Interoperabillity needs to be respected on two levels: - -**Interoperable data :** -following the conventions regarding metadata. - -**Interoperable structures:** -The OGC standardisation also enables communication between climate services information systems services. - -### Reusable - -Reusabillity is a major aspect to avoid duplication of work and to foster the dynamique of providing high quality products. - -**Reusable data:** -The data should maintain its initial richness. The description of essential, recommended, and optional metadata elements should be machine processable and verifiable, use should be easy and data should be citable to sustain data sharing and recognize the value of data. Result output data from one service can be post-processed by another service where other component are provided. - -**Reusable algorithms:** -Contrary to running analysis code on a local machine, it is recommended to use remote services have no direct control on the software they are running. 
The server's maintainer essentially decides when software and services are upgraded, meaning that within the time a scientist performs initial exploration and produces the final version of a figure for a paper, remote-services might have slightly changed or have been retired.
-
-**Reproducabillity:**
-This implies that reproducabillity results might not be easily reproducible if earlier versions of services are not available anymore. This puts an additional burden on scientists to carefully monitor the version of all the remote services used in the analysis to be able to explain discrepancies between results. Similar issues occur with data versions. If a scientist used version 1 for an analysis, there is no guarantee the source data will be archived over the long term if it has been superseded by version 2. In practice, climate services use ensembles of simulations, meaning that typical climate products aggregate hundreds or thousands of files, whose versions should ideally be tracked up until the final graphic or table. This capability to uniquely identify simulation files, errata and updates is available in CMIP6 {cite:p}`Stockhause2017,Weigel2013`, but it is the responsibility of climate service providers to embed this information into the products they develop.
diff --git a/docs/list_apps.md b/docs/list_apps.md
new file mode 100644
index 0000000..888cece
--- /dev/null
+++ b/docs/list_apps.md
@@ -0,0 +1,64 @@
+# Climate Services Application Packages
+
+Here is a list of active software packages, applications and utilities that can be used to spin up technical climate services information systems.
+Guidelines and tutorials related to WPS workflows are documented in the [previous version of birdhouse](https://birdhouse.readthedocs.io/en/latest/).
+The sources of the software packages are centralised in the GitHub Organisation [Bird-House](https://github.com/bird-house).
Some applications are stored in different places due to their development history, funding mechanisms or intellectual property rights.
+
+## Central collection of Application Packages
+| Name and Documentation | Usage | Source |
+| -------- | ------- | ------- |
+| [Birdhouse](http://bird-house.github.io/) | Collection of OGC-based application packages for CSIS | Organization in GitHub |
+
+## Utilities, Clients and Frontend Components
+
+| Name and Documentation | Usage | Source |
+| -------- | ------- | ------- |
+| [Twitcher](http://twitcher.readthedocs.io/) | Security proxy for WPS, WCS, WMS | Deployed in [PAVICS](https://pavics-sdi.readthedocs.io/en/latest/) |
+| [cookiecutter-birdhouse](https://cookiecutter-birdhouse.readthedocs.io) | Utility to create an OGC API Processes application package skeleton | Version 0.5 |
+| [birdy](https://birdy.readthedocs.io) | Python WPS client to call a server-side deployed application package | Version 0.8.1 |
+| [Phoenix](https://pyramid-phoenix.readthedocs.io/en/latest/) | Graphical user interface frontend | Deployed at [CLINT Demonstrator](https://clint.dkrz.de) |
+| [Rooki](https://github.com/roocs/rooki) | Python library | |
+
+
+## Climate Services Application Packages based on PyGEO API
+
+| Name and Documentation | Usage | Source |
+| -------- | ------- | ------- |
+| [nandu](https://github.com/bird-house/nandu) | ------- | [Nandu Github repository](https://github.com/bird-house/nandu) |
+
+
+## Climate Services Application Packages based on OGC API Processes
+
+| Name and Documentation | Usage | Source |
+| -------- | ------- | ------- |
+| [WEAVER](https://pavics-weaver.readthedocs.io/en/latest/) | Implementation following OGC API - Processes best practices.
| |
+
+## Climate Services Application Packages based on OGC WPS including AI
+
+| Name and Documentation | Usage | Source |
+| -------- | ------- | ------- |
+| [duck](https://climateintelligence.github.io/smartduck-docs/sections/duck.html) | AI-enhanced process to fill in missing values | Deployed at [CLINT Demonstrator](https://clint.dkrz.de) |
+| [hawk](https://clint-hawk.readthedocs.io/en/latest/) | ------- | [Hawk GitHub Repository](https://github.com/climateintelligence/hawk) |
+| [albatross](https://clint-albatross.readthedocs.io/en/latest/) | ------- | [Albatross GitHub Repository](https://github.com/climateintelligence/albatross) |
+| [shearwater](https://shearwater.readthedocs.io/en/latest/) | ------- | [Shearwater GitHub Repository](https://github.com/climateintelligence/shearwater) |
+| [owl](https://clint-owl.readthedocs.io/en/latest/) | ------- | [Owl GitHub Repository](https://github.com/climateintelligence/owl) |
+| [dipper](https://clint-dipper.readthedocs.io/en/latest/) | ------- | [Dipper GitHub Repository](https://github.com/climateintelligence/dipper) |
+
+## Climate Services Application Packages based on OGC WPS
+
+| Name and Documentation | Usage | Source |
+| -------- | ------- | ------- |
+| [Magpie](https://pavics-magpie.readthedocs.io/en/latest/) | | |
+| [emu](https://emu.readthedocs.io/) | Demo and testing application for training purposes | Version 0.12 |
+| [finch](https://pavics-sdi.readthedocs.io/) | Application package providing processing services to calculate climate indices | Deployed in [climatedata.ca](https://climatedata.ca) and [PAVICS](https://pavics.ouranos.ca) |
+| [flyingpigeon](https://flyingpigeon.readthedocs.io) | Test suite | Deployed in [PAVICS](https://pavics.ouranos.ca) |
+| [rooks](https://github.com/roocs) | Remote operations on climate simulations | Deployed in [COPERNICUS Climate Data Store (CDS)](https://cds.climate.copernicus.eu/#!/home) and CEDA |
+| 
[raven](https://pavics-sdi.readthedocs.io/projects/raven/en/latest/notebooks/index.html) | Hydrological modelling | Deployed in [PAVICS](https://pavics.ouranos.ca) |
+| [hummingbird](http://birdhouse-hummingbird.readthedocs.io/) | Data compliance checker | DKRZ internal usage |
+| [goldfinch](https://github.com/cedadev/goldfinch) | Filtering and extraction of MIDAS data | Deployed at CEDA |
+| [pelican](https://github.com/bird-house/pelican) | WPS supporting ESGF compute API | |
+
+
+
+
diff --git a/mkdocs.yml b/mkdocs.yml
index f7e5ecf..6f36ce3 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -1,7 +1,8 @@
 site_name: Birdhouse
 nav:
   - Home: index.md
-  - FAIR Climate Services: guide_fairclimateservices.md
+  - List of Applications: list_apps.md
+  - Guidelines: guide_docs.md
   - Release Notes: release_notes.md
   - References: bibliography.md
 theme:
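For reference, the `nav` section of `mkdocs.yml` after applying the hunk above reads as follows (reconstructed from the diff, shown without markers):

```yaml
site_name: Birdhouse
nav:
  - Home: index.md
  - List of Applications: list_apps.md
  - Guidelines: guide_docs.md
  - Release Notes: release_notes.md
  - References: bibliography.md
```

Each `nav` entry maps a navigation label to a markdown file under `docs/`, so the two new pages appear in the site navigation in place of the removed FAIR Climate Services page.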