CMIP7 DataRequest Integration Tests #85

Merged
merged 86 commits into from
Jan 6, 2025
Commits
f81491b
feat: adds preliminary CMIP7 software as submodule
pgierz Oct 24, 2024
32d655e
feat: variable as dataclass
pgierz Nov 28, 2024
6fbcf69
feat(data-request): add dataclasses for CMIP6 variables
pgierz Nov 29, 2024
d2b9bbb
feat: DataRequestTableHeader as ABC
pgierz Nov 29, 2024
8c80159
removes old data_request implementation -- everything is broken at th…
pgierz Dec 2, 2024
0c6ab8a
wip
pgierz Dec 2, 2024
290998f
wip
pgierz Dec 2, 2024
03803fb
wip
pgierz Dec 2, 2024
15de45d
Merge branch 'main' into feat/dataclasses-for-data-request
pgierz Dec 2, 2024
b0943d1
wip
pgierz Dec 2, 2024
4c269c8
resurrects the ignore tables enum
pgierz Dec 2, 2024
6137d36
argh indent error
pgierz Dec 2, 2024
de49dad
forgot type import
pgierz Dec 2, 2024
a1f0406
type hints will be the death of me
pgierz Dec 2, 2024
906f489
wip
pgierz Dec 2, 2024
ff162af
wrong decorator order
pgierz Dec 2, 2024
fe7eacf
wip 123
pgierz Dec 2, 2024
a619e82
wip 456
pgierz Dec 2, 2024
4328361
wip 678
pgierz Dec 2, 2024
d47bec6
wip 999
pgierz Dec 2, 2024
dd34aef
ci: trying to run pytest correctly for all (nested) unit tests
pgierz Dec 3, 2024
55e3d5c
fix: forgot the header property from ABC
pgierz Dec 3, 2024
f3a9a09
wip 999
pgierz Dec 3, 2024
19708ba
wip still...factories are self-assembling in the computer ether
pgierz Dec 3, 2024
3aef404
wip, at least I have it in my head...
pgierz Dec 3, 2024
e78c297
feat: new data request classes
pgierz Dec 5, 2024
cbdb593
merge main
pgierz Dec 5, 2024
9eef92e
fix: IgnoreTableFiles Enum is now specifically called CMIP6IgnoreTabl…
pgierz Dec 5, 2024
4fc72b8
feat: allows you to add header information to the variables when crea…
pgierz Dec 5, 2024
a098e4a
style: isort on test file
pgierz Dec 5, 2024
acf6ac7
fix: variable_id attribute is needed
pgierz Dec 5, 2024
022b455
fix/test: cmor version needs to be defined
pgierz Dec 5, 2024
4040ad3
fix: still missing variable_id in the constructor in DataRequestVariable
pgierz Dec 5, 2024
e2c4b36
fix: wrong method name in from json constructor
pgierz Dec 5, 2024
87dff06
fix/test: rearrage test fixtures for new structure of data request va…
pgierz Dec 5, 2024
2bb1360
wip: try without JSON table specifics
pgierz Dec 5, 2024
2e06907
specify CMIP6 constructors for CMIP6 DataRequest
pgierz Dec 5, 2024
190ad5b
wip still...
pgierz Dec 5, 2024
8d55849
wip
pgierz Dec 5, 2024
43ef8ca
more wip
pgierz Dec 5, 2024
b626d32
wip still, this is getting annoying
pgierz Dec 5, 2024
9a0a407
wip still, this is getting annoying 2
pgierz Dec 5, 2024
2e9d574
still trying full refactory
pgierz Dec 5, 2024
e94a2bb
units (plural) not unit
pgierz Dec 5, 2024
bb91dba
wip 3
pgierz Dec 6, 2024
fe2b155
data request variable needs to get units (plural) not unit
pgierz Dec 6, 2024
8ebb073
forgot to add mocker dependency in setuppy
pgierz Dec 6, 2024
11966a8
unit tests for unit conversions work again
pgierz Dec 6, 2024
fc03c86
wip for integration tests
pgierz Dec 6, 2024
5602098
feat: includes table headers by default in all DataRequestVariables w…
pgierz Dec 6, 2024
2be93d6
supresses some of the warnings
pgierz Dec 6, 2024
68e2e20
re-enables progressive pipeline to ensure no recursion errors
pgierz Dec 6, 2024
b511ad7
frequency is on the variable, not the table
pgierz Dec 6, 2024
2cebaa3
last tables (hopefully)
pgierz Dec 6, 2024
cf29d1b
test: disable progressive pipeline again for now
pgierz Dec 6, 2024
91e83df
test: removes dead code
pgierz Dec 6, 2024
d7ede1e
Merge branch 'feat/dataclasses-for-data-request' into feat/cmip7
pgierz Dec 9, 2024
7691374
feat: CMIP6 DataRequest from git URL
pgierz Dec 9, 2024
50be309
test: CMIP6 DataRequest from git URL
pgierz Dec 9, 2024
0625f5d
ci: ignores CMIP7 bundled software in isort and black checks
pgierz Dec 9, 2024
0c0187f
ci: forgot isort part of skipping CMIP7
pgierz Dec 9, 2024
f3aaae8
wip(cmip7): DataRequestVariable can be initialized from shipped JSON …
pgierz Dec 9, 2024
9f07447
wip(cmip7): table name set up correctly when parsing from all_var json
pgierz Dec 9, 2024
50fb3e2
wip(cmip7): update to 1.0 tag version of CMIP7 Data Request
pgierz Dec 9, 2024
a5f4c2a
wip(cmip7): first try to create tables
pgierz Dec 9, 2024
0241ff2
wip(cmip7): 1 instead of v1.0 to be compatible with SemVer
pgierz Dec 9, 2024
7fd2fc8
wip(cmip7): first try to create a full data request
pgierz Dec 9, 2024
81bb8ec
wip/ci: forgot to auto-run tests for new subfolder in unit
pgierz Dec 9, 2024
f778217
wip: integration tests with cmip7
pgierz Dec 10, 2024
f4c1483
wip: wrong spelling for parametrize :-(
pgierz Dec 10, 2024
fba5477
wip: indirect fixtures should behave now
pgierz Dec 10, 2024
e479760
wip: still trying indirect fixtures
pgierz Dec 10, 2024
43347d9
wip: factory design for tables, still in progress
pgierz Dec 10, 2024
a2d6ece
wip: didn't need a dict comprehension
pgierz Dec 10, 2024
1f658c0
wip: changed DataRequestVariable to cls by mistake for CMIP7DataReque…
pgierz Dec 10, 2024
44df344
wip: skip certain table files when constructing tables for dir into dict
pgierz Dec 10, 2024
d91bf85
Update cmorizer.py
pgierz Dec 10, 2024
fcb14cd
Update cmorizer.py
pgierz Dec 10, 2024
48355df
root: start of CMIP7 integration tests with real processing
pgierz Dec 11, 2024
2be4f67
test: adds integration tests with real processing using CMIP7 data re…
pgierz Dec 11, 2024
2cee85d
feat: fake table headers for CMIP7
pgierz Dec 11, 2024
799406d
feat: attach fake table headers to variables in CMIP7
pgierz Dec 11, 2024
71d62e3
fix: staticmethod doesn't need self (facepalm)
pgierz Dec 11, 2024
cb14abe
wip
pgierz Dec 11, 2024
807568f
wip: Bad practice on CMOR side, a table can contain more than one Realm
pgierz Dec 11, 2024
85b0a34
Merge branch 'feat/dataclasses-for-data-request' into feat/cmip7-proc…
pgierz Dec 11, 2024
9 changes: 5 additions & 4 deletions .github/workflows/CI-test.yaml
@@ -44,6 +44,7 @@ jobs:
export XARRAY_ENGINE=h5netcdf
export PREFECT_SERVER_EPHEMERAL_STARTUP_TIMEOUT_SECONDS=300
pytest -vvv -s --cov tests/unit/*.py
pytest -vvv -s --cov tests/unit/data_request/*.py
- name: Test with pytest (Integration)
run: |
export XARRAY_ENGINE=h5netcdf
@@ -80,18 +81,18 @@ jobs:
python -m pip install .[dev]
- name: Run isort
run: |
isort --profile black --check --skip ./cmip6-cmor-tables --skip ./versioneer.py .
isort --profile black --check --skip ./cmip6-cmor-tables --skip ./versioneer.py --skip ./CMIP7_DReq_Software .
- name: Lint with flake8
run: |
## stop the build if there are Python syntax errors or undefined names
#flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
## exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
#flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
# stop at any error
flake8 . --show-source --statistics --exclude ./cmip6-cmor-tables,./build,_version.py,./src/pymorize/webapp.py
flake8 . --show-source --statistics --exclude ./cmip6-cmor-tables,./build,_version.py,./src/pymorize/webapp.py,./CMIP7_DReq_Software
- name: Run black
run: |
black --check --extend-exclude 'cmip6-cmor-tables|versioneer\.py|webapp\.py' .
black --check --extend-exclude 'cmip6-cmor-tables|CMIP7_DReq_Software|versioneer\.py|webapp\.py' .
- name: yamllint
run: |
yamllint -d "{extends: default, rules: {line-length: {level: 'warning'}}, ignore: ['cmip6-cmor-tables']}" .
yamllint -d "{extends: default, rules: {line-length: {level: 'warning'}}, ignore: ['cmip6-cmor-tables', 'CMIP7_DReq_Software']}" .
3 changes: 3 additions & 0 deletions .gitmodules
@@ -1,3 +1,6 @@
[submodule "cmip6-cmor-tables"]
path = cmip6-cmor-tables
url = https://github.com/PCMDI/cmip6-cmor-tables.git
[submodule "CMIP7_DReq_Software"]
path = CMIP7_DReq_Software
url = https://github.com/CMIP-Data-Request/CMIP7_DReq_Software.git
3 changes: 2 additions & 1 deletion .pre-commit-config.yaml
@@ -1,4 +1,4 @@
default_language_version:

python: python3.9
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
@@ -22,16 +22,17 @@
rev: '7.1.1'
hooks:
- id: flake8
# YAML stuff from https://earth.bsc.es/gitlab/digital-twins/de_340-2/workflow

- repo: https://github.com/adrienverge/yamllint.git
rev: v1.35.1 # or higher tag

hooks:
- id: yamllint
args: [-d, "{extends: default, rules: {line-length: {level: 'warning'}}}"]

exclude: |
(?x) # Enable verbose regex
(
^versioneer\.py$| # Exclude 'versioneer.py'
^src/pymorize/webapp\.py$| # Exclude 'src/pymorize/webapp.py'
_version\.py # Exclude '_version.py'
_version\.py| # Exclude '_version.py'
^src/pymorize/data/cmip7/ # Exclude 'src/pymorize/data/cmip7/'
)
1 change: 1 addition & 0 deletions CMIP7_DReq_Software
Submodule CMIP7_DReq_Software added at a7879d
32 changes: 0 additions & 32 deletions conftest.py
@@ -1,9 +1,3 @@
-import os
-import re
-from pathlib import Path
-
-import pytest
-
 from tests.utils.constants import TEST_ROOT  # noqa: F401

 pytest_plugins = [
@@ -21,29 +15,3 @@
     "tests.fixtures.CV_Dir",
     "tests.fixtures.data_requests",
 ]
-
-
-@pytest.hookimpl(tryfirst=True)
-def pytest_collection_modifyitems(config, items):
-    for item in items:
-        if item.fspath and item.fspath.ext == ".py":
-            item.add_marker(pytest.mark.doctest)
-
-
-@pytest.fixture(autouse=True)
-def pathlib_doctest_directive(doctest_namespace):
-    """Replace PosixPath/WindowsPath with Path in doc-test output."""
-    doctest_namespace["Path"] = Path
-
-    def path_replace(output):
-        """Replace platform-specific Path output with generic Path in doc-tests."""
-        return re.sub(r"(PosixPath|WindowsPath)\((.*?)\)", r"Path(\2)", output)
-
-    doctest_namespace["path_replace"] = path_replace
-
-
-def pytest_unconfigure(config):
-    """Remove all JSON files containing 'pipeline' in their name."""
-    for file in os.listdir():
-        if "pipeline" in file and file.endswith(".json"):
-            os.remove(file)
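For reference, the `path_replace` helper removed from `conftest.py` rewrote platform-specific path reprs so that doctest output compares equal on any OS. A standalone sketch of that substitution, using the same regular expression; the sample input string is made up for illustration:

```python
import re


def path_replace(output: str) -> str:
    """Rewrite PosixPath(...) / WindowsPath(...) reprs as Path(...)."""
    return re.sub(r"(PosixPath|WindowsPath)\((.*?)\)", r"Path(\2)", output)


# Hypothetical doctest output containing both platform-specific reprs.
sample = "PosixPath('/data/tables') WindowsPath('C:/data')"
normalised = path_replace(sample)
```

The non-greedy `(.*?)` stops at the first closing parenthesis, so each repr is rewritten separately.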
3 changes: 3 additions & 0 deletions pytest.ini
@@ -1,2 +1,5 @@
[pytest]
filterwarnings =
ignore:Import\(s\) unavailable to set up matplotlib support:UserWarning

doctest_optionflags = NORMALIZE_WHITESPACE IGNORE_EXCEPTION_DETAIL ELLIPSIS
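The flags in `doctest_optionflags` loosen how doctest compares expected and actual output. A minimal sketch of what `ELLIPSIS` and `NORMALIZE_WHITESPACE` accept, using only the standard-library `doctest` module; the sample strings are assumptions for illustration. `IGNORE_EXCEPTION_DETAIL` applies to tracebacks and is handled by the doctest runner rather than the output checker, so it is not shown here:

```python
import doctest

checker = doctest.OutputChecker()
flags = doctest.NORMALIZE_WHITESPACE | doctest.ELLIPSIS

# ELLIPSIS: "..." in the expected text matches any substring of the actual output.
ellipsis_ok = checker.check_output("Path('...')\n", "Path('/tmp/run_01')\n", flags)

# NORMALIZE_WHITESPACE: any run of whitespace (including newlines) compares equal.
whitespace_ok = checker.check_output("a   b\n", "a\nb\n", flags)

# With no flags set, the same comparisons are strict and fail.
strict_ellipsis = checker.check_output("Path('...')\n", "Path('/tmp/run_01')\n", 0)
strict_whitespace = checker.check_output("a   b\n", "a\nb\n", 0)
```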
4 changes: 3 additions & 1 deletion setup.py
@@ -55,6 +55,7 @@ def read(filename):
 "pyyaml",
 "questionary",
 "randomname",
+"semver",
 "rich-click",
 "streamlit",
 "tqdm",
@@ -73,6 +74,7 @@ def read(filename):
 "pytest",
 "pytest-asyncio",
 "pytest-cov",
+"pytest-mock",
 "pytest-xdist",
 "sphinx",
 "sphinx_rtd_theme",
@@ -91,7 +93,7 @@ def read(filename):
 },
 include_package_data=True,
 package_data={
-"pymorize": ["data/*.yaml"],
+"pymorize": ["data/*.yaml", "data/cmip7/all_var_info.json"],
 },
 classifiers=[
 "Development Status :: 2 - Pre-Alpha",
62 changes: 38 additions & 24 deletions src/pymorize/cmorizer.py
@@ -22,12 +22,10 @@
     set_dashboard_link,
 )
 from .config import PymorizeConfig, PymorizeConfigManager
-from .data_request import (
-    DataRequest,
-    DataRequestTable,
-    DataRequestVariable,
-    IgnoreTableFiles,
-)
+from .data_request.collection import DataRequest
+from .data_request.factory import create_factory
+from .data_request.table import DataRequestTable
+from .data_request.variable import DataRequestVariable
 from .filecache import fc
 from .logging import logger
 from .pipeline import Pipeline
@@ -39,9 +37,15 @@
 DIMENSIONLESS_MAPPING_TABLE = files("pymorize.data").joinpath(
     "dimensionless_mappings.yaml"
 )
+"""Path: The dimensionless unit mapping table, used to recreate meaningful units from
+dimensionless fractional values (e.g. 0.001 --> g/kg)"""
+# FIXME(PG): I don't know if this is a Path or not, so the documented type might be wrong


 class CMORizer:
+    _SUPPORTED_CMOR_VERSIONS = ("CMIP6", "CMIP7")
+    """tuple : Supported CMOR versions."""
+
     def __init__(
         self,
         pymorize_cfg=None,
@@ -61,6 +65,15 @@ def __init__(
         self.pipelines = pipelines_cfg or []
         self._cluster = None  # ask Cluster, might be set up later
         ################################################################################
+        # CMOR Version Settings:
+
+        if self._general_cfg.get("cmor_version") is None:
+            raise ValueError("cmor_version must be set in the general configuration.")
+        self.cmor_version = self._general_cfg["cmor_version"]
+        if self.cmor_version not in self._SUPPORTED_CMOR_VERSIONS:
+            logger.error(f"CMOR version {self.cmor_version} is not supported.")
+            logger.error(f"Supported versions are {self._SUPPORTED_CMOR_VERSIONS}")
+            raise ValueError(f"Unsupported CMOR version: {self.cmor_version}")

         ################################################################################
         # Print Out Configuration:
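The version check added to `__init__` in this hunk can be exercised in isolation. A minimal sketch, assuming a plain dict stands in for the general configuration; the helper name `resolve_cmor_version` is hypothetical and not part of pymorize:

```python
SUPPORTED_CMOR_VERSIONS = ("CMIP6", "CMIP7")


def resolve_cmor_version(general_cfg: dict) -> str:
    """Return the configured CMOR version, or raise ValueError if missing/unsupported.

    Hypothetical helper: mirrors the checks in the hunk above without logging.
    """
    version = general_cfg.get("cmor_version")
    if version is None:
        raise ValueError("cmor_version must be set in the general configuration.")
    if version not in SUPPORTED_CMOR_VERSIONS:
        raise ValueError(
            f"Unsupported CMOR version: {version}. "
            f"Supported versions are {SUPPORTED_CMOR_VERSIONS}"
        )
    return version


selected = resolve_cmor_version({"cmor_version": "CMIP7"})
```

Failing early here keeps a misconfigured run from reaching the table and data request setup with an unknown version.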
@@ -105,14 +118,19 @@ def __init__(
         logger.debug("...done!")
         self._post_init_create_pipelines()
         self._post_init_create_rules()
-        self._post_init_read_bare_tables()
+        self._post_init_create_data_request_tables()
         self._post_init_create_data_request()
         self._post_init_populate_rules_with_tables()
         self._post_init_read_dimensionless_unit_mappings()
         self._post_init_data_request_variables()
         logger.debug("...post-init done!")
         ################################################################################

+    def __del__(self):
+        """Gracefully close the cluster if it exists"""
+        if self._cluster is not None:
+            self._cluster.close()
+
     def _post_init_configure_dask(self):
         """
         Sets up configuration for Dask-Distributed
@@ -184,31 +202,26 @@ def _post_init_create_dask_cluster(self):
             import flox.xarray  # noqa: F401
         logger.info(f"...done! Imported {dask_extras} libraries.")

-    def _post_init_read_bare_tables(self):
+    def _post_init_create_data_request_tables(self):
         """
         Loads all the tables from table directory as a mapping object.
         A shortened version of the filename (i.e., ``CMIP6_Omon.json`` -> ``Omon``) is used as the mapping key.
         The same key format is used in CMIP6_table_id.json
         """
+        data_request_table_factory = create_factory(DataRequestTable)
+        DataRequestTableClass = data_request_table_factory.get(self.cmor_version)
         table_dir = Path(self._general_cfg["CMIP_Tables_Dir"])
-        table_files = {
-            path.stem.replace("CMIP6_", ""): path for path in table_dir.glob("*.json")
-        }
-        tables = {}
-        ignore_files = set(ignore_file.value for ignore_file in IgnoreTableFiles)
-        for tbl_name, tbl_file in table_files.items():
-            logger.debug(f"{tbl_name}, {tbl_file}")
-            if tbl_file.name not in ignore_files:
-                logger.debug(f"Adding Table {tbl_name}")
-                tables[tbl_name] = DataRequestTable(tbl_file)
+        tables = DataRequestTableClass.table_dict_from_directory(table_dir)
         self._general_cfg["tables"] = self.tables = tables

     def _post_init_create_data_request(self):
         """
         Creates a DataRequest object from the tables directory.
         """
         table_dir = self._general_cfg["CMIP_Tables_Dir"]
-        self.data_request = DataRequest.from_tables_dir(table_dir)
+        data_request_factory = create_factory(DataRequest)
+        DataRequestClass = data_request_factory.get(self.cmor_version)
+        self.data_request = DataRequestClass.from_directory(table_dir)

     def _post_init_populate_rules_with_tables(self):
         """
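The inline loop removed in this hunk is replaced by a call to `table_dict_from_directory`, whose body is not part of this diff. A rough sketch of the same directory-to-mapping logic as a free function, based only on the removed lines: it returns parsed JSON rather than `DataRequestTable` objects, and the entries of `IGNORED_TABLE_FILES` are assumptions standing in for the ignore-list enum.

```python
import json
import tempfile
from pathlib import Path

# Assumption: stand-in for the ignore-list enum; the real entries are not shown here.
IGNORED_TABLE_FILES = {"CMIP6_CV.json", "CMIP6_coordinate.json"}


def table_dict_from_directory(table_dir: Path, prefix: str = "CMIP6_") -> dict:
    """Map shortened table names (``CMIP6_Omon.json`` -> ``Omon``) to parsed JSON."""
    tables = {}
    for path in sorted(Path(table_dir).glob("*.json")):
        if path.name in IGNORED_TABLE_FILES:
            continue  # skip files that are not variable tables
        tables[path.stem.replace(prefix, "")] = json.loads(path.read_text())
    return tables


# Usage with a throwaway directory holding two tables and one ignored file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("CMIP6_Omon.json", "CMIP6_Amon.json", "CMIP6_CV.json"):
        (Path(tmp) / name).write_text(json.dumps({"file": name}))
    tables = table_dict_from_directory(Path(tmp))
```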
@@ -217,11 +230,11 @@ def _post_init_populate_rules_with_tables(self):
         tables = self._general_cfg["tables"]
         for rule in self.rules:
             for tbl in tables.values():
-                if rule.cmor_variable in tbl.variable_ids:
+                if rule.cmor_variable in tbl.variables:
                     rule.add_table(tbl.table_id)

     def _post_init_data_request_variables(self):
-        for drv in self.data_request.variables:
+        for drv in self.data_request.variables.values():
             rule_for_var = self.find_matching_rule(drv)
             if rule_for_var is None:
                 continue
@@ -312,10 +325,11 @@ def _rules_expand_drvs(self):
         self.rules = new_rules

     def _rules_depluralize_drvs(self):
+        """Ensures that only one data request variable is assigned to each rule"""
         for rule in self.rules:
             assert len(rule.data_request_variables) == 1
-            drv = rule.data_request_variable = rule.data_request_variables[0]
-            drv.depluralize()
+            rule.data_request_variable = rule.data_request_variables[0]
+            del rule.data_request_variables

     def _post_init_create_pipelines(self):
         pipelines = []
@@ -371,7 +385,7 @@ def _check_is_subperiod(self):
         errors = []
         for rule in self.rules:
             table_freq = _frequency_from_approx_interval(
-                rule.data_request_variable.table.approx_interval
+                rule.data_request_variable.table_header.approx_interval
             )
             # is_subperiod from pandas does not support YE or ME notation
             table_freq = table_freq.rstrip("E")
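The `create_factory(...).get(self.cmor_version)` calls in this file select a version-specific class at run time. The implementation of `create_factory` is not shown in this view, so the following is only a toy sketch of one way such a lookup could work. It assumes subclasses are named `<version><BaseName>`, and every class here is a stand-in rather than pymorize's own.

```python
class DataRequestTable:
    """Stand-in base class (assumption); not the real pymorize class."""


class CMIP6DataRequestTable(DataRequestTable):
    """Stand-in CMIP6 variant."""


class CMIP7DataRequestTable(DataRequestTable):
    """Stand-in CMIP7 variant."""


def create_factory(base_cls):
    """Toy factory: finds a direct subclass of ``base_cls`` by version prefix."""

    class Factory:
        @staticmethod
        def get(version: str):
            wanted = f"{version}{base_cls.__name__}"
            for sub in base_cls.__subclasses__():
                if sub.__name__ == wanted:
                    return sub
            raise ValueError(f"No {base_cls.__name__} registered for {version!r}")

    return Factory


table_cls = create_factory(DataRequestTable).get("CMIP7")
```

With a lookup of this shape, supporting another version means adding a subclass; the call sites in `CMORizer` stay unchanged.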