Initial Sphinx documentation content. (#76)
* Initial Sphinx documentation content.

* Improved Sphinx config.

* Update README.

* Create placeholders for extended User Guide content.

* Reference docs pages.

* Fix User Guide page.

* Fix some links.

* Tie off broken link.
pp-mo committed Jun 26, 2024
1 parent 4ef8f0c commit 81a0c68
Showing 28 changed files with 969 additions and 216 deletions.
6 changes: 0 additions & 6 deletions .pre-commit-config.yaml
``` diff
@@ -59,12 +59,6 @@ repos:
       - id: blacken-docs
         types: [file, rst]
 
-  - repo: https://github.com/aio-libs/sort-all
-    rev: v1.2.0
-    hooks:
-      - id: sort-all
-        types: [file, python]
-
   - repo: https://github.com/pycqa/pydocstyle
     rev: 6.3.0
     hooks:
```
14 changes: 14 additions & 0 deletions .readthedocs.yml
@@ -4,9 +4,23 @@ build:
``` yaml
  os: ubuntu-20.04
  tools:
    python: mambaforge-4.10

  jobs:
    # Content here largely copied from Iris
    # see : https://github.com/SciTools/iris/pull/4855
    post_checkout:
      # The SciTools/iris repository is shallow i.e., has a .git/shallow,
      # therefore complete the repository with a full history in order
      # to allow setuptools-scm to correctly auto-discover the version.
      - git fetch --unshallow
      - git fetch --all
    # Need to stash the local changes that Read the Docs makes so that
    # setuptools_scm can generate the correct Iris version.
    pre_install:
      - git stash
    post_install:
      - sphinx-apidoc -Mfe -o ./docs/api ./lib/ncdata
      - git stash pop

conda:
  environment: requirements/readthedocs.yml
```
254 changes: 71 additions & 183 deletions README.md
@@ -22,218 +22,106 @@ This enables the user to freely mix+match operations from both projects, getting
> temp_cube = cubes.extract_cube("air_temperature")
> qplt.contourf(temp_cube[0])
## Contents
* [Motivation](#motivation)
* [Primary Use](#primary-use)
* [Secondary Uses](#secondary-uses)
* [Principles](#principles)
* [Working Usage Examples](#code-examples)
* [API documentation](#api-documentation)
* [Installation](#installation)
* [Project Status](#project-status)
* [Change Notes](#change-notes)
* [Code stability](#code-stability)
* [Iris and Xarray version compatibility](#iris-and-xarray-compatibility)
* [Current Limitations](#known-limitations)
* [Known Problems](#known-problems)
* [References](#references)
* [Developer Notes](#developer-notes)

# Motivation
## Primary Use
Fast and efficient translation of data between Xarray and Iris objects.

This allows the user to mix+match features from either package in code.
For example, see the [demonstration code examples](#demonstration-code-examples) below.

# Purposes
* represent netcdf data as structures of Python objects
* easy manipulation of netcdf data with pythonic syntax
* fast and efficient translation of data between Xarray and Iris objects
  * this allows the user to mix+match features from either package in code

See : https://ncdata.readthedocs.io/en/latest/userdocs/user_guide/design_principles.html
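
A minimal sketch of the first two points, assuming a local netcdf file `file1.nc`
(it uses only calls that appear in the examples below):

``` python
# Inspect netcdf structure as plain Python objects.
from ncdata.netcdf4 import from_nc4

ncdata = from_nc4("file1.nc")
print(list(ncdata.dimensions))       # the dimension names
for var in ncdata.variables.values():
    print(var.name, var.dimensions)  # each variable, with its dimensions
```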

# Documentation
On ReadTheDocs. Please see:
* [stable](https://ncdata.readthedocs.io/en/stable/index.html)
* [latest](https://ncdata.readthedocs.io/en/latest/index.html)

# Demonstration code examples:
* [Apply Iris regrid to xarray data](#apply-iris-regrid-to-xarray-data)
* [Use Zarr data in Iris](#use-zarr-data-in-iris)
* [Correct a mis-coded attribute in Iris input](#correct-a-miscoded-attribute-in-iris-input)
* [Rename a dimension in xarray output](#rename-a-dimension-in-xarray-output)
* [Copy selected data to a new file](#copy-selected-data-to-a-new-file)

## Apply Iris regrid to xarray data
``` python
import iris.analysis
import xarray

from ncdata.iris_xarray import cubes_to_xarray, cubes_from_xarray

# Apply an Iris regridder to xarray data.
# N.B. "grid_cube" is assumed to be a pre-existing cube defining the target grid.
dataset = xarray.open_dataset("file1.nc", chunks="auto")
(cube,) = cubes_from_xarray(dataset)
cube2 = cube.regrid(grid_cube, iris.analysis.PointInCell())
dataset2 = cubes_to_xarray(cube2)
```

## Apply Xarray statistic to Iris data
``` python
import iris

from ncdata.iris_xarray import cubes_to_xarray, cubes_from_xarray

cubes = iris.load("file1.nc")
dataset = cubes_to_xarray(cubes)
dataset2 = dataset.groupby("time.dayofyear").argmin()
cubes2 = cubes_from_xarray(dataset2)
```
## Use Zarr data in Iris
``` python
import xarray as xr

from ncdata.iris_xarray import cubes_to_xarray, cubes_from_xarray
from ncdata.threadlock_sharing import enable_lockshare

enable_lockshare(iris=True, xarray=True)

# N.B. "my_process" stands for whatever cube processing is wanted.
dataset = xr.open_dataset(input_zarr_path, engine="zarr", chunks="auto")
input_cubes = cubes_from_xarray(dataset)
output_cubes = my_process(input_cubes)
dataset2 = cubes_to_xarray(output_cubes)
dataset2.to_zarr(output_zarr_path)
```
* data conversion is equivalent to writing to a file with one library, and reading it
  back with the other ..
* .. except that no actual files are written
* both real (numpy) and lazy (dask) variable data arrays are transferred directly,
  without copying or computing (see the sketch below)
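
A minimal sketch of that last point, assuming a chunked netcdf file `file1.nc`
containing a single data variable:

``` python
# Lazy (dask) arrays survive the xarray -> Iris crossing uncomputed.
import xarray as xr

from ncdata.iris_xarray import cubes_from_xarray

dataset = xr.open_dataset("file1.nc", chunks="auto")
(cube,) = cubes_from_xarray(dataset)
print(cube.has_lazy_data())  # --> True : data was transferred, not computed
```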


## Secondary Uses
### Exact control of file formatting
Ncdata can also be used as a transfer layer between Iris or Xarray file i/o and the
exact format of data stored in files.
I.e. adjustments can be made to file data before loading it into Iris/Xarray, or
Iris/Xarray saved output can be adjusted before writing to a file.

This allows the user to work around any package limitations in controlling storage
aspects such as : data chunking, reserved attributes, missing-value processing, or
dimension control.

For example:
## Correct a miscoded attribute in Iris input
``` python
import xarray as xr

from ncdata.xarray import from_xarray
from ncdata.iris import to_iris
from ncdata.netcdf4 import to_nc4, from_nc4
from ncdata.threadlock_sharing import enable_lockshare

enable_lockshare(iris=True)

# N.B. "input_path" is the path of the file to fix.
ncdata = from_nc4(input_path)
for var in ncdata.variables.values():
    if "coords" in var.attributes:
        var.attributes.rename("coords", "coordinates")
cubes = to_iris(ncdata)
```

## Rename a dimension in xarray output
``` python
enable_lockshare(xarray=True)

dataset = xr.open_dataset("file1.nc")
xr_ncdata = from_xarray(dataset)
xr_ncdata.dimensions.rename("dim0", "newdim")
# (the equivalent "manual" approach :
#    dim = xr_ncdata.dimensions.pop("dim0")
#    dim.name = "newdim"
#    xr_ncdata.dimensions["newdim"] = dim
# )
# N.B. must also replace the name in dimension-lists of variables
for var in xr_ncdata.variables.values():
    var.dimensions = ["newdim" if dim == "dim0" else dim for dim in var.dimensions]
to_nc4(xr_ncdata, "file_2a.nc")

# Fix chunking in Iris input
ncdata = from_nc4("file1.nc")
for var in ncdata.variables.values():
    # custom chunking() mimics the file chunks we want
    var.chunking = lambda var=var: [
        100.0e6 if dim == "dim0" else -1 for dim in var.dimensions
    ]
cubes = to_iris(ncdata)
```

### Manipulation of data
ncdata can also be used for data extraction and modification, similar in scope to the
CDO and NCO command-line operators, but without file operations.
However, this type of usage is as yet undeveloped : there is no inbuilt support
for data consistency checking, or for obviously useful operations such as indexing by
dimension.
This could be added in future, but many such operations (like indexing) may anyway be
better done in Iris/Xarray.
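
For instance, a minimal sketch of an NCO-like operation, assuming the variables map
supports dict-style `pop()` as the dimensions map is shown to do above, and with
`"unwanted_var"` as a purely illustrative name:

``` python
# Drop one variable from a dataset, roughly like "ncks -x -v unwanted_var".
from ncdata.netcdf4 import from_nc4, to_nc4

ncdata = from_nc4("file1.nc")
ncdata.variables.pop("unwanted_var")  # assumed dict-style removal
to_nc4(ncdata, "file2.nc")
```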


# Principles
* ncdata represents NetCDF data as Python objects
* ncdata objects can be freely manipulated, independent of any data file
* ncdata variables can contain either real (numpy) or lazy (Dask) arrays
* ncdata can be losslessly converted to and from actual NetCDF files
* Iris or Xarray objects can be converted to and from ncdata, in the same way that
  they are read from and saved to NetCDF files
* **_translation_** between Xarray and Iris is based on conversion to ncdata, which
  is in turn equivalent to file i/o
  * thus, Iris/Xarray translation is equivalent to _saving_ from one
    package into a file, then _loading_ the file in the other package
    (see the sketch below)
* ncdata exchanges variable data directly with Iris/Xarray, with no copying of real
  data or computing of lazy data
* ncdata exchanges lazy arrays with files using Dask 'streaming', thus allowing
  transfer of arrays larger than memory
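
A minimal sketch of the translation-equivalence principle, assuming a local netcdf
file `file1.nc` (`tmp.nc` is just an illustrative temporary filename):

``` python
# Translating via ncdata behaves like save-then-load through a file,
# just without the actual file.
import iris
import xarray as xr

from ncdata.iris_xarray import cubes_to_xarray

cubes = iris.load("file1.nc")
dataset = cubes_to_xarray(cubes)              # direct translation ...
iris.save(cubes, "tmp.nc")                    # ... is equivalent to saving ...
dataset_via_file = xr.open_dataset("tmp.nc")  # ... then re-loading
```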


# Code Examples
* mostly TBD
* proof-of-concept script for
[netCDF4 file i/o](https://github.com/pp-mo/ncdata/blob/main/tests/integration/example_scripts/ex_ncdata_netcdf_conversion.py)
* proof-of-concept script for
[iris-xarray conversions](https://github.com/pp-mo/ncdata/blob/main/tests/integration/example_scripts/ex_iris_xarray_conversion.py)


# API documentation
* see the [ReadTheDocs build](https://ncdata.readthedocs.io/en/latest/index.html)


# Installation
Install from conda-forge with conda
```
conda install -c conda-forge ncdata
```

Or from PyPI with pip
```
pip install ncdata
```

# Project Status

## Code Stability
We intend to follow [PEP 440](https://peps.python.org/pep-0440/) or (older) [SemVer](https://semver.org/) versioning principles.

The current minor release version is **"v0.1"**.
This is a first complete implementation, with all public APIs functionally operational.

The code is however still experimental, and APIs are not stable (hence no major version yet).
## Copy selected data to a new file
``` python
from ncdata.netcdf4 import from_nc4, to_nc4

ncdata = from_nc4("file1.nc")

# Make a list of partial names to select the wanted variables
keys = ["air_", "surface"]

# Explicitly add dimension names, to include all the dimension variables
keys += list(ncdata.dimensions)

# Identify the wanted variables
select_vars = [
    var
    for var in ncdata.variables.values()
    if any(key in var.name for key in keys)
]

# Add any referenced coordinate variables
for var in list(select_vars):
    for coordname in var.attributes.get("coordinates", "").split():
        select_vars.append(ncdata.variables[coordname])

# Replace variables with only the wanted ones
ncdata.variables.clear()
ncdata.variables.addall(select_vars)

# Save
to_nc4(ncdata, "pruned.nc")
```

## Change Notes
### v0.1.1
Small tweaks + bug fixes.
**Note:** [#62](https://github.com/pp-mo/ncdata/pull/62) and [#59](https://github.com/pp-mo/ncdata/pull/59) are important fixes to achieve the intended performance goals,
i.e. moving arbitrarily large data via Dask without running out of memory.

* Stop non-numpy attribute values from breaking attribute printout. [#63](https://github.com/pp-mo/ncdata/pull/63)
* Stop ``ncdata.iris.from_iris()`` consuming full data memory for each variable. [#62](https://github.com/pp-mo/ncdata/pull/62)
* Provide convenience APIs for ncdata component dictionaries and attribute values. [#61](https://github.com/pp-mo/ncdata/pull/61)
* Use dask ``chunks="auto"`` in ``ncdata.netcdf4.from_nc4()``. [#59](https://github.com/pp-mo/ncdata/pull/59)

### v0.1.0
First release.

## Iris and Xarray Compatibility
* CI tests run on GitHub PRs and merges, against the latest releases of Iris and Xarray
* compatible with iris >= v3.7.0
  * see : [support added in v3.7.0](https://scitools-iris.readthedocs.io/en/stable/whatsnew/3.7.html#internal)

## Known limitations
Unsupported features : _not planned_
* user-defined datatypes are not supported
  * this includes compound and variable-length types

Unsupported features : _planned for future release_
* groups (not yet fully supported?)
* file output chunking control

## Known problems
As of v0.1.1 :
* in conversion from iris cubes with [`from_iris`](https://ncdata.readthedocs.io/en/latest/api/ncdata.iris.html#ncdata.iris.from_iris),
  use of an `unlimited_dims` key currently causes an exception
  * https://github.com/pp-mo/ncdata/issues/43
* in conversion to xarray with [`to_xarray`](https://ncdata.readthedocs.io/en/latest/api/ncdata.xarray.html#ncdata.xarray.to_xarray),
  dataset encodings are not reproduced, most notably **the "unlimited_dims" control is missing**
  * https://github.com/pp-mo/ncdata/issues/66

# Older References in Iris
* Iris issue : https://github.com/SciTools/iris/issues/4994
* planning presentation : https://github.com/SciTools/iris/files/10499677/Xarray-Iris.bridge.proposal.--.NcData.pdf
* in-Iris code workings : https://github.com/pp-mo/iris/pull/75


# Developer Notes
## Documentation build
* For a full docs-build, a simple `make html` will do for now.
  * The ``docs/Makefile`` wipes the API docs and invokes sphinx-apidoc for a full rebuild
  * Results are then available at ``docs/_build/html/index.html``
* The above is just for _local testing_, if required :
  we have automatic builds for releases and PRs via [ReadTheDocs](https://readthedocs.org/projects/ncdata/)

## Release actions
1. Cut a release on GitHub : this triggers a new docs version on [ReadTheDocs](https://readthedocs.org/projects/ncdata/)
2. Build the distribution
   1. if needed, get [build](https://github.com/pypa/build)
   2. run `python -m build`
3. Push to PyPI
   1. if needed, get [twine](https://github.com/pypa/twine)
   2. run `python -m twine upload --repository testpypi dist/*`
      * this uploads to TestPyPI
   3. create a new env with test dependencies : `conda create -n ncdtmp python=3.11 iris xarray filelock requests pytest pip`
      (N.B. 'filelock' and 'requests' are _test_ dependencies of iris)
   4. install the new package with `pip install --index-url https://test.pypi.org/simple/ ncdata`, and run tests
   5. if that checks OK, _remove_ `--repository testpypi` _and repeat step 2_
      * --> this uploads to the "real" PyPI
   6. repeat step 4, _removing_ the `--index-url`, to check that `pip install ncdata` now finds the new version
4. Update conda to source the new version from PyPI
   1. create a PR on the [ncdata feedstock](https://github.com/conda-forge/ncdata-feedstock)
   2. update :
      * [version number](https://github.com/conda-forge/ncdata-feedstock/blob/3f6b35cbdffd2ee894821500f76f2b0b66f55939/recipe/meta.yaml#L2)
      * [SHA](https://github.com/conda-forge/ncdata-feedstock/blob/3f6b35cbdffd2ee894821500f76f2b0b66f55939/recipe/meta.yaml#L10)
      * Note : the [PyPI reference](https://github.com/conda-forge/ncdata-feedstock/blob/3f6b35cbdffd2ee894821500f76f2b0b66f55939/recipe/meta.yaml#L9) will normally look after itself
      * Also : make any required changes to [dependencies](https://github.com/conda-forge/ncdata-feedstock/blob/3f6b35cbdffd2ee894821500f76f2b0b66f55939/recipe/meta.yaml#L17-L29) -- normally _no change required_
   3. get the PR merged ; wait a few hours ; check the new version appears in `conda search ncdata`
4 changes: 2 additions & 2 deletions docs/Makefile
``` diff
@@ -15,8 +15,8 @@ help:
 .PHONY: help Makefile
 
 allapi:
-	rm -rf ./api
-	sphinx-apidoc -Mfe -o ./api ../lib/ncdata
+	rm -rf ./details/api
+	sphinx-apidoc -Mfe -o ./details/api ../lib/ncdata
 
 # Catch-all target: route all unknown targets to Sphinx using the new
 # "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
```
3 changes: 3 additions & 0 deletions docs/_templates/repo.html
``` diff
@@ -0,0 +1,3 @@
+<!-- A github repo link -->
+<a class="github reference external" href="https://github.com/pp-mo/ncdata">NcData on GitHub</a>
+
```