ODC EP 001 Add Support for 3D Datasets
Following discussion in issue #672, we'd like to move to a more formal proposal for adding better support for higher-dimensional data management within datacube. Adding fully generic support for n-d data is too challenging a task, given that we still want to retain present functionality with respect to scale and projection normalisation during the load operation. Instead we start with a constrained implementation that should be sufficient for certain types of problems, like managing hyperspectral data.
- Kirill Kouzoubov
- Robert Woodcock
- snowman2
- Under Discussion
- In Progress
- Completed
- Rejected
- Deferred
Datacube does support loading data into an arbitrary n-d xarray.DataArray. Right now .load_data supports more than one non-spatial dimension: give it an n-dimensional xarray.DataArray of Tuple[datacube.Dataset] and you will get back an (n+2)-dimensional array of pixels, with the extra dimensions being y,x.
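The dimension bookkeeping described above can be sketched without touching real data. The helper below is hypothetical (loaded_shape is not part of datacube); it only illustrates that an n-dimensional grid of dataset tuples yields an (n+2)-dimensional pixel array once y,x are appended.

```python
from typing import Tuple


def loaded_shape(sources_shape: Tuple[int, ...], ny: int, nx: int) -> Tuple[int, ...]:
    """Shape of the pixel array for an n-d grid of dataset tuples.

    Hypothetical helper for illustration only: the non-spatial dimensions
    are kept as they are and y,x are appended at the end.
    """
    return tuple(sources_shape) + (ny, nx)


# One non-spatial dimension (e.g. 5 time slices) -> 3-d output
time_only = loaded_shape((5,), ny=400, nx=300)

# Two non-spatial dimensions (e.g. 5 time slices x 200 wavelengths) -> 4-d output
time_and_wavelength = loaded_shape((5, 200), ny=400, nx=300)
```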
A fundamental assumption within datacube is that a dataset encodes a collection of named 2D rasters, (datacube.Dataset, band_name) -> Single Y,X raster. Load needs to operate on 2D rasters at the lowest level, as it does things like projection change, rescaling and unifying several datasets into one raster plane (mosaic). So in order to model, say, a 200 channel hyperspectral image one has to either:
- Create a single Dataset with 200 bands: b001, b002, .. b200
- Create 200 Datasets with a single reflectance band, each Dataset covering the same region in time and space, but pointing to a different hyperspectral channel. Then you have to use a custom group_by operator that knows which dataset encodes which wavelength.
Both approaches are problematic: defining 200 bands is a chore, and creating 200 datasets is even more of a chore and has implications for database performance. Also, group_datasets in its current form assumes a single non-spatial dimension (see #643).
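To make the first option concrete, here is a sketch of what generating 200 measurement entries for a product definition might look like. The helper name is hypothetical, and the dtype, nodata and units values are placeholders chosen for illustration, not values from any real product.

```python
def hyperspectral_measurements(n_channels: int = 200) -> list:
    """Build one measurement entry per channel: b001, b002, .. b200.

    The dtype/nodata/units values are assumptions for illustration.
    """
    return [
        {
            "name": f"b{i:03d}",
            "dtype": "int16",
            "nodata": -999,
            "units": "1",
        }
        for i in range(1, n_channels + 1)
    ]


measurements = hyperspectral_measurements()
band_names = [m["name"] for m in measurements]
```

Even when generated programmatically, the result is still 200 separately named 2D rasters rather than one 3D array.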
We are making several simplifying assumptions compared to generic n-d support:
- Only support 2d or 3d data per dataset
  - Simplifies configuration
  - Reduces implementation complexity
- Assume that extra dimension shape and axis values are fixed for the entire product
  - Removes the need for complex unification rules
  - Trivial to determine shape of output
- Fixed order of dimensions as they come out of .load|.load_data
  - Reduces configuration surface
  - Is consistent with the status quo: currently it's t,y,x, always, non-negotiable
  - Allows more efficient implementation
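As a rough illustration of why these assumptions make the output shape trivial to determine, consider the sketch below. The function is hypothetical and assumes the extra dimension is placed just before the spatial dimensions, with its size read from the product definition rather than from individual datasets.

```python
from typing import Optional, Tuple


def output_shape(n_time: int, ny: int, nx: int,
                 extra_dim_size: Optional[int] = None) -> Tuple[int, ...]:
    """Output shape under the fixed-order, fixed-size assumptions.

    Hypothetical helper: extra_dim_size stands for the size of the extra
    dimension taken from the product definition (None for 2d products).
    """
    if extra_dim_size is None:
        # Status quo: t,y,x
        return (n_time, ny, nx)
    # Assumed placement: extra dimension just before the spatial ones
    return (n_time, extra_dim_size, ny, nx)


shape_2d_product = output_shape(n_time=3, ny=400, nx=300)
shape_3d_product = output_shape(n_time=3, ny=400, nx=300, extra_dim_size=200)
```

No dataset needs to be inspected to compute either shape, which is the point of fixing the extra dimension per product.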
For simplicity of notation we will refer to the new extra dimension as z and to the spatial dimensions as y,x. In practice the user can choose to name the extra dimension differently, and the spatial dimensions might be named longitude,latitude.
- Dataset[band] -> 2d (y,x) | 3d (z,y,x) pixels per band
  - All bands share the same number of dimensions: 2d or 3d
  - Individual bands can have different resolution in y,x, but not z. z, if present, is fixed across bands
  - z dimension can not be the time dimension, create extra Datasets for that (just like now)
  - Last two dimensions are y,x
  - If the extra dimension is present it will go just before the spatial dimensions: [z, ]y, x
- Assume fixed size extra dimension across all datasets and all bands within a product
  - Extra dimension should be defined in the product definition
    - Name for z axis: str, anything compatible with a python variable name, with the exception of time|t|x|y|longitude|latitude
    - Values for coordinates of the axis: List[float|str] matching the size of the dimension, no duplicates, sorted in the same order as returned by .read
- Slicing of z dimension on load should be supported
  - People will want rgb mosaics from hyperspectral data and they won't want to wait 20-30+ times longer than needed to construct them.
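The rules above for the extra dimension definition, and the motivation for slicing, can be sketched as follows. Everything here is illustrative: validate_extra_dim, select_z_indices and the wavelength values are hypothetical and not part of datacube. The ordering of the coordinate values is not checked, since the proposal ties it to the order returned by .read, which this sketch cannot know.

```python
import keyword
from typing import List, Sequence, Union

Coord = Union[float, str]

# Names the proposal reserves for time and spatial dimensions
RESERVED_NAMES = {"time", "t", "x", "y", "longitude", "latitude"}


def validate_extra_dim(name: str, values: Sequence[Coord], size: int) -> None:
    """Check a (hypothetical) extra-dimension definition against the rules above."""
    if not name.isidentifier() or keyword.iskeyword(name):
        raise ValueError(f"{name!r} is not usable as a python variable name")
    if name in RESERVED_NAMES:
        raise ValueError(f"{name!r} is reserved")
    if len(values) != size:
        raise ValueError(f"expected {size} coordinate values, got {len(values)}")
    if len(set(values)) != len(values):
        raise ValueError("coordinate values must not contain duplicates")


def select_z_indices(values: Sequence[Coord], wanted: Sequence[Coord]) -> List[int]:
    """Map requested coordinate values to positions along the extra dimension,
    so that only those planes need to be read."""
    index_of = {v: i for i, v in enumerate(values)}
    missing = [w for w in wanted if w not in index_of]
    if missing:
        raise KeyError(f"no such coordinate values: {missing}")
    return [index_of[w] for w in wanted]


# 200 made-up wavelengths in nanometres: 400.0, 410.0, .. 2390.0
wavelengths = [400.0 + 10.0 * i for i in range(200)]
validate_extra_dim("wavelength", wavelengths, size=200)

# Pick three channels for an rgb mosaic instead of reading all 200
rgb_indices = select_z_indices(wavelengths, [650.0, 550.0, 450.0])
```

With the indices known up front, a loader would only need to read 3 of the 200 planes for an rgb mosaic.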
After thinking some more about this, I think that the main complexity savings come from fixing extra dimensions, and not so much from limiting extra dimensions to just one or none. So if you have 4 depth slices (for example at 0, 10, 20, 100 meters) of 200 channel hyperspectral data, implementation complexity for that is not that much extra compared to just a fixed size single axis. I think going from 1 fixed extra dimension to n fixed extra dimensions should be fairly trivial. But going from 1 fixed extra dimension to 1 sparse extra dimension is a much bigger challenge.
- +1 - Robert Woodcock (on behalf of CSIRO)
- +1 - Alan Snow (snowman2)
- CSIRO:
- Peter Wang
- Mike Caccetta
- Robert Woodcock