Workflow enhancements #43

orianac · 2021-11-23T00:38:04Z

Below is a list of things we (might) need to do to improve the workflows. Not every improvement listed here will be required for every downscaling method.

Decide how to pass around connection credentials (e.g. connection_string or stores).
Improve support for working with multiple variables at once. This ties into whether we want to work with Datasets or DataArrays, since a Dataset can hold multiple variables and lets us operate on them together.
Update rechunk_zarr_array to process multiple variables at once. This requires passing it a list of variables instead of a single string.
Figure out how much of the preparation steps can be generalized across downscaling methods.
Key the observation regridding step to a specific grid rather than to a specific GCM, and include a test to make sure that the grids actually have the same coordinates, not just the same number of coordinates.
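A minimal sketch of the grid check from the last item, assuming each grid is described by 1-D `lat` and `lon` coordinate arrays (for an xarray Dataset, something like `ds.lat.values`). The helper `assert_same_grid`, its `names` and `atol` parameters, and the example grids are hypothetical, not existing code in the repo. It compares the coordinate values themselves, so two grids with the same number of points but shifted cells fail.

```python
import numpy as np


def assert_same_grid(coords_a, coords_b, names=("lat", "lon"), atol=1e-6):
    """Raise AssertionError unless both grids have the same coordinate values.

    coords_a and coords_b map a coordinate name to a 1-D array of values,
    e.g. {"lat": ds.lat.values, "lon": ds.lon.values} for an xarray Dataset.
    """
    for name in names:
        a = np.asarray(coords_a[name], dtype=float)
        b = np.asarray(coords_b[name], dtype=float)
        # Same length is necessary but not sufficient, so check it first.
        if a.shape != b.shape:
            raise AssertionError(f"{name}: shape {a.shape} != {b.shape}")
        # Then compare the coordinate values themselves, point by point.
        if not np.allclose(a, b, rtol=0.0, atol=atol):
            max_diff = float(np.max(np.abs(a - b)))
            raise AssertionError(f"{name}: values differ (max abs diff {max_diff:g})")


# Usage: both grids have the same number of points, but one is shifted by half a cell.
obs = {"lat": np.arange(25.0, 50.0), "lon": np.arange(235.0, 295.0)}
shifted = {"lat": obs["lat"] + 0.5, "lon": obs["lon"]}

assert_same_grid(obs, obs)  # passes silently
try:
    assert_same_grid(obs, shifted)
except AssertionError as err:
    print(err)  # lat: values differ (max abs diff 0.5)
```

Using a small absolute tolerance rather than exact equality avoids spurious failures from floating-point round-off when grids are written by different tools. With xarray objects, `xarray.testing.assert_allclose` applied to the coordinate variables would be a comparable check.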

Prefect features to add

Add caching routines.
Set up the workflow to read a config file and load its variables into the context, then have tasks read those context variables instead of passing them around as strings.
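A sketch of the config-file item, under stated assumptions: Prefect has its own context object, but this version uses the standard library's `contextvars` as a stand-in so it runs without Prefect installed. The JSON format, the keys `gcm` and `scenario`, the example values, and the helpers `load_config`, `run_context`, `get_setting`, and `build_output_path` are all hypothetical.

```python
import contextvars
import json
import os
import tempfile
from contextlib import contextmanager

# Stand-in for a workflow context (Prefect has its own); holds the run's settings.
_RUN_CONFIG = contextvars.ContextVar("run_config")


def load_config(path):
    """Read run settings from a JSON file (hypothetical format)."""
    with open(path) as f:
        return json.load(f)


@contextmanager
def run_context(config):
    """Make `config` visible to every task called inside the `with` block."""
    token = _RUN_CONFIG.set(dict(config))
    try:
        yield
    finally:
        # Restore whatever was set before, so nested runs don't leak settings.
        _RUN_CONFIG.reset(token)


def get_setting(key):
    """Look up a setting from the active context; raises LookupError outside one."""
    return _RUN_CONFIG.get()[key]


def build_output_path():
    """Hypothetical task: reads settings from the context instead of string arguments."""
    return f"{get_setting('gcm')}/{get_setting('scenario')}"


# Usage: write a small config file, then run a task inside its context.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, "run_config.json")
    with open(cfg_path, "w") as f:
        json.dump({"gcm": "MIROC6", "scenario": "ssp370"}, f)
    with run_context(load_config(cfg_path)):
        print(build_output_path())  # MIROC6/ssp370
```

The payoff over passing strings is that adding a new setting touches only the config file and the task that reads it, not every function signature in between.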


orianac · 2021-11-23T20:23:06Z

A few more to-dos from other comments in the PR:

Rename observations.py to training.py.
Add testing. A starting point would be to run a dummy one-year dataset through the workflow and assert that the output matches a reference. I have a sample output for one year that we could start with. This would be a longer-running test case, but it would be a good check that everything works as expected before we enter production.
Add cleanup routines (connected to issue #45, Data organization).
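A sketch of the one-year regression test described above, assuming the reference output is stored as a NumPy `.npy` file. `make_dummy_year`, `run_workflow`, and `check_against_reference` are hypothetical placeholders: in practice `run_workflow` would call the real flow, and the reference would be the existing one-year sample output rather than something generated inside the test.

```python
import os
import tempfile

import numpy as np


def make_dummy_year(seed=0, n_lat=4, n_lon=5):
    """Hypothetical dummy input: 365 daily fields with a seasonal cycle plus noise."""
    rng = np.random.default_rng(seed)  # fixed seed so every run sees the same data
    day = np.arange(365)
    seasonal = 10.0 * np.sin(2 * np.pi * day / 365)[:, None, None]
    noise = rng.normal(0.0, 1.0, size=(365, n_lat, n_lon))
    return 15.0 + seasonal + noise


def run_workflow(data):
    """Placeholder for the real workflow: removes each grid cell's annual mean."""
    return data - data.mean(axis=0)


def check_against_reference(reference_path, rtol=1e-6, atol=1e-8):
    """Run the workflow on the dummy year and compare with the stored reference."""
    output = run_workflow(make_dummy_year())
    reference = np.load(reference_path)
    # Fails on a shape mismatch or on any value outside the tolerances.
    np.testing.assert_allclose(output, reference, rtol=rtol, atol=atol)


# Usage: save a reference once, then check later runs against it.
with tempfile.TemporaryDirectory() as tmp:
    ref_path = os.path.join(tmp, "one_year_reference.npy")
    np.save(ref_path, run_workflow(make_dummy_year()))
    check_against_reference(ref_path)  # passes: output matches the reference
```

Keeping the seed fixed makes the dummy input deterministic, so a failure points at a change in the workflow rather than in the test data.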
