wip: try execution in CI
agoose77 committed Dec 4, 2024
1 parent 80ceb26 commit 888b8be
Showing 7 changed files with 169 additions and 118 deletions.
3 changes: 1 addition & 2 deletions .github/workflows/deploy.yml
@@ -40,8 +40,7 @@ jobs:
ipykernel
jupyter_server
- name: Build HTML Assets
# FIXME: enable execution once it is scoped to particular notebooks
run: myst build --html # --execute
run: myst build --html --execute
shell: micromamba-shell {0}
- name: Upload artifact
uses: actions/upload-pages-artifact@v3
11 changes: 5 additions & 6 deletions content/IS2_cloud_data_access.md
@@ -5,18 +5,16 @@ jupytext:
format_name: myst
format_version: 0.13
jupytext_version: 1.16.4
kernelspec:
display_name: Python 3 (ipykernel)
language: python
name: python3
---

+++ {"user_expressions": []}

# ICESat-2 AWS cloud data access

This notebook ({download}`download <IS2_cloud_data_access.ipynb>`) illustrates the use of icepyx for accessing ICESat-2 data currently available through the AWS (Amazon Web Services) us-west-2 hub s3 data bucket.

## Notes

1. ICESat-2 data became publicly available on the cloud on 29 September 2022. Thus, access methods and example workflows are still being developed by NSIDC, and the underlying code in icepyx will need to be updated now that these data (and the associated metadata) are available. We appreciate your patience and contributions (e.g. reporting bugs, sharing your code, etc.) during this transition!
2. This example and the code it describes are part of ongoing development. Current limitations to using these features are described throughout the example, as appropriate.
3. You **MUST** be working within an AWS instance. Otherwise, you will get a permissions error.
@@ -104,7 +102,7 @@ We can use the Variables module with an s3 url to explore available data variabl

Notice that accessing cloud data requires two layers of authentication: 1) authenticating with your Earthdata Login 2) authenticating for cloud access. These both happen behind the scenes, without the need for users to provide any explicit commands.

Icepyx uses earthaccess to generate your s3 data access token, which will be valid for *one* hour. Icepyx will also renew the token for you after an hour, so if viewing your token over the course of several hours you may notice the values will change.
Icepyx uses earthaccess to generate your s3 data access token, which will be valid for _one_ hour. Icepyx will also renew the token for you after an hour, so if viewing your token over the course of several hours you may notice the values will change.

If you do want to see your s3 credentials, you can access them using:

@@ -180,4 +178,5 @@ The slow load speed is a demonstration of the many steps involved in making clou
+++ {"user_expressions": []}

#### Credits
* notebook by: Jessica Scheick and Rachel Wegener

- notebook by: Jessica Scheick and Rachel Wegener
85 changes: 48 additions & 37 deletions content/IS2_data_access.md

Large diffs are not rendered by default.

30 changes: 16 additions & 14 deletions content/IS2_data_access2-subsetting.md
@@ -5,15 +5,12 @@ jupytext:
format_name: myst
format_version: 0.13
jupytext_version: 1.16.4
kernelspec:
display_name: python3
language: python
name: python3
---

+++ {"user_expressions": []}

# Subsetting ICESat-2 Data

This notebook ({download}`download <IS2_data_access2-subsetting.ipynb>`) illustrates the use of icepyx for subsetting ICESat-2 data ordered through the NSIDC DAAC. We'll show how to find out what subsetting options are available and how to specify the subsetting options for your order.

For more information on using icepyx to find, order, and download data, see our complementary [ICESat-2 Data Access Notebook](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_data_access.html).
@@ -66,11 +63,12 @@ Previously, icepyx required you to explicitly use the `.earthdata_login()` funct
## Discover Subsetting Options

You can see what subsetting options are available for a given product by calling `show_custom_options()`. The options are presented as a series of headings followed by available values in square brackets. Headings are:
* **Subsetting Options**: whether or not temporal and spatial subsetting are available for the data product
* **Data File Formats (Reformatting Options)**: return the data in a format other than the native hdf5 (submitted as a key=value kwarg to `order_granules(format='NetCDF4-CF')`)
* **Data File (Reformatting) Options Supporting Reprojection**: return the data in a reprojected reference frame. These will be available for gridded ICESat-2 L3B data products.
* **Data File (Reformatting) Options NOT Supporting Reprojection**: data file formats that cannot be delivered with reprojection
* **Data Variables (also Subsettable)**: a dictionary of variable name keys and the paths to those variables available in the product

- **Subsetting Options**: whether or not temporal and spatial subsetting are available for the data product
- **Data File Formats (Reformatting Options)**: return the data in a format other than the native hdf5 (submitted as a key=value kwarg to `order_granules(format='NetCDF4-CF')`)
- **Data File (Reformatting) Options Supporting Reprojection**: return the data in a reprojected reference frame. These will be available for gridded ICESat-2 L3B data products.
- **Data File (Reformatting) Options NOT Supporting Reprojection**: data file formats that cannot be delivered with reprojection
- **Data Variables (also Subsettable)**: a dictionary of variable name keys and the paths to those variables available in the product

```{code-cell} ipython3
region_a.show_custom_options(dictview=True)
@@ -107,6 +105,7 @@ Thus, this notebook uses a default list of wanted variables to showcase subsetti
+++ {"user_expressions": []}

### Determine what variables are available for your data product

There are multiple ways to get a complete list of available variables.
To increase readability, some display options (2 and 3, below) show the 200+ variable + path combinations as a dictionary where the keys are variable names and the values are the paths to that variable.

@@ -167,6 +166,7 @@ region_a.download_granules('/home/jovyan/icepyx/dev-notebooks/vardata') # <-- yo
```

### _Why does the subsetter say no matching data was found?_

_Sometimes, granules ("files") returned in our initial search end up not containing any data in our specified area of interest._
_This is because the initial search is completed using summary metadata for a granule._
_You've likely encountered this before when viewing available imagery online: your spatial search turns up a bunch of images with only a few border or corner pixels, maybe even in no data regions, in your area of interest._
@@ -185,6 +185,7 @@ fn = ''
```

## Check the downloaded data

Get all `latitude` variables in your downloaded file:

```{code-cell} ipython3
@@ -194,14 +195,14 @@ varlist = []
def IS2h5walk(vname, h5node):
if isinstance(h5node, h5py.Dataset):
varlist.append(vname)
return
return
with h5py.File(fn,'r') as h5pt:
h5pt.visititems(IS2h5walk)
for tvar in varlist:
vpath,vn = os.path.split(tvar)
if vn==varname: print(tvar)
if vn==varname: print(tvar)
```

### Compare to the variable paths available in the original data
@@ -211,5 +212,6 @@ region_a.order_vars.parse_var_list(region_a.order_vars.avail)[0][varname]
```

#### Credits
* notebook contributors: Zheng Liu, Jessica Scheick, and Amy Steiker
* some source material: [NSIDC Data Access Notebook](https://github.com/ICESAT-2HackWeek/ICESat2_hackweek_tutorials/tree/main/03_NSIDCDataAccess_Steiker) by Amy Steiker and Bruce Wallin

- notebook contributors: Zheng Liu, Jessica Scheick, and Amy Steiker
- some source material: [NSIDC Data Access Notebook](https://github.com/ICESAT-2HackWeek/ICESat2_hackweek_tutorials/tree/main/03_NSIDCDataAccess_Steiker) by Amy Steiker and Bruce Wallin
40 changes: 23 additions & 17 deletions content/IS2_data_read-in.md
@@ -5,21 +5,19 @@ jupytext:
format_name: myst
format_version: 0.13
jupytext_version: 1.16.4
kernelspec:
display_name: python3
language: python
name: python3
---

+++ {"user_expressions": []}

# Reading ICESat-2 Data in for Analysis

This notebook ({download}`download <IS2_data_read-in.ipynb>`) illustrates the use of icepyx for reading ICESat-2 data files, loading them into a data object.
Currently the default data object is an Xarray Dataset, with ongoing work to provide support for other data object types.

For more information on how to order and download ICESat-2 data, see the [icepyx data access tutorial](https://icepyx.readthedocs.io/en/latest/example_notebooks/IS2_data_access.html).

### Motivation

Most often, when you open a data file, you must specify the underlying data structure and how you'd like the information to be read in.
A simple example of this, for instance when opening a csv or similarly delimited file, is letting the software know if the data contains a header row, what the data type is (string, double, float, boolean, etc.) for each column, what the delimiter is, and which columns or rows you'd like to be loaded.
Many ICESat-2 data readers are quite manual in nature, requiring that you accurately type out a list of string paths to the various data variables.
@@ -28,6 +26,7 @@ icepyx simplifies this process by relying on its awareness of ICESat-2 specific
Instead of needing to manually iterate through the beam pairs, you can provide a few options to the `Read` object and icepyx will do the heavy lifting for you (as detailed in this notebook).

### Approach

If you're interested in what's happening under the hood: icepyx uses the [xarray](https://docs.xarray.dev/en/stable/) library to read in each of the requested variables of the dataset. icepyx formats each requested variable and then merges the read-in data from each of the variables to create a single data object. The use of xarray is powerful, because the returned data object can be used with relevant xarray processing tools.

+++
@@ -40,9 +39,10 @@ import icepyx as ipx

+++ {"user_expressions": []}

---------------------------------
---

## Quick-Start Guide

For those who might be looking into playing with this (but don't want all the details/explanations)

```{code-cell} ipython3
@@ -65,10 +65,12 @@ ds.plot.scatter(x="longitude", y="latitude", hue="h_li", vmin=-100, vmax=2000)

+++ {"user_expressions": []}

---------------------------------------
---

## Key steps for loading (reading) ICESat-2 data

Reading in ICESat-2 data with icepyx happens in a few simple steps:

1. Let icepyx know where to find your data (this might be local files or urls to data in cloud storage)
2. Create an icepyx `Read` object
3. Make a list of the variables you want to read in (does not apply for gridded products)
@@ -79,6 +81,7 @@ We go through each of these steps in more detail in this notebook.
+++ {"user_expressions": []}

### Step 0: Get some data if you haven't already

Here are a few lines of code to get you set up with a few data files if you don't already have some on your local system.

```{code-cell} ipython3
@@ -102,10 +105,11 @@ Previously, icepyx required you to explicitly use the `.earthdata_login()` funct

Provide a full path to the data to be read in (i.e. opened).
Currently accepted inputs are:
* a string path to directory - all files from the directory will be opened
* a string path to single file - one file will be opened
* a list of filepaths - all files in the list will be opened
* a glob string (see [glob](https://docs.python.org/3/library/glob.html)) - any files matching the glob pattern will be opened

- a string path to directory - all files from the directory will be opened
- a string path to single file - one file will be opened
- a list of filepaths - all files in the list will be opened
- a glob string (see [glob](https://docs.python.org/3/library/glob.html)) - any files matching the glob pattern will be opened

```{code-cell} ipython3
path_root = '/full/path/to/your/data/'
@@ -116,7 +120,7 @@ path_root = '/full/path/to/your/data/'
```

```{code-cell} ipython3
# list_of_files = ['/my/data/ATL06/processed_ATL06_20190226005526_09100205_006_02.h5',
# list_of_files = ['/my/data/ATL06/processed_ATL06_20190226005526_09100205_006_02.h5',
# '/my/other/data/ATL06/processed_ATL06_20191202102922_10160505_006_01.h5']
```

@@ -128,9 +132,9 @@ path_root = '/full/path/to/your/data/'

glob works using `*` and `?` as wildcard characters, where `*` matches any number of characters and `?` matches a single character. For example:

* `/this/path/*.h5`: refers to all `.h5` files in the `/this/path` folder (Example matches: "/this/path/processed_ATL03_20191130221008_09930503_006_01.h5" or "/this/path/myfavoriteicsat-2file.h5")
* `/this/path/*ATL07*.h5`: refers to all `.h5` files in the `/this/path` folder that have ATL07 in the filename. (Example matches: "/this/path/ATL07-02_20221012220720_03391701_005_01.h5" or "/this/path/processed_ATL07.h5")
* `/this/path/ATL??/*.h5`: refers to all `.h5` files that are in a subfolder of `/this/path` and a subdirectory of `ATL` followed by any 2 characters (Example matches: "/this/path/ATL03/processed_ATL03_20191130221008_09930503_006_01.h5", "/this/path/ATL06/myfile.h5")
- `/this/path/*.h5`: refers to all `.h5` files in the `/this/path` folder (Example matches: "/this/path/processed_ATL03_20191130221008_09930503_006_01.h5" or "/this/path/myfavoriteicsat-2file.h5")
- `/this/path/*ATL07*.h5`: refers to all `.h5` files in the `/this/path` folder that have ATL07 in the filename. (Example matches: "/this/path/ATL07-02_20221012220720_03391701_005_01.h5" or "/this/path/processed_ATL07.h5")
- `/this/path/ATL??/*.h5`: refers to all `.h5` files that are in a subfolder of `/this/path` and a subdirectory of `ATL` followed by any 2 characters (Example matches: "/this/path/ATL03/processed_ATL03_20191130221008_09930503_006_01.h5", "/this/path/ATL06/myfile.h5")
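
The wildcard behaviour described above can be checked with Python's standard `glob` module alone. The following is a minimal sketch, not icepyx code: the temporary directory, the file names, and the `match` helper are all hypothetical and exist only so the three pattern styles have something to match.

```python
import glob
import os
import tempfile

# Build a throwaway directory tree (all names are made up for illustration).
root = tempfile.mkdtemp()
for rel in ["processed_ATL07.h5", "notes.txt", "ATL03/a.h5", "ATL06/b.h5", "other/c.h5"]:
    full = os.path.join(root, rel)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    open(full, "w").close()


def match(pattern):
    """Return sorted paths, relative to root, that match a glob pattern."""
    hits = glob.glob(os.path.join(root, pattern))
    return sorted(os.path.relpath(h, root).replace(os.sep, "/") for h in hits)


# `*` matches any run of characters within a single path component.
top_level_h5 = match("*.h5")

# Surrounding a token with `*` finds it anywhere in the filename.
atl07_h5 = match("*ATL07*.h5")

# Each `?` matches exactly one character, so "other" is not selected.
atl_subdirs = match("ATL??/*.h5")

print(top_level_h5)
print(atl07_h5)
print(atl_subdirs)
```

Running a pattern through `glob.glob` like this before passing it on is a cheap way to confirm that it selects the files you expect.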

See the glob documentation or other online explainer tutorials for more in depth explanation, or advanced glob paths such as character classes and ranges.

@@ -143,6 +147,7 @@ See the glob documentation or other online explainer tutorials for more in depth
glob will not by default search all of the subdirectories for matching filepaths, but it has the ability to do so.

If you would like to search recursively, you can achieve this by either:

1. passing the `recursive` argument into `glob_kwargs` and including `\**\` in your filepath
2. using glob directly to create a list of filepaths
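
As a sketch of the second option, the standard-library `glob` module can build the recursive file list itself. Everything below is illustrative: the temporary directory, the file names, and the `relative` helper are hypothetical, and the step of handing the resulting list to icepyx is not shown.

```python
import glob
import os
import tempfile

# Throwaway tree with files at three depths (names are made up for illustration).
root = tempfile.mkdtemp()
for rel in ["top.h5", "sub/mid.h5", "sub/deeper/low.h5"]:
    full = os.path.join(root, rel)
    os.makedirs(os.path.dirname(full), exist_ok=True)
    open(full, "w").close()


def relative(paths):
    """Normalise absolute hits to sorted, root-relative, forward-slash paths."""
    return sorted(os.path.relpath(p, root).replace(os.sep, "/") for p in paths)


pattern = os.path.join(root, "**", "*.h5")

# Without recursive=True, `**` behaves like a single `*`: exactly one directory level.
flat = relative(glob.glob(pattern))

# With recursive=True, `**` matches zero or more directory levels.
deep = relative(glob.glob(pattern, recursive=True))

print(flat)
print(deep)
```

The resulting list of file paths is one of the accepted input types listed earlier, so it can be supplied in place of a glob string.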

@@ -272,7 +277,7 @@ ds = reader.load()

Within a Jupyter Notebook, you can get a summary view of your data object.

***ATTENTION: icepyx loads your data by creating an Xarray DataSet for each input granule and then merging them. In some cases, the automatic merge fails and needs to be handled manually. In these cases, icepyx will return a warning with the error message from the failed Xarray merge and a list of per-granule DataSets***
**_ATTENTION: icepyx loads your data by creating an Xarray DataSet for each input granule and then merging them. In some cases, the automatic merge fails and needs to be handled manually. In these cases, icepyx will return a warning with the error message from the failed Xarray merge and a list of per-granule DataSets_**

This can happen if you unintentionally provide the same granule multiple times with different filenames or in segmented products where the rgt+cycle automatically generated `gran_idx` values match. In this latter case, you can simply provide unique `gran_idx` values for each DataSet in `ds` and run `import xarray as xr` and `ds_merged = xr.merge(ds)` to create one merged DataSet.

@@ -302,8 +307,9 @@ Please let us know if you have any ideas or already have functions developed (we
+++ {"user_expressions": []}

#### Credits
* original notebook by: Jessica Scheick
* notebook contributors: Wei Ji and Tian

- original notebook by: Jessica Scheick
- notebook contributors: Wei Ji and Tian

```{code-cell} ipython3