Bucket is requester pays #3

Open
JulioHC00 opened this issue May 4, 2023 · 9 comments

@JulioHC00

Currently, when following the tutorial notebook and reaching the cell:

root = zarr.group(store)

The following error is raised:

ValueError: Bucket is requester pays. Set "requester_pays=True" when creating the GCSFileSystem.

Does this mean payment is now required to access the SDOML data?

@PaulJWright
Member

@JulioHC00 This shouldn't be the case. Let me get back to you.

@PaulJWright
Member

PaulJWright commented May 4, 2023

In the meantime, please try the following, which uses HelioCloud to load the data from AWS S3 instead of Google Cloud Storage.

# !pip install zarr s3fs
# note: zarr.LRUStoreCache is part of the zarr 2.x API

from typing import Optional, Union

import s3fs
import zarr
from fsspec.mapping import FSMap

AWS_ZARR_ROOT = (
    "s3://gov-nasa-hdrl-data1/contrib/fdl-sdoml/fdl-sdoml-v2/sdomlv2_small.zarr/"
)


def s3_connection(path_to_zarr: str) -> FSMap:
    """
    Instantiate a connection to AWS S3 for a given path `path_to_zarr`.
    """
    return s3fs.S3Map(
        root=path_to_zarr,
        # anonymous access: the bucket is public, so no credentials are needed
        s3=s3fs.S3FileSystem(anon=True),
        check=False,
    )


def load_single_aws_zarr(
    path_to_zarr: str,
    cache_max_single_size: Optional[int] = None,
) -> Union[zarr.Array, zarr.Group]:
    """
    Open a zarr array or group from S3 (read-only), wrapped in an LRU cache.

    `cache_max_single_size` is the cache size in bytes; None means unbounded.
    """
    return zarr.open(
        zarr.LRUStoreCache(
            store=s3_connection(path_to_zarr),
            max_size=cache_max_single_size,
        ),
        mode="r",
    )
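For anyone wondering what the `zarr.LRUStoreCache` wrapper buys you: repeated reads of the same chunk are served from memory instead of going back to S3. Below is a simplified, hypothetical sketch of that idea in plain Python; `ByteLRUCache` is made up for illustration and is not zarr's implementation. It wraps any read-only mapping and evicts the least recently used values once their total size exceeds `max_size` bytes, with `None` meaning unbounded, mirroring how `max_size` is described for `LRUStoreCache`.

```python
from collections import OrderedDict
from collections.abc import Mapping
from typing import Iterator, Optional


class ByteLRUCache(Mapping):
    """Hypothetical read-through LRU cache over a read-only mapping of bytes."""

    def __init__(self, store: Mapping, max_size: Optional[int] = None):
        self.store = store          # underlying (slow) store, e.g. an S3 mapping
        self.max_size = max_size    # cache budget in bytes; None = unbounded
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()
        self._current_size = 0
        self.hits = 0
        self.misses = 0

    def __getitem__(self, key: str) -> bytes:
        if key in self._cache:
            # cache hit: mark the entry as most recently used
            self._cache.move_to_end(key)
            self.hits += 1
            return self._cache[key]
        # cache miss: fetch from the underlying store (KeyError if absent)
        value = self.store[key]
        self.misses += 1
        self._put(key, value)
        return value

    def _put(self, key: str, value: bytes) -> None:
        size = len(value)
        # a value larger than the whole budget is never cached
        if self.max_size is not None and size > self.max_size:
            return
        self._cache[key] = value
        self._current_size += size
        # evict least recently used entries until back within budget
        while self.max_size is not None and self._current_size > self.max_size:
            _, evicted = self._cache.popitem(last=False)
            self._current_size -= len(evicted)

    def __iter__(self) -> Iterator[str]:
        return iter(self.store)

    def __len__(self) -> int:
        return len(self.store)
```

With a budget of 20 bytes and three 10-byte chunks, reading chunks 0, 1, 0, 2 in that order keeps chunks 0 and 2 and evicts chunk 1, since chunk 1 was the least recently used when the budget was exceeded.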

then

root = load_single_aws_zarr(
    path_to_zarr=AWS_ZARR_ROOT,
)

print(root.tree())
/
 └── 2010
     ├── 131A (6135, 512, 512) float32
     ├── 1600A (6136, 512, 512) float32
     ├── 1700A (6135, 512, 512) float32
     ├── 171A (6135, 512, 512) float32
     ├── 193A (6135, 512, 512) float32
     ├── 211A (6136, 512, 512) float32
     ├── 304A (6134, 512, 512) float32
     ├── 335A (6135, 512, 512) float32
     └── 94A (6136, 512, 512) float32

and then

import dask.array as da

data = root["2010"]["171A"]

# wrap the zarr array lazily; chunks are only read from S3 when computed
all_image = da.from_array(data)
all_image

will output:

[image: output of `all_image`]
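A rough size estimate shows why loading lazily with dask is preferable to reading a whole channel eagerly. The hypothetical helper below (stdlib only) computes the uncompressed in-memory size of a dense array from its shape, assuming 4 bytes per float32 value; the shapes are the ones printed by `root.tree()` in this thread. The bytes actually transferred from S3 depend on how the store's chunks are compressed, so treat these as upper-bound memory figures rather than download sizes.

```python
import math
from typing import Tuple


def uncompressed_nbytes(shape: Tuple[int, ...], itemsize: int = 4) -> int:
    """Size in bytes of a dense array with the given shape (float32 -> 4 bytes)."""
    return math.prod(shape) * itemsize


# shape reported by root.tree() for the 2010 171A channel in sdomlv2_small.zarr
aia_171 = uncompressed_nbytes((6135, 512, 512))
# shape reported for one 2010 HMI component (Bx) in sdomlv2_hmi.zarr
hmi_bx = uncompressed_nbytes((25540, 512, 512))

print(f"171A: {aia_171 / 1e9:.2f} GB")  # 171A: 6.43 GB
print(f"Bx: {hmi_bx / 1e9:.2f} GB")     # Bx: 26.78 GB
```

Each 512 x 512 float32 frame is exactly 1 MiB, so the size in MiB equals the number of time steps.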

@JulioHC00
Author

Thanks! That seems to work. Is there a way to access the full dataset? I've changed AWS_ZARR_ROOT to s3://gov-nasa-hdrl-data1/contrib/fdl-sdoml/fdl-sdoml-v2/ but the cell never seems to finish running. Is it just taking a long time, or is the full dataset not available this way?

@PaulJWright
Member

No worries, I've just tidied up the instructions a bit in case anyone else comes along!

Re s3://gov-nasa-hdrl-data1/contrib/fdl-sdoml/fdl-sdoml-v2/: that should work, it may just take a while. I can run it on my end and update here.

@PaulJWright
Member

@JulioHC00 I was mistaken, sorry.

~ aws s3 ls s3://gov-nasa-hdrl-data1/contrib/fdl-sdoml/fdl-sdoml-v2/ --no-sign-request
                           PRE notebooks/
                           PRE sdomlv2.zarr/
                           PRE sdomlv2_eve.zarr/
                           PRE sdomlv2_hmi.zarr/
                           PRE sdomlv2_hmi_small.zarr/
                           PRE sdomlv2_small.zarr/

I believe you'll want to load s3://gov-nasa-hdrl-data1/contrib/fdl-sdoml/fdl-sdoml-v2/sdomlv2.zarr/
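The same listing can be done from Python. As a hedged sketch: `zarr_stores` is a hypothetical helper that accepts any fsspec-style filesystem object exposing `ls(path, detail=False)` (for example `s3fs.S3FileSystem(anon=True)`) and keeps only the `.zarr` entries, returning full `s3://` URLs suitable for `load_single_aws_zarr`. It is written against a duck-typed filesystem so it can be tried without network access.

```python
from typing import List, Protocol


class ListableFS(Protocol):
    """Anything with an fsspec-style ls(); s3fs.S3FileSystem fits this shape."""

    def ls(self, path: str, detail: bool = False) -> List[str]: ...


def zarr_stores(fs: ListableFS, prefix: str) -> List[str]:
    """Return s3:// URLs of the *.zarr stores directly under `prefix`."""
    # fsspec filesystems take "bucket/key" paths without the s3:// scheme
    path = prefix[len("s3://"):] if prefix.startswith("s3://") else prefix
    entries = fs.ls(path.rstrip("/"), detail=False)
    return sorted(
        "s3://" + entry.rstrip("/") + "/"
        for entry in entries
        if entry.rstrip("/").endswith(".zarr")
    )
```

With network access, `zarr_stores(s3fs.S3FileSystem(anon=True), "s3://gov-nasa-hdrl-data1/contrib/fdl-sdoml/fdl-sdoml-v2/")` should return the five `.zarr` prefixes from the listing above and skip `notebooks/`.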

@JulioHC00
Author

It's working! Thanks a lot for the quick answer!

@PaulJWright self-assigned this May 4, 2023
@PaulJWright pinned this issue May 25, 2023
@jkilb

jkilb commented May 26, 2023

Paul, thank you for your response. I've also run into this error when attempting to pull the HMI data with:

loc_hmi = "fdl-sdoml-v2/sdomlv2_hmi.zarr/2010"
store = gcsfs.GCSMap(loc_hmi, gcs=gcs, check=False)
root = zarr.group(store)
print(root.tree())

I tried a similar solution to the one provided above, using the new functions with HelioCloud, but I'm receiving an error stating that there is nothing at the path. Do you know if the HMI data is available at the following path: "s3://gov-nasa-hdrl-data1/contrib/fdl-sdoml-v2/sdomlv2_hmi.zarr/2010"?

Thanks!

AWS_ZARR_ROOT_hmi = (
    "s3://gov-nasa-hdrl-data1/contrib/fdl-sdoml-v2/sdomlv2_hmi.zarr/2010"
)

root = load_single_aws_zarr(
    path_to_zarr=AWS_ZARR_ROOT_hmi,
)

print(root.tree())

@PaulJWright
Member

PaulJWright commented May 26, 2023

root = load_single_aws_zarr(
    path_to_zarr='s3://gov-nasa-hdrl-data1/contrib/fdl-sdoml/fdl-sdoml-v2/sdomlv2_hmi.zarr/2010/',
)

print(root.tree())


Hello, no worries! The mistake is in the path: the fdl-sdoml/ segment is missing. It should be

s3://gov-nasa-hdrl-data1/contrib/fdl-sdoml/fdl-sdoml-v2/sdomlv2_hmi.zarr/2010

instead of

s3://gov-nasa-hdrl-data1/contrib/fdl-sdoml-v2/sdomlv2_hmi.zarr/2010

AWS_ZARR_ROOT_hmi = (
    "s3://gov-nasa-hdrl-data1/contrib/fdl-sdoml/fdl-sdoml-v2/sdomlv2_hmi.zarr/2010"
)

root = load_single_aws_zarr(
    path_to_zarr=AWS_ZARR_ROOT_hmi,
)

print(root.tree())
/
 ├── Bx (25540, 512, 512) float32
 ├── By (25540, 512, 512) float32
 └── Bz (25540, 512, 512) float32
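Since a missing fdl-sdoml/ segment is an easy mistake, one option is to build every path from a single base prefix. `sdoml_path` below is a hypothetical convenience, not part of any SDOML tooling; the base prefix is the one confirmed in this thread, and the store names come from the `aws s3 ls` listing posted earlier.

```python
SDOML_V2_PREFIX = "s3://gov-nasa-hdrl-data1/contrib/fdl-sdoml/fdl-sdoml-v2"

# store names taken from the `aws s3 ls` listing earlier in this thread
KNOWN_STORES = {
    "sdomlv2.zarr",
    "sdomlv2_eve.zarr",
    "sdomlv2_hmi.zarr",
    "sdomlv2_hmi_small.zarr",
    "sdomlv2_small.zarr",
}


def sdoml_path(store: str, *groups: str) -> str:
    """Build <prefix>/<store>/<group>/.../ for the SDOML v2 data on HelioCloud."""
    if store not in KNOWN_STORES:
        raise ValueError(
            f"unknown store {store!r}; expected one of {sorted(KNOWN_STORES)}"
        )
    parts = [SDOML_V2_PREFIX, store, *groups]
    # join with single slashes and end with a trailing slash
    return "/".join(part.strip("/") for part in parts) + "/"
```

For example, `load_single_aws_zarr(path_to_zarr=sdoml_path("sdomlv2_hmi.zarr", "2010"))` points at the same 2010 HMI group as above.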

@eliu390

eliu390 commented Jul 19, 2023

@PaulJWright, I'm trying to write a service that periodically pulls the latest HMI/AIA data. Accessing the dataset using

AWS_ZARR_ROOT = (
    "s3://gov-nasa-hdrl-data1/contrib/fdl-sdoml/fdl-sdoml-v2/sdomlv2.zarr/"
)

returns data from 2010-2020. Is the data from after 2020 available via this or some other method?
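One way to check which years a store covers, without downloading any array data, is to inspect its top-level group names, which are years in the trees shown earlier in this thread (e.g. 2010). As a hedged sketch, `year_coverage` is a hypothetical helper that takes those names, for example from `root.group_keys()` on a group opened with `load_single_aws_zarr`, and reports the first and last year plus any gaps. It only describes what the opened store contains.

```python
from typing import Iterable, List, Tuple


def year_coverage(group_names: Iterable[str]) -> Tuple[int, int, List[int]]:
    """Return (first_year, last_year, missing_years) from year-named zarr groups."""
    # ignore any group whose name is not purely digits
    years = sorted(int(name) for name in group_names if name.isdigit())
    if not years:
        raise ValueError("no year-named groups found")
    present = set(years)
    missing = [y for y in range(years[0], years[-1] + 1) if y not in present]
    return years[0], years[-1], missing
```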
