Invalid values for some images (np.nan) #1

Open
mariusgiger opened this issue Jun 17, 2022 · 2 comments


@mariusgiger

Hi there,

Some of the images contain np.nan values, which causes problems when training ML models.

For example, at the following observation times in the 171A channel (fdl-sdoml-v2/sdomlv2.zarr/2020/171A):

{
  "T_OBS": [
    "2020-02-24T01:48:10.35Z",
    "2020-02-24T01:54:10.35Z",
    "2020-02-24T02:00:10.35Z",
    "2020-02-24T02:06:10.35Z",
    "2020-02-24T02:12:10.35Z",
    "2020-02-24T02:18:10.34Z",
    "2020-02-24T02:24:10.35Z",
    "2020-02-24T02:30:10.35Z",
    "2020-02-24T02:36:10.35Z",
    "2020-02-24T02:42:10.34Z",
    "2020-02-24T02:48:10.35Z",
    "2020-02-24T02:54:10.34Z"
  ]
}

Given a PyTorch DataLoader (an example can be found here), the issue can be reproduced as follows:

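import numpy as np

# `loader` is the PyTorch DataLoader described above; each batch yields (images, attributes).
for batch_idx, samples in enumerate(loader):
    X, attrs = samples
    for i, x in enumerate(X):
        # Flag any sample that contains at least one NaN pixel.
        if np.isnan(x).any():
            obs_time = attrs["T_OBS"][i]
            print(f"found invalid sample at {obs_time}")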

The resulting tensor will contain np.nan values:

tensor([[[nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan],
         ...,
         [nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan]]])
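
To tell fully blank frames from frames with only a few bad pixels, it can help to log the fraction of NaN pixels per sample. Below is a minimal sketch using NumPy only. The helper name `nan_fraction` and the (N, C, H, W) batch layout are assumptions for illustration and are not part of the dataset's tooling.

```python
import numpy as np


def nan_fraction(batch):
    """Return the fraction of NaN pixels for each sample in a batch.

    `batch` is assumed to be array-like with the sample index on axis 0,
    for example shape (N, C, H, W).
    """
    arr = np.asarray(batch, dtype=np.float64)
    # Flatten everything except the sample axis so each row is one image.
    flat = arr.reshape(arr.shape[0], -1)
    return np.isnan(flat).mean(axis=1)


# Tiny synthetic batch: sample 0 is clean, sample 1 is entirely NaN,
# sample 2 has one NaN pixel out of four.
demo = np.zeros((3, 1, 2, 2))
demo[1] = np.nan
demo[2, 0, 0, 0] = np.nan
fractions = nan_fraction(demo)
```

A fraction of 1.0 would indicate a frame with no usable pixels, as in the tensor printed above. A CPU tensor from the loader would probably need to be converted with `.numpy()` before being passed in.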

Cheers,
Marius

@PaulJWright
Member

Passed on to Meng

@kingbob8
Contributor

Hi Marius and Paul,
These NaN images seem to be related to the source synoptic images we used to generate the dataset. I found that these problematic images have normal FITS header information (i.e., QUALITY = 0) and therefore passed the quality check in the code. We will investigate this issue and correct it in future versions. For now, the simplest workaround is to discard the images that contain NaN values. Sorry for the inconvenience, and thanks for the feedback!
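
One way to apply the suggested workaround is to filter each batch before it reaches the model. The following is only a sketch under stated assumptions: the helper name `drop_nan_samples` is hypothetical, the batch is assumed to have the sample index on axis 0, and the timestamps in the demo are placeholders rather than real `T_OBS` values.

```python
import numpy as np


def drop_nan_samples(images, obs_times):
    """Split a batch into clean samples and the timestamps of rejected ones."""
    arr = np.asarray(images)
    # True for every sample that contains at least one NaN pixel.
    bad = np.isnan(arr.reshape(arr.shape[0], -1)).any(axis=1)
    kept_times = [t for t, b in zip(obs_times, bad) if not b]
    dropped_times = [t for t, b in zip(obs_times, bad) if b]
    return arr[~bad], kept_times, dropped_times


# Synthetic batch of three 2x2 single-channel images; the middle one is all NaN.
images = np.ones((3, 1, 2, 2), dtype=np.float32)
images[1] = np.nan
times = ["t0", "t1", "t2"]  # placeholder labels, not real T_OBS values
clean, kept, dropped = drop_nan_samples(images, times)
```

Returning the dropped timestamps alongside the clean samples makes it easy to keep a record of which observations were excluded, which may be useful once a corrected version of the dataset is available.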
