Long SXS catalog loading times #39

Open
Akash-Maurya-0899 opened this issue May 31, 2024 · 7 comments

@Akash-Maurya-0899

The following code snippet takes a long time to execute every time I run it:

from nrcatalogtools import SXSCatalog
sxscatalog = SXSCatalog.load()

(Just to be clear, it's the second line that takes a long time to execute.)

I already have catalog.zip stored in my ~/.cache/sxs directory, and it still takes a long time to load. I also tried explicitly disabling the download, like so:

sxscatalog = SXSCatalog.load(download=False)

and it still takes a long time to execute.

Can this be cured or is this some "fundamental" I/O speed limitation in reading the catalog.zip file itself?
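
One way to check whether the time goes into reading the file or into something else is to profile the call. The following is only a sketch: `profile_call` is a hypothetical helper name (not part of nrcatalogtools or sxs), and the usage shown in the trailing comment assumes the `SXSCatalog.load(download=False)` call from above.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile; return (result, text report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show only the `top` entries with the largest cumulative time
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


# Intended usage (requires nrcatalogtools, so it is not run here):
#   from nrcatalogtools import SXSCatalog
#   catalog, report = profile_call(SXSCatalog.load, download=False)
#   print(report)
```

If the top entries in the report are file reads or zip decompression, the cost is I/O; if they are Python-level functions inside the catalog classes, the bottleneck is elsewhere.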

@adivijaykumar
Collaborator

I hit this issue earlier today, and I find it quite concerning. This is also causing the tests to be slow, so we should try to see if there is a solution.

@adivijaykumar
Collaborator

Wondering if this is related to the following warning from the sxs package:

        You have called a function that uses the `Catalog` class,
        which, as of `sxs` version 2024.0.0, has been deprecated in
        favor of the `Simulations` interface.  See the documentation
        for more information.
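
To confirm that a given call is the one emitting this warning, the warnings raised during the call can be recorded and inspected. This is a minimal sketch using only the standard library; `collect_warnings` is a hypothetical helper name, and the warning category that sxs uses is not assumed here, so the helper simply reports whatever category it sees.

```python
import warnings


def collect_warnings(func, *args, **kwargs):
    """Call func and return (result, [(category_name, message), ...])."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" prevents the default filters from hiding repeated warnings
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    return result, [(w.category.__name__, str(w.message)) for w in caught]


# Intended usage (requires nrcatalogtools, so it is not run here):
#   from nrcatalogtools import SXSCatalog
#   catalog, caught = collect_warnings(SXSCatalog.load, download=False)
#   for category, message in caught:
#       print(category, message)
```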

@adivijaykumar
Collaborator

adivijaykumar commented Aug 7, 2024

OK, yes, indeed that is the issue. sxs.Catalog is deprecated, and we might have to refactor our entire code to take care of this change :(

CC: @prayush

@anuj137
Contributor

anuj137 commented Sep 25, 2024

I use the following hack to avoid "infinitely" long waiting times.
Instead of loading the catalog directly through nrcatalogtools, one can supply the cached catalog.json contents when constructing the class object. I notice that it takes significantly less time this way. Furthermore, instead of rebuilding it every time, one can save the nrcatalogtools.sxs.SXSCatalog object as a pickle file and reload that on later runs. Please find the code below:

import json
import os
import pickle
from subprocess import call

import sxs
from nrcatalogtools.sxs import SXSCatalog

# Define the path to the SXS cache directory using the sxs library
sxs_cache_dir = str(sxs.sxs_directory("cache"))
catalog_json_path = os.path.join(sxs_cache_dir, "catalog.json")

# If catalog.json is missing from the cache directory, download it
# from the SXS website and save it there
if not os.path.exists(catalog_json_path):
    call(["wget", "https://data.black-holes.org/catalog.json", "-P", sxs_cache_dir])

# Load the catalog.json file from the cache
sxs_catalog = sxs.load(location=catalog_json_path)

# Define the path to the nrcatalogtools SXSCatalog pickle file in the cache directory.
# Build the path directly: globbing for a file that does not exist yet returns an
# empty list, so indexing it with [0] when saving would raise an IndexError.
nrcatalogtools_sxscatalog_path = os.path.join(
    sxs_cache_dir, "nrcatalogtools_sxscatalog.pkl"
)

# If the nrcatalogtools.sxs.SXSCatalog object is not saved in the cache, create it from the catalog.json
if not os.path.exists(nrcatalogtools_sxscatalog_path):
    print(
        "Loading SXS catalog through `nrcatalogtools.sxs.SXSCatalog`. This will take some time."
    )

    # Load the catalog.json data.
    with open(catalog_json_path, "r") as f:
        sxs_catalog_json = json.load(f)

    # Create the SXSCatalog object using the loaded JSON data
    nrcatalogtools_sxscatalog = SXSCatalog(catalog=sxs_catalog_json)

    # Save the SXSCatalog object to a pickle file in the cache directory for future use
    with open(nrcatalogtools_sxscatalog_path, "wb") as f:
        pickle.dump(nrcatalogtools_sxscatalog, f)

# If the SXSCatalog object is already saved in the cache (i.e., the pickle file exists),
# load the object from the cache to avoid recomputing it
else:
    print(f"Loading the `nrcatalogtools.sxs.SXSCatalog` object from cache directory: {nrcatalogtools_sxscatalog_path}")
    with open(nrcatalogtools_sxscatalog_path, "rb") as f:
        nrcatalogtools_sxscatalog = pickle.load(f)
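
The load-or-build pattern above can also be factored into a small reusable helper. This is a sketch only: `load_or_build` is a hypothetical name, and the usage in the trailing comment assumes, as reported in this comment, that the SXSCatalog object can be pickled.

```python
import os
import pickle


def load_or_build(cache_path, build):
    """Return the pickled object at cache_path, building and saving it if absent."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    obj = build()
    # Write to a temporary file first so an interrupted run
    # does not leave a truncated pickle behind
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(obj, f)
    os.replace(tmp_path, cache_path)
    return obj


# Intended usage (requires nrcatalogtools, so it is not run here):
#   catalog = load_or_build(
#       nrcatalogtools_sxscatalog_path,
#       lambda: SXSCatalog(catalog=sxs_catalog_json),
#   )
```

A pickle made this way can go stale when catalog.json is updated, and may fail to load after a package upgrade, so deleting the .pkl file forces a rebuild. Only unpickle files you created yourself.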

@prayush
Contributor

prayush commented Dec 20, 2024

Is this issue still relevant @Akash-Maurya-0899 ?

@Akash-Maurya-0899
Author

I can use the pickling hack @anuj137 mentioned above, but loading is still slow with the bare code I originally opened this issue about.

@adivijaykumar
Collaborator

adivijaykumar commented Dec 20, 2024

I think we should change the complete API and not use sxs.catalog at all. The base behaviour of the sxs package is actually quite good.
