setup_bathymetry hangs on simple domain #168
OK, good to raise this, but as it reads now it's only a note to self. If you don't figure it out, please add an MWE here that reproduces the error so that it's documented. For example, what is a "simple domain"?
(Why does the figure for #100 need bathymetry?)
Because I only have the bathy regridded for the high-res domain. To expand the domain and add a low-res border, I need more bathy (which also gives the land mask). Simple domain: yextent = [-56,-26], everything else set to defaults. This exact code worked fine on 24 cores with a previous version, but I have yet to figure out which changes caused the issue.
Thanks! The kwargs are called … But don't worry about figuring out the history of how things changed; just try to make this work on the current version.
No, it wasn't to do with wrong variable names. Everything is input with the updated argument names, but something either breaks ("numpy can't allocate 2 EiB of data") or hangs forever. It used to take 2 min. More investigation needed.
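A NumPy allocation request in the exbibyte range usually means some array is being built with a far larger shape than intended, for example through an accidental broadcast. A minimal sketch for checking how big a shape is before allocating it, assuming float64 data; the shapes are illustrative only (the first is the GEBCO 15 arc-second subset covering 142–180°E, 56–26°S), not taken from the actual traceback. `np.broadcast_shapes` needs NumPy 1.20 or newer.

```python
import math

import numpy as np


def array_nbytes(shape, dtype=np.float64):
    """Bytes a dense NumPy array of this shape and dtype would need."""
    return math.prod(shape) * np.dtype(dtype).itemsize


def human_bytes(n):
    """Format a byte count in binary (IEC) units such as MiB or EiB."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB"]
    value = float(n)
    for unit in units[:-1]:
        if value < 1024:
            return f"{value:.2f} {unit}"
        value /= 1024
    return f"{value:.2f} {units[-1]}"


# GEBCO 15 arc-second subset for 142-180E, 56-26S: 7200 lat x 9120 lon points.
field = (7200, 9120)
print("single 2-D field:", human_bytes(array_nbytes(field)))

# A misaligned operand: the extra length-1 axis makes NumPy broadcast
# the two arrays into a 3-D outer product instead of an element-wise result.
misaligned = (7200, 1, 9120)
blown_up = np.broadcast_shapes(field, misaligned)
print("after broadcasting:", blown_up, human_bytes(array_nbytes(blown_up)))
```

If the shape reported in the real error looks like a product of two grids' dimensions, that would hint at this kind of misalignment somewhere in the pipeline.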
Could you provide a code snippet that reproduces the error when I copy-paste it into Python or a Jupyter notebook?
I made an MWE.

import regional_mom6 as rmom6
import os
import xarray as xr
from pathlib import Path
from dask.distributed import Client
scratch = "/scratch/v45/nc3020"
gdata = "/g/data/v45/nc3020"
home = "/home/552/nc3020"
expt_name = "bathymetry_mwe"
input_dir = f"{scratch}/regional_mom6_configs/{expt_name}/"
run_dir = f"{home}/mom6_rundirs/{expt_name}/"
toolpath_dir = "/home/157/ahg157/repos/mom5/src/tools/"
tmp_dir = f"{gdata}/{expt_name}"
# Create the run, temporary and input directories if they don't exist yet
for path in (run_dir, tmp_dir, input_dir):
    os.makedirs(path, exist_ok=True)
# 1/20-degree regional domain spanning 142-180E, 56-26S
expt = rmom6.experiment(
longitude_extent = (142, 180),
latitude_extent = (-56, -26),
resolution = 1/20,
date_range = ["2003-01-01 00:00:00", "2003-01-05 00:00:00"],
number_vertical_layers = 75,
layer_thickness_ratio = 10,
depth = 4500,
mom_run_dir = run_dir,
mom_input_dir = input_dir,
toolpath_dir = toolpath_dir
)
expt.setup_bathymetry(
bathymetry_path='/g/data/ik11/inputs/GEBCO_2022/GEBCO_2022.nc',
longitude_coordinate_name='lon',
latitude_coordinate_name='lat',
vertical_coordinate_name='elevation',
minimum_layers=1
)
expt.bathymetry.depth.plot()
The above gives [output not preserved] and hangs there for at least 10-15 min, after which I lost patience and killed the kernel. However, if I change to

longitude_extent = (142, 144),
latitude_extent = (-56, -52),
resolution = 1/4,

I get this plot after a few seconds... I don't see the claimed bug! On the contrary, I see that the code warns the user that …
Thanks. The point, though, is that the code used to work with the same-sized example and the same compute resources, just in a Jupyter notebook. So something has degraded the code's efficiency.
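It helps to compare how much work the two test domains actually involve. A back-of-the-envelope sketch, assuming GEBCO's 15 arc-second (1/240°) spacing for the source data and, for simplicity, uniform spacing in both directions for the target grid (the real hgrid may space latitudes differently, so the target counts are approximate):

```python
GEBCO_RES = 1 / 240  # 15 arc-seconds, in degrees


def n_points(extent, resolution):
    """Number of cells of size `resolution` spanning a (start, end) extent in degrees."""
    start, end = extent
    return round((end - start) / resolution)


def grid_size(lon_extent, lat_extent, resolution):
    """Total number of cells in a lon/lat box at the given resolution."""
    return n_points(lon_extent, resolution) * n_points(lat_extent, resolution)


domains = {
    "MWE (1/20 deg)": ((142, 180), (-56, -26), 1 / 20),
    "small test (1/4 deg)": ((142, 144), (-56, -52), 1 / 4),
}

for name, (lon, lat, res) in domains.items():
    src = grid_size(lon, lat, GEBCO_RES)  # raw bathymetry points to read
    dst = grid_size(lon, lat, res)  # model cells to regrid onto
    print(f"{name}: {src:,} source points -> {dst:,} target cells")
```

By these rough numbers the MWE regrids about 140 times more source points onto about 3,500 times more target cells than the small test, so a run that finishes in seconds on the small domain says little about whether the larger case has regressed.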
OK, so it's a performance issue :)
I've tried with mpirun and that breaks too, despite being given ample resources (96 cores, 250 GB memory). This points to an issue with the hgrid and raw bathymetry files, since these are what get fed into the mpirun script. Or possibly with xESMF itself? I'll keep looking into it, but it might take me a while.
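One way to check whether the inputs are at fault is to open them with xarray and inspect their coordinates before any regridding happens. A minimal sketch; the file name and coordinate names in the commented-out usage are assumptions and may not match what `regional_mom6` writes. A descending latitude coordinate is worth flagging because an xarray `.sel(lat=slice(-56, -26))` on it returns an empty selection rather than an error.

```python
import numpy as np
import xarray as xr


def check_coordinate(ds, name):
    """Report size, range, monotonicity and NaNs of a 1-D coordinate."""
    values = ds[name].values
    return {
        "name": name,
        "size": values.size,
        "min": float(values.min()),
        "max": float(values.max()),
        "increasing": bool(np.all(np.diff(values) > 0)),
        "has_nan": bool(np.isnan(values).any()),
    }


def summarise(ds, coord_names):
    """Print coordinate checks and the in-memory size of all data variables."""
    for name in coord_names:
        print(check_coordinate(ds, name))
    total = sum(var.nbytes for var in ds.data_vars.values())
    print(f"data variables: {total / 2**30:.2f} GiB if fully loaded")


# Synthetic example with a descending latitude axis, which gets flagged.
demo = xr.Dataset(
    {"elevation": (("lat", "lon"), np.zeros((3, 4)))},
    coords={"lat": [-26.0, -41.0, -56.0], "lon": [142.0, 150.0, 160.0, 180.0]},
)
summarise(demo, ["lon", "lat"])

# Hypothetical usage on a real file (name is a placeholder):
# raw = xr.open_dataset("bathymetry_original.nc")
# summarise(raw, ["lon", "lat"])
```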
I've been trying to reproduce the figure for the paper, and have therefore been re-making bathymetry. Strangely, some tasks that used to be really simple and fast (e.g., my region of study at 1/12 degree used to run on one node in ~2 min) now hang.
On further testing, it's now failing because it tries to allocate absurd amounts of memory. Somewhere along the line we've messed up this function. I'm not sure how it's still passing the GitHub Actions! There's nothing really special about my domain.
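One generic way an xarray pipeline ends up requesting absurd amounts of memory is when two arrays that should line up carry different dimension names: xarray then broadcasts them against each other as an outer product instead of raising an error. This is only a hypothesis worth ruling out, not a diagnosis of this bug; the dimension names below are illustrative, and the arrays are kept tiny so the example runs. At the 7200 × 9120 GEBCO subset size, the same mistake would need roughly 30 PiB.

```python
import numpy as np
import xarray as xr

ny, nx = 3, 4  # tiny on purpose; the outer product grows as (ny * nx) ** 2

# Bathymetry on dims ("lat", "lon") ...
depth = xr.DataArray(np.ones((ny, nx)), dims=("lat", "lon"))
# ... and a mask meant for the same grid but labelled ("latitude", "longitude").
mask = xr.DataArray(np.ones((ny, nx)), dims=("latitude", "longitude"))

# xarray broadcasts by dimension name: no error, just a 4-D outer product.
product = depth * mask
print(product.dims, product.shape)

# Renaming the dimensions so they match gives the intended 2-D result.
aligned = depth * mask.rename({"latitude": "lat", "longitude": "lon"})
print(aligned.dims, aligned.shape)
```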
I'll keep troubleshooting
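To pin down when the slowdown crept in, it may help to record wall time and peak memory for the same call under each package version being compared. A minimal sketch using only the standard library plus NumPy for the stand-in workload; it relies on NumPy reporting its array allocations to `tracemalloc`, so memory allocated by compiled code outside that mechanism (for example inside ESMF) will not be counted, and tracing itself adds some overhead.

```python
import time
import tracemalloc

import numpy as np


def profile_call(fn, *args, **kwargs):
    """Run fn once; return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
    finally:
        # Measured even if fn raises; the exception then propagates.
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


# Stand-in workload: allocates a 1000 x 1000 float64 array (8 MB).
result, elapsed, peak = profile_call(np.ones, (1000, 1000))
print(f"{elapsed:.3f} s, peak {peak / 2**20:.1f} MiB")
```

In practice the call would wrap something like `profile_call(expt.setup_bathymetry, bathymetry_path=..., ...)` on a domain small enough to finish, run once per version, so the numbers can be compared directly.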