-
Notifications
You must be signed in to change notification settings - Fork 376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read and cache lightning freq data in ELM to improve I/O performance #6742
Comments
The compset is 2010_EAM%CMIP6_ELM%CNPRDCTCBCTOP_MPASSI%PRES_DOCN%DOM_MOSART_SGLC_SWAV_SIAC_SESP |
FYI: @thorntonpe . Not sure who would be best to fix this. |
I just wanted to note quickly that I introduced some code back with #5369 that replicated the lightning read in for elm-fates usage. As such, I'll make sure to update that code with the recommended fix or we could refactor the elm fire code to integrate with the fates methods that were added with #5369. That would be a larger amount of work that is beyond scope here. |
@jayeshkrishna I don't see any reason to change the time dimension away from unlimited. You certainly can read multiple time samples from a file that has the unlimited dimension. The advantage of the unlimited dimension is that it's easy to add new time slides on the end of the file. Is there something else going on for your reasoning to change the file dimension? |
@ekluzek : You are right, it should be possible to read the entire variable without converting the record dimension (Since typically these variables are read one timestep at a time you could encounter bugs that don't show up for typical runs. For files with multiple variables with unlimited dimensions the performance of reads would be improved by the conversion). I will update the conversion as a suggestion. |
Finally looking into this. There is an unused option to create a stream that reads the entire file, which seems like the simplest fix to pursue. I'll try to have a PR with a working version before end of the week. If future lnfm files will be much larger than 200MB, then this will probably need to be adapted to calculate an appropriate span of years to read. |
The lnfm_init is actually duplicated: one in
/* Allow extra outermost dimensions in the decomposition */
int num_extra_dims = (vdesc->record >= 0 && fndims > 1)? (ndims - (fndims - 1)) : (ndims - fndims);
pioassert(num_extra_dims >= 0, "Unexpected num_extra_dims", __FILE__, __LINE__);
char message[256];
if (num_extra_dims > 0)
{
for (int d = 0; d < num_extra_dims; d++){
sprintf(message, "Extra outermost dimension %d != 1 for index %d", iodesc->dimlen[d],d);
pioassert(iodesc->dimlen[d] == 1, message, __FILE__, __LINE__);
}
} The 2920 is the length of the time dimension in the lnfm file. I'm looking more into the stream creation parameters but the fix is not obvious to me. |
When looking at the I/O performance of E3SM v3 high resolution (ne120) case we found that the lightning data was being read by the land model a lot of times (~3K reads, every 3 hrs) slowing down the simulation (total read time = 1538.39s, close to 8% of the runtime, for a 1 yr run). Instead of reading the lightning data in multiple time steps it might be useful to read the entire variable (or multiple timesteps) in a single read and cache it.
The performance results of the case above from PACE is given below for reference (Please look at the "Read file I/O statistics" section to see the input file with the slow read times),
https://pace.ornl.gov/scorpio/198912/ELM-1601179
The file discussed above is /global/cfs/cdirs/e3sm/inputdata/atm/datm7/NASA_LIS/clmforc.Li_2012_climo1995-2011.T62.lnfm_Total_c140423.nc .
The lightning data, variable lnfm, in the file is small (70K per timestep, 201MB for the entire variable), assuming that the variable is distributed across processes. Multiple timesteps can be cached by the model to improve read performance (by reducing the number of reads)
Since lnfm is a time dependent variable and time is an UNLIMITED dimension if you encounter issues (including performance issues) reading the entire variable in a single read you might want to convert the time dimension to a fixed size dimension first.
After converting the UNLIMITED dimension (time) to a fixed size dimension you would need to modify the decomposition map passed to pio_read_darray() to read multiple (or all) time slices of the lightning data (lnfm variable)
The text was updated successfully, but these errors were encountered: