Reinstate product groups #3208
Testing with full-sized GEFS found that the sheer number of tasks overloads rocoto, resulting in `rocotorun` taking over 10 min to complete or hanging entirely. To reduce the number of tasks, product groups are reimplemented so that multiple forecast hours are processed in a single task. However, the implementation differs somewhat from the previous one.

The jobs where groups are enabled (atmos_products, oceanice_products, and wavepostsbs) have a new variable, `MAX_TASKS`, that controls how many groups to use. This setting is currently *per member*. The forecast hours to be processed are then divided into this many groups as evenly as possible without crossing forecast segment boundaries. The walltime for those jobs is then multiplied by the number of forecast hours in the largest group.

A number of helper methods are added to Tasks to determine these groups and make a standard metatask variable dict in a centralized location. There is also a function to multiply the walltime, but this may be better off relocated to wxflow with the other time functions.

As part of switching from a single value to a list, hours are no longer passed by rocoto as zero-padded values. The lists are comma-delimited (without spaces) and split apart in the job stub (`jobs/rocoto/*`), so each j-job call is still a single forecast hour.

The offline post (upp) job is not broken into groups, since it really isn't used outside the analysis anymore. Gempak jobs that run over multiple forecast hours also aren't broken into groups yet.

Resolves NOAA-EMC#2999
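For readers who want a concrete picture of the grouping and walltime scaling described above, here is a minimal Python sketch. It is not the code from this PR: the function names (`split_evenly`, `group_forecast_hours`, `multiply_walltime`), the greedy way groups are allotted to segments, and the fallback when `max_tasks` is smaller than the number of segments are all assumptions made for illustration.

```python
import math


def split_evenly(items, n):
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    n = max(1, min(n, len(items)))
    base, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def group_forecast_hours(segments, max_tasks):
    """Divide forecast hours into groups without mixing forecast segments.

    Assumption: every non-empty segment gets at least one group, so the
    result can exceed max_tasks when max_tasks < number of segments.
    """
    segments = [seg for seg in segments if seg]
    if not segments:
        return []
    counts = [1] * len(segments)
    # Hand each spare group to the segment whose largest chunk is biggest.
    for _ in range(max(0, max_tasks - len(segments))):
        sizes = [math.ceil(len(seg) / n) for seg, n in zip(segments, counts)]
        largest = sizes.index(max(sizes))
        if sizes[largest] <= 1:
            break  # every group already holds a single forecast hour
        counts[largest] += 1
    groups = []
    for seg, n in zip(segments, counts):
        groups.extend(split_evenly(seg, n))
    return groups


def multiply_walltime(walltime, factor):
    """Multiply an HH:MM:SS walltime string by an integer factor."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    total = (hours * 3600 + minutes * 60 + seconds) * factor
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Hypothetical two-segment forecast (hours are made up for the example).
segments = [[0, 6, 12, 18, 24], [30, 36, 42, 48, 54, 60, 66, 72]]
groups = group_forecast_hours(segments, max_tasks=4)
# groups == [[0, 6, 12], [18, 24], [30, 36, 42, 48], [54, 60, 66, 72]]
walltime = multiply_walltime("00:15:00", max(len(g) for g in groups))
# walltime == "01:00:00"
```

In this sketch no group spans the boundary between hour 24 and hour 30, and the per-hour walltime of 15 minutes is scaled by the four hours in the largest group.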
I have no major concerns with this PR. It looks good and clean to me. I have not created the XML and looked at it or run any tests.
A few comments inline with the review.
Changes look good, just leaving a few suggested changes to resolve some typos I noticed.
Adds an alternative data dependency for gridded wave post so jobs can begin before the entire forecast is complete. The dependency checks that the next output file exists, which prevents reading a file that is still being written.

Resolves NOAA-EMC#3210
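The "next file exists" check can be pictured with the short Python sketch below. It only illustrates the logic; the real dependency is expressed in the rocoto XML, and the function name `wave_post_ready`, the file name template, and the `file_exists` hook are hypothetical.

```python
import os


def wave_post_ready(fhr, segment_fhrs, segment_done,
                    file_exists=os.path.exists,
                    template="wave.f{fhr:03d}.grib2"):
    """Return True when forecast hour fhr is safe to post-process.

    Either the forecast segment has finished, or the file for the NEXT
    forecast hour already exists, which implies the file for fhr is
    complete. The template is a made-up name, not the real output path.
    """
    if segment_done:
        return True
    later = [hour for hour in segment_fhrs if hour > fhr]
    if not later:
        # Last hour of the segment: only segment completion can release it.
        return False
    return file_exists(template.format(fhr=min(later)))


# Example with an in-memory stand-in for the file system.
present = {"wave.f006.grib2"}
ready = wave_post_ready(3, [0, 3, 6], False, file_exists=present.__contains__)
# ready is True because the hour-6 file exists
```

The last forecast hour of a segment has no "next" file, which is why the segment-completion dependency is kept as the alternative.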
All reviewer comments other than the location of test methods have been addressed, so I am going to go ahead and run CI for the first machine.
CI Tests set up to run in /lfs/h2/emc/ptmp/emc.global/PR/PR_3208/RUNTESTS on WCOSS
Need to look at the gempak jobs a little closer.
CI Passed on Hercules in Build# 3
Looks good, thanks @WalterKolczynski-NOAA!
Description
Testing with full-sized GEFS found that the sheer number of tasks overloads rocoto, resulting in `rocotorun` taking over 10 min to complete or hanging entirely. To reduce the number of tasks, product groups are reimplemented so that multiple forecast hours are processed in a single task. However, the implementation differs somewhat from the previous one.

The jobs where groups are enabled (atmos_products, oceanice_products, wavepostsbs, atmos_ensstat, and gempak) have a new variable, `MAX_TASKS`, that controls how many groups to use. This setting is currently per member. The forecast hours to be processed are then divided into this many groups as evenly as possible without crossing forecast segment boundaries. The walltime for those jobs is then multiplied by the number of forecast hours in the largest group. For the gridded wave post job, the dependencies were also updated to trigger off of either the data being available or the appropriate segment completing (the dependencies had not been updated when the job was initially broken out by forecast hour).

A number of helper methods are added to Tasks to determine these groups and make a standard metatask variable dict in a centralized location. There is also a function to multiply the walltime, but this may be better off relocated to wxflow with the other time functions.

As part of switching from a single value to a list, hours are no longer passed by rocoto as zero-padded values. The lists are comma-delimited (without spaces) and split apart in the job stub (`jobs/rocoto/*`), so each j-job call is still a single forecast hour.

The offline post (upp) job is not broken into groups, since it really isn't used outside the analysis anymore.
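To illustrate the list handling, the sketch below builds a comma-delimited hour list and splits it back into single forecast hours. The actual job stubs are shell scripts, so this Python version only mirrors the logic; the helper names and the three-digit zero-padding applied after the split are assumptions, not taken from the PR.

```python
def join_fhr_list(hours):
    """Build the comma-delimited (no spaces) list passed through rocoto."""
    return ",".join(str(hour) for hour in hours)


def split_fhr_list(fhr_list, width=3):
    """Split the list and zero-pad each hour for a single j-job call.

    Assumption: the padding width of 3 is illustrative only.
    """
    return [f"{int(hour):0{width}d}" for hour in fhr_list.split(",") if hour.strip()]


fhr_list = join_fhr_list([0, 6, 12])
# fhr_list == "0,6,12"
for fhr in split_fhr_list(fhr_list):
    # Each iteration stands in for one j-job call on one forecast hour.
    print(f"processing forecast hour {fhr}")
```

Running the loop prints one line each for hours `000`, `006`, and `012`.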
Resolves #2999
Resolves #3210