zzz OBSOLETE Run GEOSldas

Rolf Reichle edited this page Jan 27, 2021 · 1 revision

To run GEOSldas, go to the run directory and submit the job script:

cd YOUR_EXPDIR/EXP_ID/run/
sbatch lenkf.j

Note that lenkf.j itself is executable. Users can launch it directly using the command ./lenkf.j in interactive mode (see debugger instructions below on how to get compute nodes using xalloc).

How does lenkf.j work?

The number of time steps per segment is TSTEPS_PER_SGMT = JOB_SGMT (in seconds) / HEARTBEAT_DT.

  1. The start date is read from cap_restart.

  2. JOB_SGMT is the interval at which checkpoint (restart) files are written.

  3. (NUM_SGMT*JOB_SGMT) is the simulation period covered by a single job. It must be short enough that execution fits within the wall time limit (12 hours at NCCS).

  4. After each JOB_SGMT period, GEOSldas advances cap_restart, re-links the restart file, and post-processes the output (e.g., converts binary files to nc4 and concatenates them into daily files).

  5. After (NUM_SGMT*JOB_SGMT), lenkf.j re-submits itself and repeats from step 1 until END_DATE is reached.
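
The relation between JOB_SGMT, HEARTBEAT_DT, and TSTEPS_PER_SGMT can be sketched in a few lines of Python. This is only an illustration, not code from GEOSldas: the function names are hypothetical, the JOB_SGMT string is assumed to follow the "YYYYMMDD HHMMSS" layout used in the examples on this page, and the value HEARTBEAT_DT = 450 seconds is an assumed example. Year and month components are rejected because their length in seconds depends on the calendar.

```python
def sgmt_to_seconds(job_sgmt: str) -> int:
    """Convert a 'YYYYMMDD HHMMSS' duration string to seconds.

    Only the day/hour/minute/second parts are handled in this sketch;
    nonzero years or months would need a calendar to convert.
    """
    date_part, time_part = job_sgmt.split()
    years, months, days = int(date_part[:4]), int(date_part[4:6]), int(date_part[6:8])
    hours, minutes, seconds = int(time_part[:2]), int(time_part[2:4]), int(time_part[4:6])
    if years or months:
        raise ValueError("year/month segments need a calendar to convert")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def tsteps_per_sgmt(job_sgmt: str, heartbeat_dt: int) -> int:
    """TSTEPS_PER_SGMT = JOB_SGMT (in seconds) / HEARTBEAT_DT."""
    seconds = sgmt_to_seconds(job_sgmt)
    if seconds % heartbeat_dt:
        raise ValueError("JOB_SGMT is not a whole number of heartbeats")
    return seconds // heartbeat_dt


# One-day segment with an assumed 450 s heartbeat: 86400 / 450 = 192 steps.
print(tsteps_per_sgmt("00000001 000000", 450))
```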

Frequently asked questions:

What if an NCCS downtime or some other reason kills my job before the simulation is complete?

Go to YOUR_EXPDIR/EXP_ID/run/ and verify that cap_restart matches the time stamp in the link ../input/restart/catch_internal_rst, then run sbatch lenkf.j
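
The consistency check described above could be automated along the following lines. This is a minimal sketch under stated assumptions: cap_restart is assumed to hold a single "YYYYMMDD HHMMSS" time stamp, and the file that the catch_internal_rst link points to is assumed to carry a "YYYYMMDD_HHMM" stamp in its name. Both helper functions are hypothetical, so check the actual file names in your experiment before relying on the pattern.

```python
import os
import re


def restart_matches_cap(cap_restart_text: str, restart_target: str) -> bool:
    """Return True if the restart file name carries the cap_restart time stamp."""
    # Assumed cap_restart content: 'YYYYMMDD HHMMSS' -> 'YYYYMMDDHHMMSS'.
    cap_digits = "".join(cap_restart_text.split())
    # Assumed name pattern: 'YYYYMMDD_HHMM' somewhere in the target file name.
    match = re.search(r"(\d{8})_(\d{4})", os.path.basename(restart_target))
    if match is None:
        return False
    return cap_digits.startswith(match.group(1) + match.group(2))


def check_run_dir(run_dir: str) -> bool:
    """Compare run_dir/cap_restart with the target of the restart link."""
    with open(os.path.join(run_dir, "cap_restart")) as f:
        cap_text = f.read()
    link = os.path.join(run_dir, "..", "input", "restart", "catch_internal_rst")
    # os.readlink returns the path the symbolic link points to.
    return restart_matches_cap(cap_text, os.readlink(link))
```

Calling check_run_dir("YOUR_EXPDIR/EXP_ID/run") before resubmitting would return False when the two time stamps disagree, which is the case that needs manual attention.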

What if I decide to extend the simulation beyond the initially set END_DATE?

Go to YOUR_EXPDIR/EXP_ID/run, change END_DATE in CAP.rc and run sbatch lenkf.j
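
Editing END_DATE by hand is usually simplest, but the change can also be scripted. The sketch below assumes that CAP.rc stores the setting on one line of the form "END_DATE: YYYYMMDD HHMMSS". The function name set_end_date is hypothetical, and the sample text in the usage lines is made up for illustration.

```python
import re


def set_end_date(cap_rc_text: str, new_end: str) -> str:
    """Return CAP.rc text with the END_DATE value replaced by new_end."""
    if not re.fullmatch(r"\d{8} \d{6}", new_end):
        raise ValueError("expected 'YYYYMMDD HHMMSS'")
    # Keep the key and its spacing, replace only the value on that line.
    pattern = re.compile(r"^([ \t]*END_DATE:[ \t]*).*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + new_end, cap_rc_text)
    if count != 1:
        raise ValueError(f"expected exactly one END_DATE line, found {count}")
    return new_text


# Illustrative input only; a real CAP.rc contains more entries.
sample = "BEG_DATE: 20170101 000000\nEND_DATE: 20170201 000000\n"
print(set_end_date(sample, "20180101 000000"))
```

After writing the modified text back to CAP.rc, the job is resubmitted with sbatch lenkf.j as described above.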

Can I change the ntasks after ldas_setup?

Yes. Change the following line in lenkf.j

#SBATCH --ntasks=xxx

How are JOB_SGMT and NUM_SGMT set to make the run efficient?

The table below lists an example for a Catchment simulation on the M36 global domain with two default HISTORY collections of daily output using 56 processors. The two experiments are identical except that Exp_A writes monthly restarts, and Exp_B writes daily restarts. The results illustrate that it is not efficient to write many restarts within a single 12-hour NCCS job window. The job runs most efficiently when NUM_SGMT=1 and JOB_SGMT is such that one GEOSldas.x job (for the period NUM_SGMT*JOB_SGMT) finishes within 12 hours of wall time. Note that restarts (especially the carbon restarts needed for CatchmentCN) are large files. If only some restart variables are needed for later analysis, users should create a custom file collection using the HISTORY.rc functionality.

| EXP ID | JOB_SGMT | NUM_SGMT | Wall time | Description |
|---|---|---|---|---|
| Exp_A | 00000100 000000 (monthly restarts) | 1 | 7m 25s | The longer JOB_SGMT is more efficient, but the job will have to restart from the beginning of the month if it is interrupted for any reason. |
| Exp_B | 00000001 000000 (daily restarts) | 31 | 15m 17s | Small values for JOB_SGMT are inefficient because of the overhead needed to frequently restart the GEOSldas executable. Storage may also be an issue if many restarts are written. |
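
The two wall times in the table allow a rough estimate of the cost of each additional segment. The arithmetic below assumes that the entire difference between the two experiments is due to the 30 extra restarts of the executable and that this cost is the same for every segment. That is a simplification for illustration, not a measured breakdown.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a wall time given as minutes and seconds to seconds."""
    return minutes * 60 + seconds


exp_a = to_seconds(7, 25)    # Exp_A: NUM_SGMT = 1, monthly restarts
exp_b = to_seconds(15, 17)   # Exp_B: NUM_SGMT = 31, daily restarts

extra_segments = 31 - 1
overhead_per_segment = (exp_b - exp_a) / extra_segments
slowdown = exp_b / exp_a

print(f"Exp_B / Exp_A wall time: {slowdown:.2f}")
print(f"Extra wall time per additional segment: {overhead_per_segment:.1f} s")
```

Under these assumptions, Exp_B takes about twice as long as Exp_A, and each additional segment costs roughly 16 seconds of wall time.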