
Running GEOSldas v17.9.x on Discover

Rolf Reichle edited this page Nov 18, 2021 · 2 revisions

The NCCS/Discover system updates applied on 10 Nov 2021 cause runtime failures of GEOSldas releases v17.9.x on Haswell and Cascade Lake nodes. The following work-arounds are available and require only a minor modification of your job file ("lenkf.j"):

  1. Add "module swap mpi/impi mpi/impi/2021.2.0" after g5_modules is sourced and other modules are loaded. This work-around is expected to be zero-diff for GEOSldas (not yet confirmed).

or

  2. Add "module load mpi/impi-prov/19.1.3.304" (for GEOSldas v17.9.6, v17.9.4, v17.9.3, and v17.9.2) or "module load mpi/impi-prov/19.1.0.166" (for GEOSldas v17.9.1) after g5_modules is sourced. This work-around should be zero-diff for GEOSldas (confirmed for SMAP L4_SM Version 5 ops).

or

  3. Add "#SBATCH --constraint=sky" to request Skylake nodes. Note that each Skylake node has 40 processors. If you are adapting a Haswell configuration, different values for --nodes and --ntasks-per-node may be more efficient, but the total number of tasks (= nodes * ntasks-per-node) must remain unchanged and must match the existing domain decomposition (IMS.rc or JMS.rc). This work-around is generally not zero-diff for GEOSldas when the node layout and the number of processors per node change.
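For work-around 3, choosing new --nodes and --ntasks-per-node values amounts to finding exact factor pairs of the existing total task count with at most 40 tasks per node. The function below is a hypothetical helper (not part of GEOSldas) that sketches this arithmetic. It assumes you already know the total task count from your current job file, and it only preserves that total; checking it against IMS.rc or JMS.rc is still up to you.

```shell
#!/bin/sh
# Hypothetical helper (not part of GEOSldas): list Slurm layouts
# (--nodes, --ntasks-per-node) that keep the total task count unchanged
# on Skylake nodes with 40 processors each.
skylake_layouts() {
  local total=$1
  local cores_per_node=40
  local nodes

  # Require a positive integer without leading zeros.
  case $total in
    ''|*[!0-9]*|0*)
      echo "usage: skylake_layouts TOTAL_TASKS" >&2
      return 1
      ;;
  esac

  # Fewest nodes that can hold all tasks at <= 40 tasks per node.
  nodes=$(( (total + cores_per_node - 1) / cores_per_node ))

  # Only exact divisors keep nodes * ntasks-per-node == total.
  while [ "$nodes" -le "$total" ]; do
    if [ $(( total % nodes )) -eq 0 ]; then
      printf '%s\n' "--nodes=$nodes --ntasks-per-node=$(( total / nodes ))"
    fi
    nodes=$(( nodes + 1 ))
  done
}
```

For a hypothetical job with 112 tasks, "skylake_layouts 112" prints "--nodes=4 --ntasks-per-node=28" first (the fewest nodes), followed by other exact splits such as "--nodes=7 --ntasks-per-node=16". Which layout runs most efficiently has to be determined by testing.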

Only one of these work-arounds is needed. Testing so far suggests that the work-around can be applied at run time by editing the job file alone; there is no need to recompile or to re-run ldas_setup.
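As an illustration of where these lines might go, the fragment below sketches the relevant parts of "lenkf.j" with work-around 1 active and the other two disabled. It is schematic. The existing #SBATCH directives and the lines that source g5_modules and load other modules are only indicated by comments, because their exact content depends on your experiment. To switch work-arounds, disable the "module swap" line and enable one of the alternatives.

```shell
# (existing #SBATCH directives, e.g. --nodes and --ntasks-per-node)
# Work-around 3: request Skylake nodes. To enable, change "##SBATCH" to
# "#SBATCH"; Slurm reads #SBATCH lines only before the first command.
##SBATCH --constraint=sky

# (existing lines that source g5_modules and load other modules)

# Work-around 1: swap the MPI module after g5_modules is sourced.
module swap mpi/impi mpi/impi/2021.2.0

# Work-around 2 (alternative to 1): load the module matching your release.
# module load mpi/impi-prov/19.1.3.304   # GEOSldas v17.9.6, v17.9.4, v17.9.3, v17.9.2
# module load mpi/impi-prov/19.1.0.166   # GEOSldas v17.9.1
```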