
CCPP MPI interface #1106

Open
dustinswales opened this issue Jan 2, 2025 · 6 comments

Comments

@dustinswales
Collaborator

dustinswales commented Jan 2, 2025

Description

Many schemes have I/O in their initialization phases, but this I/O is not guarded by MPI commands. Adding these MPI commands, and their associated error checking, within each individual scheme introduces redundancies.

Explanation: This means that these schemes read input files with each MPI task individually at the same time. This can cause problems on parallel file systems with large core counts, as recently experienced on the DOD HPCMP system Narwhal. Reading this data with the MPI root rank only and then broadcasting it resolves the problem. However, coding up these MPI broadcast calls directly, capturing the error and reporting it in a CCPP-compliant way, is tedious and results in several lines of code for each MPI call. This can be hidden in a CCPP MPI interface that takes care of these CCPP-specific aspects.
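For illustration only, a rough sketch of the per-call boilerplate this refers to, assuming a scheme init routine that receives mpicomm, mpirank and mpiroot plus the standard CCPP errmsg/errflg arguments (the unit and variable names iunit and nlev are hypothetical):

      ! Hypothetical excerpt from a scheme _init routine: read on the root rank,
      ! then broadcast, with CCPP-compliant error reporting for every MPI call.
      if (mpirank == mpiroot) then
         read (iunit) nlev
      end if
      call mpi_bcast(nlev, 1, MPI_INTEGER, mpiroot, mpicomm, ierr)
      if (ierr /= MPI_SUCCESS) then
         errflg = 1
         write(errmsg,'(a,i0)') 'Error in mpi_bcast of nlev, ierr = ', ierr
         return
      end if
      ! ... repeated for every variable read from the file ...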

Solution

Create a CCPP MPI interface

@DomHeinzeller @nusbaume @peverwhee @cacraigucar

@mwaxmonsky

Would it be possible to add a few examples and/or pseudo code with this issue? Just to make sure we're on the same page as to how this would look to consumers and from a maintenance perspective.

@climbfuji
Collaborator

With the caveat that this is a simplistic version that does not address all of the CCPP requirements (e.g. do not stop the model; ccpp_external_abort is a non-CCPP-compliant NEPTUNE routine) and that does not yet take advantage of abstracting the MPI calls in a module:

module mpiutil
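  ! Generic ccpp_bcast interface: thin wrappers around MPI_BCAST for the
  ! integer, real32/real64 and logical variants listed below.
  ! ccpp_external_abort is assumed to be provided externally by the host
  ! model (here NEPTUNE).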

  use iso_fortran_env, only : real32, real64
  use mpi

  implicit none

  private
  public ccpp_bcast

  interface ccpp_bcast
     procedure :: bcast_i32d0
     procedure :: bcast_i32d1
     procedure :: bcast_i32d2
     procedure :: bcast_i32d3
     procedure :: bcast_r32d0
     procedure :: bcast_r64d0
     procedure :: bcast_r32d1
     procedure :: bcast_r64d1
     procedure :: bcast_r32d2
     procedure :: bcast_r64d2
     procedure :: bcast_r32d3
     procedure :: bcast_r64d3
     procedure :: bcast_r32d4
     procedure :: bcast_r64d4
     procedure :: bcast_r32d5
     procedure :: bcast_r64d5
     procedure :: bcast_ld0
  end interface ccpp_bcast

contains

! Helper routines for MPI broadcasting

   subroutine bcast_i32d0(arr, root, comm, ierr)
      integer, intent(inout) :: arr
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, 1, MPI_INTEGER, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_i32d0")
      end if
   end subroutine bcast_i32d0

   subroutine bcast_i32d1(arr, root, comm, ierr)
      integer, intent(inout) :: arr(:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_INTEGER, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_i32d1")
      end if
   end subroutine bcast_i32d1

   subroutine bcast_i32d2(arr, root, comm, ierr)
      integer, intent(inout) :: arr(:,:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_INTEGER, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_i32d2")
      end if
   end subroutine bcast_i32d2

   subroutine bcast_i32d3(arr, root, comm, ierr)
      integer, intent(inout) :: arr(:,:,:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_INTEGER, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_i32d3")
      end if
   end subroutine bcast_i32d3

   subroutine bcast_r32d0(arr, root, comm, ierr)
      real(kind=real32), intent(inout) :: arr
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, 1, MPI_REAL, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r32d0")
      end if
   end subroutine bcast_r32d0

   subroutine bcast_r64d0(arr, root, comm, ierr)
      real(kind=real64), intent(inout) :: arr
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, 1, MPI_DOUBLE_PRECISION, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r64d0")
      end if
   end subroutine bcast_r64d0

   subroutine bcast_r32d1(arr, root, comm, ierr)
      real(kind=real32), intent(inout) :: arr(:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_REAL, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r32d1")
      end if
   end subroutine bcast_r32d1

   subroutine bcast_r64d1(arr, root, comm, ierr)
      real(kind=real64), intent(inout) :: arr(:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_DOUBLE_PRECISION, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r64d1")
      end if
   end subroutine bcast_r64d1

   subroutine bcast_r32d2(arr, root, comm, ierr)
      real(kind=real32), intent(inout) :: arr(:,:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_REAL, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r32d2")
      end if
   end subroutine bcast_r32d2

   subroutine bcast_r64d2(arr, root, comm, ierr)
      real(kind=real64), intent(inout) :: arr(:,:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_DOUBLE_PRECISION, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r64d2")
      end if
   end subroutine bcast_r64d2

   subroutine bcast_r32d3(arr, root, comm, ierr)
      real(kind=real32), intent(inout) :: arr(:,:,:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_REAL, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r32d3")
      end if
   end subroutine bcast_r32d3

   subroutine bcast_r64d3(arr, root, comm, ierr)
      real(kind=real64), intent(inout) :: arr(:,:,:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_DOUBLE_PRECISION, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r64d3")
      end if
   end subroutine bcast_r64d3

   subroutine bcast_r32d4(arr, root, comm, ierr)
      real(kind=real32), intent(inout) :: arr(:,:,:,:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_REAL, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r32d4")
      end if
   end subroutine bcast_r32d4

   subroutine bcast_r64d4(arr, root, comm, ierr)
      real(kind=real64), intent(inout) :: arr(:,:,:,:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_DOUBLE_PRECISION, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r64d4")
      end if
   end subroutine bcast_r64d4

   subroutine bcast_r32d5(arr, root, comm, ierr)
      real(kind=real32), intent(inout) :: arr(:,:,:,:,:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_REAL, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r32d5")
      end if
   end subroutine bcast_r32d5

   subroutine bcast_r64d5(arr, root, comm, ierr)
      real(kind=real64), intent(inout) :: arr(:,:,:,:,:)
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, size(arr), MPI_DOUBLE_PRECISION, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_r64d5")
      end if
   end subroutine bcast_r64d5

   subroutine bcast_ld0(arr, root, comm, ierr)
      logical, intent(inout) :: arr
      integer, intent(in) :: root, comm
      integer, intent(out) :: ierr
      call MPI_BCAST(arr, 1, MPI_LOGICAL, root, comm, ierr)
      if (ierr/=MPI_SUCCESS) then
         call ccpp_external_abort("mpiutil.F90:bcast_ld0")
      end if
   end subroutine bcast_ld0
     
end module mpiutil

@mwaxmonsky

Thanks @climbfuji! Not to add more to your plate, but could we add any details about what this would look like to users with the #ifdef ... construct we mentioned? Or are there other mechanisms that we could use to interface with this?

@climbfuji
Collaborator

I don't have an #ifdef example at hand. Here is example code for how to use the above ccpp_bcast (again simplistic, since I'll need to redo a lot of this when updating to the latest ccpp-physics code and to whatever solution we as a group come up with for the MPI call wrappers).

      subroutine read_h2odata (h2o_phys, mpicomm, mpirank, mpiroot)
      use machine,  only: kind_phys
      use mpiutil, only: ccpp_bcast
      use h2o_def

      ...

      read_and_broadcast_1: if (mpirank==mpiroot) then
        open(unit=kh2opltc,file='global_h2oprdlos.f77', form='unformatted', convert='big_endian')

!--- read in indices
!---
        read (kh2opltc) h2o_coeff, latsh2o, levh2o, timeh2o
        write(*,*) 'Reading in h2odata from global_h2oprdlos.f77 '
        write(*,*) '     h2o_coeff = ', h2o_coeff
        write(*,*) '       latsh2o = ', latsh2o
        write(*,*) '        levh2o = ', levh2o
        write(*,*) '       timeh2o = ', timeh2o
      endif read_and_broadcast_1

      call ccpp_bcast(h2o_coeff, mpiroot, mpicomm, ierr)
      call ccpp_bcast(latsh2o,   mpiroot, mpicomm, ierr)
      call ccpp_bcast(levh2o,    mpiroot, mpicomm, ierr)
      call ccpp_bcast(timeh2o,   mpiroot, mpicomm, ierr)

      ...

One possible way to provide a generic implementation plus additional, host-specific implementations of ccpp_bcast would be to have different files mpiutil.F90, mpiutil.fv3.F90, mpiutil.neptune.F90, ... - see the existing examples for GFS_phys_time_vary etc. This way, any model that is happy with the generic approach (something as simple as the above?) can use it, and models that need a more elaborate solution can implement their own.
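Purely as an illustration of how a host-specific variant might differ (not part of the above proposal; the mpi_f08 usage and the decision not to abort are assumptions), one procedure of a hypothetical mpiutil.fv3.F90 could look like:

   ! Hypothetical bcast_i32d0 in a host-specific mpiutil.<host>.F90:
   ! uses mpi_f08 derived types and returns the error code to the caller
   ! (which then sets errmsg/errflg) instead of aborting.
   subroutine bcast_i32d0(arr, root, comm, ierr)
      use mpi_f08, only: MPI_Comm, MPI_Bcast, MPI_INTEGER
      integer,        intent(inout) :: arr
      integer,        intent(in)    :: root
      type(MPI_Comm), intent(in)    :: comm
      integer,        intent(out)   :: ierr
      call MPI_Bcast(arr, 1, MPI_INTEGER, root, comm, ierr)
   end subroutine bcast_i32d0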

This does not, however, address the issue of making code in the CCPP repo dependent on CCPP. For this, there are multiple solutions: ifdefs (wrap the opening and closing read_and_broadcast_1 lines and the ccpp_bcast lines in the above example in #ifdef CCPP blocks), stubs, ...
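A minimal sketch of the ifdef option, applied to the read_h2odata excerpt above (assuming a CCPP preprocessor macro defined when building within CCPP; the use mpiutil statement would be guarded the same way):

#ifdef CCPP
      read_and_broadcast_1: if (mpirank==mpiroot) then
#endif
        open(unit=kh2opltc,file='global_h2oprdlos.f77', form='unformatted', convert='big_endian')
        read (kh2opltc) h2o_coeff, latsh2o, levh2o, timeh2o
#ifdef CCPP
      endif read_and_broadcast_1

      call ccpp_bcast(h2o_coeff, mpiroot, mpicomm, ierr)
      call ccpp_bcast(latsh2o,   mpiroot, mpicomm, ierr)
      call ccpp_bcast(levh2o,    mpiroot, mpicomm, ierr)
      call ccpp_bcast(timeh2o,   mpiroot, mpicomm, ierr)
#endif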

@dustinswales
Collaborator Author

@mwaxmonsky Here is an example with a ton of MPI commands embedded within an initialization routine, rrtmgp_lw_gas_optics.F90 (I never removed the PP directives when mpi_f08 became a requirement, but they could be removed now). This module

  • reads data array sizes from file on the master processor
  • allocates space for data on all processors (this is needed to achieve "statelessness")
  • reads data from file on the master processor and broadcasts it to all processors

When we were developing the code, we had the error checking piece, but for brevity we removed it after everything was working. With a CCPP MPI interface like Dom mentioned, this routine would look pretty much the same, except that each call to mpi_bcast() would be replaced by a call to ccpp_bcast().
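For illustration, a minimal sketch of that size-read / allocate / broadcast pattern using ccpp_bcast (the unit and variable names iunit, nsize and coeffs are hypothetical, not taken from rrtmgp_lw_gas_optics.F90):

      ! Read the array size on the root rank and broadcast it, ...
      if (mpirank==mpiroot) then
         read (iunit) nsize
      endif
      call ccpp_bcast(nsize, mpiroot, mpicomm, ierr)

      ! ... allocate on all ranks (needed for "statelessness"), ...
      allocate(coeffs(nsize))

      ! ... then read the data on the root rank and broadcast it.
      if (mpirank==mpiroot) then
         read (iunit) coeffs
      endif
      call ccpp_bcast(coeffs, mpiroot, mpicomm, ierr)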

@climbfuji
Collaborator

Note to self. The update for the NRL physics, required for the transition of NEPTUNE to operations, was merged in https://github.nrlmry.navy.mil/NEPTUNE/ccpp-physics/pull/3.
