Support for shared libraries in GPU execution (python launch support) #795

pramodk · 2022-04-03T22:45:27Z

Description

Summary

mod2c now generates code without need of global variables
coreneuron and mechanism library can be built as shared and it
enables launching coreneuron on GPU via pyton
scopmath library can be also shared
removed acc/openmp global annotations for celsius, pi and secondorder
and they don't need to be copied on GPU

TODOs

Closes #141.
Closes #599.

How to test this?

Build neuron master with this PR branch of coreneuron including updated mod2c submodule.

module load unstable gcc nvhpc cuda hpe-mpi cmake  python-dev

cmake .. -DCMAKE_INSTALL_PREFIX=`pwd`/install -DCORENRN_ENABLE_GPU=ON -DCORENRN_ENABLE_NMODL=OFF -DCORENRN_ENABLE_MPI=ON  -DNRN_ENABLE_CORENEURON=ON -DNRN_ENABLE_INTERVIEWS=OFF -DNRN_ENABLE_TESTS=OFF
make -j12
make install

No compile ringtest with nrnivmodl -coreneuron and launch ringtest on GPU using python as well as special. See #141 (comment)

Test System

Compiler: NVHPC & CUDA on BB5
Version: this PR and neuron@master
Backend: GPU

CI_BRANCHES:NEURON_BRANCH=olupton/coreneuron-gpu-dynamic-loading,SPACK_BRANCH=olupton/coreneuron-gpu-dynamic

pramodk · 2022-04-04T06:16:18Z

@olupton : to reproduce the linking issue:

git pull
git checkout pramodk/exclude-global-vars
git submodule update -f --init --recursive

temporarily update hh.mod with call to random123:

+++ b/coreneuron/mechanism/mech/modfile/hh.mod
@@ -112,6 +112,9 @@ UNITSOFF
        sum = alpha + beta
         ntau = 1/(q10*sum)
         ninf = alpha/sum
+        VERBATIM
+            double xxx = nrnran123_dblpick(nullptr);
+        ENDVERBATIM
 }

and build:

mkdir build_gpu_pr &&  cd build_gpu_pr
module load unstable gcc nvhpc cuda hpe-mpi cmake  python-dev
cmake .. -DCMAKE_INSTALL_PREFIX=`pwd`/install -DCORENRN_ENABLE_GPU=ON -DCORENRN_ENABLE_MPI=ON
make -j8

and this should give:

[ 81%] Running nrnivmodl-core with halfgap.mod
[INFO] Running: make -j1 -f /gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/bbpv/repos/bbp/coreneuron/build_gpu_pr/share/coreneuron/nrnivmodl_core_makefile ROOT=/gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/bbpv/repos/bbp/coreneuron/build_gpu_pr MOD2CPP_BINARY=/gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/bbpv/repos/bbp/coreneuron/build_gpu_pr/bin/mod2c_core MODS_PATH=x86_64/corenrn/mod2c BUILD_TYPE=SHARED NRN_PRCELLSTATE=0
Default NMODL flags:
NVC++-W-1057-Static variables are not supported in acc routine - _ZN49_INTERNAL_27_x86_64_corenrn_mod2c_hh_cpp_4898369610coreneuron21_global_variables_ptrE (x86_64/corenrn/mod2c/hh.cpp: 364)
NVC++-W-1057-Static variables are not supported in acc routine - _ZN49_INTERNAL_27_x86_64_corenrn_mod2c_hh_cpp_4898369610coreneuron21_global_variables_ptrE (x86_64/corenrn/mod2c/hh.cpp: 374)
NVC++-W-1057-Static variables are not supported in acc routine - _ZN49_INTERNAL_27_x86_64_corenrn_mod2c_hh_cpp_4898369610coreneuron21_global_variables_ptrE (x86_64/corenrn/mod2c/hh.cpp: 440)
ptxas fatal   : Unresolved extern function '_ZN10coreneuron17nrnran123_dblpickEPNS_15nrnran123_StateE'
NVC++-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (x86_64/corenrn/mod2c/hh.cpp: 444)
NVC++/x86-64 Linux 22.2-0: compilation aborted
make[3]: *** [x86_64/corenrn/build/hh.o] Error 2
make[2]: *** [bin/x86_64/special-core] Error 2
make[1]: *** [coreneuron/CMakeFiles/nrniv-core.dir/all] Error 2

olupton · 2022-04-12T12:05:31Z

Two posts from me about the compilation/linking issues:

we currently don't know how to produce a shared library containing both OpenACC and CUDA functions.

CMake/OpenAccHelper.cmake

coreneuron/CMakeLists.txt

extra/nrnivmodl_core_makefile.in

tests/integration/CMakeLists.txt

olupton · 2022-04-26T09:11:37Z

I think the CI failures are mainly because of build system issues, and using (apparently) not widely supported arguments to ar.

pramodk

I went through the changes quickly and I added my quick comments here. The only thing that stands out is random123 but that could be rediscussed once we have initial tests are working.

CMake/OpenAccHelper.cmake

coreneuron/CMakeLists.txt

tests/integration/CMakeLists.txt

coreneuron/utils/randoms/nrnran123.cpp

coreneuron/gpu/nrn_acc_manager.cpp

olupton · 2022-07-18T14:07:48Z

neuronsimulator/nrn#1922 is needed to make some tests (e.g. pynrn::basic_tests) pass on systems (e.g. BB5) where Numpy is built with Intel MKL.

bbpbuildbot · 2022-08-25T13:08:44Z

Logfiles from GitLab pipeline #71245 (:no_entry:) have been uploaded here!

Status and direct links:

bbpbuildbot · 2022-08-25T14:48:14Z

Logfiles from GitLab pipeline #71247 (:no_entry:) have been uploaded here!

Status and direct links:

pramodk

LGTM!

Don't have anything specific to add here except that rpath question.

extra/nrnivmodl_core_makefile.in

bbpbuildbot · 2022-08-25T16:57:21Z

Logfiles from GitLab pipeline #71296 (:no_entry:) have been uploaded here!

Status and direct links:

bbpbuildbot · 2022-08-26T10:20:28Z

Logfiles from GitLab pipeline #71347 (:no_entry:) have been uploaded here!

Status and direct links:

e

bbpbuildbot · 2022-08-26T14:04:52Z

Logfiles from GitLab pipeline #71424 (:white_check_mark:) have been uploaded here!

Status and direct links:

Various fixes to make GLOBAL variables in order to support shared library support for GPU/OpenACC build. * sync global variables like celsius from global to instance struct * partialPivLu: use nvc++ -cuda * instance struct no longer in unified memory * drop OpenMP async wait * fixes for ISPC, also drop ispc_celsius * cnrn_target_update_on_device * fix codegen with TABLE * fix unit tests * fmt: use upstream master with my nvhpc/22.3 + c++17 fix * global variables are always accessed via the instance struct in .ispc Related to BlueBrain/CoreNeuron#795

bbpbuildbot · 2022-08-28T11:54:17Z

Logfiles from GitLab pipeline #71522 (:no_entry:) have been uploaded here!

Status and direct links:

bbpbuildbot · 2022-08-28T14:51:44Z

Logfiles from GitLab pipeline #71526 (:white_check_mark:) have been uploaded here!

Status and direct links:

pramodk

LGTM - as this is discussed quite a bit and tested by Olli, I will merge this! 🚀

codecov-commenter · 2022-08-28T15:48:36Z

Codecov Report

Merging #795 (567bd21) into master (d6507e2) will decrease coverage by 0.75%.
The diff coverage is 24.68%.

@@            Coverage Diff             @@
##           master     #795      +/-   ##
==========================================
- Coverage   58.49%   57.74%   -0.76%     
==========================================
  Files         102      103       +1     
  Lines        9412     9459      +47     
==========================================
- Hits         5506     5462      -44     
- Misses       3906     3997      +91

Impacted Files	Coverage Δ
coreneuron/apps/corenrn_parameters.hpp	`100.00% <ø> (ø)`
coreneuron/apps/main1.cpp	`45.28% <0.00%> (-0.87%)`	⬇️
coreneuron/io/core2nrn_data_return.cpp	`1.13% <0.00%> (ø)`
coreneuron/io/nrn2core_data_init.cpp	`0.00% <0.00%> (ø)`
coreneuron/io/nrn_checkpoint.cpp	`4.35% <ø> (ø)`
coreneuron/mechanism/capac.cpp	`61.81% <ø> (ø)`
coreneuron/mechanism/eion.cpp	`68.26% <0.00%> (+12.36%)`	⬆️
coreneuron/mechanism/mechanism.hpp	`20.00% <ø> (ø)`
coreneuron/mechanism/membfunc.hpp	`0.00% <0.00%> (ø)`
coreneuron/mechanism/patternstim.cpp	`0.00% <0.00%> (ø)`
... and 25 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

…BlueBrain/CoreNeuron#795) * coreneuron and mechanism library can be built as shared and it enables launching coreneuron on GPU via python * update MOD2C and NMODL fixes to handle GLOBAL variables See BlueBrain/mod2c/pull/78 See BlueBrain/nmodl/pull/904 * removed acc/openmp global annotations for celsius, pi and secondorder and they don't need to be copied on GPU * Pass Memb_list* as an argument for all common prototypes in order to support global variables via argument * free ml->instance if not empty * add link to libscopmath in neuron as well * nrn_ghk is now declared inline. * homegrown present table to avoid dynamic loading + acc_deviceptr limitations * use -gpu=nordc and make #pragma acc routine seq functions inline * drop -lscopmath as its folded in elsewhere * random123 header reorganisation * try and cleanup CLI11 handling. * try and consolidate build logic * some CORENEURON_ -> CORENRN_ for consistency. * export OpenACC flags to NEURON separately as well as part of the whole ... -lcoreneuron ... link line. * libcoreneuron.so -> libcorenrnmech.so, try and fix static builds * do not enable OpenMP in shared/OpenACC builds. * add rpaths inside nrnivmodl-core. * accept a private destructor function pointer from generated mechanisms * drop ${TEST_EXEC_PREFIX} that was causing simple tests to be executed on many ranks. * CORENEURON_GPU_DEBUG: add environment variable that enables cnrn_target_* debug messages. fixes BlueBrain/CoreNeuron#141 Co-authored-by: Olli Lupton <[email protected]> CoreNEURON Repo SHA: BlueBrain/CoreNeuron@12272f8

pramodk added the gpu label Apr 3, 2022

pramodk marked this pull request as draft April 3, 2022 22:45