-
Notifications
You must be signed in to change notification settings - Fork 40
Support for shared libraries in GPU execution (python launch support) #795
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
@olupton : to reproduce the linking issue: git pull
git checkout pramodk/exclude-global-vars
git submodule update -f --init --recursive temporarily update hh.mod with call to random123: +++ b/coreneuron/mechanism/mech/modfile/hh.mod
@@ -112,6 +112,9 @@ UNITSOFF
sum = alpha + beta
ntau = 1/(q10*sum)
ninf = alpha/sum
+ VERBATIM
+ double xxx = nrnran123_dblpick(nullptr);
+ ENDVERBATIM
} and build:
and this should give: [ 81%] Running nrnivmodl-core with halfgap.mod
[INFO] Running: make -j1 -f /gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/bbpv/repos/bbp/coreneuron/build_gpu_pr/share/coreneuron/nrnivmodl_core_makefile ROOT=/gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/bbpv/repos/bbp/coreneuron/build_gpu_pr MOD2CPP_BINARY=/gpfs/bbp.cscs.ch/home/kumbhar/workarena/systems/bbpv/repos/bbp/coreneuron/build_gpu_pr/bin/mod2c_core MODS_PATH=x86_64/corenrn/mod2c BUILD_TYPE=SHARED NRN_PRCELLSTATE=0
Default NMODL flags:
NVC++-W-1057-Static variables are not supported in acc routine - _ZN49_INTERNAL_27_x86_64_corenrn_mod2c_hh_cpp_4898369610coreneuron21_global_variables_ptrE (x86_64/corenrn/mod2c/hh.cpp: 364)
NVC++-W-1057-Static variables are not supported in acc routine - _ZN49_INTERNAL_27_x86_64_corenrn_mod2c_hh_cpp_4898369610coreneuron21_global_variables_ptrE (x86_64/corenrn/mod2c/hh.cpp: 374)
NVC++-W-1057-Static variables are not supported in acc routine - _ZN49_INTERNAL_27_x86_64_corenrn_mod2c_hh_cpp_4898369610coreneuron21_global_variables_ptrE (x86_64/corenrn/mod2c/hh.cpp: 440)
ptxas fatal : Unresolved extern function '_ZN10coreneuron17nrnran123_dblpickEPNS_15nrnran123_StateE'
NVC++-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (x86_64/corenrn/mod2c/hh.cpp: 444)
NVC++/x86-64 Linux 22.2-0: compilation aborted
make[3]: *** [x86_64/corenrn/build/hh.o] Error 2
make[2]: *** [bin/x86_64/special-core] Error 2
make[1]: *** [coreneuron/CMakeFiles/nrniv-core.dir/all] Error 2 |
Two posts from me about the compilation/linking issues:
we currently don't know how to produce a shared library containing both OpenACC and CUDA functions. |
This comment was marked as outdated.
This comment was marked as outdated.
b5083fd
to
89baf7b
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
9b8fe22
to
df32d6f
Compare
This comment was marked as outdated.
This comment was marked as outdated.
I think the CI failures are mainly because of build system issues, and using (apparently) not widely supported arguments to |
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through the changes quickly and I added my quick comments here. The only thing that stands out is random123 but that could be rediscussed once we have initial tests are working.
9ebd22d
to
3989293
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
e9f8930
to
4af041d
Compare
This comment was marked as outdated.
This comment was marked as outdated.
4af041d
to
246918e
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
neuronsimulator/nrn#1922 is needed to make some tests (e.g. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Don't have anything specific to add here except that rpath question.
4e0e386
to
0e24755
Compare
Various fixes to make GLOBAL variables in order to support shared library support for GPU/OpenACC build. * sync global variables like celsius from global to instance struct * partialPivLu: use nvc++ -cuda * instance struct no longer in unified memory * drop OpenMP async wait * fixes for ISPC, also drop ispc_celsius * cnrn_target_update_on_device * fix codegen with TABLE * fix unit tests * fmt: use upstream master with my nvhpc/22.3 + c++17 fix * global variables are always accessed via the instance struct in .ispc Related to BlueBrain/CoreNeuron#795
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - as this is discussed quite a bit and tested by Olli, I will merge this! 🚀
Codecov Report
@@ Coverage Diff @@
## master #795 +/- ##
==========================================
- Coverage 58.49% 57.74% -0.76%
==========================================
Files 102 103 +1
Lines 9412 9459 +47
==========================================
- Hits 5506 5462 -44
- Misses 3906 3997 +91
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
…BlueBrain/CoreNeuron#795) * coreneuron and mechanism library can be built as shared and it enables launching coreneuron on GPU via python * update MOD2C and NMODL fixes to handle GLOBAL variables See BlueBrain/mod2c/pull/78 See BlueBrain/nmodl/pull/904 * removed acc/openmp global annotations for celsius, pi and secondorder and they don't need to be copied on GPU * Pass Memb_list* as an argument for all common prototypes in order to support global variables via argument * free ml->instance if not empty * add link to libscopmath in neuron as well * nrn_ghk is now declared inline. * homegrown present table to avoid dynamic loading + acc_deviceptr limitations * use -gpu=nordc and make #pragma acc routine seq functions inline * drop -lscopmath as its folded in elsewhere * random123 header reorganisation * try and cleanup CLI11 handling. * try and consolidate build logic * some CORENEURON_ -> CORENRN_ for consistency. * export OpenACC flags to NEURON separately as well as part of the whole ... -lcoreneuron ... link line. * libcoreneuron.so -> libcorenrnmech.so, try and fix static builds * do not enable OpenMP in shared/OpenACC builds. * add rpaths inside nrnivmodl-core. * accept a private destructor function pointer from generated mechanisms * drop ${TEST_EXEC_PREFIX} that was causing simple tests to be executed on many ranks. * CORENEURON_GPU_DEBUG: add environment variable that enables cnrn_target_* debug messages. fixes BlueBrain/CoreNeuron#141 Co-authored-by: Olli Lupton <[email protected]> CoreNEURON Repo SHA: BlueBrain/CoreNeuron@12272f8
Description
Summary
enables launching coreneuron on GPU via pyton
and they don't need to be copied on GPU
TODOs
See see Avoid use of global variables in generated code mod2c#78
See Support for SHARED build with PGI OpenACC build #141 (comment)
into link errors, see
Support for SHARED build with PGI OpenACC build #141 (comment)
@olupton to rescue here!
pointer when coreneuron is launched via python. See
Support for SHARED build with PGI OpenACC build #141 (comment)
python
call to cuLaunchKernel returned error 400: Invalid handle
Closes #141.
Closes #599.
How to test this?
Build neuron master with this PR branch of coreneuron including updated mod2c submodule.
module load unstable gcc nvhpc cuda hpe-mpi cmake python-dev cmake .. -DCMAKE_INSTALL_PREFIX=`pwd`/install -DCORENRN_ENABLE_GPU=ON -DCORENRN_ENABLE_NMODL=OFF -DCORENRN_ENABLE_MPI=ON -DNRN_ENABLE_CORENEURON=ON -DNRN_ENABLE_INTERVIEWS=OFF -DNRN_ENABLE_TESTS=OFF make -j12 make install
No compile
ringtest
withnrnivmodl -coreneuron
and launch ringtest on GPU usingpython
as well asspecial
. See #141 (comment)Test System
CI_BRANCHES:NEURON_BRANCH=olupton/coreneuron-gpu-dynamic-loading,SPACK_BRANCH=olupton/coreneuron-gpu-dynamic