GPU wheel support #2446
Labels: building (Issue related to build/compilation), coreneuron, enhancement, gpu, improvement (Improvement over existing implementation), python, wheel
## Overview of the feature

Ideally, a simple `pip install neuron` (or similar) would yield a NEURON installation that is capable of using CoreNEURON's GPU support.

A previous attempt at this was introduced in #1452 and ultimately removed in #2378. This ticket concerns a (hypothetical) future attempt to re-introduce GPU wheel support in a more sustainable way.
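To make that concrete, here is a sketch of the intended user experience. The `coreneuron.enable` and `coreneuron.gpu` options already exist in NEURON's Python API; the hypothetical part is that a stock wheel could execute the GPU code path without further setup:

```sh
pip install neuron
python -c "
from neuron import h, coreneuron
h.load_file('stdrun.hoc')
coreneuron.enable = True  # hand the simulation to CoreNEURON
coreneuron.gpu = True     # ... and run it on the GPU
h.stdinit()
h.ParallelContext().psolve(10)  # ideally: works without a local NVIDIA HPC SDK
"
```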
## Retrospective on previous efforts
All NEURON and CoreNEURON workflows involving custom MOD files use the `nrnivmodl` (or `nrnivmodl-core`) scripts on the end user's machine. In essence, these scripts, on the user's machine:

1. translate the MOD files into C++ code,
2. compile the generated code, and
3. link it against the [Core]NEURON libraries to produce a `special` binary (or mechanism library).

Note in particular that the last step involves linking together compiled code that was compiled on the user's machine, with the user's toolchain, at `nrnivmodl` runtime, with [Core]NEURON libraries that were compiled much earlier, on a different machine and with a different toolchain. This already causes problems that have nothing to do with GPU support; see, for example, #1963.
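As a sketch, the typical user-side flow looks like this (directory and file names illustrative):

```sh
nrnivmodl my_mod_files     # translate MOD -> C++, compile, and link against
                           # the [Core]NEURON libraries shipped in the wheel
./x86_64/special init.hoc  # run using the freshly linked binary
```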
With GPU support enabled, GPU-aware code exists on both sides: some is compiled on the user's machine, while some is compiled earlier and shipped as part of the wheel.
Because CoreNEURON's GPU support currently requires the NVIDIA HPC C++ compiler, `nvc++`, the old GPU-enabled wheels had to be compiled with `nvc++` (not `g++`, which is used for the regular Linux wheels). As NVIDIA do not, so far as we know, provide any guarantees about forward/backward link compatibility between GPU-enabled code built with different versions of `nvc++`, we adopted the conservative approach of requiring an exact match.

This implied that the user had to manually install the same version of the NVIDIA HPC SDK on their own machine as had been used to build the GPU-enabled wheel that they had installed. This is obviously possible, but it was not user-friendly. (Furthermore, it plainly does not scale if any other package were to ship `nvc++`-compiled wheels that required a different version.) In practice, limited developer resources meant that the version used to build the wheels was not regularly updated, and it was quite old by the time the old GPU wheel support was removed.
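Concretely (version numbers illustrative), using the old wheels meant something like:

```sh
# the GPU wheel was built with one specific NVIDIA HPC SDK release, so the
# user had to install exactly that release and make it visible to nrnivmodl
nvc++ --version  # had to report the same version used to build the wheel
```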
In addition, the `auditwheel` infrastructure caused significant problems with the extra runtime libraries used by `nvc++`.

In order for the binaries shipped in the wheel (such as `nrniv`) to be usable standalone, without a local installation of the NVIDIA HPC SDK, some of these runtime libraries were shipped as part of the wheel. This was not a problem in and of itself -- binaries such as `nrniv` worked as expected -- but it was problematic in the `nrnivmodl` workflow described above.

Specifically, the shipped CoreNEURON `.so` linked against several `libnvidiastuff-{hash}.so` libraries, where the `{hash}` suffix was added by the wheel-building infrastructure. Inside `nrnivmodl`, `nvc++` would link `libnvidiastuff.so` from the user's NVIDIA HPC SDK installation, as well as linking to the CoreNEURON library, resulting in an executable that links against both `libnvidiastuff.so` (from the system) and `libnvidiastuff-{hash}.so` (from the wheel). Unsurprisingly, this caused problems. Working around this involved using `patchelf` inside `nrnivmodl` to remove the link dependency on `libnvidiastuff-{hash}.so`. Needless to say, this is unpleasant and fragile.
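Schematically, keeping the `libnvidiastuff` placeholder used above (the commands are a sketch of the diagnosis and of the workaround, not the exact implementation):

```sh
# the wheel's CoreNEURON library depends on hash-suffixed copies bundled by auditwheel
ldd libcoreneuron.so | grep nvidiastuff
#   libnvidiastuff-4f2a.so => .../neuron/.libs/libnvidiastuff-4f2a.so  (from the wheel)

# after nrnivmodl links with the system nvc++, 'special' also depends on the
# unsuffixed system copies, so the workaround stripped the duplicated dependency:
patchelf --remove-needed libnvidiastuff-4f2a.so ./x86_64/special
```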
## Proposed way forwards
A lot of the complication above is due to the existence of GPU-enabled code inside the shared library that is shipped, compiled, inside the wheel. This code principally exists so that models using only the default mechanisms can run on the GPU when `nrnivmodl` is not run.

If the source code were reorganised (in the direction of shipping more source code) so that all GPU-enabled code is compiled inside `nrnivmodl`, the main NEURON and CoreNEURON libraries (`libnrniv.so` and so on) could be compiled with `g++` as normal, meaning that the binary wheels would no longer be tied to a specific version of `nvc++`. This would also allow all compute-intensive code to be compiled on the user's machine, with knowledge of the CPU/GPU hardware that is being used.
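With such a reorganisation, the user-side flow might look like the following (`nrnivmodl -coreneuron` exists today; the GPU switch shown is hypothetical and its spelling is illustrative):

```sh
pip install neuron                  # wheel built with g++, no nvc++ pinning
export CORENRN_ENABLE_GPU=true      # hypothetical knob; requires a local nvc++
nrnivmodl -coreneuron my_mod_files  # ALL GPU-enabled code is compiled here,
                                    # for this machine's CPU/GPU hardware
./x86_64/special -python model.py
```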
Without extra effort, this would imply that even models that only use default mechanisms (no custom MOD files) would require `nrnivmodl` to be available on the user's machine. With a little extra effort, it might be possible to additionally ship a GPU-enabled mechanism library, compiled for lowest-common-denominator hardware with some specific version of `nvc++`, that `nrniv` could fall back to if `nrnivmodl` has not been run. Because the GPU-enabled code would be restricted to the default mechanism library, which is not an input to `nrnivmodl` (or at least need not be), we would avoid the `nvc++` mismatches described above.
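The fall-back could then behave roughly as follows (a shell sketch with illustrative names; `libcorenrnmech_gpu.so` denotes the hypothetical shipped default-mechanism library, not an actual file in today's wheels):

```sh
# prefer mechanisms compiled locally by nrnivmodl; otherwise fall back to the
# pre-built, lowest-common-denominator GPU library shipped in the wheel
if [ -f ./x86_64/libcorenrnmech.so ]; then
  mech=./x86_64/libcorenrnmech.so               # built locally, matching local nvc++
else
  mech="$NEURONHOME/lib/libcorenrnmech_gpu.so"  # shipped fallback (hypothetical)
fi
echo "nrniv would dlopen: $mech"
```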
## Foreseeable Impact

A user-friendly GPU-enabled wheel would further reduce the barrier to entry for using CoreNEURON's most performant mode of operation.