
adaptivecpp: init at 24.06.0 #360893

Open · wants to merge 16 commits into master

Conversation


@blenderfreaky blenderfreaky commented Dec 1, 2024

OpenSYCL has been renamed to AdaptiveCpp, "due to external legal pressure" (see their repo)

The package is a 1:1 copy of pkgs/development/compilers/opensycl/default.nix, with the repository and name updated and the version bumped to the newest release.
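For illustration, the substantive changes are the source location and the version. A minimal sketch of the updated fetch, assuming the upstream v-prefixed tag scheme (the hash is a placeholder to be filled in on the first build):

  version = "24.06.0";

  src = fetchFromGitHub {
    owner = "AdaptiveCpp";
    repo = "AdaptiveCpp";
    rev = "v${version}";
    # placeholder hash; replace with the value nix-build reports
    hash = lib.fakeHash;
  };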

Open questions:

  • Remove opensycl package and add an alias or warning?
  • Update dependencies? ROCm 6 seems to work now, but I haven't done sufficient testing yet.

Tagging maintainers: @yboettcher

Things done

Built successfully on x86_64 with AMD (ROCm) GPU.

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@NixOSInfra added the "12. first-time contribution" label (This PR is the author's first one; please be gentle!) on Dec 1, 2024
@yboettcher
Contributor

Interesting that you apparently had no issues with the ROCm build, while I had quite some trouble just trying to make it compile. I do see that the clhpp package in nixpkgs was updated a few weeks ago, and in general some time has passed, so that might have helped.
As for ROCm, did you use the "generic" workflow of acpp, or did you instruct it to build for a specific GPU target? For me, only the latter worked when I tried it; the generic workflow always errored out at some point with ROCm.

As for the open questions you posed, I'd say updating to ROCm 6 is preferable if possible, as well as creating an alias in pkgs/top-level/aliases.nix that throws with a notice that the package has been renamed.
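A minimal sketch of such an alias entry in pkgs/top-level/aliases.nix (the exact wording and date comment are illustrative):

  opensycl = throw "'opensycl' has been renamed to 'adaptivecpp'"; # Added 2024-12-01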

@blenderfreaky
Author

blenderfreaky commented Dec 4, 2024

False alarm on ROCm 6 building; I forgot nix-build doesn't pull in my config's config.rocmSupport = true; 😅
With basically anything I do, I get: error: unsupported option '-fzero-call-used-regs=used-gpr' for target 'amdgcn-amd-amdhsa'

However, it SHOULD be possible: building the Arch Linux package in distrobox works just fine, and their build script doesn't do anything special. I think we might just need a slightly newer ROCm for some fixes? They use ROCm 6.2.4 and LLVM 18.1; we're on ROCm 6.0.2.

@blenderfreaky
Author

As it turns out, the issue was caused by Nix's default hardening options (specifically zerocallusedregs; see the sketch below).

  • Updated to use LLVM 17 (latest version that should work with our ROCm (see here), though I can't verify if this plays nice with CUDA and others as I am on AMD)
  • Updated to use ROCm 6
  • Fixed formatting
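A minimal sketch of how that flag can be dropped for just this derivation instead of disabling hardening wholesale, assuming the standard stdenv hardeningDisable mechanism:

  # inside the adaptivecpp derivation: drop only the flag that breaks amdgcn code generation
  hardeningDisable = [ "zerocallusedregs" ];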

Adds adaptivecpp, based on opensycl package.
Updated to newest version and to use LLVM 17 and ROCm 6.
@blenderfreaky
Author

I've added the test suite, but I'm not sure it actually runs on GPU. It compiles and runs without complaints but it only seems to use CPU (nothing shows up in nvtop while btop spikes).

Looking at their CI, they seem to specify a specific target architecture. It's not the most elegant, but I've passed it through like this, though it doesn't actually seem to make any difference in what runs:

nix-build -A adaptivecpp.tests --arg config '{ rocmSupport = true; }' --arg targetsBuild '"omp;hip:gfx1030"' --arg targetsRun '"omp;hip"'

Could you kindly take a look at this @yboettcher?

@yboettcher
Contributor

I might be wrong, but given that nix builds are usually rather "sealed off" I would not expect a nix-build call to be able to use (or even detect) a gpu.

I cloned your branch, built the adaptivecppWithRocm package (with nix-build) and then used that to try and build the adaptivecpp tests manually (in a clone of the adaptivecpp repo).

  • When building the tests with -DACPP_TARGETS=omp all is well and it runs on the host cpu.

  • Trying to build with -DACPP_TARGETS=hip:gfx1101, I get error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library. I believe this might be because only one of the resulting binaries is wrapped with --rocm-device-lib-path: in the postFixup script, only syclcc-clang is wrapped with the rocm libs, while (I assume) the build uses the acpp binary, which did not exist back when this was still opensycl. So I assume this could be fixed by adding a wrapProgram for the acpp binary similar to the one already in place for syclcc-clang.
    I could fix this issue and actually run it on my gpu by adding

+ ''
      wrapProgram $out/bin/acpp \
        --prefix PATH : ${
          lib.makeBinPath [
            python3
            lld
          ]
        } \
        --add-flags "-L${llvmPackages.openmp}/lib" \
        --add-flags "-I${llvmPackages.openmp.dev}/include" \
    ''
    + lib.optionalString rocmSupport ''
      --add-flags "--rocm-device-lib-path=${rocmPackages.rocm-device-libs}/amdgcn/bitcode"
    ''

to the postFixup, in addition to what's already there.
I also had to set NIX_HARDENING_ENABLE = "", or otherwise the '-fzero-call-used-regs=used-gpr' error would come back. I also had to pull in boost, llvmPackages_17.openmp and llvmPackages_17.bintools, but that might just be because I was running outside of any special dev environment (I have a feeling, though, that we might want to add the bintools to the PATH similar to how it's done with python, because I also had to use them when using acpp "standalone" to compile a single file for hip).

  • When building without any ACPP_TARGETS or with -DACPP_TARGETS=generic it compiles, but fails to run with plenty of
[AdaptiveCpp Error] from /build/source/include/hipSYCL/glue/llvm-sscp/jit.hpp:297 @ compile(): jit::compile: Encountered errors:
0: LLVMToAmdgpu: hiprtcLinkComplete() failed. Setting the environment variables AMD_COMGR_SAVE_TEMPS=1 AMD_COMGR_REDIRECT_LOGS=stdout AMD_COMGR_EMIT_VERBOSE_LOGS=1 might reveal more information.

[AdaptiveCpp Error] from /build/source/src/runtime/hip/hip_queue.cpp:692 @ submit_sscp_kernel_from_code_object(): hip_queue: Code object construction failed
[AdaptiveCpp Error] from /build/source/include/hipSYCL/glue/llvm-sscp/jit.hpp:297 @ compile(): jit::compile: Encountered errors:
0: LLVMToAmdgpu: hiprtcLinkComplete() failed. Setting the environment variables AMD_COMGR_SAVE_TEMPS=1 AMD_COMGR_REDIRECT_LOGS=stdout AMD_COMGR_EMIT_VERBOSE_LOGS=1 might reveal more information.

[AdaptiveCpp Error] from /build/source/src/runtime/hip/hip_queue.cpp:692 @ submit_sscp_kernel_from_code_object(): hip_queue: Code object construction failed
[AdaptiveCpp Error] from /build/source/include/hipSYCL/glue/llvm-sscp/jit.hpp:297 @ compile(): jit::compile: Encountered errors:
0: LLVMToAmdgpu: hiprtcLinkComplete() failed. Setting the environment variables AMD_COMGR_SAVE_TEMPS=1 AMD_COMGR_REDIRECT_LOGS=stdout AMD_COMGR_EMIT_VERBOSE_LOGS=1 might reveal more information.

[AdaptiveCpp Error] from /build/source/src/runtime/hip/hip_queue.cpp:692 @ submit_sscp_kernel_from_code_object(): hip_queue: Code object construction failed
[AdaptiveCpp Error] accessor [host]: Aborting synchronization, runtime error list is non-empty

messages. Setting the mentioned variables, however, only added some initial output about a few AMD_COMGR_ACTION_... calls that all ended with AMD_COMGR_STATUS_SUCCESS, so I guess that's not helpful.

And I think this is also about how far I got when trying to make this work before: explicit compilation for a specific target works (when adding that extra wrapProgram section), but the generic target does not (and that's where I just gave up back then :/ #295845 (comment)). Although I think I gave up mainly because I also struggled with OpenCL a lot, which does not seem to be an issue here. If I remember correctly, it got enabled automatically back then because rocm also provided OpenCL, which is why I tried to make it work.

That said, I think I'd rather have a dysfunctional generic target with an all-around updated compiler where you can at least specify a specific target, than no updated compiler (which also doesn't even have a generic target). I would still prefer if we could somehow make this work, but I honestly don't know why it fails. It might even be that the way rocm is installed on NixOS is the problem. Or not.
So, I guess what's left here would be adding that second wrapProgram and considering whether to add the bintools to the PATH; a combined sketch follows below. (I just saw: llvm-objcopy is also provided by llvmPackages_17.libllvm. Maybe that's the more appropriate package to use for this?)
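For reference, a combined sketch of that second wrapper, showing only the added acpp wrapping (package names are taken from the inputs mentioned above and are assumptions; makeWrapper is presumed to already be available, since syclcc-clang is wrapped the same way):

  postFixup = ''
    wrapProgram $out/bin/acpp \
      --prefix PATH : ${lib.makeBinPath [ python3 lld llvmPackages_17.bintools ]} \
      --add-flags "-L${llvmPackages_17.openmp}/lib" \
      --add-flags "-I${llvmPackages_17.openmp.dev}/include" ${
        lib.optionalString rocmSupport
          "--add-flags --rocm-device-lib-path=${rocmPackages.rocm-device-libs}/amdgcn/bitcode"
      }
  '';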

@tdavidcl

tdavidcl commented Dec 7, 2024

That said, I think I'd rather have a dysfunctional generic target with an all-around updated compiler where you can at least specify a specific target, than no updated compiler (which also doesn't even have a generic target).

I second this; having an up-to-date version would be much appreciated even if the generic backend does not yet work.
Otherwise, since other backends are apparently working, could this be merged without SSCP for the time being, until a fix is found?

@illuhad

illuhad commented Dec 8, 2024

Hi - this discussion has made it into the AdaptiveCpp discord ;)

Without having seen the JIT logs it's hard to say why generic fails - does it work with other JIT backends?

If ROCm thinks that it has generated the device binary successfully, then it is possible that there is an internal ROCm issue. IIRC there were also some ROCm versions where the retrieval of the ROCm log was prevented by a bug (in case of an error, ROCm would return before filling the error log). This might play a role here.
Otherwise it might also be worthwhile to look at the ACPP_DEBUG_LEVEL=3 output - in particular, to check whether it finds and links the ROCm bitcode libraries.
Recent AdaptiveCpp versions (current develop branch) support the ACPP_S2_DUMP_IR_FINAL=1 environment variable, which causes the final IR to be dumped to a file before handing it off to ROCm. This could be helpful for validation.
(See https://github.com/AdaptiveCpp/AdaptiveCpp/blob/develop/doc/env_variables.md#environment-variables-to-control-dumping-ir-during-jit-compilation)

I don't want to get involved in your internal prioritization, but perhaps it is helpful for your decision process to outline how we see these compilation flows from the perspective of the AdaptiveCpp project.

  • generic is where we've strongly focused all optimization and compiler functionality work for the last few years.
  • generic supports many features that hip and cuda do not (and cannot) support. The performance is also better.
  • I see hip, cuda and omp as niche compilation flows that are primarily helpful for specific use cases (in particular certain types of interop), not for general purpose.
  • hip/cuda/omp will be maintained, but it is very unlikely that they will see new features being added.
  • generic and hip/cuda/omp are very, very different. From the compiler design, to the involved LLVM transformations and even the frontend. So generic running on e.g. sm_86 is not comparable or related in any way to targeting cuda:sm_86. They are entirely independent compilers.

@blenderfreaky
Author

blenderfreaky commented Dec 8, 2024

Looking at the output more closely, I'm not convinced the executables should be wrapped. I've built it without any wrapping, and /nix/store/HASH-adaptivecpp-24.06.0/etc/AdaptiveCpp/acpp-rocm.json contains all the paths, which should be used as the default arguments. (Also note, acpp, syclcc and syclcc-clang all seem to be the same file, the code just branches depending on the filename)

I got different results for the tests though (both with and without wrapping):
cmake .. -DAdaptiveCpp_DIR=/nix/store/HASH-adaptivecpp-24.06.0/lib/cmake/AdaptiveCpp -DBOOST_ROOT=/nix/store/HASH-boost-1.81.0-dev/lib/cmake/ -DACPP_TARGETS=generic compiles and sycl_tests runs fine, just like without ACPP_TARGETS.
However none of them run on my GPU (RX 6800), I get:
Default-selected queue runs on device: hipSYCL OpenMP host device

@tdavidcl

tdavidcl commented Dec 8, 2024

I just tested the generic backend on Nvidia hardware, it seems to be working correctly.

@yboettcher
Contributor

yboettcher commented Dec 9, 2024

@illuhad, thank you very much for your insights!
I guess it works fine with other backends. I compiled a simple test file with generic and at runtime told AdaptiveCpp to use the omp backend using the ACPP_VISIBILITY_MASK variable, which ran fine (just to double-check, I also used the variable to enforce the hip backend, which led to the error I encountered earlier).

Given that we get the jit::compile: Encountered errors: line, which looks like it originates from here, I would assume that both rocm and AdaptiveCpp know that compilation failed.

ACPP_DEBUG_LEVEL=3 does give me quite a lot of logging, but (at least to me) it did not contain anything suspicious.

Given that generic appears to be the most important target for AdaptiveCpp, I would like to make this work, if it is possible from within this build script. However, given that the call to hiprtc appears to fail, it might also be "something" with nixos rocm packaging, which I guess is out of scope here.

I added some more details (including the dumped IR) here.

@blenderfreaky
Interesting find with the wrapping. I also just removed all wrapping and it did "just work". Well, it still didn't run for generic on hip, but apparently it also didn't break anything else, aside from compiling with --acpp_targets="hip", which now complains about being unable to find the rocm device library. But that can be dealt with by the user supplying a suitable path.

I noticed that when I run a binary that was compiled with generic with the ACPP_DEBUG_LEVEL=3 variable set, it tells me about the various backends AdaptiveCpp discovers. Does rocm show up for you there?

Log excerpt when executing
[AdaptiveCpp Info] backend_loader: Searching path for backend libs: '"/nix/store/8373mglfq1n3zz4pf61qd3350zvx3mpq-adaptivecpp-24.06.0/bin/../lib/hipSYCL"'
[AdaptiveCpp Info] backend_loader: Successfully opened plugin: "/nix/store/8373mglfq1n3zz4pf61qd3350zvx3mpq-adaptivecpp-24.06.0/bin/../lib/hipSYCL/librt-backend-hip.so" for backend 'hip'
[AdaptiveCpp Info] backend_loader: Successfully opened plugin: "/nix/store/8373mglfq1n3zz4pf61qd3350zvx3mpq-adaptivecpp-24.06.0/bin/../lib/hipSYCL/librt-backend-omp.so" for backend 'omp'
[AdaptiveCpp Info] Registering backend: 'hip'...
[AdaptiveCpp Info] Registering backend: 'omp'...
[AdaptiveCpp Info] Discovered devices from backend 'HIP': 
[AdaptiveCpp Info]   device 0: 
[AdaptiveCpp Info]     vendor: AMD
[AdaptiveCpp Info]     name: AMD Radeon RX 7800 XT
[AdaptiveCpp Info] Discovered devices from backend 'OpenMP': 
[AdaptiveCpp Info]   device 0: 
[AdaptiveCpp Info]     vendor: the hipSYCL project
[AdaptiveCpp Info]     name: hipSYCL OpenMP host device

@illuhad

illuhad commented Dec 9, 2024

@yboettcher thanks for the details. I assumed that since

Setting the mentioned variables, however, only added some initial output about a few AMD_COMGR_ACTION_... calls that all ended with AMD_COMGR_STATUS_SUCCESS, so I guess that's not helpful.

there was an issue retrieving the logs. Anyway, from your ACPP_DEBUG_LEVEL=3 output I noticed that there is no sign of it linking any bitcode libraries into the kernel. You would expect to see something like

LLVMToAmdgpu: Linking with bitcode file: ....

for each of the relevant ROCm bitcode libraries like ockl.bc, ocml.bc, oclc_*.bc etc.
Probably - for some reason - those bitcode libraries don't get linked, which then results in any function calls that AdaptiveCpp makes into these bitcode libraries remaining unresolved. The result is then a JIT failure.

From your IR output, we can see that it tries to call __ockl_get_group_id() and __ockl_get_local_id() from ROCm's ockl.bc, so it clearly needs these libraries for this kernel. Apart from that, I'm not seeing anything interesting or surprising in the IR - it looks fine.

The way the AdaptiveCpp amdgpu backend discovers these bitcode libraries is at the moment a bit hacky (for various reasons): It tries to invoke hipcc to figure out which bitcode libraries hipcc links, and then pull in the same ones. My guess is that something goes wrong there.
If you want to debug this, this is where I'd look and put in a couple of printf statements: https://github.com/AdaptiveCpp/AdaptiveCpp/blob/2d47fb0d533db54dbba3bf803f3997ac95929241/src/compiler/llvm-to-backend/amdgpu/LLVMToAmdgpu.cpp#L126

I noticed that when I run a binary that was compiled with generic with the ACPP_DEBUG_LEVEL=3 variable set, it tells me about the various backends AdaptiveCpp discovers. Does rocm show up for you there?

You can also try acpp-info -l, which prints the same information but avoids you having to sift through pages of debug output ;)
But yes, I agree that when it runs unexpectedly on the host it's much more likely to be an issue with the runtime not seeing the backend/device rather than a compiler issue.

@yboettcher
Contributor

I am guessing here, but given that it works on omp and nvidia, I would assume that the problem is not with this adaptivecpp derivation, but maybe somewhere in how rocm is working on nixos. But that's just a guess.

In any case, unless we can find some "simple fix" by just supplying adaptivecpp with some parameter or including another buildInput or something like that, I would say that fixing this issue is out of scope for this PR and that we can merge. CPU and NVIDIA users gain the generic backend, and AMD users would have to use --acpp_targets=hip:gfx<version> and supply a --rocm-device-lib-path=..., which should result in the same compilation as before.
I would be curious, though, whether the wrapProgram can be entirely removed. @tdavidcl, could you try removing the entire postFixup section and running again on the generic/nvidia backend? If that works, I guess the wrapProgram is not needed anymore.

@blenderfreaky
Author

It turns out I linked to the wrong path in the nix store. I now also get the AMD_COMGR errors, and *_hip and *_omp both show up.

Looking into the code @illuhad linked, acpp should try to run hipcc -x ir --cuda-device-only -O3 -nogpuinc /dev/null --hip-link -### --cuda-gpu-arch=<YOUR GFX ARCH>.
Running the command in a shell with clr and rocm-device-libs fails with

clang: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library

Passing --rocm-device-lib-path makes it run however. But acpp does not pass this, as long as it finds hipcc. It only does so as a fallback, if it doesn't find hipcc (instead finding clang for example). Forcing this fallback to always run via a patch did not solve the problem however, as it tried looking in ${rocmPackages.clr}/amdgcn/bitcode rather than ${rocmPackages.rocm-device-libs}/amdgcn/bitcode.

I tried re-adding the wrapper, however that didn't help at all. @yboettcher, how exactly did you pass the arguments for it to see ROCm?

Revisiting the acpp-rocm.json file mentioned in my last comment, it does seem to pass --rocm-device-lib-path, but it points to the (incorrect) link into rocmPackages.clr. Simply forcefully replacing this file in the fixup phase did not fix it, however.

One thing I noticed is that --rocm-device-lib-path and --rocm-device-libs-path (note the extra s) both appear in the AdaptiveCpp source. However, the auto-generated JSON uses ..-lib-.., whereas it reads in ..-libs-.., which seems like it might be a bug?

Nonetheless, peppering in some debug statements, it doesn't seem like it ever actually receives those arguments from the JSON files, so maybe this is a dead-end path anyway? Even though acpp --help does show the correct arguments as defaults, e.g.:

--acpp-rocm-cxx-flags=<value>
  [can also be set with environment variable: ACPP_ROCM_CXX_FLAGS=<value>]
  [default value provided by field 'default-rocm-cxx-flags' in JSON files from directories: ['/nix/store/yli7gzgjg18q83mahfinmlcj3nj1kqr9-adaptivecpp-24.06.0/etc/AdaptiveCpp'].]
  [current value: -isystem /nix/store/yli7gzgjg18q83mahfinmlcj3nj1kqr9-adaptivecpp-24.06.0/include/AdaptiveCpp/hipSYCL/std/hiplike -isystem /nix/store/nha35i03qnyfvdpb14glfxw8mrv2fww4-clang-17.0.6-dev/include -U__FLOAT128__ -U__SIZEOF_FLOAT128__ -I/nix/store/24zx3a3b1vcyb6a76px1f9vbbcjvq40z-clr-6.0.2/include --rocm-device-libs-path=/nix/store/8miza8sk0ky5n3s98rzfwsq2vssf94bh-rocm-device-libs-6.0.2/amdgcn/bitcode --rocm-path=/nix/store/24zx3a3b1vcyb6a76px1f9vbbcjvq40z-clr-6.0.2 -fhip-new-launch-api -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -D__HIP_ROCclr__]
 The arguments passed to the compiler to compile for the ROCm backend

If you want to try the patch-stuff I did, I've put it in a separate branch for now to keep this one somewhat merge-able. See blenderfreaky:adaptivecpp-amd

@illuhad

illuhad commented Dec 9, 2024

One thing I noticed is that --rocm-device-lib-path and --rocm-device-libs-path (note the extra s) both appear in the AdaptiveCpp source. However, the auto-generated JSON uses ..-lib-.., whereas it reads in ..-libs-.., which seems like it might be a bug?

Yep, I think too that one of them is likely incorrect.

Nonetheless, peppering in some debug statements, it doesn't seem like it ever actually receives those arguments from the JSON files, so maybe this is a dead-end path anyway? Even though acpp --help does show the correct arguments as defaults, e.g.:

rocm-cxx-flags only affects the ahead-of-time compilation flow (--acpp-targets=hip). The kinds of flags that need to be taken into account there are entirely different for generic, so it neither needs nor uses them. So I don't think this will play a role for the JIT of the generic target.

The fallback paths are not well tested. I think the main assumption is that we require a working ROCm installation, and a working ROCm installation will have a functioning hipcc that users can just invoke without having to configure paths first.

Is it possible to fix hipcc? IIRC it might be possible to point AdaptiveCpp to a hipcc wrapper of your choice using -DHIPCC_COMPILER. This might be the easiest solution here.

@blenderfreaky
Author

I don't think hipcc is broken per se; it's just that NixOS separates the device libs and hipcc into two separate packages.

Afaict, all that's needed is to somehow pass --rocm-device-lib-path through to the JIT. Looking through the code it's not obvious to me how to do that.

@illuhad

illuhad commented Dec 9, 2024

I don't think hipcc is broken per se; it's just that NixOS separates the device libs and hipcc into two separate packages.

What is the workflow for users? Can they just invoke hipcc if both packages are installed without having to pass extra flags to configure it?
If yes, then it should work with AdaptiveCpp. If no, then probably every Makefile using hipcc will be broken too.

As I said, the fallback path is not well-tested. It would be more predictable to get it to work with hipcc, where we deliberately do not pass in such flags:
https://github.com/AdaptiveCpp/AdaptiveCpp/blob/2d47fb0d533db54dbba3bf803f3997ac95929241/src/compiler/llvm-to-backend/amdgpu/LLVMToAmdgpu.cpp#L162

The fallback path seems to assume a standard ROCm directory layout where the bitcode libraries live in $ROCM_PATH/amdgcn/bitcode:
https://github.com/AdaptiveCpp/AdaptiveCpp/blob/2d47fb0d533db54dbba3bf803f3997ac95929241/src/compiler/llvm-to-backend/amdgpu/LLVMToAmdgpu.cpp#L214C54-L214C62
So there is no way to configure this in the JIT compiler. As I said, the assumption is that hipcc is already correctly configured and works out of the box.

@blenderfreaky
Author

@illuhad Somehow, the example builds and runs fine if I use the provided Makefile, but I haven't been able to even build it using CMake. I think that's a problem with my usage of CMake, though.

I don't have much time this week for debugging but I've uploaded the comgr logs & bitcode in case you want to take a look.

@yboettcher
Contributor

somehow, the example builds and runs fine if I use the provided Makefile, but I haven't been able to even build it using CMake

Same for me. I also made a wrapped hipcc with the device libs and tried to compile the rocm example @illuhad provided, but I just couldn't get any of it to work with CMake (with or without the wrapped hipcc). I initially only tried the CMake version, where it looks like CMake failed on enable_language(HIP). With the Makefile, however, there was no problem.

There seemed to be some issues with the builds on ofBorg, unsure if those are relevant though.

I don't know how this is set up, but if nixpkgs is set up in a way that ofBorg failures prevent a merge, then I guess they're relevant. I once had issues where adaptivecpp tried to compile the OpenCL backend (because rocm also provides OpenCL), but that backend requires network access, which failed, making it necessary to manually disable the OpenCL backend. I haven't encountered this issue in your branch, though.

@blenderfreaky
Author

Gonna have to wait for ofBorg to run through, I guess.

Do you by chance have the /opt/rocm workaround/hack from the wiki configured? I do, and after removing it, the Makefile fails too. But unlike CMake, just adding ROCM_INSTALL_DIR := /nix/store/...-clr-6.0.2/ instantly fixes it. Not even any merged rocm, wrapper, etc. required.

@yboettcher
Contributor

I actually had an /opt/rocm thing set up, but apparently I ran an out-of-date version of that workaround from the old(?) wiki https://nixos.wiki/wiki/Amd_Gpu, which used /opt/rocm/hip. I did have some issues with that when I played around with CMake (it complained about things not existing in /opt/rocm/lib) and manually added an /opt/rocm symlink to clr (outside of nix, just a plain old ln -s), which did not help with CMake, but that's probably why the Makefile worked.

@illuhad

illuhad commented Dec 11, 2024

I don't have much time this week for debugging but I've uploaded the comgr logs & bitcode in case you want to take a look.

@blenderfreaky Thanks. I don't know why we are seeing a JIT failure, but the bitcode does not actually seem to contain the kernel from your source file (only some AMD builtin kernels). So something definitely seems off, although I cannot say what exactly.

EDIT: It seems that __ockl_image_load_2Da is not resolved which likely causes the JIT failure. Apparently a lot of image support bitcode gets linked into the binary through the ROCm bitcode libraries, even though we're not using any of it in our example. This is what everything linked together looks like: https://godbolt.org/z/rqs78neqd

@ofborg added the "ofborg-internal-error" (Ofborg encountered an error) label on Dec 12, 2024
@blenderfreaky
Author

@illuhad Interesting that it says amdhsa.target: amdgcn-amd-amdhsa--gfx700, considering I'm on gfx1030; is that normal?

Running the program with strace, two things stood out to me:

access("/dev/dri/card128", F_OK)        = -1 ENOENT (No such file or directory)
access("/dev/dri/renderD128", F_OK)     = 0

Note that /dev/dri/card1 exists. I'm not sure this is relevant, but it seemed odd.

openat(AT_FDCWD, "/nix/store/nj7c3fsf4a65c3ndfxn8253q42f4jzk7-libdrm-2.4.123/lib/libhsa-amd-aqlprofile64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/vxgysj55gc9yq7g1wlb0vjjgd74hk84j-rocm-llvm-libunwind-6.0.2/lib/libhsa-amd-aqlprofile64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/y1563grxzk23mapa57a6qzsjaqyvcw76-elfutils-0.191/lib/libhsa-amd-aqlprofile64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/1hyxvd3zv5gyl9836y6895i567x04bqj-numactl-2.0.18/lib/libhsa-amd-aqlprofile64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/wn7v2vhyyyi6clcyn0s9ixvl7d4d87ic-glibc-2.40-36/lib/libhsa-amd-aqlprofile64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/wn7v2vhyyyi6clcyn0s9ixvl7d4d87ic-glibc-2.40-36/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/wn7v2vhyyyi6clcyn0s9ixvl7d4d87ic-glibc-2.40-36/lib/libhsa-amd-aqlprofile64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/2d5spnl8j5r4n1s4bj1zmra7mwx0f1n8-xgcc-13.3.0-libgcc/lib/glibc-hwcaps/x86-64-v3/libhsa-amd-aqlprofile64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/nix/store/2d5spnl8j5r4n1s4bj1zmra7mwx0f1n8-xgcc-13.3.0-libgcc/lib/glibc-hwcaps/x86-64-v3/", 0x7ffc483a2450, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/2d5spnl8j5r4n1s4bj1zmra7mwx0f1n8-xgcc-13.3.0-libgcc/lib/glibc-hwcaps/x86-64-v2/libhsa-amd-aqlprofile64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/nix/store/2d5spnl8j5r4n1s4bj1zmra7mwx0f1n8-xgcc-13.3.0-libgcc/lib/glibc-hwcaps/x86-64-v2/", 0x7ffc483a2450, 0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/nix/store/2d5spnl8j5r4n1s4bj1zmra7mwx0f1n8-xgcc-13.3.0-libgcc/lib/libhsa-amd-aqlprofile64.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
newfstatat(AT_FDCWD, "/nix/store/2d5spnl8j5r4n1s4bj1zmra7mwx0f1n8-xgcc-13.3.0-libgcc/lib/", {st_mode=S_IFDIR|0555, st_size=48, ...}, 0) = 0

It seems to be looking for libhsa-amd-aqlprofile64.so, which doesn't seem to exist in any package. This seems to be a common issue (see ROCm/ROCm#1781), but I'm not sure if it's really the cause of our issue.

@illuhad

illuhad commented Dec 12, 2024

@illuhad Interesting that it says amdhsa.target: amdgcn-amd-amdhsa--gfx700, considering I'm on gfx1030; is that normal?

Are you referring to the godbolt link? I don't think this is cause for concern. When a target architecture is not provided, clang/LLVM falls back to the oldest AMD GPU it knows, which presumably is gfx700.
I haven't added any flags in that godbolt link to request compilation for a specific GPU architecture, so that's probably why we get the default behavior.
I think the exact instructions in the assembly are not particularly interesting; the point is rather that the LLVM IR compiles, and that some function calls remain unresolved in the assembly.

@blenderfreaky
Author

Ah ok, makes sense.

I compiled and ran the example on Arch via distrobox and compared the comgr commands; there are some interesting differences.
Here are just the differences:

On Arch:

ActionKind: AMD_COMGR_ACTION_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC

Compilation Args:
--rocm-path=/tmp/comgr-5bec98/rocm
-Xclang
-mlink-builtin-bitcode-postopt

Driver Job Args:
-mlink-builtin-bitcode
/tmp/comgr-5bec98/rocm/amdgcn/bitcode/opencl.bc
-mlink-builtin-bitcode
/tmp/comgr-5bec98/rocm/amdgcn/bitcode/ocml.bc
-mlink-builtin-bitcode
/tmp/comgr-5bec98/rocm/amdgcn/bitcode/ockl.bc
-mlink-builtin-bitcode
/tmp/comgr-5bec98/rocm/amdgcn/bitcode/oclc_daz_opt_off.bc
-mlink-builtin-bitcode
/tmp/comgr-5bec98/rocm/amdgcn/bitcode/oclc_unsafe_math_off.bc
-mlink-builtin-bitcode
/tmp/comgr-5bec98/rocm/amdgcn/bitcode/oclc_finite_only_off.bc
-mlink-builtin-bitcode
/tmp/comgr-5bec98/rocm/amdgcn/bitcode/oclc_correctly_rounded_sqrt_off.bc
-mlink-builtin-bitcode
/tmp/comgr-5bec98/rocm/amdgcn/bitcode/oclc_wavefrontsize64_off.bc
-mlink-builtin-bitcode
/tmp/comgr-5bec98/rocm/amdgcn/bitcode/oclc_isa_version_1030.bc
-mlink-builtin-bitcode
/tmp/comgr-5bec98/rocm/amdgcn/bitcode/oclc_abi_version_500.bc

-mlink-builtin-bitcode-postopt

Others Actions:

amd_comgr_do_action:
          ActionKind: AMD_COMGR_ACTION_LINK_BC_TO_BC
             IsaName: amdgcn-amd-amdhsa--gfx1030
             Options: "code_object_v5"
                Path: 
            Language: AMD_COMGR_LANGUAGE_OPENCL_1_2
 Comgr Branch-Commit: makepkg-1e2c94795ee0
         LLVM Commit: 1e2c94795ee0d6ab8e2ff3035965a6b74e11b475
             Linking Bitcode: /tmp/comgr-d874ea/input/LLVM Binary
        ReturnStatus: AMD_COMGR_STATUS_SUCCESS

On Nix:

ActionKind: AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC

Compilation Args:
-nogpulib

Driver Job Args:
-nogpulib
-fcolor-diagnostics

-disable-llvm-verifier
-discard-value-names

Others Actions:

amd_comgr_do_action:
          ActionKind: AMD_COMGR_ACTION_ADD_DEVICE_LIBRARIES
             IsaName: amdgcn-amd-amdhsa--gfx1030
             Options: "code_object_v5"
                Path: 
            Language: AMD_COMGR_LANGUAGE_OPENCL_1_2
        ReturnStatus: AMD_COMGR_STATUS_SUCCESS

amd_comgr_do_action:
          ActionKind: AMD_COMGR_ACTION_LINK_BC_TO_BC
             IsaName: amdgcn-amd-amdhsa--gfx1030
             Options: "code_object_v5"
                Path: 
            Language: AMD_COMGR_LANGUAGE_OPENCL_1_2
             Linking: LLVM Binary
             Linking: opencl_lib.bc
             Linking: ocml_lib.bc
             Linking: ockl_lib.bc
             Linking: oclc_isa_version_1030.bc
             Linking: oclc_correctly_rounded_sqrt_off_lib.bc
             Linking: oclc_daz_opt_off_lib.bc
             Linking: oclc_finite_only_off_lib.bc
             Linking: oclc_unsafe_math_off_lib.bc
             Linking: oclc_wavefrontsize64_off_lib.bc
             Linking: oclc_abi_version_500_lib.bc
        ReturnStatus: AMD_COMGR_STATUS_SUCCESS

It seems the two are using different methods for linking; I have no idea why, though.

@illuhad

illuhad commented Dec 13, 2024

@blenderfreaky Are we looking at the same ROCm versions? IIRC there was a change in how builtin bitcode libraries are included in the compilation, but that was back around ROCm 5.7.

I'm also a bit surprised to see -nogpulib in the nix compilation. This tells the compiler that bitcode libraries do not need to be included. But I'm not sure if this is related to our problem, or just a side effect of the change in bitcode handling that I mentioned.

EDIT: Notice the two different comgr actions:

  • AMD_COMGR_ACTION_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC
  • AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC

This indicates that the difference might be related to the mentioned ROCm change.

@blenderfreaky
Author

@illuhad We're using ROCm 6.0.2 (Nix seems a little out of date here); the Arch one is on 6.2.1 & 6.2.4 for some packages. Both are definitely >= ROCm 6, though.

I interpreted the difference in Action as the Arch one doing the linking and everything in one step, whereas the Nix one splits it into several steps and links later.
What's also odd is --rocm-path being passed on Arch but not on Nix; maybe that's a consequence of the difference in ordering as well?

@yboettcher Can you make any sense of the ofBorg failures? It looks like it's just the tests failing; however:

  • It's failing to build, not to run.
  • The error seems to be that it fails to resolve python, but python is an explicit build input for the tests. Might it need to be a nativeBuildInput?
  • I cannot reproduce the failure locally, despite sandboxing being on. I don't even know what other differences could be at play.

We could just comment them out to get this merged sooner, I think, seeing as they don't run as they should anyway and it's already working for NVIDIA.

@yboettcher
Contributor

yboettcher commented Dec 13, 2024

I actually have the same failures ofBorg has on my end. It appears that for the normal adaptivecpp build, the interpreter directives #!/usr/bin/env python3 do not get automatically changed to a nix store path. When building adaptivecppWithRocm I get these messages at the end of the build

patching script interpreter paths in /nix/store/8373mglfq1n3zz4pf61qd3350zvx3mpq-adaptivecpp-24.06.0
/nix/store/8373mglfq1n3zz4pf61qd3350zvx3mpq-adaptivecpp-24.06.0/lib/cmake/OpenSYCL/syclcc-launcher: interpreter directive changed from "#!/usr/bin/env python3" to "/nix/store/zv1kaq7f1q20x62kbjv6pfjygw5jmwl6-python3-3.12.7/bin/python3"
/nix/store/8373mglfq1n3zz4pf61qd3350zvx3mpq-adaptivecpp-24.06.0/lib/cmake/hipSYCL/syclcc-launcher: interpreter directive changed from "#!/usr/bin/env python3" to "/nix/store/zv1kaq7f1q20x62kbjv6pfjygw5jmwl6-python3-3.12.7/bin/python3"
/nix/store/8373mglfq1n3zz4pf61qd3350zvx3mpq-adaptivecpp-24.06.0/lib/cmake/AdaptiveCpp/syclcc-launcher: interpreter directive changed from "#!/usr/bin/env python3" to "/nix/store/zv1kaq7f1q20x62kbjv6pfjygw5jmwl6-python3-3.12.7/bin/python3"
/nix/store/8373mglfq1n3zz4pf61qd3350zvx3mpq-adaptivecpp-24.06.0/bin/acpp: interpreter directive changed from "#!/usr/bin/env python3" to "/nix/store/zv1kaq7f1q20x62kbjv6pfjygw5jmwl6-python3-3.12.7/bin/python3"
/nix/store/8373mglfq1n3zz4pf61qd3350zvx3mpq-adaptivecpp-24.06.0/bin/syclcc: interpreter directive changed from "#!/usr/bin/env python3" to "/nix/store/zv1kaq7f1q20x62kbjv6pfjygw5jmwl6-python3-3.12.7/bin/python3"
/nix/store/8373mglfq1n3zz4pf61qd3350zvx3mpq-adaptivecpp-24.06.0/bin/syclcc-clang: interpreter directive changed from "#!/usr/bin/env python3" to "/nix/store/zv1kaq7f1q20x62kbjv6pfjygw5jmwl6-python3-3.12.7/bin/python3"

but when building the normal adaptivecpp, these messages are absent.
I assume that the rocm packages include python somewhere, triggering this auto-replacement. I added python3 to the buildInputs of adaptivecpp, which caused the interpreter paths to get auto-changed, and now my tests build and run fine.
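A minimal sketch of the fix (boost is shown only as a stand-in for the existing inputs; whether nativeBuildInputs would be the more correct list is the question raised above):

  buildInputs = [
    boost
    # pulls a Python interpreter into the build so that fixupPhase rewrites the
    # "#!/usr/bin/env python3" shebangs of acpp/syclcc to a store path
    python3
  ];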

Edit:
forgot to mention: without the changed interpreter path, the resulting binaries of the adaptivecpp derivation cannot run on their own and require the user to provide some python. At least for me.

$ result/bin/acpp
/usr/bin/env: ‘python3’: No such file or directory

$ nix-shell -p python3 --run "result/bin/acpp"
acpp [AdaptiveCpp compilation driver], Copyright (C) 2018-2024 Aksel Alpay and the AdaptiveCpp project
  AdaptiveCpp version: 24.06.0
  Installation root: /nix/store/3lygkianckvd07brsfjdqd5ijcrgzyz5-adaptivecpp-24.06.0
<more output>

@blenderfreaky
Author

blenderfreaky commented Dec 17, 2024

Good catch, added python to the build deps.

Now ofBorg fails on darwin but works on linux. Just to finally get something for NVIDIA and OMP merged, I've simply marked the tests package as broken on darwin for now.

@ofborg removed the "ofborg-internal-error" label on Jan 7, 2025
@github-actions added the "10.rebuild-darwin: 0" label (This PR does not cause any packages to rebuild on Darwin) and removed "10.rebuild-darwin: 1-10" on Jan 7, 2025
@blenderfreaky
Author

I've been playing around with #367695, and I've managed to successfully run adaptivecpp on my AMD GPU (with the generic target afaict).

With the newer ROCm 6.3.1 in place (just merging the two PRs' branches), all that's needed is a merged ROCm derivation with the wrapped hipcc, and telling CMake about the ROCm path.

Along the way, I've also updated to the 24.10 version.

@yboettcher Originally, the package used darwin.apple_sdk_11_0.callPackage; do you recall why? nixpkgs-vet was asking for the entire callPackage to be removed, which I did for now.

@tdavidcl, could you verify the updated version still plays nice with NVIDIA?

If NVIDIA works, I'd maintain that this can be merged as is. Once the ROCm update lands, this package should get working ROCm automatically. Maybe mark it as broken = rocmSupport until then?
Also, I'm not sure if the test suite should stay included. Thoughts?

@yboettcher
Contributor

yboettcher commented Jan 7, 2025

Awesome that you got it to work 😁 When you say "telling CMake about the ROCm path", do you refer to the CMake file of whatever project you are compiling with the adaptivecpp compiler? I did not test this yet, but I want to try it in the next few days.

As for darwin, this is where the darwin call was initially added: https://github.com/NixOS/nixpkgs/pull/236274/files#r1264687523. I also had issues with ofBorg builds on darwin, and apparently that was what was necessary to make it work. Maybe using a newer SDK would help the darwin build? I don't know whether there is any other way to quickly check this without waiting for ofBorg (aside from using Apple hardware, which I don't have).

It does seem, though, that nixpkgs-vet does not like it when a package in pkgs/by-name/... is called with anything other than a plain callPackage. Was there a reason why you moved it there? Other than putting it back into pkgs/development and using a newer apple-sdk, I admittedly have no idea how to make this work with darwin, so I'd personally be fine with marking this as broken if that is not an option or does not work.

Would it be possible to use the version of the rocm package in the broken check? As in broken = rocmSupport && lib.versionOlder rocm-runtime.version "6.3.1", or something like that (I just guessed rocm-runtime, I assume any rocm package following this versioning would suffice). This way, the adaptiveCppWithRocm package would start working without another update. It seems like something similar has been done for a handful of other packages, so it does not seem to be against any "rules".
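A minimal sketch of that check, assuming rocmPackages.rocm-runtime carries the ROCm release version as guessed above:

  meta.broken = rocmSupport && lib.versionOlder rocmPackages.rocm-runtime.version "6.3.1";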

As for the tests, I do like automated tests, but since the generic workflow depends on the available hardware and nixpkgs will only be able to test omp, they would not be able to detect any rocm/cuda-specific failures, which does feel suboptimal. One could remove the targetsBuild input, or at least make it default to null, since — if I remember correctly — not passing ACPP_TARGETS should make it use the generic workflow (which you talk about in one of the comments), which in turn would only be able to discover the CPU and then run the omp backend. Then the test would at least show that the compiler and the generic/omp workflow are working in a simple situation.

@illuhad

illuhad commented Jan 8, 2025

if I remember correctly — not passing ACPP_TARGETS should make it use the generic workflow (which you talk about in one of the comments), which in turn would only be able to discover the cpu and then run the omp backend. Then the test would at least show that the compiler and generic/omp workflow are working in a simple situation.

If the generic JIT compiler is enabled in the AdaptiveCpp build, then it will indeed default to that for an empty ACPP_TARGETS. Testing generic with CUDA or ROCm requires having NVIDIA/AMD GPUs available.

As you say generic+omp can always be tested. Additionally, generic + OpenCL backend can be tested if you install an OpenCL implementation for CPU (e.g. Intel OpenCL CPU runtime). We test both in CI (plus with GPUs in our self-hosted CI).

not passing ACPP_TARGETS [...] would only be able to discover the cpu and then run the omp backend.

Note that device discovery is independent from ACPP_TARGETS. Device discovery (which devices the runtime sees) only depends on which backends are enabled when building AdaptiveCpp and on what is available in the system. If you use generic, then you will be able to run kernels on all devices that you can see.

@blenderfreaky
Author

blenderfreaky commented Jan 8, 2025

The reason I moved it was that nixpkgs-vet complained.
It looks like nixpkgs wants to migrate everything to by-name eventually, and new packages outside of it are discouraged.

By telling CMake I meant the two lines

      "-DHIPCC_COMPILER=${finalAttrs.rocmMerged}/bin/hipcc"
      "-DROCM_PATH=${finalAttrs.rocmMerged}"

which seem to be the variables it looks up.
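For context, the merged tree those flags point at can be produced with symlinkJoin, roughly like this (the exact set of rocmPackages to include is an assumption):

  # defined as an attribute of the derivation (finalAttrs pattern), so cmakeFlags can refer to it
  rocmMerged = symlinkJoin {
    name = "rocm-merged";
    paths = with rocmPackages; [ clr rocm-runtime rocm-device-libs ];
  };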

Marking it as broken = lib.versionOlder ... sounds good, gonna do that.

I'm gonna refactor the tests into a runner and then sycl, rt, etc., and see what ofBorg thinks of those.

@blenderfreaky
Author

blenderfreaky commented Jan 9, 2025

If I'm understanding right, ofBorg is down for good now that 2024 has ended. The Eval GitHub Actions pass, though I don't think they're building tests.
