Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/nvidia-container-toolkit: add option for driver path #312253

Conversation

loicreynier
Copy link
Contributor

@loicreynier loicreynier commented May 16, 2024

Description of changes

Add an option nvidia-driver-path (in hardware.nvidia-container-toolkit) for systems where NVIDIA drivers are not set up by NixOS. This is useful for example in WSL systems where the Windows drivers can be used and are located in /usr/lib/wsl/lib.

Tested with Podman

podman run --rm --device=nvidia.com/gpu=all \
    nvcr.io/nvidia/nvhpc:24.3-devel-cuda_multi-ubuntu22.04 \
    nvidia-smi -L

on a WSL NixOS system with

hardware.nvidia-container-toolkit = {
  enable = true;
  nvidia-driver-path = "/usr/lib/wsl";
  mount-nvidia-executables = false;
  mount-nvidia-docker-1-directories = false;
};

Also tested the compilation and run of simple CUDA programs on that container.

Not sure yet if this option should always disables the mount-nvidia-executables and mount-nvidia-docker-1-directories options.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.05 Release Notes (or backporting 23.05 and 23.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` labels May 16, 2024
@loicreynier loicreynier requested a review from ereslibre May 16, 2024 16:29
@ereslibre ereslibre self-assigned this May 16, 2024
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 labels May 16, 2024
Copy link
Member

@ereslibre ereslibre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @loicreynier, trying to understand a bit here, since I'm not familiar with WSL myself.

We had a related case at nix-community/NixOS-WSL#433; how is it different from this, does enabling wsl.useWindowsDriver work for you? Or is this PR needed even in that case?

type = with lib.types; nullOr path;
default = null;
description = ''
Custom NVidia driver path for systems not using NixOS to setup NVidia drivers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Custom NVidia driver path for systems not using NixOS to setup NVidia drivers.
Custom Nvidia driver path for systems not using NixOS to setup Nvidia drivers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be consistent with current casing.

@@ -17,11 +17,14 @@
(mount:
["${lib.getExe jq} '${jqAddMountExpression} ${builtins.toJSON (mkMount mount)}'"])
mounts;
ldLibraryPathExport = if builtins.isString nvidia-driver then "export LD_LIBRARY_PATH=${nvidia-driver}/lib" else "";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this behavior only when the provided path is a string? Please explain a little further how this new attribute might be used.

I would like to avoid surprises when nvidia-driver is something other than null and a string, we will silently fallback to not use the provided parameter. Is there any chance for the user to have provided a nix path?

Also, can we use a flag of nvidia-ctk cdi generate rather than using an envvar? I see --driver-root; others might be relevant too. I think this is more explicit and lets less room for unexpected behaviors.

@loicreynier
Copy link
Contributor Author

Thanks for pointing out nix-community/NixOS-WSL#433. I wasn't aware that NixOS WSL had this wsl.useWindowsDriver which solves the issue.

Linking the Window driver files (as done by this option) is probably a better approach than what I proposed here. I don't think this PR is needed anymore.

For reference, I couldn't make it work (without the above mentioned option) using the --driver-root and --library-search-path flags from nvidia-ctk cdi generate:

nvidia-ctk --debug cdi generate \
  --discovery-mode wsl \
  --driver-root /usr/lib/wsl/lib/ \
  --library-search-path /usr/lib/wsl/lib/
Output
DEBU[0000] Locating NVIDIA Container Toolkit CLI as nvidia-ctk
DEBU[0000] Locating "nvidia-ctk" in ...
DEBU[0000] Checking candidate '/run/current-system/sw/bin/nvidia-ctk'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Found nvidia-ctk candidates: [/run/current-system/sw/bin/nvidia-ctk]
DEBU[0000] Using NVIDIA Container Toolkit CLI path nvidia-ctk
DEBU[0000] Locating /dev/dxg
DEBU[0000] Locating "/dev/dxg" in [/usr/lib/wsl/lib /usr/lib/wsl/lib/dev]
WARN[0000] Could not locate /dev/dxg: pattern /dev/dxg not found
ERRO[0000] failed to generate CDI spec: failed to create edits common for entities: failed to create discoverer for WSL driver: failed to initialize dxcore: failed to initialize dxcore context

@ereslibre
Copy link
Member

Thanks for pointing out nix-community/NixOS-WSL#433. I wasn't aware that NixOS WSL had this wsl.useWindowsDriver which solves the issue.

I cannot help but think that there must be something we can do (at the NixOS-WSL level) to improve this user experience. If:

  • config.wsl.enable is true, and
  • config.hardware.nvidia-container-toolkit.enable is true,

Then, we can set wsl.useWindowsDriver = true, would that be reasonable from the NixOS-WSL standpoint?

For reference, I couldn't make it work (without the above mentioned option) using the --driver-root and --library-search-path flags from nvidia-ctk cdi generate:

Thanks for the input.

@loicreynier
Copy link
Contributor Author

Then, we can set wsl.useWindowsDriver = true, would that be reasonable from the NixOS-WSL standpoint?

I think it would be reasonable since, as far as I know, the easiest way to make it work is to implement something similar to what this option does.

Additionally, I think it is worth noting that since WSL aims to simplify the interaction between Linux and the Windows host as much as possible, we should avoid duplicates and rely solely on the native drivers.

@ereslibre
Copy link
Member

I think it would be reasonable since, as far as I know, the easiest way to make it work is to implement something similar to what this option does.

Thank you, I have created nix-community/NixOS-WSL#478. Let me know if that looks fine or if you think it can be improved somehow.

Additionally, I think it is worth noting that since WSL aims to simplify the interaction between Linux and the Windows host as much as possible, we should avoid duplicates and rely solely on the native drivers.

I might be missing here a bit, sorry. With this comment you mean that you encourage to use the useWindowsDriver=true option? This is what you mean with the native drivers, right? Or am I misunderstanding?

@ereslibre
Copy link
Member

This one can be probably closed. Thank you @loicreynier for the contribution and for following up on this.

@loicreynier
Copy link
Contributor Author

I might be missing here a bit, sorry. With this comment you mean that you encourage to use the useWindowsDriver=true option? This is what you mean with the native drivers, right? Or am I misunderstanding?

Yes exactly this is what I meant, sorry for the unclear comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants