Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

integrate gpu-driver-util into the driver images #180

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

tariq1890
Copy link
Contributor

No description provided.

@tariq1890 tariq1890 force-pushed the integrate-driver-assistant branch from f0f0038 to 6761ca7 Compare January 6, 2025 23:52
@tariq1890 tariq1890 marked this pull request as ready for review January 7, 2025 00:04
@tariq1890 tariq1890 force-pushed the integrate-driver-assistant branch from 6761ca7 to 2865db2 Compare January 7, 2025 00:15
@tariq1890 tariq1890 force-pushed the integrate-driver-assistant branch from 2865db2 to 1ffa0da Compare January 7, 2025 00:20
cd driver/vgpu/src && \
go build -o vgpu-util && \
mv vgpu-util /work
go build -C driver/vgpu/src -o vgpu-util && \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: wondering if we should create a common folder for these go utilities? Like cmds, tools, ... ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think we could remove the nested vgpu/src directory and instead just have vgpu contain the go source files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I am open to doing this in a follow-up PR

Comment on lines +73 to +76
# Download the nvidia-driver-assistant to get the latest supported-gpus.json file
RUN apt-get update && \
apt-get install -y --no-install-recommends nvidia-driver-assistant && \
rm -rf /var/lib/apt/lists/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should download nvidia-driver-assistant in the build step, and only copy over the supported-gpus.json file to the final image.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this means that we would have to add the cuda repo during that stage as well. Since we already download the fabricmanager and nscq packages from this state, I thought adding the nvidia-driver-assistant would be a simple change. Moreover, the nvidia-driver-assistant is a small package.

$ apt show nvidia-driver-assistant
Package: nvidia-driver-assistant
Version: 0.9.57.01-1
Priority: optional
Section: admin
Source: driver-assistant
Maintainer: Alberto Milone <[email protected]>
Installed-Size: 979 kB
Pre-Depends: dpkg (>= 1.15.7.2)
Depends: python3:any (>= 3.2~), python3-apt
Download-Size: 40.5 kB
APT-Manual-Installed: yes
APT-Sources: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages
Description: Detect and install the best NVIDIA driver packages for the system
 This piece of software is meant to help users deciding on which NVIDIA graphics
 driver to install, based on the detected system's hardware.

Copy link
Contributor Author

@tariq1890 tariq1890 Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like 892kB/979kB is taken up by the supported-gpus.json file.

$ ls -hal /usr/share/nvidia-driver-assistant/supported-gpus/supported-gpus*.json
-rw-r--r-- 1 root root 354K Oct 10 18:09 /usr/share/nvidia-driver-assistant/supported-gpus/supported-gpus.json
-rw-r--r-- 1 root root 538K Oct 10 18:09 /usr/share/nvidia-driver-assistant/supported-gpus/supported-gpus-mod.json

I don't think it's worth the added complexity to optimise for a couple hundreds of kilobytes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is the dependency on python.

Comment on lines +489 to +490
echo "cannot autodetect the kernel module type, please check /var/log/gpu-driver-util.log for more details..."
return 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed on slack, should we cat the log file here before returning?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants