A standalone Docker container running in an Azure Linux 2.0 VM with the NVIDIA Container Toolkit installed loses access to the GPU and throws the error "Failed to initialize NVML: Unknown Error" after the container has been running for a while. The symptom is similar to the one described in known issue #48 and can be reproduced by running `systemctl daemon-reload`.
The issue does not show up if I explicitly set `--device=` for each NVIDIA device node on the system in the docker command. However, this is not a sustainable solution, since the number of NVIDIA device nodes on the system may change with the configuration. Is there a better way to let the container automatically access all the NVIDIA devices without explicitly setting `--device=` for each device node? A sketch of the workaround I'm using today is below.
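For illustration, here is a minimal sketch of the explicit-device workaround, assuming a single-GPU VM and a generic CUDA base image; the image name and the device list are placeholders, and the exact `/dev/nvidia*` nodes vary with the driver version and GPU count:

```bash
# Workaround sketch: pass every NVIDIA device node explicitly so the container
# retains its device access after a systemd daemon-reload.
# The device list below is an example for a single-GPU VM; enumerate
# /dev/nvidia* on the host to see which nodes actually exist.
docker run -d --gpus all \
  --device=/dev/nvidiactl \
  --device=/dev/nvidia-uvm \
  --device=/dev/nvidia-uvm-tools \
  --device=/dev/nvidia0 \
  nvidia/cuda:12.2.0-base-ubuntu22.04 \
  sleep infinity
```

Even if that `--device` list is generated with a shell glob over `/dev/nvidia*` at launch time, the node list is still baked into the run command, which is the part I'd like to avoid.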