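Hello there, I found a fatal error in nvidia-ctk.
After running the config command as the README says, I get the config file shown in the example.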
Then I restart my crio service.
systemctl restart crio
Everything looks fine at first.
But when I restart some Deployments or delete some pods in k8s, the pods get stuck in Terminating.
NAME READY STATUS RESTARTS AGE
...
backend-6b7945bc64-jqwl7 1/1 Running 0 19m
backend-ccfff5ccc-ktngm 1/1 Terminating 0 22m <---------- After 19 minutes still running
...
After a long time, I finally found that this config causes the problem:
...
[crio.runtime]
default_runtime = "nvidia"
...
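For reference, here is a minimal sketch of the fuller shape this section usually takes, assuming the layout that the nvidia-ctk config command typically generates. In CRI-O, the value of default_runtime is looked up by name under [crio.runtime.runtimes]. The runtime_path and runtime_type values below are assumptions for illustration and are not copied from the file on the affected host.

```toml
[crio.runtime]
# Name of the runtime entry that CRI-O uses when a pod does not request one.
default_runtime = "nvidia"

# Entry that the name above refers to.
# ASSUMPTION: path and type follow the usual toolkit install; verify on your host.
[crio.runtime.runtimes.nvidia]
runtime_path = "/usr/bin/nvidia-container-runtime"
runtime_type = "oci"
```

Comparing this with the output of `crio config` on the node may help confirm which runtime entry is actually in effect.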
As far as I can tell, this config changes the crio default user to "nvidia" instead of root, so permissions block every action that crio tries to perform.
After deleting this config, crio returns to normal; however, new containers can no longer use the nvidia plugin.
Therefore, I have these questions:
Why is the nvidia user necessary?
Why can't the root user use the nvidia driver in a container?
Is there any other way to set up the crio config so that it works correctly?
Any suggestion you can provide would be really helpful.
Thank you very much! <3