crictl info does not match containerd.toml #11631

Closed
santurini opened this issue Jan 20, 2025 · 8 comments

@santurini

I am experiencing an issue with the NVIDIA device plugin on my k3s cluster after enabling MPS mode. Since switching from the GPU Operator to the standalone nvidia-device-plugin, my pods lose access to the GPUs every time a systemctl daemon-reload is triggered (NVIDIA/nvidia-container-toolkit#48).

As the official issue suggests, a workaround is to edit /etc/containerd/config.toml and set SystemdCgroup = false.

Even after doing this, when I run crictl info, this is the configuration I see for the nvidia runtime:

  "config":
      ...
      "runtimes": {
        "nvidia": {
          ...
          "options": {
            "BinaryName": "/usr/local/nvidia/toolkit/nvidia-container-runtime",
            "SystemdCgroup": true
          },
        },

While this is my containerd configuration:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
  BinaryName = "/usr/bin/nvidia-container-runtime"
  CriuImagePath = ""
  CriuPath = ""
  CriuWorkPath = ""
  IoGid = 0
  IoUid = 0
  NoNewKeyring = false
  NoPivotRoot = false
  Root = ""
  ShimCgroup = ""
  SystemdCgroup = false

How can I change the configuration reported by crictl info so that SystemdCgroup is set to false?

@brandond
Member

Please read the docs: https://docs.k3s.io/advanced#configuring-containerd

K3s does not use /etc/containerd/config.toml.
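
For reference, the generated config and any custom template live under the k3s data directory; a minimal sketch, assuming the default /var/lib/rancher/k3s data dir, of starting a template from the generated file:

# k3s writes its generated containerd config here (default data dir assumed):
#   /var/lib/rancher/k3s/agent/etc/containerd/config.toml
# To customize it, copy the generated file alongside it as config.toml.tmpl and edit that copy:
cp /var/lib/rancher/k3s/agent/etc/containerd/config.toml \
   /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl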

@santurini
Author

Thank you @brandond! Once I create the config.toml.tmpl, does it refresh the configuration on its own, or do I have to recreate the cluster?

And last question: should I see the changes reflected when running crictl info?

@brandond
Member

Just restart the service. And yes.
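
A minimal sketch, assuming a systemd-managed install (use the k3s-agent unit on agent-only nodes):

sudo systemctl restart k3s            # or: sudo systemctl restart k3s-agent
crictl info | grep -A 10 '"nvidia"'   # the regenerated nvidia runtime options should show up here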

@santurini
Author

santurini commented Jan 21, 2025

@brandond Sorry to bother you again, but I'm not familiar with Go templates. How should my config.toml.tmpl look if I just want to change SystemdCgroup = true to false in this section of config.toml:

# File generated by k3s. DO NOT EDIT. Use config.toml.tmpl instead.
version = 2
...
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."nvidia".options]
  BinaryName = "/usr/local/nvidia/toolkit/nvidia-container-runtime"
  SystemdCgroup = true

Do I just copy and paste this configuration and edit the line?

@brandond
Member

Why are you trying to change that? SystemdCgroup should be set to true if k3s is running under systemd; if k3s is not running under systemd, it will automatically be set to false.
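
A quick way to check which case applies, assuming a standard systemd install:

systemctl is-active k3s   # prints "active" if k3s is running as a systemd unit
                          # (check k3s-agent instead on agent-only nodes)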

@santurini
Author

Because I am using NVIDIA MPS in the cluster, I am not using the GPU Operator but the standalone nvidia-device-plugin, which has this known issue. One of the workarounds is to set SystemdCgroup to false in the nvidia runtime options.

@santurini
Author

Using this config.toml.tmpl actually solved the error. Is this OK?

# File generated by k3s. DO NOT EDIT. Use config.toml.tmpl instead.
version = 2

[plugins."io.containerd.internal.v1.opt"]
  path = "{{ .NodeConfig.Containerd.Opt }}"

[plugins."io.containerd.grpc.v1.cri"]
  stream_server_address = "127.0.0.1"
  stream_server_port = "10010"
  enable_selinux = {{ .NodeConfig.SELinux }}
  enable_unprivileged_ports = {{ .EnableUnprivileged }}
  enable_unprivileged_icmp = {{ .EnableUnprivileged }}

{{- if .NodeConfig.AgentConfig.PauseImage }}
  sandbox_image = "{{ .NodeConfig.AgentConfig.PauseImage }}"
{{end}}

{{- if .NodeConfig.AgentConfig.Snapshotter }}
[plugins."io.containerd.grpc.v1.cri".containerd]
  snapshotter = "{{ .NodeConfig.AgentConfig.Snapshotter }}"
  disable_snapshot_annotations = {{ if eq .NodeConfig.AgentConfig.Snapshotter "stargz" }}false{{else}}true{{end}}
  {{ if .NodeConfig.DefaultRuntime }}default_runtime_name = "{{ .NodeConfig.DefaultRuntime }}"{{end}}
{{ if eq .NodeConfig.AgentConfig.Snapshotter "stargz" }}
{{ if .NodeConfig.AgentConfig.ImageServiceSocket }}
[plugins."io.containerd.snapshotter.v1.stargz"]
cri_keychain_image_service_path = "{{ .NodeConfig.AgentConfig.ImageServiceSocket }}"
[plugins."io.containerd.snapshotter.v1.stargz".cri_keychain]
enable_keychain = true
{{end}}

[plugins."io.containerd.snapshotter.v1.stargz".registry]
  config_path = "{{ .NodeConfig.Containerd.Registry }}"

{{ if .PrivateRegistryConfig }}
{{range $k, $v := .PrivateRegistryConfig.Configs }}
{{ if $v.Auth }}
[plugins."io.containerd.snapshotter.v1.stargz".registry.configs."{{$k}}".auth]
  {{ if $v.Auth.Username }}username = {{ printf "%q" $v.Auth.Username }}{{end}}
  {{ if $v.Auth.Password }}password = {{ printf "%q" $v.Auth.Password }}{{end}}
  {{ if $v.Auth.Auth }}auth = {{ printf "%q" $v.Auth.Auth }}{{end}}
  {{ if $v.Auth.IdentityToken }}identitytoken = {{ printf "%q" $v.Auth.IdentityToken }}{{end}}
{{end}}
{{end}}
{{end}}
{{end}}
{{end}}

{{- if not .NodeConfig.NoFlannel }}
[plugins."io.containerd.grpc.v1.cri".cni]
  bin_dir = "{{ .NodeConfig.AgentConfig.CNIBinDir }}"
  conf_dir = "{{ .NodeConfig.AgentConfig.CNIConfDir }}"
{{end}}

{{- if or .NodeConfig.Containerd.BlockIOConfig .NodeConfig.Containerd.RDTConfig }}
[plugins."io.containerd.service.v1.tasks-service"]
  {{ if .NodeConfig.Containerd.BlockIOConfig }}blockio_config_file = "{{ .NodeConfig.Containerd.BlockIOConfig }}"{{end}}
  {{ if .NodeConfig.Containerd.RDTConfig }}rdt_config_file = "{{ .NodeConfig.Containerd.RDTConfig }}"{{end}}
{{end}}

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = {{ .SystemdCgroup }}

[plugins."io.containerd.grpc.v1.cri".registry]
  config_path = "{{ .NodeConfig.Containerd.Registry }}"

{{ if .PrivateRegistryConfig }}
{{range $k, $v := .PrivateRegistryConfig.Configs }}
{{ if $v.Auth }}
[plugins."io.containerd.grpc.v1.cri".registry.configs."{{$k}}".auth]
  {{ if $v.Auth.Username }}username = {{ printf "%q" $v.Auth.Username }}{{end}}
  {{ if $v.Auth.Password }}password = {{ printf "%q" $v.Auth.Password }}{{end}}
  {{ if $v.Auth.Auth }}auth = {{ printf "%q" $v.Auth.Auth }}{{end}}
  {{ if $v.Auth.IdentityToken }}identitytoken = {{ printf "%q" $v.Auth.IdentityToken }}{{end}}
{{end}}
{{end}}
{{end}}

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."nvidia"]
  runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."nvidia".options]
  BinaryName = "/usr/local/nvidia/toolkit/nvidia-container-runtime"
  SystemdCgroup = false
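
A quick way to confirm the override took effect after restarting k3s (the jq path below is inferred from the crictl info excerpt earlier in this thread; adjust it if your output nests the runtimes differently):

crictl info | jq '.config.containerd.runtimes.nvidia.options.SystemdCgroup'
# expected output once the template is in effect:
# false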

@brandond
Member
