Bug Report
Description
Hi all,
I am sure I am doing something stupid but figured I would open this in case it is a bug (or you can educate me).
I have been playing with Talos in the lab, using Terraform to provision it on a small Proxmox cluster. All the VMs get created fine and I can bootstrap the first control plane node, but all nodes then fail to connect to discovery.talos.dev. The logs fill with...
2025-01-24T18:04:21.623Z ERROR hello failed {"component": "controller-runtime", "controller": "cluster.DiscoveryServiceController", "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: authentication handshake failed: context deadline exceeded\"", "endpoint": "discovery.talos.dev:443"}
At this point the cluster sort of builds, but my call to talosctl health results in nodes only being able to see themselves.
talosctl health -n k8s-cp-1
discovered nodes: ["192.168.2.81"]
waiting for etcd to be healthy: ...
waiting for etcd to be healthy: OK
waiting for etcd members to be consistent across nodes: ...
waiting for etcd members to be consistent across nodes: OK
waiting for etcd members to be control plane nodes: ...
waiting for etcd members to be control plane nodes: etcd member ips ["192.168.2.82" "192.168.2.83" "192.168.2.81"] are not subset of control plane node ips ["192.168.2.81" "2001:4d48:ad5e:e02:b8d0:ff:fe01:1"]
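For reference, the discovery state each node holds can be inspected with the standard talosctl resources, something like this (a sketch, reusing the node IP from above rather than a command transcript from the cluster):
talosctl -n 192.168.2.81 get members
talosctl -n 192.168.2.81 get affiliates
On a healthy cluster both should list every node in the cluster, which lines up with the "discovered nodes" line above only showing the node itself.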
Interestingly, Kubernetes does seem to form the cluster though:
kg nodes
NAME       STATUS   ROLES           AGE   VERSION
k8s-cp-1   Ready    control-plane   19m   v1.32.1
k8s-cp-2   Ready    control-plane   17m   v1.32.1
k8s-cp-3   Ready    control-plane   18m   v1.32.1
k8s-wk-1   Ready    <none>          18m   v1.32.1
k8s-wk-2   Ready    <none>          19m   v1.32.1
There are no firewall rules preventing outbound connectivity to discovery.talos.dev:443, and a curl from a machine on the same network works fine. I have checked all the obvious things I can think of, such as NTP, network reachability, DNS, etc.
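The check from the neighbouring machine was along these lines (a sketch rather than the literal command):
curl -sv https://discovery.talos.dev/ -o /dev/null
If something in the path were intercepting TLS, a check like the following should make it visible by printing the certificate chain actually presented:
openssl s_client -connect discovery.talos.dev:443 -servername discovery.talos.dev </dev/null | head -20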
If I downgrade to Kubernetes 1.31 and use the kubernetes discovery registry instead, I can get the cluster to build.
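For clarity, by "use the kubernetes registry" I mean flipping the discovery registries in the machine config, roughly like this (a sketch of just the relevant keys):
cluster:
  discovery:
    enabled: true
    registries:
      service:
        disabled: true      # the hosted discovery.talos.dev registry
      kubernetes:
        disabled: false     # use the Kubernetes-based registry instead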
In case it matters, I am trying to build with Cilium as an inline manifest.
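By "Cilium as an inline manifest" I mean the usual pattern of disabling the built-in CNI and embedding the rendered Cilium manifests in the machine config, roughly (a heavily trimmed sketch; the real contents block is the full helm template output):
cluster:
  network:
    cni:
      name: none            # let Cilium take over as the CNI
  inlineManifests:
    - name: cilium
      contents: |
        # rendered output of `helm template cilium cilium/cilium ...` goes here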
Anyway, let me know if you need any more info or want me to run something, but I am at a bit of a loss.
Thanks
iain
Logs
support.zip
Environment
talosctl version --nodes <problematic nodes>
talosctl version --nodes 192.168.2.81
Client:
Tag: v1.8.3
SHA: 6494ace
Built:
Go version: go1.22.9
OS/Arch: linux/amd64
Server:
NODE: 192.168.2.81
Tag: v1.9.2
SHA: 09758b3
Built:
Go version: go1.23.4
OS/Arch: linux/amd64
Enabled: RBAC
Kubernetes version: [kubectl version --short]
kubectl version
Client Version: v1.32.0
Kustomize Version: v5.5.0
Server Version: v1.32.1
Platform:
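Just to add to this: I booted the cluster using the kubernetes registry and Kubernetes version 1.31.5, then edited the machine config for each node and rebooted one of them (so far). The node that rebooted still prints the "transport: authentication handshake failed: context deadline exceeded" error. "talosctl get affiliates --namespace=cluster-raw" also only has entries from the k8s service; nothing appears from the discovery service. My machine config looks like this anyway.
One other thing that I thought might be relevant: the cluster is using an image generated by https://factory.talos.dev/. It has these additional options added to it.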