-
I've got a K3S cluster with 3 masters and 3 workers that's been running just fine for the past several months. However, my first etcd master node (node1) is spamming my journalctl log multiple times a second with this pair of similar entries:
The same error does not occur on the other two masters. Other than the spamming, everything appears to be fine. I can use etcdctl to change the leader to/from any of the 3 nodes. Bringing down any one node has no ill effect on the overall cluster.

I've tried deleting and rejoining the node from the cluster multiple times. I've re-imaged the node (but kept the same computer name). I've tried compacting and defragging the database multiple times (a typical command sequence is sketched after the environmental info below). I've removed two of the three masters and re-joined them. I've tried backing up and restoring the etcd database while initiating a cluster-reset. But no matter what I do, as soon as I bring node1 back online, the errors start spamming on that node.

The only thing that changes in the error message is the local-member-id. The interesting thing is that the IDs for remote-peer-cluster-id, remote-peer-server-name, and local-member-cluster-id never change, even though those nodes have been removed/rejoined several times and have new IDs. It seems as though there is some stale info in the database that I have no idea how to get rid of.

Again, everything else seems fine, except for the log spam on node1. How can I clean up the etcd database to get rid of these old entries (assuming that's the issue here)?

Node endpoint status:
Environmental Info:
- k3s version: v1.27.5+k3s1 (k3s-io/k3s@8d074ec)
- Node(s) CPU architecture, OS, and Version: Linux node1 5.15.0-84-generic #93-Ubuntu SMP Tue Sep 5 17:16:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
- Cluster Configuration: 3 servers
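For reference, here is a minimal sketch of what the compaction/defrag steps mentioned above typically look like. The endpoint and certificate paths are the usual k3s defaults and are assumptions on my part; adjust them for your install, and note this is illustrative rather than the exact commands run here.

```bash
# Sketch: compact etcd to its current revision, then defragment all members.
# Cert paths below are the usual k3s defaults (an assumption); requires
# etcdctl and jq on the server node.
export ETCDCTL_API=3
ETCD=(etcdctl
  --endpoints=https://127.0.0.1:2379
  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
  --cert=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt
  --key=/var/lib/rancher/k3s/server/tls/etcd/server-client.key)

# Current revision of the local endpoint.
rev=$("${ETCD[@]}" endpoint status --write-out=json | jq -r '.[0].Status.header.revision')

# Compact the keyspace up to that revision, then defragment every member.
"${ETCD[@]}" compaction "$rev"
"${ETCD[@]}" defrag --cluster
```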
-
Please open this in the k3s repo!
-
Could you generate a report using etcd-diagnosis? Example command:
The new warning log makes a lot of sense. It means an unknown etcd instance is trying to connect to a member (89a79bc17f234c01) of the cluster (398dad8ab81b9249). You need to find the unknown etcd instance and either shut it down or correct its configuration.
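If it helps, here is a rough sketch of how to compare the IDs in the warning against what the cluster actually knows about. The endpoint and cert paths are the usual k3s defaults and are assumptions; adjust them for your setup.

```bash
# Sketch: list the member IDs etcd currently knows about and the live
# endpoint status, then compare them with the local-member-id and
# remote-peer-* IDs printed in node1's warning. A peer ID that appears
# in the log but not in this output points at the stale/unknown instance.
ETCD=(etcdctl
  --endpoints=https://127.0.0.1:2379
  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
  --cert=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt
  --key=/var/lib/rancher/k3s/server/tls/etcd/server-client.key)

"${ETCD[@]}" member list -w table                 # member IDs, names, peer URLs
"${ETCD[@]}" endpoint status --cluster -w table   # per-member IDs, leader, DB size
```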