After adding the meta config v, the pod restarts and the metad pod reports that leader election failed #431

Closed
jinyingsunny opened this issue Jan 31, 2024 · 3 comments
Labels:
- affects/master (PR/issue: this bug affects master version)
- process/done (Process of bug)
- severity/blocker (Severity of bug)
- type/bug (Type: something is unexpected)

jinyingsunny commented Jan 31, 2024

As the title describes.

root@k8s-master:/home/sunny.liu/k8s_file# kubectl -n nebula get pod
NAME                                READY   STATUS              RESTARTS   AGE
nebulav-console                     1/1     Running             0          41m
nebulav-exporter-5cdf9575dc-qfkcz   1/1     Running             0          42m
nebulav-graphd-0                    1/1     Running             0          9m1s
nebulav-graphd-1                    1/1     Running             0          9m22s
nebulav-graphd-2                    1/1     Running             0          9m43s
nebulav-graphd-3                    1/1     Running             0          10m
nebulav-graphd-4                    1/1     Running             0          10m
nebulav-graphd-5                    1/1     Running             0          10m
nebulav-metad-0                     1/1     Running             0          42m
nebulav-metad-1                     1/1     Running             0          42m
nebulav-metad-2                     0/1     ContainerCreating   0          6m38s
nebulav-storaged-0                  1/1     Running             0          42m
nebulav-storaged-1                  1/1     Running             0          42m
nebulav-storaged-2                  1/1     Running             0          42m
root@k8s-master:/home/sunny.liu/k8s_file# kubectl -n nebula describe pod nebulav-metad-2
Name:             nebulav-metad-2
Namespace:        nebula
Priority:         0
Service Account:  nebula-sa
Node:             liuxue/192.168.8.238
Start Time:       Wed, 31 Jan 2024 19:11:33 +0800
Labels:           app.kubernetes.io/cluster=nebulav
                  app.kubernetes.io/component=metad
                  app.kubernetes.io/managed-by=nebula-operator
                  app.kubernetes.io/name=nebula-graph
                  controller-revision-hash=nebulav-metad-9bbb8945
                  statefulset.kubernetes.io/pod-name=nebulav-metad-2
Annotations:      nebula-graph.io/cm-hash: caad35a3716230f1
Status:           Pending
IP:
IPs:              <none>
Controlled By:    StatefulSet/nebulav-metad
Containers:
  metad:
    Container ID:
    Image:         reg.vesoft-inc.com/rc/nebula-metad-ent:v3.5.0
    Image ID:
    Ports:         9559/TCP, 19559/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      /bin/sh
      -ecx
      exec /usr/local/nebula/bin/nebula-metad --flagfile=/usr/local/nebula/etc/nebula-metad.conf --meta_server_addrs=nebulav-metad-0.nebulav-metad-headless.nebula.svc.cluster.local:9559,nebulav-metad-1.nebulav-metad-headless.nebula.svc.cluster.local:9559,nebulav-metad-2.nebulav-metad-headless.nebula.svc.cluster.local:9559 --local_ip=$(hostname).nebulav-metad-headless.nebula.svc.cluster.local --daemonize=false --license_manager_url=192.168.8.53:9119
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:      300m
      memory:   500Mi
    Readiness:  http-get http://:19559/status delay=10s timeout=5s period=10s #success=1 #failure=3
    Environment:
      MY_IP:       (v1:status.podIP)
      HTTP_PORT:  19559
      POST_DATA:  {"v":"2"}
      SCRIPT:
                  set -x

                  while :
                  do
                    curl -i -X PUT -H "Content-Type: application/json" -d ${POST_DATA} -s "http://${MY_IP}:${HTTP_PORT}/flags"
                    if [ $? -eq 0 ]
                    then
                      break
                    fi
                    sleep 1
                  done


    Mounts:
      /usr/local/nebula/data from metad-data (rw,path="data")
      /usr/local/nebula/etc/nebula-metad.conf from nebulav-metad (rw,path="nebula-metad.conf")
      /usr/local/nebula/logs from metad-log (rw,path="logs")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2qzwd (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  metad-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  metad-data-nebulav-metad-2
    ReadOnly:   false
  metad-log:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  metad-log-nebulav-metad-2
    ReadOnly:   false
  nebulav-metad:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      nebulav-metad
    Optional:  false
  kube-api-access-2qzwd:
    Type:                     Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:   3607
    ConfigMapName:            kube-root-ca.crt
    ConfigMapOptional:        <nil>
    DownwardAPI:              true
QoS Class:                    Burstable
Node-Selectors:               nebula=cloud
Tolerations:                  node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints:  topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector app.kubernetes.io/cluster=nebulav,app.kubernetes.io/component=metad,app.kubernetes.io/managed-by=nebula-operator,app.kubernetes.io/name=nebula-graph
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  108s  default-scheduler  Successfully assigned nebula/nebulav-metad-2 to liuxue
  Normal  Pulling    108s  kubelet            Pulling image "reg.vesoft-inc.com/rc/nebula-metad-ent:v3.5.0"
  Normal  Pulled     107s  kubelet            Successfully pulled image "reg.vesoft-inc.com/rc/nebula-metad-ent:v3.5.0" in 1.041233625s (1.041247286s including waiting)
  Normal  Created    107s  kubelet            Created container metad
  Normal  Started    107s  kubelet            Started container metad
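
For readers unfamiliar with the `SCRIPT` environment variable in the describe output above: it keeps sending a PUT to the pod's `/flags` endpoint until `curl` exits with 0. Below is a minimal Python sketch of the same retry pattern. `put_flags`, `retry_until_sent`, and the `max_attempts` parameter are hypothetical names introduced here for illustration; they are not part of nebula-operator.

```python
import json
import time
import urllib.error
import urllib.request


def put_flags(url, flags, timeout=5.0):
    """Send one PUT with a JSON body; return True if the request completed."""
    req = urllib.request.Request(
        url,
        data=json.dumps(flags).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # curl without --fail exits 0 on HTTP error statuses,
        # so the shell loop would also stop in this case.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout: retry later.
        return False


def retry_until_sent(send, delay=1.0, sleep=time.sleep, max_attempts=None):
    """Call send() until it returns True; return the number of attempts.

    max_attempts is an addition for illustration; the shell loop has no bound.
    """
    attempts = 0
    while True:
        attempts += 1
        if send():
            return attempts
        if max_attempts is not None and attempts >= max_attempts:
            raise TimeoutError(f"gave up after {attempts} attempts")
        sleep(delay)
```

Usage would look like `retry_until_sent(lambda: put_flags("http://<pod-ip>:19559/flags", {"v": "2"}))`, where the port and body come from the `HTTP_PORT` and `POST_DATA` values shown above.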

The meta.INFO log on the corresponding node is below. The apparent symptom is that a leader cannot be elected:

I20240131 10:35:45.438298     1 MetaDaemon.cpp:162] localhost = "nebulav-metad-2.nebulav-metad-headless.nebula.svc.cluster.local":9559
I20240131 10:35:45.438529     1 PartManager.h:303] handler_ is null 1
I20240131 10:35:45.440829     1 NebulaStore.cpp:96] Start the raft service...
I20240131 10:35:45.441242     1 NebulaSnapshotManager.cpp:25] Send snapshot is rate limited to 10485760 for each part by default
I20240131 10:35:45.487561     1 RaftexService.cpp:48] Start raft service on 9560
I20240131 10:35:45.487679     1 NebulaStore.cpp:108] Register handler...
I20240131 10:35:45.487689     1 NebulaStore.cpp:136] Scan the local path, and init the spaces_
E20240131 10:35:45.487705     1 FileUtils.cpp:405] Failed to read the directory "/usr/local/nebula/data/meta/nebula" (2): No such file or directory
I20240131 10:35:45.487864     1 NebulaStore.cpp:311] Init data from partManager for "nebulav-metad-2.nebulav-metad-headless.nebula.svc.cluster.local":9559
I20240131 10:35:45.487880     1 NebulaStore.cpp:462] Create data space 0
W20240131 10:35:45.487942     1 DiskManager.cpp:176] Disk path /usr/local/nebula/data/meta does not exist!
I20240131 10:35:45.518112     1 RocksEngine.cpp:117] open rocksdb on /usr/local/nebula/data/meta/nebula/0/0/data
I20240131 10:35:45.528494     1 NebulaStore.cpp:536] Space 0, part 0 has been added, asLearner 0
I20240131 10:35:45.528522     1 MetaDaemonInit.cpp:169] Waiting for the leader elected...
I20240131 10:35:45.528529     1 MetaDaemonInit.cpp:181] Leader has not been elected, sleep 1s
I20240131 10:35:46.528605     1 MetaDaemonInit.cpp:181] Leader has not been elected, sleep 1s
I20240131 10:35:47.528707     1 MetaDaemonInit.cpp:181] Leader has not been elected, sleep 1s
I20240131 10:35:48.528827     1 KVBasedClusterIdMan.h:118] There is no clusterId existed in kvstore!
I20240131 10:35:48.528856     1 MetaDaemonInit.cpp:204] I am follower, wait for the leader's clusterId
I20240131 10:35:48.528861     1 MetaDaemonInit.cpp:206] Waiting for the leader's clusterId
I20240131 10:35:49.529033     1 KVBasedClusterIdMan.h:118] There is no clusterId existed in kvstore!
I20240131 10:35:49.529072     1 MetaDaemonInit.cpp:206] Waiting for the leader's clusterId
I20240131 10:35:50.529165     1 KVBasedClusterIdMan.h:118] There is no clusterId existed in kvstore!
I20240131 10:35:50.529189     1 MetaDaemonInit.cpp:206] Waiting for the leader's clusterId
I20240131 10:35:51.529305     1 KVBasedClusterIdMan.h:118] There is no clusterId existed in kvstore!
I20240131 10:35:51.529332     1 MetaDaemonInit.cpp:206] Waiting for the leader's clusterId
I20240131 10:35:52.529660     1 RocksEngine.cpp:555] Target checkpoint data path : /usr/local/nebula/data/meta/nebula/0/0/checkpoints/META_UPGRADE_SNAPSHOT_2024_01_31_10_35_52/data
I20240131 10:35:52.547821     1 RocksEngine.cpp:555] Target checkpoint data path : /usr/local/nebula/data/meta/nebula/0/0/checkpoints/META_UPGRADE_SNAPSHOT_2024_01_31_10_35_52/data
I20240131 10:35:52.564363     1 MetaDaemonInit.cpp:249] Nebula store init succeeded, clusterId 903579623150608002
I20240131 10:35:52.564379     1 MetaDaemon.cpp:198] Start http service
I20240131 10:35:52.564558     1 MetaDaemon.cpp:205] [License Manager] Connect to license manager, address: 192.168.8.53:9119
I20240131 10:35:52.564610     1 LicenseManagerConnector.cpp:56] [License Manager] product id: 903579623150608002
I20240131 10:35:52.566996     1 LicenseManagerConnector.cpp:64] [License Manager] Get license manager ID successfully, LMId: HK3V-HKZM
I20240131 10:35:52.567116     1 LicenseManagerConnector.cpp:90] [License Manager] Initialization succeeded, LMId: HK3V-HKZM
I20240131 10:35:52.567126     1 MetaDaemonInit.cpp:341] Starting Meta HTTP Service
I20240131 10:35:52.567274    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498522, storageQuota: 2498014, lastUpdateTime: 1706697338
I20240131 10:35:52.568538    99 WebService.cpp:130] Web service started on HTTP[19559]
I20240131 10:35:52.568570     1 MetaDaemonInit.cpp:307] Check root user
I20240131 10:35:52.568610     1 RootUserMan.h:35] God user exists
W20240131 10:35:52.571058     1 MetaDaemon.cpp:266] Black box is disabled.
I20240131 10:35:52.571254     1 MetaDaemon.cpp:271] The meta daemon start on "nebulav-metad-2.nebulav-metad-headless.nebula.svc.cluster.local":9559
I20240131 10:35:52.571281     1 JobManager.cpp:88] Not leader, skip reading remaining jobs
I20240131 10:35:52.571346     1 JobManager.cpp:64] JobManager initialized
I20240131 10:35:52.571780   105 JobManager.cpp:150] JobManager::scheduleThread enter
I20240131 10:36:02.525667   113 HBProcessor.cpp:36] Receive heartbeat from "nebulav-storaged-2.nebulav-storaged-headless.nebula.svc.cluster.local":9779, role = STORAGE
I20240131 10:36:02.525732   113 HBProcessor.cpp:53] Machine "nebulav-storaged-2.nebulav-storaged-headless.nebula.svc.cluster.local":9779 is not registered
I20240131 10:36:07.969594   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 10:36:14.897624   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 10:36:15.068790   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 10:36:18.422425   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 10:36:53.475281    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706697395
I20240131 10:37:56.395320    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706697457
I20240131 10:38:55.197206    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706697513
I20240131 10:39:51.147907    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706697576
I20240131 10:40:46.311695    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706697635
I20240131 10:41:42.991394    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706697694
I20240131 10:42:40.737295    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706697755
I20240131 10:43:36.475239    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706697814
I20240131 10:44:36.513720    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706697872
I20240131 10:45:34.431366    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706697931
I20240131 10:46:31.889339    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706697931
I20240131 10:47:33.195380    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706698050
I20240131 10:48:32.403376    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706698106
I20240131 10:49:35.887533    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706698161
I20240131 10:50:38.515333    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706698221
I20240131 10:51:42.513160    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706698286
I20240131 10:52:39.123322    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706698345
I20240131 10:53:36.371359    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706698409
I20240131 10:54:05.832768   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 10:54:35.419584    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706698467
I20240131 10:55:33.987351    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706698522
I20240131 10:56:31.320688    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706698581
I20240131 10:56:47.036199   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 10:56:49.430006   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 10:57:35.533393    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706698640
I20240131 10:58:35.799302    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706698698
I20240131 10:59:37.144786    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706698754
I20240131 11:00:41.303352    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498162, storageQuota: 2497642, lastUpdateTime: 1706698810
I20240131 11:01:46.103931    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498162, storageQuota: 2497642, lastUpdateTime: 1706698869
I20240131 11:02:43.919332    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498162, storageQuota: 2497642, lastUpdateTime: 1706698927
I20240131 11:03:45.978574    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498162, storageQuota: 2497642, lastUpdateTime: 1706698985
I20240131 11:04:48.913203    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498162, storageQuota: 2497642, lastUpdateTime: 1706699042
I20240131 11:04:59.628901   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:05:00.528554   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:05:01.429610   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:05:10.367007   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:05:10.493301   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:05:30.678907   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:05:46.686457   113 HBProcessor.cpp:36] Receive heartbeat from "nebulav-graphd-3.nebulav-graphd-headless.nebula.svc.cluster.local":9669, role = GRAPH
I20240131 11:05:46.686609   113 ActiveHostsMan.cpp:125] Failed to get machines, error E_LEADER_CHANGED
E20240131 11:05:46.686625   113 HBProcessor.cpp:99] [License Manager] Resource usage check failed, reject heartbeat from "nebulav-graphd-3.nebulav-graphd-headless.nebula.svc.cluster.local":9669, error: E_LEADER_CHANGED
I20240131 11:05:51.703392    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498162, storageQuota: 2497642, lastUpdateTime: 1706699099
I20240131 11:06:01.134460   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:06:03.088264   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:06:08.469533   113 HBProcessor.cpp:36] Receive heartbeat from "nebulav-graphd-2.nebulav-graphd-headless.nebula.svc.cluster.local":9669, role = GRAPH
I20240131 11:06:08.469574   113 ActiveHostsMan.cpp:125] Failed to get machines, error E_LEADER_CHANGED
E20240131 11:06:08.469589   113 HBProcessor.cpp:99] [License Manager] Resource usage check failed, reject heartbeat from "nebulav-graphd-2.nebulav-graphd-headless.nebula.svc.cluster.local":9669, error: E_LEADER_CHANGED
I20240131 11:06:11.462049   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:06:26.879314   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:06:46.847257   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:06:56.119325    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498162, storageQuota: 2497642, lastUpdateTime: 1706699158
I20240131 11:07:02.079025   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:07:08.991068   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:07:16.190618   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:07:19.790959   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:07:26.089530   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:07:34.271867   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:07:45.090745   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:07:54.407570   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:08:00.297727    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706699223
I20240131 11:08:06.499989   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:08:06.658623   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:08:27.415412   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:08:27.990500   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:08:28.889681   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:08:34.924608   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:08:45.103807   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:08:48.835413   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:08:48.969771   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:08:51.229332   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:08:53.425906   113 HBProcessor.cpp:36] Receive heartbeat from "nebulav-graphd-1.nebulav-graphd-headless.nebula.svc.cluster.local":9669, role = GRAPH
I20240131 11:08:53.426008   113 ActiveHostsMan.cpp:125] Failed to get machines, error E_LEADER_CHANGED
E20240131 11:08:53.426026   113 HBProcessor.cpp:99] [License Manager] Resource usage check failed, reject heartbeat from "nebulav-graphd-1.nebulav-graphd-headless.nebula.svc.cluster.local":9669, error: E_LEADER_CHANGED
I20240131 11:08:55.217540   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:08:55.446993   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:08:55.809818   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:08:56.709738   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:08:58.020574    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706699285
I20240131 11:09:14.702630   113 HBProcessor.cpp:36] Receive heartbeat from "nebulav-graphd-0.nebulav-graphd-headless.nebula.svc.cluster.local":9669, role = GRAPH
I20240131 11:09:14.702688   113 ActiveHostsMan.cpp:125] Failed to get machines, error E_LEADER_CHANGED
E20240131 11:09:14.702706   113 HBProcessor.cpp:99] [License Manager] Resource usage check failed, reject heartbeat from "nebulav-graphd-0.nebulav-graphd-headless.nebula.svc.cluster.local":9669, error: E_LEADER_CHANGED
I20240131 11:09:44.061359   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:09:54.860700   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED
I20240131 11:10:00.795286    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706699349
I20240131 11:11:00.540163    98 LicenseManagerConnector.cpp:787] Load license manager response from kvstore successfully, type: CPU, graphQuota: 2498546, storageQuota: 2498026, lastUpdateTime: 1706699404
I20240131 11:11:32.882158     1 MetaDaemon.cpp:312] Signal 15(Terminated) received, stopping this server
I20240131 11:11:32.886912     1 JobManager.cpp:138] JobManager::shutDown() begin
I20240131 11:11:32.891616   105 JobManager.cpp:155] Detect shutdown called, exit
I20240131 11:11:32.891705   105 JobDescription.cpp:113] Loading job description failed, error: E_LEADER_CHANGED
I20240131 11:11:32.891742   105 JobManager.cpp:175] Load an invalid job from space 0 jodId 0
I20240131 11:11:32.891875     1 JobManager.cpp:146] JobManager::shutDown() end
I20240131 11:11:32.891892     1 NebulaStore.cpp:362] Stop the raft service...
I20240131 11:11:32.891899     1 RaftexService.cpp:69] Stopping the raftex service on port 9560
I20240131 11:11:32.891937     1 RaftexService.cpp:79] All partitions have stopped
I20240131 11:11:32.891947     1 NebulaStore.cpp:367] Stop kv engine...
I20240131 11:11:32.892185     1 NebulaStore.cpp:362] Stop the raft service...
I20240131 11:11:32.892195     1 RaftexService.cpp:69] Stopping the raftex service on port 9560
I20240131 11:11:32.892200     1 RaftexService.cpp:79] All partitions have stopped
I20240131 11:11:32.892206     1 NebulaStore.cpp:367] Stop kv engine...
I20240131 11:11:32.892216     1 NebulaStore.cpp:51] Cut off the relationship with meta client
I20240131 11:11:32.892773     1 Part.h:64] [Port: 9560, Space: 0, Part: 0] ~Part()
I20240131 11:11:32.903903     1 RocksEngine.h:208] Release rocksdb on /usr/local/nebula/data/meta/nebula/0/0
I20240131 11:11:32.904755     1 NebulaStore.cpp:63] ~NebulaStore()
I20240131 11:11:32.905380     1 LicenseManagerConnector.h:81] License manager connector shuting down...
I20240131 11:11:32.905792     1 MetaDaemon.cpp:291] The meta server stopped
I20240131 11:11:32.905808     1 MetaDaemon.cpp:297] The meta Daemon stopped
I20240131 11:11:32.907467     1 JobManager.cpp:138] JobManager::shutDown() begin
I20240131 11:11:32.907485     1 JobManager.cpp:141] JobManager not running, exit
I20240131 11:11:35.516305     1 MetaDaemon.cpp:162] localhost = "nebulav-metad-2.nebulav-metad-headless.nebula.svc.cluster.local":9559
I20240131 11:11:35.516575     1 PartManager.h:303] handler_ is null 1
I20240131 11:11:35.518914     1 NebulaStore.cpp:96] Start the raft service...
I20240131 11:11:35.519510     1 NebulaSnapshotManager.cpp:25] Send snapshot is rate limited to 10485760 for each part by default
I20240131 11:11:35.583485     1 RaftexService.cpp:48] Start raft service on 9560
I20240131 11:11:35.583572     1 NebulaStore.cpp:108] Register handler...
I20240131 11:11:35.583581     1 NebulaStore.cpp:136] Scan the local path, and init the spaces_
I20240131 11:11:35.583652     1 NebulaStore.cpp:144] Scan data path "/usr/local/nebula/data/meta/nebula/0"
I20240131 11:11:35.583671     1 NebulaStore.cpp:311] Init data from partManager for "nebulav-metad-2.nebulav-metad-headless.nebula.svc.cluster.local":9559
I20240131 11:11:35.583693     1 NebulaStore.cpp:462] Create data space 0
W20240131 11:11:35.583755     1 DiskManager.cpp:176] Disk path /usr/local/nebula/data/meta does not exist!
I20240131 11:11:35.611538     1 RocksEngine.cpp:117] open rocksdb on /usr/local/nebula/data/meta/nebula/0/0/data
I20240131 11:11:35.630159     1 NebulaStore.cpp:536] Space 0, part 0 has been added, asLearner 0
I20240131 11:11:35.630194     1 MetaDaemonInit.cpp:169] Waiting for the leader elected...
I20240131 11:11:35.630203     1 MetaDaemonInit.cpp:181] Leader has not been elected, sleep 1s
.....
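
To see at a glance which components dominate a log like the one above, the lines can be tallied by severity and source file. The sketch below assumes the glog prefix layout seen in this log (severity letter, date, time, thread id, `file:line]`). `summarize` and `GLOG_RE` are illustrative names, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Prefix layout assumed from the log above:
# <severity><YYYYMMDD> <HH:MM:SS.micros> <thread id> <file>:<line>] <message>
GLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{8}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def summarize(lines):
    """Count lines per (severity, source file) and lines mentioning E_LEADER_CHANGED."""
    by_source = Counter()
    leader_changed = 0
    for raw in lines:
        m = GLOG_RE.match(raw.strip())
        if m is None:
            continue  # skip elisions such as "....." and blank lines
        by_source[(m["sev"], m["src"])] += 1
        if "E_LEADER_CHANGED" in m["msg"]:
            leader_changed += 1
    return by_source, leader_changed


SAMPLE = [
    "I20240131 10:35:45.528529     1 MetaDaemonInit.cpp:181] Leader has not been elected, sleep 1s",
    "I20240131 10:36:07.969594   113 ListHostsProcessor.cpp:384] List Hosts Failed, error E_LEADER_CHANGED",
    "I20240131 11:05:46.686609   113 ActiveHostsMan.cpp:125] Failed to get machines, error E_LEADER_CHANGED",
    ".....",
]

by_source, leader_changed = summarize(SAMPLE)
for (sev, src), count in by_source.most_common():
    print(sev, src, count)
print("E_LEADER_CHANGED lines:", leader_changed)
```

On the full log, passing the open file object to `summarize` would work the same way, since it only iterates over lines.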

From the current meta leader, metad-1, I pinged the other two meta nodes: the follower metad-0 is reachable, but metad-2 is down.
image

In the console, the meta leader is normal and the cluster is healthy:

(root@nebula) [王3]> show hosts meta
+-------------------------------------------------------------------+------+-----------+--------+--------------+-------------+
| Host                                                              | Port | Status    | Role   | Git Info Sha | Version     |
+-------------------------------------------------------------------+------+-----------+--------+--------------+-------------+
| "nebulav-metad-1.nebulav-metad-headless.nebula.svc.cluster.local" | 9559 | "ONLINE"  | "META" | "ec14175"    | "3.5.0-ent" |
| "nebulav-metad-0.nebulav-metad-headless.nebula.svc.cluster.local" | 9559 | "ONLINE"  | "META" | "ec14175"    | "3.5.0-ent" |
| "nebulav-metad-2.nebulav-metad-headless.nebula.svc.cluster.local" | 9559 | "OFFLINE" | "META" | "ec14175"    | "3.5.0-ent" |
+-------------------------------------------------------------------+------+-----------+--------+--------------+-------------+
Got 3 rows (time spent 1.693ms/2.301341ms)

Wed, 31 Jan 2024 11:24:35 UTC

(root@nebula) [王3]> show hosts storage
+-------------------------------------------------------------------------+------+----------+-----------+--------------+-------------+
| Host                                                                    | Port | Status   | Role      | Git Info Sha | Version     |
+-------------------------------------------------------------------------+------+----------+-----------+--------------+-------------+
| "nebulav-storaged-0.nebulav-storaged-headless.nebula.svc.cluster.local" | 9779 | "ONLINE" | "STORAGE" | "ec14175"    | "3.5.0-ent" |
| "nebulav-storaged-1.nebulav-storaged-headless.nebula.svc.cluster.local" | 9779 | "ONLINE" | "STORAGE" | "ec14175"    | "3.5.0-ent" |
| "nebulav-storaged-2.nebulav-storaged-headless.nebula.svc.cluster.local" | 9779 | "ONLINE" | "STORAGE" | "ec14175"    | "3.5.0-ent" |
+-------------------------------------------------------------------------+------+----------+-----------+--------------+-------------+
Got 3 rows (time spent 1.395ms/1.901869ms)

Wed, 31 Jan 2024 11:24:38 UTC

(root@nebula) [王3]> show meta leadert
[ERROR (-1004)]: SyntaxError: syntax error near `leadert'

Wed, 31 Jan 2024 11:24:44 UTC

(root@nebula) [王3]> show meta leader
+------------------------------------------------------------------------+---------------------------+
| Meta Leader                                                            | secs from last heart beat |
+------------------------------------------------------------------------+---------------------------+
| "nebulav-metad-1.nebulav-metad-headless.nebula.svc.cluster.local:9559" | 7                         |
+------------------------------------------------------------------------+---------------------------+
Got 1 rows (time spent 473µs/942.216µs)

Wed, 31 Jan 2024 11:24:46 UTC

(root@nebula) [王3]> match(v) return v limit 2
+-----------------------------------------------------------+
| v                                                         |
+-----------------------------------------------------------+
| ("长城守卫军" :组织{名称: "长城守卫军", 国际组织: false}) |
| ("猎妖者" :组织{名称: "猎妖者", 国际组织: false})         |
+-----------------------------------------------------------+
Got 2 rows (time spent 10.694ms/11.310095ms)

Wed, 31 Jan 2024 11:24:56 UTC

Log on the current meta leader:

sh-4.2# tail -f logs/nebula-metad.INFO
E20240131 11:27:56.672024    67 ThriftClientManager-inl.h:70] Failed to resolve address for 'nebulav-metad-2.nebulav-metad-headless.nebula.svc.cluster.local': Name or service not known (error=-2): Unknown error -2
E20240131 11:27:56.672082    68 ThriftClientManager-inl.h:70] Failed to resolve address for 'nebulav-metad-2.nebulav-metad-headless.nebula.svc.cluster.local': Name or service not known (error=-2): Unknown error -2
I20240131 11:27:57.382710   115 HBProcessor.cpp:36] Receive heartbeat from "nebulav-graphd-1.nebulav-graphd-headless.nebula.svc.cluster.local":9669, role = GRAPH
E20240131 11:27:57.386005    69 ThriftClientManager-inl.h:70] Failed to resolve address for 'nebulav-metad-2.nebulav-metad-headless.nebula.svc.cluster.local': Name or service not known (error=-2): Unknown error -2
E20240131 11:27:58.534116    70 ThriftClientManager-inl.h:70] Failed to resolve address for 'nebulav-metad-2.nebulav-metad-headless.nebula.svc.cluster.local': Name or service not known (error=-2): Unknown error -2
E20240131 11:27:58.534821    71 ThriftClientManager-inl.h:70] Failed to resolve address for 'nebulav-metad-2.nebulav-metad-headless.nebula.svc.cluster.local': Name or service not known (error=-2): Unknown error -2
I20240131 11:27:59.051434   115 HBProcessor.cpp:36] Receive heartbeat from "nebulav-graphd-0.nebulav-graphd-headless.nebula.svc.cluster.local":9669, role = GRAPH
E20240131 11:27:59.054920    72 ThriftClientManager-inl.h:70] Failed to resolve address for 'nebulav-metad-2.nebulav-metad-headless.nebula.svc.cluster.local': Name or service not known (error=-2): Unknown error -2
E20240131 11:28:00.635208    74 ThriftClientManager-inl.h:70] Failed to resolve address for 'nebulav-metad-2.nebulav-metad-headless.nebula.svc.cluster.local': Name or service not known (error=-2): Unknown error -2
E20240131 11:28:00.635385    73 ThriftClientManager-inl.h:70] Failed to resolve address for 'nebulav-metad-2.nebulav-metad-headless.nebula.svc.cluster.local': Name or service not known (error=-2): Unknown error -2
E20240131 11:28:02.407060    60 ThriftClientManager-inl.h:70] Failed to resolve address for 'nebulav-metad-2.nebulav-metad-headless.nebula.svc.cluster.local': Name or service not known (error=-2): Unknown error -2
E20240131 11:28:02.407060    59 ThriftClientManager-inl.h:70] Failed to resolve address for 'nebulav-metad-2.nebulav-metad-headless.nebula.svc.cluster.local': Name or service not known (error=-2): Unknown error -2
I20240131 11:28:03.003306   115 HBProcessor.cpp:36] Receive heartbeat from "nebulav-graphd-4.nebulav-graphd-headless.nebula.svc.cluster.local":9669, role = GRAPH
E20240131 11:28:03.006033    61 ThriftClientManager-inl.h:70] Failed to resolve address for 'nebulav-metad-2.nebulav-metad-headless.nebula.svc.cluster.local': Name or service not known (error=-2): Unknown error -2
I20240131 11:28:03.486295   115 HBProcessor.cpp:36] Receive heartbeat from "nebulav-graphd-5.nebulav-graphd-headless.nebula.svc.cluster.local":9669, role = GRAPH
E20240131 11:28:03.489298    62 ThriftClientManager-inl.h:70] Failed to resolve address for 'nebulav-metad-2.nebulav-metad-headless.nebula.svc.cluster.local': Name or service not known (error=-2): Unknown error -2
I20240131 11:28:04.206938   115 HBProcessor.cpp:36] Receive heartbeat from "nebulav-storaged-2.nebulav-storaged-headless.nebula.svc.cluster.local":9779, role = STORAGE

My configuration file:

apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCluster
metadata:
  name: nebulav
  namespace: nebula
spec:
  console:
    image: vesoft/nebula-console
    version: v3.6.0
  #alpineImage: ""
  enablePVReclaim: true
  exporter:
    httpPort: 9100
    image: vesoft/nebula-stats-exporter
    maxRequests: 20
    replicas: 1
    version: latest
  failoverPeriod: 5m0s
  graphd:
    config:
      stderrthreshold: "0"
      v: "3"
    image: reg.vesoft-inc.com/rc/nebula-graphd-ent
    replicas: 6
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 200m
        memory: 500Mi
    version: v3.5.0
  imagePullPolicy: Always
  imagePullSecrets:
  - name: image-pull-secret
  metad:
    licenseManagerURL: 192.168.8.53:9119
    config:
      stderrthreshold: "2"
      zone_list: us-east-2a,us-east-2b,us-east-2c
      #v: "2"
      #license_manager_url: nebula-license-manager.nebula-license-manager.svc.cluster.local:9119
    dataVolumeClaim:
      resources:
        requests:
          storage: 2Gi
      storageClassName: local-path
    logVolumeClaim:
      resources:
        requests:
          storage: 1Gi
      storageClassName: local-path
    image: reg.vesoft-inc.com/rc/nebula-metad-ent
    replicas: 3
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 300m
        memory: 500Mi
    version: v3.5.0
  reference:
    name: statefulsets.apps
    version: v1
  schedulerName: default-scheduler
  sslCerts:
    caCert: root.crt
    caSecret: ca-cert
    clientCACert: ca.crt
    clientCert: tls.crt
    clientKey: tls.key
    clientSecret: client-cert
    insecureSkipVerify: true
    serverCert: tls.crt
    serverKey: tls.key
    serverSecret: server-cert
  storaged:
    config:
      stderrthreshold: "2"
    dataVolumeClaims:
    - resources:
        requests:
          storage: 1.2Gi
      storageClassName: local-path
    - resources:
        requests:
          storage: 1.3Gi
      storageClassName: local-path
    - resources:
        requests:
          storage: 1.4Gi
      storageClassName: local-path
    enableAutoBalance: true
    logVolumeClaim:
      resources:
        requests:
          storage: 1Gi
      storageClassName: local-path
    image: reg.vesoft-inc.com/rc/nebula-storaged-ent
    replicas: 3
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: 300m
        memory: 500Mi
    version: v3.5.0
  topologySpreadConstraints:
  - topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
  imagePullSecrets:
  - name: image-nebula-ent-sc-secret
  nodeSelector:
    nebula: cloud

Your Environments (required)

operator1.30

How To Reproduce (required)

Steps to reproduce the behavior:

1. Start the cluster.
2. edit nc nebulav
3. Under the metad config, add v: "2"
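
As an illustration of step 3, the edited part of the manifest would look roughly like the fragment below. This is a sketch derived from the configuration file above, with the commented-out v line enabled; the other fields are assumed unchanged and are left out here.

```yaml
spec:
  metad:
    config:
      stderrthreshold: "2"
      zone_list: us-east-2a,us-east-2b,us-east-2c
      v: "2"   # newly added flag; saving this edit is what preceded the metad pod restart
```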

Expected behavior

The meta pod starts successfully.

@jinyingsunny jinyingsunny added the type/bug Type: something is unexpected label Jan 31, 2024
@github-actions github-actions bot added affects/none PR/issue: this bug affects none version. severity/none Severity of bug labels Jan 31, 2024
@jinyingsunny jinyingsunny changed the title from "when modify meta config v's value then pod restart, and meta pod report select leader failed" to "when add meta config v then pod restart, and meta pod report select leader failed" Jan 31, 2024
@MegaByte875 MegaByte875 self-assigned this Feb 2, 2024
@MegaByte875
Contributor

#434

@jinyingsunny
Author

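Verified the issues reported above with operator1.34:
The second issue is resolved: the meta pod starts successfully right away, and the dynamic parameter has taken effect.
The first issue remains: after a dynamic parameter is changed, the meta pod still restarts.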

@jinyingsunny
Author

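Verified with operator1.35:
The first issue: after a dynamic parameter is changed, the meta pod no longer needs to restart, and the runtime parameter has taken effect.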

@github-actions github-actions bot added the process/fixed Process of bug label Feb 20, 2024
@jinyingsunny jinyingsunny added affects/master PR/issue: this bug affects master version. severity/blocker Severity of bug process/done Process of bug and removed severity/none Severity of bug affects/none PR/issue: this bug affects none version. process/fixed Process of bug labels Feb 20, 2024