I tried creating a Model with a `cacheProfile` whose StorageClass doesn't exist in the cluster. Deleting the Model afterwards never completes.
Steps to reproduce (commands are sketched below the results):
1. Configure a cacheProfile whose `sharedFilesystem.storageClassName` references a StorageClass that does not exist in the cluster (here `efs-dynamic`, which points at the missing `efs-sc`).
2. Create a Model that uses that cacheProfile (spec below).
3. Delete the Model.
Expected result: The model gets cleaned up automatically.
Current result: The Model is stuck on its `kubeai.org/cache-eviction` finalizer, and the evict-cache and load-cache pods stay Pending forever.
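A minimal reproduction sketch, assuming the `efs-dynamic` cacheProfile from the config in the logs below (its `storageClassName: efs-sc` was never created) and a hypothetical `model.yaml` containing the spec shown under "Model spec":

```bash
# 1. Confirm the StorageClass referenced by the cacheProfile is missing.
kubectl get storageclass efs-sc   # expect: Error from server (NotFound)

# 2. Create the Model (model.yaml is a placeholder for the spec below).
kubectl apply -f model.yaml

# 3. Delete it; the delete hangs on the cache-eviction finalizer.
kubectl delete models.kubeai.org llama-3.1-8b-instruct-fp8-l4 --wait=false
kubectl get models.kubeai.org llama-3.1-8b-instruct-fp8-l4 -o yaml
# deletionTimestamp is set, but kubeai.org/cache-eviction remains in metadata.finalizers
```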
Pods:
```
NAME                                             READY   STATUS    RESTARTS   AGE
evict-cache-llama-3.1-8b-instruct-fp8-l4-ghlmq   0/1     Pending   0          70s
kubeai-794576b9f-jt5p5                           1/1     Running   0          106s
load-cache-llama-3.1-8b-instruct-fp8-l4-nzpdm    0/1     Pending   0          87s
openwebui-69ffb7dbb4-hb2lk                       1/1     Running   0          106s
```
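The cache pods presumably stay Pending because the cache PVC can never bind without its StorageClass. One way to confirm (the cache PVC's exact name is not shown in this report, so `<cache-pvc>` is a placeholder):

```bash
# Scheduling events on the stuck cache-loader pod usually point at an
# unbound PersistentVolumeClaim.
kubectl describe pod load-cache-llama-3.1-8b-instruct-fp8-l4-nzpdm

# The PVC itself should also be Pending, with events referencing the
# missing StorageClass "efs-sc".
kubectl get pvc
kubectl describe pvc <cache-pvc>
```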
Model spec:
```yaml
apiVersion: kubeai.org/v1
kind: Model
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"kubeai.org/v1","kind":"Model","metadata":{"annotations":{},"name":"llama-3.1-8b-instruct-fp8-l4","namespace":"default"},"spec":{"args":["--max-model-len=16384","--max-num-batched-token=16384","--gpu-memory-utilization=0.9","--disable-log-requests"],"cacheProfile":"efs-dynamic","engine":"VLLM","features":["TextGeneration"],"minReplicas":1,"owner":"neuralmagic","resourceProfile":"nvidia-gpu-l4:1","url":"hf://neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8"}}
  creationTimestamp: "2024-10-24T14:00:13Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2024-10-24T14:00:30Z"
  finalizers:
  - kubeai.org/cache-eviction
  generation: 3
  labels:
    features.kubeai.org/TextGeneration: "true"
  name: llama-3.1-8b-instruct-fp8-l4
  namespace: default
  resourceVersion: "7101"
  uid: 44ecf5d1-7437-43d4-ad66-6646963eab4a
spec:
  args:
  - --max-model-len=16384
  - --max-num-batched-token=16384
  - --gpu-memory-utilization=0.9
  - --disable-log-requests
  cacheProfile: efs-dynamic
  engine: VLLM
  features:
  - TextGeneration
  minReplicas: 1
  owner: neuralmagic
  replicas: 1
  resourceProfile: nvidia-gpu-l4:1
  scaleDownDelaySeconds: 30
  targetRequests: 100
  url: hf://neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8
status:
  cache:
    loaded: false
  replicas:
    all: 0
    ready: 0
```
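Note that `deletionTimestamp` is set while `kubeai.org/cache-eviction` is still listed under `finalizers`; that combination is what keeps the object alive. A quick way to check just those two fields (a jsonpath sketch):

```bash
kubectl get models.kubeai.org llama-3.1-8b-instruct-fp8-l4 \
  -o jsonpath='{.metadata.deletionTimestamp}{"\t"}{.metadata.finalizers}{"\n"}'
```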
Logs in KubeAI:
```
2024-10-24T13:59:57Z INFO manager loaded config {"config": "allowPodAddressOverride: false\ncacheProfiles:\n efs-dynamic:\n sharedFilesystem:\n storageClassName: efs-sc\n efs-static:\n sharedFilesystem:\n persistentVolumeName: efs-pv\nhealthAddress: :8081\nleaderElection:\n leaseDuration: 15s\n renewDeadline: 10s\n retryPeriod: 2s\nmessaging:\n errorMaxBackoff: 30s\n streams: []\nmetricsAddr: :8080\nmodelAutoscaling:\n interval: 10s\n stateConfigMapName: kubeai-autoscaler-state\n timeWindow: 10m0s\nmodelLoaders:\n huggingface:\n image: substratusai/huggingface-model-loader:v0.9.0\nmodelRollouts:\n surge: 1\nmodelServerPods:\n securityContext:\n allowPrivilegeEscalation: false\n capabilities:\n drop:\n - ALL\n readOnlyRootFilesystem: false\n runAsUser: 0\n serviceAccountName: kubeai-models\nmodelServers:\n FasterWhisper:\n images:\n default: fedirz/faster-whisper-server:latest-cpu\n nvidia-gpu: fedirz/faster-whisper-server:latest-cuda\n Infinity:\n images:\n default: michaelf34/infinity:latest\n OLlama:\n images:\n default: ollama/ollama:latest\n VLLM:\n images:\n cpu: substratusai/vllm:v0.6.3.post1-cpu\n default: vllm/vllm-openai:v0.6.3.post1\n google-tpu: substratusai/vllm:v0.6.3.post1-tpu\nresourceProfiles:\n cpu:\n imageName: cpu\n requests:\n cpu: \"1\"\n memory: 2Gi\n nvidia-gpu-a100-40gb:\n imageName: nvidia-gpu\n limits:\n nvidia.com/gpu: \"1\"\n nodeSelector:\n node.kubernetes.io/instance-type: p4de.24xlarge\n tolerations:\n - effect: NoSchedule\n key: nvidia.com/gpu\n operator: Equal\n value: present\n nvidia-gpu-a100-80gb:\n imageName: nvidia-gpu\n limits:\n nvidia.com/gpu: \"1\"\n nodeSelector:\n node.kubernetes.io/instance-type: p4d.24xlarge\n tolerations:\n - effect: NoSchedule\n key: nvidia.com/gpu\n operator: Equal\n value: present\n nvidia-gpu-h100:\n imageName: nvidia-gpu\n limits:\n nvidia.com/gpu: \"1\"\n nodeSelector:\n node.kubernetes.io/instance-type: p5.48xlarge\n tolerations:\n - effect: NoSchedule\n key: nvidia.com/gpu\n operator: Equal\n value: present\n nvidia-gpu-l4:\n imageName: nvidia-gpu\n limits:\n nvidia.com/gpu: \"1\"\n nodeSelector:\n karpenter.k8s.aws/instance-gpu-name: l4\n requests:\n cpu: \"6\"\n memory: 24Gi\n nvidia.com/gpu: \"1\"\n tolerations:\n - effect: NoSchedule\n key: nvidia.com/gpu\n operator: Equal\n value: present\n nvidia-gpu-l40s:\n imageName: \"\"\n nodeSelector:\n karpenter.k8s.aws/instance-gpu-name: l40s\nsecretNames:\n huggingface: kubeai-huggingface\n"}
2024/10/24 13:59:57 Autoscaler state ConfigMap "models" has no key "default/kubeai-autoscaler-state", state not loaded
2024/10/24 13:59:57 Loaded last state of models: 0 total, last calculated on 0001-01-01 00:00:00 +0000 UTC
2024-10-24T13:59:57Z INFO manager starting controller-manager
2024-10-24T13:59:57Z INFO manager run launched all goroutines
2024-10-24T13:59:57Z INFO starting server {"name": "health probe", "addr": "[::]:8081"}
2024-10-24T13:59:57Z INFO manager starting api server {"addr": ":8000"}
2024-10-24T13:59:57Z INFO manager starting metrics server {"addr": ":8080"}
2024-10-24T13:59:57Z INFO manager starting leader election
I1024 13:59:57.951075 1 leaderelection.go:250] attempting to acquire leader lease default/kubeai.org...
2024-10-24T13:59:57Z INFO Starting EventSource {"controller": "pod", "controllerGroup": "", "controllerKind": "Pod", "source": "kind source: *v1.Pod"}
2024-10-24T13:59:57Z INFO Starting Controller {"controller": "pod", "controllerGroup": "", "controllerKind": "Pod"}
I1024 13:59:57.951759 1 leaderelection.go:250] attempting to acquire leader lease default/cc6bca10.substratus.ai...
I1024 13:59:57.964735 1 leaderelection.go:260] successfully acquired lease default/kubeai.org
2024/10/24 13:59:57 "kubeai-794576b9f-jt5p5" started leading
I1024 13:59:57.965962 1 leaderelection.go:260] successfully acquired lease default/cc6bca10.substratus.ai
2024-10-24T13:59:57Z DEBUG events kubeai-794576b9f-jt5p5_8fb2aee3-97c9-4306-a7c4-314f0889f83e became leader {"type": "Normal", "object": {"kind":"Lease","namespace":"default","name":"cc6bca10.substratus.ai","uid":"44c2a5a3-e00b-47bb-aecc-18dff348a894","apiVersion":"coordination.k8s.io/v1","resourceVersion":"6934"}, "reason": "LeaderElection"}
2024-10-24T13:59:57Z INFO Starting EventSource {"controller": "model", "controllerGroup": "kubeai.org", "controllerKind": "Model", "source": "kind source: *v1.Model"}
2024-10-24T13:59:57Z INFO Starting EventSource {"controller": "model", "controllerGroup": "kubeai.org", "controllerKind": "Model", "source": "kind source: *v1.Pod"}
2024-10-24T13:59:57Z INFO Starting EventSource {"controller": "model", "controllerGroup": "kubeai.org", "controllerKind": "Model", "source": "kind source: *v1.PersistentVolumeClaim"}
2024-10-24T13:59:57Z INFO Starting EventSource {"controller": "model", "controllerGroup": "kubeai.org", "controllerKind": "Model", "source": "kind source: *v1.Job"}
2024-10-24T13:59:57Z INFO Starting Controller {"controller": "model", "controllerGroup": "kubeai.org", "controllerKind": "Model"}
2024-10-24T13:59:58Z INFO Starting workers {"controller": "pod", "controllerGroup": "", "controllerKind": "Pod", "worker count": 1}
2024-10-24T13:59:58Z INFO Starting workers {"controller": "model", "controllerGroup": "kubeai.org", "controllerKind": "Model", "worker count": 1}
2024/10/24 14:00:07 Is leader, autoscaling
2024/10/24 14:00:07 Aggregating metrics from KubeAI addresses [192.168.67.206:8080]
2024-10-24T14:00:13Z INFO Reconciling Model {"controller": "model", "controllerGroup": "kubeai.org", "controllerKind": "Model", "Model": {"name":"llama-3.1-8b-instruct-fp8-l4","namespace":"default"}, "namespace": "default", "name": "llama-3.1-8b-instruct-fp8-l4", "reconcileID": "c383c9be-7ee8-40fe-949a-b19aa94704e1"}
2024-10-24T14:00:13Z INFO KubeAPIWarningLogger metadata.name: this is used in Pod names and hostnames, which can result in surprising behavior; a DNS label is recommended: [must not contain dots]
2024-10-24T14:00:13Z INFO Reconciling Model {"controller": "model", "controllerGroup": "kubeai.org", "controllerKind": "Model", "Model": {"name":"llama-3.1-8b-instruct-fp8-l4","namespace":"default"}, "namespace": "default", "name": "llama-3.1-8b-instruct-fp8-l4", "reconcileID": "0d012363-27c3-41ab-befc-53a559a6ee21"}
2024-10-24T14:00:13Z INFO Reconciling Model {"controller": "model", "controllerGroup": "kubeai.org", "controllerKind": "Model", "Model": {"name":"llama-3.1-8b-instruct-fp8-l4","namespace":"default"}, "namespace": "default", "name": "llama-3.1-8b-instruct-fp8-l4", "reconcileID": "e580de7b-d4d9-4fcc-a0eb-ddb631d5ae80"}
2024-10-24T14:00:13Z INFO Reconciling Model {"controller": "model", "controllerGroup": "kubeai.org", "controllerKind": "Model", "Model": {"name":"llama-3.1-8b-instruct-fp8-l4","namespace":"default"}, "namespace": "default", "name": "llama-3.1-8b-instruct-fp8-l4", "reconcileID": "61c706dc-6e2e-4d97-83ff-600e80f45efa"}
2024/10/24 14:00:17 Is leader, autoscaling
2024/10/24 14:00:17 Aggregating metrics from KubeAI addresses [192.168.67.206:8080]
2024/10/24 14:00:17 No metrics found for model "llama-3.1-8b-instruct-fp8-l4", skipping
2024/10/24 14:00:27 Is leader, autoscaling
2024/10/24 14:00:27 Aggregating metrics from KubeAI addresses [192.168.67.206:8080]
2024/10/24 14:00:27 No metrics found for model "llama-3.1-8b-instruct-fp8-l4", skipping
2024-10-24T14:00:30Z INFO Reconciling Model {"controller": "model", "controllerGroup": "kubeai.org", "controllerKind": "Model", "Model": {"name":"llama-3.1-8b-instruct-fp8-l4","namespace":"default"}, "namespace": "default", "name": "llama-3.1-8b-instruct-fp8-l4", "reconcileID": "cce6ae02-b08c-4dd5-b13f-72be8916d22e"}
2024-10-24T14:00:30Z INFO Reconciling Model {"controller": "model", "controllerGroup": "kubeai.org", "controllerKind": "Model", "Model": {"name":"llama-3.1-8b-instruct-fp8-l4","namespace":"default"}, "namespace": "default", "name": "llama-3.1-8b-instruct-fp8-l4", "reconcileID": "cfcf5f39-8516-4522-917a-84d884869f98"}
2024-10-24T14:00:30Z INFO Reconciling Model {"controller": "model", "controllerGroup": "kubeai.org", "controllerKind": "Model", "Model": {"name":"llama-3.1-8b-instruct-fp8-l4","namespace":"default"}, "namespace": "default", "name": "llama-3.1-8b-instruct-fp8-l4", "reconcileID": "9a7498a8-09a8-4a02-8703-6941edcf64fc"}
```
Workaround: manually remove the `kubeai.org/cache-eviction` finalizer from the Model object (sketch below).
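A sketch of that workaround; this clears all finalizers on the object, which is fine here since `kubeai.org/cache-eviction` is the only one:

```bash
# Clear metadata.finalizers so the pending deletion can complete.
kubectl patch models.kubeai.org llama-3.1-8b-instruct-fp8-l4 \
  --type merge -p '{"metadata":{"finalizers":null}}'
```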