Proposal: Mount a PVC in ReadManyOnly mode as model storage #311
Comments
Supporting PVCs would make KubeAI agnostic to the storage provider. For example, existing volumes backed by Azure Files or Azure Blob could be used.
Would you prefer passing a PVC or a PV to the model spec? @SatyKrish please share the reasoning behind your preference as well.
The typical pattern here is to match what is on the PV:
That is an example of dynamic provisioning. The PVC triggers the creation of a new bucket:
In this case, the bucket would be empty unless you followed a process like:
KubeAI already supports this flow naturally via the cache download functionality - and will soon support the specific bucket use case via a direct url to the bucket. However, consider the following use case:
The way to represent that volume inside of k8s would be to create a PV with a reference to the preexisting NFS share. I would recommend supporting
Apologies, couldn’t respond earlier (prod fire drill). I’m currently loading large models from disk, and small models from an Azure file share on AKS. PVC support will make it easier to migrate my vLLM setup to KubeAI.
Perfect, the PR adding vLLM support for loading models directly from a PVC is working. It will probably be merged tomorrow.
Storing models on a PVC is now supported with vLLM. Please update your helm chart to v0.10.0 or later to try it out. Other engines may follow later. Keeping this issue open until the Ollama and Infinity engines are updated to add support as well.
Use case: Users have pre-provisioned PVs that contain models and support the ReadOnlyMany access mode. The user would be responsible for ensuring a compatible model is stored on the PV and for creating a PVC.
Example:
The `$PVC_MODEL_PATH` will always be an absolute path starting with a `/`.
Model is stored in the PV under `/llama`: `model.url: "pvc://123f124/llama"` or `"pvc://123f124//llama"`.
If the model is stored under the root directory in the PV, then these would both be valid:
The following would happen inside the model engine pod:
/model
/model
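To make the URL scheme concrete, here is a minimal sketch of a `Model` that points at the PVC from the example. Only the `url` value comes from the example; the `apiVersion`, `features`, `resourceProfile`, and resource name are assumptions made for illustration and should be checked against your KubeAI version.

```yaml
# Sketch only: every field except `url` is an illustrative assumption.
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-from-pvc               # hypothetical name
spec:
  features: [TextGeneration]         # assumption
  engine: VLLM                       # vLLM is the first engine reported to support PVCs
  resourceProfile: nvidia-gpu-l4:1   # assumption: any profile defined in your install
  # pvc://<PVC name>/<path of the model inside the volume>
  url: pvc://123f124/llama
```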
Open questions:
No, this won't be needed, since the user is responsible for creating the PVC and therefore has control over this. The only problem is that we would start running into issues when scaling beyond 1 replica with ReadWriteOnce.
Example user flow:
1. The model is stored on the PV under `/llama-3-8b`. The PVC name is `llama-3-8b`.
2. The user creates a `Model` and specifies `model.url` as `pvc://llama-3-8b/llama-3-8b`.
Step 1 PVC:
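A minimal sketch of what the PVC for step 1 could look like, assuming a pre-provisioned PV already holds the model. The PV name, the storage size, and the choice of an empty storage class are hypothetical placeholders, not values from this proposal.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: llama-3-8b              # becomes the first segment of pvc://llama-3-8b/...
spec:
  accessModes:
    - ReadOnlyMany              # lets several replicas mount the same volume read-only
  storageClassName: ""          # empty string: bind to an existing PV, no dynamic provisioning
  volumeName: llama-3-8b-pv     # hypothetical name of the pre-provisioned PV
  resources:
    requests:
      storage: 50Gi             # placeholder; must not exceed the PV's capacity
```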
Step 3 pod spec:
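The pod that results from this flow might look roughly like the sketch below. It assumes the model directory on the PVC is mounted read-only at `/model` and that vLLM is pointed at that path. The pod name, container name, and image tag are hypothetical, and the exact spec KubeAI generates may differ.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: model-llama-3-8b               # hypothetical
spec:
  containers:
    - name: server                     # hypothetical
      image: vllm/vllm-openai:latest   # assumption: any vLLM server image
      args:
        - --model=/model               # engine loads the model from the mount path
      volumeMounts:
        - name: model
          mountPath: /model
          subPath: llama-3-8b          # path segment of pvc://llama-3-8b/llama-3-8b
          readOnly: true
  volumes:
    - name: model
      persistentVolumeClaim:
        claimName: llama-3-8b          # PVC segment of pvc://llama-3-8b/llama-3-8b
        readOnly: true
```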
Why not have KubeAI manage the PVC and allow users to specify PV instead?
It allows the user to specify more attributes that may be relevant on the PVC. One example that would be harder for KubeAI to handle is figuring out the `resources.requests.storage` capacity to request for the PVC, so it may make more sense to have the user control that.
Looking at both GCS Fuse and Azure Blob, the only way to easily support both is to let the user supply a PVC. In the Azure case there seems to be no need to create a PV.
Azure Blob storage with PVC only
Let's take the use case for Azure Blob Storage. You may have a storage account `samos123`, and in that storage account we have 2 different storage containers: `llama-3.1-8b` and `qwen70b`. So the storage looks like this:
The AKS cluster is configured with storage account `samos123`.
Then the user would only create a PVC and have no need to create a PV:
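A minimal sketch of such a PVC, named `azure-blob-storage` to match the URL used in this example. The storage class name, access mode, and size are assumptions: the sketch presumes the cluster exposes a Blob CSI storage class that is tied to the `samos123` storage account, so substitute whichever class your cluster actually provides.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azure-blob-storage                   # PVC name referenced in the pvc:// URL
spec:
  accessModes:
    - ReadWriteMany                          # assumption: shared access via the Blob CSI driver
  storageClassName: azureblob-fuse-premium   # assumption: class backed by the storage account
  resources:
    requests:
      storage: 100Gi                         # placeholder size
```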
In KubeAI the user would specify the following URL:
`pvc://azure-blob-storage/llama-3.1-8b`
GCS Fuse example
Assume I have a bucket named `samos123`, and in that bucket I have a directory `llama-3-8b`.
Create PV:
Create PVC:
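The two manifests could look like the sketch below, which follows the static provisioning pattern of the GKE Cloud Storage FUSE CSI driver with the bucket name `samos123` and the PVC name `gcs-fuse-csi-static-pvc` from this example. The PV name, capacity, storage class name, and namespace are placeholders chosen for illustration.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gcs-fuse-csi-pv                     # hypothetical PV name
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 5Gi                            # placeholder
  storageClassName: example-storage-class   # placeholder; only has to match the PVC
  mountOptions:
    - implicit-dirs                         # makes directories such as llama-3-8b visible
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeHandle: samos123                  # bucket name
  claimRef:
    name: gcs-fuse-csi-static-pvc
    namespace: default                      # assumption
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gcs-fuse-csi-static-pvc
  namespace: default                        # assumption
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi                          # placeholder; matches the PV
  storageClassName: example-storage-class
```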
Then in KubeAI, the URL would be:
pvc://gcs-fuse-csi-static-pvc/llama-3-8b
Source: https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/cloud-storage-fuse-csi-driver#provision-static