
Dynamically calculate PVC request sizes or remove, or document if it can be ignored #298

Open
nstogner opened this issue Oct 30, 2024 · 2 comments

Comments

@nstogner
Contributor

When using KubeAI caching, PVCs are created with a static storage request size of 10Gi. We have received feedback from multiple users who were surprised both by this default and by the fact that the size is not actually enforced when they use the cache.

corev1.ResourceStorage: resource.MustParse("10Gi"),

When it comes to Filestore / NFS / EFS, I believe this size is generally not an issue. In some cases this is because the minimum size of a dynamically provisioned Filestore instance is much larger than 10Gi; in other cases the request may simply be disregarded. We need to do some more investigation across all implementations.

Possible options:

  • Dynamically calculate this request size (inspect the model source?)
  • Remove the storage request (is this valid?)
  • Document the implications of the request
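For the first option, one way to inspect the model source (sketched here for Hugging Face only, with illustrative type and function names that are not any real KubeAI API) is to sum the per-file sizes the Hub API reports for a model repo:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sibling mirrors the relevant part of the Hugging Face Hub API response
// (GET https://huggingface.co/api/models/<id>?blobs=true), which lists each
// file in the repo along with its size in bytes.
type sibling struct {
	RFilename string `json:"rfilename"`
	Size      int64  `json:"size"`
}

type modelInfo struct {
	Siblings []sibling `json:"siblings"`
}

// totalModelSize sums the file sizes reported for a model repo.
func totalModelSize(info modelInfo) int64 {
	var total int64
	for _, s := range info.Siblings {
		total += s.Size
	}
	return total
}

func main() {
	// In a real controller this JSON would come from an HTTP GET against the
	// Hub API; here it is a hard-coded sample for illustration.
	raw := []byte(`{"siblings":[{"rfilename":"model-00001.safetensors","size":5000000000},{"rfilename":"config.json","size":1500}]}`)
	var info modelInfo
	if err := json.Unmarshal(raw, &info); err != nil {
		panic(err)
	}
	fmt.Println(totalModelSize(info)) // 5000001500
}
```

The sum would then be padded and used as the PVC request instead of the hard-coded 10Gi.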

Links to user feedback:

@nstogner changed the title from "Dynamically calculate PVC request sizes or remove, or document why it doesnt matter" to "Dynamically calculate PVC request sizes or remove, or document if it can be ignored" Oct 30, 2024
@radu-catrangiu

Hello! 👋

Before I even noticed this issue had been created, I tried my hand at solving the problem, and I'd like to get your thoughts on it.

My fix is currently at the proof-of-concept level: it works as intended, but I still need to add error handling, documentation, and unit tests. I have also only added support for Hugging Face models; I haven't looked into Ollama models yet.

I did the following:

  1. Created a simple HTTP server that takes a Huggingface model and uses git and git-lfs to determine its size
    • I took inspiration from the comment you left in the cachePVCForModel function
    • I took this approach in order to avoid adding another dependency to the KubeAI image
  2. I added a new package called modelevaluator
    • This required updating the System config type with a new ModelEvaluators field, which currently supports only the Hugging Face model evaluator
    • A model evaluator has 2 fields:
      • Image - Mandatory String - the image for the model evaluator server
      • DevProxyHost - Optional String - the host to use when running KubeAI from a local machine
  3. I updated the CacheProfile to support ModelFilesystem alongside SharedFilesystem
  4. I updated the manager.Run function to initialize the modelevaluator
    • I create a Deployment and a Service from the image supplied in the System config
  5. In the cachePVCForModel function I call a GetModelSize function from the modelevaluator to obtain the size
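The git/git-lfs trick in step 1 can be sketched as follows: after a cheap `git clone --no-checkout`, LFS-tracked weight files are small pointer files whose `size` line records the real blob size, so summing those lines yields the model size without downloading any weights. The parser below is an illustrative sketch, not the actual server code:

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// lfsPointerSize extracts the object size from a Git LFS pointer file.
// Per the Git LFS spec, a pointer file looks like:
//
//	version https://git-lfs.github.com/spec/v1
//	oid sha256:4d7a21...
//	size 13782075600
//
// so the true size of an LFS-tracked weight file can be read without
// fetching the blob itself.
func lfsPointerSize(pointer string) (int64, bool) {
	sc := bufio.NewScanner(strings.NewReader(pointer))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if rest, ok := strings.CutPrefix(line, "size "); ok {
			n, err := strconv.ParseInt(rest, 10, 64)
			if err != nil {
				return 0, false
			}
			return n, true
		}
	}
	return 0, false
}

func main() {
	pointer := "version https://git-lfs.github.com/spec/v1\noid sha256:deadbeef\nsize 13782075600\n"
	if n, ok := lfsPointerSize(pointer); ok {
		fmt.Println(n) // 13782075600
	}
}
```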

What I haven't done yet, but thought might be useful:

  1. Make the SharedFilesystem storage size configurable from the System config
    • Maybe this could be improved later to perform a volume resize if the System config changes and an existing PVC has a different size.
  2. Make the Model Evaluator component optional
    • A sensible default might be needed if the Model Evaluator isn't available, or model-level caching could be disabled entirely
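Whichever component computes the raw byte count, turning it into a PVC request probably needs headroom and rounding. A hypothetical policy helper (the numbers are placeholders; nothing here mirrors real KubeAI code):

```go
package main

import "fmt"

const gi = int64(1) << 30 // one gibibyte in bytes

// pvcRequestGi is a hypothetical helper: given a model's raw size in bytes,
// add fractional headroom (e.g. 0.2 for 20%), round up to whole Gi, and
// apply a floor so tiny models still get a sane volume.
func pvcRequestGi(modelBytes int64, headroom float64, floorGi int64) int64 {
	padded := int64(float64(modelBytes) * (1 + headroom))
	giCount := (padded + gi - 1) / gi // ceiling division to whole Gi
	if giCount < floorGi {
		return floorGi
	}
	return giCount
}

func main() {
	// A ~13.8 GB model with 20% headroom rounds up to 16 Gi.
	fmt.Println(pvcRequestGi(13782075600, 0.2, 1)) // 16
}
```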

Any input or idea you might have would be greatly appreciated! 😁

@nstogner
Contributor Author

nstogner commented Nov 23, 2024

@radu-catrangiu Thanks for taking the time to detail all of this. A few things that will make this more complicated:

Overall feedback:

In KubeAI we strive to keep the system as simple as possible. KubeAI could be broken out into multiple services as it stands today, but we have decided to keep it as one monolith in order to avoid exposing microservice operational complexity to the admins who need to run it. My preference would be to use library calls over service calls. Since the full approach will take a good amount of code, I think a simple first step might be to expose a volume size parameter when configuring a cache profile. See:

sharedFilesystem:
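For reference, a configurable size on a cache profile might look something like this (the field names below are purely illustrative and not the actual KubeAI schema):

```yaml
cacheProfiles:
  efs-dynamic:
    sharedFilesystem:
      storageClassName: efs-sc
      # Hypothetical field: overrides the hard-coded 10Gi PVC request.
      storage: 100Gi
```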
