
Dynamically calculate PVC request sizes or remove, or document if it can be ignored #298

Open
nstogner opened this issue Oct 30, 2024 · 2 comments

Comments

@nstogner
Contributor

When using KubeAI caching, PVCs are created with a static storage request size of 10Gi. We have received feedback from multiple users who were surprised both by this default and by the fact that the size is not actually enforced when they use the cache.

corev1.ResourceStorage: resource.MustParse("10Gi"),

When it comes to Filestore / NFS / EFS, I believe this size is generally not an issue. In some cases this is because the minimum size of a dynamically provisioned Filestore instance is much larger than 10Gi; in other cases the request may simply be disregarded. We need to do some more investigation across all implementations.

Possible options:

  • Dynamically calculate this request size (inspect the model source?)
  • Remove the storage request (is this valid?)
  • Document the implications of the request
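For the first option, one way to inspect the model source (sketched here for Hugging Face only, with illustrative type and function names that are not any real KubeAI API) is to sum the per-file sizes the Hub API reports for a model repo:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sibling mirrors the relevant part of the Hugging Face Hub API response
// (GET https://huggingface.co/api/models/<id>?blobs=true), which lists each
// file in the repo along with its size in bytes.
type sibling struct {
	RFilename string `json:"rfilename"`
	Size      int64  `json:"size"`
}

type modelInfo struct {
	Siblings []sibling `json:"siblings"`
}

// totalModelSize sums the file sizes reported for a model repo.
func totalModelSize(info modelInfo) int64 {
	var total int64
	for _, s := range info.Siblings {
		total += s.Size
	}
	return total
}

func main() {
	// In a real controller this JSON would come from an HTTP GET against the
	// Hub API; here it is a hard-coded sample for illustration.
	raw := []byte(`{"siblings":[{"rfilename":"model-00001.safetensors","size":5000000000},{"rfilename":"config.json","size":1500}]}`)
	var info modelInfo
	if err := json.Unmarshal(raw, &info); err != nil {
		panic(err)
	}
	fmt.Println(totalModelSize(info)) // 5000001500
}
```

The sum would then be padded and used as the PVC request instead of the hard-coded 10Gi.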

Links to user feedback:

@nstogner changed the title from "Dynamically calculate PVC request sizes or remove, or document why it doesnt matter" to "Dynamically calculate PVC request sizes or remove, or document if it can be ignored" Oct 30, 2024
@radu-catrangiu

Hello! 👋

Before I even noticed this issue had been created, I tried my hand at solving the problem, and I'd like to get your thoughts on it.

My fix is currently at the proof-of-concept level: it works as intended, but I still need to add error handling, documentation, and unit tests. I have also only added support for Hugging Face models; I haven't looked into Ollama models yet.

I did the following:

  1. Created a simple HTTP server that takes a Huggingface model and uses git and git-lfs to determine its size
    • I took inspiration from the comment you left in the cachePVCForModel function
    • I took this approach in order to avoid adding another dependency to the KubeAI image
  2. I added a new package called modelevaluator
    • This required updating the System config type with a new ModelEvaluators field, which currently supports only the Hugging Face model evaluator
    • A model evaluator has 2 fields:
      • Image - Mandatory String - the image for the model evaluator server
      • DevProxyHost - Optional String - the host to use when running KubeAI from a local machine
  3. I updated the CacheProfile to support ModelFilesystem alongside SharedFilesystem
  4. I updated the manager.Run function to initialize the modelevaluator
    • I create a Deployment and a Service from the image supplied in the System config
  5. In the cachePVCForModel function I call a GetModelSize function from the modelevaluator to obtain the size
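The git/git-lfs trick in step 1 can be sketched as follows: after a cheap `git clone --no-checkout`, LFS-tracked weight files are small pointer files whose `size` line records the real blob size, so summing those lines yields the model size without downloading any weights. The parser below is an illustrative sketch, not the actual server code:

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// lfsPointerSize extracts the object size from a Git LFS pointer file.
// Per the Git LFS spec, a pointer file looks like:
//
//	version https://git-lfs.github.com/spec/v1
//	oid sha256:4d7a21...
//	size 13782075600
//
// so the true size of an LFS-tracked weight file can be read without
// fetching the blob itself.
func lfsPointerSize(pointer string) (int64, bool) {
	sc := bufio.NewScanner(strings.NewReader(pointer))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if rest, ok := strings.CutPrefix(line, "size "); ok {
			n, err := strconv.ParseInt(rest, 10, 64)
			if err != nil {
				return 0, false
			}
			return n, true
		}
	}
	return 0, false
}

func main() {
	pointer := "version https://git-lfs.github.com/spec/v1\noid sha256:deadbeef\nsize 13782075600\n"
	if n, ok := lfsPointerSize(pointer); ok {
		fmt.Println(n) // 13782075600
	}
}
```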

What I haven't done yet, but thought might be useful:

  1. Make the SharedFilesystem storage size configurable from the System config
    • Maybe this could be improved later to perform a volume resize if the System config changes and an existing PVC has a different size.
  2. Make the Model Evaluator component optional
    • A sensible default might be needed if the Model Evaluator isn't available, or model-level caching could be disabled entirely
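Whichever component computes the raw byte count, turning it into a PVC request probably needs headroom and rounding. A hypothetical policy helper (the numbers are placeholders; nothing here mirrors real KubeAI code):

```go
package main

import "fmt"

const gi = int64(1) << 30 // one gibibyte in bytes

// pvcRequestGi is a hypothetical helper: given a model's raw size in bytes,
// add fractional headroom (e.g. 0.2 for 20%), round up to whole Gi, and
// apply a floor so tiny models still get a sane volume.
func pvcRequestGi(modelBytes int64, headroom float64, floorGi int64) int64 {
	padded := int64(float64(modelBytes) * (1 + headroom))
	giCount := (padded + gi - 1) / gi // ceiling division to whole Gi
	if giCount < floorGi {
		return floorGi
	}
	return giCount
}

func main() {
	// A ~13.8 GB model with 20% headroom rounds up to 16 Gi.
	fmt.Println(pvcRequestGi(13782075600, 0.2, 1)) // 16
}
```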

Any input or idea you might have would be greatly appreciated! 😁

@nstogner
Contributor Author

nstogner commented Nov 23, 2024

@radu-catrangiu Thanks for taking the time to detail all of this. A few things that will make this more complicated:

Overall feedback:

In KubeAI we strive to keep the system as simple as possible. KubeAI could be broken out into multiple services as it stands today, but we have decided to keep it as one monolith in order to avoid exposing microservice operational complexity to the admins who need to run it. My preference would be to use library calls over service calls. Since the full approach will take a good amount of code, I think a simple first step might be to expose a volume size parameter when configuring a cache profile. See:

sharedFilesystem:
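For reference, a configurable size on a cache profile might look something like this (the field names below are purely illustrative and not the actual KubeAI schema):

```yaml
cacheProfiles:
  efs-dynamic:
    sharedFilesystem:
      storageClassName: efs-sc
      # Hypothetical field: overrides the hard-coded 10Gi PVC request.
      storage: 100Gi
```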
