
LoRA Adapters for vLLM & support for s3, gs, oss for pulling adapters and models (to cache) from buckets #304

Merged: 34 commits merged into main from the lora-adapters branch on Nov 24, 2024

Conversation

@nstogner (Contributor) commented Nov 4, 2024

  • Add .spec.adapters
  • Support requesting adapters using the pattern {"model": "<model>_<adapter>", ...} (see the sketch after this list)
  • Load LoRA adapters into running vLLM containers
  • Support updating LoRA adapters without needing to restart vLLM
  • Rewrite .model to the adapter name in the chat request body when proxying to vLLM
  • Add adapters to the model list
  • Add support for s3://, gs://, and oss:// URLs (for adapters and cache loading)
  • Add new cloud credentials to support the new URL schemes
  • Update docs
  • Update Model validation
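
To make the bullets above concrete, here is a minimal sketch (see also the diagrams). Only the adapters / id / url fields come directly from this PR; the model name, adapter ID, bucket path, and the remaining Model fields are illustrative and follow the existing examples in the repo:

```yaml
# Hypothetical Model spec. The .spec.adapters list with id/url is what this PR
# adds; everything else (names, base model, s3:// path) is made up for illustration.
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: tinyllama-chat
spec:
  features: [TextGeneration]
  engine: VLLM
  url: hf://TinyLlama/TinyLlama-1.1B-Chat-v1.0
  adapters:
    - id: colorist
      url: s3://my-bucket/adapters/tinyllama-colorist-lora
```

A chat request would then address the adapter as `<model>_<adapter>`, and the proxy rewrites .model to the adapter before forwarding to vLLM:

```bash
# Hypothetical request; host and endpoint path assume KubeAI's in-cluster
# OpenAI-compatible proxy and should be adjusted for your setup.
curl http://kubeai/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "tinyllama-chat_colorist", "messages": [{"role": "user", "content": "What color is the sky?"}]}'
```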

NOTE:

  • Was unable to test oss:// URLs (had issues opening an account).

FOLLOWUP:

  • Need to add adapter e2e tests (have not found a small enough model with adapters for use in a kind cluster)
  • Need to update chart values.yaml to include the GH-actions-built image for the model loader after merge!

Fixes #132, #303

@nstogner requested a review from @samos123 on November 4, 2024 14:40
@alpe (Contributor) commented Nov 5, 2024

Nice drawing. This is very helpful! 🌻
I am not super familiar with LoRA adapters, but from what I have seen they can be quite large, so caching seems like a good idea. For the non-cache scenario, I would suggest a no-cache or container-managed profile so that skipping the cache does not look like the default.
With on-demand LoRA, disk size may become a problem at some point. This is out of scope here, but a purge job or a retention time may eventually need to be configurable in the profile.

@samos123 (Contributor) commented Nov 5, 2024

Can you show an example that has the url field? I'm assuming the url field must be used to specify the base model?

@nstogner (Contributor, Author) commented Nov 5, 2024

@samos123 I currently have all examples in the diagrams

@samos123 (Contributor) commented Nov 5, 2024

That's where I looked but none of them have the base model URL set?

@nstogner (Contributor, Author) commented Nov 8, 2024

Model .spec.url would be the same as normal.

@nstogner (Contributor, Author) commented Nov 8, 2024

Note, it looks like vLLM supports loading adapters from huggingface: vllm-project/vllm#6234
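
For context, a rough sketch of what registering a Hugging Face adapter at startup looks like when running vLLM directly. The flags are vLLM's standard LoRA options; the base model is assumed, and the adapter is just the one referenced elsewhere in this PR:

```bash
# Sketch: start vLLM with LoRA enabled and an adapter registered by name.
# With the linked change, the adapter path can be a Hugging Face repo ID
# instead of a pre-downloaded local directory.
vllm serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  --enable-lora \
  --lora-modules colorist=jashing/tinyllama-colorist-lora
```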

@nstogner (Contributor, Author) commented Nov 8, 2024

Note, vLLM has an endpoint to support dynamic loading/unloading of adapters: vllm-project/vllm#6566
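
Roughly, that works like this (a sketch based on the linked vLLM PR; the server has to be started with runtime LoRA updating allowed, and the adapter name and path below are placeholders):

```bash
# Sketch: dynamically register a LoRA adapter with a running vLLM server.
# Assumes the server runs with VLLM_ALLOW_RUNTIME_LORA_UPDATING=True.
curl -X POST http://localhost:8000/v1/load_lora_adapter \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "colorist", "lora_path": "/adapters/colorist"}'

# ...and unregister it again:
curl -X POST http://localhost:8000/v1/unload_lora_adapter \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "colorist"}'
```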

#url: hf://meta-llama/Llama-2-7b
adapters:
- id: test
  url: hf://jashing/tinyllama-colorist-lora
A reviewer (Contributor) asked about the adapter snippet above:
does vLLM support directly loading this adapter from HF or is it a hard requirement to download the lora adapter first?

@nstogner (Contributor, Author) replied:

vLLM can load it from HF but not S3

@nstogner changed the title from "WIP: LoRA Adapters" to "WIP: LoRA Adapters for vLLM" on Nov 13, 2024
@nstogner changed the title from "WIP: LoRA Adapters for vLLM" to "LoRA Adapters for vLLM" on Nov 19, 2024
@nstogner changed the title from "LoRA Adapters for vLLM" to "LoRA Adapters for vLLM + more URLs" on Nov 22, 2024
@nstogner changed the title from "LoRA Adapters for vLLM + more URLs" to "LoRA Adapters for vLLM & support for pulling models and adapters from buckets" on Nov 22, 2024
@nstogner changed the title from "LoRA Adapters for vLLM & support for pulling models and adapters from buckets" to "LoRA Adapters for vLLM & support for s3, gs, oss for pulling adapters and models (to cache) from buckets" on Nov 24, 2024
@samos123 (Contributor) commented:

Not sure whether this should work or not. So sharing what I did and the error message I get:

git checkout $THIS_PR
helm install kubeai ./charts/kubeai

k get pods
NAME                         READY   STATUS             RESTARTS     AGE
kubeai-6bf98d5b77-txmpb      0/1     CrashLoopBackOff   1 (2s ago)   5s
openwebui-69ffb7dbb4-xcvlj   0/1     Running            0            5s

k logs -f kubeai-6bf98d5b77-txmpb 
2024-11-24T06:37:29Z    INFO    manager run finished
2024-11-24T06:37:29Z    ERROR   manager failed to run command   {"error": "invalid config: Key: 'System.ModelLoaders' Error:Field validation for 'ModelLoaders' failed on the 'required' tag"}
main.main
        /workspace/cmd/main.go:50
runtime.main
        /usr/local/go/src/runtime/proc.go:271

I was trying to play around with the branch and see how the helm validation worked.

@samos123 (Contributor) left a review

I didn't finish reviewing everything. Leaving partial feedback; I will continue tomorrow.

Resolved review threads:
  • api/v1/model_types.go
  • internal/apiutils/requests.go
  • internal/modelcontroller/engine_vllm.go
  • internal/modelcontroller/adapters.go
  • internal/modelcontroller/pod_utils.go (outdated)
  • internal/modelcontroller/engine_vllm.go
@nstogner (Contributor, Author) commented Nov 24, 2024

> Not sure whether this should work or not. So sharing what I did and the error message I get:
>
> I was trying to play around with the branch and see how the helm validation worked.

Here is what I used to make sure that the most recent image was running:

gcloud container clusters create-auto cluster-1 \
    --location=us-central1

skaffold run -f ./skaffold.yaml --profile kubeai-only-gke --default-repo us-central1-docker.pkg.dev/substratus-dev

@nstogner merged commit b12f811 into main on Nov 24, 2024 (11 checks passed)
@nstogner deleted the lora-adapters branch on November 24, 2024 at 18:21
@alpe (Contributor) commented Nov 26, 2024

nice work!

Successfully merging this pull request may close the following issue: Support dynamic LoRA serving.

3 participants