LoRA Adapters for vLLM & support for s3, gs, oss for pulling adapters and models (to cache) from buckets #304
Conversation
Nice drawing. This is very helpful! 🌻

Can you show an example that has the url field? I'm assuming the url field must be used to specify the base model?

@samos123 I currently have all examples in the diagrams.

That's where I looked, but none of them have the base model URL set?
Note: it looks like vLLM supports loading adapters from Hugging Face: vllm-project/vllm#6234

Note: vLLM has an endpoint to support dynamic loading/unloading of adapters: vllm-project/vllm#6566
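For reference, a rough sketch of how that dynamic load/unload endpoint is used (assuming the `/v1/load_lora_adapter` and `/v1/unload_lora_adapter` routes and the `VLLM_ALLOW_RUNTIME_LORA_UPDATING` variable from vllm-project/vllm#6566; model and adapter names here are illustrative):

```sh
# Start vLLM with runtime LoRA updating enabled (assumed flag/env names from
# vllm-project/vllm#6566; verify against your vLLM version).
VLLM_ALLOW_RUNTIME_LORA_UPDATING=True vllm serve meta-llama/Llama-2-7b --enable-lora

# Register an adapter at runtime; requests can then set "model": "colorist".
curl -X POST http://localhost:8000/v1/load_lora_adapter \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "colorist", "lora_path": "/adapters/tinyllama-colorist-lora"}'

# Unload it again when done.
curl -X POST http://localhost:8000/v1/unload_lora_adapter \
  -H "Content-Type: application/json" \
  -d '{"lora_name": "colorist"}'
```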
```yaml
#url: hf://meta-llama/Llama-2-7b
adapters:
  - id: test
    url: hf://jashing/tinyllama-colorist-lora
```
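To answer the url question above, a sketch of what a full manifest might look like with the base-model `url` uncommented (the `apiVersion`, `kind`, and metadata are assumptions for illustration; only the fields from the snippet above come from this PR):

```yaml
apiVersion: kubeai.org/v1   # assumed API group, for illustration only
kind: Model
metadata:
  name: llama-2-7b-colorist
spec:
  # Base model to serve; the adapters below are loaded on top of it.
  url: hf://meta-llama/Llama-2-7b
  adapters:
    - id: test
      url: hf://jashing/tinyllama-colorist-lora
```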
Does vLLM support directly loading this adapter from HF, or is it a hard requirement to download the LoRA adapter first?
vLLM can load it from HF, but not from S3.
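Loading an adapter straight from HF at server startup would look roughly like this (a sketch, assuming `--lora-modules` resolves an HF repo id as the adapter path per the linked vllm-project/vllm#6234; verify against your vLLM version):

```sh
# Serve the base model with a LoRA adapter pulled from Hugging Face.
vllm serve meta-llama/Llama-2-7b \
  --enable-lora \
  --lora-modules colorist=jashing/tinyllama-colorist-lora
```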
Not sure whether this should work or not, so sharing what I did and the error message I get:

I was trying to play around with the branch to see how the helm validation worked.
I didn't finish it all. Leaving partial feedback; I'll continue tomorrow.
Here is what I used to make sure that the most recent image was running:

```sh
gcloud container clusters create-auto cluster-1 \
  --location=us-central1
skaffold run -f ./skaffold.yaml --profile kubeai-only-gke --default-repo us-central1-docker.pkg.dev/substratus-dev
```
Force-pushed from 07e4c2a to ad72797.
nice work!
Summary:

- Added `.spec.adapters` to the Model spec.
- To use an adapter, set `.model` in the chat request body to `"<model>_<adapter>"` (i.e. `{"model": "<model>_<adapter>", ... }`) when proxying to vLLM; see the example request below.
- Added support for `s3://`, `gs://`, and `oss://` urls (for adapters and cache loading).

NOTE: `oss://` urls... Had issues opening an acct.

FOLLOWUP:

Fixes #132, #303
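A sketch of the resulting chat request against the proxy, assuming the `<model>_<adapter>` naming above and the illustrative manifest earlier in the thread (host and path are assumptions):

```sh
# ".model" carries "<model>_<adapter>"; the proxy resolves the base model,
# routes to a vLLM replica with the adapter loaded, and forwards the request.
curl http://kubeai/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-2-7b-colorist_test",
    "messages": [{"role": "user", "content": "Describe the color #ff8800."}]
  }'
```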