Issue in LoRA Adapter Test #347
Comments
Will look to reproduce this.
It seems the newer version of TinyLlama does have a chat template in tokenizer_config.json: https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0/blob/main/tokenizer_config.json#L29. However, the v0.3 version is missing the chat template: https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.3/blob/main/tokenizer_config.json. The easiest approach for now would be to switch to a base model that has a chat_template specified. I suspect this would just work.
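For reference, a chat_template is a Jinja template stored in tokenizer_config.json. A minimal sketch of what such an entry looks like (illustrative only; the actual TinyLlama v1.0 template at the link above is more elaborate):

```json
{
  "chat_template": "{% for message in messages %}<|{{ message['role'] }}|>\n{{ message['content'] }}{{ eos_token }}\n{% endfor %}{% if add_generation_prompt %}<|assistant|>\n{% endif %}"
}
```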
Let me see if we already have an open issue for the ability to provide a chat template by passing a ConfigMap or simply inline with the model definition. Edit: here is the issue: #243
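As a sketch of the suggested workaround (switching to the v1.0 base model, which ships a chat template), a KubeAI Model manifest might look roughly like this. The field names are from my recollection of the KubeAI Model CRD and the adapter URL is a placeholder, so treat this as illustrative rather than authoritative:

```yaml
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: tinyllama-chat
spec:
  features: [TextGeneration]
  # v1.0 ships a chat_template in tokenizer_config.json; v0.3 does not.
  url: hf://TinyLlama/TinyLlama-1.1B-Chat-v1.0
  engine: VLLM
  adapters:
    # Placeholder repo; substitute the colorist adapter you are deploying.
    - id: colorist
      url: hf://<your-org>/tinyllama-colorist-lora
  resourceProfile: cpu:2
  minReplicas: 1
```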
I tested the new YAML. I think the TinyLlama layer worked, but the colorist adapter still seems problematic.
If I specify the base model in Open WebUI, the chatbot worked. If I specified -colorist, it responded with nothing and got stuck.
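To take the UI out of the loop, you could hit KubeAI's OpenAI-compatible API directly and compare the base model against the adapter. The endpoint path, service address, and model names below are assumptions based on a typical KubeAI setup, using the thread's convention that the adapter is addressed with a -colorist suffix:

```bash
# Base model: this reportedly works.
curl http://localhost:8000/openai/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "tinyllama-chat", "prompt": "Hello", "max_tokens": 32}'

# Adapter: this is the request that hangs with no response.
curl http://localhost:8000/openai/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "tinyllama-chat-colorist", "prompt": "Hello", "max_tokens": 32}'
```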
Summary: it seems KubeAI doesn't always correctly update the adapters for an endpoint; restarting KubeAI works around the issue. Next step: figure out why the endpoints aren't being updated correctly. Shouldn't the reconcile loop be triggered after the pod labels are updated? I was able to reproduce this: it seems the request is never passed from KubeAI to the vLLM backend. I see this in the KubeAI logs:
Even though the host is already up and running, it seems KubeAI is still waiting for it? Here are some more relevant KubeAI logs:
Eventually the adapter does seem to load correctly, though. Checking the pod directly, I do see the adapter label on the pod.
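For anyone else debugging this, the adapter label can be checked directly on the pod (the pod name here is illustrative):

```bash
kubectl get pod <model-pod-name> --show-labels
# Or print only the labels:
kubectl get pod <model-pod-name> -o jsonpath='{.metadata.labels}'
```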
Did you see if the label was present on the Pod before the restart?
Yes, it was already there before the restart.
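A quick cross-check at the API level is to list the models KubeAI currently serves; if the adapter shows up as a pod label but not here, that would point at the endpoint-update bug rather than at vLLM. The path and address assume the same OpenAI-compatible endpoint as in the curl example above:

```bash
curl http://localhost:8000/openai/v1/models
```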
I followed this tutorial to deploy a TinyLlama base model with the colorist adapter. However, after the deployment, I found the following deployment error from kubectl logs -f . All the others seem to be working.
Both containers could run successfully. However, when testing from the UI, if I choose TinyLlamaModel instead of colorist, I get the following error.
If I choose colorist, the prompt freezes, but there is no log.
My YAML is shown below: