CentML AI Inference Provider Integration #810
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What does this PR do?
Add CentML as a Remote Inference Provider in llama-stack.
This PR integrates CentML into llama-stack, enabling users to utilize CentML's models (
meta-llama/Llama-3.3-70B-Instruct
andmeta-llama/Llama-3.1-405B-Instruct-FP8
) for inference tasks like chat and text completions.Right now only supported for conda deployments, simply build with
llama stack build --template centml --image-type conda
and run withllama stack run run.yaml --port <PORT> --env CENTML_API_KEY=<API_KEY>
, then usellama-stack-client
to perform any inference workload as needed.Key Changes:
meta-llama/Llama-3.3-70B-Instruct
andmeta-llama/Llama-3.1-405B-Instruct-FP8
.Addresses issue #809
Test Plan
The integration tests is only ran against the dev cluster becuase of model's availability. Plus the
completion
endpoints and lobprobs / tool-calling ability is not yet implemented on centml end, the tests are failing so currently there is no point to alter it. Please see the following manual testing on the integration tests for more details.Manual Testing:
llama_3b
(production cluster only have70b
and405b
models).Passed test:
Reproduction Instructions:
(P.S: the tests will only run on developement cluster because of model availability (which is this PR is not set to))
Sources
Related Issue: #809
Before submitting