[Tutorials] Add a helm chart for MAX OpenAI API container image
MODULAR_ORIG_COMMIT_REV_ID: 4fe81ce8b59387888d641238fed0cc455ef95831
1 parent ce15fbc, commit 636f83b.
Showing 13 changed files with 972 additions and 0 deletions.
.helmignore
@@ -0,0 +1,19 @@
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*~
# Various IDEs
.project
.idea/
*.tmproj
bin
Chart.yaml
@@ -0,0 +1,20 @@
##===----------------------------------------------------------------------===##
#
# This file is Modular Inc proprietary.
#
##===----------------------------------------------------------------------===##
apiVersion: v2
appVersion: "0.1.0"
description: The MAX platform unifies the leading AI development frameworks (TensorFlow, PyTorch, ONNX) and hardware backends in order to simplify deployment for AI production teams and accelerate innovation for AI developers.
name: max-openai-api-chart
home: https://www.modular.com/
keywords:
  - machine learning
  - inference
sources:
  - https://github.com/modularml/max
maintainers:
  - name: Modular team
    email: [email protected]
    url: https://github.com/modularml/max
version: 0.1.0
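
Since the chart is published to an OCI registry, the metadata above can be inspected straight from the registry before installing; a quick check, assuming the chart is available at the coordinates used in the README below:

```console
helm show chart oci://registry-1.docker.io/modular/max-openai-api-chart --version 0.1.0
```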
README.md
@@ -0,0 +1,193 @@
<!-- markdownlint-disable -->
<!--
NOTE: This file is generated by helm-docs: https://github.com/norwoodj/helm-docs#installation
-->

# MAX OpenAI API Helm chart

The MAX platform unifies the leading AI development frameworks (TensorFlow, PyTorch, ONNX) and hardware backends in order to simplify deployment for AI production teams and accelerate innovation for AI developers.

**Homepage:** <https://www.modular.com/>

## Source Code

* <https://github.com/modularml/max>

## Usage

### Installing the chart

To install this chart using Helm 3, run the following command:

```console
helm install max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
  --version <insert-version> \
  --set huggingfaceRepoId=<insert-huggingface-model-id> \
  --set maxServe.maxLength=512 \
  --set maxServe.maxCacheBatchSize=16 \
  --set envSecret.HUGGING_FACE_HUB_TOKEN=<insert-huggingface-token> \
  --set env.HF_HUB_ENABLE_HF_TRANSFER=1 \
  --wait
```

The command deploys MAX OpenAI API on the Kubernetes cluster in its default configuration. The Values section below lists the parameters that can be configured during installation.
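
To confirm the release came up before sending traffic, the standard Helm and kubectl checks apply; a quick sketch, assuming the release name `max-openai-api` used above:

```console
# show the release status and its rendered notes
helm status max-openai-api

# list the pods created for this release
kubectl get pods -l "app.kubernetes.io/instance=max-openai-api"
```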

### Upgrading the chart

To upgrade the chart with the release name `max-openai-api`:

```console
helm upgrade max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart
```
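
To change a single parameter on upgrade without repeating the full flag list, `--reuse-values` carries over the values from the previous release; a sketch (the parameter is chosen for illustration):

```console
helm upgrade max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
  --reuse-values \
  --set maxServe.maxCacheBatchSize=32
```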

### Uninstalling the chart

To uninstall/delete the `max-openai-api` deployment:

```console
helm delete max-openai-api
```
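
(`helm delete` is an alias of `helm uninstall` in Helm 3.) To confirm removal, the release should no longer appear in the release list:

```console
helm list
```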

### End-to-end example that provisions a K8s cluster and installs MAX OpenAI API

To provision a k8s cluster via `eksctl` and then install MAX OpenAI API, run the following commands:

```console
# provision a k8s cluster (takes 10-15 minutes)
eksctl create cluster \
  --name max-openai-api-demo \
  --region us-east-1 \
  --node-type g5.4xlarge \
  --nodes 1

# create a k8s namespace
kubectl create namespace max-openai-api-demo

# deploy MAX OpenAI API via helm chart (takes 10 minutes)
helm install max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
  --version <insert-version> \
  --namespace max-openai-api-demo \
  --set huggingfaceRepoId=modularai/llama-3.1 \
  --set maxServe.maxLength=512 \
  --set maxServe.maxCacheBatchSize=16 \
  --set envSecret.HUGGING_FACE_HUB_TOKEN=<insert-huggingface-token> \
  --set env.HF_HUB_ENABLE_HF_TRANSFER=1 \
  --timeout 10m0s \
  --wait

# forward the remote k8s port to the local network to access the service locally;
# the command is blocking and takes over the terminal, so use another terminal
# for the subsequent curl, and ctrl-c to stop the port forwarding
POD_NAME=$(kubectl get pods --namespace max-openai-api-demo -l "app.kubernetes.io/name=max-openai-api-chart,app.kubernetes.io/instance=max-openai-api" -o jsonpath="{.items[0].metadata.name}")
CONTAINER_PORT=$(kubectl get pod --namespace max-openai-api-demo $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
kubectl port-forward $POD_NAME 8000:$CONTAINER_PORT --namespace max-openai-api-demo &

# test the service
curl -N http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "modularai/llama-3.1",
        "stream": true,
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Who won the world series in 2020?"}
        ]
      }'

# uninstall MAX OpenAI API
helm uninstall max-openai-api --namespace max-openai-api-demo

# delete the namespace
kubectl delete namespace max-openai-api-demo

# delete the k8s cluster
eksctl delete cluster \
  --name max-openai-api-demo \
  --region us-east-1
```
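
Because the endpoint implements the OpenAI chat completions API, a non-streaming request also works while the port-forward is active; this variant (not part of the original walkthrough) returns the whole response as a single JSON object:

```console
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "modularai/llama-3.1",
        "stream": false,
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Who won the world series in 2020?"}
        ]
      }'
```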

## Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| affinity | object | `{}` | Affinity to be added to all deployments |
| env | object | `{}` | Environment variables that will be passed into pods |
| envFromSecret | string | `"{{ template \"max.fullname\" . }}-env"` | The name of the secret which we will use to populate env vars in deployed pods. This can be useful for secret keys, etc. |
| envFromSecrets | list | `[]` | This can be a list of templated strings |
| envRaw | list | `[]` | Environment variables in RAW format that will be passed into pods |
| envSecret | object | `{}` | Environment variables to pass as secrets |
| fullnameOverride | string | `nil` | Provide a name to override the full names of resources |
| image.pullPolicy | string | `"IfNotPresent"` | |
| image.repository | string | `"modular/max-openai-api"` | |
| image.tag | string | `"latest"` | |
| imagePullSecrets | list | `[]` | |
| inferenceServer.affinity | object | `{}` | Affinity to be added to inferenceServer deployment |
| inferenceServer.args | list | See `values.yaml` | Arguments to pass to the node entrypoint. If defined, it overwrites the default args value set by .Values.max-serve |
| inferenceServer.autoscaling.enabled | bool | `false` | |
| inferenceServer.autoscaling.maxReplicas | int | `2` | |
| inferenceServer.autoscaling.minReplicas | int | `1` | |
| inferenceServer.autoscaling.targetCPUUtilizationPercentage | int | `80` | |
| inferenceServer.containerSecurityContext | object | `{}` | |
| inferenceServer.deploymentAnnotations | object | `{}` | Annotations to be added to inferenceServer deployment |
| inferenceServer.deploymentLabels | object | `{}` | Labels to be added to inferenceServer deployment |
| inferenceServer.env | object | `{}` | |
| inferenceServer.extraContainers | list | `[]` | Launch additional containers into inferenceServer pod |
| inferenceServer.livenessProbe.failureThreshold | int | `3` | |
| inferenceServer.livenessProbe.httpGet.path | string | `"/v1/health"` | |
| inferenceServer.livenessProbe.httpGet.port | string | `"http"` | |
| inferenceServer.livenessProbe.initialDelaySeconds | int | `1` | |
| inferenceServer.livenessProbe.periodSeconds | int | `15` | |
| inferenceServer.livenessProbe.successThreshold | int | `1` | |
| inferenceServer.livenessProbe.timeoutSeconds | int | `1` | |
| inferenceServer.nodeSelector | object | `{}` | NodeSelector to be added to inferenceServer deployment |
| inferenceServer.podAnnotations | object | `{}` | Annotations to be added to inferenceServer pods |
| inferenceServer.podLabels | object | `{}` | Labels to be added to inferenceServer pods |
| inferenceServer.podSecurityContext | object | `{}` | |
| inferenceServer.readinessProbe.failureThreshold | int | `3` | |
| inferenceServer.readinessProbe.httpGet.path | string | `"/v1/health"` | |
| inferenceServer.readinessProbe.httpGet.port | string | `"http"` | |
| inferenceServer.readinessProbe.initialDelaySeconds | int | `1` | |
| inferenceServer.readinessProbe.periodSeconds | int | `15` | |
| inferenceServer.readinessProbe.successThreshold | int | `1` | |
| inferenceServer.readinessProbe.timeoutSeconds | int | `1` | |
| inferenceServer.replicaCount | int | `1` | |
| inferenceServer.resources | object | `{}` | Resource settings for the inferenceServer pods; these settings overwrite existing values from the global resources object defined above. |
| inferenceServer.startupProbe.failureThreshold | int | `60` | |
| inferenceServer.startupProbe.httpGet.path | string | `"/v1/health"` | |
| inferenceServer.startupProbe.httpGet.port | string | `"http"` | |
| inferenceServer.startupProbe.initialDelaySeconds | int | `1` | |
| inferenceServer.startupProbe.periodSeconds | int | `5` | |
| inferenceServer.startupProbe.successThreshold | int | `1` | |
| inferenceServer.startupProbe.timeoutSeconds | int | `1` | |
| inferenceServer.strategy | object | `{}` | |
| inferenceServer.tolerations | list | `[]` | Tolerations to be added to inferenceServer deployment |
| inferenceServer.topologySpreadConstraints | list | `[]` | TopologySpreadConstraints to be added to inferenceServer deployments |
| inferenceServer.volumeMounts | list | `[]` | Volume mounts to be added to inferenceServer pod |
| inferenceServer.volumes | list | `[]` | Volumes to mount into inferenceServer pod |
| ingress.annotations | object | `{}` | |
| ingress.enabled | bool | `false` | |
| ingress.extraHostsRaw | list | `[]` | |
| ingress.hosts | list | `[]` | |
| ingress.ingressClassName | string | `nil` | |
| ingress.path | string | `"/"` | |
| ingress.pathType | string | `"ImplementationSpecific"` | |
| ingress.tls | list | `[]` | |
| maxServe | object | `{"cacheStrategy":"continuous","huggingfaceRepoId":"modularai/llama-3.1","maxCacheBatchSize":"250","maxLength":"2048","maxNumSteps":"10"}` | MAX Serve arguments |
| nameOverride | string | `nil` | Provide a name to override the name of the chart |
| nodeSelector | object | `{}` | NodeSelector to be added to all deployments |
| resources | object | `{}` | |
| runAsUser | int | `0` | User ID directive. This user must have enough permissions to run the bootstrap script. Running containers as root is not recommended in production; change this to another UID, e.g. 1000, to be more secure. |
| service.annotations | object | `{}` | |
| service.loadBalancerIP | string | `nil` | |
| service.ports[0].name | string | `"http"` | |
| service.ports[0].port | int | `8000` | |
| service.ports[0].protocol | string | `"TCP"` | |
| service.ports[0].targetPort | int | `8000` | |
| service.type | string | `"ClusterIP"` | |
| serviceAccount.annotations | object | `{}` | |
| serviceAccount.create | bool | `false` | Create custom service account for MAX Serving. If create: true and serviceAccountName is not provided, `max.fullname` will be used. |
| serviceAccountName | string | `nil` | Specify service account name to be used |
| tolerations | list | `[]` | Tolerations to be added to all deployments |
| topologySpreadConstraints | list | `[]` | TopologySpreadConstraints to be added to all deployments |
| volumeMounts | list | `[]` | |
| volumes | list | `[]` | |
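
The same parameters can also be collected in a values file instead of repeated `--set` flags. A minimal sketch mirroring the install flags used above (the file name is arbitrary; the token stays on the command line so it is not written to disk):

```console
cat > my-values.yaml <<'EOF'
huggingfaceRepoId: modularai/llama-3.1
maxServe:
  maxLength: 512
  maxCacheBatchSize: 16
env:
  HF_HUB_ENABLE_HF_TRANSFER: "1"
EOF

helm install max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
  --version <insert-version> \
  --set envSecret.HUGGING_FACE_HUB_TOKEN=<insert-huggingface-token> \
  -f my-values.yaml \
  --wait
```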
README.md.gotmpl
@@ -0,0 +1,112 @@
<!-- markdownlint-disable -->
<!--
NOTE: This file is generated by helm-docs: https://github.com/norwoodj/helm-docs#installation
-->

# MAX OpenAI API Helm chart

{{ template "chart.deprecationWarning" . }}

{{ template "chart.description" . }}

{{ template "chart.homepageLine" . }}

{{ template "chart.sourcesSection" . }}

## Usage

### Installing the chart

To install this chart using Helm 3, run the following command:

```console
helm install max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
  --version <insert-version> \
  --set huggingfaceRepoId=<insert-huggingface-model-id> \
  --set maxServe.maxLength=512 \
  --set maxServe.maxCacheBatchSize=16 \
  --set envSecret.HUGGING_FACE_HUB_TOKEN=<insert-huggingface-token> \
  --set env.HF_HUB_ENABLE_HF_TRANSFER=1 \
  --wait
```

The command deploys MAX OpenAI API on the Kubernetes cluster in its default configuration. The Values section below lists the parameters that can be configured during installation.

### Upgrading the chart

To upgrade the chart with the release name `max-openai-api`:

```console
helm upgrade max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart
```

### Uninstalling the chart

To uninstall/delete the `max-openai-api` deployment:

```console
helm delete max-openai-api
```

### End-to-end example that provisions a K8s cluster and installs MAX OpenAI API

To provision a k8s cluster via `eksctl` and then install MAX OpenAI API, run the following commands:

```console
# provision a k8s cluster (takes 10-15 minutes)
eksctl create cluster \
  --name max-openai-api-demo \
  --region us-east-1 \
  --node-type g5.4xlarge \
  --nodes 1

# create a k8s namespace
kubectl create namespace max-openai-api-demo

# deploy MAX OpenAI API via helm chart (takes 10 minutes)
helm install max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
  --version <insert-version> \
  --namespace max-openai-api-demo \
  --set huggingfaceRepoId=modularai/llama-3.1 \
  --set maxServe.maxLength=512 \
  --set maxServe.maxCacheBatchSize=16 \
  --set envSecret.HUGGING_FACE_HUB_TOKEN=<insert-huggingface-token> \
  --set env.HF_HUB_ENABLE_HF_TRANSFER=1 \
  --timeout 10m0s \
  --wait

# forward the remote k8s port to the local network to access the service locally;
# the command is blocking and takes over the terminal, so use another terminal
# for the subsequent curl, and ctrl-c to stop the port forwarding
POD_NAME=$(kubectl get pods --namespace max-openai-api-demo -l "app.kubernetes.io/name=max-openai-api-chart,app.kubernetes.io/instance=max-openai-api" -o jsonpath="{.items[0].metadata.name}")
CONTAINER_PORT=$(kubectl get pod --namespace max-openai-api-demo $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
kubectl port-forward $POD_NAME 8000:$CONTAINER_PORT --namespace max-openai-api-demo &

# test the service
curl -N http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "modularai/llama-3.1",
        "stream": true,
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Who won the world series in 2020?"}
        ]
      }'

# uninstall MAX OpenAI API
helm uninstall max-openai-api --namespace max-openai-api-demo

# delete the namespace
kubectl delete namespace max-openai-api-demo

# delete the k8s cluster
eksctl delete cluster \
  --name max-openai-api-demo \
  --region us-east-1
```

{{ template "chart.requirementsSection" . }}

{{ template "chart.valuesSection" . }}
templates/NOTES.txt
@@ -0,0 +1,22 @@
1. Get the application URL by running these commands:
{{- if .Values.ingress.enabled }}
{{- range .Values.ingress.hosts }}
  http{{ if $.Values.ingress.tls }}s{{ end }}://{{ . }}{{ $.Values.ingress.path }}
{{- end }}
{{- else if contains "NodePort" .Values.service.type }}
  NODE_PORT=$(kubectl get --namespace {{ .Release.Namespace }} -o jsonpath="{.spec.ports[0].nodePort}" services {{ template "max.fullname" . }})
  NODE_IP=$(kubectl get nodes --namespace {{ .Release.Namespace }} -o jsonpath="{.items[0].status.addresses[0].address}")
  echo http://$NODE_IP:$NODE_PORT
{{- else if contains "LoadBalancer" .Values.service.type }}
  NOTE: It may take a few minutes for the LoadBalancer IP to be available.
  You can watch its status by running 'kubectl get svc -w {{ template "max.fullname" . }}'
  SERVICE_IP=$(kubectl get svc --namespace {{ .Release.Namespace }} {{ template "max.fullname" . }} -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  echo http://$SERVICE_IP:{{ .Values.service.port }}
{{- else if contains "ClusterIP" .Values.service.type }}
  POD_NAME=$(kubectl get pods --namespace {{ .Release.Namespace }} -l "app.kubernetes.io/name={{ template "max.name" . }},app.kubernetes.io/instance={{ .Release.Name }}" -o jsonpath="{.items[0].metadata.name}")
  CONTAINER_PORT=$(kubectl get pod --namespace {{ .Release.Namespace }} $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
  echo "The application is available at the following DNS name from within your cluster:"
  echo "{{ .Release.Name }}.{{ .Release.Namespace }}.svc.cluster.local:$CONTAINER_PORT"
  echo "Or use the following command to forward ports and visit it locally at http://127.0.0.1:8000"
  echo "kubectl port-forward $POD_NAME 8000:$CONTAINER_PORT --namespace {{ .Release.Namespace }}"
{{- end }}
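
These notes are rendered and printed once at install time; they can be replayed later for a live release. A quick check, assuming the release from the end-to-end example above:

```console
helm get notes max-openai-api --namespace max-openai-api-demo
```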