[Tutorials] Add a helm chart for MAX OpenAI API container image
MODULAR_ORIG_COMMIT_REV_ID: 4fe81ce8b59387888d641238fed0cc455ef95831
1 parent ce15fbc, commit 636f83b.
Showing 13 changed files with 972 additions and 0 deletions.
.helmignore
@@ -0,0 +1,19 @@
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*~
# Various IDEs
.project
.idea/
*.tmproj
bin
Chart.yaml
@@ -0,0 +1,20 @@
##===----------------------------------------------------------------------===##
#
# This file is Modular Inc proprietary.
#
##===----------------------------------------------------------------------===##
apiVersion: v2
appVersion: "0.1.0"
description: The MAX platform unifies the leading AI development frameworks (TensorFlow, PyTorch, ONNX) and hardware backends in order to simplify deployment for AI production teams and accelerate innovation for AI developers.
name: max-openai-api-chart
home: https://www.modular.com/
keywords:
  - machine learning
  - inference
sources:
  - https://github.com/modularml/max
maintainers:
  - name: Modular team
    email: [email protected]
    url: https://github.com/modularml/max
version: 0.1.0
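
Since the chart is published to an OCI registry, the metadata above can be inspected straight from the registry before installing; a quick check, assuming the chart is available at the coordinates used in the README below:

```console
helm show chart oci://registry-1.docker.io/modular/max-openai-api-chart --version 0.1.0
```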
README.md
@@ -0,0 +1,193 @@
<!-- markdownlint-disable -->
<!--
NOTE: This file is generated by helm-docs: https://github.com/norwoodj/helm-docs#installation
-->

# MAX OpenAI API Helm chart

The MAX platform unifies the leading AI development frameworks (TensorFlow, PyTorch, ONNX) and hardware backends in order to simplify deployment for AI production teams and accelerate innovation for AI developers.

**Homepage:** <https://www.modular.com/>

## Source Code

* <https://github.com/modularml/max>

## Usage

### Installing the chart

To install this chart using Helm 3, run the following command:

```console
helm install max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
  --version <insert-version> \
  --set huggingfaceRepoId=<insert-huggingface-model-id> \
  --set maxServe.maxLength=512 \
  --set maxServe.maxCacheBatchSize=16 \
  --set envSecret.HUGGING_FACE_HUB_TOKEN=<insert-huggingface-token> \
  --set env.HF_HUB_ENABLE_HF_TRANSFER=1 \
  --wait
```

The command deploys MAX OpenAI API on the Kubernetes cluster in its default configuration. The Values section below lists the parameters that can be configured during installation.
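
To confirm the release came up before sending traffic, the standard Helm and kubectl checks apply; a quick sketch, assuming the release name `max-openai-api` used above:

```console
# show the release status and its rendered notes
helm status max-openai-api

# list the pods created for this release
kubectl get pods -l "app.kubernetes.io/instance=max-openai-api"
```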

### Upgrading the chart

To upgrade the chart with the release name `max-openai-api`:

```console
helm upgrade max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart
```
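
To change a single parameter on upgrade without repeating the full flag list, `--reuse-values` carries over the values from the previous release; a sketch (the parameter is chosen for illustration):

```console
helm upgrade max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
  --reuse-values \
  --set maxServe.maxCacheBatchSize=32
```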

### Uninstalling the chart

To uninstall/delete the `max-openai-api` deployment:

```console
helm delete max-openai-api
```
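
(`helm delete` is an alias of `helm uninstall` in Helm 3.) To confirm removal, the release should no longer appear in the release list:

```console
helm list
```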

### End-to-end example that provisions a K8s cluster and installs MAX OpenAI API

To provision a k8s cluster via `eksctl` and then install MAX OpenAI API, run the following commands:

```console
# provision a k8s cluster (takes 10-15 minutes)
eksctl create cluster \
  --name max-openai-api-demo \
  --region us-east-1 \
  --node-type g5.4xlarge \
  --nodes 1

# create a k8s namespace
kubectl create namespace max-openai-api-demo

# deploy MAX OpenAI API via helm chart (takes 10 minutes)
helm install max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
  --version <insert-version> \
  --namespace max-openai-api-demo \
  --set huggingfaceRepoId=modularai/llama-3.1 \
  --set maxServe.maxLength=512 \
  --set maxServe.maxCacheBatchSize=16 \
  --set envSecret.HUGGING_FACE_HUB_TOKEN=<insert-huggingface-token> \
  --set env.HF_HUB_ENABLE_HF_TRANSFER=1 \
  --timeout 10m0s \
  --wait

# forward the remote k8s port to the local network to access the service locally;
# the command is blocking and takes over the terminal, so use another terminal
# for the subsequent curl, and ctrl-c to stop the port forwarding
POD_NAME=$(kubectl get pods --namespace max-openai-api-demo -l "app.kubernetes.io/name=max-openai-api-chart,app.kubernetes.io/instance=max-openai-api" -o jsonpath="{.items[0].metadata.name}")
CONTAINER_PORT=$(kubectl get pod --namespace max-openai-api-demo $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
kubectl port-forward $POD_NAME 8000:$CONTAINER_PORT --namespace max-openai-api-demo &

# test the service
curl -N http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "modularai/llama-3.1",
        "stream": true,
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Who won the world series in 2020?"}
        ]
      }'

# uninstall MAX OpenAI API
helm uninstall max-openai-api --namespace max-openai-api-demo

# delete the namespace
kubectl delete namespace max-openai-api-demo

# delete the k8s cluster
eksctl delete cluster \
  --name max-openai-api-demo \
  --region us-east-1
```
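
Because the endpoint implements the OpenAI chat completions API, a non-streaming request also works while the port-forward is active; this variant (not part of the original walkthrough) returns the whole response as a single JSON object:

```console
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "modularai/llama-3.1",
        "stream": false,
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Who won the world series in 2020?"}
        ]
      }'
```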

## Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| affinity | object | `{}` | Affinity to be added to all deployments |
| env | object | `{}` | Environment variables that will be passed into pods |
| envFromSecret | string | `"{{ template \"max.fullname\" . }}-env"` | The name of the secret which we will use to populate env vars in deployed pods. This can be useful for secret keys, etc. |
| envFromSecrets | list | `[]` | This can be a list of templated strings |
| envRaw | list | `[]` | Environment variables in RAW format that will be passed into pods |
| envSecret | object | `{}` | Environment variables to pass as secrets |
| fullnameOverride | string | `nil` | Provide a name to override the full names of resources |
| image.pullPolicy | string | `"IfNotPresent"` | |
| image.repository | string | `"modular/max-openai-api"` | |
| image.tag | string | `"latest"` | |
| imagePullSecrets | list | `[]` | |
| inferenceServer.affinity | object | `{}` | Affinity to be added to inferenceServer deployment |
| inferenceServer.args | list | See `values.yaml` | Arguments to pass to the node entrypoint. If defined, it overwrites the default args value set by .Values.max-serve |
| inferenceServer.autoscaling.enabled | bool | `false` | |
| inferenceServer.autoscaling.maxReplicas | int | `2` | |
| inferenceServer.autoscaling.minReplicas | int | `1` | |
| inferenceServer.autoscaling.targetCPUUtilizationPercentage | int | `80` | |
| inferenceServer.containerSecurityContext | object | `{}` | |
| inferenceServer.deploymentAnnotations | object | `{}` | Annotations to be added to inferenceServer deployment |
| inferenceServer.deploymentLabels | object | `{}` | Labels to be added to inferenceServer deployment |
| inferenceServer.env | object | `{}` | |
| inferenceServer.extraContainers | list | `[]` | Launch additional containers into inferenceServer pod |
| inferenceServer.livenessProbe.failureThreshold | int | `3` | |
| inferenceServer.livenessProbe.httpGet.path | string | `"/v1/health"` | |
| inferenceServer.livenessProbe.httpGet.port | string | `"http"` | |
| inferenceServer.livenessProbe.initialDelaySeconds | int | `1` | |
| inferenceServer.livenessProbe.periodSeconds | int | `15` | |
| inferenceServer.livenessProbe.successThreshold | int | `1` | |
| inferenceServer.livenessProbe.timeoutSeconds | int | `1` | |
| inferenceServer.nodeSelector | object | `{}` | NodeSelector to be added to inferenceServer deployment |
| inferenceServer.podAnnotations | object | `{}` | Annotations to be added to inferenceServer pods |
| inferenceServer.podLabels | object | `{}` | Labels to be added to inferenceServer pods |
| inferenceServer.podSecurityContext | object | `{}` | |
| inferenceServer.readinessProbe.failureThreshold | int | `3` | |
| inferenceServer.readinessProbe.httpGet.path | string | `"/v1/health"` | |
| inferenceServer.readinessProbe.httpGet.port | string | `"http"` | |
| inferenceServer.readinessProbe.initialDelaySeconds | int | `1` | |
| inferenceServer.readinessProbe.periodSeconds | int | `15` | |
| inferenceServer.readinessProbe.successThreshold | int | `1` | |
| inferenceServer.readinessProbe.timeoutSeconds | int | `1` | |
| inferenceServer.replicaCount | int | `1` | |
| inferenceServer.resources | object | `{}` | Resource settings for the inferenceServer pods; these settings overwrite existing values from the global resources object defined above. |
| inferenceServer.startupProbe.failureThreshold | int | `60` | |
| inferenceServer.startupProbe.httpGet.path | string | `"/v1/health"` | |
| inferenceServer.startupProbe.httpGet.port | string | `"http"` | |
| inferenceServer.startupProbe.initialDelaySeconds | int | `1` | |
| inferenceServer.startupProbe.periodSeconds | int | `5` | |
| inferenceServer.startupProbe.successThreshold | int | `1` | |
| inferenceServer.startupProbe.timeoutSeconds | int | `1` | |
| inferenceServer.strategy | object | `{}` | |
| inferenceServer.tolerations | list | `[]` | Tolerations to be added to inferenceServer deployment |
| inferenceServer.topologySpreadConstraints | list | `[]` | TopologySpreadConstraints to be added to inferenceServer deployments |
| inferenceServer.volumeMounts | list | `[]` | Volume mounts to be added to inferenceServer pod |
| inferenceServer.volumes | list | `[]` | Volumes to mount into inferenceServer pod |
| ingress.annotations | object | `{}` | |
| ingress.enabled | bool | `false` | |
| ingress.extraHostsRaw | list | `[]` | |
| ingress.hosts | list | `[]` | |
| ingress.ingressClassName | string | `nil` | |
| ingress.path | string | `"/"` | |
| ingress.pathType | string | `"ImplementationSpecific"` | |
| ingress.tls | list | `[]` | |
| maxServe | object | `{"cacheStrategy":"continuous","huggingfaceRepoId":"modularai/llama-3.1","maxCacheBatchSize":"250","maxLength":"2048","maxNumSteps":"10"}` | MAX Serve arguments |
| nameOverride | string | `nil` | Provide a name to override the name of the chart |
| nodeSelector | object | `{}` | NodeSelector to be added to all deployments |
| resources | object | `{}` | |
| runAsUser | int | `0` | User ID directive. This user must have enough permissions to run the bootstrap script. Running containers as root is not recommended in production; change this to another UID, e.g. 1000, to be more secure. |
| service.annotations | object | `{}` | |
| service.loadBalancerIP | string | `nil` | |
| service.ports[0].name | string | `"http"` | |
| service.ports[0].port | int | `8000` | |
| service.ports[0].protocol | string | `"TCP"` | |
| service.ports[0].targetPort | int | `8000` | |
| service.type | string | `"ClusterIP"` | |
| serviceAccount.annotations | object | `{}` | |
| serviceAccount.create | bool | `false` | Create custom service account for MAX Serving. If create: true and serviceAccountName is not provided, `max.fullname` will be used. |
| serviceAccountName | string | `nil` | Specify service account name to be used |
| tolerations | list | `[]` | Tolerations to be added to all deployments |
| topologySpreadConstraints | list | `[]` | TopologySpreadConstraints to be added to all deployments |
| volumeMounts | list | `[]` | |
| volumes | list | `[]` | |
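
The same parameters can also be collected in a values file instead of repeated `--set` flags. A minimal sketch mirroring the install flags used above (the file name is arbitrary; the token stays on the command line so it is not written to disk):

```console
cat > my-values.yaml <<'EOF'
huggingfaceRepoId: modularai/llama-3.1
maxServe:
  maxLength: 512
  maxCacheBatchSize: 16
env:
  HF_HUB_ENABLE_HF_TRANSFER: "1"
EOF

helm install max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
  --version <insert-version> \
  --set envSecret.HUGGING_FACE_HUB_TOKEN=<insert-huggingface-token> \
  -f my-values.yaml \
  --wait
```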
README.md.gotmpl
@@ -0,0 +1,112 @@
<!-- markdownlint-disable -->
<!--
NOTE: This file is generated by helm-docs: https://github.com/norwoodj/helm-docs#installation
-->

# MAX OpenAI API Helm chart

{{ template "chart.deprecationWarning" . }}

{{ template "chart.description" . }}

{{ template "chart.homepageLine" . }}

{{ template "chart.sourcesSection" . }}

## Usage

### Installing the chart

To install this chart using Helm 3, run the following command:

```console
helm install max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
  --version <insert-version> \
  --set huggingfaceRepoId=<insert-huggingface-model-id> \
  --set maxServe.maxLength=512 \
  --set maxServe.maxCacheBatchSize=16 \
  --set envSecret.HUGGING_FACE_HUB_TOKEN=<insert-huggingface-token> \
  --set env.HF_HUB_ENABLE_HF_TRANSFER=1 \
  --wait
```

The command deploys MAX OpenAI API on the Kubernetes cluster in its default configuration. The Values section below lists the parameters that can be configured during installation.

### Upgrading the chart

To upgrade the chart with the release name `max-openai-api`:

```console
helm upgrade max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart
```

### Uninstalling the chart

To uninstall/delete the `max-openai-api` deployment:

```console
helm delete max-openai-api
```

### End-to-end example that provisions a K8s cluster and installs MAX OpenAI API

To provision a k8s cluster via `eksctl` and then install MAX OpenAI API, run the following commands:

```console
# provision a k8s cluster (takes 10-15 minutes)
eksctl create cluster \
  --name max-openai-api-demo \
  --region us-east-1 \
  --node-type g5.4xlarge \
  --nodes 1

# create a k8s namespace
kubectl create namespace max-openai-api-demo

# deploy MAX OpenAI API via helm chart (takes 10 minutes)
helm install max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
  --version <insert-version> \
  --namespace max-openai-api-demo \
  --set huggingfaceRepoId=modularai/llama-3.1 \
  --set maxServe.maxLength=512 \
  --set maxServe.maxCacheBatchSize=16 \
  --set envSecret.HUGGING_FACE_HUB_TOKEN=<insert-huggingface-token> \
  --set env.HF_HUB_ENABLE_HF_TRANSFER=1 \
  --timeout 10m0s \
  --wait

# forward the remote k8s port to the local network to access the service locally;
# the command is blocking and takes over the terminal, so use another terminal
# for the subsequent curl, and ctrl-c to stop the port forwarding
POD_NAME=$(kubectl get pods --namespace max-openai-api-demo -l "app.kubernetes.io/name=max-openai-api-chart,app.kubernetes.io/instance=max-openai-api" -o jsonpath="{.items[0].metadata.name}")
CONTAINER_PORT=$(kubectl get pod --namespace max-openai-api-demo $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
kubectl port-forward $POD_NAME 8000:$CONTAINER_PORT --namespace max-openai-api-demo &

# test the service
curl -N http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "modularai/llama-3.1",
        "stream": true,
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Who won the world series in 2020?"}
        ]
      }'

# uninstall MAX OpenAI API
helm uninstall max-openai-api --namespace max-openai-api-demo

# delete the namespace
kubectl delete namespace max-openai-api-demo

# delete the k8s cluster
eksctl delete cluster \
  --name max-openai-api-demo \
  --region us-east-1
```

{{ template "chart.requirementsSection" . }}

{{ template "chart.valuesSection" . }}
templates/NOTES.txt
@@ -0,0 +1,22 @@
1. Get the application URL by running these commands:
{{- if .Values.ingress.enabled }}
{{- range .Values.ingress.hosts }}
  http{{ if $.Values.ingress.tls }}s{{ end }}://{{ . }}{{ $.Values.ingress.path }}
{{- end }}
{{- else if contains "NodePort" .Values.service.type }}
  NODE_PORT=$(kubectl get --namespace {{ .Release.Namespace }} -o jsonpath="{.spec.ports[0].nodePort}" services {{ template "max.fullname" . }})
  NODE_IP=$(kubectl get nodes --namespace {{ .Release.Namespace }} -o jsonpath="{.items[0].status.addresses[0].address}")
  echo http://$NODE_IP:$NODE_PORT
{{- else if contains "LoadBalancer" .Values.service.type }}
  NOTE: It may take a few minutes for the LoadBalancer IP to be available.
  You can watch its status by running 'kubectl get svc -w {{ template "max.fullname" . }}'
  SERVICE_IP=$(kubectl get svc --namespace {{ .Release.Namespace }} {{ template "max.fullname" . }} -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  echo http://$SERVICE_IP:{{ .Values.service.port }}
{{- else if contains "ClusterIP" .Values.service.type }}
  POD_NAME=$(kubectl get pods --namespace {{ .Release.Namespace }} -l "app.kubernetes.io/name={{ template "max.name" . }},app.kubernetes.io/instance={{ .Release.Name }}" -o jsonpath="{.items[0].metadata.name}")
  CONTAINER_PORT=$(kubectl get pod --namespace {{ .Release.Namespace }} $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
  echo "The application is available at the following DNS name from within your cluster:"
  echo "{{ .Release.Name }}.{{ .Release.Namespace }}.svc.cluster.local:$CONTAINER_PORT"
  echo "Or use the following command to forward ports and visit it locally at http://127.0.0.1:8000"
  echo "kubectl port-forward $POD_NAME 8000:$CONTAINER_PORT --namespace {{ .Release.Namespace }}"
{{- end }}
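
These notes are rendered and printed once at install time; they can be replayed later for a live release. A quick check, assuming the release from the end-to-end example above:

```console
helm get notes max-openai-api --namespace max-openai-api-demo
```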