Commit 636f83b

[Tutorials] Add a helm chart for MAX OpenAI API container image

MODULAR_ORIG_COMMIT_REV_ID: 4fe81ce8b59387888d641238fed0cc455ef95831

steventr authored and modularbot committed Dec 17, 2024
1 parent ce15fbc commit 636f83b

Showing 13 changed files with 972 additions and 0 deletions.
19 changes: 19 additions & 0 deletions tutorials/helm/max-openai-api/.helmignore
@@ -0,0 +1,19 @@
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*~
# Various IDEs
.project
.idea/
*.tmproj
bin
20 changes: 20 additions & 0 deletions tutorials/helm/max-openai-api/Chart.yaml
@@ -0,0 +1,20 @@
##===----------------------------------------------------------------------===##
#
# This file is Modular Inc proprietary.
#
##===----------------------------------------------------------------------===##
apiVersion: v2
appVersion: "0.1.0"
description: The MAX platform unifies the leading AI development frameworks (TensorFlow, PyTorch, ONNX) and hardware backends in order to simplify deployment for AI production teams and accelerate innovation for AI developers.
name: max-openai-api-chart
home: https://www.modular.com/
keywords:
- machine learning
- inference
sources:
- https://github.com/modularml/max
maintainers:
- name: Modular team
email: [email protected]
url: https://github.com/modularml/max
version: 0.1.0
193 changes: 193 additions & 0 deletions tutorials/helm/max-openai-api/README.md
@@ -0,0 +1,193 @@
<!-- markdownlint-disable -->
<!--
NOTE: This file is generated by helm-docs: https://github.com/norwoodj/helm-docs#installation
-->

# MAX OpenAI API Helm chart

The MAX platform unifies the leading AI development frameworks (TensorFlow, PyTorch, ONNX) and hardware backends in order to simplify deployment for AI production teams and accelerate innovation for AI developers.

**Homepage:** <https://www.modular.com/>

## Source Code

* <https://github.com/modularml/max>

## Usage

### Installing the chart

To install this chart using Helm 3, run the following command:

```console
helm install max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
--version <insert-version> \
--set huggingfaceRepoId=<insert-huggingface-model-id> \
--set maxServe.maxLength=512 \
--set maxServe.maxCacheBatchSize=16 \
--set envSecret.HUGGING_FACE_HUB_TOKEN=<insert-huggingface-token> \
--set env.HF_HUB_ENABLE_HF_TRANSFER=1 \
--wait
```

The command deploys MAX OpenAI API on the Kubernetes cluster in the default configuration. The Values reference section below lists the parameters that can be configured during installation.
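
Alternatively, the same parameters can be supplied from a values file instead of repeated `--set` flags. The following is a minimal sketch, assuming a local file named `my-values.yaml` (the file name is arbitrary; the keys mirror the Values section below):

```console
# write a small values file (keys mirror the Values section below)
cat <<'EOF' > my-values.yaml
maxServe:
  maxLength: "512"
  maxCacheBatchSize: "16"
envSecret:
  HUGGING_FACE_HUB_TOKEN: "<insert-huggingface-token>"
env:
  HF_HUB_ENABLE_HF_TRANSFER: "1"
EOF

# install the chart using the values file
helm install max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
  --version <insert-version> \
  --set huggingfaceRepoId=<insert-huggingface-model-id> \
  -f my-values.yaml \
  --wait
```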

### Upgrading the chart

To upgrade the chart with the release name `max-openai-api`:

```console
helm upgrade max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart
```
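
To change configuration as part of an upgrade, pass the desired overrides again; `--reuse-values` keeps previously supplied values that are not explicitly overridden. A sketch (the version pin and batch size shown here are placeholders):

```console
helm upgrade max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
  --version <insert-version> \
  --reuse-values \
  --set maxServe.maxCacheBatchSize=32 \
  --wait
```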

### Uninstalling the chart

To uninstall/delete the `max-openai-api` deployment:

```console
helm delete max-openai-api
```
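
To confirm the release has been removed (or to see which releases are currently installed), `helm list` can be used; add `--namespace` if the chart was installed into a specific namespace:

```console
helm list --all-namespaces
```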

### End-to-end example that provisions a K8s cluster and installs MAX OpenAI API

To provision a k8s cluster via `eksctl` and then install MAX OpenAI API, run the following commands:

```console
# provision a k8s cluster (takes 10-15 minutes)
eksctl create cluster \
--name max-openai-api-demo \
--region us-east-1 \
--node-type g5.4xlarge \
--nodes 1

# create a k8s namespace
kubectl create namespace max-openai-api-demo

# deploy MAX OpenAI API via helm chart (takes 10 minutes)
helm install max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
--version <insert-version> \
--namespace max-openai-api-demo \
--set huggingfaceRepoId=modularai/llama-3.1 \
--set maxServe.maxLength=512 \
--set maxServe.maxCacheBatchSize=16 \
--set envSecret.HUGGING_FACE_HUB_TOKEN=<insert-huggingface-token> \
--set env.HF_HUB_ENABLE_HF_TRANSFER=1 \
--timeout 10m0s \
--wait

# forward the remote k8s port to the local network to access the service locally
# the port-forward command blocks and takes over the terminal;
# use another terminal for the subsequent curl commands, and press Ctrl-C to stop the port forwarding
POD_NAME=$(kubectl get pods --namespace max-openai-api-demo -l "app.kubernetes.io/name=max-openai-api-chart,app.kubernetes.io/instance=max-openai-api" -o jsonpath="{.items[0].metadata.name}")
CONTAINER_PORT=$(kubectl get pod --namespace max-openai-api-demo $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
kubectl port-forward $POD_NAME 8000:$CONTAINER_PORT --namespace max-openai-api-demo &

# test the service
curl -N http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "modularai/llama-3.1",
"stream": true,
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who won the world series in 2020?"}
]
}'

# uninstall MAX OpenAI API
helm uninstall max-openai-api --namespace max-openai-api-demo

# delete the namespace
kubectl delete namespace max-openai-api-demo

# delete the k8s cluster
eksctl delete cluster \
--name max-openai-api-demo \
--region us-east-1
```
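
While the port-forward above is still running, you can also hit the server's health endpoint (the same `/v1/health` path used by the chart's startup, liveness, and readiness probes; see the Values section below) to confirm the deployment is ready:

```console
# should return a successful response once the model server is up
curl -i http://localhost:8000/v1/health
```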

## Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| affinity | object | `{}` | Affinity to be added to all deployments |
| env | object | `{}` | Environment variables that will be passed into pods |
| envFromSecret | string | `"{{ template \"max.fullname\" . }}-env"` | The name of the secret which we will use to populate env vars in deployed pods. This can be useful for secret keys, etc. |
| envFromSecrets | list | `[]` | This can be a list of templated strings |
| envRaw | list | `[]` | Environment variables in RAW format that will be passed into pods |
| envSecret | object | `{}` | Environment variables to pass as secrets |
| fullnameOverride | string | `nil` | Provide a name to override the full names of resources |
| image.pullPolicy | string | `"IfNotPresent"` | |
| image.repository | string | `"modular/max-openai-api"` | |
| image.tag | string | `"latest"` | |
| imagePullSecrets | list | `[]` | |
| inferenceServer.affinity | object | `{}` | Affinity to be added to inferenceServer deployment |
| inferenceServer.args | list | See `values.yaml` | Arguments to pass to the node entrypoint. If defined, it overwrites the default args value set by .Values.max-serve |
| inferenceServer.autoscaling.enabled | bool | `false` | |
| inferenceServer.autoscaling.maxReplicas | int | `2` | |
| inferenceServer.autoscaling.minReplicas | int | `1` | |
| inferenceServer.autoscaling.targetCPUUtilizationPercentage | int | `80` | |
| inferenceServer.containerSecurityContext | object | `{}` | |
| inferenceServer.deploymentAnnotations | object | `{}` | Annotations to be added to inferenceServer deployment |
| inferenceServer.deploymentLabels | object | `{}` | Labels to be added to inferenceServer deployment |
| inferenceServer.env | object | `{}` | |
| inferenceServer.extraContainers | list | `[]` | Launch additional containers into inferenceServer pod |
| inferenceServer.livenessProbe.failureThreshold | int | `3` | |
| inferenceServer.livenessProbe.httpGet.path | string | `"/v1/health"` | |
| inferenceServer.livenessProbe.httpGet.port | string | `"http"` | |
| inferenceServer.livenessProbe.initialDelaySeconds | int | `1` | |
| inferenceServer.livenessProbe.periodSeconds | int | `15` | |
| inferenceServer.livenessProbe.successThreshold | int | `1` | |
| inferenceServer.livenessProbe.timeoutSeconds | int | `1` | |
| inferenceServer.nodeSelector | object | `{}` | NodeSelector to be added to inferenceServer deployment |
| inferenceServer.podAnnotations | object | `{}` | Annotations to be added to inferenceServer pods |
| inferenceServer.podLabels | object | `{}` | Labels to be added to inferenceServer pods |
| inferenceServer.podSecurityContext | object | `{}` | |
| inferenceServer.readinessProbe.failureThreshold | int | `3` | |
| inferenceServer.readinessProbe.httpGet.path | string | `"/v1/health"` | |
| inferenceServer.readinessProbe.httpGet.port | string | `"http"` | |
| inferenceServer.readinessProbe.initialDelaySeconds | int | `1` | |
| inferenceServer.readinessProbe.periodSeconds | int | `15` | |
| inferenceServer.readinessProbe.successThreshold | int | `1` | |
| inferenceServer.readinessProbe.timeoutSeconds | int | `1` | |
| inferenceServer.replicaCount | int | `1` | |
| inferenceServer.resources | object | `{}` | Resource settings for the inferenceServer pods - these settings overwrite existing values from the global resources object defined above. |
| inferenceServer.startupProbe.failureThreshold | int | `60` | |
| inferenceServer.startupProbe.httpGet.path | string | `"/v1/health"` | |
| inferenceServer.startupProbe.httpGet.port | string | `"http"` | |
| inferenceServer.startupProbe.initialDelaySeconds | int | `1` | |
| inferenceServer.startupProbe.periodSeconds | int | `5` | |
| inferenceServer.startupProbe.successThreshold | int | `1` | |
| inferenceServer.startupProbe.timeoutSeconds | int | `1` | |
| inferenceServer.strategy | object | `{}` | |
| inferenceServer.tolerations | list | `[]` | Tolerations to be added to inferenceServer deployment |
| inferenceServer.topologySpreadConstraints | list | `[]` | TopologySpreadConstraints to be added to inferenceServer deployments |
| inferenceServer.volumeMounts | list | `[]` | Volume mounts to be added to the inferenceServer pod |
| inferenceServer.volumes | list | `[]` | Volumes to mount into inferenceServer pod |
| ingress.annotations | object | `{}` | |
| ingress.enabled | bool | `false` | |
| ingress.extraHostsRaw | list | `[]` | |
| ingress.hosts | list | `[]` | |
| ingress.ingressClassName | string | `nil` | |
| ingress.path | string | `"/"` | |
| ingress.pathType | string | `"ImplementationSpecific"` | |
| ingress.tls | list | `[]` | |
| maxServe | object | `{"cacheStrategy":"continuous","huggingfaceRepoId":"modularai/llama-3.1","maxCacheBatchSize":"250","maxLength":"2048","maxNumSteps":"10"}` | MAX Serve arguments |
| nameOverride | string | `nil` | Provide a name to override the name of the chart |
| nodeSelector | object | `{}` | NodeSelector to be added to all deployments |
| resources | object | `{}` | |
| runAsUser | int | `0` | User ID directive. This user must have enough permissions to run the bootstrap script. Running containers as root is not recommended in production; change this to another UID (e.g. 1000) to be more secure |
| service.annotations | object | `{}` | |
| service.loadBalancerIP | string | `nil` | |
| service.ports[0].name | string | `"http"` | |
| service.ports[0].port | int | `8000` | |
| service.ports[0].protocol | string | `"TCP"` | |
| service.ports[0].targetPort | int | `8000` | |
| service.type | string | `"ClusterIP"` | |
| serviceAccount.annotations | object | `{}` | |
| serviceAccount.create | bool | `false` | Create a custom service account for MAX Serving. If create: true and serviceAccountName is not provided, `max.fullname` will be used. |
| serviceAccountName | string | `nil` | Specify service account name to be used |
| tolerations | list | `[]` | Tolerations to be added to all deployments |
| topologySpreadConstraints | list | `[]` | TopologySpreadConstraints to be added to all deployments |
| volumeMounts | list | `[]` | |
| volumes | list | `[]` | |
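
As an illustration of how several of these keys compose, the following sketch builds one values file with common overrides and applies it with `helm upgrade --install`. The GPU resource request and the token are placeholders; adjust them to your cluster and node type:

```console
cat <<'EOF' > example-values.yaml
huggingfaceRepoId: modularai/llama-3.1
maxServe:
  maxLength: "2048"
  maxCacheBatchSize: "250"
envSecret:
  HUGGING_FACE_HUB_TOKEN: "<insert-huggingface-token>"
env:
  HF_HUB_ENABLE_HF_TRANSFER: "1"
resources:
  limits:
    nvidia.com/gpu: 1   # placeholder: request one GPU on a GPU node type such as g5.4xlarge
EOF

helm upgrade --install max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
  --version <insert-version> \
  -f example-values.yaml \
  --wait
```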
112 changes: 112 additions & 0 deletions tutorials/helm/max-openai-api/README.md.gotmpl
@@ -0,0 +1,112 @@
<!-- markdownlint-disable -->
<!--
NOTE: This file is generated by helm-docs: https://github.com/norwoodj/helm-docs#installation
-->

# MAX OpenAI API Helm chart

{{ template "chart.deprecationWarning" . }}

{{ template "chart.description" . }}

{{ template "chart.homepageLine" . }}

{{ template "chart.sourcesSection" . }}

## Usage

### Installing the chart

To install this chart using Helm 3, run the following command:

```console
helm install max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
--version <insert-version> \
--set huggingfaceRepoId=<insert-huggingface-model-id> \
--set maxServe.maxLength=512 \
--set maxServe.maxCacheBatchSize=16 \
--set envSecret.HUGGING_FACE_HUB_TOKEN=<insert-huggingface-token> \
--set env.HF_HUB_ENABLE_HF_TRANSFER=1 \
--wait
```

The command deploys MAX OpenAI API on the Kubernetes cluster in the default configuration. The Values reference section below lists the parameters that can be configured during installation.

### Upgrading the chart

To upgrade the chart with the release name `max-openai-api`:

```console
helm upgrade max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart
```

### Uninstalling the chart

To uninstall/delete the `max-openai-api` deployment:

```console
helm delete max-openai-api
```

### End-to-end example that provisions a K8s cluster and installs MAX OpenAI API

To provision a k8s cluster via `eksctl` and then install MAX OpenAI API, run the following commands:

```console
# provision a k8s cluster (takes 10-15 minutes)
eksctl create cluster \
--name max-openai-api-demo \
--region us-east-1 \
--node-type g5.4xlarge \
--nodes 1

# create a k8s namespace
kubectl create namespace max-openai-api-demo

# deploy MAX OpenAI API via helm chart (takes 10 minutes)
helm install max-openai-api oci://registry-1.docker.io/modular/max-openai-api-chart \
--version <insert-version> \
--namespace max-openai-api-demo \
--set huggingfaceRepoId=modularai/llama-3.1 \
--set maxServe.maxLength=512 \
--set maxServe.maxCacheBatchSize=16 \
--set envSecret.HUGGING_FACE_HUB_TOKEN=<insert-huggingface-token> \
--set env.HF_HUB_ENABLE_HF_TRANSFER=1 \
--timeout 10m0s \
--wait

# forward the remote k8s port to the local network to access the service locally
# the port-forward command blocks and takes over the terminal;
# use another terminal for the subsequent curl commands, and press Ctrl-C to stop the port forwarding
POD_NAME=$(kubectl get pods --namespace max-openai-api-demo -l "app.kubernetes.io/name=max-openai-api-chart,app.kubernetes.io/instance=max-openai-api" -o jsonpath="{.items[0].metadata.name}")
CONTAINER_PORT=$(kubectl get pod --namespace max-openai-api-demo $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
kubectl port-forward $POD_NAME 8000:$CONTAINER_PORT --namespace max-openai-api-demo &

# test the service
curl -N http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "modularai/llama-3.1",
"stream": true,
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who won the world series in 2020?"}
]
}'

# uninstall MAX OpenAI API
helm uninstall max-openai-api --namespace max-openai-api-demo

# delete the namespace
kubectl delete namespace max-openai-api-demo

# delete the k8s cluster
eksctl delete cluster \
--name max-openai-api-demo \
--region us-east-1
```


{{ template "chart.requirementsSection" . }}

{{ template "chart.valuesSection" . }}
22 changes: 22 additions & 0 deletions tutorials/helm/max-openai-api/templates/NOTES.txt
@@ -0,0 +1,22 @@
1. Get the application URL by running these commands:
{{- if .Values.ingress.enabled }}
{{- range .Values.ingress.hosts }}
http{{ if $.Values.ingress.tls }}s{{ end }}://{{ . }}{{ $.Values.ingress.path }}
{{- end }}
{{- else if contains "NodePort" .Values.service.type }}
NODE_PORT=$(kubectl get --namespace {{ .Release.Namespace }} -o jsonpath="{.spec.ports[0].nodePort}" services {{ template "max.fullname" . }})
NODE_IP=$(kubectl get nodes --namespace {{ .Release.Namespace }} -o jsonpath="{.items[0].status.addresses[0].address}")
echo http://$NODE_IP:$NODE_PORT
{{- else if contains "LoadBalancer" .Values.service.type }}
NOTE: It may take a few minutes for the LoadBalancer IP to be available.
You can watch the status by running 'kubectl get svc -w {{ template "max.fullname" . }}'
SERVICE_IP=$(kubectl get svc --namespace {{ .Release.Namespace }} {{ template "max.fullname" . }} -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo http://$SERVICE_IP:{{ .Values.service.port }}
{{- else if contains "ClusterIP" .Values.service.type }}
POD_NAME=$(kubectl get pods --namespace {{ .Release.Namespace }} -l "app.kubernetes.io/name={{ template "max.name" . }},app.kubernetes.io/instance={{ .Release.Name }}" -o jsonpath="{.items[0].metadata.name}")
CONTAINER_PORT=$(kubectl get pod --namespace {{ .Release.Namespace }} $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
echo "The application is available at the following DNS name from within your cluster:"
echo "{{ .Release.Name }}.{{ .Release.Namespace }}.svc.cluster.local:$CONTAINER_PORT"
echo "Or use the following command to forward ports and visit it locally at http://127.0.0.1:8000"
echo "kubectl port-forward $POD_NAME 8000:$CONTAINER_PORT --namespace {{ .Release.Namespace }}"
{{- end }}