add a generic K8s install guide (#312)

This shows that KubeAI can easily be installed on any K8s cluster.
substratusai · Nov 21, 2024 · 499a6ed · 499a6ed
1 parent 8e0a494
commit 499a6ed
Show file tree

Hide file tree

Showing 2 changed files with 112 additions and 0 deletions.
diff --git a/docs/how-to/install-models.md b/docs/how-to/install-models.md
@@ -54,10 +54,52 @@ kubectl explain models.spec
 kubectl explain models.spec.engine
 ```
 
+You can view all example manifests on the [GitHub repository](https://github.com/substratusai/kubeai/tree/main/manifests/models).
+
+Below are few examples using various engines and resource profiles.
+
+### Example Gemma 2 2B using Ollama on CPU
+
+```yaml
+apiVersion: kubeai.org/v1
+kind: Model
+metadata:
+  name: gemma2-2b-cpu
+spec:
+  features: [TextGeneration]
+  url: ollama://gemma2:2b
+  engine: OLlama
+  resourceProfile: cpu:2
+```
+
+### Example Llama 3.1 8B using vLLM on NVIDIA L4 GPU
+
+```yaml
+apiVersion: kubeai.org/v1
+kind: Model
+metadata:
+  name: llama-3.1-8b-instruct-fp8-l4
+spec:
+  features: [TextGeneration]
+  owner: neuralmagic
+  url: hf://neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8
+  engine: VLLM
+  args:
+    - --max-model-len=16384
+    - --max-num-batched-token=16384
+    - --gpu-memory-utilization=0.9
+    - --disable-log-requests
+  resourceProfile: nvidia-gpu-l4:1
+```
+
 ## Programmatically installing models
 
 See the [examples](https://github.com/substratusai/kubeai/tree/main/examples/k8s-api-clients).
 
+## Calling a model
+
+You can inference a model by calling the KubeAI OpenAI compatible API. The model name should match the KubeAI model name.
+
 ## Feedback welcome: A model management UI
 
 We are considering adding a UI for managing models in a running KubeAI instance. Give the [GitHub Issue](https://github.com/substratusai/kubeai/issues/148) a thumbs up if you would be interested in this feature.
diff --git a/docs/installation/any.md b/docs/installation/any.md
@@ -0,0 +1,70 @@
+# Install on any Kubernetes Cluster
+
+KubeAI can be installed on any Kubernetes cluster and doesn't require GPUs.
+If you do have GPUs, then KubeAI can take advantage of them.
+
+Please follow the Installation using GPUs section if you have GPUs available.
+
+
+## Prerequisites
+
+1. Add the KubeAI helm repository.
+
+```bash
+helm repo add kubeai https://www.kubeai.org
+helm repo update
+```
+
+2. (Optional) Set the Hugging Face token as an environment variable. This is only required if you plan to use HuggingFace models that require authentication.
+
+```bash
+export HF_TOKEN=<your-hugging-face-token>
+```
+
+## Installation using only CPUs
+
+All engines supported in KubeAI also support running only on CPU resources.
+
+Install KubeAI using the pre-defined values file which defines CPU resourceProfiles:
+
+```bash
+helm install kubeai kubeai/kubeai --wait \
+  --set secrets.huggingface.token=$HF_TOKEN
+```
+
+Optionally, inspect the values file to see the default resourceProfiles:
+
+```bash
+helm show values kubeai/kubeai > values.yaml
+```
+
+## Installation using GPUs
+
+This section assumes you have a Kubernetes cluster with GPU resources available and
+installed the NVIDIA device plugin that adds GPU information labels to the nodes.
+
+This time we need to use a custom resource profiles that define the nodeSelectors
+for different GPU types.
+
+Download the values file for the NVIDIA GPU operator:
+
+```bash
+curl -L -O https://raw.githubusercontent.com/substratusai/kubeai/refs/heads/main/charts/kubeai/values-nvidia-k8s-device-plugin.yaml
+```
+
+You likely will not need to modify the `values-nvidia-k8s-device-plugin.yaml` file.
+However, do inspect the file to ensure the GPU resourceProfile nodeSelectors match
+the node labels on your nodes.
+
+
+Install KubeAI using the custom resourceProfiles:
+```bash
+helm upgrade --install kubeai kubeai/kubeai \
+    -f values-nvidia-k8s-device-plugin.yaml \
+    --set secrets.huggingface.token=$HF_TOKEN \
+    --wait
+```
+
+## Deploying models
+
+See the [How to install models guide](/how-to/installing-models.md) for instructions on deploying models and examples.