This page demonstrates a Torchserve deployment in Kubernetes using Helm Charts. It uses the DockerHub Torchserve Image for the pods and a PersistentVolume for storing config / model files.
The following table describes all the parameters for the Helm Chart.
Parameter | Description | Default |
---|---|---|
image | Torchserve Serving image | pytorch/torchserve:latest-gpu |
inference_port | TS Inference port | 8080 |
management_port | TS Management port | 8081 |
metrics_port | TS Metrics port | 8082 |
replicas | K8S deployment replicas | 1 |
model-store | EFS mountpath | /home/model-server/shared/ |
persistence.size | Storage size to request | 1Gi |
n_gpu | Number of GPU in a TS Pod | 1 |
n_cpu | Number of CPU in a TS Pod | 1 |
memory_limit | TS Pod memory limit | 4Gi |
memory_request | TS Pod memory request | 1Gi |
Edit the values in values.yaml with the right parameters. Some things to consider:
- Set torchserve_image to pytorch/torchserve:latest if your nodes are CPU.
- Set persistence.size based on the size of your models.
- The value of replicas should be less than the number of Nodes in the Node group.
- n_gpu would be exposed to the TS container by docker. This should be set to number_of_gpu in config.properties above.
- n_gpu & n_cpu values are used on a per pod level and not at the entire cluster level.
# Default values for torchserve helm chart.
torchserve_image: pytorch/torchserve:latest-gpu
namespace: torchserve
torchserve:
  management_port: 8081
  inference_port: 8080
  metrics_port: 8082
  grpc_inference_port: 7070
  pvd_mount: /home/model-server/shared/
  n_gpu: 1
  n_cpu: 1
  memory_limit: 4Gi
  memory_request: 1Gi
deployment:
  replicas: 1 # Change this to the number of nodes in the Node Group
persitant_volume:
  size: 1Gi
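Before installing, it can help to sanity-check the values against the considerations listed above. The following is a minimal sketch, not part of the Helm chart: `to_bytes` and `check_values` are hypothetical helper names, the values are passed as a plain dictionary mirroring values.yaml, and `node_count` is something you would supply yourself. It only checks the replicas guidance from this page and the Kubernetes rule that a memory request must not exceed its limit.

```python
# Hypothetical pre-install check; these helpers are not part of the Helm chart.
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> int:
    """Convert a binary quantity such as '4Gi' to bytes (integer values only)."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: treat as a plain byte count


def check_values(values: dict, node_count: int) -> list:
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    ts = values["torchserve"]
    # Guidance from this page: replicas should be less than the number of nodes.
    if values["deployment"]["replicas"] >= node_count:
        problems.append("replicas should be less than the number of nodes")
    # Kubernetes rejects a memory request that is larger than the limit.
    if to_bytes(ts["memory_request"]) > to_bytes(ts["memory_limit"]):
        problems.append("memory_request exceeds memory_limit")
    return problems


# Values mirrored from the defaults shown above.
values = {
    "torchserve_image": "pytorch/torchserve:latest-gpu",
    "torchserve": {"n_gpu": 1, "n_cpu": 1, "memory_limit": "4Gi", "memory_request": "1Gi"},
    "deployment": {"replicas": 1},
}
print(check_values(values, node_count=2))
```

With the default values and an assumed two-node group, the list of problems is empty.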
To install Torchserve run helm install ts .
ubuntu@ip-172-31-50-36:~/serve/kubernetes/Helm$ helm install ts .
NAME: ts
LAST DEPLOYED: Wed Jul 29 08:29:04 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
Verify that torchserve has successfully started by executing kubectl exec pod/torchserve-fff -- cat logs/ts_log.log
on your torchserve pod. You can find the pod id by running kubectl get po --all-namespaces
Your output should look similar to
ubuntu@ip-172-31-50-36:~/serve/kubernetes$ kubectl exec pod/torchserve-fff -- cat logs/ts_log.log
2020-07-29 08:29:08,295 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.1.1
TS Home: /home/venv/lib/python3.6/site-packages
Current directory: /home/model-server
......
Fetch the Load Balancer External IP by executing
kubectl get svc
You should see an entry similar to
ubuntu@ip-172-31-65-0:~/ts/rel/serve$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
torchserve LoadBalancer 10.100.142.22 your-loadbalancer-address 8080:31115/TCP,8081:31751/TCP 14m
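If you want to pick up the address from a script rather than reading the table, one option is to ask for JSON output (`kubectl get svc torchserve -o json`) and read the `status.loadBalancer.ingress` field. The sketch below assumes that JSON is already available as a string; `external_address` is a hypothetical helper name, and the sample payload is a trimmed illustration rather than real cluster output.

```python
import json


def external_address(service_json: str):
    """Return the first load balancer hostname or IP of a Service, or None."""
    service = json.loads(service_json)
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    if not ingress:
        return None  # the load balancer has not been provisioned yet
    first = ingress[0]
    # Cloud providers report either a hostname or an IP for the load balancer.
    return first.get("hostname") or first.get("ip")


# Trimmed, illustrative Service object.
sample = json.dumps(
    {"status": {"loadBalancer": {"ingress": [{"hostname": "your-loadbalancer-address"}]}}}
)
print(external_address(sample))
```

A return value of `None` usually means the EXTERNAL-IP column would still show as pending.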
Now execute the following commands to test Management / Prediction APIs
curl http://your-loadbalancer-address:8081/models
# You should see something similar to the following
{
"models": [
{
"modelName": "mnist",
"modelUrl": "mnist.mar"
},
{
"modelName": "squeezenet1_1",
"modelUrl": "squeezenet1_1.mar"
}
]
}
curl http://your-loadbalancer-address:8081/models/squeezenet1_1
# You should see something similar to the following
[
{
"modelName": "squeezenet1_1",
"modelVersion": "1.0",
"modelUrl": "squeezenet1_1.mar",
"runtime": "python",
"minWorkers": 3,
"maxWorkers": 3,
"batchSize": 1,
"maxBatchDelay": 100,
"loadedAtStartup": false,
"workers": [
{
"id": "9000",
"startTime": "2020-07-23T18:34:33.201Z",
"status": "READY",
"gpu": true,
"memoryUsage": 177491968
},
{
"id": "9001",
"startTime": "2020-07-23T18:34:33.204Z",
"status": "READY",
"gpu": true,
"memoryUsage": 177569792
},
{
"id": "9002",
"startTime": "2020-07-23T18:34:33.204Z",
"status": "READY",
"gpu": true,
"memoryUsage": 177872896
}
]
}
]
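The describe-model response above can also be checked programmatically, for example to confirm that every worker reports READY before sending traffic. This is a small sketch that assumes the response body is available as a string; `ready_workers` and `is_ready` are hypothetical helper names, and the sample is a trimmed copy of the response shown above.

```python
import json


def ready_workers(describe_json: str) -> dict:
    """Map each model version to (number of READY workers, minWorkers)."""
    summary = {}
    for entry in json.loads(describe_json):  # the response is a list of entries
        ready = sum(1 for w in entry["workers"] if w["status"] == "READY")
        summary[entry["modelVersion"]] = (ready, entry["minWorkers"])
    return summary


def is_ready(describe_json: str) -> bool:
    """True when every listed version has at least minWorkers READY workers."""
    return all(ready >= wanted for ready, wanted in ready_workers(describe_json).values())


# Trimmed copy of the response shown above.
sample = json.dumps([
    {
        "modelName": "squeezenet1_1",
        "modelVersion": "1.0",
        "minWorkers": 3,
        "maxWorkers": 3,
        "workers": [
            {"id": "9000", "status": "READY", "gpu": True},
            {"id": "9001", "status": "READY", "gpu": True},
            {"id": "9002", "status": "READY", "gpu": True},
        ],
    }
])
print(ready_workers(sample), is_ready(sample))
```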
wget https://raw.githubusercontent.com/pytorch/serve/master/docs/images/kitten_small.jpg
curl -X POST http://your-loadbalancer-address:8080/predictions/squeezenet1_1 -T kitten_small.jpg
# You should see something similar to the following
[
{
"lynx": 0.5370921492576599
},
{
"tabby": 0.28355881571769714
},
{
"Egyptian_cat": 0.10669822245836258
},
{
"tiger_cat": 0.06301568448543549
},
{
"leopard": 0.006023923866450787
}
]
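The prediction response is a list of single-entry objects, one per label. If a script only needs the most likely label, it can be extracted as in the sketch below; `top_prediction` is a hypothetical helper name and the sample reuses the scores shown above.

```python
import json


def top_prediction(response_json: str):
    """Return (label, score) for the highest-scoring entry in the response."""
    predictions = json.loads(response_json)  # list of {label: score} objects
    scores = {label: score for entry in predictions for label, score in entry.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Same values as the response shown above.
sample = json.dumps([
    {"lynx": 0.5370921492576599},
    {"tabby": 0.28355881571769714},
    {"Egyptian_cat": 0.10669822245836258},
    {"tiger_cat": 0.06301568448543549},
    {"leopard": 0.006023923866450787},
])
label, score = top_prediction(sample)
print(label, round(score, 3))
```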
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/prometheus
helm repo add grafana https://grafana.github.io/helm-charts
helm install grafana grafana/grafana
kubectl get pods
NAME READY STATUS RESTARTS AGE
grafana-cbd8775fd-6f8l5 1/1 Running 0 4h12m
model-store-pod 1/1 Running 0 4h35m
prometheus-alertmanager-776df7bfb5-hpsp4 2/2 Running 0 4h42m
prometheus-kube-state-metrics-6df5d44568-zkcm2 1/1 Running 0 4h42m
prometheus-node-exporter-fvsd6 1/1 Running 0 4h42m
prometheus-node-exporter-tmfh8 1/1 Running 0 4h42m
prometheus-pushgateway-85948997f7-4s4bj 1/1 Running 0 4h42m
prometheus-server-f8677599b-xmjbt 2/2 Running 0 4h42m
torchserve-7d468f9894-fvmpj 1/1 Running 0 4h33m
kubectl get pod prometheus-server-f8677599b-xmjbt -o jsonpath='{.status.podIPs[0].ip}'
192.168.52.141
kubectl patch service grafana -p '{"spec": {"type": "LoadBalancer"}}'
kubectl get svc grafana -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
Get admin user password by running:
kubectl get secret --namespace default grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
Username: admin Password: <the password obtained in the previous step>
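The `base64 --decode` step in the command above is needed because Kubernetes stores Secret values base64-encoded. The same decoding can be sketched in Python, assuming the Secret has been fetched as JSON (for example with `kubectl get secret grafana -o json`); `decode_secret_field` is a hypothetical helper name and the sample Secret is made up for illustration.

```python
import base64
import json


def decode_secret_field(secret_json: str, key: str) -> str:
    """Decode one base64-encoded entry from a Secret's data map."""
    secret = json.loads(secret_json)
    return base64.b64decode(secret["data"][key]).decode("utf-8")


# Made-up Secret for illustration; a real one comes from the cluster.
sample = json.dumps(
    {"data": {"admin-password": base64.b64encode(b"example-password").decode("ascii")}}
)
print(decode_secret_field(sample, "admin-password"))
```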
Open Grafana in browser with the url - <http://your.grafana.loadbalancer.address:3000>
Add Prometheus data source
The TS metrics will be available in Prometheus for Grafana dashboards.
For log aggregation with the EFK Stack, follow the link: Log aggregation using EFK stack
- Incorrect values in values.yaml
  - If you changed values in torchserve.pvd_mount, make sure config.properties was also updated to match the values.
- Invalid config.properties
  - You can verify these values by running this for local TS installation.
- Ensure you have available Nodes in Node Group.
- You may inspect the values by running helm list and helm get all ts to verify the values used for the installation.
- You can uninstall / reinstall the helm chart by executing helm uninstall ts and helm install ts .
- helm install ts . fails with Error: create: failed to create: Request entity too large: limit is 3145728 or invalid: data: Too long: must have at most 1048576 characters.
  - Ensure that you don't have any stale files in the kubernetes directory where you are executing the command. If so, move them out of the directory or add them to the .helmignore file.
- kubectl get svc doesn't show my torchserve service
  - Try reinstalling the helm chart by executing helm uninstall ts and helm install ts .
- "Error: unable to build kubernetes objects from release manifest: unable to recognize “”: no matches for kind “ClusterConfig” in version “eksctl.io/v1alpha5”"
  - Helm is picking up other .yaml files. Make sure you’ve added other files correctly to .helmignore. It should only run with values.yaml.
- kubectl describe pod shows error message "0/1 nodes are available: 1 Insufficient cpu."
  - Ensure that the n_cpu value in values.yaml is set to a number that can be supported by the nodes in the cluster.
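For the "Request entity too large" case in the list above, a quick way to spot stale files is to list the largest files under the directory you run helm install from. The sketch below is only a rough aid under stated assumptions: `large_files` is a hypothetical helper, the default threshold is arbitrary, and the script does not read .helmignore, so it may also report files that Helm already skips.

```python
import os


def large_files(chart_dir: str, threshold_bytes: int = 100_000) -> list:
    """Return (size, relative path) pairs at or above the threshold, largest first."""
    found = []
    for root, _dirs, files in os.walk(chart_dir):
        for name in files:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size >= threshold_bytes:
                found.append((size, os.path.relpath(path, chart_dir)))
    return sorted(found, reverse=True)
```

Calling `large_files(".")` from the chart directory returns candidates, such as model archives, that you may want to move out or add to .helmignore.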
Autoscaling with torchserve metrics
- Follow the instructions above and deploy Torchserve with more than 1 replica to the kubernetes cluster
- Download Istio and add to path as shown here
- Install Istio with below command
istioctl install --set meshConfig.accessLogFile=/dev/stdout
Now we have multiple replicas of Torchserve running and istio installed. We can apply gateway, virtual service and destination rule to enable session affinity to the user requests.
- Apply the istio gateway via kubectl apply -f gateway.yaml
  - This gateway exposes all the hosts behind it via port 80 as defined in the yaml file.
- Apply the virtual service with the command kubectl apply -f virtual_service.yaml
  - This will look for a header named protocol in the incoming request and forward the request to the Torchserve service. If the protocol header has the value rest, the request is forwarded to port 8080 of the Torchserve service; if the protocol header has the value grpc, the request is forwarded to port 7070 of the Torchserve service.
- Apply the destination rule using the command kubectl apply -f destination_rule.yaml.
  - The destination rule looks for an HTTP cookie with the key session_id. A request with session_id is served by the same pod that served the previous request with the same session_id.
- Fetch the external IP from istio-ingress gateway using the below command
ubuntu@ubuntu$ kubectl get svc -n istio-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
istio-ingressgateway LoadBalancer 10.100.84.243 a918b2zzzzzzzzzzzzzzzzzzzzzz-1466623565.us-west-2.elb.amazonaws.com 15021:32270/TCP,80:31978/TCP,443:31775/TCP,70:31778/TCP 2d6h
- Make Request as shown below
curl -v -H "protocol: REST" --cookie "session_id=12345" http://a918b2d70dbddzzzzzzzzzzz49ec8cf03b-1466623565.us-west-2.elb.amazonaws.com:80/predictions/<model_name> -d "data=<input-string>"
- Refer to grpc_api to generate the python files, then run
python ts_scripts/torchserve_grpc_client.py infer <model_name> <input-string>
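The REST request from the steps above can also be built from Python using only the standard library. This is a minimal sketch that roughly mirrors the curl command: `build_prediction_request` is a hypothetical helper, and the host, model name, and input text are placeholders. The `protocol` header selects the REST route and the `session_id` cookie is what the destination rule uses for session affinity.

```python
import urllib.parse
import urllib.request


def build_prediction_request(gateway_host, model_name, text, session_id):
    """Build (but do not send) a POST request through the Istio ingress gateway."""
    url = f"http://{gateway_host}:80/predictions/{model_name}"
    # curl -d sends the body as form data; urlencode also escapes special characters.
    body = urllib.parse.urlencode({"data": text}).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("protocol", "REST")  # routed by the virtual service
    request.add_header("Cookie", f"session_id={session_id}")  # session affinity key
    return request


# Placeholder host and model name; replace with your own values.
request = build_prediction_request("your-istio-gateway-address", "my_model", "hello", "12345")
print(request.get_method(), request.full_url)
# To send it against a live gateway: urllib.request.urlopen(request).read()
```

Reusing the same `session_id` value across calls is what should keep successive requests on the same pod.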
- [] Log / Metrics Aggregation using AWS Container Insights
- [] EFK Stack Integration
- [] Readiness / Liveness Probes
- [] Canary
- [] Cloud agnostic Distributed Storage example