CRDB-45670: helm: automate the statefulset update involving new PVCs #443

Merged · 1 commit · Jan 16, 2025
22 changes: 3 additions & 19 deletions build/templates/README.md
@@ -203,26 +203,10 @@ $ helm upgrade my-release cockroachdb/cockroachdb \

Kubernetes will carry out a safe [rolling upgrade](https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#updating-statefulsets) of your CockroachDB nodes one-by-one.

-However, the upgrade will fail if it involves adding new Persistent Volume Claim (PVC) to the existing pods (e.g. enabling WAL Failover, pushing logs to a separate volume, etc.). In such cases, kindly repeat the following steps for each pod:
-1. Delete the statefulset
-```shell
-$ kubectl delete sts my-release-cockroachdb --cascade=orphan
-```
-The statefulset name can be found by running `kubectl get sts`. Note the `--cascade=orphan` flag used to prevent the deletion of pods.
-
-2. Delete the pod
-```shell
-$ kubectl delete pod my-release-cockroachdb-<pod_number>
-```
-
-3. Upgrade Helm chart
-```shell
-$ helm upgrade my-release cockroachdb/cockroachdb
-```
-Kindly update the values.yaml file or provide the necessary flags to the `helm upgrade` command. This step will recreate the pod with the new PVCs.
+However, the upgrade will fail if it involves adding a new Persistent Volume Claim (PVC) to the existing pods (e.g. enabling WAL Failover, pushing logs to a separate volume, etc.).
+In such cases, run the `scripts/upgrade_with_new_pvc.sh` script to upgrade the cluster.

-Note that the above steps need to be repeated for each pod in the CockroachDB cluster. This will ensure that the cluster is upgraded without any downtime.
-Given the manual process involved, it is likely to cause network churn as cockroachdb will try to rebalance data across the other nodes. We are working on an automated solution to handle this scenario.
+Run `./scripts/upgrade_with_new_pvc.sh -h` for usage help.

Monitor the cluster's pods until all have been successfully restarted:

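For reference, the new script's built-in help (added later in this PR) includes the following sample invocation for a 3-replica cluster in the `default` namespace; adjust the release name, chart version, values file, namespace, and replica count to match your deployment:

```shell
# Run from the root of the repository; arguments are:
# release, chart, chart version, values file, namespace, StatefulSet name, replica count
./scripts/upgrade_with_new_pvc.sh my-release cockroachdb/cockroachdb 15.0.0 ./cockroachdb/values.yaml default my-release-cockroachdb 3
```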
22 changes: 3 additions & 19 deletions cockroachdb/README.md
@@ -204,26 +204,10 @@ $ helm upgrade my-release cockroachdb/cockroachdb \

Kubernetes will carry out a safe [rolling upgrade](https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#updating-statefulsets) of your CockroachDB nodes one-by-one.

-However, the upgrade will fail if it involves adding new Persistent Volume Claim (PVC) to the existing pods (e.g. enabling WAL Failover, pushing logs to a separate volume, etc.). In such cases, kindly repeat the following steps for each pod:
-1. Delete the statefulset
-```shell
-$ kubectl delete sts my-release-cockroachdb --cascade=orphan
-```
-The statefulset name can be found by running `kubectl get sts`. Note the `--cascade=orphan` flag used to prevent the deletion of pods.
-
-2. Delete the pod
-```shell
-$ kubectl delete pod my-release-cockroachdb-<pod_number>
-```
-
-3. Upgrade Helm chart
-```shell
-$ helm upgrade my-release cockroachdb/cockroachdb
-```
-Kindly update the values.yaml file or provide the necessary flags to the `helm upgrade` command. This step will recreate the pod with the new PVCs.
+However, the upgrade will fail if it involves adding a new Persistent Volume Claim (PVC) to the existing pods (e.g. enabling WAL Failover, pushing logs to a separate volume, etc.).
+In such cases, run the `scripts/upgrade_with_new_pvc.sh` script to upgrade the cluster.

-Note that the above steps need to be repeated for each pod in the CockroachDB cluster. This will ensure that the cluster is upgraded without any downtime.
-Given the manual process involved, it is likely to cause network churn as cockroachdb will try to rebalance data across the other nodes. We are working on an automated solution to handle this scenario.
+Run `./scripts/upgrade_with_new_pvc.sh -h` for usage help.

Monitor the cluster's pods until all have been successfully restarted:

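Each iteration of the new script mirrors the manual sequence these READMEs previously documented. A minimal sketch of the equivalent commands for pod 0, using the example release name `my-release`, namespace `default`, and values file from the chart docs (the script itself additionally passes `--kubeconfig` and `--version`):

```shell
# 1. Delete the StatefulSet but keep its pods (--cascade=orphan prevents pod deletion)
kubectl -n default delete statefulset my-release-cockroachdb --cascade=orphan --wait=true

# 2. Delete one pod so it is recreated with the new PVC attached
kubectl -n default delete pod my-release-cockroachdb-0 --wait=true

# 3. Re-apply the chart; this recreates the StatefulSet and the deleted pod
helm upgrade -f ./cockroachdb/values.yaml my-release cockroachdb/cockroachdb --namespace default --wait --timeout 1m
```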
67 changes: 67 additions & 0 deletions scripts/upgrade_with_new_pvc.sh
@@ -0,0 +1,67 @@
#!/bin/bash

Help()
{
    # Display usage information for this script.
    echo "This script performs a Helm upgrade that involves adding new PVCs. Run it from the root of the repository."
    echo
    echo "usage: ./scripts/upgrade_with_new_pvc.sh <release_name> <chart> <chart_version> <values_file> <namespace> <sts_name> <num_replicas> [kubeconfig]"
    echo
    echo "options:"
    echo "release_name: Helm release name, e.g. my-release"
    echo "chart: Helm chart to use (a local path or a chart in the Helm repository), e.g. cockroachdb/cockroachdb"
    echo "chart_version: Helm chart version to upgrade to, e.g. 15.0.0"
    echo "values_file: Path to the values file, e.g. ./cockroachdb/values.yaml"
    echo "namespace: Kubernetes namespace, e.g. default"
    echo "sts_name: StatefulSet name (can be obtained through \"kubectl get sts\"), e.g. my-release-cockroachdb"
    echo "num_replicas: Number of replicas in the StatefulSet, e.g. 3"
    echo "kubeconfig (optional): Path to the kubeconfig file. Default is $HOME/.kube/config."
    echo
    echo "example: ./scripts/upgrade_with_new_pvc.sh my-release cockroachdb/cockroachdb 15.0.0 ./cockroachdb/values.yaml default my-release-cockroachdb 3"
    echo
}

while getopts ":h" option; do
    case $option in
        h) # display help
            Help
            exit;;
        \?) # invalid option
            echo "Error: Invalid option"
            exit 1;;
    esac
done
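# Note: getopts stops at the first positional argument, so -h is only recognized
# when it is passed before the arguments below.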

release_name=$1
chart=$2
chart_version=$3
values_file=$4
namespace=$5
sts_name=$6
num_replicas=$7
kubeconfig=${8:-$HOME/.kube/config}

# For each replica, do the following:
# 1. Delete the statefulset
# 2. Delete the pod replica
# 3. Upgrade the Helm chart

for i in $(seq 0 $((num_replicas-1))); do
    echo "========== Iteration $((i+1)) =========="

    echo "$((i+1)). Deleting sts"
    kubectl --kubeconfig="$kubeconfig" -n "$namespace" delete statefulset "$sts_name" --cascade=orphan --wait=true

    echo "$((i+1)). Deleting replica"
    kubectl --kubeconfig="$kubeconfig" -n "$namespace" delete pod "$sts_name-$i" --wait=true

    echo "$((i+1)). Upgrading Helm"
    # The "--wait" flag ensures the deleted pod replica and STS are back up and running.
    # However, at times the STS fails to recognize that all replicas are running and the upgrade gets stuck.
    # The "--timeout 1m" short-circuits a stuck upgrade. Even if an upgrade times out, it is
    # harmless: the final upgrade will succeed once all pod replicas have been updated.
    helm upgrade -f "$values_file" "$release_name" "$chart" --kubeconfig="$kubeconfig" --namespace "$namespace" --version "$chart_version" --wait --timeout 1m --debug

    echo "Iteration $((i+1)) complete. Validate the changes before proceeding."
    read -r -p "Press enter to continue ..."
done
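The script pauses after every iteration so the operator can validate the changes. A quick check, assuming the `default` namespace and the example release name used above:

```shell
# Confirm the recreated pod is Running and Ready
kubectl -n default get pod my-release-cockroachdb-0

# Confirm the new PVC for that pod exists and is Bound
kubectl -n default get pvc | grep my-release-cockroachdb-0
```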