
K3S the sane way

k3s-sane

But Why?

The cost

Before I started this project, I had PlanetScale Scaler Pro, Vercel Teams, Fly.io, Railway... One night I realised the bill from Vercel alone was higher than my monthly grocery budget!!!

Latency

Let's say I am making a web app for my own convenience. I might buy a server in the UK and deploy both the app and the database on it. It will load instantly for me, and the distance data travels between the app and the database will be short.

But how about my friends in the US? In Asia? In Australia? I soon bought a server in Singapore to test it.

Why slow? It took close to 38.2 seconds for data to come back after I clicked on login. The speed test ran against a full-stack Golang project (the worst stack for frontend, an amazing tool for backend, I won't say it twice), fully server-rendered, no hydration. This means the page stays blank until the data comes in, unless I build separate handlers for the specific components that need data.

If I find a service provider that gives me this level of latency, I am out (unless they are very pretty 👉🏻👈🏻).

Inspiration

I took inspiration from Jeff Geerling's Raspberry Pi Cluster Project: if your stack can run under extreme conditions like the Raspberry Pi (ARM with 1GB RAM), you are golden. And you will learn a lot from running things on bare metal.

I stopped thinking about distributed systems, servers and their locations. Instead, I think about the OS of our generation, Kubernetes, to handle the complexity and separation of servers: stripping away the distance and physicality of servers, merging them into one. The Unix of distributed operating systems excites me.

I know what I want. I can't afford EKS by myself. I want a deployment strategy that's optimised no matter how extreme the conditions are. I can't guarantee an edge experience with milliseconds of cold start; the best I can do is a server close enough to my friends and the shortest distance between server and database.

Let the fiber handle the rest. I pray.

What I want at the end...

  • GEO DNS: based on the requesting user's location, I route them to the server that's closest to them.
  • Application replication: I don't want to only have one database on one server. I want every server to have a replica of the same database. Each server might have multiple databases for different applications. Each server should also have replicas of the same applications.
  • Node Affinity: Each application should only talk to the database on the same node, for the best speed. The communication stays within the node and never leaves it.
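One possible way to sketch the "stay on the same node" goal is a Kubernetes Service with internalTrafficPolicy set to Local, which only routes in-cluster traffic to endpoints on the node the request came from. This is not necessarily how this repo does it. The function name, service name, port and file name below are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write a Service manifest that keeps in-cluster traffic node-local.
# Assumption: the database pods carry the label app=<name>. With
# internalTrafficPolicy: Local, traffic is dropped if the node has no endpoint,
# so every node needs its own database replica.

write_node_local_service() {
  local name="$1"      # service name and pod label value (hypothetical)
  local port="$2"      # port the database listens on
  local out_file="$3"  # where to write the manifest
  cat > "$out_file" <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${name}
spec:
  selector:
    app: ${name}
  ports:
    - port: ${port}
      targetPort: ${port}
  internalTrafficPolicy: Local
EOF
}

# Usage (hypothetical names):
#   write_node_local_service app-db 5432 app-db-service.yaml
#   kubectl apply -f app-db-service.yaml
```

The trade-off is that there is no fallback to another node: if the local replica is down, requests fail instead of crossing the ocean.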

I bought 4 servers around the world from a cloud provider: London, Frankfurt, Seattle and Singapore. These are not managed services like GKE or EKS that run your Kubernetes cluster for you. The only thing they came with was Debian 12.

What's included

Currently:

Todo:

  • Longhorn
  • BullMQ
  • ...?

Provisioning

Step 1: Configure K3s cluster

To provision all servers

ansible-playbook playbook/site.yml

To reset all servers

ansible-playbook playbook/reboot.yml

After k3s is installed on the master, run:

scp root@<master-ip>:~/.kube/config ~/.kube/config-ctb-london

Edit the server address in ~/.kube/config-ctb-london so it points to the master node's address.

Then set it as the KUBECONFIG environment variable:

export KUBECONFIG=~/.kube/config-ctb-london
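The manual edit of the server address can also be scripted. Here is a minimal sketch, assuming the copied file has the usual k3s default line `server: https://127.0.0.1:6443`. The function name and the IP in the usage comment are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: point a copied kubeconfig at the master node instead of localhost.
# Assumption: the file contains a line like
#   server: https://127.0.0.1:6443
# Adjust the pattern if yours looks different.

set_kubeconfig_server() {
  local config_file="$1"  # e.g. ~/.kube/config-ctb-london
  local master_ip="$2"    # address of the master node
  local tmp_file
  tmp_file="$(mktemp)"
  # Replace only the host part of the "server:" line; keep scheme and port.
  sed -E "s#^([[:space:]]*server:[[:space:]]*https://)[^:/]+#\1${master_ip}#" \
    "$config_file" > "$tmp_file" && mv "$tmp_file" "$config_file"
}

# Usage (hypothetical IP):
#   set_kubeconfig_server ~/.kube/config-ctb-london 203.0.113.10
#   export KUBECONFIG=~/.kube/config-ctb-london
```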

We can check all the nodes and roles by running

kubectl get nodes -o wide

Step 2: Prometheus

I messed up on my first attempt: I forgot to make sure all pods from this stack land on the same node. It had the database in Frankfurt, Alertmanager in Seattle and Grafana in Singapore. So I walked through the ConfigSet and reassigned all pods to the London node the easiest way. I found the label I needed for the London node in OpenLens (OpenLens good).
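Pinning the stack to one node can be sketched as a Helm values file with a nodeSelector per component. This assumes the chart is kube-prometheus-stack and that the node is picked by the well-known kubernetes.io/hostname label. Neither is confirmed by this README, and the function name, hostname and file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write a Helm values file that pins the monitoring stack to one node.
# Assumptions (not from this repo): the chart is kube-prometheus-stack, and the
# node is selected by the kubernetes.io/hostname label.

write_pin_values() {
  local node_hostname="$1"  # value of the kubernetes.io/hostname label
  local out_file="$2"       # where to write the values file
  cat > "$out_file" <<EOF
prometheus:
  prometheusSpec:
    nodeSelector:
      kubernetes.io/hostname: "${node_hostname}"
alertmanager:
  alertmanagerSpec:
    nodeSelector:
      kubernetes.io/hostname: "${node_hostname}"
grafana:
  nodeSelector:
    kubernetes.io/hostname: "${node_hostname}"
EOF
}

# Find the label value without OpenLens:
#   kubectl get nodes --show-labels
# Then (hypothetical hostname and file name):
#   write_pin_values london-1 pin-london.yaml
```

The resulting file can be passed to Helm as extra values, or its contents copied into the values used by the install play.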

  1. Run
ansible-playbook helm/prereq.yml

to ensure Helm is installed and is the latest version.

  2. Install Prometheus and Grafana stack

The Prometheus install play uses Ansible's built-in helm module to install the Helm chart.

To learn more about how pods get assigned to nodes, look at "Taints and Tolerations" and "Assigning Pods to Nodes" (node affinity) in the Kubernetes documentation.

Run

ansible-playbook helm/prometheus/install.yml -vvv

Tip: use -vvv for VERY VERBOSE DEBUG MODE.

Check its health by running

kubectl --namespace monitoring get pods

Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.

After installing Prometheus and Grafana, remember to install OpenLens on your local computer to monitor the cluster:

brew install openlens

After installing OpenLens, remember to add the @alebcay/openlens-node-pod-menu plugin.

OpenLens will allow you to enter pods' terminal (imagine docker exec -it <mycontainer> bash) and monitor their health without getting cancer. Be careful with what the plugin can do: it might terminate a pod.

For services deployed to Kubernetes without active deployment, you can access their dashboards via port forwarding in OpenLens.

Step 3: Install Cert manager and configure ClusterIssuer

Make sure the pip role installs "pyyaml" and "kubernetes", otherwise the Helm configuration and ClusterIssuer configuration will fail.
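A quick way to confirm both packages are importable before running the plays might look like the sketch below. It assumes python3 is on the PATH of the host where the Ansible modules run. The function name is made up, and note that pyyaml is imported as `yaml`.

```shell
#!/usr/bin/env bash
# Sketch: report which Python modules cannot be imported.
# Assumption: python3 is the interpreter Ansible uses on this host.

check_python_modules() {
  local missing=0
  local module
  for module in "$@"; do
    if ! python3 -c "import ${module}" >/dev/null 2>&1; then
      echo "missing: ${module}"
      missing=1
    fi
  done
  # Non-zero exit status if at least one module is missing.
  return "$missing"
}

# Usage (pyyaml is imported as "yaml"):
#   check_python_modules yaml kubernetes
```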

In site.yml we have split roles into different tags. To run a standalone role, we can do

ansible-playbook playbook/site.yml -t cluster-pip  # give it a tag

After that, run the following to install cert-manager and configure the ClusterIssuer for Kubernetes

ansible-playbook helm/cert-manager/install.yml  

Currently available dashboards

  • Prometheus Alertmanager
  • Grafana

Appreciation

Without the help of, and discussions with, my friends Anna and Martin, this project wouldn't have started. Thanks also to Rancher's k3s-ansible project and Jeff's pip role.