Observability refactor #83

Merged 2 commits on Jan 15, 2025.
11 changes: 11 additions & 0 deletions docs/observability.md
@@ -0,0 +1,11 @@
# Introduction

I have installed most components of the Prometheus/Grafana stack for observability.

## Grafana
Open-source dashboarding and visualization for metrics and logs.

## kube-prometheus-stack
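In this setup, the bundled Alertmanager and node-exporter in kube-prometheus-stack are disabled so that each can be deployed as a separate HelmRelease under `kubernetes/apps/observability/`. The following trimmed sketch shows the relevant chart values. All other keys are omitted, and the comments are illustrative rather than copied from the repository.

```yaml
# Trimmed sketch of the kube-prometheus-stack HelmRelease (other values omitted).
spec:
  values:
    alertmanager:
      enabled: false   # runs as its own HelmRelease (observability/alertmanager)
    nodeExporter:
      enabled: false   # runs as its own HelmRelease (observability/node-exporter)
```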

## node-exporter
Prometheus exporter for hardware and OS metrics exposed by *NIX kernels, written in Go with pluggable metric collectors.
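The node-exporter HelmRelease asks the chart to create a ServiceMonitor and relabels each target so that every series carries the name of the node it came from. The following trimmed excerpt is based on `kubernetes/apps/observability/node-exporter/app/helmrelease.yaml`. Only the first relabeling rule is shown, and the comments are explanatory additions.

```yaml
# Excerpt of spec.values for the prometheus-node-exporter chart (trimmed).
prometheus:
  monitor:
    enabled: true          # chart renders a ServiceMonitor for Prometheus to pick up
    relabelings:
      # Copy the scraped pod's node name into a `kubernetes_node` label.
      - action: replace
        regex: (.*)
        replacement: $1
        sourceLabels:
          - __meta_kubernetes_pod_node_name
        targetLabel: kubernetes_node
```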
2 changes: 1 addition & 1 deletion kubernetes/apps/network/cloudflared/ks.yaml
@@ -11,7 +11,7 @@ spec:
labels:
app.kubernetes.io/name: *app
dependsOn:
- name: external-dns-cloudflare
- name: external-dns-external
- name: external-secrets-stores
path: ./kubernetes/apps/network/cloudflared/app
prune: true #Revert
@@ -0,0 +1,59 @@
---
receivers:
- name: "null"
- name: "pushover"
pushover_configs:
- html: true
token_file: /etc/secrets/pushover_api_token
user_key_file: /etc/secrets/pushover_api_userkey
send_resolved: true
priority: |-
{{ if eq .Status "firing" }}1{{ else }}0{{ end }}
url_title: View in Alert Manager
title: |-
[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }}
message: |-
{{- range .Alerts }}
{{- if ne .Labels.severity "" }}
<b>Severity:</b> <i>{{ .Labels.severity }}</i>
{{- else }}
<b>Severity:</b> <i>N/A</i>
{{- end }}
{{- if ne .Annotations.description "" }}
<b>Description:</b> <i>{{ .Annotations.description }}</i>
{{- else if ne .Annotations.summary "" }}
<b>Summary:</b> <i>{{ .Annotations.summary }}</i>
{{- else if ne .Annotations.message "" }}
<b>Message:</b> <i>{{ .Annotations.message }}</i>
{{- else }}
<b>Description:</b> <i>N/A</i>
{{- end }}
{{- if gt (len .Labels.SortedPairs) 0 }}
<b>Details:</b>
{{- range .Labels.SortedPairs }}
• <b>{{ .Name }}:</b> <i>{{ .Value }}</i>
{{- end }}
{{- end }}
{{- end }}

route:
group_by: ["alertname", "job"]
group_wait: 30s
group_interval: 5m
repeat_interval: 6h
receiver: "pushover"
routes:
- receiver: "null"
matchers:
- alertname =~ "InfoInhibitor|Watchdog"
- receiver: "pushover"
matchers:
- severity = critical
continue: true

inhibit_rules:
- source_matchers:
- severity = "critical"
target_matchers:
- severity = "warning"
equal: ["alertname", "namespace"]
23 changes: 23 additions & 0 deletions kubernetes/apps/observability/alertmanager/app/externalsecret.yaml
@@ -0,0 +1,23 @@
---
# yaml-language-server: $schema=https://raw.githubusercontent.com/datreeio/CRDs-catalog/main/external-secrets.io/externalsecret_v1beta1.json
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: alertmanager-secret
spec:
refreshInterval: 5m
secretStoreRef:
kind: ClusterSecretStore
name: onepassword-connect
target:
name: alertmanager-secret
creationPolicy: Owner
data:
- secretKey: pushover_api_token
remoteRef:
key: Pushover
property: ALERTMANAGER_TOKEN
- secretKey: pushover_api_userkey
remoteRef:
key: Pushover
property: PUSHOVER_USER_KEY
88 changes: 88 additions & 0 deletions kubernetes/apps/observability/alertmanager/app/helmrelease.yaml
@@ -0,0 +1,88 @@
---
# yaml-language-server: $schema=https://raw.githubusercontent.com/bjw-s/helm-charts/main/charts/other/app-template/schemas/helmrelease-helm-v2.schema.json
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: alertmanager
spec:
interval: 30m
chart:
spec:
chart: app-template
version: 3.6.1
interval: 30m
sourceRef:
kind: HelmRepository
name: bjw-s
namespace: flux-system

values:
controllers:
alertmanager:
type: statefulset
annotations:
reloader.stakater.com/auto: "true"

statefulset:
volumeClaimTemplates:
- name: storage
accessMode: ReadWriteOnce
size: 50Mi
storageClass: ceph-block
globalMounts:
- path: /alertmanager

containers:
alertmanager:
image:
repository: quay.io/prometheus/alertmanager
tag: v0.28.0
ports:
- name: http
containerPort: 9093
probes:
liveness:
enabled: true
readiness:
enabled: true
startup:
enabled: true
spec:
failureThreshold: 30
periodSeconds: 5
resources:
requests:
cpu: 11m
memory: 50M
limits:
memory: 99M

service:
app:
controller: alertmanager
ports:
http:
port: 9093

ingress:
app:
className: internal
hosts:
- host: alertmanager.altena.io
paths:
- path: /
service:
identifier: app
port: http

persistence:
config:
type: configMap
name: alertmanager
globalMounts:
- path: /etc/alertmanager
secrets:
type: secret
name: alertmanager-secret
globalMounts:
- path: /etc/secrets
16 changes: 16 additions & 0 deletions kubernetes/apps/observability/alertmanager/app/kustomization.yaml
@@ -0,0 +1,16 @@
---
# yaml-language-server: $schema=https://json.schemastore.org/kustomization
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ./externalsecret.yaml
- ./helmrelease.yaml
configMapGenerator:
- name: alertmanager
files:
- config/alertmanager.yaml
generatorOptions:
annotations:
kustomize.toolkit.fluxcd.io/substitute: disabled
configurations:
- kustomizeconfig.yaml
@@ -0,0 +1,7 @@
---
nameReference:
- kind: ConfigMap
version: v1
fieldSpecs:
- path: spec/values/persistence/config/name
kind: HelmRelease
22 changes: 22 additions & 0 deletions kubernetes/apps/observability/alertmanager/ks.yaml
@@ -0,0 +1,22 @@
---
# yaml-language-server: $schema=https://raw.githubusercontent.com/fluxcd-community/flux2-schemas/main/kustomization-kustomize-v1.json
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
name: &appname alertmanager
namespace: flux-system
spec:
targetNamespace: observability
commonMetadata:
labels:
app.kubernetes.io/name: *appname
interval: 30m
timeout: 5m
path: "./kubernetes/apps/observability/alertmanager/app"
prune: true
sourceRef:
kind: GitRepository
name: flux-system
wait: false
dependsOn:
- name: external-secrets-stores
@@ -33,23 +33,7 @@ spec:
enabled: false
cleanPrometheusOperatorObjectNames: true
alertmanager:
ingress:
enabled: true
annotations:
external-dns.alpha.kubernetes.io/target: internal.altena.io
ingressClassName: internal
hosts: ["alertmanager.altena.io"]
pathType: Prefix
alertmanagerSpec:
useExistingSecret: true
configSecret: alertmanager-secret
storage:
volumeClaimTemplate:
spec:
storageClassName: nfs-csi-sc
resources:
requests:
storage: 1Gi
enabled: false
kubelet:
enabled: true
kubeApiServer:
@@ -104,18 +88,7 @@ spec:
requests:
storage: 75Gi
nodeExporter:
enabled: true
prometheus-node-exporter:
fullnameOverride: node-exporter
prometheus:
monitor:
enabled: true
relabelings:
- action: replace
regex: (.*)
replacement: $1
sourceLabels: ["__meta_kubernetes_pod_node_name"]
targetLabel: kubernetes_node
enabled: false
kubeStateMetrics:
enabled: true
kube-state-metrics:
6 changes: 5 additions & 1 deletion kubernetes/apps/observability/kustomization.yaml
@@ -2,7 +2,11 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
# Pre Flux-Kustomizations
- ./namespace.yaml
# Flux-Kustomizations
- ./alertmanager/ks.yaml
- ./grafana/ks.yaml
- ./prometheus-operator-crds/ks.yaml
- ./kube-prometheus-stack/ks.yaml
- ./node-exporter/ks.yaml
- ./prometheus-operator-crds/ks.yaml
57 changes: 57 additions & 0 deletions kubernetes/apps/observability/node-exporter/app/helmrelease.yaml
@@ -0,0 +1,57 @@
---
# yaml-language-server: $schema=https://raw.githubusercontent.com/fluxcd-community/flux2-schemas/main/helmrelease-helm-v2.json
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: node-exporter
spec:
interval: 30m
chart:
spec:
chart: prometheus-node-exporter
version: 4.43.1
sourceRef:
kind: HelmRepository
name: prometheus-community
namespace: flux-system
interval: 30m
values:
fullnameOverride: node-exporter

image:
registry: quay.io
repository: prometheus/node-exporter

prometheus:
monitor:
enabled: true
jobLabel: app.kubernetes.io/instance

relabelings:
- action: replace
regex: (.*)
replacement: $1
sourceLabels:
- __meta_kubernetes_pod_node_name
targetLabel: kubernetes_node
- action: replace
regex: (.*)
replacement: $1
sourceLabels:
- __meta_kubernetes_pod_node_name
targetLabel: nodename
- action: replace
regex: (.*)
replacement: $1.bjw-s.internal:9100
sourceLabels:
- kubernetes_node
targetLabel: instance

resources:
requests:
cpu: 23m
memory: 64M
limits:
memory: 64M

hostNetwork: false
@@ -0,0 +1,6 @@
---
# yaml-language-server: $schema=https://json.schemastore.org/kustomization
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ./helmrelease.yaml
20 changes: 20 additions & 0 deletions kubernetes/apps/observability/node-exporter/ks.yaml
@@ -0,0 +1,20 @@
---
# yaml-language-server: $schema=https://raw.githubusercontent.com/fluxcd-community/flux2-schemas/main/kustomization-kustomize-v1.json
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
name: &appname node-exporter
namespace: flux-system
spec:
targetNamespace: observability
commonMetadata:
labels:
app.kubernetes.io/name: *appname
interval: 30m
timeout: 5m
path: "./kubernetes/apps/observability/node-exporter/app"
prune: true
sourceRef:
kind: GitRepository
name: flux-system
wait: false
@@ -9,7 +9,7 @@ spec:
commonMetadata:
labels:
app.kubernetes.io/name: *app
path: ./kubernetes/apps/observability/prometheus-operator-crds/app
path: ./kubernetes/apps/observability/prometheus-operator-crds/crd
prune: true #Revert # never should be deleted
sourceRef:
kind: GitRepository