Uyuni Health Check Tool Disconnected Solution #9322

Draft: wants to merge 29 commits into master

Commits:
9ac0bcc Initial Uyuni Health Check tool commit (ycedres, Oct 4, 2024)
3fece3d Refactor show stats and errors (ycedres, Oct 10, 2024)
5f681ed Fix pagination of full error logs (ycedres, Oct 10, 2024)
ba22374 Avoid breaking if no errors are found (ycedres, Oct 10, 2024)
b1363b7 Fix error in exporter initialization (ycedres, Oct 11, 2024)
db1d93f Parametrize default grafana time range on startup (ycedres, Oct 18, 2024)
831967c Wait for loki to ingest all jobs (ycedres, Oct 18, 2024)
0d7ed47 Add dashboard for showing error logs (ycedres, Oct 18, 2024)
6acd46e Add static metrics to display in Grafana (ycedres, Oct 29, 2024)
52a29a0 Wait for Promtail to finish parsing log files (ycedres, Nov 6, 2024)
c0d852b Refactor static metrics in supportconfig exporter (ycedres, Nov 7, 2024)
8ae98d4 Clean up unused files and relocation (ycedres, Nov 11, 2024)
e72852d Fix container startup (m-czernek, Nov 28, 2024)
0fc710b Update exporter to serve static data (m-czernek, Dec 5, 2024)
2e2977e Modify the Grafana dashboard to display new static data (m-czernek, Dec 5, 2024)
cd06e22 Fix main codepath and style (m-czernek, Dec 5, 2024)
d59d329 Document a way to execute health check without hacking pythonpath (m-czernek, Dec 6, 2024)
2fbed83 Upgrade promtail to fix memory leak (m-czernek, Dec 9, 2024)
07b0457 Expose CPU count property (m-czernek, Dec 10, 2024)
c2f7e23 Include first alert (m-czernek, Dec 10, 2024)
9b1e570 Add Salt-perf alerts (m-czernek, Dec 12, 2024)
9e3d682 Parse journalctl in promtail (m-czernek, Dec 12, 2024)
fa07385 Add info about memory and fs layout (m-czernek, Jan 3, 2025)
c62bf57 Add RAM table and alerts, display disk layout (m-czernek, Jan 8, 2025)
4eb24c6 Add further alerts (m-czernek, Jan 9, 2025)
6046b31 Provide additional alerts (m-czernek, Jan 13, 2025)
b93ad14 provide more alerts and parse reposync logs (m-czernek, Jan 13, 2025)
abaaa5b Add alerts when disk mount is out of space or has insufficient size (m-czernek, Jan 20, 2025)
ecc3e8c Refactor the config usage (m-czernek, Jan 24, 2025)

12 changes: 12 additions & 0 deletions health-check/.gitignore
@@ -0,0 +1,12 @@
build
dist
.eggs
*.egg-info
logcli-linux-amd64
promtail-linux-amd64
__pycache__
**/config/exporter/config.yaml
**/config/promtail/config.yaml
**/config/grafana/dashboards/supportconfig_with_logs.json

.vscode/
41 changes: 41 additions & 0 deletions health-check/README.md
@@ -0,0 +1,41 @@
# uyuni-health-check

A tool that provides dashboards, metrics, and logs from an Uyuni server supportconfig to visualise the server's health status.

## Requirements

* `python3`
* `podman`

## Building and installing

Install the tool locally into a virtual environment:

```
python3 -m venv venv
. venv/bin/activate
pip install .
```

## Getting started

This tool builds and deploys the containers needed to scrape metrics and logs from an Uyuni server supportconfig directory.
Execute the tool's `run` subcommand as follows:

```
uyuni-health-check -s ~/path/to/supportconfig run --logs --from_datetime=2024-01-01T00:00:00Z --to_datetime=2024-06-01T20:00:00Z
```

This will create and start the following containers locally:

- uyuni-health-exporter (port `9000`)
- grafana (port `3000`)
- loki (port `9100`)
- promtail (port `9081`)
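A minimal sketch for checking that the containers above are accepting connections before opening Grafana, assuming they publish the listed ports on `localhost`. `check_ports` and `EXPECTED_PORTS` are hypothetical names, not part of `uyuni-health-check`, and a successful TCP connect only shows that something is listening, not that the service behind it is healthy.

```python
import socket
from typing import Dict

# Ports as listed above; adjust if your deployment maps them differently.
EXPECTED_PORTS = {
    "uyuni-health-exporter": 9000,
    "grafana": 3000,
    "loki": 9100,
    "promtail": 9081,
}


def check_ports(ports: Dict[str, int], host: str = "localhost", timeout: float = 1.0) -> Dict[str, bool]:
    """Return, for each service, whether a TCP connection to its port succeeds."""
    results = {}
    for name, port in ports.items():
        try:
            # create_connection raises OSError (refused, timeout, unreachable) on failure.
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, up in check_ports(EXPECTED_PORTS).items():
        print(f"{name}: {'up' if up else 'unreachable'}")
```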

Once the containers are running, open `http://localhost:3000` and select the `Supportconfig with Logs` dashboard.
If Grafana asks you to log in, use the default username `admin` and password `admin`.
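The `run` example above passes its time window as ISO 8601 UTC timestamps such as `2024-01-01T00:00:00Z`. A minimal sketch for validating such values before starting the containers, assuming the tool expects exactly the `YYYY-MM-DDTHH:MM:SSZ` form used in that example; `parse_utc_timestamp` and `validate_window` are hypothetical helpers, not part of `uyuni-health-check`.

```python
from datetime import datetime, timezone
from typing import Tuple

# Assumed format, taken from the README example (e.g. 2024-01-01T00:00:00Z).
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_utc_timestamp(value: str) -> datetime:
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' string into a timezone-aware UTC datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def validate_window(from_value: str, to_value: str) -> Tuple[datetime, datetime]:
    """Return the parsed (start, end) pair, rejecting reversed or empty windows."""
    start = parse_utc_timestamp(from_value)
    end = parse_utc_timestamp(to_value)
    if start >= end:
        raise ValueError(
            f"from_datetime ({from_value}) must be earlier than to_datetime ({to_value})"
        )
    return start, end


if __name__ == "__main__":
    start, end = validate_window("2024-01-01T00:00:00Z", "2024-06-01T20:00:00Z")
    print(f"Window spans {end - start}")
```

A malformed value (for example, a space instead of `T`) makes `strptime` raise `ValueError`, so both kinds of mistake surface as the same exception type.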

## Security notes

After running this tool, and until the containers are destroyed, the Grafana dashboards (and other metrics endpoints) expose metrics and log messages that may contain sensitive data to any non-root user on the system and to anyone with network access to this host.

44 changes: 44 additions & 0 deletions health-check/pyproject.toml
@@ -0,0 +1,44 @@
# SPDX-FileCopyrightText: 2023 SUSE LLC
#
# SPDX-License-Identifier: Apache-2.0

[project]
name = "uyuni-health-check"
description = "Show Uyuni server health metrics and logs"
readme = "README.md"
requires-python = ">=3.6"
classifiers = [
    "Programming Language :: Python :: 3",
    "Operating System :: OS Independent",
]
dependencies = [
    "Click",
    "rich",
    "requests",
    "Jinja2",
    "PyYAML",
    "tomli",
]
maintainers = [
    {name = "Pablo Suárez Hernández", email = "[email protected]"},
]
dynamic = ["version"]

[project.urls]
homepage = "https://github.com/uyuni-project/uyuni"
tracker = "https://github.com/uyuni-project/uyuni/issues"

[project.scripts]
uyuni-health-check = "uyuni_health_check.main:main"

[tool.setuptools]
package-dir = {"" = "src"}

[build-system]
requires = [
    "setuptools>=42",
    "setuptools_scm[toml]",
    "wheel",
]
build-backend = "setuptools.build_meta"
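The `[project.scripts]` table maps the `uyuni-health-check` command to the object reference `uyuni_health_check.main:main` (module path, a colon, then an attribute path). A minimal sketch of how such a reference resolves to a callable, roughly what the generated console script does; `resolve_entry_point` is a hypothetical helper, not a setuptools or `importlib.metadata` API.

```python
import importlib
from typing import Any


def resolve_entry_point(ref: str) -> Any:
    """Resolve a 'package.module:attr.subattr' object reference to the object it names."""
    module_name, _, attr_path = ref.partition(":")
    obj: Any = importlib.import_module(module_name)
    if attr_path:
        # Walk dotted attributes after the colon, e.g. 'Class.method'.
        for attr in attr_path.split("."):
            obj = getattr(obj, attr)
    return obj
```

After `pip install .`, `resolve_entry_point("uyuni_health_check.main:main")` should return the same `main` function the `uyuni-health-check` command invokes.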

Empty file.
72 changes: 72 additions & 0 deletions health-check/src/uyuni_health_check/config.py
@@ -0,0 +1,72 @@
import functools
import os
from typing import Any, Dict
import tomli
from pathlib import Path
import json
import jinja2

BASE_DIR = os.path.dirname(os.path.abspath(__file__))
CONFIG_DIR = os.path.join(BASE_DIR, "config")
TEMPLATES_DIR = os.path.join(CONFIG_DIR, "templates")
CONTAINERS_DIR = os.path.join(BASE_DIR, "containers")
CONFIG_TOML_PATH = os.environ.get("HEALTH_CHECK_TOML", os.path.join(BASE_DIR, "config.toml"))

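# Cached so the Jinja environment is built only once per process.
# The parenthesized form also works on Python < 3.8 (requires-python is ">=3.6").
@functools.lru_cache()
def _init_jinja_env() -> jinja2.Environment:
    return jinja2.Environment(loader=jinja2.FileSystemLoader(TEMPLATES_DIR))

# Cached: the TOML file is read once per process (parse_config.cache_clear() forces a re-read).
@functools.lru_cache()
def parse_config() -> Dict:
    if not os.path.exists(CONFIG_TOML_PATH):
        raise ValueError(f"Config file does not exist: {CONFIG_TOML_PATH}")

    with open(CONFIG_TOML_PATH, "rb") as f:
        conf = tomli.load(f)
    return conf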

def get_json_template_filepath(json_relative_path: str) -> str:
    return os.path.join(TEMPLATES_DIR, json_relative_path)

def load_jinja_template(template: str) -> jinja2.Template:
    return _init_jinja_env().get_template(template)

def load_dockerfile_dir(dockerfile_dir: str) -> str:
    return os.path.join(CONTAINERS_DIR, dockerfile_dir)

def get_config_dir_path(component: str) -> str:
    return os.path.join(CONFIG_DIR, component)

def load_prop(property: str) -> Any:
    res = parse_config().copy()
    for prop_part in property.split('.'):
        try:
            res = res[prop_part]
        except Exception as e:
            raise ValueError(
                f"Invalid config lookup ({property}); trying to get {prop_part} from {res}"
            ) from e
    return res

def write_config(component: str, config_file_path: str, content: str, is_json=False):
    basedir = Path(get_config_dir_path(component))
    if not basedir.exists():
        basedir.mkdir(parents=True)
    file_path = os.path.join(basedir, config_file_path)
    with open(file_path, "w") as file:
        if is_json:
            json.dump(content, file, indent=4)
        else:
            file.write(content)

def get_config_file_path(component):
    return os.path.join(get_config_dir_path(component), "config.yaml")

def get_sources_dir(component):
    return os.path.join(BASE_DIR, component)

def get_grafana_config_dir():
    return os.path.join(CONFIG_DIR, "grafana")

def get_prometheus_config_dir():
    return os.path.join(CONFIG_DIR, "prometheus")
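`load_prop` walks the parsed TOML with a dotted key, so `load_prop("loki.loki_port")` returns `loki_port` from the `[loki]` table of `config.toml`. A minimal, self-contained sketch of the same lookup against a dict that mirrors the `config.toml` in this diff (so it runs without `tomli` or the package installed); `lookup` and `SAMPLE_CONFIG` are hypothetical stand-ins, and unlike `load_prop` this version catches only `KeyError` and `TypeError` rather than every exception.

```python
from typing import Any, Dict

# Mirrors the config.toml in this PR, as tomli.load would return it.
SAMPLE_CONFIG: Dict[str, Any] = {
    "podman": {"network_name": "health-check-network"},
    "loki": {
        "loki_container_name": "uyuni_health_check_loki",
        "loki_port": 3100,
        "jobs": ["cobbler", "postgresql", "rhn", "apache"],
    },
    "logcli": {
        "logcli_container_name": "uyuni_health_check_logcli",
        "logcli_image_name": "logcli",
    },
}


def lookup(config: Dict[str, Any], dotted_key: str) -> Any:
    """Resolve 'table.key' (any depth) against nested dicts, like load_prop."""
    node: Any = config
    for part in dotted_key.split("."):
        try:
            node = node[part]
        except (KeyError, TypeError) as e:
            # KeyError: missing key; TypeError: indexing into a scalar or list with a str.
            raise ValueError(
                f"Invalid config lookup ({dotted_key}); no '{part}' in {node!r}"
            ) from e
    return node
```

Because `parse_config` is wrapped in `lru_cache`, the real `load_prop` reads the TOML file only once per process; edits made to the file during a run are not picked up unless `parse_config.cache_clear()` is called.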
11 changes: 11 additions & 0 deletions health-check/src/uyuni_health_check/config.toml
@@ -0,0 +1,11 @@
[podman]
network_name = "health-check-network"

[loki]
loki_container_name = "uyuni_health_check_loki"
loki_port = 3100
jobs = ["cobbler", "postgresql", "rhn", "apache"]

[logcli]
logcli_container_name = "uyuni_health_check_logcli"
logcli_image_name = "logcli"
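These values name the Podman network and containers the tool manages, and `jobs` lists the Loki job labels it expects to ingest. A minimal sketch of how they could be turned into a `podman run` argument list and a LogQL stream selector; the function names are hypothetical and this is not the tool's actual code. The image is a parameter because this part of the diff does not name it, and the host port is configurable because the README lists `9100` for Loki while `config.toml` sets `loki_port = 3100`.

```python
from typing import Any, Dict, List, Optional


def podman_run_args(config: Dict[str, Any], image: str, host_port: Optional[int] = None) -> List[str]:
    """Build a `podman run` argv for the Loki container from the config tables."""
    loki = config["loki"]
    container_port = loki["loki_port"]
    # Assumption: publish the container port unchanged unless a host port is given.
    published = host_port if host_port is not None else container_port
    return [
        "podman", "run", "-d",
        "--name", loki["loki_container_name"],
        "--network", config["podman"]["network_name"],
        "-p", f"{published}:{container_port}",
        image,
    ]


def loki_job_selector(jobs: List[str]) -> str:
    """Build a LogQL stream selector matching any configured job.

    Assumes job names contain no regex metacharacters, as in config.toml.
    """
    return '{job=~"' + "|".join(jobs) + '"}'


if __name__ == "__main__":
    config = {
        "podman": {"network_name": "health-check-network"},
        "loki": {
            "loki_container_name": "uyuni_health_check_loki",
            "loki_port": 3100,
            "jobs": ["cobbler", "postgresql", "rhn", "apache"],
        },
    }
    print(" ".join(podman_run_args(config, image="<loki-image>")))
    print(loki_job_selector(config["loki"]["jobs"]))
```

The network itself has to exist before `podman run --network` can use it, for example via `podman network create health-check-network`.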