Metrics collection for workflows deployed as knative services #550

Open
jianrongzhang89 opened this issue Oct 10, 2024 · 0 comments

Description

When a workflow is deployed as a serverless Knative Service and a new workflow instance is triggered, a pod for the workflow is started automatically. After the instance finishes, Knative terminates the pod by scaling the corresponding Kubernetes Deployment down to zero replicas. Workflow pods therefore live only briefly before they are terminated.

As a result, Prometheus may not get the chance to scrape the workflow's metrics in time. If the pods have already been terminated, those metrics are lost, which makes the dashboards inaccurate.

This issue tracks a solution to this limitation: implement a metrics collector that acts as a push gateway, plus a Kogito extension that lets workflows push their metrics to the collector. Prometheus would then scrape the metrics from the collector instead of from the workflow pods. The Knative documentation describes such a collector for its own components as an example:
https://knative.dev/docs/eventing/observability/metrics/collecting-metrics/#understanding-the-collector
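
To illustrate the push flow, here is a minimal sketch of what a workflow could do just before its pod is scaled down. It is not the proposed Kogito extension. It assumes the collector exposes an HTTP API compatible with the Prometheus Pushgateway (`PUT /metrics/job/<job>/instance/<instance>` with a body in the Prometheus text exposition format); the collector described in the Knative documentation may use a different protocol. The collector URL, job name, instance name, and metric name are all hypothetical.

```python
import urllib.request
from urllib.parse import quote


def render_metrics(metrics: dict) -> bytes:
    """Render samples as gauges in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    # The exposition format requires a trailing newline.
    return ("\n".join(lines) + "\n").encode("utf-8")


def build_push_request(base_url: str, job: str, instance: str,
                       metrics: dict) -> urllib.request.Request:
    """Build a Pushgateway-style PUT request (assumed collector API)."""
    url = (
        f"{base_url.rstrip('/')}/metrics"
        f"/job/{quote(job, safe='')}"
        f"/instance/{quote(instance, safe='')}"
    )
    return urllib.request.Request(
        url,
        data=render_metrics(metrics),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


def push_before_shutdown(base_url: str, job: str, instance: str,
                         metrics: dict, timeout: float = 2.0) -> int:
    """Send the metrics to the collector and return the HTTP status code."""
    request = build_push_request(base_url, job, instance, metrics)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical values; this only builds the request and does not send it.
    req = build_push_request(
        "http://metrics-collector.example:9091",
        "my-workflow",
        "my-workflow-pod-1",
        {"workflow_instances_completed": 3},
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"), end="")
```

With this approach, Prometheus would scrape only the collector, so the metrics stay available after the workflow pod has been terminated.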

Implementation ideas

No response
