Metrics collection for workflows deployed as knative services #550

jianrongzhang89 · 2024-10-10T15:49:51Z

Description

When a workflow is deployed as a serverless Knative Service and a new workflow instance is triggered, a pod for the workflow is started automatically. After the instance finishes, Knative terminates the pod by scaling the corresponding Kubernetes deployment down to zero replicas, so workflow pods live only for a short period.

As a result, Prometheus may not get the chance to scrape the workflow's metrics in time. Any metrics that have not been scraped when the pod terminates are lost, which makes the dashboards inaccurate.
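To give a rough feel for the problem, here is a minimal back-of-the-envelope sketch. It assumes a simplified model that is not part of this issue: the pod is already a known scrape target, and the scrape timer's phase is uniformly random relative to the pod's start. The function name and the 30 second interval in the comments are illustrative only.

```python
def scrape_hit_probability(pod_lifetime_s: float, scrape_interval_s: float) -> float:
    """Estimate the chance that at least one scrape lands while the pod is alive.

    Simplified model (an assumption, not a measurement): scrapes fire every
    `scrape_interval_s` seconds with a uniformly random phase, and the pod is
    scrapeable for its whole lifetime.
    """
    if scrape_interval_s <= 0:
        raise ValueError("scrape_interval_s must be positive")
    lifetime = max(0.0, pod_lifetime_s)
    # A window of length `lifetime` contains a scrape tick with probability
    # lifetime / interval, capped at 1 once the window is at least one interval long.
    return min(1.0, lifetime / scrape_interval_s)


# Example: a pod that lives 10 s under an assumed 30 s scrape interval
# is scraped at all in only about one case out of three.
# scrape_hit_probability(10, 30)
```

In practice the situation is likely worse than this estimate suggests: target discovery adds delay before the first scrape, and even a pod that is scraped once loses whatever it recorded after the last scrape.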

This issue tracks a solution to this limitation: implement a metrics collector that acts as a push gateway, plus a Kogito extension that lets workflows push their metrics to the collector. Prometheus then scrapes the metrics from the collector instead of from the workflow pods. The Knative documentation describes such a collector for its own components as an example:
https://knative.dev/docs/eventing/observability/metrics/collecting-metrics/#understanding-the-collector
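The push flow described above could look roughly like the sketch below. It assumes the collector accepts the Prometheus Pushgateway convention (an HTTP PUT of text-format samples to `/metrics/job/<job>/instance/<instance>`); the issue does not fix the protocol, and the actual Kogito extension would be written in Java rather than Python. The collector URL, job name, and metric names are hypothetical.

```python
import urllib.request
from urllib.parse import quote


def render_metrics(samples: dict) -> str:
    """Render {metric_name: value} as Prometheus text exposition lines."""
    lines = [f"{name} {value}" for name, value in sorted(samples.items())]
    return "\n".join(lines) + "\n"


def push_url(base_url: str, job: str, instance: str) -> str:
    """Build a Pushgateway-style grouping URL (assumed collector convention)."""
    return (
        f"{base_url.rstrip('/')}/metrics"
        f"/job/{quote(job, safe='')}/instance/{quote(instance, safe='')}"
    )


def push_metrics(base_url: str, job: str, instance: str, samples: dict,
                 timeout: float = 5.0) -> int:
    """PUT the samples to the collector and return the HTTP status code.

    Intended to be called when a workflow instance finishes, before the
    pod is scaled down, so the final values are not lost.
    """
    request = urllib.request.Request(
        push_url(base_url, job, instance),
        data=render_metrics(samples).encode("utf-8"),
        method="PUT",
    )
    request.add_header("Content-Type", "text/plain; version=0.0.4")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Hypothetical usage (collector address and metric names are made up):
# push_metrics("http://metrics-collector:9091", "greeting-workflow", "pod-abc",
#              {"workflow_instances_completed_total": 3})
```

With this arrangement Prometheus only needs a single, long-lived scrape target (the collector), so the lifetime of individual workflow pods no longer determines whether their metrics are seen.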

Implementation ideas

No response

jianrongzhang89 mentioned this issue Oct 10, 2024

Create a Prometheus ServiceMonitor object that can capture/collect metrics from deployed SonataFlow instances #464

Closed
