Reports aggregation in a separate process
Signed-off-by: Charles-Edouard Brétéché <[email protected]>
eddycharly committed Sep 12, 2022
1 parent b011557 commit 05028a7
Showing 1 changed file with 106 additions and 0 deletions.
106 changes: 106 additions & 0 deletions proposals/reports.md
@@ -0,0 +1,106 @@
# Meta

- Name: reports-v2
- Start Date: 2022-09-12
- Author(s): eddycharly
- Supersedes: N/A

# Table of Contents

- [Meta](#meta)
- [Table of Contents](#table-of-contents)
- [Overview](#overview)
- [Definitions](#definitions)
- [Motivation](#motivation)
- [Proposal](#proposal)
- [Implementation](#implementation)
- [Migration (OPTIONAL)](#migration-optional)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [Prior Art](#prior-art)
- [Unresolved Questions](#unresolved-questions)
- [CRD Changes (OPTIONAL)](#crd-changes-optional)

# Overview

Support per-resource reports, automatic report cleanup, and a separate report aggregation controller.

# Definitions

- TTL: Time-to-live. The amount of time a resource may exist until it is cleaned up.

# Motivation

The current implementation that generates and maintains policy reports is causing memory issues.
Moreover, report lifecycles are all managed by hand. Matching reports with their corresponding resource/policy is cumbersome.

Kubernetes has built-in mechanisms to clean up resources when parent resources are deleted; we should leverage these native capabilities when possible.

Processing reports should not impact Kyverno's admission request processing, and we should be able to scale report processing independently for large clusters when necessary.

# Proposal

In this proposal, we study the possibility of changing the way reports are generated by:
- creating one report per resource
- binding the report lifecycle to the resource lifecycle
- allowing reports to be reconciled in an external process

There are three ways of generating reports in Kyverno:
1. At admission time, all policies running in audit mode are run against the admission request and produce report results.
1. When a policy is created/updated/deleted, if the policy can run in background mode, reports are updated according to the policy changes.
1. Periodically, policies running in background mode are re-evaluated against resources present in the cluster and reports are updated accordingly.

By creating one report per resource, generating higher-level reports (i.e. per namespace) boils down to aggregating the reports living in the namespace.
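
As a minimal sketch of that aggregation step, the function below sums per-resource result counts into one summary per namespace. The report shape (`namespace`, `resource_uid`, `summary`) is a hypothetical simplification for illustration, not the actual Kyverno report schema.

```python
from collections import Counter, defaultdict


def aggregate_by_namespace(reports):
    """Sum per-resource result counts into one summary per namespace.

    `reports` is a list of dicts with hypothetical keys `namespace` and
    `summary` (a mapping of result name to count).
    """
    totals = defaultdict(Counter)
    for report in reports:
        totals[report["namespace"]].update(report["summary"])
    return {ns: dict(counts) for ns, counts in totals.items()}


# Illustrative per-resource reports (made-up data).
reports = [
    {"namespace": "default", "resource_uid": "uid-1", "summary": {"pass": 2, "fail": 1}},
    {"namespace": "default", "resource_uid": "uid-2", "summary": {"pass": 1, "fail": 0}},
    {"namespace": "kube-system", "resource_uid": "uid-3", "summary": {"pass": 0, "fail": 3}},
]
namespace_reports = aggregate_by_namespace(reports)
```

Because the aggregate depends only on the per-resource reports, it could be recomputed by any process that can read them, which is what makes moving it out of the main controller plausible.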

The controller responsible for aggregating reports can be isolated in its own process, separated from the Kyverno main controller.

Finally, managing the one-to-one relationship between a resource and its corresponding report is much easier than what we have today.

# Implementation

The implementation should be straightforward. We need to generate single-resource reports at admission time:
- the report name can be derived from the resource `uid`
- if the resource is updated, we update the report (the `uid` remains the same)
- if the resource is deleted, Kubernetes will garbage collect the orphaned report

Background scans should follow the same logic as above.
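
One way to picture the rules above is the sketch below, which builds a report manifest for a single resource. It assumes the report name is simply the resource `uid` (the proposal only says the name can be derived from it) and that the lifecycle binding is done through a standard Kubernetes `ownerReferences` entry, which is what garbage collection relies on. The `apiVersion` and `kind` of the report are placeholders, and `build_resource_report` is a hypothetical helper, not an existing Kyverno function.

```python
def build_resource_report(resource, results):
    """Build a report manifest for one resource (hypothetical shape).

    The report is named after the resource uid and lists the resource as
    its owner, so deleting the resource lets Kubernetes garbage collect
    the report.
    """
    meta = resource["metadata"]
    report = {
        "apiVersion": "example.io/v1",  # placeholder, not a real Kyverno group
        "kind": "ResourceReport",       # placeholder kind
        "metadata": {
            "name": meta["uid"],        # assumption: name is the uid itself
            "ownerReferences": [
                {
                    "apiVersion": resource["apiVersion"],
                    "kind": resource["kind"],
                    "name": meta["name"],
                    "uid": meta["uid"],
                }
            ],
        },
        "results": results,
    }
    # Namespaced resources get a report in the same namespace.
    if "namespace" in meta:
        report["metadata"]["namespace"] = meta["namespace"]
    return report


# Illustrative resource (made-up values).
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "nginx", "namespace": "default", "uid": "1234-abcd"},
}
created = build_resource_report(pod, [{"policy": "require-labels", "result": "fail"}])
updated = build_resource_report(pod, [{"policy": "require-labels", "result": "pass"}])
```

Since the `uid` does not change across updates, `created` and `updated` share the same name, so an update to the resource maps to an update of the same report rather than a new object.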

A new controller, implemented in a separate process, will be in charge of watching reports and creating, updating, and deleting higher-level reports.

As a bonus, for very large clusters, we can add options to run multiple aggregation controllers, each watching only a subset of the per-resource reports. This lets the end user shard report aggregation (we could have one controller per namespace, for example).
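
A minimal sketch of one possible sharding scheme follows, assuming each aggregation controller is started with a shard index and a shard count and only handles reports whose namespace hashes to its index. The function names and the hashing choice are illustrative assumptions; the proposal does not prescribe how the subset is selected.

```python
import hashlib


def shard_for(namespace, shard_count):
    """Map a namespace to a shard index in [0, shard_count).

    A cryptographic digest is used instead of Python's built-in hash()
    because the latter is randomized per process for strings, and every
    controller must agree on the mapping.
    """
    digest = hashlib.sha256(namespace.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def reports_for_shard(reports, shard_index, shard_count):
    """Keep only the reports this controller instance is responsible for."""
    return [
        r for r in reports
        if shard_for(r["namespace"], shard_count) == shard_index
    ]
```

Sharding on the namespace keeps all reports of one namespace in the same controller, so each namespace-level aggregate is still computed in a single place.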

## Custom Resources

1. We have all necessary resources in place to implement this new design.

## Existing Kubernetes Constructs

N/A

## Link to the Implementation PR

N/A

# Migration (OPTIONAL)

N/A

# Drawbacks

N/A

# Alternatives

* Various alternatives have been tested in the past without much success. Throttling is hard to implement in distributed systems, and all alternatives ran in-process and showed high memory and/or CPU consumption.

# Prior Art

* [kube-janitor](https://codeberg.org/hjacobs/kube-janitor)

# Unresolved Questions

N/A

# CRD Changes (OPTIONAL)

N/A
