-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reports aggregation in a separate process #32
Open
eddycharly
wants to merge
1
commit into
kyverno:main
Choose a base branch
from
eddycharly:reports-v2
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,220 @@ | ||
# Meta | ||
|
||
- Name: reports-v2 | ||
- Start Date: 2022-09-12 | ||
- Author(s): eddycharly | ||
- Supersedes: N/A | ||
|
||
# Table of Contents | ||
|
||
- [Meta](#meta) | ||
- [Table of Contents](#table-of-contents) | ||
- [Overview](#overview) | ||
- [Definitions](#definitions) | ||
- [Motivation](#motivation) | ||
- [Proposal](#proposal) | ||
- [Implementation](#implementation) | ||
- [Migration (OPTIONAL)](#migration-optional) | ||
- [Drawbacks](#drawbacks) | ||
- [Alternatives](#alternatives) | ||
- [Unresolved Questions](#unresolved-questions) | ||
- [CRD Changes (OPTIONAL)](#crd-changes-optional) | ||
|
||
# Overview | ||
|
||
Support per resource report, automatic reports cleanup, and separate reports aggregation controller. | ||
|
||
# Definitions | ||
|
||
N/A | ||
|
||
# Motivation | ||
|
||
The current implementation to generate and maintain policy reports is causing memory issues. | ||
Moreover, reports lifecycle are all managed by hand. Matching reports and their corresponding resource/policy is cumbersome. | ||
|
||
Kubernes has built-in mecanisms to clean up resources when parent resources are deleted, we should leverage native capabilities when possible. | ||
|
||
Processing reports should not impact Kyverno admission requests processing and we should be able to scale differently for large clusters when necessary. | ||
|
||
# Proposal | ||
|
||
In this proposal, we study the possibility to change the way reports are generated by: | ||
- creating one report per resource | ||
- bind the report lifecycle to the resource lifecycle | ||
- allow reconciling reports in an external process | ||
|
||
There are three ways of generating reports in Kyverno: | ||
1. At admission time, all policies running in audit mode are run against the admission request and produce report results. | ||
1. When a policy is created/updated/deleted, if the policy can run in background mode, reports are updated according to the policy changes. | ||
1. Periodically, policies running in background mode are re eveluated against resources present in the cluster and reports are updated accordingly. | ||
Comment on lines
+48
to
+50
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Index: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This will appear correctly in md viewer. |
||
|
||
By creating one report per resource, generating higher level reports (ie. per namepsace) boils down to aggregating reports living in the namespace. | ||
|
||
The controller responsible for aggregating reports can be isolated in its own process, separated from the Kyverno main controller. | ||
|
||
Finally, managing the one to one relationship between a resource and its corresponding report is way easier than what we have today. | ||
|
||
# Implementation | ||
|
||
The implementation should be straightforward, we need to generate single resource reports at admission time: | ||
- the report name can be derived from the resource `uid` | ||
- if the resource is updated, we update the report (the `uid` remains the same) | ||
- if the resource is deleted, Kubernetes will garbage collect the orphan reports | ||
|
||
Background scans should follow the same logic as above. | ||
|
||
A new controller implemented in a separate process will be in charge of watching reports and create/update/delete higher level reports. | ||
|
||
As a bonus, for very large clusters, we can add options to run multiple controllers responsible for aggregating reports by only watching a subset of per resource reports, hence letting the end user shard reports aggregation (we could have one controller per namespace for example). | ||
|
||
## Impulses | ||
|
||
Reports need to be updated when the following event occurs: | ||
- A resource is created or modified | ||
- A policy is created, modified or deleted | ||
|
||
When a resource is created or modified it will go through admission again, reports will be generated at the end of the admission request processing. | ||
|
||
When a policy is created, modified or deleted, a controller will be responsible of updating the report according the the new policy (or remove results belonging to the policy if it is a deletion). | ||
|
||
> **Note**: Admission webhooks can be invoked twice depending on the installed policies (with `Fail` and/or `Ignore` failure policies). | ||
|
||
## Report controller | ||
|
||
The report controller is responsible for merging resource reports together and aggregate results at the namespace level. | ||
|
||
To be optimal it needs to sort results in a stable way so that we don't update reports if nothing changes. | ||
|
||
The sorting algorithm will look something like this: | ||
```go | ||
func SortReportResults(results []policyreportv1alpha2.PolicyReportResult) { | ||
slices.SortFunc(results, func(a policyreportv1alpha2.PolicyReportResult, b policyreportv1alpha2.PolicyReportResult) bool { | ||
if a.Policy != b.Policy { | ||
return a.Policy < b.Policy | ||
} | ||
if a.Rule != b.Rule { | ||
return a.Rule < b.Rule | ||
} | ||
if len(a.Resources) != len(b.Resources) { | ||
return len(a.Resources) < len(b.Resources) | ||
} | ||
for i := range a.Resources { | ||
if a.Resources[i].UID != b.Resources[i].UID { | ||
return a.Resources[i].UID < b.Resources[i].UID | ||
} | ||
} | ||
return false | ||
}) | ||
} | ||
``` | ||
|
||
Reports will be split per policy to avoid breaking the resource size limit imposed by Kubernetes. | ||
|
||
Reports names need to be unique, for this reason we introduce the following convention: | ||
- Reports generated for cluster policies will be named `cpol-<cluster policy name>` | ||
- Reports generated for namespace policies will be named `pol-<cluster policy name>` | ||
|
||
This makes the system compatible with policies and cluster policies having the same name. | ||
|
||
Cleanup of reports will happen at the end of reconciliation cycle by comparing actual and expected reports. | ||
|
||
One last concern is the key we push in the work queue. | ||
The unit of work for this controller is a namespace, we will push a namespace name in the work queue (or `""` for cluster level report). | ||
The reconcilation loop will wait for a small delay before reconciling reports in case things are changing fast in the cluster, to avoid unnecessary computations as much as possible. | ||
|
||
## Audit controller | ||
|
||
The audit controller is responsible for creating, updating and deleting reports. | ||
|
||
Deleting reports can be necessary when a policy changes or is removed, existing reports can become obsolete. | ||
|
||
In case of a resource deletion, reports will be automatically garbage collected by Kubernetes so we don't need to take care of this. | ||
|
||
Most of the time (but not always), created and/or updated resources will go through an admission request. Initial reports will be created at admission time. | ||
Some resources might already exist in the cluster or be created using static manifests on cluster nodes by kubelet. In this case there will be no admission request. | ||
|
||
The controller needs to be able to identify the resources that should habe an associated report but don't have one. For those resources it will create an empty report that will be populated during the next reconciliation loop. | ||
|
||
To be optimal the controller needs to sort results in a stable way so that we don't update reports if nothing changes. | ||
|
||
The sorting algorithm will look something like this: | ||
```go | ||
func SortReportResults(results []policyreportv1alpha2.PolicyReportResult) { | ||
slices.SortFunc(results, func(a policyreportv1alpha2.PolicyReportResult, b policyreportv1alpha2.PolicyReportResult) bool { | ||
if a.Policy != b.Policy { | ||
return a.Policy < b.Policy | ||
} | ||
if a.Rule != b.Rule { | ||
return a.Rule < b.Rule | ||
} | ||
if len(a.Resources) != len(b.Resources) { | ||
return len(a.Resources) < len(b.Resources) | ||
} | ||
for i := range a.Resources { | ||
if a.Resources[i].UID != b.Resources[i].UID { | ||
return a.Resources[i].UID < b.Resources[i].UID | ||
} | ||
} | ||
return false | ||
}) | ||
} | ||
``` | ||
|
||
Naturally, the key to enqueue in the controller work queue is the key of the report, the controller will process each report individually. | ||
|
||
The controller will also need to watch policies/cluster policies and requeue all reports that need to be potentially updated when a given policy changes. | ||
To do this efficiently, all reports will store labels for the policies that have been applied to it, the label value being the policy version that was applied to the report. | ||
This allows to quickly determine reports that need to be recomputed and requeue only the necessary reports. | ||
|
||
The label name used to store a policy and its version in a report follows the convention: | ||
- `cpol.kyverno.io/<cluster policy name>: <cluster policy version>` for cluster policies | ||
- `pol.kyverno.io/<policy name>: <cluster policy version>` for namespace policies | ||
|
||
This convention prevents names clashing in case of a policy and cluster policy having the same name, while staying compatible with constraints imposed by Kubernetes labels naming scheme. | ||
|
||
The controller needs to pay attention to the background field of policies and only update reports for policies that support/enabled background processing. | ||
For this reason, reports will be compared with all policies in the cluster but only background policies will be considered when updating the report during a background scan. | ||
This way, reports should be up to date regardless of the background policy spec. | ||
For example, when a policy that disabled background processing is deleted, reports will be updated correctly. | ||
|
||
Resources that need their report to be updated will be fetched from the api server only once and all policies will be applied to it, we should not fetch the resource multiple times and only when we need it to reduce api server calls at a minimum. | ||
Using the labels to filter out reports that are up to date will be used for that. | ||
|
||
Cleanup of reports will happen at the end of reconciliation cycle by comparing actual and expected reports. | ||
|
||
Two special cases can require recomputation of reports: | ||
- If a policy uses a configmap as external variables, the configmap could have changed and we have no way to track that | ||
- If namespace labels changed, policies matching could be impacted | ||
|
||
## Custom Resources | ||
|
||
1. We have all necessary resources in place to implement this new design. | ||
|
||
## Existing Kubernetes Constructs | ||
|
||
N/A | ||
|
||
## Link to the Implementation PR | ||
|
||
- https://github.com/kyverno/kyverno/pull/4608 | ||
|
||
# Migration (OPTIONAL) | ||
|
||
N/A | ||
|
||
# Drawbacks | ||
|
||
N/A | ||
|
||
# Alternatives | ||
|
||
* Various alternatives have been tested in the past but without much success. Throttling is hard to implement in distributed systems and all alternatives were running in process and have shown high memory and/or cpu consumption. | ||
|
||
# Unresolved Questions | ||
|
||
N/A | ||
|
||
# CRD Changes (OPTIONAL) | ||
|
||
N/A |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is report here a report change request or a policy report?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested with RCR but could be a policy report as well.