Support paginating by key-value size #14809
Comments
cc @serathius
Is this for the Kubernetes use case or the general case? FYI, Kubernetes recently added
Thanks for the link! By default the object size is limited to 1.5 MiB. The pageSize can be used to reduce memory spikes.
I am thinking of the Kubernetes use case, but it definitely benefits general cases as well.
It seems to somewhat mitigate a similar problem. AIUI, its intention is actually to increase the response size in order to reduce the number of calls to etcd.
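For context on what is already possible today, here is a minimal sketch (endpoint, prefix, and page size are placeholders, not recommendations) of count-based pagination with etcd's clientv3: `WithLimit` caps the number of keys per response, but nothing caps their cumulative byte size, which is the gap this issue is about.

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	const pageSize = 500 // caps keys per page, not bytes per page

	key := "/registry/"
	end := clientv3.GetPrefixRangeEnd(key)
	for {
		resp, err := cli.Get(context.Background(), key,
			clientv3.WithRange(end),
			clientv3.WithLimit(pageSize),
			clientv3.WithSort(clientv3.SortByKey, clientv3.SortAscend),
		)
		if err != nil {
			panic(err)
		}
		for _, kv := range resp.Kvs {
			fmt.Printf("%s (%d value bytes)\n", kv.Key, len(kv.Value))
		}
		if !resp.More {
			break
		}
		// Resume just after the last key of this page; note that a few huge
		// values can still make a single page arbitrarily large.
		key = string(resp.Kvs[len(resp.Kvs)-1].Key) + "\x00"
	}
}
```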
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
@ahrtr can you please also try re-opening this?
Note: I'm not sure large objects are as much of a concern as large numbers of small objects. Response size can be controlled on the user side as long as they limit the maximal size of an object and the number of objects per response. Context: kubernetes/enhancements#2340 (comment). Small objects are much more disruptive for clients. See the difference in CPU usage between "Disabled Large Objects" and "Disabled small objects" for the apiserver; the difference is between 3 CPUs and 72 CPUs. Overall I still think the better long-term solution is to introduce a watch cache in the etcd client and get the same awesome benefits (2-10 times CPU reduction, 20-50 times latency reduction), instead of creating a workaround.
As far as the problem this issue addresses is concerned, the total size is the most relevant factor, as it results in proportional memory usage in etcd. Fixing this is a measure to avoid query-of-death (i.e. OOM) more than a performance improvement. Besides that, I don't see a major conflict in having both a long-term and a short-term solution at the same time.
I expect Kubernetes will have consistent reads from cache before etcd releases v3.6. Getting the watch cache out will not be far off, so the short-term solution would be immediately deprecated.
IMO, pagination by size is still helpful even after we have consistent reads in k8s, because k8s still needs to send etcd non-streaming requests, which might be too large and explode etcd memory usage, right?
No, the goal is to serve non-streaming requests from the watch cache.
You mean the apiserver's watch cache, right? How about when the apiserver starts and gets all objects from etcd into its own cache?
A single get-all-objects request is not a big problem; the problem is frequent fetch requests done by badly written Kubernetes controllers. Such requests require a lot of allocations and waste time on proto marshalling/unmarshalling, causing etcd memory to grow in an uncontrolled way. Note that all Kubernetes requests are sharded by resource, and Kubernetes limits the size of a response to 2GB, meaning that an initial request for all resources allocates maybe a couple to tens of GB; it's big, but not horrible. Compare that to multiple controllers that each fetch a single 1GB resource every second.
In 16300, the newly implemented maxbytes defaults to 0, which would not break any existing use case, and for cases where users do want to control the maximum size, it could be useful. Note that the kube-apiserver's limit bounds the object count, but objects with a large size and many revisions would still cause problems. I don't see a strong reason not to merge. @serathius Do you have any specific concerns?
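To illustrate the compatibility point, here is a rough sketch of "0 means unlimited" semantics for a per-response byte cap. This is not the code from the PR; the type and field names below are invented for illustration only.

```go
package main

import "fmt"

// kv is a simplified stand-in for an etcd key-value pair.
type kv struct {
	key, value []byte
}

// rangeOpts mimics the relevant request fields: a count limit (existing
// behaviour) and a hypothetical byte cap. Zero means "no limit" for both,
// so existing callers that never set maxBytes are unaffected.
type rangeOpts struct {
	limit    int64
	maxBytes int64
}

// paginate collects pairs until either cap is hit and reports whether more
// data remains, always returning at least one pair so a single oversized
// value can still be delivered.
func paginate(kvs []kv, opts rangeOpts) (page []kv, more bool) {
	var total int64
	for _, p := range kvs {
		size := int64(len(p.key) + len(p.value))
		if opts.maxBytes > 0 && len(page) > 0 && total+size > opts.maxBytes {
			return page, true
		}
		if opts.limit > 0 && int64(len(page)) >= opts.limit {
			return page, true
		}
		page = append(page, p)
		total += size
	}
	return page, false
}

func main() {
	kvs := []kv{
		{[]byte("/a"), make([]byte, 600)},
		{[]byte("/b"), make([]byte, 600)},
		{[]byte("/c"), make([]byte, 600)},
	}
	page, more := paginate(kvs, rangeOpts{maxBytes: 1000})
	fmt.Println(len(page), more) // 1 true: the byte cap truncates the page
}
```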
I don't have any concerns about compatibility, more about feature bloat. Based on my understanding of the SIG api-machinery roadmap, this feature will never be used by Kubernetes. I don't think etcd should implement a feature for Kubernetes without a prior design and approval from Kubernetes stakeholders. Before we add work for ourselves, let's get LGTMs from SIG api-machinery.
@linxiulei I share the same understanding as #14809 (comment). Please raise a KEP in Kubernetes and get K8s's approval first. I don't think the etcd community will reject a feature that is useful and approved by K8s.
Turns out we have a case where we ran into the size aspect of the objects. We got these logs from the apiserver:
Imagine a case where there's just a load of big secrets. I'm still figuring out whether this is caused by badly written operators :) We can't really change the gRPC limit, because it's already int32.Max. We should enable some kind of paging for range responses, but I agree that this needs to go the KEP route first.
This adds a metric to allow us to alert on large range responses described in etcd-io#14809. Signed-off-by: Thomas Jungblut <[email protected]>
As a start, I've quickly added a metric that allows us to alert on common large range requests (secrets / configmaps / image streams) with #16881. |
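For anyone who wants to build similar alerting, here is a sketch of the kind of Prometheus metric involved; the metric name, label, and buckets below are illustrative and are not the ones added by #16881.

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// rangeResponseSize tracks the total byte size of range responses so that
// unusually large responses (e.g. huge secrets or configmaps) can be alerted on.
var rangeResponseSize = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Namespace: "etcd",
		Subsystem: "server",
		Name:      "range_response_size_bytes",
		Help:      "Total size in bytes of range responses.",
		// 1 KiB to ~1 GiB, doubling each bucket.
		Buckets: prometheus.ExponentialBuckets(1024, 2, 21),
	},
	[]string{"prefix"}, // e.g. /registry/secrets
)

func init() {
	prometheus.MustRegister(rangeResponseSize)
}

// ObserveRangeResponse records one response's size for a given key prefix.
func ObserveRangeResponse(prefix string, sizeBytes int) {
	rangeResponseSize.WithLabelValues(prefix).Observe(float64(sizeBytes))
}
```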
What would you like to be added?
Allow RangeRequest to specify a size limit and paginate the result if the size of the result exceeds the limit.
Why is this needed?
This would allow clients to constrain the response size and avoid overwhelming the etcd server with expensive requests.