Support paginating by key-value size #14809

Open
linxiulei opened this issue Nov 19, 2022 · 18 comments · May be fixed by #16300

Comments

@linxiulei

What would you like to be added?

Allow RangeRequest to specify a size limit to paginate the result if the size of the result exceeds the limit.

Why is this needed?

This would allow clients to constrain the response size and avoid overwhelming the etcd server with expensive requests.
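
For context, the closest existing mechanism is count-based pagination via WithLimit in clientv3; the proposal would add a byte-based analogue of the bound in the loop below. This is only a sketch against the current client API (the endpoint and key prefix are placeholders), not part of the proposed change:

```go
package main

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Placeholder endpoint; adjust for a real cluster.
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"127.0.0.1:2379"}})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Count-based pagination as it exists today: at most 100 keys per Range,
	// continuing from just after the last returned key. A size limit would
	// bound each page in bytes instead of (or in addition to) key count.
	key := "foo/"
	end := clientv3.GetPrefixRangeEnd(key)
	for {
		resp, err := cli.Get(context.Background(), key,
			clientv3.WithRange(end),
			clientv3.WithLimit(100),
			clientv3.WithSort(clientv3.SortByKey, clientv3.SortAscend))
		if err != nil {
			panic(err)
		}
		for _, kv := range resp.Kvs {
			fmt.Printf("%s\n", kv.Key)
		}
		if !resp.More || len(resp.Kvs) == 0 {
			break
		}
		// Continue from just after the last returned key.
		key = string(resp.Kvs[len(resp.Kvs)-1].Key) + "\x00"
	}
}
```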

@linxiulei
Author

cc @serathius

@lavacat

lavacat commented Nov 19, 2022

Is this for Kubernetes use case or general case?

FYI, Kubernetes recently added maxLimit = 10000

@fuweid
Member

fuweid commented Nov 20, 2022

Is this for Kubernetes use case or general case?

FYI, Kubernetes recently added maxLimit = 10000

Thanks for the link! By default, the object size is limited to 1.5 MiB. The pageSize can be used to reduce memory spikes.
In Kubernetes, it seems this can resolve the issue.

@linxiulei
Author

Is this for Kubernetes use case or general case?

I am thinking of Kubernetes use case, but it definitely benefits general cases as well

FYI, Kubernetes recently added maxLimit = 10000

It seems to somewhat mitigate a similar problem. AIUI, its actual intention is to increase the response size in order to reduce the number of calls to etcd.

@stale

stale bot commented Mar 18, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Mar 18, 2023
@stale stale bot closed this as completed May 22, 2023
@linxiulei linxiulei linked a pull request Jul 25, 2023 that will close this issue
@linxiulei
Author

@ahrtr can you please also try re-opening this?

@ahrtr ahrtr reopened this Jul 25, 2023
@stale stale bot removed the stale label Jul 25, 2023
@serathius
Member

serathius commented Jul 26, 2023

Note: I'm not sure large objects are as much of a concern compared to large numbers of small objects. Response size can be controlled on the user side as long as clients limit the maximum object size and the number of objects per response.

Context kubernetes/enhancements#2340 (comment)

Small objects are much more disruptive for clients. See the difference in CPU usage between "Disabled Large Objects" and "Disabled Small Objects" for the apiserver: the difference is between 3 CPUs and 72 CPUs.

Overall I still think the better long-term solution is to introduce a watch cache in the etcd client to get the same benefits (2-10x CPU reduction, 20-50x latency reduction) instead of creating a workaround.
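
As a rough worked example of that bound, using numbers already mentioned in this thread (so an illustration, not a measurement): with the Kubernetes maxLimit of 10000 objects per page and objects capped at the 1.5 MiB default, a single page is bounded by roughly 10000 × 1.5 MiB ≈ 14.6 GiB, and tightening either factor shrinks the worst case proportionally.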

@linxiulei
Author

As far as the problem this issue addresses is concerned, the total size is most relevant, as it results in proportional memory usage in etcd. Fixing this is a measure to avoid query-of-death (i.e. OOM) more than a performance improvement.

Besides that, I don't see a major conflict in having both a long-term and a short-term solution at the same time.

@serathius
Member

Besides that, I don't see a major conflict in having both a long-term and a short-term solution at the same time.

I expect Kubernetes will have consistent reads from cache before etcd releases v3.6. Getting the watch cache out will not be far off. The short-term solution would be immediately deprecated.

@linxiulei
Author

IMO, this pagination by size is still helpful after we have consistent reads in k8s, because k8s still needs to issue non-streaming requests to etcd, which might be too large and blow up etcd's memory usage, right?

@serathius
Member

No, the goal is to serve non-streaming requests from the watch cache.

@linxiulei
Author

linxiulei commented Sep 14, 2023

You mean the apiserver's watch cache, right? What about when the apiserver starts and gets all objects from etcd into its own cache?

@serathius
Member

A single get-all-objects request is not a big problem; the problem is frequent fetch requests made by badly written Kubernetes controllers. Such requests require a lot of allocations and waste time on proto marshalling/unmarshalling, causing etcd memory to grow in an uncontrolled way.

Note that all Kubernetes requests are sharded by resource, and Kubernetes limits the response size to 2 GB, meaning an initial request for all resources allocates maybe a couple to tens of GB. That's big, but not horrible. Compare it to multiple controllers that each fetch a single 1 GB resource every second.

@wenjiaswe
Contributor

wenjiaswe commented Sep 14, 2023

In #16300, the newly implemented maxbytes defaults to 0, which would not break any existing use case, and it could be useful for cases where users do want to control the max size. Note that the kube-apiserver maxLimit limits the object count, but objects with a large size and more revisions would still cause problems.
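
For illustration only, the byte-budget idea can be sketched as below; the helper name and the rule of always returning at least one KV are assumptions of this sketch, not necessarily what #16300 implements:

```go
package sketch

import "go.etcd.io/etcd/api/v3/mvccpb"

// truncateByBytes is a hypothetical helper: it cuts a range result once the
// accumulated key+value size exceeds maxBytes. maxBytes == 0 keeps today's
// behaviour (no size limit). At least one KV is always returned so a single
// oversized object cannot stall pagination.
func truncateByBytes(kvs []*mvccpb.KeyValue, maxBytes int64) ([]*mvccpb.KeyValue, bool) {
	if maxBytes == 0 {
		return kvs, false
	}
	var total int64
	for i, kv := range kvs {
		total += int64(len(kv.Key) + len(kv.Value))
		if total > maxBytes && i > 0 {
			return kvs[:i], true // more pages remain
		}
	}
	return kvs, false
}
```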

I don't see a strong reason not to merge. @serathius Do you have any specific concerns?

@serathius
Member

I don't have any concerns about compatibility, more about feature bloat. Based on my understanding of the SIG api-machinery roadmap, this feature will never be used by Kubernetes.

I don't think etcd should implement a feature for Kubernetes without a prior design and approval from Kubernetes stakeholders. Before we add work for ourselves, let's get LGTMs from SIG api-machinery.

cc @jpbetz @deads @logicalhan @wojtek-t

@ahrtr
Member

ahrtr commented Sep 18, 2023

@linxiulei I share the same understanding as #14809 (comment). Please raise a KEP in Kubernetes and get K8s's approval first. I don't think the etcd community will reject a feature that is useful and approved by K8s.

@tjungblu
Contributor

Turns out we have a case where we ran into the size aspect of the objects. We get these logs from the apiserver:

I0821 23:03:24.662977      17 trace.go:205] Trace[419549052]: "List(recursive=true) etcd3" key:/secrets,resourceVersion:,resourceVersionMatch:, limit:10000,continue: (21-Aug-2023 23:03:23.767) (total time: 895 ms): Trace[419549052]: [895.681318ms] [895.681318ms] END
W0821 23:03:24.662996      17 reflector.go:324] storage/cacher.go:/secrets: failed to list *core.Secret: rpc error: code = ResourceExhausted desc = grpc: trying to send message larger than max (2169698338 vs.  2147483647)
E0821 23:03:24.663007      17 cacher.go:425] cacher (*core.Secret): unexpected ListAndWatch error: failed to list *core.Secret: rpc error: code = ResourceExhausted desc = grpc: trying to send message larger than max (2169698338 vs. 2147483647); reinitializing...

{"level":"warn","ts":"2023-08-17T23:03:24.662Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00357ae00/10.1.1.3:2379","attempt":0,"error":"rpc error: code = ResourceExhausted desc = grpc: trying to send message larger than max (2169698338 vs. 2147483647)"}

Imagine a case where there's just a load of big secrets. I'm still figuring out whether this is caused by badly written operators :)

The gRPC limit we can't really change, because it's already int32.Max:
https://github.com/grpc/grpc-go/blob/8cb98464e5999aa2fd57bbf5b23bd5a634f4b2f5/server.go#L59
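
(For reference, the rejected message in the logs above is 2,169,698,338 bytes ≈ 2.02 GiB, just over grpc-go's int32 ceiling of 2,147,483,647 bytes.)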

We should enable some kind of paging for range responses, but I agree that this needs to go the KEP route first.

tjungblu added a commit to tjungblu/etcd that referenced this issue Nov 7, 2023
This adds a metric to allow us to alert on large range responses
described in etcd-io#14809.

Signed-off-by: Thomas Jungblut <[email protected]>
@tjungblu
Contributor

tjungblu commented Nov 7, 2023

As a start, I've quickly added a metric that allows us to alert on common large range requests (secrets / configmaps / image streams) with #16881.
