Client watchGrpcStream tight-loops if server is taken down while watcher is running #9578
Comments
How about a quick fix:
Add a 100ms sleep to avoid tight loop if reconnection fails quickly. Fixes etcd-io#9578
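A minimal sketch of what that quick fix looks like in a reconnection loop. This is illustrative only; `openWatchStream` and `runWatchLoop` are hypothetical stand-ins, not the actual etcd client internals.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// openWatchStream stands in for the real (re)connect attempt; against a down
// server it fails almost instantly with "connection refused".
func openWatchStream() error {
	return errors.New("connection refused")
}

// runWatchLoop retries the stream, sleeping 100ms between failed attempts so a
// fast-failing reconnect cannot spin the CPU.
func runWatchLoop(done <-chan struct{}) {
	for {
		select {
		case <-done:
			return
		default:
		}
		if err := openWatchStream(); err != nil {
			time.Sleep(100 * time.Millisecond) // the proposed quick fix
			continue
		}
		// ...serve watch events until the stream breaks, then reconnect...
	}
}

func main() {
	done := make(chan struct{})
	go runWatchLoop(done)
	time.Sleep(500 * time.Millisecond) // let it retry a few times
	close(done)
	fmt.Println("stopped retry loop")
}
```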
Big +1 to a minimal fix here that can be picked back to the 3.2.x and 3.3.x streams. The impact of the hot loop is pretty severe: a simple backoff (start fast, multiply the backoff, cap at 100ms) took our client application from 700% CPU consumption when etcd was down to 10%. The Unavailable error code description explicitly references retrying with a backoff.
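A sketch of the capped exponential backoff described above, under the assumption that the backoff starts at 1ms, doubles on each consecutive failure, caps at 100ms, and resets after a success. `reconnect` is a hypothetical placeholder, not the etcd client code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

const (
	initialBackoff = time.Millisecond       // start fast
	maxBackoff     = 100 * time.Millisecond // cap at 100ms
)

// reconnect stands in for re-establishing the watch stream; here it always
// fails, the way a down server would.
func reconnect() error {
	return errors.New("rpc error: code = Unavailable desc = connection refused")
}

func main() {
	backoff := initialBackoff
	for i := 0; i < 8; i++ {
		if err := reconnect(); err != nil {
			fmt.Printf("attempt %d failed (%v), sleeping %v\n", i, err, backoff)
			time.Sleep(backoff)
			backoff *= 2
			if backoff > maxBackoff {
				backoff = maxBackoff
			}
			continue
		}
		backoff = initialBackoff // reset once a connection succeeds
	}
}
```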
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
I was debugging an issue in our project, calico-felix, which uses the etcdv3 API. I noticed that if I let the product start a watch and then take down the single-node etcd server, the product starts using a lot of CPU (150%+). Profiling, I traced it down to watchGrpcStream.run().
It looks like the connection fails quickly (connection refused, presumably), and that results in an immediate retry. I'm using v3.3.3 of the client (I started with v3.3.0 but then upgraded to see if it fixed the issue).
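A minimal sketch of the reproduction, assuming a single-node etcd at localhost:2379, the v3.3-era import path, and an arbitrary watch prefix (`/calico/` here is just an example). Start this watcher, then stop the etcd server and observe the client's CPU usage while the watch stream is retried internally.

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// Open a watch on a prefix; once the server goes away, the client keeps
	// trying to re-establish the underlying gRPC watch stream.
	for resp := range cli.Watch(context.Background(), "/calico/", clientv3.WithPrefix()) {
		if err := resp.Err(); err != nil {
			log.Printf("watch error: %v", err)
			continue
		}
		for _, ev := range resp.Events {
			log.Printf("event: %s %q -> %q", ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
}
```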