runtime: significant heap profiler memory usage increase in Go 1.23 #69590
Comments
CC @golang/runtime @felixge
Thanks for the detailed issue! I'm glad this seems localized for now; I agree we should wait and see if this is more widespread.
Just wanted to confirm that the
Yes, that's correct.
Copying from #57175 (comment): This is theoretically a problem even with 32 frames. The reproducer above only triggers extreme usage with a higher stack limit because it has 16 constant leaf frames. That limits it to 2^16 combinations with a 32 frame limit, but 2^112 with a 128 frame limit. The production application that encountered this was presumably similar.
We are also experiencing this issue after upgrading AzCopy to Go 1.23.x (Azure/azure-storage-azcopy#2901). This is a show-stopping issue requiring us to downgrade AzCopy back to 1.22.x. We cannot stay on 1.22.x forever, as CVEs will be reported against those versions. @dr2chase and @mknyszek, any updates on this issue? CC: @dphulkar-msft, @vibhansa-msft, @gapra-msft, @adreed-msft
As noted in the original post on this issue, you can set `GODEBUG=profstackdepth=32` to restore the old limit.
@mknyszek AzCopy is an application that customers download and run on their machine. We cannot set environment variables on the customer's machine beforehand.
Yeah, that doesn't have a workaround right now. Thanks for clarifying. I'll escalate the issue and get back to you soon on next steps.
Another possible workaround: somewhere in
That's a good point. If you don't find yourself collecting memory profiles from customers' machines via telemetry or anything else, then this seems like a decent (though unfortunate) workaround.
Good catch. We'll take a look at that option and see if it resolves the issue.
FYI if #71187 is fixed, that would give you another possible workaround.
Hi @mknyszek, CC: @seanmcc-msft, @vibhansa-msft, @gapra-msft, @adreed-msft
That suggests what you're experiencing is not quite the same as what this issue describes. Just to confirm, have you looked at the
@mknyszek, we agree. We are also going to attempt to repro with the environment variable
We did a bit more experimentation: neither the GODEBUG flag nor the runtime variable resolved the problem. Opening a new issue.
Go version: go1.23.1

Output of `go env` in your module/workspace:

What did you do?
We upgraded our Go services to Go 1.23.1. All of our services use continuous profiling and have the heap profiler enabled. Go 1.23 increased the default call stack depth for the heap profiler (and others) from 32 frames to 128 frames.
What did you see happen?
We saw a significant increase in memory usage for one of our services, in particular in the `/memory/classes/profiling/buckets:bytes` runtime metric: the maximum went from ~50MiB to almost 4GiB, an 80x increase. We also saw a significant increase in the time to serialize the heap profile, from <1 second to over 20 seconds.
We set the environment variable `GODEBUG=profstackdepth=32` to get the old limit, and the profiling bucket memory usage went back down.

What did you expect to see?
We were surprised at first to see such a significant memory usage increase. However, the affected program is doing just about the worst-case thing for the heap profiler. It parses complex, deeply-nested XML. This results in a massive number of unique, deep stack traces due to the mutual recursion in the XML parser. And the heap profiler never frees any stack trace it collects, so the cumulative size of the buckets becomes significant as more and more unique stack traces are observed.
See this gist for a (kind of kludgy) example program which sees a 100x increase in bucket size from Go 1.22 to Go 1.23.
I'm mainly filing this issue to document this behavior. Manually setting `GODEBUG=profstackdepth=32` mitigates the issue. I don't think anything necessarily needs to change in the runtime right now, unless this turns out to be a widespread problem.

cc @felixge