
runtime: significant heap profiler memory usage increase in Go 1.23 #69590

Open
nsrip-dd opened this issue Sep 23, 2024 · 18 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@nsrip-dd
Contributor

nsrip-dd commented Sep 23, 2024

Go version

go1.23.1

Output of go env in your module/workspace:

n/a

What did you do?

We upgraded our Go services to Go 1.23.1. All of our services use continuous profiling and have the heap profiler enabled. Go 1.23 increased the default call stack depth for the heap profiler (and others) from 32 frames to 128 frames.

What did you see happen?

We saw a significant increase in memory usage for one of our services, in particular the /memory/classes/profiling/buckets:bytes runtime metric:

[Screenshot: graph of the /memory/classes/profiling/buckets:bytes metric, 2024-09-23 10:37]

The maximum went from ~50MiB to almost 4GiB, an 80x increase. We also saw a significant increase in the time to serialize the heap profile, from <1 second to over 20 seconds.

We set the environment variable GODEBUG=profstackdepth=32 to get the old limit, and the profiling bucket memory usage went back down.

What did you expect to see?

We were surprised at first to see such a significant memory usage increase. However, the affected program is doing just about the worst-case thing for the heap profiler. It parses complex, deeply-nested XML. This results in a massive number of unique, deep stack traces due to the mutual recursion in the XML parser. And the heap profiler never frees any stack trace it collects, so the cumulative size of the buckets becomes significant as more and more unique stack traces are observed.

See this gist for a (kind of kludgy) example program which sees a 100x increase in bucket size from Go 1.22 to Go 1.23.

I'm mainly filing this issue to document this behavior. Manually setting GODEBUG=profstackdepth=32 mitigates the issue. I don't think anything necessarily needs to change in the runtime right now, unless this turns out to be a widespread problem.

cc @felixge

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Sep 23, 2024

@ianlancetaylor
Member

CC @golang/runtime @felixge

@dr2chase dr2chase added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 24, 2024
@mknyszek mknyszek added this to the Backlog milestone Sep 25, 2024
@mknyszek
Contributor

Thanks for the detailed issue! I'm glad this seems localized for now; I agree we should wait and see if this becomes more widespread.

@robert-thille-cb

Just wanted to confirm that the GODEBUG=profstackdepth=32 environment setting needs to be done at runtime, not at build time. Correct?

@nsrip-dd
Contributor Author

Yes, that's correct.

@prattmic
Member

Copying from #57175 (comment):

This is theoretically a problem even with 32 frames. I.e., with left- and right-recursive frames, there are 2^32 combinations of those frames. Storing that many unique samples would likely use ~128GB of memory, assuming 1 byte per frame in a sample.

The reproducer above triggers extreme usage only with a higher stack limit because it has 16 constant leaf frames. That limits it to 2^16 combinations with a 32-frame limit, but 2^112 with a 128-frame limit. The production application that encountered this was presumably similar.

@seanmcc-msft

seanmcc-msft commented Jan 8, 2025

We are also experiencing this issue after upgrading AzCopy to Go v1.23.x - Azure/azure-storage-azcopy#2901

This is a show-stopping issue requiring us to downgrade AzCopy back to 1.22.x. We cannot stay on 1.22.x forever, as CVEs will be reported on these versions.

@dr2chase and @mknyszek, any updates on this issue?

CC: @dphulkar-msft, @vibhansa-msft, @gapra-msft, @adreed-msft

@mknyszek
Contributor

mknyszek commented Jan 8, 2025

As noted in the original post on this issue, you can set GODEBUG=profstackdepth=32 as a workaround. Unfortunately, this value can only be set before startup, not in the application itself. Is that sufficient to unblock you? If not, why? Thanks.

@seanmcc-msft

@mknyszek AzCopy is an application that customers download and run on their machine. We cannot set env variables on the customer's machine beforehand.

@mknyszek
Contributor

mknyszek commented Jan 8, 2025

Yeah, that doesn't have a workaround right now. Thanks for clarifying. I'll escalate the issue and get back to you soon on next steps.

@nsrip-dd
Contributor Author

nsrip-dd commented Jan 8, 2025

Another possible workaround: somewhere in package main, either in func main() or a func init(), can you set runtime.MemProfileRate=0? That way the memory profiler will be off and won't collect any stacks.

@mknyszek
Contributor

mknyszek commented Jan 8, 2025

That's a good point. If you don't find yourself collecting memory profiles from customers' machines via telemetry or anything else, then this seems like a decent (though unfortunate) workaround.

@adreed-msft

Good catch. We'll take a look at that option and see if it resolves the issue.

@mknyszek
Contributor

mknyszek commented Jan 8, 2025

FYI if #71187 is fixed, that would give you another possible workaround.

@dphulkar-msft

Hi @mknyszek,
We attempted to set runtime.MemProfileRate = 0 in the main() function as suggested, but unfortunately, it did not resolve the issue.

CC: @seanmcc-msft , @vibhansa-msft, @gapra-msft, @adreed-msft

@mknyszek
Contributor

mknyszek commented Jan 9, 2025

That suggests what you're experiencing is not quite the same as what this issue describes.

Just to confirm, have you looked at the runtime/metrics metric for /memory/classes/profiling/buckets:bytes? Is it the dominant source of your memory usage when switching to Go 1.23?

@seanmcc-msft

@mknyszek, we agree. We are also going to attempt to repro with the environment variable GODEBUG=profstackdepth=32 set. If this does not mitigate the issue, we will probably open a new GitHub issue with full context.

@adreed-msft

We did a bit more experimentation: neither the GODEBUG flag nor the runtime variable resolved the problem. Cracking open a new issue.
