This repository has been archived by the owner on Oct 7, 2023. It is now read-only.

Memory/leak issue #97

Open
matthewmrichter opened this issue May 24, 2018 · 9 comments

Comments

@matthewmrichter

We're using zetcd (running in a container from the quay.io/repository/coreos/zetcd tag v0.0.5) as middleware between etcd and Mesos. We're seeing the memory usage of the zetcd container climb steadily and without bound. It overflowed a 4 GB RAM instance very quickly, so we moved it to a host with 8 GB, but the zetcd container's memory usage still keeps growing.

I'd be interested in helping solve this. Is there anything I can provide to help expose the memory leak? Is there any automatic garbage collection or similar that could be implemented? Are there any docker container launch parameters to contain its hunger for memory?
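For what it's worth, the only container-level knob I've found so far is Docker's own memory limit, which is a stopgap rather than anything zetcd-specific. A sketch, assuming the quay.io image and the default 2181 port mapping (adjust the tag and arguments for your setup):

docker run -d -p 2181:2181 \
  --memory=2g --memory-swap=2g \
  quay.io/coreos/zetcd:v0.0.5 \
  <your usual zetcd arguments>

Setting --memory-swap equal to --memory means the container gets OOM-killed instead of swapping once it crosses the limit, which at least keeps the host responsive.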

@gyuho
Contributor

gyuho commented May 24, 2018

Hmm, do you see the same behavior with v0.0.4?

ref. v0.0.4...v0.0.5

@matthewmrichter
Author

matthewmrichter commented May 24, 2018 via email

@gyuho
Contributor

gyuho commented May 24, 2018

It would be best if you could provide steps to reproduce. Also, try heap-profiling zetcd.

@matthewmrichter
Author

I'm new to Go; could you provide some guidance on enabling heap profiling?

@gyuho
Contributor

gyuho commented May 24, 2018

@matthewmrichter Please enable profiling via the zetcd --pprof-addr flag.

And do something like

go tool pprof -seconds=30 http://zetcd-endpoint/debug/pprof/heap

go tool pprof ~/go/src/github.com/coreos/etcd/bin/etcd ./pprof/pprof.localhost\:2379.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz
go tool pprof -pdf ~/go/src/github.com/coreos/etcd/bin/etcd ./pprof/pprof.localhost\:2379.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz > ~/a.pdf

where you need to replace the */bin/etcd binary paths with the zetcd binary.
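Concretely, assuming the pprof listener goes on localhost:6060 and the zetcd binary lives at ~/go/bin/zetcd (both just examples, adjust for your setup), that would look something like:

zetcd --zkaddr 0.0.0.0:2181 --endpoints localhost:2379 --pprof-addr localhost:6060

go tool pprof -seconds=30 http://localhost:6060/debug/pprof/heap
go tool pprof -pdf ~/go/bin/zetcd ./pprof/pprof.localhost\:6060.*.pb.gz > ~/a.pdf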

I would first try to reproduce without containerization.

@matthewmrichter
Author

Great, I'll put some time into that. Thanks so far

@matthewmrichter
Author

Ok, I think the main offender here may actually be Marathon (https://mesosphere.github.io/marathon/), not Mesos. The usage really shoots up when Marathon starts.

I converted zetcd to run as a service rather than in a container and took a heap profile shortly after startup. Memory already shoots up to 5 GB right away. I will keep an eye on htop for a little while to see if it approaches 7+ GB as well, and will provide another profile.

a.pdf

Steps to reproduce (a rough sketch of the commands follows the list) -

  1. Build an etcd cluster or endpoint (a separate autoscaling group behind a load balancer).
  2. On the mesosmaster server, install zetcd and configure it to point to the remote etcd cluster's load balancer as its etcd endpoint.
  3. On the mesosmaster server, install Mesos and Marathon (marathon::version: '1.4.8-1.0.660.el7', mesos::version: '1.6.0-2.0.4').
  4. Configure Mesos and Marathon to use localhost:2181 (the zetcd port) as the ZooKeeper URL.
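
Roughly, the wiring in steps 2-4 looks like the following; hostnames, ports, and the exact flags are from our setup and may differ for you:

# step 2: zetcd on the mesosmaster, fronting the remote etcd load balancer
zetcd --zkaddr 0.0.0.0:2181 --endpoints http://etcd-lb.internal:2379

# step 4: mesos and marathon both talk ZooKeeper to zetcd on localhost
mesos-master --zk=zk://localhost:2181/mesos --quorum=1 --work_dir=/var/lib/mesos
marathon --master zk://localhost:2181/mesos --zk zk://localhost:2181/marathon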

@matthewmrichter
Author

matthewmrichter commented May 25, 2018

I gave it a while, and according to htop the process had gotten up to 6 GB. Here's a second profile taken at this point; it looks mostly the same:

b.pdf

@matthewmrichter
Author

Here's a question, based on the bottleneck being in that "ReadPacket" method:

Currently, I have etcd running on server A and Marathon/zetcd on server B. Would it make more sense for zetcd and etcd to live together on server A, rather than having zetcd reach out to etcd across the LAN?
