Memory leak with sync.Pool in JetStreams #6442
Comments
Ran … See also: main...maurice/6442
Looking at the output now, that does appear to be the case:
Could you share the original reproducer instead of the benchmark? Likely the benchmark doesn't cover some part that your reproducer did.
I see, thanks for the example. Unfortunately, I cannot share the original reproducer because it's part of our production infrastructure, including services. I can share memory profiles and maybe NATS logs, but the logs are almost empty. Could that be helpful?
Feel free to share those: [email protected]. Could you also share …
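For anyone trying to reproduce this, heap profiles from a Go reproducer service can be exposed with net/http/pprof. The sketch below is generic and not nats-server configuration (the server has its own profiling port option); the address is an arbitrary choice.

```go
// Minimal sketch: expose pprof endpoints from a Go reproducer service.
// This is not nats-server configuration; the address below is arbitrary.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
	// A heap profile can then be fetched from
	// http://localhost:6060/debug/pprof/heap and inspected with `go tool pprof`.
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```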
Thanks! I sent profiles to your email, let me share statistics here:
It is a part of …

╭────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                           JetStream Summary                                             │
├────────┬─────────┬─────────┬───────────┬──────────┬─────────┬────────┬─────────┬─────────┬─────────────┤
│ Server │ Cluster │ Streams │ Consumers │ Messages │ Bytes   │ Memory │ File    │ API Req │ API Err     │
├────────┼─────────┼─────────┼───────────┼──────────┼─────────┼────────┼─────────┼─────────┼─────────────┤
│ nats   │         │      63 │     3,132 │  172,671 │ 254 MiB │    0 B │ 254 MiB │   7,174 │ 18 / 0.250% │
├────────┼─────────┼─────────┼───────────┼──────────┼─────────┼────────┼─────────┼─────────┼─────────────┤
│        │         │      63 │     3,132 │  172,671 │ 254 MiB │    0 B │ 254 MiB │   7,174 │          18 │
╰────────┴─────────┴─────────┴───────────┴──────────┴─────────┴────────┴─────────┴─────────┴─────────────╯

As you can see, these two streams take most of our workload. A service is reading messages from … I attached another profile to the email: the moment when we add a new …
Could you share that stream/consumer info still? Would like to see the full configuration outputs from those commands.
Observed behavior
We have a few consumers for a stream with a high volume of messages, each averaging a few KB in size. The stream works perfectly fine when only the producer adds messages. However, when we add a couple of consumers to the stream, it starts leaking memory, about 500MB in 10 minutes.
Other streams with lighter workloads don't cause problems. Sometimes, memory consumption either slows down or crashes rapidly when we add consumers to other random streams. Currently, we must disable this stream because it consumes all 128 GB of memory on the host.
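To make the described setup concrete, here is a minimal sketch of a stream with a producer and a pull consumer added afterwards, using the classic nats.go JetStream API. The stream name, subject, and durable name are invented for illustration; this is not the original reproducer.

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Hypothetical stream; the real stream and subject names are not in the issue.
	if _, err := js.AddStream(&nats.StreamConfig{
		Name:     "EVENTS",
		Subjects: []string{"events.>"},
		Storage:  nats.FileStorage,
	}); err != nil {
		log.Fatal(err)
	}

	// Producer side: keep publishing messages a few KB in size.
	payload := make([]byte, 4<<10)
	go func() {
		for {
			if _, err := js.Publish("events.orders", payload); err != nil {
				log.Println("publish:", err)
			}
		}
	}()

	// Consumer side: adding a durable pull consumer is the step after which
	// the reported memory growth starts.
	sub, err := js.PullSubscribe("events.>", "worker-1")
	if err != nil {
		log.Fatal(err)
	}
	for {
		msgs, err := sub.Fetch(100)
		if err != nil {
			continue // typically a timeout when no messages are pending
		}
		for _, m := range msgs {
			m.Ack()
		}
	}
}
```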
We set `GOMEMLIMIT`, but to be honest, we expected it to help more: the memory growth slowed but didn't stop.

After some profiling and investigation, I found interesting behavior in `server/stream.go`. It seems you intended to reduce garbage generation using `sync.Pool`, but the implementation confused me a bit. I expected to see `New()` in the pool definition, but it isn't there. My understanding is that you `Put` a new object into the pool first, then retrieve it with `Get` elsewhere. This approach is complex and makes it hard to track where objects are created, `Get`, and `Put` back into the pool, which might be the source of the bug. See Steps to reproduce: the number of `Put` operations exceeds the number of `Get` operations.

If I remove the `sync.Pool` lines, GC works as expected and memory doesn't leak.

I hope this helps, thanks.
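To illustrate the pattern in isolation, here is a minimal, self-contained sketch of a `sync.Pool` declared without a `New` function, where objects are allocated inside the Get helper and handed back via a Put helper. The helper names mirror the ones mentioned in this issue, but the struct and logic are only an illustration, not the actual `server/stream.go` code.

```go
package main

import (
	"fmt"
	"sync"
)

// jsPubMsg stands in for the pooled JetStream publish message; the real
// struct in server/stream.go is different, this is only an illustration.
type jsPubMsg struct {
	buf []byte
}

// The pool is declared without a New function, so it starts empty and only
// ever contains objects that some code path explicitly Put back.
var jsPubMsgPool sync.Pool

// getJSPubMsgFromPool mirrors the "Get elsewhere" half of the pattern:
// if the pool is empty, the object is allocated here instead of in New.
func getJSPubMsgFromPool() *jsPubMsg {
	if m, ok := jsPubMsgPool.Get().(*jsPubMsg); ok {
		return m
	}
	return &jsPubMsg{}
}

// returnToPool mirrors the "Put first" half of the pattern. The backing
// array of buf stays reachable through the pool until the GC empties it.
func returnToPool(m *jsPubMsg) {
	m.buf = m.buf[:0]
	jsPubMsgPool.Put(m)
}

func main() {
	m := getJSPubMsgFromPool()
	m.buf = append(m.buf, make([]byte, 4<<10)...) // a few-KB payload, as in the report
	returnToPool(m)

	// On the same goroutine the next Get usually returns the same object,
	// including its grown backing array.
	reused := getJSPubMsgFromPool()
	fmt.Printf("reused capacity: %d bytes\n", cap(reused.buf))
}
```

Because a pooled object keeps its grown backing array until the pool is emptied by the garbage collector, a code path that calls `Put` more often than `Get` can keep a lot of buffer memory reachable through the pool.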
Expected behavior
Memory is collected by GC
Server and client version
github.com/nats-io/nats-server: different versions, including the main branch
github.com/nats-io/nats.go: v1.37.0
Host environment
Ubuntu 20.04 x86 without containers.
Steps to reproduce
To investigate this, I ran `BenchmarkJetStreamConsumeWithFilters` with a few updates (sketched below), counting:
- `Put` calls in `returnToPool`
- `Get` calls in `getJSPubMsgFromPool`
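A rough sketch of what such counting could look like, assuming simple atomic counters; only `returnToPool` and `getJSPubMsgFromPool` are names from the server code, everything else here (counter names, the stand-in struct) is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Hypothetical counters added for the benchmark run.
var (
	poolPuts atomic.Int64
	poolGets atomic.Int64
	pool     sync.Pool
)

type jsPubMsg struct{}

func getJSPubMsgFromPool() *jsPubMsg {
	poolGets.Add(1) // count every Get attempt
	if m, ok := pool.Get().(*jsPubMsg); ok {
		return m
	}
	return &jsPubMsg{}
}

func returnToPool(m *jsPubMsg) {
	poolPuts.Add(1) // count every Put
	pool.Put(m)
}

func main() {
	// In the real benchmark these counters would be printed after
	// BenchmarkJetStreamConsumeWithFilters finishes.
	m := getJSPubMsgFromPool()
	returnToPool(m)
	returnToPool(&jsPubMsg{}) // a Put without a matching Get
	fmt.Printf("puts=%d gets=%d\n", poolPuts.Load(), poolGets.Load())
}
```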
After the benchmark finished, I got this result:
The branch with the code.