Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bufferpool: Improve performance #603

Merged
merged 1 commit into from
Jan 11, 2025
Merged

Conversation

matheusd
Copy link
Contributor

This improves the performance of bufferpool by removing the mutex called on every operation (via init()) and improving bucket indexing by removing the loop and using direct index calculation.

Warning: this is a breaking API change. Zero valued Pool objects are no longer safe to be used and MUST be initialized by NewPool().

Benchmark results against current main branch
name                                        old time/op    new time/op    delta
MessageGetFirstSegment-6                      88.9ns ± 1%    51.5ns ± 2%  -42.04%  (p=0.000 n=10+10)
TextMovementBetweenSegments-6                  458µs ± 2%     457µs ± 1%     ~     (p=0.684 n=10+10)
Marshal-6                                     1.26µs ± 1%    1.21µs ± 1%   -4.35%  (p=0.000 n=10+10)
Marshal_ReuseMsg-6                             742ns ± 1%     682ns ± 1%   -8.12%  (p=0.000 n=9+10)
Unmarshal-6                                    890ns ± 1%     883ns ± 1%   -0.74%  (p=0.019 n=8+10)
Unmarshal_Reuse-6                              557ns ± 2%     562ns ± 2%     ~     (p=0.063 n=10+10)
Decode-6                                      4.91ms ± 1%    4.55ms ± 1%   -7.38%  (p=0.000 n=10+10)
Growth/SingleSegment/release=false-6          11.1ms ± 2%    11.2ms ± 2%   +0.97%  (p=0.019 n=10+10)
Growth/MultiSegment/release=false-6           11.9ms ± 1%    11.7ms ± 2%   -2.00%  (p=0.000 n=9+10)
Growth/SingleSegment/release=true-6           11.3ms ± 2%    11.3ms ± 1%     ~     (p=0.968 n=10+9)
Growth/MultiSegment/release=true-6            11.5ms ± 1%    11.5ms ± 2%     ~     (p=0.065 n=10+9)
SmallMessage/SingleSegment/release=false-6    1.15µs ± 2%    1.03µs ± 1%  -10.81%  (p=0.000 n=9+10)
SmallMessage/MultiSegment/release=false-6     1.22µs ± 1%    1.14µs ± 1%   -6.99%  (p=0.000 n=10+10)
SmallMessage/SingleSegment/release=true-6      721ns ± 1%     621ns ± 1%  -13.91%  (p=0.000 n=9+10)
SmallMessage/MultiSegment/release=true-6       667ns ± 1%     618ns ± 1%   -7.29%  (p=0.000 n=10+10)

name                                        old alloc/op   new alloc/op   delta
MessageGetFirstSegment-6                       0.00B          0.00B          ~     (all equal)
Marshal-6                                     1.32kB ± 0%    1.32kB ± 0%     ~     (p=0.294 n=10+8)
Marshal_ReuseMsg-6                             96.0B ± 0%     96.0B ± 0%     ~     (all equal)
Unmarshal-6                                     352B ± 0%      352B ± 0%     ~     (all equal)
Unmarshal_Reuse-6                              0.00B          0.00B          ~     (all equal)
Decode-6                                       801kB ± 0%     801kB ± 0%   +0.01%  (p=0.000 n=10+10)
Growth/SingleSegment/release=false-6          4.35MB ± 0%    4.35MB ± 0%     ~     (p=0.604 n=9+10)
Growth/MultiSegment/release=false-6           1.59MB ± 0%    1.59MB ± 0%     ~     (p=0.322 n=9+10)
Growth/SingleSegment/release=true-6           4.35MB ± 0%    4.35MB ± 0%     ~     (p=0.720 n=9+10)
Growth/MultiSegment/release=true-6             536kB ± 1%     536kB ± 0%     ~     (p=0.736 n=10+9)
SmallMessage/SingleSegment/release=false-6    1.22kB ± 0%    1.22kB ± 0%   +0.06%  (p=0.005 n=9+10)
SmallMessage/MultiSegment/release=false-6     1.43kB ± 0%    1.43kB ± 0%     ~     (p=0.248 n=10+9)
SmallMessage/SingleSegment/release=true-6      80.0B ± 0%     80.0B ± 0%     ~     (all equal)
SmallMessage/MultiSegment/release=true-6       80.0B ± 0%     80.0B ± 0%     ~     (all equal)

name                                        old allocs/op  new allocs/op  delta
MessageGetFirstSegment-6                        0.00           0.00          ~     (all equal)
Marshal-6                                       5.00 ± 0%      5.00 ± 0%     ~     (all equal)
Marshal_ReuseMsg-6                              1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Unmarshal-6                                     3.00 ± 0%      3.00 ± 0%     ~     (all equal)
Unmarshal_Reuse-6                               0.00           0.00          ~     (all equal)
Decode-6                                       10.0k ± 0%     10.0k ± 0%     ~     (all equal)
Growth/SingleSegment/release=false-6            20.0 ± 0%      20.0 ± 0%     ~     (all equal)
Growth/MultiSegment/release=false-6             20.0 ± 0%      20.0 ± 0%     ~     (all equal)
Growth/SingleSegment/release=true-6             18.8 ± 4%      19.0 ± 0%     ~     (p=0.471 n=9+9)
Growth/MultiSegment/release=true-6              4.00 ± 0%      4.00 ± 0%     ~     (all equal)
SmallMessage/SingleSegment/release=false-6      4.00 ± 0%      4.00 ± 0%     ~     (all equal)
SmallMessage/MultiSegment/release=false-6       5.00 ± 0%      5.00 ± 0%     ~     (all equal)
SmallMessage/SingleSegment/release=true-6       1.00 ± 0%      1.00 ± 0%     ~     (all equal)
SmallMessage/MultiSegment/release=true-6        1.00 ± 0%      1.00 ± 0%     ~     (all equal)

name                                        old speed      new speed      delta
Decode-6                                     196MB/s ± 1%   211MB/s ± 1%   +7.97%  (p=0.000 n=10+10)
Growth/SingleSegment/release=false-6        94.3MB/s ± 2%  93.4MB/s ± 2%   -0.96%  (p=0.017 n=10+10)
Growth/MultiSegment/release=false-6         87.9MB/s ± 1%  89.6MB/s ± 2%   +2.04%  (p=0.000 n=9+10)
Growth/SingleSegment/release=true-6         92.7MB/s ± 2%  92.6MB/s ± 3%     ~     (p=0.754 n=10+10)
Growth/MultiSegment/release=true-6          91.5MB/s ± 1%  90.8MB/s ± 2%     ~     (p=0.063 n=10+9)
SmallMessage/SingleSegment/release=false-6  62.5MB/s ± 2%  70.0MB/s ± 1%  +12.10%  (p=0.000 n=9+10)
SmallMessage/MultiSegment/release=false-6   58.8MB/s ± 1%  63.3MB/s ± 1%   +7.51%  (p=0.000 n=10+10)
SmallMessage/SingleSegment/release=true-6    100MB/s ± 1%   116MB/s ± 1%  +16.15%  (p=0.000 n=9+10)
SmallMessage/MultiSegment/release=true-6     108MB/s ± 1%   116MB/s ± 1%   +7.87%  (p=0.000 n=10+10)

This improves the performance of bufferpool by removing the mutex called
on every operation (via init()) and improving bucket indexing by
removing the loop and using direct index calculation.

Warning: this is a breaking API change. Zero valued Pool objects are no
longer safe to be used and MUST be initialized by NewPool().
@Semisol
Copy link
Contributor

Semisol commented Jan 11, 2025

I am seeing some odd results on my side, with RPC increasing by a consistent 10% or so. packed Reader-16 benchmark seems to be impacted too for some reason.

I will report back after I run the tests on another machine.

@matheusd
Copy link
Contributor Author

I haven't looked, but two things to watch for:

  1. I believe there was actually a bug in the prior bufferpool implementation that was not respecting minAlloc, so using this fixed version increased allocs (and thus reduced overall throughput).

  2. The benchmark may not be releasing data back to the bufferpool, which combined with the prior point may be increasing pressure to runtime memory management (that was the reason I added the .Release() in the prior benchmark).

Copy link
Collaborator

@lthibault lthibault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@lthibault lthibault merged commit c559d94 into capnproto:main Jan 11, 2025
4 checks passed
@matheusd matheusd deleted the bufferpool branch January 13, 2025 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants