Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid composite buffers for small messages #68

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

franz1981
Copy link
Contributor

@franz1981 franz1981 commented Aug 22, 2023

This change is optimizing the use case of Netty composite buffers in case a small payload/data is required.

The NIO Netty flow will end up copying the compontents from the CompositeBuffer in a single native direct buffer at
https://github.com/netty/netty/blob/fbb0207d5ecce39f3d63450dfd59bad5510b8e8b/transport/src/main/java/io/netty/channel/nio/AbstractNioChannel.java#L443.

For a composite buffer made of 2 components (the length and compression flag + data ones) it means iterating them (updating for each the offsets etc etc) in CompositeByteBuf::getBytes(int index, ByteBuf dst, int dstIndex, int length) (calling back to UnpooledHeapByteBuf::getBytes(int index, ByteBuf dst, int dstIndex, int length) with the direct dst for each component, updating each ones offset, checking accessibility etc etc).

In the case of a single merged buffer, we pay an additional allocation (and copy of data), but:

  • the allocation is actually amortized: (for JDK 17 and COOPS/CCPS) allocating the composite + the length buffer still cost a total of +176 bytes, while allocating a new buffer (not composite) would cost data length + 5 + 48 bytes and perform the additional data copy. eg for 128 bytes data, the composite would still allocate +176 bytes, while this PR will allocate 128 + 53 = 181 bytes which is similar (but we can throw away immediately the original data buffer, no longer needed!)
  • no ping-pong of buffer types to find the appropriate copy method nor iterations required (or accessiblity/offsets adjustments), just calling straight to setBytes(AbstractByteBuf buf, long addr, int index, ByteBuf src, int srcIndex, int length) on the direct pooled buffer, using a single vertx heap buffer which contains length + data.

The point about reachability of buffers is a stealthy but important consideration:

  • in the original code version the reachable live data must be the original buffer AND the composite one, till the NIO channel copy them into the direct single buffer (to be sent on the wire) eg 128 bytes original buffer + 176 composite = 304 bytes live data
  • in the new code, instead, instead of the composite, we create a single new buffer, and the previous one case be GCed, making it, for 128 bytes original data, just 181 bytes in total (which is way less too!)

try {
return BufferInternal.buffer(fullMsg);
} finally {
bbuf.release();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vertx buffers are unreleasable and not pooled

Copy link
Contributor Author

@franz1981 franz1981 Aug 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, but the type is still ByteBuf so I've added it just in case we decide to change rules on the concrete type used: I can add an assertion here to be sure in the future nothing break too (or just remove it as by your suggestion)

@franz1981 franz1981 changed the title It enforces the right size of the Composite's capacity or avoid using it, if not necessary Avoid composite buffers for small messages Aug 22, 2023
@vietj vietj added this to the 5.0.0 milestone Aug 29, 2023
@franz1981
Copy link
Contributor Author

@He-Pin I remember you have commented on the gRpc benchmark...we are still working on enabling our load generator to support gRpc, hence we have no "official" and supported benchmarks to verify this. This is mostly based on my knowledge of the Netty stack and the OpenJDK platform; do you have anything to test it and verify if it is worthy?

if (len <= 128) {
int totalBytes = 5 + len;
// let's use vertx buffer here because it doesn't have any atomic release, if unpooled
ByteBuf fullMsg = VertxByteBufAllocator.DEFAULT.heapBuffer(totalBytes, totalBytes);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we would like to use pooled direct buffers here instead, wdyt @vietj ?
It would save the transport later to copy it into a direct one :/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants