Explicit HTTP content copy/retain #116115
Conversation
Hi @mhl-b, I've created a changelog YAML for you.
Pinging @elastic/es-distributed (Team:Distributed)
 * Release underlying HTTP request and related buffers.
 */
@Override
public void close() {
Is this only for the testPipelineOverflow workaround? If so, I'd rather we used a package-private method instead of making this public and Releasable. The eventual goal would be to release these buffers up-front and then we can remove the workaround, and in the meantime we don't want to accumulate other spots that explicitly close the request.
I can add a method that returns HttpBody and then we can close the body explicitly. Right now there is public access from RestRequest to close pooled/streamed content.
Armin raised a concern that with this change we'll be retaining these Netty buffers for much longer than we do today, which might result in much more stress on the Netty allocator (especially since the Netty buffers are oversized). Could we move to releasing these things on return from …
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
@DaveCTurner, I addressed the feedback in 226060e
LGTM - two optional comments that don't block this change.
@@ -102,9 +103,11 @@ public RestChannelConsumer prepareRequest(final RestRequest request, final NodeC
    boolean defaultRequireDataStream = request.paramAsBoolean(DocWriteRequest.REQUIRE_DATA_STREAM, false);
    bulkRequest.timeout(request.paramAsTime("timeout", BulkShardRequest.DEFAULT_TIMEOUT));
    bulkRequest.setRefreshPolicy(request.param("refresh"));
    ReleasableBytesReference content = request.requiredReleasableContent();
    content.mustIncRef();
nit: I think the exception handling here is correct, but I wonder if we should use the same pattern as in RestIndexAction where we call mustIncRef within the lambda, just before client.bulk. That will be more robust to future changes that might introduce more exception paths around here.
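For context, a rough sketch of that pattern (a hypothetical fragment: the parsing step and the responseListener helper are placeholders, not the real RestIndexAction or RestBulkAction code):

@Override
public RestChannelConsumer prepareRequest(final RestRequest request, final NodeClient client) throws IOException {
    ReleasableBytesReference content = request.requiredReleasableContent();
    BulkRequest bulkRequest = new BulkRequest();
    // ... parse `content` into bulkRequest here; an exception thrown at this point leaves
    // the ref count untouched, so the framework's release after handleRequest cleans up ...
    return channel -> {
        // take our own reference only once we are committed to dispatching the action
        content.mustIncRef();
        client.bulk(bulkRequest, ActionListener.runAfter(
            responseListener(channel), // placeholder for the real response listener
            content::close));          // drop our reference when the action completes
    };
}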
if (refCnt() == 1) {
    trashContent();
}
Yeah it's not valid but it's also not obvious from the code that this isn't valid. Suggest a comment like this:
diff --git a/modules/transport-netty4/src/main/java/org/elasticsearch/transport/netty4/NettyAllocator.java b/modules/transport-netty4/src/main/java/org/elasticsearch/transport/netty4/NettyAllocator.java
index 7fd092c1964..791d55481d5 100644
--- a/modules/transport-netty4/src/main/java/org/elasticsearch/transport/netty4/NettyAllocator.java
+++ b/modules/transport-netty4/src/main/java/org/elasticsearch/transport/netty4/NettyAllocator.java
@@ -373,6 +373,7 @@ public class NettyAllocator {
@Override
public boolean release() {
if (refCnt() == 1) {
+ // see [NOTE on racy trashContent() calls]
trashContent();
}
return super.release();
@@ -381,11 +382,20 @@ public class NettyAllocator {
@Override
public boolean release(int decrement) {
if (refCnt() == decrement && refCnt() > 0) {
+ // see [NOTE on racy trashContent() calls]
trashContent();
}
return super.release(decrement);
}
+ // [NOTE on racy trashContent() calls]: We trash the buffer content _before_ reducing the ref count to zero, which looks racy
+ // because in principle a concurrent caller could come along and successfully retain() this buffer to keep it alive after it's been
+ // trashed. Such a caller would sometimes get an IllegalReferenceCountException ofc but that's something it could handle - see for
+ // instance org.elasticsearch.transport.netty4.Netty4Utils.ByteBufRefCounted.tryIncRef. Yet in practice this should never happen,
+ // we only ever retain() these buffers while we know them to be alive (i.e. via RefCounted#mustIncRef or its moral equivalents) so
+ // it'd be a bug for a caller to retain() a buffer whose ref count is heading to zero and whose contents we've already decided to
+ // trash.
+
private void trashContent() {
if (trashed == false) {
trashed = true;
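(For reference, the defensive retain that the note points at looks roughly like the following; an illustrative sketch in the spirit of Netty4Utils.ByteBufRefCounted#tryIncRef, not the verbatim source.)

static boolean tryRetain(io.netty.buffer.ByteBuf buf) {
    if (buf.refCnt() <= 0) {
        return false; // already fully released
    }
    try {
        buf.retain();
        return true;
    } catch (io.netty.util.IllegalReferenceCountException e) {
        // lost the race against the final release(); treat the buffer as dead
        return false;
    }
}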
@@ -117,10 +120,16 @@ public RestChannelConsumer prepareRequest(final RestRequest request, final NodeC
        request.getRestApiVersion()
    );
} catch (Exception e) {
    content.close();
We shouldn't close the content on exception here any more.
oops, thank you!
LGTM again
💔 Backport failed
You can use sqren/backport to manually backport by running …
Today we implicitly copy HTTP content before parsing, except in two RestHandlers - RestSearchAction and RestBulkAction. The original problem comes from Netty's pooled byte buffers: Netty reduces allocations and GC pressure through manual reference counting, but there is no mechanism in RestHandler that lets us track the refCount, so to avoid leaking buffers we copy the whole HTTP body and let GC take care of it. The flag RestHandler#allowsUnsafeBuffers() was introduced to indicate that a handler will not leak content at the end of the request, so that it can be cleaned up properly. That flag is removed in this PR.

This PR removes the implicit HTTP content copy. A RestHandler must now say explicitly, in prepareRequest(), how the content is handled - either copied or retained. The content's reference count is decremented after RestHandler.handleRequest, and the content is released once it reaches 0.

The major change is in RestRequest and the "rest" package, where a few new methods are introduced to help with content handling:

- RestRequest.content() allocates a new GC-managed ("safe") buffer and copies the content into it. RestRequest.requiredContent() builds on content(), so it also implies a copy.
- releasableContent() and requiredReleasableContent() return a ReleasableBytesReference backed by the Netty ByteBuf. These two are the recommended way to handle content without copying it (see the sketch below).
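To illustrate the two options from a handler's point of view (a sketch using the names above; the surrounding handler code is hypothetical):

// Option 1 - copy: content()/requiredContent() return a GC-managed copy, so the handler
// never needs to think about reference counts.
BytesReference copied = request.requiredContent();

// Option 2 - retain: requiredReleasableContent() returns the pooled, Netty-backed bytes
// without copying. The framework drops its reference after handleRequest, so a handler
// that keeps the bytes alive longer (e.g. across an async action) must take and later
// release its own reference.
ReleasableBytesReference pooled = request.requiredReleasableContent();
pooled.mustIncRef();      // our reference, on top of the framework's
// ... hand `pooled` to the async action ...
// when that work completes:
pooled.close();           // buffer goes back to the pool once the ref count hits 0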
ByteBufAllocator
is introduced -TrashingByteBufAllocator
. It allocates buffers that trash(fill with zeros) it's content on release. This allocator works only in tests by wrapping our current allocator. Our abstraction over byte buffersReleasableBytesReference
can expose underlying byte arrays, effectively escaping ref counts lifecycle. That means we might still have an access to bytes after release that will pass undetected. Filling buffer with zeros increases chances to catch use-after-free in tests.Netty4TrashingAllocatorIT
displays how RestHandler can leak buffers.
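The core trick can be illustrated with a small helper (a sketch of the idea only; the real TrashingByteBufAllocator wraps the allocator itself rather than using a helper like this, and the class name below is made up):

import io.netty.buffer.ByteBuf;

final class BufferTrasher {
    private BufferTrasher() {}

    // Zero the buffer's whole backing region just before the final release, so a
    // use-after-free read in a test sees zeros instead of stale data that may still
    // look like a valid request body.
    static boolean releaseAndTrash(ByteBuf buf) {
        if (buf.refCnt() == 1) {
            buf.setZero(0, buf.capacity());
        }
        return buf.release();
    }
}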