[ML] Retry on streaming errors #123076
Conversation
We now always retry based on the provider's configured retry logic rather than the HTTP status code. Some providers (e.g. Cohere, Anthropic) return 200 status codes with error bodies, while others (e.g. OpenAI, Azure) return non-200 status codes with non-streaming bodies.

Notes:
- Refactored from HttpResult to StreamingHttpResult: the byte body is now the streaming element, while the HTTP response lives outside the stream.
- Refactored StreamingHttpResultPublisher so that it only pushes the byte body into a queue.
- Tests now have to wait for the response to be fully consumed before closing the service; otherwise the close method shuts down the mock web server and Apache throws an error.
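A minimal sketch of the refactored result shape described above; the record components follow the description, but the exact definition and the success helper are assumptions, not the PR's code:

```java
import java.util.concurrent.Flow;

import org.apache.http.HttpResponse;

// Sketch (assumed definition): the HTTP response metadata is available
// immediately, while the body arrives as a stream of byte chunks.
record StreamingHttpResult(HttpResponse response, Flow.Publisher<byte[]> body) {

    // Retry logic can now inspect the status line before any body bytes
    // are consumed, unlike the old HttpResult(response, byte[]) pairing.
    boolean hasSuccessfulStatus() {
        int status = response.getStatusLine().getStatusCode();
        return status >= 200 && status < 300;
    }
}
```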
Hi @prwhelan, I've created a changelog YAML for you.
Pinging @elastic/ml-core (Team:ML)
```diff
- class StreamingHttpResultPublisher implements HttpAsyncResponseConsumer<HttpResponse>, Flow.Publisher<HttpResult> {
-     private final HttpSettings settings;
-     private final ActionListener<Flow.Publisher<HttpResult>> listener;
+ class StreamingHttpResultPublisher implements HttpAsyncResponseConsumer<Void> {
```
This file is almost completely different; it might be easier to review it as if it's new.
We're now just sending the `byte[]` as a stream, rather than sending `HttpResult(response, byte[])`, which simplifies what is being queued.
I separated the class out into the main Apache consumer and two subclasses to publish the consumed data and manage pausing/unpausing Apache. It's hopefully clearer that we're doing three distinct things (see the sketch after this list):
- Apache will continuously read the response bytes and store them in our buffer queue.
- Meanwhile, our response-handling code will only pull from the buffer when it is ready to send data to the client (e.g. when the client requests it).
- We have the pause/unpause logic to slow Apache down if we've stored too many bytes in memory and are draining the buffer too slowly.
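As referenced above, a rough sketch of that pause/unpause flow; the class and field names here are hypothetical, and only the overall pattern follows the description:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.http.nio.IOControl;

// Hypothetical sketch: Apache pushes byte chunks into a queue, and we
// suspend input once the buffered byte count crosses a threshold.
class BufferBackpressure {
    private final Queue<byte[]> queue = new ConcurrentLinkedQueue<>();
    private final AtomicLong bytesInQueue = new AtomicLong();
    private final long maxBytes;
    private volatile IOControl savedIoControl;

    BufferBackpressure(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Called on Apache's IO thread for each chunk read off the socket.
    void onBytes(byte[] chunk, IOControl ioControl) {
        queue.add(chunk);
        if (bytesInQueue.addAndGet(chunk.length) >= maxBytes) {
            savedIoControl = ioControl;
            ioControl.suspendInput(); // pause: stop reading from the socket
        }
    }

    // Called when the downstream subscriber is ready for the next chunk.
    byte[] poll() {
        byte[] chunk = queue.poll();
        if (chunk != null) {
            long current = bytesInQueue.updateAndGet(c -> Long.max(0, c - chunk.length));
            IOControl io = savedIoControl;
            if (io != null && current < maxBytes) {
                savedIoControl = null;
                io.requestInput(); // unpause: resume reading
            }
        }
        return chunk;
    }
}
```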
```diff
- class StreamingResponseHandler implements Flow.Processor<HttpResult, HttpResult> {
```
This class only existed to read the response headers and determine if there is an error, but we now do that in `RetryingHttpSender` directly.
```java
    return RestStatus.isSuccessful(response.getStatusLine().getStatusCode());
}
```
```java
public Flow.Publisher<HttpResult> toHttpResult() {
```
`toHttpResult` is a bit of a stopgap to shorten the PR - I didn't want to refactor every provider just yet, but in theory they should all be able to read `byte[]` directly.
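A sketch of what that stopgap adapter could look like, assuming `response` and `body` fields on the streaming result and the legacy `HttpResult(response, byte[])` constructor; this is not the PR's exact code:

```java
// Hypothetical adapter: rewrap each streamed byte[] chunk into the legacy
// HttpResult(response, byte[]) pair so existing providers stay unchanged.
public Flow.Publisher<HttpResult> toHttpResult() {
    return subscriber -> body.subscribe(new Flow.Subscriber<byte[]>() {
        @Override
        public void onSubscribe(Flow.Subscription subscription) {
            subscriber.onSubscribe(subscription); // pass demand straight through
        }

        @Override
        public void onNext(byte[] bytes) {
            subscriber.onNext(new HttpResult(response, bytes));
        }

        @Override
        public void onError(Throwable throwable) {
            subscriber.onError(throwable);
        }

        @Override
        public void onComplete() {
            subscriber.onComplete();
        }
    });
}
```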
Just left a few questions.
Does this PR implement retrying on mid-stream and beginning-of-stream errors? Or does there need to be a follow-up for the providers after this?
```java
private void addBytesAndMaybePause(long count, IOControl ioControl) {
    if (bytesInQueue.accumulateAndGet(count, Long::sum) >= settings.getMaxResponseSize().getBytes()) {
```
Can we use `addAndGet`?
Yeah idk why I used two different methods in this one file lol
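For what it's worth, the two calls are interchangeable here; a tiny demo:

```java
import java.util.concurrent.atomic.AtomicLong;

class AtomicAddDemo {
    public static void main(String[] args) {
        AtomicLong bytesInQueue = new AtomicLong();
        // Both calls atomically add the delta and return the updated value.
        long viaAccumulate = bytesInQueue.accumulateAndGet(5, Long::sum); // 5
        long viaAdd = bytesInQueue.addAndGet(5);                          // 10
        System.out.println(viaAccumulate + ", " + viaAdd);
    }
}
```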
```java
private void subtractBytesAndMaybeUnpause(long count) {
    var currentBytesInQueue = bytesInQueue.updateAndGet(current -> Long.max(0, current - count));
    if (savedIoControl != null) {
```
Do we need to wrap this check in a synchronized block?
I don't think so? The `resumeProducer()` call will lock, so if we ever get into a state where two threads are competing to unpause, the worst we do is calculate the multiplication twice. We should never be pausing while we are unpausing (Apache shouldn't be calling us with more data while we are paused), but if we were, locking wouldn't help mitigate that either, since we'd be able to unpause and immediately pause.
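A sketch of the pattern being described (`resumeProducer` comes from the thread above; the threshold comparison and field names are assumptions):

```java
// Hypothetical sketch of the double-checked unpause: the unsynchronized
// null check is only a fast path, and resumeProducer() re-checks under
// the lock, so competing threads at worst recompute the threshold twice.
private void subtractBytesAndMaybeUnpause(long count) {
    var currentBytesInQueue = bytesInQueue.updateAndGet(current -> Long.max(0, current - count));
    if (savedIoControl != null && currentBytesInQueue < maxBytes) { // threshold is assumed
        resumeProducer();
    }
}

private synchronized void resumeProducer() {
    if (savedIoControl != null) { // re-check: another thread may have resumed already
        savedIoControl.requestInput();
        savedIoControl = null;
    }
}
```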
```java
try {
    responseHandler.validateResponse(throttlerManager, logger, request, httpResult);
    InferenceServiceResults inferenceResults = responseHandler.parseResult(request, httpResult);
    ll.onResponse(inferenceResults);
```
Just to make sure I understand this flow correctly: we can get a status code that indicates a failure, but if `validateResponse` doesn't throw an error we'll return an actual result? Or are we calling `onResponse` here to also handle returning an error object?
> we can get a status code that indicates a failure but if validateResponse doesn't throw an error we'll return an actual result

I was thinking this might be an option, but otherwise most providers would throw an exception and we'd call `listener.onFailure`.
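To illustrate the flow under discussion, a hypothetical sketch of the surrounding retry decision (`shouldRetry` and `retryWithBackoff` are made-up helpers, not the PR's API):

```java
// Hypothetical sketch: validateResponse may throw for failure status codes
// or provider error bodies; the sender then either retries or fails.
try {
    responseHandler.validateResponse(throttlerManager, logger, request, httpResult);
    listener.onResponse(responseHandler.parseResult(request, httpResult));
} catch (Exception e) {
    if (shouldRetry(e)) {
        retryWithBackoff(request, listener); // made-up helper
    } else {
        listener.onFailure(e);
    }
}
```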
```java
private void addBytesAndMaybePause(long count, IOControl ioControl) {
    if (bytesInQueue.accumulateAndGet(count, Long::sum) >= settings.getMaxResponseSize().getBytes()) {
        pauseProducer(ioControl);
```
Is it possible that `addBytesAndMaybePause` could be called again after the queue is already full (i.e. such that the if-block would return true)? Would that matter? I assume the most recent `IOControl` supersedes any that we've set previously?
Yeah, it shouldn't happen, but it's okay to pause `IOControl` twice, and any recent one supersedes the other. We only need one resume call to continue.
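A minimal illustration of that point, grounded only in the comment above (repeated pauses are safe, and a single resume continues):

```java
// Pausing twice is harmless, and one resume is enough to continue.
ioControl.suspendInput();
ioControl.suspendInput(); // redundant, but safe
ioControl.requestInput(); // one call resumes reading
```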
💔 Backport failed
You can use sqren/backport to manually backport by running