Skip to content

Retry S3BlobContainer#getRegister on all exceptions #114813

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

DaveCTurner
Copy link
Contributor

S3 register reads are subject to the regular client retry policy, but in
practice we see failures of these reads sometimes for errors that are
transient but for which the SDK does not retry. This commit adds another
layer of retries to these reads.

Relates ES-9721

S3 register reads are subject to the regular client retry policy, but in
practice we see failures of these reads sometimes for errors that are
transient but for which the SDK does not retry. This commit adds another
layer of retries to these reads.

Relates ES-9721
@DaveCTurner DaveCTurner added >enhancement :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v9.0.0 labels Oct 15, 2024
Copy link
Contributor

Documentation preview:

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Oct 15, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @DaveCTurner, I've created a changelog YAML for you.

continue;
} catch (InterruptedException interruptedException) {
Thread.currentThread().interrupt();
finalException.addSuppressed(interruptedException);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we break out of the loop if we're interrupted? I'm guessing something else in the loop will throw an InterruptedException if we continue, but would it be more polite to bail early?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we do break out here, we fall through to the throw finalException line here. I added a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes

} else if (finalException == null) {
finalException = attemptException;
} else if (finalException != attemptException) {
finalException.addSuppressed(attemptException);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is it possible that finalException == attemptException? does something we call throw a singleton exception? I'm not sure I follow this condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who knows what the SDK does or might do in future, but calling addSuppressed with the same exception throws an IllegalArgumentException and we don't want that here.

Copy link
Contributor

@nicktindall nicktindall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with minor comments

@DaveCTurner DaveCTurner enabled auto-merge (squash) October 16, 2024 06:01
@DaveCTurner DaveCTurner merged commit 9082e02 into elastic:main Oct 16, 2024
16 checks passed
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this pull request Oct 25, 2024
S3 register reads are subject to the regular client retry policy, but in
practice we see failures of these reads sometimes for errors that are
transient but for which the SDK does not retry. This commit adds another
layer of retries to these reads.

Relates ES-9721
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants