Skip to content

Propagate status codes from shard failures appropriately #118016

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

pawankartik-elastic
Copy link
Contributor

@pawankartik-elastic pawankartik-elastic commented Dec 4, 2024

If the underlying error caused is NodeNotConnectedException and the reason is connection already closed, the status code that should be returned is 502 rather than 500. This allows Kibana to retry the search op.

Edit: the scope of this PR itself has changed since I first worked on it. The goal of this PR is to let the status code propagate so that it is returned back to the user appropriately. There's a separate task to map out Exception-s to 502; hence I'll be removing the related code form this PR. Adding this edit to let the readers understand how the scope has evolved.

Relates ES-10191.
Fixes #118482.

…dException` and undo `status()` override in `NodeDisconnectedException`
Walking the stacktrace explicitly and looking for a specific error (node
connection-related errors in this case) is a workaround rather than a
proper fix. Instead, let the status codes propagate all the way to the
top so that they can be reported as-is.
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no strong opinions about the implementation of SearchPhaseExecutionException#status(), this is something that the search-foundations team should review.

We shouldn't be changing the ConnectTransportException subclasses here.

@pawankartik-elastic pawankartik-elastic changed the title Return 502 if the underlying error is NodeNotConnectedException Propagate status codes from shard failures appropriately Dec 9, 2024
4xx status codes are not likely to appear along with 5xx status codes.
As a result, we do not need to account for them when looking at shard
failures' status codes.
In reference to the previous commit: this case is no longer needed.
@pawankartik-elastic pawankartik-elastic added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch :Search Foundations/Search Catch all for Search Foundations labels Dec 19, 2024
Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left two comments and questions, thanks!

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks @pawankartik-elastic !

@elasticsearchmachine
Copy link
Collaborator

Hi @pawankartik-elastic, I've created a changelog YAML for you.

@pawankartik-elastic pawankartik-elastic added the auto-backport Automatically create backport pull requests when merged label Dec 19, 2024
@pawankartik-elastic pawankartik-elastic dismissed DaveCTurner’s stale review December 19, 2024 21:34

Changes have been addressed and were reviewed by a member of the Search team.

@pawankartik-elastic pawankartik-elastic marked this pull request as ready for review December 23, 2024 08:13
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@pawankartik-elastic pawankartik-elastic merged commit 22990df into elastic:main Dec 23, 2024
16 checks passed
pawankartik-elastic added a commit to pawankartik-elastic/elasticsearch that referenced this pull request Dec 23, 2024
)

* Return 502 if the underlying error is `NodeNotConnectedException`

* Traverse through the cause stack trace and check for `NodeNotConnectedException` and undo `status()` override in `NodeDisconnectedException`

* Rewrite `while` condition

* Fix: precommit

* Let status codes propagate rather than walking the stacktrace explicitly

Walking the stacktrace explicitly and looking for a specific error (node
connection-related errors in this case) is a workaround rather than a
proper fix. Instead, let the status codes propagate all the way to the
top so that they can be reported as-is.

* Fix: unused import

* Fix null deref

* Do not map descendants of `ConnectTransportException` to `502`

* Fix: precommit

* Do not account for 4xx status codes

4xx status codes are not likely to appear along with 5xx status codes.
As a result, we do not need to account for them when looking at shard
failures' status codes.

* Remove unnecessary `switch` case

In reference to the previous commit: this case is no longer needed.

* Rewrite code comment

* Address review comments

* [CI] Auto commit changes from spotless

* Update docs/changelog/118016.yaml

---------

Co-authored-by: elasticsearchmachine <[email protected]>
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

elasticsearchmachine pushed a commit that referenced this pull request Dec 23, 2024
…119205)

* Return 502 if the underlying error is `NodeNotConnectedException`

* Traverse through the cause stack trace and check for `NodeNotConnectedException` and undo `status()` override in `NodeDisconnectedException`

* Rewrite `while` condition

* Fix: precommit

* Let status codes propagate rather than walking the stacktrace explicitly

Walking the stacktrace explicitly and looking for a specific error (node
connection-related errors in this case) is a workaround rather than a
proper fix. Instead, let the status codes propagate all the way to the
top so that they can be reported as-is.

* Fix: unused import

* Fix null deref

* Do not map descendants of `ConnectTransportException` to `502`

* Fix: precommit

* Do not account for 4xx status codes

4xx status codes are not likely to appear along with 5xx status codes.
As a result, we do not need to account for them when looking at shard
failures' status codes.

* Remove unnecessary `switch` case

In reference to the previous commit: this case is no longer needed.

* Rewrite code comment

* Address review comments

* [CI] Auto commit changes from spotless

* Update docs/changelog/118016.yaml

---------

Co-authored-by: elasticsearchmachine <[email protected]>
@pawankartik-elastic pawankartik-elastic deleted the pkar/node-not-conn-502 branch June 26, 2025 14:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged >enhancement :Search Foundations/Search Catch all for Search Foundations Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ensure that the status codes are propagated back to the REST layer
4 participants