Retry on ClusterBlockException on transform destination index #118194

Merged

@dan-rubinstein dan-rubinstein commented Dec 6, 2024

Issue - https://github.com/elastic/ml-team/issues/1417

Description

As part of the upgrade process from 8.x to 9.x, users will have to re-index their indices. The recommended route is the Kibana Upgrade Assistant, which adds an index block to the original index before re-indexing and removes the block afterwards. Currently, the Upgrade Assistant does not stop transforms before re-indexing or start them again afterwards. As a result, once a user begins the re-index process, every transform whose destination index is being re-indexed fails because of the write block. This change adds a retry mechanism: the transform keeps retrying until the block is removed, at which point it succeeds. These retries do not increment the transform's retry count.

Both write and read-only blocks are handled by this change, as they both throw the same ClusterBlockException. The Kibana Upgrade Assistant likely uses read-only blocks, but since write blocks would cause the same problem for transforms, it makes sense to treat both scenarios the same way.
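The retry behaviour described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `TransformRetrySketch`, `BlockedIndexException` (a stand-in for Elasticsearch's `ClusterBlockException`), `FailureHandler`, and `Outcome` are all hypothetical names chosen for illustration, and the handling of other failures is simplified to a plain counter.

```java
public class TransformRetrySketch {

    /** Stand-in for Elasticsearch's ClusterBlockException; NOT the real class. */
    static class BlockedIndexException extends RuntimeException {
        BlockedIndexException(String message) {
            super(message);
        }
    }

    /** What the (hypothetical) handler decides to do after a failure. */
    enum Outcome { RETRY_WITHOUT_COUNTING, RETRY_AND_COUNT, FAIL }

    /** Hypothetical failure handler; the real transform handler has more cases. */
    static class FailureHandler {
        private final int maxRetries;
        private int failureCount = 0;

        FailureHandler(int maxRetries) {
            this.maxRetries = maxRetries;
        }

        Outcome handle(Exception e) {
            if (e instanceof BlockedIndexException) {
                // A write block and a read-only block surface as the same
                // exception type, so both take this branch: retry, but leave
                // the failure count untouched so the transform never gives up.
                return Outcome.RETRY_WITHOUT_COUNTING;
            }
            // Assumption: every other failure counts towards the retry limit.
            failureCount++;
            return failureCount > maxRetries ? Outcome.FAIL : Outcome.RETRY_AND_COUNT;
        }

        int failureCount() {
            return failureCount;
        }
    }

    public static void main(String[] args) {
        FailureHandler handler = new FailureHandler(2);
        // Simulate five checkpoints that hit a blocked destination index.
        for (int i = 0; i < 5; i++) {
            System.out.println(handler.handle(new BlockedIndexException("destination index is blocked")));
        }
        System.out.println("failure count after blocked writes: " + handler.failureCount());
    }
}
```

The point of the sketch is the single early return: a blocked destination index is treated as a condition that will clear on its own, so it never moves the transform closer to the failed state.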

This change also adds more rigorous testing to the overall transform failure handler as some of the failure cases were missing from testing.

Note: This logic occurs at the end of the transform process when we attempt to write to the destination index. As a follow-up improvement, we will be adding this check to the start of the transform process to avoid doing unnecessary reads from the source index as well.

Note: As another follow-up improvement, we would like to modify the transform health check process to catch this case and report the transform's health as yellow, rather than green as it currently does while retrying.

Test

  • Unit tests
  • Created/started a transform and added a write block on the destination index. Ensured that the retry count did not increment and the transform continuously retried until the write block was removed. See this comment for the commands run for this process.
  • Created/started a transform and added a read-only block on the destination index. Ensured that the retry count did not increment and the transform continuously retried until the read-only block was removed. See this comment for the commands run for this process.
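
The commands from the linked comment are not reproduced on this page. As a rough sketch of the manual steps above, the following builds the REST requests one might send: it assumes the add index block API (`PUT /<index>/_block/write`), the update index settings API with `index.blocks.write`, and the get transform statistics API. The class name, the cluster address `http://localhost:9200`, and the index and transform names are hypothetical, and the requests are only constructed here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

public class BlockTestRequests {

    // Hypothetical address of a local test cluster.
    static final String BASE = "http://localhost:9200";

    /** Adds a write block to the transform's destination index. */
    static HttpRequest addWriteBlock(String index) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + index + "/_block/write"))
            .PUT(BodyPublishers.noBody())
            .build();
    }

    /** Removes the write block again through the index settings. */
    static HttpRequest removeWriteBlock(String index) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + index + "/_settings"))
            .header("Content-Type", "application/json")
            .PUT(BodyPublishers.ofString("{\"index.blocks.write\": false}"))
            .build();
    }

    /** Fetches transform statistics, to inspect the transform while the block is in place. */
    static HttpRequest transformStats(String transformId) {
        return HttpRequest.newBuilder(URI.create(BASE + "/_transform/" + transformId + "/_stats"))
            .GET()
            .build();
    }

    public static void main(String[] args) {
        // "dest-index" and "my-transform" are placeholder names.
        System.out.println(addWriteBlock("dest-index").uri());
        System.out.println(transformStats("my-transform").uri());
        System.out.println(removeWriteBlock("dest-index").uri());
    }
}
```

For the read-only variant, the same shape would apply with the `index.blocks.read_only` setting in place of `index.blocks.write`.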

@dan-rubinstein dan-rubinstein added >enhancement :ml Machine learning Team:ML Meta label for the ML team v9.0.0 labels Dec 6, 2024
@elasticsearchmachine elasticsearchmachine added the external-contributor Pull request authored by a developer outside the Elasticsearch team label Dec 6, 2024
@elasticsearchmachine
Collaborator

Hi @dan-rubinstein, I've created a changelog YAML for you.

@dan-rubinstein dan-rubinstein added v8.18.0 and removed external-contributor Pull request authored by a developer outside the Elasticsearch team labels Dec 9, 2024
@dan-rubinstein dan-rubinstein marked this pull request as ready for review December 9, 2024 21:27
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@dan-rubinstein dan-rubinstein added the auto-backport Automatically create backport pull requests when merged label Dec 10, 2024
@dan-rubinstein
Member Author

@elasticmachine merge upstream

@dan-rubinstein
Member Author

@elasticmachine merge upstream

@dan-rubinstein
Member Author

@elasticmachine merge upstream

@dan-rubinstein
Member Author

@elasticmachine merge upstream

@dan-rubinstein
Member Author

@elasticmachine merge upstream

@dan-rubinstein dan-rubinstein merged commit 78488c7 into elastic:main Dec 12, 2024
16 checks passed
dan-rubinstein added a commit to dan-rubinstein/elasticsearch that referenced this pull request Dec 12, 2024
…c#118194)

* Retry on ClusterBlockException on transform destination index

* Update docs/changelog/118194.yaml

* Cleaning up tests

* Fixing tests

---------

Co-authored-by: Elastic Machine <[email protected]>
elasticsearchmachine pushed a commit that referenced this pull request Dec 12, 2024
… (#118581)

* Retry on ClusterBlockException on transform destination index

* Update docs/changelog/118194.yaml

* Cleaning up tests

* Fixing tests

---------

Co-authored-by: Elastic Machine <[email protected]>
maxhniebergall pushed a commit to maxhniebergall/elasticsearch that referenced this pull request Dec 16, 2024
…c#118194) (elastic#118581)

* Retry on ClusterBlockException on transform destination index

* Update docs/changelog/118194.yaml

* Cleaning up tests

* Fixing tests

---------

Co-authored-by: Elastic Machine <[email protected]>
Labels
auto-backport Automatically create backport pull requests when merged >enhancement :ml Machine learning Team:ML Meta label for the ML team v8.18.0 v9.0.0
4 participants