Add a not-master state for desired balance #116904


Merged

elasticsearchmachine merged 13 commits into elastic:main from ywangd:introduce-not-master-for-desired-balance

Dec 3, 2024

Member

ywangd commented Nov 17, 2024

The new state prevents a long-running desired balance computation from setting its result after the node has stood down as master.
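A minimal sketch of the idea, assuming a heavily simplified model: the current balance lives in an AtomicReference that starts at a NOT_MASTER sentinel, master-change callbacks overwrite it with plain writes, and a finished computation publishes its result only if the reference does not hold NOT_MASTER. The names Balance, BalanceHolder, publish, onBecomeMaster, and onStandDown are hypothetical and are not the Elasticsearch API.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for DesiredBalance; only the last converged index is modelled.
record Balance(long lastConvergedIndex) {
    // Sentinel: this node is not the elected master.
    static final Balance NOT_MASTER = new Balance(-2);
    // Sentinel: this node has just become master and nothing has been computed yet.
    static final Balance INITIAL = new Balance(-1);
}

final class BalanceHolder {
    // A node starts out as not-master until it learns otherwise.
    private final AtomicReference<Balance> ref = new AtomicReference<>(Balance.NOT_MASTER);

    // Master-change callbacks: plain writes, no spinning or locking.
    void onBecomeMaster() {
        ref.set(Balance.INITIAL);
    }

    void onStandDown() {
        ref.set(Balance.NOT_MASTER);
    }

    // Called when a (possibly long-running) computation finishes.
    // Returns true if the result was published, false if it was discarded.
    boolean publish(Balance computed) {
        while (true) {
            Balance previous = ref.get();
            if (previous == Balance.NOT_MASTER) {
                return false; // stood down while computing: drop the stale result
            }
            if (ref.compareAndSet(previous, computed)) {
                return true;
            }
            // Lost a race with a concurrent update: re-read and try again.
        }
    }

    Balance current() {
        return ref.get();
    }
}
```

Sentinels are compared by identity (==), so a computed balance can never be mistaken for one of them even if its fields happen to match.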


          Add a not-master state for desired balance

3cb6452


ywangd added >enhancement :Distributed Coordination/Allocation v9.0.0 labels

ywangd requested review from nicktindall, idegtiarenko and DaveCTurner

November 17, 2024 23:36

Collaborator

elasticsearchmachine commented Nov 17, 2024

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

elasticsearchmachine added the Team:Distributed Coordination label

Collaborator

elasticsearchmachine commented Nov 17, 2024

Hi @ywangd, I've created a changelog YAML for you.


          Update docs/changelog/116904.yaml

e2ca928

ywangd mentioned this pull request

[CI] MinimumMasterNodesIT testThreeNodesNoMasterBlock failing #115885

Closed

ywangd added 2 commits

November 18, 2024 11:15


          fix assertion

55f7997


          fix assertion again

c33066e

nicktindall reviewed


...va/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceShardsAllocator.java Outdated

+                          } else {
+                              logger.debug("Desired balance updated for [{}]", newDesiredBalance.lastConvergedIndex());
+                          }
+                          computedShardMovements.inc(DesiredBalance.shardMovements(updatedDesiredBalance, newDesiredBalance));

Contributor

nicktindall Nov 18, 2024

Is this right? Shouldn't we be using DesiredBalance.shardMovements(previousDesiredBalance, newDesiredBalance)? Perhaps I'm not reading this right.

Contributor

nicktindall Nov 18, 2024

Also for the diff above

Member Author

ywangd Nov 18, 2024

You are right. This unfortunately makes the code a bit ugly. Pushed 0b03f4e
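To illustrate the reviewer's point with a simplified, hypothetical model (a balance is just a shard-to-node map, and shardMovements below is a stand-in, not DesiredBalance.shardMovements): AtomicReference#updateAndGet returns the value it just installed, so diffing that result against the new balance always reports zero movements. The baseline has to be the value that was replaced, which getAndSet returns.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

final class MovementDiff {
    // Hypothetical stand-in for DesiredBalance.shardMovements:
    // counts shards (shard id -> node id) whose node differs between two balances.
    static int shardMovements(Map<String, String> before, Map<String, String> after) {
        int moves = 0;
        for (Map.Entry<String, String> e : after.entrySet()) {
            if (!Objects.equals(before.get(e.getKey()), e.getValue())) {
                moves++;
            }
        }
        return moves;
    }

    // Wrong baseline: updateAndGet returns the NEW value, so this compares the new balance with itself.
    static int diffAgainstUpdated(AtomicReference<Map<String, String>> ref, Map<String, String> next) {
        Map<String, String> updated = ref.updateAndGet(prev -> next);
        return shardMovements(updated, next);
    }

    // Right baseline: getAndSet returns the PREVIOUS value that was replaced.
    static int diffAgainstPrevious(AtomicReference<Map<String, String>> ref, Map<String, String> next) {
        Map<String, String> previous = ref.getAndSet(next);
        return shardMovements(previous, next);
    }
}
```

Moving one shard from node-a to node-b gives 0 with the first method and 1 with the second.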

ywangd added 2 commits

November 18, 2024 11:58


          remove assertion

2c8feed


          fix shard movements

0b03f4e

nicktindall reviewed


...va/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceShardsAllocator.java Outdated

+                      if (updatedDesiredBalance == newDesiredBalance) {
+                          if (logger.isTraceEnabled()) {
+                              var diff = DesiredBalance.hasChanges(updatedDesiredBalance, newDesiredBalance)
+                                  ? "Diff: " + DesiredBalance.humanReadableDiff(updatedDesiredBalance, newDesiredBalance)

Contributor

nicktindall Nov 18, 2024

Also the two lines above? updatedDesiredBalance -> oldDesiredBalance ?

Member Author

ywangd Nov 18, 2024

Yeah. Just noticed that. See a432406


          more fix

a432406

nicktindall reviewed


...va/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceShardsAllocator.java Outdated

+                          } else {
+                              logger.debug("Desired balance updated for [{}]", newDesiredBalance.lastConvergedIndex());
+                          }
+                          computedShardMovements.inc(DesiredBalance.shardMovements(oldDesiredBalance.get(), newDesiredBalance));

Contributor

nicktindall Nov 18, 2024

Alternative approach? Not sure if it's better or worse...

while (true) {
    final DesiredBalance previousBalance = currentDesiredBalanceRef.get();
    if (previousBalance == DesiredBalance.NOT_MASTER) {
        logger.debug("discard desired balance for [{}]", newDesiredBalance.lastConvergedIndex());
        break;
    }
    if (currentDesiredBalanceRef.compareAndSet(previousBalance, newDesiredBalance)) {
        if (logger.isTraceEnabled()) {
            var diff = DesiredBalance.hasChanges(previousBalance, newDesiredBalance)
                ? "Diff: " + DesiredBalance.humanReadableDiff(previousBalance, newDesiredBalance)
                : "No changes";
            logger.trace("Desired balance updated: {}. {}", newDesiredBalance, diff);
        } else {
            logger.debug("Desired balance updated for [{}]", newDesiredBalance.lastConvergedIndex());
        }
        computedShardMovements.inc(DesiredBalance.shardMovements(previousBalance, newDesiredBalance));
        break;
    }
}

Contributor

nicktindall Nov 18, 2024

I think that's kind-of what updateAndGet does internally, because the javadoc says:

The function should be side-effect-free, since it may be re-applied when attempted updates fail due to contention among threads.
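For comparison only (this is not what the PR merged), the same conditional update can be expressed with AtomicReference#getAndUpdate while honouring that javadoc: the function stays pure, and logging or metric updates run once afterwards using the returned previous value, which is also the correct baseline for a movement diff. Snapshot, SideEffectFreeUpdate, and the published counter are hypothetical names in a simplified model.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical balance type; the sentinel is compared by identity.
record Snapshot(long index) {
    static final Snapshot NOT_MASTER = new Snapshot(-2);
}

final class SideEffectFreeUpdate {
    final AtomicReference<Snapshot> ref = new AtomicReference<>(Snapshot.NOT_MASTER);
    // Stand-in for a metric such as a computed-shard-movements counter.
    final AtomicLong published = new AtomicLong();

    boolean publish(Snapshot computed) {
        // Pure function: it may be re-applied under contention, so it must not log or count.
        Snapshot previous = ref.getAndUpdate(prev -> prev == Snapshot.NOT_MASTER ? prev : computed);
        if (previous == Snapshot.NOT_MASTER) {
            return false; // the function left NOT_MASTER in place, so nothing was installed
        }
        // Side effects run exactly once, after the update succeeded,
        // and can use `previous` as the baseline for diffs.
        published.incrementAndGet();
        return true;
    }
}
```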

Member Author

ywangd Nov 18, 2024

That's viable as well. I don't have strong opinions. The concurrency here should be fairly low since it happens only during master failover.

Member Author

ywangd Nov 18, 2024

I think that's kind-of what updateAndGet does internally, because the javadoc says:

I think so too. We also use this pattern elsewhere in code.

I pushed b5cd708 to apply your suggestion.

I think it also has an edge-case advantage: it returns early when the first read from currentDesiredBalanceRef is NOT_MASTER. With updateAndGet, the function would probably still try to set the value, which may fail, trigger a retry, and then see a different value from a newer term.
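A deterministic, single-threaded sketch of that edge case, assuming a simplified model. Term, EarlyReturnDemo, and the afterFirstRead hook are hypothetical; the hook stands in for the applier thread re-electing the node between the first read and the compare-and-set. Because updateAndGet still attempts a compare-and-set when the function returns NOT_MASTER unchanged, the failed attempt is retried against the newer term's INITIAL and the stale result gets installed. The explicit loop returns as soon as it reads NOT_MASTER.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical balance type with identity-compared sentinels.
record Term(long index) {
    static final Term NOT_MASTER = new Term(-2);
    static final Term INITIAL = new Term(-1);
}

final class EarlyReturnDemo {
    // updateAndGet variant: returns NOT_MASTER unchanged on the first application,
    // but the CAS fails because the value changed, and the retry sees INITIAL.
    static Term viaUpdateAndGet(AtomicReference<Term> ref, Term computed, Runnable afterFirstRead) {
        boolean[] first = {true};
        return ref.updateAndGet(prev -> {
            if (first[0]) {
                first[0] = false;
                afterFirstRead.run(); // deliberate side effect, only to simulate the interleaving
            }
            return prev == Term.NOT_MASTER ? prev : computed;
        });
    }

    // Explicit loop: gives up as soon as the first read observes NOT_MASTER.
    static Term viaLoop(AtomicReference<Term> ref, Term computed, Runnable afterFirstRead) {
        boolean first = true;
        while (true) {
            Term prev = ref.get();
            if (first) {
                first = false;
                afterFirstRead.run(); // same simulated interleaving
            }
            if (prev == Term.NOT_MASTER) {
                return ref.get(); // discarded; report whatever is current now
            }
            if (ref.compareAndSet(prev, computed)) {
                return computed;
            }
        }
    }
}
```

The advantage is narrow: if the first read already observes the new term's INITIAL, the loop installs the stale result as well.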

nicktindall reviewed

...va/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceShardsAllocator.java Outdated


          apply suggestion

b5cd708

nicktindall approved these changes


Contributor

nicktindall left a comment

LGTM

ywangd mentioned this pull request

Skip eager reconciliation for empty routing table #116903

Merged

Member Author

ywangd commented Nov 25, 2024

@elasticmachine update branch


          Merge branch 'main' into introduce-not-master-for-desired-balance

0fc0e42

ywangd added the auto-merge-without-approval label


          fix test

b5b7b5a

pxsalehi reviewed

...va/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceShardsAllocator.java

@@ -62,7 +64,7 @@ public class DesiredBalanceShardsAllocator implements ShardsAllocator {
      private final AtomicLong indexGenerator = new AtomicLong(-1);

      private final ConcurrentLinkedQueue<List<MoveAllocationCommand>> pendingDesiredBalanceMoves = new ConcurrentLinkedQueue<>();

      private final MasterServiceTaskQueue<ReconcileDesiredBalanceTask> masterServiceTaskQueue;

-     private volatile DesiredBalance currentDesiredBalance = DesiredBalance.INITIAL;
+     private final AtomicReference<DesiredBalance> currentDesiredBalanceRef = new AtomicReference<>(DesiredBalance.NOT_MASTER);

Member

pxsalehi Nov 25, 2024

I find it a bit confusing that we have changed INITIAL so that it is no longer the initial state (at least, that's what I understand from its name and usage). Maybe it deserves a different name, and the two states could be clearly described in DesiredBalance?

Member Author

ywangd Dec 2, 2024

Fair point. I pushed 44cc48c to rename the two and add comments for both of them.

pxsalehi reviewed


...va/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceShardsAllocator.java Outdated

                               );
                               computationsExecuted.inc();
-                              if (currentDesiredBalance.finishReason() == DesiredBalance.ComputationFinishReason.STOP_EARLY) {
+                              final DesiredBalance currentDesiredBalance = currentDesiredBalanceRef.get();
+                              if (currentDesiredBalance == DesiredBalance.NOT_MASTER || currentDesiredBalance == DesiredBalance.INITIAL) {

Member

pxsalehi Nov 25, 2024

How come both of these states are considered here?

Member Author

ywangd Dec 2, 2024

The first one, NOT_MASTER, is for when the node concurrently stands down as master. The second one, INITIAL, is for when the node concurrently stands down as master but is then quickly re-elected, which is very rare but not impossible, especially in tests.

pxsalehi reviewed


...va/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceShardsAllocator.java

-                          logger.trace("Desired balance updated: {}. {}", newDesiredBalance, diff);
-                      } else {
-                          logger.debug("Desired balance updated for [{}]", newDesiredBalance.lastConvergedIndex());
+                      while (true) {

Member

pxsalehi Nov 25, 2024

Do we really have no better alternative to this while loop? It is not clear why we need it.

Member

pxsalehi Nov 25, 2024

(Unless this is a performance-critical code path, I'd personally prefer more synchronization over this non-blocking approach.)

Member Author

ywangd Dec 2, 2024

The while loop is essentially what AtomicReference#updateAndGet does internally. I prefer it over synchronized for this particular case. Contention should be low, but it is called on a cluster state applier thread both when standing down as master and when becoming master, and I'd like to avoid synchronization for those two usages. The field would still need to be volatile even with synchronization (unless we also synchronize the read paths, which seems bad). So overall I think it's better to keep it as an AtomicReference with a bit of spinning on the computation thread only. The complexity seems OK to me.
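For comparison, a minimal sketch of the lock-based alternative, assuming the same kind of simplified model (Bal, LockedHolder, publish, onBecomeMaster, and onStandDown are hypothetical names). It makes the trade-off concrete: the master-change writes must take the same monitor as publish, otherwise a stand-down could slip in between the check and the write, and the field still has to be volatile so that unsynchronized readers see the latest value.

```java
// Hypothetical balance type with identity-compared sentinels.
record Bal(long index) {
    static final Bal NOT_MASTER = new Bal(-2);
    static final Bal INITIAL = new Bal(-1);
}

final class LockedHolder {
    // Still volatile: readers call current() without taking the lock.
    private volatile Bal current = Bal.NOT_MASTER;

    // Master-change writes take the same lock as publish(), so the applier thread may block here.
    synchronized void onBecomeMaster() {
        current = Bal.INITIAL;
    }

    synchronized void onStandDown() {
        current = Bal.NOT_MASTER;
    }

    // Check-then-act made atomic by the monitor instead of a compare-and-set loop.
    synchronized boolean publish(Bal computed) {
        if (current == Bal.NOT_MASTER) {
            return false; // discard: node is no longer master
        }
        current = computed;
        return true;
    }

    Bal current() {
        return current; // lock-free read, relies on the volatile field
    }
}
```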

Member

pxsalehi Dec 3, 2024

I don't think setCurrentDesiredBalance gets called from the applier thread. I don't want to block this, feel free to merge it as is. I'm also not objecting to the correctness. I just think this is unnecessarily over-complicated.

Member Author

ywangd Dec 3, 2024

I meant that currentDesiredBalanceRef is set on the applier thread, not this setCurrentDesiredBalance method (which also sets currentDesiredBalanceRef). This method is called only on the desired balance computation thread, where I think a bit of spinning is fine. But I don't want the applier threads to do either spinning or synchronization.

Thanks for the discussion. Since you don't have a strong objection (IIUC), I'd prefer to merge it as is. In my view, this method is really just an updateAndGet.

ywangd added 2 commits

December 2, 2024 15:40


          Merge remote-tracking branch 'origin/main' into introduce-not-master-for-desired-balance

be6e67c


          Rename

44cc48c

ywangd removed the auto-merge-without-approval label

ywangd requested a review from pxsalehi

December 2, 2024 05:06

Member Author

ywangd commented Dec 3, 2024

@elasticmachine update branch


          Merge branch 'main' into introduce-not-master-for-desired-balance

74820a9

ywangd added the auto-merge-without-approval label

elasticsearchmachine merged commit 2a9a3a4 into elastic:main

16 checks passed

ywangd deleted the introduce-not-master-for-desired-balance branch

December 3, 2024 13:13


Labels

auto-merge-without-approval :Distributed Coordination/Allocation >enhancement Team:Distributed Coordination v9.0.0