Update shardGenerations for all indices on snapshot finalization #128650
If an index is deleted after a snapshot has written its shardGenerations file but before the snapshot is finalized, we exclude this index from the snapshot because its indexMetadata is no longer available. However, the shardGenerations file is still valid: it is the latest copy and holds all the necessary information, even though it contains an extra snapshot entry, which is harmless. Instead of dropping this shardGenerations file, this PR carries it forward so that the next snapshot to finalize, whether earlier or later in the list of entries, builds the next shardGenerations on top of it. Resolves: elastic#108907
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
Hi @ywangd, I've created a changelog YAML for you. |
    .andThen(l -> {
        client.admin()
            .cluster()
            .prepareGetSnapshots(TEST_REQUEST_TIMEOUT, repoName)
            .setSnapshots(IntStream.range(0, snapshotCount).mapToObj(i -> "snapshot-" + i).toArray(String[]::new))
            .execute(ActionTestUtils.assertNoFailureListener(getSnapshotsResponse -> {
                for (final var snapshot : getSnapshotsResponse.getSnapshots()) {
                    assertThat(snapshot.state(), is(SnapshotState.SUCCESS));
                    final String snapshotName = snapshot.snapshot().getSnapshotId().getName();
                    // Does not contain the deleted index in the snapshot
                    assertThat(snapshot.indices(), contains("index-" + snapshotName.charAt(snapshotName.length() - 1)));
                }
                l.onResponse(null);
            }));
    });
@DaveCTurner The test is copied from your comment. I only added this part of checking the GetSnapshots response.
I am still looking to add a randomization so that the earlier entry is a clone.
Clone is added in efdeaed
@DaveCTurner The change makes a finalizing snapshot update the shard generation for an earlier entry even when the index is deleted. As a result, it removes the old shard generation. But since the index is deleted, the new generation is not recorded in I don't think we want to remove the assertion. So the solution seems to be bringing in the concept of "shard generations for deleted indices" (e.g. something like |
Ugh yeah this is tricky, but IMO it doesn't make sense to track shard generations for deleted indices in |
OK I think there might be a simple solution, see b478b95 The old generation got deleted not because of |
     * In this case, its shard generation is tracked in {@link #deletedIndices}. Otherwise, it is tracked in
     * {@link #liveIndices}.
     */
    public record UpdatedShardGenerations(ShardGenerations liveIndices, ShardGenerations deletedIndices) {
Yes I think this will work (but I will add some other comments elsewhere)
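For readers following along, here is a minimal sketch of the idea behind the two-part record: generations written by a snapshot are split by whether the index still exists when the snapshot finalizes. This is a simplified model, not the code in this PR. `SketchUpdatedGenerations`, `partition`, and the plain `Map`/`Set` stand-ins for `ShardGenerations` and the cluster metadata are all hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for UpdatedShardGenerations:
// index name -> shard generation ids (one per shard).
record SketchUpdatedGenerations(Map<String, List<String>> liveIndices, Map<String, List<String>> deletedIndices) {

    // Split the generations a snapshot has written by whether the index
    // is still present in the (assumed) set of existing indices.
    static SketchUpdatedGenerations partition(Map<String, List<String>> written, Set<String> existingIndices) {
        Map<String, List<String>> live = new HashMap<>();
        Map<String, List<String>> deleted = new HashMap<>();
        written.forEach((index, gens) -> {
            if (existingIndices.contains(index)) {
                live.put(index, gens);
            } else {
                // Index was deleted before finalization: keep its generations separately
                // so they can be carried forward rather than dropped.
                deleted.put(index, gens);
            }
        });
        return new SketchUpdatedGenerations(Map.copyOf(live), Map.copyOf(deleted));
    }
}
```

Keeping the two groups in separate fields means each caller has to say which one it wants, which is the ambiguity the later review comments about variable naming are concerned with.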
server/src/main/java/org/elasticsearch/repositories/FinalizeSnapshotContext.java
    .entrySet()
    .stream()
    // We want to keep both old and new generations for deleted indices, so we filter them out here to avoid deletion.
    // We need the old generations because they are what get recorded in the RepositoryData.
Hmm this seems suspicious. We should be updating the shard generations for existing shards in `RepositoryData` and discarding the previous values, even if the index isn't included in the snapshot. AIUI the tripping assertion you mentioned in an earlier comment related to generations for shards that were totally absent from `RepositoryData`. We should drop those shards entirely from `RepositoryData`, but update any ones that do exist.
Ah yes you are absolutely right. We should conditionally update the shard generations in `RepositoryData` for the deleted indices. The issue with my previous attempt is that the update is "unconditional" which triggered the assertion. It's now updated as suggested. Thanks a lot!
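The conditional update discussed in this thread can be sketched roughly as follows. This is an illustration under stated assumptions rather than the PR's implementation: `ConditionalGenerationUpdate`, its `updateIfPresent` signature, and the nested `Map` model (index name to shard id to generation) are hypothetical, and updating shard by shard is an assumption about the intended granularity.

```java
import java.util.HashMap;
import java.util.Map;

final class ConditionalGenerationUpdate {

    // repoGenerations: what the repository currently tracks (hypothetical model).
    // deletedIndexGenerations: new generations for indices deleted before finalization.
    static Map<String, Map<Integer, String>> updateIfPresent(
        Map<String, Map<Integer, String>> repoGenerations,
        Map<String, Map<Integer, String>> deletedIndexGenerations
    ) {
        // Work on a deep copy so the inputs are left untouched.
        Map<String, Map<Integer, String>> result = new HashMap<>();
        repoGenerations.forEach((index, gens) -> result.put(index, new HashMap<>(gens)));

        deletedIndexGenerations.forEach((index, newGens) -> {
            Map<Integer, String> existing = result.get(index);
            if (existing == null) {
                // The repository never tracked this index: drop it instead of introducing it.
                return;
            }
            // Map.replace only overwrites keys that are already mapped,
            // so shards unknown to the repository are not added either.
            newGens.forEach(existing::replace);
        });
        return result;
    }
}
```

An unconditional `put` here would introduce entries for indices the repository has never seen, which is the situation the tripping assertion mentioned above guards against.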
server/src/main/java/org/elasticsearch/repositories/FinalizeSnapshotContext.java
…apshotContext.java Co-authored-by: David Turner <[email protected]>
Yep I like it, just a few tiny suggestions
    @@ -1749,7 +1749,7 @@ int sizeInBytes() {
     public void finalizeSnapshot(final FinalizeSnapshotContext finalizeSnapshotContext) {
         assert ThreadPool.assertCurrentThreadPool(ThreadPool.Names.SNAPSHOT);
         final long repositoryStateId = finalizeSnapshotContext.repositoryStateId();
    -    final ShardGenerations shardGenerations = finalizeSnapshotContext.updatedShardGenerations();
    +    final ShardGenerations shardGenerations = finalizeSnapshotContext.updatedShardGenerations().liveIndices();
Nit: maybe inline this, it's only used in one place, and we now need to care about which `ShardGenerations` we're talking about so the variable name is ambiguous.
Good point. See 1071d95
                 return;
             }
             builder.put(key.index(), key.shardId(), value);
         });
     }
    -return builder.build();
    +return new UpdatedShardGenerations(builder.build(), deletedBuilder.build());
`deletedBuilder.build()` is mostly going to be `ShardGenerations.EMPTY`, can we special-case the empty collection in `ShardGenerations.Builder` to skip a bunch of unnecessary allocations?
Makes sense. Special cased in both `Builder#build` and initialization of the variable here 81f88ba
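As an illustration of the kind of special-casing suggested here, a builder can hand back a shared empty instance when nothing was added, so the common "no deleted indices" path allocates nothing new. The sketch below is a generic version of that pattern; `SketchGenerations` and its `Builder` are hypothetical and do not mirror the real `ShardGenerations.Builder` internals.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SketchGenerations {
    // Shared immutable instance returned whenever a builder has no entries.
    static final SketchGenerations EMPTY = new SketchGenerations(Collections.emptyMap());

    private final Map<String, List<String>> generations;

    private SketchGenerations(Map<String, List<String>> generations) {
        this.generations = generations;
    }

    Map<String, List<String>> asMap() {
        return generations;
    }

    static final class Builder {
        private final Map<String, List<String>> pending = new HashMap<>();

        Builder put(String index, List<String> gens) {
            pending.put(index, List.copyOf(gens));
            return this;
        }

        SketchGenerations build() {
            if (pending.isEmpty()) {
                // Nothing was added: skip copying and reuse the shared constant.
                return EMPTY;
            }
            return new SketchGenerations(Map.copyOf(pending));
        }
    }
}
```

Callers can then compare against `EMPTY` by identity when they want a cheap "nothing to do" check.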
    public record UpdatedShardGenerations(ShardGenerations liveIndices, ShardGenerations deletedIndices) {
        public static final UpdatedShardGenerations EMPTY = new UpdatedShardGenerations(ShardGenerations.EMPTY, ShardGenerations.EMPTY);

        public UpdatedShardGenerations(ShardGenerations updated) {
This constructor is only used in tests; IMO it'd be better to just be explicit and pass in `ShardGenerations.EMPTY` as the second parameter rather than take the risk that some future caller uses this constructor when they should be accounting for deleted indices too.
Yep see 1852486
    @@ -424,6 +425,7 @@ public RepositoryData addSnapshot(
         // the new master, so we make the operation idempotent
         return this;
     }
    +final var shardGenerations = updatedShardGenerations.liveIndices();
Likewise this name is ambiguous; here we actually only care about the live index IDs now, maybe it'd avoid some confusion to extract that variable instead:
    -final var shardGenerations = updatedShardGenerations.liveIndices();
    +final var liveIndexIds = updatedShardGenerations.liveIndices().indices();
Yep fair call 6d148e5
            }
        }
    });
    return this;
IntelliJ indicates the return value is unused
Right. It is a leftover from the initial version, where this method was public. It's indeed no longer needed. See a257d39
    @@ -244,6 +254,20 @@ public Builder put(IndexId indexId, int shardId, ShardGeneration generation) {
         return this;
     }

    +private Builder updateIfPresent(ShardGenerations shardGenerations) {
    +    shardGenerations.shardGenerations.forEach((indexId, gens) -> {
    +        if (generations.containsKey(indexId)) {
Could we do this with a `generations.computeIfPresent()` rather than a `.containsKey()` check followed by several `.get()` calls? Or even just `.get()` up front and then a null check?
Good point. I replaced it with a `.get()` and null check. I personally like it slightly better for readability than `computeIfPresent`. See a257d39
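For comparison, the two lookup styles mentioned in this thread look roughly like this on a plain `java.util.Map`. This is a generic sketch; `LookupStyles` and the flat `String` to `String` map are hypothetical and much simpler than the builder's real nested structure.

```java
import java.util.Map;

final class LookupStyles {

    // Style 1: one get() up front, then a null check.
    static void updateWithGet(Map<String, String> generations, String index, String newGeneration) {
        String current = generations.get(index);
        if (current != null) {
            generations.put(index, newGeneration);
        }
    }

    // Style 2: computeIfPresent, which runs the remapping function only for mapped keys.
    static void updateWithComputeIfPresent(Map<String, String> generations, String index, String newGeneration) {
        generations.computeIfPresent(index, (key, current) -> newGeneration);
    }
}
```

Both variants leave unmapped keys untouched and avoid the separate `containsKey()` lookup; which one reads better is largely a matter of taste, as noted above.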
LGTM great stuff 🚀
@elasticmachine update branch |
There are no new commits on the base branch. |
…stic#128650) If an index is deleted after a snapshot has written its shardGenerations file but before the snapshot is finalized, we exclude this index from the snapshot because its indexMetadata is no longer available. However, the shardGenerations file is still valid in that it is the latest copy with all necessary information despite it containing an extra snapshot entry. This is OK. Instead of dropping this shardGenerations file, this PR changes to carry it forward by updating RepositoryData and relevant in-progress snapshots so that the next finalization builds on top of this one. Co-authored-by: David Turner <[email protected]> (cherry picked from commit aa0397f) # Conflicts: # server/src/main/java/org/elasticsearch/snapshots/SnapshotsService.java
…8650) (#128724) If an index is deleted after a snapshot has written its shardGenerations file but before the snapshot is finalized, we exclude this index from the snapshot because its indexMetadata is no longer available. However, the shardGenerations file is still valid in that it is the latest copy with all necessary information despite it containing an extra snapshot entry. This is OK. Instead of dropping this shardGenerations file, this PR changes to carry it forward by updating RepositoryData and relevant in-progress snapshots so that the next finalization builds on top of this one. Co-authored-by: David Turner <[email protected]>
…8650) (#128725) If an index is deleted after a snapshot has written its shardGenerations file but before the snapshot is finalized, we exclude this index from the snapshot because its indexMetadata is no longer available. However, the shardGenerations file is still valid in that it is the latest copy with all necessary information despite it containing an extra snapshot entry. This is OK. Instead of dropping this shardGenerations file, this PR changes to carry it forward by updating RepositoryData and relevant in-progress snapshots so that the next finalization builds on top of this one. Co-authored-by: David Turner <[email protected]> (cherry picked from commit aa0397f) # Conflicts: # server/src/main/java/org/elasticsearch/snapshots/SnapshotsService.java
If an index is deleted after a snapshot has written its shardGenerations file but before the snapshot is finalized, we exclude this index from the snapshot because its indexMetadata is no longer available. However, the shardGenerations file is still valid in that it is the latest copy with all necessary information despite it containing an extra snapshot entry. This is OK. Instead of dropping this shardGenerations file, this PR changes to carry it forward by updating RepositoryData and relevant in-progress snapshots so that the next finalization builds on top of this one.
Resolves: #108907
Co-authored-by: DaveCTurner [email protected]