Core: Expire Snapshots reachability analysis #5669
Still very incomplete, just starting a draft.
cc: @rdblue @namrathamyske @singhpk234 @jackye1995 I moved this out of draft state; it would be great to get your thoughts on it. Thanks!
    // Reads and deletes are done using Tasks.foreach(...).suppressFailureWhenFinished to complete
    // as much of the delete work as possible and avoid orphaned data or manifest files.
    SnapshotRef branchToCleanup = Iterables.getFirst(base.refs().values(), null);
    if (branchToCleanup == null) {
      return;
    }
Porting the fix from https://github.com/apache/iceberg/pull/5666/files. This is still needed if we want to support incremental file cleanup when expiration is called on a table whose only commits are on non-main branches.
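A minimal sketch of the guard in the excerpt, using plain Java collections rather than Iceberg's `TableMetadata` and `SnapshotRef`. The class `RefSelection`, the method `refToClean`, and modeling refs as a ref-name-to-snapshot-id map are all hypothetical. With no refs there is nothing to clean; otherwise incremental cleanup takes the single remaining ref, even when that ref is not `main`.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical model: a table's refs as ref name -> snapshot id.
final class RefSelection {
  private RefSelection() {}

  // Returns the ref to clean up, or empty when the table has no refs (nothing to clean).
  // Incremental cleanup only runs with at most one ref, so "first" is the only ref,
  // whether or not it is named "main".
  static Optional<Long> refToClean(Map<String, Long> refs) {
    return refs.values().stream().findFirst();
  }
}
```

For example, a table whose only ref is a branch named `audit-branch` still yields that branch instead of being skipped for lacking `main`.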
    protected void deleteMetadataFiles(
        Set<String> manifestsToDelete, Set<String> manifestListsToDelete) {
      log.warn("Manifests to delete: {}", Joiner.on(", ").join(manifestsToDelete));
This is still logged at warn level, as it is today, but I doubt it needs to be warn, or whether we need this logging at all. Warn level implies that something is wrong but not fatally so, which is not the case here: we are intentionally deleting these files after correctly computing them. It may mislead users. @rdblue @singhpk234 @jackye1995
    private Set<ManifestFile> readManifests(Set<Snapshot> snapshots) {
      Set<ManifestFile> manifestFiles = Sets.newHashSet();
      for (Snapshot snapshot : snapshots) {
        try (CloseableIterable<ManifestFile> manifestFilesForSnapshot = readManifestFiles(snapshot)) {
We may want to do this in parallel using Tasks. The results are copied anyway so there isn't much of a cost to it.
Oh, I missed this when addressing the comments. Is this blocking, or could I address it in a follow-up PR along with https://github.com/apache/iceberg/pull/5669/files#r990877250?
We can do it in a follow-up.
In fact, we may want to do something like what you do in findFilesToDelete, where rather than reading a set of current manifest files, you just remove manifests from the candidate set. You could also use the same logic to short-circuit when there are no more candidate manifests.
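One way that suggestion could look, sketched with `java.util.concurrent` instead of Iceberg's `Tasks` utility. The class `CandidatePruning`, its parameters, and the `readManifestList` function (manifest list path to manifest paths) are hypothetical. Manifest lists of retained snapshots are read in parallel, every manifest they reference is removed from a concurrent candidate set, and further reads are skipped once no candidates remain.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class CandidatePruning {
  private CandidatePruning() {}

  // Returns the candidate manifests that no retained manifest list references.
  // candidates: manifests referenced by expired snapshots (copied, never mutated).
  // readManifestList: hypothetical reader from a manifest list path to its manifest paths.
  static Set<String> unreachableManifests(
      Collection<String> candidates,
      Collection<String> retainedManifestLists,
      Function<String, List<String>> readManifestList,
      int threads) {
    Set<String> remaining = ConcurrentHashMap.newKeySet();
    remaining.addAll(candidates);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (String manifestList : retainedManifestLists) {
        futures.add(
            pool.submit(
                () -> {
                  // Short-circuit: once every candidate is known to be live, skip the read.
                  if (remaining.isEmpty()) {
                    return;
                  }
                  for (String manifest : readManifestList.apply(manifestList)) {
                    remaining.remove(manifest);
                  }
                }));
      }
      for (Future<?> future : futures) {
        try {
          future.get();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("Interrupted while reading manifest lists", e);
        } catch (ExecutionException e) {
          // Reachability is unknown if any read failed, so fail instead of over-deleting.
          throw new IllegalStateException("Failed to read a retained manifest list", e.getCause());
        }
      }
    } finally {
      pool.shutdown();
    }
    return remaining;
  }
}
```

A failed read is rethrown rather than swallowed: if any retained manifest list cannot be read, the remaining candidates are not known to be unreachable, so none of them should be deleted.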
rdblue left a comment:
@amogh-jahagirdar, this looks very good. There's just one blocker here: #5669 (comment). There are some minor things as well you may want to clean up but once the blocker is in we can commit this. Thank you!
    @Override
    @SuppressWarnings({"checkstyle:CyclomaticComplexity", "MethodLength"})
    public void cleanFiles(TableMetadata beforeExpiration, TableMetadata afterExpiration) {
      if (afterExpiration.refs().size() > 1) {
This method seems far too long. Wouldn't it be better to refactor it into several private methods whose names explain their intent?
@zinking, that's a great idea, but not something we should do in this PR. This code was moved from RemoveSnapshots so we want to keep it as close to the original as possible. Would you like to follow up with a PR to simplify it and get rid of the checkstyle warning?
        .onFailure(
            (item, exc) ->
                LOG.warn(
                    "Failed to determine live files in manifest {}: this may cause orphaned data files",
This isn't true anymore. It should state that there will be a retry.
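A sketch of a log message that stays accurate under retries, assuming a read is attempted a fixed number of times before giving up. `RetryingRead`, `readWithRetry`, and the `List<String>` used as a stand-in for an SLF4J logger are hypothetical; this does not reproduce how Iceberg's `Tasks` reports failures.

```java
import java.util.List;
import java.util.function.Supplier;

final class RetryingRead {
  private RetryingRead() {}

  // Attempts `read` up to maxAttempts times. Each failure is recorded in `log`
  // (a stand-in for a real logger) with a message that says whether a retry follows.
  static <T> T readWithRetry(String manifest, Supplier<T> read, int maxAttempts, List<String> log) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    RuntimeException lastFailure = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return read.get();
      } catch (RuntimeException e) {
        lastFailure = e;
        if (attempt < maxAttempts) {
          log.add(String.format(
              "Failed to read manifest %s (attempt %d of %d), will retry", manifest, attempt, maxAttempts));
        } else {
          log.add(String.format(
              "Failed to read manifest %s after %d attempts, giving up", manifest, maxAttempts));
        }
      }
    }
    // Every attempt failed: surface the last error to the caller.
    throw lastFailure;
  }
}
```

Only the final failure is reported as terminal, so an operator reading the log is not told files may be orphaned while a retry is still pending.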
| }); | ||
|
|
||
    } catch (Throwable e) {
      LOG.warn("Failed to determine the data files to be removed", e);
As a follow-up, I think this message could be a little better. It makes sense if you know what the code is currently doing, but otherwise it will be hard to tell that the error was in reading the current manifests rather than the dropped manifests. I'd use something like "Failed to list all reachable files".
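The safety rule behind this error path, sketched with hypothetical names (`SafeDeletePlanner`, `dataFilesToDelete`, and a `Supplier` that lists reachable data files): if the full reachable set cannot be listed, no candidate can be proven unused, so the sketch deletes nothing. It catches `RuntimeException` rather than `Throwable` and prints to stderr in place of a logger; both are simplifications, and whether the PR's code returns an empty set in this case is an assumption.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

final class SafeDeletePlanner {
  private SafeDeletePlanner() {}

  // Returns the candidates that are not reachable, or an empty set when the reachable
  // files cannot be listed: without the full reachable set, no candidate is provably unused.
  static Set<String> dataFilesToDelete(
      Set<String> candidateDataFiles, Supplier<Set<String>> listReachableDataFiles) {
    Set<String> reachable;
    try {
      reachable = listReachableDataFiles.get();
    } catch (RuntimeException e) {
      System.err.println("Failed to list all reachable files: " + e);
      return Collections.emptySet();
    }
    Set<String> toDelete = new HashSet<>(candidateDataFiles);
    toDelete.removeAll(reachable);
    return toDelete;
  }
}
```

Erring toward deleting nothing leaves orphan files for a later cleanup pass, which is recoverable; deleting a file that is still referenced is not.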
Thanks, @amogh-jahagirdar! This looks good.
FYI @puneetzaroo and @szehon-ho, this affects incremental table cleanup.
Currently, snapshot expiration has a limitation that prevents file cleanup from being performed when a table has multiple branches or tags. Reliable file cleanup in the presence of multiple references requires a reachability analysis to determine which files can safely be removed, and the existing incremental file cleanup cannot perform that analysis.
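A naive sketch of that reachability analysis over a hypothetical in-memory model: the class `Reachability` and the maps from snapshot id to manifest list, manifest list to manifests, and manifest to data files are illustrative only (real code reads these from metadata files). A file is safe to delete only if it is reachable from the snapshots before expiration and unreachable from every snapshot that any branch or tag still retains afterwards.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class Reachability {
  private Reachability() {}

  // Every path reachable from the given snapshots: their manifest lists, the manifests
  // those lists reference, and the data files those manifests reference.
  static Set<String> reachableFiles(
      Set<Long> snapshotIds,
      Map<Long, String> manifestListOf,
      Map<String, List<String>> manifestsIn,
      Map<String, List<String>> dataFilesIn) {
    Set<String> files = new HashSet<>();
    for (Long snapshotId : snapshotIds) {
      String manifestList = manifestListOf.get(snapshotId);
      files.add(manifestList);
      for (String manifest : manifestsIn.getOrDefault(manifestList, List.of())) {
        files.add(manifest);
        files.addAll(dataFilesIn.getOrDefault(manifest, List.of()));
      }
    }
    return files;
  }

  // Safe to delete: reachable before expiration, unreachable from all snapshots retained after it.
  static Set<String> filesToDelete(
      Set<Long> snapshotsBefore,
      Set<Long> snapshotsAfter,
      Map<Long, String> manifestListOf,
      Map<String, List<String>> manifestsIn,
      Map<String, List<String>> dataFilesIn) {
    Set<String> deletable = reachableFiles(snapshotsBefore, manifestListOf, manifestsIn, dataFilesIn);
    deletable.removeAll(reachableFiles(snapshotsAfter, manifestListOf, manifestsIn, dataFilesIn));
    return deletable;
  }
}
```

If snapshot 1 is expired while snapshot 2 is still held by some ref, only snapshot 1's manifest list and the manifests and data files that snapshot 2 cannot reach are deletable; shared manifests survive. This version materializes every reachable path, which is simple but memory-hungry for large tables; pruning a candidate set instead avoids that.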
This PR introduces FileCleanupStrategy, a base class for a strategy pattern in which each subclass implements its own file cleanup logic. There are two strategy implementations:
1.) IncrementalFileCleanup, which RemoveSnapshots uses when there is only one reference. This is simply the existing file cleanup logic.
2.) A new ReachableFileCleanup, which performs a reachability analysis of manifest lists, manifests, and data files given the previous and current table states.
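A skeletal sketch of the strategy pattern described above. `TableState`, `CleanupStrategy`, `IncrementalCleanup`, `ReachableCleanup`, and `CleanupStrategies` are hypothetical stand-ins for `TableMetadata`, `FileCleanupStrategy`, and the two implementations; the cleanup bodies are omitted, and choosing the strategy from the pre-expiration ref count is an assumption.

```java
import java.util.Map;

// Hypothetical stand-in for TableMetadata: only the refs are modeled here.
final class TableState {
  final Map<String, Long> refs; // ref name -> snapshot id

  TableState(Map<String, Long> refs) {
    this.refs = refs;
  }
}

// Base class of the strategy pattern; each subclass supplies its own cleanup logic.
abstract class CleanupStrategy {
  abstract void cleanFiles(TableState beforeExpiration, TableState afterExpiration);
}

// Existing logic: only valid when the table has a single ref.
final class IncrementalCleanup extends CleanupStrategy {
  @Override
  void cleanFiles(TableState beforeExpiration, TableState afterExpiration) {
    if (afterExpiration.refs.size() > 1) {
      throw new UnsupportedOperationException("Incremental cleanup supports at most one ref");
    }
    // Walk the single ref's history and delete files owned only by expired snapshots (omitted).
  }
}

// New logic: delete files reachable before expiration but not after.
final class ReachableCleanup extends CleanupStrategy {
  @Override
  void cleanFiles(TableState beforeExpiration, TableState afterExpiration) {
    // Compute reachable sets for both states and delete the difference (omitted).
  }
}

final class CleanupStrategies {
  private CleanupStrategies() {}

  // Assumption: choose based on the number of refs before expiration.
  static CleanupStrategy forTable(TableState beforeExpiration) {
    return beforeExpiration.refs.size() <= 1 ? new IncrementalCleanup() : new ReachableCleanup();
  }
}
```

Keeping the choice in one place means the caller only depends on the base class, and the cheaper incremental path is still used for the common single-branch table.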
Closes #5666.