Core: Fix table corruption from OOM during commit cleanup in Spark #4673
Conversation
LGTM

RussellSpitzer left a comment:
Lgtm
@rdblue @RussellSpitzer

@dilipbiswal, done.

Thanks a lot @rdblue
```java
} catch (RuntimeException e) {
  LOG.warn("Failed to load committed table metadata, skipping manifest clean-up", e);
} catch (Throwable e) {
```
cleanAll is called at line 323, before this try-catch block. What is the further cleanup after this block? I only see notifyListeners.
If we are talking about an OutOfMemoryError that happened during the first/commit try-catch block (lines 286-324), then cleanAll won't be called either, as OutOfMemoryError is not a RuntimeException.
cleanAll is called when the commit fails. If the commit fails, then there is no problem with Spark cleaning up the data files, or cleaning the metadata files that Iceberg creates.
If there is an OutOfMemoryError thrown from the try block (lines 326-346), this change will catch and swallow it, so it skips the further cleanup at lines 333-339. I am not following how this fixes issue #4666, where committed data files were deleted.
The issue was that a throwable not caught by the catch block here would be thrown to Spark, which would cause Spark to consider the commit failed and to perform its own abort code, removing the committed files.
So the old behavior was:
- Table Operations commit (this can happen successfully)
- While cleaning up old files, or theoretically while calling notify listeners, a non-runtime exception is thrown
- The commit has been successful, but the exception is re-thrown to the Spark commit code
- The Spark commit code sees the exception and executes its abort method

So the fix is basically to never throw an exception once the commit has occurred, regardless of what happens.
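For illustration, here is a minimal sketch of the ordering being described, modeled loosely on the code under review; the method names (commitOperation, cleanUncommitted, notifyListeners) are simplified stand-ins rather than the exact SnapshotProducer API.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Simplified sketch of the commit/cleanup ordering discussed above.
// Names are illustrative stand-ins, not the actual Iceberg SnapshotProducer code.
class CommitFlowSketch {
  private static final Logger LOG = LoggerFactory.getLogger(CommitFlowSketch.class);

  void commit() {
    try {
      commitOperation();
    } catch (RuntimeException e) {
      cleanAll(); // commit failed: removing the files written for this attempt is safe
      throw e;    // Spark sees the failure and runs its own abort, which is also safe
    }

    // Past this point the commit succeeded, so nothing may propagate to Spark.
    // If it did, Spark would call abort() and delete files that are now part of the table.
    try {
      cleanUncommitted(); // remove manifests that are no longer referenced
    } catch (Throwable t) {
      LOG.warn("Failed post-commit cleanup, skipping further cleanup", t);
    }

    notifyListeners();
  }

  private void commitOperation() { /* swap table metadata */ }
  private void cleanAll() { /* delete files from the failed attempt */ }
  private void cleanUncommitted() { /* post-commit manifest cleanup */ }
  private void notifyListeners() { /* fire table change notifications */ }
}
```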
@RussellSpitzer thanks a lot for the detailed explanation. I didn't realize it is Spark's abort flow that deleted the data files.

I am uneasy with swallowing fatal errors (like OutOfMemoryError), though. Should Spark only catch CommitFailedException and perform the abort only for that specific exception? I assume Spark doesn't perform an abort for CommitStateUnknownException.
@stevenzwu It's not in our code, it's in Spark's code, which doesn't have a defined exception for either of those states. Apache Spark basically has this code:

```
try {
  DataSource.commit()
} catch (anything) {
  datasource.abort()
}
```

Spark doesn't know about any of our particular exceptions; it just wants to know whether its call to DataSource.commit failed or not. In this particular instance, the worry is that the Iceberg commit did succeed and no exceptions were thrown, but additional code ran after the Iceberg commit (like our clean-up code or notify listeners) which throws another exception. This exception is surfaced as being thrown by DataSource.commit() and causes Spark to call DataSource.abort().

Spark itself cannot distinguish between an exception that happened before the actual commit and one that happened after the actual commit within our DataSource commit method. If we wanted to let engines handle this, we could have our Iceberg SparkWrite try to handle it, but I think that would probably be difficult to manage as well. I think it makes more sense to say that once committed, we suppress all exceptions. Our commit method in SnapshotProducer only throws exceptions when the commit fails and in no other circumstances.
Got it. I thought it was iceberg-spark.

When CommitStateUnknownException is thrown, the commit may have actually succeeded in the backend. If Spark performs an abort and deletes data files in this case, it could corrupt the table state. Could this happen, or is it taken care of by iceberg-spark?

Sorry for asking some newbie Spark questions.
I think that may be possible... All of our tests at the moment just make sure that the Metadata.json file is not incorrectly removed. We probably should add an additional test to make sure data files in Spark are not removed. I would think we would need a catch here:

```java
try {
  operation.commit(); // abort is automatically called if this fails
} catch (CommitStateUnknownException commitStateUnknownException) {
  LOG.warn("Unknown Commit State", commitStateUnknownException);
}
```
Catching and swallowing the CommitStateUnknownException is also not ideal. If the commit actually failed, that would lead Spark to falsely assume the commit was successful.

Ideally, Spark needs to distinguish these commit results, and then iceberg-spark can translate the Iceberg commit result into a Spark commit result:
- commit success: treat it as success
- commit failure: treat it as failure and perform the abort
- commit state unknown: treat it as failure but don't perform the abort

This discussion is probably outside the scope of this PR. This PR is fine. Basically, it swallows any exceptions from the post-commit-success cleanup steps.
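As a purely hypothetical sketch of the three-way handling described above (Spark's DataSource V2 API only exposes commit and abort today, so this branching does not exist in Spark or iceberg-spark), engine-side logic might look roughly like this. CommitStateUnknownException is the real Iceberg exception; the IcebergWrite interface and commitWrite method are invented for the example.

```java
import org.apache.iceberg.exceptions.CommitStateUnknownException;

// Hypothetical engine-side commit handling; the IcebergWrite interface and
// commitWrite method exist only for illustration.
class EngineCommitSketch {

  interface IcebergWrite {
    void commit();
    void abort();
  }

  void commitWrite(IcebergWrite write) {
    try {
      write.commit();
      // commit success: treat it as success, nothing else to do
    } catch (CommitStateUnknownException e) {
      // commit state unknown: the commit may have succeeded in the backend,
      // so fail the job but do NOT abort (aborting could delete live data files)
      throw e;
    } catch (RuntimeException e) {
      // commit failure: safe to abort and remove the files written for this attempt
      write.abort();
      throw e;
    }
  }
}
```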
```java
} catch (RuntimeException e) {
  LOG.warn("Failed to load committed table metadata, skipping manifest clean-up", e);
} catch (Throwable e) {
  LOG.warn("Failed to load committed table metadata or during cleanup, skipping further cleanup", e);
```
Suggestion for the message, either:
"Failed loading committed table metadata or during cleanup"
or
"Failed to load committed table metadata or failed during cleanup"
This PR looks like it was copied from Core: Skipping manifest clean-up for all Error or Exception. #4507. @rdblue, why not continue on the original PR instead of copying the changes into a new one? I think we should encourage more contributors to join us.
kbendick left a comment:
LGTM.
Thanks for reviewing this, everyone! Good to have it fixed.
Fixes #4666.