Core: Fix table corruption from OOM during commit cleanup in Spark #4673
Conversation
LGTM

RussellSpitzer left a comment:
Lgtm
@rdblue @RussellSpitzer

@dilipbiswal, done.

Thanks a lot @rdblue
```java
} catch (RuntimeException e) {
  LOG.warn("Failed to load committed table metadata, skipping manifest clean-up", e);
} catch (Throwable e) {
```
cleanAll is called at line 323, before this try-catch block. What is the further cleanup after this block? I only see notifyListeners.
If we are talking about an OutOfMemoryError that happened during the first/commit try-catch block (lines 286-324), then cleanAll won't be called either, as OutOfMemoryError is not a RuntimeException.
cleanAll is called when the commit fails. If the commit fails, then there is no problem with Spark cleaning up the data files, or cleaning the metadata files that Iceberg creates.
If there is an OutOfMemoryError thrown from the try block (lines 326-346), this change will catch and swallow it, so it skips the further cleanup at lines 333-339. I am not following how this fixes issue #4666, where committed data files were deleted.
The issue was that a throwable not caught by the catch block here would be thrown to Spark, which would cause Spark to consider the commit failed and to perform its own abort code, removing the committed files.
So the old behavior was:
- Table Operations commit (this can happen successfully)
- While cleaning up old files, or theoretically while calling notify listeners, a non-runtime exception is thrown
- The commit has been successful, but the exception is re-thrown to the Spark commit code
- The Spark commit code sees the exception and executes its abort method

So the fix is basically to never throw an exception once the commit has occurred, regardless of what happens.
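For illustration, here is a minimal sketch of the ordering being described, modeled loosely on the code under review; the method names (commitOperation, cleanUncommitted, notifyListeners) are simplified stand-ins rather than the exact SnapshotProducer API.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Simplified sketch of the commit/cleanup ordering discussed above.
// Names are illustrative stand-ins, not the actual Iceberg SnapshotProducer code.
class CommitFlowSketch {
  private static final Logger LOG = LoggerFactory.getLogger(CommitFlowSketch.class);

  void commit() {
    try {
      commitOperation();
    } catch (RuntimeException e) {
      cleanAll(); // commit failed: removing the files written for this attempt is safe
      throw e;    // Spark sees the failure and runs its own abort, which is also safe
    }

    // Past this point the commit succeeded, so nothing may propagate to Spark.
    // If it did, Spark would call abort() and delete files that are now part of the table.
    try {
      cleanUncommitted(); // remove manifests that are no longer referenced
    } catch (Throwable t) {
      LOG.warn("Failed post-commit cleanup, skipping further cleanup", t);
    }

    notifyListeners();
  }

  private void commitOperation() { /* swap table metadata */ }
  private void cleanAll() { /* delete files from the failed attempt */ }
  private void cleanUncommitted() { /* post-commit manifest cleanup */ }
  private void notifyListeners() { /* fire table change notifications */ }
}
```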
@RussellSpitzer thanks a lot for the detailed explanation. I didn't realize it is Spark's abort flow that deleted the data files.

I am uneasy with swallowing fatal errors (like OutOfMemoryError), though. Should Spark only catch CommitFailedException and perform the abort only for that specific exception? I assume Spark doesn't perform an abort for CommitStateUnknownException.
@stevenzwu It's not in our code, it's in Spark's code, which doesn't have a defined exception for either of those states. Apache Spark basically has this code:

```
try {
  DataSource.commit()
} catch (anything) {
  datasource.abort()
}
```

Spark doesn't know about any of our particular exceptions; it just wants to know whether its call to DataSource.commit failed or not. In this particular instance, the worry is that the Iceberg commit did succeed and no exceptions were thrown, but additional code ran after the Iceberg commit (like our clean-up code or notify listeners) which throws another exception. This exception is surfaced as being thrown by DataSource.commit() and causes Spark to call DataSource.abort().

Spark itself cannot distinguish between an exception that happened before the actual commit and one that happened after the actual commit within our DataSource commit method. If we wanted to let engines handle this, we could have our Iceberg SparkWrite try to handle it, but I think that would probably be difficult to manage as well. I think it makes more sense to say that once committed, we suppress all exceptions. Our commit method in SnapshotProducer only throws exceptions when the commit fails and in no other circumstances.
Got it. I thought it was iceberg-spark.

When CommitStateUnknownException is thrown, the commit may have actually succeeded in the backend. If Spark performs an abort and deletes data files in this case, it could corrupt the table state. Could this happen, or is it taken care of by iceberg-spark?

Sorry for asking some newbie Spark questions.
I think that may be possible... All of our tests at the moment just make sure that the Metadata.json file is not incorrectly removed. We probably should add an additional test to make sure data files in Spark are not removed. I would think we would need a catch here:

```java
try {
  operation.commit(); // abort is automatically called if this fails
} catch (CommitStateUnknownException commitStateUnknownException) {
  LOG.warn("Unknown Commit State", commitStateUnknownException);
}
```
Catching and swallowing the CommitStateUnknownException is also not ideal. If the commit actually failed, that would lead Spark to falsely assume the commit was successful.

Ideally, Spark needs to distinguish these commit results, and then iceberg-spark can translate the Iceberg commit result into a Spark commit result:
- commit success: treat it as success
- commit failure: treat it as failure and perform the abort
- commit state unknown: treat it as failure but don't perform the abort

This discussion is probably outside the scope of this PR. This PR is fine. Basically, it swallows any exceptions from the post-commit-success cleanup steps.
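As a purely hypothetical sketch of the three-way handling described above (Spark's DataSource V2 API only exposes commit and abort today, so this branching does not exist in Spark or iceberg-spark), engine-side logic might look roughly like this. CommitStateUnknownException is the real Iceberg exception; the IcebergWrite interface and commitWrite method are invented for the example.

```java
import org.apache.iceberg.exceptions.CommitStateUnknownException;

// Hypothetical engine-side commit handling; the IcebergWrite interface and
// commitWrite method exist only for illustration.
class EngineCommitSketch {

  interface IcebergWrite {
    void commit();
    void abort();
  }

  void commitWrite(IcebergWrite write) {
    try {
      write.commit();
      // commit success: treat it as success, nothing else to do
    } catch (CommitStateUnknownException e) {
      // commit state unknown: the commit may have succeeded in the backend,
      // so fail the job but do NOT abort (aborting could delete live data files)
      throw e;
    } catch (RuntimeException e) {
      // commit failure: safe to abort and remove the files written for this attempt
      write.abort();
      throw e;
    }
  }
}
```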
```java
} catch (RuntimeException e) {
  LOG.warn("Failed to load committed table metadata, skipping manifest clean-up", e);
} catch (Throwable e) {
  LOG.warn("Failed to load committed table metadata or during cleanup, skipping further cleanup", e);
```
Suggestion for the message, either:
"Failed loading committed table metadata or during cleanup"
or
"Failed to load committed table metadata or failed during cleanup"
This PR looks like it was copied from Core: Skipping manifest clean-up for all Error or Exception. #4507. @rdblue, why not continue on the original PR instead of copying the changes into a new one? I think we should encourage more contributors to join us.
kbendick left a comment:
LGTM.
Thanks for reviewing this, everyone! Good to have it fixed.
Fixes #4666.