Status: Open
Labels: bug
Description
Apache Iceberg version
1.10.0 (latest release)
Query engine
Flink
Please describe the bug 🐞
`DynamicWriteResultAggregator` in `DynamicIcebergSink` currently produces multiple dynamic committables per (table, branch, checkpoint) triplet. This happens because it aggregates write results by `WriteTarget`, which is distinct for each combination of `schemaId`, `specId`, and equality fields. This violates the idempotence contract of `DynamicCommitter`, which relies on there being exactly one commit request per triplet to identify and skip already-committed requests during recovery.
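A minimal Java sketch of the mismatch follows. It uses simplified, hypothetical stand-ins: `Target` is not Iceberg's actual `WriteTarget` class, and `WriteResultGroupingSketch`, `groupByTargetAndCheckpoint`, and `groupByTableBranchCheckpoint` are illustrative names. Two write results for the same table, branch, and checkpoint but different schema IDs produce two committables when grouped by the full target, and one when grouped by the triplet.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WriteResultGroupingSketch {

  // Hypothetical simplified target: distinct per table, branch, schema ID, spec ID and equality fields.
  record Target(String table, String branch, int schemaId, int specId, Set<Integer> equalityFields) {}

  // Hypothetical write result: one data file written for a target during a checkpoint.
  record WriteResult(Target target, long checkpointId, String dataFile) {}

  // Key mirroring the behavior described in the issue: full target plus checkpoint.
  record TargetAndCheckpoint(Target target, long checkpointId) {}

  // Key matching the committer's assumption: one committable per (table, branch, checkpoint).
  record TableBranchCheckpoint(String table, String branch, long checkpointId) {}

  // Groups data files by (target, checkpoint); each map entry models one committable.
  static Map<TargetAndCheckpoint, List<String>> groupByTargetAndCheckpoint(List<WriteResult> results) {
    Map<TargetAndCheckpoint, List<String>> grouped = new LinkedHashMap<>();
    for (WriteResult r : results) {
      TargetAndCheckpoint key = new TargetAndCheckpoint(r.target(), r.checkpointId());
      grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(r.dataFile());
    }
    return grouped;
  }

  // Groups data files by (table, branch, checkpoint); each map entry models one committable.
  static Map<TableBranchCheckpoint, List<String>> groupByTableBranchCheckpoint(List<WriteResult> results) {
    Map<TableBranchCheckpoint, List<String>> grouped = new LinkedHashMap<>();
    for (WriteResult r : results) {
      TableBranchCheckpoint key =
          new TableBranchCheckpoint(r.target().table(), r.target().branch(), r.checkpointId());
      grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(r.dataFile());
    }
    return grouped;
  }

  public static void main(String[] args) {
    Target schema1 = new Target("db.events", "main", 1, 0, Set.of());
    Target schema2 = new Target("db.events", "main", 2, 0, Set.of());
    List<WriteResult> results = List.of(
        new WriteResult(schema1, 1L, "file-a.parquet"),
        new WriteResult(schema2, 1L, "file-b.parquet"));
    System.out.println(groupByTargetAndCheckpoint(results).size()); // prints 2
    System.out.println(groupByTableBranchCheckpoint(results).size()); // prints 1
  }
}
```

In this model, only the second grouping gives the committer a single request per triplet to check during recovery.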
For example, data loss occurs in the following scenario:
- The sink creates two commit requests with the properties `CheckpointID = 1, JobID = a, OperatorID = abc`:
  - Commit 1 (data with schema 1)
  - Commit 2 (data with schema 2)
- The commit requests are saved to a checkpoint.
- The committer commits Commit 1.
- The Flink job restarts due to a commit failure or an autoscaling event.
- The commit requests are retrieved from the checkpoint.
- `DynamicCommitter` checks the Iceberg snapshots and identifies that Committable 1 has already been committed, as it matches `CheckpointID = 1, JobID = a, OperatorID = abc`.
- `DynamicCommitter` skips the subsequent Committable 2, which is part of the same checkpoint, so its data is never committed.
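The recovery path in this scenario can be modeled in a few lines. This is a hypothetical simulation, not the `DynamicCommitter` source: it assumes the committer reads the highest checkpoint ID committed by the same job and operator from the table's snapshot metadata and skips every restored committable whose checkpoint ID is not greater than that value. `RecoverySkipSketch`, `SnapshotSummary`, and `committablesToCommit` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

public class RecoverySkipSketch {

  // Hypothetical committable restored from the checkpoint.
  record Committable(String name, String jobId, String operatorId, long checkpointId) {}

  // Hypothetical stand-in for the job/operator/checkpoint properties recorded on a snapshot.
  record SnapshotSummary(String jobId, String operatorId, long checkpointId) {}

  // Returns the highest checkpoint ID already committed by this job and operator, or -1 if none.
  static long maxCommittedCheckpointId(List<SnapshotSummary> snapshots, String jobId, String operatorId) {
    long max = -1L;
    for (SnapshotSummary s : snapshots) {
      if (s.jobId().equals(jobId) && s.operatorId().equals(operatorId)) {
        max = Math.max(max, s.checkpointId());
      }
    }
    return max;
  }

  // Returns the names of restored committables that pass the assumed idempotence check,
  // evaluated against the table state observed at recovery time.
  static List<String> committablesToCommit(List<Committable> restored, List<SnapshotSummary> snapshots) {
    List<String> toCommit = new ArrayList<>();
    for (Committable c : restored) {
      // Assumed check: anything at or below the last committed checkpoint counts as done.
      if (c.checkpointId() > maxCommittedCheckpointId(snapshots, c.jobId(), c.operatorId())) {
        toCommit.add(c.name());
      }
    }
    return toCommit;
  }

  public static void main(String[] args) {
    List<Committable> restored = List.of(
        new Committable("Commit 1", "a", "abc", 1L),
        new Committable("Commit 2", "a", "abc", 1L));
    // Before the restart, only Commit 1 reached the table.
    List<SnapshotSummary> snapshots = List.of(new SnapshotSummary("a", "abc", 1L));
    System.out.println(committablesToCommit(restored, snapshots)); // prints [] (Commit 2 is lost)
  }
}
```

Because both committables carry checkpoint 1, committing only the first one before the restart is enough for this check to skip the second.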
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time
mxm