Flink:RewriteDataFiles support filter in plan #13669
.../v2.0/flink/src/test/java/org/apache/iceberg/flink/maintenance/api/TestRewriteDataFiles.java
...link/src/main/java/org/apache/iceberg/flink/maintenance/operator/DataFileRewritePlanner.java
@mxm: Your thoughts?
mxm left a comment:
This is handy. Thanks!
    .maxFileSizeBytes(2_000_000L)
    .minFileSizeBytes(500_000L)
    .minInputFiles(2)
    .filter(Expressions.in("id", 1, 2))
It's always nice to add a comment at the test-relevant parameters (here: the filter).
OK, I have added it.
    }

    @Test
    void testRewriteUnPartitionedWithFilter() throws Exception {
Is "Unpartitioned" relevant for this test? If not, we could rename the test to testRewriteWithFilter.
It is not relevant. I have renamed it to testRewriteWithFilter.
Merged to main.
Background
When an existing table is migrated to the Flink implementation of RewriteDataFiles and its historical data has not been thoroughly compacted, or when the targetFileSize is modified, the Flink maintenance job must first rewrite the entire table before it can follow the user’s intended schedule. This initial compaction can run for a very long time. While it runs, new data cannot trigger compaction, which degrades query performance.
Purpose
This PR introduces an optional filter parameter to Flink’s RewriteDataFiles. Users can define predicates (e.g. data whose time is greater than a given timestamp) so that only the necessary subset of data is compacted, allowing historical data to be skipped.
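To make the effect of the filter concrete, here is a minimal, self-contained sketch of filter-restricted planning. It is a simplified model and not the code in DataFileRewritePlanner: `FileInfo`, `FilterPlanSketch`, `plan`, and the `maxEventTime` field are hypothetical names, a plain `java.util.function.Predicate` stands in for an Iceberg expression, and the size rule is reduced to "smaller than `minFileSizeBytes`". In the real API the predicate would be passed through `.filter(...)`, as in the test snippet from the review above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a data file's metadata; NOT Iceberg's DataFile.
final class FileInfo {
  final String path;
  final long sizeBytes;
  final long maxEventTime; // assumed per-file upper bound of the time column

  FileInfo(String path, long sizeBytes, long maxEventTime) {
    this.path = path;
    this.sizeBytes = sizeBytes;
    this.maxEventTime = maxEventTime;
  }
}

final class FilterPlanSketch {
  // Returns the files a rewrite would pick up: they must match the filter
  // and be smaller than minFileSizeBytes. If fewer than minInputFiles
  // qualify, nothing is planned.
  static List<FileInfo> plan(
      List<FileInfo> files,
      Predicate<FileInfo> filter,
      long minFileSizeBytes,
      int minInputFiles) {
    List<FileInfo> candidates = new ArrayList<>();
    for (FileInfo file : files) {
      if (filter.test(file) && file.sizeBytes < minFileSizeBytes) {
        candidates.add(file);
      }
    }
    return candidates.size() >= minInputFiles ? candidates : new ArrayList<>();
  }

  public static void main(String[] args) {
    long cutoff = 1_000L; // only data at or after this time should be compacted
    List<FileInfo> files = List.of(
        new FileInfo("old-1", 100_000L, 500L),
        new FileInfo("old-2", 120_000L, 900L),
        new FileInfo("new-1", 100_000L, 1_500L),
        new FileInfo("new-2", 150_000L, 2_000L),
        new FileInfo("new-big", 1_000_000L, 2_500L));

    // Historical files (old-1, old-2) are skipped by the filter;
    // new-big is skipped because it is already large enough.
    List<FileInfo> planned =
        plan(files, f -> f.maxEventTime >= cutoff, 500_000L, 2);
    for (FileInfo file : planned) {
      System.out.println(file.path);
    }
  }
}
```

Without the predicate (for example `f -> true`), the two historical files would also be planned, which is the full-table rewrite described in the Background.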