@Guosmilesmile
Contributor

Background

When an existing table is migrated to the Flink implementation of RewriteDataFiles and its historical data has not been compacted thoroughly, or when the targetFileSize is changed, the Flink maintenance job must first rewrite the entire table before it can follow the user’s intended schedule. This initial compaction can run for a very long time. While it runs, newly written data cannot trigger compaction, which degrades query performance.

Purpose

This PR introduces an optional filter parameter to Flink’s RewriteDataFiles. Users can define predicates (e.g. data whose time is greater than a given timestamp) so that only the necessary subset of data is compacted, allowing historical data to be skipped.
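
To make the intent concrete, here is a toy model of the selection step. It is not the code in this PR: FileInfo, candidates, and the numeric thresholds are hypothetical names invented for illustration. The actual feature passes an Iceberg Expression to RewriteDataFiles via .filter(...), and real planning works on table metadata rather than on an in-memory list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FilterSketch {
    // Hypothetical stand-in for a data file: the newest event time it holds and its size.
    static final class FileInfo {
        final String name;
        final long maxEventTime;
        final long sizeBytes;

        FileInfo(String name, long maxEventTime, long sizeBytes) {
            this.name = name;
            this.maxEventTime = maxEventTime;
            this.sizeBytes = sizeBytes;
        }
    }

    // Keep only files that match the user-supplied filter AND are below the size threshold.
    // Files rejected by the filter are never considered, however small they are.
    static List<String> candidates(
            List<FileInfo> files, Predicate<FileInfo> filter, long minFileSizeBytes) {
        return files.stream()
                .filter(filter)
                .filter(f -> f.sizeBytes < minFileSizeBytes)
                .map(f -> f.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<FileInfo> files = Arrays.asList(
                new FileInfo("old-small", 100L, 10_000L),    // historical data, skipped by the filter
                new FileInfo("new-small", 900L, 10_000L),    // recent and small, selected
                new FileInfo("new-large", 950L, 1_000_000L)); // recent but already large enough

        long cutoff = 500L; // assumed timestamp boundary between historical and new data
        System.out.println(candidates(files, f -> f.maxEventTime > cutoff, 500_000L));
    }
}
```

With the filter in place, the small historical file is left alone and only the recent small file is picked up, which is the behavior the Background section asks for.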

@github-actions github-actions bot added the flink label Jul 25, 2025
@pvary
Contributor

pvary commented Jul 27, 2025

@mxm: Your thoughts?

Contributor

@mxm mxm left a comment

This is handy. Thanks!

.maxFileSizeBytes(2_000_000L)
.minFileSizeBytes(500_000L)
.minInputFiles(2)
.filter(Expressions.in("id", 1, 2))
Contributor

It's always nice to add a comment on the test-relevant parameters (here: the filter).

Contributor Author

OK, I have added it.

}

@Test
void testRewriteUnPartitionedWithFilter() throws Exception {
Contributor

Is "Unpartitioned" relevant for this test? If not, we could rename the test to testRewriteWithFilter.

Contributor Author

It is not relevant. I have renamed it to testRewriteWithFilter.

@pvary pvary merged commit d6f22d5 into apache:main Jul 29, 2025
18 checks passed
@pvary
Contributor

pvary commented Jul 29, 2025

Merged to main.
Thanks @Guosmilesmile for the feature and @mxm for the review!
