@stevenzwu
Contributor

No description provided.

@stevenzwu stevenzwu requested a review from pvary July 23, 2024 17:18
@github-actions github-actions bot added the flink label Jul 23, 2024
@stevenzwu
Contributor Author

stevenzwu commented Jul 23, 2024

It is a clean backport (a complete copy from 1.19 to 1.17/1.18).

1.18 and 1.17 are identical. Here are the diffs between 1.19 and 1.18.

➜  iceberg git:(backport-10457) ✗ diff flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/ flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/
diff --color flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/CompletedStatisticsSerializer.java flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/CompletedStatisticsSerializer.java
151c151,153
<     public CompletedStatisticsSerializerSnapshot() {}
---
>     public CompletedStatisticsSerializerSnapshot() {
>       super(CompletedStatisticsSerializer.class);
>     }
diff --color flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsOperator.java flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsOperator.java
95,96c95,96
<     this.parallelism = getRuntimeContext().getTaskInfo().getNumberOfParallelSubtasks();
<     this.subtaskIndex = getRuntimeContext().getTaskInfo().getIndexOfThisSubtask();
---
>     this.parallelism = getRuntimeContext().getNumberOfParallelSubtasks();
>     this.subtaskIndex = getRuntimeContext().getIndexOfThisSubtask();
207,208c207
<     if (globalStatistics != null
<         && getRuntimeContext().getTaskInfo().getIndexOfThisSubtask() == 0) {
---
>     if (globalStatistics != null && getRuntimeContext().getIndexOfThisSubtask() == 0) {
diff --color flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsSerializer.java flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/DataStatisticsSerializer.java
180c180,182
<     public DataStatisticsSerializerSnapshot() {}
---
>     public DataStatisticsSerializerSnapshot() {
>       super(DataStatisticsSerializer.class);
>     }
diff --color flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/GlobalStatisticsSerializer.java flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/GlobalStatisticsSerializer.java
175c175,177
<     public GlobalStatisticsSerializerSnapshot() {}
---
>     public GlobalStatisticsSerializerSnapshot() {
>       super(GlobalStatisticsSerializer.class);
>     }
diff --color flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java
317,318c317,318
<         TypeSerializerSnapshot<SortKey> oldSerializerSnapshot) {
<       if (!(oldSerializerSnapshot instanceof SortKeySerializerSnapshot)) {
---
>         TypeSerializer<SortKey> newSerializer) {
>       if (!(newSerializer instanceof SortKeySerializer)) {
322,327c322,323
<       SortKeySerializerSnapshot oldSnapshot = (SortKeySerializerSnapshot) oldSerializerSnapshot;
<       if (!sortOrder.sameOrder(oldSnapshot.sortOrder)) {
<         return TypeSerializerSchemaCompatibility.incompatible();
<       }
<
<       return resolveSchemaCompatibility(oldSnapshot.schema, schema);
---
>       SortKeySerializer newAvroSerializer = (SortKeySerializer) newSerializer;
>       return resolveSchemaCompatibility(newAvroSerializer.schema, schema);
diff --color flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/StatisticsOrRecordSerializer.java flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/StatisticsOrRecordSerializer.java
177c177,179
<     public StatisticsOrRecordSerializerSnapshot() {}
---
>     public StatisticsOrRecordSerializerSnapshot() {
>       super(StatisticsOrRecordSerializer.class);
>     }

➜  iceberg git:(backport-10457) ✗ diff flink/v1.19/flink/src/test/java/org/apache/iceberg/flink/sink/shuffle/ flink/v1.18/flink/src/test/java/org/apache/iceberg/flink/sink/shuffle/
42d41
< import org.apache.flink.runtime.state.OperatorStateBackendParametersImpl;
336,337c335
<             new OperatorStateBackendParametersImpl(
<                 env, "test-operator", Collections.emptyList(), cancelStreamRegistry));
---
>             env, "test-operator", Collections.emptyList(), cancelStreamRegistry);

@stevenzwu stevenzwu changed the title Flink: backport PR #10457 for handling rescale and statistics refactoring in smart shuffling Flink: backport PR#10331 and PR #10457 Jul 23, 2024
@stevenzwu stevenzwu changed the title Flink: backport PR#10331 and PR #10457 Flink: backport PR #10331 and PR #10457 Jul 23, 2024
@github-actions github-actions bot added the build label Jul 23, 2024
@pvary
Contributor

pvary commented Jul 24, 2024

@stevenzwu: Could you please help me understand why the changes in flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java are needed?

Thanks, Peter

@stevenzwu
Contributor Author

stevenzwu commented Jul 25, 2024

@stevenzwu: Could you please help me understand why the changes in flink/v1.19/flink/src/main/java/org/apache/iceberg/flink/sink/shuffle/SortKeySerializer.java are needed?

I assume you were asking about the diff between 1.19 and 1.18. The API of the TypeSerializerSnapshot interface changed in 1.19.

1.18
    @Override
    public TypeSerializerSchemaCompatibility<SortKey> resolveSchemaCompatibility(
        TypeSerializer<SortKey> newSerializer) {

1.19
    @Override
    public TypeSerializerSchemaCompatibility<SortKey> resolveSchemaCompatibility(
        TypeSerializerSnapshot<SortKey> oldSerializerSnapshot) {
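
To illustrate the direction of the call in the two signatures above, here is a minimal, self-contained sketch. `KeySerializer`, `KeySnapshot`, and both method names are hypothetical stand-ins invented for this example, not the Flink interfaces, and the sort order is reduced to a plain string. It assumes the 1.18-style method runs on the restored snapshot and receives the new serializer, while the 1.19-style method runs on the new serializer's snapshot and receives the old snapshot.

```java
// Hypothetical stand-ins for illustration; these are NOT the Flink interfaces.
public class ResolveDirectionSketch {
  enum Compatibility { COMPATIBLE_AS_IS, INCOMPATIBLE }

  /** Stand-in serializer that only carries a description of its sort order. */
  static final class KeySerializer {
    final String sortOrder;

    KeySerializer(String sortOrder) {
      this.sortOrder = sortOrder;
    }

    KeySnapshot snapshotConfiguration() {
      return new KeySnapshot(sortOrder);
    }
  }

  /** Stand-in snapshot: the configuration a serializer was checkpointed with. */
  static final class KeySnapshot {
    final String sortOrder;

    KeySnapshot(String sortOrder) {
      this.sortOrder = sortOrder;
    }

    // 1.18-style shape: called on the restored (old) snapshot, given the new serializer.
    Compatibility resolveWithNewSerializer(KeySerializer newSerializer) {
      return check(this.sortOrder, newSerializer.sortOrder);
    }

    // 1.19-style shape: called on the new serializer's snapshot, given the old snapshot.
    Compatibility resolveWithOldSnapshot(KeySnapshot oldSnapshot) {
      return check(oldSnapshot.sortOrder, this.sortOrder);
    }

    private static Compatibility check(String oldOrder, String newOrder) {
      return oldOrder.equals(newOrder)
          ? Compatibility.COMPATIBLE_AS_IS
          : Compatibility.INCOMPATIBLE;
    }
  }

  public static void main(String[] args) {
    KeySnapshot restored = new KeySnapshot("id ASC");
    KeySerializer current = new KeySerializer("id DESC");

    // Both shapes compare the same two configurations, only the receiver differs.
    System.out.println(restored.resolveWithNewSerializer(current));
    System.out.println(current.snapshotConfiguration().resolveWithOldSnapshot(restored));
  }
}
```

In this sketch both shapes reach the same answer for the same pair of configurations; only the object that owns the check changes, which is why the 1.18 and 1.19 copies of the method body cannot be textually identical.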

But I did find a bug in the 1.18 implementation: it didn't compare the SortOrder for compatibility. This is fixed in the latest commit that I just pushed.

We would also need to add a unit test for the compatibility check. That could probably be done in a separate PR.
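
As a rough idea of what such a test could cover, here is a self-contained sketch. `State`, `Result`, and `resolve` are hypothetical stand-ins rather than the Iceberg or Flink classes, the sort order and schema are reduced to lists of strings, and the schema rule (only identical column lists are compatible) is an assumption made for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration; NOT the Flink or Iceberg classes.
public class SortKeyCompatibilitySketch {
  enum Result { COMPATIBLE_AS_IS, INCOMPATIBLE }

  /** Stand-in for the configuration a sort key serializer snapshot would carry. */
  static final class State {
    final List<String> schemaColumns;
    final List<String> sortOrder;

    State(List<String> schemaColumns, List<String> sortOrder) {
      this.schemaColumns = schemaColumns;
      this.sortOrder = sortOrder;
    }
  }

  static Result resolve(State oldState, State newState) {
    // The kind of check that was missing in the 1.18 copy: a different sort order is incompatible.
    if (!oldState.sortOrder.equals(newState.sortOrder)) {
      return Result.INCOMPATIBLE;
    }

    // Assumed schema rule for this sketch: only identical column lists are compatible.
    return oldState.schemaColumns.equals(newState.schemaColumns)
        ? Result.COMPATIBLE_AS_IS
        : Result.INCOMPATIBLE;
  }

  public static void main(String[] args) {
    List<String> columns = Arrays.asList("id", "ts");
    State original = new State(columns, Arrays.asList("id ASC"));
    State sameConfig = new State(columns, Arrays.asList("id ASC"));
    State otherOrder = new State(columns, Arrays.asList("ts DESC"));
    State otherSchema = new State(Arrays.asList("id"), Arrays.asList("id ASC"));

    // Cases a compatibility test would likely want: unchanged, changed order, changed schema.
    System.out.println(resolve(original, sameConfig));
    System.out.println(resolve(original, otherOrder));
    System.out.println(resolve(original, otherSchema));
  }
}
```

A real test would build actual serializers and snapshots for each case; the three cases here (unchanged configuration, changed sort order, changed schema) are only a suggestion for the coverage.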

@stevenzwu stevenzwu merged commit 4dbc7f5 into apache:main Jul 26, 2024
@stevenzwu
Contributor Author

Thanks @pvary for the review.

zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
czy006 pushed a commit to czy006/iceberg that referenced this pull request Apr 2, 2025
czy006 pushed a commit to czy006/iceberg that referenced this pull request Apr 2, 2025