Conversation

@rodmeneses rodmeneses commented Jan 23, 2025

This PR ports the RANGE distribution mode on the FlinkSink to the new IcebergSink based on the Flink V2 sink interface.

cc: @stevenzwu @mxm @pvary @Guosmilesmile

@github-actions github-actions bot added the flink label Jan 23, 2025
mxm commented Jan 24, 2025

Hey @rodmeneses! Thanks for porting this feature over. I'll have a look shortly.


@mxm mxm left a comment


LGTM

@mxm
Copy link
Contributor

mxm commented Feb 7, 2025

@pvary Can you take a look as well?

@github-actions

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Mar 10, 2025
@rodmeneses
Contributor Author

keeping it alive

@rodmeneses rodmeneses marked this pull request as draft March 13, 2025 23:31
@github-actions github-actions bot removed the stale label Mar 14, 2025
@github-actions github-actions bot added the stale label Apr 13, 2025
@github-actions

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Apr 21, 2025
@Guosmilesmile
Contributor

Hi team, how is the progress on this PR? I really need this feature in SinkV2.

@stevenzwu stevenzwu reopened this May 15, 2025
@stevenzwu
Contributor

reopened the PR. @rodmeneses please update when it is ready for review. right now, it is marked as draft.

@github-actions github-actions bot removed the stale label May 16, 2025
mxm commented May 16, 2025

+1, it would be nice to follow up on this. If @rodmeneses is busy, maybe @Guosmilesmile could also take this one?

@rodmeneses
Contributor Author

Hi @stevenzwu, thanks for reopening this. I will try to finish it this coming week. I think it only needs to port some fixes recently made to the FlinkSink RANGE distribution mode, as well as to address some review comments.

@rodmeneses rodmeneses force-pushed the rangeDistributionIcebergSink branch from c44ba98 to 21c628a Compare May 20, 2025 16:52
}
}

private DataStream<RowData> distributeDataStreamByNoneDistributionMode(
Contributor Author


Breaking distributeDataStream into smaller functions, one for each distributionMode, due to:

 Cyclomatic Complexity is 13 (max allowed is 12). [CyclomaticComplexity]

This way, it is also clear what each distributionMode needs as function parameters, i.e.:

  1. distributeDataStreamByNoneDistributionMode -> (DataStream<RowData> input, Schema schema)
  2. distributeDataStreamByHashDistributionMode -> (DataStream<RowData> input, Schema schema, PartitionSpec spec)
  3. distributeDataStreamByRangeDistributionMode -> (DataStream<RowData> input, Schema schema, PartitionSpec spec, SortOrder sortOrderParam)

This also gives clear information about what each distributionMode needs for its internal calculation.
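The split described above can be sketched with a simplified, hypothetical dispatcher (plain Java, not the actual IcebergSink code; names and return types are illustrative only):

```java
// Hypothetical sketch: one helper per distribution mode keeps each method's
// parameter list minimal and the dispatching switch trivially simple, which
// keeps cyclomatic complexity low. Not the real IcebergSink code.
enum DistributionMode { NONE, HASH, RANGE }

class DistributionDispatcher {
  // Dispatch to a dedicated helper; each helper declares only the inputs it needs.
  static String distribute(DistributionMode mode) {
    switch (mode) {
      case NONE:
        return byNone();
      case HASH:
        return byHash();
      case RANGE:
        return byRange();
      default:
        throw new IllegalArgumentException("Unknown mode: " + mode);
    }
  }

  // Placeholders standing in for the real helpers and their parameter lists.
  static String byNone() { return "(input, schema)"; }

  static String byHash() { return "(input, schema, spec)"; }

  static String byRange() { return "(input, schema, spec, sortOrder)"; }
}
```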


@stevenzwu stevenzwu left a comment


@rodmeneses please mark the PR ready for review when ready. right now, it is still a draft.

One uber comment: it has been a long time since the initial creation of this PR. there were some changes in FlinkSink. please do another pass and make sure the drift has been ported here.

also, there are some old comments from Peter not addressed. but it looks like those problems might also exist in the v1 sink tests too.

+ "and table is unpartitioned");
return input;
} else {
if (BucketPartitionerUtil.hasOneBucketField(spec)) {
Contributor


can you check with FlinkSink code again? this code has been removed/reverted there.


return shuffleStream
.partitionCustom(new RangePartitioner(schema, sortOrder), r -> r)
.filter(StatisticsOrRecord::hasRecord)
Contributor


there have been changes since this PR was initially created. please re-sync with FlinkSink

@rodmeneses rodmeneses marked this pull request as ready for review May 20, 2025 22:48
@rodmeneses rodmeneses force-pushed the rangeDistributionIcebergSink branch from 82e7d90 to de8ea5e Compare May 20, 2025 23:50
}

@TestTemplate
void testJobNoneDistributeMode() throws Exception {
Contributor Author


All these methods were incorrectly copied here in a previous commit and are duplicates of the ones in TestFlinkIcebergSinkV2DistributionMode, so I'm removing them.

@rodmeneses rodmeneses force-pushed the rangeDistributionIcebergSink branch from de8ea5e to d2b4b78 Compare May 20, 2025 23:53
@stevenzwu stevenzwu changed the title Range distribution iceberg sink Flink: port range distribution to v2 iceberg sink May 22, 2025
PartitionSpec partitionSpec,
SortOrder sortOrderParam) {

int writerParallelism =

@stevenzwu stevenzwu May 22, 2025


this logic should also be applied to the writer parallelism for the v2 sink.

if write parallelism is not configured, the v1 sink defaults the writer parallelism to the input parallelism to promote chaining. Want to confirm whether that is the case for the v2 sink? from reading the code, I thought the v2 sink will default the writer parallelism to the default job parallelism?

      // Note that IcebergSink internally consists of multiple operators (like writer,
      // committer, aggregator). The following parallelism will be propagated to all of
      // the above operators.
      if (sink.flinkWriteConf.writeParallelism() != null) {
        rowDataDataStreamSink.setParallelism(sink.flinkWriteConf.writeParallelism());
      }

technically, if this is a behavior change problem for the v2 sink, it is not caused by this PR. but it is critical that the same writer parallelism is used by the shuffle operator to properly range partition the data to downstream writer tasks. That is why in the v1 FlinkSink, you can see writerParallelism is computed once and passed to two methods.

      int writerParallelism =
          flinkWriteConf.writeParallelism() == null
              ? rowDataInput.getParallelism()
              : flinkWriteConf.writeParallelism();

      // Distribute the records from input data stream based on the write.distribution-mode and
      // equality fields.
      DataStream<RowData> distributeStream =
          distributeDataStream(rowDataInput, equalityFieldIds, flinkRowType, writerParallelism);

      // Add parallel writers that append rows to files
      SingleOutputStreamOperator<FlinkWriteResult> writerStream =
          appendWriter(distributeStream, flinkRowType, equalityFieldIds, writerParallelism);
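Why the shuffle operator and the writers must agree on parallelism can be illustrated with a toy, self-contained range partitioner (hypothetical; this is not Iceberg's RangePartitioner, and the names are made up for illustration):

```java
// Toy sketch (not Iceberg's RangePartitioner): the target subtask is computed
// against the number of downstream partitions, so the shuffle operator must use
// the same parallelism as the writers, or records land on the wrong subtask.
class ToyRangePartitioner {
  private final int[] upperBounds; // upper bound of each key range, e.g. from statistics

  ToyRangePartitioner(int[] upperBounds) {
    this.upperBounds = upperBounds;
  }

  // Maps a sort key to a writer subtask index in [0, numPartitions).
  int partition(int sortKey, int numPartitions) {
    int idx = 0;
    while (idx < upperBounds.length && sortKey > upperBounds[idx]) {
      idx++;
    }
    // Only meaningful when numPartitions matches the writer parallelism the
    // bounds were computed for; the modulo shows how a mismatch scrambles ranges.
    return idx % numPartitions;
  }
}
```

With bounds {10, 20} and three writers, keys route to subtasks 0, 1, and 2 by range; if the writers actually run at parallelism 2, the last range wraps onto subtask 0 and the range partitioning is broken.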

Contributor Author


Hi @stevenzwu
Thanks for your comment.
We have the same logic at the beginning of the distributeDataStreamByRangeDistributionMode method:

int writerParallelism =
    flinkWriteConf.writeParallelism() == null
        ? input.getParallelism()
        : flinkWriteConf.writeParallelism();

So I think that for range partitioning the behavior should be the same. What do you think?
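The fallback discussed in this thread reduces to a one-line selection (hypothetical helper name; in the real code this expression is inlined in FlinkSink):

```java
// Hypothetical helper mirroring the v1 FlinkSink fallback: prefer an explicitly
// configured write parallelism; otherwise inherit the input stream's
// parallelism, which promotes operator chaining.
class WriterParallelism {
  static int select(Integer configuredWriteParallelism, int inputParallelism) {
    return configuredWriteParallelism == null ? inputParallelism : configuredWriteParallelism;
  }
}
```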


@stevenzwu stevenzwu May 22, 2025


sorry, I wasn't very clear earlier. The v2 sink writer parallelism selection is different from the v1 sink's. It doesn't use the input parallelism if write parallelism is not configured explicitly.

    @Override
    public Builder writeParallelism(int newWriteParallelism) {
      writeOptions.put(
          FlinkWriteOptions.WRITE_PARALLELISM.key(), Integer.toString(newWriteParallelism));
      return this;
    }


@rodmeneses rodmeneses May 28, 2025


Hi @stevenzwu. Thanks for the clarification. I think I know what you mean. In FlinkSink, even without considering RANGE distribution, the parallelism of the v1 sink by default will be the same as the input source parallelism.
This is a good approach, because it encourages chaining.

However, we don't have that logic in the v2 IcebergSink. I think we should have the same logic there. I could do that in another PR and then follow up with this one. What do you think? Thanks

Contributor


sure. you can follow up with a new PR for the writer parallelism fix.

can you resolve the conflict? then we can merge this.

Contributor Author


will do!


@rodmeneses rodmeneses Jun 2, 2025


Hi @stevenzwu, I have rebased and this is ready! Thanks
Tagging @mxm and @pvary as well

@rodmeneses rodmeneses force-pushed the rangeDistributionIcebergSink branch 3 times, most recently from ce63639 to 1dcf734 Compare June 2, 2025 17:15
@rodmeneses rodmeneses force-pushed the rangeDistributionIcebergSink branch from 1dcf734 to 2721fbe Compare June 2, 2025 17:44

@mxm mxm left a comment


LGTM. This should be ported to 2.0 and 1.19 once merged.

@stevenzwu stevenzwu merged commit 931865e into apache:main Jun 3, 2025
18 checks passed
@stevenzwu
Copy link
Contributor

thanks @rodmeneses for the contribution and @mxm @pvary for the review
