@chenjunjiedada
Collaborator

chenjunjiedada commented Jul 27, 2023

In some cases of skewed data, where a few partitions contain most of the records, we want to skip the sort and use fanout writers to speed up MERGE INTO. Later read performance shouldn't degrade much if we apply a rebalance first.

@chenjunjiedada
Collaborator Author

chenjunjiedada commented Jul 27, 2023

This closes #5853. @aokolnychyi Could you please take a look?

@aokolnychyi
Contributor

aokolnychyi commented Jul 28, 2023

As of recently, we automatically skip the local sort if fanout writers are enabled and the table is unsorted. This applies to regular jobs as well as row-level operations (both CoW and MoR).

Let me take a closer look later today.

@aokolnychyi
Contributor

I have doubts about adding this config at the SQL level. It won't really help the use case you mentioned above, because it would disable both distribution and ordering. In regular writes you can add a manual repartition step, but not in row-level operations. Not doing a repartition/rebalance step is probably not a great idea.

I see multiple options:

  1. Leave as is where no local sort is triggered if fanout writers are enabled and the table is unsorted.
  2. Never request a local sort if fanout writers are enabled (even when the table is sorted).
  3. Add a SQL property like spark.sql.iceberg.use-table-ordering-with-fanout-writers to control this behavior.

I am probably inclined to go with option 1 or 2. Any thoughts, @chenjunjiedada @RussellSpitzer @szehon-ho?
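For readers comparing the three options, here is a minimal Java sketch of how each one would decide whether to request a local sort. Everything in it is hypothetical: the class, the enum labels, and the boolean `useTableOrdering` (standing in for the proposed session property) are illustrative names, not Iceberg APIs, and the reading of option 3 as "sort only if the table is sorted and the property is on" is an assumption.

```java
public class LocalSortOptions {

  // Hypothetical labels for the three options in this thread; not Iceberg identifiers.
  enum Option { SKIP_WHEN_UNSORTED, ALWAYS_SKIP, SESSION_PROPERTY }

  // Returns true if a local sort would be requested before writing.
  // useTableOrdering models the proposed session property (assumed semantics)
  // and is only consulted for SESSION_PROPERTY.
  static boolean requestsLocalSort(
      Option option,
      boolean fanoutEnabled,
      boolean tableSorted,
      boolean tablePartitioned,
      boolean useTableOrdering) {
    if (!fanoutEnabled) {
      // Without fanout writers, a sort honors the table sort order or
      // co-locates the records of one partition within a task.
      return tableSorted || tablePartitioned;
    }
    switch (option) {
      case SKIP_WHEN_UNSORTED:
        return tableSorted;
      case ALWAYS_SKIP:
        return false;
      case SESSION_PROPERTY:
        return tableSorted && useTableOrdering;
      default:
        throw new IllegalArgumentException("Unknown option: " + option);
    }
  }

  public static void main(String[] args) {
    // Sorted, partitioned table with fanout writers enabled and the property on.
    for (Option option : Option.values()) {
      System.out.printf("%s -> %b%n", option, requestsLocalSort(option, true, true, true, true));
    }
  }
}
```

The options only differ for a sorted table with fanout writers enabled; for an unsorted table with fanout, all three skip the sort.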

@aokolnychyi
Contributor

@chenjunjiedada, did the table have a proper sort order in the use case that hit this?

@chenjunjiedada
Collaborator Author

chenjunjiedada commented Jul 29, 2023

@aokolnychyi The table doesn't have a sort order. I agree that skipping repartition/rebalance is not a good idea, since it leads to the small files problem. It just doesn't hurt that much if the data contains only a few partitions.

  1. Leave as is where no local sort is triggered if fanout writers are enabled and the table is unsorted.
  2. Never request a local sort if fanout writers are enabled (even when the table is sorted).
  3. Add a SQL property like spark.sql.iceberg.use-table-ordering-with-fanout-writers to control this behavior.

I prefer option 1. Just tried to update SortOrderUtil.
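To make the small-files point in this comment concrete, here is a rough Java sketch of the worst case when fanout writers run without a prior repartition/rebalance. The class, method name, and the task and partition counts are made up for illustration, and file rolling by target size is ignored.

```java
public class FanoutFileCount {

  // Upper bound on data files from one write with no shuffle before it:
  // every task may receive rows for every partition, and a fanout writer
  // keeps one open file per partition it sees.
  static long maxFilesWithoutShuffle(int writeTasks, int partitionsTouched) {
    return (long) writeTasks * partitionsTouched;
  }

  public static void main(String[] args) {
    // Skewed data confined to a few partitions: the bound stays modest.
    System.out.println(maxFilesWithoutShuffle(200, 3));    // prints 600
    // The same job touching many partitions: the bound explodes.
    System.out.println(maxFilesWithoutShuffle(200, 1000)); // prints 200000
  }
}
```

With only a few partitions the bound stays close to the task count, which matches the observation that the missing shuffle hurts less in that case.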

@github-actions github-actions bot added the core label Jul 29, 2023
@chenjunjiedada chenjunjiedada force-pushed the add-session-conf branch 2 times, most recently from 720d42b to f60ce13 Compare July 29, 2023 15:56
@chenjunjiedada
Collaborator Author

Hmm, it seems like we also need to take different distribution modes into account. Range distribution should apply local sort anyway, right?

@chenjunjiedada
Collaborator Author

chenjunjiedada commented Jul 30, 2023

I found that #7637 already implements the option 1 logic for Spark 3.4, and the unit test testRangeCopyOnWriteMergePartitionedUnsortedTableFanout verifies it. The issue I mentioned occurs in our Spark 3.3 production environment, so backporting #7637 should fix it. @aokolnychyi, do we have a plan to backport this? AQE also exists in Spark 3.3; are there any other dependencies on Spark 3.4?

  // a local ordering within a task is beneficial in two cases:
  // - there is a defined table sort order, so it is clear how the data should be ordered
  // - the table is partitioned and fanout writers are disabled,
  //   so records for one partition must be co-located within a task
  private static SortOrder[] writeOrdering(Table table, boolean fanoutEnabled) {
    if (fanoutEnabled && table.sortOrder().isUnsorted()) {
      return EMPTY_ORDERING;
    } else {
      return ordering(table);
    }
  }

@aokolnychyi
Contributor

@chenjunjiedada, AQE is not supported for V2 writes in OSS Spark 3.3. It works for queries but not for writes. I was not planning to cherry-pick this to 3.3 but we can do it after #8042. I can help review if you want to do the cherry-pick.

@chenjunjiedada
Collaborator Author

@aokolnychyi, do we need to backport #7646 as well, or can #7637 work independently?

@javrasya
Contributor

Any update on this? Since Spark 3.3 is the latest version supported by Glue, we are stuck with it. Not being able to set this from SQL seems unreasonable, since it will sort before writing for no reason. Or am I missing something? 🤔

@github-actions

github-actions bot commented Sep 9, 2024

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Sep 9, 2024
@github-actions

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
