Conversation

@karuppayya
Contributor

No description provided.

@github-actions github-actions bot added the API label Feb 22, 2023
@karuppayya
Contributor Author

```diff
 String MAX_CONCURRENT_FILE_GROUP_REWRITES = "max-concurrent-file-group-rewrites";

-int MAX_CONCURRENT_FILE_GROUP_REWRITES_DEFAULT = 1;
+int MAX_CONCURRENT_FILE_GROUP_REWRITES_DEFAULT = 5;
```
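For context, the new default only changes what callers get when they don't set the option themselves; it can still be overridden per invocation. A hedged example using Iceberg's Spark `rewrite_data_files` procedure (the catalog and table names are placeholders):

```sql
-- Pin the fan-out back to 1 for a single run; 'catalog' and 'db.tbl'
-- are placeholder names, not from this PR.
CALL catalog.system.rewrite_data_files(
  table => 'db.tbl',
  options => map('max-concurrent-file-group-rewrites', '1')
);
```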
Contributor

I support the idea of changing this default as we ask our users to change it in 9/10 cases.
@RussellSpitzer, any thoughts on this as you wrote this part?

Member

I like 5, but what I'd really like us to do is check the size of our "file groups" and then use that to determine what to set max concurrent groups to. For example, if we see all file groups are less than, say, the target file size, we set max concurrent = num cores.
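The heuristic suggested above can be sketched as a pure function. This is my own sketch under that suggestion's assumptions; the names and the exact rule are hypothetical, not Iceberg code:

```java
import java.util.List;

public class ConcurrencyHeuristic {
    // Hypothetical helper (not part of Iceberg): pick a value for
    // max-concurrent-file-group-rewrites from the planned file group sizes.
    static int maxConcurrentGroups(List<Long> groupSizesBytes,
                                   long targetFileSizeBytes,
                                   int numCores,
                                   int configuredValue) {
        boolean allSmall = groupSizesBytes.stream()
            .allMatch(size -> size < targetFileSizeBytes);
        // Small groups finish quickly, so saturating the cores is safe;
        // otherwise defer to whatever the user (or the default) configured.
        return allSmall ? numCores : configuredValue;
    }
}
```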

Contributor

We would probably need to look at the scheduler implementation (e.g. FIFO) and the total resources available in the cluster versus the amount of data and the number of partitions we need to compact. I think it's worth exploring, but if it becomes too tricky, I'd just default it to something more reasonable.

Member

> I support the idea of changing this default as we ask our users to change it in 9/10 cases.

True. Once or twice a week I find myself suggesting that users configure this in the Iceberg Slack. Most of them don't read the Javadoc :(
I would love to see this default value changed.

+1 for this change.

Member

I'm sold on this for right now; we can do something more complicated later.

@ajantha-bhat
Member

ajantha-bhat commented Mar 10, 2023

It seems testPartialProgressEnabled is now flaky; it failed only for Spark 3.1 / Java 8.
Because of the 5 concurrent commits, one of the file groups failed to commit? I think it is better to set the concurrency to 1 for this particular test case.

```
org.apache.iceberg.spark.actions.TestRewriteDataFilesAction > testPartialProgressEnabled FAILED
    java.lang.AssertionError: Should have 10 fileGroups expected:<9> but was:<10>
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:647)
        at 
```

@karuppayya karuppayya closed this Mar 13, 2023
@karuppayya karuppayya reopened this Mar 13, 2023
@karuppayya
Contributor Author

The test was failing because one (or more) of the commits was failing:

```
[Rewrite-Service-1] ERROR org.apache.iceberg.actions.RewriteDataFilesCommitManager - Cannot commit groups [RewriteFileGroup{info=BaseRewriteDataFilesFileGroupInfo{globalIndex=2, partitionIndex=2, 
```

This was happening because the number of concurrent writes was greater than the number of retry attempts.
I will fix the test by increasing the retry attempts.
I will also file an issue to see if this can be computed automatically and passed on to the action.
@RussellSpitzer @aokolnychyi @ajantha-bhat
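The race described above can be illustrated with a toy model of optimistic commits. This is a sketch under my own assumptions, not the actual RewriteDataFilesCommitManager logic: in the worst case the concurrent committers serialize, and the last winner has lost a race to every other committer first, so the retry budget must be at least the concurrency.

```java
public class CommitRetryModel {
    // Toy worst case: each round exactly one committer succeeds and the rest
    // fail on the now-stale table version and retry. Returns the number of
    // attempts the unluckiest committer needs.
    static int worstCaseAttempts(int concurrentCommitters) {
        int maxAttempts = 0;
        for (int committer = 0; committer < concurrentCommitters; committer++) {
            int attempts = committer + 1; // lost 'committer' races before winning
            maxAttempts = Math.max(maxAttempts, attempts);
        }
        return maxAttempts;
    }
}
```

Under this model, with the default raised to 5 concurrent rewrites, a test that allows fewer than 5 commit attempts can fail exactly as seen above.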

@github-actions github-actions bot added the spark label Mar 16, 2023
@RussellSpitzer RussellSpitzer merged commit 08d5c50 into apache:master Mar 22, 2023
@RussellSpitzer
Member

Thanks for this update @karuppayya !

```yaml
- code: "java.field.constantValueChanged"
  old: "field org.apache.iceberg.actions.RewriteDataFiles.MAX_CONCURRENT_FILE_GROUP_REWRITES_DEFAULT"
  new: "field org.apache.iceberg.actions.RewriteDataFiles.MAX_CONCURRENT_FILE_GROUP_REWRITES_DEFAULT"
  justification: "In most casse,users need more parallelism and is a default recommendation\
```
Contributor

typo: casse -> cases

```yaml
old: "method void org.apache.iceberg.io.DataWriter<T>::add(T)"
justification: "Removing deprecated method"
"1.1.0":
org.apache.iceberg:iceberg-api:
```
Contributor

I think this should land under 1.2.0 and not 1.1.0, because the change is being introduced after 1.2.0 was released (most likely the branch was created before the 1.2.0 release). @karuppayya could you please follow up and move this under 1.2.0?

Member

Ah yeah, I forgot we just missed the release.

Contributor

I've opened #7223 to address that. @RussellSpitzer could you review please?
