Conversation

@karuppayya
Contributor

No description provided.

@github-actions github-actions bot added the API label Feb 22, 2023
@karuppayya
Contributor Author

```diff
 String MAX_CONCURRENT_FILE_GROUP_REWRITES = "max-concurrent-file-group-rewrites";

-int MAX_CONCURRENT_FILE_GROUP_REWRITES_DEFAULT = 1;
+int MAX_CONCURRENT_FILE_GROUP_REWRITES_DEFAULT = 5;
```
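For context, the new default only changes what callers get when they don't set the option themselves; it can still be overridden per invocation. A hedged example using Iceberg's Spark `rewrite_data_files` procedure (the catalog and table names are placeholders):

```sql
-- Pin the fan-out back to 1 for a single run; 'catalog' and 'db.tbl'
-- are placeholder names, not from this PR.
CALL catalog.system.rewrite_data_files(
  table => 'db.tbl',
  options => map('max-concurrent-file-group-rewrites', '1')
);
```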
Contributor

I support the idea of changing this default as we ask our users to change it in 9/10 cases.
@RussellSpitzer, any thoughts on this as you wrote this part?

Member

I like 5, but what I'd really like us to do is check the size of our "file groups" and then use that to determine what to set max concurrent groups to. For example, if we see all file groups are less than, say, the target file size, we set max concurrent = num cores.
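The heuristic suggested above can be sketched as a pure function. This is my own sketch under that suggestion's assumptions; the names and the exact rule are hypothetical, not Iceberg code:

```java
import java.util.List;

public class ConcurrencyHeuristic {
    // Hypothetical helper (not part of Iceberg): pick a value for
    // max-concurrent-file-group-rewrites from the planned file group sizes.
    static int maxConcurrentGroups(List<Long> groupSizesBytes,
                                   long targetFileSizeBytes,
                                   int numCores,
                                   int configuredValue) {
        boolean allSmall = groupSizesBytes.stream()
            .allMatch(size -> size < targetFileSizeBytes);
        // Small groups finish quickly, so saturating the cores is safe;
        // otherwise defer to whatever the user (or the default) configured.
        return allSmall ? numCores : configuredValue;
    }
}
```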

Contributor

We would probably need to look at the scheduler implementation (e.g. FIFO) and the total resources available in the cluster versus the amount of data and the number of partitions we need to compact. I think it's worth exploring, but if it becomes too tricky, I'd just default it to something more reasonable.

Member

> I support the idea of changing this default as we ask our users to change it in 9/10 cases.

True. Once or twice a week I find myself suggesting that users configure this in the Iceberg Slack. Most of them don't read the Javadoc :(
I would love to see this default value changed.

+1 for this change.

Member

I'm sold on this for right now; we can do something more complicated later.

@ajantha-bhat
Member

ajantha-bhat commented Mar 10, 2023

It seems testPartialProgressEnabled is now flaky; it failed only for Spark 3.1 / Java 8.
Because of the 5 concurrent commits, one of the file groups failed to commit? I think it is better to set the concurrency to 1 for this particular test case.

```
org.apache.iceberg.spark.actions.TestRewriteDataFilesAction > testPartialProgressEnabled FAILED
    java.lang.AssertionError: Should have 10 fileGroups expected:<9> but was:<10>
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:647)
        at 
```

@karuppayya karuppayya closed this Mar 13, 2023
@karuppayya karuppayya reopened this Mar 13, 2023
@karuppayya
Contributor Author

The test was failing because one (or more) of the commits was failing:

```
[Rewrite-Service-1] ERROR org.apache.iceberg.actions.RewriteDataFilesCommitManager - Cannot commit groups [RewriteFileGroup{info=BaseRewriteDataFilesFileGroupInfo{globalIndex=2, partitionIndex=2, 
```

This was happening because the number of concurrent writes was greater than the number of retry attempts.
I will fix the test by increasing the retry attempts.
I will also file an issue to see if this can be computed automatically and passed on to the action.
@RussellSpitzer @aokolnychyi @ajantha-bhat
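The race described above can be illustrated with a toy model of optimistic commits. This is a sketch under my own assumptions, not the actual RewriteDataFilesCommitManager logic: in the worst case the concurrent committers serialize, and the last winner has lost a race to every other committer first, so the retry budget must be at least the concurrency.

```java
public class CommitRetryModel {
    // Toy worst case: each round exactly one committer succeeds and the rest
    // fail on the now-stale table version and retry. Returns the number of
    // attempts the unluckiest committer needs.
    static int worstCaseAttempts(int concurrentCommitters) {
        int maxAttempts = 0;
        for (int committer = 0; committer < concurrentCommitters; committer++) {
            int attempts = committer + 1; // lost 'committer' races before winning
            maxAttempts = Math.max(maxAttempts, attempts);
        }
        return maxAttempts;
    }
}
```

Under this model, with the default raised to 5 concurrent rewrites, a test that allows fewer than 5 commit attempts can fail exactly as seen above.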

@github-actions github-actions bot added the spark label Mar 16, 2023
@RussellSpitzer RussellSpitzer merged commit 08d5c50 into apache:master Mar 22, 2023
@RussellSpitzer
Member

Thanks for this update @karuppayya !

```yaml
- code: "java.field.constantValueChanged"
  old: "field org.apache.iceberg.actions.RewriteDataFiles.MAX_CONCURRENT_FILE_GROUP_REWRITES_DEFAULT"
  new: "field org.apache.iceberg.actions.RewriteDataFiles.MAX_CONCURRENT_FILE_GROUP_REWRITES_DEFAULT"
  justification: "In most casse,users need more parallelism and is a default recommendation\
```
Contributor

typo: casse -> cases

```yaml
old: "method void org.apache.iceberg.io.DataWriter<T>::add(T)"
justification: "Removing deprecated method"
"1.1.0":
org.apache.iceberg:iceberg-api:
```
Contributor

I think this should land under 1.2.0 and not 1.1.0, because the change is being introduced after 1.2.0 was released (most likely the branch was created before the 1.2.0 release). @karuppayya could you please follow up and move this under 1.2.0?

Member

Ah yeah, I forgot we just missed the release.

Contributor

I've opened #7223 to address that. @RussellSpitzer could you review please?
