
Conversation

@ConeyLiu (Contributor) commented May 5, 2022

This patch adds support for parallel delete files when commit aborting.

@github-actions github-actions bot added the spark label May 5, 2022
// Isolation Level for DataFrame calls. Currently supported by overwritePartitions
public static final String ISOLATION_LEVEL = "isolation-level";

public static final String DELETE_FILES_PARALLEL_WHEN_ABORT = "delete-files-parallel-when-abort";
Contributor:

[minor] missing comment for this property

private void abort(WriterCommitMessage[] messages) {
Map<String, String> props = table.properties();
Tasks.foreach(files(messages))
.executeWith(writeConf.deleteFilesParallelWhenAbort() ? ThreadPools.getWorkerPool() : null)
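The conditional `executeWith` call above means the deletes run serially when the option is off (a null executor) and fan out to the shared worker pool when it is on. A minimal, self-contained sketch of that pattern, assuming a hypothetical `ForeachSketch` helper (illustrative names, not the Iceberg `Tasks` API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Consumer;

public class ForeachSketch {
  // Runs task over items: serially on the caller thread when pool is null
  // (the pre-patch behavior), otherwise submitted to the supplied pool,
  // waiting for every task to finish before returning.
  public static <T> void forEach(List<T> items, ExecutorService pool, Consumer<T> task)
      throws Exception {
    if (pool == null) {
      items.forEach(task);
      return;
    }
    List<Future<?>> futures = new ArrayList<>();
    for (T item : items) {
      futures.add(pool.submit(() -> task.accept(item)));
    }
    for (Future<?> future : futures) {
      future.get(); // propagate any task failure to the caller
    }
  }
}
```

The null-executor convention keeps a single code path for both the serial and parallel cases, which is why the ternary in the patch reads cleanly.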
Contributor:

[question] Does it make sense to give this its own dedicated worker pool? Your thoughts?

Contributor:

I'd probably use the worker pool for now. No need to extend it until someone needs it.

public static final String CHECK_ORDERING = "spark.sql.iceberg.check-ordering";
public static final boolean CHECK_ORDERING_DEFAULT = true;

// Control whether to parallel delete files when abort
Contributor:

[nit]

Suggested change
// Control whether to parallel delete files when abort
// Controls whether to parallel delete files when abort

public static final boolean CHECK_ORDERING_DEFAULT = true;

// Control whether to parallel delete files when abort
public static final String DELETE_FILES_PARALLEL_WHEN_ABORT = "spark.sql.iceberg.delete-files-parallel-when-abort";
Contributor:

The name "delete-files-parallel-when-abort" is very long. How about "parallel-abort-enabled"?

Does this even need to be an option? Why would anyone turn it off?

Contributor Author:

@rdblue I removed the option. I had originally added it so the previous behavior could be kept.

@ConeyLiu force-pushed the delete_with_executors branch from b3e062e to 9a79e00 on May 10, 2022 13:42
@ConeyLiu (Contributor Author):

Thanks @singhpk234 @rdblue for the review; the code has been updated. Please take another look.

if (cleanupOnAbort) {
Map<String, String> props = table.properties();
Tasks.foreach(files(messages))
.executeWith(ThreadPools.getWorkerPool())
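In the merged version the deletes always run on `ThreadPools.getWorkerPool()`, Iceberg's shared, lazily created daemon pool. A rough, self-contained sketch of how such a shared pool is typically built (the property name `example.worker.num-threads` and the class name are placeholders, not Iceberg's actual identifiers):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WorkerPoolSketch {
  private static volatile ExecutorService pool;

  // Pool size from a system property, defaulting to the CPU count.
  public static int poolSize() {
    return Integer.getInteger(
        "example.worker.num-threads", Runtime.getRuntime().availableProcessors());
  }

  // Lazily create one shared pool; callers never shut it down.
  public static synchronized ExecutorService getWorkerPool() {
    if (pool == null) {
      pool = Executors.newFixedThreadPool(poolSize(), runnable -> {
        Thread thread = new Thread(runnable, "worker-pool");
        thread.setDaemon(true); // daemon threads never block JVM exit
        return thread;
      });
    }
    return pool;
  }
}
```

Because the pool is process-wide and shared by every caller, sizing it for one workload can starve another, which is the configuration difficulty the reviewers raise below.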
Contributor:

@RussellSpitzer, what do you think about this change?

Member:

Seems fine to me, although I really wish we had a more discrete pool for things like this. I find that our worker pool is hard to configure for end users.

Contributor:

Should we add a Spark-managed pool? We have the ability to plug in the pool used in most places now, thanks to @yittg.

@rdblue rdblue merged commit 9b75114 into apache:master May 10, 2022
@rdblue (Contributor) commented May 10, 2022

Thanks, @ConeyLiu!

@ConeyLiu (Contributor Author):

Thanks all for the review.

@ConeyLiu ConeyLiu deleted the delete_with_executors branch May 11, 2022 01:34