-
Notifications
You must be signed in to change notification settings - Fork 3k
Spark: CALL procedure for rewrite_data_files #3375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for working on this! Almost forgot we haven't added the call procedure for this yet... I remember Russell had laid out the arguments for the procedure in https://docs.google.com/document/d/1aXo1VzuXxSuqcTzMLSQdnivMVgtLExgDWUFMvWeXRxc, could you follow that proposal? |
|
@jackye1995 : Thanks for sharing this doc. I will look into it and implement as per that. |
|
@jackye1995 , @RussellSpitzer : Also I should copy the same code in spark/v3.2 folder as well ? p.s: Also let me know if any other arguments is missed. |
For "where" can you check and see if you can produce expressions using the Spark session's parser? I think you may be able to convert to expressions using the parser and then translate them using our Spark3 utils ... I'm not sure though. |
6dea11f to
36ed20a
Compare
|
@jackye1995 , @RussellSpitzer , @rdblue : Actually for supporting filter, I have used the spark parser to parse the string as expression. But I got spark catalyst expression. But rewrite data files action needs Iceberg expression instead of spark catalyst expression. |
|
There is a conversion from Spark to Iceberg expressions in |
@rdblue : I think it is spark filter to Iceberg expression conversion. I needed spark expression to Iceberg expression conversion. |
|
@RussellSpitzer , @jackye1995 , @rdblue : I will back sync to v3.0 folder once this PR is reviewed and merged. Thanks. |
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/SparkExpressions.java
Outdated
Show resolved
Hide resolved
...ensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java
Outdated
Show resolved
Hide resolved
...ensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java
Outdated
Show resolved
Hide resolved
2194b42 to
170fe7b
Compare
.../v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/RewriteDataFilesProcedure.java
Outdated
Show resolved
Hide resolved
.../v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/RewriteDataFilesProcedure.java
Outdated
Show resolved
Hide resolved
|
@RussellSpitzer : So, anything else need to be done for this PR or it can be merged ? |
...ensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java
Outdated
Show resolved
Hide resolved
...ensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java
Outdated
Show resolved
Hide resolved
...ensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java
Outdated
Show resolved
Hide resolved
jackye1995
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks good to me, just some nit
|
@jackye1995 : Thanks for the review. I have pushed the nit changes also. |
...ensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java
Outdated
Show resolved
Hide resolved
.../v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/RewriteDataFilesProcedure.java
Outdated
Show resolved
Hide resolved
.../v3.2/spark/src/main/java/org/apache/iceberg/spark/procedures/RewriteDataFilesProcedure.java
Outdated
Show resolved
Hide resolved
RussellSpitzer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just had some minor comments which I think we need to wrap up, other than that I think this is looking pretty good. I know it's difficult for us to work this kind of functionality into Iceberg when Spark keeps a lot of these methods from us.
|
@RussellSpitzer : Thanks for the review again. I have addressed the comments (also added a comment reply) and pushed the new changes. |
|
@RussellSpitzer : I have updated the test case with dataframe write instead of sql now. So, I think all comments are handled now. Thanks. |
RussellSpitzer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much, this is a great addition to the codebase
|
@RussellSpitzer : Thanks for merging the PR. Appreciate your guidance and patient reply to my questions. It helped in finally avoiding the custom codes 👍🏻 |
Backport of #3375 - Support CALL procedure for rewrite_data_files
Backport of #3375 - Support CALL procedure for rewrite_data_files
Supported the call procedure for
rewrite_data_files.Supported input arguments:
table, [strategy], [sort_order], [options], [where]Supported output arguments:
rewritten_data_files_count,added_data_files_countAdded test cases to cover the scenarios.
Fixes #3355