Spark 3.5: Fix system function pushdown in CoW row-level commands #9873
| } else { | ||
| filter.copy(condition = newCondition) | ||
| } | ||
| plan.transformWithPruning (_.containsAnyPattern(COMMAND, FILTER, JOIN)) { |
Note that containsAllPatterns became containsAnyPattern. I don't anticipate this being a performance problem, however.
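To make the difference between the two pruning predicates concrete, here is a minimal Java sketch. `PlanPattern` and `PatternPruningSketch` are hypothetical stand-ins for illustration only, not Spark's tree pattern API, and the pattern list passed to the old call is an assumption since it is not shown in this diff.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-ins for tree patterns; not Spark's actual TreePattern type.
enum PlanPattern { COMMAND, FILTER, JOIN }

final class PatternPruningSketch {
  // True only when the subtree contains every requested pattern.
  static boolean containsAllPatterns(Set<PlanPattern> present, PlanPattern... wanted) {
    for (PlanPattern p : wanted) {
      if (!present.contains(p)) {
        return false;
      }
    }
    return true;
  }

  // True when the subtree contains at least one requested pattern.
  static boolean containsAnyPattern(Set<PlanPattern> present, PlanPattern... wanted) {
    for (PlanPattern p : wanted) {
      if (present.contains(p)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // A plan that contains a join but no filter and no command node.
    Set<PlanPattern> joinOnly = EnumSet.of(PlanPattern.JOIN);

    // "All" semantics would skip this plan; "any" semantics visits it.
    System.out.println(containsAllPatterns(
        joinOnly, PlanPattern.COMMAND, PlanPattern.FILTER, PlanPattern.JOIN)); // false
    System.out.println(containsAnyPattern(
        joinOnly, PlanPattern.COMMAND, PlanPattern.FILTER, PlanPattern.JOIN)); // true
  }
}
```

With "any" semantics the rule is attempted on more plans, which is the potential cost mentioned above, but it no longer misses plans that contain only one of the relevant node types.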
| case replace @ ReplaceData(_, cond, _, _, _, _) => | ||
| replaceStaticInvoke(replace, cond, newCond => replace.copy(condition = newCond)) | ||
| case join @ Join(_, _, _, Some(cond), _) => |
We have to cover Join because GroupBasedRowLevelOperationScanPlanning in Spark must be able to simplify the join condition by discarding filters evaluated on the Iceberg side.
| } | ||
| private def replaceStaticInvoke(condition: Expression): Expression = { | ||
| condition.transformWithPruning(_.containsAnyPattern(BINARY_COMPARISON, IN, INSET)) { |
Cover not only BINARY_COMPARISON but also IN and INSET.
Otherwise, IN expressions are not pushed down.
| public static List<Expression> collectExprs( | ||
| SparkPlan sparkPlan, Predicate<Expression> predicate) { | ||
| Seq<List<Expression>> seq = | ||
| SPARK_HELPER.collect( |
Similar to logic in PlanUtils but it accounts for AQE plans by relying on SPARK_HELPER.
This class also existed before PlanUtils, so we might want to converge in the future.
If these are only used in tests, should we add them to test utils?
This is already a test util, just the one that existed before PlanUtils. I would probably merge PlanUtils with this class in the future because this one handles AQE plans.
| private void checkDelete(RowLevelOperationMode mode, String cond) { | ||
| withUnavailableLocations( | ||
| findIrrelevantFileLocations(cond), |
Making irrelevant files unavailable ensures the predicate pushdown works correctly.
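The idea behind this check can be sketched in plain Java. Everything here (`UnavailableLocationsSketch`, `DataFile`, `scan`) is hypothetical and only models the testing technique; it is not the test helper used in this PR. A scan that touches a file marked unavailable fails, so the operation can succeed only if the pushed-down predicate pruned that file during planning.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class UnavailableLocationsSketch {
  // Hypothetical data file: a location plus the single bucket value it holds.
  static final class DataFile {
    final String location;
    final int bucket;

    DataFile(String location, int bucket) {
      this.location = location;
      this.bucket = bucket;
    }
  }

  // Reads the files that survive pruning; fails if one of them is marked unavailable.
  static int scan(List<DataFile> files, Predicate<DataFile> pushedDown, Set<String> unavailable) {
    int filesRead = 0;
    for (DataFile file : files) {
      if (!pushedDown.test(file)) {
        continue; // pruned during planning, never opened
      }
      if (unavailable.contains(file.location)) {
        throw new IllegalStateException("Read of unavailable file: " + file.location);
      }
      filesRead++;
    }
    return filesRead;
  }

  public static void main(String[] args) {
    List<DataFile> files = Arrays.asList(new DataFile("f0", 0), new DataFile("f1", 1));
    Set<String> irrelevant = Collections.singleton("f1");

    // Pushdown works: f1 is pruned and never touched.
    System.out.println(scan(files, f -> f.bucket == 0, irrelevant)); // prints 1

    // Pushdown is lost: every file is planned, so the scan hits f1 and fails.
    try {
      scan(files, f -> true, irrelevant);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The appeal of this style of check is that a silent regression in pushdown turns into a hard failure instead of a slower but still correct result.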
| import org.apache.spark.sql.connector.catalog.functions.ScalarFunction; | ||
| abstract class BaseScalarFunction<R> implements ScalarFunction<R> { |
This must be added for proper equality on the Spark side. Otherwise, post-scan filters cannot be simplified. It is debatable which side should handle this, but we can't fix Spark right now.
case class ApplyFunctionExpression(
function: ScalarFunction[_],
children: Seq[Expression])
extends Expression with UserDefinedExpression with CodegenFallback with ImplicitCastInputTypes {
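Because ApplyFunctionExpression is a case class, its equality compares the wrapped function with `equals`. The following Java sketch shows why a base class with value-based equality helps. It assumes equality is keyed on the canonical name; `FunctionSketch`, `BaseFunctionSketch`, and the two bucket classes are hypothetical stand-ins, not the code added in this PR.

```java
// Hypothetical stand-in for Spark's ScalarFunction; only canonicalName() is modeled.
interface FunctionSketch {
  String canonicalName();
}

// Base class that gives subclasses value-based equality keyed on the canonical name.
abstract class BaseFunctionSketch implements FunctionSketch {
  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (other == null || getClass() != other.getClass()) {
      return false;
    }
    return canonicalName().equals(((FunctionSketch) other).canonicalName());
  }

  @Override
  public int hashCode() {
    return canonicalName().hashCode();
  }
}

// Inherits value-based equality from the base class.
final class BucketWithEquality extends BaseFunctionSketch {
  @Override
  public String canonicalName() {
    return "sketch.bucket(int)";
  }
}

// Falls back to Object's reference equality.
final class BucketWithoutEquality implements FunctionSketch {
  @Override
  public String canonicalName() {
    return "sketch.bucket(int)";
  }
}

final class FunctionEqualitySketch {
  public static void main(String[] args) {
    // Two separately created instances compare equal only with the base class.
    System.out.println(new BucketWithEquality().equals(new BucketWithEquality())); // true
    System.out.println(new BucketWithoutEquality().equals(new BucketWithoutEquality())); // false
  }
}
```

Without such equality, two expressions that wrap separately bound instances of the same function would not match, so a filter already evaluated on the Iceberg side could not be recognized and dropped from the post-scan condition.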
Thanks, @rdblue!
) (#10170) Co-authored-by: Anton Okolnychyi <aokolnychyi@apple.com>
…ache#9873) (apache#10170) Co-authored-by: Anton Okolnychyi <aokolnychyi@apple.com>
This PR fixes the system function predicate pushdown in CoW MERGE operations and adds tests for other use cases. Prior to this PR, we did not cover ReplaceData and Join nodes that are participating in CoW operations. We also did not cover IN expressions.

See GroupBasedRowLevelOperationScanPlanning in Spark for more context.