Spark: Throw unsupported for ADD COLUMN with default value #13464
Conversation
      SparkCatalogConfig.SPARK.implementation(),
      SparkCatalogConfig.SPARK.properties()
    },
    {
I'm not sure yet what's going on, but when the parameters are executed in this order, the REST catalog ends up as the underlying catalog for the SparkSessionCatalog instead of the regular SparkCatalog I'd expect given this definition.
I'm not sure if there's some odd classloader caching happening in Spark between the test executions that leads to this behavior, but it surfaces in the new test because in Spark 3.4/3.5 we skip the Spark session catalog case: it already fails, just with a different expected message. Reordering the parameterization makes everything work as expected; see the sketch below.
Ultimately we'll need to figure out what's really going on here, but I wanted to provide context to reviewers for why this change was made.
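For reference, a minimal sketch of the kind of catalog parameterization being reordered. The enum entries, the catalogName()/implementation()/properties() accessors, and the @Parameters style are assumed from the existing Iceberg Spark test base classes; the specific entries shown here are illustrative rather than the PR's exact list or ordering.

    // Sketch only: a parameterized test that runs once per catalog configuration.
    // The PR simply reorders entries like these so that the SparkSessionCatalog case
    // no longer ends up with the REST catalog as its underlying catalog.
    @Parameters(name = "catalogName = {0}, implementation = {1}, config = {2}")
    protected static Object[][] parameters() {
      return new Object[][] {
        {
          SparkCatalogConfig.HADOOP.catalogName(),      // illustrative first entry
          SparkCatalogConfig.HADOOP.implementation(),
          SparkCatalogConfig.HADOOP.properties()
        },
        {
          SparkCatalogConfig.SPARK.catalogName(),       // entry from the diff above
          SparkCatalogConfig.SPARK.implementation(),
          SparkCatalogConfig.SPARK.properties()
        }
      };
    }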
Force-pushed 65e5b7c to 4778f60.
    if (add.defaultValue() != null) {
      throw new UnsupportedOperationException(
          String.format(
              "Cannot add column %s since default values are currently unsupported",
              leafName(add.fieldNames())));
    }
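A quick, hedged sketch of how this check could be exercised from a test; the sql(...) helper and tableName field are assumed from Iceberg's Spark test base classes, AssertJ provides the assertion, and the column name is illustrative.

    // assumes: import static org.assertj.core.api.Assertions.assertThatThrownBy;
    assertThatThrownBy(
            () -> sql("ALTER TABLE %s ADD COLUMN data STRING DEFAULT 'x'", tableName))
        .isInstanceOf(UnsupportedOperationException.class)
        .hasMessageContaining("default values are currently unsupported");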
[doubt][not a blocker] A while back we had a thread about aligning with Spark's standardized error handling (doc: https://docs.google.com/document/d/11qHUiCcKMJ-xAyfL__Yv7B1b5N-80-GIwE8AV96A2Ac/edit?tab=t.0). If we're OK with that, can we throw this documented error class instead? https://github.com/apache/spark/blob/master/sql/api/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala#L731
Thanks for the doc, I'll double-check if there's some standard error code. I don't really agree with the linked error class, though, since that one is specific to query-parsing failures (e.g. some tests in the original Spark PR demonstrate that intention), whereas the intention of the implemented error is to surface that the Iceberg-Spark implementation just doesn't support this yet.
Thank you for taking a look! Agreed, this error is specific to the Iceberg connector for Spark not supporting this feature; I brought it up to get your thoughts on it.
Regarding "if there's some standard error code": presently, for REPLACE COLUMNS, Spark throws
Expecting actual throwable to be an instance of:
java.lang.UnsupportedOperationException
but was:
org.apache.spark.sql.catalyst.parser.ParseException:
[UNSUPPORTED_DEFAULT_VALUE.WITHOUT_SUGGESTION] DEFAULT column values is not supported. SQLSTATE: 0A000
== SQL (line 1, position 1) ==
ALTER TABLE t1 REPLACE COLUMNS (x STRING DEFAULT 42)
We throw this in the Iceberg connector well past the parsing stage (I think it's thrown when the schema change is applied/committed), so we don't really have a handle on the parser here.
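To make the distinction concrete, a hedged sketch of the two failure points (table name illustrative, spark an active SparkSession); each statement fails on its own.

    // REPLACE COLUMNS with a default is rejected by Spark's parser itself
    // (ParseException, error class UNSUPPORTED_DEFAULT_VALUE) before any connector code runs:
    spark.sql("ALTER TABLE t1 REPLACE COLUMNS (x STRING DEFAULT 42)");

    // ADD COLUMN with a default parses fine and reaches the Iceberg connector as a
    // TableChange.AddColumn with a non-null defaultValue(); the check added in this PR
    // turns that into an UnsupportedOperationException instead of silently dropping the default:
    spark.sql("ALTER TABLE t1 ADD COLUMN y STRING DEFAULT 'v'");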
    if (add.defaultValue() != null) {
      throw new UnsupportedOperationException(
          String.format(
              "Cannot add column %s since default values are currently unsupported",
"since setting default values in Spark is currently unsupported?"
I thought the columns can be read, but the code for altering the schema is missing?
Yeah, they can be read. You're right that this error message leaves the impression that it's generally not supported, but that's not true; it's specifically that setting default values from Spark is unsupported.
RussellSpitzer left a comment:
Looks good to me. I have a minor nit on the wording of the error message: I think it slightly gives the impression that a table with default values will be broken in Spark, when we actually mean that a table with default values cannot be created by Spark.
Force-pushed 4778f60 to 1a20bcb.
Thanks for the reviews @singhpk234 @nastra @RussellSpitzer!
Currently, ALTER TABLE ADD COLUMN with a default value is unsupported; however, the DDL succeeds and silently ignores the default value, so it is never set in Iceberg metadata. There is an ongoing PR to support this, but more work remains there.
In the interim, it would be ideal to explicitly surface an unsupported-operation exception to users when a default value is specified; see the sketch below.
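To illustrate the user-visible difference, a hedged before/after sketch (table and column names illustrative, spark an active SparkSession).

    // Before this change: the statement succeeds, but the DEFAULT clause is silently dropped,
    // so no default ever appears in the Iceberg table metadata.
    // After this change: the same statement fails with
    //   UnsupportedOperationException: Cannot add column data since default values are currently unsupported
    spark.sql("ALTER TABLE db.tbl ADD COLUMN data STRING DEFAULT 'x'");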
Note: