Spark: Implement InvalidateTable for SparkSessionCatalog #3072
Previously this was not implemented, making it impossible for users to refresh Iceberg or Session tables. Here we implement the method by calling invalidate on the Iceberg catalog if the table exists, and on the Session catalog if it does not.
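
A minimal sketch of the routing described above, using hypothetical stand-in types (SimpleCatalog, RecordingCatalog, DelegatingCatalog) and plain String identifiers instead of the real Spark and Iceberg classes. It only illustrates which catalog receives the invalidate call and is not the code in this PR.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a catalog; not the Spark or Iceberg interface.
interface SimpleCatalog {
  boolean tableExists(String ident);

  void invalidateTable(String ident);
}

// Records every invalidated identifier so the routing can be observed.
class RecordingCatalog implements SimpleCatalog {
  private final Set<String> tables = new HashSet<>();
  final List<String> invalidated = new ArrayList<>();

  RecordingCatalog(String... names) {
    for (String name : names) {
      tables.add(name);
    }
  }

  @Override
  public boolean tableExists(String ident) {
    return tables.contains(ident);
  }

  @Override
  public void invalidateTable(String ident) {
    invalidated.add(ident);
  }
}

// Sends the call to the Iceberg catalog when it knows the table,
// and to the session catalog otherwise.
class DelegatingCatalog {
  private final SimpleCatalog icebergCatalog;
  private final SimpleCatalog sessionCatalog;

  DelegatingCatalog(SimpleCatalog icebergCatalog, SimpleCatalog sessionCatalog) {
    this.icebergCatalog = icebergCatalog;
    this.sessionCatalog = sessionCatalog;
  }

  void invalidateTable(String ident) {
    if (icebergCatalog.tableExists(ident)) {
      icebergCatalog.invalidateTable(ident);
    } else {
      sessionCatalog.invalidateTable(ident);
    }
  }
}
```

Note that in this sketch an identifier unknown to both catalogs still falls through to the session catalog, which matches the "if it does not" branch of the description.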

@kbendick + @aokolnychyi + @rdblue The invalidate table issue. Could you please review?

Somehow this test is failing on CI but passing locally ... I'll try to repro with more tests.

OK, it passes in the IntelliJ test runner but not in the Gradle one...

Truly bizarre: I put in some print debug messages locally and found that in IntelliJ, refresh table runs SparkSessionCatalog invalidateTable, but in Gradle it doesn't hit that path at all... I'll have to look at this tomorrow.

I think the root cause is Spark 3.0 vs Spark 3.1.

I believe the culprit is https://issues.apache.org/jira/browse/SPARK-32990: in Spark 3.0.x the RefreshTable command does not get executed on the V2 catalog for the Session catalog, meaning it will only refresh using the internal session refresh code.

OK, all wrapped up now. In Spark 3.1, Refresh Table will work.
Spark 3.0 only uses the internal catalog class's refreshTable method, so it will never reach the invalidateTable method we provide through the V2 catalog we extend.
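
As a rough illustration of the version split described above, the following sketch classifies a Spark version string. The class and method names are hypothetical, the string is assumed to look like the value returned by SparkSession.version() (for example "3.0.3"), and this is not the condition used in the PR's test.

```java
// Hypothetical helper for illustration; not part of Spark or Iceberg.
final class RefreshRouting {
  private RefreshRouting() {
  }

  // True for Spark 3.0 and 3.0.x version strings, where (per this thread)
  // refreshing a session-catalog table does not go through the V2 catalog.
  static boolean refreshBypassesV2SessionCatalog(String sparkVersion) {
    return sparkVersion.equals("3.0") || sparkVersion.startsWith("3.0.");
  }
}
```

A test that depends on invalidateTable being called could consult a check like this and skip itself on Spark 3.0.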

All good to go now

  @Override
  public void invalidateTable(Identifier ident) {
    if (icebergCatalog.tableExists(ident)) {
Do you need to check whether the Identifier is null, or is that done by tableExists?
This should only be invoked by Scala in Spark, so we shouldn't ever have a null passed through. It's basically the same as the other APIs in this class, like "loadTable".
kbendick
left a comment
Small nit that I'll leave at your discretion to update or not, but overall this looks good to me.
Thank you @RussellSpitzer!
  public void invalidateTable(Identifier ident) {
    if (icebergCatalog.tableExists(ident)) {
      icebergCatalog.invalidateTable(ident);
    } else {
      getSessionCatalog().invalidateTable(ident);
    }
Nit / non-blocking: Is there any need to check whether the Identifier belongs to the catalog of icebergCatalog? I doubt it, because tableExists should be checking that, but if there's no test around that, you might consider adding one (or just trying it out).
It's basically the same as the "loadTable" method in this class. tableExists just catches the NoSuchTableException and returns false:
iceberg/api/src/main/java/org/apache/iceberg/catalog/Catalog.java
Lines 268 to 275 in b623b07
  default boolean tableExists(TableIdentifier identifier) {
    try {
      loadTable(identifier);
      return true;
    } catch (NoSuchTableException e) {
      return false;
    }
  }
So I don't really think we need a check here. Again, it's basically the same logic as "loadTable": if loadTable would touch this table in the Iceberg catalog, refresh table will also use the Iceberg table.
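
The exists-by-loading pattern quoted above can be tried in isolation. This is a sketch under stated assumptions: MissingTableException, MiniCatalog, and InMemoryMiniCatalog are hypothetical stand-ins for NoSuchTableException and the Iceberg Catalog interface, and tables are modeled as plain location strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Iceberg's NoSuchTableException.
class MissingTableException extends RuntimeException {
  MissingTableException(String identifier) {
    super("Table does not exist: " + identifier);
  }
}

// Hypothetical, reduced catalog interface with the same default-method shape.
interface MiniCatalog {
  String loadTable(String identifier);

  // A table exists exactly when loading it succeeds.
  default boolean tableExists(String identifier) {
    try {
      loadTable(identifier);
      return true;
    } catch (MissingTableException e) {
      return false;
    }
  }
}

// Minimal in-memory implementation; tables are just location strings.
class InMemoryMiniCatalog implements MiniCatalog {
  private final Map<String, String> tables = new HashMap<>();

  void register(String identifier, String location) {
    tables.put(identifier, location);
  }

  @Override
  public String loadTable(String identifier) {
    String location = tables.get(identifier);
    if (location == null) {
      throw new MissingTableException(identifier);
    }
    return location;
  }
}
```

Because tableExists is defined in terms of loadTable, any identifier that loadTable resolves is also reported as existing, which is why no extra ownership check is argued to be necessary. Only the missing-table exception is caught, so other failures from loadTable still propagate.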

  @Test
  public void testRefreshCommand() {
    Assume.assumeFalse("Spark 3.0 Spark Session Catalog does not use V2 Catalogs so Iceberg refresh is impossible",
👍

Thanks, @RussellSpitzer!

Previously this was not implemented, making it impossible for users to
refresh Iceberg or Session tables. Here we implement the method by
calling invalidate on the Iceberg catalog if the table exists, and
on the Session catalog if it does not.
Solves #2972