
@RussellSpitzer
Member

Previously invalidateTable was not implemented in SparkSessionCatalog,
making it impossible for users to refresh Iceberg or Session tables.
Here we implement the method by calling invalidate on the Iceberg
catalog if the table exists there, and on the Session catalog if it
does not.

Solves #2972

@github-actions github-actions bot added the spark label Sep 4, 2021
@RussellSpitzer
Member Author

@kbendick + @aokolnychyi + @rdblue The invalidate table issue. Could you please review?

@RussellSpitzer
Member Author

Somehow this test is failing on CI but passing locally... I'll try to repro with more tests.

@RussellSpitzer
Member Author

OK, it passes in the IntelliJ test runner but not in the Gradle one...

@RussellSpitzer
Member Author

Truly bizarre. I put in some print debug messages locally and found that in IntelliJ, REFRESH TABLE runs SparkSessionCatalog.invalidateTable, but in Gradle it doesn't hit that path at all... I'll have to look at this tomorrow.

@RussellSpitzer
Member Author

I think the root cause is Spark 3.0 vs Spark 3.1

@RussellSpitzer
Member Author

The culprit, I believe, is https://issues.apache.org/jira/browse/SPARK-32990. In Spark 3.0.x, the RefreshTable command does not get executed on the V2 catalog for the Session catalog, meaning it will only refresh using Spark's internal session refresh code.

@RussellSpitzer
Member Author

OK, all wrapped up now:

In Spark 3.1, REFRESH TABLE will work.
In Spark 3.0, as a workaround, you can clone the session with cloneSession(), which will recreate the catalog and clear the cache.

Spark 3.0 only uses the internal catalog class refreshTable method. This
means it will never utilize the invalidateTable method we provide with
the V2Catalog we extend.
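
The Spark 3.0 versus 3.1 split described here could be captured in a small version gate, for example to decide whether to rely on REFRESH TABLE or fall back to the session-clone workaround. This is only a sketch under stated assumptions: `SparkVersionGate` and `isSpark30` are hypothetical names that exist in neither Spark nor Iceberg, and the version string is assumed to have the form `major.minor.patch` (such as `3.1.2`).

```java
// Hypothetical helper for illustration only; not part of Spark or Iceberg.
final class SparkVersionGate {
  private SparkVersionGate() {}

  /**
   * Returns true when the version string denotes a Spark 3.0.x release,
   * the line where (per this thread) REFRESH TABLE on the session catalog
   * never reaches the V2 catalog's invalidateTable.
   * Assumes a "major.minor[.patch]" string with numeric major and minor.
   */
  static boolean isSpark30(String sparkVersion) {
    String[] parts = sparkVersion.split("\\.");
    if (parts.length < 2) {
      throw new IllegalArgumentException("Unexpected version format: " + sparkVersion);
    }
    int major = Integer.parseInt(parts[0]);
    int minor = Integer.parseInt(parts[1]);
    return major == 3 && minor == 0;
  }

  public static void main(String[] args) {
    for (String version : new String[] {"3.0.3", "3.1.2"}) {
      String advice = isSpark30(version)
          ? "REFRESH TABLE will not reach invalidateTable; use the workaround"
          : "REFRESH TABLE is expected to reach invalidateTable";
      System.out.println(version + ": " + advice);
    }
  }
}
```

In real code the version string would presumably come from the running Spark session; that wiring is left out here so the example stays free of Spark dependencies.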
@RussellSpitzer
Member Author

All good to go now


@Override
public void invalidateTable(Identifier ident) {
  if (icebergCatalog.tableExists(ident)) {

Do you need to check whether the Identifier is null, or is that handled by tableExists?

Member Author

This should only be invoked by Spark's Scala code, so we should never have a null passed through. It's basically the same as the other APIs in this class, like "loadTable".

Contributor

@kbendick kbendick left a comment

Small nit that I'll leave at your discretion to update or not, but overall this looks good to me.

Thank you @RussellSpitzer!

Comment on lines +125 to +130
public void invalidateTable(Identifier ident) {
  if (icebergCatalog.tableExists(ident)) {
    icebergCatalog.invalidateTable(ident);
  } else {
    getSessionCatalog().invalidateTable(ident);
  }
Contributor

Nit / non-blocking: Is there any need to check whether the Identifier belongs to the catalog of icebergCatalog? I doubt it, because tableExists should be checking that, but if there's no test around that, you might consider adding one (or just trying it out).

Member Author

It's basically the same as the "loadTable" method in this class. tableExists just catches the NoSuchTableException and returns false:

default boolean tableExists(TableIdentifier identifier) {
  try {
    loadTable(identifier);
    return true;
  } catch (NoSuchTableException e) {
    return false;
  }
}

So I don't really think we need a check here. Again, it's basically the same logic as "loadTable": if loadTable would touch this table in the Iceberg catalog, refresh table will also use the Iceberg table.
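
To make the reasoning in this thread concrete, here is a minimal, dependency-free sketch of the dispatch: `tableExists` is derived from `loadTable` by catching a not-found exception, and `invalidateTable` goes to the Iceberg side only when that lookup succeeds. Everything in it is a hypothetical stand-in (`SimpleCatalog`, `RecordingCatalog`, the local `NoSuchTableException`, and plain `String` identifiers); it does not use the real Spark or Iceberg classes and is not the code merged in this PR.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins only; none of these types come from Spark or Iceberg.
final class RefreshDispatchSketch {
  private RefreshDispatchSketch() {}

  /** Local stand-in for a "table not found" error. */
  static final class NoSuchTableException extends RuntimeException {
    NoSuchTableException(String ident) {
      super("Table not found: " + ident);
    }
  }

  /** Minimal catalog contract: only the calls discussed in this thread. */
  interface SimpleCatalog {
    String loadTable(String ident);

    void invalidateTable(String ident);

    // Same shape as the tableExists default quoted above: try to load,
    // and treat the not-found exception as "does not exist".
    default boolean tableExists(String ident) {
      try {
        loadTable(ident);
        return true;
      } catch (NoSuchTableException e) {
        return false;
      }
    }
  }

  /** In-memory catalog that records which identifiers were invalidated. */
  static final class RecordingCatalog implements SimpleCatalog {
    private final Set<String> tables;
    final Set<String> invalidated = new HashSet<>();

    RecordingCatalog(String... tableNames) {
      this.tables = new HashSet<>(Arrays.asList(tableNames));
    }

    @Override
    public String loadTable(String ident) {
      if (!tables.contains(ident)) {
        throw new NoSuchTableException(ident);
      }
      return ident;
    }

    @Override
    public void invalidateTable(String ident) {
      invalidated.add(ident);
    }
  }

  /** Invalidate in the Iceberg catalog if it can load the table, else in the session catalog. */
  static void invalidateTable(SimpleCatalog icebergCatalog, SimpleCatalog sessionCatalog, String ident) {
    if (icebergCatalog.tableExists(ident)) {
      icebergCatalog.invalidateTable(ident);
    } else {
      sessionCatalog.invalidateTable(ident);
    }
  }

  public static void main(String[] args) {
    RecordingCatalog iceberg = new RecordingCatalog("db.iceberg_tbl");
    RecordingCatalog session = new RecordingCatalog("db.hive_tbl");

    invalidateTable(iceberg, session, "db.iceberg_tbl");
    invalidateTable(iceberg, session, "db.hive_tbl");

    System.out.println("iceberg invalidated: " + iceberg.invalidated);
    System.out.println("session invalidated: " + session.invalidated);
  }
}
```

Because the existence check and the load share one code path in this sketch, any identifier the Iceberg side can load is also the one it invalidates, which is the property the reply above relies on.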


@Test
public void testRefreshCommand() {
Assume.assumeFalse("Spark 3.0 Spark Session Catalog does not use V2 Catalogs so Iceberg refresh is impossible",
Contributor

👍

@rdblue rdblue merged commit 6faed7e into apache:master Sep 19, 2021
@rdblue
Contributor

rdblue commented Sep 19, 2021

Thanks, @RussellSpitzer!
