Spark 3.4: Add backup table name support for Migrate procedure #8227

tomtongue · 2023-08-04T10:52:07Z

Changes

Add support for configuring the backup table name in the Migrate procedure.

Details

Currently, the Iceberg migrate procedure keeps the table backup as <TABLE_NAME>_BACKUP_.
However, some catalogs, such as Glue Data Catalog, only accept lowercase table names, so this rename blocks the migrate procedure from running.

This change lets users set a custom backup table name to avoid the restriction, while keeping the existing default backup name for backward compatibility.
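To make the restriction concrete, here is a minimal sketch of why the default name is a problem. The `_BACKUP_` suffix comes from the description above; `isLowercaseOnly` is a hypothetical stand-in for the validation a lowercase-only catalog might apply, not Glue Data Catalog's actual rule.

```java
import java.util.Locale;

final class BackupNameSketch {
  // Suffix the migrate procedure appends by default, per the description above.
  static final String BACKUP_SUFFIX = "_BACKUP_";

  // Default backup name derived from the source table name.
  static String defaultBackupName(String tableName) {
    return tableName + BACKUP_SUFFIX;
  }

  // Hypothetical check standing in for a catalog that rejects uppercase names.
  static boolean isLowercaseOnly(String name) {
    return name.equals(name.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    String defaultName = defaultBackupName("sample");
    System.out.println(defaultName + " accepted: " + isLowercaseOnly(defaultName));
    System.out.println("sample_backup accepted: " + isLowercaseOnly("sample_backup"));
  }
}
```

Under this assumed rule, the default name always fails because of the uppercase suffix, while a user-supplied lowercase name passes.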

ConeyLiu · 2023-08-04T14:38:40Z

api/src/main/java/org/apache/iceberg/actions/MigrateTable.java

    throw new UnsupportedOperationException("Dropping a backup is not supported");
  }

+  default MigrateTable withBackupTableName(String tableName) {


I think the doc is required.

Thank you. Once the commit is merged, I will add the doc. Or if I should add the doc along with this commit please let me know.

I mean the document for this method.

Sure, let me add it to the method.

Added the doc in this commit; 3bd716a

Can we add the new property to spark-procedures.md to document it?

Of course, sure. I added a new row for the backup_table_name argument as a draft. If there's anything I need to add or change, please let me know.
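For readers who want to see what a call with the new argument might look like, the following sketch builds the CALL statement as a string. The argument name backup_table_name comes from this thread; the catalog name, table name, and the helper `migrateCall` are placeholders for illustration, and the helper does no quoting or escaping.

```java
final class MigrateCallSketch {
  // Builds a CALL statement for the migrate procedure using named arguments.
  // backupTableName may be null, in which case the argument is omitted.
  // Illustrative only: values are inserted as-is, without escaping.
  static String migrateCall(String catalog, String table, String backupTableName) {
    StringBuilder sql = new StringBuilder();
    sql.append("CALL ").append(catalog).append(".system.migrate(");
    sql.append("table => '").append(table).append("'");
    if (backupTableName != null) {
      sql.append(", backup_table_name => '").append(backupTableName).append("'");
    }
    return sql.append(")").toString();
  }

  public static void main(String[] args) {
    System.out.println(migrateCall("spark_catalog", "db.sample", "sample_backup"));
  }
}
```

The resulting string could then be passed to a Spark session's sql method in an environment where the Iceberg Spark extensions are configured.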


ConeyLiu · 2023-08-09T03:26:08Z

spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/MigrateTableSparkAction.java

  }

+  @Override
+  public MigrateTableSparkAction withBackupTableName(String tableName) {


We could implement Spark 3.4 for this PR and do a backport for other versions.

Sure. The latest commit reverts the Spark 3.3 changes.

ConeyLiu · 2023-08-09T03:26:37Z

...-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java


+  @Test
+  public void testMigrateWithBackupTableName() throws IOException {
+    Assume.assumeTrue(catalogName.equals("spark_catalog"));


Are there any reasons to skip spark_catalog?

This part checks whether the catalog is spark_catalog. The test runs only with spark_catalog (otherwise the migrate fails). Do you mean this part should be removed?

ConeyLiu · 2023-08-09T03:28:41Z

spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/actions/MigrateTableSparkAction.java

  private final StagingTableCatalog destCatalog;
  private final Identifier destTableIdent;
-  private final Identifier backupIdent;
+  private Identifier backupIdent;


Put those un-final fields together?

I believe you mean that the existing final field should be kept and a new non-final field should be added.
The backupIdent field is only referenced within this class, specifically in doExecute and in the related methods it calls, such as rename, restore, and drop. Therefore, to keep the field final while making the backup table name flexible, I added a parameter to each of those methods and now resolve the table name in doExecute.

ConeyLiu · 2023-08-09T03:33:09Z

However, some catalogs such as Glue Data Catalog only accept lowercase as its table name, and this renaming operation in the migrate procedure would be a blocker for running the migrate.

I think @jackye1995 @amogh-jahagirdar @singhpk234 have more knowledge about this.

ConeyLiu · 2023-08-09T03:48:14Z

...-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMigrateTableProcedure.java

+    Assert.assertEquals("Should have added one file", 1L, result);
+
+    String dbName = tableName.split("\\.")[0];
+    Assert.assertTrue(spark.catalog().tableExists(dbName + "." + backupTableName));


Please use AssertJ instead. You can refer to the contributing guide: https://iceberg.apache.org/contribute/#assertj

Thanks for the advice. I replaced the assertions in this test with AssertJ.


tomtongue · 2023-08-09T11:33:33Z

However, some catalogs such as Glue Data Catalog only accept lowercase as its table name, and this renaming operation in the migrate procedure would be a blocker for running the migrate.

I think @jackye1995 @amogh-jahagirdar @singhpk234 have more knowledge about this.

Let me add my thoughts.
As described above, the current migrate keeps the source Spark table as a backup table named <src_table>_BACKUP_. This causes an Iceberg validation exception when the Glue Data Catalog implementation in Iceberg handles such a table.
Also, the backup table can be kept rather than dropped, which can lead to table name conflicts.

The ability to specify the backup table name is needed to extend the procedure's capability and avoid such name conflicts, so I submitted this PR.

aokolnychyi · 2023-08-15T17:04:31Z

api/src/main/java/org/apache/iceberg/actions/MigrateTable.java

  }

+  /**
+   * Sets a table name for the backup of the original table


Minor: Missing . at the end of the sentence?

Sorry about that. I will add the . at the end of the sentence in the next commit.

aokolnychyi · 2023-08-15T17:33:04Z

spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/actions/MigrateTableSparkAction.java


  private boolean dropBackup = false;

+  private String backupTableName = "";


What about placing this var next to dropBackup? Both of them are non-final variables that can be overridden.

Also, what about just making Identifier backupIdent non-final but keeping the type and the initialization in the constructor? We can construct an identifier in withBackupTableName. That way, we should be able to reduce the number of changes.

Thanks for the suggestion; it makes sense to me. Based on this comment, I updated the code as follows:

restored the backupIdent field, now non-final, along with its initialization in the constructor

moved the backup table name logic into the withBackupTableName method that is newly added in this PR

removed the backupTableName variable and the code that updated it, following the two changes above
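The shape agreed on here can be sketched as follows. This is a simplified illustration, not the actual MigrateTableSparkAction: `Ident` is a hypothetical stand-in for Spark's Identifier, and the class keeps only the fields relevant to the backup name.

```java
final class MigrateActionSketch {
  // Hypothetical stand-in for Spark's Identifier: a namespace plus a table name.
  static final class Ident {
    final String[] namespace;
    final String name;

    Ident(String[] namespace, String name) {
      this.namespace = namespace;
      this.name = name;
    }

    @Override
    public String toString() {
      return String.join(".", namespace) + "." + name;
    }
  }

  private static final String BACKUP_SUFFIX = "_BACKUP_";

  private final Ident sourceIdent;

  // Non-final fields that callers may override, kept next to each other.
  private boolean dropBackup = false;
  private Ident backupIdent;

  MigrateActionSketch(Ident sourceIdent) {
    this.sourceIdent = sourceIdent;
    // Default backup identifier, still initialized in the constructor.
    this.backupIdent = new Ident(sourceIdent.namespace, sourceIdent.name + BACKUP_SUFFIX);
  }

  MigrateActionSketch dropBackup() {
    this.dropBackup = true;
    return this;
  }

  // Overrides the default backup identifier, keeping the source namespace.
  MigrateActionSketch withBackupTableName(String tableName) {
    this.backupIdent = new Ident(sourceIdent.namespace, tableName);
    return this;
  }

  Ident backupIdent() {
    return backupIdent;
  }
}
```

Keeping the default in the constructor means callers that never set a name see the same backup identifier as before, which is the backward-compatibility property the PR description asks for.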

aokolnychyi · 2023-08-15T17:38:11Z

spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/actions/MigrateTableSparkAction.java


+    String backupName;
+    if (backupTableName.isEmpty()) {
+      backupName = this.destTableIdent.name() + BACKUP_SUFFIX;


Minor: We usually don't use this. when accessing fields, only while setting.

Thanks for the advice. Will remove it.


tomtongue · 2023-08-16T05:15:09Z

Thanks for the review. I pushed a commit that addresses the comments. I would appreciate it if you could review the new one. @aokolnychyi

aokolnychyi · 2023-08-17T19:40:59Z

spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/actions/MigrateTableSparkAction.java


+  @Override
+  public MigrateTableSparkAction withBackupTableName(String tableName) {
+    if (!tableName.isEmpty()) {


This behavior for checking if tableName is empty seems a bit weird to me. If this method is called, I assume someone wants to override the backup table name. I think we should just go ahead and set backupIdent.

I know dest and source identifiers are same but it would be more readable to use sourceTableIdent() from the parent class in this case. We are backing up the source table, not the destination.

Thanks for the suggestion. I agree with it. I updated the code as follows:

removed the check for an empty table name, so the backup table name is now set on backupIdent directly

used sourceTableIdent when building the new backupIdent

aokolnychyi · 2023-08-17T19:41:49Z

api/src/main/java/org/apache/iceberg/actions/MigrateTable.java

+   * @param tableName the table name for backup
+   * @return this for method chaining
+   */
+  default MigrateTable withBackupTableName(String tableName) {


I think we should drop the with prefix from the name, given that none of the existing methods have it.

aokolnychyi · 2023-08-17T19:47:07Z

spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/procedures/MigrateTableProcedure.java

    }

    boolean dropBackup = args.isNullAt(2) ? false : args.getBoolean(2);
+    String backupTableName = args.isNullAt(3) ? "" : args.getString(3);


Instead of using an empty string, I think we should use null and adapt the logic below.

    MigrateTableSparkAction action = actions().migrateTable(tableName).tableProperties(properties);

    if (dropBackup) {
      action.dropBackup();
    }

    if (backupTableName != null) {
      action.backupTableName(backupTableName);
    }

    MigrateTable.Result result = action.execute();

Sure, thank you. Will update this part.

I believe that, to apply the above code, the action variable needs to be reassigned in each if block. In the latest commit, I updated this part as below (because dropBackup() and my backupTableName return the MigrateTableAction type):

    MigrateTableSparkAction migrateTableSparkAction =
        SparkActions.get().migrateTable(tableName).tableProperties(properties);

    if (dropBackup) {
      migrateTableSparkAction = migrateTableSparkAction.dropBackup();
    }

    if (backupTableName != null) {
      migrateTableSparkAction = migrateTableSparkAction.backupTableName(backupTableName);
    }

    MigrateTable.Result result = migrateTableSparkAction.execute();

If I misunderstood the comment, or you have further recommendations, please let me know.
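The null-based handling of optional arguments discussed in this thread can be sketched in isolation as follows. `Action` is a hypothetical fluent class written for this example, not the Iceberg action; in this sketch each setter mutates the object and returns it, so reassigning the variable and ignoring the return value behave the same.

```java
final class ProcedureArgsSketch {
  // Hypothetical fluent action: each setter mutates and returns this.
  static final class Action {
    private boolean dropBackup = false;
    private String backupTableName = null;

    Action dropBackup() {
      this.dropBackup = true;
      return this;
    }

    Action backupTableName(String name) {
      this.backupTableName = name;
      return this;
    }

    // Returns a description of the configured state instead of doing real work.
    String execute() {
      return "dropBackup=" + dropBackup + ", backupTableName=" + backupTableName;
    }
  }

  // Applies each optional argument only when the caller supplied it.
  // A null backupTableName means "not provided", so the default is kept.
  static String run(boolean dropBackup, String backupTableName) {
    Action action = new Action();
    if (dropBackup) {
      action.dropBackup();
    }
    if (backupTableName != null) {
      action.backupTableName(backupTableName);
    }
    return action.execute();
  }

  public static void main(String[] args) {
    System.out.println(run(false, null));
    System.out.println(run(true, "sample_backup"));
  }
}
```

Using null rather than an empty string keeps "argument omitted" distinct from any value a caller could actually pass.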

tomtongue · 2023-08-18T12:03:12Z

Thanks for the review again! I updated the PR as follows:

Changed the method name to backupTableName (removed the with prefix)
Updated the logic of dropBackup and backupTableName
Added the backup_table_name parameter description to spark-procedures.md

aokolnychyi · 2023-08-21T23:34:30Z

Thanks, @tomtongue! Thanks for reviewing, @ConeyLiu!

tomtongue · 2023-08-22T03:31:56Z

Thanks for kindly fixing and reviewing, @aokolnychyi @ConeyLiu !

Add backup table name support for Migrate procedure

a3db9d9

github-actions bot added API spark labels Aug 4, 2023

tomtongue added 5 commits August 4, 2023 19:56

Fix style violations by spotlessApply

b4694d3

Fix migrate procedure test queries

ab61ed8

Fix table creation

fb41a06

Fix the backup table name initialization

12ffa58

Apply the style by spotlessApply

f64dc7d

ConeyLiu reviewed Aug 4, 2023

View reviewed changes

Reflect the method description based on the comment; https://github.c…

3bd716a

…om/apache/iceberg/pull/8227\#discussion_r1284511379

tomtongue requested a review from ConeyLiu August 9, 2023 03:07

ConeyLiu reviewed Aug 9, 2023

View reviewed changes

tomtongue added 2 commits August 9, 2023 17:17

Move class variables into doExecute, add variables to utility methods…

623211c

… in the migrate and replace tests with AssertJ based on the comments

Revert Spark 3.3 commit based on the comment; https://github.com/apac…

5a90120

…he/iceberg/pull/8227/files\#r1287889370

tomtongue changed the title ~~Spark 3.3, 3.4: Add backup table name support for Migrate procedure~~ Spark 3.4: Add backup table name support for Migrate procedure Aug 9, 2023

Merge branch 'apache:master' into backup-table-name

2fb7a57

aokolnychyi reviewed Aug 15, 2023

View reviewed changes

Update backupTableName change mechanism to call within the backupTabl…

04b4e6e

…eName method and add typos and parameter-call

tomtongue requested a review from aokolnychyi August 16, 2023 05:15

aokolnychyi reviewed Aug 17, 2023

View reviewed changes

tomtongue added 2 commits August 18, 2023 16:39

Update backupTableName config logic

f6a1e24

Add the backup_table_name argument description

0d71d2b

github-actions bot added the docs label Aug 18, 2023

tomtongue requested a review from aokolnychyi August 18, 2023 12:03

aokolnychyi approved these changes Aug 21, 2023

View reviewed changes

aokolnychyi merged commit 87d2a92 into apache:master Aug 21, 2023

tomtongue deleted the backup-table-name branch September 1, 2023 09:04

