
Conversation

@lurnagao-dahua
Contributor

@lurnagao-dahua lurnagao-dahua commented Mar 30, 2024

Hi, I get a "Metadata not found in metadata location for table" error when trying to query data generated by Flink.

The reason is that in some cases e.getMessage() returns null, so the check throws a NullPointerException and checkCommitStatus is skipped. The metadata location may then be deleted even though the metadata commit actually succeeded.

This was introduced by #6570.
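
To make the failure mode concrete, here is a minimal, dependency-free sketch. It is not the Iceberg source: the class name, the Outcome enum, and the marker string are hypothetical stand-ins chosen for illustration. It only shows that calling contains() on a null message throws before any follow-up status check can run, while the guarded version falls through to that check.

```java
// Hypothetical stand-in for the catch-block logic discussed above; not Iceberg code.
final class NullMessageDemo {
  enum Outcome { CONCURRENT_MODIFICATION, NEEDS_STATUS_CHECK }

  // Illustrative marker text (an assumption, not the real metastore message).
  static final String MARKER = "table has been modified";

  // Unguarded: getMessage() may be null, so contains() can throw NullPointerException.
  static Outcome classifyUnguarded(Throwable e) {
    if (e.getMessage().contains(MARKER)) {
      return Outcome.CONCURRENT_MODIFICATION;
    }
    return Outcome.NEEDS_STATUS_CHECK;
  }

  // Guarded: a null message simply falls through to the status check.
  static Outcome classifyGuarded(Throwable e) {
    if (e.getMessage() != null && e.getMessage().contains(MARKER)) {
      return Outcome.CONCURRENT_MODIFICATION;
    }
    return Outcome.NEEDS_STATUS_CHECK;
  }

  public static void main(String[] args) {
    Throwable noMessage = new RuntimeException(); // getMessage() returns null
    try {
      classifyUnguarded(noMessage);
    } catch (NullPointerException npe) {
      System.out.println("unguarded check threw NPE, so the status check never runs");
    }
    System.out.println("guarded check: " + classifyGuarded(noMessage));
  }
}
```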

Hive: Check e.getMessage() is not null
@github-actions github-actions bot added the hive label Mar 30, 2024
fix spotlessJavaCheck
@lurnagao-dahua lurnagao-dahua changed the title Hive: Check e.getMessage() is not null Hive: Fix metadata file not found Apr 1, 2024
   } catch (Throwable e) {
-    if (e.getMessage()
-        .contains(
+    if (e.getMessage() != null
Member

It'd be better to create a util method for this pattern.

Contributor

Where would we use this new method?
If we would use it only in this class, and only in 2 places, then I think it is better not to "hide" what we do behind a method somewhere else in the code.

Member

@manuzhang manuzhang Apr 2, 2024

If we had the method in the first place, I don't think we would have missed the null check here. With this method, we can ensure the null check is applied in any other place that needs it. Just my two cents, which should not block merging this PR.
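
For reference, a util method along the lines suggested here could look like the sketch below. The class and method names (ThrowableMessages, messageContains) are hypothetical and do not exist in Iceberg; this only illustrates the null-safe pattern, and the PR itself kept the inline check.

```java
// Hypothetical helper illustrating the suggested pattern; not part of Iceberg.
final class ThrowableMessages {
  private ThrowableMessages() {}

  /**
   * Returns true only when the throwable is non-null, its message is non-null,
   * and the message contains the given fragment.
   */
  static boolean messageContains(Throwable t, String fragment) {
    if (t == null || fragment == null) {
      return false;
    }
    String message = t.getMessage();
    return message != null && message.contains(fragment);
  }
}
```

A call site would then read `if (ThrowableMessages.messageContains(e, "some marker")) { ... }`, so the null check cannot be forgotten. The trade-off raised in the thread still applies: with only two call sites in one class, the indirection may not pay for itself.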

Contributor

@pvary pvary left a comment

Oh... accidentally chose "Approve".

While I think the PR is good, I wanted to wait for @manuzhang's review before moving forward.

@manuzhang
Member

@lurnagao-dahua please check styles.

> The reason is that in some cases e.getMessage() returns null, so the check throws a NullPointerException and checkCommitStatus is skipped. The metadata location may then be deleted even though the metadata commit actually succeeded.

Is it possible to add a UT for this case?

@lurnagao-dahua
Contributor Author

lurnagao-dahua commented Apr 2, 2024

> @lurnagao-dahua please check styles.
>
> > The reason is that in some cases e.getMessage() returns null, so the check throws a NullPointerException and checkCommitStatus is skipped. The metadata location may then be deleted even though the metadata commit actually succeeded.
>
> Is it possible to add a UT for this case?

Hi, I have checked the style. I found an earlier PR similar to this one, #701. I will try to write a UT.

Contributor

@nastra nastra left a comment

LGTM, it seems we're using the same pattern in a few other places, but having a test to reproduce the underlying issue would be great

@lurnagao-dahua
Contributor Author

Thank you for your response!
In my case, Flink streaming writes to Iceberg:
1. The Hive metastore was in continuous full GC, so the client threw SocketTimeoutException: Read timed out (hive.metastore.client.socket.timeout defaults to 600s).
2. The HiveTableOperations commit thread calls Thread.sleep(retryDelaySeconds * 1000) before retrying.
3. The Flink checkpoint timeout is shorter than 600s, so Flink interrupts the thread, which then throws an InterruptedException without a message.

I have been thinking about this for a while and have some doubts about how to write the UT. Can you give me some advice?

@pvary
Contributor

pvary commented Apr 3, 2024

@lurnagao-dahua: If you check https://github.com/apache/iceberg/blob/3caa3a28d07a2d08b9a0e4196634126f1e016d6a/hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCommits.java, you can find plenty of examples for commit errors. Maybe we could do something similar, like throwing an exception without a message. It would be nice to have a test.

OTOH, if the test is more than 50 lines, its upkeep would cost us more in the long run than what we gain from testing a null check. In that case I would skip adding the extra code, following the example of #701.

@lurnagao-dahua
Contributor Author

> @lurnagao-dahua: If you check https://github.com/apache/iceberg/blob/3caa3a28d07a2d08b9a0e4196634126f1e016d6a/hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCommits.java, you can find plenty of examples for commit errors. Maybe we could do something similar, like throwing an exception without a message. It would be nice to have a test.
>
> OTOH, if the test is more than 50 lines, its upkeep would cost us more in the long run than what we gain from testing a null check. In that case I would skip adding the extra code, following the example of #701.

Hi, thank you for your suggestion.
I added a unit test. It throws a RuntimeException without a message, which results in a CommitStateUnknownException. The message will be "null" + "\n" + COMMON_INFO, because of this constructor:

public CommitStateUnknownException(Throwable cause) {
  super(cause.getMessage() + "\n" + COMMON_INFO, cause);
}
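
Why the message starts with the literal text "null": Java string concatenation converts a null reference to the four characters "null". The sketch below uses a hypothetical stand-in exception class, and its COMMON_INFO is shortened to the leading words quoted in this thread rather than the real constant.

```java
// Stand-in classes for illustration only; not the Iceberg exception.
final class NullMessageConcatDemo {
  // Assumption: only the leading words of the real COMMON_INFO, as quoted in this thread.
  static final String COMMON_INFO = "Cannot determine whether the commit was successful or not";

  static class UnknownStateException extends RuntimeException {
    UnknownStateException(Throwable cause) {
      // A null getMessage() is rendered as "null" by string concatenation.
      super(cause.getMessage() + "\n" + COMMON_INFO, cause);
    }
  }

  public static void main(String[] args) {
    RuntimeException wrapped = new UnknownStateException(new RuntimeException());
    System.out.println(wrapped.getMessage().startsWith("null\n" + COMMON_INFO)); // prints true
  }
}
```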

}

@Test
public void testCommitExceptionWithoutMessage() throws TException, InterruptedException {
Contributor

can be simplified to

  @Test
  public void testCommitExceptionWithoutMessage() throws TException, InterruptedException {
    Table table = catalog.loadTable(TABLE_IDENTIFIER);
    HiveTableOperations ops = (HiveTableOperations) ((HasTableOperations) table).operations();

    TableMetadata metadataV1 = ops.current();
    table.updateSchema().addColumn("n", Types.IntegerType.get()).commit();

    ops.refresh();

    HiveTableOperations spyOps = spy(ops);
    doThrow(new RuntimeException()).when(spyOps).persistTable(any(), anyBoolean(), any());

    assertThatThrownBy(() -> spyOps.commit(ops.current(), metadataV1))
        .isInstanceOf(CommitStateUnknownException.class)
        .hasMessageStartingWith("null\nCannot determine whether the commit was successful or not");
  }

Contributor Author

@Test
public void testCommitExceptionWithoutMessage() throws TException, InterruptedException {
  Table table = catalog.loadTable(TABLE_IDENTIFIER);
  HiveTableOperations ops = (HiveTableOperations) ((HasTableOperations) table).operations();

  TableMetadata metadataV1 = ops.current();
  table.updateSchema().addColumn("n", Types.IntegerType.get()).commit();

  ops.refresh();

  HiveTableOperations spyOps = spy(ops);
  doThrow(new RuntimeException()).when(spyOps).persistTable(any(), anyBoolean(), any());

  assertThatThrownBy(() -> spyOps.commit(ops.current(), metadataV1))
      .isInstanceOf(CommitStateUnknownException.class)
      .hasMessageStartingWith("null\nCannot determine whether the commit was successful or not");
}

Thank you very much, I have simplified the unit test.

@nastra nastra merged commit c65023b into apache:main Apr 3, 2024
sasankpagolu pushed a commit to sasankpagolu/iceberg that referenced this pull request Oct 27, 2024
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024