Contributor

@przemekd przemekd commented Oct 8, 2025

Closes #11527 without regressing the changes that were made to fix #9029.

It computes the location provider field eagerly, but wraps the result in a newly implemented wrapper that throws only when code actually needs to use the location provider. If the table is only read by other processes, no error is thrown, even if obtaining the location provider object failed.
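As a rough illustration of the idea described above (not the code in this PR), the sketch below captures a failure eagerly and surfaces it only on access. `SerializableSupplier`, `Try`, and `DemoTable` are hypothetical stand-ins defined here for the example, and strings stand in for the real location provider type.

```java
import java.io.Serializable;
import java.util.function.Supplier;

// Hypothetical stand-in for a serializable supplier type (assumption for this sketch).
interface SerializableSupplier<T> extends Supplier<T>, Serializable {}

// Minimal sketch: holds either a value or the failure raised while computing it.
final class Try<T> implements Serializable {
  private final T value;
  private final Throwable failure;

  private Try(T value, Throwable failure) {
    this.value = value;
    this.failure = failure;
  }

  // Runs the supplier right away, but records a failure instead of propagating it.
  static <T> Try<T> of(SerializableSupplier<T> supplier) {
    try {
      return new Try<>(supplier.get(), null);
    } catch (Throwable t) {
      return new Try<>(null, t);
    }
  }

  // Returns the value, or surfaces the recorded failure only at this point.
  T getOrThrow() {
    if (failure instanceof RuntimeException) {
      throw (RuntimeException) failure;
    }
    if (failure instanceof Error) {
      throw (Error) failure;
    }
    if (failure != null) {
      throw new RuntimeException(failure);
    }
    return value;
  }
}

// Hypothetical table-like holder: the read path never touches the provider.
final class DemoTable implements Serializable {
  private final String location;
  private final Try<String> locationProvider;

  DemoTable(String location, SerializableSupplier<String> providerLoader) {
    this.location = location;
    this.locationProvider = Try.of(providerLoader); // eager, failure is captured
  }

  String location() {
    return location; // safe for read-only use
  }

  String locationProvider() {
    return locationProvider.getOrThrow(); // failure surfaces only here
  }
}
```

With this shape, constructing a `DemoTable` whose loader throws still succeeds, `location()` keeps working, and only `locationProvider()` rethrows the captured failure.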

- Add missing .hasMessage() checks to assertThatThrownBy calls
- Ensures compliance with AssertThatThrownByWithMessageCheck rule
- Add .hasMessage() check to multi-line assertThatThrownBy call
- Ensures full compliance with AssertThatThrownByWithMessageCheck rule
Contributor

@singhpk234 singhpk234 left a comment

Thanks for the change @przemekd.

Reading the issue, it seems this has been impacting multiple users. We have a 1.10.1 window open; can you please post it in the Iceberg mailing list thread for 1.10.1,
https://lists.apache.org/thread/9tjs060vzs6nnghk2brcw0hv89h4drp0, for visibility?

*
* @param <T> the type of the result
*/
public class Try<T> implements Serializable {
Contributor

@singhpk234 singhpk234 Oct 9, 2025

Have you considered Vavr (https://github.com/vavr-io/vavr/tree/main/vavr)? It has a similar utility; I think you could use its getUnchecked, and it's Apache licensed.

We might need some other community members to weigh in here too. My observation is that in the past, in an orthogonal context, we preferred a library like Failsafe over writing something on our own.

Contributor

@amogh-jahagirdar amogh-jahagirdar Oct 13, 2025

I feel like we can be even simpler to begin with here. I poked around locally, and I do agree we need some abstraction almost like a Try monad which encapsulates either a result of some kind or a failure. I originally was going to see if we could just use an abstraction over Tasks but ultimately, we need something to capture the state that it was a failure, not just handle the failure. We could use a separate AtomicReference and a boolean for indicating that a location provider could not be resolved, but that's pretty unclean.

That said, I don't think we need a full-blown public class with all these APIs, at least for now.

I think we could just have a package-private class like the following, with the main write API being of(SerializableSupplier<T> supplier) and the main read API being getOrThrow.

class Try<T> implements Serializable {
    // Static factory: does what capture does
    public static <T> Try<T> of(SerializableSupplier<T> supplier) { /* .... */ }
    // Instance method: returns the value or rethrows the captured failure
    public T getOrThrow()....
}

I think this will remove the need for a separate test suite just for this class. Until we know we'll need this for a wider set of use cases, I think we should just keep this as minimal as possible.

Contributor Author

@przemekd przemekd Oct 15, 2025

@singhpk234 @amogh-jahagirdar I looked at Vavr, but for such limited use I chose not to add it. I pared the class down; it now catches Throwable and rethrows using sneakyThrow, following the same pattern as Vavr.
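For readers unfamiliar with the "sneaky throw" idiom mentioned above, here is a minimal sketch of how it is commonly written. `Sneaky` and `CapturedFailure` are hypothetical names for this example and are not the classes from this PR. The generic throws clause lets the compiler infer an unchecked exception type at the call site, so the original Throwable is rethrown unwrapped.

```java
import java.util.concurrent.Callable;

final class Sneaky {
  private Sneaky() {}

  // Rethrows any Throwable as-is, without a wrapper and without forcing callers
  // to declare it. E is inferred as RuntimeException at the call site, and the
  // cast is erased at runtime, so checked exceptions pass through unchanged.
  @SuppressWarnings("unchecked")
  static <E extends Throwable, R> R sneakyThrow(Throwable t) throws E {
    throw (E) t;
  }
}

// Hypothetical holder that captures any Throwable and rethrows the same instance.
final class CapturedFailure<T> {
  private final T value;
  private final Throwable failure;

  private CapturedFailure(T value, Throwable failure) {
    this.value = value;
    this.failure = failure;
  }

  static <T> CapturedFailure<T> of(Callable<T> task) {
    try {
      return new CapturedFailure<>(task.call(), null);
    } catch (Throwable t) {
      return new CapturedFailure<>(null, t); // keep the original instance
    }
  }

  T getOrThrow() {
    if (failure != null) {
      return Sneaky.sneakyThrow(failure); // never returns normally
    }
    return value;
  }
}
```

Because nothing is wrapped, a caller that catches the exception sees the very same object that was captured, which is what an identity assertion such as `isSameAs(failure)` relies on.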


// The exception should be thrown when locationProvider() is actually called
assertThatThrownBy(serializableTable::locationProvider).isSameAs(failure);

Contributor

Should we add a test for round trip serialization to ensure the behavior is the same?

TestHelpers.KryoHelpers.roundTripSerialize(serializableTable)
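To make the intent of such a round-trip check concrete, the sketch below uses plain Java serialization as a stand-in for Kryo, purely to keep the example dependency-free. `DeferredValue` and `RoundTrip.javaRoundTrip` are hypothetical names, not Iceberg APIs. It shows that a captured failure can survive serialization and still surface on access in the copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical holder: either a value or a captured failure, both serializable.
final class DeferredValue<T> implements Serializable {
  private final T value;
  private final RuntimeException failure; // Throwable implements Serializable

  private DeferredValue(T value, RuntimeException failure) {
    this.value = value;
    this.failure = failure;
  }

  static <T> DeferredValue<T> of(T value) {
    return new DeferredValue<>(value, null);
  }

  static <T> DeferredValue<T> failed(RuntimeException failure) {
    return new DeferredValue<>(null, failure);
  }

  T getOrThrow() {
    if (failure != null) {
      throw failure;
    }
    return value;
  }
}

final class RoundTrip {
  private RoundTrip() {}

  // Serializes and deserializes with java.io (an assumption; the PR tests use Kryo too).
  static <T extends Serializable> T javaRoundTrip(T obj) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(obj);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        @SuppressWarnings("unchecked")
        T copy = (T) in.readObject();
        return copy;
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("round trip failed", e);
    }
  }
}
```

After a round trip the exception is a deserialized copy, so a check like this would compare type and message rather than object identity.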

Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Thanks @przemekd, I agree with the general idea behind the fix. I just think we could start with a simpler, hidden Try construct.

public static <T> Try<T> capture(ThrowingSupplier<T> supplier) {
try {
return success(supplier.get());
} catch (Exception e) {
Contributor

I'd handle Throwable for this particular case

@singhpk234 singhpk234 added this to the Iceberg 1.10.1 milestone Oct 15, 2025
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Thank you @przemekd. I think it mostly looks great; I just had one comment: I think we should move the Kryo assertions into a separate test method.

Comment on lines 189 to 192
Table kryoDeserialized = KryoHelpers.roundTripSerialize(serializableTable);
assertThatThrownBy(kryoDeserialized::locationProvider)
.isInstanceOf(RuntimeException.class)
.hasMessage("location provider failure");
Contributor

Could we have the Kryo test in a separate test method? If something breaks in the future, it's easier to tell whether it's a Kryo issue or not without having to poke into the tests.

Contributor Author

Sure thing, it makes total sense 👍🏻

@amogh-jahagirdar amogh-jahagirdar changed the title from "Core: Fix locationProvider implementation for SerializableTable #12564" to "Core: Allow unresolved location providers in SerializableTable until they are needed for writes #12564" Oct 15, 2025
@amogh-jahagirdar amogh-jahagirdar changed the title from "Core: Allow unresolved location providers in SerializableTable until they are needed for writes #12564" to "Core: Delay surfacing location provider errors in SerializableTable until writes are performed #12564" Oct 15, 2025
@amogh-jahagirdar amogh-jahagirdar changed the title from "Core: Delay surfacing location provider errors in SerializableTable until writes are performed #12564" to "Core: Defer surfacing location provider errors in SerializableTable until writes are performed #12564" Oct 15, 2025
@amogh-jahagirdar amogh-jahagirdar changed the title from "Core: Defer surfacing location provider errors in SerializableTable until writes are performed #12564" to "Core: Delay surfacing location provider resolution failures in SerializableTable until writes are performed #12564" Oct 15, 2025
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Great, thank you @przemekd, looks good from my side. I'll leave this up for a bit for @singhpk234 and @geruh to take another look.

@amogh-jahagirdar amogh-jahagirdar changed the title from "Core: Delay surfacing location provider resolution failures in SerializableTable until writes are performed #12564" to "Core: Fix for respecting custom location providers #12564" Oct 15, 2025
@amogh-jahagirdar amogh-jahagirdar changed the title from "Core: Fix for respecting custom location providers #12564" to "Core: Fix for respecting custom location providers in SerializableTable #12564" Oct 15, 2025
Contributor

@geruh geruh left a comment

LGTM! The Try class now effectively defers the location provider outcome during serialization. Internal testing of this workflow has been successful!

@amogh-jahagirdar
Contributor

Thank you @przemekd and @geruh @singhpk234 for reviewing. I'll go ahead and merge. This is something that should be cherry-picked onto the 1.10.x branch

@amogh-jahagirdar amogh-jahagirdar merged commit e667f64 into apache:main Oct 16, 2025
42 checks passed
huaxingao pushed a commit to huaxingao/iceberg that referenced this pull request Nov 7, 2025
huaxingao added a commit that referenced this pull request Nov 7, 2025
…ble #12564  (#14280) (#14532)

(cherry picked from commit e667f64)

Co-authored-by: przemekd <1896041+przemekd@users.noreply.github.com>
thomaschow pushed a commit to thomaschow/iceberg that referenced this pull request Jan 19, 2026

Development

Successfully merging this pull request may close these issues.

TableOperations.locationProvider is not respected by Spark

4 participants