-
Notifications
You must be signed in to change notification settings - Fork 3k
Core: Fix for respecting custom location providers in SerializableTable #12564 #14280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Add missing .hasMessage() checks to assertThatThrownBy calls - Ensures compliance with AssertThatThrownByWithMessageCheck rule
- Add .hasMessage() check to multi-line assertThatThrownBy call - Ensures full compliance with AssertThatThrownByWithMessageCheck rule
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the change @przemekd.
reading the issue it seems like this has been impacting multiple users, we have a 1.10.1 window open can you please post it in iceberg mailing thread for 1.10.1
https://lists.apache.org/thread/9tjs060vzs6nnghk2brcw0hv89h4drp0 for visibility.
| * | ||
| * @param <T> the type of the result | ||
| */ | ||
| public class Try<T> implements Serializable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered vavr - https://github.com/vavr-io/vavr/tree/main/vavr it has similar utility, i think you use its getUnchecked and its apache licenced.
we might need some other community members to weigh in here too, my observation is in past, on an orthogonal context, we preferred a lib like failsafe rather than wriring something on our own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we can be even simpler to begin with here. I poked around locally, and I do agree we need some abstraction almost like a Try monad which encapsulates either a result of some kind or a failure. I originally was going to see if we could just use an abstraction over Tasks but ultimately, we need something to capture the state that it was a failure, not just handle the failure. We could use a separate AtomicReference and a boolean for indicating that a location provider could not be resolved, but that's pretty unclean.
That said, I don't think we need a full blown public class with all these APIs, at least as of now.
I think we could just have a package private class like the following with the main write API being of(SerializableSupplier<T> supplier) and the main read API being the getOrThrow.
class Try<T> implements Serializable {
public static <T> of(SerializableSupplier<T> supplier) {//does what capture does}
public static T getOrThrow()....
}
I think this will remove the need for a separate test suite just for this class. Until we know we'll need this for a wider set of use cases, I think we should just keep this as minimal as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@singhpk234 @amogh-jahagirdar I looked at Vavr, but for such limited use I chose not to add it. I pared the class down; it now catches Throwable and rethrows using sneakyThrow, following the same pattern as Vavr.
|
|
||
| // The exception should be thrown when locationProvider() is actually called | ||
| assertThatThrownBy(serializableTable::locationProvider).isSameAs(failure); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a test for round trip serialization to ensure the behavior is the same?
TestHelpers.KryoHelpers.roundTripSerialize(serializableTable)
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @przemekd , I agree with the general idea behind the fix, I just think we could have a simpler, and hidden Try construct to begin with.
| * | ||
| * @param <T> the type of the result | ||
| */ | ||
| public class Try<T> implements Serializable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we can be even simpler to begin with here. I poked around locally, and I do agree we need some abstraction almost like a Try monad which encapsulates either a result of some kind or a failure. I originally was going to see if we could just use an abstraction over Tasks but ultimately, we need something to capture the state that it was a failure, not just handle the failure. We could use a separate AtomicReference and a boolean for indicating that a location provider could not be resolved, but that's pretty unclean.
That said, I don't think we need a full blown public class with all these APIs, at least as of now.
I think we could just have a package private class like the following with the main write API being of(SerializableSupplier<T> supplier) and the main read API being the getOrThrow.
class Try<T> implements Serializable {
public static <T> of(SerializableSupplier<T> supplier) {//does what capture does}
public static T getOrThrow()....
}
I think this will remove the need for a separate test suite just for this class. Until we know we'll need this for a wider set of use cases, I think we should just keep this as minimal as possible.
| public static <T> Try<T> capture(ThrowingSupplier<T> supplier) { | ||
| try { | ||
| return success(supplier.get()); | ||
| } catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd handle Throwable for this particular case
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @przemekd . I think it mostly looks great, just had a comment that I think we should separate out the kryo assertions into a separate test method.
| Table kryoDeserialized = KryoHelpers.roundTripSerialize(serializableTable); | ||
| assertThatThrownBy(kryoDeserialized::locationProvider) | ||
| .isInstanceOf(RuntimeException.class) | ||
| .hasMessage("location provider failure"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have kryo test in a separate test? In case something breaks in the future it's easier to isolate if it's a kryo issue or not instead of having to poke into the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing, it makes total sense 👍🏻
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great thank you @przemekd , looks good from my side. I'll leave this up for a bit for @singhpk234 @geruh to take another look
geruh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! the Try class now effectively defers the location provider outcomes during serialization. Internal testing of this workflow has been successful!
|
Thank you @przemekd and @geruh @singhpk234 for reviewing. I'll go ahead and merge. This is something that should be cherry-picked onto the 1.10.x branch |
…ble apache#12564 (apache#14280) (cherry picked from commit e667f64)
Closes #11527 in a way that does not break fixes provided by changes needed to fix #9029.
It eagerly calculates the location provider field, but wraps that into a newly implemented wrapper that only throws an error if code really needs to use that location provider. In case table needs to be read only by other processes the error is not thrown, even if there is an error with getting the location provider object.