-
Notifications
You must be signed in to change notification settings - Fork 3k
[Core | Spark | Integrations] : Fix kryo serialization failure for FileIO #5437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b7cc12c to
3532aa3
Compare
kbendick
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM.
You might want to put a small comment in the code above the creation of tableSpecificCatalogProperties and mention that ImmutableMap isn't used to support Kryo serde as that's the normal style.
Totally up to you though.
e058629 to
95bc117
Compare
rdblue
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the right way to fix the problem.
The approach in this PR is to avoid passing ImmutableMap into initializeFileIO, but that makes S3FileIO vulnerable to this problem any time the caller doesn't know not to pass that type of map. Instead, S3FileIO should be responsible for making a copy of the incoming map that can be serialized with Kryo.
aws/src/test/java/org/apache/iceberg/aws/glue/TestGlueCatalog.java
Outdated
Show resolved
Hide resolved
95bc117 to
ecedfc7
Compare
|
This looks like the right fix to me. Thanks, @singhpk234! Could you also add tests for the other FileIO implementations? |
8f40d83 to
159e84a
Compare
nastra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I would define the version number for com.twitter:chill-java:0.10.0 in versions.propsand have build.gradle only use the actual artifact (without the version): testImplementation 'com.twitter:chill-java
spark/v3.3/spark/src/test/java/org/apache/iceberg/TestFileIOSerialization.java
Outdated
Show resolved
Hide resolved
1f3d860 to
3aef768
Compare
3aef768 to
6118aa8
Compare
6118aa8 to
58e2082
Compare
spark/v3.3/spark/src/test/java/org/apache/iceberg/TestFileIOSerialization.java
Outdated
Show resolved
Hide resolved
|
Thanks, @singhpk234! |
|
Thanks @rdblue @kbendick @nastra @amogh-jahagirdar for your awesome reviews :) !!! |
|
@singhpk234 I am still facing this issue using |
|
@akshay-kokate-06 what issue are you seeing exactly? Could you provide the stack trace please? |
About the change
presently spark queries fails when using S3fileIO & GlueCatalog when being used with KryoSerializer ref #5414 (comment). This happens because Immutable map part of S3FileIO properties is not serializable with Kryo and seralizers for it not available in Twitter Chill lib as well (which also spark uses). This PR attemps to fix this by using java collections instead which spark will be able to ser/de as it's serializer is present in Chill lib.
Solves #5414
Testing Done
Manual Test
UT to validate the change fix the issue, without the fix the UT fails with exception mentioned in ticket #5414 (comment)