Spark: Fix issue when partitioning by UUID #8250
Which issue is this fixing?
- required(116, "dec_38_10", Types.DecimalType.of(38, 10)) // maximum precision
- );
+ required(116, "dec_38_10", Types.DecimalType.of(38, 10)), // maximum precision
+ optional(117, "uuid", Types.UUIDType.get()));
This is used by Spark tests across all versions, so unfortunately we need to apply the fix for all Spark versions in a single PR.
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/InternalRowWrapper.java
@aokolnychyi could you take a look at this please?
  @SuppressWarnings("unchecked")
- InternalRowWrapper(StructType rowType) {
+ InternalRowWrapper(StructType rowType, Types.StructType icebergStruct) {
Should we have a precondition check that icebergStruct.fields().size() is equal to types.length?
I've added one
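For readers following the thread, here is a minimal sketch of the kind of length check discussed above. It is not the code from this PR: `WrapperSketch` and its string-based type lists are hypothetical stand-ins so the example runs without Iceberg or Spark on the classpath. In the Iceberg codebase a check like this would more likely be written with Guava's `Preconditions.checkArgument`.

```java
import java.util.List;

// Hypothetical stand-in for a row wrapper; not the Iceberg InternalRowWrapper.
final class WrapperSketch {
  private final List<String> sparkTypes;
  private final List<String> icebergTypes;

  WrapperSketch(List<String> sparkTypes, List<String> icebergTypes) {
    // Fail fast if the two schemas do not describe the same number of fields,
    // since fields are later matched up by position.
    if (sparkTypes.size() != icebergTypes.size()) {
      throw new IllegalArgumentException(
          String.format(
              "Invalid length: Spark struct type (%d) != Iceberg struct type (%d)",
              sparkTypes.size(), icebergTypes.size()));
    }
    this.sparkTypes = List.copyOf(sparkTypes);
    this.icebergTypes = List.copyOf(icebergTypes);
  }

  int size() {
    return sparkTypes.size();
  }
}
```

Checking in the constructor means a mismatched schema is reported once, at wrapper creation, rather than as an index error on the first row.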
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/InternalRowWrapper.java
singhpk234
left a comment
LGTM, thanks @nastra!
amogh-jahagirdar
left a comment
Sorry for the delayed review. Overall I think the fix is good, just a naming nit.
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/InternalRowWrapper.java
Thanks for the reviews, @hililiwei, @singhpk234, @amogh-jahagirdar.
Fixes #8247.

The issue (as can be seen in #8247) is that `InternalRowWrapper` uses the underlying Java class of a particular type (`UUID.class` in this case) to do a cast. Since Spark doesn't support UUIDs, they are mapped as strings and the cast fails.

To solve the issue, we're passing the underlying Iceberg schema as a struct to `InternalRowWrapper`, so that we can convert to the right type before doing the cast. A similar approach is used for Flink in `RowDataWrapper`.

Unfortunately, this fix had to be applied across all Spark versions, because `RecordWrapperTest` is used by all Spark versions. The fix itself is inside `InternalRowWrapper`; all other changes are due to passing the underlying schema as a struct to `InternalRowWrapper`.
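The failure mode and the shape of the fix can be illustrated with a small, self-contained sketch. It does not reproduce the PR's code: `TypeId` and `RowWrapperSketch` are hypothetical stand-ins, and a plain `Object[]` plays the role of the engine row. The idea is only that the wrapper consults the table-side type per position and converts a string-encoded UUID before the caller's class cast runs.

```java
import java.util.List;
import java.util.UUID;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in for the table-side (Iceberg) type of a field.
enum TypeId {
  STRING_TYPE,
  UUID_TYPE
}

// Hypothetical stand-in for a row wrapper; not the Iceberg InternalRowWrapper.
final class RowWrapperSketch {
  private final List<Function<Object, Object>> converters;
  private Object[] row;

  RowWrapperSketch(List<TypeId> tableTypes) {
    // Build one converter per position from the table-side types.
    this.converters =
        tableTypes.stream().map(RowWrapperSketch::converter).collect(Collectors.toList());
  }

  private static Function<Object, Object> converter(TypeId type) {
    if (type == TypeId.UUID_TYPE) {
      // The engine hands over a string, so parse it before anyone casts to UUID.
      return value -> value == null ? null : UUID.fromString(value.toString());
    }
    return Function.identity();
  }

  RowWrapperSketch wrap(Object[] newRow) {
    this.row = newRow;
    return this;
  }

  <T> T get(int pos, Class<T> javaClass) {
    // Convert first, then cast; casting the raw string to UUID would throw.
    return javaClass.cast(converters.get(pos).apply(row[pos]));
  }
}
```

With a row such as `new Object[] {"f79c3e09-677c-4bbd-a479-3f349cb785e7", "a"}` and types `UUID_TYPE, STRING_TYPE`, `get(0, UUID.class)` returns a `UUID`, whereas calling `UUID.class.cast(...)` directly on the raw string throws a `ClassCastException`.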