
Conversation

@nastra (Contributor) commented Aug 7, 2023

fixes #8247

The issue (as can be seen in #8247) is that InternalRowWrapper uses the underlying Java class of a particular type (UUID.class in this case) to perform a cast. Since Spark doesn't support UUIDs, they are mapped to strings, and the cast fails.

To solve the issue, we pass the underlying Iceberg schema as a struct to InternalRowWrapper, so that we can convert to the right type before performing the cast. A similar approach is used for Flink in RowDataWrapper.

Unfortunately, this required fixing the issue across all Spark versions, because RecordWrapperTest is shared by all Spark versions. The fix itself is inside InternalRowWrapper; all other changes come from passing the underlying Schema as a struct to InternalRowWrapper.
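A minimal sketch of the idea behind the fix (hypothetical names, not the actual InternalRowWrapper code): look up the Iceberg type for each position and convert Spark's string representation back to a java.util.UUID before any cast to the Iceberg Java class.

```java
import java.util.UUID;
import java.util.function.Function;

// Sketch only: pick a converter per Iceberg type so that a UUID value,
// which Spark stores as a string, is converted back to java.util.UUID
// before the wrapper casts it to the Iceberg Java class.
public class UuidConversionSketch {
  static Function<Object, Object> converterFor(String icebergTypeName) {
    if ("uuid".equals(icebergTypeName)) {
      // Spark has no UUID type, so the value arrives as a string
      return value -> UUID.fromString(value.toString());
    }
    // every other type passes through unchanged
    return Function.identity();
  }

  public static void main(String[] args) {
    Object raw = "f79c3e09-677c-4bbd-a479-3f349cb785e7"; // value read from Spark
    Object converted = converterFor("uuid").apply(raw);
    System.out.println(converted.getClass().getSimpleName()); // prints "UUID"
  }
}
```

The real patch builds such converters once per field when the wrapper is constructed, which is why the Iceberg struct now has to be passed in alongside the Spark StructType.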

@nastra nastra marked this pull request as draft August 7, 2023 13:22
@nastra nastra force-pushed the spark-uuids branch 3 times, most recently from 18e15e5 to 60f39d1 Compare August 7, 2023 15:33
@manuzhang (Member) commented:

Which issue is this fixing?

@nastra nastra force-pushed the spark-uuids branch 2 times, most recently from 1355e85 to 3fe816b Compare August 8, 2023 07:45
-        required(116, "dec_38_10", Types.DecimalType.of(38, 10)) // maximum precision
-    );
+        required(116, "dec_38_10", Types.DecimalType.of(38, 10)), // maximum precision
+        optional(117, "uuid", Types.UUIDType.get()));
@nastra (Contributor, Author) commented:
This schema is used by Spark tests across all versions, so unfortunately we need to apply the fix for all Spark versions in a single PR.

@nastra nastra requested review from Fokko and aokolnychyi August 8, 2023 07:50
@nastra nastra marked this pull request as ready for review August 8, 2023 09:06
@nastra (Contributor, Author) commented Aug 10, 2023:

@aokolnychyi could you take a look at this please?

@nastra nastra added this to the Iceberg 1.4.0 milestone Aug 30, 2023

   @SuppressWarnings("unchecked")
-  InternalRowWrapper(StructType rowType) {
+  InternalRowWrapper(StructType rowType, Types.StructType icebergStruct) {
A reviewer (Contributor) commented:
Should we have a precondition check that icebergStruct.fields().length is equal to types.length?

@nastra (Contributor, Author) replied:
I've added one
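The added check presumably resembles the following hand-rolled sketch (the actual patch likely uses a Preconditions-style utility, and the message text here is an assumption): reject construction when the Spark row type and the Iceberg struct disagree on the number of fields.

```java
// Sketch of the constructor precondition discussed above: fail fast with a
// clear message if the Spark StructType and the Iceberg struct do not
// describe the same number of fields.
public class StructSizeCheck {
  static void checkSameFieldCount(int sparkFields, int icebergFields) {
    if (sparkFields != icebergFields) {
      throw new IllegalArgumentException(
          "Invalid struct: expected " + icebergFields
              + " fields but Spark row type has " + sparkFields);
    }
  }

  public static void main(String[] args) {
    checkSameFieldCount(3, 3); // matching structs: no exception
    try {
      checkSameFieldCount(3, 2);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected mismatched structs");
    }
  }
}
```

Failing in the constructor keeps the per-row wrapping path free of length checks, since the two structs can never drift apart after construction.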

@nastra nastra changed the title Spark 3.4: Fix issue when partitioning by UUID Spark: Fix issue when partitioning by UUID Oct 16, 2023
@nastra nastra force-pushed the spark-uuids branch 3 times, most recently from 161538c to ed424a9 Compare October 17, 2023 07:26
@singhpk234 (Contributor) left a review:
LGTM, Thanks @nastra !

@nastra nastra modified the milestones: Iceberg 1.5.0, Iceberg 1.6.0 Jan 25, 2024
@amogh-jahagirdar amogh-jahagirdar self-requested a review May 16, 2024 15:07
@amogh-jahagirdar (Contributor) left a review:
Sorry for the delayed review, overall I think the fix is good, just a naming nit.

@nastra (Contributor, Author) commented May 16, 2024:

thanks for the reviews @hililiwei, @singhpk234, @amogh-jahagirdar

@nastra nastra merged commit bd046f8 into apache:main May 16, 2024
@nastra nastra deleted the spark-uuids branch May 16, 2024 16:49
sasankpagolu pushed a commit to sasankpagolu/iceberg that referenced this pull request Oct 27, 2024
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024


Successfully merging this pull request may close these issues.

Spark: Support UUID partitioned tables

6 participants