
Conversation

@nastra (Contributor) commented Aug 7, 2023

fixes #8247

The issue (as can be seen in #8247) is that InternalRowWrapper uses the underlying Java class of a particular type (UUID.class in this case) to perform a cast. Since Spark doesn't support UUIDs, they are mapped to strings, and the cast fails.

To solve the issue, we pass the underlying Iceberg schema as a struct to InternalRowWrapper, so that we can convert to the right type before performing the cast. A similar approach is used for Flink in RowDataWrapper.

Unfortunately, this required fixing the issue across all Spark versions, because RecordWrapperTest is shared by all Spark versions. The fix itself is inside InternalRowWrapper; all other changes come from passing the underlying Schema as a struct to InternalRowWrapper.
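A minimal sketch of the idea behind the fix (hypothetical names, not the actual InternalRowWrapper code): look up the Iceberg type for each position and convert Spark's string representation back to a java.util.UUID before any cast to the Iceberg Java class.

```java
import java.util.UUID;
import java.util.function.Function;

// Sketch only: pick a converter per Iceberg type so that a UUID value,
// which Spark stores as a string, is converted back to java.util.UUID
// before the wrapper casts it to the Iceberg Java class.
public class UuidConversionSketch {
  static Function<Object, Object> converterFor(String icebergTypeName) {
    if ("uuid".equals(icebergTypeName)) {
      // Spark has no UUID type, so the value arrives as a string
      return value -> UUID.fromString(value.toString());
    }
    // every other type passes through unchanged
    return Function.identity();
  }

  public static void main(String[] args) {
    Object raw = "f79c3e09-677c-4bbd-a479-3f349cb785e7"; // value read from Spark
    Object converted = converterFor("uuid").apply(raw);
    System.out.println(converted.getClass().getSimpleName()); // prints "UUID"
  }
}
```

The real patch builds such converters once per field when the wrapper is constructed, which is why the Iceberg struct now has to be passed in alongside the Spark StructType.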

@nastra nastra marked this pull request as draft August 7, 2023 13:22
@nastra nastra force-pushed the spark-uuids branch 3 times, most recently from 18e15e5 to 60f39d1 Compare August 7, 2023 15:33
@manuzhang (Member) commented:

Which issue is this fixing?

@nastra nastra force-pushed the spark-uuids branch 2 times, most recently from 1355e85 to 3fe816b Compare August 8, 2023 07:45
-        required(116, "dec_38_10", Types.DecimalType.of(38, 10)) // maximum precision
-    );
+        required(116, "dec_38_10", Types.DecimalType.of(38, 10)), // maximum precision
+        optional(117, "uuid", Types.UUIDType.get()));
@nastra (Contributor, Author) commented:
This schema is used by Spark tests across all versions, so unfortunately we need to apply the fix for all Spark versions in a single PR.

@nastra nastra requested review from Fokko and aokolnychyi August 8, 2023 07:50
@nastra nastra marked this pull request as ready for review August 8, 2023 09:06
@nastra (Contributor, Author) commented Aug 10, 2023:

@aokolnychyi could you take a look at this please?

@nastra nastra added this to the Iceberg 1.4.0 milestone Aug 30, 2023

   @SuppressWarnings("unchecked")
-  InternalRowWrapper(StructType rowType) {
+  InternalRowWrapper(StructType rowType, Types.StructType icebergStruct) {
A reviewer (Contributor) commented:
Should we have a precondition check that icebergStruct.fields().length is equal to types.length?

@nastra (Contributor, Author) replied:
I've added one
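The added check presumably resembles the following hand-rolled sketch (the actual patch likely uses a Preconditions-style utility, and the message text here is an assumption): reject construction when the Spark row type and the Iceberg struct disagree on the number of fields.

```java
// Sketch of the constructor precondition discussed above: fail fast with a
// clear message if the Spark StructType and the Iceberg struct do not
// describe the same number of fields.
public class StructSizeCheck {
  static void checkSameFieldCount(int sparkFields, int icebergFields) {
    if (sparkFields != icebergFields) {
      throw new IllegalArgumentException(
          "Invalid struct: expected " + icebergFields
              + " fields but Spark row type has " + sparkFields);
    }
  }

  public static void main(String[] args) {
    checkSameFieldCount(3, 3); // matching structs: no exception
    try {
      checkSameFieldCount(3, 2);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected mismatched structs");
    }
  }
}
```

Failing in the constructor keeps the per-row wrapping path free of length checks, since the two structs can never drift apart after construction.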

@nastra nastra changed the title Spark 3.4: Fix issue when partitioning by UUID Spark: Fix issue when partitioning by UUID Oct 16, 2023
@nastra nastra force-pushed the spark-uuids branch 3 times, most recently from 161538c to ed424a9 Compare October 17, 2023 07:26
@singhpk234 (Contributor) left a review:
LGTM, Thanks @nastra !

@nastra nastra modified the milestones: Iceberg 1.5.0, Iceberg 1.6.0 Jan 25, 2024
@amogh-jahagirdar amogh-jahagirdar self-requested a review May 16, 2024 15:07
@amogh-jahagirdar (Contributor) left a review:
Sorry for the delayed review, overall I think the fix is good, just a naming nit.

@nastra (Contributor, Author) commented May 16, 2024:

thanks for the reviews @hililiwei, @singhpk234, @amogh-jahagirdar

@nastra nastra merged commit bd046f8 into apache:main May 16, 2024
@nastra nastra deleted the spark-uuids branch May 16, 2024 16:49
sasankpagolu pushed a commit to sasankpagolu/iceberg that referenced this pull request Oct 27, 2024
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024


Successfully merging this pull request may close these issues.

Spark: Support UUID partitioned tables

6 participants