

@ajantha-bhat (Member) commented Aug 7, 2023

During write, decimals backed by the Parquet INT64 physical type are written as long, and decimals backed by INT32 are written as int:

```java
// Writer: choose the value writer from the decimal's physical type
case INT32:
  return ParquetValueWriters.ints(desc);
case INT64:
  return ParquetValueWriters.longs(desc);
```

During read, however, the mapping was swapped: INT32-backed decimals were read as long and INT64-backed decimals were read as int. That mismatch caused the exception.
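To make the mismatch concrete, here is a stdlib-only Java sketch. It assumes the usual Parquet convention that a decimal with precision up to 9 stores its unscaled value as INT32 and one with precision up to 18 stores it as INT64. The class, enum, and method names are hypothetical, not Iceberg's API, and the `IllegalArgumentException` simply stands in for whatever a swapped reader would hit. With the test's `decimal(7, 5)`, the unscaled INT32 value 40 decodes to 0.00040, the same value that shows up in the failing assertion later in this thread.

```java
import java.math.BigDecimal;

// Hypothetical sketch (not Iceberg's API): the reader must decode a decimal's
// unscaled value with the same primitive width the writer chose.
final class DecimalPhysicalType {
  enum Primitive { INT32, INT64, FIXED_LEN_BYTE_ARRAY }

  // Writer side: precision <= 9 fits in INT32, precision <= 18 fits in INT64.
  static Primitive primitiveFor(int precision) {
    if (precision <= 9) {
      return Primitive.INT32;
    } else if (precision <= 18) {
      return Primitive.INT64;
    }
    return Primitive.FIXED_LEN_BYTE_ARRAY;
  }

  // Reader side: INT32 pages hold ints and INT64 pages hold longs, so the
  // read path has to mirror the write path; a swapped mapping fails here.
  static BigDecimal read(Primitive primitive, Number unscaled, int scale) {
    switch (primitive) {
      case INT32:
        if (!(unscaled instanceof Integer)) {
          throw new IllegalArgumentException("INT32 decimal must be read as int");
        }
        return BigDecimal.valueOf(unscaled.intValue(), scale);
      case INT64:
        if (!(unscaled instanceof Long)) {
          throw new IllegalArgumentException("INT64 decimal must be read as long");
        }
        return BigDecimal.valueOf(unscaled.longValue(), scale);
      default:
        throw new UnsupportedOperationException("binary-backed decimals are not modeled");
    }
  }

  public static void main(String[] args) {
    Primitive p = primitiveFor(7);                 // decimal(7, 5) from the test schema
    System.out.println(p + " " + read(p, 40, 5));  // INT32 0.00040
  }
}
```

The fix in this PR amounts to the same symmetry: whatever width `primitiveFor` picks on write is the width `read` must expect.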

Fixes: #8245

```diff
           required(24, "couch rope", Types.IntegerType.get())))),
-      optional(2, "slide", Types.StringType.get()));
+      optional(2, "slide", Types.StringType.get()),
+      optional(25, "foo", Types.DecimalType.of(7, 5)));
```
@ajantha-bhat (Member Author):

Even though I fixed the actual issue, the test now gets through the read but fails on the decimal field in the equals() comparison.

Line 220:

```java
Assert.assertEquals("Record " + recordNum + " should match expected", expected, actual);
```

It looks like Avro's GenericData does not recognize a BigDecimal datum. Was the data supposed to be a Fixed instead of a Decimal?

```text
Unknown datum type java.math.BigDecimal: 0.00040
org.apache.avro.AvroRuntimeException: Unknown datum type java.math.BigDecimal: 0.00040
	at org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:933)
	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:892)
	at org.apache.avro.generic.GenericData.compare(GenericData.java:1178)
	at org.apache.avro.generic.GenericData.compare(GenericData.java:1154)
	at org.apache.avro.generic.GenericData$Record.equals(GenericData.java:287)
	at org.junit.Assert.isEquals(Assert.java:133)
	at org.junit.Assert.equalsRegardingNull(Assert.java:129)
	at org.junit.Assert.assertEquals(Assert.java:112)
	at org.apache.iceberg.spark.data.TestParquetAvroReader.testCorrectness(TestParquetAvroReader.java:220)
```

I need to dig deeper into this.
@Fokko, do you know anything about this? (Tagging you since you are involved in all three projects: Avro, Parquet, and Iceberg.)

@ajantha-bhat (Member Author) commented Aug 7, 2023:

I narrowed it down: the failure happens only when the schema field is optional. With a required field, the test passes.

```java
required(25, "foo", Types.DecimalType.of(7, 5)));
```

For now, I changed the field to required so that this PR can be merged.
The optional-field case still needs to be investigated in a separate PR.
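One possible explanation for why only the optional field fails, assuming Avro's `GenericData.compare` follows the path shown in the stack trace: for a non-union field it falls back to `Comparable.compareTo`, which `BigDecimal` implements, while for a union (an optional field is `["null", decimal]`) it first calls `resolveUnion`, which needs a schema name for the datum and cannot find one for `BigDecimal` when no decimal logical-type conversion is registered. The stdlib-only Java model below is a hypothetical simplification of that control flow, not Avro's code: `UnionCompareModel`, `Kind`, and the `decimalConversionRegistered` flag are made-up names, branches are compared by name rather than by index, and `IllegalStateException` stands in for `AvroRuntimeException`.

```java
import java.math.BigDecimal;

// Hypothetical, simplified model of the comparison path in the stack trace.
// It is NOT Avro's implementation; it only mirrors its branch structure.
final class UnionCompareModel {
  // REQUIRED: a plain decimal field; OPTIONAL: a ["null", decimal] union.
  enum Kind { REQUIRED_DECIMAL, OPTIONAL_DECIMAL }

  // Models schema-name lookup during union resolution. Without a registered
  // decimal conversion, a BigDecimal datum has no schema name.
  static String schemaName(Object datum, boolean decimalConversionRegistered) {
    if (datum == null) {
      return "null";
    }
    if (datum instanceof BigDecimal && decimalConversionRegistered) {
      return "decimal";
    }
    throw new IllegalStateException(
        "Unknown datum type " + datum.getClass().getName() + ": " + datum);
  }

  @SuppressWarnings("unchecked")
  static int compare(Object a, Object b, Kind kind, boolean decimalConversionRegistered) {
    if (a == b) {
      return 0;
    }
    switch (kind) {
      case OPTIONAL_DECIMAL: {
        // Union path: resolve each side's branch first (simplified to names).
        String branchA = schemaName(a, decimalConversionRegistered);
        String branchB = schemaName(b, decimalConversionRegistered);
        if (!branchA.equals(branchB)) {
          return branchA.compareTo(branchB);
        }
        return compare(a, b, Kind.REQUIRED_DECIMAL, decimalConversionRegistered);
      }
      default:
        // Non-union path: fall back to Comparable, which BigDecimal implements.
        return ((Comparable<Object>) a).compareTo(b);
    }
  }

  public static void main(String[] args) {
    BigDecimal x = new BigDecimal("0.00040");
    BigDecimal y = new BigDecimal("0.00040");
    System.out.println(compare(x, y, Kind.REQUIRED_DECIMAL, false)); // 0
    try {
      compare(x, y, Kind.OPTIONAL_DECIMAL, false);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // Unknown datum type java.math.BigDecimal: 0.00040
    }
    System.out.println(compare(x, y, Kind.OPTIONAL_DECIMAL, true)); // 0
  }
}
```

If this model holds, registering a decimal conversion on the `GenericData` instance used by `Record.equals`, for example `GenericData.get().addLogicalTypeConversion(new Conversions.DecimalConversion())`, might let the optional case compare cleanly; that is an untested assumption, not something verified against this test.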

@Fokko (Contributor):

@ajantha-bhat Yes, I'm pretty comfortable with Avro. This is an interesting one. In Avro, ints and longs use the same zigzag encoding. The difference between optional and required is that an optional field is a union with null, so its value is preceded by an integer (the union branch index) that indicates whether the value is null or not: https://github.com/apache/iceberg/blob/master/python/pyiceberg/avro/reader.py#L250-L262
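The two encoding rules mentioned above can be illustrated with a stdlib-only Java sketch of the Avro binary format: int and long share one zig-zag varint encoding, and a union value is prefixed by its zero-based branch index, itself written as a long. The class and method names are hypothetical, and for simplicity the optional value is modeled as `["null", long]` rather than a decimal.

```java
import java.io.ByteArrayOutputStream;

// Sketch of two Avro binary-encoding rules (hypothetical helper names):
// int and long share the zig-zag varint encoding, and a union value is
// prefixed with its zero-based branch index, written as a long.
final class AvroZigZagSketch {

  // Zig-zag maps signed to unsigned so small magnitudes stay short: 0->0, -1->1, 1->2, -2->3.
  static long zigZag(long n) {
    return (n << 1) ^ (n >> 63);
  }

  // Writes the zig-zag value as a base-128 varint, low 7 bits first.
  static void writeLong(ByteArrayOutputStream out, long n) {
    long v = zigZag(n);
    while ((v & ~0x7FL) != 0) {
      out.write((int) ((v & 0x7F) | 0x80));
      v >>>= 7;
    }
    out.write((int) v);
  }

  // An int is widened and encoded exactly like a long.
  static byte[] encodeInt(int n) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeLong(out, n);
    return out.toByteArray();
  }

  static byte[] encodeLong(long n) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeLong(out, n);
    return out.toByteArray();
  }

  // Optional field as the union ["null", long]: branch 0 is null, branch 1 carries the value.
  static byte[] encodeOptionalLong(Long value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    if (value == null) {
      writeLong(out, 0);      // branch index 0, no payload follows
    } else {
      writeLong(out, 1);      // branch index 1 ...
      writeLong(out, value);  // ... followed by the value itself
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    System.out.println(java.util.Arrays.toString(encodeInt(40)));           // [80]
    System.out.println(java.util.Arrays.toString(encodeLong(40L)));         // [80]
    System.out.println(java.util.Arrays.toString(encodeOptionalLong(40L))); // [2, 80]
  }
}
```

So 40 encodes to the single byte 80 whether it is written as an int or a long, and the optional form only adds the branch-index byte 2 in front.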

@Fokko (Contributor) left a review comment:

It would be cool if we could find out what's going on with the optional field, but this is a great catch, @ajantha-bhat.

@Fokko merged commit 15d68da into apache:master on Aug 7, 2023.

Linked issue (may be closed by merging this pull request): Exception while reading the decimal data from ParquetAvroValueReaders
