Hive: Parquet vectorization support for Hive 3 #3980

szlta · 2022-01-25T14:06:30Z

Adding support for Parquet vectorization. As with ORC, we reuse Hive's existing vectorized Parquet read implementation. It is hooked into IcebergInputFormat, which creates the record reader from the split and fileScanTask information.

This change also reuses parts of the solution proposed in #3242 by @tprelle, so that query plans that fail to get vectorized during Hive query compilation correctly fall back to the non-vectorized read path.
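
The fallback described above comes down to a per-task decision about which reader to build. The following is a minimal, self-contained sketch of that decision under stated assumptions, not the code from this PR: `ReaderSelectionSketch`, `chooseReader`, `FileFormat` and `ReaderKind` are hypothetical names, and the two boolean inputs stand in for whatever IcebergInputFormat actually reads from the job configuration and the compiled Hive plan. Only `hive.vectorized.execution.enabled` is a real Hive property.

```java
/**
 * Hypothetical sketch: pick a record reader kind for one Iceberg scan task.
 * None of these names are Iceberg or Hive APIs; they only model the decision.
 */
final class ReaderSelectionSketch {

  /** Data file formats an Iceberg table can contain. */
  enum FileFormat { ORC, PARQUET, AVRO }

  /** Hypothetical reader categories used only for this illustration. */
  enum ReaderKind { VECTORIZED_ORC, VECTORIZED_PARQUET, ROW_BASED }

  private ReaderSelectionSketch() {
  }

  /**
   * @param vectorizationEnabled value of hive.vectorized.execution.enabled for the query
   * @param planVectorized whether Hive actually vectorized the plan at compile time (assumed input)
   * @param format file format of the data file behind the scan task
   */
  static ReaderKind chooseReader(boolean vectorizationEnabled, boolean planVectorized, FileFormat format) {
    // Having vectorization switched on is not enough: if compilation could not
    // vectorize the plan (for example, an unsupported expression or type),
    // the operators expect rows, so fall back to the row-based reader.
    if (!vectorizationEnabled || !planVectorized) {
      return ReaderKind.ROW_BASED;
    }
    switch (format) {
      case ORC:
        return ReaderKind.VECTORIZED_ORC;
      case PARQUET:
        return ReaderKind.VECTORIZED_PARQUET;
      default:
        // In this sketch no vectorized reader is wired in for other formats.
        return ReaderKind.ROW_BASED;
    }
  }
}
```

The important condition is the second one: a query can have vectorization enabled and still end up with a row-mode plan, and in that case handing Hive a vectorized batch reader would not match what the downstream operators expect.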

mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java

hive3/src/main/java/org/apache/iceberg/mr/hive/vector/ParquetSchemaFieldNameVisitor.java

pvary · 2022-01-26T12:46:10Z

Let's see if anyone wants to comment. If not, I can merge tomorrow.

szlta · 2022-01-28T11:03:37Z

I found an issue while testing column re-addition. I'll upload the fix shortly, so please don't commit yet.
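
Column re-addition is tricky because Iceberg tracks columns by field ID rather than by name: dropping `c` and adding it back creates a new field with a new ID, while older Parquet files still contain a column named `c` written under the old ID. The sketch below illustrates ID-based resolution under that assumption. `FieldIdResolutionSketch`, `Column` and `resolve` are hypothetical names; this does not reproduce `ParquetSchemaFieldNameVisitor` or the actual fix in this PR.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of ID-based column resolution. Names are illustrative
 * only and are not taken from ParquetSchemaFieldNameVisitor.
 */
final class FieldIdResolutionSketch {

  /** A column as seen in the Iceberg table schema or in a Parquet file schema. */
  static final class Column {
    final int id;
    final String name;

    Column(int id, String name) {
      this.id = id;
      this.name = name;
    }
  }

  private FieldIdResolutionSketch() {
  }

  /**
   * For every expected column, returns the physical Parquet column name to read,
   * or null when the file has no column with that field ID (the column reads as nulls).
   */
  static Map<String, String> resolve(List<Column> expected, List<Column> fileColumns) {
    Map<Integer, String> fileNameById = new HashMap<>();
    for (Column c : fileColumns) {
      fileNameById.put(c.id, c.name);
    }
    Map<String, String> result = new LinkedHashMap<>();
    for (Column c : expected) {
      // Match on field ID only: a dropped-and-re-added column keeps its name but
      // gets a new ID, so old files must not supply values for it.
      result.put(c.name, fileNameById.get(c.id));
    }
    return result;
  }
}
```

For example, with expected columns (1, "id") and (3, "c") and a file containing (1, "id") and (2, "c"), `resolve` maps `id` to `id` and `c` to null, so the re-added column reads as null for old files instead of returning the dropped column's data. A renamed column works the other way round: it keeps its ID, so it still resolves to the old physical name.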

hive3/src/main/java/org/apache/iceberg/mr/hive/vector/ParquetSchemaFieldNameVisitor.java

szlta · 2022-02-01T08:42:58Z

Thanks for the reviews and the merge!

Hive: Parquet vectorization support for Hive 3

5bb5fc3

github-actions bot added hive MR labels Jan 25, 2022

checkstyle fixes

65a04b9

pvary reviewed Jan 26, 2022

mr/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java (outdated)

pvary reviewed Jan 26, 2022

hive3/src/main/java/org/apache/iceberg/mr/hive/vector/ParquetSchemaFieldNameVisitor.java (outdated)

marton-bod reviewed Jan 26, 2022

hive3/src/main/java/org/apache/iceberg/mr/hive/vector/ParquetSchemaFieldNameVisitor.java (outdated)

review round 1

5241b4e

marton-bod approved these changes Jan 26, 2022

pvary approved these changes Jan 26, 2022

fixing issue with recreated field in schema

5f4d8e6

pvary reviewed Jan 28, 2022

hive3/src/main/java/org/apache/iceberg/mr/hive/vector/ParquetSchemaFieldNameVisitor.java

pvary reviewed Jan 28, 2022

hive3/src/main/java/org/apache/iceberg/mr/hive/vector/ParquetSchemaFieldNameVisitor.java

pvary reviewed Jan 28, 2022

hive3/src/main/java/org/apache/iceberg/mr/hive/vector/ParquetSchemaFieldNameVisitor.java (outdated)

review round2

81b70c1

pvary approved these changes Jan 28, 2022

pvary merged commit e9335c5 into apache:master Jan 31, 2022

4 participants
