@szlta szlta commented Jan 25, 2022

Adding support for Parquet vectorization. As was done for ORC, we are reusing Hive's existing vectorized Parquet read implementation. It is hooked into IcebergInputFormat, which creates the record reader from the split and fileScanTask information.

This change also reuses parts of the solution proposed in #3242 by @tprelle, so that query plans that fail to get vectorized during Hive query compilation correctly fall back to the non-vectorized read path.

pvary commented Jan 26, 2022

Let's see if anyone wants to comment. If not, I can merge tomorrow.

szlta commented Jan 28, 2022

I found an issue while testing column re-addition. I'll upload the fix shortly; please don't commit yet.

@pvary pvary merged commit e9335c5 into apache:master Jan 31, 2022

szlta commented Feb 1, 2022

Thanks for the reviews and the merge!
