@szlta
Contributor

@szlta szlta commented Oct 8, 2021

Currently there are 3 struct-like data holder types that ORC integrates with in Iceberg: Record (Iceberg), RowData (Flink) and InternalRow (Spark).

While there is an OrcRowWriter implementation for all three types, only the first two rely on OrcValueWriters; Spark uses SparkValueWriters, which has a different API ("int ordinal" is part of the nonNullWrite method). There is also no common abstract StructWriter class comparable to how Parquet/Avro writes are implemented (e.g. see https://github.com/apache/iceberg/blob/master/parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueWriters.java#L539 )

As a result, the current ORC write code contains a lot of duplication and fits Iceberg's conventions less well than the Parquet and Avro write code paths, which are much more in harmony with each other.

Thus I propose to bring ORC writes up to the standard too, as we can already see some effects of this shortcoming in this PR: #2935 where a common StructWriter<?> type could help implement positional delete writer creation for ORC.
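To illustrate the shape being proposed, here is a minimal sketch of a shared struct writer, assuming simplified stand-in types. Every name in it (ColumnSketch, ValueWriterSketch, StructWriterSketch, ArrayStructWriter, ListStructWriter) is hypothetical and is not the actual Iceberg or ORC API; the point is only that the value writer receives the field value directly (no "int ordinal" parameter) and that the engine-specific part shrinks to a single field accessor.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an ORC column vector; it only collects values. */
final class ColumnSketch {
  final List<Object> values = new ArrayList<>();
}

/** Simplified value writer: the value is passed in directly, with no ordinal. */
interface ValueWriterSketch<T> {
  void nonNullWrite(int rowId, T data, ColumnSketch output);

  /** Null handling lives in one place instead of in every engine's writer. */
  default void write(int rowId, T data, ColumnSketch output) {
    if (data == null) {
      output.values.add(null);
    } else {
      nonNullWrite(rowId, data, output);
    }
  }
}

/** Hypothetical common base class: iterates the fields and delegates to the child writers. */
abstract class StructWriterSketch<S> {
  private final List<ValueWriterSketch<Object>> writers;

  protected StructWriterSketch(List<ValueWriterSketch<Object>> writers) {
    this.writers = writers;
  }

  /** The only engine-specific piece: how to read field number index out of a struct. */
  protected abstract Object get(S struct, int index);

  public void writeRow(int rowId, S struct, List<ColumnSketch> columns) {
    for (int i = 0; i < writers.size(); i++) {
      writers.get(i).write(rowId, get(struct, i), columns.get(i));
    }
  }
}

/** Example subclass for a struct type backed by an array. */
final class ArrayStructWriter extends StructWriterSketch<Object[]> {
  ArrayStructWriter(List<ValueWriterSketch<Object>> writers) {
    super(writers);
  }

  @Override
  protected Object get(Object[] struct, int index) {
    return struct[index];
  }
}

/** Example subclass for a struct type backed by a list. */
final class ListStructWriter extends StructWriterSketch<List<Object>> {
  ListStructWriter(List<ValueWriterSketch<Object>> writers) {
    super(writers);
  }

  @Override
  protected Object get(List<Object> struct, int index) {
    return struct.get(index);
  }
}
```

In this sketch, a Record, RowData or InternalRow writer would each correspond to one small subclass that overrides only get, while code that needs "some struct writer" (such as the positional delete writer mentioned above) could depend on the common base type.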

@openinx openinx self-requested a review October 11, 2021 06:46
pvary pushed a commit to pvary/iceberg that referenced this pull request Oct 11, 2021
@szlta szlta changed the title from "Refactor ORC value writers" to "ORC: Refactor value writers" Oct 12, 2021
@openinx
Member

openinx commented Oct 12, 2021

Thanks @szlta for the contribution! The PR looks good to me overall; I just left several comments.

@github-actions github-actions bot added the ORC label Oct 13, 2021
Member

@openinx openinx left a comment

LGTM now, thanks @szlta for the contribution!

@openinx openinx merged commit dbfa71e into apache:master Oct 14, 2021
@szlta
Contributor Author

szlta commented Oct 14, 2021

Thanks for the thorough review @openinx!
