Unable to set dictionary_page_offset when encoding_stats are missing #2962

@mothukur

Description


Describe the bug, including details regarding any error messages, version, and platform.

I am splitting a Parquet file into multiple files using the ParquetFileWriter.appendRowGroups API, and it fails to set the dictionary page offsets correctly in the new files. On further investigation, I found that ParquetMetadataConverter.addRowGroup assumes EncodingStats is always available. According to the format specification, encoding_stats is optional. Is it possible to remove this requirement?

https://github.com/apache/parquet-java/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L559

https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L826
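
To illustrate the relaxed check I am asking about, here is a minimal, self-contained sketch. It does not use the real parquet-java classes: `DictionaryOffsetSketch`, `EncodingStatsStub`, and `shouldWriteDictionaryOffset` are hypothetical names made up for this example. It also assumes that a dictionary page offset of 0 means "not set", which seems reasonable because a Parquet file starts with the 4-byte magic `PAR1`, so no page can begin at offset 0.

```java
// Hypothetical stand-in types; NOT the actual parquet-java API.
final class DictionaryOffsetSketch {

    /** Stand-in for the optional encoding statistics of a column chunk. */
    static final class EncodingStatsStub {
        private final boolean hasDictionaryPages;

        EncodingStatsStub(boolean hasDictionaryPages) {
            this.hasDictionaryPages = hasDictionaryPages;
        }

        boolean hasDictionaryPages() {
            return hasDictionaryPages;
        }
    }

    /**
     * Decides whether the dictionary page offset should be copied into the
     * written metadata. When stats are present, they decide. When stats are
     * missing (null), fall back to the offset itself.
     */
    static boolean shouldWriteDictionaryOffset(EncodingStatsStub stats, long dictionaryPageOffset) {
        if (stats != null) {
            return stats.hasDictionaryPages();
        }
        // Assumption: offset 0 means "no dictionary page", since the file
        // begins with the 4-byte magic and no page can start there.
        return dictionaryPageOffset > 0;
    }

    public static void main(String[] args) {
        // Stats missing, offset present: the offset should still be written.
        System.out.println(shouldWriteDictionaryOffset(null, 4L));
        // Stats missing, no offset: nothing to write.
        System.out.println(shouldWriteDictionaryOffset(null, 0L));
        // Stats present: behaviour stays driven by the stats.
        System.out.println(shouldWriteDictionaryOffset(new EncodingStatsStub(false), 4L));
    }
}
```

With a fallback along these lines, files that carry encoding_stats would behave as before, and files without them would still keep their dictionary page offsets. Whether the offset alone is a safe signal for every writer is something I am not sure about.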

Component(s)

No response
