
Conversation

@nastra (Contributor) commented Oct 7, 2022

No description provided.

@nastra nastra added this to the Iceberg 1.0.0 Release milestone Oct 7, 2022
@nastra nastra requested a review from rdblue October 7, 2022 07:55
@rdblue rdblue merged commit e2bb9ad into apache:1.0.x Oct 7, 2022
gaborkaszab pushed a commit to gaborkaszab/iceberg that referenced this pull request Oct 24, 2022
@haydenflinner commented:
If I have a table with more than 100 columns, what are the downsides since I'm above this param value? I don't see it documented here -- https://iceberg.apache.org/docs/latest/configuration/

I only ask because I have a table that is basically a collection of events. Upstream, each event has some metadata in a dict. Using a column per key in that metadata dict felt like it would compress better than each row carrying a {"key1": 123} map, since the key names are relatively static and the values would benefit from columnar compression. The majority of such columns are empty for any particular partition, which I assume adds near-zero storage/runtime overhead. For example, file 1's rows will have the metadata dict {"abc": 1234} repeated through virtually the whole GB of data, while file 2 may instead have {"def": "foo"} in most rows.
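(For context on the question above: assuming the limit being discussed here is Iceberg's `write.metadata.metrics.max-inferred-column-defaults` table property, the downside of exceeding it is that columns beyond the limit get no inferred min/max/null-count metrics in manifest files, so queries filtering on those columns cannot prune data files as effectively. This is a reading of the Iceberg configuration docs, not something stated in this PR. If that is the property in play, a sketch of raising the limit, or targeting specific columns, might look like:)

```sql
-- Hypothetical table name; raise the inferred-metrics column limit
ALTER TABLE db.events SET TBLPROPERTIES (
  'write.metadata.metrics.max-inferred-column-defaults' = '200'
);

-- Alternatively, override metrics collection per column instead of raising the cap
ALTER TABLE db.events SET TBLPROPERTIES (
  'write.metadata.metrics.column.event_ts' = 'full'
);
```

Per-column overrides may be the better fit for a wide, mostly-sparse schema like the one described, since metrics on rarely-filtered metadata columns add manifest size without improving pruning.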

@nastra nastra deleted the 1.0.x-increase-column-metrics branch June 1, 2023 13:11