-
Notifications
You must be signed in to change notification settings - Fork 3k
ORC: Add compression properties #4273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| } | ||
|
|
||
| return new Context(stripeSize, blockSize); | ||
| static String lookupValue(Configuration conf, OrcConf orcConf, String icebergPropertyKey) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something is wrong if this is looking up an Iceberg table property like ORC_STRIPE_SIZE_BYTES in a Hadoop Configuration. Iceberg settings should be pulled from table properties, not from Hadoop.
|
I don't think that we can merge this until we fix the ORC configuration PR that recently went in. See #3810 (comment) for more details. The configuration context should not use Hadoop Configuration. |
a93bda2 to
d54c377
Compare
|
Updated based on #4291. |
| String codecAsString = config.getOrDefault(OrcConf.COMPRESS.getAttribute(), ORC_COMPRESSION_DEFAULT); | ||
| codecAsString = config.getOrDefault(ORC_COMPRESSION, codecAsString); | ||
|
|
||
| String compressionStrategy = config.getOrDefault(OrcConf.COMPRESSION_STRATEGY.getAttribute(), | ||
| ORC_COMPRESSION_STRATEGY_DEFAULT); | ||
| compressionStrategy = config.getOrDefault(ORC_COMPRESSION_STRATEGY, compressionStrategy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just follow the same config parse approach as the above config keys did ? I'm saying using the generic PropertyUtil.propertyAsString.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides, please also use the CompressionStrategy enum to validate whether this configured value is a valid one ? ( Just as this comment said: https://github.com/apache/iceberg/pull/4273/files#r836994947 )
|
|
||
| return new Context(stripeSize, blockSize, vectorizedRowBatchSize); | ||
| String codecAsString = config.getOrDefault(OrcConf.COMPRESS.getAttribute(), ORC_COMPRESSION_DEFAULT); | ||
| codecAsString = config.getOrDefault(ORC_COMPRESSION, codecAsString); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will suggest to use the similar approach as parquet did: try to parse the string into a CompressionKind so that we can validate whether this value is a valid one in the correct place.
| String codecAsString = config.getOrDefault(DELETE_ORC_COMPRESSION, dataContext.codecAsString()); | ||
|
|
||
| String compressionStrategy = config.getOrDefault(DELETE_ORC_COMPRESSION_STRATEGY, | ||
| dataContext.compressionStrategy()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Please use the same approach to parse the codec & compression-strategy .
openinx
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just left serval comments, and I think we are recommended to add unit tests as this PR did to avoid regression .
|
Updated based on comments. |
|
|
||
| Configuration configuration = confField.get(); | ||
| Assert.assertEquals(String.valueOf(blockSizeBytes), configuration.get(OrcConf.BLOCK_SIZE.getAttribute())); | ||
| Assert.assertEquals(String.valueOf(stripeSizeBytes), configuration.get(OrcConf.STRIPE_SIZE.getAttribute())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed this to using OrcConf to get value directly so no need to know property key detail.
| @Test | ||
| public void testOrcTableProperties() throws Exception { | ||
| Long stripeSizeBytes = 32L * 1024 * 1024; | ||
| Long blockSizeBytes = 128L * 1024 * 1024; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed Long to long because what OrcConf.XXX#getLong returns is a long, and I got
Ambiguous method call. Both assertEquals(Object, Object) in Assert and assertEquals(long, long) in Assert match
when using both Long and long as parameters for Assert#assertEquals.
|
Thanks all for your review! |
This PR adds compression-codec and compression-strategy of ORC in
TableProperties, @openinx @hililiwei would you help review this? Thanks!