
@amogh-jahagirdar
Contributor

@amogh-jahagirdar amogh-jahagirdar commented Mar 13, 2024

As part of adding encryption support, in #9592 we added some new FileIO APIs, namely

newInputFile(ManifestFile)
newInputFile(DataFile)
newInputFile(DeleteFile)

The overridden implementation in EncryptedFileIO is correct, but the default implementations of these new APIs in FileIO should pass in the length, since it is always known from the Iceberg metadata.

Without this, FileIO implementations that end up calling the default implementations of these new APIs will make extra requests to the object store or file system to determine the length, which we can avoid.
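To illustrate the point above, here is a minimal, self-contained sketch. It does not use the real Iceberg classes: `InputFile`, `ManifestFile`, and `FileIO` below are simplified stand-ins that only borrow the names, and `CountingFileIO`, `LengthAwareInputFileDemo`, the example path, and the 4096-byte size are hypothetical. It assumes a store where asking for a file's length costs one HEAD-style request unless the length was supplied up front.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified, hypothetical stand-ins for the Iceberg types named in this PR.
interface InputFile {
  String location();
  long getLength();
}

interface ManifestFile {
  String path();
  long length(); // size as recorded in table metadata
}

interface FileIO {
  InputFile newInputFile(String path);

  // Length-aware overload; falls back to the path-only variant by default.
  default InputFile newInputFile(String path, long length) {
    return newInputFile(path);
  }

  // Sketch of the fixed default: forward the length known from metadata.
  default InputFile newInputFile(ManifestFile manifest) {
    return newInputFile(manifest.path(), manifest.length());
  }
}

// Hypothetical object-store FileIO that counts simulated HEAD requests.
class CountingFileIO implements FileIO {
  static final long UNKNOWN = -1L;
  final AtomicInteger headRequests = new AtomicInteger();

  @Override
  public InputFile newInputFile(String path) {
    return open(path, UNKNOWN);
  }

  @Override
  public InputFile newInputFile(String path, long length) {
    return open(path, length);
  }

  private InputFile open(String path, long knownLength) {
    return new InputFile() {
      @Override
      public String location() {
        return path;
      }

      @Override
      public long getLength() {
        if (knownLength != UNKNOWN) {
          return knownLength; // no round trip needed
        }
        headRequests.incrementAndGet(); // simulated HEAD request
        return 4096L; // pretend the store reported this size
      }
    };
  }
}

class LengthAwareInputFileDemo {
  static ManifestFile manifest(String path, long length) {
    return new ManifestFile() {
      @Override
      public String path() {
        return path;
      }

      @Override
      public long length() {
        return length;
      }
    };
  }

  // Returns how many simulated HEAD requests one getLength() call triggers.
  static int headRequests(boolean passLength) {
    CountingFileIO io = new CountingFileIO();
    ManifestFile m = manifest("s3://bucket/metadata/m0.avro", 4096L);
    InputFile in = passLength ? io.newInputFile(m) : io.newInputFile(m.path());
    in.getLength();
    return io.headRequests.get();
  }

  public static void main(String[] args) {
    System.out.println("path only:   " + headRequests(false)); // prints 1
    System.out.println("with length: " + headRequests(true));  // prints 0
  }
}
```

In this sketch, the path-only call costs one simulated request per file, while the metadata-aware default costs none, which is the saving the description refers to.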

@github-actions github-actions bot added the API label Mar 13, 2024
@amogh-jahagirdar amogh-jahagirdar force-pushed the fix-default-inputfile-impls branch from 2b6300e to 6c0b6f8 Compare March 13, 2024 18:35
@amogh-jahagirdar amogh-jahagirdar requested a review from rdblue March 13, 2024 18:48
Member

@ajantha-bhat ajantha-bhat left a comment

Minor comment about the test validation.

Thanks for fixing it.

I think we need to decide whether this warrants a 1.5.1 release, because one extra I/O for each manifest and data file read seems like a problem to me.

@ajantha-bhat
Member

ping @danielcweeks, @rdblue, @nastra

@amogh-jahagirdar
Contributor Author

amogh-jahagirdar commented Mar 19, 2024

@ajantha-bhat I replied with my thoughts on the Trino PR. In short, I don't think a patch release is required. I'll summarize why here:

To be clear, there is no extra I/O in practice for newInputFile(DataFile), since that code path does not appear to be exercised yet in the latest release. If I'm wrong about that, I would definitely change my position.

What remains is newInputFile(ManifestFile), where there would be an extra I/O during engine planning. That I/O is an extra HEAD request to the object store, which typically takes low double-digit milliseconds, and only a subset of manifests is read during planning. So I don't think it's significant enough to warrant a patch release, but I can be convinced otherwise if folks hit a latency regression from this in their workloads.

@amogh-jahagirdar amogh-jahagirdar force-pushed the fix-default-inputfile-impls branch from 9e094ca to 47e5e90 Compare April 4, 2024 19:06
@amogh-jahagirdar amogh-jahagirdar force-pushed the fix-default-inputfile-impls branch from 47e5e90 to 01a2868 Compare April 4, 2024 19:08
Contributor

@Fokko Fokko left a comment

This is a great catch, @amogh-jahagirdar. Thanks for fixing this 👍

@amogh-jahagirdar
Contributor Author

Thanks for the reviews @ajantha-bhat and @Fokko ! Merging

@amogh-jahagirdar amogh-jahagirdar merged commit 25c909b into apache:main Apr 4, 2024
@danielcweeks danielcweeks added this to the Iceberg 1.5.1 milestone Apr 6, 2024
sasankpagolu pushed a commit to sasankpagolu/iceberg that referenced this pull request Oct 27, 2024
zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024