Add interface for FileIO prefix operations and implementations #5096
…doopFileIO and S3FileIO
api/src/main/java/org/apache/iceberg/io/SupportsPrefixOperations.java
nastra left a comment
Mostly LGTM once we switch to Tasks.foreach; everything else was just a few nits.
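Iceberg's Tasks.foreach (in org.apache.iceberg.util) runs an action over a collection of items, optionally in parallel and with configurable failure handling. As a rough, stdlib-only illustration of the "best-effort" bulk delete behavior being discussed (attempt every path, collect failures instead of aborting), the sketch below uses a plain ExecutorService. It does not reproduce the Tasks API; the class BestEffortDelete, the deleteAll method, and the deleteFn parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical stdlib stand-in for a Tasks.foreach-style bulk delete:
// every path is attempted, and failures are collected instead of
// aborting the remaining deletes.
public class BestEffortDelete {

  /** Deletes all paths in parallel and returns the (sorted) paths whose deletion failed. */
  public static List<String> deleteAll(List<String> paths, Consumer<String> deleteFn, int threads)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    List<String> failed = Collections.synchronizedList(new ArrayList<>());
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (String path : paths) {
        futures.add(pool.submit(() -> {
          try {
            deleteFn.accept(path);
          } catch (RuntimeException e) {
            failed.add(path); // record the failure and keep going (best effort)
          }
        }));
      }
      for (Future<?> future : futures) {
        try {
          future.get(); // wait for every delete to finish
        } catch (ExecutionException e) {
          // Only reachable for Errors, since each task catches RuntimeException.
          throw new IllegalStateException(e.getCause());
        }
      }
    } finally {
      pool.shutdown();
    }
    List<String> result = new ArrayList<>(failed);
    Collections.sort(result); // completion order is nondeterministic; sort for stable output
    return result;
  }

  public static void main(String[] args) throws InterruptedException {
    List<String> paths = List.of("s3://bucket/a", "s3://bucket/b", "s3://bucket/c");
    List<String> failed = deleteAll(paths, path -> {
      if (path.endsWith("/b")) {
        throw new RuntimeException("simulated failure for " + path);
      }
    }, 2);
    System.out.println("failed: " + failed); // prints "failed: [s3://bucket/b]"
  }
}
```

Returning the failed paths (rather than throwing on the first error) lets a caller decide whether a partial cleanup is acceptable, which matches the "best-effort" wording in the Javadoc under review.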
/**
 * This method provides a "best-effort" to delete all objects under the
 * given prefix.
 *
Missing <p>?
return () -> client().listObjectsV2Paginator(request).stream()
    .flatMap(r -> r.contents().stream())
    .map(o -> new FileInfo(
        String.format("%s://%s/%s", s3uri.scheme(), s3uri.bucket(), o.key()),
Minor: the bucket returned by S3URI may actually be an access point reference, which could break URI parsing if it were embedded in a location.
I think the solution is to move the S3 access point mapping out of S3URI, since that class should remain simple and report only what was parsed, which avoids this kind of issue. Let's not fix it here, but we should revisit this.
FYI @jackye1995.
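A minimal sketch of the separation suggested in this thread, assuming a hypothetical ParsedS3Location record that keeps only what was parsed, a hypothetical bucket-to-access-point map, and an in-memory list of result pages standing in for the paginated S3 listing (no AWS SDK calls; requires Java 16+ for the record). The request is addressed to the mapped access point, but returned locations are rebuilt from the parsed scheme and bucket, so an access point ARN never ends up inside a location. Like the reviewed snippet, the result is returned as a lambda Iterable, so each iteration re-runs the listing lazily.

```java
import java.util.List;
import java.util.Map;

public class PrefixListingSketch {

  // Hypothetical: holds only what was parsed from the location string, with no mapping applied.
  record ParsedS3Location(String scheme, String bucket, String key) {
    static ParsedS3Location parse(String location) {
      int schemeEnd = location.indexOf("://");
      if (schemeEnd < 0) {
        throw new IllegalArgumentException("Missing scheme: " + location);
      }
      String rest = location.substring(schemeEnd + 3);
      int slash = rest.indexOf('/');
      return new ParsedS3Location(
          location.substring(0, schemeEnd),
          slash < 0 ? rest : rest.substring(0, slash),
          slash < 0 ? "" : rest.substring(slash + 1));
    }
  }

  // Hypothetical bucket -> access point mapping, applied only when issuing requests.
  static final Map<String, String> ACCESS_POINTS =
      Map.of("my-bucket", "arn:aws:s3:us-west-2:123456789012:accesspoint/my-ap");

  // Hypothetical object store: request target -> pages of object keys.
  static final Map<String, List<List<String>>> PAGES = Map.of(
      "arn:aws:s3:us-west-2:123456789012:accesspoint/my-ap",
      List.of(List.of("data/a.parquet", "data/b.parquet"), List.of("data/c.parquet")));

  static Iterable<String> listPrefix(String prefix) {
    ParsedS3Location parsed = ParsedS3Location.parse(prefix);
    // The request goes to the access point if one is configured...
    String requestTarget = ACCESS_POINTS.getOrDefault(parsed.bucket(), parsed.bucket());
    // ...but returned locations use the parsed scheme and bucket.
    return () -> PAGES.getOrDefault(requestTarget, List.of()).stream()
        .flatMap(List::stream) // flatten pages into keys
        .filter(key -> key.startsWith(parsed.key()))
        .map(key -> String.format("%s://%s/%s", parsed.scheme(), parsed.bucket(), key))
        .iterator();
  }

  public static void main(String[] args) {
    for (String location : listPrefix("s3://my-bucket/data/")) {
      System.out.println(location);
    }
    // s3://my-bucket/data/a.parquet
    // s3://my-bucket/data/b.parquet
    // s3://my-bucket/data/c.parquet
  }
}
```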
This adds an interface for FileIO implementations to support prefix-based listing and deletion.
The primary motivation is to support maintenance activities (such as cleaning up directories under a path or listing table locations) without falling back to Hadoop FileSystem.
There are some notable behavioral differences between directory-based and object-based storage systems (e.g., for directory-based storage, the prefix must denote a directory).
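To make the behavioral difference concrete, here is a minimal in-memory sketch. It assumes a simplified interface with listPrefix and deletePrefix methods returning plain location strings; the PrefixOps interface and both store classes are hypothetical and are not Iceberg's actual API. An object store matches the prefix as a raw string, so `data/a` also matches `data/ab/...`; a directory-based store treats the prefix as a directory, so only entries under `data/a/` match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class PrefixSemanticsSketch {

  // Hypothetical, simplified prefix-operations interface.
  interface PrefixOps {
    Iterable<String> listPrefix(String prefix);

    void deletePrefix(String prefix);
  }

  // Object-store semantics: the prefix is matched as a raw string prefix.
  static class ObjectStore implements PrefixOps {
    final TreeSet<String> keys = new TreeSet<>();

    boolean matches(String key, String prefix) {
      return key.startsWith(prefix);
    }

    @Override
    public Iterable<String> listPrefix(String prefix) {
      List<String> result = new ArrayList<>();
      for (String key : keys) {
        if (matches(key, prefix)) {
          result.add(key);
        }
      }
      return result;
    }

    @Override
    public void deletePrefix(String prefix) {
      keys.removeIf(key -> matches(key, prefix));
    }
  }

  // Directory semantics: the prefix must denote a directory, so only its descendants match.
  static class DirectoryStore extends ObjectStore {
    @Override
    boolean matches(String key, String prefix) {
      String dir = prefix.endsWith("/") ? prefix : prefix + "/";
      return key.startsWith(dir);
    }
  }

  public static void main(String[] args) {
    for (ObjectStore store : List.of(new ObjectStore(), new DirectoryStore())) {
      store.keys.addAll(List.of("data/a/1.parquet", "data/ab/2.parquet", "data/b/3.parquet"));
      System.out.println(store.getClass().getSimpleName() + ": " + store.listPrefix("data/a"));
    }
    // ObjectStore: [data/a/1.parquet, data/ab/2.parquet]
    // DirectoryStore: [data/a/1.parquet]
  }
}
```

The same call, deletePrefix("data/a"), would also remove `data/ab/2.parquet` on the object store but not on the directory store, which is why callers need to know which semantics apply before running maintenance deletes.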