Skip to content

Conversation

@danielcweeks
Copy link
Contributor

This adds an interface for FileIO implementations to support prefix based operations for listing and deleting.

The primary motivation is to enable supporting maintenance activities (like cleaning path directories or listing table locations) without the need to fall back to Hadoop FileSystem.

There are some notable behavioral differences between directory based and object based storage systems (e.g. for directory based storage, a the prefix must denote a directory).

Copy link
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly LGTM once we switch to Tasks.foreach, everything else were just a few nits

/**
* This method provides a "best-effort" to delete all objects under the
* given prefix.
*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing <p>?

return () -> client().listObjectsV2Paginator(request).stream()
.flatMap(r -> r.contents().stream())
.map(o -> new FileInfo(
String.format("%s://%s/%s", s3uri.scheme(), s3uri.bucket(), o.key()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: The bucket returned by S3URI may actually be an access point reference, which could break URI parsing if this were to embed it in a location.

I think the solution is to move the S3 access point mapping out of S3URI, since that class should remain simple and report what was parsed and avoid this kind of issue. Let's not fix it here, but we should revisit this.

FYI @jackye1995.

@rdblue rdblue merged commit ac8733d into apache:master Jun 22, 2022
namrathamyske pushed a commit to namrathamyske/iceberg that referenced this pull request Jul 10, 2022
namrathamyske pushed a commit to namrathamyske/iceberg that referenced this pull request Jul 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants