Core, AWS, REST: Promote the S3 signing endpoint to the main spec #15112
Dev ML discussion: https://lists.apache.org/thread/2kqdqb46j7jww36wwg4txv6pl2hqq9w7

This commit promotes the S3 remote signing endpoint from an AWS-specific implementation to a first-class REST catalog API endpoint. This enables other storage providers (GCS, Azure, etc.) to eventually reuse the same signing endpoint pattern without duplicating the API definition.

OpenAPI Specification changes:
- Add `/v1/{prefix}/namespaces/{namespace}/tables/{table}/sign/{provider}` endpoint to the main REST catalog OpenAPI spec
- Define `RemoteSignRequest`, `RemoteSignResult` and `RemoteSignResponse` schemas
- Remove the separate `s3-signer-open-api.yaml` from the AWS module
- Update the Python client

Core Module changes (iceberg-core):
- Add `RemoteSignRequest` and `RemoteSignResponse` model classes, copied from the iceberg-aws module
- Add `RemoteSignRequestParser` and `RemoteSignResponseParser` for JSON serialization, copied from the iceberg-aws module
- Add `SIGNER_URI` and `SIGNER_ENDPOINT` properties to `CatalogProperties` for configuring the signing endpoint
- Add `V1_TABLE_REMOTE_SIGN` field and `remoteSign()` method to `ResourcePaths`
- Register the new endpoint in `Endpoint.java`
- Add abstract `RemoteSignerServlet` base class for remote signing tests, copied from the iceberg-aws module

AWS Module changes (iceberg-aws):
- Deprecate `S3SignRequest` and `S3SignResponse` for removal
- Deprecate `S3SignRequestParser` and `S3SignResponseParser` for removal
- Deprecate `S3ObjectMapper` for removal
- Refactor `S3SignerServlet` to extend `RemoteSignerServlet`
- Update `S3V4RestSignerClient`
| $ref: '#/components/responses/AuthenticationTimeoutResponse' | ||
| 503: | ||
| $ref: '#/components/responses/ServiceUnavailableResponse' | ||
| 5XX: |
just wondering: is it valid in Open API to use placeholders like 5xx here?
Yes, the use of 5XX as a status code in OpenAPI specifications is correct and valid:
https://spec.openapis.org/oas/v3.0.3#x4-7-16-2-patterned-fields
Thx - TIL
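For readers following this thread, here is a minimal sketch of how a client might pick a response definition when a spec mixes explicit codes such as `503` with a range key such as `5XX`. It relies on the OpenAPI rule that range keys use an uppercase `X` and that an explicit code takes precedence over the range covering it. The `resolve_response` helper and the sample `responses` mapping are hypothetical and are not part of this PR.

```python
def resolve_response(responses: dict, status: int):
    """Pick the response definition for an HTTP status code.

    Lookup order: explicit code, then range key (e.g. "5XX"), then "default".
    Returns None when nothing matches.
    """
    exact = str(status)
    if exact in responses:
        return responses[exact]
    # OpenAPI range keys are the first digit followed by uppercase "XX".
    range_key = f"{status // 100}XX"
    if range_key in responses:
        return responses[range_key]
    return responses.get("default")


# Hypothetical mapping mirroring the diff context above.
responses = {
    "503": "ServiceUnavailableResponse",
    "5XX": "ServerErrorResponse",
}

print(resolve_response(responses, 503))  # explicit code wins over the range
print(resolve_response(responses, 500))  # falls back to the 5XX range
```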
open-api/rest-catalog-open-api.yaml
Outdated
| schema: | ||
| type: string | ||
| enum: | ||
| - s3 |
Will this "lock" generated clients to only allow operating on s3 until the spec is changed? The other parts of this spec do not appear to be bound to S3... I wonder if we could relax this enum to be a free-form string with possible values defined in a way that does not require spec changes to adopt on the client and server sides. WDYT?
Agreed, I hesitated as well. I am OK with a free-form string.
Changed to free-form.
| 5XX: | ||
| $ref: '#/components/responses/ServerErrorResponse' | ||
|
|
||
| /v1/{prefix}/namespaces/{namespace}/tables/{table}/sign/{provider}: |
`{provider}`: why do we need that? A table would ideally live in one object store. If there are multiple, that's fine too; I believe we pass the absolute URI of the path, right?
I added this, because if/when a catalog server eventually has remote signing available for more than one object storage provider (say, S3 and Azure), it would be good if the server could determine how exactly to sign the request. Without this path parameter, the server would need to apply some heuristics to determine the right object store provider, and hence how to sign the request.
> the server would need to apply some heuristics to determine the right object store provider

I didn't get this part. We send the path we want signed from the client to the server as part of this request's payload, right? Can't we extract the provider from there? (Are you concerned with s3 / s3a / s3n semantics?)
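To make the two positions in this thread concrete, here is a rough sketch contrasting them: reading the provider from the trailing `{provider}` path segment versus guessing it from the URI in the payload. Everything here is hypothetical (`provider_from_path`, `provider_from_uri`, the scheme table); it is not the servlet code from this PR, and it assumes the payload may carry a plain HTTP(S) request URI rather than an `s3://` location.

```python
from urllib.parse import urlparse

# Hypothetical scheme table for the heuristic approach.
SCHEME_TO_PROVIDER = {"s3": "s3", "s3a": "s3", "s3n": "s3"}


def provider_from_path(path: str) -> str:
    """Read the trailing {provider} segment of '.../sign/{provider}'."""
    head, _, provider = path.rpartition("/sign/")
    if not head or not provider or "/" in provider:
        raise ValueError(f"not a remote signing path: {path}")
    return provider  # free-form string, no enum check


def provider_from_uri(uri: str) -> str:
    """Heuristic alternative: guess the provider from the URI scheme."""
    scheme = urlparse(uri).scheme
    if scheme not in SCHEME_TO_PROVIDER:
        raise ValueError(f"cannot infer provider from scheme '{scheme}'")
    return SCHEME_TO_PROVIDER[scheme]


print(provider_from_path("/v1/ws/namespaces/ns/tables/t/sign/s3"))
print(provider_from_uri("s3a://bucket/data/file.parquet"))
```

Under that assumption, a URI such as `https://bucket.s3.us-east-1.amazonaws.com/data/file.parquet` has scheme `https`, so the scheme table alone cannot decide and the server would have to fall back to host-name matching. The explicit path parameter avoids that guesswork.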
open-api/rest-catalog-open-api.yaml
Outdated
| If remote signing for a specific storage provider is enabled, clients must respect the following configurations when creating a remote signer client: | ||
| - `signer.uri`: the base URI of the remote signer endpoint. Optional; if absent, defaults to the catalog's base URI. | ||
| - `signer.endpoint`: the path of the remote signer endpoint. Required. Should be concatenated with `signer.uri` to form the complete URI. |
SHOULD or MUST ?
It's complicated 😄
The signer client impl uses org.apache.iceberg.rest.RESTUtil#resolveEndpoint to perform the concatenation of signer.uri and signer.endpoint.
So, signer.endpoint could also be an absolute URL, in which case, signer.uri would be ignored.
I will try to come up with a better wording.
Rephrased, lmk what you think!
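As an illustration of the resolution rule discussed in this thread, the sketch below combines `signer.uri` and `signer.endpoint` the way the comments describe: an absolute `signer.endpoint` wins outright, otherwise it is appended to `signer.uri`, which defaults to the catalog's base URI. This is not the `RESTUtil#resolveEndpoint` implementation; `resolve_signer_url` is a hypothetical helper, and the slash handling and the `http://`/`https://` check are assumptions.

```python
def resolve_signer_url(catalog_uri: str, properties: dict) -> str:
    """Build the full remote signer URL from catalog properties."""
    endpoint = properties.get("signer.endpoint")
    if not endpoint:
        raise ValueError("signer.endpoint is required")
    # Assumption: an absolute endpoint makes signer.uri irrelevant.
    if endpoint.startswith(("http://", "https://")):
        return endpoint
    # signer.uri is optional and falls back to the catalog's base URI.
    base = properties.get("signer.uri") or catalog_uri
    # Assumption: join with exactly one slash between base and path.
    return base.rstrip("/") + "/" + endpoint.lstrip("/")


catalog = "https://catalog.example.com/api"
path = "v1/ws/namespaces/ns/tables/t/sign/s3"

print(resolve_signer_url(catalog, {"signer.endpoint": path}))
print(resolve_signer_url(catalog, {
    "signer.uri": "https://signer.example.com/",
    "signer.endpoint": path,
}))
```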
| allOf: | ||
| - $ref: '#/components/schemas/Expression' | ||
|
|
||
| MultiValuedMap: |
I believe this is the equivalent of the `S3Headers` section in the S3 signer spec? Could we name it something like `ObjectStoreProviderHeader`?
I went for a more generic name because there is nothing specific to remote signing here. This component could just as well be used for something else in the spec.
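A small sketch of the data shape under discussion, assuming `MultiValuedMap` is an object whose values are arrays of strings (as `S3Headers` was in the separate signer spec; the exact schema is not shown in this excerpt). The helper name `to_multi_valued_map` and the sample headers are made up for illustration.

```python
import json
from collections import defaultdict


def to_multi_valued_map(pairs):
    """Group (name, value) pairs into a name -> list-of-values mapping."""
    grouped = defaultdict(list)
    for name, value in pairs:
        grouped[name].append(value)
    return dict(grouped)


# Hypothetical request headers; a name may legitimately repeat.
headers = to_multi_valued_map([
    ("Accept", "application/json"),
    ("Accept", "text/plain"),
    ("Host", "bucket.example.com"),
])

print(json.dumps(headers, indent=2))
```

Nothing in this shape is tied to headers or to signing, which is the argument for the generic name.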
| - `s3.secret-access-key`: secret for credentials that provide access to data in S3 | ||
| - `s3.session-token`: if present, this value should be used for as the session token | ||
| - `s3.remote-signing-enabled`: if `true` remote signing should be performed as described in the `s3-signer-open-api.yaml` specification | ||
| - `s3.remote-signing-enabled`: if `true` remote signing should be performed as described in the `RemoteSignRequest` schema section of this spec document. |
FYI I chose to keep this property specific to S3. I think that even if the signer endpoint is now generic, enablement should be performed for each specific object storage.