Spec, Core: mark 503 as non retryable error code for Update Table #13619
Opened a thread for this: https://lists.apache.org/thread/wg8m058z1jy9dss7jotx6g8h9ko1fxho
dennishuo
left a comment
Thanks for following up on this! LGTM
I guess one thing we might consider if we're worried about the behavior change is to extract a client-configurable setting to list the error codes to consider as "UnknownState" codes on commit operations instead of only the hard-coded switch statement. But that's a tradeoff of how much complexity to expose to the caller.
Either way, I think including 503 as the default is definitely the right thing to do here, especially since #13449 means pure reads won't suffer any reduction in availability during temporary failures.
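To make the configurable-codes idea above concrete, here is a minimal sketch. It assumes a hypothetical helper class `UnknownStateCodes` that parses a comma-separated list of status codes. Neither the class nor the string format exists in Iceberg; the default list simply mirrors the codes named in this review (500, 502, 503, 504).

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration only; not part of Iceberg. */
class UnknownStateCodes {
  // Default list mirrors the codes discussed in this review.
  static final String DEFAULT_CODES = "500,502,503,504";

  private final Set<Integer> codes;

  /** Parses a comma-separated list such as "500, 503" into a set of status codes. */
  UnknownStateCodes(String commaSeparated) {
    this.codes =
        Arrays.stream(commaSeparated.split(","))
            .map(String::trim)
            .filter(s -> !s.isEmpty())
            .map(Integer::valueOf)
            .collect(Collectors.toUnmodifiableSet());
  }

  /** Returns true if a commit that received this status should be treated as "state unknown". */
  boolean isCommitStateUnknown(int status) {
    return codes.contains(status);
  }
}
```

A caller would build the helper once from its configuration and consult it in place of the hard-coded switch. The cost, as noted above, is one more knob exposed to the caller.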
huaxingao
left a comment
LGTM
open-api/rest-catalog-open-api.yaml
Outdated
  }
503:
  $ref: '#/components/responses/ServiceUnavailableResponse'
  description:
I'm not sure I agree with this change being localized here. Shouldn't we update the #/components/responses/ServiceUnavailableResponse definition for all usages of 503? I don't see why it would only apply here.
I agree. My understanding was that we only had alignment on 503 in the context of update table, since it can lead to corruption with some fairly common tools. If we are fine interpreting 503 everywhere as a status code after which some partial processing may have happened (it doesn't matter for idempotent requests), I'm happy to update it centrally.
I guess this makes sense as most of the other endpoints don't have a side-effect if retried on 503, so it shouldn't be a problem to assume that they can retry.
  throw new CommitFailedException("Commit failed: %s", error.message());
case 500:
case 502:
case 503:
Should we just change this so that any 5XX code throws CommitStateUnknownException?
We can't do it for all 5xx: 501 means Not Implemented, and IMHO we can't say the commit state is unknown in that case. Hence 500, 502, 503, and 504 are what we have. Am I missing something?
I think the concern is that we're being conservative on what we consider erroneous and very aggressive about cleaning up in these cases. We've been through this same issue multiple times now around 5XX codes and if we consider the commit state unknown for all 5XX codes, the only downside is that we leave more files around. The side-effect of being too aggressive on cleanup is that we break a table, which is the worst option.
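The trade-off described above can be sketched as follows. This is an illustration under stated assumptions, not the Iceberg implementation: `CleanupPolicySketch`, its enum, and its method names are hypothetical. It contrasts the explicit list (500, 502, 503, 504) with the "any 5XX" policy, and shows that files are cleaned up only when the commit is known to have failed.

```java
/** Hypothetical sketch for illustration only; not part of Iceberg. */
class CleanupPolicySketch {
  /** How the client interprets the result of a failed commit request. */
  enum CommitOutcome { FAILED, STATE_UNKNOWN }

  /** Explicit-list policy: only the codes named in this review mean "state unknown". */
  static boolean explicitListUnknown(int status) {
    return status == 500 || status == 502 || status == 503 || status == 504;
  }

  /** Broader policy suggested in the review: every 5XX code means "state unknown". */
  static boolean any5xxUnknown(int status) {
    return status >= 500 && status <= 599;
  }

  /**
   * Newly written files may be deleted only when the commit definitely did not happen.
   * With an unknown state the files are left in place, which can leave orphan files
   * behind but cannot break the table.
   */
  static boolean safeToDeleteNewFiles(CommitOutcome outcome) {
    return outcome == CommitOutcome.FAILED;
  }
}
```

The two policies differ for codes such as 501: the explicit list does not treat it as unknown, while the broader policy does and accepts extra leftover files as the price.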
@singhpk234 we probably should host a quick vote on this since we are changing the spec.
Thank you for the feedback, @danielcweeks! Just to double-confirm, we want to vote on the update:
core/src/main/java/org/apache/iceberg/rest/ExponentialHttpRequestRetryStrategy.java
stevenzwu
left a comment
LGTM
Can this be added into the 1.10.0 release?
Hey @mrcnc, this change will be available for 1.10
About the change
Mark 503 as non-retryable for update table, an issue that was pointed out with some very common services such as Envoy.
Details: https://lists.apache.org/thread/oqonscy1b4qlmovnjtbcohz38kgprgmq
There seems to be general alignment on treating 503 as commit state unknown, since the outcome can be as severe as table corruption.
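As a rough sketch of the resulting behavior, the mapping for update table could look like the block below. The exception classes are local stand-ins with the same names as the ones in the diff, not the Iceberg classes, and `CommitErrorHandlerSketch` is a hypothetical name. Mapping 409 to a failed commit is an assumption here, since the diff excerpt does not show the case label preceding the `CommitFailedException` line.

```java
/** Hypothetical sketch for illustration only; not the Iceberg error handler. */
class CommitErrorHandlerSketch {
  /** Local stand-in: the commit definitely did not succeed. */
  static class CommitFailedException extends RuntimeException {
    CommitFailedException(String message) {
      super(message);
    }
  }

  /** Local stand-in: the commit may or may not have been applied by the server. */
  static class CommitStateUnknownException extends RuntimeException {
    CommitStateUnknownException(String message) {
      super(message);
    }
  }

  /** Translates an error status from an update-table request into an exception. */
  static void handle(int status, String message) {
    switch (status) {
      case 409: // assumption: conflict means the commit was rejected
        throw new CommitFailedException("Commit failed: " + message);
      case 500:
      case 502:
      case 503: // newly treated as "state unknown" by this change
      case 504:
        throw new CommitStateUnknownException("Commit state unknown: " + message);
      default:
        throw new RuntimeException("Unexpected status " + status + ": " + message);
    }
  }
}
```

A caller that catches the state-unknown exception should skip cleanup of the files it wrote, because the server may already have applied the commit.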