Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persister.transaction does not rollback on panic #2857

Open
4 of 6 tasks
mitar opened this issue Nov 15, 2021 · 13 comments
Open
4 of 6 tasks

Persister.transaction does not rollback on panic #2857

mitar opened this issue Nov 15, 2021 · 13 comments
Labels
bug Something is not working.

Comments

@mitar
Copy link
Contributor

mitar commented Nov 15, 2021

Preflight checklist

Describe the bug

I am not sure if this is by design, but when looking at Hydra's code, Persister.transaction does not handle panic and does not call c.TX.Rollback on it. I think it would be cleaner if it would be handled here instead of potentially never handling it because panic got handled somewhere else and never killed the process/context.

Reproducing the bug

This is a design issue.

Relevant log output

No response

Relevant configuration

No response

Version

observed on current master branch

On which operating system are you observing this issue?

No response

In which environment are you deploying?

No response

Additional Context

See here an example of a similar function in Gorm which does similar handling, but handles panic as well.

@mitar mitar added the bug Something is not working. label Nov 15, 2021
@zepatrik
Copy link
Member

zepatrik commented Nov 15, 2021

True, it would be way cleaner to do rollback on panic. Anyone working on this, please make the changes in https://github.com/ory/x/blob/master/popx/transaction.go and then use that function in Hydra instead of the current transaction implementation.

@mitar
Copy link
Contributor Author

mitar commented Nov 15, 2021

I am not familiar with popx, but I would suggest that this function here is changed to:

func (p *Persister) transaction(ctx context.Context, f func(ctx context.Context, c *pop.Connection) error) (err error) {
	isNested := true
        panicked := true

	c, ok := ctx.Value(transactionContextKey).(*pop.Connection)
	if !ok {
		isNested = false

		c, err = p.conn.WithContext(ctx).NewTransaction()
		if err != nil {
			return errorsx.WithStack(err)
		}
		defer func() {
			if panicked || err != nil {
				if !isNested {
					// It would be better if there was RollbackUnlessCommitted.
					rollbackErr := errorsx.WithStack(c.TX.Rollback())
					if rollbackErr != nil {
						err = rollbackErr
					}
				}
			}
		}()
	}

	err = f(context.WithValue(ctx, transactionContextKey, c), c)
	if err == nil {
		// commit if there is no wrapping transaction
		if !isNested {
			err = errorsx.WithStack(c.TX.Commit())
		}
        }

	return
}

@aeneasr
Copy link
Member

aeneasr commented Nov 22, 2021

Does Ory Hydra actually have a panic? If not, I'd close this as a dupe of ory/fosite#637 and also my comment ory/fosite#637 (comment)

@mitar
Copy link
Contributor Author

mitar commented Nov 22, 2021

Are you saying that in no library Hydra is calling, they ever use panic or no code there will ever panic?

Moreover, Go panics also when there are other issues, like nil pointer dereferencing or interface assertion mismatch.

@mitar
Copy link
Contributor Author

mitar commented Nov 22, 2021

Are you saying that in no library Hydra is calling, they ever use panic or no code there will ever panic?

Hydra calls into pop, which calls into sglx, which has many uses of panic:

https://github.com/jmoiron/sqlx/search?q=panic

@aeneasr
Copy link
Member

aeneasr commented Nov 22, 2021

What I meant was if you actually have observed a panic or if this is a hypothetical fix. Please also do keep in mind that if we begin a transaction, the transaction is not "commited" until you actually do "commit". So a panic will automagically rollback the transaction if the connection is closed (on panic). So I don't really understand what bug case this would solve? Or did I miss a bug here (which is why I asked)?

@mitar
Copy link
Contributor Author

mitar commented Nov 22, 2021

I have observed panics while developing our own storage backend and when I had bugs. Hopefully I have ironed them out, but I also observed that it didn't cleanup things well. So I worry that if I have any more bug around, things might be problematic in production.

So a panic will automagically rollback the transaction if the connection is closed (on panic).

Unless you are using connection pooling, then transaction might be kept around and the next query hitting the same connection might be in for a surprise.

@aeneasr
Copy link
Member

aeneasr commented Nov 22, 2021

Unless you are using connection pooling, then transaction might be kept around and the next query hitting the same connection might be in for a surprise.

Oh now I see the issue, @zepatrik also made me aware that the problem is connection pooling in combination with panic recovery in the middleware, in combination with a panic in the storage adapter. The good news is, in Ory Hydra we do not have panic recovery because we actually want the system to crash in order to have a high impact issue which must be resolved, as panics are always very bad due to lack of controlled execution flow! Still possible to address this in Fosite though. As I said already though, the refactor will be substantial and it only covers very rare edge cases, so it's a risk / reward decision that has to be made.

@mitar
Copy link
Contributor Author

mitar commented Nov 22, 2021

The good news is, in Ory Hydra we do not have panic recovery because we actually want the system to crash in order to have a high impact issue which must be resolved, as panics are always very bad due to lack of controlled execution flow!

Oh, I am not saying that you should handle the panic, only that you should call rollback in defer and then keep the panic propagating. I understand that killing the process probably addresses the connection pooling issue (although not sure what happens if somebody is using pg_bouncer, I hope it cleans up the transaction automatically when connection dies), but I think that it could help with storage backends which are not transactional (like mongo) but one wants to take advantage of transaction API to manually cleanup things. But that is pure speculation.

At least for Hydra, the change necessary to call rollback on panic is relatively small, I posted it above. Not sure why we would not do that. For Fosite, I can look into that a bit and see how much work it really is.

@aeneasr
Copy link
Member

aeneasr commented Nov 23, 2021

Ok, sounds good :)

@mitar
Copy link
Contributor Author

mitar commented Nov 24, 2021

Done for Fosite in ory/fosite#638.

@mitar
Copy link
Contributor Author

mitar commented Jun 19, 2023

So I made this for Fosite, but for Hydra the change still has to be done, see: #2857 (comment)

@kmherrmann
Copy link
Contributor

thanks, reopening!

@kmherrmann kmherrmann reopened this Jun 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is not working.
Projects
None yet
Development

No branches or pull requests

4 participants