
Fix ZTS zend signal crashes due to NULL globals #10861


Closed
wants to merge 1 commit into from


@nielsdos (Member) commented Mar 15, 2023

Fixes GH-8789.
Fixes GH-10015.

This is one small part of the underlying bug for GH-10737: while trying to reproduce that issue, I kept hitting this crash. (The fix for the other underlying issue for that bug will follow soon.)

A signal can arrive at a thread that has never handled a PHP request. Accessing the signal globals then dereferences a NULL pointer, because the thread's TSRM pointers haven't been set up to point to its thread resources yet.

PR GH-9766 previously fixed this on master by ignoring the signal if the thread hadn't handled a PHP request yet. While this fixes the crash, I think the solution is suboptimal for three reasons:

  1. The signal is ignored and a message is printed saying there is a bug.
    However, this is not a bug at all. For example, in Apache the signal
    setup happens on child process creation, while the thread resources
    are created lazily when the thread handles its first request. So the
    fact that the thread resources aren't set up yet is not actually
    buggy behaviour.

  2. Because this was considered buggy behaviour, I believe that fix was
    only applied to master, so 8.1 and 8.2 keep crashing.

  3. We can do better than ignoring the signal, by acting as if the
    signals aren't active. This means taking the same path as if the
    TSRM had already shut down.

If this is accepted, my plan when merging into master is to undo the previous fix, which prints a message and returns.

cc @dunglas @arnaud-lb because you both worked on this issue in the past

@devnexen (Member)

Just in case: as a member, you have access to the toolbar on the right to add labels, assign reviewers, and so on.

@nielsdos nielsdos requested a review from arnaud-lb March 16, 2023 07:29
@nielsdos (Member Author)

Thanks for the reminder; I just forgot to assign a reviewer. I chose Arnaud because he worked on this previously.
I don't see an appropriate label though.

@arnaud-lb (Member) left a comment


One downside is that we may allocate thread resources for threads that may never execute a PHP request (in the context of #9649, for instance). This could be an issue if signals are delivered to many different threads unrelated to PHP.

An alternative fix would be to disable Zend Signal in ZTS builds, or to improve the ZTS implementation of Zend Signal: #10193 (comment)

A proper ZTS implementation of Zend Signal would not use thread specific variables. This would fix this issue at the same time.

WDYT?

@nielsdos (Member Author)

Thanks for checking this!

> One downside is that we may allocate thread resources for threads that may never execute a PHP request (in the context of #9649, for instance). This could be an issue if signals are delivered to many different threads unrelated to PHP.

Right, I see now how this is unacceptable for e.g. embedded SAPIs.

> An alternative fix would be to disable Zend Signal in ZTS builds, or to improve the ZTS implementation of Zend Signal: #10193 (comment)

> A proper ZTS implementation of Zend Signal would not use thread specific variables. This would fix this issue at the same time.

Yeah, I think the suggestion to get rid of those thread-specific variables is the best way forward. However, I don't think we can do that for stable versions, as that could be a BC break, right?
I would like to at least fix the problem, especially the crashing, in stable versions.

With this PR, if a signal arrives, it takes the path to zend_signal_handler because SIGG(active) is false.
GH-9766, on the other hand, returns and prints an error message.
Can't we solve this for stable versions by checking whether the thread resources are already allocated (in the same way as GH-9766) and, if they aren't, going to zend_signal_handler and taking the same path there as in the "TSRM already shut down" case? That would at least stop the crashes, and I think it's semantically correct.
My suggested patch would look like this, and would apply to 8.1+:

diff --git a/Zend/zend_signal.c b/Zend/zend_signal.c
index 3c090ccb8c..3b9dd514bc 100644
--- a/Zend/zend_signal.c
+++ b/Zend/zend_signal.c
@@ -85,8 +85,10 @@ void zend_signal_handler_defer(int signo, siginfo_t *siginfo, void *context)
 	zend_signal_queue_t *queue, *qtmp;
 
 #ifdef ZTS
-	/* A signal could hit after TSRM shutdown, in this case globals are already freed. */
-	if (tsrm_is_shutdown()) {
+	/* A signal could hit after TSRM shutdown, in this case globals are already freed.
+	 * Or it could be delivered to a thread that didn't execute PHP yet.
+	 * In the latter case we act as if SIGG(active) is false. */
+	if (tsrm_is_shutdown() || !tsrm_get_ls_cache()) {
 		/* Forward to default handler handler */
 		zend_signal_handler(signo, siginfo, context);
 		return;
@@ -178,7 +180,7 @@ static void zend_signal_handler(int signo, siginfo_t *siginfo, void *context)
 	sigset_t sigset;
 	zend_signal_entry_t p_sig;
 #ifdef ZTS
-	if (tsrm_is_shutdown()) {
+	if (tsrm_is_shutdown() || !tsrm_get_ls_cache()) {
 		p_sig.flags = 0;
 		p_sig.handler = SIG_DFL;
 	} else
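The handler-selection check in the patch above can be modelled in isolation. In this sketch, `demo_shutdown`, `demo_ls_cache`, and `demo_handlers` are hypothetical stand-ins for `tsrm_is_shutdown()`, `tsrm_get_ls_cache()`, and `SIGG(handlers)`. Only the shape of the condition mirrors the diff: when TSRM is gone *or* the thread has no resources yet, fall back to `SIG_DFL` instead of reading the per-thread table.

```c
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*demo_handler_t)(int);

/* Hypothetical stand-in for zend_signal_entry_t. */
typedef struct {
    int flags;
    demo_handler_t handler;
} demo_entry;

/* Hypothetical stand-ins for the TSRM state queried in the patch. */
static bool demo_shutdown = false;                     /* ~ tsrm_is_shutdown() */
static _Thread_local void *demo_ls_cache = NULL;       /* ~ tsrm_get_ls_cache() */
static _Thread_local demo_entry *demo_handlers = NULL; /* ~ SIGG(handlers) */

/* Chooses the entry for signo: if TSRM has shut down or this thread has
 * no resources yet, use SIG_DFL rather than touching per-thread data. */
static demo_entry demo_pick_entry(int signo) {
    demo_entry e;
    if (demo_shutdown || demo_ls_cache == NULL) {
        e.flags = 0;
        e.handler = SIG_DFL;
    } else {
        e = demo_handlers[signo - 1];
    }
    return e;
}
```

Folding both conditions into one branch means the "never ran PHP" case reuses the already existing shutdown path, rather than adding a separate code path with its own message.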

For master, the patch would instead remove the error message and use the tsrm_is_managed_thread() function.

Alternatively, we could backport GH-9766 to PHP-8.1+, although I still find it odd that the error message says there is a bug, because as far as I understand, this isn't actually a bug.

What do you think?

@dunglas (Member) commented Mar 17, 2023

Alternatively, you can disable Zend Signals. That's probably the best option for now.

The message says there is a bug because there likely is one: a signal is handled by a thread that shouldn't handle it, and the intended signal handler is never called.

@nielsdos (Member Author)

> The message says there is a bug because there likely is one: a signal is handled by a thread that shouldn't handle it, and the intended signal handler is never called.

But isn't this just a side effect of how the SAPIs set up Zend signals, which is process-wide? A thread that has never executed PHP can easily receive a signal, even though that thread might execute PHP in the future. We can't know that up front. This happens on Apache, for example.

@arnaud-lb (Member)

> With this PR, if a signal arrives, it takes the path to zend_signal_handler because SIGG(active) is false.
> GH-9766, on the other hand, returns and prints an error message.
> Can't we solve this for stable versions by checking whether the thread resources are already allocated (in the same way as GH-9766) and, if they aren't, going to zend_signal_handler and taking the same path there as in the "TSRM already shut down" case? That would at least stop the crashes, and I think it's semantically correct.
> My suggested patch would look like this, and would apply to 8.1+:

Yes, this looks reasonable to me 👍

@nielsdos (Member Author)

Thanks! I've force-pushed the patch above to this PR. :)

@nielsdos nielsdos requested a review from arnaud-lb March 17, 2023 18:54