Fix ZTS zend signal crashes due to NULL globals #10861
Just in case: as a member, you have access to the toolbar on the right to label, assign reviewers, and so on.
Thanks for the reminder, I just forgot to assign a reviewer. I chose Arnaud because he worked on this previously.
One downside is that we may allocate thread resources for threads that may never execute a PHP request (in the context of #9649, for instance). This could be an issue if signals are delivered to many different threads unrelated to PHP.
An alternative fix would be to disable Zend Signal in ZTS builds, or to improve the ZTS implementation of Zend Signal: #10193 (comment)
A proper ZTS implementation of Zend Signal would not use thread specific variables. This would fix this issue at the same time.
WDYT?
Thanks for checking this!
Right, I see now how this is unacceptable for e.g. embedded SAPIs.
Yeah, I think the suggestion to get rid of those thread-specific variables is the best way forward. However, I don't think we can do that for stable versions, as that could be a BC break, right? What this PR will do if a signal arrives, is take the path to
diff --git a/Zend/zend_signal.c b/Zend/zend_signal.c
index 3c090ccb8c..3b9dd514bc 100644
--- a/Zend/zend_signal.c
+++ b/Zend/zend_signal.c
@@ -85,8 +85,10 @@ void zend_signal_handler_defer(int signo, siginfo_t *siginfo, void *context)
 	zend_signal_queue_t *queue, *qtmp;
 #ifdef ZTS
-	/* A signal could hit after TSRM shutdown, in this case globals are already freed. */
-	if (tsrm_is_shutdown()) {
+	/* A signal could hit after TSRM shutdown, in this case globals are already freed.
+	 * Or it could be delivered to a thread that didn't execute PHP yet.
+	 * In the latter case we act as if SIGG(active) is false. */
+	if (tsrm_is_shutdown() || !tsrm_get_ls_cache()) {
 		/* Forward to default handler handler */
 		zend_signal_handler(signo, siginfo, context);
 		return;
@@ -178,7 +180,7 @@ static void zend_signal_handler(int signo, siginfo_t *siginfo, void *context)
 	sigset_t sigset;
 	zend_signal_entry_t p_sig;
 #ifdef ZTS
-	if (tsrm_is_shutdown()) {
+	if (tsrm_is_shutdown() || !tsrm_get_ls_cache()) {
 		p_sig.flags = 0;
 		p_sig.handler = SIG_DFL;
 	} else
For master, it would get rid of the error message, and use the
Alternatively, we could backport GH-9766 to PHP-8.1+, although I still find the error message saying there is a bug weird, because as far as I understand it isn't actually a bug. What do you think?
Alternatively, you can disable Zend Signals. It's probably the best option for now. The message says there is a bug because there likely is one: a signal is handled by a thread that shouldn't handle it, and the intended signal handler is never called.
But isn't this just a side effect of how the SAPIs set up Zend signals, which is process-wide? It's very much possible that a thread which never executed PHP receives a signal, but that thread might execute PHP in the future. We don't know that upfront. This situation happens on Apache, for example.
Yes, this looks reasonable to me 👍
Force-pushed: d869ead to 9d9a3fb
Thanks! I've force-pushed the patch above into this PR. :)
Fixes GH-8789.
Fixes GH-10015.
This is one small part of the underlying bug for GH-10737, as in my attempts to reproduce the issue I constantly hit this crash easily. (The fix for the other underlying issue for that bug will follow soon.)
It's possible that a signal arrives at a thread that never handled a PHP request before. Accessing the signal globals then dereferences a NULL pointer, because the TSRM pointers for that thread aren't yet set up to point to the thread resources.
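To illustrate the failure mode described above, here is a minimal sketch in C. It is not PHP's code: `demo_signal_globals`, `demo_thread_globals`, `demo_first_request` and `demo_read_active` are hypothetical names, and a plain `_Thread_local` pointer is assumed as a stand-in for the per-thread TSRM pointer.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-thread signal globals. */
typedef struct {
    int active; /* loosely mirrors the idea behind SIGG(active) */
    int depth;
} demo_signal_globals;

/* Hypothetical stand-in for the per-thread TSRM pointer.
 * It stays NULL until the thread sets up its resources. */
static _Thread_local demo_signal_globals *demo_thread_globals = NULL;

/* Simulates the lazy resource creation that happens when a thread
 * handles its first request. */
void demo_first_request(demo_signal_globals *storage)
{
    storage->active = 1;
    storage->depth = 0;
    demo_thread_globals = storage;
}

/* Guarded read: returns -1 when the calling thread has no globals yet,
 * instead of dereferencing the NULL pointer. */
int demo_read_active(void)
{
    if (demo_thread_globals == NULL) {
        return -1;
    }
    return demo_thread_globals->active;
}
```

Calling `demo_read_active()` before `demo_first_request()` takes the guarded branch. Without the NULL check, the same read would dereference a NULL pointer, which is the crash being discussed.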
PR GH-9766 previously fixed this for master by ignoring the signal if the thread didn't handle a PHP request yet. While this fixes the crash bug, I think the solution is suboptimal for 3 reasons:
1. The signal is ignored and a message is printed saying there is a bug. However, this is not a bug at all. For example, in Apache the signal setup happens on child process creation, and the thread resource creation happens lazily when the first request is handled by the thread. So the fact that the thread resources aren't set up yet is not actually buggy behaviour.
2. Because this was believed to be buggy behaviour, I believe that fix was only applied to master, so 8.1 & 8.2 keep on crashing.
3. We can do better than ignoring the signal by acting in the same way as if the signals aren't active. This means we need to take the same path as if the TSRM had already shut down.
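The last point above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the actual handler: `sketch_globals`, `sketch_path` and `sketch_pick_path` are hypothetical names, and the two arguments stand in for what `tsrm_is_shutdown()` and `tsrm_get_ls_cache()` would report.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-thread signal globals. */
typedef struct {
    int active;
} sketch_globals;

/* The two outcomes this sketch distinguishes. */
typedef enum {
    SKETCH_FORWARD_DEFAULT, /* do not touch any per-thread globals */
    SKETCH_USE_GLOBALS      /* per-thread globals exist and may be consulted */
} sketch_path;

/* Treats "TSRM already shut down" and "thread has no globals yet" alike:
 * in both cases the globals must not be dereferenced. */
sketch_path sketch_pick_path(int tsrm_shut_down, const sketch_globals *thread_globals)
{
    if (tsrm_shut_down || thread_globals == NULL) {
        return SKETCH_FORWARD_DEFAULT;
    }
    return SKETCH_USE_GLOBALS;
}
```

The point of folding both conditions into one check is that a thread without resources is handled exactly like the post-shutdown case, so no separate "this is a bug" branch is needed.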
If this is accepted, my plan when merging into master is to undo the previous fix, which prints a message and returns.
cc @dunglas @arnaud-lb because you both worked on this issue in the past