Remove Zend signal handling #5591

Closed
wants to merge 11 commits into from

Conversation

alexdowad
Contributor

Zend signal handling was added in PHP 5.4 to protect against signal handlers running at inopportune times and causing bugs. The details are in this RFC: https://wiki.php.net/rfc/zendsignals

In short, the handlers for 7 signals of concern are saved and replaced with a generic handler which delegates to the specific handlers. Inside 'critical sections', however, the generic handler puts all information regarding a signal on a queue and just returns. At the end of the critical section, all pending signals on the queue are processed and the specific handlers are called.

However, in PHP 7.1, the vm_interrupt flag was added which also protects against script execution timeouts, etc. occurring at wrong times. This eliminated most of the use cases of Zend signal handling. The one which has remained until now is accessing shared memory in OPCache. By eliminating the use of Zend signal handling there, there will be no need for Zend signal handling any more and a subsystem can be removed. This will make the codebase smaller and easier to understand.

The funny thing about the whole idea of Zend signal handling is... it seems to duplicate what Unix kernels already do. Each process/thread in Unix already has a signal mask which can be used to block signals from being delivered at inopportune times. If a signal arrives when it is masked, the kernel will store it and only deliver it once it is unmasked. So rather than storing signals on a queue and unqueueing them later, we can just let the kernel do its job.

@nikic
Member

nikic commented May 18, 2020

Can you please check how many sigprocmask syscalls we need per included file in opcache (assuming hot cache)?

I'm fine with the general direction here on the assumption that switching to sigprocmask doesn't introduce undue overhead.

cc @dstogov

@dstogov
Member

dstogov commented May 18, 2020

Zend signal handling allows signal masking/delaying without sigprocmask() syscalls and corresponding user<->kernel context switches. It would be good to check the performance impact of this removal on some application with big code base (many included files) and minimal execution time.

@javiereguiluz
Contributor

You may consider using the Symfony Demo application for these performance tests. It's a real-world PHP app with lots of files and it's super easy to download and run it:

$ composer create-project symfony/symfony-demo my_project
$ cd my_project/
$ php -S localhost:8000 -t public/

@alexdowad
Contributor Author

Zend signal handling allows signal masking/delaying without sigprocmask() syscalls and corresponding user<->kernel context switches. It would be good to check the performance impact of this removal on some application with big code base (many included files) and minimal execution time.

Certainly, I'll do so.

@alexdowad
Contributor Author

Just trying to reproduce CI failures locally. While originally working on this code, I was green on Travis CI and Appveyor before making some final changes and opening the PR... but can't figure out just now what I did to break things. Still working on it.

@nikic nikic closed this May 18, 2020
@nikic nikic reopened this May 18, 2020
@alexdowad
Contributor Author

Travis CI failure is apt-get failing to install packages.

@alexdowad
Contributor Author

This exact same commit had also passed on Appveyor before: https://ci.appveyor.com/project/alexdowad/php-src/builds/32945360

@nikic
Member

nikic commented May 18, 2020

Travis failure is spurious, AppVeyor failure is a failure on master, looks like file cache got broken.

@alexdowad
Contributor Author

Travis failure is spurious, AppVeyor failure is a failure on master, looks like file cache got broken.

Thanks. I'll keep working to figure it out. Will proceed with performance assessment after this issue is clear.

@alexdowad
Contributor Author

Travis failure is spurious, AppVeyor failure is a failure on master, looks like file cache got broken.

Hmm. Tried with current master and didn't get that file cache failure.

The funny thing is that on Windows, the macros which were removed all expanded to nothing, and the ones which have been added expand to do {} while(0). I'm very surprised that this patch could affect anything on Windows.

Then there is also the fact that the exact same commit was green on a different Appveyor test run.

I'm keeping my mind open to all possibilities, but trying more test runs with the same commit.

@nikic
Member

nikic commented May 19, 2020

@alexdowad To be clear, the AppVeyor failure is not related to your changes. Maybe @cmb69 can take a look at why this happens.

@alexdowad
Contributor Author

When running simple test files at the CLI with OPCache enabled, the macros in this patch make 10 syscalls per script.

@alexdowad
Contributor Author

Here is what I am using to measure sigprocmask syscalls:

From b45ed64538636a6b4d0dc95407293b643572c4c6 Mon Sep 17 00:00:00 2001
From: Alex Dowad <alexinbeijing@gmail.com>
Date: Tue, 19 May 2020 11:37:50 +0200
Subject: [PATCH] Measure procsigmask syscalls used to protect shm in opcache

---
 ext/opcache/ZendAccelerator.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/ext/opcache/ZendAccelerator.c b/ext/opcache/ZendAccelerator.c
index 8036ee0e11..23c36000b6 100644
--- a/ext/opcache/ZendAccelerator.c
+++ b/ext/opcache/ZendAccelerator.c
@@ -120,6 +120,7 @@ zend_bool fallback_process = 0; /* process uses file cache fallback */
 
 #ifdef HAVE_SIGPROCMASK
 static sigset_t mask_all_signals;
+static int _syscalls = 0;
 
 # if ZEND_DEBUG
 #  ifdef ZTS
@@ -139,9 +140,11 @@ static sigset_t mask_all_signals;
 # define BLOCK_ALL_SIGNALS() \
        sigset_t _oldmask; \
        DEBUG_BLOCK_ALL_SIGNALS(); \
+       _syscalls += 1; \
        MASK_ALL_SIGNALS()
 # define UNBLOCK_ALL_SIGNALS() \
        DEBUG_UNBLOCK_ALL_SIGNALS(); \
+       _syscalls += 1; \
        UNMASK_ALL_SIGNALS()
 
 # ifdef ZTS
@@ -3153,6 +3156,8 @@ void accel_shutdown(void)
        zend_jit_shutdown();
 #endif
 
+       printf("*** TOTAL SYSCALLS = %d\n", _syscalls);
+
        zend_optimizer_shutdown();
 
        zend_accel_blacklist_shutdown(&accel_blacklist);
@@ -4769,6 +4774,8 @@ static int accel_finish_startup(void)
                return SUCCESS;
        }
 
+       printf("Started OPCache\n");
+
        if (ZCG(accel_directives).preload && *ZCG(accel_directives).preload) {
 #ifdef ZEND_WIN32
                zend_accel_error(ACCEL_LOG_ERROR, "Preloading is not supported on Windows");
-- 
2.17.1

@alexdowad
Contributor Author

When I run the symfony-demo app as suggested by @javiereguiluz, hit the main page once (and get a 404 error), then stop the app, the above code shows a total of... 2 syscalls.

I'm running it with:

11:41 ~prog/php/symfony-test % php -d extension_dir=../php-src/modules -d zend_extension=opcache -d opcache.enable=1 -d opcache.enable_cli=1 -S localhost:8000 -t public

Any suggestions of better .INI settings to use, to make it a better test?

@alexdowad
Contributor Author

Did a little microbenchmark calling sigprocmask 10,000 times in a loop. Average was 560 nanoseconds per call.

@cmb69
Member

cmb69 commented May 19, 2020

@nikic, the test fails, because as of yesterday's changes JIT may be enabled for file_cache_only, while formerly it never was in that case, so zend_file_cache_script_store bails out now. Not sure how to solve that; either allow jitted opcodes to be written to file cache, or, probably better, do not enable JIT if file_cache_only is requested.

@nikic
Member

nikic commented May 19, 2020

@dstogov Can you comment on how file cache and JIT are supposed to interact? It probably does not make sense to cache JITed code in file cache right now (we probably don't generate PIC code), but shouldn't file cache still work for caching opcodes, even if JIT is enabled?

@alexdowad alexdowad force-pushed the dont-use-zend-signals branch 5 times, most recently from 7db5b99 to 3f04331 Compare May 19, 2020 19:10
@dstogov
Member

dstogov commented May 19, 2020

@dstogov Can you comment on how file cache and JIT are supposed to interact? It probably does not make sense to cache JITed code in file cache right now (we probably don't generate PIC code), but shouldn't file cache still work for caching opcodes, even if JIT is enabled?

Currently, scripts are stored in file cache after they are already cached in SHM and JIT-ed, and unfortunately, file cache can't store opcode handlers overridden by JIT. So file caching is disabled.

In general, it's possible to perform JIT after file caching, but this would require writing to file cache under exclusive lock (zend_shared_alloc_unlock).

@nikic if you have ideas how to fix this, please let me know.

@alexdowad alexdowad force-pushed the dont-use-zend-signals branch 2 times, most recently from f89b554 to ae06ae7 Compare May 20, 2020 09:33
@alexdowad
Contributor Author

OK, I think I have fixed the bugs now. Trying again to assess impact on performance.

Generated 1000 random source files, each of which just does a few arithmetic ops. Then another which includes all of them in a loop.

I may have been printing out the counter from the wrong place before. Now doing it from a shutdown function for opcache extension. On the test script which includes 1000 other source files, it shows 4010 syscalls.

Just running it at the shell with time, it's not possible to notice any difference in runtime with the calls to sigprocmask commented out. Any impact on performance is swamped by other random variation.

Of course, I could set up a test harness to do it many times and run some stats on the results... though personally, it already seems fairly clear that the impact on performance is negligible...

@alexdowad alexdowad force-pushed the dont-use-zend-signals branch from a72dee6 to df45e01 Compare May 23, 2020 04:55
@alexdowad
Contributor Author

Hopefully this should be ready to merge now.

@dstogov
Member

dstogov commented May 25, 2020

I don't understand the reason for this removal. Zend signals seem to work fine and the implementation is not complex. Moving "HANDLE_BLOCK_INTERRUPTIONS from hot path" may cause race conditions.
I also see a risk of breaking ZTS on less popular OSes (e.g. AIX).

@alexdowad
Contributor Author

I don't understand the reason for this removal. Zend signals seem to work fine and the implementation is not complex. Moving "HANDLE_BLOCK_INTERRUPTIONS from hot path" may cause race conditions.
I also see a risk of breaking ZTS on less popular OSes (e.g. AIX).

@dstogov Thanks for raising these good questions. The reason for submitting this PR is to (hopefully):

  1. Reduce LOC for the project without losing any functionality, and
  2. Reduce config parameters when building.

My reason for preferring smaller LOC (when the extra lines are not needed) is to make the codebase more maintainable, more readable, make it easier for new developers to get started, reduce the number of places where bugs can "hide", etc. As for config parameters, every added config parameter increases the number of possible combinations exponentially (2^n), which makes it impossible to test every possible build in CI. So reducing unneeded config parameters is also helpful.

By itself, the removal of Zend signal handling would not be transformative, but I think there may be a lot of other places where the codebase can be simplified and trimmed down without losing functionality. Taken together, these simplifications could have a major impact. We are all aware that PHP has a large number of known bugs, and any reduction in complexity is a step towards getting that open bug count down (and keeping it down).

If moving HANDLE_BLOCK_INTERRUPTIONS causes race conditions, that is a very serious issue and needs to be fixed. Could you share more details about what issues you see in that area?

If we have users on rare flavors of Unix, I certainly wouldn't want to break things for them either. What did you feel would present such a risk? Do some of these Unixes not support sigprocmask?

@alexdowad
Contributor Author

alexdowad commented May 28, 2020

If we have users on rare flavors of Unix, I certainly wouldn't want to break things for them either. What did you feel would present such a risk? Do some of these Unixes not support sigprocmask?

Just did a bit of reading in the developer docs for AIX. They say that multi-threaded programs (using pthreads) must not use sigprocmask but rather sigthreadmask.

However, it looks like AIX also supports pthread_sigmask, which is good. Both with this PR and before, ZTS builds do use pthread_sigmask.

So unless I'm missing something, it is hard to see how this PR could break PHP on AIX.

@alexdowad
Contributor Author

Maintainers, any further comments on this proposed change? I have another bugfix PR which is stalled waiting on this one (because I based it on this branch). However, if this change is not wanted, I can still rebase that other PR on master.

@KalleZ
Member

KalleZ commented Jun 7, 2020

ping @dstogov perhaps you can come with some more input here in response to @alexdowad

As for the branch, you should always base your branch on the lowest target branch in php-src, which avoids issues like this; the preference is always to target an active branch in php-src. If a feature or bug fix depends on another, my recommendation would be to hold off on submitting it until the dependency is finalized.

@dstogov
Member

dstogov commented Jun 8, 2020

@alexdowad I'm also for simplification, but zend signals is really not a complex piece of code. Removing it may cause breakage, new bugs and possible performance loss.
I would prefer not to touch the working subsystem.

@alexdowad
Contributor Author

@dstogov Fair enough. 😄 I'll close the PR. Thanks for reviewing. Just as a side point: please note that if removing ZS may cause bugs or breaks, that means that the --disable-zend-signals config parameter is wrong and should not exist, since when that parameter is used, it does just the same as this patch.

With that in mind: How would you feel about removing the --{enable,disable}-zend-signals config parameter (so that Zend signals are always used)? This would mean that there would be less variation in possible builds; the builds tested in CI would be more consistent with what people in the field may actually run.

Zend signal handling depends on sigaction being available, so the file would then be guarded with #ifdef HAVE_SIGACTION. If sigaction is not available on the platform, all the functions in zend_signal.h would be defined as no-ops (which is what happens right now if someone uses configure --disable-zend-signals).

If the maintainers are willing to have a look, I can also submit some refactorings to Zend signal handling -- for example, correcting erroneous code comments, removing #defines which are never used, etc. Another interesting thing about ZS is that although it uses a doubly linked list to hold the pending signals, the next links only ever point to the following node in the static array of nodes, and the previous links only ever point to the preceding node. In other words, there is no reason to use a linked list at all; a circular buffer would be much simpler and would trim down some unneeded code.

Thanks again for the review and for the comments already shared.

@alexdowad alexdowad closed this Jun 8, 2020
@dstogov
Member

dstogov commented Jun 9, 2020

Removing --{enable,disable}-zend-signals may make sense.
Switching to circular buffer also looks like a good idea (both for PHP 8, of course).

@alexdowad
Contributor Author

alexdowad commented Jun 9, 2020

@dstogov I'll give it a try.

@alexdowad
Contributor Author

@dstogov, may I ask what the purpose of clearing SIGG(reset) during OPCache preloading is?

Could anything "bad" happen if the Zend SH deferred handlers were installed during preloading?

@dstogov
Member

dstogov commented Jun 11, 2020

Preloading is done in context of "virtual" request at the end of PHP initialization.
If we reset signal handlers at this time, this will affect all the real subsequent requests.

@alexdowad
Contributor Author

Preloading is done in context of "virtual" request at the end of PHP initialization.
If we reset signal handlers at this time, this will affect all the real subsequent requests.

Hmm. This is interesting. The question I'm trying to figure out now is:

If SIGG(reset) is true, won't the subsequent request install the Zend signal handlers anyways? It seems that if Zend SH is enabled, we want the deferred signal handlers to be installed as soon as possible. I can't see what could be harmful about installing them.

@alexdowad
Contributor Author

Another comment: After looking more at the issue of SIGG(reset), it appears even more strongly that this global variable is not needed. The place mentioned above is literally the only place where it is used; everywhere else, it is set to 1.

This means that as long as Zend SH is enabled, each new request will always install the deferred signal handlers. What problem could it cause if this is also done during preloading? I really can't see anything (but would love to be proved wrong).

@dstogov
Member

dstogov commented Jun 15, 2020

I suppose SIGG(reset) was introduced for "bad" extensions that might override signal handler during request processing.

@alexdowad
Contributor Author

I suppose SIGG(reset) was introduced for "bad" extensions that might override signal handler during request processing.

Aha... OK, this is interesting.

Was the idea that we want to keep signal handlers set by "bad" extensions? Or that we want to replace them with the original handlers?

@nikic
Member

nikic commented Jun 17, 2020

I think we should reconsider this decision. The problem is that zend_signals requires us to control all code that is registering signals. However, we cannot really do this with 3rd party libraries we depend on.

An instance of this I'm hitting right now, is that I'm seeing Zend signals related test failures in ext/readline using libedit, because apparently it registers some signal handlers if specific features are used. This is something we don't have control over.

Doing this using sigprocmask instead will work even if signal handlers are installed by libraries.

@alexdowad
Contributor Author

I think we should reconsider this decision. The problem is that zend_signals requires us to control all code that is registering signals. However, we cannot really do this with 3rd party libraries we depend on.

An instance of this I'm hitting right now, is that I'm seeing Zend signals related test failures in ext/readline using libedit, because apparently it registers some signal handlers if specific features are used. This is something we don't have control over.

Doing this using sigprocmask instead will work even if signal handlers are installed by libraries.

Yeah, good point. Note that the issue you have identified is still a problem if Zend SH is selected using a config parameter -- because users who install PHP as a binary package (with Zend SH built in) may have problems using libraries that expect to set their own signal handlers.

I think the very fact that --disable-zend-signals exists shows that the feature is not needed. If it was needed for correctness, then any interpreter built with --disable-zend-signals would be broken.

And IMHO the performance argument is also very weak. In the above microbenchmark done by @nikic, there was only a clear performance win from Zend SH when loading a million empty source files. I doubt that any PHP application ever written has ever done that.

As such, if there are concerns that moving HANDLE_BLOCK_INTERRUPTIONS out of the hot path in OPCache might not be safe, I personally feel that leaving it in the hot path and taking the microscopic performance hit wouldn't be bad.

On the other hand, if the maintainers want to keep Zend SH, I have a series of about 30 commits ready which refactor it into very nice shape.

@paresy
Contributor

paresy commented Jun 26, 2020

I would be in favor of removing Zend Signals. In our use case (ZTS embed), signals do not work very well with threads. Therefore, if there is no major benefit to keeping them, I would be in favor of removing them to resolve the current problems in ZTS embed environments.

@nikic
Member

nikic commented Jun 26, 2020

@paresy Could you please explain what issues you saw with ZTS and Zend signals? (Note that the timeout issue is unrelated, that's a general signal handling problem, regardless of whether Zend signals are used or not.)

@paresy
Contributor

paresy commented Jun 26, 2020

My app catches the SIGINT signal to perform a proper shutdown. When Zend Signals is enabled, the signal handlers somehow get messed up (or are not properly restored; I can only assume this is due to some race condition while threading) and my own SIGINT handler is never called. I can try to build a demo app if this would help. The same happens with other signals (if I recall correctly, curl used some), which also got lost.

This could be the reason why the curl extension is disabling all curl signal handling on ZTS. See here:

#ifdef ZTS

Commit d81f2e5 added CURLOPT_NOSIGNAL to the code base. Unfortunately, no bug report or further information is given.

@velemas

velemas commented Jul 20, 2020

Hi from much less popular mainframe OS BS2000 (namely its POSIX subsystem). I second paresy. On recent php 7.4.8 zend signals break a lot of tests on BS2000 (when disabled ~600 tests pass additionally). Moreover with zend signals some phpdbg tests loop infinitely. I truss'ed one of them and the loop was:

getpid()->kill(SIGSEGV)->sigprocmask()->sigaction()->getpid()->...

I think it is very difficult if not impossible to implement universal zend signals which behave correctly on all kinds of signal system implementation. So please leave it at least configurable (--disable-zend-signals).

@alexdowad
Contributor Author

The comments from @paresy and @velemas both seem to argue for removing Zend signal handling.

9 participants