fix: disable Zend Signals by default for ZTS builds #10193

Open. dunglas wants to merge 1 commit into master from fix/zts-disable-zend-signals

@dunglas
Member

dunglas commented Dec 30, 2022

As described on the mailing list, Zend Signals cause segmentation faults and other problems in multi-threaded environments.

We already disabled Zend Signals for ZTS builds of the official Docker images: docker-library/php#1331
Similarly, this patch disables Zend Signals by default for ZTS builds to prevent these issues. It's still possible to enable Zend Signals explicitly with --enable-zend-signals.

Member

@cmb69 cmb69 left a comment

I think that this is likely the best way forward, although it is a minor BC break.

Since @arnaud-lb helped with the original zend_signal implementation, what do you think?

@dunglas dunglas force-pushed the fix/zts-disable-zend-signals branch from 7f3f011 to a2858d0 Compare January 8, 2023 20:43
@arnaud-lb
Member

arnaud-lb commented Jan 13, 2023

Thinking more about it, I believe that Zend Signal still provides some protection against Opcache corruptions in ZTS builds, and that we may be able to make it on par with non-ZTS builds.

The primary purpose of Zend Signal was to prevent interruption of critical sections by the timeout signal used by PHP, by process-management signals used by SAPIs, and by external signals whose default disposition is to terminate the process.

Interruptions are an issue when they happen while php-src is updating some data, and:

  • The data may be accessed by a signal handler
  • A signal handler does not return (e.g. long jump or thread/process termination) and the data is accessed later by the same thread, or by other threads/processes

Zend Signal delays signal handling so that interruptions do not happen during critical sections.
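The deferral idea described above can be sketched in a few lines of C. This is only an illustration of the technique, not the php-src implementation: every name here (`critical_enter`, `critical_leave`, `deferring_handler`, `real_handler`) is hypothetical, and a single pending slot stands in for a real signal queue.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical state: nesting depth of critical sections, one pending signal. */
static volatile sig_atomic_t critical_depth = 0;
static volatile sig_atomic_t pending_signo = 0;
static volatile sig_atomic_t handled_count = 0;

/* Stand-in for the handler the application actually wants to run. */
static void real_handler(int signo)
{
    (void) signo;
    handled_count++;
}

/* Installed handler: run immediately outside a critical section, defer inside. */
static void deferring_handler(int signo)
{
    if (critical_depth > 0) {
        pending_signo = signo; /* remember it, run it on critical_leave() */
        return;
    }
    real_handler(signo);
}

static void critical_enter(void)
{
    critical_depth++;
}

static void critical_leave(void)
{
    /* On leaving the outermost section, replay the deferred signal, if any. */
    if (--critical_depth == 0 && pending_signo != 0) {
        int signo = pending_signo;
        pending_signo = 0;
        real_handler(signo);
    }
}

/* Returns 1 if the signal was held back during the section and run after it. */
static int demo_deferred_delivery(void)
{
    struct sigaction sa;
    int during, after;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = deferring_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0) {
        return 0;
    }

    critical_enter();
    raise(SIGUSR1);         /* arrives while the "data update" is in progress */
    during = handled_count; /* real_handler has not run yet */
    critical_leave();
    after = handled_count;  /* real_handler ran when the section ended */

    return during == 0 && after == 1;
}
```

Calling `demo_deferred_delivery()` raises `SIGUSR1` inside the critical section and checks that the real handler only runs once the section is left.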

There used to be critical sections in zend alloc and other places, because timeout handling used to call zend_error() in a signal handler, but that's not the case anymore (#1173). Today, the only remaining critical sections are SHM-handling code in Opcache.

It's fine to not protect these critical sections as long as the following constraints are met:

  • The Opcache SHM is shared by only one process (and possibly multiple threads)
  • No signal handler is installed that may call php-src code
  • No signal handler is installed that does not return (e.g. long jump or thread/process termination)

Otherwise, Opcache corruptions may happen.

Zend Signal eliminates the 1st and 3rd constraints entirely in non-ZTS.

In ZTS it fails to eliminate the 1st constraint, because signals are delayed on a thread basis, but they can be delivered to other threads of the same process.

This seems fixable by changing SIGG(depth) and other SIGG members to be true globals. In retrospect it's wrong to have per-thread globals for signal handling, as signal dispositions are process wide. Doing so would also fix issues such as #9649 because we would be able to queue the signal and deliver it later.
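To make the difference concrete, here is a small sketch contrasting a per-thread depth counter with a process-wide one. It assumes C11 `_Thread_local` and `<stdatomic.h>` plus POSIX threads; `tls_depth` and `global_depth` are hypothetical stand-ins and are not the actual SIGG fields.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins: a per-thread counter (as with per-thread globals
 * in ZTS) and a process-wide counter (the "true global" suggested above). */
static _Thread_local int tls_depth = 0;
static atomic_int global_depth = 0;

struct observation {
    int saw_tls_depth;    /* what another thread reads from its own TLS copy */
    int saw_global_depth; /* what another thread reads from the shared counter */
};

/* Runs in a second thread while the first one is inside a critical section. */
static void *observer(void *arg)
{
    struct observation *obs = arg;
    obs->saw_tls_depth = tls_depth;
    obs->saw_global_depth = atomic_load(&global_depth);
    return NULL;
}

static struct observation observe_during_critical_section(void)
{
    struct observation obs = { -1, -1 };
    pthread_t thread;

    /* Enter the critical section on the calling thread. */
    tls_depth++;
    atomic_fetch_add(&global_depth, 1);

    /* A handler running on another thread would see what observer() sees. */
    if (pthread_create(&thread, NULL, observer, &obs) == 0) {
        pthread_join(thread, NULL);
    }

    /* Leave the critical section. */
    atomic_fetch_sub(&global_depth, 1);
    tls_depth--;

    return obs;
}
```

In this sketch the second thread reads 0 from its own thread-local counter, so a handler running there would not defer, while the shared atomic counter reports that a critical section is active somewhere in the process.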

@arnaud-lb
Member

I've edited my comment above

@dunglas
Member Author

dunglas commented Feb 3, 2023

> In retrospect it's wrong to have per-thread globals for signal handling, as signal dispositions are process wide.

In #10141, we use timer_create() to deliver the timeout signal to the thread that caused it, so this will not be true anymore, and - if we want to keep Zend Signals - we'll have to keep the per-thread queue at least for this specific signal.
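For readers unfamiliar with thread-directed timers, the following Linux-only sketch shows the general mechanism: `timer_create()` with `SIGEV_THREAD_ID` sends the expiry signal to one specific thread. It is not taken from #10141; the use of `SIGALRM`, the 10 ms timeout, and all function names are assumptions made for the illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Some C libraries do not expose this member name directly. */
#ifndef sigev_notify_thread_id
# define sigev_notify_thread_id _sigev_un._tid
#endif

/* Kernel thread id of the thread that ran the handler (0 = not run yet). */
static volatile sig_atomic_t handler_tid = 0;

static void on_timeout(int signo)
{
    int saved_errno = errno;
    (void) signo;
    handler_tid = (sig_atomic_t) syscall(SYS_gettid);
    errno = saved_errno;
}

/* Worker: arms a one-shot timer that targets this thread only, then waits. */
static void *worker(void *arg)
{
    pid_t *worker_tid = arg;
    struct sigevent sev;
    struct itimerspec its;
    timer_t timer;

    *worker_tid = (pid_t) syscall(SYS_gettid);

    memset(&sev, 0, sizeof(sev));
    sev.sigev_notify = SIGEV_THREAD_ID;       /* Linux-specific */
    sev.sigev_signo = SIGALRM;                /* assumption for this sketch */
    sev.sigev_notify_thread_id = *worker_tid; /* deliver to this thread */
    if (timer_create(CLOCK_MONOTONIC, &sev, &timer) != 0) {
        return NULL;
    }

    memset(&its, 0, sizeof(its));
    its.it_value.tv_nsec = 10 * 1000 * 1000;  /* fire once after 10 ms */
    if (timer_settime(timer, 0, &its, NULL) == 0) {
        /* Wait up to roughly one second for the handler to run. */
        for (int i = 0; i < 100 && handler_tid == 0; i++) {
            struct timespec ts = { 0, 10 * 1000 * 1000 };
            nanosleep(&ts, NULL);             /* may return early on a signal */
        }
    }
    timer_delete(timer);
    return NULL;
}

/* Returns 1 if the handler ran on the worker thread and not on the caller. */
static int timeout_lands_on_arming_thread(void)
{
    struct sigaction sa;
    pthread_t thread;
    pid_t worker_tid = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_timeout;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0) {
        return 0;
    }
    if (pthread_create(&thread, NULL, worker, &worker_tid) != 0) {
        return 0;
    }
    pthread_join(thread, NULL);

    return handler_tid != 0
        && handler_tid == (sig_atomic_t) worker_tid
        && handler_tid != (sig_atomic_t) syscall(SYS_gettid);
}
```

Because such a signal targets one thread, a process-wide pending queue would lose the information about which thread the timeout belongs to, which is the concern raised in this comment.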

@arnaud-lb
Member

It depends on what we want to protect from.

For example, there is the following threat:

  • Context: SAPI is multi-process and multi-thread, with Opcache SHM shared between all processes and threads
  • Thread 1 in process A is updating the Opcache SHM
  • Thread 2 in same process receives a signal and exits
  • All threads in the process were effectively interrupted. Thread 1 was in the middle of updating the Opcache SHM
  • Remaining processes have to live with a corrupted SHM

In order to protect from this, we need to delay signal handling in all threads when one thread is in a critical section.

> we'll have to keep the per-thread queue at least for this specific signal

Indeed! We need to ensure that thread-directed signals are delivered to the right thread if we delay them.

@bukka
Member

bukka commented Dec 3, 2023

This might be problematic for FPM ZTS builds, and there are obviously some users who use them (see #10219). I assume this might be for extensions using threads for some things. I remember someone talking about using Parallel with FPM. Some things might not work correctly, like slowlog, but otherwise it might work, and FPM officially still supports ZTS builds even though most distros won't offer them.

As you know, in such a case FPM is multi-process and multiple threads might run in each child, so extra Opcache protection might be needed.

I think this should be closed, and it would be better to create issues for the problems that are still broken so that potentially different solutions can be found. I see that some issues are already fixed, so it would be good to have issues for the ones that still exist. The only open issue that I see is #8029.

@dunglas
Member Author

dunglas commented Dec 3, 2023

Working on fixing the remaining issues is on my todo list, but there is no ETA.

Another option would be to disable Zend Signals if FPM isn't enabled (ZTS builds are mostly used to build libphp).

But again, the feature is currently broken on ZTS, so there is no point enabling it while the remaining issues aren't fixed. The Docker images and the PHP builder projects (the only projects I know of that maintain ZTS builds) have already disabled Zend Signals for ZTS builds.

@bukka
Member

bukka commented Dec 4, 2023

Would you be able to create GH issues for the remaining issues? We really need something more specific than just saying it's broken...

Well, any packager can disable it if they want to, so I don't really see the point of disabling it only in some cases for ZTS. It might not be just FPM; there might be other cases where it would prevent corruption. I think it would make sense to disable it only if we come to the conclusion that it's unfixable, which I don't think is the case at the moment.

@dunglas
Member Author

dunglas commented Dec 4, 2023

Most details are in this PR and in the related thread, but I'll try to open an issue (or better, propose a fix) when I have a bit more time.
