
Tail call VM [2] #18720


Draft: wants to merge 1 commit into master


@arnaud-lb (Member) commented May 31, 2025

Related:

This part takes tail-calling and preserve_none from #17849:

  • Opcode handlers dispatch directly to the next handler by tail-calling it, which reduces function call overhead and avoids returning to the executor loop
  • The preserve_none calling convention reduces register-saving overhead in opcode handlers
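A minimal sketch of tail-call dispatch with `musttail` and `preserve_none`, assuming a toy VM: the opcodes, `struct op`, `run()`, and `run_demo()` are hypothetical and are not PHP's generated VM. Each handler does its work, advances the instruction pointer, and tail-calls the next handler, so nothing returns to an executor loop until `op_halt`. Both attributes are probed with `__has_attribute`; where they are unavailable the macros expand to nothing and the compiler may or may not emit a real tail call.

```c
/* Toy VM with hypothetical names: illustrates tail-call dispatch only. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#  if __has_attribute(preserve_none)
#    define PRESERVE_NONE __attribute__((preserve_none))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL      /* plain return: tail call not guaranteed */
#endif
#ifndef PRESERVE_NONE
#  define PRESERVE_NONE /* default calling convention */
#endif

struct op;
typedef long (PRESERVE_NONE *handler_t)(const struct op *ip, long acc);

struct op {
    handler_t handler; /* handler for this instruction */
    long operand;
};

static long PRESERVE_NONE op_add(const struct op *ip, long acc)
{
    acc += ip->operand;
    ip++;                                 /* advance to the next instruction */
    MUSTTAIL return ip->handler(ip, acc); /* dispatch: jump, don't return */
}

static long PRESERVE_NONE op_mul(const struct op *ip, long acc)
{
    acc *= ip->operand;
    ip++;
    MUSTTAIL return ip->handler(ip, acc);
}

static long PRESERVE_NONE op_halt(const struct op *ip, long acc)
{
    (void)ip;
    return acc; /* the only real return: goes straight back to run() */
}

/* Entered once; afterwards handlers chain by tail calls. */
static long run(const struct op *code)
{
    return code->handler(code, 0);
}

static long run_demo(void)
{
    static const struct op program[] = {
        { op_add, 5 },
        { op_mul, 3 },
        { op_add, -1 },
        { op_halt, 0 },
    };
    return run(program); /* (0 + 5) * 3 - 1 */
}
```

`run_demo()` evaluates (0 + 5) * 3 - 1 and returns 14.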

This also implements JIT support.

Non-dispatching opcode handlers

JIT needs non-dispatching opcode handlers (opcode handlers that return instead of calling the next one). I've tried two approaches for this:

  • Generate a second, non-dispatching variant of each handler, and use that variant as call_handler.
  • Use indirect dispatch, similarly to the hybrid VM: zend_op->handler is a function that calls the real handler and then dispatches.
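A sketch of the first approach, using hypothetical names: one handler body is expanded by a macro into a dispatching variant (`*_tail`, which tail-calls the next handler) and a non-dispatching variant (`*_call`, which returns the next instruction plus updated state). In PHP the variants would presumably come from the VM generator (zend_vm_gen.php); the C macro here only stands in for that step.

```c
#include <stddef.h>

#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

struct op;
/* Result of one non-dispatching step: next instruction (NULL = stop) and state. */
struct step { const struct op *next; long acc; };

typedef long (*tail_handler_t)(const struct op *ip, long acc);        /* dispatches */
typedef struct step (*call_handler_t)(const struct op *ip, long acc); /* returns */

struct op {
    tail_handler_t handler;      /* used by the interpreter */
    call_handler_t call_handler; /* used by a caller that wants control back */
    long operand;
};

/* Expand one handler body into both variants. BODY may read ip and update acc. */
#define DEFINE_HANDLER(name, BODY)                                \
    static long name##_tail(const struct op *ip, long acc)        \
    {                                                             \
        BODY;                                                     \
        ip++;                                                     \
        MUSTTAIL return ip->handler(ip, acc);                     \
    }                                                             \
    static struct step name##_call(const struct op *ip, long acc) \
    {                                                             \
        BODY;                                                     \
        return (struct step){ ip + 1, acc };                      \
    }

DEFINE_HANDLER(op_add, acc += ip->operand)
DEFINE_HANDLER(op_mul, acc *= ip->operand)

static long op_halt_tail(const struct op *ip, long acc)
{
    (void)ip;
    return acc;
}

static struct step op_halt_call(const struct op *ip, long acc)
{
    (void)ip;
    return (struct step){ NULL, acc };
}

static const struct op demo_program[] = {
    { op_add_tail,  op_add_call,   5 },
    { op_mul_tail,  op_mul_call,   3 },
    { op_add_tail,  op_add_call,  -1 },
    { op_halt_tail, op_halt_call,  0 },
};

/* Interpreter: one call, then handlers chain by tail calls. */
static long interpret(const struct op *code)
{
    return code->handler(code, 0);
}

/* JIT-like caller: runs one handler at a time and regains control after each. */
static long step_through(const struct op *code)
{
    struct step s = { code, 0 };
    while (s.next != NULL)
        s = s.next->call_handler(s.next, s.acc);
    return s.acc;
}
```

`interpret(demo_program)` and `step_through(demo_program)` both return 14; the second shows how a caller such as JIT'ed code gets control back after every handler.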

I've tried both approaches (the first one in this branch, and the second one in master...arnaud-lb:php-src:hybrid-tailcall).

The second approach resulted in a slightly slower VM due to the indirect dispatch, and the JIT generated more register spilling around handler calls, since these handlers clobber all registers.
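For comparison, a sketch of the second approach on the same kind of hypothetical toy VM: the stored `handler` is a dispatching wrapper that first calls the real, non-dispatching handler through a pointer, then tail-calls the next wrapper. For brevity it uses one generic wrapper rather than one per opcode. The extra indirect call on every instruction is plausibly part of the slowdown described here; the register-spilling effect in JIT'ed code cannot be shown in C.

```c
#include <stddef.h>

#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

struct op;
struct step { const struct op *next; long acc; };

typedef struct step (*real_handler_t)(const struct op *ip, long acc); /* never dispatches */
typedef long (*handler_t)(const struct op *ip, long acc);             /* what ops store */

struct op {
    handler_t handler;   /* dispatching wrapper */
    real_handler_t real; /* real handler, also directly callable by a JIT */
    long operand;
};

static struct step op_add(const struct op *ip, long acc)
{
    return (struct step){ ip + 1, acc + ip->operand };
}

static struct step op_mul(const struct op *ip, long acc)
{
    return (struct step){ ip + 1, acc * ip->operand };
}

static struct step op_halt(const struct op *ip, long acc)
{
    (void)ip;
    return (struct step){ NULL, acc };
}

/* Generic wrapper: an extra indirect call per instruction, then dispatch. */
static long dispatch_wrapper(const struct op *ip, long acc)
{
    struct step s = ip->real(ip, acc); /* indirect call to the real handler */
    if (s.next == NULL)
        return s.acc;
    MUSTTAIL return s.next->handler(s.next, s.acc);
}

static long run_wrapped_demo(void)
{
    static const struct op program[] = {
        { dispatch_wrapper, op_add,   5 },
        { dispatch_wrapper, op_mul,   3 },
        { dispatch_wrapper, op_add,  -1 },
        { dispatch_wrapper, op_halt,  0 },
    };
    return program[0].handler(&program[0], 0);
}
```

`run_wrapped_demo()` returns 14, the same result as the variant-based sketch, but every instruction pays for an indirect call the compiler cannot inline.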

Therefore I've taken the first approach in this PR.

A third approach would be to control dispatching via an additional handler parameter, or to pass a dispatch function to handlers, but I suspect either would have been slower.
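This third approach was not implemented; the hypothetical sketch below covers only its "additional handler parameter" form (not the dispatch-function form) and shows where its cost would come from: every handler receives a `dispatch` flag, tests it at run time, and threads it through every tail call, whereas generated variants make that decision at build time.

```c
#include <stddef.h>

#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

struct vm { long acc; };
struct op;

/* One handler per opcode; `dispatch` selects the behaviour at run time.
 * Returns the next instruction when not dispatching, NULL when halted. */
typedef const struct op *(*handler_t)(struct vm *vm, const struct op *ip,
                                      int dispatch);

struct op { handler_t handler; long operand; };

static const struct op *op_add(struct vm *vm, const struct op *ip, int dispatch)
{
    vm->acc += ip->operand;
    ip++;
    if (!dispatch)
        return ip;                                 /* give control back */
    MUSTTAIL return ip->handler(vm, ip, dispatch); /* flag tested on every op */
}

static const struct op *op_mul(struct vm *vm, const struct op *ip, int dispatch)
{
    vm->acc *= ip->operand;
    ip++;
    if (!dispatch)
        return ip;
    MUSTTAIL return ip->handler(vm, ip, dispatch);
}

static const struct op *op_halt(struct vm *vm, const struct op *ip, int dispatch)
{
    (void)vm;
    (void)ip;
    (void)dispatch;
    return NULL; /* stop in both modes */
}

static const struct op demo_program[] = {
    { op_add, 5 }, { op_mul, 3 }, { op_add, -1 }, { op_halt, 0 },
};

/* Interpreter mode: one call, handlers chain by tail calls until op_halt. */
static long run_dispatching(const struct op *code)
{
    struct vm vm = { 0 };
    (void)code->handler(&vm, code, 1);
    return vm.acc;
}

/* JIT-like mode: one handler per call. */
static long run_stepping(const struct op *code)
{
    struct vm vm = { 0 };
    for (const struct op *ip = code; ip != NULL;)
        ip = ip->handler(&vm, ip, 0);
    return vm.acc;
}
```

Both `run_dispatching(demo_program)` and `run_stepping(demo_program)` return 14.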

Fixed regs and preserved regs

Thanks to the preserve_none convention, JIT'ed code only has to preserve rbp, which reduces the size of the prologue/epilogue. Instead of preserving rbp, I add it to the set of fixed registers so that it is never allocated. This results in faster code.

Also, quite conveniently, preserve_none passes its first arguments in registers that are callee-saved in the SysV ABI. We can therefore use the arg1 and arg2 registers as our fixed registers. This avoids moving arg1 and arg2 to SP/IP in the prologue, or setting arg1/arg2 when tail-calling other handlers.

Benchmarks:

Benchmark      Mode      vs base   vs gcc   vs valgrind
bench.php      JIT       -4%       -0%      -5%
bench.php      Non-JIT   -44%      +3%
symfony demo   JIT       -0%       -0%
symfony demo   Non-JIT   -2.8%     -0%

base: Clang build of master, wall time
gcc: GCC build of master, wall time
valgrind: Clang build of master, valgrind instructions

Conclusion: Clang builds are now as fast as GCC builds on the Symfony Demo benchmark, in both JIT and non-JIT modes.

Issues

  • The preserve_none calling convention is documented as unstable, and the JIT would break if it changed. I suggest checking this at build time, and disabling this optimization (tail calling + preserve_none) if preserve_none has changed.
  • Clang currently fails to build PHP when using preserve_none with ASAN, so we disable this optimization when ASAN is enabled. Edit: this seems to be fixed in recent Clang versions.
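A sketch of the kind of compile-time probe a configure check could build on, assuming Clang-style `__has_attribute`/`__has_feature`; GCC defines `__SANITIZE_ADDRESS__` under `-fsanitize=address`. The macro and function names are hypothetical. It only covers attribute availability and the ASAN exclusion; detecting that preserve_none's register assignment has changed would need a separate check, which is not shown.

```c
/* Hypothetical probe: can the tail-call VM be enabled with this compiler? */
#if defined(__has_attribute)
#  if __has_attribute(preserve_none) && __has_attribute(musttail)
#    define HAVE_TAILCALL_ATTRS 1
#  endif
#endif
#ifndef HAVE_TAILCALL_ATTRS
#  define HAVE_TAILCALL_ATTRS 0
#endif

#if defined(__SANITIZE_ADDRESS__)      /* GCC with -fsanitize=address */
#  define ASAN_ENABLED 1
#elif defined(__has_feature)
#  if __has_feature(address_sanitizer) /* Clang with -fsanitize=address */
#    define ASAN_ENABLED 1
#  endif
#endif
#ifndef ASAN_ENABLED
#  define ASAN_ENABLED 0
#endif

/* Enable tail calling + preserve_none only if both attributes exist and ASAN is off. */
static int tailcall_vm_enabled(void)
{
    return HAVE_TAILCALL_ATTRS && !ASAN_ENABLED;
}
```

A configure script could compile a variant of this that uses `#error` instead of returning 0, so that an unsupported compiler fails the check outright.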

TODO

  • x86
  • aarch64
  • IR PRs
  • preserve_none configure check

#define IR_GEN_CODE (1<<22) /* C or LLVM */

#define IR_GEN_CACHE_DEMOTE (1<<23) /* Demote the generated code from closest CPU caches */
#define IR_PRESERVE_NONE_FUNC (1<<2) /* Generate a function with preserve_none calling convention */
@arnaud-lb (Member Author) commented May 31, 2025:

Added basic preserve_none support to IR.

What's missing:

  • Tail-calling is supported, but not normal calls
  • Var args

I will submit this as a proper PR separately.

Comment on lines +9266 to +9267
/* Move op2 to a scratch register before epilogue if it's in
* used_preserved_regs, because it will be overridden. */
@arnaud-lb (Member Author):

This fixes a bug where the op2 operand of TAILCALL() would be overwritten by the epilogue before the call.

I will submit this as a proper PR separately.

@@ -135,7 +136,7 @@ void zend_const_expr_to_zval(zval *result, zend_ast **ast_ptr, bool allow_dynami
typedef int (*user_opcode_handler_t) (zend_execute_data *execute_data);

struct _zend_op {
-	const void *handler;
+	zend_vm_opcode_handler_t handler;
@arnaud-lb (Member Author):

I typed the handler pointers in a few places to prevent confusion between original handlers and call handlers (zend_vm_opcode_handler_t / zend_vm_opcode_handler_func_t).

Comment on lines -436 to +510
-ZEND_OPCODE_HANDLER_RET ZEND_FASTCALL zend_jit_func_trace_helper(ZEND_OPCODE_HANDLER_ARGS)
+ZEND_OPCODE_HANDLER_RET ZEND_OPCODE_HANDLER_CCONV zend_jit_func_trace_helper(ZEND_OPCODE_HANDLER_ARGS)
 {
-	ZEND_OPCODE_TAIL_CALL_EX(zend_jit_trace_counter_helper,
-		((ZEND_JIT_COUNTER_INIT + JIT_G(hot_func) - 1) / JIT_G(hot_func)));
+	zend_jit_op_array_trace_extension *jit_extension =
+		(zend_jit_op_array_trace_extension*)ZEND_FUNC_INFO(&EX(func)->op_array);
+	size_t offset = jit_extension->offset;
+	uint32_t cost = ((ZEND_JIT_COUNTER_INIT + JIT_G(hot_func) - 1) / JIT_G(hot_func));
+
+	*(ZEND_OP_TRACE_INFO(opline, offset)->counter) -= cost;
+
+	ZEND_OPCODE_TAIL_CALL(zend_jit_trace_counter_helper);
@arnaud-lb (Member Author):

Tail-calling was not possible because of the extra argument to zend_jit_trace_counter_helper, so I removed that argument.

An alternative would be to define these handlers in IR, as we do for the hybrid JIT.
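A reduced sketch of the change to `zend_jit_func_trace_helper`, with hypothetical stand-ins (`struct op`, `COUNTER_INIT`, `HOT_FUNC`) for the opline, `ZEND_JIT_COUNTER_INIT`, and `JIT_G(hot_func)`. Clang's `musttail` requires the caller and callee to have matching signatures, so instead of passing the cost as an extra argument, the caller subtracts it itself and then tail-calls the helper with its own arguments. The cost is a ceiling division, so the counter expires after at most `HOT_FUNC` calls.

```c
#include <stdint.h>

#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

#define COUNTER_INIT 100 /* hypothetical value standing in for ZEND_JIT_COUNTER_INIT */
#define HOT_FUNC     7   /* hypothetical value standing in for JIT_G(hot_func) */

struct op { int32_t *counter; }; /* stand-in for opline + its trace counter */

/* Shared helper with the same signature as its caller, so it can be
 * tail-called. Returns 1 when the counter expires (where a real JIT would
 * start tracing), else 0. */
static int trace_counter_helper(const struct op *opline)
{
    if (*opline->counter > 0)
        return 0;
    *opline->counter = COUNTER_INIT; /* reset for the next round */
    return 1;
}

static int func_trace_helper(const struct op *opline)
{
    /* ceil(COUNTER_INIT / HOT_FUNC): expires after at most HOT_FUNC calls */
    int32_t cost = (COUNTER_INIT + HOT_FUNC - 1) / HOT_FUNC;

    /* Apply the cost here rather than passing it as a third argument,
     * which would make the signatures differ and rule out musttail. */
    *opline->counter -= cost;
    MUSTTAIL return trace_counter_helper(opline);
}

/* Number of calls until the helper reports the function as hot. */
static int calls_until_hot(void)
{
    int32_t counter = COUNTER_INIT;
    struct op opline = { &counter };
    int calls = 1;
    while (!func_trace_helper(&opline))
        calls++;
    return calls;
}
```

With the values chosen here the cost is 15 per call, and `calls_until_hot()` returns 7.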

@arnaud-lb (Member Author):

@dstogov this is still a WIP, but I would like to have your opinion on this
