-
Notifications
You must be signed in to change notification settings - Fork 7.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug: opcache JIT optimization >=2 [Windows cli] #18136
Comments
I get an assertion failure on 8.4.5 on Windows+JIT (when built in debug mode): |
Reduced to a standalone reproducer: <?php
namespace Foo;
function diff($point1, $point2)
{
$a = deg2rad($point1); // Prefixing these with \ also makes the issue go away
$b = deg2rad($point2);
return $a - $b;
}
function getRawDistance()
{
$distance = 0;
for ($p = 0; $p < 200; $p++) {
// Needs to be a namespaced call_user_func call to reproduce the issue (i.e. adding \ at front makes the issue go away)
$distance += call_user_func('Foo\diff', 0, $p);
}
return $distance;
}
var_dump(getRawDistance()); |
The above reproducer outputs |
Hi,
What kind of details would be usefull to understand this one? (the script is quite complex...) |
I found the root cause, and your last comment is likely related to the same root cause. The problem is due to (global) register allocation. The reason this doesn't happen on Linux is because on Linux all XMM registers are volatile, while on Windows only xmm0-xmm5 are volatile while xmm6-xmm15 are preserved. @nono303 As a workaround, try setting the R component of |
Many Thx @nielsdos for your time on it! ✅ I confirm that changing
⏩ If you need some quick (like with patch on php-src before making a PR) tests on Windows, don’t hesitate as I can easily compile binary and launch different tests |
Thanks for testing. Unfortunately, I will have difficulties making a patch for this as I'm not too familiar with the backend component of IR and it would take me some time learning that. I also think that if this means we diverge from the fixed stack frame then we get issues with the unwinding code that was added in #17095 |
The fair fix is simple. Could you check that it fixes your problem (I wasn't able to reproduce the failure with reproduction case. Only the assertion). diff --git a/ext/opcache/jit/ir/ir_ra.c b/ext/opcache/jit/ir/ir_ra.c
index 0c0e8dec3b4..fa8f639c0c3 100644
--- a/ext/opcache/jit/ir/ir_ra.c
+++ b/ext/opcache/jit/ir/ir_ra.c
@@ -4097,7 +4097,7 @@ static void assign_regs(ir_ctx *ctx)
if (IR_REGSET_DIFFERENCE(IR_REGSET_INTERSECTION(used_regs, IR_REGSET_PRESERVED),
ctx->used_preserved_regs)) {
// TODO: Preserved reg and fixed frame conflict ???
- // IR_ASSERT(0 && "Preserved reg and fixed frame conflict");
+ IR_ASSERT(0 && "Preserved reg and fixed frame conflict");
}
} else {
ctx->used_preserved_regs = IR_REGSET_UNION((ir_regset)ctx->fixed_save_regset,
diff --git a/ext/opcache/jit/zend_jit_ir.c b/ext/opcache/jit/zend_jit_ir.c
index 4d006c19d34..ff0abd8e1a0 100644
--- a/ext/opcache/jit/zend_jit_ir.c
+++ b/ext/opcache/jit/zend_jit_ir.c
@@ -33,12 +33,10 @@
# define ZREG_IP 15 /* IR_REG_R15 */
# define ZREG_FIRST_FPR 16
# if defined(_WIN64)
-# define IR_REGSET_PRESERVED ((1<<3) | (1<<5) | (1<<6) | (1<<7) | (1<<12) | (1<<13) | (1<<14) | (1<<15))
-/*
+//# define IR_REGSET_PRESERVED ((1<<3) | (1<<5) | (1<<6) | (1<<7) | (1<<12) | (1<<13) | (1<<14) | (1<<15))
# define IR_REGSET_PRESERVED ((1<<3) | (1<<5) | (1<<6) | (1<<7) | (1<<12) | (1<<13) | (1<<14) | (1<<15) | \
(1<<(16+6)) | (1<<(16+7)) | (1<<(16+8)) | (1<<(16+9)) | (1<<(16+10)) | \
(1<<(16+11)) | (1<<(16+12)) | (1<<(16+13)) | (1<<(16+14)) | (1<<(16+15)))
-*/
# define IR_SHADOW_ARGS 32
# else
# define IR_REGSET_PRESERVED ((1<<3) | (1<<5) | (1<<12) | (1<<13) | (1<<14) | (1<<15)) /* all preserved registers */
@@ -2696,7 +2694,8 @@ static void zend_jit_init_ctx(zend_jit_ctx *jit, uint32_t flags)
jit->ctx.flags |= IR_USE_FRAME_POINTER;
jit->ctx.fixed_stack_frame_size = sizeof(void*) * 16; /* 10 saved registers and 6 spill slots (8 bytes) */
#elif defined(_WIN64)
- jit->ctx.fixed_stack_frame_size = sizeof(void*) * 11; /* 8 saved registers and 3 spill slots (8 bytes) */
+// jit->ctx.fixed_stack_frame_size = sizeof(void*) * 11; /* 8 saved registers and 3 spill slots (8 bytes) */
+ jit->ctx.fixed_stack_frame_size = sizeof(void*) * 21; /* 18 saved registers and 3 spill slots (8 bytes) */
#elif defined(IR_TARGET_X86_64)
jit->ctx.fixed_stack_frame_size = sizeof(void*) * 9; /* 6 saved registers and 3 spill slots (8 bytes) */
#else /* IR_TARGET_x86 */ The drawback of the fix, that it significantly increases code of each prologue and epilogue. This is the reason it wasn't applied in the first place. After all, I think that it may make sense to avoid "fixed stack frame" for CALL VM JIT, but this would require inlining of most the JIT stubs. Probably, this is the right way, but it's not a simple fix. @nielsdos may be you have other ideas? |
@arnaud-lb please also think about this problem |
If I understand correctly, If that's the case, I think we could achieve a similar result by storing a bitset of saved registers on the top of the stack frame, so that stubs know the size of the frame and which registers they have to restore. Stub epilogues would be slower, but normal functions will usually have smaller epilogue/prologue. An alternative is to not emit an epilogue in stubs, and to call them:
NB: I don't understand the difference between CALL and HYBRID in this context. |
@dstogov That works and the patch looks right: now xmm6-xmm15 are preserved. I had tried something like this (but more hacked) but didn't pursue it because of the overhead. |
This won't work at least for linking between traces. (prologue in root trace, then few branches to side traces, then epilogue and return).
HYBRID handlers are called through C computed goto, inside the execute_ex() stack frame, and "return" through indirect jmp. They require "fixed stack frame" and may avoid saving/restoring preserved registers. |
Hi @dstogov
I'll check if i see some regression on other use-cases |
Hi,
I'm facing a strange bug on a "simple" distance calculation (addition in a loop) with PHP cli & opcache JIT enabled.
with
opcache.enable_cli = 1
andopcache.jit >= 1252
(optimization level between 2 and 5) at some time in the loop the addition (+=) give erratic results.I've extracted the distance calculation function for standalone & callback tests with same data without any issue.
So, might be related to namespace, object type or whatever...
Here are "ready to test" files to reproduce it (like on linux)
⏩ Results:
Adding a debug line to DistanceCalculator.php
results, failing @ iteration 128:
Available to perform further tests and provide details if needed
PHP Version
PHP 8.4.5 TS vs17 x64
Operating System
Windows 11 x64
The text was updated successfully, but these errors were encountered: