Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: opcache JIT optimization >=2 [Windows cli] #18136

Open
nono303 opened this issue Mar 24, 2025 · 14 comments
Open

Bug: opcache JIT optimization >=2 [Windows cli] #18136

nono303 opened this issue Mar 24, 2025 · 14 comments

Comments

@nono303
Copy link
Contributor

nono303 commented Mar 24, 2025

Hi,
I'm facing a strange bug on a "simple" distance calculation (addition in a loop) with PHP cli & opcache JIT enabled.
with opcache.enable_cli = 1 and opcache.jit >= 1252 (optimization level between 2 and 5) at some time in the loop the addition (+=) give erratic results.

  • Only appears when using the phpGPX components.
    I've extracted the distance calculation function for standalone & callback tests with same data without any issue.
    So, might be related to namespace, object type or whatever...
  • The bug doesn't affect PHP cgi: tested with the same script, same data same php.ini... configuration without any issue.
  • I'm running on Windows, self-compiled PHP 8.4.5 TS vs17 x64

Here are "ready to test" files to reproduce it (like on linux)
⏩ Results:

opcache.enable_cli:0
  standalone: 9063.663059466
  phpGPX    : 9063.6630584286

opcache.enable_cli:1    opcache.jit:disable
  standalone: 9063.663059466
  phpGPX    : 9063.6630584286
opcache.enable_cli:1    opcache.jit:1251
  standalone: 9063.663059466
  phpGPX    : 9063.6630584286
opcache.enable_cli:1    opcache.jit:1252
  standalone: 9063.663059466
  phpGPX    : 1.369121694286
opcache.enable_cli:1    opcache.jit:1253
  standalone: 9063.663059466
  phpGPX    : 1.7499852968769
opcache.enable_cli:1    opcache.jit:1254
  standalone: 9063.663059466
  phpGPX    : 1.7499852968769
opcache.enable_cli:1    opcache.jit:1255
  standalone: 9063.663059466
  phpGPX    : 1.7499852968769

Adding a debug line to DistanceCalculator.php

diff --git "a/src/phpGPX/Helpers/DistanceCalculator.php" "b/src/phpGPX/Helpers/DistanceCalculator.php"
index 5a2a51b..d2df82e 100644
--- "a/src/phpGPX/Helpers/DistanceCalculator.php"
+++ "b/src/phpGPX/Helpers/DistanceCalculator.php"
@@ -77,6 +77,8 @@ class DistanceCalculator
 			else {
 				$distance += $curPoint->difference;
 				$lastConsideredPoint = $curPoint;
+				if($p >= 120 && $p <= 130)
+					echo "it:".$p." last:".$ld." cur:".$distance." add:".$curPoint->difference.PHP_EOL;
 			}
 
 			$curPoint->distance = $distance;

results, failing @ iteration 128:

opcache.enable_cli:1 opcache.jit:1254
it:120 last:565.12951886607 cur:568.91636386652 add:3.7868450004519
it:121 last:568.91636386652 cur:574.28152730666 add:5.3651634401381
it:122 last:574.28152730666 cur:578.88597767002 add:4.604450363368
it:123 last:578.88597767002 cur:583.43520379413 add:4.5492261241002
it:124 last:583.43520379413 cur:587.74462875461 add:4.3094249604809
it:125 last:587.74462875461 cur:594.67011966358 add:6.9254909089751
it:126 last:594.67011966358 cur:598.5481926842 add:3.8780730206198
it:127 last:598.5481926842 cur:601.6696032882 add:3.1214106039962
it:128 last:0.51993194404748 cur:2.8028430036349 add:2.2829110595874
it:129 last:0.51993220497079 cur:5.3492195083835 add:4.8292873034127
it:130 last:0.51993241619443 cur:4.6919392250549 add:4.1720068088605

opcache.enable_cli:0
it:120 last:565.12951886607 cur:568.91636386652 add:3.7868450004519
it:121 last:568.91636386652 cur:574.28152730666 add:5.3651634401381
it:122 last:574.28152730666 cur:578.88597767002 add:4.604450363368
it:123 last:578.88597767002 cur:583.43520379413 add:4.5492261241002
it:124 last:583.43520379413 cur:587.74462875461 add:4.3094249604809
it:125 last:587.74462875461 cur:594.67011966358 add:6.9254909089751
it:126 last:594.67011966358 cur:598.5481926842 add:3.8780730206198
it:127 last:598.5481926842 cur:601.6696032882 add:3.1214106039962
it:128 last:601.6696032882 cur:603.95251434778 add:2.2829110595874
it:129 last:603.95251434778 cur:608.7818016512 add:4.8292873034127
it:130 last:608.7818016512 cur:612.95380846006 add:4.1720068088605

Available to perform further tests and provide details if needed

PHP Version

PHP 8.4.5 TS vs17 x64

Operating System

Windows 11 x64

@nielsdos
Copy link
Member

I get an assertion failure on 8.4.5 on Windows+JIT (when built in debug mode):
Assertion failed: _blocks[use] == b || use_insn->op == IR_PARAM, file ext\opcache\jit\ir\ir_gcm.c, line 948
On PHP-8.4 git HEAD I no longer get the assertion failure but I do get the wrong result on Windows+JIT.
I'll have a look a bit later

@nielsdos
Copy link
Member

nielsdos commented Mar 24, 2025

Reduced to a standalone reproducer:

<?php
namespace Foo;

function diff($point1, $point2)
{
	$a = deg2rad($point1); // Prefixing these with \ also makes the issue go away
	$b = deg2rad($point2);
	return $a - $b;
}

function getRawDistance()
{
	$distance = 0;
	for ($p = 0; $p < 200; $p++) {
		// Needs to be a namespaced call_user_func call to reproduce the issue (i.e. adding \ at front makes the issue go away)
		$distance += call_user_func('Foo\diff', 0, $p);
	}

	return $distance;
}

var_dump(getRawDistance());

@nielsdos
Copy link
Member

nielsdos commented Mar 24, 2025

The above reproducer outputs float(-3.473205211468716) but it should output float(-347.3205211468715). Can only reproduce it on Windows, not on Linux. Run with -d opcache.jit=tracing. Not sure where to start looking and I don't have a capstone package on Windows to look at the disassembly... cc @dstogov
EDIT: it's as if instead of addition to $distance, it assigns to $distance. The result of $distance is always equal to the result of the last call to Foo\diff.

@nono303
Copy link
Contributor Author

nono303 commented Apr 4, 2025

Hi,
Another opcache JIT optimization >=2 issue resulting a core dump
Always the same context (Windows 11 x64 + PHP 8.4.5 TS vs17 x64 self compiled) but with php-cgi
Unhandled exception at 0x00007FFD48540727 (php8ts.dll) in php-cgi.exe_250402_164329.dmp: 0xC0000005: Access violation reading location 0x000001F33605F0B8.

const _zend_array * is null in the Inline Frame call stack

[Inline Frame] php8ts.dll!zend_hash_find_bucket(const _zend_array *) Line 769
	at C:\sdk\src\php-sdk\phpmaster\vs17\x64\php-src\Zend\zend_hash.c(769)
[Inline Frame] php8ts.dll!zend_hash_find(const _zend_array *) Line 2675
	at C:\sdk\src\php-sdk\phpmaster\vs17\x64\php-src\Zend\zend_hash.c(2675)
[Inline Frame] php8ts.dll!zend_hash_find_ex(const _zend_array *) Line 192
	at C:\sdk\src\php-sdk\phpmaster\vs17\x64\php-src\Zend\zend_hash.h(192)
[Inline Frame] php8ts.dll!zend_fetch_dimension_address_inner(_zend_array * ht, const _zval_struct * dim, int) Line 2712
	at C:\sdk\src\php-sdk\phpmaster\vs17\x64\php-src\Zend\zend_execute.c(2712)
php8ts.dll!ZEND_FETCH_DIM_R_SPEC_CV_CONST_HANDLER(_zend_execute_data * execute_data) Line 42849
	at C:\sdk\src\php-sdk\phpmaster\vs17\x64\php-src\Zend\zend_vm_execute.h(42849)
php8ts.dll!execute_ex(_zend_execute_data * ex) Line 58584
	at C:\sdk\src\php-sdk\phpmaster\vs17\x64\php-src\Zend\zend_vm_execute.h(58584)
php8ts.dll!zend_execute(_zend_op_array * op_array, _zval_struct * return_value) Line 64238
	at C:\sdk\src\php-sdk\phpmaster\vs17\x64\php-src\Zend\zend_vm_execute.h(64238)
php8ts.dll!zend_execute_script(int type, _zval_struct * retval, _zend_file_handle * file_handle) Line 1935
	at C:\sdk\src\php-sdk\phpmaster\vs17\x64\php-src\Zend\zend.c(1935)
php8ts.dll!php_execute_script_ex(_zend_file_handle * primary_file, _zval_struct * retval) Line 2575
	at C:\sdk\src\php-sdk\phpmaster\vs17\x64\php-src\main\main.c(2575)
php-cgi.exe!main(int argc, char * * argv) Line 2562
	at C:\sdk\src\php-sdk\phpmaster\vs17\x64\php-src\sapi\cgi\cgi_main.c(2562)
[Inline Frame] php-cgi.exe!invoke_main() Line 78
	at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl(78)
php-cgi.exe!__scrt_common_main_seh() Line 288
	at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl(288)
kernel32.dll!BaseThreadInitThunk()
ntdll.dll!RtlUserThreadStart()

Image

What kind of details would be usefull to understand this one? (the script is quite complex...)

@nielsdos
Copy link
Member

nielsdos commented Apr 5, 2025

I found the root cause, and your last comment is likely related to the same root cause.

The problem is due to (global) register allocation. getRawDistance() stores $distance in %xmm6 and diff() also uses %xmm6. %xmm6 is a preserved register, but the prologue of diff() does not save this to the stack frame. ctx->used_preserved_regs does not even contain the floating point register. So the value of $distance (%xmm6) in getRawDistance() is simply overwritten by the code in diff(). I found that we enter this branch:

https://github.com/dstogov/ir/blob/8d17022fb61ebfed9f6be81a8182ea31202697ed/ir_ra.c#L4099-L4100

The reason this doesn't happen on Linux is because on Linux all XMM registers are volatile, while on Windows only xmm0-xmm5 are volatile while xmm6-xmm15 are preserved.

@nono303 As a workaround, try setting the R component of opcache.jit to 1. e.g. instead of 1252 pick 1152. Does that also workaround the bug reported in your last comment? If that doesn't work, try picking 0.

@nono303
Copy link
Contributor Author

nono303 commented Apr 5, 2025

Many Thx @nielsdos for your time on it!

✅ I confirm that changing R component of opcache.jit from 2 to 1 (whatever is O between 2 and 5) fix the issue as a workaround for getRawDistance() and also for my second complex script

> php dist.php
opcache.enable_cli:1    opcache.jit:1255
  function:    9063.663059466
  callback:    9063.663059466
  phpGPX full: 1.7499852968769
  phpGPX mini: 1.7499852906557

> php dist.php
opcache.enable_cli:1    opcache.jit:1155
  function:    9063.663059466
  callback:    9063.663059466
  phpGPX full: 9063.6630584286
  phpGPX mini: 9061.4300164687

⏩ If you need some quick (like with patch on php-src before making a PR) tests on Windows, don’t hesitate as I can easily compile binary and launch different tests

@nielsdos
Copy link
Member

nielsdos commented Apr 5, 2025

Thanks for testing. Unfortunately, I will have difficulties making a patch for this as I'm not too familiar with the backend component of IR and it would take me some time learning that. I also think that if this means we diverge from the fixed stack frame then we get issues with the unwinding code that was added in #17095
So this is a bit above my knowledge level for now and would be best if @dstogov could take a look please.

@dstogov
Copy link
Member

dstogov commented Apr 7, 2025

@nono303 thanks for the report. @nielsdos thanks for the investigation. You already found the reason and I hope it shouldn't be a big problem to fix this now. I'll take care.

@dstogov
Copy link
Member

dstogov commented Apr 7, 2025

The fair fix is simple. Could you check that it fixes your problem (I wasn't able to reproduce the failure with reproduction case. Only the assertion).

diff --git a/ext/opcache/jit/ir/ir_ra.c b/ext/opcache/jit/ir/ir_ra.c
index 0c0e8dec3b4..fa8f639c0c3 100644
--- a/ext/opcache/jit/ir/ir_ra.c
+++ b/ext/opcache/jit/ir/ir_ra.c
@@ -4097,7 +4097,7 @@ static void assign_regs(ir_ctx *ctx)
 		if (IR_REGSET_DIFFERENCE(IR_REGSET_INTERSECTION(used_regs, IR_REGSET_PRESERVED),
 			ctx->used_preserved_regs)) {
 			// TODO: Preserved reg and fixed frame conflict ???
-			// IR_ASSERT(0 && "Preserved reg and fixed frame conflict");
+			IR_ASSERT(0 && "Preserved reg and fixed frame conflict");
 		}
 	} else {
 		ctx->used_preserved_regs = IR_REGSET_UNION((ir_regset)ctx->fixed_save_regset,
diff --git a/ext/opcache/jit/zend_jit_ir.c b/ext/opcache/jit/zend_jit_ir.c
index 4d006c19d34..ff0abd8e1a0 100644
--- a/ext/opcache/jit/zend_jit_ir.c
+++ b/ext/opcache/jit/zend_jit_ir.c
@@ -33,12 +33,10 @@
 # define ZREG_IP             15 /* IR_REG_R15 */
 # define ZREG_FIRST_FPR      16
 # if defined(_WIN64)
-#  define IR_REGSET_PRESERVED ((1<<3) | (1<<5) | (1<<6) | (1<<7) | (1<<12) | (1<<13) | (1<<14) | (1<<15))
-/*
+//#  define IR_REGSET_PRESERVED ((1<<3) | (1<<5) | (1<<6) | (1<<7) | (1<<12) | (1<<13) | (1<<14) | (1<<15))
 #  define IR_REGSET_PRESERVED ((1<<3) | (1<<5) | (1<<6) | (1<<7) | (1<<12) | (1<<13) | (1<<14) | (1<<15) | \
                                (1<<(16+6)) | (1<<(16+7)) | (1<<(16+8)) | (1<<(16+9)) | (1<<(16+10)) | \
                                (1<<(16+11)) | (1<<(16+12)) | (1<<(16+13)) | (1<<(16+14)) | (1<<(16+15)))
-*/
 #  define IR_SHADOW_ARGS     32
 # else
 #  define IR_REGSET_PRESERVED ((1<<3) | (1<<5) | (1<<12) | (1<<13) | (1<<14) | (1<<15)) /* all preserved registers */
@@ -2696,7 +2694,8 @@ static void zend_jit_init_ctx(zend_jit_ctx *jit, uint32_t flags)
 			jit->ctx.flags |= IR_USE_FRAME_POINTER;
 			jit->ctx.fixed_stack_frame_size = sizeof(void*) * 16; /* 10 saved registers and 6 spill slots (8 bytes) */
 #elif defined(_WIN64)
-			jit->ctx.fixed_stack_frame_size = sizeof(void*) * 11; /* 8 saved registers and 3 spill slots (8 bytes) */
+//			jit->ctx.fixed_stack_frame_size = sizeof(void*) * 11; /* 8 saved registers and 3 spill slots (8 bytes) */
+			jit->ctx.fixed_stack_frame_size = sizeof(void*) * 21; /* 18 saved registers and 3 spill slots (8 bytes) */
 #elif defined(IR_TARGET_X86_64)
 			jit->ctx.fixed_stack_frame_size = sizeof(void*) * 9;  /* 6 saved registers and 3 spill slots (8 bytes) */
 #else /* IR_TARGET_x86 */

The drawback of the fix, that it significantly increases code of each prologue and epilogue. This is the reason it wasn't applied in the first place.

After all, I think that it may make sense to avoid "fixed stack frame" for CALL VM JIT, but this would require inlining of most the JIT stubs. Probably, this is the right way, but it's not a simple fix.

@nielsdos may be you have other ideas?

@dstogov
Copy link
Member

dstogov commented Apr 7, 2025

@arnaud-lb please also think about this problem

@arnaud-lb
Copy link
Member

If I understand correctly, fixed_stack_frame_size causes JIT'ed code to allocate a fixed amount of stack in prologue and to always save fixed_save_regset. This implies that epilogues are always the same, which allow us to jump to stubs, as they will restore registers and stack pointers correctly before they return.

If that's the case, I think we could achieve a similar result by storing a bitset of saved registers on the top of the stack frame, so that stubs know the size of the frame and which registers they have to restore. Stub epilogues would be slower, but normal functions will usually have smaller epilogue/prologue.

An alternative is to not emit an epilogue in stubs, and to call them:

call foo_stub
ret

NB: I don't understand the difference between CALL and HYBRID in this context.

@nielsdos
Copy link
Member

nielsdos commented Apr 7, 2025

@dstogov That works and the patch looks right: now xmm6-xmm15 are preserved. I had tried something like this (but more hacked) but didn't pursue it because of the overhead.

@dstogov
Copy link
Member

dstogov commented Apr 7, 2025

An alternative is to not emit an epilogue in stubs, and to call them:

This won't work at least for linking between traces. (prologue in root trace, then few branches to side traces, then epilogue and return).

NB: I don't understand the difference between CALL and HYBRID in this context.

HYBRID handlers are called through C computed goto, inside the execute_ex() stack frame, and "return" through indirect jmp. They require "fixed stack frame" and may avoid saving/restoring preserved registers.

@nono303
Copy link
Contributor Author

nono303 commented Apr 7, 2025

Hi @dstogov
✅ I can confirm that your patch applied on 8.4.6rc1 fix the issue on Windows!

>php dist.php
opcache.enable_cli:1    opcache.jit:1255
  function:    9063.663059466
  callback:    9063.663059466
  phpGPX full: 1.7499852968769
  phpGPX mini: 1.7499852906557

...

    # apply php-src_pr18136.patch
Checking patch ext/opcache/jit/ir/ir_ra.c...
Checking patch ext/opcache/jit/zend_jit_ir.c...
Applied patch ext/opcache/jit/ir/ir_ra.c cleanly.
Applied patch ext/opcache/jit/zend_jit_ir.c cleanly.

...

>php dist.php
opcache.enable_cli:1    opcache.jit:1255
  function:    9063.663059466
  callback:    9063.663059466
  phpGPX full: 9063.6630584286
  phpGPX mini: 9061.4300164687

I'll check if i see some regression on other use-cases

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants