RFC: list() reference syntax - bug 6768 & 7930 #2371
Conversation
Zend/zend_compile.c
Outdated
break;
}
prev_op->extended_value = ZEND_LIST_MAKE_WRITABLE;
} while ((prev_op = (prev_op - 1)) != NULL);
Does this correctly handle the case of [[$a, &$b]] = $c, i.e. where the FETCH_LIST for $c[0] is not immediately before the FETCH_LIST for $c[0][1]?
In a similar vein, if $c = [] in that example, would a notice be generated for access of $c[0] or not? One of the variables is a simple read, while the other isn't.
To the former concern: yes, thanks for looking at that so closely. I'll change the loop there so it goes back looking for list-related assigns and fetches, to properly propagate the write up the list tree. Just pushed a change for it now.
Zend/zend_compile.c
Outdated
continue;
}
break;
} else if (prev_op->opcode == ZEND_ASSIGN_REF || prev_op->opcode == ZEND_ASSIGN) {
Keep in mind that the list() keys may contain complex expressions that emit additional opcodes, e.g. [$a, $b . $c => $d] = $e is valid, in which case there will be a CONCAT before the FETCH_LIST.
Hmm, yes. I'm just trying to limit how far back we need to look. Obviously this is for optimization purposes here, but since it's during compile and not execute, I guess I could drop that check and just keep looking back through this op-stack. Not exactly efficient, but not sure if that's better or worse than keeping looking back through the op-array
However, once you handle that you will run into the larger problem that keeping INDIRECT variables alive across arbitrary expressions is not memory safe, because the expression might invalidate the location the INDIRECT points to. For example, something like [[&$b, $a[] = 1 => &$c]] = $a might reallocate the $a array during $a[] = 1.
Instead of fixing up opcodes after the fact you can do an up-front pass through the list() ast and add a flag for all list()s that recursively contain a reference. Then you can directly emit the opcode with the correct flag.
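The up-front pass suggested above could look roughly like the following. This is only a sketch on a toy AST: `struct node`, `NODE_VAR`, `NODE_LIST` and `mark_lists_containing_refs` are hypothetical names made up for illustration, not Zend's `zend_ast` API, and the real patch may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a list() AST. None of these names are Zend's. */
enum node_kind { NODE_VAR, NODE_LIST };

struct node {
    enum node_kind kind;
    bool by_ref;            /* leaf written as &$x */
    bool contains_ref;      /* set by the pass on NODE_LIST nodes */
    size_t child_count;
    struct node **children; /* NULL entries model skipped slots: list(, $b) */
};

/* Post-order walk. Returns true if `n` is, or recursively contains, a
 * by-reference element, and records the answer on every nested list so
 * that a later emit step can pick the right opcode directly. */
static bool mark_lists_containing_refs(struct node *n)
{
    if (n == NULL) {
        return false;
    }
    if (n->kind == NODE_VAR) {
        return n->by_ref;
    }

    assert(n->children != NULL || n->child_count == 0);

    bool found = false;
    for (size_t i = 0; i < n->child_count; i++) {
        /* No short-circuit: every nested list has to be visited and flagged. */
        if (mark_lists_containing_refs(n->children[i])) {
            found = true;
        }
    }
    n->contains_ref = found;
    return found;
}
```

Because the flag is known before any opcode is emitted, there is no need to walk backwards through already emitted opcodes, which is what made key expressions such as `$b . $c => $d` a problem for the fix-up approach.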
It may not be perfect, but I changed it to a pre-pass: for each element it looks down the list to see whether there are references, so we can set the flag accordingly.
As to the next issue with INDIRECT, I'm not sure how to construct a case in which it fails.
$a = [
1 => "one",
2 => "two",
];
[[1 => &$b, $a[] = 2 => $c]] = $a;
var_dump($a);
outputs
PHP Notice: Undefined offset: 0 in /home/dwalker/src/php/e.php on line 8
PHP Notice: Undefined offset: 1 in /home/dwalker/src/php/e.php on line 8
PHP Notice: Undefined offset: 2 in /home/dwalker/src/php/e.php on line 8
array(4) {
[1]=>
string(3) "one"
[2]=>
string(3) "two"
[0]=>
array(1) {
[1]=>
&NULL
}
[3]=>
int(2)
}
Which is what I'm kind of expecting to see. The $a main array remains after the run. There's a 0'th element that is an array with the reference to NULL, and the odd key of $a[] = 2 does give a new element in $a with the value 2. If there is some trick to make the indirect fail, I'd like to try it to understand the problem better.
Zend/zend_compile.h
Outdated
#define ZEND_RETURN_VAL 0
#define ZEND_RETURN_REF 1
#define ZEND_RETURN_VALUE 0
#define ZEND_RETURN_REFERENCE 1
This is an unrelated change. It's better to commit it separately with an appropriate comment.
Will do, sorry old habits die hard.
Before I revert the spacing issues, a question about best practice for PRs: should I do an interactive rebase to remove the changes from prior commits and force-push, or add a commit that reverts the spacing changes and then re-commit them separately afterwards?
I think it's better to do a squash commit at the end.
Zend/zend_execute.c
Outdated
@@ -1622,7 +1622,7 @@ static zend_always_inline zval *zend_fetch_dimension_address_inner(HashTable *ht
switch (Z_TYPE_P(dim)) {
case IS_UNDEF:
zval_undefined_cv(EG(current_execute_data)->opline->op2.var, EG(current_execute_data));
/* break missing intentionally */
/* break missing intentionally */
Please don't mix whitespace changes with semantic patches. Commit this separately.
Again, old habits, will revert.
Zend/zend_execute.c
Outdated
@@ -1881,7 +1881,7 @@ static zend_always_inline void zend_fetch_dimension_address_read(zval *result, z
dim = &EG(uninitialized_zval);
}
if (!Z_OBJ_HT_P(container)->read_dimension) {
zend_throw_error(NULL, "Cannot use object as array");
zend_throw_error(NULL, "Cannot use object of type %s as array", ZSTR_VAL(Z_OBJCE_P(container)->name));
This is also unrelated. I'm not sure if we need this.
Zend/zend_vm_def.h
Outdated
ZVAL_REF(EX_VAR(opline->result.var), Z_REF_P(retval_ptr));
} else {
ZVAL_COPY_VALUE(EX_VAR(opline->result.var), &retval);
}
Instead of modifying FETCH_LIST and relying on extended_value, it's better to rename the unmodified FETCH_LIST to FETCH_LIST_R and introduce a new FETCH_LIST_RW that accepts only CV and VAR as op1. This should make the patch completely safe.
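As a rough illustration of that split, the compiler side would pick one of two opcodes instead of patching a flag into a single one. The sketch below uses hypothetical enums and a hypothetical `select_list_fetch` helper; it is not the Zend compiler's code, and the names only mirror the ones used in the comment above.

```c
#include <stdbool.h>

/* Hypothetical operand kinds and opcodes for illustration only. */
enum operand_kind { OPERAND_CONST, OPERAND_TMP, OPERAND_VAR, OPERAND_CV };
enum fetch_opcode { FETCH_INVALID, FETCH_LIST_R, FETCH_LIST_RW };

/* Choose the fetch opcode for one list() level.
 * - Plain (by-value) destructuring can read from any operand kind.
 * - Destructuring that contains a reference needs a writable container,
 *   so only VAR and CV operands are accepted; anything else is rejected
 *   and the caller is expected to report an error. */
static enum fetch_opcode select_list_fetch(enum operand_kind container, bool needs_write)
{
    if (!needs_write) {
        return FETCH_LIST_R;
    }
    if (container == OPERAND_VAR || container == OPERAND_CV) {
        return FETCH_LIST_RW;
    }
    return FETCH_INVALID;
}
```

Keeping the read path on its own opcode means existing by-value list() assignments are untouched, while the write path can reject constant and temporary containers in one place.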
I was debating doing that as well. However, since I didn't want to get too expansive on, what seemed to be a smaller change (w/ indirects earlier) I kept the one. I will split this into the two respective opcodes.
When I do that, my other concern, was for opcode caching. I'm not overly familiar with that end of the code, and am not sure the direct impact that would have there. Would you be able to shed any light on the implications of making FETCH_LIST_R/RW?
Zend/zend_compile.c
Outdated
zend_emit_assign_ref_znode(var_ast, &fetch_result);
} else {
zend_emit_assign_znode(var_ast, &fetch_result);
}
expr_node might be IS_CONST or IS_TMP_VAR.
These expressions should be prohibited in the same way as $a =& 5; or $a =& $x + $y;
Thanks, will test and check for it as well.
The $a =& 5 is prohibited as a parse error, whereas shockingly $a =& $x+$y is executed, albeit I concur it probably shouldn't be.
<?php
$a = 10;
$b = 20;
$c =& $a + $b;
outputs:
int(10)
int(20)
int(10)
However, wrapping it like $c =& ($a + $b) is also a parse error.
However, constants, and tmpvars, appear to be caught during execution raising a Fatal, wherein assignments can only happen to writable values. So it's preventing the odd uses. I assume the check you're asking for here is a compile time check, rather than an execution based one.
Regarding $c =& $a + $b: PHP defines the =& operator as accepting variables on both sides, so this is parsed as ($c =& $a) + $b (as $a + $b is not a variable). This is similar to how !$x = f() is parsed as !($x = f()) rather than (!$x) = f(), as most other languages would interpret it.
Zend/zend_execute.c
Outdated
{
zend_fetch_dimension_address_read(result, container, dim, IS_TMP_VAR, BP_VAR_R, 0, 0);
}

static zend_never_inline void zend_fetch_dimension_address_LIST_RW(zval *result, zval *container, zval *dim)
{
zend_fetch_dimension_address(result, container, dim, IS_TMP_VAR, BP_VAR_RW);
Please, verify string offset handling.
$s = "abc"; [$a, &$b, $c]= $s; var_dump($a, $b, $c);
I believe this is expected behavior, will include it in a test as well.
PHP Fatal error: Uncaught Error: Cannot create references to/from string offsets in /home/dwalker/src/php/e.php:4
Stack trace:
#0 {main}
thrown in /src/php/e.php on line 4
Zend/zend_vm_def.h
Outdated
zend_fetch_dimension_address_LIST_R(&retval, container, GET_OP2_ZVAL_PTR_UNDEF(BP_VAR_R));

ZVAL_COPY_VALUE(EX_VAR(opline->result.var), &retval);
Why did you change this to perform double copy?
RFC has been accepted.
There is no test case for |
edit: ah, not using list() using [], I see, I can add.
|
The Travis build is currently failing (compile error).
Zend/zend_compile.c
Outdated
if (elem_ast) {
zend_ast *var_ast = elem_ast->child[0];

if (elem_ast->kind == ZEND_AST_ARRAY_ELEM && elem_ast->attr) {
Can this ever be something other than an ARRAY_ELEM?
I don't believe so, I tried thinking of what other sub-container of variables could exist within list() syntax. Or, at least, I'm not familiar with any other.
Zend/zend_compile.h
Outdated
@@ -966,6 +966,8 @@ static zend_always_inline int zend_check_arg_send_type(const zend_function *zf,
#define ZEND_RETURNS_FUNCTION 1<<0
#define ZEND_RETURNS_VALUE 1<<1

#define ZEND_LIST_MAKE_WRITABLE 1
Looks like a leftover from a previous implementation.
Zend/zend_vm_def.h
Outdated
container = GET_OP1_ZVAL_PTR_UNDEF(BP_VAR_RW);

if (UNEXPECTED(OP1_TYPE == IS_VAR && !Z_ISREF_P(container))) {
zend_error(E_NOTICE, "Attempting to set reference to non refereancable value");
nit: Typo "refereancable"
Zend/zend_vm_def.h
Outdated
SAVE_OPLINE();
container = GET_OP1_ZVAL_PTR_UNDEF(BP_VAR_RW);

if (UNEXPECTED(OP1_TYPE == IS_VAR && !Z_ISREF_P(container))) {
This looks a bit heavy-handed to me. VARs that can be turned into refs are not necessarily already refs. Can you try if this works correctly on something like list(&$x) = $obj->x, where $obj->x is not already a reference?
Zend/zend_vm_def.h
Outdated
if (UNEXPECTED(Z_TYPE(retval) == IS_INDIRECT)) {
retval_ptr = Z_INDIRECT(retval);
if (EXPECTED(!Z_ISREF_P(retval_ptr))) {
ZVAL_MAKE_REF already contains this condition.
Apologies if I missed them, but I don't see test cases for some of the more perverse behaviour you can do with these sorts of assignments, e.g.,
list($a, &$a) = $arr;
list(&$a, $a) = $arr;
list($a, &$a) = $a;
list(&$a, $a) = $a;
Or what about:
$a = 1;
$b = &$a;
$arr = [&$a, &$b];
list(&$a, &$b) = $arr; // Or list(&$b, &$a) = $arr;
This sort of edge case having bizarre unintended behaviour has caused enormous grief to users and other runtime developers in the past (both HHVM and future-PHP) and so are probably worth making sure there aren't any hidden mines in here somewhere :)
This PR doesn't seem to touch the code in https://github.com/php/php-src/blob/master/Zend/zend_compile.c#L3017, so the RHS of list() is still fetched in R mode, rather than RW. Pretty sure this means that if the RHS is anything but a simple variable the reference unpacking isn't going to work correctly. Would be good to have tests where the RHS is something other than a CV. (Note that if it is fetched as RW, then there is again the INDIRECT safety issue mentioned earlier, so this would have to be converted into a reference proactively as well.)
What also seems to be missing from the implementation is handling for destructuring in foreach, i.e. something like |
@nikic - what would be a case of a RHS list()? I will add tests for something like this:
(fwiw, the above emits a |
Also, I'm not certain how to fix the travis compile error, the complaint about the |
I'm not totally sure how this is happening. Current master contains a ZEND_FETCH_LIST reference there: https://github.com/php/php-src/blob/master/ext/opcache/Optimizer/zend_optimizer.c#L458 However, your tree does not, so I wouldn't expect this to cause a problem :/ Probably this will go away if you rebase onto master and replace the additional reference to ZEND_FETCH_LIST.
Ok, I'm addressing other issues now (specifically your catch about
|
Any progress here?
5af885d to 5cf9a54
This looks pretty good to me. There is one larger issue with regard to memory safety: The PHP VM has strict requirements about no user code executing between a FETCH_W value being created and used. See http://nikic.github.io/2017/04/14/PHP-7-Virtual-machine.html#writes-and-memory-safety for a brief explanation. This issue currently arises for code like the following:
$ary = [[0, 1]];
[[
0 => &$a,
($ary["foo"] = 1) => &$b
]] = $ary;
This produces the following opcodes:
Notably, the @1 variable is created in # 1 and then used in # 6 and quite a lot of code runs in between. In this case the array is reallocated in the meantime, resulting in the following valgrind warnings:
An easy way to fix this is to emit an additional |
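The memory-safety concern described above can be pictured with a toy model. The sketch below is an analogy only: `toy_array`, `ref_box`, `toy_push`, `toy_make_ref` and `toy_get` are hypothetical names, not Zend structures or opcodes. It assumes that holding a raw pointer into a growable array is unsafe across code that may grow the array, and shows the general remedy of first moving the element into a separately allocated, reference-counted box whose address does not change when the array is reallocated.

```c
#include <assert.h>
#include <stdlib.h>

/* Separately allocated cell; its address survives array reallocation. */
struct ref_box { int refcount; int value; };

struct slot {
    struct ref_box *box; /* non-NULL once the slot was promoted to a reference */
    int value;           /* used only while box == NULL */
};

struct toy_array { struct slot *slots; size_t len, cap; };

/* Append a value; may move the whole slot buffer via realloc. */
static int toy_push(struct toy_array *a, int value)
{
    if (a->len == a->cap) {
        size_t cap = a->cap ? a->cap * 2 : 1;
        struct slot *grown = realloc(a->slots, cap * sizeof *grown);
        if (!grown) {
            return -1;
        }
        a->slots = grown;
        a->cap = cap;
    }
    a->slots[a->len].box = NULL;
    a->slots[a->len].value = value;
    a->len++;
    return 0;
}

/* Promote slot i to a boxed reference and hand the box to the caller.
 * The caller keeps the box pointer, never a pointer into a->slots. */
static struct ref_box *toy_make_ref(struct toy_array *a, size_t i)
{
    assert(i < a->len);
    struct slot *s = &a->slots[i];
    if (s->box == NULL) {
        s->box = malloc(sizeof *s->box);
        if (!s->box) {
            return NULL;
        }
        s->box->refcount = 1; /* held by the array */
        s->box->value = s->value;
    }
    s->box->refcount++;       /* held by the caller */
    return s->box;
}

/* Read slot i, following the box if the slot was promoted. */
static int toy_get(const struct toy_array *a, size_t i)
{
    assert(i < a->len);
    const struct slot *s = &a->slots[i];
    return s->box ? s->box->value : s->value;
}
```

In this model, a caller that obtained a box from `toy_make_ref` can run arbitrary `toy_push` calls afterwards and still write through the box safely, whereas a saved `&a->slots[i]` could dangle after the first reallocation. Cleanup of boxes and of the slot buffer is left out to keep the sketch short.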
Zend/zend_compile.c
Outdated
if (zend_compile_list_assign_requires_w(var_ast) && zend_is_variable(expr_ast)) {
zend_compile_var(&expr_node, expr_ast, BP_VAR_W);
} else {
zend_compile_expr(&expr_node, expr_ast);
I'm wondering if it wouldn't be better to throw a compile time error in this case. After all things like $foo =& 42 are parse errors, so I would expect list(&$foo) = [42] to be a compile error.
I can change. I guess they are kind of synonymous, attempt to assign to a non referenceable value.
@nikic - Thanks for the explanation of the VM. (ps, enjoy your blog on the internals, super helpful). Added the ref, and valgrind is coming up clean locally, and added a test for your example case. Also flipped around so we can toss out a compiler error & added tests to the effect.
Zend/zend_compile.c
Outdated
zend_compile_expr(&expr_node, expr_ast);
}
} else {
zend_error_noreturn(E_COMPILE_ERROR, "Cannot assign reference to non referencable value");
Looks like the branches are inverted here. Now this error will also be thrown for normal, non-reference list assignments.
I neglected to run against more than just the list tests. Yes, it should be checking not if variable, but, if needs a write.
Zend/zend_vm_def.h
Outdated
ZEND_VM_NEXT_OPCODE_CHECK_EXCEPTION();
}

ZEND_VM_HANDLER(198, ZEND_FETCH_LIST_W, CONST|TMPVAR|CV, CONST|TMPVAR|CV)
It should be possible to drop the CONST| for the first operand here now, as it is covered by the compile-time check.
Zend/zend_compile.c
Outdated
zend_compile_expr(&expr_node, expr_ast);
if (zend_compile_list_assign_requires_w(var_ast)) {
if (zend_is_variable(expr_ast)) {
zend_compile_var(&expr_node, expr_ast, BP_VAR_W);
This case also needs an extra MAKE_REF opcode. In this case it may make sense to only emit it if expr_node.op_type != IS_CV.
Would you possibly have an example of code that would utilize this case, or rather, would expose why it'd be necessary? So I can use it for understanding why / adding a test case.
Working off the previous test-case, something like this should trigger it (not tested):
$ary = [[0, 1]];
[
0 => &$a,
($ary["foo"] = 1) => &$b
] = $ary[0];
So basically the case where an array access has been shifted from the LHS to the RHS of the list assignment.
Zend/zend_compile.c
Outdated
@@ -3025,7 +3068,15 @@ void zend_compile_assign(znode *result, zend_ast *ast) /* {{{ */
/* list($a, $b) = $a should evaluate the right $a first */
zend_compile_simple_var_no_cv(&expr_node, expr_ast, BP_VAR_R, 0);
} else {
The zend_list_has_assign_to_self() branch above currently doesn't handle the reference case, it also needs to switch to BP_VAR_W (+ MAKE_REF).
So, fwiw, zend_compile_simple_var_no_cv does seem to accept the BP_ type, but then proceeds to do nothing with that argument. Each emit in there is hardcoded for an _R type as well.
Sorry for the delay here. I've just pushed a tweak (9fbb019) so that zend_compile_simple_var_no_cv does not require an explicit zend_adjust_for_fetch_type call after it, so just passing the right fetch type should work now.
For the opcache support, I think it should be sufficient to add |
Add unit tests
Update opcode name
Remove obsolete test
Apply changes requested
- Make _w fetch from list a ref
- Raise compiler error when assigning to a known non referencable value
Remove unused condition in VM
Fix logic for compile error, drop const from LIST_W op1
Add additional make ref, for write access
Add ZEND_FETCH_LIST_W to switches and places DIM_W exists
Tests are failing due to segfault, but I'm not able to replicate anywhere I'm building on. How does one best debug that?
* Don't generate MAKE_REF for CVs normally
* But do generate it for self-assign CVs
I don't see any more open issues here. @dstogov Would you like to review as well?
Let assign_ref handle the MAKE_REF emission. Only explicitly emit the MAKE_REF if it's for a (non-leaf) list() assignment.
Rather than recomputing it on every access, propagate only once and store it in elem_ast->attr, consistently with direct references.
Make all compile_list_assign() calls explicit, instead of automatically calling it from assign_znode(). I did not like the asymmetry between assign_znode() and assign_ref_znode() here.
Sorry for the delay, now merged as 6d4de4c into master. Thanks a lot for your work on this RFC!
RFC: https://wiki.php.net/rfc/list_reference_assignment
Proposed implementation would expand list() assignment syntax to allow users to include reference assignments as part of the list() rather than needing to do independently.