Track HashTableIterators for copy-on-write copies of HashTables #11248
When executing a foreach ($ht as &$ref), foreach calls zend_hash_iterator_pos_ex() on every iteration. If the HashTable contained in the $ht variable is not the tracked HashTable, it resets the position to the internal array pointer of the array currently in $ht.

This behaviour is generally fine, but undesirable for copy-on-write copies of the iterated HashTable. This can trivially occur when the iterated HashTable is assigned to some variable and the iterated variable is then modified, which leads to array separation and changes the HashTable pointer in the variable. As a result, foreach happily restarts iteration.

This behaviour (despite existing since PHP 7.0) is considered a bug: not only is it unexpected to the user, but copy-on-write should also not have trivially observable side effects caused by mere assignment.

The bugfix consists of duplicating HashTableIterators whenever zend_array_dup() is called (the primitive used on array separation). When a further access to the HashPosition through the HashTableIterators API happens and the HashTable does not match the tracked one, all the duplicates (which are tracked by a singly linked list) are searched for the wanted HashTable. If found, the HashTableIterator is replaced by the found copy and all other copies are removed. This ensures that we always end up tracking the correct HashTable.

Fixes GH-11244.

Signed-off-by: Bob Weinand <[email protected]>
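The mechanism in this commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyTable`, `ToyIterator` and the `toy_*` helpers are hypothetical stand-ins and not the Zend structures or the code of this patch; only the field name `next_copy` is taken from the patch. The fallback to the internal pointer when no copy matches, and the release of all copies in that case, are simplifications of the toy model.

```c
#include <stdint.h>
#include <string.h>

/* Toy model: all names below are hypothetical, not the Zend API. */
#define TOY_MAX_ITERS 16
#define TOY_NO_SLOT UINT32_MAX

typedef struct { uint32_t internal_pointer; } ToyTable;

typedef struct {
    ToyTable *ht;        /* tracked table; NULL marks a free slot */
    uint32_t  pos;       /* iteration position inside ht */
    uint32_t  next_copy; /* next slot in the ring of copies (own index if none) */
} ToyIterator;

static ToyIterator toy_iters[TOY_MAX_ITERS];

/* Register an iterator for ht at pos; returns its slot index. */
static uint32_t toy_iter_add(ToyTable *ht, uint32_t pos) {
    for (uint32_t i = 0; i < TOY_MAX_ITERS; i++) {
        if (toy_iters[i].ht == NULL) {
            toy_iters[i].ht = ht;
            toy_iters[i].pos = pos;
            toy_iters[i].next_copy = i;
            return i;
        }
    }
    return TOY_NO_SLOT;
}

/* Called when src is duplicated into dst: every iterator on src gets a
 * copy that tracks dst and is spliced into the same ring. */
static void toy_on_dup(ToyTable *src, ToyTable *dst) {
    for (uint32_t i = 0; i < TOY_MAX_ITERS; i++) {
        if (toy_iters[i].ht != src) continue;
        uint32_t c = toy_iter_add(dst, toy_iters[i].pos);
        if (c == TOY_NO_SLOT) return; /* toy model: give up when full */
        toy_iters[c].next_copy = toy_iters[i].next_copy;
        toy_iters[i].next_copy = c;
    }
}

/* Position lookup: if the variable now holds a different table, adopt the
 * matching copy (if any) and release the rest of the ring. */
static uint32_t toy_iter_pos(uint32_t idx, ToyTable *current) {
    ToyIterator *it = &toy_iters[idx];
    if (it->ht != current) {
        /* Fallback when no copy matches: restart from the internal pointer. */
        uint32_t pos = current->internal_pointer;
        for (uint32_t c = it->next_copy; c != idx; ) {
            uint32_t next = toy_iters[c].next_copy;
            if (toy_iters[c].ht == current) pos = toy_iters[c].pos;
            toy_iters[c].ht = NULL; /* release the copy */
            c = next;
        }
        it->ht = current;
        it->pos = pos;
        it->next_copy = idx;
    }
    return it->pos;
}

/* Demo: an iterator at position 2 survives separation of its table. */
static uint32_t toy_demo(void) {
    ToyTable a = { 0 }, b = { 0 };
    memset(toy_iters, 0, sizeof toy_iters);
    uint32_t it = toy_iter_add(&a, 2); /* by-ref foreach is at position 2 of a */
    toy_on_dup(&a, &b);                /* separation: the variable now holds b */
    return toy_iter_pos(it, &b);       /* returns 2 instead of restarting at 0 */
}
```

Without the call to `toy_on_dup()`, the lookup would find no copy for `b` and fall back to `b.internal_pointer`, which corresponds to the restart described above.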
821c598 to b75ae27
This seems to work and fixes the weird behaviour.
I wasn't able to cheat the patch, but I can't be sure that all cases will be handled properly.
@iluuu1994 could you please also analyse this and check the performance implications?
if (zend_array_dup_element(source, target, target_idx, p, q, 0, static_keys, with_holes)) {
if (source->nInternalPointer == idx) {
target->nInternalPointer = target_idx;
if (EXPECTED(!HT_HAS_ITERATORS(target))) {
This may be optimized for arrays without holes (note that the function is always inlined)
if (!with_holes || EXPECTED(!HT_HAS_ITERATORS(target))) {
Good point, changed :-)
Actually, zend_array_dup_value has a branch: if the array is not packed and has no holes, it may still return 0 if any indirects are undef. Because of that edge case, I'm afraid that we cannot do that.
It's probably fine to do it given no copy-on-writes are ever supposed to happen on arrays containing IS_INDIRECT elements, but ... meh.
It looks like my suggestion was wrong. It's better to be safe.
It's also possible to insert empty Buckets instead of compaction when iterators are used.
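As background for this thread, here is a toy sketch of why a duplicate that compacts holes has to translate positions, which is what the `idx` to `target_idx` assignment in the hunk above does for the internal pointer. The array representation, the `TOY_UNDEF` marker and the function names are hypothetical and much simpler than the real Bucket handling; inserting empty slots instead of compacting would keep positions unchanged and make the translation unnecessary.

```c
#include <stdint.h>

#define TOY_UNDEF INT32_MIN /* hypothetical marker for a hole */

/* Copy the live elements of src into dst (which must have room for n
 * elements) and translate a position in src into the matching one in dst.
 * If old_pos points at a hole, the result is the next live element. */
static uint32_t toy_dup_compact(const int32_t *src, uint32_t n,
                                int32_t *dst, uint32_t old_pos) {
    uint32_t target_idx = 0;
    uint32_t new_pos = 0;
    for (uint32_t idx = 0; idx < n; idx++) {
        if (idx == old_pos) {
            new_pos = target_idx; /* slot the next copied element will get */
        }
        if (src[idx] != TOY_UNDEF) {
            dst[target_idx++] = src[idx];
        }
    }
    if (old_pos >= n) {
        new_pos = target_idx; /* past the end stays past the end */
    }
    return new_pos;
}

/* Demo: the element 20 sits at position 2 in src and at position 1 in dst. */
static uint32_t toy_compact_demo(void) {
    const int32_t src[5] = { 10, TOY_UNDEF, 20, TOY_UNDEF, 30 };
    int32_t dst[5];
    return toy_dup_compact(src, 5, dst, 2);
}
```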
Zend/zend_types.h
Outdated
@@ -533,6 +533,7 @@ typedef uint32_t HashPosition;
typedef struct _HashTableIterator {
	HashTable *ht;
	HashPosition pos;
	uint32_t next_copy;
Please add a comment stating that copies are linked into a circular list (if I understood this properly).
Yes :-) Comment added.
I expect this to have virtually no overhead outside of the specific case of copying by-ref iterated arrays. There are a couple of well-predictable branches (never taken outside of that scenario) with trivial conditions.
Signed-off-by: Bob Weinand <[email protected]>
c60d0bf to b740dfd
Note that I'm adding a uint32_t next_copy to the HashTableIterator struct, which would break ABI for 32-bit builds, thus targeting master only. Given that it's longstanding behaviour, it might be okay to only fix this in PHP 8.3.
I have no idea how one would fix that in PHP 8.1. An alternative could be fixing it for 64-bit targets only...
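The difference between 32-bit and 64-bit builds presumably comes down to struct padding, which the following sketch illustrates. The four struct types are hypothetical stand-ins for HashTableIterator: a fixed-width integer plays the role of the HashTable pointer so that both layouts can be inspected on any host, and `_Alignas(8)` models the usual 8-byte pointer alignment of 64-bit targets.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical models of the struct before and after adding next_copy. */
typedef struct { _Alignas(8) uint64_t ht; uint32_t pos; } Iter64Before;
typedef struct { _Alignas(8) uint64_t ht; uint32_t pos; uint32_t next_copy; } Iter64After;
typedef struct { uint32_t ht; uint32_t pos; } Iter32Before;
typedef struct { uint32_t ht; uint32_t pos; uint32_t next_copy; } Iter32After;

/* Bytes by which the struct grows when next_copy is added. */
static size_t growth_64bit(void) {
    /* The new field fills what used to be 4 bytes of tail padding. */
    return sizeof(Iter64After) - sizeof(Iter64Before);
}

static size_t growth_32bit(void) {
    /* No padding to reuse: the struct gets one field larger. */
    return sizeof(Iter32After) - sizeof(Iter32Before);
}
```

In this model `growth_64bit()` is 0 and `growth_32bit()` is 4, which matches the observation that only 32-bit builds would see a changed struct size.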
Fixes GH-11244.