
Track HashTableIterators for copy-on-write copies of HashTables #11248


Merged: bwoebi merged 2 commits into php:master from the by-ref-track-cow-copy branch on Aug 27, 2023


bwoebi (Member) commented on May 16, 2023

When executing a foreach ($ht as &$ref), foreach calls zend_hash_iterator_pos_ex() on every iteration. If the HashTable contained in the $ht variable is not the tracked HashTable, it resets the position to the internal array pointer of the array currently in $ht. This behaviour is generally fine, but undesirable for copy-on-write copies of the iterated HashTable. This may trivially occur when the iterated-over HashTable is assigned to some variable and the iterated-over variable is then modified: the modification separates the array, which changes the HashTable pointer in the variable, and foreach happily restarts iteration. This behaviour (despite existing since PHP 7.0) is considered a bug, not only because it is unexpected to the user, but also because copy-on-write should not have trivially observable side effects from mere assignment.
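To make the failure mode concrete, here is a toy C model of the pre-fix behaviour. It is a sketch under simplifying assumptions, not engine code: ToyTable, ToyIterator, toy_iterator_pos_old() and toy_dup() are hypothetical stand-ins for HashTable, HashTableIterator, zend_hash_iterator_pos_ex() and zend_array_dup(), and the array is a plain fixed-size buffer. While at position 2 of five elements, the array is shared and written to, the variable ends up holding a separated copy, and the iterator falls back to that copy's internal pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model, not php-src types. */
typedef struct {
    int values[8];
    uint32_t count;
    uint32_t internal_pointer; /* plays the role of nInternalPointer */
} ToyTable;

typedef struct {
    ToyTable *ht;  /* table the iterator is tracking */
    uint32_t pos;  /* position within that table */
} ToyIterator;

/* Pre-fix lookup: a different table in the variable means the iterator is
 * retargeted and restarts from that table's internal pointer. */
static uint32_t toy_iterator_pos_old(ToyIterator *it, ToyTable *ht) {
    if (it->ht != ht) {
        it->ht = ht;
        it->pos = ht->internal_pointer;
    }
    return it->pos;
}

/* Separation: the variable gets its own copy of the shared table. */
static ToyTable *toy_dup(const ToyTable *src) {
    ToyTable *copy = malloc(sizeof *copy);
    if (copy == NULL) abort();
    memcpy(copy, src, sizeof *copy);
    return copy;
}

/* foreach ($ht as &$ref) over five elements; while at position 2 the
 * array is assigned elsewhere and then written to. Returns the number of
 * loop bodies executed (capped at 20 to guard against endless loops). */
static int toy_foreach_visits_old(void) {
    ToyTable original = { {1, 2, 3, 4, 5}, 5, 0 };
    ToyTable *var = &original; /* the table currently held by $ht */
    ToyTable *separated = NULL;
    ToyIterator it = { var, 0 };
    uint32_t pos;
    int visits = 0;

    while ((pos = toy_iterator_pos_old(&it, var)) < var->count && visits < 20) {
        visits++;
        if (pos == 2 && separated == NULL) {
            /* $copy = $ht; $ht[0] = 0; -> the write separates $ht
             * (the written value itself is not modelled) */
            separated = toy_dup(var);
            var = separated;
        }
        it.pos = pos + 1; /* advance to the next element */
    }
    free(separated);
    return visits;
}
```

In this model toy_foreach_visits_old() returns 8 rather than 5: positions 0 to 2 are visited on the original table, then the loop restarts from position 0 on the separated copy.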

The bugfix consists of duplicating HashTableIterators whenever zend_array_dup() (the primitive used for array separation) is called. When the HashPosition is later accessed through the HashTableIterator API and the HashTable does not match the tracked one, all the duplicates (which are tracked in a singly linked, circular list) are searched for the wanted HashTable. If it is found, the HashTableIterator is replaced by the found copy and all other copies are removed. This ensures that we always end up tracking the correct HashTable.
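The fix can be modelled in the same toy style. This sketch assumes a fixed global slot array standing in for the engine's iterator table, identifies tables only by address, and copies positions 1:1 on duplication (no compaction). All names (iters, toy_iter_add(), toy_on_dup(), toy_iter_pos()) are hypothetical; only next_copy mirrors the field the patch adds. Copies form a circular singly linked list through next_copy, and an iterator without copies points at itself. Dropping the copies and using the old fallback when no copy matches is an assumption of this sketch, not necessarily what the patch does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy model, not php-src: tables are compared by address only
 * and positions are copied 1:1 on duplication (no compaction). */
typedef struct { uint32_t internal_pointer; } ToyTable;

typedef struct {
    ToyTable *ht;       /* tracked table; NULL marks a free slot */
    uint32_t pos;       /* position in ht */
    uint32_t next_copy; /* circular list of copies; own index if none */
} ToyIterator;

#define TOY_MAX_ITERS 16
static ToyIterator iters[TOY_MAX_ITERS]; /* stand-in for the iterator table */

static uint32_t toy_iter_add(ToyTable *ht, uint32_t pos) {
    for (uint32_t i = 0; i < TOY_MAX_ITERS; i++) {
        if (iters[i].ht == NULL) {
            iters[i] = (ToyIterator){ ht, pos, i };
            return i;
        }
    }
    abort(); /* the toy table does not grow */
}

/* Hook for array duplication: every iterator on src gets a copy on dst
 * that is spliced into the same circular list. */
static void toy_on_dup(ToyTable *src, ToyTable *dst) {
    for (uint32_t i = 0; i < TOY_MAX_ITERS; i++) {
        if (iters[i].ht == src) {
            uint32_t c = toy_iter_add(dst, iters[i].pos);
            iters[c].next_copy = iters[i].next_copy;
            iters[i].next_copy = c;
        }
    }
}

/* Position lookup for iterator idx against the table now in the variable. */
static uint32_t toy_iter_pos(uint32_t idx, ToyTable *ht) {
    ToyIterator *it = &iters[idx];
    if (it->ht != ht) {
        uint32_t found = UINT32_MAX;
        for (uint32_t c = it->next_copy; c != idx; c = iters[c].next_copy) {
            if (iters[c].ht == ht) {
                found = c;
                break;
            }
        }
        /* Adopt the matching copy's position, else fall back as before. */
        it->pos = (found != UINT32_MAX) ? iters[found].pos : ht->internal_pointer;
        it->ht = ht;
        /* Free every copy and make the list point back to idx. */
        for (uint32_t c = it->next_copy; c != idx; ) {
            uint32_t next = iters[c].next_copy;
            iters[c].ht = NULL;
            c = next;
        }
        it->next_copy = idx;
    }
    return it->pos;
}

/* Foreach is at position 3 of a when the variable is separated into b. */
static uint32_t toy_demo(void) {
    static ToyTable a, b;
    uint32_t it = toy_iter_add(&a, 3);
    toy_on_dup(&a, &b);
    return toy_iter_pos(it, &b); /* finds the copy on b: still 3 */
}
```

With the copy tracked, the lookup against the separated table keeps position 3 instead of restarting at 0, and afterwards the iterator has no copies left.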

Note that I'm adding a uint32_t next_copy field to the HashTableIterator struct, which breaks ABI for 32-bit builds (on 64-bit builds the new field fits into the padding after pos, so the struct size is unchanged); thus I'm targeting master only.
Given that this is longstanding behaviour, it might be okay to fix it only in PHP 8.3.
I have no idea how one would fix this in PHP 8.1. An alternative could be fixing it for 64-bit targets only...
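The 32-bit versus 64-bit difference comes from struct padding. The following sketch mirrors the two fields of the existing struct and the patched one; struct ht_stub is a hypothetical stand-in for HashTable, and the sizes assume common ABIs where pointers are 8 bytes (8-byte aligned) on 64-bit targets and 4 bytes on 32-bit targets.

```c
#include <assert.h>
#include <stdint.h>

/* Layout sketch only: struct ht_stub stands in for HashTable. */
struct ht_stub;
typedef uint32_t HashPosition;

typedef struct {            /* fields before the patch */
    struct ht_stub *ht;
    HashPosition pos;
} IterBefore;

typedef struct {            /* fields after the patch */
    struct ht_stub *ht;
    HashPosition pos;
    uint32_t next_copy;
} IterAfter;

/* Returns 1 if adding next_copy changed the struct size on this target. */
static int next_copy_changes_size(void) {
    return sizeof(IterBefore) != sizeof(IterAfter);
}
```

On x86-64, for example, both layouts are 16 bytes (8 for the pointer, 4 for pos, and 4 bytes of tail padding that next_copy now occupies), whereas on a 32-bit target the struct grows from 8 to 12 bytes.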

Fixes GH-11244.


Signed-off-by: Bob Weinand
bwoebi force-pushed the by-ref-track-cow-copy branch from 821c598 to b75ae27 on May 16, 2023 01:50
dstogov (Member) left a comment

This seems to work and fixes the weird behaviour.
I wasn't able to break the patch, but I can't be sure that all cases are handled properly.

@iluuu1994 could you please also analyse this and check the performance implications.

```c
if (zend_array_dup_element(source, target, target_idx, p, q, 0, static_keys, with_holes)) {
if (source->nInternalPointer == idx) {
target->nInternalPointer = target_idx;
if (EXPECTED(!HT_HAS_ITERATORS(target))) {
```
Member

This may be optimized for arrays without holes (note that the function is always inlined):

```c
			if (!with_holes || EXPECTED(!HT_HAS_ITERATORS(target))) {
```

Member Author

Good point, changed :-)

bwoebi (Member Author), May 16, 2023

Actually, zend_array_dup_value has a branch that, for a non-packed array without holes, may still return 0 if any IS_INDIRECT element points to an undef value. Because of that edge case, I'm afraid that we cannot do that.

bwoebi (Member Author), May 16, 2023

It's probably fine to do it, given that no copy-on-write is ever supposed to happen on arrays containing IS_INDIRECT elements, but ... meh.

Member

It looks like my suggestion was wrong. It's better to be safe.
It's also possible to insert empty Buckets instead of compacting when iterators are used.
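The thread above turns on how positions move when zend_array_dup() compacts the copy. The sketch below is a toy model: Kind, toy_is_live(), toy_dup_compact() and toy_dup_keep_slots() are hypothetical, and real Buckets, keys and hashing are ignored. It shows that an array with no UNDEF holes can still shift positions when an INDIRECT slot points at an UNDEF value, which is why the iterator remapping cannot simply be skipped for arrays without holes. It also models the suggested alternative of writing empty Buckets so that positions stay unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bucket states: a value, a hole, or an INDIRECT slot whose
 * target is either a value or UNDEF. */
typedef enum { VAL, UNDEF, INDIRECT_VAL, INDIRECT_UNDEF } Kind;

static int toy_is_live(Kind k) { return k == VAL || k == INDIRECT_VAL; }

/* Compacting copy: skipped slots shift later elements down, so the
 * iterator's source position it_pos must be remapped to *new_pos. */
static uint32_t toy_dup_compact(const Kind *src, Kind *dst, uint32_t n,
                                uint32_t it_pos, uint32_t *new_pos) {
    uint32_t target_idx = 0;
    *new_pos = UINT32_MAX;
    for (uint32_t idx = 0; idx < n; idx++) {
        if (idx == it_pos) {
            *new_pos = target_idx; /* the next live element lands here */
        }
        if (toy_is_live(src[idx])) {
            dst[target_idx++] = VAL; /* INDIRECT values are dereferenced */
        }
    }
    if (*new_pos == UINT32_MAX) {
        *new_pos = target_idx; /* iterator was at (or past) the end */
    }
    return target_idx;
}

/* Alternative: one target slot per source slot, empty where skipped,
 * so every position stays valid without remapping. */
static uint32_t toy_dup_keep_slots(const Kind *src, Kind *dst, uint32_t n) {
    for (uint32_t idx = 0; idx < n; idx++) {
        dst[idx] = toy_is_live(src[idx]) ? VAL : UNDEF;
    }
    return n;
}
```

For src = {VAL, INDIRECT_UNDEF, VAL, VAL} with the iterator at position 2, toy_dup_compact() copies 3 elements and remaps the position to 1, while toy_dup_keep_slots() keeps all 4 slots and position 2 still refers to the same element.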

```diff
@@ -533,6 +533,7 @@ typedef uint32_t HashPosition;
 typedef struct _HashTableIterator {
 	HashTable *ht;
 	HashPosition pos;
+	uint32_t next_copy;
```
Member

Please add a comment that copies are linked into a circular list (if I understood this properly).

Member Author

Yes :-) Comment added.

bwoebi (Member Author) commented on May 16, 2023

I expect this to have virtually no overhead outside of the specific case of copying arrays that are being iterated by reference. There are a couple of well-predictable branches (never taken outside of that scenario) with trivial conditions.

bwoebi force-pushed the by-ref-track-cow-copy branch from c60d0bf to b740dfd on May 16, 2023 10:51
bwoebi merged commit b07a2d4 into php:master on Aug 27, 2023
Linked issue closed by this pull request:

Modifying a copied by-ref iterated array resets the array position
3 participants