preserves HASH_FLAG_ALLOW_COW_VIOLATION in zend_hash_real_init_ex() #13013

Closed
ju1ius wants to merge 2 commits

Contributor

@ju1ius ju1ius commented Dec 23, 2023

See #12986 for context.

Closes #12986

{
ZEND_PARSE_PARAMETERS_NONE();
HashTable *ht = _zend_new_array_0();
HT_ALLOW_COW_VIOLATION(ht);
Member

If you really need this, can you not initialize the HT before calling HT_ALLOW_COW_VIOLATION? Hash tables are very hot, so changes need to be made with caution.

Contributor Author

@ju1ius ju1ius Dec 23, 2023

> If you really need this, can you not initialize the HT before calling HT_ALLOW_COW_VIOLATION?

@iluuu1994 Yes, that's one of the possible workarounds.

But if the bug can be fixed and benchmarks show no regressions, then why not fix it? 😉

Member

@iluuu1994 iluuu1994 Dec 23, 2023

This change may add several instructions plus some register pressure (keeping prev_flags alive) for little benefit (see https://github.com/php/php-src/actions/runs/7309927666?pr=13013#summary-19918042085). This function may be called many thousands of times during the handling of a request. If your function relies on HT_ALLOW_COW_VIOLATION, it seems better to make it responsible for setting the flag in a way that is compatible with the rest of the code.

Contributor Author

Indeed the benchmarks show a small but noticeable impact.

> If your function relies on HT_ALLOW_COW_VIOLATION, it seems better to make it responsible for setting the flag in a way that is compatible with the rest of the code.

Yeah, that's what I'm already doing for my use case.

I was hoping to come up with a solution so that the next person needing this functionality wouldn't waste time finding the cause of the issue and reinventing a workaround.

Alas, unless I've missed something, there doesn't seem to be a zero-cost fix for this... 😞

Member

> Indeed the benchmarks show a small but noticeable impact.

The benchmark isn't always reliable, but I would consider the diff quite large.

> I was hoping to come up with a solution so that the next person needing this functionality wouldn't waste time finding the cause of the issue and reinventing a workaround.

IMO, a comment works well for that. :)

Bob doesn't agree. Dmitry is the primary maintainer of hash maps, so it makes sense to wait for his comment.

Member

dstogov commented Dec 25, 2023

HT_ALLOW_COW_VIOLATION is a hack that allows doing something unusual (it allows modifying an array without separation, which is wrong). This flag is used in very few places, and if you need it, you are probably doing something wrong.

I don't think we should modify PHP code to support this hack better. It should be possible to solve the problem in "user-space" code by initializing the HashTable and setting HT_ALLOW_COW_VIOLATION in a different order.
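To illustrate why the order matters, here is a minimal sketch in Rust (the language of the wrapper discussed in this thread) of a toy flags word. `ToyTable`, its methods, and the bit values are hypothetical stand-ins, not the Zend API; the only behaviour taken from the discussion is that the real-init step resets the flags, so a flag set beforehand is lost.

```rust
// Toy model only: hypothetical stand-ins, not the real Zend API.
// The bit values are illustrative.
const FLAG_UNINITIALIZED: u32 = 1 << 0;
const FLAG_INITIALIZED: u32 = 1 << 1;
const FLAG_ALLOW_COW_VIOLATION: u32 = 1 << 2;

struct ToyTable {
    flags: u32,
}

impl ToyTable {
    /// Stand-in for creating an empty, not yet initialized table.
    fn new_uninitialized() -> Self {
        ToyTable { flags: FLAG_UNINITIALIZED }
    }

    /// Stand-in for the real-init step. It overwrites the whole flags
    /// word, which is the behaviour reported in the thread.
    fn real_init(&mut self) {
        self.flags = FLAG_INITIALIZED;
    }

    /// Stand-in for setting the COW-violation flag.
    fn allow_cow_violation(&mut self) {
        self.flags |= FLAG_ALLOW_COW_VIOLATION;
    }

    fn cow_violation_allowed(&self) -> bool {
        self.flags & FLAG_ALLOW_COW_VIOLATION != 0
    }
}

/// Problematic order: set the flag, then initialize. The flag is lost.
fn flag_then_init() -> bool {
    let mut table = ToyTable::new_uninitialized();
    table.allow_cow_violation();
    table.real_init();
    table.cow_violation_allowed()
}

/// Workaround order: initialize first, then set the flag. The flag survives.
fn init_then_flag() -> bool {
    let mut table = ToyTable::new_uninitialized();
    table.real_init();
    table.allow_cow_violation();
    table.cow_violation_allowed()
}
```

In this model `flag_then_init()` reports the flag as cleared, while `init_then_flag()` reports it as still set, which is the reordering suggested above.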

Contributor Author

ju1ius commented Dec 25, 2023

> HT_ALLOW_COW_VIOLATION is a hack that allows doing something unusual (it allows modifying an array without separation, which is wrong). This flag is used in very few places, and if you need it, you are probably doing something wrong.

I understand why you call this a hack, but in the Rust wrapper I'm writing for the Zend API, it is in fact essential for implementing the RC=1 separation pattern for (possibly in-place) modification of hashtables while guaranteeing memory safety. I can explain the details if that is of interest to anybody.

> I don't think we should modify PHP code to support this hack better. It should be possible to solve the problem in "user-space" code by initializing the HashTable and setting HT_ALLOW_COW_VIOLATION in a different order.

That's fair, I understand the legitimate use cases are extremely rare and specific, and workarounds are indeed possible.

It seems I'm not the only one to have been bitten by this issue however, so as @iluuu1994 mentioned, the fact that the flag is cleared by zend_hash_real_init should probably be documented somewhere.

I can update this PR (or submit another one) to add said documentation if you want. Just let me know.

Member

dstogov commented Dec 26, 2023

> I'm writing for the Zend API, it is in fact essential for implementing the RC=1 separation pattern for (possibly in-place) modification of hashtables while guaranteeing memory safety. I can explain the details if that is of interest to anybody.

Yeah, please try to make a short explanation. (I'm not sure if I'll be able to review it in the next 2 weeks).

> I can update this PR (or submit another one) to add said documentation if you want. Just let me know.

In case we have docs about HT_ALLOW_COW_VIOLATION, it makes sense to add the note.

Contributor Author

ju1ius commented Dec 26, 2023

> Yeah, please try to make a short explanation.

OK, trying to make it short is going to be tough but I'll try anyway...

The main point is that, because a zend_array is a reference-counted type with shared ownership semantics, you cannot hand out an exclusive hashtable reference (a &mut zend_array) from safe Rust code. That would be instant UB, basically making your entire library unsound.

zvals themselves are not shared, but since they can be part of other shared, reference-counted types, it would be impossible to obtain an exclusive &mut zval reference to a zval that is part of a zend_array, because that would require obtaining an exclusive &mut zend_array first. Ditto for zend_object properties, etc.

If the Zend Engine were written in pure Rust, you'd probably have something resembling an Rc<RefCell<zend_array>> to enable mutable shared ownership through runtime borrow-checking. But zend_array is an FFI type, so your only option is to wrap it inside an UnsafeCell to enable interior mutability. This allows you to mutate memory behind shared references without triggering UB, but only gets you so far in terms of memory safety: you have now given up the compile-time guarantees of exclusive &mut references, so it is possible to trigger UAFs.
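As a small illustration of the runtime borrow-checking that an `Rc<RefCell<...>>` would provide, here is a sketch using only the Rust standard library. `ToyArray` is a hypothetical stand-in for a hashtable, not the real zend_array, and the helper function names are made up for this example.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a hashtable; not the real zend_array.
#[derive(Default)]
struct ToyArray {
    values: Vec<i64>,
}

/// Returns true if a second mutable borrow is rejected at runtime while
/// the first one is still alive.
fn second_mut_borrow_is_rejected() -> bool {
    let shared = Rc::new(RefCell::new(ToyArray::default()));
    let alias = Rc::clone(&shared); // second owner of the same array

    let first = shared.borrow_mut(); // exclusive borrow, tracked at runtime
    let rejected = alias.try_borrow_mut().is_err();
    drop(first); // release the exclusive borrow
    rejected
}

/// Once earlier borrows are released, any owner may mutate the array.
fn push_through_both_owners() -> Vec<i64> {
    let shared = Rc::new(RefCell::new(ToyArray::default()));
    let alias = Rc::clone(&shared);

    alias.borrow_mut().values.push(1);
    shared.borrow_mut().values.push(2);

    let snapshot = shared.borrow().values.clone();
    snapshot
}
```

`RefCell` turns an aliasing violation into a detectable runtime error; a bare `UnsafeCell` around an FFI type performs no such check, which is the gap the comment describes.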

If you were allowed to take an exclusive reference to a zval, this example would not compile, statically preventing the possible UAF. But you're not, so this updated example now compiles but the possible UAF cannot be statically prevented. The only way to make the previous example memory safe is to make the get_array method return a zend_array after incrementing its reference count, preventing it from being dropped in case the zval is later mutated. Which could look like the following example.
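A rough sketch of the refcount-incrementing accessor described in this paragraph, with `Rc` standing in for the engine's reference counting. `ToyZval`, `get_array` as written here, and `set_null` are hypothetical names for illustration and do not reflect the actual wrapper or the Zend API.

```rust
use std::rc::Rc;

/// Hypothetical toy value type. `Rc` models the engine's refcounting.
enum ToyZval {
    Null,
    Array(Rc<Vec<i64>>),
}

impl ToyZval {
    /// Hands out an owned handle by incrementing the reference count,
    /// so the array outlives any later reassignment of the zval.
    fn get_array(&self) -> Option<Rc<Vec<i64>>> {
        match self {
            ToyZval::Array(arr) => Some(Rc::clone(arr)),
            ToyZval::Null => None,
        }
    }

    /// Overwrites the zval, releasing its own reference to the array.
    fn set_null(&mut self) {
        *self = ToyZval::Null;
    }
}

/// Borrows the array, overwrites the zval, and shows that the borrowed
/// array is still alive. Returns (refcount while shared, refcount after
/// the overwrite, array contents).
fn borrow_then_overwrite() -> (usize, usize, Vec<i64>) {
    let mut zv = ToyZval::Array(Rc::new(vec![1, 2, 3]));

    let arr = zv.get_array().expect("zval holds an array");
    let count_while_shared = Rc::strong_count(&arr);

    zv.set_null(); // would free the array if `arr` were an uncounted borrow
    let count_after = Rc::strong_count(&arr);

    (count_while_shared, count_after, arr.to_vec())
}
```

The point of the design is that the handle owns a counted reference: while it is alive the count is at least 2, and mutating the zval can only drop the count back to 1, never free the array.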

Now that we've seen that the only way to safely borrow shared pointers from a zval is to do so while appropriately incrementing and decrementing their reference count, it becomes obvious that the refcount of an array borrowed from a zval will necessarily be >= 2. Therefore, to actually implement the RC=1 separation rule for (possibly in-place) array mutation while preserving memory safety, we need to:

  • Check if the RC=1 rule conditions hold:
    • no?: duplicate the array and that's it, we're done.
    • yes?: enable COW violations, then increment the array's refcount (making it RC=2), then disable COW violations once we're done mutating.

Which could be implemented using a helper type like the CowGuard in the following example.
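The steps listed above can be sketched with a toy guard type. This is an assumption-laden model, not the author's actual CowGuard: `ToyArray`, the `Cell<bool>` flag, and all method names are hypothetical, `Rc` stands in for the engine's refcount, and writing a separated copy back into the zval is left out.

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

/// Hypothetical stand-in for a refcounted hashtable. `Rc` provides the
/// refcount; the `Cell<bool>` models the COW-violation flag.
struct ToyArray {
    allow_cow_violation: Cell<bool>,
    values: RefCell<Vec<i64>>,
}

impl ToyArray {
    fn new(values: Vec<i64>) -> Self {
        ToyArray { allow_cow_violation: Cell::new(false), values: RefCell::new(values) }
    }
}

enum CowGuard {
    /// The caller was the sole owner: mutate the original in place.
    InPlace(Rc<ToyArray>),
    /// The array was shared: mutate a private duplicate instead.
    Separated(Rc<ToyArray>),
}

impl CowGuard {
    fn new(slot: &Rc<ToyArray>) -> Self {
        if Rc::strong_count(slot) == 1 {
            // RC=1 holds: enable COW violations, then take our own
            // counted reference (the refcount becomes 2).
            slot.allow_cow_violation.set(true);
            CowGuard::InPlace(Rc::clone(slot))
        } else {
            // RC=1 does not hold: duplicate the array and we're done.
            let copy = ToyArray::new(slot.values.borrow().clone());
            CowGuard::Separated(Rc::new(copy))
        }
    }

    fn array(&self) -> &Rc<ToyArray> {
        match self {
            CowGuard::InPlace(a) | CowGuard::Separated(a) => a,
        }
    }

    fn push(&self, value: i64) {
        self.array().values.borrow_mut().push(value);
    }

    fn values(&self) -> Vec<i64> {
        let snapshot = self.array().values.borrow().clone();
        snapshot
    }
}

impl Drop for CowGuard {
    fn drop(&mut self) {
        // Disable COW violations once we're done mutating.
        if let CowGuard::InPlace(arr) = self {
            arr.allow_cow_violation.set(false);
        }
    }
}

/// Sole owner: the original is mutated and the flag is cleared afterwards.
fn sole_owner_mutates_in_place() -> (Vec<i64>, bool) {
    let slot = Rc::new(ToyArray::new(vec![1]));
    {
        let guard = CowGuard::new(&slot);
        guard.push(2);
    } // guard dropped here
    let values = slot.values.borrow().clone();
    (values, slot.allow_cow_violation.get())
}

/// Shared array: the original stays untouched, the copy is mutated.
/// Returns (original contents, copy contents).
fn shared_array_is_separated() -> (Vec<i64>, Vec<i64>) {
    let slot = Rc::new(ToyArray::new(vec![1]));
    let _other_owner = Rc::clone(&slot);
    let guard = CowGuard::new(&slot);
    guard.push(2);
    let copy = guard.values();
    let original = slot.values.borrow().clone();
    (original, copy)
}
```

Tying the flag reset to `Drop` means the flag is cleared on every exit path, including early returns and panics that unwind.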

And I think that's as short as it can get for now...

Contributor Author

ju1ius commented Dec 26, 2023

> In case we have docs about HT_ALLOW_COW_VIOLATION, it makes sense to add the note.

I'm afraid there are none in the source code ATM.

As for the book, well...

Member

dstogov commented Dec 29, 2023

I think I more or less understand what you are trying to do.
I'm not a Rust expert, but I see the problems on the PHP side.

If you increment the reference counter of a HashTable, the corresponding array may be silently separated by PHP code, and your Rust/FFI code will be left with a detached copy of the same array.
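The effect described here can be modelled with `Rc::make_mut` from the Rust standard library, which clones the value before writing when the reference count is greater than 1. This is only an analogy for PHP's array separation; the variable and function names are hypothetical.

```rust
use std::rc::Rc;

/// Returns (what the wrapper still sees, what the engine side sees).
fn silent_separation() -> (Vec<i64>, Vec<i64>) {
    // `engine_side` models the array slot that PHP code writes through.
    let mut engine_side = Rc::new(vec![1, 2]);

    // The wrapper takes its own counted reference: the refcount is now 2.
    let wrapper_side = Rc::clone(&engine_side);

    // A copy-on-write style write with refcount > 1 separates first:
    // make_mut clones the Vec and repoints `engine_side` at the clone.
    Rc::make_mut(&mut engine_side).push(3);

    // The wrapper still holds the old, now detached, array.
    (wrapper_side.to_vec(), engine_side.to_vec())
}
```

After the write, the two handles no longer point at the same allocation, so later changes on the engine side are invisible to the wrapper.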

Usage of HT_ALLOW_COW_VIOLATION is a warning bell that something is going wrong. It shouldn't be used for regular PHP arrays.

Maybe you should convert the captured arrays (or any zvals) to PHP references. This way you'll take a PHP reference with rc==2 and an array with rc==1. This approach should work.

Maybe @nikic can give you better advice.

Contributor Author

ju1ius commented Jan 2, 2024

> Maybe you should convert the captured arrays (or any zvals) to PHP references. This way you'll take a PHP reference with rc==2 and an array with rc==1. This approach should work.

Unfortunately, that wouldn't work. The rules I mentioned in my previous comment apply to all reference-counted types, which include zend_references. So you cannot safely borrow a zend_array from a zend_reference without incrementing its reference count. Otherwise, mutating the zend_reference while a borrow is alive could cause a UAF.

> If you increment the reference counter of a HashTable, the corresponding array may be silently separated by PHP code, and your Rust/FFI code will be left with a detached copy of the same array.

Yes, that could theoretically happen in the examples I've linked to. In this case, the solution is simple: we just need to change our CowGuard implementation to disallow extracting the value inside the InPlace variant, like in this updated example.
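One way such a restriction might look, as a sketch under stated assumptions: the variants are kept private inside a module, mutation goes through a closure, and only a separated copy can be taken out. `Slot`, `with_mut`, and `into_separated` are hypothetical names, and the COW-violation flag handling is omitted to keep the example short.

```rust
use std::cell::RefCell;
use std::rc::Rc;

mod guard {
    use std::cell::RefCell;
    use std::rc::Rc;

    /// Hypothetical stand-in for a refcounted array slot.
    pub type Slot = Rc<RefCell<Vec<i64>>>;

    // Private: code outside this module cannot name or match the variants.
    enum Inner {
        InPlace(Slot),
        Separated(Vec<i64>),
    }

    pub struct CowGuard(Inner);

    impl CowGuard {
        pub fn new(slot: &Slot) -> Self {
            if Rc::strong_count(slot) == 1 {
                CowGuard(Inner::InPlace(Rc::clone(slot)))
            } else {
                let copy = slot.borrow().clone();
                CowGuard(Inner::Separated(copy))
            }
        }

        /// Mutation goes through a closure, so no reference to the
        /// in-place array can escape the guard.
        pub fn with_mut<R>(&mut self, f: impl FnOnce(&mut Vec<i64>) -> R) -> R {
            match &mut self.0 {
                Inner::InPlace(slot) => {
                    let mut borrowed = slot.borrow_mut();
                    f(&mut *borrowed)
                }
                Inner::Separated(copy) => f(copy),
            }
        }

        /// Only a separated copy may be taken out; in-place yields None.
        pub fn into_separated(self) -> Option<Vec<i64>> {
            match self.0 {
                Inner::InPlace(_) => None,
                Inner::Separated(copy) => Some(copy),
            }
        }
    }
}

/// Sole owner: the original is mutated, and nothing can be extracted.
/// Returns (original contents, whether extraction succeeded).
fn in_place_cannot_be_extracted() -> (Vec<i64>, bool) {
    let slot = Rc::new(RefCell::new(vec![1]));
    let mut g = guard::CowGuard::new(&slot);
    g.with_mut(|v| v.push(2));
    let extracted = g.into_separated().is_some();
    let values = slot.borrow().clone();
    (values, extracted)
}

/// Shared array: the guard works on a copy, which may be taken out.
fn separated_copy_can_be_extracted() -> Option<Vec<i64>> {
    let slot = Rc::new(RefCell::new(vec![1]));
    let _other_owner = Rc::clone(&slot);
    let mut g = guard::CowGuard::new(&slot);
    g.with_mut(|v| v.push(2));
    g.into_separated()
}
```

Because the in-place handle never leaves the guard, callers cannot keep a counted reference around after the guard is gone, which narrows the window for the detached-copy scenario raised above.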

At that point, the potential for invalid usage is greatly reduced, and we're not talking about memory safety issues anymore.

As a side note, the Zend API itself does not completely prevent misuse either. For example it is possible to directly modify an array's values / buckets in a ZEND_FOREACH loop, regardless of the refcount or any hashtable flags.

> Usage of HT_ALLOW_COW_VIOLATION is a warning bell that something is going wrong. It shouldn't be used for regular PHP arrays.

Yes, I get it. But again, it is to my knowledge the only way to actually implement the array separation rule while guaranteeing memory safety (and without refactoring the ZE itself of course). So for now my choice is between either carefully using this flag or exposing an unsafe-only API (which would defeat the whole purpose of writing a Rust wrapper). Pick your poison I guess... Of course I'd be glad to hear other suggestions, but I'm afraid we've already gone far off-topic from the original issue.

Contributor Author

ju1ius commented Jan 2, 2024

WRT the issue at hand, I can infer from the discussion that modifying the existing behaviour is not desired.

From the following comment:

> In case we have docs about HT_ALLOW_COW_VIOLATION, it makes sense to add the note.

...and the absence of documentation about this flag, I can infer that documenting the issue is not needed/wanted.

Therefore I'm closing this PR. Feel free to reopen if these assumptions were wrong. 😉

Successfully merging this pull request may close these issues.

HASH_FLAG_ALLOW_COW_VIOLATION is not preserved by zend_hash_real_init_(mixed|packed)_ex()
3 participants