uri: Fix normalization memory management for uri_parser_php_parse_url.c #19600

TimWolla · 2025-08-26T20:18:40Z

There were two issues with the previous implementation of normalization:

- `php_raw_url_decode_ex()` would be used to modify a string with RC >1.
- The return value of `php_raw_url_decode_ex()` was not used, resulting in incorrect string lengths when percent-encoded characters are decoded.
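
To illustrate both issues outside of php-src, here is a minimal standalone sketch. Everything in it is an assumption made for illustration: `rc_string`, `rc_string_new()`, `raw_decode()` and `decode_component()` are hypothetical stand-ins and not the real `zend_string` API or the actual patch. It only shows the general pattern: separate a shared string before decoding in place, and store the length returned by the decoder.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a refcounted string; NOT the real zend_string. */
typedef struct {
    unsigned refcount;
    size_t len;
    char val[];
} rc_string;

/* Allocates a new string with refcount 1 holding a copy of src. */
static rc_string *rc_string_new(const char *src, size_t len)
{
    rc_string *s = malloc(sizeof(*s) + len + 1);
    if (s == NULL) {
        abort();
    }
    s->refcount = 1;
    s->len = len;
    memcpy(s->val, src, len);
    s->val[len] = '\0';
    return s;
}

/* Returns the value of a hex digit, or -1 if c is not a hex digit. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    c = (unsigned char)tolower(c);
    if (c >= 'a' && c <= 'f') {
        return c - 'a' + 10;
    }
    return -1;
}

/* Hypothetical decoder: writes the percent-decoded form of src to dest
 * (dest may alias src) and returns the decoded length. */
static size_t raw_decode(char *dest, const char *src, size_t src_len)
{
    size_t out = 0;
    for (size_t i = 0; i < src_len; i++) {
        int hi = -1, lo = -1;
        if (src[i] == '%' && i + 2 < src_len
            && (hi = hex_value((unsigned char)src[i + 1])) >= 0
            && (lo = hex_value((unsigned char)src[i + 2])) >= 0) {
            dest[out++] = (char)((hi << 4) | lo);
            i += 2;
        } else {
            dest[out++] = src[i];
        }
    }
    dest[out] = '\0';
    return out;
}

/* Decodes a component without touching a string that others still hold. */
static rc_string *decode_component(rc_string *s)
{
    rc_string *target = s;
    if (s->refcount > 1) {
        /* Issue 1: a shared string must be separated before modification. */
        target = rc_string_new(s->val, s->len);
        s->refcount--; /* the caller's reference moves to the copy */
    }
    assert(target->refcount == 1);
    /* Issue 2: the decoded length is shorter, so the return value is kept. */
    target->len = raw_decode(target->val, target->val, target->len);
    return target;
}
```

With a shared input such as `a%20b` (length 5), this sketch returns a separate string `a b` with length 3 and leaves the original bytes and length untouched.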

Additionally there was a bogus assertion that verified that strings returned from the read handlers are RC =2, which was not the case for the parse_url-based parser when repeatedly retrieving a component even without normalization happening. Remove that assertion, since its usefulness is questionable. Any obvious data type issues with read handlers should be detectable when testing during development.

This is a follow-up for the issue detected in #19587.

nielsdos · 2025-08-26T20:29:31Z

ext/zend_test/test.c

@@ -724,6 +725,67 @@ static ZEND_FUNCTION(zend_test_crash)
 	php_printf("%s", invalid);
 }

+static ZEND_FUNCTION(zend_test_uri_parser)


What I've done for DOM once is define a debugging function under #if ZEND_DEBUG. Doing that here too would keep the uri stuff in ext/uri.

I'll leave the decision to Máté 😃

I'm fine with adding this function to ext/zend_test. :) It doesn't make it more difficult for any tooling to parse stub files.

For example, any symbol added to stub files that is not meant for production should carry the @undocumentable phpdoc tag; otherwise the documentation generator tries to sync it with the manual. AFAIK PHPStan also parses these files. So 👍 overall!


TimWolla requested a review from nielsdos August 26, 2025 20:18

TimWolla requested a review from kocsismate as a code owner August 26, 2025 20:18

github-actions bot added Extension: zend_test Extension: uri labels Aug 26, 2025

TimWolla force-pushed the uri-parse_url-memory-management branch 2 times, most recently from 6c6197e to cd01b22 Compare August 26, 2025 20:27

nielsdos reviewed Aug 26, 2025

TimWolla force-pushed the uri-parse_url-memory-management branch from cd01b22 to ad9e557 Compare August 26, 2025 20:55

kocsismate approved these changes Aug 26, 2025

TimWolla merged commit e99f1b4 into php:master Aug 26, 2025
9 checks passed

TimWolla deleted the uri-parse_url-memory-management branch August 26, 2025 21:44

TimWolla added a commit that referenced this pull request Aug 26, 2025

NEWS for recent ext/uri changes (GH-19587, GH-19591, GH-19600)

e844e68
