Opened 6 years ago
Closed 6 years ago
#45315 closed defect (bug) (duplicate)
Canonical redirection should compare URLs taking encoding into account
Reported by: |
|
Owned by: | |
---|---|---|---|
Milestone: | Priority: | normal | |
Severity: | minor | Version: | |
Component: | Canonical | Keywords: | |
Focuses: | Cc: |
Description
We're encountering redirect loops because the URL comparison logic does not take encoding fully into account.
If a layer of the stack (in our case, Varnish) re-encodes query parameters either in the "Location" header of a redirect or in a URL prior to it reaching WordPress, WordPress will issue a redirect to its preferred encoding, causing a loop as it attempts to "fix" the encoding choice that another layer repeatedly makes. Such URL rewrites (with re-encoding) may happen if the proxy is configured to sort incoming query parameters (to improve cache hit rates) or otherwise manipulate them. These are fairly common uses for a reverse proxy.
As noted in the source code, hex encodings are not case-sensitive. The RFC also states that encoding ":" and "@" is optional in query string elements. (PHP always encodes them given the lack of context in rawurlencode.) A proxy may make different choices than WordPress, but it should not cause WordPress to start a redirect loop.
One possible solution is to decode entities before comparing the requested URL with the redirect URL.
For example, in wp-includes/canonical.php around line 490, this change fixes the issue:
<?php if ( ! $redirect_url || rawurldecode($redirect_url) == rawurldecode($requested_url) ) { return; }
Technically, this change could cause some false positives where URLs match when one contains a reserved character in the same position as an encoded one in the other URL. This is an unlikely problem for canonical URL redirects, but I wanted to acknowledge the limitation.
If such false positives aren't acceptable, then I would suggest comparing the individual query string parts one by one, decoding each for its own comparison.
Presumably, similar bugs exist for encodings of the path, but I haven't investigated.
I created a workaround that's possible to prepend to
wp-config.php
or use withphp.ini
prepend functionality. It reprocesses the query string to produce one with equivalent meaning (after decoding) but meeting the expectations of the canonical URL component.The re-encoding method in the workaround should also have no false positives (in the sense of causing the canonical redirection logic to not redirect when it should).
Dropping severity to "minor" given the workaround.