libxml streams use wrong `content-type` header when requesting a redirected resource

Summary

When requesting an HTTP resource using the DOM or SimpleXML extensions, the wrong content-type header is used to determine the charset if the requested resource performs a redirect.

Details

When the HTTP stream wrapper follows a redirect, it does not clear the list of captured headers before performing the subsequent requests. As a result, the returned array of response headers contains the headers of every request in the chain, one after another, with those of the final request last.

The php_libxml_input_buffer_create_filename() / php_libxml_sniff_charset_from_stream() functions scan the header array from top to bottom and return after finding the first content-type header. That header does not necessarily belong to the response whose HTML body is being parsed: after a redirect, it comes from an earlier response in the chain.
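
The difference between the two lookup strategies can be illustrated in userland PHP. This is only a sketch, not the libxml C code or its fix: the header list is a hypothetical example of what a redirect chain might produce, and the helper names firstContentType() and finalResponseHeaders() are made up for illustration.

```php
<?php

// Hypothetical header list, shaped like what the HTTP wrapper collects when
// the first response is a redirect and the second is the final document.
$headers = [
    'HTTP/1.1 302 Found',
    'Content-Type: text/html;charset=utf-16',
    'Location: http://example.com',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html; charset=UTF-8',
];

/** Return the value of the first Content-Type header in $headers, or null. */
function firstContentType(array $headers): ?string
{
    foreach ($headers as $line) {
        // Header names are case-insensitive, so compare without regard to case.
        if (stripos($line, 'content-type:') === 0) {
            return trim(substr($line, strlen('content-type:')));
        }
    }
    return null;
}

/** Return only the headers that belong to the last response in the chain. */
function finalResponseHeaders(array $headers): array
{
    $start = 0;
    foreach ($headers as $i => $line) {
        // Every response in the chain starts with a status line ("HTTP/...").
        if (strncasecmp($line, 'HTTP/', 5) === 0) {
            $start = $i;
        }
    }
    return array_slice($headers, $start);
}

// Scanning from the top picks up the header of the redirect response.
$naive = firstContentType($headers);

// Restricting the scan to the final response picks up the right header.
$final = firstContentType(finalResponseHeaders($headers));

echo "first match:    $naive\n";
echo "final response: $final\n";
```

With the sample list above, the top-to-bottom scan yields the utf-16 value from the redirect response, while the scan limited to the final response yields the UTF-8 value that actually describes the parsed body.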

PoC

redirect.php

<?php

header('content-type: text/html;charset=utf-16');
header('location: http://example.com');

Run php -S localhost:8080 and then execute:

<?php

// Or using DOMDocument / SimpleXML
$document = \Dom\HTMLDocument::createFromFile("http://localhost:8080/redirect.php");

if (\str_contains($document->querySelector('body')->textContent, 'Example')) {
  throw new Exception('Refusing to store example content');
}

var_dump(\str_contains($document->saveHtml(), 'Example')); // bool(true)

Impact

This allows an attacker to cause a document to be parsed incorrectly, changing its meaning and possibly bypassing validation. When such a document is exported with ->saveHtml(), it is returned with the original charset.

Users who request documents via HTTP using the DOM or SimpleXML extensions are impacted.
