From: thomas dot koch at ymc dot ch
Operating system: Debian Lenny
PHP version:      5.2.9
PHP Bug Type:     XML related
Bug description:  no option to set HTML input encoding
Description:
------------
Enhancement request.

I need a way to specify the HTML input encoding (as parsed from the HTTP headers) when parsing an HTML string with DOMDocument::loadHTML. Using loadHTMLFile is not always an option.

libxml2 honors the content-type meta tag, but that tag may not always be present.

How should the input encoding be specified? In DOMDocument::__construct(), in DOMDocument::encoding, or are those both the same?

One could look at libxml2/HTMLparser.c#5580, function htmlCreateFileParserCtxt(const char *filename, const char *encoding). There the encoding is set by first building a "charset=$encoding" string and passing it to htmlCheckEncoding, which in turn parses the encoding back out of that string. This may be worth cleaning up together with upstream.

Reproduce code:
---------------
<?php
$html = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<!--meta http-equiv="content-type" content="text/html; charset=utf-8" -->
</head>
<body id="umlaut">süß</body>
</html>
EOT;

// The content-type meta tag above is commented out, so libxml2 has no way
// to learn that $html is UTF-8 encoded.
$dom = new DOMDocument;
var_dump( $dom->loadHTML( $html ) );
$elem = $dom->getElementById( 'umlaut' );
echo $elem->textContent;
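Since libxml2 honors the content-type meta tag, one possible workaround while loadHTML takes no encoding argument is to inject such a tag into the string before parsing, using the charset taken from the HTTP headers. The following is a minimal sketch under that assumption; inject_charset_meta() and load_html_with_encoding() are hypothetical helper names, not part of ext/dom, and the <head> lookup is a simple pattern match rather than an HTML-aware one.

```php
<?php
/**
 * Hypothetical helper (not part of ext/dom): insert a content-type meta tag
 * declaring $encoding, so that libxml2's HTML parser can pick it up.
 * Assumes $encoding is known from elsewhere, e.g. the HTTP Content-Type header.
 */
function inject_charset_meta($html, $encoding)
{
    $meta = '<meta http-equiv="content-type" content="text/html; charset='
          . htmlspecialchars($encoding, ENT_QUOTES) . '">';

    // Simple pattern match (not HTML-aware): place the tag right after the
    // opening <head> element, before any non-ASCII text in <body>.
    // The \b keeps <header> from matching.
    if (preg_match('/<head\b[^>]*>/i', $html, $m, PREG_OFFSET_CAPTURE)) {
        $pos = $m[0][1] + strlen($m[0][0]);
        return substr_replace($html, $meta, $pos, 0);
    }

    // No <head> element found: prepend the tag to the whole string.
    return $meta . $html;
}

/**
 * Hypothetical wrapper: parse $html with loadHTML after declaring $encoding
 * through the injected meta tag. Returns loadHTML's result.
 */
function load_html_with_encoding(DOMDocument $dom, $html, $encoding)
{
    return $dom->loadHTML(inject_charset_meta($html, $encoding));
}
```

In the reproduce script, load_html_with_encoding( $dom, $html, 'utf-8' ) would stand in for $dom->loadHTML( $html ). Note that this rewrites the input rather than telling the parser the encoding directly, and it does not try to reconcile the injected tag with a charset the page may already declare.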
