Author: giecrilj
Description:
Steps to reproduce:
Load the API result into an HTML SCRIPT element, as follows:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
<HTML>
<HEAD>
<TITLE >MediaWiki XML encoding switching problem</TITLE>
<STYLE TYPE="TEXT/CSS">
<!-- .ERROR { COLOR: RED } --></STYLE>
<SCRIPT ID="MWX"
TYPE="text/xml"
SRC="http://en.wikipedia.org/w/api.php?action=query&titles=Albert%20Einstein&prop=info&format=xml">
</SCRIPT ><SCRIPT TYPE="text/vbscript" ><!--
OPTION EXPLICIT
SUB WINDOW_ONLOAD
DIM A3DOC, A1X3DOC, L4ELTS, A4PARS3ERR
SET A3DOC = WINDOW. DOCUMENT
REM The SCRIPT element that MSHTML loaded as an XML data source
SET A1X3DOC = A3DOC. GETELEMENTBYID("MWX")
SET L4ELTS = A3DOC. FORMS. NAMEDITEM("MAIN"). ELEMENTS
L4ELTS. NAMEDITEM("FURL"). SETATTRIBUTE "value", A1X3DOC. SRC
REM From here on, A1X3DOC is the XML document behind that element
SET A1X3DOC = A1X3DOC. XMLDOCUMENT
L4ELTS. NAMEDITEM("FXML"). SETATTRIBUTE "value", A1X3DOC. XML
SET A4PARS3ERR = A1X3DOC. PARSEERROR
L4ELTS. NAMEDITEM("FWHY"). SETATTRIBUTE "value", A4PARS3ERR. REASON
L4ELTS. NAMEDITEM("FWHERE"). SETATTRIBUTE "value", A4PARS3ERR. SRCTEXT
REM A nonzero error code means the load failed: jump to the error report
IF A4PARS3ERR THEN WINDOW. LOCATION. HREF = "#FWHY"
END SUB
REM --></SCRIPT ></HEAD>
<BODY>
<FORM ID="MAIN" ACTION="#MAIN">
<FIELDSET CLASS="RESULT">
<LEGEND >XML loaded</LEGEND>
<P>
The document
loaded from the <LABEL >URL <INPUT TYPE=TEXT ID=FURL READONLY ></LABEL >
contains the following code:
<TEXTAREA ID=FXML COLS=80 ROWS=25 READONLY ></TEXTAREA ></FIELDSET>
<FIELDSET CLASS="ERROR" ><LEGEND >XML not loaded</LEGEND>
<P >REASON: <BR ><TEXTAREA ID=FWHY COLS=80 READONLY ></TEXTAREA>
<P >SOURCE: <BR ><TEXTAREA ID=FWHERE COLS=80 READONLY ></TEXTAREA>
</FIELDSET ></FORM ></BODY>
</HTML >
Expected results:
XML returned should display in a TEXTAREA.
Actual results:
Error:
"Switch from current encoding to specified encoding not supported."
at the XML declaration "<?xml version="1.0" encoding="utf-8"?>".
Affected systems:
Microsoft HTML engine.
Diagnosis:
The error is explained at http://msdn.microsoft.com/en-us/library/aa468560.aspx#xmlencod_topic3. When the XML processor does not load the XML text itself but relies on an external mechanism to fetch it (MSHTML in this case), the downloading agent is allowed to recode the text but is not obliged to convert or strip the encoding declaration. As a result, the text presented to the XML engine is in a different encoding than the one declared, and the parser fails.
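The mismatch described above can be reproduced outside MSHTML. The following is only a sketch in Python, using the expat-based parser from the standard library as a stand-in for MSXML; the sample document and the helper name `parses` are made up for illustration and are not taken from the API output. It recodes a UTF-8 document to UTF-16 without touching the declaration, which is roughly what the downloading agent is assumed to do here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, shaped like an API response; not real output.
SAMPLE = '<?xml version="1.0" encoding="utf-8"?><api><query/></api>'


def parses(data: bytes) -> bool:
    """Return True if the byte string parses as XML, False otherwise."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# As served: UTF-8 bytes whose declaration matches the actual encoding.
as_served = SAMPLE.encode("utf-8")

# As handed over by a recoding agent: the same text, now UTF-16 with a
# byte order mark, while the declaration still claims UTF-8.
recoded = SAMPLE.encode("utf-16")

print("as served:", parses(as_served))
print("recoded:  ", parses(recoded))
```

The parser accepts the bytes as served and is expected to reject the recoded copy, because the declared encoding no longer matches the bytes it is given. The wording of the error differs from the MSXML message quoted in this report, but the cause is the same.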
Background:
The encoding declaration is necessary only for documents whose encoding cannot be determined otherwise. Documents transported via HTTP can have their encoding declared in the HTTP headers.
Since the default encoding of XML is UTF-8, declaring this encoding either has no effect or, as in this case, causes parsing errors. It brings no advantage whatsoever.
Recommendation:
Remove the encoding declaration.
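To illustrate why removing the declaration is harmless, here is a small Python sketch (again using the standard-library expat parser, not MSXML). The regular expression and the function name `strip_encoding_declaration` are assumptions made for this example and are not part of MediaWiki. It drops the encoding pseudo-attribute from the XML declaration and checks that the remaining text parses in more than one transport encoding.

```python
import re
import xml.etree.ElementTree as ET

# Matches the encoding pseudo-attribute inside the XML declaration.
# Hypothetical helper pattern, written for this example only.
ENCODING_DECL = re.compile(
    r'''(<\?xml[^>]*?)\s+encoding\s*=\s*(["'])[A-Za-z][A-Za-z0-9._-]*\2'''
)


def strip_encoding_declaration(text: str) -> str:
    """Remove the encoding declaration, keeping the rest of the prolog."""
    return ENCODING_DECL.sub(r"\1", text, count=1)


# Hypothetical sample, shaped like an API response; not real output.
with_decl = '<?xml version="1.0" encoding="utf-8"?><api><query/></api>'
without_decl = strip_encoding_declaration(with_decl)
print(without_decl)

# Without a declaration, the parser detects the encoding from the bytes
# (UTF-8 by default, UTF-16 via the byte order mark), so recoding the
# text no longer contradicts anything in the document.
root_utf8 = ET.fromstring(without_decl.encode("utf-8"))
root_utf16 = ET.fromstring(without_decl.encode("utf-16"))
print(root_utf8.tag, root_utf16.tag)
```

A UTF-8 consumer sees the same document either way, while a consumer that receives recoded text no longer trips over a stale declaration.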
Workarounds:
- Use the XML extension element instead.
- Use MSXML.DOMDocument directly from script.
Version: unspecified
Severity: normal
OS: Windows XP
Platform: PC
URL: http://en.wikipedia.org/w/api.php?action=query&titles=Albert%20Einstein&prop=info&format=xml