Kohsuke Kawaguchi (kohsuke.kawaguchi@sun.com)
I have been working on “Java Architecture for XML Binding” (aka JAXB) for a few years now. I am the lead engineer of its reference implementation. I am also a member of its expert group, which generally designs the technology. We had our fair share of issues with XML Schema, and I'd like to talk about those in this report.
The XML Schema spec says explicitly that it is not an error for <xs:import> to fail (see Schema Representation Constraint: Import Constraints and Semantics). Because of this, some XML Schema implementations (in particular Apache Xerces) do not report an error if:
@schemaLocation contains a typo and fails to point to a proper file,
@schemaLocation points to a schema file, but the file contains a typo and is not well-formed, or
@schemaLocation points to http://schemas.xmlsoap.org/soap/envelope/, but the proxy configuration is wrong and the resource cannot be retrieved.
Most likely, the user will receive an error like “element foo is not defined”, which points to the location where he/she references a component defined in the imported schema.
These are very common operator errors, and even experienced developers get very confused because nothing in the error message indicates that the <xs:import> failed. More than a few people blamed our schema compiler for this reason.
I have yet to come across the case where failing to resolve <xs:import> is not meant to be an error. This design of XML Schema is causing a lot of grief among developers.
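Because a failed import can pass silently, one defensive option is to check the imports before handing the schema to a compiler. The following is a minimal sketch, not something JAXB or Xerces provides: the class ImportCheck and its methods are hypothetical names, and it assumes every schemaLocation is a file path relative to the importing schema (HTTP locations and catalogs are not handled).

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of JAXB or Xerces.
class ImportCheck {

    /** Returns the schemaLocation of every xs:import that cannot be parsed as XML. */
    static List<String> unresolvedImports(Path schemaFile) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed to match xs:import by namespace
        DocumentBuilder builder = factory.newDocumentBuilder();
        NodeList imports = builder.parse(schemaFile.toFile())
                .getElementsByTagNameNS(XMLConstants.W3C_XML_SCHEMA_NS_URI, "import");

        List<String> broken = new ArrayList<>();
        for (int i = 0; i < imports.getLength(); i++) {
            String location = ((Element) imports.item(i)).getAttribute("schemaLocation");
            if (location.isEmpty()) {
                continue; // no location hint given, nothing to check
            }
            try {
                // Assumption: the location is a file path relative to the importing schema.
                builder.parse(schemaFile.resolveSibling(location).toFile());
            } catch (Exception e) {
                broken.add(location); // missing, unreadable, or not well-formed
            }
        }
        return broken;
    }

    /** Usage example: one import resolves, the other points at a file that does not exist. */
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("import-check");
            String xs = "xmlns:xs='" + XMLConstants.W3C_XML_SCHEMA_NS_URI + "'";
            Files.write(dir.resolve("ok.xsd"),
                    ("<xs:schema " + xs + " targetNamespace='urn:ok'/>")
                            .getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("main.xsd"),
                    ("<xs:schema " + xs + ">"
                    + "<xs:import namespace='urn:ok' schemaLocation='ok.xsd'/>"
                    + "<xs:import namespace='urn:gone' schemaLocation='missing.xsd'/>"
                    + "</xs:schema>").getBytes(StandardCharsets.UTF_8));
            return unresolvedImports(dir.resolve("main.xsd"));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch, ImportCheck.demo() reports only missing.xsd, which names the actual culprit rather than a later “element foo is not defined” error.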
In most of the places you can reference a named type, you can use an anonymous type. For example,
<xs:attribute name="foo" type="xs:string"/>
… and
<xs:attribute name="foo"> <xs:simpleType> … </xs:simpleType> </xs:attribute>
… are both allowed. However, there is one place where you cannot use an anonymous type: when you define a complex type with simple content by extension, the simple type being extended must always be a named type.
This lack of consistency hurts JAXB. When we try to map a user-written class to XML Schema, sometimes we have to give a meaningless name to a simple type.
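The inconsistency can be observed with the schema compiler built into the JDK (javax.xml.validation). The sketch below is illustrative only: the class name AnonymousTypeDemo and the Price, Amount, and currency names are made up for this example. It tries to compile an attribute with an anonymous simple type, an extension that nests an anonymous simple type instead of naming a base, and the same extension with the simple type pulled out and named.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class; Price, Amount and currency are made-up names.
class AnonymousTypeDemo {
    static final String XS = "xmlns:xs='" + XMLConstants.W3C_XML_SCHEMA_NS_URI + "'";

    // An attribute may carry an anonymous simple type.
    static final String ANONYMOUS_ATTRIBUTE =
        "<xs:schema " + XS + "><xs:attribute name='foo'>"
      + "<xs:simpleType><xs:restriction base='xs:string'/></xs:simpleType>"
      + "</xs:attribute></xs:schema>";

    // Attempt to extend an anonymous simple type: no 'base', nested xs:simpleType instead.
    static final String ANONYMOUS_EXTENSION =
        "<xs:schema " + XS + "><xs:complexType name='Price'><xs:simpleContent><xs:extension>"
      + "<xs:simpleType><xs:restriction base='xs:decimal'/></xs:simpleType>"
      + "<xs:attribute name='currency' type='xs:string'/>"
      + "</xs:extension></xs:simpleContent></xs:complexType></xs:schema>";

    // The workaround: give the simple type a name and reference it as the base.
    static final String NAMED_EXTENSION =
        "<xs:schema " + XS + ">"
      + "<xs:simpleType name='Amount'><xs:restriction base='xs:decimal'/></xs:simpleType>"
      + "<xs:complexType name='Price'><xs:simpleContent><xs:extension base='Amount'>"
      + "<xs:attribute name='currency' type='xs:string'/>"
      + "</xs:extension></xs:simpleContent></xs:complexType></xs:schema>";

    /** Returns true if the JDK schema compiler accepts the schema text. */
    static boolean compiles(String xsd) {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            return true;
        } catch (SAXException e) {
            return false; // the schema itself was rejected
        }
    }
}
```

Only ANONYMOUS_EXTENSION should be rejected; the name Amount in NAMED_EXTENSION is exactly the kind of otherwise meaningless name a tool has to invent.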
Some tools fail to detect violations of the Unique Particle Attribution (UPA) constraint (in particular Altova XML Spy). I have even seen a consortium produce a “standard” schema that contains UPA violations, although it happens more often with schemas written by smaller entities.
When people run these broken schemas through our schema compiler, we reject them as erroneous, which only makes them think that ours is broken. Troubleshooting can get quite complicated if the violation involves a type hierarchy, substitution groups, and/or wildcards.
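For readers unfamiliar with UPA, the simplest violation is a content model in which the same element name could match two different particles. The sketch below uses a made-up schema (the list and item names and the class UpaDemo are assumptions for illustration) and the JDK's javax.xml.validation schema compiler, which is assumed here to perform UPA checking when compiling a schema.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class; the 'list' and 'item' elements are made up.
class UpaDemo {
    static final String XS = "xmlns:xs='" + XMLConstants.W3C_XML_SCHEMA_NS_URI + "'";

    // Ambiguous: on seeing the first <item>, a validator cannot tell
    // whether it matches the optional particle or the required one.
    static final String AMBIGUOUS =
        "<xs:schema " + XS + "><xs:element name='list'><xs:complexType><xs:sequence>"
      + "<xs:element name='item' type='xs:string' minOccurs='0'/>"
      + "<xs:element name='item' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Accepts the same documents (one or two items) with a single particle.
    static final String FIXED =
        "<xs:schema " + XS + "><xs:element name='list'><xs:complexType><xs:sequence>"
      + "<xs:element name='item' type='xs:string' minOccurs='1' maxOccurs='2'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Returns true if the JDK schema compiler accepts the schema text. */
    static boolean compiles(String xsd) {
        try {
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            return true;
        } catch (SAXException e) {
            return false; // rejected, for example because of a UPA violation
        }
    }
}
```

A tool that skips the check would accept AMBIGUOUS without complaint, which is how such schemas end up being published.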
XML Schema does not provide a way of marking the possible root elements. Because of this, given the following schema, JAXB needs to assume that the “name” element might be a root element.
<xs:complexType name="Address">
  <xs:sequence>
    <xs:element ref="name" />
    <xs:element ref="street" />
    <xs:element ref="zipCode" />
  </xs:sequence>
</xs:complexType>
<xs:element name="name" type="xs:string" />
<xs:element name="street" type="xs:string" />
<xs:element name="zipCode" type="xs:integer"/>
This prevents us from generating the following class and calling it a day.
class Address {
  String name;
  String street;
  BigInteger zipCode;
}
JAXB needs to generate more code for name, street, and zipCode, which complicates the Address class unnecessarily.
Many schemas are written in this style (partly because it's recommended in XML Schemas: Best Practices), and this makes the generated code less than optimal. This is one example of “schema allowing more than what's intended”.
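The point that any global element may serve as a root can be checked with the JDK's javax.xml.validation API. This is a small sketch under assumptions: the class name RootElementDemo is made up, and the schema fragment from above is wrapped in an xs:schema element with no target namespace so that it can be compiled on its own.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class wrapping the Address fragment in a complete schema.
class RootElementDemo {
    static final String SCHEMA =
        "<xs:schema xmlns:xs='" + XMLConstants.W3C_XML_SCHEMA_NS_URI + "'>"
      + "<xs:complexType name='Address'><xs:sequence>"
      + "<xs:element ref='name'/><xs:element ref='street'/><xs:element ref='zipCode'/>"
      + "</xs:sequence></xs:complexType>"
      + "<xs:element name='name' type='xs:string'/>"
      + "<xs:element name='street' type='xs:string'/>"
      + "<xs:element name='zipCode' type='xs:integer'/>"
      + "</xs:schema>";

    /** Returns true if the document is valid against the schema. */
    static boolean isValid(String xsd, String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            try {
                schema.newValidator().validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException invalidDocument) {
                return false;
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e); // broken schema or I/O failure
        }
    }
}
```

With this schema, a document consisting solely of <name>Kohsuke</name> or <zipCode>95054</zipCode> is valid, even though the schema author most likely meant these elements to appear only inside an address.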
Another example of “schema allowing more than what's intended” is the element substitution capability.
JAXB would like to allow schemas to be compiled separately and used together at runtime. That is, if schema X refers to Y, one person can compile Y, and another person can compile X (while referencing Y), and then they can put the generated code together to run.
This is challenging for many reasons, but one of the challenges is the fact that XML Schema allows element substitution by default. Suppose Y contains the following fragment:
<xs:complexType name="Address">
  <xs:sequence>
    <xs:element ref="name" />
    <xs:element ref="street" />
    <xs:element ref="zipCode" />
  </xs:sequence>
</xs:complexType>
<xs:element name="name" type="xs:string" />
<xs:element name="street" type="xs:string" />
<xs:element name="zipCode" type="xs:integer"/>
When presented with this schema, JAXB needs to consider the theoretical possibility of the “name” element being substituted by another element in X.
A schema can prohibit element substitutions, but most schemas don't bother to set that flag. The end result is that many schemas allow element substitutions even though they are not intended. I believe the developer community would have been better served if element substitution were opt-in, not opt-out.
This makes it difficult for JAXB to just generate this:
class Address {
  String name;
  String street;
  BigInteger zipCode;
}
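The opt-out nature of element substitution can be seen in a small experiment with the JDK's javax.xml.validation API. This sketch makes several assumptions for brevity: schemas X and Y are collapsed into one schema without a target namespace, the address element and the nickname element (standing in for a declaration contributed by X) are made up, and the class name ElementSubstitutionDemo is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class; 'address' and 'nickname' are not in the original fragment.
class ElementSubstitutionDemo {

    /** Builds the schema; nameAttrs is spliced into the declaration of 'name'. */
    static String schema(String nameAttrs) {
        return "<xs:schema xmlns:xs='" + XMLConstants.W3C_XML_SCHEMA_NS_URI + "'>"
             + "<xs:complexType name='Address'><xs:sequence>"
             + "<xs:element ref='name'/><xs:element ref='street'/><xs:element ref='zipCode'/>"
             + "</xs:sequence></xs:complexType>"
             + "<xs:element name='name' type='xs:string' " + nameAttrs + "/>"
             + "<xs:element name='street' type='xs:string'/>"
             + "<xs:element name='zipCode' type='xs:integer'/>"
             + "<xs:element name='address' type='Address'/>"
             // Stands in for a declaration that schema X adds later.
             + "<xs:element name='nickname' type='xs:string' substitutionGroup='name'/>"
             + "</xs:schema>";
    }

    // <nickname> appears where the Address type asks for <name>.
    static final String DOC =
        "<address><nickname>KK</nickname><street>1 Main St</street>"
      + "<zipCode>95054</zipCode></address>";

    /** Returns true if the document is valid against the schema. */
    static boolean isValid(String xsd, String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            try {
                schema.newValidator().validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException invalidDocument) {
                return false;
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e); // broken schema or I/O failure
        }
    }
}
```

isValid(schema(""), DOC) should return true, because substitution is allowed unless the author says otherwise; isValid(schema("block='substitution'"), DOC) should return false. Only in the second case could a binding tool safely treat name as a plain String property.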
Yet another example of “schema allowing more than what's intended” is the type substitution capability.
XML Schema allows every type reference to be substitutable by default. For example, consider the following fragment taken from UBL:
<xsd:complexType name="TextType">
  <xsd:simpleContent>
    <xsd:extension base="xsd:string">
      <xsd:attribute name="languageID" … />
      <xsd:attribute name="languageLocaleID" … />
    </xsd:extension>
  </xsd:simpleContent>
</xsd:complexType>
…
<xsd:element name="Name" type="xsd:string"/>
Because “TextType” derives from “string”, XML Schema considers the following document valid:
<Name xsi:type="TextType" languageID="…">Kohsuke</Name>
This happens very often in many schemas, because the only way to define an element with text and attributes is to define a complex type like this. While type substitution can be explicitly turned off (by using the block attribute), many schemas don't bother to prohibit it, even if the substitution is not intended by the schema author. The net result is that the schema allows unintended type substitutions.
When JAXB is presented with this schema, it has two choices for binding the “Name” element:
java.lang.String, or
java.lang.Object (which is the GCD of java.lang.String and the TextType class).
The former runs the risk of not being able to handle some valid documents. The latter is less usable.
Again, I believe that the community would have been better served if type substitution were opt-in, not opt-out.
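The type substitution case can be reproduced with the JDK's javax.xml.validation API. The sketch below is an illustration under assumptions: the class name TypeSubstitutionDemo is hypothetical, the schema has no target namespace, and languageID is given the type xsd:string purely for the example, since the original fragment elides the attribute details.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class; the attribute type is an assumption for illustration.
class TypeSubstitutionDemo {

    /** Builds the schema; nameAttrs is spliced into the declaration of 'Name'. */
    static String schema(String nameAttrs) {
        return "<xsd:schema xmlns:xsd='" + XMLConstants.W3C_XML_SCHEMA_NS_URI + "'>"
             + "<xsd:complexType name='TextType'><xsd:simpleContent>"
             + "<xsd:extension base='xsd:string'>"
             + "<xsd:attribute name='languageID' type='xsd:string'/>"
             + "</xsd:extension></xsd:simpleContent></xsd:complexType>"
             + "<xsd:element name='Name' type='xsd:string' " + nameAttrs + "/>"
             + "</xsd:schema>";
    }

    // The element is declared as xsd:string, but the instance claims TextType.
    static final String DOC =
        "<Name xmlns:xsi='" + XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI + "'"
      + " xsi:type='TextType' languageID='en'>Kohsuke</Name>";

    /** Returns true if the document is valid against the schema. */
    static boolean isValid(String xsd, String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            try {
                schema.newValidator().validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException invalidDocument) {
                return false;
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e); // broken schema or I/O failure
        }
    }
}
```

isValid(schema(""), DOC) should return true, matching the behavior described above, while isValid(schema("block='extension'"), DOC) should return false. A schema author has to add that attribute to every such element to rule the substitution out.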
Leonid Arbouzov (leonid.arbouzov@sun.com)
I have been working on Java conformance test suites for Java SE, JAXB and JAXP (“Java API for XML Processing”). The W3C XML Schema test suite was a great help for us. It saved us test development resources and helped to identify and fix problems in XML Schema implementations. Over 10000 W3C XML Schema tests are included in the Java conformance test suites, and all Java implementations are required to pass every single test. This helps to improve the conformance and compatibility of XML Schema support on the Java platform. At the same time, it is necessary that the tests be of high quality. Working with the W3C XML Schema test suite, we have discovered a number of issues with the tests, described below.
We've discovered 383 tests (of a total of ~10000) that seem to contradict the XML Schema specification. This includes 120 tests in the NISTTEST subsuite and 263 tests in the MSXDSTEST subsuite. Developers can implement wrong semantics if they try to make their implementation pass all W3C XML Schema tests. It is not easy for them to find out which tests are valid and which are not. This may lead to incompatible implementations.
Even though the W3C XML Schema test suite contains more than 10000 tests, the coverage those tests provide is not clear. We made a sample-based estimate of assertion coverage, and the result was 40-50%. This means there are parts of the XML Schema specification that are not tested. If implementation behaviors differ in those places, the difference may pass unnoticed and lead to incompatibilities. Even if somebody wanted to improve test coverage, it wouldn't be easy to identify the portions of the specification that require additional tests.
Errata to the specs are published regularly. Some of them invalidate tests. However, the test suite is not always updated accordingly, and therefore becomes out of date. It is not easy for users to find out which level of errata the test suite corresponds to. As a result, implementers do not get proper guidance and may make errors in their implementations that lead to incompatibilities.
There should be a way for test suite users to challenge tests and get a quick response. Test suite issues do not always get proper attention from the XML Schema group.