Definitive XML Schema (Walmsley, Priscilla)
XML Schema
Second Edition
The Charles F. Goldfarb Definitive XML Series

Priscilla Walmsley
    Definitive XML Schema, Second Edition
Dmitry Kirsanov
    XSLT 2.0 Web Development
Charles F. Goldfarb and Paul Prescod
    Charles F. Goldfarb’s XML Handbook™, Fifth Edition
Yuri Rubinsky and Murray Maloney
    SGML on the Web: Small Steps Beyond HTML
Rick Jelliffe
    The XML and SGML Cookbook: Recipes for Structured Information
David Megginson
    Structuring XML Documents
Charles F. Goldfarb, Steve Pepper, and Chet Ensign
    SGML Buyer’s Guide: Choosing the Right XML and SGML Products and Services
Sean McGrath
    XML Processing with Python
    XML by Example: Building E-commerce Applications
    ParseMe.1st: SGML for Software Developers
G. Ken Holman
    Definitive XSL-FO
    Definitive XSLT and XPath
Chet Ensign
    $GML: The Billion Dollar Secret
Bob DuCharme
    XML: The Annotated Specification
    SGML CD
Ron Turner, Tim Douglass, and Audrey Turner
    ReadMe.1st: SGML for Writers and Editors
Truly Donovan
    Industrial-Strength SGML: An Introduction to Enterprise Publishing
Charles F. Goldfarb and Priscilla Walmsley
    XML in Office 2003: Information Sharing with Desktop XML
Lars Marius Garshol
    Definitive XML Application Development
Michael Floyd
    Building Web Sites with XML
JP Morgenthal with Bill la Forge
    Enterprise Application Integration with XML and Java
Fredrick Thomas Martin
    TOP SECRET Intranet: How U.S. Intelligence Built Intelink—The World’s Largest, Most Secure Network
Michael Leventhal, David Lewis, and Matthew Fuchs
    Designing XML Internet Applications
J. Craig Cleaveland
    Program Generators with XML and Java
Adam Hocek and David Cuddihy
    Definitive VoiceXML
Priscilla Walmsley
The author and publisher have taken care in the preparation of this book, but make no
expressed or implied warranty of any kind and assume no responsibility for errors or omissions.
No liability is assumed for incidental or consequential damages in connection with or arising
out of the use of the information or programs contained herein.
Titles in this series are produced using XML, SGML, and/or XSL. XSL-FO documents are
rendered into PDF by the XEP Rendering Engine from RenderX: www.renderx.com.
The publisher offers excellent discounts on this book when ordered in quantity for bulk
purchases or special sales, which may include electronic versions and/or custom covers and
content particular to your business, training goals, marketing focus, and branding interests.
For more information, please contact:
International Sales
[email protected]
All rights reserved. Printed in the United States of America. This publication is protected by
copyright, and permission must be obtained from the publisher prior to any prohibited
reproduction, storage in a retrieval system, or transmission in any form or by any means,
electronic, mechanical, photocopying, recording, or likewise. To obtain permission to use
material from this work, please submit a written request to Pearson Education, Inc.,
Permissions Department, One Lake Street, Upper Saddle River, New Jersey 07458, or you
may fax your request to (201) 236–3290.
ISBN-13: 978-0-132-88672-7
ISBN-10: 0-132-88672-3
Text printed in the United States on recycled paper at Edwards Brothers Malloy in Ann
Arbor, MI.
First printing: September 2012
Overview
Foreword xxxi
Acknowledgments xxxiii
How to use this book xxxv
Chapter 1 Schemas: An introduction 2
1.1 What is a schema? 3
1.2 The purpose of schemas 5
1.2.1 Data validation 5
1.2.2 A contract with trading partners 5
1.2.3 System documentation 6
1.2.4 Providing information to processors 6
1.2.5 Augmentation of data 6
1.2.6 Application information 6
1.3 Schema design 7
1.3.1 Accuracy and precision 7
1.3.2 Clarity 8
1.3.3 Broad applicability 8
clas·sic (adjective)
judged over a period of time to be important and of the
highest quality:
a classic novel
a classic car
After all, it is a rare book on software that even survives long enough
to be “judged over a period of time.”
Nevertheless, Definitive XML Schema satisfies every definition of
“classic.” It is one of the elite few software books that have been in
print continuously for over ten years, and an essential trustworthy guide
for tens of thousands of readers.
This Second Edition continues to be an essential and trustworthy
classic:
Essential because in the last ten years XML has become the accepted
standard for data interchange, and XML Schema 1.0 is largely
responsible. Now version 1.1 has extended the ability to specify and
The result, as you will see, retains the structure, clarity, patient
explanations, validated examples (over 450!), and well-reasoned advice that
critics praised in the 2002 edition—but now they are ten years more
up-to-date.
And after you’ve read Definitive XML Schema, Second Edition, it
won’t take another ten years for you, too, to judge it a classic.
Charles F. Goldfarb
Belmont, CA
August 2012
Acknowledgments
First and foremost, I would like to thank Charles Goldfarb for his
invaluable guidance and support. Alina Kirsanova and Dmitry Kirsanov
did an excellent job preparing this book for publication. I would also
like to thank Mark Taub at Prentice Hall for his hand in making
this work possible.
Of course, this book would not have been possible without the efforts
of all of the members of the W3C XML Schema Working Group, with
whom I have had the pleasure of working for six years. The content of
this book was shaped by the questions and comments of the people
who contribute to XML-DEV and xmlschema-dev.
Finally, I’d like to thank my Dad for teaching me to “get stuck into
it,” a skill which allowed me to complete this substantial project.
Priscilla Walmsley
Traverse City, Michigan
March 2012
How to use this book
This book covers the two versions of XML Schema, 1.0 and 1.1, and
provides revision bars to make it easy to read about just one or the other. In
referring to both versions as “XML Schema,” the book follows customary
practice, despite the official name of 1.1 being “W3C XML Schema
Definition Language (XSD) 1.1.” For either version, the book is usable
as both a tutorial and a reference.
As a tutorial, the book can be read from cover to cover with
confidence that each topic builds logically on the information that was
previously covered. (Of course, knowledge of XML itself is always a
prerequisite to learning about XML Schema, and is assumed in this book.)
When using this book as a reference, you have several access options
available to you:
Finally, if your interest is all of 1.1 (because you don’t already know
1.0), you can easily disregard the revision bars (that’s why they are
grayed out).
Syntax tables
This book contains syntax tables, each summarizing the allowed syntax
of an XML Schema component. The first such table does not occur
until Section 4.2 on p. 58, by which point the undefined terms in this
explanation will have been introduced.
Syntax tables, whose captions all start with “XSD Syntax,” look like
the example below, which shows the syntax for named simple types. It
contains the following information:
Parents
schema, redefine, 1.1 override
In some cases, there is more than one syntax table for the same element
name, because certain element names in XML Schema have
multiple uses. For example, simpleType is used for both named
simple types and anonymous simple types. Each of these use cases of
simpleType allows different attributes and a different set of parent
elements, so each is described with its own table.
Companion website
This book has a companion website, maintained by the author, at
www.datypic.com/books/defxmlschema2. On the website, you can
view any errata and download the examples from this book. In addition
to the examples that appear in the book, which are generally concise
in order to illustrate a particular point, the website also has larger, more
comprehensive instances and schemas that can be copied or used to
test validation.
Chapter 1
Schemas: An introduction
1.3.2 Clarity
Schemas should be very clear, allowing a reader to instantly understand
the structure and characteristics of the instance being described. Clarity
can be achieved by
1. Outside this book, two earlier unofficial names may also be in use: XML
Schema Definition Language (XSDL) and W3C XML Schema (WXS).
1.4.4.1 RELAX NG
RELAX NG covers some of the same ground as DTDs and XML
Schema. RELAX NG was developed by an OASIS technical committee
and was adopted as an ISO standard (ISO/IEC 19757-2). RELAX NG
is intended only for validation; the processor does not pass documentation
or application information from the schema to the application.
RELAX NG does not have a complete built-in type library; it is
designed to use other type libraries (such as that of XML Schema).
Some of the benefits of RELAX NG over XML Schema 1.0 have
been addressed as new features in XML Schema 1.1. However,
RELAX NG still has some advantages as compared to XML Schema 1.1:
1.4.4.2 Schematron
XML Schema, DTDs, and RELAX NG are all grammar-based schema
languages. They specify what must appear in an instance, and in what
order.
Schematron, on the other hand, is rule-based. It allows you to define
a series of rules to which the document must conform. These rules are
expressed using XPath. In contrast to grammar-based languages,
Schematron considers anything that does not violate a rule to be valid.
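The rule-based idea can be sketched in Python with the standard library's ElementTree. This is an analogy, not Schematron: real Schematron rules are written in XML with XPath contexts and tests, whereas here each hypothetical rule is a tag name plus a Python test, and the order, customer, item, and note element names are invented for illustration. Any element that no rule mentions is simply accepted.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: (element name, test, message).
# Each test receives one matching element and returns True if it conforms.
RULES = [
    ("item", lambda e: e.get("quantity", "").isdigit(),
     "item/@quantity must be a whole number"),
    ("order", lambda e: e.find("customer") is not None,
     "order must contain a customer"),
]

def check_rules(xml_text):
    """Return a message for each rule violation; an empty list means valid."""
    root = ET.fromstring(xml_text)
    failures = []
    for tag, test, message in RULES:
        # root.iter(tag) visits the root itself and all matching descendants.
        for elem in root.iter(tag):
            if not test(elem):
                failures.append(message)
    return failures
```

For instance, an order containing customer, item quantity="2", and an undeclared note element yields an empty list, because no rule forbids note. A grammar-based schema would reject note unless its content model allowed it.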
Chapter 2
A quick tour of XML Schema
Example 2–2 shows a schema that might be used to validate our
instance. Its three element declarations and one attribute declaration
assign names and types to the components they declare.
locally defined, in which case they are anonymous and cannot be used
by any element or attribute declaration other than the one in which
they are defined.
2.4 Types
Types allow for validation of the content of elements and the values of
attributes. They can be either simple types or complex types. The term
“type” is used throughout this book to mean “simple or complex type.”
Attributes always have simple types, not complex types. This makes
sense, because attributes themselves cannot have children or other
attributes. Example 2–5 shows some attributes that have simple types.
Union types may have values that are either atomic values or list
values. What differentiates them is that the set of valid values, or “value
space,” for the type is the union of the value spaces of two or more
other simple types. For example, to represent a dress size, you may
define a union type that allows a value to be either an integer from 2
through 18 or one of the string values small, medium, or large.
List and union types are covered in Chapter 10.
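The dress-size union can be sketched in Python to show what "union of value spaces" means in practice. This is an illustration, not a schema processor: dress_size and the two member-checking functions are hypothetical names. The sketch tries each member type in order and returns the typed value from the first one that accepts the text, which mirrors how a union value is validated against its members. For simplicity it assumes that both members collapse surrounding whitespace (true for integer, and true for the string member if it is derived from token).

```python
import re

def _as_size_integer(lexical):
    """Member 1: an integer from 2 through 18, or None if not accepted."""
    value = lexical.strip()
    if not re.fullmatch(r"[+-]?[0-9]+", value):
        return None
    number = int(value)
    return number if 2 <= number <= 18 else None

def _as_size_word(lexical):
    """Member 2: one of the strings small, medium, or large, or None."""
    value = lexical.strip()
    return value if value in {"small", "medium", "large"} else None

MEMBER_TYPES = [_as_size_integer, _as_size_word]

def dress_size(lexical):
    """Return the value from the first member type that accepts the text."""
    for member in MEMBER_TYPES:
        typed = member(lexical)
        if typed is not None:
            return typed
    raise ValueError(f"{lexical!r} is not in the union's value space")
```

Here dress_size("10") returns the integer 10 and dress_size("medium") returns the string "medium", while "20" and "huge" fall outside both member value spaces and raise ValueError.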
<product>
  <number>557</number>
  <size>10</size>
</product>
<color value="blue"/>
These groups can be nested and may occur multiple times, allowing
you to create sophisticated content models. Example 2–8 shows a more
complex content model for ProductType. Instances of this new
definition of ProductType must have a number child, optionally followed
In this case, only the product element has a prefixed name. This
is because the other two elements and the attribute are declared
locally. By default, locally declared components do not take on the
target namespace. However, this can be overridden by specifying
elementFormDefault and attributeFormDefault for the schema
document. This is discussed in detail in Chapters 6 and 7.
  <xs:include schemaLocation="moreOrderInfo.xsd"/>
  <xs:import namespace="http://datypic.com/prod"
             schemaLocation="productInfo.xsd"/>
  <!--...-->
</xs:schema>
namespace. Example 2–12 shows how you might include and import
other schema documents.
The include and import mechanisms are not the only way for
processors to assemble schema documents into a schema. Unfortunately,
there is not always a “main” schema document that represents the whole
schema. Instead, a processor might join schema documents from various
predefined locations, or take multiple hints from the instance. See
Chapter 4 for more information on schema composition.
2.10 Annotations
XML Schema provides many mechanisms for describing the structure
of XML documents. However, it cannot express everything there is to
know about an instance or the data it contains. For this reason, XML
Schema allows annotations to be added to almost any schema
component. These annotations can contain human-readable information
(under documentation) or application information (under appinfo).
Example 2–14 shows an annotation for the product element
declaration. Annotations are covered in Chapter 21.
2.11.5 Assertions
Assertions are XPath constraints on XML data, which allow complex
validation above and beyond what can be specified in a content model.
This is especially useful for co-constraints, where the values or existence
of certain child elements or attributes affect the validity of other child
elements or attributes. For example, “If the value of newCustomer is
false, then customerID must appear.” Chapter 14 covers assertions in
detail.
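As a rough analogy for the co-constraint quoted above, the same rule can be written as a Python check over a parsed instance. In XML Schema 1.1 this would be an assertion on the containing complex type; here it runs in ordinary code instead. The sketch assumes that newCustomer and customerID are child elements of a hypothetical customer element, and customer_id_rule_holds is an invented name.

```python
import xml.etree.ElementTree as ET

def customer_id_rule_holds(customer):
    """If newCustomer is false, customerID must appear (hypothetical structure)."""
    # findtext returns None when newCustomer is absent; treat that as "no constraint".
    new_customer = (customer.findtext("newCustomer") or "").strip()
    # xs:boolean allows both "false" and "0" for the false value.
    if new_customer in ("false", "0"):
        return customer.find("customerID") is not None
    return True
```

A content model alone cannot say this, because whether customerID is required depends on the value of a sibling, not just on which elements appear in which order.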
Chapter 3
Namespaces
The URI syntax only allows basic Latin letters and digits, with a few
special punctuation characters. Non-Latin characters can be represented,
but they must be escaped. In Namespaces 1.1, and therefore when using
XML Schema 1.1, namespace names are actually IRIs (Internationalized
Resource Identifiers) rather than URIs, which means that non-Latin
characters can be directly represented in namespace names.
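The escaping can be seen with Python's urllib.parse.quote, which percent-encodes non-ASCII characters as their UTF-8 bytes. The namespace name below is hypothetical. One consequence worth keeping in mind: namespace names are compared character by character, with no unescaping, so the escaped URI and the unescaped IRI name two different namespaces.

```python
from urllib.parse import quote

# A hypothetical namespace IRI containing a non-Latin character.
iri = "http://example.org/ns/café"

# In a plain URI, é must be escaped as its UTF-8 bytes, %C3%A9.
# ":" and "/" are kept as-is so the URI structure is preserved.
uri = quote(iri, safe=":/")

# Namespace names are compared as strings, so these are different namespaces.
same_namespace = (iri == uri)
```

Here uri becomes http://example.org/ns/caf%C3%A9 and same_namespace is False, which is why a schema and its instances should spell a namespace name identically.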
Note that number appears twice, with two different prefixes. This
illustrates the usefulness of namespaces, which make it obvious
whether it is a product number or an order number. In most cases, the
two can be distinguished based on their context in the instance, but
not always.
You do not need to declare xmlns:ord and xmlns:prod as
attributes in the order element declaration in your schema. In fact, it
Unqualified names, on the other hand, are names that are not in any
namespace. For element names, this means they are unprefixed and
there is no default namespace declaration. For attribute names, this
means they are unprefixed, period.
Prefixed names are names that contain a namespace prefix, such as
prod:product. Prefixed names are qualified names, assuming there
is a namespace declaration for that prefix in scope.
Unprefixed names are names that do not contain a prefix, such as
items. Unprefixed element names can be either qualified or
unqualified, depending on whether there is a default namespace declaration.
A local name is the part of a qualified name that is not the prefix. In
Example 3–3, local names include items and product.
Non-colonized names, known as NCNames, are simply XML names
that do not contain colons. This means that they are case-sensitive,
must start with a letter or underscore (_), and may contain letters, digits,
underscores (_), dashes (-), and periods (.). Names that start with the
letters “xml,” in any combination of upper- and lowercase, are reserved
and should not be used. All local names and
In version 1.1 (but not in 1.0), you can also undeclare a prefix by
using an empty string. In Example 3–9, the namespace declaration for
the ord prefix in the product start tag undeclares the one on the root
element, meaning that the ord prefix is undefined within the scope of
product.
Although an element cannot have two attributes with the same name,
this example is valid because the attribute names are in different
namespaces (or rather, one is in a namespace and one is not), and they
therefore are considered to have different names.
Example 3–11 is also valid, even though the default namespace and
the namespace mapped to the prod prefix are the same. This is again
because the unprefixed system attribute is not in any namespace.
Example 3–11. Two more attributes with the same local name
<product xmlns="http://datypic.com/prod"
         xmlns:prod="http://datypic.com/prod">
  <number>557</number>
  <size system="US-DRESS" prod:system="R32">10</size>
</product>
Table 3–1 explains which namespace each name is in, and why.
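Python's ElementTree, which reports every name in {namespace}local form, makes the reasoning behind Example 3–11 visible: the two system attributes come out with different expanded names. The parsing code is an illustration added here, not part of the original example.

```python
import xml.etree.ElementTree as ET

EXAMPLE_3_11 = """\
<product xmlns="http://datypic.com/prod"
         xmlns:prod="http://datypic.com/prod">
  <number>557</number>
  <size system="US-DRESS" prod:system="R32">10</size>
</product>"""

root = ET.fromstring(EXAMPLE_3_11)
# The default namespace applies to unprefixed element names...
size = root.find("{http://datypic.com/prod}size")

# ...but not to unprefixed attribute names, so "system" stays in no namespace
# while "prod:system" is in http://datypic.com/prod.
for name, value in size.attrib.items():
    print(name, "=", value)
```

Running it prints one line for system and one for {http://datypic.com/prod}system, confirming that the size element does not have two attributes with the same name.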
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element ref="number"/>
      <xs:element ref="size"/>
    </xs:sequence>
  </xs:complexType>
  <!--...-->
</xs:schema>
If you do not plan to use namespaces, you are not required to specify
a target namespace. In this case, omit the targetNamespace attribute
entirely.
It is interesting to note that while all the element names are prefixed,
all of the attribute names are unprefixed. This is because none of
the attributes in the XML Schema Namespace is declared globally.
This is explained further in Section 7.4 on p. 122.
Note that the prefix used for the target namespace in the schema
does not necessarily correspond to the prefix used in the instance
document. While the schema in the previous example uses the prefix
prod for the target namespace, a valid instance document could
use prod, foo, or any other prefix, or make that namespace the default.
It is the namespace names that must match, not prefixes.
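ElementTree is a convenient way to see this, because it discards prefixes and keeps only the namespace name each prefix is bound to. The three one-element instances below are illustrative: they use prod, foo, and a default namespace declaration respectively, yet all three yield the same expanded name.

```python
import xml.etree.ElementTree as ET

PROD_NS = "http://datypic.com/prod"

# Three instances that differ only in how they refer to the same namespace.
instances = [
    f'<prod:product xmlns:prod="{PROD_NS}"/>',
    f'<foo:product xmlns:foo="{PROD_NS}"/>',
    f'<product xmlns="{PROD_NS}"/>',
]

# Collect the expanded {namespace}local-name of each root element.
expanded_names = {ET.fromstring(doc).tag for doc in instances}
```

The set ends up with a single member, {http://datypic.com/prod}product, which is all a schema processor compares against the schema's target namespace.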
Chapter 4
Schema composition
Parents
none
Attribute name         Type                                          Description
id                     ID                                            Unique ID.
version                token                                         Version of the schema document (not the version of the XML Schema language).
xml:lang               language                                      Natural language of the schema document.
targetNamespace        anyURI                                        Namespace to which all global schema components belong, see Section 3.3.1.
attributeFormDefault   "qualified" | "unqualified" : "unqualified"   Whether local attribute declarations should use qualified names, see Section 7.4.
(Continues)
Content
(include | import | redefine | 1.1 override | annotation)*,
1.1 defaultOpenContent?, (simpleType | complexType | group |
attributeGroup | element | attribute | notation | annotation)*
As you can see from the content model, there are two distinct sections
of a schema document. At the beginning, you specify all the includes,
imports, redefines, and overrides that are used to refer to other schema
documents. After that come the global, or top-level, components of the
schema, such as elements, attributes, named types, and groups. These
components can appear in the schema document in any order.
Annotations can appear at the top level throughout the schema document.
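The two-part structure of a schema document can be checked mechanically. The Python sketch below uses ElementTree to confirm that no include, import, redefine, or override appears after the first global component, while letting annotations appear anywhere. The function name is hypothetical, and a real schema processor checks far more than ordering.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
COMPOSITION = {XS + n for n in ("include", "import", "redefine", "override")}

def composition_elements_first(schema_text):
    """True if every include/import/redefine/override precedes all global components."""
    root = ET.fromstring(schema_text)
    seen_component = False
    for child in root:
        # Skip comments or processing instructions (if the parser kept them)
        # and annotations, which may appear anywhere at the top level.
        if not isinstance(child.tag, str) or child.tag == XS + "annotation":
            continue
        if child.tag in COMPOSITION:
            if seen_component:
                return False
        else:
            seen_component = True  # element, complexType, defaultOpenContent, etc.
    return True
```

A schema document whose include follows an element declaration fails this check, matching the content model's requirement that the referencing elements come first.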
4.3.1 include
An include is used when you want to include other schema documents
in a schema document that has the same target namespace. This
provides for modularization of schema documents. For example, you may
want to break your schema into several documents: two different order
schema documents and a customer schema document. This is depicted
in Figure 4–2.
Parents
schema
The include elements may only appear at the top level of a schema
document, and they must appear at the beginning (along with the
import, redefine, and override elements).
The schemaLocation attribute indicates where the included schema
document is located. This attribute is required, although the location
is not required to be resolvable. However, if it is resolvable, it must be
a complete schema document.
Example 4–1 shows the use of include in a schema document. The
schema author wants to use the type OrderNumType in the number
element declaration. However, OrderNumType is defined in a different
schema document. The include statement references the location of
the schema document, ord2.xsd, that contains the definition of
OrderNumType. In this example, the including document is referring
to a simple type in the included document, but it could similarly refer to
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord">
  <xs:include schemaLocation="ord2.xsd"/>
</xs:schema>
ord2.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord">
  <xs:simpleType name="OrderNumType">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
</xs:schema>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord">
  <xs:include schemaLocation="cust.xsd"/>
(Continues)
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="name" type="CustNameType"/>
      <!--...-->
    </xs:sequence>
  </xs:complexType>
  <xs:simpleType name="CustNameType">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
</xs:schema>
4.3.2 import
An import is used to tell the processor that you will be referring to
components from other namespaces. For example, if you want to
reference an attribute from another namespace in your complex type
definition, or you want to derive your type from a type in another namespace,
you must import this namespace. This is depicted in Figure 4–3.
Parents
schema
The import elements may only appear at the top level of a schema
document, and must appear at the beginning (along with the include,
redefine, and override elements).
The namespace attribute indicates the namespace that you wish to
import. If you do not specify a namespace, it means that you are
importing components that are not in any namespace. The imported
namespace cannot be the same as the target namespace of the importing
schema document. If the importing schema document has no target
namespace, the import element must have a namespace attribute.
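The two namespace rules for import can be expressed as a small Python check over a schema document. This is a sketch, not a processor: import_problems is a hypothetical name, and it looks only at the namespace attributes, ignoring schemaLocation and everything else.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

def import_problems(schema_text):
    """List violations of the import namespace rules (sketch only)."""
    root = ET.fromstring(schema_text)
    target = root.get("targetNamespace")  # None: the document has no target namespace
    problems = []
    for imp in root.findall(XS + "import"):
        imported = imp.get("namespace")   # None: importing components in no namespace
        if target is None and imported is None:
            # Importing "no namespace" into a no-namespace document would
            # import the document's own (absent) target namespace.
            problems.append("a no-namespace document must give import a namespace")
        elif imported is not None and imported == target:
            problems.append(f"cannot import the target namespace {imported}")
    return problems
```

An ord schema that imports http://datypic.com/prod passes, while one that imports its own namespace, http://datypic.com/ord, is reported; for that case include is the appropriate mechanism.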
The schemaLocation attribute provides a hint to the processor as
to where to find a schema document that declares components for that
namespace. If you do not specify a schemaLocation, it is assumed
that the processor somehow knows where to find the schema document,
perhaps because it was specified by the user or built into the processor.
When schemaLocation is present and the processor is able to resolve
the location to some resource, it must resolve to a schema document.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           xmlns:prod="http://datypic.com/prod"
           targetNamespace="http://datypic.com/ord">
  <xs:import namespace="http://datypic.com/prod"
             schemaLocation="prod.xsd"/>
(Continues)
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">
  <xs:complexType name="ItemsType">
    <xs:sequence>
      <xs:element name="product" type="ProductType"/>
    </xs:sequence>
  </xs:complexType>
  <!-- ... -->
</xs:schema>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           xmlns:prod="http://datypic.com/prod"
           xmlns:ext="http://datypic.com/ext"
           targetNamespace="http://datypic.com/ord">
  <xs:import namespace="http://datypic.com/prod"
             schemaLocation="prod.xsd"/>
  <xs:import namespace="http://datypic.com/ext"
             schemaLocation="extension.xsd"/>
prod.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           xmlns:ext="http://datypic.com/ext"
           targetNamespace="http://datypic.com/prod">
  <xs:import namespace="http://datypic.com/ext"
             schemaLocation="extension.xsd"/>
  <xs:complexType name="ItemsType">
    <xs:sequence>
      <!-- ... -->
      <xs:element name="extension" type="ext:ExtensionType"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
(Continues)
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ext"
           targetNamespace="http://datypic.com/ext">
  <xs:complexType name="ExtensionType">
    <!-- ... -->
  </xs:complexType>
</xs:schema>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/root"
           xmlns:ord="http://datypic.com/ord"
           targetNamespace="http://datypic.com/root">
  <xs:import namespace="http://datypic.com/ord"
             schemaLocation="Summary.xsd"/>
  <xs:import namespace="http://datypic.com/ord"
             schemaLocation="Detail.xsd"/>
</xs:schema>
Summary.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord">
  <xs:element name="orderSummary"/>
  <!-- ... -->
</xs:schema>
Detail.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord">
  <xs:element name="orderDetails"/>
  <!-- ... -->
</xs:schema>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/root"
           xmlns:ord="http://datypic.com/ord"
           targetNamespace="http://datypic.com/root">
  <xs:import namespace="http://datypic.com/ord"
             schemaLocation="Orders.xsd"/>
</xs:schema>
Orders.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord">
  <xs:include schemaLocation="Summary.xsd"/>
  <xs:include schemaLocation="Detail.xsd"/>
  <!-- ... -->
</xs:schema>
ord1.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord">
  <xs:include schemaLocation="ord2.xsd"/>
  <xs:element name="order" type="OrderType"/>
</xs:schema>
ord2.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord">
  <xs:element name="order" type="OrderType"/>
</xs:schema>
It is not illegal for two schema documents to exist that have duplicate
names, since they may be used at different times in different
situations. However, since ord1.xsd includes ord2.xsd, they will be used
element. The fact that there are unresolved references in the schema is
only an error if such a reference is directly involved in the validation.
Chapter 5
Instances and schemas
1. While any prefix may be mapped to the namespace, this book uses the
prefix xsi as a shorthand, sometimes without explicitly stating that it is
mapped to http://www.w3.org/2001/XMLSchema-instance.
a relative URI). The processor will retrieve the schema document from
the schema location and make sure that its target namespace matches
that of the namespace it is paired with in xsi:schemaLocation.
Since spaces are used to separate values in this attribute, you
should not have spaces in your schema location path. You can replace
a space with %20, which is standard for URLs. For example, instead of
my schema.xsd, use my%20schema.xsd. To use an absolute path
rather than a relative path, some processors require that you start your
schema location with file:/// (with three forward slashes), as in
file:///C:/Users/PW/Documents/prod.xsd.
If multiple namespaces are used in the document,
xsi:schemaLocation can contain more than one pair of values,
as shown in Example 5–3.
It is not illegal to list two or more pairs of values that refer to the
same namespace. In Example 5–3, you could refer to both ord1.xsd
and ord2.xsd, repeating the same namespace name for each. However,
this is not recommended because many processors will ignore all but the
first schema location for a particular namespace.
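A minimal Python sketch of how the values of xsi:schemaLocation are read as pairs. schema_location_pairs is a hypothetical helper, not part of any processor's API; it keeps only the first location listed for each namespace, mirroring the common processor behavior just described, and it uses urllib.parse.quote to escape a space in a file name as %20.

```python
from urllib.parse import quote

def schema_location_pairs(value):
    """Split an xsi:schemaLocation value into {namespace: location}.

    Keeps only the first location per namespace, as many processors do.
    """
    tokens = value.split()  # values are separated by whitespace
    if len(tokens) % 2:
        raise ValueError("xsi:schemaLocation needs namespace/location pairs")
    pairs = {}
    for namespace, location in zip(tokens[0::2], tokens[1::2]):
        pairs.setdefault(namespace, location)  # later duplicates are ignored
    return pairs

# A literal space would split one location into two tokens, so escape it.
location = quote("my schema.xsd")
```

With the value "http://datypic.com/ord ord1.xsd http://datypic.com/prod prod.xsd http://datypic.com/ord ord2.xsd", the sketch keeps ord1.xsd for the ord namespace and silently drops ord2.xsd, which is exactly the surprise the advice above is meant to avoid.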
It is generally a good practice to use one main schema document
that includes or imports all other schema documents needed for valida-
tion. This simplifies the instance and makes name collisions more
obvious.
The xsi:schemaLocation attribute may appear anywhere in an
instance, in the tags of any number of elements. Its appearance in a
particular tag does not signify its scope. However, it must appear before
any elements that it would validate. It is most typical to put the
xsi:schemaLocation attribute on the root element, for simplicity.
specified, but once again, you should check with your processor
to see what it will accept.
Chapter 6
Element declarations
Parents
schema, 1.1 override
Content
annotation?, (simpleType | complexType)?, 1.1 alternative*,
(key | keyref | unique)*
Example 6–1 shows two global element declarations: name and size.
A complex type is then defined which references these element
declarations by name using the ref attribute.
The qualified names used by global element declarations must be
unique in the schema. This includes not just the schema document in
which they appear, but also any other schema documents that are
used with it.
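Because uniqueness is checked across all the schema documents used together, a duplicate can be spotted by collecting the qualified name of every global element declaration. The Python sketch below does that with ElementTree; duplicate_global_elements is a hypothetical name, and the sketch ignores chameleon includes, in which a document with no target namespace takes on the including document's namespace.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

def duplicate_global_elements(*schema_texts):
    """Return qualified names declared by more than one global xs:element."""
    seen, duplicates = set(), []
    for text in schema_texts:
        root = ET.fromstring(text)
        namespace = root.get("targetNamespace", "")
        # findall on the root matches direct children only, i.e. global declarations.
        for decl in root.findall(XS + "element"):
            qname = f"{{{namespace}}}{decl.get('name')}"
            if qname in seen:
                duplicates.append(qname)
            seen.add(qname)
    return duplicates
```

Fed the two order schema documents from Chapter 4, each of which declares a global order element in http://datypic.com/ord, the sketch reports {http://datypic.com/ord}order; the same local name in a different target namespace is not a conflict.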
The name specified in an element declaration must be an XML
non-colonized name, which means that it must start with a letter or
underscore (_), and may only contain letters, digits, underscores (_), hyphens
(-), and periods (.). The qualified element name consists of the target
namespace of the schema document, plus the local name in the
declaration. In Example 6–1, the name and size element declarations take
on the target namespace http://datypic.com/prod.
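The rule can be approximated in Python with a regular expression. This is deliberately simplified to ASCII: the real NCName production also admits a wide range of non-ASCII letters plus combining characters and extenders, so is_simple_ncname (a hypothetical helper) rejects some names that a schema processor would accept.

```python
import re

# ASCII-only approximation of NCName: a letter or underscore first,
# then letters, digits, underscores, hyphens, or periods. No colons.
_SIMPLE_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def is_simple_ncname(name):
    """True if name matches the ASCII subset of the NCName rules."""
    return _SIMPLE_NCNAME.fullmatch(name) is not None
```

Names such as size and _order-2.1 pass, while prod:size (contains a colon), 2size, and -size (bad first character) do not.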
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="size" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
Parents
all, choice, sequence
Example 6–3 shows two local element declarations, name and size,
which appear entirely within a complex type definition.
Occurrence constraints (minOccurs and maxOccurs) can
appear in local element declarations. Some attributes, namely
substitutionGroup, final, and abstract, are valid in global
element declarations but not in local element declarations. This is
because these attributes all relate to substitution groups, in which
local element declarations cannot participate.
The name specified in a local element declaration must also be an
XML non-colonized name. If its form is qualified, it takes on the target
<xs:element name="product">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="size"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
<xs:element name="anything"/>
# Simple types
# Complex types with simple content
# Complex types with mixed content, if all children are optional
The default or fixed value must be valid for the type of that element.
For example, it is not legal to specify a default value of xyz if the type
of the element is integer.¹
The specification of fixed and default values in element declarations
is independent of their occurrence constraints (minOccurs and
maxOccurs). Unlike defaulted attributes, a defaulted element may be
required (i.e., minOccurs in its declaration may be more than 0). If an
element with a default value is required, it may still appear empty and
have its default value filled in.
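The following sketch illustrates a required element with a default value. The product and size declarations are hypothetical, and the schema has no target namespace for brevity; the instances after the schema are separate documents, shown together for comparison.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="product">
    <xs:complexType>
      <xs:sequence>
        <!-- minOccurs defaults to 1, so size is required -->
        <xs:element name="size" type="xs:integer" default="12"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

<!-- Valid: size appears but is empty, so its value is 12 -->
<product><size/></product>

<!-- Valid: the explicit value 10 is used -->
<product><size>10</size></product>

<!-- Invalid: size is required, and a default is never
     inserted for an element that is absent -->
<product/>
```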
1. This is not considered an error in the schema, but any instance that relies
on the value would be in error.
whether the value of the element is in fact equivalent to the fixed value,
it takes into account the element’s type.
Table 6–4 shows some valid and invalid instances for elements de-
clared with fixed values. The size element has the type integer, so
all forms of the integer “1” are accepted in the instance, including “01”,
“+1”, and “ 1 ” surrounded by whitespace. Whitespace around a value
is acceptable because the whiteSpace facet value for integer is
collapse, meaning that whitespace is stripped before validation takes
place. A value that contains only whitespace, like <size> </size>,
is not valid because it is not considered empty but also is not equal to 1.
The name element, on the other hand, has the type string. The
string “01” is invalid because it is not considered to be equal to the
string “1”. The string “ 1 ” is also invalid because the whiteSpace
facet value for string is preserve, meaning that the leading and
trailing spaces are kept. For more information on type equality, see
Section 11.7 on p. 253.
Valid instances      Invalid instances
<name>1</name>       <name>01</name>
<name/>              <name>+1</name>
<name></name>        <name> 1 </name>
                     <name> </name>
                     <name>2</name>
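A minimal sketch of declarations consistent with the text above, assuming size is declared with type integer and name with type string, both with the fixed value 1. Each instance below would be a separate document.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- integer: whitespace is collapsed, values are compared as numbers -->
  <xs:element name="size" type="xs:integer" fixed="1"/>
  <!-- string: whitespace is preserved, values are compared as strings -->
  <xs:element name="name" type="xs:string" fixed="1"/>
</xs:schema>

<!-- Valid instances of size -->
<size>01</size>
<size>+1</size>
<size> 1 </size>
<size/>

<!-- Invalid instances of size -->
<size> </size>
<size>2</size>

<!-- Invalid instances of name -->
<name>01</name>
<name> 1 </name>
```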
# You can easily turn off default value processing. The default
value for the element will not be added if it is marked as nil.
has an xsi:nil set to true, the default value is not filled in even
though the element is empty.
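As a sketch of how nil interacts with default values, assuming a hypothetical nillable size element with a default value of 12:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="size" type="xs:integer" default="12" nillable="true"/>
</xs:schema>

<!-- Empty but not nil: the default value 12 is filled in -->
<size/>

<!-- Nil: the element has no value, and the default is not filled in -->
<size xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:nil="true"/>
```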
Elements should not be declared nillable if they will ever be used as
fields in an identity constraint, such as a key or a uniqueness constraint.
See Section 17.7.2 on p. 434 for more information on identity
constraint fields.
Chapter 7 | Attribute declarations
and other information that is used by the browser but not di-
rectly by the end user. This is a convenient separation for some
narrative XML vocabularies.
# If you plan to validate using DTDs as well as schemas, you can
perform some minimal type checking on attribute values.
For example, color can be constrained to a certain set of values.
The character data content of elements cannot be validated
using DTDs.
# Attributes can be added to the instance by specifying default
values; elements cannot (they must appear to receive a
default value).
# Attributes can be inherited by descendant elements, as described
in Section 7.6 on p. 126.
As you can see, there are many more advantages to using elements
than attributes, but attributes are useful in some cases. A general rec-
ommendation is to use attributes for metadata and elements for data.
For example, use an attribute to describe the units, language, or time
dependence of an element value. Additionally, attributes should be
used for ID and IDREF values as well as XLink expressions. Elements
should be used for everything else.
Parents: schema, override (version 1.1 only)
<xs:complexType name="SizeType">
<xs:attribute ref="system" use="required"/>
<xs:attribute ref="dim"/>
</xs:complexType>
</xs:schema>
Parents: complexType, restriction, extension, attributeGroup
<xs:complexType name="SizeType">
<xs:attribute name="system" type="xs:string" use="required"/>
<xs:attribute name="dim" type="xs:integer"/>
</xs:complexType>
</xs:schema>
<xs:attribute name="system">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="US-DRESS"/>
<!--...-->
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="anything"/>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">
<xs:attribute name="global" type="xs:string"/>
<xs:element name="size" type="SizeType"/>
<xs:complexType name="SizeType">
<xs:attribute ref="global"/>
<xs:attribute name="unqual" form="unqualified"/>
<xs:attribute name="qual" form="qualified"/>
<xs:attribute name="unspec"/>
</xs:complexType>
</xs:schema>
Valid instance:
<prod:size xmlns:prod="http://datypic.com/prod"
prod:global="x" unqual="x" prod:qual="x" unspec="x"/>
Table 7–3 describes how attribute default values are inserted in dif-
ferent situations, based on the declaration in Example 7–5. Note that
the only time the default value is inserted is when the attribute is
absent. If the attribute’s value is the empty string, it is left as is. In that
case, if an empty string is not valid for that type, which it is not for
integer, an error is raised. This is different from the behavior of
default values for elements, described in Section 6.4.1 on p. 102.
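The sketch below shows the insertion rule for attributes. The dim attribute and its default value of 1 are assumptions for illustration, not the declaration from Example 7–5; the instances are separate documents.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="size">
    <xs:complexType>
      <xs:attribute name="dim" type="xs:integer" default="1"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

<!-- dim is absent: the default is inserted, as if dim="1" -->
<size/>

<!-- dim is present: its value 2 is kept -->
<size dim="2"/>

<!-- Invalid: the empty value is left as is, and the empty
     string is not a valid integer -->
<size dim=""/>
```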
The system attribute, on the other hand, has the type string. The
string “01” is invalid because it is not considered equal to the string
“1”. The string “ 1 ” is also invalid because the whiteSpace facet value
for string is preserve, meaning that the leading and trailing spaces
are kept. For more information on type equality, please see Section 11.7
on p. 253.
Chapter 8 | Simple types
<orderDate>
<year>2001</year>
<month>06</month>
<day>15</day>
</orderDate>
Parents: schema, redefine, override (version 1.1 only)
Content: annotation?, (restriction | list | union)
Parents: element, attribute, restriction, list, union, alternative (version 1.1 only)
If a type is named, you can also derive new types from it, which is
another way to promote reuse and consistency.
Named types can also make a schema more readable when its type
definitions are complicated.
An anonymous type, on the other hand, can be used only in the ele-
ment or attribute declaration that contains it. It can never be redefined,
overridden, have types derived from it, or be used in a list or union
type. This can seriously limit its reusability, extensibility, and ability
to change over time.
However, there are cases where anonymous types are preferable to
named types. If the type is unlikely to ever be reused, the advantages
listed above no longer apply. Also, there is such a thing as too much
reuse. For example, if an element can contain the values 1 through 10,
it does not make sense to define a type named OneToTenType to be
reused by other unrelated element declarations with the same value
space. If the value space for one of the element declarations using that
named type changes but the other element declarations stay the same,
it actually makes maintenance more difficult, because a new type would
need to be defined at that time.
In addition, anonymous types can be more readable when they are
relatively simple. It is sometimes desirable to have the definition of
the type right there with the element or attribute declaration.
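To contrast the two approaches, the following sketch declares a named DressSizeType (an integer from 2 through 18) that two element declarations share, and an anonymous type that only one declaration can use. The element names targetSize and rating are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Named type: can be referenced, derived from,
       and used as a list item type or union member -->
  <xs:simpleType name="DressSizeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="size" type="DressSizeType"/>
  <xs:element name="targetSize" type="DressSizeType"/>

  <!-- Anonymous type: usable only by this one declaration -->
  <xs:element name="rating">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="1"/>
        <xs:maxInclusive value="10"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```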
Simple types may also restrict user-derived simple types that are
defined in the same schema document, or even in a different schema
document. For example, you could further restrict DressSizeType
by defining another simple type, MediumDressSizeType, as shown
in Example 8–4.
A simple type restricts its base type by applying facets to restrict its
values. In Example 8–4, the facets minInclusive and maxInclusive
are used to restrict the value of MediumDressSizeType to be between
8 and 12 inclusive.
Parents: simpleType
The syntax for applying a facet is shown in Table 8–5. All facets
(except assertion) must have a value attribute, which has different
valid values depending on the facet. Most facets may also have a fixed
attribute, as described in Section 8.3.4 on p. 140.
Parents: restriction
Certain facets are not applicable to some types. For example, it does
not make sense to apply the fractionDigits facet to a character
string type. There is a defined set of applicable facets for each of the
built-in types.1 If a facet is applicable to a built-in type, it is also appli-
cable to atomic types that are derived from it. For example, since the
length facet is applicable to string, if you derive a new type from
string, the length facet is also applicable to your new type. Section 8.4
on p. 142 describes each of the facets in detail and lists the
built-in types to which the facet can apply.
1. Technically, it is the primitive types that have applicable facets, with the
rest of the built-in types inheriting that applicability from their base types.
However, since most people do not have the built-in type hierarchy
memorized, it is easier to list applicable facets for all the built-in types.
This rule also applies when you are restricting the built-in types. For
example, the short type has a maxInclusive value of 32767. It is
illegal to define a restriction of short that sets maxInclusive to
32768.
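A sketch of a legal and an illegal restriction of short; the type names are hypothetical. Because BadShortType sets maxInclusive outside the value space of short, a conforming schema processor should reject this schema document.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Legal: 100 lies within the value space of short -->
  <xs:simpleType name="SmallShortType">
    <xs:restriction base="xs:short">
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Illegal: 32768 exceeds the maxInclusive of short (32767) -->
  <xs:simpleType name="BadShortType">
    <xs:restriction base="xs:short">
      <xs:maxInclusive value="32768"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```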
Although enumeration facets can appear multiple times in the same
type definition, they are treated in much the same way. If both a
derived type and its ancestor have a set of enumeration facets, the
values of the derived type must be a subset of the values of the ancestor.
An example of this is provided in Section 8.4.4 on p. 145.
Likewise, the pattern facets specified in a derived type must allow
a subset of the values allowed by the ancestor types. A schema processor
will not necessarily check that the regular expressions represent a subset;
instead, it will validate instances against the patterns of both the derived
type and all the ancestor types, effectively taking the intersection of the
pattern values.
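The intersection behavior can be sketched as follows, assuming a DressSizeType with the pattern \d{1,2} and a hypothetical derived type SmallDressSizeType. A value such as 4 matches both patterns and is valid for SmallDressSizeType; 10 matches only the inherited pattern, so it is invalid.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="DressSizeType">
    <xs:restriction base="xs:integer">
      <xs:pattern value="\d{1,2}"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="SmallDressSizeType">
    <xs:restriction base="DressSizeType">
      <!-- A value must match this pattern AND the inherited \d{1,2} -->
      <xs:pattern value="[2468]"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```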
8.4 Facets
8.4.1 Bounds facets
The four bounds facets (minInclusive, maxInclusive,
minExclusive, and maxExclusive) restrict a value to a speci-
fied range. Our previous examples applied minInclusive and
maxInclusive to restrict the value space of DressSizeType. While
minInclusive and maxInclusive specify boundary values that
are included in the valid range, minExclusive and maxExclusive
specify bounds that are excluded from the valid range.
There are several constraints associated with the bounds facets:
The four bounds facets can be applied only to the date/time and
numeric types, and the types derived from them. Special consideration
should be given to time zones when applying bounds facets to date/time
types. For more information, see Section 11.4.15 on p. 235.
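A sketch of exclusive and inclusive bounds, using hypothetical type names. The first type accepts decimals strictly between 0 and 100, so 0.5 and 99.9 are valid but 0 and 100 are not; the second applies inclusive bounds to a date type.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Greater than 0 and less than 100; both endpoints are excluded -->
  <xs:simpleType name="PercentType">
    <xs:restriction base="xs:decimal">
      <xs:minExclusive value="0"/>
      <xs:maxExclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Any date in 2004; both endpoints are included -->
  <xs:simpleType name="Date2004Type">
    <xs:restriction base="xs:date">
      <xs:minInclusive value="2004-01-01"/>
      <xs:maxInclusive value="2004-12-31"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```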
8.4.4 Enumeration
The enumeration facet allows you to specify a distinct set of
valid values for a type. Unlike most other facets (except pattern and
assertion), the enumeration facet can appear multiple times in a
single restriction. Each enumerated value must be unique, and must
be valid for that type. If it is a string-based or binary type, you may also
specify the empty string in an enumeration value, which allows elements
or attributes of that type to have empty values.
Example 8–9 shows a simple type SMLXSizeType that allows the
values small, medium, large, and extra large.
Note that you need to repeat all of the enumeration values that
apply to the new type. This example is legal because the values for
SMLSizeType (small, medium, and large) are a subset of the values
for SMLXSizeType. By contrast, Example 8–11 attempts to add an
enumeration facet to allow the value extra small. This type defini-
tion is illegal because it attempts to extend rather than restrict the value
space of SMLXSizeType.
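A sketch of the legal restriction described above. Using token as the base type of SMLXSizeType is an assumption; the point is that every enumeration value in SMLSizeType is already allowed by its base type.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="SMLXSizeType">
    <xs:restriction base="xs:token">
      <xs:enumeration value="small"/>
      <xs:enumeration value="medium"/>
      <xs:enumeration value="large"/>
      <xs:enumeration value="extra large"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Legal: each value here is also a value of SMLXSizeType -->
  <xs:simpleType name="SMLSizeType">
    <xs:restriction base="SMLXSizeType">
      <xs:enumeration value="small"/>
      <xs:enumeration value="medium"/>
      <xs:enumeration value="large"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```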
extra small to the set of valid values. Union types are described in
detail in Section 10.2 on p. 183.
When enumerating numbers, it is important to remember that the
enumeration facet works on the actual value of the number, not its
lexical representation as it appears in an XML instance. Example 8–13
shows a simple type NewSmallDressSizeType that is based on
integer, and specifies an enumeration of 2, 4, and 6. The two instance
elements shown, which contain 2 and 02, are both valid. This is
because 02 is equivalent to 2 for integer-based types. However, if
the base type of NewSmallDressSizeType had been string, the
value 02 would not be valid, because the strings 2 and 02 are not
the same. If you wish to constrain the lexical representation of a numeric
type, you should apply the pattern facet instead. For more information
on type equality in XML Schema, see Section 11.7 on p. 253.
<xs:simpleType name="NewSmallDressSizeType">
<xs:restriction base="xs:integer">
<xs:enumeration value="2"/>
<xs:enumeration value="4"/>
<xs:enumeration value="6"/>
</xs:restriction>
</xs:simpleType>
Valid instances:
<size>2</size>
<size>02</size>
The enumeration facet can be applied to any type except boolean.
8.4.5 Pattern
The pattern facet allows you to restrict values to a particular pattern,
represented by a regular expression. Chapter 9 provides more detail on
the rules for the regular expression syntax. Unlike most other facets
(except enumeration and assertion), the pattern facet can be
specified multiple times in a single restriction. If multiple pattern
facets are specified in the same restriction, the instance value must
match at least one of the patterns. It is not required to match all of the
patterns.
Example 8–14 shows a simple type DressSizeType that includes
the pattern \d{1,2}, which restricts the size to one or two digits.
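A sketch contrasting this with multiple patterns in one restriction, using a hypothetical SizeCodeType. Because patterns in the same restriction are alternatives, 8, 12, and M are all valid, while XL and 123 are not.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="SizeCodeType">
    <xs:restriction base="xs:token">
      <!-- A value is valid if it matches EITHER pattern -->
      <xs:pattern value="\d{1,2}"/>
      <xs:pattern value="[SML]"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```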
8.4.6 Assertion
The assertion facet allows you to specify additional constraints on
values using XPath 2.0. Example 8–17 is a simple type with an asser-
tion, namely that the value must be divisible by 2. It uses a facet named
assertion with a test attribute that contains the XPath expression.
Simple type assertions are a flexible and powerful feature covered in
more detail, along with complex type assertions, in Chapter 14.
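A sketch of such a type, with a hypothetical name. In a simple type assertion, the variable $value holds the value being validated, so 4 is valid and 5 is not. This requires a version 1.1 processor.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="EvenIntegerType">
    <xs:restriction base="xs:integer">
      <!-- XPath 2.0 test evaluated against the typed value -->
      <xs:assertion test="$value mod 2 = 0"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```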
1. optional, making the time zone optional (the value for most
built-in date/time types)
2. required, making the time zone required (the value for the
dateTimeStamp built-in type)
3. prohibited, disallowing the time zone
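These are the values of the explicitTimezone facet, which is new in version 1.1. A sketch with a hypothetical type name: 2004-04-12Z and 2004-04-12-05:00 are valid, but 2004-04-12 is not.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="ZonedDateType">
    <xs:restriction base="xs:date">
      <!-- Every value must end in a time zone -->
      <xs:explicitTimezone value="required"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```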
8.4.8 Whitespace
The whiteSpace facet allows you to specify the whitespace normalization
rules that apply to this value. Unlike the other facets, which restrict
the value space of the type, the whiteSpace facet is an instruction
to the schema processor on what to do with whitespace. This type
of facet is known as a prelexical facet because it results in some process-
ing of the value before the other constraining facets are applied. The
valid values for the whiteSpace facet are:
The whitespace processing, if any, will happen first, before any vali-
dation takes place. In Example 8–9, the base type of SMLXSizeType
<size>extra
large</size>
Example 8–20. Valid values for the final attribute in simple type definitions
final="#all"
final="restriction list union"
final="list restriction extension"
final="union"
final=""
Chapter 9 | Regular expressions
9.2 Atoms
An atom describes one or more characters. It may be any one of the
following:
Regular expression   Matching strings   Non-matching strings
a|b|c                a, b, c            abc
1. Except when they are within square brackets, as described in Section 9.2.4.6
on p. 175.
Regular expression   Matching strings          Non-matching strings
a *z                 az, a z, a  z, a   z      a *z
\P{Lu}               a, b, c, 1, 2, 3          A, B, C
\p{Nd}               1, 2, 3                   a, b, c, A, B, C
\P{Nd}               a, b, c, A, B, C          1, 2, 3
\P{IsBasicLatin}     â, ß, ç                   a, b, c
1. The rules are actually slightly more complex and less strict than this; they
also differ between versions 1.0 and 1.1. However, it is never an error to
escape these characters inside a character class expression.
the string ab or the string cd to come before z, you can use the expres-
sion (ab|cd)z. This example makes use of branches, which are
described further in Section 9.4 on p. 177. Table 9–18 shows some
examples of parenthesizing within regular expressions.
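A sketch of that expression used in a pattern facet, with a hypothetical type name. Because a pattern facet must match the entire value, only abz and cdz are valid. Without the parentheses, ab|cdz would instead match either ab or cdz.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="ABorCDZType">
    <xs:restriction base="xs:string">
      <!-- The parentheses group the branches, so z follows ab or cd -->
      <xs:pattern value="(ab|cd)z"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```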
9.3 Quantifiers
A quantifier indicates how many times the atom may appear in a
matching string. Table 9–19 lists the quantifiers.
9.4 Branches
As mentioned earlier in this chapter, a regular expression can consist of
an unlimited number of branches. Branches, separated by the vertical
bar (|) character, represent a choice between several expressions. The
| character does not act on the atom immediately preceding it, but
on the entire expression that precedes it (back to the previous | or an
opening parenthesis). For example, the regular expression true|false
Chapter 10 | Union and list types
or “value space,” for the type is the union of the value spaces of
two or more other simple types. For example, to represent a
dress size, you may define a union type that allows a value to
be either an integer from 2 through 18, or one of the string
values small, medium, or large.
The variety of the resulting type depends on both the derivation type
and the variety of the original type. Table 10–1 shows all possible
combinations of derivation types and original type varieties. The im-
portant thing to understand is that when you restrict, for example, a
list type, the resulting type is still a list type. All the rules for list types,
such as applicable facets, also apply to this new type.
Parents
simpleType
The simple types that compose a union type are known as its
member types. Member types must always be simple types; there is no
such thing as a union of complex types. There must be at least one
member type, and there is no limit on how many member types may
be specified.
In Example 10–1, the member types are defined anonymously
within the union, as simpleType children. It is also possible to spec-
ify the member types using a memberTypes attribute of the union el-
ement, as shown in Example 10–2. It is assumed that DressSizeType
and SMLSizeType are defined elsewhere in the schema.
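A sketch of what such a union could look like, assuming DressSizeType is an integer from 2 through 18 and SMLSizeType enumerates small, medium, and large; both definitions and the name SizeType are illustrative.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="DressSizeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="SMLSizeType">
    <xs:restriction base="xs:token">
      <xs:enumeration value="small"/>
      <xs:enumeration value="medium"/>
      <xs:enumeration value="large"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Member types referenced by name; a value is validated
       against each member type in the order listed -->
  <xs:simpleType name="SizeType">
    <xs:union memberTypes="DressSizeType SMLSizeType"/>
  </xs:simpleType>
</xs:schema>
```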
Parents: simpleType
Content: annotation?, simpleType?, (enumeration | pattern | assertion)* (assertion: version 1.1 only)
Parents: simpleType
Example 10–7 shows a simple type that allows a list of available dress
sizes.
# They are not appropriate for values that may contain whitespace
(see Section 10.3.4 on p. 195).
# If you later wish to expand the values by adding children
or attributes, this will not be possible if you use a
list. For example, if you use markup, you can later add an
attribute to size to indicate the measurement system, such
as <size system="US-DRESS">.
# There is no way to represent nil values.
# There may be limited support for lists in other XML technolo-
gies. For example, individual values in a list cannot be accessed
via XPath 1.0 or XSLT 1.0.
Parents: simpleType
Content: annotation?, simpleType?, (length | minLength | maxLength |
pattern | enumeration | whiteSpace | assertion)* (assertion: version 1.1 only)
When applying facets to a list type, you do not specify the facets di-
rectly in the list type definition. Instead, you define the list type, then
define a restriction of that list type. This can be done with two separate
named simple types, or it can be accomplished all in one definition as
shown in Example 10–10.
However, this would not behave as you expect. It would restrict the
value of the entire list to only one of the values: small, medium, or
large. Therefore, <availableSizes>small</availableSizes>
would be valid, but <availableSizes>small medium</availableSizes>
would not. Instead, apply the enumeration to the item type,
as shown in Example 10–13.
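A sketch of that approach, assuming an anonymous item type based on token. The enumeration now constrains each item individually, so small medium is valid and small huge is not.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="AvailableSizesType">
    <xs:list>
      <!-- The enumeration restricts each ITEM, not the whole list -->
      <xs:simpleType>
        <xs:restriction base="xs:token">
          <xs:enumeration value="small"/>
          <xs:enumeration value="medium"/>
          <xs:enumeration value="large"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:list>
  </xs:simpleType>
  <xs:element name="availableSizes" type="AvailableSizesType"/>
</xs:schema>
```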
There may be cases where you do want to restrict the entire list
to certain values. Example 10–14 shows a list type that allows only two
values, each of which is an entire list of sizes.
<xs:simpleType name="ApplicableSizesType">
<xs:restriction>
<xs:simpleType>
<xs:list itemType="SizeType"/>
</xs:simpleType>
<xs:enumeration value="small medium large"/>
<xs:enumeration value="2 4 6 8 10 12 14 16 18"/>
</xs:restriction>
</xs:simpleType>
Instance:
<xs:simpleType name="AvailableSizesType">
<xs:list itemType="SizeType"/>
</xs:simpleType>
The only restriction on lists of unions is that the union type cannot
have any list types among its member types. That would equate to a
list of lists, which is not legal.
<xs:simpleType name="TwoDimensionalArrayType">
<xs:list itemType="RowType"/>
</xs:simpleType>
Instead, you should put markup around the items in the lists. Exam-
ple 10–21 shows a complex type definition that accomplishes this and
a valid instance.
<xs:complexType name="VectorType">
<xs:sequence maxOccurs="unbounded">
<xs:element name="e" type="xs:integer"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="ArrayType">
<xs:sequence maxOccurs="unbounded">
<xs:element name="r" type="VectorType"/>
</xs:sequence>
</xs:complexType>
Instance:
<array>
<r> <e>1</e> <e>12</e> <e>15</e> </r>
<r> <e>44</e> <e>2</e> <e>3</e> </r>
</array>
Chapter 11 | Built-in simple types
this purpose. The character data content of an element of this type will
have its whitespace preserved.
11.2.2 Name
The type Name represents an XML name, which can be used as an ele-
ment name or attribute name, among other things. Values of this type
must start with a letter, underscore (_), or colon (:), and may contain
only letters, digits, underscores (_), colons (:), hyphens (-), and
periods (.).
Examples of valid values: _my.Element, my-element
The facets indicated in Table 11–5 can restrict Name and its derived
types.
11.2.3 NCName
The type NCName represents an XML non-colonized name, which is
simply a name that does not contain colons. An NCName must start
with either a letter or underscore (_) and may contain only letters,
digits, underscores (_), hyphens (-), and periods (.). This is identical
to the Name type, except that colons are not permitted.
Table 11–6 shows some valid and invalid values of the NCName type.
Examples of valid values: _my.Element, my-element
The facets indicated in Table 11–7 can restrict NCName and its
derived types.
11.2.4 language
The type language represents a natural language identifier, generally
used to indicate the language of a document or a part of a document.
Before creating a new attribute of type language, consider using the
xml:lang attribute that is intended to indicate the natural language
of the element and its content.
Values of the language type conform to RFC 3066, Tags for
the Identification of Languages, in version 1.0 and to RFC 4646, Tags
for Identifying Languages, and RFC 4647, Matching of Language Tags, in
version 1.1. The three most common formats are:
Any of these three formats may have additional parts, each preceded
by a hyphen, which identify more countries or dialects. Schema proces-
sors will not verify that values of the language type conform to the
above rules. They will simply validate them based on the pattern
specified for this type, which says that it must consist of one or more
parts of up to eight characters each, separated by hyphens.
Table 11–8 shows some valid and invalid values of the
language type.
The facets indicated in Table 11–9 can restrict language and its
derived types.
one digit in the exponent. No positive signs are included. For example,
the canonical representation of the float value +12 is 1.2E1.
Table 11–10 shows some valid and invalid values of the float and
double types.
Valid values     Comment
4268.22752E11
+24.3e-3
12
+3.5             Any value valid for decimal is also valid for float and double.
INF              Positive infinity.
-INF             Negative infinity.
+INF             Positive infinity; allowed in version 1.1 but not in 1.0.
+0               Positive 0.
-0               Negative 0.
NaN              Not a Number.

Invalid values   Comment
-3E2.4           The exponent must be an integer.
12E              An exponent must be specified if “E” is present.
Inf              Values are case-sensitive and must be capitalized correctly.
NAN              Values are case-sensitive and must be capitalized correctly.
(empty)          An empty value is not valid, unless xsi:nil is used.
11.3.2 decimal
The type decimal represents a decimal number of arbitrary precision.
Schema processors vary in the number of significant digits they support,
but a minimally conforming processor must support at least 16 significant
digits in version 1.1 (18 in version 1.0). The lexical representation of decimal is a sequence of
digits optionally preceded by a sign (“+” or “-”) and optionally contain-
ing a period. If the fractional part is 0 then the period and trailing zeros
may be omitted. Leading and trailing zeros are permitted but not con-
sidered significant. That is, the decimal values 3.0 and 3.0000 are
considered equal.
The canonical representation of decimal always contains a decimal
point. No leading or trailing zeros are present, except that there is always
at least one digit before and after the decimal point. No positive
signs are included.
Table 11–12 shows some valid and invalid values of the
decimal type.
-0.3
The facets indicated in Table 11–13 can restrict decimal and its
derived types.
Table 11–15 shows some valid and invalid values of the integer types.
The facets indicated in Table 11–16 can restrict the integer types
and their derived types.
# You will ever compare two values of that type numerically. For
example, if you compare the quantity 100 to the quantity 99,
you obviously want 100 to be greater. But if you define them
as strings, they will be compared as strings in languages such as
XSLT 2.0 and XQuery, and 100 will be considered less than 99.
# You will ever perform mathematical operations on values of
that type. You might want to double a quantity, but you are
unlikely to want to double a zip code.
# You want to restrict their values’ bounds. For example, you may
require that quantity must be between 0 and 100. While it can
be possible to restrict a string in this way, by applying a pattern,
it is more cumbersome.
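A sketch applying these criteria, with hypothetical type names. A quantity is compared and bounded numerically, so it suits integer; a five-digit postal code (an assumption for illustration) is never used in arithmetic and may have significant leading zeros, so a string with a pattern fits better.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Compared and bounded numerically: a numeric type fits -->
  <xs:simpleType name="QuantityType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Never computed with, and 02134 must not become 2134 -->
  <xs:simpleType name="ZipCodeType">
    <xs:restriction base="xs:string">
      <xs:pattern value="\d{5}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```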
11.4.1 date
The type date represents a Gregorian calendar date. The lexical representation
of date is YYYY-MM-DD where YYYY represents the year, MM the
month and DD the day. No left truncation is allowed for any part of
the date. To represent years later than 9999, additional digits can be
added to the left of the year value, but extra leading zeros are not per-
mitted. To represent years before 0000, a preceding minus sign (“-”)
is allowed. An optional time zone expression may be added at the end,
as described in Section 11.4.13 on p. 233.
Table 11–17 shows some valid and invalid values of the date type.
in version 1.0.
Invalid values Comment
99-04-12 Left truncation of the century is not allowed.
2004-4-2 Month and day must be two digits each.
2004/04/02 Slashes are not valid separators.
04-12-2004 The value must be in YYYY-MM-DD order.
2004-04-31 The date must be a valid date (April has 30 days).
+2004-04-02 Positive signs are not permitted.
An empty value is not valid, unless xsi:nil is used.
11.4.2 time
The type time represents a time of day. The lexical representation of
time is hh:mm:ss.sss where hh represents the hour, mm the minutes,
and ss.sss the seconds. An unlimited number of additional digits
can be used to increase the precision of fractional seconds if desired.
The time is based on a 24-hour time period, so hours should be repre-
sented as 00 through 24. Either of the values 00:00:00 or 24:00:00
can be used to represent midnight. An optional time zone expression
may be added at the end, as described in Section 11.4.13 on p. 233.
Table 11–18 shows some valid and invalid values of the time type.
11.4.3 dateTime
The type dateTime represents a specific date and time. The lexical
representation of dateTime is YYYY-MM-DDThh:mm:ss.sss, which
is a concatenation of the date and time forms, separated by a literal
letter T. All of the same rules that apply to the date and time types
are applicable to dateTime as well. An optional time zone expression
may be added at the end, as described in Section 11.4.13 on p. 233.
Table 11–19 shows some valid and invalid values of the
dateTime type.
11.4.4 dateTimeStamp
The type dateTimeStamp represents a specific date and time, but with
a time zone required. It is derived from dateTime and has the same
lexical representation and rules. The only difference is that a value is
required to end in a time zone, as described in Section 11.4.13 on
p. 233.
Table 11–20 shows some valid and invalid values of the
dateTimeStamp type.
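A sketch using a hypothetical orderTimestamp element; it requires a version 1.1 processor, and each instance is a separate document.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="orderTimestamp" type="xs:dateTimeStamp"/>
</xs:schema>

<!-- Valid: each value ends in a time zone -->
<orderTimestamp>2004-04-12T13:20:00Z</orderTimestamp>
<orderTimestamp>2004-04-12T13:20:00-05:00</orderTimestamp>

<!-- Invalid: no time zone (this would be a valid dateTime) -->
<orderTimestamp>2004-04-12T13:20:00</orderTimestamp>
```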
11.4.5 gYear
The type gYear represents a specific Gregorian calendar year. The
letter g at the beginning of most date and time types signifies “Grego-
rian.” The lexical representation of gYear is YYYY. No left truncation
is allowed. To represent years later than 9999, additional digits can be
added to the left of the year value. To represent years before 0000, a
preceding minus sign (“-”) is allowed. An optional time zone expression
may be added at the end, as described in Section 11.4.13 on p. 233.
Table 11–21 shows some valid and invalid values of the gYear type.
11.4.6 gYearMonth
The type gYearMonth represents a specific month of a specific year. The
lexical representation of gYearMonth is YYYY-MM. No left truncation
is allowed on either part. To represent years later than 9999, additional
digits can be added to the left of the year value. To represent years be-
fore 0000, a preceding minus sign (“-”) is permitted. An optional
time zone expression may be added at the end, as described in
Section 11.4.13 on p. 233.
Table 11–22 shows some valid and invalid values of the
gYearMonth type.
11.4.7 gMonth
The type gMonth represents a specific month that recurs every year. It
can be used to indicate, for example, that fiscal year-end processing
occurs in September of every year. To represent a duration in months,
use the duration type instead. The lexical representation of gMonth
is --MM. An optional time zone expression may be added at the end,
as described in Section 11.4.13 on p. 233. No preceding sign is allowed.
Table 11–23 shows some valid and invalid values of the gMonth type.
11.4.8 gMonthDay
The type gMonthDay represents a specific day that recurs every year.
It can be used to say, for example, that your birthday is on the
12th of April every year. The lexical representation of gMonthDay is
--MM-DD. An optional time zone expression may be added at the end,
as described in Section 11.4.13 on p. 233.
Table 11–24 shows some valid and invalid values of the
gMonthDay type.
11.4.9 gDay
The type gDay represents a day that recurs every month. It can be used
to say, for example, that checks are paid on the 5th of each month.
To represent a duration in days, use the duration type instead. The
lexical representation of gDay is ---DD. An optional time zone
expression may be added at the end, as described in Section 11.4.13
on p. 233.
Table 11–25 shows some valid and invalid values of the gDay type.
11.4.10 duration
The type duration represents a duration of time expressed as a number
of years, months, days, hours, minutes, and seconds. The lexical repre-
sentation of duration is PnYnMnDTnHnMnS, where P is a literal value
that starts the expression, nY is the number of years followed by a literal
Y, nM is the number of months followed by a literal M, nD is the number
of days followed by a literal D, T is a literal value that separates the date
and time, nH is the number of hours followed by a literal H, nM is the
number of minutes followed by a literal M, and nS is the number of
seconds followed by a literal S. The following rules apply to duration
values:
11.4.11 yearMonthDuration
The type yearMonthDuration, new in version 1.1, represents a duration
of time expressed as a number of years and months. The lexical
representation of yearMonthDuration is PnYnM, where P is a literal value that
starts the expression, nY is the number of years followed by a literal Y,
and nM is the number of months followed by a literal M.
yearMonthDuration is derived from duration, and all of the same
lexical rules apply.
Table 11–27 shows some valid and invalid values of the
yearMonthDuration type.
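For illustration, a hypothetical warrantyPeriod element could be declared as follows; like the type itself, this requires a processor that supports version 1.1.

```xml
<!-- Schema: hypothetical element -->
<xs:element name="warrantyPeriod" type="xs:yearMonthDuration"/>

<!-- Instance: one year and six months -->
<warrantyPeriod>P1Y6M</warrantyPeriod>
```

P18M would also be accepted and represents the same duration, while P1Y6M2D would be rejected because yearMonthDuration has no day part.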
11.4.12 dayTimeDuration
The type dayTimeDuration, new in version 1.1, represents a duration
of time expressed as a number of days, hours, minutes, and seconds.
The lexical representation of dayTimeDuration is PnDTnHnMnS, where P is a
literal value that starts the expression, nD is the number of days followed
by a literal D, T is a literal value that separates the date and time, nH is
the number of hours followed by a literal H, nM is the number of minutes
followed by a literal M, and nS is the number of seconds followed by a
literal S.
dayTimeDuration is derived from duration, and all of the same
lexical rules apply.
Table 11–28 shows some valid and invalid values of the
dayTimeDuration type.
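A comparable sketch for dayTimeDuration, using a hypothetical processingTime element (version 1.1 only):

```xml
<!-- Schema: hypothetical element -->
<xs:element name="processingTime" type="xs:dayTimeDuration"/>

<!-- Instance: two days, four hours, and thirty minutes -->
<processingTime>P2DT4H30M</processingTime>
```

PT36H is also valid, but P1M is not, because dayTimeDuration has no year or month part. P4H is not valid either, because hours must follow the T separator.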
11.4 | Date and time types 233
11.4.14 Facets
The facets indicated in Table 11–30 can restrict the date and time types
as well as their derived types.
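For example, the bounds facets can confine a date to a range. The type name and the bounds below are hypothetical.

```xml
<xs:simpleType name="OrderDateType">
  <xs:restriction base="xs:date">
    <!-- earliest accepted date, inclusive -->
    <xs:minInclusive value="2000-01-01"/>
    <!-- dates on or after this one are rejected -->
    <xs:maxExclusive value="2100-01-01"/>
  </xs:restriction>
</xs:simpleType>
```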
11.5.1 ID
The type ID is used for an attribute that uniquely identifies an element
in an XML document. An ID value must conform to the rules for an
NCName, as described in Section 11.2.3 on p. 210. This means that it
must start with a letter or underscore, and can only contain letters,
digits, underscores, hyphens, and periods.
ID values must be unique within an XML instance, regardless of
the attribute’s name or its element name. Example 11–2 is invalid if
attributes custID and orderID are both declared to be of type ID.
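A minimal sketch of the uniqueness rule, with hypothetical customer and order elements: even though the two attributes have different names on different elements, they share one pool of ID values.

```xml
<!-- Schema: both attributes are of type ID -->
<xs:element name="customer">
  <xs:complexType>
    <xs:attribute name="custID" type="xs:ID" use="required"/>
  </xs:complexType>
</xs:element>
<xs:element name="order">
  <xs:complexType>
    <xs:attribute name="orderID" type="xs:ID" use="required"/>
  </xs:complexType>
</xs:element>

<!-- Instance: invalid, because A123 is used as an ID value twice -->
<customer custID="A123"/>
<order orderID="A123"/>
```

Changing the second value to, say, O123 would make the instance valid.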
The facets indicated in Table 11–31 can restrict ID and its derived
types.
11.5.2 IDREF
The type IDREF is used for an attribute that references an ID. A com-
mon use case for IDREF is to create a cross-reference to a particular
section of a document. Like ID, an IDREF value must be an NCName,
as described in Section 11.2.3 on p. 210.
All attributes of type IDREF must reference an ID in the same XML
document. In Example 11–5, the ref attribute of quote is of type
IDREF, and the id attribute of footnote is of type ID. The instance
contains a reference between them.
<xs:element name="quote">
<xs:complexType>
<!--content model-->
<xs:attribute name="ref" type="xs:IDREF"/>
</xs:complexType>
</xs:element>
<xs:element name="footnote">
<xs:complexType>
<!--content model-->
<xs:attribute name="id" type="xs:ID" use="required"/>
</xs:complexType>
</xs:element>
Instance:
<quote ref="fn1">...</quote>
<footnote id="fn1">...</footnote>
11.5.3 IDREFS
The type IDREFS represents a list of IDREF values separated by
whitespace. There must be at least one IDREF in the list.
Each of the values in an attribute of type IDREFS must reference an
ID in the same XML document. In Example 11–6, the refs attribute
of quote is of type IDREFS, and the id attribute of footnote is of
type ID. The instance contains a reference from the quote element to
two footnote elements, with their IDs (fn1 and fn2) separated by
whitespace.
<xs:element name="quote">
<xs:complexType>
<!--content model-->
<xs:attribute name="refs" type="xs:IDREFS"/>
</xs:complexType>
</xs:element>
<xs:element name="footnote">
<xs:complexType>
<!--content model-->
<xs:attribute name="id" type="xs:ID" use="required"/>
</xs:complexType>
</xs:element>
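Instance:
<quote refs="fn1 fn2">...</quote>
<footnote id="fn1">...</footnote>
<footnote id="fn2">...</footnote>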
The facets indicated in Table 11–33 can restrict IDREFS and its
derived types.
11.5.4 ENTITY
The type ENTITY represents a reference to an unparsed entity. The
ENTITY type is most often used to include information from another
location that is not in XML format, such as graphics. An ENTITY
value must be an NCName, as described in Section 11.2.3 on p. 210.
An ENTITY value carries the additional constraint that it must match
the name of an unparsed entity in a document type definition (DTD)
for the instance.
Example 11–7 shows an XML document that links product numbers
to pictures of the products. In the schema, the picture element dec-
laration declares an attribute location that has the type ENTITY. In
the instance, each value of the location attribute (in this case,
prod557 and prod563) matches the name of an entity declared in the
internal DTD subset of the instance.
The facets indicated in Table 11–34 can restrict ENTITY and its
derived types.
11.5 | Legacy types 241
<xs:element name="picture">
<xs:complexType>
<xs:attribute name="location" type="xs:ENTITY"/>
</xs:complexType>
</xs:element>
<!--...-->
Instance:
<catalog>
<product>
<number>557</number>
<picture location="prod557"/>
</product>
<product>
<number>563</number>
<picture location="prod563"/>
</product>
</catalog>
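The instance above relies on an internal DTD subset that the listing does not show. A sketch of what it might look like follows; the notation name jpeg and the file names are assumptions, not part of the original example.

```xml
<!DOCTYPE catalog [
  <!-- hypothetical notation describing the format of the unparsed data -->
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!-- unparsed entities: NDATA ties each one to the notation -->
  <!ENTITY prod557 SYSTEM "prod557.jpg" NDATA jpeg>
  <!ENTITY prod563 SYSTEM "prod563.jpg" NDATA jpeg>
]>
<catalog>
  <!-- product elements as shown above -->
</catalog>
```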
11.5.5 ENTITIES
The type ENTITIES represents a list of ENTITY values separated by
whitespace. There must be at least one ENTITY in the list. Each of the
ENTITY values must match the name of an unparsed entity that has
been declared in a document type definition (DTD) for the instance.
Expanding on the example from the previous section, Example 11–8
shows the declaration of an attribute named location that is of type
ENTITIES. In the instance, the location attribute can include a list
of entity names. Each value (in this case there are two: prod557a and
prod557b) matches the name of an entity that is declared in the internal
DTD subset for the instance.
<xs:element name="pictures">
<xs:complexType>
<xs:attribute name="location" type="xs:ENTITIES"/>
</xs:complexType>
</xs:element>
Instance:
<catalog>
<product>
<number>557</number>
<pictures location="prod557a prod557b"/>
</product>
</catalog>
The facets indicated in Table 11–35 can restrict ENTITIES and its
derived types.
11.5.6 NMTOKEN
The type NMTOKEN represents a single string token. NMTOKEN values
may consist of letters, digits, periods (.), hyphens (-), underscores (_),
and colons (:). They may start with any of these characters. NMTOKEN
has a whiteSpace facet value of collapse, so any leading or trailing
whitespace will be removed. However, no whitespace may appear
within the value itself. Table 11–36 shows some valid and invalid values
of the NMTOKEN type.
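The following sketch, using a hypothetical sizeSystem element, shows how the whitespace collapse and the character rules play out:

```xml
<!-- Schema: hypothetical element -->
<xs:element name="sizeSystem" type="xs:NMTOKEN"/>

<!-- Instance values -->
<sizeSystem>US-DRESS</sizeSystem>       <!-- valid -->
<sizeSystem>  US-DRESS  </sizeSystem>   <!-- valid: surrounding spaces are collapsed away -->
<sizeSystem>.5:a_b</sizeSystem>         <!-- valid: may start with a period or a digit -->
<sizeSystem>US DRESS</sizeSystem>       <!-- invalid: internal space -->
```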
The facets indicated in Table 11–37 can restrict NMTOKEN and its
derived types.
11.5.7 NMTOKENS
The type NMTOKENS represents a list of NMTOKEN values separated by
whitespace. There must be at least one NMTOKEN in the list. Table 11–38
shows some valid and invalid values of the NMTOKENS type.
The facets indicated in Table 11–39 can restrict NMTOKENS and its
derived types.
Since NMTOKENS is a list type, restricting an NMTOKENS value with
these facets may not behave as you expect. The facets length,
minLength, and maxLength apply to the number of items in the
NMTOKENS list, not the length of each item. The enumeration facet
applies to the whole list, not the individual items in the list. For more
information, see Section 10.3.3 on p. 190.
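For instance, a maxLength facet on an NMTOKENS-based type limits how many tokens the list may hold, not how long each token is. The type and element names below are hypothetical.

```xml
<xs:simpleType name="SizeListType">
  <xs:restriction base="xs:NMTOKENS">
    <!-- counts list items, not characters: at most three tokens -->
    <xs:maxLength value="3"/>
  </xs:restriction>
</xs:simpleType>
<xs:element name="availableSizes" type="SizeListType"/>

<!-- Instance values -->
<availableSizes>S M L</availableSizes>         <!-- valid: three items -->
<availableSizes>EXTRA-LARGE</availableSizes>   <!-- valid: one (long) item -->
<availableSizes>XS S M L</availableSizes>      <!-- invalid: four items -->
```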
11.5.8 NOTATION
The type NOTATION represents a reference to a notation. A notation is
a method of interpreting XML and non-XML content. For example,
if an element in an XML document contains binary graphics data in
JPEG format, a notation can be declared to indicate that this is JPEG
data. An attribute of type NOTATION can then be used to indicate which
notation applies to the element’s content. A NOTATION value must be
a QName as described in Section 11.6.1 on p. 246.
NOTATION is the only built-in type that cannot be the type of at-
tributes or elements. Instead, you must define a new type that restricts
NOTATION, applying one or more enumeration facets. Each of these
enumeration values must match the name of a declared notation. For
more information on declaring notations and NOTATION-based types,
see Section 19.7 on p. 493.
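A minimal sketch of the restriction pattern described above, assuming a schema with no target namespace (otherwise the enumeration values would need a namespace prefix). The notation names, the public and system identifiers, and the picture element are all hypothetical.

```xml
<xs:notation name="jpeg" public="image/jpeg" system="viewer.exe"/>
<xs:notation name="gif" public="image/gif" system="viewer.exe"/>

<!-- NOTATION cannot be used directly, so restrict it by enumeration -->
<xs:simpleType name="PictureNotationType">
  <xs:restriction base="xs:NOTATION">
    <xs:enumeration value="jpeg"/>
    <xs:enumeration value="gif"/>
  </xs:restriction>
</xs:simpleType>

<xs:element name="picture">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:base64Binary">
        <xs:attribute name="pictType" type="PictureNotationType"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>

<!-- Instance (binary content shortened for illustration) -->
<picture pictType="gif">R0lGODlh</picture>
```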
The facets indicated in Table 11–40 can restrict NOTATION and its
derived types.
The facets indicated in Table 11–42 can restrict QName and its
derived types.
11.6.2 boolean
The type boolean represents logical yes/no values. The valid values
for boolean are true, false, 0, and 1. Values that are capitalized
(e.g., TRUE) or abbreviated (e.g., T) are not valid. Table 11–43 shows
some valid and invalid values of the boolean type.
Valid value   Equivalent to
true          true
false         false
0             false
1             true
The facets indicated in Table 11–44 can restrict boolean and its
derived types.
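As a quick sketch, a hypothetical inStock attribute of type boolean accepts exactly the four literals listed above:

```xml
<!-- Schema: hypothetical attribute -->
<xs:element name="product">
  <xs:complexType>
    <xs:attribute name="inStock" type="xs:boolean"/>
  </xs:complexType>
</xs:element>

<!-- Instance values -->
<product inStock="true"/>   <!-- valid -->
<product inStock="1"/>      <!-- valid, same value as true -->
<product inStock="TRUE"/>   <!-- invalid: capitalized -->
<product inStock="yes"/>    <!-- invalid: not one of the four literals -->
```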
letters A through F are permitted. For example, 0FB8 and 0fb8 are
two equal hexBinary representations consisting of two octets. The
canonical representation of hexBinary uses only uppercase letters.
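A short sketch with a hypothetical checksum element shows the case-insensitivity and the two-digits-per-octet rule:

```xml
<xs:element name="checksum" type="xs:hexBinary"/>

<!-- Instance values -->
<checksum>0FB8</checksum>   <!-- valid: two octets -->
<checksum>0fb8</checksum>   <!-- valid, equal to 0FB8 -->
<checksum>FB8</checksum>    <!-- invalid: odd number of hex digits -->
```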
The type base64Binary, typically used for embedding images
and other binary content, uses base64 encoding, as described in
RFC 3548. The following rules apply to base64Binary values:
11.6.4 anyURI
The type anyURI represents a Uniform Resource Identifier (URI) ref-
erence. URIs are used to identify resources, and they may be absolute
or relative. Absolute URIs provide the entire context for locating a
resource, such as http://datypic.com/prod.html. Relative URIs
are specified as the difference from a base URI, for example
../prod.html. It is also possible to specify a fragment identifier using
the # character, for example ../prod.html#shirt.
The three previous examples happen to be HTTP URLs (Uniform
Resource Locators), but URIs also encompass URLs of other schemes
(e.g., FTP, gopher, telnet), as well as URNs (Uniform Resource Names).
URIs are not required to be dereferenceable; that is, it is not necessary
for there to be a web page at http://datypic.com/prod.html in
order for this to be a valid URI.
URIs require that some characters be escaped: each byte of the
character's UTF-8 encoding is written as two hexadecimal digits
preceded by the % character. This applies to non-ASCII characters and
to some ASCII characters, including
control characters, space, and certain punctuation characters. For
example, ../édition.html must be represented instead as
../%C3%A9dition.html, with the é escaped as %C3%A9. However,
the anyURI type will accept these characters either escaped or un-
escaped. With the exception of the characters % and #, it will assume
that unescaped characters are intended to be escaped when used in an
actual URI, although the schema processor will do nothing to alter
them. It is valid for an anyURI value to contain a space, but this practice
is strongly discouraged. Spaces should instead be escaped using %20.
For more information on URIs, see RFC 2396, Uniform Resource
Identifiers (URI): Generic Syntax.
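The sketch below collects the kinds of values discussed above under a hypothetical link element; all of them are valid anyURI values.

```xml
<xs:element name="link" type="xs:anyURI"/>

<!-- Instance values, all valid -->
<link>http://datypic.com/prod.html</link>   <!-- absolute -->
<link>../prod.html#shirt</link>            <!-- relative, with a fragment identifier -->
<link>../%C3%A9dition.html</link>          <!-- non-ASCII character escaped -->
<link>../édition.html</link>               <!-- unescaped form, also accepted -->
```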
Version 1.1 expands the definition of anyURI to include IRIs, or
Internationalized Resource Identifiers. Compared to URIs, IRIs allow
a much broader range of characters without requiring them to be es-
caped. Since the anyURI type does not require escaping anyway, this
has little practical impact on your schemas. For more information about
IRIs, see RFC 3987, Internationalized Resource Identifiers (IRIs).
Note that when relative URI references such as ../prod are used
as values of anyURI, no attempt is made by the schema processor to
determine or keep track of the base URI to which they may be applied.
For example, it will not attempt to resolve the value relative to the URL
The facets indicated in Table 11–48 can restrict anyURI and its
derived types.
are related to each other by restriction, list, or union can have values
that are equal. For example, the value 2 of type integer and the
value 2 of type positiveInteger are considered equal, since
positiveInteger is derived from integer. Types that are not related
in the hierarchy can never have values that are equal. This means that
an integer value will never equal a string value, even if they are
both 2. This is true of both the built-in and user-derived types.
Example 11–9 illustrates this point.1
1. Assume for this section that there are element declarations with names
that are the same as their type names. For example, <xs:element
name="integer" type="xs:integer"/>.
11.7 | Comparing typed values 255
the token type’s is collapse. The value “ a ” that has the type string
will not equal “ a ” that has the type token, because the leading and
trailing spaces will be stripped for the token value. Example 11–11
illustrates this point.
Chapter 12
Complex types
Complex types are used to define the content model and attributes of elements.
<product>
<number>557</number>
<name>Short-Sleeved Linen Blouse</name>
</product>
<color value="blue"/>
Parents: schema, redefine, override (1.1)
Parents: element, alternative (1.1)
Parents
all, choice, sequence
<xs:complexType name="ProductType">
<xs:sequence>
<xs:element ref="number"/>
<xs:element ref="name"/>
<xs:element ref="size" minOccurs="0"/>
<xs:element ref="color" minOccurs="0"/>
</xs:sequence>
</xs:complexType>
<!--...-->
</xs:schema>
Parents
complexType, restriction, extension, group, choice, sequence
<product>
<number>557</number>
<name>Short-Sleeved Linen Blouse</name>
</product>
<product>
<number>557</number>
<name>Short-Sleeved Linen Blouse</name>
<color value="blue"/>
</product>
Parents
complexType, restriction, extension, group, choice, sequence
Example 12–19 shows a choice group that specifies that any one
of the elements (shirt, hat, or umbrella) must appear.
<items>
<hat>...</hat>
</items>
then name, then any number of the properties (such as size or color)
of the product, in any order. Note that the choice group is inside the
sequence group, allowing you to combine the power of both kinds
of model groups.
Parents
complexType, restriction, extension, group
<product>
<name>Short-Sleeved Linen Blouse</name>
<number>557</number>
</product>
1. This is a conflict in version 1.0 only. In version 1.1, the element declaration
has precedence over the wildcard.
12.6 | Using attribute declarations 281
Parents
complexType, restriction, extension, attributeGroup
<xs:complexType name="ProductType">
<xs:sequence>
<!--...-->
</xs:sequence>
<xs:attribute ref="effDate" default="2000-12-31"/>
</xs:complexType>
</xs:schema>
Parents: choice, sequence, all (1.1), openContent (1.1),
defaultOpenContent (1.1)
<description>
This shirt is the <xhtml:b>best-selling</xhtml:b> shirt in
our catalog! <xhtml:br/> Note: runs large.
</description>
<!--...-->
</catalog>
Parent
complexType, extension, restriction
Parent
schema
Content
annotation?, any
<xs:defaultOpenContent mode="suffix">
<xs:any namespace="##local"/>
</xs:defaultOpenContent>
<xs:complexType name="CatalogType">
<xs:sequence>
<xs:element name="product" type="ProductType"
maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
Note that the default open content model does not apply to complex
types with simple content, since they do not allow children. By default,
it does not apply to complex types with empty content, either.
However, you can use an appliesToEmpty="true" attribute on
defaultOpenContent to indicate that the default open content
model should apply to complex types with empty content.
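A minimal sketch of a schema document that opts in with appliesToEmpty; the wildcard settings and the ColorType definition are illustrative assumptions, and a version 1.1 processor is required.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- applies to every complex type in this schema document,
       including those with empty content -->
  <xs:defaultOpenContent mode="interleave" appliesToEmpty="true">
    <xs:any namespace="##other" processContents="lax"/>
  </xs:defaultOpenContent>

  <!-- empty content (attributes only): with appliesToEmpty="true",
       the open content wildcard applies here too -->
  <xs:complexType name="ColorType">
    <xs:attribute name="value" type="xs:string"/>
  </xs:complexType>
</xs:schema>
```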
Parents
complexType, restriction, extension, attributeGroup
Chapter 13
Deriving complex types
Parents
complexType
Parents
complexType
Parents
simpleContent
<xs:complexType name="SizeType">
<xs:simpleContent>
<xs:extension base="xs:integer">
<xs:attribute name="system" type="xs:token"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
Instance:
<size system="US-DRESS">10</size>
13.4 | Complex type extensions 307
Parents
complexContent
<xs:complexType name="ShirtType">
<xs:complexContent>
<xs:extension base="ProductType">
<xs:choice maxOccurs="unbounded">
<xs:element name="size" type="xs:integer"/>
<xs:element name="color" type="xs:string"/>
</xs:choice>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:complexType name="ExpandedItemsType">
<xs:complexContent>
<xs:extension base="ItemsType">
<xs:choice maxOccurs="unbounded">
<xs:element ref="sweater"/>
<xs:element ref="suit"/>
</xs:choice>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:complexType name="ShirtType">
<xs:complexContent>
<xs:extension base="ProductType">
<xs:all>
<xs:element name="size" type="xs:integer"/>
<xs:element name="color" type="xs:string"/>
</xs:all>
</xs:extension>
</xs:complexContent>
</xs:complexType>
The effective content model in this case is one big all group, shown
in Example 13–6, not two all groups inside a sequence.
When extending an all group with another all group, both groups
must have the same value for minOccurs (if any), and the effective
resulting group takes that shared value. In Example 13–5, the value for
both groups defaults to 1, so the group shown
in Example 13–6 does also. Alternatively, both of the all groups could
have, for example, minOccurs="0", in which case the effective
minOccurs is 0.
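A sketch of the minOccurs="0" variant (version 1.1 only); the contents of this ProductType are assumed for illustration and may differ from the book's other examples.

```xml
<xs:complexType name="ProductType">
  <xs:all minOccurs="0">
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
  </xs:all>
</xs:complexType>

<!-- both all groups use minOccurs="0", so the effective group is a
     single all group of four elements with minOccurs="0" -->
<xs:complexType name="ShirtType">
  <xs:complexContent>
    <xs:extension base="ProductType">
      <xs:all minOccurs="0">
        <xs:element name="size" type="xs:integer"/>
        <xs:element name="color" type="xs:string"/>
      </xs:all>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
```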
# If openContent is specified for the base type but not the derived
type, the openContent is inherited as is from the base type.
# If openContent is specified for the derived type but not the
base type, it is considered to be added in the derived type.
# If it is specified in both the base type and the derived type, it
must be the same or less restrictive in the derived type. For ex-
ample, if mode is suffix in the base type but interleave in
the derived type, this is legal because it is less constraining. The
opposite is not legal; attempting to turn interleave mode
into suffix mode means creating a more restrictive type. In
addition, the namespace allowances on the derived type must
be the same as, or a superset of, those allowed for the base type.
<xs:complexType name="ShirtType">
<xs:complexContent>
<xs:extension base="ProductType">
<xs:openContent mode="suffix">
<xs:any namespace="##any" processContents="lax"/>
</xs:openContent>
<xs:sequence>
<xs:element name="size" type="xs:integer"/>
<xs:element name="color" type="xs:string"/>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:complexType name="ItemType">
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute ref="xml:lang"/>
</xs:complexType>
Instance:
<product id="prod557"
xml:lang="en"
lang="en"
effDate="2001-04-12"/>
<xs:complexType name="DerivedType">
<xs:complexContent>
<xs:extension base="BaseType">
<xs:anyAttribute processContents="strict"
namespace="##targetNamespace
http://www.w3.org/1999/xhtml"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
Parents
simpleContent
Parents
complexContent
<xs:complexType name="RestrictedProductType">
<xs:complexContent>
<xs:restriction base="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
</xs:sequence>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
<xs:sequence>
<xs:sequence>
<xs:element name="a"/>
<xs:element name="b"/>
</xs:sequence>
</xs:sequence>
Legal restriction:
<xs:sequence>
<xs:element name="a"/>
<xs:element name="b"/>
</xs:sequence>
<xs:sequence>
<xs:element name="a" maxOccurs="3"/>
<xs:element name="b" fixed="bValue"/>
<xs:element name="c" type="xs:string"/>
</xs:sequence>
Legal restriction:
<xs:sequence>
<xs:element name="a" maxOccurs="2"/>
<xs:element name="b" fixed="bValue"/>
<xs:element name="c" type="xs:token"/>
</xs:sequence>
Illegal restriction:
<xs:sequence>
<xs:element name="a" maxOccurs="4"/>
<xs:element name="b" fixed="newValue"/>
<xs:element name="c" type="xs:integer"/>
</xs:sequence>
<xs:sequence>
<xs:element name="a"/>
<xs:any namespace="##other" maxOccurs="1"/>
</xs:sequence>
Legal restriction:
<xs:sequence>
<xs:element name="a"/>
<xs:element ref="otherns:b"/>
</xs:sequence>
Illegal restriction:
<xs:sequence>
<xs:element name="a"/>
<xs:element ref="b"/>
<xs:element name="c"/>
</xs:sequence>
Legal restriction:
Illegal restriction:
<xs:sequence>
<xs:element name="a"/>
<xs:element name="b" minOccurs="0"/>
</xs:sequence>
Legal restriction:
<xs:element name="a"/>
13.5 | Complex type restrictions 325
Legal restriction:
Illegal restriction:
When replacing a group with a group of the same kind (all, choice,
or sequence), the order of the children (element declarations
and groups) must be preserved. This is true even for all and choice
groups, when the order is not significant for validation. This is
illustrated in Example 13–21.
<xs:all>
<xs:element name="a"/>
<xs:element name="b" minOccurs="0"/>
<xs:element name="c"/>
</xs:all>
<xs:all>
<xs:element name="a"/>
<xs:element name="c"/>
</xs:all>
Illegal restriction:
<xs:all>
<xs:element name="c"/>
<xs:element name="a"/>
</xs:all>
<xs:all>
<xs:element name="a"/>
<xs:element name="b" minOccurs="0"/>
<xs:element name="c"/>
</xs:all>
Legal restriction:
<xs:all>
<xs:element name="a"/>
<xs:element name="c"/>
</xs:all>
<xs:all>
<xs:element name="a"/>
<xs:element name="b"/>
</xs:all>
<xs:choice>
<xs:element name="a"/>
<xs:element name="b"/>
<xs:element name="c"/>
</xs:choice>
Legal restriction:
<xs:choice>
<xs:element name="a"/>
<xs:element name="c"/>
</xs:choice>
Illegal restriction:
<xs:choice>
<xs:element name="a"/>
<xs:element name="d"/>
</xs:choice>
<xs:all>
<xs:element name="a"/>
<xs:element name="b" minOccurs="0"/>
<xs:element name="c"/>
</xs:all>
Legal restriction:
<xs:sequence>
<xs:element name="a"/>
<xs:element name="c"/>
</xs:sequence>
Illegal restriction:
<xs:sequence>
<xs:element name="a"/>
<xs:element name="b"/>
<xs:element name="c" minOccurs="2"/>
</xs:sequence>
<xs:choice maxOccurs="2">
<xs:element name="a"/>
<xs:element name="b"/>
<xs:element name="c"/>
</xs:choice>
Legal restriction:
<xs:sequence>
<xs:element name="a"/>
<xs:element name="c"/>
</xs:sequence>
Illegal restriction:
<xs:sequence>
<xs:element name="a"/>
<xs:element name="b"/>
<xs:element name="c"/>
</xs:sequence>
<xs:complexType name="BaseType">
<xs:openContent mode="suffix">
<xs:any namespace="http://datypic.com/prod
http://datypic.com/ord"/>
</xs:openContent>
<xs:sequence>
<xs:element name="a" type="xs:string" minOccurs="0"/>
</xs:sequence>
</xs:complexType>
Legal restriction:
<xs:complexType name="LegalDerivedType">
<xs:complexContent>
<xs:restriction base="BaseType">
<xs:openContent mode="suffix">
<xs:any namespace="http://datypic.com/prod"/>
</xs:openContent>
<xs:sequence>
<xs:element name="a" type="xs:string" minOccurs="0"/>
</xs:sequence>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
Illegal restriction:
<xs:complexType name="IllegalDerivedType">
<xs:complexContent>
<xs:restriction base="BaseType">
<xs:openContent mode="interleave">
<xs:any namespace="##any"/>
</xs:openContent>
<xs:sequence>
<xs:element name="a" type="xs:string" minOccurs="0"/>
</xs:sequence>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
<xs:complexType name="RestrictedLetterType">
<xs:simpleContent>
<xs:restriction base="LetterType">
<xs:simpleType>
<xs:restriction base="xs:string"/>
</xs:simpleType>
</xs:restriction>
</xs:simpleContent>
</xs:complexType>
<xs:complexType name="RestrictedItemType">
<xs:complexContent>
<xs:restriction base="ItemType">
<xs:attribute name="routingNum" type="xs:short"/>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
<xs:complexType name="DerivedType">
<xs:complexContent>
<xs:restriction base="BaseType">
<xs:attribute name="a" type="xs:positiveInteger"/>
<xs:attribute name="b" type="xs:string" default="b"/>
<xs:attribute name="c" type="xs:string" default="c2"/>
<xs:attribute name="d" type="xs:string" fixed="d"/>
<xs:attribute name="e" type="xs:string" fixed="e"/>
<xs:attribute name="f" type="xs:string" use="required"/>
<xs:attribute name="g" type="xs:string" use="prohibited"/>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
<xs:complexType name="IllegalDerivedType">
<xs:complexContent>
<xs:restriction base="BaseType2">
<xs:attribute name="h" type="xs:decimal"/>
<xs:attribute name="i" type="xs:string" fixed="i2"/>
<xs:attribute name="j" type="xs:string" default="j"/>
<xs:attribute name="k" type="xs:string"/>
<xs:attribute name="l" type="xs:string" use="prohibited"/>
<xs:attribute ref="pref:l"/>
<xs:attribute name="m" type="xs:string"/>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
<xs:complexType name="DerivedType">
<xs:complexContent>
<xs:restriction base="BaseType">
<xs:anyAttribute processContents="strict"
namespace="##targetNamespace
http://www.w3.org/1999/xhtml"/>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
<xs:complexType name="DerivedType">
<xs:complexContent>
<xs:restriction base="BaseType">
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="name" type="xs:string"/>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://datypic.com/prod"
xmlns:prod="http://datypic.com/prod"
elementFormDefault="qualified"
attributeFormDefault="qualified">
<xs:complexType name="ProductType">
<xs:sequence>
<xs:element ref="prod:number"/>
<xs:element ref="prod:name"/>
<xs:element ref="prod:size" minOccurs="0"/>
</xs:sequence>
<xs:attribute ref="prod:dept"/>
</xs:complexType>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="size" type="xs:integer"/>
<xs:attribute name="dept" type="xs:string"/>
</xs:schema>
ord.xsd
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://datypic.com/ord"
xmlns:prod="http://datypic.com/prod">
<xs:import namespace="http://datypic.com/prod"
schemaLocation="prod.xsd"/>
<xs:complexType name="RestrictedProductType">
<xs:complexContent>
<xs:restriction base="prod:ProductType">
<xs:sequence>
<xs:element ref="prod:number"/>
<xs:element ref="prod:name"/>
</xs:sequence>
<xs:attribute ref="prod:dept" use="required"/>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
</xs:schema>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://datypic.com/prod"
elementFormDefault="qualified"
attributeFormDefault="qualified">
<xs:complexType name="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="size" type="xs:integer" minOccurs="0"/>
</xs:sequence>
<xs:attribute name="dept" type="xs:string"/>
</xs:complexType>
</xs:schema>
ord.xsd
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://datypic.com/ord"
xmlns:prod="http://datypic.com/prod"
elementFormDefault="qualified"
attributeFormDefault="qualified">
<xs:import namespace="http://datypic.com/prod"
schemaLocation="prod.xsd"/>
<xs:complexType name="RestrictedProductType">
<xs:complexContent>
<xs:restriction base="prod:ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"
targetNamespace="http://datypic.com/prod"/>
<xs:element name="name" type="xs:string"
targetNamespace="http://datypic.com/prod"/>
</xs:sequence>
<xs:attribute name="dept" type="xs:string" use="required"
targetNamespace="http://datypic.com/prod"/>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
</xs:schema>
13.6 | Type substitution 341
<product xsi:type="ShirtType">
<number>557</number>
<name>Short-Sleeved Linen Blouse</name>
<color>blue</color>
</product>
<!--...-->
</items>
# #all prevents any derived types from substituting for your type
in instances.
# extension prevents any extensions of your type from
substituting for your type in instances.
# restriction prevents any restrictions of your type from
substituting for your type in instances.
1. The finalDefault attribute can contain the values list and union
which are not applicable to complex types. If these values are present, they
are ignored in this context.
13.7 | Controlling type derivation and substitution 345
<xs:complexType name="ShirtType">
<xs:complexContent>
<xs:extension base="ProductType">
<xs:choice maxOccurs="unbounded">
<xs:element name="size" type="xs:integer"/>
<xs:element name="color" type="xs:string"/>
</xs:choice>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:element name="shirt" type="ShirtType"/>
<shirt>
<number>557</number>
<name>Short-Sleeved Linen Blouse</name>
<color>blue</color>
</shirt>
<product xsi:type="ProductType">
<number>557</number>
<name>Short-Sleeved Linen Blouse</name>
</product>
Chapter 14
Assertions
14.1 Assertions
Assertions are defined on types, rather than element or attribute decla-
rations, so they are shared across all elements or attributes that have a
particular type. Example 14–1 shows two types, one simple and one
complex.
As you can see, two different elements are used: assertion is used
in simple types (and in simple content), and assert is used in complex
types. Both assertion and assert have a test attribute that specifies
an XPath expression. The XPath returns a Boolean (true/false) value.
If the expression is true, the element or attribute is valid with respect
to the assertion. If it is false, it is invalid.
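As a minimal sketch of the two forms (type and attribute names are hypothetical): assertion is a facet inside a simple type restriction, where $value holds the value being validated, while assert is a child of complexType and is evaluated with the element as the context item.

```xml
<!-- simple type: assertion facet -->
<xs:simpleType name="EvenIntegerType">
  <xs:restriction base="xs:integer">
    <xs:assertion test="$value mod 2 = 0"/>
  </xs:restriction>
</xs:simpleType>

<!-- complex type: assert comes after the attribute declarations -->
<xs:complexType name="SizeRangeType">
  <xs:attribute name="min" type="xs:integer" use="required"/>
  <xs:attribute name="max" type="xs:integer" use="required"/>
  <xs:assert test="@min le @max"/>
</xs:complexType>
```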
Assertions are specified using XPath 2.0, which is a powerful language
that includes over a hundred built-in functions and many operators.
This chapter describes some of the XPath 2.0 functions, operators, and
expression syntax that are most useful for assertions, but it is by no
means complete. For a complete explanation of all XPath operators
and syntax, you can refer to the XML Path Language (XPath) 2.0
recommendation at www.w3.org/TR/xpath20.
Syntactically, any XPath 2.0 expression is allowed in an assertion. However,
one limitation of assertions is that your XPath expression has to stay
within the scope of the type itself. It can only access attributes, content,
and descendants of the element that has that type. It cannot access the
parent or other ancestor elements, siblings, separate XML documents,
or any other nondescendant elements. This means that for cross-
element validation, the assertion needs to be specified on an ancestor
type that contains all of the elements or attributes mentioned in the
assertion.
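For example, to require that one child element's value not exceed a sibling's, the assert has to sit on the parent's type, since the type of either child cannot see the other. The names below are hypothetical.

```xml
<xs:complexType name="PriceRangeType">
  <xs:sequence>
    <xs:element name="low" type="xs:decimal"/>
    <xs:element name="high" type="xs:decimal"/>
  </xs:sequence>
  <!-- placed on the parent type: an assertion on the type of low
       could not access its sibling high -->
  <xs:assert test="low le high"/>
</xs:complexType>
```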
Parents
restriction
The assertion facet can also be used inside the restriction ele-
ment for complex types with simple content, just like any other facet.
Example 14–3 shows two complex types with simple content, one
restricting the other by adding an assertion. However, if you need
to access the attributes of that type in the assertion, you should use an
assert instead, as shown later in Example 14–17.
You can specify multiple assertions on the same simple type, in which
case they must all return true for the element or attribute to be valid.
Values of type DepartmentCodeType in Example 14–4 must be
valid with respect to both specified assertions. Assertions can be com-
bined with other facets, in any order. In fact, it is recommended that
you continue to use other facets if they can express the constraint. For
example, use the length facet as shown rather than an assertion with
a test of string-length($value) = 3.
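A sketch of a restriction that combines a length facet with two assertions, all of which must be satisfied. DeptCodeSketchType and both assertion tests are invented for illustration.

```xml
<xs:simpleType name="DeptCodeSketchType">
  <xs:restriction base="xs:token">
    <!-- a plain facet is preferred where one can express the rule -->
    <xs:length value="3"/>
    <!-- assertions cover what the facets cannot; all must be true -->
    <xs:assertion test="not(starts-with($value, '0'))"/>
    <xs:assertion test="$value != 'XXX'"/>
  </xs:restriction>
</xs:simpleType>
```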
These functions are all built in, and you do not need to use name-
space prefixes on their names. Your schema processor may support ad-
ditional implementation-defined functions that are in other namespaces.
Typically, in simple type assertions, you will be passing $value as one
of the arguments. Table 14–4 shows some example values for
simple type assertions that use common XPath functions.
Note that the matches function interprets regular expressions slightly differently from the pattern facet. The regular expression in a pattern facet must match the whole string, as if it were anchored at the beginning and the end. The matches function, on the other hand, tests whether the string contains any substring that matches the regular expression. To require a match at the start and/or end of the entire string, use the anchors ^ (start of the string) and $ (end of the string).
The examples in the table focus on assertions that cannot be expressed with other facets. For example, to simply test whether a value starts with ABC, you could use a pattern, as in <xs:pattern value="ABC.*"/>. However, an assertion is usually required to express that a value must not match a pattern or an enumeration, or to indicate that matching should be case-insensitive.
Some processors will treat these type errors like dynamic errors,
meaning that they are not reported as errors in the schema. Instead,
dynamic errors simply cause the assertion to return false, rendering the
element or attribute in question invalid. Most processors will issue
warnings in these cases, though. XPath syntax errors and other static
errors, on the other hand, will be flagged as errors in the schema by
your processor.
To correct type errors like these, first consider whether the simple types are derived from the correct primitive types. If you are performing arithmetic operations on a value, perhaps it should have a numeric type rather than a string type. For these examples, let’s assume that the primitive types were chosen correctly.
SizeType is really trying to limit the size of the integer. In this case,
it makes sense to change it to use one of the bounds facets to limit
the value of the integer, instead of trying to constrain its string
representation.
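A sketch of that change as a schema fragment, assuming a hypothetical range of 0 to 99; the actual limit intended for SizeType is not stated here.

```xml
<!-- Rather than testing the lexical form, for example with
     string-length(string($value)) <= 2, bound the integer value itself. -->
<xs:simpleType name="SizeSketchType">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="0"/>
    <xs:maxInclusive value="99"/>
  </xs:restriction>
</xs:simpleType>
```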
For DepartmentCodeType, both operands in the comparison need
to have the same type (or have types derived from each other). You
could convert $value to a numeric type, but the best approach
here is to put quotes around the 001 to make it a string. Comparing
them as strings takes into account the leading zeroes, which may be
significant in a string-based department code.
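A sketch of the quoted-literal form. The type name and the != comparison are hypothetical; only the literal '001' comes from the discussion above.

```xml
<xs:simpleType name="DeptCodeStringSketchType">
  <xs:restriction base="xs:string">
    <!-- '001' is a string literal, so this is a string comparison
         and leading zeroes are significant: '1' is not equal to '001' -->
    <xs:assertion test="$value != '001'"/>
  </xs:restriction>
</xs:simpleType>
```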
For EffectiveDateType, as with the previous example, the
operands need to be of comparable types. We could convert $value
to a string, but then it would compare the values as strings instead of
date/time values, which would mean that time zones may not be taken
into account correctly. Instead, it is preferable to convert the second
operand to a date/time type. This is done in XPath 2.0 using a type
constructor, which is a special kind of function whose name is the ap-
propriate built-in type name. It accepts a single argument, the value to
be converted. For example, xs:dateTime('2000-01-01T12:00:00')
converts the string to a date/time.
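A sketch of the constructor in an assertion; the type name and the >= comparison are hypothetical. The xs prefix in the test resolves through the namespace declarations of the schema document, which already binds it.

```xml
<xs:simpleType name="EffDateSketchType">
  <xs:restriction base="xs:dateTime">
    <!-- xs:dateTime(...) turns the literal into a date/time, so both
         operands are compared as date/time values, not as strings -->
    <xs:assertion test="$value >= xs:dateTime('2000-01-01T12:00:00')"/>
  </xs:restriction>
</xs:simpleType>
```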
Example 14–8 shows our three examples, corrected to reflect the
types of the values.
In addition to the type constructor functions, there is a string
function that converts a value to a string, and a number function that
converts a value to a floating-point number (double). Both of these
functions also take a single argument, the value to be converted.
Table 14–6 shows some additional examples of XPath tests that are
appropriate for list types.
The assertions in Table 14–6 apply to the list as a whole. If you want
to constrain every value in the list, it makes more sense to put the asser-
tion on the item type instead. Example 14–11 is a simple type
SizeType that has one assertion on the item type of the list (testing
that the value is less than 12) and one assertion on the list itself
(testing the number of items in the list).
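A sketch of the two levels of assertion, with a hypothetical type name; the limit of three items is invented for illustration.

```xml
<xs:simpleType name="SizeListSketchType">
  <xs:restriction>
    <xs:simpleType>
      <xs:list>
        <!-- item type: this assertion is checked for each item -->
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:assertion test="$value &lt; 12"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:list>
    </xs:simpleType>
    <!-- list type: $value is the whole sequence of items -->
    <xs:assertion test="count($value) &lt;= 3"/>
  </xs:restriction>
</xs:simpleType>
```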
an assertion on the number element’s simple type would not have access
to the dept attribute since it is out of scope, so the assertion must be
moved up to the product parent.
Parents: complexType, extension, restriction
For the second example in Table 14–8, you might think that you can use product[number > 500] to test that product numbers are greater than 500. However, that will return true if there is at least one product number greater than 500; it does not ensure that all of the products have a number greater than 500. Using the not function, as shown in the table, works because it tests that there are no product numbers less than or equal to 500.
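A sketch contrasting the tests, using a hypothetical ProductListSketchType in a schema with no target namespace. Because number is declared as xs:integer, the comparisons are numeric. The every expression is an XPath 2.0 alternative to not.

```xml
<xs:complexType name="ProductListSketchType">
  <xs:sequence>
    <xs:element name="product" maxOccurs="unbounded">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="number" type="xs:integer"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
  <!-- too weak: product[number > 500] is true if ANY number exceeds 500 -->
  <!-- correct: no product may have a number of 500 or less -->
  <xs:assert test="not(product[number &lt;= 500])"/>
  <!-- equivalent quantified expression -->
  <xs:assert test="every $p in product satisfies $p/number > 500"/>
</xs:complexType>
```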
You may have noticed that most of the examples in the table actually
return product elements rather than a Boolean true/false value.
The results of XPaths used in assertions are automatically converted to
a Boolean value. A sequence of one or more elements or attributes is
treated as a “true” value, and an empty sequence (no elements or
attributes) is treated as a “false” value.
element names used in the assertion XPaths are then prefixed with
prod to indicate that they are in that namespace. Otherwise, the
processor would be looking for those elements in no namespace.
for all unprefixed element names that are used in the XPath. As with
regular default namespace declarations, xpathDefaultNamespace
does not affect attribute names.
Example 14–19 uses the xpathDefaultNamespace attribute on the schema element. This means that the element names number and size in the XPaths are interpreted as being in the http://datypic.com/prod namespace. The dept attribute is not looked for in that namespace, which is appropriate since attributeFormDefault defaults to unqualified, meaning that locally declared attributes are in no namespace.
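A sketch of xpathDefaultNamespace on the schema element, following the arrangement described above; the content model and the assertion (accessories have no size) are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified"
           xpathDefaultNamespace="http://datypic.com/prod">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="size" type="xs:integer" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="dept" type="xs:string"/>
    <!-- size resolves to the prod namespace; @dept stays in no namespace -->
    <xs:assert test="if (@dept = 'ACC') then not(size) else true()"/>
  </xs:complexType>
</xs:schema>
```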
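Instead of containing a specific namespace name, the xpathDefaultNamespace attribute can contain one of three special keywords:
# ##defaultNamespace: the default namespace (declared with xmlns) that is in scope in the schema document
# ##targetNamespace: the target namespace of the schema document
# ##local: no namespace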
Parents: element
Content: annotation?, (simpleType | complexType)?
1. The first alternative indicates that if the value of the dept at-
tribute is ACC, the type assigned to the element declaration
is AccessoryType.
2. The second alternative indicates that if the value of the
dept attribute is either WMN or MEN, the type assigned is
ClothingType.
3. The third alternative has no test attribute, indicating that
ProductType is the default type if neither of the two other
alternatives apply.
The processor will run through the alternatives and choose the first
one in order whose test returns true. If none of the tests return true,
and there is a default type specified by an alternative with no test at-
tribute, as there is in Example 14–20, that alternative indicates the type.
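A minimal sketch of the three alternatives just listed, assuming AccessoryType, ClothingType, and ProductType are defined elsewhere in the schema. The WMN/MEN test is written with or rather than a sequence, a conservative choice since processors are only required to support a subset of XPath 2.0 in type alternatives.

```xml
<xs:element name="product">
  <!-- tried in document order; the first true test wins -->
  <xs:alternative test="@dept = 'ACC'" type="AccessoryType"/>
  <xs:alternative test="@dept = 'WMN' or @dept = 'MEN'"
                  type="ClothingType"/>
  <!-- no test: the default when neither test above is true -->
  <xs:alternative type="ProductType"/>
</xs:element>
```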
It is also possible to use type alternatives even though you have already declared a type in the usual way, by giving element a type attribute or a simpleType or complexType child. This is shown in Example 14–21, where the type attribute is used on element to assign the type ProductType to the element.
This is saying that ProductType is the type for product unless one
of the alternatives applies. It is similar to the previous example, but
defining it this way comes with the additional constraint that the type
The last example in the table makes use of the integer type con-
structor function to ensure that the two values are being compared as
numbers. Otherwise, they would be compared as strings, and a string
100 is considered to be less than a string 99.
This highlights an important difference between assertions and
conditional type assignment with regard to types in XPath. In assertions,
type information is used in the XPath expressions because there is only
one type to consider. In the case of conditional type assignment, the
type has not even been assigned yet, so it is impossible to determine
the types of the attributes. When num is compared to a literal integer,
as in the second-to-last example, it is automatically converted to a number. But when num and maxNum are compared to each other and neither has a type, they are compared as strings, so they need to be converted to integers to ensure that they are compared numerically.
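A sketch of such a test, with hypothetical names; OverLimitProductType is assumed to be derived from ProductType.

```xml
<xs:element name="product" type="ProductType">
  <!-- @num and @maxNum are untyped at this point, since no type has been
       assigned yet; the constructors force a numeric comparison, so
       "100" versus "99" is compared as 100 > 99, not as strings -->
  <xs:alternative test="xs:integer(@num) > xs:integer(@maxNum)"
                  type="OverLimitProductType"/>
</xs:element>
```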
The error type does not have to be used only in a final alternative with no test. It can also be used with a test, in an earlier alternative, as shown in Example 14–24. This example will raise an error if the product does not have a dept attribute.
The interesting thing about this example is that although the first
title element does not have a language attribute in its start tag in
the instance, in the XPath expressions it is treated as if it does,
because the attribute is inherited. The instance in Example 14–26 is
valid according to this schema.
Chapter 15
Named groups
Parents: schema, redefine, override (1.1)
that the group has one child, a sequence, which has no occurrence
constraints on it.
In Example 15–1, the element declarations are local in the group,
as evidenced by the appearance of a name attribute instead of a ref
attribute. It is also possible to use global element declarations, and
then reference them from the named model group, as shown in
Example 15–2.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="description" type="xs:string"/>
<xs:element name="comment" type="xs:string"/>
<xs:group name="DescriptionGroup">
<xs:sequence>
<xs:element ref="description"/>
<xs:element ref="comment" minOccurs="0"/>
</xs:sequence>
</xs:group>
</xs:schema>
Note that the type attribute is now in the global element declaration, while minOccurs stays in the reference to the element declaration. This is the same syntax as that used in complex types to reference global element declarations. In fact, when a complex type references a named model group, it is as if the schema author cut and pasted the contents of the group element into the complex type definition. All local element declarations in the group become local to that complex type.
Whether to use local element declarations in the group depends on
whether you want these element declarations to be local to the complex
type. For a complete discussion of global versus local element declara-
tions, see Section 6.1.3 on p. 95.
Parents: complexType, restriction, extension, sequence, choice, all (1.1)
Example 15–5. Group reference at the top level of the content model
<xs:complexType name="DescriptionType">
<xs:group ref="DescriptionGroup"/>
<xs:attribute ref="xml:lang"/>
</xs:complexType>
In version 1.0, since all groups can only appear at the top level of
a complex type, the only way to reference a named model group that
contains an all group is at the top level, as shown in Example 15–5.
Version 1.1 has relaxed this constraint, and it is possible to reference
a named model group that contains all from another all group,
provided that minOccurs and maxOccurs are 1 on the group reference.
However, it is still not legal to reference such a group from within a
choice or sequence.
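A minimal sketch of the 1.1 form, with hypothetical names (DescriptionAllGroup, ItemSketchType, and number). The group reference keeps the default minOccurs and maxOccurs of 1.

```xml
<xs:group name="DescriptionAllGroup">
  <xs:all>
    <xs:element name="description" type="xs:string"/>
    <xs:element name="comment" type="xs:string" minOccurs="0"/>
  </xs:all>
</xs:group>

<xs:complexType name="ItemSketchType">
  <xs:all>
    <!-- allowed in 1.1: an all group referenced from another all group -->
    <xs:group ref="DescriptionAllGroup"/>
    <xs:element name="number" type="xs:integer"/>
  </xs:all>
</xs:complexType>
```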
<xs:group name="DescriptionGroup">
<xs:sequence>
<xs:element name="description" type="xs:string"/>
<xs:element name="comment" type="xs:string" minOccurs="0"/>
</xs:sequence>
</xs:group>
</xs:schema>
Parents: schema, redefine, override (1.1)
<xs:attributeGroup name="IdentifierGroup">
<xs:attribute ref="id" use="required"/>
<xs:attribute ref="version"/>
</xs:attributeGroup>
</xs:schema>
Note that the type attribute is now in the global attribute declara-
tion, while the use attribute stays in the reference to the attribute
declaration. This is the same way that complex types reference global
attribute declarations. In fact, when a complex type references an at-
tribute group, it is as if the schema author cut and pasted the contents
of the attribute group definition into the complex type definition. All
attributes that are declared locally in the attribute group become local
to that complex type.
Whether to declare attributes locally in the attribute group depends
on whether you want the attributes to be local to the complex type.
For a complete discussion of global versus local attribute declarations,
see Section 7.2.3 on p. 119.
Attribute groups may reference other attribute groups, as described
in the next section. Attribute groups may also contain one attribute
wildcard at the very end, as shown in Example 15–10. Attribute groups
are limited to one attribute wildcard because a complex type cannot
contain more than one attribute wildcard. See Section 12.7.3 on p. 298
for more information.
Parents: complexType, restriction, extension, attributeGroup
<xs:complexType name="ProductType">
<xs:attributeGroup ref="IdentifierGroup"/>
<xs:attributeGroup ref="VersionGroup"/>
</xs:complexType>
</xs:schema>
<xs:attributeGroup name="IdentifierGroup">
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="version" type="xs:decimal"/>
</xs:attributeGroup>
</xs:schema>
<xs:attributeGroup name="IdentifierGroup">
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="version" type="xs:decimal"/>
</xs:attributeGroup>
<xs:complexType name="ProductType">
<xs:sequence>
<!--...-->
</xs:sequence>
<xs:attribute name="dept" type="xs:string"/>
</xs:complexType>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified"
xmlns="http://datypic.com/ord"
xmlns:prod="http://datypic.com/prod"
targetNamespace="http://datypic.com/ord">
<xs:import namespace="http://datypic.com/prod"
schemaLocation="prod.xsd"/>
<xs:complexType name="PurchaseOrderType">
<xs:sequence>
<xs:group ref="prod:DescriptionGroup" minOccurs="0"/>
<xs:element ref="items"/>
<!--...-->
</xs:sequence>
<xs:attributeGroup ref="prod:IdentifierGroup"/>
</xs:complexType>
</xs:schema>
prod.xsd
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified"
xmlns="http://datypic.com/prod"
targetNamespace="http://datypic.com/prod">
<xs:group name="DescriptionGroup">
<xs:sequence>
<xs:element name="description" type="xs:string"/>
<xs:element name="comment" type="xs:string" minOccurs="0"/>
</xs:sequence>
</xs:group>
<xs:attributeGroup name="IdentifierGroup">
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="version" type="xs:decimal"/>
<xs:anyAttribute namespace="##other"/>
</xs:attributeGroup>
</xs:schema>
The locally declared elements have qualified names; that is, they are in a namespace, because elementFormDefault is set to qualified.
Note that the names of those elements declared in prod.xsd
# The fragment you want to reuse does not appear first in some
of the types’ content models. This is because extension adds a
derived type’s content model after its base type’s content model
as if they were in a sequence group. In the above example, if
the descriptive information did not come first, it would have
been impossible to use extension.
<xs:complexType name="PurchaseOrderType">
<xs:complexContent>
<xs:extension base="DescribedType">
<xs:sequence>
<xs:element ref="items"/>
<!--...-->
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:complexType name="ItemsType">
<xs:complexContent>
<xs:extension base="DescribedType">
<xs:sequence>
<xs:element ref="product" maxOccurs="unbounded"/>
<!--...-->
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
Chapter 16
Substitution groups
names that indicate the kind of product. Lastly, you may want the
definition to be flexible enough to accept new kinds of products without
altering the original schema. This is a perfect application for substitution
groups.
<!--...-->
</xs:schema>
<xs:group name="ProductGroup">
<xs:choice>
<xs:element name="product" type="ProductType"/>
<xs:element name="shirt" type="ShirtType"/>
<xs:element name="hat" type="HatType"/>
<xs:element name="umbrella" type="ProductType"/>
</xs:choice>
</xs:group>
<!--...-->
</xs:schema>
<xs:complexType name="HatType">
<xs:complexContent>
<xs:extension base="ProductType">
<xs:sequence>
<xs:element name="size" type="HatSizeType"/>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:complexType name="UmbrellaType">
<xs:complexContent>
<xs:extension base="ProductType"/>
</xs:complexContent>
</xs:complexType>
<!--...-->
</xs:schema>
Example 16–8 shows a valid instance for this approach. The product
element is repeated many times, with the xsi:type attribute
distinguishing between the different product types.
The advantage of this approach is that the instance may be easier to
process. A Java program or XSLT stylesheet that handles this instance
can treat all product types the same based on their element name, but
also distinguish between them using the value of xsi:type if necessary.
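A sketch of what such an instance might look like, assuming no target namespace, a ProductType with a single number child, and a HatSizeType that accepts the value 10. These are all assumptions, and Example 16–8 may differ.

```xml
<items xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- no xsi:type: validated against the declared type, ProductType -->
  <product>
    <number>999</number>
  </product>
  <!-- same element name; xsi:type picks a type derived from ProductType -->
  <product xsi:type="HatType">
    <number>557</number>
    <size>10</size>
  </product>
  <product xsi:type="UmbrellaType">
    <number>443</number>
  </product>
</items>
```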
Example 16–9 shows four element declarations that control the use
of substitution groups. With this declaration of product, the schema
shown in Example 16–2 would have been illegal, since it attempts to
use the product element declaration as the head of a substitution
group.
Chapter 17
Identity constraints
# They are recommended for use only for attributes, not elements.
# They are scoped to the entire document only.
# They are based on one value, as opposed to multifield keys.
# They require ID or IDREF to be the type of the attribute,
precluding data validation of that attribute.
# They are based on string equality, as opposed to value equality.
# They require that the values be based on XML names, meaning
they must start with a letter and can only contain letters, digits,
and a few punctuation marks.
Parents: element
It is valid for two products to have the same number, as long as they
have different effective dates. In other words, we want to validate that
the combinations of number and effDate are unique. Example 17–4
shows the uniqueness constraint that accomplishes this.
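A sketch of a two-field uniqueness constraint along those lines. The constraint name is hypothetical, and effDate is assumed to be an attribute of product.

```xml
<xs:element name="catalog" type="CatalogType">
  <xs:unique name="numberAndDateUnique">
    <!-- one selected node per product grandchild of catalog -->
    <xs:selector xpath="*/product"/>
    <!-- it is the combination of both fields that must be unique -->
    <xs:field xpath="number"/>
    <xs:field xpath="@effDate"/>
  </xs:unique>
</xs:element>
```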
Note that this example works because both number and effDate
are subordinate to the product elements. Using the instance in Exam-
ple 17–3, it would be invalid to define a multifield uniqueness con-
straint on the department number and the product number. If you
defined the selector to select all departments, the product/number
field would yield more than one field node per selected node, which is
not permitted. If you defined the selector to select all products, you
would have to access an ancestor node to get the department number,
which is not permitted.
You can get around this by defining two uniqueness constraints: one
in the scope of catalog to ensure that all department numbers are
unique within a catalog, and another in the scope of department
to ensure that all product numbers are unique within a department.
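A sketch of that two-constraint approach, with hypothetical constraint names; the department number is assumed to be an attribute, and ProductType is assumed to have a number child.

```xml
<xs:element name="catalog">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="department" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="product" type="ProductType"
                        maxOccurs="unbounded"/>
          </xs:sequence>
          <xs:attribute name="number" type="xs:integer"/>
        </xs:complexType>
        <!-- product numbers: unique within each department -->
        <xs:unique name="prodNumInDept">
          <xs:selector xpath="product"/>
          <xs:field xpath="number"/>
        </xs:unique>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  <!-- department numbers: unique within the catalog -->
  <xs:unique name="deptNumInCatalog">
    <xs:selector xpath="department"/>
    <xs:field xpath="@number"/>
  </xs:unique>
</xs:element>
```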
Parents: element
Parents: element
Example 17–7 shows the definition of a key reference and its associ-
ated key. In this example, the number attribute of any child of items
must match a number child of a product element. The meaning of
the XPath syntax will be described in detail later in this chapter.
Note that the key reference field values are not required to be unique;
that is not their purpose. It is valid to have duplicate shirt numbers in
the items section.
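A sketch of a key and a key reference along the lines just described. The constraint names and OrderType are hypothetical. The schema is assumed to have no target namespace, so refer can use an unprefixed name, and number and @number are assumed to have the same type.

```xml
<xs:element name="order" type="OrderType">
  <!-- every product must have a number, and numbers must be unique -->
  <xs:key name="prodNumKey">
    <xs:selector xpath=".//product"/>
    <xs:field xpath="number"/>
  </xs:key>
  <!-- each child of items must point at an existing product number;
       duplicate referring values are allowed -->
  <xs:keyref name="prodNumKeyRef" refer="prodNumKey">
    <xs:selector xpath="items/*"/>
    <xs:field xpath="@number"/>
  </xs:keyref>
</xs:element>
```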
As with key and uniqueness constraints, key references can be on multiple fields. The key reference must have the same number of fields as the key or uniqueness constraint that it references. The fields are matched in the same order, and they must have related types.
17.7.1 Selectors
The purpose of a selector is to identify the set of nodes to which the
constraint applies. The selector is relative to the scoping element. In
Example 17–2, our selector was */product. This selects all the
product grandchildren of catalog. There may be other grandchildren
of catalog, or other product elements elsewhere in the document,
but the constraint does not apply to them.
The selector is represented by a selector element, whose syntax is
shown in Table 17–4.
Parents: unique, key, keyref
Content: annotation?
17.7.2 Fields
Each field must identify at most one node relative to each node selected by the selector (for a key, exactly one). The key reference in Example 17–7 works because there
can only ever be one number attribute per selected node. In the instance
in Example 17–6, the selector selects three nodes (the three children
of items), and there is only one number attribute per node.
You might have been tempted to define a uniqueness constraint
as shown in Example 17–8. This would not work because the
selector would select one node (the single department element) and
there would be three product/number nodes relative to it.
The elements or attributes that are used as fields must have simple
content and cannot be declared nillable.
Fields are represented by field elements, whose syntax is shown in
Table 17–5.
Parents: unique, key, keyref
Content: annotation?
Parents: element
instead of a name. Note that the two element declarations specify the
same type; this is not a requirement, but it is common since most
identity constraints would only be shared among elements that contain
a similar structure.
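A sketch of the 1.1 ref syntax with hypothetical names. The second declaration reuses the constraint defined on the first, so it does not (and cannot) supply its own selector or fields.

```xml
<xs:element name="catalog" type="CatalogType">
  <xs:unique name="prodNumUnique">
    <xs:selector xpath="*/product"/>
    <xs:field xpath="number"/>
  </xs:unique>
</xs:element>

<xs:element name="archivedCatalog" type="CatalogType">
  <!-- same kind of constraint (unique) referencing the one above -->
  <xs:unique ref="prodNumUnique"/>
</xs:element>
```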
<!--...-->
</xs:schema>
<xs:complexType name="RestrictedCatalogListType">
<xs:complexContent>
<xs:restriction base="CatalogListType">
<xs:sequence>
<xs:element name="catalog" type="CatalogType"
maxOccurs="1">
<xs:unique ref="dateAndProdNumKey"/>
</xs:element>
</xs:sequence>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
<!--...-->
</xs:schema>
Chapter 18
Redefining and overriding schema components
18.1 Redefinition
Redefinition is a way to extend and modify schemas over time while
still reusing the original definitions. It involves defining a new version
of a schema component, with the same name, that replaces the original
definition throughout the schema. This is useful for extending and/or
creating a subset of an existing schema.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:simpleType name="DressSizeType">
<xs:restriction base="xs:integer"/>
</xs:simpleType>
prod2.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://datypic.com/prod"
targetNamespace="http://datypic.com/prod">
<xs:redefine schemaLocation="prod1.xsd">
<xs:simpleType name="DressSizeType">
<xs:restriction base="DressSizeType">
<xs:minInclusive value="2"/>
<xs:maxInclusive value="16"/>
</xs:restriction>
</xs:simpleType>
</xs:redefine>
Parents: schema
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:simpleType name="DressSizeType">
<xs:restriction base="xs:integer">
<xs:minInclusive value="0"/>
<xs:maxInclusive value="18"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="size" type="DressSizeType"/>
</xs:schema>
prod2.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:redefine schemaLocation="prod1.xsd">
<xs:simpleType name="DressSizeType">
<xs:restriction base="DressSizeType">
<xs:minInclusive value="2"/>
</xs:restriction>
</xs:simpleType>
</xs:redefine>
<xs:element name="newSize" type="DressSizeType"/>
</xs:schema>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType name="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="size" type="xs:integer"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
prod2.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:redefine schemaLocation="prod1.xsd">
<xs:complexType name="ProductType">
<xs:complexContent>
<xs:extension base="ProductType">
<xs:sequence>
<xs:element name="color" type="xs:string"/>
</xs:sequence>
<xs:attribute name="effDate" type="xs:date"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
</xs:redefine>
</xs:schema>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:group name="DescriptionGroup">
<xs:sequence>
<xs:element name="description" type="xs:string"/>
<xs:element name="comment" type="xs:string" minOccurs="0"/>
</xs:sequence>
</xs:group>
</xs:schema>
prod2.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:redefine schemaLocation="prod1.xsd">
<xs:group name="DescriptionGroup">
<xs:sequence>
<xs:element name="description" type="xs:string"/>
</xs:sequence>
</xs:group>
</xs:redefine>
</xs:schema>
Our example is legal because the comment elements are optional per
the original definition. The exact definition of a legal subset is the same
as that used for complex type restriction. In other words, if a content
model is considered a legal restriction of another content model
(in complex type derivation), it is also a legal subset in the redefinition
of a named model group. See Section 13.5 on p. 316 for the rules of
complex type restriction.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:group name="DescriptionGroup">
<xs:sequence>
<xs:element name="description" type="xs:string"/>
<xs:element name="comment" type="xs:string" minOccurs="0"/>
</xs:sequence>
</xs:group>
</xs:schema>
prod2.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:redefine schemaLocation="prod1.xsd">
<xs:group name="DescriptionGroup">
<xs:sequence>
<xs:group ref="DescriptionGroup"/>
<xs:element name="notes" type="xs:string"/>
</xs:sequence>
</xs:group>
</xs:redefine>
</xs:schema>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:import namespace="http://www.w3.org/XML/1998/namespace"/>
<xs:attributeGroup name="IdentifierGroup">
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="version" type="xs:decimal"/>
<xs:attribute ref="xml:lang"/>
</xs:attributeGroup>
</xs:schema>
prod2.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:redefine schemaLocation="prod1.xsd">
<xs:attributeGroup name="IdentifierGroup">
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="version" type="xs:integer"/>
</xs:attributeGroup>
</xs:redefine>
</xs:schema>
The rules used to define a subset of an attribute group are the same
as those used for attribute restriction in complex type derivation. This
means that you can eliminate optional attributes, make attributes re-
quired, add a fixed value, change default values, or change types to be
more restrictive. Eliminating the xml:lang attribute in Example 18–6
is legal because it is optional (by default) in the original attribute group.
Changing the type of version is legal because integer is a restric-
tion of decimal. See Section 13.5.5 on p. 333 for more information
on attribute restrictions.
Unlike complex type derivation, however, you must redeclare all at-
tributes you want to appear in the new definition. The attribute decla-
rations will not automatically be copied from the original definition to
the new definition.
If the original definition contains an attribute wildcard, you
may repeat or further restrict the wildcard. Subsetting of attribute
wildcards also follows the rules used in complex type derivation. See
Section 13.5.6 on p. 335 for more information on attribute wildcard
restrictions.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:import namespace="http://www.w3.org/XML/1998/namespace"/>
<xs:attributeGroup name="IdentifierGroup">
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="version" type="xs:decimal"/>
<xs:attribute ref="xml:lang"/>
</xs:attributeGroup>
</xs:schema>
prod2.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:redefine schemaLocation="prod1.xsd">
<xs:attributeGroup name="IdentifierGroup">
<xs:attributeGroup ref="IdentifierGroup"/>
<xs:attribute name="effDate" type="xs:date"/>
</xs:attributeGroup>
</xs:redefine>
</xs:schema>
18.2 Overrides
The override feature is a convenient way to customize schemas. It involves defining a new version of a schema component, with the same name, that replaces the original definition throughout the schema. This is useful when you want to reuse a schema but make some modifications (minor or major) to its components without editing the original schema documents.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:simpleType name="DressSizeType">
<xs:restriction base="xs:integer"/>
</xs:simpleType>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://datypic.com/prod"
targetNamespace="http://datypic.com/prod">
<xs:override schemaLocation="prod1.xsd">
<xs:simpleType name="DressSizeType">
<xs:restriction base="xs:integer">
<xs:minInclusive value="2"/>
<xs:maxInclusive value="16"/>
</xs:restriction>
</xs:simpleType>
</xs:override>
Parents: schema
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:simpleType name="DressSizeType">
<xs:restriction base="xs:integer">
<xs:minInclusive value="0"/>
<xs:maxInclusive value="18"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="size" type="DressSizeType"/>
</xs:schema>
prod2.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:override schemaLocation="prod1.xsd">
<xs:simpleType name="DressSizeType">
<xs:restriction base="xs:integer">
<xs:minInclusive value="2"/>
<xs:maxInclusive value="18"/>
</xs:restriction>
</xs:simpleType>
</xs:override>
<xs:element name="newSize" type="DressSizeType"/>
</xs:schema>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType name="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="size" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
prod2.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:override schemaLocation="prod1.xsd">
<xs:complexType name="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:string"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="color" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:override>
</xs:schema>
size, and add color. As with simple types, the overriding definition
can be similar to the overridden definition, as it is in this case, but
it can also be completely different.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="description" type="xs:string"/>
<xs:attribute name="version" type="xs:decimal"/>
</xs:schema>
prod2.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:override schemaLocation="prod1.xsd">
<xs:element name="description" type="DescriptionType"/>
<xs:attribute name="version" type="xs:string" default="1.0"/>
</xs:override>
<xs:complexType name="DescriptionType">
<xs:sequence>
<xs:element name="source" type="xs:string"/>
<xs:element name="content" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
prod1.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:group name="DescriptionGroup">
<xs:sequence>
<xs:element name="description" type="xs:string"/>
<xs:element name="comment" type="xs:string" minOccurs="0"/>
</xs:sequence>
</xs:group>
<xs:attributeGroup name="IdentifierGroup">
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="version" type="xs:decimal"/>
</xs:attributeGroup>
</xs:schema>
prod2.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:override schemaLocation="prod1.xsd">
<xs:group name="DescriptionGroup">
<xs:sequence>
<xs:element name="description" type="xs:string"/>
</xs:sequence>
</xs:group>
<xs:attributeGroup name="IdentifierGroup">
<xs:attribute name="effDate" type="xs:date"/>
<xs:attribute name="id" type="xs:ID"/>
</xs:attributeGroup>
</xs:override>
</xs:schema>
prod1.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType name="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer" minOccurs="0"/>
<xs:element name="name" type="xs:string" minOccurs="0"/>
<xs:element name="size" type="xs:string" minOccurs="0"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
18.3 | Risks of redefines and overrides 469
prod2.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:override schemaLocation="prod1.xsd">
<xs:complexType name="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="color" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:override>
</xs:schema>
474 Chapter 19 | Topics for DTD users
Schema:
<xs:element name="price">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:decimal">
<xs:attribute name="currency" type="xs:NMTOKEN"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
Schema:
<xs:element name="product">
<xs:complexType>
<xs:sequence>
<xs:element ref="number"/>
<xs:element ref="name" maxOccurs="unbounded"/>
<xs:element ref="size" minOccurs="0"/>
<xs:element ref="color" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
Table 19–2 shows the mapping between DTD groups and XML
Schema model groups.
DTD group        XML Schema model group
(a,b,c)          sequence
(a|b|c)          choice
no equivalent    all

DTD indicator    minOccurs    maxOccurs
(none)           1            1
*                0            unbounded
+                1            unbounded
?                0            1
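The occurrence rows can be applied mechanically when converting a DTD content model. The following Python sketch is purely illustrative (element_ref is a hypothetical helper, not part of any conversion tool): it turns a DTD particle such as color* into an xs:element reference and omits minOccurs or maxOccurs whenever the value is the default of 1.

```python
# Hypothetical helper: maps DTD occurrence indicators to XML Schema
# (minOccurs, maxOccurs) pairs, following the occurrence table.
OCCURRENCE = {
    "": ("1", "1"),           # no indicator: exactly once
    "?": ("0", "1"),          # optional
    "*": ("0", "unbounded"),  # zero or more
    "+": ("1", "unbounded"),  # one or more
}


def element_ref(particle: str) -> str:
    """Convert a DTD particle such as 'color*' into an xs:element reference."""
    if particle and particle[-1] in "?*+":
        name, indicator = particle[:-1], particle[-1]
    else:
        name, indicator = particle, ""
    min_occurs, max_occurs = OCCURRENCE[indicator]
    attrs = ""
    # Both attributes default to 1, so they are written only when they differ.
    if min_occurs != "1":
        attrs += f' minOccurs="{min_occurs}"'
    if max_occurs != "1":
        attrs += f' maxOccurs="{max_occurs}"'
    return f'<xs:element ref="{name}"{attrs}/>'


# The product content model (number, name+, size?, color*):
for p in ["number", "name+", "size?", "color*"]:
    print(element_ref(p))
```

The four printed lines correspond to the element references in the product declaration shown earlier.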
Schema:
<xs:element name="el">
<xs:complexType>
<xs:sequence>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="a"/>
<xs:element ref="b"/>
</xs:choice>
<xs:choice minOccurs="0" maxOccurs="1">
<xs:element ref="c"/>
<xs:element ref="d"/>
</xs:choice>
</xs:sequence>
</xs:complexType>
</xs:element>
1. Technically, in DTDs mixed content also refers to element types with just
#PCDATA content, but this case is covered in Sections 19.1.1 on p. 474 and
19.1.2 on p. 475.
19.1 | Element declarations 479
Schema:
<xs:element name="letter">
<xs:complexType mixed="true">
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="custName"/>
<xs:element ref="prodName"/>
</xs:choice>
</xs:complexType>
</xs:element>
Schema:
<xs:element name="color">
<xs:complexType>
<!-- no content model is specified here -->
<xs:attribute name="value" type="xs:NMTOKEN"/>
</xs:complexType>
</xs:element>
Schema:
<xs:element name="anything">
<xs:complexType mixed="true">
<xs:sequence>
<xs:any minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
The built-in type token is used as the base type for the restriction,
which will result in whitespace handling identical to that of enumerated
attribute types in DTDs.
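As a rough illustration of that normalization, here is a Python sketch (the enumerated size values are hypothetical). The xs:token type collapses whitespace: each tab, carriage return, or line feed becomes a space, runs of spaces shrink to one, and leading and trailing spaces are removed, which is also how a DTD normalizes enumerated attribute values. Enumeration facets are then checked against the normalized value.

```python
import re

# XML whitespace is limited to space, tab, carriage return, and line feed.
_XML_WHITESPACE = re.compile(r"[ \t\r\n]+")


def collapse(value: str) -> str:
    """Apply whiteSpace="collapse", the normalization used by xs:token."""
    return _XML_WHITESPACE.sub(" ", value).strip(" ")


# Hypothetical enumeration standing in for the facets of a restriction of xs:token.
ALLOWED = {"small", "medium", "large"}


def is_valid(value: str) -> bool:
    # Facets apply to the value after whitespace normalization.
    return collapse(value) in ALLOWED


print(is_valid("  medium\t\n"))  # surrounding whitespace is ignored
```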
DTD default        XML Schema attribute setting
#REQUIRED          use="required"
#IMPLIED           use="optional"
#FIXED "x"         fixed="x"
"x"                default="x"
<!ATTLIST product
id ID #REQUIRED
name CDATA #IMPLIED
type NMTOKEN "PR"
version NMTOKEN #FIXED "A123">
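The DTD defaults translate directly: #REQUIRED becomes use="required", #IMPLIED corresponds to the default use="optional", #FIXED becomes fixed, and a plain value becomes default. In this illustrative Python sketch, attribute_decl and its type map are hypothetical helpers that cover only a few DTD attribute types; each call produces one xs:attribute declaration for the product ATTLIST above.

```python
# Hypothetical mapping of a few DTD attribute types to XML Schema built-in types.
DTD_TYPES = {
    "CDATA": "xs:string",
    "ID": "xs:ID",
    "IDREF": "xs:IDREF",
    "NMTOKEN": "xs:NMTOKEN",
    "NMTOKENS": "xs:NMTOKENS",
}


def attribute_decl(name: str, dtd_type: str, default_spec: str) -> str:
    """Build an xs:attribute declaration from one ATTLIST attribute definition."""
    decl = f'<xs:attribute name="{name}" type="{DTD_TYPES[dtd_type]}"'
    if default_spec == "#REQUIRED":
        decl += ' use="required"'
    elif default_spec == "#IMPLIED":
        pass  # use="optional" is the default, so nothing is added
    elif default_spec.startswith("#FIXED"):
        value = default_spec[len("#FIXED"):].strip().strip('"')
        decl += f' fixed="{value}"'
    else:
        value = default_spec.strip('"')
        decl += f' default="{value}"'
    return decl + "/>"


# The product ATTLIST, one (name, type, default) triple per attribute.
for args in [("id", "ID", "#REQUIRED"), ("name", "CDATA", "#IMPLIED"),
             ("type", "NMTOKEN", '"PR"'), ("version", "NMTOKEN", '#FIXED "A123"')]:
    print(attribute_decl(*args))
```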
<!ELEMENT x %AOrB;>
<!ELEMENT y %AOrB;>
Schema:
<xs:complexType name="AOrBType">
<xs:choice>
<xs:element ref="a"/>
<xs:element ref="b"/>
</xs:choice>
</xs:complexType>
that is used as part of the entire content model in the x element
declaration. See Section 15.2 on p. 386 for more information on named
model groups.
Schema:
<xs:group name="AOrBGroup">
<xs:choice>
<xs:element ref="a"/>
<xs:element ref="b"/>
</xs:choice>
</xs:group>
<xs:element name="x">
<xs:complexType>
<xs:sequence>
<xs:group ref="AOrBGroup"/>
<xs:element ref="c"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<!ATTLIST x %HeaderGroup;>
Schema:
<xs:attributeGroup name="HeaderGroup">
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="variety" type="xs:NMTOKEN"/>
</xs:attributeGroup>
<xs:element name="x">
<xs:complexType>
<xs:attributeGroup ref="HeaderGroup"/>
</xs:complexType>
</xs:element>
Schema:
<xs:group name="ext">
<xs:sequence/>
</xs:group>
<xs:element name="x">
<xs:complexType>
<xs:sequence>
<xs:element ref="a"/>
<xs:element ref="b"/>
<xs:group ref="ext"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:redefine schemaLocation="original.xsd">
<xs:group name="ext">
<xs:sequence>
<xs:group ref="ext"/>
<xs:element ref="c"/>
<xs:element ref="d"/>
</xs:sequence>
</xs:group>
</xs:redefine>
</xs:schema>
Schema:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:override schemaLocation="original.xsd">
<xs:group name="ext">
<xs:sequence>
<xs:element ref="c"/>
<xs:element ref="d"/>
</xs:sequence>
</xs:group>
</xs:override>
</xs:schema>
19.4 | Parameter entities for extensibility 489
Schema:
<xs:element name="x">
<xs:complexType>
<xs:choice maxOccurs="unbounded">
<xs:element ref="a"/>
<xs:element ref="b"/>
<xs:element ref="ext"/>
</xs:choice>
</xs:complexType>
</xs:element>
<xs:element name="ext" abstract="true" type="xs:string"/>
Schema:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:include schemaLocation="original.xsd"/>
<xs:element name="c" substitutionGroup="ext"/>
<xs:element name="d" substitutionGroup="ext"/>
</xs:schema>
Schema:
<xs:attributeGroup name="attExt"/>
<xs:element name="x">
<xs:complexType>
<!-- content model here -->
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attributeGroup ref="attExt"/>
</xs:complexType>
</xs:element>
Schema:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:redefine schemaLocation="original.xsd">
<xs:attributeGroup name="attExt">
<xs:attributeGroup ref="attExt"/>
<xs:attribute name="myAttr" type="xs:NMTOKEN"/>
</xs:attributeGroup>
</xs:redefine>
</xs:schema>
Schema:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:override schemaLocation="original.xsd">
<xs:attributeGroup name="attExt">
<xs:attribute name="myAttr" type="xs:NMTOKEN"/>
</xs:attributeGroup>
</xs:override>
</xs:schema>
Schema:
<xs:include schemaLocation="prod.xsd"/>
19.7 Notations
Notations are used to indicate the format of non-XML data. For example,
notations can be declared to indicate whether certain binary
graphics data embedded in a picture element is in JPEG or GIF format.
Notations may describe data embedded in an XML instance, or
data in external files that are linked to the instance through unparsed
entities.
A notation may have a system or public identifier. There are no
standard notation names or identifiers for well-known formats such as
JPEG. Sometimes the identifier points to an application that can be
used to process the format, for example viewer.exe, and other times
it points to documentation about that format. Sometimes it is simply
an abbreviation that can be interpreted by an application. Schema
Parents: schema, override (1.1)
<xs:simpleType name="PictureNotationType">
<xs:restriction base="xs:NOTATION">
<xs:enumeration value="jpeg"/>
<xs:enumeration value="gif"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="picture">
<xs:complexType>
<xs:attribute name="location" type="xs:ENTITY"/>
</xs:complexType>
</xs:element>
<!--...-->
<catalog>
<product>
<number>557</number>
<picture location="prod557"/>
</product>
<product>
<number>563</number>
<picture location="prod563"/>
</product>
</catalog>
19.8 Comments
DTDs often use comments to further explain the declarations they
contain. Schema documents, as XML, can also contain comments.
However, XML Schema also offers an annotation facility that is designed
to provide more structured, usable documentation of schema
components. Example 19–26 shows a DTD fragment that has a comment
describing a section (CUSTOMER INFORMATION) and two element
declarations with element-specific comments appearing before each one.
The corresponding schema places each of these comments within
an annotation element. The first annotation element, which describes
the section, appears as a direct child of the schema. The element-specific
annotations, on the other hand, are defined entirely within
the element declarations to which they apply. In all three cases,
documentation elements are used, which are designed for human-readable
information. The schema is considerably more verbose than
Schema:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:doc="http://datypic.com/doc">
<xs:annotation>
<xs:documentation>
<doc:section>CUSTOMER INFORMATION</doc:section>
</xs:documentation>
</xs:annotation>
<catalog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="prod.xsd">
<product>
<number>557</number>
<picture location="prod557"/>
</product>
<product>
<number>563</number>
<picture location="prod563"/>
</product>
</catalog>
Two separate validations can take place: one against the DTD and
one against the schema. The DTD validity will be assessed first. This
process will not only validate the instance, but also augment it by resolving
the entities, filling in attributes’ default values, and normalizing
whitespace in attribute values. Validity according to the schema is then
assessed on the augmented instance. None of the declarations in the
DTD override the declarations in the schema. If there are declarations
for the same element in both the DTD and the schema and these
declarations are conflicting, an element may be DTD-valid but not
schema-valid.
502 Chapter 20 | XML information modeling
You may continue to use these modeling paradigms along with your
XML application. For example, you may be parsing XML and storing
it in a relational database (this is sometimes known as “shredding”), in
which case you still have a relational model for your data. You may be
processing your XML documents with object-oriented code, so there
still needs to be a correspondence between the XML and the object
model.
Some schema designers choose to maintain these models, such as
UML models, entity-relationship diagrams, and/or supplementary
documentation, alongside the XML Schema. Others rely more heavily
on the XML Schema to represent the entire model. This is convenient
in that there is a one-to-one mapping to the actual XML documents
that are in use. However, it does have some drawbacks in that XML
Schema cannot express every constraint on the data and is somewhat
technology-specific.
Some developers maintain a connection between the models using
toolkits that generate program code or even databases. It is particularly
common to use data binding toolkits to generate object-oriented
classes from schemas. As appropriate, this chapter describes some of
the considerations for designing XML documents to optimize the use
of these toolkits.
20.2 | Relational models 503
where possible. For example, if your corporate data model says that an
Address entity has the properties line1, line2, city, state, and
zip, it makes sense to use the same definitions and names (or the
relevant subset of them) for the elements in your XML messages.
On the other hand, it is best to avoid tightly coupling your XML
messages with any one relational database schema. You might use the
same names and definitions if they are well-designed, but should not,
for example, generate your XML schemas from relational databases or
have your application automatically insert the contents of XML elements
into relational columns of the same name. This would create
too close a relationship between the XML message and the database,
where the message schema would have to change if the database changes.
that in XML (leaving aside the relationships for now) might be as shown
in Example 20–1.
20.2.2 Relationships
An entity-relationship model allows entities to be independent of each
other and have relationships to various other entities. Sometimes these
relationships map naturally onto a hierarchical XML model, especially
in the case of XML messages that represent a temporary view on the
data. Sometimes it is more of a challenge to represent relationships
in XML.
In your schema, you can use either ID- and IDREF-typed attributes
or identity constraints to validate the relationship. Identity constraints,
described fully in Chapter 17, use the key and keyref elements. This
is shown in Example 20–4, where the key element defines the unique
identifier of each product. Within it, selector identifies the element
that needs to be unique (the product), and field specifies the
element that contains the unique identifier (the number child).
The keyref element is used to establish the foreign key relationship
from the productRef’s ref attribute to the product element’s number
child. It uses a syntax similar to the key element, except that it also
includes a refer attribute that indicates the key to which it refers.
Compared to the first approach, this type of structure can be harder
to process, either in XPath or in program code generated by data
binding tools. Although the relationship can be expressed and validated
using a schema, defining it via the schema identity constraints will not
have any particular representation or meaning in generated class definitions.
For example, for a generated Order class, it will not generate a
getProduct method that will go out and get a related Product object,
whereas with the first approach you can simply use a getProduct
method. However, this approach has the advantage of being a lot less
verbose if there is a lot of product information and/or it is repeated
many times.
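To make the key and keyref semantics concrete, here is a minimal Python approximation using the standard library's xml.etree.ElementTree. It assumes a hypothetical order document in which product elements have a number child and productRef elements carry a ref attribute, and it treats the whole document as the scope of the constraints. It also compares trimmed strings, whereas a schema processor compares typed values, so this is a sketch of the idea rather than a substitute for schema validation.

```python
import xml.etree.ElementTree as ET


def check_product_refs(doc):
    """Report violations of a key on product/number and a keyref on productRef/@ref."""
    root = ET.fromstring(doc)
    errors = []
    keys = set()
    # key: every product needs a number, and no two numbers may be equal.
    for product in root.iter("product"):
        number = product.findtext("number")
        if number is None:
            errors.append("product without a number")
        elif number.strip() in keys:
            errors.append(f"duplicate product number {number.strip()}")
        else:
            keys.add(number.strip())
    # keyref: every ref that is present must match an existing key value.
    for product_ref in root.iter("productRef"):
        ref = product_ref.get("ref")
        if ref is not None and ref.strip() not in keys:
            errors.append(f"productRef points to unknown product {ref.strip()}")
    return errors


# Hypothetical order: the second productRef has no matching product.
ORDER = """<order>
  <product><number>557</number><name>Linen Shirt</name></product>
  <product><number>563</number><name>Ten-Gallon Hat</name></product>
  <items>
    <productRef ref="557"/>
    <productRef ref="999"/>
  </items>
</order>"""

print(check_product_refs(ORDER))
```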
20.3.1 Inheritance
Object-oriented inheritance can be implemented using type derivation
in XML Schema. For example, suppose we want to have separate elements
for three different kinds of products: shirts, hats, and umbrellas.
They have some information in common, such as product number,
name, and description. The rest of their content is specific to their
subclass: Shirts might have a choice of sizes and a fabric. A hat might
20.3 | Modeling object-oriented concepts 515
In XML Schema, you can create a ProductType type, like the one
shown in Example 20–6, that specifies the content common to all
three types. The type can optionally be abstract, meaning that it cannot
be used directly by an element declaration.
You can then derive three new types from ProductType, one for
each kind of product. An example of ShirtType is shown in
Example 20–7.
20.3.2 Composition
There is an alternative way to represent the fact that shirts, hats,
and umbrellas have properties in common. Through the use of
named model groups, XML Schema allows you to identify shared
content model fragments. This distinction could be seen as composition
rather than generalization in object-oriented terminology. A shirt
definition is composed of the generic product properties plus
properties of its own.
Named model groups are described in detail in Chapter 15. Example
20–10 shows a named model group, ProductProperties,
with a content model fragment describing all the generic product
information.
Example 20–11. Shirt type that uses the product property group
<xs:complexType name="ShirtType">
<xs:sequence>
<xs:group ref="ProductProperties"/>
<xs:element name="fabric" type="xs:string"/>
<xs:element name="availableSizes"
type="AvailableShirtSizesType"/>
</xs:sequence>
</xs:complexType>
In this case, the message instance will have an extra level of structure
with the productProperties element, as shown in Example 20–13.
the purchase order itself, the state of the purchase order, what action
needs to be performed next with it, the format of the desired response
or acknowledgement, and the location to send the response. Some of
this information will appear in the header. As the message is passed
from service to service, it might accumulate additional information,
such as customer details and more detailed pricing and tax information,
for each of the ordered items. Modeling all of this information
as a single message to be passed to an operation may not be intuitive
for the average object-oriented designer.
A complete discussion of designing service-oriented architectures
and their contracts is outside the scope of this book, but it is useful to
note several key points related to message design.
The order element shown in Example 20–14 contains all the required
data, but its design has several weaknesses. The first is that it
does not take advantage of reuse opportunities. The structure of the
bill-to and ship-to addresses is the same, but it is defined twice in
the design. The schema describing this document has to declare each
city element twice, each state element twice, and so on. Since the
20.6 | Considerations for a hierarchical model 529
element names are different, any code that handles address information
(for example, to populate it or display it) also has to be written twice,
once for each set of element names.
A better design is shown in Example 20–16, where two intermediate
elements, billToAddress and shipToAddress, have been added to
represent the bill-to and ship-to addresses. The two have identical
children, which means that they can share the same complex type. It
is named AddressType and is shown in Example 20–17 with the revised
OrderType, whose elements reference it. AddressType is not
only used twice in the revised schema for this message, but may also
be reused in other schemas in other contexts.
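Because both elements share AddressType, address-handling code only needs to be written once. A small Python sketch using xml.etree.ElementTree; the order document and its city and state children are hypothetical stand-ins for the full address structure:

```python
import xml.etree.ElementTree as ET


def format_address(address):
    """Format any element of AddressType, regardless of the element's name."""
    return f'{address.findtext("city")}, {address.findtext("state")}'


# Hypothetical order using the two intermediate address elements.
ORDER = """<order>
  <billToAddress><city>Traverse City</city><state>MI</state></billToAddress>
  <shipToAddress><city>Lansing</city><state>MI</state></shipToAddress>
</order>"""

order = ET.fromstring(ORDER)
for tag in ("billToAddress", "shipToAddress"):
    # The same function serves both elements because they share one type.
    print(tag, "->", format_address(order.find(tag)))
```

With the flat design, the same logic would need two versions, one per set of element names.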
flexibility, but can limit validation specificity. One case where this
comes into play is when you have several data items that represent a
particular class of things, but each is a specialization. It is a design decision
whether to use element names that represent the overall class or
the specialized subclasses. Using the product example, each product
has a number of features associated with it. Each feature has a name
and a value. One way to represent this is by declaring a different specific
element for each feature. To indicate whether a product is
monogrammable, you might have a monogrammable element of type
boolean. Example 20–19 shows some product features marked up
with specific elements.
The downside of using these specific element names is that they are
not very flexible. Every time a new feature comes along, which can be
relatively often, a number of changes have to be made. The schema
must be modified to add the new element declaration for the feature.
Applications that use those documents, including any generated code,
must also be changed to handle the new features.
On the other hand, you could use a more generic feature element
that contains the value of the feature, and put the name of the feature
in a name attribute, as shown in Example 20–20.
A product schema that uses a generic feature element is shown in
Example 20–21. Certain fundamental features such as number and
name still have specific elements, because they are common to all
products and are important to validate. Both the value and the name
of the feature are defined as strings.
This is far more flexible, in that new features do not require changes
to the schema or the basic structure of the service classes. The only
modification that needs to be made is that the code that creates
feature elements must add one for the new feature.
There are downsides to using generic elements, however. One is that
you cannot specify data types for the values. There is no way in XML
Schema 1.0 to say “if a feature element’s name attribute is weight, make
the content integer, and if it’s monogrammable, make it boolean.”
This means that you cannot take advantage of XML Schema type
validation to ensure that the values in the message conform to, for
example, an enumerated list or range of values. This is not an
issue when using specific elements because you simply create separate
weight and monogrammable elements with different types.
Another downside to generic elements is that you have no control
over their order or whether they are required or repeating. Using XML
Schema 1.0, you cannot specify that there must be a feature element
whose name is weight. You also cannot specify that there can only be
one feature element whose name is monogrammable. For any
feature name, there can be zero, one, or more values for it, and they
can appear in any order. You could enforce this as part of the application,
but then it would not be written into the service contract. Again,
this is not a problem when you use specific elements for each feature
because you can use minOccurs and maxOccurs on individual element
declarations to control this.
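What "enforcing this as part of the application" might look like can be sketched in Python. The rules below (exactly one integer weight feature, at most one boolean monogrammable feature) are hypothetical, chosen to mirror the examples above, and the product document layout is assumed. Note that these rules live in code, not in the schema, so they are invisible to anyone who reads only the service contract.

```python
import re
import xml.etree.ElementTree as ET

# Lexical forms of xs:integer and xs:boolean (after trimming whitespace).
INTEGER = re.compile(r"[+-]?[0-9]+")
BOOLEAN = {"true", "false", "1", "0"}


def check_features(product_xml):
    """Apply hypothetical application-level rules to generic feature elements."""
    product = ET.fromstring(product_xml)
    values = {}
    for feature in product.findall("feature"):
        values.setdefault(feature.get("name"), []).append((feature.text or "").strip())
    errors = []
    weights = values.get("weight", [])
    if len(weights) != 1:
        errors.append("exactly one weight feature is required")
    elif not INTEGER.fullmatch(weights[0]):
        errors.append("weight must be an integer")
    monogrammable = values.get("monogrammable", [])
    if len(monogrammable) > 1:
        errors.append("at most one monogrammable feature is allowed")
    elif monogrammable and monogrammable[0] not in BOOLEAN:
        errors.append("monogrammable must be a boolean")
    return errors


GOOD = """<product><number>557</number><name>Linen Shirt</name>
  <feature name="weight">12</feature>
  <feature name="monogrammable">true</feature></product>"""
print(check_features(GOOD))
```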
Here are some considerations on whether to use generic versus specific
elements.
540 Chapter 21 | Schema design and documentation
valid or not. Are all of the required elements there, in the right
order? Do they contain valid values according to their data types?
Schema validation does a good job of checking the basic
structure and content of elements.
# A service contract. A schema serves as part of the understanding
between two parties. The document provider and the document
consumer can both use the schema as a machine-enforceable
set of rules describing an interface between two systems or
services.
# Documentation. Schemas are used to document the XML
structure for the developers and end users that will be implementing
or using it. Narrative human-readable annotations can
be added to schema components to further document them.
Although schemas themselves are not particularly human-
readable, they can be viewed by less technical users in a graphical
XML editor tool. In addition, there are a number of tools that
will generate HTML documentation from schemas, making
them more easily understood.
# Providing type information. Schemas contain information
about the data types that can affect how the information is
processed. For example, if the schema tells a schema-aware XSLT 2.0
stylesheet that a value is an integer, the stylesheet will sort it and
compare it to other values as an integer instead of a string.
# Assisted editing. For documents that will be hand-modified
by human users, a schema can be used by XML editing software
to provide context-sensitive validation, help, and content
completion.
# Code generation. Schemas are also commonly used, particularly
in web services and other structured data interfaces, to generate
classes and interfaces that read and write the XML message
payloads. When a schema is designed first, classes can be generated
automatically from the schema definitions, ensuring that
they match. Other software artifacts can also be generated from
schemas, for example, data entry forms.
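The effect of type information on sorting is easy to see outside XSLT as well. This Python snippet is only an analogy, not XSLT: it sorts the same hypothetical product numbers once as strings and once as integers, which is the difference a schema-aware processor gains from knowing that a value is an xs:integer.

```python
# Untyped values, as a processor without type information sees them.
numbers = ["10", "9", "100"]

# Compared as strings, "100" sorts before "9" because "1" < "9".
as_strings = sorted(numbers)

# Compared as integers, the order is numeric.
as_integers = sorted(numbers, key=int)

print(as_strings)   # ['10', '100', '9']
print(as_integers)  # ['9', '10', '100']
```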
features, and the types of features change over time as new technology
is developed. When designing a message that incorporates these camera
descriptions, I want enough flexibility to handle variations in feature
types, without having to redesign my message every time a new
feature comes along. On the other hand, I want to be able to accurately
and precisely specify these features.
To allow for total flexibility in the camera features, I could declare
a features element whose type contains an element wildcard, which
means that any well-formed XML is allowed. This would have the advantage
of being extremely versatile and adaptable to change. The
disadvantage is that the message structure is very poorly defined. A
developer trying to write an application to process the message would
have no idea what features to expect and what format they might have.
On the other hand, I can declare highly constrained elements for
each feature, with no opportunity for variation. This has the benefit
of making the features well defined, easy to validate, and much more
predictable. Validation is more effective because certain features can
be required and their values can be constrained by specific data types.
However, the schema is brittle because it must be changed every time
a new feature is introduced. When the schema changes, the applications
that process the documents must also often change.
The ideal design is usually somewhere in the middle. A balanced
approach in the case of the camera features might be to create a repeating
feature element that contains the name of the feature as an
attribute and the value of the feature as its content. This eliminates the
brittleness while still providing a predictable structure for implementers.
21.3.2 Reusability
Reuse is an important goal in the design of any software. Schemas
that reuse XML components across multiple kinds of documents are
easier for developers and users to learn, are more consistent, and save
development and maintenance time that could be spent writing
redundant software components.
21.3.3.3 Simplicity
It is best to minimize the number of ways a particular type of data or
content can be expressed. Having multiple ways to represent a particular
kind of data or content in your XML documents may seem like a good
idea because it is more flexible. However, allowing too many choices is
confusing to users, puts more of a burden on applications that
process the documents, and can lead to interoperability problems.
21.3 | Schema design goals 547
These considerations are covered in the rest of this chapter and the
next two chapters.
The disadvantage of this approach is that the types are not reusable
by multiple element declarations. Often you will have multiple element
names that have the same structure, such as billingAddress and
shippingAddress with the same address structure. Using this model,
the entire address structure would need to be respecified each time, or
put into a named model group. Anonymous types also cannot be used
in derivation, which is another form of reuse and sometimes an important
expression of an information model. Although you can reuse the element
declarations, this might mean watered-down element names, such as
address instead of a more specific kind of address. Since elements are
globally declared, it is not possible to have more than one element with
the same name but a different type or other characteristics; all element
names in the entire schema must be unique.
This approach does have some advantages over Russian Doll,
namely that it is more readable and does allow some degree of reuse
through element declarations. Unlike Russian Doll, it does allow the
use of substitution groups, which require global element declarations.
reuse through types, but also allows the flexibility of varying element
names. It maps very cleanly onto an object-oriented model, where the
complex types are analogous to classes and the element declarations
are analogous to instance variables that have that class.
force the uniqueness of element names, this approach is the right choice.
Many standard XML vocabularies use this approach.
Overall, depending on your requirements, the Garden of Eden and Venetian
Blind approaches are recommended. The Russian Doll
approach has obvious limitations in terms of reuse, and the Salami
Slice approach does not benefit from the very significant advantages of
named types over anonymous types.
prefixes. In addition, names that start with the letters xml, in any
combination of upper and lower case, are reserved and should not be used.
Names in XML are always case-sensitive, so accountNumber and
AccountNumber are two different element names.
Since schema components have XML names, these name restrictions
apply not only to the element and attribute names that appear in
instances, but also to the names of the types, named model groups, attribute
groups, identity constraints, and notations you define in your
schemas.
21.6.2 Separators
If a name is made up of several terms, such as “account number,” you
should decide on a standard way to separate the terms. It can be done
through capitalization (e.g., accountNumber) or through punctuation
(e.g., account-number).
Some programming languages, database management systems, and
other technologies do not allow hyphens or other punctuation in the
names they use. Therefore, if you want to directly match your
element names, for example, with variable names or database column
names, you should use capitalization to separate terms.
If you choose to use capitalization, the next question is whether
to use mixed case (e.g., AccountNumber) or camel case (e.g.,
accountNumber). In some programming languages, it is a convention
to use mixed case for class names and camel case for instance variables.
In XML, this maps roughly to using mixed case for type names and
camel case for element names. This is the convention used in this book.
Regardless of which approach you choose, the most important thing
is being consistent.
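One way to stay consistent is to derive names from their terms rather than typing them by hand. The make_name function below is a hypothetical Python helper, not part of any XML tool; it applies the three conventions discussed above to a list of lowercase terms.

```python
def make_name(terms, style):
    """Join lowercase terms, e.g. ["account", "number"], using a naming style."""
    if style == "hyphenated":
        return "-".join(terms)
    # Uppercase only the first letter, leaving the rest of each term unchanged.
    capitalized = [t[:1].upper() + t[1:] for t in terms]
    if style == "mixed":  # AccountNumber: used for type names in this book
        return "".join(capitalized)
    if style == "camel":  # accountNumber: used for element names in this book
        return terms[0] + "".join(capitalized[1:])
    raise ValueError(f"unknown style: {style}")


for style in ("camel", "mixed", "hyphenated"):
    print(style, make_name(["account", "number"], style))
```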
In this case, it may be clearer to leave off the prod term on the child
elements, as shown in Example 21–6.
There may be other cases where the object is not so obvious. In Example
21–7, there are two names: a customer name and a product
name. If we took out the terms cust and prod, we would not be able
to distinguish between the two names. In this case, it should be left as
shown.
1. For a demanding real-world example, see the DTD for XSD in Appendix A
of www.w3.org/TR/2012/REC-xmlschema11-1-20120405/structures.html.
21.7 | Namespace considerations 565
1. Same namespace: Use the same namespace for all of the schema
documents.
2. Different namespaces: Use multiple namespaces, perhaps a
different one for each schema document.
3. Chameleon namespaces: Use a namespace for the parent
schema document, but no namespaces for the included schema
documents.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://datypic.com/all"
targetNamespace="http://datypic.com/all"
elementFormDefault="qualified">
<xs:include schemaLocation="prod.xsd"/>
<xs:include schemaLocation="cust.xsd"/>
<xs:element name="order" type="OrderType"/>
<xs:complexType name="OrderType">
<xs:sequence>
<xs:element name="customer" type="CustomerType"/>
<xs:element name="items" type="ItemsType"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/all"
           targetNamespace="http://datypic.com/all"
           elementFormDefault="qualified">
  <xs:complexType name="ItemsType">
    <xs:sequence maxOccurs="unbounded">
      <xs:element name="product" type="ProductType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
cust.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/all"
           targetNamespace="http://datypic.com/all"
           elementFormDefault="qualified">
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
have multiple global components with the same name in the same
namespace, so you must take care to avoid name collisions.
This approach assumes that you have control over all the schema
documents. If you are using elements from a namespace over which
you have no control, such as the XHTML namespace, you should use
the approach described in the next section.
This approach is best within a particular application where you have
control over all the schema documents involved.
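For the same-namespace schema documents shown earlier, an instance might look like the following sketch. It assumes the main schema document is saved as ord.xsd, which is a hypothetical file name. Every element is in the http://datypic.com/all namespace, including the children of customer and items that are declared in the included documents. A single default namespace declaration therefore covers them all:

```xml
<order xmlns="http://datypic.com/all"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://datypic.com/all ord.xsd">
  <customer>
    <name>Jane Smith</name>
  </customer>
  <items>
    <product>
      <number>557</number>
    </product>
  </items>
</order>
```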
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:prod="http://datypic.com/prod"
           xmlns:cust="http://datypic.com/cust"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord"
           elementFormDefault="qualified">
  <xs:import schemaLocation="prod.xsd"
             namespace="http://datypic.com/prod"/>
  <xs:import schemaLocation="cust.xsd"
             namespace="http://datypic.com/cust"/>
  <xs:element name="order" type="OrderType"/>
570 Chapter 21 | Schema design and documentation
prod.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified">
  <xs:complexType name="ItemsType">
    <xs:sequence maxOccurs="unbounded">
      <xs:element name="product" type="ProductType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
cust.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/cust"
           targetNamespace="http://datypic.com/cust"
           elementFormDefault="qualified">
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
imports the other two schema documents, you are not required to
specify xsi:schemaLocation pairs for all three schema documents,
just the “main” one.
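A matching instance for the different-namespaces approach might look like the following sketch. It rests on two assumptions. First, the rest of OrderType in the ord schema document (not shown above) declares customer with type cust:CustomerType and items with type prod:ItemsType. Second, the main document is saved as ord.xsd. Because every schema document sets elementFormDefault="qualified", each element takes the namespace of the schema document that declares it. Only the main document needs a schemaLocation pair:

```xml
<ord:order xmlns:ord="http://datypic.com/ord"
           xmlns:cust="http://datypic.com/cust"
           xmlns:prod="http://datypic.com/prod"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://datypic.com/ord ord.xsd">
  <!-- customer and items are declared locally in the ord document -->
  <ord:customer>
    <!-- name is declared in cust.xsd -->
    <cust:name>Jane Smith</cust:name>
  </ord:customer>
  <ord:items>
    <!-- product and number are declared in prod.xsd -->
    <prod:product>
      <prod:number>557</prod:number>
    </prod:product>
  </ord:items>
</ord:order>
```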
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord"
           elementFormDefault="qualified">
  <xs:include schemaLocation="prod.xsd"/>
  <xs:include schemaLocation="cust.xsd"/>
  <xs:element name="order" type="OrderType"/>
  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="customer" type="CustomerType"/>
      <xs:element name="items" type="ItemsType"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:complexType name="ItemsType">
    <xs:sequence maxOccurs="unbounded">
      <xs:element name="product" type="ProductType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
cust.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
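With chameleon namespaces, the components in prod.xsd and cust.xsd have no target namespace of their own. They take on the target namespace of the including document, http://datypic.com/ord. An instance might therefore look like the following sketch, where ord.xsd is a hypothetical file name for the main schema document:

```xml
<!-- All elements, including those declared in the included documents, are in the ord namespace -->
<order xmlns="http://datypic.com/ord"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://datypic.com/ord ord.xsd">
  <customer>
    <name>Jane Smith</name>
  </customer>
  <items>
    <product>
      <number>557</number>
    </product>
  </items>
</order>
```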
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:prod="http://datypic.com/prod"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord"
           elementFormDefault="qualified">
  <xs:import schemaLocation="prod.xsd"
             namespace="http://datypic.com/prod"/>
  <xs:element name="order" type="OrderType"/>
  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="items" type="prod:ItemsType"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
prod.xsd:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified">
  <xs:complexType name="ItemsType">
    <xs:sequence maxOccurs="unbounded">
      <xs:element name="product" type="ProductType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
omit the attribute. In this case, globally declared elements still must
use qualified element names, hence the use of ord:order in the
instance.
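Here is a sketch of the resulting instance. It assumes that the omitted attribute is elementFormDefault and that it is removed only from the ord schema document shown above, so prod.xsd keeps elementFormDefault="qualified". Under that assumption, number and items are declared locally in the ord document and are therefore unqualified. The product element and its number child still belong to the prod namespace:

```xml
<ord:order xmlns:ord="http://datypic.com/ord"
           xmlns:prod="http://datypic.com/prod">
  <!-- local to the ord document: no namespace -->
  <number>123</number>
  <items>
    <!-- declared in prod.xsd, which still qualifies local elements -->
    <prod:product>
      <prod:number>557</prod:number>
    </prod:product>
  </items>
</ord:order>
```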
21.8.1 Annotations
Annotations are represented by annotation elements, whose syntax
is shown in Table 21–4. An annotation may appear in almost any
element in the schema, with the exception of annotation itself and
its children, appinfo and documentation. The schema, override,
and redefine elements can contain multiple annotation elements
anywhere among their children. All other elements may only contain
one annotation, and it must be their first child.
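The following minimal sketch illustrates these placement rules. The documentation and appinfo text is invented for illustration. The schema element holds two annotations among its other children, while the element and complexType declarations each have a single annotation as their first child:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- schema: any number of annotations, anywhere among its children -->
  <xs:annotation>
    <xs:documentation>Product schema.</xs:documentation>
  </xs:annotation>
  <xs:element name="product" type="ProductType">
    <!-- other elements: at most one annotation, and only as the first child -->
    <xs:annotation>
      <xs:documentation>A single product.</xs:documentation>
    </xs:annotation>
  </xs:element>
  <xs:annotation>
    <xs:documentation>Type definitions follow.</xs:documentation>
  </xs:annotation>
  <xs:complexType name="ProductType">
    <xs:annotation>
      <!-- annotation content model: (documentation | appinfo)* -->
      <xs:documentation>Core product properties.</xs:documentation>
      <xs:appinfo>internal-id: PT-1</xs:appinfo>
    </xs:annotation>
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```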
Parents
all elements except annotation, appinfo, and documentation
Attribute name Type Description
id ID Unique ID.
Content
(documentation | appinfo)*
Parents
annotation
  <xs:simpleType name="CountryType">
    <xs:annotation>
      <xs:documentation>
        <doc:name>Country identifier</doc:name>
        <doc:identifier>3166</doc:identifier>
        <doc:version>1990</doc:version>
        <doc:registrationAuthority>ISO</doc:registrationAuthority>
        <doc:definition>A code for the names of countries of the
          world.</doc:definition>
        <doc:keyword>geopolitical entity</doc:keyword>
        <doc:keyword>country</doc:keyword>
        <!--...-->
      </xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:token">
      <!--...-->
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
21.8 | Schema documentation 585
  <xs:simpleType name="CountryType">
    <xs:annotation>
      <xs:documentation>
        <doc:author>Priscilla Walmsley</doc:author>
        <doc:version>1.1</doc:version>
        <doc:since>1.0</doc:since>
        <doc:see>
          <doc:label>Country Code Listings</doc:label>
          <doc:link>http://datypic.com/countries.html</doc:link>
        </doc:see>
        <doc:deprecated>false</doc:deprecated>
      </xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:token">
      <!--...-->
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
structured. This means that they can be used, for example, to generate
XHTML documentation for the schema.
  <xs:annotation><xs:documentation><sectionHeader>
  ********* Product-Related Element Declarations ***************
  </sectionHeader></xs:documentation></xs:annotation>
  <xs:element name="product" type="ProductType"/>
  <xs:element name="size" type="SizeType"/>
  <xs:annotation><xs:documentation><sectionHeader>
  ********* Order-Related Element Declarations *****************
  </sectionHeader></xs:documentation></xs:annotation>
  <xs:element name="order" type="OrderType"/>
  <xs:element name="items" type="ItemsType"/>
  <!--...-->
</xs:schema>
Parents
annotation
Extensibility and reuse
Chapter 22
22.1 Reuse
First, let’s talk about reusing schema components exactly as they are.
Later in this chapter, we will look at extending and restricting existing
schema components. The benefits of reuse are numerous.
These are the kinds of data structures that tend to be rewritten over
and over again if there is no plan in place to reuse them. Having one
definition for these low-level components can save a lot of time in
developing and maintaining not only the schema, but the code that
processes and/or generates the messages.
If all of these common components are defined and placed in one
or more separate schema documents, they are easier to reuse than if
they are embedded in another context-specific schema document.
Typically, they are defined as types rather than elements, so that they
can be reused by many element declarations. Example 22–1 shows a
simple common components library.
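A small common components library might look like the following sketch. The namespace http://example.org/common, the file name common.xsd, and all component names are hypothetical. The point is that each low-level structure is defined once, as a named type, and other schema documents reuse it by importing the library:

```xml
<!-- common.xsd (hypothetical): reusable low-level types -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.org/common"
           targetNamespace="http://example.org/common"
           elementFormDefault="qualified">
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string" maxOccurs="3"/>
      <xs:element name="city" type="xs:string"/>
      <xs:element name="postalCode" type="PostalCodeType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:simpleType name="PostalCodeType">
    <xs:restriction base="xs:token">
      <xs:maxLength value="10"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="AmountType">
    <xs:restriction base="xs:decimal">
      <xs:fractionDigits value="2"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

<!-- In another schema document (with xmlns:common="http://example.org/common"
     declared on its xs:schema element), the library is imported and reused: -->
<xs:import namespace="http://example.org/common" schemaLocation="common.xsd"/>
<xs:element name="shipTo" type="common:AddressType"/>
<xs:element name="total" type="common:AmountType"/>
```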
22.2 Extending schemas
22.2.1 Wildcards
Wildcards are the most straightforward way to define extensible types.
They can be used to allow additional elements and attributes in your
instances. Of the methods of extension discussed in this chapter,
wildcards and open content are the only ones that allow an
instance with extensions to validate against the original schema. All the
other methods require defining a new schema for the extensions.
Example 22–2 shows a complex type definition that contains both
an element wildcard (the any element) and an attribute wildcard
(the anyAttribute element).
Note that the element and attribute declarations are global. This is
necessary so that the processor can find the declarations.
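The following sketch shows one way to write such a type and its companion declarations. The spc namespace and the names giftWrap and points come from the instances in this chapter. The namespace constraint ##other, the lax processing, and the types given to giftWrap and points are assumptions made for illustration:

```xml
<!-- Extensible ProductType: extra elements and attributes from other namespaces are allowed -->
<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
    <xs:any minOccurs="0" maxOccurs="unbounded"
            namespace="##other" processContents="lax"/>
  </xs:sequence>
  <xs:anyAttribute namespace="##other" processContents="lax"/>
</xs:complexType>

<!-- Separate schema document for the extensions: the declarations are global
     so that the processor can find them when it meets spc:giftWrap or spc:points -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://datypic.com/spc">
  <xs:element name="giftWrap" type="xs:string"/>
  <xs:attribute name="points" type="xs:nonNegativeInteger"/>
</xs:schema>
```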
Another approach for “extending” complex types with wildcards is
actually to restrict them. You could define a complex type that restricts
ProductType and includes the declarations of giftWrap and points.
For more information, see Section 13.5.2.3 on p. 322.
The advantage of using wildcards for making types extensible is that
this is very flexible: The instance author is not required to have a
The use of the openContent element means that the extension
elements can appear interleaved anywhere in the content. To allow them
only at the end, you can use a mode="suffix" attribute
on openContent. The instance shown in Example 22–6 takes
advantage of the open content in the ProductType definition to add an
spc:giftWrap element into the middle of the content.
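Below is a sketch of a ProductType with open content. It requires XSD 1.1. The namespace="##other" constraint and lax processing are assumptions. If the mode attribute is omitted, its default is "interleave", which allows extension elements anywhere in product. Setting mode="suffix", as here, restricts them to the end:

```xml
<!-- XSD 1.1 only: open content admits extra elements without a wildcard in the sequence -->
<xs:complexType name="ProductType">
  <!-- suffix: extension elements may appear only after number, name, and size -->
  <xs:openContent mode="suffix">
    <xs:any namespace="##other" processContents="lax"/>
  </xs:openContent>
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="size" type="xs:integer" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>
```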
Example 22–13 shows a valid instance. As you can see, the child
elements can appear in any order. In this case, they are all in the same
namespace. It is also possible for substitution element declarations to
be in different namespaces.
the same name as they had in the original definition. However,
redefinition can only be done within the same namespace, so it is not
appropriate for altering schemas over which you have no control. In addition,
redefinition has some risks associated with it, as detailed in Section 18.3
on p. 468.
The original type might look exactly like the one shown in
Example 22–7, with similar constraints. It must be named, and it should
use a sequence group. Example 22–14 shows a redefinition of
ProductType to add a new element declaration and attribute
declaration. It is similar to the definition of the derived type shown in
Example 22–8, with two important differences.
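The following sketch shows this kind of redefinition. It is not a reproduction of Example 22–14. The sketch assumes the original ProductType is in a hypothetical ord1.xsd with the http://datypic.com/ord target namespace (or none). It also assumes that a hypothetical spc.xsd declares the global giftWrap element and points attribute. Inside redefine, the type keeps its original name and uses itself as its base:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           xmlns:spc="http://datypic.com/spc"
           targetNamespace="http://datypic.com/ord"
           elementFormDefault="qualified">
  <xs:import namespace="http://datypic.com/spc" schemaLocation="spc.xsd"/>
  <!-- ord1.xsd (hypothetical) must have the same target namespace, or none -->
  <xs:redefine schemaLocation="ord1.xsd">
    <!-- same name as the original type -->
    <xs:complexType name="ProductType">
      <xs:complexContent>
        <!-- the base is the original ProductType being redefined -->
        <xs:extension base="ProductType">
          <xs:sequence>
            <xs:element ref="spc:giftWrap" minOccurs="0"/>
          </xs:sequence>
          <xs:attribute ref="spc:points"/>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:redefine>
</xs:schema>
```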
<xs:group name="ProductPropertyGroup">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="size" type="xs:integer" minOccurs="0"/>
  </xs:sequence>
</xs:group>
<xs:attributeGroup name="ExtensionGroup"/>
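A redefinition of these two groups that would accept an instance like Example 22–17 might look like the following sketch. It makes several assumptions. ProductType refers to ProductPropertyGroup and ExtensionGroup. The groups live in a hypothetical ord1.xsd. The enclosing schema document has the same target namespace and imports a schema document that globally declares spc:giftWrap and spc:points. Each redefined group refers to itself to pull in the original content. For a model group, that self-reference must occur exactly once, with minOccurs and maxOccurs of 1:

```xml
<xs:redefine schemaLocation="ord1.xsd">
  <xs:group name="ProductPropertyGroup">
    <xs:sequence>
      <!-- new element first, then the original group content -->
      <xs:element ref="spc:giftWrap" minOccurs="0"/>
      <!-- self-reference: exactly once, minOccurs and maxOccurs of 1 -->
      <xs:group ref="ProductPropertyGroup"/>
    </xs:sequence>
  </xs:group>
  <xs:attributeGroup name="ExtensionGroup">
    <!-- original (empty) attribute group plus the new attribute -->
    <xs:attributeGroup ref="ExtensionGroup"/>
    <xs:attribute ref="spc:points"/>
  </xs:attributeGroup>
</xs:redefine>
```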
A valid instance would look like the one shown in Example 22–17.
In this case, giftWrap appears as the first child of product.
Example 22–17. Instance using redefined named model group and attribute group
<order xmlns="http://datypic.com/ord"
       xmlns:spc="http://datypic.com/spc">
  <product spc:points="100">
    <spc:giftWrap>ADULT BDAY</spc:giftWrap>
    <number>557</number>
    <name>Short-Sleeved Linen Blouse</name>
    <size>10</size>
  </product>
</order>
22.2.7 Overrides
Starting in version 1.1, overrides can be used instead of redefines. In
fact, they are preferred because redefines are deprecated. Overrides
Example 22–20. Instance using overridden named model group and attribute group
<order xmlns="http://datypic.com/ord"
       xmlns:spc="http://datypic.com/spc">
  <product spc:points="100">
    <spc:giftWrap>ADULT BDAY</spc:giftWrap>
    <number>557</number>
    <name>Short-Sleeved Linen Blouse</name>
    <size>10</size>
  </product>
</order>
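An override that would accept the instance in Example 22–20 might look like the following sketch. Here, ord1.xsd is a hypothetical name for the schema document that contains the original ProductPropertyGroup and ExtensionGroup. The sketch also assumes that spc:giftWrap and spc:points are declared globally in an imported schema document. Unlike a redefinition, an override replaces the original definition outright. The new group therefore restates the original elements instead of referring to itself:

```xml
<!-- XSD 1.1 only -->
<xs:override schemaLocation="ord1.xsd">
  <!-- replaces the original group entirely: no self-reference -->
  <xs:group name="ProductPropertyGroup">
    <xs:sequence>
      <xs:element ref="spc:giftWrap" minOccurs="0"/>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="size" type="xs:integer" minOccurs="0"/>
    </xs:sequence>
  </xs:group>
  <xs:attributeGroup name="ExtensionGroup">
    <xs:attribute ref="spc:points"/>
  </xs:attributeGroup>
</xs:override>
```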
Versioning
Chapter 23
1. Example version numbers in this chapter start with 2.0. They identify the version
of the vocabulary defined by the schema, and starting at 2.0 avoids confusion with
the versions of the XML Schema language itself, which are 1.0 and 1.1.
23.1 Schema compatibility
For example, suppose you have the complex type definition shown
in Example 23–1. Its version number is 2.0.
Example 23–6 shows version 2.1 of the schema, with a new element
desc and a new attribute dept. This version of the schema also includes
element and attribute wildcards to allow it to be forward-compatible
with version 2.2 of the schema.
The method shown in Examples 23–5 and 23–6 works fine, but
only because size is required. In version 1.0 of XML Schema, if size
were optional, this complex type would violate the Unique Particle
Attribution rule. A processor, upon encountering a size element,
would not know whether to use the size element declaration or
the wildcard to validate it. In version 1.1, this constraint has been
eliminated, and the element declaration will always be used instead of
the wildcard when both might apply.
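The following sketch makes the Unique Particle Attribution problem concrete. It shows the problematic variant, where size is optional and immediately followed by a wildcard. The processContents="lax" setting is an assumption. An XSD 1.0 processor rejects this definition, because a size element after number could match either the size declaration or the wildcard. An XSD 1.1 processor accepts it and validates such an element against the size declaration:

```xml
<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <!-- optional element directly followed by a ##any wildcard:
         a UPA violation in 1.0, allowed in 1.1 -->
    <xs:element name="size" type="xs:integer" minOccurs="0"/>
    <xs:any minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
  </xs:sequence>
</xs:complexType>
```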
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           version="2.1">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer" minOccurs="0"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
23.2 | Using version numbers 629
<product xmlns="http://datypic.com/prod"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://datypic.com/prod prod.xsd">
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
</product>
attribute to indicate that the catalog element declaration and its type
are at version 2.1, while the product element declaration and its
type are at version 2.0.
This may be useful as a way to clearly delineate which components
have changed over multiple versions. However, it does require some
extra management of the components.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer" minOccurs="0"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
<product xmlns="http://datypic.com/prod"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://datypic.com/prod prod_2.1.xsd">
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
</product>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer" minOccurs="0"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="version"/>
  </xs:complexType>
</xs:schema>
Instance:
<product xmlns="http://datypic.com/prod"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://datypic.com/prod prod.xsd"
         version="2.1">
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
</product>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           xmlns="http://datypic.com/prod/2"
           targetNamespace="http://datypic.com/prod/2">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer" minOccurs="0"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
Instance:
<product xmlns="http://datypic.com/prod/2"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://datypic.com/prod/2 prod.xsd">
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
</product>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           xmlns="http://datypic.com/prod/2"
           targetNamespace="http://datypic.com/prod/2"
           version="2.1">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer" minOccurs="0"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="version" type="xs:decimal"/>
  </xs:complexType>
</xs:schema>
Instance:
<product xmlns="http://datypic.com/prod/2"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://datypic.com/prod/2
                             schemas/prod/2.1/prod.xsd"
         version="2.1">
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
</product>
itself. This book describes two different versions of XML Schema: 1.0
and 1.1. Depending on which processor you are using, you may be
required to use one version or the other. Unlike some other XML
vocabularies, there is no way to indicate in your schema which version
of XML Schema you are using. Instead, this might be a setting that
you pass to your XML Schema processor, or the processor may only
support one of the versions.
This example may seem to violate one of the basic rules of XML
Schema—namely, it has two global element declarations with the same
name. However, the version control attributes have a special power,
signaling to the processor that it should preprocess the schema (using
a process called conditional inclusion) to strip out all the declarations
that don’t apply to the version it is using. It is the output of this
preprocessing that must follow all the rules of XML Schema. In
Example 23–15, there will never be more than one product declaration in
the schema after preprocessing. However, care must be taken not to
use overlapping values for minVersion and/or maxVersion, lest
duplicate declarations remain after preprocessing.
Unfortunately, this mechanism does not help with the transition
from XML Schema 1.0 to 1.1, because a typical 1.0 processor will not
honor or even know about the minVersion and/or maxVersion
attributes.
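The following minimal sketch shows conditional inclusion. The product content and the assertion are invented for illustration. The attributes are in the version control namespace http://www.w3.org/2007/XMLSchema-versioning. Because vc:minVersion is inclusive and vc:maxVersion is exclusive, a processor implementing version 1.1 keeps only the first product declaration. A processor that honors conditional inclusion but implements a version below 1.1 keeps only the second. A processor that ignores these attributes sees two product declarations and reports an error:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning">
  <!-- kept when the processor's version is 1.1 or later (minVersion is inclusive) -->
  <xs:element name="product" vc:minVersion="1.1">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="number" type="xs:integer"/>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
      <!-- assertions exist only in XSD 1.1 -->
      <xs:assert test="number gt 0"/>
    </xs:complexType>
  </xs:element>
  <!-- kept when the processor's version is below 1.1 (maxVersion is exclusive) -->
  <xs:element name="product" vc:maxVersion="1.1">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="number" type="xs:integer"/>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```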
This would not have been appropriate for a prelexical facet like
saxon:preprocess, however, because if the facet is simply ignored,
the instruction to turn the value to upper case before validating it would
be skipped. The resulting schema would have been stricter than
intended when using a processor other than Saxon, because lowercase
values would not be allowed.
XSD keywords
Appendix A
A.1 Elements
Table A–1 all
650 Appendix A | XSD keywords
A.2 Attributes
Table A–66 abstract
Table A–86 id
Description: Unique ID
Sections: various
Elements: all XSD elements except documentation and appinfo
Type/valid values: ID
Built-in simple types
Appendix B
B.2 Applicability of facets to built-in simple types

Facet columns: 1 length, 2 minLength, 3 maxLength, 4 minExclusive,
5 minInclusive, 6 maxInclusive, 7 maxExclusive, 8 totalDigits,
9 fractionDigits, 10 whiteSpace, 11 pattern, 12 enumeration,
13 assertion (1.1), 14 explicitTimezone (1.1)
A dash (-) means that the facet does not apply to the type.

Name                     1  2  3  4  5  6  7  8  9  10 11 12 13 14
language                 A  A  A  -  -  -  -  -  -  C  V  A  A  -
Numeric types
float                    -  -  -  A  A  A  A  -  -  C  A  A  A  -
double                   -  -  -  A  A  A  A  -  -  C  A  A  A  -
decimal                  -  -  -  A  A  A  A  A  A  C  A  A  A  -
integer                  -  -  -  A  A  A  A  A  0  C  V  A  A  -
long                     -  -  -  A  V  V  A  A  0  C  V  A  A  -
int                      -  -  -  A  V  V  A  A  0  C  V  A  A  -
short                    -  -  -  A  V  V  A  A  0  C  V  A  A  -
byte                     -  -  -  A  V  V  A  A  0  C  V  A  A  -
positiveInteger          -  -  -  A  V  A  A  A  0  C  V  A  A  -
nonPositiveInteger       -  -  -  A  A  V  A  A  0  C  V  A  A  -
negativeInteger          -  -  -  A  A  V  A  A  0  C  V  A  A  -
nonNegativeInteger       -  -  -  A  V  A  A  A  0  C  V  A  A  -
unsignedLong             -  -  -  A  V  V  A  A  0  C  V  A  A  -
unsignedInt              -  -  -  A  V  V  A  A  0  C  V  A  A  -
unsignedShort            -  -  -  A  V  V  A  A  0  C  V  A  A  -
unsignedByte             -  -  -  A  V  V  A  A  0  C  V  A  A  -
Date and time types
date                     -  -  -  A  A  A  A  -  -  C  A  A  A  A
time                     -  -  -  A  A  A  A  -  -  C  A  A  A  A
dateTime                 -  -  -  A  A  A  A  -  -  C  A  A  A  A
dateTimeStamp (1.1)      -  -  -  A  A  A  A  -  -  C  A  A  A  V
gYear                    -  -  -  A  A  A  A  -  -  C  A  A  A  A
gYearMonth               -  -  -  A  A  A  A  -  -  C  A  A  A  A
gMonth                   -  -  -  A  A  A  A  -  -  C  A  A  A  A
gMonthDay                -  -  -  A  A  A  A  -  -  C  A  A  A  A
gDay                     -  -  -  A  A  A  A  -  -  C  A  A  A  A
duration                 -  -  -  A  A  A  A  -  -  C  A  A  A  -
yearMonthDuration (1.1)  -  -  -  A  A  A  A  -  -  C  A  A  A  -
dayTimeDuration (1.1)    -  -  -  A  A  A  A  -  -  C  A  A  A  -
XML DTD types
ID                       A  A  A  -  -  -  -  -  -  C  V  A  A  -
IDREF                    A  A  A  -  -  -  -  -  -  C  V  A  A  -
IDREFS                   A  V  A  -  -  -  -  -  -  C  A  A  A  -
ENTITY                   A  A  A  -  -  -  -  -  -  C  V  A  A  -
ENTITIES                 A  V  A  -  -  -  -  -  -  C  A  A  A  -
NMTOKEN                  A  A  A  -  -  -  -  -  -  C  V  A  A  -
NMTOKENS                 A  V  A  -  -  -  -  -  -  C  A  A  A  -
NOTATION                 A  A  A  -  -  -  -  -  -  C  A  A  A  -
Other types
QName                    A  A  A  -  -  -  -  -  -  C  A  A  A  -
boolean                  -  -  -  -  -  -  -  -  -  C  A  -  A  -
hexBinary                A  A  A  -  -  -  -  -  -  C  A  A  A  -
base64Binary             A  A  A  -  -  -  -  -  -  C  A  A  A  -
anyURI                   A  A  A  -  -  -  -  -  -  C  A  A  A  -
Other varieties
List types               A  A  A  -  -  -  -  -  -  C  A  A  A  -
Union types              -  -  -  -  -  -  -  -  -  -  A  A  A  -
A — Facet is applicable to this type.
V — Facet has a value for this type, but it is not fixed.
0 — Facet is applicable to this type, but the only value that can be specified
is 0.
C — Facet is applicable to this type, but the only value that can be specified
is collapse.
Index
XML Schema 1.1 (cont.)
  defaultOpenContent element in, 295–298, 655, 672, 681
  element declarations in:
    multiple, 413
    vs. wildcards, 280, 624
  element wildcards in, 625
  elementFormDefault attribute in, 100
  explicitTimezone facet in, 137–138, 150, 234, 657, 695–698
  field element in, 375
  final attribute in, 418
  finalDefault attribute of, 153, 419
  forward compatibility in, 625, 641–642, 679–680
  ID type in, 236
  implementation-defined facets in, 155, 642, 645–646, 675
  implementation-defined simple types in, 154, 642–645, 687
  inheritable attributes in, 126–127, 283, 382–383, 678
  integer values in, 218
  IRIs in, 251
  namespaces in, 36–37, 43–44, 289–291, 459–462, 572, 683
  open content in, 292–298, 311–312, 329–331, 600, 604–605, 619, 625, 664, 681
  overrides in, 33, 459–471, 488, 491–492, 572, 581, 585, 600, 612–614, 665
  primitive types in, 203
  processContents attribute in, 605
  referencing identity constraints in, 442–444, 660–661, 671
  restrictions in, 320
  selector element in, 375
  substitution groups in, 413–414
  targetNamespace attribute in, 339–341, 686
  using XPath 2.0 with, 352, 355–365, 367–370, 378–380, 435–440
  wildcards in, 289–291, 293, 683
  xpathDefaultNamespace attribute in, 60, 373–375, 381, 441, 689
XML Schema Instance Namespace, 51, 79–80, 108
XML Schema Namespace, 50, 97
  prefixes mapped to, 38, 50–52
XML Schema recommendation, 11–14, 201
xml:lang attribute, 59, 120, 211
  syntax of, 678
xmlns attribute, 39
xmlns prefix, 37, 39
xpath attribute, 435
  syntax of, 689
XPath language, 13
  and list types, 190
  attributes in, 367, 436–437
  expressions in, 367–369, 435–440
  processing relationships in, 511
  unprefixed names in, 440–441
  wildcards in, 436–437
XPath 2.0 language
  comparing types in, 359–362
  conditional expressions in, 369–370
  for assertions, 352
  functions in, 357–359, 363–364
  in conditional type assignment, 378–380
  operators in, 355–356
xpathDefaultNamespace attribute
  ##defaultNamespace value of, 375
  ##local value of, 375
  ##targetNamespace value of, 374
  of alternative element, 375, 381
  of assert element, 375
  of assertion element, 375
  of field element, 375
  of schema element, 60, 373–375, 441
  of selector element, 375
  syntax of, 689