Definitive XML Schema
Second Edition
The Charles F. Goldfarb Definitive XML Series

Priscilla Walmsley
Definitive XML Schema Second Edition

Charles F. Goldfarb and Paul Prescod
Charles F. Goldfarb’s XML Handbook™ Fifth Edition

Rick Jelliffe
The XML and SGML Cookbook: Recipes for Structured Information

Charles F. Goldfarb, Steve Pepper, and Chet Ensign
SGML Buyer’s Guide: Choosing the Right XML and SGML Products and Services

G. Ken Holman
Definitive XSL-FO
Definitive XSLT and XPath

Bob DuCharme
XML: The Annotated Specification
SGML CD

Truly Donovan
Industrial-Strength SGML: An Introduction to Enterprise Publishing

Lars Marius Garshol
Definitive XML Application Development

JP Morgenthal with Bill la Forge
Enterprise Application Integration with XML and Java

Michael Leventhal, David Lewis, and Matthew Fuchs
Designing XML Internet Applications

Adam Hocek and David Cuddihy
Definitive VoiceXML

Dmitry Kirsanov
XSLT 2.0 Web Development

Yuri Rubinsky and Murray Maloney
SGML on the Web: Small Steps Beyond HTML

David Megginson
Structuring XML Documents

Sean McGrath
XML Processing with Python
XML by Example: Building E-commerce Applications
ParseMe.1st: SGML for Software Developers

Chet Ensign
$GML: The Billion Dollar Secret

Ron Turner, Tim Douglass, and Audrey Turner
ReadMe.1st: SGML for Writers and Editors

Charles F. Goldfarb and Priscilla Walmsley
XML in Office 2003: Information Sharing with Desktop XML

Michael Floyd
Building Web Sites with XML

Fredrick Thomas Martin
TOP SECRET Intranet: How U.S. Intelligence Built Intelink—The World’s Largest, Most Secure Network

J. Craig Cleaveland
Program Generators with XML and Java

About the Series Author

Charles F. Goldfarb is the father of XML technology. He invented SGML, the Standard
Generalized Markup Language on which both XML and HTML are based. You can find
him on the Web at: www.xmlbooks.com.

About the Series Logo

The rebus is an ancient literary tradition, dating from 16th century Picardy, and is especially
appropriate to a series involving fine distinctions between markup and text, metadata and
data. The logo is a rebus incorporating the series name within a stylized XML comment
declaration.
Definitive XML Schema
Second Edition

Priscilla Walmsley

Upper Saddle River, NJ • Boston • Indianapolis • San Francisco

New York • Toronto • Montreal • London • Munich • Paris • Madrid
Cape Town • Sydney • Tokyo • Singapore • Mexico City
Many of the designations used by manufacturers and sellers to distinguish their products are
claimed as trademarks. Where those designations appear in this book, and the publisher was
aware of a trademark claim, the designations have been printed with initial capital letters or
in all capitals.

The author and publisher have taken care in the preparation of this book, but make no
expressed or implied warranty of any kind and assume no responsibility for errors or omissions.
No liability is assumed for incidental or consequential damages in connection with or arising
out of the use of the information or programs contained herein.

Titles in this series are produced using XML, SGML, and/or XSL. XSL-FO documents are
rendered into PDF by the XEP Rendering Engine from RenderX: www.renderx.com.

The publisher offers excellent discounts on this book when ordered in quantity for bulk
purchases or special sales, which may include electronic versions and/or custom covers and
content particular to your business, training goals, marketing focus, and branding interests.
For more information, please contact:

U.S. Corporate and Government Sales

(800) 382–3419
[email protected]

For sales outside the United States, please contact:

International Sales
[email protected]

Visit us on the Web: informit.com/ph

Library of Congress Cataloging-in-Publication Data is on file

Copyright © 2013 Pearson Education, Inc.

All rights reserved. Printed in the United States of America. This publication is protected by
copyright, and permission must be obtained from the publisher prior to any prohibited
reproduction, storage in a retrieval system, or transmission in any form or by any means,
electronic, mechanical, photocopying, recording, or likewise. To obtain permission to use
material from this work, please submit a written request to Pearson Education, Inc.,
Permissions Department, One Lake Street, Upper Saddle River, New Jersey 07458, or you
may fax your request to (201) 236–3290.

ISBN-13: 978-0-132-88672-7
ISBN-10: 0-132-88672-3
Text printed in the United States on recycled paper at Edwards Brothers Malloy in Ann
Arbor, MI.
First printing: September 2012

Editor-in-Chief: Mark L. Taub
Managing Editor: Kristy Hart
Book Packager: Alina Kirsanova
Cover Designer: Alan Clements
To Doug, my SH
Overview

Chapter 1 Schemas: An introduction 2
Chapter 2 A quick tour of XML Schema 16
Chapter 3 Namespaces 34
Chapter 4 Schema composition 56
Chapter 5 Instances and schemas 78
Chapter 6 Element declarations 88
Chapter 7 Attribute declarations 112
Chapter 8 Simple types 128
Chapter 9 Regular expressions 158
Chapter 10 Union and list types 180
Chapter 11 Built-in simple types 200
Chapter 12 Complex types 256
Chapter 13 Deriving complex types 300
Chapter 14 Assertions 350
Chapter 15 Named groups 384
Chapter 16 Substitution groups 406
Chapter 17 Identity constraints 422

Chapter 18 Redefining and overriding schema components 446
Chapter 19 Topics for DTD users 472
Chapter 20 XML information modeling 500
Chapter 21 Schema design and documentation 538
Chapter 22 Extensibility and reuse 594
Chapter 23 Versioning 616
Appendix A XSD keywords 648
Appendix B Built-in simple types 690
Contents

Foreword xxxi
Acknowledgments xxxiii
How to use this book xxxv
Chapter 1 Schemas: An introduction 2
1.1 What is a schema? 3
1.2 The purpose of schemas 5
1.2.1 Data validation 5
1.2.2 A contract with trading partners 5
1.2.3 System documentation 6
1.2.4 Providing information to processors 6
1.2.5 Augmentation of data 6
1.2.6 Application information 6
1.3 Schema design 7
1.3.1 Accuracy and precision 7
1.3.2 Clarity 8
1.3.3 Broad applicability 8
1.4 Schema languages 9
1.4.1 Document Type Definition (DTD) 9
1.4.2 Schema requirements expand 10
1.4.3 W3C XML Schema 11
1.4.4 Other schema languages 12
1.4.4.1 RELAX NG 12
1.4.4.2 Schematron 13
Chapter 2 A quick tour of XML Schema 16
2.1 An example schema 17
2.2 The components of XML Schema 18
2.2.1 Declarations vs. definitions 18
2.2.2 Global vs. local components 19
2.3 Elements and attributes 20
2.3.1 The tag/type distinction 20
2.4 Types 21
2.4.1 Simple vs. complex types 21
2.4.2 Named vs. anonymous types 22
2.4.3 The type definition hierarchy 22
2.5 Simple types 23
2.5.1 Built-in simple types 23
2.5.2 Restricting simple types 24
2.5.3 List and union types 24
2.6 Complex types 25
2.6.1 Content types 25
2.6.2 Content models 26
2.6.3 Deriving complex types 27
2.7 Namespaces and XML Schema 28
2.8 Schema composition 29
2.9 Instances and schemas 30
2.10 Annotations 31
2.11 Advanced features 32
2.11.1 Named groups 32
2.11.2 Identity constraints 32
2.11.3 Substitution groups 32
2.11.4 Redefinition and overriding 33
2.11.5 Assertions 33
Chapter 3 Namespaces 34
3.1 Namespaces in XML 35
3.1.1 Namespace names 36
3.1.2 Namespace declarations and prefixes 37
3.1.3 Default namespace declarations 39
3.1.4 Name terminology 40
3.1.5 Scope of namespace declarations 41
3.1.6 Overriding namespace declarations 42
3.1.7 Undeclaring namespaces 43
3.1.8 Attributes and namespaces 44
3.1.9 A summary example 46
3.2 The relationship between namespaces and
schemas 48
3.3 Using namespaces in schemas 48
3.3.1 Target namespaces 48
3.3.2 The XML Schema Namespace 50
3.3.3 The XML Schema Instance Namespace 51
3.3.4 The Version Control Namespace 51
3.3.5 Namespace declarations in schema documents 52
3.3.5.1 Map a prefix to the XML Schema Namespace 52
3.3.5.2 Map a prefix to the target namespace 53
3.3.5.3 Map prefixes to all namespaces 54
Chapter 4 Schema composition 56
4.1 Modularizing schema documents 57
4.2 Defining schema documents 58
4.3 Combining multiple schema documents 61
4.3.1 include 62
4.3.1.1 The syntax of includes 63
4.3.1.2 Chameleon includes 65
4.3.2 import 66
4.3.2.1 The syntax of imports 67
4.3.2.2 Multiple levels of imports 70
4.3.2.3 Multiple imports of the same namespace 72
4.4 Schema assembly considerations 75
4.4.1 Uniqueness of qualified names 75
4.4.2 Missing components 76
4.4.3 Schema document defaults 77
Chapter 5 Instances and schemas 78
5.1 Using the instance attributes 79
5.2 Schema processing 81
5.2.1 Validation 81
5.2.2 Augmenting the instance 82
5.3 Relating instances to schemas 83
5.3.1 Using hints in the instance 84
5.3.1.1 The xsi:schemaLocation attribute 84
5.3.1.2 The xsi:noNamespaceSchemaLocation attribute 86
5.4 The root element 87
Chapter 6 Element declarations 88
6.1 Global and local element declarations 89
6.1.1 Global element declarations 89
6.1.2 Local element declarations 93
6.1.3 Design hint: Should I use global or local element
declarations? 95
6.2 Declaring the types of elements 96
6.3 Qualified vs. unqualified forms 98
6.3.1 Qualified local names 98
6.3.2 Unqualified local names 98
6.3.3 Using elementFormDefault 99
6.3.4 Using form 100
6.3.5 Default namespaces and unqualified names 101
6.4 Default and fixed values 101
6.4.1 Default values 102
6.4.2 Fixed values 103
6.5 Nils and nillability 105
6.5.1 Using xsi:nil in an instance 108
6.5.2 Making elements nillable 109
Chapter 7 Attribute declarations 112
7.1 Attributes vs. elements 113
7.2 Global and local attribute declarations 115
7.2.1 Global attribute declarations 115
7.2.2 Local attribute declarations 117
7.2.3 Design hint: Should I use global or local attribute
declarations? 119
7.3 Declaring the types of attributes 120
7.4 Qualified vs. unqualified forms 122
7.5 Default and fixed values 123
7.5.1 Default values 124
7.5.2 Fixed values 125
7.6 Inherited attributes 126
Chapter 8 Simple types 128
8.1 Simple type varieties 129
8.1.1 Design hint: How much should I break down my data
values? 130
8.2 Simple type definitions 131
8.2.1 Named simple types 131
8.2.2 Anonymous simple types 132
8.2.3 Design hint: Should I use named or anonymous types? 133
8.3 Simple type restrictions 135
8.3.1 Defining a restriction 136
8.3.2 Overview of the facets 137
8.3.3 Inheriting and restricting facets 139
8.3.4 Fixed facets 140
8.3.4.1 Design hint: When should I fix a facet? 141
8.4 Facets 142
8.4.1 Bounds facets 142
8.4.2 Length facets 143
8.4.2.1 Design hint: What if I want to allow empty values? 143
8.4.2.2 Design hint: What if I want to restrict the length of an integer? 144
8.4.3 totalDigits and fractionDigits 145
8.4.4 Enumeration 145
8.4.5 Pattern 148
8.4.6 Assertion 150
8.4.7 Explicit Time Zone 150
8.4.8 Whitespace 151
8.5 Preventing simple type derivation 152
8.6 Implementation-defined types and facets 154
8.6.1 Implementation-defined types 154
8.6.2 Implementation-defined facets 155
Chapter 9 Regular expressions 158
9.1 The structure of a regular expression 159
9.2 Atoms 161
9.2.1 Normal characters 162
9.2.2 The wildcard escape character 164
9.2.3 Character class escapes 164
9.2.3.1 Single-character escapes 165
9.2.3.2 Multicharacter escapes 166
9.2.3.3 Category escapes 167
9.2.3.4 Block escapes 170
9.2.4 Character class expressions 171
9.2.4.1 Listing individual characters 171
9.2.4.2 Specifying a range 172
9.2.4.3 Combining individual characters and ranges 173
9.2.4.4 Negating a character class expression 173
9.2.4.5 Subtracting from a character class expression 174
9.2.4.6 Escaping rules for character class expressions 175
9.2.5 Parenthesized regular expressions 175
9.3 Quantifiers 176
9.4 Branches 177
Chapter 10 Union and list types 180
10.1 Varieties and derivation types 181
10.2 Union types 183
10.2.1 Defining union types 183
10.2.2 Restricting union types 185
10.2.3 Unions of unions 186
10.2.4 Specifying the member type in the instance 187
10.3 List types 188
10.3.1 Defining list types 188
10.3.2 Design hint: When should I use lists? 189
10.3.3 Restricting list types 190
10.3.3.1 Length facets 192
10.3.3.2 Enumeration facet 192
10.3.3.3 Pattern facet 194
10.3.4 Lists and strings 195
10.3.5 Lists of unions 196
10.3.6 Lists of lists 196
10.3.7 Restricting the item type 198
Chapter 11 Built-in simple types 200
11.1 The XML Schema type system 201
11.1.1 The type hierarchy 202
11.1.2 Value spaces and lexical spaces 204
11.1.3 Facets and built-in types 204
11.2 String-based types 205
11.2.1 string, normalizedString, and token 205
11.2.1.1 Design hint: Should I use string, normalizedString,
or token? 207
11.2.2 Name 208
11.2.3 NCName 210
11.2.4 language 211
11.3 Numeric types 213
11.3.1 float and double 213
11.3.2 decimal 215
11.3.3 Integer types 217
11.3.3.1 Design hint: Is it an integer or a string? 220
11.4 Date and time types 221
11.4.1 date 221
11.4.2 time 222
11.4.3 dateTime 223
11.4.4 dateTimeStamp 224
11.4.5 gYear 225
11.4.6 gYearMonth 226
11.4.7 gMonth 227
11.4.8 gMonthDay 227
11.4.9 gDay 228
11.4.10 duration 229
11.4.11 yearMonthDuration 231
11.4.12 dayTimeDuration 232
11.4.13 Representing time zones 233
11.4.14 Facets 234
11.4.15 Date and time ordering 235
11.5 Legacy types 236
11.5.1 ID 236
11.5.2 IDREF 237
11.5.3 IDREFS 239
11.5.4 ENTITY 240
11.5.5 ENTITIES 242
11.5.6 NMTOKEN 243
11.5.7 NMTOKENS 244
11.5.8 NOTATION 245
11.6 Other types 246
11.6.1 QName 246
11.6.2 boolean 247
11.6.3 The binary types 248
11.6.4 anyURI 250
11.7 Comparing typed values 253
Chapter 12 Complex types 256
12.1 What are complex types? 257
12.2 Defining complex types 258
12.2.1 Named complex types 258
12.2.2 Anonymous complex types 260
12.2.3 Complex type alternatives 261
12.3 Content types 262
12.3.1 Simple content 262
12.3.2 Element-only content 264
12.3.3 Mixed content 264
12.3.4 Empty content 265
12.4 Using element declarations 266
12.4.1 Local element declarations 266
12.4.2 Element references 267
12.4.3 Duplication of element names 268
12.5 Using model groups 270
12.5.1 sequence groups 270
12.5.1.1 Design hint: Should I care about the order of elements? 272
12.5.2 choice groups 273
12.5.3 Nesting of sequence and choice groups 275
12.5.4 all groups 276
12.5.5 Named model group references 278
12.5.6 Deterministic content models 279
12.6 Using attribute declarations 281
12.6.1 Local attribute declarations 281
12.6.2 Attribute references 282
12.6.3 Attribute group references 284
12.6.4 Default attributes 284
12.7 Using wildcards 284
12.7.1 Element wildcards 285
12.7.1.1 Controlling the namespace of replacement elements 287
12.7.1.2 Controlling the strictness of validation 287
12.7.1.3 Negative wildcards 289
12.7.2 Open content models 292
12.7.2.1 Open content in a complex type 292
12.7.2.2 Default open content 295
12.7.3 Attribute wildcards 298
Chapter 13 Deriving complex types 300
13.1 Why derive types? 301
13.2 Restriction and extension 302
13.3 Simple content and complex content 303
13.3.1 simpleContent elements 303
13.3.2 complexContent elements 304
13.4 Complex type extensions 305
13.4.1 Simple content extensions 306
13.4.2 Complex content extensions 307
13.4.2.1 Extending choice groups 309
13.4.2.2 Extending all groups 310
13.4.2.3 Extending open content 311
13.4.3 Mixed content extensions 312
13.4.4 Empty content extensions 313
13.4.5 Attribute extensions 314
13.4.6 Attribute wildcard extensions 315
13.5 Complex type restrictions 316
13.5.1 Simple content restrictions 317
13.5.2 Complex content restrictions 318
13.5.2.1 Eliminating meaningless groups 320
13.5.2.2 Restricting element declarations 321
13.5.2.3 Restricting wildcards 322
13.5.2.4 Restricting groups 324
13.5.2.5 Restricting open content 329
13.5.3 Mixed content restrictions 331
13.5.4 Empty content restrictions 332
13.5.5 Attribute restrictions 333
13.5.6 Attribute wildcard restrictions 335
13.5.7 Restricting types from another namespace 337
13.5.7.1 Using targetNamespace on element and attribute
declarations 339
13.6 Type substitution 341
13.7 Controlling type derivation and substitution 343
13.7.1 final: Preventing complex type derivation 343
13.7.2 block: Blocking substitution of derived types 344
13.7.3 Blocking type substitution in element declarations 346
13.7.4 abstract: Forcing derivation 346
Chapter 14 Assertions 350
14.1 Assertions 351
14.1.1 Assertions for simple types 353
14.1.1.1 Using XPath 2.0 operators 355
14.1.1.2 Using XPath 2.0 functions 357
14.1.1.3 Types and assertions 359
14.1.1.4 Inheriting simple type assertions 362
14.1.1.5 Assertions on list types 363
14.1.2 Assertions for complex types 365
14.1.2.1 Path expressions 367
14.1.2.2 Conditional expressions 369
14.1.2.3 Assertions in derived complex types 370
14.1.3 Assertions and namespaces 372
14.1.3.1 Using xpathDefaultNamespace 373
14.2 Conditional type assignment 375
14.2.1 The alternative element 376
14.2.2 Specifying conditional type assignment 377
14.2.3 Using XPath in the test attribute 378
14.2.4 The error type 380
14.2.5 Conditional type assignment and namespaces 381
14.2.6 Using inherited attributes in conditional type
assignment 382
Chapter 15 Named groups 384
15.1 Why named groups? 385
15.2 Named model groups 386
15.2.1 Defining named model groups 386
15.2.2 Referencing named model groups 388
15.2.2.1 Group references 388
15.2.2.2 Referencing a named model group in a complex type 389
15.2.2.3 Using all in named model groups 391
15.2.2.4 Named model groups referencing named model groups 392
15.3 Attribute groups 392
15.3.1 Defining attribute groups 393
15.3.2 Referencing attribute groups 395
15.3.2.1 Attribute group references 395
15.3.2.2 Referencing attribute groups in complex types 396
15.3.2.3 Duplicate attribute names 397
15.3.2.4 Duplicate attribute wildcard handling 398
15.3.2.5 Attribute groups referencing attribute groups 398
15.3.3 The default attribute group 399
15.4 Named groups and namespaces 401
15.5 Design hint: Named groups or complex type
derivations? 403
Chapter 16 Substitution groups 406
16.1 Why substitution groups? 407
16.2 The substitution group hierarchy 408
16.3 Declaring a substitution group 409
16.4 Type constraints for substitution groups 412
16.5 Members in multiple groups 413
16.6 Alternatives to substitution groups 414
16.6.1 Reusable choice groups 414
16.6.2 Substituting a derived type in the instance 415
16.7 Controlling substitution groups 418
16.7.1 final: Preventing substitution group declarations 418
16.7.2 block: Blocking substitution in instances 419
16.7.3 abstract: Forcing substitution 420
Chapter 17 Identity constraints 422
17.1 Identity constraint categories 423
17.2 Design hint: Should I use ID/IDREF or
key/keyref? 424
17.3 Structure of an identity constraint 424
17.4 Uniqueness constraints 426
17.5 Key constraints 428
17.6 Key references 430
17.6.1 Key references and scope 432
17.6.2 Key references and type equality 432
17.7 Selectors and fields 433
17.7.1 Selectors 433
17.7.2 Fields 434
17.8 XPath subset for identity constraints 435
17.9 Identity constraints and namespaces 439
17.9.1 Using xpathDefaultNamespace 441
17.10 Referencing identity constraints 442
Chapter 18 Redefining and overriding schema components 446
18.1 Redefinition 448
18.1.1 Redefinition basics 448
18.1.1.1 Include plus redefine 450
18.1.1.2 Redefine and namespaces 450
18.1.1.3 Pervasive impact 450
18.1.2 The mechanics of redefinition 451
18.1.3 Redefining simple types 452
18.1.4 Redefining complex types 453
18.1.5 Redefining named model groups 454
18.1.5.1 Defining a subset 454
18.1.5.2 Defining a superset 455
18.1.6 Redefining attribute groups 456
18.1.6.1 Defining a subset 457
18.1.6.2 Defining a superset 458
18.2 Overrides 459
18.2.1 Override basics 459
18.2.1.1 Include plus override 461
18.2.1.2 Override and namespaces 461
18.2.1.3 Pervasive impact 462
18.2.2 The mechanics of overriding components 462
18.2.3 Overriding simple types 464
18.2.4 Overriding complex types 465
18.2.5 Overriding element and attribute declarations 466
18.2.6 Overriding named groups 467
18.3 Risks of redefines and overrides 468
18.3.1 Risks of redefining or overriding types 468
18.3.2 Risks of redefining or overriding named groups 470
Chapter 19 Topics for DTD users 472
19.1 Element declarations 473
19.1.1 Simple types 474
19.1.2 Complex types with simple content 475
19.1.3 Complex types with complex content 476
19.1.4 Mixed content 478
19.1.5 Empty content 479
19.1.6 Any content 480
19.2 Attribute declarations 480
19.2.1 Attribute types 480
19.2.2 Enumerated attribute types 481
19.2.3 Notation attributes 482
19.2.4 Default values 482
19.3 Parameter entities for reuse 483
19.3.1 Reusing content models 484
19.3.2 Reusing attributes 485
19.4 Parameter entities for extensibility 486
19.4.1 Extensions for sequence groups 486
19.4.2 Extensions for choice groups 489
19.4.3 Attribute extensions 490
19.5 External parameter entities 492
19.6 General entities 493
19.6.1 Character and other parsed entities 493
19.6.2 Unparsed entities 493
19.7 Notations 493
19.7.1 Declaring a notation 494
19.7.2 Declaring a notation attribute 495
19.7.3 Notations and unparsed entities 496
19.8 Comments 497
19.9 Using DTDs and schemas together 499
Chapter 20 XML information modeling 500
20.1 Data modeling paradigms 502
20.2 Relational models 503
20.2.1 Entities and attributes 504
20.2.2 Relationships 507
20.2.2.1 One-to-one and one-to-many relationships 507
20.2.2.2 Many-to-many relationships 507
20.2.2.2.1 Approach #1: Use containment with repetition 508
20.2.2.2.2 Approach #2: Use containment with references 510
20.2.2.2.3 Approach #3: Use relationship elements 512
20.3 Modeling object-oriented concepts 514
20.3.1 Inheritance 514
20.3.2 Composition 519
20.4 Modeling web services 522
20.5 Considerations for narrative content 524
20.5.1 Semantics vs. style 524
20.5.1.1 Benefits of excluding styling 524
20.5.1.2 Rendition elements: “block” and “inline” 525
20.5.2 Considerations for schema design 526
20.5.2.1 Flexibility 526
20.5.2.2 Reusing existing vocabularies 526
20.5.2.3 Attributes are for metadata 526
20.5.2.4 Humans write the documents 527
20.6 Considerations for a hierarchical model 527
20.6.1 Intermediate elements 527
20.6.2 Wrapper lists 531
20.6.3 Level of granularity 532
20.6.4 Generic vs. specific elements 533
Chapter 21 Schema design and
documentation 538
21.1 The importance of schema design 539
21.2 Uses for schemas 540
21.3 Schema design goals 542
21.3.1 Flexibility and extensibility 542
21.3.2 Reusability 543
21.3.3 Clarity and simplicity 545
21.3.3.1 Naming and documentation 545
21.3.3.2 Clarity of structure 546
21.3.3.3 Simplicity 546
21.3.4 Support for graceful versioning 547
21.3.5 Interoperability and tool compatibility 547
21.4 Developing a schema design strategy 548
21.5 Schema organization considerations 550
21.5.1 Global vs. local components 550
21.5.1.1 Russian Doll 551
21.5.1.2 Salami Slice 553
21.5.1.3 Venetian Blind 554
21.5.1.4 Garden of Eden 555
21.5.2 Modularizing schema documents 557
21.6 Naming considerations 559
21.6.1 Rules for valid XML names 559
21.6.2 Separators 560
21.6.3 Name length 560
21.6.4 Standard terms and abbreviations 561
21.6.5 Use of object terms 562
21.7 Namespace considerations 564
21.7.1 Whether to use namespaces 564
21.7.2 Organizing namespaces 565
21.7.2.1 Same namespace 565
21.7.2.2 Different namespaces 568
21.7.2.3 Chameleon namespaces 572
21.7.3 Qualified vs. unqualified forms 575
21.7.3.1 Qualified local names 575
21.7.3.2 Unqualified local names 576
21.7.3.3 Using form in schemas 576
21.7.3.4 Form and global element declarations 578
21.7.3.5 Default namespaces and unqualified names 578
21.7.3.6 Qualified vs. unqualified element names 579
21.7.3.7 Qualified vs. unqualified attribute names 580
21.8 Schema documentation 580
21.8.1 Annotations 581
21.8.2 User documentation 582
21.8.2.1 Documentation syntax 582
21.8.2.2 Data element definitions 584
21.8.2.3 Code documentation 585
21.8.2.4 Section comments 585
21.8.3 Application information 586
21.8.4 Non-native attributes 588
21.8.4.1 Design hint: Should I use annotations or non-native attributes? 589
21.8.5 Documenting namespaces 589
Chapter 22 Extensibility and reuse 594
22.1 Reuse 596
22.1.1 Reusing schema components 596
22.1.2 Creating schemas that are highly reusable 597
22.1.3 Developing a common components library 597
22.2 Extending schemas 599
22.2.1 Wildcards 601
22.2.2 Open content 604
22.2.3 Type substitution 605
22.2.4 Substitution groups 607
22.2.5 Type redefinition 609
22.2.6 Named group redefinition 611
22.2.7 Overrides 612
Chapter 23 Versioning 616
23.1 Schema compatibility 617
23.1.1 Backward compatibility 618
23.1.2 Forward compatibility 623
23.2 Using version numbers 626
23.2.1 Major and minor versions 626
23.2.2 Placement of version numbers 628
23.2.2.1 Version numbers in schema documents 628
23.2.2.2 Versions in schema locations 630
23.2.2.3 Versions in instances 631
23.2.2.4 Versions in namespace names 632
23.2.2.5 A combination strategy 633
23.3 Application compatibility 634
23.4 Lessening the impact of versioning 635
23.4.1 Define a versioning strategy 636
23.4.2 Make only necessary changes 636
23.4.3 Document all changes 637
23.4.4 Deprecate components before deleting them 638
23.4.5 Provide a conversion capability 639
23.5 Versions of the XML Schema language 639
23.5.1 New features in version 1.1 640
23.5.2 Forward compatibility of XML Schema 1.1 641
23.5.3 Portability of implementation-defined types and facets 642
23.5.3.1 Using typeAvailable and typeUnavailable 644
23.5.3.2 Using facetAvailable and facetUnavailable 645
Appendix A XSD keywords 648
A.1 Elements 649
A.2 Attributes 671
Appendix B Built-in simple types 690
B.1 Built-in simple types 691
B.2 Applicability of facets to built-in simple types 695
Index 699
Foreword

clas·sic (adjective)
judged over a period of time to be important and of the
highest quality:

a classic novel
a classic car

Neither this definition, nor any of the leading dictionary definitions,
has a usage example anything like:

a classic work on high-tech software

After all, it is a rare book on software that even survives long enough
to be “judged over a period of time.”
Nevertheless, Definitive XML Schema satisfies every definition of
“classic.” It is one of the elite few software books that have been in
print continuously for over ten years, and an essential trustworthy guide
for tens of thousands of readers.
This Second Edition continues to be an essential and trustworthy
classic:
Essential because in the last ten years XML has become the accepted
standard for data interchange, and XML Schema 1.0 is largely
responsible. Now version 1.1 has extended the ability to specify and
validate document data, to a degree previously possible only for
databases. These updates are covered in this book by extensive
revisions, the most significant 250 of which are flagged in the text and
table of contents. Hundreds more unflagged revisions reflect W3C
corrections of XML Schema errata, and ten years of evolving “best
practices.”
Trustworthy because it is both authoritative and accurate.

# The author(ity), Priscilla Walmsley, is a noted consultant who
has been using XML Schema ever since she helped develop it
as a member of the W3C XML Schema Group. She personally
devised many of the current “best practices” described in this
book. Priscilla is the Editor of the W3C XML Schema Primer,
Second Edition.
# Accuracy was preserved by using the same XML-based production
system that was used in 2002, operated by the same team
of XML experts who read and thoroughly understood the book.
Priscilla’s original XML source (in which she had personally
tagged the version 1.1 revisions) was used throughout production.
Dmitry Kirsanov copy-edited and proofed it, while Alina
Kirsanova prepared the index, coded the XSL transformations,
and generated the camera-ready PDFs.

The result, as you will see, retains the structure, clarity, patient
explanations, validated examples (over 450!), and well-reasoned advice that
critics praised in the 2002 edition—but now they are ten years more
up-to-date.
And after you’ve read Definitive XML Schema, Second Edition, it
won’t take another ten years for you, too, to judge it a classic.

Charles F. Goldfarb
Belmont, CA
August 2012
Acknowledgments

First and foremost, I would like to thank Charles Goldfarb for his
invaluable guidance and support. Alina Kirsanova and Dmitry Kirsanov
did an excellent job preparing this book for publication. I would also
like to thank Mark Taub at Prentice Hall for his hand in making
this work possible.
Of course, this book would not have been possible without the efforts
of all of the members of the W3C XML Schema Working Group, with
whom I have had the pleasure of working for six years. The content of
this book was shaped by the questions and comments of the people
who contribute to XML-DEV and xmlschema-dev.
Finally, I’d like to thank my Dad for teaching me to “get stuck into
it,” a skill which allowed me to complete this substantial project.

Priscilla Walmsley
Traverse City, Michigan
March 2012

How to use this book

This book covers the two versions of XML Schema—1.0 and 1.1—and
provides revision bars to assist access to just one or the other. In
referring to both versions as “XML Schema,” the book follows customary
practice, despite the official name of 1.1 being “W3C XML Schema
Definition Language (XSD) 1.1.” For either version, the book is useable
as both a tutorial and a reference.
As a tutorial, the book can be read from cover to cover with confidence
that each topic builds logically on the information that was previously
covered. (Of course, knowledge of XML itself is always a prerequisite
to learning about XML Schema, and is assumed in this book.)
When using this book as a reference, you have several access options
available to you:

# A comprehensive index starts on page 699.
# An alphabetical list of all the elements and attributes that make
up the XML Schema syntax is in Appendix A on p. 648. For
each there is a reference to further coverage in the body of
the book.
# XML Schema includes a basic set of datatypes, known formally
as the “built-in simple types.” They are listed in Appendix B on
p. 690. This appendix also refers to more detailed descriptions
in the body of the book.
# The major changes in version 1.1 of XML Schema are summarized
in Section 23.5.1 on p. 640, with references to detailed
coverage elsewhere in the book.

Revisions in the Second Edition

This edition of Definitive XML Schema contains more than 500 revisions,
covering such new and updated topics as:

# W3C published corrections for errata in XML Schema 1.0
# Current “best practices” after ten years of experience
# XML information modeling for relational and object-oriented
modeling paradigms
# Schema design: evaluating pros and cons of alternatives
# Schema strategy: formulating a coherent strategy for schema
development and maintenance
# Version 1.1 updates and additions to XML Schema

Identifying 1.1-related revisions

The author has chosen a “1.1 subset” of the book revisions, comprising
the 250 most signiﬁcant revisions that deal with version 1.1. If a section,
table, example, or paragraph has content entirely from the “1.1 subset,”
there is a solid gray revision bar on its right. If other material might be
included, the bar is a gray dotted line.

Strategies for using the revision bars

If your interest is solely 1.0 (perhaps because your software does not
yet support 1.1), you may decide to focus on content that either has a
dotted revision bar or no bar at all.
If you are interested only in what is new in 1.1 (presumably because
you already know 1.0), consider content having either a solid or dotted
revision bar in deciding where to focus your reading.
Finally, if your interest is all of 1.1 (because you don’t already know
1.0), you can easily disregard the revision bars (that’s why they are
grayed out).

Syntax tables
This book contains syntax tables, each summarizing the allowed syntax
of an XML Schema component. The ﬁrst such table does not occur
until Section 4.2 on p. 58, by which point the undeﬁned terms in this
explanation will have been introduced.
Syntax tables, whose captions all start with “XSD Syntax,” look like
the example below, which shows the syntax for named simple types. It
contains the following information:

# The element name(s) used for this XML Schema component.

# The possible parent element name(s). Note that “1.1”, printed
white on a gray box, precedes override to identify it as a
construct that is only permitted in version 1.1. This convention
is followed in all syntax tables; it occurs once more in this table.
# A list of allowed attributes, along with their types, valid values,
and brief descriptions. The names of required attributes appear
in bold font. Default values appear in italics in the Type column.
# The allowed child elements, shown as a content model that uses,
for compactness, the XML DTD syntax. Commas indicate that
child elements must appear in the order shown, while vertical
bars (|) indicate a choice among child elements. Occurrence
constraints indicate how many of each may appear: ? means
zero or one, * means zero or more, and + means one or more.
Otherwise, one and only one is required. In this example, the
allowed content is zero or one annotation element, followed
by a choice of either one restriction, one list, or one
union element.

Table XSD Syntax: named simple type definition

Name:     simpleType
Parents:  schema, redefine, 1.1 override

Attribute name    Type    Description

In some cases, there is more than one syntax table for the same ele-
ment name, because certain element names in XML Schema have
multiple uses. For example, simpleType is used for both named
simple types and anonymous simple types. Each of these use cases of
simpleType allows different attributes and a different set of parent
elements, so each is described with its own table.

Companion website
This book has a companion website, maintained by the author, at
www.datypic.com/books/defxmlschema2. On the website, you can
view any errata and download the examples from this book. In addition
to the examples that appear in the book, which are generally concise
in order to illustrate a particular point, the website also has larger, more
comprehensive instances and schemas that can be copied or used to
test validation.

Chapter 1
Schemas: An introduction

This chapter provides a brief introduction to schemas and explains why
they are important. It also discusses the basic schema design goals and
describes the various existing schema languages.

1.1 What is a schema?

The word schema means a diagram, plan, or framework. In XML, it
refers to a document that describes an XML document. Suppose you
have the XML instance shown in Example 1–1.

Example 1–1. Product instance
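
A minimal sketch of such an instance follows. The element and attribute
names come from the surrounding text; the values are illustrative
assumptions, chosen so that they satisfy the schema in Example 1–2.

```xml
<!-- Illustrative values only; they satisfy the types in Example 1-2 -->
<product effDate="2001-04-02">
  <number>557</number>
  <size>10</size>
</product>
```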

The instance consists of a product element that has two children
(number and size) and an attribute (effDate).
The sentence you just read could be a schema because it describes
the instance document, and all other instances of a product with the
same kind of children and attribute. However, to be useful with XML
and benefit from computer processing, this kind of schema won’t
do. The schema must be defined in a schema document using a formal
schema language.
Example 1–2 shows a schema document that describes the instance.
It contains element and attribute declarations that assign types and
names to elements and attributes. The document is written in the XML
Schema Definition Language (XSD).

Example 1–2. Product schema in XSD schema language

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="size" type="SizeType"/>
    </xs:sequence>
    <xs:attribute name="effDate" type="xs:date"/>
  </xs:complexType>
  <xs:simpleType name="SizeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

In contrast, Example 1–3 on p. 9 shows an equally valid schema in
the familiar Document Type Definition (DTD) language of XML.
The disproportionate sizes of the two schema documents roughly reflect
the capabilities of the two languages. The extensive features of XML
Schema, when to use them, and how to use them effectively, form the
subject matter of this book.

1.2 The purpose of schemas

Schemas are developed for many purposes. How effectively a schema
can achieve them depends on the capabilities of the schema language
and the schema processor, as well as the quality of the schema design.

1.2.1 Data validation

One of the most common uses for schemas is to verify that an XML
document is valid according to a deﬁned set of rules. A schema can be
used to validate:

# The structure of elements and attributes. For example, a
product must have a number and a size, and may optionally
have an effDate (effective date).
# The order of elements. For example, number must appear
before size.
# The data values of elements and attributes, based on ranges,
enumerations, and pattern matching. For example, size must
be an integer between 2 and 18, and effDate must be a
valid date.
# The uniqueness of values in an instance. For example, all
product numbers in an instance must be unique.
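
The kinds of validation listed above can be sketched in a single schema
document. The catalog wrapper element and the constraint name
uniqueProductNumber are hypothetical names introduced here for
illustration; the product, number, size, and effDate names come from
the running example.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- "catalog" is a hypothetical wrapper element that holds many products -->
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="product" maxOccurs="unbounded">
          <xs:complexType>
            <!-- Structure and order: number must appear before size -->
            <xs:sequence>
              <xs:element name="number" type="xs:integer"/>
              <!-- Data values: size must be an integer from 2 through 18 -->
              <xs:element name="size">
                <xs:simpleType>
                  <xs:restriction base="xs:integer">
                    <xs:minInclusive value="2"/>
                    <xs:maxInclusive value="18"/>
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
            </xs:sequence>
            <!-- Optional attribute that must hold a valid date -->
            <xs:attribute name="effDate" type="xs:date"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <!-- Uniqueness: no two product children may share a number value -->
    <xs:unique name="uniqueProductNumber">
      <xs:selector xpath="product"/>
      <xs:field xpath="number"/>
    </xs:unique>
  </xs:element>
</xs:schema>
```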

1.2.2 A contract with trading partners

Often, XML instances are passed between organizations. A schema
may act as a contract with your trading partners. It clearly lays out the
rules for document structure and what is required. Since an instance
can be validated against a schema, the “contract” can be enforced using
available tools.

1.2.3 System documentation

Schemas can provide documentation about the data in an XML
instance. Anyone who needs to understand the data can refer to the
schema for information about names, structures, and data types of
the items. To include further documentation, you can add annotations
to any schema component.
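
As a small sketch of such an annotation, the declaration below attaches
human-readable documentation to an element declaration. The wording of
the documentation text is an assumption made up for this illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="number" type="xs:integer">
    <xs:annotation>
      <!-- Free-text description intended for human readers -->
      <xs:documentation xml:lang="en">
        The catalog number that identifies a product.
      </xs:documentation>
    </xs:annotation>
  </xs:element>
</xs:schema>
```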

1.2.4 Providing information to processors

Schemas contain information about the type and structure of elements
that is useful to have when processing the document. For example, if
a processor knows the data type of a value, it knows how to sort it or
compare it to other values, and it knows what operations it can reason-
ably perform on that data. Providing information to the processor can
also be useful for debugging: If the processor knows that a certain
element is not allowed by a schema, it can signal an error to the
processing code.

1.2.5 Augmentation of data

Schema processing can also add to the instance. A schema processor can
insert the default and fixed values declared for elements and attributes,
and normalize whitespace according to the type.
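
A minimal sketch of declarations that lead to such augmentation is shown
below. The currency and version attributes and their values are
hypothetical, chosen only to illustrate the default and fixed attributes.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="product">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="number" type="xs:integer"/>
        <!-- An empty size element is treated as if it contained 10 -->
        <xs:element name="size" type="xs:integer" default="10"/>
      </xs:sequence>
      <!-- If currency is absent, the processor supplies "USD" -->
      <xs:attribute name="currency" type="xs:token" default="USD"/>
      <!-- If version is present it must be "1.0"; if absent, "1.0" is supplied -->
      <xs:attribute name="version" type="xs:string" fixed="1.0"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```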

1.2.6 Application information

Schemas provide a way for additional information about the data to
be supplied to the application when processing a particular type of
document. For example, you could include information on how to
map the product element instances to a database table, and have the
application use this information to automatically update that particular
table with the data.
In addition to being available at processing time, such information
in a schema can be used to generate code, such as:

# User interfaces for editing the information. For example, if you
know that size is between 2 and 18, you can generate an
interface with a slider bar that has these values as the limits.
# Stylesheets to transform instance data into a reader-friendly
representation such as XHTML. For example, if you know that
the human-readable name for the content of a number element
is “Product Number” you can use this as a column header.
# Code to insert or extract the data from a database. For example,
if you know that the product number maps to the PROD_NUM
column on the PRODUCTS table, you can generate an efficient
routine to insert it into that column.
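
One way to carry this kind of application information is an appinfo
annotation. The sketch below assumes a hypothetical mapping vocabulary
(the db prefix, its namespace, and the column element are invented for
illustration and are not part of XML Schema); the PRODUCTS and PROD_NUM
names come from the text above.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:db="http://example.org/db-mapping">
  <xs:element name="number" type="xs:integer">
    <xs:annotation>
      <!-- Human-readable name, usable as a column header -->
      <xs:documentation>Product Number</xs:documentation>
      <xs:appinfo>
        <!-- Hypothetical mapping vocabulary read by the application -->
        <db:column table="PRODUCTS" name="PROD_NUM"/>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:schema>
```

The schema processor does not interpret the content of appinfo; it is
up to the application or code generator to read it.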

1.3 Schema design

Schema languages often provide several ways to accurately describe the
same thing. The decisions made during the design stage can affect a
schema’s usability, accuracy, and applicability. Therefore, it is important
to keep in mind your design objectives when creating a schema. These
objectives may vary depending on how you are using XML, but some
are common to all use cases.

1.3.1 Accuracy and precision

Obviously, a schema should accurately describe an XML instance and
allow it to be validated. Schemas should also be precise in describing
data. Precision can result in more complete validation as well as
better documentation. Precision can be achieved by deﬁning restrictive
types that truly represent valid values.

1.3.2 Clarity
Schemas should be very clear, allowing a reader to instantly understand
the structure and characteristics of the instance being described. Clarity
can be achieved by

# Appropriate choice of names

# Consistency in naming
# Consistency in structure
# Good documentation
# Avoiding unnecessary complexity

1.3.3 Broad applicability

There is a temptation to create schemas that are useful only for a specific
application. In some cases, this may be appropriate. However, it is
better to create a schema that has broader applicability. For
example, a business unit that handles only domestic accounts may not
use a country element declaration as part of an address. They should
consider adding it as an optional element for the purposes
of consistency and future usability.
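
A sketch of such an address type is shown below. The type name
AddressType and the street and city children are assumptions made for
illustration; only the optional country element comes from the text.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
      <!-- Optional today, so domestic-only instances remain valid -->
      <xs:element name="country" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```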
There are two components to a schema’s broad applicability:
reusability and extensibility. Reusable schema components are modular
and well documented, encouraging schema authors to reuse them
in other schemas. Extensible components are ﬂexible and open,
allowing other schema authors to build on them for future uses.
Since designing schemas well is so important, this book offers sections
labeled “Design Hint” to address pros and cons of various alternatives
in schema design. In addition, Chapters 21, 22, and 23 pull these
principles together into a coherent design strategy.

1.4 Schema languages

1.4.1 Document Type Definition (DTD)
Document Type Definitions (DTDs) are a commonly used method
of describing XML documents. The DTD syntax is the original W3C
schema language, built into XML 1.0 itself. A DTD allows you to
define the basic structure of an XML instance, including

# The structure and order of elements

# The allowed attributes for elements
# Basic data typing for attributes
# Default and ﬁxed values for attributes
# Notations to represent other data formats

Example 1–3 shows a DTD that is roughly equivalent to our schema
in Example 1–2.

Example 1–3. Product schema in DTD schema language

<!ELEMENT product (number, size?)>
<!ELEMENT number (#PCDATA)>
<!ELEMENT size (#PCDATA)>
<!ATTLIST product effDate CDATA #IMPLIED>

DTDs have many advantages. They are relatively simple, have a
compact syntax, and are widely understood by XML software
implementers.
However, DTDs also have some shortcomings. They do not support
namespaces easily, and they provide very limited data typing, for
attributes only. Also, because they have a non-XML syntax, they cannot
be parsed as XML, a capability that is useful for generating
documentation or making wholesale changes. However, conversion tools
such as James Clark’s open source Trang (www.relaxng.org) and other
free and commercial products can convert DTDs to other schema languages
for this purpose.

1.4.2 Schema requirements expand

As XML became increasingly popular for applications such as
e-commerce and enterprise application integration (EAI), a more robust
schema language was needed. Specifically, XML developers wanted:

# The ability to constrain data based on common data types such
as integer and date.
# The ability to define their own types in order to further
constrain data.
# Support for namespaces.
# The ability to specify multiple element declarations with the
same name in different contexts.
# Object-oriented features such as type derivation. The ability to
express types as extensions or restrictions of other types allows
them to be processed similarly and substituted for each other.
# A schema language that uses XML syntax. This is advantageous
because it is extensible, can represent more advanced models,
and can be processed by many available tools.
# The ability to add structured documentation and application
information that is passed to the application during processing.

DTDs have not disappeared since newer schema languages arrived
on the scene. They are supported in many tools, are widely understood,
and are still in use in many applications, especially in the publishing
arena. In addition, they continue to be useful as a lightweight alternative
to newer schema languages.

1.4.3 W3C XML Schema

Four schema languages were developed before work began on XML
Schema: XDR (XML Data Reduced), DCD, SOX, and DDML. These
four languages were considered, together, as a starting point for XML
Schema, and many of their originators were involved in the creation
of XML Schema.
The World Wide Web Consortium (W3C) began work on XML
Schema in 1998, and it became an official recommendation on
May 2, 2001.
On April 5, 2012, version 1.1 of XML Schema became official. It
includes several significant enhancements as well as many small changes.
One change was the name, which is officially “W3C XML Schema
Definition Language (XSD) 1.1.” Understandably, this book follows
the common practice of continuing to use the name “XML Schema,”
along with “XSD” in syntax tables and other formal language contexts.1
XML Schema 1.1 is backward-compatible with 1.0, and schema
authors do not need to specify in their schema documents the version
to which they conform. A list of the major changes in 1.1 can be found
in Section 23.5.1 on p. 640 of this book.
The formal recommendation is in three parts:

# XML Schema Part 0: Primer is a non-normative introduction
to XML Schema 1.0 that provides a lot of examples and explanations.
It can be found at www.w3.org/TR/xmlschema-0.
# XML Schema Part 1: Structures describes most of the components
of XML Schema. The most recent version (1.1) can be found
at www.w3.org/TR/xmlschema11-1.
# XML Schema Part 2: Datatypes covers simple types. It explains
the built-in types and the facets that may be used to restrict
them. It is a separate document so that other specifications may
use it, without including all of XML Schema. The most recent
version (1.1) can be found at www.w3.org/TR/xmlschema11-2.

1. Outside this book, two earlier unofficial names may also be in use: XML
Schema Definition Language (XSDL) and W3C XML Schema (WXS).

1.4.4 Other schema languages

XML Schema and DTDs are not always the most appropriate
schema languages for all cases. This section describes two other schema
languages.

1.4.4.1 RELAX NG
RELAX NG covers some of the same ground as DTDs and XML
Schema. RELAX NG was developed by an OASIS technical committee
and was adopted as an ISO standard (ISO/IEC 19757-2). RELAX NG
is intended only for validation; the processor does not pass documenta-
tion or application information from the schema to the application.
RELAX NG does not have a complete built-in type library; it is
designed to use other type libraries (such as that of XML Schema).
Some of the beneﬁts of RELAX NG over XML Schema 1.0 have
been addressed as new features in XML Schema 1.1. However,
RELAX NG still has some advantages as compared to XML Schema 1.1:

# Many people consider the syntax of RELAX NG to be simpler
and more elegant than that of XML Schema.
# It has a convenient, compact non-XML syntax.
# It includes attributes in the elements’ content models. For example,
you can specify that a product element must either
have an effectiveDate attribute or a startDate attribute.
XML Schema does not directly provide a way to do this.
# It allows the definition of components from multiple namespaces
in the same document. In XML Schema, multiple schema
documents are required.
# It does not require content models to be deterministic. This is
explained in Section 12.5.6 on p. 279.
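
The attribute choice mentioned in the list above might be written as
follows in the XML syntax of RELAX NG. This is a sketch only: the number
child and the use of the XML Schema datatype library are assumptions
added to make the pattern complete.

```xml
<element name="product"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- Exactly one of the two attributes must be present -->
  <choice>
    <attribute name="effectiveDate">
      <data type="date"/>
    </attribute>
    <attribute name="startDate">
      <data type="date"/>
    </attribute>
  </choice>
  <!-- Assumed child element, typed with a borrowed XML Schema datatype -->
  <element name="number">
    <data type="integer"/>
  </element>
</element>
```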

However, RELAX NG also has some limitations compared to XML
Schema:

# It has no equivalent of XML Schema 1.1 assertions, which allow
complex XPath expressions to be used to determine the validity
of an element or attribute.
# It has no type derivation capabilities. XML Schema’s restriction
and extension mechanisms allow type substitution and many
other beneﬁts, described in Section 13.1 on p. 301.
# It has no equivalent of identity constraints. XML Schema’s
identity constraint mechanism is useful in data-oriented
applications, as described in Chapter 17.
# Because it is only intended for validation, it does not
provide application information to the processor. In fact, the
RELAX NG processor passes to the application the exact
same information that is available from a DTD. This is not a
disadvantage if your only objective is validation, but it does not
allow you to use the schema to help you understand how to
process the instance.

For more information on RELAX NG, see http://relaxng.org.

1.4.4.2 Schematron
XML Schema, DTDs, and RELAX NG are all grammar-based schema
languages. They specify what must appear in an instance, and in what
order.
Schematron, on the other hand, is rule-based. It allows you to deﬁne
a series of rules to which the document must conform. These rules are
expressed using XPath. In contrast to grammar-based languages,
Schematron considers anything that does not violate a rule to be valid.
There is no need to have declarations for every element and attribute
that may appear in the instance.
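
A rule-based schema for the product example might look like the sketch
below, written in ISO Schematron. The rule and its messages are
assumptions made for illustration; note that nothing is declared for
number or effDate, so they are accepted without comment.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <!-- The rule fires for every product element in the instance -->
    <sch:rule context="product">
      <sch:assert test="count(size) = 1">
        A product must have exactly one size.
      </sch:assert>
      <sch:assert test="number(size) &gt;= 2 and number(size) &lt;= 18">
        The size of a product must be between 2 and 18.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```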
Schematron, sometimes referred to as “ISO Schematron”, is an ISO
standard (ISO/IEC 19757-3). Like RELAX NG, Schematron is intended
only for validation of instances. It has a number of advantages over
XML Schema 1.1:

# It is easy to learn and use. It uses XPath, which is familiar to
many people already using XML.
# The use of XPath allows it to very flexibly and succinctly express
relationships between elements in a way that is not possible with
other schema languages.
# Assertions in a Schematron rule can access XML data anywhere
in a particular instance document. XML Schema 1.1
assertions, by contrast, can only be based on the contents of a
particular type.

The limitations of Schematron compared to XML Schema are:

# It does not provide a model of the instance data. A person cannot
gain an understanding of what instance data is expected by
looking at the schema.
# It is intended only for validation and cannot be used to pass
any information about the instance, such as types or default
values, to an application.
# Anything is valid unless it is specifically prohibited. This puts
a burden to anticipate all possible errors on the schema author.

As Schematron and XML Schema complement each other, it makes
sense to combine the two. For more information on Schematron, see
www.schematron.com.

Chapter 2
A quick tour of XML Schema

This chapter provides a quick tour of the main components of XML Schema.
It also introduces a simple example of a schema and a conforming
instance that will be used and built upon throughout the book.

2.1 An example schema

Suppose you have the instance shown in Example 2–1. It consists of a
product element that has two children (number and size) and an
attribute (effDate).

Example 2–1. Product instance


Example 2–2 shows a schema that might be used to validate our in-
stance. Its three element declarations and one attribute declaration
assign names and types to the components they declare.

Example 2–2. Product schema

2.2 The components of XML Schema

Schemas are made up of a number of components of different kinds,
listed in Table 2–1. All of the components of XML Schema are dis-
cussed in detail in this book, in the chapters indicated in Table 2–1.

Table 2–1 XML Schema components

Component           Can be   Can be    Can be   Can be   Chapter
                    named?   unnamed?  global?  local?
Element             yes      no        yes      yes      6
Attribute           yes      no        yes      yes      7
Simple type         yes      yes       yes      yes      8–11
Complex type        yes      yes       yes      yes      12–13
Notation            yes      no        yes      no       19
Named model group   yes      no        yes      no       15
Attribute group     yes      no        yes      no       15
Identity constraint yes      no        no       yes      17

2.2.1 Declarations vs. definitions

Schemas contain both declarations and definitions. The term declaration
is used for components that can appear in the instance and be validated
by name. This includes elements, attributes, and notations. The term
definition is used for other components that are internal to the schema,
such as complex and simple types, model groups, attribute groups, and
identity constraints. Throughout this book, you will see the terms
“element declaration” and “type definition,” but not “element
definition” or “type declaration.”
The order of declarations and definitions in the schema document
is insignificant. A declaration can refer to other declarations or defini-
tions that appear before or after it, or even those that appear in another
schema document.

2.2.2 Global vs. local components

Components can be declared (or defined) globally or locally. Global
components appear at the top level of a schema document, and they
are always named. Their names must be unique, within their component
type, within the entire schema. For example, it is not legal to have two
global element declarations with the same name in the same schema.
However, it is legal to have an element declaration and a complex type
definition with the same name.
Local components, on the other hand, are scoped to the definition
or declaration that contains them. Element and attribute declarations
can be local, which means their scope is the complex type in which
they are declared. Simple types and complex types can also be
locally defined, in which case they are anonymous and cannot be used
by any element or attribute declaration other than the one in which
they are defined.

2.3 Elements and attributes

Elements and attributes are the basic building blocks of XML docu-
ments. The instance in Example 2–1 contains three elements (product,
number, and size) and one attribute (effDate). As a result, the
schema contains three element declarations and one attribute declara-
tion. The product element declaration is global, since it appears at
the top level of the schema document. The other two element
declarations, as well as the attribute declaration, are local, and their
scope is the ProductType type in which they are declared. Elements
and attributes are covered in detail in Chapters 6 and 7, respectively.

2.3.1 The tag/type distinction

Each of the elements and attributes is associated with a type. XML
Schema separates the concepts of elements and attributes from their
types. This allows using different names for data that is structurally
the same. For example, you can write two element declarations,
shippingAddress and billingAddress, which have the exact same
structure but different names. You are only required to deﬁne one type,
AddressType, and use it in both element declarations. In addition to
using different names, you can place the corresponding elements in
different places in the document. A shippingAddress element may
only be relevant in the shipment information section of a purchase
order, while a billingAddress may appear only in the billing section.
You can also have two element declarations with the same name, but
different types, in different contexts. For example, a size element can
contain an integer when it is a child of shirt, or a value S, M, or L
when it is a child of hat.
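
Both halves of this distinction can be sketched in one schema. The
order wrapper element, the children of AddressType, and the exact
structure of shirt and hat are assumptions made for illustration; the
element and type names in the text are kept as they appear there.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- One type definition, shared by two differently named elements -->
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="shippingAddress" type="AddressType"/>
        <xs:element name="billingAddress" type="AddressType"/>
        <!-- Same element name, different types, in different contexts -->
        <xs:element name="shirt">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="size" type="xs:integer"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="hat">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="size">
                <xs:simpleType>
                  <xs:restriction base="xs:token">
                    <xs:enumeration value="S"/>
                    <xs:enumeration value="M"/>
                    <xs:enumeration value="L"/>
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```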

2.4 Types
Types allow for validation of the content of elements and the values of
attributes. They can be either simple types or complex types. The term
“type” is used throughout this book to mean “simple or complex type.”

2.4.1 Simple vs. complex types

Elements that have been assigned simple types have character data
content, but no child elements or attributes. Example 2–3 shows the
size, comment, and availableSizes elements that have simple
types.
By contrast, elements that have been assigned complex types may
have child elements or attributes. Example 2–4 shows the size,
comment, and availableSizes elements with complex types.

Example 2–3. Elements with simple types

<size>10</size>
<comment>Runs large.</comment>
<availableSizes>10 large 2</availableSizes>

Example 2–4. Elements with complex types

<size system="US-DRESS">10</size>
<comment>Runs <b>large</b>.</comment>
<availableSizes><size>10</size><size>2</size></availableSizes>

Attributes always have simple types, not complex types. This makes
sense, because attributes themselves cannot have children or other
attributes. Example 2–5 shows some attributes that have simple types.

Example 2–5. Attributes with simple types

system="US-DRESS"
availableSizes="10 large 2"

2.4.2 Named vs. anonymous types

Types can be either named or anonymous. Named types are always
deﬁned globally (at the top level of a schema document) and are re-
quired to have a unique name. Anonymous types, on the other hand,
must not have names. They are always deﬁned entirely within an
element or attribute declaration, and may only be used once, by that
declaration. The two types in Example 2–2 are both named types. An
anonymous type is shown in Example 2–6.

Example 2–6. Anonymous type

<xs:element name="size">
  <xs:simpleType>
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

2.4.3 The type deﬁnition hierarchy

XML Schema allows types to be derived from other types. In Exam-
ple 2–2, the simple type SizeType is derived from the integer simple
type. A complex type can also be derived from another type, either
simple or complex. It can either restrict or extend the other type. For
example, you could deﬁne a complex type UKAddressType that extends
AddressType to add more children.
The derivation of types from other types forms a type definition
hierarchy. Derived types are related to their ancestors and inherit
qualities from them. A derived type can also be substituted for its
base type in instances. If the shippingAddress element declaration
refers to the type AddressType, a corresponding element can also have
the type UKAddressType in the instance.
This is very powerful because applications designed to process
generic AddressType elements can also process UKAddressType
elements without caring about the differences. Other processors that
do care about the differences between them can distinguish between
the different types.
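
The derivation described above might be sketched as follows. The child
elements street, city, and postcode are assumptions made for
illustration; the type and element names come from the text.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="shippingAddress" type="AddressType"/>

  <!-- The base type -->
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Derived by extension: inherits street and city, then adds postcode -->
  <xs:complexType name="UKAddressType">
    <xs:complexContent>
      <xs:extension base="AddressType">
        <xs:sequence>
          <xs:element name="postcode" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```

In an instance, a shippingAddress element would signal the derived type
with an xsi:type="UKAddressType" attribute, where the xsi prefix is
bound to http://www.w3.org/2001/XMLSchema-instance.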

2.5 Simple types

2.5.1 Built-in simple types
Forty-nine simple types are built into the XML Schema recommenda-
tion. These simple types represent common data types such as strings,
numbers, date and time values, and also include types for each of the
valid attribute types in XML DTDs. The built-in types are summarized
in Table 2–2 and discussed in detail in Chapter 11.
Example 2–2 assigned the built-in simple type integer to the
number elements, and the built-in simple type date to the effDate
attribute.

Table 2–2 Built-in simple type summary

Category           Built-in types
Strings and names  string, normalizedString, token, Name, NCName,
                   QName, language
Numeric            float, double, decimal, integer, long, int, short,
                   byte, positiveInteger, nonPositiveInteger,
                   negativeInteger, nonNegativeInteger,
                   unsignedLong, unsignedInt, unsignedShort,
                   unsignedByte
Date and time      duration, dateTime, date, time, gYear, gYearMonth,
                   gMonth, gMonthDay, gDay, 1.1 dayTimeDuration,
                   1.1 yearMonthDuration, 1.1 dateTimeStamp
XML DTD types      ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN,
                   NMTOKENS, NOTATION
Other              boolean, hexBinary, base64Binary, anyURI


2.5.2 Restricting simple types

New simple types may be derived from other simple types by restricting
them. Example 2–2 showed the deﬁnition of a simple type SizeType
that restricts the built-in type integer. We applied the facets
minInclusive and maxInclusive to restrict the valid values of the
size elements to be between 2 and 18. Using the fourteen facets that
are part of XML Schema, you can specify a valid range of values, con-
strain the length and precision of values, enumerate a list of valid values,
or specify a regular expression that valid values must match. These
fourteen facets are summarized in Table 2–3. Chapter 8 explains how
to derive new simple types.
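
A few further restrictions can be sketched with other facets. All three
type names below, and the specific values and pattern, are hypothetical
examples rather than definitions taken from the book.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Enumeration: only the listed values are valid -->
  <xs:simpleType name="SizeNameType">
    <xs:restriction base="xs:token">
      <xs:enumeration value="small"/>
      <xs:enumeration value="medium"/>
      <xs:enumeration value="large"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Pattern: three capital letters, a hyphen, four digits -->
  <xs:simpleType name="ProductCodeType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z]{3}-\d{4}"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Bounds and precision: positive, at most 8 digits, 2 after the point -->
  <xs:simpleType name="PriceType">
    <xs:restriction base="xs:decimal">
      <xs:minExclusive value="0"/>
      <xs:totalDigits value="8"/>
      <xs:fractionDigits value="2"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```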

Table 2–3 Facets

Category                    Facets
Bounds                      minInclusive, maxInclusive,
                            minExclusive, maxExclusive
Length                      length, minLength, maxLength
Precision                   totalDigits, fractionDigits
Enumerated values           enumeration
Pattern matching            pattern
Whitespace processing       whiteSpace
1.1 XPath-based assertions  assertion
1.1 Time zone requirements  explicitTimezone

2.5.3 List and union types

Most simple types, including those we have seen so far, are atomic types.
They contain values that are indivisible, such as 10. There are two
other varieties of simple types: list and union types.
List types have values that are whitespace-separated lists of atomic
values, such as <availableSizes>10 large 2</availableSizes>.
Union types may have values that are either atomic values or list
values. What differentiates them is that the set of valid values, or “value
space,” for the type is the union of the value spaces of two or more
other simple types. For example, to represent a dress size, you may
deﬁne a union type that allows a value to be either an integer from 2
through 18 or one of the string values small, medium, or large.
List and union types are covered in Chapter 10.
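
The dress size example can be sketched as a union of two restricted
types, with a list type built on top of it. The type names are
assumptions introduced here; the values come from the text.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Atomic member type: an integer from 2 through 18 -->
  <xs:simpleType name="SizeNumberType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Atomic member type: one of three string values -->
  <xs:simpleType name="SizeNameType">
    <xs:restriction base="xs:token">
      <xs:enumeration value="small"/>
      <xs:enumeration value="medium"/>
      <xs:enumeration value="large"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Union: the value space is the union of the two member types -->
  <xs:simpleType name="DressSizeType">
    <xs:union memberTypes="SizeNumberType SizeNameType"/>
  </xs:simpleType>

  <!-- List: whitespace-separated DressSizeType values -->
  <xs:simpleType name="AvailableSizesType">
    <xs:list itemType="DressSizeType"/>
  </xs:simpleType>

  <xs:element name="availableSizes" type="AvailableSizesType"/>
</xs:schema>
```

With these definitions, the content 10 large 2 is a valid value for
availableSizes, because each of the three items belongs to one of the
member types of the union.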

2.6 Complex types

2.6.1 Content types
The “content” of an element is the character data and child elements
that are between its tags. There are four types of content for complex
types: simple, element-only, mixed, and empty. The content type is in-
dependent of attributes; all of these content types allow attributes.
Example 2–7 shows the instance elements size, product, letter,
and color that have complex types. They represent the four different
content types.

# The size element has simple content, because it contains only
character data.
# The product element has element-only content, because it has
child elements, but no character data content.
# The letter element has mixed content, because it has both
child elements and character data content.
# The color element has empty content, because it does not have
any content (just attributes).

Example 2–7. Elements with complex types

<letter>Dear <custName>Priscilla Walmsley</custName>...</letter>

<color value="blue"/>
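
To connect the four content types to schema syntax, here is a minimal sketch of declarations that would produce them. The attribute name system and the exact child elements are assumptions made for illustration, not the definitions used elsewhere in the book.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Simple content: character data plus an attribute,
       for example <size system="US-DRESS">10</size> -->
  <xs:element name="size">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:integer">
          <xs:attribute name="system" type="xs:token"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

  <!-- Element-only content: child elements, no character data -->
  <xs:element name="product">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="number" type="xs:integer"/>
        <xs:element ref="size"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Mixed content: mixed="true" allows character data between children -->
  <xs:element name="letter">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element name="custName" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Empty content: no content model at all, only an attribute -->
  <xs:element name="color">
    <xs:complexType>
      <xs:attribute name="value" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```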

2.6.2 Content models

The order and structure of the child elements of a complex type are
known as its content model. Content models are defined using a
combination of model groups, element declarations or references, and
wildcards. In Example 2–2, the content model of ProductType was a
single sequence model group containing two element declarations.
There are three kinds of model groups:

# sequence groups require that the child elements appear in the
order specified.
# choice groups allow any one of several child elements to appear.
# all groups allow child elements to appear in any order.

These groups can be nested and may occur multiple times, allowing
you to create sophisticated content models. Example 2–8 shows a more
complex content model for ProductType. Instances of this new
definition of ProductType must have a number child, optionally followed
by up to three children which may be either size or color elements,
followed by any one element from another namespace.

Example 2–8. More complicated content model

<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:choice minOccurs="0" maxOccurs="3">
      <xs:element name="size" type="SizeType"/>
      <xs:element name="color" type="ColorType"/>
    </xs:choice>
    <xs:any namespace="##other"/>
  </xs:sequence>
  <xs:attribute name="effDate" type="xs:date"/>
</xs:complexType>

An any element is known as a wildcard, and it allows for open
content models. There is an equivalent wildcard for attributes,
anyAttribute, which allows any attribute to appear in a complex type.
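
The two wildcards can be combined in one type. The following is only a sketch: the type name OpenProductType and the choice of processContents="lax" are assumptions made to keep the example small.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="OpenProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <!-- Any number of elements from other namespaces; lax means they
           are validated only if a declaration for them can be found -->
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="effDate" type="xs:date"/>
    <!-- Attribute wildcard: any attribute from another namespace -->
    <xs:anyAttribute namespace="##other" processContents="lax"/>
  </xs:complexType>
</xs:schema>
```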

2.6.3 Deriving complex types

Complex types may be derived from other types either by restriction
or by extension.
Restriction, as the name suggests, restricts the valid contents of a type.
The values for the new type are a subset of those for the
base type. All values of the restricted type are also valid according to
the base type.
Extension allows for adding additional child elements and/or attributes
to a type, thus extending the contents of the type. Values of the base
type are not necessarily valid for the extended type, since required
elements or attributes may be added. Example 2–9 shows the definition
of ShirtType, a complex type derived by extension. It adds another
element declaration, color, and another attribute declaration, id, to
ProductType. New element declarations or references may only be
added to the end of a content model, so instances of ShirtType must
have the children number, size, and color, in that order.

Example 2–9. Complex type extension

<xs:complexType name="ShirtType">
  <xs:complexContent>
    <xs:extension base="ProductType">
      <xs:sequence>
        <xs:element name="color" type="ColorType"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID" use="required"/>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
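
For comparison, a restriction might look like the following sketch. It assumes the two-element ProductType used in the earlier examples (repeated here so the fragment stands alone); the name DatedProductType is invented. The restricted type repeats the content model and tightens one thing: effDate becomes required.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:simpleType name="SizeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Base type: effDate is optional -->
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="size" type="SizeType"/>
    </xs:sequence>
    <xs:attribute name="effDate" type="xs:date"/>
  </xs:complexType>

  <!-- Restricted type: same children, but effDate is now required -->
  <xs:complexType name="DatedProductType">
    <xs:complexContent>
      <xs:restriction base="ProductType">
        <xs:sequence>
          <xs:element name="number" type="xs:integer"/>
          <xs:element name="size" type="SizeType"/>
        </xs:sequence>
        <xs:attribute name="effDate" type="xs:date" use="required"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```

Every instance that is valid for DatedProductType is also valid for ProductType, which is the defining property of a restriction.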
2.7 Namespaces and XML Schema

Namespaces are an important part of XML Schema, and they are
discussed in detail in Chapter 3. Example 2–10 shows our now-familiar
schema, this time with a target namespace declared. Let’s take a closer
look at the attributes of a schema element.

1. The namespace http://www.w3.org/2001/XMLSchema is
mapped to the xs prefix. This indicates that the elements used
in the schema document itself, such as schema, element, and
complexType, are part of the XML Schema namespace.
2. A target namespace, http://datypic.com/prod, is declared.
Any schema document may have a target namespace, which
applies to the global (and some local) components declared or
defined in it. Although a schema document can only have one
target namespace, multiple schema documents with different
target namespaces can be assembled together to represent a
schema.
3. The target namespace is mapped to the prod prefix.

Example 2–10. Product schema document with target namespace

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://datypic.com/prod"
           xmlns:prod="http://datypic.com/prod">

  <xs:element name="product" type="prod:ProductType"/>

  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="size" type="prod:SizeType"/>
    </xs:sequence>
    <xs:attribute name="effDate" type="xs:date"/>
  </xs:complexType>
  <xs:simpleType name="SizeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

Example 2–11 shows a new instance, where a namespace is declared.
In order for an instance to be valid according to a schema, the
namespace declaration in the instance must match the target namespace
of the schema document.

Example 2–11. Instance with namespace

<prod:product xmlns:prod="http://datypic.com/prod"
              effDate="2001-04-12">
  <number>557</number>
  <size>10</size>
</prod:product>

In this case, only the product element has a preﬁxed name. This
is because the other two elements and the attribute are declared
locally. By default, locally declared components do not take on the
target namespace. However, this can be overridden by specifying
elementFormDefault and attributeFormDefault for the schema
document. This is discussed in detail in Chapters 6 and 7.
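
To make the effect of elementFormDefault concrete, the sketch below is a simplified version of the product schema with elementFormDefault set to qualified. The comment at the end shows how an instance would then need to look. This is an illustration under that one assumed change, not a listing from a later chapter.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:prod="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified">

  <xs:element name="product" type="prod:ProductType"/>

  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="size" type="xs:integer"/>
    </xs:sequence>
    <xs:attribute name="effDate" type="xs:date"/>
  </xs:complexType>
</xs:schema>
<!-- With elementFormDefault="qualified", the locally declared number and
     size elements are also in the target namespace. The effDate attribute
     stays unprefixed because attributeFormDefault is still unqualified:

     <prod:product xmlns:prod="http://datypic.com/prod"
                   effDate="2001-04-12">
       <prod:number>557</prod:number>
       <prod:size>10</prod:size>
     </prod:product>
-->
```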

2.8 Schema composition

An XSD schema is a set of components such as type deﬁnitions and
element declarations. Example 2–2 showed a schema document that
was used alone to validate an instance. It contained the declarations
and deﬁnitions for all of the components of the schema.
However, a schema could also be represented by an assembly of
schema documents. One way to compose them is through the include
and import mechanisms. Include is used when the other schema
document has the same target namespace as the “main” schema document.
Import is used when the other schema document has a different target
namespace. Example 2–12 shows how you might include and import
other schema documents.

Example 2–12. Schema composition using include and import

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord">

  <xs:include schemaLocation="moreOrderInfo.xsd"/>

  <xs:import namespace="http://datypic.com/prod"
             schemaLocation="productInfo.xsd"/>

</xs:schema>

The include and import mechanisms are not the only way for
processors to assemble schema documents into a schema. Unfortunately,
there is not always a “main” schema document that represents the whole
schema. Instead, a processor might join schema documents from various
predefined locations, or take multiple hints from the instance. See
Chapter 4 for more information on schema composition.
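
Once other schema documents are included or imported, their components can be referenced from the main one. The sketch below extends Example 2–12 to show this. It assumes, purely for illustration, that moreOrderInfo.xsd defines a type named OrderNumType and that productInfo.xsd declares a global product element; neither file's contents are shown in this chapter.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           xmlns:prod="http://datypic.com/prod"
           targetNamespace="http://datypic.com/ord">

  <xs:include schemaLocation="moreOrderInfo.xsd"/>

  <xs:import namespace="http://datypic.com/prod"
             schemaLocation="productInfo.xsd"/>

  <xs:element name="order" type="OrderType"/>

  <xs:complexType name="OrderType">
    <xs:sequence>
      <!-- OrderNumType: assumed to come from the included document,
           so it is in this document's own target namespace -->
      <xs:element name="number" type="OrderNumType"/>
      <!-- prod:product: assumed to come from the imported document,
           so it is referenced through the prod prefix -->
      <xs:element ref="prod:product" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Note that the imported namespace needs its own prefix declaration (xmlns:prod) before its components can be referenced.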

2.9 Instances and schemas

A document that conforms to a schema is known as an instance. An
instance can be validated against a particular schema, which may be
made up of the schema components defined in multiple schema
documents. A number of different ways exist for the schema documents
to be located for a particular instance. One way is using the
xsi:schemaLocation attribute. Example 2–13 shows an instance
that uses the xsi:schemaLocation attribute to map a namespace to
a particular schema document.

Example 2–13. Using xsi:schemaLocation

<prod:product xmlns:prod="http://datypic.com/prod"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://datypic.com/prod prod.xsd"
              effDate="2001-04-12">
  <number>557</number>
  <size>10</size>
</prod:product>

Using xsi:schemaLocation is not the only way to tell the
processor where to find the schema. XML Schema is deliberately
flexible on this topic, allowing processors to use different methods for
choosing schema documents to validate a particular instance. These
methods include built-in schemas, use of internal catalogs, use of the
xsi:schemaLocation attribute, and dereferencing of namespaces.
Chapter 5 covers the validation of instances in detail.
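
The value of xsi:schemaLocation is a list of pairs, each made up of a namespace name followed by a location, so one attribute can give hints for several namespaces. The instance below is a sketch: ord.xsd is a hypothetical schema document assumed to declare an order element that contains prod:product.

```xml
<ord:order xmlns:ord="http://datypic.com/ord"
           xmlns:prod="http://datypic.com/prod"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://datypic.com/ord ord.xsd
                               http://datypic.com/prod prod.xsd">
  <prod:product effDate="2001-04-12">
    <number>557</number>
    <size>10</size>
  </prod:product>
</ord:order>
```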

2.10 Annotations
XML Schema provides many mechanisms for describing the structure
of XML documents. However, it cannot express everything there is to
know about an instance or the data it contains. For this reason, XML
Schema allows annotations to be added to almost any schema
component. These annotations can contain human-readable information
(under documentation) or application information (under appinfo).
Example 2–14 shows an annotation for the product element
declaration. Annotations are covered in Chapter 21.

Example 2–14. Annotation

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:doc="http://datypic.com/doc">
  <xs:element name="product" type="ProductType">
    <xs:annotation>
      <xs:documentation xml:lang="en"
                        source="http://datypic.com/prod.html#product">
        <doc:description>This element represents a product.
        </doc:description>
      </xs:documentation>
    </xs:annotation>
  </xs:element>
</xs:schema>
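
An appinfo element is used the same way, but carries information intended for an application rather than a human reader. In this sketch the app namespace and its mapping element are invented; what goes inside appinfo is entirely up to the application that consumes it.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:app="http://example.org/app">
  <xs:element name="product" type="xs:string">
    <xs:annotation>
      <!-- For people -->
      <xs:documentation xml:lang="en">A product sold by the store.</xs:documentation>
      <!-- For software; app:mapping is a made-up element -->
      <xs:appinfo>
        <app:mapping table="PRODUCT"/>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:schema>
```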
2.11 Advanced features

XML Schema has some more advanced features. These features are
available if you need them, but are certainly not an integral part
of every schema. Keep in mind that you are not required to use all of
XML Schema. You should choose a subset that is appropriate for
your needs.

2.11.1 Named groups

XML Schema provides the ability to deﬁne groups of element and
attribute declarations that are reusable by many complex types. This
facility promotes reuse of schema components and eases maintenance.
Named model groups are fragments of content models, and attribute
groups are bundles of related attributes that are commonly used
together. Chapter 15 explains named groups.
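
As a preview, a named model group and an attribute group might be defined and referenced roughly as follows. The names DescriptionGroup and IdentifierGroup, and the elements and attributes inside them, are assumptions chosen for this sketch.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Named model group: a reusable fragment of a content model -->
  <xs:group name="DescriptionGroup">
    <xs:sequence>
      <xs:element name="description" type="xs:string"/>
      <xs:element name="comment" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:group>

  <!-- Attribute group: a reusable bundle of attributes -->
  <xs:attributeGroup name="IdentifierGroup">
    <xs:attribute name="id" type="xs:ID" use="required"/>
    <xs:attribute name="version" type="xs:decimal"/>
  </xs:attributeGroup>

  <!-- A complex type that references both groups -->
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:group ref="DescriptionGroup"/>
    </xs:sequence>
    <xs:attributeGroup ref="IdentifierGroup"/>
  </xs:complexType>
</xs:schema>
```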

2.11.2 Identity constraints

Identity constraints allow you to uniquely identify nodes in a document
and ensure the integrity of references between them. They are similar
to the primary and foreign keys in databases. They are described in
detail in Chapter 17.
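
The database analogy can be sketched as follows: a key plays the role of a primary key, and a keyref plays the role of a foreign key. The element and attribute names (order, product, item, productRef) are hypothetical and chosen only to keep the example short.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="product" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="number" type="xs:integer" use="required"/>
          </xs:complexType>
        </xs:element>
        <xs:element name="item" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="productRef" type="xs:integer" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>

    <!-- Like a primary key: each product number must be present and unique -->
    <xs:key name="productKey">
      <xs:selector xpath="product"/>
      <xs:field xpath="@number"/>
    </xs:key>

    <!-- Like a foreign key: each item must refer to an existing product -->
    <xs:keyref name="itemProduct" refer="productKey">
      <xs:selector xpath="item"/>
      <xs:field xpath="@productRef"/>
    </xs:keyref>
  </xs:element>
</xs:schema>
```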

2.11.3 Substitution groups

Substitution groups are a flexible way to designate certain element
declarations as substitutes for other element declarations in content
models. If you have a group of related elements that may appear
interchangeably in instances, you can reference the substitution group as a
whole in content models. You can easily add new element declarations
to a substitution group, from other schema documents and even
other namespaces, without changing the original declarations in any
way. Substitution groups are covered in Chapter 16.
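
A minimal sketch of the idea, with invented element names shirt, hat, and items: the product declaration is the head of the group, and any member may appear wherever product is referenced.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Head of the substitution group -->
  <xs:element name="product" type="ProductType"/>

  <!-- Members; with no type attribute they take the type of the head -->
  <xs:element name="shirt" substitutionGroup="product"/>
  <xs:element name="hat" substitutionGroup="product"/>

  <!-- The content model references only the head, yet shirt and hat
       elements are also allowed here -->
  <xs:element name="items">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="product" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```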
2.11.4 Redeﬁnition and overriding

Redefinition and overriding allow you to define a new version of a
schema component while keeping the same name. This is useful for
extending or creating a subset of an existing schema document, or
overriding the definitions of components in a schema document.
Redefinition and overriding are covered in Chapter 18.

2.11.5 Assertions
Assertions are XPath constraints on XML data, which allow complex
validation above and beyond what can be speciﬁed in a content model.
This is especially useful for co-constraints, where the values or existence
of certain child elements or attributes affect the validity of other child
elements or attributes. For example, “If the value of newCustomer is
false, then customerID must appear.” Chapter 14 covers assertions in
detail.
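
The co-constraint quoted above could be expressed roughly as follows. This is a sketch that assumes a version 1.1 processor (assertions are listed as a 1.1 feature in Table 2–3) and a hypothetical order element whose children are newCustomer and customerID.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="newCustomer" type="xs:boolean"/>
        <xs:element name="customerID" type="xs:string" minOccurs="0"/>
      </xs:sequence>
      <!-- If newCustomer is false, customerID must be present -->
      <xs:assert test="if (newCustomer = false())
                       then exists(customerID)
                       else true()"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The content model alone can only say that customerID is optional; the assertion adds the dependency on the value of newCustomer.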

Chapter 3
Namespaces

Understanding namespaces is essential to understanding XML
Schema. This chapter introduces namespaces and explains
their relationship to schemas.

3.1 Namespaces in XML

Before we delve into the use of namespaces in schema documents, let’s
take a minute to learn about namespaces in general. Namespaces are a
surprisingly simple concept considering how much confusion and
controversy they cause. The purpose of namespaces is to provide
containers for the names used in XML. A name, such as table, can have
several meanings. Its meaning in XHTML is very different from its meaning
in a hypothetical language for describing office furniture, FurnitureML.
An element or attribute name in an instance can be declared to be in
a namespace, which provides context and identifies the XML vocabulary
to which the element or attribute belongs.
Namespaces are defined by a separate W3C recommendation
called Namespaces in XML, which is in two versions: 1.0 and 1.1.
XML Schema 1.0 uses Namespaces 1.0, and XML Schema 1.1 uses
Namespaces 1.1. The few substantive differences between them are
mentioned in the appropriate sections of this chapter.

3.1.1 Namespace names

Namespace names are Uniform Resource Identiﬁers (URIs).
URIs encompass URLs of various schemes (e.g., HTTP, FTP,
gopher, telnet), as well as URNs (Uniform Resource Names). Many
namespaces are written in the form of HTTP URLs, such as
http://datypic.com/prod. It is also legal to use a URN, such
as urn:example:org.
The main purpose of a namespace is not to point to a location where
a resource resides. Instead, much like a Java package name, it is intended
to provide a unique name that can be associated with a particular
person or organization. Therefore, namespace names are not required
to be dereferenceable. That is, there does not necessarily need to
be an HTML page or other resource that can be accessed at
http://datypic.com/prod. The namespace URI could point to a
schema, an HTML page, a directory of resources, or nothing at all.
This is explained further in Section 21.8.5 on p. 589.
Namespace names are case-sensitive. Two namespaces are considered
different if their capitalization is different, even if you might consider
them equivalent URLs. For example, http://DATYPIC.COM/prod
and http://datypic.com/prod represent different namespaces,
because they are capitalized differently.
Although relative URI references, such as ../prod or just plain
prod, are legal as URIs, they are not appropriate namespace names. A
namespace name should be unique, and it is difficult to ensure the
uniqueness of ../prod. In fact, version 1.1 of the Namespaces
recommendation says that they are deprecated.
The URI syntax only allows basic Latin letters and digits, with a few
special punctuation characters. Non-Latin characters can be represented,
but they must be escaped. In Namespaces 1.1, and therefore when using
XML Schema 1.1, namespace names are actually IRIs (Internationalized
Resource Identiﬁers) rather than URIs, which means that non-Latin
characters can be directly represented in namespace names.

3.1.2 Namespace declarations and preﬁxes

An instance may include one or more namespace declarations that relate
elements and attributes to namespaces. This happens through a prefix,
which serves as a proxy for the namespace.
A namespace is declared using a special attribute whose name starts
with the letters xmlns. Example 3–1 shows an instance whose root
element has a namespace declaration. This declaration maps the
namespace http://datypic.com/prod to the prefix prod. All of
the element names in the document, namely product, number, and
size, are prefixed with prod. The system attribute does not have a
prefixed name, so it is not “in” the namespace.

Example 3–1. Namespace declaration

<prod:product xmlns:prod="http://datypic.com/prod">
  <prod:number>557</prod:number>
  <prod:size system="US-DRESS">10</prod:size>
</prod:product>

Prefixes are convenient because they are generally shorter than
namespace names, so they make the document more readable. A more
important reason for prefixes, though, is that namespace names may
contain characters that are not permitted in XML names. Prefixes
are constrained by the rules for XML non-colonized names, as described
in Section 3.1.4 on p. 40. There is no limit to how many characters
long a prefix can be, but it is best to keep prefixes short for readability.
Although the instance author may choose prefixes arbitrarily,
there are commonly used prefixes for some namespaces. For
example, the xsl prefix is usually mapped to the Extensible Stylesheet
Language (XSL) namespace. It is legal to map the prefix bob to the
XSL namespace and write a stylesheet with every XSL element name
prefixed with bob. However, this is not recommended because it is
confusing. For the XML Schema Namespace, the commonly used
prefixes are xsd and xs.
You can declare more than one namespace in the same
instance, as shown in Example 3–2. Two prefixes, ord and prod,
are mapped to the namespaces http://datypic.com/ord and
http://datypic.com/prod, respectively. The element names in
the document are prefixed with either ord or prod to relate them to
one of the two namespaces.

Example 3–2. Multiple namespace declarations

<ord:order xmlns:ord="http://datypic.com/ord"
           xmlns:prod="http://datypic.com/prod">
  <ord:number>123ABBCC123</ord:number>
  <ord:items>
    <prod:product>
      <prod:number>557</prod:number>
      <prod:size system="US-DRESS">10</prod:size>
    </prod:product>
  </ord:items>
</ord:order>

Note that number appears twice, with two different prefixes. This
illustrates the usefulness of namespaces, which make it obvious
whether it is a product number or an order number. In most cases, the
two can be distinguished based on their context in the instance, but
not always.
You do not need to declare xmlns:ord and xmlns:prod as
attributes in the order element declaration in your schema. In fact, it
is illegal to declare them. All schema processors understand that
attributes prefixed with xmlns and the unprefixed attribute with the
name xmlns are always permitted.

3.1.3 Default namespace declarations

An instance may also include a default namespace declaration that maps
unprefixed element names to a namespace. The default namespace
declaration uses the attribute xmlns, with no colon or prefix. In
Example 3–3, the order start tag contains a default namespace declaration.
This declaration relates the namespace http://datypic.com/ord
to all of the unprefixed element names in the document, namely order,
number, and items.

Example 3–3. Default namespace declaration

<order xmlns="http://datypic.com/ord"
       xmlns:prod="http://datypic.com/prod">
  <number>123ABBCC123</number>
  <items>
    <prod:product>
      <prod:number>557</prod:number>
      <prod:size system="US-DRESS">10</prod:size>
    </prod:product>
  </items>
</order>

Note that the default namespace declaration can be combined with
other namespace declarations in the same document and even in the
same tag.
Default namespace declarations do not directly apply to attributes.
In this case, the system attribute, although its name is not prefixed,
is not in the default namespace http://datypic.com/ord. It is not
directly in any namespace at all. For further explanation of the
relationship between attributes and namespaces, see Section 3.1.8 on
p. 44.
3.1.4 Name terminology

In the context of namespaces, there are several different kinds of names.
They include:
Qualified names, known as QNames, are names that are qualified
with a namespace name. This may happen in one of two ways:

1. The name contains a prefix that is mapped to a namespace. In
Example 3–3, prod:product is a prefixed, qualified name.
2. The name does not contain a prefix, but there is a default
namespace declared for that element. In Example 3–3, items
is an unprefixed, qualified name. This applies only to elements;
there is no such thing as an unprefixed, qualified attribute name,
as you will see in Section 3.1.8 on p. 44.

Unqualified names, on the other hand, are names that are not in any
namespace. For element names, this means they are unprefixed and
there is no default namespace declaration. For attribute names, this
means they are unprefixed, period.
Prefixed names are names that contain a namespace prefix, such as
prod:product. Prefixed names are qualified names, assuming there
is a namespace declaration for that prefix in scope.
Unprefixed names are names that do not contain a prefix, such as
items. Unprefixed element names can be either qualified or
unqualified, depending on whether there is a default namespace declaration.
A local name is the part of a qualified name that is not the prefix. In
Example 3–3, local names include items and product.
Non-colonized names, known as NCNames, are simply XML names
that do not contain colons. That means that they are case-sensitive,
they must start with a letter or underscore (_), and they may contain
letters, digits, underscores (_), dashes (-), and periods (.). They should
not start with the letters “XML” in either lower or upper case, because
such names are reserved. All local names and unprefixed names are
NCNames. Prefixes are also NCNames, because they follow these same
rules.
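
The terms above can be mapped onto a trimmed-down version of Example 3–3. The comments in this sketch label each name; nothing beyond the definitions just given is assumed.

```xml
<!-- order: unprefixed element name, qualified by the default namespace -->
<order xmlns="http://datypic.com/ord"
       xmlns:prod="http://datypic.com/prod">
  <!-- items: unprefixed, qualified; its local name is items -->
  <items>
    <!-- prod:product: prefixed, qualified; prefix prod, local name product -->
    <prod:product>
      <!-- system: unprefixed attribute name, therefore unqualified -->
      <prod:size system="US-DRESS">10</prod:size>
    </prod:product>
  </items>
</order>
```

Each of prod, product, items, size, and system is an NCName, since none contains a colon.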

3.1.5 Scope of namespace declarations

In the previous examples, namespace declarations appeared in the start
tag of the root element. Namespace declarations, including default
namespace declarations, can appear in any start tag in the document.
Example 3–4 shows the previous order example, but with the
namespace declaration for the http://datypic.com/prod namespace
moved down to the product tag.

Example 3–4. Namespace declarations in multiple tags

<order xmlns="http://datypic.com/ord">
  <number>123ABBCC123</number>
  <items>
    <prod:product xmlns:prod="http://datypic.com/prod">
      <prod:number>557</prod:number>
      <prod:size system="US-DRESS">10</prod:size>
    </prod:product>
  </items>
</order>

The scope of a namespace declaration is the element in whose start
tag it appears, and all of its children, grandchildren, and so on. In
Example 3–4, it would be invalid to use the prod prefix outside of
the product element and its children. In Example 3–5, the second
product element uses the prod prefix, which is illegal because the
namespace declaration is outside its scope.
Generally, it is preferable to put all your namespace declarations in
the root element’s start tag. It allows you to see at a glance what
namespaces a document uses, there is no confusion about their scopes,
and it keeps them from cluttering the rest of the document.

Example 3–5. Invalid preﬁx outside of scope
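
The listing itself is not reproduced here. A sketch consistent with the description above, built on Example 3–4 with an assumed second product (the number 559 is invented), might look like this. It is deliberately not well-formed with respect to namespaces, so a namespace-aware parser would reject it.

```xml
<order xmlns="http://datypic.com/ord">
  <number>123ABBCC123</number>
  <items>
    <prod:product xmlns:prod="http://datypic.com/prod">
      <prod:number>557</prod:number>
      <prod:size system="US-DRESS">10</prod:size>
    </prod:product>
    <!-- Error: the prod prefix is declared only on the first product
         element, so it is not in scope for this sibling -->
    <prod:product>
      <prod:number>559</prod:number>
      <prod:size system="US-DRESS">10</prod:size>
    </prod:product>
  </items>
</order>
```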

3.1.6 Overriding namespace declarations

Namespace declarations can also be overridden. If a namespace
declaration appears within the scope of another namespace declaration with
the same prefix, it overrides it. Example 3–6 illustrates this. In the
order tag, the prefix prod is mapped to http://datypic.com/prod.
In number, it is mapped to http://datypic.com/prod2. The second
namespace declaration overrides the first within the scope of the number
element. This includes the number element itself.

Example 3–6. Overriding a namespace declaration

<order xmlns="http://datypic.com/ord"
       xmlns:prod="http://datypic.com/prod">
  <number>123ABBCC123</number>
  <items>
    <prod:product>
      <prod:number xmlns:prod="http://datypic.com/prod2">
        557</prod:number>
      <prod:size system="US-DRESS">10</prod:size>
    </prod:product>
  </items>
</order>

Likewise, if a default namespace declaration appears within the scope
of another default namespace declaration, it overrides it, as shown in
Example 3–7. The default namespace declaration in the product start
tag overrides the one on the root element, meaning that product and
its children are in the http://datypic.com/prod namespace.

Example 3–7. Overriding a default namespace declaration

<order xmlns="http://datypic.com/ord">
  <number>123ABBCC123</number>
  <items>
    <product xmlns="http://datypic.com/prod">
      <number>557</number>
      <size system="US-DRESS">10</size>
    </product>
  </items>
</order>

3.1.7 Undeclaring namespaces

A default namespace declaration may also be the empty string (that is,
xmlns=""). This means that unpreﬁxed element names in its scope
are not in any namespace. This can be used to essentially “undeclare”
the default namespace.
Example 3–8 is similar to Example 3–7 except that it uses the empty
string. This means that product and its children are in no namespace.

Example 3–8. Undeclaring a default namespace

<order xmlns="http://datypic.com/ord">
  <number>123ABBCC123</number>
  <items>
    <product xmlns="">
      <number>557</number>
      <size system="US-DRESS">10</size>
    </product>
  </items>
</order>

In Namespaces 1.1 (but not in 1.0), you can also undeclare a prefix by
using an empty string. In Example 3–9, the namespace declaration for
the ord prefix in the product start tag undeclares the one on the root
element, meaning that the ord prefix is undefined within the scope of
product.

Example 3–9. Undeclaring a preﬁxed namespace

<ord:order xmlns:ord="http://datypic.com/ord">
  <ord:number>123ABBCC123</ord:number>
  <ord:items>
    <prod:product xmlns:ord="" xmlns:prod="http://datypic.com/prod">
      <prod:number>557</prod:number>
      <prod:size system="US-DRESS">10</prod:size>
    </prod:product>
  </ord:items>
</ord:order>

3.1.8 Attributes and namespaces

The relationship between attributes and namespaces is slightly simpler
than the relationship between elements and namespaces. Prefixed
attribute names, as you would expect, are in whichever namespace is
mapped to that prefix. Attributes with prefixed names are sometimes
referred to as global attributes. Unprefixed attribute names, however,
are never in a namespace. This is because they are not affected by default
namespace declarations.
Some people make the argument that an unprefixed attribute is (or
should be) in the namespace of its parent element. While it may be
indirectly associated with that namespace, it is not directly in it. For
the purposes of writing schemas and using other XML technologies
such as XSLT and XQuery, you should treat an unprefixed attribute
as if it were in no namespace at all.
Example 3–10 shows a size element that has two attributes:
app:system and system. app:system is associated with the
namespace http://datypic.com/app through the app prefix. The
unprefixed system attribute is not in any namespace at all, despite
the default namespace declaration.

Example 3–10. Two attributes with the same local name
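
The listing itself is not reproduced here. An instance matching the description, modeled on Example 3–11 with the app prefix mapped to http://datypic.com/app, might look like the following sketch; the attribute values are assumptions.

```xml
<product xmlns="http://datypic.com/prod"
         xmlns:app="http://datypic.com/app">
  <number>557</number>
  <!-- app:system is in the app namespace; system is in no namespace -->
  <size app:system="R32" system="US-DRESS">10</size>
</product>
```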

Although an element cannot have two attributes with the same name,
this example is valid because the attribute names are in different
namespaces (or rather, one is in a namespace and one is not), and they
therefore are considered to have different names.
Example 3–11 is also valid, even though the default namespace and
the namespace mapped to the prod preﬁx are the same. This is again
because the unpreﬁxed system attribute is not in any namespace.

Example 3–11. Two more attributes with the same local name
<product xmlns="http://datypic.com/prod"
         xmlns:prod="http://datypic.com/prod">
  <number>557</number>
  <size system="US-DRESS" prod:system="R32">10</size>
</product>

Example 3–12 shows an invalid duplication of attributes. The
problem is not that two different prefixes are mapped to the same
namespace; this is perfectly acceptable. However, it is not valid for an
element to have two attributes with the same name that are in the
same namespace, even if they have different prefixes.

Example 3–12. Invalid duplicate attributes
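
The listing itself is not reproduced here. Based on the surrounding text, which compares prod:system with prod2:system, an instance with this error might look like the following sketch. The attribute values are assumptions, and the document is deliberately invalid.

```xml
<product xmlns:prod="http://datypic.com/prod"
         xmlns:prod2="http://datypic.com/prod">
  <number>557</number>
  <!-- Error: both attributes have the local name system and are in the
       same namespace, so they count as the same attribute -->
  <size prod:system="US-DRESS" prod2:system="R32">10</size>
</product>
```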

This example illustrates an important point: The prefix itself
has no particular meaning. Instead, it is the namespace to which it is
mapped that matters when validating and comparing names. The two
qualified names prod:system and prod2:system are equal, even
though the prefixes differ.

3.1.9 A summary example

To summarize our discussion of namespaces, Example 3–13 provides
a more complex instance that shows various combinations of namespace
declarations in different scopes.

Example 3–13. A summary example

<envelope>
  <order xmlns="http://datypic.com/ord"
         xmlns:prod="http://datypic.com/prod">
    <number>123ABBCC123</number>
    <items>
      <product xmlns="http://datypic.com/prod">
        <number prod:id="prod557">557</number>
        <name xmlns="">Short-Sleeved Linen Blouse</name>
        <prod:size system="US-DRESS">10</prod:size>
        <prod:color xmlns:prod="http://datypic.com/prod2"
                    prod:value="blue"/>
      </product>
    </items>
  </order>
</envelope>

Table 3–1 explains which namespace each name is in, and why.

Table 3–1 Explanation of the summary example

Name                        Namespace                  Explanation
                            (http://datypic.com/...)
envelope                    none                       No prefix, no default namespace in scope.
order                       ord                        Takes default namespace from order (itself).
number (child of order)     ord                        Takes default namespace from order.
product                     prod                       Takes default namespace from product (itself).
number (child of product)   prod                       Takes default namespace from product.
prod:id                     prod                       Prefix ties it to namespace declaration in order.
name                        none                       Default namespace set to empty string, which is
                                                       equivalent to saying that it has no namespace.
prod:size                   prod                       Prefix ties it to the namespace declaration in order.
system                      none                       Unprefixed attribute names are never in a namespace,
                                                       even if there is a default namespace declaration.
prod:color                  prod2                      Prefix ties it to the namespace declaration in color
                                                       (itself).
prod:value                  prod2                      Prefix ties it to the namespace declaration in color.

3.2 The relationship between namespaces and schemas

Namespaces and schemas have a many-to-many relationship.
A namespace can have names defined in any number of schemas. A
namespace can exist without any schema. Some namespaces have one
schema that defines their names. Other namespaces have multiple schemas.
These schemas may be designed to be used together, or be completely
incompatible with each other. They could present different perspectives
on the same information, or be designed for different purposes such
as varying levels of validation or system documentation. They could
be different versions of each other. There are no rules that prevent
several schemas from utilizing the same namespace, with overlapping
declarations. As long as the processor does not try to validate an instance
against all of them at once, this is completely legal.
A schema can declare names for any number of target namespaces.
Some schemas have no target namespace at all. Other schemas are
represented by composing multiple schema documents, each with its
own target namespace. This is described in detail in Chapter 4.

3.3 Using namespaces in schemas

3.3.1 Target namespaces
Each schema document can declare and deﬁne components for
one namespace, known as its target namespace. Every globally
declared or deﬁned component (element, attribute, type, named group,
or notation) is associated with that target namespace. Example 3–14
shows a schema document that declares a target namespace of
http://datypic.com/prod. Three element declarations are
global, and therefore all of them are in the namespace
http://datypic.com/prod. Local element declarations may or may
not use the target namespace of the schema document, as described in
Section 6.3 on p. 98.

Example 3–14. Declaring a target namespace

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">

  <xs:element name="product" type="ProductType"/>
  <xs:element name="number" type="xs:integer"/>
  <xs:element name="size" type="SizeType"/>

  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element ref="number"/>
      <xs:element ref="size"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>

Adding a target namespace to a schema is not just informational; the
target namespace becomes an important part of the names, and it must
be reflected in the instance documents. Example 3–15 shows how
the elements from the previous example could appear in an instance.
Since they are associated with the http://datypic.com/prod namespace,
they must be qualified in some way, either through a prefix or
by a default namespace declaration.

Example 3–15. Prefixed names in an instance

<prod:product xmlns:prod="http://datypic.com/prod">
  <prod:number>557</prod:number>
  <prod:size>10</prod:size>
</prod:product>
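
The same elements can instead be qualified through a default namespace declaration, the other option mentioned above. The following is a minimal sketch of such a variant of Example 3–15, not a listing from the original text:

```xml
<!-- Unprefixed element names fall in the default namespace,
     which here is the schema's target namespace. -->
<product xmlns="http://datypic.com/prod">
  <number>557</number>
  <size>10</size>
</product>
```

As far as namespaces are concerned, this instance is equivalent to the prefixed one.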

A schema document cannot have more than one target namespace.
However, you can link together schema documents that have different
target namespaces, using an import. This is described in Section 4.3.2
on p. 66.

If you do not plan to use namespaces, you are not required to specify
a target namespace. In this case, omit the targetNamespace attribute
entirely.
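
As a minimal sketch, a schema document with no target namespace simply leaves the attribute off. The declarations below reuse the number and size names from Example 3–14; the restriction given for SizeType is an illustrative assumption.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- No targetNamespace attribute: these components are in no namespace -->
  <xs:element name="number" type="xs:integer"/>
  <xs:element name="size" type="SizeType"/>

  <xs:simpleType name="SizeType">
    <!-- Illustrative content only -->
    <xs:restriction base="xs:integer"/>
  </xs:simpleType>
</xs:schema>
```

Because no default namespace is declared, the unprefixed reference to SizeType resolves to the definition that is in no namespace.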

3.3.2 The XML Schema Namespace

Since schema documents are XML, namespaces also apply to them.
For example, all the elements used in schemas, such as schema,
element, and simpleType, are in the XML Schema Namespace,
whose namespace name is http://www.w3.org/2001/XMLSchema.
In addition, the names of the built-in simple types are in this
namespace.
The prefixes most commonly mapped to this namespace are xsd or
xs. It is recommended that you use one of these for clarity, although
you could just as easily use any other prefix. Example 3–16 shows a
schema document that maps the XML Schema Namespace to xs and
prefixes all of the element names in the schema document.

Example 3–16. Declaring the XML Schema Namespace

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="number" type="xs:integer"/>
  <xs:element name="size" type="SizeType"/>
  <xs:simpleType name="SizeType">

  </xs:simpleType>
</xs:schema>

It is interesting to note that while all the element names are prefixed,
all of the attribute names are unprefixed. This is because none of
the attributes in the XML Schema Namespace is declared globally.
This is explained further in Section 7.4 on p. 122.

The xs prefix is also used when referring to the built-in type
integer. This is because integer is a simple type that is defined in
the schema for schemas, whose target namespace is the XML Schema
Namespace.
Mapping a prefix such as xs to the XML Schema Namespace is one
of the three options for namespace declarations in schema documents.
See Section 3.3.5 on p. 52 for more information.

3.3.3 The XML Schema Instance Namespace

The XML Schema Instance Namespace is a separate namespace
for the four schema-related attributes that may appear in
instances. Its namespace name is
http://www.w3.org/2001/XMLSchema-instance. These attributes, whose
names are commonly prefixed with xsi, are: type, nil, schemaLocation, and
noNamespaceSchemaLocation. They are described in Section 5.1
on p. 79.

3.3.4 The Version Control Namespace

The XML Schema Version Control Namespace is a namespace
used by six attributes that signal to processors the conditions
under which they should pay attention to particular schema
components. Its namespace name is
http://www.w3.org/2007/XMLSchema-versioning, and it is commonly
associated with the prefix vc. Four of these attributes control the
portability of implementation-defined facets and types and are covered in
Section 23.5.3 on p. 642. The other two indicate versions of the XML
Schema language and are described in Section 23.5.2 on p. 641.
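
As a rough sketch of the two version-related attributes, the fragment below uses vc:minVersion and vc:maxVersion to offer alternative declarations of the same element. The element name and types are illustrative assumptions, and a processor that predates these attributes may not honor them.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning">

  <!-- Considered only by processors supporting version 1.1 or later -->
  <xs:element name="product" type="xs:string" vc:minVersion="1.1"/>

  <!-- Considered only by processors below version 1.1
       (vc:maxVersion is an exclusive upper bound) -->
  <xs:element name="product" type="xs:token" vc:maxVersion="1.1"/>

</xs:schema>
```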

3.3.5 Namespace declarations in schema documents

Schema documents must contain namespace declarations of both the
XML Schema Namespace and the target namespace in order to resolve
the references between schema components. There are three ways to
set up the namespace declarations in your schema document, each of
which is described in this section.

3.3.5.1 Map a prefix to the XML Schema Namespace

You can map the XML Schema Namespace to a prefix such as xsd or
xs, and make the target namespace the default namespace.
Example 3–17 shows a schema document that uses this approach. This
method is used throughout this book. Its advantage is that it makes it
clear which components are defined by XML Schema, especially when
it comes to referencing built-in types.

Example 3–17. Prefixing the XML Schema Namespace
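
A schema document set up this way might look like the following sketch. It assumes the same number, size, and SizeType components used in Example 3–19; the restriction shown for SizeType is an illustrative placeholder.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">
  <xs:element name="number" type="xs:integer"/>
  <!-- Unprefixed SizeType resolves to the default (target) namespace -->
  <xs:element name="size" type="SizeType"/>
  <xs:simpleType name="SizeType">
    <xs:restriction base="xs:integer"/>
  </xs:simpleType>
</xs:schema>
```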

If your schema document does not have a target namespace, you
must map a prefix to the XML Schema Namespace. Otherwise, you will
have no way of referencing other schema components that are defined
in your schema document. Example 3–18 shows a schema document
that does not have a target namespace and defaults the XML Schema
Namespace. This is invalid, because the declaration of size references
the type SizeType. Since the default namespace is the XML Schema
Namespace, the processor will look unsuccessfully for a definition of
SizeType in the XML Schema Namespace.

Example 3–18. Invalid absence of prefixes
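
The invalid arrangement described above can be sketched as follows, assuming the same components as the other examples in this section. The reference to SizeType is the problem: with no prefix, it is looked up in the XML Schema Namespace.

```xml
<!-- INVALID: no target namespace, and the XML Schema Namespace is the default -->
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="number" type="integer"/>
  <!-- SizeType is looked up in the XML Schema Namespace and is not found -->
  <element name="size" type="SizeType"/>
  <simpleType name="SizeType">
    <restriction base="integer"/>
  </simpleType>
</schema>
```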

3.3.5.2 Map a prefix to the target namespace

Another alternative is to map a prefix to the target namespace, and
make the XML Schema Namespace the default namespace.
Example 3–19 shows a schema document that uses this approach. The names
in the declarations themselves do not need to be prefixed because
they automatically become part of the target namespace. The only
place the prefix is used is in references to other components. For
example, the declaration of size references the type SizeType by its
qualified name. If it did not prefix the name of the type, the processor
would look unsuccessfully for a definition of SizeType in the XML
Schema Namespace.

Example 3–19. Prefixing the target namespace

<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:prod="http://datypic.com/prod"
        targetNamespace="http://datypic.com/prod">
  <element name="number" type="integer"/>
  <element name="size" type="prod:SizeType"/>
  <simpleType name="SizeType">

  </simpleType>
</schema>

If you plan to use identity constraints, you may be required to map
a prefix to the target namespace. See Section 17.9 on p. 439 for more
information.

3.3.5.3 Map prefixes to all namespaces

It is also possible to map prefixes to all the namespaces, as shown in
Example 3–20. This has the advantage of clarity, particularly when
your schema documents import other namespaces.

Example 3–20. Prefixing all namespaces
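
A sketch of this approach, assuming the same components as Example 3–19 and an illustrative restriction for SizeType, might look like this:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:prod="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">
  <xs:element name="number" type="xs:integer"/>
  <!-- Both the built-in type and the target-namespace type are prefixed -->
  <xs:element name="size" type="prod:SizeType"/>
  <xs:simpleType name="SizeType">
    <xs:restriction base="xs:integer"/>
  </xs:simpleType>
</xs:schema>
```

No default namespace is declared here, so every reference to a named component carries a prefix.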

Note that the prefix used for the target namespace in the schema
does not necessarily correspond to the prefix used in the instance
document. While the schema in the previous example uses the prefix
prod for the target namespace, a valid instance document could
use prod, foo, or any other prefix, or make that namespace the default.
It is the namespace names that must match, not prefixes.
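
For instance, assuming size is declared globally in the http://datypic.com/prod namespace with an integer-based type, an instance along these lines would use a different prefix from the schema and still match:

```xml
<!-- The prefix foo is arbitrary; only the namespace name must match -->
<foo:size xmlns:foo="http://datypic.com/prod">10</foo:size>
```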
Chapter 4
Schema composition

It is convenient to think of a schema as an individual schema
document. However, in XML Schema, a schema can be
composed of components defined in one or more schema
documents. This chapter explains how schema documents
are assembled together to represent a schema through various
mechanisms, including those built into XML Schema and processor-specific
handling.

4.1 Modularizing schema documents

Breaking a schema document into multiple documents has a number
of advantages such as promoting reuse, easing maintenance, and
providing more granular access or versioning. The modularization can be
based on subject areas, areas of responsibility, or the likely containers
of reuse. How best to do it is a design decision that is covered in more
detail in Section 21.5.2 on p. 557.


4.2 Defining schema documents

A schema document is most typically a physical XML file whose root
element is schema, but this is only one form of schema document. A
schema document may also be a fragment of another XML document
referenced using a fragment identifier or an XPointer, a DOM tree in
memory, or some other physical representation.
Each schema document describes components for at most one
namespace, known as its target namespace. Several schema documents
can describe components in the same namespace. Some schema
documents have no target namespace at all. Figure 4–1 shows several schema
documents in different namespaces.

Figure 4–1 Schema documents

Each schema document is represented by a schema element whose
syntax is shown in Table 4–1.

Table 4–1 XSD Syntax: schema document

Name: schema
Parents: none

Attributes:

id
  Type: ID
  Description: Unique ID.
version
  Type: token
  Description: Version of the schema document (not the version of
  the XML Schema language).
xml:lang
  Type: language
  Description: Natural language of the schema document.
targetNamespace
  Type: anyURI
  Description: Namespace to which all global schema components
  belong, see Section 3.3.1.
attributeFormDefault
  Type: "qualified" | "unqualified"
  Default: "unqualified"
  Description: Whether local attribute declarations should use
  qualified names, see Section 7.4.
elementFormDefault
  Type: "qualified" | "unqualified"
  Default: "unqualified"
  Description: Whether local element declarations should use
  qualified names, see Section 6.3.
blockDefault
  Type: "#all" | list of ("substitution" | "extension" | "restriction")
  Description: Whether to block element substitution or type
  substitution, see Section 13.7.
finalDefault
  Type: "#all" | list of ("extension" | "restriction" | "list" | "union")
  Description: Whether to disallow type derivation; see
  Section 16.7.1 for element declarations, Section 13.7.1 for
  complex types, Section 8.5 for simple types.
defaultAttributes (1.1)
  Type: QName
  Description: Name of the default attribute group, see
  Section 15.3.3.
xpathDefaultNamespace (1.1)
  Type: anyURI | "##defaultNamespace" | "##targetNamespace" | "##local"
  Description: The default namespace for XPath expressions.

As you can see from the content model, there are two distinct sections
of a schema document. At the beginning, you specify all the includes,
imports, redefines, and overrides that are used to refer to other schema
documents. After that come the global, or top-level, components of the
schema, such as elements, attributes, named types, and groups. These
components can appear in the schema document in any order.
Annotations can appear at the top level throughout the schema document.

4.3 Combining multiple schema documents

There are several methods of explicitly combining multiple schema
documents.

# Includes are used to combine schema documents that have the
  same target namespace.
# Imports are used to combine schema documents that have
  different target namespaces.
# Redefines and overrides are used to combine schema documents
  that have the same target namespace, while revising the
  definition of the included components.

Includes and imports are covered in this section. Because of the
complexities of redefines and overrides, all of Chapter 18 is devoted
to them.
Although includes and imports are very common, they are not the
only way to assemble schema documents. There is not always a “main”
schema document that represents the whole schema. Some other
alternatives are:

# The instance author can specify multiple schema locations in
  the instance, as described in Section 5.3.1 on p. 84.
# The processor can assemble schema documents from predefined
  locations.
# Multiple command-line parameters can be used to list the
  locations of the schema documents.
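
As a sketch of the first alternative, an instance can pair each namespace with a schema location in a single xsi:schemaLocation attribute. The file names below are hypothetical, and the element content is elided; the attribute value is a whitespace-separated list of namespace name and location pairs.

```xml
<ord:order xmlns:ord="http://datypic.com/ord"
           xmlns:prod="http://datypic.com/prod"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://datypic.com/ord ord1.xsd
                               http://datypic.com/prod prod.xsd">
  <!-- Element content omitted from this sketch -->
</ord:order>
```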

4.3.1 include
An include is used when you want to include other schema documents
in a schema document that has the same target namespace. This
provides for modularization of schema documents. For example, you may
want to break your schema into several documents: two different order
schema documents and a customer schema document. This is depicted
in Figure 4–2.

Figure 4–2 Includes


4.3.1.1 The syntax of includes

Includes are represented by include elements, whose syntax is shown
in Table 4–2.

Table 4–2 XSD Syntax: include

Name: include
Parents: schema

Attributes:

id
  Type: ID
  Description: Unique ID.
schemaLocation
  Type: anyURI
  Description: Location of the included schema document.

Content: annotation?

The include elements may only appear at the top level of a schema
document, and they must appear at the beginning (along with the
import, redefine, and override elements).
The schemaLocation attribute indicates where the included schema
document is located. This attribute is required, although the location
is not required to be resolvable. However, if it is resolvable, it must be
a complete schema document.
Example 4–1 shows the use of include in a schema document. The
schema author wants to use the type OrderNumType in the number
element declaration. However, OrderNumType is defined in a different
schema document. The include statement references the location of
the schema document, ord2.xsd, that contains the definition of
OrderNumType. In this example, the including document is referring
to a simple type in the included document, but it could similarly refer to
elements, attributes, complex types, or any other global components
in the included document.

Example 4–1. Include

ord1.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord">

  <xs:include schemaLocation="ord2.xsd"/>

  <xs:element name="order" type="OrderType"/>

  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="number" type="OrderNumType"/>

    </xs:sequence>
  </xs:complexType>

</xs:schema>

ord2.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord">

  <xs:simpleType name="OrderNumType">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
</xs:schema>

The schema documents ord1.xsd and ord2.xsd have the same
target namespace. When you use includes, one of the following must
be true:

# Both schema documents have the same target namespace.
# Neither schema document has a target namespace.
# The including schema document has a target namespace,
  and the included schema document does not have a target
  namespace.

There can be multiple include elements in a schema document.
There can also be multiple levels of includes in schema documents. For
example, ord1.xsd can include ord2.xsd, which includes cust.xsd,
and so on. It is not an error to include the exact same schema document
twice.
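
A second level of include can be sketched by giving ord2.xsd an include of its own. This is an illustrative variant of the ord2.xsd shown in Example 4–1, not a listing from the original text.

```xml
<!-- ord2.xsd: included by ord1.xsd, and itself including cust.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord">

  <!-- Second level of include; cust.xsd must share this target
       namespace or have none -->
  <xs:include schemaLocation="cust.xsd"/>

  <xs:simpleType name="OrderNumType">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
</xs:schema>
```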

4.3.1.2 Chameleon includes

In the case where the included schema document has no target
namespace, all components of the included schema document take on
the namespace of the including schema document. These components
are sometimes called chameleon components, because their namespace
changes depending on where they are included. This is shown in
Example 4–2.

Example 4–2. Chameleon include

ord1.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord">

  <xs:include schemaLocation="cust.xsd"/>

  <xs:element name="order" type="OrderType"/>

  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="number" type="xs:string"/>
      <xs:element name="customer" type="CustomerType"/>

    </xs:sequence>
  </xs:complexType>
</xs:schema>

cust.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="name" type="CustNameType"/>

    </xs:sequence>
  </xs:complexType>
  <xs:simpleType name="CustNameType">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
</xs:schema>

Note that in cust.xsd, the element declaration of name can reference
the type CustNameType without any namespace. Even though
these components will take on the target namespace of the ord1.xsd
schema document, the unqualified references between components in
cust.xsd will be honored.
However, in ord1.xsd, the references to the cust.xsd components
such as CustomerType do have to be qualified. This example works
because ord1.xsd declares http://datypic.com/ord as the default
namespace (in addition to being the target namespace). This means
that any unprefixed references, such as the one to CustomerType, are
considered to be in that namespace.

4.3.2 import
An import is used to tell the processor that you will be referring to
components from other namespaces. For example, if you want to
reference an attribute from another namespace in your complex type
definition, or you want to derive your type from a type in another namespace,
you must import this namespace. This is depicted in Figure 4–3.

Figure 4–3 Import

Imports differ from includes in two important ways. First, includes
only take place within a namespace, while imports take place across
namespaces. The second, subtler distinction is their general purpose.
The purpose of an include is specifically to pull in other schema
documents, while the purpose of an import is to record a dependency on
another namespace, not necessarily another schema document. Import
does allow you to specify the location of a schema document for that
namespace, but it is just a hint, and the processor is not required to try
to resolve it.

4.3.2.1 The syntax of imports

Imports are represented by import elements, whose syntax is shown
in Table 4–3.

Table 4–3 XSD Syntax: import

Name: import
Parents: schema

Attributes:

id
  Type: ID
  Description: Unique ID.
namespace
  Type: anyURI
  Description: Namespace to be imported.
schemaLocation
  Type: anyURI
  Description: Location of a schema document which describes
  components in the imported namespace.

Content: annotation?

The import elements may only appear at the top level of a schema
document, and must appear at the beginning (along with the include,
redefine, and override elements).
The namespace attribute indicates the namespace that you wish to
import. If you do not specify a namespace, it means that you are
importing components that are not in any namespace. The imported
namespace cannot be the same as the target namespace of the importing
schema document. If the importing schema document has no target
namespace, the import element must have a namespace attribute.
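
An import of components that are in no namespace can be sketched as follows. The file name common.xsd and the type NoteType are hypothetical. Note that this sketch maps a prefix to the target namespace rather than declaring it as the default, so that unprefixed names refer to components in no namespace.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ord="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord">

  <!-- No namespace attribute: the imported components are in no namespace -->
  <xs:import schemaLocation="common.xsd"/>

  <xs:element name="order" type="ord:OrderType"/>

  <xs:complexType name="OrderType">
    <xs:sequence>
      <!-- With no default namespace declared, the unprefixed name
           NoteType refers to a component in no namespace -->
      <xs:element name="note" type="NoteType"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```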
The schemaLocation attribute provides a hint to the processor as
to where to find a schema document that declares components for that
namespace. If you do not specify a schemaLocation, it is assumed
that the processor somehow knows where to find the schema document,
perhaps because it was specified by the user or built into the processor.
When schemaLocation is present and the processor is able to resolve
the location to some resource, it must resolve to a schema document.
That schema document’s target namespace must be equal to the value
of the namespace attribute of the import element.
Looping references (ord1.xsd imports prod.xsd’s namespace, and
prod.xsd imports ord1.xsd’s namespace) are also acceptable, because
this just indicates the interdependence of the components.
Example 4–3 shows the use of import in a schema document. The
schema author wants to use the type ItemsType in an element
declaration. However, ItemsType is defined in a different namespace. The
import statement references the namespace and location of the schema
document that contains ItemsType. The declaration of items is then
able to reference ItemsType using the appropriate prefix.
Note that the target namespace of prod.xsd must also be declared
as a namespace in the importing schema document. This is necessary
so that the items element declaration can refer to the type ItemsType
using its appropriate namespace prefix.

Example 4–3. Import

ord1.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           xmlns:prod="http://datypic.com/prod"
           targetNamespace="http://datypic.com/ord">

  <xs:import namespace="http://datypic.com/prod"
             schemaLocation="prod.xsd"/>

  <xs:element name="order" type="OrderType"/>

  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="number" type="xs:string"/>
      <xs:element name="items" type="prod:ItemsType"/>

    </xs:sequence>
  </xs:complexType>
</xs:schema>

prod.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">

  <xs:complexType name="ItemsType">
    <xs:sequence>
      <xs:element name="product" type="ProductType"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>

4.3.2.2 Multiple levels of imports

As with includes, imports can be chained together. That is, ord1.xsd
can import prod.xsd which itself imports a third schema document,
extension.xsd. However, imports are subtly different from includes
in this regard. With multilevel includes, ord1.xsd would automatically
be able to refer to components from extension.xsd. With imports,
if ord1.xsd wants to directly refer to components in extension.xsd,
it would have to also directly import the target namespace of
extension.xsd. This is because the import is also used to record a
dependency on another namespace, not just incorporate another schema
document.
This is shown in Example 4–4, where ord1.xsd directly references
ext:ExtensionType from extension.xsd. This requires it to import
extension.xsd (or at least the ext namespace), which it does. If it
did not directly refer to a component in that namespace, it would not
need to import it.

Example 4–4. Multiple levels of import

ord1.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           xmlns:prod="http://datypic.com/prod"
           xmlns:ext="http://datypic.com/ext"
           targetNamespace="http://datypic.com/ord">

  <xs:import namespace="http://datypic.com/prod"
             schemaLocation="prod.xsd"/>
  <xs:import namespace="http://datypic.com/ext"
             schemaLocation="extension.xsd"/>

  <xs:element name="order" type="OrderType"/>

  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="number" type="xs:string"/>
      <xs:element name="items" type="prod:ItemsType"/>

      <xs:element name="extension" type="ext:ExtensionType"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

prod.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           xmlns:ext="http://datypic.com/ext"
           targetNamespace="http://datypic.com/prod">

  <xs:import namespace="http://datypic.com/ext"
             schemaLocation="extension.xsd"/>

  <xs:complexType name="ItemsType">
    <xs:sequence>

      <xs:element name="extension" type="ext:ExtensionType"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

extension.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ext"
           targetNamespace="http://datypic.com/ext">

  <xs:complexType name="ExtensionType">

  </xs:complexType>
</xs:schema>

4.3.2.3 Multiple imports of the same namespace

It is legal to have multiple imports of the same namespace in the same
schema document or a set of assembled schema documents. If they refer
to the same schema document, the processor will determine that they
are the same components and will not raise errors about duplicate
definitions. This was the case in Example 4–4, where both ord1.xsd
and prod.xsd import the same schema document, extension.xsd.
However, if multiple imports of the same namespace refer to different
schema documents, most processors will ignore all but the first one
they encounter. This is permissible (if not always convenient)
because imports, unlike includes, are considered to be just “hints” that
the processor can choose to ignore.
Suppose, for example, that a schema document root.xsd declares
a root element root that has two children: orderSummary and
orderDetails. The two child elements are in the same namespace
but declared in different schema documents. You might expect to be
able to import both Summary.xsd and Detail.xsd into root.xsd,
as shown in Example 4–5. However, when you validate an
instance against root.xsd, most processors will ignore the import
of Detail.xsd and raise an error because no declaration of
orderDetails could be found.

Example 4–5. Multiple imports of the same namespace

root.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/root"
           xmlns:ord="http://datypic.com/ord"
           targetNamespace="http://datypic.com/root">

  <xs:import namespace="http://datypic.com/ord"
             schemaLocation="Summary.xsd"/>
  <xs:import namespace="http://datypic.com/ord"
             schemaLocation="Detail.xsd"/>

  <xs:element name="root" type="RootType"/>

  <xs:complexType name="RootType">
    <xs:sequence>
      <xs:element ref="ord:orderSummary"/>
      <xs:element ref="ord:orderDetails"/>

    </xs:sequence>
  </xs:complexType>

</xs:schema>

Summary.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord">

  <xs:element name="orderSummary"/>

</xs:schema>

Detail.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord">

  <xs:element name="orderDetails"/>

</xs:schema>

One way to get around this is to declare a single schema document,
sometimes called a proxy schema, for the ord namespace and include
all the necessary schema documents in that namespace, as shown in
Example 4–6. This will work because there is only one import for that
namespace in root.xsd and the includes in Orders.xsd cannot be
ignored.

Example 4–6. Proxy schema to avoid multiple imports

root.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/root"
           xmlns:ord="http://datypic.com/ord"
           targetNamespace="http://datypic.com/root">

  <xs:import namespace="http://datypic.com/ord"
             schemaLocation="Orders.xsd"/>

  <xs:element name="root" type="RootType"/>

  <xs:complexType name="RootType">
    <xs:sequence>
      <xs:element ref="ord:orderSummary"/>
      <xs:element ref="ord:orderDetails"/>

    </xs:sequence>
  </xs:complexType>

</xs:schema>

Orders.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord">

  <xs:include schemaLocation="Summary.xsd"/>
  <xs:include schemaLocation="Detail.xsd"/>

</xs:schema>

4.4 Schema assembly considerations

Whether you are using includes, imports, redefines, or overrides, there
are several factors to take into consideration when combining multiple
schema documents into a single schema. These factors are discussed in
this section.

4.4.1 Uniqueness of qualified names

The qualified names of globally declared components must be unique
in the schema, not just a schema document. When assembling a schema
from multiple schema documents, be careful not to introduce duplicate
qualified names. Example 4–7 shows two schema documents, both of
which contain global element declarations for order.

Example 4–7. Illegal duplication of element names

ord1.xsd:

ord2.xsd:
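
The two documents can be sketched as follows, assuming both use the http://datypic.com/ord target namespace found elsewhere in this chapter and a simple xs:string type for order. In this sketch, ord1.xsd includes ord2.xsd and declares order itself:

```xml
<!-- ord1.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord">
  <xs:include schemaLocation="ord2.xsd"/>
  <!-- ILLEGAL once ord2.xsd is included: order is declared twice -->
  <xs:element name="order" type="xs:string"/>
</xs:schema>
```

while ord2.xsd declares order a second time in the same namespace:

```xml
<!-- ord2.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord">
  <xs:element name="order" type="xs:string"/>
</xs:schema>
```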

It is not illegal for two schema documents to exist that have duplicate
names, since they may be used at different times in different
situations. However, since ord1.xsd includes ord2.xsd, they will be used
together, and this is illegal. Remember, the qualified name includes
the namespace name, so this example would be valid if the two schema
documents had different target namespaces (and ord2.xsd had been
imported rather than included).
This rule holds true for all named, global components, including
attributes, simple and complex types, named model groups, attribute
groups, identity constraints, and notations. The uniqueness of qualified
names is within the type of component. For example, it is illegal to
have two global element declarations for order, but it is legal to have
both an element declaration and a simple type definition with that
name. However, simple and complex types cannot share the same
qualified name.
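
The following sketch illustrates the rule with an element declaration and a simple type definition that share the name order. The names and the restriction are illustrative only.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord">

  <!-- Legal: an element declaration and a type definition may share a name -->
  <xs:element name="order" type="order"/>

  <xs:simpleType name="order">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>

  <!-- Adding a complexType also named order would be illegal, because
       simple and complex types share one set of names -->
</xs:schema>
```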

4.4.2 Missing components

In some cases, declarations or definitions will refer to components that
are outside the schema document. In Example 4–8, the order element
declaration uses the type OrderType that is not defined in the
schema document. This is not illegal unless a processor tries to use that
declaration and cannot find a definition of OrderType.

Example 4–8. Missing component

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord">

  <xs:element name="number" type="xs:integer"/>

  <xs:element name="order" type="OrderType"/>
</xs:schema>

In this case, the processor might obtain access to a schema document
that contains the OrderType definition by some other means, as
described in Section 4.3 on p. 61.
Even if a schema document containing the definition of OrderType
is never found, the processor will still be able to validate a number
element. The fact that there are unresolved references in the schema is
only an error if such a reference is directly involved in the validation.

4.4.3 Schema document defaults

As we saw in Section 4.2 on p. 58, schema documents can have four
defaults specified: attributeFormDefault, elementFormDefault,
blockDefault, and finalDefault. As schema documents are
assembled into schemas, these defaults are not overridden in any way. The
defaults of a schema document still apply to all components defined
or declared in that particular schema document. For example, if
ord2.xsd has elementFormDefault set to unqualified, all of the
local element declarations in ord2.xsd will have unqualified names,
even if ord2.xsd is included in another schema document that has
elementFormDefault set to qualified.
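
A sketch of the ord2.xsd side of this situation is shown below. The CustomerType definition and its local name element are illustrative assumptions.

```xml
<!-- ord2.xsd: its own elementFormDefault governs its local declarations,
     even when it is included by a document that sets
     elementFormDefault="qualified" -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord"
           elementFormDefault="unqualified">

  <xs:complexType name="CustomerType">
    <xs:sequence>
      <!-- Local declaration: the name stays unqualified in instances -->
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```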
Chapter 5
Instances and schemas

There is a many-to-many relationship between instances and
schemas. A schema can describe many valid instances,
possibly with different root element names. Likewise, an
instance may be described by many schemas, depending
on the circumstances. For example, you may have multiple schemas
for an instance, with different levels of validation. One may just validate
the structure, while another checks every data item against a type. There
may also be multiple schemas with different application
information to be used at processing time. This chapter explains the interaction
between schemas and instances.

5.1 Using the instance attributes

There are four attributes that can apply to any element
in an instance. These four attributes, which are described in
Table 5–1, are all in the XML Schema Instance Namespace,
http://www.w3.org/2001/XMLSchema-instance. This namespace
is commonly mapped to the prefix xsi.1

1. While any prefix may be mapped to the namespace, this book uses the
prefix xsi as a shorthand, sometimes without explicitly stating that it is
mapped to http://www.w3.org/2001/XMLSchema-instance.

Table 5–1 Instance attributes

Attribute name             Type            Description
nil                        boolean: false  Whether the element’s value is nil,
                                           see Section 6.5.
type                       QName           The name of a type that is being
                                           substituted for the element’s
                                           declared type, see Section 13.6.
schemaLocation             list of anyURI  List of the locations of schema
                                           documents for designated
                                           namespaces, see Section 5.3.1.1.
noNamespaceSchemaLocation  anyURI          Location of a schema document with
                                           no target namespace, see
                                           Section 5.3.1.2.

Example 5–1 shows the use of xsi:type in an instance.

Example 5–1. Using an instance attribute

Because these four attributes are globally declared, their names must
be prefixed in instances. You are required to declare the XML Schema
Instance Namespace and map a prefix (preferably xsi) to it. However,
you are not required to specify a schema location for these four
attributes. You are also not required or even permitted to declare
xsi:type as an attribute in the type definition for number. The
attributes in the XML Schema Instance Namespace, like namespace
declarations, are special attributes that a schema processor always
recognizes without explicit declarations.
In fact, the number element in this example can have a simple type,
even though elements with simple types are normally not allowed to
have attributes.
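
As an illustration of the kind of instance that Example 5–1 describes,
here is a minimal sketch. It assumes a product root element and a type
named ShortProdNumType; both names are placeholders chosen for
illustration and are not taken from a schema shown in this chapter.

```xml
<product xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- ShortProdNumType is a hypothetical type name, assumed to be
       derived from the declared type of number -->
  <number xsi:type="ShortProdNumType">557</number>
</product>
```

Note that the only declaration needed for xsi:type is the namespace
declaration for the xsi prefix on an enclosing element.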

5.2 Schema processing

5.2.1 Validation
Validation is an important part of schema processing. Validation
determines whether an instance conforms to all of the constraints described
in the schema. It involves checking all of the elements and attributes in
an instance to determine that they have declarations and that they
conform to those declarations and to the corresponding type definitions.
The validation process verifies:

# Correctness of the data. Validating against a schema does not
provide a 100% guarantee that the data is correct, but it can
signal invalid formats or out-of-range values.
# Completeness of the data. Validation can check that all required
information is present.
# Shared understanding of the data. Validation can make sure that
the way you perceive the document is the same way that the
sender perceives it.

Whether to validate your instances on a regular basis depends on a
number of factors.

# Where the instances originate. Within your organization, perhaps
you have control over the application that generates instances.
After some initial testing, you may trust that all documents
coming from that application are valid, without performing
validation. However, often the instances you are processing are
originating outside your organization. You may be less likely to
trust these documents.
# Whether the instances were application-generated or user-generated.
Human involvement can introduce typographical and other
errors. Even with validating XML editors, it is still possible to
introduce errors inadvertently during the handling of the
documents.
# Data quality. For example, if the instances are generated directly
from an existing database, they may not be complete or 100%
correct.
# Performance. Obviously, it takes extra time to validate. If
performance is critical, you may want to avoid some validation or
write application-specific code that can validate more efficiently
than a schema processor.

5.2.2 Augmenting the instance

In addition to validating the instance, a schema processor may alter
the instance by

# Adding default and fixed values for elements and attributes
# Normalizing whitespace in element and attribute values that
contain character data

Because of this, it is important that the sender and receiver of the
document agree on the schema to use. If the receiver processes an
element with a declaration that has a default value different from that
of the sender’s declaration, it can alter the data of the element in ways
unintended by the sender.

5.3 Relating instances to schemas

Instances can be related to schemas in a number of ways.

# Using hints in the instance. The xsi:schemaLocation and
xsi:noNamespaceSchemaLocation attributes can be used in
the instance to provide a hint to the processor where to find the
schema documents.
# Application’s choice. Most applications will be processing the
same type of instances repeatedly. These applications may
already know where the appropriate schema documents are on
the web, or locally, or even have them built in. In this case, the
processor could either (1) ignore xsi:schemaLocation, or
(2) reject documents containing xsi:schemaLocation
attributes, or (3) reject documents in which the
xsi:schemaLocation does not match the intended schema
document.
# User’s choice. The location of the schema document(s) can be
specified, at processing time, by a command-line instruction or
user dialog.
# Dereferencing the namespace. The namespace name can be
dereferenced to retrieve a schema document or resource
directory. However, this is not typically done by XML Schema
processors.

5.3.1 Using hints in the instance

XML Schema provides two attributes that act as hints to where
the processor might find the schema document(s) for the instance.
Different processors may ignore or acknowledge these hints
in different ways.
These two attributes are: xsi:schemaLocation, for use
with schema documents that have target namespaces, and
xsi:noNamespaceSchemaLocation, for use with schema documents
without target namespaces.

5.3.1.1 The xsi:schemaLocation attribute

The xsi:schemaLocation attribute allows you to specify a list of
pairs that match namespace names with schema locations. Example 5–2
shows an instance that uses xsi:schemaLocation. The default
namespace for the document is http://datypic.com/prod. The
xsi prefix is assigned to the XML Schema Instance Namespace, so
that the processor will recognize the xsi:schemaLocation attribute.
Then, the xsi:schemaLocation attribute is specified to relate the
namespace http://datypic.com/prod to the schema location
prod.xsd.

Example 5–2. Using xsi:schemaLocation
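
The following is a minimal sketch of an instance matching that
description. The namespace name and the prod.xsd location come from
the text; the product root element and its number and size children
are assumed for illustration.

```xml
<product xmlns="http://datypic.com/prod"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://datypic.com/prod prod.xsd">
  <!-- number and size are assumed child elements -->
  <number>557</number>
  <size>10</size>
</product>
```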

The value of the xsi:schemaLocation attribute is actually at least
two values separated by whitespace. The first value is the namespace
name (in this example http://datypic.com/prod), and the second
value is the URL for the schema location (in this example prod.xsd,
a relative URI). The processor will retrieve the schema document from
the schema location and make sure that its target namespace matches
that of the namespace it is paired with in xsi:schemaLocation.
Since spaces are used to separate values in this attribute, you
should not have spaces in your schema location path. You can replace
a space with %20, which is standard for URLs. For example, instead of
my schema.xsd, use my%20schema.xsd. To use an absolute path
rather than a relative path, some processors require that you start your
schema location with file:/// (with three forward slashes), as in
file:///C:/Users/PW/Documents/prod.xsd.
If multiple namespaces are used in the document,
xsi:schemaLocation can contain more than one pair of values,
as shown in Example 5–3.

Example 5–3. Using xsi:schemaLocation with multiple pairs
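
A minimal sketch of an instance with two namespace/location pairs
might look like the following. The namespace names and the file names
prod.xsd and ord1.xsd appear in this book; the element names and the
nesting are assumptions made for illustration.

```xml
<ord:order xmlns:ord="http://datypic.com/ord"
           xmlns:prod="http://datypic.com/prod"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://datypic.com/prod prod.xsd
                               http://datypic.com/ord ord1.xsd">
  <!-- element names below are illustrative only -->
  <ord:items>
    <prod:product>
      <prod:number>557</prod:number>
      <prod:size>10</prod:size>
    </prod:product>
  </ord:items>
</ord:order>
```

Each pair is a namespace name followed by a location, and the pairs
themselves are separated only by whitespace, so a line break between
them is acceptable.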

If you have a schema document that imports schema documents
with different target namespaces, you do not have to specify schema
locations for all the namespaces (if the processor has some other way
of finding the schema documents, such as the schemaLocation
attribute of import). For example, if ord1.xsd imports prod.xsd, it
is not necessary to specify prod.xsd in the xsi:schemaLocation in
the instance. You do still need to declare your namespaces using the
xmlns attributes, as shown in the example.
It is not illegal to list two or more pairs of values that refer to the
same namespace. In Example 5–3, you could refer to both ord1.xsd
and ord2.xsd, repeating the same namespace name for each. However,
this is not recommended because many processors will ignore all but the
first schema location for a particular namespace.
It is generally a good practice to use one main schema document
that includes or imports all other schema documents needed for
validation. This simplifies the instance and makes name collisions more
obvious.
The xsi:schemaLocation attribute may appear anywhere in an
instance, in the tags of any number of elements. Its appearance in a
particular tag does not signify its scope. However, it must appear before
any elements that it would validate. It is most typical to put the
xsi:schemaLocation attribute on the root element, for simplicity.

5.3.1.2 The xsi:noNamespaceSchemaLocation attribute

The xsi:noNamespaceSchemaLocation attribute is used to
reference a schema document with no target namespace.
xsi:noNamespaceSchemaLocation does not take a list of values;
only one schema location may be specified. The schema document
referenced cannot have a target namespace. Example 5–4 shows the
use of xsi:noNamespaceSchemaLocation in an instance.

Example 5–4. Using xsi:noNamespaceSchemaLocation
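
As a sketch of such an instance, assuming a prod.xsd that has no target
namespace and declares product, number, and size (these names are
placeholders for illustration):

```xml
<product xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="prod.xsd">
  <!-- no default namespace: all elements are in no namespace -->
  <number>557</number>
  <size>10</size>
</product>
```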

It is legal according to XML Schema to have both
xsi:noNamespaceSchemaLocation and xsi:schemaLocation
specified, but once again, you should check with your processor
to see what it will accept.

5.4 The root element

Sometimes you want to be able to specify which element declaration
is for the root element of the instance. For example, you may not want
the document shown in Example 5–5 to be considered a valid instance,
although the element itself is valid according to its declaration.

Example 5–5. A valid instance?
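
A document of the kind in question could be as small as the following
sketch, which assumes a global declaration of number with an integer
type. The value is a placeholder.

```xml
<!-- a complete document whose root is a single number element -->
<number>557</number>
```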

Schemas work similarly to DTDs in this regard. There is no way
to designate the root. Any element conforming to a global element
declaration can be a root element for validation purposes.
You can work around this by having only one global element
declaration. If the number declaration is local, Example 5–5 is not valid on
its own. However, there are times that you cannot avoid global element
declarations either because you are using substitution groups or because
you are importing element declarations over which you have no control.
A better approach is to use the application to verify that the root
element is the one you expect.
Using some schema processors, validation may not necessarily
start at the root. It is possible to validate sections of instance
documents with different schema documents using different
xsi:schemaLocation hints, or to validate fragments of instance
documents identified by IDs or XPointer expressions. Also, one
schema document may describe several related types of instance
documents (e.g., purchase orders and invoices) which may have different
root elements.

Chapter 6
Element declarations

This chapter covers the basic building blocks of XML: elements.
It explains how to use element declarations to assign
names and types to elements. It also describes element
properties that can be set via element declarations, such as
default and fixed values, nillability, and qualified versus unqualified
name forms.

6.1 Global and local element declarations

Element declarations are used to assign names and types to elements.
This is accomplished using an element element. Element declarations
can be either global or local.

6.1.1 Global element declarations

Global element declarations appear at the top level of the schema
document, meaning that their parent must be schema. These global
element declarations can then be used in multiple complex types, as
described in Section 12.4.2 on p. 267. Table 6–1 shows the syntax for
a global element declaration.

Table 6–1 XSD Syntax: global element declaration

Name
element

Parents
schema, override (1.1)

Attribute name     Type                Description
id                 ID                  Unique ID.
name               NCName              Element name.
type               QName               Type, see Section 6.2.
default            string              Default value, see Section 6.4.1.
fixed              string              Fixed value, see Section 6.4.2.
nillable           boolean: false      Whether xsi:nil can be used in the
                                       instance, see Section 6.5.
abstract           boolean: false      Whether the declaration can apply to an
                                       instance element (as opposed to being
                                       just the head of a substitution group),
                                       see Section 16.7.3.
substitutionGroup  QName, or list of   Head of the substitution group to which
                   QName (1.1)         it belongs, see Section 16.3.
block              "#all" | list of    Whether type and/or element
                   ("substitution" |   substitutions should be blocked from
                   "extension" |       the instance (see Section 13.7.3 for
                   "restriction")      type substitutions, Section 16.7.2 for
                                       element substitutions); defaults to
                                       blockDefault of schema.
final              "#all" | list of    Whether the declaration can be the head
                   ("extension" |      of a substitution group (see
                   "restriction")      Section 16.7.1); defaults to
                                       finalDefault of schema.

Content
annotation?, (simpleType | complexType)?, alternative* (1.1),
(key | keyref | unique)*

Example 6–1 shows two global element declarations: name and size.
A complex type is then defined which references these element
declarations by name using the ref attribute.
The qualified names used by global element declarations must be
unique in the schema. This includes not just the schema document in
which they appear, but also any other schema documents that are
used with it.
The name specified in an element declaration must be an XML
non-colonized name, which means that it must start with a letter or
underscore (_), and may only contain letters, digits, underscores (_), hyphens
(-), and periods (.). The qualified element name consists of the target
namespace of the schema document, plus the local name in the
declaration. In Example 6–1, the name and size element declarations take
on the target namespace http://datypic.com/prod.

Example 6–1. Global element declarations

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">

  <xs:element name="name" type="xs:string"/>

  <xs:element name="size" type="xs:integer"/>

  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="size" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>

Since globally declared element names are qualified by the target
namespace of the schema document, it is not legal to include
a namespace prefix in the value of the name attribute, as shown
in Example 6–2. If you want to declare elements in a different
namespace, you must create a separate schema document with that target
namespace and import it into the original schema document.

Example 6–2. Illegal attempt to prefix an element name
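
A sketch of the kind of declaration that is not allowed, assuming the
prod prefix is mapped to the target namespace used elsewhere in this
chapter:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:prod="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">

  <!-- Not legal: the name attribute takes an NCName, so it cannot
       contain a prefix and colon -->
  <xs:element name="prod:name" type="xs:string"/>
  <xs:element name="prod:size" type="xs:integer"/>

</xs:schema>
```

The document above is well-formed XML, but it is not a valid schema
document because prod:name and prod:size are not non-colonized names.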

Occurrence constraints (minOccurs and maxOccurs) appear in an
element reference rather than the global element declaration. This is
because they are related to the appearance of an element in a particular
content model. Element references are covered in Section 12.4.2 on
p. 267.

6.1.2 Local element declarations

Local element declarations, on the other hand, appear entirely within
a complex type definition. Local element declarations can only be used
in that type definition, never referenced by other complex types or
used in a substitution group. Table 6–2 shows the syntax for a local
element declaration.

Table 6–2 XSD Syntax: local element declaration

Name
element

Parents
all, choice, sequence

Attribute name   Type                    Description
id               ID                      Unique ID.
name             NCName                  Element name.
form             "qualified" |           Whether the element name must be
                 "unqualified"           qualified in the instance (see
                                         Section 6.3); defaults to
                                         elementFormDefault of schema, which
                                         defaults to unqualified.
type             QName                   Type, see Section 6.2.
minOccurs        nonNegativeInteger: 1   Minimum number of element occurrences,
                                         see Section 12.4.2.
maxOccurs        nonNegativeInteger |    Maximum number of element occurrences,
                 "unbounded" : 1         see Section 12.4.2.
default          string                  Default value, see Section 6.4.1.
fixed            string                  Fixed value, see Section 6.4.2.
nillable         boolean: false          Whether xsi:nil can be used in the
                                         instance, see Section 6.5.
block            "#all" | list of        Whether type substitutions should be
                 ("extension" |          blocked from the instance (see
                 "restriction")          Section 13.7.3); defaults to
                                         blockDefault of schema.
targetNamespace  anyURI                  The target namespace, if restricting a
(1.1)                                    type in another namespace, see
                                         Section 13.5.7.

Content
annotation?, (simpleType | complexType)?, alternative* (1.1),
(key | keyref | unique)*

Example 6–3 shows two local element declarations, name and size,
which appear entirely within a complex type definition.

Example 6–3. Local element declarations

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="size" type="xs:integer" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

Occurrence constraints (minOccurs and maxOccurs) can
appear in local element declarations. Some attributes, namely
substitutionGroup, final, and abstract, are valid in global
element declarations but not in local element declarations. This is
because these attributes all relate to substitution groups, in which
local element declarations cannot participate.
The name specified in a local element declaration must also be an
XML non-colonized name. If its form is qualified, it takes on the target
namespace of the schema document. If it is unqualified, it is considered
to be in no namespace. See Section 6.3 on p. 98 for more information.
Names used in local element declarations are scoped to the complex
type within which they are declared. You can have two completely
different local element declarations with the same element name, as
long as they are in different complex types. You can also have two local
element declarations with the same element name in the same
complex type, provided that they themselves have the same type. This
is explained further in Section 12.4.3 on p. 268.

6.1.3 Design hint: Should I use global or local element declarations?

Use global element declarations if:

# The element declaration could ever apply to the root element
during validation. Such a declaration should be global so that
the schema processor can access it.
# You want to use the exact same element declaration in more
than one complex type.
# You want to use the element declaration in a substitution group.
Local element declarations cannot participate in substitution
groups (see Chapter 16).

Use local element declarations if:

# You want to allow unqualified element names in the instance.
In this case, make all of the element declarations local except
for the root element declaration. If you mix global and local
declarations, and you want the element names in local
declarations to be unqualified, you will require your instance authors
to know which element declarations are global and which are
local. Global element names are always qualified in the instance
(see Section 6.3 on p. 98).
# You want to have several element declarations with the same
name but different types or other properties. Using local
declarations, you can have two element declarations for size: One
that is a child of shoe has the type ShoeSizeType, and one that
is a child of hat has the type HatSizeType. If the size
declaration is global, it can only occur once, and therefore use only
one type, in that schema. The same holds true for default and
fixed values as well as nillability.
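
The shoe and hat case can be sketched as follows. The type names
ShoeSizeType and HatSizeType come from the text above, but their
definitions here (an integer range and a short list of letter sizes)
are placeholders assumed purely for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">

  <!-- Hypothetical definitions; the facets are placeholders -->
  <xs:simpleType name="ShoeSizeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="20"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="HatSizeType">
    <xs:restriction base="xs:token">
      <xs:enumeration value="S"/>
      <xs:enumeration value="M"/>
      <xs:enumeration value="L"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Two local declarations named size, each scoped to its own
       anonymous complex type, so they may have different types -->
  <xs:element name="shoe">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="size" type="ShoeSizeType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="hat">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="size" type="HatSizeType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Had size been declared globally, both shoe and hat would have to
reference the same declaration and therefore share a single type.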

6.2 Declaring the types of elements

Regardless of whether they are local or global, all element declarations
associate an element name with a type, which may be either simple or
complex. There are four ways to associate a type with an element name:

1. Reference a named type by specifying the type attribute in the
element declaration. This may be either a built-in type or a
user-defined type.
2. Define an anonymous type by specifying either a simpleType
or a complexType child.
3. Use no particular type, by specifying neither a type attribute
nor a simpleType or complexType child. In this case, the
actual type is anyType, which allows any children and/or
character data content, and any attributes, as long as it is
well-formed XML.1
4. Define one or more type alternatives using alternative
children. This more advanced feature of version 1.1 is described
separately in Section 14.2 on p. 375.

Example 6–4 shows four element declarations with different type
assignment methods.

Example 6–4. Assigning types to elements

<xs:element name="size" type="SizeType"/>

<xs:element name="name" type="xs:string"/>

<xs:element name="product">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="name"/>
      <xs:element ref="size"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="anything"/>

The first example uses the type attribute to specify SizeType as
the type of size. The second example also uses the type attribute,
this time to assign a built-in type string to name. The xs prefix is
used because built-in types are part of the XML Schema Namespace.
For a complete explanation of the use of prefixes in schema documents,
see Section 3.3.5 on p. 52.
The third example uses an in-line anonymous complex type, which
is defined entirely within the product declaration. Finally, the fourth
element declaration, anything, does not specify a particular type,
which means that anything elements can have any well-formed content
and any attributes.
For a detailed discussion of using named or anonymous types, see
Section 8.2.3 on p. 133.

1. Unless it is in a substitution group, as described in Chapter 16.

6.3 Qualified vs. unqualified forms

When an element declaration is local (that is, when it isn’t at the top
level of a schema document), you have the choice of putting those
element names into the target namespace of the schema or not. Let’s
explore the two alternatives.

6.3.1 Qualified local names

Example 6–5 shows an instance where all element names are
qualified. Every element name has a prefix that maps it to the product
namespace.

Example 6–5. Qualified local names

<prod:product xmlns:prod="http://datypic.com/prod">
  <prod:number>557</prod:number>
  <prod:size>10</prod:size>
</prod:product>

6.3.2 Unqualified local names

Example 6–6, on the other hand, shows an instance where only the
root element name, product, is qualified. The other element names
have no prefix, and since there is no default namespace declaration,
they are not in any namespace.

Example 6–6. Unqualified local names

<prod:product xmlns:prod="http://datypic.com/prod">
  <number>557</number>
  <size>10</size>
</prod:product>

6.3.3 Using elementFormDefault

Let’s look at the schemas that would describe these two instances.
Example 6–7 shows a schema for the instance in Example 6–5, which
has qualified element names.

Example 6–7. Schema for qualified local element names
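
A minimal sketch of such a schema document follows. The element names
match the instance in Example 6–5; the named type ProductType and the
use of xs:integer for both children are assumptions made for
illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified">

  <!-- global declaration: always qualified in the instance -->
  <xs:element name="product" type="ProductType"/>

  <xs:complexType name="ProductType">
    <xs:sequence>
      <!-- local declarations: qualified because of elementFormDefault -->
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="size" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```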

The schema document has elementFormDefault set to qualified.
As a result, elements conforming to local declarations must use qualified
element names in the instance. In this example, the declaration for
product is global and the declarations for number and size are local.
To create a schema for the instance in Example 6–6, which
has unqualified names, you can simply change the value of
elementFormDefault in the schema document to unqualified.
Or, since the default value is unqualified, you could simply
omit the attribute. In this case, elements conforming to global
declarations must still use qualified element names, hence the use of
prod:product in the instance.

6.3.4 Using form

It is also possible to specify the form on a particular element declaration
using a form attribute whose value, like elementFormDefault, is
either qualified or unqualified. Example 6–8 shows a revised
schema that uses the form attribute on the number element declaration
to override elementFormDefault and make it unqualified.

Example 6–8. Using the form attribute
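
One way such a revised schema could look is sketched below. It assumes
the same ProductType structure and xs:integer types as a placeholder;
only the form attribute on number is the point of the example.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified">

  <xs:element name="product" type="ProductType"/>

  <xs:complexType name="ProductType">
    <xs:sequence>
      <!-- form overrides elementFormDefault for this declaration only -->
      <xs:element name="number" type="xs:integer" form="unqualified"/>
      <!-- size has no form attribute, so it stays qualified -->
      <xs:element name="size" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```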

A valid instance is shown in Example 6–9.

Example 6–9. Overridden form

<prod:product xmlns:prod="http://datypic.com/prod">
  <number>557</number>
  <prod:size>10</prod:size>
</prod:product>

6.3.5 Default namespaces and unqualified names

Default namespaces do not mix well with unqualified element names.
The instance in Example 6–10 declares the prod namespace as the
default namespace. However, this will not work with a schema
document where elementFormDefault is set to unqualified, because
it will be unsuccessfully looking for the elements number and size in
the prod namespace whereas they are in fact not in any namespace.

Example 6–10. Invalid mixing of unqualified names and a default namespace
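
An instance of this kind might look like the following sketch, which
reuses the element names from Example 6–6:

```xml
<!-- The default namespace declaration puts product, number, and size
     all in http://datypic.com/prod, but the local declarations of
     number and size are unqualified, that is, in no namespace -->
<product xmlns="http://datypic.com/prod">
  <number>557</number>
  <size>10</size>
</product>
```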

Although unqualified element names may seem confusing, they do
have some advantages when combining multiple namespaces.
Section 21.7.3 on p. 575 provides a more complete coverage of the pros
and cons of unqualified local names.

6.4 Default and fixed values

Default and fixed values are used to augment an instance by adding
values to empty elements. The schema processor will insert a default
or fixed value if the element in question is empty. If the element is
absent from the instance, it will not be inserted. This is different from
the treatment of default and fixed values for attributes.
Default and fixed values are specified by the default and fixed
attributes, respectively. Only one of the two attributes (default or
fixed) may appear; they are mutually exclusive. Default and fixed
values can be specified in element declarations with:

# Simple types
# Complex types with simple content
# Complex types with mixed content, if all children are optional

The default or fixed value must be valid for the type of that element.
For example, it is not legal to specify a default value of xyz if the type
of the element is integer.1
The specification of fixed and default values in element declarations
is independent of their occurrence constraints (minOccurs and
maxOccurs). Unlike defaulted attributes, a defaulted element may be
required (i.e., minOccurs in its declaration may be more than 0). If an
element with a default value is required, it may still appear empty and
have its default value filled in.

6.4.1 Default values

The default value is filled in if the element is empty. Example 6–11
shows the declaration of product with two children, name and size,
that have default values specified.

Example 6–11. Specifying an element’s default value

<xs:element name="product">
  <xs:complexType>
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="name" type="xs:string" default="N/A"/>
      <xs:element name="size" type="xs:integer" default="12"/>
    </xs:choice>
  </xs:complexType>
</xs:element>

1. This is not considered an error in the schema, but any instance that relies
on the value would be in error.

It is important to note that certain types allow an empty value. This
includes string, normalizedString, token, and any types derived
from them that do not specifically disallow the empty string as a value.
Additionally, unrestricted list types allow empty values. For any type
that allows an empty string value, the element will never be considered
to have that empty string value because the default value will be filled
in. However, if an element has the xsi:nil attribute set to true, its
default value is not inserted.
Table 6–3 describes how element default values are inserted in
different situations, based on the declaration in Example 6–11.

Table 6–3 Default value behavior for elements

Situation        Result           Before example          After example
Value specified  Original value   <size>10</size>         <size>10</size>
                 kept
Empty element    Value filled in  <size/>                 <size>12</size>
                                  <size></size>           <size>12</size>
                                  <name/>                 <name>N/A</name>
                                  <name></name>           <name>N/A</name>
Value is just    No value         <size> </size>          <size></size>
whitespace       filled in                                (error is raised)
                                  <name> </name>          <name> </name>
Element is nil   No value         <size xsi:nil="true"/>  <size xsi:nil="true"/>
                 filled in
Element does     No element       <product/>              <product/>
not appear       added

6.4.2 Fixed values

Fixed values are added in all the same situations as default values. The
only difference is that if the element has a value, its value must be
equivalent to the fixed value. When the schema processor determines
whether the value of the element is in fact equivalent to the fixed value,
it takes into account the element’s type.
Table 6–4 shows some valid and invalid instances for elements
declared with fixed values. The size element has the type integer, so
all forms of the integer “1” are accepted in the instance, including “01”,
“+1”, and “ 1 ” surrounded by whitespace. Whitespace around a value
is acceptable because the whiteSpace facet value for integer is
collapse, meaning that whitespace is stripped before validation takes
place. A value that contains only whitespace, like <size> </size>,
is not valid because it is not considered empty but also is not equal to 1.
The name element, on the other hand, has the type string. The
string “01” is invalid because it is not considered to be equal to the
string “1”. The string “ 1 ” is also invalid because the whiteSpace
facet value for string is preserve, meaning that the leading and
trailing spaces are kept. For more information on type equality, see
Section 11.7 on p. 253.

Table 6–4 Elements with fixed values

Schema
<xs:element name="name" type="xs:string" fixed="1"/>
<xs:element name="size" type="xs:integer" fixed="1"/>

Valid instances   Invalid instances
<name>1</name>    <name>01</name>
<name/>           <name>+1</name>
<name></name>     <name> 1 </name>
                  <name> </name>
                  <name>2</name>

6.5 Nils and nillability

In some cases, an element may be either absent from an instance or
empty (contain no value). The instance shown in Example 6–12 is a
purchase order with some absent and empty elements.

Example 6–12. Missing values

<order>
  <giftWrap>ADULT BDAY</giftWrap>
  <customer>
    <name>
      <first>Priscilla</first>
      <middle/>
      <last>Walmsley</last>
    </name>
  </customer>
  <items>
    <shirt>
      <giftWrap/>
      <number>557</number>
      <name>Short-Sleeved Linen Blouse</name>
      <size></size>
    </shirt>
    <umbrella>
      <number>443</number>
      <name>Deluxe Golf Umbrella</name>
      <size></size>
    </umbrella>
  </items>
</order>

There are many possible reasons for an element value to be missing
in an instance:

# The information is not applicable: Umbrellas do not come in
different sizes.
# We do not know whether the information is applicable: We do
not know whether the customer has a middle name.
# It is not relevant to this particular application of the data: The
billing application does not care about product sizes.
# It is the default, so it is not specified: The customer’s title should
default to “Mr.”
# It actually is present and the value is an empty string: The gift
wrap value for the shirt is empty, meaning “none,” which should
override the gift wrap value of the order.
# It is erroneously missing because of a user error or technical
bug: We should have a size for the shirt.

Different applications treat missing values in different ways. One
application might treat an absent element as not applicable, and an
empty element as an error. Another might treat an empty element as
not applicable, and an absent element as an error. The treatment of
missing values may vary within the same schema. In our example, we
used a combination of absent and empty elements to signify different
reasons for missing values.
XML Schema offers a third method of indicating a missing value:
nils. By marking an element as nil, you are telling the processor “I know
this element is empty, but I want it to be valid anyway.” The actual
reason why it is empty, or what the application should do with it, is
entirely up to you. XML Schema does not associate any particular se-
mantics with this absence. It only offers an additional way to express
a missing value, with the following beneﬁts:

# You do not have to weaken the type by allowing empty content
and/or making attributes optional.
# You are making a deliberate statement that the information
does not exist. This is a clearer message than simply omitting
the element, which would mean that we do not know if it exists.
# If for some reason an application is relying on that element being
there, for example as a placeholder, nil provides a way for it to
exist without imparting any additional information.
# You can easily turn off default value processing. The default
value for the element will not be added if it is marked as nil.

An approach for our purchase order document is outlined below. It
uses nils, derived types, simple type restrictions, and default values to
better constrain missing values. The resulting instance is shown in
Example 6–13.

Example 6–13. Missing values, revisited

<order xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <giftWrap>ADULT BDAY</giftWrap>
  <customer>
    <name>
      <title/>                  <!-- empty: the default value is filled in -->
      <first>Priscilla</first>
      <middle xsi:nil="true"/>
      <last>Walmsley</last>
    </name>
  </customer>
  <items>
    <shirt>
      <giftWrap/>
      <number>557</number>
      <name>Short-Sleeved Linen Blouse</name>
      <size></size>             <!-- invalid: size must not be empty -->
    </shirt>
    <umbrella>
      <number>443</number>
      <name>Deluxe Golf Umbrella</name>
    </umbrella>
  </items>
</order>

# The information is not applicable: Give shirt and umbrella
different types and do not include the size element declaration
in UmbrellaType.
# We do not know whether the information is applicable: Make
middle nillable and set xsi:nil to true if it is not present.
# It is not relevant to this particular application of the data: Give
the billing application a separate schema document and
insert a wildcard where the size and other optional element
declarations or references may appear.
# It is the default, so it is not specified: Specify a default value of
“Mr.” for title.
# It actually is present and the value is an empty string: Allow
giftWrap to appear empty.
# It is erroneously missing because of a user error or technical
bug: Make size required and make it an integer or other
type that does not accept empty values.

This is one of the many reasonable approaches for handling absent
values. The important thing is to define a strategy that provides all
the information your application needs and ensures that all errors are
caught.
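
The approach outlined above could be expressed in a schema roughly as
follows. This is a minimal sketch, not the schema used for the example:
NameType and ShirtType are hypothetical type names, the element types
are assumptions, and the separate billing schema with its wildcard is
left out.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- NameType is a hypothetical name -->
  <xs:complexType name="NameType">
    <xs:sequence>
      <!-- an empty title receives the default value "Mr." -->
      <xs:element name="title" type="xs:string" default="Mr."/>
      <xs:element name="first" type="xs:string"/>
      <!-- middle may carry xsi:nil="true" when it is not known -->
      <xs:element name="middle" type="xs:string" nillable="true"/>
      <xs:element name="last" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- ShirtType is a hypothetical name -->
  <xs:complexType name="ShirtType">
    <xs:sequence>
      <!-- xs:string accepts an empty giftWrap, meaning "none" -->
      <xs:element name="giftWrap" type="xs:string" minOccurs="0"/>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
      <!-- required, and xs:integer does not accept an empty value -->
      <xs:element name="size" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>

  <!-- no size declaration: sizes are not applicable to umbrellas -->
  <xs:complexType name="UmbrellaType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```

With declarations along these lines, an empty size in a shirt is
reported as an error, while a size in an umbrella is rejected because
it is not declared at all.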

6.5.1 Using xsi:nil in an instance

To indicate that the value of an instance element is nil, specify the
xsi:nil attribute on that element. Example 6–14 shows ﬁve instances
of size that use the xsi:nil attribute. The xsi:nil attribute applies
to the element in whose tag it appears, not any of the attributes. There
is no way to specify that an attribute value is nil.

Example 6–14. xsi:nil in instance elements

The xsi:nil attribute is in the XML Schema Instance Namespace
(http://www.w3.org/2001/XMLSchema-instance). This
namespace must be declared in the instance, but it is not necessary to
specify a schema location for it. Any schema processor will recognize
the xsi:nil attribute of any XML element.
If the xsi:nil attribute appears on an element, and its value is set
to true, that element must be empty. It cannot contain any child ele-
ments or character data, even if its type requires content. The last in-
stance of the size element in Example 6–14 is invalid because xsi:nil
is true but it contains data. However, it is valid for a nil element to
have other attributes, as long as they are declared for that type.
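
As a rough sketch of these rules, the instances below assume that size
is declared nillable with the type integer. The sizes wrapper element
is hypothetical and serves only to carry the namespace declaration.

```xml
<sizes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- valid: nil and empty -->
  <size xsi:nil="true"/>
  <size xsi:nil="true"></size>
  <!-- valid: not nil, so the content is validated as an integer -->
  <size xsi:nil="false">10</size>
  <size>10</size>
  <!-- invalid: xsi:nil is true but the element contains data -->
  <size xsi:nil="true">10</size>
</sizes>
```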

6.5.2 Making elements nillable

In order to allow an element to appear in the instance with the xsi:nil
attribute, its element declaration must indicate that it is nillable. Nilla-
bility is indicated by setting the nillable attribute in the element
declaration to true. Example 6–15 shows an element declaration
illustrating this.

Example 6–15. Making size elements nillable

<xs:element name="size" type="xs:integer" nillable="true"/>

Specifying nillable="true" in the declaration allows elements to
have the xsi:nil attribute. Otherwise, the xsi:nil attribute cannot
appear, even with its value set to false. It is not necessary (or even
legal) to separately declare the xsi:nil attribute for the type used in
the element declaration. In Example 6–15, we gave size a simple type.
Normally this would mean that it cannot have attributes, but the
xsi:nil attribute is given special treatment. Elements with either
complex or simple types can be nillable.
If nillable is set to true, a fixed value may not be specified in the
declaration. (This would be considered an error in the instance, not
the schema.) However, it is legal to specify a default value. If an
element has xsi:nil set to true, the default value is not filled in even
though the element is empty.
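
The interaction between nillable and default can be sketched as
follows. The default value of 10 is an arbitrary choice for
illustration.

```xml
<xs:element name="size" type="xs:integer" nillable="true" default="10"/>
```

Given that declaration, and assuming the xsi prefix is bound to the
XML Schema Instance Namespace, the two empty elements below are
treated differently:

```xml
<sizes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- empty but not nil: the default value 10 is filled in -->
  <size/>
  <!-- nil: the element stays empty and no default is filled in -->
  <size xsi:nil="true"/>
</sizes>
```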
Elements should not be declared nillable if they will ever be used as
fields in an identity constraint, such as a key or a uniqueness constraint.
See Section 17.7.2 on p. 434 for more information on identity
constraint fields.

Chapter 7
Attribute declarations

This chapter covers the other basic building block of XML:
attributes. It explains how to declare attributes and assign
types to them. It also describes fixed and default values as
they apply to attributes.

7.1 Attributes vs. elements

Whether to model data values as elements or attributes is an often-
discussed question. XML Schema, with its ability to deﬁne data types
for elements, eliminates many of the advantages of attributes in DTDs.
The advantages of using attributes are:

# They are less verbose.
# For narrative content, attributes are typically used for values
that should not appear in the content, that is, for metadata.
In a typical (X)HTML document, elements are used for the
content that appears on a page, while attributes specify style
and other information that is used by the browser but not
directly by the end user. This is a convenient separation for some
narrative XML vocabularies.
# If you plan to validate using DTDs as well as schemas, you can
perform some minimal type checking on attribute values.
For example, color can be constrained to a certain set of
values. The character data content of elements cannot be validated
using DTDs.
# Attributes can be added to the instance by specifying default
values; elements cannot (they must appear to receive a
default value).
# Attributes can be inherited by descendant elements, as described
in Section 7.6 on p. 126.

The advantages of using elements are:

# They are more extensible because attributes can later be added
to them without affecting a processing application. For example,
if you realized that you needed to keep track of what currency
a price is expressed in, you can declare a currency attribute in
the price element declaration. If price is an attribute, this is
not possible.
# They can contain other elements. For example, if you want to
mark up a textual description using XHTML tags, this is not
possible if description is an attribute.
# They can be repeated. An element may only appear once now,
but later you may wish to extend it to appear multiple times.
For example, if you decide later that a product can have multiple
colors, you can allow a color child to appear more than
once. Attributes may only appear once per element.
# You have more control over the rules of their appearance. For
example, you can say that a product can have either a number
or a productCode child. This is not possible for attributes.
# They can be used in substitution groups.
# They can be given nil values.
# They can use type substitution to substitute derived types in
the instance.
# Their order is significant, while the order of attributes is not.
Obviously, this is only an advantage if you care about the order.
# When the values are lengthy, elements tend to be more readable
than attributes.

As you can see, there are many more advantages to using elements
than attributes, but attributes are useful in some cases. A general rec-
ommendation is to use attributes for metadata and elements for data.
For example, use an attribute to describe the units, language, or time
dependence of an element value. Additionally, attributes should be
used for ID and IDREF values as well as XLink expressions. Elements
should be used for everything else.
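
As an illustration of this recommendation, the sketch below declares a
price element whose content is the data value and whose currency
attribute is metadata about that value. The choice of decimal and
string as types is an assumption made for the example.

```xml
<xs:element name="price">
  <xs:complexType>
    <xs:simpleContent>
      <!-- the element content carries the data value -->
      <xs:extension base="xs:decimal">
        <!-- the attribute carries metadata about that value -->
        <xs:attribute name="currency" type="xs:string"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
```

An instance might then look like <price currency="USD">29.99</price>,
where both values are made up for illustration.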

7.2 Global and local attribute declarations

Attribute declarations are used to name an attribute and associate it
with a particular simple type. This is accomplished using an attribute
element. Attribute declarations may be either global or local.

7.2.1 Global attribute declarations

Global attribute declarations appear at the top level of the schema
document, meaning that their parent must be the schema element.
These global attribute declarations can then be used in multiple complex
types, as described in Section 12.6 on p. 281. Table 7–1 shows the
syntax of a global attribute declaration.
Example 7–1 shows two global attribute declarations: system and
dim. A complex type is then deﬁned which references those attribute
declarations by name using the ref attribute.

Table 7–1 XSD Syntax: global attribute declaration

Name
attribute

Parents
schema, override (1.1)

Attribute name      Type             Description
id                  ID               Unique ID.
name                NCName           Unique name.
type                QName            Type, see Section 7.3.
default             string           Default value, see Section 7.5.1.
fixed               string           Fixed value, see Section 7.5.2.
inheritable (1.1)   boolean: false   Whether the value can be inherited by
                                     descendant elements, see Section 7.6.

Content
annotation?, simpleType?

Example 7–1. Global attribute declarations

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">

  <xs:attribute name="system" type="xs:string"/>

  <xs:attribute name="dim" type="xs:integer"/>

  <xs:complexType name="SizeType">
    <xs:attribute ref="system" use="required"/>
    <xs:attribute ref="dim"/>
  </xs:complexType>

</xs:schema>

The qualified names used by global attribute declarations must be
unique in the schema. This includes not just the schema document
in which they appear, but any other schema documents that are used
with it.
The use attribute, which indicates whether an attribute is required
or optional, appears in the attribute reference rather than the
attribute declaration. This is because it applies to the appearance of
that attribute in a complex type, not the attribute itself. Attribute
references are covered in Section 12.6 on p. 281.
The name specified in an attribute declaration must be an XML
non-colonized name, which means that it must start with a letter or
underscore, and may only contain letters, digits, underscores, hyphens,
and periods. The qualified name consists of the target namespace of the
schema document plus the local name in the declaration. In
Example 7–1, the system and dim attributes take on the target namespace
http://datypic.com/prod.
Since globally declared attribute names are qualified by the
target namespace of the schema document, it is not legal to include a
namespace prefix in the value of the name attribute. If you want to
declare attributes in another namespace, you must create a separate
schema document with that namespace as target and import it into
the original schema document. If you simply want to specify an attribute
from another namespace, such as xml:lang, use the ref attribute to
reference it in a complex type.
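
For instance, referencing xml:lang might look like the sketch below.
DescType is a hypothetical type name, and the schemaLocation points at
the schema document that the W3C publishes for the XML namespace; a
local copy could be used instead.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">

  <!-- import the schema document that declares xml:lang -->
  <xs:import namespace="http://www.w3.org/XML/1998/namespace"
             schemaLocation="http://www.w3.org/2001/xml.xsd"/>

  <!-- DescType is a hypothetical type name -->
  <xs:complexType name="DescType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <!-- the xml prefix is predeclared, so no xmlns:xml is needed -->
        <xs:attribute ref="xml:lang"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

</xs:schema>
```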

7.2.2 Local attribute declarations

Local attribute declarations, on the other hand, appear entirely within
a complex type deﬁnition. They may only be used in that type deﬁni-
tion, and are never reused by other types. Table 7–2 shows the syntax
for a local attribute declaration.

Table 7–2 XSD Syntax: local attribute declaration

Name
attribute

Parents
complexType, restriction, extension, attributeGroup

Attribute name          Type              Description
id                      ID                Unique ID.
name                    NCName            Unique name.
type                    QName             Simple type, see Section 7.3.
form                    "qualified" |     Whether the attribute name must be
                        "unqualified"     qualified in the instance (see
                                          Section 7.4); defaults to
                                          attributeFormDefault of schema,
                                          which defaults to unqualified.
use                     "optional" |      Whether it is required or optional,
                        "prohibited" |    see Section 12.6.2.
                        "required" :
                        optional
default                 string            Default value, see Section 7.5.1.
fixed                   string            Fixed value, see Section 7.5.2.
targetNamespace (1.1)   anyURI            The target namespace if restricting
                                          a type in another namespace, see
                                          Section 13.5.7.
inheritable (1.1)       boolean: false    Whether the value can be inherited
                                          by descendant elements, see
                                          Section 7.6.

Content
annotation?, simpleType?

Example 7–2 shows two local attribute declarations, system and
dim, which appear entirely within a complex type definition.

Example 7–2. Local attribute declarations

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">

  <xs:complexType name="SizeType">
    <xs:attribute name="system" type="xs:string" use="required"/>
    <xs:attribute name="dim" type="xs:integer"/>
  </xs:complexType>

</xs:schema>

Unlike global attribute declarations, local attribute declarations can
have a use attribute, which indicates whether an attribute is required
or optional.
The name specified in a local attribute declaration must also be an
XML non-colonized name. If its form is qualified, it takes on the target
namespace of the schema document. If it is unqualified, it is considered
to be in no namespace. See Section 7.4 on p. 122 for more information
on qualified versus unqualified attribute names.
Locally declared attribute names are scoped to the complex type in
which they are declared. It is illegal to have two attributes with the
same qualified name in the same complex type definition. This is
explained further in Section 12.6 on p. 281.

7.2.3 Design hint: Should I use global or local attribute declarations?

Global attribute declarations are discouraged unless the attribute is
used in a variety of element declarations which are in a variety of
namespaces. This is because globally declared attribute names must be
prefixed in instances, resulting in an instance element that looks
like this:

<prod:size prod:system="US-DRESS" prod:dim="1"/>

Prefixing every attribute is not what users generally expect, and it
adds a lot of extra text without any additional meaning.
Two examples of global attributes are the xml:lang attribute that
is part of XML and the xsi:type attribute that is part of XML
Schema. Virtually any element in any namespace may have these two
attributes, so in this case it is desirable to distinguish them by their
namespace.
If you are tempted to use a global attribute declaration because
you want to be able to reuse it multiple times, consider these two
alternatives:

# Put it into an attribute group. This makes it, effectively, a local
attribute declaration, while still allowing you to reuse it.
# Define a named simple type that can be reused by multiple local
attribute declarations.
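
Both alternatives can be sketched in one small schema. SizeAttributes
and SystemType are hypothetical names chosen for this illustration,
and the schema has no target namespace to keep the references simple.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Alternative 1: a reusable group of local attribute declarations -->
  <xs:attributeGroup name="SizeAttributes">
    <xs:attribute name="system" type="SystemType" use="required"/>
    <xs:attribute name="dim" type="xs:integer"/>
  </xs:attributeGroup>

  <!-- Alternative 2: a named simple type shared by attribute declarations -->
  <xs:simpleType name="SystemType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="US-DRESS"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- any complex type can pull in the whole group by reference -->
  <xs:complexType name="SizeType">
    <xs:attributeGroup ref="SizeAttributes"/>
  </xs:complexType>

</xs:schema>
```

Because system and dim remain local declarations, their names stay
unprefixed in instances.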

7.3 Declaring the types of attributes

Regardless of whether they are local or global, all attribute declarations
associate an attribute name with a simple type. All attributes have
simple types rather than complex types, which makes sense since they
cannot themselves have child elements or attributes. There are three
ways to assign a simple type to an attribute.

1. Reference a named simple type by specifying the type attribute
in the attribute declaration. This may be either a built-in
type or a user-derived type.
2. Define an anonymous type by specifying a simpleType child.
3. Use no particular type, by specifying neither a type attribute
nor a simpleType child. In this case, the actual type is
anySimpleType, which may have any value, as long as it
is well-formed XML.

Example 7–3 shows four attribute declarations with different type
assignment methods.

Example 7–3. Declaring the types of attributes

<xs:attribute name="color" type="ColorType"/>

<xs:attribute name="dim" type="xs:integer"/>

<xs:attribute name="system">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="US-DRESS"/>

    </xs:restriction>
  </xs:simpleType>
</xs:attribute>

<xs:attribute name="anything"/>

The first example uses the type attribute to assign ColorType to
the attribute color. The second example also uses the type attribute,
this time to assign the built-in type integer to the attribute dim.
The xs prefix is used because the built-in types are part of the XML
Schema Namespace. For a complete explanation of the use of prefixes
in schema documents, see Section 3.3.5 on p. 52.
The third example uses an inline anonymous simple type, which is
defined entirely within the system attribute declaration. Finally, the
fourth attribute, anything, does not specify a particular type, which
means that any value is valid.
For a detailed discussion of using named or anonymous types, see
Section 8.2.3 on p. 133.

7.4 Qualiﬁed vs. unqualiﬁed forms

XML Schema allows you to exert some control over using namespace-
qualified or unqualified attribute names in the instance. Since default
namespace declarations do not apply to attributes, this is essentially a
question of whether you want the attribute names to be prefixed or
unprefixed.
This is indicated by the form attribute, which may be set to
qualified or unqualified. If the form attribute is not present in
a local attribute declaration, the value defaults to the value of the
attributeFormDefault attribute of the schema element. If neither
attribute is present, the default is unqualified. The form and
attributeFormDefault attributes only apply to locally declared at-
tributes. If an attribute is declared globally (at the top level of the
schema document), it must always have a qualified (prefixed) name in
the instance.
Example 7–4 shows a schema that declares several attributes, along
with a valid instance. In the instance, the global attribute’s name
is qualified (prefixed) because it is globally declared. The attributes
unqual and qual both have a form attribute specified, and their
names appear in the instance as designated. The unspec
attribute’s name is unqualified (unprefixed) because that is the default
when neither form nor attributeFormDefault is present.
Qualified attribute names should only be used for attributes that
apply to a variety of elements in a variety of namespaces, such as
xml:lang or xsi:type. For locally declared attributes, whose scope
is limited to the type definition in which they appear, prefixes add extra
text without any additional meaning.
The best way to handle qualification of attribute names is to ignore
the form and attributeFormDefault attributes completely. Then,
globally declared attributes will have qualified names, and locally
declared attributes will have unqualified names, which makes sense.
Section 21.7.3 on p. 575 provides a more complete coverage of the
pros and cons of unqualified local names for both elements and
attributes.

Example 7–4. Qualiﬁed and unqualiﬁed attribute names

Schema:

Valid instance:

<prod:size xmlns:prod="http://datypic.com/prod"
           prod:global="x" unqual="x" prod:qual="x" unspec="x"/>
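
A schema consistent with the valid instance above might look like the
following sketch. It is reconstructed from the description in this
section (one global attribute plus three local ones with different
form settings); the use of string as the type of every attribute and
the name SizeType are assumptions.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">

  <!-- globally declared: always qualified in the instance -->
  <xs:attribute name="global" type="xs:string"/>

  <xs:element name="size" type="SizeType"/>

  <xs:complexType name="SizeType">
    <xs:attribute ref="global"/>
    <!-- explicitly unqualified -->
    <xs:attribute name="unqual" type="xs:string" form="unqualified"/>
    <!-- explicitly qualified -->
    <xs:attribute name="qual" type="xs:string" form="qualified"/>
    <!-- no form and no attributeFormDefault: unqualified by default -->
    <xs:attribute name="unspec" type="xs:string"/>
  </xs:complexType>

</xs:schema>
```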

7.5 Default and ﬁxed values

Default and fixed values are used to augment an instance by adding
attributes when they are not present. If an attribute is absent, and a
default or fixed value is specified in its declaration, the schema processor
will insert the attribute and give it the default or fixed value.
Default and fixed values are specified by the default and fixed
attributes, respectively. Only one of the two attributes (default or
fixed) may appear; they are mutually exclusive. If an attribute has a
default value specified, it cannot be a required attribute. This makes
sense, because if the attribute is required, it will always appear in
instances, and the default value will never be used.
The default or fixed value must be valid for the type of that attribute.
For example, it is not legal to specify a default value of xyz if the type
of the attribute is integer.

7.5.1 Default values

A default value is ﬁlled in if the attribute is absent from the element.
If the attribute appears, with any value, it is left alone. Example 7–5
shows the declaration of size with one attribute, dim, that has a default
value speciﬁed.

Example 7–5. Declaring a default value for an attribute

<xs:element name="size">
  <xs:complexType>
    <xs:attribute name="dim" type="xs:integer" default="1"/>
  </xs:complexType>
</xs:element>

Table 7–3 describes how attribute default values are inserted in dif-
ferent situations, based on the declaration in Example 7–5. Note that
the only time the default value is inserted is when the attribute is
absent. If the attribute’s value is the empty string, it is left as is. In that
case, if an empty string is not valid for that type, which it is not for
integer, an error is raised. This is different from the behavior of
default values for elements, described in Section 6.4.1 on p. 102.

Table 7–3 Default value behavior for attributes

Situation                 Result                    Before example    After example
Attribute is absent.      Attribute is added with   <size/>           <size dim="1"/>
                          the default value.
Attribute appears with    Original value is kept.   <size dim="2"/>   <size dim="2"/>
a value.
Attribute appears with    Empty string is kept.     <size dim=""/>    <size dim=""/>
empty string as its                                                   (Error is raised.)
value.

7.5.2 Fixed values

Fixed values are inserted in all the same situations as default values.
The only difference is that if the attribute appears, its value must be
equal to the fixed value. When the schema processor determines whether
the value of the attribute is in fact equal to the fixed value, it takes into
account the attribute’s type.
Table 7–4 shows some valid and invalid instances for attributes
declared with fixed values. The dim attribute has the type integer,
so all forms of the integer “1” are accepted in the instance, including
“01”, “+1”, and “ 1 ” surrounded by whitespace. The whitespace
is acceptable because the whiteSpace facet value for integer is
collapse, meaning that leading and trailing whitespace is stripped
before validation takes place.

Table 7–4 Attributes with fixed values

Schema
<xs:element name="size">
  <xs:complexType>
    <xs:attribute name="system" type="xs:string" fixed="1"/>
    <xs:attribute name="dim" type="xs:integer" fixed="1"/>
  </xs:complexType>
</xs:element>

Valid instances Invalid instances

The system attribute, on the other hand, has the type string. The
string “01” is invalid because it is not considered equal to the string
“1”. The string “ 1 ” is also invalid because the whiteSpace facet value
for string is preserve, meaning that the leading and trailing spaces
are kept. For more information on type equality, please see Section 11.7
on p. 253.
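
The behavior described in this section can be sketched with a few
instances of size. The listing is an illustration derived from the
rules above rather than a reproduction of the table, and the sizes
wrapper element is hypothetical.

```xml
<sizes>
  <!-- valid: absent attributes are filled in with their fixed values -->
  <size/>
  <!-- valid: "1", "01", "+1" and " 1 " are all the integer 1 -->
  <size dim="1"/>
  <size dim="01"/>
  <size dim="+1"/>
  <size dim=" 1 "/>
  <!-- valid: exactly the string "1" -->
  <size system="1"/>
  <!-- invalid: a different integer -->
  <size dim="2"/>
  <!-- invalid: the empty string is not a valid integer -->
  <size dim=""/>
  <!-- invalid: as strings, "01" and " 1 " are not equal to "1" -->
  <size system="01"/>
  <size system=" 1 "/>
</sizes>
```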

7.6 Inherited attributes

In version 1.1, it is possible to declare an attribute to be inheritable.
Conceptually, this means that it is relevant not just to the element on
which it appears, but to the descendant elements as well. A good exam-
ple is a language attribute which could be declared as an allowed at-
tribute of a chapter element, but could be overridden in a descendant
p element, as shown in Example 7–6.

Example 7–6. Instance containing an inherited attribute

The implication is that the language of the chapter is English (en),
and all descendant elements have that language unless otherwise
specified with another language attribute. In this case, the first
paragraph doesn’t have a language attribute, so it inherits the value en
from the chapter. The second paragraph does not inherit the value,
because it overrides it by saying its language is French (fr).
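
An instance matching this description might look like the sketch
below. The paragraph text is invented for illustration.

```xml
<chapter language="en">
  <!-- no language attribute: inherits en from the chapter -->
  <p>This paragraph is in the language of the chapter.</p>
  <!-- overrides the inherited value -->
  <p language="fr">Ce paragraphe est en français.</p>
</chapter>
```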
Example 7–7 is a schema for the example. The language attribute
declaration uses inheritable="true" to indicate that language
attributes are inheritable from chapters down to their descendants.
Note that using an inheritable language attribute declaration in
ChapterType does not mean that the attribute can be automatically
valid on all descendants of a chapter. In fact, this example requires a
separate language attribute declaration within ParaType. (In this
example, making language inheritable within ParaType would have
no meaning, since there are no allowed descendants. But otherwise it is
good practice to make language inheritable.)

Example 7–7. Declaring an inheritable attribute

<xs:element name="chapter" type="ChapterType"/>
<xs:complexType name="ChapterType">
  <xs:sequence>
    <xs:element name="p" type="ParaType" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute name="language" type="xs:language"
                inheritable="true"/>
</xs:complexType>
<xs:complexType name="ParaType">
  <xs:simpleContent>
    <xs:extension base="xs:string">
      <xs:attribute name="language" type="xs:language"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>

A practical implication of declaring an attribute inheritable is that
inherited attributes can be used in the type alternatives that are defined
for the descendant elements. This is described further in Section 14.2.6
on p. 382.

Chapter 8
Simple types

Both element and attribute declarations can use simple types
to describe their data content. This chapter introduces
simple types and explains how to define your own
atomic simple types for use in your schemas.

8.1 Simple type varieties

There are three varieties of simple types: atomic types, list types, and
union types.

1. Atomic types have values that are indivisible, such as 10 or large.
2. List types have values that are whitespace-separated lists of
atomic values, such as <availableSizes>10 large 2</availableSizes>.
3. Union types may have values that are either atomic values or list
values. What differentiates them is that the set of valid values,
or “value space,” for the type is the union of the value spaces of
two or more other simple types. For example, to represent a
dress size, you may define a union type that allows a value to
be either an integer from 2 through 18, or one of the string
values small, medium, or large.

List and union types are covered in Chapter 10.
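
As a brief preview, the three varieties described above might be
declared as in the sketch below. All four type names are hypothetical
and chosen only for this illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- atomic: an integer from 2 through 18 -->
  <xs:simpleType name="NumericSizeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- atomic: one of the strings small, medium, or large -->
  <xs:simpleType name="NamedSizeType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="small"/>
      <xs:enumeration value="medium"/>
      <xs:enumeration value="large"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- union: the value space is the union of the two types above -->
  <xs:simpleType name="DressSizeUnionType">
    <xs:union memberTypes="NumericSizeType NamedSizeType"/>
  </xs:simpleType>

  <!-- list: whitespace-separated values, each valid for the union -->
  <xs:simpleType name="AvailableSizesType">
    <xs:list itemType="DressSizeUnionType"/>
  </xs:simpleType>

  <xs:element name="availableSizes" type="AvailableSizesType"/>

</xs:schema>
```

With these definitions, <availableSizes>10 large 2</availableSizes>
is a valid instance.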

8.1.1 Design hint: How much should I break down my data values?

Data values should be broken down to the most atomic level possible.
This allows them to be processed in a variety of ways for different uses,
such as display, mathematical operations, and validation. It is much
easier to concatenate two data values back together than it is to split
them apart. In addition, more granular data is easier to validate.
It is a fairly common practice to put a data value and its
units in the same element, for example <length>3cm</length>.
However, the preferred approach is to have a separate data value,
preferably an attribute, for the units, for example
<length units="cm">3</length>.
Using a single concatenated value is limiting because:

# It is extremely cumbersome to validate. You have to apply a
complicated pattern that would need to change every time
a unit type is added.
# You cannot perform comparisons, conversions, or mathematical
operations on the data without splitting it apart.
# If you want to display the data item differently (for example, as
“3 centimeters” or “3 cm” or just “3”), you have to split it apart.
This complicates the stylesheets and applications that process
instance documents.
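
A declaration for the preferred form could be sketched as follows. The
choice of decimal for the value and the particular list of units are
assumptions made for illustration.

```xml
<xs:element name="length">
  <xs:complexType>
    <xs:simpleContent>
      <!-- the numeric value can be validated and compared on its own -->
      <xs:extension base="xs:decimal">
        <xs:attribute name="units" use="required">
          <xs:simpleType>
            <!-- adding a unit means adding one enumeration value -->
            <xs:restriction base="xs:token">
              <xs:enumeration value="cm"/>
              <xs:enumeration value="in"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
```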

It is possible to go too far, though. For example, you may break a
date down as follows:

This is probably overkill unless you have a special need to process
these items separately.
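
As a hypothetical illustration of such an over-granular breakdown, the
fragment below contrasts a date split into separate parts with a
single value of the built-in type date. All element names and values
here are invented for the example.

```xml
<dates>
  <!-- over-granular: each part of the date is its own element -->
  <splitDate>
    <year>2001</year>
    <month>06</month>
    <day>15</day>
  </splitDate>
  <!-- usually sufficient: one value that can be typed as xs:date -->
  <wholeDate>2001-06-15</wholeDate>
</dates>
```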

8.2 Simple type definitions

8.2.1 Named simple types

Simple types can be either named or anonymous. Named simple types
are always defined globally (i.e., their parent is always schema,
except in the case of a redefine or override) and are required to have
a name that is unique among the types (both simple and complex) in
the schema. The syntax for a named simple type definition is shown in
Table 8–1.
The name of a simple type must be an XML non-colonized name,
which means that it must start with a letter or underscore, and may
only contain letters, digits, underscores, hyphens, and periods. You
cannot include a namespace prefix when defining the type; it takes its
namespace from the target namespace of the schema document.
All examples of named types in this book have the word “Type” at
the end of their names to clearly distinguish them from element and
attribute names. However, this is a convention and not a requirement.
You can even have a type definition and an element declaration using
the same name, but this is not recommended because it can be
confusing.
Example 8–1 shows the definition of a named simple type
DressSizeType along with an element declaration that references it.
Named types can be used in multiple element and attribute declarations.

Table 8–1 XSD Syntax: named simple type definition

Name
simpleType

Parents
schema, redefine, override (1.1)

Attribute name   Type                    Description
id               ID                      Unique ID.
name             NCName                  Simple type name.
final            "#all" | list of        Whether other types can be
                 ("restriction" |        derived from this one (see
                 "list" | "union" |      Section 8.5); defaults to
                 "extension" (1.1))      finalDefault of schema.

Content
annotation?, (restriction | list | union)

Example 8–1. Defining and referencing a named simple type

<xs:simpleType name="DressSizeType">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="2"/>
    <xs:maxInclusive value="18"/>
  </xs:restriction>
</xs:simpleType>

<xs:element name="size" type="DressSizeType"/>
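
To illustrate that a named type can serve several declarations, the following sketch reuses DressSizeType in two element declarations and one attribute declaration. The names alternateSize and defaultSize are hypothetical and are used only for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:simpleType name="DressSizeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- All three declarations share the one type definition, so a
       change to the range is made in a single place. -->
  <xs:element name="size" type="DressSizeType"/>
  <xs:element name="alternateSize" type="DressSizeType"/>
  <xs:attribute name="defaultSize" type="DressSizeType"/>

</xs:schema>
```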

8.2.2 Anonymous simple types

Anonymous types, on the other hand, must not have names. They are
always defined entirely within an element or attribute declaration, and
may only be used once, by that declaration. Defining a type
anonymously prevents it from ever being restricted, used in a list or
union, redefined, or overridden. The syntax to define an anonymous
simple type is shown in Table 8–2.

Table 8–2 XSD Syntax: anonymous simple type definition

Name
simpleType

Parents
element, attribute, restriction, list, union, 1.1 alternative

Attribute name   Type   Description

id               ID     Unique ID.

Content
annotation?, (restriction | list | union)

Example 8–2 shows the definition of an anonymous simple type
within an element declaration.

Example 8–2. Defining an anonymous simple type

<xs:element name="size">
  <xs:simpleType>
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

8.2.3 Design hint: Should I use named or anonymous types?

The advantage of named types is that they may be defined once
and used many times. For example, you may define a type named
ProductCodeType that lists all of the valid product codes in your
organization. This type can then be used in many element and
attribute declarations in many schemas. This has the advantages of

# Encouraging consistency throughout the organization
# Reducing the possibility of error
# Requiring less time to define new schemas
# Simplifying maintenance, because new product codes need only
  be added in one place

If a type is named, you can also derive new types from it, which is
another way to promote reuse and consistency.
Named types can also make a schema more readable when its type
definitions are complicated.
An anonymous type, on the other hand, can be used only in the element
or attribute declaration that contains it. It can never be redefined,
overridden, have types derived from it, or be used in a list or union
type. This can seriously limit its reusability, extensibility, and ability
to change over time.
However, there are cases where anonymous types are preferable to
named types. If the type is unlikely to ever be reused, the advantages
listed above no longer apply. Also, there is such a thing as too much
reuse. For example, if an element can contain the values 1 through 10,
it does not make sense to define a type named OneToTenType to be
reused by other unrelated element declarations with the same value
space. If the value space for one of the element declarations using that
named type changes but the other element declarations stay the same,
it actually makes maintenance more difficult, because a new type would
need to be defined at that time.
In addition, anonymous types can be more readable when they are
relatively simple. It is sometimes desirable to have the definition of
the type right there with the element or attribute declaration.

8.3 Simple type restrictions

Every simple type is a restriction of another simple type, known as its
base type. It is not possible to extend a simple type, except by adding
attributes, which results in a complex type. This is described in
Section 13.4.1 on p. 306.
Every new simple type restricts the value space of its base type in
some way. Example 8–3 shows a deﬁnition of DressSizeType that
restricts the built-in type integer.

Example 8–3. Deriving a simple type from a built-in simple type

<xs:simpleType name="DressSizeType">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="2"/>
    <xs:maxInclusive value="18"/>
    <xs:pattern value="\d{1,2}"/>
  </xs:restriction>
</xs:simpleType>

Simple types may also restrict user-derived simple types that are
deﬁned in the same schema document, or even in a different schema
document. For example, you could further restrict DressSizeType
by deﬁning another simple type, MediumDressSizeType, as shown
in Example 8–4.
A simple type restricts its base type by applying facets to restrict its
values. In Example 8–4, the facets minInclusive and maxInclusive
are used to restrict the value of MediumDressSizeType to be between
8 and 12 inclusive.

Example 8–4. Deriving a simple type from a user-derived simple type

<xs:simpleType name="MediumDressSizeType">
  <xs:restriction base="DressSizeType">
    <xs:minInclusive value="8"/>
    <xs:maxInclusive value="12"/>
  </xs:restriction>
</xs:simpleType>

8.3.1 Deﬁning a restriction

The syntax for a restriction element is shown in Table 8–3. You
must specify one base type either by using the base attribute or by
deﬁning the simple type anonymously using a simpleType child. The
option of using a simpleType child is generally only useful when
restricting list types, as described in Section 10.3.3 on p. 190.
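
The following sketch shows the second option: the base type is supplied as an anonymous simpleType child instead of a base attribute. The type name ThreeSizesType is hypothetical. The child defines a list of integers, and the enclosing restriction then limits that list to exactly three items.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:simpleType name="ThreeSizesType">
    <!-- No base attribute: the base type is the anonymous
         list type defined by the simpleType child. -->
    <xs:restriction>
      <xs:simpleType>
        <xs:list itemType="xs:integer"/>
      </xs:simpleType>
      <!-- For a list type, length counts the items in the list. -->
      <xs:length value="3"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```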

Table 8–3 XSD Syntax: simple type restriction

Name
restriction

Parents
simpleType

Attribute name   Type    Description

id               ID      Unique ID.
base             QName   Simple type that is being restricted; either a base
                         attribute or a simpleType child is required.

Content
annotation?, simpleType?, (minExclusive | minInclusive |
maxExclusive | maxInclusive | length | minLength | maxLength |
totalDigits | fractionDigits | enumeration | pattern |
whiteSpace | 1.1 assertion | 1.1 explicitTimezone |
1.1 {any element in another namespace})*

Within a restriction element, you can specify any of the facets,
in any order. However, the only facets that may appear more than once
in the same restriction are pattern, enumeration, and assertion.
It is legal to define a restriction that has no facets specified. In this case,
the derived type allows the same values as the base type.
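
Both points can be sketched with two hypothetical types. SizeLabelType is a restriction with no facets at all, and EvenSmallSizeType mixes bounds and enumeration facets in an arbitrary order, repeating only the enumeration facet.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- No facets: allows exactly the same values as xs:token. -->
  <xs:simpleType name="SizeLabelType">
    <xs:restriction base="xs:token"/>
  </xs:simpleType>

  <!-- Facets may come in any order; only enumeration is repeated. -->
  <xs:simpleType name="EvenSmallSizeType">
    <xs:restriction base="xs:integer">
      <xs:maxInclusive value="6"/>
      <xs:enumeration value="2"/>
      <xs:enumeration value="4"/>
      <xs:enumeration value="6"/>
      <xs:minInclusive value="2"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```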

8.3.2 Overview of the facets

The available facets are listed in Table 8–4.

Table 8–4 Facets

Facet                  Meaning

minExclusive           Value must be greater than x.
minInclusive           Value must be greater than or equal to x.
maxInclusive           Value must be less than or equal to x.
maxExclusive           Value must be less than x.
length                 The length of the value must be equal to x.
minLength              The length of the value must be greater than or
                       equal to x.
maxLength              The length of the value must be less than or equal
                       to x.
totalDigits            The number of significant digits must be less than
                       or equal to x.
fractionDigits         The number of fractional digits must be less than or
                       equal to x.
whiteSpace             The schema processor should either preserve, replace,
                       or collapse whitespace depending on x.
enumeration            x is one of the valid values.
pattern                x is one of the regular expressions that the value may
                       match.
1.1 explicitTimezone   The time zone part of the date/time value is required,
                       optional, or prohibited depending on x.
1.1 assertion          The value must conform to a constraint in the XPath
                       expression.

The syntax for applying a facet is shown in Table 8–5. All facets
(except assertion) must have a value attribute, which has different
valid values depending on the facet. Most facets may also have a fixed
attribute, as described in Section 8.3.4 on p. 140.

Table 8–5 XSD Syntax: facet

Name
minExclusive, minInclusive, maxExclusive, maxInclusive, length,
minLength, maxLength, totalDigits, fractionDigits, enumeration,
pattern, whiteSpace, 1.1 explicitTimezone†

Parents
restriction

Attribute name   Type             Description

id               ID               Unique ID.
value            various          Value of the restricting facet.
fixed            boolean: false   Whether the facet is fixed and therefore cannot
                                  be restricted further (see Section 8.3.4); not
                                  applicable for pattern, enumeration.

Content
annotation?

† The assertion facet has a different syntax that is described in Table 14–1.

Certain facets are not applicable to some types. For example, it does
not make sense to apply the fractionDigits facet to a character
string type. There is a defined set of applicable facets for each of the
built-in types.¹ If a facet is applicable to a built-in type, it is also
applicable to atomic types that are derived from it. For example, since the
length facet is applicable to string, if you derive a new type from
string, the length facet is also applicable to your new type.
Section 8.4 on p. 142 describes each of the facets in detail and lists the
built-in types to which the facet can apply.

¹ Technically, it is the primitive types that have applicable facets, with the
rest of the built-in types inheriting that applicability from their base types.
However, since most people do not have the built-in type hierarchy
memorized, it is easier to list applicable facets for all the built-in types.

8.3.3 Inheriting and restricting facets

When a simple type restricts its base type, it inherits all of the facets of
its base type, its base type’s base type, and so on back through its
ancestors. Example 8–4 showed a simple type MediumDressSizeType
whose base type is DressSizeType. DressSizeType, as defined in
Example 8–3, has a pattern facet which restricts its value space to
one- or two-digit numbers. Since MediumDressSizeType inherits all of
the facets from DressSizeType, this same pattern facet applies to
MediumDressSizeType also. Example 8–5 shows an equivalent definition
of MediumDressSizeType where it restricts integer and has the pattern
facet applied.

Example 8–5. Effective definition of MediumDressSizeType

<xs:simpleType name="MediumDressSizeType">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="8"/>
    <xs:maxInclusive value="12"/>
    <xs:pattern value="\d{1,2}"/>
  </xs:restriction>
</xs:simpleType>

Sometimes a simple type definition will include facets that
are also specified for one of its ancestors. In Example 8–4,
MediumDressSizeType includes minInclusive and maxInclusive,
which are also applied to its base type, DressSizeType. The
minInclusive and maxInclusive facets of MediumDressSizeType
(whose values are 8 and 12, respectively) override those of
DressSizeType (2 and 18, respectively).
It is a requirement that the facets of a derived type (in this case
MediumDressSizeType) be no less restrictive than those of the base
type. In Example 8–6, we define a new restriction of DressSizeType,
called SmallDressSizeType, and set minInclusive to 0. This type
definition is illegal, because it attempts to expand the value space by
allowing 0, which was not valid for DressSizeType.

Example 8–6. Illegal attempt to extend a simple type

<xs:simpleType name="SmallDressSizeType">
  <xs:restriction base="DressSizeType">
    <xs:minInclusive value="0"/>
    <xs:maxInclusive value="6"/>
  </xs:restriction>
</xs:simpleType>

This rule also applies when you are restricting the built-in types. For
example, the short type has a maxInclusive value of 32767. It is
illegal to define a restriction of short that sets maxInclusive to
32768.
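
As a sketch of this rule for a built-in type, the hypothetical SmallCountType below is a legal restriction of short, while the commented-out TooLargeCountType (also hypothetical) would be rejected.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Legal: both bounds lie within the value space of xs:short. -->
  <xs:simpleType name="SmallCountType">
    <xs:restriction base="xs:short">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="1000"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Illegal if uncommented: 32768 is greater than the maxInclusive
       of xs:short (32767), so it would expand the value space.

  <xs:simpleType name="TooLargeCountType">
    <xs:restriction base="xs:short">
      <xs:maxInclusive value="32768"/>
    </xs:restriction>
  </xs:simpleType>
  -->

</xs:schema>
```
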
Although enumeration facets can appear multiple times in the same
type definition, they are treated in much the same way. If both a
derived type and its ancestor have a set of enumeration facets, the
values of the derived type must be a subset of the values of the ancestor.
An example of this is provided in Section 8.4.4 on p. 145.
Likewise, the pattern facets specified in a derived type must allow
a subset of the values allowed by the ancestor types. A schema processor
will not necessarily check that the regular expressions represent a subset;
instead, it will validate instances against the patterns of both the derived
type and all the ancestor types, effectively taking the intersection of the
pattern values.

8.3.4 Fixed facets

When you define a simple type, you can fix one or more of the facets.
This means that further restrictions of this type cannot change the
value of the facet. Any of the facets may be fixed, with the exception
of pattern, enumeration, and assertion. Example 8–7 shows our
DressSizeType with fixed minInclusive and maxInclusive facets,
as indicated by a fixed attribute set to true.

Example 8–7. Fixed facets

<xs:simpleType name="DressSizeType">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="2" fixed="true"/>
    <xs:maxInclusive value="18" fixed="true"/>
    <xs:pattern value="\d{1,2}"/>
  </xs:restriction>
</xs:simpleType>

With this definition of DressSizeType, it would have been illegal
to define the MediumDressSizeType as shown in Example 8–4 because
it attempts to override the minInclusive and maxInclusive facets
which are now fixed. Some of the built-in types have fixed facets that
cannot be overridden. For example, the built-in type integer has its
fractionDigits facet fixed at 0, so it is illegal to derive a type from
integer and specify a fractionDigits that is not 0.

8.3.4.1 Design hint: When should I fix a facet?

Fixing facets makes your type less flexible and discourages other schema
authors from reusing it. Keep in mind that any types that may be
derived from your type must be more restrictive, so you are not at risk
that your type will be dramatically changed if its facets are unfixed.
A justification for fixing facets might be that changing that facet
value would significantly alter the meaning of the type. For example,
suppose you want to define a simple type that represents price. You
define a Price type and fix the fractionDigits at 2. This still allows
other schema authors to restrict Price to define other types, for
example, by limiting it to a certain range of values. However, they cannot
modify the fractionDigits of the type, because this would result in
a type not representing a price in dollars and cents.
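
A minimal sketch of the price scenario follows. The names PriceType and DiscountPriceType, the lower bound of 0, and the upper bound of 100 are all assumptions made for illustration. The derived type narrows the range but leaves the fixed fractionDigits untouched.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- fractionDigits is fixed, so no derived type can change it. -->
  <xs:simpleType name="PriceType">
    <xs:restriction base="xs:decimal">
      <xs:minInclusive value="0"/>
      <xs:fractionDigits value="2" fixed="true"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Still allowed: restricting a facet that is not fixed. -->
  <xs:simpleType name="DiscountPriceType">
    <xs:restriction base="PriceType">
      <xs:maxExclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```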

8.4 Facets
8.4.1 Bounds facets
The four bounds facets (minInclusive, maxInclusive,
minExclusive, and maxExclusive) restrict a value to a specified
range. Our previous examples applied minInclusive and
maxInclusive to restrict the value space of DressSizeType. While
minInclusive and maxInclusive specify boundary values that
are included in the valid range, minExclusive and maxExclusive
specify bounds that are excluded from the valid range.
There are several constraints associated with the bounds facets:

# minInclusive and minExclusive cannot both be applied to
  the same type. Likewise, maxInclusive and maxExclusive
  cannot both be applied to the same type. You may, however,
  mix and match, applying, for example, minInclusive and
  maxExclusive together. You may also apply just one end of
  the range, such as minInclusive only.
# The value for the lower bound (minInclusive or
  minExclusive) must be less than or equal to the value for the
  upper bound (maxInclusive or maxExclusive).
# The facet value must be a valid value for the base type. For
  example, when restricting integer, it is illegal to specify a
  maxInclusive value of 18.5, because 18.5 is not a valid
  integer.
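
The constraints above can be sketched with two hypothetical types: DiscountPercentType mixes an inclusive lower bound with an exclusive upper bound, and AdultAgeType applies only one end of the range.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Mix and match: 0 is allowed, 100 is not. -->
  <xs:simpleType name="DiscountPercentType">
    <xs:restriction base="xs:decimal">
      <xs:minInclusive value="0"/>
      <xs:maxExclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Only the lower end of the range is bounded. -->
  <xs:simpleType name="AdultAgeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```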

The four bounds facets can be applied only to the date/time and
numeric types, and the types derived from them. Special consideration
should be given to time zones when applying bounds facets to date/time
types. For more information, see Section 11.4.15 on p. 235.

8.4.2 Length facets

The length facet allows you to limit values to a specific length. If it
is a string-based type, length is measured in number of characters.
This includes the XML DTD types and anyURI. If it is a binary type,
length is measured in octets of binary data. If it is a list type, length is
measured as the number of items in the list. The facet value for length
must be a nonnegative integer.
The minLength and maxLength facets allow you to limit a value’s
length to a specific range. Either or both of these facets may be applied.
If they are both applied, minLength must be less than or equal to
maxLength. If the length facet is applied, neither minLength
nor maxLength may be applied. The facet values for minLength and
maxLength must be nonnegative integers.
The three length facets (length, minLength, maxLength) can be
applied to any string-based types (including the XML DTD types),
the binary types, and anyURI. They cannot be applied to the date/time
types, numeric types, or boolean.
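
The following sketch applies the length facets to a string-based type and to a binary type. All three type names are hypothetical. Note that the length of a hexBinary value counts octets, not the hexadecimal characters that appear in the instance.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Exactly two characters. -->
  <xs:simpleType name="StateCodeType">
    <xs:restriction base="xs:token">
      <xs:length value="2"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Between 1 and 50 characters. -->
  <xs:simpleType name="ProductNameType">
    <xs:restriction base="xs:token">
      <xs:minLength value="1"/>
      <xs:maxLength value="50"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- 20 octets, written as 40 hexadecimal characters. -->
  <xs:simpleType name="DigestType">
    <xs:restriction base="xs:hexBinary">
      <xs:length value="20"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```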

8.4.2.1 Design hint: What if I want to allow empty values?

Many of the built-in types do not allow empty values. Types other than
string, normalizedString, token, hexBinary, base64Binary,
and anyURI do not allow empty values unless xsi:nil appears in the
element tag.
You may have an integer that you want to be either between 2 and
18, or empty. First, consider whether you want to make the element
(or attribute) optional. In this case, if the data is absent, the
element will not appear at all. However, sometimes it is desirable for
the element to appear, as a placeholder, or perhaps it is unavoidable
because of the technology used to generate the instance.
If you do determine that the elements must be able to appear empty,
you must deﬁne a union type that includes both the integer type and
an empty string, as shown in Example 8–8.

Example 8–8. Union allowing an empty value

<xs:simpleType name="DressSizeType">
  <xs:union>
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="2"/>
        <xs:maxInclusive value="18"/>
      </xs:restriction>
    </xs:simpleType>
    <xs:simpleType>
      <xs:restriction base="xs:token">
        <xs:enumeration value=""/>
      </xs:restriction>
    </xs:simpleType>
  </xs:union>
</xs:simpleType>
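
Assuming a size element declared with this union type (as in Example 8–1, but referring to the definition above), instances such as the following would be expected to behave as the comments describe:

```xml
<!-- Valid: matches the integer member type (2 through 18). -->
<size>10</size>

<!-- Valid: empty content matches the enumerated empty string. -->
<size></size>
<size/>

<!-- Invalid: 20 is outside the integer range and is not empty. -->
<size>20</size>
```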

8.4.2.2 Design hint: What if I want to restrict the length of an integer?
The length facet only applies to the string-based types, the XML
DTD types, the binary types, and anyURI. It does not make sense to
try to limit the length of the date/time types because they have fixed
lexical representations. But what if you want to restrict the length of
an integer value?
You can restrict the lower and upper bounds of an integer by applying
bounds facets, as discussed in Section 8.4.1 on p. 142. You can also
control the number of significant digits in an integer using the
totalDigits facet, as discussed in Section 8.4.3 on p. 145. However,
these facets do not consider leading zeros as significant. Therefore, they
cannot force an integer to appear in the instance with a specific number
of digits. To do this, you need a pattern. For example, the pattern
\d{1,2} used in our DressSizeType example forces the size to be
one or two digits long, so 012 would be invalid.
Before taking this approach, however, you should reconsider whether
it is really an integer or a string. See Section 11.3.3.1 on p. 220 for a
discussion of this issue.

8.4.3 totalDigits and fractionDigits

The totalDigits facet allows you to specify the maximum number
of digits in a number. The facet value for totalDigits must be a
positive integer.
The fractionDigits facet allows you to specify the maximum
number of digits in the fractional part of a number. The facet value
for fractionDigits must be a nonnegative integer, and it must not
exceed the value for totalDigits, if one exists.
The totalDigits facet can be applied to decimal or any of the
integer types, as well as types derived from them. The fractionDigits
facet may only be applied to decimal, because it is ﬁxed at 0 for all
integer types.
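
As a sketch, the hypothetical AmountType below allows at most five digits in total, of which at most two may follow the decimal point. The comments list how a few sample values would be treated.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:simpleType name="AmountType">
    <xs:restriction base="xs:decimal">
      <xs:totalDigits value="5"/>
      <xs:fractionDigits value="2"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Valid:   123.45   (5 digits, 2 fractional)
       Valid:   1234.5   (5 digits, 1 fractional)
       Valid:   12345    (5 digits, 0 fractional)
       Invalid: 123.456  (3 fractional digits)
       Invalid: 123456   (6 digits) -->

</xs:schema>
```
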

8.4.4 Enumeration
The enumeration facet allows you to specify a distinct set of
valid values for a type. Unlike most other facets (except pattern and
assertion), the enumeration facet can appear multiple times in a
single restriction. Each enumerated value must be unique, and must
be valid for that type. If it is a string-based or binary type, you may also
specify the empty string in an enumeration value, which allows elements
or attributes of that type to have empty values.
Example 8–9 shows a simple type SMLXSizeType that allows the
values small, medium, large, and extra large.

Example 8–9. Applying the enumeration facet

<xs:simpleType name="SMLXSizeType">
  <xs:restriction base="xs:token">
    <xs:enumeration value="small"/>
    <xs:enumeration value="medium"/>
    <xs:enumeration value="large"/>
    <xs:enumeration value="extra large"/>
  </xs:restriction>
</xs:simpleType>

When restricting types that have enumerations, it is important to
note that you must restrict, rather than extend, the set of enumeration
values. For example, if you want a type SMLSizeType that restricts the
valid values of SMLXSizeType to only small, medium, and large, you
could define it as in Example 8–10.

Example 8–10. Restricting an enumeration

<xs:simpleType name="SMLSizeType">
  <xs:restriction base="SMLXSizeType">
    <xs:enumeration value="small"/>
    <xs:enumeration value="medium"/>
    <xs:enumeration value="large"/>
  </xs:restriction>
</xs:simpleType>

Note that you need to repeat all of the enumeration values that
apply to the new type. This example is legal because the values for
SMLSizeType (small, medium, and large) are a subset of the values
for SMLXSizeType. By contrast, Example 8–11 attempts to add an
enumeration facet to allow the value extra small. This type
definition is illegal because it attempts to extend rather than restrict
the value space of SMLXSizeType.

Example 8–11. Illegal attempt to extend an enumeration

<xs:simpleType name="XSMLXSizeType">
  <xs:restriction base="SMLXSizeType">
    <xs:enumeration value="extra small"/>
    <xs:enumeration value="small"/>
    <xs:enumeration value="medium"/>
    <xs:enumeration value="large"/>
    <xs:enumeration value="extra large"/>
  </xs:restriction>
</xs:simpleType>

The only way to add an enumeration value to a type is by defining
a union type. Example 8–12 shows a union type that adds the value
extra small to the set of valid values. Union types are described in
detail in Section 10.2 on p. 183.

Example 8–12. Using a union to extend an enumeration

<xs:simpleType name="XSMLXSizeType">
  <xs:union memberTypes="SMLXSizeType">
    <xs:simpleType>
      <xs:restriction base="xs:token">
        <xs:enumeration value="extra small"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:union>
</xs:simpleType>

When enumerating numbers, it is important to remember that the
enumeration facet works on the actual value of the number, not its
lexical representation as it appears in an XML instance. Example 8–13
shows a simple type NewSmallDressSizeType that is based on
integer, and specifies an enumeration of 2, 4, and 6. The two instance
elements shown, which contain 2 and 02, are both valid. This is
because 02 is equivalent to 2 for integer-based types. However, if
the base type of NewSmallDressSizeType had been string, the
value 02 would not be valid, because the strings 2 and 02 are not
the same. If you wish to constrain the lexical representation of a numeric
type, you should apply the pattern facet instead. For more information
on type equality in XML Schema, see Section 11.7 on p. 253.

Example 8–13. Enumerating numeric values

Schema:

<xs:simpleType name="NewSmallDressSizeType">
  <xs:restriction base="xs:integer">
    <xs:enumeration value="2"/>
    <xs:enumeration value="4"/>
    <xs:enumeration value="6"/>
  </xs:restriction>
</xs:simpleType>

Valid instances:

<size>2</size>
<size>02</size>

The enumeration facet can be applied to any type except boolean.

8.4.5 Pattern
The pattern facet allows you to restrict values to a particular pattern,
represented by a regular expression. Chapter 9 provides more detail on
the rules for the regular expression syntax. Unlike most other facets
(except enumeration and assertion), the pattern facet can be
speciﬁed multiple times in a single restriction. If multiple pattern
facets are speciﬁed in the same restriction, the instance value must
match at least one of the patterns. It is not required to match all of the
patterns.
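
As a sketch of several pattern facets in one restriction, the hypothetical PostalCodeType below accepts a value that matches either of its two patterns (for example 12345 or 12345-6789); a value does not need to match both.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:simpleType name="PostalCodeType">
    <xs:restriction base="xs:token">
      <!-- Patterns in the same restriction are alternatives. -->
      <xs:pattern value="\d{5}"/>
      <xs:pattern value="\d{5}-\d{4}"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```
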
Example 8–14 shows a simple type DressSizeType that includes
the pattern \d{1,2}, which restricts the size to one or two digits.

Example 8–14. Applying the pattern facet

<xs:simpleType name="DressSizeType">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="2"/>
    <xs:maxInclusive value="18"/>
    <xs:pattern value="\d{1,2}"/>
  </xs:restriction>
</xs:simpleType>

When restricting types that have patterns, it is important to note
that you must restrict, rather than extend, the set of valid values that
the patterns represent. In Example 8–15, we define a simple type
SmallDressSizeType that is derived from DressSizeType, and add
an additional pattern facet that restricts the size to one digit.

Example 8–15. Restricting a pattern

<xs:simpleType name="SmallDressSizeType">
  <xs:restriction base="DressSizeType">
    <xs:minInclusive value="2"/>
    <xs:maxInclusive value="6"/>
    <xs:pattern value="\d{1}"/>
  </xs:restriction>
</xs:simpleType>

It is not technically an error to apply a pattern facet that does
not represent a subset of the ancestors’ pattern facets. However, the
schema processor tries to match the instance value against the pattern
facets of both the type and its ancestors, ensuring that it is in fact a
subset. Example 8–16 shows an illegal attempt to define a new
size type that allows the size value to be up to three digits long. While
the schema is not in error, it will not have the desired effect
because the schema processor will check values against both the pattern
of LongerDressSizeType and the pattern of DressSizeType.
The value 004 would not be considered a valid instance of
LongerDressSizeType because it does not conform to the pattern
of DressSizeType.

Example 8–16. Illegal attempt to extend a pattern

<xs:simpleType name="LongerDressSizeType">
  <xs:restriction base="DressSizeType">
    <xs:pattern value="\d{1,3}"/>
  </xs:restriction>
</xs:simpleType>

Unlike the enumeration facet, the pattern facet applies to the
lexical representation of the value. If the value 02 appears in an instance,
the pattern is applied to the digits 02, not 2 or +2 or any other form
of the integer.
The pattern facet can be applied to any type.

8.4.6 Assertion
The assertion facet allows you to specify additional constraints on
values using XPath 2.0. Example 8–17 is a simple type with an
assertion, namely that the value must be divisible by 2. It uses a facet named
assertion with a test attribute that contains the XPath expression.
Simple type assertions are a flexible and powerful feature covered in
more detail, along with complex type assertions, in Chapter 14.

Example 8–17. Simple type assertion

<xs:simpleType name="EvenDressSizeType">
  <xs:restriction base="DressSizeType">
    <xs:assertion test="$value mod 2 = 0"/>
  </xs:restriction>
</xs:simpleType>

8.4.7 Explicit Time Zone

The explicitTimezone facet allows you to control the presence of
an explicit time zone on a date/time value. Example 8–18 is a simple
type based on time but with an explicit time zone required. The syntax
of time zones is described in more detail in Section 11.4.13 on p. 233.
The value attribute of explicitTimezone has three possible values:

1. optional, making the time zone optional (the value for most
built-in date/time types)
2. required, making the time zone required (the value for the
dateTimeStamp built-in type)
3. prohibited, disallowing the time zone

Example 8–18. Explicit time zone

<xs:simpleType name="SpecificTimeType">
  <xs:restriction base="xs:time">
    <xs:explicitTimezone value="required"/>
  </xs:restriction>
</xs:simpleType>
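
The opposite case can be sketched in the same way. The hypothetical LocalDateTimeType below is based on dateTime and prohibits the time zone, so only values without a time zone are accepted. Like Example 8–18, it requires a version 1.1 processor.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:simpleType name="LocalDateTimeType">
    <xs:restriction base="xs:dateTime">
      <xs:explicitTimezone value="prohibited"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Valid:   2011-06-15T09:30:00
       Invalid: 2011-06-15T09:30:00Z
       Invalid: 2011-06-15T09:30:00-05:00 -->

</xs:schema>
```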

8.4.8 Whitespace
The whiteSpace facet allows you to specify the whitespace normalization
rules which apply to this value. Unlike the other facets, which
restrict the value space of the type, the whiteSpace facet is an instruction
to the schema processor on what to do with whitespace. This type
of facet is known as a prelexical facet because it results in some processing
of the value before the other constraining facets are applied. The
valid values for the whiteSpace facet are:

# preserve: All whitespace is preserved; the value is not changed.
# replace: Each occurrence of tab (#x9), line feed (#xA), and
  carriage return (#xD) is replaced with a single space (#x20).
# collapse: As with replace, each occurrence of tab (#x9), line
  feed (#xA), and carriage return (#xD) is replaced with a
  single space (#x20). After the replacement, all consecutive spaces
  are collapsed into a single space. In addition, leading and trailing
  spaces are deleted.

Table 8–6 shows examples of how values of a string-based type will
be handled depending on its whiteSpace facet.

Table 8–6 Handling of string values depending on whiteSpace facet
(a space is shown as ·, a tab as [TAB], and a line feed as [LF])

Original string      string               normalizedString     token
                     (preserve)           (replace)            (collapse)

a·string             a·string             a·string             a·string
on[LF]two·lines      on[LF]two·lines      on·two·lines         on·two·lines
has···spaces         has···spaces         has···spaces         has·spaces
[TAB]leading·tab     [TAB]leading·tab     ·leading·tab         leading·tab
···leading·spaces    ···leading·spaces    ···leading·spaces    leading·spaces

The whitespace processing, if any, will happen first, before any
validation takes place. In Example 8–9, the base type of SMLXSizeType
is token, which has a whiteSpace facet of collapse. Example 8–19
shows valid instances of SMLXSizeType. They are valid because the
leading and trailing spaces are removed, and the line feed is turned into
a space. If the base type of SMLXSizeType had been string, the
whitespace would have been left as is, and these values would have been
invalid.

Example 8–19. Valid instances of SMLXSizeType

<size> small </size>

<size>extra
large</size>

Although you should understand what the whiteSpace facet represents,
it is unlikely that you will ever apply it directly in your schemas.
The whiteSpace facet is fixed at collapse for most built-in types.
Only the string-based types can be restricted by a whiteSpace
facet, but this is not recommended. Instead, select a base type that
already has the whiteSpace facet you want. The types string,
normalizedString, and token have the whiteSpace values
preserve, replace, and collapse, respectively. For example, if you
wish to deﬁne a string-based type that will have its whitespace collapsed,
base your type on token, instead of basing it on string and applying
a whiteSpace facet. Section 11.2.1 on p. 205 provides a discussion
of these three types.
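
The recommendation can be sketched by comparing two hypothetical types. ProductNameType inherits its whitespace handling from token, while ProductNameAltType starts from string and applies the whiteSpace facet by hand. Both collapse whitespace before the maxLength check, but the first form is the one suggested above.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Recommended: whiteSpace="collapse" comes from xs:token. -->
  <xs:simpleType name="ProductNameType">
    <xs:restriction base="xs:token">
      <xs:maxLength value="50"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Legal but not recommended: the facet is applied directly. -->
  <xs:simpleType name="ProductNameAltType">
    <xs:restriction base="xs:string">
      <xs:whiteSpace value="collapse"/>
      <xs:maxLength value="50"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```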

8.5 Preventing simple type derivation

XML Schema allows you to prevent derivation of other types from
your type. By specifying the final attribute with a value of #all
in your simple type definition, you prevent derivation of any kind
(restriction, extension, list, or union). If you want more granular
control, the value of final can be a whitespace-separated list of any of the
keywords restriction, extension, list, or union. The extension
value refers to the extension of simple types to derive complex types,
described in Section 13.4.1 on p. 306. Example 8–20 shows some valid
values for final.

Example 8–20. Valid values for the final attribute in simple type deﬁnitions
final="#all"
final="restriction list union"
final="list restriction extension"
final="union"
final=""

Example 8–21 shows a simple type that cannot be restricted
by any other type or used as the item type of a list. With this
definition of DressSizeType, it would have been illegal to define
MediumDressSizeType in Example 8–4 because it attempts to restrict
DressSizeType.

Example 8–21. Preventing type derivation

<xs:simpleType name="DressSizeType" final="restriction list">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="2"/>
    <xs:maxInclusive value="18"/>
  </xs:restriction>
</xs:simpleType>

If no final attribute is specified, it defaults to the value of the
finalDefault attribute of the schema element. If neither final
nor finalDefault is specified, there are no restrictions on derivation
from that type. You can specify the empty string ("") for the final
value if you want to override the finalDefault value.
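
The interaction between final and finalDefault can be sketched as follows. The choice of finalDefault="restriction" is an assumption made for illustration. The first type has no final attribute and therefore picks up the schema-wide default, while the second sets final to the empty string to opt out of it.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           finalDefault="restriction">

  <!-- No final attribute: finalDefault applies, so this type
       cannot be restricted further. -->
  <xs:simpleType name="DressSizeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- final="" overrides finalDefault: no limits on derivation. -->
  <xs:simpleType name="SMLXSizeType" final="">
    <xs:restriction base="xs:token">
      <xs:enumeration value="small"/>
      <xs:enumeration value="medium"/>
      <xs:enumeration value="large"/>
      <xs:enumeration value="extra large"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```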

8.6 Implementation-defined types and facets
Starting with version 1.1, additional simple types and facets may be
deﬁned and supported by a particular XML Schema implementation.

8.6.1 Implementation-deﬁned types

An implementation can choose to support a set of primitive simple
types in addition to those built into XML Schema (described in
Chapter 11).
Suppose that an implementation defines a special primitive type
ordinalDate that represents an ordinal date: a year, followed by a
hyphen, followed by a number from 001 to 366 indicating the day of
the year. Although an ordinal date value could be represented as a
string, it may be beneficial to promote it to its own primitive type if it
has special considerations for ordering or validation of its values, or
special operations that can be performed on it (for example, subtracting
two ordinal dates to get a duration).
A schema author can use an implementation-defined type just
like a built-in type, except that it will be in a different namespace
defined by the implementation. The schema in Example 8–22 contains two
references to the ordinalDate type, which is in the hypothetical
http://example.org/extensions namespace. The anyOrdinalDate element
declaration refers to the type directly by its qualified name. The
OrdinalDateIn2011 user-defined simple type is a restriction of
ordinalDate using bounds facets to specify a range of allowed values.

Example 8–22. Using an implementation-defined type

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ext="http://example.org/extensions">
  <xs:element name="anyOrdinalDate" type="ext:ordinalDate"/>
  <xs:element name="recentOrdinalDate" type="OrdinalDateIn2011"/>
  <xs:simpleType name="OrdinalDateIn2011">
    <xs:restriction base="ext:ordinalDate">
      <xs:minInclusive value="2011-001"/>
      <xs:maxInclusive value="2011-365"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

8.6.2 Implementation-deﬁned facets

Implementation-deﬁned facets might specify additional constraints on
the valid values, or even signal to the processor how to process the value.
An example is the Saxon processor’s preprocess facet which allows
you to specify an XPath expression that transforms the value in some
way before validation.
In Example 8–23, the saxon:preprocess facet appears
among the children of restriction. You can tell that it is an
implementation-defined facet because it is in a different namespace,
http://saxon.sf.net/. This particular example is telling the processor
to convert the value to upper case before validating it against
the enumeration facets. It is essentially implementing a case-insensitive
enumeration.

Example 8–23. Using the Saxon preprocess facet

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:saxon="http://saxon.sf.net/">
  <xs:simpleType name="SMLXSizeType">
    <xs:restriction base="xs:token">
      <saxon:preprocess action="upper-case($value)"/>
      <xs:enumeration value="SMALL"/>
      <xs:enumeration value="MEDIUM"/>
      <xs:enumeration value="LARGE"/>
      <xs:enumeration value="EXTRA LARGE"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

Implementation-defined facets can apply to the XML Schema built-in
types (and user-defined restrictions of them); they can also apply to
any implementation-defined types such as the ordinalDate example
type described in the previous section.
While implementation-defined types and facets can be useful, they
do affect the portability of your schema. With the schema in Exam-
ple 8–23, if you try to validate a document that contains lower-case
“small” for a size, it would be valid when using Saxon but not when
using a different implementation. Therefore, implementation-defined
facets should be used only in controlled situations. Section 23.5.3 on
p. 642 provides more information on how to make your schemas more
portable across implementations when using implementation-defined
types and facets.
Chapter 9
Regular expressions

Regular expressions are used in XML Schema to restrict the values of
simple types to certain patterns of characters. For example, a pattern
could specify that an SKU must be three digits, followed by a hyphen,
followed by two uppercase letters. This chapter explains the XML
Schema syntax for regular expressions.

9.1 The structure of a regular expression

XML Schema’s regular expression language is very similar to that of
the Perl programming language. Regular expressions, also known as
“regexes,” are made up of branches, which are in turn made up of pieces.
Each piece consists of one atom and an optional quantiﬁer.
For example, suppose the product number in your organization can
either be an SKU or a 7-digit number. The SKU format is three digits,
followed by a hyphen, followed by two uppercase letters (for example,
123-AB). We could represent this pattern by defining the simple type
shown in Example 9–1.

Example 9–1. A simple type with a pattern

<xs:simpleType name="ProductNumberType">
  <xs:restriction base="xs:string">
    <xs:pattern value="\d{3}-[A-Z]{2}|\d{7}"/>
  </xs:restriction>
</xs:simpleType>

One difference between XML Schema regular expressions and other
regular expression languages is that XML Schema assumes anchors to
be present at the beginning and end of the expression. This means that
the whole value, not just a substring, must match the expression. In the
previous example, the whole product number must be a 6-character SKU
or a 7-digit number, with no characters before or after it.
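Because of the implicit anchors, a pattern does not need explicit start or end markers to match the whole value. The following sketch illustrates this; the type name ThreeDigitsType is hypothetical and not part of the original examples.

```xml
<!-- Hypothetical type: the whole value must be exactly three digits -->
<xs:simpleType name="ThreeDigitsType">
  <xs:restriction base="xs:string">
    <xs:pattern value="\d{3}"/>
  </xs:restriction>
</xs:simpleType>

<!-- Valid values:   123, 007
     Invalid values: 1234 (one digit too many),
                     a123 (extra leading character),
                     12   (too short) -->
```

In a regular expression language that matches substrings, \d{3} would also find a match inside 1234; here it does not.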
Figure 9–1 shows the structure of the regular expression from the
previous example.

Figure 9–1 Structure of a regular expression

It has two branches separated by a vertical bar (|): the first to
represent an SKU, and the second to represent a seven-digit number. If
there is more than one branch in a regular expression, a matching
string must match at least one of the branches; it is not required to
match all of the branches. In this case, the first branch consists of
three pieces:

# \d{3} represents the initial three digits. The atom, \d, is a
character class escape that represents any digit. The quantifier,
{3}, indicates that this atom (a digit) must appear exactly three
times.
# - represents the hyphen. The atom in this case is the hyphen,
which represents itself as a normal character. This piece does
not contain a quantifier, so the hyphen must appear once and
only once.
# [A-Z]{2} represents the two letters. The atom, [A-Z], is a
character class expression which represents any one of the letters
A through Z. The quantifier, {2}, indicates that this atom (a
letter) must appear exactly two times.

The second branch, \d{7}, consists of only one piece, which
represents the seven digits.
The rest of this chapter explains these concepts in detail.

9.2 Atoms
An atom describes one or more characters. It may be any one of the
following:

# A normal character, such as a.

# Another regular expression, enclosed in parentheses, such as
(a|b).
# An escape, indicated by a backslash, such as \d or
\p{IsBasicLatin}.
# A character class expression, indicated by square brackets, such
as [A-Z].

Each of these types of atoms is described in the sections that follow.

9.2.1 Normal characters

An atom can be a single character, as shown in Table 9–1. Each of the
characters a, b, and c is an atom.

Table 9–1 Using normal characters

Regular expression   Matching strings   Nonmatching strings
a                    a                  b
a|b|c                a, b, c            abc

Most characters that can be entered from a keyboard can be represented
directly in a regular expression. Some characters, known as
metacharacters, have special meaning and must be escaped¹ in order to
be treated like normal characters. They are: ., \, ?, *, +, |, {, }, (, ),
[, and ]. This is explained in Section 9.2.3.1 on p. 165.
The space character can be used to represent itself in a regular expres-
sion. This means that you may not put any extra whitespace in the
regular expression that will not appear in the matching strings. For
example, the regular expression “a | b” will only match “a followed
by a space” or “a space followed by b”.
Characters that are not easily entered on a keyboard may also be
represented as they are in any XML document—by character references
that specify the character’s Unicode code point. XML character
references take two forms:

# “&#” plus a sequence of decimal digits representing the

character’s code point, followed by “;”
# “&#x” plus a sequence of hexadecimal digits representing the
character’s code point, followed by “;”

1. Except when they are within square brackets, as described in Section 9.2.4.6
on p. 175.

For example, a space can be represented as &#x20;. You can also
include the predefined XML entities for the “less than,” “greater than,”
ampersand, apostrophe, and quote characters. Table 9–2 lists some
common XML character references and entities.

Table 9–2 Common XML character references and entities

Entity/Reference   Meaning
&#x20;             Space
&#xA;              Line feed (also represented by \n)
&#xD;              Carriage return (also represented by \r)
&#x9;              Tab (also represented by \t)
&lt;               Less than (<)
&gt;               Greater than (>)
&amp;              Ampersand (&)
&apos;             Apostrophe (')
&quot;             Quote (")

Table 9–3 illustrates the inclusion of character entities and references
in regular expressions.

Table 9–3 Using character references

Regular expression   Matching strings                 Nonmatching strings
a&#x20;z             “a z”                            “az”
a&#x20;*z            “az”, “a z”, “a  z”, “a   z”     “a *z”
PB&amp;J             PB&J                             PBJ

9.2.2 The wildcard escape character

The period (.) has special signiﬁcance in regular expressions; it
matches any one character except a carriage return or line feed. The
period character, known as wildcard, represents only one matching
character, but a quantiﬁer (such as *) may be applied to it to represent
multiple characters. Table 9–4 shows some examples of the wildcard
escape character in use.
The period character is also useful at the beginning or end of a regular
expression to signify a pattern that starts with, ends with, or contains
a matching string, as shown in the last three examples of the table. This
gets around the implicit anchors in XML Schema regular expressions.
It is important to note that the period loses its wildcard power when
placed in a character class expression (within square brackets).

Table 9–4 Using the wildcard escape character

Regular expression   Matching strings                   Nonmatching strings
a.z                  aaz, abz, a2z                      az, abbz
a..z                 aaaz, abcz, a12z                   az, abz
a.*z                 az, abz, abcdez                    a b
a\.z                 a.z                                abz
.*abc.*              abc, xxxabc, abcxxx, xxxabcxxx     xxx
.*abc                abc, xxxabc                        abcxxx, xxxabcxxx
abc.*                abc, abcxxx                        xxxabc, xxxabcxxx
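In a schema, the “contains” idiom from the table could be used as in the following sketch. The type name ContainsAbcType is hypothetical.

```xml
<!-- Hypothetical type: any value that contains the substring abc -->
<xs:simpleType name="ContainsAbcType">
  <xs:restriction base="xs:string">
    <!-- The leading and trailing .* absorb whatever surrounds abc,
         working around the implicit anchors -->
    <xs:pattern value=".*abc.*"/>
  </xs:restriction>
</xs:simpleType>

<!-- Valid values:   abc, xxxabc, abcxxx, xxxabcxxx
     Invalid values: xxx, ab -->
```

Because the wildcard does not match a carriage return or line feed, a value that spans several lines would not match this pattern even if it contains abc.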

9.2.3 Character class escapes

A character class escape uses the backslash (\) as an escape character to
indicate any one of several characters. There are four categories of
character class escapes:

# Single character escapes, which represent one speciﬁc character

# Multicharacter escapes, which represent any one of several
characters
# Category escapes, which represent any one of a group of charac-
ters with similar characteristics (such as “Punctuation”), as
deﬁned in the Unicode standard
# Block escapes, which represent any one character within a range
of code points, as deﬁned in the Unicode standard

Note that each escape may be matched by only one character. You must
apply a quantifier such as * to an escape to make it represent
multiple characters.
This section describes each of the four types of character class escapes.

9.2.3.1 Single-character escapes

Single-character escapes are used for characters that are either difﬁcult
to read and write in their natural form, or have special meaning in
regular expressions. Each escape represents only one possible matching
character. Table 9–5 provides a complete list of single-character escapes.

Table 9–5 Single-character escapes

Escape   Meaning                  Escape   Meaning
\n       Line feed (#xA)          \*       *
\r       Carriage return (#xD)    \+       +
\t       Tab (#x9)                \{       {
\\       \                        \}       }
\|       |                        \(       (
\.       .                        \)       )
\-       -                        \[       [
\^       ^                        \]       ]
\?       ?

Table 9–6 illustrates single-character escapes in regular expressions.
The first example has an unescaped plus sign (+). However, the plus
sign has another meaning in regular expressions—it is treated as a
quantifier on the atom consisting of the character 1. The second exam-
ple escapes the plus sign, which results in it being treated as an atom
itself that can appear in the matching string. The third example escapes
the first plus sign, but not the second, resulting in the first one being
interpreted as an atom and the second one being interpreted as a
quantifier.

Table 9–6 Using single-character escapes

Regular expression   Matching strings     Nonmatching strings
1+2                  12, 112, 1112        1+2
1\+2                 1+2                  12, 1\+2
1\++2                1+2, 1++2, 1+++2     12
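The same reasoning applies to the period, which must be escaped when a literal decimal point is intended. The following sketch assumes a hypothetical type, PriceStringType, that is not part of the original examples.

```xml
<!-- Hypothetical type: one or more digits, a literal period,
     then exactly two digits -->
<xs:simpleType name="PriceStringType">
  <xs:restriction base="xs:string">
    <xs:pattern value="\d+\.\d{2}"/>
  </xs:restriction>
</xs:simpleType>

<!-- Valid values:   9.99, 120.00
     Invalid values: 999  (no period),
                     9.9  (only one digit after the period),
                     9x99 (would match if the period were unescaped) -->
```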

9.2.3.2 Multicharacter escapes

A multicharacter escape may represent any one of several characters.
Table 9–7 provides a complete list of multicharacter escapes.

Table 9–7 Multicharacter escapes

Escape   Meaning
\d       Any decimal digit.
\D       Any character that is not a decimal digit.
\s       A whitespace character (space, tab, carriage return, or line feed).
\S       Any character that is not a whitespace character.
\i       Any character that may be the first character of an XML name,
         namely a letter, an underscore (_), or a colon (:).
\I       Any character that is not permitted as the first character of an
         XML name.
\c       Any character that may be part of an XML name, namely a
         letter, a digit, an underscore (_), a colon (:), a hyphen (-), or
         a period (.).
\C       Any character that cannot be part of an XML name.
\w       A “word” character, that is, any character not in one of the
         categories Punctuation, Separators, and Other, described in the
         next section.
\W       Any character in one of the categories Punctuation, Separators,
         and Other, described in the next section.

Table 9–8 illustrates multicharacter escapes in regular expressions.

Table 9–8 Using multicharacter escapes

Regular expression   Matching strings   Nonmatching strings
a\dz                 a0z, a1z           az, adz, a12z
a\s*z                az, a z            axz

9.2.3.3 Category escapes

Category escapes provide convenient groupings of characters, based on
their characteristics. These categories are defined by the Unicode
standard. More information about the Unicode standard can be found at
www.unicode.org. Table 9–9 provides a complete list of category
escapes.

Table 9–9 (Continued)

Category     Property   Meaning
Separators   Z          all separators
             Zs         space
             Zl         line
             Zp         paragraph
Symbols      S          all symbols
             Sm         math
             Sc         currency
             Sk         modifier
             So         other
Other        C          all others
             Cc         control
             Cf         format
             Co         private use
             Cn         not assigned

The syntax to use one of these escapes is \p{xx} where xx is the
one- or two-character property. For example, \p{Nd} represents any
decimal digit. It is also possible to represent the complement—that is,
any character that is not part of the category, using a capital P. For
example, \P{Nd} represents any character that is not a decimal digit.
Table 9–10 illustrates category escapes in regular expressions.
Note that the category escapes include all character sets. If you only
intend for an expression to match the capital letters A through Z, it is
better to use [A-Z] than \p{Lu}, because \p{Lu} will allow uppercase
letters of all alphabets, not just Latin. Likewise, if your intention is to
allow only the decimal digits 0 through 9, use [0-9] rather than
\p{Nd} or \d, because there are digits other than 0 through 9 in some
languages’ scripts.

Table 9–10 Using category escapes

Regular expression   Matching strings    Nonmatching strings
\p{Lu}               A, B, C             a, b, c, 1, 2, 3
\P{Lu}               a, b, c, 1, 2, 3    A, B, C
\p{Nd}               1, 2, 3             a, b, c, A, B, C
\P{Nd}               a, b, c, A, B, C    1, 2, 3
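The difference between a category escape and an explicit range can be seen by comparing two otherwise identical types. Both type names below are hypothetical.

```xml
<!-- Hypothetical type: exactly two letters from A through Z -->
<xs:simpleType name="LatinUpperCodeType">
  <xs:restriction base="xs:string">
    <xs:pattern value="[A-Z]{2}"/>
  </xs:restriction>
</xs:simpleType>

<!-- Hypothetical type: exactly two uppercase letters of any alphabet -->
<xs:simpleType name="AnyUpperCodeType">
  <xs:restriction base="xs:string">
    <xs:pattern value="\p{Lu}{2}"/>
  </xs:restriction>
</xs:simpleType>

<!-- AB is valid for both types.
     ÄÖ is valid only for AnyUpperCodeType.
     ab is invalid for both types. -->
```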

9.2.3.4 Block escapes

Block escapes represent a range of characters based on their Unicode
code points. The Unicode standard provides names for these ranges,
such as Basic Latin, Greek, Thai, Mongolian, etc. The block names
used in regular expressions are these same names, with the spaces re-
moved. Table 9–11 lists the ﬁrst ﬁve block escape ranges as an example.
A complete list of the most recent Unicode blocks can be downloaded
from www.unicode.org/Public/UNIDATA/Blocks.txt.

Table 9–11 Partial list of block escapes

Start code End code Block name
#x0000 #x007F BasicLatin

#x0080 #x00FF Latin-1Supplement

#x0100 #x017F LatinExtended-A

#x0180 #x024F LatinExtended-B

#x0250 #x02AF IPAExtensions

... ... ...

The syntax to use one of the block escapes is \p{IsXX} where XX is
the block name. For example, \p{IsBasicLatin} represents any
character in the range #x0000 to #x007F. It is also possible to represent
the complement—that is, any character that is not part of the
block—using a capital P. For example, \P{IsBasicLatin} represents
any character that is not in that range. Table 9–12 illustrates block
escapes in regular expressions.

Table 9–12 Using block escapes

Regular expression   Matching strings   Nonmatching strings
\p{IsBasicLatin}     a, b, c            â, ß, ç
\P{IsBasicLatin}     â, ß, ç            a, b, c
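A block escape is typically combined with a quantifier to constrain a whole value. The following is a minimal sketch; the type name BasicLatinStringType is hypothetical.

```xml
<!-- Hypothetical type: one or more characters, all of which must
     fall in the Basic Latin block (#x0000 to #x007F) -->
<xs:simpleType name="BasicLatinStringType">
  <xs:restriction base="xs:string">
    <xs:pattern value="\p{IsBasicLatin}+"/>
  </xs:restriction>
</xs:simpleType>

<!-- Valid values:   abc, Size 10
     Invalid values: façade (ç is outside the block),
                     the empty string (+ requires at least one character) -->
```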

9.2.4 Character class expressions

A character class expression allows you to specify a choice from a set
of characters. The expression, which appears in square brackets, may
include a list of individual characters or character escapes, or a char-
acter range, or both. It is also possible to negate the speciﬁed set of
characters, or subtract values from it. Like an escape, a character class
expression may only represent one character in the matching string.
To allow a matching character to appear multiple times, a quantiﬁer
may be applied to the expression.

9.2.4.1 Listing individual characters

The simplest case of a character class expression is a list of the matching
characters or escapes. The expression represents one and only one of
the characters listed. Table 9–13 illustrates a list of characters inside
an expression. The ﬁrst example can be read as “a or b or c, followed
by z.” The character class expression in the second example uses escapes
to represent one character that is either an uppercase letter or a decimal
digit.

Table 9–13 Specifying a list of characters

Regular expression   Matching strings   Nonmatching strings
[abc]z               az, bz, cz         abz, z, abcz, abc
[\p{Lu}\d]z          Az, Bz, 1z, 2z     az, bz, cz, A1z

9.2.4.2 Specifying a range

A range of characters may be specified in a character class expression.
The lower and upper bounds are inclusive, and they are separated by
a hyphen. For example, to allow the letters a through f, you can spec-
ify [a-f]. The bounds must be single characters or single character
escapes. It is not valid to specify a range using multicharacter strings,
such as [(aa)-(fe)], or multicharacter escapes, such as
[\p{Lu}-\p{Ll}]. The lower bound must have a code point that is
less than or equal to that of the upper bound.
Multiple ranges may be specified in the same expression. If multiple
ranges are specified, the character must match one of the ranges.
Table 9–14 illustrates ranges in expressions. The first example can
be read as “a letter between a and f (inclusive), followed by z.” The
second example provides three ranges, namely the digits 0 to 9, lower-
case a to f, and uppercase A to F. The first character of a matching
string must conform to at least one of these ranges. The third example
uses character entities to represent the bounds.

Table 9–14 Specifying a range

Regular expression Matching strings Nonmatching strings
[a-f]z az, fz z, abz, gz, hz

[0-9a-fA-F]z 1z, az, Bz z, gz, Gz, 1aBz

[ -]z az, bz, cz âz

9.2.4.3 Combining individual characters and ranges

It is also possible to combine ranges, individual characters, and escapes
in an expression, in any order. Table 9–15 illustrates this. The ﬁrst
example allows the ﬁrst character of the matching string to be either a
digit 0 through 9, or one of the letters p, q, or r. The second example
represents nearly the same thing, with a range on the letters instead
of the numbers, and the escape \d to represent the digits. It is not
exactly the same thing because \d also includes decimal digits from
other character sets, not just the digits 0 through 9.

Table 9–15 Combining characters and ranges

Regular expression   Matching strings   Nonmatching strings
[0-9pqr]z            1z, 2z, pz, rz     cz, dz, 0sz
[p-r\d]z             1z, 2z, pz, rz     cz, dz, 0sz

9.2.4.4 Negating a character class expression

A character class expression can be negated to represent any character
that is not in the speciﬁed set of characters. You can negate any expres-
sion, regardless of whether it is a range, a list of characters, or both.
The negation character, ^, must appear directly after the opening
bracket.
Table 9–16 illustrates this negation. The character class expression
in the ﬁrst example represents “any character except a or b.” In the
second example it is “any character that is not a digit.” In the third, it
is “any character that does not fall in the range 1 through 3 or a through
c.” Note that the negation in the third example applies to both ranges.
It is not possible to negate one range but not another in the
same expression. To represent this, use subtraction, which is described
in the next section.

Table 9–16 Negating a character class expression

Regular expression   Matching strings   Nonmatching strings
[^ab]z               cz, dz, 1z         az, bz
[^\d]z               az, bz, cz         1z, 2z, 3z
[^1-3a-c]z           dz, 4z             1z, az

9.2.4.5 Subtracting from a character class expression

It is possible to subtract individual values or ranges of values from a
specified set of characters. A minus sign (-) precedes the values to be
subtracted, which are themselves enclosed in square brackets.
Table 9–17 illustrates subtractions from character class expressions.
The first example represents “any character between a and z, except
for c, followed by z.” The second is “any character between a and z,
except for c and d, followed by z.” The third example subtracts a range,
namely c through e, from the range a through z. The net result is that
the allowed values are a through b and f through z. The fourth example
is a subtraction from a negation. The negation character
applies only to the a-z range, and the 123 digits are subtracted from
that. Essentially, the example allows the first character to be anything
except the letters a to z or the digits 1, 2, or 3.

Table 9–17 Subtracting from a character class expression

Regular expression   Matching strings   Nonmatching strings
[a-z-[c]]z           az, dz, ez, zz     cz
[a-z-[cd]]z          az, ez, zz         cz, dz
[a-z-[c-e]]z         az, zz             cz, dz, ez, 1z
[^a-z-[123]]z        4z                 az, 3z, zz
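Subtraction is convenient when the excluded characters are scattered through a range. The following sketch assumes a hypothetical type, ConsonantsType, that allows only lowercase letters other than the five vowels.

```xml
<!-- Hypothetical type: one or more lowercase letters a through z,
     with the vowels a, e, i, o, and u subtracted from the range -->
<xs:simpleType name="ConsonantsType">
  <xs:restriction base="xs:string">
    <xs:pattern value="[a-z-[aeiou]]+"/>
  </xs:restriction>
</xs:simpleType>

<!-- Valid values:   xyz, rhythm
     Invalid values: tree (contains e), XYZ (uppercase) -->
```

Character class subtraction is a feature of the XML Schema regular expression language; many other regular expression languages do not accept this syntax.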

9.2.4.6 Escaping rules for character class expressions

Special escaping rules apply to character class expressions. They are as
follows:

# The characters [, ], \, and - should be escaped when included
as individual characters or bounds in a range.¹
# The character ^ should be escaped if it appears first in the
character class expression, directly after the opening bracket ([).

The other metacharacters do not need to be escaped when used in
a character class expression, because they have no special meaning in that
context. This includes the period character, which does not serve as a
wildcard escape character when inside a character class expression.
However, it is never an error to escape any of the metacharacters, and
getting into the habit of always escaping them eliminates the need to
remember these rules.

9.2.5 Parenthesized regular expressions

A parenthesized regular expression may be used as an atom in a larger
regular expression. Any regular expression may be included in the
parentheses, including those containing normal characters, character
entities, escapes, and character class expressions.
Parenthesized expressions are useful for repeating certain sequences
of characters. For example, suppose you want to indicate a repetition of
the string ab. The expression ab* will match abbb, but not abab be-
cause the quantiﬁer applies to the ﬁnal atom, not the entire string. To
allow abab, you need to parenthesize the two characters: (ab)*.
Parenthesized expressions are also useful when you want to allow a
choice between several different patterns. For example, to allow either
the string ab or the string cd to come before z, you can use the
expression (ab|cd)z. This example makes use of branches, which are
described further in Section 9.4 on p. 177. Table 9–18 shows some
examples of parenthesizing within regular expressions.

1. The rules are actually slightly more complex and less strict than this; they
also differ between versions 1.0 and 1.1. However, it is never an error to
escape these characters inside a character class expression.

Table 9–18 Using parenthesized regular expressions

Regular expression   Matching strings        Nonmatching strings
(ab)*z               z, abz, ababz           az, bz, aabbz
(ab|cd)z             abz, cdz                abcdz, z
(a+b)*z              z, abz, aabz, abaabz    az, abbz
([a-f]x)*z           z, axz, bxfxfxz         gxz, xz
(\db)*z              z, 1bz, 1b2bz           1z, bz

9.3 Quantifiers
A quantifier indicates how many times the atom may appear in a
matching string. Table 9–19 lists the quantifiers.

Table 9–19 Quantifiers

Quantifier   Meaning
none         Must appear once.
?            May appear 0 or 1 times.
*            May appear 0 or more times.
+            May appear 1 or more times.
{n}          Must appear n times.
{n,}         May appear n or more times.
{n,m}        May appear n through m times.

Table 9–20 illustrates quantifiers in regular expressions. The first
seven examples illustrate the seven types of quantifiers. They each have
three atoms: a, b, and z, with the quantifier applying only to b. The
remaining three examples show how quantifiers can apply not just to
normal character atoms, but also to character class expressions, character
class escapes, and parenthesized regular expressions, respectively.

Table 9–20 Using quantifiers

Regular expression   Matching strings             Nonmatching strings
abz                  abz                          az, abbz
ab?z                 az, abz                      abbz
ab*z                 az, abz, abbz, abbbz, ...    a1z
ab+z                 abz, abbz, abbbz, ...        az
ab{2}z               abbz                         abz, abbbz
ab{2,}z              abbz, abbbz, abbbbz, ...     az, abz
ab{2,3}z             abbz, abbbz                  az, abz, abbbbz
a[b-d]+z             abz, abdz, addbccdddz        az, aez, abez
a\p{Nd}+z            a1z, a11z, a234z             az, abcz
a(bc)+z              abcz, abcbcz, abcbcbcz, ...  az, abz, acbz
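Several quantifiers can be combined in one pattern, each applying to its own atom. The following sketch uses a hypothetical type, PartCodeType, whose format is an assumption made for illustration.

```xml
<!-- Hypothetical type: two or three uppercase letters, a hyphen,
     then one to four digits -->
<xs:simpleType name="PartCodeType">
  <xs:restriction base="xs:string">
    <xs:pattern value="[A-Z]{2,3}-\d{1,4}"/>
  </xs:restriction>
</xs:simpleType>

<!-- Valid values:   AB-1, XYZ-1234
     Invalid values: A-1      (too few letters),
                     ABCD-1   (too many letters),
                     AB-12345 (too many digits),
                     AB-      (no digits) -->
```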

9.4 Branches
As mentioned early in this chapter, a regular expression can consist of
an unlimited number of branches. Branches, separated by the vertical
bar (|) character, represent a choice between several expressions. The
| character does not act on the atom immediately preceding it, but
on the entire expression that precedes it (back to the previous | or an
opening parenthesis). For example, the regular expression true|false
indicates a choice between true and false, not “tru, followed by e
or f, followed by alse”. It is not necessary to put true and false in
parentheses. Table 9–21 shows some examples that exhibit the
interaction between branches, expressions, and parentheses.

Table 9–21 Branches, expressions, and parentheses

Regular expression   Matching strings   Nonmatching strings
true|false           true, false        trufalse
tru(e|f)alse         trufalse           true, false
yes|no|maybe         yes, no, maybe     yenoaybe
(a|b)c|d             ac, bc, d          c, ad
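In XML Schema, multiple pattern facets that appear in the same restriction are treated as alternatives, which gives another way to write top-level branches. The sketch below restates ProductNumberType from Example 9–1 in this style; the name ProductNumberType2 is hypothetical.

```xml
<!-- Intended to accept the same values as the single pattern
     \d{3}-[A-Z]{2}|\d{7} in Example 9-1 -->
<xs:simpleType name="ProductNumberType2">
  <xs:restriction base="xs:string">
    <!-- First branch: an SKU such as 123-AB -->
    <xs:pattern value="\d{3}-[A-Z]{2}"/>
    <!-- Second branch: a seven-digit number -->
    <xs:pattern value="\d{7}"/>
  </xs:restriction>
</xs:simpleType>
```

Splitting long alternations this way can make each branch easier to read and annotate.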

Chapter 10
Union and list types

In Chapter 8, we learned how to define atomic simple types. This
chapter covers the other two varieties of simple types: union types
and list types.

10.1 Varieties and derivation types

As we saw in Chapter 8, there are three varieties of simple types:
atomic types, list types, and union types.

# Atomic types have values that are indivisible, such as 10 and
large.
# List types have values that are whitespace-separated lists
of atomic values, such as <availableSizes>10 large
2</availableSizes>.
# Union types may have values that are either atomic values or list
values. What differentiates them is that the set of valid values,
or “value space,” for the type is the union of the value spaces of
two or more other simple types. For example, to represent a
dress size, you may define a union type that allows a value to
be either an integer from 2 through 18, or one of the string
values small, medium, or large.

Each newly defined simple type must be based on an existing type,
using one of the following methods:

# A restriction of another type, known as the base type of the
restriction. This results in a type of the same variety as the base
type, with a restricted set of valid values. For example, you can
define a SmallInteger type that restricts the value space of
the integer type.
# A list of another type (either an atomic or union type), known
as the item type of the list. This results in a type that allows a
whitespace-separated list of values of the item type. For example,
you can define an IntegerList type that is a list of integer
values.
# A union of one or more other types, known as the member
types of the union. This results in a type that allows values that
are valid for any of its member types. For example, you can
define an IntegerOrString type that allows either an integer
or a string.
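The three derivation methods above can be sketched as follows. The type names come from the text; the bounds chosen for SmallInteger are an assumption made only for illustration.

```xml
<!-- Restriction: same variety (atomic) as the base type -->
<xs:simpleType name="SmallInteger">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="0"/>   <!-- illustrative bound -->
    <xs:maxInclusive value="100"/> <!-- illustrative bound -->
  </xs:restriction>
</xs:simpleType>

<!-- List: a whitespace-separated list of the item type,
     for example 10 20 30 -->
<xs:simpleType name="IntegerList">
  <xs:list itemType="xs:integer"/>
</xs:simpleType>

<!-- Union: any value that is valid for one of the member types -->
<xs:simpleType name="IntegerOrString">
  <xs:union memberTypes="xs:integer xs:string"/>
</xs:simpleType>
```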

The variety of the resulting type depends on both the derivation type
and the variety of the original type. Table 10–1 shows all possible
combinations of derivation types and original type varieties. The im-
portant thing to understand is that when you restrict, for example, a
list type, the resulting type is still a list type. All the rules for list types,
such as applicable facets, also apply to this new type.

Table 10–1 Varieties of derived types

                     Derivation type
Base type variety    restriction    list         union
atomic               atomic         list         union
list                 list           not legal    union
union                union          list†        union

† Legal only if the union type does not itself contain a list.

10.2 Union types

10.2.1 Deﬁning union types
Union types allow a value to conform to any one of several different
simple types. The syntax to deﬁne a union type is shown in Table 10–2.

Table 10–2 XSD Syntax: union type

Name
union

Parents
simpleType

Attribute name Type Description

id ID Unique ID.
memberTypes list of QName Member types that make up the union
type; either a memberTypes attribute or
a simpleType child (or a combination)
is required.
Content
annotation?, simpleType*

To continue with our DressSizeType example, perhaps we want
to allow a value to be either an integer from 2 to 18, or one of the
speciﬁc values small, medium, or large. Example 10–1 shows
the deﬁnition of a union type that accomplishes this.

Example 10–1. Deﬁning a union type

<xs:simpleType name="SizeType">
  <xs:union>
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="2"/>
        <xs:maxInclusive value="18"/>
      </xs:restriction>
    </xs:simpleType>
    <xs:simpleType>
      <xs:restriction base="xs:token">
        <xs:enumeration value="small"/>
        <xs:enumeration value="medium"/>
        <xs:enumeration value="large"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:union>
</xs:simpleType>

The simple types that compose a union type are known as its
member types. Member types must always be simple types; there is no
such thing as a union of complex types. There must be at least one
member type, and there is no limit on how many member types may
be specified.
In Example 10–1, the member types are defined anonymously
within the union, as simpleType children. It is also possible to spec-
ify the member types using a memberTypes attribute of the union el-
ement, as shown in Example 10–2. It is assumed that DressSizeType
and SMLSizeType are defined elsewhere in the schema.

Example 10–2. Using the memberTypes attribute

<xs:simpleType name="SizeType">
  <xs:union memberTypes="DressSizeType SMLSizeType"/>
</xs:simpleType>
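For the memberTypes references to resolve, the two named types must exist as global definitions. A minimal sketch of what they might look like follows; these definitions are assumptions modeled on the anonymous member types in Example 10–1.

```xml
<!-- Assumed definition: integers from 2 through 18 -->
<xs:simpleType name="DressSizeType">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="2"/>
    <xs:maxInclusive value="18"/>
  </xs:restriction>
</xs:simpleType>

<!-- Assumed definition: the three named sizes -->
<xs:simpleType name="SMLSizeType">
  <xs:restriction base="xs:token">
    <xs:enumeration value="small"/>
    <xs:enumeration value="medium"/>
    <xs:enumeration value="large"/>
  </xs:restriction>
</xs:simpleType>
```

Naming the member types this way allows them to be reused elsewhere in the schema, which anonymous member types do not.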

You can also combine the memberTypes attribute with simpleType
children, as shown in Example 10–3.

Example 10–3. Combining memberTypes and simpleType

<xs:simpleType name="SizeType">
<xs:union memberTypes="DressSizeType">
<xs:simpleType>
<xs:restriction base="xs:token">
<xs:enumeration value="small"/>
<xs:enumeration value="medium"/>
<xs:enumeration value="large"/>
</xs:restriction>
</xs:simpleType>
</xs:union>
</xs:simpleType>

10.2.2 Restricting union types

It is possible to restrict a union type. The syntax for restricting a union
type is shown in Table 10–3.
Of all the facets, only three may be applied to union types: pattern,
enumeration, and assertion. These restrictions are considered to
be in addition to the restrictions of the individual member types. Ex-
ample 10–4 shows a restriction of SizeType that only allows integers
2, 4, and 6, and the value small. A value of the type SmallSizeType
is first validated against the enumerations defined in SmallSizeType,
then validated against each of the member types of SizeType until it
is successfully validated against one.

Table 10–3 XSD Syntax: union type restriction

Name: restriction
Parents: simpleType

Attribute name   Type    Description
id               ID      Unique ID.
base             QName   Base type of the restriction (in this case, the
                         union type); either a base attribute or a
                         simpleType child is required.

Content: annotation?, simpleType?,
         (enumeration | pattern | assertion [version 1.1 only])*

Example 10–4. Restricting a union

<xs:simpleType name="SmallSizeType">
  <xs:restriction base="SizeType">
    <xs:enumeration value="2"/>
    <xs:enumeration value="4"/>
    <xs:enumeration value="6"/>
    <xs:enumeration value="small"/>
  </xs:restriction>
</xs:simpleType>
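
The following sketch illustrates how the two levels of restriction combine. It assumes a hypothetical element named size declared with the type SmallSizeType, and the SizeType of Example 10–1.

```xml
<!-- Assumes <xs:element name="size" type="SmallSizeType"/> -->
<sizes>
  <size>4</size>       <!-- valid: enumerated, and a valid integer member -->
  <size>small</size>   <!-- valid: enumerated, and a valid token member -->
  <size>8</size>       <!-- invalid: a valid SizeType, but not enumerated -->
  <size>medium</size>  <!-- invalid: a valid SizeType, but not enumerated -->
</sizes>
```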

10.2.3 Unions of unions

It is possible to define a union type that has another union type as
its member type. For example, if you want to expand your size
type yet again, to include non-US sizes, you might define a new type
InternationalSizeType that is the union of SizeType (which is
itself a union) and a new anonymous type, as shown in Example 10–5.

Example 10–5. A union of a union

<xs:simpleType name="InternationalSizeType">
  <xs:union memberTypes="SizeType">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="24"/>
        <xs:maxInclusive value="54"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:union>
</xs:simpleType>

The only caveat is that union type references cannot be circular,
either directly or indirectly. For example, you cannot define a union
type union1 that has another type union2 among its member types,
if union2 also has union1 among its member types.
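
A sketch of the illegal situation just described is shown below. The names union1 and union2 come from the text; the additional member types xs:integer and xs:token are included only to make the fragment look realistic.

```xml
<!-- Illegal: union1 and union2 refer to each other -->
<xs:simpleType name="union1">
  <xs:union memberTypes="union2 xs:integer"/>
</xs:simpleType>

<xs:simpleType name="union2">
  <xs:union memberTypes="union1 xs:token"/>
</xs:simpleType>
```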

10.2.4 Specifying the member type in the instance

An instance element can optionally use the xsi:type attribute to
specify its type. In the case of union types, you can use xsi:type
to specify which of the member types the element conforms to. This
allows more targeted validation and provides a clue to the application
that processes the instance about what type of value to expect.
Example 10–6 shows what an instance element might look like.

Example 10–6. Specifying the member type in the instance
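
A minimal sketch of such an instance follows. It assumes an element named size whose declared type is the SizeType of Example 10–2, the conventional xsi prefix, and a schema without a target namespace (otherwise the type name would need a prefix).

```xml
<size xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:type="DressSizeType">12</size>
```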

Naturally, this technique only works for elements, not attributes. In
the previous example, if size were an attribute, you would have no
way of specifying its member type, because attributes cannot have
attributes.
If the xsi:type attribute is not used in the instance, an element is
considered to have the first member type for which it is valid.

10.3 List types

10.3.1 Defining list types
List types are whitespace-separated lists of atomic values. A list type is
defined by designating another simple type (an atomic or union type)
as its item type. Table 10–4 shows the syntax for defining a list type.

Table 10–4 XSD Syntax: list type

Name: list
Parents: simpleType

Attribute name   Type    Description
id               ID      Unique ID.
itemType         QName   The simple type of each item in the list; either
                         an itemType attribute or a simpleType child
                         is required.

Content: annotation?, simpleType?

Example 10–7 shows a simple type that allows a list of available dress
sizes.

Example 10–7. Defining a list type using an itemType attribute

<xs:simpleType name="AvailableSizesType">
  <xs:list itemType="DressSizeType"/>
</xs:simpleType>

An instance element of the type AvailableSizesType is shown
in Example 10–8.

Example 10–8. List instance
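
A sketch of such an instance, assuming an element named availableSizes of the type AvailableSizesType. Note the two spaces between 12 and 14, which still separate only two items.

```xml
<availableSizes>10 12  14</availableSizes>
```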

Example 10–7 uses the itemType attribute to designate a global
simple type named DressSizeType as its item type. Alternatively, the
item type can be specified anonymously in a simpleType child within
the list type definition, as shown in Example 10–9. Either the itemType
attribute or the simpleType child must appear, not both.

Example 10–9. Defining a list type using a simpleType child

<xs:simpleType name="AvailableSizesType">
  <xs:list>
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="2"/>
        <xs:maxInclusive value="18"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:list>
</xs:simpleType>

There is no way to represent an absent or nil item in a list. The
whiteSpace facet for all list types is fixed at collapse, which means
that if multiple whitespace characters appear consecutively, they are
collapsed into one space. In Example 10–8, even though there are two
spaces between the values 12 and 14, there are only three items in
the list.

10.3.2 Design hint: When should I use lists?

When representing sequences of like values, you are faced with a
decision whether to use a list, such as:
<availableSizes>10 12 14</availableSizes>

or use markup to separate the distinct values, such as:
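
A sketch of what that markup might look like, assuming a child element named size (the name used in the attribute example later in this section):

```xml
<availableSizes>
  <size>10</size>
  <size>12</size>
  <size>14</size>
</availableSizes>
```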

The advantage of using a list is obvious: It is less verbose. However,
there are a number of disadvantages of lists.

# They are not appropriate for values that may contain whitespace
(see Section 10.3.4 on p. 195).
# If you later wish to expand the values by adding children
or attributes, this will not be possible if you use a
list. For example, if you use markup, you can later add an
attribute to size to indicate the measurement system, such
as <size system="US-DRESS">.
# There is no way to represent nil values.
# There may be limited support for lists in other XML technolo-
gies. For example, individual values in a list cannot be accessed
via XPath 1.0 or XSLT 1.0.

10.3.3 Restricting list types

The syntax for restricting a list type is shown in Table 10–5. A limited
number of facets may be applied to list types. These facets have a
slightly different behavior when applied to a list type, because they
apply to the list as a whole, not to the individual items in the list. To
restrict the values of each item in the list, you should restrict the item
type, not the list type itself.
Table 10–5 XSD Syntax: list type restriction

Name: restriction
Parents: simpleType

Attribute name   Type    Description
id               ID      Unique ID.
base             QName   The base type of the restriction (in this case,
                         the list type); either a base attribute or a
                         simpleType child is required.

When applying facets to a list type, you do not specify the facets di-
rectly in the list type definition. Instead, you define the list type, then
define a restriction of that list type. This can be done with two separate
named simple types, or it can be accomplished all in one definition as
shown in Example 10–10.

Example 10–10. Length facet applied to a list

<xs:simpleType name="AvailableSizesType">
  <xs:restriction>
    <xs:simpleType>
      <xs:list itemType="SMLSizeType"/>
    </xs:simpleType>
    <xs:maxLength value="3"/>
  </xs:restriction>
</xs:simpleType>
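
The alternative with two separate named simple types might be sketched as follows. The name SizeListType is hypothetical, and SMLSizeType is assumed to be defined elsewhere in the schema.

```xml
<!-- Step 1: a named list type (the name SizeListType is hypothetical) -->
<xs:simpleType name="SizeListType">
  <xs:list itemType="SMLSizeType"/>
</xs:simpleType>

<!-- Step 2: a named restriction of that list type -->
<xs:simpleType name="AvailableSizesType">
  <xs:restriction base="SizeListType">
    <xs:maxLength value="3"/>
  </xs:restriction>
</xs:simpleType>
```

The two-step form is convenient when the unrestricted list type is also useful on its own.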

10.3.3.1 Length facets

Length facets length, minLength, and maxLength may be used to
restrict list types. The length is measured as number of items in the
list, not the length of each item. Example 10–10 shows a list that is
restricted by a maxLength facet.
Example 10–11 shows a valid instance of AvailableSizesType.
It is valid because the number of items in the list is not more than
three. The fact that the strings medium and large are longer than three
characters is not relevant. To restrict the length of each item in the list,
apply the maxLength facet to the item type itself (SMLSizeType), not
to the list type.

Example 10–11. Valid instance of a length-restricted list

<availableSizes>medium large</availableSizes>

When you define a list type, there are no automatic restrictions on
the length of the list. Therefore, a list with zero items (i.e., empty
elements or just whitespace) is considered valid. If you do not want
a list to be valid if it is empty, restrict the list type by setting its
minLength to 1.
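
A minimal sketch of such a restriction, following the pattern of Example 10–10. The type name NonEmptySizesType is hypothetical.

```xml
<xs:simpleType name="NonEmptySizesType">
  <xs:restriction>
    <xs:simpleType>
      <xs:list itemType="SMLSizeType"/>
    </xs:simpleType>
    <!-- Counts items in the list, not characters -->
    <xs:minLength value="1"/>
  </xs:restriction>
</xs:simpleType>
```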

10.3.3.2 Enumeration facet

The enumeration facet may also be used to restrict list types.
However, the enumeration specified applies to the whole list, not to
each item in the list. For example, to restrict the values in a list to a
specific set, you may be tempted to define a simple type like the one
shown in Example 10–12.

Example 10–12. Enumeration applied inappropriately to a list type

<xs:simpleType name="AvailableSizesType">
  <xs:restriction>
    <xs:simpleType>
      <xs:list itemType="xs:token"/>
    </xs:simpleType>
    <xs:enumeration value="small"/>
    <xs:enumeration value="medium"/>
    <xs:enumeration value="large"/>
  </xs:restriction>
</xs:simpleType>

However, this would not behave as you expect. It would restrict the
value of the entire list to only one of the values: small, medium, or
large. Therefore, <availableSizes>small</availableSizes>
would be valid, but <availableSizes>small medium</availableSizes>
would not. Instead, apply the enumeration to the item type,
as shown in Example 10–13.

Example 10–13. Enumeration applied to the item type of a list

<xs:simpleType name="AvailableSizesType">
  <xs:list>
    <xs:simpleType>
      <xs:restriction base="xs:token">
        <xs:enumeration value="small"/>
        <xs:enumeration value="medium"/>
        <xs:enumeration value="large"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:list>
</xs:simpleType>

There may be cases where you do want to restrict the entire list
to certain values. Example 10–14 shows a list that may only have two
values, as shown.

Example 10–14. Enumeration correctly applied to a list type

Schema:

<xs:simpleType name="ApplicableSizesType">
  <xs:restriction>
    <xs:simpleType>
      <xs:list itemType="SizeType"/>
    </xs:simpleType>
    <xs:enumeration value="small medium large"/>
    <xs:enumeration value="2 4 6 8 10 12 14 16 18"/>
  </xs:restriction>
</xs:simpleType>

Instance:

<applicableSizes>small medium large</applicableSizes>

10.3.3.3 Pattern facet

The pattern facet may also be applied to list types. Like the length
and enumeration facets, the pattern facet in this case applies to the
entire list, not the items in the list. For example, suppose you want to
represent vector information as a list of integers. You want your list
to always contain one or more groups of three integers each, separated
by whitespace. The restriction shown in Example 10–15 enforces this
constraint; as written, the pattern does not match an empty list.

Example 10–15. Pattern applied to a list type

<xs:simpleType name="VectorType">
  <xs:restriction>
    <xs:simpleType>
      <xs:list itemType="xs:unsignedInt"/>
    </xs:simpleType>
    <xs:pattern value="\d+\s+\d+\s+((\d+\s+){3})*\d+"/>
  </xs:restriction>
</xs:simpleType>
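
The following sketch shows how the pattern treats a few instance values. It assumes a hypothetical element named vector declared with the type VectorType, inside a hypothetical wrapper element vectors.

```xml
<!-- Assumes <xs:element name="vector" type="VectorType"/> -->
<vectors>
  <vector>1 2 3</vector>        <!-- valid: one group of three -->
  <vector>1 2 3 4 5 6</vector>  <!-- valid: two groups of three -->
  <vector>1 2 3 4</vector>      <!-- invalid: four items do not fit the pattern -->
  <vector>1 2</vector>          <!-- invalid: fewer than three items -->
</vectors>
```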

10.3.4 Lists and strings

Be careful when deriving list types from string-based types whose values
may contain whitespace. This includes the built-in types string,
normalizedString, and token, as well as user-defined types derived
from them. Since list items are separated by whitespace, strings that
contain whitespace may give unexpected results when included as items
in a list. Example 10–16 shows the definition of AvailableSizesType
as a list of SMLXSizeType, which is derived from token and allows
the values small, medium, large, and extra large.

Example 10–16. Defining a list of a string-based type

<xs:simpleType name="AvailableSizesType">
  <xs:list itemType="SMLXSizeType"/>
</xs:simpleType>
<xs:simpleType name="SMLXSizeType">
  <xs:restriction base="xs:token">
    <xs:enumeration value="small"/>
    <xs:enumeration value="medium"/>
    <xs:enumeration value="large"/>
    <xs:enumeration value="extra large"/>
  </xs:restriction>
</xs:simpleType>

Example 10–17 shows an invalid instance of AvailableSizesType.
The schema processor would consider this instance to be a list of three
items (“small”, “extra”, and “large”) rather than the expected
two items (“small” and “extra large”). When it attempts to validate
the value “extra” against the enumerated values, it will find it invalid.

Example 10–17. Invalid instance of AvailableSizesType

<availableSizes>
  small
  extra large
</availableSizes>

10.3.5 Lists of unions

Lists of union types are no different from lists of atomic types. Each
item in the list must simply be a valid value of one of the member types
of the union type. Example 10–18 defines our now familiar union
type SizeType, then defines a list type AvailableSizesType whose
item type is SizeType.

Example 10–18. Defining a list of a union

<xs:simpleType name="SizeType">
  <xs:union memberTypes="DressSizeType SMLXSizeType"/>
</xs:simpleType>

<xs:simpleType name="AvailableSizesType">
  <xs:list itemType="SizeType"/>
</xs:simpleType>

Example 10–19 shows a valid instance of AvailableSizesType.
Note that both the integers and the enumerated small, medium, and
large are valid list items, in any order.

Example 10–19. Instance of a list of a union

<availableSizes>10 large 2</availableSizes>

The only restriction on lists of unions is that the union type cannot
have any list types among its member types. That would equate to a
list of lists, which is not legal.

10.3.6 Lists of lists

Lists of lists are not legal. The item type of a list type cannot be
a list type itself, nor can it be derived at any level from another list
type (for example, as a restriction of a list, or a union of a list).
Example 10–20 is illegal as it attempts to define a simple type
TwoDimensionalArrayType as a list of lists.

Example 10–20. Illegal list of lists

<xs:simpleType name="RowType">
  <xs:list itemType="xs:integer"/>
</xs:simpleType>

<xs:simpleType name="TwoDimensionalArrayType">
  <xs:list itemType="RowType"/>
</xs:simpleType>

Instead, you should put markup around the items in the lists.
Example 10–21 shows a complex type definition that accomplishes this and
a valid instance.

Example 10–21. An array using markup

Schema:

<xs:complexType name="VectorType">
  <xs:sequence maxOccurs="unbounded">
    <xs:element name="e" type="xs:integer"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="ArrayType">
  <xs:sequence maxOccurs="unbounded">
    <xs:element name="r" type="VectorType"/>
  </xs:sequence>
</xs:complexType>

<xs:element name="array" type="ArrayType"/>

Instance:

<array>
  <r> <e>1</e> <e>12</e> <e>15</e> </r>
  <r> <e>44</e> <e>2</e> <e>3</e> </r>
</array>

10.3.7 Restricting the item type

Once you have defined a list type, you cannot derive another list type
from it that restricts the item type. For example, it is impossible to
derive a list of MediumDressSizeType from a list of DressSizeType.
Instead, you must restrict the item type (in this case DressSizeType),
then define a new list type of the new restricted atomic type (e.g.,
MediumDressSizeType).
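
The two steps might be sketched as follows. DressSizeType is assumed to be an integer from 2 to 18 defined elsewhere; the range 8 to 12 and the name AvailableMediumSizesType are hypothetical choices for illustration.

```xml
<!-- Step 1: restrict the item type -->
<xs:simpleType name="MediumDressSizeType">
  <xs:restriction base="DressSizeType">
    <xs:minInclusive value="8"/>
    <xs:maxInclusive value="12"/>
  </xs:restriction>
</xs:simpleType>

<!-- Step 2: define a new list type over the restricted item type -->
<xs:simpleType name="AvailableMediumSizesType">
  <xs:list itemType="MediumDressSizeType"/>
</xs:simpleType>
```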
Chapter 11
Built-in simple types

This chapter describes the 49 built-in simple types that are
included in XML Schema. These simple types represent
common types that can be used directly in schemas. They
are also the foundation for deriving other simple types,
as described in Chapter 8. A complete reference to the built-in simple
types and the facets that apply to them can be found in Appendix B.

11.1 The XML Schema type system

There are 49 simple types built into XML Schema. They are specified
in Part 2 of the XML Schema recommendation. This part of the rec-
ommendation makes a distinction between “datatypes” and “simple
types.” Datatypes are abstract concepts of data, such as “integer.”
Simple types are the concrete representations of these datatypes. Most
of the built-in types are atomic types, although there are three list types
as well.

202 Chapter 11 | Built-in simple types

11.1.1 The type hierarchy

Types in the XML Schema type system form a hierarchy. Figure 11–1
depicts the hierarchy of the built-in types, showing that some built-in
types are derived from other built-in types. The downward arrows
represent derivation by restriction, so the types become more restrictive
toward the bottom of the diagram. For example, nonPositiveInteger
is more restrictive than integer, which is more restrictive than
anyAtomicType.

Figure 11–1 The built-in type hierarchy

At the top of the hierarchy are three special types:

1. anyType is a generic complex type that allows anything: any
attributes, any child elements, any text content.
2. anySimpleType, derived from anyType, is the base of all
simple types, including atomic, list, and union types.
3. anyAtomicType, derived from anySimpleType, is a generic
type from which all atomic types are derived.

anyType can be declared as the type of an element, in which case
that element can have any content. It is also the default type for ele-
ments if none is specified. anyType can also be extended or restricted
by complex type definitions. anySimpleType and anyAtomicType
are special types that cannot be used as the base for user-defined types
in a schema. However, they can be declared as the type of an element
or attribute.
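
As a rough sketch, declarations that use these special types might look like this. All element and attribute names are hypothetical, and anyAtomicType is part of the version 1.1 type system, so the last declaration assumes a 1.1 processor.

```xml
<!-- Explicitly typed as anyType: any attributes, children, and text -->
<xs:element name="anything" type="xs:anyType"/>

<!-- No type specified: the element's type defaults to anyType -->
<xs:element name="alsoAnything"/>

<!-- anySimpleType and anyAtomicType used as declared types -->
<xs:element name="anySimpleValue" type="xs:anySimpleType"/>
<xs:attribute name="anyAtomicValue" type="xs:anyAtomicType"/>
```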
The types directly under anyAtomicType are known as primitive
types, while the rest are derived built-in types. The primitive types
represent basic type concepts, and all other built-in atomic types are
restrictions of those types. When you define new simple types in your
schema, they can never be primitive; they must be derived from a
built-in primitive type.
Figure 11–1 shows that the three built-in list types (NMTOKENS,
ENTITIES, and IDREFS) are derived from anySimpleType. Any user-
defined list and union types are also derived from anySimpleType,
although they have item types or member types that may be specific
atomic types.
Starting in version 1.1, it is possible for implementations to support
other primitive types, in addition to the built-in types described in this
chapter. Consult the documentation of your XML Schema processor
to determine whether you have access to any additional primitive types.

11.1.2 Value spaces and lexical spaces

Every type in the XML Schema type system has a value space. This
value space represents the set of possible values for a type. For example,
for the int type, it is the set of integers from –2147483648 to
2147483647. Every type also has a lexical space, which includes all
the possible representations of those values. For int, each value in the
value space might have several lexical representations. For example,
the value 12 could also be written as 012 or +12. All of these values
are considered equal for this type (but not if their type were string).
Of the lexical representations, one is considered the canonical repre-
sentation: It maps one-to-one to a value in the value space and can
be used to determine whether two values are equal. For the int type,
the rule is that the canonical representation has no plus sign and no
leading zeros. If you turn each of the three values 12, +12, and 012
into their canonical representation using that rule, they would all be
12 and therefore equal to each other. Some primitive types, such as
string, only have one lexical representation, which becomes, by de-
fault, the canonical representation. In this chapter, the canonical repre-
sentation of a particular type is only mentioned if there can be more
than one lexical representation per value in the value space.
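
One way to see the difference between the value space and the lexical space is to apply the same enumeration to two different base types. The sketch below uses hypothetical type names. Because enumeration values are compared in the value space, the first type should accept 12, 012, and +12, while the second should accept only the string 12.

```xml
<!-- Compared as integers: 12, 012, and +12 all denote the same value -->
<xs:simpleType name="TwelveAsIntType">
  <xs:restriction base="xs:int">
    <xs:enumeration value="12"/>
  </xs:restriction>
</xs:simpleType>

<!-- Compared as strings: 012 and +12 are different values from 12 -->
<xs:simpleType name="TwelveAsTokenType">
  <xs:restriction base="xs:token">
    <xs:enumeration value="12"/>
  </xs:restriction>
</xs:simpleType>
```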

11.1.3 Facets and built-in types

As we saw in Chapter 8, simple types inherit the facets of their ancestors.
For example, the integer type has a fractionDigits facet that is
set to 0. This means that all of the twelve types derived (directly and
indirectly) from integer also have a fractionDigits of 0.
However, it is not just the facet value that is inherited, but also the
applicability of a facet. Each primitive type has certain facets that are
applicable to it. For example, the string type has length as an appli-
cable facet, but not totalDigits, because it does not make sense to
apply totalDigits to a string. Therefore, the totalDigits facet
cannot be applied to string or any of its derived types, whether they
are built-in or user-defined.
It is not necessary to remember which types are primitive and which
are derived. This chapter lists the applicable facets for all of the built-in
types, not just the primitive types. When you derive new types from
the built-in types, you may simply check which facets are applicable
to the built-in type, regardless of whether it is primitive or derived.
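
The applicability rule can be illustrated with a sketch of one legal and one illegal definition. The type names are hypothetical.

```xml
<!-- Legal: totalDigits is applicable to integer and its derived types -->
<xs:simpleType name="ThreeDigitType">
  <xs:restriction base="xs:integer">
    <xs:totalDigits value="3"/>
  </xs:restriction>
</xs:simpleType>

<!-- Illegal: totalDigits is not applicable to string or its derived types -->
<xs:simpleType name="BadStringType">
  <xs:restriction base="xs:string">
    <xs:totalDigits value="3"/>
  </xs:restriction>
</xs:simpleType>
```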

11.2 String-based types

11.2.1 string, normalizedString, and token
The types string, normalizedString, and token represent a
character string that may contain any Unicode characters allowed by
XML. Certain characters, namely the “less than” symbol (<) and the
ampersand (&), must be escaped (using the entities &lt; and &amp;,
respectively) when used in strings in XML instances. The only difference
between the three types is in the way whitespace is handled by a
schema-aware processor, as shown in Table 11–1.

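Table 11–1 Whitespace handling of string types

Bracketed terms stand for whitespace characters in the original string.

Original string                    string       normalizedString     token
                                   (preserve)   (replace)            (collapse)
A string                           unchanged    unchanged            unchanged
On [line break] two lines          unchanged    line break becomes   line break becomes
                                                a space              a space
Has [several spaces] spaces        unchanged    unchanged            spaces collapsed
                                                                     into one
[tab] Leading tab                  unchanged    tab becomes          leading whitespace
                                                a space              removed
[several spaces] Leading spaces    unchanged    unchanged            leading spaces
                                                                     removed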

The string type has a whiteSpace facet of preserve, which
means that all whitespace characters (spaces, tabs, carriage returns, and
line feeds) are preserved by the processor.
The normalizedString type has a whiteSpace facet of replace,
which means that the processor replaces each carriage return, line feed,
and tab by a single space. There is no collapsing of multiple consecutive
spaces into a single space.
The token type represents a tokenized string. The name token may
be slightly confusing because it implies that there may be only one token
with no whitespace. In fact, there can be whitespace in a token value.
The token type has a whiteSpace facet of collapse, which means
that the processor replaces each carriage return, line feed, and tab by a
single space. After this replacement, each group of consecutive spaces
is collapsed into one space character, and all leading and trailing
spaces are removed.
Table 11–2 shows some valid and invalid values of the string types.

Table 11–2 Values of the string types

Valid values                   Comment
This is a string!
Édition française.
12.5
(empty string)                 An empty string is valid.
PB&amp;J                       When parsed, it will become PB&J.
Separated   by   3   spaces.
This
is on two lines.

Invalid values†                Comment
AT&T                           Ampersand must be escaped.
3 < 4                          The “less than” symbol must be escaped.

† In physical XML files.
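
In an instance document, the escaped forms of the two invalid values above might be written as in the following sketch. The element names are hypothetical.

```xml
<note>
  <company>AT&amp;T</company>        <!-- parsed value: AT&T -->
  <comparison>3 &lt; 4</comparison>  <!-- parsed value: 3 < 4 -->
</note>
```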
The facets indicated in Table 11–3 can restrict string,
normalizedString, and token, and their derived types (see footnote 1).

Table 11–3 Facets applicable to string, normalizedString, and token types

Facet                    Flag   Facet          Flag   Facet             Flag
length                   A      minExclusive   -      totalDigits       -
minLength                A      minInclusive   -      fractionDigits    -
maxLength                A      maxInclusive   -      pattern           A
whiteSpace               V      maxExclusive   -      enumeration       A
explicitTimezone (1.1)   -                            assertion (1.1)   A

whiteSpace is preserve for string, replace for normalizedString,
and collapse for token. A hyphen marks a facet that does not apply;
(1.1) marks a facet available only in version 1.1.

11.2.1.1 Design hint: Should I use string, normalizedString, or token?
First, consider whether to use a string-based simple type at all. If it is
a long string of general text, such as a letter or a long description of an
item, this may not be wise. This is because simple types are not exten-
sible. Later, you may want to allow XHTML markup in the letter, or
break the item description down into more structured components. It
will be impossible to do this without altering the schema in a way that
is not backwards compatible.
Additionally, simple types cannot support internationalization
requirements such as Ruby annotations and BIDI (bidirectionality)
elements.
In these cases, you should instead declare a complex type with
mixed content, and include a wildcard to allow for future extensions.
The complex type definition shown in Example 11–1 accomplishes
this purpose. The character data content of an element of this type will
have its whitespace preserved.

1. A = applicable, V = value specified, F = fixed value specified.

Example 11–1. Extensible mixed content

<xs:complexType name="TextType" mixed="true">
  <xs:sequence>
    <xs:any namespace="##any" processContents="lax"
            minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute ref="xml:lang"/>
</xs:complexType>

For short, atomic items, such as a postal code or a gender, it does
make sense to use string, normalizedString, or token. But which
one? Here are some general guidelines:

# string should be used for text where formatting, such as tabs
and line breaks, is significant. However, as mentioned above,
it may be better to use a mixed complex type in this case.
# normalizedString is used when formatting is not significant
but consecutive whitespace characters are significant. This can
be used when the information in the string is positional.
# token should be used for most short, atomic strings, especially
ones that have an enumerated set of values. Basing your enumer-
ated types on token means that <gender> M </gender> will
be valid as well as <gender>M</gender>.
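
The last guideline can be sketched as follows. The type name GenderType and the second enumerated value F are hypothetical; only the value M appears in the text above.

```xml
<xs:simpleType name="GenderType">
  <!-- token collapses whitespace before the value is compared -->
  <xs:restriction base="xs:token">
    <xs:enumeration value="M"/>
    <xs:enumeration value="F"/>
  </xs:restriction>
</xs:simpleType>

<xs:element name="gender" type="GenderType"/>
```

Had the restriction been based on xs:string instead, the surrounding spaces in <gender> M </gender> would be preserved and the value would no longer match the enumeration.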

11.2.2 Name
The type Name represents an XML name, which can be used as an ele-
ment name or attribute name, among other things. Values of this type
must start with a letter, underscore (_), or colon (:), and may contain
only letters, digits, underscores (_), colons (:), hyphens (-), and
periods (.). Colons should only be used to separate namespace prefixes
from local names.
Table 11–4 shows some valid and invalid values of the Name type.

Table 11–4 Values of the Name type

Valid values      Comment
myElement
_my.Element
my-element
pre:myelement3    This is recommended only if pre is a namespace
                  prefix; otherwise, colons should not be used.

Invalid values    Comment
-myelement        A Name must not start with a hyphen.
3rdElement        A Name must not start with a number.
(empty value)     An empty value is not valid, unless xsi:nil is used.

The facets indicated in Table 11–5 can restrict Name and its derived
types.

Table 11–5 Facets applicable to Name type

Facet                    Flag   Facet          Flag   Facet             Flag
length                   A      minExclusive   -      totalDigits       -
minLength                A      minInclusive   -      fractionDigits    -
maxLength                A      maxInclusive   -      pattern           V
whiteSpace               V      maxExclusive   -      enumeration       A
explicitTimezone (1.1)   -                            assertion (1.1)   A

whiteSpace is collapse, pattern is \i\c*. A hyphen marks a facet that
does not apply; (1.1) marks a facet available only in version 1.1.

11.2.3 NCName
The type NCName represents an XML non-colonized name, which is
simply a name that does not contain colons. An NCName must start
with either a letter or underscore (_) and may contain only letters,
digits, underscores (_), hyphens (-), and periods (.). This is identical
to the Name type, except that colons are not permitted.
Table 11–6 shows some valid and invalid values of the NCName type.

Table 11–6 Values of the NCName type

Valid values      Comment
myElement
_my.Element
my-element

Invalid values    Comment
pre:myElement     An NCName must not contain a colon.
-myelement        An NCName must not start with a hyphen.
(empty value)     An empty value is not valid, unless xsi:nil is used.

The facets indicated in Table 11–7 can restrict NCName and its
derived types.

Table 11–7 Facets applicable to NCName type

Facet                    Flag   Facet          Flag   Facet             Flag
length                   A      minExclusive   -      totalDigits       -
minLength                A      minInclusive   -      fractionDigits    -
maxLength                A      maxInclusive   -      pattern           V
whiteSpace               V      maxExclusive   -      enumeration       A
explicitTimezone (1.1)   -                            assertion (1.1)   A

whiteSpace is collapse, pattern is [\i-[:]][\c-[:]]*. A hyphen marks a
facet that does not apply; (1.1) marks a facet available only in version 1.1.

11.2.4 language
The type language represents a natural language identifier, generally
used to indicate the language of a document or a part of a document.
Before creating a new attribute of type language, consider using the
xml:lang attribute that is intended to indicate the natural language
of the element and its content.
Values of the language type conform to RFC 3066, Tags for
the Identification of Languages, in version 1.0 and to RFC 4646, Tags
for Identifying Languages, and RFC 4647, Matching of Language Tags, in
version 1.1. The three most common formats are:

# For ISO-recognized languages, the format is a two- or three-letter
(usually lowercase) language code that conforms to
ISO 639, optionally followed by a hyphen and a two-letter,
usually uppercase, country code that conforms to ISO 3166.
For example, en or en-US.
# For languages registered by the Internet Assigned Numbers
Authority (IANA), the format is i-langname, where langname
is the registered name. For example, i-navajo.
# For unofficial languages, the format is x-langname, where
langname is a name of up to eight characters agreed upon
by the two parties sharing the document. For example,
x-Newspeak.

Any of these three formats may have additional parts, each preceded
by a hyphen, which identify more countries or dialects. Schema proces-
sors will not verify that values of the language type conform to the
above rules. They will simply validate them based on the pattern
specified for this type, which says that it must consist of one or more
parts of up to eight characters each, separated by hyphens.
Table 11–8 shows some valid and invalid values of the
language type.

Table 11–8 Values of the language type

Valid values                  Comment
en                            English.
en-GB                         UK English.
en-US                         US English.
fr                            French.
de                            German.
es                            Spanish.
it                            Italian.
nl                            Dutch.
zh                            Chinese.
ja                            Japanese.
ko                            Korean.
i-navajo                      IANA-registered language.
x-Newspeak                    Private, unregistered language.
any-value-with-short-parts    Although a schema processor will consider
                              this value valid, it does not follow
                              RFC 3066 guidelines.

Invalid values                Comment
longerThan8                   Parts may not exceed eight characters
                              in length.
(empty value)                 An empty value is not valid, unless
                              xsi:nil is used.

The facets indicated in Table 11–9 can restrict language and its
derived types.

Table 11–9 Facets applicable to language type

Facet                    Flag   Facet          Flag   Facet             Flag
length                   A      minExclusive   -      totalDigits       -
minLength                A      minInclusive   -      fractionDigits    -
maxLength                A      maxInclusive   -      pattern           V
whiteSpace               V      maxExclusive   -      enumeration       A
explicitTimezone (1.1)   -                            assertion (1.1)   A

whiteSpace is collapse, pattern is
([a-zA-Z]{1,8})(-[a-zA-Z0-9]{1,8})*. A hyphen marks a facet that
does not apply; (1.1) marks a facet available only in version 1.1.

11.3 Numeric types

11.3.1 float and double
The type float represents an IEEE single-precision 32-bit floating-
point number, and double represents an IEEE double-precision 64-bit
floating-point number. The lexical representation of both float and
double values is a mantissa (a number which conforms to the
type decimal described in the next section) followed, optionally, by
the character “E” or “e” followed by an exponent. The exponent must
be an integer. For example, 3E2 represents 3 times 10 to the 2nd
power, or 300.
In addition, the following values are valid: INF (infinity), +INF
(positive infinity, version 1.1 only), -INF (negative infinity), 0 (positive
0), -0 (negative 0), and NaN (Not a Number). 0 and -0 are considered
equal. INF and +INF are equal and are considered to be greater than
all other values, while -INF is less than all other values. The value NaN
cannot be compared to any other values.
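
The lexical forms described above might appear in an instance as in the following sketch, which assumes a hypothetical element named value of type double inside a hypothetical wrapper element measurements.

```xml
<!-- Assumes <xs:element name="value" type="xs:double"/> -->
<measurements>
  <value>3E2</value>       <!-- 3 times 10 to the 2nd power, or 300 -->
  <value>+24.3e-3</value>  <!-- a lowercase e and a signed exponent are allowed -->
  <value>300</value>       <!-- the exponent is optional -->
  <value>-INF</value>      <!-- negative infinity -->
  <value>NaN</value>       <!-- Not a Number -->
</measurements>
```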
The canonical representation for float and double always contains
an uppercase letter E and a decimal point in the mantissa. No leading
or trailing zeros are present, except that there is always at least one
digit before and after the decimal point in the mantissa, and at least
one digit in the exponent. No positive signs are included. The mantissa
is normalized so that a single nonzero digit precedes the decimal point;
for example, the canonical representation of the float value +12 is 1.2E1.
Table 11–10 shows some valid and invalid values of the float and
double types.

Table 11–10 Values of the float and double types

Valid values Comment
-3E2

4268.22752E11

+24.3e-3

+3.5 Any value valid for decimal is also valid for float
and double.
INF Positive infinity.
-INF Negative infinity.
+INF Positive infinity, value allowed in version 1.1 but
not in 1.0.
+0 Positive 0.
-0 Negative 0.
NaN Not a Number.
Invalid values Comment
-3E2.4 The exponent must be an integer.
12E An exponent must be specified if “E” is present.
Inf Values are case-sensitive and must be capitalized
correctly.
NAN Values are case-sensitive and must be capitalized
correctly.
An empty value is not valid, unless xsi:nil is used.
The facets indicated in Table 11–11 can restrict float, double,
and their derived types.

Table 11–11 Facets applicable to float and double types

length                       minExclusive    A    totalDigits
minLength                    minInclusive    A    fractionDigits
maxLength                    maxInclusive    A    pattern           A
whiteSpace              F    maxExclusive    A    enumeration       A
explicitTimezone (1.1)                            assertion (1.1)   A
whiteSpace is collapse.
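
As one possible use of the bounds facets with a floating-point type, the sketch below restricts double to the range 0 through 1. The type name ProbabilityType is hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: a double between 0 and 1, inclusive -->
  <xs:simpleType name="ProbabilityType">
    <xs:restriction base="xs:double">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="1"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Under this definition, lexical values such as 0.5, 5E-1, and 1 fall within the bounds, while 2 and INF do not.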

11.3.2 decimal
The type decimal represents a decimal number of arbitrary precision.
Schema processors vary in the number of significant digits they support,
but a minimally conforming processor must support at least 16 signifi-
cant digits. The lexical representation of decimal is a sequence of
digits optionally preceded by a sign (“+” or “-”) and optionally contain-
ing a period. If the fractional part is 0 then the period and trailing zeros
may be omitted. Leading and trailing zeros are permitted but not con-
sidered significant. That is, the decimal values 3.0 and 3.0000 are
considered equal.
The canonical representation of decimal always contains a decimal
point. No leading or trailing zeros are present, except that there is always
at least one digit before and after the decimal point. No positive
signs are included.
Table 11–12 shows some valid and invalid values of the
decimal type.

Table 11–12 Values of the decimal type

Valid values Comment
3.0

-3.0 A negative sign is permitted.

+3.5 A positive sign is permitted.
3 A decimal point is not required.
0

-0.3

0003.0 Leading zeros are permitted.

3.0000 Trailing zeros are permitted; considered equal to 3.0.
Invalid values Comment
3,5 Commas are not permitted; the decimal separator must
be a period.
24.3e-3 Exponents cannot be specified.
An empty value is not valid, unless xsi:nil is used.

The facets indicated in Table 11–13 can restrict decimal and its
derived types.

Table 11–13 Facets applicable to decimal type

length                       minExclusive    A    totalDigits       A
minLength                    minInclusive    A    fractionDigits    A
maxLength                    maxInclusive    A    pattern           A
whiteSpace              F    maxExclusive    A    enumeration       A
explicitTimezone (1.1)                            assertion (1.1)   A
whiteSpace is collapse.
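
The totalDigits and fractionDigits facets are commonly combined to describe monetary amounts. The following is a minimal sketch under that assumption; the names PriceType and price are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: a non-negative decimal with at most 8 digits
       in total, of which at most 2 follow the decimal point -->
  <xs:simpleType name="PriceType">
    <xs:restriction base="xs:decimal">
      <xs:totalDigits value="8"/>
      <xs:fractionDigits value="2"/>
      <xs:minInclusive value="0"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="price" type="PriceType"/>
</xs:schema>
```

With this type, 19.99 is acceptable, whereas 12.345 is rejected for having three fractional digits and -1 for falling below the lower bound.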

11.3.3 Integer types

The type integer represents an arbitrarily large integer; from it,
12 other built-in integer types are derived (directly or indirectly). The
lexical representation of the integer types is a sequence of digits.
Some of the integer types allow or require a sign (“+” or “-”) to precede
the numbers, others prohibit it. Leading zeros are permitted, but
decimal points are not.
The canonical representations of integer types do not contain leading
zeros or positive signs. Table 11–14 lists all of the integer types, with
their bounds and the rules for preceding signs.

Table 11–14 Integer types

Type name           minInclusive          maxInclusive          Preceding sign
integer             n/a                   n/a                   + (optional) or -.
positiveInteger     1                     n/a                   + (optional).
nonPositiveInteger  n/a                   0                     - or + (optional, and only if the value is 0).
negativeInteger     n/a                   -1                    - (required).
nonNegativeInteger  0                     n/a                   + (optional) or - (optional, and only if the value is 0).
long                -9223372036854775808  9223372036854775807   + (optional) or -.
int                 -2147483648           2147483647            + (optional) or -.
short               -32768                32767                 + (optional) or -.
byte                -128                  127                   + (optional) or -.
unsignedLong        0                     18446744073709551615  In 1.0: prohibited. In 1.1: + (optional) or - (optional, and only if the value is 0).
unsignedInt         0                     4294967295            In 1.0: prohibited. In 1.1: + (optional) or - (optional, and only if the value is 0).
unsignedShort       0                     65535                 In 1.0: prohibited. In 1.1: + (optional) or - (optional, and only if the value is 0).
unsignedByte        0                     255                   In 1.0: prohibited. In 1.1: + (optional) or - (optional, and only if the value is 0).

Table 11–15 shows some valid and invalid values of the integer types.
The facets indicated in Table 11–16 can restrict the integer types
and their derived types.

Table 11–15 Values of the integer types

Valid values Comment
122 Valid for all integer types except negativeInteger and
nonPositiveInteger.

00122 Leading zeros are permitted.

0 0 is permitted for most integer types (except
positiveInteger and negativeInteger).

-3 A negative sign is permitted for some integer types, see
Table 11–14.
+3 A positive sign is permitted for some integer types, see
Table 11–14.
Invalid values Comment
3. An integer must not contain a decimal point.
3.0 An integer must not contain a decimal point.
An empty value is not valid, unless xsi:nil is used.

Table 11–16 Facets applicable to integer types

length                       minExclusive    A    totalDigits       A
minLength                    minInclusive    V    fractionDigits    F
maxLength                    maxInclusive    V    pattern           V
whiteSpace              F    maxExclusive    A    enumeration       A
explicitTimezone (1.1)                            assertion (1.1)   A
whiteSpace is collapse, fractionDigits is 0, pattern is [\-+]?[0-9]+,
minInclusive and maxInclusive are as specified in Table 11–14.

11.3.3.1 Design hint: Is it an integer or a string?

When defining types for values that are sequences of digits, it may be
difficult to determine whether the type should be based on an integer
or a string. A quantity, for example, is better treated as an integer
than as a string. Five-digit U.S. zip codes, on the other hand, are valid
integers, but they are probably better interpreted as strings. Here are
some general guidelines:
Use integer (or, more likely, nonNegativeInteger) if:

# You will ever compare two values of that type numerically. For
example, if you compare the quantity 100 to the quantity 99,
you obviously want 100 to be greater. But if you define them
as strings, they will be compared as strings in languages such as
XSLT 2.0 and XQuery, and 100 will be considered less than 99.
# You will ever perform mathematical operations on values of
that type. You might want to double a quantity, but you are
unlikely to want to double a zip code.
# You want to restrict their values’ bounds. For example, you may
require that quantity must be between 0 and 100. While it can
be possible to restrict a string in this way, by applying a pattern,
it is more cumbersome.

Use string (or, more likely, token) if:

# You want to restrict your values’ lexical length. For example,
zip codes must be five digits long; 8540 is not a valid zip code,
but 08540 is valid. While it is technically possible to restrict an
integer to five digits by applying a pattern, it is more
cumbersome.
# You will ever take a substring. For example, you may want to
extract the central processing facility as the first three digits of
a zip code.
# You plan to derive nonnumeric types from this type, or use it
in a substitution group with nonnumeric types. For example,
if you plan to also define types for international postal codes,
which may contain letters or other characters, it is safer to base
your U.S. zip code elements on a string type, so that they can
be used in a substitution group with other postal code elements.
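
The two sets of guidelines can be contrasted in a schema. In the sketch below, the type names QuantityType and USZipCodeType are hypothetical: the quantity is bounded numerically, while the zip code is a token constrained only by its lexical form.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Numeric: compared and bounded as a number -->
  <xs:simpleType name="QuantityType">
    <xs:restriction base="xs:nonNegativeInteger">
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Lexical: exactly five digits, leading zeros preserved -->
  <xs:simpleType name="USZipCodeType">
    <xs:restriction base="xs:token">
      <xs:pattern value="[0-9]{5}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Here 08540 is a valid USZipCodeType value and 8540 is not, whereas QuantityType accepts 99, 100, and 0100 as numbers and rejects 101.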

11.4 Date and time types

XML Schema provides a number of built-in date and time types, whose
formats are based on ISO 8601. This section explains each of the date
and time types and provides general information that applies to all
date and time types.

11.4.1 date
The type date represents a Gregorian calendar date. The lexical
representation of date is YYYY-MM-DD where YYYY represents the year,
MM the month, and DD the day. No left truncation is allowed for any part of
the date. To represent years later than 9999, additional digits can be
added to the left of the year value, but extra leading zeros are not per-
mitted. To represent years before 0000, a preceding minus sign (“-”)
is allowed. An optional time zone expression may be added at the end,
as described in Section 11.4.13 on p. 233.
Table 11–17 shows some valid and invalid values of the date type.

Table 11–17 Values of the date type

Valid values Comment
2004-04-12 April 12, 2004.
-0045-01-01 January 1, 45 B.C.
12004-04-12 April 12, 12004.
2004-04-12-05:00 April 12, 2004, US Eastern Standard Time, which
is 5 hours behind Coordinated Universal Time
(UTC).
2004-04-12Z April 12, 2004, Coordinated Universal Time
(UTC).
0000-04-12 The year zero is permitted in version 1.1, but not
in version 1.0.
Invalid values Comment
99-04-12 Left truncation of the century is not allowed.
2004-4-2 Month and day must be two digits each.
2004/04/02 Slashes are not valid separators.
04-12-2004 The value must be in YYYY-MM-DD order.
2004-04-31 The date must be a valid date (April has 30 days).
+2004-04-02 Positive signs are not permitted.
An empty value is not valid, unless xsi:nil is used.

11.4.2 time
The type time represents a time of day. The lexical representation of
time is hh:mm:ss.sss where hh represents the hour, mm the minutes,
and ss.sss the seconds. An unlimited number of additional digits
can be used to increase the precision of fractional seconds if desired.
The time is based on a 24-hour time period, so hours should be repre-
sented as 00 through 24. Either of the values 00:00:00 or 24:00:00
can be used to represent midnight. An optional time zone expression
may be added at the end, as described in Section 11.4.13 on p. 233.
Table 11–18 shows some valid and invalid values of the time type.

Table 11–18 Values of the time type

Valid values Comment
13:20:00 1:20 P.M.
13:20:30.5555 1:20 P.M. and 30.5555 seconds.
13:20:00-05:00 1:20 P.M., US Eastern Standard Time.
13:20:00Z 1:20 P.M., Coordinated Universal Time (UTC).
00:00:00 Midnight.
24:00:00 Midnight.
Invalid values Comment
5:20:00 Hours, minutes, and seconds must be two digits
each.
13:20 Seconds must be specified, even if they are 00.
13:20.5:00 Values for hours and minutes must be integers.
13:65:00 The value must be a valid time of day.
30:05:00 The value must be a valid time of day; for a duration
of 30 hours, use the duration type.
An empty value is not valid, unless xsi:nil is used.

11.4.3 dateTime
The type dateTime represents a specific date and time. The lexical
representation of dateTime is YYYY-MM-DDThh:mm:ss.sss, which
is a concatenation of the date and time forms, separated by a literal
letter T. All of the same rules that apply to the date and time types
are applicable to dateTime as well. An optional time zone expression
may be added at the end, as described in Section 11.4.13 on p. 233.
Table 11–19 shows some valid and invalid values of the
dateTime type.

Table 11–19 Values of the dateTime type

Valid values Comment
2004-04-12T13:20:00 1:20 P.M. on April 12, 2004.
2004-04-12T13:20:15.5 1:20 P.M. and 15.5 seconds on April 12,
2004.
2004-04-12T13:20:00-05:00 1:20 P.M. on April 12, 2004, US Eastern
Standard Time.
2004-04-12T13:20:00Z 1:20 P.M. on April 12, 2004, Coordinat-
ed Universal Time (UTC).
Invalid values Comment
2004-04-12T13:00 Seconds must be specified.
2004-04-1213:20:00 The letter T is required.
2004-04-12t13:20:00 The letter T must be upper case.
99-04-12T13:00 The century must not be left-truncated.
2004-04-12 The time is required.
An empty value is not valid, unless
xsi:nil is used.

11.4.4 dateTimeStamp
The type dateTimeStamp represents a specific date and time, but with
a time zone required. It is derived from dateTime and has the same
lexical representation and rules. The only difference is that a value is
required to end in a time zone, as described in Section 11.4.13 on
p. 233.
Table 11–20 shows some valid and invalid values of the
dateTimeStamp type.

Table 11–20 Values of the dateTimeStamp type

Valid values Comment
2004-04-12T13:20:00-05:00 1:20 P.M. on April 12, 2004, US Eastern
Standard Time.
2004-04-12T13:20:00Z 1:20 P.M. on April 12, 2004, Coordinat-
ed Universal Time (UTC).
Invalid values Comment
2004-04-12T13:20:00 A time zone is required.
2004-04-12T13:20:15.5 A time zone is required.
An empty value is not valid, unless
xsi:nil is used.

11.4.5 gYear
The type gYear represents a specific Gregorian calendar year. The
letter g at the beginning of most date and time types signifies
“Gregorian.” The lexical representation of gYear is YYYY. No left truncation
is allowed. To represent years later than 9999, additional digits can be
added to the left of the year value. To represent years before 0000, a
preceding minus sign (“-”) is allowed. An optional time zone expression
may be added at the end, as described in Section 11.4.13 on p. 233.
Table 11–21 shows some valid and invalid values of the gYear type.

Table 11–21 Values of the gYear type

Valid values Comment
2004 2004.
2004-05:00 2004, US Eastern Standard Time.
12004 The year 12004.
0922 The year 922.
-0045 45 B.C.
Invalid values Comment
99 The century must not be truncated.
922 No left truncation is allowed; leading zeros should be
added if necessary.
An empty value is not valid, unless xsi:nil is used.

11.4.6 gYearMonth
The type gYearMonth represents a specific month of a specific year. The
lexical representation of gYearMonth is YYYY-MM. No left truncation
is allowed on either part. To represent years later than 9999, additional
digits can be added to the left of the year value. To represent years be-
fore 0000, a preceding minus sign (“-”) is permitted. An optional
time zone expression may be added at the end, as described in
Section 11.4.13 on p. 233.
Table 11–22 shows some valid and invalid values of the
gYearMonth type.

Table 11–22 Values of the gYearMonth type

Valid values Comment
2004-04 April 2004.
2004-04-05:00 April 2004, US Eastern Standard Time.
Invalid values Comment
99-04 The century must not be truncated.
2004 The month is required.
2004-4 The month must be two digits.
2004-13 The month must be a valid month.
An empty value is not valid, unless xsi:nil is used.

11.4.7 gMonth
The type gMonth represents a specific month that recurs every year. It
can be used to indicate, for example, that fiscal year-end processing
occurs in September of every year. To represent a duration in months,
use the duration type instead. The lexical representation of gMonth
is --MM. An optional time zone expression may be added at the end,
as described in Section 11.4.13 on p. 233. No preceding sign is allowed.
Table 11–23 shows some valid and invalid values of the gMonth type.

Table 11–23 Values of the gMonth type

Valid values Comment
--04 April.
--04-05:00 April, US Eastern Standard Time.
Invalid values Comment
2004-04 The year must not be specified; use gYearMonth instead.
04 The leading hyphens are required.
--4 The month must be two digits.
--13 The month must be a valid month.
An empty value is not valid, unless xsi:nil is used.

11.4.8 gMonthDay
The type gMonthDay represents a specific day that recurs every year.
It can be used to say, for example, that your birthday is on the
12th of April every year. The lexical representation of gMonthDay is
--MM-DD. An optional time zone expression may be added at the end,
as described in Section 11.4.13 on p. 233.
Table 11–24 shows some valid and invalid values of the
gMonthDay type.

Table 11–24 Values of the gMonthDay type

Valid values Comment
--04-12 April 12.
--04-12Z April 12, Coordinated Universal Time (UTC).
Invalid values Comment
04-12 The leading hyphens are required.
--04-31 It must be a valid day of the year (April has 30 days).
--4-6 The month and day must be two digits each.
An empty value is not valid, unless xsi:nil is used.

11.4.9 gDay
The type gDay represents a day that recurs every month. It can be used
to say, for example, that checks are paid on the 5th of each month.
To represent a duration in days, use the duration type instead. The
lexical representation of gDay is ---DD. An optional time zone
expression may be added at the end, as described in Section 11.4.13
on p. 233.
Table 11–25 shows some valid and invalid values of the gDay type.

Table 11–25 Values of the gDay type

Valid values Comment
---02 The 2nd of the month.
Invalid values Comment
02 The leading hyphens are required.
---2 The day must be two digits.
---32 The day must be a valid day of the month; no month
has 32 days.
An empty value is not valid, unless xsi:nil is used.

11.4.10 duration
The type duration represents a duration of time expressed as a number
of years, months, days, hours, minutes, and seconds. The lexical repre-
sentation of duration is PnYnMnDTnHnMnS, where P is a literal value
that starts the expression, nY is the number of years followed by a literal
Y, nM is the number of months followed by a literal M, nD is the number
of days followed by a literal D, T is a literal value that separates the date
and time, nH is the number of hours followed by a literal H, nM is the
number of minutes followed by a literal M, and nS is the number of
seconds followed by a literal S. The following rules apply to duration
values:

# Any of these numbers and corresponding designators may be
absent if they are equal to 0, but at least one number and
designator must appear.
# The numbers may be any unsigned integer, with the exception
of the number of seconds, which may be an unsigned decimal
number.
# If a decimal point appears in the number of seconds, there must
be at least one digit after the decimal point.
# A minus sign may appear before the P to specify a negative
duration.
# If no time items (hours, minutes, seconds) are present, the letter
T must not appear.

In the canonical representation of duration, the months value must
be less than 12, the hours value less than 24, and the minutes and
seconds values less than 60. This means that P15M and P1Y3M are both
valid (and equal) lexical representations that map to the same
canonical value P1Y3M.
Table 11–26 shows some valid and invalid values of the
duration type.

Table 11–26 Values of the duration type

Valid values Comment
P2Y6M5DT12H35M30S 2 years, 6 months, 5 days, 12 hours, 35 minutes,
30 seconds.
P1DT2H 1 day, 2 hours.
P20M 20 months (the number of months can be more
than 12).
PT20M 20 minutes.
P0Y20M0D 20 months (0 is permitted as a number, but is not
required).
P0Y 0 years.
-P60D Minus 60 days.
PT1M30.5S 1 minute, 30.5 seconds.
Invalid values Comment
P-20M The minus sign must appear first if it is present.
P20MT No time items are present, so T must not be present.
P1YM5D No value is specified for months, so M must not be
present.
P15.5Y Only the seconds can be expressed as a decimal.
P1D2H T must be present to separate days and hours.
1Y2M P must always be present.
P2M1Y Years must appear before months.
P At least one number and designator are required.
PT15.S At least one digit must follow the decimal point if
it appears.
An empty value is not valid, unless xsi:nil is used.

When deriving types from duration, applying the bounds
facets (minExclusive, minInclusive, maxInclusive, and
maxExclusive) can have unexpected results. For example, if the
maxInclusive value for a duration-based type is P1M, and an instance
value contains P30D, it is ambiguous. Months may have 28, 29, 30, or
31 days, so is 30 days less than a month or not?
It is best to avoid the ambiguity by always specifying bounds for
durations in the same unit in which the instance values will appear, in
this case setting maxInclusive to P30D or P31D instead of P1M. You can
use the pattern facet to force a particular unit of duration. For
example, the pattern P\d+D applied to the duration type would force
the duration to be expressed in days only.
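
The pattern and bounds facets can be combined as in the following sketch, which assumes a hypothetical type name DaysDurationType and an upper bound chosen purely for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: durations written in whole days only,
       with the bound stated in the same unit as the instance values -->
  <xs:simpleType name="DaysDurationType">
    <xs:restriction base="xs:duration">
      <xs:pattern value="P\d+D"/>
      <xs:maxInclusive value="P31D"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

With this definition P30D is accepted, P1M is rejected by the pattern, and P45D is rejected by the upper bound.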
Alternatively, if you are using version 1.1, you can use one of
the two totally ordered duration types, yearMonthDuration or
dayTimeDuration, described in the next two sections.

11.4.11 yearMonthDuration
The type yearMonthDuration, new in version 1.1, represents a dura-
tion of time expressed as a number of years and months. The lexical
representation of yearMonthDuration is PnYnM, where P is a literal value that
starts the expression, nY is the number of years followed by a literal Y,
and nM is the number of months followed by a literal M.
yearMonthDuration is derived from duration, and all of the same
lexical rules apply.
Table 11–27 shows some valid and invalid values of the
yearMonthDuration type.

Table 11–27 Values of the yearMonthDuration type

Valid values Comment
P1Y2M 1 year, 2 months.
P20M 20 months (the number of months can be more than 12).
P0Y20M 20 months (0 is permitted as a number, but is not
required).
P0Y 0 years.
-P2Y Minus 2 years.
Invalid values Comment
P2Y6M5D The value cannot contain a number of days.
P-20M The minus sign must appear first if it is present.
P1YM No value is specified for months, so M must not be present.
P15.5Y Years cannot be expressed as a decimal.
1Y2M P must always be present.
P2M1Y Years must appear before months.
P At least one number and designator are required.
An empty value is not valid, unless xsi:nil is used.

11.4.12 dayTimeDuration
The type dayTimeDuration, new in version 1.1, represents a duration
of time expressed as a number of days, hours, minutes, and seconds.
The lexical representation of dayTimeDuration is PnDTnHnMnS, where P is a
literal value that starts the expression, nD is the number of days followed
by a literal D, T is a literal value that separates the date and time, nH is
the number of hours followed by a literal H, nM is the number of minutes
followed by a literal M, and nS is the number of seconds followed by a
literal S.
dayTimeDuration is derived from duration, and all of the same
lexical rules apply.
Table 11–28 shows some valid and invalid values of the
dayTimeDuration type.

Table 11–28 Values of the dayTimeDuration type

Valid values Comment
P5DT12H35M30S 5 days, 12 hours, 35 minutes, 30 seconds.
PT20M 20 minutes.
P0DT20M 20 minutes (0 is permitted as a number, but is not
required).
P0D 0 days.
-P60D Minus 60 days.
PT1M30.5S 1 minute, 30.5 seconds.
Invalid values Comment
P-20D The minus sign must appear first if it is present.
P20DT No time items are present, so T must not be present.
PT5DM10S No value is specified for minutes, so M must not be
present.
P15.5D Only the seconds can be expressed as a decimal.
P1D2H T must be present to separate days and hours.
PT30S1H Hours must appear before seconds.
P At least one number and designator are required.
PT15.S At least one digit must follow the decimal point if it
appears.
An empty value is not valid, unless xsi:nil is used.

11.4.13 Representing time zones

All of the date and time types, with the exception of the duration types,
allow a time zone indicator at the end. The letter Z is used to indicate
Coordinated Universal Time (UTC). All other time zones are repre-
sented by their difference from Coordinated Universal Time in the
format +hh:mm or -hh:mm. These values may range from -14:00 to
+14:00.
For example, US Eastern Standard Time, which is 5 hours behind
UTC, is represented as -05:00. If no time zone value is present, it is
considered unknown; it is not assumed to be UTC.
For most built-in types, a time zone is optional. However, for the
dateTimeStamp type, it is required. This is because that type has its
explicitTimezone facet set to required. In user-defined types
derived from the date and time types, you can choose to leave
the time zone optional, or require or prohibit a time zone using the
explicitTimezone facet, as described in Section 8.4.7 on p. 150.
Table 11–29 shows some valid and invalid values of time zones.

Table 11–29 Time zone values

Valid values Comment
Z Coordinated Universal Time (UTC).
-05:00 US Eastern Standard Time, which is 5 hours behind
UTC.
+09:00 UTC plus 9 hours, Japan’s time zone.
Unknown time zone.
Invalid values Comment
+9:00 The hour must be two digits in length; use leading zeroes
if necessary.
Z+05:00 The value may be Z or a time zone, but not both.
+20:00 The range is limited to -14:00 through +14:00.
-05 Minutes are required.
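
As a sketch of how the explicitTimezone facet (version 1.1 only) might be applied in user-defined types, the following schema prohibits a time zone on one type and requires it on another. Both type names are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: a date that must not carry a time zone -->
  <xs:simpleType name="LocalDateType">
    <xs:restriction base="xs:date">
      <xs:explicitTimezone value="prohibited"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Hypothetical type: a time that must carry a time zone -->
  <xs:simpleType name="ZonedTimeType">
    <xs:restriction base="xs:time">
      <xs:explicitTimezone value="required"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

A LocalDateType value such as 2004-04-12 is accepted but 2004-04-12Z is not; for ZonedTimeType the reverse holds, so 13:20:00Z is accepted and 13:20:00 is rejected.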

11.4.14 Facets
The facets indicated in Table 11–30 can restrict the date and time types
as well as their derived types.

Table 11–30 Facets applicable to date and time types

length                       minExclusive    A    totalDigits
minLength                    minInclusive    A    fractionDigits
maxLength                    maxInclusive    A    pattern           A
whiteSpace              F    maxExclusive    A    enumeration       A
explicitTimezone (1.1)  V                         assertion (1.1)   A
whiteSpace is collapse, explicitTimezone is required for
dateTimeStamp, not applicable for duration types, and optional for all
other date and time types.

11.4.15 Date and time ordering

When deriving types from date and time types (other than the
duration types), it is important to note that applying the bounds
facets (minExclusive, minInclusive, maxInclusive, and
maxExclusive) can have unexpected results. If the values of the bounds
facets specify time zones and the instance values do not, or vice
versa, it may be impossible to compare the two. For example, if
maxInclusive for a time-based type is 14:30:00Z, this means that
the maximum value is 2:30 P.M. in UTC. If the value 13:30:00 appears
in an instance, which is 1:30 P.M. with no time zone specified, it is
impossible to tell if this value is valid. It could be 1:30 P.M. in UTC,
which would be valid, or 1:30 P.M. US Eastern Standard Time, which
would be 6:30 P.M. UTC, and therefore invalid. Since this is
indeterminate, the schema processor will consider it an invalid value.
To avoid this problem, either use time zones in both bounds facet
values and instance values, or do not use time zones at all. If both the
bounds and the instance values have a time zone, the two values
can be compared. Likewise, if neither has a time zone, the two
values are assumed to be in the same time zone and compared as such.
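
One way to follow this advice in version 1.1 is to state the bound in UTC and also require a time zone in every instance value, so that each comparison is determinate. The sketch below assumes a hypothetical type name, CutoffTimeType.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: times no later than 2:30 P.M. UTC -->
  <xs:simpleType name="CutoffTimeType">
    <xs:restriction base="xs:time">
      <xs:maxInclusive value="14:30:00Z"/>
      <!-- Version 1.1 only: instance values must state a time zone -->
      <xs:explicitTimezone value="required"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

13:30:00Z is then valid, 13:30:00-05:00 (6:30 P.M. UTC) exceeds the upper bound, and 13:30:00 is rejected outright for lacking a time zone.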

11.5 Legacy types

The XML DTD types described in this section are attribute types that
are specified in the XML recommendation. It is recommended
that these types be used only for attributes, in order to maintain
compatibility with XML DTDs. However, it is not an error to use
these types in element declarations.

11.5.1 ID
The type ID is used for an attribute that uniquely identifies an element
in an XML document. An ID value must conform to the rules for an
NCName, as described in Section 11.2.3 on p. 210. This means that it
must start with a letter or underscore, and can only contain letters,
digits, underscores, hyphens, and periods.
ID values must be unique within an XML instance, regardless of
the attribute’s name or its element name. Example 11–2 is invalid if
attributes custID and orderID are both declared to be of type ID.

Example 11–2. Invalid nonunique IDs
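
An instance of roughly the following shape would trigger this error. The order and customer element names and the value A123 are assumptions made for illustration; only the custID and orderID attribute names come from the text above.

```xml
<order orderID="A123">
  <!-- custID repeats the value already used by orderID,
       so the two ID values are not unique within the document -->
  <customer custID="A123"/>
</order>
```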

In version 1.0, ID carries two additional constraints, both of which
have been eliminated in version 1.1:

1. A complex type cannot include more than one attribute of type
ID or of any type derived from ID. The type definition in
Example 11–3 is illegal.
2. ID attributes cannot have default or fixed values specified. The
attribute declarations in Example 11–4 are illegal.

Example 11–3. Illegal duplication of ID attributes (version 1.0)

<xs:complexType name="CustType">
  <xs:attribute name="id" type="xs:ID"/>
  <xs:attribute name="custID" type="xs:ID"/>
</xs:complexType>

Example 11–4. Illegal attribute declarations (version 1.0)

<xs:attribute name="id" type="xs:ID" fixed="A123"/>
<xs:attribute name="custID" type="xs:ID" default="C00000"/>

The facets indicated in Table 11–31 can restrict ID and its derived
types.

Table 11–31 Facets applicable to ID type

length                  A    minExclusive         totalDigits
minLength               A    minInclusive         fractionDigits
maxLength               A    maxInclusive         pattern           V
whiteSpace              V    maxExclusive         enumeration       A
explicitTimezone (1.1)                            assertion (1.1)   A
whiteSpace is collapse, pattern is [\i-[:]][\c-[:]]*.

11.5.2 IDREF
The type IDREF is used for an attribute that references an ID. A com-
mon use case for IDREF is to create a cross-reference to a particular
section of a document. Like ID, an IDREF value must be an NCName,
as described in Section 11.2.3 on p. 210.
All attributes of type IDREF must reference an ID in the same XML
document. In Example 11–5, the ref attribute of quote is of type
IDREF, and the id attribute of footnote is of type ID. The instance
contains a reference between them.

Example 11–5. Using IDREF

Schema:

<xs:element name="quote">
  <xs:complexType>
    <xs:attribute name="ref" type="xs:IDREF"/>
  </xs:complexType>
</xs:element>
<xs:element name="footnote">
  <xs:complexType>
    <xs:attribute name="id" type="xs:ID" use="required"/>
  </xs:complexType>
</xs:element>

Instance:
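
A minimal sketch of such an instance fragment, assuming the hypothetical ID value fn1 and eliding the element content:

```xml
<!-- The ref value must match the id of some element in the same document -->
<quote ref="fn1">...</quote>
<footnote id="fn1">...</footnote>
```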

ID and IDREF are best used for referencing unique locations in
document-oriented XML. To enforce complex uniqueness of data
values, and primary and foreign key references, consider using identity
constraints, which are described in Chapter 17.
The facets indicated in Table 11–32 can restrict IDREF and its
derived types.

Table 11–32 Facets applicable to IDREF type

length                  A    minExclusive         totalDigits
minLength               A    minInclusive         fractionDigits
maxLength               A    maxInclusive         pattern           V
whiteSpace              V    maxExclusive         enumeration       A
explicitTimezone (1.1)                            assertion (1.1)   A
whiteSpace is collapse, pattern is [\i-[:]][\c-[:]]*.

11.5.3 IDREFS
The type IDREFS represents a list of IDREF values separated by
whitespace. There must be at least one IDREF in the list.
Each of the values in an attribute of type IDREFS must reference an
ID in the same XML document. In Example 11–6, the refs attribute
of quote is of type IDREFS, and the id attribute of footnote is of
type ID. The instance contains a reference from the quote element to
two footnote elements, with their IDs (fn1 and fn2) separated by
whitespace.

Example 11–6. Using IDREFS

Schema:

<xs:element name="quote">
  <xs:complexType>
    <xs:attribute name="refs" type="xs:IDREFS"/>
  </xs:complexType>
</xs:element>
<xs:element name="footnote">
  <xs:complexType>
    <xs:attribute name="id" type="xs:ID" use="required"/>
  </xs:complexType>
</xs:element>

Instance:

<quote refs="fn1 fn2">...</quote>

The facets indicated in Table 11–33 can restrict IDREFS and its
derived types.

Table 11–33 Facets applicable to IDREFS type

length                  A    minExclusive         totalDigits
minLength               V    minInclusive         fractionDigits
maxLength               A    maxInclusive         pattern           A
whiteSpace              V    maxExclusive         enumeration       A
explicitTimezone (1.1)                            assertion (1.1)   A
whiteSpace is collapse, minLength is 1.

Since IDREFS is a list type, restricting an IDREFS value with these
facets may not behave as you expect. The facets length, minLength,
and maxLength apply to the number of items in the IDREFS list, not
the length of each item. The enumeration facet applies to the whole
list, not the individual items in the list. For more information, see
Section 10.3.3 on p. 190.
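As an illustration of this list behavior, the following sketch derives
a type that allows at most two IDREF items. The type name TwoRefsType
is hypothetical and does not appear elsewhere in this chapter.

```xml
<xs:simpleType name="TwoRefsType">
  <xs:restriction base="xs:IDREFS">
    <!-- Counts items in the list, not characters in each item -->
    <xs:maxLength value="2"/>
  </xs:restriction>
</xs:simpleType>
```

Under this type, the value fn1 fn2 would satisfy the facet, while
fn1 fn2 fn3 would not, regardless of how short each individual ID is.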

11.5.4 ENTITY
The type ENTITY represents a reference to an unparsed entity. The
ENTITY type is most often used to include information from another
location that is not in XML format, such as graphics. An ENTITY
value must be an NCName, as described in Section 11.2.3 on p. 210.
An ENTITY value carries the additional constraint that it must match
the name of an unparsed entity in a document type definition (DTD)
for the instance.
Example 11–7 shows an XML document that links product numbers
to pictures of the products. In the schema, the picture element
declaration declares an attribute location that has the type ENTITY. In
the instance, each value of the location attribute (in this case,
prod557 and prod563) matches the name of an entity declared in the
internal DTD subset of the instance.
The facets indicated in Table 11–34 can restrict ENTITY and its
derived types.

Example 11–7. Using an unparsed entity

Schema:

<xs:element name="picture">
<xs:complexType>
<xs:attribute name="location" type="xs:ENTITY"/>
</xs:complexType>
</xs:element>

Instance:

<!DOCTYPE catalog SYSTEM "catalog.dtd" [

<!NOTATION jpeg SYSTEM "JPG">
<!ENTITY prod557 SYSTEM "prod557.jpg" NDATA jpeg>
<!ENTITY prod563 SYSTEM "prod563.jpg" NDATA jpeg>
]>
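The document type declaration above would be followed by the instance
content itself. A sketch of what that content might look like, assuming
a catalog root element with product children that each hold a number
and a picture (the product and number element names are assumptions;
only picture and location come from the schema above):

```xml
<catalog>
  <product>
    <number>557</number>
    <!-- location names the unparsed entity declared in the DTD subset -->
    <picture location="prod557"/>
  </product>
  <product>
    <number>563</number>
    <picture location="prod563"/>
  </product>
</catalog>
```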

Table 11–34 Facets applicable to ENTITY type

length A            minExclusive              totalDigits
minLength A         minInclusive              fractionDigits
maxLength A         maxInclusive              pattern V
whiteSpace V        maxExclusive              enumeration A
                    [1.1] explicitTimezone    [1.1] assertion A

whiteSpace is collapse, pattern is [\i-[:]][\c-[:]]*.

11.5.5 ENTITIES
The type ENTITIES represents a list of ENTITY values separated by
whitespace. There must be at least one ENTITY in the list. Each of the
ENTITY values must match the name of an unparsed entity that has
been declared in a document type definition (DTD) for the instance.
Expanding on the example from the previous section, Example 11–8
shows the declaration of an attribute named location that is of type
ENTITIES. In the instance, the location attribute can include a list
of entity names. Each value (in this case there are two: prod557a and
prod557b) matches the name of an entity that is declared in the internal
DTD subset for the instance.

Example 11–8. Using ENTITIES

Schema:

<xs:element name="pictures">
<xs:complexType>
<xs:attribute name="location" type="xs:ENTITIES"/>
</xs:complexType>
</xs:element>

Instance:

<!DOCTYPE catalog SYSTEM "catalog.dtd" [

<!NOTATION jpeg SYSTEM "JPG">
<!ENTITY prod557a SYSTEM "prod557a.jpg" NDATA jpeg>
<!ENTITY prod557b SYSTEM "prod557b.jpg" NDATA jpeg>
]>
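As with the previous example, the instance content would follow the
document type declaration. A minimal sketch, assuming the same
hypothetical catalog, product, and number elements as before:

```xml
<catalog>
  <product>
    <number>557</number>
    <!-- Two entity names in one ENTITIES value, separated by whitespace -->
    <pictures location="prod557a prod557b"/>
  </product>
</catalog>
```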

The facets indicated in Table 11–35 can restrict ENTITIES and its
derived types.

Table 11–35 Facets applicable to ENTITIES type

length A            minExclusive              totalDigits
minLength V         minInclusive              fractionDigits
maxLength A         maxInclusive              pattern A
whiteSpace V        maxExclusive              enumeration A
                    [1.1] explicitTimezone    [1.1] assertion A

whiteSpace is collapse, minLength is 1.

Since ENTITIES is a list type, restricting an ENTITIES value
with these facets may not behave as you expect. The facets length,
minLength, and maxLength apply to the number of items in the
ENTITIES list, not the length of each item. The enumeration facet
applies to the whole list, not the individual items in the list. For more
information, see Section 10.3.3 on p. 190.

11.5.6 NMTOKEN
The type NMTOKEN represents a single string token. NMTOKEN values
may consist of letters, digits, periods (.), hyphens (-), underscores (_),
and colons (:). They may start with any of these characters. NMTOKEN
has a whiteSpace facet value of collapse, so any leading or trailing
whitespace will be removed. However, no whitespace may appear
within the value itself. Table 11–36 shows some valid and invalid values
of the NMTOKEN type.

Table 11–36 Values of the NMTOKEN type

Valid values             Comment
ABCD
123_456
  starts_with_a_space    When parsed, leading spaces will be removed.

Invalid values           Comment
contains a space         Value must not contain a space.
(empty)                  An empty value is not valid, unless xsi:nil
                         is used.
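One way NMTOKEN can be put to use, offered here as an illustration
rather than something this section prescribes, is as the base type for
a set of enumerated codes. The type name SizeCodeType and its three
values are hypothetical.

```xml
<xs:simpleType name="SizeCodeType">
  <xs:restriction base="xs:NMTOKEN">
    <xs:enumeration value="small"/>
    <xs:enumeration value="medium"/>
    <xs:enumeration value="large"/>
  </xs:restriction>
</xs:simpleType>
```

Because whitespace is collapsed before validation, a value written
with leading or trailing spaces around small would still match the
first enumeration value.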

The facets indicated in Table 11–37 can restrict NMTOKEN and its
derived types.

Table 11–37 Facets applicable to NMTOKEN type

length A            minExclusive              totalDigits
minLength A         minInclusive              fractionDigits
maxLength A         maxInclusive              pattern V
whiteSpace V        maxExclusive              enumeration A
                    [1.1] explicitTimezone    [1.1] assertion A

whiteSpace is collapse, pattern is \c+.

11.5.7 NMTOKENS
The type NMTOKENS represents a list of NMTOKEN values separated by
whitespace. There must be at least one NMTOKEN in the list. Table 11–38
shows some valid and invalid values of the NMTOKENS type.
The facets indicated in Table 11–39 can restrict NMTOKENS and its
derived types.
Since NMTOKENS is a list type, restricting an NMTOKENS value with
these facets may not behave as you expect. The facets length,
minLength, and maxLength apply to the number of items in the
NMTOKENS list, not the length of each item. The enumeration facet
applies to the whole list, not the individual items in the list. For more
information, see Section 10.3.3 on p. 190.

Table 11–38 Values of the NMTOKENS type

Valid values       Comment
ABCD 123
ABCD               One-item list.

Invalid values     Comment
(empty)            An empty value is not valid, unless xsi:nil is used.

Table 11–39 Facets applicable to NMTOKENS type

length A            minExclusive              totalDigits
minLength V         minInclusive              fractionDigits
maxLength A         maxInclusive              pattern A
whiteSpace V        maxExclusive              enumeration A
                    [1.1] explicitTimezone    [1.1] assertion A

whiteSpace is collapse, minLength is 1.

11.5.8 NOTATION
The type NOTATION represents a reference to a notation. A notation is
a method of interpreting XML and non-XML content. For example,
if an element in an XML document contains binary graphics data in
JPEG format, a notation can be declared to indicate that this is JPEG
data. An attribute of type NOTATION can then be used to indicate which
notation applies to the element’s content. A NOTATION value must be
a QName as described in Section 11.6.1 on p. 246.
NOTATION is the only built-in type that cannot be the type of
attributes or elements. Instead, you must define a new type that restricts
NOTATION, applying one or more enumeration facets. Each of these
enumeration values must match the name of a declared notation. For
more information on declaring notations and NOTATION-based types,
see Section 19.7 on p. 493.
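A sketch of the pattern just described, assuming two notations named
jpeg and gif. The public identifiers and the type name
PictureNotationType are illustrative assumptions, not taken from this
chapter.

```xml
<!-- Notations must be declared before a type can enumerate them -->
<xs:notation name="jpeg" public="image/jpeg"/>
<xs:notation name="gif" public="image/gif"/>

<xs:simpleType name="PictureNotationType">
  <xs:restriction base="xs:NOTATION">
    <xs:enumeration value="jpeg"/>
    <xs:enumeration value="gif"/>
  </xs:restriction>
</xs:simpleType>
```

An attribute can then be declared with PictureNotationType. In a schema
that has a target namespace, the enumeration values would need to
resolve to that namespace, for example through a prefix.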

The facets indicated in Table 11–40 can restrict NOTATION and its
derived types.

Table 11–40 Facets applicable to NOTATION type

length A            minExclusive              totalDigits
minLength A         minInclusive              fractionDigits
maxLength A         maxInclusive              pattern A
whiteSpace F        maxExclusive              enumeration A
                    [1.1] explicitTimezone    [1.1] assertion A

whiteSpace is collapse.

11.6 Other types

11.6.1 QName
The type QName represents an XML namespace-qualified name that
consists of a namespace name and a local part.
When appearing in XML documents, the lexical representation of
a QName consists of a prefix and a local part, separated by a colon, both
of which are NCNames. The prefix and colon are optional.
The lexical structure is mapped onto the QName value in the context
of namespace declarations, as described in Chapter 3. If the QName
value is prefixed, the namespace name is that which is in scope for that
prefix. If it is not prefixed, the default namespace declaration in scope
(if any) becomes the QName’s namespace.
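The mapping can be seen in a small instance fragment. This is a sketch
under assumptions: the refs and ref element names and both namespace
names are hypothetical, and ref is assumed to be declared with the
type xs:QName.

```xml
<refs xmlns="http://example.org/default"
      xmlns:pre="http://example.org/pre">
  <!-- Prefixed: namespace name is http://example.org/pre -->
  <ref>pre:myElement</ref>
  <!-- Unprefixed: takes the default namespace in scope -->
  <ref>myElement</ref>
</refs>
```

Both values have the local part myElement, but they are different
QName values because their namespace names differ.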
QName is not based on string like the other name-related types,
because it has this special two-part value with additional constraints
that cannot be expressed with XML Schema facets. Table 11–41 shows
some valid and invalid values of the QName type.
11.6 | Other types 247

Table 11–41 Values of the QName type

Valid values       Comment
pre:myElement      Valid assuming the prefix pre is mapped to a
                   namespace in scope.
myElement          Prefix and colon are optional.

Invalid values     Comment
:myElement         A QName must not start with a colon.
pre:3rdElement     The local part must not start with a number; it must
                   be a valid NCName.
(empty)            An empty value is not valid, unless xsi:nil is used.

The facets indicated in Table 11–42 can restrict QName and its
derived types.

Table 11–42 Facets applicable to QName type

length A            minExclusive              totalDigits
minLength A         minInclusive              fractionDigits
maxLength A         maxInclusive              pattern A
whiteSpace F        maxExclusive              enumeration A
                    [1.1] explicitTimezone    [1.1] assertion A

whiteSpace is collapse.

11.6.2 boolean
The type boolean represents logical yes/no values. The valid values
for boolean are true, false, 0, and 1. Values that are capitalized
(e.g., TRUE) or abbreviated (e.g., T) are not valid. Table 11–43 shows
some valid and invalid values of the boolean type.

Table 11–43 Values of the boolean type

Valid values       Comment
true
false
0                  false
1                  true

Invalid values     Comment
TRUE               Values are case sensitive.
T                  The word “true” must be spelled out.
(empty)            An empty value is not valid, unless xsi:nil is used.

The facets indicated in Table 11–44 can restrict boolean and its
derived types.

Table 11–44 Facets applicable to boolean type

length              minExclusive              totalDigits
minLength           minInclusive              fractionDigits
maxLength           maxInclusive              pattern A
whiteSpace F        maxExclusive              enumeration
                    [1.1] explicitTimezone    [1.1] assertion A

whiteSpace is collapse.

11.6.3 The binary types

The types hexBinary and base64Binary represent binary data. Their
lexical representation is a sequence of binary octets.
The type hexBinary uses hexadecimal encoding, where each binary
octet is a two-character hexadecimal number. Lowercase and uppercase
letters A through F are permitted. For example, 0FB8 and 0fb8 are
two equal hexBinary representations consisting of two octets. The
canonical representation of hexBinary uses only uppercase letters.
The type base64Binary, typically used for embedding images
and other binary content, uses base64 encoding, as described in
RFC 3548. The following rules apply to base64Binary values:

# The following characters are allowed: the letters A to Z (upper
and lower case), digits 0 through 9, the plus sign (+), the slash
(/), the equals sign (=), and XML whitespace characters.
# XML whitespace characters may appear anywhere in the value.
# The number of nonwhitespace characters must be divisible by 4.
# Equals signs may only appear at the end of the value, and there
may be zero, one, or two of them. If there are two equals signs,
they must be preceded by one of the following characters: AQgw.
If there is only one equals sign, it must be preceded by one of
the following characters: AEIMQUYcgkosw048. In either
case, there may be whitespace between the necessary characters
and the equals sign(s).

The canonical representation of base64Binary removes all
whitespace characters. For more information on base64 encoding, see
RFC 3548, The Base16, Base32, and Base64 Data Encodings.
Table 11–45 shows some valid and invalid values of the binary types.
The facets indicated in Table 11–46 can restrict hexBinary,
base64Binary, and their derived types.
The length facet for the binary types represents the number of
binary octets (groups of 8 bits each). For example, the length of the
hexBinary value 0FB8 is equal to 2. Since base64 characters represent
6 bits each, the length of the base64Binary value 0FB8 is equal to 3.
Whitespace and equals signs are ignored when calculating the length
of a base64Binary value.
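To make the octet-based counting concrete, the following sketch
restricts hexBinary to exactly two octets. The type name TwoOctetType
is hypothetical.

```xml
<xs:simpleType name="TwoOctetType">
  <xs:restriction base="xs:hexBinary">
    <!-- Two octets, which is four hexadecimal characters -->
    <xs:length value="2"/>
  </xs:restriction>
</xs:simpleType>
```

Under this type, 0FB8 would be valid because its four characters
encode two octets, while 0FB8A1 would be rejected because it encodes
three.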

Table 11–45 Values of the binary types

Valid values       Comment
0FB8
0fb8               In hexBinary, the equivalent of 0FB8; in
                   base64Binary represents a different value.
0 FB8 0F+9         base64Binary only; whitespace is allowed anywhere
                   in the value (not valid for hexBinary).
0F+40A==           base64Binary only; equals signs are used for padding
                   (not valid for hexBinary).
(empty)            An empty value is valid.

Invalid values     Comment
FB8                An odd number of characters is not valid; characters
                   appear in pairs (in hexBinary) or groups of four (in
                   base64Binary).
==0F               Equals signs may only appear at the end in
                   base64Binary, and not at all in hexBinary.

Table 11–46 Facets applicable to binary types

length A            minExclusive              totalDigits
minLength A         minInclusive              fractionDigits
maxLength A         maxInclusive              pattern A
whiteSpace F        maxExclusive              enumeration A
                    [1.1] explicitTimezone    [1.1] assertion A

whiteSpace is collapse.

11.6.4 anyURI
The type anyURI represents a Uniform Resource Identifier (URI)
reference. URIs are used to identify resources, and they may be absolute
or relative. Absolute URIs provide the entire context for locating a
resource, such as http://datypic.com/prod.html. Relative URIs
are specified as the difference from a base URI, for example
../prod.html. It is also possible to specify a fragment identifier using
the # character, for example ../prod.html#shirt.
The three previous examples happen to be HTTP URLs (Uniform
Resource Locators), but URIs also encompass URLs of other schemes
(e.g., FTP, gopher, telnet), as well as URNs (Uniform Resource Names).
URIs are not required to be dereferenceable; that is, it is not necessary
for there to be a web page at http://datypic.com/prod.html in
order for this to be a valid URI.
URIs require that some characters be escaped as the hexadecimal
values of the bytes of their UTF-8 encoding, each byte preceded by
the % character. This includes non-ASCII characters and some ASCII
characters including control characters, space, and certain punctuation
characters. For example, ../édition.html must be represented instead as
../%C3%A9dition.html, with the é escaped as %C3%A9. However,
the anyURI type will accept these characters either escaped or
unescaped. With the exception of the characters % and #, it will assume
that unescaped characters are intended to be escaped when used in an
actual URI, although the schema processor will do nothing to alter
them. It is valid for an anyURI value to contain a space, but this practice
is strongly discouraged. Spaces should instead be escaped using %20.
For more information on URIs, see RFC 2396, Uniform Resource
Identifiers (URI): Generic Syntax.
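The following fragment sketches how these rules play out for an
element declared with the type anyURI. The element name link and the
file name in the last value are hypothetical.

```xml
<!-- Schema: a hypothetical element of type anyURI -->
<xs:element name="link" type="xs:anyURI"/>

<!-- Instance values, all accepted by the anyURI type -->
<link>../%C3%A9dition.html</link>  <!-- escaped form -->
<link>../édition.html</link>       <!-- unescaped form, also accepted -->
<link>../my%20file.html</link>     <!-- space escaped as %20 -->
```

The schema processor treats the first two values as written; it does
not convert one form into the other.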
Version 1.1 expands the definition of anyURI to include IRIs, or
Internationalized Resource Identifiers. Compared to URIs, IRIs allow
a much broader range of characters without requiring them to be
escaped. Since the anyURI type does not require escaping anyway, this
has little practical impact on your schemas. For more information about
IRIs, see RFC 3987, Internationalized Resource Identifiers (IRIs).
Note that when relative URI references such as ../prod are used
as values of anyURI, no attempt is made by the schema processor to
determine or keep track of the base URI to which they may be applied.
For example, it will not attempt to resolve the value relative to the URL
of the containing document, or any xml:base attributes that may
appear in it.
Table 11–47 shows some examples of valid and invalid anyURI
values. The schema processor is not required to parse the contents of
an anyURI value to determine whether it is valid according to any
particular URI scheme. Since the bare minimum rules for valid URI
references are fairly generic, the schema processor will accept most
character strings, including an empty value. The only values that are
not accepted are ones that make inappropriate use of reserved characters,
such as those containing multiple # characters or % characters not
followed by two hexadecimal digits.

Table 11–47 Values of the anyURI type

Valid values                          Comment
http://datypic.com                    Absolute URI (also a URL).
mailto:[email protected]              Absolute URI.
../%C3%A9dition.html                  Relative URI containing escaped
                                      non-ASCII character.
../édition.html                       Relative URI containing unescaped
                                      non-ASCII character.
http://datypic.com/prod.html#shirt    URI with fragment identifier.
../prod.html#shirt                    Relative URI with fragment
                                      identifier.
urn:example:org                       URN.
(empty)                               An empty value is allowed.

Invalid values                        Comment
http://datypic.com#frag1#frag2        Too many # characters.
http://datypic.com#f% rag             % character followed by something
                                      other than two hexadecimal digits.
11.7 | Comparing typed values 253

The facets indicated in Table 11–48 can restrict anyURI and its
derived types.

Table 11–48 Facets applicable to anyURI type

length A            minExclusive              totalDigits
minLength A         minInclusive              fractionDigits
maxLength A         maxInclusive              pattern A
whiteSpace F        maxExclusive              enumeration A
                    [1.1] explicitTimezone    [1.1] assertion A

whiteSpace is collapse.

11.7 Comparing typed values

When a schema processor is comparing two values, it does more than
compare lexical values as if they were strings. It takes into account the
types of the values. This comes into play during validation of an
instance in several places:

# Validating fixed values
# Validating enumerated values
# Validating values against bounds facets
# Determining uniqueness of identity constraint fields
# Validating key references
# Comparisons in assertions

This is also important to consider when using schema-aware
languages, such as XSLT 2.0 and XQuery, which allow a processor to use
type information from the schema when comparing values.
One of the factors used in determining the equality of two values is
the relationship of their types in the derivation hierarchy. Types that
are related to each other by restriction, list, or union can have values
that are equal. For example, the value 2 of type integer and the
value 2 of type positiveInteger are considered equal, since
positiveInteger is derived from integer. Types that are not related
in the hierarchy can never have values that are equal. This means that
an integer value will never equal a string value, even if they are
both 2. This is true of both the built-in and user-derived types.
Example 11–9 illustrates this point.¹

Example 11–9. Equality based on type definition hierarchy

<integer>2</integer> does not equal <string>2</string>
<integer>2</integer> equals <positiveInteger>2</positiveInteger>
<string>abc</string> equals <NCName>abc</NCName>
<string>abc</string> does not equal <QName>abc</QName>
<IDREFS>abc</IDREFS> equals <IDREF>abc</IDREF>

Some of the built-in types have multiple lexical representations that
are equivalent. For example, an integer may be represented as 2, 02,
+2, or +00002. These values are all considered equal if they have the
type integer, because they all represent the same canonical value.
However, these same lexical values are unequal if they have the type
string. Example 11–10 illustrates this point.
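Fixed values are one place where this difference shows up during
validation. The following sketch declares two elements whose names,
count and code, are hypothetical; each has the fixed value 2, but with
a different type.

```xml
<!-- Compared as integers: <count>02</count> and <count>+2</count> are valid -->
<xs:element name="count" type="xs:integer" fixed="2"/>

<!-- Compared as strings: <code>02</code> is not valid, only <code>2</code> is -->
<xs:element name="code" type="xs:string" fixed="2"/>
```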
Another factor to take into account is whitespace normalization.
Whitespace is normalized before any validation takes place. Therefore,
it plays a role in determining whether two values are equal. For example,
the string type has a whiteSpace facet value of preserve, while
the token type’s is collapse. The value “ a ” that has the type string
will not equal “ a ” that has the type token, because the leading and
trailing spaces will be stripped for the token value. Example 11–11
illustrates this point.

1. Assume for this section that there are element declarations with names
that are the same as their type names. For example, <xs:element
name="integer" type="xs:integer"/>.

Example 11–10. Equality based on equivalent lexical representations

<integer>2</integer> equals <integer>02</integer>
<integer>2</integer> equals <positiveInteger>02</positiveInteger>
<string>2</string> does not equal <string>02</string>
<boolean>true</boolean> equals <boolean>1</boolean>
<hexBinary>0fb8</hexBinary> equals <hexBinary>0FB8</hexBinary>
<time>13:20:00-05:00</time> equals <time>12:20:00-06:00</time>

Example 11–11. Equality based on whitespace normalization

<string> a </string> does not equal <token> a </token>
<string>a</string> equals <token> a </token>
<token>a</token> equals <token> a </token>
Chapter 12
Complex types

Complex types are used to define the content model and attributes
of elements. This chapter introduces complex types. It covers the four
content types (simple, element-only, mixed, and empty) and the use of
element declarations, model groups, attribute declarations, and
wildcards to define complex types.

12.1 What are complex types?

Elements that have complex types have child elements or attributes.
They may also have character content. Example 12–1 shows the
elements size, product, letter, and color that have complex
types. They have the four different content types that are described in
this chapter (simple, element-only, mixed, and empty, respectively).
Attributes can never have complex types; they always have simple
types. This makes sense, because attributes cannot themselves have
children or attributes.

258 Chapter 12 | Complex types

Example 12–1. Elements with complex types

<size system="US-DRESS">10</size>

<product>
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
</product>

<letter>Dear <custName>Priscilla Walmsley</custName>...</letter>

<color value="blue"/>

12.2 Defining complex types

12.2.1 Named complex types
Complex types may be either named or anonymous. Named types can
be used by multiple element declarations. They are always
defined globally (i.e., their parent is always schema¹) and are required
to have a name that is unique among the types (both simple and
complex) in the schema. The syntax to define a named complex type is
shown in Table 12–1.
The name of a complex type must be an XML non-colonized name,
which means that it must start with a letter or underscore, and may
only contain letters, digits, underscores, hyphens, and periods. You
cannot include a namespace prefix when defining the type; it takes its
namespace from the target namespace of the schema document. All of
the examples of types in this book have the word “Type” at the end
of their names, to clearly distinguish them from element names.
However, this is a convention and not a requirement. You can even
have a type definition and an element declaration using the same name,
but this is not recommended because it can be confusing.

1. Except in the case of a redefine or override.

12.2 | Defining complex types 259

Table 12–1 XSD Syntax: named complex type definition

Name
complexType

Parents
schema, redefine, [1.1] override

Attribute name          Type                  Description
id                      ID                    Unique ID.
name                    NCName                Complex type name.
mixed                   boolean: false        Whether the complex type
                                              allows mixed content, see
                                              Section 12.3.3.
abstract                boolean: false        Whether the type can be
                                              used in an instance, see
                                              Section 13.7.4.
block                   "#all" | list of      Whether to block type
                        ("extension" |        substitution in the instance,
                        "restriction")        see Section 13.7.2; defaults
                                              to blockDefault of schema.
final                   "#all" | list of      Whether other types can be
                        ("extension" |        derived from this one, see
                        "restriction")        Section 13.7.1; defaults to
                                              finalDefault of schema.
[1.1] defaultAttri-     boolean: true         Whether the default attribute
butesApply                                    group applies to this complex
                                              type, see Section 15.3.3.

Content
annotation?, (simpleContent | complexContent | ([1.1] openContent?,
(group | all | choice | sequence)?, ((attribute |
attributeGroup)*, anyAttribute?), [1.1] assert*))

Example 12–2 shows the definition of the named complex type
ProductType, along with an element declaration that references it.

Example 12–2. Named complex type

<xs:element name="product" type="ProductType"/>
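A ProductType definition that this declaration could reference is
sketched below. Its content model is an assumption, chosen to mirror
the number, name, and size children used in Example 12–3.

```xml
<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="size" type="SizeType"/>
  </xs:sequence>
</xs:complexType>
```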

12.2.2 Anonymous complex types

Anonymous complex types, on the other hand, must not have names.
They are always defined entirely within an element declaration, and
may only be used once, by that declaration. Defining a type
anonymously prevents it from ever being restricted, extended, redefined, or
overridden. The syntax to define an anonymous complex type is shown
in Table 12–2.
Example 12–3 shows the definition of an anonymous complex type
within an element declaration.

Example 12–3. Anonymous complex type

<xs:element name="product">
<xs:complexType>
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="size" type="SizeType"/>
</xs:sequence>
</xs:complexType>
</xs:element>

Table 12–2 XSD Syntax: anonymous complex type definition

Name
complexType

Parents
element, [1.1] alternative

Attribute name          Type                  Description
id                      ID                    Unique ID.
mixed                   boolean: false        Whether the complex type
                                              allows mixed content.
[1.1] defaultAttri-     boolean: true         Whether the default attribute
butesApply                                    group applies to this complex
                                              type, see Section 15.3.3.

Content
annotation?, (simpleContent | complexContent | ([1.1] openContent?,
(group | all | choice | sequence)?, ((attribute |
attributeGroup)*, anyAttribute?), [1.1] assert*))

The question of whether to use named or anonymous types is covered
in Section 8.2.3 on p. 133.

12.2.3 Complex type alternatives

There are four different possible structures for the children of
complexType elements, representing four different methods of
creating complex types:

# A single complexContent child, which is used to derive a
complex type from another complex type. It is covered in detail
in the next chapter, Chapter 13.
# A single simpleContent child, which is used to derive a
complex type from a simple type. This is covered briefly in
Section 12.3.1 of this chapter, and in more detail in the next
chapter.
# A group (group, all, choice, or sequence) and/or attribute
declarations. This is used to define a complex type without
deriving it from any particular type. We will cover this method
in this chapter.
# No content at all, in which case the type allows no attributes
and no content.
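The first and last of these alternatives are not illustrated elsewhere
in this chapter, so the following sketch shows their general shape.
The type names ShirtType and MarkerType and the sleeve element are
hypothetical; ProductType is assumed to be a complex type defined
elsewhere in the schema.

```xml
<!-- complexContent child: derives from another complex type -->
<xs:complexType name="ShirtType">
  <xs:complexContent>
    <xs:extension base="ProductType">
      <xs:sequence>
        <xs:element name="sleeve" type="xs:string"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

<!-- No children at all: allows no attributes and no content -->
<xs:complexType name="MarkerType"/>
```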

The various declarations that make up the content of a complex
type are known collectively as particles. Particles include local element
declarations, element references, model groups (all, choice, or
sequence), named model group references, and element wildcards.
All of these kinds of particles are described in detail in this chapter.

12.3 Content types

The contents of an element are the character data and child elements
that are between its tags. The order and structure of the children allowed
by a complex type are known as its content model. There are four types
of content for complex types: simple, element-only, mixed, and empty.
The content type is independent of attributes; all of these content types
allow attributes. Figure 12–1 shows the decision tree needed to
determine the appropriate content type. This section explains the four
content types and provides an example of how each is represented in a
schema.

12.3.1 Simple content

Simple content allows character data only, with no children.
Example 12–4 shows the element size that has character data (the value
10) but no child elements. Generally, the only thing that distinguishes
a simple type from a complex type with simple content is that the latter
may have attributes. If a size element could never have a system
attribute, you could just give size a simple type, as described in
Chapter 8.

Figure 12–1 Content type decision tree

Example 12–5 shows a complex type definition that might be
assigned to the size element that appears in Example 12–4. It defines
SizeType whose character data content conforms to the simple type
integer. It extends integer to add the attribute system.

Example 12–4. Instance element with simple content
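A minimal sketch of such an instance, assuming the same size element
that appears in Example 12–6:

```xml
<!-- Character data (10) plus an attribute, but no child elements -->
<size system="US-DRESS">10</size>
```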

Example 12–5. Complex type with simple content

<xs:complexType name="SizeType">
<xs:simpleContent>
<xs:extension base="xs:integer">
<xs:attribute name="system" type="xs:token"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>

12.3.2 Element-only content

Element-only content allows only children, with no character data
content. Example 12–6 shows an element product that has element-
only content. It has four children: number, name, size, and color.

Example 12–6. Instance element with element-only content

<product>
<number>557</number>
<name>Short-Sleeved Linen Blouse</name>
<size system="US-DRESS">10</size>
<color value="blue"/>
</product>

Example 12–7 shows a complex type definition that might be used
by the product element declaration.

Example 12–7. Complex type with element-only content
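A sketch of such a definition, assuming local element declarations
for the four children shown in Example 12–6 and the SizeType and
ColorType definitions that appear elsewhere in this section:

```xml
<xs:complexType name="ProductType">
  <!-- Only child elements are allowed; mixed defaults to false -->
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="size" type="SizeType"/>
    <xs:element name="color" type="ColorType"/>
  </xs:sequence>
</xs:complexType>
```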

12.3.3 Mixed content

Mixed content allows character data as well as child elements. This is
most often used for freeform text such as letters and documents.
Example 12–8 shows an element desc with mixed content. Note that there
is character data directly contained in the desc element, as well as
children i, b, and u.

Example 12–8. Instance element with mixed content

<desc>This is our <i>best-selling</i> shirt.
<b>Note: </b> runs <u>large</u>.</desc>

Example 12–9 shows a type definition that describes the desc
element shown in Example 12–8. To indicate that character data
content is permitted, the complexType element in Example 12–9 has an
attribute mixed that is set to true. The default value for mixed is
false, meaning that character data content is not permitted.

Example 12–9. Complex type with mixed content

<xs:element name="desc" type="DescType"/>
<xs:complexType name="DescType" mixed="true">
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element name="i" type="xs:string"/>
<xs:element name="b" type="xs:string"/>
<xs:element name="u" type="xs:string"/>

</xs:choice>
</xs:complexType>

It is important to note that the character data that is directly
contained in the desc element (e.g., “This is our ”, “shirt”, etc.) is
not assigned a type. It is completely unrestricted in terms of its type
and where (and whether) it appears. Therefore, you should not use
mixed content types for data that you wish to constrain in any way.

12.3.4 Empty content

Empty content allows neither character data nor child elements.
Elements with empty content often have values in attributes. In some
cases, they may not even have attributes; their presence alone is
meaningful. For example, a <br/> element in XHTML indicates a
line break, without providing any data other than its presence.
Example 12–10 shows an element color with empty content.

Example 12–10. Instance element with empty content
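A minimal sketch of such an instance, assuming the same color element
that appears in Example 12–6:

```xml
<!-- An attribute value only: no character data and no children -->
<color value="blue"/>
```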

Example 12–11 shows a type definition that might be assigned to
the color element in Example 12–10. Note that there is no special
attribute in the complex type definition to indicate that the content
is empty. The fact that only an attribute, with no content model, is
specified for the complex type is enough to indicate this.

Example 12–11. Complex type with empty content

<xs:complexType name="ColorType">
<xs:attribute name="value" type="ColorValueType"/>
</xs:complexType>

12.4 Using element declarations

Element declarations can be included in complex-type content models
in three ways: as local element declarations, as references to global
element declarations, and as wildcards.

12.4.1 Local element declarations

Complex types can contain local element declarations. Such an element
declaration specifies a name, a type, and other properties within the
complex type. The scope of that element declaration is limited to
the complex type within which it appears. Local element declarations
are described in detail in Section 6.1.2 on p. 93. All of the prior
examples in this chapter have local element declarations, as evidenced by
the name attributes of the element elements.
12.4 | Using element declarations 267

12.4.2 Element references

Complex types can also contain references to global element
declarations. Global element declarations themselves are covered in detail in
Section 6.1.1 on p. 89. The syntax to reference a global element
declaration is shown in Table 12–3. An element element is used to make
the reference, though it uses a ref attribute instead of a name attribute.

Table 12–3 XSD Syntax: element reference

Name
element

Parents
all, choice, sequence

Attribute name  Type                                  Description
id              ID                                    Unique ID.
ref             QName                                 Name of the global element declaration being referenced.
minOccurs       nonNegativeInteger : 1                Minimum number of element occurrences.
maxOccurs       nonNegativeInteger | "unbounded" : 1  Maximum number of element occurrences.

Content
annotation?

Example 12–12 shows four global element declarations, with the
ProductType definition referencing them through the ref attribute.
Note that the type attribute appears in the global element declaration,
while the minOccurs attribute is in the element reference. Occurrence
constraints (minOccurs and maxOccurs) may only appear within
complex type definitions, that is, in element references or local element
declarations but not in global element declarations.
For a detailed discussion of local or global element declarations see
Section 6.1.3 on p. 95.

Example 12–12. Element references

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="number" type="xs:integer"/>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="size" type="SizeType"/>
  <xs:element name="color" type="ColorType"/>

  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element ref="number"/>
      <xs:element ref="name"/>
      <xs:element ref="size" minOccurs="0"/>
      <xs:element ref="color" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>

12.4.3 Duplication of element names

If two element declarations or references with the same name appear
anywhere in the same complex type, they must have the same type.
Example 12–13 shows a complex type deﬁnition that includes two
declarations for name. This content model represents “either number
or name or both.” It is valid because both name declarations refer to
the exact same type, string.
Example 12–14, on the other hand, is illegal because the two
declarations of name refer to different types.

Example 12–13. Legal duplication of element names

<xs:complexType name="ProductType">
  <xs:choice>
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string" minOccurs="0"/>
    </xs:sequence>
    <xs:element name="name" type="xs:string"/>
  </xs:choice>
</xs:complexType>

Example 12–14. Illegal duplication of element names
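A definition of this kind can be sketched by changing the type of one of the name declarations in Example 12–13. The choice of xs:token as the conflicting type is an assumption made for illustration:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Illegal: the two declarations of name refer to different types. -->
  <xs:complexType name="ProductType">
    <xs:choice>
      <xs:sequence>
        <xs:element name="number" type="xs:integer"/>
        <xs:element name="name" type="xs:string" minOccurs="0"/>
      </xs:sequence>
      <xs:element name="name" type="xs:token"/>
    </xs:choice>
  </xs:complexType>
</xs:schema>
```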

Anonymous types are never considered equal, even if they have
identical content models. If either of the name declarations used an
anonymous type, it would automatically be illegal.
It is the qualified name that is relevant here. If one of the name
element names had been qualified, or both had been qualified with
different namespace names, Example 12–14 would have been legal. This
applies regardless of whether the element declarations are global or local.
It is only the type name that must be consistent. It is legal for the
two name element declarations to specify different default values,
annotations, or other properties.

12.5 Using model groups

Model groups allow you to group child element declarations or refer-
ences together to construct more meaningful content models. There
are three kinds of model groups: all groups, choice groups, and
sequence groups.
Every complex type that allows child elements has exactly one
top-level model group. This model group contains the element
declarations or references, or other model groups, that make up the
content model. Element declarations are never directly contained in the
complexType element.

12.5.1 sequence groups

A sequence group is used to indicate that elements should appear in
a specified order. Most of the previous examples specify a sequence
group using a sequence element, whose syntax is shown in Table 12–4.

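Table 12–4 XSD Syntax: sequence group

Name
sequence

Parents
complexType, restriction, extension, group, choice, sequence

Attribute name  Type                                  Description
id              ID                                    Unique ID.
minOccurs       nonNegativeInteger : 1                Minimum number of times the group may appear.
maxOccurs       nonNegativeInteger | "unbounded" : 1  Maximum number of times the group may appear.

Content
annotation?, (element | group | choice | sequence | any)*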

Example 12–15 shows a complex type definition that contains a
sequence group. All of the children (number, name, size, and
color), if they appear, must appear in that order. The fact that the
minOccurs attribute of the size and color element declarations is
set to 0 means that they are optional.

Example 12–15. A sequence group
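A definition matching this description can be sketched as follows. SizeType and ColorType are assumed to be defined elsewhere in the schema, as in the other ProductType examples in this chapter:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- SizeType and ColorType are assumed to be defined elsewhere. -->
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
      <!-- minOccurs="0" makes size and color optional. -->
      <xs:element name="size" type="SizeType" minOccurs="0"/>
      <xs:element name="color" type="ColorType" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```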

Example 12–16 shows some valid instances of ProductType. They
are valid because the elements that are required (i.e., do not have their
minOccurs set to 0) do appear and the elements that appear are in the
correct order.

Example 12–16. Valid instances of ProductType

<product>
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
  <size system="US-DRESS">10</size>
  <color value="blue"/>
</product>

<product>
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
</product>

<product>
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
  <color value="blue"/>
</product>

Example 12–17 shows an invalid instance of ProductType. It is
invalid because the elements appear in the wrong order.

Example 12–17. Invalid instance of ProductType

<product>
  <number>557</number>
  <size system="US-DRESS">10</size>
  <name>Short-Sleeved Linen Blouse</name>
  <color value="blue"/>
</product>

12.5.1.1 Design hint: Should I care about the order of elements?

You may not be concerned about the order in which an element’s
children appear. However, some constraints can only be expressed if you
enforce an order. For example, in version 1.0, it is not possible to say that
a product may have one number, one name, and up to three color
children, in any order. You could use an all group if you only want
to allow each element once, or you could use a repeating choice group if
you do not mind there being more than one number or name. You could
also use a choice group containing several sequence groups to enumerate
all the possible orders of the child elements, but this is cumbersome,
and the number of orderings grows rapidly as more children are added.
To ensure that there is exactly one number and one name, while
allowing more than one color, the best approach is to enforce the order
and use a sequence group, as shown in Example 12–18.
Version 1.1 relaxes some constraints on all groups, making it
possible to express the above constraint without forcing a particular order.
However, sequence groups still have a slight advantage of clarity and
ease of processing over all groups.

Example 12–18. Enforcing order for better constraints

<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="color" type="ColorType" maxOccurs="3"/>
  </xs:sequence>
</xs:complexType>
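With a version 1.1 processor, a similar constraint can be expressed without fixing the order, because maxOccurs may exceed 1 inside an all group. The following is a sketch under that assumption; ColorType is assumed to be defined elsewhere, and the occurrence values on color are chosen to mirror the "up to three color children" wording:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Requires XSD 1.1: in 1.0, maxOccurs inside an all group must be 1. -->
  <xs:complexType name="ProductType">
    <xs:all>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="color" type="ColorType" minOccurs="0" maxOccurs="3"/>
    </xs:all>
  </xs:complexType>
</xs:schema>
```

A version 1.0 processor would be expected to reject this definition, so the sequence-based form remains the portable choice.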

12.5.2 choice groups

A choice group is used to indicate that exactly one of the declared
elements must appear. This is accomplished using a choice element,
whose syntax is shown in Table 12–5.

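Table 12–5 XSD Syntax: choice group

Name
choice

Parents
complexType, restriction, extension, group, choice, sequence

Attribute name  Type                                  Description
id              ID                                    Unique ID.
minOccurs       nonNegativeInteger : 1                Minimum number of times the group may appear.
maxOccurs       nonNegativeInteger | "unbounded" : 1  Maximum number of times the group may appear.

Content
annotation?, (element | group | choice | sequence | any)*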

Example 12–19 shows a choice group that specifies that any one
of the elements (shirt, hat, or umbrella) must appear.

Example 12–19. A choice group

<xs:complexType name="ItemsType">
  <xs:choice>
    <xs:element name="shirt" type="ShirtType"/>
    <xs:element name="hat" type="HatType"/>
    <xs:element name="umbrella" type="UmbrellaType"/>
  </xs:choice>
</xs:complexType>

Example 12–20 shows some valid instances of ItemsType. They
are valid because each contains exactly one element declared in the
choice group. If more than one element appeared, or no elements at
all appeared, the instance would be invalid.

Example 12–20. Valid instances of ItemsType
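One such instance might look like the sketch below. The element name items follows its use later in this section; the empty shirt element is an assumption, since the content allowed by ShirtType is not shown here. An instance containing a single hat or a single umbrella instead would be equally valid.

```xml
<items>
  <!-- Exactly one child from the choice group. -->
  <shirt/>
</items>
```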

A common use case for choice groups is to allow any number of
children to appear in any order. If, in our example above, we allowed
the choice group to repeat itself by setting its maxOccurs attribute
to unbounded, as shown in Example 12–21, that would entirely
change the meaning of the group. Now it allows any of the children,
any number of times, in any order.
Example 12–22 shows a valid instance of the new definition of
ItemsType.

Example 12–21. A repeating choice group

<xs:complexType name="ItemsType">
  <xs:choice minOccurs="0" maxOccurs="unbounded">
    <xs:element name="shirt" type="ShirtType"/>
    <xs:element name="umbrella" type="UmbrellaType"/>
    <xs:element name="hat" type="HatType"/>
  </xs:choice>
</xs:complexType>

Example 12–22. Valid instance of ItemsType
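An instance that exercises the repeating choice group might look like the following sketch. The empty child elements are an assumption, since the content allowed by ShirtType, UmbrellaType, and HatType is not shown here:

```xml
<items>
  <!-- Any of the three children, any number of times, in any order. -->
  <shirt/>
  <umbrella/>
  <hat/>
  <shirt/>
</items>
```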

In Example 12–21, we also set the minOccurs attribute of the
choice group to 0. This means that items would also be valid if it
were completely empty.
Use of repeating choice groups is especially common in narrative
document-oriented XML, where child elements can be intermingled
with character data content and their number and position need to
be very flexible. Example 12–9 on p. 265 showed a typical
mixed-content complex type that uses a repeating choice group to allow
i, b, and u elements to appear any number of times, in any
order. Substitution groups, described in Chapter 16, are another
approach to defining such a structure.

12.5.3 Nesting of sequence and choice groups

In order to specify more advanced content models, sequence and
choice groups can be nested within each other as many levels deep as
necessary. Example 12–23 shows a slightly more complicated definition
of ProductType. With this definition, number must appear first,
then name, then any number of the properties (such as size or color)
of the product, in any order. Note that the choice group is inside the
sequence group, allowing you to combine the power of both kinds
of model groups.

Example 12–23. Multiple nested groups

<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="size" type="SizeType"/>
      <xs:element name="color" type="ColorType"/>
    </xs:choice>
  </xs:sequence>
</xs:complexType>
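An instance that conforms to this nested definition might look like the sketch below. It assumes that SizeType and ColorType carry the system and value attributes used in Example 12–16, and that "red" is an acceptable color value:

```xml
<product>
  <!-- Fixed order, enforced by the outer sequence group. -->
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
  <!-- Any order and number, allowed by the repeating choice group. -->
  <color value="blue"/>
  <size system="US-DRESS">10</size>
  <color value="red"/>
</product>
```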

12.5.4 all groups

An all group is used to indicate that elements can appear in any order.
This is accomplished by using an all element, whose syntax is shown
in Table 12–6.
Table 12–6 XSD Syntax: all group

Name
all

Parents
complexType, restriction, extension, group

Attribute name  Type           Description
id              ID             Unique ID.
minOccurs       "0" | "1" : 1  Whether the entire group is optional.
maxOccurs       "0" | "1" : 1  Indicates that the all group cannot repeat.

Content
annotation?, (element | [1.1] any | [1.1] group)*

An all group follows rules that are different from choice and
sequence groups:

# In version 1.0, it can only contain element declarations and
  references, not other groups or wildcards. In version 1.1, this
  constraint has been relaxed, and an all group can contain
  wildcards and group references, as long as those groups also
  use all.
# In version 1.0, for each element it contains, maxOccurs must
  be 1, and minOccurs may only be 0 or 1. In version 1.1, this
  constraint has been eliminated; both minOccurs and
  maxOccurs can be greater than 1, and maxOccurs can be
  unbounded.
# It cannot occur multiple times. For the all element itself,
  maxOccurs must be 1, and minOccurs may only be 0 or 1.
# It cannot appear in other model groups. An all group must
  be at the top level of the complex type.

Example 12–24 uses an all group to represent the same basic
structure for product elements, with no specific order for the children.

Example 12–24. An all group

<xs:complexType name="ProductType">
  <xs:all>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="size" type="SizeType" minOccurs="0"/>
    <xs:element name="color" type="ColorType" minOccurs="0"/>
  </xs:all>
</xs:complexType>

Example 12–25 shows some valid instances of ProductType, according
to the new definition. The name and number elements are still
required, but now they may appear in any order.
As mentioned above, an all group cannot appear inside another
model group. Example 12–26 is illegal because the all group is inside
a sequence group.

Example 12–25. Valid instances of ProductType using an all group

<product>
  <color value="blue"/>
  <size system="US-DRESS">10</size>
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
</product>

<product>
  <name>Short-Sleeved Linen Blouse</name>
  <number>557</number>
</product>

Example 12–26. An illegal all group

<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
    <xs:all>
      <xs:element name="size" type="SizeType" minOccurs="0"/>
      <xs:element name="color" type="ColorType" minOccurs="0"/>
    </xs:all>
  </xs:sequence>
</xs:complexType>

12.5.5 Named model group references

Named model groups may be referenced in complex type definitions,
in order to make use of predefined content model fragments. Named
model groups are described in detail in Section 15.2 on p. 386.
Example 12–27 shows a reference to a named model group in a complex
type definition.

Example 12–27. Complex type with a named model group reference

<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:group ref="DescriptionGroup"/>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
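For the reference to resolve, a named model group called DescriptionGroup must be declared at the top level of a schema. The following sketch shows one way the pieces could fit together; the description and comment elements inside the group are hypothetical and stand in for whatever the real group contains:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical contents; the actual group is described in Section 15.2. -->
  <xs:group name="DescriptionGroup">
    <xs:sequence>
      <xs:element name="description" type="xs:string"/>
      <xs:element name="comment" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:group>

  <xs:complexType name="ProductType">
    <xs:sequence>
      <!-- Expands to the group's content model at this position. -->
      <xs:group ref="DescriptionGroup"/>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```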

12.5.6 Deterministic content models

Like XML DTDs, XML Schema requires that content models be
deterministic. In XML Schema, this is known as the Unique Particle
Attribution (UPA) constraint. A schema processor, as it makes its way
through the children of an instance element, must be able to find only
one branch of the content model that is applicable, without having to
look ahead to the rest of the children. The classic example is
Example 12–28. It is intended to represent a choice among a, b, and
a followed by b.

Example 12–28. Illegal nondeterministic content model

<xs:complexType name="AOrBOrBothType">
  <xs:choice>
    <xs:element name="a" type="xs:string"/>
    <xs:element name="b" type="xs:string"/>
    <xs:sequence>
      <xs:element name="a" type="xs:string"/>
      <xs:element name="b" type="xs:string"/>
    </xs:sequence>
  </xs:choice>
</xs:complexType>

This content model is nondeterministic because the processor, if it
first encounters a, will not know whether it should validate it against
the first declaration of a or the second declaration of a without looking
ahead to see if there is also a child b. It may seem that it should not
matter which a declaration we are talking about; they both have the
same type. However, they may have other properties that are different,
such as default values, identity constraints, or annotations.
Example 12–29 represents the same desired content model in a de-
terministic way. There is only one declaration of a. There are two
declarations of b, but distinguishing between them does not require
looking ahead. If the processor encounters b as the first child, it knows
to use the second declaration of b. If it encounters b after a, it uses the
first declaration of b.
In this example, the conflict is between two element declarations
with the same name. However, there are also more subtle conflicts,
such as those between

# Two element wildcards whose valid values overlap
# An element declaration and a wildcard for which it is a valid
  replacement (see footnote 1)
# An element declaration and the head of its substitution group

Example 12–29. Deterministic content model

<xs:complexType name="AOrBOrBothType">
  <xs:choice>
    <xs:sequence>
      <xs:element name="a" type="xs:string"/>
      <xs:element name="b" type="xs:string" minOccurs="0"/>
    </xs:sequence>
    <xs:element name="b" type="xs:string"/>
  </xs:choice>
</xs:complexType>

1. This is a conflict in version 1.0 only. In version 1.1, the element declaration
has precedence over the wildcard.
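The conflict between an element declaration and a wildcard can be sketched as follows. This is an illustrative schema, not one of the chapter's examples, and the type name ExtensibleProductType is hypothetical. Because the schema has no target namespace, a number child could match either the optional declaration or the ##any wildcard, so a version 1.0 processor would be expected to reject the content model, while version 1.1 resolves the ambiguity in favor of the element declaration:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="ExtensibleProductType">
    <xs:sequence>
      <!-- Optional, so a first child named number matches both particles. -->
      <xs:element name="number" type="xs:integer" minOccurs="0"/>
      <xs:any namespace="##any" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

One way to stay deterministic under version 1.0 is to restrict the wildcard to namespaces that cannot contain the declared element, for example with namespace="##other" in a schema that has a target namespace.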

12.6 Using attribute declarations

As with elements, attributes can be included in complex types as local
declarations, as references to global declarations, or via attribute group
references.
Within complex type definitions, attributes must be specified after
the content model. The local attribute declarations, attribute references,
and attribute group references may appear in any order, intermixed
with each other. There is no significance to the ordering of attributes
in XML.
It is illegal to define a complex type that contains two attribute
declarations with the same qualified name. This is understandable, since
XML itself forbids an element from carrying two attributes with the same
qualified name. However, if two attribute declarations have the same
local name but different namespace names, they may both appear in
the same complex type.

12.6.1 Local attribute declarations

Complex type definitions can contain local attribute declarations. This
means that the attributes are declared (that is, given a name, a type,
and other properties) within the complex type. The scope of that at-
tribute declaration is limited to the complex type within which it
appears. Local attribute declarations are described in detail in
Section 7.2.2 on p. 117.
Example 12–30 shows adding a local attribute declaration to
ProductType. Note that the attribute declaration appears after the
sequence group that represents the content model.

Example 12–30. Local attribute declaration

<xs:complexType name="ProductType">
  <xs:sequence>
    <!--...-->
  </xs:sequence>
  <xs:attribute name="effDate" type="xs:date"
                default="1900-01-01"/>
</xs:complexType>

12.6.2 Attribute references

Complex type definitions can also contain references to global attribute
declarations. Global attribute declarations themselves are covered
in detail in Section 7.2.1 on p. 115. The syntax used to reference a
global attribute declaration is shown in Table 12–7.

Table 12–7 XSD Syntax: attribute reference

Name
attribute

Parents
complexType, restriction, extension, attributeGroup

Attribute name     Type                                               Description
id                 ID                                                 Unique ID.
ref                QName                                              Name of the attribute being referenced.
use                "optional" | "prohibited" | "required" : optional  Whether the attribute is required.
default            string                                             Default value for the attribute, see Section 7.5.1.
fixed              string                                             Fixed value for the attribute, see Section 7.5.2.
[1.1] inheritable  boolean                                            Whether the value can be inherited by descendant elements, see Section 7.6.

Content
annotation?

The use attribute may be used to indicate whether the attribute
is required or optional. The value prohibited is used only when
restricting complex types. If required is chosen, then any default or
fixed value in the global attribute declaration is ignored.
The default attribute may be used to add a default value, or to
override the default attribute in the global attribute declaration. This
is true for attribute references only; an element reference cannot override
the default value in a global element declaration.
The fixed attribute may be used to add a fixed value, but it cannot
override or remove a fixed value specified in the global attribute decla-
ration. Only one of default and fixed may appear; they are mutually
exclusive.
The inheritable attribute may be used to indicate that the at-
tribute is inheritable, as described in Section 7.6 on p. 126. If it is not
specified, it defaults to the inheritable value of the global attribute
declaration, which itself defaults to false.
Example 12–31 shows a complex type definition with a reference to
a global attribute declaration. Note that the type attribute is in the
global attribute declaration. In this case, the default attribute in
the attribute reference overrides the default attribute in the global
attribute declaration.
For a detailed discussion of local or global attribute declarations see
Section 7.2.3 on p. 119.

Example 12–31. Attribute reference

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:attribute name="effDate" type="xs:date"
                default="1900-01-01"/>

  <xs:complexType name="ProductType">
    <xs:sequence>
      <!--...-->
    </xs:sequence>
    <xs:attribute ref="effDate" default="2000-12-31"/>
  </xs:complexType>
</xs:schema>

12.6.3 Attribute group references

Attribute groups may be referenced in complex type definitions, in
order to make use of predefined groups of attributes. Attribute groups
are described in detail in Section 15.3 on p. 392. Example 12–32
shows a complex type definition that references the attribute group
IdentifierGroup.

Example 12–32. Complex type with attribute group reference

<xs:complexType name="ProductType">
  <xs:sequence>
    <!--...-->
  </xs:sequence>
  <xs:attributeGroup ref="IdentifierGroup"/>
  <xs:attribute name="effDate" type="xs:date"/>
</xs:complexType>

12.6.4 Default attributes

In version 1.1, it is possible to indicate that an attribute group is
the default attribute group by specifying its name in the
defaultAttributes attribute on the schema element. If such a
default attribute group is defined, its attributes will automatically
be allowed for every complex type in the schema document,
unless you specifically disallow them. To do so, add the attribute
defaultAttributesApply="false" to your complex type. Default
attribute groups are described further in Section 15.3.3 on p. 399.
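The mechanism can be sketched as follows for a version 1.1 processor. The attributes inside IdentifierGroup and the NoteType definition are hypothetical and serve only to show one type that picks up the default group and one that opts out:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           defaultAttributes="IdentifierGroup">

  <!-- Hypothetical attribute group used as the schema-wide default (XSD 1.1). -->
  <xs:attributeGroup name="IdentifierGroup">
    <xs:attribute name="id" type="xs:ID"/>
    <xs:attribute name="version" type="xs:decimal"/>
  </xs:attributeGroup>

  <!-- Allows id and version automatically, without an explicit reference. -->
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Opts out of the default attribute group. -->
  <xs:complexType name="NoteType" defaultAttributesApply="false">
    <xs:sequence>
      <xs:element name="text" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```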

12.7 Using wildcards

Wildcards allow for more flexibility in the content models and attributes
defined in a complex type. There are two kinds of wildcards: element
wildcards, which use the any element, and attribute wildcards, which
use the anyAttribute element.

12.7.1 Element wildcards

Element wildcards provide flexibility in which child elements may
appear. Element wildcards are represented by any elements, whose syntax
is shown in Table 12–8. The elements in an instance that match the
wildcard are referred to as replacement elements in this book.

Table 12–8 XSD Syntax: element wildcard

Name
any

Parents
choice, sequence, [1.1] all, [1.1] openContent, [1.1] defaultOpenContent

Attribute name      Type                                                  Description
id                  ID                                                    Unique ID.
minOccurs           nonNegativeInteger : 1                                Minimum number of replacement elements that may appear.
maxOccurs           nonNegativeInteger | "unbounded" : 1                  Maximum number of replacement elements that may appear.
namespace           "##any" | "##other" |                                 Which namespace(s) the replacement elements may be in.
                    list of (anyURI | "##targetNamespace" | "##local")
processContents     "lax" | "skip" | "strict" : strict                    How strictly to validate the replacement elements.
[1.1] notNamespace  list of (anyURI | "##targetNamespace" | "##local")    Which namespace(s) the replacement elements may not be in.
[1.1] notQName      list of (QName | "##defined" | "##definedSibling")    Which elements may not be used as replacement elements.

Content
annotation?

The minOccurs and maxOccurs attributes control the number of
replacement elements. This number represents how many total
replacement elements may appear, not how many of a particular type or
how many types. The number does not include child elements of the
replacement elements.
Example 12–33 shows a complex type definition for ProductType
that includes an element wildcard allowing any number of replacement
elements at the end of the content model.

Example 12–33. Using an element wildcard

<xs:element name="product" type="ProductType"/>
<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
    <xs:any minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
<xs:element name="color" type="xs:string"/>
<xs:element name="desc" type="xs:string"/>
<xs:element name="size" type="xs:integer"/>

Example 12–34 is a valid product instance. It has three replacement
elements: two color elements and a size element. Replacement elements
can appear in any number and in any order.

Example 12–34. Instance with replacement elements

<product>
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
  <color>blue</color>
  <size>12</size>
  <color>red</color>
</product>

12.7.1.1 Controlling the namespace of replacement elements

The namespace attribute allows you to specify what namespaces the
replacement elements may belong to. It may have the value ##any,
##other, or a list of values.
If it is ##any, the replacement elements can be in any namespace
whatsoever, or in no namespace. This is the default setting if neither
the namespace nor the notNamespace attribute has been specified.
If it is ##other, the replacement elements can be in any namespace
other than the target namespace of the schema document, but they
must be in a namespace. If the schema document has no target
namespace, the replacement elements can have any namespace, but
they must have one.
Otherwise, the value of the namespace attribute can be a
whitespace-separated list of values that may include any or all of the
following items:

# ##targetNamespace to indicate that the replacement elements
  may be in the target namespace of the schema document
# ##local to indicate that the replacement elements may be in
  no namespace
# Specific namespace names for the replacement elements

The namespace constraint applies only to the replacement elements.
The children of each replacement element, if they exist, are then
validated according to the type of the replacement element.
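A list-valued namespace attribute can be sketched as follows. The target namespace http://example.org/prod is a hypothetical placeholder; the wildcard accepts replacement elements from that namespace, from no namespace, or from the XHTML namespace, and nothing else:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/prod"
           elementFormDefault="qualified">
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <!-- Whitespace-separated list: target namespace, no namespace, XHTML. -->
      <xs:any namespace="##targetNamespace ##local http://www.w3.org/1999/xhtml"
              processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```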

12.7.1.2 Controlling the strictness of validation

The processContents attribute controls how much validation takes
place on the replacement elements. It may have one of three values:

# If it is skip, the schema processor performs no validation
  whatsoever, and does not attempt to find a schema document
  associated with the wildcard’s namespace. The replacement
  elements must, however, be well-formed XML and must be in
  one of the namespaces allowed by the wildcard.
# If it is lax, the schema processor will validate replacement
  elements for which it can find declarations and raise errors if they
  are invalid. It will not, however, report errors on the elements
  for which it does not find declarations.
# If it is strict, the schema processor will attempt to find a
  schema document associated with the namespace, and validate
  all of the replacement elements. If it cannot find the schema
  document, or the elements are invalid, it will raise errors. This
  is the default value.

Suppose our product element can also contain an extended textual
description that may run several paragraphs. This description is going
to appear on the company’s website, and we want the text to be
formatted using XHTML. Example 12–35 shows an element wildcard that
will allow DescriptionType to contain any elements that are part of
the XHTML namespace.

Example 12–35. Complex type with element wildcard

<xs:complexType name="DescriptionType" mixed="true">
  <xs:sequence>
    <xs:any namespace="http://www.w3.org/1999/xhtml"
            minOccurs="0" maxOccurs="unbounded"
            processContents="skip"/>
  </xs:sequence>
</xs:complexType>

Example 12–36 shows a description element which has the type
DescriptionType.

Example 12–36. Instance with processContents of skip

<catalog xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <description>
    This shirt is the <xhtml:b>best-selling</xhtml:b> shirt in
    our catalog! <xhtml:br/> Note: runs large.
  </description>
</catalog>

Since the processContents attribute is set to skip, it is not
necessary to provide any information about where to find the schema to
validate the replacement elements. It is only necessary that the
elements in the instance have names that are qualified with the XHTML
namespace. In our example, we accomplish this by associating the
xhtml prefix with the XHTML namespace, and by prefixing all of
the XHTML element names.
If we had chosen strict for the value of processContents,
we would have to go further and tell the processor where to find the
XHTML schema. We could do this by importing the XHTML
namespace and schema into our schema, as described in Section 4.3.2
on p. 66.
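The strict variant might be sketched as follows. The schemaLocation value xhtml.xsd is a hypothetical path to a local copy of an XHTML schema, not a location given in this chapter:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical location of a schema for the XHTML namespace. -->
  <xs:import namespace="http://www.w3.org/1999/xhtml"
             schemaLocation="xhtml.xsd"/>

  <xs:complexType name="DescriptionType" mixed="true">
    <xs:sequence>
      <!-- strict: every replacement element must be declared and valid. -->
      <xs:any namespace="http://www.w3.org/1999/xhtml"
              minOccurs="0" maxOccurs="unbounded"
              processContents="strict"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

With this setting, an instance that uses an element not declared in the imported schema would be reported as an error rather than silently accepted.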

12.7.1.3 Negative wildcards

Version 1.1 provides two additional attributes for wildcards that
allow you to specify namespaces and names that are disallowed for
replacement elements.
The notNamespace attribute allows you to specify the namespaces
that the replacement elements may not belong to. It is a
whitespace-separated list of values that may include any or all of the
following items:

# ##targetNamespace to indicate that the replacement elements
  may not be in the target namespace of the schema document.
# ##local to indicate that the replacement elements must be in
  a namespace.
# Specific disallowed namespace names for the replacement
  elements.

The notNamespace and the namespace attributes on wildcards are
mutually exclusive. They cannot both be specified. If neither is specified,
the replacement elements can be in any namespace.
The notQName attribute allows you to disallow certain elements
from being replacement elements. It is a whitespace-separated list of
values that may include any or all of the following items:

# ##defined, to disallow replacement elements whose names
  match global element declarations in the schema.
# ##definedSibling, to disallow replacement elements whose
  names match declarations (local element declarations or element
  references) in the same complex type, i.e. that could be siblings
  of the replacement element.
# Specific names for the disallowed replacement elements, which
  may or may not actually be declared in the schema; if these
  names are in a namespace, they must be prefixed appropriately
  or use in-scope default namespace declarations to assign the
  namespace.

Example 12–37 shows the use of the notNamespace and notQName
attributes, which can be used together or individually.

Example 12–37. Negative wildcards

<xs:element name="product" type="ProductType"/>
<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
    <xs:any minOccurs="0" maxOccurs="unbounded"
            notNamespace="http://www.w3.org/1999/xhtml"
            notQName="##definedSibling desc size"
            processContents="lax"/>
  </xs:sequence>
</xs:complexType>
<xs:element name="color" type="xs:string"/>
<xs:element name="size" type="xs:integer"/>

Example 12–38 is a valid product instance. It has three replacement
elements: two color elements and a random element. Neither color
nor random is in the prohibited namespace or in the list of
prohibited names. The random element is allowed even though it is not
declared in the schema, because processContents is set to lax.

Example 12–38. Instance conforming to the schema with negative wildcards

<product>
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
  <color>blue</color>
  <random>not declared in the schema</random>
  <color>red</color>
</product>

Example 12–39 is a product instance where every replacement
element shown is invalid. A size element is disallowed from being a
replacement element because it is listed in notQName, as is desc,
even though there is no declaration for desc in the schema. Also
disallowed as replacement elements are number and name, since
##definedSibling is specified. Finally, there can be no replacement
elements from the XHTML namespace because that namespace is
listed in notNamespace.

Example 12–39. Invalid instance with disallowed namespaces and names

<product xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
  <size>12</size>
  <desc>Our best-selling shirt</desc>
  <number>12345</number>
  <xhtml:p>Our best-selling shirt</xhtml:p>
</product>
292 Chapter 12 | Complex types

12.7.2 Open content models

Some XML vocabularies need to be very ﬂexible about where they allow
extension elements. To this end, version 1.1 allows you to specify an
open content model, which is essentially an element wildcard that allows
replacement elements to appear interleaved with other elements, without
forcing the schema author to explicitly specify wildcards between every
element declaration in a content model. An open content model can
apply to a single complex type, or be declared as the default for complex
types in a schema document.

12.7.2.1 Open content in a complex type

An open content model for a single complex type is defined using an
openContent element, whose syntax is shown in Table 12–9.
The openContent element always appears before the content model.
It contains a standard any wildcard as defined in Section 12.7.1 on
p. 285. The one difference is that the wildcard inside the openContent
cannot have minOccurs and maxOccurs specified; it is implied that
any number of replacement elements can appear.

Table 12–9 XSD Syntax: open content

Name
openContent

Parent
complexType, extension, restriction

Attribute name   Type                       Description

id               ID                         Unique ID.
mode             "none" | "interleave" |    Where replacement elements
                 "suffix" : interleave      are allowed to appear.

Content
annotation?, any?

The openContent element has a mode attribute that indicates where
the replacement elements can appear.

# If it is interleave (the default), the replacement elements can
appear intermingled with the elements explicitly declared in the
content model.
# If it is suffix, the replacement elements can only appear at
the end of the content.
# If it is none, no any child appears within openContent and
the content model is not open (this is primarily used to override
a default open content model).

The complex type in Example 12–40 contains an openContent
element that allows any element from any other namespace to appear
interleaved in the instance.

Example 12–40. Deﬁning an interleaved open content model

<xs:element name="product" type="OpenProductType"/>
<xs:complexType name="OpenProductType">
<xs:openContent>
<xs:any namespace="##other" processContents="lax"/>
</xs:openContent>
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
</xs:sequence>
</xs:complexType>

Example 12–41 shows a valid instance, where the product element
contains elements from an extension namespace interleaved with the
declared number and name elements. The number and name elements
must still conform to the content model specified in the sequence
group.

Example 12–41. Instance of an interleaved open content model

<product xmlns:ext="http://datypic.com/extension">
<ext:something>an extension element</ext:something>
<number>557</number>
<ext:something_else>another extension element</ext:something_else>
<name>Short-Sleeved Linen Blouse</name>
<ext:something_else>another extension element</ext:something_else>
<ext:something>an extension element</ext:something>
</product>

To allow extension elements only at the end, simply add
mode="suffix" to the openContent element, as shown in Example 12–42.
This example also has different settings for the wildcard,
namely that the replacement elements are being strictly validated and
must not be in a namespace.

Example 12–42. Deﬁning a sufﬁx open content model

<xs:element name="product" type="OpenProductType"/>
<xs:complexType name="OpenProductType">
<xs:openContent mode="suffix">
<xs:any namespace="##local" processContents="strict"/>
</xs:openContent>
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
</xs:sequence>
</xs:complexType>
<xs:element name="something" type="xs:string"/>

Example 12–43 is a valid instance, with the replacement elements
now at the end. As with the previous example, multiple replacement
elements may appear.
This example may not appear to be any different from simply
including the wildcard at the end of the sequence group instead of in
openContent. However, it is subtly different in that extensions of
this type will expect the replacement elements to appear at the end
of the extended type, not right after the name element. This is described
further in Section 13.4.2.3 on p. 311.

Example 12–43. Instance of a sufﬁx open content model

<product>
<number>557</number>
<name>Short-Sleeved Linen Blouse</name>
<something>an extension element</something>
<something>an extension element</something>
</product>
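
For comparison, the alternative of placing the wildcard at the end of the sequence group might be sketched as follows. The type name ClosedSuffixProductType is hypothetical and chosen only for illustration; the wildcard settings mirror those of Example 12–42.

```xml
<!-- Hypothetical alternative to Example 12–42: the wildcard is an
     ordinary particle at the end of the sequence, not open content. -->
<xs:complexType name="ClosedSuffixProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
    <!-- Unlike a wildcard inside openContent, this one needs explicit
         occurrence constraints to allow zero or more replacements. -->
    <xs:any namespace="##local" processContents="strict"
            minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
```

The instance in Example 12–43 would be valid against either definition. The difference shows up only when the type is extended: with this sketch, the replacement elements stay right after name, ahead of any elements added by the extension.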

12.7.2.2 Default open content

It is also possible in version 1.1 to specify a default open content
model that can apply to any complex type in the schema that allows
children (that is, any one with element-only or mixed content). This
is accomplished using a defaultOpenContent element, whose syntax
is shown in Table 12–10.

Table 12–10 XSD Syntax: default open content

Name
defaultOpenContent

Parent
schema

Attribute name   Type                       Description

id               ID                         Unique ID.
appliesToEmpty   boolean : false            Whether the open content applies
                                            to empty content types.
mode             "interleave" | "suffix"    Where replacement elements
                 : interleave               are allowed to appear.

Content
annotation?, any

The defaultOpenContent element works the same way as the
openContent element, containing an element wildcard and specifying
a mode attribute to indicate where the replacement elements can appear.
However, since it applies to multiple complex types in a schema document,
it appears at the top level of the schema, after any includes,
imports, and overrides but before any component definitions.
Example 12–44 is a schema that contains a defaultOpenContent
and two complex types to which it applies.

Example 12–44. Deﬁning a default open content model

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:defaultOpenContent mode="suffix">
<xs:any namespace="##local"/>
</xs:defaultOpenContent>

<xs:element name="catalog" type="CatalogType"/>

<xs:complexType name="CatalogType">
<xs:sequence>
<xs:element name="product" type="ProductType"
maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>

<xs:complexType name="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
</xs:sequence>
</xs:complexType>

<xs:element name="something" type="xs:string"/>

</xs:schema>

A valid instance is shown in Example 12–45, where both the
catalog and product elements can contain replacement elements.
In this case, they must appear at the end, since the mode is set to
suffix.

Example 12–45. Instance using default open content model

<catalog>
<product>
<number>557</number>
<name>Short-Sleeved Linen Blouse</name>
<something>an extension element</something>
<something>an extension element</something>
</product>
<something>an extension element</something>
<something>an extension element</something>
</catalog>

If a default open content model is defined, it is possible to override
it in an individual complex type using the openContent element
with a mode="none" attribute. In Example 12–46, CatalogType
will not have open content because it overrides the default, but
ProductType will.

Example 12–46. Overriding a default open content model

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:defaultOpenContent mode="suffix">
<xs:any namespace="##local"/>
</xs:defaultOpenContent>

<xs:element name="catalog" type="CatalogType"/>

<xs:complexType name="CatalogType">
<xs:openContent mode="none"/>
<xs:sequence>
<xs:element name="product" type="ProductType"
maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>

<xs:complexType name="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:schema>

Note that the default open content model does not apply to complex
types with simple content, since they do not allow children. By default,
it does not apply to complex types with empty content, either.
However, you can use an appliesToEmpty="true" attribute on
defaultOpenContent to indicate that the default open content
model should apply to complex types with empty content.
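
A minimal sketch of this setting follows. The marker element and MarkerType are hypothetical names introduced only for illustration; MarkerType has empty content because it declares nothing but an attribute.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- appliesToEmpty="true" extends the default to empty content types -->
  <xs:defaultOpenContent mode="suffix" appliesToEmpty="true">
    <xs:any namespace="##local" processContents="lax"/>
  </xs:defaultOpenContent>

  <xs:element name="marker" type="MarkerType"/>

  <!-- Hypothetical type with empty content -->
  <xs:complexType name="MarkerType">
    <xs:attribute name="code" type="xs:token"/>
  </xs:complexType>

</xs:schema>
```

Under these assumptions, a marker element could contain replacement elements, for example <marker code="A"><note>text</note></marker>. Without appliesToEmpty="true", the marker element would have to be empty.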

12.7.3 Attribute wildcards

Attribute wildcards are used to allow flexibility as to what attributes
may appear on elements of a particular complex type. Attribute wildcards
are represented by anyAttribute elements, whose syntax is
shown in Table 12–11.
The namespace, processContents, notNamespace, and
notQName attributes for attribute wildcards work exactly the same as
for element wildcards described in Section 12.7.1 on p. 285. The only
difference between attribute wildcards and element wildcards is that
attribute wildcards cannot have minOccurs and maxOccurs specified.
If an attribute wildcard is present, it is assumed that there may be zero,
one, or many replacement attributes present.
Attribute wildcards in a complex type must appear after all of the
attribute declarations, attribute references, and attribute group references.
There can only be one attribute wildcard in each complex type
definition.
Example 12–47 shows the definition of a type that allows any number
of replacement attributes from any namespace other than the target
namespace of the schema document.

Example 12–47. Complex type with attribute wildcard

<xs:complexType name="ProductType">
<xs:sequence>

</xs:sequence>
<xs:anyAttribute namespace="##other" processContents="lax"/>
</xs:complexType>
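
To illustrate what such a wildcard admits, an instance might look like the following sketch. It assumes that a product element of type ProductType is declared, and it borrows the extension namespace used in Example 12–41; the attribute name rating is hypothetical.

```xml
<!-- ext:rating is in a namespace other than the target namespace,
     so it matches the ##other attribute wildcard -->
<product xmlns:ext="http://datypic.com/extension" ext:rating="5">
  <!-- child elements as required by the content model -->
</product>
```

An unqualified attribute such as rating="5" would not match this wildcard, because ##other also excludes attributes that are in no namespace. Since processContents is lax, ext:rating is validated only if a declaration for it can be found.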

Table 12–11 XSD Syntax: attribute wildcard

Name
anyAttribute

Parents
complexType, restriction, extension, attributeGroup

Attribute name     Type                        Description

id                 ID                          Unique ID.
namespace          "##any" | "##other" |       Which namespace(s) the
                   list of (anyURI |           replacement attributes
                   "##targetNamespace" |       may be in.
                   "##local")
processContents    "lax" | "skip" |            How strictly to validate
                   "strict" : strict           the replacement attributes.
1.1 notNamespace   list of (anyURI |           Which namespace(s) the
                   "##targetNamespace" |       replacement attributes
                   "##local")                  may not be in.
1.1 notQName       list of (QName |            Which attributes may not
                   "##defined")                be used as replacement
                                               attributes.

Content
annotation?
Chapter 13
Deriving complex types

In the previous chapter, we saw how to define new complex
types that are not specifically derived from another type.
This chapter covers the complexities of deriving complex
types from other types, both complex and simple.

13.1 Why derive types?

XML Schema allows you to derive a new complex type from an existing
simple or complex type. While it is always possible to make a copy of
an existing type and modify it to suit your needs, using type derivation
has a number of advantages:

# Subsetting. If you want to define a more restrictive subset of a
schema, the best way to do this is using restriction. Your schema
processor will validate that you have in fact defined a legal subset.
It also allows future modifications to the original types to
be reflected in your derived types automatically.
# Safe extensions. If you want to add to existing schema components,
XML Schema’s extension mechanism ensures that you
do that in such a way that an application can still handle the
original definition.
# Type substitution. Derived types can substitute for their ancestor
types in instances, which is a very flexible way to support
variations in content.
# Reuse. If several types share the same basic structure but have
minor differences, it makes sense to reuse the similar parts.
This makes maintenance easier and ensures consistency. Type
derivation is one way to reuse content model fragments and
attributes.
# Convenience in a type-aware language. Languages such as
XSLT 2.0 and XQuery are type-aware, which allows you to
define processes on base types that may be passed down
to their derived types. For example, XSLT 2.0 lets you apply a
specific template to “anything of type AddressType or any
type derived from it.”

13.2 Restriction and extension

Complex types are derived from other types either by restriction or
extension.

# Restriction, as the name suggests, restricts the valid contents of
a type. The values for the new type are a subset of those for the
base type. All values of the restricted type are also valid according
to the base type.
# Extension allows for adding children and/or attributes to a type.
Values of the base type are not necessarily valid for the extended
type, since required elements or attributes may be added.

It is not possible to restrict and extend a complex type at the same
time, but it is possible to do this in two steps, first extending a type,
and then restricting the extension, or vice versa. However, when doing
this, it is not legal to remove something in a restriction and then use
extension to add it back in an incompatible way; for example, you
cannot re-add an element declaration with a different type.
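
The two-step approach might be sketched as follows. The type names SizedProductType and RequiredSizeProductType are hypothetical, and ProductType is given an optional name child here purely so that the restriction has something to remove.

```xml
<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>

<!-- Step 1: extension adds an optional size child -->
<xs:complexType name="SizedProductType">
  <xs:complexContent>
    <xs:extension base="ProductType">
      <xs:sequence>
        <xs:element name="size" type="xs:integer" minOccurs="0"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

<!-- Step 2: restriction of the extended type makes size required
     and leaves out the optional name -->
<xs:complexType name="RequiredSizeProductType">
  <xs:complexContent>
    <xs:restriction base="SizedProductType">
      <xs:sequence>
        <xs:element name="number" type="xs:integer"/>
        <xs:element name="size" type="xs:integer"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
```

Every instance of RequiredSizeProductType is also valid for SizedProductType, which is what makes the second step a legal restriction.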

13.3 Simple content and complex content

A complex type always has either simple content or complex content.
Simple content means that it has only character data content, with no
children. Complex content encompasses the other three content types
(mixed, element-only, and empty) that were covered in Section 12.3
on p. 262. A complex type is derived from another type using either a
simpleContent element or a complexContent element.

13.3.1 simpleContent elements

A simpleContent element is used when deriving a complex type from
a simple type, or from another complex type with simple content. This
can be done to add or remove attribute declarations, or to further restrict
the simple type of the character content. If a complex type has
simple content, all types derived from it, directly or indirectly, must
also have simple content. It is impossible to switch from simple content
to complex content by deriving a type with child elements. Table 13–1
shows the syntax for a simpleContent element. It contains either an
extension or a restriction child element. These elements are
discussed in Sections 13.4.1 on p. 306 and 13.5.1 on p. 317,
respectively.

Table 13–1 XSD Syntax: simple content deﬁnition

Name
simpleContent

Parents
complexType

Attribute name Type Description

id ID Unique ID.
Content
annotation?, (extension | restriction)

13.3.2 complexContent elements

A complexContent element is used when deriving a complex type
from another complex type which itself has complex content. This includes
mixed, element-only, and empty content types. This can be
done to add or remove parts of the content model as well as attribute
declarations. Table 13–2 shows the syntax for a complexContent
element. It too must contain either an extension or a restriction,
but with definitions different from their counterparts in
simpleContent. These elements are discussed in Sections 13.4.2 on
p. 307 and 13.5.2 on p. 318, respectively.

Table 13–2 XSD Syntax: complex content definition

Name
complexContent

Parents
complexType

Attribute name   Type      Description

id               ID        Unique ID.
mixed            boolean   Whether the complex type allows mixed
                           content; defaults to mixed value of
                           complexType.

Content
annotation?, (extension | restriction)

If complexContent has a mixed attribute, that value is used. If it
has no mixed attribute, the mixed attribute of complexType is used.
If neither element has a mixed attribute, the default for mixed is false.
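
As an illustration, the mixed attribute can be placed on complexContent instead of on complexType. This is only a sketch; NoteType, ExtendedNoteType, and their child elements are hypothetical names.

```xml
<xs:complexType name="NoteType" mixed="true">
  <xs:sequence>
    <xs:element name="emphasis" type="xs:string" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>

<!-- No mixed attribute on complexType, so the value on
     complexContent determines that the derived type is mixed -->
<xs:complexType name="ExtendedNoteType">
  <xs:complexContent mixed="true">
    <xs:extension base="NoteType">
      <xs:sequence>
        <xs:element name="footnote" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
```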

13.4 Complex type extensions

Complex types may be extended by adding to the content model and
to the attribute declarations. Table 13–3 shows the legal extensions for
each content type.

Table 13–3 Legal extensions by content type

                    BASE TYPE
                    Simple     Complex type
DERIVED TYPE        type       Simple content   Element-only   Mixed        Empty

Simple type         no         no               no             no           no
Complex type
  Simple content    yes, see   yes, see         no             no           no
                    13.4.1     13.4.1
  Element-only      no         no               yes, see       no           yes, see
                                                13.4.2                      13.4.4
  Mixed             no         no               no             yes, see     yes, see
                                                               13.4.3       13.4.4
  Empty             no         no               no             no           yes, see
                                                                            13.4.4

13.4.1 Simple content extensions

The only purpose of simple content extensions is to add attribute declarations.
It is not possible to extend the value space of the simple
content, just as it is not possible to extend the value space of a
simple type. Table 13–4 shows the syntax for an extension element
that is the child of a simpleContent element.

Table 13–4 XSD Syntax: simple content extension

Name
extension

Parents
simpleContent

Attribute name Type Description

id ID Unique ID.
base QName Base type being extended.
Content
annotation?, (attribute | attributeGroup)*, anyAttribute?,
1.1 assert*

Example 13–1. Simple content extension

Schema:

<xs:complexType name="SizeType">
<xs:simpleContent>
<xs:extension base="xs:integer">
<xs:attribute name="system" type="xs:token"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>

Instance:

<size system="US-DRESS">10</size>

Example 13–1 shows the definition of a complex type SizeType
that has simple content. It has a content type of integer, and it has
been extended to add the system attribute declaration. A valid instance
is also shown.

13.4.2 Complex content extensions

Complex content extensions allow you to add to the end of the content
model of the base type. You can also add attribute declarations, but
you cannot modify or remove the base type’s attribute declarations.
Table 13–5 shows the syntax for an extension element that is the
child of a complexContent element.

Table 13–5 XSD Syntax: complex content extension

Name
extension

Parents
complexContent

Attribute name Type Description

id ID Unique ID.
base QName Base type being extended.
Content
annotation?, 1.1 openContent?, (group | all | choice |
sequence)?, (attribute | attributeGroup)*, anyAttribute?,
1.1 assert*

When defining a complex content extension, you do not need to
copy the content model from the base type. The processor handles
complex content extensions by appending the new content model after
the base type’s content model, as if they were together in a sequence
group.

Example 13–2 shows a complex content extension. The complex
type ProductType has two children: number and name. The type
ShirtType extends ProductType by adding a choice group
containing two additional children: size and color.
The effective content model of ShirtType is shown in Example 13–3.
It is as if there were a sequence group at the top level of the
complex type, which contains the content model of ProductType,
followed by the content model extensions specified in the ShirtType
definition itself.

Example 13–2. Complex content extension

<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="ShirtType">
  <xs:complexContent>
    <xs:extension base="ProductType">
      <xs:choice maxOccurs="unbounded">
        <xs:element name="size" type="xs:integer"/>
        <xs:element name="color" type="xs:string"/>
      </xs:choice>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

Example 13–3. Effective content model of ShirtType

<xs:complexType name="ShirtType">
  <xs:sequence>
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
    <xs:choice maxOccurs="unbounded">
      <xs:element name="size" type="xs:integer"/>
      <xs:element name="color" type="xs:string"/>
    </xs:choice>
  </xs:sequence>
</xs:complexType>

13.4.2.1 Extending choice groups

Since extending requires the addition of an “artificial” sequence group,
extension does not work well as a way to add elements to choice
groups. Example 13–4 shows a type ExpandedItemsType that extends
ItemsType to add new product types. Intuitively, you may think that
the two additional element references, sweater and suit, are added
to the choice group, allowing a choice among the five element declarations.
In fact, the effective content model of ExpandedItemsType
is a sequence group that contains two choice groups. As a result,
ExpandedItemsType will require any of the shirt, hat, and
umbrella elements to appear before any of the sweater or suit
elements.

Example 13–4. choice group extension

<xs:complexType name="ItemsType">
<xs:choice maxOccurs="unbounded">
<xs:element ref="shirt"/>
<xs:element ref="hat"/>
<xs:element ref="umbrella"/>
</xs:choice>
</xs:complexType>

<xs:complexType name="ExpandedItemsType">
<xs:complexContent>
<xs:extension base="ItemsType">
<xs:choice maxOccurs="unbounded">
<xs:element ref="sweater"/>
<xs:element ref="suit"/>
</xs:choice>
</xs:extension>
</xs:complexContent>
</xs:complexType>
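
Following the same reasoning as for Example 13–3, the effective content model of ExpandedItemsType can be written out as a sketch:

```xml
<xs:complexType name="ExpandedItemsType">
  <xs:sequence>
    <!-- content model inherited from ItemsType -->
    <xs:choice maxOccurs="unbounded">
      <xs:element ref="shirt"/>
      <xs:element ref="hat"/>
      <xs:element ref="umbrella"/>
    </xs:choice>
    <!-- content model added by the extension -->
    <xs:choice maxOccurs="unbounded">
      <xs:element ref="sweater"/>
      <xs:element ref="suit"/>
    </xs:choice>
  </xs:sequence>
</xs:complexType>
```

Because both choice groups keep the default minOccurs of 1, an instance must contain at least one element from the first group followed by at least one from the second.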

A better way to extend a choice group is through substitution
groups. See Section 22.2.4 on p. 607 for more information.
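
A rough sketch of the substitution group approach follows. The head element item and the type CatalogItemType are hypothetical, and CatalogItemType is assumed to be defined elsewhere in the schema.

```xml
<!-- Hypothetical abstract head element of the substitution group -->
<xs:element name="item" type="CatalogItemType" abstract="true"/>
<xs:element name="shirt" type="CatalogItemType" substitutionGroup="item"/>
<xs:element name="hat" type="CatalogItemType" substitutionGroup="item"/>
<xs:element name="umbrella" type="CatalogItemType" substitutionGroup="item"/>

<xs:complexType name="ItemsType">
  <xs:sequence>
    <xs:element ref="item" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

<!-- New members can be added later without changing ItemsType -->
<xs:element name="sweater" type="CatalogItemType" substitutionGroup="item"/>
<xs:element name="suit" type="CatalogItemType" substitutionGroup="item"/>
```

With this design, all five elements may appear in any order wherever item is referenced, which is the behavior the choice group extension fails to provide.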

13.4.2.2 Extending all groups

In version 1.0, extension is not allowed for all groups. In version 1.1,
this constraint has been relaxed, and complex types that contain all
groups can be extended, provided that the derived type also uses an
all group, as shown in Example 13–5.

Example 13–5. all group extension

<xs:complexType name="ProductType">
<xs:all>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
</xs:all>
</xs:complexType>

<xs:complexType name="ShirtType">
<xs:complexContent>
<xs:extension base="ProductType">
<xs:all>
<xs:element name="size" type="xs:integer"/>
<xs:element name="color" type="xs:string"/>
</xs:all>
</xs:extension>
</xs:complexContent>
</xs:complexType>

The effective content model in this case is one big all group, shown
in Example 13–6, not two all groups inside a sequence.
When extending an all group with another all group, both groups
must have the same value for minOccurs (if any). The minOccurs of
the effective resulting group is that shared value. In Example 13–5,
the value for both groups defaults to 1, so the group shown
in Example 13–6 does also. Alternatively, both of the all groups could
have, for example, minOccurs="0", in which case the effective
minOccurs is 0.

Example 13–6. Effective content model of ShirtType with all groups combined

<xs:complexType name="ShirtType">
<xs:all>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="size" type="xs:integer"/>
<xs:element name="color" type="xs:string"/>
</xs:all>
</xs:complexType>
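
The minOccurs="0" alternative mentioned above might look like the following sketch. The type names OptionalProductType and OptionalShirtType are hypothetical.

```xml
<xs:complexType name="OptionalProductType">
  <xs:all minOccurs="0">
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
  </xs:all>
</xs:complexType>

<xs:complexType name="OptionalShirtType">
  <xs:complexContent>
    <xs:extension base="OptionalProductType">
      <!-- must carry the same minOccurs as the base type's all group -->
      <xs:all minOccurs="0">
        <xs:element name="size" type="xs:integer"/>
        <xs:element name="color" type="xs:string"/>
      </xs:all>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
```

The effective content model is then a single all group with minOccurs="0" containing all four element declarations, so an instance contains either all four children, in any order, or none of them.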

13.4.2.3 Extending open content

It is possible to extend a type that has open content, or to add open
content in an extension. There are several possible scenarios:

# If openContent is specified for the base type but not the derived
type, the openContent is inherited as is from the base type.
# If openContent is specified for the derived type but not the
base type, it is considered to be added in the derived type.
# If it is specified in both the base type and the derived type, it
must be the same or less restrictive in the derived type. For example,
if mode is suffix in the base type but interleave in
the derived type, this is legal because it is less constraining. The
opposite is not legal; attempting to turn interleave mode
into suffix mode means creating a more restrictive type. In
addition, the namespace allowances on the derived type must
be the same as, or a superset of, those allowed for the base type.

Example 13–7 shows the case where openContent appears in both
types. This example is legal because the mode is equally constraining
and the list of allowed namespaces is less constraining.
Note that since the mode is suffix, in an instance of ShirtType
the replacement elements for the wildcard will go at the very end,
after the color element. Even though openContent is defined for the
base type, it is not possible to include replacement elements directly
after name, where they would appear in an instance of ProductType.

Example 13–7. Extending open content

<xs:complexType name="ProductType">
<xs:openContent mode="suffix">
<xs:any namespace="##other" processContents="lax"/>
</xs:openContent>
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
</xs:sequence>
</xs:complexType>

<xs:complexType name="ShirtType">
<xs:complexContent>
<xs:extension base="ProductType">
<xs:openContent mode="suffix">
<xs:any namespace="##any" processContents="lax"/>
</xs:openContent>
<xs:sequence>
<xs:element name="size" type="xs:integer"/>
<xs:element name="color" type="xs:string"/>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
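
An instance of ShirtType might then look like the following sketch. It assumes a shirt element declared with type ShirtType; the replacement element names and the extension namespace are illustrative.

```xml
<shirt xmlns:ext="http://datypic.com/extension">
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
  <size>10</size>
  <color>blue</color>
  <!-- replacement elements appear only here, after color -->
  <ext:something>an extension element</ext:something>
  <something>allowed because the derived wildcard uses ##any</something>
</shirt>
```

Moving ext:something to just after name would make the instance invalid, since suffix mode confines replacement elements to the end of the extended content.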

13.4.3 Mixed content extensions

Complex types with mixed content can be extended, but the derived
type must also have mixed content. The extension is treated the same
way as it is for element-only complex types described in the previous
section. It is illegal to extend a mixed content type to result in an
element-only content type. The reverse is also true; it is illegal to extend
an element-only content type to result in a mixed content type.
When extending a mixed content type, you must also specify
the mixed attribute for the derived type. Example 13–8 shows a mixed
complex type LetterType that is extended to derive another
mixed complex type, ExtendedLetterType.

Example 13–8. Mixed content extension

<xs:complexType name="LetterType" mixed="true">
<xs:sequence>
<xs:element name="custName" type="xs:string"/>
<xs:element name="prodName" type="xs:string"/>
<xs:element name="prodSize" type="xs:integer"/>
</xs:sequence>
</xs:complexType>

<xs:complexType name="ExtendedLetterType" mixed="true">

<xs:complexContent>
<xs:extension base="LetterType">
<xs:sequence>
<xs:element name="prodNum" type="xs:string"/>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
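
An instance of ExtendedLetterType might look like the following sketch, assuming a letter element declared with that type. The text and values are illustrative.

```xml
<letter>Dear <custName>Pat Smith</custName>,
Unfortunately, we are out of stock of the
<prodName>Short-Sleeved Linen Blouse</prodName> in size
<prodSize>10</prodSize> (item number <prodNum>557</prodNum>)
that you ordered.</letter>
```

Character data may appear between the children because both types are mixed, but the children themselves must still follow the order of the effective sequence: custName, prodName, prodSize, and then the added prodNum.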

13.4.4 Empty content extensions

Complex types with empty content can be extended to add a content
model and/or attribute declarations. Example 13–9 shows an empty
complex type named ItemType, which is extended by ProductType
to add a sequence group containing two element declarations.

Example 13–9. Empty content extension

<xs:complexType name="ItemType">
<xs:attribute name="routingNum" type="xs:integer"/>
</xs:complexType>

<xs:complexType name="ProductType">
<xs:complexContent>
<xs:extension base="ItemType">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
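
As an illustration, an instance of the extended ProductType might look like this sketch, assuming a product element declared with that type. The attribute value is illustrative.

```xml
<!-- routingNum is inherited from ItemType; number and name
     come from the extension -->
<product routingNum="123">
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
</product>
```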

13.4.5 Attribute extensions

When defining an extension, you may specify additional attribute
declarations in the derived type’s definition. When extending complex
types, attributes are always passed down from the base type to the
new type. It is not necessary (or even legal) to repeat any attribute
declarations from the base type or any other ancestors in the new
type definition. It is not possible to modify or remove any attribute
declarations from the base type in an extension.
Example 13–10 shows the definition of ProductType, which extends
ItemType. It adds two attribute declarations: effDate and lang. It
may be surprising that lang is legal, since it appears in the base type
definition. This is because the new lang is in a different namespace,
so it is allowed. The lang in the base type definition must be prefixed
when it appears in the instance, as shown in the instance example.

Example 13–10. Attribute extension

Schema:

<xs:complexType name="ItemType">
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute ref="xml:lang"/>
</xs:complexType>

<xs:complexType name="ProductType">
<xs:complexContent>
<xs:extension base="ItemType">
<xs:attribute name="effDate" type="xs:date"/>
<xs:attribute name="lang" type="xs:language"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>

Instance:
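
A sketch of an instance consistent with this schema, assuming a product element declared with type ProductType (the attribute values are illustrative):

```xml
<!-- lang is the unqualified attribute added by ProductType;
     xml:lang is the prefixed attribute inherited from ItemType -->
<product id="prod557" effDate="2001-04-02" lang="en" xml:lang="en"/>
```

The xml prefix is bound to the XML namespace by definition, so it needs no namespace declaration in the instance.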

13.4.6 Attribute wildcard extensions

If an attribute wildcard is specified in an extension, and there is no
attribute wildcard specified in the definition of its base type or any of
its ancestors, it is a straightforward matter of using the one attribute
wildcard. If, however, one or more of the ancestor types have an attribute
wildcard, the effective wildcard is the union of the new wildcard
and all ancestor wildcards. The value for processContents is taken
from the new derived type, and the union of the namespace constraints
of the attribute wildcards is used. A simple rule of thumb is that
if an attribute is an allowed replacement attribute for at least one of
the attribute wildcards, it can be used.
Example 13–11 shows the definition of DerivedType that extends
BaseType. Both DerivedType and BaseType have attribute wildcards
specified, with different values for processContents and namespace.
Example 13–12 shows the effective definition of DerivedType, after
taking the union of the two attribute wildcards. Note that the value of
processContents is taken from the derived type, and the namespace
list is the union of those of the two types.

Example 13–11. Attribute wildcard extension

<xs:complexType name="BaseType">
<xs:anyAttribute processContents="lax"
namespace="##local
http://datypic.com/prod"/>
</xs:complexType>

<xs:complexType name="DerivedType">
<xs:complexContent>
<xs:extension base="BaseType">
<xs:anyAttribute processContents="strict"
namespace="##targetNamespace
http://www.w3.org/1999/xhtml"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>

Example 13–12. Effective attribute wildcard

<xs:complexType name="DerivedType">
<xs:anyAttribute processContents="strict"
namespace="##local
http://datypic.com/prod
##targetNamespace
http://www.w3.org/1999/xhtml"/>
</xs:complexType>

13.5 Complex type restrictions

Complex types may be restricted by eliminating or restricting attribute
declarations as well as by subsetting content models. When restriction
is used, instances of the derived type will always be valid for the
base type as well. Table 13–6 shows the legal restrictions for each
content type.

Table 13–6 Legal restrictions by content type

                    BASE TYPE
                    Simple     Complex type
DERIVED TYPE        type       Simple content   Element-only   Mixed        Empty

Simple type         yes, see   no               no             no           no
                    8.3.1
Complex type
  Simple content    no         yes, see         no             yes,† see    no
                               13.5.1                          13.5.3
  Element-only      no         no               yes, see       yes, see     no
                                                13.5.2         13.5.3
  Mixed             no         no               no             yes, see     no
                                                               13.5.3
  Empty             no         no               yes,† see      yes,† see    yes, see
                                                13.5.2         13.5.3       13.5.4

† If all children are optional

13.5.1 Simple content restrictions

The purpose of a simple content restriction is to restrict the simple
content and/or attribute declarations of a complex type. Table 13–7
shows the syntax of a restriction element that is the child of a
simpleContent element. The base attribute must refer to a complex
type with simple content, not a simple type. This is because a restriction
of a simple type is another simple type, not a complex type.
In Example 13–1 we deﬁned a complex type SizeType that had
simple content, and declared a system attribute. Example 13–13 shows
a new type, SmallSizeType, which restricts SizeType. It restricts
both the content, by applying the minInclusive and maxInclusive
facets, and the system attribute declaration, by making it required.
See Section 13.5.5 on p. 333 for more information on restricting
attribute declarations.
318 Chapter 13 | Deriving complex types

Table 13–7 XSD Syntax: simple content restriction

Name
restriction

Parents
simpleContent

Attribute name   Type    Description
id               ID      Unique ID.
base             QName   Base type being restricted.

Content
annotation?, simpleType?, (enumeration | length | maxExclusive
| maxInclusive | maxLength | minExclusive | minInclusive |
minLength | pattern | totalDigits | fractionDigits |
whiteSpace | 1.1 assertion | 1.1 explicitTimezone |
1.1 {any element in another namespace})*, (attribute |
attributeGroup)*, anyAttribute?

Example 13–13. Simple content restriction

<xs:complexType name="SmallSizeType">
<xs:simpleContent>
<xs:restriction base="SizeType">
<xs:minInclusive value="2"/>
<xs:maxInclusive value="6"/>
<xs:attribute name="system" type="xs:token"
use="required"/>
</xs:restriction>
</xs:simpleContent>
</xs:complexType>
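
To make the effect of this restriction concrete, the following sketch places an assumed definition of the base type next to a few instances. The shape of SizeType shown here (an extension of xs:integer with an optional system attribute) and the size element declaration are assumptions made for illustration; neither appears in the example above.

```xml
<!-- Assumed shape of the base type SizeType -->
<xs:complexType name="SizeType">
  <xs:simpleContent>
    <xs:extension base="xs:integer">
      <xs:attribute name="system" type="xs:token"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>

<!-- Hypothetical element declaration that uses the restricted type -->
<xs:element name="size" type="SmallSizeType"/>

<!-- Valid: the value is between 2 and 6 and system is present -->
<size system="US-DRESS">4</size>

<!-- Invalid: 10 is greater than the maxInclusive value of 6 -->
<size system="US-DRESS">10</size>

<!-- Invalid: the system attribute is required in SmallSizeType -->
<size>4</size>
```

All three instances would be valid for SizeType as assumed here; only the first is valid for SmallSizeType.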

13.5.2 Complex content restrictions

Complex content restrictions allow you to restrict the content model
and/or attribute declarations of a complex type. Table 13–8 shows
the syntax of a restriction element that is the child of a
complexContent element.

Table 13–8 XSD Syntax: complex content restriction

Name
restriction

Parents
complexContent

Attribute name   Type    Description
id               ID      Unique ID.
base             QName   Base type being restricted.

Content
annotation?, 1.1 openContent?, (group | all | choice |
sequence)?, (attribute | attributeGroup)*, anyAttribute?,
1.1 assert*

When restricting complex content, it is necessary to repeat all of
the content model that is desired. The full content model specified
in the restriction becomes the content model of the derived type. This
content model must be a restriction of the content model of the base
type. This means that all instances of the new restricted type must also
be valid for the base type.
Example 13–14 shows the definition of a complex type
RestrictedProductType that restricts the complex type ProductType by
eliminating the size and color child elements. This is legal because all
instances of RestrictedProductType are also valid according to
ProductType. However, if the size element declaration had a
minOccurs value of 1 in ProductType, the restriction would not be
legal, because instances of RestrictedProductType would not be valid
according to ProductType; they would be missing a required element.

Example 13–14. Complex content restriction

<xs:complexType name="RestrictedProductType">
<xs:complexContent>
<xs:restriction base="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
</xs:sequence>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
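
Whether this restriction is legal depends on how ProductType is defined. A minimal sketch of a compatible base type, assuming that size and color are both declared as optional, might look like the following; the types chosen for size and color are assumptions.

```xml
<!-- Assumed base type: size and color are optional, so a derived
     type may leave them out and still describe valid ProductType
     instances -->
<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="size" type="xs:integer" minOccurs="0"/>
    <xs:element name="color" type="xs:string" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>
```

If minOccurs="0" were removed from either optional element declaration in this sketch, the restriction would no longer be legal.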

In most cases, you can use common sense to determine whether a
restriction is legal. If you can think of a valid instance of the derived
type that is not valid for the base type, there is a problem with your
restriction. In case you want to do a more thorough analysis, the rest
of this section describes the rules for legal content model restrictions
that are detailed in version 1.0.
In version 1.1, these specific rules have been replaced with a general
statement that the derived type must be more restrictive than the base
type. However, the following sections may still be useful as a guideline
for the types of restrictions that can be defined.

13.5.2.1 Eliminating meaningless groups

Any meaningless groups may be eliminated. This includes:

# Groups with no children
# Groups that have minOccurs and maxOccurs equal to 1, and
only have one child
# sequence groups that have minOccurs and maxOccurs equal
to 1 and are contained in another sequence group (this is
illustrated in Example 13–15)
# choice groups that have minOccurs and maxOccurs equal to
1 and are contained in another choice group

Example 13–15. Eliminating meaningless groups

Base group:

<xs:sequence>
<xs:sequence>
<xs:element name="a"/>
<xs:element name="b"/>
</xs:sequence>
</xs:sequence>

Legal restriction:

<xs:sequence>
<xs:element name="a"/>
<xs:element name="b"/>
</xs:sequence>

13.5.2.2 Restricting element declarations

When restricting a specific element declaration, several rules apply.

# The occurrence constraints in the derived element declaration
must be equal or more restrictive. This is illustrated by a in
Example 13–16.
# The type in the derived element declaration must be a restriction
of the type in the base element declaration (or they must have
the same type). This is illustrated by c in Example 13–16.
# If the base element declaration specified a fixed value, the derived
element declaration must specify the same fixed value. This is
illustrated by b in Example 13–16.
# The identity constraints (key, keyref, unique) in the derived
element declaration must be more restrictive than those of the
base element declaration.
# The contents of the block attribute of the derived element
declaration must be a superset of that of the base element
declaration; the derived declaration must block at least the
substitutions that the base declaration blocks.
# If the base element declaration had nillable set to false, the
derived element declaration cannot reverse that property.

Example 13–16. Restricting element declarations

Base group:

<xs:sequence>
<xs:element name="a" maxOccurs="3"/>
<xs:element name="b" fixed="bValue"/>
<xs:element name="c" type="xs:string"/>
</xs:sequence>

Legal restriction:

<xs:sequence>
<xs:element name="a" maxOccurs="2"/>
<xs:element name="b" fixed="bValue"/>
<xs:element name="c" type="xs:token"/>
</xs:sequence>

Illegal restriction:

<xs:sequence>
<xs:element name="a" maxOccurs="4"/>
<xs:element name="b" fixed="newValue"/>
<xs:element name="c" type="xs:integer"/>
</xs:sequence>
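
The nillable rule can be sketched in the same style. The element names d and e below are hypothetical. The sketch assumes the version 1.0 rule that nillable may be switched from true to false in a restriction, but not from false to true.

```xml
<!-- Base group -->
<xs:sequence>
  <xs:element name="d" type="xs:string" nillable="false"/>
  <xs:element name="e" type="xs:string" nillable="true"/>
</xs:sequence>

<!-- Legal restriction: e gives up nillability, d is unchanged -->
<xs:sequence>
  <xs:element name="d" type="xs:string" nillable="false"/>
  <xs:element name="e" type="xs:string" nillable="false"/>
</xs:sequence>

<!-- Illegal restriction: d becomes nillable although the base
     declaration does not allow nil values -->
<xs:sequence>
  <xs:element name="d" type="xs:string" nillable="true"/>
  <xs:element name="e" type="xs:string" nillable="true"/>
</xs:sequence>
```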

13.5.2.3 Restricting wildcards

When replacing an element wildcard with specific element declarations
or a group of element declarations, these derived declarations must
yield valid replacement elements for the wildcard, in terms of their
namespace and occurrence constraints. This is illustrated in
Example 13–17, which shows a restriction that is illegal for two
reasons. First, b is illegal because it is in the same namespace as the
other elements (while the wildcard says ##other). Second, two replacement
elements are declared, but the wildcard has a maxOccurs of 1.

Example 13–17. Replacing a wildcard with element declarations

Base group:

<xs:sequence>
<xs:element name="a"/>
<xs:any namespace="##other" maxOccurs="1"/>
</xs:sequence>

Legal restriction:

<xs:sequence>
<xs:element name="a"/>
<xs:element ref="otherns:b"/>
</xs:sequence>

Illegal restriction:

<xs:sequence>
<xs:element name="a"/>
<xs:element ref="b"/>
<xs:element name="c"/>
</xs:sequence>

When replacing an element wildcard with another element wildcard,
the derived wildcard’s namespace constraint must be a subset of the
base wildcard’s namespace constraint, as described in Section 13.5.6
on p. 335. Also, the occurrence constraints must be a subset. This is
illustrated in Example 13–18, which shows a restriction that is illegal
because neither the namespace constraint nor the occurrence constraint
specifies a subset of what is allowed by the base wildcard.

Example 13–18. Replacing a wildcard with another wildcard

Base wildcard:

<xs:any namespace="urn:a:1 urn:a:2" maxOccurs="2"/>

Legal restriction:

<xs:any namespace="urn:a:1" maxOccurs="1"/>

Illegal restriction:

<xs:any namespace="##other" maxOccurs="3"/>

13.5.2.4 Restricting groups

When replacing a group with an element declaration, it must be valid
for an instance of that group to just have that one element child. For
example, a choice group that contains that element declaration, or a
sequence group declaring all other elements optional, would work as
base groups in this case. This is illustrated in Example 13–19.
When replacing a group with another group, the occurrence constraints
must be equal or more restrictive. For example, if the maxOccurs
value for a group in the base type is 5, the group in the derived type
cannot have a maxOccurs that is greater than 5. This is illustrated in
Example 13–20.

Example 13–19. Replacing a group with an element declaration

Base group:

<xs:sequence>
<xs:element name="a"/>
<xs:element name="b" minOccurs="0"/>
</xs:sequence>

Legal restriction:

<xs:element name="a"/>
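
The choice case can be sketched in the same way. Here the base group is assumed to be a choice between a and b, so an instance that contains only a is already valid for the base group. This is an illustration, not one of the numbered examples.

```xml
<!-- Base group: exactly one of a or b -->
<xs:choice>
  <xs:element name="a"/>
  <xs:element name="b"/>
</xs:choice>

<!-- Legal restriction: only a is allowed -->
<xs:element name="a"/>
```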

Example 13–20. Restricting occurrence constraints of a group

Base group:

<xs:sequence minOccurs="2" maxOccurs="5">
<xs:element name="a"/>
<xs:element name="b"/>
</xs:sequence>

Legal restriction:

<xs:sequence minOccurs="3" maxOccurs="4">
<xs:element name="a"/>
<xs:element name="b"/>
</xs:sequence>

Illegal restriction:

<xs:sequence minOccurs="0" maxOccurs="6">
<xs:element name="a"/>
<xs:element name="b"/>
</xs:sequence>

When replacing a group with a group of the same kind (all, choice,
or sequence), the order of the children (element declarations
and groups) must be preserved. This is true even for all and choice
groups, where the order is not significant for validation. This is
illustrated in Example 13–21.

Example 13–21. Maintaining the order of the children in an all group

Base group:

<xs:all>
<xs:element name="a"/>
<xs:element name="b" minOccurs="0"/>
<xs:element name="c"/>
</xs:all>

Legal restriction:

<xs:all>
<xs:element name="a"/>
<xs:element name="c"/>
</xs:all>

Illegal restriction:

<xs:all>
<xs:element name="c"/>
<xs:element name="a"/>
</xs:all>

When restricting an all or sequence group, if any child element
declarations or groups are not included in the derived group, they must
be optional in the base group. This is illustrated in Example 13–22.

Example 13–22. Restricting an all group

Base group:

<xs:all>
<xs:element name="a"/>
<xs:element name="b" minOccurs="0"/>
<xs:element name="c"/>
</xs:all>

Legal restriction:

<xs:all>
<xs:element name="a"/>
<xs:element name="c"/>
</xs:all>

Illegal restriction:

<xs:all>
<xs:element name="a"/>
<xs:element name="b"/>
</xs:all>

When replacing a choice group with another choice group, the
child element declarations of the derived group must be a subset of
those in the base group. This is illustrated in Example 13–23.

Example 13–23. Restricting a choice group

Base group:

<xs:choice>
<xs:element name="a"/>
<xs:element name="b"/>
<xs:element name="c"/>
</xs:choice>

Legal restriction:

<xs:choice>
<xs:element name="a"/>
<xs:element name="c"/>
</xs:choice>

Illegal restriction:

<xs:choice>
<xs:element name="a"/>
<xs:element name="d"/>
</xs:choice>

When replacing an all group with a sequence group, each element
declaration in the all group cannot appear more than once in the
sequence group, or appear with maxOccurs greater than 1. This is
illustrated in Example 13–24.

Example 13–24. Replacing an all group with a sequence group

Base group:

<xs:all>
<xs:element name="a"/>
<xs:element name="b" minOccurs="0"/>
<xs:element name="c"/>
</xs:all>

Legal restriction:

<xs:sequence>
<xs:element name="a"/>
<xs:element name="c"/>
</xs:sequence>

Illegal restriction:

<xs:sequence>
<xs:element name="a"/>
<xs:element name="b"/>
<xs:element name="c" maxOccurs="2"/>
</xs:sequence>

When replacing a choice group with a sequence group, the
maxOccurs of the choice group must be enough to cover the number
of elements that the sequence group will yield. This is illustrated in
Example 13–25.

Example 13–25. Replacing a choice group with a sequence group

Base group:

<xs:choice maxOccurs="2">
<xs:element name="a"/>
<xs:element name="b"/>
<xs:element name="c"/>
</xs:choice>

Legal restriction:

<xs:sequence>
<xs:element name="a"/>
<xs:element name="c"/>
</xs:sequence>

Illegal restriction:

<xs:sequence>
<xs:element name="a"/>
<xs:element name="b"/>
<xs:element name="c"/>
</xs:sequence>

13.5.2.5 Restricting open content

It is possible to restrict a type that has open content, but as with the
rest of the content model, it is not inherited automatically. If open
content is desired in the restricted type, it is necessary to respecify it.
In order to be a legal restriction, the open content in the restricted type
should not be more permissive than the base type, in terms of both
the mode and the namespace constraint.
Example 13–26 shows two examples of restricting open content.
The first, LegalDerivedType, is legal because suffix mode is as
permissive as the base type, and the namespace constraint is more
restrictive in the derived type (one choice instead of two). The second
example, IllegalDerivedType, is illegal because interleave mode
is more permissive than the base type, and the namespace constraint of
##any is also more permissive.

Example 13–26. Restricting open content

Base type:

<xs:complexType name="BaseType">
<xs:openContent mode="suffix">
<xs:any namespace="http://datypic.com/prod
http://datypic.com/ord"/>
</xs:openContent>
<xs:sequence>
<xs:element name="a" type="xs:string" minOccurs="0"/>
</xs:sequence>
</xs:complexType>

Legal restriction:

<xs:complexType name="LegalDerivedType">
<xs:complexContent>
<xs:restriction base="BaseType">
<xs:openContent mode="suffix">
<xs:any namespace="http://datypic.com/prod"/>
</xs:openContent>
<xs:sequence>
<xs:element name="a" type="xs:string" minOccurs="0"/>
</xs:sequence>
</xs:restriction>
</xs:complexContent>
</xs:complexType>

Illegal restriction:

<xs:complexType name="IllegalDerivedType">
<xs:complexContent>
<xs:restriction base="BaseType">
<xs:openContent mode="interleave">
<xs:any namespace="##any"/>
</xs:openContent>
<xs:sequence>
<xs:element name="a" type="xs:string" minOccurs="0"/>
</xs:sequence>
</xs:restriction>
</xs:complexContent>
</xs:complexType>

It is also legal to remove the openContent element completely in
a restriction, since that is less permissive. As you would expect, it is not
legal to add one unless the mode is none.
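
As a sketch of the first case, the following hypothetical type restricts a base type that has open content in suffix mode and simply leaves out the openContent element. The type name NoOpenContentType is invented for illustration, and the sketch assumes that no defaultOpenContent is in effect for the schema document, since a default would otherwise apply to the derived type.

```xml
<!-- Hypothetical restriction that drops the open content of the
     base type; only the declared element a remains allowed -->
<xs:complexType name="NoOpenContentType">
  <xs:complexContent>
    <xs:restriction base="BaseType">
      <xs:sequence>
        <xs:element name="a" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
```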

13.5.3 Mixed content restrictions

Complex types with mixed content may be restricted to derive other
complex types with mixed content or with element-only content. The
reverse is not true: It is not possible to restrict an element-only complex
type to result in a complex type with mixed content.
If you want the derived type to be mixed, you must specify the mixed
attribute for the derived type, since the quality of being mixed is not
inherited from the base type. Example 13–27 shows a mixed complex
type LetterType that is restricted to derive another mixed
complex type, RestrictedLetterType.

Example 13–27. Mixed content restriction

<xs:complexType name="RestrictedLetterType" mixed="true">
<xs:complexContent>
<xs:restriction base="LetterType">
<xs:sequence>
<xs:element name="custName" type="xs:string"/>
<xs:element name="prodName" type="xs:string"/>
</xs:sequence>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
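
For this restriction to be legal, any element declaration that it omits must be optional in the base type. A sketch of a base type that would fit, together with an instance of the restricted type, might look like this. The optional prodSize element and the letter element declaration are assumptions made for illustration.

```xml
<!-- Assumed base type: mixed content with an optional prodSize -->
<xs:complexType name="LetterType" mixed="true">
  <xs:sequence>
    <xs:element name="custName" type="xs:string"/>
    <xs:element name="prodName" type="xs:string"/>
    <xs:element name="prodSize" type="xs:integer" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>

<!-- Hypothetical element declaration using the restricted type -->
<xs:element name="letter" type="RestrictedLetterType"/>

<!-- Valid instance: character data interleaved with the two
     remaining child elements -->
<letter>Dear <custName>Priscilla Walmsley</custName>,
your order of <prodName>Short-Sleeved Linen Blouse</prodName>
has shipped.</letter>
```

If mixed="true" were left off RestrictedLetterType, the character data in this instance would no longer be allowed.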

It is also possible to restrict a mixed content type to derive an empty
content type, or even a complex type with simple content. This is
only legal if all of the children in the content model of the base type
are optional. Example 13–28 shows a slightly different LetterType
definition where the sequence group is optional. The derived type
RestrictedLetterType will allow only character data content of
type string, with no children. Note that this is the only case where
a restriction element must have both a base attribute and a
simpleType child.

Example 13–28. Mixed content restricted to simple content

<xs:complexType name="LetterType" mixed="true">
<xs:sequence minOccurs="0">
<xs:element name="custName" type="xs:string"/>
<xs:element name="prodName" type="xs:string"/>
<xs:element name="prodSize" type="xs:integer"/>
</xs:sequence>
</xs:complexType>

<xs:complexType name="RestrictedLetterType">
<xs:simpleContent>
<xs:restriction base="LetterType">
<xs:simpleType>
<xs:restriction base="xs:string"/>
</xs:simpleType>
</xs:restriction>
</xs:simpleContent>
</xs:complexType>

13.5.4 Empty content restrictions

Complex types with empty content may be restricted, but the restriction
applies only to the attributes. The derived type must also have empty
content. Example 13–29 shows a restriction of the empty complex
type ItemType. The only restriction is applied to the type of the
routingNum attribute.

Example 13–29. Empty content restriction

<xs:complexType name="ItemType">
<xs:attribute name="routingNum" type="xs:integer"/>
</xs:complexType>

<xs:complexType name="RestrictedItemType">
<xs:complexContent>
<xs:restriction base="ItemType">
<xs:attribute name="routingNum" type="xs:short"/>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
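
The practical effect shows up in instances. The sketch below assumes a hypothetical item element declared with the restricted type. Because xs:short only allows values from -32768 to 32767, some values that the base type accepts are rejected.

```xml
<!-- Hypothetical element declaration using the restricted type -->
<xs:element name="item" type="RestrictedItemType"/>

<!-- Valid: 100 is within the range of xs:short -->
<item routingNum="100"/>

<!-- Invalid for RestrictedItemType, although valid for ItemType:
     70000 is outside the range of xs:short -->
<item routingNum="70000"/>
```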

13.5.5 Attribute restrictions

When defining a restriction, you may restrict or eliminate attribute
declarations of the base type. All attribute declarations are passed down
from the base type to the derived type, so the only attribute declarations
that need to appear in the derived type definition are those you want
to restrict or remove. The legal ways to restrict an attribute declaration
are as follows:

# Change the type, as long as the new type is a restriction (or a
restriction of a restriction, etc.) of the original type
# Add, change, or remove a default value
# Add a fixed value if none is present in the base type
# Make optional attributes required
# Make optional attributes prohibited

It is not legal in a restriction to

# Change the type to one that is not a restriction of the
original type
# Change or remove a fixed value
# Make required attributes optional
# Make required attributes prohibited

Example 13–30 shows a definition of DerivedType which legally
restricts BaseType. The declarations of attributes a, b, c, d, e, f, and
g represent, respectively, changing the type, adding a default, changing
a default, adding a fixed value, keeping the fixed value the same,
making an optional attribute required, and prohibiting an optional
attribute. Instances of DerivedType can also have the attribute
x, although it is not mentioned in the definition. This is because all of
the attributes of BaseType are passed down to DerivedType.

Example 13–30. Legal restrictions of attributes

<xs:complexType name="BaseType">
<xs:attribute name="a" type="xs:integer"/>
<xs:attribute name="b" type="xs:string"/>
<xs:attribute name="c" type="xs:string" default="c"/>
<xs:attribute name="d" type="xs:string"/>
<xs:attribute name="e" type="xs:string" fixed="e"/>
<xs:attribute name="f" type="xs:string"/>
<xs:attribute name="g" type="xs:string"/>
<xs:attribute name="x" type="xs:string"/>
</xs:complexType>

<xs:complexType name="DerivedType">
<xs:complexContent>
<xs:restriction base="BaseType">
<xs:attribute name="a" type="xs:positiveInteger"/>
<xs:attribute name="b" type="xs:string" default="b"/>
<xs:attribute name="c" type="xs:string" default="c2"/>
<xs:attribute name="d" type="xs:string" fixed="d"/>
<xs:attribute name="e" type="xs:string" fixed="e"/>
<xs:attribute name="f" type="xs:string" use="required"/>
<xs:attribute name="g" type="xs:string" use="prohibited"/>
</xs:restriction>
</xs:complexContent>
</xs:complexType>

Example 13–31 shows a definition of IllegalDerivedType, which
illegally restricts the complex type BaseType2. Attribute h is illegal
because decimal is not a restriction of integer. Attribute i is
illegal because the fixed value is changed. Attribute j is illegal because
the fixed value is removed and replaced by a default value. Attribute k
is illegal because a required attribute is made optional. Attribute l is
illegal because a required attribute is made prohibited. Attributes
pref:l and m are illegal because they do not appear in the definition
of BaseType2.

Example 13–31. Illegal attribute restrictions

<xs:complexType name="BaseType2">
<xs:attribute name="h" type="xs:integer"/>
<xs:attribute name="i" type="xs:string" fixed="i"/>
<xs:attribute name="j" type="xs:string" fixed="j"/>
<xs:attribute name="k" type="xs:string" use="required"/>
<xs:attribute name="l" type="xs:string" use="required"/>
</xs:complexType>

<xs:complexType name="IllegalDerivedType">
<xs:complexContent>
<xs:restriction base="BaseType2">
<xs:attribute name="h" type="xs:decimal"/>
<xs:attribute name="i" type="xs:string" fixed="i2"/>
<xs:attribute name="j" type="xs:string" default="j"/>
<xs:attribute name="k" type="xs:string"/>
<xs:attribute name="l" type="xs:string" use="prohibited"/>
<xs:attribute ref="pref:l"/>
<xs:attribute name="m" type="xs:string"/>
</xs:restriction>
</xs:complexContent>
</xs:complexType>

13.5.6 Attribute wildcard restrictions

Unlike attribute declarations, attribute wildcards are not automatically
passed down from the base type to the restricted type. If you want to
use an attribute wildcard for the restricted type, you must specify it
inside the restriction element.
When an attribute wildcard is specified in a restriction, that wildcard
becomes the effective wildcard of the type, overriding any attribute
wildcards of the base type or its ancestors. However, if any ancestor
has an attribute wildcard, the namespace constraint of the new wildcard
must be a subset of the ancestor wildcard’s namespace constraint.
Table 13–9 shows the legal subsets of namespace constraints.

Table 13–9 Wildcard namespace subsets

Base type                       Derived type
##any                           Any value or list of values
Any list of values (including   Any list of values that is the same or a subset
##targetNamespace and           of the base type’s list
##local)
##other                         ##other (if the target namespaces of the base
                                type and of the derived type are the same)
                                or any list of values that does not include the
                                target namespace of the base type or ##local

Example 13–32 shows a definition of DerivedType that restricts
BaseType. Both DerivedType and BaseType have attribute
wildcards specified, with different values for processContents
and namespace. This definition is legal because DerivedType’s
wildcard is a subset of BaseType’s wildcard.

Example 13–32. Restricting an attribute wildcard

<xs:complexType name="BaseType">
<xs:anyAttribute processContents="lax" namespace="##any"/>
</xs:complexType>

<xs:complexType name="DerivedType">
<xs:complexContent>
<xs:restriction base="BaseType">
<xs:anyAttribute processContents="strict"
namespace="##targetNamespace
http://www.w3.org/1999/xhtml"/>
</xs:restriction>
</xs:complexContent>
</xs:complexType>

It is also possible to restrict an attribute wildcard by replacing it with
declarations for attributes that are valid according to that wildcard.
This is illustrated in Example 13–33.

Example 13–33. Replacing an attribute wildcard with attributes

<xs:complexType name="BaseType">
<xs:anyAttribute processContents="lax" namespace="##any"/>
</xs:complexType>

<xs:complexType name="DerivedType">
<xs:complexContent>
<xs:restriction base="BaseType">
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="name" type="xs:string"/>
</xs:restriction>
</xs:complexContent>
</xs:complexType>

13.5.7 Restricting types from another namespace

Sometimes it is useful to define complex types in your schema
that are restrictions of base types defined in another target
namespace. It may be that you are embedding elements from
another XML vocabulary in your elements, and you only want
to allow a restricted subset of the other vocabulary. Example 13–34
shows this case where the base type, ProductType, is in a
schema document with http://datypic.com/prod as the target
namespace. RestrictedProductType is a derived type, but it
is defined in a schema document whose target namespace is
http://datypic.com/ord.
This example shows a legal restriction because the complex type
contains references to global element declarations. All of the element
names in the restricted type are still in the http://datypic.com/prod
namespace, as evidenced by the use of the prod prefix.

Example 13–34. Restricting a type from another namespace with global
declarations

prod.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://datypic.com/prod"
xmlns:prod="http://datypic.com/prod"
elementFormDefault="qualified"
attributeFormDefault="qualified">
<xs:complexType name="ProductType">
<xs:sequence>
<xs:element ref="prod:number"/>
<xs:element ref="prod:name"/>
<xs:element ref="prod:size" minOccurs="0"/>
</xs:sequence>
<xs:attribute ref="prod:dept"/>
</xs:complexType>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="size" type="xs:integer"/>
<xs:attribute name="dept" type="xs:string"/>
</xs:schema>

ord.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://datypic.com/ord"
xmlns:prod="http://datypic.com/prod">
<xs:import namespace="http://datypic.com/prod"
schemaLocation="prod.xsd"/>
<xs:complexType name="RestrictedProductType">
<xs:complexContent>
<xs:restriction base="prod:ProductType">
<xs:sequence>
<xs:element ref="prod:number"/>
<xs:element ref="prod:name"/>
</xs:sequence>
<xs:attribute ref="prod:dept" use="required"/>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
</xs:schema>

A problem arises, however, if local element declarations are used and
they are qualified with a namespace name (either via a form attribute
on the element declaration, or an elementFormDefault attribute on
the schema). In that case, the name attribute is used instead of ref,
and it is not legal to use a namespace prefix in the name attribute; all
of the values of name take on the target namespace of the schema
document. If Example 13–34 were modified to use local element
declarations, the elements in RestrictedProductType would take on the
http://datypic.com/ord namespace, and no longer be a valid
restriction of the base type since the element names have changed. The
same problem arises for attributes as well as elements, but this occurs
less frequently since qualified local attribute declarations are less
common.
In version 1.0, this problem is typically avoided by creating a new
schema document in the http://datypic.com/prod namespace
whose sole purpose is to restrict the original schema document.
That new schema document is the one that is imported into the
http://datypic.com/ord schema document, which can then
reference the restricted types.
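
A minimal sketch of this version 1.0 workaround follows. It assumes that prod.xsd defines ProductType with qualified local declarations for number, name, size, and dept. The file name prod-restricted.xsd and the product element in ord.xsd are hypothetical.

```xml
<!-- prod-restricted.xsd (hypothetical): same target namespace as
     prod.xsd, so the local declarations stay in that namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://datypic.com/prod"
           xmlns:prod="http://datypic.com/prod"
           elementFormDefault="qualified"
           attributeFormDefault="qualified">
  <xs:include schemaLocation="prod.xsd"/>
  <xs:complexType name="RestrictedProductType">
    <xs:complexContent>
      <xs:restriction base="prod:ProductType">
        <xs:sequence>
          <xs:element name="number" type="xs:integer"/>
          <xs:element name="name" type="xs:string"/>
        </xs:sequence>
        <xs:attribute name="dept" type="xs:string" use="required"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>

<!-- ord.xsd: imports the restricting document and only refers to
     the restricted type by name -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://datypic.com/ord"
           xmlns:prod="http://datypic.com/prod"
           elementFormDefault="qualified">
  <xs:import namespace="http://datypic.com/prod"
             schemaLocation="prod-restricted.xsd"/>
  <xs:element name="product" type="prod:RestrictedProductType"/>
</xs:schema>
```

The cost of this approach is an extra schema document to maintain in a namespace that you may not own.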

13.5.7.1 Using targetNamespace on element and attribute declarations

Starting in version 1.1, it is possible to restrict a type that has a different
target namespace, even if it uses qualified local declarations. This
is addressed by the use of a targetNamespace attribute, which can
appear on a local element declaration or local attribute declaration.
This is shown in Example 13–35, which is similar to Example 13–34
but with local declarations instead of global ones. The
targetNamespace attribute is used on the two element declarations
and one attribute declaration in the restricted type to indicate that these
names still refer to the http://datypic.com/prod namespace.

Example 13–35. Using targetNamespace on element and attribute
declarations

prod.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://datypic.com/prod"
elementFormDefault="qualified"
attributeFormDefault="qualified">
<xs:complexType name="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="size" type="xs:integer" minOccurs="0"/>
</xs:sequence>
<xs:attribute name="dept" type="xs:string"/>
</xs:complexType>
</xs:schema>

ord.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://datypic.com/ord"
xmlns:prod="http://datypic.com/prod"
elementFormDefault="qualified"
attributeFormDefault="qualified">
<xs:import namespace="http://datypic.com/prod"
schemaLocation="prod.xsd"/>
<xs:complexType name="RestrictedProductType">
<xs:complexContent>
<xs:restriction base="prod:ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"
targetNamespace="http://datypic.com/prod"/>
<xs:element name="name" type="xs:string"
targetNamespace="http://datypic.com/prod"/>
</xs:sequence>
<xs:attribute name="dept" type="xs:string" use="required"
targetNamespace="http://datypic.com/prod"/>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
</xs:schema>
13.6 | Type substitution 341

Without the targetNamespace attributes, this example would not
be a legal restriction because it would be trying to change the
namespaces of the elements and the attribute from
http://datypic.com/prod to http://datypic.com/ord.
Note that this technique is only allowed when restricting a type from
another namespace. It is not possible to use the targetNamespace
attribute generally to declare elements and attributes in a target
namespace other than that of the schema document.

13.6 Type substitution

One of the elegant features of derived types is that they can substitute
for their ancestor types in instances. In an instance, an element declared
to be of one type can actually have any type that either extends or
restricts it. Suppose we have a section of a purchase order that lists
products of various kinds. We want repeating product elements, but
we also want to allow different content models for each kind of product.
For example, a shirt may have a color and a size, in addition to the
normal product information.
Example 13–36 shows a definition of ShirtType that extends
ProductType. It adds the children size and color to the end of the
content model.

Example 13–36. A derived type
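
A minimal sketch of such a derivation, assuming a ProductType that contains only number and name, might look like the following. The occurrence constraints and the types of size and color are assumptions, as is the global product element declaration.

```xml
<!-- Assumed base type -->
<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

<!-- Extension: size and color are added after number and name -->
<xs:complexType name="ShirtType">
  <xs:complexContent>
    <xs:extension base="ProductType">
      <xs:sequence>
        <xs:element name="size" type="xs:integer" minOccurs="0"/>
        <xs:element name="color" type="xs:string" minOccurs="0"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

<!-- The element is declared with the base type; instances may
     substitute ShirtType by using xsi:type -->
<xs:element name="product" type="ProductType"/>
```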

Example 13–37 shows a valid instance of product. Instead of
ProductType, it has the type ShirtType which allows it to contain
the color element. It uses the xsi:type attribute to indicate the type
substitution. We could define an additional type for every kind of
product, each with a different content model.

Example 13–37. Substitution of ShirtType for ProductType

<items xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<product xsi:type="ShirtType">
<number>557</number>
<name>Short-Sleeved Linen Blouse</name>
<color>blue</color>
</product>
</items>

The xsi:type attribute is part of the XML Schema Instance
Namespace, which must be declared in the instance. This attribute
does not, however, need to be declared in the type definition for
product; a schema processor recognizes xsi:type as a special attribute
that may appear on any element.
13.7 | Controlling type derivation and substitution 343

13.7 Controlling type derivation and substitution

Type derivation is a powerful tool, but in some cases, you may want
to control the creation or substitution of derived types. Three properties
of complex types control their derivation:

# The final property limits the definition of derived types in
schemas.
# The block property limits the substitution of derived types
in instances.
# The abstract property forces the definition of derived types.

This section describes each of these three properties in detail.

13.7.1 final: Preventing complex type derivation

You may want to prevent the derivation of other complex types
from your type. This is accomplished using the final attribute, which
may have one of the following values:

# #all prevents any other types from extending or restricting
your type.
# extension prevents any other types from extending your type.
# restriction prevents any other types from restricting your
type.
# extension restriction and restriction extension
have the same effect as #all.
# "" (an empty string) means that there are no restrictions. This
value is useful for overriding the value of finalDefault, as
described below.
# If no final attribute is specified, it takes its value from
the finalDefault attribute of the schema element. (The finalDefault
attribute can also contain the values list and union, which are not
applicable to complex types; if these values are present, they are
ignored in this context.) If neither final nor finalDefault is
specified, there are no restrictions on derivation of that complex
type.

Example 13–38 shows the definition of a complex type that cannot
be restricted or extended by any other type.

Example 13–38. Preventing derivation

<xs:complexType name="ProductType" final="#all">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

13.7.2 block: Blocking substitution of derived types

As we saw in Section 13.6, derived types may substitute for
their ancestor types in an instance. While this is a valuable feature,
there are times when you only want to allow the original type to be
used. This is accomplished using the block attribute, which may have
one of the following values:

# #all prevents any derived types from substituting for your type
in instances.
# extension prevents any extensions of your type from
substituting for your type in instances.
# restriction prevents any restrictions of your type from
substituting for your type in instances.
# extension restriction and restriction extension
have the same effect as #all.
# "" (an empty string) means that there are no restrictions. This
value is useful for overriding the value of blockDefault, as
described below.
# If no block attribute is specified, it takes its value from
the blockDefault attribute of the schema element. If neither
block nor blockDefault is specified, there are no restrictions.
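
The interaction between the schema-level defaults and the per-type attributes can be sketched as follows. This is a minimal illustration rather than one of the numbered examples; the type names ItemType and OpenItemType are hypothetical, and the content models are placeholders.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           finalDefault="extension" blockDefault="#all">

  <!-- No final or block attribute: the defaults apply, so ItemType
       cannot be extended, and no derived type can substitute for it. -->
  <xs:complexType name="ItemType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Empty strings override both defaults: OpenItemType can be
       extended or restricted, and derived types can substitute for it. -->
  <xs:complexType name="OpenItemType" final="" block="">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```

Setting a restrictive default and opening up individual types with an empty string is one way to make derivation and substitution opt-in across a schema document.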

Example 13–39 shows a definition of ProductType that does not
allow extensions of the type to be used in its place.

Example 13–39. Preventing substitution of derived types

<xs:complexType name="ProductType" block="extension">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
<xs:element name="product" type="ProductType"/>

Defining ShirtType as an extension of this ProductType is still
completely legal. The block attribute does not prohibit extensions of
ProductType, just the substitution of the extensions in place of the
original type in the instance. Example 13–40 shows an illegal instance
where the element product is attempting to substitute ShirtType for
ProductType. This example would have been legal if the block
attribute had not been used.

Example 13–40. Illegal substitution of ShirtType

<product xsi:type="ShirtType">
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
  <color>blue</color>
</product>

13.7.3 Blocking type substitution in element declarations

You can also block type substitution for an element declaration that
uses the type, rather than the type itself. An element declaration can
also have the block attribute, with the same valid values as for
complexType.¹ If, in Example 13–39, the block="extension"
attribute had appeared in the product element declaration rather than
in the ProductType definition, the effect would have been the same
as far as the product instance elements are concerned. Other elements
using ProductType would then be free to substitute derived types.
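
As a sketch of that variation, the block attribute moves from the type definition to one element declaration. The second declaration, item, is a hypothetical name added only to show an element that remains open to substitution.

```xml
<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

<!-- product may not use xsi:type to substitute an extension of ProductType -->
<xs:element name="product" type="ProductType" block="extension"/>

<!-- item (hypothetical) has no block attribute, so derived types
     may still be substituted here -->
<xs:element name="item" type="ProductType"/>
```

Blocking at the element level suits cases where the type is shared but only some of the elements that use it need to be locked down.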

13.7.4 abstract: Forcing derivation

Abstract complex types are types that cannot be used in instances. They
exist solely as placeholders for their derived types. Example 13–41
shows our ProductType example as an abstract type.

1. The block attribute of element may also contain the value
substitution, as described in Section 16.7.2.

Example 13–41. An abstract type

<xs:complexType name="ProductType" abstract="true">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
<xs:element name="product" type="ProductType"/>

Note that product is declared to be of the type ProductType. This
is legal, but if a product element appears in an instance, it must
use the xsi:type attribute to indicate a type that is derived from
ProductType, as shown in Example 13–42.

Example 13–42. Legal instances of product and shirt

<product xsi:type="ShirtType">
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
  <color>blue</color>
</product>

<shirt>
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
  <color>blue</color>
</shirt>

Example 13–43 shows two illegal product elements that attempt
to use the type ProductType.

Example 13–43. Illegal uses of the abstract ProductType

<product>
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
</product>

<product xsi:type="ProductType">
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
</product>

Chapter 14: Assertions

Assertions are a powerful new feature in version 1.1 that allows you
to specify additional constraints using XPath 2.0. This addresses a
significant limitation in XML Schema 1.0 that prevented the
definition of co-constraints, where one data item affects the
validity of another. It also generally allows for much more complex
validation criteria to be expressed.
This chapter covers assertions, as well as conditional type assignment
which allows the type of an element to be determined by an XPath
expression on its attributes. Conditional type assignment is also new
in version 1.1. Although this feature is separate from assertions, it has
similar syntax and some overlapping use cases.

14.1 Assertions
Assertions are defined on types, rather than element or attribute
declarations, so they are shared across all elements or attributes
that have a particular type. Example 14–1 shows two types, one simple
and one complex, that have assertions. For SizeType, the assertion
tests that the value is not equal to zero. For ProductType, it tests
the validity of the product number, based on the department.

Example 14–1. Assertions on simple and complex types

<xs:simpleType name="SizeType">
  <xs:restriction base="xs:integer">
    <xs:assertion test="$value != 0"/>
  </xs:restriction>
</xs:simpleType>
<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="size" type="SizeType"/>
  </xs:sequence>
  <xs:attribute name="dept" type="xs:string"/>
  <xs:assert test="(@dept = 'ACC' and number > 500) or
                   (number &lt; 300)"/>
</xs:complexType>

As you can see, two different elements are used: assertion is used
in simple types (and in simple content), and assert is used in complex
types. Both assertion and assert have a test attribute that specifies
an XPath expression. The XPath expression returns a Boolean
(true/false) value. If the expression is true, the element or
attribute is valid with respect to the assertion. If it is false, it
is invalid.
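
To make the true/false outcome concrete, here are a few product fragments checked against the types in Example 14–1. The values are illustrative, and each fragment assumes a product element declared with type ProductType, which Example 14–1 itself does not show.

```xml
<!-- Valid: dept is ACC and number is greater than 500 -->
<product dept="ACC">
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
  <size>10</size>
</product>

<!-- Valid: number is less than 300, so the dept value does not matter -->
<product dept="WMN">
  <number>250</number>
  <name>Short-Sleeved Linen Blouse</name>
  <size>10</size>
</product>

<!-- Invalid: number is not less than 300 and dept is not ACC,
     so the assert on ProductType is false -->
<product dept="WMN">
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
  <size>10</size>
</product>

<!-- Invalid: the assertion on SizeType is false for a size of 0 -->
<product dept="ACC">
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
  <size>0</size>
</product>
```

The last two fragments fail for different reasons: one violates the assert on the complex type, the other the assertion facet on the simple type of size.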
Assertions are specified using XPath 2.0, which is a powerful language
that includes over a hundred built-in functions and many operators.
This chapter describes some of the XPath 2.0 functions, operators, and
expression syntax that are most useful for assertions, but it is by no
means complete. For a complete explanation of all XPath operators
and syntax, you can refer to the XML Path Language (XPath) 2.0
recommendation at www.w3.org/TR/xpath20.
Syntactically, any XPath 2.0 expression is allowed in an assertion.
However, one limitation of assertions is that your XPath expression
has to stay within the scope of the type itself. It can only access
attributes, content, and descendants of the element that has that
type. It cannot access the parent or other ancestor elements,
siblings, separate XML documents, or any other nondescendant elements.
This means that for cross-element validation, the assertion needs to
be specified on an ancestor type that contains all of the elements or
attributes mentioned in the assertion.

14.1.1 Assertions for simple types

Assertions in simple types are facets, and as such they appear alongside
all the other facets inside a restriction element. That facet is called
assertion, and its syntax is shown in Table 14–1.

Table 14–1 XSD Syntax: simple type assertion

Name
assertion

Parents
restriction

Attribute name          Type                       Description
id                      ID                         Unique ID.
test                    XPath expression           Assertion test.
xpathDefaultNamespace   anyURI |                   The default namespace
                        "##defaultNamespace" |     for XPath expressions.
                        "##targetNamespace" |
                        "##local"

Content
annotation?

Example 14–2 shows an assertion on a simple type. Simple type
assertions are generally less complicated than those for complex types
because there are no descendants, only a value to test. A special
built-in variable, called $value, is used to access that value. There
is no context item for a simple type assertion, so you cannot use a
period (.) to represent the current element or value¹ like you might
in some XPath expressions.

Example 14–2. An assertion on a simple type

<xs:simpleType name="SizeType">
  <xs:restriction base="xs:integer">
    <xs:assertion test="$value != 0"/>
  </xs:restriction>
</xs:simpleType>

The assertion facet can also be used inside the restriction element
for complex types with simple content, just like any other facet.
Example 14–3 shows two complex types with simple content, one
restricting the other by adding an assertion. However, if you need
to access the attributes of that type in the assertion, you should use an
assert instead, as shown later in Example 14–17.

Example 14–3. An assertion on the simple content of a complex type

<xs:complexType name="SizeType">
  <xs:simpleContent>
    <xs:extension base="xs:integer">
      <xs:attribute name="system" type="xs:string"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>
<xs:complexType name="RestrictedSizeType">
  <xs:simpleContent>
    <xs:restriction base="SizeType">
      <xs:assertion test="$value != 0"/>
    </xs:restriction>
  </xs:simpleContent>
</xs:complexType>

1. Except inside subexpressions, such as predicates.


You can specify multiple assertions on the same simple type, in which
case they must all return true for the element or attribute to be valid.
Values of type DepartmentCodeType in Example 14–4 must be
valid with respect to both specified assertions. Assertions can be
combined with other facets, in any order. In fact, it is recommended
that you continue to use other facets if they can express the
constraint. For example, use the length facet as shown rather than an
assertion with a test of string-length($value) = 3.

Example 14–4. A simple type with more than one assertion

<xs:simpleType name="DepartmentCodeType">
  <xs:restriction base="xs:token">
    <xs:assertion test="not(contains($value,'X'))"/>
    <xs:assertion test="substring($value,2,2) != '00'"/>
    <xs:length value="3"/>
  </xs:restriction>
</xs:simpleType>

14.1.1.1 Using XPath 2.0 operators

The XPath 2.0 language allows a number of operators in its syntax, for
example for performing comparisons and arithmetic operations.
Table 14–2 shows some of the operators that are likely to be used in
simple type assertions, along with examples.

Parentheses can be used in XPath to change the evaluation order of
these operators. For example, by default, and takes precedence over
or. The assertion in Example 14–5 uses parentheses around the first
two comparisons to change the evaluation order. Without the
parentheses, the second and third comparisons would have been combined
by and before evaluating the or.

Parentheses can also be used to create a sequence of multiple values
to test, as shown in the assertion in Example 14–6. The expression
evaluates to true if the value is any of the three strings listed.

Table 14–2 Common XPath 2.0 operators

Operator        Description            Example
=               equals                 $value = 'ABC'
!=              not equals             $value != 'ABC'
>, >=, <, <=    comparison             $value > 12
                                       $value <= 50
and             Boolean “and”          $value > 12 and $value <= 50
or              Boolean “or”           $value <= 12 or $value > 50
+               addition               $value + 2 > 12
-               subtraction            $value - 2 > 12
*               multiplication         $value * 2 > 12
div             division               $value div 2 > 12
mod             modulus (remainder     $value mod 2 = 0
                after division)

Example 14–5. Using parentheses to change evaluation order

<xs:simpleType name="SizeType">
  <xs:restriction base="xs:integer">
    <xs:assertion test="($value &lt; 12 or $value > 50)
                        and $value != 0"/>
  </xs:restriction>
</xs:simpleType>

Example 14–6. Using parentheses to create sequences

<xs:simpleType name="DepartmentCodeType">
  <xs:restriction base="xs:token">
    <xs:assertion test="$value = ('ACC','WMN','MEN')"/>
  </xs:restriction>
</xs:simpleType>

14.1.1.2 Using XPath 2.0 functions

XPath 2.0 includes over 100 built-in functions. Functions in XPath
are called using a syntax that is probably familiar from other
programming languages: the function name, followed by parentheses that
contain the arguments to the function separated by commas. Table 14–3
provides a sample of built-in functions that would commonly
be used in simple type assertions. For a complete list, refer to the
XQuery 1.0 and XPath 2.0 Functions and Operators recommendation
at www.w3.org/TR/xpath-functions.

Table 14–3 Common XPath 2.0 functions on single values

Function name                      Returns

String-related
string-length($arg)                The number of characters in the string.
substring($sourceString,           A substring of $sourceString, based on
  $startingLoc, $length)           a starting location and optional length.
substring-before($arg1, $arg2)     A substring of $arg1 that appears before
                                   the first occurrence of $arg2.
substring-after($arg1, $arg2)      A substring of $arg1 that appears after
                                   the first occurrence of $arg2.
upper-case($arg)                   $arg converted to upper case.
lower-case($arg)                   $arg converted to lower case.
normalize-space($arg)              $arg with whitespace normalized as if it
                                   were a token value.
contains($arg1, $arg2)             Whether $arg1 contains the string $arg2.
starts-with($arg1, $arg2)          Whether $arg1 starts with the string
                                   $arg2.
ends-with($arg1, $arg2)            Whether $arg1 ends with the string $arg2.
matches($input, $pattern,          Whether $input matches a regular
  $flags)                          expression pattern, with an optional set
                                   of flags controlling how to interpret the
                                   regular expression.

Number-related
round($arg)                        $arg rounded to the nearest integer.
round-half-to-even($arg,           $arg rounded with the specified
  $precision)                      precision.

Boolean-related
not($arg)                          True if $arg is false, false if $arg is
                                   true.
true()                             A Boolean true value.
false()                            A Boolean false value.
exists($arg)                       True if $arg is not the empty sequence.
empty($arg)                        True if $arg is the empty sequence.

These functions are all built in, and you do not need to use
namespace prefixes on their names. Your schema processor may support
additional implementation-defined functions that are in other
namespaces. Typically, in simple type assertions, you will be passing
$value as one of the arguments. Table 14–4 shows some example tests
for simple type assertions that use common XPath functions.

Note that the matches function interprets regular expressions
slightly differently from the pattern facet. The value of a pattern
facet is the regular expression for the whole string, with implied anchors
at the beginning and the end. The matches function, on the other
hand, tests whether a string contains any substring that matches the
pattern. To indicate that a pattern should match the start and/or end
of the entire string, anchors ^ (for the start of a string) and $ (for the
end of the string) must be used.

Table 14–4 Examples of XPath 2.0 function calls

Example                              Explanation
not($value = ('ABC','DEF','GHI'))    Value is not one of ABC, DEF, or GHI.
substring($value,2,2) != 'XX'        Value does not have XX in the second
                                     position.
not(upper-case($value) = 'ABC')      Value is not ABC in upper, lower, or
                                     mixed case.
not(starts-with($value,'ABC'))       Value does not start with ABC.
not(matches($value,'^ABC'))          Value does not match the regular
                                     expression ^ABC.
matches($value,'^ABC', 'i')          Value matches the case-insensitive
                                     regular expression ^ABC.
normalize-space($value) != ''        Value is not all whitespace characters.

The examples in the table focus on assertions that cannot be
expressed with other facets. For example, to simply test whether a
value starts with ABC, you could use a pattern, as in <xs:pattern
value="ABC.*"/>. However, it usually requires an assertion to express
that a value must not match a pattern or an enumeration, or to
indicate that processing should be case-insensitive.
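
The difference between anchored and unanchored patterns can be sketched in a single type. This is an illustration only: the code format (ACC followed by two digits) is an assumption made up for the example, not a rule taken from the text.

```xml
<xs:simpleType name="DepartmentCodeType">
  <xs:restriction base="xs:token">
    <!-- Anchored with ^ and $: the whole value must be ACC plus two
         digits; the 'i' flag makes the letters case-insensitive. -->
    <xs:assertion test="matches($value, '^ACC[0-9]{2}$', 'i')"/>
    <!-- Unanchored: rejects any value that contains 00 anywhere. -->
    <xs:assertion test="not(matches($value, '00'))"/>
  </xs:restriction>
</xs:simpleType>
```

Under these assumptions, ACC01 and acc42 are valid, ACC1 and XACC01 fail the first assertion, and ACC00 fails the second.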

14.1.1.3 Types and assertions

XPath 2.0 is a type-aware language, meaning that the processor pays
attention to the types of values when performing operations on them.
It is not valid in XPath 2.0 to compare an integer to a string, at least
not without converting one value to the other’s type. Likewise, the
built-in functions require arguments to be of a specific type. For
example, the substring function will not accept an integer as the
first argument, because it is expecting a string.

The processor is getting the information about the type of the value
from the simple type definition itself. For example, if the simple type
is a restriction of integer, then the value will be treated like an integer.
Example 14–7 shows three simple types that have type errors in their
assertions.

1. SizeType is in error because the value is an integer and it is
being passed to the string-length function which expects a
string.
2. DepartmentCodeType is in error because the value is a string
but it is being compared to a number.
3. EffectiveDateTimeType is in error because the value is a
date/time but it is being compared to a string.

Example 14–7. Assertions with type errors

<xs:simpleType name="SizeType">
  <xs:restriction base="xs:integer">
    <xs:assertion test="string-length($value) &lt; 2"/>
  </xs:restriction>
</xs:simpleType>
<xs:simpleType name="DepartmentCodeType">
  <xs:restriction base="xs:string">
    <xs:assertion test="$value != 001"/>
  </xs:restriction>
</xs:simpleType>
<xs:simpleType name="EffectiveDateTimeType">
  <xs:restriction base="xs:dateTime">
    <xs:assertion test="$value > '2000-01-01T12:00:00'"/>
  </xs:restriction>
</xs:simpleType>

Some processors will treat these type errors like dynamic errors,
meaning that they are not reported as errors in the schema. Instead,
dynamic errors simply cause the assertion to return false, rendering the
element or attribute in question invalid. Most processors will issue
warnings in these cases, though. XPath syntax errors and other static
errors, on the other hand, will be flagged as errors in the schema by
your processor.
To correct type errors like these, one should consider whether the
simple types are being derived from the correct primitive types to start
with. If you are performing arithmetic operations on a value, perhaps
it should have a numeric type rather than a string type. For these
examples, let’s assume that the primitive types were chosen correctly.
SizeType is really trying to limit the size of the integer. In this case,
it makes sense to change it to use one of the bounds facets to limit
the value of the integer, instead of trying to constrain its string
representation.
For DepartmentCodeType, both operands in the comparison need
to have the same type (or have types derived from each other). You
could convert the $value to a numeric type, but the best approach
here is to put quotes around the 001 to make it a string. Comparing
them as strings takes into account the leading zeroes, which may be
significant in a string-based department code.
For EffectiveDateTimeType, as with the previous example, the
operands need to be of comparable types. We could convert $value
to a string, but then it would compare the values as strings instead of
date/time values, which would mean that time zones may not be taken
into account correctly. Instead, it is preferable to convert the second
operand to a date/time type. This is done in XPath 2.0 using a type
constructor, which is a special kind of function whose name is the
appropriate built-in type name. It accepts a single argument, the value
to be converted. For example, xs:dateTime('2000-01-01T12:00:00')
converts the string to a date/time.
Example 14–8 shows our three examples, corrected to reflect the
types of the values.
In addition to the type constructor functions, there is a string
function that converts a value to a string, and a number function that
converts a value to a floating-point number (double). Both of these
functions also take a single argument, the value to be converted.
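
As a rough sketch of those two conversion functions, the first two assertions from Example 14–7 could be made type-correct by converting $value explicitly. This is shown only to illustrate string and number; the corrections in Example 14–8 remain the preferable approach.

```xml
<xs:simpleType name="SizeType">
  <xs:restriction base="xs:integer">
    <!-- string() converts the integer, so string-length accepts it.
         Note that a minus sign counts as a character. -->
    <xs:assertion test="string-length(string($value)) &lt; 2"/>
  </xs:restriction>
</xs:simpleType>
<xs:simpleType name="DepartmentCodeType">
  <xs:restriction base="xs:string">
    <!-- number() converts the string to a double, so 001 and 1 are
         treated alike; a non-numeric code converts to NaN, which is
         not equal to 1, so it passes this test. -->
    <xs:assertion test="number($value) != 1"/>
  </xs:restriction>
</xs:simpleType>
```

The second assertion discards leading zeroes, which is exactly why comparing department codes as strings is usually the better choice.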

Example 14–8. Assertions with corrected type errors

<xs:simpleType name="SizeType">
  <xs:restriction base="xs:integer">
    <xs:maxExclusive value="100"/>
  </xs:restriction>
</xs:simpleType>
<xs:simpleType name="DepartmentCodeType">
  <xs:restriction base="xs:string">
    <xs:assertion test="$value != '001'"/>
  </xs:restriction>
</xs:simpleType>
<xs:simpleType name="EffectiveDateTimeType">
  <xs:restriction base="xs:dateTime">
    <xs:assertion test="$value >
                        xs:dateTime('2000-01-01T12:00:00')"/>
  </xs:restriction>
</xs:simpleType>

14.1.1.4 Inheriting simple type assertions

Like other facets, assertions are inherited when a simple type restricts
another simple type. Any assertions that are specified in the restriction
are added to the constraints on that value. In other words, a value
must conform to the assertions on its simple type and on any
other simple types it restricts, directly or indirectly. Values of
type NonOverheadDepartmentCodeType in Example 14–9 must
conform both to the assertion in that type and to the one specified in
DepartmentCodeType.

Example 14–9. A simple type with inherited assertions

<xs:simpleType name="DepartmentCodeType">
  <xs:restriction base="xs:token">
    <xs:assertion test="not(contains($value,'X'))"/>
    <xs:length value="3"/>
  </xs:restriction>
</xs:simpleType>
<xs:simpleType name="NonOverheadDepartmentCodeType">
  <xs:restriction base="DepartmentCodeType">
    <xs:assertion test="substring($value,2,2) != '00'"/>
  </xs:restriction>
</xs:simpleType>

14.1.1.5 Assertions on list types

In most cases, $value evaluates to a single atomic value. However,
when the value has a list type and consists of multiple items, $value
is a sequence of multiple atomic values. You can still do a comparison,
such as $value > 2, but that will return true if at least one of the values
in the list is greater than 2. You can refer to specific values in the list
using numeric predicates, for example $value[1] to get the first
value in the list. Example 14–10 shows an assertion on a list type stating
that the first item in the list must be 0.

When working with multiitem sequences, there are a number of
additional XPath functions that are useful. They are listed in
Table 14–5. As we will see later, these functions are also useful on
complex type assertions when there are repeating children, which are
another example of multiitem sequences.

Example 14–10. An assertion on a list type

<xs:simpleType name="SizeListType">
  <xs:restriction>
    <xs:simpleType>
      <xs:list itemType="xs:integer"/>
    </xs:simpleType>
    <xs:assertion test="$value[1] = 0"/>
  </xs:restriction>
</xs:simpleType>

Table 14–5 Common XPath 2.0 functions on multiitem sequences

Function name            Returns
count($arg)              The number of items in $arg.
distinct-values($arg)    A sequence of the unique values in $arg.
avg($arg)                The average of the values in $arg.
max($arg)                The maximum of the values in $arg.
min($arg)                The minimum of the values in $arg.
sum($arg)                The sum of the values in $arg.
position()               The position of the current item in the current
                         context (typically used inside a predicate).
last()                   The number of items in the current context
                         (typically used inside a predicate to obtain the
                         last item).

Table 14–6 shows some additional examples of XPath tests that are
appropriate for list types.

Table 14–6 Examples of assertion tests on list types

Example                            Explanation
count($value) > 2                  There are more than two items in the
                                   list.
$value > 12                        At least one of the values is greater
                                   than 12.
not($value > 12)                   None of the values is greater than 12.
sum($value) < 20                   The sum of the values in the list is
                                   less than 20.
$value[1] = 'ABC'                  The first value in the list is equal
                                   to ABC.
$value[last()] = 'ABC'             The last value in the list is equal to
                                   ABC.
$value[position() > 1] = 'ABC'     At least one of the values after the
                                   first one in the list is equal to ABC.
count($value) =                    No values in the list are repeated.
  count(distinct-values($value))

The assertions in Table 14–6 apply to the list as a whole. If you want
to constrain every value in the list, it makes more sense to put the
assertion on the item type instead. Example 14–11 is a simple type
SizeListType that has one assertion on the item type of the list
(testing that the value is less than 12) and one assertion on the list
itself (testing the number of items in the list).

Example 14–11. Assertions on a list type and its item type

<xs:simpleType name="SizeListType">
  <xs:restriction>
    <xs:simpleType>
      <xs:list>
        <xs:simpleType>
          <xs:restriction base="xs:integer">
            <xs:assertion test="$value &lt; 12"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:list>
    </xs:simpleType>
    <xs:assertion test="count($value) > 2"/>
  </xs:restriction>
</xs:simpleType>

14.1.2 Assertions for complex types

For assertions on complex types, the assert element is used instead
of assertion. The assert element, whose syntax is shown in
Table 14–7, can appear in a complex type extension or restriction,
or can appear directly as a child of complexType if neither
simpleContent nor complexContent is used.

Example 14–12 shows a constraint where the valid values of a child
element (number) depend on the value of an attribute (dept).
Constraints that cross multiple elements or attributes are sometimes
called co-constraints and are a common use case for complex type
assertions. If it were just a constraint on the number child
individually, for example that it must be greater than 500 or less
than 300, the assertion could have been put on the simple type of the
number element. However, an assertion on the number element’s simple
type would not have access to the dept attribute since it is out of
scope, so the assertion must be moved up to the product parent.

Table 14–7 XSD Syntax: complex type assertion

Name
assert

Parents
complexType, extension, restriction

Attribute name Type Description

Example 14–12. An assertion on a complex type

<xs:element name="product" type="ProductType"/>
<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="size" type="xs:integer"/>
  </xs:sequence>
  <xs:attribute name="dept" type="xs:string"/>
  <xs:assert test="(@dept = 'ACC' and number > 500) or
                   (number &lt; 300)"/>
</xs:complexType>

The same XPath operators and functions described in
Sections 14.1.1.1 and 14.1.1.2 can be used in these complex type
assertions. One difference is that in complex content assertions, you
do not use the $value variable (since there is no simple value) but
instead use element and attribute names to indicate the values
to be tested. Attribute names are preceded by an at sign (@) in XPath,
as shown in the reference to @dept.

14.1.2.1 Path expressions

When element or attribute names are used, they are known as path
expressions, and they are evaluated relative to the element being
validated. In Example 14–12, the assertion is looking for a dept that
is an attribute of product (or any other element of type ProductType),
and a number element that is a direct child of product.
Path expressions can also have multiple steps, separated by forward
slashes, that access elements and attributes further down in the element
content. Example 14–13 shows a complex type CatalogType that is
one level up from ProductType. To access the number child of
product from there, it uses the multistep path product/number.
Relative to catalog, product/number brings back multiple number
elements and passes them all as a sequence to the max function, then
compares that maximum to the maxNumber attribute of catalog.
As an alternative to specifying the exact path down to a descendant
element, you can use the shortcut .// before an element name to
indicate a descendant. For example, .//number, relative to catalog, will
bring back all number elements that are descendants anywhere within
the catalog, at any level.
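
As a sketch of the descendant shortcut, the CatalogType assertion from Example 14–13 could be written with .//number instead of product/number. The two forms are equivalent only while number elements occur nowhere else beneath catalog; that equivalence is an assumption about the content model, not something the schema enforces.

```xml
<xs:complexType name="CatalogType">
  <xs:sequence>
    <xs:element name="product" type="ProductType"
                maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute name="maxNumber" type="xs:integer"/>
  <!-- .//number selects every number descendant of catalog, at any depth -->
  <xs:assert test="not(max(.//number) > @maxNumber)"/>
</xs:complexType>
```

The explicit path product/number is the safer choice when the content model may later gain other elements named number.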
Path expressions often involve predicates, which are Boolean
expressions in square brackets that filter the elements and attributes
returned by the expression. An element to which a predicate is applied
is only returned if the Boolean expression returns true. For example,
product[number > 500] will test for products whose number is
greater than 500. Table 14–8 shows some examples of assertions using
predicates that could apply to CatalogType from Example 14–13.

Example 14–13. An assertion with a multistep path

<xs:element name="catalog" type="CatalogType"/>
<xs:complexType name="CatalogType">
  <xs:sequence>
    <xs:element name="product" type="ProductType"
                maxOccurs="unbounded"/>
  </xs:sequence>
  <xs:attribute name="maxNumber" type="xs:integer"/>
  <xs:assert test="not(max(product/number) > @maxNumber)"/>
</xs:complexType>
<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="size" type="xs:integer"/>
  </xs:sequence>
  <xs:attribute name="dept" type="xs:string"/>
</xs:complexType>

Table 14–8 Examples of assertion with predicates

Example                          Explanation
product[number < 500]            There is at least one product whose
                                 number is less than 500.
not(product[number < 500])       There is no product whose number is
                                 less than 500.
product[number < 500 and         There is at least one product whose
  @dept='ACC']                   number is less than 500 and whose
                                 department is ACC.
product[2][number < 500]         The second product has a number that
                                 is less than 500.
product[2][number]               The second product has a number child
                                 (regardless of value).
product[last()][number < 500]    The last product has a number that is
                                 less than 500.

For the second example in Table 14–8, you might think that you
can use product[number >= 500] to test that every product number is
500 or greater. However, that will return true if there is at least one
product number that is 500 or greater; it does not ensure that all of
the products have such a number. Using the not function,
as shown in the table, works because it tests that there aren’t any that
are less than 500.
You may have noticed that most of the examples in the table actually
return product elements rather than a Boolean true/false value.
The results of XPaths used in assertions are automatically converted to
a Boolean value. A sequence of one or more elements or attributes is
treated as a “true” value, and an empty sequence (no elements or
attributes) is treated as a “false” value.
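
To make the conversion concrete, here is a small instance that could be validated against the declarations in Example 14–13. It is only a sketch: the product names and numbers are invented for illustration. The comments record how each expression from Table 14–8 would convert to a Boolean value for this particular catalog.

```xml
<!-- product[number < 500]                 : selects the second product, so true  -->
<!-- not(product[number < 500])            : false                                -->
<!-- product[number < 500 and @dept='ACC'] : selects nothing, so false            -->
<!-- product[2][number < 500]              : selects the second product, so true  -->
<!-- product[2][number]                    : selects the second product, so true  -->
<!-- product[last()][number < 500]         : selects the second product, so true  -->
<catalog maxNumber="600">
  <product dept="ACC">
    <number>557</number>
    <name>Hypothetical Belt</name>
    <size>10</size>
  </product>
  <product dept="WMN">
    <number>443</number>
    <name>Hypothetical Blouse</name>
    <size>12</size>
  </product>
</catalog>
```

The assertion in Example 14–13 itself is satisfied here, because the highest number (557) is not greater than the maxNumber attribute (600).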

14.1.2.2 Conditional expressions

The XPath 2.0 language also includes an if-then-else construct, known
as a conditional expression, that is very useful for co-constraints. It uses
if, then, and else keywords and the else clause is always required.
The other syntactic requirement is that the if expression has to be in
parentheses. Conditional expressions can be nested so that one condi-
tional expression is embedded inside the then or else clause of another
conditional expression.
Example 14–14 shows such an assertion, which tests for different
values of the dept attribute to determine the valid range of the number
child. Since the else clause is always required, the last clause simply
calls the false function, which means that if the department is
not one of the three specified departments, the product is not valid,
regardless of the number child.

Example 14–14. An assertion using conditional expressions

<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="size" type="xs:integer"/>
  </xs:sequence>
  <xs:attribute name="dept" type="xs:string"/>
  <xs:assert test="if (@dept = 'ACC')
                   then number > 500
                   else if (@dept = 'WMN')
                   then number &lt;= 300 and number > 200
                   else if (@dept = 'MEN')
                   then number &lt; 200
                   else false()"/>
</xs:complexType>
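
The following instance fragment sketches how the nested conditions in Example 14–14 play out. It assumes a product element declared with ProductType; the products wrapper element and all of the values are hypothetical and serve only to hold several cases in one document.

```xml
<products>
  <!-- Valid: dept is ACC and 557 is greater than 500 -->
  <product dept="ACC">
    <number>557</number>
    <name>Hypothetical Belt</name>
    <size>10</size>
  </product>
  <!-- Valid: dept is WMN and 250 is greater than 200 and at most 300 -->
  <product dept="WMN">
    <number>250</number>
    <name>Hypothetical Blouse</name>
    <size>12</size>
  </product>
  <!-- Invalid: dept is MEN, which requires a number less than 200 -->
  <product dept="MEN">
    <number>250</number>
    <name>Hypothetical Shirt</name>
    <size>15</size>
  </product>
  <!-- Invalid: KID matches none of the tests, so the final else
       clause returns false -->
  <product dept="KID">
    <number>150</number>
    <name>Hypothetical Cap</name>
    <size>4</size>
  </product>
</products>
```

A product with no dept attribute at all is also invalid under this assertion, since none of the three comparisons is true and the final else clause applies.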

14.1.2.3 Assertions in derived complex types

Assertions are inherited by derived complex types. Any assertions that
are specified in an extension or restriction are added to the constraints
on that type. In other words, an element must conform to the
assertions on its complex type and on any other complex types from
which its type is derived. Elements of type ExtendedProductType
in Example 14–15 must conform both to the assertion in that type and
the one specified in ProductType.

Example 14–15. Assertions in complex type extension

<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
  </xs:sequence>
  <xs:attribute name="dept" type="xs:string"/>
  <xs:assert test="(@dept = 'ACC' and number > 500) or
                   (number &lt; 300)"/>
</xs:complexType>
<xs:complexType name="ExtendedProductType">
  <xs:complexContent>
    <xs:extension base="ProductType">
      <xs:sequence>
        <xs:element name="size" type="xs:integer" minOccurs="0"/>
      </xs:sequence>
      <xs:assert test="if (@dept = 'ACC')
                       then not(size)
                       else true()"/>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
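
As an illustration of the combined effect, the instance fragment below assumes a product element declared with ExtendedProductType from Example 14–15. The products wrapper element and the values are hypothetical. Each product has to satisfy both the inherited assertion and the one added by the extension.

```xml
<products>
  <!-- Valid: the base assertion holds (ACC and 557 > 500) and
       the extension assertion holds (no size child) -->
  <product dept="ACC">
    <number>557</number>
    <name>Hypothetical Belt</name>
  </product>
  <!-- Invalid: the base assertion holds, but the extension
       assertion fails because an ACC product has a size -->
  <product dept="ACC">
    <number>557</number>
    <name>Hypothetical Belt</name>
    <size>10</size>
  </product>
  <!-- Valid: the base assertion holds (250 is less than 300) and
       the extension assertion returns true for non-ACC products -->
  <product dept="WMN">
    <number>250</number>
    <name>Hypothetical Blouse</name>
    <size>12</size>
  </product>
  <!-- Invalid: the inherited base assertion fails (not ACC, and
       557 is not less than 300) -->
  <product dept="WMN">
    <number>557</number>
    <name>Hypothetical Blouse</name>
  </product>
</products>
```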

Assertions are also inherited in restrictions. Elements of type
RestrictedProductType in Example 14–16 must conform both to
the assertion in that type and the one specified in ProductType.
Unlike the content model, which needs to be respecified in the restricted
type definition, assertions are inherited automatically.

Example 14–16. Assertions in complex type restriction

<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string" minOccurs="0"/>
    <xs:element name="size" type="xs:integer" minOccurs="0"/>
  </xs:sequence>
  <xs:attribute name="dept" type="xs:string"/>
  <xs:assert test="(@dept = 'ACC' and number > 500) or
                   (number &lt; 300)"/>
</xs:complexType>
<xs:complexType name="RestrictedProductType">
  <xs:complexContent>
    <xs:restriction base="ProductType">
      <xs:sequence>
        <xs:element name="number" type="xs:integer"/>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="size" type="xs:integer" minOccurs="0"/>
      </xs:sequence>
      <xs:attribute name="dept" type="xs:string"
                    use="required"/>
      <xs:assert test="if (@dept = 'ACC')
                       then not(size)
                       else true()"/>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>

When a complex type with simple content extends a simple type, it
can use assert to add assertions to the simple type. This is useful as
an alternative to using assertion to restrict the content type, because
the assert allows access to the attributes while the assertion doesn’t.
The $value variable can be used in this case; just like with simple
types, $value will contain the content of the element, with an
appropriate data type. Example 14–17 shows an assertion that tests both
the system attribute and the value of the element.

Example 14–17. An assertion on a complex type with simple content

<xs:complexType name="SizeType">
  <xs:simpleContent>
    <xs:extension base="xs:integer">
      <xs:attribute name="system" type="xs:string"/>
      <xs:assert test="if (@system='US')
                       then $value &lt; 20
                       else $value >= 20"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>
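
A few instance elements show how the attribute and the content interact in Example 14–17. This sketch assumes a size element declared with SizeType; the sizes wrapper element and the values are hypothetical.

```xml
<sizes>
  <!-- Valid: system is US and 10 is less than 20 -->
  <size system="US">10</size>
  <!-- Valid: system is not US and 38 is at least 20 -->
  <size system="EU">38</size>
  <!-- Invalid: system is US but 38 is not less than 20 -->
  <size system="US">38</size>
  <!-- Invalid: with no system attribute the else clause applies,
       and 10 is not at least 20 -->
  <size>10</size>
</sizes>
```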

14.1.3 Assertions and namespaces

If your schema has a target namespace, it is necessary to correctly prefix
the element and attribute names used in XPath expressions.
Example 14–18 shows a schema with a target namespace as well as a
namespace declaration that maps that namespace to the prefix prod. The
element names used in the assertion XPaths are then prefixed with
prod to indicate that they are in that namespace. Otherwise, the
processor would be looking for those elements in no namespace.

Example 14–18. Assertions using prefixed element names

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://datypic.com/prod"
           xmlns:prod="http://datypic.com/prod"
           elementFormDefault="qualified">
  <xs:element name="product" type="prod:ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="size" type="xs:string" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="dept" type="xs:string"/>
    <xs:assert test="(@dept = 'ACC' and prod:number > 500) or
                     (prod:number &lt; 300)"/>
    <xs:assert test="if (@dept = 'ACC')
                     then not(prod:size)
                     else true()"/>
  </xs:complexType>
</xs:schema>

Note that elementFormDefault is set to qualified,
which is what puts the locally declared number and size in the target
namespace. Otherwise, you wouldn’t need to prefix their names in the
XPath, since they would be unqualified. For more information on
qualified and unqualified element names, see Section 6.3 on p. 98.
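
For reference, an instance that would satisfy both assertions in Example 14–18 might look like the following sketch (the values are invented). The child elements are in the target namespace, which is why the XPaths refer to prod:number and prod:size, while the dept attribute is in no namespace and is written without a prefix in both the instance and the XPaths.

```xml
<prod:product xmlns:prod="http://datypic.com/prod" dept="ACC">
  <prod:number>557</prod:number>
  <prod:name>Hypothetical Belt</prod:name>
</prod:product>
```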

14.1.3.1 Using xpathDefaultNamespace

You might expect to be able to declare a default namespace, such as
xmlns="http://datypic.com/prod", to avoid having to prefix
the element names. However, regular default namespace declarations
do not apply to XPath expressions. You can, however, use an
xpathDefaultNamespace attribute to designate the default namespace
for all unprefixed element names that are used in the XPath. As with
regular default namespace declarations, xpathDefaultNamespace
does not affect attribute names.
Example 14–19 uses the xpathDefaultNamespace attribute on
the schema element. This means that the element names number
and size in the XPaths are interpreted as being in the
http://datypic.com/prod namespace. The processor does not look for
the dept attribute in that namespace. This is appropriate since
attributeFormDefault defaults to unqualified, meaning
that locally declared attributes are in no namespace.

Example 14–19. Assertions using xpathDefaultNamespace

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://datypic.com/prod"
           xmlns:prod="http://datypic.com/prod"
           elementFormDefault="qualified"
           xpathDefaultNamespace="http://datypic.com/prod">
  <xs:element name="product" type="prod:ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="size" type="xs:string" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="dept" type="xs:string"/>
    <xs:assert test="(@dept = 'ACC' and number > 500) or
                     (number &lt; 300)"/>
    <xs:assert test="if (@dept = 'ACC')
                     then not(size)
                     else true()"/>
  </xs:complexType>
</xs:schema>

Instead of containing a specific namespace name, the
xpathDefaultNamespace attribute can contain one of three special
keywords:

# ##targetNamespace, indicating that the default XPath
  namespace is the same as the target namespace
# ##defaultNamespace, indicating that the default XPath
  namespace is the namespace that is declared as the default
  (with an xmlns= attribute)
# ##local, indicating that there is no default XPath namespace

In Example 14–19, changing the value to ##targetNamespace
would have the same meaning, since http://datypic.com/prod is
the target namespace.
It is most convenient to specify xpathDefaultNamespace on the
schema element, in which case it applies to all XPath expressions
in the schema document. It can also be specified on (and is relevant
to) the following elements:

# The assert and assertion elements, where it affects the
  test attribute
# The alternative element, where it affects the test attribute
# The selector and field elements, where it affects the xpath
  attribute

If xpathDefaultNamespace does not appear on one of these elements,
the value is taken from the schema. If no value is provided for
the schema, the default value is no namespace, meaning that unprefixed
element names are interpreted as not being in a namespace.
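
The scoping rules above can be sketched with a schema that sets xpathDefaultNamespace on a single assert rather than on schema. This is an illustrative variation on the earlier examples, not a listing from the book; the two assertions and their limits are invented.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://datypic.com/prod"
           xmlns:prod="http://datypic.com/prod"
           elementFormDefault="qualified">
  <xs:element name="product" type="prod:ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="dept" type="xs:string"/>
    <!-- For this test only, the unprefixed name number is
         interpreted as being in the target namespace -->
    <xs:assert test="number > 500"
               xpathDefaultNamespace="##targetNamespace"/>
    <!-- Nothing is specified here or on xs:schema, so unprefixed
         names would be in no namespace; the prefix is needed -->
    <xs:assert test="prod:number &lt; 1000"/>
  </xs:complexType>
</xs:schema>
```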

14.2 Conditional type assignment

Another new feature in XML Schema 1.1 is conditional type assignment,
which allows for the type of an element to be assigned based on the
values and/or presence of its attributes. A set of type alternatives
is specified using alternative elements, which appear as children
of the element declaration.

14.2.1 The alternative element

The syntax of an alternative element is shown in Table 14–9. It
has a test attribute that specifies the condition under which that type
is selected, in the form of an XPath 2.0 expression. It also indicates
the type if this condition is true, which is signified either by a type
attribute or an anonymous simpleType or complexType child.

Table 14–9 XSD Syntax: type alternative

Name
alternative

Parents
element

Attribute name         Type                    Description
id                     ID                      Unique ID.
test                   limited XPath 2.0       The condition under which this
                       expression              alternative applies.
type                   QName                   The designated type. Either a type
                                               attribute or a simpleType or
                                               complexType child is required.
xpathDefaultNamespace  anyURI |                The default namespace for XPath
                       "##defaultNamespace" |  expressions, see Section 14.1.3.1.
                       "##targetNamespace" |
                       "##local"

Content
annotation?, (simpleType | complexType)?

14.2.2 Specifying conditional type assignment

Example 14–20 shows an example of conditional type assignment
where there are three type alternatives:

1. The first alternative indicates that if the value of the dept
   attribute is ACC, the type assigned to the element declaration
   is AccessoryType.
2. The second alternative indicates that if the value of the
   dept attribute is either WMN or MEN, the type assigned is
   ClothingType.
3. The third alternative has no test attribute, indicating that
   ProductType is the default type if neither of the two other
   alternatives applies.

Example 14–20. Conditional type assignment with default type

<xs:element name="product">
  <xs:alternative test="@dept='ACC'" type="AccessoryType"/>
  <xs:alternative test="@dept='WMN' or @dept='MEN'"
                  type="ClothingType"/>
  <xs:alternative type="ProductType"/>
</xs:element>
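
The instance fragment below sketches which type each product element would be assigned under Example 14–20. The products wrapper element is hypothetical, and element content is omitted because the three type definitions are not shown here.

```xml
<products>
  <!-- The first test is true: AccessoryType -->
  <product dept="ACC"/>
  <!-- The first test is false, the second is true: ClothingType -->
  <product dept="MEN"/>
  <!-- Neither test is true: the default, ProductType -->
  <product dept="KID"/>
  <!-- No dept attribute, so neither test is true: ProductType -->
  <product/>
</products>
```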

The processor will run through the alternatives and choose the first
one in order whose test returns true. If none of the tests returns true,
and there is a default type specified by an alternative with no test
attribute, as there is in Example 14–20, that alternative indicates the
type.
It is also possible to use type alternatives even though you have
already declared a type in the usual way, giving element a type attribute
or a simpleType or complexType child. An example is shown in
Example 14–21, where the type attribute is used on element to
assign the type ProductType to the element.

Example 14–21. Conditional type assignment with declared type

<xs:element name="product" type="ProductType">
  <xs:alternative test="@dept='ACC'" type="AccessoryType"/>
  <xs:alternative test="@dept='WMN' or @dept='MEN'"
                  type="ClothingType"/>
</xs:element>

This is saying that ProductType is the type for product unless one
of the alternatives applies. It is similar to the previous example, but
defining it this way comes with the additional constraint that the type
alternatives must be derived from the declared type. In this case,
AccessoryType and ClothingType must be derived (directly or
indirectly) from ProductType.
A third possibility is that neither a declared type nor a default type
is specified, as in Example 14–22. In that case, if no alternatives apply,
a product element can contain any well-formed XML; its type is
anyType.

Example 14–22. Conditional type assignment with no default

<xs:element name="product">
  <xs:alternative test="@dept='ACC'" type="AccessoryType"/>
  <xs:alternative test="@dept='WMN' or @dept='MEN'"
                  type="ClothingType"/>
</xs:element>

14.2.3 Using XPath in the test attribute

Only a very small subset of the XPath 2.0 syntax is allowed in the test
attribute by default, although some implementations may choose to
support a more complete subset. The only XPath functions and
operators that are allowed are:

# and and or Boolean operators
# Comparison operators (=, !=, <, <=, >, >=)
# The not function
# The type constructor functions

In addition, the XPath expression can only access the attributes of
the element being validated. It cannot access its parent or ancestors,
and it cannot even access its children or descendants like assertions can.
Additional example values for the test attribute are shown in
Table 14–10.

Table 14–10 Examples of type alternative tests

Example                        Explanation
@foo                           The foo attribute exists.
not(@foo)                      The foo attribute does not exist.
@foo = 'yes'                   The foo attribute exists and contains a value
                               equal to yes.
@foo != 'yes'                  The foo attribute exists and contains a value
                               equal to something other than yes.
not(@foo = 'yes')              The foo attribute exists and contains a value
                               equal to something other than yes, or the foo
                               attribute does not exist.
@foo = 'yes' and @bar = 'A'    The foo attribute exists and contains a value
                               equal to yes, and the bar attribute exists and
                               contains a value equal to A.
@foo = 'yes' or @bar = 'A'     The foo attribute exists and contains a value
                               equal to yes, or the bar attribute exists and
                               contains a value equal to A.
@foo != @bar                   The foo and bar attributes both exist and contain
                               different values.
@num > 12                      The num attribute exists and contains a value
                               greater than 12.
xs:integer(@num) >             The num attribute contains a value that is greater
xs:integer(@maxNum)            than the value of the maxNum attribute, when they
                               are compared as integers.

The last example in the table makes use of the integer type
constructor function to ensure that the two values are being compared as
numbers. Otherwise, they would be compared as strings, and a string
100 is considered to be less than a string 99.
This highlights an important difference between assertions and
conditional type assignment with regard to types in XPath. In assertions,
type information is used in the XPath expressions because there is only
one type to consider. In the case of conditional type assignment, the
type has not even been assigned yet, so it is impossible to determine
the types of the attributes. When num is compared to a literal integer,
as in the second-to-last example, it is automatically converted to a
numeric value. But when num and maxNum are compared to each other, and
neither has a type, they need to be converted to integers explicitly to
ensure that they are compared appropriately.
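
The difference can be sketched with a small schema. The element name item and the two type names are hypothetical and exist only to give the alternatives something to select.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="item">
    <!-- Compared as integers: for <item num="100" maxNum="99"/>
         this test is true, so OverLimitType is assigned -->
    <xs:alternative test="xs:integer(@num) > xs:integer(@maxNum)"
                    type="OverLimitType"/>
    <xs:alternative type="ItemType"/>
  </xs:element>
  <xs:complexType name="ItemType">
    <xs:attribute name="num" type="xs:integer"/>
    <xs:attribute name="maxNum" type="xs:integer"/>
  </xs:complexType>
  <xs:complexType name="OverLimitType">
    <xs:attribute name="num" type="xs:integer"/>
    <xs:attribute name="maxNum" type="xs:integer"/>
  </xs:complexType>
</xs:schema>
```

Had the test been written as @num > @maxNum, the same instance would have been compared as the strings 100 and 99, the test would have been false, and the element would have fallen through to ItemType.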

14.2.4 The error type

A special built-in simple type named error (in the XML Schema
namespace) is defined for use in conditional type assignment.1 It is
used to indicate that a validation error should be raised under certain
conditions.
Example 14–23 uses the error type to raise an error if the dept
attribute is equal to anything other than ACC, WMN, or MEN.

Example 14–23. Using the error type as the default
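
The listing for Example 14–23 does not appear in this copy. A reconstruction that is consistent with the description above and with the structure of Example 14–24 (an assumption, not the author's original listing) ends with an alternative that has no test and whose type is xs:error:

```xml
<xs:element name="product">
  <xs:alternative test="@dept='ACC'" type="AccessoryType"/>
  <xs:alternative test="@dept='WMN' or @dept='MEN'"
                  type="ClothingType"/>
  <!-- No test attribute: applies whenever neither test above is true -->
  <xs:alternative type="xs:error"/>
</xs:element>
```

Under this sketch, a product with no dept attribute would also be assigned the error type, since neither test is true for it.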

1. Technically, it can be used anywhere a type is normally used, but it is
   only practically useful in conditional type assignment.

It doesn’t have to just be the last alternative, with no test, that uses
the error type. It can be used with a test, and as an earlier alternative,
as shown in Example 14–24. This example will raise an error if the
product does not have a dept attribute.

Example 14–24. Using the error type with a test

<xs:element name="product">
  <xs:alternative test="not(@dept)" type="xs:error"/>
  <xs:alternative test="@dept='ACC'" type="AccessoryType"/>
  <xs:alternative test="@dept='WMN' or @dept='MEN'"
                  type="ClothingType"/>
</xs:element>

14.2.5 Conditional type assignment and namespaces

As with assertions, if the schema has a target namespace, you may need
to pay attention to namespace prefixes in your XPath. It is less
likely to be an issue because you are only using attribute names and it
is less common for attributes to be in the target namespace of the
schema. However, if an attribute is in a namespace, for example because
it is globally declared or because attributeFormDefault is set to
qualified, its name does need to be prefixed.
Example 14–25 shows a revised example where dept is globally
declared, which means that it is in the target namespace. The XPath must
now reflect the target namespace, so a prod prefix is added to dept
wherever it appears in the XPaths.
As with assertions, the xpathDefaultNamespace attribute affects
the XPaths in type alternatives. However, since it does not affect
attribute names, it is unlikely to be useful in conditional type
assignment.

Example 14–25. Conditional type assignment with globally declared attribute

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://datypic.com/prod"
           xmlns:prod="http://datypic.com/prod"
           elementFormDefault="qualified">
  <xs:element name="product" type="prod:ProductType">
    <xs:alternative test="@prod:dept='ACC'"
                    type="prod:AccessoryType"/>
    <xs:alternative test="@prod:dept='WMN' or @prod:dept='MEN'"
                    type="prod:ClothingType"/>
  </xs:element>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
    <xs:attribute ref="prod:dept"/>
  </xs:complexType>
  <xs:attribute name="dept" type="xs:string"/>
</xs:schema>

14.2.6 Using inherited attributes in conditional type assignment

Section 7.6 on p. 126 introduced the concept of inherited attributes,
which are relevant not just to an element but also to its descendant
elements. Example 14–26 shows an instance example where a language
attribute is intended to be inherited by the first title element from
its parent workTitles.

Example 14–26. Instance with an inherited attribute

<workTitles language="en">
  <title>Time Transfixed</title>
  <title language="fr">La Durée poignardée</title>
</workTitles>

Example 14–27 is a schema that defines a type alternative for the
title element. It says that if the language is English, it has the
type EnglishTitleType, meaning that the contents can only contain
basic Latin characters (which is admittedly very simplified). Otherwise,
the title element has the less restrictive type TitleType that allows
any string.

Example 14–27. Assertions on inherited attribute

<xs:element name="workTitles" type="WorkTitlesType"/>
<xs:complexType name="WorkTitlesType">
  <xs:sequence>
    <xs:element name="title" maxOccurs="unbounded"
                type="TitleType">
      <xs:alternative test="@language='en'"
                      type="EnglishTitleType"/>
    </xs:element>
  </xs:sequence>
  <xs:attribute name="language" type="xs:language"
                inheritable="true"/>
</xs:complexType>
<xs:complexType name="EnglishTitleType">
  <xs:simpleContent>
    <xs:restriction base="TitleType">
      <xs:pattern value="\p{IsBasicLatin}+"/>
    </xs:restriction>
  </xs:simpleContent>
</xs:complexType>
<xs:complexType name="TitleType">
  <xs:simpleContent>
    <xs:extension base="xs:string">
      <xs:attribute name="language" type="xs:language"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>

The interesting thing about this example is that although the first
title element does not have a language attribute in its start tag in
the instance, in the XPath expressions it is treated as if it does,
because the attribute is inherited. The instance in Example 14–26 is
valid according to this schema.
Chapter 15

Named groups

XML Schema provides the ability to define groups of element
and attribute declarations that are reusable by many complex
types. Named model groups are fragments of content
models, and attribute groups are bundles of attribute declarations
that are commonly used together. This chapter explains how to
define and reference named model groups and attribute groups.

15.1 Why named groups?

Defining a group and reusing it many times has the advantages of

# Encouraging consistency across schema components
# Allowing the schema author to change multiple content models
  in only one place
# Making it obvious that certain complex types share similar
  children or attributes
# In many cases, making the schema less verbose


15.2 Named model groups

Named model groups are reusable fragments of content models. For
example, if there are many type definitions in your schema that specify
a description, optionally followed by a comment, you could define a
group that represents this content model fragment. The group could
then be used by many complex type definitions. Named model groups
cannot contain attribute declarations; that is the purpose of attribute
groups, which are described in Section 15.3 on p. 392.
A note on terminology: XML Schema formally uses the term
“model group definition” for group elements, and “model group” for
all, choice, and sequence groups. In this book, group elements
are referred to as “named model groups” to reduce confusion associated
with the two similar terms.

15.2.1 Defining named model groups

Named model groups are represented by group elements whose syntax
is shown in Table 15–1. Named model groups are required to have a
name, and that name must be unique among all the named model
groups in the schema. Named model groups are always defined globally,
meaning that their parent is always schema.1
Named model groups may contain any content model. However, a
group cannot contain an element directly. Instead, group must have
one and only one model group (choice, sequence, or all) as a child.
There is an additional constraint that this one model group child cannot
have occurrence constraints (minOccurs and maxOccurs) like other
model groups. If you wish to indicate that the contents of the group
appear multiple times, you may put occurrence constraints on the
group reference, as described in Section 15.2.2.1 on p. 388.
Example 15–1 shows the definition of a named model group that
contains a description optionally followed by a comment. Note
that the group has one child, a sequence, which has no occurrence
constraints on it.

1. Except in the case of a redefine or override.

Table 15–1 XSD Syntax: named model group definition

Name
group

Parents
schema, redefine, override (version 1.1 only)

Attribute name  Type    Description
id              ID      Unique ID.
name            NCName  Name of the named model group.

Content
annotation?, (all | choice | sequence)

Example 15–1. Named model group with local element declarations
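
The listing for Example 15–1 does not appear in this copy. Based on the description above, and on the DescriptionGroup definition that appears later in Example 15–7, it can be reconstructed along the following lines (a sketch, not necessarily the author's exact listing):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:group name="DescriptionGroup">
    <!-- The single model group child carries no minOccurs or maxOccurs -->
    <xs:sequence>
      <xs:element name="description" type="xs:string"/>
      <xs:element name="comment" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:group>
</xs:schema>
```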
In Example 15–1, the element declarations are local in the group,
as evidenced by the appearance of a name attribute instead of a ref
attribute. It is also possible to use global element declarations, and
then reference them from the named model group, as shown in
Example 15–2.

Example 15–2. Named model group with element references

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="description" type="xs:string"/>
  <xs:element name="comment" type="xs:string"/>
  <xs:group name="DescriptionGroup">
    <xs:sequence>
      <xs:element ref="description"/>
      <xs:element ref="comment" minOccurs="0"/>
    </xs:sequence>
  </xs:group>
</xs:schema>

Note that the type attribute is now in the global element declaration,
while minOccurs stays in the reference to the element declaration.
This is the same syntax as that used in complex types to reference
global element declarations. In fact, when a complex type references a
named model group, it is as if the schema author cut and pasted the
contents of the group element into the complex type definition.
All local element declarations in the group become local to that
complex type.
Whether to use local element declarations in the group depends on
whether you want these element declarations to be local to the complex
type. For a complete discussion of global versus local element
declarations, see Section 6.1.3 on p. 95.

15.2.2 Referencing named model groups

Named model groups may be referenced in complex types and in other
groups. Since they are named global schema components, they may be
referenced not only from within the same schema document, but also
from other schema documents.

15.2.2.1 Group references

The syntax to reference a named model group is shown in Table 15–2.
Named model groups are referenced through the ref attribute, just
like other global schema components.

Table 15–2 XSD Syntax: named model group reference

Name
group

Parents
complexType, restriction, extension, sequence, choice,
all (version 1.1 only)

Attribute name  Type                      Description
id              ID                        Unique ID.
ref             QName                     Name of the group being
                                          referenced.
minOccurs       nonNegativeInteger : 1    Minimum number of times
                                          the group may appear.
maxOccurs       nonNegativeInteger |      Maximum number of times
                "unbounded" : 1           the group may appear.

Content
annotation?

15.2.2.2 Referencing a named model group in a complex type

Example 15–3 shows the definition of the complex type
PurchaseOrderType that references DescriptionGroup.

Example 15–3. Referencing a group from a complex type definition

<xs:complexType name="PurchaseOrderType">
  <xs:sequence>
    <xs:group ref="DescriptionGroup" minOccurs="0"/>
    <xs:element ref="items"/>
  </xs:sequence>
</xs:complexType>

Note that when referencing a group, minOccurs and maxOccurs
may be specified to indicate how many times the contents of the group
may appear. If minOccurs and maxOccurs are not specified, the default
for both values is 1. The value for minOccurs must be less than or
equal to the value for maxOccurs. This means that if a minOccurs
value is specified that is more than 1, maxOccurs must also appear,
with a value greater than or equal to minOccurs.
To illustrate how named model group references are handled,
Example 15–4 shows a content model equivalent to Example 15–3
(assuming it was referencing DescriptionGroup from Example 15–1)
but without a named model group being used. Note that the
minOccurs="0" constraint that appeared in the group reference now
appears in the sequence tag.

Example 15–4. Equivalent content model without a named model group
reference

<xs:complexType name="PurchaseOrderType">
  <xs:sequence>
    <xs:sequence minOccurs="0">
      <xs:element name="description" type="xs:string"/>
      <xs:element name="comment" type="xs:string" minOccurs="0"/>
    </xs:sequence>
    <xs:element ref="items"/>
  </xs:sequence>
</xs:complexType>

In Example 15–3, the group is referenced within a sequence group.
Named model groups may be referenced anywhere in the content
model, and multiple named model group references (even to the same
group) are allowed.
A named model group may also be referenced at the top level of a
complexType, if the group contains the entire content model of the
type. A complex type may only directly contain either one named
model group (group) or one model group (all, sequence, or
choice). Example 15–5 shows the definition of DescriptionType
that references DescriptionGroup at the top level.

Example 15–5. Group reference at the top level of the content model

<xs:complexType name="DescriptionType">
  <xs:group ref="DescriptionGroup"/>
  <xs:attribute ref="xml:lang"/>
</xs:complexType>

15.2.2.3 Using all in named model groups

An all group may appear in named model groups, but the additional
constraints on all groups still apply. Example 15–6 shows a legal use
of an all group within a named model group. In version 1.0, an
all group may only contain element declarations or references, not
other groups.

Example 15–6. Group with an all model group

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:group name="DescriptionGroup">
    <xs:all>
      <xs:element name="description" type="xs:string"/>
      <xs:element name="comment" type="xs:string" minOccurs="0"/>
    </xs:all>
  </xs:group>
</xs:schema>

In version 1.0, since all groups can only appear at the top level of
a complex type, the only way to reference a named model group that
contains an all group is at the top level, as shown in Example 15–5.
Version 1.1 has relaxed this constraint, and it is possible to reference
a named model group that contains all from another all group,
provided that minOccurs and maxOccurs are 1 on the group reference.
However, it is still not legal to reference such a group from within a
choice or sequence.
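
The version 1.1 relaxation can be sketched as follows. This is an illustrative schema rather than a listing from the book: DescriptionGroup is the all-based group from Example 15–6, and ProductType with its number and name children is a hypothetical type that references it from inside another all group.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:group name="DescriptionGroup">
    <xs:all>
      <xs:element name="description" type="xs:string"/>
      <xs:element name="comment" type="xs:string" minOccurs="0"/>
    </xs:all>
  </xs:group>
  <xs:complexType name="ProductType">
    <xs:all>
      <!-- Allowed in version 1.1 only; minOccurs and maxOccurs on the
           group reference are left at their default of 1 -->
      <xs:group ref="DescriptionGroup"/>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
    </xs:all>
  </xs:complexType>
</xs:schema>
```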

15.2.2.4 Named model groups referencing named model groups

Named model groups may reference other named model groups. This
is shown in Example 15–7, where ProductPropertyGroup references
DescriptionGroup.

Example 15–7. Group reference from a group

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:group name="ProductPropertyGroup">
    <xs:sequence>
      <xs:group ref="DescriptionGroup"/>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:group>

  <xs:group name="DescriptionGroup">
    <xs:sequence>
      <xs:element name="description" type="xs:string"/>
      <xs:element name="comment" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:group>
</xs:schema>

The group references cannot be circular. That is, group a cannot
reference itself, and group a cannot reference group b if the latter
references group a, or another group c which references a, and so on. In
addition, groups may only contain references to other groups; they
cannot actually contain the definitions of groups, since all groups are
defined globally.

15.3 Attribute groups

Attribute groups are used to represent groups of related attributes that
appear in many different complex types. For example, if the attributes
id, name, and version are used in multiple complex types in your
schema, it may be useful to define an attribute group that contains
declarations for these three attributes, and then reference the attribute
group in various complex type definitions.

15.3.1 Defining attribute groups

Attribute groups are represented by attributeGroup elements, whose
syntax is shown in Table 15–3. Attribute groups are required to have
a name, and that name must be unique among all the attribute groups
in the schema. Attribute groups are always defined globally, meaning
that their parent is always schema.1

Table 15–3 XSD Syntax: attribute group deﬁnition

Name
attributeGroup

Parents
schema, redefine, 1.1
override

Attribute name Type Description

id ID Unique ID.
name NCName Attribute group name.
Content
annotation?, (attribute | attributeGroup)*, anyAttribute?

Attribute groups may contain any number of attribute declarations

and references to other attribute groups, plus one optional attribute
wildcard. An attribute group cannot contain more than one attrib-
ute declaration with the same qualiﬁed name. In version 1.0, there is

1. Except in the case of a redeﬁne or override.

394 Chapter 15 | Named groups

an additional constraint that an attribute group cannot contain more

than one attribute declaration of type ID.
For example, if many complex type deﬁnitions will use the attributes
id and version, you could deﬁne an attribute group that contains
declarations for these two attributes, as shown in Example 15–8.

Example 15–8. Attribute group with local attribute declarations

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:attributeGroup name="IdentifierGroup">
    <xs:attribute name="id" type="xs:ID" use="required"/>
    <xs:attribute name="version" type="xs:decimal"/>
  </xs:attributeGroup>
</xs:schema>

In Example 15–8, the attributes are declared locally in the attribute
group, as evidenced by the appearance of a name attribute instead of a
ref attribute. It is also possible to use global attribute declarations and
reference them from the attribute group, as shown in Example 15–9.

Example 15–9. Attribute group with attribute references

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:attribute name="id" type="xs:ID"/>
  <xs:attribute name="version" type="xs:decimal"/>

  <xs:attributeGroup name="IdentifierGroup">
    <xs:attribute ref="id" use="required"/>
    <xs:attribute ref="version"/>
  </xs:attributeGroup>
</xs:schema>

Note that the type attribute is now in the global attribute
declaration, while the use attribute stays in the reference to the
attribute declaration. This is the same way that complex types reference
global attribute declarations. In fact, when a complex type references an
attribute group, it is as if the schema author cut and pasted the contents
of the attribute group definition into the complex type definition. All
attributes that are declared locally in the attribute group become local
to that complex type.
Whether to declare attributes locally in the attribute group depends
on whether you want the attributes to be local to the complex type.
For a complete discussion of global versus local attribute declarations,
see Section 7.2.3 on p. 119.
Attribute groups may reference other attribute groups, as described
in the next section. Attribute groups may also contain one attribute
wildcard at the very end, as shown in Example 15–10. Attribute groups
are limited to one attribute wildcard because a complex type cannot
contain more than one attribute wildcard. See Section 12.7.3 on p. 298
for more information.

Example 15–10. Attribute group with a wildcard
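
A minimal sketch of an attribute group that ends with a wildcard, assuming the same IdentifierGroup declarations and ##other wildcard that appear in Example 15–16:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:attributeGroup name="IdentifierGroup">
    <xs:attribute name="id" type="xs:ID" use="required"/>
    <xs:attribute name="version" type="xs:decimal"/>
    <!-- The wildcard comes last, after all declarations and references -->
    <xs:anyAttribute namespace="##other"/>
  </xs:attributeGroup>
</xs:schema>
```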

15.3.2 Referencing attribute groups

Attribute groups may be referenced in complex types and in other at-
tribute groups. Since they are named global schema components, they
may be referenced not only from within the same schema document,
but also from other schema documents.

15.3.2.1 Attribute group references

The syntax to reference an attribute group is shown in Table 15–4.
Attribute groups are referenced using the ref attribute, just like other
global schema components.

Table 15–4 XSD Syntax: attribute group reference

Name
attributeGroup

Parents
complexType, restriction, extension, attributeGroup

Attribute name   Type    Description
id               ID      Unique ID.
ref              QName   Name of the attribute group being referenced.

Content
annotation?

15.3.2.2 Referencing attribute groups in complex types

Example 15–11 shows the definition of the complex type ProductType
that references the attribute group IdentifierGroup.

Example 15–11. Referencing an attribute group from a complex type definition

<xs:complexType name="ProductType">
  <xs:sequence>
    <!--...-->
  </xs:sequence>
  <xs:attributeGroup ref="IdentifierGroup"/>
  <xs:attribute name="effDate" type="xs:date"/>
</xs:complexType>

As shown in the example, references to attribute groups must appear
after the content model (a sequence group in this example). They
may appear before, after, or in between attribute declarations. The order
of attribute groups (and attributes) in a complex type is insignificant.
To illustrate how XML Schema handles attribute group references,
Example 15–12 shows a complex type definition that is equivalent
to Example 15–11 (assuming it was referencing IdentifierGroup
from Example 15–8) but without an attribute group being used. It is
as if the schema author cut and pasted the attribute declarations from
the attribute group.

Example 15–12. Equivalent complex type without an attribute group

<xs:complexType name="ProductType">
  <!--...-->
  <xs:attribute name="id" type="xs:ID" use="required"/>
  <xs:attribute name="version" type="xs:decimal"/>
  <xs:attribute name="effDate" type="xs:date"/>
</xs:complexType>

15.3.2.3 Duplicate attribute names

It is illegal to declare two attributes with the same qualified name in
the same complex type. When using attribute groups, be sure that the
referenced attribute group does not declare an attribute that is already
declared directly in your complex type definition. Also, when referenc-
ing more than one attribute group in a complex type definition, be
sure that the two attribute groups do not contain attribute declarations
with identical names.
In Example 15–13, each of the attribute groups IdentifierGroup
and VersionGroup contains a declaration of version, and the
definition of ProductType references both attribute groups. This results
in an illegal duplication of the version attribute for ProductType.

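Example 15–13. Illegal duplication of attributes

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- IdentifierGroup assumed to be as in Example 15–8 -->
  <xs:attributeGroup name="IdentifierGroup">
    <xs:attribute name="id" type="xs:ID" use="required"/>
    <xs:attribute name="version" type="xs:decimal"/>
  </xs:attributeGroup>

  <xs:attributeGroup name="VersionGroup">
    <xs:attribute name="version" type="xs:decimal"/>
  </xs:attributeGroup>

  <xs:complexType name="ProductType">
    <xs:attributeGroup ref="IdentifierGroup"/>
    <xs:attributeGroup ref="VersionGroup"/>
  </xs:complexType>
</xs:schema>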

15.3.2.4 Duplicate attribute wildcard handling

Each attribute group definition may only contain one attribute wildcard.
However, it is possible for a complex type definition to reference two
attribute groups, each of which contains an attribute wildcard. The
complex type definition may also have a “local” wildcard, that is, an
anyAttribute child.
In such a case, an effective attribute wildcard is determined by taking
the intersection of the constraints of all of the wildcards, including the
local wildcard and any that were included from attribute groups. A
simple rule of thumb is that any replacement attribute must conform
to all of the attribute wildcards for that complex type.
The value of processContents for this effective wildcard is the
value of processContents for the local wildcard, if it is present. If it
is not, the schema processor takes the value of processContents from
the first attribute wildcard among the attributeGroup children.
For more information on attribute wildcards, see Section 12.7.3 on
p. 298.
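
The intersection rule can be sketched as follows. The group names GroupA and GroupB and the example.com namespace URIs are hypothetical, chosen only to show how the three wildcards combine.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:attributeGroup name="GroupA">
    <xs:anyAttribute namespace="##any" processContents="lax"/>
  </xs:attributeGroup>

  <xs:attributeGroup name="GroupB">
    <xs:anyAttribute
        namespace="http://example.com/ext http://example.com/other"
        processContents="strict"/>
  </xs:attributeGroup>

  <xs:complexType name="ProductType">
    <xs:attributeGroup ref="GroupA"/>
    <xs:attributeGroup ref="GroupB"/>
    <!-- Local wildcard: its processContents value (skip) is the one used -->
    <xs:anyAttribute namespace="http://example.com/ext ##local"
                     processContents="skip"/>
  </xs:complexType>
</xs:schema>
```

Only attributes in http://example.com/ext satisfy all three wildcards, so those are the only replacement attributes allowed on ProductType. Because a local wildcard is present, the effective processContents is skip rather than the lax or strict values from the attribute groups.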

15.3.2.5 Attribute groups referencing attribute groups

Definitions of attribute groups may also reference other attribute groups.
This is shown in Example 15–14, where HeaderGroup references
IdentifierGroup. Attribute groups may only contain references to
other attribute groups; they cannot actually contain the definitions of
attribute groups, since all attribute groups are defined globally.

Example 15–14. Attribute group referencing an attribute group

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:attributeGroup name="HeaderGroup">
    <xs:attributeGroup ref="IdentifierGroup"/>
    <xs:attribute ref="xml:lang"/>
  </xs:attributeGroup>

  <xs:attributeGroup name="IdentifierGroup">
    <xs:attribute name="id" type="xs:ID" use="required"/>
    <xs:attribute name="version" type="xs:decimal"/>
  </xs:attributeGroup>

</xs:schema>

In version 1.0, attribute group references cannot be circular.
That is, HeaderGroup cannot reference IdentifierGroup if
IdentifierGroup also has a reference back to HeaderGroup (either
directly or through a chain of attribute group references). In version 1.1,
circular references are permitted, and the effect is that all of the
attributes of both HeaderGroup and IdentifierGroup are included.
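
A minimal sketch of such a circular pair, which is an error in version 1.0 and allowed in version 1.1. The dept attribute is a hypothetical stand-in for the xml:lang reference in Example 15–14, used here so that the sketch needs no import.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:attributeGroup name="HeaderGroup">
    <xs:attributeGroup ref="IdentifierGroup"/>
    <xs:attribute name="dept" type="xs:string"/>
  </xs:attributeGroup>

  <xs:attributeGroup name="IdentifierGroup">
    <!-- Reference back to HeaderGroup: an error in 1.0, allowed in 1.1 -->
    <xs:attributeGroup ref="HeaderGroup"/>
    <xs:attribute name="id" type="xs:ID" use="required"/>
    <xs:attribute name="version" type="xs:decimal"/>
  </xs:attributeGroup>
</xs:schema>
```

Under version 1.1, a complex type that references either group gets all three attributes: id, version, and dept.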

15.3.3 The default attribute group

In version 1.1, you can indicate that an attribute group is the default
attribute group by specifying its name in the defaultAttributes
attribute on the schema element. If such a default attribute group is
defined, the attributes declared in that group will automatically be
allowed for every complex type in the schema document, unless you
specifically disallow it.
Example 15–15 shows a schema that defines a default attribute
group. The defaultAttributes attribute on the schema names
prod:IdentifierGroup as the default attribute group. It uses the
prod prefix because the namespace must be taken into account when
referencing any other schema component.

Example 15–15. Default attribute group

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://datypic.com/prod"
           xmlns:prod="http://datypic.com/prod"
           defaultAttributes="prod:IdentifierGroup">
  <!-- Namespace URI assumed to match Example 15–16 -->

  <xs:attributeGroup name="IdentifierGroup">
    <xs:attribute name="id" type="xs:ID" use="required"/>
    <xs:attribute name="version" type="xs:decimal"/>
  </xs:attributeGroup>

  <xs:complexType name="ProductType">
    <xs:sequence>
      <!--...-->
    </xs:sequence>
    <xs:attribute name="dept" type="xs:string"/>
  </xs:complexType>

  <xs:complexType name="CatalogType" defaultAttributesApply="false">
    <xs:sequence>
      <!--...-->
    </xs:sequence>
    <xs:attribute name="catalogNumber" type="xs:integer"/>
  </xs:complexType>
</xs:schema>

The effect of declaring the default attribute group is as if
ProductType had included a reference to the attribute group; id and
version attributes can appear on instances of this type. ProductType
can also contain other attribute declarations or attribute group
references. CatalogType, however, does not use the default attribute
group, because it specifically disallows it using the attribute
defaultAttributesApply="false".
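
For comparison, here is a sketch of the explicit form that the default attribute group stands in for. It assumes the http://datypic.com/prod namespace used in Example 15–16, and the name child element is a hypothetical placeholder for the elided content model.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://datypic.com/prod"
           xmlns:prod="http://datypic.com/prod">
  <xs:attributeGroup name="IdentifierGroup">
    <xs:attribute name="id" type="xs:ID" use="required"/>
    <xs:attribute name="version" type="xs:decimal"/>
  </xs:attributeGroup>

  <!-- Same effect on ProductType as the default attribute group,
       written as an explicit reference (also valid in version 1.0) -->
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
    <xs:attributeGroup ref="prod:IdentifierGroup"/>
    <xs:attribute name="dept" type="xs:string"/>
  </xs:complexType>
</xs:schema>
```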

15.4 Named groups and namespaces

Like most global components, named groups take on the target
namespace of the schema document, so they need to be referenced with
their qualified names. Example 15–16 works because the target
namespace is the same as the default namespace, so when the groups
are referenced using the ref attribute, the processor looks for their
names in the target namespace.

Example 15–16. Named groups with a target namespace

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">
  <xs:group name="DescriptionGroup">
    <xs:sequence>
      <xs:element name="description" type="xs:string"/>
      <xs:element name="comment" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:group>
  <xs:attributeGroup name="IdentifierGroup">
    <xs:attribute name="id" type="xs:ID" use="required"/>
    <xs:attribute name="version" type="xs:decimal"/>
    <xs:anyAttribute namespace="##other"/>
  </xs:attributeGroup>
  <xs:complexType name="PurchaseOrderType">
    <xs:sequence>
      <xs:group ref="DescriptionGroup" minOccurs="0"/>
      <xs:element ref="items"/>
      <!--...-->
    </xs:sequence>
    <xs:attributeGroup ref="IdentifierGroup"/>
  </xs:complexType>
</xs:schema>

When referencing named groups that are defined in a different
namespace, the names need to be appropriately prefixed. This is shown
in Example 15–17 where the values of the ref attribute use prefixed
names prod:DescriptionGroup and prod:IdentifierGroup
because that namespace is not the default in ord.xsd.

Example 15–17. Named groups across namespaces

ord.xsd

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           xmlns="http://datypic.com/ord"
           xmlns:prod="http://datypic.com/prod"
           targetNamespace="http://datypic.com/ord">
  <xs:import namespace="http://datypic.com/prod"
             schemaLocation="prod.xsd"/>
  <xs:complexType name="PurchaseOrderType">
    <xs:sequence>
      <xs:group ref="prod:DescriptionGroup" minOccurs="0"/>
      <xs:element ref="items"/>
      <!--...-->
    </xs:sequence>
    <xs:attributeGroup ref="prod:IdentifierGroup"/>
  </xs:complexType>
</xs:schema>

prod.xsd
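
A minimal sketch of what prod.xsd could contain, assuming it defines the two referenced groups the same way Example 15–16 does and sets elementFormDefault to qualified:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">
  <!-- Assumed to match the group definitions in Example 15–16 -->
  <xs:group name="DescriptionGroup">
    <xs:sequence>
      <xs:element name="description" type="xs:string"/>
      <xs:element name="comment" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:group>
  <xs:attributeGroup name="IdentifierGroup">
    <xs:attribute name="id" type="xs:ID" use="required"/>
    <xs:attribute name="version" type="xs:decimal"/>
    <xs:anyAttribute namespace="##other"/>
  </xs:attributeGroup>
</xs:schema>
```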

The locally declared elements have qualified names, that is, they are
in a namespace and elementFormDefault is set to qualified.
Note that the names of those elements declared in prod.xsd
will retain the target namespace of that schema document,
http://datypic.com/prod. Being referenced as a group from a
schema document with a different target namespace does not change
their namespace, even though they are locally declared.

15.5 Design hint: Named groups or complex type derivations?

There may be cases where it is unclear when to define a named group
and when to use complex type derivation. Complex type derivation,
like named groups, serves the purpose of allowing reuse of content
models and attributes.
For example, if there are several places in your purchase order where
you allow a description optionally followed by a comment, you
could define a named model group to represent this. You could then
reuse this group in the content model of several, possibly dissimilar
complex types.
However, it is also possible to represent this differently. You can
define a base complex type that has the descriptive element declarations,
and several complex types that extend the base type to specify additional
children. Example 15–18 illustrates this approach.

Example 15–18. Reusing content model fragments through derivation

<xs:complexType name="DescribedType">
  <xs:sequence>
    <xs:element name="description" type="xs:string"/>
    <xs:element name="comment" type="xs:string" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="PurchaseOrderType">
  <xs:complexContent>
    <xs:extension base="DescribedType">
      <xs:sequence>
        <xs:element ref="items"/>
        <!--...-->
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

<xs:complexType name="ItemsType">
  <xs:complexContent>
    <xs:extension base="DescribedType">
      <xs:sequence>
        <xs:element ref="product" maxOccurs="unbounded"/>
        <!--...-->
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

The same dilemma can apply to attributes, which can be reused both
through attribute groups and through complex type extensions.
Either of these two methods is legal, but each has its advantages and
disadvantages. Use a named model group if:

• The fragment you want to reuse does not appear first in some
  of the types’ content models. This is because extension adds a
  derived type’s content model after its base type’s content model
  as if they were in a sequence group. In the above example, if
  the descriptive information did not come first, it would have
  been impossible to use extension.
• The types are dissimilar concepts that just happen to share a
  small content model fragment. It may not be intuitive to derive
  them from the same base type.

On the other hand, use a complex type derivation if:

• The reusable content model fragments appear at the beginning
  of the content model.
• The types have mostly the same content model with just a few
  differing element or attribute declarations.
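
As a sketch of the first case for named model groups, the schema below reuses DescriptionGroup in the middle of a content model, which extension cannot express. ProductType and its number and name children are hypothetical, chosen only for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:group name="DescriptionGroup">
    <xs:sequence>
      <xs:element name="description" type="xs:string"/>
      <xs:element name="comment" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:group>

  <!-- Hypothetical type: the shared fragment comes after number,
       so extending a common base type could not produce this order -->
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:group ref="DescriptionGroup"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```
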
Chapter 16

Substitution groups

Substitution groups are a flexible way to designate element
declarations as substitutes for other element declarations in
content models. You can easily designate new element declarations
as substitutes, from other schema documents and
even other namespaces, without changing the original content model.
This chapter describes how to define and use substitution groups.

16.1 Why substitution groups?

Substitution groups are useful for simplifying content models and
making them more extensible and flexible. Suppose you have a section
of a purchase order that lists products of various kinds. You could use
repeating product elements, each having an attribute or child element
to indicate what kind of a product it is. However, you may also want to
allow different content models for different kinds of products. For
example, shirts have a mandatory size, while umbrellas are not allowed
to have a size specified. Also, you may want to use descriptive element
names that indicate the kind of product. Lastly, you may want the
definition to be flexible enough to accept new kinds of products without
altering the original schema. This is a perfect application for substitution
groups.

16.2 The substitution group hierarchy

Each substitution group consists of a head and one or more members.
Wherever the head element declaration is referenced in a content
model, one of the member element declarations may be substituted in
place of the head. For example, the head of your substitution group
might be product, with the members being the different kinds of
products such as shirt, hat, and umbrella. This hierarchy is depicted
in Figure 16–1.
This means that anywhere product appears in a content model, any
of product, shirt, hat, or umbrella may appear in the instance.
The members themselves cannot be substituted for each other. For
example, if shirt appears in a content model, umbrella cannot be
substituted in its place.
Substitution groups form a hierarchy. There can be multiple levels
of substitution, and a member of one group may be the head of
another group. Other element declarations might have shirt as their
substitution group head, as shown in Figure 16–2. In this case, tShirt
and blouse may substitute for either product or shirt.

Figure 16–1 Substitution group hierarchy


Figure 16–2 Multilevel substitution group hierarchy
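
The multilevel hierarchy can be sketched in schema form using the substitutionGroup attribute, which the next section introduces. The content of ProductType (number and name) is an assumption based on the instance in Example 16–3, and every member is left without a type so that it takes on the type of its head.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Head of the top-level group -->
  <xs:element name="product" type="ProductType"/>

  <!-- Members of product's group; shirt is also a head itself -->
  <xs:element name="shirt" substitutionGroup="product"/>
  <xs:element name="hat" substitutionGroup="product"/>
  <xs:element name="umbrella" substitutionGroup="product"/>

  <!-- Members of shirt's group; they may replace shirt or product -->
  <xs:element name="tShirt" substitutionGroup="shirt"/>
  <xs:element name="blouse" substitutionGroup="shirt"/>
</xs:schema>
```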

16.3 Declaring a substitution group

Example 16–1 shows the ItemsType complex type that contains a
product element declaration. The product element declaration will
be the head of the substitution group, although there is nothing special
about the product declaration to indicate this. It is significant,
however, that it is a global declaration, since only a global element
declaration can be the head of a substitution group.

Example 16–1. The head of a substitution group

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="items" type="ItemsType"/>
  <xs:complexType name="ItemsType">
    <xs:sequence>
      <xs:element ref="product" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="product" type="ProductType"/>
  <!--...-->
</xs:schema>

Example 16–2 shows the three element declarations that are members
of the substitution group. The product, shirt, hat, and umbrella
element declarations can be used interchangeably wherever product
appears in any content model. Each of the declarations uses the
substitutionGroup attribute to indicate that it is substitutable for
product. Members of a substitution group must be globally declared;
it is not legal to use the substitutionGroup attribute in local element
declarations or element references.

Example 16–2. Members of a substitution group

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="shirt" type="ShirtType"
              substitutionGroup="product"/>
  <xs:complexType name="ShirtType">
    <xs:complexContent>
      <xs:extension base="ProductType">
        <xs:sequence>
          <xs:element name="size" type="ShirtSizeType"/>
          <xs:element name="color" type="ColorType"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="hat" substitutionGroup="product">
    <xs:complexType>
      <xs:complexContent>
        <xs:extension base="ProductType">
          <xs:sequence>
            <xs:element name="size" type="HatSizeType"/>
          </xs:sequence>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>

  <xs:element name="umbrella" substitutionGroup="product"/>

</xs:schema>

Example 16–3 shows a valid instance. Since items can contain an
unlimited number of product elements, any combination of product,
shirt, hat, and umbrella may appear in items, in any order. Keep
in mind that everywhere a reference to the global product element
declaration appears in a content model, it can be replaced by these
other element declarations because the substitution group is in effect.
If there is a content model where you only want product elements to
be valid, with no substitution, you can get around this by supplying a
local product element declaration in that content model.
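
The workaround can be sketched as follows. BasicItemsType is a hypothetical type name, and the content of ProductType is an assumption based on the instance in Example 16–3.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="product" type="ProductType"/>
  <xs:element name="umbrella" substitutionGroup="product"/>

  <!-- Reference to the global declaration: umbrella may substitute -->
  <xs:complexType name="ItemsType">
    <xs:sequence>
      <xs:element ref="product" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Hypothetical type with a local declaration: only product allowed -->
  <xs:complexType name="BasicItemsType">
    <xs:sequence>
      <xs:element name="product" type="ProductType"
                  maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

The local declaration in BasicItemsType uses name rather than ref, so it is a separate declaration from the global product and has no substitution group.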

Example 16–3. Instance of items

<items>
  <product>
    <number>999</number>
    <name>Special Seasonal</name>
  </product>
  <shirt>
    <number>557</number>
    <name>Short-Sleeved Linen Blouse</name>
    <size>10</size>
    <color value="blue"/>
  </shirt>
  <hat>
    <number>563</number>
    <name>Ten-Gallon Hat</name>
    <size>L</size>
  </hat>
  <umbrella>
    <number>443</number>
    <name>Deluxe Golf Umbrella</name>
  </umbrella>
</items>

The substitutionGroup attribute takes a QName as its value. This
means that if the head’s element name is in a namespace (i.e., the
schema document in which it is declared has a target namespace), you
must prefix the element name you specify in the substitutionGroup
attribute. You can have an element declaration from a different
namespace as the head of your substitution group, provided that
the namespace of that element declaration has been imported into your
schema document.
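
A minimal sketch of a prefixed head name, assuming the http://datypic.com/prod target namespace from the examples in Section 15.4 and a ProductType whose content is modeled on the instance in Example 16–3:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://datypic.com/prod"
           xmlns:prod="http://datypic.com/prod">
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="product" type="prod:ProductType"/>

  <!-- The head is named by its qualified name, prod:product -->
  <xs:element name="umbrella" substitutionGroup="prod:product"/>
</xs:schema>
```

A member declared in a schema document with a different target namespace would use the same prefixed form after importing http://datypic.com/prod.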

16.4 Type constraints for substitution groups

In Example 16–2, the complex types of shirt and hat are both derived
from the type of product. This is a requirement; members of a substi-
tution group must have types that are either the same as the type of
the head, or derived from it by either extension or restriction. They
can be directly derived from it, or derived indirectly through multiple
levels of restriction and/or extension.
In our example, shirt is assigned a named type, ShirtType, which
extends ProductType, while hat has an anonymous type, also an ex-
tension of ProductType. The third element declaration, umbrella,
does not specify a type. If a substitution group member is specified
without a type, it automatically takes on the type of the head of its
substitution group. Therefore, in this case, umbrella has the type
ProductType.
This type constraint on the members of a substitution group is not
as restrictive as it seems. You can make the type of the head very
generic, allowing almost anything to be derived from it. In fact, you
do not have to specify a type in the head element declaration at all,
which gives it the generic anyType. Since all types are derived (directly
or indirectly) from anyType, the members of the substitution group
in this case can have any types, including simple types.
The previous examples in this chapter use complex types, but substi-
tution groups may also be used for element declarations with simple
types. If a member of a substitution group has a simple type, it must
be a restriction of (or the same as) the simple type of the head.
Example 16–4 shows a substitution group of element declarations
with simple types. Note that the head element declaration, number,
does not specify a type, meaning that the members of the substitution
group may have any type.

Example 16–4. Substitution group with simple types

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="number"/>
  <xs:element name="skuNumber" type="xs:string"
              substitutionGroup="number"/>
  <xs:element name="productID" type="xs:integer"
              substitutionGroup="number"/>
</xs:schema>
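
By contrast, when the head does specify a simple type, the members are constrained by it. The following sketch is a hypothetical variation on Example 16–4 in which number is given the type xs:integer.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Head with an explicit simple type -->
  <xs:element name="number" type="xs:integer"/>

  <!-- Legal: xs:positiveInteger is derived by restriction
       from xs:integer -->
  <xs:element name="productID" type="xs:positiveInteger"
              substitutionGroup="number"/>

  <!-- Would be illegal: xs:string is not derived from xs:integer
  <xs:element name="skuNumber" type="xs:string"
              substitutionGroup="number"/>
  -->
</xs:schema>
```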

16.5 Members in multiple groups

In version 1.0, each element declaration can only be a member of one
substitution group. In version 1.1, it is possible for an element declara-
tion to be a member of many substitution groups. This is done
by specifying a space-separated list of names as the value of the
substitutionGroup attribute, as shown in Example 16–5.

Example 16–5. A member of two substitution groups

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="product" type="ProductType"/>
  <xs:element name="discontinuedProduct" type="ProductType"/>
  <xs:element name="hat" type="HatType"
              substitutionGroup="product"/>
  <xs:element name="shirt" type="ShirtType"
              substitutionGroup="product"/>
  <xs:element name="umbrella" type="UmbrellaType"
              substitutionGroup="product discontinuedProduct"/>
  <!--...-->
</xs:schema>

In this example, there are two head element declarations: product
and discontinuedProduct. The hat and shirt declarations are in
just the product substitution group, but umbrella is in both. This
means that umbrella can appear anywhere either of these two elements
can appear. The type restrictions described in the previous section still
apply, so it is generally necessary for the two head element declarations
to use the same type (which can be anyType) or types that are related to
each other by derivation.

16.6 Alternatives to substitution groups

Substitution groups are very useful, but as you may have guessed, there
are other methods of achieving similar goals. This section will take a
closer look at two of these methods.

16.6.1 Reusable choice groups

The behavior of substitution groups is similar to that of named choice
groups. In the previous examples, we said that wherever product can
appear, it can really be product, shirt, hat, or umbrella. This
choice can also be represented by a named choice group that lists the
relevant element declarations. Example 16–6 shows the definition
of a named model group that allows a choice of product or shirt or
hat or umbrella. This named model group is then referenced in the
ItemsType definition.

Example 16–6. Using a choice group

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="items" type="ItemsType"/>

  <xs:complexType name="ItemsType">
    <xs:group ref="ProductGroup" maxOccurs="unbounded"/>
  </xs:complexType>

  <xs:group name="ProductGroup">
    <xs:choice>
      <xs:element name="product" type="ProductType"/>
      <xs:element name="shirt" type="ShirtType"/>
      <xs:element name="hat" type="HatType"/>
      <xs:element name="umbrella" type="ProductType"/>
    </xs:choice>
  </xs:group>

</xs:schema>

It is easy to see the list of elements that are allowed, because they are
all declared within the named model group. This can be an advantage
if the list of member element declarations will not change. On the
other hand, if you want to be able to add new element declarations as
needed, from a variety of schema documents, using substitution groups
is a much better approach. This is because named choice groups are
more rigid. While you can use redefining or overriding to extend a
named choice group, it is more cumbersome and can only be done
in schema documents with the same target namespace.

16.6.2 Substituting a derived type in the instance

Another alternative to using substitution groups is to repeat the same
element name for all of the items (in this case product), and use
xsi:type attributes to distinguish between the different types of
products. Using this approach, we would not declare shirt, hat, or
umbrella elements at all, just their types, as shown in Example 16–7.
Remember, it is acceptable to substitute a derived type in an instance
if you specify the xsi:type attribute. This is described in more detail
in Section 13.6 on p. 341.

Example 16–7. Defining derived types

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="ShirtType">
    <xs:complexContent>
      <xs:extension base="ProductType">
        <xs:sequence>
          <xs:element name="size" type="ShirtSizeType"/>
          <xs:element name="color" type="ColorType"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="HatType">
    <xs:complexContent>
      <xs:extension base="ProductType">
        <xs:sequence>
          <xs:element name="size" type="HatSizeType"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="UmbrellaType">
    <xs:complexContent>
      <xs:extension base="ProductType"/>
    </xs:complexContent>
  </xs:complexType>

</xs:schema>

Example 16–8 shows a valid instance for this approach. The product
element is repeated many times, with the xsi:type attribute
distinguishing between the different product types.
The advantage of this approach is that the instance may be easier to
process. A Java program or XSLT stylesheet that handles this instance
can treat all product types the same based on their element name, but
also distinguish between them using the value of xsi:type if necessary.

Example 16–8. Valid instance using derived types

<items xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <product>
    <number>999</number>
    <name>Special Seasonal</name>
  </product>
  <product xsi:type="ShirtType">
    <number>557</number>
    <name>Short-Sleeved Linen Blouse</name>
    <size>10</size>
    <color value="blue"/>
  </product>
  <product xsi:type="HatType">
    <number>563</number>
    <name>Ten-Gallon Hat</name>
    <size>L</size>
  </product>
  <product xsi:type="UmbrellaType">
    <number>443</number>
    <name>Deluxe Golf Umbrella</name>
  </product>
</items>

With substitution groups and some XML technologies, retrieving all the
products requires selecting them based on their position in the
instance (e.g., all children of items) rather than their element name
(product), which could be less reliable. This distinction is not as
important with schema-aware technologies like XSLT 2.0, with which you
can refer generically to schema-element(product), which means “product
or any of its substitutes.”
Type substitution also has some disadvantages. It works fine for
schema validation, but it is impossible to write a DTD that would
validate this instance to the same degree. Also, it looks slightly more
complicated and requires a declaration of the XML Schema Instance
Namespace, which adds an extra dependency.

16.7 Controlling substitution groups

Substitution groups are a powerful tool, and you may want to control
their use. Three attributes of element declarations control the creation
and use of substitutions.

• The final attribute limits the declaration of substitution groups
  in schemas.
• The block attribute limits the use of substituted elements in
  instances.
• The abstract attribute forces element substitution in instances.

These three attributes only apply to global element declarations,
since local element declarations can never serve as heads of substitution
groups.

16.7.1 final: Preventing substitution group declarations

You may want to prevent other people from defining schemas that
use your element declaration as the head of a substitution group. This
is accomplished using the final attribute in the element declaration,
which may have one of the following values:

• #all, in version 1.0, prevents any other element declaration
  from using your element declaration as a substitution group
  head. In version 1.1, it only prevents element declarations whose
  types are extensions or restrictions from being in the substitution
  group, but allows element declarations that have the same type
  as the head.
• extension prevents extension in substitution group members.
  An element declaration that uses your element declaration as
  its substitution group head must have a type that is either the
  same as, or derived by restriction from, the type of your element
  declaration.
• restriction prevents restriction in substitution group members.
  An element declaration that uses your element declaration
  as its substitution group head must have a type that is either the
  same as, or derived by extension from, the type of your element
  declaration.
• extension restriction and restriction extension
  are values that have the same effect as #all.
• "" (an empty string) means that there are no restrictions. This
  value is useful for overriding the value of finalDefault, as
  described below.
• If no final attribute is specified, it takes its value from
  the finalDefault attribute of the schema element. If neither
  final nor finalDefault is specified, there are no restrictions
  on substitutions for that element declaration.

Example 16–9 shows four element declarations that control the use
of substitution groups. With this declaration of product, the schema
shown in Example 16–2 would have been illegal, since it attempts to
use the product element declaration as the head of a substitution
group.

Example 16–9. Using final to control substitution group declaration

<xs:element name="product" type="ProductType" final="#all"/>
<xs:element name="items" type="ItemsType" final="extension"/>
<xs:element name="color" type="ColorType" final="restriction"/>
<xs:element name="size" type="SizeType" final=""/>
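The effect of a single final value can be sketched as follows. This is only an illustration: it assumes that ShirtType is derived by extension from ProductType, and it introduces a hypothetical RestrictedProductType (derived by restriction from ProductType) and hypothetical element names item and basicProduct that do not appear in the examples above.

```xml
<!-- Head of the substitution group; extension is not allowed in members -->
<xs:element name="product" type="ProductType" final="extension"/>

<!-- Allowed: same type as the head -->
<xs:element name="item" type="ProductType"
            substitutionGroup="product"/>

<!-- Allowed: RestrictedProductType is assumed to restrict ProductType -->
<xs:element name="basicProduct" type="RestrictedProductType"
            substitutionGroup="product"/>

<!-- Schema error: ShirtType is assumed to extend ProductType -->
<xs:element name="shirt" type="ShirtType"
            substitutionGroup="product"/>
```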

16.7.2 block: Blocking substitution in instances

In the previous section, we saw how to prevent a schema from containing
an element declaration that uses your element declaration as its
substitution group head. There is another way to control element
substitutions, this time in the instance. This is accomplished by using
the block attribute, and assigning the value substitution (or #all)
to it. Example 16–10 shows element declarations that use the block
attribute.
With this declaration of product, the schema shown in Example 16–2
would have been legal, but the instance in Example 16–3
would have been illegal. This is the extremely subtle difference between
the final and block attributes as they relate to substitution groups.
The block attribute also accepts the values extension and
restriction, as described in Section 13.7.3 on p. 346. These values
can also affect substitution groups, in that they can block members
whose types are derived by either extension or restriction. For example,
if Example 16–2 were changed to add block="extension" to the
product declaration, that would make substituting shirt or hat
invalid in the instance, because their types are derived by extension from
the type of product.

Example 16–10. Using block to prevent substitution group use

<xs:element name="product" type="ProductType" block="#all"/>
<xs:element name="hat" type="HatType" block="substitution"/>
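From the instance side, the consequence might look like the sketch below. The child elements shown for product and shirt are assumptions made for illustration, as is the premise that the content model of items refers to product.

```xml
<items>
  <!-- Valid: product itself appears, so nothing is substituted -->
  <product>
    <number>557</number>
    <name>Short-Sleeved Linen Blouse</name>
  </product>
  <!-- Invalid if product is declared with block="substitution"
       or block="#all": shirt is substituted for product -->
  <shirt>
    <number>557</number>
    <name>Short-Sleeved Linen Blouse</name>
  </shirt>
</items>
```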

16.7.3 abstract: Forcing substitution

An element declaration may be abstract, meaning that its sole purpose
is to serve as the head of a substitution group. Elements declared as
abstract can never appear in instance documents. This is indicated by
the abstract attribute in the element declaration. Example 16–11
shows an abstract element declaration for product. With this declaration,
Example 16–3 would be invalid because a product element appears
in the instance. Instead, only shirt, hat, and umbrella would
be able to appear in items.

Example 16–11. An abstract element declaration

<xs:element name="product" type="ProductType" abstract="true"/>
Chapter 17
Identity constraints

Identity constraints allow you to uniquely identify nodes in
a document and ensure the integrity of references between
them. This chapter explains how to define and use identity
constraints.

17.1 Identity constraint categories

There are three categories of identity constraints.

# Uniqueness constraints enforce that a value (or combination of
values) is unique within a specified scope. For example, all
product numbers must be unique within a catalog.
# Key constraints also enforce uniqueness, and additionally require
that all values be present. For example, every product must
have a number and it must be unique within a catalog.
# Key references enforce that a value (or combination of values)
corresponds to a value represented by a key or uniqueness
constraint. For example, for every product number that appears
as an item in a purchase order, there must be a corresponding
product number in the product description section.

17.2 Design hint: Should I use ID / IDREF or key / keyref?

The identity constraints described in this chapter are much more
powerful than using attributes of types ID and IDREF. Limitations of
ID and IDREF include:

# They are recommended for use only for attributes, not elements.
# They are scoped to the entire document only.
# They are based on one value, as opposed to multifield keys.
# They require ID or IDREF to be the type of the attribute,
precluding data validation of that attribute.
# They are based on string equality, as opposed to value equality.
# They require that the values be based on XML names, meaning
they must start with a letter or underscore and can only contain
letters, digits, and a few punctuation marks.

However, if ID and IDREF fulfill your requirements, there is no
reason not to use them, particularly when representing simple
cross-references in narrative documents or converting DTDs that are already
in use.
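For comparison, a simple cross-reference based on ID and IDREF might be declared as in the following sketch. The type names SectionType and CrossRefType, the para element, and the attribute names id and target are hypothetical and chosen only for illustration.

```xml
<!-- A section that can be the target of a cross-reference -->
<xs:complexType name="SectionType">
  <xs:sequence>
    <xs:element name="para" type="xs:string" maxOccurs="unbounded"/>
  </xs:sequence>
  <!-- Must be an XML name and unique within the whole document -->
  <xs:attribute name="id" type="xs:ID"/>
</xs:complexType>

<!-- A cross-reference whose target must match some ID in the document -->
<xs:complexType name="CrossRefType">
  <xs:attribute name="target" type="xs:IDREF" use="required"/>
</xs:complexType>
```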

17.3 Structure of an identity constraint

The three categories of identity constraints are similar in their definitions
and associated rules. This section describes the basic structure of
identity constraints. Example 17–1 shows an instance that contains
product catalog information.

Example 17–1. Product catalog information

<catalog>
  <department number="021">
    <product>
      <number>557</number>
      <name>Short-Sleeved Linen Blouse</name>
      <price currency="USD">29.99</price>
    </product>
    <product>
      <number>563</number>
      <name>Ten-Gallon Hat</name>
      <price currency="USD">69.99</price>
    </product>
    <product>
      <number>443</number>
      <name>Deluxe Golf Umbrella</name>
      <price currency="USD">49.99</price>
    </product>
  </department>
</catalog>

Example 17–2 shows the definition of a uniqueness constraint that
might be applied to the instance in Example 17–1.

Example 17–2. A uniqueness constraint

<xs:element name="catalog" type="CatalogType">
  <xs:unique name="prodNumKey">
    <xs:selector xpath="*/product"/>
    <xs:field xpath="number"/>
  </xs:unique>
</xs:element>

All three categories of identity constraints are defined entirely within
an element declaration. It can be either a global or local element
declaration, but it cannot be an element reference. Identity constraints must
be defined at the end of the element declaration, after any simpleType
or complexType child. There can be multiple identity constraints in
a single element declaration.

Every identity constraint has a name, which takes on the target
namespace of the schema document. The qualified name must
be unique among all identity constraints of all categories within the
entire schema. For example, it would be illegal to have a key
constraint named customerNumber and a uniqueness constraint named
customerNumber in the same schema, even if they were scoped to
different elements.

There are three parts to an identity constraint definition.

1. The scope is an element whose declaration contains the
constraint. In our example, a catalog element is the scope. It is
perfectly valid to have two products with the same number if
they are contained in two different catalog elements.
2. The selector serves to select all the nodes to which the constraint
applies. In our example, the selector value is */product, which
selects all the product grandchildren of catalog.
3. The one or more fields are the element and attribute values
whose combination must be unique among the selected nodes.
There can be only one instance of each field per selected
node. In our example, there is one field specified: the number
child of each product element.
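To illustrate how scope works, the abbreviated instance below repeats a product number in two different catalog elements, which the constraint in Example 17–2 permits. The catalogs wrapper element is hypothetical, and the name and price children are omitted for brevity on the assumption that this does not matter for the point being made.

```xml
<!-- "catalogs" is a hypothetical wrapper, not part of the examples above -->
<catalogs>
  <catalog>
    <department number="021">
      <product>
        <number>557</number>
      </product>
    </department>
  </catalog>
  <catalog>
    <department number="021">
      <!-- Same number, but a different catalog and so a different scope -->
      <product>
        <number>557</number>
      </product>
    </department>
  </catalog>
</catalogs>
```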

17.4 Uniqueness constraints

A uniqueness constraint is used to validate that the values of certain
elements or attributes are unique within a particular scope. This is
represented by a unique element, whose syntax is shown in Table 17–1.
In Example 17–2, we used a uniqueness constraint to ensure that all
the product numbers in the catalog are unique. It is also possible to
ensure uniqueness of a combination of multiple fields. In the instance
shown in Example 17–3, each product may have an effective date.

Table 17–1 XSD Syntax: uniqueness constraint

Name: unique
Parents: element

Attribute name    Type      Description
id                ID        Unique ID.
name              NCName    Unique name.

Content: annotation?, selector, field+

Example 17–3. Product catalog information, revisited

<catalog>
  <department number="021">
    <product effDate="2000-02-27">
      <number>557</number>
      <name>Short-Sleeved Linen Blouse</name>
      <price currency="USD">29.99</price>
    </product>
    <product effDate="2001-04-12">
      <number>557</number>
      <name>Short-Sleeved Linen Blouse</name>
      <price currency="USD">39.99</price>
    </product>
    <product effDate="2001-04-12">
      <number>563</number>
      <name>Ten-Gallon Hat</name>
      <price currency="USD">69.99</price>
    </product>
    <product>
      <number>443</number>
      <name>Deluxe Golf Umbrella</name>
      <price currency="USD">49.99</price>
    </product>
  </department>
</catalog>

It is valid for two products to have the same number, as long as they
have different effective dates. In other words, we want to validate that
the combinations of number and effDate are unique. Example 17–4
shows the uniqueness constraint that accomplishes this.

Example 17–4. Constraining uniqueness of two combined fields

<xs:element name="catalog" type="CatalogType">
  <xs:unique name="dateAndProdNumKey">
    <xs:selector xpath="department/product"/>
    <xs:field xpath="number"/>
    <xs:field xpath="@effDate"/>
  </xs:unique>
</xs:element>

Note that this example works because both number and effDate
are subordinate to the product elements. Using the instance in
Example 17–3, it would be invalid to define a multifield uniqueness
constraint on the department number and the product number. If you
defined the selector to select all departments, the product/number
field would yield more than one field node per selected node, which is
not permitted. If you defined the selector to select all products, you
would have to access an ancestor node to get the department number,
which is not permitted.
You can get around this by defining two uniqueness constraints: one
in the scope of catalog to ensure that all department numbers are
unique within a catalog, and another in the scope of department
to ensure that all product numbers are unique within a department.
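The two-constraint workaround could be sketched as follows. This is an illustration rather than a definitive design: it assumes no target namespace, declares department locally so that it can carry its own constraint, assumes that DepartmentType contains product children and a number attribute, and uses the hypothetical constraint names deptNumKey and prodNumInDeptKey.

```xml
<xs:element name="catalog">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="department" type="DepartmentType"
                  maxOccurs="unbounded">
        <!-- Scope is department: product numbers are unique per department -->
        <xs:unique name="prodNumInDeptKey">
          <xs:selector xpath="product"/>
          <xs:field xpath="number"/>
        </xs:unique>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  <!-- Scope is catalog: department numbers are unique per catalog -->
  <xs:unique name="deptNumKey">
    <xs:selector xpath="department"/>
    <xs:field xpath="@number"/>
  </xs:unique>
</xs:element>
```

Each constraint has a single field that is directly subordinate to its selected nodes, so neither needs to reach an ancestor or yield more than one field node per selected node.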

17.5 Key constraints

A key constraint is similar to a uniqueness constraint in that the
combined fields in the key must be unique. Key constraints have an
additional requirement that all of the field values must be present in the
document. Therefore, you should not define keys on elements or
attributes that are optional. In addition, the fields on which the key is
defined cannot be nillable.
Key constraints are represented by key elements, whose syntax is
shown in Table 17–2. It is identical to that of the unique elements.

Table 17–2 XSD Syntax: key constraint

Name: key
Parents: element

Attribute name    Type      Description
id                ID        Unique ID.
name              NCName    Unique name.

Content: annotation?, selector, field+

Example 17–5 changes Example 17–2 to be a key constraint instead
of a uniqueness constraint. In this case, every product element in the
instance would be required to have a number child, regardless of
whether the complex type of product requires it. The values of those
number children have to be unique within the scope of catalog.

Example 17–5. Defining a key on product number

<xs:element name="catalog" type="CatalogType">
  <xs:key name="prodNumKey">
    <xs:selector xpath="*/product"/>
    <xs:field xpath="number"/>
  </xs:key>
</xs:element>
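As an illustration of the difference from a uniqueness constraint, the abbreviated instance below would violate the key because one selected product has no number child. It assumes, for the sake of the example, that the complex type of product makes number optional; under the uniqueness constraint of Example 17–2, the same instance would not violate the identity constraint.

```xml
<catalog>
  <department number="021">
    <product>
      <number>557</number>
      <name>Short-Sleeved Linen Blouse</name>
    </product>
    <!-- Violates prodNumKey: this selected product has no number child -->
    <product>
      <name>Ten-Gallon Hat</name>
    </product>
  </department>
</catalog>
```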

17.6 Key references

Key references are used to ensure that there is a match between two
sets of values in an instance. They are similar to foreign keys in
databases. Key references are represented by keyref elements, whose
syntax is shown in Table 17–3.

Table 17–3 XSD Syntax: key reference

Name: keyref
Parents: element

Attribute name    Type      Description
id                ID        Unique ID.
name              NCName    Unique name.
refer             QName     Name of the key/uniqueness constraint
                            being referenced.

Content: annotation?, selector, field+

The refer attribute is used to reference a key or uniqueness
constraint by its qualified name. If the constraint is defined in a schema
document with a target namespace, the refer attribute must reference
a name that is either prefixed or in the scope of a default namespace
declaration.

Suppose we have an order for three items: two shirts and one hat,
as shown in Example 17–6. The two shirts are the same except for their
color, so they both have the same product number. All the descriptive
product information appears at the end of the order. We want a way
to ensure that every item in the order has a corresponding product
description in the document.

Example 17–6. Key references

<order>
  <number>123ABBCC123</number>
  <items>
    <shirt number="557">
      <quantity>1</quantity>
      <color value="blue"/>
    </shirt>
    <shirt number="557">
      <quantity>1</quantity>
      <color value="sage"/>
    </shirt>
    <hat number="563">
      <quantity>1</quantity>
    </hat>
  </items>
  <products>
    <product>
      <number>557</number>
      <name>Short-Sleeved Linen Blouse</name>
      <price currency="USD">29.99</price>
    </product>
    <product>
      <number>563</number>
      <name>Ten-Gallon Hat</name>
      <price currency="USD">69.99</price>
    </product>
  </products>
</order>

Example 17–7 shows the definition of a key reference and its
associated key. In this example, the number attribute of any child of items
must match a number child of a product element. The meaning of
the XPath syntax will be described in detail later in this chapter.

Example 17–7. Defining a key reference on product number

<xs:element name="order" type="OrderType">
  <xs:keyref name="prodNumKeyRef" refer="prodNumKey">
    <xs:selector xpath="items/*"/>
    <xs:field xpath="@number"/>
  </xs:keyref>
  <xs:key name="prodNumKey">
    <xs:selector xpath=".//product"/>
    <xs:field xpath="number"/>
  </xs:key>
</xs:element>

Note that the key reference field values are not required to be unique;
that is not their purpose. It is valid to have duplicate shirt numbers in
the items section.

As with key and uniqueness constraints, key references can be on
multiple fields. A key reference must have the same number of fields
as the key or uniqueness constraint that it
references. The fields are matched in the same order, and they must
have related types.
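A multifield key reference might look like the following sketch. It assumes, purely for illustration, that every product element and every child of items carries an effDate attribute (Example 17–6 does not include one), and the constraint names prodNumAndDateKey and prodNumAndDateKeyRef are hypothetical.

```xml
<xs:element name="order" type="OrderType">
  <xs:key name="prodNumAndDateKey">
    <xs:selector xpath=".//product"/>
    <xs:field xpath="number"/>
    <xs:field xpath="@effDate"/>
  </xs:key>
  <xs:keyref name="prodNumAndDateKeyRef" refer="prodNumAndDateKey">
    <xs:selector xpath="items/*"/>
    <!-- Fields are matched by position: first to first, second to second -->
    <xs:field xpath="@number"/>
    <xs:field xpath="@effDate"/>
  </xs:keyref>
</xs:element>
```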

17.6.1 Key references and scope

There is an additional constraint on the scope of key references and
key constraints. The key referenced by a keyref must be defined in
the same element declaration or in a declaration of one of its
descendants. It is not possible for a keyref to reference a key that is defined
in a sibling or ancestor element declaration. In our example, the key
and keyref were both defined in the declaration of order. It would
also have been valid if the key had been defined in the products
declaration. However, it would have been invalid if the keyref had
been defined in the items declaration, because items is a child of
order.
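The valid alternative mentioned above, with the key defined in the products declaration, could be sketched like this. The anonymous complex type for order and the type names ItemsType and ProductsType are assumptions made so that the fragment shows where each constraint sits.

```xml
<xs:element name="order">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="number" type="xs:string"/>
      <xs:element name="items" type="ItemsType"/>
      <xs:element name="products" type="ProductsType">
        <!-- Key defined in a descendant declaration of order -->
        <xs:key name="prodNumKey">
          <xs:selector xpath="product"/>
          <xs:field xpath="number"/>
        </xs:key>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
  <!-- The keyref stays on order, an ancestor of the key's scope -->
  <xs:keyref name="prodNumKeyRef" refer="prodNumKey">
    <xs:selector xpath="items/*"/>
    <xs:field xpath="@number"/>
  </xs:keyref>
</xs:element>
```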

17.6.2 Key references and type equality

When defining key references, it is important to understand XML
Schema’s concept of equality. When determining whether two values
are equal, their type is taken into account. Values with unrelated types
will never be considered equal. For example, a value 2 of type string
is not equal to a value 2 of type integer. However, if two types are
related by restriction, such as integer and positiveInteger, they
can have equal values. When you define a key reference, make sure that
the types of its fields are related to the types of the fields in the
referenced key or uniqueness constraint. In Example 17–7, if the number
attribute of shirt were declared as an integer and the number child
of product were declared as a string, there would have been no
matches. For more information on type equality, see Section 11.7 on
p. 253.
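As a small sketch of related field types, the two declarations below would allow the values in Example 17–6 to match, because positiveInteger is derived by restriction from integer. Where exactly these declarations appear (inside the types of the items and of product) is an assumption.

```xml
<!-- Referencing field: inside the (assumed) type of shirt and hat -->
<xs:attribute name="number" type="xs:positiveInteger"/>

<!-- Key field: inside the (assumed) type of product -->
<xs:element name="number" type="xs:integer"/>
```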

17.7 Selectors and fields

All three categories of identity constraints are specified in terms of a
selector and one or more fields. This section explains selectors and fields
in more detail.

17.7.1 Selectors
The purpose of a selector is to identify the set of nodes to which the
constraint applies. The selector is relative to the scoping element. In
Example 17–2, our selector was */product. This selects all the
product grandchildren of catalog. There may be other grandchildren
of catalog, or other product elements elsewhere in the document,
but the constraint does not apply to them.
The selector is represented by a selector element, whose syntax is
shown in Table 17–4.

Table 17–4 XSD Syntax: constraint selector

Name: selector
Parents: unique, key, keyref

Attribute name           Type                      Description
id                       ID                        Unique ID.
xpath                    XPath subset              XPath to the selected nodes.
xpathDefaultNamespace    anyURI |                  The default namespace for
(version 1.1 only)       "##defaultNamespace" |    XPath expressions.
                         "##targetNamespace" |
                         "##local"

Content: annotation?

17.7.2 Fields
Each field must identify a single node relative to each node selected by
the selector. The key reference in Example 17–7 works because there
can only ever be one number attribute per selected node. In the instance
in Example 17–6, the selector selects three nodes (the three children
of items), and there is only one number attribute per node.
You might have been tempted to define a uniqueness constraint
as shown in Example 17–8. This would not work because the
selector would select one node (the single department element) and
there would be three product/number nodes relative to it.

Example 17–8. Illegal uniqueness constraint

<xs:element name="catalog" type="CatalogType">
  <xs:unique name="prodNumKey">
    <xs:selector xpath="department"/>
    <xs:field xpath="product/number"/>
  </xs:unique>
</xs:element>

The elements or attributes that are used as fields must have simple
content and cannot be declared nillable.
Fields are represented by field elements, whose syntax is shown in
Table 17–5.

Table 17–5 XSD Syntax: constraint field

Name: field
Parents: unique, key, keyref

Attribute name           Type                      Description
id                       ID                        Unique ID.
xpath                    XPath subset              XPath to the key field.
xpathDefaultNamespace    anyURI |                  The default namespace for
(version 1.1 only)       "##defaultNamespace" |    XPath expressions.
                         "##targetNamespace" |
                         "##local"

Content: annotation?

17.8 XPath subset for identity constraints

All values of the xpath attribute in the selector and field tags must
be legal XPath expressions. However, they must also conform to a
subset of XPath that is defined specifically for identity constraints.
XPath expressions are made up of paths, separated by vertical bars.
For example, the XPath expression department/product/name |
department/product/price uses two paths to select all the nodes
that are either name or price children of product elements whose
parent is department.
Each path may begin with the .// literal, which means that the
matching nodes may appear anywhere among the descendants of
the current scoping element. If it is not included, it is assumed that
matching nodes may appear only as direct children of the scoping
element.
Each path is made up of steps, separated by forward slashes. For ex-
ample, the path department/product/name is made up of three
steps: department, product, and name. Table 17–6 lists the types
of steps that may appear in the identity constraint XPath subset.
The context node of the selector expression is the element in whose
declaration the identity constraint is defined. The context node of
the field expression is the result of evaluating the selector expression.
Table 17–7 shows some legal XPath expressions for selectors and
fields. They assume that the scope of the identity constraint is the
catalog element, as shown in Example 17–3.

Table 17–6 XPath subset steps

Step                       Description                               Example
Qualified element name     A child element name, which must be       department,
                           prefixed if it is in a namespace.         prod:department
Period (.)                 The current element.                      .
Asterisk (*)               A wildcard representing any element.      *
Prefix plus an asterisk    A wildcard representing any element in    prod:*
                           the namespace mapped to that prefix.
@ plus a qualified         An attribute name, which must be          @number,
attribute name             prefixed if the attribute name is in a    @prod:number
                           namespace (legal for field only).

Table 17–7 XPath subset expressions in the scope of catalog

XPath                         Nodes selected
.                             catalog itself.
*                             All direct children of catalog.
.//*                          All elements appearing anywhere in catalog.
department or ./department    All department elements that are direct
                              children of catalog.
./*/*                         All grandchildren of catalog (regardless
                              of their or their parents’ element names).
.//product                    All product elements appearing anywhere
                              in catalog.
./*/product                   All product elements that are grandchildren
                              of catalog (regardless of their parent).
.//product/name               All name elements whose parent is
                              product appearing anywhere in catalog.
./department/product/name     All name elements whose parent is a
                              product whose parent is a department
                              whose parent is catalog.
.//department | .//product    All department elements and product
                              elements appearing anywhere in catalog.
@effDate                      The effDate attribute (legal only for
                              field, not selector).
product/@effDate              The effDate attributes of all product
                              children (legal only for field, not
                              selector).
./prod:product/prod:*         All grandchildren whose parent is
                              product in the namespace mapped to the
                              prod prefix.

Technically, any of the XPath expressions in Table 17–7 is legal for
a field. However, since the field XPath can only identify a node that
appears once relative to the selected node, most of the expressions
that contain wildcards to select multiple nodes are inappropriate
for fields. The field XPath will usually consist of a single child element
or a single attribute.
Table 17–8 shows some expressions that, while they are legal XPath,
are not in the identity constraint XPath subset.

Table 17–8 Illegal XPath subset expressions

XPath(s)                   Comments                        Workaround
../department or           Ancestors cannot be accessed.   Move the constraint up one level to
ancestor::department or                                    the parent element (this does not
parent::department                                         always work if the fields appear at
                                                           multiple levels).
descendant::product        The descendant keyword is       .//product
                           not supported.
text()                     You only need to specify the    Use a single period (.) if it is the
                           element name; text() is         current node, otherwise simply leave
                           implied.                        off the text().
department//name           All steps must contain          department/*/name |
                           something; double slash is      department/*/*/name | ...
                           not allowed except for “.//”    or move the constraint down one
                           at the beginning of a path.     level to the department element.
//product                  The document root cannot be     Move the constraint up to the level
                           accessed.                       of the root (catalog) element, then
                                                           use .//product.
/catalog/product           The document root cannot be     Move the constraint up to the level
                           accessed.                       of the root (catalog) element, then
                                                           use product.

17.9 Identity constraints and namespaces

Special consideration must be given to namespaces when defining
identity constraints. By default, qualified element names and attribute
names used in the XPath expressions must be prefixed in order to be
legal. Let’s take another look at our uniqueness constraint from
Example 17–4. That definition assumed that the schema document had
no target namespace. If we add a target namespace, it looks like
Example 17–9.

Example 17–9. Prefixing names in the XPath expression

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:prod="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">
  <xs:element name="catalog" type="prod:CatalogType">
    <xs:unique name="dateAndProdNumKey">
      <xs:selector xpath="prod:department/prod:product"/>
      <xs:field xpath="prod:number"/>
      <xs:field xpath="@effDate"/>
    </xs:unique>
  </xs:element>
  <xs:element name="department" type="prod:DepartmentType"/>
  <xs:element name="product" type="prod:ProductType"/>
  <xs:element name="number" type="xs:integer"/>

</xs:schema>

Each of the element names in the XPath is prefixed with prod,
mapping it to the http://datypic.com/prod namespace. In our
example, all element declarations (department, product, and number)
are global, and therefore their names must be prefixed. Let’s assume
that the attribute effDate is locally declared and unqualified, so its
name is not prefixed in the XPath expression.
The names that must be qualified in an XPath expression are those
that must be qualified in an instance, namely:

# All element names and attribute names in global declarations
# Element names and attribute names in local declarations
whose form is qualified, either directly, using the form
attribute, or indirectly through elementFormDefault or
attributeFormDefault

Note that the target namespace is mapped to a prefix, rather than
being the default namespace. This is because XPath expressions are not
affected by default namespace declarations. Unprefixed names in XPath
expressions are assumed to be in no namespace, even if a default
namespace declaration is in scope.

Therefore, if you want to use identity constraints in a schema
document that has a target namespace, you must map the target namespace
to a prefix. Example 17–10 uses unprefixed names in the XPath
expressions, assuming that these names take on the default namespace.
This is not the case; in fact, these elements will not be found because
the processor will be looking for elements with unqualified names when
evaluating the XPath expressions.

Example 17–10. Illegal attempt to apply default namespace to XPath

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">
  <xs:element name="catalog" type="CatalogType">
    <xs:unique name="dateAndProdNumKey">
      <xs:selector xpath="department/product"/>
      <xs:field xpath="number"/>
      <xs:field xpath="@effDate"/>
    </xs:unique>
  </xs:element>
  <xs:element name="department" type="DepartmentType"/>
  <xs:element name="product" type="ProductType"/>
  <xs:element name="number" type="xs:integer"/>

</xs:schema>

17.9.1 Using xpathDefaultNamespace

In version 1.1, this problem is alleviated somewhat because you can
specify an xpathDefaultNamespace attribute, which designates the
default namespace for all unprefixed element names that are used in
the XPath. It does not affect attribute names.
Example 17–11 uses the xpathDefaultNamespace attribute on
the schema element. This means that the element names department,
product, and number used in the selector and field XPaths are
interpreted as being in the http://datypic.com/prod namespace.
Instead of specifying a namespace name, the xpathDefaultNamespace
attribute can contain one of three special keywords:
##targetNamespace, ##defaultNamespace, or ##local. These
are described in detail in Section 14.1.3.1 on p. 373.

Example 17–11. Using xpathDefaultNamespace

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           xpathDefaultNamespace="http://datypic.com/prod">
  <xs:element name="catalog" type="CatalogType">
    <xs:unique name="dateAndProdNumKey">
      <xs:selector xpath="department/product"/>
      <xs:field xpath="number"/>
      <xs:field xpath="@effDate"/>
    </xs:unique>
  </xs:element>
  <xs:element name="department" type="DepartmentType"/>
  <xs:element name="product" type="ProductType"/>
  <xs:element name="number" type="xs:integer"/>

</xs:schema>
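The keyword form can be sketched as a variation on Example 17–11. Here ##targetNamespace stands in for the namespace name, so the XPath default follows the target namespace if it ever changes. The remaining declarations and the definition of CatalogType are omitted and assumed to be the same as before.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           xpathDefaultNamespace="##targetNamespace">
  <xs:element name="catalog" type="CatalogType">
    <xs:unique name="dateAndProdNumKey">
      <!-- Unprefixed element names resolve to the target namespace -->
      <xs:selector xpath="department/product"/>
      <xs:field xpath="number"/>
      <!-- Attribute names are not affected by xpathDefaultNamespace -->
      <xs:field xpath="@effDate"/>
    </xs:unique>
  </xs:element>
  <!-- Other declarations omitted -->
</xs:schema>
```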

17.10 Referencing identity constraints

In version 1.1, identity constraints can be defined once and referenced
from multiple elements. This is true for all three kinds of identity
constraints: uniqueness constraints, key constraints, and key references.
This is useful if you have the same constraints in multiple scopes and
want to reuse the code.
The syntax for referencing an identity constraint is shown in
Table 17–9. It is the same for all three kinds of identity constraints. Instead
of a name attribute, it has a ref attribute that references the identity
constraint by its qualified name. References to identity constraints do
not contain selector or field elements; they take their definition
from the constraint they reference.

Table 17–9 XSD Syntax: identity constraint reference

Name: unique, key, keyref
Parents: element

Attribute name    Type     Description
id                ID       Unique ID.
ref               QName    Name of the identity constraint being
                           referenced.

Content: annotation?

Example 17–12 shows a new element declaration
discontinuedProductList that has the same uniqueness constraint as catalog.
To indicate this, it contains a unique element, but with a ref attribute
instead of a name. Note that the two element declarations specify the
same type; this is not a requirement, but it is common since most
identity constraints would only be shared among elements that contain
a similar structure.

Example 17–12. Referencing an identity constraint

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           xpathDefaultNamespace="http://datypic.com/prod">
  <xs:element name="catalog" type="CatalogType">
    <xs:unique name="dateAndProdNumKey">
      <xs:selector xpath="department/product"/>
      <xs:field xpath="number"/>
      <xs:field xpath="@effDate"/>
    </xs:unique>
  </xs:element>

  <xs:element name="discontinuedProductList" type="CatalogType">
    <xs:unique ref="dateAndProdNumKey"/>
  </xs:element>

</xs:schema>

Being able to reference identity constraints is also useful when
restricting types. In version 1.0, if you used a local element declaration that
contained an identity constraint, it was impossible to restrict the
complex type that contained it because there was no formal definition of a
valid restriction of an identity constraint. Now that it can be named
and referenced, there is a formal way of indicating that an identity
constraint is the same as the identity constraint in the base type. This
is shown in Example 17–13, where the catalog element declaration
in the base type has an identity constraint, and the catalog element
declaration in the derived type references that identity constraint.

Example 17–13. Referencing an identity constraint in a restriction

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://datypic.com/prod"
           xmlns="http://datypic.com/prod"
           xpathDefaultNamespace="http://datypic.com/prod">
<xs:complexType name="CatalogListType">
<xs:sequence>
<xs:element name="catalog" type="CatalogType"
maxOccurs="unbounded">
<xs:unique name="dateAndProdNumKey">
<xs:selector xpath="department/product"/>
<xs:field xpath="number"/>
<xs:field xpath="@effDate"/>
</xs:unique>
</xs:element>
</xs:sequence>
</xs:complexType>

<xs:complexType name="RestrictedCatalogListType">
<xs:complexContent>
<xs:restriction base="CatalogListType">
<xs:sequence>
<xs:element name="catalog" type="CatalogType"
maxOccurs="1">
<xs:unique ref="dateAndProdNumKey"/>
</xs:element>
</xs:sequence>
</xs:restriction>
</xs:complexContent>
</xs:complexType>

</xs:schema>
Redefining and overriding schema components
Chapter 18

There are two methods of including a schema document into a schema while
modifying or overriding certain parts of it: redefine and override.
Redefinition, a version 1.0 feature, allows you to extend or restrict
certain components (namely types and groups), replacing the original
definitions, but only in certain constrained ways. Because of the
limitations of redefinition and its inconsistent implementation among
processors, redefinition was deprecated in version 1.1 and replaced with
a new override feature which is more flexible and better defined.
For the sake of completeness, redefinition is covered in the first half
of this chapter. However, because it is deprecated, you are strongly
encouraged to use the override feature instead if you are using
version 1.1 and require this kind of functionality. Overrides are
described in the second half of this chapter.1

1. The override section contains information that is redundant with the
redefine section; this is to allow readers who are implementing one or
the other to skip the section that does not interest them.

18.1 Redefinition
Redefinition is a way to extend and modify schemas over time while
still reusing the original definitions. It involves defining a new version
of a schema component, with the same name, that replaces the original
definition throughout the schema. This is useful for extending and/or
creating a subset of an existing schema.

18.1.1 Redefinition basics

A redefine is similar to an include, with the additional option of
specifying new definitions of some or all of the components in the re-
defined schema document. This is depicted in Figure 18–1. Like in-
cluded schema documents, redefined schema documents must have
the same target namespace as the redefining schema document, or
none at all.

Figure 18–1 Redefine

Only certain types of schema components can be redefined, namely
complex types, simple types, named model groups, and attribute groups.
The new definitions must be based on the original definitions. For
types, this means that the new type must restrict or extend the
original type. For attribute groups and named model groups, it means
that the new group must either be a subset or a superset of the original
group.
Example 18–1 shows a simple redefinition where the schema
document prod2.xsd redefines prod1.xsd. The simple type
DressSizeType is redefined in prod2.xsd.

Example 18–1. A simple redefinition

prod1.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:simpleType name="DressSizeType">
<xs:restriction base="xs:integer"/>
</xs:simpleType>

<xs:element name="size" type="DressSizeType"/>

<xs:element name="color" type="xs:string"/>

</xs:schema>

prod2.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">

<xs:redefine schemaLocation="prod1.xsd">
<xs:simpleType name="DressSizeType">
<xs:restriction base="DressSizeType">
<xs:minInclusive value="2"/>
<xs:maxInclusive value="16"/>
</xs:restriction>
</xs:simpleType>
</xs:redefine>

<xs:element name="newSize" type="DressSizeType"/>

</xs:schema>

18.1.1.1 Include plus redefine

When a schema document is redefined, all of its components are in-
cluded in the redefining schema document, regardless of whether they
are specifically mentioned in the redefinition. In this way, the redefine
feature is similar to the include feature. In our example, the resulting
schema document includes all of the components defined and declared
in both prod2.xsd and prod1.xsd. Even though color is not
mentioned in prod2.xsd, it will be included in the resulting schema
document.

18.1.1.2 Redefine and namespaces

The target namespace of the redefined schema document must be the
same as that of the redefining schema document, or nonexistent. If
the redefined schema document does not have a target namespace, all
of its components become chameleon components and will take on
the target namespace of the redefining schema document. For example,
since prod1.xsd does not have a target namespace, all of its compo-
nents will take on the target namespace of prod2.xsd. This includes
color, which is not specifically mentioned in the redefinition. It is not
a problem that the size element declaration in prod1.xsd references
DressSizeType without a prefix; the processor will correctly interpret
the references between components.
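
As a rough illustration of the chameleon effect in Example 18–1, the instance documents sketched below assume a schema assembled from prod2.xsd. The values are made up; the point is only that size and color, although declared in a schema document with no target namespace, must appear in the http://datypic.com/prod namespace, just like newSize.

```xml
<!-- Three separate instance documents, shown together for brevity. -->

<!-- size comes from prod1.xsd but takes on the namespace of prod2.xsd -->
<size xmlns="http://datypic.com/prod">12</size>

<!-- color is not mentioned in the redefine, yet it is included too -->
<color xmlns="http://datypic.com/prod">blue</color>

<!-- newSize is declared in prod2.xsd itself -->
<newSize xmlns="http://datypic.com/prod">4</newSize>
```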

18.1.1.3 Pervasive impact

Once a schema is redefined, the new definitions completely replace the
original definitions—not just for components in the new (redefining)
schema document, but also for components that reference them in the
original (redefined) schema document. In Example 18–1, the size
element declaration now uses the new DressSizeType. If there had
been types derived from DressSizeType in prod1.xsd, they would
now become derived from the new DressSizeType.

Redefinition of a component has a ripple effect on all the other
components that depend on it. For a type, that includes all other types
that are derived from it at any level. For a group, that includes all
complex types that reference it, as well as the types derived from those
types. While this is generally intentional and desirable, there is no
guarantee that you will not break these dependent components, and
schema processors are not required to warn you if you do. Specific risks
associated with redefinition are described in Section 18.3 on p. 468.
For a comparison of redefinition, type derivation, and other methods
of extending schemas, see Section 22.2 on p. 599.
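
To make the ripple effect concrete, suppose prod1.xsd in Example 18–1 also contained a type derived from DressSizeType. The type name SmallDressSizeType below is hypothetical and is not part of the original example; this is only a sketch of how a dependent type picks up the redefinition.

```xml
<!-- Hypothetical addition to prod1.xsd from Example 18–1 -->
<xs:simpleType name="SmallDressSizeType">
  <xs:restriction base="DressSizeType">
    <xs:maxInclusive value="6"/>
  </xs:restriction>
</xs:simpleType>

<!-- After the redefinition in prod2.xsd, the base type allows only
     2 through 16, so SmallDressSizeType effectively allows 2 through 6.
     Had it specified maxInclusive="18" instead, it would no longer be a
     legal restriction, because 18 exceeds the new upper bound of 16. -->
```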

18.1.2 The mechanics of redefinition

A redefine element is used to contain redefined schema components.
A redefine element may only occur at the top level of a schema
document (with schema as its parent), and all redefine elements must
appear at the beginning of the schema document, along with the include,
import, and override elements. The syntax for a redefine element
is shown in Table 18–1.

Table 18–1 XSD Syntax: redefinition

Name
redefine

Parents
schema

Attribute name Type Description

A schema document can contain multiple redefinitions of various
other schema documents. The schemaLocation attribute indicates
the location of the schema document to be redefined. It must reference
a complete schema document with schema as its root element.
As mentioned above, the redefined schema document must have the
same target namespace as the redefining schema document, or none
at all.
The redefine element contains the new definitions of the schema
components, in any order. For every definition that appears in a
redefine element, there must be a corresponding definition (with
the same qualified name) in the redefined schema document. Only the
components that need to be modified should appear in the redefine
element. All other components of the redefined schema document will
be included in the new schema as is. In fact, a redefine element is
not required to have any children at all, in which case it acts exactly
like an include element.
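
The last point can be sketched as follows. Under the rules just described, the two schema documents below should assemble the same set of components from prod1.xsd, since the redefine element has no children.

```xml
<!-- A redefine with no children: nothing is redefined -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:redefine schemaLocation="prod1.xsd"/>
</xs:schema>

<!-- The equivalent schema document using include -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:include schemaLocation="prod1.xsd"/>
</xs:schema>
```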

18.1.3 Redefining simple types

When redefining a simple type, the new definition must restrict
the original simple type. Example 18–2 shows how you would
redefine DressSizeType to change minInclusive to be 2. The re-
stricted DressSizeType uses itself as the base type. Redefinition is
the only case where a simple type can restrict itself.
The redefinition of DressSizeType affects not only the newSize
element declaration in prod2.xsd, but also the size element declara-
tion in prod1.xsd. Because of the redefinition, size instances that
conform to prod2.xsd cannot have the value 0. This illustrates the
effect redefinition has on components in the original schema.

Example 18–2. Redefining a simple type

prod1.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:simpleType name="DressSizeType">
<xs:restriction base="xs:integer">
<xs:minInclusive value="0"/>
<xs:maxInclusive value="18"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="size" type="DressSizeType"/>
</xs:schema>

prod2.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:redefine schemaLocation="prod1.xsd">
<xs:simpleType name="DressSizeType">
<xs:restriction base="DressSizeType">
<xs:minInclusive value="2"/>
</xs:restriction>
</xs:simpleType>
</xs:redefine>
<xs:element name="newSize" type="DressSizeType"/>
</xs:schema>
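
As a quick check of what Example 18–2 permits, the instance fragments below are illustrative values only, validated against prod2.xsd. Because the redefined type restricts the original, the original upper bound of 18 still applies even though it is not repeated.

```xml
<!-- Separate instance documents, shown together for brevity -->
<size>2</size>        <!-- valid: the new lower bound -->
<size>18</size>       <!-- valid: the original upper bound still applies -->
<size>0</size>        <!-- invalid: allowed by prod1.xsd alone, but below the redefined minimum -->
<newSize>19</newSize> <!-- invalid: above the inherited maximum of 18 -->
```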

18.1.4 Redefining complex types

Complex types can also be redefined, provided that the new definition
of the complex type either extends or restricts the original complex
type. Example 18–3 shows how you would redefine ProductType to
add a new element declaration and a new attribute declaration. Like a
simple type, a complex type can be based on itself when it is part of
a redefinition.

Example 18–3. Redefining a complex type

prod1.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType name="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="size" type="xs:integer"/>
</xs:sequence>
</xs:complexType>
</xs:schema>

prod2.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:redefine schemaLocation="prod1.xsd">
<xs:complexType name="ProductType">
<xs:complexContent>
<xs:extension base="ProductType">
<xs:sequence>
<xs:element name="color" type="xs:string"/>
</xs:sequence>
<xs:attribute name="effDate" type="xs:date"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
</xs:redefine>
</xs:schema>
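
One way to picture the result of Example 18–3 is to flatten the extension into a single definition. The sketch below is only an approximation for reading purposes; a processor still treats the new ProductType as an extension of the original rather than as a freestanding type.

```xml
<!-- Flattened view of ProductType after the redefinition -->
<xs:complexType name="ProductType">
  <xs:sequence>
    <!-- from the original definition in prod1.xsd -->
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="size" type="xs:integer"/>
    <!-- appended by the extension in prod2.xsd -->
    <xs:element name="color" type="xs:string"/>
  </xs:sequence>
  <xs:attribute name="effDate" type="xs:date"/>
</xs:complexType>
```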

18.1.5 Redefining named model groups

When redefining named model groups, the new definition must be
either a subset or a superset of the original group.

18.1.5.1 Defining a subset

Example 18–4 shows the redefinition of DescriptionGroup to
disallow the comment element.

Example 18–4. Redefining a named model group as a subset

prod1.xsd:

prod2.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:redefine schemaLocation="prod1.xsd">
<xs:group name="DescriptionGroup">
<xs:sequence>
<xs:element name="description" type="xs:string"/>
</xs:sequence>
</xs:group>
</xs:redefine>
</xs:schema>

Our example is legal because the comment elements are optional per
the original definition. The exact definition of a legal subset is the same
as that used for complex type restriction. In other words, if a content
model is considered a legal restriction of another content model
(in complex type derivation), it is also a legal subset in the redefinition
of a named model group. See Section 13.5 on p. 316 for the rules of
complex type restriction.

18.1.5.2 Defining a superset

On the other hand, suppose you want to extend the definition of
DescriptionGroup to include more children. Example 18–5 shows
the redefinition of DescriptionGroup to add new element declarations.
In this case, you are saying that you want all of the original
element declarations of DescriptionGroup, followed by the new
notes element declaration.

Example 18–5. Redefining a named model group as a superset

prod1.xsd:

prod2.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:redefine schemaLocation="prod1.xsd">
    <xs:group name="DescriptionGroup">
      <xs:sequence>
        <xs:group ref="DescriptionGroup"/>
        <xs:element name="notes" type="xs:string"/>
      </xs:sequence>
    </xs:group>
  </xs:redefine>
</xs:schema>

The group refers to itself the way it would refer to any other group.
Redefinition is the only case where a group reference can be circular,
but there are two constraints.

# The group may only reference itself once.

# maxOccurs and minOccurs of that self-reference must be 1
(or not present, in which case they default to 1).
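
The two constraints above can be illustrated with a counterexample. The sketch below is a variation on Example 18–5 that would not be legal, because the self-reference is made optional.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:redefine schemaLocation="prod1.xsd">
    <xs:group name="DescriptionGroup">
      <xs:sequence>
        <!-- Not legal: minOccurs of the self-reference must be 1 or absent -->
        <xs:group ref="DescriptionGroup" minOccurs="0"/>
        <xs:element name="notes" type="xs:string"/>
      </xs:sequence>
    </xs:group>
  </xs:redefine>
</xs:schema>
```

Removing the minOccurs attribute brings the redefinition back in line with Example 18–5.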

18.1.6 Redefining attribute groups

Like a named model group, an attribute group can be redefined to be
a subset or superset of its original definition.

18.1.6.1 Defining a subset

Example 18–6 shows how you would redefine the IdentifierGroup
as a subset. The new definition disallows the xml:lang attribute and
changes the type of the version attribute from decimal to integer.

Example 18–6. Redefining an attribute group as a subset

prod1.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:import namespace="http://www.w3.org/XML/1998/namespace"/>
<xs:attributeGroup name="IdentifierGroup">
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="version" type="xs:decimal"/>
<xs:attribute ref="xml:lang"/>
</xs:attributeGroup>
</xs:schema>

prod2.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:redefine schemaLocation="prod1.xsd">
<xs:attributeGroup name="IdentifierGroup">
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="version" type="xs:integer"/>
</xs:attributeGroup>
</xs:redefine>
</xs:schema>

The rules used to define a subset of an attribute group are the same
as those used for attribute restriction in complex type derivation. This
means that you can eliminate optional attributes, make attributes
required, add a fixed value, change default values, or change types to be
more restrictive. Eliminating the xml:lang attribute in Example 18–6
is legal because it is optional (by default) in the original attribute group.
Changing the type of version is legal because integer is a restriction
of decimal. See Section 13.5.5 on p. 333 for more information
on attribute restrictions.

Unlike complex type derivation, however, you must redeclare all at-
tributes you want to appear in the new definition. The attribute decla-
rations will not automatically be copied from the original definition to
the new definition.
If the original definition contains an attribute wildcard, you
may repeat or further restrict the wildcard. Subsetting of attribute
wildcards also follows the rules used in complex type derivation. See
Section 13.5.6 on p. 335 for more information on attribute wildcard
restrictions.
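
As a sketch of wildcard subsetting, consider an attribute group that contains an attribute wildcard. The group name ExtensionGroup and both fragments below are hypothetical and do not come from the book's examples. The redefinition repeats the id attribute, since nothing is copied automatically, and narrows the wildcard from any namespace to other namespaces only.

```xml
<!-- Hypothetical original definition in prod1.xsd -->
<xs:attributeGroup name="ExtensionGroup">
  <xs:attribute name="id" type="xs:ID"/>
  <xs:anyAttribute namespace="##any" processContents="lax"/>
</xs:attributeGroup>

<!-- Hypothetical redefinition in prod2.xsd -->
<xs:redefine schemaLocation="prod1.xsd">
  <xs:attributeGroup name="ExtensionGroup">
    <xs:attribute name="id" type="xs:ID"/>
    <!-- ##other is a subset of ##any, so this is a legal restriction -->
    <xs:anyAttribute namespace="##other" processContents="lax"/>
  </xs:attributeGroup>
</xs:redefine>
```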

18.1.6.2 Defining a superset

On the other hand, suppose you want to extend the definition of
IdentifierGroup to include more attributes. Example 18–7 shows
how you would redefine IdentifierGroup to add attributes. You
cannot alter the declarations of the attributes in the original group,
only add new attributes. In this case, you are saying that you want all of
the original attributes of IdentifierGroup, plus a new effDate
attribute.

Example 18–7. Redefining an attribute group as a superset

prod1.xsd:

prod2.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:redefine schemaLocation="prod1.xsd">
    <xs:attributeGroup name="IdentifierGroup">
      <xs:attributeGroup ref="IdentifierGroup"/>
      <xs:attribute name="effDate" type="xs:date"/>
    </xs:attributeGroup>
  </xs:redefine>
</xs:schema>

The attribute group refers to itself the way it would refer to any
other attribute group. Redefinition is the only case where an attribute
group reference can be circular in version 1.0.

18.2 Overrides
The override feature is a convenient way to customize schemas. It in-
volves defining a new version of a schema component, with the same
name, that replaces the original definition throughout the schema. This
is useful when you want to reuse a schema but you want to make some
modifications (minor or major) to the components in that schema
while still preserving the original definitions.

18.2.1 Override basics

An override is similar to an include, with the additional option of
specifying new definitions for some or all of the components in the
overridden schema document. This is depicted in Figure 18–2.
Like included schema documents, overridden schema documents must
have the same target namespace as the overriding schema document,
or none at all.
You can override any top-level named schema component, namely
complex types, simple types, global element declarations, global attribute
declarations, named model groups, attribute groups, and notations.
Unlike redefines, the new definitions do not have to be based on the
original definitions. In fact, they cannot refer to the original definitions
in the way that redefining components do.

Figure 18–2 Override

Example 18–8 shows a simple override where the schema document
prod2.xsd overrides prod1.xsd. The simple type DressSizeType
is overridden in prod2.xsd.

Example 18–8. A simple override

prod1.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:simpleType name="DressSizeType">
<xs:restriction base="xs:integer"/>
</xs:simpleType>

<xs:element name="size" type="DressSizeType"/>

<xs:element name="color" type="xs:string"/>

</xs:schema>

prod2.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">

<xs:override schemaLocation="prod1.xsd">
<xs:simpleType name="DressSizeType">
<xs:restriction base="xs:integer">
<xs:minInclusive value="2"/>
<xs:maxInclusive value="16"/>
</xs:restriction>
</xs:simpleType>
</xs:override>

<xs:element name="newSize" type="DressSizeType"/>

</xs:schema>

18.2.1.1 Include plus override

When a schema document is overridden, all of its components are
included in the overriding schema document, regardless of whether they
are specifically mentioned in the override. In this way, the override
feature is similar to the include feature. In our example, the resulting
schema document includes all of the components defined and declared
in both prod2.xsd and prod1.xsd. Even though color is not
mentioned in prod2.xsd, it will be included in the resulting schema
document.

18.2.1.2 Override and namespaces

The target namespace of the overridden schema document must be the
same as that of the overriding schema document, or nonexistent. If
the overridden schema document does not have a target namespace,
all of its components become chameleon components and will take on
the target namespace of the overriding schema document. For example,
since prod1.xsd does not have a target namespace, all of its
components will take on the target namespace of prod2.xsd. This
includes color, which is not specifically mentioned in the override.
It is not a problem that the size element declaration in prod1.xsd
references DressSizeType without a prefix; the processor will correctly
interpret the references between components.

18.2.1.3 Pervasive impact

Once a schema is overridden, the new definitions completely replace
the original definitions, not just for components in the new overriding
schema document, but also for components that reference them in the
original, overridden schema document. In Example 18–8, the size
element declaration now uses the new DressSizeType. If there had
been types derived from DressSizeType in prod1.xsd, they
would now be derived from the new DressSizeType.
Overriding a component has a ripple effect on all the other compo-
nents that depend on it. For a type, this includes all other types that
are derived from it at any level. For a group, it includes all complex
types that reference it, as well as the types derived from those types.
While this is generally intentional and desirable, there is no guarantee
that you will not break these dependent components, and schema
processors are not required to warn you if you do. Specific risks
associated with overrides are described in Section 18.3 on p. 468.
For a comparison of overriding, type derivation, and other methods
of extending schemas, see Section 22.2 on p. 599.

18.2.2 The mechanics of overriding components

An override element is used to contain overridden schema compo-
nents. An override element may only occur at the top level of a
schema document (with schema as its parent), and all override
children must be at the beginning of the schema document, along with
the include, import, and redefine elements. The syntax for an
override element is shown in Table 18–2.

Table 18–2 XSD Syntax: override

Name
override

Parents
schema

Attribute name Type Description

A schema document can contain multiple overrides of various other
schema documents. The schemaLocation attribute indicates the
location of the schema document to be overridden. It must reference
a complete schema document with schema as its root element. As
mentioned above, the overridden schema document must have the
same target namespace as the overriding schema document, or none
at all.
The override element contains the new definitions of the schema
components, in any order. For every definition that appears in an
override element, there must be a corresponding definition (with
the same qualified name) in the overridden schema document.
Only the components that need to be modified should appear in the
override element. All other components of the overridden schema
document will be included in the new schema as is. In fact, an
override element is not required to have any children at all, in which
case it acts exactly like an include element.

18.2.3 Overriding simple types

Example 18–9 shows how you would override a simple type
DressSizeType to change minInclusive to be 2. Unlike redefinition,
the overriding type DressSizeType in prod2.xsd is not derived from
or in any way related to its counterpart in prod1.xsd. Even though
the maxInclusive constraint didn’t change, it needs to be respecified
in the overriding type.

Example 18–9. Overriding a simple type

prod1.xsd:

prod2.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:override schemaLocation="prod1.xsd">
<xs:simpleType name="DressSizeType">
<xs:restriction base="xs:integer">
<xs:minInclusive value="2"/>
<xs:maxInclusive value="18"/>
</xs:restriction>
</xs:simpleType>
</xs:override>
<xs:element name="newSize" type="DressSizeType"/>
</xs:schema>

The override of DressSizeType affects not only the newSize element
declaration in prod2.xsd, but also the size element declaration
in prod1.xsd. Because of the override, size instances that conform
to prod2.xsd cannot have the value 0. This illustrates the effect
overriding has on components in the original schema.
In this example, DressSizeType didn’t change much. It is still
based on integer but with some additional constraints. However,
you are not limited in the changes you can make when overriding, as
long as the overriding definition is still a simple type. It would be
possible to turn DressSizeType into a string value, for example. It is
not possible to override a simple type with a complex type.
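
A minimal sketch of such a change is shown below. The enumeration values are made up for illustration. The override replaces the integer-based DressSizeType with a string-based one, which would not be possible with a redefinition.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:override schemaLocation="prod1.xsd">
    <!-- No longer based on xs:integer; the original facets are simply gone -->
    <xs:simpleType name="DressSizeType">
      <xs:restriction base="xs:string">
        <xs:enumeration value="small"/>
        <xs:enumeration value="medium"/>
        <xs:enumeration value="large"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:override>
</xs:schema>
```

With this override in place, a size element would have to contain one of the three strings rather than a number.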

18.2.4 Overriding complex types

Complex types can also be overridden. Example 18–10 shows how you
would override ProductType to change the type of number, delete
size, and add color. As with simple types, the overriding definition
can be similar to the overridden definition, as it is in this case, but
it can also be completely different.

Example 18–10. Overriding a complex type

prod1.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="size" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

prod2.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:override schemaLocation="prod1.xsd">
    <xs:complexType name="ProductType">
      <xs:sequence>
        <xs:element name="number" type="xs:string"/>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="color" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:override>
</xs:schema>

18.2.5 Overriding element and attribute declarations

Unlike redefines, overrides can apply to element and attribute
declarations, as long as they are global. Example 18–11 shows the override of
the description global element declaration and the version global
attribute declaration. In this case, description changed significantly,
from having a simple type to having a complex type with element-only
content.

Example 18–11. Overriding element and attribute declarations

prod1.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="description" type="xs:string"/>
<xs:attribute name="version" type="xs:decimal"/>
</xs:schema>

prod2.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:override schemaLocation="prod1.xsd">
<xs:element name="description" type="DescriptionType"/>
<xs:attribute name="version" type="xs:string" default="1.0"/>
</xs:override>
<xs:complexType name="DescriptionType">
<xs:sequence>
<xs:element name="source" type="xs:string"/>
<xs:element name="content" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:schema>

Local element and attribute declarations cannot be overridden
directly, but the complex types that contain them can be.
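
To see what the override in Example 18–11 means for instances, compare the two description elements sketched below. The element content is invented for illustration.

```xml
<!-- Valid against prod1.xsd alone, where description is a string -->
<description>Short-sleeved linen blouse</description>

<!-- Valid against prod2.xsd, where description has DescriptionType -->
<description>
  <source>Product catalog</source>
  <content>Short-sleeved linen blouse</content>
</description>
```

The first form is no longer valid once the override is in effect, because DescriptionType has element-only content.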

18.2.6 Overriding named groups

It is also possible to override named model groups and attribute groups.
Unlike a redefinition, the new definition of a group has no superset or
subset relationship to the original group definition.
Example 18–12 shows an override of the DescriptionGroup named
model group and the IdentifierGroup attribute group.

Example 18–12. Overriding named groups

prod1.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:group name="DescriptionGroup">
<xs:sequence>
<xs:element name="description" type="xs:string"/>
<xs:element name="comment" type="xs:string" minOccurs="0"/>
</xs:sequence>
</xs:group>
<xs:attributeGroup name="IdentifierGroup">
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="version" type="xs:decimal"/>
</xs:attributeGroup>
</xs:schema>

prod2.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:override schemaLocation="prod1.xsd">
<xs:group name="DescriptionGroup">
<xs:sequence>
<xs:element name="description" type="xs:string"/>
</xs:sequence>
</xs:group>
<xs:attributeGroup name="IdentifierGroup">
<xs:attribute name="effDate" type="xs:date"/>
<xs:attribute name="id" type="xs:ID"/>
</xs:attributeGroup>
</xs:override>
</xs:schema>

18.3 Risks of redefines and overrides

As mentioned previously, redefines and overrides both have a pervasive
impact on all components—not just in the overriding/redefining
schema document, but also in the overridden/redefined schema docu-
ment. When you use either of these techniques, if there are other
components that depend on the original definitions, you run the
risk of rendering these dependent components invalid by changing
the original definitions.

18.3.1 Risks of redefining or overriding types

For a type, the risks pertain to all other types that are derived from it
at any level.
Example 18–13 shows a new prod1.xsd where there are two complex
types derived from ProductType: ShirtType (an extension) and
RestrictedProductType (a restriction). All changes to ProductType
during override or redefine are passed down to the derived types, which
is probably your intention. However, in this case, both derived types
have been rendered illegal.

Example 18–13. Risks of overriding types

prod1.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType name="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer" minOccurs="0"/>
<xs:element name="name" type="xs:string" minOccurs="0"/>
<xs:element name="size" type="xs:string" minOccurs="0"/>
</xs:sequence>
</xs:complexType>

<xs:complexType name="ShirtType">
<xs:complexContent>
<xs:extension base="ProductType">
<xs:sequence>
<xs:element name="color" type="xs:integer"/>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:complexType name="RestrictedProductType">
<xs:complexContent>
<xs:restriction base="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"
minOccurs="0"/>
<xs:element name="size" type="xs:string" minOccurs="0"/>
</xs:sequence>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
</xs:schema>

prod2.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:override schemaLocation="prod1.xsd">
<xs:complexType name="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="color" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:override>
</xs:schema>

Some of the risks associated with redefining or overriding
ProductType when there are extensions of the original definition are:

# Adding an attribute that ShirtType already has, resulting in
duplicate attributes for ShirtType.
# Adding an element declaration to the content model that
ShirtType already has, but with a different type, as shown in
Example 18–13 with color. It is illegal for a complex type to
contain two element declarations with the same name and
different types.
# Adding element declarations to the content model that render
the content model of ShirtType nondeterministic.

Some of the risks associated with redefining or overriding
ProductType when there are restrictions of the original definition are:

# Restricting a content model further than, or in a way incompatible
with, how RestrictedProductType restricted it, as shown in
Example 18–13.
# Restricting an attribute further than RestrictedProductType
restricted it, rendering RestrictedProductType’s restriction
illegal.
# Making an attribute that is then restricted by
RestrictedProductType prohibited, resulting in an illegal attribute
declaration in the definition of RestrictedProductType.

Simple type derivations can also be negatively affected by redefines
and overrides. As with complex types, if there are dependent types in
the original schema document that restrict an overridden simple type
further or in incompatible ways, the change can render them invalid.
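
The problem with ShirtType in Example 18–13 is easier to see when the extension is flattened against the overriding ProductType. The view below is only a reading aid, not something a processor produces.

```xml
<!-- Flattened view of ShirtType once ProductType has been overridden -->
<xs:complexType name="ShirtType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
    <!-- contributed by the overriding ProductType -->
    <xs:element name="color" type="xs:string"/>
    <!-- contributed by the extension: same name, different type -->
    <xs:element name="color" type="xs:integer"/>
  </xs:sequence>
</xs:complexType>
```

RestrictedProductType fails for a different reason: its content model still lists size, which the new base type no longer allows, and it omits name and color, which the new base type requires.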

18.3.2 Risks of redefining or overriding named groups

When redefining or overriding a named group, you should be aware
of the impact on the complex types that reference it, as well as the
types derived from those types. The risks include:

# Making a content model nondeterministic.
# Introducing duplicate attribute names.
# Making element declarations inconsistent by introducing two
element declarations with the same name but different types.
# Rendering illegal the types derived by extension or restriction
from the types that directly use the group, in the ways described
in the previous section.
Chapter 19

Topics for DTD users

This chapter provides a jump-start on XML Schema for readers who
are familiar with DTDs. It offers a detailed comparison of DTD and
schema syntax, which is useful both for understanding XML Schema
and for converting existing DTDs to schemas. It also describes some
features, such as entities and notations, that still require the use of
DTDs alongside XML Schema.

19.1 Element declarations

Table 19–1 shows examples of various DTD content models and
matches them up with the corresponding XML Schema content types.
Each of these content types is explained in the rest of this section.

474 Chapter 19 | Topics for DTD users

Table 19–1 Content types

Example DTD content model      Schema content type                               Section
(#PCDATA) with no attributes   Simple type                                       19.1.1
(#PCDATA) with attributes      Complex type, simple content                      19.1.2
(a | b)*                       Complex type, complex content (element-only)      19.1.3
(#PCDATA | a | b)*             Complex type, complex content (mixed)             19.1.4
EMPTY                          Complex type, complex content (empty)             19.1.5
ANY                            Complex type, mixed content with a wildcard       19.1.6

19.1.1 Simple types

Element types with (#PCDATA) content and no attributes in a DTD
correspond to element declarations with simple types in schemas.
Example 19–1 shows such an element declaration.

Note that the built-in type decimal is assigned to price. It is possible
to assign all #PCDATA element types the built-in type string,
which handles whitespace in the same way as DTD processors handle
whitespace for any character data content of an element. However, it
is advisable to be as specific as possible when choosing a type for an
element declaration. Chapter 11 describes the built-in simple types
in detail, and Chapter 8 describes how to define your own simple types.

Example 19–1. Simple type

DTD:

<!ELEMENT price (#PCDATA)>

Schema:

<xs:element name="price" type="xs:decimal"/>
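
To illustrate the advice about choosing a specific type, the sketch below goes one step further than decimal and restricts it so that a price cannot be negative and has at most two fraction digits. The type name PriceType and the two facets are assumptions made for this example; they are not part of Example 19–1.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- PriceType is a hypothetical name; the facets are illustrative only -->
  <xs:simpleType name="PriceType">
    <xs:restriction base="xs:decimal">
      <xs:minInclusive value="0"/>
      <xs:fractionDigits value="2"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="price" type="PriceType"/>

</xs:schema>
```

With this declaration, a value such as 29.99 would be accepted, while -5 or 29.999 would be rejected. A DTD cannot express either constraint.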

19.1.2 Complex types with simple content

Element types with (#PCDATA) content that do have attributes corre-
spond to element declarations using complex types with simple content
in schemas. Example 19–2 shows such an element declaration. It
extends the simple type decimal to add the attribute currency.

Example 19–2. Simple content (with attributes)

DTD:

<!ELEMENT price (#PCDATA)>

<!ATTLIST price currency NMTOKEN #IMPLIED>

Schema:

<xs:element name="price">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:decimal">
        <xs:attribute name="currency" type="xs:NMTOKEN"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>

19.1.3 Complex types with complex content

Element types that may have children, regardless of whether they have
attributes, correspond to element declarations using complex types
with complex content in schemas. Example 19–3 shows such an element
declaration.

Example 19–3. Complex content

DTD:

<!ELEMENT product (number, name+, size?, color*)>

Schema:

<xs:element name="product">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="number"/>
      <xs:element ref="name" maxOccurs="unbounded"/>
      <xs:element ref="size" minOccurs="0"/>
      <xs:element ref="color" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

In Example 19–3, the content model was converted into a sequence.
Groups, enclosed in parentheses in DTDs, are represented by one of
the three model groups in a schema.

# sequence groups require that the elements appear in order.
# choice groups allow a choice from several elements.
# all groups allow the elements to appear in any order.

Table 19–2 shows the mapping between DTD groups and XML
Schema model groups.

Table 19–2 Group compositors

DTD model        XML Schema model group
(a,b,c)          sequence
(a|b|c)          choice
no equivalent    all

As shown in Example 19–3, the occurrence constraints on element
types and groups are represented by the minOccurs and maxOccurs
attributes in schemas. Table 19–3 shows the mapping between
occurrence constraints in DTDs and schemas.

Table 19–3 Occurrence constraints

DTD symbol    Schema minOccurs value    Schema maxOccurs value
(none)        1                         1
*             0                         unbounded
+             1                         unbounded
?             0                         1

The defaults for minOccurs and maxOccurs are both 1. XML
Schema can provide more specific validation than DTDs, since
any non-negative integer can be specified. For example, you can specify
that the color element may appear a maximum of three times.

Groups may be nested in schemas just as they may in DTDs, as
illustrated in Example 19–4. Note that minOccurs and maxOccurs may
appear on groups as well as on element declarations.

Example 19–4. Nested groups

DTD:

<!ELEMENT el ((a | b)*, (c | d)?)>

Schema:

<xs:element name="el">
  <xs:complexType>
    <xs:sequence>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="a"/>
        <xs:element ref="b"/>
      </xs:choice>
      <xs:choice minOccurs="0" maxOccurs="1">
        <xs:element ref="c"/>
        <xs:element ref="d"/>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
</xs:element>
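
The remark above about limiting color to three occurrences can be sketched as follows. This is a variation on Example 19–3, not a declaration taken from it; the global declarations and their types are assumptions added only so that the fragment stands on its own.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- assumed declarations for the referenced elements -->
  <xs:element name="number" type="xs:integer"/>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="size" type="xs:string"/>
  <xs:element name="color" type="xs:string"/>

  <xs:element name="product">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="number"/>
        <xs:element ref="name" maxOccurs="unbounded"/>
        <xs:element ref="size" minOccurs="0"/>
        <!-- zero to three color children; no single DTD occurrence
             symbol (*, + or ?) expresses this bound -->
        <xs:element ref="color" minOccurs="0" maxOccurs="3"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```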

19.1.4 Mixed content

Element types that have both #PCDATA content and children are
said to have mixed content.[1] In schemas, mixed content is indicated
by a mixed attribute of a complexType element, as shown in
Example 19–5.

With DTDs, you are limited to the choice operator (|) with mixed
content element types. In schemas, any content model can be
mixed, allowing more complex validation of the children. For example,
in a DTD you cannot specify that custName must appear before
prodName. In schemas, you can accomplish this using a sequence
group instead of a choice group.

[1] Technically, in DTDs mixed content also refers to element types with just
#PCDATA content, but this case is covered in Sections 19.1.1 on p. 474 and
19.1.2 on p. 475.

Example 19–5. Mixed content

DTD:

<!ELEMENT letter (#PCDATA | custName | prodName)*>

Schema:

<xs:element name="letter">
  <xs:complexType mixed="true">
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element ref="custName"/>
      <xs:element ref="prodName"/>
    </xs:choice>
  </xs:complexType>
</xs:element>
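
As a contrast to the choice group in Example 19–5, the following sketch shows the sequence alternative mentioned in the text, in which custName must appear before prodName. The string types of the two child elements are assumptions made for this example.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- assumed declarations for the two child elements -->
  <xs:element name="custName" type="xs:string"/>
  <xs:element name="prodName" type="xs:string"/>

  <xs:element name="letter">
    <xs:complexType mixed="true">
      <!-- character data may appear anywhere between the children,
           but custName must come first and each child appears once -->
      <xs:sequence>
        <xs:element ref="custName"/>
        <xs:element ref="prodName"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Under this declaration, a letter whose text mentions the customer and then the product is valid, while one that mentions the product first is not. The mixed attribute only allows text between the children; it does not relax their order or number.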

19.1.5 Empty content

Empty content, indicated by the keyword EMPTY in DTDs, is simply
indicated by an absence of a content model in a schema. Example 19–6
shows an element declaration with empty content, containing only
attribute declarations.

Example 19–6. Empty content

DTD:

<!ELEMENT color EMPTY>

<!ATTLIST color value NMTOKEN #IMPLIED>

Schema:

<xs:element name="color">
  <xs:complexType>
    <xs:attribute name="value" type="xs:NMTOKEN"/>
  </xs:complexType>
</xs:element>

19.1.6 Any content

Any content, indicated by the keyword ANY in DTDs, is represented
by an element wildcard any in a schema. This is illustrated in
Example 19–7.

Example 19–7. Any content

DTD:

<!ELEMENT anything ANY>

Schema:

<xs:element name="anything">
  <xs:complexType mixed="true">
    <xs:sequence>
      <xs:any minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

XML Schema offers much more sophisticated wildcard capabilities
than DTDs. It is possible with XML Schema to put a wildcard anywhere
in a content model, specify how many replacement elements
may appear, restrict the namespace(s) of the replacement elements, and
control how strictly they are validated. See Section 12.7.1 on p. 285
for more information on element wildcards.
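
The wildcard capabilities just listed can be sketched in a single declaration. The target namespace, the element names, and the limit of five replacement elements are all assumptions made for this example.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/prod"
           elementFormDefault="qualified">

  <xs:element name="product">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="number" type="xs:integer"/>
        <xs:element name="name" type="xs:string"/>
        <!-- the wildcard sits at the end of the content model, allows
             at most five replacement elements, accepts only elements
             from namespaces other than the target namespace, and
             validates them only if a declaration can be found -->
        <xs:any namespace="##other" processContents="lax"
                minOccurs="0" maxOccurs="5"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```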

19.2 Attribute declarations

19.2.1 Attribute types

The DTD attribute types are represented in XML Schema as simple
types, most of them with the same name. Table 19–4 lists the DTD
attribute types and their equivalent types in XML Schema.

Table 19–4 DTD attribute types and equivalents

DTD attribute type    XML Schema equivalent
CDATA                 normalizedString
NMTOKEN, NMTOKENS     NMTOKEN, NMTOKENS
ID, IDREF, IDREFS     ID, IDREF, IDREFS
ENTITY, ENTITIES      ENTITY, ENTITIES
NOTATION              Simple type derived from NOTATION, see
                      Section 19.2.3
Enumerated values     Simple type derivation with enumeration facets
                      specified, see Section 19.2.2

19.2.2 Enumerated attribute types

In order to represent an enumerated attribute type in a schema, it is
necessary to define a new simple type and apply enumeration
facets to restrict the values to the desired set. This is illustrated in
Example 19–8.

Example 19–8. Representing an enumerated attribute

DTD:

<!ATTLIST price currency (USD | CHF) "USD">

Schema:

<xs:attribute name="currency" default="USD">
  <xs:simpleType>
    <xs:restriction base="xs:token">
      <xs:enumeration value="USD"/>
      <xs:enumeration value="CHF"/>
    </xs:restriction>
  </xs:simpleType>
</xs:attribute>

The built-in type token is used as the base type for the restriction,
which will result in whitespace handling identical to that of enumerated
attribute types in DTDs.
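
Example 19–8 defines the enumerated type anonymously inside the attribute declaration. If the same set of values is needed by several attributes, one possible variation is to give the simple type a name and reference it. The sketch below assumes a hypothetical type name CurrencyType and reuses the price element of Example 19–2 for context.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- CurrencyType is a hypothetical name for the reusable type -->
  <xs:simpleType name="CurrencyType">
    <xs:restriction base="xs:token">
      <xs:enumeration value="USD"/>
      <xs:enumeration value="CHF"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="price">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:decimal">
          <!-- the attribute refers to the named type and keeps the default -->
          <xs:attribute name="currency" type="CurrencyType" default="USD"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

</xs:schema>
```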

19.2.3 Notation attributes

A NOTATION attribute type exists in XML Schema as it does in XML
DTDs. However, the NOTATION type cannot be used directly by an
attribute. Instead, you must define a new simple type that restricts
NOTATION and apply enumeration facets to list the possible values for
that notation. This is illustrated in Example 19–9.

Example 19–9. Representing a notation attribute

DTD:

<!ATTLIST picture fmt NOTATION (jpg | gif) "jpg">

Schema:

<xs:attribute name="fmt" default="jpg">
  <xs:simpleType>
    <xs:restriction base="xs:NOTATION">
      <xs:enumeration value="jpg"/>
      <xs:enumeration value="gif"/>
    </xs:restriction>
  </xs:simpleType>
</xs:attribute>

19.2.4 Default values

Attribute default values are handled by three attributes in schemas: the
use attribute, which indicates whether the attribute being declared is
required or optional; the default attribute, which specifies a default
value; and the fixed attribute, which specifies a fixed value. Table 19–5
shows how the DTD attribute default values correspond to schema
attributes.

Table 19–5 DTD default values and their equivalents

DTD default value    Schema equivalent
#REQUIRED            use="required"
#IMPLIED             use="optional"
#FIXED "x"           fixed="x"
"x"                  default="x"

Example 19–10 provides some examples of attribute declarations
with various types and default values.

Example 19–10. Attribute declarations

DTD:

<!ATTLIST product
  id      ID      #REQUIRED
  name    CDATA   #IMPLIED
  type    NMTOKEN "PR"
  version NMTOKEN #FIXED "A123">

Schema:

<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="name" type="xs:normalizedString"
              use="optional"/>
<xs:attribute name="type" type="xs:NMTOKEN" default="PR"/>
<xs:attribute name="version" type="xs:NMTOKEN" fixed="A123"/>
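
The attribute declarations above are shown without their surrounding complex type. As a sketch of where they would sit, the fragment below places them in a declaration for product; it assumes that product has no content model of its own, which the DTD fragment does not say.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="product">
    <xs:complexType>
      <!-- assumption: no content model is given, so product is empty -->
      <xs:attribute name="id" type="xs:ID" use="required"/>
      <xs:attribute name="name" type="xs:normalizedString" use="optional"/>
      <xs:attribute name="type" type="xs:NMTOKEN" default="PR"/>
      <xs:attribute name="version" type="xs:NMTOKEN" fixed="A123"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

An instance such as `<product id="p557"/>` would then be valid. The type attribute would take the default value PR, and version, if it appears at all, would have to be A123.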

19.3 Parameter entities for reuse

Internal parameter entities are often used in DTDs to reuse pieces of
element or attribute declarations. Using schemas, reuse is handled by
creating reusable types, named model groups, and attribute groups.
This section explains how to convert internal parameter entities into
XML Schema components.

19.3.1 Reusing content models

In DTDs, a parameter entity may be used to define a content model
once and reuse it for multiple element types. Using schemas, the best
way to accomplish this is to define a named complex type which is then
used by multiple element declarations. This is illustrated in
Example 19–11, where the AOrB content model is used by two element
declarations, x and y.

Example 19–11. Reusing entire content models

DTD:

<!ENTITY % AOrB "(a | b)">

<!ELEMENT x %AOrB;>
<!ELEMENT y %AOrB;>

Schema:

<xs:complexType name="AOrBType">
  <xs:choice>
    <xs:element ref="a"/>
    <xs:element ref="b"/>
  </xs:choice>
</xs:complexType>

<xs:element name="x" type="AOrBType"/>
<xs:element name="y" type="AOrBType"/>

A parameter entity may also be used to represent a fragment of a
content model. In XML Schema, named model groups are designated
for this purpose. Example 19–12 shows a content model fragment AOrB
that is used as part of the entire content model in the x element
declaration. See Section 15.2 on p. 386 for more information on named
model groups.

Example 19–12. Reusing fragments of content models

DTD:

<!ENTITY % AOrB "a | b">

<!ELEMENT x ((%AOrB;), c)>

Schema:

<xs:group name="AOrBGroup">
  <xs:choice>
    <xs:element ref="a"/>
    <xs:element ref="b"/>
  </xs:choice>
</xs:group>

<xs:element name="x">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="AOrBGroup"/>
      <xs:element ref="c"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

19.3.2 Reusing attributes

In some cases, parameter entities are used in DTDs to reuse an attribute
or a set of attributes that are common to several element types. In XML
Schema, attribute groups are used for this purpose. Example 19–13
shows the definition of an attribute group HeaderGroup containing
two attributes, which is then referenced by the x element declaration.

Example 19–13. Reusing groups of attributes

DTD:

<!ENTITY % HeaderGroup "id ID #REQUIRED
                        variety NMTOKEN #IMPLIED">

<!ATTLIST x %HeaderGroup;>

Schema:

<xs:attributeGroup name="HeaderGroup">
  <xs:attribute name="id" type="xs:ID" use="required"/>
  <xs:attribute name="variety" type="xs:NMTOKEN"/>
</xs:attributeGroup>

<xs:element name="x">
  <xs:complexType>
    <xs:attributeGroup ref="HeaderGroup"/>
  </xs:complexType>
</xs:element>

19.4 Parameter entities for extensibility

Parameter entities are sometimes used to make DTDs more flexible
and future-proof. Empty entities are declared and placed in various
parts of the DTD, most often in content models and attribute lists.
This allows a parent (or internal) DTD to override the entity
declaration, thus overriding the original DTD without having to completely
rewrite it. Using schemas, this can be accomplished through several
methods: type derivation, substitution groups, redefines, or overrides.
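
Type derivation, the first of the methods listed above, can be sketched as follows. The type names XType and ExtendedXType and the string types of the child elements are assumptions made for this example. The derived type adds c and d to the end of the base content model, and the original type stays untouched.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- assumed declarations for the child elements -->
  <xs:element name="a" type="xs:string"/>
  <xs:element name="b" type="xs:string"/>
  <xs:element name="c" type="xs:string"/>
  <xs:element name="d" type="xs:string"/>

  <!-- the original type -->
  <xs:complexType name="XType">
    <xs:sequence>
      <xs:element ref="a"/>
      <xs:element ref="b"/>
    </xs:sequence>
  </xs:complexType>

  <!-- a later extension: effective content model is a, b, c, d -->
  <xs:complexType name="ExtendedXType">
    <xs:complexContent>
      <xs:extension base="XType">
        <xs:sequence>
          <xs:element ref="c"/>
          <xs:element ref="d"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="x" type="XType"/>

</xs:schema>
```

Because x is declared with XType, an instance that wants the extended content would indicate the derived type explicitly, for example with xsi:type="ExtendedXType" on the x element.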

19.4.1 Extensions for sequence groups

In DTDs, you can place a reference to an empty parameter entity at
the end of a content model, as shown in Example 19–14. In XML
Schema, this can be accomplished using the redefine or override
mechanism.

Example 19–14. Allowing future extensions for sequence groups

DTD:

<!ENTITY % ext "" >

<!ELEMENT x (a, b %ext;)>

Schema:

<xs:group name="ext">
  <xs:sequence/>
</xs:group>
<xs:element name="x">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="a"/>
      <xs:element ref="b"/>
      <xs:group ref="ext"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Example 19–15 shows how these extensions could be accomplished
in a new parent DTD or in a new schema. In the schema, the redefine
mechanism is used to extend the named model group to add to the
end of the content model. Redefinition is covered in Chapter 18.

Example 19–15. Implementing extensions for sequence groups using redefine

DTD:

<!ENTITY % ext ", c, d" >

<!ENTITY % original SYSTEM "original.dtd">
%original;

Schema:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:redefine schemaLocation="original.xsd">
    <xs:group name="ext">
      <xs:sequence>
        <xs:group ref="ext"/>
        <xs:element ref="c"/>
        <xs:element ref="d"/>
      </xs:sequence>
    </xs:group>
  </xs:redefine>
</xs:schema>

In version 1.1 of XML Schema, a better choice is to use override,
since redefine is deprecated. Example 19–16 shows a revised example
that uses override. Overrides are also covered in Chapter 18.

Example 19–16. Implementing extensions for sequence groups using override

DTD:

<!ENTITY % ext ", c, d" >

<!ENTITY % original SYSTEM "original.dtd">
%original;

Schema:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:override schemaLocation="original.xsd">
    <xs:group name="ext">
      <xs:sequence>
        <xs:element ref="c"/>
        <xs:element ref="d"/>
      </xs:sequence>
    </xs:group>
  </xs:override>
</xs:schema>

19.4.2 Extensions for choice groups

On the other hand, if it is a choice group that you wish to leave open,
extension will not meet your needs. This is because all extensions are
added to the end of the content model as part of a sequence group.
For a more detailed explanation of this, see Section 13.4.2.1 on p. 309.
The best approach to extending a choice group is by using a substi-
tution group. Substitution groups allow an element declaration to be
replaced by any of a group of designated element declarations. New
element declarations can be added to the substitution group at any
time. The schema fragment in Example 19–17 uses a choice group
that contains a reference to the ext element declaration. Because it is
abstract, ext can never be used in an instance.

Example 19–17. Allowing future extensions for choice groups

DTD:

<!ENTITY % ext "" >

<!ELEMENT x (a | b %ext;)*>

Schema:

<xs:element name="x">
  <xs:complexType>
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element ref="a"/>
      <xs:element ref="b"/>
      <xs:element ref="ext"/>
    </xs:choice>
  </xs:complexType>
</xs:element>
<xs:element name="ext" abstract="true" type="xs:string"/>

Example 19–18 shows how these extensions would be accomplished
in a new parent DTD or in a new schema. In the schema, element
declarations c and d are added to the substitution group headed by
ext, allowing these element declarations to appear in the content
model as part of the choice.

Example 19–18. Implementing extensions for choice groups

DTD:

<!ENTITY % ext "| c | d" >

<!ENTITY % original SYSTEM "original.dtd">
%original;

Schema:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:include schemaLocation="original.xsd"/>
  <xs:element name="c" substitutionGroup="ext"/>
  <xs:element name="d" substitutionGroup="ext"/>
</xs:schema>
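
To show the effect of the substitution group, here is a hypothetical instance that the extended schema would be expected to accept. It assumes that a and b are declared in original.xsd with a string type, which the examples do not show. The elements c and d take their type, xs:string, from the head of the substitution group.

```xml
<x>
  <a>first</a>
  <!-- c and d appear where the abstract ext element is referenced -->
  <c>added through the substitution group</c>
  <b>second</b>
  <d>also added through the substitution group</d>
</x>
```

An ext element itself could not appear in the instance, because its declaration is abstract.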

19.4.3 Attribute extensions

Parameter entities may also be used in DTDs to leave attribute lists
open to future additions. Using schemas, this can be handled through
redefining or overriding attribute groups. Example 19–19 shows
a DTD that includes an empty parameter entity in an attribute
list. The corresponding schema has an empty attribute group that serves
the same purpose.

Example 19–19. Allowing future extensions for attributes

DTD:

<!ENTITY % attExt "" >

<!ATTLIST x id ID #REQUIRED
            %attExt;>

Schema:

<xs:attributeGroup name="attExt"/>
<xs:element name="x">
  <xs:complexType>
    <xs:attribute name="id" type="xs:ID" use="required"/>
    <xs:attributeGroup ref="attExt"/>
  </xs:complexType>
</xs:element>

Example 19–20 shows how attribute extensions would be accomplished
in a new parent DTD or in a new schema. In the schema, the
redefine mechanism is used to extend the attribute group to add a new
attribute.

Example 19–20. Implementing extensions for attributes using redefine

DTD:

<!ENTITY % attExt "myAttr NMTOKEN #IMPLIED" >

<!ENTITY % original SYSTEM "original.dtd">
%original;

Schema:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:redefine schemaLocation="original.xsd">
    <xs:attributeGroup name="attExt">
      <xs:attributeGroup ref="attExt"/>
      <xs:attribute name="myAttr" type="xs:NMTOKEN"/>
    </xs:attributeGroup>
  </xs:redefine>
</xs:schema>

This technique can also replace the DTD practice of declaring multiple
ATTLISTs for a single element type, which is sometimes used to extend
attribute lists.

In version 1.1 of XML Schema, a better choice is to use override,
since redefine is deprecated. Example 19–21 shows a revised example
that uses override.

Example 19–21. Implementing extensions for attributes using override

DTD:

<!ENTITY % attExt "myAttr NMTOKEN #IMPLIED" >

<!ENTITY % original SYSTEM "original.dtd">
%original;

Schema:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:override schemaLocation="original.xsd">
    <xs:attributeGroup name="attExt">
      <xs:attribute name="myAttr" type="xs:NMTOKEN"/>
    </xs:attributeGroup>
  </xs:override>
</xs:schema>

19.5 External parameter entities

External parameter entities are used to include other DTDs (or frag-
ments of DTDs) in a parent DTD. In a schema, this is accomplished
using either include or import. An include can be used if both
schema documents are in the same namespace (or in no namespace),
while import is used if they are in different namespaces. Exam-
ple 19–22 illustrates the use of include to combine schema documents.
See Section 4.3.1 on p. 62 for more detailed information on the include
mechanism.

Example 19–22. Including other DTDs or schema documents

DTD:

<!ENTITY % prodInfo SYSTEM "prod.dtd">

%prodInfo;

Schema:

<xs:include schemaLocation="prod.xsd"/>
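
Example 19–22 covers only the include case. For the other case named in the text, where the two schema documents are in different namespaces, a minimal import sketch follows. The namespace names, the order element, and the assumption that prod.xsd declares a global product element in its own target namespace are all invented for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:prod="http://example.org/prod"
           targetNamespace="http://example.org/ord">

  <!-- prod.xsd is assumed to have the target namespace
       http://example.org/prod -->
  <xs:import namespace="http://example.org/prod"
             schemaLocation="prod.xsd"/>

  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <!-- imported components are referenced by prefixed name -->
        <xs:element ref="prod:product" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```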

19.6 General entities

19.6.1 Character and other parsed entities

General entities are used in DTDs to represent characters or other
repeated character data that appears in instances. Unfortunately, there
is no direct equivalent for general entities in XML Schema. It is still
possible to use an internal or external DTD to declare the entities
and use this DTD in conjunction with schemas, as explained in
Section 19.9 on p. 499.
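
As a sketch of this combination, the instance below declares two general entities in an internal DTD subset and also points to a schema. The entity names, their replacement text, the letter element, and the file name letter.xsd are assumptions made for this example.

```xml
<?xml version="1.0"?>
<!DOCTYPE letter [
  <!-- general entities have to be declared in a DTD -->
  <!ENTITY company "Example Corp.">
  <!ENTITY copy "&#169;">
]>
<letter xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="letter.xsd">
  &copy; &company; All rights reserved.
</letter>
```

The XML parser replaces the entity references before schema validation takes place, so the schema sees only the resulting character data.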

19.6.2 Unparsed entities

Unparsed entities are used in conjunction with notations to reference
external data in non-XML formats, such as graphics files. A
schema-validated instance must be associated with a DTD (usually an internal
DTD subset) that declares the unparsed entities. This is described
further in Section 19.7.3 on p. 496.

19.7 Notations

Notations are used to indicate the format of non-XML data. For example,
notations can be declared to indicate whether certain binary
graphics data embedded in a picture element is in JPEG or GIF format.
Notations may describe data embedded in an XML instance, or
data in external files that are linked to the instance through unparsed
entities.

A notation may have a system or public identifier. There are no
standard notation names or identifiers for well-known formats such as
JPEG. Sometimes the identifier points to an application that can be
used to process the format, for example viewer.exe, and other times
it points to documentation about that format. Sometimes it is simply
an abbreviation that can be interpreted by an application. Schema
processors do not resolve these identifiers; it is up to the consuming
application to process the notations as desired.

To indicate that a picture element contains JPEG data, the element will
generally have a notation attribute (for example, fmt) that indicates
which notation applies. An element should only have one notation
attribute.

Example 19–23 shows an instance that uses a notation. The fmt attribute
contains the name of the notation that applies to the contents
of picture.

Example 19–23. Using a notation in an instance

<picture fmt="jpeg">47494638396132003200F7FF00FFFFFFFFFFCCFFFF99FF
FF66FFFF33FFFF00FF</picture>

19.7.1 Declaring a notation

Notations in XML Schema are declared using notation elements,
whose syntax is shown in Table 19–6. Notations are always declared
globally, with schema as their parent. Notations are named components
whose qualified names must be unique among all notations in a schema.
Like other named, global components, notations take on the target
namespace of the schema document. However, for compatibility, it is
recommended that notations only be declared in schemas that have no
target namespace.

Table 19–6 XSD Syntax: notation

Name
notation

Parents
schema, override (1.1)

Attribute name    Type      Description
id                ID        Unique ID.
name              NCName    Name of the notation.
public            token     Public identifier; at least one of the system
                            and public attributes must be present.
system            anyURI    System identifier.

Content
annotation?

19.7.2 Declaring a notation attribute

As mentioned earlier, elements that contain data described by a notation
have a notation attribute. This attribute has a type that restricts the
type NOTATION by specifying one or more enumeration facets. Each
of these enumeration values must match the name of a declared
notation.

Example 19–24 shows two notation declarations that represent
graphics formats. A simple type PictureNotationType is then
defined, based on NOTATION, which enumerates the names of the
notations. Next, an element declaration for picture is provided which
declares an attribute fmt of type PictureNotationType.

Example 19–24. Declaring notations and notation attributes

<xs:notation name="jpeg" public="JPG"/>
<xs:notation name="gif" public="GIF"/>

<xs:simpleType name="PictureNotationType">
  <xs:restriction base="xs:NOTATION">
    <xs:enumeration value="jpeg"/>
    <xs:enumeration value="gif"/>
  </xs:restriction>
</xs:simpleType>

<xs:element name="picture">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:hexBinary">
        <xs:attribute name="fmt" type="PictureNotationType"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>

19.7.3 Notations and unparsed entities

Example 19–24 showed the graphics data embedded directly in the
XML in binary format. Notations can also be used to indicate the for-
mat of an unparsed general entity. Example 19–25 shows an XML
document that lists products and links to pictures of those products.
In the schema, picture is declared to have an attribute location
that is of type ENTITY. In the instance, each value of the location
attribute (in this case, prod557 and prod563) matches the name of
an entity declared in the internal DTD subset for the instance. The
entity, in turn, refers to the notation via the NDATA parameter. In this
case, the notation must appear in the internal DTD subset of the
instance in order for the entity to be able to reference it.

Example 19–25. A notation with an unparsed entity

Schema:

<xs:element name="picture">
  <xs:complexType>
    <xs:attribute name="location" type="xs:ENTITY"/>
  </xs:complexType>
</xs:element>

Instance:

<!DOCTYPE catalog SYSTEM "catalog.dtd" [
  <!NOTATION jpeg SYSTEM "JPG">
  <!ENTITY prod557 SYSTEM "prod557.jpg" NDATA jpeg>
  <!ENTITY prod563 SYSTEM "prod563.jpg" NDATA jpeg>
]>

19.8 Comments

DTDs often use comments to further explain the declarations they
contain. Schema documents, as XML, can also contain comments.
However, XML Schema also offers an annotation facility that is
designed to provide more structured, usable documentation of schema
components. Example 19–26 shows a DTD fragment that has a comment
describing a section (CUSTOMER INFORMATION) and two element
declarations with element-specific comments appearing before each one.

The corresponding schema places each of these comments within
an annotation element. The first annotation element, which
describes the section, appears as a direct child of the schema. The
element-specific annotations, on the other hand, are defined entirely within
the element declarations to which they apply. In all three cases,
documentation elements are used, which are designed for
human-readable information. The schema is considerably more verbose than
the DTD, but the descriptive information is much better structured.
Section 21.8 on p. 580 covers schema documentation in detail.

Example 19–26. Comments

DTD:

<!-- CUSTOMER INFORMATION -->

<!-- billing address -->
<!ELEMENT billTo (%AddressType;)>

<!-- shipping address -->
<!ELEMENT shipTo (%AddressType;)>

Schema:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:doc="http://datypic.com/doc">

  <xs:annotation>
    <xs:documentation>
      <doc:section>CUSTOMER INFORMATION</doc:section>
    </xs:documentation>
  </xs:annotation>

  <xs:element name="billTo" type="AddressType">
    <xs:annotation>
      <xs:documentation>
        <doc:description>billing address</doc:description>
      </xs:documentation>
    </xs:annotation>
  </xs:element>

  <xs:element name="shipTo" type="AddressType">
    <xs:annotation>
      <xs:documentation>
        <doc:description>shipping address</doc:description>
      </xs:documentation>
    </xs:annotation>
  </xs:element>
</xs:schema>

19.9 Using DTDs and schemas together

There is nothing to prevent an instance from being validated against
both a DTD and a schema. In fact, if you wish to use general entities,
you must continue to use DTDs alongside schemas. Example 19–27
shows an instance that has both a DTD and a reference to a schema.

Example 19–27. Using a DTD and a schema

<!DOCTYPE catalog SYSTEM "catalog.dtd" [
<!NOTATION jpeg SYSTEM "JPG">
<!ENTITY prod557 SYSTEM "prod557.jpg" NDATA jpeg>
<!ENTITY prod563 SYSTEM "prod563.jpg" NDATA jpeg>]>

Two separate validations can take place: one against the DTD and
one against the schema. The DTD validity will be assessed first. This
process will not only validate the instance, but also augment it by re-
solving the entities, filling in attributes’ default values, and normalizing
whitespace in attribute values. Validity according to the schema is then
assessed on the augmented instance. None of the declarations in the
DTD override the declarations in the schema. If there are declarations
for the same element in both the DTD and the schema and these
declarations are conflicting, an element may be DTD-valid but not
schema-valid.
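
To make the two-step process concrete, the following sketch shows what a complete instance with both a DTD and a schema reference might look like. The schema file name catalog.xsd and the product and number elements are assumptions made for this example; only the DOCTYPE declaration and the picture element with its location attribute come from the examples in this chapter.

```xml
<?xml version="1.0"?>
<!DOCTYPE catalog SYSTEM "catalog.dtd" [
  <!NOTATION jpeg SYSTEM "JPG">
  <!ENTITY prod557 SYSTEM "prod557.jpg" NDATA jpeg>
  <!ENTITY prod563 SYSTEM "prod563.jpg" NDATA jpeg>
]>
<!-- the xsi attribute is the hint that points to the (assumed) schema -->
<catalog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="catalog.xsd">
  <product>
    <number>557</number>
    <!-- location names an unparsed entity declared in the DTD subset -->
    <picture location="prod557"/>
  </product>
  <product>
    <number>563</number>
    <picture location="prod563"/>
  </product>
</catalog>
```

One practical consequence of validating against both: for the instance to be DTD-valid, catalog.dtd would also have to declare the xmlns:xsi and xsi:noNamespaceSchemaLocation attributes on catalog, since a DTD treats them as ordinary attributes.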
Chapter 20

XML information modeling

This chapter addresses some of the general modeling and design
questions that come up when designing XML documents, and to a
lesser extent the schemas that describe them. For developers who are
accustomed to defining data as entity-relationship models, relational
tables, UML, and object-oriented models and classes, there is a
learning curve associated with the hierarchical model of XML.

This chapter will help you up that curve. It first compares XML
modeling and design to other disciplines, such as relational models and
object-oriented models, and shows how XML Schema features can be
used to describe these models. It then provides some general design
principles for modeling web services, dealing with document-oriented
narrative content, and working with a hierarchical information model.

502 Chapter 20 | XML information modeling

20.1 Data modeling paradigms

If you are approaching the subject of XML Schema with some previous
background in data design, you may be wondering how to represent
in XML concepts from

# Relational models, such as entity-relationship data models or
relational database design
# Object-oriented models, which may exist for example as UML
class diagrams and/or object-oriented program code

You may continue to use these modeling paradigms along with your
XML application. For example, you may be parsing XML and storing
it in a relational database (this is sometimes known as “shredding”), in
which case you still have a relational model for your data. You may be
processing your XML documents with object-oriented code, so there
still needs to be a correspondence between the XML and the object
model.
Some schema designers choose to maintain these models, such as
UML models, entity-relationship diagrams, and/or supplementary
documentation, alongside the XML Schema. Others rely more heavily
on the XML Schema to represent the entire model. This is convenient
in that there is a one-to-one mapping to the actual XML documents
that are in use. However, it does have some drawbacks in that XML
Schema cannot express every constraint on the data and is somewhat
technology-specific.
Some developers maintain a connection between the models using
toolkits that generate program code or even databases. It is particularly
common to use data binding toolkits to generate object-oriented
classes from schemas. As appropriate, this chapter describes some of
the considerations for designing XML documents to optimize the use
of these toolkits.

20.2 Relational models

Designing an XML message structure is different in some ways from
traditional entity-relationship modeling and relational database design,
where the data model is a persistent-storage representation of the data.
When creating an entity-relationship model, great care is (hopefully)
taken to define what an entity is, as opposed to how it is used in any
particular context. For example, when you model a “customer” entity,
you decide on your definition of a customer, its unique identifier, and
all of its attributes. You also normalize all the relationships between
customers and other entities: For example, a customer can have one or
more addresses, and can be associated with zero or more purchases.
An XML message, on the other hand, often represents a particular
usage or view of the data, useful at a particular time in a certain opera-
tion. Instead of being the definitive source for all information about
that entity, it contains only the subset that is useful for the operation
in question. For a purchase order, you probably do not need to include
all of the information that can be known about a customer; perhaps
you just need an identifier, name, and shipping address. For a line item
in the purchase order, you may need to know a product’s identifier,
name, and price, but not its other attributes such as a long description
or a list of features.
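As a minimal sketch of this idea, a purchase order message might carry only the customer and product details that the operation needs. The element names purchaseOrder, shippingAddress, and price are illustrative assumptions rather than part of the chapter's model, and the price is a placeholder value.

```xml
<!-- Hypothetical message: only the subset of customer and product
     data needed to process this one order. -->
<purchaseOrder>
  <number>1234</number>
  <customer>
    <number>12345</number>
    <name>Priscilla Walmsley</name>
    <shippingAddress>
      <line1>PO Box 9999999</line1>
      <city>Traverse City</city>
      <state>MI</state>
      <zip>49686</zip>
    </shippingAddress>
  </customer>
  <lineItem>
    <product>
      <number>557</number>
      <name>Short-Sleeved Linen Blouse</name>
      <price>29.99</price> <!-- placeholder value -->
    </product>
    <quantity>1</quantity>
  </lineItem>
</purchaseOrder>
```

The billing address, the long product description, and the list of available sizes are deliberately absent; they remain in the system of record and are not needed here.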
Relationships also differ in the two models. In an entity-relationship
or relational model, there is no single starting point to the model; enti-
ties exist and can be accessed independently of each other. In an XML
hierarchy, one element must be at the root of the structure, and there
is an implied relationship between all of the elements within that hier-
archy. Again, only the relationships that are relevant to the particular
message are included, and their cardinality may differ in the message
as compared to the relational data model. Representing relationships
in XML is discussed later in this chapter.
In an ideal scenario, you will have a standardized canonical model
that you will draw on for your XML message schemas. Just as in
relational database design, in XML message design it makes sense to
use the same element names, types, and relationships for the same data
where possible. For example, if your corporate data model says that an
Address entity has the properties line1, line2, city, state, and
zip, it makes sense to use the same deﬁnitions and names (or the
relevant subset of them) for the elements in your XML messages.
On the other hand, it is best to avoid tightly coupling your XML
messages with any one relational database schema. You might use the
same names and deﬁnitions if they are well-designed, but should not,
for example, generate your XML schemas from relational databases or
have your application automatically insert the contents of XML ele-
ments into relational columns of the same name. This would create
too close a relationship between the XML message and the database,
where the message schema would have to change if the database changes.

20.2.1 Entities and attributes

In a relational model, you will typically have entities, each with a set
of attributes or properties. In a relational database, these would be im-
plemented as tables and columns, with each instance represented as a
row with multiple cells. In XML, this roughly translates into elements
with complex content and elements with simple content. For the
entity-relationship model shown in Figure 20–1, our first cut at
representing that in XML (leaving aside the relationships for now)
might be as shown in Example 20–1.

Figure 20–1 Entity-relationship diagram

Example 20–1. A simple representation of relational entities in XML

<model>
<order>
<number>1234</number>
<total>213.12</total>
</order>
<customer>
<number>12345</number>
<firstName>Priscilla</firstName>
<lastName>Walmsley</lastName>
</customer>
<address>
<type>Billing</type>
<line1>123 Main Street</line1>
<line2>Apartment 2B</line2>
<city>Traverse City</city>
<state>MI</state>
<zip>49684</zip>
</address>
<address>
<type>Shipping</type>
<line1>PO Box 9999999</line1>
<city>Traverse City</city>
<state>MI</state>
<zip>49686</zip>
</address>
<lineItem>
<giftWrap>bday</giftWrap>
<number>557</number>
<size>12</size>
<color>blue</color>
<quantity>1</quantity>
</lineItem>
<lineItem>
<number>443</number>
<size>L</size>
<color>tan</color>
<quantity>2</quantity>
</lineItem>

<product>
<number>557</number>
<name>Short-Sleeved Linen Blouse</name>
<availableSizes>2 4 6 8 10 12 14</availableSizes>
<availableColors>blue red</availableColors>
<desc>Our best-selling shirt!</desc>
</product>
<product>
<number>563</number>
<name>Ten-Gallon Hat</name>
<availableSizes>S M L</availableSizes>
</product>
<product>
<number>443</number>
<name>Deluxe Golf Umbrella</name>
<availableColors>tan black</availableColors>
<desc>Protect yourself from the rain!</desc>
</product>
</model>

This representation adheres strictly to a structure where every child
of model is an entity, and every grandchild is an attribute. However,
there can be any number of hierarchical levels of elements in XML,
which allows additional flexibility to add structure that is sometimes
useful. For example, it may be useful to combine firstName and
lastName into a parent name structure. This is described further in
Section 20.6.1 on p. 527.
Another difference is that with XML, any attribute or property can
repeat. With a relational database design, you would either need to
define a finite number of occurrences of an attribute or create a separate
table with repeating rows. In this example, instead of line1 and line2,
in XML it would be more natural to have a repeating line element.
The order of elements is significant in XML, so the numeric designators
are not required to indicate whether it is the first or second line of the
address.
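A sketch of what the repeating line element could look like follows. The first block is the billing address from Example 20–1 rewritten with line in place of line1 and line2; the second is one possible declaration, where AddressType is an assumed type name.

```xml
<!-- Document order tells us which line comes first. -->
<address>
  <type>Billing</type>
  <line>123 Main Street</line>
  <line>Apartment 2B</line>
  <city>Traverse City</city>
  <state>MI</state>
  <zip>49684</zip>
</address>
```

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="address" type="AddressType"/>
  <!-- AddressType is a hypothetical name used for illustration. -->
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="type" type="xs:string"/>
      <!-- One or more address lines; no fixed upper limit. -->
      <xs:element name="line" type="xs:string" maxOccurs="unbounded"/>
      <xs:element name="city" type="xs:string"/>
      <xs:element name="state" type="xs:string"/>
      <xs:element name="zip" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

If the receiving system can only store two lines, maxOccurs="2" would express that limit directly in the schema.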

20.2.2 Relationships
An entity-relationship model allows entities to be independent of each
other and have relationships to various other entities. Sometimes these
relationships map naturally onto a hierarchical XML model, especially
in the case of XML messages that represent a temporary view on the
data. Sometimes it is more of a challenge to represent relationships
in XML.

20.2.2.1 One-to-one and one-to-many relationships

Many relationships in XML are simply modeled as parent-child rela-
tionships, also known as containment relationships. For example,
suppose each customer only has one address. In an entity-relationship
model, there might be two separate entities: one for customer and one
for address. This would be a good design if other entities (such as sup-
pliers) also have addresses. In XML, this would be modeled as a parent-
child relationship, where the customer element would contain a single
address element.
A similar approach works if a customer can have more than one ad-
dress. In an entity-relationship model, there would be a one-to-many
relationship between the customer and the address. In XML, the
customer element can simply contain more than one address
element.
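The containment described above might be declared as in the following sketch. CustomerType and AddressType are assumed names; the only difference between the one-to-one and the one-to-many case is the maxOccurs value on the address element.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="customer" type="CustomerType"/>
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="firstName" type="xs:string"/>
      <xs:element name="lastName" type="xs:string"/>
      <!-- One-to-many. Omit maxOccurs (it defaults to 1) for one-to-one. -->
      <xs:element name="address" type="AddressType"
                  maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <!-- A named type, so that other parents (such as a supplier)
       could contain addresses too. -->
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="type" type="xs:string"/>
      <xs:element name="line1" type="xs:string"/>
      <xs:element name="line2" type="xs:string" minOccurs="0"/>
      <xs:element name="city" type="xs:string"/>
      <xs:element name="state" type="xs:string"/>
      <xs:element name="zip" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```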

20.2.2.2 Many-to-many relationships

Many-to-many relationships are harder to represent directly in XML.
In many cases, since an XML message represents a temporary view of
the data, a one-to-many containment relationship is sufﬁcient even if
a many-to-many relationship exists in the real world. For example, a
purchase order might represent orders for more than one product, and
any one product can be ordered using many purchase orders, but for
the purposes of the message you only need to follow the relationship
in one direction: include all the products for the given purchase order.
Sometimes there is a many-to-many relationship that does need to
be fully represented in one message. Suppose that instead of a single
purchase order, our XML document represents a summary report that
shows all the orders and the products that are ordered, over time. An
order can be for multiple products, and a product can be part of mul-
tiple orders, and this many-to-many relationship needs to be represented
in the XML. There are several ways to approach this, described in this
section.

20.2.2.2.1 Approach #1: Use containment with repetition

In this approach, you choose one entity as the parent, for example
order. Within each order, there is a repeating product child that
contains all of the product information. This is shown in Example 20–2.
This will result in some products being repeated, in their entirety, more
than once in the message, as is the case for product number 557 in the
example.
This is a perfectly acceptable solution for low volumes of infor-
mation with low repetition. However, if there were a lot of other
information in the message about each product, and a product could
be repeated in dozens of orders, the message would quickly become
unnecessarily large.

Example 20–2. Relationship with repetition

<report>
<order>
<number>1234</number>
<total>213.12</total>
<lineItem>
<giftWrap>bday</giftWrap>
<size>12</size>
<color>blue</color>
<quantity>1</quantity>
<product>
<number>557</number>
<name>Short-Sleeved Linen Blouse</name>
<availableColors>blue red</availableColors>
</product>
</lineItem>
<lineItem>
<size>L</size>
<color>tan</color>
<quantity>2</quantity>
<product>
<number>443</number>
<name>Deluxe Golf Umbrella</name>
<availableColors>tan black</availableColors>
</product>
</lineItem>
</order>
<order>
<number>5678</number>
<total>245.55</total>
<lineItem>
<giftWrap>bday</giftWrap>
<size>12</size>
<color>blue</color>
<quantity>1</quantity>
<product>
<number>557</number>
<name>Short-Sleeved Linen Blouse</name>
<availableColors>blue red</availableColors>
</product>
</lineItem>
<lineItem>
<size>L</size>
<quantity>1</quantity>
<product>
<number>563</number>
<name>Ten-Gallon Hat</name>
<availableSizes>S M L</availableSizes>
</product>
</lineItem>
</order>
</report>

20.2.2.2.2 Approach #2: Use containment with references

Another option is to keep the orders and products separate and use
unique identiﬁers to specify the relationships. This approach is similar
to foreign keys in a database. An order might contain several reference
elements that refer to unique keys of the products. An example is shown
in Example 20–3.
Here, as in the previous approach, the relationship from order to
product is represented, but not the relationship back from product
to order. However, the relationship from product to order can be
gleaned from the XML using program code.

Example 20–3. Relationship via reference

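<report>
<order>
<number>1234</number>
<total>213.12</total>
<lineItem>
<giftWrap>bday</giftWrap>
<size>12</size>
<color>blue</color>
<quantity>1</quantity>
<productRef ref="557"/>
</lineItem>
<lineItem>
<size>L</size>
<color>tan</color>
<quantity>2</quantity>
<productRef ref="443"/>
</lineItem>
</order>
<order>
<number>5678</number>
<total>245.55</total>
<lineItem>
<giftWrap>bday</giftWrap>
<size>12</size>
<color>blue</color>
<quantity>1</quantity>
<productRef ref="557"/>
</lineItem>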
<lineItem>
<size>L</size>
<quantity>1</quantity>
<productRef ref="563"/>
</lineItem>
</order>
<product>
<number>443</number>
<name>Deluxe Golf Umbrella</name>
<availableColors>tan black</availableColors>
</product>
<product>
<number>557</number>
<name>Short-Sleeved Linen Blouse</name>
<availableColors>blue red</availableColors>
</product>
<product>
<number>563</number>
<name>Ten-Gallon Hat</name>
<availableSizes>S M L</availableSizes>
</product>
</report>
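
One way to glean the reverse relationship, from product back to order, is with an XSLT key. The following is a minimal XSLT 1.0 sketch that assumes the report structure used in this approach (order elements containing lineItem/productRef children with a ref attribute); the key name ordersByProduct is an arbitrary choice.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index every order under each product number it references. -->
  <xsl:key name="ordersByProduct" match="order"
           use="lineItem/productRef/@ref"/>

  <xsl:template match="/report">
    <xsl:for-each select="product">
      <!-- The product's own number -->
      <xsl:value-of select="number"/>
      <xsl:text>:</xsl:text>
      <!-- Look up the orders whose productRef matches that number. -->
      <xsl:for-each select="key('ordersByProduct', number)">
        <xsl:text> </xsl:text>
        <!-- Inside this loop, number is the order number. -->
        <xsl:value-of select="number"/>
      </xsl:for-each>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The stylesheet writes one line per product, listing the product number followed by the numbers of the orders that reference it.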

In your schema, you can use either ID- and IDREF-typed attributes
or identity constraints to validate the relationship. Identity constraints,
described fully in Chapter 17, use the key and keyref elements. This
is shown in Example 20–4, where the key element defines the unique
identifier of each product. Within it, selector identifies the element
that needs to be unique (the product), and field specifies the
element that contains the unique identifier (the number child).
The keyref element is used to establish the foreign key relationship
from the productRef’s ref attribute to the product element’s number
child. It uses a syntax similar to the key element, except that it also
includes a refer attribute that indicates the key to which it refers.

Example 20–4. Using identity constraints to validate references

<xs:element name="report" type="ReportType">
<xs:key name="productKey">
<xs:selector xpath=".//product"/>
<xs:field xpath="number"/>
</xs:key>
<xs:keyref name="productKeyRef" refer="productKey">
<xs:selector xpath=".//productRef"/>
<xs:field xpath="@ref"/>
</xs:keyref>
</xs:element>

Compared to the first approach, this type of structure can be harder
to process, either in XPath or in program code generated by data
binding tools. Although the relationship can be expressed and validated
using a schema, defining it via the schema identity constraints will not
have any particular representation or meaning in generated class
definitions. For example, for a generated Order class, it will not
generate a getProduct method that will go out and get a related
Product object, whereas with the first approach you can simply use a
getProduct method. However, this approach has the advantage of being
a lot less verbose if there is a lot of product information and/or it is
repeated many times.
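For comparison, the ID and IDREF alternative mentioned above could be sketched as follows. The type names IdentifiedProductType and ProductRefType and the id attribute are assumptions made for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical product type identified by an ID-typed attribute -->
  <xs:complexType name="IdentifiedProductType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:ID" use="required"/>
  </xs:complexType>
  <!-- Hypothetical reference type pointing at some ID in the document -->
  <xs:complexType name="ProductRefType">
    <xs:attribute name="ref" type="xs:IDREF" use="required"/>
  </xs:complexType>
</xs:schema>
```

```xml
<report>
  <order>
    <number>5678</number>
    <lineItem>
      <quantity>1</quantity>
      <productRef ref="p563"/>
    </lineItem>
  </order>
  <product id="p563">
    <name>Ten-Gallon Hat</name>
  </product>
</report>
```

Two limitations are visible here. An ID value must be a valid XML name, so it cannot start with a digit, which is why the sketch uses p563 rather than 563. ID values must also be unique across the whole document, and an IDREF can point to any element that has an ID, not only to a product. Identity constraints avoid both restrictions.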

20.2.2.2.3 Approach #3: Use relationship elements

A third option is to use a separate relationship element (called, for ex-
ample, orderProductRelationship) placed outside the contents of
either the order or the product elements. This is shown in Exam-
ple 20–5. This has the advantage of representing the relationship in
both directions in a compact way. It also provides a container for
information about that relationship, such as the quantity and color
ordered in our example. The disadvantage is that this is even more
difﬁcult to process using generated classes. It compounds the issues
with the previous method by requiring yet a third unrelated object
(orderProductRelationship) that has to be retrieved.

Example 20–5. Using a separate relationship element

<report>
<order>
<number>1234</number>
<total>213.12</total>
</order>
<order>
<number>5678</number>
<total>245.55</total>
</order>
<product>
<number>443</number>
<name>Deluxe Golf Umbrella</name>
<availableColors>tan black</availableColors>
</product>
<product>
<number>557</number>
<name>Short-Sleeved Linen Blouse</name>
<availableColors>blue red</availableColors>
</product>
<product>
<number>563</number>
<name>Ten-Gallon Hat</name>
<availableSizes>S M L</availableSizes>
</product>
<orderProductRelationship>
<orderRef ref="1234"/>
<productRef ref="557"/>
<giftWrap>bday</giftWrap>
<size>12</size>
<color>blue</color>
<quantity>1</quantity>
</orderProductRelationship>
<orderProductRelationship>
<orderRef ref="1234"/>
<productRef ref="443"/>
<size>L</size>
<color>tan</color>
<quantity>2</quantity>
</orderProductRelationship>

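<orderProductRelationship>
<orderRef ref="5678"/>
<productRef ref="557"/>
<giftWrap>bday</giftWrap>
<size>12</size>
<color>blue</color>
<quantity>1</quantity>
</orderProductRelationship>
<orderProductRelationship>
<orderRef ref="5678"/>
<productRef ref="563"/>
<size>L</size>
<quantity>1</quantity>
</orderProductRelationship>
</report>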
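
The relationship elements can be validated with the same identity constraint technique used for the second approach. The following fragment is a sketch only: the constraint names are arbitrary, and ReportType is assumed to be defined elsewhere in the schema.

```xml
<xs:element name="report" type="ReportType">
  <!-- Each order and each product is identified by its number child. -->
  <xs:key name="orderKey">
    <xs:selector xpath="order"/>
    <xs:field xpath="number"/>
  </xs:key>
  <xs:key name="productKey">
    <xs:selector xpath="product"/>
    <xs:field xpath="number"/>
  </xs:key>
  <!-- Both ends of every relationship must point at an existing key. -->
  <xs:keyref name="relationshipOrderRef" refer="orderKey">
    <xs:selector xpath="orderProductRelationship/orderRef"/>
    <xs:field xpath="@ref"/>
  </xs:keyref>
  <xs:keyref name="relationshipProductRef" refer="productKey">
    <xs:selector xpath="orderProductRelationship/productRef"/>
    <xs:field xpath="@ref"/>
  </xs:keyref>
</xs:element>
```

Key and keyref values are compared as typed values, so the ref attributes should be declared with the same type as the number elements they refer to (for example, both as xs:integer).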

20.3 Modeling object-oriented concepts

Object-oriented concepts fit nicely with XML Schema. Complex types
in XML Schema are like classes, and element declarations are like
instance variables whose types are those classes. Some of the considerations
described in the previous section apply to object-oriented concepts as
well. Objects are in some ways analogous to entities, and they can have
associations (relationships) that can be represented using the three ap-
proaches described. Some additional object-oriented concepts are
compared to XML in this section.

20.3.1 Inheritance
Object-oriented inheritance can be implemented using type derivation
in XML Schema. For example, suppose we want to have separate ele-
ments for three different kinds of products: shirts, hats, and umbrellas.
They have some information in common, such as product number,
name, and description. The rest of their content is speciﬁc to their
subclass: Shirts might have a choice of sizes and a fabric. A hat might
also have a choice of sizes, with the values conforming to a different
sizing scheme, as well as a different property like an SPF rating.
This is depicted as an object model in Figure 20–2.

Figure 20–2 Class inheritance

In XML Schema, you can create a ProductType type, like the one
shown in Example 20–6, that specifies the content common to all
three types. The type can optionally be abstract, meaning that no
element in an instance can have ProductType itself as its type; it
must use one of the derived types instead.

Example 20–6. An abstract product type

<xs:complexType name="ProductType" abstract="true">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="desc" type="xs:string"/>
</xs:sequence>
</xs:complexType>

You can then derive three new types from ProductType, one for
each kind of product. An example of ShirtType is shown in
Example 20–7.

Example 20–7. A derived shirt type

<xs:element name="shirt" type="ShirtType"/>
<xs:complexType name="ShirtType">
<xs:complexContent>
<xs:extension base="ProductType">
<xs:sequence>
<xs:element name="fabric" type="xs:string"/>
<xs:element name="availableSizes"
type="AvailableShirtSizesType"/>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:simpleType name="AvailableShirtSizesType">
<xs:list>
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="2"/>
<xs:maxInclusive value="18"/>
</xs:restriction>
</xs:simpleType>
</xs:list>
</xs:simpleType>

As you can see, ShirtType is defined as an extension of
ProductType. This means that it inherits the entire content
model of ProductType and adds two new elements, fabric and
availableSizes, which must appear at the end. A valid instance
of ShirtType is shown in Example 20–8.

Example 20–8. Instance of the derived shirt type

<shirt>
<number>557</number>
<name>Short-Sleeved Linen Blouse</name>
<desc>Our best-selling shirt!</desc>
<fabric>linen</fabric>
<availableSizes>2 4 6 8 10 12 14</availableSizes>
</shirt>

A different extension of ProductType could then be defined for the
hat element, as shown in Example 20–9.

Example 20–9. A derived hat type

<xs:element name="hat" type="HatType"/>
<xs:complexType name="HatType">
<xs:complexContent>
<xs:extension base="ProductType">
<xs:sequence>
<xs:element name="spfRating" type="xs:integer"/>
<xs:element name="availableSizes"
type="AvailableHatSizesType"/>
</xs:sequence>
</xs:extension>
</xs:complexContent>
</xs:complexType>
<xs:simpleType name="AvailableHatSizesType">
<xs:list>
<xs:simpleType>
<xs:restriction base="xs:string"/>
</xs:simpleType>
</xs:list>
</xs:simpleType>

HatType also adds an availableSizes element, but its type is
different from that in ShirtType. Using locally declared elements,
as we do here, allows the same element name to have different
types in different contexts.
Data binding tools will generally treat complex type extensions as
class inheritance. They will generate a class named ShirtType that
extends the class named ProductType. ShirtType will inherit all
methods from ProductType and add new ones. This can be beneficial
for the same reasons it is beneficial in object-oriented programming.
Defining the common components once ensures that they are consistent
and makes it easier to write and maintain the code that manipulates
the common data. In addition, objects can be treated in the application
code either generically as products, or specifically as shirts or hats,
depending on the needs of the application.
XML Schema offers several other ways of representing class
hierarchies.

# Complex types can be derived by restriction (rather than extension)
from other complex types. The derived type must allow
only a subset of what the base type allows. This is of limited
usefulness unless you are attempting to define a subset of another
schema.
# Type substitution allows the same element name to be used for
derived types (by extension or restriction). For example, I could
declare a product element whose type is ProductType,
and then in the message itself use the xsi:type attribute
to indicate a derived type—for example, <product
xsi:type="ShirtType">. It is essentially casting an individ-
ual product element to a subtype, ShirtType, at runtime.
While this can be an elegant way to achieve flexibility, it is
not well supported in tools. For more interoperable schemas,
it is best to define separate elements (shirt, hat, umbrella)
for each type (ShirtType, HatType, UmbrellaType), as
shown earlier in this section.
# Substitution groups describe hierarchies of elements rather than
types. They allow you to specify that one or more elements are
substitutable for another element. For example, I could put the
shirt, hat, and umbrella elements in the substitution group
of the product element. Then, anywhere product appears in
a content model, any of the other three elements could appear
instead. It is a more extensible alternative to choice groups.
However, the support of substitution groups in data binding
tools is also somewhat limited. Data-oriented messages are
usually fairly predictable, so unless you need this kind of flexi-
bility, you are better off creating a hierarchy of types and
explicitly stating where elements of each type can appear.
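
To make the last two alternatives concrete, here is a sketch that supports both. The type definitions are cut-down versions of the ones in this section, the catalog element is a hypothetical container, and the SPF value in the instance is a placeholder.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Head element; its type is abstract, so a plain product element
       without xsi:type is not valid. -->
  <xs:element name="product" type="ProductType"/>
  <!-- Members of the substitution group of product -->
  <xs:element name="shirt" type="ShirtType" substitutionGroup="product"/>
  <xs:element name="hat" type="HatType" substitutionGroup="product"/>

  <!-- Hypothetical container: wherever product is allowed,
       shirt and hat are allowed too. -->
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="product" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:complexType name="ProductType" abstract="true">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ShirtType">
    <xs:complexContent>
      <xs:extension base="ProductType">
        <xs:sequence>
          <xs:element name="fabric" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="HatType">
    <xs:complexContent>
      <xs:extension base="ProductType">
        <xs:sequence>
          <xs:element name="spfRating" type="xs:integer"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```

```xml
<catalog xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- Substitution group: shirt appears where product is declared. -->
  <shirt>
    <number>557</number>
    <name>Short-Sleeved Linen Blouse</name>
    <fabric>linen</fabric>
  </shirt>
  <!-- Type substitution: the element name stays product. -->
  <product xsi:type="HatType">
    <number>563</number>
    <name>Ten-Gallon Hat</name>
    <spfRating>50</spfRating> <!-- placeholder value -->
  </product>
</catalog>
```

A consumer of the first form can dispatch on the element name alone, while a consumer of the second form has to inspect the xsi:type attribute, which is one reason tool support for it tends to be weaker.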

20.3.2 Composition
There is an alternative way to represent the fact that shirts, hats,
and umbrellas have properties in common. Through the use of
named model groups, XML Schema allows you to identify shared
content model fragments. This approach could be seen as composition
rather than generalization in object-oriented terminology: a shirt
definition is composed of the product properties plus its own
properties.
Named model groups are described in detail in Chapter 15. Exam-
ple 20–10 shows a named model group, ProductProperties,
with a content model fragment describing all the generic product
information.

Example 20–10. Product property group

<xs:group name="ProductProperties">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="desc" type="xs:string"/>
</xs:sequence>
</xs:group>

Example 20–11 shows how ShirtType references the
ProductProperties group. It also uses a group element, but this
time with a ref attribute instead of a name attribute. The effect of
referencing a group is as if you cut and pasted the content of the group
into the place where it is referenced.
HatType could similarly reference the group to reuse the deﬁni-
tion of the product properties. The instances of shirt and hat
will look exactly the same as if we used type extension; there will be no
ProductProperties element in the message content.

Example 20–11. Shirt type that uses the product property group

<xs:complexType name="ShirtType">
<xs:sequence>
<xs:group ref="ProductProperties"/>
<xs:element name="fabric" type="xs:string"/>
<xs:element name="availableSizes"
type="AvailableShirtSizesType"/>
</xs:sequence>
</xs:complexType>

The advantage of this approach is its flexibility. When using named
model groups, the shared portion can appear anywhere in the content
model, not just at the beginning. This might be a consideration if the
order matters, although you should avoid giving any significance
to the order of elements. It is also more flexible in that you can
include more than one group in a content model. With type derivation,
there is only single inheritance.
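Both kinds of flexibility can be seen in the following sketch, in which a hat type references two groups and places them after its own element. PricingProperties and its two elements are hypothetical additions that do not appear elsewhere in this chapter.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:group name="ProductProperties">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="desc" type="xs:string"/>
    </xs:sequence>
  </xs:group>
  <!-- Hypothetical second group, for illustration only -->
  <xs:group name="PricingProperties">
    <xs:sequence>
      <xs:element name="price" type="xs:decimal"/>
      <xs:element name="currency" type="xs:string"/>
    </xs:sequence>
  </xs:group>
  <xs:complexType name="HatType">
    <xs:sequence>
      <!-- The type's own element comes first... -->
      <xs:element name="spfRating" type="xs:integer"/>
      <!-- ...followed by two shared fragments. -->
      <xs:group ref="ProductProperties"/>
      <xs:group ref="PricingProperties"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

With type extension, neither the leading spfRating nor the second shared fragment would be possible, because extension always appends to the end of a single base type's content model.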
The disadvantage is that although the generic product components
are shared in the schema, most data binding toolkits do not generate
shared code or interfaces for these components. They will typically
generate a ShirtType class that has all the generic product properties,
and a separate HatType class that has separate definitions of the
generic product properties. No separate ProductProperties class or
interface will be generated; it will be as if the group did not exist.
Use of named model groups is most appropriate when the types
represent different concepts that happen to have a few of the same child
elements. However, if the types are really subclasses of a more generic
class, it is better to use type derivation because the generated code will
be more useful and representative of the real model.
Another option for composition is to define a child element to con-
tain the shared information. We could use the same ProductType
complex type we defined before. Instead of deriving ShirtType
from it, we could give ShirtType a child element named
productProperties that has the type ProductType. This is shown
in Example 20–12.

Example 20–12. Using a child element for composition

<xs:complexType name="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="desc" type="xs:string"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="ShirtType">
<xs:sequence>
<xs:element name="productProperties" type="ProductType"/>
<xs:element name="fabric" type="xs:string"/>
<xs:element name="availableSizes"
type="AvailableShirtSizesType"/>
</xs:sequence>
</xs:complexType>

In this case, the message instance will have an extra level of structure
with the productProperties element, as shown in Example 20–13.

Example 20–13. Instance of shirt type using child element

<shirt>
<productProperties>
<number>557</number>
<name>Short-Sleeved Linen Blouse</name>
<desc>Our best-selling shirt!</desc>
</productProperties>
<fabric>linen</fabric>
<availableSizes>2 4 6 8 10 12 14</availableSizes>
</shirt>

One advantage of this approach is that it clearly identifies the shared
information and provides a hook to access it. This can make it easier
for the consumer of the message, especially when using a tool like
XSLT. With data binding toolkits, an advantage over named model groups
is that a single class will be generated for productProperties, so the
code to process and generate that part of the message can be shared for
both shirts and hats.
The disadvantage of this approach is that it adds an additional layer,
and therefore some additional complexity, to the messages. Compared
to the type inheritance method, this can make the message more verbose
and make writing and maintaining the code that implements the service
slightly more difficult.

20.4 Modeling web services

Modeling web services has a lot in common with the concepts already
described in this chapter. Typically the information being passed in
web service messages consists of entities and/or objects. Some people
equate designing a web service with designing an object-oriented API.
There are some similarities: There are interfaces (services) that have
methods (operations) that encapsulate the underlying functionality.
For very ﬁne-grained utility services, this comparison holds true. For
example, you might have some data services that are used to put a
wrapper around certain low-level database transactions, such as inserting
and updating customer information. These services might roughly
resemble an object-oriented interface, with methods/operations like
updateCustomerStatus and insertCustomerAddress.
However, as you design more coarse-grained composite services, es-
pecially those that will be used across organizational boundaries, the
messages passed to and from the service are likely to be broader in scope
than any data values you would normally pass in a call to a method. It
is desirable, for reasons of performance, scalability, and reliability, for
a service to not be too “chatty.” A chatty service is one to which multiple
calls must be made, often synchronously, to get a useful result. Instead,
all the information the service needs to give you in a result should be
included in the same message. This means that information needed to
accomplish several different actions may be included in the message
itself.
For example, for a purchasing service, the message passed to the
service might include security information such as logon parameters,
the purchase order itself, the state of the purchase order, what action
needs to be performed next with it, the format of the desired response
or acknowledgement, and the location to send the response. Some of
this information will appear in the header. As the message is passed
from service to service, it might accumulate additional information,
such as customer details and more detailed pricing and tax informa-
tion, for each of the ordered items. Modeling all of this information
as a single message to be passed to an operation may not be intuitive
for the average object-oriented designer.
A complete discussion of designing service-oriented architectures
and their contracts is outside the scope of this book, but it is useful to
note several key points related to message design.

# As described above, the scope of a single message should be
somewhat coarse-grained; it should provide all of the content
needed for an entire operation. This means that the root
element type should be broad enough to include a variety of
information, not just a single entity.
# It is helpful to use specific root elements for individual opera-
tions. For example, if the operation is to submit a purchase or-
der, it is helpful to have a root element name that specifically
states this using a verb, for example submitPurchaseOrder
rather than a generic message element or even a
purchaseOrder element. If you are using WSDL, you specify
the root element for the input and output of each operation in
a service. Using more specific root element names improves
validation and results in a more precise interface, which makes it
easier to develop code that processes the messages.
# Although individual operations may use different root elements,
it is highly desirable for the contents of the various message
types to have as much in common as possible. Reusing types at
all levels is very important in a service-oriented environment to
avoid thousands of point-to-point transformations between
different representations of, for example, addresses or products.
Some organizations develop canonical models in XML Schema
whose use is mandatory in all newly developed services, as a way
of formalizing proper reuse.
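
The last two points can be combined, as in this sketch: each operation gets its own verb-named root element, while the content of both is built from the same shared types. All of the names here (cancelPurchaseOrder, replyTo, reason, shipTo, and the type names) are assumptions made for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Operation-specific root elements -->
  <xs:element name="submitPurchaseOrder">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="replyTo" type="xs:anyURI"/>
        <xs:element name="purchaseOrder" type="PurchaseOrderType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="cancelPurchaseOrder">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="reason" type="xs:string"/>
        <xs:element name="purchaseOrder" type="PurchaseOrderType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Shared types, reused by every operation -->
  <xs:complexType name="PurchaseOrderType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="shipTo" type="AddressType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="line1" type="xs:string"/>
      <xs:element name="line2" type="xs:string" minOccurs="0"/>
      <xs:element name="city" type="xs:string"/>
      <xs:element name="state" type="xs:string"/>
      <xs:element name="zip" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

In a real service the shared types would more likely live in a separate schema document that each service schema includes or imports.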

20.5 Considerations for narrative content

Narrative content—the kind you read in documents like books, articles,
and web pages—is very different from what we have been discussing
so far in this chapter. But that unpredictable human-written, human-
readable stuff also occurs alongside data-oriented content in even (or
especially) the best-managed documents.
This section ﬁrst introduces a key distinguishing trait of narrative
content, the relationship between semantics and style. It then discusses
other considerations for modeling and schema design.

20.5.1 Semantics vs. style

It is desirable in designing a narrative XML vocabulary to focus on the
semantics, or meaning, of the content, as opposed to its appearance.
For example, a phrase might be italicized because it is a foreign
phrase or a citation. It is helpful to have separate elements (for example,
foreign and citation) to reﬂect the semantics, instead of a single
element indicating the italicization (i). Presentation-only features, such
as colors and fonts, are best left to a stylesheet, not embedded in an
XML document itself.
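The difference is easy to see side by side. In this sketch, the element names section, p, and i are assumptions, the foreign and citation elements are the ones suggested above, and the sentence itself is invented sample text.

```xml
<section>
  <!-- Presentational markup: the two phrases are indistinguishable. -->
  <p>The <i>prix fixe</i> menu is reviewed in <i>The Dining Guide</i>.</p>

  <!-- Semantic markup: a stylesheet can still italicize both, but a
       program can now tell a foreign phrase from a citation. -->
  <p>The <foreign>prix fixe</foreign> menu is reviewed in
  <citation>The Dining Guide</citation>.</p>
</section>
```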

20.5.1.1 Beneﬁts of excluding styling

The separation of pure semantic content from presentation style allows
you to

# Style content separately for different output devices such as
smartphones, e-book readers, and alternative web browsers
# Implement interactive behavior associated with certain kinds
of text, for example hyperlinks for intra- or inter-document
references or pop-up directions for addresses
# Generate derivative presentations of the content, such as tables
of contents, indexes, and summary views
# Provide more focused searching based on names of elements in
the content
# Ensure more consistent formatting across an entire body of
content
# Improve verification of the content: for example, determine
whether intra-document references in a legal document are valid

20.5.1.2 Rendition elements: “block” and “inline”

Formatting an XML document occurs conceptually in two steps:

1. Rendition, which produces an abstract layout of the document,
not yet particularized for a specific output device. It contains
such elements as “page,” “frame,” “block,” and “inline.”
2. Presentation, which applies precise styling to produce a layout
that is coded for the specific output device.

The mapping from elements in the document to rendition elements
occurs in the stylesheet, along with the presentation styling. Usually,
an element with characters at the top level of its content is rendered as
either a “block” or an “inline” element. When presented, the content
of a block element occupies its own vertical space in a page or frame,
and the content of its inline elements occurs within that space. Examples
of block elements are p in HTML and block in XSL-FO.
Elements rendered inline are generally used for identifying text
within a paragraph for distinctive formatting and/or special processing
(such as getting its content from a database). In HTML, b (bold) and
a (anchor) are examples of inline elements. Substitution groups are
often used to represent inline elements, since they can appear
interchangeably in many different content models.
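
As a minimal sketch of that technique, the schema below declares an abstract head element and lets a paragraph contain any of its members. The head name inline and the choice of xs:string for the member types are assumptions made to keep the example short.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Abstract head: never appears in a document itself -->
  <xs:element name="inline" type="xs:string" abstract="true"/>

  <!-- Members may appear wherever the head is referenced -->
  <xs:element name="foreign" type="xs:string" substitutionGroup="inline"/>
  <xs:element name="citation" type="xs:string" substitutionGroup="inline"/>

  <!-- A block element: text mixed with any number of inline elements -->
  <xs:element name="p">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element ref="inline" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Adding a new inline element later means declaring one more member of the group; the content model of p, and of any other type that references inline, does not change.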

20.5.2 Considerations for schema design

20.5.2.1 Flexibility
XML models for narrative content tend to be much more flexible than
data-oriented models. While you can mandate that every product has
a product number and name, you are unlikely to mandate that every
paragraph must contain a URL and an emphasized phrase in bold. The
order of the actual elements is important in a narrative model (it is
important that Chapter 1 appears before Chapter 2) but the order of
the kinds of elements is generally less important. You would not require,
for example, that all tables must appear after all paragraphs in a section.
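
One common way to express this kind of flexibility is a repeating choice group. The sketch below assumes a hypothetical section element whose children are simplified to xs:string; only the shape of the content model matters here.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="section" type="SectionType"/>
  <xs:complexType name="SectionType">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <!-- Paragraphs, tables, and lists in any order and any number -->
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="p" type="xs:string"/>
        <xs:element name="table" type="xs:string"/>
        <xs:element name="list" type="xs:string"/>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```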

20.5.2.2 Reusing existing vocabularies

If you are writing schemas to model general narrative content, it is
highly desirable to reuse existing vocabularies, such as XHTML, XSL-
FO, DocBook, and NLM XML. Be sure to choose appropriate elements
for the degree of semantic specificity of your schema. XHTML and
XSL-FO, for example, include rendition elements.
Reusing parts of these vocabularies will save you time and ensure
that your vocabulary is consistent with industry-accepted norms. You
can either extend these standards using their formally defined methods,
or pick and choose a subset of them that is useful to you.

20.5.2.3 Attributes are for metadata

Another characteristic of narrative XML models is that all of the “real”
content of an element, such as visible content of a web page or a
printed page, is contained in the element’s syntactic content (i.e., be-
tween its start and end tags). Attributes are reserved for metadata about
that content, for example the content’s last revision date or its source.
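
A short instance illustrates the convention. The element and attribute names here are hypothetical; the point is only that the readable text sits between the tags while the revision date and source ride along as attributes.

```xml
<section lastRevised="2012-10-31" source="Marketing">
  <title>Shipping policy</title>
  <p>Orders placed before noon ship the same day.</p>
</section>
```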
20.5.2.4 Humans write the documents

Another consideration for designing narrative content models is that
the corresponding XML document instances are far more likely to be
hand-created by human users. Although they will most likely be using
an editor to help them navigate the model, you should take their needs
into consideration when designing the schemas:

# Pay special attention to consistency and clarity in the model.
# Do not offer multiple different ways to represent the same thing.
# Although flexibility is sometimes desirable, too many choices
can be overwhelming. Instead of allowing an article, for example,
to have too many child element choices, introduce an
intermediate level comprising front, body, and back elements:
front would allow elements such as title, author, etc.; the
body, elements like section, list, figure, table, etc.; and
the back, elements like index and appendix.
# Create separate authoring documentation that does not make
use of XML Schema terminology.
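
The front, body, and back arrangement suggested in the list above could be sketched as follows. The type names and the use of xs:string for the leaf elements are assumptions; a real vocabulary would give section, table, and the rest their own complex types.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="article" type="ArticleType"/>

  <!-- The author first chooses among only three children -->
  <xs:complexType name="ArticleType">
    <xs:sequence>
      <xs:element name="front" type="FrontType"/>
      <xs:element name="body" type="BodyType"/>
      <xs:element name="back" type="BackType" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="FrontType">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="author" type="xs:string" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="BodyType">
    <xs:choice maxOccurs="unbounded">
      <xs:element name="section" type="xs:string"/>
      <xs:element name="list" type="xs:string"/>
      <xs:element name="figure" type="xs:string"/>
      <xs:element name="table" type="xs:string"/>
    </xs:choice>
  </xs:complexType>

  <xs:complexType name="BackType">
    <xs:choice maxOccurs="unbounded">
      <xs:element name="index" type="xs:string"/>
      <xs:element name="appendix" type="xs:string"/>
    </xs:choice>
  </xs:complexType>
</xs:schema>
```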

20.6 Considerations for a hierarchical model

XML modeling has more flexibility than other modeling paradigms in
that you can have an unlimited number of levels of a hierarchy,
and there are a variety of ways of organizing that hierarchy. This section
describes some special considerations that allow you to take advantage
of the flexibility of XML.

20.6.1 Intermediate elements

In XML, you can introduce intermediate elements anywhere in a
model to make the document easier to process and understand. Using
them as containers to group related elements together can be beneficial
in promoting reuse, organizing the messages and generated code more
logically, simplifying mapping to existing systems, and allowing more
expressive content models.
Going back to our order example, you could define it using a fairly
flat structure, depicted in Example 20–14. There are lineItem elements
to group each line item together, but otherwise all of the data
elements are at the same level of the purchase order. One possible
schema to describe this document is shown in Example 20–15.

Example 20–14. A flat order example

<order>
  <number>12345</number>
  <date>2012-10-31</date>
  <customerNumber>12345</customerNumber>
  <customerName>Priscilla Walmsley</customerName>
  <billToAddressLine>123 Main Street</billToAddressLine>
  <billToCity>Traverse City</billToCity>
  <billToState>MI</billToState>
  <billToZip>49684</billToZip>
  <shipToAddressLine>5100 Garfield Road</shipToAddressLine>
  <shipToCity>Hillsborough</shipToCity>
  <shipToState>NJ</shipToState>
  <shipToZip>08876</shipToZip>

  <lineItem>
    <number>557</number>

  </lineItem>
  <lineItem>
    <number>443</number>

  </lineItem>
</order>

The order element shown in Example 20–14 contains all the required
data, but its design has several weaknesses. The first is that it
does not take advantage of reuse opportunities. The structure of the
bill-to and ship-to addresses is the same, but it is defined twice in
the design. The schema describing this document has to declare each
city element twice, each state element twice, and so on. Since the
element names are different, any code that handles address information
(for example, to populate it or display it) also has to be written twice,
once for each set of element names.

Example 20–15. A flat order schema

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order" type="OrderType"/>
  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="date" type="xs:date"/>
      <xs:element name="customerNumber" type="xs:integer"/>
      <xs:element name="customerName" type="xs:string"/>
      <xs:element name="billToAddressLine" type="xs:string"
                  maxOccurs="unbounded"/>
      <xs:element name="billToCity" type="xs:string"/>
      <xs:element name="billToState" type="xs:string"/>
      <xs:element name="billToZip" type="xs:string"/>
      <xs:element name="shipToAddressLine" type="xs:string"
                  maxOccurs="unbounded"/>
      <xs:element name="shipToCity" type="xs:string"/>
      <xs:element name="shipToState" type="xs:string"/>
      <xs:element name="shipToZip" type="xs:string"/>
      <xs:element name="lineItem" type="LineItemType"
                  maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="LineItemType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>

    </xs:sequence>
  </xs:complexType>
</xs:schema>

A better design is shown in Example 20–16, where two intermediate
elements, billToAddress and shipToAddress, have been added to
represent the bill-to and ship-to addresses. The two have identical
children, which means that they can share the same complex type. It
is named AddressType and is shown in Example 20–17 with the revised
OrderType, whose elements reference it. AddressType is not
only used twice in the revised schema for this message, but may also
be reused in other schemas in other contexts.

Example 20–16. More structured address information

<order>

  <billToAddress>
    <addressLine>123 Main St.</addressLine>
    <city>Traverse City</city>
    <state>MI</state>
    <zip>49684</zip>
  </billToAddress>
  <shipToAddress>
    <addressLine>5100 Garfield Road</addressLine>
    <city>Hillsborough</city>
    <state>NJ</state>
    <zip>08876</zip>
  </shipToAddress>

</order>

Example 20–17. AddressType and revised OrderType definitions

<xs:complexType name="OrderType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="date" type="xs:date"/>
    <xs:element name="customerNumber" type="xs:integer"/>
    <xs:element name="customerName" type="xs:string"/>
    <xs:element name="billToAddress" type="AddressType"/>
    <xs:element name="shipToAddress" type="AddressType"/>
    <xs:element name="lineItem" type="LineItemType"
                maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
<xs:complexType name="AddressType">
  <xs:sequence>
    <xs:element name="addressLine" type="xs:string"/>
    <xs:element name="city" type="xs:string"/>
    <xs:element name="state" type="xs:string"/>
    <xs:element name="zip" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

In addition to reuse, another benefit of the more detailed structure
is that a code generation tool will generate a separate class to represent
the address information. This tends to be more logical to the developer,
and can make it easier to integrate existing systems if, for example, the
party information is part of a different application or database than
the purchase order information. This further promotes reuse, since the
class written to handle address information can be reused as well as
the complex type.
Intermediate elements can also allow more robust content models.
In this case, if the ship-to address can be optional (for example,
when it is the same as the bill-to address), you can make the entire
shipToAddress element optional. You can then specify that if
the shipToAddress does appear, it must have certain required
children such as addressLine and zip. In the flat structure, the only
option would be to make all of the shipToXxx elements optional,
which would be a much less expressive content model. It would allow
illogical or incomplete documents—for example, one containing a
shipToAddressLine but not a shipToCity.
Finally, the use of intermediate elements can help with extensibility
and versioning problems. If you later decide that there can be
more than one ship-to address (for example, in the case of multi-
shipment orders), you can simply increase the maxOccurs attribute
on the shipToAddress element declaration without introducing a
backward-incompatible change.
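
Both of the last two points come down to the occurrence attributes on a single declaration. The following is a minimal sketch, assuming the AddressType of Example 20–17 and an OrderType abbreviated to its two address elements.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="billToAddress" type="AddressType"/>
      <!-- minOccurs="0": the ship-to address may be left out entirely.
           maxOccurs="unbounded": a later version allows several of them;
           documents with zero or one address remain valid. -->
      <xs:element name="shipToAddress" type="AddressType"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <!-- If a shipToAddress does appear, all four children are required -->
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="addressLine" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
      <xs:element name="state" type="xs:string"/>
      <xs:element name="zip" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```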
Example 20–17 shows an appropriate structure for the purchase order
document. However, it is possible to have too many intermediate ele-
ments. Excessive levels of nesting in an XML message can make it
difficult to understand and overly lengthy. It can also make the schema
and program code more difficult to maintain.

20.6.2 Wrapper lists

A slightly different kind of intermediate element is a container element
that is used to group lists of like elements together. In the flat order
example shown in Example 20–14, all of the lineItem elements appear
at the top level of the order. It is common practice to place repeating
elements into a container element whose name is usually the plural of
the name of the element being repeated. In our case, we would wrap
our lineItem elements into a lineItems element, as shown in
Example 20–18.
This has some of the same benefits described in the previous section,
namely extensibility and more expressive content models. It is more
extensible because, if you later decide to keep some other information
about the list, or change the contents of the list, you do not need to
make a backward-incompatible change to the outer complex type
(OrderType). In version 1.0, it will also allow a more expressive content
model if you choose to use all groups instead of sequence groups in
the outer content model, because it will get around the problem of all
groups not allowing repeating elements. Finally, documents with con-
tainer elements can be easier to process using technologies like XSLT.
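
The point about all groups can be sketched in a schema. This is an illustration only: OrderType is abbreviated, and LineItemsType is an assumed name for the type of the container.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="OrderType">
    <!-- In version 1.0, a child of an all group may occur at most once,
         so a repeating lineItem could not be declared here directly -->
    <xs:all>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="date" type="xs:date"/>
      <xs:element name="lineItems" type="LineItemsType"/>
    </xs:all>
  </xs:complexType>

  <!-- The repetition moves inside the container -->
  <xs:complexType name="LineItemsType">
    <xs:sequence>
      <xs:element name="lineItem" type="LineItemType"
                  maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="LineItemType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```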

Example 20–18. A repeating container element lineItems
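
As a minimal sketch, wrapping the line items of Example 20–14 in a container might look like the following. The values are taken from that example, and the customer and address elements are left out for brevity.

```xml
<order>
  <number>12345</number>
  <date>2012-10-31</date>
  <lineItems>
    <lineItem>
      <number>557</number>
    </lineItem>
    <lineItem>
      <number>443</number>
    </lineItem>
  </lineItems>
</order>
```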

20.6.3 Level of granularity

Another factor to consider is how far down to take the hierarchy. Many
data items have composite values, and it is sometimes unclear to what
extent they should be broken down into separate elements. For example,
suppose a product ID consists of two letters that represent its department,
followed by a four-digit number. Should all six characters be
modeled as one element, or should it be broken down into two
subelements?
It depends on how that data item is to be used. The value should be
split up if:

# The components are available separately, or can be separated
by a known parsing algorithm. For example, if an address is always
stored as a whole text block by an application that gathers
this information, it may not be feasible to split it apart to put
it in the XML document.
# The components will be processed separately, for example for
display or for use in arithmetic operations.
# The objects will be sorted by one or more of the components
separately.
# The components have different data types.
# The components should be validated separately.
# Constraints need to be applied to the components at a finer level of
granularity than the value as a whole.

It is easier to concatenate two data values back together than it is to
parse them apart, especially if the logic for splitting them is complex.
On the other hand, if the value is always used as a whole by message
consumers, it can be kept together. It comes down to the functional
nature of the application. For example, a service that simply provides
product information for display might offer the product ID as one
value, while an order application that needs to treat departments
separately may split it up.
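
The two options for the product ID can be sketched side by side. All of the type and element names below are hypothetical, and the patterns assume that the department letters are uppercase ASCII.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Option 1: one value, validated as a whole by a pattern -->
  <xs:simpleType name="ProductIDType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z]{2}[0-9]{4}"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Option 2: separate components, each with its own type -->
  <xs:complexType name="SplitProductIDType">
    <xs:sequence>
      <xs:element name="dept" type="DeptCodeType"/>
      <xs:element name="number" type="ProductNumberType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:simpleType name="DeptCodeType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z]{2}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="ProductNumberType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{4}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

With the second option, the department code could later become an enumerated list of valid departments without touching the number component.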

20.6.4 Generic vs. specific elements

Another decision to make when modeling XML is how specific to make
your element names. Using more generic element names allows for
flexibility, but can limit validation specificity. One case where this
comes into play is when you have several data items that represent a
particular class of things, but each is a specialization. It is a design de-
cision whether to use element names that represent the overall class or
the specialized subclasses. Using the product example, each product
has a number of features associated with it. Each feature has a name
and a value. One way to represent this is by declaring a different specific
element for each feature. To indicate whether a product is mono-
grammable, you might have a monogrammable element of type
boolean. Example 20–19 shows some product features marked up
with specific elements.

Example 20–19. Specific element names
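
A minimal sketch of such markup follows. The weight and monogrammable names are the ones used in this section; the product values are made up for illustration.

```xml
<product>
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
  <weight>8</weight>
  <monogrammable>true</monogrammable>
</product>
```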

The downside of using these specific element names is that they are
not very flexible. Every time a new feature comes along, which can be
relatively often, a number of changes have to be made. The schema
must be modified to add the new element declaration for the feature.
Applications that use those documents, including any generated code,
must also be changed to handle the new features.
On the other hand, you could use a more generic feature element
that contains the value of the feature, and put the name of the feature
in a name attribute, as shown in Example 20–20.
A product schema that uses a generic feature element is shown in
Example 20–21. Certain fundamental features such as number and
name still have specific elements, because they are common to all
products and are important to validate. Both the value and the name
of the feature are defined as strings.
Example 20–20. Generic element names
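
A minimal sketch of the generic approach, written to conform to the ProductType of Example 20–21, might look like this. The product values are made up for illustration.

```xml
<product>
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
  <desc>Our best-selling shirt</desc>
  <feature name="weight">8</feature>
  <feature name="monogrammable">true</feature>
</product>
```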

Example 20–21. ProductType with generic feature capability

<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="desc" type="xs:string"/>
    <xs:element name="feature" maxOccurs="unbounded"
                type="FeatureType"/>
  </xs:sequence>
</xs:complexType>
<xs:complexType name="FeatureType">
  <xs:simpleContent>
    <xs:extension base="xs:string">
      <xs:attribute name="name" type="xs:string"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>

This is far more flexible, in that new features do not require changes
to the schema or the basic structure of the service classes. The only
modification that needs to be made is that the code that creates
feature elements must add one for the new feature.
There are downsides to using generic elements, however. One is that
you cannot specify data types for the values. There is no way in XML
Schema 1.0 to say “if a feature element’s name attribute is weight, make
the content integer, and if it’s monogrammable, make it boolean.”
This means that you cannot take advantage of XML
Schema type validation to ensure that the values in the message conform
to, for example, an enumerated list or range of values. This is not an
issue when using specific elements because you simply create separate
weight and monogrammable elements with different types.
Another downside to generic elements is that you have no control
over their order or whether they are required or repeating. Using XML
Schema 1.0, you cannot specify that there must be a feature element
whose name is weight. You also cannot specify that there can only be
one feature element whose name is monogrammable. For any
feature name, there can be zero, one, or more values for it, and they
can appear in any order. You could enforce this as part of the applica-
tion, but then it would not be written into the service contract. Again,
this is not a problem when you use specific elements for each feature
because you can use minOccurs and maxOccurs on individual element
declarations to control this.
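
In version 1.1, some of these constraints can be expressed with conditional type assignment and assertions, although less directly. The following is a sketch under several assumptions: the type names are hypothetical, the feature declaration has no type attribute so that its declared type is xs:anyType, and a version 1.1 processor is required.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
      <!-- The type assigned to each feature depends on its name attribute;
           the last alternative is the default -->
      <xs:element name="feature" maxOccurs="unbounded">
        <xs:alternative test="@name = 'weight'" type="WeightFeatureType"/>
        <xs:alternative test="@name = 'monogrammable'"
                        type="BooleanFeatureType"/>
        <xs:alternative type="FeatureType"/>
      </xs:element>
    </xs:sequence>
    <!-- Exactly one weight feature, at most one monogrammable feature -->
    <xs:assert test="count(feature[@name = 'weight']) = 1"/>
    <xs:assert test="count(feature[@name = 'monogrammable']) le 1"/>
  </xs:complexType>

  <xs:complexType name="WeightFeatureType">
    <xs:simpleContent>
      <xs:extension base="xs:integer">
        <xs:attribute name="name" type="xs:string" use="required"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <xs:complexType name="BooleanFeatureType">
    <xs:simpleContent>
      <xs:extension base="xs:boolean">
        <xs:attribute name="name" type="xs:string" use="required"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <xs:complexType name="FeatureType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="name" type="xs:string" use="required"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:schema>
```

Note that every new typed feature still requires a schema change, so this recovers validation at the cost of some of the flexibility that motivated generic elements.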
Here are some considerations on whether to use generic versus specific
elements.

# If there is little change (for example, if new features are rarely
added), there is not much benefit to using generic elements.
# If the features are treated similarly (for example, if the consumer
of the message is simply going to turn them into a “features”
table on a web page for human consumption), it is easier
to process them as generic elements.
# If the features are often treated differently, it can be easier to
process them as specific elements. For example, if code is gener-
ated from the schema and the application needs to know the
product’s weight to determine shipping cost, it is more conve-
nient to simply call a method like product.getWeight() than
retrieve all the features through product.getFeatures() and
loop through them until (possibly) finding one called weight.
# If the content is likely to be significantly different for each specialization
(for example, if some features can have differing
multipart or complex values), it is best to use specific elements
so that you can adequately describe their structure in the schema.
# If it is important to validate the contents, or the order or appearance
of the data items, it is best to use specific elements, because
these constraints cannot be expressed in XML Schema 1.0 for
generic elements.¹

The decision whether to use generic or specific elements can be
made at any level, not just for low-level name-value pairs like features.
One level higher, a decision must be made whether to use a generic
product element versus separate shirt and hat elements. This can
be taken all the way up to the root element of the document, where
you could have a generic root element such as message with an attribute
saying what type of message it is, or choose to use a specific root
element for every kind of message. The same considerations described
in this section apply, regardless of the level in the hierarchy.

1. In version 1.1, you can use conditional type assignment or assertions
to enforce certain constraints on generic elements, but it is not as
straightforward as assigning types to specific elements.

Chapter 21

Schema design and documentation

It is fairly easy to create a schema once you know the syntax.
It is harder to design one well. This chapter focuses on a
strategy for designing schemas that are accurate, durable,
and easy to implement. Carefully planning your schema
design strategy is especially important when creating a complex set of
schemas, or a standard schema that is designed to be used and extended
by others.

21.1 The importance of schema design

Schemas are a fundamental part of many XML-based applications,
whether XML is being used in temporary messages for information
sharing or as an enduring representation of content (e.g., in publishing).
Enterprise architects, DBAs, and software developers often devote a
lot of time to data design: They create enterprise data models and data
dictionaries with strict naming and documentation standards, and
carefully design and optimize relational databases. Unfortunately,
software designers and implementers often do not pay as much attention
to good design when it comes to XML messages.
There are several reasons for this. Some people feel that with transi-
tory XML messages, it is not important how they are structured. Some
decide that it is easier to use whatever schema is generated for them by
a toolkit. Others decide to use an industry-standard XML vocabulary,
but fail to figure out how their data really fits into that standard, or to
come up with a strategy for customizing it for their needs.
As with any data design, there are many ways to organize XML
messages. For example, decisions must be made about how many levels
of elements to include, whether the elements should represent generic
or more specific concepts, how to represent relationships, and how far
to break down data into separate elements. In addition, there are
multiple ways to express the same XML structure in XML Schema.
Decisions must be made about whether to use global versus local dec-
larations, whether to use named versus anonymous types, whether to
achieve reuse through type extension or through named model groups,
and how schemas should be broken down into separate schema
documents.
The choices you make when designing a schema can have a significant
impact on the ease of implementation, ease of maintenance, and even
the ongoing relevance of the system itself. Failure to take into account
design goals such as reuse, graceful versioning, flexibility, and tool
support can have serious financial impacts on software development
projects.

21.2 Uses for schemas

When designing a schema, it is important to understand first what it
will be used for. Schemas actually play several roles.

# Validation. Validation is the purpose that is most often associated
with schemas. Given an XML document, you can use a
schema to automatically determine whether that document is
valid or not. Are all of the required elements there, in the right
order? Do they contain valid values according to their data types?
Schema validation does a good job of checking the basic
structure and content of elements.
# A service contract. A schema serves as part of the understanding
between two parties. The document provider and the document
consumer can both use the schema as a machine-enforceable
set of rules describing an interface between two systems or
services.
# Documentation. Schemas are used to document the XML
structure for the developers and end users that will be imple-
menting or using it. Narrative human-readable annotations can
be added to schema components to further document them.
Although schemas themselves are not particularly human-
readable, they can be viewed by less technical users in a graphical
XML editor tool. In addition, there are a number of tools that
will generate HTML documentation from schemas, making
them more easily understood.
# Providing type information. Schemas contain information
about the data types that can affect how the information is
processed. For example, if the schema tells an XSLT 2.0
stylesheet that a value is an integer, it will know to sort it and
compare to other values as an integer instead of a string.
# Assisted editing. For documents that will be hand-modified
by human users, a schema can be used by XML editing software
to provide context-sensitive validation, help, and content
completion.
# Code generation. Schemas are also commonly used, particularly
in web services and other structured data interfaces, to generate
classes and interfaces that read and write the XML message
payloads. When a schema is designed first, classes can be gener-
ated automatically from the schema definitions, ensuring that
they match. Other software artifacts can also be generated from
schemas, for example, data entry forms.
# Debugging. Schemas can assist in the debugging and testing
processes for applications that will process the XML. For example,
importing a schema into an XSLT 2.0 stylesheet or an
XQuery query can help identify invalid paths and type errors
in the code that, otherwise, may not have been found during
testing.

As you can see, schemas are an important part of an XML implementation,
and can be involved at both design time and run time. Although
it is certainly possible to use XML without schemas, valuable function-
ality would be lost. You would be forced to use a nonstandard method
to validate your messages, document your system interfaces, and gener-
ate code for web services. You also would not be able to take advantage
of the many schema-based tools that implement this functionality at
low cost.
The various roles of schemas should be taken into account when
designing them. For example, use of obscure schema features can make
code generation difficult, and not adequately documenting schemas
can impact the usefulness of generated documentation.

21.3 Schema design goals

Designing schemas well is a matter of paying attention to certain important
design considerations: flexibility and extensibility, reusability,
clarity and simplicity, support for versioning, interoperability, and tool
support. This section takes a closer look at each of these design goals.

21.3.1 Flexibility and extensibility

Schema design often requires a balancing act between flexibility on
the one hand and rigidity on the other. For example, suppose I am
selling digital cameras that have a variety of features, such as resolution,
battery type, and screen size. Each camera model has a different set of
features, and the types of features change over time as new technology
is developed. When designing a message that incorporates these camera
descriptions, I want enough flexibility to handle variations in feature
types, without having to redesign my message every time a new
feature comes along. On the other hand, I want to be able to accurately
and precisely specify these features.
To allow for total flexibility in the camera features, I could declare
a features element whose type contains an element wildcard, which
means that any well-formed XML is allowed. This would have the ad-
vantage of being extremely versatile and adaptable to change. The
disadvantage is that the message structure is very poorly defined. A
developer trying to write an application to process the message would
have no idea what features to expect and what format they might have.
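
A minimal sketch of this fully open design is shown below. The choice of processContents="skip" is an assumption; it tells the processor not to validate whatever appears inside features.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="features">
    <xs:complexType>
      <xs:sequence>
        <!-- Any element, from any namespace, any number of times -->
        <xs:any processContents="skip"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```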
On the other hand, I can declare highly constrained elements for
each feature, with no opportunity for variation. This has the benefit
of making the features well defined, easy to validate, and much more
predictable. Validation is more effective because certain features can
be required and their values can be constrained by specific data types.
However, the schema is brittle because it must be changed every time
a new feature is introduced. When the schema changes, the applications
that process the documents must also often change.
The ideal design is usually somewhere in the middle. A balanced
approach in the case of the camera features might be to create a repeat-
ing feature element that contains the name of the feature as an
attribute and the value of the feature as its content. This eliminates the
brittleness while still providing a predictable structure for implementers.
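
In a document, the balanced approach might look like the following sketch. It follows the shape of the FeatureType in Example 20–21; the features wrapper, the feature names, and the values are assumptions made for illustration.

```xml
<features>
  <feature name="resolution">24 megapixels</feature>
  <feature name="batteryType">lithium-ion</feature>
  <feature name="screenSize">3 inches</feature>
</features>
```

A new kind of feature is simply one more feature element with a new name value, so neither the schema nor the structure that consumers rely on has to change.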

21.3.2 Reusability
Reuse is an important goal in the design of any software. Schemas
that reuse XML components across multiple kinds of documents are
easier for developers and users to learn, are more consistent, and save
development and maintenance time that would otherwise be spent writing
redundant software components.
Using XML Schema, reuse can be achieved in a number of ways.

# Reusing types. It is highly desirable to reuse complex and simple
types in multiple element and attribute declarations. For example,
you can define a complex type named AddressType that
represents a mailing address, and then use it for both
BillingAddress and ShippingAddress elements. Only
named, global types can be reused, so types in XML Schema
should generally be named.
# Type inheritance. In XML Schema, complex types can be
specialized from other types using type extensions. For example,
I can create a more generic type ProductType and derive types
named CameraType and LensType from it. This is a form of
reuse because CameraType and LensType inherit a shared set
of properties from ProductType.
# Named model groups and attribute groups. Through the use
of named model groups, it is possible to define reusable pieces of
content models. This is a useful alternative to type inheritance
for types that are semantically different but just happen to share
some properties with other types.
# Reusing schema documents. Entire schema documents can
be reused by taking advantage of the include and import
mechanisms of XML Schema. This is useful for defining com-
ponents that might be used in several different contexts or ser-
vices. In order to plan for reuse, schema documents should be
broken down into logical components by subject area. Having
schema documents that are too large and all-encompassing tends
to inhibit reuse because it forces other schema documents to
take all or nothing when importing them. It is also good practice
to create a “core components” schema that has low-level building
blocks, such as types for Address and Quantity, that are
imported by all other schema documents.
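
Two of the mechanisms in the list above, type inheritance and named model groups, can be sketched in one small schema. ProductType and CameraType are the names used above; IdentityGroup, CustomerType, and the element names are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Named model group: a reusable fragment of a content model -->
  <xs:group name="IdentityGroup">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:group>

  <!-- Generic base type -->
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:group ref="IdentityGroup"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Type inheritance: CameraType adds to what it inherits -->
  <xs:complexType name="CameraType">
    <xs:complexContent>
      <xs:extension base="ProductType">
        <xs:sequence>
          <xs:element name="resolution" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- A semantically unrelated type that shares the same group -->
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:group ref="IdentityGroup"/>
      <xs:element name="address" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

A customer is not a kind of product, so deriving CustomerType from ProductType would be misleading; the group lets the two types share a number and a name without implying that relationship.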
21.3.3 Clarity and simplicity

When human users are creating and updating XML documents, clarity
is of the utmost importance. If users have difficulty understanding the
document structure, it will take far more time to edit a document, and
the editing process will be much more prone to errors. Even when
XML documents are both written and read by software applications,
they still should be designed so that they are easy to conceptualize and
process. Implementers on both sides—those who create XML docu-
ments and those who consume them—are writing and maintaining
applications to process these messages, and they must understand them.
Overly complex message designs lead to overly complex applications
that create and process them, and both are hard to learn and maintain.

21.3.3.1 Naming and documentation

Properly and consistently naming schema components—elements,
attributes, types, groups—can go a long way toward making the
documents comprehensible. Using a common set of terms rather than
multiple synonymous terms is good practice, as is the avoidance
of obscure acronyms. In XML Schema, it is helpful to identify the
kind of component in its name, for example by using the word “Type”
at the end of type names. Namespaces should also be consistently and
meaningfully named.
Of course, good documentation is very important to achieving clar-
ity. XML Schema allows components to be documented using annota-
tions. While you probably have other documentation that describes
your system, having human-readable definitions of the components
in your schema is very useful for people who maintain and use
that schema. It also allows you to use tools that automatically generate
schema documentation more effectively.
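As a rough sketch of what such annotations can look like, the fragment below documents a type and one of its element declarations. The wording of the definitions is invented for illustration, and the xs prefix is assumed to be bound to the XML Schema namespace by an enclosing xs:schema element.

```xml
<xs:complexType name="ProductType">
  <xs:annotation>
    <xs:documentation xml:lang="en">
      A product that is offered for sale in the catalog.
    </xs:documentation>
  </xs:annotation>
  <xs:sequence>
    <xs:element name="number" type="xs:integer">
      <xs:annotation>
        <xs:documentation xml:lang="en">
          The number that uniquely identifies the product.
        </xs:documentation>
      </xs:annotation>
    </xs:element>
  </xs:sequence>
</xs:complexType>
```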

21.3.3.2 Clarity of structure

Consistent structure can also help improve clarity. For example, if many
different types have child elements Identifier and Name, put them
first and always in the same order. Reuse of components helps to ensure
consistent structure.
It is often difficult to determine how many levels of elements to put
in a message. Using intermediate elements that group together related
properties can help with understanding. For example, embedding all
address-related elements (street, city, etc.) inside an Address child
element, not directly inside a Customer element, is an obvious choice.
It makes the components of the address clearly contained and allows
you to make the entire address optional or repeating.
It is also often useful to use intermediate elements as containers for
repeating elements. For example, it is a good idea to embed a repeating
sequence of OrderedItem elements inside an OrderedItems (plural)
container, rather than directly inside a PurchaseOrder element.
These container elements can make messages easier to process and often
work better with code generation tools.
However, there is such a thing as excessive use of intermediate ele-
ments. XML messages that are a dozen levels deep can become unwieldy
and difficult to process.
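The OrderedItems container described in this section might be declared along the following lines. This is a sketch only: the type names are assumptions, OrderedItemType is reduced to a single hypothetical child, and the xs prefix is assumed to be bound by an enclosing xs:schema element whose default namespace is its target namespace.

```xml
<xs:complexType name="PurchaseOrderType">
  <xs:sequence>
    <!-- A single container element rather than the repeating items themselves -->
    <xs:element name="OrderedItems" type="OrderedItemsType"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="OrderedItemsType">
  <xs:sequence>
    <!-- The repetition is confined to the container -->
    <xs:element name="OrderedItem" type="OrderedItemType"
                maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="OrderedItemType">
  <xs:sequence>
    <xs:element name="Number" type="xs:integer"/>
  </xs:sequence>
</xs:complexType>
```

With this structure the whole list can be made optional by adding minOccurs="0" to the OrderedItems declaration alone.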

21.3.3.3 Simplicity
It is best to minimize the number of ways a particular type of data or
content can be expressed. Having multiple ways to represent a particular
kind of data or content in your XML documents may seem like a good
idea because it is more flexible. However, allowing too many choices is
confusing to users, puts more of a burden on applications that
process the documents, and can lead to interoperability problems.

21.3.4 Support for graceful versioning

Systems will change over time. Schemas should be designed with a
plan for how to handle changes in a way that causes minimum impact
on the systems that create and process XML documents.
A typical schema versioning strategy differentiates between major
versions and minor versions. Major versions, such as 1.0, 2.0, or 3.0,
are by definition disruptive and not backward-compatible; at times this
is an unavoidable part of software evolution. On the other hand, minor
versions, such as 1.1, 1.2, or 1.3, are backward-compatible. They involve
changes to schemas that will still allow old message instances to be valid
according to the new schema. For example, a version 1.2 message can
be valid according to a version 1.3 schema if the version 1.3 limits itself
to backward-compatible changes.
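One common kind of backward-compatible change is the addition of an optional element. The fragment below sketches how a type might look in a hypothetical version 1.3 release. The description element is an assumption made for the example, and the xs prefix is assumed to be bound by an enclosing xs:schema element.

```xml
<!-- ProductType as it might look in a hypothetical version 1.3 release -->
<xs:complexType name="ProductType">
  <xs:sequence>
    <xs:element name="number" type="xs:integer"/>
    <xs:element name="name" type="xs:string"/>
    <!-- New in 1.3. Because minOccurs is 0, a 1.2 message that
         omits this element is still valid against the 1.3 schema. -->
    <xs:element name="description" type="xs:string" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>
```

Had the new element been required, existing 1.2 messages would fail validation, and the change would belong in a major version instead.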

21.3.5 Interoperability and tool compatibility

Schemas are used heavily by tools—not just for validation but also for
the generation of code and documentation. In an ideal world, all schema
parsers and toolkits would support the exact same schema language,
and all schemas would be interoperable. The unfortunate reality is that
tools, especially code generation tools, vary in their support for XML
Schema, for several reasons.

# Some toolkits incorrectly implement features of XML Schema
because the recommendation is complex and in some cases even
ambiguous.
# Some web services toolkits deliberately do not support certain
features of XML Schema because they do not find them to be
relevant or useful to a particular use case, such as data binding.
# Some XML Schema concepts do not map cleanly onto object-
oriented concepts. Even if a toolkit attempts to support these
features, it may do so in a less than useful way.

In general, it is advisable to stick to a subset of the XML Schema
language that is well supported by the kinds of toolkits you will be using
in your environment. For example, features of XML Schema to
avoid in a web services environment where data-binding toolkits are
in use include

# Mixed content (elements that allow text content as well as
children)
# choice and all model groups
# Complex content models with nested model groups
# Substitution groups
# Dynamic type substitution using the xsi:type attribute
# Default and fixed values for elements or attributes
# Redefinition of schema documents

It is advisable to test your schemas against a variety of toolkits to be
sure that they can handle them gracefully.

21.4 Developing a schema design strategy

Many organizations that are implementing medium- to large-scale
XML vocabularies develop enterprise-wide guidelines for schema design,
taking into account the considerations described in this chapter.
Sometimes these guidelines are organized into documents that are
referred to as Naming and Design Rules (NDR) documents.
Having a cohesive schema design strategy has a number of benefits.

# It promotes a standard approach to schema development that
improves consistency and therefore clarity.
# It ensures that certain strategies, such as how to approach
versioning, are well thought out before too much investment has
been made in development.
# It allows the proposed approach to be tested with toolkits in
use in the organization to see if they generate manageable code.
# It serves as a basis for design reviews, which are a useful way for
centralized data architects to guide or even enforce design
standards within an organization.

A schema design strategy should include the following topics:

# Naming standards: standard word separators, upper versus
lower case names, a standard glossary of terms, special considerations
for naming types and groups. Naming standards are
discussed in Section 21.6 on p. 559.
# Namespaces: what they should be named, how many to have,
how many schema documents to use per namespace, how
they should be documented. See Section 21.7 on p. 564 for
namespace guidelines.
# Schema structure strategy: how many schema documents
to have, recommended folder structure, global versus local
components. Section 21.5 on p. 550 covers these topics.
# Documentation standards: the types of documentation
required for schema components, where they are to be
documented. Schema documentation is covered in Section 21.8
on p. 580.
# XML Schema features: a list of allowed (or prohibited) XML
Schema features, limited to promote simplicity, better tool
support, and interoperability.
# Versioning strategy: whether to require forward compatibility
(and if so how to accomplish it), rules for backward compatibil-
ity of releases, patterns for version numbering. All of Chapter 23
is devoted to versioning, with particular attention paid to
developing a versioning strategy in Section 23.4.1 on p. 636.
# Reuse strategy: recommended methods of achieving reuse, an
approach for a common component library. Reuse is covered
in Section 22.1 on p. 596.
# Extension strategy: which external standards are approved for
use, description of the correct way to incorporate or extend
them, how other standards should extend yours and under what
conditions. Section 22.2 on p. 599 compares and contrasts six
methods for extending schemas.

These considerations are covered in the rest of this chapter and the
next two chapters.

21.5 Schema organization considerations

There are a number of design decisions that affect the way a schema is
organized, without impacting validation of instances. They include
whether to use global or local declarations and how to modularize your
schemas.

21.5.1 Global vs. local components

Some schema components can be either global or local. Element and
attribute declarations, for example, can be scoped entirely within a
complex type (local) or at the top level of the schema document (global).
Type definitions (both simple and complex) can be scoped to a particular
element or attribute declaration, in which case they are anonymous,
or at the top level of the schema document, in which case they are
named. Sections 6.1.3 on p. 95, 7.2.3 on p. 119, and 8.2.3 on p. 133
cover the pros and cons of global versus local components.
It is possible to decide individually for each component whether it
should be global or local, but it is better to have a consistent strategy
that is planned in advance. Table 21–1 provides an overview of the
four possible approaches to the global/local decision. The names asso-
ciated with the approaches (with the exception of Garden of Eden)
were developed as the result of a discussion on XML-DEV led by
Roger Costello, who wrote them up as a set of best practices at
www.xfront.com/GlobalVersusLocal.pdf.

Table 21–1 Schema structure patterns

                        Element declarations
Type definitions        Local             Global
Anonymous/Local         Russian Doll      Salami Slice
Named/Global            Venetian Blind    Garden of Eden

This section provides an overview of the advantages and disadvantages
of each approach. All four of these approaches will validate the same
instance, so the question is more one of schema design than XML
document design.
In all four approaches, the attribute declarations are locally declared.
This follows the recommended practice of allowing unqualified attribute
names when the attributes are part of the vocabulary being defined by
the schema.

21.5.1.1 Russian Doll

The Russian Doll approach is characterized by all local definitions,
with the exception of the root element declaration. All types are
anonymous, and all element and attribute declarations are local.
Example 21–1 is a Russian Doll schema.
The main disadvantage of this approach is that neither the elements
nor the types are reusable. This can result in code that is redundant
and hard to maintain. It can also be cumbersome to read. With all the
indenting it is easy to lose track of where you are in the hierarchy.
There are a few advantages of this approach but they are less
compelling.

# Since the elements are locally declared, it is possible to have
more than one element with the same name but a different
type or other characteristics. For example, there can be a
number child of product that has a format different
from a number child of order.

Example 21–1. Schema for Russian Doll approach

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified">
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="product" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="number" type="xs:integer"/>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="size">
                <xs:simpleType>
                  <xs:restriction base="xs:integer">
                    <xs:minInclusive value="2"/>
                    <xs:maxInclusive value="18"/>
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="dept" type="xs:string"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

# Since the elements are locally declared, their names can be
unqualified in the instance. However, this practice is not
recommended, as described in Section 21.7.3.6 on p. 579.
# There is only one global element declaration, so it is obvious
which one is the root.
# Since there is no reuse, it is easier to see the impact of a change:
You simply look up the hierarchy.

21.5.1.2 Salami Slice

The Salami Slice uses global element declarations but anonymous (local)
types. This places importance on the element as the unit of reuse.
Example 21–2 is a Salami Slice schema.

Example 21–2. Schema for Salami Slice approach

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified">
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="product" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="product">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="number"/>
        <xs:element ref="name"/>
        <xs:element ref="size"/>
      </xs:sequence>
      <xs:attribute name="dept" type="xs:string"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="number" type="xs:integer"/>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="size">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="2"/>
        <xs:maxInclusive value="18"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>

The disadvantage of this approach is that the types are not reusable
by multiple element declarations. Often you will have multiple element
names that have the same structure, such as billingAddress and
shippingAddress with the same address structure. Using this model,
the entire address structure would need to be respecified each time, or
put into a named model group. Anonymous types also cannot be used
in derivation, another form of reuse and sometimes an important
expression of an information model. Although you can reuse the element
declarations, this might mean watered-down element names, such as
address instead of a more specific kind of address. Since elements are
globally declared, it is not possible to have more than one element with
the same name but a different type or other characteristics; all element
names in the entire schema must be unique.
This approach does have some advantages over Russian Doll,
namely that it is more readable and does allow some degree of reuse
through element declarations. Unlike Russian Doll, it does allow the
use of substitution groups, which require global element declarations.
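To make the point about substitution groups concrete, here is a minimal sketch in the Salami Slice style. The shirt and hat elements are hypothetical additions to the running example, and the content of product is cut down to keep the listing short.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified">
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <!-- Any member of product's substitution group may appear here -->
        <xs:element ref="product" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!-- Head of the substitution group: must be a global declaration -->
  <xs:element name="product">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="number"/>
        <xs:element ref="name"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="number" type="xs:integer"/>
  <xs:element name="name" type="xs:string"/>
  <!-- Hypothetical members: with no type of their own, they take
       the type of the head element, product -->
  <xs:element name="shirt" substitutionGroup="product"/>
  <xs:element name="hat" substitutionGroup="product"/>
</xs:schema>
```

Because the types here are anonymous, the members cannot name a derived type of their own. This is one more place where the lack of named types limits the Salami Slice approach.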

21.5.1.3 Venetian Blind

The Venetian Blind approach has local element declarations but named
global types. Example 21–3 is a Venetian Blind schema.
The significant advantage to this approach is that the types are
reusable. The advantages of named types over anonymous types
are compelling and make this approach more flexible and better defined
than either of the previous two approaches. Since elements are locally
declared, it is possible to have more than one element with the same
name but a different type or other characteristics, which also improves
flexibility.
The disadvantage of this approach is that element declarations are
not reused, so if they have complex constraints such as type alternatives
or identity constraints, these need to be respecified in multiple places.
Also, element declarations cannot participate in substitution groups.
If substitution groups aren’t needed, this is the author’s preferred
approach and is the style used in most of this book. It allows for full
reuse through types, but also allows the flexibility of varying element
names. It maps very cleanly onto an object-oriented model, where the
complex types are analogous to classes and the element declarations
are analogous to instance variables that have that class.

Example 21–3. Schema for Venetian Blind approach

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified">
  <xs:element name="catalog" type="CatalogType"/>
  <xs:complexType name="CatalogType">
    <xs:sequence>
      <xs:element name="product" type="ProductType"
                  maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="size" type="SizeType"/>
    </xs:sequence>
    <xs:attribute name="dept" type="xs:string"/>
  </xs:complexType>
  <xs:simpleType name="SizeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
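The flexibility of local element declarations under Venetian Blind can be sketched as follows. OrderType, OrderNumberType, and the pattern for order numbers are assumptions made for this illustration, and the xs prefix is assumed to be bound by an enclosing xs:schema element whose default namespace is its target namespace.

```xml
<xs:complexType name="ProductType">
  <xs:sequence>
    <!-- Inside a product, number is an integer -->
    <xs:element name="number" type="xs:integer"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="OrderType">
  <xs:sequence>
    <!-- Inside an order, an element with the same name has a different type -->
    <xs:element name="number" type="OrderNumberType"/>
  </xs:sequence>
</xs:complexType>

<xs:simpleType name="OrderNumberType">
  <xs:restriction base="xs:string">
    <xs:pattern value="[A-Z]{2}-\d{6}"/>
  </xs:restriction>
</xs:simpleType>
```

With global element declarations, as in Salami Slice or Garden of Eden, the two number declarations would collide and one of them would have to be renamed.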

21.5.1.4 Garden of Eden

The Garden of Eden approach has all global (named) types and global
element declarations. Example 21–4 is a Garden of Eden schema.
The only disadvantage of this approach, other than its verbosity, is
that since elements are globally declared, it is not possible to have
more than one element with the same name but a different type or
other characteristics. All element names in the entire schema must be
unique. However, some would consider unique, meaningful element
names to be an advantage, and indeed it can simplify processing of the
document using technologies like XSLT and SAX.

Example 21–4. Schema for Garden of Eden approach

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified">
  <xs:element name="catalog" type="CatalogType"/>
  <xs:complexType name="CatalogType">
    <xs:sequence>
      <xs:element ref="product" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element ref="number"/>
      <xs:element ref="name"/>
      <xs:element ref="size"/>
    </xs:sequence>
    <xs:attribute name="dept" type="xs:string"/>
  </xs:complexType>
  <xs:element name="number" type="xs:integer"/>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="size" type="SizeType"/>
  <xs:simpleType name="SizeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

Garden of Eden is a very viable approach to schema design. Its big
advantage is that it allows both element declarations and types to be
referenced from multiple components, maximizing their reuse potential.
In addition, compared to Venetian Blind, it allows the use of substitution
groups. If substitution groups are needed, and it is acceptable to
force the uniqueness of element names, this approach is the right choice.
Many standard XML vocabularies use this approach.
Overall, the Garden of Eden and Venetian Blind, depending on your
requirements, are the recommended approaches. The Russian Doll
approach has obvious limitations in terms of reuse, and the Salami
Slice approach does not benefit from the very significant advantages of
named types over anonymous types.

21.5.2 Modularizing schema documents

Another decision related to schema structure is how to modularize your
schema documents. Consider a project that involves orders for retail
products. The order will contain information from several different
domains. It will contain general information that applies to the order
itself, such as the order number and date. It may also contain customer
information, such as customer name, number, and address. Finally, it
may contain product information, such as product number, name,
description, and size.
Do you want one big schema document, or three schema documents,
one for each of the subject areas (order, customer, and product)? There
are a number of advantages to composing your schema representation
from multiple schema documents.

# Easier reuse. If schema documents are small and focused, they
are more likely to be reused. For example, a product catalog
application might want to reuse the definitions from the product
schema. This is much more efficient if the product catalog
application is not forced to include everything from the order
application.
# Ease of maintenance. Smaller schema documents are more
readable and manageable.
# Reduced chance of name collisions. If different namespaces
are used for the different schema documents, name collisions
are less likely.
# Versioning granularity. It is helpful to separate components
that change more frequently, or change on a different schedule,
into a separate schema document. This creates less disruption
when a new version is released. Code lists (simple types with
enumerations) are often placed into individual schema
documents so they can be versioned separately.
# Access control granularity. Security can be managed per
schema document, allowing more granular access control.
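As an illustration of the code list idea mentioned under versioning granularity, a separately versioned schema document might contain nothing but enumerated simple types. The file name codes.xsd, the namespace http://example.org/codes, and the currency values below are all assumptions made for this sketch.

```xml
<!-- codes.xsd: a hypothetical code list schema document -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.org/codes"
           targetNamespace="http://example.org/codes">
  <xs:simpleType name="CurrencyCodeType">
    <xs:restriction base="xs:token">
      <xs:enumeration value="USD"/>
      <xs:enumeration value="EUR"/>
      <xs:enumeration value="JPY"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

A schema document that uses these codes would bring them in with an xs:import naming that namespace and location, so that a new currency can be released by changing codes.xsd alone.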

Dividing up your schema documents too much, though, can make
them hard to manage. For example, having one element declaration or
type definition per schema document would necessitate the use of
dozens of imports or includes in your schema documents. If different
namespaces are used in these schema documents, that compounds the
complexity, because an instance document will have to declare dozens
of namespaces.
There are several ways to distribute your components among schema
documents, for example:

# Subject area. If your instance will contain application data, this
could mean one schema document per application or per
database. If the instances incorporate XML documents of differ-
ent types, such as test reports and product specifications, and
each is defined by its own root element declaration, it would
be logical to use a separate schema document for each
document type.
# General/specific. There may be a base set of components that
can be extended for a variety of different purposes. For example,
you may create a schema document that contains generic (pos-
sibly abstract) definitions for purchase orders and invoices, and
separate schema documents for each set of industry-specific
extensions.
# Basic/advanced. Suppose you have a core set of components
that are used in all instances, plus a number of optional
components. You may want to define these optional components in a
separate schema document. This allows an instance to be
validated against just the core set of components or the enhanced
set, depending on the application.
# Governance. Schemas should be divided so that a single schema
document is not governed by more than one group of people.
# Versioning schedule. As mentioned above, it is helpful to
separate components that are versioned frequently or on
different schedules, such as code lists.

Another issue when you break up schema documents is whether to
use the same namespace for all of them, or break them up into separate
namespaces. This issue is covered in detail in Section 21.7.2 on p. 565.

21.6 Naming considerations

This section provides detailed recommendations for choosing and
managing names for XML elements and attributes, as well as for other
XML Schema components. It discusses naming guidelines, the use of
qualified and unqualified names, and organizing a namespace.
Consistency in naming can be as important as the names themselves.
Consistent names are easier to understand, remember, and maintain.
This section provides guidelines for defining an XML naming standard
to ensure the quality and consistency of names.
These guidelines apply primarily to the names that will appear in an
instance—namely, element, attribute, and notation names. However,
much of this section is also applicable to the names used within the
schema, such as type, group, and identity constraint names.

21.6.1 Rules for valid XML names

Names in XML must start with a letter or underscore (_), and can
contain only letters, digits, underscores (_), colons (:), hyphens (-),
and periods (.). Colons should be reserved for use with namespace
prefixes. In addition, names that start with the letters xml, in any
combination of upper and lower case, are reserved and should not be used.
Names in XML are always case-sensitive, so accountNumber and
AccountNumber are two different element names.
Since schema components have XML names, these name restrictions
apply not only to the element and attribute names that appear in
instances, but also to the names of the types, named model groups,
attribute groups, identity constraints, and notations you define in your
schemas.
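A few names that follow these rules are shown below. The element and attribute names are invented for the illustration and are not part of any schema in this book.

```xml
<!-- All element and attribute names below are legal XML names -->
<examples>
  <accountNumber/>
  <AccountNumber/>   <!-- a different name: XML names are case-sensitive -->
  <account-number/>
  <account.number/>
  <_account2 acct_id="557"/>
</examples>
<!-- Not legal as names: 2ndAccount (starts with a digit),
     -account (starts with a hyphen), account number (contains a space) -->
```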

21.6.2 Separators
If a name is made up of several terms, such as “account number,” you
should decide on a standard way to separate the terms. It can be done
through capitalization (e.g., accountNumber) or through punctuation
(e.g., account-number).
Some programming languages, database management systems, and
other technologies do not allow hyphens or other punctuation in the
names they use. Therefore, if you want to directly match your
element names, for example, with variable names or database column
names, you should use capitalization to separate terms.
If you choose to use capitalization, the next question is whether
to use mixed case (e.g., AccountNumber) or camel case (e.g.,
accountNumber). In some programming languages, it is a convention
to use mixed case for class names and camel case for instance variables.
In XML, this maps roughly to using mixed case for type names and
camel case for element names. This is the convention used in this book.
Regardless of which approach you choose, the most important thing
is being consistent.

21.6.3 Name length

There is no technical limit to the length of an XML name.
However, the ideal length of a name is somewhere between four
and twelve characters. Excessively long element names, such as
HazardousMaterialsHandlingFeeDomestic, can, if used frequently,
add dramatically to the size of the instance. They are also difficult to
type, hard to distinguish from other long element names, and not very
readable. On the other hand, very short element names, such as b, can
be too cryptic for people who are not familiar with the vocabulary.

21.6.4 Standard terms and abbreviations

In order to encourage consistency, it is helpful to choose a standard set
of terms that will be used in your names. Table 21–2 shows a sample
list of standard terms. This term list is used in the examples in this
book. Synonyms are included in the list to prevent new terms
from being created that have the same meaning as other terms.

Table 21–2 Terms

Term          Abbreviation   Synonyms
color         color
currency      curr
customer      cust           client, purchaser, account holder
date          date
description   desc
effective     eff            begin, start
identifier    id
name          name
number        num            code
order         ord            purchase order
price         price          cost
product       prod           item
size          size

These consistent terms are then combined to form element and
attribute names. For example, “product number” might become
productNumber.
In some cases, the name will be too long if all of the terms are
concatenated together. Therefore, it is useful to have a standard
abbreviation associated with each term. Instead of productNumber,
the shorter prodNumber may be more manageable.

21.6.5 Use of object terms

In some contexts, using an element name such as prodNumber may
be redundant. In Example 21–5, it is obvious from the context that
the number is a product number as opposed to some other kind of
number.

Example 21–5. Repetition of terms

<product>
  <prodNumber>557</prodNumber>
  <prodName>Short-Sleeved Linen Blouse</prodName>
  <prodSize sizeSystem="US-DRESS">10</prodSize>
</product>

In this case, it may be clearer to leave off the prod term on the child
elements, as shown in Example 21–6.

Example 21–6. No repetition of terms

<product>
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
  <size system="US-DRESS">10</size>
</product>

There may be other cases where the object is not so obvious. In Ex-
ample 21–7, there are two names: a customer name and a product
name. If we took out the terms cust and prod, we would not be able
to distinguish between the two names. In this case, it should be left as
shown.

Example 21–7. Less clear context

<letter>Dear <custName>Priscilla Walmsley</custName>,
Unfortunately, we are out of stock of the
<prodName>Short-Sleeved Linen Blouse</prodName> in size
<prodSize>10</prodSize> that you ordered...</letter>

When creating element and attribute names, it is helpful to list two
names: one to be used when the object is obvious, and one to be used
in other contexts. This is illustrated in Table 21–3.

Table 21–3 Element or attribute names

Logical name      Object     Name inside      Name outside
                             object context   object context
customer name     customer   name             custName
product name      product    name             prodName
customer number   customer   number           custNum
product number    product    number           prodNum
order number      order      number           ordNum
product size      product    size             prodSize
product color     product    color            prodColor


21.7 Namespace considerations

21.7.1 Whether to use namespaces
Some designers of XML vocabularies wonder whether they should even
use namespaces. In general, using namespaces has a lot of advantages:

# It indicates clear ownership of the definitions in that namespace.
# The namespace name, if it is a URL, provides a natural place
to locate more information about that XML vocabulary.
# It allows the vocabulary to be combined with other XML vocab-
ularies with a clear separation and without the risk of name
collision.
# It is an indication to processing software what kind of document
it is.

The downside of using namespaces is the complexity, or perceived
complexity. You have to declare them in your schemas and your
instances, and the code you write to process the documents has to pay
attention to them. Another possible disadvantage is their limited support
in DTDs. If you are writing both a DTD and a schema for your
vocabulary, it requires special care to flexibly allow namespace declarations
and prefixes in the DTD (see footnote 1).
Despite some negative perceptions about namespaces, it is fairly
unusual for standard, reusable vocabularies not to use namespaces. If
you are writing a one-off XML vocabulary that is internal to a single
organization or application, it is probably fine not to use namespaces.
However, if you are planning for your vocabulary to be used by a variety
of organizations, or combined with other vocabularies, namespaces are
highly recommended.

1. For a demanding real-world example, see the DTD for XSD in Appendix A
of www.w3.org/TR/2012/REC-xmlschema11-1-20120405/structures.html.

The complexity of namespaces can be somewhat mitigated by
choosing a straightforward namespace strategy. For example, using
fewer namespaces, using qualified local element names, and using
conventional prefixes consistently can make namespaces seem less
cumbersome.
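As a sketch of how light the burden can be, consider an instance of a vocabulary that uses a single namespace with qualified local element names. The namespace http://example.org/ord and the element content are assumptions made for this example.

```xml
<!-- One default namespace declaration on the root covers every element -->
<order xmlns="http://example.org/ord">
  <number>123ABC</number>
  <items>
    <product>
      <number>557</number>
      <name>Short-Sleeved Linen Blouse</name>
    </product>
  </items>
</order>
```

Because every element is in the same namespace, the author of the instance writes one declaration and no prefixes, and processing code has only one namespace name to match.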

21.7.2 Organizing namespaces

Consider a project that involves orders for retail products. An order
will contain information from several different domains. It will contain
general information that applies to the order itself, such as order
number and date. It may also contain customer information, such as
customer name, number, and address. Finally, it may contain product
information, such as product number, name, description, and size. Is
it best to use the same namespace for all of them, or break them up
into separate namespaces? There are three approaches:

1. Same namespace: Use the same namespace for all of the schema
documents.
2. Different namespaces: Use multiple namespaces, perhaps a
different one for each schema document.
3. Chameleon namespaces: Use a namespace for the parent
schema document, but no namespaces for the included schema
documents.

21.7.2.1 Same namespace

It is possible to give all the schema documents the same target name-
space and use include to assemble them to represent a schema with
that namespace. This is depicted in Figure 21–1.
Example 21–8 shows our three schema documents using this ap-
proach. They all have the same target namespace, and ord.xsd includes
the other two schema documents.
566 Chapter 21 | Schema design and documentation

Figure 21–1 Same namespace

Example 21–8. Same namespace in a schema

ord.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/all"
           targetNamespace="http://datypic.com/all"
           elementFormDefault="qualified">
  <xs:include schemaLocation="prod.xsd"/>
  <xs:include schemaLocation="cust.xsd"/>
  <xs:element name="order" type="OrderType"/>
  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="customer" type="CustomerType"/>
      <xs:element name="items" type="ItemsType"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

prod.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/all"
           targetNamespace="http://datypic.com/all"
           elementFormDefault="qualified">
  <xs:complexType name="ItemsType">
    <xs:sequence maxOccurs="unbounded">
      <xs:element name="product" type="ProductType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

cust.xsd:
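
A cust.xsd in keeping with the other two documents might look like the following sketch. It assumes the same CustomerType content model (a single name element) that the cust.xsd listings for the other two approaches use; only the target namespace differs.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/all"
           targetNamespace="http://datypic.com/all"
           elementFormDefault="qualified">
  <!-- Assumed content model: a single name element -->
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```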

Example 21–9 shows an instance that conforms to the schema.
Since there is only one namespace for all of the elements, a default
namespace declaration is used.

Example 21–9. Same namespace in an instance

<order xmlns="http://datypic.com/all"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://datypic.com/all ord.xsd">
  <customer>
    <name>Priscilla Walmsley</name>
  </customer>
  <items>
    <product>
      <number>557</number>
    </product>
  </items>
</order>

The advantages of this approach are that it is uncomplicated and the
instance is not cluttered by prefixes. The disadvantage is that you cannot
have multiple global components with the same name in the same
namespace, so you will have to be careful of name collisions.
This approach assumes that you have control over all the schema
documents. If you are using elements from a namespace over which
you have no control, such as the XHTML namespace, you should use
the approach described in the next section.
This approach is best within a particular application where you have
control over all the schema documents involved.

21.7.2.2 Different namespaces

It is also possible to give each schema document a different target
namespace and use an import (or other method) to assemble the
multiple schema documents. This is depicted in Figure 21–2.
Example 21–10 shows our three schema documents using this ap-
proach. They all have different target namespaces, and ord.xsd imports
the other two schema documents.

Figure 21–2 Different namespaces

Example 21–10. Different namespaces in a schema

ord.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:prod="http://datypic.com/prod"
           xmlns:cust="http://datypic.com/cust"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord"
           elementFormDefault="qualified">
  <xs:import schemaLocation="prod.xsd"
             namespace="http://datypic.com/prod"/>
  <xs:import schemaLocation="cust.xsd"
             namespace="http://datypic.com/cust"/>
  <xs:element name="order" type="OrderType"/>
  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="customer" type="cust:CustomerType"/>
      <xs:element name="items" type="prod:ItemsType"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

prod.xsd:
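
A prod.xsd for this approach would, as a sketch, carry the same type definitions as in Example 21–8 but with its own target namespace. The http://datypic.com/prod namespace name is taken from the import in ord.xsd; the content of the two types is assumed to be unchanged.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified">
  <!-- Assumed to mirror the prod.xsd types of Example 21-8 -->
  <xs:complexType name="ItemsType">
    <xs:sequence maxOccurs="unbounded">
      <xs:element name="product" type="ProductType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```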

cust.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/cust"
           targetNamespace="http://datypic.com/cust"
           elementFormDefault="qualified">
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

Example 21–11 shows an instance that conforms to this schema.
You are required to declare all three namespaces in the instance and to
prefix the element names appropriately. However, since ord.xsd
imports the other two schema documents, you are not required to
specify xsi:schemaLocation pairs for all three schema documents,
just the “main” one.

Example 21–11. Different namespaces in an instance

<order xmlns="http://datypic.com/ord"
       xmlns:prod="http://datypic.com/prod"
       xmlns:cust="http://datypic.com/cust"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://datypic.com/ord ord.xsd">
  <customer>
    <cust:name>Priscilla Walmsley</cust:name>
  </customer>
  <items>
    <prod:product>
      <prod:number>557</prod:number>
    </prod:product>
  </items>
</order>

To slightly simplify the instance, different default namespace
declarations could appear at different levels of the document, resulting
in the instance shown in Example 21–12. It could be simplified even
further by the use of unqualified local element names, as discussed in
Section 21.7.3.2 on p. 576.

Example 21–12. Different namespaces in an instance, with default namespaces

<order xmlns="http://datypic.com/ord"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://datypic.com/ord ord.xsd">
  <customer>
    <name xmlns="http://datypic.com/cust">Priscilla Walmsley</name>
  </customer>
  <items>
    <product xmlns="http://datypic.com/prod">
      <number>557</number>
    </product>
  </items>
</order>

Different namespaces are the obvious approach when you rely on namespaces
over which you have no control, for example, if you want to include
XHTML elements in your product description. There is no point trying
to copy all the XHTML element declarations into a new namespace.
This would create maintenance problems and would not be very clear
to users. In addition, applications that process XHTML may require
that the elements be in the XHTML namespace.
The advantage of this approach is that the source and context of an
element are very clear. In addition, it allows different groups to be
responsible for different namespaces. Finally, you can be less concerned
about name collisions, because names need only be unique within
a namespace.
The disadvantage of this approach is that instances are more complex,
requiring preﬁxes for multiple different namespaces. Also, you cannot
use the redefine or override feature on these components, since
they are in a different namespace.

21.7.2.3 Chameleon namespaces

The third possibility is to specify a target namespace only for the
“main” schema document, not the included schema documents.
The included components then take on the target namespace of the
including document. In our example, all of the deﬁnitions and
declarations in both prod.xsd and cust.xsd would take on the
http://datypic.com/ord namespace once they are included in
ord.xsd. This is depicted in Figure 21–3.
Example 21–13 shows our three schema documents using this ap-
proach. Neither prod.xsd nor cust.xsd has a target namespace,
while ord.xsd does. ord.xsd includes the other two schema
documents and changes their namespace as a result.
The instance in this case would look similar to that of the same-
namespace approach shown in Example 21–9, since all the elements
would be in the same namespace. The only difference is that in this
case the namespace would be http://datypic.com/ord instead of
http://datypic.com/all.

Figure 21–3 Chameleon namespaces

Example 21–13. Chameleon namespaces in a schema

ord.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord"
           elementFormDefault="qualified">
  <xs:include schemaLocation="prod.xsd"/>
  <xs:include schemaLocation="cust.xsd"/>
  <xs:element name="order" type="OrderType"/>
  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="customer" type="CustomerType"/>
      <xs:element name="items" type="ItemsType"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

prod.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:complexType name="ItemsType">
    <xs:sequence maxOccurs="unbounded">
      <xs:element name="product" type="ProductType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

cust.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

The advantage of this approach is its flexibility. Components can be
included in multiple namespaces, and redefined or overridden whenever
desired.
The disadvantage is that the risk of name collisions is even more se-
rious. If the non-namespace schema documents grow over time, the
risk increases that there will be name collisions with the schema docu-
ments that include them. If they were in their own namespace,
unexpected collisions would be far less likely.
Another disadvantage is that the chameleon components lack an
identity. Namespaces can be well-defined containers that provide a
recognizable context as well as specific semantics, documentation, and
application code.

21.7.3 Qualiﬁed vs. unqualiﬁed forms

Many instances include elements from more than one namespace. This
can potentially result in instances with a large number of different
prefixes, one for each namespace. However, when an element dec-
laration is local—that is, when it isn’t at the top level of a schema
document—you have the choice of using either qualified or unqualified
element names in instances, a concept that was introduced in Sec-
tion 6.3 on p. 98. Let’s recap the two alternatives, this time looking at
more complex multinamespace documents.

21.7.3.1 Qualiﬁed local names

Example 21–14 shows an instance where all element names are qual-
iﬁed. Every element name has a preﬁx that maps it to either the product
namespace or the order namespace.

Example 21–14. Qualiﬁed local names

<ord:order xmlns:ord="http://datypic.com/ord"
           xmlns:prod="http://datypic.com/prod">
  <ord:number>123123</ord:number>
  <ord:items>
    <prod:product>
      <prod:number>557</prod:number>
    </prod:product>
  </ord:items>
</ord:order>

This instance is very explicit about which namespace each element
is in. There will be no confusion about whether a particular number
element is in the order namespace or in the product namespace.
However, the application or person that generates this instance must
be aware which elements are in the order namespace and which are in
the product namespace.

21.7.3.2 Unqualiﬁed local names

Example 21–15, on the other hand, shows an instance where only the
root element name, order, is qualiﬁed. The other element names
have no preﬁx, and since there is no default namespace provided, they
are not in any namespace.
This instance has the advantage of looking slightly less complicated
and not requiring the instance author to care about what namespace
each element belongs in. In fact, the instance author does not even
need to know of the existence of the product namespace.

Example 21–15. Unqualiﬁed local names

<ord:order xmlns:ord="http://datypic.com/ord">
  <number>123123</number>
  <items>
    <product>
      <number>557</number>
    </product>
  </items>
</ord:order>

21.7.3.3 Using form in schemas

Let’s look at the schemas that would describe these two instances. Ex-
ample 21–16 shows how to represent the schema for the instance in
Example 21–14, which has qualified element names. The representation
is made up of two schema documents: ord.xsd, which defines compo-
nents in the order namespace, and prod.xsd, which defines
components in the product namespace.
Both schema documents have elementFormDefault set to
qualified. As a result, locally declared elements must use qualified
element names in the instance. In this example, the declaration for
order is global and the declarations for product, items, and number
are local.

Example 21–16. Schema for qualiﬁed local element names

ord.xsd:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:prod="http://datypic.com/prod"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord"
           elementFormDefault="qualified">
  <xs:import schemaLocation="prod.xsd"
             namespace="http://datypic.com/prod"/>
  <xs:element name="order" type="OrderType"/>
  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="items" type="prod:ItemsType"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

prod.xsd:
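
A matching prod.xsd might look like the following sketch, which assumes the same ItemsType and ProductType definitions used earlier in this section. With elementFormDefault set to qualified, the locally declared product and number elements take on the product namespace, which is what the prod: prefixes in Example 21–14 reflect.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified">
  <!-- Local declaration: qualified because of elementFormDefault -->
  <xs:complexType name="ItemsType">
    <xs:sequence maxOccurs="unbounded">
      <xs:element name="product" type="ProductType"/>
    </xs:sequence>
  </xs:complexType>
  <!-- Local declaration: also in the product namespace -->
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```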

To create a schema for the instance in Example 21–15, which
has unqualified names, you can simply change the value of
elementFormDefault in both schema documents to unqualified.
Since the default value is unqualified, you could alternatively simply
omit the attribute. In this case, globally declared elements still must
use qualified element names, hence the use of ord:order in the
instance.

21.7.3.4 Form and global element declarations

An important thing to notice is that the choice between qualified and
unqualified names applies only to local element declarations. All glob-
ally declared elements must have qualified element names in the in-
stance; there is no way to override this. In our example, all of the
element declarations except order are local. If the product declaration
had been global, the product elements would have to use qualified
element names, regardless of the value of elementFormDefault.
This can cause confusion if you choose to use unqualified local ele-
ment names, and you want to mix global and local element declarations.
Not only would an instance author be required to know what name-
space each element is in, but he or she would also need to know whether
it was globally or locally declared in the schema.
This can be avoided by making all element declarations local, except
for the declaration for the root elements. However, you may not have
this choice if you import element declarations from namespaces that
are not under your control. Also, if you plan to use substitution groups,
the participating element declarations must be global.
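
To make the mixing problem concrete, here is a hypothetical variant of prod.xsd in which product is declared globally and referenced from ItemsType, while elementFormDefault is left at its default of unqualified. This is a sketch for illustration, not one of the schemas used elsewhere in this section.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod">
  <!-- Global declaration: always qualified in instances -->
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ItemsType">
    <xs:sequence maxOccurs="unbounded">
      <xs:element ref="product"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <!-- Local declaration: unqualified by default -->
      <xs:element name="number" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Assuming ord.xsd also omits elementFormDefault, a conforming instance would have to prefix order and product but none of the other elements:

```xml
<ord:order xmlns:ord="http://datypic.com/ord"
           xmlns:prod="http://datypic.com/prod">
  <number>123123</number>
  <items>
    <prod:product>
      <number>557</number>
    </prod:product>
  </items>
</ord:order>
```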

21.7.3.5 Default namespaces and unqualiﬁed names

Default namespaces do not mix well with unqualified element names.
The instance in Example 21–17 declares the order namespace as the
default namespace. However, this will not work with a schema
document where elementFormDefault is set to unqualified. The default
namespace declaration places items, product, and number in the order
namespace, whereas the schema expects those locally declared elements
to be in no namespace at all.

Example 21–17. Invalid mixing of unqualiﬁed names and default namespace

<order xmlns="http://datypic.com/ord">
  <number>123ABBCC123</number>
  <items>
    <product>
      <number>557</number>
    </product>
  </items>
</order>
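
One way to keep a default namespace on the root element while leaving the local elements unqualified is to undeclare the default namespace with xmlns="" on the children. The sketch below assumes the schema documents of Example 21–16 with elementFormDefault set to unqualified, and uses 123123 as an illustrative order number.

```xml
<order xmlns="http://datypic.com/ord">
  <!-- xmlns="" puts the element and its descendants in no namespace -->
  <number xmlns="">123123</number>
  <items xmlns="">
    <product>
      <number>557</number>
    </product>
  </items>
</order>
```

Whether this reads better than the prefixed root element of Example 21–15 is debatable; it mainly illustrates why the two features do not combine gracefully.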

21.7.3.6 Qualiﬁed vs. unqualiﬁed element names

Whether to use qualified or unqualified local element names is a matter
of style. The advantages of using qualified local element names are:

# You can tell by looking at the document which namespace a
name is in. If you see that a b element is in the XHTML
namespace, you can more quickly understand its meaning.
# There is no ambiguity to a person or application what name-
space an element belongs in. In our example, there was a number
element in each of the namespaces. In most cases, you can de-
termine from its position in the instance whether it is an order
number or a product number, but not always.
# Certain applications or processors, for example an XHTML
processor, might be expecting the element names to be qualiﬁed
with the appropriate namespace.
# You can mix global and local element declarations without af-
fecting the instance authors. You may be forced to make some
element declarations global because you are using substitution
groups, or because you are importing a schema document over
which you have no control. If you use unqualiﬁed local names,
the instance author has to know which element declarations are
global and which are local.

The advantages of using unqualified local element names are:

# The instance author does not have to be aware of which
namespace each element name is in. If many namespaces are
used in the instance, this can simplify creation of instances.
# The lack of preﬁxes and namespace declarations makes the
instance look less cluttered.

In general, it is best to use qualified element names, for the reasons
stated above. If consistent prefixes are used, they just become part of
the name, and authors get used to writing prod:number rather than
just number. Most XML editors assist instance authors in choosing the
right element names, preﬁxed or not.

21.7.3.7 Qualiﬁed vs. unqualiﬁed attribute names

The decision about qualified versus unqualified forms is simpler for
attributes than elements. Qualified attribute names should only be used
for attributes that apply to a variety of elements in a variety of name-
spaces, such as xml:lang or xsi:type. Such attributes are almost al-
ways declared globally. For locally declared attributes, whose scope is
the type definition in which they appear, prefixes add extra text without
any additional meaning.
The best way to handle qualification of attribute names is to ignore
the form and attributeFormDefault attributes completely. Then,
globally declared attributes will have qualified names, and locally
declared attributes will have unqualified names, which makes sense.
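
As a sketch of what this default behavior looks like, the hypothetical schema below declares one attribute globally and one locally without setting form or attributeFormDefault. The attribute names source and dept are made up for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified">
  <!-- Global attribute declaration: its name is always qualified -->
  <xs:attribute name="source" type="xs:string"/>
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
    </xs:sequence>
    <!-- Local attribute declaration: unqualified by default -->
    <xs:attribute name="dept" type="xs:string"/>
    <xs:attribute ref="source"/>
  </xs:complexType>
</xs:schema>
```

In a conforming instance, only the globally declared attribute carries a prefix:

```xml
<prod:product xmlns:prod="http://datypic.com/prod"
              dept="ACC" prod:source="catalog">
  <prod:number>557</prod:number>
</prod:product>
```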

21.8 Schema documentation

XML Schema is a full-featured language for describing the structure
of XML documents. However, it cannot express everything there is to
know about an instance or the data it contains. This section explains
how you can extend XML Schema to include additional information
for users and applications, using two methods: annotations and
non-native attributes.
21.8 | Schema documentation 581

21.8.1 Annotations

Annotations are represented by annotation elements, whose syntax
is shown in Table 21–4. An annotation may appear in almost any
element in the schema, with the exception of annotation itself and
its children, appinfo and documentation. The schema, override,
and redefine elements can contain multiple annotation elements
anywhere among their children. All other elements may only contain
one annotation, and it must be their ﬁrst child.

Table 21–4 XSD Syntax: annotation

Name
annotation

Parents
all elements except annotation, appinfo, and documentation

Attribute name   Type   Description
id               ID     Unique ID.

Content
(documentation | appinfo)*

The content model for annotation allows two types of children:
documentation and appinfo. A documentation element is intended
to be human-readable user documentation, and appinfo is machine-
readable for applications. A single annotation may contain multiple
documentation and appinfo elements, in any order, which may
serve different purposes.

21.8.2 User documentation

User documentation is represented by the documentation element.
Sometimes it will consist of simple text content used for a description
of a component. However, as it can contain child elements, it can be
used for a more complex structure. Often, it is preferable to store more
detail than just a simple description. The types of user information
that you might add to a schema include:

# Descriptive information about what the component means. The
name and structure of a type definition or element declaration
can explain a lot, but they cannot impart all the semantics of
the schema component.
# An explanation of why a component is structured in a particular
way, or why certain XML Schema mechanisms are used.
# Metadata such as copyright information, who is responsible for
the schema component, its version, and when it was last
changed.
# Internationalization and localization parameters, including
language translations.
# Examples of valid instances.

21.8.2.1 Documentation syntax

The syntax for a documentation element is shown in Table 21–5. It
uses an element wildcard for its content model, which speciﬁes that
it may contain any number of elements from any namespace (or no
namespace), in any order. Its content is mixed, so it may contain
character data as well as children.
The source attribute can contain a URI reference that points to
further documentation. The schema processor does not dereference
this URI during validation.
Example 21–18 shows the use of documentation to document the
product element declaration.

Table 21–5 XSD Syntax: documentation

Name
documentation

Parents
annotation

Attribute name   Type       Description
source           anyURI     Source of further documentation.
xml:lang         language   Natural language of the documentation.

Content
any well-formed XML (any character data content and/or child elements)

Example 21–18. Documentation
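
A documented product declaration along these lines might look like the following sketch. The doc namespace, the description element, and the source URL are illustrative assumptions, and a minimal ProductType is included only so that the schema document is complete.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:doc="http://datypic.com/doc">
  <xs:element name="product" type="ProductType">
    <xs:annotation>
      <!-- The source URL is a hypothetical pointer to further documentation -->
      <xs:documentation xml:lang="en"
                        source="http://datypic.com/prod.html#product">
        <doc:description>This element represents a product.</doc:description>
      </xs:documentation>
    </xs:annotation>
  </xs:element>
  <!-- Minimal type definition, assumed for completeness -->
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```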

Although you can put character data content directly in
documentation or appinfo, it is preferable to structure it using at
least one child element. This allows the type of information (e.g.,
description) to be uniquely identified in the case of future additions.
Instead of (or in addition to) including the information in the
annotation, you can also provide links to one or more external documents.
To do this, you can either use the source attribute or other
mechanisms, such as XLink. This allows reuse of documentation that
may apply to more than one schema component.

21.8.2.2 Data element deﬁnitions

When creating reusable schema documents such as type libraries, it is
helpful to have a complete definition of each component declared or
defined in it. This ensures that schema authors are reusing the correct
components, and allows you to automatically generate human-
readable documentation about the components. Example 21–19 shows
a schema document that includes a complete definition of a simple
type CountryType in its documentation. The example is roughly
based on ISO 11179, the ISO standard for the specification and
standardization of data elements.

Example 21–19. ISO 11179-based type deﬁnition

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:doc="http://datypic.com/doc">

  <xs:simpleType name="CountryType">
    <xs:annotation>
      <xs:documentation>
        <doc:name>Country identifier</doc:name>
        <doc:identifier>3166</doc:identifier>
        <doc:version>1990</doc:version>
        <doc:registrationAuthority>ISO</doc:registrationAuthority>
        <doc:definition>A code for the names of countries of the
          world.</doc:definition>
        <doc:keyword>geopolitical entity</doc:keyword>
        <doc:keyword>country</doc:keyword>

      </xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:token">

    </xs:restriction>
  </xs:simpleType>

</xs:schema>

21.8.2.3 Code documentation

Another type of user documentation is code control information, such
as when a component was created and by whom, its version, and its dependencies.
This is illustrated in Example 21–20. The element names used in the
example are similar to the keywords in Javadoc.

Example 21–20. Code documentation

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:doc="http://datypic.com/doc">

  <xs:simpleType name="CountryType">
    <xs:annotation>
      <xs:documentation>
        <doc:author>Priscilla Walmsley</doc:author>
        <doc:version>1.1</doc:version>
        <doc:since>1.0</doc:since>
        <doc:see>
          <doc:label>Country Code Listings</doc:label>
          <doc:link>http://datypic.com/countries.html</doc:link>
        </doc:see>
        <doc:deprecated>false</doc:deprecated>
      </xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:token">

    </xs:restriction>
  </xs:simpleType>

</xs:schema>

21.8.2.4 Section comments

There is a reason that schema, override, and redefine elements
can have multiple annotations anywhere in their content. These
annotations can be used to break a schema document into sections and
provide comments on each section. Example 21–21 shows annotations
that serve as section comments. Although they are more verbose than
regular XML comments (which are also permitted), they are more
structured. This means that they can be used, for example, to generate
XHTML documentation for the schema.

Example 21–21. Section identiﬁers

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:annotation><xs:documentation><sectionHeader>
    ********* Product-Related Element Declarations ***************
  </sectionHeader></xs:documentation></xs:annotation>
  <xs:element name="product" type="ProductType"/>
  <xs:element name="size" type="SizeType"/>

  <xs:annotation><xs:documentation><sectionHeader>
    ********* Order-Related Element Declarations *****************
  </sectionHeader></xs:documentation></xs:annotation>
  <xs:element name="order" type="OrderType"/>
  <xs:element name="items" type="ItemsType"/>

</xs:schema>

21.8.3 Application information

There is a wide variety of use cases for adding application information
to schemas. Some of the typical kinds of application information to
include are:

# Extra validation rules, such as co-constraints. XML Schema
alone cannot express every constraint you might want to impose
on your instances.
# Mappings to other structures, such as databases or EDI messages.
These mappings are a flexible way to tell an application where
to store or extract individual elements.
# Mappings to XHTML forms or other user input mechanisms.
The mappings can include special presentation information
for each data element, such as translations of labels to other
languages.
# Formatting information, such as a stylesheet fragment that can
convert the instance element to XHTML, making it presentable
to the user.

The syntax for appinfo, shown in Table 21–6, is identical to that
of documentation, minus the xml:lang attribute.

Table 21–6 XSD Syntax: application information

Name
appinfo

Parents
annotation

Attribute name   Type     Description
source           anyURI   Source of further documentation.

Content
any well-formed XML (any character data content and/or child elements)

Example 21–22 shows the use of appinfo to map product to a
database table.

Example 21–22. Application information

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:app="http://datypic.com/app">
  <xs:element name="product" type="ProductType">
    <xs:annotation>
      <xs:appinfo>
        <app:dbmapping>
          <app:tb>PRODUCT_MASTER</app:tb>
        </app:dbmapping>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>

</xs:schema>

In this example, we declare a namespace http://datypic.com/app
for the dbmapping and tb elements used in the annotation. This is
not required; appinfo can contain elements with no namespace.
However, it is preferable to use a namespace because it makes the
extension easily distinguishable from other information that may be
included for use by other applications.

21.8.4 Non-native attributes

In addition to annotations, all schema elements are permitted to have
additional attributes. These attributes are known as non-native at-
tributes, since they must be in a namespace other than the XML Schema
Namespace. Example 21–23 shows an element declaration that has the
non-native attribute description.

Example 21–23. Non-native attributes

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:doc="http://datypic.com/doc"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://datypic.com/doc doc.xsd">
  <xs:element name="product" type="ProductType"
              doc:description="This element represents a product."/>

</xs:schema>

As with appinfo and documentation contents, the non-native
attributes are validated through a wildcard with lax validation.
If attribute declarations can be found for the attributes, they will be
validated, otherwise the processor will ignore them. In this case, the
xsi:schemaLocation attribute points to a schema document for
the additional attributes. Example 21–24 shows a schema that might
be used to validate the non-native attributes.
The schema does not include new declarations for schema elements.
Rather, it contains global declarations of any non-native attributes.

Example 21–24. A schema for non-native attributes

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/doc"
           targetNamespace="http://datypic.com/doc">
  <xs:attribute name="description" type="xs:string"/>
</xs:schema>

21.8.4.1 Design hint: Should I use annotations or non-native attributes?

This is roughly the same as the general question of whether to use
elements or attributes. Both can convey additional information, and both
are made available to the application by the schema processor. Non-
native attributes are less verbose, and perhaps more clear because they
are closer to the deﬁnitions to which they apply. However, they have
drawbacks: they cannot be used more than once in a particular element,
they cannot be extended in the future, and they cannot contain other
elements. For example, if you decide that you want the descriptions to
be expressed in XHTML, this cannot be done if the description is an
attribute. For more information on attributes versus elements, see
Section 7.1 on p. 113.
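
To make the XHTML point concrete, here is a minimal sketch of the same description carried in an annotation instead of an attribute. The xs:string type is only a stand-in for ProductType so that the fragment is self-contained, and the wording of the description is illustrative.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- xs:string stands in for ProductType (assumption, to keep this self-contained) -->
  <xs:element name="product" type="xs:string">
    <xs:annotation>
      <!-- documentation may contain elements, so XHTML markup is possible here -->
      <xs:documentation xml:lang="en">
        <p xmlns="http://www.w3.org/1999/xhtml">This element
          represents a <em>product</em>.</p>
      </xs:documentation>
    </xs:annotation>
  </xs:element>
</xs:schema>
```

A doc:description attribute could hold only the plain string, which is the tradeoff described above.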

21.8.5 Documenting namespaces

It is generally a good idea to use URLs for namespace names and to
put a resource at the location referenced by the URL. There are many
reasons not to require your application to dereference a namespace
name at runtime, including security, performance, and network
availability. However, a person might want to dereference the namespace
in order to find out more information about it.
It might seem logical to put a schema document at that location.
Having a namespace name resolve to a schema document, though, is
not ideal because:

# Many schemas may describe that namespace. Which one do
you choose?
# A variety of documents in other formats may also describe that
namespace, including DTDs, human-readable documentation,
schemas written in other schema languages, and stylesheets.
Each may be applicable in different circumstances.
# Schema documents are not particularly human-readable, even
by humans who write them!

A better choice is a resource directory, which lists all the resources
related to a namespace. Such a directory can be both human- and
application-readable. It can also allow different resources to be used
depending on the application or purpose.
One language that can be used to define a resource directory is RDDL
(Resource Directory Description Language), an extension of XHTML.
RDDL does not apply only to namespaces, but it is an excellent choice
for documenting a namespace. Example 21–25 shows an RDDL document
that might be placed at the location http://datypic.com/prod.

Example 21–25. RDDL for the product catalog namespace

<?xml version='1.0'?>
<!DOCTYPE html PUBLIC "-//XML-DEV//DTD XHTML RDDL 1.0//EN"
"rddl/rddl-xhtml.dtd">
<html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:rddl="http://www.rddl.org/">
<head><title>Product Catalog</title></head>
<body><h1>Product Catalog</h1>
<div id="toc"><h2>Table of Contents</h2>
<ol>
<li><a href="#intro">Introduction</a></li>
<li><a href="#related.resources">Resources</a></li>
</ol>
</div>

<div id="intro"><h2>Introduction</h2>
<p>This document describes the <a href="#xmlschemap1">Product
Catalog</a> namespace and contains a directory of links to
related resources.</p>
</div>
<div id="related.resources">
<h2>Related Resources for the Product Catalog Namespace</h2>

<div class="resource" id="DTD">

<rddl:resource xlink:title="DTD for validation"
xlink:arcrole="http://www.rddl.org/purposes#validation"
xlink:role="http://www.isi.edu/in-notes/iana/assignments/media-types/text/xml-dtd"
xlink:href="prod.dtd">
<h3>DTD</h3>
<p>A <a href="prod.dtd">DTD</a> for the Product Catalog.</p>
</rddl:resource>
</div>

<div class="resource" id="xmlschema">

<rddl:resource xlink:title="Products schema"
xlink:role="http://www.w3.org/2001/XMLSchema"
xlink:arcrole="http://www.rddl.org/purposes#schema-validation"
xlink:href="prod.xsd">
<h3>XML Schema</h3>
<p>An <a href="prod.xsd">XML Schema</a> for the Product
Catalog.</p>
</rddl:resource>
</div>

<div class="resource" id="documentation">

<rddl:resource xlink:title="Application Documentation"
xlink:role="http://www.w3.org/TR/html4/"
xlink:arcrole="http://www.rddl.org/purposes#reference"
xlink:href="prod.html">
<h3>Application Documentation</h3>
<p><a href="prod.html">Application documentation</a> for
the Product Catalog application.</p>
</rddl:resource>
</div>
</div>
</body></html>

This document defines three related resources. Each resource has a
role, which describes the nature of the resource (e.g., schema, DTD,
stylesheet), and an arcrole, which indicates the purpose of the resource
(e.g., validation, reference). An application that wants to do schema
validation, for example, can read this document and extract the location
of the schema document to be used for validation. A person could also
read this document in a browser, as shown in Figure 21–4.
For more information on RDDL, see www.rddl.org.

Figure 21–4 Viewing an RDDL document in a web browser
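
As one way an application might extract the schema location from such a directory, here is a sketch of an XSLT 1.0 stylesheet. The stylesheet is hypothetical and not part of RDDL itself; it assumes the processor can parse the RDDL document, including resolving or ignoring its DOCTYPE declaration. It prints the xlink:href of every rddl:resource whose arcrole is the schema-validation purpose used in Example 21–25.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rddl="http://www.rddl.org/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <xsl:output method="text"/>

  <!-- Visit every resource whose purpose (arcrole) is schema validation -->
  <xsl:template match="/">
    <xsl:for-each select="//rddl:resource
        [@xlink:arcrole='http://www.rddl.org/purposes#schema-validation']">
      <!-- Emit the location of the resource, one per line -->
      <xsl:value-of select="@xlink:href"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Applied to Example 21–25, this should select only the resource whose location is prod.xsd. Selecting on the role instead of the arcrole would find resources by their nature (for example, all DTDs) rather than by their purpose.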

Chapter 22

Extensibility and reuse

Sometimes we forget that the “X” in XML stands for “eXtensible.”
One of the beauties of XML is that additional elements and attributes
can appear in an instance without affecting the core information.
Specialized requirements for particular applications, industries, or
organizations can be addressed using extensions.
However, in order to make XML extensible, you need to leave
avenues open to do that. If you write schemas and applications that
require very strict validation of an instance, the major benefit of XML
is not realized! This chapter provides detailed recommendations for
developing schemas that will be reusable and extensible in the future.
It also provides guidelines for extending existing schemas.

22.1 Reuse

First, let’s talk about reusing schema components exactly as they are.
Later in this chapter, we will look at extending and restricting existing
schema components. The benefits of reuse are numerous.

# It reduces development time, because schema developers are
not reinventing the wheel. In addition, developers of stylesheets
and program code to process instances can reuse their code,
saving their time too.
# It reduces maintenance time, because changes only need to be
made in one place. Again, this applies not just to schemas, but
also to applications.
# It increases interoperability. If two systems are reusing some of
the same schema components, it is easier for them to talk to
each other.
# It results in better-designed schemas with fewer errors. Two
heads are better than one, and reused components tend to be
designed more carefully and reviewed more closely.
# It reduces the learning curve on schemas, because the reused
components only need to be learned once. In addition, it
encourages consistency, which also reduces learning curves.

22.1.1 Reusing schema components

When creating a schema, you should attempt to reuse components that
have already been developed—either within your organization, by
standards bodies, or by technology vendors. In addition, you should
attempt to reuse as much within your schema as possible.
You do not have to reuse everything from a schema document that
you import or include. If it was properly modularized, you should not
have to include many components that you do not want to reuse.
Components that can be reused include:

# Named types, both simple and complex
# Named model groups and attribute groups
# Global element and attribute declarations
# Notations

22.1.2 Creating schemas that are highly reusable

You should also consider the reusability of your components as you
define them. To increase the reusability of your components:

# Use named types, because anonymous types cannot be reused.
# Use named model groups for fragments of content models that
could be reused by multiple unrelated types.
# Use global element declarations, so they can participate in
substitution groups.
# Use generic names when declaring elements and defining types.
For example, if you are defining an address type for customers,
call it AddressType rather than CustomerAddressType.
# Think about a broader applicability of your types. For example,
when defining an address type, consider adding a country
element declaration, even if you will only be using domestic
addresses.
# Modularize your schemas into smaller documents, so that others
reusing your components will not have to include or import
them all.
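
Several of these guidelines can be seen together in the following sketch. All names and the namespace are hypothetical and chosen only for illustration: a generically named type (AddressType with an optional country), a named model group (ContactGroup) shared by two unrelated types, and a global element declaration (address) that could later head a substitution group.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.com/party"
           targetNamespace="http://example.com/party"
           elementFormDefault="qualified">

  <!-- Named model group: a content model fragment reusable by unrelated types -->
  <xs:group name="ContactGroup">
    <xs:sequence>
      <xs:element name="phone" type="xs:string" minOccurs="0"/>
      <xs:element name="email" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:group>

  <!-- Named type with a generic name and broader applicability (country) -->
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
      <xs:element name="country" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Global element declaration: can participate in substitution groups -->
  <xs:element name="address" type="AddressType"/>

  <!-- Two unrelated types reusing the same components -->
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element ref="address"/>
      <xs:group ref="ContactGroup"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="SupplierType">
    <xs:sequence>
      <xs:element name="companyName" type="xs:string"/>
      <xs:element ref="address" maxOccurs="unbounded"/>
      <xs:group ref="ContactGroup"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```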

22.1.3 Developing a common components library

When designing a complex vocabulary, it is advisable to create libraries
of low-level components that can be used in many contexts. These
components (usually types) are sometimes referred to as “common
components” or “core components.” Examples of good candidates for
common components are:

# Identifiers (for example, product identifiers, customer identifiers,
especially if they are made up of multiple parts)
# Code lists such as departments, product types, currencies, natural
languages
# Measurement (i.e., an amount with a unit of measure)
# Price (a number with an associated currency)
# Person information, such as name, contact information, and
mailing address

These are the kinds of data structures that tend to be rewritten over
and over again if there is no plan in place to reuse them. Having one
definition for these low-level components can save a lot of time in
developing and maintaining not only the schema, but the code that
processes and/or generates the messages.
If all of these common components are defined and placed in one
or more separate schema documents, they are easier to reuse than if
they are embedded in another context-specific schema document.
Typically, they are defined as types rather than elements, so that they
can be reused by many element declarations. Example 22–1 shows a
simple common components library.

Example 22–1. Sample common components library

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://datypic.com/common"
xmlns="http://datypic.com/common"
elementFormDefault="qualified">
<xs:simpleType name="ProductIDType">
<xs:restriction base="xs:string">
<xs:pattern value="[A-Z]{2}[0-9]{4}"/>
</xs:restriction>
</xs:simpleType>

<xs:complexType name="PriceType">
<xs:simpleContent>
<xs:extension base="xs:decimal">
<xs:attribute name="currency" type="CurrencyCodeType"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
<xs:complexType name="MeasurementType">
<xs:simpleContent>
<xs:extension base="xs:decimal">
<xs:attribute name="units" type="UnitsCodeType"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
<xs:complexType name="AddressType">
<xs:sequence>
<xs:element name="street" type="xs:string" maxOccurs="3"/>
<xs:element name="city" type="xs:string"/>
<xs:element name="state" type="xs:string"/>
<xs:element name="postalCode" type="xs:string"/>
<xs:element name="country" type="xs:string"/>
</xs:sequence>
</xs:complexType>

</xs:schema>
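
A context-specific schema could then import the library and base its element declarations on the common types. The following is a sketch under stated assumptions: the file name common.xsd, the ord namespace for the importing schema, and the names lineItem and LineItemType are all hypothetical. It also assumes that CurrencyCodeType and UnitsCodeType, which Example 22–1 references but does not show, are defined elsewhere in the library.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:cmn="http://datypic.com/common"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord"
           elementFormDefault="qualified">

  <!-- common.xsd is an assumed file name for the library in Example 22-1 -->
  <xs:import namespace="http://datypic.com/common"
             schemaLocation="common.xsd"/>

  <xs:element name="lineItem" type="LineItemType"/>

  <!-- Each element declaration reuses a type from the library -->
  <xs:complexType name="LineItemType">
    <xs:sequence>
      <xs:element name="productID" type="cmn:ProductIDType"/>
      <xs:element name="price" type="cmn:PriceType"/>
      <xs:element name="weight" type="cmn:MeasurementType" minOccurs="0"/>
      <xs:element name="shipTo" type="cmn:AddressType"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Because the library defines types rather than elements, the importing schema is free to choose its own element names, such as shipTo for an address.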

22.2 Extending schemas

In some cases, you want to reuse existing schema components, but you
have specific extensions you need to add to make them useful to
you. Creating a completely new schema that copies the original
definitions is tempting because it is easy and flexible. You do not have to
worry about basing the new definitions on the original ones. However,
there are some important drawbacks.

# Your new instances could be completely incompatible with the
original ones.
# You will have duplicate definitions of the same components.
This makes maintenance more difficult and discourages
consistency.
# You do not have a record of the differences between the two
definitions.

This section identifies several ways in which XML Schema allows
extension. It describes both how to make your schemas extensible and
how to extend others’ schemas. The various extension mechanisms are
summarized in Table 22–1.

Table 22–1 Comparison of extension mechanisms

Mechanism                  Is an extended    Does it require   Can it define      Must element
                           instance valid    use of xsi:type   extended types     extensions appear
                           against original  in instance?      in a different     at the end of a
                           definition?                         namespace?         content model?
Wildcards                  yes               no                yes                no
Open content (1.1)         yes               no                yes                no
Type substitution          no                yes               yes                yes†
Substitution groups        no                no                yes                no
Type redefinition          no                no                no                 yes
Named group redefinition   no                no                no                 no
Overrides (1.1)            no                no                no                 no

(1.1) marks mechanisms that are available only in version 1.1.
† Except when an all group is extended with another all group, which is only
allowed in version 1.1.

Most of these extension methods take some planning, or at least
require the use of certain design characteristics, such as global
components, when the original schemas are being created. If you are designing
an XML vocabulary, particularly a complex one or one that you intend
for other organizations to use and extend, you should choose one or
more of these extension methods and design your schemas accordingly.
It is a good idea to document the extension method you have in mind,
with examples, in your Naming and Design Rules document.
If you are in the position of extending another schema over which
you have no control, you may not be able to use all of these methods,
depending on how the original schema was designed.

22.2.1 Wildcards

Wildcards are the most straightforward way to define extensible types.
They can be used to allow additional elements and attributes in your
instances. Of the methods of extension discussed in this chapter,
wildcards and open content are the only ones that allow an
instance with extensions to validate against the original schema. All the
other methods require defining a new schema for the extensions.
Example 22–2 shows a complex type definition that contains both
an element wildcard (the any element) and an attribute wildcard
(the anyAttribute element). For a complete discussion of element
and attribute wildcards, see Sections 12.7.1 on p. 285 and 12.7.3 on
p. 298, respectively.

Example 22–2. Original type using wildcards

<xs:complexType name="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="size" type="xs:integer" minOccurs="0"/>
<xs:any minOccurs="0" maxOccurs="unbounded"
namespace="##other" processContents="lax"/>
</xs:sequence>
<xs:anyAttribute namespace="##other" processContents="skip"/>
</xs:complexType>

There are several things to note about the definition of
ProductType.

# The namespace constraint is set to ##other. This prevents
erroneous content from validating, such as a product element
that contains two size elements. It also avoids
non-deterministic content models that violate the Unique Particle
Attribution rule, as described at the end of this section.
# The value of processContents is lax. This allows the instance
author to provide hints as to where to find the declarations for
the additional elements or attributes. If they do not provide
hints, or the particular processor ignores the hints, it is not a
problem; no errors will be raised. However, if the declarations
can be found, the additional elements will be validated against
them.
# The values of minOccurs and maxOccurs are 0 and
unbounded, respectively. This allows zero, one, or many
replacement elements to appear. The values of these two attributes
default to 1, which is generally not the intention of the schema
author.
# The wildcard appears at the end of the complex type definition.
This allows replacement elements only after the defined content
model. This is similar to the way extension works. You are
permitted to put wildcards anywhere in the content model, but
it might make processing the instance more difficult. With a
wildcard at the end, the application can process what it is
expecting and ignore the rest.
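
To make the first note concrete, here is a sketch of an instance that the ##other constraint rejects. It assumes that ProductType belongs to a schema whose target namespace is http://datypic.com/ord, that local element declarations are qualified, and that order and product are declared there; none of this is shown in Example 22–2, so treat these as assumptions.

```xml
<order xmlns="http://datypic.com/ord">
  <product>
    <number>557</number>
    <name>Short-Sleeved Linen Blouse</name>
    <size>10</size>
    <!-- Invalid: this second size element is in the target namespace,
         so it cannot match the ##other wildcard, and the declared
         size element allows only one occurrence. -->
    <size>12</size>
  </product>
</order>
```

Had the wildcard used ##any instead, the second size element would have matched the wildcard and the error would have gone unnoticed.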

Suppose some additional features have been added to the ordering
process, such as a points system to reward regular customers and a gift
wrap capability. The instance shown in Example 22–3 takes
advantage of the wildcards in the ProductType definition to add
an spc:giftWrap element to the end of the content, as well as an
spc:points attribute.

Example 22–3. Instance with extensions

<order xmlns="http://datypic.com/ord"
xmlns:spc="http://datypic.com/spc">
<product spc:points="100">
<number>557</number>
<name>Short-Sleeved Linen Blouse</name>
<size>10</size>
<spc:giftWrap>ADULT BDAY</spc:giftWrap>
</product>
</order>

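Since processContents was set to lax on the element wildcard and to
skip on the attribute wildcard, the instance shown would
be valid according to the original schema, without specifying any
declarations for the new attribute and element. If you want to validate the
new attribute and element, you can create a schema that contains their
declarations, as shown in Example 22–4. (Because the attribute wildcard
in Example 22–2 uses skip, the points attribute would be validated only
if that wildcard were changed to lax.)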

Example 22–4. Schema for extensions

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://datypic.com/spc"
targetNamespace="http://datypic.com/spc">
<xs:element name="giftWrap" type="xs:string"/>
<xs:attribute name="points" type="xs:nonNegativeInteger"/>
</xs:schema>

Note that the element and attribute declarations are global. This is
necessary so that the processor can find the declarations.
Another approach for “extending” complex types with wildcards is
actually to restrict them. You could define a complex type that restricts
ProductType and includes the declarations of giftWrap and points.
For more information, see Section 13.5.2.3 on p. 322.
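
A sketch of that restriction approach follows. It is illustrative only: the type name RestrictedProductType and the file names original.xsd and spc.xsd are hypothetical, and it assumes the original schema has the target namespace http://datypic.com/ord with qualified local elements, so that the repeated element declarations match those of the base type.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:spc="http://datypic.com/spc"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord"
           elementFormDefault="qualified">

  <!-- Assumed file names for the original schema and the extension schema -->
  <xs:include schemaLocation="original.xsd"/>
  <xs:import namespace="http://datypic.com/spc" schemaLocation="spc.xsd"/>

  <xs:complexType name="RestrictedProductType">
    <xs:complexContent>
      <xs:restriction base="ProductType">
        <xs:sequence>
          <!-- A restriction repeats the content model it keeps -->
          <xs:element name="number" type="xs:integer"/>
          <xs:element name="name" type="xs:string"/>
          <xs:element name="size" type="xs:integer" minOccurs="0"/>
          <!-- The element wildcard is narrowed to one specific element -->
          <xs:element ref="spc:giftWrap" minOccurs="0"/>
        </xs:sequence>
        <!-- The attribute wildcard is narrowed to one specific attribute -->
        <xs:attribute ref="spc:points"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```

Instances of the restricted type remain valid against the original ProductType, which is what distinguishes this from the extension-based approaches later in this chapter.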
The advantage of using wildcards for making types extensible is that
this is very flexible: The instance author is not required to have a
schema that declares the replacement elements and attributes. However,
in some cases this flexibility may be a little too forgiving, as it can
obscure real errors.
One challenge of using wildcards in version 1.0 is that if the wildcards
allow extensions in the same namespace, i.e., the target namespace of
the schema, you can run into Unique Particle Attribution violations.
If the wildcard is preceded by a declaration for an optional element,
the processor does not know whether to use the element declaration
or the wildcard to validate an element whose name matches the
declaration. Fortunately, this is alleviated in version 1.1, and the processor
will always choose the element declaration.
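
The problem can be sketched by changing the namespace constraint of Example 22–2 to ##any. The schema wrapper and target namespace below are assumptions added to make the fragment self-contained.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord"
           elementFormDefault="qualified">
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
      <!-- Optional element immediately followed by a wildcard -->
      <xs:element name="size" type="xs:integer" minOccurs="0"/>
      <!-- ##any also matches a size element, so in version 1.0 a size
           element in the instance could be attributed to either particle:
           a Unique Particle Attribution violation. In version 1.1 the
           element declaration takes precedence and the schema is allowed. -->
      <xs:any minOccurs="0" maxOccurs="unbounded"
              namespace="##any" processContents="lax"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```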

22.2.2 Open content

Open content is an even more flexible form of wildcards available
starting in version 1.1. Complex types that have open content can allow
replacement elements to appear anywhere within their content, not just
in places designated by element wildcards.
Example 22–5 shows a complex type definition that has open
content. Note that openContent doesn’t apply to attributes, so it is
necessary to include an attribute wildcard to support any attribute
extensions. For a complete discussion of open content, see Section 12.7.2
on p. 292.

Example 22–5. Original type using open content

<xs:complexType name="ProductType">
<xs:openContent>
<xs:any namespace="##other" processContents="lax"/>
</xs:openContent>
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="size" type="xs:integer" minOccurs="0"/>
</xs:sequence>
<xs:anyAttribute namespace="##other" processContents="skip"/>
</xs:complexType>

The use of the openContent element means that the extension
elements can appear interleaved anywhere in the content. To allow them
to appear only at the end, you can use a mode="suffix" attribute
on openContent. The instance shown in Example 22–6 takes
advantage of the open content in the ProductType definition to add an
spc:giftWrap element into the middle of the content.

Example 22–6. Instance with open content extensions

<order xmlns="http://datypic.com/ord"
xmlns:spc="http://datypic.com/spc">
<product spc:points="100">
<number>557</number>
<spc:giftWrap>ADULT BDAY</spc:giftWrap>
<name>Short-Sleeved Linen Blouse</name>
<size>10</size>
</product>
</order>
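
For comparison, the suffix mode mentioned above can be sketched as follows. The schema wrapper and target namespace are assumptions added for self-containment, and a version 1.1 processor is required. With this definition, the instance in Example 22–6 would no longer be valid, because spc:giftWrap would have to follow size.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord"
           elementFormDefault="qualified">
  <xs:complexType name="ProductType">
    <!-- suffix: replacement elements are allowed only after the
         declared content model, not interleaved with it -->
    <xs:openContent mode="suffix">
      <xs:any namespace="##other" processContents="lax"/>
    </xs:openContent>
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="size" type="xs:integer" minOccurs="0"/>
    </xs:sequence>
    <xs:anyAttribute namespace="##other" processContents="skip"/>
  </xs:complexType>
</xs:schema>
```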

As with the wildcard example, since processContents was set to
lax, the instance shown would be valid according to the original
schema, without specifying any declarations for the new attribute and
element. If you want to validate the new attribute and element, you
can create a schema that declares them globally, as was shown in
Example 22–4.

22.2.3 Type substitution

Deriving new types from the existing types is another possibility. You
can create a new schema whose types extend the original types.
You would then have to indicate the new types in the instance, using
the xsi:type attribute. Unlike the wildcard approach, instances that
contain extensions would not be valid according to the original schema.
If you want to use the extended instance as a replacement for the
original instance, you should first check to make sure that your application
can handle the new extended instance.
This approach is appropriate when you want to extend a schema
over which you have no control. Example 22–7 shows a complex type
that you might want to extend.

Example 22–7. Original type
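
The listing is not reproduced here. A minimal sketch consistent with the notes that follow, assuming the same three element declarations used for ProductType elsewhere in this chapter, might look like this. The schema wrapper and target namespace are assumptions, and this is a reconstruction rather than the original listing.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord"
           elementFormDefault="qualified">
  <!-- Named, with no block or final attribute, and built on a sequence group -->
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="size" type="xs:integer" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```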

There are several things to note about the definition of
ProductType.

# It is a named complex type. Anonymous complex types cannot
be extended.
# There are no block or final attributes to prohibit type
derivation or substitution.
# A sequence group is used. Extension does not work well for
choice groups, as described in the next section. For all groups,
extension is forbidden in version 1.0 but permitted (and useful)
in version 1.1.

Example 22–8 shows an extension of the original ProductType.
For more information on complex content extension, see Section 13.4.2
on p. 307.
The instance shown in Example 22–9 conforms to the extended type
definition, but not the base type definition. It is identical to the instance
using wildcards shown in Example 22–3, except that the xsi:type
attribute appears in the product tag. For more information on
type substitution, see Section 13.6 on p. 341.

Example 22–8. Extended type

<xs:complexType name="ExtendedProductType">
<xs:complexContent>
<xs:extension base="ProductType">
<xs:sequence>
<xs:element ref="spc:giftWrap" minOccurs="0"/>
</xs:sequence>
<xs:attribute ref="spc:points"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>

Example 22–9. Instance using extended type

<order xmlns="http://datypic.com/ord"
xmlns:spc="http://datypic.com/spc"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<product spc:points="100" xsi:type="ExtendedProductType">
<number>557</number>
<name>Short-Sleeved Linen Blouse</name>
<size>10</size>
<spc:giftWrap>ADULT BDAY</spc:giftWrap>
</product>
</order>

22.2.4 Substitution groups

As we saw in Section 13.4.2.1 on p. 309, extending a content model
that contains a choice group can have unexpected results.
Example 22–10 shows a type ExpandedItemsType that extends ItemsType
to add new product types. Intuitively, you may think that the two
additional element declarations, sweater and suit, are added to the
choice group, allowing a choice among the five elements. In fact, the
effective content model of ExpandedItemsType is a sequence group
that contains two choice groups. As a result, ExpandedItemsType
will require any of the shirt, hat, and umbrella elements to appear
before any of the sweater or suit elements.

Example 22–10. choice group extension

<xs:complexType name="ItemsType">
<xs:choice maxOccurs="unbounded">
<xs:element ref="shirt"/>
<xs:element ref="hat"/>
<xs:element ref="umbrella"/>
</xs:choice>
</xs:complexType>
<xs:complexType name="ExpandedItemsType">
<xs:complexContent>
<xs:extension base="ItemsType">
<xs:choice maxOccurs="unbounded">
<xs:element ref="sweater"/>
<xs:element ref="suit"/>
</xs:choice>
</xs:extension>
</xs:complexContent>
</xs:complexType>

Substitution groups are a better way to extend choice groups. If
you add another element declaration, otherProduct, to the choice
group in ItemsType, it can serve as the head of a substitution group.
This makes extending the choice much easier. The element declarations
for sweater and suit can be supplied in another schema document,
even in another namespace.
In Example 22–11, the otherProduct element declaration is added
to act as the head of the substitution group. It would also have been
legal to simply make umbrella the head of the substitution group,
but this would be less intuitive and would prevent you from ever
allowing umbrella without also allowing sweater and suit in its
place.

Example 22–11. Original type with an abstract element declaration

<xs:complexType name="ItemsType">
<xs:choice maxOccurs="unbounded">
<xs:element ref="shirt"/>
<xs:element ref="hat"/>
<xs:element ref="umbrella"/>
<xs:element ref="otherProduct"/>
</xs:choice>
</xs:complexType>

<xs:element name="otherProduct" type="ProductType"
abstract="true"/>

Example 22–12 shows the two element declarations that are
substitutable for otherProduct.

Example 22–12. Extension using substitution groups

<xs:element name="sweater" substitutionGroup="otherProduct"/>
<xs:element name="suit" substitutionGroup="otherProduct"/>

Example 22–13 shows a valid instance. As you can see, the child
elements can appear in any order. In this case, they are all in the same
namespace. It is also possible for substitution element declarations to
be in different namespaces.

Example 22–13. Instance using extension via substitution groups
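
The listing is not reproduced here. As a sketch of what such an instance could look like, assume a hypothetical items element of type ItemsType, assume that shirt and hat are declared with ProductType (as sweater and suit are, through the head of their substitution group), and assume qualified local elements in the http://datypic.com/ord namespace. The product numbers and names are invented for illustration.

```xml
<items xmlns="http://datypic.com/ord">
  <!-- sweater and suit appear wherever otherProduct is allowed -->
  <sweater>
    <number>801</number>
    <name>Wool Cardigan</name>
  </sweater>
  <shirt>
    <number>557</number>
    <name>Short-Sleeved Linen Blouse</name>
    <size>10</size>
  </shirt>
  <suit>
    <number>802</number>
    <name>Two-Piece Suit</name>
  </suit>
  <hat>
    <number>563</number>
    <name>Sun Hat</name>
  </hat>
</items>
```

Unlike the choice group extension in Example 22–10, the original and substituted elements can be freely mixed, because they all belong to the same repeating choice group.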

It would have also been valid to put an element wildcard in the
choice group. However, the substitution group approach is more
controlled, because you can specifically designate the substitutable
element declarations. For complete coverage of substitution groups,
see Chapter 16.
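
Substitutable element declarations in a different namespace can be sketched as follows. This assumes that the original schema, with ItemsType and otherProduct, has the target namespace http://datypic.com/ord and is stored in a file named original.xsd; both are assumptions for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ord="http://datypic.com/ord"
           xmlns="http://datypic.com/spc"
           targetNamespace="http://datypic.com/spc">

  <!-- original.xsd is an assumed file name for the original schema -->
  <xs:import namespace="http://datypic.com/ord"
             schemaLocation="original.xsd"/>

  <!-- No type is given, so each element takes the type of the head,
       ProductType, from the original namespace -->
  <xs:element name="sweater" substitutionGroup="ord:otherProduct"/>
  <xs:element name="suit" substitutionGroup="ord:otherProduct"/>
</xs:schema>
```

In an instance, these elements would then appear with a prefix bound to the http://datypic.com/spc namespace, among the unprefixed shirt, hat, and umbrella elements.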

22.2.5 Type redefinition

Redefinition, unlike type substitution, does not require the use of the
xsi:type attribute in instances. The redefined components have
the same name as they had in the original definition. However,
redefinition can only be done within the same namespace, so it is not
appropriate for altering schemas over which you have no control. In addition,
redefinition has some risks associated with it, as detailed in Section 18.3
on p. 468.
The original type might look exactly like the one shown in
Example 22–7, with similar constraints. It must be named, and it should
use a sequence group. Example 22–14 shows a redefinition of
ProductType to add a new element declaration and attribute
declaration. It is similar to the definition of the derived type shown in
Example 22–8, with two important differences.

1. It is defined entirely within the redefine element.
2. The extended type and the original type have the same name.

For more information on type redefinition, see Section 18.1.4 on
p. 453.
Again, a valid instance would look like Example 22–3.

Example 22–14. Redefined type

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:spc="http://datypic.com/spc"
xmlns="http://datypic.com/ord"
targetNamespace="http://datypic.com/ord">
<xs:import namespace="http://datypic.com/spc"/>
<xs:redefine schemaLocation="original.xsd">
<xs:complexType name="ProductType">
<xs:complexContent>
<xs:extension base="ProductType">
<xs:sequence>
<xs:element ref="spc:giftWrap" minOccurs="0"/>
</xs:sequence>
<xs:attribute ref="spc:points"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
</xs:redefine>
</xs:schema>

Although a redefinition of the type must take place in the same
namespace, the extended element and attribute declarations are not
required to be in that namespace. In our example, they are not.

22.2.6 Named group redefinition

Another alternative is to define named model groups and attribute
groups, and redefine these groups. This is less rigid than redefining
types because the extensions do not have to be at the end of the content
models.
Example 22–15 shows the original ProductType definition,
this time using a named model group and an attribute group. The
entire content model of the type is contained in the group
ProductPropertyGroup.

Example 22–15. Original type

<xs:complexType name="ProductType">
<xs:group ref="ProductPropertyGroup"/>
<xs:attributeGroup ref="ExtensionGroup"/>
</xs:complexType>

<xs:group name="ProductPropertyGroup">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="size" type="xs:integer" minOccurs="0"/>
</xs:sequence>
</xs:group>

<xs:attributeGroup name="ExtensionGroup"/>

Example 22–16 shows a redefinition of the named model group and
attribute group. Redefining the groups affects all the complex types
that reference those groups.

Example 22–16. Redefined named model group and attribute group

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:spc="http://datypic.com/spc"
xmlns="http://datypic.com/ord"
targetNamespace="http://datypic.com/ord">
<xs:import namespace="http://datypic.com/spc"/>
<xs:redefine schemaLocation="original.xsd">
<xs:group name="ProductPropertyGroup">
<xs:sequence>
<xs:element ref="spc:giftWrap"/>
<xs:group ref="ProductPropertyGroup"/>
</xs:sequence>
</xs:group>
<xs:attributeGroup name="ExtensionGroup">
<xs:attributeGroup ref="ExtensionGroup"/>
<xs:attribute ref="spc:points"/>
</xs:attributeGroup>
</xs:redefine>
</xs:schema>

A valid instance would look like the one shown in Example 22–17.
In this case, giftWrap appears as the first child of product.

Example 22–17. Instance using redefined named model group and attribute group

<order xmlns="http://datypic.com/ord"
xmlns:spc="http://datypic.com/spc">
<product spc:points="100">
<spc:giftWrap>ADULT BDAY</spc:giftWrap>
<number>557</number>
<name>Short-Sleeved Linen Blouse</name>
<size>10</size>
</product>
</order>

22.2.7 Overrides

Starting in version 1.1, overrides can be used instead of redefines. In
fact, they are preferred because redefines are deprecated. Overrides
work similarly to redefines, but have the advantage of being more
flexible. The new definition does not have to relate to the original definition
in any way.
Example 22–18 shows a schema similar to Example 22–14, but with
an override instead of a redefine. In our case, we chose to modify
it in a similar way: add the spc:giftWrap element declaration at the
end and add the spc:points attribute. However, the spc:giftWrap
element declaration could have appeared anywhere in the content
model; in fact, the original element declarations could have been
removed or reordered.

Example 22–18. Overridden type

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:spc="http://datypic.com/spc"
xmlns="http://datypic.com/ord"
targetNamespace="http://datypic.com/ord"
elementFormDefault="qualified">
<xs:import namespace="http://datypic.com/spc"/>
<xs:override schemaLocation="original.xsd">
<xs:complexType name="ProductType">
<xs:sequence>
<xs:element name="number" type="xs:integer"/>
<xs:element name="name" type="xs:string"/>
<xs:element name="size" type="xs:integer" minOccurs="0"/>
<xs:element ref="spc:giftWrap" minOccurs="0"/>
</xs:sequence>
<xs:attribute ref="spc:points"/>
</xs:complexType>
</xs:override>
</xs:schema>
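
Because an override does not have to relate to the original definition, it can also reorder or drop the original element declarations. The following sketch shows a variation on Example 22–18 in which name comes first, spc:giftWrap appears in the middle, and size is removed. It is illustrative only, and such a change would of course make many existing instances invalid.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:spc="http://datypic.com/spc"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord"
           elementFormDefault="qualified">
  <xs:import namespace="http://datypic.com/spc"/>
  <xs:override schemaLocation="original.xsd">
    <!-- Replaces the original ProductType outright -->
    <xs:complexType name="ProductType">
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element ref="spc:giftWrap" minOccurs="0"/>
        <xs:element name="number" type="xs:integer"/>
        <!-- size from the original definition is intentionally omitted -->
      </xs:sequence>
      <xs:attribute ref="spc:points"/>
    </xs:complexType>
  </xs:override>
</xs:schema>
```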

Overrides can also be used on named groups. Example 22–19 shows
a schema similar to Example 22–16, again replacing the redefine
with override.
A valid instance would look like the one shown in Example 22–20.
In this case, giftWrap appears as the first child of product.

Example 22–19. Overridden named model group and attribute group

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:spc="http://datypic.com/spc"
           xmlns="http://datypic.com/ord"
           targetNamespace="http://datypic.com/ord"
           elementFormDefault="qualified">
  <xs:import namespace="http://datypic.com/spc"/>
  <xs:override schemaLocation="original.xsd">
    <xs:group name="ProductPropertyGroup">
      <xs:sequence>
        <xs:element ref="spc:giftWrap"/>
        <xs:element name="number" type="xs:integer"/>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="size" type="xs:integer" minOccurs="0"/>
      </xs:sequence>
    </xs:group>
    <xs:attributeGroup name="ExtensionGroup">
      <xs:attribute ref="spc:points"/>
    </xs:attributeGroup>
  </xs:override>
</xs:schema>

Example 22–20. Instance using overridden named model group and attribute group

<order xmlns="http://datypic.com/ord"
       xmlns:spc="http://datypic.com/spc">
  <product spc:points="100">
    <spc:giftWrap>ADULT BDAY</spc:giftWrap>
    <number>557</number>
    <name>Short-Sleeved Linen Blouse</name>
    <size>10</size>
  </product>
</order>

Chapter 23
Versioning

As business and technical requirements change over time, you
will need to define new versions of your schemas. Defining
new versions is a special case of extension and restriction.
You may be both adding and removing components, with
the intention of replacing the older version.

When you create a new version intended to replace a previous one,
you should create a completely new schema rather than attempt to
extend or restrict the existing one. Otherwise, as time goes on and
additional versions are created, the definitions could become
unnecessarily complicated and difficult to process. If you are not using
the restriction and extension mechanisms of XML Schema, though, you
need to take extra care to make the new definitions compatible with the
old ones.

23.1 Schema compatibility

In many cases, you will want to maintain some compatibility between
versions. You might want to allow instances to be validated against
either schema, or be processed by an application that supports either
version. This is especially true if your instances persist for a period. If
your instances are short-lived messages between applications,
compatibility is less of an issue. However, you should still try to be as
consistent as possible to reduce learning curves and minimize the
changes in the applications that process the instances.
There are two kinds of compatibility:

1. Backward compatibility, where all instances that conform to the
previous version of the schema are also valid according to
the new version
2. Forward compatibility, where all instances that conform to the
new version are also valid according to the previous version of
the schema

23.1.1 Backward compatibility

Ideally, you should have backward compatibility of the schemas
from one version to the next. That is, instances that were created to
conform to version 2.0 of the schema should also be valid according
to version 2.1. (Example version numbers in this chapter start with 2.0
to make clear that they identify the version of the vocabulary being
defined by the schema, and to avoid confusion with the versions of the
XML Schema language itself, which are 1.0 and 1.1.) This is possible if
you are only adding optional new components and/or reducing
restrictiveness. To accomplish this, the previous version must allow a
subset of what is allowed by the new version.

The following changes to a schema are backward-compatible:

# Adding optional elements and attributes.
# Making required elements and attributes optional.
# Making occurrence constraints less restrictive, for example
allowing more than one color element where only one
was allowed before.
# Turning specific element declarations into choice groups. For
example, where color was allowed, now it can be color or
size or weight. Similarly, you can declare new substitution
groups. For example, where the content model allowed color,
now size and weight are valid substitutes.
# Making simple types less restrictive by making bounds facets
and length facets less restrictive, adding enumeration values, or
making patterns less restrictive.
# Turning a simple type into a union of that simple type and one
or more other simple types.
# Turning a simple type into a list type that allows multiple values
of the original type.
# Adding optional wildcards or open content.
# Replacing element or attribute declarations with wildcards.
# Turning a sequence group into an all group or a repeating
choice group.
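As an illustration of the simple-type items in this list, the following sketch shows a hypothetical version 2.1 schema that widens a size value by turning its original integer type into a union. The type names NumericSizeType and SizeType, the bounds, and the enumeration values are assumptions made for this sketch; they are not taken from the examples in this chapter.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           version="2.1">
  <xs:element name="size" type="SizeType"/>
  <!-- What version 2.0 is assumed to have allowed: integers from 2 to 18 -->
  <xs:simpleType name="NumericSizeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Version 2.1: a union of the original type and a new set of tokens,
       so every value that was valid in 2.0 is still valid -->
  <xs:simpleType name="SizeType">
    <xs:union memberTypes="NumericSizeType">
      <xs:simpleType>
        <xs:restriction base="xs:token">
          <xs:enumeration value="small"/>
          <xs:enumeration value="medium"/>
          <xs:enumeration value="large"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>
</xs:schema>
```

Because the original member type is kept unchanged inside the union, a 2.0 value such as 10 remains valid, while 2.1 instances may additionally use values such as medium.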

The following changes to a schema are not backward-compatible:

# Changing the order of elements or imposing an order where
none was imposed previously.
# Changing the structure of elements, for example adding more
levels of elements.
# Removing any element or attribute declarations.
# Removing wildcards or open content, or making them more
restrictive in terms of what namespaces they allow or how
strictly replacement elements are validated.
# Changing the names of any elements or attributes.
# Changing the target namespace of the schema.
# Adding any required elements or attributes.
# Making optional elements or attributes required.
# Making occurrence constraints more restrictive, for example
allowing only one color element where more than one
was allowed before.
# Making simple types more restrictive by making bounds facets
and length facets more restrictive, removing enumeration values,
or making patterns more restrictive.

For example, suppose you have the complex type definition shown
in Example 23–1. Its version number is 2.0.

Example 23–1. Version 2.0 of a complex type

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           version="2.0">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer" minOccurs="0"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="size" type="SizeType"
                  maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="SizeType">
    <xs:simpleContent>
      <xs:extension base="xs:integer">
        <xs:attribute name="system" type="xs:token"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:schema>

Example 23–2 shows a backward-incompatible definition for a new
version, 2.1.

Example 23–2. Backward-incompatible definition

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           version="2.1">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="size" type="SizeType" maxOccurs="3"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="description" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="SizeType">
    <xs:simpleContent>
      <xs:extension base="xs:positiveInteger">
        <xs:attribute name="system" type="xs:token"
                      use="required"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:schema>

It is backward-incompatible for a number of reasons.

# The order of the element declarations changed; name is now
after size.
# The number element was removed, which is incompatible even
though it was optional.
# A required description element was added.
# The optional system attribute was made required.
# The occurrence constraints on size were made more restrictive.
# The contents of SizeType were made more restrictive, allowing
only positive integers instead of all integers.

As a result of all these changes, an instance that conformed to
version 2.0 may not be valid according to version 2.1. Example 23–3
shows such an instance.

Example 23–3. Backward-incompatible instance

<product>
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
  <size>0</size>
  <size>2</size>
  <size>4</size>
  <size>6</size>
</product>

On the other hand, Example 23–4 shows a definition that is
backward-compatible.

Example 23–4. Backward-compatible definition

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           version="2.1">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"
                  minOccurs="0" maxOccurs="unbounded"/>
      <xs:element name="name" type="xs:string" minOccurs="0"/>
      <xs:element name="size" type="SizeType"
                  maxOccurs="unbounded"/>
      <xs:element name="desc" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="SizeType">
    <xs:simpleContent>
      <xs:extension base="xs:decimal">
        <xs:attribute name="system" type="xs:token"/>
        <xs:attribute name="units" type="xs:token"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:schema>

All of the changes in this example are backward-compatible because
they do not affect the validity of version 2.0 instances. For example:

# No element or attribute declarations were removed or reordered.
# Only declarations for optional elements and attributes (desc
and units) were added.
# The required name element was made optional.
# The number element was made repeating, which is less
restrictive.
# The contents of SizeType were made less restrictive, allowing
any decimal number instead of only an integer.

23.1.2 Forward compatibility

Some schema designers take their versioning strategy a step further:
They make their schemas forward-compatible, so that a version 2.1
instance is valid according to the version 2.0 schema. This requires
some careful planning when developing the 2.0 schema. An area needs
to be set aside for the elements that might be added in version 2.1.
This area needs to be allowed to contain unspecified content in the 2.0
schema, but be more specifically defined (by adding new element dec-
larations) in the 2.1 schema.
This is typically done by defining wildcards in the original schema.
In Example 23–5, both element and attribute wildcards are used in the
version 2.0 schema. The processContents option is set to skip so
that the processor does not look for declarations that do not exist
in this version of the schema.

Example 23–5. Version 2.0 of a forward-compatible complex type

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           version="2.0">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer" minOccurs="0"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="size" type="xs:integer"
                  maxOccurs="unbounded"/>
      <xs:any minOccurs="0" maxOccurs="unbounded"
              processContents="skip"/>
    </xs:sequence>
    <xs:anyAttribute processContents="skip"/>
  </xs:complexType>
</xs:schema>

Example 23–6 shows version 2.1 of the schema, with a new element
desc and a new attribute dept. This version of the schema also includes
element and attribute wildcards to allow it to be forward-compatible
with version 2.2 of the schema.

Example 23–6. Version 2.1 of a forward-compatible complex type

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           version="2.1">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer" minOccurs="0"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="size" type="xs:integer"
                  maxOccurs="unbounded"/>
      <xs:element name="desc" type="xs:string"/>
      <xs:any minOccurs="0" maxOccurs="unbounded"
              processContents="skip"/>
    </xs:sequence>
    <xs:attribute name="dept" type="xs:token"/>
    <xs:anyAttribute processContents="skip"/>
  </xs:complexType>
</xs:schema>

Example 23–7 shows an instance that is valid according to
version 2.1, but is also allowed by version 2.0 because of the wildcards.

Example 23–7. Forward-compatible instance

<product dept="WMN">
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
  <size>0</size>
  <desc>Our best-selling shirt</desc>
</product>

The method shown in Examples 23–5 and 23–6 works fine, but
only because size is required. In version 1.0 of XML Schema, if size
were optional, this complex type would violate the Unique Particle
Attribution rule. A processor, upon encountering a size element,
would not know whether to use the size element declaration or
the wildcard to validate it. In version 1.1, this constraint has been
relaxed: the element declaration is always used instead of
the wildcard when both might apply.

In version 1.0, this problem can be avoided either by inserting a
dummy required element at the end of the defined content model, or
by putting the wildcard inside a child element, for example one called
extension. Unfortunately, neither is a great option, because for every
minor version there needs to be a new dummy element or extension
child, cluttering up the content model. Instead, it is highly
recommended that you upgrade to XML Schema 1.1 if you need forward
compatibility.
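The second version 1.0 workaround, putting the wildcard inside a child element, might be sketched as follows. This is a hypothetical variation on Example 23–5 in which size is optional; the element name extension comes from the text above, while the type name ExtensionType is an assumption.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           version="2.0">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer" minOccurs="0"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="size" type="xs:integer" minOccurs="0"/>
      <!-- The wrapper keeps the wildcard away from the optional
           size declaration, so the two never compete -->
      <xs:element name="extension" type="ExtensionType" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ExtensionType">
    <xs:sequence>
      <xs:any minOccurs="0" maxOccurs="unbounded"
              processContents="skip"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

A later minor version would declare its new elements after extension and would need a further wrapper of its own, which is the clutter the text warns about.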
One disadvantage of forward compatibility is that having an
open-ended wildcard on every type makes the schemas very flexible. The
wildcard in Example 23–6 will allow any replacement elements,
including ones that are already declared in that version 2.1 schema. In
version 1.1 of XML Schema, this can be mitigated by putting a
notQName attribute on the wildcard. With notQName="##defined",
a replacement element cannot be one that has a global declaration in
the schema; with notQName="##definedSibling", it cannot be one
that is declared in the same content model as the wildcard.
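A sketch of how Example 23–6 might look with such a restriction, assuming an XML Schema 1.1 processor, is shown below. It lists both ##defined, which excludes globally declared elements such as product, and ##definedSibling, which excludes elements declared in the same content model (number, name, size, and desc, which are local declarations here). Combining the two values is a choice made for this sketch.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           version="2.1">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer" minOccurs="0"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="size" type="xs:integer"
                  maxOccurs="unbounded"/>
      <xs:element name="desc" type="xs:string"/>
      <!-- Requires XML Schema 1.1: replacement elements may not reuse
           names already declared globally or in this content model -->
      <xs:any minOccurs="0" maxOccurs="unbounded"
              processContents="skip"
              notQName="##defined ##definedSibling"/>
    </xs:sequence>
    <xs:attribute name="dept" type="xs:token"/>
    <xs:anyAttribute processContents="skip"/>
  </xs:complexType>
</xs:schema>
```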
In fact, version 1.1 offers a number of features to make forward
compatibility easier, including:

# Open content, where wildcards can be automatically interleaved
everywhere in a type. This removes the need to specify wildcards
between each pair of adjacent elements when maximum future
flexibility is needed. Open content is covered in Section 12.7.2
on p. 292.
# Negative wildcards, where you can specify names and namespaces
that are not allowed as replacement elements, thus limiting
excessive flexibility of wildcards. Negative wildcards are covered
in Section 12.7.1.3 on p. 289.
# Looser restrictions on all groups, which means that it is easier
to create types where the order of child elements doesn’t matter.
This makes it easier to insert elements that can be interleaved
in future versions without requiring that all new content comes
at the end. This is covered in Section 12.5.4 on p. 276.
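As a rough illustration of the first item, the version 2.0 type from Example 23–5 could be rewritten with open content instead of a trailing wildcard. This is a sketch that assumes an XML Schema 1.1 processor; it is not one of the book's numbered examples.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           version="2.0">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <!-- Requires XML Schema 1.1: replacement elements may appear
         before, between, or after the declared elements -->
    <xs:openContent mode="interleave">
      <xs:any processContents="skip"/>
    </xs:openContent>
    <xs:sequence>
      <xs:element name="number" type="xs:integer" minOccurs="0"/>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="size" type="xs:integer"
                  maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:anyAttribute processContents="skip"/>
  </xs:complexType>
</xs:schema>
```

With this form, a later version can add a new element at any position in the sequence, not only at the end, and its instances remain acceptable to the 2.0 schema.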

Forward compatibility is harder to achieve and therefore less common.
However, it is a worthy goal, especially in cases where older
application code (designed to process prior versions) is likely to
persist unchanged for long periods of time.
Note that forward compatibility does not automatically include
backward compatibility. It is possible to introduce
backward-incompatible changes to a forward-compatible schema. In fact,
Example 23–6 is not backward-compatible because the desc element is
required. It is possible to have a 2.0 instance without a desc element,
in which case it is invalid in version 2.1. If both forward compatibility
and backward compatibility are desired, both must be considered when
designing schemas.

23.2 Using version numbers

23.2.1 Major and minor versions
The version numbers used in this chapter have the format of two
numbers separated by a period, for example, 2.1. It is implied that “2”
represents the major version number and “1” represents the minor
version number. There is no requirement for version numbers to have
this format in XML Schema. In fact, the version attribute will accept
any string. However, it is common practice to use numeric version
numbers because they make it easy to see the order over time.
It is also typical to use both major and minor version numbers.
A change in the minor version number only indicates a minor
release, one that has little impact in terms of the number or
extensiveness of changes. A change in the major version number
indicates a major release, which tends to be more disruptive and involve
more changes. Many designers of XML vocabularies make this
definition more formal: They use minor versions for releases that
are backward-compatible and major versions for releases that are
backward-incompatible.

Figure 23–1 depicts this approach. There is backward compatibility
among the 2.x releases, and backward compatibility among the 3.x
releases, but not between the two major releases. Within a particular
major version, there is backward compatibility from one release to the
next. Version 2.1 is designed to be backward-compatible
with 2.0. Version 2.2 should be built on version 2.1, including all of
the new (optional) elements and attributes and other changes made in
version 2.1, so it is backward-compatible with both versions 2.1 and 2.0.
When version 3.0 is released, it doesn’t have to be
backward-compatible with version 2.3 or any of the 2.x versions. This is
a chance to make significant changes. It may be useful during a major
release to consider making some of the optional elements and attributes
added in minor versions required. They may have been added as optional
simply to achieve backward compatibility in minor releases, even though
it was actually preferable to make them required. It is also an
opportunity to remove any elements or attributes that were deprecated in
previous releases.

Figure 23–1 Major and minor versions


23.2.2 Placement of version numbers

Every schema should have an associated version number. There are at
least four possible places to indicate the version number of a schema,
none of which is actually required by XML Schema. They are discussed
in this section.

23.2.2.1 Version numbers in schema documents

The version attribute of schema is an arbitrary string that represents
the version of the vocabulary being described by the schema document.
Note that it is not intended to convey whether you are using version 1.0
or 1.1 of the XML Schema language itself; there is no need to indicate
this in your schema document. The version attribute is strictly
for documentation; an XML Schema processor does not use it. It is
optional, but its use is encouraged.
Example 23–8 shows a schema that uses the version attribute to
indicate that it is version 2.1 of the schema, along with an instance that
conforms to it. The instance in this example is not doing anything
special to indicate the version of the schema to which it conforms.

Example 23–8. Using a version number in a schema

Schema (prod.xsd):


Instance:
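A minimal sketch of what such a schema and instance might look like is shown below. It assumes the product vocabulary and the http://datypic.com/prod namespace used in other examples in this chapter; the actual listing may differ.

```xml
<!-- prod.xsd: the version attribute records the vocabulary version -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified"
           version="2.1">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

```xml
<!-- Instance: nothing here identifies the schema version -->
<product xmlns="http://datypic.com/prod">
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
</product>
```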

The version attribute is intended to apply to the schema document
itself and all the components defined within it. It is also possible to use
non-native attributes or annotations to indicate version numbers for
individual components in a schema. Example 23–9 shows a schema
document that uses non-native attributes to add a doc:version
attribute to indicate that the catalog element declaration and its type
are at version 2.1, while the product element declaration and its
type are at version 2.0.

Example 23–9. Using a version number on individual schema components

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           xmlns:doc="http://datypic.com/doc"
           version="2.1">
  <xs:element name="product" type="ProductType" doc:version="2.0"/>
  <xs:complexType name="ProductType" doc:version="2.0">
    <xs:sequence>
      <xs:element name="number" type="xs:integer" minOccurs="0"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="catalog" type="CatalogType" doc:version="2.1"/>
  <xs:complexType name="CatalogType" doc:version="2.1">
    <xs:sequence>
      <xs:element name="catalog_id" type="xs:string"/>
      <xs:element ref="product" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

This may be useful as a way to clearly delineate which components
have changed over multiple versions. However, it does require some
extra management of the components.

23.2.2.2 Versions in schema locations

The filename or URL of the schema document can also contain the
version number. For example, the new version may have a
filename of prod_2.1.xsd or be located in a directory structure that
indicates the version number, for example 2.1/prod.xsd or
http://datypic.com/prod/2.1/prod.xsd. Changing the URL
makes it easier for other schema documents that may include or import
your schema document to continue to use the previous version until
they can upgrade. Example 23–10 shows a schema whose filename
contains its version number.

Example 23–10. Using a version number in the schema location

Schema (prod_2.1.xsd):


Instance:

<product xmlns="http://datypic.com/prod"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://datypic.com/prod prod_2.1.xsd">
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
</product>
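The schema document prod_2.1.xsd referenced by this instance might look like the following sketch. The content model is inferred from the instance and is an assumption; the version attribute on schema is optional and is repeated here only to keep it in step with the filename.

```xml
<!-- prod_2.1.xsd: the version number is part of the filename -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified"
           version="2.1">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```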

23.2.2.3 Versions in instances

It may be worthwhile to have instances identify the schema version to
which they conform. This allows an application to process them
accordingly. For example, in XSLT, the stylesheet element has a required
attribute named version. This is a signal to processors that the
stylesheet instance conforms to, for example, version 2.0 of XSLT.
Typically a version attribute in an instance appears on the root
element. Example 23–11 shows a schema and related instance where
the version number 2.1 is indicated on the root element. Note that the
version attribute has to be declared in the schema; it is not a special
attribute that can appear without a declaration.
Some schemas put a fixed value on the version declaration to
ensure that an instance can only be validated by a particular version of
a schema. For example, if fixed="2.1" were added to the attribute
declaration in Example 23–11, the version number would have to
be 2.1 (or the attribute would have to be absent) in the instance for
it to be valid according to this schema. However, this is not
recommended, at least for minor versions, because it breaks backward
compatibility.

Example 23–11. Using a version number in the instance

Schema (prod.xsd):

Instance:
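A minimal sketch of such a schema and instance is shown below, assuming the same product vocabulary as the surrounding examples. The xs:token type chosen for the version attribute is an assumption; the actual listing may differ.

```xml
<!-- prod.xsd: the version attribute must be declared like any other -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod"
           targetNamespace="http://datypic.com/prod"
           elementFormDefault="qualified"
           version="2.1">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="version" type="xs:token"/>
  </xs:complexType>
</xs:schema>
```

```xml
<!-- Instance: the root element states the version it conforms to -->
<product xmlns="http://datypic.com/prod" version="2.1">
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
</product>
```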

23.2.2.4 Versions in namespace names

Many vocabularies also indicate their version number in the namespace
name. When you change a namespace name, it is as if you were
completely renaming the components in that namespace. This instantly
breaks backward compatibility between schema versions, as the names
have essentially changed. It also frequently requires applications that
process the instances to change, since many XML technologies (such
as XPath, XQuery, and XSLT) are namespace-aware.

That may be desirable in the case of a major release where there is
no intention of backward compatibility and the instances change so
much that it is necessary for applications to change the way they
process the instances. It is definitely not appropriate for minor releases
intended to be backward-compatible. Therefore, when a version
number is included in a namespace name, it is frequently only the
major version number.
Example 23–12 shows a schema that uses the major version number
(“2”) in the namespace name.

Example 23–12. Using a version number in the namespace name

Schema (prod.xsd):

Instance:

<product xmlns="http://datypic.com/prod/2"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://datypic.com/prod/2 prod.xsd">
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
</product>
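The schema document prod.xsd referenced by this instance is sketched below, under the assumption that it declares the same product structure as the instance. Only the major version number appears in the target namespace; the version attribute on schema, an optional addition in this sketch, can still record the minor version.

```xml
<!-- prod.xsd: major version "2" is part of the namespace name -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod/2"
           targetNamespace="http://datypic.com/prod/2"
           elementFormDefault="qualified"
           version="2.1">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```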

23.2.2.5 A combination strategy

It is likely that you will use a combination of some, or all, of the four
version number locations. Example 23–13 shows a schema that uses
all four methods.

Example 23–13. Using multiple methods to indicate version number

Schema (schemas/prod/2.1/prod.xsd):

Instance:

<product xmlns="http://datypic.com/prod/2"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://datypic.com/prod/2
                             schemas/prod/2.1/prod.xsd"
         version="2.1">
  <number>557</number>
  <name>Short-Sleeved Linen Blouse</name>
</product>
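A schema that matches this instance and uses all four locations might look like the following sketch. It is inferred from the instance and the preceding sections rather than taken from the original listing, and the xs:token type of the version attribute is an assumption.

```xml
<!-- schemas/prod/2.1/prod.xsd: version in the path (2.1), in the
     schema's version attribute (2.1), in the namespace name (2),
     and declared as an attribute for use in instances -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://datypic.com/prod/2"
           targetNamespace="http://datypic.com/prod/2"
           elementFormDefault="qualified"
           version="2.1">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="version" type="xs:token"/>
  </xs:complexType>
</xs:schema>
```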

23.3 Application compatibility

Whether or not you can achieve schema compatibility, it is also
worthwhile to try to achieve application compatibility. Well-designed
applications that were written to process the previous version should
be able to process instances of the new version without crashing.
Likewise, applications that process the new version can be made to
support both versions. If the new version only contains optional
additions, the application can use the same logic for both versions of
instances. Alternatively, the application can check the version number
(as described in Section 23.2) and process each version differently.

It is impossible to predict how people will modify or extend a schema
over time, but several simple practices can help handle changes more
gracefully.

# Ignore irrelevant elements or attributes. The application should
process the elements and attributes it is expecting, without
generating errors if additional elements or attributes appear.
This is especially true if they are in a different namespace. The
application should treat every content model as if it had both
attribute and element wildcards, even if it does not.
# Avoid overdependence on the document structure. Minimize the
amount of structural checking you do in the application code.
If you are using a SAX parser, process the element you are
interested in by name, but do not necessarily keep track of its parent
or grandparent. In XSLT, consider using more of a “push”
model instead of “pull,” creating templates for individual
elements such as product/number rather than hard-coding entire
paths like catalog/product/number. This will allow the
XSLT to still work even if a department element is added
between catalog and product.
# Avoid overdependence on namespaces. A change in a namespace
name, for example to include a new version number, is
disruptive to your code. While you may need to write entirely new
code for the new version, it is ideal if you can reuse some
of the code from the previous version. Avoiding the use of
namespace names when referring to element names, or at least
parameterizing the namespace names instead of hard-coding
them throughout your code, can make the upgrade easier and
promote reuse.
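The “push” style described in the second item can be sketched as a small XSLT 1.0 stylesheet. The sketch assumes that the catalog, product, and number elements are in no namespace and that the goal is simply to print each product number; both are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Matches number wherever its product parent appears, so the
       template does not depend on what sits above product -->
  <xsl:template match="product/number">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Suppress all other text; the built-in rules keep walking
       down through elements this stylesheet does not know about -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Because no template names the full path catalog/product/number, the same stylesheet keeps working if a later version inserts a department element between catalog and product, and it silently ignores elements it was not written for.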

23.4 Lessening the impact of versioning

A few best practices can ease the pain of versioning for the implementers
of your XML vocabulary. They are discussed in this section.

23.4.1 Define a versioning strategy

If you are defining a complex vocabulary, one that changes frequently
or one that is used by a variety of implementers, it is helpful to clearly
define a versioning strategy. That way, implementers know what to
expect when a new version is released. A versioning strategy should
specify the following information:

# How will version numbers be formatted and ordered?
# Where will version numbers be indicated in the schemas? In
the version attribute? In the schema document URL? In the
instance? In the namespace? Using some combination of these?
# Are minor releases backward-compatible? Are they
forward-compatible?
# Are major releases backward-compatible? Are they
forward-compatible?
# How will deprecated components be indicated?
# How will changes be documented?
# Are implementers expected to support multiple versions?
# Are implementers expected to upgrade to the newest version in
a particular time frame?

23.4.2 Make only necessary changes

When developing a new version that is not required to be
backward-compatible, it is tempting to make small fixes: changing names
that are not as descriptive as they could be, reordering elements to be
more intuitive, or changing cardinalities to be slightly more
constrained. Sometimes there are good reasons to make these changes, for
example because the schema is not conformant to a particular NDR (Naming
and Design Rules) specification or is genuinely confusing. But if there
is no good reason, don’t give in to the temptation. The changes may seem
small, but they can add up and cause confusion, software bugs, and
incompatibilities, placing a significant burden on implementers.

23.4.3 Document all changes

All changes to a schema in a new version should be clearly documented
in a set of release notes or a formal change log. Each entry in the change
log should have the following information:

# Description of the change
# Reason for the change
# Whether the change is backward-compatible
# Notes on upgrading or downgrading. For example, if a required
element is added, how should that value be determined when
upgrading instances to the new version?

If there are a lot of changes, consider creating a side-by-side mapping
document that shows all the differences, like the one shown in
Table 23–1. The first two columns contain the element names used in
old and new instances, indented to show the hierarchy of elements
in each version. The third column describes the change, and the fourth
column indicates whether the change is backward-compatible.

Table 23–1 Sample change log showing mapping

2.0 element     3.0 element       Change                                          BC?
catalog         catalog
  product         product         Max occurrences changed from 100 to unbounded   Y
    name            name          Length now limited to 32 characters             N
    number          number
    desc            description   Name change                                     N
    size                          Deleted element                                 N
                    product_id    New required element                            N
                    price         New optional element                            Y

23.4.4 Deprecate components before deleting them

To ease the transition from one version to the next, it is possible to
indicate that certain components are deprecated—that is, they are still
in the schema but are not recommended for use, and are likely to be
deleted in a future version of the schema.
There is no formal way to deprecate components in XML Schema,
but deprecation can be indicated in non-native attributes or annotations.
If the deprecated element is intended to be replaced by another element,
the two can be put together in a choice group or substitution group
during the deprecation period, so that either is allowed. It is also useful
to provide human-readable descriptive information that includes its
replacement, if any.
Example 23–14 shows one approach to deprecation. A deprecated
element with a value true is inserted into appinfo to formally indicate
that it is deprecated, and a human-readable description is also provided
in documentation. The deprecated element, color, is placed in a
choice group with its intended replacement, colorList. In the next
version, color will be deleted, leaving colorList as the only choice.

Example 23–14. One approach to deprecation

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:doc="http://datypic.com/doc"
           version="2.1">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
      <xs:choice>
        <xs:element name="color" type="xs:NMTOKEN">
          <xs:annotation>
            <xs:documentation>Deprecated in favor of colorList.</xs:documentation>
            <xs:appinfo>
              <doc:deprecated>true</doc:deprecated>
            </xs:appinfo>
          </xs:annotation>
        </xs:element>
        <xs:element name="colorList" type="xs:NMTOKENS"/>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
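
Deprecation can also be flagged with a non-native attribute rather than an appinfo element. The following is a minimal sketch of that variant, assuming two hypothetical attributes, doc:deprecated and doc:replacedBy, in the http://datypic.com/doc namespace; neither name is defined by XML Schema, and any tool that reads them would have to agree on their meaning.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:doc="http://datypic.com/doc"
           version="2.1">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
      <xs:choice>
        <!-- doc:deprecated and doc:replacedBy are hypothetical
             non-native attributes; a schema processor ignores them
             for validation purposes -->
        <xs:element name="color" type="xs:NMTOKEN"
                    doc:deprecated="true"
                    doc:replacedBy="colorList"/>
        <xs:element name="colorList" type="xs:NMTOKENS"/>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

The attribute form is more compact and easy to query with XPath, while the appinfo form leaves room for richer structured content.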

23.4.5 Provide a conversion capability

If a new version is not backward-compatible, you should provide a
clear upgrade path from the old version to the new version. A good
way to do this is by providing an XSLT stylesheet to upgrade instances,
which can be done automatically by an application.
Such a conversion needs to handle two changes carefully:

1. If required elements or attributes are added in the new version,
the XSLT should insert them, ideally with a default value if one
can be determined or calculated. Otherwise, an empty or nil
value may be appropriate.
2. If elements or attributes are deleted, the XSLT should provide
messages to the user that it is deleting information. It could also
insert the deleted data as comments in the output, for a human
user who may be reviewing the converted documents.
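
The two points above can be sketched as an XSLT 1.0 stylesheet. This is an illustrative sketch only: it assumes an instance whose root element is an unqualified product element, and it borrows the element names desc, description, size, and product_id from the change table earlier in this chapter. Where product_id belongs in the new content model, and what value it should carry, are assumptions.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy everything not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Added required element: insert an empty product_id if the
       instance does not already have one (no default value can be
       calculated in this sketch) -->
  <xsl:template match="product">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:if test="not(product_id)">
        <product_id/>
      </xsl:if>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Renamed element: desc becomes description -->
  <xsl:template match="desc">
    <description>
      <xsl:apply-templates select="@*|node()"/>
    </description>
  </xsl:template>

  <!-- Deleted element: tell the user, and keep the old data as a
       comment in the output for a human reviewer -->
  <xsl:template match="size">
    <xsl:message>Deleting size element with value: <xsl:value-of select="."/></xsl:message>
    <xsl:comment> deleted size: <xsl:value-of select="."/> </xsl:comment>
  </xsl:template>
</xsl:stylesheet>
```

A production stylesheet would also need to guard against values containing two consecutive hyphens, which are not allowed inside an XML comment.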

It may also be worthwhile to write an opposite conversion, one that
downgrades instances from the newer version to the older one. This
makes sense if older implementations that only support the previous
version are likely to persist for some time, and there is no forward
compatibility. The considerations listed above when adding and
deleting components also apply to the downgrade conversion.
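
A downgrade conversion can be sketched in the same style. The stylesheet below is a hedged illustration that assumes the newer version renamed desc to description and added product_id and price, as in the change table earlier in this chapter; the element names and the unqualified product root are assumptions.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy everything not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- description was called desc in the older version -->
  <xsl:template match="description">
    <desc>
      <xsl:apply-templates select="@*|node()"/>
    </desc>
  </xsl:template>

  <!-- product_id and price do not exist in the older version:
       warn the user and keep the data as a comment -->
  <xsl:template match="product_id | price">
    <xsl:message>Dropping <xsl:value-of select="name()"/>: <xsl:value-of select="."/></xsl:message>
    <xsl:comment> dropped <xsl:value-of select="name()"/>: <xsl:value-of select="."/> </xsl:comment>
  </xsl:template>
</xsl:stylesheet>
```

If the older version requires an element that the newer version deleted, the downgrade would additionally have to insert it with a default, empty, or nil value.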

23.5 Versions of the XML Schema language

In addition to having multiple versions of your XML vocabulary, you
may be dealing with multiple versions of the XML Schema language
itself. This book describes two different versions of XML Schema: 1.0
and 1.1. Depending on which processor you are using, you may be
required to use one version or the other. Unlike some other XML
vocabularies, there is no way to indicate in your schema which version
of XML Schema you are using. Instead, this might be a setting that
you pass to your XML Schema processor, or the processor may only
support one of the versions.

23.5.1 New features in version 1.1

Version 1.1 of XML Schema introduces a number of useful new
features, including:

• Assertions (XPath constraints) on types (Sections 14.1.1 on
  p. 353 and 14.1.2 on p. 365)
• Conditional type assignment for elements (Section 14.2 on
  p. 375)
• Open content for complex types (Section 12.7.2 on p. 292)
• Relaxed constraints on all groups (Section 12.5.4 on p. 276)
• More powerful namespace constraints for wildcards
  (Section 12.7.1.3 on p. 289)
• Multiple inheritance for substitution groups (Section 16.5 on
  p. 413)
• Default attributes (Section 15.3.3 on p. 399)
• Inheritable attributes (Section 7.6 on p. 126)
• Overrides, as a replacement for redefines (Section 18.2 on
  p. 459)
• A new explicitTimezone facet (Section 8.4.7 on p. 150)
• Three new built-in simple types: yearMonthDuration
  (Section 11.4.11 on p. 231), dayTimeDuration (Section 11.4.12
  on p. 232), and dateTimeStamp (Section 11.4.4 on p. 224)
• Support for implementation-defined facets and types
  (Section 8.6 on p. 154)
• Simplification of restrictions through relaxed rules for valid
  restrictions (Section 13.5.2 on p. 318), the ability to reuse
  identity constraints (Section 17.10 on p. 442), and the ability
  to restrict element and attribute declarations in a different target
  namespace (Section 13.5.7.1 on p. 339)

These new features required the introduction of new elements and
attributes into the XML Schema language. Version 1.1 of XML Schema
is backward-compatible with version 1.0, so any 1.0 schema will also
work with a 1.1 processor and have the same meaning. However, there
is no forward compatibility between the two versions, so a 1.0 processor
will not be able to handle a 1.1 schema if it uses any of the 1.1 elements
or attributes.
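
As an illustration of the lack of forward compatibility, the sketch below adds a single 1.1 construct, an assertion, to an otherwise ordinary schema. The minPrice and maxPrice attributes are hypothetical names chosen for this example. A 1.1 processor should enforce the assertion, whereas a 1.0 processor would be expected to reject the schema because it does not recognize xs:assert.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="product" type="ProductType"/>
  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
    <!-- minPrice and maxPrice are hypothetical attributes -->
    <xs:attribute name="minPrice" type="xs:decimal"/>
    <xs:attribute name="maxPrice" type="xs:decimal"/>
    <!-- xs:assert exists only in version 1.1; the test passes when
         either attribute is absent or minPrice does not exceed maxPrice -->
    <xs:assert test="not(@minPrice and @maxPrice)
                     or @minPrice le @maxPrice"/>
  </xs:complexType>
</xs:schema>
```

Removing the xs:assert element yields a schema that both versions can process, at the cost of losing the constraint.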

23.5.2 Forward compatibility of XML Schema 1.1

Version 1.1 of XML Schema has some new capabilities to accommodate
the fact that there may be new versions of the XML Schema language
in the future. Specifically, it provides a mechanism for indicating that
a particular XML Schema component applies only to certain versions
of the XML Schema language. These constructs use the minVersion
and/or maxVersion attributes, which are in the Version Control
Namespace, http://www.w3.org/2007/XMLSchema-versioning.
In Example 23–15, the first declaration for product indicates that
it should only be honored by processors using version 1.3 or higher.
Presumably, it makes use of special version 1.3 constructs that are
unknown to version 1.1. If an XML Schema 1.1 processor parses this
schema, it will ignore the first declaration and all of its descendants.
The second product declaration indicates that it should be honored
by processors using versions from 1.1 up to, but not including,
version 1.3. A processor will only be using one version of XML Schema
during any given validation.

Example 23–15. Using minVersion and maxVersion

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning">
  <xs:element name="product" vc:minVersion="1.3">

  </xs:element>
  <xs:element name="product" vc:minVersion="1.1"
              vc:maxVersion="1.3">

  </xs:element>
</xs:schema>

This example may seem to violate one of the basic rules of XML
Schema: it has two global element declarations with the same
name. However, the version control attributes have a special power,
signaling to the processor that it should preprocess the schema (using
a process called conditional inclusion) to strip out all the declarations
that don’t apply to the version it is using. It is the output of this
preprocessing that must follow all the rules of XML Schema. In
Example 23–15, there will never be more than one product declaration in
the schema after preprocessing. However, care must be taken not to
use overlapping values for minVersion and/or maxVersion, lest
duplicate declarations remain after preprocessing.
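
The overlap problem can be made concrete with a small counterexample. The sketch below is deliberately faulty and uses hypothetical future versions 1.2 and 1.3; the types xs:string and xs:token are arbitrary choices for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning">
  <!-- Retained by versions 1.1 up to, but not including, 1.3 -->
  <xs:element name="product" type="xs:string"
              vc:minVersion="1.1" vc:maxVersion="1.3"/>
  <!-- Retained by version 1.2 or higher: this overlaps the range
       above for a processor using version 1.2 -->
  <xs:element name="product" type="xs:token"
              vc:minVersion="1.2"/>
</xs:schema>
```

A processor using version 1.1 would keep only the first declaration, and one using 1.3 only the second, but a processor using 1.2 would keep both and then find two global declarations named product. Changing the second minVersion to 1.3 makes the ranges meet without overlapping.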
Unfortunately, this mechanism does not help with the transition
from XML Schema 1.0 to 1.1, because a typical 1.0 processor will not
honor or even know about the minVersion and/or maxVersion
attributes.

23.5.3 Portability of implementation-defined types and facets
Another aspect of handling variations in the XML Schema language
involves support for implementation-defined types and facets.
Section 8.6 on p. 154 introduced the concept, providing examples of type
definitions and element declarations that depend on type names and
facets that may only be supported by specific implementations.
Example 23–16 provides a recap, showing a simple type
definition based on a hypothetical implementation-defined type
(ext:ordinalDate), as well as a simple type definition that uses
an implementation-defined facet (saxon:preprocess) which is
currently implemented in Saxon.

Example 23–16. Using implementation-deﬁned types and facets

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ext="http://example.org/extensions"
           xmlns:saxon="http://saxon.sf.net/">
  <xs:element name="anyOrdinalDate" type="ext:ordinalDate"/>
  <xs:element name="recentOrdinalDate" type="OrdinalDateIn2011"/>
  <xs:simpleType name="OrdinalDateIn2011">
    <xs:restriction base="ext:ordinalDate">
      <xs:minInclusive value="2011-001"/>
      <xs:maxInclusive value="2011-365"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="size" type="SMLXSizeType"/>

  <xs:simpleType name="SMLXSizeType">
    <xs:restriction base="xs:token">
      <saxon:preprocess action="upper-case($value)"/>
      <xs:enumeration value="SMALL"/>
      <xs:enumeration value="MEDIUM"/>
      <xs:enumeration value="LARGE"/>
      <xs:enumeration value="EXTRA LARGE"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

While implementation-defined types and facets can be useful, they
do affect the portability of your schema. In fact, if a processor
encounters a reference to any implementation-defined type or facet that it
does not understand, the entire component, and any other components
that depend on it, is considered “unknown” and excluded from the
schema used for validation. It is not technically an error in the schema,
but if one of these dependent elements or attributes is used in an
instance it will fail validation. In Example 23–16, that means that a
recentOrdinalDate or anyOrdinalDate element could never be
valid if the processor does not understand ext:ordinalDate, and a
size element could never be valid if the processor does not understand
saxon:preprocess.
It is possible to take special measures to ensure that implementation-
defined types and facets are only used by processors that can
understand them. This is accomplished through four attributes in the
Version Control Namespace: typeAvailable, typeUnavailable,
facetAvailable, and facetUnavailable. These attributes can be
used on any element in a schema document, and their value is a
qualified name or a space-separated list of qualified names.

23.5.3.1 Using typeAvailable and typeUnavailable
The typeAvailable attribute is used to test whether the named type(s)
are known to the processor. If all of the listed types are known, the
schema element on which it appears is retained; if any of them
is not known, that element and all of its descendants are ignored.
The typeUnavailable attribute has the opposite effect, and the two are
often used in conjunction with each other.
Example 23–17 shows a more portable schema that uses
ordinalDate: There are two separate anyOrdinalDate declarations,
one with the typeAvailable attribute and one with the
typeUnavailable attribute. If ordinalDate is known to the processor,
the first declaration is used, and if it is not, the second declaration
is used.
Likewise, there are two separate definitions of the OrdinalDateIn2011
type. If ordinalDate is known to the processor, the first type
definition is used, and if it is not, the second one is used. This means
that while validation is less strict if a different processor is used, at least
it will not fail unnecessarily.

Example 23–17. Using vc:typeAvailable and vc:typeUnavailable

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ext="http://example.org/extensions"
           xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning">
  <xs:element name="anyOrdinalDate" type="ext:ordinalDate"
              vc:typeAvailable="ext:ordinalDate"/>
  <xs:element name="anyOrdinalDate" type="xs:string"
              vc:typeUnavailable="ext:ordinalDate"/>

  <xs:element name="recentOrdinalDate" type="OrdinalDateIn2011"/>

  <xs:simpleType name="OrdinalDateIn2011"
                 vc:typeAvailable="ext:ordinalDate">
    <xs:restriction base="ext:ordinalDate">
      <xs:minInclusive value="2011-001"/>
      <xs:maxInclusive value="2011-365"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="OrdinalDateIn2011"
                 vc:typeUnavailable="ext:ordinalDate">
    <xs:restriction base="xs:string">
      <xs:pattern value="2011-\d{3}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
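
Because the attribute value may be a space-separated list, one component can be made conditional on several types at once. The following sketch assumes the XML Schema 1.1 conditional inclusion rules, under which typeAvailable retains an element only when every listed type is known and typeUnavailable retains it when at least one listed type is not known. Both ext:weekDate and ScheduleType are hypothetical names introduced for this illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ext="http://example.org/extensions"
           xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning">
  <xs:element name="schedule" type="ScheduleType"/>

  <!-- Used only if both extension types are known -->
  <xs:complexType name="ScheduleType"
                  vc:typeAvailable="ext:ordinalDate ext:weekDate">
    <xs:sequence>
      <xs:element name="start" type="ext:ordinalDate"/>
      <xs:element name="week" type="ext:weekDate"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Fallback used if at least one of the types is unknown -->
  <xs:complexType name="ScheduleType"
                  vc:typeUnavailable="ext:ordinalDate ext:weekDate">
    <xs:sequence>
      <xs:element name="start" type="xs:string"/>
      <xs:element name="week" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Under that reading the two conditions are complementary, so exactly one ScheduleType definition should remain after preprocessing.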

23.5.3.2 Using facetAvailable and facetUnavailable
The facetAvailable and facetUnavailable attributes work
similarly. Example 23–18 is a schema that contains two type definitions:
The first is used if the saxon:preprocess facet is known, and the
second is used if it is unknown.
As with the minVersion and maxVersion attributes, these
attributes do not have to be on top-level components; they can appear
on any element in the schema to indicate that it should be included only
under the specified conditions. Example 23–19 shows a schema
that uses the facetAvailable attribute on the
ext:maxLengthWithoutWhitespace facet itself to instruct the processor
to not read it if it does not understand it.

Example 23–18. Using vc:facetAvailable and vc:facetUnavailable

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:saxon="http://saxon.sf.net/"
           xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning">
  <xs:element name="size" type="SMLXSizeType"/>
  <xs:simpleType name="SMLXSizeType"
                 vc:facetAvailable="saxon:preprocess">
    <xs:restriction base="xs:token">
      <saxon:preprocess action="upper-case($value)"/>
      <xs:enumeration value="SMALL"/>
      <xs:enumeration value="MEDIUM"/>
      <xs:enumeration value="LARGE"/>
      <xs:enumeration value="EXTRA LARGE"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="SMLXSizeType"
                 vc:facetUnavailable="saxon:preprocess">
    <xs:restriction base="xs:token"/>
  </xs:simpleType>
</xs:schema>

Example 23–19. Using vc:facetAvailable

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ext="http://example.org/extensions"
           xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning">
  <xs:element name="astring" type="ShortString"/>
  <xs:simpleType name="ShortString">
    <xs:restriction base="xs:string">
      <ext:maxLengthWithoutWhitespace value="5"
          vc:facetAvailable="ext:maxLengthWithoutWhitespace"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

This would not have been appropriate for a prelexical facet like
saxon:preprocess, however, because if the facet is simply ignored,
the instruction to turn the value to upper case before validating it would
be skipped. The resulting schema would have been stricter than
intended when using a processor other than Saxon, because lowercase
values would not be allowed.
Appendix A XSD keywords

A.1 Elements
Table A–1 all

Description all model group, which allows elements in any order

Section 12.5.4 on p. 276
Possible parents complexType, restriction, extension, group
Attributes id, minOccurs, maxOccurs
Content model annotation?, (element | 1.1 any | 1.1 group)*

650 Appendix A | XSD keywords

Table A–2 alternative

Description Alternative type for an element if a particular XPath test
is true
Section 14.2.1 on p. 376
Possible parent element
Attributes id, test, type, xpathDefaultNamespace
Content model annotation?, (simpleType | complexType)?

Table A–3 annotation

Description Annotation that further documents a schema component

Section 21.8.1 on p. 581
Possible parents all elements except annotation, appinfo, and
documentation
Attribute id
Content model (documentation | appinfo)*

Table A–4 any

Description Element wildcard

Section 12.7.1 on p. 285
Possible parents choice, sequence, 1.1 all, 1.1 openContent,
1.1 defaultOpenContent
Attributes id, minOccurs, maxOccurs, namespace,
processContents, 1.1 notNamespace, 1.1 notQName
Content model annotation?
A.1 | Elements 651

Table A–5 anyAttribute

Description Attribute wildcard

Section 12.7.3 on p. 298
Possible parents attributeGroup, complexType, extension, restriction
Attributes id, namespace, processContents, 1.1 notNamespace,
1.1 notQName
Content model annotation?

Table A–6 appinfo

Description Application information

Section 21.8.3 on p. 586
Possible parent annotation
Attribute source
Content model any well-formed content

Table A–7 assert

Description Assertion on a complex type

Section 14.1.2 on p. 365
Possible parents complexType, extension, restriction
Attributes id, test, xpathDefaultNamespace
Content model annotation?

Table A–8 assertion

Description Assertion on a simple type

Section 14.1.1 on p. 353
Possible parent restriction
Attributes id, test, xpathDefaultNamespace
Content model annotation?

Table A–9 attribute (global declaration)

Description Global attribute declaration

Section 7.2.1 on p. 115
Possible parents schema, 1.1 override
Attributes id, name, type, default, fixed, 1.1 inheritable
Content model annotation?, simpleType?

Table A–10 attribute (local declaration)

Description Local attribute declaration

Section 7.2.2 on p. 117
Possible parents attributeGroup, complexType, extension, restriction
Attributes id, name, type, form, use, default, fixed,
1.1 targetNamespace, 1.1 inheritable
Content model annotation?, simpleType?

Table A–11 attribute (reference)

Description Attribute reference

Section 12.6.2 on p. 282
Possible parents attributeGroup, complexType, extension, restriction
Attributes id, ref, use, default, fixed, 1.1 inheritable
Content model annotation?

Table A–12 attributeGroup (definition)

Description Attribute group definition

Section 15.3.1 on p. 393
Possible parents schema, redefine, 1.1 override
Attributes id, name
Content model annotation?, (attribute | attributeGroup)*,
anyAttribute?

Table A–13 attributeGroup (reference)

Description Attribute group reference

Section 15.3.2.1 on p. 395
Possible parents attributeGroup, complexType, extension, restriction
Attributes id, ref
Content model annotation?

Table A–14 choice

Description choice group that allows a choice among elements

Section 12.5.2 on p. 273
Possible parents choice, complexType, extension, group, restriction,
sequence
Attributes id, minOccurs, maxOccurs
Content model annotation?, (element | group | choice |
sequence | any)*

Table A–15 complexContent

Description Complex content specification

Section 13.3.2 on p. 304
Possible parents complexType
Attributes id, mixed
Content model annotation?, (restriction | extension)

Table A–16 complexType (named, global definition)

Description Named complex type definition

Section 12.2.1 on p. 258
Possible parents schema, redefine, 1.1 override
Attributes id, name, mixed, abstract, block, final,
1.1 defaultAttributesApply
Content model annotation?, (simpleContent | complexContent |
(1.1 openContent?, (group | all | choice |
sequence)?, ((attribute | attributeGroup)*,
anyAttribute?), 1.1 assert*))

Table A–17 complexType (anonymous, local definition)

Description Anonymous complex type definition

Table A–18 defaultOpenContent

Description Default open content model

Section 12.7.2.2 on p. 295
Possible parent schema
Attributes id, appliesToEmpty, mode
Content model annotation?, any

Table A–19 documentation

Description Human-readable documentation of a schema component

Section 21.8.2 on p. 581
Possible parent annotation
Attributes source, xml:lang
Content model any well-formed content

Table A–20 element (global declaration)

Description Global element declaration

Section 6.1.1 on p. 89
Possible parents schema, 1.1 override
Attributes id, name, type, default, fixed, nillable, abstract,
substitutionGroup, block, final
Content model annotation?, (simpleType | complexType)?,
1.1 alternative*, (key | keyref | unique)*

Table A–21 element (local declaration)

Description Local element declaration

Section 6.1.2 on p. 93
Possible parents all, choice, sequence
Attributes id, name, form, type, minOccurs, maxOccurs, default,
fixed, nillable, block, 1.1 targetNamespace
Content model annotation?, (simpleType | complexType)?,
1.1 alternative*, (key | keyref | unique)*

Table A–22 element (reference)

Description Element reference

Section 12.4.2 on p. 267
Possible parents all, choice, sequence
Attributes id, ref, minOccurs, maxOccurs
Content model annotation?

Table A–23 enumeration

Description Facet specifying a valid value

Sections 8.4.4 on p. 145, 10.3.3.2 on p. 192
Possible parent restriction
Attributes id, value
Content model annotation?

Table A–24 explicitTimezone

Description Facet specifying whether a time zone is required, optional,
or prohibited
Section 8.4.7 on p. 150
Possible parent restriction
Attributes id, fixed, value
Content model annotation?

Table A–25 extension (simple content)

Description Simple content extension

Section 13.4.1 on p. 306
Possible parent simpleContent
Attributes id, base
Content model annotation?, (attribute | attributeGroup)*,
anyAttribute?, 1.1 assert*

Table A–26 extension (complex content)

Description Complex content extension

Section 13.4.2 on p. 307
Possible parent complexContent
Attributes id, base
Content model annotation?, 1.1 openContent?, (group | all |
choice | sequence)?, (attribute |
attributeGroup)*, anyAttribute?, 1.1 assert*

Table A–27 field

Description Identity constraint field

Section 17.7.2 on p. 434
Possible parents key, keyref, unique
Attributes id, xpath, 1.1 xpathDefaultNamespace
Content model annotation?

Table A–28 fractionDigits

Description Facet specifying the number of digits allowed after the
decimal point
Section 8.4.3 on p. 145
Possible parent restriction
Attributes id, value, fixed
Content model annotation?

Table A–29 group (definition)

Description Named model group definition

Section 15.2.1 on p. 386
Possible parents schema, redefine, 1.1 override
Attributes id, name
Content model annotation?, (all | choice | sequence)

Table A–30 group (reference)

Description Named model group reference

Section 15.2.2.1 on p. 388
Possible parents complexType, extension, restriction, sequence,
choice, 1.1 all
Attributes id, ref, minOccurs, maxOccurs
Content model annotation?

Table A–31 import

Description Import of another namespace

Section 4.3.2 on p. 66
Possible parent schema
Attributes id, namespace, schemaLocation
Content model annotation?

Table A–32 include

Description Include of another schema document

Section 4.3.1 on p. 62
Possible parent schema
Attributes id, schemaLocation
Content model annotation?

Table A–33 key (key constraint)

Description Key constraint

Section 17.5 on p. 428
Possible parent element
Attributes id, name
Content model annotation?, selector, field+

Table A–34 key (reference to key constraint)

Description Reference to key constraint

Section 17.10 on p. 442
Possible parent element
Attributes id, ref
Content model annotation?

Table A–35 keyref (key reference)

Description Key reference

Section 17.6 on p. 430
Possible parent element
Attributes id, name, refer
Content model annotation?, selector, field+

Table A–36 keyref (reference to key reference)

Description Reference to key reference

Section 17.10 on p. 442
Possible parent element
Attributes id, ref
Content model annotation?

Table A–37 length

Description Facet specifying the exact length of a value

Sections 8.4.2 on p. 143, 10.3.3.1 on p. 192
Possible parent restriction
Attributes id, value, fixed
Content model annotation?

Table A–38 list

Description List type that allows a space-separated list of values

Section 10.3.1 on p. 188
Possible parent simpleType
Attributes id, itemType
Content model annotation?, simpleType?

Table A–39 maxExclusive

Description Facet specifying the maximum allowed value (exclusive)

Section 8.4.1 on p. 142
Possible parent restriction
Attributes id, value, fixed
Content model annotation?

Table A–40 maxInclusive

Description Facet specifying the maximum allowed value (inclusive)

Section 8.4.1 on p. 142
Possible parent restriction
Attributes id, value, fixed
Content model annotation?

Table A–41 maxLength

Description Facet specifying the maximum length of the value

Sections 8.4.2 on p. 143, 10.3.3.1 on p. 192
Possible parent restriction
Attributes id, value, fixed
Content model annotation?

Table A–42 minExclusive

Description Facet specifying the minimum allowed value (exclusive)

Section 8.4.1 on p. 142
Possible parent restriction
Attributes id, value, fixed
Content model annotation?

Table A–43 minInclusive

Description Facet specifying the minimum allowed value (inclusive)

Section 8.4.1 on p. 142
Possible parent restriction
Attributes id, value, fixed
Content model annotation?

Table A–44 minLength

Description Facet specifying the minimum allowed length of the value

Sections 8.4.2 on p. 143, 10.3.3.1 on p. 192
Possible parent restriction
Attributes id, value, fixed
Content model annotation?

Table A–45 notation

Description Notation declaration

Section 19.7.1 on p. 494
Possible parents schema, 1.1 override
Attributes id, name, public, system
Content model annotation?

Table A–46 openContent

Description Open content model

Section 12.7.2.1 on p. 292
Possible parents complexType, extension, restriction
Attributes id, mode
Content model annotation?, any?

Table A–47 override

Description Override of another schema document

Table A–48 pattern

Description Facet specifying a regular expression

Sections 8.4.5 on p. 148, 10.3.3.3 on p. 194
Possible parent restriction
Attributes id, value
Content model annotation?

Table A–49 redefine

Description Redefinition of another schema document

Table A–50 restriction (simple atomic type)

Description Simple type restriction

Section 8.3.1 on p. 136
Possible parent simpleType
Attributes id, base
Content model annotation?, simpleType?, (minExclusive |
minInclusive | maxExclusive | maxInclusive |
length | minLength | maxLength | totalDigits |
fractionDigits | enumeration | pattern |
whiteSpace | 1.1 assertion | 1.1 explicitTimezone |
1.1 {any element in another namespace})*

Table A–51 restriction (list type)

Description List type restriction

Table A–52 restriction (union type)

Description Union type restriction

Section 10.2.2 on p. 185
Possible parent simpleType
Attributes id, base
Content model annotation?, simpleType?, (enumeration |
pattern | 1.1 assertion)*

Table A–53 restriction (simple content)

Description Simple content restriction

Section 13.5.1 on p. 317
Possible parent simpleContent
Attributes id, base
Content model annotation?, simpleType?, (enumeration |
length | maxExclusive | maxInclusive |
maxLength | minExclusive | minInclusive |
minLength | pattern | totalDigits |
fractionDigits | whiteSpace | 1.1 assertion |
1.1 explicitTimezone |
1.1 {any element in another namespace})*,
(attribute | attributeGroup)*, anyAttribute?

Table A–54 restriction (complex content)

Description Complex content restriction

Table A–55 schema

Description Schema document

Table A–56 selector

Description Identity constraint selector

Section 17.7.1 on p. 433
Possible parents key, keyref, unique
Attributes id, xpath, 1.1 xpathDefaultNamespace
Content model annotation?

Table A–57 sequence

Description sequence group that requires elements in a particular order

Table A–58 simpleContent

Description Simple content specification

Section 13.3.1 on p. 303
Possible parent complexType
Attributes id
Content model annotation?, (restriction | extension)

Table A–59 simpleType (named, global definition)

Description Named simple type definition

Section 8.2.1 on p. 131
Possible parents schema, redefine, 1.1 override
Attributes id, name, final
Content model annotation?, (restriction | list | union)

Table A–60 simpleType (anonymous, local definition)

Description Anonymous simple type definition

Section 8.2.2 on p. 132
Possible parents attribute, element, list, restriction, union,
1.1 alternative
Attribute id
Content model annotation?, (restriction | list | union)

Table A–61 totalDigits

Description Facet specifying the total number of digits allowed in a
value
Section 8.4.3 on p. 145
Possible parent restriction
Attributes id, value, fixed
Content model annotation?

Table A–62 union

Description Union type that allows values from multiple types

Section 10.2.1 on p. 183
Possible parent simpleType
Attributes id, memberTypes
Content model annotation?, simpleType*

Table A–63 unique (uniqueness constraint)

Description Uniqueness constraint

Section 17.4 on p. 426
Possible parent element
Attributes id, name
Content model annotation?, selector, field+
A.2 | Attributes 671

Table A–64 unique (reference to uniqueness constraint)

Description Reference to uniqueness constraint

Section 17.10 on p. 442
Possible parent element
Attributes id, ref
Content model annotation?

Table A–65 whiteSpace

Description Facet specifying how whitespace should be handled in
values
Section 8.4.8 on p. 151
Possible parent restriction
Attributes id, value, fixed
Content model annotation?

A.2 Attributes
Table A–66 abstract

Description Whether a type or element can be used in an instance

Sections 13.7.4 on p. 346, 16.7.3 on p. 420
Elements complexType, element
Type/valid values boolean
Default value false

Table A–67 appliesToEmpty

Description Whether the default open content model applies to
empty content
Section 12.7.2.2 on p. 295
Element defaultOpenContent
Type/valid values boolean
Default value false

Table A–68 attributeFormDefault

Description Whether local attributes should be qualified in instances

Section 7.4 on p. 122
Element schema
Type/valid values "qualified" | "unqualified"
Default value unqualified

Table A–69 base

Description Base type of the derivation

Sections various
Elements extension, restriction
Type/valid values QName

Table A–70 block (complex type)

Description Whether to block type substitution in the instance

Section 13.7.2 on p. 344
Element complexType
Type/valid values "#all" or list of ("extension" | "restriction")
Default value defaults to blockDefault of schema

Table A–71 block (element)

Description Whether type and/or element substitutions should be
blocked from the instance
Sections 13.7.3 on p. 346, 16.7.2 on p. 419
Element element
Type/valid values "#all" or list of ("extension" | "restriction" |
"substitution")
Default value defaults to blockDefault of schema

Table A–72 blockDefault

Description Whether the use of derived types in instances should be
blocked
Sections 13.7.2 on p. 344, 13.7.3 on p. 346
Element schema
Type/valid values "#all" or list of ("extension" | "restriction" |
"substitution")

Table A–73 default

Description Default value for the element or attribute

Sections 6.4.1 on p. 102, 7.5.1 on p. 124
Elements attribute, element
Type/valid values string

Table A–74 defaultAttributes

Description The default attribute group for all complex types in a
schema document
Section 15.3.3 on p. 399
Element schema
Type/valid values QName

Table A–75 defaultAttributesApply

Description Whether the default attribute group applies to a
particular complex type
Section 15.3.3 on p. 399
Element complexType
Type/valid values boolean
Default value true

Table A–76 elementFormDefault

Description Whether local element names should be qualified in
instances
Section 6.3.3 on p. 99
Element schema
Type/valid values "qualified" | "unqualified"
Default value unqualified

Table A–77 facetAvailable†

Description Facet(s) that must be available to include the component
in the schema
Section 23.5.3 on p. 642
Elements any element in the schema
Type/valid values list of QName
† Namespace is http://www.w3.org/2007/XMLSchema-versioning, usually
mapped to the prefix vc.

Table A–78 facetUnavailable†

Description Facet(s) that must be unavailable to include the
component in the schema
Section 23.5.3 on p. 642
Elements any element in the schema
Type/valid values list of QName
† Namespace is http://www.w3.org/2007/XMLSchema-versioning, usually
mapped to the prefix vc.

Table A–79 final (complex type)

Description Whether other types can be derived from this one

Section 13.7.1 on p. 343
Element complexType
Type/valid values "#all" or list of ("extension" | "restriction")
Default value defaults to finalDefault of schema

Table A–80 final (element)

Description Whether the element declaration can be the head of a
substitution group
Section 16.7.1 on p. 418
Element element
Type/valid values "#all" or list of ("extension" | "restriction")
Default value defaults to finalDefault of schema

Table A–81 final (simple type)

Description Whether other types can be derived from this one

Section 8.5 on p. 152
Element simpleType
Type/valid values "#all" or list of ("restriction" | "list" |
"union" | 1.1 "extension")
Default value defaults to finalDefault of schema

Table A–82 finalDefault

Description Whether type derivation should be disallowed

Sections 8.5 on p. 152, 13.7.1 on p. 343, 16.7.1 on p. 418
Element schema
Type/valid values "#all" or list of ("extension" | "restriction" |
"list" | "union")

Table A–83 fixed (element/attribute)

Description Fixed value for the element or attribute

Sections 6.4.2 on p. 103, 7.5.2 on p. 125
Elements attribute, element
Type/valid values string

Table A–84 fixed (facet)

Description Whether the facet is fixed and therefore cannot be restricted further
Section 8.3.4 on p. 140
Elements fractionDigits, length, maxExclusive,
maxInclusive, maxLength, minExclusive,
minInclusive, minLength, totalDigits, whiteSpace,
explicitTimezone (version 1.1 only)
Type/valid values boolean
Default value false

Table A–85 form

Description Whether the element or attribute name must be qualified in the instance
Sections 6.3 on p. 98, 7.4 on p. 122
Elements attribute, element
Type/valid values "qualified" | "unqualified"
Default value defaults to attributeFormDefault or
elementFormDefault of schema

Table A–86 id

Description Unique ID
Sections various
Elements all XSD elements except documentation and appinfo
Type/valid values ID

Table A–87 inheritable

Description Whether the value can be inherited by descendant elements
Section 7.6 on p. 126
Element attribute
Type/valid values boolean
Default value false

Table A–88 itemType

Description The simple type of each item in the list

Section 10.3.1 on p. 188
Element list
Type/valid values QName
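
A minimal sketch of itemType in use follows. The type and element names (SizeListType, availableSizes) are hypothetical and chosen only for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Each whitespace-separated item in the list must be a valid xs:integer -->
  <xs:simpleType name="SizeListType">
    <xs:list itemType="xs:integer"/>
  </xs:simpleType>
  <xs:element name="availableSizes" type="SizeListType"/>
</xs:schema>
<!-- Valid under these assumptions: <availableSizes>10 12 14</availableSizes> -->
```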

Table A–89 lang†

Description Natural language

Section 21.8.2 on p. 581
Elements schema, documentation
Type/valid values language
† Namespace is http://www.w3.org/XML/1998/namespace, mapped to the
prefix xml.

Table A–90 maxOccurs

Description Maximum number of element or group occurrences

Sections 12.4 on p. 266, 12.5 on p. 270
Elements all, any, choice, element, group, sequence
Type/valid values nonNegativeInteger | "unbounded". For all group,
must be 0 or 1.
Default value 1

Table A–91 maxVersion†

Description Maximum version supported by a component

Section 23.5.2 on p. 641
Elements any element in the schema
Type/valid values decimal
† Namespace is http://www.w3.org/2007/XMLSchema-versioning, usually
mapped to the prefix vc.
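
The version attributes can be sketched as follows, assuming a processor that honors the vc namespace; the element name size and the even-number rule are hypothetical. As the 1.1 specification defines it, a component is kept when the processor's version is at least vc:minVersion and strictly less than vc:maxVersion, so the two declarations below do not overlap.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning">
  <!-- Kept by processors supporting version 1.1 or later -->
  <xs:element name="size" vc:minVersion="1.1">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:assertion test="$value mod 2 = 0"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <!-- Kept by processors supporting a version lower than 1.1 -->
  <xs:element name="size" type="xs:integer" vc:maxVersion="1.1"/>
</xs:schema>
```

A processor that ignores the vc attributes would see two global declarations named size, so this pattern is only safe when every processor in use is known to support conditional inclusion.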

Table A–92 memberTypes

Description Member types that make up the union type

Section 10.2.1 on p. 183
Element union
Type/valid values list of QName
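
The following sketch shows memberTypes with hypothetical type names (NamedSizeType, SizeType). A value is valid if it conforms to any one of the listed member types.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="NamedSizeType">
    <xs:restriction base="xs:token">
      <xs:enumeration value="small"/>
      <xs:enumeration value="medium"/>
      <xs:enumeration value="large"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Accepts either an integer such as 12 or one of the named sizes -->
  <xs:simpleType name="SizeType">
    <xs:union memberTypes="xs:integer NamedSizeType"/>
  </xs:simpleType>
</xs:schema>
```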

Table A–93 minOccurs

Description Minimum number of element or group occurrences

Sections 12.4 on p. 266, 12.5 on p. 270
Elements all, any, choice, element, group, sequence
Type/valid values nonNegativeInteger. For all group, must be 0 or 1.
Default value 1
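
The occurrence attributes can be illustrated with a small sketch; the element names (order, customer, note, item) are hypothetical. Omitting both attributes means exactly one occurrence, since each defaults to 1.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <!-- minOccurs and maxOccurs both default to 1: exactly once -->
        <xs:element name="customer" type="xs:string"/>
        <!-- Optional: zero or one occurrence -->
        <xs:element name="note" type="xs:string" minOccurs="0"/>
        <!-- One or more occurrences -->
        <xs:element name="item" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```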

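Table A–94 minVersion†

Description Minimum version supported by a component

Section 23.5.2 on p. 641
Elements any element in the schema
Type/valid values decimal
† Namespace is http://www.w3.org/2007/XMLSchema-versioning, usually
mapped to the prefix vc.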

Table A–95 mixed

Description Whether the complex type allows mixed content

Sections 12.3.3 on p. 264, 13.3.2 on p. 304
Elements complexContent, complexType
Type/valid values boolean
Default value false for complexType, for complexContent it defaults
to the mixed value of the complex type

Table A–96 mode

Description Where replacement elements can appear in an open content model
Sections 12.7.2 on p. 292, 12.7.2.2 on p. 295
Elements openContent, defaultOpenContent
Type/valid values "none" | "interleave" | "suffix" ("none" is not
allowed for the defaultOpenContent element)
Default value interleave
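
As a sketch of the mode attribute (version 1.1 only), the hypothetical ProductType below allows replacement elements only after the declared content, because mode is "suffix". With the default, "interleave", they could appear anywhere among the declared elements.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="ProductType">
    <!-- Replacement elements may appear only after number and name -->
    <xs:openContent mode="suffix">
      <xs:any processContents="lax"/>
    </xs:openContent>
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```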

Table A–97 name

Description Name of the schema component

Sections various
Elements attribute, attributeGroup, complexType, element,
group, key, keyref, notation, simpleType, unique
Type/valid values NCName

Table A–98 namespace (wildcard)

Description Which namespace(s) the replacement elements/attributes may belong to
Sections 12.7.1 on p. 285, 12.7.3 on p. 298
Elements any, anyAttribute
Type/valid values "##any" or "##other" or list of
(anyURI | "##targetNamespace" | "##local")
Default value "##any"

Table A–99 namespace (import)

Description Namespace to be imported

Section 4.3.2 on p. 66
Element import
Type/valid values anyURI

Table A–100 nil†

Description Whether the element’s value is nil

Section 6.5 on p. 105
Elements any instance element
Type/valid values boolean
Default value false
† Namespace is http://www.w3.org/2001/XMLSchema-instance, often
mapped to the prefix xsi.

Table A–101 nillable

Description Whether elements can be nilled, i.e. have the xsi:nil attribute
Section 6.5 on p. 105
Element element
Type/valid values boolean
Default value false
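
The relationship between nillable (in the schema) and xsi:nil (in the instance) can be sketched as follows; the element name size is hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- nillable="true" permits xsi:nil="true" on size in instances -->
  <xs:element name="size" type="xs:integer" nillable="true"/>
</xs:schema>
<!-- Valid under these assumptions, even though the content is empty:
<size xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
-->
```

Without nillable="true" in the declaration, the same instance would be invalid.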

Table A–102 noNamespaceSchemaLocation†

Description Location of the schema in instances with no namespaces
Section 5.3.1.2 on p. 86
Elements any instance element
Type/valid values anyURI
† Namespace is http://www.w3.org/2001/XMLSchema-instance, often
mapped to the prefix xsi.

Table A–103 notNamespace

Description Disallowed namespaces for a wildcard
Sections 12.7.1 on p. 285, 12.7.3 on p. 298
Elements any, anyAttribute
Type/valid values list of (anyURI | "##targetNamespace" | "##local")

Table A–104 notQName

Description Disallowed element or attribute names for a wildcard
Sections 12.7.1 on p. 285, 12.7.3 on p. 298
Elements any, anyAttribute
Type/valid values list of (QName | "##defined" |
"##definedSibling"). Note that ##definedSibling
cannot be used on anyAttribute.

Table A–105 processContents

Description How strictly to validate the replacement elements or attributes
Sections 12.7.1 on p. 285, 12.7.3 on p. 298
Elements any, anyAttribute
Type/valid values ("lax" | "skip" | "strict")
Default value strict
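
The wildcard attributes (namespace and processContents) can be combined as in this minimal sketch. The target namespace and the type name ExtensibleType are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/prod">
  <xs:complexType name="ExtensibleType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <!-- Any number of elements from other namespaces; each is validated
           only if a declaration for it can be found ("lax") -->
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <!-- Attributes from other namespaces are accepted without validation -->
    <xs:anyAttribute namespace="##other" processContents="skip"/>
  </xs:complexType>
</xs:schema>
```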

Table A–106 public

Description Public ID of a notation

Section 19.7.1 on p. 494
Element notation
Type/valid values token

Table A–107 ref

Description Name of the schema component being referenced

Sections various
Elements attribute, attributeGroup, element, group, key, keyref,
unique (key, keyref, and unique in version 1.1 only)
Type/valid values QName

Table A–108 refer

Description Name of the key being referenced

Section 17.6 on p. 430
Element keyref
Type/valid values QName

Table A–109 schemaLocation (import/include/redefine/override)

Description Location of the schema which describes included or
imported components
Section 4.3 on p. 61
Elements import, include, redefine, override (version 1.1 only)
Type/valid values anyURI

Table A–110 schemaLocation (instance)†

Description List of locations of the schemas that correspond to
namespaces in an instance
Section 5.3.1.1 on p. 84
Elements any instance element
Type/valid values list of pairs of anyURI values
† Namespace is http://www.w3.org/2001/XMLSchema-instance, often
mapped to the prefix xsi.
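
A sketch of xsi:schemaLocation on an instance follows. The namespace (http://example.org/prod) and the file name prod.xsd are hypothetical; the attribute value is a list of pairs, each a namespace name followed by a location hint for its schema.

```xml
<prod:product xmlns:prod="http://example.org/prod"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://example.org/prod prod.xsd">
  <prod:number>557</prod:number>
</prod:product>
```

The location is only a hint, so a processor may choose to obtain the schema some other way.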

Table A–111 source

Description Source of further documentation
Sections 21.8.2 on p. 581, 21.8.3 on p. 586
Elements appinfo, documentation
Type/valid values anyURI

Table A–112 substitutionGroup

Description Head of the substitution group to which the element
declaration belongs
Section 16.3 on p. 409
Element element
Type/valid values QName (list of QName in version 1.1)
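
The substitutionGroup attribute can be sketched with hypothetical names (product, shirt, items). The member's type must be the same as, or derived from, the head's type, so ShirtType extends ProductType here.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="product" type="ProductType"/>
  <!-- shirt may appear wherever product is referenced -->
  <xs:element name="shirt" type="ShirtType" substitutionGroup="product"/>

  <xs:complexType name="ProductType">
    <xs:sequence>
      <xs:element name="number" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ShirtType">
    <xs:complexContent>
      <xs:extension base="ProductType">
        <xs:sequence>
          <xs:element name="size" type="xs:integer"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="items">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="product" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```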

Table A–113 system

Description System ID of a notation

Section 19.7.1 on p. 494
Element notation
Type/valid values anyURI

Table A–114 targetNamespace (schema)

Description Namespace to which all global schema components belong
Section 3.3.1 on p. 48
Element schema
Type/valid values anyURI

Table A–115 targetNamespace (local declaration)

Description Namespace of a local element or attribute declaration being restricted
Section 13.5.7 on p. 337
Elements attribute, element
Type/valid values anyURI

Table A–116 test

Description XPath assertion on a complex or simple type

Sections 14.2 on p. 375, 14.1.2 on p. 365, 14.1.1 on p. 353
Elements alternative, assert, assertion
Type/valid values XPath expression
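
A minimal sketch of the test attribute on assert (version 1.1 only) follows. The type RangeType and its min and max children are hypothetical. The XPath 2.0 operator le is used so that no less-than sign has to be escaped inside the attribute value.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="RangeType">
    <xs:sequence>
      <xs:element name="min" type="xs:integer"/>
      <xs:element name="max" type="xs:integer"/>
    </xs:sequence>
    <!-- The type is valid only if min is less than or equal to max -->
    <xs:assert test="min le max"/>
  </xs:complexType>
</xs:schema>
```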

Table A–117 type (element/attribute)

Description Type of attributes or elements
Sections 6.2 on p. 96, 7.3 on p. 120
Elements attribute, element, alternative (version 1.1 only)
Type/valid values QName

Table A–118 type (instance)†

Description Type of the element, used in type substitution and union
types
Section 13.6 on p. 341
Elements any instance element
Type/valid values QName
† Namespace is http://www.w3.org/2001/XMLSchema-instance, often
mapped to the prefix xsi.

Table A–119 typeAvailable†

Description Type(s) that must be available to include the component
in the schema
Section 23.5.3 on p. 642
Elements any element in the schema
Type/valid values list of QName
† Namespace is http://www.w3.org/2007/XMLSchema-versioning, usually
mapped to the prefix vc.

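Table A–120 typeUnavailable†

Description Type(s) that must be unavailable to include the component
in the schema
Section 23.5.3 on p. 642
Elements any element in the schema
Type/valid values list of QName
† Namespace is http://www.w3.org/2007/XMLSchema-versioning, usually
mapped to the prefix vc.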

Table A–121 use

Description Whether an attribute is required or optional

Section 12.6 on p. 281
Element attribute
Type/valid values "optional" | "prohibited" | "required"
Default value optional
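
The use attribute can be sketched as follows; the type and attribute names are hypothetical. An attribute declared without use is optional.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="ProductType">
    <!-- Must appear on every element of this type -->
    <xs:attribute name="id" type="xs:ID" use="required"/>
    <!-- use defaults to "optional" -->
    <xs:attribute name="dept" type="xs:string"/>
  </xs:complexType>
</xs:schema>
```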

Table A–122 value

Description Value of the facet

Sections 8.4.1 on p. 142, 8.4.2 on p. 143, 8.4.3 on p. 145, 8.4.4
on p. 145, 8.4.5 on p. 148, 8.4.8 on p. 151
Elements enumeration, fractionDigits, length, maxExclusive,
maxInclusive, maxLength, minExclusive,
minInclusive, minLength, pattern, totalDigits,
whiteSpace, explicitTimezone (version 1.1 only)
Type/valid values various, depending on the facet and base type

Table A–123 version

Description Version number of the schema
Section 23.2 on p. 626
Element schema
Type/valid values token

Table A–124 xpath

Description XPath to the selected nodes or key field
Section 17.8 on p. 435
Elements field, selector
Type/valid values XPath subset
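
A sketch of the xpath attribute on selector and field follows; the names catalog, product, number, and productKey are hypothetical. The selector path is relative to the element that carries the constraint, and the field path is relative to each selected node.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="product" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="number" type="xs:integer" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <!-- Every product child of catalog must have a unique number attribute -->
    <xs:key name="productKey">
      <xs:selector xpath="product"/>
      <xs:field xpath="@number"/>
    </xs:key>
  </xs:element>
</xs:schema>
```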

Table A–125 xpathDefaultNamespace

Description The default namespace for XPath expressions
Sections 14.1.3.1 on p. 373, 17.9.1 on p. 441
Elements alternative, assert, assertion, selector, field,
schema
Type/valid values anyURI | "##defaultNamespace" |
"##targetNamespace" | "##local"
Appendix B
Built-in simple types

B.1 Built-in simple types

Table B–1 Built-in simple types

Name | Description or range | Examples | Section

String and name types
string | character string | This is a string! | 11.2.1
normalizedString | character string with “replace” whitespace processing | This is a normalizedString! | 11.2.1
token | character string with “collapse” whitespace processing | This is a token! | 11.2.1
Name | valid XML name | size | 11.2.2
NCName | non-colonized (unprefixed) name | size | 11.2.3
language | natural language identifier | en-GB, en-US, fr | 11.2.4

Numeric types
float | single-precision 32-bit floating-point number | -INF, -3E2, -0, 0, 24.3e-3, 12, 15.2, INF, NaN | 11.3.1
double | double-precision 64-bit floating-point number | -INF, -3E2, -0, 0, 24.3e-3, 12, 15.2, INF, NaN | 11.3.1
decimal | any decimal number | -3, +3.5, +.5, 3, 3., 003.0, 3.0000 | 11.3.2
integer | any integer | -3, 0, +3, 05, 4268 | 11.3.3
long | -9223372036854775808 to 9223372036854775807 | -3000000000, +3, 699999999999999 | 11.3.3
int | -2147483648 to 2147483647 | -2147483648, +3, 2147483647 | 11.3.3
short | -32768 to 32767 | -32768, +3, 32767 | 11.3.3
byte | -128 to 127 | -128, +3, 127 | 11.3.3
positiveInteger | 1, 2, . . . | 1, +426822752 | 11.3.3
nonPositiveInteger | . . . , -2, -1, 0 | -426822752, -1, 0 | 11.3.3
negativeInteger | . . . , -2, -1 | -426822752, -1 | 11.3.3
nonNegativeInteger | 0, 1, 2, . . . | 0, +1, 426822752 | 11.3.3
unsignedLong | 0 to 18446744073709551615 | 0, 70000000000000 | 11.3.3
unsignedInt | 0 to 4294967295 | 0, 4294967295 | 11.3.3
unsignedShort | 0 to 65535 | 0, 65535 | 11.3.3
unsignedByte | 0 to 255 | 0, 255 | 11.3.3

Date and time types
date | date, YYYY-MM-DD | 2004-04-12 | 11.4.1
time | time, hh:mm:ss.sss | 13:20:00.000, 13:20:00Z, 13:20:00-05:00 | 11.4.2
dateTime | date and time (YYYY-MM-DDThh:mm:ss.sss) followed by an optional time zone | 2004-04-12T13:20:00, 2004-04-12T13:20:00-05:00 | 11.4.3
dateTimeStamp (1.1) | date and time (YYYY-MM-DDThh:mm:ss.sss) followed by a required time zone | 2004-04-12T13:20:00Z, 2004-04-12T13:20:00-05:00 | 11.4.4
gYear | specific year, YYYY | 2004 | 11.4.5
gYearMonth | specific year and month, YYYY-MM | 2004-04 | 11.4.6
gMonth | recurring month of the year, --MM | --04 | 11.4.7
gMonthDay | recurring day of the year, --MM-DD | --04-12 | 11.4.8
gDay | recurring day of the month, ---DD | ---12 | 11.4.9
duration | length of time, PnYnMnDTnHnMnS | P2Y6M5DT12H35M30.5S, P2Y | 11.4.10
yearMonthDuration (1.1) | length of time in years and/or months, PnYnM | P2Y6M, P2Y, P12M | 11.4.11
dayTimeDuration (1.1) | length of time in days, hours, minutes, and/or seconds, PnDTnHnMnS | P5DT12H35M30.5S, P5D, PT12H | 11.4.12

XML DTD types
ID | unique identifier | | 11.5.1
IDREF | reference to a unique identifier | | 11.5.2
IDREFS | list of IDREF | | 11.5.3
ENTITY | unparsed entity reference | | 11.5.4
ENTITIES | list of ENTITY | | 11.5.5
NMTOKEN | single token (no whitespace) | small | 11.5.6
NMTOKENS | list of NMTOKEN | small medium large | 11.5.7
NOTATION | notation reference | | 11.5.8

Other types
QName | qualified name (may be prefixed or unprefixed) | prod:size, size | 11.6.1
boolean | logical | true, false, 0, 1 | 11.6.2
hexBinary | binary with hex encoding | 0FB8, 0fb8 | 11.6.3
base64Binary | binary with base64 encoding | 0FB8, 0fb8 | 11.6.3
anyURI | URI reference (absolute or relative) | http://datypic.com, ../prod.html#shirt, urn:example:org | 11.6.4

B.2 Applicability of facets to built-in simple types
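Table B–2 Applicability of facets to built-in simple types

A dash (-) means the facet is not applicable to the type; the other codes are explained in the key after the table. The assertion and explicitTimezone facets are available in version 1.1 only, as are the types marked (1.1).

Name | length | minLength | maxLength | minExclusive | minInclusive | maxInclusive | maxExclusive | totalDigits | fractionDigits | whiteSpace | pattern | enumeration | assertion (1.1) | explicitTimezone (1.1)

String and name types
string | A | A | A | - | - | - | - | - | - | V | A | A | A | -
normalizedString | A | A | A | - | - | - | - | - | - | V | A | A | A | -
token | A | A | A | - | - | - | - | - | - | C | A | A | A | -
Name | A | A | A | - | - | - | - | - | - | C | V | A | A | -
NCName | A | A | A | - | - | - | - | - | - | C | V | A | A | -
language | A | A | A | - | - | - | - | - | - | C | V | A | A | -

Numeric types
float | - | - | - | A | A | A | A | - | - | C | A | A | A | -
double | - | - | - | A | A | A | A | - | - | C | A | A | A | -
decimal | - | - | - | A | A | A | A | A | A | C | A | A | A | -
integer | - | - | - | A | A | A | A | A | 0 | C | V | A | A | -
long | - | - | - | A | V | V | A | A | 0 | C | V | A | A | -
int | - | - | - | A | V | V | A | A | 0 | C | V | A | A | -
short | - | - | - | A | V | V | A | A | 0 | C | V | A | A | -
byte | - | - | - | A | V | V | A | A | 0 | C | V | A | A | -
positiveInteger | - | - | - | A | V | A | A | A | 0 | C | V | A | A | -
nonPositiveInteger | - | - | - | A | A | V | A | A | 0 | C | V | A | A | -
negativeInteger | - | - | - | A | A | V | A | A | 0 | C | V | A | A | -
nonNegativeInteger | - | - | - | A | V | A | A | A | 0 | C | V | A | A | -
unsignedLong | - | - | - | A | V | V | A | A | 0 | C | V | A | A | -
unsignedInt | - | - | - | A | V | V | A | A | 0 | C | V | A | A | -
unsignedShort | - | - | - | A | V | V | A | A | 0 | C | V | A | A | -
unsignedByte | - | - | - | A | V | V | A | A | 0 | C | V | A | A | -

Date and time types
date | - | - | - | A | A | A | A | - | - | C | A | A | A | A
time | - | - | - | A | A | A | A | - | - | C | A | A | A | A
dateTime | - | - | - | A | A | A | A | - | - | C | A | A | A | A
dateTimeStamp (1.1) | - | - | - | A | A | A | A | - | - | C | A | A | A | V
gYear | - | - | - | A | A | A | A | - | - | C | A | A | A | A
gYearMonth | - | - | - | A | A | A | A | - | - | C | A | A | A | A
gMonth | - | - | - | A | A | A | A | - | - | C | A | A | A | A
gMonthDay | - | - | - | A | A | A | A | - | - | C | A | A | A | A
gDay | - | - | - | A | A | A | A | - | - | C | A | A | A | A
duration | - | - | - | A | A | A | A | - | - | C | A | A | A | -
yearMonthDuration (1.1) | - | - | - | A | A | A | A | - | - | C | A | A | A | -
dayTimeDuration (1.1) | - | - | - | A | A | A | A | - | - | C | A | A | A | -

XML DTD types
ID | A | A | A | - | - | - | - | - | - | C | V | A | A | -
IDREF | A | A | A | - | - | - | - | - | - | C | V | A | A | -
IDREFS | A | V | A | - | - | - | - | - | - | C | A | A | A | -
ENTITY | A | A | A | - | - | - | - | - | - | C | V | A | A | -
ENTITIES | A | V | A | - | - | - | - | - | - | C | A | A | A | -
NMTOKEN | A | A | A | - | - | - | - | - | - | C | V | A | A | -
NMTOKENS | A | V | A | - | - | - | - | - | - | C | A | A | A | -
NOTATION | A | A | A | - | - | - | - | - | - | C | A | A | A | -

Other types
QName | A | A | A | - | - | - | - | - | - | C | A | A | A | -
boolean | - | - | - | - | - | - | - | - | - | C | A | - | A | -
hexBinary | A | A | A | - | - | - | - | - | - | C | A | A | A | -
base64Binary | A | A | A | - | - | - | - | - | - | C | A | A | A | -
anyURI | A | A | A | - | - | - | - | - | - | C | A | A | A | -

Other varieties
List types | A | A | A | - | - | - | - | - | - | C | A | A | A | -
Union types | - | - | - | - | - | - | - | - | - | - | A | A | A | -
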
A — Facet is applicable to this type.
V — Facet has a value for this type, but it is not fixed.
0 — Facet is applicable to this type, but the only value that can be specified
is 0.
C — Facet is applicable to this type, but the only value that can be specified
is collapse.
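
To show how the table translates into a type definition, here is a minimal sketch that restricts xs:integer using facets the table lists as applicable to it. The type name DressSizeType and the chosen bounds are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="DressSizeType">
    <xs:restriction base="xs:integer">
      <!-- Bounds facets: applicable to integer -->
      <xs:minInclusive value="2"/>
      <xs:maxInclusive value="18"/>
      <!-- pattern: integer already has a built-in pattern;
           this one narrows it further -->
      <xs:pattern value="\d{1,2}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

A length facet in this restriction would be an error, because the length facets do not apply to integer.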
Index

Symbols

_ (underscore)
  in NMTOKEN type, 243
  in XML names, 40, 91, 167, 208, 559
- (hyphen, dash, minus sign)
  in binary types, 249
  in dates, 221–228, 234
  in durations, 229–233
  in NMTOKEN type, 243
  in numbers, 214–219
  in regular expressions, 161, 165, 172–176
  in time values, 233–234
  in XML names, 40, 91, 167, 208, 559
  in XPath 2.0, 356
, (comma)
  in DTDs, 477
  in regular expressions, 176–177
; (semicolon), in regular expressions, 162
: (colon)
  in NMTOKEN type, 243
  in time values, 222–225, 233–234
  in XML names, 167, 208–209, 246, 559
!= operator (XPath 2.0), 356, 378
? (question mark)
  in DTDs, 477
  in regular expressions, 162, 165, 176–177
/ (slash)
  in binary types, 249
  in XPath, 436–437
// (XPath), 438
. (period)
  in NMTOKEN type, 243
  in numbers, 215–216
  in regular expressions, 162, 165, 175
  in time values, 222–225
  in XML names, 40, 91, 167, 209, 559
  in XPath, 436–437
.// (XPath), 436–437
^ (circumflex accent), in regular expressions, 165, 173–175
' (apostrophe), in regular expressions, 163
" (quote), in regular expressions, 163
( ) (parentheses)
  in DTDs, 476–477
  in regular expressions, 161–162, 165, 175–178
  in XPath 2.0, 355
[ ] (square brackets), in regular expressions, 160–162, 165, 171–177

Index entries in gray refer to XML Schema 1.1.


{ } (curly brackets), in regular expressions, 160–162, 165, 176–177
@ (commercial at), in XPath, 436–437
* (asterisk)
  in DTDs, 474, 477
  in regular expressions, 162–163, 165, 176–177
  operator (XPath 2.0), 356
  wildcard (XPath), 436–437
\ (backslash), in regular expressions, 161–166, 175
& (ampersand)
  in character entities, 206
  in regular expressions, 163
  in strings, 205
# (number sign)
  in regular expressions, 162
  in URIs, 251–252
% (percent sign), in URIs, 251–252
+ (plus sign)
  in binary types, 249
  in DTDs, 477
  in numbers, 104, 214–219, 254
  in regular expressions, 162, 165–166, 176–177
  in time values, 234
  in XPath 2.0, 356
< (less than), 163, 206
  in regular expressions, 163
  in strings, 205
  in XPath 2.0, 356, 378
<= operator (XPath 2.0), 356, 378
= (equals sign)
  in binary types, 249
  in XPath 2.0, 356, 378
> (greater than), 163
  in regular expressions, 163
  in XPath 2.0, 356, 378
>= operator (XPath 2.0), 356, 378
| (vertical bar)
  in DTDs, 474, 477–478
  in regular expressions, 160–162, 165, 176–178
  in XPath, 435, 437
-0 (negative zero), numeric value, 213–214
0 (zero)
  as boolean value, 247
  as numeric value, 213–214
  in dates, 221, 226
  leading/trailing, in numbers, 104, 125, 215–217, 219, 254
1 (boolean value), 247

A

a element (HTML), 525
abstract attribute
  of complexType element, 343, 346–348
  of element element, 418, 420
  syntax of, 671
all group, 26, 276–278
  avoiding in web services, 548
  element declarations/references in, 276, 391
  extending, 310–311, 606
  group references in, 276, 391
  in complex types, 390
  in named model groups, 386, 391
  occurrence constraints in, 276–277, 532
  restricting, 325–328
    in version 1.1, 625
  syntax of, 277, 649
  vs. DTDs, 477
  wildcards in, 276
#all value
  of block attribute, 344, 420
  of final attribute, 152, 343
    in version 1.1, 418
alternative element, 97
  syntax of, 376, 650
  test attribute of, 375–376
  type attribute of, 376
  xpathDefaultNamespace attribute of, 375, 381


&amp; entity reference
  in regular expressions, 163
  in strings, 205
ancestor keyword (XPath), 438
and operator (XPath 2.0), 356, 378
annotation element, 31, 497–498, 581, 585
  syntax of, 581, 650
  vs. DTDs, 497–498
  vs. non-native attributes, 589
anonymous complex types, 22, 96–97, 260–261
  never equal, 269
  vs. named, 550
anonymous simple types, 22, 96
  definitions of, 120–121, 132–133
  readability of, 134
  vs. named, 133–134, 550
ANY specifier (DTDs), 474, 480
any wildcard, 27, 285–288, 480, 601
  namespace attribute of, 287, 329
  notNamespace attribute of, 289–290
  notQName attribute of, 290
  processContents attribute of, 287–289
  syntax of, 285, 650
##any value (namespace attribute on wildcard), 287, 329
anyAtomicType type, 203
anyAttribute wildcard, 27, 298, 398, 602
  namespace attribute of, 298, 315–316, 336–337
  notNamespace attribute of, 298
  notQName attribute of, 298
  processContents attribute of, 298, 315–316, 336–337, 602–603
  syntax of, 299, 651
anySimpleType type, 121, 203
anyType type, 96, 203
anyURI type, 250–253
  facets applicable to, 253, 698
  values of, 143–144
&apos; entity reference, 163
appinfo element, 31, 581, 587–588
  syntax of, 587, 651
applications
  compatibility of, 634–635
  ignoring irrelevant elements and attributes in, 635
  providing information for, 6, 31
appliesToEmpty attribute (defaultOpenContent element), 298
  syntax of, 672
assert element, 365–372
  syntax of, 366, 651
  test attribute of, 352, 375
  xpathDefaultNamespace attribute of, 375
assertion element, 137, 150
  for built-in types, 695–698
  syntax of, 353, 652
  test attribute of, 352, 375
  xpathDefaultNamespace attribute of, 375
assertions, 33, 351–375
  and namespaces, 372–375
  for complex types, 365–372
  for list types, 363–365
  for simple types, 353–365
  for union types, 185
  inheriting, 362, 370–372
  multiple on the same type, 355
  type errors in, 359–362
  using XPath 2.0 for, 352
  with multistep paths, 368
  with predicates, 368
assisted editing, 541
asterisk. See *
atomic types, 181–183
atoms, in regular expressions, 161–176
attribute declarations, 4, 18, 115–122, 281–284
  from a different namespace, 117
  global, 115–117, 119–120, 394, 550
  local, 19, 117–120, 122, 281, 339, 394, 550


attribute declarations (cont.)
  location of, in type definitions, 281
  overriding, 459, 466
  removing, 619
  restricting, 318, 333–335
  reusing, 597
  target namespace of, 48
  vs. wildcards, 619
attribute element, 115–125
  default attribute of, 123
  fixed attribute of, 123
  form attribute of, 122
  inheritable attribute of, 126–127
  name attribute of, 117
  ref attribute of, 115, 117
  syntax of:
    global declaration, 116, 652
    local declaration, 118, 652
    reference, 282, 653
  targetNamespace attribute of, 339–341
  type attribute of, 120–121, 394
  use attribute of, 117, 119, 283, 394, 482–483, 688
attribute group references, 395–399
  in attribute groups, 398–399
  in complex type definitions, 284, 396
  location of, in type definitions, 281
attribute groups, 19, 32, 120, 392–400, 544
  attribute references in, 394
  default, 284, 399–400
  definitions of, 18, 393
  local attribute declarations in, 394
  location of, in type definitions, 281
  names of, 393, 545, 559
    duplicating (illegal), 397
  order of, 396
  overriding, 459, 467, 491–492, 613–614
    risks of, 470–471
  redefining, 449, 451, 456–458, 490–491, 600, 611–612
    risks of, 470–471
  reusing, 597
  vs. complex type extension, 403
  vs. DTDs, 485–486
  wildcards in, 395, 398
attribute references, 282–283
  in attribute groups, 394
  location of, in type definitions, 281
attribute wildcards, 27, 298, 602
  extending complex types with, 315
  for forward compatibility, 623
  in attribute groups, 395, 398
  processContents attribute of, 315, 336
  restricting, 335–337, 458
  vs. attribute declarations, 619
  See also anyAttribute element
attributeFormDefault attribute (schema element), 77, 122
  ignoring for attribute names, 580
  qualified value of, 122
  syntax of, 672
  unqualified value of, 122
attributeGroup element, 393
  ref attribute of, 395
  syntax of:
    definition, 393, 653
    reference, 396, 653
attributes, 19
  and default namespace declarations, 39
  applicable to all elements, 79–81
  co-constraints for, 33
  deprecated, 627, 638
  empty, 124, 145
  extending, 314, 490–491
  inherited, 126–127, 283, 382–383
  names of, 4, 117, 545, 559–563
    changing, 619
    duplicating (illegal), 45, 119, 397, 470
    in XPath, 367, 436–437, 439
    qualified (prefixed), 44–46, 117, 119–120, 122–123, 580
    unprefixed, 44, 122
    unqualified, 40, 119, 122–123, 580


  nillable, 435
  non-native, 588–589, 629
  order of, 115, 281
  prohibited, 333, 335
  required vs. optional, 117, 119, 123, 143, 283, 333, 457, 482, 618–619, 627
  reusing, 302
  types associated with, 4, 18, 21, 120–121, 125, 236–246, 457
    enumerated, 481
    vs. DTDs, 480–482
  units of measurement in, 130
  values of:
    default, 82, 114, 123–124, 283, 333, 457, 482, 499, 548
    fixed, 82, 123, 125, 283, 333, 457, 548
    validating, 114
    whitespace in, 82
  vs. elements, 113
avg function (XPath 2.0), 363

B

b element (HTML), 525
backslash, in regular expressions, 161–166, 175
backward compatibility, 618–622
  and version numbers, 626, 631–632
base attribute (restriction element), 136
  syntax of, 672
base64 encoding, 249
base64Binary type, 248–250
  facets applicable to, 250, 698
  values of:
    empty, 143
    length of, 249
BIDI elements, 207
binary types, 248–250, 695
  facets applicable to, 250, 698
  values of:
    length of, 143–144, 249
    valid, 145
block attribute
  #all value of, 344, 420
  empty value of, 345
  extension value of, 344, 420
  of complexType element, 343–346
  of element element, 322, 346, 418–419
  restriction value of, 344, 420
  substitution value of, 420
  syntax of, 672–673
block element (XSL-FO), 525
block escapes, 170–171
blockDefault attribute (schema element), 77, 345
  syntax of, 673
boolean type, 247–248, 255
  facets applicable to, 698
bounds facets, 142
  applicable to:
    date and time types, 235
    duration types, 230
    integer types, 217–218
    simple content, 317–318
  changing restrictiveness of, 619–620
br element (HTML), 265
branches (in regular expressions), 159–161, 177–178
built-in types, 23, 201–255, 691–695
  facets applicable to, 138–139, 141, 152, 695–698
  hierarchy of, 202–203
  namespace of, 50
  restricting, 135, 140
byte type, 218
  facets applicable to, 696

C

C, in category escapes, 169
\c and \C, multicharacter escapes, 167
camel case, 560
carriage return character
  in regular expressions, 163, 165–166
  in strings, 151
CDATA data type specifier (DTDs), 481


chameleon namespaces, 65–66, 565, 572–574
character class escapes, 161, 164
  block, 170–171
  category, 167–169
  multicharacter, 166
  single-character, 165
character class expressions, 161, 171
  escaping, 175
  list of characters in, 171
  negating, 173–174
  ranges in, 172
  subtracting from, 174
character references, 162–163
chatty services, 522
choice group, 26, 273–276
  avoiding in web services, 548
  compatibility of, 619
  element wildcards in, 609
  extending, 309, 489–490, 607–608
  in complex types, 390
  in named model groups, 386
  meaningless, 321
  nesting, 275–276
  occurrence constraints in, 274–275
  restricting, 324–329
  syntax of, 273, 654
  vs. DTDs, 477
  vs. substitution groups, 414–415
circumflex accent, in regular expressions, 165, 173–175
class hierarchies, 518
co-constraints, 586
code generating, 541
collapse value (whiteSpace facet), 104, 125, 151–152, 189, 205–206, 255
colon. See :
comma. See ,
comments, 497–498
common components libraries, 597–598
compatibility, 617–626
  application, 634–635
  backward, 618–622, 626, 631–632
  forward, 623–626
complex content
  deriving complex types from, 304–305
  extending, 307–316
  restricting, 318–329
    in version 1.1, 320
  vs. database tables, 504
  vs. DTDs, 476
complex type definitions
  attribute group references in, 396
  attribute groups in, 394, 397
  attribute wildcards in, 298, 398, 602
  attributes in, 115–119, 281, 283–284, 397
  element declarations in, 93, 266–268, 387
  element wildcards in, 601
  ID attributes in, 236
  named model groups in, 278–279, 387, 389
  repeating element names in, 415–416
complex types, 19, 257–298
  abstract, 346, 348
  anonymous, 22, 96–97, 260–261, 269, 550
  assertions for, 365–372
  associating with element names, 96
  base type of, 302, 314–315, 319
  extending, 22, 27, 302–303, 305–316, 403, 606
  global vs. local, 19, 22
  named, 258–260, 484–485, 550, 559–560, 597
  overriding, 459, 465
    risks of, 468–470
  preventing derivation of, 343–344
  redefining, 449, 451, 453–454, 600, 609
    risks of, 468–470
  restricting, 22, 27, 301–303, 316–337, 455, 603
  reusing, 597
  vs. named model groups, 403–404
  vs. OO classes, 514


  vs. simple types, 21
  with open content, 604
complexContent element, 304
  syntax of, 304, 654
complexType element, 28, 96, 258–261
  abstract attribute of, 343, 346–348
  block attribute of, 343–346
  defaultAttributes attribute of, 400
  final attribute of, 343–344
  mixed attribute of, 265
  no direct element declarations in, 270
  syntax of:
    anonymous definition, 261, 655
    named definition, 259, 654
composition, 519–522
conditional expressions (XPath), 369–370
conditional inclusion, 642
conditional type assignment, 375–383
  and namespaces, 381–382
  inherited attributes in, 382–383
  validation errors in, 380–381
  with declared types, 378
  with default types, 377
contains function (XPath 2.0), 357
content models, 26, 261–269
  absence of, for empty content, 479
  and web services, 548
  deterministic, 279–280
  eliminating meaningless groups in, 320–321
  extending, 27, 305–313, 607
  in DTDs, 473–480
  location of extensions in, 600, 611
  named model groups in, 390
  nondeterministic, 280, 470, 602
  open, 27, 292–298, 311, 600, 619
  restricting, 318–333, 455
  reusability of, 302, 385
  reusing, 484–485
content types, 25–26, 266, 473–480
  See also complex, element-only, empty, mixed, simple content
Coordinated Universal Time. See UTC
Costello, Roger, 550
count function (XPath 2.0), 363
curly brackets, in regular expressions, 160–162, 165, 176–177

D

D, in durations, 229–233
\d multicharacter escape, 160–161, 166–167, 173
\D multicharacter escape, 166
dash. See -
data binding tools
  and generic elements, 520
  complex types in, 517–518
  processing relationships in, 511
databases
  coupling with XML messages, 504
  foreign keys in, 32, 430, 510
  generating instances from, 82
  mapping elements to, 586–587
  names in, 560
  tables and columns in, 504
datatypes, 201
date and time types, 221–235, 693
  comparing dates in, 235
  facets applicable to, 142, 235, 696–697
date type, 221
  facets applicable to, 696
dateTime type, 223–224
  facets applicable to, 697
dateTimeStamp type, 224–225
  facets applicable to, 697
dayTimeDuration type, 232–233
  facets applicable to, 697
DCD (Document Content Description), 11
DDML (Document Definition Markup Language), 11
debugging, 6, 542
decimal point, 215–216
decimal type, 145, 213–216
  canonical representation of, 215
  facets applicable to, 216, 696


declarations, 18
  See also attribute, element, namespace declarations
default attribute
  of attribute element, 123, 283, 482
  of element element, 101
  syntax of, 673
default values
  avoiding in web services, 548
  for nils, 107
  of attributes, 82, 114, 123–124, 283, 333, 457, 482, 499
  of elements, 82, 101–103, 110, 269
  of occurrence constraints, 477
defaultAttributes attribute (schema element), 284, 399–400
  syntax of, 674
defaultAttributesApply attribute (schema element), 284, 400
  syntax of, 674
##defaultNamespace value (xpathDefaultNamespace attribute), 375
defaultOpenContent element, 295–298
  mode attribute of, 296
  syntax of, 295, 655
##defined value (notQName attribute), 290–291, 625
##definedSibling value (notQName attribute), 290–291, 625
definitions, 18
  order of, 19
  See also complex type definitions
deprecated element, 638
derivation. See type derivation
descendant keyword (XPath), 438
deterministic content models, 279–280
digits
  in binary types, 249
  in NMTOKEN type, 243
  in regular expressions, 161–162, 166, 168
  in XML names, 40, 91, 208, 559
distinct-values function (XPath 2.0), 363
div operator (XPath 2.0), 356
DocBook, 526
documentation, 6–7, 31, 580–592
  generating from schemas, 541, 545, 584
  human-readable, 497–498, 541, 545, 581
  metadata in, 582, 585
  on namespaces, 589–592
  reusing, 584
  separate authoring, 527
documentation element, 497–498, 581–584
  source attribute of, 582
  syntax of, 583, 655
documents. See schema documents
double type, 213–215
  canonical representation of, 213
  facets applicable to, 215, 696
DTDs (Document Type Definitions), 9–10, 473–499
  #PCDATA in, 474–475, 478
  attributes in, 475
    types of, 480–482
    values of, 114, 482
  comments in, 497–498
  converting to schemas, 10
  elements in, 476, 478
  empty content in, 479
  extensibility of, 486–492
  general entities in, 493, 499
  groups in, 476–484
  limited support of namespaces in, 564
  occurrence constraints in, 477
  parameter entities in, 483–492
  reusing:
    attributes, 485–486
    content models, 484–485
  unparsed entities in, 240, 242, 493, 496
  using with schemas, 499
  whitespace handling in, 474

wildcards in, 480 syntax of:

duration type, 229–231 global declaration, 90–91, 656
facets applicable to, 697 local declaration, 93, 656
reference, 267, 656
E targetNamespace attribute of,
e or E (exponent), in numbers, 213–214 339–341
e-book readers, 524 type attribute of, 96–97, 267, 387
element declarations, 4, 18, 89–110 element references, 267–268
abstract, 420 duplicating names of, 268–269
duplicating names of, 268–269 in model groups, 270, 276–279, 388,
global, 19, 89–92, 550, 578 391
identity constraints in, 425 occurrence constraints in, 92
in content models, 266–269 element substitution groups. See
in model groups, 270, 276–279, 387, substitution groups
391 element wildcards, 27, 285–288, 601
in substitution groups, 95, 114 for forward compatibility, 623
multiple, 413–414 in choice groups, 609
local, 19, 93–96, 99, 266, 339, 550, overlapping values of, 280
578 restricting, 322–324
missing external components in, 76 vs. element declarations, 604, 619
occurrence constraints in, 94 in version 1.1, 624
overriding, 459, 466 in version 1.1, 280
referencing types in, 96 See also any, replacement elements
removing, 619 elementFormDefault attribute
restricting, 321–322 (schema element), 77, 99, 101, 578
reusing, 597 overriding, 100
target namespace of, 48 qualified value of, 99, 402, 576
vs. OO instance variables, 514 syntax of, 674
vs. wildcards, 280, 604, 619, 624 unqualified value of, 77, 99, 101,
element element, 28, 89–95 577–578
abstract attribute of, 418, 420 element-only content, 25, 264
block attribute of, 322, 346, 418–419 extending, 305
default attribute of, 101 restricting, 317
final attribute of, 418–419 elements, 19, 89–110
fixed attribute of, 101 absent, 101, 105
form attribute of, 100 block vs. inline, 525
name attribute of, 266, 339 co-constraints for, 33
nillable attribute of, 109 container, 531–532
ref attribute of, 267, 388 deprecated, 627, 638
substitutionGroup attribute of, empty, 101, 105–106, 143, 145
410–412 for relationships, 512–514
generic, 520

elements (cont.) empty function (XPath 2.0), 358

intermediate, 527–531, 546 EMPTY specifier (DTDs), 474, 479
mapping to databases, 586–587 ends-with function (XPath 2.0), 358
names of, 4, 96, 545, 559–563, 597 ENTITIES type, 242–243
changing, 619 facets applicable to, 243, 697
generic vs. specific, 533–537 in DTDs, 481
in XPath, 439 ENTITY type, 240–241
qualified, 91, 98–100, 575, 580 facets applicable to, 241, 697
searching in content, 525 in DTDs, 481, 483–492
unprefixed, 39, 94 enumeration facet, 137, 145–148, 481
unprefixed qualified, 40 case-insensitive, 155
unqualified, 40, 96, 576–580 extending, 146–147
unqualified local, 98–101 for built-in types, 695–698
vs. names of rows in databases, 506 for derived types, 140
nillable, 103, 106–110, 115, 143, 435 for list types, 192–194, 240, 243–244
and identity constraints, 110 for union types, 185
of derived declarations, 322 syntax of, 138, 657
order of, 115, 270–272, 506, 536, 619 error built-in type (XML Schema
in narrative content, 526 namespace), 380–381
repeating, 114, 506, 536 escapes. See character class escapes
in all groups, 532 exists function (XPath 2.0), 358
replacement, 285–291, 323 explicitTimezone facet, 137–138,
required vs. optional, 143, 271, 531, 150
536, 618–619, 627 for built-in types, 695–698
root, 87, 95, 523 syntax of, 657
separate vs. composite values, 532–533 value attribute of, 150, 234
types associated with, 4, 18, 20–21, 96 exponent, in numbers, 213–214
values of: extensibility, 8, 542–543, 599–614
and units of measurement, 130 and anonymous types, 134
data types for, 535 in DTDs, 486–492
default, 82, 101–103, 107, 110, intermediate elements for, 531
269, 548 extension
fixed, 82, 101–104, 110, 321, 548 of all groups, 606
whitespace in, 82 of attributes, 314, 490–491
vs. attributes, 114 of choice groups, 489–490, 607–608
else keyword (XPath 2.0), 369–370 of complex types, 22, 27, 302–303,
empty attribute values, 124, 145 305–316, 403, 516, 606
empty content, 25–26, 265–266 of content models, 27, 305–313, 607
applying default open content to, of enumerations, 146–147
298 of schemas, 8, 531, 542–543, 599–614
extending, 305, 313 of sequence groups, 486, 488,
restricting, 317, 332–333 606–607
vs. DTDs, 479

extension element, 303, 305, 625 finalDefault attribute (schema

syntax for: element), 77, 153, 344
complex content, 307, 658 overriding, 153, 419
simple content, 306, 657 syntax of, 676
extension value fixed attribute
of block attribute, 344, 420 of attribute element, 123, 283,
of final attribute, 153, 343, 419 482, 631
of element element, 101
F of facets, 140–141
facetAvailable attribute (Version syntax of:
Control namespace), 645–646 on declaration, 677
syntax of, 675 on facet, 677
facets, 24, 135–152 fixed values
applicability of, 138, 204, 695 avoiding in web services, 548
changing restrictiveness of, 619–620 of attributes, 82, 123, 125, 283, 333,
fixed, 140–141 457
implementation-defined, 155, 642 of elements, 82, 101–104, 110, 321
inheriting, 139, 204 of schema's version, 631
order of, 136 #FIXED specifier (DTDs), 483
prelexical, 151, 643, 646 float type, 213–215
facetUnavailable attribute (Version canonical representation of, 213
Control namespace), 645–646 facets applicable to, 215, 696
syntax of, 675 floating-point numbers, 213
false function (XPath 2.0), 358 form attribute
field element ignoring for attribute names, 580
syntax of, 435, 658 of attribute element, 122
xpath attribute of, 435 of element element, 100
xpathDefaultNamespace attribute qualified value of, 100, 122
of, 375 syntax of, 677
final attribute unqualified value of, 100, 122
#all value of, 152, 343 forward compatibility, 623–626
in version 1.1, 418 in version 1.1, 625, 641–642
empty value of, 343, 419 fractionDigits facet, 137, 145
extension value of, 153, 343, 419 applicability of, 138
list value of, 153 fixed, 141
of complexType element, 343–344 for built-in types, 695–698
of element element, 418–419 for numeric types, 219
of simpleType element, 152–153 syntax of, 138, 658
restriction value of, 153, 343, 419 fragment identifiers, in URIs, 251
syntax of:
on complex type, 675 G
on element, 676 Garden of Eden design, 555–557
on simple type, 676 gDay type, 228
union value of, 153 facets applicable to, 697

general entities (DTDs), 493, 499 in attribute groups, 394

gMonth type, 227 in DTDs, 481
facets applicable to, 697 limitations of, 424
gMonthDay type, 227–228 unique values of, 236
facets applicable to, 697 using attributes for, 115
granularity of data, 130 identity constraints, 19, 32, 238,
Gregorian calendar, 221, 225 423–444
group element, 386–390 and namespaces, 54, 439–440
ref attribute of, 388–389, 519 changing restrictiveness of, 322
syntax of: definitions of, 18
definition, 387, 659 fields in, 110, 426, 434
reference, 389, 659 for containment relationships, 511
group references, 388 location of, in element declarations,
in all groups, 276, 391 425
in complex types, 387, 390 names of, 426, 559
nested, 392 referencing, 442–444
occurrence constraints in, 386, 390 scope of, 424, 426
self-referencing, 456 selectors in, 426, 433
&gt; entity reference, 163 XPath subset for, 435–438
gYear type, 225 IDREF type, 237–238
facets applicable to, 697 comparing values of, 254
gYearMonth type, 226 facets applicable to, 238, 697
facets applicable to, 697 for containment relationships, 511
in DTDs, 481
H limitations of, 424
H, in durations, 229–233 using attributes for, 115
hexadecimal encoding, 162, 248 IDREFS type, 239–240
hexBinary type, 248–250 comparing values of, 254
facets applicable to, 250, 698 facets applicable to, 240, 697
values of: in DTDs, 481
comparing, 255 if keyword (XPath 2.0), 369–370
empty, 143 #IMPLIED specifier (DTDs), 483
length of, 249 import element, 30, 66–74
hyperlinks, 525 and namespaces, 49, 85, 568–572
hyphen. See - chained, 70–74
namespace attribute of, 68
I schemaLocation attribute of, 68, 85
i-, in language names, 211 syntax of, 67–68, 659
\i and \I, multicharacter escapes, 167 top-level location of, 68
IANA (Internet Assigned Numbers vs. DTDs, 492
Authority), 211 include element, 29, 62–66, 450,
id attribute, 678 565–568
ID type, 236–237 chameleon, 65–66, 565, 572–574
facets applicable to, 237, 697 schemaLocation attribute of, 63
for containment relationships, 511

syntax of, 63, 660 ISO/IEC 19757-2 standard, 12

top-level location of, 63 item types, 182, 188
vs. DTDs, 492 anonymous, 189
indexes, for narrative content, 525 facets applicable to, 192–193
INF (infinity), numeric value, 213–214 length of, 192
-INF (negative infinity), numeric value, lists for (illegal), 196–197
213–214 restricting, 198
+INF (positive infinity), numeric value, unions for, 196
213–214 whitespace in, 189, 195
inheritable attribute (attribute itemType attribute, 189
element), 126–127, 283, 382–383 syntax of, 678
syntax of, 678
instances, 30, 79–87 K
augmenting, 82–83 key constraints, 423, 428–429
upgrading with XSLT, 639 changing restrictiveness of, 322
XHTML representation of, 587 fields in, 426, 428–429
xsi:schemaLocation attribute of, names of, 426
30 referencing, 442–444
int type, 217 scope of, 426, 429, 432
canonical representation of, 204 selectors in, 426
facets applicable to, 696 key element
integer type, 51, 217 for containment relationships, 511
comparing values of, 254–255 syntax of:
facets applicable to, 141, 217, 696 definition, 429, 660
preceding sign rule for, 217 reference, 442, 660
restricting, 142 key references, 423, 430–433
whitespace in, 104, 125 changing restrictiveness of, 322
integer types, 217–220 fields in, 426, 431, 433
canonical representation of, 217 names of, 426
facets applicable to, 217–219 referencing, 442–444
values of: scope of, 426, 432
comparing, 220, 253 selectors in, 426
length of, 144–145 keyref element
vs. strings, 220 for containment relationships, 511
interleave value (mode attribute), 293, refer attribute of, 430
311 syntax of:
internationalization, 582 definition, 430, 661
IRIs (Internationalized Resource reference, 442, 661
Identifiers), 251
Is (in block escapes), 170 L
ISO 11179 standard, 584 L, in category escapes, 168
ISO 3166 standard, 211 lang attribute (XML namespace), 59,
ISO 639 standard, 211 120, 211
ISO 8601 standard, 221 syntax of, 678

language type, 211–213 long type, 217

facets applicable to, 213, 696 facets applicable to, 696
last function (XPath 2.0), 364 lower-case function (XPath 2.0), 357
lax value (processContents &lt; entity reference
attribute), 288, 291, 602–603 in regular expressions, 163
in open content, 605 in strings, 205
length facet, 137, 143
changing restrictiveness of, 619–620 M
for binary types, 249 M
for built-in types, 695–698 in category escapes, 168
for list types, 192, 240, 243–244 in durations, 229–233
syntax of, 138, 661 mantissa, in numbers, 213
letters marks, in regular expressions, 168
in binary types, 249 matches function (XPath 2.0), 358
in NMTOKEN type, 243 max function (XPath 2.0), 363
in regular expressions, 161, 168 maxExclusive facet, 137, 142
in XML names, 40, 91, 167, 208, 559 for built-in types, 695–698
line feed character for date and time types, 235
in regular expressions, 163–166 for duration types, 230
in strings, 151 syntax of, 138, 662
list element, 188–189 maxInclusive facet, 137, 142
syntax of, 188, 662 fixed value of, 140
list types, 181–183, 188–198 for built-in types, 695–698
assertions for, 363–365 for date and time types, 235
comparing, 253 for duration types, 230
derived from string-based types, 195 for integer types, 217–218
disadvantages of, 190 for simple content, 317–318
empty, 192 syntax of, 138, 662
facets applicable to, 182, 189–194, maxLength facet, 137, 143
240, 243–244, 698 changing restrictiveness of, 619–620
item types of, 182, 188, 196–197 for built-in types, 695–698
length of, 143, 192 for list types, 192, 240, 243–244
no absent or nil items in, 189–190 syntax of, 138, 663
restricting, 136, 182, 190–194 maxOccurs attribute
list value (final/finalDefault changing restrictiveness of, 321,
attributes), 153 324–329
local names, 40 default value of, 477
##local value for replacement elements, 286
of namespace attribute, 287 in all groups, 277
of notNamespace attribute, 289 in element declarations, 94, 267, 322
of xpathDefaultNamespace in element references, 92, 267
attribute, 375 in element wildcards, 602
localization, 582 in group references, 386, 390, 456
syntax of, 679

unbounded value of: in group references, 386, 390, 456

in all groups, 276 in sequence groups, 271
in choice groups, 274 syntax of, 680
vs. DTDs, 477 vs. DTDs, 477
maxVersion attribute (Version Control minus sign. See -
namespace), 641 minVersion attribute (Version Control
syntax of, 679 namespace), 641
member types, 182, 184–187 syntax of, 680
memberTypes attribute, 184–185 missing values, 105–106
syntax of, 679 mixed attribute, 331, 478
metacharacters, 162 of complexContent element, 305
metadata, 114–115, 582, 585 of complexType element, 265
using attributes for, 526 syntax of, 680
min function (XPath 2.0), 363 mixed content, 25–26, 207–208, 264
minExclusive facet, 137, 142 avoiding in web services, 548
for built-in types, 695–698 default/fixed values for, 102
for date and time types, 235 extending, 305, 312–313
for duration types, 230 restricting, 317, 331–332
syntax of, 138, 663 vs. DTDs, 478
minInclusive facet, 137, 142 mod operator (XPath 2.0), 356
fixed value of, 140 mode attribute
for built-in types, 695–698 interleave value of, 293, 311
for date and time types, 235 none value of, 293, 331
for duration types, 230 of defaultOpenContent element,
for integer types, 217–218 296
for simple content, 317–318 of openContent element, 293, 311
syntax of, 138, 663 restricted, 329–330
minLength facet, 137, 143 suffix value of, 293, 296, 311, 329
changing restrictiveness of, 619–620 syntax of, 681
for built-in types, 695–698 model groups, 26, 270–279
for list types, 192, 240, 243–244 definitions of, 18
syntax of, 138, 664 meaningless, 320–321
minOccurs attribute named. See named model groups
changing restrictiveness of, 321, nesting, 26, 275–276
324–329 restricting, 324–329
default value of, 477 See also all, choice, sequence
for defaulted elements, 102 group
for replacement elements, 286 modifiers, in regular expressions, 168
in all groups, 276–277, 310 multicharacter escapes, 166
in choice groups, 275
in element declarations, 94, 267 N
in element references, 92, 267 N, in category escapes, 168–169
in element wildcards, 602 \n single-character escape, 163–165

name attribute reusing, 597

of attribute element, 117 vs. anonymous, 133–134, 550
of element element, 92, 339, 387 names, 35–46
of group element, 386 capitalization in, 560
syntax of, 681 case sensitiveness of, 40, 559
Name type, 208–209 changing, 619
facets applicable to, 209, 695 disallowed for replacement elements,
named complex types, 22, 258–260 289–291
extending, 606 duplicating (illegal), 45, 119, 397, 470
names of, 258, 559–560 good practice of, 545, 559–563
referencing in element declarations, 96 in databases and programming
reusing, 597 languages, 560
vs. anonymous, 550 length of, 560
vs. DTDs, 484 non-colonized, 37, 40, 91, 210
named model groups, 19, 32, 385–392, prefixed, 40
544 qualified, 40, 246
all groups in, 391 of attributes, 44–46, 117, 119–120,
and namespaces, 401–403 122–123, 580
definitions of, 386–388 of elements, 91
element declarations in, 387 uniqueness of, 75–76, 117
element references in, 388 qualified local, 98–100, 575, 580
names of, 386, 545, 559 searching in content, 525
occurrence constraints in, 386 separators in, 560
overriding, 459, 467, 613–614 terms and abbreviations in, 561–562
risks of, 470–471 uniqueness of, 19, 557, 568, 574
redefining, 449, 451, 454–456, 600, unprefixed, 40, 94
611–612 of attributes, 122
risks of, 470–471 of elements, 39
referencing, 388–392, 456 undeclaring default namespace with,
in complex types, 278–279, 43
389–391 unqualified, 40
in named model groups, 392 of attributes, 119, 122–123, 580
reusing, 597 of elements, 96
target namespace of, 48 unqualified local, 98–101, 576–580
vs. DTDs, 476–484 valid XML, 208, 559
vs. OO concepts, 519 namespace attribute
vs. type derivation, 403–404, 520 ##any value of, 287, 329
See also all, choice, sequence ##local value of, 287
group ##other value of, 287, 323, 602
named simple types, 22, 121 ##targetNamespace value of, 287
definition of, 131–132 of attribute wildcard, 298, 315–316,
in local attributes, 120 336–337
names of, 131, 559–560 of derived attribute declaration,
referencing in element declarations, 96 335–336

of derived element declaration, 322 target. See target namespace

of element wildcard, 287 Namespaces in XML recommendation,
of import element, 68 36–37
of restricted open content, 329–330 NaN (Not a Number), numeric value,
syntax of: 213–214
on import, 682 narrative content, 524–527
on wildcard, 681 NCName type, 210, 236
namespace declarations, 29, 37–39 comparing values of, 254
default, 39–43 facets applicable to, 210, 695
and attributes, 39 NCNames. See non-colonized names
undeclaring, 43–44 NDR (Naming and Design Rules), 548,
overriding, 42–43 601
scope of, 41 negation, in regular expressions, 165,
setting in schema documents, 52–54 173–175
namespace-qualified names. See qualified negative infinity. See -INF
names negative sign. See -
namespaces, 28–29, 35–54 negativeInteger type, 217
advantages of, 36, 564 facets applicable to, 696
and assertions, 372–375 newline. See line feed character
and conditional type assignment, nil attribute (XSI namespace), 51, 80,
381–382 103, 107–110, 143
and imports, 66–67, 568–572 syntax of, 682
and includes, 565–568 nil values, 103, 106–110, 115
and named model groups, 401–403 nillable attribute (element element),
and schemas, 48, 565 109
chameleon, 65–66, 565, 572–574 of derived element declarations, 322
default, 52, 101, 571, 578 syntax of, 682
for path expressions, 60 NLM XML, 526
disallowed for replacement elements, NMTOKEN type, 243–244
289–291 facets applicable to, 244, 697
documenting, 589–592 in DTDs, 481
in path expressions, 440–441 NMTOKENS type, 244
limited support in DTDs for, 564 facets applicable to, 245, 697
multiple in an instance, 85 in DTDs, 481
names of, 35–44 noNamespaceSchemaLocation
case sensitiveness of, 36 attribute (XSI namespace), 51, 80,
version numbers in, 632 83–84, 86
of overridden schema documents, syntax of, 683
459–462, 572 non-colonized names (NCNames), 37,
of redefined schema documents, 448, 40, 91, 210
450, 572, 600 nondeterministic content models, 280,
of replacement elements, 287 470, 602
organizing, 565–574 none value (mode attribute), 293, 331
prefixes for, 28, 37, 41 non-native attributes, 588–589, 629

nonNegativeInteger type, 217 number of fractional digits in, 137,

facets applicable to, 696 145
nonPositiveInteger type, 217 numeric types, 213–220, 692–693
facets applicable to, 696 enumerating, 147–148
normalizedString type, 205–208, 481 facets applicable to, 142, 148, 696
empty value of, 103, 143, 206
facets applicable to, 205–207, 695 O
whitespace in, 151–152, 205–206 OASIS international consortium, 12
normalize-space function object-oriented concepts, 514–522
(XPath 2.0), 357 occurrence constraints
not function (XPath 2.0), 358, 378 changing restrictiveness of, 321,
notation element, 494 324–329, 618–620
syntax of, 664 for defaulted elements, 102
NOTATION type, 245–246 for replacement elements, 286
facets applicable to, 246, 697 in all groups, 276, 310
in DTDs, 481–482 in choice groups, 275
restricting, 495 in element declarations, 94, 267, 322
notations, 19, 245–246, 493–496 in element references, 92, 267
and unparsed entities, 496 in element wildcards, 602
declarations of, 18, 494–495 in group references, 386, 390, 456
names of, 494, 559–563 in sequence groups, 271
overriding, 459 vs. DTDs, 477
reusing, 597 open content, 292–298, 604–605
target namespace of, 48 adding/removing, 619
notNamespace attribute and forward compatibility, 625
##local value of, 289 default, 295–298
##targetNamespace value of, 289 extending, 311–312
of attribute wildcard, 298 restricting, 329–331
of element wildcard, 289–290 vs. other extension mechanisms, 600
syntax of, 683 openContent element, 292, 311, 604
notQName attribute mode attribute of, 293, 311, 329–330
##defined value of, 290–291, 625 removing in restriction, 331
##definedSibling value of, syntax of, 292, 664
290–291, 625 optional value
of attribute wildcard, 298 of use attribute, 283
of element wildcard, 290, 625 of value attribute, 150
syntax of, 683 or operator (XPath 2.0), 356, 378
number sign. See # ##other value (namespace attribute on
numbers wildcard), 287, 323, 602
decimal, 215 override element, 459–463, 612–614
floating-point, 213 annotations in, 581, 585
in regular expressions, 168 order of new definitions in, 463
number of digits in, 137, 139, syntax of, 463, 665
144–145, 148–149 top-level location of, 462

overrides, 33, 459–471, 612–614 #PCDATA specifier (DTDs), 474–475,

and target namespace, 459–462, 572 478
of attribute groups, 467, 491–492 percent sign, in URIs, 251–252
of complex types, 465 performance, 82
of global declarations, 466 period. See .
of named model groups, 467 Perl programming language, 159
of simple types, 464–465 plus sign. See +
ripple effect of, 461–462 position function (XPath 2.0), 364
risks of, 468–471 positiveInteger type, 217
vs. DTDs, 488 comparing values of, 254–255
vs. other extension mechanisms, 600 facets applicable to, 696
prefixed names, 40
P of attributes, 44–46, 120, 122
P prefixes, 28, 37, 98
in category escapes, 168 in path expressions, 439
in durations, 229–233 mapping to:
p element (HTML), 525 target namespace, 29, 53
\p multicharacter escape, 161, 169–170, XML Schema Namespace, 38,
177 50–52
\P multicharacter escape, 169–171 naming rules for, 41
parameter entities (DTDs) prelexical facets, 151
external, 492 preprocess facet (Saxon), 155
for attribute extensions, 490–491 preserve value (whiteSpace facet),
for extensibility, 486–492 104, 125, 151–152, 205, 254
for reuse, 483–486 primitive types, 203
internal, 483–491 additional, 203
parent keyword (XPath), 438 processContents attribute
parentheses. See ( ) and forward compatibility, 623
particles (of complex type), 262 lax value of, 288, 291, 602–603
path expressions (XPath), 367–369, in open content, 605
435–440 of attribute wildcard, 298, 315–316,
default namespace for, 60 336–337, 602–603
unprefixed names in, 440–441 of element wildcard, 287–289
pattern facet, 137, 139, 148–149 skip value of, 287–289
changing restrictiveness of, 619–620 strict value of, 288–289
for built-in types, 695–698 syntax of, 684
for derived types, 140 prohibited value
for duration types, 231 of use attribute, 283
for list types, 194 of value attribute, 150
for numeric types, 219 proxy schemas, 74
for union types, 185 public attribute (notation element),
multiple occurrences of, 148 493
syntax of, 138, 665 syntax of, 684

punctuation signs, in category escapes, of simple types, 452–453

168 ripple effect of, 451
purchase orders, 503, 507 risks of, 468–471
intermediate elements for, 527–531 vs. DTDs, 486–487
ref attribute
Q of attribute groups, 395
QName type, 246 of attributes, 115, 117
comparing values of, 254 of elements, 91, 267, 388
facets applicable to, 247, 698 of identity constraints, 442
qualified names (QNames), 40, 98–100, of named model groups, 388–389
246 syntax of, 684
local, 575, 580 refer attribute (keyref element), 430
local part of, 40 syntax of, 684
of attributes, 44–46, 122, 580 regular expressions, 148–149, 159–178
of elements, 40, 91 atoms in, 161–176
uniqueness of, 75–76, 91, 117 branches in, 159–161, 177–178
qualified value characters in, 161–162, 173
of attributeFormDefault nested, 175
attribute, 122 pieces in, 159, 161
of elementFormDefault attribute, quantifiers in, 161, 165–166, 176–177
99, 402, 576 ranges in, 173
of form attribute, 100, 122 relational models, 503–514
quantifiers (in regular expressions), 161, relationship elements, 512–514
165–166, 176–177 relationships, 507–514
question mark. See ? many-to-many, 507–514
&quot; entity reference, 163
one-to-one, 507
R with references, 510–512
\r single-character escape, 163, 165 with relationship elements, 512–514
RDDL (Resource Directory Description with repetition, 508–509
Language), 590–592 RELAX NG schema language, 12–14
readability rendition, 525–526
and named/anonymous types, 134 replace value (whiteSpace facet),
and namespace prefixes, 37 151–152, 205
of smaller schema documents, 557 replacement elements, 285–291
redefine element, 448–452 disallowed namespaces and names
annotations in, 581, 585 of, 289–291
syntax of, 451, 665 in derived element declarations, 323
top-level location of, 451 namespaces of, 287
redefinition, 33, 448–459 occurrence constraints for, 286
and target namespace, 448, 450, 572 any number of, 293
avoiding in web services, 548 validating, 289
of attribute groups, 456–458, 490–491 See also element wildcards
of complex types, 453–454 representation, 525
of named model groups, 454–456

required value round-half-to-even function

of use attribute, 283 (XPath 2.0), 358
of value attribute, 150, 234 Ruby annotations, 207
#REQUIRED specifier (DTDs), 483 Russian Doll design, 551–552
restriction, 303, 305
assertions in, 353–354, 371 S
of all groups: S
in version 1.1, 625 in category escapes, 169
of attribute wildcards, 458 in durations, 229–230, 232–233
of complex types, 455, 603 \s and \S, multicharacter escapes, 166
of content models, 455 Salami Slice design, 553–554
of integer types, 142 SAX (Simple API for XML), 635
of item types, 198 saxon:preprocess facet, 155, 643,
of list types, 136, 182, 190–194 646
of notations, 495 schema documents, 57–77
of simple types, 135–136, 138–140, combining, 61–77
182 comments in, 497–498
of union types, 185 defaults of, 77
with regular expressions, 148–149 location of, 83, 630
restriction element missing external components of, 76
base attribute of, 136 modularizing, 28–30, 57, 62, 492,
syntax for: 557–559, 565–574, 597
complex content, 319, 667 namespace declarations in, 52
list type, 191, 666 natural language of, 59, 211
simple content, 318, 667 overriding, 461
simple type, 136, 666 readability of, 37, 134, 557
union type, 186, 666 redefining, 448, 450
restriction value reusing, 544
of block attribute, 344, 420 structure of, 58–61
of final attribute, 153, 343, 419 sections in, 585–586
reusability, 8 top level of, 19
and anonymous types, 134 global declarations at, 89, 115
of content models, 302, 385 imports at, 68
of schemas, 543, 597 includes at, 63
reusable groups. See named model groups named model groups at, 386
RFC 2396 standard, 251 overrides at, 462
RFC 3066 standard, 211 redefines at, 451
RFC 3548 standard, 249 version numbers of, 59, 628–629
RFC 3987 standard, 251 fixed, 631
RFC 4646 standard, 211 schema element, 28, 58–60, 393
RFC 4647 standard, 211 annotations in, 581, 585
root element, 87, 95 attributeFormDefault attribute
specific for individual operations, 523 of, 77, 122, 580
round function (XPath 2.0), 358 blockDefault attribute of, 77, 345

schema element (cont.) selector element

defaultAttributes attribute of, syntax of, 433–434, 668
399–400 xpath attribute of, 435
elementFormDefault attribute of, xpathDefaultNamespace attribute
77, 99–101, 576, 578 of, 375
finalDefault attribute of, 77, 153, semicolon, in regular expressions, 162
344, 419 separators, in category escapes, 169
syntax of, 59, 668 sequence group, 26, 270–272
targetNamespace of, 49–50 extending, 486–488, 606–607
version attribute of, 59, 626, in complex content extensions,
628–629, 631, 689 307–308
xml:lang attribute of, 59 in complex types, 390
xpathDefaultNamespace attribute in named model groups, 386
of, 60, 373–375, 441 meaningless, 321
schema languages, 9–14 nesting, 275–276
schemaLocation attribute occurrence constraints in, 271
of import element, 68, 85 overriding, 488
of include element, 63 redefining, 486–488
syntax of, 685 restricting, 324–329
schemaLocation attribute (XSI syntax of, 270, 668
namespace), 30, 51, 80, 83–87, 588 turning into all or choice group,
of imported documents, 571 619
schemas, 3–8 vs. DTDs, 477
and namespaces, 48, 565 service contracts, 541
components of, 18–20 short type, 218
designing, 7–8, 502, 526–527, facets applicable to, 696
539–580 simple content, 25, 262–263
documenting changes to, 637 assertions for, 354
extending, 8, 531, 542–543, 599–614 default/fixed values for, 102
generating documentation from, 541, deriving complex types from, 303–304
584 extending, 305–306
interoperability of, 518 restricting, 317–318
mapping to XHTML forms, 586 vs. database columns, 504
organizing, 527–537, 550–559 vs. DTDs, 475
portability of, 156 simple types, 19, 21, 23–25, 129–156
preprocessing, 642 anonymous, 22, 96, 120–121,
reusing, 8, 543–544, 557, 584, 132–134, 550
596–597 assertions for, 353–365
tool compatibility of, 547 associating with element names, 96
uses for, 5–7, 540–542 base type of, 135–136, 139, 182
using with DTDs, 499 built-in. See built-in types
version numbers of, 626–634 changing restrictiveness of, 619
versioning, 531, 547–549, 558–559 deriving, 182
Schematron schema language, 13–14 facets applicable to, 138–139

global, 22 square brackets, in regular expressions,

implementation-defined, 154, 160–162, 165, 171–172
642–645 starts-with function (XPath 2.0),
local, 19 357
named, 120–121, 131–134, 550, strict value (processContents
559–560, 597 attribute), 288–289
overriding, 459, 464–465 string type, 97, 205–208
risks of, 468–470 facets applicable to, 205–207, 695
patterns in, 160 values of:
preventing derivation of, 152–153 comparing, 254–255
redefining, 449, 452–453, 600, 609 empty, 103, 143, 206
risks of, 468–470 whitespace in, 104, 125, 151–152,
restricting, 24, 135–136, 138–140, 205–206, 254
182 string types, 691–692
with regular expressions, 148–149 deriving list types from, 195
reusing, 597 values of:
turning into unions/lists, 619 comparing, 220, 254
values of: length of, 143–144
comparing, 253 valid, 145
default/fixed, 102 vs. integers, 220
vs. complex types, 21 whitespace in, 151–152
vs. datatypes, 201 string-length function (XPath 2.0),
vs. DTDs, 474 357
simpleContent element, 263, stylesheet element (XSLT), 631
303–304, 306 substitution groups, 32, 407–414,
syntax of, 304, 669 607–609
simpleType element, 96, 120 and data binding tools, 518
final attribute of, 152–153 compatibility of, 619
syntax of: controlling, 418–420
anonymous definition, 133, 669 declaring, 409–412
named definition, 132, 669 disadvantages of, 417
using with restriction, 136 element declarations in, 95, 114
single-character escapes, 165 multiple, 413–414
skip value (processContents for inline elements, 525
attribute), 287–289 head of, 408–409, 608
smartphones, 524 members of, 408–410
source attribute (documentation type constraints for, 412–413
element), 582 vs. choice groups, 414–415
syntax of, 685 vs. DTDs, 489
SOX (Schema for Object-oriented XML), vs. other extension mechanisms, 600
11 substitution value
space character (block/blockDefault attributes),
in regular expressions, 162–163, 166 420
in strings, 151, 205–206


substitutionGroup attribute targetNamespace attribute

(element element), 410–412 of element or attribute elements,
list of names in, 413–414 339–341
syntax of, 685 of schema element, 49–50
substring, substring-after, syntax of:
substring-before functions on local declaration, 686
(XPath 2.0), 357 on schema, 686
suffix value (mode attribute), 293, 296, ##targetNamespace value
311, 329 of namespace attribute, 287
sum function (XPath 2.0), 364 of notNamespace attribute, 289
system attribute (notation element), of xpathDefaultNamespace
121, 493 attribute, 374
syntax of, 686 test attribute (alternative, assert,
assertion elements), 352,
T 375–376
T syntax of, 686
in dates, 223 using XPath in, 378–380
in durations, 229–230, 232–233 text() (XPath), 438
\t single-character escape, 163, 165 then keyword (XPath 2.0), 369–370
tab character time type, 222–223
in regular expressions, 163, 165–166 comparing values of, 255
in strings, 151, 205–206 facets applicable to, 697
tables of contents, for narrative content, time zones, 222–234
525 and bounds facets, 235
target namespace, 28–29, 48–54 token type, 205–208
and conditional type assignment, facets applicable to, 205–207, 695
381 values of:
and global attributes, 117 comparing, 255
and identity constraints, 54 empty, 103, 143, 206
and named model groups, 401 vs. integers, 220
and notations, 494 whitespace in, 151–152, 205–206, 254
changing, 619 totalDigits facet, 137, 144–145
for included documents, 62–66 for built-in types, 695–698
in path expressions, 440–441 syntax of, 138, 670
making default, 52 trading partners, 5
mapping prefixes to, 29, 53 Trang schema converter, 10
multiple in assembled schema true function (XPath 2.0), 358
documents, 28–29 type attribute
of overridden schema documents, of alternative element, 376
459–462, 572 of attribute element, 120–121, 394
of redefined schema documents, 448, of element element, 96–97, 267, 387
450, 572 syntax of, 687


type attribute (XSI namespace), 51, 80, typeUnavailable attribute (Version

120, 518, 600 Control namespace), 644–645
avoiding in web services, 548 syntax of, 688
for member types, 187
for repeated element names, 415–416 U
for type derivation, 606 unbounded value (maxOccurs
for type redefinition, 609 attribute), 274
for type substitution, 342, 605 in all groups, 276
syntax of, 687 underscore. See _
type constraints, 412–413 Unicode standard, 162, 165, 167, 205,
type derivation, 301–348, 605 251
by extension, 22, 27, 302–303, 403, union element, 183–185
606 syntax of, 183, 670
by restriction, 22, 301–303, 316–337 union types, 146, 181–187
prohibiting, 343–348 comparing, 253
vs. named model groups, 403–404, facets applicable to, 185, 698
520 lists of, 196
vs. OO class inheritance, 514–518 member types of, 182, 184–187
type libraries, 584 restricting, 185
type substitution, 115, 302, 341–342, union value (final/finalDefault
518, 605 attributes), 153
avoiding in web services, 548 unique element, 426–428
prohibiting, 344–346 syntax of:
vs. other extension mechanisms, 600 definition, 427, 670
typeAvailable attribute (Version reference, 442, 671
Control namespace), 644–645 uniqueness constraints, 423, 426–428
syntax of, 687 changing restrictiveness of, 322
types, 4, 18–23 defining, 425
applicability of, 597 fields in, 110, 426, 428
canonical representation of, 204 names of, 426
comparing, 432 referencing, 442–444
definitions of, 18 scope of, 426, 428
extending, 516 selectors in, 426, 428
hierarchy of, 22, 253 units of measurement, 130
inheriting, 544 unparsed entities (DTDs), 240, 242, 493
lexical representations of, 204, 254 and notations, 496
named vs. anonymous, 22, 96 unprefixed names, 40, 43
names of, 545, 597 and default namespace, 40
redefining, 600, 609 of attributes, 44, 122
reusing, 523–524, 544 of elements, 39, 94
target namespace of, 48 unqualified names
value spaces of, 204 and default namespace, 101
See also complex, simple types local, 98–101, 576–580
of attributes, 40, 122, 580


unqualified names (cont.) with RELAX NG, 12

of elements, 40, 96 with Schematron, 13–14
unqualified value value attribute, 137
of attributeFormDefault of explicitTimezone facet, 150,
attribute, 122 234
of elementFormDefault attribute, syntax of, 688
77, 99, 101, 577–578 $value variable, 353, 358
of form attribute, 100, 122 for list types, 363–365
unsignedByte, unsignedInt, vc:facetAvailable attribute,
unsignedLong, unsignedShort 645–646
types, 218 syntax of, 675
facets applicable to, 696 vc:facetUnavailable attribute,
UPA (Unique Particle Attribution), 279, 645–646
602, 604, 624 syntax of, 675
upper-case function (XPath 2.0), 357 vc:maxVersion attribute, 641
URIs (Uniform Resource Identifiers), 36, syntax of, 679
250–251 vc:minVersion attribute, 641
URLs (Uniform Resource Locators), 36, syntax of, 680
251 vc:typeAvailable attribute, 645
spaces in, 85 syntax of, 687
URNs (Uniform Resource Names), 36, vc:typeUnavailable attribute, 645
251 syntax of, 688
use attribute (attribute element), 117, Venetian Blind design, 554–557
119, 394, 482–483 version attribute (schema element),
optional value of, 283 59, 626, 628–629, 631
prohibited value of, 283 fixed value of, 631
required value of, 283 syntax of, 689
syntax of, 688 Version Control Namespace (vc), 51,
user documentation. See documentation 641, 645–646
UTC (Coordinated Universal Time), 233 versioning, 617–639
and compatibility, 617–626
V defining strategy for, 547–549, 636
validation, 5, 81–82, 540–541 granularity of, 558–559
against both DTDs and schemas, 114, intermediate elements for, 531
499 major and minor versions for,
and performance, 82 626–627, 633
and specific root elements, 523 numbering, 626–634
by type, 7, 21 vertical bar. See |
choosing schema documents for, 30,
87 W
co-constraints for, 586 \w and \W, multicharacter escapes, 167
of concatenated values, 130 W3C (World Wide Web Consortium),
of intra-document references, 525 11
strictness of, 287–289, 588 web browsers, 524


web services, 522–524 XLink (XML Linking Language)

whitespace for external documentation, 584
in binary types, 249 using attributes for, 115
in DTD processors, 474 XML DTD types, 236–246, 694
in lists, 182, 188, 195 facets applicable to, 697
in regular expressions, 166 length of values of, 143–144
normalizing, 82, 254, 499 XML (Extensible Markup Language)
whiteSpace facet, 103, 137, 151 documents. See instances
collapse value of, 104, 125, intermediate elements in, 527–531,
151–152, 189, 205–206, 255 546
for built-in types, 695–698 names in, 167
for list types, 189 separating content from representation
for NMTOKEN type, 243 in, 524–526
for numeric types, 104, 125, 219 XML messages, 502–537
for strings, 104, 125, 205–207, 254 tightly coupling with databases, 504
preserve value of, 104, 125, XML Schema 1.0
151–152, 205, 254 all groups in, 532
replace value of, 151–152, 205 attribute groups in, 394
syntax of, 138, 671 element declarations vs. wildcards in,
wildcard escape character, 164 604, 624
wildcards, 284–298, 600–604 generic elements in, 535–537
adding/removing, 619 redefinition in, 448–459
for attributes. See attribute wildcards XML Schema 1.1, 11, 640–641
for forward compatibility, 623 +INF value in, 213–214
location of, in complex type all groups in, 276, 310–311, 391,
definitions, 602 606, 625
negative, 289–291, 625 alternative element in, 97,
restricting, 322–324 375–376, 650
vs. DTDs, 480 assertions in, 33, 137, 150, 185,
vs. other extension mechanisms, 600 351–375, 651–652, 686,
See also attribute, element wildcards 695–698
wrapper lists, 531–532 attributes in Version Control
WXS (W3C XML Schema). See XML Namespace in, 51, 641,
Schema 645–646
backward-compatible with 1.0, 11
X conditional type assignment in,
x-, in language names, 211 375–383
XDR (XML Data Reduced), 11 date and time types in, 224–225,
XHTML (Extensible HyperText Markup 231–233, 697
Language), 526 defaultAttributes and
and simple types, 207 defaultAttributesApply
including elements from, 572 attributes in, 284, 399–400,
mapping schemas to forms in, 586 674


XML Schema 1.1 (cont.) using XPath 2.0 with, 352, 355–365,
defaultOpenContent element in, 367–370, 378–380, 435–440
295–298, 655, 672, 681 wildcards in, 289–291, 293, 683
element declarations in: xpathDefaultNamespace attribute
multiple, 413 in, 60, 373–375, 381, 441, 689
vs. wildcards, 280, 624 XML Schema Instance Namespace, 51,
element wildcards in, 625 79–80, 108
elementFormDefault attribute in, XML Schema Namespace, 50, 97
100 prefixes mapped to, 38, 50–52
explicitTimezone facet in, XML Schema recommendation, 11–14,
137–138, 150, 234, 657, 201
695–698 xml:lang attribute, 59, 120, 211
field element in, 375 syntax of, 678
final attribute in, 418 xmlns attribute, 39
finalDefault attribute of, 153, xmlns prefix, 37, 39
419 xpath attribute, 435
forward compatibility in, 625, syntax of, 689
641–642, 679–680 XPath language, 13
ID type in, 236 and list types, 190
implementation-defined facets in, attributes in, 367, 436–437
155, 642, 645–646, 675 expressions in, 367–369, 435–440
implementation-defined simple types processing relationships in, 511
in, 154, 642–645, 687 unprefixed names in, 440–441
inheritable attributes in, 126–127, wildcards in, 436–437
283, 382–383, 678 XPath 2.0 language
integer values in, 218 comparing types in, 359–362
IRIs in, 251 conditional expressions in, 369–370
namespaces in, 36–37, 43–44, for assertions, 352
289–291, 459–462, 572, 683 functions in, 357–359, 363–364
open content in, 292–298, 311–312, in conditional type assignment,
329–331, 600, 604–605, 619, 378–380
625, 664, 681 operators in, 355–356
overrides in, 33, 459–471, 488, xpathDefaultNamespace attribute
491–492, 572, 581, 585, 600, ##defaultNamespace value of, 375
612–614, 665 ##local value of, 375
primitive types in, 203 ##targetNamespace value of, 374
processContents attribute in, 605 of alternative element, 375, 381
referencing identity constraints in, of assert element, 375
442–444, 660–661, 671 of assertion element, 375
restrictions in, 320 of field element, 375
selector element in, 375 of schema element, 60, 373–375,
substitution groups in, 413–414 441
targetNamespace attribute in, of selector element, 375
339–341, 686 syntax of, 689


xs prefix, 28, 38, 50–52, 97 for repeated element names, 415–416

See also built-in types for type derivation, 606
xs:error built-in type, 380–381 for type redefinition, 609
XSD (W3C XML Schema Definition for type substitution, 342, 605
Language). See XML Schema syntax of, 687
xsd prefix, 38, 50–52 XSL-FO (Extensible Stylesheet Language
XSDL (XML Schema Definition Formatting Objects), 526
Language). See XML Schema XSLT (Extensible Stylesheet Language
xsi prefix, 80 Transformations), 635
xsi:nil attribute, 51, 80, 103, and list types, 190
107–110, 143 for upgrading instances, 639
syntax of, 682 processing messages in, 521, 532
xsi:noNamespaceSchemaLocation schema-awareness of (version 2.0), 417
attribute, 51, 80, 83–84, 86
syntax of, 683 Y
xsi:schemaLocation attribute, 30, Y, in durations, 229–231
51, 80, 83–87, 588 yearMonthDuration type, 231–232
of imported documents, 571 facets applicable to, 697
syntax of, 685
xsi:type attribute, 51, 80, 120, 518, Z
600 Z
avoiding in web services, 548 in category escapes, 169
for member types, 187 in time values, 233–234
zero. See 0
