4-Schemas
How can we make tools that check that an XML document is a syntactically correct Recipe Markup Language document (and thus meaningful)?

schema: a formal definition of the syntax of an XML language
schema language: a notation for writing schemas
An Introduction to XML and Web Technologies 3 An Introduction to XML and Web Technologies 4
Validation
  instance document + schema -> schema processor
    valid:   normalized instance document
    invalid: error message

Why use Schemas?
• Formal but human-readable descriptions
• Data validation can be performed with existing schema processors
Examples
Associates a DTD schema with the instance document

  <?xml version="1.1"?>
  <!DOCTYPE collection SYSTEM "http://www.brics.dk/ixwt/recipes.dtd">
  <collection>
    ...
  </collection>

  <!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

  <!DOCTYPE collection [ ... ]>

DTD – Document Type Definition

  <!ELEMENT element-name content-model >

Content models:
• EMPTY
• ANY
• mixed content: (#PCDATA|e1|e2|...|en)*
• element content: regular expression over element names (concatenation is written with “,”)

Example:
  <!ELEMENT table
    (caption?,(col*|colgroup*),thead?,tfoot?,(tbody+|tr+)) >
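Because an element content model is a regular expression over element names, one way to check the children of a table element is to translate the model into an ordinary regex. The following is a minimal sketch under that assumption, not how any particular DTD processor works; `matches_table_model` and the space-terminated encoding of names are hypothetical choices made for illustration.

```python
import re

# Content model from the example:
#   (caption?,(col*|colgroup*),thead?,tfoot?,(tbody+|tr+))
# Each child name is encoded as the name followed by one space, so that
# "col " can never be confused with the start of "colgroup ".
TABLE_MODEL = re.compile(
    r"(caption )?((col )*|(colgroup )*)(thead )?(tfoot )?((tbody )+|(tr )+)"
)

def matches_table_model(child_names):
    """Return True if the sequence of child element names fits the model."""
    encoded = "".join(name + " " for name in child_names)
    return TABLE_MODEL.fullmatch(encoded) is not None
```

For example, `matches_table_model(["caption", "col", "col", "thead", "tbody"])` holds, while mixing `col` and `colgroup`, or `tbody` and `tr`, is rejected.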
Attribute-List Declarations
Attribute Types
Entity Declarations (2/3)
Entity Declarations (3/3)
Allow parts of schemas to be enabled/disabled by a switch
Example:
• <![%person.simple; [
    <!ELEMENT person (firstname,lastname)>
  ]]>
  <![%person.full; [
    <!ELEMENT person (firstname,lastname,email+,phone?)>
    <!ELEMENT email (#PCDATA)>
    <!ELEMENT phone (#PCDATA)>
  ]]>
  <!ELEMENT firstname (#PCDATA)>
  <!ELEMENT lastname (#PCDATA)>
• <!ENTITY % person.simple "INCLUDE" >
  <!ENTITY % person.full "IGNORE" >

A DTD processor (also called a validating XML parser)
• parses the input document (includes checking well-formedness)
• checks the root element name
• for each element, checks its contents and attributes
• checks uniqueness and referential constraints (ID/IDREF(S) attributes)
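The steps a DTD processor performs can be sketched in a few lines. This is an illustration only, assuming the ID attribute is called `id` and the IDREF attribute is called `ref` (a real processor reads the attribute types from the DTD); `check_document` is a hypothetical helper, and the per-element content and attribute checks are left out.

```python
import xml.etree.ElementTree as ET

def check_document(xml_text, root_name, id_attr="id", idref_attr="ref"):
    """Return a list of error messages; an empty list means no errors found."""
    # Step 1: parsing fails if the document is not well-formed.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    errors = []
    # Step 2: the root element name must match the declared one.
    if root.tag != root_name:
        errors.append(f"root element is {root.tag!r}, expected {root_name!r}")

    # Step 3 (contents and attributes of each element) is omitted here.

    # Step 4a: ID values must be unique in the document.
    seen = set()
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is not None:
            if value in seen:
                errors.append(f"duplicate ID {value!r}")
            seen.add(value)

    # Step 4b: every IDREF value must match some ID.
    for elem in root.iter():
        target = elem.get(idref_attr)
        if target is not None and target not in seen:
            errors.append(f"IDREF {target!r} does not match any ID")
    return errors
```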
RecipeML with DTD (1/2)
RecipeML with DTD (2/2)
Requirements for XML Schema
Technical requirements:
• Namespace support
• User-defined datatypes
• Inheritance (OO-like)
• Evolution
• Embedded documentation
• ...

Types and Declarations
• Element declaration: associates an element name with a simple or complex type
• Attribute declaration: associates an attribute name with a simple type
Instance document:

  <b:card xmlns:b="http://businesscard.org">
    <b:name>John Doe</b:name>
    <b:title>CEO, Widget Inc.</b:title>
    <b:email>[email protected]</b:email>
    <b:phone>(202) 555-1414</b:phone>
    <b:logo b:uri="widget.gif"/>
  </b:card>

Schema:

  <schema xmlns="http://www.w3.org/2001/XMLSchema"
          xmlns:b="http://businesscard.org"
          targetNamespace="http://businesscard.org">

    <element name="card" type="b:card_type"/>
    <element name="name" type="string"/>
    <element name="title" type="string"/>
    <element name="email" type="string"/>
    <element name="phone" type="string"/>
    <element name="logo" type="b:logo_type"/>
    <attribute name="uri" type="anyURI"/>
Example (3/3)

    <complexType name="card_type">
      <sequence>
        <element ref="b:name"/>
        <element ref="b:title"/>
        <element ref="b:email"/>
        <element ref="b:phone" minOccurs="0"/>
        <element ref="b:logo" minOccurs="0"/>
      </sequence>
    </complexType>

    <complexType name="logo_type">
      <attribute ref="b:uri" use="required"/>
    </complexType>
  </schema>

Connecting Schemas and Instances

  <b:card xmlns:b="http://businesscard.org"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://businesscard.org
                              business_card.xsd">
    <b:name>John Doe</b:name>
    <b:title>CEO, Widget Inc.</b:title>
    <b:email>[email protected]</b:email>
    <b:phone>(202) 555-1414</b:phone>
    <b:logo b:uri="widget.gif"/>
  </b:card>
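The value of xsi:schemaLocation is a whitespace-separated list of pairs: a namespace URI followed by the location of a schema for that namespace. A minimal sketch of how a tool might read those pairs from an instance document, using only the Python standard library; `schema_locations` is a hypothetical helper, and no validation is performed here.

```python
import xml.etree.ElementTree as ET

# Namespace of the xsi:schemaLocation attribute.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

def schema_locations(xml_text):
    """Map each namespace URI in xsi:schemaLocation to its schema location."""
    root = ET.fromstring(xml_text)
    # ElementTree exposes namespaced attributes as "{namespace}localname".
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    # Tokens alternate: namespace, location, namespace, location, ...
    return dict(zip(tokens[0::2], tokens[1::2]))
```

Applied to the business card instance, this yields a single entry that maps the namespace `http://businesscard.org` to `business_card.xsd`.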
Derivation of Simple Types – Restriction
Examples
  <simpleType name="integerList">
    <list itemType="integer"/>
  </simpleType>

matches whitespace-separated lists of integers

  <simpleType name="boolean_or_decimal">
    <union>
      <simpleType>
        <restriction base="boolean"/>
      </simpleType>
      <simpleType>
        <restriction base="decimal"/>
      </simpleType>
    </union>
  </simpleType>
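To make the two derivations concrete, the sketch below checks string values against them. It is an approximation under stated assumptions: the regular expressions are hand-written renderings of the lexical forms of the built-in integer, boolean, and decimal types, and the function names are hypothetical.

```python
import re

# Approximate lexical forms of three built-in simple types.
INTEGER = re.compile(r"[+-]?[0-9]+")
BOOLEAN = re.compile(r"true|false|1|0")
DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

def is_integer_list(value):
    """List derivation: every whitespace-separated item must be an integer."""
    return all(INTEGER.fullmatch(item) for item in value.split())

def is_boolean_or_decimal(value):
    """Union derivation: the value must match at least one member type."""
    value = value.strip()
    return bool(BOOLEAN.fullmatch(value) or DECIMAL.fullmatch(value))
```

With these definitions, `"7 -3 +12"` is accepted as an integerList and `"7 3.5"` is not; `"true"` and `"3.14"` are both accepted as boolean_or_decimal, while `"yes"` is rejected.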
Built-In Derived Simple Types
Complex Types with Complex Contents
Derivation with Complex Content

  <complexType name="basic_card_type">
    <sequence>
      <element ref="b:name"/>
    </sequence>
  </complexType>

  <complexType name="extended_type">
    <complexContent>
      <extension base="b:basic_card_type">
        <sequence>
          <element ref="b:title"/>
          <element ref="b:email" minOccurs="0"/>
        </sequence>
      </extension>
    </complexContent>
  </complexType>

  <complexType name="further_derived">
    <complexContent>
      <restriction base="b:extended_type">
        <sequence>
          <element ref="b:name"/>
          <element ref="b:title"/>
          <element ref="b:email"/>
        </sequence>
      </restriction>
    </complexContent>
  </complexType>

Global vs. Local Descriptions

Global (toplevel) style:

  <element name="card" type="b:card_type"/>
  <element name="name" type="string"/>

  <complexType name="card_type">
    <sequence>
      <element ref="b:name"/>
      ...
    </sequence>
  </complexType>

Local (inlined) style:

  <element name="card">
    <complexType>  <!-- inlined -->
      <sequence>
        <element name="name" type="string"/>
        ...
      </sequence>
    </complexType>
  </element>

Local type definitions are anonymous

Two element declarations that have the same name and appear in the same complex type must have identical types
Namespaces
Derived Types and Subsumption

Assume D is (in some number of steps) derived from B, ED is an element declaration of type D, and EB is an element declaration of type B

If ED is in the substitution group of EB then an ED element may be used whenever an EB is required

(This is subsumption based on element declarations, not on types)

  <element name="w:widget" xmlns:w="http://www.widget.org">
    <complexType>
      ...
    </complexType>
    <key name="my_widget_key">
      <selector xpath="w:components/w:part"/>
      <field xpath="@manufacturer"/>
      <field xpath="w:info/@productid"/>
    </key>
    <keyref name="annotation_references" refer="w:my_widget_key">
      <selector xpath=".//w:annotation"/>
      <field xpath="@manu"/>
      <field xpath="@prod"/>
    </keyref>
  </element>

• key: in every widget, each part must have unique (manufacturer, productid)
• keyref: in every widget, for each annotation, (manu, prod) must match a my_widget_key
• only a “downward” subset of XPath is used
• unique: as key, but fields may be absent
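The key and keyref constraints above can be read as two passes over a widget element: collect the (manufacturer, productid) pairs of all parts, then look up each annotation's (manu, prod) pair. A minimal sketch of that reading, assuming an instance shaped as the selectors suggest; `check_widget` is a hypothetical helper and this is not how a schema processor is required to implement identity constraints.

```python
import xml.etree.ElementTree as ET

W = "http://www.widget.org"
NS = {"w": W}

def check_widget(widget):
    """Check the key and keyref constraints for one widget element."""
    errors = []
    keys = set()
    # key: selector w:components/w:part, fields @manufacturer and w:info/@productid
    for part in widget.findall("w:components/w:part", NS):
        info = part.find("w:info", NS)
        productid = info.get("productid") if info is not None else None
        key = (part.get("manufacturer"), productid)
        if None in key:
            errors.append("key field missing")  # key fields must be present
        elif key in keys:
            errors.append(f"duplicate key {key}")
        else:
            keys.add(key)
    # keyref: selector .//w:annotation, fields @manu and @prod
    for note in widget.iter(f"{{{W}}}annotation"):
        ref = (note.get("manu"), note.get("prod"))
        if None not in ref and ref not in keys:
            errors.append(f"dangling keyref {ref}")
    return errors
```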
Other Features in XML Schema
RecipeML with XML Schema (1/5)
RecipeML with XML Schema (2/5)

  <element name="recipe">
    <complexType>
      <sequence>
        <element name="title" type="string"/>
        <element name="date" type="string"/>
        <element ref="r:ingredient" minOccurs="0" maxOccurs="unbounded"/>
        <element ref="r:preparation"/>
        <element name="comment" type="string" minOccurs="0"/>
        <element ref="r:nutrition"/>
        <element ref="r:related" minOccurs="0" maxOccurs="unbounded"/>
      </sequence>
      <attribute name="id" type="NMTOKEN"/>
    </complexType>
  </element>

RecipeML with XML Schema (3/5)

  <element name="ingredient">
    <complexType>
      <sequence minOccurs="0">
        <element ref="r:ingredient" minOccurs="0" maxOccurs="unbounded"/>
        <element ref="r:preparation"/>
      </sequence>
      <attribute name="name" use="required"/>
      <attribute name="amount" use="optional">
        <simpleType>
          <union>
            <simpleType>
              <restriction base="r:nonNegativeDecimal"/>
            </simpleType>
            <simpleType>
              <restriction base="string">
                <enumeration value="*"/>
              </restriction>
            </simpleType>
          </union>
        </simpleType>
      </attribute>
      <attribute name="unit" use="optional"/>
    </complexType>
  </element>
RecipeML with XML Schema (4/5)

  <element name="preparation">
    <complexType>
      <sequence>
        <element name="step" type="string" minOccurs="0" maxOccurs="unbounded"/>
      </sequence>
    </complexType>
  </element>

  <element name="nutrition">
    <complexType>
      <attribute name="calories" type="r:nonNegativeDecimal" use="required"/>
      <attribute name="protein" type="r:percentage" use="required"/>
      <attribute name="carbohydrates" type="r:percentage" use="required"/>
      <attribute name="fat" type="r:percentage" use="required"/>
      <attribute name="alcohol" type="r:percentage" use="optional"/>
    </complexType>
  </element>

  <element name="related">
    <complexType>
      <attribute name="ref" type="NMTOKEN" use="required"/>
    </complexType>
  </element>

RecipeML with XML Schema (5/5)

  <simpleType name="nonNegativeDecimal">
    <restriction base="decimal">
      <minInclusive value="0"/>
    </restriction>
  </simpleType>

  <simpleType name="percentage">
    <restriction base="string">
      <pattern value="([0-9]|[1-9][0-9]|100)%"/>
    </restriction>
  </simpleType>

  </schema>
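The two simple types used for the nutrition attributes can be mimicked outside a schema processor to see which values they accept. A minimal sketch, assuming that an XML Schema pattern must match the whole value (hence `fullmatch`) and using a hand-written regex for the lexical form of decimal; the function names are hypothetical.

```python
import re
from decimal import Decimal

# Same pattern as in the percentage type; it must cover the entire value.
PERCENTAGE = re.compile(r"([0-9]|[1-9][0-9]|100)%")

# Approximate lexical form of the built-in decimal type (no exponent).
DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

def is_percentage(value):
    """Restriction of string by pattern."""
    return PERCENTAGE.fullmatch(value) is not None

def is_non_negative_decimal(value):
    """Restriction of decimal with minInclusive 0."""
    value = value.strip()
    return DECIMAL.fullmatch(value) is not None and Decimal(value) >= 0
```

Under these assumptions `"0%"`, `"42%"`, and `"100%"` are percentages, while `"101%"`, `"042%"`, and `"50"` are not.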
• calories should contain a non-negative number (solved)
• protein should contain a value of the form N% where N is between 0 and 100 (solved)
• comment should be allowed to appear anywhere in the contents of recipe
• unit should only be allowed in an element where amount is also present
• nested ingredient elements should only be allowed when amount is absent

1. The details are extremely complicated (and the spec is unreadable)
2. Declarations are (mostly) context insensitive
3. It is impossible to write an XML Schema description of XML Schema
4. With mixed content, character data cannot be constrained
5. Unqualified local elements are bad practice
6. Cannot require specific root element
7. Element defaults cannot contain markup
8. The type system is overly complicated
9. xsi:type is problematic
10. Simple type definitions are inflexible
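The two co-occurrence constraints on ingredient (unit only together with amount, nested ingredient elements only without amount) are not marked as solved above. One option is to check them in application code after schema validation. The sketch below is an illustration of that idea only; `check_ingredient_rules` is a hypothetical helper, and for brevity it assumes the elements are not in a namespace.

```python
import xml.etree.ElementTree as ET

def check_ingredient_rules(recipe_xml):
    """Check the amount/unit/nesting rules for every ingredient element."""
    root = ET.fromstring(recipe_xml)
    errors = []
    for ing in root.iter("ingredient"):
        name = ing.get("name", "?")
        has_amount = ing.get("amount") is not None
        # unit is only allowed where amount is also present
        if ing.get("unit") is not None and not has_amount:
            errors.append(f"{name}: unit without amount")
        # nested ingredient elements are only allowed when amount is absent
        if has_amount and ing.find("ingredient") is not None:
            errors.append(f"{name}: nested ingredient with amount")
    return errors
```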
Strengths of XML Schema
• Modularization

Document Structure Description 2.0
• OASIS + ISO competitor to XML Schema
• Validation only (no normalization)
• Designed for simplicity and expressiveness, solid mathematical foundation

• For a valid instance document, the root element must match a designated pattern
• A pattern may match elements, attributes, or character data
• Element patterns can contain sub-patterns that describe contents and attributes
Patterns – Regular Hedge Expressions
Example
</grammar>
RecipeML with RELAX NG (1/5)
RecipeML with RELAX NG (2/5)
RecipeML with RELAX NG (5/5)

  <define name="element-related">
    <element name="related">
      <attribute name="ref">
        <data datatypeLibrary="http://relaxng.org/..." type="IDREF"/>
      </attribute>
    </element>
  </define>

  <define name="PERCENTAGE">
    <data type="string">
      <param name="pattern">([0-9]|[1-9][0-9]|100)%</param>
    </data>
  </define>

  <define name="NUMBER">
    <data type="decimal"><param name="minInclusive">0</param></data>
  </define>

  </grammar>

Summary

schema: formal description of the syntax of an XML language

DTD: simple schema language
• elements, attributes, entities, ...

XML Schema: more advanced schema language
• element/attribute declarations
• simple types, complex types, type derivations
• global vs. local descriptions
• ...
http://www.w3.org/TR/xml11/
http://www.w3.org/TR/xmlschema-1/
http://www.w3.org/TR/xmlschema-2/