0432 XML DTD and XML Schema
XML Schema
Introduction to Databases
CompSci 316 Fall 2014
Semi-structured data
• Observation: most data have some structure, e.g.:
• Book: chapters, sections, titles, paragraphs, references,
index, etc.
• Item for sale: name, picture, price (range), ratings,
promotions, etc.
• Web page: HTML
• Ideas:
• Ensure data is “well-formatted”
• If needed, ensure data is also “well-structured”
• But make it easy to define and extend this structure
• Make data “self-describing”
XML
• Text-based
• Capture data (content), not presentation
• Data self-describes its structure
• Names and nesting of tags have meanings!
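A minimal sketch of what "self-describing" means in practice, using Python's standard xml.etree.ElementTree. The document below is made-up sample data (the tag names follow the bibliography example; the values are placeholders), and the program finds each piece of data purely by tag and attribute name:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; all values are placeholders.
DOC = """
<bibliography>
  <book ISBN="b1" price="80.00">
    <title>Example Title</title>
    <author>First Author</author>
    <author>Second Author</author>
    <year>2014</year>
  </book>
</bibliography>
"""

root = ET.fromstring(DOC)
book = root.find("book")

# The names and nesting of tags tell us what each value means.
record = {
    "isbn": book.get("ISBN"),
    "title": book.findtext("title"),
    "authors": [a.text for a in book.findall("author")],
    "year": int(book.findtext("year")),
}
print(record)
```

Nothing in the data says how to display it; the markup only names the content.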
A tree representation
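One way to see the tree, sketched with Python's standard library: each element becomes a node, with its attributes and text hanging underneath. The tiny document and the `outline` helper are illustrative assumptions, not part of the lecture.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-book document with placeholder values.
DOC = ("<bibliography><book ISBN='x1'>"
       "<title>T</title><author>A</author>"
       "</book></bibliography>")

def outline(elem, depth=0):
    """Return indented lines: one per element, attribute, and text node."""
    pad = "  " * depth
    lines = [f"{pad}{elem.tag}"]
    for name, value in elem.attrib.items():
        lines.append(f"{pad}  @{name} = {value}")
    if elem.text and elem.text.strip():
        lines.append(f'{pad}  "{elem.text.strip()}"')
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(DOC))
print("\n".join(tree_lines))
```

The root element is the root of the tree; nesting in the text becomes parent-child edges.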
DTD explained
<!DOCTYPE bibliography [
• bibliography is the root element of the document
<!ELEMENT bibliography (book+)>
• bibliography consists of a sequence of one or more (+) book elements
<!ELEMENT book (title, author*, publisher?, year?, section*)>
• ? means zero or one; * means zero or more
• book consists of a title, zero or more authors, an optional publisher, an optional year, and zero or more sections, in sequence
<!ATTLIST book ISBN ID #REQUIRED>
• book has a required ISBN attribute which is a unique identifier (ID)
<!ATTLIST book price CDATA #IMPLIED>
• book has an optional (#IMPLIED) price attribute which contains character data (CDATA)
• Other attribute types include IDREF (reference to an ID), IDREFS (space-separated list of references), enumerated list, etc.
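A DTD content model is essentially a regular expression over child element names. The sketch below checks a book element by hand against the model (title, author*, publisher?, year?, section*) and a required ISBN attribute. This only illustrates the idea: Python's standard library does not validate against DTDs, and the helper names are assumptions of this sketch.

```python
import re
import xml.etree.ElementTree as ET

# Content model (title, author*, publisher?, year?, section*) written as a
# regex over the space-terminated sequence of child tag names.
BOOK_MODEL = re.compile(r"title (author )*(publisher )?(year )?(section )*")

def matches_book_model(book):
    """True if the children of `book` appear in an order the model allows."""
    child_tags = "".join(child.tag + " " for child in book)
    return BOOK_MODEL.fullmatch(child_tags) is not None

def check_book(book):
    """Check the #REQUIRED ISBN attribute and the content model."""
    return "ISBN" in book.attrib and matches_book_model(book)

good = ET.fromstring(
    '<book ISBN="b1"><title>T</title><author>A</author>'
    '<author>B</author><year>2014</year></book>')
bad_order = ET.fromstring(
    '<book ISBN="b2"><author>A</author><title>T</title></book>')
no_isbn = ET.fromstring('<book><title>T</title></book>')

print(check_book(good), check_book(bad_order), check_book(no_isbn))
```

The sketch does not check that ID values are unique across the document, which a validating parser would also do.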
Using DTD
• DTD can be included in the XML source file
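• <?xml version="1.0"?>
  <!DOCTYPE bibliography [
  … …
  ]>
  <bibliography>
  … …
  </bibliography>
• DTD can also be external, referenced from the DOCTYPE declaration
• <?xml version="1.0"?>
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  <html>
  … …
  </html>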
XML Schema
• A more powerful way of defining the structure and
constraining the contents of XML documents
• An XML Schema definition is itself an XML
document
• Typically stored as a standalone .xsd file
• XML (data) documents refer to external .xsd files
• W3C recommendation
• Unlike DTD, XML Schema is separate from the XML
specification
XSD example
<xs:element name="book">
• We are now defining an element named book
  <xs:complexType>
• Declares a structure with child elements/attributes (as opposed to just text)
    <xs:sequence>
• Declares a sequence of child elements, like “(…, …, …)” in DTD
      <xs:element name="title" type="xs:string"/>
• A leaf element with string content
      <xs:element name="author" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
• Like author* in DTD
      <xs:element name="publisher" type="xs:string" minOccurs="0" maxOccurs="1"/>
• Like publisher? in DTD
      <xs:element name="year" type="xs:integer" minOccurs="0" maxOccurs="1"/>
• A leaf element with integer content
      <xs:element ref="section" minOccurs="0" maxOccurs="unbounded"/>
• Like section* in DTD; section is defined elsewhere
    </xs:sequence>
    <xs:attribute name="ISBN" type="xs:string" use="required"/>
• Declares an attribute under book, and this attribute is required
    <xs:attribute name="price" type="xs:decimal" use="optional"/>
• This attribute has a decimal value, and it is optional
  </xs:complexType>
</xs:element>
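Because an XML Schema definition is itself an XML document, ordinary XML tools can read it. The sketch below parses a schema shaped like the book example and lists the declared children with their occurrence bounds. The xs:schema wrapper and namespace declaration are added here so the fragment parses on its own, and the section reference is left out; nothing is validated.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # the standard XSD namespace

XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"
                    minOccurs="0" maxOccurs="unbounded"/>
        <xs:element name="publisher" type="xs:string"
                    minOccurs="0" maxOccurs="1"/>
        <xs:element name="year" type="xs:integer"
                    minOccurs="0" maxOccurs="1"/>
      </xs:sequence>
      <xs:attribute name="ISBN" type="xs:string" use="required"/>
      <xs:attribute name="price" type="xs:decimal" use="optional"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema = ET.fromstring(XSD)
book_decl = schema.find(f"{XS}element[@name='book']")
sequence = book_decl.find(f"{XS}complexType/{XS}sequence")

# minOccurs and maxOccurs both default to 1 when omitted.
children = [
    (e.get("name"), e.get("type"),
     e.get("minOccurs", "1"), e.get("maxOccurs", "1"))
    for e in sequence.findall(f"{XS}element")
]
required_attrs = [
    a.get("name") for a in book_decl.iter(f"{XS}attribute")
    if a.get("use") == "required"
]
print(children)
print(required_attrs)
```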
Named types
• Define once:
<xs:complexType name="formattedTextType" mixed="true">
  <xs:choice minOccurs="0" maxOccurs="unbounded">
    <xs:element name="i" type="xs:string"/>
    <xs:element name="b" type="xs:string"/>
  </xs:choice>
</xs:complexType>
Restrictions
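<xs:simpleType name="priceType">
  <xs:restriction base="xs:decimal">
    <xs:minInclusive value="0.00"/>
  </xs:restriction>
</xs:simpleType>
<xs:simpleType name="statusType">
  <xs:restriction base="xs:string">
    <xs:enumeration value="in stock"/>
    <xs:enumeration value="out of stock"/>
    <xs:enumeration value="out of print"/>
  </xs:restriction>
</xs:simpleType>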
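What the two kinds of restriction mean for a value, sketched in Python: a lower bound on a decimal and an enumeration of allowed strings. The function names, the bound 0.00, and the three status strings are assumptions of this sketch. Python's Decimal accepts some forms (such as exponents) that xs:decimal does not, so this is an approximation rather than a validator.

```python
from decimal import Decimal, InvalidOperation

# Assumed enumeration values for a status-like type.
STATUS_VALUES = {"in stock", "out of stock", "out of print"}

def is_valid_price(text, min_inclusive=Decimal("0.00")):
    """minInclusive-style check: a finite decimal at or above the bound."""
    try:
        value = Decimal(text)
    except InvalidOperation:
        return False
    return value.is_finite() and value >= min_inclusive

def is_valid_status(text):
    """enumeration-style check: the value must be one of the listed strings."""
    return text in STATUS_VALUES

print(is_valid_price("80.00"), is_valid_price("-1.50"), is_valid_status("sold"))
```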
Keys
<xs:element name="bibliography">
  <xs:complexType>… …</xs:complexType>
  <xs:key name="bookKey">
    <xs:selector xpath="./book"/>
    <xs:field xpath="@ISBN"/>
  </xs:key>
</xs:element>
• Under any bibliography element, elements reachable by selector “./book” (i.e., book child elements) must have unique values for field “@ISBN” (i.e., ISBN attributes)
• In general, a key can consist of multiple fields (multiple <xs:field> elements under <xs:key>)
• More on XPath in next lecture
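The key semantics can be sketched as a hand-written check: within a scope element, collect the elements matched by the selector and require that the field value is present and not repeated. The `key_violations` helper and the sample documents are assumptions of this sketch; a schema validator would enforce this automatically.

```python
import xml.etree.ElementTree as ET

def key_violations(scope, selector, attr):
    """Return field values that are missing (None) or repeated.

    scope: the element the key is declared under
    selector: path to the keyed elements, e.g. "./book"
    attr: attribute name used as the single key field, e.g. "ISBN"
    """
    seen, violations = set(), []
    for elem in scope.findall(selector):
        value = elem.get(attr)
        if value is None or value in seen:
            violations.append(value)
        seen.add(value)
    return violations

ok_doc = ET.fromstring(
    '<bibliography><book ISBN="b1"/><book ISBN="b2"/></bibliography>')
dup_doc = ET.fromstring(
    '<bibliography><book ISBN="b1"/><book ISBN="b1"/></bibliography>')

print(key_violations(ok_doc, "./book", "ISBN"))
print(key_violations(dup_doc, "./book", "ISBN"))
```

A multi-field key would compare tuples of field values instead of a single attribute.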
Foreign keys
• Suppose content can reference books
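<xs:element name="content">
  <xs:complexType mixed="true">
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="i" type="xs:string"/>
      <xs:element name="b" type="xs:string"/>
      <xs:element name="book-ref">
        <xs:complexType>
          <xs:attribute name="ISBN" type="xs:string"/>
        </xs:complexType>
      </xs:element>
    </xs:choice>
  </xs:complexType>
</xs:element>
<xs:element name="bibliography">
  <xs:complexType>… …</xs:complexType>
  <xs:key name="bookKey">
    <xs:selector xpath="./book"/>
    <xs:field xpath="@ISBN"/>
  </xs:key>
  <xs:keyref name="bookForeignKey" refer="bookKey">
    <xs:selector xpath=".//book-ref"/>
    <xs:field xpath="@ISBN"/>
  </xs:keyref>
</xs:element>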
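A foreign key (keyref) says that every referencing value must match some key value in the same scope. A minimal hand-written check, assuming books are keyed by an ISBN attribute and references are book-ref elements nested anywhere below the root; the sample document is made up and deliberately contains one dangling reference.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: b2 exists, b9 does not.
DOC = """
<bibliography>
  <book ISBN="b1"><title>First</title>
    <section>
      <content>See <book-ref ISBN="b2"/> and <book-ref ISBN="b9"/>.</content>
    </section>
  </book>
  <book ISBN="b2"><title>Second</title></book>
</bibliography>
"""

root = ET.fromstring(DOC)

# Key side: ISBN values of the book children of the root.
keys = {b.get("ISBN") for b in root.findall("./book")}

# Referencing side: book-ref descendants at any depth.
dangling = [
    r.get("ISBN") for r in root.findall(".//book-ref")
    if r.get("ISBN") not in keys
]
print(dangling)
```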
Case study
• Design an XML document representing cities,
counties, and states
• For states, record name and capital (city)
• For counties, record name, area, and location (state)
• For cities, record name, population, and location (county
and state)
• Assume the following:
• Names of states are unique
• Names of counties are only unique within a state
• Names of cities are only unique within a county
• A city is always located in a single county
• A county is always located in a single state
A possible design
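geo_db
  state (name, capital_city_id) …
    county (name, area) …
      city (id, name, population) …
• Declare stateKey in geo_db with
  • Selector ./state
  • Field @name
• Declare countyInStateKey in state with
  • Selector ./county
  • Field @name
• Declare cityInCountyKey in county with
  • Selector ./city
  • Field @name
• Declare cityIdKey in geo_db with
  • Selector ./state/county/city
  • Field @id
• Declare capitalCityIdKeyRef in geo_db referencing cityIdKey, with
  • Selector ./state
  • Field @capital_city_id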
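The point of declaring a key inside state or county rather than at the root is scoping: names only need to be unique within the enclosing element. A sketch of these checks on made-up data, where the same county and city names recur in different scopes. The labels in `checks`, the `has_unique` helper, and all sample values are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: county "X" appears in both states, and every city
# shares one name, yet all scoped keys still hold.
DOC = """
<geo_db>
  <state name="A" capital_city_id="c1">
    <county name="X"><city id="c1" name="Springfield"/></county>
    <county name="Y"><city id="c2" name="Springfield"/></county>
  </state>
  <state name="B" capital_city_id="c3">
    <county name="X"><city id="c3" name="Springfield"/></county>
  </state>
</geo_db>
"""

def has_unique(scope, selector, attr):
    """True if every selected element has the attribute and no value repeats."""
    values = [e.get(attr) for e in scope.findall(selector)]
    return None not in values and len(values) == len(set(values))

db = ET.fromstring(DOC)
states = db.findall("./state")
checks = {
    "state name unique in geo_db": has_unique(db, "./state", "name"),
    "county name unique in each state":
        all(has_unique(s, "./county", "name") for s in states),
    "city name unique in each county":
        all(has_unique(c, "./city", "name") for c in db.iter("county")),
    "city id unique in geo_db": has_unique(db, "./state/county/city", "id"),
}

# Foreign key: each state's capital_city_id must match some city id.
city_ids = {c.get("id") for c in db.findall("./state/county/city")}
checks["capital references a city"] = all(
    s.get("capital_city_id") in city_ids for s in states)

print(checks)
```

Declaring the county key at the root instead would wrongly reject this data, since county "X" appears in two states.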