XML Basics PDF
Web Technologies and SOA 2
What Is XML?
Core of a broader ecosystem
– Data – XML (also RDF, Ch. 12)
– Schema – DTD and XML Schema
– Programmatic access – DOM and SAX
– Query – XPath, XSLT, XQuery
– Distributed programs – Web services
[Figure: XML at the center of the ecosystem, surrounded by SAX/DOM, REST/SOAP+WSDL, HTTP, and DTD/Schema, connecting Database, Document, and Web Service.]
Outline
[Figure: XML tree for a DBLP document; legend entries include p-i, element, and text. The Root has a ?xml processing instruction and a dblp element. dblp has mastersthesis and article children, each with mdate and key attributes. mastersthesis contains author, title, year, school; article contains editor, title, journal, volume, year, ee, ee.]
XML Easily Encodes Relations
<student-course-grade>
  <tuple sid="1" cid="570103" exp-grade="B"/>
  <tuple sid="23" cid="550103" exp-grade="A"/>
</student-course-grade>
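As a rough illustration of the encoding above, the sketch below turns the rows of a relation into `tuple` elements with Python's standard `xml.etree.ElementTree`. The helper name `relation_to_xml` and the row dictionaries are assumptions made for this example, not part of the slides.

```python
import xml.etree.ElementTree as ET

def relation_to_xml(relation_name, rows):
    """Encode each row (a dict of column -> value) as one <tuple> element."""
    root = ET.Element(relation_name)
    for row in rows:
        # Every column becomes an attribute; attribute values must be strings.
        ET.SubElement(root, "tuple", {col: str(val) for col, val in row.items()})
    return root

rows = [
    {"sid": 1, "cid": 570103, "exp-grade": "B"},
    {"sid": 23, "cid": 550103, "exp-grade": "A"},
]
doc = relation_to_xml("student-course-grade", rows)
print(ET.tostring(doc, encoding="unicode"))
```

Attributes fit here because every column holds a single atomic value; nested or repeated data would need child elements instead.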
XML is “Semi-Structured”
<parents>
  <parent name="Jean">
    <son>John</son>
    <daughter>Joan</daughter>
    <daughter>Jill</daughter>
  </parent>
  <parent name="Feng">
    <daughter>Ella</daughter>
  </parent>
…
Combining XML from Multiple Sources with the Same Tags: Namespaces
Outline
We also need:
– An idea of (at least part of) the structure
– Some knowledge of how to interpret the tags…
Structural Constraints: Document Type Definitions (DTDs)
The Limitations of DTDs
(Note there are other XML schema formats like RELAX NG)
Basics of XML Schema
Simple XML Schema Example
<!-- Associates the "xsd" namespace prefix with XML Schema -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- This is the root element, with type specified below -->
  <xsd:element name="mastersthesis" type="ThesisType"/>
  <xsd:complexType name="ThesisType">
    <xsd:sequence>
      <xsd:element name="author" type="xsd:string"/>
      <xsd:element name="title" type="xsd:string"/>
      <xsd:element name="year" type="xsd:integer"/>
      <xsd:element name="school" type="xsd:string"/>
      <xsd:element name="committeemember" type="CommitteeType"
                   minOccurs="0"/>
    </xsd:sequence>
    <!-- Attribute declarations must follow the content model -->
    <xsd:attribute name="mdate" type="xsd:date"/>
    <xsd:attribute name="key" type="xsd:string"/>
    <xsd:attribute name="advisor" type="xsd:string"/>
  </xsd:complexType>
</xsd:schema>
Designing an XML Schema/DTD
Outline
Querying XML
Recall Our XML Tree
[Figure: XML tree for the DBLP document; legend: root, attribute, p-i, element, text. The Root has a ?xml processing instruction and a dblp element. dblp has mastersthesis and article children, each with mdate and key attributes. mastersthesis contains author, title, year, school; article contains editor, title, journal, volume, year, ee, ee.]
Some Example XPath Queries
• /dblp/mastersthesis/title
• /dblp/*/editor
• //title
• //title/text()
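To make the four queries above concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath and evaluates paths relative to an element. The sample document and its values are made up for illustration; only the tag names follow the DBLP tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the DBLP tree; the values are invented.
SAMPLE = """<dblp>
  <mastersthesis key="ms/1"><author>A. Author</author><title>Thesis One</title><year>1992</year></mastersthesis>
  <article key="tr/1"><editor>E. Editor</editor><title>Article One</title><year>1997</year></article>
</dblp>"""

dblp = ET.fromstring(SAMPLE)  # the <dblp> element stands in for /dblp

# /dblp/mastersthesis/title
thesis_titles = dblp.findall("./mastersthesis/title")
# /dblp/*/editor
editors = dblp.findall("./*/editor")
# //title
all_titles = dblp.findall(".//title")
# //title/text(): ElementTree has no text() step, so read .text instead
title_strings = [t.text for t in all_titles]

print(title_strings)
```

A full XPath engine would accept the absolute paths directly; the leading `.` here is an artifact of the ElementTree subset.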
Context Nodes and Relative Paths
Predicates – Selection Operations
/dblp/article[title = "Paper1"]
/dblp/article[./title/text() = "Paper1"]
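A predicate acts as a selection on the node set produced by the step before it. The sketch below shows both forms above with `xml.etree.ElementTree`; the sample articles and their `key` values are assumptions for the example. ElementTree accepts the first form directly, while the second is written out as an explicit Python filter because the library has no `text()` step.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: two articles that differ only in title and year.
SAMPLE = """<dblp>
  <article key="a1"><title>Paper1</title><year>1997</year></article>
  <article key="a2"><title>Paper2</title><year>1998</year></article>
</dblp>"""
dblp = ET.fromstring(SAMPLE)

# /dblp/article[title = "Paper1"]
by_predicate = dblp.findall("./article[title='Paper1']")

# /dblp/article[./title/text() = "Paper1"], spelled out as a filter:
# keep an article if any of its title children has the wanted text.
by_filter = [
    a for a in dblp.findall("./article")
    if any(t.text == "Paper1" for t in a.findall("./title"))
]

print([a.get("key") for a in by_predicate])
```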
Axes: More Complex Traversals
Thus far, we’ve seen XPath expressions that go down the tree
(and up one step)
– But we might want to go up, left, right, etc. via axes:
• self::path-step
• child::path-step parent::path-step
• descendant::path-step ancestor::path-step
• descendant-or-self::path-step ancestor-or-self::path-step
• preceding-sibling::path-step following-sibling::path-step
• preceding::path-step following::path-step
– The previous XPaths we saw were in “abbreviated form”; written out in full:
/dblp/mastersthesis/title is /child::dblp/child::mastersthesis/child::title
//title is /descendant-or-self::node()/child::title
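The upward and sideways axes can be sketched in a few lines once each node can reach its parent. `xml.etree.ElementTree` does not store parent pointers or support these axes, so the example below builds a parent map by hand; the function names and sample document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<dblp>
  <article><editor>E</editor><title>T</title><year>1997</year></article>
</dblp>"""
root = ET.fromstring(SAMPLE)

# Map every child element to its parent (the root has no entry).
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(node):
    """ancestor:: axis, nearest ancestor first."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result

def following_siblings(node):
    """following-sibling:: axis, in document order."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]

def preceding_siblings(node):
    """preceding-sibling:: axis, listed here in document order."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[:siblings.index(node)]

title = root.find("./article/title")
print([n.tag for n in ancestors(title)])
```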
Querying Order
child::article[fn:position() = fn:last()]
XPath Is Used within Many Standards
XPath Is Used to Express XML Schema Keys & Foreign Keys
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="ThesisType">
    <xsd:sequence>
      <xsd:element name="author" type="xsd:string"/> …
      <xsd:element name="school" type="xsd:string"/> …
    </xsd:sequence>
    <xsd:attribute name="key" type="xsd:string"/>
  </xsd:complexType>
  <xsd:element name="dblp">
    <xsd:complexType> <xsd:sequence>
      <xsd:element name="mastersthesis" type="ThesisType">
        <!-- Foreign key refers to key by its ID -->
        <xsd:keyref name="schoolRef" refer="schoolId">
          <xsd:selector xpath="./school"/> <xsd:field xpath="."/>
        </xsd:keyref>
      </xsd:element>
      <xsd:element name="university" type="SchoolType">…</xsd:element>
    </xsd:sequence> </xsd:complexType>
    <xsd:key name="schoolId">
      <!-- Item w/key = selector; field is its key -->
      <xsd:selector xpath="university"/> <xsd:field xpath="@key"/>
    </xsd:key>
  </xsd:element>
</xsd:schema>
Beyond XPath: XQuery
XQuery’s Basic Form
[Figure: the DBLP XML tree, with a Root node, a ?xml processing instruction, and a dblp element whose mastersthesis and article children carry mdate and key attributes and child elements such as author, title, year, school, editor, journal, volume, and ee.]
“Iterations” in XQuery
A series of (possibly nested) FOR statements assigning the results of XPaths to variables
Two XQuery Examples
<root-tag> {
  for $p in doc("dblp.xml")/dblp/article,
      $yr in $p/year
  where $yr = "1997"
  return <paper> { $p/title } </paper>
} </root-tag>
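One way to read the query above is as nested loops over XPath results, with the where clause as a filter and the return clause building output. The Python sketch below mirrors that reading with `xml.etree.ElementTree`; it assumes the year is stored in a `year` child, as in the DBLP tree, and uses an invented inline document in place of dblp.xml.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-in for dblp.xml.
SAMPLE = """<dblp>
  <article><title>Article A</title><year>1996</year></article>
  <article><title>Article B</title><year>1997</year></article>
</dblp>"""
dblp = ET.fromstring(SAMPLE)

result = ET.Element("root-tag")
for p in dblp.findall("./article"):          # for $p in .../dblp/article
    for yr in p.findall("./year"):           # $yr in $p/year
        if yr.text == "1997":                # where $yr = "1997"
            paper = ET.SubElement(result, "paper")
            for title in p.findall("./title"):
                # return <paper> { $p/title } </paper>
                paper.append(copy.deepcopy(title))

print(ET.tostring(result, encoding="unicode"))
```

The copies keep the output tree independent of the input document, which is closer to how XQuery constructs new nodes.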
Restructuring Data in XQuery
Nesting XML trees is perhaps the most common operation
In XQuery, it’s easy – put a subquery in the return clause where you want things to repeat!
for $u in doc("dblp.xml")/dblp/university
where $u/country = "USA"
return <ms-theses-99>
  { $u/name } {
    for $mt in doc("dblp.xml")/dblp/mastersthesis
    where $mt/year/text() = "1999" and $mt/school = $u/name
    return $mt/title }
</ms-theses-99>
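The nesting pattern above (an inner query placed where output should repeat) corresponds to an inner loop that appends to the element being built by the outer loop. A minimal Python sketch follows; the universities, theses, and their values are invented for the example.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-in for dblp.xml.
SAMPLE = """<dblp>
  <university><name>Univ A</name><country>USA</country></university>
  <university><name>Univ B</name><country>Canada</country></university>
  <mastersthesis><title>T1</title><year>1999</year><school>Univ A</school></mastersthesis>
  <mastersthesis><title>T2</title><year>1998</year><school>Univ A</school></mastersthesis>
  <mastersthesis><title>T3</title><year>1999</year><school>Univ B</school></mastersthesis>
</dblp>"""
dblp = ET.fromstring(SAMPLE)

results = []
for u in dblp.findall("./university"):            # outer for
    if u.findtext("country") != "USA":            # outer where
        continue
    out = ET.Element("ms-theses-99")
    out.append(copy.deepcopy(u.find("name")))     # { $u/name }
    for mt in dblp.findall("./mastersthesis"):    # nested subquery
        if (mt.findtext("year") == "1999"
                and mt.findtext("school") == u.findtext("name")):
            out.append(copy.deepcopy(mt.find("title")))
    results.append(out)

for out in results:
    print(ET.tostring(out, encoding="unicode"))
```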
Collections & Aggregation in XQuery
Collections, Ctd.
Distinct-ness
Sorting in XQuery
Querying & Defining Metadata
Views in XQuery
• A view is a named query
• We use the name of the view to invoke the query
(treating it as if it were the relation it returns)
Outline
The Second Key: Finite Automata
• Convert each XPath to an equivalent regular expression
[Figure: automaton for //year, with a self-loop on any tag (Σ) followed by a transition on year]
Matching an XPath
• Assume a “cursor” on the active state in the automaton
• On a matching open-tag: push the active state, then advance it
• On a close-tag: pop the active state
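The cursor-and-stack idea above can be sketched as a small nondeterministic automaton driven by open-tag and close-tag events. This is an illustrative sketch, not the slides' implementation: it handles only paths built from `/name`, `//name`, and `*` steps, tracks a set of active states rather than a single one, and the names `compile_xpath` and `stream_match` are assumptions.

```python
import re
import xml.etree.ElementTree as ET

def compile_xpath(xpath):
    """Split a path such as /dblp//title into (axis, name) steps."""
    return re.findall(r"(//|/)([^/]+)", xpath)

def stream_match(xpath, xml_text):
    """Return the tag path of every element matched, using one pass over events."""
    steps = compile_xpath(xpath)
    accept = len(steps)       # state i means "the first i steps have matched"
    active = {0}              # the cursor: the set of active states
    stack = []                # saved active sets, one per open element
    path = []                 # tags of the currently open elements
    matches = []
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(xml_text)
    parser.close()
    for event, elem in parser.read_events():
        if event == "start":
            stack.append(active)              # push the current active states
            advanced = set()
            for state in active:
                if state == accept:
                    continue                  # accepting state has no outgoing edges
                axis, name = steps[state]
                if name in ("*", elem.tag):
                    advanced.add(state + 1)   # advance on a matching open-tag
                if axis == "//":
                    advanced.add(state)       # self-loop on any tag
            active = advanced
            path.append(elem.tag)
            if accept in active:
                matches.append("/" + "/".join(path))
        else:
            active = stack.pop()              # close-tag: restore enclosing states
            path.pop()
    return matches

DOC = ("<dblp><article><title>T</title><year>1997</year></article>"
       "<mastersthesis><year>1992</year></mastersthesis></dblp>")
print(stream_match("//year", DOC))
```

Because the stack depth equals the element nesting depth, memory use depends on document depth rather than document size, which is what makes the approach suitable for streaming.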
From XPaths to XQueries
• An XQuery takes multiple XPaths in the FOR/LET
clauses, and iterates over the elements of each XPath
(binding the variable to each)
FOR $rootElement in doc("dblp.xml")/dblp,
$rootChild in $rootElement/article[author="Bob"],
$textContent in $rootChild/text()
– We can think of an XQuery as doing tree matching, which
returns tuples ($i, $j) for each tree matching $i and $j in a
document
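The FOR clause above can be read as producing one tuple of variable bindings per successful match. The Python sketch below builds those tuples with nested loops over `xml.etree.ElementTree`; the sample document is invented, and as a simplifying assumption whitespace-only text nodes are skipped.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for dblp.xml.
SAMPLE = """<dblp>
  <article>text A<author>Bob</author></article>
  <article>text B<author>Alice</author></article>
</dblp>"""

root_element = ET.fromstring(SAMPLE)   # $rootElement: the single dblp element

bindings = []  # one (rootElement, rootChild, textContent) tuple per match
for root_child in root_element.findall("./article[author='Bob']"):
    # The text() children of an element are its leading text plus the
    # tail text that follows each child element.
    text_nodes = [root_child.text] + [child.tail for child in root_child]
    for text_content in text_nodes:
        if text_content and text_content.strip():
            bindings.append((root_element, root_child, text_content.strip()))

for _, article, text in bindings:
    print(article.findtext("author"), text)
```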
[Figure: query plan for the FOR clause. A streaming XPath evaluation over dblp.xml feeds XPath matchers that bind $rootElement (dblp), $rootChild (article with author = “Bob”), and $textContent (text()); further XPath matchers bind $editor and $title. Relational-style operators (projection Π, outer join) combine the bindings, and XML tagging operators wrap values such as “Paul R. McJones” and “The 1995…” in <text> elements.]
Optimizing XQueries
Mapping Example between Two XML Schemas
Target: Publications by book
<pubs>
  <book>*
    <title>
    <author>*
      <name>
Source: Publications by author
<authors>
  <author>*
    <full-name>
    <publication>*
      <title>
      <pub-type>
Has an entity-relationship model representation like:
publication writtenBy author
<pubs>
  <book>
    {: $a IN document("…")/authors/author,
       $an IN $a/full-name,
       $t IN $a/publication/title,
       $typ IN $a/publication/pub-type
       WHERE $typ = "book" :}
    <title>{$t}</title>
    <author><name>{$an}</name></author>
  </book>
</pubs>
– Output one book per match to author
– Insert title and author name subelements
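A rough procedural reading of this mapping: for every author and every book-type publication of that author, emit one book element carrying the title and the author's name. The Python sketch below follows that reading with `xml.etree.ElementTree`. It assumes the title and pub-type are taken from the same publication element, and the function name `map_authors_to_pubs` and the source data are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document in the "publications by author" schema.
SOURCE = """<authors>
  <author><full-name>Ann</full-name>
    <publication><title>Book X</title><pub-type>book</pub-type></publication>
    <publication><title>Paper Y</title><pub-type>article</pub-type></publication>
  </author>
  <author><full-name>Ben</full-name>
    <publication><title>Book X</title><pub-type>book</pub-type></publication>
  </author>
</authors>"""

def map_authors_to_pubs(authors):
    """Produce the 'publications by book' tree, one <book> per match."""
    pubs = ET.Element("pubs")
    for a in authors.findall("./author"):              # $a
        for an in a.findall("./full-name"):            # $an
            for pub in a.findall("./publication"):
                if pub.findtext("pub-type") != "book": # WHERE $typ = "book"
                    continue
                book = ET.SubElement(pubs, "book")
                ET.SubElement(book, "title").text = pub.findtext("title")
                author = ET.SubElement(book, "author")
                ET.SubElement(author, "name").text = an.text
    return pubs

pubs = map_authors_to_pubs(ET.fromstring(SOURCE))
print(ET.tostring(pubs, encoding="unicode"))
```

Note that a book with two authors appears twice in this output, once per author.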
Example Piazza-XML Mapping
<pubs>
  <book piazza:id={$t}>
    {: $a IN document("…")/authors/author,
       $an IN $a/full-name,
       $t IN $a/publication/title,
       $typ IN $a/publication/pub-type
       WHERE $typ = "book" :}
    <title piazza:id={$t}>{$t}</title>
    <author><name>{$an}</name></author>
  </book>
</pubs>
– Merge elements if they are for the same value of $t
– Output one book per match to author
– Insert title and author name subelements
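The effect of the piazza:id annotation, merging output elements that share the same $t, can be approximated with a dictionary keyed on the title. The sketch below is only an analogy in Python, not the Piazza implementation; the function name `map_with_merge` and the source data are assumptions, and as before title and pub-type are read from the same publication element.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document in the "publications by author" schema.
SOURCE = """<authors>
  <author><full-name>Ann</full-name>
    <publication><title>Book X</title><pub-type>book</pub-type></publication>
  </author>
  <author><full-name>Ben</full-name>
    <publication><title>Book X</title><pub-type>book</pub-type></publication>
  </author>
</authors>"""

def map_with_merge(authors):
    """Emit one <book> per distinct title, collecting all its authors."""
    pubs = ET.Element("pubs")
    book_by_title = {}   # plays the role of piazza:id={$t}
    for a in authors.findall("./author"):
        for an in a.findall("./full-name"):
            for pub in a.findall("./publication"):
                if pub.findtext("pub-type") != "book":
                    continue
                t = pub.findtext("title")
                book = book_by_title.get(t)
                if book is None:
                    # First match for this title: create the book once.
                    book = ET.SubElement(pubs, "book")
                    ET.SubElement(book, "title").text = t
                    book_by_title[t] = book
                author = ET.SubElement(book, "author")
                ET.SubElement(author, "name").text = an.text
    return pubs

merged = map_with_merge(ET.fromstring(SOURCE))
print(ET.tostring(merged, encoding="unicode"))
```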
A More Formal Model: Nested TGDs
The underpinnings of the Piazza-XML mapping language
can be captured using nested tuple-generating dependencies
(nested TGDs)
– Recall relational TGDs from Chapter 3
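$$\forall \bar{X}, \bar{Y}, \bar{S}\; \bigl(\varphi(\bar{X}, \bar{Y}) \wedge \bar{\Phi}(\bar{S}) \Rightarrow \exists \bar{Z}, \bar{T}\; (\psi(\bar{X}, \bar{Z}) \wedge \bar{\Psi}(\bar{T}))\bigr)$$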
<title piazza:id={$t}>{$t}</title>
<author><name>{$an}</name></author>
</book>
</pubs>
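$$\mathit{authors}(\mathit{author}) \wedge \mathit{author}(f, \mathit{publication}) \wedge \mathit{publication}(t, \mathit{book}) \Rightarrow \exists p\; \bigl(\mathit{pubs}(\mathit{book}) \wedge \mathit{book}_{t}(t, \mathit{author}', \mathit{publisher}) \wedge \mathit{author}'_{t,f}(f) \wedge \mathit{publisher}_{t}(p)\bigr)$$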