0% found this document useful (0 votes)
17 views

Structured Web Document XML

This document provides an overview of structured web documents in XML. It compares HTML and XML, explaining that XML allows for more semantic structure that makes documents more easily interpretable by machines. Key differences discussed include XML's use of user-defined tags to add meaning, inclusion of attributes, and representation of documents as trees. The lecture also outlines how XML documents are further defined through techniques like DTDs and XML Schema.

Uploaded by

Aqsa
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
17 views

Structured Web Document XML

This document provides an overview of structured web documents in XML. It compares HTML and XML, explaining that XML allows for more semantic structure that makes documents more easily interpretable by machines. Key differences discussed include XML's use of user-defined tags to add meaning, inclusion of attributes, and representation of documents as trees. The lecture also outlines how XML documents are further defined through techniques like DTDs and XML Schema.

Uploaded by

Aqsa
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 124

Lecture 19-20

Structured Web Documents in XML

Mr. Mubashir Ali


Lecture (Dept. of CS)

[email protected]

1 Chapter 2 A Semantic Web Primer


An HTML Example

<h2>Nonmonotonic Reasoning: Context-


Dependent Reasoning</h2>
<i>by <b>V. Marek</b> and
<b>M. Truszczynski</b></i><br>
Springer 1993<br>
ISBN 0387976892

2 Chapter 2 A Semantic Web Primer


The Same Example in XML

<book>
<title>Nonmonotonic Reasoning: Context-
Dependent Reasoning</title>
<author>V. Marek</author>
<author>M. Truszczynski</author>
<publisher>Springer</publisher>
<year>1993</year>
<ISBN>0387976892</ISBN>
</book>

3 Chapter 2 A Semantic Web Primer


HTML versus XML: Similarities

 Both use tags (e.g. <h2> and </year>)


 Tags may be nested (tags within tags)
 Human users can read and interpret both
HTML and XML representations quite easily
… But how about machines?

4 Chapter 2 A Semantic Web Primer


Problems with Automated
Interpretation of HTML Documents

An intelligent agent trying to retrieve the names


of the authors of the book
 Authors’ names could appear immediately
after the title
 or immediately after the word by
 Are there two authors?
 Or just one, called “V. Marek and M.
Truszczynski”?
5 Chapter 2 A Semantic Web Primer
HTML vs XML: Structural Information

 HTML documents do not contain structural


information: pieces of the document and their
relationships.
 XML more easily accessible to machines because
– Every piece of information is described.
– Relations are also defined through the nesting structure.
– E.g., the <author> tags appear within the <book> tags, so
they describe properties of the particular book.

6 Chapter 2 A Semantic Web Primer


HTML vs XML: Structural Information (2)

 A machine processing the XML document


would be able to deduce that
– the author element refers to the enclosing book
element
– rather than by proximity considerations
 XML allows the definition of constraints on
values
– E.g. a year must be a number of four digits

7 Chapter 2 A Semantic Web Primer


HTML vs XML: Formatting

 The HTML representation provides more


than the XML representation:
– The formatting of the document is also described
 Τhe main use of an HTML document is to
display information: it must define formatting
 XML: separation of content from display
– same information can be displayed in different
ways

8 Chapter 2 A Semantic Web Primer


HTML vs XML: Another Example

 In HTML
<h2>Relationship matter-energy</h2>
<i> E = M × c2 </i>
 In XML
<equation>
<meaning>Relationship matter
energy</meaning>
<leftside> E </leftside>
<rightside> M × c2 </rightside>
</equation>
9 Chapter 2 A Semantic Web Primer
HTML vs XML: Different Use of Tags

 In both HTML docs same tags


 In XML completely different

 HTML tags define display: color, lists …


 XML tags not fixed: user definable tags
 XML meta markup language: language for
defining markup languages

10 Chapter 2 A Semantic Web Primer


XML Vocabularies

 Web applications must agree on common


vocabularies to communicate and collaborate
 Communities and business sectors are
defining their specialized vocabularies
– mathematics (MathML)
– bioinformatics (BSML)
– human resources (HRML)
– …

11 Chapter 2 A Semantic Web Primer


Lecture Outline

1. Introduction
2. Detailed Description of XML
3. Structuring
a) DTDs
b) XML Schema
4. Namespaces
5. Accessing, querying XML documents: XPath
6. Transformations: XSLT
12 Chapter 2 A Semantic Web Primer
The XML Language

An XML document consists of


 a prolog
 a number of elements
 an optional epilog (not discussed)

13 Chapter 2 A Semantic Web Primer


Prolog of an XML Document

The prolog consists of


 an XML declaration and
 an optional reference to external structuring
documents

<?xml version="1.0" encoding="UTF-16"?>

<!DOCTYPE book SYSTEM "book.dtd">


14 Chapter 2 A Semantic Web Primer
XML Elements

 The “things” the XML document talks about


– E.g. books, authors, publishers
 An element consists of:
– an opening tag
– the content
– a closing tag
<lecturer>Mubashir Ali</lecturer>

15 Chapter 2 A Semantic Web Primer


XML Elements (2)

 Tag names can be chosen almost freely.


 The first character must be a letter, an
underscore, or a colon
 No name may begin with the string “xml” in
any combination of cases
– E.g. “Xml”, “xML”

16 Chapter 2 A Semantic Web Primer


Content of XML Elements

 Content may be text, or other elements, or nothing

<lecturer>
<name>David Billington</name>
<phone> +61 − 7 − 3875 507 </phone>
</lecturer>

 If there is no content, then the element is called


empty; it is abbreviated as follows:
<lecturer/> for <lecturer></lecturer>

17 Chapter 2 A Semantic Web Primer


XML Attributes

 An empty element is not necessarily


meaningless
– It may have some properties in terms of attributes
 An attribute is a name-value pair inside the
opening tag of an element
<lecturer name="David Billington"
phone="+61 − 7 − 3875 507"/>

18 Chapter 2 A Semantic Web Primer


XML Attributes: An Example

<order orderNo="23456" customer="John Smith"


date="October 15, 2002">
<item itemNo="a528" quantity="1"/>
<item itemNo="c817" quantity="3"/>
</order>

19 Chapter 2 A Semantic Web Primer


The Same Example without Attributes

<order>
<orderNo>23456</orderNo>
<customer>John Smith</customer>
<date>October 15, 2002</date>
<item>
<itemNo>a528</itemNo>
<quantity>1</quantity>
</item>
<item>
<itemNo>c817</itemNo>
<quantity>3</quantity>
</item>
</order>
20 Chapter 2 A Semantic Web Primer
XML Elements vs Attributes

 Attributes can be replaced by elements


 When to use elements and when attributes is
a matter of taste
 But attributes cannot be nested

21 Chapter 2 A Semantic Web Primer


Further Components of XML Docs

 Comments
– A piece of text that is to be ignored by parser
– <!-- This is a comment -->
 Processing Instructions (PIs)
– Define procedural attachments
– <?stylesheet type="text/css"
href="mystyle.css"?>

22 Chapter 2 A Semantic Web Primer


Well-Formed XML Documents

 Syntactically correct documents


 Some syntactic rules:
– Only one outermost element (called root element)
– Each element contains an opening and a
corresponding closing tag
– Tags may not overlap
 <author><name>Lee Hong</author></name>
– Attributes within an element have unique names
– Element and tag names must be permissible

23 Chapter 2 A Semantic Web Primer


The Tree Model of XML Documents:
An Example

<email>
<head>
<from name=“hameed"
address=“[email protected]"/>
<to name=“shahzaib"
address="[email protected]"/>
<subject>Where is your draft?</subject>
</head>
<body>
shahzaib, where is the draft of the paper you promised me
last week?
</body>
</email>

24 Chapter 2 A Semantic Web Primer


The Tree Model of XML Documents:
An Example (2)

25 Chapter 2 A Semantic Web Primer


The Tree Model of XML Docs

 The tree representation of an XML document


is an ordered labeled tree:
– There is exactly one root
– There are no cycles
– Each non-root node has exactly one parent
– Each node has a label.
– The order of elements is important
– … but the order of attributes is not important

26 Chapter 2 A Semantic Web Primer


Lecture Outline

1. Introduction
2. Detailed Description of XML
3. Structuring
a) DTDs
b) XML Schema
4. Namespaces
5. Accessing, querying XML documents: XPath
6. Transformations: XSLT
27 Chapter 2 A Semantic Web Primer
Structuring XML Documents

 Define all the element and attribute names


that may be used
 Define the structure
– what values an attribute may take
– which elements may or must occur within other
elements, etc.
 If such structuring information exists, the
document can be validated
28 Chapter 2 A Semantic Web Primer
Structuring XML Dcuments (2)

 An XML document is valid if


– it is well-formed
– respects the structuring information it uses
 There are two ways of defining the structure
of XML documents:
– DTDs (Document Type Definition)
 the older and more restricted way
– XML Schema
 offers extended possibilities
29 Chapter 2 A Semantic Web Primer
DTD: Element Type Definition

<lecturer>
<name>David Billington</name>
<phone> +61 − 7 − 3875 507 </phone>
</lecturer>
DTD for above element (and all lecturer elements):
<!ELEMENT lecturer (name,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT phone (#PCDATA)>

30 Chapter 2 A Semantic Web Primer


The Meaning of the DTD

 The element types lecturer, name, and phone may


be used in the document
 A lecturer element contains a name element and a
phone element, in that order (sequence)
 A name element and a phone element may have
any content
 In DTDs, #PCDATA is the only atomic type for
elements

31 Chapter 2 A Semantic Web Primer


DTD: Disjunction in Element Type
Definitions

 We express that a lecturer element contains


either a name element or a phone element as
follows:
<!ELEMENT lecturer (name|phone)>
 A lecturer element contains a name element
and a phone element in any order.
<!ELEMENT lecturer((name,phone)|(phone,name))>

32 Chapter 2 A Semantic Web Primer


Example of an XML Element

<order orderNo="23456"
customer="John Smith"
date="October 15, 2002">
<item itemNo="a528" quantity="1"/>
<item itemNo="c817" quantity="3"/>
</order>

33 Chapter 2 A Semantic Web Primer


The Corresponding DTD

<!ELEMENT order (item+)>


<!ATTLIST order orderNo ID #REQUIRED
customer CDATA #REQUIRED
date CDATA #REQUIRED>

<!ELEMENT item EMPTY>


<!ATTLIST item itemNo ID #REQUIRED
quantity CDATA #REQUIRED
comments CDATA #IMPLIED>

34 Chapter 2 A Semantic Web Primer


Comments on the DTD

 The item element type is defined to be empty


 + (after item) is a cardinality operator:
– ?: appears zero times or once
– *: appears zero or more times
– +: appears one or more times
– No cardinality operator means exactly once

35 Chapter 2 A Semantic Web Primer


Comments on the DTD (2)

 In addition to defining elements, we define


attributes
 This is done in an attribute list containing:
– Name of the element type to which the list applies
– A list of triplets of attribute name, attribute type,
and value type
 Attribute name: A name that may be used in
an XML document using a DTD
36 Chapter 2 A Semantic Web Primer
DTD: Attribute Types

 Similar to predefined data types, but limited selection


 The most important types are
– CDATA, a string (sequence of characters)
– ID, a name that is unique across the entire XML document
– IDREF, a reference to another element with an ID attribute
carrying the same value as the IDREF attribute
– IDREFS, a series of IDREFs
– (v1| . . . |vn), an enumeration of all possible values
 Limitations: no dates, number ranges etc.

37 Chapter 2 A Semantic Web Primer


DTD: Attribute Value Types

 #REQUIRED
– Attribute must appear in every occurrence of the element
type in the XML document
 #IMPLIED
– The appearance of the attribute is optional
 #FIXED "value"
– Every element must have this attribute
 "value"
– This specifies the default value for the attribute

38 Chapter 2 A Semantic Web Primer


Referencing with IDREF and IDREFS

<!ELEMENT family (person*)>


<!ELEMENT person (name)>
<!ELEMENT name (#PCDATA)>
<!ATTLIST person id ID #REQUIRED
mother IDREF #IMPLIED
father IDREF #IMPLIED
children IDREFS #IMPLIED>

39 Chapter 2 A Semantic Web Primer


An XML Document Respecting the DTD

<family>
<person id="bob" mother="mary" father="peter">
<name>Bob Marley</name>
</person>
<person id="bridget" mother="mary">
<name>Bridget Jones</name>
</person>
<person id="mary" children="bob bridget">
<name>Mary Poppins</name>
</person>
<person id="peter" children="bob">
<name>Peter Marley</name>
</person>
</family>
40 Chapter 2 A Semantic Web Primer
A DTD for an Email Element

<!ELEMENT email (head,body)>


<!ELEMENT head (from,to+,cc*,subject)>
<!ELEMENT from EMPTY>
<!ATTLIST from name CDATA #IMPLIED
address CDATA #REQUIRED>
<!ELEMENT to EMPTY>
<!ATTLIST to name CDATA #IMPLIED
address CDATA #REQUIRED>

41 Chapter 2 A Semantic Web Primer


A DTD for an Email Element (2)

<!ELEMENT cc EMPTY>
<!ATTLIST cc name CDATA #IMPLIED
address CDATA #REQUIRED>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT body (text,attachment*)>
<!ELEMENT text (#PCDATA)>
<!ELEMENT attachment EMPTY>
<!ATTLIST attachment
encoding (mime|binhex) "mime"
file CDATA #REQUIRED>
42 Chapter 2 A Semantic Web Primer
Interesting Parts of the DTD

 A head element contains (in that order):


– a from element
– at least one to element
– zero or more cc elements
– a subject element
 In from, to, and cc elements
– the name attribute is not required
– the address attribute is always required

43 Chapter 2 A Semantic Web Primer


Interesting Parts of the DTD (2)

 A body element contains


– a text element
– possibly followed by a number of attachment
elements
 The encoding attribute of an attachment
element must have either the value “mime”
or “binhex”
– “mime” is the default value

44 Chapter 2 A Semantic Web Primer


Remarks on DTDs

 A DTD can be interpreted as an Extended


Backus-Naur Form (EBNF)
– <!ELEMENT email (head,body)>
– is equivalent to email ::= head body
 Recursive definitions possible in DTDs
– <!ELEMENT bintree
((bintree root bintree)|emptytree)>

45 Chapter 2 A Semantic Web Primer


Lecture Outline

1. Introduction
2. Detailed Description of XML
3. Structuring
a) DTDs
b) XML Schema
4. Namespaces
5. Accessing, querying XML documents: XPath
6. Transformations: XSLT
46 Chapter 2 A Semantic Web Primer
XML Schema

 Significantly richer language for defining the


structure of XML documents
 Its syntax is based on XML itself
– not necessary to write separate tools
 Reuse and refinement of schemas
– Expand or delete already existent schemas
 Sophisticated set of data types, compared to
DTDs (which only supports strings)

47 Chapter 2 A Semantic Web Primer


XML Schema (2)

 An XML schema is an element with an


opening tag like
<xsd:schema
xmlns:xsd="http://www.w3.org/2000/10/XM
LSchema” version="1.0">
 Structure of schema elements
– Element and attribute types using data types

48 Chapter 2 A Semantic Web Primer


Element Types

<element name="email"/>
<element name="head" minOccurs="1"
maxOccurs="1"/>
<element name="to" minOccurs="1"/>
Cardinality constraints:
 minOccurs="x" (default value 1)
 maxOccurs="x" (default value 1)
 Generalizations of *,?,+ offered by DTDs

49 Chapter 2 A Semantic Web Primer


Attribute Types

<attribute name="id" type="ID“


use="required"/>
< attribute name="speaks" type="Language"
use="default" value="en"/>
 Existence: use="x", where x may be
optional or required
 Default value: use="x" value="...", where x
may be default or fixed

50 Chapter 2 A Semantic Web Primer


Data Types

 There is a variety of built-in data types


– Numerical data types: integer, Short, Byte,
Long, Decimal, Float etc.
– String types: string, ID, IDREF, CDATA etc.
– Date and time data types: time, Month etc.
 There are also user-defined data types
– simple data types, which cannot use elements or
attributes
– complex data types, which can use these
51 Chapter 2 A Semantic Web Primer
Data Types (2)

 Complex data types are defined from already


existing data types by defining some
attributes (if any) and using:
– sequence, a sequence of existing data type
elements (order is important)
– all, a collection of elements that must appear
(order is not important)
– choice, a collection of elements, of which one will
be chosen

52 Chapter 2 A Semantic Web Primer


A Data Type Example

<complexType name="lecturerType">
<sequence>
<element name="firstname" type="string"
minOccurs="0“ maxOccurs="unbounded"/>
<element name="lastname" type="string"/>
</sequence>
<attribute name="title" type="string"
use="optional"/>
</complexType>

53 Chapter 2 A Semantic Web Primer


Data Type Extension

 Already existing data types can be extended by


new elements or attributes. Example:
<complexType name="extendedLecturerType">
<extension base="lecturerType">
<sequence>
<element name="email" type="string"
minOccurs="0" maxOccurs="1"/>
</sequence>
<attribute name="rank" type="string" use="required"/>
</extension>
</complexType>
54 Chapter 2 A Semantic Web Primer
Resulting Data Type

<complexType name="extendedLecturerType">
<sequence>
<element name="firstname" type="string"
minOccurs="0" maxOccurs="unbounded"/>
<element name="lastname" type="string"/>
<element name="email" type="string"
minOccurs="0" maxOccurs="1"/>
</sequence>
<attribute name="title" type="string" use="optional"/>
<attribute name="rank" type="string" use="required"/>
</complexType>

55 Chapter 2 A Semantic Web Primer


Data Type Extension (2)

 A hierarchical relationship exists between the


original and the extended type
– Instances of the extended type are also instances
of the original type
– They may contain additional information, but
neither less information, nor information of the
wrong type

56 Chapter 2 A Semantic Web Primer


Data Type Restriction

 An existing data type may be restricted by adding


constraints on certain values
 Restriction is not the opposite from extension
– Restriction is not achieved by deleting elements or attributes
 The following hierarchical relationship still holds:
– Instances of the restricted type are also instances of the
original type
– They satisfy at least the constraints of the original type

57 Chapter 2 A Semantic Web Primer


Example of Data Type Restriction

<complexType name="restrictedLecturerType">
<restriction base="lecturerType">
<sequence>
<element name="firstname" type="string"
minOccurs="1" maxOccurs="2"/>
</sequence>
<attribute name="title" type="string"
use="required"/>
</restriction>
</complexType>

58 Chapter 2 A Semantic Web Primer


Restriction of Simple Data Types

<simpleType name="dayOfMonth">
<restriction base="integer">
<minInclusive value="1"/>
<maxInclusive value="31"/>
</restriction>
</simpleType>

59 Chapter 2 A Semantic Web Primer


Data Type Restriction: Enumeration

<simpleType name="dayOfWeek">
<restriction base="string">
<enumeration value="Mon"/>
<enumeration value="Tue"/>
<enumeration value="Wed"/>
<enumeration value="Thu"/>
<enumeration value="Fri"/>
<enumeration value="Sat"/>
<enumeration value="Sun"/>
</restriction>
</simpleType>

60 Chapter 2 A Semantic Web Primer


XML Schema: The Email Example

<element name="email" type="emailType"/>

<complexType name="emailType">
<sequence>
<element name="head" type="headType"/>
<element name="body" type="bodyType"/>
</sequence>
</complexType>

61 Chapter 2 A Semantic Web Primer


XML Schema: The Email Example (2)

<complexType name="headType">
<sequence>
<element name="from" type="nameAddress"/>
<element name="to" type="nameAddress"
minOccurs="1" maxOccurs="unbounded"/>
<element name="cc" type="nameAddress"
minOccurs="0" maxOccurs="unbounded"/>
<element name="subject" type="string"/>
</sequence>
</complexType>

62 Chapter 2 A Semantic Web Primer


XML Schema: The Email Example (3)

<complexType name="nameAddress">
<attribute name="name" type="string"
use="optional"/>
<attribute name="address"
type="string" use="required"/>
</complexType>

 Similar for bodyType

63 Chapter 2 A Semantic Web Primer


Lecture Outline

1. Introduction
2. Detailed Description of XML
3. Structuring
a) DTDs
b) XML Schema
4. Namespaces
5. Accessing, querying XML documents: XPath
6. Transformations: XSLT
64 Chapter 2 A Semantic Web Primer
Namespaces

 An XML document may use more than one


DTD or schema
 Since each structuring document was
developed independently, name clashes may
appear
 The solution is to use a different prefix for
each DTD or schema
– prefix:name
65 Chapter 2 A Semantic Web Primer
An Example
<vu:instructors xmlns:vu="http://www.vu.com/empDTD"
xmlns:gu="http://www.gu.au/empDTD"
xmlns:uky="http://www.uky.edu/empDTD">

<uky:faculty uky:title="assistant professor"


uky:name="John Smith"
uky:department="Computer Science"/>

<gu:academicStaff gu:title="lecturer"
gu:name="Mate Jones"
gu:school="Information Technology"/>
</vu:instructors>

66 Chapter 2 A Semantic Web Primer


Namespace Declarations

 Namespaces are declared within an element


and can be used in that element and any of
its children (elements and attributes)
 A namespace declaration has the form:
– xmlns:prefix="location"
– location is the address of the DTD or schema
 If a prefix is not specified: xmlns="location"
then the location is used by default
67 Chapter 2 A Semantic Web Primer
Lecture Outline

1. Introduction
2. Detailed Description of XML
3. Structuring
a) DTDs
b) XML Schema
4. Namespaces
5. Accessing, querying XML documents: XPath
6. Transformations: XSLT
68 Chapter 2 A Semantic Web Primer
Addressing and Querying XML
Documents

 In relational databases, parts of a database


can be selected and retrieved using SQL
– Same necessary for XML documents
– Query languages: XQuery, XQL, XML-QL
 The central concept of XML query languages
is a path expression
– Specifies how a node or a set of nodes, in the
tree representation of the XML document can be
reached

69 Chapter 2 A Semantic Web Primer


XPath

 XPath is core for XML query languages


 Language for addressing parts of an XML
document.
– It operates on the tree data model of XML
– It has a non-XML syntax

70 Chapter 2 A Semantic Web Primer


Types of Path Expressions

 Absolute (starting at the root of the tree)


– Syntactically they begin with the symbol /
– It refers to the root of the document (situated one
level above the root element of the document)
 Relative to a context node

71 Chapter 2 A Semantic Web Primer


An XML Example
<library location="Bremen">
<author name="Henry Wise">
<book title="Artificial Intelligence"/>
<book title="Modern Web Services"/>
<book title="Theory of Computation"/>
</author>
<author name="William Smart">
<book title="Artificial Intelligence"/>
</author>
<author name="Cynthia Singleton">
<book title="The Semantic Web"/>
<book title="Browser Technology Revised"/>
</author>
</library>
72 Chapter 2 A Semantic Web Primer
Tree Representation

73 Chapter 2 A Semantic Web Primer


Examples of Path Expressions in
XPath

 Address all author elements


/library/author
 Addresses all author elements that are
children of the library element node, which
resides immediately below the root
 /t1/.../tn, where each ti+1 is a child node of
ti, is a path through the tree representation

74 Chapter 2 A Semantic Web Primer


Examples of Path Expressions in
XPath (2)

 Address all author elements


//author
 Here // says that we should consider all
elements in the document and check
whether they are of type author
 This path expression addresses all author
elements anywhere in the document

75 Chapter 2 A Semantic Web Primer


Examples of Path Expressions in
XPath (3)

 Address the location attribute nodes within


library element nodes
/library/@location

 The symbol @ is used to denote attribute


nodes

76 Chapter 2 A Semantic Web Primer


Examples of Path Expressions in
XPath (4)

 Address all title attribute nodes within book


elements anywhere in the document, which
have the value “Artificial Intelligence”
//book/@title="Artificial Intelligence"

77 Chapter 2 A Semantic Web Primer


Examples of Path Expressions in
XPath (5)

 Address all books with title “Artificial Intelligence”


//book[@title="Artificial Intelligence"]
 Test within square brackets: a filter expression
– It restricts the set of addressed nodes.
 Difference with query 4.
– Query 5 addresses book elements, the title of which
satisfies a certain condition.
– Query 4 collects title attribute nodes of book elements

78 Chapter 2 A Semantic Web Primer


Tree Representation of Query 4

79 Chapter 2 A Semantic Web Primer


Tree Representation of Query 5

80 Chapter 2 A Semantic Web Primer


Examples of Path Expressions in
XPath (6)

 Address the first author element node in the XML


document
//author[1]
 Address the last book element within the first
author element node in the document
//author[1]/book[last()]
 Address all book element nodes without a title
attribute
//book[not @title]

81 Chapter 2 A Semantic Web Primer


General Form of Path Expressions

 A path expression consists of a series of


steps, separated by slashes
 A step consists of
– An axis specifier,
– A node test, and
– An optional predicate

82 Chapter 2 A Semantic Web Primer


General Form of Path Expressions (2)

 An axis specifier determines the tree


relationship between the nodes to be
addressed and the context node
– E.g. parent, ancestor, child (the default), sibling,
attribute node
– // is such an axis specifier: descendant or self

83 Chapter 2 A Semantic Web Primer


General Form of Path Expressions (3)

 A node test specifies which nodes to


address
– The most common node tests are element names
– E.g., * addresses all element nodes
– comment() addresses all comment nodes

84 Chapter 2 A Semantic Web Primer


General Form of Path Expressions (4)

 Predicates (or filter expressions) are


optional and are used to refine the set of
addressed nodes
– E.g., the expression [1] selects the first node
– [position()=last()] selects the last node
– [position() mod 2 =0] selects the even nodes
 XPath has a more complicated full syntax.
– We have only presented the abbreviated syntax

85 Chapter 2 A Semantic Web Primer


86 Chapter 2 A Semantic Web Primer
87 Chapter 2 A Semantic Web Primer
88 Chapter 2 A Semantic Web Primer
89 Chapter 2 A Semantic Web Primer
90 Chapter 2 A Semantic Web Primer
91 Chapter 2 A Semantic Web Primer
92 Chapter 2 A Semantic Web Primer
93 Chapter 2 A Semantic Web Primer
94 Chapter 2 A Semantic Web Primer
95 Chapter 2 A Semantic Web Primer
96 Chapter 2 A Semantic Web Primer
97 Chapter 2 A Semantic Web Primer
98 Chapter 2 A Semantic Web Primer
99 Chapter 2 A Semantic Web Primer
100 Chapter 2 A Semantic Web Primer
101 Chapter 2 A Semantic Web Primer
Lecture Outline

1. Introduction
2. Detailed Description of XML
3. Structuring
a) DTDs
b) XML Schema
4. Namespaces
5. Accessing, querying XML documents: XPath
6. Transformations: XSLT
102 Chapter 2 A Semantic Web Primer
Displaying XML Documents

<author>
<name>Grigoris Antoniou</name>
<affiliation>University of Bremen</affiliation>
<email>[email protected]</email>
</author>
may be displayed in different ways:
Grigoris Antoniou Grigoris Antoniou
University of Bremen University of Bremen
[email protected] [email protected]
103 Chapter 2 A Semantic Web Primer
Style Sheets

 Style sheets can be written in various


languages
– E.g. CSS2 (cascading style sheets level 2)
– XSL (extensible stylesheet language)
 XSL includes
– a transformation language (XSLT)
– a formatting language
– Both are XML applications

104 Chapter 2 A Semantic Web Primer


XSL Transformations (XSLT)

 XSLT specifies rules with which an input XML


document is transformed to
– another XML document
– an HTML document
– plain text
 The output document may use the same DTD or
schema, or a completely different vocabulary
 XSLT can be used independently of the formatting
language

105 Chapter 2 A Semantic Web Primer


XSLT (2)

 Move data and metadata from one XML


representation to another
 XSLT is chosen when applications that use different
DTDs or schemas need to communicate
 XSLT can be used for machine processing of content
without any regard to displaying the information for
people to read.
 In the following we use XSLT only to display XML
documents

106 Chapter 2 A Semantic Web Primer


XSLT Transformation into HTML

<xsl:template match="/author">
<html>
<head><title>An author</title></head>
<body bgcolor="white">
<b><xsl:value-of select="name"/></b><br>
<xsl:value-of select="affiliation"/><br>
<i><xsl:value-of select="email"/></i>
</body>
</html>
</xsl:template>

107 Chapter 2 A Semantic Web Primer


Style Sheet Output

<html>
<head><title>An author</title></head>
<body bgcolor="white">
<b>Grigoris Antoniou</b><br>
University of Bremen<br>
<i>[email protected]</i>
</body>
</html>

108 Chapter 2 A Semantic Web Primer


Observations About XSLT

 XSLT documents are XML documents


– XSLT resides on top of XML
 The XSLT document defines a template
– In this case an HTML document, with some
placeholders for content to be inserted
 xsl:value-of retrieves the value of an
element and copies it into the output
document
– It places some content into the template
109 Chapter 2 A Semantic Web Primer
A Template

<html>
<head><title>An author</title></head>
<body bgcolor="white">
<b>...</b><br>
...<br>
<i>...</i>
</body>
</html>

110 Chapter 2 A Semantic Web Primer


Auxiliary Templates

 We have an XML document with details of


several authors
 It is a waste of effort to treat each author
element separately
 In such cases, a special template is defined for
author elements, which is used by the main
template

111 Chapter 2 A Semantic Web Primer


Example of an Auxiliary Template
<authors>
<author>
<name>Grigoris Antoniou</name>
<affiliation>University of Bremen</affiliation>
<email>[email protected]</email>
</author>
<author>
<name>David Billington</name>
<affiliation>Griffith University</affiliation>
<email>[email protected]</email>
</author>
</authors>

112 Chapter 2 A Semantic Web Primer


Example of an Auxiliary Template (2)

<xsl:template match="/">
<html>
<head><title>Authors</title></head>
<body bgcolor="white">
<xsl:apply-templates select="authors"/>
<!-- Apply templates for AUTHORS
children -->
</body>
</html>
</xsl:template>

113 Chapter 2 A Semantic Web Primer


Example of an Auxiliary Template (3)

<xsl:template match="authors">
<xsl:apply-templates select="author"/>
</xsl:template>
<xsl:template match="author">
<h2><xsl:value-of select="name"/></h2>
Affiliation:<xsl:value-of
select="affiliation"/><br>
Email: <xsl:value-of select="email"/>
<p>
</xsl:template>
114 Chapter 2 A Semantic Web Primer
Multiple Authors Output
<html>
<head><title>Authors</title></head>
<body bgcolor="white">
<h2>Grigoris Antoniou</h2>
Affiliation: University of Bremen<br>
Email: [email protected]
<p>
<h2>David Billington</h2>
Affiliation: Griffith University<br>
Email: [email protected]
<p>
</body>
</html>

115 Chapter 2 A Semantic Web Primer


Explanation of the Example

 xsl:apply-templates element causes all children of


the context node to be matched against the selected
path expression
– E.g., if the current template applies to /, then the element
xsl:apply-templates applies to the root element
– I.e. the authors element (/ is located above the root
element)
– If the current context node is the authors element, then the
element xsl:apply-templates select="author" causes the
template for the author elements to be applied to all author
children of the authors element

116 Chapter 2 A Semantic Web Primer


Explanation of the Example (2)

 It is good practice to define a template for


each element type in the document
– Even if no specific processing is applied to certain
elements, the xsl:apply-templates element
should be used
– E.g. authors
 In this way, we work from the root to the
leaves of the tree, and all templates are
applied

117 Chapter 2 A Semantic Web Primer


Processing XML Attributes

 Suppose we wish to transform to itself the element:


<person firstname="John" lastname="Woo"/>

 Wrong solution:
<xsl:template match="person">
<person firstname="<xsl:value-of
select="@firstname">"
lastname="<xsl:value-of select="@lastname">"/>
</xsl:template>

118 Chapter 2 A Semantic Web Primer


Processing XML Attributes (2)

 Not well-formed because tags are not


allowed within the values of attributes
 We wish to add attribute values into template
<xsl:template match="person">
<person
firstname="{@firstname}"
lastname="{@lastname}"/>
</xsl:template>

119 Chapter 2 A Semantic Web Primer


Transforming an XML Document to
Another

120 Chapter 2 A Semantic Web Primer


Transforming an XML Document to
Another (2)

<xsl:template match="/">
<?xml version="1.0" encoding="UTF-16"?>
<authors>
<xsl:apply-templates select="authors"/>
</authors>
</xsl:template>

<xsl:template match="authors">
<author>
<xsl:apply-templates select="author"/>
</author>
</xsl:template>

121 Chapter 2 A Semantic Web Primer


Transforming an XML Document to
Another (3)

<xsl:template match="author">
<name><xsl:value-of select="name"/></name>
<contact>
<institution>
<xsl:value-of select="affiliation"/>
</institution>
<email><xsl:value-of select="email"/></email>
</contact>
</xsl:template>

122 Chapter 2 A Semantic Web Primer


Summary

 XML is a metalanguage that allows users to


define markup
 XML separates content and structure from
formatting
 XML is the de facto standard for the
representation and exchange of structured
information on the Web
 XML is supported by query languages

123 Chapter 2 A Semantic Web Primer


Points for Discussion in Subsequent
Chapters

 The nesting of tags does not have standard meaning


 The semantics of XML documents is not accessible
to machines, only to people
 Collaboration and exchange are supported if there is
underlying shared understanding of the vocabulary
 XML is well-suited for close collaboration, where
domain- or community-based vocabularies are used
– It is not so well-suited for global communication.

124 Chapter 2 A Semantic Web Primer

You might also like