Representing Web Data: XML: Web Technologies A Computer Science Perspective
JEFFREY C. JACKSON
Jackson, Web Technologies: A Computer Science Perspective, 2007 Prentice-Hall, Inc. All rights reserved. 0-13-185603-0
XML
Example XML document:
An XML document is one that follows certain syntax rules (most of which we followed for XHTML)
XML Syntax
An XML document consists of
Markup
Tags, which begin with < and end with >
References, which begin with & and end with ;
Two kinds of reference: character (e.g. &#x20;) and entity (e.g. &lt;)
The entities lt, gt, amp, apos, and quot are recognized in every XML document
Other XHTML entities, such as nbsp, are recognized in other XML documents only if they are defined in the DTD
XML Syntax
Comments
Begin with <!--
End with -->
Must not contain --
CDATA section
Special section whose entire content is interpreted as character data, even if it appears to be markup
Begins with <![CDATA[
Ends with ]]> (this sequence is illegal in content except when ending a CDATA section)
XML Syntax
The CDATA section
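The slide's original example is not preserved here. As a minimal sketch, assuming the JDK's built-in JAXP parser, the following shows that text inside a CDATA section and the same text written with references parse to identical character data. The class name CdataDemo, the helper rootText, and the element name message are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of any library.
class CdataDemo {

    // Parses an XML string and returns the text content of its root element.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // Inside a CDATA section, < and & are plain character data.
        String withCdata =
                "<message><![CDATA[if (a < b && b < c) <go/>]]></message>";
        // Outside CDATA, the same characters must be written as references.
        String withRefs =
                "<message>if (a &lt; b &amp;&amp; b &lt; c) &lt;go/&gt;</message>";

        System.out.println(rootText(withCdata)); // if (a < b && b < c) <go/>
        System.out.println(rootText(withRefs));  // if (a < b && b < c) <go/>
    }
}
```

Both calls yield the same string, which is the point of CDATA: it is a convenience for writing markup-like text, not a different kind of content.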
XML Syntax
< and & must be represented by references except
When beginning markup
Within comments
Within CDATA sections
XML Syntax
Element tags and elements
Three types
Start, e.g. <message>
End, e.g. </message>
Empty element, e.g. <br />
Start and end tags must properly nest
A corresponding pair of start and end element tags plus everything in between them defines an element
Character data may only appear within an element
XML Syntax
Start and empty-element tags may contain attribute specifications separated by white space
Syntax: name = quoted value
The quoted value must not contain <, and can contain & only as the start of a reference
The quoted value must begin and end with matching quote characters (' or ")
XML Syntax
Element and attribute names are case sensitive
XML white space characters are space, carriage return, line feed, and tab
XML Documents
A well-formed XML document follows the XML syntax rules and has a single root element
Well-formed documents have a tree structure
Many XML parsers (software for reading/writing XML documents) use a tree representation internally
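The well-formedness rules above can be checked mechanically. The following is a minimal sketch, assuming the JDK's built-in JAXP parser, which reports a well-formedness violation by throwing a SAXException (it may also log the error to standard error). The class name WellFormedDemo and the method isWellFormed are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of any library.
class WellFormedDemo {

    // Returns true if the string parses as a well-formed XML document.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser rejects documents that break the syntax rules.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("parser unavailable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b></b></a>"));  // properly nested
        System.out.println(isWellFormed("<a><b></a></b>"));  // tags do not nest
        System.out.println(isWellFormed("<a/><b/>"));        // two root elements
        System.out.println(isWellFormed("text <a/>"));       // character data outside an element
        System.out.println(isWellFormed("<Msg></msg>"));     // names are case sensitive
    }
}
```

Only the first document is accepted; each of the others violates one of the rules listed in the XML Syntax slides.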
XML Documents
An XML document is written according to an XML vocabulary that defines
Recognized element and attribute names
Allowable element content
Semantics of elements and attributes
XHTML is one widely-used XML vocabulary
Another example: RSS (rich site summary)
XML Documents
Valid names and content for an XML vocabulary can be specified using
Natural language
XML DTDs (Chapter 2)
XML Schema (Chapter 9)
If a DTD is used, the XML document can include a document type declaration (a <!DOCTYPE> declaration)
XML Documents
Two types of XML parsers:
Validating
Requires document type declaration
Generates error if document does not conform with DTD and meet XML validity constraints
Example: every attribute value of type ID must be unique within the document
Non-validating
Checks for well-formedness
Can ignore external DTD
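To make the validating case concrete, here is a minimal sketch that turns on validation in the JDK's JAXP parser and collects the validity errors it reports. The DTD, the element names list and item, and the class ValidationDemo are all hypothetical; the DTD is placed in the document's internal subset so the example needs no external file. It assumes the parser reports validity violations through ErrorHandler.error, which is how the JDK's default implementation behaves.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; not part of any library.
class ValidationDemo {

    // Illustrative document type declaration with an internal DTD subset.
    static final String DTD =
            "<!DOCTYPE list [\n"
          + "  <!ELEMENT list (item*)>\n"
          + "  <!ELEMENT item (#PCDATA)>\n"
          + "  <!ATTLIST item id ID #REQUIRED>\n"
          + "]>\n";

    // Parses with validation enabled and returns the reported error messages.
    static List<String> validityErrors(String xml) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // the default is non-validating
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) {
                    errors.add(e.getMessage()); // validity violation
                }
                @Override public void fatalError(SAXParseException e)
                        throws SAXException {
                    throw e; // not well-formed
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            errors.add("fatal: " + e.getMessage());
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("parser unavailable", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String ok = DTD + "<list><item id=\"a1\">x</item><item id=\"a2\">y</item></list>";
        String dup = DTD + "<list><item id=\"a1\">x</item><item id=\"a1\">y</item></list>";
        System.out.println(validityErrors(ok));  // no errors expected
        System.out.println(validityErrors(dup)); // duplicate ID value reported
    }
}
```

The second document is well-formed, so a non-validating parser would accept it; only the validating parser flags the repeated ID value.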
XML Documents
Good practice to begin XML documents with an XML declaration
Minimal example: <?xml version="1.0"?>
If included, < must be the very first character of the document
To override the default UTF-8/UTF-16 character encoding, include an encoding declaration following the version, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
XML Namespaces
XML Namespace: Collection of element and attribute names associated with an XML vocabulary
Namespace Name: Absolute URI that is the name of the namespace
Ex: http://www.w3.org/1999/xhtml is the namespace name of XHTML 1.0
Default namespace for elements of a document is specified using a form of the xmlns attribute:
XML Namespaces
Another form of the xmlns attribute, known as a namespace declaration, can be used to associate a namespace prefix with a namespace name:
[Example figure with callouts: namespace prefix, namespace declaration]
XML Namespaces
Example use of namespace prefix:
XML Namespaces
In a namespace-aware XML application, all element and attribute names are considered qualified names
A qualified name has an associated expanded name that consists of a namespace name and a local name
Ex: item is a qualified name with expanded name <null, item>
Ex: xhtml:a is a qualified name with expanded name <http://www.w3.org/1999/xhtml, a>
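The mapping from qualified names to expanded names can be observed with a namespace-aware parser. The sketch below assumes the JDK's JAXP parser with setNamespaceAware(true) and prints expanded names in the <namespace name, local name> notation used above. The class NamespaceDemo, the method expandedName, and the sample documents are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of any library.
class NamespaceDemo {

    // Returns "<namespace name, local name>" for the first element whose
    // qualified name matches. Assumes at least one such element exists.
    static String expandedName(String xml, String qualifiedName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element e = (Element) doc.getElementsByTagName(qualifiedName).item(0);
            return "<" + e.getNamespaceURI() + ", " + e.getLocalName() + ">";
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("could not parse document", ex);
        }
    }

    public static void main(String[] args) {
        // A prefix declared with xmlns:xhtml and used on a descendant element.
        String prefixed =
                "<item xmlns:xhtml=\"http://www.w3.org/1999/xhtml\">"
              + "<xhtml:a href=\"https://example.org/\">link</xhtml:a></item>";
        // A default namespace declared with the plain xmlns attribute.
        String defaulted =
                "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body/></html>";

        System.out.println(expandedName(prefixed, "item"));    // <null, item>
        System.out.println(expandedName(prefixed, "xhtml:a")); // <http://www.w3.org/1999/xhtml, a>
        System.out.println(expandedName(defaulted, "body"));   // <http://www.w3.org/1999/xhtml, body>
    }
}
```

Note that body has no prefix yet still belongs to the XHTML namespace, because a default namespace applies to unprefixed element names within its scope.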
XML Namespaces
Other namespace usage:
A namespace can be declared and used on the same element
Java-based DOM
Java DOM API defined by org.w3c.dom package
Semantically similar to JavaScript DOM API, but many small syntactic differences
Nodes of DOM tree belong to types (Java interfaces) such as Node, Document, Element, Text
Non-method properties accessed via methods
Ex: parentNode accessed by calling getParentNode()
Java-based DOM
Methods such as getElementsByTagName() return an instance of NodeList
getLength() method returns the number of items
item() method returns an item
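A NodeList is not a java.util collection, so it is typically walked with an index loop over getLength() and item(). The following is a minimal sketch; the class NodeListDemo, the method textsOf, and the sample document are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of any library.
class NodeListDemo {

    // Collects the text content of every element with the given tag name.
    static List<String> textsOf(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(tagName);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                texts.add(node.getTextContent());
            }
            return texts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<rss><item>first</item><item>second</item></rss>";
        System.out.println(textsOf(xml, "item")); // [first, second]
    }
}
```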
Java-based DOM
Default parser is non-validating and non-namespace-aware.
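These defaults can be inspected and changed on the JAXP DocumentBuilderFactory before a parser is created. A minimal sketch follows; the class ParserDefaultsDemo and the method newBuilder are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical demo class; not part of any library.
class ParserDefaultsDemo {

    // Creates a parser with the factory defaults left untouched.
    static DocumentBuilder newDefaultBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("parser unavailable", e);
        }
    }

    // Creates a parser with validation and namespace awareness set explicitly.
    static DocumentBuilder newBuilder(boolean validating, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);
            factory.setNamespaceAware(namespaceAware);
            return factory.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("parser unavailable", e);
        }
    }

    public static void main(String[] args) {
        DocumentBuilder byDefault = newDefaultBuilder();
        System.out.println(byDefault.isValidating());      // false
        System.out.println(byDefault.isNamespaceAware());  // false

        DocumentBuilder configured = newBuilder(true, true);
        System.out.println(configured.isValidating());     // true
        System.out.println(configured.isNamespaceAware()); // true
    }
}
```

Both settings must be applied to the factory; a DocumentBuilder that has already been created cannot be reconfigured.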
XSL
The Extensible Stylesheet Language (XSL) is an XML vocabulary typically used to transform XML documents from one form to another form
[Figure: XSL document, XSLT Processor]
XSL
Components of XSL:
XSL Transformations (XSLT): defines XSL namespace elements and attributes
XML Path Language (XPath): used in many XSL attribute values (ex: child::message)
XSL Formatting Objects (XSL-FO): XML vocabulary for defining document style (print-oriented)
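To show how XSLT and XPath fit together, here is a minimal sketch that runs a small stylesheet through the XSLT processor bundled with the JDK (the javax.xml.transform API). The stylesheet, the input document, and the class XsltDemo are hypothetical; the XPath expression child::message is the one mentioned above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; not part of any library.
class XsltDemo {

    // Illustrative stylesheet: emits the text of the root's child "message".
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\""
          + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"text\"/>"
          + "<xsl:template match=\"/\">"
          + "<xsl:value-of select=\"child::message\"/>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    // Applies the XSL document to the input XML and returns the result.
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<message>Hello World!</message>";
        System.out.println(transform(STYLESHEET, xml)); // Hello World!
    }
}
```

The elements in the xsl namespace come from XSLT, while the value of the select attribute is an XPath expression evaluated against the input document.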