Beginning XML

Joe Fawcett, Liam R.E. Quin, and Danny Ayers


John Wiley & Sons, Inc., 2012

Chapter 1

Introducing XML

Dr. Khaled ALqawasmi, Zarqa University,


2013/2014
STEPS LEADING UP TO XML: DATA
REPRESENTATION
AND MARKUPS
 There are two main uses for XML:
 One is as a way to represent low-level data, for example configuration files.
 The second is as a way to add metadata to documents; for example, you may want to stress a particular sentence in a report by putting it in italics or bold, or describe videos that you have uploaded in a standard XML format.
 Metadata is "data about data" or "information about information": it describes other data and provides information about an item's content. For example, an image may include metadata that describes how large the picture is, the color depth, the image resolution, when the image was created, and other data.
XML Usage
 The first usage is meant as a replacement for the more traditional ways of representing low-level data, usually lists of name/value pairs as seen in Windows INI or Java properties files.
 The second application of XML is similar to how HTML files work: the document text sits inside an overall container and is marked up with tags.
XML solution
 The problem is intercommunication: the increased use of the Internet and the widespread existence of distributed applications, particularly those that rely on components designed and managed by different parties.
 XML was conceived as a solution to this kind of problem.
 It is meant to make passing data between different components much easier and to relieve the need to continually worry about different formats of input and output, freeing developers to concentrate on the more important aspects of coding, such as the business logic.
 XML is also seen as a solution to the question of whether files should be easily readable by software or by humans; XML's aim is to be both.
data representation
 You can store data in files in two ways:
 as binary files, or
 as text files

Binary Files

 A binary file, at its simplest, is just a stream of bits (1s and 0s). Binary files can only be read and produced by certain computer programs.
 For example, when you save a document in Microsoft Word using a version before 2003, the file created (which has a doc extension) is in a binary format.
 If you open the file in a text editor such as Notepad, you won't see a picture of the original Word document, only mostly unreadable characters.
 The characters in the document other than the actual text are metadata, literally information about information.
 Metadata can specify things such as which words should be shown in bold, what text is to be displayed in a table, and so on.
 To interpret this file you need the help of the application that created it.
 Without the help of a converter, you won't be able to open a Microsoft Word document with another similar application such as WordPerfect.
 The main advantage of binary formats is that they are concise and can be expressed in a relatively small space.

Text Files
 The main difference between text and binary files is that text files are both human and machine readable.
 Each group of bits represents a character from a known set (for example, ASCII).
 This means that many different applications can read text files.
 On a Windows machine you have a choice of Notepad, WordPad, and others.
 The ability to be read and understood by both humans and machines is not the only advantage of text files;
 they are also comparatively easier to parse than binary files.
 The main disadvantage is their size.
 Another disadvantage of text files is their lack of support for metadata.
 The wish for files that are readable by both humans and applications and that also include metadata brings us to the subject of markup.
A Brief History of Markup
 The advantages of text files made them the preferred choice over binary files.
 But people wanted to standardize how metadata could be added.
 Most agreed that markup was the way to do it, but two main questions remained:
➤ How can metadata be differentiated from the basic text?
➤ What metadata is allowed?
 To answer these questions, a definition called Standard Generalized Markup Language, commonly shortened to SGML, was released.
 SGML is a step removed from defining an actual markup language, such as the Hyper Text Markup Language, or HTML.
 Instead, it specifies how markup languages are to be defined.
 SGML allows you to create your own markup language.

THE BIRTH OF XML
 SGML, although capable of defining many different types of markup, suffered from one major failing:
 it was very complicated. All the flexibility came at a cost.
 The idea of markup was sound, but it needed to be simpler.
 With this goal in mind, a small working group began working in the mid-1990s on a subset of SGML known as Extensible Markup Language (XML).
 The first working draft was published in 1996.
 Two years later, on February 10, 1998, the W3C published a revised version as a Recommendation.

XML, SGML, HTML
 XML is derived as a subset of SGML,
 whereas HTML is an application of SGML.
 XML doesn't dictate the overall format of a file or what metadata can be added;
 it just specifies a few rules.
 That means it retains much of the extensibility of SGML without most of the complexity.
example
 Suppose you have a standard text file containing a list of application users:
Joe Fawcett
Danny Ayers
Catherine Middleton
 This file has no metadata.
 Interpreting it depends on your own knowledge of how names are typically represented in the western world.
 Now look at these names as they might appear in an XML document:
<applicationUsers>
<user firstName="Joe" lastName="Fawcett" />
<user firstName="Danny" lastName="Ayers" />
<user firstName="Catherine" lastName="Middleton" />
</applicationUsers>

.Cont
 With the XML format rather than the plain text version, it's much easier to map these data items within the application itself so they can be handled correctly.
 The two common features of virtually all XML files are called elements and attributes.
 In the example, the elements are applicationUsers and user,
 and the attributes are firstName and lastName.
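As a minimal sketch of how an application might map these elements and attributes onto its own data, the following uses Python's standard xml.etree.ElementTree module. The variable names are illustrative assumptions, not something taken from the book.

```python
import xml.etree.ElementTree as ET

# The appUsers document from the slides, written with straight quotes.
APP_USERS_XML = """<applicationUsers>
  <user firstName="Joe" lastName="Fawcett" />
  <user firstName="Danny" lastName="Ayers" />
  <user firstName="Catherine" lastName="Middleton" />
</applicationUsers>"""

root = ET.fromstring(APP_USERS_XML)

# Each <user> element carries its data in two attributes.
users = [(u.get("firstName"), u.get("lastName")) for u in root.findall("user")]

for first, last in users:
    print(f"{last}, {first}")
```

Because the markup names each value, the program does not have to guess which word is the first name and which is the last.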
.Cont
 A big disadvantage of this metadata, however, is the consequent increase in the size of the file.
 The metadata adds about 130 extra characters to the file's original 43-character size,
 an increase of more than 300 percent.
 The creators of XML decided that the power of metadata warranted this increase.
 Later on in the book you'll see a number of ways to minimize the size of an XML file if needed.
 However, all these methods are, to some extent, a tradeoff against readability and ease of use.
Example: differences in how XML files can be handled compared to plain text files
 Opening an XML File in a Browser
 Create a new text file in Notepad, or an equivalent simple text editor, and paste in the list of names shown earlier.
 Save this file at a convenient location as appUsers.txt.
 Next, open a browser and paste the path to appUsers.txt into the address bar.
.Cont
 Now create another text file based on the XML version and save it as appUsers.xml.
<applicationUsers>
<user firstName="Joe" lastName="Fawcett" />
<user firstName="Danny" lastName="Ayers" />
<user firstName="Catherine" lastName="Middleton" />
</applicationUsers>

 Open this file in the browser as well.

How It Works
 Browsers use an XML stylesheet or transformation to display
XML files.
 An XML stylesheet is used to convert from a particular XML
format to another or from XML to HTML
 Transformations are covered in depth in Chapter 8, “XSLT.”
 If you want to view the default style sheet that Firefox uses to
display XML, type
chrome://global/content/xml/XMLPrettyPrint.xsl into the
Firefox address bar
 IE has a similar built-in style sheet but it’s not so easily
viewable

MORE ADVANTAGES OF
XML
 One of the aims of XML is to implement a
clear separation between data and
presentation.
 For example, when moving data across a network, bandwidth is not wasted by carrying redundant information concerned only with the look and feel.

XML Rules
 In order to maintain this clear separation, the rules of XML have to be quite strict.
 For instance, in the appUsers.xml file you saw that the
 values of the users' first and last names were within quotes;
this is a prerequisite for XML files.
 Therefore, the following would not be considered XML:
<applicationUsers>
<user firstName=Joe lastName=Fawcett />
<user firstName=Danny lastName=Ayers />
<user firstName=Catherine lastName=Middleton />
</applicationUsers>
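A quick way to see this rule enforced is to hand both versions to a parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

quoted = '<applicationUsers><user firstName="Joe" lastName="Fawcett" /></applicationUsers>'
unquoted = '<applicationUsers><user firstName=Joe lastName=Fawcett /></applicationUsers>'

print(is_well_formed(quoted))    # attribute values in quotes
print(is_well_formed(unquoted))  # attribute values without quotes
```

The parser rejects the unquoted version outright instead of trying to guess what was meant.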

.Cont
 The need for quotes in turn makes it easy to tell when certain data is missing, for example here:
<applicationUsers>
<user lastName="Fawcett" />
<user lastName="Ayers" />
<user lastName="Middleton" />
</applicationUsers>
 None of the users has a first name.
 Your application may find that acceptable or it may not.
 This means unsuitable files can be rejected.
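One way an application could act on this is sketched below, assuming Python's xml.etree.ElementTree. The function users_missing and the mixed sample document are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def users_missing(xml_text: str, attribute: str) -> list:
    """Return the lastName of every <user> that lacks the given attribute."""
    root = ET.fromstring(xml_text)
    return [u.get("lastName") for u in root.findall("user") if u.get(attribute) is None]

# Hypothetical sample: one user without a first name, one with.
doc = """<applicationUsers>
  <user lastName="Fawcett" />
  <user firstName="Danny" lastName="Ayers" />
</applicationUsers>"""

missing = users_missing(doc, "firstName")
if missing:
    print("Rejected: no firstName for", missing)
```

Whether a missing first name is an error is the application's decision; the markup only makes the gap easy to detect.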

.Cont
 Another advantage is the easy extensibility of XML files.
 If you want to add more data, perhaps a middle name,
 you can do that easily by creating a new attribute, middleName:
<applicationUsers>
<user firstName="Joe" middleName="John" lastName="Fawcett" />
<user firstName="Danny" middleName="John" lastName="Ayers" />
<user firstName="Catherine" middleName="Elizabeth" lastName="Middleton" />
</applicationUsers>
 The older version of the application can still consume this data and simply ignore the middle name information,
 while the new versions can take advantage of it.

.Cont
 This is more difficult to accomplish if the data is in a plain text file, such as:
Joe John Fawcett
Danny John Ayers
Catherine Elizabeth Middleton

 If the extra data is added as a middle column like this,
 the existing application will probably misinterpret it.
 It may also cause problems in parsing the file.
 Why?
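One possible answer, as a sketch: a reader written for the two-column file has its assumptions baked into the parsing code. The function parse_name_v1 below is a hypothetical "old" reader written in Python for illustration.

```python
def parse_name_v1(line: str) -> dict:
    """Hypothetical old reader: first word is the first name, the rest is the last name."""
    first, last = line.split(" ", 1)
    return {"firstName": first, "lastName": last}

old = parse_name_v1("Joe Fawcett")
new = parse_name_v1("Joe John Fawcett")

print(old)
print(new)
```

With the three-column line, the old reader silently folds the middle name into the last name, whereas an unknown XML attribute would simply have been ignored.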

Hierarchical Data Representation
 XML is also well suited to hierarchical data, for instance a filesystem:
Path                                Type
C:\                                 folder
C:\pagefile.sys                     file
C:\Program Files                    folder
C:\Program Files\desktop.ini        file
C:\Program Files\Microsoft          folder
C:\Program Files\Mozilla            folder
C:\Windows                          folder
C:\Windows\System32                 folder
C:\Temp                             folder
C:\Temp\~123.tmp                    file
C:\Temp\~345.tmp                    file

.Cont
 As you can see, a flat list of paths is not pretty and the information is hard for humans to read.
 XML version of the same information:
<folder name="C:\">
  <folder name="Program Files">
    <folder name="Microsoft">
    </folder>
    <folder name="Mozilla">
    </folder>
  </folder>
  <folder name="Windows">
    <folder name="System32">
    </folder>
  </folder>
  <folder name="Temp">
    <files>
      <file name="~123.tmp"></file>
      <file name="~345.tmp"></file>
    </files>
  </folder>
  <files>
    <file name="pagefile.sys"></file>
  </files>
</folder>
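The nesting can also be walked by a program. The sketch below, assuming Python's xml.etree.ElementTree and a shortened copy of the folder document, rebuilds the flat path list from the hierarchy; the function list_paths is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the folder document (raw string keeps the backslash as-is).
FILESYSTEM_XML = r"""<folder name="C:\">
  <folder name="Program Files">
    <folder name="Microsoft" />
    <folder name="Mozilla" />
  </folder>
  <folder name="Temp">
    <files>
      <file name="~123.tmp" />
    </files>
  </folder>
  <files>
    <file name="pagefile.sys" />
  </files>
</folder>"""

def list_paths(folder, prefix=""):
    """Recursively yield (path, type) pairs for a <folder> element."""
    path = prefix + folder.get("name")
    yield path, "folder"
    if not path.endswith("\\"):
        path += "\\"
    for f in folder.findall("files/file"):
        yield path + f.get("name"), "file"
    for sub in folder.findall("folder"):
        yield from list_paths(sub, path)

paths = list(list_paths(ET.fromstring(FILESYSTEM_XML)))
for p, kind in paths:
    print(p, kind)
```

The parent/child relationship is carried by the markup itself, so the code never has to split path strings to work out what contains what.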
Interoperability
 It is much quicker to agree on or publish an
XML format and use that to exchange data
between different applications.

XML IN PRACTICE
 You can find the latest W3C XML Recommendation
at www.w3.org/TR/xml.
 JSON stands for JavaScript Object Notation and is
discussed more in Chapters 14 and 16 which relate
to web services and Ajax.
 If you need more information in the meantime, visit:
www.json.org.

current uses of XML
 Data Versus Document
 The previous examples are known as data-centric uses of XML.
 Raw data is combined with markup to help give it meaning, which makes it:
1) easier to use
2) more interoperable
 There is a second major use of XML, known as document-centric.
 Document-centric XML is generally used to facilitate multiple publishing channels.
 It is useful in situations in which content changes need to be applied to multiple forms of media at once.
XML Scenarios
 XML is used for representing and storing data.
 Some common scenarios in which XML is used:
 Configuration Files
Visual Studio project files and the files that drive software build processes in Java are examples of XML configuration files.
 Web Services
Web services are web application components: software functions provided at a network address over the web, with the service always on. SOAP (Simple Object Access Protocol) relies on XML; many services now also offer the option of using JSON.
 Web Content
A lot of content is stored as plain XML, which is transformed either server-side or client-side when needed.
 Document Management
XML is also used in document-management systems to store and keep track of documents and manage metadata, such as a document's author, the date of creation, and any modifications.
 Database Systems
Most modern high-end database systems, such as Oracle and SQL Server, can store XML documents and provide a column type designed specifically to store XML. MySQL does not provide such a column type, so XML is stored there as text.
 Image Representation
Vector images can be represented with XML, the SVG format being the most popular. Scalable Vector Graphics (SVG) is an XML-based vector image format for two-dimensional graphics that has support for interactivity and animation.
 Business Interoperability
Hundreds of industries now have standard XML formats to describe the different entities that are used in day-to-day transactions, such as medical data and financial data.

XML Technologies
 To enable the preceding scenarios you can use a number of associated technologies:
 XML Parsers
Before any work can be done with an XML document it needs to be parsed; that is, broken down into its constituent parts. A number of XML parsers are available, some free and some commercial, for example MSXML (Microsoft Core XML Services), System.Xml.XmlDocument in .NET, Saxon, Java's built-in parser, and Xerces.
 The Document Object Model
The DOM is a tree-like representation of an XML document, used for extracting or inserting data. It applies to both XML and HTML, and script libraries such as jQuery work on top of it.
 DTDs and XML Schemas
Both document type definitions (DTDs) and XML Schemas describe the definition of an XML document: its structure and what data is allowed where.
 XML Namespaces
Namespaces serve as a way of grouping XML names, similar to packages in Java.
 XPath
XPath enables you to target specific elements or attributes. It works in a similar way to paths in a file system.
 XSLT
One of the main places you find XPath is XSLT. Extensible Stylesheet Language Transformations (XSLT) is a powerful way to transform files from one format to another.
 XQuery
XQuery shares many features with XSLT. XQuery was designed to query XML data.
 XML Pipelines
XML pipelines are used when single atomic steps are insufficient to achieve the output you desire. An XML pipeline specifies a sequence of operations to be performed on zero or more XML documents.

An Example in HTML
<table border='1'>
<tr style='background:black;color:white'>
<th>Item
<th>Price
</tr>
<tr valign='top' style='background:silver'>
<td>BK123 - <u>Care and Feeding of Wombats</u>
<td>$42.00
</tr>
</table>
 Rendered in a browser: a table with the headings Item and Price and one row, BK123 - Care and Feeding of Wombats at $42.00.
The Same Thing in XML

<order>
<item code='BK123'>
<name>Care and Feeding of Wombats</name>
<price currency='USD'>42.00</price>
</item>
</order>
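Because the XML version labels each value instead of describing how it should look, a program can pick the values out directly. A minimal sketch, assuming Python's xml.etree.ElementTree (the variable names are illustrative):

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

ORDER_XML = """<order>
  <item code='BK123'>
    <name>Care and Feeding of Wombats</name>
    <price currency='USD'>42.00</price>
  </item>
</order>"""

item = ET.fromstring(ORDER_XML).find("item")
price = item.find("price")

code = item.get("code")            # attribute on <item>
name = item.findtext("name")       # text content of <name>
amount = Decimal(price.text)       # text content of <price>
currency = price.get("currency")   # attribute on <price>

print(code, name, amount, currency)
```

Doing the same with the HTML table would mean relying on row and column positions rather than on names.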

Chapter 2
Well-Formed XML

XML rules
 In this chapter you'll examine the rules that decide whether or not a document is XML.
 WHAT DOES WELL-FORMED MEAN?
 Well-formed XML means a document that follows the syntax rules of the W3C's XML Recommendation.
 CREATING XML IN A TEXT EDITOR
 You can create XML in a text editor, something as simple as Notepad in Windows or Vim in Linux.
 There are also many advanced tools for XML editing and validation, such as Altova, EditiX 2014, and oXygen, and many open source tools such as XPontus.

.Cont
Forbidden Characters
 These rules vary slightly depending on whether the version is 1.0 or 1.1.
 Both versions forbid the use of null (0x00).
 In version 1.0 you are also forbidden to use the characters represented by the hexadecimal codes between 0x01 and 0x1F, except tab (0x09), newline (0x0A) and carriage return (0x0D); space (0x20) is always allowed.
 As character references, carriage return is written &#13; and line feed is written &#10;.
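The version 1.0 rule for control characters can be sketched as a small check. This is an illustration only, written in Python: the function name is hypothetical, and it covers just the range below 0x20, not every character the specification excludes.

```python
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return

def forbidden_controls(text: str) -> list:
    """Return the code points below 0x20 in text that XML 1.0 forbids."""
    return [ord(ch) for ch in text if ord(ch) < 0x20 and ord(ch) not in ALLOWED_CONTROLS]

print(forbidden_controls("line one\r\nline\ttwo"))
print(forbidden_controls("bell\x07 and null\x00"))
```

A check like this can be run on data before it is written into a document, since a parser will refuse the document if such characters are present.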

.cont
 XML Prolog
 The first part of a document is the prolog.
 It is optional, so you won't see it every time.
 The prolog begins with an XML declaration:
 <?xml version="1.0"?>
 Sometimes the declaration also contains information about the encoding used in the document:
 <?xml version="1.0" encoding="UTF-8"?>

.Cont
 Encoding with Unicode
 Encoding is the process of turning characters into their equivalent binary
representation.
 Some encodings use only a single byte, or eight bits; others use more.
 Unicode is a text encoding specification
 Two main encoding systems use Unicode: UTF-8 and UTF-16.
 UTF stands for UCS Transformation Format, and UCS itself means
Universal Character Set.
 UTF-8 encoding is probably best because it has a wide range of
characters and is supported by all XML parsers.
 UTF-8 encoding is also the default assumed if no specific encoding is
declared.
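To make "some encodings use a single byte, others use more" concrete, the following sketch counts the bytes a few characters occupy in UTF-8 and UTF-16. It is written in Python; the helper byte_lengths is hypothetical, and UTF-16 is measured without a byte order mark.

```python
def byte_lengths(text: str) -> dict:
    """Return the number of bytes text occupies in UTF-8 and in UTF-16 (no byte order mark)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }

for sample in ["A", "é", "€"]:
    print(repr(sample), byte_lengths(sample))
```

UTF-8 uses one byte for ASCII characters and more for others, while UTF-16 uses at least two bytes for every character.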

.Cont
 Completing the Declaration
 The final part of the declaration states whether the document is considered to be standalone:
 <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
 Standalone applies only to documents that specify a DTD.
 If the XML document doesn't use a DTD and relies on a schema instead, you can set the standalone declaration to yes or leave it out altogether.
 If you were ever to use a DTD, the declaration for an XHTML document would look something like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 Comments start with <!-- and are terminated by -->.

Creating Elements
 All elements are defined in one of two ways.
 The first uses a start tag and an end tag: <myElement> and </myElement>.
 You can add spaces after the name in a start tag, such as <myElement >, but not before the name, as in < myElement>.
 The alternative syntax can be used only for elements with no content: <myElement />
 This sort of element is known as self-closing.

.Cont
 Naming Styles
 Naming Specifications
 Rules used when naming elements:
 An element name can begin with either an underscore or an uppercase or lowercase letter from the Unicode character set.
 Subsequent characters can also be a dash (-), a period (.), or a digit.
 Names are case-sensitive, so the start and end tags must match exactly.
 Names cannot contain spaces.
 Names beginning with the letters XML, in either uppercase or lowercase, are reserved and shouldn't be used (although many parsers allow them in practice).
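The naming rules can be approximated with a regular expression. The sketch below is a simplified, ASCII-only assumption written in Python (the real specification also allows many non-ASCII letters), and the function name is hypothetical.

```python
import re

# Simplified ASCII-only approximation: a letter or underscore first,
# then letters, digits, underscores, periods, or dashes.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")

def is_plausible_name(name: str) -> bool:
    """Check a name against the simplified rules and the reserved 'xml' prefix."""
    if name.lower().startswith("xml"):
        return False  # reserved, in any mix of upper and lower case
    return NAME_PATTERN.fullmatch(name) is not None

for candidate in ["myElement", "_id", "first-name", "1stName", "my element", "XmlData"]:
    print(candidate, is_plausible_name(candidate))
```

A real parser remains the authority on what is allowed; this check only mirrors the rules listed on the slide.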
.Cont
 The next step after writing the prolog is
creating the root element.
 All documents must have one and only one
root element.
 Everything else in the document lies under
this element to form a hierarchical tree.

example
 One example is using XML as a logging format. A typical log file might look like this (without a root element):
<entry date="2012-03-03T10:09:53" type="audit">Failed logon attempt
with username jfawcett</entry>
<entry date="2012-03-03T10:11:01" type="audit">Successful
logon attempt with username jfawcett</entry>
<entry date="2012-03-03T10:12:11" type="information">Successful folder
synchronisation for user jfawcett</entry>
 You have to add a root element to make it well-formed:
<log>
<entry date="2012-03-03T10:09:53" type="audit">Failed logon attempt
with username jfawcett</entry>
<entry date="2012-03-03T10:11:01" type="audit">Successful
logon attempt with username jfawcett</entry>
<entry date="2012-03-03T10:12:11" type="information">
Successful folder synchronisation for user jfawcett</entry>
</log>
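The effect of the missing root element can be checked with a parser. A minimal sketch, assuming Python's xml.etree.ElementTree and shortened log entries; the helper parses is hypothetical.

```python
import xml.etree.ElementTree as ET

# Two sibling entries with no enclosing root element (shortened from the slide).
ENTRIES = (
    '<entry date="2012-03-03T10:09:53" type="audit">Failed logon attempt</entry>'
    '<entry date="2012-03-03T10:11:01" type="audit">Successful logon attempt</entry>'
)

def parses(text: str) -> bool:
    """Return True if the text is accepted as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(parses(ENTRIES))                      # no single root element
print(parses("<log>" + ENTRIES + "</log>"))  # wrapped in a <log> root
```

Wrapping the entries in one enclosing element is enough to turn the fragment into a well-formed document.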

.Cont
 Other Elements
 Underneath the root element can lie other elements that
follow the same rules for naming and attributes and, as you
saw earlier, there can also be free text.
 For example,
 your root element could be <person> and the elements underneath could show the person's characteristics, such as <biography> and <address>.
 Alternatively, your main element could be <people> and underneath that you could have one or more <person> elements.

.Cont
 Remember that all elements must be nested
underneath the root element,
 so the following sort of markup, which you may
have gotten away with in HTML, is not allowed:
<myElement>
  <elementA><elementB></elementA></elementB>
</myElement>
 You can’t have the end tag of an element before the
end tag of one nested below it.
.Cont
 Attributes
 Elements are one of the two main building blocks of XML — the other
one is attributes.
 Attributes are name-value pairs associated with an element.
 <myElement myFirstAttribute="One" mySecondAttribute="Two">
</myElement>
 A number of rules govern attributes:
 Attributes consist of a name and a value separated by an equals sign.
 The attribute value must be in quotes. You can use either single or double quotes, but you can't mix them in a single attribute.
 There must be a value part, even if it's just empty quotes.
 Attribute names must be unique per element.
 If you use double quotes as the delimiter you can't also use them as part of the value. The same applies for single quotes.

Element and Attribute Content
 Attribute values and elements can both contain character data (called text in normal parlance).
<myElement>Here is some character content</myElement>
 Additional rules:
 Two characters cannot appear in attribute values or direct element content:
 the ampersand (&) and the left angle bracket (<).

Entity and Character References
 There are five built-in entity references in XML: &amp; (&), &lt; (<), &gt; (>), &quot; ("), and &apos; (').
 Entity and character references stand in for characters that cannot be used directly,
 either because they are forbidden by the specification or because they don't exist in the encoding you have chosen.
 You can declare your own entities if you want, using a DTD.
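Most programming environments can apply these references for you. As a sketch, Python's standard xml.sax.saxutils module provides escape and unescape; the sample strings are illustrative.

```python
from xml.sax.saxutils import escape, unescape

raw = "Fish & Chips < 5"

# escape() replaces &, < and > with their entity references.
escaped = escape(raw)
restored = unescape(escaped)

# Quotes are only replaced when asked for, which matters inside attribute values.
attribute_text = escape('say "hi"', {'"': "&quot;"})

print(escaped)
print(restored)
print(attribute_text)
```

Escaping on the way in and unescaping on the way out means the stored text round-trips unchanged.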
Elements Versus Attributes
 You can represent data as an element or an attribute.
<applicationUsers>
<user firstName="Joe" lastName="Fawcett" />
<user firstName="Danny" lastName="Ayers" />
<user firstName="Catherine" lastName="Middleton" />
</applicationUsers>
 You could choose to represent the users’ first names and last names as elements
<applicationUsers>
<user>
<firstName>Joe</firstName>
<lastName>Fawcett</lastName>
</user>
<user>
<firstName>Danny</firstName>
<lastName>Ayers</lastName>
</user>
<user>
<firstName>Catherine</firstName>
<lastName>Middleton</lastName>
</user>
</applicationUsers>

When to Use Elements
 Elements are useful when the data is not a simple type, such as an address with several parts.
 Therefore use this:
<person firstName="Joe" lastName="Fawcett">
<address>
<line1>Chapter House</line1>
<line2>Crucifix Lane</line2>
<city>London</city>
<postCode>SE1 3JW</postCode>
<country>England</country>
</address>
</person>
 Rather than this:
<person
firstName="Joe"
lastName="Fawcett"
address="Chapter House, Crucifix Lane, London, SE1 3JW, England" />

.Cont
 Elements are also better when items may need to be repeated.
<applicationUsers>
<user firstName="Joe" lastName="Fawcett">
<roles>
<role name="administrators" />
<role name="general" />
</roles>
</user>
<!-- other users here -->
</applicationUsers>
 Notice how each role can have only one name, so an attribute rather than an element represents that portion.
 Another plus for elements is that they can be ordered.
 You can place attributes in a special order in your document, but the XML parser may ignore this order for processing purposes.
.Cont
 The other major case for using an element is when you have a large amount of content that is just text.
 You could use an attribute, but that would mean you could get a file that looks like this:
<longDocument data="In here is a very long piece of text that
goes on for many,
many,
many,
many,
lines" />
Processing Instructions
 The Processing Instruction, or PI, is used to communicate with the application that is consuming the XML.
 A common PI is the one that tells a browser to perform a transformation on the XML and looks like this:
<?xml-stylesheet type="text/xsl" href="appUsers.xslt" ?>
 In this case a browser will recognize the target as saying that the XML should be transformed before being shown;
 the first attribute states that the type of the transform is XSL, and
 the second attribute points to its location.

CDATA Sections
 CDATA stands for character data and means that no markup
is present.
 For example, suppose you have a simple document that
contains information that makes use of the less than sign (<).
 Normally this is taken as part of the markup so it must be
escaped using the entity reference &lt;.
<conversionData>
1 kilometer &lt; 1 mile
1 pint &lt; 1 liter
1 pound &lt; 1 kilogram
</conversionData>
.Cont
 If you’d prefer the text to use the readily recognizable < sign, which
makes it easier for humans to read and write,
 you can mark the element’s contents as a CDATA section:
<conversionData><![CDATA[
1 kilometer < 1 mile
1 pint < 1 liter
1 pound < 1 kilogram
]]>
</conversionData>
 The CDATA section starts with <![CDATA[ and ends with ]]>.
 Anything inside is considered text, and you can use any characters that normally need escaping; the only sequence that cannot appear inside is ]]>.
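To a consuming application the two notations are equivalent, which can be confirmed with a parser. A minimal sketch, assuming Python's xml.etree.ElementTree and a one-line version of the conversion data:

```python
import xml.etree.ElementTree as ET

# The same text written with an entity reference and inside a CDATA section.
escaped = ET.fromstring("<conversionData>1 kilometer &lt; 1 mile</conversionData>")
cdata = ET.fromstring("<conversionData><![CDATA[1 kilometer < 1 mile]]></conversionData>")

print(escaped.text)
print(cdata.text)
print(escaped.text == cdata.text)
```

CDATA is therefore a convenience for whoever writes the document, not a different kind of data for whoever reads it.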

Example XML Document
 The same document is shown six times in the original slides, each time highlighting a different part:
<?xml version='1.0' encoding='Shift_JIS'?>
<!DOCTYPE order SYSTEM 'grammar.dtd'>
<?xml-stylesheet type='text/xsl' href='style.xsl'?>
<order>
<item code='BK123'>
<name>Care and Feeding of Wombats</name>
<price currency='USD'>42.00</price>
</item>
</order>
 XML declaration: <?xml version='1.0' encoding='Shift_JIS'?>
 Document type declaration: <!DOCTYPE order SYSTEM 'grammar.dtd'>
 Processing instructions: <?xml-stylesheet type='text/xsl' href='style.xsl'?>
 Element tags: <order>, <item>, <name>, <price> and their end tags
 Attributes of element tags: code='BK123' and currency='USD'
 Text content: Care and Feeding of Wombats and 42.00
Differences with HTML
 Elements must be balanced, properly nested
 e.g. <br /> OK
 e.g. <b>bold <i> and italic </i> text</b> OK
 e.g. <b>bold <i> and italic </b> text</i> BAD!
 e.g. <ul> <li> list item </ul> BAD!
 Attributes must be specified, quoted
 e.g. <img src='images/banner.gif'/> OK
 e.g. <img src=images/banner.gif /> BAD!
 e.g. <ul compact> <li> list item </li> </ul> BAD!

Using a Browser to Find Errors
<?xml version="1.0" encoding="utf-8"?>
<pangrams createdOn="2012-01-04T10:19:45’>
<!-- This file is designed to show
how errors are reported in a browser -->
<pangram>The quick brown fox jumps over the lazy dog.</pangram>
<pangram>Pack my box with five dozen liquor jugs.</pangram>
<pangram>Glib jocks quiz nymph to vex dwarf.</pangram>
<pangram>The five boxing wizards jump quickly.</Pangram>
<pangram>What you write deserves better than a jiggling, shaky,
inexact & questionably fuzzy approximation of blur</pangram>
<pangram> one < seven , true or false </pangram>
</pangrams>
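A parser reports such problems in much the same way a browser does. The sketch below, assuming Python's xml.etree.ElementTree, feeds it three shortened fragments that each contain one of the mistakes above; the labels and the helper first_error are hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLES = {
    "mismatched case": "<pangrams><pangram>The five boxing wizards jump quickly.</Pangram></pangrams>",
    "bare ampersand": "<pangram>inexact & questionably fuzzy</pangram>",
    "bare less-than": "<pangram> one < seven , true or false </pangram>",
}

def first_error(text):
    """Return the (line, column) of the first well-formedness error, or None."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return err.position

for label, text in SAMPLES.items():
    print(label, first_error(text))
```

Like a browser, the parser stops at the first error, so a file with several mistakes has to be fixed and reloaded one error at a time.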

Other Important Points
 Documents must be well-formed
 Document contains single root element
 Elements are balanced and properly nested
 Attributes are specified and quoted
 Text content contains legal XML characters
 Documents may be valid
 Document structure and content follows rules
specified by grammar (e.g. DTD, XML Schema)
Useful Links
  XML 1.0 Specification
  http://www.w3.org/TR/REC-xml
  Annotated XML 1.0 Specification
  http://www.xml.com/axml/testaxml.htm
  Informational web sites
  http://www.xml.com/
  http://www.xmlhack.com/
Chapter 3
XML Namespaces

DEFINING NAMESPACES
  At their simplest, namespaces are a way of
grouping elements and attributes under a
common heading in order to differentiate
them from similarly named items.
  Namespaces work with elements and
attributes.
  You can group these items under a namespace
so that they keep their familiar names.
WHY DO YOU NEED
NAMESPACES?
 Documents use different vocabularies
 Example_1: details about your company
employees stored as XML
 Example_2: include a brief biography in the
form of some HTML within the document.

Example
Why you need to declare a namespace
Basic document:
<employees>
  <employee id="001">
    <firstName>Joe</firstName>
    <lastName>Fawcett</lastName>
    <title>Mr</title>
    <dateOfBirth>1962-11-19</dateOfBirth>
    <dateOfHire>2005-12-05</dateOfHire>
    <position>Head of Software Development</position>
    <biography><!-- biography here --></biography>
  </employee>
  <!-- more employee elements can be added here -->
</employees>
The same document with an HTML biography:
<employees>
  <employee id="001">
    <firstName>Joe</firstName>
    <lastName>Fawcett</lastName>
    <title>Mr</title>
    <dateOfBirth>1962-11-19</dateOfBirth>
    <dateOfHire>2005-12-05</dateOfHire>
    <position>Head of Software Development</position>
    <biography>
      <html>
        <head>
          <title>Joe's Biography</title>
        </head>
        <body>
          <p>After graduating from the University of Life
          Joe moved into software development,
          originally working with COBOL on mainframes in
          the 1980s.</p>
        </body>
      </html>
    </biography>
  </employee>
  <!-- more employee elements -->
</employees>
• Now, without namespaces, you have a clash: two <title> elements
performing two distinct functions.
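The clash is easy to see from code. This is a small sketch, assuming Python's standard xml.etree.ElementTree module and a shortened version of the employee document; a search by element name alone returns both titles and cannot tell them apart.

```python
import xml.etree.ElementTree as ET

# Shortened employee record: one <title> is a form of address,
# the other is the title of the embedded HTML page.
doc = """
<employee id="001">
  <title>Mr</title>
  <biography>
    <html>
      <head><title>Joe's Biography</title></head>
    </html>
  </biography>
</employee>
"""

employee = ET.fromstring(doc)

# Both elements share the same name, so both are returned.
titles = [element.text for element in employee.iter("title")]
print(titles)
```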
The Problem
 Documents use different vocabularies
 Example 1: CD music collection
 Example 2: online order transaction
 Merging multiple documents together
 Name collisions can occur
 Example 1: albums have a <name>
 Example 2: customers have a <name>
 How do you differentiate between the two?

The Solution: Namespaces!
  What is a namespace?
  A syntactic way to differentiate similar names in an
XML document
  Namespaces were already heavily used in many
programming languages such as C# and Java.
  In C# they are actually called namespaces,
  whereas Java prefers the term packages.
  Namespaces for XML are a way to make sure that your
elements and attributes have unique names.
URLs, URIs, and URNs
  In the real world you have two ways to create
a unique namespace: using URIs or URNs.
  It helps to know the difference between URLs, URIs, and URNs.
  A URL is a Uniform Resource Locator. It
specifies the location of a resource:
Example:
  a webpage, and how it can be retrieved:
[Scheme]://[Domain]:[Port]/[Path]?[QueryString]#[FragmentId]
  The terms in square brackets are replaced by their actual
values,
  and all items other than Scheme and Domain are
optional.
  So a typical web URL would be:
http://www.wrox.com/remtitle.cgi?isbn=0470114878
  The scheme is http,
  the domain is www.wrox.com,
  followed by the path and a query string.
  You can use many other schemes, such as FTP and HTTPS.
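The parts of the example URL can be picked out programmatically. This is a minimal sketch, assuming Python's standard urllib.parse module; it only splits the string and does not contact the server.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://www.wrox.com/remtitle.cgi?isbn=0470114878"
parts = urlsplit(url)

print(parts.scheme)           # the scheme
print(parts.netloc)           # the domain (plus the port, when one is given)
print(parts.path)             # the path
print(parts.query)            # the query string
print(parts.port)             # None here, because no port was written
print(parse_qs(parts.query))  # the query string as name/value pairs
```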
Cont.
  A URI, on the other hand, is a Uniform Resource
Identifier.
  It can have the same format as a URL or it can be in
the URN format.
  It doesn’t have to point to anything tangible;
  it’s just a unique string that identifies something.
  All URLs are also URIs, but the opposite is not
necessarily true.
Cont.
  URNs are slightly different again; the letters
stand for Uniform Resource Name.
  A URN is a name that uniquely defines
something.
  For example, in the non-computing world,
national ID numbers and ISBNs are names of this kind.
  They both uniquely identify something:
  Jordanian citizens and editions of books, respectively.
Cont.
  URNs take the following format:
urn:[namespace identifier]:[namespace specific string]
  The square brackets need to be replaced by actual
values, and the three-character prefix, urn, is
not case-sensitive.
Cont.
  The namespace identifier is a string of characters such as
isbn, which identifies how the namespace specific string
should be interpreted.
  Namespace identifiers can be registered with the Internet
Assigned Numbers Authority (IANA).
  The namespace specific string identifies the actual thing
within the category set by the identifier.
  An example of a URN using a registered scheme would be:
urn:isbn:9780470114872
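The three-part URN layout can be taken apart with ordinary string handling. This is a rough sketch only; parse_urn is a hypothetical helper written for this illustration, and it does not check the identifier against any registry.

```python
def parse_urn(urn):
    """Split a URN into (namespace identifier, namespace specific string)."""
    # Raises ValueError if the text does not have three colon-separated parts.
    prefix, identifier, specific = urn.split(":", 2)
    if prefix.lower() != "urn":  # the urn prefix is not case-sensitive
        raise ValueError("not a URN: " + urn)
    return identifier, specific


print(parse_urn("urn:isbn:9780470114872"))
print(parse_urn("URN:isbn:9780470114872"))
```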
Cont.
  This URN uniquely identifies the fourth edition of
this book,
  but because it’s a URN, not a URL,
  it doesn’t tell you anything about how to retrieve
either the book itself or any information about it.
  URLs and URNs are both URIs;
  a URL tells you the how and where of something,
  and the URN is simply a unique name.
  Both URLs and URNs are used to create XML
Namespace URIs, as you’ll see next.
Creating Your First Namespace
  When creating your first namespace you should use the URI
format.
  Most companies have their own registered domain,
  so it has become fairly standard to use their domain name as a
starting point.
  Now for your user element, which in the example scenario came
from an application configuration file that may have been used
by your HR system, you might choose the full namespace:
  http://wrox.com/namespaces/applications/hr/config
  The actual string of characters chosen is known as the
namespace URI.
  Namespace URIs are case-sensitive.
HOW TO DECLARE A
NAMESPACE
  You can declare a namespace in two ways,
depending on whether:
  you want all the elements in a document to be
under the namespace,
  or just a few specific elements to be under it.
Cont.
  If you want all elements to be included, you
can use the following style:
  xmlns="http://wrox.com/namespaces/applications/hr/config"
  For example, take your appUsers.xml
file from Chapter 1.
Example
  appUsers With Default Namespace.xml
<applicationUsers
xmlns="http://wrox.com/namespaces/applications/hr/config">
  <user firstName="Joe" lastName="Fawcett" />
  <user firstName="Danny" lastName="Ayers" />
  <user firstName="Catherine" lastName="Middleton" />
</applicationUsers>
  This is known as declaring a default namespace.
Cont.
  It is associated with the element on which it is
declared, in this case <applicationUsers>, and
any element contained within it.
  The namespace is said to be in scope for all
these elements.
  Attributes, such as firstName, are not covered
by a default namespace.
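The effect of a default namespace can be checked with a parser. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree, which writes a namespaced name as {namespace URI}local-name; the document is a one-user version of the example.

```python
import xml.etree.ElementTree as ET

doc = """
<applicationUsers xmlns="http://wrox.com/namespaces/applications/hr/config">
  <user firstName="Joe" lastName="Fawcett" />
</applicationUsers>
"""

root = ET.fromstring(doc)
user = root[0]

print(root.tag)             # the root element is in the default namespace
print(user.tag)             # so is the contained <user> element
print(sorted(user.attrib))  # the attribute names carry no namespace
```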
Cont.
  As mentioned previously, a default namespace applies only to
elements.
  Attributes need their namespaces to be declared explicitly.
  To declare a namespace explicitly you have to choose a
prefix to represent it.
  The prefix can be more or less anything you like; it follows the same naming
rules as an element or attribute, but cannot contain a colon (:).
  Say you decide to use hr as your prefix:
xmlns:hr="http://wrox.com/namespaces/applications/hr/config"
Example: namespace declaration that
has a prefix of hr
<applicationUsers
xmlns:hr="http://wrox.com/namespaces/applications/hr/config">
  <user firstName="Joe" lastName="Fawcett" />
  <user firstName="Danny" lastName="Ayers" />
  <user firstName="Catherine" lastName="Middleton" />
</applicationUsers>
  None of the elements or attributes are grouped in that
namespace, because none of them uses the prefix yet.
Cont.
  To associate the elements with the namespace you
have to add the prefix to the elements’ tags.
<hr:applicationUsers
xmlns:hr="http://wrox.com/namespaces/applications/hr/config">
  <user firstName="Joe" lastName="Fawcett" />
  <user firstName="Danny" lastName="Ayers" />
  <user firstName="Catherine" lastName="Middleton" />
</hr:applicationUsers>
  Notice that the prefix, hr, has been added to both the
start and the end tags and is followed by a colon and
then the element’s local name.
Cont.
  If you want the attributes in the document to be also in the hr
namespace you follow a similar procedure as shown:
<hr:applicationUsers xmlns:hr="http://wrox.com/namespaces/applications/hr/config">
  <user hr:firstName="Joe" hr:lastName="Fawcett" />
  <user hr:firstName="Danny" hr:lastName="Ayers" />
  <user hr:firstName="Catherine" hr:lastName="Middleton" />
</hr:applicationUsers>
  Again the namespace prefix is prepended to the attribute’s
name and followed by a colon.
  Remember that the namespace declaration must come either
on the element that uses it
  or on one higher in the tree, an ancestor as it’s often called.
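Only names that actually carry the prefix end up in the namespace. The sketch below assumes Python's standard xml.etree.ElementTree and a cut-down document in which the root element and one attribute are prefixed, while the user element and the other attribute are left bare on purpose.

```python
import xml.etree.ElementTree as ET

HR = "http://wrox.com/namespaces/applications/hr/config"

doc = f"""
<hr:applicationUsers xmlns:hr="{HR}">
  <user hr:firstName="Joe" lastName="Fawcett" />
</hr:applicationUsers>
"""

root = ET.fromstring(doc)
user = root[0]

print(root.tag)     # prefixed element: in the hr namespace
print(user.tag)     # unprefixed element: in no namespace
print(user.attrib)  # only the prefixed attribute is namespaced
```

The declaration on the root element puts the prefix in scope for the whole document, but being in scope is not the same as being used.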
Revision: problem
  If everyone is creating personalized XML vocabularies, you’ll
soon run into a problem:
  only so many words are available in human languages,
  and a lot of them are going to be snapped up by people defining
document types.
  Example: your company feels that an <order> should
contain a certain set of information, while another company
feels that it should contain a different set of information.
  Both companies can even use the name <order> for entirely
different uses if desired.
  If you need to combine various XML elements from different
document types into one XML document, how can you then
distinguish those two <order> elements?
Revision: solution
  Using Prefixes
  The best way to solve this problem is for every
element in a document to have a completely
distinct name.
  For example:
  every element of your own XML document type gets your own
prefix,
  and every XHTML element gets another prefix.
Example: XML and HTML document
 XML document type containing information about a person, including that person’s title.
<?xml version="1.0"?>
<person>
<name>
<title>Sir</title>
<first>John</first>
<middle>Fitzgerald Johansen</middle>
<last>Doe</last>
</name>
<position>Vice President of Marketing</position>
<résumé>
<html>
<head><title>Resume of John Doe</title></head>
<body>
<h1>John Doe</h1>
<p>John's a great guy, you know?</p>
</body> </html>
</résumé>
</person>

Prefix example: You could rewrite the previous XML
document to something like this:
<?xml version="1.0"?>
<pers:person>
<pers:name>
<pers:title>Sir</pers:title>
<pers:first>John</pers:first>
<pers:middle>Fitzgerald Johansen</pers:middle>
<pers:last>Doe</pers:last> </pers:name>
<pers:position>Vice President of Marketing </pers:position>
<pers:résumé>
<xhtml:html>
<xhtml:head><xhtml:title>Resume of John Doe</xhtml:title></xhtml:head>
<xhtml:body>
<xhtml:h1>John Doe</xhtml:h1>
<xhtml:p>John's a great guy, you know?</xhtml:p>
</xhtml:body>
</xhtml:html>
</pers:résumé>
</pers:person>

The two namespaces illustrated:
 Any elements with the pers prefix belong to the same “category” as each
other, just as any elements with the xhtml prefix belong to another
“category.”
 These “categories” are called namespaces.

Why Doesn’t XML Just Use These Prefixes?
  There is a drawback to the prefix approach to namespaces
used in the previous XML:
  Who will monitor the prefixes?
  The whole reason for using them is to distinguish names from
different document types,
  so the prefixes themselves also have to be unique.
  If one company chose the prefix pers and another company
also chose that same prefix, the original problem still exists.
  To solve this problem, you could take advantage of the
already unambiguous Internet domain names in existence
and specify that URIs must be used for the prefix names.
How XML Namespaces Work
  To use XML namespaces in your documents,
elements are given qualified names (QNames).
  These qualified names consist of two parts:
  the local part, which is the same as the names
we have been giving elements all along,
  and the namespace prefix, which specifies to
which namespace this name belongs.
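The two parts of a qualified element name can be resolved against the prefixes that are in scope. This is a rough sketch; expand_qname is a hypothetical helper, and storing the default namespace under the empty-string key is an assumption made for this illustration.

```python
def expand_qname(qname, prefix_map):
    """Resolve an element's 'prefix:local' name to (namespace URI, local part)."""
    prefix, colon, local = qname.partition(":")
    if not colon:
        # No prefix: the element takes the default namespace, if one is bound.
        return prefix_map.get(""), qname
    return prefix_map[prefix], local


bindings = {"pers": "http://www.wiley.com/pers"}

print(expand_qname("pers:person", bindings))
print(expand_qname("person", bindings))
```

Two documents may choose different prefixes for the same URI and their names still compare equal, because it is the URI and local part that are compared, not the prefix.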
Example
  For example, to declare a namespace called
http://www.wiley.com/pers and associate a
<person> element with that namespace, you
would do something like the following:
<pers:person xmlns:pers="http://www.wiley.com/pers"/>
  The key is the xmlns:pers attribute (xmlns
stands for XML Namespace).
Example: Adding XML Namespaces to
Your Document
  Open Notepad and type in the following XML:
<?xml version="1.0"?>
<pers:person xmlns:pers="http://www.wiley.com/pers" xmlns:html="http://www.w3.org/1999/xhtml">
<pers:name>
<pers:title>Sir</pers:title>
<pers:first>John</pers:first>
<pers:middle>Fitzgerald Johansen</pers:middle>
<pers:last>Doe</pers:last>
</pers:name>
<pers:position>Vice President of Marketing</pers:position>
<pers:résumé>
<html:html>
<html:head><html:title>Resume of John Doe</html:title></html:head>
<html:body>
<html:h1>John Doe</html:h1>
<html:p>John's a great guy, you know?</html:p>
</html:body>
</html:html>
</pers:résumé>
</pers:person>
How It Works
  You declare the pers prefix, which is used to specify elements that belong
to the “pers” namespace, and the html prefix, which is used to specify
elements that belong to the XHTML namespace.
Namespace Binding Syntax
  Binding namespaces
  Uses Uniform Resource Identifier (URI)
  e.g. "http://example.com/NS"
  Can bind to a named or “default” prefix
  Use “xmlns” attribute
  Named prefix
  e.g. <a:foo xmlns:a='http://example.com/NS'/>
  Default prefix
  e.g. <foo xmlns='http://example.com/NS'/>
  Element and attribute names are “qualified”
  URI, local part (or “local name”) pair
  e.g. { "http://example.com/NS" , "foo" }
Example Document (1 of 3)
  Namespace binding
<?xml version='1.0' encoding='UTF-8'?>
<order>
  <item code='BK123'>
    <name>Care and Feeding of Wombats</name>
    <desc xmlns:html='http://www.w3.org/1999/xhtml'>
      The <html:b>best</html:b> book ever written!
    </desc>
  </item>
</order>
Example Document (2 of 3)
  Namespace scope
<?xml version='1.0' encoding='UTF-8'?>
<order>
  <item code='BK123'>
    <name>Care and Feeding of Wombats</name>
    <desc xmlns:html='http://www.w3.org/1999/xhtml'>
      The <html:b>best</html:b> book ever written!
    </desc>
  </item>
</order>
Example Document (3 of 3)
  Bound elements
<?xml version='1.0' encoding='UTF-8'?>
<order>
  <item code='BK123'>
    <name>Care and Feeding of Wombats</name>
    <desc xmlns:html='http://www.w3.org/1999/xhtml'>
      The <html:b>best</html:b> book ever written!
    </desc>
  </item>
</order>
Declaring More Than One
Namespace
  Many documents use more than one
namespace to group their elements.
  You have a number of choices when you need
to design XML in this fashion.
  One option is to choose a default namespace
for some elements and an explicit one for
others.
Example
<applicationUsers
xmlns="http://wrox.com/namespaces/applications/hr/config"
xmlns:ent="http://wrox.com/namespaces/general/entities">
  <ent:user firstName="Joe" lastName="Fawcett" />
  <ent:user firstName="Danny" lastName="Ayers" />
  <ent:user firstName="Catherine" lastName="Middleton" />
</applicationUsers>
Cont.
  For greater clarity, you can avoid the default namespace and
make both namespace declarations explicit:
<hr:applicationUsers
xmlns:hr="http://wrox.com/namespaces/applications/hr/config"
xmlns:ent="http://wrox.com/namespaces/general/entities">
  <ent:user firstName="Joe" lastName="Fawcett" />
  <ent:user firstName="Danny" lastName="Ayers" />
  <ent:user firstName="Catherine" lastName="Middleton" />
</hr:applicationUsers>
Important Points
  In scope means the namespace is available to be
used.
  Namespace “scope” is the element and descendants
from point of binding
  Attributes are not in element’s namespace
  Unless explicitly prefixed
  A default namespace: you don’t have to specify a
prefix for all of the elements (descendants) that use
it.
  Cannot unbind named prefixes
  However, you can unbind default prefix
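Scope and the unbinding of the default namespace can both be observed with a parser. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree; the document and its element names are invented for the illustration, and an empty xmlns attribute is what removes the default namespace for one branch.

```python
import xml.etree.ElementTree as ET

doc = """
<order xmlns="http://example.com/NS">
  <item>
    <note xmlns="">not in any namespace</note>
  </item>
</order>
"""

root = ET.fromstring(doc)

# <item> inherits the default namespace from <order>;
# <note> undoes it for itself and anything it contains.
tags = [element.tag for element in root.iter()]
print(tags)
```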
Chapter 4
Document Type Definitions
DTD

WHAT ARE DOCUMENT TYPE
DEFINITIONS?
  Document Type Definitions (DTDs) are a way to describe
fairly precisely the “shape” of the language.
This idea has parallels in human language.
  A DTD defines the document structure with a
list of legal elements and attributes.
Validation of XML Documents
  XML documents must be well-formed
  XML documents may be valid
  Validation verifies that the structure and content of
the document follows rules specified by grammar
  Types of grammars
  Document Type Definition (DTD)
  XML Schema (XSD)
  RELAX NG (RNG)
Working with DTDs
  There are two ways of associating a DTD with a
document: internally and externally.
  The internal approach includes the DTD within the
XML document.
  In the external approach, once a DTD has been developed,
XML documents are typically associated with it
by reference, like the following:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Using jEdit
  Most Integrated Development Environments (IDEs), such as Eclipse and
NetBeans, have some XML facilities.
  For the practical examples in this chapter and the next, you’ll
be using the jEdit programmer’s editor available from
www.jedit.org.
  Once you have downloaded and installed jEdit you will need
to add XML support.
  Install the XML plugin: Plugins menu -> Plugin Manager ->
Install -> select the XML checkbox -> click the Install button -> close
the Plugin Manager window.
  You’ll use jEdit’s DTD capabilities to validate an XML
document.
jEdit tool
• To validate the XML document:
Click the Plugins menu, and choose XML ➪ Parse XML.
Online validators
  http://cds.library.brown.edu/service/xmlvalid/
  http://www.cogsci.ed.ac.uk/~richard/xml-check.html
Example
  In this example, you embed a DTD that defines the <name> vocabulary directly
within an XML document.

<?xml version="1.0"?>
<!DOCTYPE name [
<!ELEMENT name (first, middle, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT middle (#PCDATA)>
<!ELEMENT last (#PCDATA)>
]>
<name>
  <first>Joseph</first>
  <middle>John</middle>
  <last>Fawcett</last>
</name>
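Even a non-validating parser reads the internal DTD and can report what it declares. The sketch below assumes Python's standard xml.parsers.expat module, whose ElementDeclHandler callback receives each element declaration as a nested tuple of (type, quantifier, name, children). It lists the declarations only; it does not validate the document against them.

```python
import xml.parsers.expat as expat

doc = """<?xml version="1.0"?>
<!DOCTYPE name [
<!ELEMENT name (first, middle, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT middle (#PCDATA)>
<!ELEMENT last (#PCDATA)>
]>
<name><first>Joseph</first><middle>John</middle><last>Fawcett</last></name>"""

declared = {}


def on_element_decl(name, model):
    # Called once per <!ELEMENT ...> declaration in the DTD.
    declared[name] = model


parser = expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.Parse(doc, True)

kind, quantifier, _, children = declared["name"]
print(sorted(declared))                       # every declared element name
print(kind == expat.model.XML_CTYPE_SEQ)      # <name> holds a sequence
print([child[2] for child in children])       # the names in that sequence
```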
How It Works
  Let’s break the DTD down into smaller pieces:
1) XML header: begin with the XML declaration to avoid XML version conflicts.
<?xml version="1.0"?>
2) Document type declaration, commonly referred to as the DOCTYPE: the
DOCTYPE informs the parser that a DTD is associated with this XML
document.
<!DOCTYPE name [
  The DOCTYPE declaration must appear at the start of the document (preceded only
by the XML header);
  it is not permitted anywhere else within the document.
  The DOCTYPE declaration begins with an exclamation mark (!) after the opening angle bracket.
Cont.
3) DTD body: this is where you declare elements,
attributes, entities, and notations:
<!ELEMENT name (first, middle, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT middle (#PCDATA)>
<!ELEMENT last (#PCDATA)>
  Here you have declared several elements that make up the
vocabulary of the <name> document.
  Like the DOCTYPE, the element declarations must start with an
exclamation mark.
Cont.
4) The declaration section of the DTD is closed
using a closing bracket and a closing angle
bracket (]>).
  This ends the definition, and the XML
document immediately follows.
The Document Type Declaration in
Detail
  The document type declaration, or DOCTYPE:
1) informs the parser that your document should conform to a DTD;
2) indicates where the parser can find the rest of the
definition.
  The DOCTYPE declaration:
<!DOCTYPE name [ ]>
  Whitespace is not allowed to appear between
the opening <! and DOCTYPE.
  Remember that XML is case-sensitive, so DOCTYPE must be in uppercase.
Cont.
  After the whitespace, the name of the XML
document’s root element must appear exactly as it
will in the document, including any namespace
prefix.
  Following the name of the root element, write the
rest of the document type declaration.
  In the example, the element declarations appeared between the [ and ] of
the DOCTYPE. These are called internal subset declarations.
  DTD declarations that appear in external documents
are external subset declarations.
Cont.
  You can refer to an external DTD by using either system
identifiers or public identifiers.
  A system identifier enables you to specify the location of an
external file containing DTD declarations.
  It consists of two parts: the keyword SYSTEM, and a
URI reference pointing to the document’s location.
  <!DOCTYPE name SYSTEM "name.dtd" [...]>
  The following examples use system identifiers:
<!DOCTYPE name SYSTEM "file:///c:/name.dtd" [ ]>
<!DOCTYPE name SYSTEM "http://wiley.com/hr/name.dtd" [ ]>
<!DOCTYPE name SYSTEM "name.dtd">
Cont.
  Public identifiers provide a second mechanism to locate DTD
resources and look like this:
  <!DOCTYPE name PUBLIC "-//Beginning XML//DTD Name Example//EN">
  Instead of a reference to a file, public identifiers are used to
identify an entry in a catalog.
  In an XML document, a public identifier is normally followed by a
system identifier that gives a fallback location.
  Public identifiers can follow any format; however, a
commonly used format is called Formal Public Identifiers, or
FPIs.
  The syntax for FPIs matches the following basic structure:
-//Owner//Class Description//Language//Version
  Namespaces are much more powerful than public identifiers.
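The pieces of a DOCTYPE can be read back from a document. This is a minimal sketch, assuming Python's standard xml.parsers.expat module and its StartDoctypeDeclHandler callback; read_doctype is a hypothetical helper, and the system identifier "name.dtd" added after the public identifier is an assumption for the illustration. No external file is fetched.

```python
import xml.parsers.expat as expat


def read_doctype(xml_text):
    """Return (root name, system id, public id, has internal subset)."""
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.append((name, system_id, public_id, bool(has_internal_subset)))

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return found[0]


system_only = '<!DOCTYPE name SYSTEM "name.dtd"><name/>'
public_and_system = (
    '<!DOCTYPE name PUBLIC "-//Beginning XML//DTD Name Example//EN" '
    '"name.dtd"><name/>'
)
internal_only = "<!DOCTYPE name [<!ELEMENT name EMPTY>]><name/>"

for text in (system_only, public_and_system, internal_only):
    print(read_doctype(text))
```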
ANATOMY OF A DTD
Document Structure
 DTD declarations can be broken into three
parts:
➤ Element declarations
➤ Attribute declarations
➤ Entity declarations

Element declarations
  You must declare each element that appears within the
document.
  DTDs can include declarations for optional elements:
elements that may or may not appear in the XML
document.
  Element declarations consist of three basic parts:
➤ The ELEMENT declaration
➤ The element name
➤ The element content model
<!ELEMENT name (first, middle, last)>
The ELEMENT declaration
  You define an element using the ELEMENT keyword,
  followed by the name of the element.
  After the element name, the element’s content
model defines the allowable content within
the element.
Element Declaration
 Content models
 ANY
 EMPTY
 Children
 Nestable groups of sequences and/or choices
 Occurrences for individual elements and groups
 Mixed content
 Intermixed elements and parsed character data (text)

Dr. Khaled ALqawasmi, Zarqa University,


2013/2014
Element content
  To define a content model with element content,
  simply include the allowable elements within parentheses.
  Ex: a <contact> element that contains a <name> element:
  <!ELEMENT contact (name)>
  The <contact> element needs to include more than just the
name.
  For example, we need to include <contact> element children
such as <name>, <location>, <phone>, <knows>, and
<description>.
Children Content Model
  There are two fundamental ways of specifying the element
children: sequences and choices, which can also be nested.
  Sequences
  The elements within these documents must appear in a distinct order.
  Order required
  e.g. <!ELEMENT contact (name, location, phone, knows,
description)>
  e.g. (foo,bar,baz)
  The parser raises an error in three instances:
  ➤ If your XML document is missing one of the elements within the
sequence
  ➤ If your document contains extra elements that are not in the sequence
  ➤ If the elements appear in another order
Cont.
  Choices
  Suppose you needed to allow one element or another, but
not both.
  <!ELEMENT location (address | GPS)> This declaration
allows the <location> element to contain one <address>
or one <GPS> element.
  Any one from list e.g. (foo|bar|baz)
  The parser will raise an error:
  if the element is empty,
  or if it contains more than one of these elements.
Cont.
  Nested sequences and choices
  You can build more complex content models by using both simple sequences and choices as
building blocks.
  <!ELEMENT location (address | (latitude,
longitude))>
  e.g. (foo,bar,(baz|mumble))
  e.g. (foo|(bar,baz))
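Sequences and choices behave much like regular expressions over the list of child element names. The sketch below is an analogy only, written with Python's standard re module; the CONTENT_MODELS table and the children_match helper are hypothetical, and the regexes were translated by hand from the two declarations shown on the slides.

```python
import re

# Each content model is rewritten by hand as a regex over child names,
# where every name is followed by one space:
#   (a, b)  becomes  "a b "      (sequence)
#   (a | b) becomes  "(a |b )"   (choice)
CONTENT_MODELS = {
    "contact": r"name location phone knows description ",
    "location": r"(address |latitude longitude )",
}


def children_match(parent, child_names):
    """True if the ordered child names satisfy the parent's content model."""
    sequence = "".join(name + " " for name in child_names)
    return re.fullmatch(CONTENT_MODELS[parent], sequence) is not None


print(children_match("location", ["address"]))
print(children_match("location", ["latitude", "longitude"]))
print(children_match("location", ["address", "latitude", "longitude"]))
print(children_match("contact", ["name", "phone", "location", "knows", "description"]))
```

A real validating parser works from the DTD itself; the point here is only that a missing, extra, or out-of-order child makes the match fail.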
Mixed Content
  The XML Recommendation specifies that any element with text in
its content is a mixed content model element.
  Within mixed content models, text can appear by itself or it
can be interspersed between elements.
  The rules for mixed content models are similar to the element
content model rules.
  Simple mixed content model (text only):
<!ELEMENT first (#PCDATA)>
  PCDATA is a keyword derived from Parsed Character
DATA.
  It simply indicates that the character data within the content
model should be parsed by the parser.
  Example element that conforms to this declaration:
<first>John</first>
Empty Content
  Some elements within your XML documents
never need content,
  like the <br> element in HTML.
  To define an element with an empty content
model:
  <!ELEMENT br EMPTY>
  You shouldn’t declare elements that may
contain content, such as <middle>, as EMPTY.
Any Content
  You can declare an element using the ANY keyword.
For example:
  <!ELEMENT description ANY>
  The ANY keyword indicates that text (PCDATA)
and/or any elements declared within the DTD can be
used within the content of the <description> element,
  in any order and any number of times.
  However, the ANY keyword does not allow you to
include elements that are not declared within the
DTD.
Example: “Making Contact”:
try it using jEdit
<?xml version="1.0"?>
<!DOCTYPE contacts PUBLIC "-//Beginning XML//DTD Contact Example//EN"
"contacts1.dtd">
<contacts>
  <contact>
    <name>
      <first>Joseph</first>
      <middle>John</middle>
      <last>Fawcett</last>
    </name>
    <location>
      <latitude>50.7218</latitude>
      <longitude>-3.533617</longitude>
    </location>
    <phone>001-234-567-8910</phone>
    <knows>John Doe, Danny Ayers</knows>
    <description>Joseph is a developer and author for Beginning XML <em>5th
edition</em>.<br/>Joseph <strong>loves</strong> XML!</description>
  </contact>
</contacts>
Define the DTD for the example contact
XML document
<!ELEMENT contacts (contact)>
<!ELEMENT contact (name, location, phone, knows, description)>
<!ELEMENT name (first, middle, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT middle (#PCDATA)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT location (address | (latitude, longitude))>
<!ELEMENT address (#PCDATA)>
<!ELEMENT latitude (#PCDATA)>
<!ELEMENT longitude (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT knows (#PCDATA)>
<!ELEMENT description (#PCDATA | em | strong | br)*>
<!ELEMENT em (#PCDATA)>
<!ELEMENT strong (#PCDATA)>
<!ELEMENT br EMPTY>
Cardinality
  An element’s cardinality defines how many
times it will appear within a content model.
  DTDs allow four indicators for cardinality:
  [none]: the element must appear once and only once
  ? : the element may appear zero or one time
  + : the element may appear one or more times
  * : the element may appear zero or more times
Example, DTD
<!ELEMENT contacts (contact*)>
<!ELEMENT contact (name, location, phone, knows, description)>
<!ELEMENT name (first+, middle?, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT middle (#PCDATA)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT location (address | (latitude, longitude))*>
<!ELEMENT address (#PCDATA)>
<!ELEMENT latitude (#PCDATA)>
<!ELEMENT longitude (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT knows (#PCDATA)>
<!ELEMENT description (#PCDATA | em | strong | br)*>
<!ELEMENT em (#PCDATA)>
<!ELEMENT strong (#PCDATA)>
<!ELEMENT br EMPTY>
Example: XML
<?xml version="1.0"?>
<!DOCTYPE contacts PUBLIC "-//Beginning XML//DTD Contact Example//EN"
"contacts2.dtd">
<contacts>
  <contact>
    <name>
      <first>Joseph</first>
      <first>John</first>
      <last>Fawcett</last>
    </name>
    <location>
      <address>Exeter, UK</address>
      <latitude>50.7218</latitude>
      <longitude>-3.533617</longitude>
    </location>
    <phone>001-234-567-8910</phone>
    <knows>John Doe, Danny Ayers</knows>
    <description>Joseph is a developer and author for Beginning XML <em>5th
edition</em>.<br/>Joseph <strong>loves</strong> XML!</description>
  </contact>
  <contact>
    <name>
      <first>John</first>
      <last>Doe</last>
    </name>
    <location>
      <address>Address is not known</address>
    </location>
    <phone>321 321 3213</phone>
    <knows>Joseph Fawcett, Danny Ayers</knows>
    <description>Senior Technical Consultant for LMX.</description>
  </contact>
</contacts>
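The cardinality indicators ?, + and * mean the same thing in a DTD as they do in a regular expression, which makes them easy to try out. This is a hand-translated sketch using Python's standard re module; the names NAME_MODEL, LOCATION_MODEL and matches are hypothetical, and this is an analogy rather than real DTD validation.

```python
import re

# <!ELEMENT name (first+, middle?, last)>
NAME_MODEL = re.compile(r"(first )+(middle )?last ")

# <!ELEMENT location (address | (latitude, longitude))*>
LOCATION_MODEL = re.compile(r"(address |latitude longitude )*")


def matches(model, child_names):
    """True if the ordered child names satisfy the content model regex."""
    sequence = "".join(name + " " for name in child_names)
    return model.fullmatch(sequence) is not None


print(matches(NAME_MODEL, ["first", "first", "last"]))   # two first names
print(matches(NAME_MODEL, ["first", "last"]))            # middle left out
print(matches(NAME_MODEL, ["middle", "last"]))           # first is required
print(matches(LOCATION_MODEL, []))                       # * allows nothing
print(matches(LOCATION_MODEL, ["address", "latitude", "longitude"]))
```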
Mixed content
 Elements with mixed content may contain any combination of elements and PCDATA. For example, consider the declaration:
<!ELEMENT myMessage ( #PCDATA | message )*>
 A myMessage element could, for instance, contain two message elements interleaved with three runs of character data.
 Because of the *, a myMessage element could also be empty.
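As a minimal sketch of the myMessage declaration, the following document holds two message children and three runs of character data. The text content and the declaration for message are illustrative assumptions, not taken from the slides.

```xml
<?xml version="1.0"?>
<!DOCTYPE myMessage [
  <!-- Mixed content: #PCDATA comes first, then the allowed children, then * -->
  <!ELEMENT myMessage (#PCDATA | message)*>
  <!ELEMENT message (#PCDATA)>
]>
<myMessage>Here is a little text, <message>Welcome!</message> a little more text, <message>Goodbye.</message> and some closing text.</myMessage>
```

Under the same DTD, an empty <myMessage/> element would also be valid.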
Attribute Declarations
 Attribute declarations list the allowable attributes for each element.
 These lists are called ATTLIST declarations, and look like this:
<!ELEMENT contacts (contact*)>
<!ATTLIST contacts source CDATA #IMPLIED>
 This particular ATTLIST declares only one attribute, source, for the <contacts> element.
 An ATTLIST declaration consists of three basic parts:
➤ The ATTLIST keyword
➤ The associated element’s name
➤ The list of declared attributes
Cont.
 Each attribute in the list is declared with three pieces of information:
➤ The attribute name
➤ The attribute type
➤ The attribute value declaration
 In the example above, the attribute name is source.
 The source attribute can contain character data; the CDATA keyword gives the attribute’s type.
 The #IMPLIED keyword indicates that the attribute has no default value and does not need to appear within the element. This is called the value declaration; it controls how the XML parser handles the attribute’s value.
Attribute Names
 Attribute names must follow the XML naming rules, and you must also ensure that you don’t have duplicate names within the attribute list for a given element.
 To declare an attribute name, simply type the name exactly as it appears in the XML document, including any namespace prefix.
Attribute Types
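 The attribute type specifies how the processor should handle the data that appears in the value.
 The different attribute types can be summarized as follows:
➤ CDATA : character data (any text)
➤ ID : a name that is unique within the document
➤ IDREF : a reference to an ID value declared elsewhere in the document
➤ IDREFS : a whitespace-separated list of IDREF values
➤ ENTITY : the name of an unparsed entity declared in the DTD
➤ ENTITIES : a whitespace-separated list of ENTITY values
➤ NMTOKEN : a name token (letters, digits, and a few punctuation characters, with no whitespace)
➤ NMTOKENS : a whitespace-separated list of NMTOKEN values
➤ Enumerated list : one value from a list of alternatives declared in the DTD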
Attribute Value Declarations
 Within each attribute declaration you must
specify how the value will appear in the
document.
 The XML Recommendation allows you to
specify that the attribute either:
➤ Has a default value
➤ Has a fixed value
➤ Is required
➤ Is implied (or is optional)
Default Values
 By specifying a default value, you can be sure that the attribute is included in the final output.
 A validating parser automatically inserts the attribute with the default value if the attribute has been omitted.
 Example: the kind attribute of the <phone> element should have one of several alternative values, so its type is the enumerated list:
 (Home | Work | Cell | Fax)
 Given this in the DTD, one possible valid form of the element with the attribute would be:
 <phone kind="Work">
 You can specify the default value by simply including it in quotation marks after the attribute type:
 <!ATTLIST phone kind (Home | Work | Cell | Fax) "Home">
 Here, the default value is Home.
 If the kind attribute has been omitted, the parser automatically inserts the attribute kind with the value Home.
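The default-value behavior can be sketched in a small standalone document. The simplified contact content model and the phone numbers here are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE contact [
  <!ELEMENT contact (phone+)>
  <!ELEMENT phone (#PCDATA)>
  <!-- Enumerated type with "Home" as the default value -->
  <!ATTLIST phone kind (Home | Work | Cell | Fax) "Home">
]>
<contact>
  <!-- kind is given explicitly -->
  <phone kind="Work">001-234-567-8910</phone>
  <!-- kind is omitted, so a parser that reads the DTD reports kind="Home" -->
  <phone>321 321 3213</phone>
</contact>
```

A value outside the enumerated list, such as kind="Pager", would make the document invalid.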
Fixed Values
 When an attribute’s value can never change, you use the #FIXED keyword followed by the fixed value.
 If the fixed attribute is encountered, the parser checks whether the fixed value and the attribute value match.
 If they do not match, the parser raises an error.
 A common use of fixed attributes is specifying version numbers.
 <!ATTLIST contacts version CDATA #FIXED "1.0">
 So this would be valid:
 <contacts version="1.0">
 But this would not:
 <contacts version="1.1">
Required Values
 When you specify that an attribute is required, it
must be included within the XML document.
 Suppose you require the kind attribute:
<!ATTLIST phone kind (Home | Work | Cell | Fax) #REQUIRED>
 In the preceding example, the declaration indicates that the kind attribute must appear within every <phone> element in the document.
Implied Values
 When declaring an attribute, you must always specify a value declaration. If the attribute you are declaring has no default value, has no fixed value, and is not required, then you must declare that the attribute is implied, using the #IMPLIED keyword.
Specifying Multiple Attributes
 The ATTLIST declaration enables you to declare more than one attribute, like so:
<!ATTLIST contacts version CDATA #FIXED "1.0"
                   source CDATA #IMPLIED>
 An alternative is to use multiple ATTLIST declarations, each declaring one attribute at a time:
<!ATTLIST contacts version CDATA #FIXED "1.0">
<!ATTLIST contacts source CDATA #IMPLIED>
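The fixed, implied, and required value declarations can be combined in one document, as in this sketch. The reduced content models and the attribute values are illustrative assumptions.

```xml
<?xml version="1.0"?>
<!DOCTYPE contacts [
  <!ELEMENT contacts (contact*)>
  <!ELEMENT contact (phone)>
  <!ELEMENT phone (#PCDATA)>
  <!-- Two attributes declared in a single ATTLIST -->
  <!ATTLIST contacts version CDATA #FIXED "1.0"
                     source  CDATA #IMPLIED>
  <!-- kind must be written on every phone element -->
  <!ATTLIST phone kind (Home | Work | Cell | Fax) #REQUIRED>
]>
<!-- version is omitted here; the parser supplies the fixed value 1.0 -->
<contacts source="Beginning XML contact list">
  <contact>
    <phone kind="Home">001-234-567-8910</phone>
  </contact>
</contacts>
```

Dropping the kind attribute from <phone>, or writing version="1.1" on <contacts>, would each produce a validity error.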
Example DTD
01 <?xml version='1.0' encoding='ISO-8859-1'?>
02 <!ELEMENT order (item)+>
03 <!ELEMENT item (name,price)>
04 <!ATTLIST item code NMTOKEN #REQUIRED>
05 <!ELEMENT name (#PCDATA)>
06 <!ELEMENT price (#PCDATA)>
07 <!ATTLIST price currency NMTOKEN 'USD'>
 Text declaration: line 01
 Element declarations: lines 02, 03, 05, and 06
 Element content models: (item)+, (name,price), and (#PCDATA)
 Attribute list declarations: lines 04 and 07
 Attribute value type: NMTOKEN on lines 04 and 07
 Attribute default value: #REQUIRED on line 04 and 'USD' on line 07
Entity Declarations
 Entities are document pieces, or “storage units”.
 Entities are variables used to define shortcuts to standard text or special characters.
 They simplify the writing of documents and DTD grammars.
 They modularize documents and DTD grammars.
 You learned that five entities built into XML enable you to include characters that have special meaning in XML documents, such as &apos;
 You can also use character references to include characters that are difficult to type, such as the copyright symbol (©).
Entity types
➤ Built-in entities
➤ Character entities
➤ General entities
➤ Parameter entities
Built-in entities
➤ &amp; : the & character
➤ &lt; : the < character
➤ &gt; : the > character
➤ &apos; : the ' character
➤ &quot; : the " character
 For example:
 <description>Author &amp; programmer</description> (legal)
 <contacts version=&quot;1.0&quot;> (illegal: an entity reference cannot replace the quotation marks that delimit an attribute value)
Character Entities
 Character entities (character references) are typically used when unusual characters are needed in a document.
 A character reference begins with an ampersand and a number sign (&#) and ends with a semicolon (;).
 For example, the copyright symbol (©) is:
 &#169;
 You can also refer to a character by using its hexadecimal Unicode value, prefixed with x:
 &#x00A9;
 Here the hexadecimal value 00A9 is used in place of the decimal value 169.
 The Unicode code charts are available at www.unicode.org/charts/.
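Built-in entities and character references need no DTD, so they can be sketched in a plain well-formed document. The element name, attribute, and text below are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- &quot; is allowed inside an attribute value; only the delimiting quotes must be literal -->
<description title="&quot;Beginning XML&quot; notes">
  Author &amp; programmer. 1 &lt; 2 and 3 &gt; 2.
  &#169; 2012 (decimal) and &#x00A9; 2012 (hexadecimal) both yield the copyright sign.
</description>
```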
General Entities
 XML developers use general entities to create
reusable sections of replacement text.
 Instead of representing only a single character,
general entities can represent characters, paragraphs,
and even entire documents.
 There are two ways to declare general entities:
 Internal Entity Declaration
 External Entity Declaration
An Internal Entity Declaration
 Syntax:
<!ENTITY entity-name "entity-value">
 DTD example:
<!ENTITY writer "Donald Duck.">
<!ENTITY copyright "Copyright W3Schools.">
 XML example:
<author>&writer;&copyright;</author>
 Note: An entity reference has three parts: an ampersand (&), an entity name, and a semicolon (;).
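Putting the declarations and the references together gives a complete document, sketched here. The author element declaration and the space between the two references are additions for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE author [
  <!ELEMENT author (#PCDATA)>
  <!-- Internal general entities: the replacement text is given inline -->
  <!ENTITY writer "Donald Duck.">
  <!ENTITY copyright "Copyright W3Schools.">
]>
<!-- A parser expands the references to: Donald Duck. Copyright W3Schools. -->
<author>&writer; &copyright;</author>
```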
An External Entity Declaration
 Syntax:
<!ENTITY entity-name SYSTEM "URI/URL">
 DTD example:
<!ENTITY writer SYSTEM "http://www.w3schools.com/entities.dtd">
<!ENTITY copyright SYSTEM "http://www.w3schools.com/entities.dtd">
 XML example:
<author>&writer;&copyright;</author>
Parameter Entities
 Parameter entities, much like general entities, enable you to create reusable sections of replacement text.
 You can refer to parameter entities only within the DTD.
 Parameter entity declaration:
<!ENTITY % DefaultPhoneKind "Home">
 The percent sign (%) before the name of the entity indicates this is a parameter entity.
Cont.
 Parameter entities can also refer to external files using a system or public identifier:
<!ENTITY % NameDeclarations SYSTEM "name.dtd">
 Or:
<!ENTITY % NameDeclarations PUBLIC "-//Beginning XML//DTD External module//EN" "name.dtd">
 Instead of redeclaring the <name>, <first>, <middle>, and <last> elements in the DTD for the contacts list, you could refer to the name.dtd from earlier in the chapter.
References to Parameter Entities
 To refer to a parameter entity, instead of using an ampersand (&) you must use a percent sign (%), as shown in the following example:
%NameDeclarations;
 Example: suppose you wanted to make use of the DefaultPhoneKind parameter entity within the ATTLIST declaration for the phone element.
<!ENTITY % DefaultPhoneKind "&#34;Home&#34;">
<!ATTLIST phone kind (Home | Work | Cell | Fax) %DefaultPhoneKind;>
 The &#34; character references supply the quotation marks that the default value needs once the entity is expanded.
Cont.
 Declaration
 <!ENTITY % boolean '(true|false)'>
 <!ENTITY % html SYSTEM 'html.dtd'>
 Reference in DTD
 <!ATTLIST person cool %boolean; #IMPLIED>
 %html;
Specifying DTD in Document
 Doctype declaration
 Must appear before the root element
 May contain declarations internal to document
 May reference declarations external to document
 Internal subset
 Commonly used to declare general entities
 Overrides declarations in external subset
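A document type declaration can combine an external subset with an internal one, as in the sketch below. It assumes a file named contacts.dtd that holds the contacts declarations from the earlier "Example, DTD" slide; the publisher entity and its text are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE contacts SYSTEM "contacts.dtd" [
  <!-- Internal subset: it is read before the external subset, so this
       declaration takes precedence if contacts.dtd also declares publisher -->
  <!ENTITY publisher "Example Publishing">
]>
<contacts>
  <contact>
    <name><first>John</first><last>Doe</last></name>
    <location><address>Address is not known</address></location>
    <phone>321 321 3213</phone>
    <knows>Joseph Fawcett</knows>
    <description>Listed in a book from &publisher;.</description>
  </contact>
</contacts>
```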
examples
Mixed content: example
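<!ELEMENT note (#PCDATA | to | from | header | message)*>
This example declares that the element note may contain parsed character data interleaved with any number of to, from, header, and message child elements, in any order.
A DTD cannot constrain the order or number of child elements in mixed content: #PCDATA must come first in the choice group, and the group must end with *. A sequence such as (to+,from,header,message*,#PCDATA) is therefore not a legal content model.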
Internal DTD example
<!DOCTYPE company [
<!ELEMENT company ((person | product)*)>
<!ELEMENT person (ssn, name, office, phone?)>
<!ELEMENT ssn (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT office (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT product (pid, name, description?)>
<!ELEMENT pid (#PCDATA)>
<!ELEMENT description (#PCDATA)>
]>
<company>
<person> <ssn> 12345678 </ssn>
<name> John </name>
<office> B432 </office>
<phone> 1234 </phone>
</person>
<product> … </product>
</company>
External DTD example
<!DOCTYPE company SYSTEM "company.dtd">
<company>
<person> <ssn> 12345678 </ssn>
<name> John </name>
<office> B432 </office>
<phone> 1234 </phone>
</person>
<product> … </product>
</company>
 The DTD can reside at a different URL, in which case we refer to it as:
<!DOCTYPE company SYSTEM "http://www.mydtd.com/company.dtd">
Cont. This is the file "company.dtd" containing the DTD:
<!ELEMENT company ((person | product)*)>
<!ELEMENT person (ssn, name, office, phone?)>
<!ELEMENT ssn (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT office (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT product (pid, name, description?)>
<!ELEMENT pid (#PCDATA)>
<!ELEMENT description (#PCDATA)>
Attribute declaration examples
Example 1:
DTD example:
<!ELEMENT square EMPTY>
<!ATTLIST square width CDATA "0">

XML example:
<square width="100"></square>
or
<square width="100"/>
Example 2:
DTD example:
<!DOCTYPE family [
<!ELEMENT family (person)*>
<!ELEMENT person (name)>
<!ELEMENT name (#PCDATA)>
<!ATTLIST person id ID #REQUIRED
                 mother IDREF #IMPLIED
                 father IDREF #IMPLIED
                 children IDREFS #IMPLIED>
]>
Cont.
XML example:
<family>
<person id="jane" mother="mary" father="john">
<name> Jane Doe </name>
</person>
<person id="john" children="jane jack">
<name> John Doe </name>
</person>
<person id="mary" children="jane jack">
<name> Mary Smith </name>
</person>
<person id="jack" mother="mary" father="john">
<name> Jack Smith </name>
</person>
</family>
DTD - Examples from the Net
 Visit the web site:
 http://www.xmlfiles.com/dtd/dtd_examples.asp
TV Schedule DTD
 By David Moisan. Copied from his web site: http://www.davidmoisan.org/
<!DOCTYPE TVSCHEDULE [
<!ELEMENT TVSCHEDULE (CHANNEL+)>
<!ELEMENT CHANNEL (BANNER, DAY+)>
<!ELEMENT BANNER (#PCDATA)>
<!ELEMENT DAY ((DATE, HOLIDAY) | (DATE, PROGRAMSLOT+))+>
<!ELEMENT HOLIDAY (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT PROGRAMSLOT (TIME, TITLE, DESCRIPTION?)>
<!ELEMENT TIME (#PCDATA)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT DESCRIPTION (#PCDATA)>
<!ATTLIST TVSCHEDULE NAME CDATA #REQUIRED>
<!ATTLIST CHANNEL CHAN CDATA #REQUIRED>
<!ATTLIST PROGRAMSLOT VTR CDATA #IMPLIED>
<!ATTLIST TITLE RATING CDATA #IMPLIED>
<!ATTLIST TITLE LANGUAGE CDATA #IMPLIED>
]>
Newspaper Article DTD
 Copied from http://www.vervet.com/
<!DOCTYPE NEWSPAPER [
<!ELEMENT NEWSPAPER (ARTICLE+)>
<!ELEMENT ARTICLE (HEADLINE, BYLINE, LEAD, BODY, NOTES)>
<!ELEMENT HEADLINE (#PCDATA)>
<!ELEMENT BYLINE (#PCDATA)>
<!ELEMENT LEAD (#PCDATA)>
<!ELEMENT BODY (#PCDATA)>
<!ELEMENT NOTES (#PCDATA)>
<!ATTLIST ARTICLE AUTHOR CDATA #REQUIRED>
<!ATTLIST ARTICLE EDITOR CDATA #IMPLIED>
<!ATTLIST ARTICLE DATE CDATA #IMPLIED>
<!ATTLIST ARTICLE EDITION CDATA #IMPLIED>
<!ENTITY NEWSPAPER "Vervet Logic Times">
<!ENTITY PUBLISHER "Vervet Logic Press">
<!ENTITY COPYRIGHT "Copyright 1998 Vervet Logic Press">
]>
Beyond DTDs…
 DTD limitations
 Simple document structures
 Lack of “real” datatypes
 Advanced schema languages
 XML Schema
 Relax NG
 …
Useful Links
 XML 1.0 Specification
 http://www.w3.org/TR/REC-xml
 Annotated XML 1.0 Specification
 http://www.xml.com/axml/testaxml.htm
 Informational web sites
 http://www.xml.com/
 http://www.xmlhack.com/
XML Schemas

Chapter 5

Dr. Khaled Alqawasmi, Zarqa University 2014


Introduction to XML Schema
 XML Schema is an XML-based alternative
to DTD.
 An XML schema describes the structure of
an XML document.
 The XML Schema language is also referred
to as XML Schema Definition (XSD).

What is it?
 A grammar definition language
 Like DTDs but better
 Uses XML syntax
 DTDs are a type of schema
 Defined by W3C

the limitations of DTD
 DTDs do not have built-in datatypes.
 DTDs do not support user-derived datatypes.
 DTDs allow only limited control over
cardinality (the number of occurrences of an
element within its parent).
 DTDs do not support Namespaces or any
simple way of reusing or importing other
schemas.

XML Schemas Use XML Syntax
 When creating an XML Schema, the syntax is
entirely in XML.
 For example:
 DTD element rule
 <!ELEMENT first (#PCDATA)>
 The same rule (approximately) is expressed in
XML Schema as:
 <element name="first" type="string"/>
XML Schema Data Types
 XML Schemas divide data types into two
broad categories:
 complex and simple
 Elements that may contain attributes or other
elements are declared using complex types.
 Attribute values and text content within
elements are declared using simple types.

Cont.
• An element of complex type can contain child elements and attributes, whereas a simple-type element can contain only text.
• Schema authors can define their own types or use the built-in types.

Cont: XML Schema Types
 Simple types
 Basic datatypes
 Can be used for attributes and element text
 Extendable
 Complex types
 Defines structure of elements
 Extendable
 Types can be named or “anonymous”
Simple Types
 DTD datatypes
 Strings, ID/IDREF, NMTOKEN, etc…
 Numbers
 Integer, long, float, double, etc…
 Other
 Binary (base64, hex)
 QName, URI, date/time
 etc…
Deriving Simple Types
 Apply facets
 Specify enumerated values
 Add restrictions to data
 Restrict lexical space
 Allowed length, pattern, etc…
 Restrict value space
 Minimum/maximum values, etc…
 Extend by list or union
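The derivation mechanisms listed above can be sketched in one small schema. The type names PhoneKind, ZipCode, PhoneKindList, and KindOrCode are hypothetical, chosen only for illustration.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Restriction by enumeration: only the listed values are allowed -->
  <xsd:simpleType name="PhoneKind">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="Home"/>
      <xsd:enumeration value="Work"/>
      <xsd:enumeration value="Cell"/>
      <xsd:enumeration value="Fax"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- Restriction of the lexical space with a pattern facet: exactly five digits -->
  <xsd:simpleType name="ZipCode">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[0-9]{5}"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- Extension by list: a whitespace-separated list of PhoneKind values -->
  <xsd:simpleType name="PhoneKindList">
    <xsd:list itemType="PhoneKind"/>
  </xsd:simpleType>
  <!-- Extension by union: a value may match either member type -->
  <xsd:simpleType name="KindOrCode">
    <xsd:union memberTypes="PhoneKind ZipCode"/>
  </xsd:simpleType>
</xsd:schema>
```

Because this sketch has no targetNamespace, the unprefixed references to PhoneKind and ZipCode resolve to the types declared in the same schema.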
A Simple Type Example
 Integer with a value in the range (1234, 5678]
<xsd:simpleType name='MyInteger'>
  <xsd:restriction base='xsd:integer'>
    <xsd:minExclusive value='1234'/>
    <xsd:maxInclusive value='5678'/>
  </xsd:restriction>
</xsd:simpleType>
 Validating an integer with a value in the range (1234, 5678]
<data xsi:type='MyInteger'></data> INVALID
<data xsi:type='MyInteger'>Andy</data> INVALID
<data xsi:type='MyInteger'>-32</data> INVALID
<data xsi:type='MyInteger'>1233</data> INVALID
<data xsi:type='MyInteger'>1234</data> INVALID
<data xsi:type='MyInteger'>1235</data> VALID
<data xsi:type='MyInteger'>5678</data> VALID
<data xsi:type='MyInteger'>5679</data> INVALID

online XML Schema validator tools
 http://www.corefiling.com/opensource/schemaValidate.html
 http://apps.gotdotnet.com/xmltools/xsdvalidator
 Or, use jEdit: www.jedit.org

example
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:target="http://www.example.com/name"
        targetNamespace="http://www.example.com/name"
        elementFormDefault="qualified">
  <element name="name">
    <complexType>
      <sequence>
        <element name="first" type="string"/>
        <element name="middle" type="string"/>
        <element name="last" type="string"/>
      </sequence>
      <attribute name="title" type="string"/>
    </complexType>
  </element>
</schema>
Cont.
 Because the <name> element contains the elements <first>, <middle>, and <last>, it must be declared as a complex type.
 The <sequence> declaration contains three <element> declarations.
 Within these declarations, you have specified that their type is string, which allows any textual content.
 Within the <complexType> definition is an <attribute> declaration.
 Because the title attribute is declared in the <complexType> declaration for the <name> element, the attribute is allowed to appear in the <name> element.

Complex Types
 Element content models
 Simple
 Mixed
 Unlike DTDs, elements in mixed content can be
ordered
 Sequences and choices
 Can contain nested sequences and choices
 All
 All elements required but order is not important
DEFINING XML SCHEMAS
 <schema> Declarations
 The <schema> element is the root element within an XML Schema, and it enables you to declare namespace information as well as defaults for declarations throughout the document, like so:
<schema targetNamespace="URI"
        attributeFormDefault="qualified or unqualified"
        elementFormDefault="qualified or unqualified"
        version="version number">

The XML Schema Namespace
 In the example, http://www.w3.org/2001/XMLSchema was declared as the default namespace within the <schema> element.
 You can also bind this namespace to a prefix; the XML Schema Recommendation itself uses the prefix xs.
 The prefix is only a shortcut to the namespace declaration.
Target Namespaces
 The purpose of XML Schemas is to declare vocabularies.
 These vocabularies can be identified by a namespace that is specified in the targetNamespace attribute.
 Not all XML Schemas will have a targetNamespace.
 Many XML Schemas define vocabularies that are reused in another XML Schema, or vocabularies that are used in documents where the namespace is not necessary.

Cont.
 Some possible targetNamespace declarations include the following:
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.example.com/name"
        xmlns:target="http://www.example.com/name">
 or
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.com/name"
           xmlns="http://www.example.com/name">

Element and Attribute Qualification
 Elements and attributes may be qualified or unqualified.
 An element or attribute is qualified if it has an associated namespace.
 For example, the following elements are qualified:
<name xmlns="http://www.example.com/name">
  <first>John</first>
  <middle>Fitzgerald</middle>
  <last>Doe</last>
</name>
 The following elements are qualified and prefixed:
<n:name xmlns:n="http://www.example.com/name">
  <n:first>John</n:first>
  <n:middle>Fitzgerald</n:middle>
  <n:last>Doe</n:last>
</n:name>
 Unqualified elements have no associated namespace. Here the <first>, <middle>, and <last> elements are unqualified:
<n:name xmlns:n="http://www.example.com/name">
  <first>John</first>
  <middle>Fitzgerald</middle>
  <last>Doe</last>
</n:name>
Cont.
 Within the <schema> element you can modify the
defaults specifying how elements should be qualified
by including the following attributes:
➤ elementFormDefault
➤ attributeFormDefault
 The elementFormDefault and attributeFormDefault
attributes enable you to control the default
qualification form for elements and attributes in the
instance documents.
 The default value for both elementFormDefault and
attributeFormDefault is unqualified.
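The effect of elementFormDefault can be sketched with a reduced version of the name schema. The two-element content model is a simplification for illustration, and the matching instance is shown inside a comment.

```xml
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.example.com/name"
        elementFormDefault="unqualified">
  <!-- Global element: always qualified with the target namespace -->
  <element name="name">
    <complexType>
      <sequence>
        <!-- Local elements: unqualified here because of elementFormDefault -->
        <element name="first" type="string"/>
        <element name="last" type="string"/>
      </sequence>
    </complexType>
  </element>
  <!-- A matching instance:
       <n:name xmlns:n="http://www.example.com/name">
         <first>John</first>
         <last>Doe</last>
       </n:name>
       With elementFormDefault="qualified", first and last would also have
       to be in the target namespace (for example n:first and n:last). -->
</schema>
```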
Content Models
 There are three ways of interpreting a list of elements:
➤ <sequence>: Elements must appear in the
given order.
➤ <choice>: Only one of the elements in the list
may appear.
➤ <all>: Elements can appear in any order, with
each child element occurring zero or one
time.
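The three content models can be sketched side by side. The contact and phoneNumbers elements below are hypothetical, and the xs prefix is used instead of a default namespace only to keep the listing short.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="contact">
    <xs:complexType>
      <!-- sequence: name first, then whichever branch of the choice is used -->
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <!-- choice: either an address or a latitude/longitude pair -->
        <xs:choice>
          <xs:element name="address" type="xs:string"/>
          <xs:sequence>
            <xs:element name="latitude" type="xs:decimal"/>
            <xs:element name="longitude" type="xs:decimal"/>
          </xs:sequence>
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="phoneNumbers">
    <xs:complexType>
      <!-- all: home and work may appear in either order, each at most once -->
      <xs:all>
        <xs:element name="home" type="xs:string" minOccurs="0"/>
        <xs:element name="work" type="xs:string" minOccurs="0"/>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
```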
<element> Declarations
 Declaring an element consists of specifying the element name and defining the allowable content, such as:
<element
  name="name of the element"
  type="global type"
  ref="global element declaration"
  form="qualified or unqualified"
  minOccurs="non-negative number"
  maxOccurs="non-negative number or 'unbounded'"
  default="default value"
  fixed="fixed value">

Element names
 Although XML names can include numerical digits, periods (.), hyphens (-), and underscores (_), they must begin with a letter or an underscore (_).
 The colon (:) is also disallowed anywhere in the name, because the declared name must not carry a namespace prefix.

Element type
 An element’s allowable content is determined
by its type.
 You can specify the type in three main ways:
 local type
 global type
 referring to a global element declaration

Global versus Local
 The difference between global and local
declarations:
➤ Global declarations are declarations that appear as
direct children of the <schema> element. Global
element declarations can be reused throughout the
XML Schema.
➤ Local declarations do not have the <schema>
element as their direct parent and can be used only in
their specific context.
Example
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:target="http://www.example.com/name"
        targetNamespace="http://www.example.com/name"
        elementFormDefault="qualified">
  <element name="name">
    <complexType>
      <sequence>
        <element name="first" type="string"/>
        <element name="middle" type="string"/>
        <element name="last" type="string"/>
      </sequence>
      <attribute name="title" type="string"/>
    </complexType>
  </element>
</schema>
 This XML Schema has four element declarations.
 The first, for the <name> element, is a global declaration because it is a direct child of the <schema> element.
 The declarations for the <first>, <middle>, and <last> elements are considered local because the declarations are not direct children of the <schema> element.
 The declarations for the <first>, <middle>, and <last> elements are valid only within the <sequence> declaration; they cannot be reused elsewhere in the XML Schema.

Creating a Local Type
 To create a local type, include the type
declaration as a child of the element
declaration, as in the following example:
<element name="name">
  <complexType>
    <sequence>
      <element name="first" type="string"/>
      <element name="middle" type="string"/>
      <element name="last" type="string"/>
    </sequence>
    <attribute name="title" type="string"/>
  </complexType>
</element>
 A local simple type is declared in the same way:
<element name="name">
  <simpleType>
    <restriction base="string">
      <enumeration value="Home"/>
      <enumeration value="Work"/>
      <enumeration value="Cell"/>
      <enumeration value="Fax"/>
    </restriction>
  </simpleType>
</element>

Creating a Global Type
 Instead of declaring duplicate local types
throughout your schema, you can create a global
type.
 Within your element declarations, you can refer
to a global type by name.
 <element name="first" type="string"/>
 This declaration refers to a built-in data type.
 You can also create your own global types and refer to them by name.
Creating Reusable Global Types
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:target="http://www.example.com/name"
        targetNamespace="http://www.example.com/name"
        elementFormDefault="qualified">
  <complexType name="NameType">
    <sequence>
      <element name="first" type="string"/>
      <element name="middle" type="string"/>
      <element name="last" type="string"/>
    </sequence>
    <attribute name="title" type="string"/>
  </complexType>
  <element name="name" type="target:NameType"/>
</schema>

Referring to Global Element
Declarations
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:target="http://www.example.com/name"
        targetNamespace="http://www.example.com/name"
        elementFormDefault="qualified">
  <element name="first" type="string"/>
  <element name="middle" type="string"/>
  <element name="last" type="string"/>
  <complexType name="NameType">
    <sequence>
      <element ref="target:first"/>
      <element ref="target:middle"/>
      <element ref="target:last"/>
    </sequence>
    <attribute name="title" type="string"/>
  </complexType>
  <element name="name" type="target:NameType"/>
</schema>

Cardinality
 Cardinality specifies the number of times a particular element appears within a content model.
 You can modify an element’s cardinality by specifying the minOccurs and maxOccurs attributes within the element declaration.
 Note: The minOccurs and maxOccurs attributes are not permitted within global element declarations.

example
1. <element name="first" type="string" minOccurs="2" maxOccurs="2"/>
2. <element ref="target:first" maxOccurs="10"/>
3. <element name="location" minOccurs="0" maxOccurs="unbounded"/>
1. <first> must appear within the instance document a minimum of two times and a maximum of two times, that is, exactly twice.
2. This declares your element using a reference to the global <first> declaration. The maxOccurs attribute is included with the value 10. The minOccurs attribute is not included, so a schema validator uses the default value, 1.
3. <location> may or may not appear within your instance document because the minOccurs attribute has the value 0. It may also appear any number of times because the value of maxOccurs is unbounded.
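These cardinality attributes only make sense on local declarations and references, as the following sketch shows. The contact element, its content model, and the string type given to location are assumptions for illustration.

```xml
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:target="http://www.example.com/name"
        targetNamespace="http://www.example.com/name"
        elementFormDefault="qualified">
  <!-- Global declaration: minOccurs and maxOccurs are not allowed here -->
  <element name="first" type="string"/>
  <element name="contact">
    <complexType>
      <sequence>
        <!-- Reference to the global declaration: 1 to 10 occurrences -->
        <element ref="target:first" maxOccurs="10"/>
        <!-- Optional: 0 or 1 occurrence -->
        <element name="middle" type="string" minOccurs="0"/>
        <!-- Neither attribute given: exactly 1 occurrence -->
        <element name="last" type="string"/>
        <!-- 0 or more occurrences -->
        <element name="location" type="string" minOccurs="0" maxOccurs="unbounded"/>
      </sequence>
    </complexType>
  </element>
</schema>
```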

Cont.
 The maxOccurs attribute also accepts the value unbounded, which indicates there is no limit to the number of occurrences.
 The only restriction when specifying minOccurs and maxOccurs is that the value of maxOccurs must be greater than or equal to the value for minOccurs.

Default and Fixed Values
 When declaring default values for elements,
you can only specify a text value.
 You are not permitted to specify a default
value for an element whose content model
will contain other elements, unless the content
model is mixed.

Cont.
 To specify a default value, include the default attribute with the desired value.
 Example:
    <element name="last" type="string" default="Doe"/>
 If the schema validator encounters
    <last></last>
  or
    <last/>
  it would treat the element as follows:
    <last>Doe</last>
 Note that if the element does not appear within the document, or if the element already has content, the default value is not used.

Cont.
 When specifying fixed or default values in element declarations, you must ensure that the value you specify is allowable content for the type you have declared.
 For example, if an element has the type positiveInteger, you cannot use Doe as a default value because it is not a positive integer.
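The rule can be sketched with a pair of declarations. The element name quantity is hypothetical; only the relationship between the type and the default value matters here.

```xml
<!-- Hypothetical element name "quantity" -->
<!-- Allowed: 1 is a valid positiveInteger -->
<element name="quantity" type="positiveInteger" default="1"/>

<!-- Not allowed: Doe is not a positiveInteger, so a schema processor
     should report an error in the schema itself -->
<!-- <element name="quantity" type="positiveInteger" default="Doe"/> -->
```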
Fixed Values
 In some circumstances you may want to ensure that an element's value does not change.
 Example:
    <element name="version" type="string" fixed="1.0"/>
 The following elements are legal:
    <version>1.0</version>
    <version></version>
    <version/>
 The following element is not legal:
    <version>2.0</version>

Element Wildcards: the <any> Declaration
 Suppose you want to specify that your element can contain any of the elements declared in your namespace, or any elements from another namespace.
 Declarations that allow you to include any element from a namespace are called element wildcards.
 To declare an element wildcard, use the <any> declaration, like so:
    <any
      minOccurs="non negative number"
      maxOccurs="non negative number or unbounded"
      namespace="allowable namespaces"
      processContents="lax or skip or strict">
Cont.
 namespace attribute
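In XML Schema 1.0 the namespace attribute of <any> accepts ##any (the default), ##other, ##targetNamespace, ##local, or a whitespace-separated list of namespace URIs. The fragment below is a minimal sketch of a wildcard in use; the type name ExtensibleType is an assumption made for illustration.

```xml
<!-- Hypothetical type name: ExtensibleType -->
<complexType name="ExtensibleType">
  <sequence>
    <element name="name" type="string"/>
    <!-- Any number of elements from namespaces other than the target namespace.
         lax: validate them only when a matching declaration can be found -->
    <any namespace="##other" processContents="lax"
         minOccurs="0" maxOccurs="unbounded"/>
  </sequence>
</complexType>
```

With processContents="skip" the wildcard content only has to be well-formed; with "strict" a declaration must be available and the content must be valid against it.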

Mixed Content
 Mixed content models enable you to include both text and element content within a single content model.
 To create a mixed content model in XML Schemas, simply include the mixed attribute with the value true in your <complexType> definition:
    <element name="description">
      <complexType mixed="true">
        <choice minOccurs="0" maxOccurs="unbounded">
          <element name="em" type="string"/>
          <element name="strong" type="string"/>
          <element name="br" type="string"/>
        </choice>
      </complexType>
    </element>

Cont.
 The preceding example declares a <description> element, which can contain any number of <em>, <strong>, and <br> elements.
 Because the complex type is declared as mixed, text can be interspersed throughout these elements.
 Example:
    <description>Joe is a developer &amp; author for Beginning XML <em>5th edition</em></description>

<group> Declarations
 XML Schemas also enable you to define reusable groups of elements.
 With a global <group> declaration you can easily reuse and combine entire content models. The general form, followed by a concrete name, is:
    <group name="name of global group">
    <group name="NameGroup">
 The basic structure of a global <group> declaration follows:
    <group name="NameGroup">
      <!-- content model goes here -->
    </group>
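A global group only becomes useful once another declaration refers to it. The sketch below shows one way this can look; AddressGroup, AddressType, street, and city are hypothetical names, and the target prefix is assumed to be bound to the schema's target namespace, as in the earlier slides.

```xml
<!-- Hypothetical group holding a reusable content model -->
<group name="AddressGroup">
  <sequence>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
  </sequence>
</group>

<!-- Hypothetical type that pulls in the whole content model by reference -->
<complexType name="AddressType">
  <group ref="target:AddressGroup"/>
</complexType>
```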
<attribute> Declarations
 Attribute declarations are very similar to element declarations.
 Attribute declarations are restricted to simple types.
 Remember that complex types are used to define types that contain attributes or elements.
 Simple types are used to restrict text-only content.
Cont.
 A basic attribute declaration looks like this:
    <attribute name="title">
      <simpleType>
        <!-- type information -->
      </simpleType>
    </attribute>
 Like elements, you can also reuse attributes by referring to global declarations.
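Reusing a global attribute declaration can be sketched as follows. The target prefix is assumed to be bound to the schema's target namespace, and the content of NameType is cut down for illustration.

```xml
<!-- Global attribute declaration: a direct child of <schema> -->
<attribute name="title" type="string"/>

<complexType name="NameType">
  <sequence>
    <element name="first" type="string"/>
    <element name="last" type="string"/>
  </sequence>
  <!-- Reuse the global declaration by reference -->
  <attribute ref="target:title"/>
</complexType>
```

One consequence worth keeping in mind: a globally declared attribute belongs to the target namespace, so instance documents have to write it with a namespace prefix.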
An XML Schema for Contacts
<?xml version="1.0"?>
<contacts
    xmlns="http://www.example.com/contacts"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.example.com/contacts contacts5.xsd"
    source="Beginning XML 5E" version="1.0">
  <contact>
    <name>
      <first>Joseph</first>
      <first>John</first>
      <last>Fawcett</last>
    </name>
    <location>
      <address>Exeter, UK</address>
      <latitude>50.7218</latitude>
      <longitude>-3.533617</longitude>
    </location>
    <phone kind="Home">001-234-567-8910</phone>
    <knows/>
    <description> Joseph is a developer and author for Beginning XML <em>5th edition</em>.<br/>Joseph <strong>loves</strong> XML!
    </description>
  </contact>
  <contact>
    <name>
      <first>Liam</first>
      <last>Quin</last>
    </name>
    <location>
      <address>Ontario, Canada</address>
    </location>
    <phone>+1 613 476 8769</phone>
    <knows/>
    <description>XML Activity Lead at W3C</description>
  </contact>
</contacts>
Cont.
 The previous example is the instance document (“contacts.xml”).
 This document is associated with the identified XML Schema (“contacts.xsd”).
 To begin to build your XML Schema, perform the following steps:

1. Start building your XML Schema at the root
    <schema xmlns="http://www.w3.org/2001/XMLSchema"
            xmlns:contacts="http://www.example.com/contacts"
            targetNamespace="http://www.example.com/contacts"
            elementFormDefault="qualified">

2. contacts root element (global)
    <element name="contacts">
      <complexType>
        <sequence>
          <element name="contact" minOccurs="0" maxOccurs="unbounded">

3. contacts root element (local)
 It is possible to use local <complexType> declarations inside of other <complexType> declarations.
    <complexType>
      <sequence>
        <element name="name" type="contacts:NameType"/>
        <element name="location" type="contacts:LocationType"/>
4. Define the <phone> element
    <element name="phone">
      <complexType>
        <simpleContent>
          <extension base="string">
            <attribute name="kind" type="string" default="Home"/>
          </extension>
        </simpleContent>
      </complexType>
    </element>
5. Declare the attribute you want to use
 The attribute declaration has a name and type just like element declarations.
 Any of the following examples are allowable <phone> elements based on the declaration:
    <phone kind="Home">001-909-555-1212</phone>
    <phone>001-909-555-1212</phone>
    <phone />
6. <knows> element and the <description> element
        <element name="knows" type="contacts:KnowsType"/>
        <element name="description" type="contacts:DescriptionType"/>
      </sequence>
    </complexType>
    </element>
7. <contacts> element attribute information
    <element name="contacts">
      <complexType>
        <sequence>
          <element name="contact" ...
          ...
        </sequence>
        <attributeGroup ref="contacts:ContactAttributes"/>
      </complexType>
    </element>
8. The <attributeGroup> here refers to a global grouping named ContactAttributes
    <attributeGroup name="ContactAttributes">
      <attribute name="version" type="string" fixed="1.0"/>
      <attribute name="source" type="string"/>
    </attributeGroup>

9. Define the content model for the global NameType using a reference to a <group>
    <group name="NameGroup">
      <sequence>
        <element name="first" type="string" minOccurs="1" maxOccurs="unbounded"/>
        <element name="middle" type="string" minOccurs="0" maxOccurs="1"/>
        <element name="last" type="string"/>
      </sequence>
    </group>
10. Next, in the LocationType <complexType> definition
    <complexType name="LocationType">
      <choice minOccurs="0" maxOccurs="unbounded">
        <element name="address" type="string"/>
        <sequence>
          <element name="latitude" type="string"/>
          <element name="longitude" type="string"/>
        </sequence>
      </choice>
    </complexType>
11. <knows> element
 The global declaration for KnowsType doesn't contain a content model.
 Because of this, the <knows> element in the instance document must be empty. The KnowsType and DescriptionType definitions look like so:
    <complexType name="KnowsType">
    </complexType>
    <complexType name="DescriptionType" mixed="true">
      <choice minOccurs="0" maxOccurs="unbounded">
        <element name="em" type="string"/>
        <element name="strong" type="string"/>
        <element name="br" type="string"/>
      </choice>
    </complexType>

12. Finally, close off your schema to finish up:
    </schema>
 After reading and following along with all the preceding steps, you now know how to develop an XML Schema.
 The following activity builds on these steps to express a list of contacts using XML Schema.

Use jEdit
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:contacts="http://www.example.com/contacts"
        targetNamespace="http://www.example.com/contacts"
        elementFormDefault="qualified">
  <element name="contacts">
    <complexType>
      <sequence>
        <element name="contact" minOccurs="0" maxOccurs="unbounded">
          <complexType>
            <sequence>
              <element name="name" type="contacts:NameType"/>
              <element name="location" type="contacts:LocationType"/>
              <element name="phone">
                <complexType>
                  <simpleContent>
                    <extension base="string">
                      <attribute name="kind" type="string" default="Home"/>
                    </extension>
                  </simpleContent>
                </complexType>
              </element>
              <element name="knows" type="contacts:KnowsType"/>
              <element name="description" type="contacts:DescriptionType"/>
            </sequence>
          </complexType>
        </element>
      </sequence>
      <attributeGroup ref="contacts:ContactAttributes"/>
    </complexType>
  </element>
  <attributeGroup name="ContactAttributes">
    <attribute name="version" type="string" fixed="1.0"/>
    <attribute name="source" type="string"/>
  </attributeGroup>
  <attribute name="title" type="string"/>
  <complexType name="NameType">
    <group ref="contacts:NameGroup"/>
  </complexType>
  <group name="NameGroup">
    <sequence>
      <element name="first" type="string" minOccurs="1" maxOccurs="unbounded"/>
      <element name="middle" type="string" minOccurs="0" maxOccurs="1"/>
      <element name="last" type="string"/>
    </sequence>
  </group>
  <complexType name="LocationType">
    <choice minOccurs="0" maxOccurs="unbounded">
      <element name="address" type="string"/>
      <sequence>
        <element name="latitude" type="string"/>
        <element name="longitude" type="string"/>
      </sequence>
    </choice>
  </complexType>
  <complexType name="KnowsType"></complexType>
  <complexType name="DescriptionType" mixed="true">
    <choice minOccurs="0" maxOccurs="unbounded">
      <element name="em" type="string"/>
      <element name="strong" type="string"/>
      <element name="br" type="string"/>
    </choice>
  </complexType>
</schema>
Data Types
 The XML Schema Recommendation allows you to use two kinds of data types:
➤ Built-in data types
➤ User-defined data types

Built-in Data Types

User-Defined Data Types
 As you are developing your XML Schemas, you will run into many elements and attribute values that require a type not defined in the XML Schema Recommendation.
 For example, you may need to create a list of allowable values as you did in your DTD.

<simpleType> Declarations
 You can create custom user-defined data types using the <simpleType> definition that follows:
    <simpleType
      name="name of the simpleType"
      final="#all or list or union or restriction">

Cont.
 When you declare a <simpleType>, you must
always base your declaration on an existing
data type.
 The existing data type may be a built-in XML
Schema data type, or it may be another
custom data type.
 <simpleType> definitions are often called
derived types.
Cont.
 There are three primary derived types:
➤ Restriction types
➤ List types
➤ Union types

<restriction> Declarations
 Restriction types are declared using the <restriction> declaration as follows:
    <restriction base="name of the simpleType you are deriving from">

simpleType Constraining Facets
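The constraining facets defined by XML Schema 1.0 are length, minLength, maxLength, pattern, enumeration, whiteSpace, minInclusive, maxInclusive, minExclusive, maxExclusive, totalDigits, and fractionDigits. Two of them are sketched below inside <restriction> declarations; the type names LatitudeType and ShortPhoneType are made up for illustration and are not part of the contacts schema.

```xml
<!-- Hypothetical type: a decimal limited to the range -90 to 90 -->
<simpleType name="LatitudeType">
  <restriction base="decimal">
    <minInclusive value="-90"/>
    <maxInclusive value="90"/>
  </restriction>
</simpleType>

<!-- Hypothetical type: three digits, a hyphen, then four digits -->
<simpleType name="ShortPhoneType">
  <restriction base="string">
    <pattern value="\d{3}-\d{4}"/>
  </restriction>
</simpleType>
```

Which facets are allowed depends on the base type: range facets such as minInclusive apply to ordered types like decimal, while length facets apply to types such as string.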

EXAMPLE
<attribute name="kind">
  <simpleType>
    <restriction base="string">
      <enumeration value="Home"/>
      <enumeration value="Work"/>
      <enumeration value="Cell"/>
      <enumeration value="Fax"/>
    </restriction>
  </simpleType>
</attribute>
<list> Declarations
 You'll often need to create a list of items.
 Using a <list> declaration, you can base your list items on a specific <simpleType>:
    <list itemType="name of simpleType used for validating items in the list">

Example
 Suppose you created a global <simpleType> called ContactTagsType:
    <simpleType name="ContactTagsType">
      <restriction base="string">
        <enumeration value="author"/>
        <enumeration value="xml"/>
        <enumeration value="poetry"/>
        <enumeration value="consultant"/>
        <enumeration value="CGI"/>
        <enumeration value="semantics"/>
        <enumeration value="animals"/>
      </restriction>
    </simpleType>
 This simple type only allows for one of the enumerated values to be used.
 If you want to allow for multiple items, you can make a type called ContactTagsListType, which allows for a list of tags using the <list> declaration, as in the following:
    <simpleType name="ContactTagsListType">
      <list itemType="contacts:ContactTagsType"/>
    </simpleType>

<union> Declarations
 Finally, when creating your derived types, you may need to combine two or more types.
 By declaring a <union>, as in the following, you can validate the values in your instance document against multiple types at once:
    <union memberTypes="whitespace separated list of types">
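A union can be sketched as follows: a value is accepted if it is valid against any one of the member types. The names UnknownString and AgeOrUnknownType are hypothetical, and the target prefix is assumed to be bound to the schema's target namespace.

```xml
<!-- Hypothetical helper type: only the word "unknown" is allowed -->
<simpleType name="UnknownString">
  <restriction base="string">
    <enumeration value="unknown"/>
  </restriction>
</simpleType>

<!-- Hypothetical union: either a non-negative integer or "unknown" -->
<simpleType name="AgeOrUnknownType">
  <union memberTypes="nonNegativeInteger target:UnknownString"/>
</simpleType>
```

An element of type AgeOrUnknownType could then contain 42 or unknown, but not -1 or n/a.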
CREATING A SCHEMA FROM
MULTIPLE DOCUMENTS
 The XML Schema Recommendation
introduces mechanisms for combining XML
Schemas and reusing definitions.
 The XML Schema Recommendation provides
two primary declarations for use with
multiple XML Schema documents:
➤ <import>
➤ <include>
<import> Declarations
 The <import> declaration allows you to import global declarations from other XML Schemas.
 Note that the <import> declaration allows you to refer to declarations only within other XML Schemas.
 This is the typical shape of an import declaration:
    <import
      namespace=""
      schemaLocation="">
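A filled-in import might look like the sketch below. The geo namespace, the file geo.xsd, and the PositionType it is assumed to define are all hypothetical; the point is that imported declarations are reached through a prefix bound to the imported namespace.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:contacts="http://www.example.com/contacts"
        xmlns:geo="http://www.example.com/geo"
        targetNamespace="http://www.example.com/contacts">
  <!-- Hypothetical namespace and file; geo.xsd is assumed to define PositionType -->
  <import namespace="http://www.example.com/geo" schemaLocation="geo.xsd"/>
  <!-- Refer to an imported global type through its prefix -->
  <element name="position" type="geo:PositionType"/>
</schema>
```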

<include> Declarations
 The <include> declaration is very similar to the <import> declaration, except that the <include> declaration allows you to combine XML Schemas that are designed for the same targetNamespace (or no targetNamespace) much more effectively.
 This is the shape of a typical <include> declaration:
    <include
      schemaLocation="">
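As a sketch, a schema split across two files with the same target namespace could be combined like this. The file name contact-types.xsd is hypothetical and is assumed to contain the NameType definition.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:contacts="http://www.example.com/contacts"
        targetNamespace="http://www.example.com/contacts">
  <!-- Hypothetical file with the same targetNamespace as this schema -->
  <include schemaLocation="contact-types.xsd"/>
  <!-- Included definitions are used as if they were written in this file -->
  <element name="name" type="contacts:NameType"/>
</schema>
```

Unlike <import>, <include> has no namespace attribute, because the included document has to share this schema's target namespace or have none.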

Chapter 7
Extracting Data from XML

Selected by Dr. Mohammed Al-Ghanim


Zarqa University, 2016/2017
DOCUMENT MODELS:
REPRESENTING XML IN MEMORY
 XML is a text-based way to represent
documents.
 Once an XML document has been read into
memory, it’s usually represented as a tree.

Models: DOM, XDM, and PSVI
 DOM: the W3C Document Object Model.
 The DOM was originally designed for handling HTML in web browsers.
 XDM: the XQuery and XPath Data Model. It is more powerful than the DOM and includes support for objects with types described by W3C XML Schema.
 It supports items such as floating-point numbers and sequences intermingled with the XML data.
 PSVI: the Post-Schema-Validation Infoset, defined by the W3C XML Schema specification.
 The PSVI is the result of validating an XML document against a W3C XML Schema and “augmenting” the XML document with type annotations; for example, saying that a <hatsize> element contains an integer.
 The term information set comes from a specification (the XML Information Set), which provides a standard set of terminology for other specifications (like XSD) to use.

A Sample DOM Tree
 There are three main reasons why it is
important to talk about the DOM:
 Some of the most widely used XPath
implementations return DOM node lists.
 jQuery and other similar libraries are built on top
of the DOM and described in terms of the DOM.
 The XDM used by XPath 2 and later is based on
the same principles as DOM.

Example
<entry id="armstrong-john">
  <title>Armstrong, John</title>
  <body>
    <p>, an English physician and poet, was born in <born>1715</born> in the
    parish of Castleton in Roxburghshire, where his father and brother were
    clergymen; and having completed his education at the University of
    Edinburgh, took his degree in physics, Feb. 4, 1732, with much
    reputation.
    ...
    </p>
  </body>
</entry>
DOM Node Types
 Document Node: The node at the very top of the tree, Document, is
special; it is not an element, it does not correspond to anything in the
XML, but represents the entire document.
 DocumentFragment Node: A DocumentFragment is a node used for
holding part of a document, such as a buffer for copy and paste. It does
not need to meet all of the well-formedness constraints — for example, it
could contain multiple top-level elements.
 Element Node: An Element node in a DOM tree represents a single
XML element and its contents.
 Attr (Attribute) Node: Attr nodes each represent a single attribute with
its name, value, and possibly, type information. They are normally only
found hidden inside element nodes, and you have to use a method such as
getAttributeNode(name) to retrieve them.

…Continue
 Text node: A text node is the textual content of an element.
 DocumentType, CDATASection, Notation, Entity, and
Comment Nodes: These are for more advanced use of the
DOM.
DOM nodes also have properties; you just saw an example of this
in Figure 7-1 — an element node has a property called
tagName. Most DOM programs will contain code that tests
the various properties of a node, especially its type (element,
attribute, text and so on) and behaves accordingly, even
though this is not usual in object-oriented design.

DOM Node Lists
 When you access parts of a DOM tree from a
program, you generally get back a node list.
As the name suggests, this is simply a list of
nodes.
 You often then use an iterator to march
through the list one item at a time.
 In particular, XPath queries return a node list.

The Limitations of DOM
1. The DOM provides a low-level application programming interface (API) for accessing a tree in memory.
 Although the DOM has some object-oriented features, it does not support the two most important features:
➤ information hiding (a concept alien to XML)
➤ implicit dispatch: the idea that the compiler or interpreter chooses the right function or “method” to call based on the type or class of a target object.
2. The DOM nodes have a lot of methods and properties, which use a lot of memory in many programming languages.

THE XPATH LANGUAGE
 The XML Path Language, XPath, is used to
point into XML documents and select parts of
them for later use.
 XPath was designed to be embedded and used
by other languages — in particular by XSLT,
XLink, and XPointer, and later by XQuery.
 XPath isn’t a complete programming
language.
XPath Basics
 The most common way to use XPath is to pass an
XPath expression and one or more XML documents
to an XPath engine: the XPath engine evaluates the
expression and gives you back the result.
 This can be via a programming language API, using
a standalone command-line program, or indirectly,
as when XPath is included in another language such
as XQuery or XSLT.

 With XPath version 1.0, the result of evaluating an XPath expression is a node list, and in practice this is often a DOM node list.
 For example, this expression gets at John Armstrong's date of birth:
    /entry/body/p/born
 If there was a whole book with lots of entries, and you just wanted the one for John Armstrong, then you would use this instead:
    /book/entry[@id = "armstrong-john"]/body/p/born
XPath Node Types and Node Tests
 So far you’ve seen XPath used only with
XML elements, attributes, and text nodes,
 but XML documents can also contain node
types such as processing instructions and
comments. In addition, XPath 2 introduced
typed nodes: all the types defined by W3C
XML Schema are available.

XPath Predicates
 The general pattern is that you can apply a predicate to any node list. The result is those nodes for which the predicate evaluated to a true, or non-zero, value. You can even combine predicates like so:
    /book/chapter[position() = 2]/p[@class = 'footnote'][position() = 1]
 This example finds all <chapter> elements in a book, then uses a predicate to pick out only those chapters that are the second child of the book (this is just the second chapter of the book). It then finds all the <p> children of that chapter, and uses a predicate to filter out all except those <p> elements that have a class attribute with the value footnote. Finally, it uses yet another predicate to choose only the first node in the list; that is, the first such <p> element.
/entries/entry[body/p/born = /entries/entry/@born]
 This expression finds all the entry elements that
contain a born element whose value is equal to the
born attribute on some (possibly different) entry
element.
 Two types:
 Positional Predicates
 The Context in Predicates

Positional Predicate
 The simplest predicate is just a number, like [17], which selects only the seventeenth node.
 You could write /book/chapter[2] to select the second chapter: it's exactly the same as writing /book/chapter[position() = 2].
 This is often called a positional predicate. It's really still a boolean (true or false) expression because [17] expands to [position() = 17].
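Since XPath is normally embedded in a host language, the sketch below places positional predicates in the select attributes of a small XSLT 1.0 stylesheet. It assumes a hypothetical input document in which each <chapter> inside <book> has a <title> child; neither the stylesheet nor the <title> element comes from the slides.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Short form: the second <chapter> child of <book> -->
    <xsl:value-of select="/book/chapter[2]/title"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Long form of the same test -->
    <xsl:value-of select="/book/chapter[position() = 2]/title"/>
    <xsl:text>&#10;</xsl:text>
    <!-- last() is the size of the node list, so this picks the final chapter -->
    <xsl:value-of select="/book/chapter[last()]/title"/>
  </xsl:template>
</xsl:stylesheet>
```

The first two instructions select the same node, so the same title is written twice.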

Context Predicate
 For example, if you're processing an SVG document you might want to know if a definition is used. In the context of a <def> element, you might look for
    //*[@use = current()/@id]
  to find every element with an attribute called use whose value is equal to the id attribute on the current <def> element.
 You couldn't just write
    //*[@use = @id]
  because that would look for every element having use and id attributes with the same value, which is not at all what you want.
Context Predicate
 Consider this expression:
    /book/entry[@id = "armstrong-john"]
 The XPath engine finds the top-level <book> element, then makes a list of all <entry> elements underneath that <book> element. Then, for each of those <entry> elements, the XPath engine evaluates the predicate and keeps only the <entry> nodes for which the predicate is true.
 You can access the initial context node with the current() function, and you can get at the current context node with a period (.). The names make more sense when XPath is used within XSLT, but for now all you need to remember is that there's always a context and you can access it explicitly if you need to.

XPath Steps and Axes
 An XPath axis is really just a direction, like up or down.
 And a step moves you along the axis you have chosen.
 The most common XPath axis is called child, and is the
direction the XPath engine goes when you use a slash.
 Formally,
 /book/chapter
 is just a shorthand for
 /child::book/child::chapter
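Other axes are written the same way, as an axis name followed by two colons. The XSLT 1.0 sketch below is an illustration only and assumes the sample <entry> document shown earlier in this chapter: it steps down with the child axis, then looks back up with the parent and ancestor axes and sideways into attributes with the attribute axis.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Unabbreviated form of /entry/body/p/born -->
    <xsl:for-each select="/child::entry/child::body/child::p/child::born">
      <!-- parent axis: one step up, to the element that encloses <born> -->
      <xsl:value-of select="name(parent::*)"/>
      <xsl:text> </xsl:text>
      <!-- ancestor axis climbs to <entry>; attribute::id is the long form of @id -->
      <xsl:value-of select="ancestor::entry/attribute::id"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```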

LISTING 7-3: extra.xml
<?xml version="1.0" encoding="utf-8"?>
<people>
  <person id="armstrong-john">
    <p>W. J. Maloney’s 1954 book
    “George and John Armstrong of Castleton,
    2 Eighteenth-Century Medical Pioneers”
    was reviewed by Dr. Jerome M Schneck.</p>
  </person>
  <person id="newton-sir-isaac">
    <p>Invented the apple pie.</p>
  </person>
</people>
Now you could use XPath to extract the names of people who are in both the biography and the extra file like so:

/entry[@id = doc("extra.xml")//person/@id]/title

Chapter 8
XSLT
What Is XSLT?
 One of the main places you find XPath is XSLT. Extensible Stylesheet Language Transformations (XSLT) is a powerful way to transform files from one format to another.
 Originally it could only operate on XML files, although the output could be any form of text file. Since version 2.0, however, it also has the capability to use any text file as an input.
 XSLT is a declarative language and uses templates to define the output that should result from processing different parts of the source files.

 XSLT is often used to transform XML to (X)HTML, either server-side or in the browser. One advantage of doing a client-side transformation is that it offloads the presentational side of the process to the application layer that deals with the display.

 Additionally it frees resources on the server
making it more responsive, and it tends to
reduce the amount of data transmitted
between the server and the browser.
 This is especially the case when the data
consists of many rows of similar data that are
to be shown in tabular form.
 HTML tables are very verbose and can easily
double or triple the amount of bandwidth
between client and server.
 XSLT differs from many mainstream programming languages such as C# or Java in two main ways.
➤ First, XSLT is a declarative language.
➤ Second, it is a functional language.
 XSLT takes this idea of letting the processor look after the low-level details one stage further.
 It is designed from the ground up as a declarative language, so you needn't concern yourself with how something is done.
 You concentrate on describing what you want done.

XSLT as a Declarative Language
For example, if you want to perform a similar operation to output all author names from an XML document containing many <author> elements, such as this one:
<authors>
  <author>
    <firstName>Danny</firstName>
    <lastName>Ayers</lastName>
  </author>
  <author>
    <firstName>Joe</firstName>
    <lastName>Fawcett</lastName>
  </author>
  <author>
    <firstName>William</firstName>
    <lastName>Shakespeare</lastName>
  </author>
</authors>

you'd use an XSLT template such as:
<xsl:template match="author">
  <xsl:value-of select="firstName"/> <xsl:value-of select="lastName"/>
</xsl:template>
 As you can see, you haven't had to declare a variable to keep track of the <author> elements or write any code that loops through them.
 You just tell the XSLT processor to output the value of the <firstName> and the <lastName> element whenever you come across an <author> element. You'll learn more about how this all works when you've dealt with another aspect of XSLT programming: the fact that it's a functional language.
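To run a template like this it has to sit inside a stylesheet. The following is a minimal sketch of a complete XSLT 1.0 stylesheet for the <authors> document; the xsl:strip-space and xsl:text instructions are additions made for this illustration so that the text output has one name per line.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Drop whitespace-only text nodes from the source (the indentation) -->
  <xsl:strip-space elements="*"/>

  <!-- Runs once for every <author> the processor encounters -->
  <xsl:template match="author">
    <xsl:value-of select="firstName"/>
    <!-- xsl:text keeps this space; bare whitespace between instructions is dropped -->
    <xsl:text> </xsl:text>
    <xsl:value-of select="lastName"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

No template matches the document root or <authors> here; the processor's built-in rules walk down the tree until they reach nodes that do have a matching template.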
FOUNDATIONAL XSLT ELEMENTS
 XSLT is based on the idea of templates. The basic concept is that you specify a number of templates that each match XML in the source document.
 When the matching XML is found, the template is activated and its contents are added to the output document.
 For example, you may have a template that matches a <Person> element. For each <Person> element encountered in the source document the corresponding template will be activated.
 Any code inside the template will be executed and added to the output.
 The code within the templates can be complex and has full access to the item that was matched and caused the template to run as well as other information about the input document.

SETTING UP YOUR XSLT DEVELOPMENT ENVIRONMENT
 Before you start to run any XSLT code, you need to set up an environment to write and process your transformations.
 The Saxon processor runs the examples in this chapter for three reasons:
➤ It's the acknowledged leader in its field with the most up-to-date implementation of XSLT.
➤ It's free to use (although commercial versions have more features).
➤ It has both a Java and a .NET version, making it suitable to run on nearly all environments.
Basic XSLT constructs that enable writing basic transformations:
 <xsl:stylesheet>: This is the all-encompassing document element used to hold all your templates. You also use it for some configuration, such as setting which version of XSLT you want to use.
 <xsl:template>: This is the bedrock of XSLT and has two main features. It details what items from the source document it should handle and uses its content to specify what should be added to the output when it is executed.
 <xsl:apply-templates>: This element is responsible for deciding which items in the source document should be processed; they are then handled by the appropriate template.
 <xsl:value-of>: This element is used to evaluate an expression and add the result to the output. For example, you may be processing a <Person> element and use <xsl:value-of> to add the contents of its <Name> element to the output.

 <xsl:for-each>: Occasionally you need to process a number of items in a similar fashion but using an <xsl:template> isn't a good option. In that case you can use this element to group the items and produce output based on each one.
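As an illustration of <xsl:for-each>, the sketch below turns the <authors> document used earlier into an HTML list. The choice of HTML output and of a "lastName, firstName" layout is an assumption made for this example.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/authors">
    <ul>
      <!-- One <li> per <author>, in document order -->
      <xsl:for-each select="author">
        <li>
          <!-- Inside the loop, paths are relative to the current <author> -->
          <xsl:value-of select="lastName"/>
          <xsl:text>, </xsl:text>
          <xsl:value-of select="firstName"/>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```

The same result could be produced with <xsl:apply-templates> and a template that matches author; <xsl:for-each> simply keeps the per-item output next to the markup that surrounds it.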

