XML Simplified
Learner's Guide
2012 Aptech Limited
All rights reserved
No part of this book may be reproduced or copied in any form or by any means graphic, electronic or
mechanical, including photocopying, recording, taping, or storing in information retrieval system or sent
or transferred without the prior written permission of copyright owner Aptech Limited.
All trademarks acknowledged.
APTECH LIMITED
Head Office:
Aptech House,
A-65, MIDC,
Andheri (East),
Mumbai - 400 093.
www.aptech-globaltraining.com
E-mail: [email protected]
Second Edition - 2012
Dear Learner,
We congratulate you on your decision to pursue an Aptech Worldwide course.
Aptech Ltd. designs its courses using a sound instructional design model from conceptualization
to execution, incorporating the following key aspects:
The detailed instructional material (training aids, learner material, reference material, project
guidelines, etc.) is then developed. Rigorous quality checks are conducted at every stage.
Assessment of learning
The learning is assessed through different modes: tests, assignments, and projects. The
assessment system is designed to evaluate the level of knowledge and skills as defined by the
learning objectives.
*TAG: The Technology & Academics Group comprises members from Aptech Ltd., professors from
reputed academic institutions, senior managers from industry, technical gurus from software
majors, and representatives from regulatory organizations and forums.
Technology heads of Aptech Ltd. meet on a monthly basis to share and evaluate the technology
trends. The group interfaces with the representatives of the TAG thrice a year to review and
validate the technology and academic directions and endeavors of Aptech Ltd.
[Figure: instructional design cycle. Stages shown: Need Analysis and design of curriculum; Design and development of instructional material; Strategies for delivery of instructions; Assessment of learning; Evaluation of Instructional Processes and Material]
Preface
In this book, XML Simplified, students will learn the fundamentals of XML. This book is an introduction
to XML that prepares students with a strong foundation in one of the key elements of Web programming,
that is, XML. The book describes and explains various features and concepts of XML.
This book is the result of a concentrated effort of the Design Team, which is continuously striving to bring
you the best and the latest in Information Technology. The process of design has been a part of the ISO
9001 certification for Aptech-IT Division, Education Support Services. As part of Aptech's quality drive,
this team does intensive research and curriculum enrichment to keep it in line with industry trends.
We will be glad to receive your suggestions. Please send us your feedback, addressed to the Design
Centre at Aptech's corporate office, Mumbai.
Design Team
Table of Contents
Module
1. Introduction to XML
2. Namespaces
3. DTDs
4. XML Schema
5. Style Sheets
6.
7. More on XSLT
Module
Concepts
Introduction to XML
Module Overview
Welcome to the module, Introduction to XML. The module describes drawbacks of earlier
markup languages that led to the development of XML. The module also explains the structure
and lifecycle of the XML document. This module covers more on the XML syntax and the various
parts of the XML document.
In this module, you will learn about:
Introduction to XML
Exploring XML
XML Syntax
Procedural Markup: Procedural markup is similar to presentation markup, but here the user can
edit the text file, and the software helps the user arrange the text. This kind of markup is
used in professional publishing organizations.
Descriptive Markup: Descriptive markup is also known as semantic markup. This kind of markup
labels the content of the document by what it is rather than by how it should look.
Two types of markup languages have been popular in recent times. They are as follows:
Generalized Markup Languages: This type of language describes the structure and meaning
of the text in a document.
Specific Markup Languages: This type of markup language is used to generate application-specific
code.
Generalized Markup Language (GML), a project by IBM, allowed documents to be edited, formatted,
and searched by different programs using its content-based tags. GML was developed to accomplish
the following:
The markup should describe only the structure of the document, not its style.
The syntax of the markup language should be strictly followed so that the code can be read
clearly by a software program or by a human being.
In 1980, an ANSI committee created Standard Generalized Markup Language (SGML), an all-encompassing
coding scheme and flexible toolkit for developing specialized markup languages. SGML is the successor
to GML. In 1986, the International Organization for Standardization (ISO) adopted it as a standard.
SGML is a meta-language, because other markup languages are created from it.
SGML has a syntax for including markup in documents and a syntax for describing which tags
are allowed in different locations. An SGML application consists of an SGML declaration and an SGML
Document Type Definition (DTD).
In 1989, Hyper Text Markup Language (HTML), a technology for sharing information by using hyperlinked
text documents, was developed. HTML was created from SGML. In the early years, it was used mainly by
scientists and technicians. HTML was originally created to mark up technical papers so that they could
be transferred across different platforms. With time, however, it began to be used for marking up
non-technical documents too. As the Internet became popular, browser manufacturers started adding
their own tags to display documents with more creativity. As the number of tags grew, this created
problems because different browsers implemented different tags.
Features
1. GML describes the document in terms of its format, structure, and other properties.
2. SGML ensures that the system can represent the data in its own way.
3. HTML uses ASCII text, which allows the user to use any text editor.
Drawbacks
1. GML and SGML were not suited for data interchange over the Web.
2. HTML carries instructions on how to display the content rather than information about the
content itself.
1.1.3 Features of XML
XML tags are not predefined. You must define your own tags.
1.1.4 XML Markup
XML markup defines the physical and logical layout of the document. XML can be considered as an
information container: it shapes, labels, structures, and protects information. XML employs
a tree-based structure to represent a document. The basic foundation of XML is laid down by symbols
embedded in the text, known as markup. The markup combines the text with extra information about the
text, such as its structure and presentation. The markup divides the information into a hierarchy of
character data and container-like elements with their attributes. Many software programs that process
electronic documents use markup.
The underlying unit of XML is a character. A combination of characters is known as an entity. These
entities are either present in the entity declaration or in a text file stored externally. All the characters are
grouped together to form an XML document.
XML's markup divides a document into separate information containers called elements. A document
consists of one outermost element, called the root element, that contains all the other elements, plus some
optional administrative information at the top, known as the XML declaration. The following code demonstrates
the elements.
Code Snippet:
<?xml version="1.0" encoding="iso-8859-1" ?>
<FlowerPlanet>
  <Name>Rose</Name>
  <Price>$1</Price>
  <Description>Red in color</Description>
  <Number>700</Number>
</FlowerPlanet>
where,
<Name>, <Price>, <Description>, and <Number> are elements nested inside the root element.
<FlowerPlanet> and </FlowerPlanet> are the start and end tags of the root element.
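The tree structure described above can be observed by handing the FlowerPlanet document to a parser. The following is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module (neither is part of this course); the encoding attribute is left out of the declaration because the text is already held in a Python string.

```python
import xml.etree.ElementTree as ET

# The FlowerPlanet document from the snippet above, held in a string.
FLOWER_XML = """<?xml version="1.0"?>
<FlowerPlanet>
  <Name>Rose</Name>
  <Price>$1</Price>
  <Description>Red in color</Description>
  <Number>700</Number>
</FlowerPlanet>"""

# fromstring() parses the text and returns the root element of the tree.
root = ET.fromstring(FLOWER_XML)

# Iterating over an element yields its child elements in document order.
child_names = [child.tag for child in root]
name_text = root.find("Name").text

print(root.tag)      # name of the root element
print(child_names)   # names of the elements it contains
```

The root element is the single entry point to the tree, and every other element is reached through it.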
The usage of XML can be observed in many real-life scenarios. It can be used in the fields of information
sharing, single application usage, content delivery, re-use of data, separation of data and presentation,
semantics, and so forth. News agencies are a common place where XML is used. News producers and
news consumers often use a standard specification named XMLNews to produce, retrieve, and relay
information across different systems in the world.
Note: XML is a subset of SGML, with the same goals, but with as much of the complexity eliminated
as possible. This means that any document which follows XML's syntax rules will also follow SGML's
syntax rules, and can therefore be read by existing SGML tools.
1.1.5
Data independence
Data independence is the essential characteristic of XML. It separates the content from its
presentation. Since an XML document describes data, it can be processed by any application.
Easier to parse
The absence of formatting instructions makes it easy to parse. This makes XML an ideal framework
for data exchange.
Easier to create
It is text-based, so it is easy to create an XML document with even the most primitive text processing
tools. However, XML also can describe images, vector graphics, animation or any other data type
to which it is extended.
e-Commerce
XML can be used as an exchange format in order to send data from one company to another.
Note: The process of reading and manipulating an XML document is called XML parsing. The parser loads the
document into memory. After the document is loaded, the data can be manipulated through the
Document Object Model (DOM).
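The load-then-manipulate sequence in the note can be sketched as follows. This assumes Python's standard xml.dom.minidom module as the DOM implementation; the Catalog and Flower element names are hypothetical and used only for illustration.

```python
from xml.dom import minidom

# Hypothetical document used only for this illustration.
doc = minidom.parseString("<Catalog><Flower>Rose</Flower></Catalog>")

# After parsing, the whole document lives in memory as a tree of DOM nodes.
catalog = doc.documentElement

# Manipulate the tree: create a new element and attach it to the root.
lily = doc.createElement("Flower")
lily.appendChild(doc.createTextNode("Lily"))
catalog.appendChild(lily)

# Read the values back and convert the tree to XML text again.
flowers = [node.firstChild.data for node in catalog.getElementsByTagName("Flower")]
serialized = catalog.toxml()
print(serialized)
```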
Even though XML has many advantages, it has a few disadvantages too. Some of the disadvantages are
as follows:
Usage of XML leads to an increase in data size and processing time. Since XML uses the Unicode
character set, it also consumes more memory.
XML lacks an adequate set of processing instructions. If a translation process is not used,
developers are forced to prepare their own processing instructions to display
XML in the required form.
Versions of Internet Explorer (IE) earlier than 5.0 do not support XML.
Knowledge Check 1
1.
Which of the following statements are true and which are false in the case of XML?
(A)
(B)
(C)
(D)
(E)
2.
Which of the statements about XML are true and which of the statements are false?
(A) XML describes its data along with its presentation.
(B) Client reduces the server load by sending large amount of information in one XML document
to the server.
(C) XML uses only XSLT to be transformed to HTML.
(D) XML can be implemented as middle-tier for client server architectures.
(E) XML allows data exchange as it has no formatting instructions.
State the functions of editors for XML and list the popularly used editors.
State the functions of parsers for XML and list names of commonly used parsers.
State the functions of browsers for XML and list the commonly used browsers.
Document Prolog: The XML parser gets information about the content of the document with the help
of the document prolog. The document prolog contains metadata and consists of two parts: the XML
Declaration and the Document Type Declaration. The XML Declaration specifies the version of XML being
used. The Document Type Declaration defines entities and attribute values, and is used to check the
grammar and vocabulary of the markup.
Root Element: The second part of the document is an element called the root element. The root element
is also called the document element. It must contain all the other elements and content in the document.
An XML element has a start tag and an end tag.
Sibling: They are elements which have the same parent element.
An XML document consists of a set of unambiguously named "entities". Every XML document starts with
a "root" or document entity. All other entities are optional. Entities are aliases for more complex functions.
A single entity name can represent a large amount of text. The alias name is used each time some text
is referenced and the processor expands the contents of the alias.
where,
The first block indicates the XML declaration and the document type declaration. Music_Library is the root
element.
1.2.4 Editors
An XML editor is used to create and edit XML documents. Because XML documents are plain text,
any text editor, such as the standard Windows Notepad or WordPad, can be used. However, Notepad
is not well suited to writing professional XML: it does not know that the text written in it is
XML code, so it cannot point out mistakes in the markup. For an XML document to be error-free, and to
get XML-specific features such as the ability to edit elements and attributes, a professional XML
editor should be used.
XMLwriter
XML Spy
XML Pro
XMLmind
XMetal
1.2.5 Parsers
An XML parser/XML processor reads the document and verifies it for its well-formedness.
An XML parser keeps the entire representation of XML data in memory with the help of Document Object
Model (DOM). The in-built parser used in IE 5.0 is also known as Microsoft XML Parser (MSXML). It is a
component, which is available once IE 5.0 is installed.
Microsoft's XML parser goes through the entire data structure and accesses the values of the attributes in
the elements. The parser can also create or delete elements and convert the tree structure into XML.
MSXML supports some of the COM objects for backward interoperability.
Some of them are XSLProcessor objects, XSLTemplate objects, XMLDOMSelection objects,
XMLSchemaCache objects, and XMLDOMDocument2 objects.
MSXML 3.0 supports XSL transformations, which contain data manipulation operations. The XML parser
ignores white space by default, but if the Boolean preserveWhiteSpace property of the
DOMDocument object is set to true, the parser preserves the white space. Using the Microsoft data type
schema, MSXML 3.0 specifies a data type for an element or attribute. MSXML also improves the
performance of applications with the help of different caching features.
XML parsing in Mozilla performs functions like going through the elements, accessing their values, and
so on. Using JavaScript, an instance of XML parser can be created in Mozilla browsers. The XML parsing
in Firefox automatically parses the data into Document Object Model (DOM). Opera uses only those XML
parsers that do not validate DTDs by default.
Speed and performance are the criteria against which XML parsers are selected.
Commonly used parsers are:
Crimson
Xerces
MSXML
Reads the document and checks for its conformity with XML standards.
Validating parser
1.2.6 Browsers
After the XML document is read, the parser passes the data structure to the client application. The
application can be a Web browser. The browser then formats the data and displays it to the user. Other
programs like database, MIDI program or a spreadsheet program may also receive the data and present
it accordingly.
The final output of XML data is viewed in a browser. XML is not supported by all browsers, for example,
Netscape Navigator 4.0 does not support XML but later versions of the browser like Netscape 6.0 do
support it. Only browsers like IE 5.0 or greater give full support for XML specifications. In IE, XML can be
directly viewed using style sheets. It gives support to namespaces and handles a mechanism known as
data islands where XML is embedded into HTML.
Mozilla 5.0 uses an interface to the XML DOM (Document Object Model) via JavaScript and plug-ins. It
also supports elements from the HTML namespace.
Commonly used Web browsers are as follows:
Netscape
Mozilla
Internet Explorer
Firefox
Opera
Knowledge Check 2
1.
Which of the statements about the structure of XML documents are true and which statements are
false?
(A) XML documents are stored with .xml extension.
(B) Document prolog can consist of version declaration, DTD comments and processing
instructions.
(C) XML declaration informs the processing agent about the version of XML being used.
(D) Root element must not be a nonempty tag.
(E) The logical structure gives information about the elements and the order in which they are
to be included in the document.
2.
Which of the characteristics are true or false accordingly when an XML document is created by an
XML editor?
(A)
(B)
(C)
(D)
(E)
3.
Which of the statements about XML browsers and parsers are true and which statements are
false?
(A)
(B)
(C)
(D)
(E)
2.
DTD or Schema
3.
The Document Type Declaration declares and defines the elements used in the document class. This is
a DTD used internally as demonstrated in the following code:
<!DOCTYPE Student [
<!ELEMENT Student (Name,Dob,BloodGroup,RollNumber)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Dob (#PCDATA)>
<!ELEMENT BloodGroup (#PCDATA)>
<!ELEMENT RollNumber (#PCDATA)>
]>
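To see how an internal DTD like this one sits in front of the root element, the sketch below places it in a complete document and reads it back. It assumes Python's standard xml.dom.minidom module, which reads the DTD but does not validate the document against it; the student values are hypothetical.

```python
from xml.dom import minidom

# The internal DTD from the text, followed by a matching Student element.
# The student values are made up for this illustration.
STUDENT_XML = """<?xml version="1.0"?>
<!DOCTYPE Student [
  <!ELEMENT Student (Name,Dob,BloodGroup,RollNumber)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Dob (#PCDATA)>
  <!ELEMENT BloodGroup (#PCDATA)>
  <!ELEMENT RollNumber (#PCDATA)>
]>
<Student>
  <Name>Asha</Name>
  <Dob>1994-05-12</Dob>
  <BloodGroup>O+</BloodGroup>
  <RollNumber>17</RollNumber>
</Student>"""

doc = minidom.parseString(STUDENT_XML)

# The name given after <!DOCTYPE is the name of the root element.
doctype_name = doc.doctype.name
root_name = doc.documentElement.tagName

# Child elements, written here in the order the content model lists them.
child_names = [n.tagName for n in doc.documentElement.childNodes
               if n.nodeType == n.ELEMENT_NODE]
print(doctype_name, child_names)
```

Because this parser is non-validating, it would also accept a Student element whose children are out of order; enforcing the content model needs a validating parser.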
Structure
It describes the form of the document by specifying the relationship between different elements
in the document. It requires a single non-empty root element that contains the other
elements and the content.
Semantic
Semantics describes how each element is interpreted by the world outside the document. For
example, an HTML-enabled Web browser assigns the meaning "paragraph" to the tags <P> and </P> but not to
the tags <Message> and </Message>.
Style
It specifies how the content of the tag or element is displayed. It indicates, for example, whether the
content is shown in bold or normal weight, in pink, or at font size 10.
Tag names may contain a combination of letters, numbers, periods (.), colons (:), underscores (_), and
hyphens (-).
Attributes are specified by a name and value pair which is delimited by equal (=) sign. The
values are delimited by quotation marks. For example,
Knowledge Check 3
1.
2.
(A)
(B)
(C)
(D)
State and describe the use of comments and processing instructions in XML.
1.4.1 Comments
XML comments give readers information about the code when the developer is not available to explain it.
They make a document more readable. Comments are not restricted to document type definitions but may
be placed anywhere in the document. Comments in XML are similar to those in HTML. Comments should
be used only when needed, as they are not processed. Comments are meant for human consumption
rather than machine consumption. Since comments are not parsed, their presence or absence does
not make any difference to the processors.
Comments are inserted into the XML document but are not part of the XML code. They can appear in the
document prolog, the DTD, or the textual content. Comments must not appear inside tags or
attribute values.
Comments start with the string <!-- and end with the string -->. The parser treats the comment
as ended when it finds the closing --> as shown in figure 1.8.
Rules
The rules that are to be followed while writing the comments are as follows:
Comments should not include the string "--" or end with "-", as this would confuse the XML parser.
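The effect of this rule can be checked with a parser. The following is a minimal sketch, assuming Python's standard xml.dom.minidom module; both sample documents are hypothetical.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Two hypothetical documents: one with a well-formed comment, one whose
# comment body contains a double hyphen.
GOOD = "<Name><!-- John is yet to pay the term fees --><First>John</First></Name>"
BAD = "<Name><!-- fees -- still unpaid --><First>John</First></Name>"

# A well-formed comment becomes a Comment node that applications may skip.
first_node = minidom.parseString(GOOD).documentElement.firstChild
comment_text = first_node.data if first_node.nodeType == first_node.COMMENT_NODE else None

# A double hyphen inside the comment body makes the document not well-formed.
try:
    minidom.parseString(BAD)
    bad_is_rejected = False
except ExpatError:
    bad_is_rejected = True

print(comment_text, bad_is_rejected)
```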
1.4.2 Processing Instructions
Processing instructions carry application-specific information. These instructions do not follow
XML rules or internal syntax. The parser passes these instructions on to the application.
The application can either use the processing instructions or pass them on to another application.
The main objective of a processing instruction is to present some special instructions to the application.
All processing instructions must begin with <? and end with ?>.
Though an XML declaration also begins with <? and ends with ?>, it is not considered a processing
instruction, because an XML declaration provides information only for the parser and not for the
application. In some cases, the application might need the information in the processing instruction only
if it displays the output to the user.
Syntax:
<?PITarget <instruction>?>
where,
PITarget is the name of the application that should receive the processing instructions.
<instruction> is the instruction for the application.
The following code demonstrates an example of processing instruction.
Code Snippet:
<Name NickName="John">
  <First>John</First>
  <!--John is yet to pay the term fees-->
  <Last>Brown</Last>
  <?feesprocessor SELECT fees FROM STUDENTFEES?>
  <Semester>Final</Semester>
</Name>
where,
feesprocessor is the name of the application that receives the processing instruction.
SELECT fees FROM STUDENTFEES is the instruction.
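How a parser hands the target and the instruction to an application can be sketched as follows. This assumes Python's standard xml.dom.minidom module standing in for the application; feesprocessor remains the hypothetical target from the snippet above, and the attribute value is quoted so that the document is well-formed.

```python
from xml.dom import minidom

# The Name document from the snippet above (comment omitted for brevity).
DOC = """<Name NickName="John">
  <First>John</First>
  <Last>Brown</Last>
  <?feesprocessor SELECT fees FROM STUDENTFEES?>
  <Semester>Final</Semester>
</Name>"""

root = minidom.parseString(DOC).documentElement

# Collect (target, data) for every processing instruction under the root.
# The parser does not interpret the data; it only passes it along.
instructions = [(n.target, n.data) for n in root.childNodes
                if n.nodeType == n.PROCESSING_INSTRUCTION_NODE]
print(instructions)
```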
CDATA
PCDATA
1.4.4 PCDATA
The data that is parsed by the parser is called parsed character data (PCDATA). The keyword #PCDATA
specifies that an element contains parsed character data. It is used in the element declaration.
A reserved character such as "<", when used in the character data of an XML document, makes the parser
interpret it as the start of a new element. As a result, the parser generates an error as shown in figure 1.9.
1.4.5 CDATA
CDATA stands for character data and may contain reserved and white space characters. The text
inside a CDATA block is not parsed by the parser, so it is commonly used for scripting code. The XML parser
ignores all tags and entity references inside CDATA blocks. The CDATA block indicates to the parser that its
content is plain text, not markup. A CDATA block always begins with the delimiter <![CDATA[ and ends with
the delimiter ]]>. Since the ending delimiter marks the end of the CDATA block, the character string ]]>
is not allowed in the middle of the CDATA block, because it would signal the end of the CDATA section.
The syntax of CDATA is as follows:
<![CDATA[ data ]]>
The following code demonstrates an example of CDATA.
Code Snippet:
<Sample>
  <![CDATA[
  <Document>
    <Name>Core XML</Name>
    <Company>Aptech</Company>
  </Document>]]>
</Sample>
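That the parser treats the CDATA content as text rather than markup can be confirmed with a short check. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module and the Sample document shown above.

```python
import xml.etree.ElementTree as ET

# The Sample document from the snippet above.
SAMPLE = """<Sample><![CDATA[
<Document>
  <Name>Core XML</Name>
  <Company>Aptech</Company>
</Document>]]></Sample>"""

sample = ET.fromstring(SAMPLE)

# Nothing inside the CDATA section became an element ...
child_count = len(sample)
# ... the markup-looking characters arrived as ordinary text instead.
text = sample.text
print(child_count)
print(text)
```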
1.4.6 Entities
An XML document is made up of units of information called entities. Entities are used to avoid
typing long pieces of text repeatedly within the document. They can be categorized into the following:
Character entities: They provide a mechanism that is used in place of a character's literal form.
For example, the symbol > is produced when &gt; is typed. Character entities can also be
written with decimal or hexadecimal values, provided that the numbers are valid Unicode
code points. An XML processing program replaces character entities with their equivalent characters.
Content entities: These entities are used to replace certain values. They are similar to text
substitution macros in programming languages such as C. A content entity has the following syntax:
<!ENTITY name "value">
Unparsed entities: These entities, when used, turn off the parsing process. They can be used to
include multimedia content in the XML document.
Every entity consists of a name and a value. The value ranges from a single character to an XML markup
file. As the XML document is parsed, the parser checks for entity references. For every entity reference,
the parser looks up the stored value and replaces the entity reference with the text or markup.
An entity reference consists of an ampersand (&), the entity name, and a semicolon (;).
All the entities must be declared before they are used in the document. An entity can be declared either
in a document prolog or in a DTD.
Some of the entities are defined in the system and are known as pre-defined entities. These entities are
described in table 1.2.
Predefined Entity    Description                          Output
&lt;                 Produces the left angle bracket      <
&gt;                 Produces the right angle bracket     >
&amp;                Produces the ampersand               &
&apos;               Produces a single quote character    '
&quot;               Produces a double quote character    "
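The five predefined entities can be seen at work in the sketch below. It assumes Python's standard xml.etree.ElementTree module and the escape() helper from xml.sax.saxutils; the Note element and its text are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Each predefined entity reference is replaced by its character on parsing.
note = ET.fromstring(
    "<Note>&lt;b&gt; Tom &amp; Jerry &apos;A&apos; &quot;B&quot;</Note>")
parsed_text = note.text

# Going the other way: escape() rewrites &, < and > so that text is safe
# to place inside an element.
escaped = escape("5 < 7 & 7 > 5")

# An unescaped < in character data is rejected by the parser.
try:
    ET.fromstring("<Note>5 < 7</Note>")
    raw_less_than_ok = True
except ET.ParseError:
    raw_less_than_ok = False

print(parsed_text)
print(escaped)
```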
General Entity
These are the entities used within the document content. General entities can be declared either
internally or externally. References to general entities start with an ampersand (&) and end
with a semicolon (;). The entity's name appears between these two characters.
Every internal general entity is defined in the DTD. It is declared with the keyword <!ENTITY. The
syntax is as follows:
<!ENTITY Name "replacement text">
where,
Name: the entity name that stands for the replacement text.
External entities refer to the storage units outside the document containing the root element. Using
external entity references, external entities can be embedded inside the document. An external
entity reference indicates the location where the parser should insert the external entity in the
document. The external entity has Uniform Resource Locator (URL) in its declaration; the URL
specified indicates the document where the text of the entity is present. The syntax is as follows:
<!ENTITY Name SYSTEM "URL">
The following code demonstrates an example of general entity.
Code Snippet:
<!DOCTYPE MusicCollection [
<!ENTITY R "Rock">
<!ENTITY S "Soft">
<!ENTITY RA "Rap">
<!ENTITY HH "Hiphop">
<!ENTITY F "Folk">
]>
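The declarations above only define the entities; they take effect when the document content refers to them. The following is a minimal sketch of that expansion, assuming Python's standard xml.etree.ElementTree module; the Genre element is hypothetical and only two of the entities are used.

```python
import xml.etree.ElementTree as ET

# Internal general entities declared in the DTD, then referenced in content.
MUSIC = """<!DOCTYPE MusicCollection [
  <!ENTITY R "Rock">
  <!ENTITY S "Soft">
]>
<MusicCollection>
  <Genre>&R;</Genre>
  <Genre>&S;</Genre>
</MusicCollection>"""

collection = ET.fromstring(MUSIC)

# The parser has already replaced each reference with its declared text.
genres = [genre.text for genre in collection.findall("Genre")]
print(genres)
```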
Parameter Entity
These entities are declared and used only in the DTD. Neither internal nor external parameter
entities should be used in the content of the XML document, as the processor does not recognise
them there. A well-formed parameter entity declaration is similar to a general entity declaration,
except that it includes the % specifier. The reference is also similar to the general entity
reference.
References to these entities are made by using the percent sign (%) and semicolon (;) as delimiters.
The following code demonstrates an example of parameter entity.
Code Snippet:
<!ENTITY % ADDRESS "text that is to be represented by an entity">
1.4.9 Attributes
Attributes are part of the elements. They provide information about the element and are embedded in
the element start-tag. An attribute consists of an attribute name and an attribute value. The name always
precedes its value, and the two are separated by an equal sign. The attribute value is enclosed in quotes,
which also delimits multiple attributes in the same element. An attribute can be of type CDATA, ENTITY,
ENUMERATION, ID, IDREF, NMTOKEN, or NOTATION.
An enumerated attribute is used when the attribute value must be one of a fixed set of legal
values. The value of an attribute of identifier type (ID) must be unique. It is used to find a particular
instance of an element. Each element can have only one attribute of type ID. IDREF is also an identifier type,
and it should point to only one element. IDREF attributes can be used to refer to an element from
other elements. An attribute of type NMTOKEN allows any combination of name token characters. The
characters can be letters, numbers, periods, dashes, colons, or underscores. An NMTOKENS type attribute
allows multiple values separated by white space. A NOTATION type attribute must refer to a notation
declared elsewhere in the DTD. The declaration can also give a list of permitted notations.
Syntax:
<elementName attName1="attValue1" attName2="attValue2"...>
The following code demonstrates an example of attributes.
Code Snippet:
<?xml version="1.0" ?>
<Player Sex="male">
<FirstName>Tom</FirstName>
<LastName>Federer</LastName>
</Player>
Attributes have some limitations, which are as follows:
Unlike child elements, they do not contain multiple values and do not describe structures.
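Reading an attribute, and the single-value limitation, can be illustrated with the Player document. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the second document with a repeated Sex attribute is hypothetical.

```python
import xml.etree.ElementTree as ET

# The Player document from the snippet above.
PLAYER = """<?xml version="1.0"?>
<Player Sex="male">
  <FirstName>Tom</FirstName>
  <LastName>Federer</LastName>
</Player>"""

player = ET.fromstring(PLAYER)

# Attributes are exposed as name/value pairs on the element.
sex = player.get("Sex")
all_attributes = dict(player.attrib)

# Repeating an attribute name in one start-tag is not well-formed, so an
# attribute cannot carry several values the way repeated child elements can.
try:
    ET.fromstring('<Player Sex="male" Sex="female"/>')
    duplicate_ok = True
except ET.ParseError:
    duplicate_ok = False

print(sex, all_attributes, duplicate_ok)
```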
Knowledge Check 4
1.
Which of the following statements are true in the case of comments and processing instructions in
XML?
(A)
(B)
(C)
(D)
(E)
2.
Which of the following statements are valid for character data in XML?
(A)
(B)
(C)
(D)
(E)
Module Summary
In this module, Introduction to XML, you learnt about:
Introduction to XML
XML was developed to overcome the drawbacks of earlier markup languages. XML consists
of a set of rules that describe the content to be displayed in the document. XML markup
holds the content in information containers called elements.
Exploring XML
An XML document is divided into two parts, namely the document prolog and the root element. An XML
editor creates the XML document and a parser validates the document.
XML Syntax
Comments are used in the document to give information about a line or block of code. The
content in an XML document is divided into markup and character data. The entities in XML are
divided into general entities and parameter entities. A DTD can be declared either internally
or externally.
Module
Concepts
Namespaces
Module Overview
Welcome to the module, Namespaces. This module introduces XML Namespaces and the reasons
for using Namespaces in XML documents. This module aims at giving a clear understanding of
Namespaces syntax.
In this module, you will learn about:
XML Namespaces
2.1.3 Namespaces
In XML, elements are distinguished by using namespaces. XML Namespaces provide a globally unique
name for an element or attribute so that they do not conflict with one another.
A namespace is a collection of names that can be used as element names or attribute names in an XML
document.
XML namespaces provide the following advantages:
Reusability
XML Namespaces enable reuse of markup by making use of the elements and attributes that were
defined earlier.
Modularity
Reusable code modules can be written, and these can be invoked for specific elements or attributes.
Elements and attributes from different modules can be integrated into a single XML document.
Universally unique element names and attributes guarantee that such modules can be invoked for
certain elements and attributes.
Extensibility
XML Namespaces provide the XML documents with the ability to embed elements and attributes
from other vocabularies like MathML, XHTML (Extensible HyperText MarkUp Language), and so
forth.
Note: A namespace is identified by a Uniform Resource Identifier (URI). For example, we can have
three elements that are all named 'batch': the first referring to a batch of students in Aptech Education
Center, the second to a batch of products, and the third to a batch of tourists. Each batch element can be
identified with a unique URI as given.
http://www.Aptech_edu.ac.batch
http://www.tea.org.batch
http://www.tourism.org.batch
Knowledge Check 1
1.
Which of these statements about XML elements and XML Namespaces are true and which statements
are false?
(A) A browser has the ability to distinguish duplicate element names in an XML document.
(B) The XML developer has to ensure the uniqueness of the element names and attributes in a
document.
(C) A namespace is a collection of names that can be used as element names or attribute
names in an XML document.
(D) In XML, elements are distinguished by using DTD.
Using prefixes in element names provides a means for document authors to prevent name collisions,
as demonstrated in the following code.
Code Snippet:
<CD:Title> Feel </CD:Title>
and
<Book:Title> Returning to Earth </Book:Title>
In the example, both CD and Book are namespace prefixes.
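The effect of the two prefixes can be sketched with a namespace-aware parser. The following Python example uses the standard xml.etree.ElementTree module. The Catalog wrapper element and both example.com URIs are assumptions made only for this illustration, since the snippet above does not show its namespace declarations.

```python
import xml.etree.ElementTree as ET

# The wrapper element and both URIs are hypothetical, chosen only for this sketch.
DOC = """
<Catalog xmlns:CD="http://example.com/cd"
         xmlns:Book="http://example.com/book">
  <CD:Title>Feel</CD:Title>
  <Book:Title>Returning to Earth</Book:Title>
</Catalog>
"""

root = ET.fromstring(DOC)

# ElementTree replaces each prefix with its URI: {namespace-uri}local-name
tags = [child.tag for child in root]
for tag in tags:
    print(tag)
```

Both elements have the local name Title, yet the parser reports two different qualified names, so they no longer collide.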
namespacePrefix
Each namespace has a prefix that is used as a reference to the namespace. Prefixes must not
begin with xmlns or xml. A namespace prefix can be any legal XML name that does not contain a
colon. A legal XML name must begin with a letter or an underscore.
elementName
It specifies the name of the element.
xmlns
The xmlns attribute is what notifies an XML processor that a namespace is being declared. xmlns
stands for XML Namespace.
URI
A Uniform Resource Identifier (URI) is a string of characters which identifies an Internet resource.
URIs include Uniform Resource Names (URNs) and Uniform Resource Locators (URLs). URLs
contain the reference for a document or an HTML page on the Web. A URN is a universally
unique name that identifies an Internet resource. Additionally, namespace URIs are case-sensitive,
which means that the following two namespaces are different: http://www.XMLNAMESPACES.com,
http://www.xmlnamespaces.com.
The purpose of namespaces, when used in an XML document, is to prevent the collision of
similar element and attribute names. A namespace in XML is simply a collection of element and
attribute names identified by a URI reference. A URI reference is nothing but a string identifier.
The URI need not point to an actual resource; for example, there may not be an actual Web site
with the name http://www.xmlnamespaces.com.
Because namespace URIs need not point to actual resources, XML namespaces treat them as plain strings.
In particular, comparisons are done character by character. According to this definition, the following
URIs are not identical even though they may point to the same document: http://www.xmlnamespaces.com,
http://xmlnamespaces.com.
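The character-by-character comparison can be observed with a small Python sketch based on the standard xml.etree.ElementTree module. The element name item and the prefix a are assumptions used only for illustration.

```python
import xml.etree.ElementTree as ET

# Two namespace URIs that differ only in letter case.
upper = ET.fromstring('<a:item xmlns:a="http://www.XMLNAMESPACES.com"/>')
lower = ET.fromstring('<a:item xmlns:a="http://www.xmlnamespaces.com"/>')

# The parser never resolves the URI; it only compares the strings.
same_name = upper.tag == lower.tag
print(upper.tag)
print(lower.tag)
print("Same qualified name:", same_name)
```

The two elements end up with different qualified names, because the URIs are different strings.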
The following code demonstrates the properties of namespace.
Code Snippet:
<Auc:Books xmlns:Auc="http://www.auction.com/books"
xmlns:B="http://www.books.com/HTML/1998/xml1">
...
<Auc:BookReview>
<B:Table>
...
The elements that are prefixed with Auc are associated with a namespace whose name is
http://www.auction.com/books, whereas those prefixed with B are associated with a namespace
whose name is http://www.books.com/HTML/1998/xml1.
Note: URIs encompass both URLs and URNs. URNs differ from URLs in that URLs describe the
physical location of a particular resource, whereas URNs define a unique, location-independent name
for a resource that maps to one or more URLs. URLs begin with an Internet service prefix such as ftp:
or http:, whereas URNs begin with the urn: prefix.
where,
prefix is used as a reference to the namespace. Prefixes must not begin with xmlns or xml.
localname is the name of an attribute.
value mentions a user defined value for an attribute.
In the following code, the type attribute is associated with the book namespace since it is preceded by
the prefix Book.
Code Snippet:
<Catalog xmlns:Book = "http://www.aptechworldwide.com">
<Book:Booklist>
<Book:Title Book:Type = "Fiction">Evening in Paris</Book:Title>
<Book:Price>$123</Book:Price>
</Book:Booklist>
</Catalog>
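As a minimal sketch of how a program sees this prefixed attribute, the following Python example parses the catalog with the standard xml.etree.ElementTree module. The variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

DOC = """
<Catalog xmlns:Book="http://www.aptechworldwide.com">
  <Book:Booklist>
    <Book:Title Book:Type="Fiction">Evening in Paris</Book:Title>
    <Book:Price>$123</Book:Price>
  </Book:Booklist>
</Catalog>
"""

# Prefix-to-URI map used only by the search path below.
NS = {"Book": "http://www.aptechworldwide.com"}

root = ET.fromstring(DOC)
title = root.find("Book:Booklist/Book:Title", NS)

# The attribute is stored under its namespace-qualified name.
book_type = title.get("{http://www.aptechworldwide.com}Type")
plain_type = title.get("Type")  # no attribute with the bare name exists
print(book_type, plain_type)
```

The attribute can be read only through its qualified name, which shows that it belongs to the namespace bound to the Book prefix.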
Note: Attributes from a particular namespace can also be added to elements from a different namespace.
The following example demonstrates this concept.
<Catalog xmlns="http://www.Aptech_edu.ac"
xmlns:Author="http://www.aptechworldwide.com">
<Book Type= "Adventure">The Last Samurai</Book>
<Book Author:Type= "Fiction">Hannibal</Book>
<Book>American Dream</Book>
</Catalog>
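The following Python sketch, using the standard xml.etree.ElementTree module, lists the attribute names that a namespace-aware parser reports for each Book element of this catalog. It is an illustration only; the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

DOC = """
<Catalog xmlns="http://www.Aptech_edu.ac"
         xmlns:Author="http://www.aptechworldwide.com">
  <Book Type="Adventure">The Last Samurai</Book>
  <Book Author:Type="Fiction">Hannibal</Book>
  <Book>American Dream</Book>
</Catalog>
"""

root = ET.fromstring(DOC)

# Every Book element is in the default namespace of Catalog.
element_names = {book.tag for book in root}

# Attribute names as reported by the parser, one list per Book element.
attribute_names = [sorted(book.attrib) for book in root]
print(element_names)
print(attribute_names)
```

The unprefixed Type attribute is reported without any namespace, while Author:Type carries the namespace bound to the Author prefix. A default namespace therefore applies to elements, not to unprefixed attributes.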
MathML Document
Mathematical Markup Language (MathML) is an XML-based markup language to represent complex
mathematical expressions. It comes in two types: presentation markup, which describes the layout
of mathematical expressions, and content markup, which describes the mathematical meaning
of the formula. For example, the expression x + 1 can be written in MathML presentation markup as
demonstrated in the following code. MathML element names are written in lowercase.
Code Snippet:
<mrow>
<mi>x</mi>
<mo>+</mo>
<mn>1</mn>
</mrow>
Syntax:
The syntax of a default namespace is given as,
<elementName xmlns='URL'>
where,
elementName specifies the name of the element on which the default namespace is declared.
URL specifies the URI that identifies the namespace; it commonly takes the form of a Web address.
The following code demonstrates the default namespace.
Code Snippet:
<Catalog xmlns="http://www.aptechworldwide.com">
<BookList>
<Title type = "Thriller">African Safari</Title>
<Price>$12</Price>
<ISBN>23345</ISBN>
</BookList>
</Catalog>
A default namespace is declared using the xmlns attribute with a URI as its value. Once this default
namespace is declared, child elements that are part of this namespace do not need a namespace prefix.
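The scope of a default namespace can be sketched in Python with the standard xml.etree.ElementTree module. The Review and Stars elements and the example.com URI are hypothetical additions, included only to show a descendant that declares a new default namespace.

```python
import xml.etree.ElementTree as ET

# Review, Stars, and their URI are hypothetical; they are not part of the catalog above.
DOC = """
<Catalog xmlns="http://www.aptechworldwide.com">
  <BookList>
    <Title type="Thriller">African Safari</Title>
    <Review xmlns="http://example.com/reviews">
      <Stars>4</Stars>
    </Review>
  </BookList>
</Catalog>
"""

root = ET.fromstring(DOC)

# iter() walks the tree in document order, starting with the root element.
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)
```

Catalog, BookList, and Title share the default namespace declared on Catalog. Review declares a new default namespace, which applies to Review itself and to its child Stars.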
Logical and consistent prefix names should be used for convenience while creating an XML
document.
Namespaces should be applied to a personal vocabulary, even when there is just one.
Namespaces should be used to separate or isolate elements that may otherwise seem similar.
When designing two vocabularies that have some elements in common, one namespace should
be used to hold the common items.
The namespace URI should change for every substantive change to the vocabulary, including the
addition or deletion of elements.
If possible, one prefix should be used for one namespace throughout all XML documents in a
system.
All namespace declarations should be made in the start tag of the document element.
Knowledge Check 2
1.
2.
Which of these statements about attributes and namespaces are true and which statements are
false?
(A) Attributes belonging to a particular element within some namespace are also a part of the
same namespace.
(B) An attribute without a prefix is in the default namespace.
(C) xmlns:localname="value" is the correct syntax for including an attribute in a
namespace.
(D) <Student:Name age = "12">Kevin</Student:Name> is the correct syntax for associating
age with the student namespace.
(E) The prefix used in an attribute is used as a reference to the namespace.
3.
Which of these statements about default namespaces are true and which statements are false?
(A) <elementName xmlns='URL'> is the correct syntax for declaring a default namespace.
(B) The descendant has the same namespace as the parent element even if it has a new
namespace definition.
(C) A default namespace is used by an element and its child elements if the element has a
namespace prefix.
(D) A default namespace applies to the element on which it was defined and all descendants of
that element.
(E) A descendant having a new namespace cannot override the namespace defined by the
parent element.
Module Summary
In this module, Namespaces, you learnt about:
XML Namespaces
Namespaces distinguish between elements and attributes with the same name from different
XML applications. It is a collection of names that can be used as element names or attribute
names in XML document. XML Namespaces provide a globally unique name for an element or
attribute to avoid name collisions.
Module 3
DTDs
Module Overview
Welcome to the module, Document Type Definitions (DTDs). This module focuses on how to
create DTDs for XML files. In this module, topics such as DOCTYPE declarations, types of DTDs,
well-formedness, and validity of XML files, declaration of elements, attributes, and entities in a
DTD are also covered.
In this module, you will learn about:
Declarations
XML Parsers
An XML parser is a software program or set of programs that parses XML documents to determine their
structure. XML parsers are also capable of validating XML files against DTDs.
Properties | DTDs | Document Type Declaration
Abbreviation
Document Type Declaration: DOCTYPE
Location
Document Type Declaration: Within the XML document's prolog
Syntax
DTDs: <!ATTLIST element-name attribute-name attribute-type defaultvalue> or ...
Document Type Declaration: <!DOCTYPE name_of_root_element [ internal DTD subset ]> or
<!DOCTYPE name_of_root_element SYSTEM "URL of the external DTD subset" >
Language Type
DTDs: Non XML
Document Type Declaration: XML
Variations
Document Type Declaration: Declaring either an internal or external DTD
Dependency
Document Type Declaration: Is useful only if it has a DTD to define
Additionally, the following example displays the same DTD shown in the previous example but in the form
of an external DTD.
Example:
File: Mobile.dtd
<!ELEMENT Mobile (Company, Model, Price, Accessories)>
<!ELEMENT Company (#PCDATA)>
<!ELEMENT Model (#PCDATA)>
<!ELEMENT Price (#PCDATA)>
<!ELEMENT Accessories (#PCDATA)>
<!ATTLIST Model Type CDATA "Camera">
<!ENTITY HP "Head Phones">
<!ENTITY CH "Charger">
<!ENTITY SK "Starter's Kit">
File: Mobile.xml
...
<!DOCTYPE Mobile SYSTEM "Mobile.dtd">
...
The DTD is declared in a separate file "Mobile.dtd". The DOCTYPE declaration in the XML file refers to
it using the SYSTEM keyword.
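As a small illustration, the DOCTYPE declaration of such a document can be inspected from Python with the standard xml.dom.minidom module. This is only a sketch: the parser records the reference to Mobile.dtd but does not load the external file or validate against it, and the one-line document is a shortened, hypothetical version of Mobile.xml.

```python
from xml.dom import minidom

# Shortened, hypothetical version of Mobile.xml with an empty root element.
DOC = '<?xml version="1.0"?><!DOCTYPE Mobile SYSTEM "Mobile.dtd"><Mobile/>'

document = minidom.parseString(DOC)
doctype = document.doctype

# name is the declared root element; systemId is the address after SYSTEM.
print("Root element:", doctype.name)
print("External DTD:", doctype.systemId)
```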
DTDs validate XML documents but themselves are non-XML documents. One of the biggest
advantages of XML is its extensibility. However, DTDs cannot boast of this extensibility.
DTDs can be either external or internal depending on their DOCTYPE declaration. In spite of such
provisions, every XML document can have only one DTD to adhere to. This limits the advantage
of having both internal and external DTDs.
DTDs do not support object-oriented features such as inheritance. This puts them behind widely
used programming languages and programmer preferences.
XML documents can use multiple namespaces. However, DTDs are not namespace-aware: a DTD
treats a prefixed name as a plain string and cannot describe elements from several namespaces
independently of the prefixes used. This defeats the very advantage of being able to declare
multiple namespaces.
DTDs support only one datatype, the text string. This is one of their biggest disadvantages,
given the number of applications XML is used for today.
Being able to define DTDs for standardizing XML documents is one of the main strengths of DTDs.
However, this strength is undermined because declarations in an internal DTD can override those in
an external DTD. Hours of effort spent creating effective DTDs could be ruined by malicious internal
DTDs. Thus, the security aspect of DTDs is doubtful.
Knowledge Check 1
1.
Which of the statements about DTDs and their purpose are true and which statements are false?
(A)
(B)
(C)
(D)
(E)
Element Declarations
Attribute Declarations
Entity Declarations
Each element declaration specifies the name of the element, and the content which that element can
contain. Each attribute declaration specifies the element that owns the attribute, the attribute name, its
type and its default value (if any). Each entity declaration specifies the name of the entity and either its
value or location of its value.
These groups of element, attribute and entity declarations that form the three blocks of a DTD respectively
can be declared in any sequence.
2.
3.
4.
5.
6.
Steps 1 to 3 deal with element declaration, steps 4 and 5 with attribute declarations and step 6 with entity
declarations. This is in clear accordance with the structure of a DTD.
Syntax:
<!ELEMENT element-name (element-content)>
Syntax:
<!DOCTYPE name_of_root_element [ internal DTD subset ]>
or
<!DOCTYPE name_of_root_element SYSTEM "URL of the external DTD subset" >
Internal DTDs
Internal DTDs are characterized by the presence of the DTD in the document type declaration itself.
The document type declaration consists of the DTD name followed by the DTD enclosed in square
brackets.
Figure 3.2 depicts the internal DTD.
External DTDs
External DTDs are characterized by the presence of the DTD's address path in the document type
declaration. The document type declaration consists of the DTD's name followed by the SYSTEM
keyword followed by the address of the DTD document.
Knowledge Check 2
1.
Which of the statements about DTD structure and DOCTYPE declarations are true and which
statements are false?
(A) DTDs are made up of three blocks of declarations and the DOCTYPE declaration.
(B) Elements, attributes and entities can be declared in any order.
(C) DOCTYPE declarations are specified in the prolog of the XML document.
(D) Internal DTDs specify the DTD within square brackets in the declaration itself.
(E) External DTDs use the keyword URL to specify the location of the DTD.
2.
3.3
In this third lesson, Valid XML Documents, you will learn to:
3.3.1
For an XML document to be processed properly, it should be well-formed. A well-formed XML document
adheres to the basic XML syntax rules. The World Wide Web Consortium (W3C), in its specifications,
states that XML documents with errors should not be processed by any program. The basic XML syntax
rules are as follows:
3.3.2
A Valid XML document is a well-formed XML document that adheres to its DTD. The validity of an XML
document is determined by checking it against its DTD. Once it has been confirmed that the components
used in the XML document adhere to the declarations in the DTD, a well-formed XML document can be
termed as a valid XML document.
Validity of XML documents plays an important role in all XML applications. Be it creating modular parts
of the code, or data interchange, or processing of transferred data, or database access, and so forth,
validity of the XML document is a recommended feature. The following code demonstrates a valid XML
document.
Code Snippet:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Mail [
<!ELEMENT Mail (To, From, Date, Time, Cc, Bcc, Subject, Message,
Signature)>
<!ELEMENT To (#PCDATA)>
<!ELEMENT From (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT Time (#PCDATA)>
<!ELEMENT Cc (#PCDATA)>
<!ELEMENT Bcc (#PCDATA)>
<!ELEMENT Subject (#PCDATA)>
<!ELEMENT Message (#PCDATA)>
<!ELEMENT Signature (#PCDATA)>
]>
<Mail>
<To> [email protected] </To>
<From> [email protected] </From>
<Date> 27th February 2007 </Date>
<Time> 11:30 am </Time>
<Cc> </Cc>
<Bcc> </Bcc>
<Subject> Meeting at Main Conference Room at 4:30pm </Subject>
<Message> Hi, Kindly request you to attend the cultural body general
meeting in the main conference room at 4:30 pm. Please be present to learn
about the new activities being planned for the employees for this year.
Yours sincerely, Bob </Message>
<Signature> </Signature>
</Mail>
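Well-formedness, the precondition for validity, can be checked with any XML parser. The following Python sketch uses the standard xml.etree.ElementTree module, which is a non-validating parser: it reports syntax errors but does not check a document against its DTD, so a validating parser is still needed to confirm validity. The function name is_well_formed and the two shortened Mail documents are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, False on a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Shortened, hypothetical Mail documents.
GOOD = "<Mail><To>Alice</To><From>Bob</From></Mail>"
BAD = "<Mail><To>Alice<From>Bob</From></Mail>"  # <To> is never closed

good_result = is_well_formed(GOOD)
bad_result = is_well_formed(BAD)
print(good_result, bad_result)
```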
3.3.3 Testing XML for Validity
Validity being a desirable trait, it is necessary to be able to validate XML documents after their creation.
Validity of XML documents can be determined by using a validating parser such as MSXML 6.0 Parser.
MSXML enables the Internet Explorer (IE) browser to validate the code. Once the code is displayed in IE,
right-click the code to display the context menu. The menu provides the option of validating the code. On
selection, IE internally validates the code against the code's DTD.
The first image of figure 3.5 displays the validation result of a valid mobile.xml file.
The second image of figure 3.5 displays the validation result after the removal of the signature element
from the document's DTD.
Knowledge Check 3
1.
(B)
<!DOCTYPE Mobile [
<!ELEMENT Mobile (Company, Model, Price, Accessories)>
<!ELEMENT Company (#PCDATA)>
<!ELEMENT Model (#PCDATA)>
<!ELEMENT Price (#PCDATA)>
<!ELEMENT Accessories (#PCDATA)>
<!ATTLIST Model Type CDATA "Camera">
<!ENTITY HP SYSTEM "hp.txt">
<!ENTITY SK SYSTEM "sk.txt">
]>
<Mobile>
<Company> Nokia </Company>
<Model Type="Camera"> 6600 </Model>
<Price> 9999 </Price>
<Accessories> &HP;, &CH; and a &SK; </Accessories>
</Mobile>
hp.txt
Head Phones
ch.txt
Charger
sk.txt
Starter's Kit
(C)
<!DOCTYPE Mobile [
<!ELEMENT Mobile (Company, Model, Price, Accessories)>
<!ELEMENT Company (#PCDATA)>
<!ELEMENT Model (#PCDATA)>
<!ELEMENT Price (#PCDATA)>
<!ELEMENT Accessories (#PCDATA)>
<!ATTLIST Model Type CDATA "Camera">
<!ENTITY HP SYSTEM "hp.txt">
<!ENTITY CH SYSTEM "ch.txt">
<!ENTITY SK SYSTEM "sk.txt">
]>
<Mobile>
<Company> Nokia </Company>
<Model Type="Camera"> 6600 </Model>
<Price> 9999 </Price>
<Accessories> &HP;, &CH; and a &SK; </Accessories>
</Mobile>
hp.txt
Head Phones
ch.txt
Charger
sk.txt
Starter's Kit
(D)
<!DOCTYPE Mobile [
<!ELEMENT Mobile (Company, Model, Price, Accessories)>
<!ELEMENT Company (#PCDATA)>
<!ELEMENT Model (#PCDATA)>
<!ELEMENT Price (#PCDATA)>
<!ELEMENT Accessories (#PCDATA)>
<!ATTLIST Model Type CDATA "Camera">
<!ENTITY HP SYSTEM "hp.txt">
<!ENTITY CH SYSTEM "ch.txt">
<!ENTITY SK SYSTEM "sk.txt">
]>
<Mobile>
<Company> Nokia </Company>
<Model Type="Camera"> 6600 </Model>
<Price> 9999
<Accessories> &HP;, &CH; and a &SK; </Accessories>
</Mobile>
hp.txt
Head Phones
ch.txt
Charger
sk.txt
Starter's Kit
3.4 Declarations
Value: Empty
Description: These elements are empty and accept no data.
Syntax: <!ELEMENT element-name EMPTY>
Example: <!ELEMENT Signature EMPTY>

Value: Only Parsed Character Data
Description: Elements contain character data that needs to be parsed.
Syntax: <!ELEMENT element-name (#PCDATA)>
Example: <!ELEMENT Signature (#PCDATA)>

Value: Any Contents
Description: Elements can contain any combination of data that can be parsed.
Syntax: <!ELEMENT element-name ANY>
Example: <!ELEMENT Signature ANY>

Value: Children
Description: Elements with one or more children are defined with the name of the children elements
inside parentheses followed by their declarations in the same order.
Syntax: <!ELEMENT element-name (child-element-name)>
or
<!ELEMENT element-name (child-element-name, child-element-name, ...)>
Example: <!ELEMENT Mail (To, From, Date, Time, Cc, Bcc, Subject, Message, Signature)>
<!ELEMENT To (#PCDATA)>
Description: Elements with children declared only once within the parentheses implicitly appear
only once in the XML document.
Syntax: <!ELEMENT element-name (child-name)>
Example: <!ELEMENT Mail (To, From, Date, Time, Cc, Bcc, Subject, Message, Signature)>

Value: Minimum One Occurrence
Syntax: <!ELEMENT element-name (child-name+)>
Example: <!ELEMENT Mail (To+, From+, Date+, Time+, Cc, Bcc, Subject+, Message+, Signature)>

Value: Zero or More Occurrences
Description: Children whose names are followed by a '*' sign inside the parentheses may appear
any number of times, or not at all, in the XML document.
Syntax: <!ELEMENT element-name (child-name*)>
Example: <!ELEMENT Mail (To*, From+, Date+, Time+, Cc, Bcc, Subject+, Message+, Signature)>

Value: Zero or One Occurrences
Description: Children whose names are followed by a '?' sign inside the parentheses can be
skipped or appear only once in the XML document.
Syntax: <!ELEMENT element-name (child-name?)>
Example: <!ELEMENT Mail (To*, From+, Date+, Time+, Cc, Bcc, Subject+, Message+, Signature?)>

Value: Either/Or Content
Description: Elements can have exactly one of two or more children by specifying them in
parentheses separated by '|'.
Syntax: <!ELEMENT element-name (child-name, (child-name|child-name))>
Example: <!ELEMENT Mail (To, From, Date, Time, (Cc|Bcc), Subject, Message, Signature)>

Value: Mixed Content
Description: Elements can be declared to accept mixed content, that is, character data
interleaved with child elements. In a mixed content declaration, #PCDATA comes first and the
group ends with '*'.
Syntax: <!ELEMENT element-name (#PCDATA|child-name|...)*>
Example: <!ELEMENT Mail (#PCDATA|To|From|Date|Time|Cc|Bcc|Subject|Message|Signature)*>
Table 3.3: Other Element-Rule Options
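The occurrence signs '+', '*', and '?' behave much like the quantifiers of a regular expression. The following Python sketch illustrates this analogy for the content model (To*, From+, Date+, Time+, Cc, Bcc, Subject+, Message+, Signature?). It is not how a validating parser is implemented; the pattern and the function matches_model are assumptions written only for this illustration.

```python
import re

# Each child name is followed by one space so that names cannot run together.
MAIL_MODEL = re.compile(
    r"(To )*(From )+(Date )+(Time )+(Cc )(Bcc )(Subject )+(Message )+(Signature )?"
)


def matches_model(children):
    """Return True if the sequence of child element names fits the content model."""
    sequence = "".join(name + " " for name in children)
    return MAIL_MODEL.fullmatch(sequence) is not None


# To and Signature are optional, so they may be left out.
minimal = ["From", "Date", "Time", "Cc", "Bcc", "Subject", "Message"]
# To may repeat because of '*'.
repeated = ["To", "To", "From", "Date", "Time", "Cc", "Bcc", "Subject", "Message", "Signature"]
# From is required at least once because of '+'.
missing_from = ["To", "Date", "Time", "Cc", "Bcc", "Subject", "Message"]

print(matches_model(minimal), matches_model(repeated), matches_model(missing_from))
```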
Value
PCDATA
CDATA
(en1|en2|..)
ID
IDREF
IDREFS
NMTOKEN
NMTOKENS
ENTITY
ENTITIES
NOTATION
xml:
3.4.3
The attributevalue in a DTD declaration can have values as shown in table 3.5.
Value: Description
value: Default value
#REQUIRED: Value must be included
#IMPLIED: Value does not have to be included
#FIXED: Value is fixed
en1|en2|...: Listed enumerated values
#IMPLIED
<!ATTLIST element-name attribute-name attribute-type #IMPLIED>
#REQUIRED
<!ATTLIST element-name attribute-name attribute-type #REQUIRED>
#FIXED
<!ATTLIST element-name attribute-name attribute-type #FIXED "value">
Default Value
<!ATTLIST Model Type CDATA "Camera">
#IMPLIED
<!ATTLIST Model Type CDATA #IMPLIED>
#REQUIRED
<!ATTLIST Model Type CDATA #REQUIRED>
#FIXED
<!ATTLIST Model Type CDATA #FIXED "Camera">
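The effect of a default value can be sketched in Python. The standard xml.etree.ElementTree module uses the Expat parser, which reads attribute defaults declared in an internal DTD and supplies them when the attribute is omitted, although it does not validate the document. The second Model element and its values are hypothetical additions for this illustration.

```python
import xml.etree.ElementTree as ET

# The second Model element (5310, Music) is hypothetical.
DOC = """<!DOCTYPE Mobile [
  <!ATTLIST Model Type CDATA "Camera">
]>
<Mobile>
  <Model>6600</Model>
  <Model Type="Music">5310</Model>
</Mobile>"""

root = ET.fromstring(DOC)

# The first Model omits Type, so the parser supplies the declared default.
types = [model.get("Type") for model in root.findall("Model")]
print(types)
```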
Character
Mixed Content
Unparsed
Character entities are entities that have a single character as their replacement value. Character entities
are further classified as:
Pre-defined
Numbered
Name
XML has a set of pre-defined entities to facilitate the use of characters like <, >, &, ", ', since these are
used by XML in its tags.
The Pre-defined entities and their references are shown in table 3.6.
Entity Name: Value
amp: &
apos: '
gt: >
lt: <
quot: "
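As a brief illustration, the pre-defined entities can be produced and resolved from Python with the standard xml.sax.saxutils and xml.etree.ElementTree modules. The sample text and the Note element are assumptions used only for this sketch.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'Price < 10 & "new"'

# escape() always handles &, < and >; quotes are added through the extra map.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)

# A parser turns the entity references back into the original characters.
note = ET.fromstring("<Note>" + escaped + "</Note>")
print(note.text)
```

Escaping the text first keeps characters such as < and & from being mistaken for markup when the text is placed inside an element.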
Mixed content entities are entities that contain either text or markup as their replacement value.
For example, the entity test could have "trial" or "<trial>Hi!</trial>" as its replacement value.
In the second case, the entity reference is replaced with the text "<trial>Hi!</trial>", which then
acts as an element named trial.
Unparsed entities are entities that have data that is not to be parsed. The replacement value might be an
image or a file or something else. The syntax for an unparsed entity is as follows:
<!DOCTYPE name [
<!ENTITY entity-name SYSTEM "entity-location" NDATA notation-identifier>
]>
The keyword NDATA stands for notation data and notation identifier specifies the format of the file or
object.
At the time of processing, the XML parser goes through the document declarations. The parser then
scans the document for all entity references. When the parser encounters an entity reference, it replaces
the entity reference with the entity value associated with that entity. The parser then resumes parsing
from the start of the replacement text, so entity references within the replacement text are replaced in turn.
Syntax:
Entity declaration:
<!ENTITY entity-name "entity-value">
Entity Reference:
&entity-name;
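The replacement of entity references can be observed with a short Python sketch that uses the standard xml.etree.ElementTree module. The document is a shortened version of the Mobile example, and the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE Mobile [
  <!ENTITY HP "Head Phones">
  <!ENTITY CH "Charger">
  <!ENTITY SK "Starter's Kit">
]>
<Mobile><Accessories>&HP;, &CH; and a &SK;</Accessories></Mobile>"""

root = ET.fromstring(DOC)
accessories = root.find("Accessories").text
print(accessories)

# Referencing an entity that was never declared is an error.
BROKEN = DOC.replace('<!ENTITY CH "Charger">', "")
try:
    ET.fromstring(BROKEN)
    undeclared_rejected = False
except ET.ParseError as error:
    undeclared_rejected = True
    print("Parser error:", error)
```

The parser substitutes each declared entity with its value, and it rejects the document as soon as a reference has no matching declaration.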
Knowledge Check 4
1.
Can you match the element declaration descriptions with their correct syntaxes?
Description
(A) Element can contain character data that has to be parsed, in case it contains entity
references.
(B) Element with a number of children
appearing only once in the XML
document.
(C) Element can accept either data type or
children, and so forth.
Syntax
(1) <!ELEMENT element-name
(#PCDATA|child-name|...)*>
(2) <!ELEMENT element-name
(child-name+)>
Can you match the attribute descriptions with their correct values?
Description
(A) Name of a notation
(B) Predefined xml value
(C) id of another element
(D) valid XML name
(E) enumerated list
Value
(1) NMTOKEN
(2) NOTATION
(3) (en1|en2|..)
(4) xml:
(5) IDREF
Which of the following XML code samples is correct?
(A)
<!DOCTYPE Mobile [
<!ELEMENT Mobile (Company, Model, Price, Accessories)>
<!ELEMENT Company (#PCDATA)>
<!ELEMENT Model (#PCDATA)>
<!ELEMENT Price (#PCDATA)>
<!ELEMENT Accessories (#PCDATA)>
<!ATTLIST Model Type CDATA "Camera">
<!ENTITY HP "Head Phones">
<!ENTITY SK "Starters Kit">
]>
<Mobile>
<Company> Nokia </Company>
<Model Type="Camera"> 6600 </Model>
<Price> 9999 </Price>
<Accessories> &HP;, &CH; and a &SK; </Accessories>
</Mobile>
(B)
<!DOCTYPE Mobile [
<!ELEMENT Mobile (Company, Model, Price, Accessories)>
<!ELEMENT Company (#PCDATA)>
<!ELEMENT Model (#PCDATA)>
<!ELEMENT Price (#PCDATA)>
<!ELEMENT Accessories (#PCDATA)>
<!ATTLIST Model Type CDATA "Camera">
<!ENTITY HP "Head Phones">
<!ENTITY CH "Charger">
<!ENTITY SK "Starters Kit">
]>
<Mobile>
<Company> Nokia </Company>
<Model Type="Camera"> 6600 </Model>
<Price> 9999 </Price>
<Accessories> HP, CH and a SK </Accessories>
</Mobile>
3.
(C)
<!DOCTYPE Mobile [
<!ELEMENT Mobile (Company, Model, Price, Accessories)>
<!ELEMENT Company (#PCDATA)>
<!ELEMENT Model (#PCDATA)>
<!ELEMENT Price (#PCDATA)>
<!ELEMENT Accessories (#PCDATA)>
<!ATTLIST Model Type CDATA "Camera">
<!ENTITY HP "hp.txt">
<!ENTITY CH "ch.txt">
<!ENTITY SK "sk.txt">
]>
<Mobile>
<Company> Nokia </Company>
<Model Type="Camera"> 6600 </Model>
<Price> 9999 </Price>
<Accessories> &HP;, &CH; and a &SK; </accessories>
</Mobile>
(D)
<!DOCTYPE Mobile [
<!ELEMENT Mobile (Company, Model, Price, Accessories)>
<!ELEMENT Company (#PCDATA)>
<!ELEMENT Model (#PCDATA)>
<!ELEMENT Price (#PCDATA)>
<!ELEMENT Accessories (#PCDATA)>
<!ATTLIST Model Type CDATA "Camera">
<!ENTITY HP "Head Phones">
<!ENTITY CH "Charger">
<!ENTITY SK "Starters Kit">
]>
<Mobile>
<Company> Nokia </Company>
<Model Type="Camera"> 6600 </Model>
<Price> 9999 </Price>
<Accessories> &HP;, &CH; and a &SK; </Accessories>
</Mobile>
Module Summary
In this module, Document Type Definitions, you learnt about:
Declarations
This lesson deals with declarations of elements, attributes, and entities in DTDs.
Module 4
XML Schema
Module Overview
Welcome to the module, XML Schema. This module focuses on exploring XML schema and
its features. The structure of an XML document with the help of schema is also discussed. This
module aims at providing a clear understanding of the different elements and data types supported
by an XML schema.
In this module, you will learn about:
XML Schema
4.1.1 Schema
DTDs define document structure and validate XML documents, but have some limitations. Hence, an
XML-based alternative to DTDs, known as XML schema has been introduced, with an objective to
overcome the drawbacks of DTDs.
The word schema originated from a Greek word symbolizing form or shape. The dictionary meaning of
schema is: "A diagrammatic representation; an outline or a model". Initially, the word schema was used
mainly by philosophers, until it was adopted in computer science. In the context of software,
a schema is generally understood to be a model used to describe the structure of a database. It defines
internal structures such as tables, fields, and the relationship between them.
However, in the context of XML, as defined by the W3C, a schema is "a set of rules to constrain the
structure and articulate the information set of XML documents". A schema describes a model for a whole
class of documents. The model describes the way in which the data is marked up, and also specifies the
possible arrangement of tags and text in a valid document. A schema might be considered as a common
vocabulary that is needed to exchange documents between different organizations.
An XML Schema defines the valid building blocks of an XML document. The XML Schema
language is referred to as XML Schema Definition (XSD).
Figure 4.1 depicts the XML data validation.
The following XML code demonstrates an entity 'BOOK'. When this document is accessed through a
browser, it will represent the details of a book.
Code Snippet:
<Book>
<Title>Path To Paradise</Title>
<Author>David White</Author>
<Theme>Philosophy</Theme>
<Publisher>ABC Publication</Publisher>
<ISBN>11</ISBN>
<Price>$12481</Price>
<Edition>June 2000</Edition>
</Book>
A reader would naturally compare the attributes of this 'BOOK' with the attributes of a book in general.
In other words, one's previous knowledge of what a book is, and what its attributes are, acts as a kind of
schema against which this 'BOOK' is compared. This comparison helps to validate the attributes of the book.
Schemas overcome the limitations of DTDs and allow Web applications to exchange XML data more
robustly, without relying on ad hoc validation tools.
XML syntax is used as the basis for creating XML schema documents. It does not require learning
a new cryptic language as is the case with DTDs.
XML schemas can be manipulated just like any other XML document.
XML File
The given XML document contains a single element, <Message>. A schema for this document has to
declare the <Message> element.
XSD File
Schema documents are saved with ".xsd" as the file extension. Schema documents
are XML documents and can have DTDs and DOCTYPE declarations.
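Because a schema is itself an XML document, it can be read with an ordinary XML parser. The following Python sketch uses the standard xml.etree.ElementTree module to list the elements that a schema declares. The schema text is a hypothetical, minimal declaration of the <Message> element written for this illustration, and the sketch only reads the schema; it does not validate any document against it.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal schema that declares a single Message element.
XSD = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Message" type="xs:string"/>
</xs:schema>"""

# Namespace of the XML Schema vocabulary, in ElementTree's {uri} notation.
XS = "{http://www.w3.org/2001/XMLSchema}"

schema = ET.fromstring(XSD)
declared = [
    (element.get("name"), element.get("type"))
    for element in schema.iter(XS + "element")
]
print(declared)
```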
4.1.4 Features of Schema
XML schemas allow Web applications to exchange XML data more robustly using a range of new features.
They are:
Richer Datatypes
The Schema draft defines booleans, dates and times, URIs, time intervals, and also numeric types
like decimals, integers, bytes, longs, and many more.
Archetypes
An archetype is used to define the custom-named datatype from pre-existing data types. For
example, a 'ContactList' datatype is defined, and then two elements, 'FriendsList' and 'OfficialList'
are defined under that type.
Attribute Grouping
It is possible to have common attributes that apply to all elements, or several attributes that include
graphic or table elements. Attribute grouping allows the schema author to make this relationship
between elements explicit. Parameter entity supports grouping in DTDs, which simplifies the
process of authoring a DTD, but the information is not passed on to the processor.
Refinable Archetypes
A DTD follows a 'closed' type of content model. A content model is the constraint on the
content of elements in an instance XML document. The 'closed' content model describes all, and
only those elements and attributes that may appear in the content of the element. XML Schema
allows two more possibilities: 'open' and 'refinable'.
In an 'open' content model, elements other than the required elements can also be present: child
elements and attributes that are not declared in the document's schema may be included within an
element. DTDs support only closed content models, which require every element and attribute to be
declared before it can be used in a document. In a 'refinable' content model, additional elements may
also be present, but the schema should define those additional elements.
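In the final XML Schema recommendation, an open content model is usually expressed with the xs:any wildcard. The following is a sketch under that assumption; the Book and Title names are illustrative.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Book">
    <xs:complexType>
      <xs:sequence>
        <!-- Required, declared child element -->
        <xs:element name="Title" type="xs:string"/>
        <!-- Wildcard: further elements are accepted after Title; with
             processContents="lax", elements that have no declaration
             are not validated -->
        <xs:any minOccurs="0" maxOccurs="unbounded" processContents="lax"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```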
4.1.5 Comparing DTDs with Schemas
XML inherited the concept of DTDs from Standard Generalized Markup Language (SGML), which is
an international standard for markup languages. DTDs are used to define content models, nesting of
elements in a valid order, and provide limited support for data types and attributes. The drawbacks of
using DTDs are:
The following code demonstrates a sample external DTD file: program.dtd
Code Snippet:
<!ELEMENT program (comments, code)>
<!ELEMENT comments (#PCDATA)>
<!ELEMENT code (#PCDATA)>
The following code demonstrates a sample XML File with a reference to dtd: program.xml
Code Snippet:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE program SYSTEM "program.dtd">
<program>
  <comments>
    This is a simple Java Program. It will display the message
    "Hello world!" on execution.
  </comments>
  <code>
    public static void main(String[] args) {
      System.out.println("Hello World!"); // Display the string.
    }
  </code>
</program>
Additionally, examples are shown here to demonstrate how DTDs and schemas are referenced.
Simple XML Document: Book.xml
The following code demonstrates a simple XML document called "Book.xml":
Code Snippet:
<?xml version="1.0"?>
<Book>
<Title> Million Seconds </Title>
<Author> Kelvin Brown </Author>
<Chapter> The plot of the story starts from here. </Chapter>
</Book>
The following code demonstrates a DTD file called "Book.dtd" that defines the elements of the
Book.xml document.
Code Snippet:
<!ELEMENT Book (Title, Author, Chapter)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT Chapter (#PCDATA)>
The first line of code defines the Book element to be the root element which in turn encloses three child
elements: 'Title', 'Author', and 'Chapter'. The rest of the block defines the 'Title', 'Author', and 'Chapter'
elements to be of type "#PCDATA".
XML Schema for Book.xml
The following code demonstrates the corresponding XML Schema file, "Book.xsd", which defines the
elements of the XML document Book.xml.
Code Snippet:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.booksworld.com"
    xmlns="http://www.booksworld.com"
    elementFormDefault="qualified">
  <xs:element name="Book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Title" type="xs:string"/>
        <xs:element name="Author" type="xs:string"/>
        <xs:element name="Chapter" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
The Book element is a complex type because it encloses other elements. The child elements Title,
Author, Chapter are simple types because they do not contain other elements.
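To show how each kind of definition is referenced, the Book.xml document could be written in the two ways sketched below. The first version points to Book.dtd through a DOCTYPE declaration. The second points to Book.xsd through the xsi:schemaLocation attribute and assumes the target namespace http://www.booksworld.com used in Book.xsd.

```xml
<?xml version="1.0"?>
<!-- Book.xml validated against Book.dtd -->
<!DOCTYPE Book SYSTEM "Book.dtd">
<Book>
  <Title>Million Seconds</Title>
  <Author>Kelvin Brown</Author>
  <Chapter>The plot of the story starts from here.</Chapter>
</Book>
```

```xml
<?xml version="1.0"?>
<!-- Book.xml validated against Book.xsd -->
<Book xmlns="http://www.booksworld.com"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.booksworld.com Book.xsd">
  <Title>Million Seconds</Title>
  <Author>Kelvin Brown</Author>
  <Chapter>The plot of the story starts from here.</Chapter>
</Book>
```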
4.1.6 Advantages of XML Schemas over DTD
Schemas overcome the limitations of DTDs and allow Web applications to exchange XML data more
robustly, without relying on ad hoc validation tools.
The XML schema offers a range of new features:
Archetypes
An archetype allows a custom named datatype to be defined from pre-existing data types. For example,
one can define a 'ContactList' datatype, and then define two elements, 'FriendsList' and
'OfficialList', under that type.
Attribute grouping
There can be common attributes that apply to all elements, or several attributes that apply to graphic
or table elements. Attribute grouping allows the schema author to make this relationship explicit.
Refinable archetypes
A DTD follows a 'closed' type of model. It describes all, and only those elements and attributes
that may appear in the content of the element. XML Schema allows two more possibilities: 'open'
and 'refinable'. In an 'open' content model, elements other than the required elements can also be
present. Additional elements may be present in a refinable content model, but the schema should
define those additional elements.
Code Snippet:
The following code demonstrates a sample schema File: mail.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.abc.com"
    xmlns="http://www.abc.com"
    elementFormDefault="qualified">
  <xs:element name="mail">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="header" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
The following code demonstrates a sample XML File with a reference to schema: mail.xml
<?xml version="1.0"?>
<mail
    xmlns="http://www.abc.com"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.abc.com mail.xsd">
  <to>John</to>
  <from>Jordan</from>
  <header>Scheduler</header>
  <body>3rd March Monday, 7:30 PM: board meeting!</body>
</mail>
Knowledge Check 1
1.
Which of these statements about schemas are true and which statements are false?
(A) An XML Schema defines the structure of an XML document.
(B) An XML Schema is an XML-based add-on to DTDs.
(C) XML syntax is used as the basis to create a schema, so it can be stored with the same
extension .xml (dot XML).
(D) An XML Schema defines how many child elements can appear in an XML document.
(E) An XML Schema defines whether an element is empty or can include text.
2.
Can you match the different features for DTD and schemas against their corresponding
description?
Description
(A) Allows a custom named data type to be defined from pre-existing data types.
(B) Allows the schema author to group the common attributes that apply to all elements, or several
attributes that apply to graphic or table elements.
(C) Describes elements which are not required to be present in the XML document.
(D) Describes only those elements and attributes that may appear in the content of the element.
(E) Allows documents that use markup from multiple namespaces to be validated.
Feature
(1) Attribute grouping
(2) Namespace support
(3) Closed model
(4) Archetypes
(5) Open model
In XML schema, the data types can be broadly classified into built-in and derived types.
Built-in data types are available to all XML Schema authors, and should be implemented by a
conforming processor.
User-derived data types are defined in individual schema instances, and are particular to that
schema (although it is possible to import these definitions into other definitions).
string
A group of characters is called a string. The string data type can contain characters, line feeds,
carriage returns, and tab characters. The string may also consist of a combination of Unicode
characters. Unicode is a universal standard to describe all possible characters of all languages with
a library of symbols with one unique number for each symbol.
Syntax:
<xs:element name="element_name" type="xs:string"/>
Code Snippet:
The string declaration in the schema is:
<xs:element name="Customer" type="xs:string"/>
An element in the xml document will be:
<Customer>John Smith</Customer>
boolean
The boolean data type is used to represent a logical (true or false) value. Legal values for the boolean
data type are true and false. The value true can also be written as 1, and false can also be written
as 0.
Syntax:
<xs:attribute name="attribute_name" type="xs:boolean"/>
Code Snippet:
The boolean declaration in the schema is:
<xs:attribute name="Disabled" type="xs:boolean"/>
An element in the xml document will be:
<Status Disabled="true">OFF</Status>
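An attribute declaration has to sit inside a complex type. One way the Status element and its Disabled attribute could be declared together is sketched below; treating the element's text as xs:string is an assumption.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Status">
    <xs:complexType>
      <xs:simpleContent>
        <!-- The text content is a string; the attribute is a boolean -->
        <xs:extension base="xs:string">
          <xs:attribute name="Disabled" type="xs:boolean"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With this declaration, both <Status Disabled="true">OFF</Status> and <Status Disabled="1">OFF</Status> are accepted.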
numeric
Numeric data types represent numerical values such as whole numbers and real numbers. XML Schema
has no built-in type that is literally named numeric; numeric values are declared with built-in types
such as xs:decimal, xs:integer, xs:float, and xs:double.
Syntax:
<xs:element name="element_name" type="xs:decimal"/>
Code Snippet:
The numeric declaration in the schema is:
<xs:element name="Price" type="xs:decimal"/>
An element in the XML document will be:
<Price>500</Price>
dateTime
It represents a particular time on a given date, written as a string. For example,
"2001-05-10T12:35:40" can be considered as a dateTime string. The date and time are specified in the
following form "YYYY-MM-DDThh:mm:ss":
where,
YYYY indicates the year,
MM indicates the month,
DD indicates the day,
T indicates the start of the required time,
hh indicates the hours,
mm indicates the minutes,
ss indicates the seconds.
Syntax:
<xs:element name="element_name" type="xs:dateTime"/>
Code Snippet:
The dateTime declaration in the schema is:
<xs:element name="BeginAt" type="xs:dateTime"/>
An element in the XML document will be:
<BeginAt>2001-05-10T12:35:40</BeginAt>
binary
The binary type can include graphic files, executable programs, or any other string of binary data.
Binary data types are used to express binary-formatted data of two types such as base64Binary
(Base64-encoded binary data) and hexBinary (hexadecimal-encoded binary data).
Syntax:
<xs:element name="image_name" type="xs:hexBinary"/>
Code Snippet:
<xs:element name="Logo" type="xs:hexBinary"/>
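The two binary encodings can be compared with a small sketch. The Data wrapper and the Thumbnail element are hypothetical; both sample values in the comments encode the same five bytes, the ASCII text "Hello".

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Data">
    <xs:complexType>
      <xs:sequence>
        <!-- Valid instance: <Logo>48656C6C6F</Logo> (two hex digits per byte) -->
        <xs:element name="Logo" type="xs:hexBinary"/>
        <!-- Valid instance: <Thumbnail>SGVsbG8=</Thumbnail> (Base64 text) -->
        <xs:element name="Thumbnail" type="xs:base64Binary"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```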
anyURI
A Uniform Resource Identifier (URI) identifies a resource, for example by giving the name or location of a file.
Syntax:
<xs:attribute name="image_name" type="xs:anyURI"/>
Code Snippet:
The anyURI declaration in the schema is:
<xs:attribute name="flower" type="xs:anyURI"/>
integer
The integer data type is derived from the decimal data type and is used to specify a numeric value
without a fractional component. It includes both positive and negative numbers. For example, the
numbers -2, -1, 0, 1, 2 are integers.
Syntax:
<xs:element name="element_name" type="xs:integer"/>
Code Snippet:
The integer declaration in the schema is:
<xs:element name="Age" type="xs:integer"/>
An element in the xml document will be:
<Age>999</Age>
decimal
The decimal data type is used to specify a numeric value that can have an exact fractional part, such
as 3.26. It is a primitive built-in type, and integer is derived from it.
Syntax:
<xs:element name="element_name" type="xs:decimal"/>
Code Snippet:
The decimal declaration in the schema is:
<xs:element name="Weight" type="xs:decimal"/>
An element in the XML document will be:
<Weight>+70.7860</Weight>
time
The time data type is used to specify a time of day. It is a primitive built-in type and is not derived
from dateTime. The time is specified in the form "hh:mm:ss", where hh stands for the hour, mm
indicates the minute, and ss indicates the second; for example, 16:35:26. Fractional seconds, as in
09:30:10.5, are also allowed.
Syntax:
<xs:element name="element_name" type="xs:time"/>
Code Snippet:
The time declaration in the schema is:
<xs:element name="BeginAt" type="xs:time"/>
An element in the XML document will be:
<BeginAt>09:30:10.5</BeginAt>
4.2.3 Schema Vocabulary
Creating a schema using XML schema vocabulary is like creating any other XML document using a
specialized vocabulary. To understand the XML Schema vocabulary and the elements, the example
discusses an XML document that will be validated against a schema.
Figure 4.5 depicts schema declaration.
elementFormDefault="qualified"
It indicates that elements used by the XML instance document which were declared in this schema need
to be qualified by the namespace.
Figure 4.6 illustrates the use of the schema declaration.
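A schema declaration of the kind discussed here might look like the following sketch. It reuses the http://www.abc.com namespace from the mail.xsd example; the single element declaration inside it is only a placeholder.

```xml
<?xml version="1.0"?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.abc.com"
    xmlns="http://www.abc.com"
    elementFormDefault="qualified">
  <!-- xmlns:xs: the schema vocabulary (xs:element, xs:string, and so on)
       comes from the XML Schema namespace -->
  <!-- targetNamespace: the elements declared here belong to http://www.abc.com -->
  <!-- xmlns: unprefixed names used inside this schema refer to http://www.abc.com -->
  <!-- elementFormDefault="qualified": locally declared elements must be
       namespace-qualified in instance documents -->
  <xs:element name="mail" type="xs:string"/>
</xs:schema>
```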
Knowledge Check 2
1.
Can you match the XML data against their corresponding data type?
Description
(A) <prize disabled="true">999</prize>
(B) <img src="http://www.abc.com/images/flowers.gif"/>
(C) <start>09:30:10.5</start>
(D) <start>2002-09-24</start>
(E) <prize>+999.5450</prize>
Data Type
(1) dateTime
(2) boolean
(3) decimal
(4) time
(5) anyURI
2.
The given is an instance of XML document referencing an XML schema. Can you arrange the XML
file in its proper order using the schema vocabulary?
(1)
(2)
(3)
(4)
(5)
<header>Scheduler</header>
<body>3rd March Monday, 7:30 PM: board meeting!</body>
<to>John</to>
<from>Jordan</from>
</mail>
<?xml version="1.0"?>
<mail xmlns="http://www.abc.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.abc.com mail.xsd">
Empty elements
Empty elements optionally specify attribute types, but do not permit content, as shown in the
following example.
Code Snippet:
<xs:element name="Books">
  <xs:complexType>
    <xs:attribute name="BookCode" type="xs:positiveInteger"/>
  </xs:complexType>
</xs:element>
Only Elements
These elements contain only child elements and no character data. The following example also
declares no attributes.
Code Snippet:
<xs:element name="Books">
<xs:complexType>
<xs:sequence>
<xs:element name="ISBN" type="xs:string"/>
<xs:element name="Price" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
Only Text
These elements can only contain text and may optionally have attributes, as shown in
the following example.
Code Snippet:
<xs:complexType name="Books">
  <xs:simpleContent>
    <xs:extension base="xs:string">
      <xs:attribute name="BookCode" type="xs:positiveInteger"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>
Mixed
These are elements that can contain text content as well as sub-elements within the element as
shown in the following example. They may or may not have attributes.
Code Snippet:
<xs:element name="Books">
<xs:complexType mixed="true">
<xs:sequence>
<xs:element name="BookName" type="xs:string"/>
<xs:element name="ISBN" type="xs:positiveInteger"/>
<xs:element name="Price" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
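An instance document that would fit this mixed declaration is sketched below. The book name, ISBN, and price values are invented for illustration; the child elements still appear in the order given by the sequence.

```xml
<?xml version="1.0"?>
<!-- Text and child elements are interleaved inside <Books> -->
<Books>
  The book <BookName>Million Seconds</BookName> has the ISBN
  <ISBN>1234567890</ISBN> and costs <Price>20 dollars</Price>.
</Books>
```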
4.3.3 minOccurs and maxOccurs
When working with DTDs, the markers *, ?, and + were used to indicate the number of times a particular
child element could be used as content for an element. Similarly, XML schema allows specifying a minimum
and maximum number of times an element can occur. In schemas, element declarations (and the grouping
constructs that contain them) use the following attributes. Attribute declarations do not take them;
they use the use attribute instead.
minOccurs
minOccurs specifies the minimum number of occurrences of the element in an XML document.
The default value for the minOccurs attribute is 1. If an element has a minOccurs value of 0, it
is optional. If minOccurs is set to 1 or more, the element is required.
maxOccurs
maxOccurs specifies the maximum number of occurrences of the element in an XML document.
The default value for the maxOccurs attribute is 1. If its value is set to unbounded, the
element can appear an unlimited number of times.
The following code demonstrates the use of minOccurs and maxOccurs attributes.
Code Snippet:
The example demonstrates that the Quantity element can occur a minimum of zero times and a
maximum of hundred times in the Books element.
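A declaration matching this description might look like the following sketch. Only the Books and Quantity names and the 0 to 100 range come from the text; the Name element and the xs:positiveInteger type are assumptions.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Books">
    <xs:complexType>
      <xs:sequence>
        <!-- Name is a hypothetical sibling; it occurs exactly once by default -->
        <xs:element name="Name" type="xs:string"/>
        <!-- Quantity may be absent or may appear up to 100 times -->
        <xs:element name="Quantity" type="xs:positiveInteger"
                    minOccurs="0" maxOccurs="100"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```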
The relationship between the minOccurs and maxOccurs attributes is displayed in table 4.1.
minOccurs       maxOccurs
0               1
1               1
0               *
1               *
>0              *
>maxOccurs      >0
Any value       <minOccurs
Books.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Books" type="BookType"/>
  <xs:complexType name="AuthorType">
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="PublisherType">
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="BookType">
    <xs:sequence>
      <xs:element name="Title" type="xs:string"/>
      <xs:element name="Author" type="AuthorType"
                  maxOccurs="unbounded"/>
      <xs:element name="Publisher" type="PublisherType"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
In the XML document the Author and the Publisher elements each contain Name elements. So built-in
data types like xs:string cannot be used for them; instead, AuthorType and PublisherType are defined
using top level xs:complexType elements.
xs:all
This grouping construct requires that each element in the group must occur at most once, but that
order is not important. The only caution is that in this type of grouping the minOccurs attribute
can be 0 or 1 and the maxOccurs attribute has to be 1. The following code demonstrates this
concept.
Code Snippet:
<xs:element name="Books">
  <xs:complexType>
    <xs:all>
      <xs:element name="Name" type="xs:string" minOccurs="1" maxOccurs="1"/>
      <xs:element name="ISBN" type="xs:string" minOccurs="1" maxOccurs="1"/>
      <xs:element name="items" type="Items" minOccurs="1"/>
    </xs:all>
  </xs:complexType>
</xs:element>
xs:choice
The choice element is the opposite of the all element. Instead of requiring all the elements to be
present, it allows only one of the choices to appear. The choice element provides an XML
representation for describing a selection from a set of element types. The choice element itself
can have minOccurs and maxOccurs attributes that establish exactly how many selections may
be made from the choice. The following code demonstrates this concept.
Code Snippet:
<xs:complexType name="AddressInfo">
  <xs:choice>
    <xs:element name="USAddress" type="USAddress"/>
    <xs:element name="UKAddress" type="UKAddress"/>
    <xs:element name="FranceAddress" type="FranceAddress"/>
  </xs:choice>
</xs:complexType>
In the code snippet, there are three child elements that are mutually exclusive. With the xs:choice
element declared, only one element among the choices can be a child element of a parent element of
type AddressInfo. The three elements have distinct names because a single content model cannot
declare elements with the same name but different types. The types USAddress, UKAddress, and
FranceAddress are assumed to be defined elsewhere in the schema.
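The effect of placing minOccurs and maxOccurs on the choice itself can be sketched as follows. The ContactDetails, Phone, and Email names are hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="ContactDetails">
    <xs:complexType>
      <!-- Between one and three selections; each selection is either
           a Phone element or an Email element -->
      <xs:choice minOccurs="1" maxOccurs="3">
        <xs:element name="Phone" type="xs:string"/>
        <xs:element name="Email" type="xs:string"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Under this sketch, a ContactDetails element holding two Phone elements and one Email element is valid, while one holding four children is not.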
xs:sequence
An xs:sequence element specifies each member of the sequence to appear in the same order
in the instance document as mentioned in the xs:sequence element. The number of times each
element is allowed to appear can be controlled by the element's minOccurs and maxOccurs
attributes. The following code demonstrates this concept.
Code Snippet:
<xs:element name="Books">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
      <xs:element name="ISBN" type="xs:string"/>
      <xs:element name="Price" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Knowledge Check 3
1.
Which of these statements about complex types are true and which statements are false?
(A) The order and the number of elements that appears in the mixed content cannot be specified
in the schema.
(B) If the value of maxOccurs attribute is kept unbounded, it means that the element can
appear unlimited number of times.
(C) Elements with complex type may contain nested elements and have attributes.
(D) The default value for the minOccurs attribute is 0.
(E) When a minOccurs attribute is used, there cannot be a maxOccurs attribute in the same
line.
2.
Which of these statements about element and mixed content are true and which statements are
false?
(A) Mixed content means that an element whose structure is the complex type can contain
elements with attributes.
(B) Element content means a complex type element that contains only elements.
(C) The order and the number of elements appearing in the mixed content cannot be specified
in schemas.
(D) Element content cannot have attributes.
3.
Which of these statements about grouping are true and which statements are false?
(A) The sequence element provides an XML representation for describing a selection from a
set of element types.
(B) The all element requires that each element in the group must occur at most once.
(C) For each element type associated with a sequence element, there must be an element in
the XML instance in the same order.
(D) The choice element cannot mention the minOccurs and maxOccurs attribute.
List and describe the data types used with simple types.
Syntax:
Code Snippet:
<xs:simpleType name="triangle">
  <xs:restriction base="xs:string">
    <xs:enumeration value="isosceles"/>
    <xs:enumeration value="right-angled"/>
    <xs:enumeration value="equilateral"/>
  </xs:restriction>
</xs:simpleType>
Elements of this type can have either the value "isosceles", or "right-angled", or
"equilateral".
4.4.5 Restrictions
Declaration of a data type puts certain limitations on the content of an XML element or attribute. If an
XML element is of type "xs:integer" and contains a string like "Welcome", the element will not be
validated. These limitations are called restrictions, and they define the allowable values for XML
elements and attributes.
Restrictions can be specified for the simpleType elements and restriction types are declared using the
<restriction> declaration. Basically, the <restriction> declaration is used to declare a derived
simpleType, which is a subset of its base simpleType. The value of the base attribute can be any existing
simpleType, or built-in XML Schema datatype.
Syntax:
<restriction base="name of the simpleType you are deriving from">
In this <restriction> declaration, the base data type can be specified using the base attribute.
The following code demonstrates a simpleType "Age" is derived by specifying the base as integer type.
Code Snippet:
<xs:simpleType name="Age">
<xs:restriction base="xs:integer">
...
...
</xs:restriction>
</xs:simpleType>
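As one possible illustration of a complete restriction, the sketch below narrows xs:integer to a range. The bounds 0 and 120 and the CustomerAge element are assumptions chosen for the example and are not taken from the declaration above.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="Age">
    <xs:restriction base="xs:integer">
      <!-- Hypothetical bounds: values from 0 to 120, both included -->
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="120"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Hypothetical element that uses the derived type -->
  <xs:element name="CustomerAge" type="Age"/>
</xs:schema>
```

With this sketch, <CustomerAge>35</CustomerAge> is valid, while <CustomerAge>-4</CustomerAge> and <CustomerAge>Welcome</CustomerAge> are rejected.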
4.4.6 Facets
With XML Schemas, custom restrictions can be specified on XML elements and attributes. These
restrictions are called facets.
Facets are used to restrict the set or range of values a datatype can contain. The value range defined by
the facet must be equal to or narrower than the value range of the base type.
There are 12 facet elements, declared using a common syntax. They each have a compulsory value
attribute that indicates the value for the facet. One restriction can contain more than one facet. Any values
appearing in the instance and value spaces must conform to all the listed facets.
Table 4.2 depicts the constraining facets.
Facet            Description
minExclusive     Specifies the minimum value for the type that excludes the value provided.
maxExclusive     Specifies the maximum value for the type that excludes the value provided.
minInclusive     Specifies the minimum value for the type that includes the value provided.
maxInclusive     Specifies the maximum value for the type that includes the value provided.
totalDigits      Specifies the total number of digits in a numeric type.
fractionDigits   Specifies the number of fractional digits in a numeric type.
length           Specifies the number of items in a list type or the number of characters in a string type.
minLength        Specifies the minimum number of items in a list type or the minimum number of characters in a string type.
maxLength        Specifies the maximum number of items in a list type or the maximum number of characters in a string type.
enumeration      Specifies an allowable value in an enumerated list.
whiteSpace       Specifies how whitespace should be treated within the type.
pattern          Restricts string types.
Table 4.2: Constraining Facets
Syntax:
<xs:simpleType name= "name">
<xs:restriction base= "xs:source">
<xs:facet value= "value"/>
<xs:facet value= "value"/>
</xs:restriction>
</xs:simpleType>
The following code demonstrates that the value attribute gives the value of that facet.
Code Snippet:
<xs:simpleType name="triangle">
<xs:restriction base="xs:string">
<xs:enumeration value="isosceles"/>
<xs:enumeration value="right-angled"/>
<xs:enumeration value="equilateral"/>
</xs:restriction>
</xs:simpleType>
Here, the facet enumeration is added to the restriction with the value attribute as either isosceles, or
right-angled, or equilateral. So, an element declared to be of type triangle must be a string with
a value of either isosceles, or right-angled, or equilateral.
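Other facets are used in the same way. The sketch below combines several of them; the type names PostalCode and Price and all facet values are hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- pattern: the value must consist of exactly five digits -->
  <xs:simpleType name="PostalCode">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{5}"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Several facets in one restriction: a value must satisfy all of them -->
  <xs:simpleType name="Price">
    <xs:restriction base="xs:decimal">
      <!-- At most 6 digits in total, at most 2 of them after the decimal point -->
      <xs:totalDigits value="6"/>
      <xs:fractionDigits value="2"/>
      <!-- No negative prices -->
      <xs:minInclusive value="0"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```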
4.4.7 Attributes
XML elements can contain attributes that describe elements. Within XML Schemas, attribute declarations
are similar to element declarations. To declare an attribute, <xs:attribute> element is used.
An attribute declaration can indicate whether the attribute is required or optional and whether it has
a default or fixed value.
The use attribute specifies whether the attribute is required, optional, or prohibited, while the
default and fixed attributes supply values. The different settings are:
Default
A default value is automatically assigned to the attribute when no other value is specified. For
example,
<xs:attribute name="genre" type="xs:string" default="fiction"/>
In this code, the default value of the attribute genre is fiction.
Fixed
This value makes the attribute fixed. A fixed value is automatically assigned to the attribute, and
another value cannot be specified. For example,
<xs:attribute name="genre" type="xs:string" fixed="fiction"/>
In this code, fixed value of fiction is assigned to the attribute genre, so another value for the
attribute cannot be specified.
Optional
This value makes the attribute optional, which means that the attribute may be present or omitted.
Optional is the default value of use for any attribute. For example,
<xs:attribute name="genre" type="xs:string" use="optional"/>
In this code, the attribute genre may be omitted; if present, it can take any string value.
Prohibited
This value means that the attribute cannot be used. For example,
<xs:attribute name="genre" type="xs:string" use="prohibited"/>
In this line of code, the element instance will not have the attribute genre.
Required
This value makes the attribute required. The attribute can have any value. For example,
<xs:attribute name="genre" type="xs:string" use="required"/>
In this line of code, the attribute genre must be present on the element in the XML instance document.
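Attribute declarations take effect only inside a complex type. The sketch below places three of them on a hypothetical Book element; the isbn and language attributes are assumptions added for illustration.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Book">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <!-- Must be present on every <Book> element -->
          <xs:attribute name="isbn" type="xs:string" use="required"/>
          <!-- May be omitted; the value "fiction" is then assumed -->
          <xs:attribute name="genre" type="xs:string" default="fiction"/>
          <!-- May be omitted; if present, its value must be "en" -->
          <xs:attribute name="language" type="xs:string" fixed="en"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Under these declarations, <Book isbn="1234567890">Million Seconds</Book> is valid, while a <Book> element without the isbn attribute is not.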
Knowledge Check 4
1.
Which of the following statements about simple type elements are true and which statements are
false?
(A) A custom user defined datatype can be created using the <simpleType> definition.
(B) Elements of simple type describe the content and data type of an element.
(C) Elements of simple type constitute the structure of an XML document.
(D) A built-in simple element can contain a default value or a facet value.
(E) A default value is the value that is assigned automatically to the element when there is
no other value specified.
2.
Can you match the different keywords against their corresponding description?
Description
(A) Specifies the number of digits after decimal point
(B) Restricts string types using regular expressions
(C) Specifies an allowable value in an enumerated list
(D) Specifies whether the attribute is required or optional
(E) Specifies that the attribute cannot be used
Term
(1) pattern
(2) use
(3) prohibited
(4) fractionDigits
(5) enumeration
Module Summary
In this module, XML Schema, you learnt about:
XML Schema
An XML Schema is an XML-based alternative to DTDs, which describes the structure of an
XML document. The XML Schema language is also referred to as XML Schema Definition
(XSD). An XML Schema can define elements, attributes, child elements and the possible
values that can appear in a document. Schemas overcome the limitations of DTDs and
allow Web applications to exchange XML data robustly, without relying on ad hoc validation
tools.
Module 5
Style Sheets
Module Overview
Welcome to the module, Style Sheets. This module introduces you to style sheets. It also
discusses how to use Cascading Style Sheets to format XML document. Finally, the module
explains the cascading order of style rules.
In this module, you will learn about:
Style Sheets
Selectors in CSS
XML Simplified
Module 5
Style Sheets
Concepts
Any style or presentation changes to data can be achieved faster as changes are to be
implemented in one place.
Device independence can be achieved by defining different style sheets for different devices. For
example, you can have a different style sheet for desktop computers, PDAs, and cell phones.
Reduction in document code as the presentation information is stored in a different file and can be
reused.
Note: CSS is supported by most browsers available today. Some of these browsers are Netscape
(6.0 or higher), Mozilla, Opera (4.0 or higher), and Internet Explorer (5.0 or higher). However, these
browsers support only parts of CSS specification. The current CSS specification is CSS 2. CSS 3
specification is under development.
selector
Selector is an element name of an XML document. A typical element name could be CD, Name, or
Title.
property
Property is a CSS style property that defines how the element will be rendered. Some of the CSS
properties are border, font, and color.
value
Value is the value associated with a CSS property. One CSS property can have several values. The
various values for property font-family are the various font family names such as "times", "arial",
and "courier" to name a few.
In Cascading Style Sheets (CSS), style rules can comprise more than one selector. To include multiple
selectors or group multiple selectors, one needs to provide a comma-separated list of element names.
The following code depicts a sample XML document containing information about endangered species.
Code Snippet:
<?xml version="1.0" ?>
<Endangered_Species>
<Animal>
<Name language="English">Tiger</Name>
<Threat>poachers</Threat>
<Weight>500 pounds</Weight>
</Animal>
</Endangered_Species>
The following code demonstrates a straightforward way to define style rules that display each element
on a separate row.
Code Snippet:
Name { display: block }
Threat { display: block }
Weight { display: block }
However, these three style rules can be converted to a single style rule by grouping the selectors as
demonstrated in the following code.
Code Snippet:
Name, Threat, Weight { display: block }
5.1.6 Ways of Writing Style Rules
A single selector can have more than one property-value pair associated with it. For example, figure 5.4
shows a CD element having two property declarations, one to set the font family to sans-serif, and the
other to set the color of text to black. Notice that the property-value pairs are separated by a semicolon.
Similarly, a collection of one or more property-value pairs can be associated with more than one selector.
For example, figure 5.4 shows two property declarations assigned to three elements namely CD, Title,
and Name.
The xml-stylesheet processing instruction should appear in the prolog of an XML document, that is,
before the start tag of the root element.
One XML document can have more than one style sheet processing instruction, each linked to
one .css file.
Syntax:
<?xml-stylesheet href="url" [type="text/css"]?>
where,
xml-stylesheet is the processing instruction.
url is the URL of a .css file; the .css file can be on the local system or anywhere on the Internet.
type="text/css" is optional; however, it tells a browser that does not support CSS that it does not
have to download the .css file.
The following code demonstrates how to link an external style sheet.
Code Snippet:
<?xml-stylesheet href="headers.css" type="text/css"?>
Here the style rules are defined in a file named headers.css.
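To show where the processing instruction sits, here is a sketch that attaches headers.css to the endangered species document used earlier in this module. Pairing that file name with this particular document is an assumption made for illustration.

```xml
<?xml version="1.0"?>
<!-- The style sheet processing instruction belongs in the prolog, -->
<!-- after the XML declaration and before the root element.        -->
<?xml-stylesheet href="headers.css" type="text/css"?>
<Endangered_Species>
  <Animal>
    <Name language="English">Tiger</Name>
    <Threat>poachers</Threat>
    <Weight>500 pounds</Weight>
  </Animal>
</Endangered_Species>
```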
Knowledge Check 1
1.
Which of the following statements about style sheets are true and which are false?
(A)
(D)
Cascading Style Sheets derived the term cascade from the ability to mix and match rules
from different sources.
Cascading Style Sheets lack support to define spacing between data.
A CSS style sheet is associated with an XML document using the processing instruction
xml-stylesheet.
Style sheets allow you to mix presentation markup with data.
(E)
Style sheets contain one or more rules about the appearance of data.
(B)
(C)
Describe ID selectors.
5.2.3 ID Selector
An ID selector comprises a hash (#) symbol, immediately followed by the value of an element's id
attribute and then the property declarations. It is used to define styles for unique elements in a
document. For example, if you want the data of a unique element to be in a different style, you would
define an ID selector for it. A unique element is one that has an attribute named id, as shown in figure 5.5.
Syntax:
#attribute_value { property_declarations }
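The following code demonstrates an ID selector. Because an ID value must be a valid XML name and a
CSS identifier cannot begin with a digit, the value used here starts with a letter.
Code Snippet:
#t1001 { color : blue }
Displays the content of an element in blue if its id attribute's value equals t1001.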
ID selectors are used to emphasize the data contained in unique elements in an XML document. However,
not all browsers support ID selectors. A browser has to read the document type definition (DTD) to
identify which attributes have an ID type. Browsers such as Safari, Mozilla, and Netscape do not read
external DTD subsets. Opera lacks the ability to read internal DTDs as well. Hence, these browsers may
not apply the style rules involving ID selectors. Internet Explorer does support ID selectors as it reads
external DTDs.
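Since the browser learns which attributes are of type ID from the DTD, a document intended for ID selectors can declare the attribute in its internal DTD subset. The sketch below assumes hypothetical id values (tiger, panda) and a hypothetical style sheet named species.css; neither appears in the original example.

```xml
<?xml version="1.0"?>
<?xml-stylesheet href="species.css" type="text/css"?>
<!DOCTYPE Endangered_Species [
  <!ELEMENT Endangered_Species (Animal+)>
  <!ELEMENT Animal (Name)>
  <!ELEMENT Name (#PCDATA)>
  <!-- Declares id as an attribute of type ID on Animal -->
  <!ATTLIST Animal id ID #REQUIRED>
]>
<Endangered_Species>
  <Animal id="tiger"><Name>Tiger</Name></Animal>
  <Animal id="panda"><Name>Panda</Name></Animal>
</Endangered_Species>
<!-- species.css (hypothetical) could then contain:  #tiger { color: blue } -->
```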
Knowledge Check 2
1.
Which of the following statements about selectors are true and which are false?
(A)
(B)
(C)
(D)
(E)
5.3.1 Color Properties
CSS provides properties to set the foreground and background color of text. CSS uses color names,
red-green-blue (RGB) values, RGB percentages, and hexadecimal values to specify color values. The
various ways in which color values are specified are shown in table 5.1.
Color Names   RGB Percentages       RGB Values         Hexadecimal Values
aqua          rgb(0%,65%,65%)       rgb(0,160,160)     #00a0a0
black         rgb(0%,0%,0%)         rgb(0,0,0)         #000000
blue          rgb(0%,32%,100%)      rgb(0,80,255)      #0050ff
gray          rgb(65%,65%,65%)      rgb(160,160,160)   #a0a0a0
green         rgb(0%,100%,0%)       rgb(0,255,0)       #00ff00
lime          rgb(0%,65%,0%)        rgb(0,160,0)       #00a000
maroon        rgb(70%,0%,32%)       rgb(176,0,80)      #b00050
navy          rgb(0%,0%,65%)        rgb(0,0,160)       #0000a0
olive         rgb(65%,65%,0%)       rgb(160,160,0)     #a0a000
purple        rgb(65%,0%,65%)       rgb(160,0,160)     #a000a0
red           rgb(100%,0%,32%)      rgb(255,0,80)      #ff0050
silver        rgb(90%,90%,90%)      rgb(225,225,255)   #d0d0d0
teal          rgb(0%,65%,100%)      rgb(0,160,255)     #00a0ff
white         rgb(100%,100%,100%)   rgb(255,255,255)   #ffffff
yellow        rgb(100%,100%,0%)     rgb(255,255,0)     #ffff00
colorValue
colorValue can take up any value from the CSS color table.
background-color
Property to set the background color of text in an element.
Figure 5.7 shows the code for style rules defined in Colors.css file.
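Figure 5.7 is not reproduced here. As a rough sketch of what rules in a file such as Colors.css might look like, the following applies color and background-color using each notation from table 5.1. The element names are borrowed from the endangered species example and are an assumption, not the contents of the original figure.

```css
/* Color name for the text, hexadecimal value for the background */
Name   { color: navy; background-color: #ffff00 }

/* RGB values, each from 0 to 255 */
Threat { color: rgb(255,0,80) }

/* RGB percentages */
Weight { color: rgb(0%,65%,0%); background-color: rgb(90%,90%,90%) }
```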
Property      Description
font-family   To specify the font family
font-size     To specify the size of font
font-style    To specify the style of font
font-weight   To specify the weight of font
Figure 5.11 shows the code for style rules stored in FontFamily.css file.
where,
font-size
Property to specify the size of font.
xx-small | x-small | small | medium | large | x-large | xx-large
One of various values that can be assigned to the property font-size.
Figure 5.15 shows the code for style rules defined in FontSize.css file.
Figure 5.19 shows the code for style rules defined in FontStyle.css file.
where,
BMW
Text is displayed in bold
Liverpool
Text is displayed in italics
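The four font properties can be combined in one style sheet. The sketch below is illustrative only: the element names Car and Club are hypothetical stand-ins, since the XML document behind the output described above is not shown.

```css
/* Bold text in a named font family */
Car  { font-family: arial; font-weight: bold }

/* Italic text in one of the keyword sizes */
Club { font-style: italic; font-size: x-large }
```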
Figure 5.23 shows the code for style rules defined in Margin.css file.
Figure 5.27 shows the code for style rules defined in Border.css file.
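The figures for Margin.css and Border.css are not reproduced here. A minimal sketch of the two properties together, assuming the Name and Threat elements of the endangered species example and arbitrarily chosen values:

```css
/* 10 pixels of space outside the element's box on all four sides */
Name   { display: block; margin: 10px }

/* A medium, solid, magenta outline drawn around the element's box */
Threat { display: block; border: medium solid magenta }
```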
padding_width
Composite value that can have a maximum of four values in the following order: top, right, bottom, and
left. Each value can be followed by a length unit designator; if none is given, the pixel unit is the default.
Figure 5.31 shows the code for style rules defined in Padding.css file.
Color { padding: 2 5 8 }
Inserts padding between the borders and text of Color element. The value 2 is applied to top border,
5 to left and right borders, and value 8 to bottom border.
Price { padding: 2 5 8 10 }
Inserts padding between the borders and text of Price element. The value 2 is applied to top border,
5 to right border, 8 to bottom border, and 10 to left border.
The output is shown in figure 5.33.
Unit Type   Unit Designator
Relative    em - defines the height of the element's font.
            ex - defines the x-height of the element's font.
            px - defines a pixel relative to the display device.
            %  - percentage.
Absolute    in - inches
            cm - centimeters
            mm - millimeters
            pt - 1/72 inch
            pc - 12 pt
Table 5.3: CSS Units
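Putting the padding shorthand and the unit designators together, the following sketch shows how one to four values are distributed across the sides of the box. The Color and Price selectors come from the padding example; Model and Maker are hypothetical element names, and the units are chosen arbitrarily.

```css
/* One value: all four sides */
Model { padding: 1em }

/* Two values: top and bottom, then left and right */
Maker { padding: 2px 5px }

/* Three values: top, then left and right, then bottom */
Color { padding: 2px 5px 8px }

/* Four values: top, right, bottom, left */
Price { padding: 2pt 5pt 8pt 10pt }
```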
5.3.11 Position Properties
Every element's text is placed in a box of its own. Table 5.4 lists the CSS positioning properties to position
the text inside the box. Note that the properties top, left, bottom, and right are used only if value of position
property is not static.
Property   Description                                      Possible Values
position   Property to place an element in a static,        static, fixed,
           relative, absolute or fixed position             relative or absolute
top        Property specifying how far the top edge of
           an element is above/below the top edge of
           the parent element
left
bottom
right
Figure 5.35 depicts the style sheet for position.css file.
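Figure 5.35 is not reproduced here. The sketch below shows the position property used together with the offset properties. The element names and offsets are assumptions chosen for illustration.

```css
/* Shifted 10 pixels down and 20 pixels right from its normal position */
Name   { position: relative; top: 10px; left: 20px }

/* Taken out of the normal flow and placed against its containing block */
Threat { position: absolute; top: 50px; right: 0px }

/* static is the default; top, left, bottom, and right are not used */
Weight { position: static }
```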
5.3.13 Display Property
In HTML, if you wanted the text to appear as a new paragraph, you would use the <P> tag. The same can
be achieved in XML by using the CSS display property.
Figure 5.37 shows the syntax for display property.
Figure 5.38 shows the code for style rules defined in Display.css.
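Figure 5.38 is not reproduced here. A minimal sketch of common display values, assuming the elements of the endangered species example rather than the actual contents of Display.css:

```css
/* Each Animal starts on a new line, like a paragraph */
Animal { display: block }

/* Name flows within the current line */
Name   { display: inline }

/* Weight is not displayed at all */
Weight { display: none }
```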
value
Floating point value followed by absolute units designators or relative units designators; or an
integer value followed by percentage (%) symbol.
Figure 5.42 shows the code for style rules defined in Align.css.
Knowledge Check 3
1.
      Description                                        Property
(A)   To display an element's data in italics            (1)  font-family
(B)   To display an element's data in bold               (2)  font-size
(C)   To display an element's data in a small font       (3)  font-style
(D)   To display an element's data in Times New Roman    (4)  font-weight
(E)   To display an element's data in a big font         (5)  font-size: small
2.
Can you write the style rules to display the data "XML by Example" in the format shown in the following
figure?
Note: In the image, display type is block, background color is blue, color is white, border is
medium solid magenta, text indent is 20.
3.
      Description                                                    Property
(A)   To insert space around an element                              (1)  text-align
(B)   To insert space between the text and border of element's box   (2)  border
(C)   To display an outline around the element's data                (3)  padding
(D)   To place element's data at the specified location              (4)  position
(E)   To place element's data in the center                          (5)  margin
Explain inheritance.
1.
Find all property declarations for the element in question. A style rule is applied if the element name
matches the element in the selector. If there is no style rule defined for an element, the element
inherits the style rules defined for its parent element. However, if the parent element does not have
any style rules defined, then the element is rendered with default values.
2.
Style rules declared as important are considered next. A style rule can be declared important in the
following manner:
Threat { display: block ! important }
Important property declarations have a higher precedence over normal property declarations.
Note: Style rules defined as important are said to have higher weight.
3.
Next, the origin of the style sheet is determined. A style sheet can have the following sources:
Author: The author of the XML document can define a style sheet in an external document
to format the XML document.
User/Reader: The end-user viewing the XML document can specify his/her personal style
sheet to be used to format the XML document.
User-agent: The browser is referred to as the user-agent. A browser has its own default style
sheet.
The style rules defined in the author's style sheet override the style rules defined in the user's style
sheet. The author's and the user's style sheets both override the user-agent's style sheet.
4.
Next, the specificity of a selector is determined; a more specific selector overrides a more
general selector. The specificity is determined by carrying out the following three activities:
Count the number of ID attributes in the selector (a).
Count the number of other attribute selectors, including class selectors such as .caps, and
pseudo-classes in the selector (b).
Count the number of element names in the selector (c).
5.
Next, write the three numbers with no commas or spaces and in the same sequence as shown
earlier. The higher the number, the higher the specificity. Rules with higher specificity override the ones
with lower specificity. For example, following is a list of selectors sorted by specificity:
#favorite {...}            /* a=1;b=0;c=0; specificity = 100 */
Name CD Artist.caps {...}  /* a=0;b=1;c=3; specificity = 013 */
Artist.caps {...}          /* a=0;b=1;c=1; specificity = 011 */
6.
If two rules have the same weight, then the one specified last wins.
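The ordering rules above can be seen in a small sketch. It assumes that the Threat element from the endangered species example carries a hypothetical id of favorite and a hypothetical class of caps. The comments state which declaration would be expected to win under the steps just described.

```css
/* Specificity 001: element name only */
Threat      { color: black }

/* Specificity 011: overrides the rule above for Threat elements of class caps */
Threat.caps { color: green }

/* Specificity 100: overrides both rules above for the element whose id is favorite */
#favorite   { color: blue }

/* Same specificity and weight: the rule specified last wins */
Threat      { font-style: normal }
Threat      { font-style: italic }

/* An important declaration outranks a normal one, even from a more specific selector */
Threat      { display: block ! important }
#favorite   { display: inline }
```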
Knowledge Check 4
1.
Which of the following statements about cascading and inheritance are true and which are false?
(A)
(B)
(C)
(D)
(E)
Module Summary
In this module, Style Sheets, you learnt about:
Style Sheets
Style Sheets are a set of rules that define the appearance of data. These rules are written in
a file with the extension .css. The .css file is associated with an XML document using the
xml-stylesheet processing instruction.
Selectors in CSS
Selectors define the elements to which the styles will be applied. The various types of
selectors are simple, universal and ID selectors.
Module 6
XSL and XSLT
Module Overview
Welcome to the module, XSL and XSLT. This module deals with the techniques of transforming
XML documents into XML, HTML, and Text documents using XSL style sheets. This module also
aims at providing a clear understanding of creating an XSL style sheet using the various XSL
elements.
In this module, you will learn about:
Introduction to XSL
Cascading Style Sheet (CSS) is a style sheet technology that is used for HTML content formatting.
Extensible Stylesheet Language (XSL) was developed by the World Wide Web Consortium (W3C) to describe
how an XML document should be displayed. Figure 6.1 shows an example of a style sheet.
Note: Using XSLT, you can transform an XML document into any text-based format, including HTML,
plain text, or XML that follows another schema.
An XSLT processor is an application that applies an XSLT style sheet to an XML document. The
result of the transformation is a node tree, which can be displayed as output or
sent for further transformations.
</xsl:stylesheet>
where,
<xsl:stylesheet>: Root element of the style sheet.
xmlns:xsl="http://www.w3.org/1999/XSL/Transform": Refers to the official W3C XSLT
namespace. You must include the attribute version="1.0" if you use this namespace.
The XSLT namespace must be declared at the top of the document in order to get access to the
XSLT elements, attributes and features.
Note: The <xsl:transform> element holds the same syntactic value as <xsl:stylesheet>. The
<xsl:transform> and <xsl:stylesheet> are completely synonymous and either can be used.
The xsl:stylesheet element is used more commonly than xsl:transform element.
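Because the opening lines of the syntax are not reproduced above, here is a minimal sketch of a complete style sheet that uses this namespace and version. The single template and the HTML it writes are assumptions made only to give the skeleton something to do.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Root element: declares the XSLT namespace and the required version -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Matches the root of the source document -->
  <xsl:template match="/">
    <html>
      <body>
        <!-- Processes the children of the root using the remaining rules -->
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Replacing xsl:stylesheet with xsl:transform in both the start tag and the end tag would give an equivalent style sheet.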
The top level elements in the XSLT syntax are listed in table 6.1. These elements can occur directly
inside the xsl:stylesheet or xsl:transform elements.
Element              Description
xsl:decimal-format   Defines the default decimal format for number to text
                     conversion.
xsl:include          Used to include one style sheet into another.
xsl:key              Defines a named key to be used with the key() function for
                     operating on patterns and expressions.
xsl:output           Defines the format of the output created by the style
                     sheet.
xsl:preserve-space   Defines the elements for which white space should be
                     preserved.
Table 6.1: Top level XSLT Elements
NaN="string": Defines the string that is used to indicate that the value is not a number. The
default is the string "NaN".
pattern-separator="character": Defines the character that is used to separate positive
and negative sub patterns in a format pattern. The default is the semicolon (;).
percent="character": Defines the character that is used to represent a percent sign. The
default is the percent sign (%).
per-mille="character": Defines the character that is used to represent the per thousand
sign. The default is the Unicode per mille character (‰).
zero-digit="character": Defines the character that is used to represent the digit zero.
The default is (0).
The following code demonstrates the several numeric format variations.
Code Snippet:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <xsl:decimal-format name="name" digit="D" />
  <xsl:output method="html"/>
  <xsl:template match="/">
    <xsl:value-of select='format-number(45665789, "#.000")' />
    <xsl:value-of select='format-number(0.3456789, "###%")' />
    <xsl:value-of select='format-number(789, "D.0", "name")' />
    <xsl:value-of select='format-number(123456789, "$DDD,DDD,DDD.DD", "name")' />
    <xsl:value-of select="format-number(193 div 200, '###.#%')"/>
    <xsl:value-of select="format-number(a div 0, '###,###.00')"/>
    <xsl:value-of select="format-number(1 div 0, '###,###.00')"/>
  </xsl:template>
</xsl:stylesheet>
The select expressions return the values 45665789.000, 35%, 789.0, $123,456,789, 96.5%, NaN
and Infinity respectively.
xsl:include XSLT element
Syntax:
<xsl:include href = "uri"/>
where,
href="uri": The URI of the style sheet to be included.
The following code demonstrates how to include another style sheet file within a style sheet file.
Code Snippet:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:include href="ExampleTemplate.xsl"/>
<xsl:template match="/">
...........
..........
</xsl:template>
</xsl:stylesheet>
In this code, the ExampleTemplate.xsl style sheet file is included within the current style sheet
file.
The following code contains the XML file and demonstrates how to use the xsl:key element.
Code Snippet:
<?xml version="1.0" encoding = "UTF-8"?>
<?xml-stylesheet type="text/xsl" href="key.xsl"?>
<APTstaff>
<Developer name="David Blake" address="B-602, East West Coast" phone="567877-9766"/>
<Developer name="Roger Blake" address="B-345, East West Coast" phone="345865-9777"/>
</APTstaff>
The following code contains the style sheet file and demonstrates how to use the xsl:key
element.
Code Snippet:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <!-- Indexes every Developer element by the value of its name attribute -->
  <xsl:key name="stafflist" match="Developer" use="@name"/>
  <xsl:template match="/">
    <!-- Looks up the Developer elements whose key value is 'David Blake' -->
    <xsl:for-each select="key('stafflist', 'David Blake')">
      NAME: <xsl:value-of select="@name"/>
      ADDRESS: <xsl:value-of select="@address"/>
      PHONE: <xsl:value-of select="@phone"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
The result upon execution would be:
NAME: David Blake ADDRESS: B-602, East West Coast PHONE: 567877-9766
xsl:output XSLT element
Syntax:
<xsl:output
cdata-section-elements="namelist"
doctype-public="string"
doctype-system="string"
encoding="string"
indent="yes" | "no"
media-type="mimetype"
method="html" | "name" | "text" | "xml"
omit-xml-declaration="yes" | "no"
standalone="yes" | "no"
version="version_number"
/>
The details of the attributes are given in table 6.2.
Attribute                Description
cdata-section-elements   Optional. A white space delimited list of elements whose text
                         contents should be written as CDATA sections.
doctype-public           Optional. Specifies the public identifier that goes in the
                         document type declaration.
doctype-system           Optional. Specifies the system identifier that goes in the
                         document type declaration.
encoding                 Optional. Sets the value of the encoding attribute in the output.
indent                   Optional. Specifies whether or not to indent the output. If set
                         to "yes", the XML and HTML outputs are step-indented to make
                         them more readable.
media-type               Optional. Defines the MIME type of the output. The default is
                         "text/xml".
method                   Optional. Defines the type of output. The three permitted values
                         are html, text, and xml. The default is xml. However, if the
                         first child element of the root node is the <html> element and
                         there are no preceding text nodes, then the default output type
                         is set to html.
omit-xml-declaration     Optional. A "yes" specifies that the XML declaration (<?xml...?>)
                         should be omitted from the output. A "no" specifies that the XML
                         declaration should be included in the output. The default is "no".
standalone               Optional. Specifies whether the XSLT processor should output a
                         standalone declaration. The value, "yes" or "no", is used as the
                         value of that declaration. If the attribute is omitted, no
                         standalone declaration occurs in the output.
version                  Optional. Provides the W3C version number for the output format.
                         If the output is XML, the default version is 1.0. If the output
                         type is HTML, the default version is 4.0.
Table 6.2: xsl:output element Attributes
The following code demonstrates the use of xsl:output element. It shows that the output will be
an HTML document, version 4.0 and indented for readability.
Code Snippet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" version="4.0" indent="yes"/>
  ...
  ...
</xsl:stylesheet>
<br/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
XSL
Style sheet language to create a style for XML
documents
Provides a means of transforming XML documents
Knowledge Check 1
1.
The steps to process an XSL style sheet and apply it to an XML document are given here. Can you
arrange the steps in sequence to achieve the processing?
(1)
(2)
(3)
(4)
(5)
2.
Can you match the XSL elements against their corresponding description?
      Description                                              XPath Node
(A)   Used to add one style sheet to another                   (1)  xsl:template
(B)   Allow to set a variable in the xsl file                  (2)  xsl:output
(C)   Allow the style sheet to alias one namespace prefix      (3)  xsl:import
      for another in the result tree
(D)   Used to build templates                                  (4)  xsl:namespace-alias
(E)   Allow style sheet authors to specify how they wish the   (5)  xsl:variable
      result tree to be output
3.
Which of the statements about CSS and XSL style sheet languages are true and which statements
are false?
(A)
(B)
(C)
(D)
(E)
6.2.1 XSL Templates
A template is the main component of a style sheet. Templates are defined with the help of rules. A
template rule is used to control the output of the XSLT processor. It defines how a node in the
source XML document is transformed into nodes in the result tree. A template rule consists of a
pattern that identifies the XML nodes it applies to and an action that transforms each matching node.
Each template rule is represented by the xsl:template element, which defines an action for
producing output from a source document.
Figure 6.4 shows an example of XSL template.
Built-in template rule for modes: Ensures that elements and roots in any mode are processed.
The built-in template rule for modes is as follows:
<xsl:template match="*|/" mode="x">
<xsl:apply-templates mode="x"/>
</xsl:template>
Built-in template rule for element and root nodes: Processes the root node and its children.
The built-in template rules for element and root nodes are as follows:
<xsl:template match="*|/">
<xsl:apply-templates/>
</xsl:template>
Built-in template rule for text and attribute nodes: Copies text and attribute nodes to the result
tree. The built-in template rules for text and attribute nodes are as follows:
<xsl:template match="text()|@*">
<xsl:value-of select="."/>
</xsl:template>
Built-in template rule for comment and processing instruction nodes: Does not perform any
function. It is applied to the comments and processing instructions in the XML document. The
built-in template rules for comment and processing instruction nodes are as follows:
<xsl:template match="comment()|processing-instruction()"/>
Built-in template rule for namespace nodes: Does not perform any function. It is applied to the
namespace nodes in the XML document. No pattern can match a namespace node, so this built-in
rule is the only template rule that is ever applied to namespace nodes, and it cannot be written
as an xsl:template element.
Note: The Built-in rules have a lower priority than the other template rules.
While traversing an XML document, template rules are activated in the order in which they match
the elements. A template can change the order of traversal, and it can also prevent particular
elements from being processed. The xsl:apply-templates element allows the user to choose
the order in which the document is to be traversed.
For example, consider an element Order having child elements as OrderNumber, ItemInfo,
and Price. The ItemInfo element has further child elements as ItemName and Quantity. The
ItemName element comes before the Quantity element in the tree structure.
To get an output where Quantity is displayed before ItemName, the following template can be
used:
<xsl:template match="ItemInfo">
<xsl:value-of select="Quantity"/>
<xsl:value-of select="ItemName"/>
</xsl:template>
>
</xsl:template>
where,
match: Is a pattern that is used to define which nodes will have which template rules applied to
them. If this attribute is omitted there must be a name attribute.
mode: Allows the same nodes to be processed more than once.
name: Specifies a name for the template. If this attribute is omitted there must be a match
attribute.
priority: Is a real number that sets the priority of importance for a template. The higher the
number, the higher the priority.
The syntax for some of the match patterns is as follows:
<xsl:template match="/"> matches the root node of the XML document.
<xsl:template match="name of the element"> matches all the elements in the XML document
that have the specified name.
<xsl:template match="parent1/child"> matches the child element that has the specified
parent element.
<xsl:template match="/ | *"> matches the root node and every element in the document.
The code in figure 6.5 demonstrates the usage of xsl:template element in style sheets.
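Figure 6.5 is not reproduced here. The sketch below shows the match, name, and priority attributes applied to the Order structure described earlier. The template name show-price and the priority value are assumptions for illustration. The named template is invoked with the standard xsl:call-template element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Matches the root node and starts processing at Order -->
  <xsl:template match="/">
    <xsl:apply-templates select="Order"/>
  </xsl:template>

  <!-- Matches every Order element -->
  <xsl:template match="Order">
    <xsl:value-of select="OrderNumber"/>
    <xsl:text>: </xsl:text>
    <xsl:apply-templates select="ItemInfo/ItemName"/>
    <!-- Invokes the named template below; the current node stays Order -->
    <xsl:call-template name="show-price"/>
  </xsl:template>

  <!-- Matches ItemName only when its parent is ItemInfo; the explicit
       priority lets it win over a competing rule with a lower priority -->
  <xsl:template match="ItemInfo/ItemName" priority="2">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- A named template: it has no match attribute, so it must have a name -->
  <xsl:template name="show-price">
    <xsl:text> costs </xsl:text>
    <xsl:value-of select="Price"/>
  </xsl:template>
</xsl:stylesheet>
```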
6.2.3 The xsl:apply-templates Element
The xsl:apply-templates element defines a set of nodes to be processed. This element, by default,
selects all child nodes of the current node being processed, and finds a matching template rule to apply
to each node in the set.
Figure 6.6 shows the syntax for xsl:apply-templates.
Figure 6.7 shows the code for style rules that are defined in GEM_Stylesheet.xsl file.
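Figure 6.7 is not reproduced here. As a sketch of both forms of xsl:apply-templates, the following uses the Order structure from the earlier example rather than the contents of GEM_Stylesheet.xsl, which are not shown. The literal order element written to the result is an assumption.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- No select attribute: all child nodes of Order are processed in document order -->
  <xsl:template match="Order">
    <order>
      <xsl:apply-templates/>
    </order>
  </xsl:template>

  <!-- With select: only the chosen children are processed, in the order written here -->
  <xsl:template match="ItemInfo">
    <xsl:apply-templates select="Quantity"/>
    <xsl:apply-templates select="ItemName"/>
  </xsl:template>

</xsl:stylesheet>
```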
select
Uses the same kind of patterns as the match attribute of the xsl:template element. If select attribute
is not present, all child element, comment, text, and processing instruction nodes are selected.
Figure 6.11 shows the code for style rules that are defined in book_stylesheet.xsl file.
Figure 6.12 depicts the style sheet for book_stylesheet.xsl file.
yes
Indicates that special characters should be displayed as is (for example, a < or >).
no
Indicates that special characters should not be displayed as is (for example, a > is displayed as
&gt;).
Figure 6.15 shows the code for style rules that are defined in person_stylesheet.xsl file.
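Figure 6.15 is not reproduced here. The sketch below contrasts the two settings of disable-output-escaping on xsl:value-of. It assumes a hypothetical Person element with a Note child whose text contains a less-than sign; these names are not taken from person_stylesheet.xsl.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="Person">
    <!-- no (the default): a < in Note is written to the output as &lt; -->
    <p><xsl:value-of select="Note" disable-output-escaping="no"/></p>
    <!-- yes: a < in Note is written to the output as is -->
    <p><xsl:value-of select="Note" disable-output-escaping="yes"/></p>
  </xsl:template>
</xsl:stylesheet>
```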
xsl:value-of select="Department"
Displays the value of Department element.
xsl:value-of select="Name"
Displays the value of Name element.
xsl:value-of select="Salary"
Displays the value of Salary element.
xsl:value-of select="Language"
Displays the value of Language element.
</xsl:for-each>
End of for-each loop.
The output is shown in figure 6.21.
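The style sheet that this explanation refers to is not reproduced here. A sketch consistent with it might look as follows. The Department, Name, Salary, and Language elements come from the explanation; the Company/Employee path and the HTML table are assumptions.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <table>
      <!-- Company/Employee is a hypothetical path to the repeated element -->
      <xsl:for-each select="Company/Employee">
        <!-- One table row is written for each Employee -->
        <tr>
          <td><xsl:value-of select="Department"/></td>
          <td><xsl:value-of select="Name"/></td>
          <td><xsl:value-of select="Salary"/></td>
          <td><xsl:value-of select="Language"/></td>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>
```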
6.2.7 The xsl:text element
The xsl:text element is used to add literal text to the output. This element cannot contain any other
XSL elements. It can contain only text.
Figure 6.22 shows the syntax for xsl:text element.
Figure 6.23 shows the code for style rules that are defined in Orders_stylesheet.xsl file.
where,
xsl:for-each select="Orders/Item"
Iterates through the Item element.
xsl:value-of select="Name"
Displays the value of Name element.
<xsl:text>,</xsl:text>
Inserts a comma (,) after each name value.
<xsl:text>!!!!</xsl:text>
Inserts four exclamation marks (!) at the end of the output.
The output is shown in figure 6.25.
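The figures for Orders_stylesheet.xsl are not reproduced here. The following sketch is consistent with the explanation above: it uses the same select expressions and the same two xsl:text elements. The text output method is an assumption.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="Orders/Item">
      <xsl:value-of select="Name"/>
      <!-- Literal comma written after each name -->
      <xsl:text>,</xsl:text>
    </xsl:for-each>
    <!-- Literal text written once, after the loop has finished -->
    <xsl:text>!!!!</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```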
where,
count="pattern"
Indicates what nodes are to be counted. Only nodes that match the pattern are counted.
format="{ string }"
Sequence of tokens that specifies the format to be used for each number in the list.
value="expression"
Specifies the expression to be converted to a number and output to the result tree.
Figure 6.27 shows the code for style rules that are defined in item_stylesheet.xsl file.
where,
1.Water Bottle, I.Water Bottle
The first item is numbered with number 1 and roman number I.
4.Mobile Phone, IV. Mobile Phone
The fourth item is numbered with number 4 and roman number IV.
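The figures for item_stylesheet.xsl are not reproduced here. A sketch that would produce numbering of the kind described, with each item shown once with an Arabic number and once with a roman number, is given below. The Items/Item path and the Name child are assumptions.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Items/Item is a hypothetical path; only a Name child is assumed -->
    <xsl:for-each select="Items/Item">
      <!-- Arabic numbering: 1. 2. 3. -->
      <xsl:number value="position()" format="1."/>
      <xsl:value-of select="Name"/>
      <xsl:text>, </xsl:text>
      <!-- Upper-case roman numbering: I. II. III. -->
      <xsl:number value="position()" format="I."/>
      <xsl:value-of select="Name"/>
      <!-- Line break between items -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```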
where,
test=expression
The condition in the source data to test with either a true or false answer.
Figure 6.31 shows the code for style rules that are defined in Librarybooks_stylesheet.xsl file.
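Figure 6.31 is not reproduced here. The sketch below shows the general shape of an xsl:if test. The Library/Book path, the Title and Price children, and the threshold of 50 are all hypothetical.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="Library/Book">
      <!-- The body is processed only when the test is true; xsl:if has no else branch -->
      <xsl:if test="Price &gt; 50">
        <xsl:value-of select="Title"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```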
Figure 6.35 shows the code for style rules that are defined in Publisherbooks_stylesheet.xsl
file.
where,
xsl:when test="Price > 100"
Checks whether the price of the book is greater than 100. If true, the details like Author, Price
and Year are displayed with magenta as the background color.
xsl:otherwise
If all the conditions are false, this block is executed. Here, all other book details are printed in
normal background color.
The output is shown in figure 6.37.
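The figures for Publisherbooks_stylesheet.xsl are not reproduced here. The sketch below follows the explanation above: books priced above 100 are written with a magenta background, and all others fall through to xsl:otherwise. The Publisher/Book path and the use of the HTML bgcolor attribute are assumptions.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <table>
      <xsl:for-each select="Publisher/Book">
        <xsl:choose>
          <!-- First branch whose test is true is used -->
          <xsl:when test="Price &gt; 100">
            <tr bgcolor="magenta">
              <td><xsl:value-of select="Author"/></td>
              <td><xsl:value-of select="Price"/></td>
              <td><xsl:value-of select="Year"/></td>
            </tr>
          </xsl:when>
          <!-- Used when no xsl:when test is true -->
          <xsl:otherwise>
            <tr>
              <td><xsl:value-of select="Author"/></td>
              <td><xsl:value-of select="Price"/></td>
              <td><xsl:value-of select="Year"/></td>
            </tr>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>
```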
case-order
Indicates whether the sort will have upper or lowercase letters listed first in the sort output. The default
option is to list uppercase first.
data-type
Specifies the data type of the strings.
number
Sort key is converted to a number.
qname
Sort is based upon a user-defined data type.
text
Specifies that the sort keys should be sorted alphabetically.
order
The sort order for the strings. The default value is "ascending".
select
Expression that defines the key upon which the sort will be based. The expression is evaluated and
converted to a string that is used as the sort key.
Figure 6.39 shows the code for style rules that are defined in products_stylesheet.xsl file.
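The figure for products_stylesheet.xsl is not reproduced here. The sketch below shows how the select, data-type, and order attributes of xsl:sort fit together inside xsl:for-each. The Products/Product path and the Name and Price children are hypothetical.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="Products/Product">
      <!-- xsl:sort must come first inside xsl:for-each; here the most expensive product is listed first -->
      <xsl:sort select="Price" data-type="number" order="descending"/>
      <!-- Second sort key: products with equal prices are ordered alphabetically by name -->
      <xsl:sort select="Name" data-type="text" order="ascending"/>
      <xsl:value-of select="Name"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="Price"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```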
Knowledge Check 2
1.
Can you identify the correct code to display the following output "March 20, 2001 The west
coast of Atlanta some 150 dolphins" ?
XML File
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="stylesheet.xsl" ?>
<Article>
<Date>March 20, 2001</Date>
<Para>
The west coast of
<Place>Atlanta</Place>
some 150 dolphins
</Para>
</Article>
(A) Style sheet File
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="Article">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="Date">
<xsl:apply-templates/>
<xsl:template match="Para">
<xsl:apply-templates/>
</xsl:stylesheet>
XML File
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="stylesheet.xsl" ?>
<Article>
<Para>
The west coast of
<Place>Atlanta</Place>
<Date>March 20, 2001</Date>
some 150 dolphins
</Para>
</Article>
(B) Stylesheet File
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="Article">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="Date">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="Para">
<xsl:apply-templates/>
</xsl:template>
</xsl:stylesheet>
XML File
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="stylesheet.xsl" ?>
<Article>
<Date>March 20, 2001</Date>
<Para>
The west coast of
<Place>Atlanta</Place>
some 150 dolphins
</Para>
</Article>
(C) Style sheet File
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="Article">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="Date">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="Para">
<xsl:apply-templates/>
</xsl:template>
</xsl:stylesheet>
XML File
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="stylesheet.xsl" ?>
<Article>
<Date>March 20, 2001</Date>
<Para>
<Place>The west coast of</Place>
<Place>Atlanta</Place>
<Place>some 150 dolphins</Place>
</Para>
</Article>
(D)
Stylesheet File
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="Article">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="Date">
<xsl:template match="Para">
<xsl:apply-templates/>
<xsl:apply-templates/>
</xsl:template>
</xsl:template>
</xsl:stylesheet>
2.
Can you match the XSL elements against their corresponding description?
Description
(A) Puts a conditional test against the content of the XML file
(B) Adds literal text to the output
(C) Extracts the value of a selected node
(D) Applies a template repeatedly
(E) Inserts a multiple conditional test against the XML file
XSL Element
(1) xsl:value-of
(2) xsl:for-each
(3) xsl:text
(4) xsl:choose
(5) xsl:if
3.
Can you identify the correct code to display the following output "A. David Blake18/11/1973"?
XML File
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="stylesheet.xsl"?>
<MichiganStaff>
<Faculty>
<Name>David Blake</Name>
<DOB>18/11/1973</DOB>
</Faculty>
</MichiganStaff>
(A) Style sheet File
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="/">
<xsl:for-each select="MichiganStaff/Faculty/Name">
<xsl:for-each select="MichiganStaff/Faculty">
<xsl:number value="position()" sequence="A."/>
<xsl:value-of select="Name"/>
<xsl:value-of select="DOB"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
XML File
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="stylesheet.xsl"?>
<MichiganStaff>
<Faculty>
<Name>David Blake</Name>
<DOB>18/11/1973</DOB>
</Faculty>
</MichiganStaff>
(B) Stylesheet File
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="/">
<xsl:for-each select="MichiganStaff/Faculty/Name">
<xsl:for-each select="MichiganStaff/Faculty/DOB">
<xsl:number value="position()" format="A."/>
<xsl:value-of select="Name"/>
<xsl:value-of select="DOB"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
XML File
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="stylesheet.xsl"?>
<MichiganStaff>
<Faculty>
<Name>David Blake</Name>
<DOB>18/11/1973</DOB>
</Faculty>
</MichiganStaff>
(C) Style sheet File
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="/">
<xsl:for-each select="MichiganStaff">
<xsl:number value="position()" format="A."/>
<xsl:value-of select="Name"/>
<xsl:value-of select="DOB"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
XML File
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="stylesheet.xsl"?>
<MichiganStaff>
<Faculty>
<Name>David Blake</Name>
<DOB>18/11/1973</DOB>
</Faculty>
</MichiganStaff>
(D) Style sheet File
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="/">
<xsl:for-each select="MichiganStaff/Faculty">
<xsl:number value="position()" format="A."/>
<xsl:value-of select="Name"/>
<xsl:value-of select="DOB"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Module Summary
In this module, XSL and XSLT, you learnt about:
Introduction to XSL
XML provides the ability to format document content. XSL provides the ability to define how
the formatted XML content is presented. An XSL Transformation applies rules to a source
tree read from an XML document to transform it into an output tree written out as an XML
document.
Module 7
More on XSLT
Module Overview
Welcome to the module, More on XSLT. This module aims to give a clear understanding of
XPath and to identify the various XPath node types. It lists the different operators used
with XPath and describes the various XPath expressions and functions. Finally, it
explains how to switch between styles and how to transform XML documents into HTML using
XSLT.
In this module, you will learn about:
XPath
7.1 XPath
In this first lesson, XPath, you will learn to:
7.1.1 XPath
XPath can be thought of as a query language like SQL. However, rather than extracting information from
a database, it extracts information from an XML document.
XPath is used to navigate through elements and attributes in an XML document.
Thus, XPath allows parts of an XML document to be identified.
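The query-language idea can be tried out with a short sketch. It assumes Python and its standard-library xml.etree.ElementTree module, which supports only a small subset of XPath. The catalog document and its element names are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
CATALOG = """
<catalog>
  <book id="b1"><title>XML Basics</title><price>9.50</price></book>
  <book id="b2"><title>XSLT in Practice</title><price>9.90</price></book>
</catalog>
"""

root = ET.fromstring(CATALOG)

# Navigate through elements: every title that is a child of a book.
titles = [t.text for t in root.findall("./book/title")]

# Navigate to attributes: the id attribute of each book element.
ids = [b.get("id") for b in root.findall("./book")]

print(titles)
print(ids)
```

Much as an SQL query names the table and columns it wants, the path expression names the route through the document tree to the wanted nodes.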
XPath provides a common syntax as shown in figure 7.1 for features shared by XSLT and XQuery.
XSLT
XSLT is a language for transforming XML documents into XML, HTML, or text.
XQuery
XQuery builds on XPath and is a language for extracting information from XML documents.
Any path that can occur in an XML document and any set of conditions for the nodes in the path
can be specified.
XPath is designed to be used in many contexts. It is applicable to providing links to nodes, for searching
repositories, and for many other applications.
7.1.3 XML Document in XPath
In XPath, an XML document is viewed conceptually as a tree in which each part of the document is
represented as a node as shown in figure 7.2.
Root
The XPath tree has a single root node, which contains all other nodes in the tree.
Element
Every element in a document has a corresponding element node that appears in the tree under
the root node. Within an element node appear all of the other types of nodes that correspond
to the element's content. Element nodes may have a unique identifier associated with them that is
used to reference the node with XPath.
Attribute
Each element node has an associated set of attribute nodes. The element is the parent of each of
these attribute nodes; however, an attribute node is not a child of its parent element.
Text
Character data is grouped into text nodes. Characters inside comments, processing instructions,
and attribute values do not produce text nodes. A text node has a parent node and may also be
a child node.
Comment
There is a comment node for every comment, except for any comment that occurs within the
document type declaration. A comment node has a parent node and may also be a child node.
Processing instruction
There is a processing instruction node for every processing instruction, except for any
processing instruction that occurs within the document type declaration. A processing
instruction node has a parent node and may also be a child node.
Namespace
Each element has an associated set of namespace nodes. Although a namespace node has
a parent node, it is not considered a child of that parent: namespace nodes are not
contained in the parent node, but are used to provide descriptive information about it.
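The distinction between child nodes and attribute nodes can be seen in a short sketch. It assumes Python's standard-library xml.etree.ElementTree, whose tree model is similar to, but not identical with, the XPath data model. The Product document is invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this illustration.
root = ET.fromstring('<Product Units="12"><Name>Pen</Name><Price>2</Price></Product>')

# Iterating over an element yields its child elements only.
child_tags = [child.tag for child in root]

# Attributes belong to the element but are not among its children.
attributes = dict(root.attrib)

# Analogue of the string-value of an element: the text of all
# text-node descendants, concatenated in document order.
string_value = "".join(root.itertext())

print(child_tags)
print(attributes)
print(string_value)
```

The Units attribute appears only in the attribute mapping and never in the list of children, which mirrors the rule that an attribute node is not a child of its parent element.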
Node Type   String Value
root        Determined by concatenating the string-values of all text-node descendants in document order.
element     Determined by concatenating the string-values of all text-node descendants in document order.
attribute   The normalized value of the attribute.
Node Type   Expanded Name
text        None.
7.1.5 Operators in XPath
An XPath expression returns a node set, a boolean, a string, or a number. XPath provides basic floating
point arithmetic operators and some comparison and boolean operators.
The XPath expressions are constructed using the operators and special characters as shown in
table 7.2.
Operator   Description
/          Child operator; selects immediate children of the left-side collection
//         Recursive descent; searches for the specified element at any depth
.          Indicates the current context
..         The parent of the current context node
*          Wildcard; selects all elements regardless of the element name
@          Attribute; prefix for an attribute name
:          Namespace separator; separates the namespace prefix from the element or attribute name
Table 7.2: XPath Operators
Note: For floating-point arithmetic, the keywords div and mod are used instead of / and % respectively.
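The path operators in table 7.2 can be tried out with a short sketch. It assumes Python's standard-library xml.etree.ElementTree, which implements only part of XPath and requires relative paths (hence the leading dot). The store document is invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
STORE = """
<store>
  <book lang="en"><title>XML Basics</title></book>
  <cd><title>Greatest Hits</title></cd>
  <shelf><book lang="fr"><title>Le XML</title></book></shelf>
</store>
"""
root = ET.fromstring(STORE)

child_books = root.findall("./book")              # /  immediate children only
all_books = root.findall(".//book")               # // any depth
all_children = root.findall("./*")                # *  any element name
french_books = root.findall(".//book[@lang='fr']")  # @  attribute test
title_parents = root.findall(".//title/..")       # .. parent of each title

print(len(child_books), len(all_books))
print([e.tag for e in all_children])
print([e.find("title").text for e in french_books])
print([e.tag for e in title_parents])
```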
Additionally, a list of the operators that can be used in XPath expressions is given in table 7.4.
Operator  Description                   Example                    Return value
|         Union of two node-sets        //book | //cd              Returns a node-set with all book and cd elements.
+         Addition                      36+14                      50
-         Subtraction                   36-14                      22
*         Multiplication                6*4                        24
div       Division                      28 div 14                  2
=         Equal                         price=9.50                 true, if price is 9.50
!=        Not equal                     price!=9.50
<         Less than                     price<9.50
<=        Less than or equal to         price<=9.50
>         Greater than                  price>9.50
>=        Greater than or equal to      price>=9.50
or        Or                            price=9.50 or price=9.90
and       And                           price=9.50 and price=9.90
mod       Modulus (division remainder)  15 mod 4
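The arithmetic operators can be mimicked in Python to check the examples. This is only a sketch: xpath_div and xpath_mod are hypothetical helper names, and the sketch assumes the divisor is never zero. XPath mod keeps the sign of the dividend, which matches math.fmod rather than the Python % operator.

```python
import math

def xpath_div(a, b):
    """Mimic XPath div: floating-point division (assumes b is not zero)."""
    return a / b

def xpath_mod(a, b):
    """Mimic XPath mod: the remainder takes the sign of the dividend."""
    return math.fmod(a, b)

print(36 + 14, 36 - 14, 6 * 4)   # addition, subtraction, multiplication
print(xpath_div(28, 14))          # 28 div 14
print(xpath_mod(15, 4))           # 15 mod 4
print(xpath_mod(-5, 2))           # -5 mod 2 is -1 in XPath; Python's -5 % 2 is 1
```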
7.1.7 Types of Matching
XPath is used to create patterns. The match attribute of the xsl:template element supports
a complex syntax that lets you express exactly which nodes should be matched. The select attribute of
xsl:apply-templates, xsl:value-of, xsl:for-each, xsl:copy-of, and xsl:sort supports
an even more powerful superset of this syntax that lets you express exactly which nodes should be
selected and which should not.
Some important types of matching are:
Matching by name
The source element is identified simply by its name, using the match attribute. The value given to
the match attribute is called the pattern. The following code demonstrates this.
Code Snippet:
<xsl:template match = "Greeting">
This template matches all Greeting elements in the source document.
Matching by ancestry
As in CSS, a match can be made using the element's ancestry. The following tag will match any 'EM'
element that has 'P' as an ancestor.
Code Snippet:
<xsl:template match = "P//EM">
Matching by attribute
The syntax used to match an attribute is:
Syntax:
<xsl:template match="element-name[@attribute-name='attribute-value']">
This syntax uses square brackets to hold the attribute name and value.
The following code demonstrates an example.
Code Snippet:
<xsl:template match="Product">
<xsl:apply-templates select="@Units"/>
</xsl:template>
The given example applies the templates to the non-existent Units attributes of Product elements.
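The three kinds of matching can be tried outside an XSLT processor. The sketch assumes Python's standard-library xml.etree.ElementTree and an invented sample document. It evaluates the expressions as selections from the root element, which is close to, but not the same as, the way an XSLT processor tests match patterns.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
DOC = """
<Doc>
  <Greeting>Hello</Greeting>
  <P>Plain <B><EM>nested emphasis</EM></B></P>
  <EM>outside any P</EM>
  <Product Units="12">Pen</Product>
  <Product>Ink</Product>
</Doc>
"""
root = ET.fromstring(DOC)

# Matching by name: every Greeting element.
by_name = [e.text for e in root.findall(".//Greeting")]

# Matching by ancestry: EM elements that have a P ancestor.
by_ancestry = [e.text for e in root.findall(".//P//EM")]

# Matching by attribute: Product elements that carry a Units attribute.
by_attribute = [e.text for e in root.findall(".//Product[@Units]")]

print(by_name)
print(by_ancestry)
print(by_attribute)
```

The EM element outside any P and the Product element without a Units attribute are both left out, which is the filtering effect the patterns are meant to have.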
Knowledge Check 1
1.
Which of these statements about XPath are true and which of these are false?
(A) XPath provides multiple syntax that can be used for queries, addressing and patterns.
(B) XPath can be thought of as a query language like SQL.
(C) In XPath, the structure of an XML document is viewed conceptually as a pyramid.
(D) XPath provides a common syntax for features shared by XSLT and XQuery.
(E) XPath is used to navigate through elements and attributes in an XML document.
2.
Can you match the XPath nodes against their corresponding description?
Description
(A) Has a parent node and it may be the child node too
(B) Contains all other nodes in the tree
(C) Is not considered a child of its parent node because they are not
contained in a parent node
(D) May have a unique identifier associated with them, which is useful
when referencing the node with XPath
(E) Has a parent node that is either an element or root node
XPath Node
(1) Element
(2) Attribute
(3) Text
(4) Namespace
(5) Root
3.
Can you match the different types of matching against their corresponding description?
Description
(A) <xsl:template match="/">
...
</xsl:template>
(B) <xsl:template match = "Greeting">
(C) <xsl:template match = "P//EM">
(D) <xsl:template match="Product">
<xsl:apply-templates select="@Unit"/>
</xsl:template>
(E) <xsl:template match="Product">
<xsl:value-of select="Product_ID"/>
</xsl:template>
Types of Matching
(1) Matching by name
Node-set
A node-set is an unordered group of nodes from the input document that match an expression's
criteria.
Boolean
A boolean has one of two values: true or false. XSLT allows any kind of data to be transformed
into a boolean. This is often done implicitly when a string or a number or a node-set is used
where a boolean is expected.
Number
XPath numbers are numeric values useful for counting nodes and performing simple arithmetic.
Numbers such as 43 or 7000 that look like integers are stored as doubles. Non-number values,
such as strings and booleans, are converted to numbers automatically as necessary.
String
A String is a sequence of zero or more Unicode characters. Other data types can be converted
to strings using the string() function.
Note: The XPath expression syntax includes literal forms for strings and numbers as well as operators
and functions for manipulating all four XPath data types.
Figure 7.5 shows the syntax for node-set functions.
The output is shown in figure 7.8. The formatted output gets displayed as shown here.
boolean(arg)
The function returns a boolean value for a number, string, or node-set. The syntax, code, and
output are shown.
Syntax:
fn:boolean(arg)
Code Snippet:
<ul>
not(arg)
The sense of an operation can be reversed by using the not() function. The syntax and code are
shown.
Syntax:
fn:not(arg)
Code Snippet:
This template rule selects all PRODUCT elements that are not the first child of their parents:
<xsl:template match="PRODUCT[not(position()=1)]">
<xsl:value-of select="."/>
</xsl:template>
The same template rule could be written using the not equal operator != instead:
Code Snippet:
<xsl:template match="PRODUCT[position()!=1]">
<xsl:value-of select="."/>
</xsl:template>
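The equivalence of the two predicates can be checked with a short sketch. It assumes Python's standard-library xml.etree.ElementTree and an invented ORDER document. Because ElementTree does not implement not() or position(), the sketch emulates position() with a 1-based counter.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
root = ET.fromstring(
    "<ORDER><PRODUCT>Pen</PRODUCT><PRODUCT>Ink</PRODUCT><PRODUCT>Paper</PRODUCT></ORDER>"
)
products = root.findall("./PRODUCT")

# Emulates PRODUCT[not(position()=1)]; position() counts from 1.
with_not = [p.text for pos, p in enumerate(products, start=1) if not (pos == 1)]

# Emulates PRODUCT[position()!=1].
with_not_equal = [p.text for pos, p in enumerate(products, start=1) if pos != 1]

print(with_not)
print(with_not_equal)
```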
true()
The true() function returns the boolean value true. The syntax, code, and output are shown.
Syntax:
fn:true()
Code Snippet:
<xsl:value-of select="true()"/>
Output:
true
false()
The false() function returns the boolean value false. The syntax, code, and output are shown.
Syntax:
fn:false()
Code Snippet:
The code snippet shows how to use true() and false() in XSL.
<xsl:value-of select="true() or false()"/>
<xsl:value-of select="true() and false()"/>
<xsl:value-of select="false() and false()"/>
Output:
true
false
false
The value derived from an expression depends on some rules as shown in table 7.5.
Expression Type        Rule
Node-set               True if the set contains at least one node, false if it is empty.
String                 True unless the string is zero-length.
Number                 True unless the value is zero or NaN (not a number).
Result tree fragment   Always true, because every fragment contains at least one node, its root node.
Table 7.5: Boolean Conversion Rules
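The conversion rules can be written out as a small function. This is a sketch in Python, not part of any XSLT processor: xpath_boolean is a hypothetical helper name, and a Python list stands in for a node-set.

```python
import math

def xpath_boolean(value):
    """Convert a value to a boolean following the rules of table 7.5."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # Number: true unless the value is zero or NaN.
        return not (value == 0 or math.isnan(value))
    if isinstance(value, str):
        # String: true unless the string is zero-length.
        return len(value) > 0
    if isinstance(value, (list, tuple)):
        # Node-set (modelled as a list): true if it holds at least one node.
        return len(value) > 0
    raise TypeError("unsupported value type")

print(xpath_boolean([]), xpath_boolean(["node"]))
print(xpath_boolean(""), xpath_boolean("false"))
print(xpath_boolean(0), xpath_boolean(float("nan")), xpath_boolean(3.5))
```

Note that the string "false" converts to true, because only the length of the string matters.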
Certain operators compare numerical values to arrive at a Boolean value. All the nodes in a node-set are
tested to determine whether any of them satisfies the comparison or not. The Comparison operators are
shown in table 7.6.
Operator       Returns
expr = expr    True if both expressions (string or numeric) have the same value, otherwise false.
expr != expr   True if the expressions do not have the same value (string or numeric), otherwise false.
expr < expr    True if the value of the first numeric expression is less than the value of the second, otherwise false.
expr > expr    True if the value of the first numeric expression is greater than the value of the second, otherwise false.
expr <= expr   True if the value of the first numeric expression is less than or equal to the value of the second, otherwise false.
expr >= expr   True if the value of the first numeric expression is greater than or equal to the value of the second, otherwise false.
Table 7.6: Comparison Operators
Note: To use some of the comparison operators inside an XML document such as an XSLT style sheet
or a schema, one must use the character references &lt; and &gt; instead of < and >.
The different operators and functions that return Boolean values are shown in table 7.7.
Function        Returns
expr and expr   True if both Boolean expressions are true, otherwise false.
expr or expr    True if at least one Boolean expression is true, otherwise false.
true()          True.
false()         False.
not(expr)       Negates the value of the Boolean expression: true if the expression is false, otherwise false.
Table 7.7: Boolean Functions
7.2.5 Numeric Functions
XPath syntax supports number functions that return strings or numbers and can be used with comparison
operators in filter patterns. The different numeric functions are number(arg), ceiling(num),
floor(num) and round(num).
The rules for converting any expression into a numeric value are listed in table 7.8.
Expression Type        Rule
Node-set               The first node is converted into a string, then the string conversion rule is used.
Boolean                The value true is converted to the number 1, and false to the number 0.
String                 If the string is the literal serialization of a number (e.g., -123.5), it is converted into that number. Otherwise, the value NaN is used.
Result-tree fragment   Like node-sets, a result-tree fragment is converted into a string, which is then converted with the string rule.
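The number conversion rules can be sketched in Python. The helper name xpath_number is hypothetical, and a list of strings stands in for a node-set, each string being the string-value of one node. The regular expression accepts only an optional minus sign, digits, and an optional decimal point, which is stricter than Python's own float().

```python
import math
import re

# An optional minus sign followed by digits with an optional decimal part.
_NUMBER = re.compile(r"-?(\d+(\.\d*)?|\.\d+)")

def xpath_number(value):
    """Convert a value to a number following the rules of table 7.8."""
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        text = value.strip()
        return float(text) if _NUMBER.fullmatch(text) else math.nan
    if isinstance(value, (list, tuple)):
        # Node-set: only the first node is used; an empty set gives NaN.
        return xpath_number(value[0]) if value else math.nan
    raise TypeError("unsupported value type")

print(xpath_number("1548"), xpath_number("-1548"))
print(xpath_number("text"))
print(xpath_number(True), xpath_number(["226.38", "7"]))
```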
Operator/Function   Returns
expr + expr         The sum of two numeric expressions.
expr - expr         The difference of the first numeric expression minus the second.
expr * expr         The product of two numeric expressions.
expr div expr       The first numeric expression divided by the second expression.
expr mod expr       The first numeric expression modulo the second expression.
round(expr)         The value of the expression rounded to the nearest integer.
sum(node-set)       The sum of the values of the nodes in node-set. Unlike the other functions in this table, this function operates over a node-set instead of expressions.
Table 7.9: Numeric Operators and Functions
The following code depicts the style sheet for numeric functions.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="Number.xsl"?>
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html"/>
<xsl:template match="/">
<html>
<body>
<h3>Numeric Functions</h3>
<ul>
<li>
<b>number('1548')</b>
=
<xsl:value-of select="number('1548')"/>
</li>
<li>
<b>number('-1548')</b>
=
<xsl:value-of select="number('-1548')"/>
</li>
<li>
<b>number('text')</b>
=
<xsl:value-of select="number('text')"/>
</li>
<li>
<b>number('226.38' div '1')</b>
=
<xsl:value-of select="number('226.38' div '1')"/>
</li>
</ul>
<ul>
<li>
<b>ceiling(2.5)</b>
=
<xsl:value-of select="ceiling(2.5)"/>
</li>
<li>
<b>ceiling(-2.3)</b>
=
<xsl:value-of select="ceiling(-2.3)"/>
</li>
<li>
<b>ceiling(4)</b>
=
<xsl:value-of select="ceiling(4)"/>
</li>
</ul>
<ul>
<li>
<b>floor(2.5)</b>
=
<xsl:value-of select="floor(2.5)"/>
</li>
<li>
<b>floor(-2.3)</b>
=
<xsl:value-of select="floor(-2.3)"/>
</li>
<li>
<b>floor(4)</b>
=
<xsl:value-of select="floor(4)"/>
</li>
</ul>
<ul>
<li>
<b>round(3.6)</b>
=
<xsl:value-of select="round(3.6)"/>
</li>
<li>
<b>round(3.4)</b>
=
<xsl:value-of select="round(3.4)"/>
</li>
<li>
<b>round(3.5)</b>
=
<xsl:value-of select="round(3.5)"/>
</li>
<li>
<b>round(-0.6)</b>
=
<xsl:value-of select="round(-0.6)"/>
</li>
<li>
<b>round(-2.5)</b>
=
<xsl:value-of select="round(-2.5)"/>
</li>
</ul>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
The output is shown in figure 7.10.
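The values produced by ceiling(), floor(), and round() in the style sheet can be cross-checked in Python. This is a sketch under one assumption worth stating: XPath round() resolves ties toward positive infinity, so it is emulated here with math.floor(x + 0.5) rather than with the built-in round(), which rounds ties to the nearest even integer. The name xpath_round is a hypothetical helper, and the sketch ignores special cases such as NaN, infinities, and negative zero.

```python
import math

def xpath_round(x):
    """Emulate XPath round(): nearest integer, ties toward positive infinity."""
    return math.floor(x + 0.5)

print([math.ceil(x) for x in (2.5, -2.3, 4)])      # ceiling(num)
print([math.floor(x) for x in (2.5, -2.3, 4)])     # floor(num)
print([xpath_round(x) for x in (3.6, 3.4, 3.5, -0.6, -2.5)])

# Python's built-in round() differs on ties such as 2.5.
print(round(2.5), xpath_round(2.5))
```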
Function                                                    Returns
concat(string, string, ...)                                 A string that is the concatenation of the string arguments.
format-number(number, pattern, decimal-format)              A string containing the number, formatted according to pattern. The optional decimal-format argument points to a format declaration which assigns special characters like the grouping character, which separates groups of digits in large numbers for readability.
normalize-space(string)                                     The string with leading and trailing whitespace removed, and all other strings of whitespace characters replaced with single spaces.
substring(string, offset, range)                            A substring of the string argument, starting offset characters from the beginning and ending range characters from the offset.
substring-after(string, to-match)                           A substring of the string argument, starting at the end of the first occurrence of the string to-match and ending at the end of string.
substring-before(string, to-match)                          A substring of the string argument, starting at the beginning of string and ending at the beginning of the first occurrence of the string to-match.
translate(string, characters-to-match, characters-replace-with)   The string with all characters in the string characters-to-match replaced with their counterpart characters in the string characters-replace-with.
Table 7.10: String Functions
Some functions that operate on strings and return numeric or Boolean values are listed in table 7.11.
Function                   Returns
contains(string, sub)      True if the given substring sub occurs within the string, otherwise false.
starts-with(string, sub)   True if the string begins with the substring sub, otherwise false.
string-length(string)      The number of characters inside the string.
Table 7.11: Other String Functions
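The behaviour of several string functions can be sketched in Python. The function names below are hypothetical Python stand-ins, not an XPath library, and substring() is assumed to receive whole-number arguments. The main point to notice is that XPath counts character positions from 1, whereas Python counts from 0.

```python
def substring(s, offset, length=None):
    """Emulate substring(): the first character is at position 1."""
    start = max(offset - 1, 0)
    if length is None:
        return s[start:]
    end = max(offset - 1 + length, 0)
    return s[start:end]

def substring_before(s, to_match):
    """Text before the first occurrence of to_match, or '' if absent."""
    index = s.find(to_match)
    return s[:index] if index >= 0 else ""

def substring_after(s, to_match):
    """Text after the first occurrence of to_match, or '' if absent."""
    index = s.find(to_match)
    return s[index + len(to_match):] if index >= 0 else ""

def translate(s, to_match, replace_with):
    """Replace each character of to_match by its counterpart in replace_with;
    characters without a counterpart are removed."""
    table = {}
    for i, ch in enumerate(to_match):
        if ch not in table:  # the first occurrence in to_match wins
            table[ch] = replace_with[i] if i < len(replace_with) else None
    return "".join(table.get(ch, ch) for ch in s if table.get(ch, ch) is not None)

def normalize_space(s):
    """Trim the ends and collapse inner runs of whitespace to single spaces."""
    return " ".join(s.split())

print(substring("12345", 2, 3))
print(substring_before("1999/04/01", "/"), substring_after("1999/04/01", "/"))
print(translate("bar", "abc", "ABC"), translate("--aaa--", "abc-", "ABC"))
print(normalize_space("  a   b \n c "))
```

The functions of table 7.11 map directly onto Python: contains() corresponds to the in operator, starts-with() to str.startswith(), and string-length() to len().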
Knowledge Check 2
1.
Which of these statements about XPath expressions are true and which of these are false?
(A)
(B)
(C)
(D)
(E)
2.
Which of these statements about XPath functions are true and which of these are false?
(A) The local-name() function returns the name of the current node or the first node in the
specified node set without the namespace prefix.
(B) The floor(num) function returns the largest integer that is not greater than the number
argument.
(C) The only allowed operation in a result tree fragment is on a number.
(D) In a substring() function, the index of the first character is 0.
(E) The translate() function returns the first argument string with occurrences of characters
in the second argument string replaced by the character at the corresponding position in the
third argument string.
Step 1
Start by creating a normal XML document:
<?xml version="1.0" encoding="UTF-8"?>
Step 2
Step 3
Now, set it up to produce HTML-compatible output:
Code Snippet:
<xsl:stylesheet>
<xsl:output method="html"/>
...
</xsl:stylesheet>
To output anything besides well-formed XML, an <xsl:output> tag should be used like the one
shown, specifying either "text" or "html". (The default value is "xml".) Figure 7.17 depicts XML
transformation.
Knowledge Check 3
1.
Which of these statements about switching between styles are true and which of these are false?
(A) An XSLT processor takes three things as input such as XSLT style sheet, XML document
and Document Type Declaration.
(B) The XSLT engine begins by reading in the XSLT style sheet and caching it as a look-up
table.
(C) For each node it processes, it will look in the table for the best matching rule to apply.
(D) Starting from the root node, the XSLT engine finds rules, executes them, and continues until
there are no more nodes in its context node set to work with.
(E) XSLT can also be called as XSLT document or transformation script.
2.
Can you specify the correct code snippet for transforming the XML document into HTML using
XSLT?
(A)
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0"
>
......
<xsl:output method="html"/>
......
</xsl:stylesheet>
(B)
<?xml version="1.0" encoding="UTF-8"?>
<xsl:output method="html"/>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0"
>
......
</xsl:stylesheet>
Module Summary
In this module, More on XSLT, you learnt about:
XPath
XPath is a notation for retrieving information from a document. XPath provides a common
syntax for features shared by Extensible Stylesheet Language Transformations (XSLT)
and XQuery. XPath has seven types of nodes: Root, Element, Attribute, Text, Comment,
Processing instruction, and Namespace. XPath is used in the creation of patterns.
Answers to Knowledge Checks
Module 1
Knowledge Check 1
1.
(A) - True, (B) - False, (C) - True, (D) - True, (E) - True
2.
(A) - False, (B) - True, (C) - False, (D) - True, (E) - True
Knowledge Check 2
1.
(A) - True, (B) - True, (C) - True, (D) - False, (E) - True
2.
(A) - True, (B) - False, (C) - True, (D) - False, (E) - True
3.
(A) - True, (B) - False, (C) - True, (D) - False, (E) - False
Knowledge Check 3
1.
(A)
2.
(C)
Knowledge Check 4
1.
(A) - False, (B) - False, (C) - True, (D) - True, (E) - False
2.
(A) - False, (B) - False, (C) - True, (D) - True, (E) - False
Module 2
Knowledge Check 1
1.
Knowledge Check 2
1.
(A)
2.
(A) - False, (B) - True, (C) - False, (D) - False, (E) - True
3.
(A) - True, (B) - False, (C) - False, (D) - True, (E) - False
Module 3
Knowledge Check 1
1.
(A) - (False), (B) - (True), (C) - (True), (D) - (True), (E) - (False)
Knowledge Check 2
1.
(A) - (False), (B) - (True), (C) - (True), (D) - (True), (E) - (False)
2.
(A) - Declare all the possible elements, (B) - Specify the permissible element children, if any, (C) - Set the order in which elements must appear, (D) - Declare all the possible element attributes, (E) - Set the attribute data types and values, (F) - Declare all the possible entities.
Knowledge Check 3
1.
(C)
Knowledge Check 4
1.
(A) - (3), (B) - (4), (C) - (5), (D) - (1), (E) - (2)
2.
(A) - (2), (B) - (4), (C) - (5), (D) - (1), (E) - (3)
3.
(D)
Module 4
Knowledge Check 1
1.
(A) - True, (B) - False, (C) - False, (D) - True, (E) - True
2.
(A) - (4), (B) - (1), (C) - (5), (D) - (3), (E) - (2)
Knowledge Check 2
1.
(A) - (2), (B) - (5), (C) - (4), (D) - (1), (E) - (3)
2.
Knowledge Check 3
1.
(A) - False, (B) - True, (C) - True, (D) - False, (E) - False
2.
Knowledge Check 4
1.
(A) - False, (B) - True, (C) - True, (D) - False, (E) - True
2.
(A) - (4), (B) - (1), (C) - (5), (D) - (2), (E) - (3)
Module 5
Knowledge Check 1
1.
(A) - True, (B) - False, (C) - True, (D) - False, (E) - True
Knowledge Check 2
1.
(A) - False, (B) - True, (C) - True, (D) - True, (E) - False
Knowledge Check 3
1.
(A) - (3), (B) - (4) , (C) - (5) , (D) - (1) , (E) - (2)
2.
display: block;
background-color: blue;
color: white;
border: medium solid magenta; text-indent: 20
3.
(A) - (5), (B) - (3), (C) - (2), (D) - (4) , (E) - (1)
Knowledge Check 4
1.
(A) - True, (B) - True, (C) - False, (D) - True, (E) - False
Module 6
Knowledge Check 1
1.
(A) - XML processor reads an XML document, (B) - XML processor creates a hierarchical tree containing nodes for each piece of information, (C) - Apply the rules of an XSL style sheet to document
tree., (D) - XSL processor starts with root node in tree and performs pattern matching, (E) - Portion
of a tree matching the given pattern is processed by appropriate style sheet template.
2.
(A) - (3), (B) - (5), (C) - (4), (D) - (1), (E) - (2)
3.
(A) - (False), (B) - True, (C) - (True), (D) - (False), (E) - (True)
Knowledge Check 2
1.
(C)
2.
(A) - (5), (B) - (3), (C) - (1), (D) - (2), (E) - (4)
3.
(D)
Module 7
Knowledge Check 1
1.
(A) - False, (B) - True, (C) - False, (D) - True, (E) - True
2.
(A) - (3), (B) - (5), (C) - (4), (D) - (1), (E) - (2)
3.
(A) - (4), (B) - (1), (C) - (2), (D) - (5), (E) - (3)
Knowledge Check 2
1.
(A) - True, (B) - True, (C) - True, (D) - False, (E) - False
2.
(A) - True, (B) - True, (C) - False, (D) - False, (E) - True
Knowledge Check 3
1.
(A) - False, (B) - True, (C) - True, (D) - True, (E) - False
2.
(A)