Unit 2


xml

UNIT-II: Working with XML: Introduction, The syntax of XML, XML
Document Structure, Document Type Definition (DTD), Namespaces,
XML schemas, XSLT, XPath, XML Parsers - DOM and SAX
History
Origins in SGML: XML was inspired by SGML, aiming to provide a simpler
and more web-friendly way to structure and transport data.
Standardization: XML 1.0 became a W3C Recommendation in 1998, leading to widespread
adoption and the development of related technologies like XSLT and XML
Schema.
Evolution: XML has evolved with updates and new specifications, and
although JSON has become a popular alternative, XML continues to be
used in various contexts for data representation and exchange.
XML’s development reflects the need for a versatile and structured format
for data interchange, and its impact is still felt in many areas of technology
today.
XML
XML (eXtensible Markup Language) is a versatile, text-based format
designed to store, transport, and structure data. Unlike HTML, which is
focused on displaying content, XML is primarily concerned with the data
itself. It is both human-readable and machine-readable, making it a
powerful tool for data interchange across different systems.
XML
XML (Extensible Markup Language) is a markup language that defines a set of rules for
encoding documents in a format that is both human-readable and machine-readable.
It is a flexible text format derived from SGML (Standard Generalized Markup
Language).

Purpose of XML:

XML was designed to transport and store data, focusing on simplicity, generality, and
usability across the internet.
Unlike HTML, which is used to display data, XML is used to describe and structure data.
Key Features of XML:

• Self-Descriptive: XML documents are self-descriptive because they use tags that
define the structure and meaning of the data.
• Platform-Independent: XML is platform-independent, meaning it can be used
across different systems, devices, and programming languages without
modification.
• Hierarchical Structure: XML data is structured in a tree-like hierarchy, with
elements containing sub-elements, which allows for complex data representations.
• Customizable Tags: Unlike HTML, XML doesn’t have predefined tags. Users can
define their own tags to describe the data, making it highly flexible.
• Text-Based: XML is text-based, which makes it easy to read, write, and transmit
over networks. This also allows it to be easily manipulated by various text-
processing tools.
XML Technologies
XML technologies encompass a wide array of tools, standards, and specifications that build upon or interact with
XML. These technologies enable the creation, validation, transformation, and management of XML data.

1. XML Schema Definition (XSD): Defines the structure and data types of an XML document.
2. Document Type Definition (DTD): Defines the structure and legal elements/attributes of an XML document.
3. XPath (XML Path Language): A language for navigating through elements and attributes in an XML document.
4. XSLT (Extensible Stylesheet Language Transformations): A language for transforming XML documents into
other formats like HTML, text, or another XML document.
5. XQuery: A query language for extracting and manipulating data from XML documents.
6. XPointer and XLink: Technologies for linking XML documents and addressing parts of XML documents.
7. SOAP (Simple Object Access Protocol): A protocol for exchanging structured information in web services.
8. SVG (Scalable Vector Graphics): An XML-based language for describing 2D graphics and graphical applications.
9. RSS (Really Simple Syndication): A web feed format for delivering regularly changing web content.
10. MathML (Mathematical Markup Language): An XML-based language for describing mathematical notation
and content.
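As a small illustration of item 3, Python's standard xml.etree.ElementTree module supports a limited subset of XPath. The sketch below is only an example: the <library> document and its values are made up for illustration, and a full XPath engine offers far more than the two expressions shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
doc = """
<library>
  <book id="1"><title>1984</title><price>7.99</price></book>
  <book id="2"><title>Emma</title><price>5.50</price></book>
</library>
"""

root = ET.fromstring(doc)

# ".//title" selects every <title> descendant of the root.
titles = [t.text for t in root.findall(".//title")]

# "book[@id='2']/title" selects the title of the book whose id attribute is "2".
second = root.find("book[@id='2']/title").text

print(titles)
print(second)
```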
Applications of XML
1. Data Interchange Between Systems: XML is widely used for exchanging data between different systems,
platforms, and organizations.
It provides a standardized way to encode data, making it easier to share and process across different
applications.
Example: Web services often use XML for data exchange in SOAP (Simple Object Access Protocol).
2. Web Services: XML forms the foundation for many web services, allowing applications to communicate
over the web.
It is used in both SOAP and RESTful web services for structuring request and response messages.
3. Configuration Files: Many software applications use XML to store configuration settings in a structured,
readable format.
Example: Microsoft’s .NET applications often use XML-based configuration files (.config files).
4. Document Storage: XML is used to store documents in a structured format that is easy to read, search, and
manipulate.
Example: Office file formats like Microsoft Word (.docx) and OpenDocument (.odt) are based on XML.
5. RSS and Atom Feeds: XML is used in syndication formats like RSS (Really Simple Syndication) and Atom.
These feeds are used to publish frequently updated information like blog posts, news headlines, and podcasts.
6. XHTML: XHTML (Extensible Hypertext Markup Language) is an XML-based version of HTML.
It combines the flexibility of XML with the structure of HTML, ensuring that web pages are well-formed and
properly structured.
7. Data Storage and Database Integration: XML is used in databases to store semi-structured data.
Some databases offer native support for XML, allowing you to query and manipulate XML data using
standard query languages like SQL with extensions.
8. E-commerce: XML is used in e-commerce applications to describe product information, transactions, and
orders.
Example: Some Electronic Data Interchange (EDI) systems use XML-based message formats to standardize the
exchange of business documents between companies.
9. Industry-Specific Standards: Many industries have developed their own XML-based standards for data
interchange.
Example: HL7 (Health Level 7) for healthcare data, XBRL (eXtensible Business Reporting Language) for
financial reporting, and FpML (Financial products Markup Language) for financial derivatives.
10. Content Management Systems (CMS): XML is used in content management systems to manage and
organize web content.
It allows for the separation of content from presentation, making it easier to update and maintain websites.
11. Mobile Applications: XML is often used in mobile applications for data storage and transmission.
Example: Android apps use XML for layout files (.xml) to define the user interface.
12. Scientific Data Representation: XML is used to store and share scientific data, ensuring consistency and
interoperability across different research institutions and software tools.
Example: MathML (Mathematical Markup Language) is an XML-based language used to describe
mathematical notations.
Benefits of XML:

• Interoperability: XML's platform and language independence make it ideal for
exchanging data between disparate systems.
• Flexibility: Customizable tags and structure allow XML to represent a wide range of data
models.
• Extensibility: New tags and structures can be added without affecting existing systems,
making XML adaptable over time.

Drawbacks of XML:

• Verbosity: XML documents can be verbose, leading to larger file sizes compared to
other formats like JSON.
• Complexity: Parsing and handling XML can be more complex than with simpler
formats, especially for large documents.
Difference between HTML and XML
HTML (HyperText Markup Language) and XML (eXtensible Markup Language) are both markup
languages, but they serve different purposes and have distinct
characteristics. The key differences between HTML and XML are:

1. Purpose
HTML:
• Primary Purpose: HTML is designed to display data and format web pages. It defines the
structure and layout of web content, including text, images, links, and multimedia.
• Focus: Presentation of data and user interface elements on web browsers.
XML:
• Primary Purpose: XML is designed to store, transport, and structure data. It is a flexible data
format that is used to represent complex data structures in a platform-independent way.
• Focus: Data representation, storage, and exchange between systems.
2. Tag Definition
HTML:
• Predefined Tags: HTML has a fixed set of predefined tags (e.g., <div>, <p>, <a>, <h1>), each with a
specific meaning and purpose.
• Tag Semantics: HTML tags are interpreted by web browsers to render content in a specific way.
XML:
• Custom Tags: XML allows users to define their own tags based on the specific needs of the data being
represented.
• Tag Semantics: XML tags do not have predefined meanings. The meaning is defined by the user or the
application processing the XML.
3. Case Sensitivity
HTML:
• Case Insensitive: HTML tags are not case-sensitive, so <DIV> and <div> are treated the same by
web browsers.
XML:
• Case Sensitive: XML is case-sensitive, meaning <Data> and <data> are considered different
elements.
4. Syntax Rules
HTML:
• Lenient Syntax: HTML is forgiving of errors such as missing closing tags or improperly nested tags. Browsers
often correct these errors automatically.
• Void Elements: Some HTML elements, such as <img> and <br>, never take a closing tag. Writing them as <img /> and
<br /> is allowed but not required.
XML:
• Strict Syntax: XML requires strict adherence to syntax rules. All tags must be properly closed, and elements
must be properly nested.
• Self-Closing Tags: XML allows an empty element to be written as a single self-closing tag, but the slash is
mandatory (e.g., <element />).

5. Document Structure
HTML:
• Structure: HTML documents have a defined structure, typically including a <!DOCTYPE> declaration, <html>,
<head>, and <body> tags.
• Root Element: HTML also has a single <html> root, but browsers tolerate markup that omits it and supply the
missing <html>, <head>, and <body> elements themselves.
XML:
• Structure: XML documents have a tree-like structure with a single root element that contains all other
elements.
• Single Root Element: A document with no root element, or with more than one, is not well-formed and is
rejected by the parser.
6. Data vs. Presentation
HTML:
• Focus on Presentation: HTML is primarily concerned with how data is presented to the user. It includes tags for
formatting content (e.g., <b> for bold, <i> for italic).
• Integration with CSS and JavaScript: HTML integrates with CSS (Cascading Style Sheets) for styling and JavaScript
for interactivity.
XML:
• Focus on Data: XML is focused on structuring and storing data rather than presenting it. It separates data from
its presentation.
• No Styling: XML itself does not include tags for styling or presenting data. Presentation is typically
handled by other technologies such as CSS or XSLT.
7. Use Cases
HTML:
• Web Pages: Used to create and structure web pages displayed in browsers.
• UI Elements: Defines user interface elements for web applications.
XML:
• Data Exchange: Used for exchanging data between different systems and platforms (e.g., in web services, APIs).
• Configuration Files: Used in software configuration files (e.g., Android development, build systems).
• Document Formats: Used in various document formats like Microsoft Office files (e.g., .docx, .xlsx).
8. Handling
HTML:

• Rendering: HTML documents are rendered by web browsers to display content visually.
• Parsing: Browsers have built-in parsers that interpret HTML and display it as intended, even
if the HTML contains errors.
XML:

• Parsing: XML documents are parsed by XML parsers, which require the document to be
well-formed. Errors in XML structure can cause parsing to fail.
• Transformation: XML documents can be transformed into different formats using XSLT
(Extensible Stylesheet Language Transformations).
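The contrast between lenient HTML handling and strict XML parsing can be sketched with two Python standard-library modules. This is only an illustration: html.parser is a tokenizer rather than a browser engine, and the TagCollector class and the sample markup are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Markup with an unclosed <b> and an unclosed <p>, as often found in sloppy HTML.
broken = "<p>Hello <b>world"


class TagCollector(HTMLParser):
    """Records the start tags the HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(broken)
collector.close()

# The XML parser refuses the same input because it is not well-formed.
try:
    ET.fromstring(broken)
    xml_accepted = True
except ET.ParseError as err:
    xml_accepted = False
    xml_error = str(err)

print(collector.tags)  # start tags seen by the lenient HTML parser
print(xml_accepted)
```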
Basic Structure of an XML Document:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="xxx.css"?>
<!DOCTYPE rootElement SYSTEM "example.dtd">
<rootElement>
  <!-- This is a comment -->
  <childElement attribute="value">Content</childElement>
  <anotherElement>
    <subElement>More content</subElement>
    <subElement><![CDATA[content!]]></subElement>
  </anotherElement>
</rootElement>
Structure of an XML document
The structure of an XML document is hierarchical and tree-like, with a single root element that
contains all other elements. Each XML document must follow a well-defined structure to be
considered well-formed. Here's an overview of the components and structure of an XML
document:

1. XML Declaration
The XML declaration is an optional but recommended line that defines the version of XML
being used and the character encoding.

<?xml version="1.0" encoding="UTF-8"?>


• version="1.0": Specifies the version of XML.
• encoding="UTF-8": Specifies the character encoding. UTF-8 is the most common encoding
used.
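A minimal sketch of how a serializer emits this declaration, assuming Python 3.8 or later and the standard xml.etree.ElementTree module. The <greeting> element is a made-up example.

```python
import xml.etree.ElementTree as ET

root = ET.Element("greeting")
root.text = "caf\u00e9"  # contains a non-ASCII character (e with acute accent)

# xml_declaration=True asks the serializer to emit the <?xml ...?> line.
data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)

print(data.decode("utf-8"))

# Parsing the bytes honours the declared encoding and restores the text.
restored = ET.fromstring(data).text
```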
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="science">
    <title lang="Italian">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
</bookstore>

<Employee>
  <Name>
    <Firstname>Rama</Firstname>
    <Lastname>Rao</Lastname>
  </Name>
  <Contact>
    <Mobile>123456789</Mobile>
    <LandLine>564789</LandLine>
  </Contact>
  <Address>
    <City>Hyd</City>
    <State>Telangana</State>
    <Zipcode>534101</Zipcode>
  </Address>
</Employee>
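The tree shape of a document like the Employee example can be made visible by walking it recursively. The sketch below uses Python's standard xml.etree.ElementTree. The outline helper is a hypothetical name introduced here, and the document is a shortened copy of the example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the Employee example.
doc = """<Employee>
  <Name><Firstname>Rama</Firstname><Lastname>Rao</Lastname></Name>
  <Address><City>Hyd</City><State>Telangana</State></Address>
</Employee>"""


def outline(element, depth=0):
    """Return one (depth, tag) pair per element, in document order."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


rows = outline(ET.fromstring(doc))
for depth, tag in rows:
    print("  " * depth + tag)
```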
2. Document Type Declaration (Optional)
The Document Type Declaration (DOCTYPE) defines the document type and can reference an
external DTD (Document Type Definition) or contain an internal subset of rules for the document
structure.
<!DOCTYPE rootElement SYSTEM "example.dtd">
• <!DOCTYPE>: Declares the document type.
• SYSTEM "example.dtd": References an external DTD file.

3. Root Element
The root element is the single, top-level element that contains all other elements in the XML
document. Every XML document must have exactly one root element.

<rootElement>
<!-- Other elements go here -->
</rootElement>
4. Child Elements
Child elements are nested within the root element and can themselves contain other elements, attributes,
or text content. XML allows for a hierarchical arrangement of elements.

<rootElement>
<childElement1>
<subElement>Content</subElement>
</childElement1>
<childElement2 attribute="value">More content</childElement2>
</rootElement>
5. Attributes
Attributes provide additional information about elements. They are defined within the start tag of an
element.
<element attributeName="attributeValue">Content</element>
attributeName="attributeValue": An attribute with a name (attributeName) and value (attributeValue).
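From a program's point of view, attributes arrive as name-value pairs attached to the element. A minimal sketch with Python's standard xml.etree.ElementTree, reusing the element shown above:

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring('<element attributeName="attributeValue">Content</element>')

# .attrib is a plain dict mapping attribute names to string values.
print(elem.attrib)

# .get() returns None, or a supplied default, when the attribute is absent.
value = elem.get("attributeName")
absent = elem.get("other", "n/a")
```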
6. Text Content
Elements can contain text content, which can be mixed with other elements.
<greeting>Hello, World!</greeting>
In this example, the <greeting> element contains text content.
7. Comments: Comments can be added to the XML document to provide explanations or notes. They are ignored
by the XML parser.
<!-- This is a comment -->
8. Processing Instructions (Optional)
Processing instructions provide information to the application processing the XML document. They are typically used for
linking stylesheets or other external resources.
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
In this example, a stylesheet is linked using a processing instruction.
9. CDATA Sections (Optional)
CDATA (Character Data) sections hold text that the XML parser treats as literal character data rather than markup, so
characters such as < and & need no escaping inside them.
<![CDATA[Some text with special characters like <, &, and >]]>
• <![CDATA[ and ]]>: Encloses the text that should not be parsed.
10. Entity References (Optional)
Entity references are used to represent special characters or to define shortcuts for longer strings of text.

<message>Stay &lt;strong&gt;strong&lt;/strong&gt; &amp; positive!</message>

&lt;, &gt;, and &amp;: Represent the characters <, >, and &, respectively.
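Entity references and CDATA sections are two ways of writing the same character data. The sketch below, using Python's standard library, parses the message above in both forms and compares the results. The CDATA variant is written here for illustration and is not part of the original example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The same text written once with entity references and once as CDATA.
with_entities = ET.fromstring(
    "<message>Stay &lt;strong&gt;strong&lt;/strong&gt; &amp; positive!</message>"
)
with_cdata = ET.fromstring(
    "<message><![CDATA[Stay <strong>strong</strong> & positive!]]></message>"
)

# After parsing, both forms yield identical character data.
print(with_entities.text)

# escape() goes the other way: it replaces &, < and > before text is written as XML.
escaped = escape("a < b & c")
```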
Example of a Complete XML Document
Here’s an example that combines all these elements:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="note.css"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <!-- This is a comment -->
  <to attribute="friend">Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body><![CDATA[Don't forget me <this> weekend!]]></body>
</note>
Summary of XML Document Structure

• XML Declaration: Specifies XML version and encoding (optional but recommended).
• Document Type Declaration: Defines the document structure and references a DTD
(optional).
• Root Element: The single top-level element that contains all other elements.
• Child Elements: Nested elements that represent the data structure.
• Attributes: Additional data within elements, defined in the opening tag.
• Text Content: Text contained within elements.
• Comments: Notes and explanations ignored by the XML parser.
• Processing Instructions: Instructions for the application processing the XML (optional).
• CDATA Sections: Sections of text that should not be parsed by the XML parser (optional).
• Entity References: Special characters or shortcuts for strings (optional).

This structure ensures that an XML document is well-formed and can be correctly processed by
XML parsers.
Books Example
<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book id="1">
    <title>The Catcher in the Rye</title>
    <author>J.D. Salinger</author>
    <year>1951</year>
    <genre>Fiction</genre>
    <publisher>Little, Brown and Company</publisher>
    <price currency="USD">10.99</price>
  </book>
  <book id="2">
    <title>To Kill a Mockingbird</title>
    <author>Harper Lee</author>
    <year>1960</year>
    <genre>Fiction</genre>
    <publisher>J.B. Lippincott &amp; Co.</publisher>
    <price currency="USD">7.99</price>
  </book>
  <book id="3">
    <title>1984</title>
    <author>George Orwell</author>
    <!-- remaining fields not shown -->
  </book>
</library>
Explanation:
• XML Declaration: <?xml version="1.0" encoding="UTF-8"?> specifies the XML version and character encoding.
• Root Element: <library> is the root element that contains all the book entries.
• Book Elements: Each <book> element represents an individual book. It includes:
1. Attributes: id is used to uniquely identify each book.
2. Child Elements:
i. <title>: The title of the book.
ii. <author>: The author of the book.
iii. <year>: The year of publication.
iv. <genre>: The genre of the book.
v. <publisher>: The publisher of the book.
vi. <price>: The price of the book, with a currency attribute specifying the currency used.
This XML structure is useful for representing and organizing information about a collection of books.
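One way an application might consume the books example is to turn each <book> into a dictionary keyed by its id attribute. This is a sketch using Python's standard xml.etree.ElementTree. The catalog variable is a name chosen for illustration, and only the first two entries are copied, with some fields left out for brevity.

```python
import xml.etree.ElementTree as ET

# Two entries copied from the books example (author, genre and publisher omitted).
doc = """<library>
  <book id="1">
    <title>The Catcher in the Rye</title>
    <year>1951</year>
    <price currency="USD">10.99</price>
  </book>
  <book id="2">
    <title>To Kill a Mockingbird</title>
    <year>1960</year>
    <price currency="USD">7.99</price>
  </book>
</library>"""

library = ET.fromstring(doc)

catalog = {}
for book in library.findall("book"):
    price = book.find("price")
    catalog[book.get("id")] = {
        "title": book.findtext("title"),
        "year": int(book.findtext("year")),  # element text is always a string
        "price": float(price.text),
        "currency": price.get("currency"),
    }

for book_id, info in catalog.items():
    print(book_id, info["title"], info["price"], info["currency"])
```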
Email Example:
<?xml version="1.0" encoding="UTF-8"?>
<email>
<header>
<to>[email protected]</to>
<from>[email protected]</from>
<subject>Meeting Reminder</subject>
<date>2024-08-12</date>
</header>
<body>
<paragraph>Hello John,</paragraph>
<paragraph>This is a reminder about our meeting scheduled for tomorrow at 10 AM.</paragraph>
<paragraph>Best regards,</paragraph>
<paragraph>Jane</paragraph>
</body>
<attachments>
<attachment fileName="agenda.pdf" fileSize="12345" />
<attachment fileName="minutes.docx" fileSize="67890" />
</attachments>
</email>
Weather Example
<?xml version="1.0" encoding="UTF-8"?>
<weatherReport>
<location>
<city>San Francisco</city>
<state>CA</state>
<country>USA</country>
</location>
<currentConditions>
<temperature unit="F">68</temperature>
<humidity>72</humidity>
<condition>Partly Cloudy</condition>
<wind>
<speed unit="mph">8</speed>
<direction>NW</direction>
</wind>
</currentConditions>
<forecast>
<day date="2024-08-12">
<highTemperature unit="F">75</highTemperature>
<lowTemperature unit="F">58</lowTemperature>
<condition>Sunny</condition>
</day>
<day date="2024-08-13">
<highTemperature unit="F">72</highTemperature>
<lowTemperature unit="F">60</lowTemperature>
<condition>Partly Cloudy</condition>
</day>
<day date="2024-08-14">
<highTemperature unit="F">70</highTemperature>
<lowTemperature unit="F">59</lowTemperature>
<condition>Showers</condition>
</day>
</forecast>
</weatherReport>
Purchase Order:
<purchase_order>
<order_number>12345</order_number>
<date>2024-08-14</date>
<customer>
<name>John Doe</name>
<address>
<street>123 Elm Street</street>
<city>Springfield</city>
<state>IL</state>
<zip>62701</zip>
</address>
</customer>
<items>
<item>
<product_id>001</product_id>
<description>Laptop</description>
<quantity>1</quantity>
<price>999.99</price>
</item>
<item>
<product_id>002</product_id>
<!-- remaining fields not shown -->
</item>
</items>
</purchase_order>
Contact Information:
<contacts>
<contact>
<name>John Doe</name>
<phone type="mobile">555-1234</phone>
<email>[email protected]</email>
<address>
<street>123 Elm Street</street>
<city>Springfield</city>
<state>IL</state>
<zip>62701</zip>
</address>
</contact>
<contact>
<name>Jane Smith</name>
<phone type="home">555-5678</phone>
<email>[email protected]</email>
<address>
<street>456 Oak Avenue</street>
<city>Metropolis</city>
<state>NY</state>
<zip>10001</zip>
</address>
</contact>
</contacts>
Movie Database:

<movies>
<movie>
<title>Inception</title>
<director>Christopher Nolan</director>
<release_date>2010-07-16</release_date>
<genre>Science Fiction</genre>
<rating>PG-13</rating>
<description>A mind-bending thriller that explores the concept of dreams within dreams.</description>
</movie>
<movie>
<title>The Matrix</title>
<director>The Wachowskis</director>
<release_date>1999-03-31</release_date>
<genre>Action</genre>
<rating>R</rating>
<description>A hacker discovers the reality he lives in is a simulation controlled by machines.</description>
</movie>
</movies>
Explanation: This XML example describes a movie database, including details like the title, director, release date, genre,
rating, and description for each movie.
XML Syntax
The syntax of XML (eXtensible Markup Language) is designed to be simple and
straightforward, but it must adhere to specific rules to be considered well-
formed.
1. XML Declaration
The XML declaration is optional but recommended. It defines the XML version
and the character encoding used in the document.
<?xml version="1.0" encoding="UTF-8"?>

• version="1.0": Specifies the version of XML being used.
• encoding="UTF-8": Specifies the character encoding (UTF-8 is the most common).
2. Elements
Elements are the primary building blocks of an XML document. They consist of a start
tag, content, and an end tag.

<elementName>Content goes here</elementName>


• <elementName>: The opening tag.
• Content goes here: The content of the element (can be text, other elements, or both).
• </elementName>: The closing tag.
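The same three parts can be produced programmatically. A minimal sketch with Python's standard xml.etree.ElementTree. The <child> element is added here only to show mixed content.

```python
import xml.etree.ElementTree as ET

# Element() creates the start/end tag pair; .text sets the content between them.
elem = ET.Element("elementName")
elem.text = "Content goes here"

# SubElement() adds a nested element, so the content becomes text plus an element.
child = ET.SubElement(elem, "child")
child.text = "nested"

xml_text = ET.tostring(elem, encoding="unicode")
print(xml_text)
```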
3. Attributes
Attributes provide additional information about elements. They are defined within the
start tag.
<elementName attribute1="value1" attribute2="value2">Content</elementName>

attribute1="value1": An attribute with a name (attribute1) and a value (value1).


4. Nesting Elements
Elements can contain other elements, creating a hierarchical structure.

<parentElement>
<childElement>Child content</childElement>
</parentElement>

• <parentElement>: The parent element.
• <childElement>: A child element nested within the parent element.
5. Empty Elements
Elements that don’t have any content can be represented as empty elements.

<emptyElement />

<emptyElement />: An empty element. Note the self-closing slash (/).
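The short form and the start-tag/end-tag form of an empty element mean the same thing to a parser. A small check with Python's standard xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<emptyElement />")
long_form = ET.fromstring("<emptyElement></emptyElement>")

# Both spellings parse to an element with no text and no children.
same = (
    short_form.text is None
    and long_form.text is None
    and len(short_form) == 0
    and len(long_form) == 0
)

# ElementTree writes empty elements in the short form by default.
serialized = ET.tostring(long_form, encoding="unicode")
print(same, serialized)
```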


6. Comments
Comments can be added to XML documents to provide explanations or notes. They are ignored
by the XML parser.
<!-- This is a comment -->
7. CDATA Sections
CDATA (Character Data) sections hold text that the XML parser treats as literal character data rather than
markup, so characters such as < and & need no escaping inside them.
<![CDATA[Some text with special characters like <, &, and >]]>

<![CDATA[ and ]]>: Encloses the text that should not be parsed.
8. Processing Instructions
Processing instructions provide information to the application processing the XML document.
<?target instruction?>

<?xml-stylesheet type="text/xsl" href="style.xsl"?>: An example that links an XSL stylesheet to
the XML document.
9. XML Entities: Entities are used to represent special characters that have specific meanings in XML. The five
predefined entities are:

• &lt;: Represents <
• &gt;: Represents >
• &amp;: Represents &
• &quot;: Represents "
• &apos;: Represents '

10. Prolog: The prolog includes the XML declaration and any processing instructions, comments, or DOCTYPE
declarations that come before the root element.
<?xml version="1.0" encoding="UTF-8"?>
<!-- This is an XML document -->
<!DOCTYPE note SYSTEM "note.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
11. Root Element
Every XML document must have exactly one root element that contains
all other elements.

<rootElement>
<!-- Other elements go here -->
</rootElement>
XML Syntax Rules:
• Case Sensitivity: XML tags are case-sensitive. <Note> and <note> are considered different elements.

• Proper Nesting: Elements must be properly nested within each other. Improperly nested elements make the
document not well-formed.

<!-- Correct -->
<parent>
  <child></child>
</parent>

<!-- Incorrect -->
<parent>
  <child></parent></child>

• Closing Tags: All elements must have a corresponding closing tag.

<element>Content</element> <!-- Correct -->
<element>Content <!-- Incorrect: Missing closing tag -->

• Quotation Marks: Attribute values must be enclosed in either single (') or double (") quotes.

<element attribute="value">Content</element> <!-- Correct -->
<element attribute=value>Content</element> <!-- Incorrect -->

• Single Root Element: The XML document must have one, and only one, root element that encloses all other
elements.

<root>
  <child1>Content</child1>
  <child2>More content</child2>
</root> <!-- Correct -->

<root1></root1>
<root2></root2> <!-- Incorrect: Multiple root elements -->

By adhering to these syntax rules, an XML document is considered "well-formed" and can be reliably parsed and
processed by XML parsers.
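These rules can be checked mechanically by handing each snippet to a parser. The sketch below uses Python's standard xml.etree.ElementTree. The is_well_formed helper and the sample labels are names made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


samples = {
    "correct": "<parent><child></child></parent>",
    "bad nesting": "<parent><child></parent></child>",
    "missing close": "<element>Content",
    "unquoted attribute": "<element attribute=value>Content</element>",
    "two roots": "<root1></root1><root2></root2>",
    "case mismatch": "<Note>text</note>",
}

results = {name: is_well_formed(text)[0] for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'rejected'}")
```

Only the first sample is accepted. Every other sample breaks exactly one of the rules listed above.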
Syntax Rules Summary

• XML Documents Must Have a Root Element
• The XML Prolog is optional. If it exists, it must come first in the document.
• All XML Elements Must Have a Closing Tag
• XML Tags are Case Sensitive
• XML Elements Must be Properly Nested
• XML Attribute Values Must Always be Quoted
• Use Entity References for Characters Such as < and & in Text
• Comments are Written as <!-- comment -->
• White-space is Preserved in XML
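The last point can be observed directly: a parser passes white space inside element content through to the application unchanged. A minimal sketch with Python's standard xml.etree.ElementTree, using a made-up <note> element:

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring("<note>  two  spaces\n  and a newline </note>")

# The parser hands the character data to the application unchanged.
raw = elem.text

# Collapsing white space is an application decision, for example:
collapsed = " ".join(raw.split())

print(repr(raw))
print(repr(collapsed))
```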
Tags, Elements & Attributes
XML Tag
An XML tag is a fundamental component of XML (eXtensible Markup Language) used to define
elements and structure within an XML document. Tags are the markup that delineates
the beginning and end of each element. A start tag can also carry the element's attributes.

Types of XML Tags


1. Opening Tag:

Definition: The tag that marks the start of an element. It is written within angle brackets.
Syntax: <elementName>
Example: <book>
Description: In this example, <book> is the opening tag for the "book" element.
2. Closing Tag:
Definition: The tag that marks the end of an element. It is written within angle brackets with a forward slash
before the element name.
Syntax: </elementName>
Example: </book>
Description: </book> is the closing tag for the "book" element. It corresponds to the opening tag and
signifies the end of the "book" element.

3. Self-Closing Tag:

Definition: A tag that marks an element that does not contain any content or child elements. It is a
combination of an opening and closing tag in one.
Syntax: <elementName />
Example: <br />
Description: <br /> is a self-closing tag that represents an empty element, such as a line break. Self-closing
tags are often used for elements that do not have any content between the opening and closing tags.
Example of XML Tags in Context
<bookstore>
<book>
<title>XML Developer's Guide</title>
<author>John Doe</author>
<price>39.95</price>
</book>
</bookstore>

Opening Tags: <bookstore>, <book>, <title>, <author>, <price>


Closing Tags: </bookstore>, </book>, </title>, </author>, </price>
Self-Closing Tag: If we had an empty element, such as an image or a line break, it would be represented by a
self-closing tag like <img src="cover.jpg" />.

Key Points:
• Tags are used to define the structure and content of an XML document.
• Every opening tag must have a corresponding closing tag, except in the case of self-closing tags.
• Tags are enclosed in angle brackets (< >), with opening tags starting with the element name and closing tags
starting with a forward slash (/) followed by the element name.
• Self-closing tags are used for empty elements and combine the opening and closing tag into one.
XML Elements
XML Elements are the primary building blocks of an XML document. An XML
element consists of a start tag, content, and an end tag. Elements can contain text,
other elements (known as child elements), and attributes. They define the structure
and content of the XML data.

Structure of an XML Element


1. Start Tag: The opening part of the element, enclosed in angle brackets (< >).
Example: <title>
2. Content: The data or nested elements within the element.
Example: The Great Gatsby
3. End Tag: The closing part of the element, similar to the start tag but with a
forward slash (/) before the element name.
Example: </title>
Example of a Simple XML Element
<title>The Great Gatsby</title>

Explanation:
<title> is the start tag.
The Great Gatsby is the content.
</title> is the end tag.

Together, they form a complete XML element representing the title of a book.

Nested XML Elements


XML elements can be nested within each other to create a hierarchical structure. This allows complex data to be
represented in an organized manner.

Example of Nested Elements:

<book>
<title>The Great Gatsby</title>
<author>F. Scott Fitzgerald</author>
<price>10.99</price>
</book>
Explanation:
<book> is the parent element.
<title>, <author>, and <price> are child elements nested within the <book> element.
This structure represents a book with its title, author, and price as separate elements.
Elements with Attributes
Elements can also contain attributes, which provide additional information about the element. Attributes are placed
within the start tag and consist of a name-value pair.

Example of an Element with Attributes:

<book isbn="978-3-16-148410-0">
<title>The Great Gatsby</title>
<author>F. Scott Fitzgerald</author>
<price>10.99</price>
</book>
Explanation:
The <book> element has an attribute isbn with the value "978-3-16-148410-0".
Attributes are used to provide extra information that is not part of the content.
Key Points:
• Elements: Define the structure and content of XML data. They are the core components of an XML document.
• Tags: Elements are defined by a start tag and an end tag. The content between the tags can be text, other elements,
or both.
• Nesting: Elements can be nested within each other to create a hierarchical structure.
• Attributes: Provide additional information about elements. They are included in the start tag and consist of name-
value pairs.
XML Attributes

XML Attributes are used to provide additional information about XML elements.
Attributes are always included within the start tag of an element and consist of a
name-value pair. They are a way to add metadata or properties to elements
without affecting the element's content or structure.

Structure of an XML Attribute


• Attribute Name: The name of the attribute, which identifies what kind of
information the attribute is providing.
• Equal Sign: An equal sign (=) separates the attribute name from its value.
• Attribute Value: The value assigned to the attribute, enclosed in single (') or
double (") quotes.
Example of an XML Element with Attributes

<book isbn="978-3-16-148410-0" format="hardcover">
  <title>The Great Gatsby</title>
  <author>F. Scott Fitzgerald</author>
  <price>10.99</price>
</book>

Explanation:

• <book> is the element, and it has two attributes: isbn and format.
• isbn="978-3-16-148410-0" is an attribute that provides the ISBN number of the book.
• format="hardcover" is an attribute that specifies the format of the book.
• The attributes are placed within the start tag of the book element.
Multiple Attributes
An element can have multiple attributes, each separated by white space within the start tag.

Example:

<employee id="E123" department="HR" role="Manager">
  <name>John Doe</name>
  <salary>75000</salary>
</employee>

Explanation:

The employee element has three attributes: id, department, and role.
These attributes provide additional details about the employee, such as their ID, department, and role.
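
As an illustration, the attributes of the employee example can be read with Python's standard xml.etree.ElementTree module. This is only a minimal sketch; the variable names and the "location" attribute used for the fallback are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The employee example from above, stored as a string.
xml_text = """
<employee id="E123" department="HR" role="Manager">
  <name>John Doe</name>
  <salary>75000</salary>
</employee>
"""

employee = ET.fromstring(xml_text)

# Attributes are exposed as a dictionary of name-value pairs.
attributes = dict(employee.attrib)

# get() returns None (or a supplied fallback) for a missing attribute.
role = employee.get("role")
location = employee.get("location", "unknown")

# Child elements hold the content itself.
name = employee.findtext("name")

print(attributes, role, location, name)
```

Attribute values always arrive as strings, so a numeric value such as a salary would still need converting by the application.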

 When to Use Attributes


• Metadata: Use attributes to store metadata or information that describes the properties of an element.
Example: In an <image> element, attributes like src, alt, and width could describe the image's source file, alternative text,
and dimensions.
• Simple Data: Use attributes when the information is simple, such as a unique identifier, a flag, or a property that doesn't
require complex structure or nesting.
Example: In a <product> element, an id attribute might store the product's unique identifier.
Attributes vs. Elements
1. Attributes:

• Provide additional information in a compact form.


• Are included in the start tag of an element.
• Cannot contain complex data like child elements or mixed content.
• Are best suited for metadata and properties that describe the element.
2. Elements:

• Are the primary means of storing content and data in XML.


• Can contain complex structures, including other elements and text.
• Are more flexible than attributes for representing data with nested or hierarchical structures.
Example Comparison:

<!-- Using Attributes -->


<book isbn="978-3-16-148410-0" format="hardcover" />

<!-- Using Elements -->


<book>
<isbn>978-3-16-148410-0</isbn>
<format>hardcover</format>
</book>
• Attributes make the XML more compact but are less flexible if the data structure becomes more complex.
• Elements allow for greater flexibility and clarity when representing structured or hierarchical data.
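
The two styles can be compared from the consumer's side. The sketch below (Python, standard library) reads the ISBN from either form; the helper name read_isbn is hypothetical and exists only for this illustration.

```python
import xml.etree.ElementTree as ET

# The same book described with attributes and with child elements.
as_attributes = ET.fromstring('<book isbn="978-3-16-148410-0" format="hardcover" />')
as_elements = ET.fromstring(
    "<book><isbn>978-3-16-148410-0</isbn><format>hardcover</format></book>"
)


def read_isbn(book):
    """Return the ISBN whether it is stored as an attribute or a child element."""
    value = book.get("isbn")            # attribute form
    if value is None:
        value = book.findtext("isbn")   # element form
    return value


isbn_from_attribute = read_isbn(as_attributes)
isbn_from_element = read_isbn(as_elements)
print(isbn_from_attribute == isbn_from_element)
```

Only the element form could later grow child elements of its own, which is the flexibility the comparison above refers to.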
Tag Vs Element
XML Tag: Just the markup used to create elements. Tags include
opening (<tag>), closing (</tag>), or self-closing (<tag />).

XML Element: A complete structure in XML, which consists of a start
tag, content (optional), and an end tag. Elements can contain other
elements, text, or attributes.
Entities
• Five special characters must be written as entities:
&amp; for & (almost always necessary)
&lt; for < (almost always necessary)
&gt; for > (not usually necessary)
&quot; for " (necessary inside double quotes)
&apos; for ' (necessary inside single quotes)
• These entities can be used even in places where they
are not absolutely required
• These are the only predefined entities in XML
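
When XML is generated by a program, the predefined entities are usually produced by a helper rather than typed by hand. A small sketch using Python's standard xml.sax.saxutils module follows; the sample strings are made up for the illustration.

```python
from xml.sax.saxutils import escape, quoteattr, unescape

text = 'Tom & Jerry <"quoted"> it\'s'

# escape() always replaces &, < and >.
escaped = escape(text)

# Extra replacements can be requested for the two quote characters.
escaped_all = escape(text, {'"': "&quot;", "'": "&apos;"})

# quoteattr() returns a complete, safely quoted attribute value.
attribute = quoteattr('5 < 6 & "ok"')

# unescape() reverses &amp;, &lt; and &gt;.
round_trip = unescape(escaped)

print(escaped)
print(escaped_all)
print(attribute)
```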

Well-formed XML Document
• Every element must have both a start tag and an end tag, e.g. <name> ...
</name>
• But empty elements can be abbreviated: <break />.
• XML tags are case sensitive
• XML tags may not begin with the letters xml, in any combination of cases
• Elements must be properly nested, e.g. not <b><i>bold and
italic</b></i>
• Every XML document must have one and only one root element
• The values of attributes must be enclosed in single or double quotes, e.g. <time
unit="days">
• Character data cannot contain < or &
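
The rules above can be tried out with any XML parser, since a parser must reject text that is not well-formed. The following is a minimal sketch in Python; the helper name is_well_formed and the sample strings are chosen for this illustration only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


checks = {
    "properly nested": is_well_formed("<p><b><i>bold and italic</i></b></p>"),
    "badly nested": is_well_formed("<p><b><i>bold and italic</b></i></p>"),
    "case mismatch": is_well_formed("<name>Ann</Name>"),
    "two roots": is_well_formed("<a/><b/>"),
    "unquoted attribute": is_well_formed("<time unit=days/>"),
    "bare ampersand": is_well_formed("<t>Tom & Jerry</t>"),
    "empty element": is_well_formed("<break />"),
}

for rule, accepted in checks.items():
    print(rule, accepted)
```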
Names in XML
• Names (as used for tags and attributes) must begin with a letter or
underscore, and can consist of:
• Letters, both Roman (English) and foreign
• Digits, both Roman and foreign
• . (dot)
• - (hyphen)
• _ (underscore)
• : (colon), which should be used only for namespaces
• Combining characters and extenders (not used in English)
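
A rough check of these naming rules can be sketched with a regular expression. The version below is deliberately simplified: it covers ASCII letters and digits only, leaves out the colon (reserved for namespaces), and rejects names starting with "xml". The function name is_simple_xml_name is hypothetical.

```python
import re

# ASCII-only approximation: a letter or underscore first,
# then letters, digits, dots, hyphens or underscores.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_simple_xml_name(name):
    """Check a name against the ASCII subset of the rules listed above."""
    if name.lower().startswith("xml"):
        return False  # names starting with "xml" (any case) are reserved
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["first-name", "_id", "price.usd", "2nd", "first name", "XmlData"]:
    print(candidate, is_simple_xml_name(candidate))
```

A real parser also accepts the non-English letters and combining characters mentioned above, so this pattern is stricter than XML itself.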
Transaction Data
Thousands of XML formats exist, in many different industries, to describe day-to-day
data transactions:

• Stocks and Shares


• Financial transactions
• Medical data
• Mathematical data
• Scientific measurements
• News information
• Weather services
DTD
• XML DTD (Document Type Definition) is a set of rules that define the
structure and the legal elements, attributes, and entities of an XML
document.
• Essentially, it is a blueprint for what an XML document should look
like and how its content should be structured.
• It specifies what elements, attributes, and entities are allowed in the
XML document and how they can be arranged.
• DTDs help ensure that the data within an XML document adheres to a
specific format, making it possible for software to parse and interpret
the XML data consistently.
DTD….
• A DTD (Document Type Definition) describes the structure of one or
more XML documents.
• Specifically, a DTD describes the elements, attributes, and entities
that may appear in an XML document.
• An XML document is called well-formed if it follows the basic
syntactic rules of XML, and valid if it is well-formed and also
specifies and conforms to a DTD.
Key Features of XML DTD:
Element Structure: DTD defines what elements (or tags) are allowed in the XML
document, how these elements are nested, and what their content should be (e.g., text,
other elements, or a combination).

Attribute Rules: It specifies which attributes can be used within elements, their types,
and any default values they might have.

Entity Definitions: DTD allows the definition of entities, which are placeholders for
repeatable content or special characters.

Validation: DTD can be used to validate whether an XML document adheres to the
defined structure, ensuring consistency and correctness of the data.
Why Use DTD?
Data Integrity: Ensures that the XML document adheres to a
predefined structure, maintaining data integrity.
Interoperability: Facilitates data exchange between different systems
by enforcing a common data structure.
Validation: Provides a mechanism to check that an XML document is not only
"well-formed" but also "valid" according to the defined rules.
Types of DTD:
• Internal DTD: Embedded directly within the XML document.
• External DTD: Stored in a separate file and referenced by the XML
document.
Example of a Simple DTD
<!DOCTYPE note [
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ATTLIST note
id ID #REQUIRED
>
]>
<note id="n1">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
Example of XML with DTD
1. Internal DTD Example:
<!DOCTYPE note [
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ATTLIST note id ID #REQUIRED>
]>
<note id="n1">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
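
The difference between well-formed and valid can be seen with a non-validating parser. In the sketch below, Python's standard ElementTree parser accepts a note that breaks the DTD (the body element and the required id attribute are missing), because it checks well-formedness only. Enforcing the DTD would need a validating parser, which is not shown here.

```python
import xml.etree.ElementTree as ET

# The DTD requires to, from, heading and body plus an id attribute,
# but this note has no <body> and no id.
document = """<!DOCTYPE note [
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ATTLIST note id ID #REQUIRED>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
</note>"""

# ElementTree checks well-formedness only, so this invalid document parses.
note = ET.fromstring(document)
child_names = [child.tag for child in note]
print(child_names, note.get("id"))
```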
2. External DTD Example:
a. External DTD File (note.dtd):

<!ELEMENT note (to, from, heading, body)>


<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ATTLIST note id ID #REQUIRED>

b. XML Document:

<!DOCTYPE note SYSTEM "note.dtd">


<note id="n1">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
<!DOCTYPE library [

<!-- Entity Declarations -->


<!ENTITY author "John Doe"> <!-- Internal Entity -->
<!ENTITY pubyear "2023"> <!-- Internal Entity -->
<!ENTITY extEntity SYSTEM "external.txt"> <!-- External Entity -->
<!ENTITY perc SYSTEM "percentage.txt" NDATA txt> <!-- Unparsed Entity -->

<!-- Notation Declaration -->


<!NOTATION txt SYSTEM "text/plain">

<!-- Element Declarations -->


<!ELEMENT library (book+, magazine*)>

<!ELEMENT book (title, author, publisher, year)>


<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT year (#PCDATA)>

<!ELEMENT magazine (title, editor)>


<!ELEMENT editor (#PCDATA)>

<!-- Attribute Declarations -->


1. DTD Declaration
a. Internal DTD: Defined within the XML document.

<!DOCTYPE root-element [
<!-- Internal DTD declarations here -->
]>

b. External DTD: Defined outside the XML document.


Example of External DTD reference:

<!DOCTYPE root-element SYSTEM "filename.dtd">


2. Element Declarations
Element declarations in a DTD define the name of the element and the
type of content it can hold. The declaration specifies whether the element
can contain text, other elements, or be empty.

Syntax
<!ELEMENT element-name content-model>

• element-name: The name of the element being declared.


• content-model: Defines what the element can contain. It could be text,
other elements, a combination of both, or it might be empty.
Content Models
1. Text Content (#PCDATA)/simple element: The element can contain
parsed character data (text).
<!ELEMENT element-name (#PCDATA)>
Example:
<!ELEMENT title (#PCDATA)>
• Declaring a title element that contains only text

2. Element Content/Compound elements: The element can contain other
child elements.
<!ELEMENT element-name (child1, child2, ...)>
Example:

<!ELEMENT book (title, author, publisher)>


3. Mixed Content: The element can contain both text and child elements.
<!ELEMENT element-name (#PCDATA | child1 | child2)*>
Example:
<!ELEMENT paragraph (#PCDATA | bold | italic)*>

4. Empty Content (EMPTY)/Standalone element: The element cannot contain any
content.
<!ELEMENT element-name EMPTY>
Example:
<!ELEMENT br EMPTY>

5. Any Content (ANY)/Unrestricted element: The element can contain any type of
content, including text and child elements.
<!ELEMENT element-name ANY>
Example:
<!ELEMENT note ANY>
Example DTD covering the five content models:

<!DOCTYPE document [
<!-- Text Content -->
<!ELEMENT title (#PCDATA)>
<!-- Element Content -->
<!ELEMENT book (title, author, publisher)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!-- Mixed Content -->
<!ELEMENT paragraph (#PCDATA | bold | italic)*>
<!ELEMENT bold (#PCDATA)>
<!ELEMENT italic (#PCDATA)>
<!-- Empty Content -->
<!ELEMENT br EMPTY>
<!-- Any Content -->
<!ELEMENT content ANY>
]>

Matching XML fragments:

<document>
<title>Sample Title</title> <!-- Simple element -->
</document>

<library>
<book> <!-- Compound element -->
<title>Sample Book</title>
<author>Author Name</author>
<publisher>Publisher Name</publisher>
</book>
</library>

<document>
<paragraph>This is <bold>bold</bold> and <italic>italic</italic> text.</paragraph> <!-- Mixed content -->
</document>

<document>
<p>This is a line.<br/></p> <!-- Empty element -->
</document>

<document>
<content> <!-- Unrestricted element -->
<title>Sample Title</title>
<description>This can contain any elements.</description>
</content>
</document>
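
Mixed content is the model that most affects how a program reads the data, because text and child elements are interleaved. The sketch below shows how Python's ElementTree represents the paragraph example; it is one parser's representation, not something defined by the DTD.

```python
import xml.etree.ElementTree as ET

paragraph = ET.fromstring(
    "<paragraph>This is <bold>bold</bold> and <italic>italic</italic> text.</paragraph>"
)

# Text before the first child is stored on the parent element.
leading_text = paragraph.text

# Text that follows a child is stored as that child's "tail".
pieces = [(child.tag, child.text, child.tail) for child in paragraph]

# itertext() walks all text in document order.
full_text = "".join(paragraph.itertext())

print(repr(leading_text))
print(pieces)
print(full_text)
```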
Occurrence Indicator:

Sometimes it is necessary to specify how many times an element may occur in a
document, which is done with an occurrence indicator. When no occurrence
indicator is specified, the child element must occur exactly once in the XML document.

Operator             Syntax   Description
None                 a        Exactly one occurrence of a
* (Asterisk)         a*       Zero or more occurrences of a, i.e. any number of times
+ (Plus)             a+       One or more occurrences of a, i.e. at least once
? (Question mark)    a?       Zero or one occurrence of a, i.e. at most once
Declaring multiple children:
• Elements with multiple children are declared with names of the child
elements inside parenthesis. The child elements must also be
declared.

Operator          Syntax          Description
, (Sequence)      a,b             a followed by b
| (Choice)        a|b             a or b
() (Singleton)    (expression)    Expression is treated as a unit
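
Because the sequence, choice, and occurrence operators behave like regular-expression operators, a content model can be checked with a small translation into a regular expression. The sketch below is an illustration under stated assumptions, not a validator: it handles element-only content models with ASCII names (no #PCDATA, EMPTY, or ANY), and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def content_model_to_regex(model):
    """Translate an element-only content model into a compiled regex."""
    compact = re.sub(r"\s+", "", model)
    # Each child name becomes a group that matches "name;".
    pattern = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + ";)", compact)
    # A comma means "followed by", which is plain concatenation in a regex.
    return re.compile(pattern.replace(",", ""))


def children_match(element, model):
    """Return True if the element's children satisfy the content model."""
    child_names = "".join(child.tag + ";" for child in element)
    return content_model_to_regex(model).fullmatch(child_names) is not None


BOOK_MODEL = "(title, author+, publisher?, price, isbn, year?)"
good = ET.fromstring("<book><title/><author/><author/><price/><isbn/></book>")
bad = ET.fromstring("<book><author/><price/><isbn/></book>")  # title is missing
mixed_choice = ET.fromstring("<list><a/><b/><a/></list>")

print(children_match(good, BOOK_MODEL))
print(children_match(bad, BOOK_MODEL))
print(children_match(mixed_choice, "(a | b)*"))
```

The operators ?, *, + and | are passed through unchanged, which is why the translation stays this short.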
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE books [
<!ELEMENT books (book*)> <!-- A list of zero or more books -->

<!ELEMENT book (title, author+, publisher?, price, isbn, year?)> <!-- A book must have one title, at least one author,
an optional publisher, one price, one ISBN, and an optional year -->

<!ELEMENT title (#PCDATA)> <!-- The title of the book -->


<!ELEMENT author (#PCDATA)> <!-- The author(s) of the book; there must be at least one -->
<!ELEMENT publisher (#PCDATA)> <!-- The publisher of the book (optional) -->
<!ELEMENT price (#PCDATA)> <!-- The price of the book -->
<!ELEMENT isbn (#PCDATA)> <!-- The ISBN of the book -->
<!ELEMENT year (#PCDATA)> <!-- The year the book was published (optional) -->
]>

<books>
<book>
<title>XML Developer's Guide</title>
<author>John Doe</author>
<publisher>Example Press</publisher>
<price>44.95</price>
<isbn>1234567890</isbn>
<year>2020</year>
</book>
</books>
3. Attribute Declaration
In XML DTD (Document Type Definition), an attribute is a property associated with an
element that provides additional information about that element. Attributes are used to
define data about elements in XML documents and to constrain their values.

Attribute Declaration Syntax:


<!ATTLIST element-name attribute-name attribute-type default-value>

• element-name: The name of the element to which the attribute belongs.


• attribute-name: The name of the attribute.
• attribute-type: The type of the attribute.
• default-value: The default value of the attribute (can be #REQUIRED, #IMPLIED, or
#FIXED, or a default value).

a. Attribute Types:

CDATA - Character data.


ID - Unique identifier.
IDREF - Reference to an ID.
IDREFS - List of IDREFs.
NMTOKEN - Name token (one or more name characters; unlike a name, it may start with a digit, dot, or hyphen).
NMTOKENS - List of NMTOKENs.
(value1 | value2 | ...) - Enumeration of values.
b. Default Values:

#REQUIRED - Attribute must be provided.


#IMPLIED - Attribute is optional.
#FIXED "value" - Attribute is fixed to the specified value.
"default-value" - Default value if not provided.

• Example:

<!ATTLIST person
id ID #REQUIRED
gender (male | female) "male"
>
Attribute Types
1. CDATA: Character data.
<!ATTLIST element-name attribute-name CDATA #IMPLIED>
Example:
<!ATTLIST book isbn CDATA #REQUIRED>
Example XML: <book isbn="1234567890">...</book>

2. ID: A unique identifier within the document.

<!ATTLIST element-name attribute-name ID #REQUIRED>


Example:
<!ATTLIST book id ID #REQUIRED>
Example XML: <book id="b1">...</book>
3. IDREF: References an ID value.
<!ATTLIST element-name attribute-name IDREF #IMPLIED>
Example:
<!ATTLIST book refID IDREF #IMPLIED>
Example XML: <book refID="b1">...</book>

4. IDREFS: A space-separated list of IDREF values.


<!ATTLIST element-name attribute-name IDREFS #IMPLIED>

5. NMTOKEN: A valid XML name token.


<!ATTLIST element-name attribute-name NMTOKEN #IMPLIED>
Example: <!ATTLIST book language NMTOKEN #IMPLIED>
Example XML: <book language="English">...</book>
6. NMTOKENS: A space-separated list of NMTOKEN values.

<!ATTLIST element-name attribute-name NMTOKENS #IMPLIED>

7. Enumeration: A list of possible values.

<!ATTLIST element-name attribute-name (value1 | value2 | ...) default-value>

Example: <!ATTLIST book category (fiction | nonfiction | technical) "fiction">


Example XML:<book category="technical">...</book>
• Enumerated type: enumerated attribute values are used when the
attribute value must be one of a fixed set of values. There are two
kinds of enumerated types:
• Enumeration: the attribute is defined by a list of acceptable values from which
the document author must choose. The values are given explicitly in the
declaration, separated by a pipe (|).
• <!ATTLIST employee gender (male|female) #REQUIRED>
• Notation: the attribute value must be the name of a NOTATION declared in the DTD.
Notations specify the format of non-XML data; a common use is to
describe MIME types such as image/gif or image/jpeg.
• <!NOTATION jpg SYSTEM "image/jpeg">
• <!ENTITY logo SYSTEM "logo.jpg" NDATA jpg>
• <!ATTLIST photo format NOTATION (jpg) #IMPLIED>
Default Values
• #REQUIRED: The attribute must be provided in the XML document.

<!ATTLIST element-name attribute-name CDATA #REQUIRED>


• #IMPLIED: The attribute is optional and may be omitted.

<!ATTLIST element-name attribute-name CDATA #IMPLIED>


• #FIXED "value": The attribute has a fixed value that cannot be changed.

<!ATTLIST element-name attribute-name CDATA #FIXED "default-value">
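
Attribute defaults are visible even to a non-validating parser when the declarations sit in the internal DTD subset. The sketch below uses Python's expat-based ElementTree parser; the library and book elements and their attributes are made up for the example, and no validation takes place.

```python
import xml.etree.ElementTree as ET

document = """<!DOCTYPE library [
<!ATTLIST book
  format (hardcover | paperback | ebook) "paperback"
  language CDATA #IMPLIED
  edition CDATA #FIXED "1"
>
]>
<library>
  <book format="ebook"/>
  <book/>
</library>"""

library = ET.fromstring(document)
first, second = library.findall("book")

# An explicit value wins; otherwise the declared default is supplied.
print(first.get("format"), second.get("format"))

# #IMPLIED has no default, so a missing attribute stays missing.
print(second.get("language"))

# A #FIXED value is supplied like any other default.
print(second.get("edition"))
```

Defaults declared in an external DTD file would not be picked up this way unless the parser actually reads that file.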


Wild Card Characters:

|: Choice between options.


*: Zero or more occurrences.
+: One or more occurrences.
?: Zero or one occurrence.
(): Grouping elements or content.
#: Special keywords for attribute qualifiers.
4. Entity
In XML DTD (Document Type Definition), an entity is a way to define reusable
pieces of data that can be referenced within an XML document. Entities
can represent text, external files, or unparsed data and are used to simplify
and manage repetitive content or to handle external resources.

Types of Entities in XML DTD


1. Internal Entities: Defined directly within the DTD. They are replaced by
their defined content when the XML document is parsed.

2. External Entities: Refer to an external file or resource. The external file's
content is included at the location where the entity is referenced.
1. Internal Entities: Defined directly within the DTD. They are
replaced by their defined content when the XML document is
parsed.

Syntax:
<!ENTITY entity-name "entity-content">
• Example:
<!ENTITY author "BalaguruSwamy">
Usage in XML:

<document>
<author>&author;</author>
</document>

Resulting XML:

<document>
<author>BalaguruSwamy</author>
</document>
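
The replacement of an internal entity can be observed directly with a parser. A minimal sketch with Python's ElementTree, which reads the internal DTD subset and expands the reference while parsing:

```python
import xml.etree.ElementTree as ET

document = """<!DOCTYPE document [
<!ENTITY author "BalaguruSwamy">
]>
<document>
<author>&author;</author>
</document>"""

root = ET.fromstring(document)

# The reference &author; has already been replaced by its content.
author_text = root.findtext("author")
print(author_text)

# A reference to an entity that was never declared is rejected.
try:
    ET.fromstring("<document><author>&unknown;</author></document>")
    undeclared_accepted = True
except ET.ParseError:
    undeclared_accepted = False
print(undeclared_accepted)
```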
2. External Entities: Refer to an external file or resource. The external
file's content is included at the location where the entity is referenced.
Syntax:

<!ENTITY entity-name SYSTEM "URI">


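• Example:
<!ENTITY extEntity SYSTEM "external.txt">
• Usage in XML:

<document>
&extEntity;
</document>

This would include the content of external.txt at the point where &extEntity; is referenced.
An external parsed entity may be referenced only in element content, not inside an
attribute value, and its file must contain text or well-formed XML. Binary files such
as images are declared as unparsed entities instead.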
3. Unparsed Entities: Used for referring to data that is not parsed as XML. Often
used for binary data or non-XML formats.
• Syntax:

<!ENTITY entity-name SYSTEM "URI" NDATA notation-name>


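• Example:
<!NOTATION png SYSTEM "image/png">
<!ENTITY logo SYSTEM "logo.png" NDATA png>
<!ATTLIST graphic file ENTITY #REQUIRED>
• Usage in XML:

<document>
<graphic file="logo"/>
</document>

Note: The NDATA declaration indicates that the entity refers to unparsed data and should be
handled by an application specified by the notation-name. The notation name must be a declared
NOTATION, and an unparsed entity is referenced by its bare name in an attribute of type ENTITY,
never as &logo;.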
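
Since the parser does not read an unparsed entity, it only reports the declaration and leaves the rest to the application. The sketch below collects those reports with Python's standard xml.sax module. It assumes a notation named png and an attribute of type ENTITY; the class name DeclarationCollector is hypothetical.

```python
import io
import xml.sax
from xml.sax.handler import DTDHandler

document = b"""<!DOCTYPE document [
<!NOTATION png SYSTEM "image/png">
<!ENTITY logo SYSTEM "logo.png" NDATA png>
<!ATTLIST graphic file ENTITY #REQUIRED>
]>
<document>
<graphic file="logo"/>
</document>"""


class DeclarationCollector(DTDHandler):
    """Record notation and unparsed-entity declarations reported by the parser."""

    def __init__(self):
        self.notations = {}
        self.unparsed_entities = {}

    def notationDecl(self, name, publicId, systemId):
        self.notations[name] = systemId

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.unparsed_entities[name] = (systemId, ndata)


collector = DeclarationCollector()
parser = xml.sax.make_parser()
parser.setDTDHandler(collector)
parser.parse(io.BytesIO(document))  # logo.png itself is never opened

print(collector.notations)
print(collector.unparsed_entities)
```

An application would look up the attribute value "logo" in the collected entities to find the file and its notation.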
Entity Declaration Syntax
Here’s a summary of the syntax for declaring entities in a DTD:

1. Internal Entity:

<!ENTITY entity-name "entity-content">


2. External Entity:

<!ENTITY entity-name SYSTEM "URI">


3. Unparsed Entity (External Entity with Notation):

<!ENTITY entity-name SYSTEM "URI" NDATA notation-name>


 Example DTD with Entities

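<!DOCTYPE document [
<!-- Notations identify the formats of the unparsed entities -->
<!NOTATION png SYSTEM "image/png">
<!NOTATION doc SYSTEM "application/msword">

<!-- Internal Entity -->
<!ENTITY author "John Doe">

<!-- External Entity (parsed): may be referenced only in element content -->
<!ENTITY extEntity SYSTEM "external.txt">

<!-- Unparsed Entities: named in attributes of type ENTITY -->
<!ENTITY logo SYSTEM "logo.png" NDATA png>
<!ENTITY file SYSTEM "file.doc" NDATA doc>

<!ATTLIST img src ENTITY #REQUIRED>
<!ATTLIST attachment file ENTITY #REQUIRED>
]>

<document>
<title>Author Info</title>
<author>&author;</author>
<note>&extEntity;</note>
<img src="logo"/>
<attachment file="file"/>
</document>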
5. Notation Declaration
Define notations for non-XML data.

<!NOTATION notation-name SYSTEM "external-identifier">

• Example:

<!NOTATION gif SYSTEM "image/gif">


6. Comments
Comments in DTD are similar to those in XML:

<!-- This is a comment -->


CDATA vs PCDATA
CDATA and PCDATA are terms used to specify the type of content that
elements can contain. They help define how data within elements
should be treated by XML parsers.
1. CDATA (Character Data): Text that is treated as raw character data. Any special
characters or markup inside a CDATA section are passed through as-is and are not
interpreted by the XML parser.

• Syntax in DTD: CDATA appears in a DTD only as an attribute type, where it simply means
a string value; element content is declared with a content model such as (#PCDATA).
<!ATTLIST element-name attribute-name CDATA #IMPLIED>
Usage in XML: CDATA sections, written <![CDATA[ ... ]]>, are used to include text that might
otherwise be interpreted as XML markup. This is especially useful for including code, scripts,
or text with special characters.

• Example in XML:
<example>
<![CDATA[
<note>This is a CDATA section. < & > are not parsed as markup.</note>
]]>
</example>
In the above XML, the CDATA section preserves the text exactly as it is, without parsing the <, >, and
& characters as XML markup or entities.
2. PCDATA (Parsed Character Data): Represents text that is parsed
by the XML processor. Markup and entity references within PCDATA
content are interpreted according to XML rules; for example, &amp; is read as &.

• Syntax in DTD:
<!ELEMENT element-name (#PCDATA)>
Usage in XML: PCDATA is the default type of text content in XML elements. The special
characters < and & must be written as entity references (&lt; and &amp;).

• Example in XML:

<example>
<text>This is PCDATA. &lt; &amp; &gt; are parsed and must be escaped.</text>
</example>
In the above XML, the <, >, and & characters are written as &lt;, &gt;, and &amp;,
and the parser converts them back to the literal characters when the XML is processed.
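
From the application's point of view, a CDATA section and escaped PCDATA deliver the same text; only the way it is written in the file differs. A short sketch with Python's ElementTree, using made-up sample text:

```python
import xml.etree.ElementTree as ET

# The same characters, once inside a CDATA section and once escaped.
with_cdata = ET.fromstring("<example><![CDATA[< & > are kept as typed]]></example>")
with_escapes = ET.fromstring("<example>&lt; &amp; &gt; are kept as typed</example>")

cdata_text = with_cdata.text
escaped_text = with_escapes.text
print(cdata_text == escaped_text)

# When written back out, ElementTree uses entity references, not CDATA.
serialized = ET.tostring(with_cdata, encoding="unicode")
print(serialized)
```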
weather_report.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE weather_report [
<!ELEMENT weather_report (location, date, forecast+)>
<!ELEMENT location (city, country, coordinates?)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ELEMENT coordinates (latitude, longitude)>
<!ELEMENT latitude (#PCDATA)>
<!ELEMENT longitude (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT forecast (time_of_day, temperature, humidity, wind_speed, conditions)>
<!ELEMENT time_of_day (#PCDATA)>
<!ELEMENT temperature (#PCDATA)>
<!ELEMENT humidity (#PCDATA)>
<!ELEMENT wind_speed (#PCDATA)>
<!ELEMENT conditions (#PCDATA)>
<!ATTLIST weather_report
unit CDATA "metric"
>
<!ATTLIST forecast
wind_direction CDATA #IMPLIED
>
]>
<weather_report>
<location>
<city>Mumbai</city>
<country>India</country>
<coordinates>
<latitude>19.0760N</latitude>
<longitude>72.8777E</longitude>
</coordinates>
</location>
<date>2024-08-22</date>
<forecast wind_direction="SW">
<time_of_day>Morning</time_of_day>
<temperature>28°C</temperature>
<humidity>85%</humidity>
<wind_speed>20 km/h</wind_speed>
<conditions>Cloudy</conditions>
</forecast>
<forecast>
<time_of_day>Afternoon</time_of_day>
<temperature>32°C</temperature>
<humidity>70%</humidity>
<wind_speed>18 km/h</wind_speed>
<conditions>Sunny</conditions>
</forecast>
</weather_report>
Library.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library [
<!ELEMENT library (book+)>
<!ELEMENT book (title, author+, publisher?, year, genre, isbn)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT genre (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
<!ATTLIST book
format (hardcover | paperback | ebook) "paperback"
language CDATA "English"
>
]>
<library>
<book format="hardcover" language="Hindi">
<title>Godaan</title>
<author>Munshi Premchand</author>
<publisher>Saraswati Press</publisher>
<year>1936</year>
<genre>Fiction</genre>
<isbn>9788170281355</isbn>
</book>
<book>
<title>Wings of Fire</title>
<author>A.P.J. Abdul Kalam</author>
<author>Arun Tiwari</author>
<year>1999</year>
<genre>Biography</genre>
<isbn>8173711461</isbn>
</book>
<book format="ebook">
<title>Ignited Minds</title>
<author>A.P.J. Abdul Kalam</author>
<year>2002</year>
<genre>Non-Fiction</genre>
<isbn>0143424127</isbn>
</book>
</library>
Employee.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE employees SYSTEM "employees.dtd">
<employees>
<employee id="E001">
<name>John &nameSeparator; Doe</name>
<position>Software Engineer</position>
<department>Development</department>
<email>john.doe@&defaultDomain;</email>
<description>&jobDescription;</description>
</employee>
<employee id="E002">
<name>Jane Smith</name>
<position>Project Manager</position>
<department>Management</department>
<email>jane.smith@&defaultDomain;</email>
<description>&jobDescription;</description>
</employee>
<employee id="E003">
<name>Emily Johnson</name>
<position>UX Designer</position>
<department>Design</department>
<email>emily.johnson@&defaultDomain;</email>
<description>&jobDescription;</description>
</employee>
</employees>

employees.dtd
<!ELEMENT employees (employee+)>
<!ELEMENT employee (name, position, department?, email, description)>
<!ATTLIST employee id CDATA #REQUIRED
status (active | inactive) "active">
<!ELEMENT name (#PCDATA)>
<!ELEMENT position (#PCDATA)>
<!ELEMENT department (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT description (#PCDATA)>

<!-- Customized Entities -->
<!ENTITY nameSeparator ", ">
<!ENTITY defaultDomain "example.com">
<!ENTITY jobDescription "This employee is part of the team.">

<!-- Internal Entity -->
<!ENTITY amp "&#38;#38;">
<!ENTITY lt "&#38;#60;">
<!ENTITY gt "&#62;">
<!ENTITY quot "&#34;">
<!ENTITY apos "&#39;">
Limitations of DTD
• DTDs have no built-in data types such as numbers or dates
• No new data types can be created in DTDs
• The use of cardinality in DTDs is limited (only ?, *, and +; exact counts cannot be expressed)
• Namespaces are not supported
• DTDs provide very limited support for modularity and reuse
• We cannot put any restrictions on text content
• Defaults for elements cannot be specified
• We have very little control over mixed content
• DTDs are written in their own non-XML syntax, so they cannot be processed with ordinary XML tools
XML namespaces
• XML namespaces are a mechanism in XML to avoid name conflicts by
qualifying element and attribute names with a unique identifier.
• They are essential when combining XML documents from different XML
vocabularies or when different XML vocabularies are used together in a
single document.
• Name collision occurs when elements from two or more documents share
the same name.

• Name collision is not a problem if you are not concerned with validation,
because the document content only needs to be well-formed. However, name collision will
keep a document from being validated.
This figure shows name collision
Benefits of Using XML Namespaces:
• Prevents Name Conflicts: Avoids collisions when combining XML
documents from different sources.
• Supports Modularity: Encourages modularity by allowing XML
documents to use elements and attributes from different
vocabularies.
• Facilitates Integration: Enables the integration of data from various
sources without name clashes.
• Promotes Standardization: Helps in adhering to industry standards
where specific namespaces are used for standard elements and
attributes.
Namespace Declaration Syntax
• A namespace is a defined collection of element and attribute names.

• Names that belong to the same namespace must be unique.


• Elements can share the same name if they reside in different
namespaces.

• Namespaces must be declared before they can be used.

A namespace is declared with an attribute in the start tag of an element, most often
the root element. The syntax for an attribute used to declare a namespace is:

xmlns:prefix="URI"

• Where URI is a Uniform Resource Identifier that assigns a unique name to
the namespace, and prefix is a short name that associates each
element or attribute in the document with the declared namespace.
• The URI need not be a Web address. It serves only as a unique identifier, and
parsers do not retrieve anything from it.
APPLYING A NAMESPACE TO AN
ELEMENT
• Once it has been declared and its URI specified, the namespace is applied to
elements and attributes by inserting the namespace prefix before each element
name that belongs to the namespace.

<prefix:element>
content
</prefix:element>

• Here, prefix is the namespace prefix and element is the local part of the element
name.
Apply namespace to attribute
<book xmlns:bk="http://example.com/book"
xmlns:trans="http://example.com/translation"
bk:title="Mastering XML" bk:lang="en">

<translation trans:title="Dominar XML" trans:lang="es">
<trans:translator>Juan Pérez</trans:translator>
</translation>
</book>
a. Example Without Namespaces
Imagine you have two XML vocabularies—one for books and one for magazines. Both vocabularies
have an element named title, but they refer to different things. Here’s how an XML document might
look without namespaces:

<library>
<item>
<title>Mastering XML</title> <!-- Is this a book or a magazine? -->
<author>Jane Doe</author>
</item>
<item>
<title>Monthly Tech</title> <!-- Is this a book or a magazine? -->
<editor>John Smith</editor>
</item>
</library>

In this example, there’s no way to distinguish whether the title refers to a book or a magazine. This
could lead to confusion or errors in processing the document.
b. Example With Namespaces
Use namespaces to clearly differentiate between books and magazines:

<library xmlns:bk="http://example.com/books"
xmlns:mag="http://example.com/magazines">
<bk:book>
<bk:title>Mastering XML</bk:title>
<bk:author>Jane Doe</bk:author>
</bk:book>
<mag:magazine>
<mag:title>Monthly Tech</mag:title>
<mag:editor>John Smith</mag:editor>
</mag:magazine>
</library>
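
A namespace-aware parser keeps the two title elements apart by combining each local name with its namespace URI. The sketch below shows how Python's ElementTree reports the names, assuming the namespace URIs http://example.com/books and http://example.com/magazines; the {URI}name notation is ElementTree's own convention.

```python
import xml.etree.ElementTree as ET

document = """<library xmlns:bk="http://example.com/books"
         xmlns:mag="http://example.com/magazines">
  <bk:book>
    <bk:title>Mastering XML</bk:title>
    <bk:author>Jane Doe</bk:author>
  </bk:book>
  <mag:magazine>
    <mag:title>Monthly Tech</mag:title>
    <mag:editor>John Smith</mag:editor>
  </mag:magazine>
</library>"""

library = ET.fromstring(document)

# Prefixes are replaced by the namespace URI in curly braces.
tags = [child.tag for child in library]
print(tags)

BOOKS = "{http://example.com/books}"
MAGAZINES = "{http://example.com/magazines}"

# The two title elements now have different full names.
book_title = library.findtext(f"{BOOKS}book/{BOOKS}title")
magazine_title = library.findtext(f"{MAGAZINES}magazine/{MAGAZINES}title")
print(book_title, magazine_title)
```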
Declarations
1. Namespace Declaration:

• An XML namespace is declared using the xmlns attribute in the start tag of an element.
• The xmlns attribute's value is a URI (Uniform Resource Identifier) that serves as a
unique identifier for the namespace.

Example:

<root xmlns:prefix="http://example.com/namespace">
<prefix:child>Content</prefix:child>
</root>

Here, prefix is the namespace prefix, and http://example.com/namespace is the
namespace URI.
2. Default Namespace:

If an xmlns attribute is declared without a prefix, it defines a default
namespace for the element and its descendants.
Example:

<root xmlns="http://example.com/namespace">
<child>Content</child>
</root>

In this case, root and all unprefixed elements within it are part of the default namespace.
Unprefixed attributes are not affected by the default namespace.
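
The effect of a default namespace can be checked with a parser. In this sketch (Python ElementTree, assuming the namespace URI http://example.com/namespace and an id attribute added only for the illustration), the unprefixed elements carry the namespace while the unprefixed attribute does not.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<root xmlns="http://example.com/namespace" id="r1"><child>Content</child></root>'
)

NS = "{http://example.com/namespace}"

# Both the declaring element and its unprefixed child are in the namespace.
root_tag = root.tag
child_tag = root[0].tag

# An unprefixed attribute keeps its plain name.
attribute_names = list(root.attrib)

# A search by the bare name therefore finds nothing.
plain_lookup = root.find("child")
qualified_lookup = root.find(NS + "child")

print(root_tag, child_tag, attribute_names)
```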
3. Using Multiple Namespaces:

Multiple namespaces can be declared within a single XML document by using different
prefixes.

Example:

<root xmlns:ns1="http://example.com/ns1" xmlns:ns2="http://example.com/ns2">


<ns1:element1>Content 1</ns1:element1>
<ns2:element2>Content 2</ns2:element2>
</root>

• ns1 and ns2 are different namespaces used to qualify element1 and element2.
4. Namespace Scope:
The scope of a namespace is limited to the element where it's declared and its
children unless overridden by a new namespace declaration.

5. XML Namespaces and Validation:


• When using XML schemas (XSD), namespaces are crucial as they help in associating
XML elements with their corresponding types defined in the schema.
• A schema can define elements from multiple namespaces and validate XML
documents accordingly.

6. Namespaces in XPath and XSLT:


• In XPath, the namespace prefix must be used to refer to elements within a
namespace.
• In XSLT (Extensible Stylesheet Language Transformations), namespaces allow you to
work with multiple XML vocabularies effectively by ensuring that the correct
elements and attributes are selected and transformed.
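
For path queries, the prefix only stands for a namespace URI, so the prefixes used in a query need not be the ones used in the document. The sketch below uses the limited XPath support in Python's ElementTree; the prefixes b and a and the sample document are chosen for this illustration.

```python
import xml.etree.ElementTree as ET

document = """<book xmlns="http://example.com/books" xmlns:auth="http://example.com/authors">
  <title>XML Guide</title>
  <auth:author>
    <auth:name>John Doe</auth:name>
  </auth:author>
</book>"""

book = ET.fromstring(document)

# Query-side prefixes are mapped to the same URIs as in the document.
prefixes = {"b": "http://example.com/books", "a": "http://example.com/authors"}

title = book.findtext("b:title", namespaces=prefixes)
author_name = book.findtext("a:author/a:name", namespaces=prefixes)

# Without a prefix the path looks for an element in no namespace.
unqualified = book.find("title")

print(title, author_name, unqualified)
```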
Example of XML with Namespaces:

<book xmlns="http://example.com/books" xmlns:auth="http://example.com/authors">


<title>XML Guide</title>
<auth:author>
<auth:name>John Doe</auth:name>
</auth:author>
</book>

Here, the book element and its title child belong to the default namespace
(http://example.com/books), while the author element and its name child belong to a
different namespace (http://example.com/authors).
Types of Declarations
Declare Namespaces:
1. Root Element (Global Scope): Applies the namespace across the
entire document.
2. Child Element (Local Scope): Restricts the namespace to a specific
element and its children.
3. Attributes: Associates a namespace with an attribute, useful for
distinguishing attributes with the same name in different contexts.
4. XML Schema: Declares namespaces for schema elements and
types, setting the target namespace for validation.
1. Root Element (Global Scope):
<library xmlns="http://example.com/library"
xmlns:bk="http://example.com/books"
xmlns:auth="http://example.com/authors">
<bk:book>
<bk:title>Mastering XML</bk:title>
<auth:author>Jane Doe</auth:author>
</bk:book>
</library>
2. Child Element (Local Scope):
<library>
<bk:book xmlns:bk="http://example.com/books">
<bk:title>Mastering XML</bk:title>
<author>Jane Doe</author> <!-- Not in the bk namespace -->
</bk:book>
</library>
3. Attributes:
<library xmlns:bk="http://example.com/books">
<bk:book bk:title="Mastering XML"
xmlns:trans="http://example.com/translation" trans:lang="en">

<trans:translator>Juan Pérez</trans:translator>
</bk:book>
</library>
4. XML Schema
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:bk="http://example.com/books"
targetNamespace="http://example.com/books">
<xs:element name="book" type="bk:BookType"/>
<xs:complexType name="BookType">
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="author" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
Namespace Examples
XML Document with Namespaces: Example1
<library xmlns="http://example.com/library" xmlns:bk="http://example.com/books"
xmlns:auth="http://example.com/authors">
<bk:book>
<bk:title>Mastering XML</bk:title>
<auth:author>
<auth:name>Jane Doe</auth:name>
<auth:birthplace>Chennai</auth:birthplace>
</auth:author>
<bk:published>2024</bk:published>
</bk:book>
<bk:book>
<bk:title>Learning XPath</bk:title>
<auth:author>
<auth:name>John Smith</auth:name>
<auth:birthplace>Pune</auth:birthplace>
</auth:author>
<bk:published>2023</bk:published>
</bk:book>
</library>
Example Explanation:
1. Namespace Declarations:
• The default namespace (xmlns="http://example.com/library") applies to the library element and any descendant
elements that don't have a prefix.
• The bk prefix (xmlns:bk="http://example.com/books") is used for elements related to books.
• The auth prefix (xmlns:auth="http://example.com/authors") is used for elements related to authors.

2. Elements in Different Namespaces:
• <bk:book>: The book element is part of the http://example.com/books namespace.
• <bk:title>, <bk:published>: These elements also belong to the http://example.com/books namespace.
• <auth:author>: The author element belongs to the http://example.com/authors namespace.
• <auth:name>, <auth:birthplace>: These elements are within the author element and belong to the
http://example.com/authors namespace.

3. XML Structure:
• The library element is the root. It has no prefix, so it belongs to the default namespace
(http://example.com/library).
• Inside the library, there are two bk:book elements, each with details like title, author, and published year.
• The author details are within the auth namespace, allowing the separation of book and author information
into distinct, manageable vocabularies.
This example shows how XML namespaces help organize elements and attributes from different vocabularies
within the same document, preventing name conflicts and making the document more modular and
maintainable.
Example Without Namespaces:
Example2: Order.xml
<orderInfo>
<id>12345</id>
<date>2024-08-24</date>
<name>John Doe</name>
<address>
<id>67890</id>
<street>Main Street</street>
<city>Bangalore</city>
<date>2024-08-22</date>
</address>
<items>
<item>
<id>001</id>
<name>Laptop</name>
<quantity>1</quantity>
</item>
<item>
<id>002</id>
<name>Mouse</name>
<quantity>2</quantity>
</item>
</items>
</orderInfo>
Issues Without Namespaces
Ambiguity:
• The id and date elements are used both in the context of orders and addresses, leading to potential
confusion. Is id referring to the order ID, the address ID, or the item ID? What does date represent?
• The name element is used for both the customer’s name and the product name, which could cause
further confusion.
Example With Namespaces:
<orderInfo xmlns:ord="http://example.com/order" xmlns:addr="http://example.com/address"
xmlns:prod="http://example.com/items">
<ord:id>12345</ord:id>
<ord:date>2024-08-24</ord:date>
<ord:name>John Doe</ord:name>
<ord:address>
<addr:id>67890</addr:id>
<addr:street>Main Street</addr:street>
<addr:city>Bangalore</addr:city>
<addr:date>2024-08-22</addr:date>
</ord:address>
<ord:items>
<ord:item>
<prod:id>001</prod:id>
<prod:name>Laptop</prod:name>
<prod:quantity>1</prod:quantity>
</ord:item>
<ord:item>
<prod:id>002</prod:id>
<prod:name>Mouse</prod:name>
<prod:quantity>2</prod:quantity>
</ord:item>
</ord:items>
</orderInfo>
Benefits With Namespaces
a. Clarity and Disambiguation:
• The ord:id, addr:id, and prod:id elements are now clearly distinct, with ord:id referring to the order ID,
addr:id to the address ID, and prod:id to the product ID.
• Similarly, ord:date and addr:date are clearly distinct, representing the order date and the address date,
respectively.
• The ord:name element is now clearly the customer’s name, while prod:name is the product name.
b. Contextual Separation:
• Each element is explicitly tied to its context (ord for order, addr for address, and prod for product),
eliminating any confusion about what each element represents.
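The disambiguation described for the namespaced order document can be checked programmatically. The following is a minimal sketch using Python's standard xml.etree.ElementTree on a shortened copy of the order. The query prefixes o, a and p are arbitrary names chosen here, because a query matches on the namespace URI and not on the prefix used in the document.

```python
import xml.etree.ElementTree as ET

ORDER = """
<orderInfo xmlns:ord="http://example.com/order"
           xmlns:addr="http://example.com/address"
           xmlns:prod="http://example.com/items">
  <ord:id>12345</ord:id>
  <ord:address>
    <addr:id>67890</addr:id>
  </ord:address>
  <ord:items>
    <ord:item><prod:id>001</prod:id></ord:item>
    <ord:item><prod:id>002</prod:id></ord:item>
  </ord:items>
</orderInfo>
"""

# Prefix-to-URI map used only by the queries below (hypothetical prefixes).
NS = {
    "o": "http://example.com/order",
    "a": "http://example.com/address",
    "p": "http://example.com/items",
}

root = ET.fromstring(ORDER)
order_id = root.findtext("o:id", namespaces=NS)             # the order's own id
address_id = root.findtext("o:address/a:id", namespaces=NS)  # the address id
product_ids = [e.text for e in root.findall(".//p:id", NS)]  # every product id
print(order_id, address_id, product_ids)
```

Each of the three id elements is retrieved without ambiguity, even though all of them share the local name id.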
Default Namespace Declaration
A default namespace declaration assigns a namespace to all unprefixed elements inside a
given scope: elements without a prefix are taken to belong to that namespace.
Unprefixed attributes are not affected. The xmlns attribute is used to declare the
default namespace.

Syntax:
<root xmlns="http://example.com/ns">
<child>Content</child>
</root>

Example: A library catalog that uses a default namespace.

<?xml version="1.0" encoding="UTF-8"?>
<library
xmlns="http://example.com/library">
<book>
<title>XML Basics</title>
<author>John Doe</author>
</book>
<book>
<title>Advanced XML</title>
<author>Jane Smith</author>
</book>
</library>
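A practical consequence of a default namespace is that unprefixed elements are still namespaced, so a query has to be namespace-aware to find them. The sketch below, written with Python's standard xml.etree.ElementTree, illustrates this on a compact copy of the catalog. The lib prefix in the query map is an arbitrary name assumed for the example.

```python
import xml.etree.ElementTree as ET

LIBRARY = """
<library xmlns="http://example.com/library">
  <book><title>XML Basics</title><author>John Doe</author></book>
  <book><title>Advanced XML</title><author>Jane Smith</author></book>
</library>
"""

root = ET.fromstring(LIBRARY)
NS = {"lib": "http://example.com/library"}  # hypothetical query prefix

unqualified = root.findall("book")        # looks for <book> in no namespace
qualified = root.findall("lib:book", NS)  # looks for <book> in the library namespace
titles = [b.findtext("lib:title", namespaces=NS) for b in qualified]
print(len(unqualified), len(qualified), titles)
```

The unqualified query finds nothing, because book written without a prefix in the document still carries the default namespace.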
Prefixed Namespace Declaration
In XML namespaces, the Prefixed Namespace Declaration method assigns a prefix to a
namespace URI. This allows elements and attributes from that namespace to be
identified with the specified prefix. Prefixed namespaces are especially beneficial
when items from different namespaces appear in the same XML document.

• Syntax:
<root xmlns:prefix="http://example.com/ns">
<prefix:child>Content</prefix:child>
</root>

Example: A book catalog that uses prefixed namespaces.

<?xml version="1.0" encoding="UTF-8"?>
<catalog
xmlns:bk="http://example.com/books"
xmlns:auth="http://example.com/authors">
<bk:book>
<bk:title>XML Basics</bk:title>
<auth:author>John Doe</auth:author>
</bk:book>
<bk:book>
<bk:title>Advanced XML</bk:title>
<auth:author>Jane Smith</auth:author>
</bk:book>
</catalog>
XML Schema
Document Type Definitions (DTDs) have several limitations,
including:
• Lack of flexibility: DTDs are not written in XML syntax, so they cannot be edited or processed with
ordinary XML tools.
• No data typing: DTDs don't support data typing, so you can't restrict element content to, for example,
a string or an integer.
• No namespace support: DTDs are not namespace-aware, so declarations cannot be imported and reused
across vocabularies.
• Slow processing times: DTDs can slow down processing times.
• Versioning: DTDs don't have built-in support for versioning XML documents.
• Vendor support: Some software vendors may not fully support DTDs.
• Element content restrictions: DTDs are a weak specification language. They can state which child
elements are allowed, but cannot restrict the text content of an element.
• Attribute value data types: DTDs only have ten data types for attribute values.
• Validation: DTD validation requires a parser for both the XML and the DTD.
DTDs (Document Type Definitions) have several limitations compared to XML Schema
Definitions (XSDs), including:
• Data types: XSDs support data types, allowing you to restrict the content of an
element. DTDs do not support data types, so you cannot restrict the content of an
element.
• Versioning: DTDs do not have built-in support for versioning, which can make it difficult to
manage changes to XML documents over time.
• Extensibility: XSDs are extensible, making it easier to derive new elements from existing
ones. DTDs are not extensible.
• Namespace support: DTDs do not support namespaces, so declarations cannot be imported and reused
across vocabularies.
• XML syntax: DTDs are not written using XML syntax, so they cannot be handled as XML documents.
• Vendor support: Some software vendors may not provide full support for DTDs.
• Multiple XML schemas: XSDs support including or importing multiple XML schemas within
an XML schema, but DTDs do not.
• Default values: XSDs support default values for elements, but DTDs do not.
XML Schema
• XML Schema Definition or XSD is a recommendation by the World Wide Web
Consortium (W3C) to describe and validate the structure and content of an XML
document.
• An XML Schema defines the structure and rules for an XML document.
• It is written in XML itself and serves as a blueprint for what an XML document can
and should look like.
• An XML Schema specifies the elements, attributes, and their data types that an
XML document may contain, as well as the relationships between them.
• The information in the XSD is used to verify if each element, attribute or data type
in the document matches its description.
• An XSD is similar to earlier XML schema languages, such as Document Type
Definition (DTD), but it is a more powerful alternative as it provides greater control
over the XML structure.
Components of XML Schema (XSD)
1. Elements: These are the primary building blocks of an XML document. In a schema, you define
the elements that can appear in the XML document, their order, and their data types.
2. Attributes: These provide additional information about elements. In an XML Schema, you define
which attributes are allowed or required for each element and what their data types are.
3. Data Types: XML Schema supports a variety of data types, such as strings, integers, dates, and
custom types. This ensures that the data in the XML document adheres to specific formats.
4. Complex Types: These define elements that contain other elements and/or attributes. Complex
types are used to create more sophisticated structures within an XML document.
5. Simple Types: These define elements or attributes that contain only text and no child elements or
attributes. Simple types can also be restricted to certain patterns or values.
6. Sequence, Choice, and All: These are compositors used to define the order and occurrence of
child elements.
• Sequence: Child elements must appear in the specified order.
• Choice: One and only one of the child elements can appear.
• All: All child elements must appear, but in any order.
7. Namespaces: Namespaces are used in XML Schema to distinguish between elements and
attributes that may have the same name but different meanings in different contexts.
Usage
An XML document validated against this schema must conform to the rules
defined, ensuring that the data is consistent and follows the expected structure.

Benefits of Using XML Schema


1. Validation: Checks that well-formed XML documents are also valid
according to predefined rules.
2. Data Integrity: By specifying data types and structure, XML Schema helps
maintain data integrity.
3. Interoperability: Helps in the exchange of structured data between different
systems and applications.
<?xml version="1.0" encoding="UTF-8"?>
<Bookstore xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://example.com/bookstore bookstore.xsd"
xmlns="http://example.com/bookstore">
<Book id="B001">
<Title>The Great Gatsby</Title>
<Author>F. Scott Fitzgerald</Author>
<ISBN>9780743273565</ISBN>
<Publisher>Scribner</Publisher>
<Edition>1</Edition>
<Price>10.99</Price>
</Book>
<Book id="B002">
<Title>To Kill a Mockingbird</Title>
<Author>Harper Lee</Author>
<ISBN>9780061120084</ISBN>
<Publisher>Harper Perennial</Publisher>
<Edition>1</Edition>
<Price>7.99</Price>
</Book>
</Bookstore>
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://example.com/bookstore"
xmlns="http://example.com/bookstore"
elementFormDefault="qualified">
<!-- Root element -->
<xs:element name="Bookstore">
<xs:complexType>
<xs:sequence>
<xs:element name="Book" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="Title" type="xs:string"/>
<xs:element name="Author" type="xs:string"/>
<xs:element name="ISBN" type="xs:string"/>
<xs:element name="Publisher" type="xs:string"/>
<xs:element name="Edition" type="xs:string"/>
<xs:element name="Price" type="xs:decimal"/>
</xs:sequence>
<xs:attribute name="id" type="xs:string" use="required"/>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Linking the XSD to the XML File
To link the bookstore XML and XSD files shown above:

Step 1: Ensure Both Files are Saved
• Save the XML file as bookstore.xml.
• Save the XSD file as bookstore.xsd.
Step 2: Add Schema Location in the XML File
In the root element of your XML file (<Bookstore>), add the following attributes:
1. xmlns:xsi: Declares the XML Schema Instance namespace.
2. xsi:schemaLocation: Specifies the namespace and the location of the XSD file.
3. xmlns: Declares the default namespace for the elements of the document.
<?xml version="1.0" encoding="UTF-8"?>
<Bookstore
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://example.com/bookstore bookstore.xsd"
xmlns="http://example.com/bookstore">
<Book id="B001">
<Title>The Great Gatsby</Title>
<Author>F. Scott Fitzgerald</Author>
<ISBN>9780743273565</ISBN>
<Publisher>Scribner</Publisher>
<Edition>1</Edition>
<Price>10.99</Price>
</Book>
<Book id="B002">
<Title>To Kill a Mockingbird</Title>
<Author>Harper Lee</Author>
<ISBN>9780061120084</ISBN>
<Publisher>Harper Perennial</Publisher>
<Edition>1</Edition>
<Price>7.99</Price>
</Book>
</Bookstore>
Explanation of the Attributes:
1. xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance":
• Declares the XML Schema Instance namespace, which is required for using schema-related attributes like
xsi:schemaLocation.
2. xsi:schemaLocation="http://example.com/bookstore bookstore.xsd":
• Specifies the schema location for the http://example.com/bookstore namespace.
• The first part, http://example.com/bookstore, is the namespace for the elements in your XML file. It must
match the targetNamespace declared in the XSD.
• The second part, bookstore.xsd, is the path to your XSD file. If the XSD file is in the same directory as the
XML file, you can just use the file name. Otherwise, provide the relative or absolute path to the XSD file.
3. xmlns="http://example.com/bookstore":
• Declares the default namespace for the XML document, ensuring all unprefixed elements within the
document are associated with the specified namespace.
1. Elements
Elements are the primary building blocks in XML Schema. They define the structure
and content of the XML document.

a. Simple Element: A simple element can contain only text. It cannot contain any child
elements or attributes.

Syntax: <xs:element name="elementName" type="xs:dataType"/>

Example:

<xs:element name="firstName" type="xs:string"/>


This defines an element <firstName> that can contain a string.
b. Complex Element: A complex element can contain other elements and/or attributes.

Syntax:
<xs:element name="elementName">
<xs:complexType>
<!-- Child elements and/or attributes -->
</xs:complexType>
</xs:element>

Example:
<xs:element name="address">
<xs:complexType>
<xs:sequence>
<xs:element name="street" type="xs:string"/>
<xs:element name="city" type="xs:string"/>
<xs:element name="state" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>

This defines an <address> element with child elements <street>, <city>, and <state>.
2. Attributes
Attributes provide additional information about elements. Unlike
elements, attributes cannot contain other elements.

Syntax:
<xs:attribute name="attributeName" type="xs:dataType" use="optional|required"/>

Example:
<xs:attribute name="id" type="xs:int" use="required"/>
This defines an id attribute that must be an integer and is required.
3. Data Types
XML Schema supports built-in data types, which ensure that the data conforms to a specific format.

Common Built-in Data Types:


xs:string for text.
xs:int for integer numbers.
xs:boolean for true/false values.
xs:date for dates (YYYY-MM-DD).
xs:decimal for decimal numbers.

Example:

<xs:element name="age" type="xs:int"/>


<xs:element name="birthday" type="xs:date"/>

This defines an age element as an integer and a birthday element as a date.


XML document:
<?xml version="1.0"?>
<class xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.example.com/class text.xsd"
xmlns="http://www.example.com/class">
<student>
<firstname>raju</firstname>
<age>20</age>
</student>
</class>

XML Schema (text.xsd):
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.example.com/class"
xmlns="http://www.example.com/class"
elementFormDefault="qualified">
<xs:element name="class">
<xs:complexType>
<xs:sequence>
<xs:element name="student">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="age" type="xs:int"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
4. Complex Types
Complex types define elements that contain other elements and/or attributes. They allow for the creation of nested and
hierarchical structures.

Syntax:
<xs:complexType name="complexTypeName">
<!-- Define child elements and/or attributes here -->
</xs:complexType>

Example:

<xs:complexType name="personType">
<xs:sequence>
<xs:element name="firstName" type="xs:string"/>
<xs:element name="lastName" type="xs:string"/>
<xs:element name="age" type="xs:int"/>
</xs:sequence>
<xs:attribute name="gender" type="xs:string"/>
</xs:complexType>
<xs:element name="person" type="personType"/>
5. Simple Types
Simple types are used to restrict or define custom data types for elements and attributes.
Restriction: Restriction limits the value of a simple type.
Syntax: <xs:simpleType name="simpleTypeName">
<xs:restriction base="xs:dataType">
<!-- Constraints like minLength, maxLength, pattern, etc. -->
</xs:restriction>
</xs:simpleType>
Example:

<xs:simpleType name="zipcodeType">
<xs:restriction base="xs:string">
<xs:pattern value="\d{5}"/>
</xs:restriction>
</xs:simpleType>

<xs:element name="zipcode" type="zipcodeType"/>


This defines a zipcodeType that must be a 5-digit string.
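To get a feel for which values the pattern facet accepts, the restriction can be approximated outside a schema processor. The sketch below uses Python's re module as a stand-in and is not an XSD validator. The function name is_valid_zipcode and the sample values are made up for illustration, and Python's regular expression dialect only coincides with the XSD dialect for simple patterns such as this one.

```python
import re

# Rough Python stand-in for <xs:pattern value="\d{5}"/>.
# An XSD pattern must match the whole value, so fullmatch is used instead of search.
ZIPCODE = re.compile(r"\d{5}")

def is_valid_zipcode(value: str) -> bool:
    """Return True if the whole value consists of exactly five digits."""
    return ZIPCODE.fullmatch(value) is not None

samples = ["56000", "560001", "5600a", " 56000"]
results = {s: is_valid_zipcode(s) for s in samples}
print(results)
```

Only the first sample passes: the others are too long, contain a letter, or carry a leading space, which xs:string preserves.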
6. Sequence, Choice, and All
These are compositors that define the order and occurrence of child elements.

a. Sequence: Specifies that child elements must appear in the specified order.

Syntax:

<xs:sequence>
<!-- Define child elements here -->
</xs:sequence>
Example:

<xs:complexType name="addressType">
<xs:sequence>
<xs:element name="street" type="xs:string"/>
<xs:element name="city" type="xs:string"/>
<xs:element name="state" type="xs:string"/>
</xs:sequence>
</xs:complexType>
b. Choice:
Specifies that exactly one of the child elements can appear.

Syntax:

<xs:choice>
<!-- Define child elements here -->
</xs:choice>

Example:

<xs:complexType name="contactInfoType">
<xs:choice>
<xs:element name="email" type="xs:string"/>
<xs:element name="phone" type="xs:string"/>
</xs:choice>
</xs:complexType>
c. All:
Specifies that all child elements must appear, but in any order.

Syntax:

<xs:all>
<!-- Define child elements here -->
</xs:all>

Example:

<xs:complexType name="identityType">
<xs:all>
<xs:element name="firstName" type="xs:string"/>
<xs:element name="lastName" type="xs:string"/>
<xs:element name="id" type="xs:int"/>
</xs:all>
</xs:complexType>
7. Namespaces
Namespaces are used to avoid element name conflicts by qualifying names.

Syntax:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:prefix="namespaceURI">
<!-- Define elements, types, etc., using the prefix -->
</xs:schema>
Example:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:emp="http://www.example.com/employee"
targetNamespace="http://www.example.com/employee">
<xs:element name="employee" type="emp:employeeType"/>
</xs:schema>
In this example, emp is a prefix bound to the schema's target namespace. It qualifies the reference to employeeType
(whose definition is omitted here) to distinguish it from other possible employeeType definitions.
8. Import and Include
These are used to bring in definitions from other schemas.

a. Import: Used when the schemas are in different namespaces


Syntax: <xs:import namespace="namespaceURI" schemaLocation="schemaFile.xsd"/>
Example:
<xs:import namespace="http://www.example.com/customer"
schemaLocation="customer.xsd"/>
b. Include: Used when the schemas are in the same namespace.
Syntax: <xs:include schemaLocation="schemaFile.xsd"/>
Example: <xs:include schemaLocation="address.xsd"/>
9. Annotations
Annotations provide documentation within the schema, which can help describe elements, attributes, or types.

Syntax:
<xs:annotation>
<xs:documentation>
<!-- Description here -->
</xs:documentation>
</xs:annotation>

Example:

<xs:element name="employeeID" type="xs:int">
<xs:annotation>
<xs:documentation>
The unique identifier for an employee.
</xs:documentation>
</xs:annotation>
</xs:element>
10. Keys, Keyrefs, and Unique
These are used to define constraints on data, ensuring uniqueness and referential integrity.

a. Unique: Ensures that a set of elements or attributes is unique within a scope.

Syntax:

<xs:unique name="uniqueConstraintName">
<xs:selector xpath="XPath_expression"/>
<xs:field xpath="XPath_expression"/>
</xs:unique>
Example:

<xs:unique name="uniqueEmployeeID">
<xs:selector xpath=".//employee"/>
<xs:field xpath="employeeID"/>
</xs:unique>
b. Key:
Defines a unique key within a specific scope.

Syntax:

<xs:key name="keyName">
<xs:selector xpath="XPath_expression"/>
<xs:field xpath="XPath_expression"/>
</xs:key>

Example:

<xs:key name="employeeKey">
<xs:selector xpath=".//employee"/>
<xs:field xpath="employeeID"/>
</xs:key>
c. Keyref:
References a key defined elsewhere, establishing a relationship.

Syntax:

<xs:keyref name="keyrefName" refer="keyName">
<xs:selector xpath="XPath_expression"/>
<xs:field xpath="XPath_expression"/>
</xs:keyref>

Example:

<xs:keyref name="departmentEmployeeRef" refer="employeeKey">
<xs:selector xpath=".//department"/>
<xs:field xpath="employeeID"/>
</xs:keyref>
Common XSD Element Attributes
Schema element attributes in XML Schema Definition (XSD) define the properties and characteristics of the elements in
an XML document. These attributes provide important information like data types, constraints, and documentation that
help validate the structure and content of an XML document.
1. name:
Purpose: Specifies the name of the element.
Example: <xs:element name="student"/>

2. type:
Purpose: Defines the data type of the element (e.g., xs:string, xs:int, custom complex types).
Example: <xs:element name="age" type="xs:int"/>
3. minOccurs:

Purpose: Specifies the minimum number of times an element can appear.


Default: 1
Example: <xs:element name="student" minOccurs="0"/>
4. maxOccurs:

Purpose: Specifies the maximum number of times an element can appear.


Default: 1 (use "unbounded" to allow unlimited occurrences)
Example: <xs:element name="student" maxOccurs="unbounded"/>
5. default:

Purpose: Specifies a default value that is used when the element appears in the XML document but is empty.
Example: <xs:element name="country" type="xs:string" default="India"/>
6. fixed:
Purpose: Specifies a fixed value for the element. The XML document must use this exact value.
Example: <xs:element name="currency" type="xs:string" fixed="INR"/>
7. nillable:
Purpose: Indicates whether the element can be explicitly set to nil in the XML document.
Default: false
Example: <xs:element name="middlename" type="xs:string" nillable="true"/>
9. substitutionGroup:
Purpose: Allows one element to be substituted for another in an XML document.
Example: <xs:element name="fulltimeStudent" substitutionGroup="student"/>
10. form:

Purpose: Specifies whether a locally declared element must be namespace-qualified (placed in the target namespace) in the XML document.
Possible Values: qualified, unqualified
Example: <xs:element name="state" type="xs:string" form="qualified"/>
XML Schema (student.xsd):
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="class">
<xs:complexType>
<xs:sequence>
<xs:element name="student">
<xs:complexType>
<xs:sequence>
<xs:element name="Name" type="xs:string"/>
<xs:element name="Branch" type="xs:string"/>
<xs:element name="age" type="xs:int"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>

XML document:
<?xml version="1.0"?>
<class xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="student.xsd">
<student>
<Name>raju</Name>
<Branch>CSE</Branch>
<age>20</age>
</student>
</class>

Because this schema declares no targetNamespace, the document refers to it with xsi:noNamespaceSchemaLocation
rather than xsi:schemaLocation.
XSLT
Extensible Stylesheet Language Transformations
XSLT
• XSLT (Extensible Stylesheet Language Transformations) is a powerful
language used to transform XML documents into different formats like
HTML, plain text, or another XML document.
• It works by applying a set of rules (templates) defined in an XSLT
stylesheet to an XML document.
• It separates content from presentation by allowing the data (in XML)
to be transformed into a desired output (e.g., HTML for web pages).
• XSL Family: XSLT is part of the XSL family, which also includes XPath
(used to navigate XML documents) and XSL-FO (used for formatting
XML documents).
Basic Structure of an XSLT Stylesheet
An XSLT stylesheet is an XML document itself and typically starts with the following
structure:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Templates and transformations go here -->
</xsl:stylesheet>

• <xsl:stylesheet>: The root element of the XSLT document.


• xmlns:xsl: Declares the XSLT namespace.
XSLT Processing Model
• XSLT transforms an input XML document by matching nodes in the
document to templates defined in the stylesheet.
• The transformation process is driven by template rules. These
templates specify how to handle different parts of the XML
document.
Common XSLT Elements

1. <xsl:template>: Defines a template that matches specific XML elements or nodes.


<xsl:template match="/">
<!-- Content to be processed when this template is matched -->
</xsl:template>
2. <xsl:value-of>: Extracts and outputs the value of an XML element or attribute.
<xsl:value-of select="element_name"/>
3. <xsl:for-each>: Iterates over a set of nodes.
<xsl:for-each select="node_set">
<!-- Content to repeat for each node -->
</xsl:for-each>
4. <xsl:apply-templates>: Applies templates to child nodes of the current node.
<xsl:apply-templates select="child_nodes"/>
5. <xsl:if>: Conditionally processes content.
<xsl:if test="condition">
<!-- Content to process if condition is true -->
</xsl:if>
6. <xsl:choose>, <xsl:when>, <xsl:otherwise>: Used for conditional branching (like if-else
statements).
<xsl:choose>
<xsl:when test="condition1">
<!-- Content for condition1 -->
</xsl:when>
<xsl:otherwise>
<!-- Content if no conditions are met -->
</xsl:otherwise>
</xsl:choose>
7. <xsl:attribute>: Adds an attribute to an element.
<xsl:attribute name="attribute_name">
<xsl:value-of select="attribute_value"/>
</xsl:attribute>
8. <xsl:output>: Specifies the format of the output document (e.g., HTML, XML, text).
<xsl:output method="html"/>
XPath in XSLT
XPath: A language used to navigate and select nodes in an XML document.
It’s integral to XSLT as it’s used within XSLT expressions to locate nodes.
Basic XPath Syntax:
/: Selects the document root (the node that contains the top-level element).
.: Selects the current node.
//: Selects nodes from the document that match the selection,
regardless of location.
@: Selects attributes.
Example: //book/title selects all title elements within book elements.
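These selectors can be tried out without an XSLT processor. The sketch below uses Python's standard xml.etree.ElementTree, which implements only a subset of XPath and evaluates paths relative to an element, so //book/title is written as .//book/title. The catalog content is sample data invented for the illustration.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <book id="b1"><title>XML Basics</title><price>80</price></book>
  <book id="b2"><title>Advanced XML</title><price>120</price></book>
  <magazine><title>XML Monthly</title></magazine>
</catalog>
"""

root = ET.fromstring(CATALOG)

# .//book/title: every <title> that is a child of a <book>, at any depth.
book_titles = [t.text for t in root.findall(".//book/title")]
# ".": the current node, which here is the <catalog> element itself.
current = root.find(".").tag
# [@id='b2']: filter on an attribute value.
second = root.find("./book[@id='b2']/title").text
# Attribute values are read with .get() once the element has been selected.
ids = [b.get("id") for b in root.findall("book")]
print(book_titles, current, second, ids)
```

The magazine title is not selected by .//book/title, because the path requires a book parent.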
XSLT Templates
Templates: Core components in XSLT that define rules for transforming XML nodes. They consist of
two parts:
a. Match Pattern: Specifies the node or nodes the template applies to.
b. Transformation Instructions: Defines how the matched nodes should be processed and
transformed.

<xsl:template match="book">
<html>
<body>
<h1><xsl:value-of select="title"/></h1>
<p><xsl:value-of select="author"/></p>
</body>
</html>
</xsl:template>
Control Structures
Conditionals: XSLT supports conditional logic through <xsl:if> and
<xsl:choose>.
Looping: <xsl:for-each> is used to iterate over a set of nodes.
Handling Namespaces
If the XML document uses namespaces, the XSLT stylesheet must account for
them. You can declare namespaces within the XSLT and use them in XPath
expressions.
Example:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ns="http://example.com/ns">
<xsl:template match="ns:element">
<!-- Template content -->
</xsl:template>
</xsl:stylesheet>
Modes
Modes: Used to apply different templates to the same nodes depending on the
context.

<xsl:template match="element" mode="mode1">


<!-- Content for mode1 -->
</xsl:template>

<xsl:template match="element" mode="mode2">


<!-- Content for mode2 -->
</xsl:template>
Advanced Topics
Parameter Passing: <xsl:param> and <xsl:with-param> are used for
passing parameters to templates.
Named Templates: Templates can be named and invoked explicitly
using <xsl:call-template>.
Key Definitions and Efficient Access: XSLT allows the definition of keys
using <xsl:key> to optimize access to nodes.
Sorting: <xsl:sort> is used within <xsl:for-each> or <xsl:apply-templates>
to order nodes based on specific criteria.
Common Use Cases
XML to HTML Transformation: XSLT is commonly used to convert XML
data into HTML for web display.
XML to Text: Extracting data from XML and formatting it as plain text.
Data Migration: Transforming XML data into a different XML schema for
compatibility or migration purposes.
Reports and Documents: Generating formatted reports or documents
from XML data.
XSLT Processors
XSLT processors interpret XSLT stylesheets to transform XML
documents. Examples include:
• Saxon: A popular XSLT processor that supports XSLT 2.0 and 3.0.
• Xalan: An open-source XSLT processor from Apache.
• libxslt: A C-based XSLT processor used in many systems.
XSLT Versions
XSLT 1.0: The first version, widely supported and still used.
XSLT 2.0: Introduced richer data types, regular expressions, grouping,
and more powerful functions.
XSLT 3.0: Adds features like higher-order functions, streaming, and
more robust error handling.
Limitations
Complexity: Large or complex transformations can make XSLT difficult
to maintain.
Performance: For very large XML documents, XSLT transformations can
be slow, though this can often be mitigated with optimizations like
using keys.
Examples
Example 1: XML to HTML Transformation

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <h1>Book List</h1>
        <ul>
          <xsl:for-each select="catalog/book">
            <li>
              <strong><xsl:value-of select="title"/></strong> by <xsl:value-of select="author"/>
            </li>
          </xsl:for-each>
        </ul>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
Example 2: Conditional Processing

<xsl:template match="price">
<xsl:choose>
<xsl:when test=". &gt; 100">
<p>Expensive</p>
</xsl:when>
<xsl:otherwise>
<p>Affordable</p>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
XSLT
The name “XSLT” combines ‘XSL’, short for ‘Extensible Stylesheet Language’,
and ‘T’, short for ‘Transformation’.

XSLT is a transformation language used to convert source XML documents
into other XML documents or into other formats such as HTML, plain text,
or (by way of XSL-FO, Formatting Objects) PDF.
XSLT
• XSL stands for Extensible Stylesheet Language. It is a styling language
for XML, just as CSS is a styling language for HTML.
• XSLT stands for XSL Transformations. It is used to transform XML
documents into other formats (for example, XML into HTML).
• In HTML documents the tags are predefined, so a browser knows how to
display them; in XML documents they are not. The World Wide Web
Consortium (W3C) developed XSL as an XML-based stylesheet language
that describes how an XML document should be interpreted and styled.
Main parts of XSL Document

• XSLT: It is a language for transforming XML documents into various
other types of documents.
• XPath: It is a language for navigating in XML documents.
• XQuery: It is a language for querying XML documents.
• XSL-FO: It is a language for formatting XML documents.
Advantage of XSLT
A list of advantages of using XSLT:

• XSLT provides an easy way to merge XML data into a presentation: it applies user-defined
transformations to an XML document, and the output can be HTML, XML, or any other structured
document.
• XSLT uses XPath to locate elements and attributes within an XML document, which is more
convenient than traversing the document by hand in a scripting language.
• XSLT is template based, so it is more resilient to changes in documents than low-level DOM and
SAX code.
• By using XML and XSLT, the application UI script stays clean and is easier to maintain.
• XSLT templates are based on XPath patterns, which processors can match efficiently when
processing the XML document.
• XSLT can be used as a validation language because it uses a tree-pattern-matching approach.
• You can change the output simply by modifying the transformations in the XSL files.
XSLT Use Cases:
1. XML to HTML Transformation: Web content generation, Styling XML data for web presentation
2. XML to XML Transformation: Data interchange between different XML schemas, Schema evolution and
backward compatibility
3. XML to Text Transformation: Report generation (e.g., logs, configuration files), Template-based document
generation (e.g., CSV files)
4. XML to PDF Transformation: Document publishing (e.g., invoices, reports), Automated printing
workflows
5. Data Aggregation: Merging multiple XML documents, Filtering and sorting XML data
6. Web Services and APIs: SOAP message transformation, API response formatting (e.g., JSON, HTML)
7. Content Management Systems (CMS): Content rendering for different output formats, Template
processing for content presentation
8. Localization and Internationalization : Multilingual content transformation, Date and number formatting
for different locales
9. Configuration File Transformation: Generating configuration files (e.g., JSON, INI) from XML, Dynamic
configuration based on environment
10. Legacy System Integration: Data migration from legacy systems, Interface adaptation for legacy system
requirements
These applications demonstrate the versatility of XSLT in various domains, including web development, data
How XSLT Works
• An XSLT stylesheet is itself written in XML.
• It defines the transformation rules to be applied to the source XML
document.
• The XSLT processor (for example Saxon or Xalan) takes the stylesheet
and the source document, applies the rules, and generates a result
document in XML, HTML, or text format.
• Finally, the result is handed to a formatter or renderer (such as a
browser), which produces the output displayed to the end user.
XSLT Transformation
To run a transformation we need three things: the XML document on which
the XSLT code will run, the XSLT code file itself, and a tool that
contains an XSLT processor (any free or trial version is sufficient for
learning purposes).
XSLT Syntax
Student.xml

<?xml-stylesheet type="text/xsl" href="Student.xsl"?>
<class>
  <student>
    <firstname>Rama</firstname>
    <lastname>Raju</lastname>
    <Nickname>RR</Nickname>
  </student>
</class>
Student.xsl
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/class">
    <html>
      <body>
        <table>
          <tr>
            <th>Fname</th>
            <th>Lname</th>
            <th>Nname</th>
          </tr>
          <xsl:for-each select="student">
            <tr>
              <td><xsl:value-of select="firstname"/></td>
              <td><xsl:value-of select="lastname"/></td>
              <td><xsl:value-of select="Nickname"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

Transformation result
<!DOCTYPE html>
<html>
  <body>
    <table border="1" align="center">
      <h2>Student table</h2>
      <tr Bgcolor="lightblue">
        <th>Fname</th>
        <th>Lname</th>
        <th>Nname</th>
      </tr>
      <tr>
        <td>Rama</td>
        <td>Raju</td>
        <td>RR</td>
      </tr>
    </table>
  </body>
</html>
Online XSLT tools
https://easycodeforall.com/TestXSLT.jsp
https://linangdata.com/xslt-tester/
https://www.freeformatter.com/xsl-transformer.html#before-output
https://xslttest.appspot.com/
#1) XML Code
Below is the source XML code on which the XSLT code will run.
File Name: Books.xml
<?xml version="1.0" encoding="UTF-8"?>
<store> <!-- Root Element -->
  <book id="5350192956">
    <bookname>XSLT Programmer's Reference</bookname>
    <authorname>Michael Kay</authorname>
    <publisher>Wrox</publisher>
    <price>$40</price>
    <edition>4th</edition>
  </book>
  <book id="3741122298">
    <bookname>Head First Java</bookname>
    <authorname>Kathy Sierra</authorname>
    <publisher>O'reilly</publisher>
    <price>$19</price>
    <edition>1st</edition>
  </book>
  <book id="9987436700">
    <bookname>SQL The Complete Reference</bookname>
    ...

#2) XSLT Code
Below is the XSLT code based on <xsl:for-each> which will run on the above XML document.
File Name: Books.xsl
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs"
  version="2.0">
  <xsl:template match="/">
    <html>
      <body>
        <h2>Books:-</h2>
        <table border="1">
          <tr bgcolor="#cd8932">
            <th>Book ID</th>
            <th>Book Name</th>
            <th>Author Name</th>
            <th>Publisher</th>
            <th>Price</th>
            <th>Edition</th>
            ...
#3) Result / Output Code
The code below is produced by running the XSLT code on the above XML document.
<html>
  <body>
    <h2>Books:-</h2>
    <table border="1">
      <tr bgcolor="#cd8932">
        <th>Book ID</th>
        <th>Book Name</th>
        <th>Author Name</th>
        <th>Publisher</th>
        <th>Price</th>
        <th>Edition</th>
      </tr>
      <tr bgcolor="#84cd32">
        <td>5350192956</td>
        <td>XSLT Programmer's Reference</td>
        <td>Michael Kay</td>
        <td>Wrox</td>
        <td>$40</td>
        <td>4th</td>
      </tr>
      <tr bgcolor="#84cd32">
        <td>3741122298</td>
        ...
#4) View Result / Output in Web Browser
[Screenshot: the table above rendered in a browser under the heading "Books:-"]
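The Books.xsl listing in #2 breaks off after the header row. Judging only from the output in #3 (one row per book, with the id attribute followed by the five child elements), the template plausibly continues with an xsl:for-each over store/book. The sketch below is an assumption reconstructed from that output, not the original code; it is written as XSLT 1.0 so that it runs on any processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <h2>Books:-</h2>
        <table border="1">
          <tr bgcolor="#cd8932">
            <th>Book ID</th>
            <th>Book Name</th>
            <th>Author Name</th>
            <th>Publisher</th>
            <th>Price</th>
            <th>Edition</th>
          </tr>
          <!-- Assumed continuation: one table row per book element -->
          <xsl:for-each select="store/book">
            <tr bgcolor="#84cd32">
              <td><xsl:value-of select="@id"/></td>
              <td><xsl:value-of select="bookname"/></td>
              <td><xsl:value-of select="authorname"/></td>
              <td><xsl:value-of select="publisher"/></td>
              <td><xsl:value-of select="price"/></td>
              <td><xsl:value-of select="edition"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```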
XSLT Elements
To understand the above XSLT code and its working, we first need to understand the different XSLT
elements and their attributes.
#1) <xsl:stylesheet> OR <xsl:transform>
Every XSLT stylesheet must have <xsl:stylesheet> or <xsl:transform> as its root element; the two are synonyms.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  version="2.0">

Attributes:

• @xmlns:xsl: Binds the xsl prefix to the XSLT namespace, which identifies the elements as XSLT instructions.
• @version: Tells the processor which version of XSLT the stylesheet is written in.
#2) <xsl:template>: This declaration defines a set of rules for processing selected nodes of the
source document and producing the corresponding part of the output document.

Two kinds of template are distinguished by their attributes:

(i) Named Template: When the xsl:template element has a @name attribute, it is called a
named template.
<xsl:template name="book">
• Named templates are invoked by the xsl:call-template element.

<xsl:call-template name="book"/>
(ii) Match Template: The xsl:template element has a @match attribute containing a pattern
(a restricted form of XPath) that is tested against the input nodes.
<xsl:template match="//book">
Match templates are invoked by the xsl:apply-templates element.
<xsl:apply-templates select="book"/>
• An xsl:template element must have a @match attribute, a @name attribute, or both. An
xsl:template element that has no match attribute must have no mode attribute and no priority
attribute.
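The two kinds of template can be combined in one stylesheet. The following is a minimal sketch against the Books.xml structure (store/book with bookname and authorname children); the template name book-line is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Match template: entry point, fires on the document root -->
  <xsl:template match="/">
    <xsl:apply-templates select="store/book"/>
  </xsl:template>

  <!-- Match template: applied to each book selected above -->
  <xsl:template match="book">
    <xsl:call-template name="book-line"/>
  </xsl:template>

  <!-- Named template: runs only when called; the call does not change the context node -->
  <xsl:template name="book-line">
    <xsl:value-of select="bookname"/>
    <xsl:text> (</xsl:text>
    <xsl:value-of select="authorname"/>
    <xsl:text>)&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Because xsl:call-template leaves the context node unchanged, the relative paths bookname and authorname inside the named template still resolve against the current book.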
#3) <xsl:value-of>
Outputs the string value of the XPath expression given in the @select attribute, as used in
the above code.

<xsl:value-of
select = Expression
disable-output-escaping = "yes" | "no">
</xsl:value-of>
or
<xsl:value-of select = "bookname"/>

This outputs the value of the bookname element.

Parameters:

Index  Name                     Description
1)     select                   The XPath expression to be evaluated in the current context.
2)     disable-output-escaping  Default is "no". If "yes", special characters such as < and & in the
                                output text are written as-is instead of being escaped.
#4) <xsl:for-each>: Repetition
This processes the contained instructions once for each node in the node-set selected by the
XPath expression in the @select attribute (required). Nodes are visited in document order
unless <xsl:sort> specifies otherwise.

<xsl:for-each select="store/book">
</xsl:for-each>
• select: XPath expression evaluated in the current context to determine the set of
nodes to be iterated.
• For the node-set store/book, the contained instructions run once for each of:

/store/book[1]
/store/book[2]
/store/book[3]

• <xsl:sort> can also be used as a child of xsl:for-each to define the sort order.
#5) <xsl:apply-templates>
The processor selects the nodes given by the XPath expression in the
@select attribute (by default, all children of the current node) and, for
each of them, finds and applies the best-matching template.

The @mode attribute is used when the same input content must be output
in more than one way.
#6) <xsl:call-template>
The processor invokes the named template whose name is given in the
@name attribute (required).

The <xsl:with-param> element is used to pass parameters to the template.


#7) <xsl:if>: Conditional Processing
The content of xsl:if is processed only if the expression in the @test attribute evaluates
to true; otherwise the content is skipped and nothing is output.

<xsl:if test="count(/store/book)>2">
<xsl:text>
Condition True: Count of books are more than two.
</xsl:text>
</xsl:if>

• Result: Condition True: Count of books are more than two.

Here count() is a built-in XPath function.


#8) <xsl:choose>: Alternative conditional processing: xsl:choose contains one or more
xsl:when clauses, each with a condition in its @test attribute. The xsl:when clauses are tested in order and
only the first one whose test is true is processed. An optional xsl:otherwise element is processed if none of
the tests is true.

<xsl:choose>
<xsl:when test="count(/store/book)=1">
Condition True: Count of book is one.
</xsl:when>
<xsl:when test="count(/store/book)=2">
Condition True: Count of book is two.
</xsl:when>
<xsl:when test="count(/store/book)=3">
Condition True: Count of book is three.
</xsl:when>
<xsl:otherwise>
No condition match.
</xsl:otherwise>
</xsl:choose>
• Result: Condition True: Count of book is three.
#11) <xsl:comment>
This element writes a comment to the result; any text content inside
this element is output as a comment.

<xsl:comment> This will be printed to output as a comment node.</xsl:comment>

• Result: <!-- This will be printed to output as a comment node.-->


#12) <xsl:text>
This will generate a text node to the result document, the value inside the
xsl:text will get printed as a string to output.

<xsl:text>
This is a
text line.
</xsl:text>

Output:
This is a
text line.
#13) <xsl:element>
This will generate an element to the result document with the name
mentioned in its @name attribute. The name attribute is the required
attribute.

<xsl:template match="/">
<xsl:element name="bookcode">
<xsl:value-of select="/store/book[1]/@id"/>
</xsl:element>
</xsl:template>

Result: <bookcode>5350192956</bookcode>
#14) <xsl:attribute>
This generates an attribute on its parent element in the result document.
The name of the attribute is given by the @name attribute (required), and its value is
computed by the XPath expression in the @select attribute, as in the code below.
The @select attribute on xsl:attribute is available from XSLT 2.0; in XSLT 1.0 the
value is given as the content of the xsl:attribute element instead.

<xsl:template match="/">
<xsl:element name="bookcode">
<xsl:attribute name="id" select="/store/book[1]/@id"/>
</xsl:element>
</xsl:template>

Result: <bookcode id="5350192956"/>
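For processors that support only XSLT 1.0, the same result can be obtained by putting the value inside xsl:attribute rather than in a select attribute. A minimal sketch, assuming the Books.xml input used throughout this section:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:element name="bookcode">
      <!-- XSLT 1.0 form: the attribute value is the content of xsl:attribute -->
      <xsl:attribute name="id">
        <xsl:value-of select="/store/book[1]/@id"/>
      </xsl:attribute>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```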


#15) <xsl:sort>
This element sorts the selected nodes into ascending or descending
order. The sort key is given by an XPath expression in the @select
attribute, and the direction is set by the @order attribute
(‘ascending’, the default, or ‘descending’).
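A short sketch of xsl:sort inside xsl:for-each, assuming the Books.xml input; it lists the book names alphabetically instead of in document order.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <xsl:template match="/">
    <ul>
      <xsl:for-each select="store/book">
        <!-- xsl:sort must be the first child of xsl:for-each -->
        <xsl:sort select="bookname" order="ascending"/>
        <li><xsl:value-of select="bookname"/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```

For numeric keys, data-type="number" can be added to xsl:sort; without it the values are compared as text.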
#16) <xsl:variable>
• This element declares a variable that holds a value. A variable can be global or local. The name of
the variable is given by the @name attribute and its value by the @select attribute (or by the
content of the element).

• A global variable is accessible from any template in the stylesheet.

• To define a global variable, declare it as a direct child of the root element of the stylesheet. In the
example whose result is shown below, the variable ‘SecondBook’ is global and holds the name of
the second book.

• A local variable is accessible only within the element in which it is defined, from the point of its
declaration onward. In the same example, the variable ‘FirstBook’ is local and holds the name of
the first book.

• To reference either a global or a local variable, prefix its name with the dollar symbol ($), for
example $SecondBook.
Result:

First Book Name: XSLT Programmer’s Reference


Second Book Name: Head First Java
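The stylesheet that produced this result is not reproduced in the text. A sketch that would give the same two lines for the Books.xml input could look like this; the variable names follow the description above, and everything else is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Global variable: a direct child of xsl:stylesheet, visible in every template -->
  <xsl:variable name="SecondBook" select="/store/book[2]/bookname"/>

  <xsl:template match="/">
    <!-- Local variable: visible only inside this template, after its declaration -->
    <xsl:variable name="FirstBook" select="/store/book[1]/bookname"/>

    <xsl:text>First Book Name: </xsl:text>
    <xsl:value-of select="$FirstBook"/>
    <xsl:text>&#10;Second Book Name: </xsl:text>
    <xsl:value-of select="$SecondBook"/>
  </xsl:template>
</xsl:stylesheet>
```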
#9) <xsl:copy>
xsl:copy works on the context item: if it is a node, xsl:copy copies that node to the output but not its
attributes or children. For this reason it is called a shallow copy. Unlike the xsl:copy-of element,
xsl:copy does not have a @select attribute.

In the code below, each context node is copied to the output, and its attributes and children are then
processed and copied recursively through xsl:apply-templates.

The pattern node()|@* matches any node or any attribute; applying the template recursively therefore
reaches every node and attribute in the document.

<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
• Result: This will copy all the nodes and attributes of the source document recursively to the output
document, i.e. it will create an exact copy of the source document.
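This copy-everything template becomes useful when it is combined with more specific templates that override it for selected nodes. A minimal sketch, assuming the Books.xml input, that copies the whole document except the price elements:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Copy template: shallow-copies each node, then recurses into attributes and children -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override: an empty template, so price elements produce no output -->
  <xsl:template match="price"/>

</xsl:stylesheet>
```

The pattern price has a higher default priority than node(), so the empty template is chosen for price elements and the copy template handles everything else.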
#10) <xsl:copy-of>
xsl:copy-of copies the selected sequence of nodes together with all of their children
and attributes, recursively; this is called a deep copy. The @select attribute is
required and holds the XPath expression to evaluate.

<xsl:template match="node()|@*">
<xsl:copy-of select="."/>
</xsl:template>
• Result: This will copy all the nodes and attributes of the source document recursively
to the output document, i.e. it will create an exact copy of the source document.

<xsl:copy-of select="."/>
copies the current node together with its attributes and descendants.
#17) <xsl:key>
This element declares a named key, an index from key values to the nodes that carry them.

The @name attribute names the key (here “get-publisher”); the name is later used in the
key() function. The @match attribute is a pattern selecting the nodes to be indexed (here
“book”, i.e. all the books available in the store).

The @use attribute is an XPath expression, evaluated relative to each matched node, that
supplies the key value for that node (here “publisher”).

For example, if we need the details of the books published only by ‘Wrox’, we can retrieve
them directly through the key, which acts as a set of key-value pairs.
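The key declaration itself is not reproduced in the text. Based on the attribute values named above (get-publisher, book, publisher), it could be sketched as follows for the Books.xml input; the text output format is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index every book element by the string value of its publisher child -->
  <xsl:key name="get-publisher" match="book" use="publisher"/>

  <xsl:template match="/">
    <!-- key() returns the book nodes whose publisher equals 'Wrox' -->
    <xsl:for-each select="key('get-publisher', 'Wrox')">
      <xsl:value-of select="bookname"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The same books could be selected with /store/book[publisher='Wrox']; the key form lets the processor build the index once, which tends to help when many lookups are made on a large document.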
#18) <xsl:message>
This element is used for debugging during XSLT development. The processor writes the message to an
implementation-defined destination, typically the standard error stream or console of the application.

The @terminate attribute takes the value ‘yes’ or ‘no’ (the default). If it is set to ‘yes’, the processor
terminates the transformation as soon as the message is executed.

Suppose the price element of a book in the input document is accidentally empty. Processing should then
stop as soon as the processor encounters the empty price element, which can be achieved by placing
xsl:message with terminate="yes" inside an xsl:if test.

Result: As soon as the processor encounters the empty price element, it terminates the processing,
so the closing tags </table>, </body> and </html> do not appear at the end of the output file.
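The stylesheet for this scenario is not reproduced in the text. A minimal sketch of the idea, assuming the Books.xml structure and a hypothetical error message:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <table border="1">
          <xsl:for-each select="store/book">
            <!-- Stop the whole transformation if a book has an empty or missing price -->
            <xsl:if test="normalize-space(price) = ''">
              <xsl:message terminate="yes">
                <xsl:text>Error: empty price for book </xsl:text>
                <xsl:value-of select="@id"/>
              </xsl:message>
            </xsl:if>
            <tr>
              <td><xsl:value-of select="bookname"/></td>
              <td><xsl:value-of select="price"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

With terminate="no" (or the attribute omitted) the message would still be reported, but the transformation would continue to the end.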
#19) <xsl:param>&<xsl:with-param>
<xsl:param> declares a parameter. It can be declared as a child of <xsl:stylesheet>, making it
a global parameter, or at the start of an <xsl:template>, making it a local parameter of
that template.

The value of a template parameter is supplied when the template is invoked by
<xsl:call-template> or <xsl:apply-templates>.

<xsl:with-param> passes a value for a parameter declared with <xsl:param> to the
template. Its @name attribute contains the name of the parameter, which must match
the @name attribute of the <xsl:param> element. Its @select attribute sets the value
of that parameter.

As with a variable, the value of a parameter is read by prefixing its name with the dollar sign ($).
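A sketch showing a global parameter, a local parameter with a default value, and xsl:with-param, assuming the Books.xml input. The names separator, label and print-price are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Global parameter: its default can be overridden when the processor is invoked -->
  <xsl:param name="separator" select="': '"/>

  <xsl:template match="/">
    <xsl:for-each select="store/book">
      <xsl:call-template name="print-price">
        <!-- The name here must match the xsl:param name in the called template -->
        <xsl:with-param name="label" select="bookname"/>
      </xsl:call-template>
    </xsl:for-each>
  </xsl:template>

  <xsl:template name="print-price">
    <!-- Local parameter: the default is used when the caller passes nothing -->
    <xsl:param name="label" select="'(unknown)'"/>
    <xsl:value-of select="$label"/>
    <xsl:value-of select="$separator"/>
    <xsl:value-of select="price"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```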
#20) <xsl:import>
<xsl:import> is used to import another stylesheet module into the current
stylesheet. This supports a modular XSLT development approach. Any xsl:import
elements must come before all other children of <xsl:stylesheet>.

After importing, all the imported templates are available for use. Templates
defined in the importing stylesheet have higher import precedence than those in
the imported stylesheet.

If the imported stylesheet contains a template with the same name or match
pattern as one in the importing stylesheet, the imported template is overridden
by your own template.

The @href attribute gives the URI of the stylesheet that you want to import.

<xsl:import href="New_Book.xsl"/>
#21) <xsl:include>
Like xsl:import above, <xsl:include> supports a modular XSLT development
approach. All the templates brought in by <xsl:include> have the same
precedence as those of the including stylesheet. It is as if you had copied
all the templates from the other stylesheet into your own stylesheet.

The @href attribute gives the URI of the stylesheet that you want to
include.

<xsl:include href="New_Book.xsl"/>
#22)<xsl:output>
This element specifies how the result tree is serialized to the output file. Its @method
attribute can take the values ‘xml’, ‘html’, ‘xhtml’ (XSLT 2.0 and later) and ‘text’; the
default is ‘xml’, unless the root element of the result is html, in which case it is ‘html’.

@encoding specifies the character encoding of the output file, as in the example below
(encoding="UTF-16"); for XML or XHTML output the default is either UTF-8 or UTF-16.
@indent specifies whether the processor may indent the XML or HTML output; for XML the
default value is ‘no’ and for HTML and XHTML the default value is ‘yes’.

<xsl:output method="xml" encoding="UTF-16" indent="yes"/>


#23) <xsl:strip-space>
This element strips (removes) whitespace-only text nodes from the
source elements listed in the @elements attribute. To strip whitespace
from all elements, use ‘*’ in the @elements attribute.

<xsl:strip-space elements="*"/>
#24) <xsl:preserve-space>
This element preserves whitespace-only text nodes in the source
elements listed in the @elements attribute. To preserve whitespace in
all elements, use ‘*’ in the @elements attribute.

<xsl:preserve-space elements="*"/>

https://www.softwaretestinghelp.com/xslt-tutorial/

https://www.youtube.com/watch?v=W--Yhp0m35A&list=PLhW3qG5bs-L9DloLUPwC3GdFimY5Ce_gS&index=6
XML document:
<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book>
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <year>1925</year>
  </book>
  <book>
    <title>To Kill a Mockingbird</title>
    <author>Harper Lee</author>
    <year>1960</year>
  </book>
  <book>
    <title>1984</title>
    <author>George Orwell</author>
    <year>1949</year>
  </book>
</library>

XSLT stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <html>
      <body>
        <h2>Library Books</h2>
        <table border="1">
          <tr>
            <th>Title</th>
            <th>Author</th>
            <th>Year</th>
          </tr>
          <xsl:for-each select="library/book">
            <tr>
              <td><xsl:value-of select="title"/></td>
              <td><xsl:value-of select="author"/></td>
              <td><xsl:value-of select="year"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

Result:
<html>
<body>
<h2>Library Books</h2>
<table border="1">
<tr>
<th>Title</th>
<th>Author</th>
<th>Year</th>
</tr>
<tr>
<td>The Great Gatsby</td>
<td>F. Scott Fitzgerald</td>
<td>1925</td>
</tr>
<tr>
<td>To Kill a Mockingbird</td>
<td>Harper Lee</td>
<td>1960</td>
</tr>
<tr>
<td>1984</td>
<td>George Orwell</td>
<td>1949</td>
</tr>
</table>
</body>
</html>
XML document:
<?xml version="1.0" encoding="UTF-8"?>
<company>
  <employee>
    <name>John Doe</name>
    <position>Software Engineer</position>
    <department>IT</department>
    <salary>75000</salary>
  </employee>
  <employee>
    <name>Jane Smith</name>
    <position>Project Manager</position>
    <department>IT</department>
    <salary>85000</salary>
  </employee>
  <employee>
    <name>Michael Johnson</name>
    <position>HR Manager</position>
    <department>Human Resources</department>
    <salary>68000</salary>
  </employee>
</company>

XSLT stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <html>
      <body>
        <h2>Company Employee List</h2>
        <table border="1" cellpadding="5">
          <tr>
            <th>Name</th>
            <th>Position</th>
            <th>Department</th>
            <th>Salary ($)</th>
          </tr>
          <xsl:for-each select="company/employee">
            <tr>
              <td><xsl:value-of select="name"/></td>
              <td><xsl:value-of select="position"/></td>
              <td><xsl:value-of select="department"/></td>
              <td><xsl:value-of select="salary"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

Result:
<html>
<body>
<h2>Company Employee List</h2>
<table border="1" cellpadding="5">
<tr>
<th>Name</th>
<th>Position</th>
<th>Department</th>
<th>Salary ($)</th>
</tr>
<tr>
<td>John Doe</td>
<td>Software Engineer</td>
<td>IT</td>
<td>75000</td>
</tr>
<tr>
<td>Jane Smith</td>
<td>Project Manager</td>
<td>IT</td>
<td>85000</td>
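</tr>
<tr>
<td>Michael Johnson</td>
<td>HR Manager</td>
<td>Human Resources</td>
<td>68000</td>
</tr>
</table>
</body>
</html>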
XPath
• The XML Path Language (XPath) is used to identify or address
parts of an XML document.
• An XPath expression can be used to search through an XML document
and extract information from any part of it, such as an element or
attribute (both are referred to as nodes). XPath can be used alone or
in conjunction with XSLT.
• XPath is a core component of the XSLT standard. It is used to
traverse the elements and attributes in an XML document.
• XPath is a W3C recommendation. It provides different types of
expressions to retrieve relevant information from an XML document and
is a syntax for defining parts of an XML document.
Features of XPath
• XPath defines structure: XPath models the parts of an XML document as nodes:
element, attribute, text, namespace, processing-instruction, comment,
and document nodes.
• XPath provides path expressions: powerful path expressions select a node
or a list of nodes in an XML document.
• XPath is a core component of XSLT: XPath is a major element of the XSLT standard
and is needed to work with XSLT documents.
• XPath provides standard functions: a rich library of standard functions
for string values, numeric values, date and time comparison, node
and QName manipulation, sequence manipulation, Boolean values, etc.
• XPath is a W3C recommendation.
XPath Expression
XPath defines a pattern or path expression to select nodes or node-sets in an XML document.
These patterns are used by XSLT to perform transformations. The path expressions look very
similar to the paths used in a traditional file system.

XPath specifies seven types of nodes that can be the result of evaluating an XPath expression.
1. Root
2. Element
3. Text
4. Attribute
5. Comment
6. Processing Instruction
7. Namespace
XPath Nodes
• There are seven kinds of nodes in XPath, any of which can be the result of evaluating an XPath expression:

1. Element
2. Attribute
3. Text
4. Namespace
5. Processing-instruction
6. Comment
7. Document nodes.
An XML document can be specified as a tree of nodes. The topmost element of the tree is called the
root element.
Example: An XML document:
<?xml version="1.0" encoding="UTF-8"?>
<Library>
  <book>
    <title lang="en">Three Mistakes of My Life</title>
    <author>Chetan Bhagat</author>
    <year>2008</year>
    <price>110</price>
  </book>
</Library>

Nodes in the above XML document:
• <Library> (root element node)
• <author>Chetan Bhagat</author> (element node)
• lang="en" (attribute node)

Atomic values: Atomic values are values with no children or parent, such as the text of an element
or the value of an attribute. For example, in the above XML document, the following are atomic values:

Chetan Bhagat

"en"
Relationship of Nodes
1. Parent Node: Each element and attribute has one parent, the element that directly
contains it. In this example, the book element is the parent of the title, author,
year, and price elements.

2. Children Nodes: An element node can have zero, one, or more children. In this example,
the title, author, year, and price elements are all children of the book element.

3. Sibling Nodes: Nodes that have the same parent are known as siblings. In this example,
the title, author, year, and price elements are all siblings.

4. Ancestors: A node's parent, its parent's parent, and so on are its ancestors. In this
example, the ancestors of the title element are the book element and the Library element.

5. Descendants: A node's children, its children's children, and so on are its descendants.
In this example, the descendants of the Library element are the book, title, author, year,
and price elements.

Example:
<Library>
  <book>
    <title lang="en">Three Mistakes of My Life</title>
    <author>Chetan Bhagat</author>
    <year>2008</year>
    <price>110</price>
  </book>
</Library>
XPath Syntax
The XPath expression uses a path notation like URLs, for addressing
parts of an XML document. The expression is evaluated to yield an
object of the node-set, Boolean, number, or string type.
For example, the expression book/author will return a node-set of the
<author> elements contained in the <book> elements, if such elements
are declared in the source XML document.

In XPath, a path expression is used to select nodes or node-sets in an
XML document. A node is selected by following a path, i.e. a sequence of steps.
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book>
<title lang="en">Three Mistakes of My Life</title>
<price>110</price>
</book>
<book>
<title lang="en">Immortals of Meluha</title>
<price>200</price>
</book>
</bookstore>
Selecting Nodes: Path expressions used for selecting nodes are:

Index  Expression  Description
1)     nodename    Selects all nodes with the name "nodename"
2)     /           Selects from the root node
3)     //          Selects nodes in the document from the current node that
                   match the selection, no matter where they are
4)     .           Selects the current node
5)     ..          Selects the parent of the current node
6)     @           Selects attributes

Path expressions and their results for the above example:
Path Expression   Result
bookstore         Selects all nodes with the name "bookstore"
/bookstore        Selects the root element bookstore. Note: if the path starts with a slash ( / )
                  it always represents an absolute path to an element!
bookstore/book    Selects all book elements that are children of bookstore.
//book            Selects all book elements no matter where they are in the document.
bookstore//book   Selects all book elements that are descendants of the bookstore element, no
                  matter where they are under the bookstore element.
//@lang           Selects all attributes that are named lang.
Predicates: Predicates are used to find a specific node or a node that contains a specific value.
• Predicates are always embedded in square brackets.

Path Expression                    Result
/bookstore/book[1]                 Selects the first book element that is a child of the bookstore
                                   element. Note: In IE 5, 6, 7, 8, 9 the first node is [0], but according
                                   to W3C it is [1]. To solve this problem in IE, set the SelectionLanguage
                                   property to XPath in JavaScript:
                                   xml.setProperty("SelectionLanguage","XPath");
/bookstore/book[last()]            Selects the last book element that is a child of the bookstore element.
/bookstore/book[last()-1]          Selects the last but one book element that is a child of the bookstore
                                   element.
/bookstore/book[position()<3]      Selects the first two book elements that are children of the bookstore
                                   element.
//title[@lang]                     Selects all the title elements that have an attribute named lang.
//title[@lang='en']                Selects all the title elements that have a "lang" attribute with a value
                                   of "en".
/bookstore/book[price>100]         Selects all the book elements of the bookstore element that have a price
                                   element with a value greater than 100.
/bookstore/book[price>100]/title   Selects all the title elements of the book elements of the bookstore
                                   element that have a price element with a value greater than 100.
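The predicates above can be tried inside a small stylesheet. The sketch below assumes the two-book bookstore document shown earlier in this section; the threshold 150 is chosen only so that one book is filtered out.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- First book: XPath positions start at 1 -->
    <xsl:value-of select="/bookstore/book[1]/title"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Last book -->
    <xsl:value-of select="/bookstore/book[last()]/title"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Titles of books priced above 150; '>' is written as &gt; inside the attribute -->
    <xsl:for-each select="/bookstore/book[price &gt; 150]/title">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>

    <!-- Number of title elements that carry lang="en" -->
    <xsl:value-of select="count(//title[@lang='en'])"/>
  </xsl:template>
</xsl:stylesheet>
```

For that bookstore document the output is "Three Mistakes of My Life", "Immortals of Meluha", "Immortals of Meluha" and 2, each on its own line. Inside an XML attribute the character < must always be escaped as &lt;, which matters for predicates such as position()<3.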
Selecting Unknown Nodes: XPath wildcards are used to select unknown XML nodes.

Wildcard   Description
*          Matches any element node
@*         Matches any attribute node
node()     Matches any node of any kind

Path Expression   Result
/bookstore/*      Selects all the child element nodes of the bookstore element
//*               Selects all elements in the document
//title[@*]       Selects all title elements which have at least one attribute of any kind
Selecting Several Paths: The | operator is used in an XPath expression to select
several paths. For the above example, some path expressions and their results
are listed below.

Path Expression                   Result
//book/title | //book/price       Selects all the title and price elements of all book elements
//title | //price                 Selects all the title and price elements in the document
/bookstore/book/title | //price   Selects all the title elements of the book element of the
                                  bookstore element and all the price elements in the document

https://www.javatpoint.com/xpath-comparison-operators
XML document

<?xml version = "1.0"?>
<?xml-stylesheet type = "text/xsl" href = "employee.xsl"?>
<class>
<employee id = "001">
<firstname>Abhiram</firstname>
<lastname>Kushwaha</lastname>
<nickname>Manoj</nickname>
<salary>15000</salary>
</employee>
<employee id = "002">
<firstname>Akash</firstname>
<lastname>Singh</lastname>
<nickname>Bunty</nickname>
<salary>25000</salary>
</employee>
</class>

Employee.xsl

<?xml version = "1.0" encoding = "UTF-8"?>
<xsl:stylesheet version = "1.0"
xmlns:xsl = "http://www.w3.org/1999/XSL/Transform">
<xsl:template match = "/" >
<html>
<body>
<h3>Details of each Employee. </h3>
<table border = "1">
<tr bgcolor = "pink">
<th>ID</th>
<th>First Name</th>
<th>Last Name</th>
<th>Nick Name</th>
<th>Salary</th>
</tr>
<tr>
<td><xsl:value-of select = "/class/employee[1]/@id"/></td>
<td><xsl:value-of select = "/class/employee[1]/firstname"/></td>
<td><xsl:value-of select = "/class/employee[1]/lastname"/></td>
<td><xsl:value-of select = "/class/employee[1]/nickname"/></td>
<td><xsl:value-of select = "/class/employee[1]/salary"/></td>
</tr>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
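A stylesheet like this can also be applied from Java with the javax.xml.transform API. The sketch below is an assumption-laden illustration: it uses a much smaller stylesheet that emits plain text instead of an HTML table, and the names XsltSketch and transform are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSketch {
    // Cut-down version of the employee data (only id and firstname).
    static final String XML = "<class>"
        + "<employee id='001'><firstname>Abhiram</firstname></employee>"
        + "<employee id='002'><firstname>Akash</firstname></employee>"
        + "</class>";

    // Simplified stylesheet: text output, one "id:firstname;" entry per employee.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='/class/employee'>"
        + "<xsl:value-of select='@id'/>:<xsl:value-of select='firstname'/>;"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the document and returns the result as a string.
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL)); // 001:Abhiram;002:Akash;
    }
}
```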
XPath Paths
Two types of location path are used to specify the location of a node in an XML document:
1. absolute path and 2. relative path.
1. Absolute Path: An absolute path starts at the root node, that is, with '/'.

• /class/employee selects the employee nodes within the class root node.

<xsl:for-each select = "/class/employee">


• /class/employee/firstname selects the firstname of the employee nodes within the class root
node.

<p><xsl:value-of select = "/class/employee/firstname"/></p>


2. Relative Path
A path is called a relative path if it starts from the node that is currently
selected (the context node).

The following syntax locates elements using a path relative
to an employee node.

<p><xsl:value-of select = "firstname"/></p>
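The difference between the two kinds of path can be sketched in Java: an absolute path is evaluated from the document, a relative path from a chosen context node. The class and method names (PathSketch, firstNames) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PathSketch {
    static final String XML = "<class>"
        + "<employee id='001'><firstname>Abhiram</firstname></employee>"
        + "<employee id='002'><firstname>Akash</firstname></employee>"
        + "</class>";

    static List<String> firstNames() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Absolute path: evaluated from the root of the document.
            NodeList employees = (NodeList) xpath.evaluate(
                "/class/employee", doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < employees.getLength(); i++) {
                // Relative path: evaluated with this employee as the context node.
                names.add(xpath.evaluate("firstname", employees.item(i)));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstNames()); // [Abhiram, Akash]
    }
}
```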


XPath Axes: A path defines the location of a node using an absolute or relative path. In the same
way, XPath axes identify nodes by their relationship to the current node, such as parent, child, or sibling.
Example: <p><xsl:value-of select = "firstname"/></p>
<xsl:value-of select = "/class/student/preceding-sibling::comment()"/>

A list of various axis values:

Index Axis Description
1) ancestor It specifies the ancestors of the current node, from its parent up to the root node.
2) ancestor-or-self It specifies the current node and its ancestors.
3) attribute It specifies the attributes of the current node.
4) child It specifies the children of the current node.
5) descendant It specifies the descendants of the current node, i.e. the node's children down to the leaf nodes (no more children).
6) descendant-or-self It specifies the current node and its descendants.
7) following It specifies all nodes that come after the current node.
8) following-sibling It specifies the following siblings of the current node. Siblings are at the same level as the current node and share its parent.
9) namespace It specifies the namespace nodes of the current node.
10) parent It specifies the parent of the current node.
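A few of these axes can be exercised from Java as a rough sketch. The document below, including its comment and the rollno values, is made-up sample data; AxisSketch and str are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class AxisSketch {
    // Hypothetical sample data with a comment before the first student.
    static final String XML = "<class><!--roster-->"
        + "<student rollno='393'><firstname>Dinkar</firstname><lastname>Kad</lastname></student>"
        + "<student rollno='493'><firstname>Vineet</firstname></student>"
        + "</class>";

    // Evaluates the expression and converts the result to a string.
    static String str(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(str("name(//lastname/parent::*)"));                  // student
        System.out.println(str("count(//lastname/ancestor::*)"));               // 2
        System.out.println(str("name(//firstname/following-sibling::*)"));      // lastname
        System.out.println(str("/class/student/preceding-sibling::comment()")); // roster
        System.out.println(str("/class/student[1]/attribute::rollno"));         // 393
        System.out.println(str("count(/class/descendant::*)"));                 // 5
    }
}
```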


XPath Operators
XPath defines operators and functions on nodes. An XPath expression
returns either a node-set, a string, a Boolean, or a number.

A list of operators used in XPath expression:


Index Operators/Functions Description
1) Comparison Operators Comparison operators are used to compare values.
2) Boolean Operators Boolean operators are used to check 'and', 'or' & 'not'
functionalities.
3) Number Functions/Operators Operators/Functions on Numbers.
4) String Functions It specifies various string functions.
5) Node Functions/Operators It specifies various functions and operators acting on
nodes.
XPath Comparison Operators
Index Operator Description
1) = It specifies equal to
2) != It specifies not equal to
3) < It specifies less than
4) > It specifies greater than
5) <= It specifies less than or equal to
6) >= It specifies greater than or equal to

<xsl:for-each select = "class/employee">
<xsl:if test = "salary > 25000">

XPath Boolean Operators
Index Operator Description
1) and It specifies that both conditions must be satisfied.
2) or It specifies that at least one of the conditions must be satisfied.
3) not() It is a function that checks that a condition is not satisfied.

<xsl:for-each select = "class/employee[(@id = 001) or ((@id = 003))]">
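The same comparison and Boolean tests can be written as predicates and checked from Java. A minimal sketch follows; the third employee (id 003) is invented for the example, and OperatorSketch and names are hypothetical identifiers.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OperatorSketch {
    // Employee 003 is hypothetical; the slides only show 001 and 002.
    static final String XML = "<class>"
        + "<employee id='001'><firstname>Abhiram</firstname><salary>15000</salary></employee>"
        + "<employee id='002'><firstname>Akash</firstname><salary>25000</salary></employee>"
        + "<employee id='003'><firstname>Chirag</firstname><salary>35000</salary></employee>"
        + "</class>";

    // Returns the firstname of every employee matching the predicate.
    static List<String> names(String predicate) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "/class/employee[" + predicate + "]/firstname", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(names("salary > 25000"));                     // [Chirag]
        System.out.println(names("(@id = 001) or (@id = 003)"));         // [Abhiram, Chirag]
        System.out.println(names("salary >= 15000 and salary < 35000")); // [Abhiram, Akash]
        System.out.println(names("not(salary != 25000)"));               // [Akash]
    }
}
```

Note that inside an XSLT attribute the < character has to be written as &lt;, whereas in a Java string it can be used directly.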
XPath Number Operators/ Functions
Index Operator Description
1) + It is used for addition operation.
2) - It is used for subtraction operation.
3) * It is used for multiplication operation.
4) div It is used for division operation.
5) mod It is used for modulo operation.

Index Function Description
1) ceiling() It is used to return the smallest integer that is not less than the value
provided.
2) floor() It is used to return the largest integer that is not greater than the value
provided.
3) round() It is used to return the value rounded to the nearest integer.
4) sum() It is used to return the sum of the numeric values of all nodes in a node-set.

<xsl:when test = "salary div 25000 > 1">
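The number operators and functions can be checked quickly from Java by asking XPath for a NUMBER result. This is a sketch only; NumberSketch and num are hypothetical names and the salaries come from the employee example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NumberSketch {
    static final String XML = "<class>"
        + "<employee><salary>15000</salary></employee>"
        + "<employee><salary>25000</salary></employee>"
        + "</class>";

    // Evaluates the expression and returns its numeric value.
    static double num(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return (Double) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(num("sum(/class/employee/salary)")); // 40000.0 (sum over a node-set)
        System.out.println(num("floor(7 div 2)"));              // 3.0
        System.out.println(num("ceiling(7 div 2)"));            // 4.0
        System.out.println(num("round(2.5)"));                  // 3.0
        System.out.println(num("7 mod 2"));                     // 1.0
    }
}
```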


XPath String Functions: <xsl:value-of select = "concat(firstname,' ',lastname)"/>

Index Function Description

1) starts-with(string1, string2) It returns true when first string starts with the
second string.
2) contains(string1, string2) It returns true when the first string contains the
second string.
3) substring(string, offset, length?) It returns a section of the string. The section
starts at offset (counting from 1) and runs for the length provided.
4) substring-before(string1, string2) It returns the part of string1 before the first
occurrence of string2.
5) substring-after(string1, string2) It returns the part of string1 after the first
occurrence of string2.
6) string-length(string) It returns the length of string in terms of
characters.
7) normalize-space(string) It trims the leading and trailing space from
string and collapses each internal run of whitespace into a single space.
8) translate(string1, string2, string3) It returns string1 after any matching characters
in string2 have been replaced by the characters
in string3.
9) concat(string1, string2, ...) It is used to concatenate all strings.

10) format-number(number1, string1, string2) It returns a formatted version of number1 after


applying string1 as a format string. String2 is
XPath Node Functions
Example: <xsl:value-of select = "position()"/>

Index Operator Description
1) / It is used to select a node under a specific node.
2) // It is used to select matching nodes at any depth below the current node (below the root when it starts the path).
3) [...] It is a predicate, used to filter nodes by a condition such as position or value.
4) | It is used for the union of two node-sets.

Index Function Description
1) node() It is used to select all kinds of nodes.
2) processing-instruction() It is used to select nodes which are processing
instructions.
3) text() It is used to select a text node.
4) name() It is used to provide the name of the node.
5) position() It is used to provide the position of the node.
6) last() It returns the position of the last node in the current context, so [last()]
selects the last node.
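A short Java sketch of the node functions, assuming a small employee document without whitespace between the tags. NodeFunctionSketch and str are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NodeFunctionSketch {
    static final String XML = "<class>"
        + "<employee id='001'><firstname>Abhiram</firstname></employee>"
        + "<employee id='002'><firstname>Akash</firstname></employee>"
        + "</class>";

    // Evaluates the expression and converts the result to a string.
    static String str(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(str("name(/*)"));                                  // class
        System.out.println(str("/class/employee[last()]/firstname/text()"));  // Akash
        System.out.println(str("/class/employee[position() = 1]/@id"));       // 001
        System.out.println(str("count(/class/node())"));                      // 2
        System.out.println(str("count(//text())"));                           // 2
    }
}
```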
XML Parsers
An XML parser is a software library or package that provides
interfaces for client applications to work with an XML document.
“The XML Parser is designed to read the XML and create a way for
programs to use XML.”

An XML parser checks that the document is well-formed and, if it is a
validating parser, also validates the document.
•Parsers also check whether documents conform to the XML standard
and have a correct structure
•There are two types of XML parsers
1. Validating: check documents against a DTD or an XML
Schema
2. Non-validating: do not check documents against a DTD or an
XML Schema.
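As an illustration of a validating parser, the sketch below turns on DTD validation in the JDK's DOM parser and collects the validation errors. The inline DTD, the element names, and the names ValidationSketch and validate are all assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class ValidationSketch {
    // Hypothetical internal DTD: a class holds zero or more student elements.
    static final String DTD =
        "<!DOCTYPE class [<!ELEMENT class (student*)><!ELEMENT student (#PCDATA)>]>";

    // Parses DTD + body with validation on and returns the problems found.
    static List<String> validate(String body) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                // Validity errors arrive here; parsing continues.
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                // Well-formedness errors are fatal; rethrow to stop parsing.
                @Override public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            errors.add("not well-formed: " + e.getMessage());
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validate("<class><student>Dinkar</student></class>")); // no errors
        System.out.println(validate("<class><teacher>X</teacher></class>"));      // validity errors
    }
}
```

A non-validating parser would accept both documents, because both are well-formed.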
Types of XML Parsers

These are the two main types of XML Parsers:

1. DOM : Document Object Model


2. SAX: Simple API for XML

•A DOM parser implements the DOM API


•A SAX parser implements the SAX API
• Most major parsers implement both the DOM and SAX APIs
DOM
• The Document Object Model is an official recommendation of the
World Wide Web Consortium (W3C).
• It defines an interface that enables programs to access and update
the style, structure, and contents of the XML documents.
• XML parsers that support the DOM, implement that interface.
DOM…
• A DOM document is an object which contains all the information of an XML
document.
• It is composed like a tree structure.
• The DOM Parser implements a DOM API. This API is very simple to use.

Features of DOM Parser


• A DOM Parser creates an internal structure in memory which is a DOM
document object and the client applications get information of the original
XML document by invoking methods on this document object.
• DOM Parser has a tree based structure.
DOM (Document Object Model)
Advantages
1) It supports both read and write operations and the API is very simple
to use.
2) It is preferred when random access to widely separated parts of a
document is required.

Disadvantages
1) It is memory inefficient (it consumes more memory because the whole
XML document needs to be loaded into memory).
2) It is comparatively slower than other parsers.
Types of nodes in a DOM Document object
• Document node
• Element node
• Text node
• Attribute node
• Processing instruction node
• Comment node

<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="test.css"?>
<!-- It's an xml-stylesheet processing instruction. -->
<!DOCTYPE shapes SYSTEM "shapes.dtd">
<shapes>
<squre color="BLUE">
<length> 20 </length>
</squre>
</shapes>
• Each element node actually contains a list of other nodes as its children.
• These children might contain text values or other nodes
• DOM preserves the sequence of the elements that it reads from XML documents
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="test.css"?>
<!-- It's an xml-stylesheet processing instruction. -->
<!DOCTYPE shapes SYSTEM "shapes.dtd">
<shapes>
<squre color="BLUE">
<length> 20 </length>
</squre>
</shapes>
• Represents the content of an XML document as a tree structure
• It is a programming API
• Can easily read, access, and update the contents of the document
• The tree structure is stored in memory and can be used from any
programming language, such as JavaScript

• We need a parser to read the XML document into memory and convert it
into an XML DOM object that can be accessed from any programming
language
DOM (Document Object Model)
<?xml version="1.0"?>
<college>
<student>
<firstname>Durga</firstname>
<lastname>Madhu</lastname>
<contact>999123456</contact>
<email>[email protected]</email>
<address>
<city>Hyderabad</city>
<state>TS</state>
<pin>500088</pin>
</address>
</student>
</college>
DOM (Document Object Model)
Let's see the tree-structure representation of the above
example.
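One way to see the tree is to walk the DOM and print each element with indentation. The sketch below is not the slide's figure; it uses a shortened version of the college document (no contact or email), and TreeSketch, render and tree are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TreeSketch {
    // Shortened version of the college example, written without whitespace between tags.
    static final String XML = "<college><student>"
        + "<firstname>Durga</firstname>"
        + "<address><city>Hyderabad</city></address>"
        + "</student></college>";

    // Appends one line per element, indented by depth; leaf elements also show their text.
    static void render(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return;
        }
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.getNodeName());
        Node first = node.getFirstChild();
        if (first != null && first.getNodeType() == Node.TEXT_NODE
                && first.getNextSibling() == null) {
            out.append(" = ").append(first.getNodeValue().trim());
        }
        out.append('\n');
        for (Node child = first; child != null; child = child.getNextSibling()) {
            render(child, depth + 1, out);
        }
    }

    static String tree() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            StringBuilder out = new StringBuilder();
            render(doc.getDocumentElement(), 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(tree());
    }
}
```

For the shortened document this prints college at the top, student one level below it, and firstname = Durga and address (containing city = Hyderabad) below student.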
Online DOM Viewers
https://software.hixie.ch/utilities/js/live-dom-viewer/
https://bioub.github.io/dom-visualizer/
https://software.hixie.ch/utilities/js/live-dom-viewer.xml/
https://jsonformatter.org/xml-parser
When to use?
You should use a DOM parser when −

• You need to know a lot about the structure of a document.

• You need to move parts of the document around (you might want to
sort certain elements, for example).

• You need to use the information in the document more than once.
What you get?
When you parse an XML document with a DOM parser, you get back a tree
structure that contains all of the elements of your document. The DOM
provides a variety of functions you can use to examine the contents and
structure of the document.

Advantages

The DOM is a common interface for manipulating document structures.


One of its design goals is that the Java code written for one DOM-compliant
parser should run on any other DOM-compliant parser without changes.
DOM interfaces
The DOM defines several Java interfaces. Here are the most common interfaces −

• Node − The base datatype of the DOM.

• Element − The vast majority of the objects you will deal with are Elements.

• Attr − Represents an attribute of an element.

• Text − The actual content of an Element or Attr.

• Document − Represents the entire XML document. A Document object is often referred to as a
DOM tree.
Common DOM methods
When you are working with the DOM, there are several methods that are used often −

• Document.getDocumentElement() − Returns the root element of the document.

• Node.getFirstChild() − Returns the first child of a given Node.

• Node.getLastChild() − Returns the last child of a given Node.

• Node.getNextSibling() − Returns the next sibling of a given Node.

• Node.getPreviousSibling() − Returns the previous sibling of a given Node.

• Element.getAttribute(attrName) − For a given Element, returns the value of the attribute with the requested name.
Steps to Use DOM
Following are the steps used while parsing a document using the DOM Parser.

1. Import XML-related packages.

2. Create a DocumentBuilder

3. Create a Document from a file or stream

4. Extract the root element

5. Examine attributes

6. Examine sub-elements
1. Import XML-related packages
import org.w3c.dom.*;
import javax.xml.parsers.*;
import java.io.*;
2. Create a DocumentBuilder
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
3. Create a Document from a file or stream
StringBuilder xmlStringBuilder = new StringBuilder();
xmlStringBuilder.append("<?xml version = \"1.0\"?> <class> </class>");
ByteArrayInputStream input = new ByteArrayInputStream(
xmlStringBuilder.toString().getBytes("UTF-8"));
Document doc = builder.parse(input);
4. Extract the root element
Element root = doc.getDocumentElement();
5. Examine attributes
//returns specific attribute
root.getAttribute("attributeName");
//returns a Map (table) of names/values
root.getAttributes();
6. Examine sub-elements
//returns a list of subelements of specified name
root.getElementsByTagName("subelementName");
//returns a list of all child nodes
root.getChildNodes();
Demo Example

Here is the input xml file we need to parse −

<?xml version = "1.0"?>
<class>
<student rollno = "393">
<firstname>Dinkar</firstname>
<lastname>Kad</lastname>
<nickname>Dinkar</nickname>
<marks>85</marks>
</student>

<student rollno = "493">
<firstname>Vineet</firstname>
<lastname>Gupta</lastname>
<nickname>Vinni</nickname>
<marks>95</marks>
</student>

<student rollno = "593">
<firstname>Jasvir</firstname>
<lastname>Singh</lastname>
Demo Example
DomParserDemo.java

package com.tutorialspoint.xml;

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;

public class DomParserDemo {
   public static void main(String[] args){

      try {
         File inputFile = new File("input.txt");
         DocumentBuilderFactory dbFactory
            = DocumentBuilderFactory.newInstance();
         DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
         Document doc = dBuilder.parse(inputFile);
         doc.getDocumentElement().normalize();

The above program will generate the following result −

Root element :class
----------------------------

Current Element :student
Student roll no : 393
First Name : Dinkar
Last Name : Kad
Nick Name : Dinkar
Marks : 85

Current Element :student
Student roll no : 493
First Name : Vineet
Last Name : Gupta
Nick Name : Vinni
Marks : 95
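The listing on the slide stops after normalize(). A self-contained sketch of how such a program might read each student is given below. It is not the original DomParserDemo: it parses a string instead of input.txt, prints fewer fields, and the names StudentListSketch, describe and text are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StudentListSketch {
    // Two students taken from the demo input, reduced to firstname and marks.
    static final String XML = "<class>"
        + "<student rollno='393'><firstname>Dinkar</firstname><marks>85</marks></student>"
        + "<student rollno='493'><firstname>Vineet</firstname><marks>95</marks></student>"
        + "</class>";

    // Returns the text of the first descendant of parent with the given tag name.
    static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    // Builds the report lines: root element first, then three lines per student.
    static List<String> describe(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            doc.getDocumentElement().normalize();
            List<String> lines = new ArrayList<>();
            lines.add("Root element :" + doc.getDocumentElement().getNodeName());
            NodeList students = doc.getElementsByTagName("student");
            for (int i = 0; i < students.getLength(); i++) {
                Element student = (Element) students.item(i);
                lines.add("Student roll no : " + student.getAttribute("rollno"));
                lines.add("First Name : " + text(student, "firstname"));
                lines.add("Marks : " + text(student, "marks"));
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String line : describe(XML)) {
            System.out.println(line);
        }
    }
}
```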
SAX
• SAX 1.0 was released on May 11, 1998.
• SAX is a common, event-based API for parsing XML documents
• Primarily a Java API, but there are implementations in most
languages
• The current version is SAX 2.0.1, and there are versions for several
programming language environments other than Java
SAX
The XML-DEV mailing list developed the Simple API for XML, also called SAX,
which is an event-driven online algorithm for parsing XML documents. SAX is a
way of reading data from an XML document that is an alternative to the
mechanism of the Document Object Model (DOM). Whereas the DOM works on the
document as a whole, creating the whole abstract syntax tree of an XML
document for the user's convenience, SAX parsers work on each element of the
XML document sequentially, issuing parsing events while passing through the
input stream in a single pass. Unlike DOM, SAX does not have a formal
specification.

SAX is a programming interface for processing XML files based on events. As the
DOM's counterpart, SAX has a very different way of reading XML code.
Why use SAX Parser
Parsers are used to process XML documents. The parser examines the XML document, checks for
errors, and then validates it against a schema or DTD if it is a validating parser. The next step
depends on the parser in use. It may copy the data into a data structure native to the
programming language you are using. It may also apply styling to the data or convert it
into a presentation format.

Apart from triggering certain events, the SAX parser does nothing with the data. It is up to the
SAX parser's user to decide what to do with it. The SAX events include (among others) the following:

• XML Text Nodes


• XML Element Starts and Ends
• XML Processing Instructions
• XML Comments
The properties of the SAX parser are as follows:
•An XML document is seen as a series of “events”
• Unlike DOM, SAX does not store information in an internal tree
structure
•SAX is able to parse huge documents (think gigabytes) without having
to allocate large amounts of system resources
• If processing is built as a pipeline, it does not have to wait for the data
to be converted to an object; it can go to the next process once it clears
the preceding callback method
• SAX does not allow random access to the file; it proceeds in a single
pass, firing events as it goes.
• When the parser encounters a start-tag, end-tag, etc., it treats them as events
• When such an event occurs, the parser calls back
a particular method overridden by the client, and passes what it has seen
to the method as arguments
• The SAX parser is event-based; it works like an event handler in Java
(e.g. MouseAdapter)
• From the data flow point of view, the client application
just receives the data passively
SAX (Simple API for XML)
A SAX Parser implements the SAX API. This API is event based and less intuitive.

• Features of SAX Parser

• It does not create any internal structure.
• Clients do not decide which methods to call; they just override the methods of the API and place their
own code inside those methods.
• It is an event-based parser; it works like an event handler in Java.

• Advantages
1) It is simple and memory efficient.
2) It is very fast and works for huge documents.

• Disadvantages
1) It is event-based, so its API is less intuitive.
2) Clients never see the full information at once because the data is broken into pieces.
• SAX is an event based parser for XML Documents
• The parser tells the application what is in the documents by notifying
the application of a stream of parsing events.
• Application then processes those events to act on data.
• SAX chooses to give you access to the information in your XML
document, not as a tree of nodes, but as a sequence of events.
• SAX chooses not to create a default object model on top of your XML
document (like DOM does)
When should I use it?

– Large documents
– Memory constrained devices
– When you do not need to modify the document
SAX Structure(1/4)

[Figure: SAX structure diagram]
Structure(2/4)
• SAXParserFactory: A SAXParserFactory object creates an
instance of the parser determined by the system property,
javax.xml.parsers.SAXParserFactory.
• SAXParser: The SAXParser class defines several
kinds of parse() methods. In general, you pass an XML data
source and a DefaultHandler object to the parser, which processes
the XML and invokes the appropriate methods in the handler
object.
• SAXReader: The SAXParser wraps a SAXReader (the org.xml.sax.XMLReader
interface). Typically, you don't care about that, but every once in a while you need
to get hold of it using SAXParser's getXMLReader() so that you
can configure it. It is the SAXReader that carries on the
conversation with the SAX event handlers you define.
Structure(3/4)
• DefaultHandler: Not shown in the diagram, a
DefaultHandler implements the ContentHandler, ErrorHandler,
DTDHandler, and EntityResolver interfaces (with null methods),
so you can override only the ones you are interested in.
• ContentHandler: Methods such as startDocument,
endDocument, startElement, and endElement are invoked
when an XML tag is recognized. This interface also defines the
methods characters and processingInstruction, which are
invoked when the parser encounters the text in an XML
element or an inline processing instruction, respectively.
• EntityResolver: The resolveEntity method is invoked when
the parser must identify data identified by a URI.
Structure(4/4)
• ErrorHandler: Methods error, fatalError, and warning are
invoked in response to various parsing errors. The default error
handler throws an exception for fatal errors and ignores other
errors (including validation errors). That's one reason you need
to know something about the SAX parser, even if you are
using the DOM.
• Sometimes the application may be able to recover from a
validation error. Other times, it may need to generate an
exception. To ensure the correct handling, you'll need to supply
your own error handler to the parser.
• DTDHandler: Defines methods you will generally never be
called upon to use. Used when processing a DTD to recognize
and act on declarations for an unparsed entity.
Event
startDocument
endDocument
startElement
endElement
characters
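The order in which these events fire can be observed with a small handler that records each callback. This is a sketch only: SaxTraceSketch and trace are hypothetical names, and the parser is free to deliver the text of an element in more than one characters call.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxTraceSketch {
    // Parses the XML with SAX and returns the events in the order they were fired.
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { events.add("startDocument"); }
            @Override public void endDocument() { events.add("endDocument"); }
            @Override public void startElement(String uri, String localName,
                                               String qName, Attributes attributes) {
                events.add("startElement " + qName);
            }
            @Override public void endElement(String uri, String localName, String qName) {
                events.add("endElement " + qName);
            }
            @Override public void characters(char[] ch, int start, int length) {
                events.add("characters " + new String(ch, start, length));
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        // Each event is printed on its own line, from startDocument to endDocument.
        for (String event : trace("<class><student>Dinkar</student></class>")) {
            System.out.println(event);
        }
    }
}
```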
Pull Parsing Versus Push Parsing
• Streaming pull parsing refers to a programming model in which a
client application calls methods on an XML parsing library when it
needs to interact with an XML infoset; that is, the client only gets
(pulls) XML data when it explicitly asks for it.
• Streaming push parsing refers to a programming model in which
an XML parser sends (pushes) XML data to the client as the parser
encounters elements in an XML infoset; that is, the parser sends the
data whether or not the client is ready to use it at that time.
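SAX is a push API. For comparison, pull parsing in Java is available through StAX (javax.xml.stream), where the client asks for the next event itself. The sketch below assumes the student data from the DOM demo; PullSketch and firstNames are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class PullSketch {
    static final String XML = "<class>"
        + "<student rollno='393'><firstname>Dinkar</firstname></student>"
        + "<student rollno='493'><firstname>Vineet</firstname></student>"
        + "</class>";

    // Pulls events one at a time and collects the text of every firstname element.
    static List<String> firstNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // The client decides when to advance: next() pulls the next event.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "firstname".equals(reader.getLocalName())) {
                    names.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(firstNames(XML)); // [Dinkar, Vineet]
    }
}
```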
DOM versus SAX

• DOM: Tree model parser (object based, tree of nodes).
  SAX: Event based parser (sequence of events).

• DOM: Loads the file into memory and then parses the file.
  SAX: Parses the file as it reads it, i.e. parses node by node.

• DOM: Has memory constraints since it loads the whole XML file before parsing.
  SAX: No memory constraints as it does not store the XML content in memory.

• DOM: Read and write (can insert or delete nodes).
  SAX: Read only, i.e. can't insert or delete a node.

• DOM: If the XML content is small, then prefer the DOM parser.
  SAX: Use the SAX parser when the XML content is large.

• DOM: Backward and forward search is possible for searching the tags and evaluating the information inside the tags.
  SAX: Reads the XML file from top to bottom; backward navigation is not possible.

• DOM: Slower at run time.
  SAX: Faster at run time.


End
