Unit 2
Purpose of XML:
XML was designed to transport and store data, focusing on simplicity, generality, and
usability across the internet.
Unlike HTML, which is used to display data, XML is used to describe and structure data.
Key Features of XML:
• Self-Descriptive: XML documents are self-descriptive because they use tags that
define the structure and meaning of the data.
• Platform-Independent: XML is platform-independent, meaning it can be used
across different systems, devices, and programming languages without
modification.
• Hierarchical Structure: XML data is structured in a tree-like hierarchy, with
elements containing sub-elements, which allows for complex data representations.
• Customizable Tags: Unlike HTML, XML doesn’t have predefined tags. Users can
define their own tags to describe the data, making it highly flexible.
• Text-Based: XML is text-based, which makes it easy to read, write, and transmit
over networks. This also allows it to be easily manipulated by various
text-processing tools.
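The tree structure and user-defined tags described above are exactly what an XML parser hands to a program. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the <catalog>, <item>, <name>, and <price> tags are made-up examples, not part of any standard vocabulary.

```python
import xml.etree.ElementTree as ET

# A small document that uses hypothetical, user-defined tag names.
DOC = """<catalog>
  <item sku="A1">
    <name>Pen</name>
    <price>1.50</price>
  </item>
  <item sku="B2">
    <name>Notebook</name>
    <price>3.25</price>
  </item>
</catalog>"""

def walk(element, depth=0, out=None):
    """Return one indented line per element, mirroring the tree hierarchy."""
    if out is None:
        out = []
    text = (element.text or "").strip()  # whitespace-only text is ignored
    out.append("  " * depth + element.tag + (f": {text}" if text else ""))
    for child in element:
        walk(child, depth + 1, out)
    return out

root = ET.fromstring(DOC)
lines = walk(root)
print("\n".join(lines))
```

Nothing in the code knows what a "catalog" is: the meaning comes only from the tag names, which is what "self-descriptive" means in practice.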
XML Technologies
XML technologies encompass a wide array of tools, standards, and specifications that build upon or interact with
XML. These technologies enable the creation, validation, transformation, and management of XML data.
1. XML Schema Definition (XSD): Defines the structure and data types of an XML document.
2. Document Type Definition (DTD): Defines the structure and legal elements/attributes of an XML document.
3. XPath (XML Path Language): A language for navigating through elements and attributes in an XML document.
4. XSLT (Extensible Stylesheet Language Transformations): A language for transforming XML documents into
other formats like HTML, text, or another XML document.
5. XQuery: A query language for extracting and manipulating data from XML documents.
6. XPointer and XLink: Technologies for linking XML documents and addressing parts of XML documents.
7. SOAP (Simple Object Access Protocol): A protocol for exchanging structured information in web services.
8. SVG (Scalable Vector Graphics): An XML-based language for describing 2D graphics and graphical applications.
9. RSS (Really Simple Syndication): A web feed format for delivering regularly changing web content.
10. MathML (Mathematical Markup Language): An XML-based language for describing mathematical notation
and content.
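To give a feel for path-based navigation (item 3 above), here is a minimal sketch using Python's xml.etree.ElementTree. Note that ElementTree supports only a limited subset of XPath; full XPath, XSLT, and XQuery require other tools (for example the lxml library or a dedicated XQuery processor). The book titles and years are invented for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<library>
  <book id="1"><title>XML Basics</title><year>2001</year></book>
  <book id="2"><title>Advanced XPath</title><year>2010</year></book>
</library>"""

root = ET.fromstring(DOC)

# A path relative to the root: every <title> inside every <book>.
titles = [t.text for t in root.findall("./book/title")]

# A predicate on an attribute value: the <book> whose id is "2".
book2 = root.find("./book[@id='2']")

# '//' searches all descendants; findtext returns the first match's text.
first_year = root.findtext(".//year")

print(titles, book2.find("title").text, first_year)
```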
Applications of XML
1. Data Interchange Between Systems: XML is widely used for exchanging data between different systems,
platforms, and organizations.
It provides a standardized way to encode data, making it easier to share and process across different
applications.
Example: Web services often use XML for data exchange in SOAP (Simple Object Access Protocol).
2. Web Services: XML forms the foundation for many web services, allowing applications to communicate
over the web.
SOAP messages are always written in XML, and RESTful web services can also use XML (alongside formats such as JSON) for structuring request and response messages.
3. Configuration Files: Many software applications use XML to store configuration settings in a structured,
readable format.
Example: Microsoft’s .NET applications often use XML-based configuration files (.config files).
4. Document Storage: XML is used to store documents in a structured format that is easy to read, search, and
manipulate.
Example: Office file formats like Microsoft Word (.docx) and OpenDocument (.odt) are ZIP packages of XML files.
5. RSS and Atom Feeds: XML is used in syndication formats like RSS (Really Simple Syndication) and Atom.
These feeds are used to publish frequently updated information like blog posts, news headlines, and podcasts.
6. XHTML: XHTML (Extensible Hypertext Markup Language) is an XML-based version of HTML.
It applies XML's strict syntax rules to HTML's vocabulary, ensuring that web pages are well-formed and
properly structured.
7. Data Storage and Database Integration: XML is used in databases to store semi-structured data.
Some databases offer native support for XML, allowing you to query and manipulate XML data using
standard query languages like SQL with extensions.
8. E-commerce: XML is used in e-commerce applications to describe product information, transactions, and
orders.
Example: Some Electronic Data Interchange (EDI) systems use XML-based formats (such as ebXML) to standardize the exchange of business
documents between companies.
9. Industry-Specific Standards: Many industries have developed their own XML-based standards for data
interchange.
Example: HL7 (Health Level 7) for healthcare data, XBRL (eXtensible Business Reporting Language) for
financial reporting, and FpML (Financial Products Markup Language) for financial derivatives.
10. Content Management Systems (CMS): XML is used in content management systems to manage and
organize web content.
It allows for the separation of content from presentation, making it easier to update and maintain websites.
11. Mobile Applications: XML is often used in mobile applications for data storage and transmission.
Example: Android apps use XML for layout files (.xml) to define the user interface.
12. Scientific Data Representation: XML is used to store and share scientific data, ensuring consistency and
interoperability across different research institutions and software tools.
Example: MathML (Mathematical Markup Language) is an XML-based language used to describe
mathematical notations.
Benefits of XML:
Drawbacks of XML:
• Verbosity: XML documents can be verbose, leading to larger file sizes compared to
other formats like JSON.
• Complexity: Parsing and handling XML can be more complex compared to simpler
formats, especially for large documents.
Differences Between HTML and XML
HTML (HyperText Markup Language) and XML (eXtensible Markup Language) are both markup
languages used to structure and present data, but they serve different purposes and have distinct
characteristics. The key differences between HTML and XML are:
1. Purpose
HTML:
• Primary Purpose: HTML is designed to display data and format web pages. It defines the
structure and layout of web content, including text, images, links, and multimedia.
• Focus: Presentation of data and user interface elements on web browsers.
XML:
• Primary Purpose: XML is designed to store, transport, and structure data. It is a flexible data
format that is used to represent complex data structures in a platform-independent way.
• Focus: Data representation, storage, and exchange between systems.
2. Tag Definition
HTML:
• Predefined Tags: HTML has a fixed set of predefined tags (e.g., <div>, <p>, <a>, <h1>), each with a
specific meaning and purpose.
• Tag Semantics: HTML tags are interpreted by web browsers to render content in a specific way.
XML:
• Custom Tags: XML allows users to define their own tags based on the specific needs of the data being
represented.
• Tag Semantics: XML tags do not have predefined meanings. The meaning is defined by the user or the
application processing the XML.
3. Case Sensitivity
HTML:
• Case Insensitive: HTML tags are not case-sensitive, so <DIV> and <div> are treated the same by
web browsers.
XML:
• Case Sensitive: XML is case-sensitive, meaning <Data> and <data> are considered different
elements.
4. Syntax Rules
HTML:
• Lenient Syntax: HTML is forgiving with errors, such as missing closing tags or improperly nested tags. Browsers
often correct these errors automatically.
• Void Elements: HTML has void elements such as <img> and <br>, which never take a closing tag; the trailing slash in <img /> or <br /> is optional.
XML:
• Strict Syntax: XML requires strict adherence to syntax rules. All tags must be properly closed, and elements
must be properly nested.
• Self-Closing Tags: XML allows a self-closing tag for any empty element, but it must be explicitly closed with a slash (e.g., <element />).
5. Document Structure
HTML:
• Structure: HTML documents have a defined structure, typically including a <!DOCTYPE> declaration, <html>,
<head>, and <body> tags.
• Lenient About the Root: The <html>, <head>, and <body> tags can often be omitted, and browsers insert the missing elements automatically.
XML:
• Structure: XML documents have a tree-like structure with a single root element that contains all other
elements.
• Single Root Element: XML requires exactly one root element, and every other element in the document must be inside it.
6. Data vs. Presentation
HTML:
• Focus on Presentation: HTML is primarily concerned with how data is presented to the user. It includes tags for
formatting content (e.g., <b> for bold, <i> for italic).
• Integration with CSS and JavaScript: HTML integrates with CSS (Cascading Style Sheets) for styling and JavaScript
for interactivity.
XML:
• Focus on Data: XML is focused on structuring and storing data rather than presenting it. It separates data from
its presentation.
• No Styling: XML itself does not include tags for styling or presenting data. Styles and presentations are typically
handled by other technologies like XSLT.
7. Use Cases
HTML:
• Web Pages: Used to create and structure web pages displayed in browsers.
• UI Elements: Defines user interface elements for web applications.
XML:
• Data Exchange: Used for exchanging data between different systems and platforms (e.g., in web services, APIs).
• Configuration Files: Used in software configuration files (e.g., Android development, build systems).
• Document Formats: Used in various document formats like Microsoft Office files (e.g., .docx, .xlsx).
8. Handling
HTML:
• Rendering: HTML documents are rendered by web browsers to display content visually.
• Parsing: Browsers have built-in parsers that interpret HTML and display it as intended, even
if the HTML contains errors.
XML:
• Parsing: XML documents are parsed by XML parsers, which require the document to be
well-formed. Errors in XML structure can cause parsing to fail.
• Transformation: XML documents can be transformed into different formats using XSLT
(Extensible Stylesheet Language Transformations).
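The strictness described under Handling is easy to observe. This minimal sketch feeds a few snippets to Python's standard XML parser (xml.etree.ElementTree); unlike a browser parsing HTML, it does not repair mistakes, it rejects them with a ParseError. The helper name parses is our own.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if ElementTree accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(parses("<data>ok</data>"))      # matching start and end tags
print(parses("<Data>oops</data>"))    # names differ only in case
print(parses("<b><i>text</b></i>"))   # improper nesting
print(parses("<p>unclosed"))          # missing end tag
```

Only the first snippet is accepted; the second shows that XML treats <Data> and <data> as different elements.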
Basic Structure of an XML Document:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="xxx.css"?>
<!DOCTYPE rootElement SYSTEM "example.dtd">
<rootElement>
<!-- This is a comment -->
<childElement attribute="value">Content</childElement>
<anotherElement>
<subElement>More content</subElement>
<subElement><![CDATA[content!]]></subElement>
</anotherElement>
</rootElement>
Structure of an XML document
The structure of an XML document is hierarchical and tree-like, with a single root element that
contains all other elements. Each XML document must follow a well-defined structure to be
considered well-formed. Here's an overview of the components and structure of an XML
document:
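1. XML Declaration
The XML declaration is an optional but recommended line that defines the version of XML
being used and the character encoding.
<?xml version="1.0" encoding="UTF-8"?>
2. Document Type Declaration (Optional)
The DOCTYPE declaration defines the document structure by referencing (or containing) a DTD.
<!DOCTYPE rootElement SYSTEM "example.dtd">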
3. Root Element
The root element is the single, top-level element that contains all other elements in the XML
document. Every XML document must have exactly one root element.
<rootElement>
<!-- Other elements go here -->
</rootElement>
4. Child Elements
Child elements are nested within the root element and can themselves contain other elements, attributes,
or text content. XML allows for a hierarchical arrangement of elements.
<rootElement>
<childElement1>
<subElement>Content</subElement>
</childElement1>
<childElement2 attribute="value">More content</childElement2>
</rootElement>
5. Attributes
Attributes provide additional information about elements. They are defined within the start tag of an
element.
<element attributeName="attributeValue">Content</element>
attributeName="attributeValue": An attribute with a name (attributeName) and value (attributeValue).
6. Text Content
Elements can contain text content, which can be mixed with other elements.
<greeting>Hello, World!</greeting>
In this example, the <greeting> element contains text content.
7. Comments: Comments can be added to the XML document to provide explanations or notes. They are ignored
by the XML parser.
<!-- This is a comment -->
8. Processing Instructions (Optional)
Processing instructions provide information to the application processing the XML document. They are typically used for
linking stylesheets or other external resources.
<?xml-stylesheet type="text/xsl" href="style.xsl"?>
In this example, a stylesheet is linked using a processing instruction.
9. CDATA Sections (Optional)
CDATA (Character Data) sections are used to include text that should not be parsed by the XML parser, such as text containing special
characters.
<![CDATA[Some text with special characters like <, &, and >]]>
• <![CDATA[ and ]]>: Encloses the text that should not be parsed.
10. Entity References (Optional)
Entity references are used to represent special characters or to define shortcuts for longer strings of text.
&lt;, &gt;, and &amp;: Represent the characters <, >, and &, respectively. XML also predefines &quot; (") and &apos; (').
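Entity references and CDATA sections are two ways of writing the same characters. In this minimal sketch with Python's xml.etree.ElementTree, both elements end up with identical text once parsed; the expression text is just an example.

```python
import xml.etree.ElementTree as ET

# The same text written with predefined entity references...
escaped = ET.fromstring("<expr>a &lt; b &amp;&amp; c &gt; d</expr>")
# ...and inside a CDATA section, where < & > need no escaping.
cdata = ET.fromstring("<expr><![CDATA[a < b && c > d]]></expr>")

print(escaped.text)                 # the parser has replaced the entities
print(escaped.text == cdata.text)   # both forms yield the same string
```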
Summary of XML Document Components
A complete XML document can combine all of these components:
• XML Declaration: Specifies XML version and encoding (optional but recommended).
• Document Type Declaration: Defines the document structure and references a DTD
(optional).
• Root Element: The single top-level element that contains all other elements.
• Child Elements: Nested elements that represent the data structure.
• Attributes: Additional data within elements, defined in the opening tag.
• Text Content: Text contained within elements.
• Comments: Notes and explanations ignored by the XML parser.
• Processing Instructions: Instructions for the application processing the XML (optional).
• CDATA Sections: Sections of text that should not be parsed by the XML parser (optional).
• Entity References: Special characters or shortcuts for strings (optional).
This structure ensures that an XML document is well-formed and can be correctly processed by
XML parsers.
Books Example:
<?xml version="1.0" encoding="UTF-8"?>
<library>
<book id="1">
<title>The Catcher in the Rye</title>
<author>J.D. Salinger</author>
<year>1951</year>
<genre>Fiction</genre>
<publisher>Little, Brown and Company</publisher>
<price currency="USD">10.99</price>
</book>
<book id="2">
<title>To Kill a Mockingbird</title>
<author>Harper Lee</author>
<year>1960</year>
<genre>Fiction</genre>
<publisher>J.B. Lippincott &amp; Co.</publisher>
<price currency="USD">7.99</price>
</book>
<book id="3">
<title>1984</title>
<author>George Orwell</author>
</book>
</library>
Explanation:
• XML Declaration: <?xml version="1.0" encoding="UTF-8"?> specifies the XML version and character encoding.
• Root Element: <library> is the root element that contains all the book entries.
• Book Elements: Each <book> element represents an individual book. It includes:
1. Attributes: id is used to uniquely identify each book.
2. Child Elements:
i. <title>: The title of the book.
ii. <author>: The author of the book.
iii. <year>: The year of publication.
iv. <genre>: The genre of the book.
v. <publisher>: The publisher of the book (the & in "Lippincott & Co." must be written as &amp;).
vi. <price>: The price of the book, with a currency attribute specifying the currency used.
This XML structure is useful for representing and organizing information about a collection of books.
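A program reading the books document uses the distinction between attributes and child elements directly. This minimal sketch with Python's xml.etree.ElementTree works on a shortened copy of the example (two books, fewer child elements); it collects each book's id, title, price, and currency, and totals the prices.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the books document above.
DOC = """<library>
<book id="1">
<title>The Catcher in the Rye</title>
<price currency="USD">10.99</price>
</book>
<book id="2">
<title>To Kill a Mockingbird</title>
<publisher>J.B. Lippincott &amp; Co.</publisher>
<price currency="USD">7.99</price>
</book>
</library>"""

root = ET.fromstring(DOC)

rows = []
for book in root.findall("book"):
    price = book.find("price")
    # id and currency are attributes; title and price are element text.
    rows.append((book.get("id"), book.findtext("title"), price.text, price.get("currency")))

# The escaped &amp; comes back from the parser as a plain "&".
publisher = root.findtext("book[@id='2']/publisher")
total = sum(float(p.text) for p in root.iter("price"))

for row in rows:
    print(*row)
print(publisher)
print(f"Total: {total:.2f}")
```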
Email Example:
<?xml version="1.0" encoding="UTF-8"?>
<email>
<header>
<to>[email protected]</to>
<from>[email protected]</from>
<subject>Meeting Reminder</subject>
<date>2024-08-12</date>
</header>
<body>
<paragraph>Hello John,</paragraph>
<paragraph>This is a reminder about our meeting scheduled for tomorrow at 10 AM.</paragraph>
<paragraph>Best regards,</paragraph>
<paragraph>Jane</paragraph>
</body>
<attachments>
<attachment fileName="agenda.pdf" fileSize="12345" />
<attachment fileName="minutes.docx" fileSize="67890" />
</attachments>
</email>
Weather Example
<?xml version="1.0" encoding="UTF-8"?>
<weatherReport>
<location>
<city>San Francisco</city>
<state>CA</state>
<country>USA</country>
</location>
<currentConditions>
<temperature unit="F">68</temperature>
<humidity>72</humidity>
<condition>Partly Cloudy</condition>
<wind>
<speed unit="mph">8</speed>
<direction>NW</direction>
</wind>
</currentConditions>
<forecast>
<day date="2024-08-12">
<highTemperature unit="F">75</highTemperature>
<lowTemperature unit="F">58</lowTemperature>
<condition>Sunny</condition>
</day>
<day date="2024-08-13">
<highTemperature unit="F">72</highTemperature>
<lowTemperature unit="F">60</lowTemperature>
<condition>Partly Cloudy</condition>
</day>
<day date="2024-08-14">
<highTemperature unit="F">70</highTemperature>
<lowTemperature unit="F">59</lowTemperature>
<condition>Showers</condition>
</day>
</forecast>
</weatherReport>
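Values and units are deliberately kept apart in the weather report: the number is element text and the unit is an attribute. This minimal sketch, using Python's xml.etree.ElementTree on a trimmed copy of the <forecast> part (low temperatures omitted), reads both and averages the forecast highs.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the <forecast> part of the weather report above.
DOC = """<forecast>
<day date="2024-08-12"><highTemperature unit="F">75</highTemperature><condition>Sunny</condition></day>
<day date="2024-08-13"><highTemperature unit="F">72</highTemperature><condition>Partly Cloudy</condition></day>
<day date="2024-08-14"><highTemperature unit="F">70</highTemperature><condition>Showers</condition></day>
</forecast>"""

forecast = ET.fromstring(DOC)

highs = {}
for day in forecast.findall("day"):
    high = day.find("highTemperature")
    # The value is the element's text; the unit comes from its attribute.
    highs[day.get("date")] = (int(high.text), high.get("unit"))

average = sum(value for value, _unit in highs.values()) / len(highs)
print(highs)
print(f"Average forecast high: {average:.1f} F")
```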
Purchase Order:
<purchase_order>
<order_number>12345</order_number>
<date>2024-08-14</date>
<customer>
<name>John Doe</name>
<address>
<street>123 Elm Street</street>
<city>Springfield</city>
<state>IL</state>
<zip>62701</zip>
</address>
</customer>
<items>
<item>
<product_id>001</product_id>
<description>Laptop</description>
<quantity>1</quantity>
<price>999.99</price>
</item>
<item>
<product_id>002</product_id>
</item>
</items>
</purchase_order>
Contact Information:
<contacts>
<contact>
<name>John Doe</name>
<phone type="mobile">555-1234</phone>
<email>[email protected]</email>
<address>
<street>123 Elm Street</street>
<city>Springfield</city>
<state>IL</state>
<zip>62701</zip>
</address>
</contact>
<contact>
<name>Jane Smith</name>
<phone type="home">555-5678</phone>
<email>[email protected]</email>
<address>
<street>456 Oak Avenue</street>
<city>Metropolis</city>
<state>NY</state>
<zip>10001</zip>
</address>
</contact>
</contacts>
Movie Database:
<movies>
<movie>
<title>Inception</title>
<director>Christopher Nolan</director>
<release_date>2010-07-16</release_date>
<genre>Science Fiction</genre>
<rating>PG-13</rating>
<description>A mind-bending thriller that explores the concept of dreams within dreams.</description>
</movie>
<movie>
<title>The Matrix</title>
<director>The Wachowskis</director>
<release_date>1999-03-31</release_date>
<genre>Action</genre>
<rating>R</rating>
<description>A hacker discovers the reality he lives in is a simulation controlled by machines.</description>
</movie>
</movies>
Explanation: This XML example describes a movie database, including details like the title, director, release date, genre,
rating, and description for each movie.
XML Syntax
The syntax of XML (eXtensible Markup Language) is designed to be simple and
straightforward, but it must adhere to specific rules to be considered well-formed.
1. XML Declaration
The XML declaration is optional but recommended. It defines the XML version
and the character encoding used in the document.
<?xml version="1.0" encoding="UTF-8"?>
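Elements: Each element has a start tag and an end tag, and child elements are nested inside their parent.
<parentElement>
<childElement>Child content</childElement>
</parentElement>
Empty Elements: An element with no content can be written as a single self-closing tag.
<emptyElement />
CDATA Sections: <![CDATA[ and ]]> enclose text that should not be parsed.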
8. Processing Instructions
Processing instructions provide information to the application processing the XML document.
<?target instruction?>
10. Prolog: The prolog includes the XML declaration and any processing instructions, comments, or DOCTYPE
declarations that come before the root element.
<?xml version="1.0" encoding="UTF-8"?>
<!-- This is an XML document -->
<!DOCTYPE note SYSTEM "note.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
11. Root Element
Every XML document must have exactly one root element that contains
all other elements.
<rootElement>
<!-- Other elements go here -->
</rootElement>
XML Syntax Rules:
Case Sensitivity: XML tags are case-sensitive. <Note> and <note> are considered different elements.
Proper Nesting: Elements must be properly nested within each other. Improper nesting makes the
XML not well-formed, and XML parsers will reject it.
Single Root Element: The XML document must have one, and only one, root element that encloses all other
elements.
<root>
<child1>Content</child1>
<child2>More content</child2>
</root> <!-- Correct -->
<root1></root1>
<root2></root2> <!-- Incorrect: Multiple root elements -->
• By adhering to these syntax rules, an XML document is considered "well-formed" and can be reliably parsed and
processed by XML parsers.
Syntax Rules Summary
1. Opening Tag:
Definition: The tag that marks the start of an element. It is written within angle brackets.
Syntax: <elementName>
Example: <book>
Description: In this example, <book> is the opening tag for the "book" element.
2. Closing Tag:
Definition: The tag that marks the end of an element. It is written within angle brackets with a forward slash
before the element name.
Syntax: </elementName>
Example: </book>
Description: </book> is the closing tag for the "book" element. It corresponds to the opening tag and
signifies the end of the "book" element.
3. Self-Closing Tag:
Definition: A tag that marks an element that does not contain any content or child elements. It is a
combination of an opening and closing tag in one.
Syntax: <elementName />
Example: <br />
Description: <br /> is a self-closing tag that represents an empty element, such as the line-break element in XHTML. Self-closing
tags are often used for elements that do not have any content between the opening and closing tags.
Example of XML Tags in Context
<bookstore>
<book>
<title>XML Developer's Guide</title>
<author>John Doe</author>
<price>39.95</price>
</book>
</bookstore>
Key Points:
• Tags are used to define the structure and content of an XML document.
• Every opening tag must have a corresponding closing tag, except in the case of self-closing tags.
• Tags are enclosed in angle brackets (< >), with opening tags starting with the element name and closing tags
starting with a forward slash (/) followed by the element name.
• Self-closing tags are used for empty elements and combine the opening and closing tag into one.
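A self-closing tag is only a shorthand: a parser builds exactly the same element from <br /> and <br></br>. This minimal sketch checks that with Python's xml.etree.ElementTree, and also shows the serializer's short_empty_elements option, which chooses which form is written back out.

```python
import xml.etree.ElementTree as ET

# The same empty element, written as a self-closing tag and as a tag pair.
short_form = ET.fromstring("<doc><br /></doc>")
long_form = ET.fromstring("<doc><br></br></doc>")

# Both parse to a <br> child with no text and no children.
print(ET.tostring(short_form) == ET.tostring(long_form))      # True
print(ET.tostring(long_form))                                 # b'<doc><br /></doc>'
print(ET.tostring(long_form, short_empty_elements=False))     # b'<doc><br></br></doc>'
```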
XML Elements
XML Elements are the primary building blocks of an XML document. An XML
element consists of a start tag, content, and an end tag. Elements can contain text,
other elements (known as child elements), and attributes. They define the structure
and content of the XML data.
Example:
<title>The Great Gatsby</title>
Explanation:
<title> is the start tag.
The Great Gatsby is the content.
</title> is the end tag.
Together, they form a complete XML element representing the title of a book.
<book isbn="978-3-16-148410-0">
<title>The Great Gatsby</title>
<author>F. Scott Fitzgerald</author>
<price>10.99</price>
</book>
Explanation:
The <book> element has an attribute isbn with the value "978-3-16-148410-0".
Attributes are used to provide extra information that is not part of the content.
Key Points:
• Elements: Define the structure and content of XML data. They are the core components of an XML document.
• Tags: Elements are defined by a start tag and an end tag. The content between the tags can be text, other elements,
or both.
• Nesting: Elements can be nested within each other to create a hierarchical structure.
• Attributes: Provide additional information about elements. They are included in the start tag and consist of
name-value pairs.
XML Attributes
XML Attributes are used to provide additional information about XML elements.
Attributes are always included within the start tag of an element and consist of a
name-value pair. They are a way to add metadata or properties to elements
without affecting the element's content or structure.
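Example:
<book isbn="978-3-16-148410-0" format="hardcover">
<title>The Great Gatsby</title>
</book>
Explanation: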
• <book> is the element, and it has two attributes: isbn and format.
• isbn="978-3-16-148410-0" is an attribute that provides the ISBN number of the book.
• format="hardcover" is an attribute that specifies the format of the book.
• The attributes are placed within the start tag of the book element.
Multiple Attributes
An element can have multiple attributes, each separated by a space within the start tag.
Example:
Explanation:
The employee element has three attributes: id, department, and role.
These attributes provide additional details about the employee, such as their ID, department, and role.
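To see how a program reads several attributes from one start tag, here is a minimal sketch with Python's xml.etree.ElementTree. The attribute values (E42, Sales, Manager) and the employee name are hypothetical, chosen only for illustration. The last part shows that the same attribute name may not appear twice in one start tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical values for the id, department, and role attributes.
employee = ET.fromstring('<employee id="E42" department="Sales" role="Manager">Asha Rao</employee>')

print(employee.attrib)                     # all attributes as a dict
print(employee.get("role"))                # one attribute by name
print(employee.get("salary", "not set"))   # a missing attribute falls back to a default

# Repeating an attribute name in the same start tag is not well-formed.
try:
    ET.fromstring('<employee id="E1" id="E2"/>')
    duplicate_ok = True
except ET.ParseError:
    duplicate_ok = False
print(duplicate_ok)
```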
Well-formed XML Document
• Every element must have both a start tag and an end tag, e.g. <name> ... </name>
• But empty elements can be abbreviated: <break />.
• XML tags are case sensitive
• Names beginning with the letters xml, in any combination of cases, are reserved and should not be used for tags
• Elements must be properly nested, e.g. not <b><i>bold and italic</b></i>
• Every XML document must have one and only one root element
• The values of attributes must be enclosed in single or double quotes, e.g. <time unit="days">
• Character data cannot contain a literal < or &; write them as &lt; and &amp;
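The rules above can be checked mechanically, because any conforming XML parser must reject a document that breaks them. This minimal sketch runs one small sample per rule through Python's xml.etree.ElementTree; the sample snippets and the helper name is_well_formed are our own.

```python
import xml.etree.ElementTree as ET

# One small snippet per rule; True means the parser accepts it.
SAMPLES = {
    "empty element abbreviated": "<doc><break /></doc>",
    "improper nesting": "<doc><b><i>bold and italic</b></i></doc>",
    "two root elements": "<a></a><b></b>",
    "unquoted attribute": "<time unit=days>3</time>",
    "raw < in text": "<expr>1 < 2</expr>",
    "raw & in text": "<co>Smith & Sons</co>",
    "escaped < and &": "<expr>1 &lt; 2 &amp; 3</expr>",
}

def is_well_formed(text):
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

results = {name: is_well_formed(doc) for name, doc in SAMPLES.items()}
for name, ok in results.items():
    print(f"{name:28} {'OK' if ok else 'not well-formed'}")
```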
Names in XML
• Names (as used for tags and attributes) must begin with a letter or
underscore, and can consist of:
• Letters, both Roman (English) and foreign
• Digits, both ASCII (0-9) and foreign
• . (dot)
• - (hyphen)
• _ (underscore)
• : (colon), which should be used only for namespaces
• Combining characters and extenders (not used in English)
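A quick way to test a candidate name is to let a parser try it as a tag. This minimal sketch does that with Python's xml.etree.ElementTree; it is an approximation rather than a full implementation of the XML name grammar, and it leaves out names with colons because ElementTree treats those as namespace prefixes.

```python
import xml.etree.ElementTree as ET

def is_valid_name(name):
    """Approximate check: use `name` as an empty tag and let the parser decide."""
    try:
        ET.fromstring(f"<{name}/>")
        return True
    except ET.ParseError:
        return False

candidates = ["book", "_id", "first.name", "price-usd", "élève", "2ndItem", "-x", "my tag"]
for candidate in candidates:
    print(f"{candidate!r:14} {is_valid_name(candidate)}")
```

Names starting with a letter or underscore (including non-English letters such as "élève") are accepted; names starting with a digit or hyphen, or containing a space, are rejected.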
Transaction Data
Thousands of XML formats exist, in many different industries, to describe day-to-day
data transactions:
Document Type Definition (DTD)
Attribute Rules: A DTD specifies which attributes can be used within elements, their types,
and any default values they might have.
Entity Definitions: DTD allows the definition of entities, which are placeholders for
repeatable content or special characters.
Validation: DTD can be used to validate whether an XML document adheres to the
defined structure, ensuring consistency and correctness of the data.
Why Use DTD?
Data Integrity: Ensures that the XML document adheres to a
predefined structure, maintaining data integrity.
Interoperability: Facilitates data exchange between different systems
by enforcing a common data structure.
Validation: Provides a mechanism to check whether a well-formed XML document is also
"valid" according to the defined rules.
Types of DTD:
• Internal DTD: Embedded directly within the XML document.
• External DTD: Stored in a separate file and referenced by the XML
document.
Example of a Simple DTD
<!DOCTYPE note [
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ATTLIST note
id ID #REQUIRED
>
]>
<note id="n1">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
Example of XML with DTD
1. Internal DTD Example:
<!DOCTYPE note [
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ATTLIST note id ID #REQUIRED>
]>
<note id="n1">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
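2. External DTD Example:
a. External DTD File (note.dtd):
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ATTLIST note id ID #REQUIRED>
b. XML Document:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note id="n1">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>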
Internal DTD Syntax:
<!DOCTYPE root-element [
<!-- Internal DTD declarations here -->
]>
Element Declaration Syntax:
<!ELEMENT element-name content-model>
5. Any Content (ANY)/Unrestricted element: The element can contain any type of
content, including text and child elements.
<!ELEMENT element-name ANY>
<!ELEMENT note ANY>
<!DOCTYPE document [
<!-- Text Content -->
<!ELEMENT title (#PCDATA)>
<!-- Element Content -->
<!ELEMENT book (title, author, publisher)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!-- Mixed Content -->
<!ELEMENT paragraph (#PCDATA | bold | italic)*>
<!ELEMENT bold (#PCDATA)>
<!ELEMENT italic (#PCDATA)>
<!-- Empty Content -->
<!ELEMENT br EMPTY>
<!-- Any Content -->
<!ELEMENT content ANY>
]>
Matching XML for each content model:
<document>
<title>Sample Title</title> <!-- Simple Element -->
</document>
<library>
<book>
<title>Sample Book</title>
<author>Author Name</author> <!-- Compound Element -->
<publisher>Publisher Name</publisher>
</book>
</library>
<document>
<paragraph>This is <bold>bold</bold> and <italic>italic</italic> text.</paragraph> <!-- Mixed Content -->
</document>
<document>
<p>This is a line.<br/></p> <!-- Empty Element -->
</document>
<document>
<content> <!-- Unrestricted Element -->
<title>Sample Title</title>
<description>This can contain any elements.</description>
</content>
</document>
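Occurrence Indicators: In a content model, + means one or more, * means zero or more, ? means zero or one, and no indicator means exactly one.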
<!ELEMENT book (title, author+, publisher?, price, isbn, year?)> <!-- A book must have one title, at least one author,
an optional publisher, one price, one ISBN, and an optional year -->
<books>
<book>
<title>XML Developer's Guide</title>
<author>John Doe</author>
<publisher>Example Press</publisher>
<price>44.95</price>
<isbn>1234567890</isbn>
<year>2020</year>
</book>
</books>
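A content model with occurrence indicators behaves much like a regular expression over the sequence of child tags. The sketch below hand-translates (title, author+, publisher?, price, isbn, year?) into a Python regex and checks <book> elements against it. This is only an illustration of what the indicators mean, not how real DTD validators work; Python's standard library does not validate against DTDs (third-party libraries such as lxml can).

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content model: (title, author+, publisher?, price, isbn, year?)
# Each child is encoded as "tag," so the regex can match the whole sequence.
BOOK_MODEL = re.compile(r"title,(author,)+(publisher,)?price,isbn,(year,)?")

def matches_model(book):
    """Return True if the child elements of `book` follow the content model."""
    sequence = "".join(child.tag + "," for child in book)
    return BOOK_MODEL.fullmatch(sequence) is not None

ok = ET.fromstring("<book><title>T</title><author>A</author><author>B</author>"
                   "<price>1</price><isbn>9</isbn></book>")
no_author = ET.fromstring("<book><title>T</title><price>1</price><isbn>9</isbn></book>")

print(matches_model(ok))          # two authors, no publisher or year: allowed
print(matches_model(no_author))   # author+ requires at least one author
```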
3. Attribute Declaration
In XML DTD (Document Type Definition), an attribute is a property associated with an
element that provides additional information about that element. Attributes are used to
define data about elements in XML documents and to constrain their values.
a. Attribute Types:
• Example:
<!ATTLIST person
id ID #REQUIRED
gender (male | female) "male"
>
Attribute Types
1. CDATA: Character data.
<!ATTLIST element-name attribute-name CDATA #IMPLIED>
Example:
<!ATTLIST book isbn CDATA #REQUIRED>
Example XML: <book isbn="1234567890">...</book>
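4. Entity Declaration
1. Internal Entities: Define replacement text inside the DTD; every reference &entity-name; is replaced by that text.
Syntax:
<!ENTITY entity-name "entity-content">
• Example:
<!ENTITY author "BalaguruSwamy">
Usage in XML:
<document>
<author>&author;</author>
</document>
Resulting XML:
<document>
<author>BalaguruSwamy</author>
</document>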
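Python's standard XML parser (expat, used by xml.etree.ElementTree) expands internal entities declared in the document's own DTD, so this minimal sketch reproduces the "Resulting XML" above. The second part shows that a reference to an entity that was never declared is rejected.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE document [
<!ENTITY author "BalaguruSwamy">
]>
<document>
<author>&author;</author>
</document>"""

root = ET.fromstring(DOC)
print(root.findtext("author"))   # the entity has been replaced by its text

# An undeclared entity reference is an error.
try:
    ET.fromstring("<document><author>&unknown;</author></document>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)
```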
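2. External Entities: Refer to an external file or resource. The external
file's content is included at the location where the entity is referenced.
Syntax:
<!ENTITY entity-name SYSTEM "URI">
Example (the file name chapter1.xml is illustrative):
<!ENTITY chapter1 SYSTEM "chapter1.xml">
<document>
&chapter1;
</document>
This would include the content of chapter1.xml where &chapter1; appears. External entities cannot be referenced inside attribute values, so a binary file such as logo.png is declared as an unparsed entity instead.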
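3. Unparsed Entities: Used for referring to data that is not parsed as XML. Often
used for binary data or non-XML formats.
• Syntax:
<!ENTITY entity-name SYSTEM "URI" NDATA notation-name>
• Example:
<!NOTATION png SYSTEM "image/png">
<!ENTITY logo SYSTEM "logo.png" NDATA png>
<!ATTLIST graphic file ENTITY #REQUIRED>
<document>
<graphic file="logo"/>
</document>
Note: The NDATA declaration indicates that the entity refers to unparsed data and should be
handled by an application specified by the notation-name. An unparsed entity is referenced by its name in an attribute of type ENTITY, not with &logo;.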
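Entity Declaration Syntax
Here’s a summary of the syntax for declaring entities in a DTD:
1. Internal Entity: <!ENTITY name "replacement text">
2. External Entity: <!ENTITY name SYSTEM "URI">
3. Unparsed Entity: <!ENTITY name SYSTEM "URI" NDATA notation-name>
Combined example (notation and file names are illustrative):
<!DOCTYPE document [
<!-- Internal Entity -->
<!ENTITY author "John Doe">
<!-- Unparsed Entities and their notations -->
<!NOTATION png SYSTEM "image/png">
<!NOTATION pdf SYSTEM "application/pdf">
<!ENTITY logo SYSTEM "logo.png" NDATA png>
<!ENTITY file SYSTEM "file.pdf" NDATA pdf>
<!ATTLIST img src ENTITY #REQUIRED>
<!ATTLIST attachment file ENTITY #REQUIRED>
]>
<document>
<title>Author Info</title>
<author>&author;</author>
<img src="logo"/>
<attachment file="file"/>
</document>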
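5. Notation Declaration
Define notations for non-XML data.
• Syntax:
<!NOTATION notation-name SYSTEM "identifier">
• Example:
<!NOTATION png SYSTEM "image/png">
1. CDATA (Character Data): Text that is not parsed as markup. In a DTD, CDATA is used only as an attribute type; element text is declared with #PCDATA.
• Syntax in DTD:
<!ELEMENT element-name (content-model)>
<!ATTLIST element-name attribute-name CDATA #IMPLIED>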
Usage in XML: CDATA sections are used to include text that might otherwise be interpreted as XML
markup. This is especially useful for including code, scripts, or text with special characters.
• Example in XML:
<example>
<![CDATA[
<note>This is a CDATA section. < & > are not parsed as markup.</note>
]]>
</example>
In the above XML, the CDATA section preserves the text exactly as it is, without parsing the <, >, and
& characters as XML markup or entities.
2. PCDATA (Parsed Character Data): Represents text that should be parsed
by the XML processor. This means that any markup or entity references within PCDATA
content are interpreted according to XML rules, such as converting &amp; to &.
• Syntax in DTD:
<!ELEMENT element-name (#PCDATA)>
Usage in XML: PCDATA is the default type of text content in XML elements, so special
characters must be written as entity references, which the parser converts back into the characters.
• Example in XML:
<example>
<text>This is PCDATA. &lt; &amp; &gt; are parsed and must be escaped.</text>
</example>
In the above XML, the entity references &lt;, &amp;, and &gt; are converted to the characters <, &, and >,
respectively, when the XML is processed.
weather_report.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE weather_report [
<!ELEMENT weather_report (location, date, forecast+)>
<!ELEMENT location (city, country, coordinates?)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ELEMENT coordinates (latitude, longitude)>
<!ELEMENT latitude (#PCDATA)>
<!ELEMENT longitude (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT forecast (time_of_day, temperature, humidity,
wind_speed, conditions)>
<!ELEMENT time_of_day (#PCDATA)>
<!ELEMENT temperature (#PCDATA)>
<!ELEMENT humidity (#PCDATA)>
<!ELEMENT wind_speed (#PCDATA)>
<!ELEMENT conditions (#PCDATA)>
<!ATTLIST weather_report
unit CDATA "metric"
>
<!ATTLIST forecast
wind_direction CDATA #IMPLIED
>
]>
<weather_report>
<location>
<city>Mumbai</city>
<country>India</country>
<coordinates>
<latitude>19.0760N</latitude>
<longitude>72.8777E</longitude>
</coordinates>
</location>
<date>2024-08-22</date>
<forecast wind_direction="SW">
<time_of_day>Morning</time_of_day>
<temperature>28°C</temperature>
<humidity>85%</humidity>
<wind_speed>20 km/h</wind_speed>
<conditions>Cloudy</conditions>
</forecast>
<forecast>
<time_of_day>Afternoon</time_of_day>
<temperature>32°C</temperature>
<humidity>70%</humidity>
<wind_speed>18 km/h</wind_speed>
<conditions>Sunny</conditions>
</forecast>
</weather_report>
Library.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library [
<!ELEMENT library (book+)>
<!ELEMENT book (title, author+, publisher?, year, genre, isbn)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT genre (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
<!ATTLIST book
format (hardcover | paperback | ebook) "paperback"
language CDATA "English"
>
]>
<library>
<book format="hardcover" language="Hindi">
<title>Godaan</title>
<author>Munshi Premchand</author>
<publisher>Saraswati Press</publisher>
<year>1936</year>
<genre>Fiction</genre>
<isbn>9788170281355</isbn>
</book>
<book>
<title>Wings of Fire</title>
<author>A.P.J. Abdul Kalam</author>
<author>Arun Tiwari</author>
<year>1999</year>
<genre>Biography</genre>
<isbn>8173711461</isbn>
</book>
<book format="ebook">
<title>Ignited Minds</title>
<author>A.P.J. Abdul Kalam</author>
<year>2002</year>
<genre>Non-Fiction</genre>
<isbn>0143424127</isbn>
</book>
</library>
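Python's standard library has no DTD validator. The sketch below therefore hand-translates the `book` content model into a regular expression over child tag names, to illustrate that DTD content models behave like regular expressions (`+` means one or more, `?` means optional). The helper `matches_book_model` and the third, deliberately invalid book are hypothetical. This is an illustration, not a substitute for a validating parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical helper: the DTD content model
#   <!ELEMENT book (title, author+, publisher?, year, genre, isbn)>
# hand-translated into a regular expression over child tag names.
BOOK_MODEL = re.compile(r"title,(author,)+(publisher,)?year,genre,isbn,")

def matches_book_model(book: ET.Element) -> bool:
    # Join the child element names as "tag," so the regex sees the sequence.
    sequence = "".join(child.tag + "," for child in book)
    return BOOK_MODEL.fullmatch(sequence) is not None

library = ET.fromstring("""<library>
  <book format="hardcover" language="Hindi">
    <title>Godaan</title><author>Munshi Premchand</author>
    <publisher>Saraswati Press</publisher><year>1936</year>
    <genre>Fiction</genre><isbn>9788170281355</isbn>
  </book>
  <book>
    <title>Wings of Fire</title><author>A.P.J. Abdul Kalam</author>
    <author>Arun Tiwari</author><year>1999</year>
    <genre>Biography</genre><isbn>8173711461</isbn>
  </book>
  <book>
    <!-- hypothetical invalid entry: isbn is missing -->
    <title>Broken</title><author>Nobody</author>
    <year>2000</year><genre>Test</genre>
  </book>
</library>""")

results = [matches_book_model(b) for b in library.findall("book")]
print(results)  # [True, True, False]
```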
Employee.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE employees SYSTEM "employees.dtd">
<employees>
<employee id="E001">
<name>John &nameSeparator; Doe</name>
<position>Software Engineer</position>
<department>Development</department>
<email>john.doe@&defaultDomain;</email>
<description>&jobDescription;</description>
</employee>
<employee id="E002">
<name>Jane Smith</name>
<position>Project Manager</position>
<department>Management</department>
<email>jane.smith@&defaultDomain;</email>
<description>&jobDescription;</description>
</employee>
<employee id="E003">
<name>Emily Johnson</name>
<position>UX Designer</position>
<department>Design</department>
<email>emily.johnson@&defaultDomain;</email>
<description>&jobDescription;</description>
</employee>
</employees>
employees.dtd
<!ELEMENT employees (employee+)>
<!ELEMENT employee (name, position, department?, email, description)>
<!ATTLIST employee id CDATA #REQUIRED
status (active | inactive) "active">
<!ELEMENT name (#PCDATA)>
<!ELEMENT position (#PCDATA)>
<!ELEMENT department (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!-- Customized Entities -->
<!ENTITY nameSeparator ", ">
<!ENTITY defaultDomain "example.com">
<!ENTITY jobDescription "This employee is part of the team.">
<!-- Predefined Entities: XML already defines these five, so declaring them is optional;
if they are declared, the replacement text must use character references as shown -->
<!ENTITY amp "&#38;#38;">
<!ENTITY lt "&#38;#60;">
<!ENTITY gt "&#62;">
<!ENTITY quot "&#34;">
<!ENTITY apos "&#39;">
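The sketch below shows entity expansion in practice with Python's xml.etree.ElementTree. ElementTree does not fetch external DTD files such as employees.dtd. The sketch therefore assumes the two custom entities are moved into an internal DTD subset, and it uses a trimmed copy of one employee record.

```python
import xml.etree.ElementTree as ET

# Assumption: the custom entities are declared in an internal DTD subset,
# because ElementTree does not load external DTD files like employees.dtd.
doc = """<?xml version="1.0"?>
<!DOCTYPE employees [
  <!ENTITY defaultDomain "example.com">
  <!ENTITY jobDescription "This employee is part of the team.">
]>
<employees>
  <employee id="E002">
    <name>Jane Smith</name>
    <email>jane.smith@&defaultDomain;</email>
    <description>&jobDescription;</description>
  </employee>
</employees>"""

employee = ET.fromstring(doc).find("employee")
# The parser replaces each entity reference with its declared text.
email = employee.findtext("email")
description = employee.findtext("description")
print(email)        # jane.smith@example.com
print(description)  # This employee is part of the team.
```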
Limitations of DTD
• DTDs have no built-in data types; element content is just text (#PCDATA)
• No new data types can be created in DTDs
• Cardinality can only be expressed with the ?, *, and + operators
• Namespaces are not supported
• DTDs provide very limited support for modularity and reuse
• We cannot put any restrictions on text content
• Defaults for elements cannot be specified
• We have very little control over mixed content
• DTDs use their own non-XML syntax, so they cannot be processed or validated with ordinary XML tools
XML namespaces
• XML namespaces are a mechanism in XML to avoid name conflicts by
qualifying element and attribute names with a unique identifier.
• They are essential when combining XML documents from different XML
vocabularies or when different XML vocabularies are used together in a
single document.
• Name collision occurs when elements from two or more documents share
the same name.
• Name collision is not a problem if you are not concerned with validation, because the
document content then only needs to be well-formed. However, name collision will
keep a document from being validated.
This figure shows name collision
Benefits of Using XML Namespaces:
• Prevents Name Conflicts: Avoids collisions when combining XML
documents from different sources.
• Supports Modularity: Encourages modularity by allowing XML
documents to use elements and attributes from different
vocabularies.
• Facilitates Integration: Enables the integration of data from various
sources without name clashes.
• Promotes Standardization: Helps in adhering to industry standards
where specific namespaces are used for standard elements and
attributes.
Namespace Declaration Syntax
• A namespace is a defined collection of element and attribute names.
xmlns:prefix="URI"
<prefix:element>
content
</prefix:element>
• Here, prefix is the namespace prefix and element is the local part of the element
name.
Apply namespace to attribute
<book xmlns:bk="http://example.com/book"
xmlns:trans="http://example.com/translation" bk:title="Mastering XML" bk:lang="en">
</book>
a. Example Without Namespaces
<library>
<item>
<title>Mastering XML</title> <!-- Is this a book or a magazine? -->
<author>Jane Doe</author>
</item>
<item>
<title>Monthly Tech</title> <!-- Is this a book or a magazine? -->
<editor>John Smith</editor>
</item>
</library>
In this example, there’s no way to distinguish whether the title refers to a book or a magazine. This
could lead to confusion or errors in processing the document.
b. Example With Namespaces
Use namespaces to clearly differentiate between books and magazines:
<library xmlns:bk="http://example.com/books"
xmlns:mag="http://example.com/magazines">
<bk:book>
<bk:title>Mastering XML</bk:title>
<bk:author>Jane Doe</bk:author>
</bk:book>
<mag:magazine>
<mag:title>Monthly Tech</mag:title>
<mag:editor>John Smith</mag:editor>
</mag:magazine>
</library>
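To see what a namespace-aware parser does with these prefixes, here is a minimal sketch using Python's xml.etree.ElementTree on a trimmed copy of the example. The lookup prefixes `b` and `m` are arbitrary local names chosen for the sketch; only the URIs must match the document.

```python
import xml.etree.ElementTree as ET

doc = """<library xmlns:bk="http://example.com/books"
         xmlns:mag="http://example.com/magazines">
  <bk:book><bk:title>Mastering XML</bk:title></bk:book>
  <mag:magazine><mag:title>Monthly Tech</mag:title></mag:magazine>
</library>"""

root = ET.fromstring(doc)

# ElementTree replaces each prefix with its URI, so the two <title>
# elements get different expanded names of the form {URI}localname.
tags = [el.tag for el in root.iter() if el.tag.endswith("}title")]
print(tags)

# Prefixes in this lookup map are local to the code; only the URIs matter.
ns = {"b": "http://example.com/books", "m": "http://example.com/magazines"}
book_title = root.findtext("b:book/b:title", namespaces=ns)
magazine_title = root.findtext("m:magazine/m:title", namespaces=ns)
print(book_title, "|", magazine_title)  # Mastering XML | Monthly Tech
```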
Declarations
1. Namespace Declaration:
• An XML namespace is declared using the xmlns attribute in the start tag of an element.
• The xmlns attribute's value is a URI (Uniform Resource Identifier) that serves as a
unique identifier for the namespace.
Example:
<root xmlns:prefix="http://example.com/namespace">
<prefix:child>Content</prefix:child>
</root>
2. Default Namespace Declaration:
• An xmlns attribute without a prefix declares a default namespace.
Example:
<root xmlns="http://example.com/namespace">
<child>Content</child>
</root>
In this case, root and all of its unprefixed child elements are part of the default namespace.
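A quick way to compare the prefixed and default-namespace forms is to parse both with Python's xml.etree.ElementTree and look at the expanded names. This is a minimal sketch that reuses the two snippets from this section.

```python
import xml.etree.ElementTree as ET

prefixed = ET.fromstring(
    '<root xmlns:prefix="http://example.com/namespace">'
    '<prefix:child>Content</prefix:child></root>')
defaulted = ET.fromstring(
    '<root xmlns="http://example.com/namespace">'
    '<child>Content</child></root>')

# Prefixed form: only <prefix:child> is in the namespace; <root> is not.
print(prefixed.tag, prefixed[0].tag)
# Default form: <root> and the unprefixed <child> are both in the namespace.
print(defaulted.tag, defaulted[0].tag)
```

In both forms the child ends up with the same expanded name, so a namespace-aware application treats the two children identically.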
3. Using Multiple Namespaces:
Multiple namespaces can be declared within a single XML document by using different
prefixes.
Example:
• ns1 and ns2 are different namespaces used to qualify element1 and element2.
4. Namespace Scope:
The scope of a namespace is limited to the element where it's declared and its
children unless overridden by a new namespace declaration.
Here, the book element and its title child belong to the default namespace
(http://example.com/books), while the author element and its name child belong to a
different namespace (http://example.com/authors).
Types of Declarations
Declare Namespaces:
1. Root Element (Global Scope): Applies the namespace across the
entire document.
2. Child Element (Local Scope): Restricts the namespace to a specific
element and its children.
3. Attributes: Associates a namespace with an attribute, useful for
distinguishing attributes with the same name in different contexts.
4. XML Schema: Declares namespaces for schema elements and
types, setting the target namespace for validation.
1. Root Element (Global Scope):
<library xmlns="http://example.com/library"
xmlns:bk="http://example.com/books"
xmlns:auth="http://example.com/authors">
<bk:book>
<bk:title>Mastering XML</bk:title>
<auth:author>Jane Doe</auth:author>
</bk:book>
</library>
2. Child Element (Local Scope):
<library>
<bk:book xmlns:bk="http://example.com/books">
<bk:title>Mastering XML</bk:title>
<author>Jane Doe</author> <!-- Not in the bk namespace -->
</bk:book>
</library>
3. Attributes:
<library xmlns:bk="http://example.com/books">
<bk:book bk:title="Mastering XML"
xmlns:trans="http://example.com/translation" trans:lang="en">
<trans:translator>Juan Pérez</trans:translator>
</bk:book>
</library>
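The sketch below parses the attribute example with Python's xml.etree.ElementTree to show how namespaced attributes are named. As a hypothetical addition, it also puts an unprefixed `lang="fr"` on the book to show that this is a separate attribute from `trans:lang`. Unprefixed attributes belong to no namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical: an extra unprefixed lang="fr" is added to show that it is
# a different attribute from trans:lang.
doc = """<library xmlns:bk="http://example.com/books">
  <bk:book bk:title="Mastering XML"
           xmlns:trans="http://example.com/translation" trans:lang="en"
           lang="fr">
    <trans:translator>Juan Pérez</trans:translator>
  </bk:book>
</library>"""

book = ET.fromstring(doc).find("{http://example.com/books}book")
attribute_names = sorted(book.attrib)
print(attribute_names)
# Prefixed attributes carry their namespace URI; the unprefixed one has none.
print(book.get("{http://example.com/translation}lang"))  # en
print(book.get("lang"))                                    # fr
```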
4. XML Schema
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:bk="http://example.com/books"
targetNamespace="http://example.com/books">
<xs:element name="book" type="bk:BookType"/>
<xs:complexType name="BookType">
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="author" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
Namespace Examples
XML Document with Namespaces: Example1
<library xmlns="http://example.com/library" xmlns:bk="http://example.com/books"
xmlns:auth="http://example.com/authors">
<bk:book>
<bk:title>Mastering XML</bk:title>
<auth:author>
<auth:name>Jane Doe</auth:name>
<auth:birthplace>Chennai</auth:birthplace>
</auth:author>
<bk:published>2024</bk:published>
</bk:book>
<bk:book>
<bk:title>Learning XPath</bk:title>
<auth:author>
<auth:name>John Smith</auth:name>
<auth:birthplace>Pune</auth:birthplace>
</auth:author>
<bk:published>2023</bk:published>
</bk:book>
</library>
Example Explanation:
1. Namespace Declarations:
• The default namespace (xmlns="http://example.com/library") applies to the library element and any child
elements that don't have a prefix.
• The bk prefix (xmlns:bk="http://example.com/books") is used for elements related to books.
• The auth prefix (xmlns:auth="http://example.com/authors") is used for elements related to authors.
2. XML Structure:
• The library element is the root. It has no prefix, so it belongs to the default namespace (http://example.com/library).
• Inside the library, there are two bk:book elements, each with details like title, author, and published year.
• The author details are within the auth namespace, allowing the separation of book and author information
into distinct, manageable vocabularies.
This example shows how XML namespaces help organize elements and attributes from different vocabularies
within the same document, preventing name conflicts and making the document more modular and
maintainable.
Example2: Order.xml
Example Without Namespaces:
<orderInfo>
<id>12345</id>
<date>2024-08-24</date>
<name>John Doe</name>
<address>
<id>67890</id>
<street>Main Street</street>
<city>Bangalore</city>
<date>2024-08-22</date>
</address>
<items>
<item>
<id>001</id>
<name>Laptop</name>
<quantity>1</quantity>
</item>
<item>
<id>002</id>
<name>Mouse</name>
<quantity>2</quantity>
</item>
</items>
</orderInfo>
Issues Without Namespaces
Ambiguity:
• The id and date elements are used both in the context of orders and addresses, leading to potential
confusion. Is id referring to the order ID, the address ID, or the item ID? What does date represent?
• The name element is used for both the customer’s name and the product name, which could cause
further confusion.
Example With Namespaces:
<orderInfo xmlns:ord="http://example.com/order" xmlns:addr="http://example.com/address"
xmlns:prod="http://example.com/items">
<ord:id>12345</ord:id>
<ord:date>2024-08-24</ord:date>
<ord:name>John Doe</ord:name>
<ord:address>
<addr:id>67890</addr:id>
<addr:street>Main Street</addr:street>
<addr:city>Bangalore</addr:city>
<addr:date>2024-08-22</addr:date>
</ord:address>
<ord:items>
<ord:item>
<prod:id>001</prod:id>
<prod:name>Laptop</prod:name>
<prod:quantity>1</prod:quantity>
</ord:item>
<ord:item>
<prod:id>002</prod:id>
<prod:name>Mouse</prod:name>
<prod:quantity>2</prod:quantity>
</ord:item>
</ord:items>
</orderInfo>
Benefits With Namespaces
a. Clarity and Disambiguation:
• The ord:id, addr:id, and prod:id elements are now clearly distinct, with ord:id referring to the order ID, addr:id to the
address ID, and prod:id to the product ID.
• Similarly, ord:date and addr:date are clearly distinct, representing the order date and the address date, respectively.
• The ord:name element is now clearly the customer’s name, while prod:name is the product name.
b. Contextual Separation:
• Each element is explicitly tied to its context (ord for order, addr for address, and prod for product), eliminating any confusion
about what each element represents.
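The disambiguation can be seen in code. This minimal sketch uses Python's xml.etree.ElementTree on a trimmed copy of the namespaced order. Each `id` is looked up through its own namespace, and a lookup for a plain, unqualified `id` finds nothing, because every `id` element now lives in a namespace.

```python
import xml.etree.ElementTree as ET

doc = """<orderInfo xmlns:ord="http://example.com/order"
           xmlns:addr="http://example.com/address"
           xmlns:prod="http://example.com/items">
  <ord:id>12345</ord:id>
  <ord:address><addr:id>67890</addr:id></ord:address>
  <ord:items>
    <ord:item><prod:id>001</prod:id></ord:item>
    <ord:item><prod:id>002</prod:id></ord:item>
  </ord:items>
</orderInfo>"""

ns = {
    "ord": "http://example.com/order",
    "addr": "http://example.com/address",
    "prod": "http://example.com/items",
}
root = ET.fromstring(doc)

order_id = root.findtext("ord:id", namespaces=ns)
address_id = root.findtext(".//addr:id", namespaces=ns)
product_ids = [e.text for e in root.findall(".//prod:id", namespaces=ns)]
# No element is named plain "id" any more, so this finds nothing.
unqualified = root.findall(".//id")

print(order_id, address_id, product_ids, unqualified)
```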
Default Namespace Declaration
A default namespace declaration assigns a namespace to all unprefixed element names within
its scope, so elements without a prefix are taken to belong to that namespace. Unprefixed
attributes are not affected: they stay in no namespace. The xmlns attribute (with no prefix) is
used to declare the default namespace.
Syntax:
<root xmlns="http://example.com/ns">
<child>Content</child>
</root>
• Syntax (prefixed namespace, for comparison):
<root xmlns:prefix="http://example.com/ns">
<prefix:child>Content</prefix:child>
</root>
a. Simple Element: A simple element can contain only text. It cannot contain any child
elements or attributes.
b. Complex Element: A complex element can contain child elements and/or attributes.
Syntax:
<xs:element name="elementName">
<xs:complexType>
<!-- Child elements and/or attributes -->
</xs:complexType>
</xs:element>
Example:
<xs:element name="address">
<xs:complexType>
<xs:sequence>
<xs:element name="street" type="xs:string"/>
<xs:element name="city" type="xs:string"/>
<xs:element name="state" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
2. Attributes
Attributes provide additional information about elements. Unlike
elements, attributes cannot contain other elements.
Syntax:
<xs:attribute name="attributeName" type="xs:dataType" use="optional|required"/>
Example:
<xs:attribute name="id" type="xs:int" use="required"/>
This defines an id attribute that must be an integer and is required.
3. Data Types
XML Schema supports built-in data types, which ensure that the data conforms to a specific format.
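Commonly used built-in types include xs:string, xs:int, xs:decimal, xs:boolean, and xs:date.
4. Complex Types
A complex type defines elements that contain child elements and/or attributes.
Example: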
<xs:complexType name="personType">
<xs:sequence>
<xs:element name="firstName" type="xs:string"/>
<xs:element name="lastName" type="xs:string"/>
<xs:element name="age" type="xs:int"/>
</xs:sequence>
<xs:attribute name="gender" type="xs:string"/>
</xs:complexType>
<xs:element name="person" type="personType"/>
5. Simple Types
Simple types are used to restrict or define custom data types for elements and attributes.
Restriction: Restriction limits the value of a simple type.
Syntax:
<xs:simpleType name="simpleTypeName">
<xs:restriction base="xs:dataType">
<!-- Constraints like minLength, maxLength, pattern, etc. -->
</xs:restriction>
</xs:simpleType>
Example:
<xs:simpleType name="zipcodeType">
<xs:restriction base="xs:string">
<xs:pattern value="\d{5}"/>
</xs:restriction>
</xs:simpleType>
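An XSD pattern facet is implicitly anchored: the whole value must match, not just a substring. The minimal Python sketch below mirrors the zipcodeType restriction with `re.fullmatch`. It is an approximation only, because Python's regular-expression dialect is not identical to the XSD one, and `is_valid_zipcode` is a hypothetical helper.

```python
import re

# XSD patterns are implicitly anchored: the whole value must match.
# re.fullmatch mirrors that; re.search would wrongly accept "123456".
ZIP_PATTERN = re.compile(r"\d{5}")

def is_valid_zipcode(value: str) -> bool:
    return ZIP_PATTERN.fullmatch(value) is not None

results = {v: is_valid_zipcode(v) for v in ["12345", "1234", "123456", "12a45"]}
print(results)
# {'12345': True, '1234': False, '123456': False, '12a45': False}
```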
a. Sequence: Specifies that child elements must appear in the specified order.
Syntax:
<xs:sequence>
<!-- Define child elements here -->
</xs:sequence>
Example:
<xs:complexType name="addressType">
<xs:sequence>
<xs:element name="street" type="xs:string"/>
<xs:element name="city" type="xs:string"/>
<xs:element name="state" type="xs:string"/>
</xs:sequence>
</xs:complexType>
b. Choice:
Specifies that only one of the child elements can appear.
Syntax:
<xs:choice>
<!-- Define child elements here -->
</xs:choice>
Example:
<xs:complexType name="contactInfoType">
<xs:choice>
<xs:element name="email" type="xs:string"/>
<xs:element name="phone" type="xs:string"/>
</xs:choice>
</xs:complexType>
c. All:
Specifies that all child elements must appear, but in any order.
Syntax:
<xs:all>
<!-- Define child elements here -->
</xs:all>
Example:
<xs:complexType name="identityType">
<xs:all>
<xs:element name="firstName" type="xs:string"/>
<xs:element name="lastName" type="xs:string"/>
<xs:element name="id" type="xs:int"/>
</xs:all>
</xs:complexType>
7. Namespaces
Namespaces are used to avoid element name conflicts by qualifying names.
8. Annotations
Annotations add human-readable documentation to a schema.
Syntax:
<xs:annotation>
<xs:documentation>
<!-- Description here -->
</xs:documentation>
</xs:annotation>
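9. Identity Constraints
a. Unique: Ensures that the values of the selected field are unique within the scope of the containing element (a missing value is allowed).
Syntax: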
<xs:unique name="uniqueConstraintName">
<xs:selector xpath="XPath_expression"/>
<xs:field xpath="XPath_expression"/>
</xs:unique>
Example:
<xs:unique name="uniqueEmployeeID">
<xs:selector xpath=".//employee"/>
<xs:field xpath="employeeID"/>
</xs:unique>
b. Key:
Defines a unique key within a specific scope.
Syntax:
<xs:key name="keyName">
<xs:selector xpath="XPath_expression"/>
<xs:field xpath="XPath_expression"/>
</xs:key>
Example:
<xs:key name="employeeKey">
<xs:selector xpath=".//employee"/>
<xs:field xpath="employeeID"/>
</xs:key>
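The difference between xs:unique and xs:key is easier to see in code. This minimal Python sketch uses ElementTree paths in place of the selector and field XPaths. Both constraints reject duplicate values, but only xs:key also requires every selected node to have the field. The helper `check_identity` and the sample data are hypothetical, and the sketch compares values as strings, whereas XSD compares typed values.

```python
import xml.etree.ElementTree as ET

def check_identity(root, selector, field, require_field):
    """Hypothetical helper. require_field=True mimics xs:key,
    require_field=False mimics xs:unique. Values are compared as
    strings (an assumption; XSD compares typed values)."""
    seen = set()
    for node in root.iterfind(selector):
        value = node.findtext(field)
        if value is None:
            if require_field:
                return False  # xs:key: every selected node needs the field
            continue          # xs:unique: a missing field is allowed
        if value in seen:
            return False      # a duplicate violates both constraints
        seen.add(value)
    return True

staff = ET.fromstring("""<staff>
  <employee><employeeID>101</employeeID></employee>
  <employee><employeeID>102</employeeID></employee>
  <employee/>
</staff>""")

as_key = check_identity(staff, ".//employee", "employeeID", require_field=True)
as_unique = check_identity(staff, ".//employee", "employeeID", require_field=False)
print(as_key, as_unique)  # False True

dupes = ET.fromstring("<staff><employee><employeeID>7</employeeID></employee>"
                      "<employee><employeeID>7</employeeID></employee></staff>")
dupes_ok = check_identity(dupes, ".//employee", "employeeID", require_field=False)
print(dupes_ok)           # False
```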
c. Keyref:
References a key defined elsewhere, establishing a relationship.
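Syntax:
<xs:keyref name="keyrefName" refer="keyName">
<xs:selector xpath="XPath_expression"/>
<xs:field xpath="XPath_expression"/>
</xs:keyref>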
2. type:
Purpose: Defines the data type of the element (e.g., xs:string, xs:int, custom complex types).
Example: <xs:element name="age" type="xs:int"/>
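3. minOccurs:
Purpose: Specifies the minimum number of times the element must appear. Default: 1
4. maxOccurs:
Purpose: Specifies the maximum number of times the element may appear; use "unbounded" for no limit. Default: 1
Example: <xs:element name="phone" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
5. default:
Purpose: Specifies a default value that is used when the element is present but empty (unlike an attribute default, it does not supply a missing element).
Example: <xs:element name="country" type="xs:string" default="India"/>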
6. fixed:
Purpose: Specifies a fixed value for the element. The XML document must use this exact value.
Example: <xs:element name="currency" type="xs:string" fixed="INR"/>
7. nillable:
Purpose: Indicates whether the element can be explicitly set to nil in the XML document.
Default: false
Example: <xs:element name="middlename" type="xs:string" nillable="true"/>
9. substitutionGroup:
Purpose: Allows one element to be substituted for another in an XML document.
Example: <xs:element name="fulltimeStudent" substitutionGroup="student"/>
10. form:
Purpose: Specifies whether the element must be qualified with a namespace prefix.
Possible Values: qualified, unqualified
Example: <xs:element name="state" type="xs:string" form="qualified"/>
XML document:
<?xml version="1.0" encoding="UTF-8"?>
<Bookstore xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="bookstore.xsd">
<Book id="B001">
<Title>The Great Gatsby</Title>
<Author>F. Scott Fitzgerald</Author>
<ISBN>9780743273565</ISBN>
<Publisher>Scribner</Publisher>
<Edition>1</Edition>
<Price>10.99</Price>
</Book>
<Book id="B002">
<Title>To Kill a Mockingbird</Title>
<Author>Harper Lee</Author>
<ISBN>9780061120084</ISBN>
<Publisher>Harper Perennial</Publisher>
<Edition>1</Edition>
<Price>7.99</Price>
</Book>
</Bookstore>
bookstore.xsd:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<!-- Root element -->
<xs:element name="Bookstore">
<xs:complexType>
<xs:sequence>
<xs:element name="Book" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="Title" type="xs:string"/>
<xs:element name="Author" type="xs:string"/>
<xs:element name="ISBN" type="xs:string"/>
<xs:element name="Publisher" type="xs:string"/>
<xs:element name="Edition" type="xs:string"/>
<xs:element name="Price" type="xs:decimal"/>
</xs:sequence>
<xs:attribute name="id" type="xs:string" use="required"/>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
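Python's standard xml.etree.ElementTree does not validate against an XSD. Even so, the schema tells a program how to read the data. This minimal sketch works on a trimmed copy of the bookstore document. It reads Price as a Decimal, following the xs:decimal declaration, and checks the required id attribute by hand.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Trimmed copy of the bookstore document (no schema validation happens here).
doc = """<Bookstore>
  <Book id="B001"><Title>The Great Gatsby</Title><Price>10.99</Price></Book>
  <Book id="B002"><Title>To Kill a Mockingbird</Title><Price>7.99</Price></Book>
</Bookstore>"""

books = ET.fromstring(doc).findall("Book")

# Price is declared as xs:decimal, so read it as Decimal rather than float.
total = sum((Decimal(b.findtext("Price")) for b in books), Decimal("0"))

# The id attribute is declared use="required"; list any Book without one.
missing_ids = [b.findtext("Title") for b in books if b.get("id") is None]

print(total)        # 18.98
print(missing_ids)  # []
```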
student.xsd:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="class">
<xs:complexType>
<xs:sequence>
<xs:element name="student">
<xs:complexType>
<xs:sequence>
<xs:element name="Name" type="xs:string"/>
<xs:element name="Branch" type="xs:string"/>
<xs:element name="age" type="xs:int"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
XML document:
<?xml version="1.0"?>
<class xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="student.xsd">
<student>
<Name>raju</Name>
<Branch>CSE</Branch>
<age>20</age>
</student>
</class>
XSLT
Extensible Stylesheet Language Transformation
• XSLT (Extensible Stylesheet Language Transformations) is a powerful
language used to transform XML documents into different formats like
HTML, plain text, or another XML document.
• It works by applying a set of rules (templates) defined in an XSLT
stylesheet to an XML document.
• It separates content from presentation by allowing the data (in XML)
to be transformed into a desired output (e.g., HTML for web pages).
• XSL Family: XSLT is part of the XSL family, which also includes XPath
(used to navigate XML documents) and XSL-FO (used for formatting
XML documents).
Basic Structure of an XSLT Stylesheet
An XSLT stylesheet is an XML document itself and typically starts with the following
structure:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="book">
<html>
<body>
<h1><xsl:value-of select="title"/></h1>
<p><xsl:value-of select="author"/></p>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Control Structures
Conditionals: XSLT supports conditional logic through <xsl:if> and
<xsl:choose>.
Looping: <xsl:for-each> is used to iterate over a set of nodes.
Example:
<xsl:template match="price">
<xsl:choose>
<xsl:when test=". > 100">
<p>Expensive</p>
</xsl:when>
<xsl:otherwise>
<p>Affordable</p>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
Handling Namespaces
If the XML document uses namespaces, the XSLT stylesheet must account for
them. You can declare namespaces within the XSLT and use them in XPath
expressions.
XSLT
The name “XSLT” combines ‘XSL’, short for ‘Extensible Stylesheet Language’, and ‘T’, short for
‘Transformations’.
• XSLT provides an easy way to turn XML data into a presentation format: it applies user-defined
transformations to an XML document, and the output can be HTML, XML, or any other structured
document.
• XSLT uses XPath to locate elements and attributes within an XML document, which is more
convenient than traversing the document in the traditional way with a scripting
language.
• XSLT is template-based, so it is more resilient to changes in the input document than code written
against the low-level DOM and SAX APIs.
• By using XML and XSLT, the application UI code stays clean and is easier to maintain.
• XSLT templates are selected by XPath patterns, which give a concise and efficient way to
process the XML document.
• XSLT can also be used as a validation language because it uses a tree-pattern-matching approach.
• You can change the output simply by modifying the transformations in the XSL files.
XSLT Use Cases:
1. XML to HTML Transformation: Web content generation, Styling XML data for web presentation
2. XML to XML Transformation: Data interchange between different XML schemas, Schema evolution and
backward compatibility
3. XML to Text Transformation: Report generation (e.g., logs, configuration files), Template-based document
generation (e.g., CSV files)
4. XML to PDF Transformation: Document publishing (e.g., invoices, reports), Automated printing
workflows
5. Data Aggregation: Merging multiple XML documents, Filtering and sorting XML data
6. Web Services and APIs: SOAP message transformation, API response formatting (e.g., JSON, HTML)
7. Content Management Systems (CMS): Content rendering for different output formats, Template
processing for content presentation
8. Localization and Internationalization: Multilingual content transformation, Date and number formatting
for different locales
9. Configuration File Transformation: Generating configuration files (e.g., JSON, INI) from XML, Dynamic
configuration based on environment
10. Legacy System Integration: Data migration from legacy systems, Interface adaptation for legacy system
requirements
These applications demonstrate the versatility of XSLT in various domains, including web development, data
How XSLT Works
• The XSLT stylesheet is written in XML
format.
• It is used to define the transformation rules
to be applied on the target XML document.
• The XSLT processor (e.g., Saxon or Xalan) takes the
XSLT stylesheet, applies the
transformation rules to the target XML
document, and then generates a
result document in XML,
HTML, or text format.
• Finally, the result is passed to a formatter or viewer that
renders the actual output for the
end user.
XSLT Transformation
To start a transformation you need three things: the XML document on which the
XSLT code will run, the XSLT file itself, and a tool or piece of software
that includes an XSLT processor (any free or trial version
is enough for learning purposes).
XSLT Syntax
Student.xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
version="2.0">
Attributes:
(i) Named Template: When the xsl:template element contains the @name attribute, it is
called a Named Template.
<xsl:template name="book">
• Named templates are called by the xsl:call-template element.
<xsl:call-template name="book"/>
(ii) Match Template: The xsl:template element contains the @match attribute, which holds a
matching pattern (XPath) applied to the input nodes.
<xsl:template match="//book">
Match templates are invoked by the xsl:apply-templates element.
<xsl:apply-templates select="book"/>
• An xsl:template element must have a @match attribute, a @name attribute, or both. An
xsl:template element that has no match attribute must have no mode attribute and no priority attribute.
#3) <xsl:value-of>
Outputs the string value of the XPath expression given in the @select attribute.
<xsl:value-of
select = Expression
disable-output-escaping = "yes" | "no">
</xsl:value-of>
or
<xsl:value-of select = "bookname"/>
#4) <xsl:for-each>
<xsl:for-each select="store/book">
</xsl:for-each>
• select: XPath expression evaluated in the current context to determine the set of
nodes to iterate over.
• With the document root as the context, the above code visits each node in store/book, in document order:
/store/book[1]
/store/book[2]
/store/book[3]
• <xsl:sort> can also be used as a child of xsl:for-each to define the sort order.
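For readers more familiar with general-purpose code, the sketch below is a rough Python analogue of xsl:for-each and xsl:sort using xml.etree.ElementTree. It is not XSLT; it only mirrors the iteration order. The book names are hypothetical sample data.

```python
import xml.etree.ElementTree as ET

# Hypothetical store with three books.
store = ET.fromstring("""<store>
  <book><bookname>Alpha</bookname></book>
  <book><bookname>Gamma</bookname></book>
  <book><bookname>Beta</bookname></book>
</store>""")

# Like <xsl:for-each select="store/book"> from the root: visits
# /store/book[1], [2], [3] in document order.
in_document_order = [b.findtext("bookname") for b in store.findall("book")]

# Like adding <xsl:sort select="bookname"/> as the first child of xsl:for-each.
in_sorted_order = sorted(in_document_order)

print(in_document_order)  # ['Alpha', 'Gamma', 'Beta']
print(in_sorted_order)    # ['Alpha', 'Beta', 'Gamma']
```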
#5) <xsl:apply-templates>
The processor selects the nodes given by the @select attribute (all child nodes if @select
is omitted) and, for each selected node, applies the best-matching template (or a built-in
template if none matches).
The @mode attribute is used when we want to produce more than one kind of output
from the same input content.
#6) <xsl:call-template>
The processor calls the named template whose name is given in the
@name attribute (required).
#7) <xsl:if>
<xsl:if test="count(/store/book)>2">
<xsl:text>
Condition True: Count of books is more than two.
</xsl:text>
</xsl:if>
#8) <xsl:choose>
<xsl:choose>
<xsl:when test="count(/store/book)=1">
Condition True: Count of book is one.
</xsl:when>
<xsl:when test="count(/store/book)=2">
Condition True: Count of book is two.
</xsl:when>
<xsl:when test="count(/store/book)=3">
Condition True: Count of book is three.
</xsl:when>
<xsl:otherwise>
No condition match.
</xsl:otherwise>
</xsl:choose>
Result (for an input with three books): Condition True: Count of book is three.
#11) <xsl:comment>
This element writes a comment to the result document: any text
content inside this element is output as an XML comment.
#12) <xsl:text>
<xsl:text>
This is a
text line.
</xsl:text>
Output:
This is a
text line.
#13) <xsl:element>
This generates an element in the result document with the name
given in its @name attribute. The name attribute is a required
attribute.
<xsl:template match="/">
<xsl:element name="bookcode">
<xsl:value-of select="/store/book[1]/@id"/>
</xsl:element>
</xsl:template>
Result: <bookcode>5350192956</bookcode>
#14) <xsl:attribute>
This generates an attribute on its parent element in the result document.
The name of the attribute is defined by the name attribute, and the value of the
attribute is computed by the XPath in the select attribute (available from XSLT 2.0), as
in the code below. The name attribute is required.
<xsl:template match="/">
<xsl:element name="bookcode">
<xsl:attribute name="id" select="/store/book[1]/@id"/>
</xsl:element>
</xsl:template>
• A global variable has global scope, i.e. it can be referenced from any element and
remains accessible throughout the stylesheet.
• To define a global variable, declare it as a direct child of the root element of the stylesheet;
in the example code, the variable ‘SecondBook’ is a global
variable that holds the name of the second book.
• A local variable is visible only within the element in which it is defined, i.e. it is
not accessible outside that element; in the example code,
the variable ‘first book’ is a local variable that holds the name of the first book.
• To reference either a global or a local variable, put a dollar sign ($) before
the name of the variable, e.g. $SecondBook.
Result:
#9) <xsl:copy>
In the code below, the context item is copied to the output (a shallow copy), and all of its children and
attributes are processed and copied recursively by xsl:apply-templates.
node()|@* matches any node and any attribute, so the template applies at every level of the tree.
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
• Result: This will copy all the nodes and attributes of the source document recursively to the output
document, i.e. it will create an exact copy of the source document.
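As a rough Python analogue of the identity template above, here is a sketch using xml.etree.ElementTree. It is not XSLT. Each element is copied shallowly, as with xsl:copy, and its children are processed recursively, as with xsl:apply-templates. The function name `identity_copy` and the sample document are hypothetical, and comments and processing instructions are ignored because ElementTree's default parser drops them.

```python
import xml.etree.ElementTree as ET

def identity_copy(node: ET.Element) -> ET.Element:
    # Like <xsl:copy> plus copying @*: a new element with the same
    # name and attributes, but no children yet.
    new = ET.Element(node.tag, dict(node.attrib))
    new.text, new.tail = node.text, node.tail
    # Like <xsl:apply-templates select="node()">: recurse into children.
    for child in node:
        new.append(identity_copy(child))
    return new

source = ET.fromstring('<store><book id="1"><title>A</title></book></store>')
result = identity_copy(source)
print(ET.tostring(result, encoding="unicode"))
# <store><book id="1"><title>A</title></book></store>
```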
#10) <xsl:copy-of>
xsl:copy-of copies the selected nodes together with all of their children and attributes,
recursively; for this reason it is called a deep copy. The @select
attribute is required and holds the XPath expression to evaluate.
<xsl:template match="node()|@*">
<xsl:copy-of select="."/>
</xsl:template>
• Result: This will copy all the nodes and attributes of the source document recursively
to the output document, i.e. it will create an exact copy of the source document.
<xsl:copy-of select="."/>
Copies the current node (the context item), including all of its attributes and descendants.
#17) <xsl:key>
This element declares a named key, i.e. an index over the nodes that match a pattern.
The @match attribute selects the nodes to index, and the @use attribute gives the XPath
expression (here “publisher”) whose value serves as the key for each node.
Now, suppose we need the details of only the books published by the ‘Wrox’
publisher: we can get them easily by looking up the value ‘Wrox’ with the key() function, which uses
the key-value pairs built by xsl:key.
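Conceptually, xsl:key builds a lookup table once, and key() queries it. The sketch below is a rough Python analogue using ElementTree and a dictionary, not XSLT itself. The book titles and the second publisher are hypothetical sample data; only the publisher name ‘Wrox’ comes from the notes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data; only the publisher name 'Wrox' comes from the notes.
store = ET.fromstring("""<store>
  <book><title>Book A</title><publisher>Wrox</publisher></book>
  <book><title>Book B</title><publisher>Other Press</publisher></book>
  <book><title>Book C</title><publisher>Wrox</publisher></book>
</store>""")

# Build the index once, like <xsl:key match="book" use="publisher" .../>:
# key value (publisher text) -> list of matching book nodes.
books_by_publisher = defaultdict(list)
for book in store.iter("book"):
    books_by_publisher[book.findtext("publisher")].append(book)

# Then look up a value, like calling key() with 'Wrox' in XSLT.
wrox_titles = [b.findtext("title") for b in books_by_publisher["Wrox"]]
print(wrox_titles)  # ['Book A', 'Book C']
```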
#18) <xsl:message>
This element is used for debugging purposes in XSLT development. It writes its output to the
console or message log of the application running the transformation (the exact destination depends on the processor).
The @terminate attribute takes one of two values, ‘yes’ or ‘no’; if it is set to ‘yes’, the
processor stops the transformation as soon as the message is output.
To understand this, suppose the price element in our input document is accidentally empty,
as in the code below. Processing should then stop as soon as the processor
encounters the empty price element, which can easily be achieved by placing xsl:message inside an xsl:if test
condition, as in the XSLT code below.
#19) <xsl:param>
The value of an <xsl:param> is passed/supplied when the template is called by <xsl:call-template>
or <xsl:apply-templates>.
<xsl:with-param> passes a value for a parameter declared with <xsl:param> to the
template. Its @name attribute holds the name of the parameter, which must match
the @name attribute of the <xsl:param> element, and its @select attribute sets the value
of that parameter.
As with a variable, the dollar sign ($) is used to fetch the value of the parameter.
#20) <xsl:import>
<xsl:import> is used to import another stylesheet module inside our current
stylesheet. This helps in achieving a modular XSLT development approach.
After importing, all the templates become available for use. Templates defined in the
importing (parent) stylesheet have higher import precedence than templates in
the imported stylesheet.
If the imported stylesheet contains a template with the same name or match pattern as a
template in the importing stylesheet, the imported template is overridden by your own
template.
Attribute @href is used as the URI of the stylesheet that you want to import.
<xsl:import href="New_Book.xsl"/>
#21) <xsl:include>
Same as the above xsl:import, <xsl:include> also helps in achieving a
modular XSLT development approach. All the templates included by
<xsl:include> have the same priority/precedence as the calling stylesheet.
It is like you copy all the templates from another stylesheet to your own
stylesheet.
Attribute @href is used as the URI of the stylesheet that you want to
include.
<xsl:include href="New_Book.xsl"/>
#22) <xsl:output>
This element specifies how the result tree is serialized into the output file. Its @method attribute can take the values 'xml', 'html', 'xhtml' (XSLT 2.0) and 'text'. The default is 'xml', or 'html' when the root element of the result is <html>. @encoding specifies the character encoding of the output file, for example encoding="UTF-16". For XML or XHTML output the default is UTF-8 or UTF-16. @indent specifies whether the XML or HTML output is indented. For XML the default is 'no', and for HTML and XHTML the default is 'yes'.
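#23) <xsl:strip-space>
This element is used to strip whitespace-only text nodes from the source elements listed in the @elements attribute. To strip them from all elements, use '*' in the @elements attribute.
<xsl:strip-space elements="*"/>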
#24) <xsl:preserve-space>
This element is used to preserve whitespace-only text nodes in the source elements listed in the @elements attribute. To preserve whitespace in all elements, use '*' in the @elements attribute.
<xsl:preserve-space elements="*"/>
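To see the effect of the two whitespace declarations, the sketch below counts the text nodes that a stylesheet sees in a small, hypothetical <library> document. With <xsl:strip-space elements="*"/>, the whitespace-only text nodes between the elements disappear. With <xsl:preserve-space elements="*"/>, which matches the default behaviour, they remain. The sketch assumes the JDK's built-in XSLT 1.0 processor, and the class and method names are illustrative only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class WhitespaceSketch {
    // Two whitespace-only text nodes ("\n  " and "\n") surround the single "x" text node.
    static final String XML = "<library>\n  <book>x</book>\n</library>";

    /** Returns count(//text()) as seen by a stylesheet with the given top-level declaration. */
    static String countTextNodes(String declaration) {
        String xsl =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + declaration
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/'><xsl:value-of select='count(//text())'/></xsl:template>"
          + "</xsl:stylesheet>";
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("strip-space:    " + countTextNodes("<xsl:strip-space elements='*'/>"));
        System.out.println("preserve-space: " + countTextNodes("<xsl:preserve-space elements='*'/>"));
    }
}
```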
https://www.softwaretestinghelp.com/xslt-tutorial/
https://www.youtube.com/watch?v=W--Yhp0m35A&list=PLhW3qG5bs-L9DloLUPwC3GdFimY5Ce_gS&index=6
XML document:
<?xml version="1.0" encoding="UTF-8"?>
<library>
  <book>
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <year>1925</year>
  </book>
  <book>
    <title>To Kill a Mockingbird</title>
    <author>Harper Lee</author>
    <year>1960</year>
  </book>
  <book>
    <title>1984</title>
    <author>George Orwell</author>
    <year>1949</year>
  </book>
</library>

XSLT stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <h2>Library Books</h2>
        <table border="1">
          <tr>
            <th>Title</th>
            <th>Author</th>
            <th>Year</th>
          </tr>
          <xsl:for-each select="library/book">
            <tr>
              <td><xsl:value-of select="title"/></td>
              <td><xsl:value-of select="author"/></td>
              <td><xsl:value-of select="year"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

Output:
<html>
  <body>
    <h2>Library Books</h2>
    <table border="1">
      <tr>
        <th>Title</th>
        <th>Author</th>
        <th>Year</th>
      </tr>
      <tr>
        <td>The Great Gatsby</td>
        <td>F. Scott Fitzgerald</td>
        <td>1925</td>
      </tr>
      <tr>
        <td>To Kill a Mockingbird</td>
        <td>Harper Lee</td>
        <td>1960</td>
      </tr>
      <tr>
        <td>1984</td>
        <td>George Orwell</td>
        <td>1949</td>
      </tr>
    </table>
  </body>
</html>
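This is a minimal sketch of running the library example from Java, assuming the JDK's built-in XSLT 1.0 processor (javax.xml.transform). It applies the stylesheet to the document and returns the serialized HTML as a string. The class and method names are hypothetical, and the exact whitespace and indentation of the output may differ from the listing.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class LibraryTransformSketch {
    // The library document and stylesheet from the example, with whitespace removed.
    static final String XML =
        "<library>"
      + "<book><title>The Great Gatsby</title><author>F. Scott Fitzgerald</author><year>1925</year></book>"
      + "<book><title>To Kill a Mockingbird</title><author>Harper Lee</author><year>1960</year></book>"
      + "<book><title>1984</title><author>George Orwell</author><year>1949</year></book>"
      + "</library>";

    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'>"
      + "<html><body><h2>Library Books</h2><table border='1'>"
      + "<tr><th>Title</th><th>Author</th><th>Year</th></tr>"
      + "<xsl:for-each select='library/book'>"
      + "<tr><td><xsl:value-of select='title'/></td>"
      + "<td><xsl:value-of select='author'/></td>"
      + "<td><xsl:value-of select='year'/></td></tr>"
      + "</xsl:for-each>"
      + "</table></body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies an XSLT stylesheet to an XML string and returns the serialized result. */
    static String transform(String xml, String xsl) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```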
XML document:
<?xml version="1.0" encoding="UTF-8"?>
<company>
  <employee>
    <name>John Doe</name>
    <position>Software Engineer</position>
    <department>IT</department>
    <salary>75000</salary>
  </employee>
  <employee>
    <name>Jane Smith</name>
    <position>Project Manager</position>
    <department>IT</department>
    <salary>85000</salary>
  </employee>
  <employee>
    <name>Michael Johnson</name>
    <position>HR Manager</position>
    <department>Human Resources</department>
    <salary>68000</salary>
  </employee>
</company>

XSLT stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <body>
        <h2>Company Employee List</h2>
        <table border="1" cellpadding="5">
          <tr>
            <th>Name</th>
            <th>Position</th>
            <th>Department</th>
            <th>Salary ($)</th>
          </tr>
          <xsl:for-each select="company/employee">
            <tr>
              <td><xsl:value-of select="name"/></td>
              <td><xsl:value-of select="position"/></td>
              <td><xsl:value-of select="department"/></td>
              <td><xsl:value-of select="salary"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

Output:
<html>
  <body>
    <h2>Company Employee List</h2>
    <table border="1" cellpadding="5">
      <tr>
        <th>Name</th>
        <th>Position</th>
        <th>Department</th>
        <th>Salary ($)</th>
      </tr>
      <tr>
        <td>John Doe</td>
        <td>Software Engineer</td>
        <td>IT</td>
        <td>75000</td>
      </tr>
      <tr>
        <td>Jane Smith</td>
        <td>Project Manager</td>
        <td>IT</td>
        <td>85000</td>
      </tr>
      <tr>
        <td>Michael Johnson</td>
        <td>HR Manager</td>
        <td>Human Resources</td>
        <td>68000</td>
      </tr>
    </table>
  </body>
</html>
XPath
• The XML Path Language (XPath) is used to uniquely identify or address parts of an XML document.
• An XPath expression can be used to search through an XML document and extract information from any part of it, such as an element or an attribute (referred to as a node in XML). XPath can be used alone or in conjunction with XSLT.
• XPath is an important and core component of the XSLT standard. It is used to traverse the elements and attributes in an XML document.
• XPath is a W3C recommendation. XPath provides different types of expressions to retrieve relevant information from an XML document. It is a syntax for defining parts of an XML document.
Features of XPath
• XPath defines structure: XPath is used to define the parts of an XML document, i.e. element, attribute, text, namespace, processing-instruction, comment, and document nodes.
• XPath provides path expressions: XPath provides powerful path expressions to select nodes or lists of nodes in XML documents.
• XPath is a core component of XSLT: XPath is a major element of the XSLT standard and must be understood to work with XSLT documents.
• XPath provides standard functions: XPath provides a rich library of standard functions to manipulate string values, numeric values, date and time comparison, node and QName manipulation, sequence manipulation, Boolean values, etc.
• XPath is a W3C recommendation.
XPath Expression
XPath defines a pattern, or path expression, to select nodes or node-sets in an XML document. These patterns are used by XSLT to perform transformations. The path expressions look very similar to the paths we use in a traditional file system, for example /library/book/title.
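This is a minimal sketch of evaluating path expressions from Java with the JDK's javax.xml.xpath API. It reuses the library document from the XSLT example. The class and helper names (XPathPathSketch, select) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathPathSketch {
    static final String XML =
        "<library>"
      + "<book><title>The Great Gatsby</title><author>F. Scott Fitzgerald</author><year>1925</year></book>"
      + "<book><title>To Kill a Mockingbird</title><author>Harper Lee</author><year>1960</year></book>"
      + "<book><title>1984</title><author>George Orwell</author><year>1949</year></book>"
      + "</library>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Returns the text of every node the path expression selects, in document order. */
    static List<String> select(String xml, String path) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(path, parse(xml), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath: " + path, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(XML, "/library/book/title"));        // every title
        System.out.println(select(XML, "/library/book[1]/author"));    // first book only
        System.out.println(select(XML, "//book[year > 1950]/title"));  // filtered by a predicate
    }
}
```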
XPath Nodes
XPath specifies seven kinds of nodes that can be the output of evaluating an XPath expression:
1. Element
2. Attribute
3. Text
4. Namespace
5. Processing-instruction
6. Comment
7. Document (root)
An XML document is viewed as a tree of nodes. The topmost element of the tree is called the root element.
Example: an XML document
<?xml version="1.0" encoding="UTF-8"?>
<Library>
  <book>
    <title lang="en">Three Mistakes of My Life</title>
    <author>Chetan Bhagat</author>
    <year>2008</year>
    <price>110</price>
  </book>
</Library>
Nodes in the above XML document:
• <Library> (root element node)
• <author>Chetan Bhagat</author> (element node)
• lang="en" (attribute node)
Atomic values: Atomic values are nodes with no children or parent. For example, in the above XML document the following are atomic values:
Chetan Bhagat
"en"
Relationship of Nodes
The relationships below refer to this XML document:
<Library>
  <book>
    <title lang="en">Three Mistakes of My Life</title>
    <author>Chetan Bhagat</author>
    <year>2008</year>
    <price>110</price>
  </book>
</Library>
1. Parent Node: Each element and attribute has one parent, the element that directly contains it. In this example, the book element is the parent of the title, author, year, and price elements.
2. Children Nodes: Element nodes may have zero, one, or more children. In this example, the title, author, year, and price elements are all children of the book element.
3. Sibling Nodes: Nodes that have the same parent are known as siblings. In this example, the title, author, year, and price elements are all siblings.
4. Ancestors: A node's parent, its parent's parent, and so on are its ancestors. In this example, the ancestors of the title element are the book element and the Library element.
5. Descendants: A node's children, its children's children, and so on are its descendants. In this example, the descendants of the Library element are the book, title, author, year, and price elements.
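Each relationship above corresponds to an XPath axis: parent, child, following-sibling (or preceding-sibling), ancestor, and descendant. This sketch evaluates each one against the same Library document with javax.xml.xpath. The helper names is hypothetical and returns the names of the selected nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathAxesSketch {
    static final String XML =
        "<Library><book>"
      + "<title lang=\"en\">Three Mistakes of My Life</title>"
      + "<author>Chetan Bhagat</author><year>2008</year><price>110</price>"
      + "</book></Library>";

    /** Names of the nodes selected by an XPath location path. */
    static List<String> names(String path) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(path, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeName());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath: " + path, e);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("parent of title:        " + names("//title/parent::*"));
        System.out.println("children of book:       " + names("/Library/book/child::*"));
        // following-sibling only looks forward; preceding-sibling covers earlier siblings
        System.out.println("siblings after title:   " + names("//title/following-sibling::*"));
        System.out.println("ancestors of title:     " + names("//title/ancestor::*"));
        System.out.println("descendants of Library: " + names("/Library/descendant::*"));
    }
}
```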
XPath Syntax
The XPath expression uses a path notation, like URLs, for addressing parts of an XML document. The expression is evaluated to yield an object of one of four types: node-set, Boolean, number, or string.
For example, the expression book/author returns a node-set of the <author> elements contained in the <book> elements, if such elements are present in the source XML document.
Wildcard Description
* Matches any element node
@* Matches any attribute node
node() Matches any node of any kind
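This sketch shows the wildcards and the book/author example, using javax.xml.xpath. The document is a hypothetical, indented version of the Library example. Its indentation adds whitespace-only text nodes, which node() matches but * does not. The class and helper names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathWildcardSketch {
    // Indented on purpose: the line breaks become whitespace-only text nodes.
    static final String XML =
        "<Library>\n"
      + "  <book>\n"
      + "    <title lang=\"en\">Three Mistakes of My Life</title>\n"
      + "    <author>Chetan Bhagat</author>\n"
      + "  </book>\n"
      + "</Library>";

    static Document parse() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Evaluates an expression that returns a number, such as count(...). */
    static double number(String expression) {
        try {
            return (Double) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, parse(), XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate " + expression, e);
        }
    }

    /** Evaluates a path relative to the root element and returns its string value. */
    static String fromRootElement(String path) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(path, parse().getDocumentElement());
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate " + path, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("* under book:      " + number("count(/Library/book/*)"));
        System.out.println("node() under book: " + number("count(/Library/book/node())"));
        System.out.println("@* on title:       " + number("count(//title/@*)"));
        System.out.println("book/author:       " + fromRootElement("book/author"));
    }
}
```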
https://www.javatpoint.com/xpath-comparison-operators
XML document:
<?xml version = "1.0"?>
<?xml-stylesheet type = "text/xsl" href = "employee.xsl"?>
<class>
  <employee id = "001">
    <firstname>Abhiram</firstname>
    <lastname>Kushwaha</lastname>
    <nickname>Manoj</nickname>
    <salary>15000</salary>
  </employee>
  <employee id = "002">
    <firstname>Akash</firstname>
    <lastname>Singh</lastname>
    <nickname>Bunty</nickname>
    <salary>25000</salary>
  </employee>
</class>

employee.xsl:
<?xml version = "1.0" encoding = "UTF-8"?>
<xsl:stylesheet version = "1.0"
  xmlns:xsl = "http://www.w3.org/1999/XSL/Transform">
  <xsl:template match = "/">
    <html>
      <body>
        <h3>Details of each Employee.</h3>
        <table border = "1">
          <tr bgcolor = "pink">
            <th>ID</th>
            <th>First Name</th>
            <th>Last Name</th>
            <th>Nick Name</th>
            <th>Salary</th>
          </tr>
          <tr>
            <td><xsl:value-of select = "/class/employee[1]/@id"/></td>
            <td><xsl:value-of select = "/class/employee[1]/firstname"/></td>
            <td><xsl:value-of select = "/class/employee[1]/lastname"/></td>
            <td><xsl:value-of select = "/class/employee[1]/nickname"/></td>
            <td><xsl:value-of select = "/class/employee[1]/salary"/></td>
          </tr>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
XPath Paths
There are two types of location paths used to specify the location of a node in an XML document: 1. absolute paths and 2. relative paths.
1. Absolute Path: An absolute path starts with the root node, i.e. with '/'. For example, /class/employee[1]/firstname.
2. Relative Path: A relative path starts from the current (context) node instead of the root. For example, relative to an employee node, firstname selects that employee's firstname child.
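This sketch contrasts an absolute path with relative paths that are evaluated against an employee element as the context node. It uses javax.xml.xpath and the employee document shown earlier. The helper names absolute and relative are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XPathPathsSketch {
    static final String XML =
        "<class>"
      + "<employee id=\"001\"><firstname>Abhiram</firstname><lastname>Kushwaha</lastname>"
      + "<nickname>Manoj</nickname><salary>15000</salary></employee>"
      + "<employee id=\"002\"><firstname>Akash</firstname><lastname>Singh</lastname>"
      + "<nickname>Bunty</nickname><salary>25000</salary></employee>"
      + "</class>";

    static Document parse() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Absolute path: starts at the root, so it is evaluated against the document. */
    static String absolute(String path) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(path, parse());
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate " + path, e);
        }
    }

    /** Relative path: evaluated with the n-th employee element as the context node. */
    static String relative(int employee, String path) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node context = (Node) xpath.evaluate(
                    "/class/employee[" + employee + "]", parse(), XPathConstants.NODE);
            return xpath.evaluate(path, context);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate " + path, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(absolute("/class/employee[1]/firstname"));
        System.out.println(relative(2, "firstname"));
        System.out.println(relative(2, "@id"));
        System.out.println(relative(2, "../employee[1]/nickname"));
    }
}
```

A path that starts with '/' ignores the context node, so relative(2, "/class/employee[1]/firstname") still selects the first employee.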
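XPath Axes
1) ancestor: Specifies the ancestors of the current node, i.e. its parent and so on up to the root node.
2) ancestor-or-self: Specifies the current node and its ancestors.
3) attribute: Specifies the attributes of the current node.
4) child: Specifies the children of the current node.
5) descendant: Specifies the descendants of the current node (children, grandchildren, and so on).
6) descendant-or-self: Specifies the current node and its descendants.
7) following: Specifies all nodes that come after the current node in the document, excluding its descendants.
8) following-sibling: Specifies the following siblings of the context node. Siblings are at the same level as the current node and share its parent.
9) namespace: Specifies the namespace nodes of the current node.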
XPath String Functions
1) starts-with(string1, string2): Returns true when the first string starts with the second string.
2) contains(string1, string2): Returns true when the first string contains the second string.
3) substring(string, offset, length?): Returns a section of the string. The section starts at position offset (positions start at 1) and runs for length characters, or to the end of the string if length is omitted.
4) substring-before(string1, string2): Returns the part of string1 before the first occurrence of string2.
5) substring-after(string1, string2): Returns the part of string1 after the first occurrence of string2.
6) string-length(string): Returns the length of the string in characters.
7) normalize-space(string): Trims leading and trailing whitespace from the string and replaces each internal run of whitespace with a single space.
8) translate(string1, string2, string3): Returns string1 after each character that appears in string2 has been replaced by the character at the same position in string3, or removed if string3 has no character at that position.
9) concat(string1, string2, ...): Concatenates all its string arguments.
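Each string function above can be tried directly with javax.xml.xpath. XPath.evaluate returns the string value of the result, so booleans come back as "true" or "false" and numbers come back in their string form. This is a minimal sketch with a hypothetical eval helper, and the literal arguments are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathStringFunctionsSketch {
    /** Evaluates an XPath expression and returns its string value. */
    static String eval(String expression) {
        try {
            // The expressions only use literals, so a one-element document is enough context.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<x/>")));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "starts-with('XPath', 'X')",
            "contains('XML Path', 'Path')",
            "substring('12345', 2, 3)",
            "substring-before('1999/04/01', '/')",
            "substring-after('1999/04/01', '/')",
            "string-length('hello')",
            "normalize-space('  a   b  ')",
            "translate('bar', 'abc', 'ABC')",
            "concat('X', 'Path', ' 1.0')"
        };
        for (String e : expressions) {
            System.out.println(e + "  ->  " + eval(e));
        }
    }
}
```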
XML Parser
An XML parser reads an XML document and checks that it is well-formed.
• Parsers also check whether documents conform to the XML standard and have a correct structure.
• There are two types of XML parsers:
1. Validating: check documents against a DTD or an XML Schema.
2. Non-validating: do not check documents against a DTD or an XML Schema.
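This sketch shows the difference between the two kinds of parser, using the JDK's DocumentBuilderFactory. Calling setValidating(true) asks the parser for DTD validation. The tiny internal DTD, the sample documents and the accepts helper are hypothetical. A custom ErrorHandler is installed because, without one, validity errors are only printed and are not thrown.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class ParserValidationSketch {
    // Hypothetical internal DTD: <note> must contain exactly one <to> element.
    static final String DTD =
        "<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>";

    /** True if the parser accepts the document; validating=true also checks it against the DTD. */
    static boolean accepts(String xml, boolean validating) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating);
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Turn validity errors into exceptions; without a handler they are only printed.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;  // not well-formed or, when validating, not valid
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String valid = DTD + "<note><to>Tove</to></note>";
        String invalid = DTD + "<note><from>Jani</from></note>";  // well-formed, but breaks the DTD
        String malformed = "<note><to>Tove</note>";                // not well-formed
        System.out.println(accepts(valid, true));
        System.out.println(accepts(invalid, true));
        System.out.println(accepts(invalid, false));
        System.out.println(accepts(malformed, false));
    }
}
```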
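Types of XML Parsers
The two common XML parser APIs are DOM (Document Object Model) and SAX (Simple API for XML).
DOM parser disadvantages
1) It is memory-inefficient: it consumes more memory because the whole XML document needs to be loaded into memory.
2) It is comparatively slower than other parsers.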
Types of nodes in a DOM Document object
• Document node
• Element node
• Text node
• Attribute node
• Processing instruction node
• Comment node
Example:
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="test.css"?>
<!-- It's an xml-stylesheet processing instruction. -->
<!DOCTYPE shapes SYSTEM "shapes.dtd">
<shapes>
  <square color="BLUE">
    <length> 20 </length>
  </square>
</shapes>
• Each element node contains a list of other nodes as its children.
• These children might contain text values or other nodes.
• DOM preserves the order of the elements that it reads from XML documents.
• DOM represents the content of an XML document as a tree structure.
• It is a programming API.
• It makes it easy to read, access, and update the contents of a document.
• The tree structure is stored in memory and can be used from any programming language, such as JavaScript.
When to use a DOM parser:
• You need to move parts of the document around (you might want to sort certain elements, for example).
• You need to use the information in the document more than once.
What do you get?
When you parse an XML document with a DOM parser, you get back a tree
structure that contains all of the elements of your document. The DOM
provides a variety of functions you can use to examine the contents and
structure of the document.
Common DOM interfaces
Element − The vast majority of the objects you will deal with are Elements.
Document − Represents the entire XML document. A Document object is often referred to as a
DOM tree.
Common DOM methods
When you are working with the DOM, there are several methods that are used often −
• Element.getAttribute(attrName) − For a given element, returns the value of the attribute with the requested name.
Steps to Use DOM
Following are the steps used while parsing a document using the DOM parser.
1. Import XML-related packages
    import org.w3c.dom.*;
    import javax.xml.parsers.*;
    import java.io.*;
2. Create a DocumentBuilder
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
3. Create a Document from a file or stream
    StringBuilder xmlStringBuilder = new StringBuilder();
    xmlStringBuilder.append("<?xml version=\"1.0\"?> <class> </class>");
    ByteArrayInputStream input = new ByteArrayInputStream(
        xmlStringBuilder.toString().getBytes("UTF-8"));
    Document doc = builder.parse(input);
4. Extract the root element
    Element root = doc.getDocumentElement();
5. Examine attributes
    //returns the value of a specific attribute
    root.getAttribute("attributeName");
    //returns a NamedNodeMap (table) of all attribute names/values
    root.getAttributes();
6. Examine sub-elements
    //returns a list of sub-elements with the specified name
    root.getElementsByTagName("subelementName");
    //returns a list of all child nodes
    root.getChildNodes();
Demo Example
package com.tutorialspoint.xml;

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;

public class DomParserDemo {
   public static void main(String[] args){
      try {
         File inputFile = new File("input.txt");
         DocumentBuilderFactory dbFactory
            = DocumentBuilderFactory.newInstance();
         DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
         Document doc = dBuilder.parse(inputFile);
         doc.getDocumentElement().normalize();

The above program will generate the following result −
Root element :class
----------------------------

Current Element :student
Student roll no : 393
First Name : Dinkar
Last Name : Kad
Nick Name : Dinkar
Marks : 85

Current Element :student
Student roll no : 493
First Name : Vineet
Last Name : Gupta
Nick Name : Vinni
Marks : 95
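This is a minimal, self-contained sketch of a DOM program that produces output in the format shown above, following the steps listed earlier. The input is an assumption. The contents of input.txt are not given, so the sketch parses an in-memory <class> document reconstructed from the sample output. The element names student, firstname, lastname, nickname and marks, the rollno attribute, and the class and method names are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomParserSketch {
    // Hypothetical input, reconstructed from the sample output (not the original input.txt).
    static final String XML =
        "<class>"
      + "<student rollno=\"393\"><firstname>Dinkar</firstname><lastname>Kad</lastname>"
      + "<nickname>Dinkar</nickname><marks>85</marks></student>"
      + "<student rollno=\"493\"><firstname>Vineet</firstname><lastname>Gupta</lastname>"
      + "<nickname>Vinni</nickname><marks>95</marks></student>"
      + "</class>";

    /** Parses the XML with DOM and builds a report in the format of the sample output. */
    static String report(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        doc.getDocumentElement().normalize();  // merge adjacent text nodes

        StringBuilder out = new StringBuilder();
        out.append("Root element :").append(doc.getDocumentElement().getNodeName()).append('\n');
        out.append("----------------------------\n");

        NodeList students = doc.getElementsByTagName("student");
        for (int i = 0; i < students.getLength(); i++) {
            Node node = students.item(i);
            out.append("\nCurrent Element :").append(node.getNodeName()).append('\n');
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                Element student = (Element) node;
                out.append("Student roll no : ").append(student.getAttribute("rollno")).append('\n');
                out.append("First Name : ").append(childText(student, "firstname")).append('\n');
                out.append("Last Name : ").append(childText(student, "lastname")).append('\n');
                out.append("Nick Name : ").append(childText(student, "nickname")).append('\n');
                out.append("Marks : ").append(childText(student, "marks")).append('\n');
            }
        }
        return out.toString();
    }

    /** Text content of the first descendant element with the given tag name. */
    private static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        System.out.print(report(XML));
    }
}
```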
SAX
• SAX 1.0 was released on May 11, 1998.
• SAX is a common, event-based API for parsing XML documents.
• It is primarily a Java API, but there are implementations in most languages.
• The current version is SAX 2.0.2, and there are versions for several programming-language environments other than Java.
The XML-DEV mailing list developed the Simple API for XML, also called SAX, which is an event-driven online algorithm for parsing XML documents. SAX is a way of reading data from an XML document that is an alternative to the mechanism of the Document Object Model (DOM). The DOM works on the document as a whole and builds the complete abstract syntax tree of an XML document for the user's convenience. SAX parsers, by contrast, work on each element of the XML document sequentially, issuing parsing events while passing through the input stream in a single pass. Unlike DOM, SAX does not have a formal specification.
SAX is a programming interface for processing XML files based on events. SAX, the DOM's counterpart, has a very different way of reading XML documents.
Why Use a SAX Parser
Parsers are used to process XML documents. The parser examines the XML document and checks it for errors. If it is a validating parser, it then validates the document against a schema or DTD. The next step depends on the parser in use. It may copy the data into a data structure native to the programming language you are using, or it may apply styling to the data or convert it into a presentation format.
Apart from triggering certain events, the SAX parser does nothing with the data. It is up to the SAX parser's user to decide what to do with it. The SAX events include, among others, the start and end of the document, the start and end of each element, and the character data inside elements.
Advantages
1) It is simple and memory efficient.
2) It is very fast and works for huge documents.
Disadvantages
1) It is event-based so its API is less intuitive.
2) Clients never know the full information, because the data is broken into pieces.
• SAX is an event-based parser for XML documents.
• The parser tells the application what is in the document by notifying the application of a stream of parsing events.
• The application then processes those events to act on the data.
• SAX chooses to give you access to the information in your XML document, not as a tree of nodes, but as a sequence of events.
• SAX chooses not to create a default object model on top of your XML document (as DOM does).
When should I use it?
– Large documents
– Memory-constrained devices
– When you do not need to modify the document
SAX Structure
• SAXParserFactory: A SAXParserFactory object creates an instance of the parser determined by the system property javax.xml.parsers.SAXParserFactory.
• SAXParser: The SAXParser class defines several kinds of parse() methods. In general, you pass an XML data source and a DefaultHandler object to the parser, which processes the XML and invokes the appropriate methods in the handler object.
• XMLReader (called SAXReader in some tutorials): The SAXParser wraps an XMLReader. Typically you don't care about that, but every once in a while you need to get hold of it using SAXParser's getXMLReader() so that you can configure it. It is the XMLReader that carries on the conversation with the SAX event handlers you define.
• DefaultHandler: A DefaultHandler implements the ContentHandler, ErrorHandler, DTDHandler, and EntityResolver interfaces (with empty methods), so you can override only the ones you are interested in.
• ContentHandler: Methods such as startDocument, endDocument, startElement, and endElement are invoked when the parser recognizes the start or end of the document or of an XML element. This interface also defines the methods characters and processingInstruction, which are invoked when the parser encounters the text in an XML element or an inline processing instruction, respectively.
• EntityResolver: The resolveEntity method is invoked when the parser must identify data identified by a URI.
• ErrorHandler: The methods error, fatalError, and warning are invoked in response to various parsing errors. The default error handler throws an exception for fatal errors and ignores other errors (including validation errors). That's one reason you need to know something about the SAX parser, even if you are using the DOM.
• Sometimes the application may be able to recover from a validation error. Other times, it may need to generate an exception. To ensure correct handling, you'll need to supply your own error handler to the parser.
• DTDHandler: Defines methods you will generally never be called upon to use. It is used when processing a DTD to recognize and act on declarations for an unparsed entity.
SAX events
• startDocument
• endDocument
• startElement
• endElement
• characters
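This is a minimal SAX handler that records the events listed above as the parser pushes them. It uses the JDK's javax.xml.parsers.SAXParser and org.xml.sax.helpers.DefaultHandler. The EventLog class and the events helper are hypothetical. Because characters() may deliver one piece of text in several chunks, the sketch buffers text and logs it when the next element starts or ends.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventSketch {
    /** Hypothetical handler that records each SAX callback in arrival order. */
    static class EventLog extends DefaultHandler {
        final List<String> events = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        @Override public void startDocument() { events.add("startDocument"); }

        @Override public void endDocument() { events.add("endDocument"); }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            flushText();
            events.add("startElement " + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            flushText();
            events.add("endElement " + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // One run of text may arrive in several chunks, so buffer it.
            text.append(ch, start, length);
        }

        // Logs buffered text (ignoring whitespace-only runs) as a single characters entry.
        private void flushText() {
            String s = text.toString().trim();
            if (!s.isEmpty()) {
                events.add("characters " + s);
            }
            text.setLength(0);
        }
    }

    static List<String> events(String xml) {
        EventLog log = new EventLog();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), log);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return log.events;
    }

    public static void main(String[] args) {
        events("<book><title>1984</title></book>").forEach(System.out::println);
    }
}
```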
Pull Parsing Versus Push Parsing
• Streaming pull parsing refers to a programming model in which a client application calls methods on an XML parsing library when it needs to interact with an XML infoset; that is, the client only gets (pulls) XML data when it explicitly asks for it.
• Streaming push parsing refers to a programming model in which an XML parser sends (pushes) XML data to the client as the parser encounters elements in an XML infoset; that is, the parser sends the data whether or not the client is ready to use it at that time.
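For contrast with SAX's push model, the JDK's StAX API (javax.xml.stream.XMLStreamReader) is a pull parser: the client's loop calls next() only when it wants the next event. This minimal sketch uses a hypothetical textOf helper to pull out the text of every <title> in the library document used earlier.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxPullSketch {
    static final String XML =
        "<library>"
      + "<book><title>The Great Gatsby</title><year>1925</year></book>"
      + "<book><title>To Kill a Mockingbird</title><year>1960</year></book>"
      + "<book><title>1984</title><year>1949</year></book>"
      + "</library>";

    /** Pulls events one at a time and keeps the text of each element with the given name. */
    static List<String> textOf(String xml, String elementName) {
        List<String> values = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();  // the client asks for the next event (pull)
                    if (event == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals(elementName)) {
                        // Reads the element's text and leaves the reader on its END_ELEMENT.
                        values.add(reader.getElementText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(textOf(XML, "title"));
    }
}
```

Because the client drives the loop, it could also stop early (for example, after the first match) without reading the rest of the document.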
DOM | SAX
Tree-model parser (object based; a tree of nodes). | Event-based parser (a sequence of events).
DOM loads the whole file into memory and then parses it. | SAX parses the file as it reads it, i.e. node by node.
DOM is read and write (it can insert or delete nodes). | SAX is read-only, i.e. it can't insert or delete nodes.
If the XML content is small, prefer the DOM parser. | Use the SAX parser when the XML content is large.
Backward and forward search is possible for searching the tags and evaluating the information inside them. | SAX reads the XML file from top to bottom, and backward navigation is not possible.