Unit 1
What is XML?
XML (eXtensible Markup Language) is a markup language designed to store and transport
data. It defines a set of rules for encoding documents in a format that is both human-readable
and machine-readable. Unlike HTML, which is focused on displaying data, XML focuses on
structuring and transporting data across different systems.
Purpose: XML is used to store and organize data, often in a structured format that can
be shared between different platforms, systems, and applications.
Self-descriptive: XML allows you to define your own custom tags, making it flexible
for representing any kind of structured data.
<note>
  <to>Ram</to>
  <from>Shyam</from>
  <heading>Reminder</heading>
  <body>Don't forget to bring my notebook!</body>
</note>
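As a rough illustration of how an application might read this note, the sketch below assumes Python and its standard-library xml.etree.ElementTree module; the helper name note_fields is hypothetical and is not part of these notes.

```python
import xml.etree.ElementTree as ET

NOTE_XML = """<note>
  <to>Ram</to>
  <from>Shyam</from>
  <heading>Reminder</heading>
  <body>Don't forget to bring my notebook!</body>
</note>"""

def note_fields(xml_text):
    """Return a dict mapping each child tag of the root element to its text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}

fields = note_fields(NOTE_XML)
# The custom tags double as field names, which is what "self-descriptive" means here.
print(fields["from"], "->", fields["to"], ":", fields["body"])
```

Because the tags name the data they carry, the receiving program can pick fields by name without knowing anything about how the note was displayed.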
Key Features of XML
1. Self-descriptive tags: You can define your own tags to describe the data.
2. Platform independent: XML is widely used for data interchange because it can be
processed on any platform.
3. Well-structured: Data is organized hierarchically, making it easy to parse and
manage.
4. Flexible: No predefined tags; users define their own.
Differences Between XML and HTML
BCS-502 Web Technology Nikita Tiwari, Asst. Professor, CSE, PSIT Kanpur
PSIT-Pranveer Singh Institute of Technology
Kanpur-Delhi National Highway (NH-19), Bhauti, Kanpur-209305 (U.P.), India
Feature | XML | HTML
Self-descriptive | Tags describe the data (e.g., <to>, <from>). | Tags describe how content should appear (e.g., <b>, <i>).
Data Transport | Mainly used for exchanging data between systems. | Used for presenting information to the user via a browser.
Closing Tags | Requires all tags to be closed (e.g., <tag></tag>). | Not all tags require a closing tag (e.g., <img>).
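The closing-tag rule can be checked with any XML parser. The sketch below assumes Python's standard xml.etree.ElementTree; the function name is_well_formed is an illustrative choice, not something defined in these notes.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<note><to>Ram</to></note>"))          # every tag is closed
print(is_well_formed("<note><to>Ram</note>"))               # <to> is never closed
print(is_well_formed('<note><img src="a.png"/></note>'))    # empty element, self-closed
print(is_well_formed('<note><img src="a.png"></note>'))     # HTML-style unclosed <img>
```

An HTML browser would tolerate the last form, but an XML parser treats it as an error.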
Structure:
o DTD can define elements, attributes, entities, and notation declarations.
Types:
o Internal DTD: Included directly in the XML document.
o Example:
<!DOCTYPE note [
  <!ELEMENT note (to, from, heading, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
o External DTD: Stored in a separate file and referenced from the XML document.
o Example:
<!DOCTYPE note SYSTEM "note.dtd">
Limitations:
DTD does not support data types like integer or date, only text data (PCDATA).
It lacks support for namespaces, which are used to avoid element name conflicts.
XML Namespaces
XML Namespaces are a method used in XML to avoid naming conflicts between
elements or attributes when combining XML documents from different XML
vocabularies. Namespaces provide a way to uniquely identify elements and attributes
by associating them with a URI (Uniform Resource Identifier), ensuring that similar
names used in different contexts don't clash.
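A small sketch of the idea, assuming Python's standard xml.etree.ElementTree. The two namespace URIs and the prefixes h and f are made up for illustration; they only need to be unique strings.

```python
import xml.etree.ElementTree as ET

# Two vocabularies both use the name "table"; the URIs below are hypothetical.
DOC = """<root xmlns:h="http://example.com/html"
      xmlns:f="http://example.com/furniture">
  <h:table><h:tr><h:td>Apples</h:td></h:tr></h:table>
  <f:table><f:name>Coffee Table</f:name></f:table>
</root>"""

NS = {"h": "http://example.com/html", "f": "http://example.com/furniture"}

root = ET.fromstring(DOC)
html_table = root.find("h:table", NS)
furniture_table = root.find("f:table", NS)

# ElementTree expands each prefix to its URI, so the two names no longer clash.
print(html_table.tag)
print(furniture_table.tag)
print(furniture_table.find("f:name", NS).text)
```

The prefix itself is only shorthand; it is the URI bound to the prefix that distinguishes one table element from the other.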
2 XML Schemas
Definition: XML Schema (XSD) is a more advanced and powerful alternative to DTD
that defines the structure of an XML document using XML itself.
Advantages:
Data types: Allows validation of data types (e.g., strings, integers, dates, etc.).
Namespaces: Supports namespaces, making it easier to avoid naming conflicts in large
documents.
Extensibility: Because it's written in XML, schemas can be extended and modified
more easily.
Structure: XML Schema defines elements, attributes, complex types (elements
containing other elements), and simple types (text-based elements).
<xs:element name="person">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="age" type="xs:integer"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
This schema defines a person element with child elements firstname (string) and age
(integer).
<xs:complexType>:
Used to define elements that have child elements or attributes.
Complex types allow you to create hierarchical structures in XML, making the
document more meaningful and organized.
<xs:sequence>:
Specifies that the child elements must appear in a particular order.
If the order of the child elements is important, you use <xs:sequence>. In the example,
the firstname element must appear before the age element.
Child Elements:
firstname is defined as a string, meaning it will hold text.
age is defined as an integer, meaning it will hold a whole number.
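To make the two rules above concrete (fixed child order, whole-number age), here is a hand-written approximation in Python. It is not an XSD validator: the standard library does not include one, so this sketch only mirrors the rules described for the person element. The function name check_person and the sample name "Asha" are hypothetical.

```python
import xml.etree.ElementTree as ET

EXPECTED_ORDER = ["firstname", "age"]  # mirrors the order given in <xs:sequence>

def check_person(xml_text):
    """Return a list of problems; an empty list means the checks passed."""
    root = ET.fromstring(xml_text)
    if root.tag != "person":
        return ["root element must be <person>"]
    problems = []
    tags = [child.tag for child in root]
    if tags != EXPECTED_ORDER:
        problems.append(f"children must be {EXPECTED_ORDER} in that order, got {tags}")
    age = root.find("age")
    if age is not None:
        try:
            int((age.text or "").strip())  # rough stand-in for the integer type
        except ValueError:
            problems.append("age must be a whole number")
    return problems

print(check_person("<person><firstname>Asha</firstname><age>30</age></person>"))
print(check_person("<person><age>30</age><firstname>Asha</firstname></person>"))
print(check_person("<person><firstname>Asha</firstname><age>thirty</age></person>"))
```

A real schema processor performs these checks (and many more) directly from the XSD, so code like this does not have to be written per document type.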
Benefits Over DTD:
XML Schema supports more sophisticated data validation, such as numeric ranges,
patterns (e.g., email format), and data length constraints.
It integrates seamlessly with modern XML technologies.
XML Schemas vs. DTD (Document Type Definition)
Both XML Schema (XSD) and Document Type Definition (DTD) are used to define the
structure and rules for an XML document. However, they differ significantly in features,
capabilities, and syntax. Let's compare XML Schema and DTD in terms of various factors.
1. Syntax
DTD: Uses a non-XML syntax to define the structure of an XML document. It’s
simpler, but less flexible.
XML Schema: Uses XML syntax, meaning an XML Schema is itself a well-formed
XML document. This makes it more powerful, and easier to integrate and manipulate
using standard XML tools.
Example of a DTD:
<!DOCTYPE note [
  <!ELEMENT note (to, from, heading, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
Example of an XML Schema:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
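Because a schema is itself well-formed XML, an ordinary XML parser can read it. The sketch below assumes Python's standard xml.etree.ElementTree and lists the elements the note schema declares; the helper name declared_elements is hypothetical. A DTD could not be loaded this way, since its syntax is not XML.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"  # the standard XML Schema namespace URI

NOTE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

def declared_elements(xsd_text):
    """Return (name, type) pairs for every xs:element in document order."""
    root = ET.fromstring(xsd_text)
    return [(el.get("name"), el.get("type")) for el in root.iter(f"{{{XS}}}element")]

for name, type_name in declared_elements(NOTE_XSD):
    # "note" has no type attribute because its type is defined inline.
    print(name, type_name)
```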
2. Data Types
DTD: Lacks support for data types. Everything is treated as text (#PCDATA), and you
cannot specify if an element or attribute should be an integer, date, boolean, etc.
XML Schema: Provides built-in data types (e.g., xs:string, xs:int, xs:date, xs:boolean)
and allows for user-defined data types. This is a major advantage, as it allows for stricter
validation of the data in XML documents.
Example in XML Schema:
<xs:complexType name="employee">
  <xs:complexContent>
    <xs:extension base="person">
      <xs:sequence>
        <xs:element name="employeeID" type="xs:string"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
5. Validation Capabilities
DTD: Basic validation capabilities, mostly structural. It checks for the presence of
elements and their order but doesn’t validate data types or more complex constraints.
XML Schema: Advanced validation, including:
o Data type validation (e.g., integer, string, date, etc.).
o Value constraints (e.g., minimum, maximum).
o Key constraints and uniqueness validation (like database keys).
Example of Value Constraints in XML Schema:
<xs:element name="age">
  <xs:simpleType>
    <xs:restriction base="xs:int">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="150"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
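The effect of this restriction can be approximated in a few lines of Python. This is only a sketch of what a schema validator would decide for the age element, not a use of any XSD library; the function name is_valid_age is hypothetical.

```python
def is_valid_age(text, minimum=0, maximum=150):
    """Mimic the restriction: an integer between minInclusive and maxInclusive."""
    try:
        value = int(text.strip())
    except ValueError:
        return False  # not a whole number at all, so the base type check fails
    return minimum <= value <= maximum  # both bounds are inclusive

for sample in ["42", "0", "150", "151", "-1", "abc"]:
    print(sample, is_valid_age(sample))
```

A DTD has no way to express either the integer type or the 0 to 150 range, so with a DTD all six samples would be accepted as text.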
7. Documentation and Readability
DTD: Simpler syntax and easier to read for small and basic XML structures, but not as
expressive.
XML Schema: More verbose and complex, but much more powerful and flexible. Its
XML-based syntax allows it to integrate well with other XML technologies.
8. Default Values and Fixed Values
DTD: Allows defining default values for attributes.
XML Schema: Supports both default and fixed values for elements and attributes.
Example in XML Schema:
<xs:element name="country" type="xs:string" default="USA"/>
9. Tool Support
DTD: Older and supported by most XML parsers, but not as well-integrated with
modern XML tools.
XML Schema: Supported by most modern XML tools, including editors, parsers, and
validators. XML Schema provides better support for integrated development
environments (IDEs) and can be manipulated more easily by software that works with
XML.
10. Modularity and Reuse
DTD: Limited in terms of modularity. You can reference external DTDs, but it's more
cumbersome to break up and reuse parts of a DTD.
XML Schema: Modular and reusable. XML Schema allows you to define complex
types, groups, and references that can be reused across multiple schemas, making it
easier to manage large and complex XML applications.
Summary of Differences:
Feature | DTD | XML Schema
Data Types | Limited (no support for data types) | Supports built-in and user-defined data types (e.g., xs:int, xs:string)
Documentation | Simpler and easier to read | More verbose, but flexible and powerful
Tool Support | Supported by most parsers, but less integrated with modern tools | Supported by most modern XML tools
o As each event is triggered, the application can react to that event (for example,
extracting a value or printing data).
o After the event is handled, Simple API for XML moves on to the next part of
the document, discarding the previous part from memory.
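The event flow can be observed directly. The sketch below assumes Python's standard xml.sax module; the class name EventLogger is an illustrative choice. It records each event in the order the parser reports it.

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    """Records every start tag, text chunk, and end tag as it is reported."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        if content.strip():  # ignore whitespace-only text
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        self.events.append(("end", name))

handler = EventLogger()
xml.sax.parseString(b"<note><to>Ram</to><from>Shyam</from></note>", handler)
for event in handler.events:
    print(event)
```

The handler is never given a tree. It only sees one event at a time, which is why the parser does not need to keep earlier parts of the document.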
Why This Is Efficient for Large XML Documents
Low Memory Usage:
o Since Simple API for XML reads and processes the document one piece at a
time, it doesn’t need to load the entire XML document into memory.
o This is particularly useful for large XML files that could be hundreds of
megabytes or even gigabytes in size.
o With Simple API for XML, you only keep the current element or text node in
memory, not the entire document.
Avoiding Memory Overload:
o In contrast, a method like DOM (Document Object Model) loads the entire
document into memory to create a tree structure, which can consume a large
amount of memory, especially for very large files.
o This could potentially lead to performance issues or even cause the system to
run out of memory when working with large XML documents.
Efficient Streaming:
o Simple API for XML operates like a stream processor, where data is read and
processed continuously as it is encountered.
o This means it is well-suited for XML data that is being streamed or that doesn’t
need to be stored in memory once processed.
3. Practical Example of Why It’s Efficient
Imagine you have a huge XML file that contains millions of entries, such as a list of
products in a global inventory system:
<inventory>
  <product>
    <id>1</id>
    <name>Product 1</name>
    <price>100</price>
  </product>
  <product>
    <id>2</id>
    <name>Product 2</name>
    <price>200</price>
  </product>
  <!-- Millions of more product entries -->
</inventory>
DOM Approach:
o With DOM, the entire document, including every <product>, must be loaded into
memory, even if you only want to extract information about a specific product.
o As the number of products increases, the memory required to hold the entire tree grows
in proportion to the document size, and your system might struggle with memory
consumption.
Simple API for XML Approach:
o With Simple API for XML, the parser reads the XML document one <product>
element at a time. As each <product> is read, the application can process it (e.g.,
extract the product name and price) and then discard it from memory.
o Only the current product being processed is held in memory, making it much
more memory-efficient for large files.
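A minimal sketch of this approach, assuming Python's standard xml.sax module. The class ProductHandler, the callback on_product, and the function total_price are hypothetical names used only for illustration. Each product is handed to the callback as soon as its closing tag is seen and is then dropped.

```python
import xml.sax

class ProductHandler(xml.sax.ContentHandler):
    """Builds one product at a time and passes it to a callback."""

    def __init__(self, on_product):
        super().__init__()
        self.on_product = on_product  # called once per finished <product>
        self.current = None           # fields of the product being read
        self.buffer = []              # text pieces of the current element

    def startElement(self, name, attrs):
        if name == "product":
            self.current = {}
        self.buffer = []

    def characters(self, content):
        self.buffer.append(content)

    def endElement(self, name):
        if name == "product":
            self.on_product(self.current)
            self.current = None       # discard the finished product
        elif self.current is not None:
            self.current[name] = "".join(self.buffer).strip()
        self.buffer = []

def total_price(xml_bytes):
    """Sum all <price> values without ever holding more than one product."""
    total = 0
    def add(product):
        nonlocal total
        total += int(product["price"])
    xml.sax.parseString(xml_bytes, ProductHandler(add))
    return total

INVENTORY = b"""<inventory>
  <product><id>1</id><name>Product 1</name><price>100</price></product>
  <product><id>2</id><name>Product 2</name><price>200</price></product>
</inventory>"""

print(total_price(INVENTORY))
```

For a file on disk, xml.sax.parse accepts a filename or file object in place of the byte string, so the same handler could be used without reading the whole file first.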
4. Use Case for Simple API for XML
Large Log Files or Streaming Data
Consider a scenario where you are parsing large log files or live data streams (such as
stock prices or weather reports) in XML format. Since Simple API for XML processes
the document sequentially, it can handle real-time data or large datasets efficiently.
Processing Specific Information
Suppose you only need certain information from a massive XML file, such as specific
tags or elements (e.g., products that are out of stock). With Simple API for XML, you
can efficiently extract just the data you need, without having to load or retain the rest
of the document.
Pros of DOM:
o Random access to any part of the document.
o Full manipulation capabilities (add, delete, modify elements).
Cons of DOM:
o Memory-intensive since the entire document is loaded into memory.
o Not ideal for large XML files.
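The random-access and manipulation points can be sketched with Python's standard xml.dom.minidom. The small library document and the "Old Draft" entry below are made up for illustration.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<library>"
    "<book><title>XML Basics</title></book>"
    "<book><title>Old Draft</title></book>"
    "</library>"
)

# Random access: jump straight to any element, in any order.
books = doc.getElementsByTagName("book")
first, second = books[0], books[1]

# Modify: change the text of an existing element.
first.getElementsByTagName("title")[0].firstChild.data = "XML Basics, 2nd Edition"

# Add: create a new element and attach it to the tree.
author = doc.createElement("author")
author.appendChild(doc.createTextNode("Jane Smith"))
first.appendChild(author)

# Delete: remove a whole subtree.
doc.documentElement.removeChild(second)

result = doc.documentElement.toxml()
print(result)
```

All of these operations rely on the whole tree being in memory, which is the source of both the flexibility and the memory cost listed above.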
Example:
<library>
  <book>
    <title>XML Basics</title>
    <author>Jane Smith</author>
  </book>
</library>
In DOM, this would be represented as:
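One way to see that tree is to load the document and print its nodes with indentation. The sketch below assumes Python's standard xml.dom.minidom; the helper tree_lines is hypothetical, and it skips the whitespace-only text nodes that come from the line breaks in the source.

```python
from xml.dom import minidom

LIBRARY = """<library>
  <book>
    <title>XML Basics</title>
    <author>Jane Smith</author>
  </book>
</library>"""

def tree_lines(node, depth=0):
    """Return one indented line per element or non-blank text node."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + f"Element: {node.tagName}")
        for child in node.childNodes:
            lines.extend(tree_lines(child, depth + 1))
    elif node.nodeType == node.TEXT_NODE and node.data.strip():
        lines.append("  " * depth + f'Text: "{node.data.strip()}"')
    return lines

doc = minidom.parseString(LIBRARY)
print("\n".join(tree_lines(doc.documentElement)))
```

Each element becomes a node, and the text inside title and author becomes a child text node of that element.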
Feature | DOM | Simple API for XML
Memory Usage | High (loads entire document) | Low (reads element by element)
Data Access | Random access to any part | Sequential (no random access)
Use Case | When you need to modify or query specific parts of the document | For parsing large documents without loading everything into memory