
Unit 1

PSIT-Pranveer Singh Institute of Technology

Kanpur-Delhi National Highway (NH-19), Bhauti, Kanpur-209305 (U.P.), India

What is XML?
XML (eXtensible Markup Language) is a markup language designed to store and transport
data. It defines a set of rules for encoding documents in a format that is both human-readable
and machine-readable. Unlike HTML, which focuses on displaying data, XML focuses on
structuring and transporting data across different systems.
• Purpose: XML is used to store and organize data in a structured format that can
be shared between different platforms, systems, and applications.
• Self-descriptive: XML lets you define your own tags, making it flexible
for representing any kind of structured data. For example:
<note>
  <to>Ram</to>
  <from>Shyam</from>
  <heading>Reminder</heading>
  <body>Don't forget to bring my notebook!</body>
</note>
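To make "self-descriptive" concrete, here is a minimal sketch of how a program might read the note above. It assumes Python's standard xml.etree.ElementTree module; the helper name note_to_dict is hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# The note document from the text, embedded as a string for the example.
NOTE_XML = """<note>
  <to>Ram</to>
  <from>Shyam</from>
  <heading>Reminder</heading>
  <body>Don't forget to bring my notebook!</body>
</note>"""


def note_to_dict(xml_text: str) -> dict:
    """Map each child tag of the root element to its text content."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


note = note_to_dict(NOTE_XML)
print(note["to"], "<-", note["from"])
```

Because the tag names carry the meaning, the program can look values up by name (note["heading"]) without knowing their position in the file.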
Key Features of XML
1. Self-descriptive tags: You can define your own tags to describe the data.
2. Platform independent: XML is widely used for data interchange because it can be
processed on any platform.
3. Well-structured: Data is organized hierarchically, making it easy to parse and
manage.
4. Flexible: No predefined tags; users define their own.
Differences Between XML and HTML

| Feature | XML (eXtensible Markup Language) | HTML (HyperText Markup Language) |
|---|---|---|
| Purpose | Used to transport and store data. | Used to display data (create web pages). |
| Tag Definitions | User-defined tags (customizable). | Predefined tags (e.g., <div>, <p>, <a>). |
| Data vs. Presentation | Focuses on storing and structuring data. | Focuses on formatting and displaying content. |
| Self-descriptive | Tags describe the data (e.g., <to>, <from>). | Tags describe how content should appear (e.g., <b>, <i>). |
| Strictness | Case-sensitive and strict (tags must be closed properly). | Not case-sensitive and lenient with structure. |
| Nesting | All elements must be properly nested. | Improper nesting is allowed in some browsers (though not recommended). |
| Data Transport | Mainly used for exchanging data between systems. | Used for presenting information to the user via a browser. |
| Whitespace Handling | Whitespace is preserved by default. | Whitespace is not significant for display (ignored in most cases). |
| Extensibility | Extensible, allowing custom tags based on user needs. | Limited to predefined tags for presentation. |
| Validation | XML can be validated using DTD or XML Schema. | HTML validation ensures correct rendering, but not strict. |
| Closing Tags | Requires all tags to be closed (e.g., <tag></tag>). | Not all tags require a closing tag (e.g., <img>). |
| Use Cases | Data storage, configuration files (e.g., *.xml), data interchange in web services (SOAP, REST). | Web page design, document formatting, visual content presentation. |

BCS-502 Web Technology Nikita Tiwari, Asst. Professor, CSE, PSIT Kanpur


1 Document Type Definition (DTD)

• Definition: A DTD defines the legal building blocks (elements, attributes) of an XML
document, setting rules that the document must follow.
• Purpose:
o Lets a validating parser check that a document is "valid", i.e. that it conforms to the
declared structure. This is in addition to being "well-formed", a syntax check that
every XML parser performs with or without a DTD.
o Allows defining a consistent format for data sharing between systems.
• Structure:
o A DTD can declare elements, attributes, entities, and notations.
• Types:
o Internal DTD: Included directly in the XML document. Example:
<!DOCTYPE note [
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>

o External DTD: Stored in a separate file and referenced from the XML document. Example:
<!DOCTYPE note SYSTEM "note.dtd">

Limitations:
• DTD does not support data types such as integer or date; element content is only text (PCDATA).
• It lacks support for namespaces, which are used to avoid element name conflicts.

XML Namespaces

XML Namespaces are a method used in XML to avoid naming conflicts between
elements or attributes when combining XML documents from different XML
vocabularies. Namespaces provide a way to uniquely identify elements and attributes
by associating them with a URI (Uniform Resource Identifier), ensuring that similar
names used in different contexts don't clash.

Why Are XML Namespaces Important?


In XML, you might have elements with the same name but used in different contexts.
For example, if you combine XML data from two different sources—say, an order
system and an inventory system—both might use the <name> tag, but with different
meanings. XML Namespaces resolve this conflict by qualifying each tag with a unique
identifier.
Syntax of XML Namespaces
1. Defining a Namespace: You declare a namespace using the xmlns attribute within the
start tag of an element. The value of this attribute is the URI that identifies the
namespace.
2. Prefixing a Namespace: To use a namespace, you often associate it with a prefix. This
prefix is added to the element names or attribute names to differentiate them from other
namespaces.
<root xmlns:order="http://example.com/orders"
      xmlns:inventory="http://example.com/inventory">
  <order:name>John Doe</order:name>
  <inventory:name>Product XYZ</inventory:name>
</root>
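As a rough illustration of how a parser keeps the two name elements apart, the sketch below reads the document with Python's standard xml.etree.ElementTree module. It assumes the namespace URIs are http://example.com/orders and http://example.com/inventory; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Two vocabularies that both define a <name> element.
XML = """<root xmlns:order="http://example.com/orders"
      xmlns:inventory="http://example.com/inventory">
  <order:name>John Doe</order:name>
  <inventory:name>Product XYZ</inventory:name>
</root>"""

# Prefix-to-URI map used only for lookups; the prefixes here need not
# match the ones in the document, only the URIs matter.
NS = {
    "order": "http://example.com/orders",
    "inventory": "http://example.com/inventory",
}

root = ET.fromstring(XML)
customer = root.find("order:name", NS).text
product = root.find("inventory:name", NS).text

# Internally each tag is stored as "{namespace-uri}local-name".
expanded_tags = [child.tag for child in root]

print(customer, "|", product)
print(expanded_tags)
```

The expanded tag names show that the URI, not the prefix, is what makes each element unique.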

2 XML Schemas
• Definition: XML Schema (XSD) is a more advanced and powerful alternative to DTD
that defines the structure of an XML document using XML itself.
• Advantages:
o Data types: Allows validation of data types (e.g., strings, integers, dates).
o Namespaces: Supports namespaces, making it easier to avoid naming conflicts in large
documents.
o Extensibility: Because it's written in XML, schemas can be extended and modified
more easily.
• Structure: XML Schema defines elements, attributes, complex types (elements
containing other elements), and simple types (text-based elements). Example:
<xs:element name="person">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="age" type="xs:int"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

This schema defines a person element with child elements firstname (string) and age
(integer).
<xs:complexType>:
• Used to define elements that have child elements or attributes.
• Complex types allow you to create hierarchical structures in XML, making the
document more meaningful and organized.
<xs:sequence>:
• Specifies that the child elements must appear in a particular order.
• If the order of the child elements is important, you use <xs:sequence>. In the example,
the firstname element must appear before the age element.
Child Elements:
• firstname is defined as a string, meaning it will hold text.
• age is defined as an integer, meaning it will hold a whole number.
Benefits Over DTD:
• XML Schema supports more sophisticated data validation, such as numeric ranges,
patterns (e.g., email format), and data length constraints.
• It integrates seamlessly with modern XML technologies.
XML Schemas vs. DTD (Document Type Definition)
Both XML Schema (XSD) and Document Type Definition (DTD) are used to define the
structure and rules for an XML document. However, they differ significantly in features,
capabilities, and syntax. Let's compare XML Schema and DTD in terms of various factors.
1. Syntax
• DTD: Uses a non-XML syntax to define the structure of an XML document. It's
simpler, but less flexible.
• XML Schema: Uses XML syntax, meaning an XML Schema is itself a well-formed
XML document. This makes it more powerful, and easier to integrate and manipulate
using standard XML tools.

Example of a DTD:
<!DOCTYPE note [
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
Example of an XML Schema:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
2. Data Types
• DTD: Lacks support for data types. Element content is treated as text (#PCDATA), and
attributes have only a handful of lexical types (such as CDATA, ID, and enumerations),
so you cannot specify that an element or attribute should be an integer, date, boolean, etc.
• XML Schema: Provides built-in data types (e.g., xs:string, xs:int, xs:date, xs:boolean)
and allows for user-defined data types. This is a major advantage, as it allows for stricter
validation of the data in XML documents.
Example in XML Schema:
<xs:element name="age" type="xs:int"/>
<xs:element name="birthday" type="xs:date"/>
3. Namespace Support
• DTD: Does not support XML Namespaces, which can cause issues when dealing with
documents that use multiple vocabularies or schemas.
• XML Schema: Fully supports XML Namespaces, allowing documents to combine
elements from multiple XML vocabularies without naming conflicts.
Example of a Namespaced XML Schema:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:book="http://example.com/books"
           targetNamespace="http://example.com/books">
  <!-- book:BookType is assumed to be defined elsewhere in this schema -->
  <xs:element name="book" type="book:BookType"/>
</xs:schema>
4. Extensibility
• DTD: Limited extensibility. It lacks the ability to create new, reusable types or extend
existing types.
• XML Schema: Highly extensible. XML Schema allows you to define complex types,
create reusable types, and extend or restrict existing types (inheritance), making it suitable
for complex and modular XML document definitions.
Example of Type Inheritance in XML Schema:
<xs:complexType name="person">
  <xs:sequence>
    <xs:element name="firstname" type="xs:string"/>
    <xs:element name="lastname" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="employee">
  <xs:complexContent>
    <xs:extension base="person">
      <xs:sequence>
        <xs:element name="employeeID" type="xs:string"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
5. Validation Capabilities
• DTD: Basic validation capabilities, mostly structural. It checks for the presence of
elements and their order but doesn't validate data types or more complex constraints.
• XML Schema: Advanced validation, including:
o Data type validation (e.g., integer, string, date).
o Value constraints (e.g., minimum, maximum).
o Key constraints and uniqueness validation (like database keys).
Example of Value Constraints in XML Schema:
<xs:element name="age">
  <xs:simpleType>
    <xs:restriction base="xs:int">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="150"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

6. Support for Attributes
• DTD: Supports defining attributes for elements, but lacks complex validation rules for
these attributes.
• XML Schema: Allows attributes to be strongly typed, and you can define default
values, required/optional attributes, or use them in complex structures.
Example in XML Schema:
<xs:element name="book">
  <xs:complexType>
    <xs:attribute name="isbn" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>
7. Documentation and Readability
• DTD: Simpler syntax and easier to read for small and basic XML structures, but not as
expressive.
• XML Schema: More verbose and complex, but much more powerful and flexible. Its
XML-based syntax allows it to integrate well with other XML technologies.
8. Default Values and Fixed Values
• DTD: Allows default and fixed (#FIXED) values, but only for attributes.
• XML Schema: Supports both default and fixed values for elements and attributes.
Example in XML Schema:
<xs:element name="country" type="xs:string" default="USA"/>
9. Tool Support
• DTD: Older and supported by most XML parsers, but not as well-integrated with
modern XML tools.
• XML Schema: Supported by most modern XML tools, including editors, parsers, and
validators. XML Schema provides better support for integrated development
environments (IDEs) and can be manipulated more easily by software that works with
XML.
10. Modularity and Reuse
• DTD: Limited in terms of modularity. You can reference external DTDs, but it's more
cumbersome to break up and reuse parts of a DTD.
• XML Schema: Modular and reusable. XML Schema allows you to define complex
types, groups, and references that can be reused across multiple schemas, making it
easier to manage large and complex XML applications.
Summary of Differences:

| Feature | DTD | XML Schema (XSD) |
|---|---|---|
| Syntax | Non-XML syntax | XML-based syntax |
| Data Types | Limited (no support for data types) | Built-in and user-defined data types (e.g., int, string, date) |
| Namespaces | No support | Fully supports XML namespaces |
| Extensibility | Limited (no reusable or derived types) | Highly extensible (complex types, inheritance) |
| Validation | Basic structural validation | Advanced validation (data types, constraints, uniqueness) |
| Attributes | Supported, but limited | Strongly typed, with constraints and default values |
| Modularity/Reusability | Limited | Highly modular and reusable |
| Documentation | Simpler and easier to read | More verbose, but flexible and powerful |
| Tool Support | Supported by most parsers, but less integrated with modern tools | Supported by most modern XML tools |

1. Object Models in XML


XML object models are abstractions that represent the structure of XML documents as objects
in a program. They allow developers to manipulate XML data using objects rather than working
directly with raw XML. The two main APIs for processing XML are:
• Document Object Model (DOM): Represents the XML document as a tree structure where
each node is an object representing part of the document (elements, attributes, text,
etc.). DOM is commonly used for parsing, creating, and modifying XML documents.
• Simple API for XML (SAX):
o A stream-based parser API that reads the XML document sequentially and triggers
events when it encounters elements, text, or other data.
o Simple API for XML is efficient for large XML documents since it doesn't load
the entire document into memory.
How Simple API for XML Works
• Sequential Processing:
o Simple API for XML reads the XML file sequentially from start to finish, in
document order. It doesn't load the entire document into memory at once.
o Instead, as it encounters each part of the document (such as a start tag with its
attributes, text, or an end tag), it triggers events like startElement, endElement,
and characters.
o These events allow you to process the XML data as you go, without keeping it
all in memory.
• Event-driven Model:
o As each event is triggered, the application can react to that event (for example,
extracting a value or printing data).
o After the event is handled, Simple API for XML moves on to the next part of
the document without retaining the previous part in memory.
Why This Is Efficient for Large XML Documents
• Low Memory Usage:
o Since Simple API for XML reads and processes the document one piece at a
time, it doesn't need to load the entire XML document into memory.
o This is particularly useful for large XML files that could be hundreds of
megabytes or even gigabytes in size.
o With Simple API for XML, you only keep the current element or text node in
memory, not the entire document.
• Avoiding Memory Overload:
o In contrast, a method like DOM (Document Object Model) loads the entire
document into memory to create a tree structure, which can consume a large
amount of memory, especially for very large files.
o This could potentially lead to performance issues or even cause the system to
run out of memory when working with large XML documents.
• Efficient Streaming:
o Simple API for XML operates like a stream processor, where data is read and
processed continuously as it is encountered.
o This means it is well-suited for XML data that is being streamed or that doesn't
need to be stored in memory once processed.
3. Practical Example of Why It’s Efficient
Imagine you have a huge XML file that contains millions of entries, such as a list of
products in a global inventory system:
<inventory>
  <product>
    <id>1</id>
    <name>Product 1</name>
    <price>100</price>
  </product>
  <product>
    <id>2</id>
    <name>Product 2</name>
    <price>200</price>
  </product>
  <!-- Millions of more product entries -->
</inventory>
• DOM Approach:
o With DOM, the entire document, including every <product>, must be loaded into
memory, even if you only want to extract information about a specific product.
o As the number of products increases, the memory required to store the entire tree
grows roughly in proportion to the document size, and your system might struggle
with memory consumption.
• Simple API for XML Approach:
o With Simple API for XML, the parser streams through the document, so the
application can handle one <product> element at a time. As each <product> ends,
the application can process it (e.g., extract the product name and price) and then
discard it from memory.
o Only the current product being processed is held in memory, making it much
more memory-efficient for large files.
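The streaming approach described above can be sketched with Python's standard xml.sax module. This is a minimal illustration, not a production parser: the class name ProductHandler and the min_price filter are hypothetical, and a two-product document stands in for the huge file.

```python
import xml.sax

# A tiny stand-in for the huge inventory file.
INVENTORY_XML = b"""<inventory>
  <product><id>1</id><name>Product 1</name><price>100</price></product>
  <product><id>2</id><name>Product 2</name><price>200</price></product>
</inventory>"""


class ProductHandler(xml.sax.ContentHandler):
    """Collects names of products at or above min_price, one product at a time."""

    def __init__(self, min_price):
        super().__init__()
        self.min_price = min_price
        self.matches = []
        self._current = None  # fields of the product currently being read
        self._text = []       # text chunks of the element currently being read

    def startElement(self, name, attrs):
        if name == "product":
            self._current = {}
        self._text = []

    def characters(self, content):
        # The parser may deliver text in several chunks, so accumulate them.
        self._text.append(content)

    def endElement(self, name):
        if name == "product":
            if float(self._current["price"]) >= self.min_price:
                self.matches.append(self._current["name"])
            self._current = None  # discard the finished product
        elif self._current is not None:
            self._current[name] = "".join(self._text).strip()
        self._text = []


handler = ProductHandler(min_price=150)
xml.sax.parseString(INVENTORY_XML, handler)
print(handler.matches)
```

Only the fields of one product and the list of matches are held at any time. For a real file, xml.sax.parse(path, handler) reads from disk in the same event-driven way.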
4. Use Case for Simple API for XML
Large Log Files or Streaming Data
Consider a scenario where you are parsing large log files or live data streams (such as
stock prices or weather reports) in XML format. Since Simple API for XML processes
the document sequentially, it can handle real-time data or large datasets efficiently.
Processing Specific Information
Suppose you only need certain information from a massive XML file, such as specific
tags or elements (e.g., products that are out of stock). With Simple API for XML, you
can efficiently extract just the data you need, without having to load or retain the rest
of the document.

2. Presenting and Using XML


XML is a versatile format used for data storage, communication between systems, and
configuration. It can be used in various ways, depending on the application:
• Data Interchange: XML is widely used to exchange data between applications.
Systems with different platforms, programming languages, or architectures can use
XML as a standardized format.
• Presentation of Data: With technologies like XSLT (Extensible Stylesheet
Language Transformations), XML data can be transformed into various formats, such
as HTML, plain text, or PDF (typically via XSL-FO), for presentation.
• Configuration Files: Many software applications and servers use XML for
configuration files because it's readable, structured, and easy to update.
Example:
<book>
<title>Learning XML</title>
<author>John Doe</author>
<price>29.99</price>
</book>
This data can be transformed into an HTML webpage using XSLT or presented as a table in a
report.
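As a rough sketch of the "presented as a table" idea, the code below turns the book record into an HTML table. Note the substitution: it uses plain Python with the standard xml.etree.ElementTree module rather than XSLT, because the Python standard library has no XSLT processor. The function name book_to_html_table is hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

BOOK_XML = """<book>
  <title>Learning XML</title>
  <author>John Doe</author>
  <price>29.99</price>
</book>"""


def book_to_html_table(xml_text: str) -> str:
    """Render each child element of <book> as one table row (tag, text)."""
    book = ET.fromstring(xml_text)
    rows = [
        f"<tr><th>{escape(field.tag)}</th><td>{escape(field.text or '')}</td></tr>"
        for field in book
    ]
    return "<table>" + "".join(rows) + "</table>"


html_table = book_to_html_table(BOOK_XML)
print(html_table)
```

The same data could feed a different renderer (plain text, a report generator) without changing the XML, which is the point of keeping data and presentation separate.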

• Pros of DOM:
o Random access to any part of the document.
o Full manipulation capabilities (add, delete, modify elements).
• Cons of DOM:
o Memory-intensive since the entire document is loaded into memory.
o Not ideal for large XML files.
• Example:
<library>
  <book>
    <title>XML Basics</title>
    <author>Jane Smith</author>
  </book>
</library>
In DOM, this would be represented as:
o Root node: library
o Child node: book
o Sub-child nodes: title and author
Operations with DOM:
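// Accessing elements using JavaScript's DOM API (browser environment)
const xml = "<library><book><title>XML Basics</title><author>Jane Smith</author></book></library>";
const xmlDocument = new DOMParser().parseFromString(xml, "application/xml");
const books = xmlDocument.getElementsByTagName("book");
// Replace the text of the first book's <title>
books[0].getElementsByTagName("title")[0].childNodes[0].nodeValue = "Advanced XML";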
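The same tree operations can be sketched outside the browser with Python's standard xml.dom.minidom module, which follows the W3C DOM naming. The added second book is purely illustrative.

```python
from xml.dom import minidom

LIBRARY_XML = (
    "<library><book><title>XML Basics</title>"
    "<author>Jane Smith</author></book></library>"
)

# Parsing builds the whole tree in memory.
doc = minidom.parseString(LIBRARY_XML)

# Random access: jump straight to the first <title> and change its text node.
books = doc.getElementsByTagName("book")
title = books[0].getElementsByTagName("title")[0]
title.firstChild.nodeValue = "Advanced XML"

# Manipulation: build a new <book> subtree and attach it to the root.
new_book = doc.createElement("book")
new_title = doc.createElement("title")
new_title.appendChild(doc.createTextNode("XML Guide"))
new_book.appendChild(new_title)
doc.documentElement.appendChild(new_book)

updated_xml = doc.documentElement.toxml()
print(updated_xml)
```

Both the edit and the insertion rely on the full tree being in memory, which is what gives DOM its read/write flexibility and its memory cost.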
• Pros of Simple API for XML:
o Memory-efficient for large documents since it processes the document
sequentially and doesn't store the whole document in memory.
o Suitable for reading large XML files or streams where you don't need random
access to the entire document.
• Cons of Simple API for XML:
o Cannot modify the XML document since it doesn't build a complete tree
structure.
o Only allows sequential access (i.e., you can't go back to a previous node once
processed).
• Example in Simple API for XML:
<bookstore>
<book>
<title>XML Guide</title>
<author>John Doe</author>
</book>
</bookstore>
In Simple API for XML, the parser will fire events when it reads the <bookstore>, <book>,
<title>, <author>, and the text content within these elements.
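To see the order in which those events arrive, here is a small sketch using Python's standard xml.sax module. The EventLogger class is hypothetical; it records each event instead of acting on it.

```python
import xml.sax

BOOKSTORE_XML = (
    b"<bookstore><book><title>XML Guide</title>"
    b"<author>John Doe</author></book></bookstore>"
)


class EventLogger(xml.sax.ContentHandler):
    """Records every start tag, non-blank text chunk, and end tag in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # A parser may split text into several chunks; skip whitespace-only ones.
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


logger = EventLogger()
xml.sax.parseString(BOOKSTORE_XML, logger)
for kind, value in logger.events:
    print(kind, value)
```

The log runs strictly in document order: once the end event for <title> has fired, the handler cannot go back to it, which illustrates the sequential-access limitation listed above.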

Comparison Between DOM and Simple API for XML

| Feature | DOM | Simple API for XML |
|---|---|---|
| Type | Tree-based (object model) | Event-based (sequential parser) |
| Memory Usage | High (loads entire document) | Low (reads element by element) |
| Data Access | Random access to any part | Sequential (no random access) |
| Manipulation | Full read/write capabilities | Read-only (no modification) |
| Suitable for | Small to medium documents | Large documents or streams |
| Use Case | When you need to modify or query specific parts of the document | For parsing large documents without loading everything into memory |
