XML(Extensible Markup Language)

UNIT-4

Dr Anupama Jha
Introduction : XML
XML stands for Extensible Markup Language. It is a simple, easy-to-use, text-based markup
language derived from Standard Generalized Markup Language (SGML), the standard for
defining document structure.
It is a language used to create other markup languages that describe data in a structured
format.
It is an approach for describing, capturing, processing and publishing information.
XML is "extensible" because we can make up our own tags, i.e. user-defined tags;
hence XML is designed to be self-descriptive.
XML 1.0 was officially adopted as a W3C Recommendation in 1998.
XML was designed to carry data, not to display data.
XML works with CSS, XSL and the DOM.
XML is not a programming language, as it does not perform any computation or express
algorithms.
It is usually stored in a simple text file and is processed by special software that is capable
of interpreting XML.

Features/Characteristics of XML
XML documents are highly portable.
One important characteristic of XML is that it is both human readable and machine
readable.
Processing an XML document requires a software program called an XML parser or
processor.
A parser checks an XML document's syntax and enables software programs to process the
marked-up data.
XML parsers can support the DOM.
An XML document can reference a DTD (Document Type Definition) or a Schema that
describes the proper structure of the XML document.

XML Based system
If an XML parser can process an XML document successfully, the XML document is
well-formed. If the XML document also conforms to its DTD/Schema, the XML document
is valid. Hence, every valid document is well-formed.

[Figure: the XML document and the XML DTD are read by the XML parser (or processor), which serves the XML application.]

Example: Well-formed XML
Filename: x1.xml

<?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Joe</to>
<from>Smith</from>
<heading>Reminder</heading>
<body>Hello World!</body>
</note>

Example: Valid XML - a well-formed XML with DTD
Filename: x1dtd.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Joe</to>
<from>Smith</from>
<heading>Reminder</heading>
<body>Hello World!</body>
</note>

The Difference between XML and HTML
1. HTML is about displaying information, whereas XML is about carrying information. In
other words, XML was created to structure, store, and transport information; HTML was
designed to display data.
2. Using XML, we can create our own tags, whereas in HTML this is not possible; instead,
HTML offers a fixed set of built-in tags.
3. XML is platform independent and language independent.
4. XML tag and attribute names are case-sensitive, whereas in HTML they are not.
5. XML attribute values must be enclosed in single or double quotes, whereas in HTML this
is not compulsory.
6. XML elements must be properly nested.
7. All XML elements must have a closing tag.
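
Points 4 to 7 can be tried out with any XML parser, because a parser rejects a document that breaks them. The following is a minimal sketch in Python using the standard library module xml.etree.ElementTree; the helper name is_well_formed and the sample strings are illustrative assumptions, not part of the original notes.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Illustrative samples: one correct document and four typical mistakes.
samples = {
    "properly nested": "<one><two>Hello</two></one>",
    "badly nested": "<one><two>Hello</one></two>",
    "case mismatch": "<note>Hi</Note>",
    "unquoted attribute": "<note id=1>Hi</note>",
    "missing closing tag": "<note><to>Joe</note>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well formed' if ok else 'not well formed'}")
```

Only the first sample is accepted. This check covers syntax only; it says nothing about whether a document is valid against a DTD.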

Well Formed XML Documents
A "Well Formed" XML document must follow these syntax rules:
XML documents must have a root element
XML elements must have a closing tag (a start tag must have a matching end tag)
XML tags are case sensitive
XML elements must be properly nested. Ex: <one><two>Hello</two></one>
XML attribute values must be quoted
XML with correct syntax is "Well Formed" XML. XML validated against a DTD is
"Valid" XML.

XML Naming Rules
XML elements must follow these naming rules:
• Element names are case-sensitive
• Element names must start with a letter or underscore
• Element names must not start with the letters xml (or XML, or Xml, etc.); such names are reserved
• Element names can contain letters, digits, hyphens, underscores, and periods
• Element names cannot contain spaces
Any name can be used; no words are reserved (except xml).
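
The naming rules above can be expressed as a small check. This is a simplified sketch in Python, assuming ASCII names only (real XML also allows non-English letters); the function name is_valid_element_name and the pattern are illustrative assumptions.

```python
import re

# Simplified, ASCII-only pattern: the first character is a letter or underscore,
# the rest are letters, digits, hyphens, underscores, or periods.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def is_valid_element_name(name):
    """Apply the naming rules listed above to a candidate element name."""
    if NAME_PATTERN.fullmatch(name) is None:
        return False
    # Names starting with "xml" in any letter case are reserved.
    return not name.lower().startswith("xml")

for candidate in ["book_title", "_id", "first-name", "1st", "first name", "XmlData"]:
    print(candidate, is_valid_element_name(candidate))
```

The first three names pass; "1st" starts with a digit, "first name" contains a space, and "XmlData" starts with the reserved letters.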

Best Naming Practices


Create descriptive names, like this: <person>, <firstname>, <lastname>.
Create short and simple names, like this: <book_title> not like this: <the_title_of_the_book>.
Avoid "-". If you name something "first-name", some software may think you want to subtract
"name" from "first".
Avoid ".". If you name something "first.name", some software may think that "name" is a
property of the object "first".
Avoid ":". Colons are reserved for namespaces (more later).
Non-English letters like éòá are perfectly legal in XML, but watch out for problems if your
software doesn't support them.

Note: The XML declaration (prolog) does not have a closing tag. This is not an error:
the declaration is not an element of the XML document.

What is Markup?

XML is a markup language that defines a set of rules for encoding documents in a format
that is both human-readable and machine-readable.

Example of an XML document: x1.xml

<?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Joe</to>
<from>Smith</from>
<heading>Reminder</heading>
<body>Hello world!</body>
</note>

The XML document begins with the XML declaration statement: <?xml version="1.0" encoding="UTF-8"?>
The next line describes the root element of the document: <note>
This element is "the parent" of all other elements.
The next 4 lines describe 4 child elements of the root: to, from, heading, and body.
And finally the last line defines the end of the root element: </note>
The XML declaration has no closing tag, i.e. there is no </?xml>.
The file name extension used for an XML document is .xml.
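
The root/child structure described above can be inspected programmatically. The following is a minimal sketch in Python using the standard library module xml.etree.ElementTree; the variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# The x1.xml document from the example above, held in memory as bytes.
NOTE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Joe</to>
<from>Smith</from>
<heading>Reminder</heading>
<body>Hello world!</body>
</note>"""

root = ET.fromstring(NOTE_XML)

# The root element is the parent of all other elements.
print("root:", root.tag)

# Iterating over the root yields its child elements in document order.
children = [(child.tag, child.text) for child in root]
for tag, text in children:
    print(f"child <{tag}> contains: {text}")
```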

XML Parsers
An XML parser is a software library or package that provides interfaces for client
applications to work with an XML document, i.e. to read, build and write XML
documents.
All modern browsers have a built-in XML parser that can be used to read and manipulate
XML.
An XML parser checks that the document is well formed; a validating parser also checks it
against a DTD or Schema.

Types of XML Parsers
These are the two main types of XML parsers:
1. DOM (Document Object Model) - object-based (tree-based) parser
2. SAX (Simple API for XML) - event-based parser

1. XML DOM (Document Object Model)
An XML DOM document is an object which contains all the information of an XML document.
It is organized as a tree structure.
The DOM parser implements the DOM API. This API is very simple to use.
Features of DOM Parser
A DOM parser creates an internal structure in memory, a DOM document object, and client applications
get information about the original XML document by invoking methods on this document object.
The DOM parser has a tree-based structure.
Advantages
1) It supports both read and write operations, and the API is very simple to use.
2) It is preferred when random access to widely separated parts of a document is required.
Disadvantages
1) It is memory inefficient (it consumes more memory because the whole XML document needs to be loaded into memory).
2) It is comparatively slower than other parsers.

XML DOM

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
</bookstore>
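
As an illustration of the read and write operations a DOM parser supports, here is a minimal sketch in Python using the standard library module xml.dom.minidom on the bookstore document above. The variable names and the new price value are illustrative assumptions.

```python
from xml.dom import minidom

BOOKSTORE_XML = """<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
</bookstore>"""

# The whole document is loaded into memory as a tree of nodes.
doc = minidom.parseString(BOOKSTORE_XML)

# Read: navigate the tree by element name and read attributes and text.
book = doc.getElementsByTagName("book")[0]
category = book.getAttribute("category")
title_text = book.getElementsByTagName("title")[0].firstChild.data
print(category, "-", title_text)

# Write: change the text node inside <price>, then serialize the tree again.
price = book.getElementsByTagName("price")[0]
price.firstChild.data = "25.00"
updated_xml = doc.documentElement.toxml()
print(updated_xml)
```

Because the full tree stays in memory, any part of the document can be reached in any order, which is the random-access advantage listed above and also the reason for the higher memory use.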

2. SAX (Simple API for XML)
A SAX parser implements the SAX API. This API is event based and less intuitive.
Features of SAX Parser
• It does not create any internal structure.
• Clients do not call methods to fetch data; instead they override the callback methods of the API and place their
own code inside those methods.
• It is an event-based parser; it works like an event handler in Java.

Advantages
1) It is simple and memory efficient.
2) It is very fast and works for huge documents.

Disadvantages
1) It is event-based, so its API is less intuitive.
2) Clients never see the full document at once because the data is delivered in pieces.

Some common SAX events:

• startDocument() and endDocument() – methods called at the start and end of an XML document.
• startElement() and endElement() – methods called at the start and end of an XML element.
• characters() – method called with the text content between the start and end of an XML element.

Example of an XML file which contains the following code:

<name>Deitel</name>

The SAX parser reads the above XML file and calls the following events or methods sequentially:
1. startDocument()
2. startElement() – <name>
3. characters() – Deitel
4. endElement() – </name>
5. endDocument()
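
The event sequence listed above can be observed with a small handler. This is a minimal sketch in Python using the standard library module xml.sax; the class name EventLogger and the events list are illustrative assumptions.

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    """Record each SAX callback in the order the parser fires it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument()")

    def endDocument(self):
        self.events.append("endDocument()")

    def startElement(self, name, attrs):
        self.events.append(f"startElement() - <{name}>")

    def endElement(self, name):
        self.events.append(f"endElement() - </{name}>")

    def characters(self, content):
        # Skip whitespace-only chunks so that only real text is logged.
        if content.strip():
            self.events.append(f"characters() - {content}")

handler = EventLogger()
xml.sax.parseString(b"<name>Deitel</name>", handler)

for number, event in enumerate(handler.events, start=1):
    print(f"{number}. {event}")
```

The client never asks the parser for data; it only overrides the callbacks, and the parser calls them while it reads the input. For longer text a parser may deliver characters() in several chunks.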

Valid XML document
If an XML document is well-formed and conforms to the rules of its associated Document
Type Definition (DTD), then it is said to be a valid XML document.
DTDs are covered in the XML DTD section that follows.

XML DTD
The purpose of a Document Type Definition (DTD) is to define the structure of an XML document.
It defines the structure with a list of the elements allowed in the XML document.
Using a DTD we can specify the various element types, their attributes and their
relationships with one another.
Basically, a DTD is used to specify the set of rules for structuring data in an XML file.

Why use a DTD?

XML provides an application independent way of sharing data.


With a DTD, independent groups of people can agree to use a common DTD for
interchanging data.
Your application can use a standard DTD to verify that data that you receive from the
outside world is valid.
You can also use a DTD to verify your own data.

DTD - XML building blocks

Various building blocks of XML are:


1. Elements
2. Tags
3. Attribute
4. Entities
5. CDATA
6. PCDATA

1. Elements:
Elements are the main building blocks of an XML document. An element typically consists of an
opening tag, its content, and a closing tag. In a DTD, each element type is declared with one ELEMENT declaration.

Syntax 1: <!ELEMENT element-name (element-content)>

Syntax 2: <!ELEMENT element-name (#PCDATA)>
#PCDATA means that the element contains character data that IS going to be parsed by a parser.
Syntax 3: <!ELEMENT element-name ANY>
The keyword ANY declares an element with any content.
Note: #CDATA is not a valid content type in an ELEMENT declaration. CDATA is used as an
attribute type in ATTLIST declarations and in CDATA sections (<![CDATA[ ... ]]>) inside a document.
Example:
<!ELEMENT note (#PCDATA)>

Elements with children (sequences)

Elements with one or more children are defined with the names of the child elements
inside the parentheses:

<!ELEMENT parent-name (child-element-name)>
Example:
<!ELEMENT student (id)>
<!ELEMENT id (#PCDATA)>
or
<!ELEMENT element-name (child-element-name, child-element-name, ...)>
Example: <!ELEMENT note (to, from, heading, body)>

Elements with children (sequences) contd…

When children are declared in a sequence separated by commas, the children must appear
in the same sequence in the document. In a full declaration, the children must also be
declared, and the children can also have children. The full declaration of the note document
will be:

<!ELEMENT note (to, from, heading, body)>


<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
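
The sequence rule can be illustrated by hand. The parsers in Python's standard library do not validate against a DTD, so the following is only a sketch of the single rule "children must appear in the declared order", written with xml.etree.ElementTree. The function name follows_sequence and the constant EXPECTED_SEQUENCE are illustrative assumptions; a real validating parser checks the whole DTD.

```python
import xml.etree.ElementTree as ET

# Child order taken from: <!ELEMENT note (to, from, heading, body)>
EXPECTED_SEQUENCE = ("to", "from", "heading", "body")

def follows_sequence(xml_text, root_name="note", expected=EXPECTED_SEQUENCE):
    """Return True if the root has exactly the expected children, in order."""
    root = ET.fromstring(xml_text)
    child_names = tuple(child.tag for child in root)
    return root.tag == root_name and child_names == tuple(expected)

in_order = (
    "<note><to>Joe</to><from>Smith</from>"
    "<heading>Reminder</heading><body>Hello World!</body></note>"
)
# Same children, but <from> comes before <to>.
swapped = (
    "<note><from>Smith</from><to>Joe</to>"
    "<heading>Reminder</heading><body>Hello World!</body></note>"
)

print(follows_sequence(in_order))
print(follows_sequence(swapped))
```

Both documents are well formed, but only the first one matches the declared sequence.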

2. Tags
Tags are used to mark up elements. A starting tag like <element_name> marks the
beginning of an element, and an ending tag like </element_name> marks the end of an
element.
Examples:
A body element: <body>body text in between</body>
A message element: <message>some message in between</message>

3. Attribute
Attributes provide additional information about an element. Attribute values must be
enclosed in single or double quotes. Ex:

<?xml version="1.0" encoding="UTF-8"?>


<!DOCTYPE note [
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ATTLIST to id CDATA "123">
]>
<note>
<to id="123">Joe</to>
<from>Smith</from>
<heading>Reminder</heading>
<body>Hello World!</body>
</note>

Example: attribute with a default value

<?xml version = "1.0"?>
<!DOCTYPE address [
<!ELEMENT address ( name )>
<!ELEMENT name ( #PCDATA )>
<!ATTLIST name id CDATA "0">
]>
<address>
<name id = "123"> Smith</name>
</address>

In this example the name element has an attribute id whose default value is 0. The
default value is enclosed in double quotes.

Example: required attribute (#REQUIRED)

<?xml version = "1.0"?>
<!DOCTYPE address [
<!ELEMENT address ( name )>
<!ELEMENT name ( #PCDATA )>
<!ATTLIST name id CDATA #REQUIRED>
]>
<address>
<name id = "123"> Smith</name>
</address>

In this example the #REQUIRED keyword specifies that the attribute id must be provided
for the element name.

Example: fixed attribute value (#FIXED)

<?xml version = "1.0"?>
<!DOCTYPE address [
<!ELEMENT address (company)*>
<!ELEMENT company (#PCDATA)>
<!ATTLIST company name CDATA #FIXED "Microsoft">
]>
<address>
<company name = "Microsoft"> welcome</company>
</address>

Invalid XML:
<company name = "TCS"> welcome</company>

Use the #FIXED keyword when you want an attribute to have a fixed (constant) value that
the author is not allowed to change. If an author includes another value, a validating
XML parser will return an error.

Use the #IMPLIED keyword when an attribute is optional and you do not want to force the
author to include it. In other words, if the attribute you are declaring has no default value,
has no fixed value, and is not required, then you must declare the attribute as implied.

4. Entities
Entities are variables used to define shortcuts to common text. Entity references are
references to entities.
Most of you will know the HTML entity reference "&nbsp;", which is used to insert a
non-breaking space in an HTML document. Entities are expanded when a document is
parsed by an XML parser.
The following entities are predefined in XML:
&lt; (<), &gt; (>), &amp; (&), &quot; (") and &apos; (')
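
Entity expansion can be observed in both directions. The following is a minimal sketch in Python using the standard library: xml.sax.saxutils.escape replaces the markup characters with entity references, and the parser expands them again. The element names cond and q and the sample strings are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if a < b & b > c"

# escape() replaces &, < and > with the predefined entity references.
escaped = escape(raw)
print(escaped)

# When the parser reads the element, it expands the references again.
parsed_text = ET.fromstring(f"<cond>{escaped}</cond>").text
print(parsed_text)

# The quote entities are expanded in the same way.
quoted_text = ET.fromstring("<q>&quot;hi&quot; &apos;there&apos;</q>").text
print(quoted_text)
```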

CDATA and PCDATA
CDATA: It stands for character data. CDATA is text that will NOT be parsed by a parser.
Tags inside the text will NOT be treated as markup and entities will not be expanded.
PCDATA: It stands for Parsed Character Data (i.e., text). Think of character data as the
text found between the start tag and the end tag of an XML element. PCDATA is text that
will be parsed by a parser: tags inside the text will be treated as markup and entities will be
expanded. Parsed character data should therefore not contain the markup characters <, >
or & as literal text; to use these characters, write &lt;, &gt; or &amp; instead.
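
The difference can be seen by parsing the same text once as parsed character data and once inside a CDATA section. This is a minimal sketch in Python using the standard library module xml.etree.ElementTree; the element name msg and the sample text are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# PCDATA: <b> is treated as markup and &amp; is expanded to &.
pcdata = ET.fromstring("<msg>Use <b>bold</b> &amp; more</msg>")

# CDATA section: everything between <![CDATA[ and ]]> is kept as plain text.
cdata = ET.fromstring("<msg><![CDATA[Use <b>bold</b> & more]]></msg>")

print("PCDATA child elements:", [child.tag for child in pcdata])
print("PCDATA text before the child:", repr(pcdata.text))
print("CDATA child elements:", [child.tag for child in cdata])
print("CDATA text:", repr(cdata.text))
```

In the first document the parser finds a child element b; in the second it finds no child elements and returns the tag characters as ordinary text.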

<!DOCTYPE note
[
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
Here PCDATA refers to parsed character data. In the XML document above, the elements to, from, heading and
body carry text, so they are declared in the DTD as containing text.
When the DTD is kept in a separate file, that file is stored with the .dtd extension.

Types of DTD:
1. Internal DTD and 2. External DTD

1. Internal DTD
A DTD is referred to as an internal DTD if its elements are declared within the XML file itself.
With an internal DTD, the standalone attribute in the XML declaration can be set to yes. This means the
declaration works independently of any external source.

Syntax:
The syntax of internal DTD is as shown:
<!DOCTYPE root-element [element-declarations]>
Where root-element is the name of root element and element-declarations is where you declare the elements.

Example: Internal DTD

Following is a simple example of internal DTD:


<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE address [
<!ELEMENT address (name, company, phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<address>
<name>Piyush</name>
<company>TCS</company>
<phone>(040) 123-4567</phone>
</address>

Let us go through the above code:
Start Declaration - Begin the XML document with the XML declaration:
<?xml version="1.0" encoding="UTF-8" ?>
DTD - Immediately after the XML header, the document type declaration follows, commonly
referred to as the DOCTYPE:
<!DOCTYPE address [
The DOCTYPE declaration has an exclamation mark (!) at the start of the declaration name. The
DOCTYPE informs the parser that a DTD is associated with this XML document.
DTD Body - The DOCTYPE declaration is followed by the body of the DTD, where you declare
elements, attributes, entities, and notations:
<!ELEMENT address (name, company, phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
Several elements are declared here that make up the vocabulary of the <address> document.
<!ELEMENT name (#PCDATA)> defines the element name to be of type "#PCDATA".
Here #PCDATA means parse-able text data.
End Declaration - Finally, the declaration section of the DTD is closed using a closing
bracket and a closing angle bracket (]>). This effectively ends the definition, and
thereafter, the XML document follows immediately.

Rules
The document type declaration must appear at the start of the document (preceded only by
the XML header) — it is not permitted anywhere else within the document.
Similar to the DOCTYPE declaration, the element declarations must start with an
exclamation mark.
The Name in the document type declaration must match the element type of the root
element.
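
The last rule can be checked programmatically. This is a minimal sketch in Python using the standard library module xml.dom.minidom, which exposes the DOCTYPE of a parsed document through doc.doctype. The function name doctype_matches_root and the sample documents are illustrative assumptions; minidom does not validate, so it accepts the mismatching document and only the comparison reveals the problem.

```python
from xml.dom import minidom

def doctype_matches_root(xml_bytes):
    """Return True if the DOCTYPE name equals the root element name."""
    doc = minidom.parseString(xml_bytes)
    if doc.doctype is None:
        return False  # no document type declaration at all
    return doc.doctype.name == doc.documentElement.tagName

MATCHING = b"""<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE address [
<!ELEMENT address (name)>
<!ELEMENT name (#PCDATA)>
]>
<address><name>Piyush</name></address>"""

# Same document, but the DOCTYPE names "note" while the root is <address>.
MISMATCHING = MATCHING.replace(b"<!DOCTYPE address", b"<!DOCTYPE note")

print(doctype_matches_root(MATCHING))
print(doctype_matches_root(MISMATCHING))
```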

Example: External DTD

XML document:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note SYSTEM "extdtd.dtd">
<note>
<to>Joe</to>
<from>Smith</from>
<heading>Reminder</heading>
<body>Hello World!</body>
</note>

File extdtd.dtd
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>

XML example
<?xml version="1.0" encoding="UTF-8"?>
<books>
<heading> web technology </heading>
<book>
<title> WT</title>
<author> Deitel</author>
<ISBN>123-456-789</ISBN>
<publisher>wiley</publisher>
<edition>3</edition>
<price>350</price>
</book>
<book>
<title> internet worldwideweb</title>
<author> ditel&amp;ditel</author>
<ISBN>123-456-781</ISBN>
<publisher>pearson</publisher>
<edition>3</edition>
<price>450</price>
</book>
</books>

XML with CSS
XML file:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="rule.css"?>
<books>
<heading> web technology </heading>
<book>
<title> WT</title>
<author> Deitel</author>
<ISBN>123-456-789</ISBN>
<publisher>wiley</publisher>
<edition>3</edition>
<price>350</price>
</book>
<book>
<title> internet worldwideweb</title>
<author> ditel&amp;ditel</author>
<ISBN>123-456-781</ISBN>
<publisher>pearson</publisher>
<edition>3</edition>
<price>450</price>
</book>
</books>

rule.css
books
{
background-color: blue;
width: 100%;
}
heading
{
color: green;
font-size: 40px;
background-color: yellow;
}
heading, title, author, publisher, edition, price
{
color: green;
}
title
{
font-size: 25px;
font-weight: bold;
}

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE bookstore SYSTEM "extdtdforbs.dtd">
<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>

<book category="children">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>

<book category="web">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<year>2003</year>
<price>49.99</price>
</book>

<book category="web" cover="paperback">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>

External DTD : extdtdforbs.dtd
<!ELEMENT bookstore (book+)>
<!ELEMENT book (title, author, year, price)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!-- The attributes used in the document must also be declared for it to be valid -->
<!ATTLIST book category CDATA #IMPLIED>
<!ATTLIST book cover CDATA #IMPLIED>
<!ATTLIST title lang CDATA #IMPLIED>
