
Web IV Unit Notes

XML is a markup language that allows users to define their own tags to structure documents. It was created in 1996 to facilitate data sharing across different systems. XML uses elements enclosed in tags to describe the structure and meaning of content. Well-formed XML documents must have a root element containing other properly nested elements, with start and end tags. Elements can contain text content and attributes to further describe the data.

Uploaded by

Sudheer Mamidala

UNIT – IV

Web Technologies

XML Introduction
What is Markup?

XML, as a markup language, defines a set of rules for encoding documents
in a format that is both human-readable and machine-readable. So what
exactly is a markup language? Markup is information added to a document
that enhances its meaning in certain ways: it identifies the parts of the
document and how they relate to one another. More specifically, a markup
language is a set of symbols that can be placed in the text of a document
to demarcate and label the parts of that document.

Consider the following example of XML markup applied to a small piece of
text:

<message>
  <text>Hello, world!</text>
</message>

This snippet includes the markup symbols, or tags, <message>…</message>
and <text>…</text>. The tags <message> and </message> mark the start and
the end of the XML fragment. The tags <text> and </text> surround the
content Hello, world!.
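
To see how a program reads this markup, here is a minimal sketch in Python. The choice of Python's standard xml.etree.ElementTree module and the variable names are assumptions made for illustration; these notes do not prescribe a parser.

```python
import xml.etree.ElementTree as ET

# The same fragment as above, written on one line.
snippet = "<message><text>Hello, world!</text></message>"

root = ET.fromstring(snippet)       # parse the text into an element tree
outer_name = root.tag               # name of the outermost element
content = root.find("text").text    # character data inside <text>

print(outer_name, "->", content)
```

The tags tell the parser where each element starts and ends; the text between them is returned as the element's content.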

Work on XML began in the 1990s, with the aim of making it possible to
define new text elements. XML was developed by the XML Working Group
(originally known as the SGML Editorial Review Board), formed under the
auspices of the World Wide Web Consortium (W3C) in 1996. The group was
chaired by Jon Bosak of Sun Microsystems, with the active participation of
an XML Special Interest Group (previously known as the SGML Working
Group), also organized by the W3C. Dan Connolly served as the Working
Group's contact with the W3C.

Extensible Markup Language (XML) is a document formatting language used by
some websites. Its extensibility makes it a powerful and very useful
language. More precisely, XML is a simplified form of Standard Generalized
Markup Language (SGML) intended for documents that are published on the
Internet. Like SGML, XML uses document type definitions (DTDs) to define
document types and the meanings of the tags used in them. XML provides
more kinds of hypertext links than HTML, for example bidirectional links
and links relative to a document subsection. XML also adopts conventions
that make text elements easy to interpret: for example, document elements
are delimited by start and end tags, as in <BEGIN>…</BEGIN>.

The design goals for XML are:

• XML shall be straightforwardly usable over the Internet.
• XML shall support a wide variety of applications.
• XML shall be compatible with SGML.
• It shall be easy to write programs that process XML documents.
• The number of optional features in XML is to be kept to the absolute
minimum, ideally zero.
• XML documents should be human-legible and reasonably clear.
• The XML design should be prepared quickly.
• XML documents shall be easy to create.
XML is immensely significant. Dr. Charles Goldfarb, who was personally
involved in its invention, called it "the holy grail of computing, solving
the problem of universal data interchange between dissimilar systems." It
is also a convenient format for almost everything from configuration files
to data and documents of nearly any kind. XML is user-friendly and is often
generated automatically, so you do not need to know every detail of the
specification in order to benefit from it. What matters is understanding
the main idea behind XML and what it does; from there you can see how to
apply it in your own work.

History of XML
Here are some significant milestones in the history of XML:

• XML is an acronym for Extensible Markup Language.
• GML was invented in 1970 by Charles Goldfarb, Ed Mosher, and
Ray Lorie.
• XML was derived from SGML.
• Development of XML began in 1996 in a W3C working group chaired by
Jon Bosak of Sun Microsystems.
• In February 1998, XML version 1.0 was released.
• In January 2001, XML Media Types became an IETF Proposed Standard.

Features of XML
• XML stands for Extensible Markup Language.
• It was designed to be self-descriptive.
• There are no predefined XML tags. You must define your own tags.
• XML was created to store and transport data, not to display that data.
• XML markup is easy for a human to understand.
• Its structured format is also easy for programs to read and write.
• XML, like HTML, is a markup language.
HTML and XML

• Both XML and HTML are markup languages.
• XML is extensible, while HTML is not.
• XML was designed to store and transport data, while HTML is
meant for presenting and displaying data.
• HTML tags are predefined, whereas XML tags are defined by the
document author.

Advantages of XML/ Disadvantages of XML

Here are some key advantages of using XML:

• Documents can be moved between systems and applications.
• Data can be exchanged quickly between platforms.
• XML separates data from HTML.
• XML simplifies the process of changing platforms.
• User-defined (custom) tags can be created.

Here are some disadvantages of using XML:

• XML requires a processing application.
• XML syntax resembles other text-based data transfer formats, which
can be confusing at times.
• There is no intrinsic data type support.
• XML syntax is verbose and redundant.
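
The lack of intrinsic data types can be illustrated with a short Python sketch. The sample <book> fragment and the use of the standard xml.etree.ElementTree module are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring("<book><year>2005</year><price>300.00</price></book>")

# Element content always arrives as text; XML itself carries no types.
raw_year = book.find("year").text

# The application decides how each value should be interpreted.
year = int(raw_year)
price = float(book.find("price").text)

print(type(raw_year).__name__, year, price)
```

Any typing has to come from the application or from a schema language layered on top of XML.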

XML Tree
XML documents are formed as element trees.

An XML tree begins with a root element and branches to child elements.

All elements can have child elements (sub-elements):

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

The terms parent, child, and sibling are used to describe the
relationships between elements.

Parents have children. Children have parents. Siblings are children on the
same level (brothers and sisters).

All elements can have text content (Harry Potter) and attributes
(category="cooking").
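
The parent, child, and sibling relationships can be explored programmatically. The following is a sketch only, assuming Python's standard xml.etree.ElementTree module; the two <subchild> values are made up for the example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<root><child><subchild>a</subchild><subchild>b</subchild></child></root>"
)

# ElementTree elements keep no link to their parent, so build a
# child -> parent lookup table by walking the whole tree once.
parent_of = {child: parent for parent in root.iter() for child in parent}

first = root.find("child/subchild")              # first <subchild>
parent = parent_of[first]                        # its parent, <child>
siblings = [e for e in parent if e is not first] # other children of <child>

print(parent.tag, [s.text for s in siblings])
```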
XML Syntax
The syntax rules of XML are very simple and logical. They are easy to
learn and easy to use. In this section we explore the basic syntax rules
for writing a simple XML document.

Consider the following example of a complete XML document:

<?xml version = "1.0"?>
<contact-info>
  <name>Anil Kumar</name>
  <company>GreatLearning</company>
  <phone>(91) 987-3679</phone>
</contact-info>

There are two kinds of information in the above example:

• Markup, such as <contact-info>
• Text, or character data, such as GreatLearning and (91) 987-3679.
Self-Describing Syntax:

XML has a very self-descriptive syntax.

A prolog specifies the XML version as well as the character encoding:

<?xml version="1.0" encoding="UTF-8"?>

The following line is the document's root element:

<bookstore>

The following line begins a <book> element:

<book category="novel">

Each <book> element has four child elements: <title>, <author>,
<year>, and <price>:

<title lang="en">Two States</title>
<author>Chetan Bhagath</author>
<year>2005</year>
<price>300.00</price>

The following line closes the book element:

</book>

Example of XML Document:

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="novel">
    <title lang="en">Two States</title>
    <author>Chetan Bhagath</author>
    <year>2005</year>
    <price>300.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>295.99</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>339.95</price>
  </book>
</bookstore>
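
As a rough illustration of how an application might consume the bookstore document, here is a Python sketch. It assumes the standard xml.etree.ElementTree module; the queries shown (listing titles, filtering by category, finding the lowest price) are examples chosen for this sketch, not part of the notes.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="novel">
    <title lang="en">Two States</title>
    <author>Chetan Bhagath</author>
    <year>2005</year>
    <price>300.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>295.99</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>339.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)

# Collect the text of every <title> that sits directly under a <book>.
titles = [book.find("title").text for book in root.findall("book")]

# Select books by attribute value using ElementTree's limited XPath support.
web_books = root.findall("book[@category='web']")

# Prices are text, so convert them before comparing.
cheapest = min(root.findall("book"), key=lambda b: float(b.find("price").text))

print(titles)
print(web_books[0].find("author").text)
print(cheapest.find("title").text)
```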
XML syntax refers to the rules that determine how an XML document can be
written. The syntax is very straightforward, which makes XML easy to
learn. The following are the main points to remember when writing an XML
document:

• XML elements must have an end tag.
• XML tags are case sensitive.
• All XML elements must be properly nested.
• All XML documents must have a root element.
• Attribute values must always be quoted.
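
The five rules above can be checked with any conforming parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are hypothetical and chosen only to trigger one rule each.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Each sample maps to the result we expect from the parser.
samples = {
    "<note><to>Anuj</to></note>": True,    # follows every rule
    "<note><to>Anuj</note>": False,        # <to> has no end tag
    "<note><to>Anuj</To></note>": False,   # tag names differ in case
    "<a><b></a></b>": False,               # elements overlap
    "<a/><b/>": False,                     # two top-level elements
    "<note date=2002></note>": False,      # attribute value not quoted
}

for text, expected in samples.items():
    print(text, is_well_formed(text) == expected)
```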
XML Documents
As defined in the XML specification, a data object is an XML document if
it is well-formed. A well-formed XML document may, in addition, be valid
if it meets certain further constraints. Every XML document has both a
logical and a physical structure. Physically, the document is composed of
units called entities. An entity may refer to other entities to cause
their inclusion in the document. A document begins in a document entity.
Logically, the document is composed of declarations, elements, comments,
character references, and processing instructions, all of which are
indicated by explicit markup. The logical and physical structures must
nest properly.

Every XML document must have a single tag pair that defines the root
element. All other elements must be inside this root element. All
elements can have sub-elements, and sub-elements must be properly nested
inside their parent element.

Consider this example:

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
XML Document Rules

If you have seen HTML documents, you are familiar with the basic idea of
using tags to mark up the content of a document. This section examines the
differences between HTML documents and XML documents. It goes over the
basic rules of XML documents and the terminology used to describe them.

One important point about XML documents: the XML specification requires a
parser to reject any XML document that does not follow the basic rules.
Most HTML parsers will accept sloppy markup, making a guess as to what the
author of the document intended. To avoid the loosely structured mess
found in the average HTML document, the creators of XML decided to enforce
document structure from the beginning.

Note: A parser is a piece of code that attempts to read a document and
interpret its contents.

There are three kinds of XML documents:

1. Valid documents follow both the XML syntax rules and the rules defined
in their DTD or schema.

2. Invalid documents do not follow the syntax rules defined by the XML
specification. If a developer has defined rules for what a document may
contain in a DTD or schema, and the document does not follow those rules,
that document is invalid as well.

3. Well-formed documents follow the XML syntax rules but do not have a
DTD or schema.

The root element

An XML document must be contained in a single element. That single
element is called the root element, and it contains all the text and any
other elements in the document. In the following example, the XML document
is contained in a single element, the <greeting> element. Notice that the
document has a comment outside the root element; that is perfectly legal.

<?xml version="1.0"?>
<!-- A well-formed document -->
<greeting>
  Hello, World!
</greeting>

Here is a document that does not contain a single root element:

<?xml version="1.0"?>
<!-- An invalid document -->
<greeting>
  Hello, World!
</greeting>
<greeting>
  Namaste, Duniya!
</greeting>

An XML parser is required to reject this document, regardless of the
information it might contain.

Well-Formed XML Documents

A textual object is a well-formed XML document if:

i. Taken as a whole, it matches the production labeled document.

ii. It meets all the well-formedness constraints given in the XML
specification.

iii. Each of the parsed entities referenced directly or indirectly within
the document is well-formed.

The document production is:

document ::= prolog element Misc*

Matching the document production implies that:

i. It contains one or more elements.

ii. There is exactly one element, called the root or document element, no
part of which appears in the content of any other element. For all other
elements, if the start tag is in the content of another element, the end
tag is in the content of the same element. More simply stated, the
elements, delimited by start and end tags, nest properly within each
other.

As a consequence of this, for each non-root element X in the document,
there is one other element Y in the document such that X is in the content
of Y, but is not in the content of any other element that is in the
content of Y. Y is referred to as the parent of X, and X as a child of Y.

An XML document is the basic unit of XML data, made up of elements and
other markup in an orderly package. An XML document can contain a wide
variety of data, for example a database of numbers, numbers representing
a molecular structure, or a mathematical equation.

An XML document has the following sections:

1. Document Prolog Section: The document prolog comes at the top of the
document, before the document element (root element). It contains the XML
declaration and the document type declaration.

2. Document Elements Section: Document elements are the building blocks
of XML. They divide the document into a hierarchy of sections, each
serving a specific purpose. You can separate a document into multiple
sections so that they can be rendered differently, or used by a search
engine. Elements can be containers, with a combination of text and other
elements.

XML Document Example

Document Prolog:

<?xml version = "1.0"?>

Document Elements:

<contact-info>
  <name>Anil Kumar</name>
  <company>GreatLearning</company>
  <phone>(91) 987-3679</phone>
</contact-info>

XML Declaration
The XML declaration indicates that the document is written in XML and
specifies which version of XML. The XML declaration, if included, must be
on the first line of the document. The XML declaration can also specify
the character encoding of the document (optional) and whether the document
refers to external entities (optional). For example, we can specify that
the document uses UTF-8 encoding (although we do not strictly need to, as
UTF-8 is the default), and we can indicate that the document refers to
external entities by using standalone="no". Such a document is not
standalone because it relies on an external resource (for example the
DTD). Although the XML declaration is optional, the W3C recommends that
you include it in your XML documents.

Most XML documents begin with an XML declaration that provides basic
information about the document to the parser. An XML declaration is
recommended, but not required. If there is one, it must be the first thing
in the document. The declaration can contain up to three name-value pairs
(many people call them attributes, although technically they are not).
The version is the version of XML used; for XML 1.0 documents this value
must be 1.0. The encoding is the character set used in the document. The
ISO-8859-1 character set referenced in the declaration below includes all
of the characters used by most Western European languages. If no encoding
is specified, the XML parser assumes that the characters are in the UTF-8
set, a Unicode encoding that supports virtually every character and
ideograph from the world's languages.

<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>

Finally, standalone, which can be either yes or no, defines whether this
document can be processed without reading any other files. For example, if
the XML document does not reference any other files, you would specify
standalone="yes". If the XML document references other files that
describe what the document can contain (such as a DTD), you could specify
standalone="no". Because standalone="no" is the default, you rarely see
standalone in XML declarations.
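
The three name-value pairs can be read back by a parser. The sketch below assumes Python's standard xml.dom.minidom module, whose Document objects expose version, encoding, and standalone; the <greeting> body is a placeholder added so that the input is a complete document.

```python
from xml.dom import minidom

# Bytes input, so the parser honours the encoding named in the declaration.
source = (
    b'<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>'
    b"<greeting>Hello, World!</greeting>"
)

doc = minidom.parseString(source)

# Values taken from the XML declaration at the top of the document.
print("version:   ", doc.version)
print("encoding:  ", doc.encoding)
print("standalone:", doc.standalone)
```

In this module standalone is reported as a boolean, and as None when the declaration does not mention it.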


Rules governing the XML Declaration

An XML declaration should follow these rules:

i. If the XML declaration is present, it must be placed on the first line
of the XML document.

ii. If the XML declaration is included, it must contain the version
number.

iii. The parameter names are case sensitive, and the declaration must
begin with "<?xml", where "xml" is written in lower case.

iv. The parameter names are always written in lower case.

v. The order of the parameters is important. The correct order is
version, encoding, and standalone.

vi. Either single or double quotes may be used.

vii. The XML declaration has no closing tag such as </?xml>.

viii. An encoding specified through the HTTP protocol can override the
encoding given in the XML declaration.

XML Declaration Examples

1. XML declaration with no parameters (not well-formed, because the
version is required):

<?xml ?>

2. XML declaration with version definition:

<?xml version="1.0"?>

3. XML declaration with all parameters defined:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>

4. XML declaration with all parameters defined in single quotes:

<?xml version='1.0' encoding='iso-8859-1' standalone='no' ?>
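
Several of these rules can be observed by feeding declarations to a parser. This is a sketch assuming Python's standard xml.etree.ElementTree module; the helper name parses, the dictionary keys, and the <a/> placeholder element are hypothetical.

```python
import xml.etree.ElementTree as ET

def parses(text: str) -> bool:
    """Return True if the parser accepts the document text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

results = {
    # Accepted: all parameters, in order, with double or single quotes.
    "double quotes": parses('<?xml version="1.0" encoding="UTF-8" standalone="no" ?><a/>'),
    "single quotes": parses("<?xml version='1.0' encoding='iso-8859-1' standalone='no' ?><a/>"),
    # Rejected: the declaration is not the first thing in the document.
    "not first": parses('\n<?xml version="1.0"?><a/>'),
    # Rejected: the version parameter is missing.
    "no version": parses("<?xml ?><a/>"),
    # Rejected: encoding appears before version.
    "wrong order": parses('<?xml encoding="UTF-8" version="1.0"?><a/>'),
    # Rejected: "xml" must be written in lower case.
    "upper-case xml": parses('<?XML version="1.0"?><a/>'),
}

for rule, accepted in results.items():
    print(f"{rule}: {'accepted' if accepted else 'rejected'}")
```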


Root Element

Every XML document must contain exactly one root element. All other
elements are placed inside that one root element.

For instance:

<root>
  <child>Data</child>
  <child>More Data</child>
</root>

XML declaration parameters

As stated above, an XML declaration appears as the first line of an XML
document. Its use is optional. In a declaration, version indicates the XML
version, encoding indicates how the individual bytes correspond to a
character set, and standalone indicates whether an external document type
definition must be consulted in order to process the document correctly.

An XML document can optionally begin with a declaration such as:

<?xml version = "1.0" encoding = "UTF-8"?>

Note: In the above, version is the XML version and encoding describes the
character encoding used inside the document.

XML Tags
XML tags are among the most important parts of XML. Tags form the
foundation of XML. They define the scope of an element in XML. They can
also be used to insert comments, declare settings required for parsing
the environment, and insert special instructions.

XML tags can be broadly classified as follows:

1. Start tag: The beginning of every non-empty XML element is marked by a
start tag. For example:

<address>

2. End tag: Every element that has a start tag must end with an end tag.
For example:

</address>

Note: The end tag includes a solidus ("/") before the name of the
element.

3. Empty tag: The text that appears between the start tag and the end tag
is called content. An element that has no content is called empty. An
empty element can be written in the following ways:

• A start tag immediately followed by an end tag: <hr></hr>
• A complete empty-element tag: <hr />

Empty-element tags may be used for any element that has no content.
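
A parser treats the two ways of writing an empty element identically. The following sketch assumes Python's standard xml.etree.ElementTree module; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<hr></hr>")   # start tag followed by end tag
short_form = ET.fromstring("<hr />")     # empty-element tag

# Neither element has text or children, and both serialize the same way.
same = ET.tostring(long_form) == ET.tostring(short_form)

print(long_form.text, short_form.text, same)
```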

Elements/Tags

Elements are delimited by < and >. As noted, element names are case
sensitive and cannot contain spaces (the full set of permitted characters
is given in the specification). Attributes can be included as
space-separated name/value pairs, with values enclosed in quotes (either
single or double).

<sometag attrname="attrvalue">

The structure of XML

In addition to text, elements may also contain other elements.

• A start tag begins with "<" and ends with ">".

• An end tag begins with "</" and ends with ">".

• Empty elements (elements with no content, where the start tag is
immediately followed by an end tag) can alternatively be represented by a
single tag. These empty-element tags begin with "<" and end with "/>". In
effect, empty-element tags are shorthand. For instance, <br></br> is
equivalent to <br/>. This means that, when converting HTML to XHTML, all
<br> tags must be written in one of the permitted forms for empty
elements.

• Every start tag must have a matching end tag, and tags must be properly
nested.

For instance, the following is not well-formed, since it is not properly
nested:

<x><a>mmm<b>mmm</a>mmm</b></x>

The following is well-formed:

<x><a>mmm<b>mmm</b></a><b>mmm</b></x>
To Do:

Most current HTML browsers can cope with improperly nested documents. Is
this part of the HTML specification? Try to find out more about the
similarities and differences between XML and HTML tags.

End tags cannot be left out. In the example below, the markup is not
legal because there are no end paragraph (</p>) tags. While this is
acceptable in HTML (and, in some cases, SGML), an XML parser will reject
it.

<!-- NOT legal XML markup -->
<p>Yada yada yada…
<p>Yada yada yada…
<p>…

If an element contains no content at all it is called an empty element;
the HTML break (<br>) and image (<img>) elements are two examples. In
empty elements in XML documents, you can put the closing slash in the
start tag. The two break elements and the two image elements below mean
the same thing to an XML parser:

<!-- Two equivalent break elements -->
<br></br>
<br />
<!-- Two equivalent image elements -->
<img src="../img/c.gif"></img>
<img src="../img/c.gif" />

XML Elements
An XML document is structured by a number of XML elements, also called
XML nodes or XML tags. The names of XML elements are enclosed in angle
brackets < > as shown below:

<element>

Syntax Rules for Elements and Tags

Element syntax: Each XML element must be closed, either with separate
start and end tags as shown below:

<element>....</element>

or, in simple cases, like this:

<element/>

Element nesting: An XML element may contain multiple XML elements as its
children, but the child elements must not overlap; that is, an end tag
must always have the same name as the most recent unmatched start tag.

The following example shows incorrectly nested tags:

<?xml version = "1.0"?>
<contact-info>
<company>GreatLearning
</contact-info>
</company>

The following example shows correctly nested tags:

<?xml version = "1.0"?>
<contact-info>
  <company>GreatLearning</company>
</contact-info>
Root Element

An XML document can have only one root element. For instance, the
following is not a correct XML document, because both the a and b
elements occur at the top level without a root element:

<a>...</a>
<b>...</b>

The correct syntax is as follows:

<root>
  <a>...</a>
  <b>...</b>
</root>

Case Sensitivity: The names of XML elements are case sensitive. That
means the names in the start tag and the end tag must be in exactly the
same case.

For instance, <contact-info> is not the same as <Contact-Info>.

This is correct:

<from>Deepak</from>

This is incorrect. The first letter of the start tag is in lower case,
while the first letter of the end tag is in upper case, so this is not
well-formed XML:

<from>Deepak</From>
Root element is mandatory in XML: An XML document must contain a root
element. A root element can have child elements and sub-child elements.

For instance, in the following XML document, <message> is the root
element and <to>, <from>, <subject> and <text> are child elements.

<?xml version="1.0" encoding="UTF-8"?>
<message>
  <to>Anuj</to>
  <from>Deepak</from>
  <subject>Message from teacher to Student</subject>
  <text>You have an exam tomorrow at 8:00 AM</text>
</message>

The following XML document is not well-formed, because it has no root
element:

<?xml version="1.0" encoding="UTF-8"?>
<to>Anuj</to>
<from>Deepak</from>
<subject>Message from teacher to Student</subject>
<text>You have an exam tomorrow at 8:00 AM</text>

XML elements must have an end tag:

<text classification = "message">hello</text> -> correct
<text classification = "message">hello -> wrong

It is an error to omit the end tag when writing XML.

Invalid syntax:

<body>See Spot run.
<body>See Spot get the ball.

Valid syntax:

<body>See Spot run.</body>
<body>See Spot get the ball.</body>
Element Type Declarations

The element structure of an XML document can, for validation purposes, be
constrained using element type and attribute-list declarations. An
element type declaration constrains the element's content. Element type
declarations often constrain which element types can appear as children
of the element. At user option, an XML processor MAY issue a warning when
a declaration mentions an element type for which no declaration is
provided, but this is not an error.

Examples of element type declarations:

<!ELEMENT br EMPTY>
<!ELEMENT p (#PCDATA|emph)* >
<!ELEMENT %name.para; %content.para; >
<!ELEMENT container ANY>

An element type must not be declared more than once.

Element Content

An element type has element content when elements of that type MUST
contain only child elements (no character data), optionally separated by
white space (characters matching the non-terminal S). In this case, the
constraint includes a content model, a simple grammar governing the
allowed types of the child elements and the order in which they are
allowed to appear.

The grammar is built on content particles (cp), which consist of names,
choice lists of content particles, or sequence lists of content
particles:

Element-content Models

• children ::= (choice | seq) ('?' | '*' | '+')?
• cp ::= (Name | choice | seq) ('?' | '*' | '+')?
• choice ::= '(' S? cp ( S? '|' S? cp )+ S? ')'
  (validity constraint: Proper Group/PE Nesting)
• seq ::= '(' S? cp ( S? ',' S? cp )* S? ')'
  (validity constraint: Proper Group/PE Nesting)

Here each Name is the type of an element which may appear as a child. Any
content particle in a choice list may appear in the element content at
the location where the choice list appears in the grammar; content
particles occurring in a sequence list MUST each appear in the element
content in the order given in the list. The optional character following
a name or list governs whether the element or the content particles in
the list may occur one or more (+), zero or more (*), or zero or one
times (?). The absence of such an operator means that the element or
content particle MUST appear exactly once. This syntax and meaning are
identical to those used in the productions of the XML specification.

The content of an element matches a content model if and only if it is
possible to trace out a path through the content model, obeying the
sequence, choice, and repetition operators and matching each element in
the content against an element type in the content model. For
compatibility, it is an error if the content model allows an element to
match more than one occurrence of an element type in the content model.
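
Because the content-model operators (sequence, choice, ?, * and +) mirror regular-expression operators, matching can be sketched by translating a model into a regular expression over the child element names. The Python below is only an illustration of that idea, not a validating parser: the helper names model_to_regex and matches and the sample elements are hypothetical, and parameter entities, #PCDATA, and the determinism requirement are not handled.

```python
import re
import xml.etree.ElementTree as ET

# Simplified pattern for an XML name (ASCII letters, digits, and . : - _).
NAME = re.compile(r"[A-Za-z_:][\w.:\-]*")

def model_to_regex(model: str) -> re.Pattern:
    """Translate an element-content model into a regex over child names."""
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group that consumes "name;".
    body = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + ";)", compact)
    # A sequence is plain concatenation in a regex, so drop the commas.
    # '|', '?', '*', '+' and parentheses already mean the same thing.
    return re.compile(body.replace(",", ""))

def matches(element: ET.Element, model: str) -> bool:
    """Check whether the element's children fit the content model."""
    children = "".join(child.tag + ";" for child in element)
    return model_to_regex(model).fullmatch(children) is not None

book = ET.fromstring("<book><title/><author/><author/><price/></book>")
note = ET.fromstring("<note><to/><cc/><to/></note>")

print(matches(book, "(title, author+, year?, price)"))  # two authors allowed
print(matches(book, "(title, author, price)"))          # exactly one author
print(matches(note, "(to | cc)+"))                      # any mix of to and cc
```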

XML Attributes
An attribute specifies a single property of an element, using a
name/value pair. An XML element can have one or more attributes. For
instance:

<a href = "http://www.greatlearning.com/">Greatlearning!</a>

Here href is the attribute name and http://www.greatlearning.com/ is the
attribute value.

Attribute names are written without quotes, whereas attribute values must
always appear in quotes. The following example shows invalid XML syntax:

<a b = x>....</a>

In the above syntax, the attribute value is not enclosed in quotes.

Now let us look at the syntax of attributes. A start tag in XML can have
attributes, which are name/value pairs.

• Attribute names are case sensitive and must not be in quotes.
• Attribute values must be in single or double quotes.

<text grouping = "message">You have a test tomorrow at 8:00 AM</text>

Here grouping is the attribute name and message is the attribute value.

Let us look at a few more examples of valid and invalid attributes.

A tag can contain one or more name/value pairs, but two attribute names
in the same tag cannot be the same.

1. <text class = message>hello</text> -> wrong
2. <text "class" = message>hello</text> -> wrong
3. <text class = "message">hello</text> -> correct
4. <text class = "message" reason = "greet">hello</text> -> correct
5. <text class = "message" class = "greet">hello</text> -> wrong
XML Attributes Syntax Rules

• Unlike in HTML, attribute names in XML are case sensitive, i.e.
HREF and href are treated as two different XML attributes.
• The same attribute cannot have two values in one tag. The following
example shows invalid syntax because the attribute b is specified twice:

<a b = "x" c = "y" b = "z">....</a>
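
Both rules can be observed with a parser. This sketch assumes Python's standard xml.etree.ElementTree module; the attribute values are placeholders.

```python
import xml.etree.ElementTree as ET

# href and HREF differ in case, so they are two separate attributes.
link = ET.fromstring('<a href="x" HREF="y">text</a>')
lower = link.get("href")
upper = link.get("HREF")

# Repeating the same attribute name in one tag is rejected by the parser.
try:
    ET.fromstring('<a b="x" c="y" b="z">....</a>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True

print(lower, upper, duplicate_rejected)
```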


Attribute names are written without quotes, while attribute values must
always appear in quotes. The following example shows incorrect XML
syntax:

<a b = x>....</a>

In the above syntax, the attribute value is not enclosed in quotes.

Rule: Attribute values must always be quoted

It is an error to omit the quotation marks around attribute values. XML
elements can have attributes in name/value pairs, and the attribute value
must always be quoted.

Invalid syntax:
<?xml version="1.0" encoding="ISO-8859-1"?>
<note date=02/02/02>
<to>Deepak</to>
<from>Spoorthi</from>
</note>
This document is not well-formed, because the value of the date attribute
in the note element is not quoted.

Valid syntax:
<?xml version="1.0" encoding="ISO-8859-1"?>
<note date="02/02/02">
<to>Deepak</to>
<from>Spoorthi</from>
</note>

Attribute-List Declarations

Attributes are used to associate name-value pairs with elements. Attribute
specifications must not appear outside of start tags and empty-element tags;
consequently, the productions used to recognize them are the ones for start
tags, end tags, and empty-element tags.

Attribute-list declarations may be used:

• To define the set of attributes pertaining to a given element type.

• To establish type constraints for these attributes.

• To provide default values for attributes.

Attribute-list declarations specify the name, data type, and default value
(if any) of each attribute associated with a given element type.

Attribute-List Declaration

AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
AttDef ::= S Name S AttType S DefaultDecl

The Name in the AttlistDecl rule is the type of an element. At user option, an
XML processor may issue a warning if attributes are declared for an element
type that is not itself declared, but this is not an error. The Name in the
AttDef rule is the name of the attribute.

When more than one AttlistDecl is provided for a given element type, the
contents of all those provided are merged. When more than one definition is
provided for the same attribute of a given element type, the first
declaration is binding and later declarations are ignored. For
interoperability, writers of DTDs may choose to provide at most one
attribute-list declaration for a given element type, at most one attribute
definition for a given attribute name in an attribute-list declaration, and
at least one attribute definition in each attribute-list declaration. Also
for interoperability, an XML processor may at user option issue a warning
when more than one attribute-list declaration is provided for a given
element type, or more than one attribute definition is provided for a given
attribute, but this is not an error.

Types of Attributes

XML attribute types are of three kinds: a string type, a set of tokenized
types, and enumerated types.

1. String type: The string type may take any literal string as a value.
2. Tokenized types: These types are more constrained. The validity
constraints noted in the grammar are applied after the attribute value
has been normalized.
3. Enumerated types: The attribute must take one of a list of values given
in its declaration.

AttType ::= StringType | TokenizedType | EnumeratedType

StringType ::= 'CDATA'

TokenizedType ::= 'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' | 'ENTITIES' | 'NMTOKEN' | 'NMTOKENS'

An attribute declaration provides information on whether the attribute's
presence is required and, if not, how an XML processor is to react when a
declared attribute is absent from a document.

Attribute Defaults

DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue)

In an attribute declaration, #REQUIRED means that the attribute must
always be provided; #IMPLIED means that no default value is provided. If the
declaration is neither #REQUIRED nor #IMPLIED, then the AttValue value
contains the declared default value; the #FIXED keyword states that the
attribute must always have the default value. When an XML processor
encounters an element without a specification for an attribute for which it
has read a default value declaration, it must report the attribute with the
declared default value to the application.

Attribute-Value Normalization

Before the value of an attribute is passed to the application or checked for
validity, the XML processor must normalize the attribute value by applying
the algorithm below, or by using some other method such that the value
passed to the application is the same as that produced by the algorithm.

1. All line breaks must have been normalized on input to #xA.
2. Begin with a normalized value consisting of the empty string.
3. For each character, entity reference, or character reference in the
unnormalized attribute value, beginning with the first and continuing to
the last, do the following:
- For a character reference, append the referenced character to the
normalized value.
- For an entity reference, recursively apply step 3 of this algorithm to
the replacement text of the entity.
- For a white-space character (#x20, #xD, #xA, #x9), append a space
character (#x20) to the normalized value.
- For any other character, append the character to the normalized value.

If the attribute type is not CDATA, then the XML processor must further
process the normalized attribute value by discarding any leading and
trailing space (#x20) characters, and by replacing sequences of space
(#x20) characters with a single space (#x20) character.

Note: If the unnormalized attribute value contains a character reference to
a white-space character other than space (#x20), the normalized value
contains the referenced character itself (#xD, #xA or #x9). This contrasts
with the case where the unnormalized value contains a white-space character
(not a reference), which is replaced with a space character (#x20) in the
normalized value. It also contrasts with the case where the unnormalized
value contains an entity reference whose replacement text contains a
white-space character; being recursively processed, the white-space
character is replaced with a space character (#x20) in the normalized value.

All attributes for which no declaration has been read should be treated by a
non-validating XML processor as if declared CDATA. It is an error if an
attribute value contains a reference to an entity for which no declaration
has been read. The following are examples of attribute normalization. Given
the following declarations:

<!ENTITY d "&#xD;">
<!ENTITY a "&#xA;">
<!ENTITY da "&#xD;&#xA;">

the attribute specifications in the left column below would be normalized
to the character sequences of the middle column if the attribute a is
declared NMTOKENS, and to those of the right column if a is declared CDATA.

Attribute specification                 a is NMTOKENS                  a is CDATA

a="[two line breaks]xyz"                x y z                          #x20 #x20 x y z

a="&d;&d;A&a;&#x20;&a;B&da;"            A #x20 B                       #x20 #x20 A #x20 #x20 #x20 B #x20 #x20

a="&#xd;&#xd;A&#xa;&#xa;B&#xd;&#xa;"    #xD #xD A #xA #xA B #xD #xA    #xD #xD A #xA #xA B #xD #xA

Note that the last example is invalid (but well-formed) if a is declared to
be of type NMTOKENS.
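
The normalization steps and the three example rows can be traced with a short JavaScript sketch. This is only an illustration of the algorithm, not how a real XML processor is built: the function names and the ENTITIES map are assumptions made for this example, and the map holds the replacement text of the entities d, a and da as literal characters (character references inside an entity declaration are resolved when the entity is declared).

```javascript
// Replacement text of the entities d, a and da from the declarations above.
// Hypothetical lookup table for this sketch only.
const ENTITIES = { d: "\r", a: "\n", da: "\r\n" };

// Step 3 of the algorithm, applied recursively to entity replacement text.
function expand(text, entities) {
  const token = /&#x([0-9a-fA-F]+);|&#([0-9]+);|&([^;&\s]+);|([\s\S])/gu;
  let out = "";
  for (const m of text.matchAll(token)) {
    if (m[1] !== undefined) {
      out += String.fromCodePoint(parseInt(m[1], 16)); // hexadecimal character reference
    } else if (m[2] !== undefined) {
      out += String.fromCodePoint(parseInt(m[2], 10)); // decimal character reference
    } else if (m[3] !== undefined) {
      if (!Object.prototype.hasOwnProperty.call(entities, m[3])) {
        throw new Error("undeclared entity: " + m[3]);
      }
      out += expand(entities[m[3]], entities); // entity reference: recurse
    } else if (" \r\n\t".includes(m[4])) {
      out += " "; // literal white space becomes a space
    } else {
      out += m[4]; // any other character is copied
    }
  }
  return out;
}

function normalizeAttribute(raw, isCdata, entities = ENTITIES) {
  const text = raw.replace(/\r\n?/g, "\n"); // step 1: line breaks become #xA
  let value = expand(text, entities);       // steps 2 and 3
  if (!isCdata) {
    // Non-CDATA types: trim spaces and collapse runs of spaces (#x20 only).
    value = value.replace(/^ +| +$/g, "").replace(/ {2,}/g, " ");
  }
  return value;
}

console.log(JSON.stringify(normalizeAttribute("&d;&d;A&a;&#x20;&a;B&da;", false)));
```

Calling normalizeAttribute with true as the second argument gives the CDATA column of the table; with false it gives the NMTOKENS column.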

Special Attributes

An element tag may indicate additional properties of its content. For
instance, xml:space is used to indicate whether white space is significant.
In general, it is assumed that all white space outside the tag structure is
significant.

Another special attribute is xml:lang, which can be used to indicate the
language of the content. For instance:

<p xml:lang="en">I do not speak Hindi</p>

<p xml:lang="hi">Main Hindi nahin bolata</p>

Attributes must have quoted values

There are two rules for attributes in XML documents:

• Attributes MUST CONTAIN values.

• Those values MUST BE enclosed within quotation marks.

Compare the two examples below. The markup at the top is legal in HTML,
but not in XML. To do the equivalent in XML, you have to give the attribute
a value, and you have to enclose it in quotes.

<!-- NOT legal XML markup -->
<ol compact>
<!-- legal XML markup -->
<ol compact="yes">

You can use either single or double quotes, as long as you are consistent.
If the attribute value contains a single or double quote, you can use the
other kind of quote to surround the value (as in name="Deepak's vehicle"),
or use the entities &quot; for a double quote and &apos; for a single quote.
An entity is a symbol, such as &quot;, that the XML parser replaces with
other text, such as ".
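
The quoting rule can be expressed as a small helper. The sketch below is a hypothetical function (quoteAttribute is not part of any library) that picks whichever quote character the value does not contain and falls back to &quot; when the value contains both. It also escapes & and <, which may not appear literally in an attribute value.

```javascript
// Hypothetical helper: returns the value wrapped in suitable quotes.
function quoteAttribute(value) {
  const escaped = String(value).replace(/&/g, "&amp;").replace(/</g, "&lt;");
  if (!escaped.includes('"')) return '"' + escaped + '"';   // no double quote inside
  if (!escaped.includes("'")) return "'" + escaped + "'";   // use the other kind of quote
  return '"' + escaped.replace(/"/g, "&quot;") + '"';       // both present: use the entity
}

console.log("name=" + quoteAttribute("Deepak's vehicle"));
console.log("title=" + quoteAttribute('say "hi"'));
```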

We have not covered DTDs in detail, but there is one more basic topic to
cover here: defining attributes. You can define attributes for the elements
that will appear in your XML document. Using a DTD, you can also:

• Define which attributes are required.

• Define default values for attributes.

• List all of the valid values for a given attribute.

Suppose that you want to change the DTD to make state an attribute of the
<city> element. Here is how to do that:

<!ELEMENT city (#PCDATA)>

<!ATTLIST city state CDATA #REQUIRED>

This defines the <city> element as before, but the revised example also
uses an ATTLIST declaration to list the attributes of the element. The name
city inside the attribute list tells the parser that these attributes are
defined for the <city> element. The name state is the name of the
attribute, and the keywords CDATA and #REQUIRED tell the parser that the
state attribute contains text and is required (if it is optional, CDATA
#IMPLIED will work).

To define multiple attributes for an element, write the ATTLIST like this:

<!ELEMENT city (#PCDATA)>

<!ATTLIST city state CDATA #REQUIRED
postal-code CDATA #REQUIRED>

The above example defines both state and postal-code as attributes of the
<city> element.

Finally, DTDs allow you to define default values for attributes and to
enumerate all of the valid values for an attribute. An enumerated attribute
lists its values in place of the CDATA keyword:

<!ELEMENT city (#PCDATA)>

<!ATTLIST city state (AZ|CA|NV|OR|UT|WA) "CA">

This example indicates that only addresses from the states of Arizona (AZ),
California (CA), Nevada (NV), Oregon (OR), Utah (UT), and Washington (WA)
are supported, and that the default state is California. In this way you can
do a limited form of data validation. While this is a useful function, it is
only a small subset of what you can do with XML schemas.
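
What a validating parser does with an enumerated attribute and its default can be mimicked in a few lines. The sketch below is only an illustration: STATE_DECL and resolveEnumerated are hypothetical names, and the object simply restates the list of states and the default from the example above.

```javascript
// Hypothetical description of the state attribute of <city>.
const STATE_DECL = { allowed: ["AZ", "CA", "NV", "OR", "UT", "WA"], defaultValue: "CA" };

// Returns the effective value: the default when the attribute is absent,
// the given value when it is allowed, and an error otherwise.
function resolveEnumerated(value, decl = STATE_DECL) {
  if (value === undefined) return decl.defaultValue;
  if (!decl.allowed.includes(value)) {
    throw new Error('"' + value + '" is not one of ' + decl.allowed.join("|"));
  }
  return value;
}

console.log(resolveEnumerated(undefined)); // attribute omitted, default applies
console.log(resolveEnumerated("NV"));
```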

XML Comments
Comments may appear anywhere in a document outside other markup; in
addition, they may appear within the document type declaration at places
allowed by the grammar. They are not part of the document's character
data; an XML processor may, but need not, make it possible for an application
to retrieve the text of comments. For compatibility, the string "--"
(double-hyphen) must not occur within comments. Parameter entity
references must not be recognized within comments.

Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

This is how a comment looks in an XML document:

<!-- This is just a comment -->

An example of a comment:

<!-- declarations for <head> & <body> -->

Note that the grammar does not allow a comment ending in --->. The
following example is not well-formed:

<!-- B+, B, or B--->

Comments can appear anywhere in the document; they can even appear
before or after the root element. A comment begins with <!-- and ends with
-->. A comment cannot contain a double hyphen (--) except at the end; with
that exception, a comment can contain anything. Most importantly, any
markup inside a comment is ignored; if you want to remove a large section
of an XML document, simply wrap that section in a comment. (To restore the
commented-out section, simply remove the comment tags.)

Here is some markup that contains a comment:

<!-- Here's a PI for Cocoon: -->

<?cocoon-process type="sql"?>
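
The two restrictions on comment text (no double hyphen inside, and no hyphen immediately before the closing delimiter) are easy to check before writing a comment. A minimal sketch, assuming a hypothetical helper named makeComment:

```javascript
// Hypothetical helper: wraps text in comment delimiters, or throws if the
// result would not match the Comment production.
function makeComment(text) {
  if (text.includes("--")) {
    throw new Error('comment text must not contain "--"');
  }
  if (text.endsWith("-")) {
    throw new Error('comment text must not end with "-"');
  }
  return "<!--" + text + "-->";
}

console.log(makeComment(" This is just a comment "));
```

Passing the text " B+, B, or B-" raises an error, which corresponds to the not well-formed example above.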

XML Character Entities

Entities

An entity is a name that stands for replacement text. For example, if a DTD
declares an entity named dw whose replacement text is developerWorks, then
anywhere the XML processor finds the string &dw;, it replaces the entity
with the string developerWorks. The XML specification also defines five
entities that you can use in place of various special characters.

An entity reference must not contain the name of an unparsed entity.
Unparsed entities may be referred to only in attribute values declared to be
of type ENTITY or ENTITIES.

The entities are:

• &lt; for the less-than sign

• &gt; for the greater-than sign

• &quot; for a double quote

• &apos; for a single quote (or apostrophe)

• &amp; for an ampersand.
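
The five predefined entities are what an application uses to escape and unescape text. The following sketch is illustrative only; escapeXml and unescapeXml are hypothetical helper names, and the unescape function deliberately handles nothing but the five predefined entities.

```javascript
// The five predefined entities and the characters they stand for.
const PREDEFINED = { lt: "<", gt: ">", quot: '"', apos: "'", amp: "&" };
// Reverse lookup: character -> entity name.
const BY_CHAR = Object.fromEntries(
  Object.entries(PREDEFINED).map(([name, ch]) => [ch, name])
);

// Replace each special character with its entity reference.
function escapeXml(text) {
  return text.replace(/[<>"'&]/g, (ch) => "&" + BY_CHAR[ch] + ";");
}

// Replace each predefined entity reference with its character (single pass).
function unescapeXml(text) {
  return text.replace(/&(lt|gt|quot|apos|amp);/g, (_, name) => PREDEFINED[name]);
}

console.log(escapeXml("Deepak's <note> & \"more\""));
```

Because unescapeXml makes a single pass, the text &amp;lt; comes back as &lt; rather than being unescaped twice.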

Character and Entity References

A character reference refers to a specific character in the ISO/IEC 10646
character set, for example one not directly accessible from the available
input devices.

Character Reference

CharRef ::= '&#' [0-9]+ ';'
| '&#x' [0-9a-fA-F]+ ';'

Well-formedness constraint: Legal Character

Characters referred to using character references MUST match the
production for Char.

If the character reference begins with "&#x", the digits and letters up to
the terminating ; provide a hexadecimal representation of the character's
code point in ISO/IEC 10646. If it begins just with "&#", the digits up to
the terminating ; provide a decimal representation of the character's
code point.
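
Decoding a character reference follows directly from the CharRef production and the Legal Character constraint. The sketch below is illustrative; decodeCharRef and isXmlChar are hypothetical names, and isXmlChar restates the Char production of XML 1.0 (#x9, #xA, #xD, #x20 to #xD7FF, #xE000 to #xFFFD, #x10000 to #x10FFFF).

```javascript
// True when the code point matches the Char production of XML 1.0.
function isXmlChar(cp) {
  return cp === 0x9 || cp === 0xA || cp === 0xD ||
    (cp >= 0x20 && cp <= 0xD7FF) ||
    (cp >= 0xE000 && cp <= 0xFFFD) ||
    (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Decodes one reference such as "&#60;" or "&#x3C;".
function decodeCharRef(ref) {
  const m = /^&#(?:x([0-9a-fA-F]+)|([0-9]+));$/.exec(ref);
  if (!m) throw new Error("not a character reference: " + ref);
  const cp = m[1] !== undefined ? parseInt(m[1], 16) : parseInt(m[2], 10);
  if (!isXmlChar(cp)) throw new Error("not a legal XML character: " + ref);
  return String.fromCodePoint(cp);
}

console.log(decodeCharRef("&#60;"), decodeCharRef("&#x3C;"));
```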

Entity reference: An entity reference refers to the content of a named
entity. References to parsed general entities use ampersand (&) and
semicolon (;) as delimiters. Parameter-entity references use percent-sign
(%) and semicolon (;) as delimiters.

Entity Reference

Reference ::= EntityRef | CharRef

EntityRef ::= '&' Name ';'
[WFC: Entity Declared] [VC: Entity Declared] [WFC: Parsed Entity] [WFC: No Recursion]

PEReference ::= '%' Name ';'
[VC: Entity Declared] [WFC: No Recursion] [WFC: In DTD]

Case 1: Character and entity references example

Type <key>less-than</key> (&#x3C;) to save options.

This document was prepared on &docdate; and is classified &security-level;.

Case 2: Parameter-entity reference example

<!-- declare the parameter entity "ISOLat2"... -->
<!ENTITY % ISOLat2
SYSTEM "http://www.xml.com/iso/isolat2-xml.entities" >
<!-- ... now reference it. -->
%ISOLat2;

CDATA Sections
CDATA Sections: CDATA sections may occur anywhere character data may
occur; they are used to escape blocks of text containing characters which
would otherwise be recognized as markup. CDATA sections begin with the
string "<![CDATA[" and end with the string "]]>".

CDATA Sections:

1. CDSect ::= CDStart CData CDEnd
2. CDStart ::= '<![CDATA['
3. CData ::= (Char* - (Char* ']]>' Char*))
4. CDEnd ::= ']]>'

Within a CDATA section, only the CDEnd string is recognized as markup, so
that left angle brackets and ampersands may occur in their literal form; they
need not (and cannot) be escaped using "&lt;" and "&amp;". CDATA
sections cannot nest.

In the following example of a CDATA section, "<greeting>" and
"</greeting>" are recognized as character data, not markup:

<![CDATA[<greeting>Hello, world!</greeting>]]>

The CDATASection object

The CDATASection object represents a CDATA section in a document. A
CDATA section contains text that will not be parsed by a parser. Tags within
a CDATA section will not be treated as markup, and entities will not be
expanded. The primary purpose is to include material, such as XML
fragments, without needing to escape all the delimiters.

The only delimiter that is recognized in a CDATA section is "]]>", which
indicates the end of the CDATA section. CDATA sections cannot be
nested.
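
Since "]]>" is the one string that cannot appear inside a CDATA section, text that contains it has to be split across two sections. The sketch below shows one common way to do that; toCdata is a hypothetical helper name used only for this illustration.

```javascript
// Hypothetical helper: wraps text in a CDATA section. Any "]]>" in the text
// is split so that "]]" ends one section and ">" starts the next.
function toCdata(text) {
  return "<![CDATA[" + text.split("]]>").join("]]]]><![CDATA[>") + "]]>";
}

console.log(toCdata("<greeting>Hello, world!</greeting>"));
console.log(toCdata("a]]>b"));
```

A parser that reads the second result back concatenates the two sections and recovers the original text a]]>b.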

Processing XML
Processing instructions begin with <? and end with ?>. Processing
instructions are instructions for the XML processor. The individual
instructions are not defined by the XML recommendation; rather, they are
processor-dependent, so not all processors understand all processing
instructions. A typical processing instruction that many processors
understand is xml-stylesheet, which instructs the processor to use an
external style sheet.

Processing Instructions: Processing instructions (PIs) allow documents to
contain instructions for applications.

Processing Instructions

PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'

PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))

Processing instructions (PIs) are not part of the document's character data,
but must be passed through to the application. The PI begins with a target
(PITarget) used to identify the application to which the instruction is
directed. The target names "XML", "xml", and so on are reserved for
standardization in this or future versions of the specification. The XML
Notation mechanism may be used for formal declaration of PI targets.
Parameter entity references must not be recognized within processing
instructions (PIs).
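
The PI and PITarget productions can be turned into a small builder that refuses reserved targets. This is a sketch under stated assumptions: makePI is a hypothetical helper, and the name check is an ASCII-only simplification of the Name production.

```javascript
// Hypothetical helper: builds a processing instruction string.
function makePI(target, data = "") {
  // ASCII-only simplification of the Name production.
  if (!/^[A-Za-z_:][A-Za-z0-9_:.-]*$/.test(target)) {
    throw new Error("invalid PI target: " + target);
  }
  // "xml" in any mix of upper and lower case is reserved.
  if (/^xml$/i.test(target)) {
    throw new Error('the target "' + target + '" is reserved');
  }
  if (data.includes("?>")) {
    throw new Error('PI data must not contain "?>"');
  }
  return data === "" ? "<?" + target + "?>" : "<?" + target + " " + data + "?>";
}

console.log(makePI("cocoon-process", 'type="sql"'));
```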

White Space XML

White space is essentially blank space created by carriage returns, line
feeds, tabs, and/or spaces. White space between markup does not affect how
the document is processed, so you can choose whether to include it. The XML
recommendation also specifies that line endings follow the UNIX convention:
a line ends with a single linefeed character (ASCII code 10), and an XML
processor normalizes other line endings to that form.
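
Line-end normalization amounts to replacing every carriage return plus linefeed pair, and every carriage return that stands alone, with a single linefeed. A minimal sketch, with normalizeLineEndings as a hypothetical helper name:

```javascript
// Hypothetical helper: converts Windows (CR LF) and old Mac (CR) line
// endings to the single linefeed that XML uses.
function normalizeLineEndings(text) {
  return text.replace(/\r\n?/g, "\n");
}

console.log(JSON.stringify(normalizeLineEndings("a\r\nb\rc\nd")));
```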

On the subject of white space, there is a special attribute (xml:space) that
you can use to preserve white space inside your elements (but we need not
worry about that just yet).

White space is preserved in XML

Unlike HTML, which does not preserve white space, an XML document
preserves white space.

White Space Handling

In editing XML documents, it is often convenient to use "white space"
(spaces, tabs, and blank lines) to set apart the markup for greater
readability. Such white space is typically not intended for inclusion in the
delivered version of the document. On the other hand, "significant" white
space that should be preserved in the delivered version is common, for
example in poetry and source code.

An XML processor must always pass all characters in a document that are not
markup through to the application. A validating XML processor must also
inform the application which of these characters constitute white space
appearing in element content. A special attribute named xml:space may be
attached to an element to signal an intention that in that element, white
space should be preserved by applications. In valid documents, this
attribute, like any other, must be declared if it is used. When declared, it
must be given as an enumerated type whose values are one or both of
"default" and "preserve".

For instance:

<!ATTLIST poem xml:space (default|preserve) 'preserve'>

<!ATTLIST pre xml:space (preserve) #FIXED 'preserve'>

The value "default" signals that applications' default white-space
processing modes are acceptable for this element; the value "preserve"
indicates the intent that applications preserve all the white space. This
declared intent is considered to apply to all elements within the content of
the element where it is specified, unless overridden with another instance
of the xml:space attribute. The specification does not give meaning to any
value of xml:space other than "default" and "preserve". It is an error for
other values to be specified; the XML processor may report the error or may
recover by ignoring the attribute specification or by reporting the
(erroneous) value to the application. Applications may ignore or reject
erroneous values.
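
The way the xml:space intent is inherited by child elements can be shown on a small tree. The sketch below is only an illustration: elements are plain objects of the assumed shape { name, attributes, children }, and resolveSpace is a hypothetical function, not a parser API. For erroneous values it takes the recovery option of ignoring the attribute.

```javascript
// Computes the effective white-space mode of an element and its descendants.
function resolveSpace(element, inherited = "default") {
  const declared = (element.attributes || {})["xml:space"];
  // Only "default" and "preserve" are meaningful; anything else is ignored
  // here, so the inherited mode stays in effect.
  const mode = declared === "default" || declared === "preserve" ? declared : inherited;
  return {
    name: element.name,
    mode,
    children: (element.children || []).map((child) => resolveSpace(child, mode)),
  };
}

// Hypothetical document: a poem that preserves white space, with a nested
// note that switches back to default processing.
const poem = {
  name: "poem",
  attributes: { "xml:space": "preserve" },
  children: [
    { name: "stanza", children: [{ name: "note", attributes: { "xml:space": "default" } }] },
  ],
};

const resolved = resolveSpace(poem);
console.log(resolved.children[0].name, resolved.children[0].mode);
```

Here stanza has no xml:space attribute of its own, so it inherits "preserve" from poem, while note overrides it with "default".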

Encoding XML
Encoding is the process of converting Unicode characters into their
equivalent binary representation. When the XML processor reads an XML
document, it decodes the document according to the type of encoding.
Consequently, we have to specify the type of encoding in the XML
declaration.
Types of encoding

There are mainly two types of encoding:

1. UTF-8
2. UTF-16
UTF stands for UCS Transformation Format, and UCS itself means
Universal Character Set. The number 8 or 16 refers to the number of bits in
one code unit: UTF-8 represents a character with 1 to 4 bytes, and UTF-16
with 2 or 4 bytes. For documents without encoding information, UTF-8 is
assumed by default.
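
The byte counts quoted above can be observed directly in JavaScript. The sketch below assumes an environment that provides the standard TextEncoder class (modern browsers and Node.js); byteLengths is a hypothetical helper name. It relies on the fact that JavaScript strings are sequences of 16-bit code units, so the UTF-16 size is twice the string length.

```javascript
// Returns the number of bytes a string occupies in UTF-8 and in UTF-16.
function byteLengths(text) {
  return {
    utf8: new TextEncoder().encode(text).length, // TextEncoder always produces UTF-8
    utf16: text.length * 2,                      // two bytes per 16-bit code unit
  };
}

// A, e with acute accent, the euro sign, and an emoji outside the basic plane.
for (const ch of ["A", "\u00E9", "\u20AC", "\u{1F600}"]) {
  console.log(ch, byteLengths(ch));
}
```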

Validation in XML
Validation is the process by which an XML document is checked against a set
of rules. An XML document is said to be valid if its contents match the
elements, attributes and associated document type declaration (DTD), and if
the document complies with the constraints expressed in it. The XML parser
deals with validation at two levels.

They are:

1. Well-formed XML document
2. Valid XML document

Well-formed XML Document: An XML document is said to be well-formed if it
adheres to the following rules:

• XML files without a DTD must use only the predefined character entities
amp (&), apos (single quote), gt (>), lt (<) and quot (double quote).
• It must follow the ordering of the tags, i.e., the inner tag must be
closed before the outer tag is closed.
• Each of its start tags must have an end tag, or it must be a self-closing
tag (<title>....</title> or <title/>).
• An attribute name may appear only once in a start tag, and its value
must be quoted.
• Entities other than amp (&), apos (single quote), gt (>), lt (<) and
quot (double quote) must be declared.

Example:

Following is an example of a well-formed XML document:

<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
<!DOCTYPE address
[
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<address>
<name>Deepak Kumar</name>
<company>GreatLearning</company>
<phone>91 123-4567</phone>
</address>

The above example is said to be well-formed because:

• It defines the type of document. Here, the document type is the
element type address.
• It includes a root element named address.
• Each of the child elements name, company and phone is enclosed in its
own self-explanatory tag.
• The order of the tags is maintained.

Valid XML Document:

If an XML document is well-formed and also conforms to its associated
Document Type Declaration (DTD), then it is said to be a valid XML
document.
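
The tag-ordering rule for well-formed documents can be checked with a stack: push each start tag, and on each end tag make sure it matches the most recent start tag. The sketch below is a deliberately simplified illustration, not a real well-formedness checker; checkNesting is a hypothetical function that only looks at element tags with ASCII names and skips declarations and processing instructions.

```javascript
// Returns true when every start tag is closed in the correct order.
function checkNesting(xml) {
  const stack = [];
  const tag = /<(\/?)([A-Za-z_][\w.:-]*)[^<>]*?(\/?)>/g;
  for (const m of xml.matchAll(tag)) {
    const closing = m[1] === "/";
    const name = m[2];
    const selfClosing = m[3] === "/";
    if (selfClosing) continue;          // <title/> opens and closes at once
    if (!closing) {
      stack.push(name);                 // start tag
    } else if (stack.pop() !== name) {
      return false;                     // end tag does not match the open element
    }
  }
  return stack.length === 0;            // nothing may be left open
}

console.log(checkNesting("<address><name>Deepak Kumar</name></address>"));
console.log(checkNesting("<a><b></a></b>"));
```

The first call reports a properly nested document and the second reports an overlap, because the inner tag b is still open when the outer tag a is closed.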

XML Namespaces
In XML, the names of the tags used are defined by the developer. While
mixing the XML documents from different XML applications, this naming
might result in conflicts. So, XML namespaces provide a method to avoid
this issue of element name conflicts.

Name Conflict Example:

The following XML code carries information of HTML table:

<table>
<tr>
<td>Table</td>
<td>Chair</td>
</tr>
</table>
The following XML code carries the information about a table (Shape):

<table>
<name>Rectangle</name>
<length>100</length>
<width>60</width>
</table>
If the above XML code fragments were to be added together, it would result
in a name conflict, as both contain a <table> element, but the content and
meaning of the two elements are different.
An XML application or a user will not be able to know how to handle such
differences.

Using Prefix to Solve Name Conflict

Name prefix can be used in XML to avoid name conflicts.

The following code carries the data of both HTML Table and Shape Table:

<t:table>
<t:tr>
<t:td>Table</t:td>
<t:td>Chair</t:td>
</t:tr>
</t:table>

<s:table>
<s:name>Rectangle</s:name>
<s:length>100</s:length>
<s:width>60</s:width>
</s:table>
The example given above will have no conflict as both the <table> elements
have different names. In a complete document, each prefix must also be bound
to a namespace with an xmlns attribute.

XML Parser
The XML parser is a software library or package that provides an interface
for client applications to work with XML documents. It checks that an XML
document is properly formed and may also validate it. Programs use XML with
the help of an XML parser.

Types of parsers:

1. DOM Parser
2. SAX Parser
3. JDOM Parser
4. StAX Parser
5. XPath Parser
6. DOM4J Parser
DOM Parser

The Document Object Model (DOM) parser loads the complete contents of the
document and creates its entire hierarchical tree in memory in order to
parse the document. The DOM is officially recommended by the World Wide Web
Consortium (W3C).

Make use of a DOM parser when:

• A lot of information regarding the structure of a document is required.
• Parts of an XML document need to be moved around.
• Data in an XML document is to be used more than once.
Advantages:

• The API is simple to use.
• DOM Parser supports both read and write operations.
• When random access to widely separated parts of a document is
required, DOM Parser is preferred.
Disadvantages:

• As the whole XML document needs to be loaded into memory, DOM Parser
consumes a lot of memory; hence, it is memory inefficient.
• It is slower in comparison to other parsers.
SAX Parser

Simple API for XML (SAX) does not load the complete document into
memory; instead, it parses the document using event-based triggers. No parse
tree is created by a SAX parser. SAX is a streaming interface for XML:
applications using SAX receive event notifications about the XML document
being processed, one element and attribute at a time, in sequential order,
starting from the beginning of the document and ending with the closing of
the root element.

• SAX Parser recognizes the tokens that make up a well-formed XML
document by reading the XML document from top to bottom.
• The tokens are processed in exactly the order in which they appear in
the document.
• The application program provides an "event" handler that must be
registered with the parser.
• Callback methods in the handler are invoked with the relevant
information as the tokens are identified.
Use a SAX Parser when:

• The XML document is not deeply nested.
• The XML document can be processed linearly from top to bottom.
• A massive XML document is being processed whose DOM tree would
consume too much memory. (A typical DOM implementation uses about ten
bytes of memory to represent one byte of XML.)
• Only a part of the XML document is involved in solving the problem.
• An XML document arrives over a stream, because the data is available as
soon as the parser sees it.
Advantages:

• SAX Parser is simple to use and memory efficient.
• It works well for huge documents.
• It works very fast.
Disadvantages:

• Its API is less intuitive as it is event-based.
• As the data is broken into pieces, the client never sees the complete
information at once.
• You need to write the code and store the data on your own to keep
track of data the parser has seen or to change the order of items.
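
The event-based style can be imitated with a toy tokenizer that calls handler methods as it meets start tags, end tags and text. The sketch below is not the SAX API and not a conforming parser: parseEvents and the handler method names are assumptions made for this illustration, attributes are not reported, and comments, declarations and processing instructions are simply skipped.

```javascript
// Toy event-driven reader: walks the text once and notifies the handler.
function parseEvents(xml, handler) {
  // Alternatives: declaration/PI/comment, end tag, start tag, text.
  const token = /<[?!][^>]*>|<\/([^\s<>\/]+)\s*>|<([^\s<>\/!?]+)([^<>]*?)(\/?)>|([^<]+)/g;
  for (const m of xml.matchAll(token)) {
    if (m[1] !== undefined) {
      handler.endElement(m[1]);
    } else if (m[2] !== undefined) {
      handler.startElement(m[2]);
      if (m[4] === "/") handler.endElement(m[2]); // empty-element tag
    } else if (m[5] !== undefined && m[5].trim() !== "") {
      handler.characters(m[5]);
    }
  }
}

// The application keeps its own state, because nothing is stored for it.
const events = [];
parseEvents(
  "<mall><shop><name>Everyday Items</name><item>bucket</item><price>50</price></shop></mall>",
  {
    startElement: (name) => events.push("start:" + name),
    endElement: (name) => events.push("end:" + name),
    characters: (text) => events.push("text:" + text),
  }
);
console.log(events.join("\n"));
```

The handler receives the events in document order and never sees a tree, which is why the application has to record whatever it needs as the events arrive.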
JDOM Parser

JDOM Parser is a Java developer-friendly, Java-optimised API that uses Java
collections such as Lists and Arrays. It works along with the DOM and SAX
APIs, combining the best of the two. It has a low memory footprint and is
nearly as fast as SAX.

StAX Parser

StAX (Streaming API for XML) also processes the document as a stream, like
SAX, but the application pulls events from the parser when it needs them
instead of receiving callbacks.

XPath Parser

It parses an XML document based on an expression. It is extensively used in
conjunction with XSLT.

DOM4J Parser

It is a Java library that uses the Java Collections Framework to parse XML,
XPath and XSLT. DOM4J parser also provides support for DOM, SAX and
JAXP.
Text String Parsing:

<!DOCTYPE html>
<html>
<body>
<p id="example"></p>
<script>
var text, parser, xmlDoc;

// define the text string
text = "<mall><shop>" +
"<name>Everyday Items</name>" +
"<item>bucket</item>" +
"<price>50</price>" +
"</shop></mall>";

// create an XML DOM parser
parser = new DOMParser();

// the parser creates a new XML DOM object from the text string
xmlDoc = parser.parseFromString(text,"text/xml");

// display the text of the first name element
document.getElementById("example").innerHTML =
xmlDoc.getElementsByTagName("name")[0].childNodes[0].nodeValue;
</script>
</body>
</html>

XML DTD
Document Type Definition (DTD) defines the legal attributes and elements
along with the structure of an XML document. An XML document is well-formed
if its syntax is correct, and an XML document that has been validated
against a DTD is both well-formed and valid.

Valid XML Documents

A valid XML document is not only well-formed but also conforms to the
rules of a DTD.
Example:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note SYSTEM "Note.dtd">
<note>
<to>Chanchal</to>
<from>Harshit</from>
<heading>Message</heading>
<body>Hi! How are you doing?</body>
</note>
The DOCTYPE declaration above names note as the root element and contains a
reference to the DTD file, whose content is shown and explained below.

Note.dtd:

<!-- the note element must contain the elements to, from, heading and body, in that order -->
<!ELEMENT note (to,from,heading,body)>
<!-- the to element is of type #PCDATA (parsed character data) -->
<!ELEMENT to (#PCDATA)>
<!-- the from element is of type #PCDATA -->
<!ELEMENT from (#PCDATA)>
<!-- the heading element is of type #PCDATA -->
<!ELEMENT heading (#PCDATA)>
<!-- the body element is of type #PCDATA -->
<!ELEMENT body (#PCDATA)>

Because Note.dtd is an external file, it contains only the declarations; the
root element is named by the DOCTYPE declaration in the XML document itself.

XML Schema
XML Schema, also known as XML Schema Definition (XSD), is used to
describe and validate the structure and content of XML data. It defines
attributes, elements and data types. It is similar to DTD but provides more
control over the XML structure.

Having the correct syntax makes an XML document well-formed. Being
validated against a schema means that the XML document is both well-formed
and valid.

XML Schema as an alternative to DTD:

<xs:element name="note"> <!-- defines the element "note" -->
<xs:complexType> <!-- element note is a complex type -->
<xs:sequence> <!-- complex type is a sequence of elements -->
<xs:element name="to" type="xs:string"/> <!-- element "to" is of type string (text) -->
<xs:element name="from" type="xs:string"/> <!-- element "from" is of type string -->
<xs:element name="heading" type="xs:string"/> <!-- element "heading" is of type string -->
<xs:element name="body" type="xs:string"/> <!-- element "body" is of type string -->
</xs:sequence>
</xs:complexType>
</xs:element>
XML Schema Data Types

XML schemas have two kinds of data types:

1. simpleType: It allows you to have text-based elements. A simple type
contains only text and cannot have attributes or child elements.
2. complexType: It allows an element to hold multiple child elements and
attributes. A complex type can be empty or can contain additional
sub-elements.
Why are XML Schemas more powerful than DTDs?

• XML Schemas are written in XML.
• XML Schemas are extensible.
• Data types are supported by XML Schemas.
• Namespaces are supported by XML Schemas.

XML DOM
The Document Object Model (DOM) is the foundation of XML. XML documents
contain a hierarchy of informational units known as nodes; the DOM defines
those nodes and their relationships.

A DOM document is a hierarchical collection of nodes, or pieces of
information. This hierarchy enables a developer to search the tree for
specific information. The DOM is said to be tree-based because it is based
on a hierarchy of information.

The XML DOM also includes an API that allows a developer to add, modify,
move, or remove nodes in the tree at any point when building an
application.

Example of XML DOM

The sample.html example parses an XML document ("address.xml") into an
XML DOM object and then extracts some information from it using
JavaScript.

Contents of sample.html

<!DOCTYPE html>
<html>
<body>
  <h1>Example for DOM</h1>
  <div>
    <b>Name:</b> <span id = "name"></span><br>
    <b>Company:</b> <span id = "company"></span><br>
    <b>Phone:</b> <span id = "phone"></span>
  </div>
  <script>
    var xmlhttp;
    if (window.XMLHttpRequest) {
      // IE7+, Firefox, Chrome, Opera, Safari
      xmlhttp = new XMLHttpRequest();
    } else {
      // IE6, IE5
      xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
    }

    // Synchronous GET (third argument is false): send() blocks until the file has loaded.
    xmlhttp.open("GET", "/xml/address.xml", false);
    xmlhttp.send();
    var xmlDoc = xmlhttp.responseXML;

    // Copy the text of each XML element into the matching <span>.
    document.getElementById("name").innerHTML =
      xmlDoc.getElementsByTagName("name")[0].childNodes[0].nodeValue;
    document.getElementById("company").innerHTML =
      xmlDoc.getElementsByTagName("company")[0].childNodes[0].nodeValue;
    document.getElementById("phone").innerHTML =
      xmlDoc.getElementsByTagName("phone")[0].childNodes[0].nodeValue;
  </script>
</body>
</html>
The contents of address.xml are as follows:

<?xml version = "1.0"?>
<contact-info>
  <name>Karuna</name>
  <company>Cerner</company>
  <phone>(91)8364682929</phone>
</contact-info>
Now, keep these two files, sample.html and address.xml, in the same
directory /xml on a web server and open sample.html in any browser. This
should result in the output shown below.

Output:

Example for DOM

Name: Karuna
Company: Cerner
Phone: (91)8364682929
What is AJAX?
AJAX stands for Asynchronous JavaScript and XML. AJAX is a technique for
creating better, faster, and more interactive web applications with the help of
XML, HTML, CSS, and JavaScript.
• AJAX uses XHTML for content, CSS for presentation, along with the
  Document Object Model and JavaScript for dynamic content display.
• Conventional web applications transmit information to and from the server
  using synchronous requests. This means you fill out a form, hit submit, and
  get directed to a new page with new information from the server.
• With AJAX, when you hit submit, JavaScript makes a request to the
  server, interprets the results, and updates the current screen. In the purest
  sense, the user would never know that anything was even transmitted to
  the server.
• XML is commonly used as the format for receiving server data, although
  any format, including plain text, can be used.
• AJAX is a web browser technology independent of web server software.
• A user can continue to use the application while the client program
  requests information from the server in the background.
• User interaction is intuitive and natural. Clicking is not required; mouse
  movement is a sufficient event trigger.
• Applications are data-driven as opposed to page-driven.

Rich Internet Application Technology


AJAX is the most viable Rich Internet Application (RIA) technology so far. It
has gained tremendous industry momentum, and several toolkits and frameworks
have emerged. At the same time, AJAX suffers from browser incompatibilities,
and it depends on JavaScript, which can be hard to maintain and debug.

AJAX is Based on Open Standards


AJAX is based on the following open standards −
• Browser-based presentation using HTML and Cascading Style Sheets (CSS).
• Data is stored in XML format and fetched from the server.
• Behind-the-scenes data fetches using XMLHttpRequest objects in the browser.
• JavaScript to make everything happen.
AJAX - Technologies
AJAX cannot work independently. It is used in combination with other
technologies to create interactive webpages.

JavaScript
• Loosely typed scripting language.
• A JavaScript function is called when an event occurs in a page.
• Glue for the whole AJAX operation.

DOM
• API for accessing and manipulating structured documents.
• Represents the structure of XML and HTML documents.

CSS
• Allows for a clear separation of the presentation style from the content and may be
  changed programmatically by JavaScript.

XMLHttpRequest
• JavaScript object that performs asynchronous interaction with the server.
AJAX - Examples
Here is a list of some famous web applications that make use of AJAX.

Google Maps
A user can drag an entire map by using the mouse, rather than clicking on a
button.
• https://maps.google.com/

Google Suggest
As you type, Google offers suggestions. Use the arrow keys to navigate the
results.
• https://www.google.com/webhp?complete=1&hl=en

Gmail
Gmail is a webmail service built on the idea that emails can be more intuitive,
efficient, and useful.
• https://gmail.com/

Yahoo Maps (new)
Now it's even easier and more fun to get where you're going!
• https://maps.yahoo.com/

Difference between AJAX and Conventional CGI Program


The difference shows up in how the page responds. With AJAX, there is no
discontinuity: the request runs in the background and the response appears
quickly on the current page. With a standard CGI program, you have to wait
for the response after submitting, and the whole page is refreshed.

NOTE − We have given a more complex example in AJAX Database.

AJAX - Browser Support


Not all browsers support AJAX. Here is a list of major browsers that do
support it.
• Mozilla Firefox 1.0 and above.
• Netscape version 7.1 and above.
• Apple Safari 1.2 and above.
• Microsoft Internet Explorer 5 and above.
• Konqueror.
• Opera 7.6 and above.
When you write your next application, do consider the browsers that do not
support AJAX.
NOTE − When we say that a browser does not support AJAX, it simply means
that the browser does not support creating the JavaScript XMLHttpRequest
object.

Writing Browser Specific Code


The simplest way to make your source code compatible with a browser is to
use try...catch blocks in your JavaScript.
<html>
<body>
  <script language = "javascript" type = "text/javascript">
    <!--
    // Browser Support Code
    function ajaxFunction() {
      var ajaxRequest; // The variable that makes Ajax possible!

      try {
        // Opera 8.0+, Firefox, Safari
        ajaxRequest = new XMLHttpRequest();
      } catch (e) {
        // Internet Explorer Browsers
        try {
          ajaxRequest = new ActiveXObject("Msxml2.XMLHTTP");
        } catch (e) {
          try {
            ajaxRequest = new ActiveXObject("Microsoft.XMLHTTP");
          } catch (e) {
            // Something went wrong
            alert("Your browser broke!");
            return false;
          }
        }
      }
    }
    //-->
  </script>

  <form name = 'myForm'>
    Name: <input type = 'text' name = 'username' /> <br />
    Time: <input type = 'text' name = 'time' />
  </form>
</body>
</html>

In the above JavaScript code, we try three times to make our XMLHttpRequest
object. Our first attempt −
• ajaxRequest = new XMLHttpRequest();
is for Opera 8.0+, Firefox, and Safari browsers. If it fails, we try two more
times to make the correct object for an Internet Explorer browser with −
• ajaxRequest = new ActiveXObject("Msxml2.XMLHTTP");
• ajaxRequest = new ActiveXObject("Microsoft.XMLHTTP");
If that doesn't work either, then we are dealing with a very outdated browser
that doesn't support XMLHttpRequest, which also means it doesn't support AJAX.
Most likely though, our variable ajaxRequest will now be set to
whatever XMLHttpRequest implementation the browser provides, and we can start
sending data to the server. The step-wise AJAX workflow is explained in the next
chapter.
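
The same fallback chain can also be written with feature detection instead of nested try...catch blocks. The following is a minimal sketch, not a definitive implementation: createRequest is a hypothetical helper name, and passing the global object in as a parameter is an assumption made so the function can be exercised outside a browser.

```javascript
// Returns a request object, or null when the environment offers none.
// globalObj is normally `window` (an assumption of this sketch).
function createRequest(globalObj) {
  // Standards-based browsers expose XMLHttpRequest directly.
  if (typeof globalObj.XMLHttpRequest !== "undefined") {
    return new globalObj.XMLHttpRequest();
  }
  // Old Internet Explorer versions expose it through ActiveX.
  if (typeof globalObj.ActiveXObject !== "undefined") {
    var progIds = ["Msxml2.XMLHTTP", "Microsoft.XMLHTTP"];
    for (var i = 0; i < progIds.length; i++) {
      try {
        return new globalObj.ActiveXObject(progIds[i]);
      } catch (e) {
        // This ProgID is not registered; try the next one.
      }
    }
  }
  return null;
}

// In a page: var ajaxRequest = createRequest(window);
```

Returning null rather than showing an alert leaves it to the caller to decide how to degrade when AJAX is unavailable.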

AJAX - Action
This chapter gives you a clear picture of the exact steps of an AJAX operation.

Steps of AJAX Operation

• A client event occurs.
• An XMLHttpRequest object is created.
• The XMLHttpRequest object is configured.
• The XMLHttpRequest object makes an asynchronous request to the web server.
• The web server returns the result containing an XML document.
• The XMLHttpRequest object calls the callback() function and processes the result.
• The HTML DOM is updated.
Let us take these steps one by one.

A Client Event Occurs

• A JavaScript function is called as the result of an event.
• Example − the validateUserId() JavaScript function is mapped as an event
  handler to an onkeyup event on the input form field whose id is set to "userid":
  <input type = "text" size = "20" id = "userid" name = "id" onkeyup = "validateUserId();">

The XMLHttpRequest Object is Created


var ajaxRequest; // The variable that makes Ajax possible!

function ajaxFunction() {
  try {
    // Opera 8.0+, Firefox, Safari
    ajaxRequest = new XMLHttpRequest();
  } catch (e) {
    // Internet Explorer Browsers
    try {
      ajaxRequest = new ActiveXObject("Msxml2.XMLHTTP");
    } catch (e) {
      try {
        ajaxRequest = new ActiveXObject("Microsoft.XMLHTTP");
      } catch (e) {
        // Something went wrong
        alert("Your browser broke!");
        return false;
      }
    }
  }
}

The XMLHttpRequest Object is Configured


In this step, we write a function that is triggered by the client event and
register a callback function, processRequest().

function validateUserId() {
  ajaxFunction();

  // Here processRequest() is the callback function.
  ajaxRequest.onreadystatechange = processRequest;

  // Look up the input field whose value is to be validated.
  var target = document.getElementById("userid");
  var url = "validate?id=" + encodeURIComponent(target.value);

  ajaxRequest.open("GET", url, true);
  ajaxRequest.send(null);
}

Making Asynchronous Request to the Webserver


The source code is the same as in the previous step. The open() and send()
calls at the end of the function are responsible for making the request to the
web server. This is all done using the XMLHttpRequest object ajaxRequest.

function validateUserId() {
  ajaxFunction();

  // Here processRequest() is the callback function.
  ajaxRequest.onreadystatechange = processRequest;

  // Look up the input field whose value is to be validated.
  var target = document.getElementById("userid");
  var url = "validate?id=" + encodeURIComponent(target.value);

  // These two calls make the asynchronous request.
  ajaxRequest.open("GET", url, true);
  ajaxRequest.send(null);
}

Assume you enter Zara in the userid box; then, in the above request, the URL is
set to "validate?id=Zara".
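
How the request URL is assembled can be shown in isolation. This is a small sketch: buildValidateUrl is a hypothetical helper name, and it uses encodeURIComponent, the standard function for encoding a single query-string value, so that characters such as spaces, "&", and "+" cannot change the meaning of the URL.

```javascript
// Builds the relative URL for the validation request.
function buildValidateUrl(userId) {
  return "validate?id=" + encodeURIComponent(userId);
}

var plain = buildValidateUrl("Zara");     // "validate?id=Zara"
var special = buildValidateUrl("a b&c");  // "validate?id=a%20b%26c"
var plus = buildValidateUrl("a+b");       // "validate?id=a%2Bb"
```

Note that there are no spaces around the "=" sign: a space there would become part of the parameter name or value.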

Webserver Returns the Result Containing XML Document


You can implement your server-side script in any language; however, its logic
should be as follows.
• Get a request from the client.
• Parse the input from the client.
• Do the required processing.
• Send the output to the client.
If we assume that you are going to write a servlet, then here is the piece of code.

// "accounts" is assumed to be a map of existing user ids,
// defined elsewhere in the servlet class.
public void doGet(HttpServletRequest request,
    HttpServletResponse response) throws IOException, ServletException {
  String targetId = request.getParameter("id");

  // The id is valid (available) only if it was supplied and is not yet taken.
  if ((targetId != null) && !accounts.containsKey(targetId.trim())) {
    response.setContentType("text/xml");
    response.setHeader("Cache-Control", "no-cache");
    response.getWriter().write("<valid>true</valid>");
  } else {
    response.setContentType("text/xml");
    response.setHeader("Cache-Control", "no-cache");
    response.getWriter().write("<valid>false</valid>");
  }
}
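
Since the server-side script can be written in any language, the same four steps can also be sketched in JavaScript for Node.js. This is an illustration under stated assumptions, not a port of a real application: the accounts set, its single entry, and the function names are hypothetical, and the server is not actually started.

```javascript
// Hypothetical store of user ids that are already taken.
const accounts = new Set(["admin"]);

// Steps 2 and 3: parse the input and decide whether the id is still free.
function isIdAvailable(id) {
  return id !== null && !accounts.has(id.trim());
}

// Steps 1 and 4: read the request and write the XML response.
function handleValidate(req, res) {
  // req.url is relative, so a base is needed to parse it.
  const id = new URL(req.url, "http://localhost").searchParams.get("id");
  res.setHeader("Content-Type", "text/xml");
  res.setHeader("Cache-Control", "no-cache");
  res.end("<valid>" + isIdAvailable(id) + "</valid>");
}

// To serve it with Node's built-in http module (not started here):
// require("http").createServer(handleValidate).listen(8080);
```

As in the servlet, a missing id parameter is treated as invalid, and the Cache-Control header keeps the browser from reusing an old answer.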

Callback Function processRequest() is Called


The XMLHttpRequest object was configured to call the processRequest()
function whenever the readyState of the XMLHttpRequest object changes. This
function receives the result from the server and does the required processing.
As in the following example, it sets a variable message to true or false based
on the value returned from the web server.

function processRequest() {
  // readyState 4 means the response has been received completely.
  if (ajaxRequest.readyState == 4) {
    // Status 200 means the server answered "OK".
    if (ajaxRequest.status == 200) {
      var message = ...;
      ...
    }
  }
}
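
One way the elided part of the callback might be filled in is sketched below. It assumes the server answers with a single <valid> element, as in the servlet example, and it relies on the global ajaxRequest created earlier and on the setMessageUsingDOM() function shown in the final step. readValidFlag is a hypothetical helper name.

```javascript
// Returns the text of <valid> ("true" or "false") once the response is
// complete, or null while the request is in progress or after an error.
function readValidFlag(xhr) {
  if (xhr.readyState !== 4) return null;   // not finished yet
  if (xhr.status !== 200) return null;     // HTTP error
  if (!xhr.responseXML) return null;       // response was not parseable XML
  var nodes = xhr.responseXML.getElementsByTagName("valid");
  if (nodes.length === 0 || !nodes[0].firstChild) return null;
  return nodes[0].firstChild.nodeValue;
}

// The callback itself then stays short.
function processRequest() {
  var message = readValidFlag(ajaxRequest);
  if (message !== null) setMessageUsingDOM(message);
}
```

Separating the checks from the page update makes the state and status handling easy to test without a browser.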

The HTML DOM is Updated


This is the final step, in which your HTML page is updated. It happens in the
following way −
• JavaScript gets a reference to any element in a page using the DOM API.
• The recommended way to gain a reference to an element is to call
  document.getElementById("userIdMessage"),
  // where "userIdMessage" is the ID attribute
  // of an element appearing in the HTML document
• JavaScript may now be used to modify the element's attributes; modify
  the element's style properties; or add, remove, or modify the child
  elements. Here is an example −
<script type = "text/javascript">
  <!--
  function setMessageUsingDOM(message) {
    var userMessageElement = document.getElementById("userIdMessage");
    var messageText;

    if (message == "false") {
      userMessageElement.style.color = "red";
      messageText = "Invalid User Id";
    } else {
      userMessageElement.style.color = "green";
      messageText = "Valid User Id";
    }

    var messageBody = document.createTextNode(messageText);

    // If a message node already exists, simply replace it;
    // otherwise append the new node.
    if (userMessageElement.childNodes[0]) {
      userMessageElement.replaceChild(messageBody, userMessageElement.childNodes[0]);
    } else {
      userMessageElement.appendChild(messageBody);
    }
  }
  -->
</script>

<body>
  <div id = "userIdMessage"></div>
</body>
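
A shorter variant of the same update is sketched below. It assumes a browser that supports the standard textContent property, which replaces whatever text the element currently holds and so removes the need to distinguish between replacing and appending a child node. showValidity is a hypothetical name.

```javascript
// Colors the element and sets its text according to the server's answer.
// message is the string "true" or "false" returned by the server.
function showValidity(element, message) {
  var valid = message !== "false";
  element.style.color = valid ? "green" : "red";
  element.textContent = valid ? "Valid User Id" : "Invalid User Id";
}

// In the page:
// showValidity(document.getElementById("userIdMessage"), message);
```

Taking the element as a parameter keeps the function independent of any particular id in the page.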

If you have understood the above-mentioned seven steps, then you are almost
done with AJAX. In the next chapter, we will look at the XMLHttpRequest object
in more detail.
AJAX - XMLHttpRequest
The XMLHttpRequest object is the key to AJAX. It has been available since
Internet Explorer 5, released in 1999, but did not gain wide attention until
AJAX and Web 2.0 became popular in 2005.
XMLHttpRequest (XHR) is an API that can be used by JavaScript, JScript,
VBScript, and other web browser scripting languages to transfer and manipulate
XML data to and from a web server using HTTP, establishing an independent
connection channel between a webpage's client side and server side.
The data returned from XMLHttpRequest calls will often be provided by back-end
databases. Besides XML, XMLHttpRequest can be used to fetch data in
other formats, e.g. JSON or even plain text.
You have already seen a couple of examples of how to create an
XMLHttpRequest object.
Listed below are some of the methods and properties that you have to get
familiar with.

XMLHttpRequest Methods
• abort()
  Cancels the current request.
• getAllResponseHeaders()
  Returns the complete set of HTTP headers as a string.
• getResponseHeader( headerName )
  Returns the value of the specified HTTP header.
• open( method, URL )
• open( method, URL, async )
• open( method, URL, async, userName )
• open( method, URL, async, userName, password )
  Specifies the method, URL, and other optional attributes of a request.
  The method parameter can have a value of "GET", "POST", or "HEAD".
  Other HTTP methods such as "PUT" and "DELETE" (primarily used in
  REST applications) may be possible.
  The "async" parameter specifies whether the request should be handled
  asynchronously or not. "true" means that script processing carries on
  after the send() method without waiting for a response, and "false" means
  that the script waits for a response before continuing.
• send( content )
  Sends the request.
• setRequestHeader( label, value )
  Adds a label/value pair to the HTTP header to be sent.
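
The methods above are typically called in a fixed order: open(), then setRequestHeader(), then send(). The sketch below shows that order for a form-style POST. It is illustrative only: postForm and encodeForm are hypothetical helper names, and the URL and field names in the usage comment are placeholders.

```javascript
// Encodes an object as application/x-www-form-urlencoded text.
function encodeForm(fields) {
  return Object.keys(fields).map(function (key) {
    return encodeURIComponent(key) + "=" + encodeURIComponent(fields[key]);
  }).join("&");
}

// Sends the fields as the body of an asynchronous POST request.
function postForm(xhr, url, fields) {
  xhr.open("POST", url, true);  // true = asynchronous
  // Headers may only be set after open() and before send().
  xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
  xhr.send(encodeForm(fields));
}

// In a page (placeholder URL and fields):
// postForm(new XMLHttpRequest(), "/validate", { id: "Zara" });
```

With a GET request, the data would instead travel in the URL and send() would be called with no body.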

XMLHttpRequest Properties
• onreadystatechange
  An event handler for an event that fires at every state change.
• readyState
  The readyState property defines the current state of the XMLHttpRequest
  object.
  The following table provides a list of the possible values for the
  readyState property −

  State   Description
  0       The request is not initialized.
  1       The request has been set up.
  2       The request has been sent.
  3       The request is in process.
  4       The request is completed.

  readyState = 0   After you have created the XMLHttpRequest object, but before
                   you have called the open() method.
  readyState = 1   After you have called the open() method, but before you have
                   called send().
  readyState = 2   After you have called send().
  readyState = 3   After the browser has established communication with the
                   server, but before the server has completed the response.
  readyState = 4   After the request has been completed, and the response data
                   has been completely received from the server.
• responseText
  Returns the response as a string.
• responseXML
  Returns the response as XML. This property returns an XML document
  object, which can be examined and parsed using the W3C DOM node tree
  methods and properties.
• status
  Returns the status as a number (e.g., 404 for "Not Found" and 200 for
  "OK").
• statusText
  Returns the status as a string (e.g., "Not Found" or "OK").
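
To watch these properties change during a request, the numeric states can be mapped to the constant names that the XMLHttpRequest interface defines for them (UNSENT, OPENED, HEADERS_RECEIVED, LOADING, DONE). The sketch below is a debugging aid under that assumption; describeState is a hypothetical name.

```javascript
// Standard constant names for readyState values 0 to 4.
var READY_STATE_NAMES = ["UNSENT", "OPENED", "HEADERS_RECEIVED", "LOADING", "DONE"];

// Produces a short human-readable summary of a request's progress.
function describeState(xhr) {
  var name = READY_STATE_NAMES[xhr.readyState];
  if (name === undefined) return "UNKNOWN";
  // status and statusText are only meaningful once headers have arrived.
  if (xhr.readyState >= 2) {
    return name + " (" + xhr.status + " " + xhr.statusText + ")";
  }
  return name;
}

// In a page:
// xhr.onreadystatechange = function () { console.log(describeState(xhr)); };
```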
Dojo Toolkit
Dojo Toolkit is an open-source modular JavaScript library designed to ease the rapid
development of cross-platform, JavaScript/Ajax-based applications and web sites.

The Dojo Toolkit is divided into several main packages that together constitute a full
distribution. Those main packages are:

• dojo - Sometimes referred to as the "core", this is the main part of Dojo; the most
  generally applicable packages and modules are contained here. The core covers a
  wide range of functionality such as AJAX, DOM manipulation, class-type
  programming, events, promises, data stores, drag-and-drop, and internationalization
  libraries.
• dijit - An extensive set of widgets (user interface components) and the underlying
  system to support them. It is built entirely on top of the Dojo core.
• dojox - A collection of packages and modules that provide a vast array of
  functionality built upon both the Dojo core and Dijit. Packages and
  modules contained in DojoX have varying degrees of maturity, denoted within
  the README files of each package. Some of the modules are extremely mature
  and some are highly experimental.
• util - Various tools that support the rest of the toolkit, such as tools to build, test,
  and document code.

One of the long-term objectives of the Dojo Toolkit is to keep the package ecosystem
vibrant without requiring packages to live within DojoX. Some of the packages
that are currently part of this wider community are:

• dgrid - A full-featured, lightweight data grid.
• gridx - A fast-rendering, modularized, plugin-based grid.
