Unit – 1 : Introduction of XML

1.1 Characteristic and Use of XML

XML stands for Extensible Markup Language. It is a text-based markup language derived from
Standard Generalized Markup Language (SGML). Markup is information added to a document
that identifies its parts and how they relate to each other. More specifically, a markup
language is a set of symbols that can be placed in the text of a document to demarcate and
label the parts of that document.

XML tags identify the data and are used to store and organize it; HTML tags, by contrast,
specify how the data is displayed. XML is not going to replace HTML in the near future, but it
introduces new possibilities by adopting many successful features of HTML.

XML is not a replacement for HTML. XML is designed to be self-descriptive.

XML is designed to carry data, not to display data. XML tags are not predefined; you must
define your own tags. XML is platform independent and language independent. The main
benefit of XML is that a user can take data from a program such as Microsoft SQL Server,
convert it into XML, and then share that XML with other programs and platforms. This lets two
platforms communicate even when exchanging data between them directly would be very difficult.
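
The conversion step described above can be sketched in Python with the standard library's xml.etree.ElementTree module. This is only an illustration: the rows are hard-coded stand-ins for data fetched from a database, and the tag names employees and employee are hypothetical choices, not part of any standard.

```python
import xml.etree.ElementTree as ET

def rows_to_xml(rows):
    """Convert a list of dicts (stand-ins for database rows) to an XML string."""
    root = ET.Element("employees")  # hypothetical root tag
    for row in rows:
        employee = ET.SubElement(root, "employee")  # hypothetical row tag
        for column, value in row.items():
            # Each column becomes a child element named after the column.
            ET.SubElement(employee, column).text = str(value)
    return ET.tostring(root, encoding="unicode")

rows = [{"name": "Sachin Tendulkar", "company": "Reliance Infotech"}]
xml_text = rows_to_xml(rows)
print(xml_text)
```

Any program on any platform that has an XML parser can now read xml_text back, without knowing which database it came from.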

The Difference between XML and HTML

XML and HTML were designed with different goals:

XML was designed to carry data - with focus on what data is

HTML was designed to display data - with focus on how data looks

XML tags are not predefined like HTML tags

XML documents form a tree structure that starts at "the root" and branches to "the leaves".

The main features or advantages of XML are

1. XML separates data from HTML

With XML, data can be stored in separate XML files. This way the user can focus on using
HTML/CSS for display and layout, and be sure that changes in the underlying data will not
require any changes to the HTML.

With a few lines of JavaScript code, the user can read an external XML file and update the
data content of a web page.
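
The text mentions JavaScript; the same idea is sketched here in Python as an assumption, to stay with one language for the examples. The data lives in XML, and a small function (render_contact, a hypothetical name) turns it into an HTML fragment, so the data can change without touching the layout code.

```python
import html
import xml.etree.ElementTree as ET

# Data kept separately from the page layout.
XML_DATA = """<contact-info>
  <name>Sachin Tendulkar</name>
  <company>Reliance Infotech</company>
</contact-info>"""

def render_contact(xml_text):
    """Build an HTML fragment from the XML data (layout lives only here)."""
    root = ET.fromstring(xml_text)
    name = root.findtext("name")
    company = root.findtext("company")
    # html.escape keeps characters such as < and & safe inside the page.
    return "<p><b>{}</b>, {}</p>".format(html.escape(name), html.escape(company))

page_fragment = render_contact(XML_DATA)
print(page_fragment)
```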

Vivekanand College for Advanced Computer and Information Science 1


2. XML simplifies data sharing

Computer systems and databases contain data in incompatible formats.

XML data is stored in plain text format. This provides a software- and hardware-independent
way of storing data.

This makes it much easier to create data that can be shared by different applications.

3. XML simplifies data transport

One of the most time-consuming challenges is to exchange data between incompatible systems
over the Internet.

Exchanging data as XML greatly reduces this complexity, since the data can be read by different
incompatible applications.

4. XML simplifies Platform change

Upgrading to new systems (hardware or software platforms) is always time-consuming. Large
amounts of data must be converted, and incompatible data is often lost.

XML data is stored in text format. This makes it easier to expand or upgrade to new operating
systems, new applications, or new browsers, without losing data.

5. XML increases data availability

Different applications can access your data, not only in HTML pages, but also from XML data
sources.

With XML, your data can be made available to all kinds of "reading machines" (handheld
computers, voice machines, news feeds, etc.), which also makes it more accessible to blind
people and people with other disabilities.

There are three important characteristics of XML that make it useful in a variety of systems and
solutions:
• XML is extensible: XML allows you to create your own self-descriptive tags, or language,
that suits your application.
• XML carries the data, does not present it: XML allows you to store the data irrespective
of how it will be presented.
• XML is a public standard: XML was developed by an organization called the World Wide Web
Consortium (W3C) and is available as an open standard.


1.2 XML syntax (Declaration, Tags, elements)

XML Syntax

A complete XML document:


<?xml version = "1.0"?>
<contact-info>
    <name>Sachin Tendulkar</name>
    <company>Reliance Infotech</company>
    <phone>(022)1234567</phone>
</contact-info>

There are two kinds of information in the above example –


• Markup, like <contact-info>
• The text, or character data, such as Sachin Tendulkar, Reliance Infotech and (022)1234567.
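
As a small illustration of the two kinds of information, the following Python sketch parses the same contact-info document with the standard xml.etree.ElementTree module and collects the tag names (markup) separately from the character data. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0"?>
<contact-info>
  <name>Sachin Tendulkar</name>
  <company>Reliance Infotech</company>
  <phone>(022)1234567</phone>
</contact-info>"""

root = ET.fromstring(DOCUMENT)

# Markup: the element names that label the data.
markup = [root.tag] + [child.tag for child in root]

# Character data: the text stored between the start and end tags.
character_data = [child.text for child in root]

print(markup)
print(character_data)
```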

[Diagram: syntax rules for writing the different types of markup and text in an XML
document. The rules are described in the sections that follow.]

• XML Declaration

The XML document can optionally have an XML declaration. It is written as follows −

<?xml version = "1.0" encoding = "UTF-8"?>

Where version is the XML version and encoding specifies the character encoding used in the
document.

Syntax Rules for XML Declaration

• The XML declaration is case sensitive and must begin with "<?xml", where "xml" is
written in lower case.
• If the document contains an XML declaration, it must be the very first thing in the
document; nothing, not even whitespace or a comment, may come before it.
• Encoding information supplied by a transport protocol such as HTTP (for example, the
charset parameter of a Content-Type header) can override the value of encoding that you
put in the XML declaration.
• If the XML declaration is included, it must contain the version attribute.
• The parameter names are case sensitive and are always in lower case.
• The standalone values yes and no must also be in lower case; encoding names such as
UTF-8 are matched without regard to case.
• The order of the parameters is important. The correct order is: version, encoding
and standalone.
• Either single or double quotes may be used.
• The XML declaration has no closing tag; that is, there is no </?xml>. It simply ends with "?>".
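
The "first statement" rule can be checked with a parser. The sketch below uses Python's standard xml.etree.ElementTree module; is_well_formed is a hypothetical helper name, and the three byte strings are made-up test inputs.

```python
import xml.etree.ElementTree as ET

def is_well_formed(data):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

# Declaration is the first thing in the document: accepted.
first_statement = b'<?xml version="1.0" encoding="UTF-8"?>\n<note/>'

# A blank line before the declaration: rejected.
after_blank_line = b'\n<?xml version="1.0" encoding="UTF-8"?>\n<note/>'

# A comment before the declaration: rejected.
after_comment = b'<!-- comment -->\n<?xml version="1.0"?>\n<note/>'

for sample in (first_statement, after_blank_line, after_comment):
    print(is_well_formed(sample))
```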

• Tags and Elements

An XML file is structured as a set of XML elements, also called XML nodes or XML tags. The
names of XML elements are enclosed in angle brackets < > as shown below −

<element>

Tags are case sensitive.

Types of XML tags


• Start tag, for example <element>
• End tag, for example </element>
• Empty tag, for example <element/>

Syntax Rules for Tags and Elements


Element Syntax − Each XML element must be closed, either with a separate end tag or, for an
empty element, within the start tag itself (a self-closing tag), as shown below −

<element>....</element>

or

<element/>

Nesting of Elements –

An XML element can contain multiple XML elements as its children, but the child elements
must not overlap; that is, the end tag of an element must have the same name as the most
recent unmatched start tag.

The following example shows incorrectly nested tags −

<?xml version = "1.0"?>


<contact-info>
<company> Reliance Infotech
</contact-info>
</company>

The following example shows correctly nested tags −

<?xml version = "1.0"?>

<contact-info>
    <company> Reliance Infotech </company>
</contact-info>
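
To see how a parser reacts to overlapping elements, the following Python sketch feeds both variants to the standard xml.etree.ElementTree module. parse_error is a hypothetical helper that returns the parser's message, or None when the text is well-formed.

```python
import xml.etree.ElementTree as ET

def parse_error(text):
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as exc:
        return str(exc)

# </contact-info> appears while <company> is still open: the elements overlap.
overlapping = "<contact-info><company>Reliance Infotech</contact-info></company>"

# <company> is closed before its parent: the elements are properly nested.
nested = "<contact-info><company>Reliance Infotech</company></contact-info>"

print(parse_error(overlapping))
print(parse_error(nested))
```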

Root Element –

An XML document can have only one root element. For example, the following is not a correct
XML document, because both the x and y elements occur at the top level without a root element −

<x>...</x>
<y>...</y>

The following example shows a correctly formed XML document −

<root>
<x>...</x>
<y>...</y>
</root>
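
The single-root rule can be tried out in Python with the standard xml.etree.ElementTree module. In this sketch the two top-level elements are rejected, while the same content wrapped in a root element (here literally named root, as in the example above) is accepted.

```python
import xml.etree.ElementTree as ET

fragments = "<x>...</x><y>...</y>"

# Two elements at the top level: the parser stops after the first one.
try:
    ET.fromstring(fragments)
    fragments_ok = True
except ET.ParseError:
    fragments_ok = False

# The same elements wrapped in a single root element parse without error.
wrapped = ET.fromstring("<root>" + fragments + "</root>")
child_tags = [child.tag for child in wrapped]

print(fragments_ok)
print(child_tags)
```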

Case Sensitivity – The names of XML elements are case sensitive. That means the names in the
start and end tags need to be in exactly the same case.

For example, <contact-info> is different from <Contact-Info>
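
A short Python sketch of what case sensitivity means in practice, using the standard xml.etree.ElementTree module. The document below is a made-up example in which Name and name are two different elements.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<contact-info><Name>upper</Name><name>lower</name></contact-info>"
)

# Lookups match the element name exactly, including case.
upper_text = root.findtext("Name")
lower_text = root.findtext("name")
missing = root.find("NAME")  # no element has this exact name

# An end tag that differs only in case does not close the start tag.
try:
    ET.fromstring("<contact-info></Contact-Info>")
    mismatch_accepted = True
except ET.ParseError:
    mismatch_accepted = False

print(upper_text, lower_text, missing, mismatch_accepted)
```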

• XML Attributes

An attribute specifies a single property for the element, using a name/value pair. An XML-
element can have one or more attributes. For example −

<a href = "http://www.relianceinfo.com/"> Reliance Infotech!</a>

Here href is the attribute name and http://www.relianceinfo.com/ is the attribute value.

Syntax Rules for XML Attributes


• Attribute names in XML (unlike HTML) are case sensitive. That is, HREF and href are
considered two different XML attributes.
• The same attribute cannot appear twice in one tag. The following example shows
incorrect syntax because the attribute b is specified twice −
<a b = "x" c = "y" b = "z">....</a>
• Attribute names are written without quotation marks, whereas attribute values must
always appear in quotation marks. The following example demonstrates incorrect XML
syntax −
<a b = x>....</a> [..incorrect syntax]
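
These attribute rules can be checked with a parser. The sketch below uses Python's standard xml.etree.ElementTree module; parses is a hypothetical helper name, and the inputs mirror the examples above.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

link = ET.fromstring('<a href="http://www.relianceinfo.com/">Reliance Infotech!</a>')
href = link.get("href")        # the attribute value
upper_href = link.get("HREF")  # a different name, so nothing is found

duplicate_ok = parses('<a b="x" c="y" b="z">....</a>')  # b given twice
unquoted_ok = parses('<a b=x>....</a>')                 # value not quoted

print(href, upper_href, duplicate_ok, unquoted_ok)
```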

• XML References

References usually allow you to add or include additional text or markup in an XML document.
References always begin with the symbol "&" which is a reserved character and end with the
symbol ";". XML has two types of references −

• Entity References − An entity reference contains a name between the start and the end
delimiters. For example, in &amp; the name is amp. The name refers to a predefined
string of text and/or markup.
• Character References − These contain a hash mark ("#") followed by a number, as in
&#65;. The number always refers to the Unicode code point of a character; in this case,
65 refers to the letter "A". The number may also be written in hexadecimal with an x
prefix, as in &#x41;.
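
The following Python sketch shows both kinds of reference being resolved by a parser (the standard xml.etree.ElementTree module). The element name text and its content are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# &amp; is an entity reference; &#65; and &#x41; are character references
# (decimal and hexadecimal) for the same character.
element = ET.fromstring("<text>Tom &amp; Jerry, grade &#65; or &#x41;</text>")

resolved = element.text
print(resolved)
```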

• XML Text

The names of XML elements and XML attributes are case sensitive, which means the names in
the start and end tags need to be written in the same case. To avoid character encoding
problems, XML files should be saved as Unicode UTF-8 or UTF-16 files.

Whitespace characters like blanks, tabs and line breaks between the attributes inside a tag
are not significant. Whitespace between XML elements is passed on by the parser, but many
applications choose to ignore it.

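Some characters are reserved by the XML syntax itself and cannot always be used directly.
The characters < and & must never appear literally in text content, while >, ' and " need
replacing only in certain positions (for example, a quote inside an attribute value that is
delimited by the same quote). To use them, the following replacement entities are provided −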

Not Allowed Character    Replacement Entity    Character Description

<                        &lt;                  less than
>                        &gt;                  greater than
&                        &amp;                 ampersand
'                        &apos;                apostrophe
"                        &quot;                quotation mark

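In practice the replacement is usually done by a library rather than by hand. The sketch below uses escape from Python's standard xml.sax.saxutils module, which replaces &, < and > by default and accepts a dictionary for further replacements such as the two quote characters. The sample string raw and the element name code are made up.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = 'if a < b & b > c: print("ok")'

# Default behaviour: only &, < and > are replaced.
basic = escape(raw)

# Extra entities can be supplied for the quote characters.
full = escape(raw, {'"': "&quot;", "'": "&apos;"})

# A parser turns the entities back into the original characters.
round_trip = ET.fromstring("<code>" + full + "</code>").text

print(basic)
print(full)
print(round_trip)
```
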
1.3 root element, case sensitivity

An XML document is always descriptive. Its tree structure is often referred to as the XML
tree. The tree contains a root (parent) element, child elements and so on. By following the
tree, you can reach every branch and sub-branch starting from the root. Parsing starts at the
root, moves down the first branch to an element, takes the first branch from there, and so on
down to the leaf nodes.

<?xml version = "1.0"?>

<Company>
    <Employee>
        <FirstName>Sachin</FirstName>
        <LastName>Tendulkar</LastName>
        <ContactNo>1234567890</ContactNo>
        <Email>[email protected]</Email>
        <Address>
            <City>Mumbai</City>
            <State>Maharastra</State>
            <Zip>560212</Zip>
        </Address>
    </Employee>
</Company>

The following tree structure represents the above XML document −

Company
    Employee
        FirstName
        LastName
        ContactNo
        Email
        Address
            City
            State
            Zip

There is a root element named <Company>. Inside that, there is one more element,
<Employee>. Inside the <Employee> element, there are five branches named <FirstName>,
<LastName>, <ContactNo>, <Email>, and <Address>. Inside the <Address> element, there are
three sub-branches, named <City>, <State> and <Zip>.
XML documents are formed as element trees. An XML tree starts at a root element and branches
from the root to child elements.
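
The walk from the root down to the leaves can be sketched as a short recursive function in Python, using the standard xml.etree.ElementTree module. outline is a hypothetical helper name; the document is the Company example with the Email element left out, since its value is not readable in the source.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<Company>
  <Employee>
    <FirstName>Sachin</FirstName>
    <LastName>Tendulkar</LastName>
    <ContactNo>1234567890</ContactNo>
    <Address>
      <City>Mumbai</City>
      <State>Maharastra</State>
      <Zip>560212</Zip>
    </Address>
  </Employee>
</Company>"""

def outline(element, depth=0):
    """Visit the element, then each child in order, indenting by depth."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(DOCUMENT))
print("\n".join(tree_lines))
```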

1.4 XML document:


An XML document is a basic unit of XML information composed of elements and other markup
in an orderly package. An XML document can contain a wide variety of data, for example a
database of numbers, numbers representing a molecular structure, or a mathematical equation.

A simple document is shown in the following example –

<?xml version = "1.0"?> [ Document prolog ]


<Employee> [ document elements]
<FirstName>Sachin</FirstName>
<LastName>Tendulkar</LastName>
<ContactNo>1234567890</ContactNo>
<Email>[email protected]</Email>
</Employee>

Document Prolog Section

Document Prolog comes at the top of the document, before the root element. This section
contains −

• XML declaration
• Document type declaration

Document element section

Document Elements are the building blocks of XML. These divide the document into a
hierarchy of sections, each serving a specific purpose. You can separate a document into
multiple sections so that they can be rendered differently, or used by a search engine. The
elements can be containers, with a combination of text and other elements.

1.5 XML declaration and rules of declaration.

XML declaration contains details that prepare an XML processor to parse the XML document. It
is optional, but when used, it must appear in the first line of the XML document.

Following syntax shows XML declaration −

<?xml
version = "version_number"
encoding = "encoding_declaration"
standalone = "standalone_status"
?>

Each parameter consists of a parameter name, an equals sign (=), and a parameter value
enclosed in quotes.
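
As an illustration of how a processor reads these three parameters, the sketch below parses a small document with Python's standard xml.dom.minidom module and reads them back from the resulting document object. The Employee document is a made-up example.

```python
from xml.dom import minidom

SOURCE = b'<?xml version="1.0" encoding="UTF-8" standalone="no"?><Employee/>'

doc = minidom.parseString(SOURCE)

# The values given in the declaration are exposed on the document object.
declaration = {
    "version": doc.version,
    "encoding": doc.encoding,
    "standalone": doc.standalone,  # "no" is reported as False
}
print(declaration)
```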

Rules

An XML declaration should abide by the following rules −

• If the XML declaration is present in the XML, it must be placed as the first line in the
XML document.
• If the XML declaration is included, it must contain the version attribute.
• The parameter names are case sensitive and are always in lower case.
• The standalone values yes and no must also be in lower case; encoding names such as
UTF-8 are matched without regard to case.
• The order of the parameters is important. The correct order is: version, encoding
and standalone.
• Either single or double quotes may be used.
• The XML declaration has no closing tag; that is, there is no </?xml>. It simply ends with "?>".

XML Declaration Examples

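The following are a few examples of XML declarations −

An XML declaration with no parameters is not allowed, because version is mandatory; the
following is therefore not well-formed −

<?xml ?>

XML declaration with version definition −

<?xml version = "1.0"?>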

XML declaration with all parameters defined −

<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>

XML declaration with all parameters defined in single quotes −

<?xml version = '1.0' encoding = 'iso-8859-1' standalone = 'no' ?>
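
The rules can be tried against a parser. This Python sketch, using the standard xml.etree.ElementTree module, places each candidate declaration in front of a minimal root element and records whether the document is accepted. accepts is a hypothetical helper name, and the last three candidates are made up to break one rule each.

```python
import xml.etree.ElementTree as ET

def accepts(declaration):
    """Return True if a document starting with this declaration parses."""
    try:
        ET.fromstring(declaration.encode("ascii") + b"<root/>")
        return True
    except ET.ParseError:
        return False

candidates = [
    '<?xml version = "1.0"?>',
    '<?xml version = "1.0" encoding = "UTF-8" standalone = "no" ?>',
    "<?xml version = '1.0' encoding = 'iso-8859-1' standalone = 'no' ?>",
    '<?xml ?>',                                     # no parameters
    '<?xml encoding = "UTF-8"?>',                   # version missing
    '<?xml encoding = "UTF-8" version = "1.0"?>',   # wrong order
]

results = [accepts(candidate) for candidate in candidates]
for candidate, ok in zip(candidates, results):
    print(ok, candidate)
```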
