XML Basics
Document Control
Change Record
Date        Author                                        Version   Change Reference
04-Apr-09   Anoosha Burlakanti, Preethi Phani Mummaleti   1.0.0     Initial Document
Position
Distribution
Copy No.   Name             Location
1          Library Master   Project Library
2                           Project Manager
3
4
Note To Holders:
If you receive an electronic copy of this document and print it out, please write your name
on the equivalent of the cover page, for document control purposes.
If you receive a hard copy of this document, please write your name on the front cover, for
document control purposes.
Table of Contents:
1. Introduction To XML
2. What is XML?
3. Why XML?
5. Characteristics of XML
6. XML Tree
8. XML Elements
9. XML Attributes
12. References
2. What is XML?
XML is a software- and hardware-independent tool for carrying information.
XML (Extensible Markup Language) is a general-purpose specification for creating custom markup
languages. It is classified as an extensible language because it allows the user to define the markup
elements. You can create content and mark it up with delimiting tags, turning each word, phrase, or chunk
into identifiable, sortable information.
XML is recommended by the World Wide Web Consortium (W3C). It is a fee-free open standard. The
recommendation specifies both the lexical grammar and the requirements for parsing.
3. Why XML?
XML was created so that richly structured documents could be used over the web. The only viable
alternatives, HTML and SGML, are not practical for this purpose.
HTML comes bound with a set of semantics and does not provide arbitrary structure.
SGML provides arbitrary structure, but is too difficult to implement just for a web browser. Full SGML
systems solve large, complex problems that justify their expense. Viewing structured documents sent over
the web rarely carries such justification.
This is not to say that XML can be expected to completely replace SGML. While XML is designed to
deliver structured content over the web, some of the very features it omits to make this practical make
SGML a more satisfactory solution for the creation and long-term storage of complex documents. In many
organizations, filtering SGML to XML will be the standard procedure for web delivery.
XML is designed for ease of use with Standard Generalized Markup Language (SGML). Its goal is to enable
SGML to be served, received, and processed beyond what is now possible with HTML.
5. Characteristics of XML
1) XML was created to structure, store, and transport information.
The following example is a note to Anoosha from Preethi, stored as XML:
<note>
<to>Anoosha</to>
<from>Preethi</from>
<heading>Urgent</heading>
<body>Please call me!</body>
</note>
The above example is self-descriptive. It has sender and receiver information, and it also has a heading
and a message body.
This XML document does not do anything. It is just pure information wrapped in tags.
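As a rough illustration of how an application might read the note above, the following Python sketch parses it with the standard library module xml.etree.ElementTree. The choice of parser and the variable names (note_xml, recipient, and so on) are assumptions made for this example; the document itself does not prescribe any particular tool.

```python
import xml.etree.ElementTree as ET

# The note from the example above, held as an ordinary string.
note_xml = """<note>
<to>Anoosha</to>
<from>Preethi</from>
<heading>Urgent</heading>
<body>Please call me!</body>
</note>"""

# Parse the text into an element tree; the root element is <note>.
note = ET.fromstring(note_xml)

# findtext() returns the text of the first child element with that tag.
recipient = note.findtext("to")
sender = note.findtext("from")
heading = note.findtext("heading")
body = note.findtext("body")

print(f"{heading}: {sender} -> {recipient}: {body}")
```

The XML only carries the data; what to do with the sender, receiver, heading, and body is left entirely to the program that reads it.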
2) XML is Just Plain Text
XML is nothing special. It is just plain text. Software that can handle plain text can also handle XML.
However, XML-aware applications can handle the XML tags specially. The functional meaning of the tags
depends on the nature of the application.
3) With XML You Can Invent Your Own Tags
The tags in the example above (like <to> and <from>) are not defined in any XML standard. These tags
are "invented" by the author of the XML document.
That is because the XML language has no predefined tags.
The tags used in HTML (and the structure of HTML) are predefined. HTML documents can only use tags
defined in the HTML standard (like <p>, <h1>, etc.).
XML allows the author to define his own tags and his own document structure.
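To make the point about invented tags concrete, here is a small sketch that builds a document from tag names made up on the spot. The names recipe, title, and servings are hypothetical and chosen only for this example; the parser accepts them because XML reserves no tag vocabulary of its own.

```python
import xml.etree.ElementTree as ET

# Tag names are invented here; XML itself does not define any of them.
recipe = ET.Element("recipe")
ET.SubElement(recipe, "title").text = "Lemon tea"
ET.SubElement(recipe, "servings").text = "2"

# Serialize the tree to text, then parse it back to confirm it is valid XML.
recipe_xml = ET.tostring(recipe, encoding="unicode")
parsed_back = ET.fromstring(recipe_xml)

print(recipe_xml)
```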
4) XML is Not a Replacement for HTML
It is important to understand that XML is not a replacement for HTML but a complement to it. In most
web applications, XML is used to transport data, while HTML is used to format and display the data.
5) XML is a W3C Recommendation
XML became a W3C Recommendation on 10 February 1998.
6) XML is everywhere
XML is now as important for the Web as HTML was to the foundation of the Web. It is one of the most
common tools for data transmission between all sorts of applications, and is becoming more and more
popular for storing and describing information.
6. XML Tree
Example:
(this is correct)
<note date=12/11/2007>
<to>Anoosha</to>
<from>Preethi</from>
</note>
(This is wrong because the date attribute in the note element is not quoted)
<note date="12/11/2007">
<to>Anoosha</to>
<from>Preethi</from>
</note>
(This is correct)
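The quoting rule can be checked mechanically. The sketch below uses a hypothetical helper, is_well_formed, written for this example on top of Python's xml.etree.ElementTree: it simply reports whether the parser accepts the text. The two sample strings are the unquoted and quoted forms of the note element.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Same element twice: once with an unquoted attribute value, once quoted.
unquoted = "<note date=12/11/2007><to>Anoosha</to><from>Preethi</from></note>"
quoted = '<note date="12/11/2007"><to>Anoosha</to><from>Preethi</from></note>'

print(is_well_formed(unquoted))
print(is_well_formed(quoted))
```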
6) Entity References
Some characters have a special meaning in XML.
&lt;     <    less than
&gt;     >    greater than
&amp;    &    ampersand
&apos;   '    apostrophe
&quot;   "    quotation mark
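In practice the replacement of special characters by entity references (&lt;, &gt;, &amp;, &apos;, &quot;) is usually left to a library. The sketch below uses escape from Python's xml.sax.saxutils, which handles &, < and > by default and takes an optional mapping for the quote characters. The sample text and the element name condition are made up for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if salary < 1000 & bonus > 0"

# escape() replaces &, < and > with their entity references.
escaped = escape(raw)

# Quotes are only replaced when an extra mapping is supplied.
quote_entities = {"'": "&apos;", '"': "&quot;"}
escaped_quotes = escape('say "hi"', quote_entities)

# A parser turns the entity references back into the original characters.
round_trip = ET.fromstring(f"<condition>{escaped}</condition>").text

print(escaped)
print(escaped_quotes)
print(round_trip)
```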
7) Comments in XML
The syntax for writing comments in XML is similar to that of HTML.
<!-- This is a comment -->
8) White-space is Preserved in XML
HTML collapses multiple white-space characters into one single white-space. With XML, the white-space in a
document is not collapsed.
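A quick way to see this is to parse an element whose text contains a run of spaces and inspect what comes back. This is a minimal sketch using Python's xml.etree.ElementTree; the greeting element is invented for the example.

```python
import xml.etree.ElementTree as ET

# Five spaces between the two words, built explicitly so the count is visible.
spaced_text = "Hello" + " " * 5 + "Anoosha"

greeting = ET.fromstring(f"<greeting>{spaced_text}</greeting>")

# repr() makes the preserved spaces visible in the printed output.
print(repr(greeting.text))
```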
9) XML Stores New Line as LF
In Windows applications, a new line is normally stored as a pair of characters: carriage return (CR) and
line feed (LF). The pair mirrors the typewriter actions of returning the carriage and advancing to a new
line. In UNIX applications, a new line is normally stored as a single LF character. Older Macintosh
applications use only a CR character to store a new line. An XML parser normalizes all of these to a
single LF before passing the text to the application.
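The heading's claim that XML stores a new line as LF can be observed with a parser. In the sketch below the same two-line text is written with CR+LF, with CR alone, and with LF alone, and each version is parsed with Python's xml.etree.ElementTree. The memo element and the sample text are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The same content with three different line-ending conventions.
windows_style = b"<memo>line one\r\nline two</memo>"
cr_only_style = b"<memo>line one\rline two</memo>"
unix_style = b"<memo>line one\nline two</memo>"

# After parsing, each text is expected to contain a single LF between the lines.
texts = [
    ET.fromstring(document).text
    for document in (windows_style, cr_only_style, unix_style)
]

for text in texts:
    print(repr(text))
```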
8. XML Elements
An XML element is everything from (including) the element's start tag to (including) the element's end
tag.
An element can contain other elements, simple text or a mixture of both. Elements can also have
attributes.
Elements are the most common form of markup. Delimited by angle brackets, most elements identify the
nature of the content they surround. Some elements may be empty, in which case they have no content.
If an element is not empty, it begins with a start-tag, <element>, and ends with an end-tag, </element>.
<note date="12/11/2007">
<to>Anoosha</to>
<from>Preethi</from>
<heading>Urgent</heading>
<body>Please call me!</body>
</note>
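The different kinds of element content described in this section (child elements, simple text, a mixture of both, or nothing at all) can be explored with a short Python sketch. The attachment and b elements are hypothetical additions to the note, introduced here only to show an empty element and mixed content.

```python
import xml.etree.ElementTree as ET

note = ET.fromstring(
    "<note>"
    "<to>Anoosha</to>"                     # simple text content
    "<attachment/>"                        # empty element, no content
    "<body>Please <b>call</b> me!</body>"  # mixed content: text and a child
    "</note>"
)

# Iterating over an element yields its child elements in document order.
child_tags = [child.tag for child in note]

attachment = note.find("attachment")
body = note.find("body")

# itertext() walks all text inside an element, including nested children.
body_text = "".join(body.itertext())

print(child_tags)
print(body_text)
```

In ElementTree, text that follows a child element is stored on that child's tail attribute, which is why body.text alone holds only the words before the b element.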
3. Names cannot start with the letters xml (or XML, or Xml, etc.)
4.
9. XML Attributes
Attributes provide additional information about elements.
Attributes are name-value pairs that occur inside start-tags after the element name. For example,
<div class="preface"> is a div element with the attribute class having the value preface. In XML, all
attribute values must be quoted.
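Once parsed, attributes are typically exposed as name-value pairs. The following sketch reads the class attribute from the div example using Python's xml.etree.ElementTree; the element text "Opening remarks" and the fallback value "none" are invented for the example.

```python
import xml.etree.ElementTree as ET

div = ET.fromstring('<div class="preface">Opening remarks</div>')

# attrib is a dictionary of attribute names to their (unquoted) values.
attributes = dict(div.attrib)

# get() returns one attribute value, or a default if the attribute is absent.
css_class = div.get("class")
missing_id = div.get("id", "none")

print(attributes)
print(css_class, missing_id)
```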
XML Elements vs. Attributes
The following figure shows the anatomy of an XML file.
XML attributes are normally used to describe XML elements. For example, consider these two examples:
<note date="12/11/2007">
<to>Anoosha</to>
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
XML DTD
The purpose of a DTD is to define the structure of an XML document. It defines the structure with a list of
legal elements.
An XML DTD allows computers to check that each component of a document occurs in a valid place within
the document. For example, it allows computers to check that users do not accidentally enter a third-level
heading without first having a second-level heading.
A DTD can be internal or external. An internal DTD is declared inline within the XML document, whereas
an external DTD is kept in a separate file, so that the document instance is separated from the formal
definition of its elements.
Consider the example below:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE note SYSTEM "Note.dtd">
<note>
<to>Anoosha</to>
<from>Preethi</from>
<heading>Urgent</heading>
<body>Please call me!</body>
</note>
The DOCTYPE declaration in the example above is a reference to an external DTD file. The content of the
file is shown below
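<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
For an internal DTD, the same declarations are instead placed inside the XML document itself, wrapped in
<!DOCTYPE note [ ... ]>.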
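Python's standard library parser does not validate a document against a DTD, so the sketch below is only a hand-written stand-in for what a validating parser would check. The helper matches_note_structure is hypothetical: it tests the single rule note (to,from,heading,body), plus the requirement that each child holds character data only, and is not a general DTD validator.

```python
import xml.etree.ElementTree as ET

# Child order required by the declaration: note (to,from,heading,body)
EXPECTED_CHILDREN = ["to", "from", "heading", "body"]

def matches_note_structure(xml_text: str) -> bool:
    """Check that the root is <note> with exactly the expected children, in order."""
    root = ET.fromstring(xml_text)
    if root.tag != "note":
        return False
    if [child.tag for child in root] != EXPECTED_CHILDREN:
        return False
    # (#PCDATA) means text only, so none of the children may contain elements.
    return all(len(child) == 0 for child in root)

valid_note = (
    "<note><to>Anoosha</to><from>Preethi</from>"
    "<heading>Urgent</heading><body>Please call me!</body></note>"
)
missing_heading = (
    "<note><to>Anoosha</to><from>Preethi</from>"
    "<body>Please call me!</body></note>"
)

print(matches_note_structure(valid_note))
print(matches_note_structure(missing_heading))
```

A real validating parser derives these checks from the DTD automatically, which is the main reason to write the DTD in the first place.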
XML Schemas are much more powerful than DTDs: for example, they support data types such as xs:string.
An XML Schema is itself written in XML.
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
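Because a schema is itself XML, it can be read with an ordinary XML parser. The sketch below wraps the fragment above in an xs:schema root element that binds the xs prefix to the XML Schema namespace; that wrapper is added for this example, since a bare fragment with an unbound prefix would not parse. The code only lists the declared element names and types. It does not validate a document against the schema, which Python's standard library does not support.

```python
import xml.etree.ElementTree as ET

# Namespace that the xs prefix conventionally refers to.
XS = "http://www.w3.org/2001/XMLSchema"

schema_xml = f"""<xs:schema xmlns:xs="{XS}">
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>"""

schema = ET.fromstring(schema_xml)

# ElementTree spells namespaced tags as {namespace}localname.
declared = [
    (element.get("name"), element.get("type"))
    for element in schema.iter(f"{{{XS}}}element")
]

for name, type_name in declared:
    print(name, type_name)
```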
12. References:
www.w3schools.com
www.wikipedia.com
www.ibm.com