Introduction To XML
XML (eXtensible Markup Language) is stored as plain text, and its file extension is .xml.
XML was designed to transport and store data, while HTML was designed to display data.
What is XML?
• XML was designed to transport and store data, with the focus on what the data is. In
XML, every element must be closed.
• HTML was designed to display data, with the focus on how the data looks. In HTML,
some tags are never closed (e.g. <br>, <hr>).
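A quick way to see the closing rule in practice is to hand both forms to a strict XML parser. This minimal sketch uses Python's standard-library xml.etree.ElementTree; the helper name parses is only for illustration.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if the XML parser accepts text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# HTML-style <br> is never closed, so an XML parser rejects the document.
print(parses("<p>line one<br>line two</p>"))
# In XML an empty element must be closed, e.g. with the short form <br/>.
print(parses("<p>line one<br/>line two</p>"))
```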
An Example XML Document
XML documents use a self-describing and simple syntax:
<?xml version="1.0" encoding="UTF-8" ?>
<note>
<to>Mahesh</to>
<from>Ramesh</from>
<heading>Reminder</heading>
<body> Hi, It’s good to see you!</body>
</note>
The first line is the XML declaration. It defines the XML version (1.0) and the character
encoding used (UTF-8, the Unicode Transformation Format, 8-bit, is a variable-length character
encoding for Unicode).
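"Variable-length" means UTF-8 spends a different number of bytes on different characters. This small Python sketch (the helper utf8_length is hypothetical, for illustration) prints the byte count for a few sample characters, including the curly apostrophe used in the note below.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 uses to encode the character ch."""
    return len(ch.encode("utf-8"))


# ASCII letter, accented letter, curly apostrophe, emoji: 1 to 4 bytes each.
for ch in ["A", "é", "’", "😀"]:
    print(f"U+{ord(ch):04X} uses {utf8_length(ch)} byte(s)")
```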
The next line describes the root element of the document (like saying: "this document is
a note"):
<note>
The next 4 lines describe 4 child elements of the root (to, from, heading, and body):
<to>Mahesh</to>
<from>Ramesh</from>
<heading>Reminder</heading>
<body> Hi, It’s good to see you!</body>
And finally the last line defines the end of the root element:
</note>
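The root/child structure described above is exactly what a parser exposes to a program. A minimal sketch with Python's xml.etree.ElementTree, reading the note document:

```python
import xml.etree.ElementTree as ET

NOTE = """<?xml version="1.0" encoding="UTF-8" ?>
<note>
<to>Mahesh</to>
<from>Ramesh</from>
<heading>Reminder</heading>
<body> Hi, It’s good to see you!</body>
</note>"""

# Encode to bytes so the text matches the UTF-8 encoding it declares.
root = ET.fromstring(NOTE.encode("utf-8"))

print(root.tag)  # the root element: note
for child in root:
    # each child element of the root, followed by its text content
    print(child.tag, "=", child.text)
```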
XML Syntax Rules
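The syntax rules of XML are simple and strict: every element must be closed, tags are
case-sensitive, elements must be properly nested, a document must have exactly one root
element, and attribute values must always be quoted. For example: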
<note date="12/11/2007">
<to>Mahesh</to>
<from>Ramesh</from>
</note>
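This document is correct because the value of the date attribute is quoted. Writing
<note date=12/11/2007> without the quotes would be an error, because XML attribute values
must always be quoted.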
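Each syntax rule can be checked by asking a parser whether it accepts a document. This is a sketch using Python's xml.etree.ElementTree; the helper is_well_formed and the sample strings are illustrative, not part of any standard tool.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    "quoted attribute": '<note date="12/11/2007"><to>Mahesh</to></note>',
    "unquoted attribute": "<note date=12/11/2007><to>Mahesh</to></note>",
    "case mismatch": "<Note>text</note>",
    "bad nesting": "<b><i>text</b></i>",
    "two root elements": "<to>Mahesh</to><from>Ramesh</from>",
}
for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```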
Entity References
Some characters have a special meaning in XML.
If you place a character like "<" inside an XML element, it will generate an error because
the parser interprets it as the start of a new element.
This will generate an XML error:
<message>if salary < 1000 then</message>
To avoid this error, replace the "<" character with an entity reference:
<message>if salary &lt; 1000 then</message>
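The difference between the two messages shows up directly when they are parsed. A minimal Python sketch (the helper parse_text is hypothetical) that returns the element text, or None when the parser rejects the input:

```python
import xml.etree.ElementTree as ET


def parse_text(xml: str):
    """Return the root element's text, or None if the XML is rejected."""
    try:
        return ET.fromstring(xml).text
    except ET.ParseError:
        return None


# A raw "<" is read as the start of a tag, so parsing fails.
print(parse_text("<message>if salary < 1000 then</message>"))
# The entity reference &lt; is turned back into "<" by the parser.
print(parse_text("<message>if salary &lt; 1000 then</message>"))
```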
There are 5 predefined entity references in XML:
&lt;     <   less than
&gt;     >   greater than
&amp;    &   ampersand
&apos;   '   apostrophe
&quot;   "   quotation mark
Note: Only the characters "<" and "&" are strictly illegal in XML text. The greater-than
character is legal, but it is a good habit to replace it with &gt;.
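Python's standard library can produce these entity references for you. xml.sax.saxutils.escape always replaces &, < and >, and accepts a dictionary of extra replacements, used here for the two quote characters. The sample sentence is made up for illustration; parsing the escaped text shows every entity expanding back to its character.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() handles &, < and > by itself; the quote entities are added here
# so that all five predefined entity references are used.
EXTRA = {'"': "&quot;", "'": "&apos;"}

raw = 'Tom & Jerry say "1 < 2" and it\'s > 0'
escaped = escape(raw, EXTRA)
print(escaped)

# Round trip: the parser expands each entity back to its character.
element = ET.fromstring(f"<text>{escaped}</text>")
print(element.text == raw)
```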
Comments in XML
The syntax for writing comments in XML is similar to that of HTML.
<!-- This is a comment -->
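Comments are for human readers, and parsers usually drop them. A Python sketch showing that xml.etree.ElementTree discards comments by default, and keeps them only when the tree builder is asked to (the insert_comments option needs Python 3.8 or later):

```python
import xml.etree.ElementTree as ET

doc = "<note><!-- This is a comment --><to>Mahesh</to></note>"

# By default the comment is discarded, so only <to> remains as a child.
root = ET.fromstring(doc)
print([child.tag for child in root])

# A TreeBuilder with insert_comments=True keeps comments as Comment nodes.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root_with_comments = ET.fromstring(doc, parser=parser)
print(repr(root_with_comments[0].text))
```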
XML Attributes
XML elements can have attributes in the start tag, just like HTML.
From HTML you will remember this: <img src="computer.gif">. The "src" attribute
provides additional information about the <img> element.
In HTML (and in XML) attributes provide additional information about elements:
<img src="computer.gif">
<a href="demo.asp">
Attributes often provide information that is not a part of the data. In the example below,
the file type is irrelevant to the data, but important to the software that wants to
manipulate the element:
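<file type="gif">computer.gif</file>
The same information can be stored either as an attribute or as an element:
<person sex="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
<person>
<sex>female</sex>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
In the first example, sex is an attribute. In the second, sex is an element. Both examples
provide the same information.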
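There are no rules about when to use attributes and when to use elements. Attributes are
handy in HTML, but in XML my advice is to avoid them and use elements instead: attributes
cannot contain multiple values or nested structure, and they are harder to extend later.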
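From a program's point of view the two person documents differ only in how the value is read. A minimal Python sketch: an attribute comes from the element's attribute dictionary, a child element from a lookup of its text.

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring(
    '<person sex="female"><firstname>Anna</firstname>'
    "<lastname>Smith</lastname></person>"
)
as_element = ET.fromstring(
    "<person><sex>female</sex><firstname>Anna</firstname>"
    "<lastname>Smith</lastname></person>"
)

# Attribute: read from the start tag's attributes.
print(as_attribute.get("sex"))
# Element: find the <sex> child and read its text.
print(as_element.findtext("sex"))
```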
XML Validation
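XML with correct syntax is "Well Formed" XML. XML that is well formed and also conforms to a
Document Type Definition (DTD) is "Valid" XML. The DTD can be declared inside the XML
document (internal DTD) or in a separate file (external DTD).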
Internal DTD
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Mahesh</to>
<from>Ramesh</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
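Python's standard library does not validate documents against a DTD (third-party libraries such as lxml can). To illustrate what the declarations above require, here is a hand-written check of this one content model only; the function follows_note_dtd is hypothetical and is not a general DTD validator.

```python
import xml.etree.ElementTree as ET

# Child order required by <!ELEMENT note (to,from,heading,body)>
NOTE_CHILDREN = ["to", "from", "heading", "body"]


def follows_note_dtd(xml_text: str) -> bool:
    """Simplified check of the note DTD above (not a real DTD validator)."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well formed
    if root.tag != "note":
        return False
    if [child.tag for child in root] != NOTE_CHILDREN:
        return False
    # (#PCDATA) means text only, so no child may contain nested elements.
    return all(len(child) == 0 for child in root)


valid = ("<note><to>Mahesh</to><from>Ramesh</from><heading>Reminder</heading>"
         "<body>Don't forget me this weekend!</body></note>")
wrong_order = ("<note><from>Ramesh</from><to>Mahesh</to>"
               "<heading>Reminder</heading><body>Hi</body></note>")
print(follows_note_dtd(valid))
print(follows_note_dtd(wrong_order))
```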
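External DTD
With an external DTD, the XML document only refers to a separate DTD file (called note.dtd
here, as an example):
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
<to>Mahesh</to>
<from>Ramesh</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The file note.dtd contains only the declarations:
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>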
PCDATA
PCDATA means parsed character data. Think of character data as the text found
between the start tag and the end tag of an XML element.
PCDATA is text that will be parsed by a parser. Tags inside the text will be treated as
markup and entities will be expanded.
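Ordinary element content is parsed in exactly this way. In this Python sketch the &amp; entity is expanded and the inner <b> tag becomes a child element instead of staying as text:

```python
import xml.etree.ElementTree as ET

body = ET.fromstring("<body>Tom &amp; Jerry <b>rock</b>!</body>")

print(repr(body.text))     # text before <b>, with &amp; expanded to &
print(body[0].tag)         # the inner tag was treated as markup
print(repr(body[0].tail))  # text that follows the </b> end tag
```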
CDATA
CDATA also means character data. CDATA is text that will NOT be parsed by a
parser. Tags inside the text will NOT be treated as markup and entities will not be
expanded.
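In a document, unparsed text is written as a CDATA section, which starts with <![CDATA[ and ends with ]]> (the only sequence it cannot contain is ]]> itself). A Python sketch with a made-up script snippet shows that the characters come through untouched:

```python
import xml.etree.ElementTree as ET

# Inside the CDATA section the parser looks for neither tags nor entities.
script = ET.fromstring(
    "<script><![CDATA[if (a < b && c > d) { x = '&amp;'; }]]></script>"
)

print(script.text)  # the characters exactly as written, &amp; not expanded
print(len(script))  # no child elements were created
```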
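Displaying XML with CSS
An XML document can be linked to a CSS stylesheet with the xml-stylesheet processing
instruction, as on the second line of this catalog:
<?xml version="1.0" encoding="UTF-8"?>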
<?xml-stylesheet type="text/css" href="cd_catalog.css"?>
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
<CD>
<TITLE>Hide your heart</TITLE>
<ARTIST>Bonnie Tyler</ARTIST>
<COUNTRY>UK</COUNTRY>
<COMPANY>CBS Records</COMPANY>
<PRICE>9.90</PRICE>
<YEAR>1988</YEAR>
</CD>
</CATALOG>
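The stylesheet only affects how a viewer such as a browser displays the file; a program can read the same data without it. A minimal Python sketch that lists each CD and totals the prices, using Decimal so that 10.90 + 9.90 is exact:

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

CATALOG = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="cd_catalog.css"?>
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
<CD>
<TITLE>Hide your heart</TITLE>
<ARTIST>Bonnie Tyler</ARTIST>
<COUNTRY>UK</COUNTRY>
<COMPANY>CBS Records</COMPANY>
<PRICE>9.90</PRICE>
<YEAR>1988</YEAR>
</CD>
</CATALOG>"""

# ElementTree does not keep the stylesheet instruction; only the data remains.
root = ET.fromstring(CATALOG.encode("utf-8"))

total = Decimal("0")
for cd in root.findall("CD"):
    price = Decimal(cd.findtext("PRICE"))  # Decimal avoids float rounding
    total += price
    print(cd.findtext("TITLE"), "by", cd.findtext("ARTIST"), price)

print("Total:", total)
```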