XML Tutorial
XML was designed to describe data and to focus on what data is.
In this XML tutorial you will learn what XML is and how it works.
Introduction to XML
XML was designed to describe data and to focus on what data is.
HTML was designed to display data and to focus on how data looks.
If you want to study these subjects first, before you start reading about XML, you can find the
tutorials you need at W3Schools' Home Page.
What is XML?
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The note has a header and a message body. It also has sender and receiver information. But still,
this XML document does not DO anything. It is just pure information wrapped in XML tags. Someone
must write a piece of software to send, receive or display it.
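As a minimal sketch of what such a piece of software might look like, the Python snippet below reads the note with the standard library's xml.etree.ElementTree module and formats it as plain text. The helper name display_note and the "Subject:" label are assumptions made for illustration; they are not part of the note format.

```python
import xml.etree.ElementTree as ET

NOTE_XML = """<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""


def display_note(xml_text):
    """Return a plain-text rendering of a <note> document (hypothetical helper)."""
    note = ET.fromstring(xml_text)
    lines = [
        "To: " + note.findtext("to"),
        "From: " + note.findtext("from"),
        "Subject: " + note.findtext("heading"),
        "",
        note.findtext("body"),
    ]
    return "\n".join(lines)


print(display_note(NOTE_XML))
```

The XML itself stays pure data; everything about how the note is shown lives in the program.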
It is important to understand that XML was designed to store, carry, and exchange data.
XML was not designed to display data.
XML Syntax
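The syntax rules of XML are simple and strict. They are easy to learn and easy to use.
Because of this, creating software that can read and manipulate XML is straightforward.
An example XML document:
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>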
The first line in the document - the XML declaration - defines the XML version and the character
encoding used in the document. In this case the document conforms to the 1.0 specification of XML
and uses the ISO-8859-1 (Latin-1/West European) character set.
The next line describes the root element of the document (as if it were saying: "this document is a
note"):
<note>
The next 4 lines describe 4 child elements of the root (to, from, heading, and body):
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
And finally the last line defines the end of the root element:
</note>
Can you detect from this example that the XML document contains a Note to Tove from Jani? Don't
you agree that XML is pretty self-descriptive?
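The walk-through above can be checked with a short Python sketch. It assumes the declaration line reads as described in the text (version 1.0, ISO-8859-1) and uses the standard library's xml.etree.ElementTree module to report the root element and its children.

```python
import xml.etree.ElementTree as ET

# Bytes input, so the parser honours the encoding named in the declaration.
DOCUMENT = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(DOCUMENT)

# Tag names of the elements directly under the root, in document order.
child_names = [child.tag for child in root]

print(root.tag)
print(child_names)
```

The declaration does not appear among the elements: the parser consumes it and hands back only the note element and its four children.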
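All XML elements must have a closing tag. In HTML some elements do not have to have a closing tag:
<p>This is a paragraph
<p>This is another paragraph
In XML all elements must have a closing tag:
<p>This is a paragraph</p>
<p>This is another paragraph</p>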
Note: You might have noticed from the previous example that the XML declaration did not have a
closing tag. This is not an error. The declaration is not an XML element, so it should not have a
closing tag.
XML tags are case sensitive. Opening and closing tags must be written with the same case:
<Message>This is incorrect</message>
<message>This is correct</message>
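In HTML some elements can be improperly nested within each other like this:
<b><i>This text is bold and italic</b></i>
In XML all elements must be properly nested within each other like this:
<b><i>This text is bold and italic</i></b>
All XML documents must have a single root element. All other elements must be nested within it: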
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
Attribute values must always be quoted. An unquoted date attribute in the note element is an error.
This is correct: date="12/11/2002". This is incorrect: date=12/11/2002.
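The syntax rules in this section (closing tags, case-sensitive tag names, proper nesting, quoted attribute values) can be tried out with a small Python sketch. The helper is_well_formed is a hypothetical name; it simply reports whether the standard library parser accepts a string. The sample strings are shortened versions of the examples above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "missing closing tag": "<p>This is a paragraph",
    "closing tag present": "<p>This is a paragraph</p>",
    "mismatched case": "<Message>This is incorrect</message>",
    "matching case": "<message>This is correct</message>",
    "improper nesting": "<b><i>bold and italic</b></i>",
    "proper nesting": "<b><i>bold and italic</i></b>",
    "unquoted attribute": "<note date=12/11/2002></note>",
    "quoted attribute": '<note date="12/11/2002"></note>',
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Unlike a forgiving HTML browser, an XML parser stops at the first violation instead of guessing what was meant.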
Comments in XML
The syntax for writing comments in XML is similar to that of HTML.
<!-- This is a comment -->
XML Elements
<note>
<to>Tove</to>
<from>Jani</from>
<body>Don't forget me this weekend!</body>
</note>
Let's imagine that we created an application that extracted the <to>, <from>, and <body>
elements from the XML document to produce this output:
MESSAGE
To: Tove
From: Jani
Don't forget me this weekend!
Imagine that the author of the XML document added some extra information to it:
<note>
<date>2002-08-01</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
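</note>
Should the application break or crash? No. The application should still be able to find the <to>,
<from>, and <body> elements in the XML document and produce the same output. XML documents
are extensible.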
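A minimal Python sketch of such an application, assuming it looks elements up by name rather than by position. The function name render_message is hypothetical. It is run against both versions of the note to compare the output.

```python
import xml.etree.ElementTree as ET

ORIGINAL = """<note>
<to>Tove</to>
<from>Jani</from>
<body>Don't forget me this weekend!</body>
</note>"""

EXTENDED = """<note>
<date>2002-08-01</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""


def render_message(xml_text):
    """Build the MESSAGE output from the <to>, <from>, and <body> elements only."""
    note = ET.fromstring(xml_text)
    return "\n".join([
        "MESSAGE",
        "To: " + note.findtext("to"),
        "From: " + note.findtext("from"),
        note.findtext("body"),
    ])


# The extra <date> and <heading> elements are simply ignored.
print(render_message(ORIGINAL) == render_message(EXTENDED))
```

Looking elements up by name is the design choice that matters here; code that relied on "the first child is <to>" would be thrown off by the added <date> element.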
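Elements have relationships. Imagine that this is a description of a book:
My First XML
Introduction to XML
    What is HTML
    What is XML
XML Syntax
    Elements must have a closing tag
    Elements must be properly nested
Imagine that this XML document describes the book: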
<book>
<title>My First XML</title>
<prod id="33-657" media="paper"></prod>
<chapter>Introduction to XML
<para>What is HTML</para>
<para>What is XML</para>
</chapter>
<chapter>XML Syntax
<para>Elements must have a closing tag</para>
<para>Elements must be properly nested</para>
</chapter>
</book>
Book is the root element. Title, prod, and chapter are child elements of book. Book is the parent
element of title, prod, and chapter. Title, prod, and chapter are siblings (or sister elements)
because they have the same parent.
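These relationships can be inspected in code. The sketch below uses Python's standard xml.etree.ElementTree module on the book document above; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book>
<title>My First XML</title>
<prod id="33-657" media="paper"></prod>
<chapter>Introduction to XML
<para>What is HTML</para>
<para>What is XML</para>
</chapter>
<chapter>XML Syntax
<para>Elements must have a closing tag</para>
<para>Elements must be properly nested</para>
</chapter>
</book>"""

book = ET.fromstring(BOOK_XML)

# Children of the root element; they are siblings of one another.
children = [child.tag for child in book]

# Each chapter has text content of its own plus <para> child elements.
chapters = {
    chapter.text.strip(): [para.text for para in chapter.findall("para")]
    for chapter in book.findall("chapter")
}

print(book.tag, children)
print(chapters)
print(book.find("prod").attrib)
```

Note that the prod element carries no text at all, only attributes, while each chapter mixes text with child elements.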
Element Naming
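XML elements must follow these naming rules:
Names can contain letters, digits, hyphens, underscores, and periods.
Names must start with a letter or an underscore, not with a digit or another punctuation character.
Names must not start with the letters xml (in any mix of upper and lower case); such names are reserved.
Names cannot contain spaces.
Take care when you "invent" element names and follow these simple rules:
Any name can be used, no words are reserved (apart from names beginning with xml), but the idea is
to make names descriptive. Names with an underscore separator are nice.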
Examples: <first_name>, <last_name>.
Avoid "-" and "." in names. For example, if you name something "first-name," it could be a mess if
your software tries to subtract name from first. Or if you name something "first.name," your
software may think that "name" is a property of the object "first."
Element names can be as long as you like, but don't exaggerate. Names should be short and
simple, like this: <book_title> not like this: <the_title_of_the_book>.
XML documents often have a corresponding database, in which fields exist corresponding to
elements in the XML document. A good practice is to use the naming rules of your database for the
elements in the XML documents.
Non-English letters like éòá are perfectly legal in XML element names, but watch out for problems if
your software vendor doesn't support them.
The ":" should not be used in element names because it is reserved to be used for something called
namespaces (more later).
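The difference between what XML forbids and what is merely discouraged can be seen with a short Python sketch. The XML specification itself rejects names that start with a digit or contain a space; hyphens, periods, and non-English letters are legal, and the advice against them is a convention. The helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the standard-library parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Legal names: underscores, hyphens, periods, and non-English letters all parse.
print(parses("<first_name>Anna</first_name>"))
print(parses("<first-name>Anna</first-name>"))
print(parses("<first.name>Anna</first.name>"))
print(parses("<prénom>Anna</prénom>"))

# Illegal names: a leading digit or an embedded space is rejected.
print(parses("<1st_name>Anna</1st_name>"))
print(parses("<first name>Anna</first name>"))
```

The parser has no opinion on whether a name is descriptive or too long; that part of the advice is up to the author.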
XML Attributes
XML elements can have attributes in the start tag, just like HTML.
From HTML you will remember this: <IMG SRC="computer.gif">. The SRC attribute provides
additional information about the IMG element.
In HTML (and in XML) attributes provide additional information about elements:
<img src="computer.gif">
<a href="demo.asp">
Attributes often provide information that is not a part of the data. In the example below, the file
type is irrelevant to the data, but important to the software that wants to manipulate the element:
<file type="gif">computer.gif</file>
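Attribute values must always be enclosed in quotes, but either single or double quotes can be used.
For a person's sex, the person tag can be written like this:
<person sex="female">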
or like this:
<person sex='female'>
Note: If the attribute value itself contains double quotes, use single quotes around it (or escape
the inner quotes as &quot;), like in this example:
<gangster name='George "Shotgun" Ziegler'>
Note: If the attribute value itself contains single quotes, use double quotes around it (or escape
the inner quotes as &apos;).
Data can be stored in child elements or in attributes. Take a look at these examples:
<person sex="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
<person>
<sex>female</sex>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
In the first example sex is an attribute. In the last, sex is a child element. Both examples provide
the same information.
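From a program's point of view the two forms differ only in how the value is read. The Python sketch below, using the standard xml.etree.ElementTree module, reads the same value from both documents. The last lines also show a single-quoted attribute value that contains double quotes; that element is written as an empty element here so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

AS_ATTRIBUTE = """<person sex="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>"""

AS_ELEMENT = """<person>
<sex>female</sex>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>"""

# Attributes are read with .get(); child element text with .findtext().
sex_from_attribute = ET.fromstring(AS_ATTRIBUTE).get("sex")
sex_from_element = ET.fromstring(AS_ELEMENT).findtext("sex")

print(sex_from_attribute, sex_from_element)

# Either quote style may delimit an attribute value.
gangster = ET.fromstring("""<gangster name='George "Shotgun" Ziegler'/>""")
print(gangster.get("name"))
```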
There are no rules about when to use attributes, and when to use child elements. My experience is
that attributes are handy in HTML, but in XML you should try to avoid them. Use child elements if
the information feels like data.
My Favorite Way
I like to store data in child elements.
The following three XML documents contain exactly the same information:
A date attribute is used in the first example:
<note date="12/11/2002">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
A date element is used in the second example:
<note>
<date>12/11/2002</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
An expanded date element is used in the third example:
<note>
<date>
<day>12</day>
<month>11</month>
<year>2002</year>
</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
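One way to see the practical difference between the three forms is to extract the day, month, and year from each. The Python sketch below uses abbreviated versions of the three notes (only the date and the to element are kept). For the first two forms it assumes the string is in day/month/year order, which the XML itself does not say.

```python
import xml.etree.ElementTree as ET

first = ET.fromstring('<note date="12/11/2002"><to>Tove</to></note>')
second = ET.fromstring('<note><date>12/11/2002</date><to>Tove</to></note>')
third = ET.fromstring(
    "<note><date><day>12</day><month>11</month><year>2002</year></date>"
    "<to>Tove</to></note>"
)

# First and second forms: one string that the program must split itself.
# Which part is the day is not stated in the XML (day/month order is assumed).
day1, month1, year1 = first.get("date").split("/")
day2, month2, year2 = second.findtext("date").split("/")

# Third form: each part is already its own named element.
date = third.find("date")
day3, month3, year3 = (date.findtext(name) for name in ("day", "month", "year"))

print((day1, month1, year1) == (day2, month2, year2) == (day3, month3, year3))
```

Only the third form lets the reader of the data avoid guessing: the structure names each part, so no string-splitting convention has to be agreed on separately.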
If you use attributes as containers for data, you end up with documents that are difficult to read and
maintain. Try to use elements to describe data. Use attributes only to provide information that is
not relevant to the data.
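Don't end up like this (if you think this looks like XML, you have not understood the point):
<note day="12" month="11" year="2002" to="Tove" from="Jani" heading="Reminder" body="Don't forget me this weekend!"></note>
Sometimes I assign ID references to elements. These ID references can be used to access XML
elements in much the same way as the ID attribute in HTML. This example demonstrates this: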
<messages>
<note id="p501">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
<note id="p502">
<to>Jani</to>
<from>Tove</from>
<heading>Re: Reminder</heading>
<body>I will not!</body>
</note>
</messages>
The ID in these examples is just a counter, or a unique identifier, to identify the different notes in
the XML file, and not a part of the note data.
What I am trying to say here is that metadata (data about data) should be stored as attributes, and
that data itself should be stored as elements.
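A short Python sketch of how such an identifier might be used: the id attribute selects a note, and the child elements supply the data. The helper name find_note is hypothetical; the lookup relies on the limited XPath support in the standard library's xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

MESSAGES_XML = """<messages>
<note id="p501">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
<note id="p502">
<to>Jani</to>
<from>Tove</from>
<heading>Re: Reminder</heading>
<body>I will not!</body>
</note>
</messages>"""

messages = ET.fromstring(MESSAGES_XML)


def find_note(root, note_id):
    """Return the <note> whose id attribute equals note_id, or None."""
    return root.find("note[@id='%s']" % note_id)


reply = find_note(messages, "p502")
print(reply.findtext("heading"))
print(reply.findtext("body"))
```

The identifier never appears in the displayed message; it is used only to pick out the right note, which is what makes it metadata rather than data.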
XML Validation
XML with correct syntax is Well Formed XML.
XML DTD
A DTD defines the legal elements of an XML document.
The purpose of a DTD is to define the legal building blocks of an XML document. It defines the
document structure with a list of legal elements. You can read more about DTD, and how to validate
your XML documents in our DTD tutorial.
XML in Netscape 6
Netscape 6 supports XML.
To look at the XML source in Netscape 6: right-click on the page and select "View Page Source".