Introduction of XML
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The note is quite self-descriptive. It has sender and receiver information. It also has a heading
and a message body.
But still, this XML document does not DO anything. XML is just information wrapped in
tags. Someone must write a piece of software to send, receive, store, or display it:
Note
To: Tove
From: Jani
Reminder
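As a minimal sketch of such a piece of software, assuming Python and its standard xml.etree.ElementTree module (neither is named in the text above), the following reads the note and turns it into display text. The function name render_note is hypothetical.

```python
import xml.etree.ElementTree as ET

NOTE_XML = """<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""


def render_note(xml_text: str) -> str:
    """Turn the note document into a few lines of display text."""
    note = ET.fromstring(xml_text)
    lines = [
        "Note",
        f"To: {note.findtext('to')}",
        f"From: {note.findtext('from')}",
        note.findtext("heading"),
        note.findtext("body"),
    ]
    return "\n".join(lines)


print(render_note(NOTE_XML))
```

The XML itself stays passive; all of the display logic lives in the program that reads it.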
XML was designed to carry data, with a focus on what the data is.
HTML was designed to display data, with a focus on how the data looks.
Most XML applications will work as expected even if new data is added (or removed).
Imagine an application designed to display the original version of note.xml (<to> <from>
<heading> <body>).
Then imagine a newer version of note.xml with added <date> and <hour> elements, and a
removed <heading>.
Because of the way XML is constructed, an older version of the application can still work:
<note>
<date>2015-09-01</date>
<hour>08:30</hour>
<to>Tove</to>
<from>Jani</from>
<body>Don't forget me this weekend!</body>
</note>
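One way an application can stay tolerant of such changes is to look elements up by name and supply a default when one is missing. The sketch below assumes Python's standard xml.etree.ElementTree module; read_note is a hypothetical helper, not part of any existing application.

```python
import xml.etree.ElementTree as ET

OLD_NOTE = """<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

NEW_NOTE = """<note>
<date>2015-09-01</date>
<hour>08:30</hour>
<to>Tove</to>
<from>Jani</from>
<body>Don't forget me this weekend!</body>
</note>"""


def read_note(xml_text: str) -> dict:
    """Pick out the fields the application knows about, by name."""
    note = ET.fromstring(xml_text)
    return {
        "to": note.findtext("to"),
        "from": note.findtext("from"),
        # <heading> may have been removed; fall back to an empty string.
        "heading": note.findtext("heading", default=""),
        "body": note.findtext("body"),
    }


old_fields = read_note(OLD_NOTE)
new_fields = read_note(NEW_NOTE)  # extra <date> and <hour> are simply ignored
```

Because the lookup is by element name rather than by position, the added elements do not disturb it.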
Many computer systems contain data in incompatible formats. Exchanging data between
incompatible systems (or upgraded systems) is a time-consuming task for web developers.
Large amounts of data must be converted, and incompatible data is often lost.
XML stores data in plain text format. This provides a software- and hardware-independent
way of storing, transporting, and sharing data.
XML also makes it easier to expand or upgrade to new operating systems, new applications,
or new browsers, without losing data.
With XML, data can be available to all kinds of "reading machines" like people, computers,
voice machines, news feeds, etc.
In many HTML applications, XML is used to store or transport data, while HTML is used to
format and display the same data.
When displaying data in HTML, you should not have to edit the HTML file when the data
changes.
With XML, the data can be stored in separate XML files.
With a few lines of JavaScript code, you can read an XML file and update the data content of
any HTML page.
Transaction Data
Thousands of XML formats exist, in many different industries, to describe day-to-day data
transactions:
Financial transactions
Medical data
Mathematical data
Scientific measurements
News information
Weather services
XML Tree
XML documents form a tree structure that starts at "the root" and branches to
"the leaves".
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
The terms parent, child, and sibling are used to describe the relationships between elements.
Parents have children. Children have parents. Siblings are children on the same level (brothers
and sisters).
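These relationships can be explored in code. The sketch below assumes Python's standard xml.etree.ElementTree module, which stores children on each element but no parent pointer, so a parent lookup table is built by hand. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<root>"
    "<child>"
    "<subchild>first</subchild>"
    "<subchild>second</subchild>"
    "</child>"
    "</root>"
)

# Map every element to its parent by walking the whole tree once.
parent_of = {child: parent for parent in root.iter() for child in parent}

child = root[0]                 # the only child of <root>
first, second = list(child)     # the two <subchild> elements

# Siblings are the other children of the same parent.
siblings_of_first = [e for e in parent_of[first] if e is not first]
```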
All elements can have text content (Harry Potter) and attributes (category="cooking").
Self-Describing Syntax
Each <book> element has 4 child elements: <title>, <author>, <year>, <price>.
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
You can assume, from this example, that the XML document contains information about
books in a bookstore.
XML documents must contain one root element that is the parent of all other elements:
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
The XML prolog is optional. If it exists, it must come first in the document.
XML documents can contain international characters, like Norwegian or French letters.
To avoid errors, you should specify the encoding used, or save your XML files as UTF-8.
UTF-8 is the default character encoding for XML documents.
In HTML, some elements might work well, even with a missing closing tag:
<p>This is a paragraph.
<br>
In XML, it is illegal to omit the closing tag. All elements must have a closing tag:
<p>This is a paragraph.</p>
<br />
XML tags are case sensitive. The tag <Letter> is different from the tag <letter>.
Opening and closing tags must be written with the same case:
<Message>This is incorrect</message>
<message>This is correct</message>
"Opening and closing tags" are often referred to as "Start and end tags". Use whatever you
prefer. It is exactly the same thing.
XML Elements Must be Properly Nested
<b><i>This text is bold and italic</i></b>
In the example above, "properly nested" simply means that since the <i> element is opened
inside the <b> element, it must be closed inside the <b> element.
XML elements can have attributes in name/value pairs just like in HTML.
In XML, the attribute values must always be quoted.
INCORRECT:
<note date=12/11/2007>
<to>Tove</to>
<from>Jani</from>
</note>
CORRECT:
<note date="12/11/2007">
<to>Tove</to>
<from>Jani</from>
</note>
The error in the first document is that the date attribute in the note element is not quoted.
Entity References
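A literal "<" inside an element generates an error, because the parser interprets it as the
start of a new element:
<message>salary < 1000</message>
To avoid this error, replace the "<" character with an entity reference:
<message>salary &lt; 1000</message>
There are 5 pre-defined entity references in XML:
&lt;     <    less than
&gt;     >    greater than
&amp;    &    ampersand
&apos;   '    apostrophe
&quot;   "    quotation mark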
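When XML is generated by a program, the replacement is usually done by a library rather than by hand. As a small sketch, assuming Python's standard library, xml.sax.saxutils.escape replaces "&", "<" and ">" with their entity references, and the parser turns them back into the original characters. The sample text is made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "salary < 1000 & rising"

# escape() rewrites "&", "<" and ">" as entity references.
escaped_text = escape(raw_text)
document = f"<message>{escaped_text}</message>"

# Parsing resolves the entity references back to the original characters.
parsed_text = ET.fromstring(document).text
```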
Comments in XML
XML does not truncate multiple white-spaces (HTML truncates multiple white-spaces to one
single white-space):
XML:  Hello           Tove
HTML: Hello Tove
Windows applications store a new line as: carriage return and line feed (CR+LF).
Unix and Mac OS X use LF.
Old Mac systems use CR.
XML stores a new line as LF.
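Both behaviours can be observed with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module: runs of spaces inside an element survive parsing, and a Windows-style CR+LF pair reaches the application as a single LF.

```python
import xml.etree.ElementTree as ET

# Multiple spaces inside element content are kept as they are.
spaced = ET.fromstring("<greeting>Hello           Tove</greeting>")

# A CR+LF pair in the input is handed to the application as LF.
windows_style = ET.fromstring("<body>line one\r\nline two</body>")

print(repr(spaced.text))
print(repr(windows_style.text))
```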
XML documents that conform to the syntax rules above are said to be "Well Formed" XML
documents.
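A parser is the simplest way to check well-formedness. The sketch below assumes Python's standard xml.etree.ElementTree module, whose parser raises ParseError on input that breaks the syntax rules; is_well_formed is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as a complete document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Each sample pairs a snippet with whether it follows the syntax rules.
samples = {
    "<p>This is a paragraph.</p>": True,
    "<p>This is a paragraph.": False,            # missing closing tag
    "<Message>text</message>": False,            # start and end tag differ in case
    "<b><i>text</b></i>": False,                 # improperly nested
    "<note date=12/11/2007></note>": False,      # attribute value not quoted
    "<message>salary < 1000</message>": False,   # unescaped "<" in content
    "<first/><second/>": False,                  # more than one root element
}

results = {text: is_well_formed(text) for text in samples}
```

Well-formedness only covers syntax; it says nothing about whether the element names make sense for a given application.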
XML Elements
An XML document contains XML Elements.
An XML element is everything from (including) the element's start tag to (including) the
element's end tag.
<price>29.99</price>
An element can contain:
text
attributes
other elements
<bookstore>
<book category="children">
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title>Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
In the example above:
<title>, <author>, <year>, and <price> have text content because they contain text (like
29.99).
<bookstore> and <book> have element contents, because they contain elements.
<book> has an attribute (category="children").
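The three kinds of content map directly onto what a parser exposes. As a sketch, assuming Python's standard xml.etree.ElementTree module, child elements are reached by iterating or searching, attributes through get(), and text content through findtext(). The summary list is only an illustration.

```python
import xml.etree.ElementTree as ET

BOOKSTORE = """<bookstore>
<book category="children">
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title>Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>"""

store = ET.fromstring(BOOKSTORE)

# <bookstore> has element content: its children are the <book> elements.
books = store.findall("book")

# Attribute via get(), text content via findtext().
summary = [
    (book.get("category"), book.findtext("title"), float(book.findtext("price")))
    for book in books
]
```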
Empty XML Elements
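An element with no content is said to be empty. It can be written with a start and end tag,
<element></element>, or as a self-closing tag, <element />.
The two forms produce identical results in XML software (Readers, Parsers, Browsers).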
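As a quick check of this claim, assuming Python's standard xml.etree.ElementTree module, an element written as <element></element> and one written as <element /> parse to equivalent objects and serialize to the same bytes.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<element></element>")
short_form = ET.fromstring("<element />")

# Neither form has children, and both serialize identically.
same_serialization = ET.tostring(long_form) == ET.tostring(short_form)
```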
Element names should not start with the letters xml (or XML, or Xml, etc); such names are reserved by the XML specification.
Element names can contain letters, digits, hyphens, underscores, and periods.
Naming Styles
There are no naming styles defined for XML elements. But here are some commonly used:
Style
Example
Description
All letters lower case
Let's imagine that we created an application that extracted the <to>, <from>, and <body>
elements from the XML document to produce this output:
MESSAGE
To: Tove
From: Jani
Don't forget me this weekend!
Imagine that the author of the XML document added some extra information to it:
<note>
<date>2008-01-10</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
Should the application break or crash?
No. The application should still be able to find the <to>, <from>, and <body> elements in the
XML document and produce the same output.
This is one of the beauties of XML. It can be extended without breaking applications.
XML Attributes
XML elements can have attributes, just like HTML.
Attributes are designed to contain data related to a specific element.
XML Attributes Must be Quoted
Attribute values must always be quoted. Either single or double quotes can be used.
For a person's gender, the <person> element can be written like this:
<person gender="female">
or like this:
<person gender='female'>
If the attribute value itself contains double quotes you can use single quotes, like in this
example:
<gangster name='George "Shotgun" Ziegler'>
or you can use character entities:
<gangster name="George &quot;Shotgun&quot; Ziegler">
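Both spellings give the application the same attribute value. A small sketch, assuming Python's standard xml.etree.ElementTree module (the elements are written as self-closing so that each snippet is a complete document):

```python
import xml.etree.ElementTree as ET

# Single quotes around a value that contains double quotes.
single_quoted = ET.fromstring("""<gangster name='George "Shotgun" Ziegler' />""")

# Double quotes around the value, with the inner quotes written as &quot;.
entity_quoted = ET.fromstring(
    '<gangster name="George &quot;Shotgun&quot; Ziegler" />'
)

print(single_quoted.get("name"))
print(entity_quoted.get("name"))
```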
XML Elements vs. Attributes
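Take a look at these examples:
<person gender="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
<person>
<gender>female</gender>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>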
In the first example gender is an attribute. In the last, gender is an element. Both examples
provide the same information.
There are no rules about when to use attributes or when to use elements in XML.
Sometimes ID references are assigned to elements. These IDs can be used to identify XML
elements in much the same way as the id attribute in HTML. This example demonstrates this:
<messages>
<note id="501">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
<note id="502">
<to>Jani</to>
<from>Tove</from>
<heading>Re: Reminder</heading>
<body>I will not</body>
</note>
</messages>
The id attributes above are for identifying the different notes. They are not part of the notes themselves.
The point here is that metadata (data about data) should be stored as attributes,
and the data itself should be stored as elements.
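An id stored as an attribute is easy to use as a lookup key. The sketch below assumes Python's standard xml.etree.ElementTree module and its limited XPath support, where a predicate such as [@id='502'] selects elements by attribute value.

```python
import xml.etree.ElementTree as ET

MESSAGES = """<messages>
<note id="501">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
<note id="502">
<to>Jani</to>
<from>Tove</from>
<heading>Re: Reminder</heading>
<body>I will not</body>
</note>
</messages>"""

messages = ET.fromstring(MESSAGES)

# Select the <note> whose id attribute equals "502".
reply = messages.find("note[@id='502']")
reply_body = reply.findtext("body")

# All ids, in document order.
note_ids = [note.get("id") for note in messages.findall("note")]
```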
XML Namespaces
XML Namespaces provide a method to avoid element name conflicts.
Name Conflicts
In XML, element names are defined by the developer. This often results in a conflict when
trying to mix XML documents from different XML applications.
This XML carries HTML table information:
<table>
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>
When prefixes such as h: and f: are added to the element names, there will be no conflict,
because the two <table> elements have different names.
XML Namespaces - The xmlns Attribute
When using prefixes in XML, a namespace for the prefix must be defined.
The namespace can be defined by an xmlns attribute in the start tag of an element.
The namespace declaration has the following syntax: xmlns:prefix="URI".
<root>
<h:table xmlns:h="http://www.w3.org/TR/html4/">
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table xmlns:f="http://www.w3schools.com/furniture">
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>
In the example above:
The xmlns attribute in the first <table> element gives the h: prefix a qualified namespace.
The xmlns attribute in the second <table> element gives the f: prefix a qualified namespace.
When a namespace is defined for an element, all child elements with the same prefix are
associated with the same namespace.
Namespaces can also be declared in the XML root element:
<root
xmlns:h="http://www.w3.org/TR/html4/"
xmlns:f="http://www.w3schools.com/furniture">
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>
Note: The namespace URI is not used by the parser to look up information.
The purpose of using a URI is to give the namespace a unique name.
However, companies often use the namespace as a pointer to a web page containing
namespace information.
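A namespace-aware parser resolves each prefix to its URI, which is why the two <table> elements never collide. The sketch below assumes Python's standard xml.etree.ElementTree module, which stores the resolved name as {URI}localname and accepts a prefix-to-URI mapping for searches. The mapping name NS is illustrative.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<root
xmlns:h="http://www.w3.org/TR/html4/"
xmlns:f="http://www.w3schools.com/furniture">
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>"""

# Prefixes used in the search paths; only the URIs have to match the document.
NS = {
    "h": "http://www.w3.org/TR/html4/",
    "f": "http://www.w3schools.com/furniture",
}

root = ET.fromstring(DOCUMENT)

html_table = root.find("h:table", NS)
furniture_table = root.find("f:table", NS)

cells = [td.text for td in root.findall("h:table/h:tr/h:td", NS)]
furniture_name = root.findtext("f:table/f:name", namespaces=NS)
```

The URI is only compared as a string here; nothing is fetched from it.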
Uniform Resource Identifier (URI)
Defining a default namespace for an element saves us from using prefixes in all the child
elements. It has the following syntax:
xmlns="namespaceURI"
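The effect of a default namespace can be seen after parsing. In this sketch, assuming Python's standard xml.etree.ElementTree module, none of the elements carries a prefix, yet every one of them ends up in the namespace declared on <table>.

```python
import xml.etree.ElementTree as ET

TABLE = """<table xmlns="http://www.w3.org/TR/html4/">
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>"""

table = ET.fromstring(TABLE)

# The element and all of its descendants inherit the default namespace.
tags = [element.tag for element in table.iter()]
print(tags)
```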
XSLT is a language that can be used to transform XML documents into other formats.
The XML document below is a document used to transform XML into HTML.
The namespace "http://www.w3.org/1999/XSL/Transform" identifies XSLT elements inside
an HTML document:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My CD Collection</h2>
<table border="1">
<tr>
<th style="text-align:left">Title</th>
<th style="text-align:left">Artist</th>
</tr>
<xsl:for-each select="catalog/cd">
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="artist"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
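To make the loop in the stylesheet concrete, the sketch below reproduces by hand what the <xsl:for-each> and <xsl:value-of> elements do for each <cd>. It is not an XSLT processor: it assumes Python's standard xml.etree.ElementTree module, which has no XSLT support, and the catalog content and the helper name cd_rows are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical input document with the catalog/cd structure the stylesheet expects.
CATALOG = """<catalog>
<cd><title>Empire Burlesque</title><artist>Bob Dylan</artist></cd>
<cd><title>Hide your heart</title><artist>Bonnie Tyler</artist></cd>
</catalog>"""


def cd_rows(catalog_xml: str) -> list:
    """Build one HTML table row per <cd>, like the for-each loop does."""
    catalog = ET.fromstring(catalog_xml)
    rows = []
    for cd in catalog.findall("cd"):                          # select="catalog/cd"
        title = escape(cd.findtext("title", default=""))     # select="title"
        artist = escape(cd.findtext("artist", default=""))   # select="artist"
        rows.append(f"<tr><td>{title}</td><td>{artist}</td></tr>")
    return rows


rows = cd_rows(CATALOG)
```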
Displaying XML
Raw XML files can be viewed in all major browsers.
Don't expect XML files to be displayed as HTML pages.
Viewing XML Files
<?xml version="1.0" encoding="UTF-8"?>
- <note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
Note: In Safari 5 (and earlier), only the element text will be displayed. To view the raw
XML, you must right click the page and select "View Source".
If an erroneous XML file is opened, some browsers will report the error, and some will
display it, or display it incorrectly.
<?xml version="1.0" encoding="UTF-8"?>
- <note>
<to>Tove</to>
<from>Jani</Ffrom>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
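A parser reports the same kind of error programmatically. The sketch below assumes Python's standard xml.etree.ElementTree module, whose ParseError carries the position of the problem as a (line, column) pair. The input is the erroneous note without the browser's "-" marker.

```python
import xml.etree.ElementTree as ET

BROKEN_NOTE = b"""<?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Tove</to>
<from>Jani</Ffrom>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

error_line = None
error_message = None
try:
    ET.fromstring(BROKEN_NOTE)
except ET.ParseError as err:
    # position is a (line, column) tuple pointing at the offending markup.
    error_line, _column = err.position
    error_message = str(err)

print(error_message)
```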
XML documents do not carry information about how to display the data.
Since XML tags are "invented" by the author of the XML document, browsers do not know if
a tag like <table> describes an HTML table or a dining table.
Without any information about how to display the data, the browsers can just display the
XML document as it is.