Introduction of XML

XML (eXtensible Markup Language) is a markup language that is designed to store and transport data. It was designed to be both human- and machine-readable. XML is used to carry data with a focus on what data is, separately from presentation. XML documents form a tree structure and use tags to describe and surround content. XML has simple, logical syntax rules including having one root element, closing all elements, case-sensitive elements, and properly nested elements.

Uploaded by

nramnram
Copyright
© All Rights Reserved
Introduction of XML:

XML stands for EXtensible Markup Language.


XML was designed to store and transport data.
XML was designed to be both human- and machine-readable.

<?xml version="1.0" encoding="UTF-8"?>


<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
XML is a software- and hardware-independent tool for storing and transporting
data.

Why Study XML?


XML plays an important role in many IT systems.
For this reason, it is important for all software developers to have a good understanding of
XML.
What is XML?

XML stands for EXtensible Markup Language

XML is a markup language much like HTML

XML was designed to store and transport data

XML was designed to be self-descriptive

XML is a W3C Recommendation

XML Does Not DO Anything

Maybe it is a little hard to understand, but XML does not DO anything.


This note is a note to Tove, from Jani, stored as XML:

<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

The note is quite self-descriptive. It has sender and receiver information. It also has a heading
and a message body.
But still, this XML document does not DO anything. XML is just information wrapped in
tags. Someone must write a piece of software to send, receive, store, or display it:
Note

To: Tove
From: Jani
Reminder

Don't forget me this weekend!
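
As a minimal sketch of such a piece of software, the following Python snippet reads the note with the standard library's xml.etree.ElementTree module and produces the display shown above. The helper name render_note and the exact layout are assumptions made for illustration; nothing in the XML itself prescribes them.

```python
import xml.etree.ElementTree as ET

NOTE_XML = """<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""


def render_note(xml_text):
    """Turn the note XML into plain display text (hypothetical helper)."""
    note = ET.fromstring(xml_text)
    lines = [
        "Note",
        "To: " + note.findtext("to"),
        "From: " + note.findtext("from"),
        note.findtext("heading"),
        note.findtext("body"),
    ]
    return "\n".join(lines)


print(render_note(NOTE_XML))
```

The XML only supplies the values; every decision about order and labels lives in the program.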

The Difference Between XML and HTML


XML and HTML were designed with different goals:

XML was designed to carry data - with focus on what data is

HTML was designed to display data - with focus on how data looks

XML tags are not predefined like HTML tags are

XML Does Not Use Predefined Tags


The XML language has no predefined tags.
The tags in the example above (like <to> and <from>) are not defined in any XML standard.
These tags are "invented" by the author of the XML document.
HTML works with predefined tags like <p>, <h1>, <table>, etc.
With XML, the author must define both the tags and the document structure.
XML is Extensible

Most XML applications will work as expected even if new data is added (or removed).

Imagine an application designed to display the original version of note.xml (<to> <from>
<heading> <body>).
Then imagine a newer version of note.xml with added <date> and <hour> elements, and a
removed <heading>.
The way XML is constructed, the older version of the application can still work:
<note>
<date>2015-09-01</date>
<hour>08:30</hour>
<to>Tove</to>
<from>Jani</from>
<body>Don't forget me this weekend!</body>
</note>

XML Simplifies Things

It simplifies data sharing

It simplifies data transport

It simplifies platform changes

It simplifies data availability

Many computer systems contain data in incompatible formats. Exchanging data between
incompatible systems (or upgraded systems) is a time-consuming task for web developers.
Large amounts of data must be converted, and incompatible data is often lost.
XML stores data in plain text format. This provides a software- and hardware-independent
way of storing, transporting, and sharing data.
XML also makes it easier to expand or upgrade to new operating systems, new applications,
or new browsers, without losing data.
With XML, data can be available to all kinds of "reading machines" like people, computers,
voice machines, news feeds, etc.

How Can XML be Used?


XML is used in many aspects of web development.
XML is often used to separate data from presentation.

XML Separates Data from Presentation

XML does not carry any information about how to be displayed.


The same XML data can be used in many different presentation scenarios.
Because of this, with XML, there is a full separation between data and presentation.

XML is Often a Complement to HTML

In many HTML applications, XML is used to store or transport data, while HTML is used to
format and display the same data.

XML Separates Data from HTML

When displaying data in HTML, you should not have to edit the HTML file when the data
changes.
With XML, the data can be stored in separate XML files.
With a few lines of JavaScript code, you can read an XML file and update the data content of
any HTML page.
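
The same idea can be sketched outside the browser. The snippet below uses Python rather than JavaScript, purely as an illustration: the HTML template stays fixed and only the XML data changes. The template, the helper name build_page, and the two sample notes are all assumptions, not part of any real page.

```python
import html
import xml.etree.ElementTree as ET

# The HTML template stays fixed; only the XML data changes.
PAGE_TEMPLATE = "<html><body><h1>{heading}</h1><p>{body}</p></body></html>"


def build_page(xml_text):
    """Fill the fixed HTML template with values read from the XML data."""
    note = ET.fromstring(xml_text)
    return PAGE_TEMPLATE.format(
        heading=html.escape(note.findtext("heading", default="")),
        body=html.escape(note.findtext("body", default="")),
    )


# Two hypothetical versions of the data; the template is never edited.
data_v1 = "<note><heading>Reminder</heading><body>Call Tove</body></note>"
data_v2 = "<note><heading>Update</heading><body>Call Jani</body></note>"
print(build_page(data_v1))
print(build_page(data_v2))
```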

Transaction Data
Thousands of XML formats exist, in many different industries, to describe day-to-day data
transactions:

Stocks and Shares

Financial transactions

Medical data

Mathematical data

Scientific measurements

News information

Weather services

XML Tree
XML documents form a tree structure that starts at "the root" and branches to
"the leaves".

An Example XML Document

The tree structure can be seen in this XML document, which describes the books in a bookstore:


<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="children">
<title lang="en">Harry Potter</title>
<author>J. K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
XML Tree Structure

XML documents are formed as element trees.


An XML tree starts at a root element and branches from the root to child elements.
All elements can have sub elements (child elements):
<root>
<child>

<subchild>.....</subchild>
</child>
</root>

The terms parent, child, and sibling are used to describe the relationships between elements.
Parents have children. Children have parents. Siblings are children on the same level (brothers
and sisters).
All elements can have text content (Harry Potter) and attributes (category="cooking").
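
These relationships can be explored with a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module on a shortened copy of the bookstore document; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J. K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE_XML)  # the root element, <bookstore>
books = list(root)                    # its children, which are siblings of each other

for book in books:
    # Attributes are read with get(); text content with findtext().
    category = book.get("category")
    title = book.findtext("title")
    child_names = [child.tag for child in book]
    print(category, "|", title, "|", child_names)
```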
Self-Describing Syntax

XML uses a largely self-describing syntax.


A prolog defines the XML version and the character encoding:
<?xml version="1.0" encoding="UTF-8"?>

The next line is the root element of the document:


<bookstore>

The next line starts a <book> element:


<book category="cooking">

The <book> element has 4 child elements: <title>, <author>, <year>, <price>.
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>

The next line ends the book element:


</book>

You can assume, from this example, that the XML document contains information about
books in a bookstore.

XML Syntax Rules


The syntax rules of XML are very simple and logical. The rules are easy to learn, and easy to
use.

XML Documents Must Have a Root Element

XML documents must contain one root element that is the parent of all other elements:
<root>
<child>
<subchild>.....</subchild>
</child>
</root>

In this example <note> is the root element:


<?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The XML Prolog

This line is called the XML prolog:


<?xml version="1.0" encoding="UTF-8"?>

The XML prolog is optional. If it exists, it must come first in the document.
XML documents can contain international characters, such as Norwegian or French letters.
To avoid errors, you should specify the encoding used, or save your XML files as UTF-8.
UTF-8 is the default character encoding for XML documents.

All XML Elements Must Have a Closing Tag

In HTML, some elements might work well, even with a missing closing tag:
<p>This is a paragraph.
<br>

In XML, it is illegal to omit the closing tag. All elements must have a closing tag:
<p>This is a paragraph.</p>
<br />

XML Tags are Case Sensitive

XML tags are case sensitive. The tag <Letter> is different from the tag <letter>.
Opening and closing tags must be written with the same case:
<Message>This is incorrect</message>
<message>This is correct</message>

"Opening and closing tags" are often referred to as "Start and end tags". Use whatever you
prefer. It is exactly the same thing.
XML Elements Must be Properly Nested

In HTML, you might see improperly nested elements:


<b><i>This text is bold and italic</b></i>

In XML, all elements must be properly nested within each other:


<b><i>This text is bold and italic</i></b>

In the example above, "Properly nested" simply means that since the <i> element is opened
inside the <b> element, it must be closed inside the <b> element.

XML Attribute Values Must be Quoted

XML elements can have attributes in name/value pairs just like in HTML.
In XML, the attribute values must always be quoted.
INCORRECT:
<note date=12/11/2007>
<to>Tove</to>
<from>Jani</from>
</note>

CORRECT:
<note date="12/11/2007">
<to>Tove</to>
<from>Jani</from>
</note>

The error in the first document is that the date attribute in the note element is not quoted.

Entity References

Some characters have a special meaning in XML.


If you place a character like "<" inside an XML element, it will generate an error because the
parser interprets it as the start of a new element.
This will generate an XML error:
<message>salary < 1000</message>

To avoid this error, replace the "<" character with an entity reference:
<message>salary &lt; 1000</message>

There are 5 pre-defined entity references in XML:


&lt;      <    less than
&gt;      >    greater than
&amp;     &    ampersand
&apos;    '    apostrophe
&quot;    "    quotation mark
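
In practice the replacement is usually left to a library. As a small sketch, Python's standard xml.sax.saxutils module provides escape() and unescape(); the sample strings below are made up for illustration.

```python
from xml.sax.saxutils import escape, unescape

raw = "salary < 1000 & bonus > 0"

# escape() replaces &, < and > by default.
escaped = escape(raw)
print(escaped)  # salary &lt; 1000 &amp; bonus &gt; 0

# Quote characters are only replaced when an extra mapping is passed in.
quoted = escape('say "hi"', {'"': "&quot;", "'": "&apos;"})
print(quoted)  # say &quot;hi&quot;

# unescape() reverses the default replacements.
restored = unescape(escaped)
print(restored == raw)
```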

Comments in XML

The syntax for writing comments in XML is similar to that of HTML.


<!-- This is a comment -->
Two dashes in the middle of a comment are not allowed.
Not allowed:
<!-- This is a -- comment -->

Strange, but allowed:


<!-- This is a - - comment -->

White-space is Preserved in XML

XML does not truncate multiple white-spaces (HTML truncates multiple white-spaces to one
single white-space):
XML:   Hello           Tove
HTML:  Hello Tove

XML Stores New Line as LF

Windows applications store a new line as: carriage return and line feed (CR+LF).
Unix and Mac OS X use LF.
Old Mac systems use CR.
XML stores a new line as LF.
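
Both behaviours can be observed with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the element names and text are invented for the demonstration.

```python
import xml.etree.ElementTree as ET

# Multiple spaces inside an element are kept exactly as written.
spaced = ET.fromstring("<greeting>Hello           Tove</greeting>")
print(repr(spaced.text))

# A Windows-style CR+LF line ending reaches the application as a single LF.
multiline = ET.fromstring("<body>line one\r\nline two</body>")
print(repr(multiline.text))
```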

Well Formed XML

XML documents that conform to the syntax rules above are said to be "Well Formed" XML
documents.
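
A parser is the simplest way to check this. The following sketch, assuming Python's standard xml.etree.ElementTree module, runs the rule violations from this chapter through a hypothetical is_well_formed helper; only the first sample is accepted.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "correct": "<note><to>Tove</to></note>",
    "missing closing tag": "<note><to>Tove</note>",
    "case mismatch": "<Message>text</message>",
    "improper nesting": "<b><i>text</b></i>",
    "unquoted attribute": "<note date=12/11/2007></note>",
    "two root elements": "<a></a><b></b>",
    "unescaped <": "<message>salary < 1000</message>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```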

XML Elements
An XML document contains XML Elements.

What is an XML Element?

An XML element is everything from (including) the element's start tag to (including) the
element's end tag.
<price>29.99</price>

An element can contain:

text

attributes

other elements

or a mix of the above

<bookstore>
<book category="children">
<title>Harry Potter</title>
<author>J. K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title>Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
In the example above:
<title>, <author>, <year>, and <price> have text content because they contain text (like
29.99).
<bookstore> and <book> have element contents, because they contain elements.
<book> has an attribute (category="children").
Empty XML Elements

An element with no content is said to be empty.


In XML, you can indicate an empty element like this:
<element></element>

You can also use a so-called self-closing tag:


<element />

The two forms produce identical results in XML software (Readers, Parsers, Browsers).
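
A quick way to see this, sketched with Python's standard xml.etree.ElementTree module:

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<element></element>")
short_form = ET.fromstring("<element />")

# Both parse to an element with no text and no children.
for el in (long_form, short_form):
    print(el.tag, el.text, len(el))

# Serializing either one gives the same bytes.
print(ET.tostring(long_form) == ET.tostring(short_form))
```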

XML Naming Rules


XML elements must follow these naming rules:

Element names are case-sensitive

Element names must start with a letter or underscore

Element names cannot start with the letters xml (or XML, or Xml, etc)

Element names can contain letters, digits, hyphens, underscores, and periods

Element names cannot contain spaces

Any name can be used, no words are reserved (except xml).
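
The rules above can be approximated with a short check. This is only a sketch: the helper name follows_naming_rules is hypothetical, and the pattern is limited to ASCII, whereas XML itself accepts a much wider range of letters.

```python
import re

# ASCII-only approximation of the rules listed above.
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def follows_naming_rules(name):
    """Check a name against the simplified rules (hypothetical helper)."""
    if name.lower().startswith("xml"):
        return False  # names beginning with "xml" in any case are reserved
    return _NAME_PATTERN.fullmatch(name) is not None


for candidate in ["firstname", "_id", "first-name", "1st", "first name", "XmlData"]:
    print(candidate, "->", follows_naming_rules(candidate))
```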

Best Naming Practices


Create descriptive names, like this: <person>, <firstname>, <lastname>.
Create short and simple names, like this: <book_title> not like this: <the_title_of_the_book>.
Avoid "-". If you name something "first-name", some software may think you want to
subtract "name" from "first".
Avoid ".". If you name something "first.name", some software may think that "name" is a
property of the object "first".
Avoid ":". Colons are reserved for namespaces (more later).
Non-English letters are perfectly legal in XML, but watch out for problems if your
software doesn't support them.

Naming Styles

There are no naming styles defined for XML elements, but here are some commonly used ones:
Style          Example         Description
Lower case     <firstname>     All letters lower case
Upper case     <FIRSTNAME>     All letters upper case
Underscore     <first_name>    Underscore separates words
Pascal case    <FirstName>     Uppercase first letter in each word
Camel case     <firstName>     Uppercase first letter in each word except the first

If you choose a naming style, it is good to be consistent!


XML documents often have a corresponding database. A common practice is to use the
naming rules of the database for the XML elements.
XML Elements are Extensible

XML elements can be extended to carry more information.

Look at the following XML example:


<note>
<to>Tove</to>
<from>Jani</from>
<body>Don't forget me this weekend!</body>
</note>

Let's imagine that we created an application that extracted the <to>, <from>, and <body>
elements from the XML document to produce this output:
MESSAGE

To: Tove
From: Jani
Don't forget me this weekend!

Imagine that the author of the XML document added some extra information to it:
<note>
<date>2008-01-10</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
Should the application break or crash?
No. The application should still be able to find the <to>, <from>, and <body> elements in the
XML document and produce the same output.
This is one of the beauties of XML. It can be extended without breaking applications.
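
This holds as long as the application looks elements up by name rather than by position. A minimal sketch of such an application, assuming Python's standard xml.etree.ElementTree module and a hypothetical message_output function:

```python
import xml.etree.ElementTree as ET

ORIGINAL = """<note>
<to>Tove</to>
<from>Jani</from>
<body>Don't forget me this weekend!</body>
</note>"""

EXTENDED = """<note>
<date>2008-01-10</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""


def message_output(xml_text):
    """Look up only the elements this application knows about, by name."""
    note = ET.fromstring(xml_text)
    return "\n".join([
        "MESSAGE",
        "To: " + note.findtext("to"),
        "From: " + note.findtext("from"),
        note.findtext("body"),
    ])


# The added <date> and <heading> elements are simply ignored.
print(message_output(ORIGINAL) == message_output(EXTENDED))
```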

XML Attributes
XML elements can have attributes, just like HTML.
Attributes are designed to contain data related to a specific element.
XML Attributes Must be Quoted

Attribute values must always be quoted. Either single or double quotes can be used.
For a person's gender, the <person> element can be written like this:
<person gender="female">
or like this:
<person gender='female'>
If the attribute value itself contains double quotes you can use single quotes, like in this
example:
<gangster name='George "Shotgun" Ziegler'>
or you can use character entities:
<gangster name="George &quot;Shotgun&quot; Ziegler">
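
Both spellings give a parser the same attribute value. A short sketch with Python's standard xml.etree.ElementTree module (the start tags are written as empty elements here so that each string is a complete document):

```python
import xml.etree.ElementTree as ET

single_quoted = ET.fromstring("<gangster name='George \"Shotgun\" Ziegler' />")
with_entities = ET.fromstring(
    '<gangster name="George &quot;Shotgun&quot; Ziegler" />'
)

print(single_quoted.get("name"))
print(single_quoted.get("name") == with_entities.get("name"))
```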
XML Elements vs. Attributes

Take a look at these examples:


<person gender="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>

<person>
<gender>female</gender>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>

In the first example gender is an attribute. In the second, gender is an element. Both examples
provide the same information.
There are no rules about when to use attributes or when to use elements in XML.
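
The choice does affect how the value is read back. A small sketch, assuming Python's standard xml.etree.ElementTree module:

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring(
    '<person gender="female"><firstname>Anna</firstname>'
    "<lastname>Smith</lastname></person>"
)
as_element = ET.fromstring(
    "<person><gender>female</gender><firstname>Anna</firstname>"
    "<lastname>Smith</lastname></person>"
)

# An attribute is read with get(); a child element with findtext().
print(as_attribute.get("gender"))
print(as_element.findtext("gender"))
```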

Avoid XML Attributes?


Some things to consider when using attributes are:

attributes cannot contain multiple values (elements can)

attributes cannot contain tree structures (elements can)

attributes are not easily expandable (for future changes)

Don't end up like this:


<note day="10" month="01" year="2008"
to="Tove" from="Jani" heading="Reminder"
body="Don't forget me this weekend!">
</note>
XML Attributes for Metadata

Sometimes ID references are assigned to elements. These IDs can be used to identify XML
elements in much the same way as the id attribute in HTML. This example demonstrates this:
<messages>
<note id="501">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
<note id="502">
<to>Jani</to>
<from>Tove</from>
<heading>Re: Reminder</heading>
<body>I will not</body>
</note>
</messages>

The id attributes above are for identifying the different notes. They are not a part of the notes themselves.
In general, metadata (data about data) should be stored as attributes,
and the data itself should be stored as elements.
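
An id attribute makes it easy to pick out one note. The following sketch assumes Python's standard xml.etree.ElementTree module and its limited XPath support:

```python
import xml.etree.ElementTree as ET

MESSAGES_XML = """<messages>
  <note id="501">
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
  </note>
  <note id="502">
    <to>Jani</to>
    <from>Tove</from>
    <heading>Re: Reminder</heading>
    <body>I will not</body>
  </note>
</messages>"""

messages = ET.fromstring(MESSAGES_XML)

# The [@id='...'] predicate selects a child by attribute value.
reply = messages.find("note[@id='502']")
print(reply.findtext("heading"))
```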

XML Namespaces
XML Namespaces provide a method to avoid element name conflicts.

Name Conflicts

In XML, element names are defined by the developer. This often results in a conflict when
trying to mix XML documents from different XML applications.
This XML carries HTML table information:

<table>
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>

This XML carries information about a table (a piece of furniture):


<table>
<name>African Coffee Table</name>
<width>80</width>
<length>120</length>
</table>
If these XML fragments were added together, there would be a name conflict. Both contain a
<table> element, but the elements have different content and meaning.
A user or an XML application will not know how to handle these differences.
Solving the Name Conflict Using a Prefix

Name conflicts in XML can easily be avoided using a name prefix.


This XML carries information about an HTML table, and a piece of furniture:
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>

In the example above, there will be no conflict because the two <table> elements have
different names.
XML Namespaces - The xmlns Attribute

When using prefixes in XML, a namespace for the prefix must be defined.
The namespace can be defined by an xmlns attribute in the start tag of an element.
The namespace declaration has the following syntax: xmlns:prefix="URI".

<root>
<h:table xmlns:h="http://www.w3.org/TR/html4/">
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table xmlns:f="http://www.w3schools.com/furniture">
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>
In the example above:
The xmlns attribute in the first <table> element gives the h: prefix a qualified namespace.
The xmlns attribute in the second <table> element gives the f: prefix a qualified namespace.
When a namespace is defined for an element, all child elements with the same prefix are
associated with the same namespace.
Namespaces can also be declared in the XML root element:
<root
xmlns:h="http://www.w3.org/TR/html4/"
xmlns:f="http://www.w3schools.com/furniture">
<h:table>
<h:tr>
<h:td>Apples</h:td>
<h:td>Bananas</h:td>
</h:tr>
</h:table>
<f:table>
<f:name>African Coffee Table</f:name>
<f:width>80</f:width>
<f:length>120</f:length>
</f:table>
</root>
Note: The namespace URI is not used by the parser to look up information.
The purpose of using a URI is to give the namespace a unique name.

However, companies often use the namespace as a pointer to a web page containing
namespace information.
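
A namespace-aware parser shows how the prefixes are resolved. The sketch below assumes Python's standard xml.etree.ElementTree module and takes the two namespace URIs to be http://www.w3.org/TR/html4/ and http://www.w3schools.com/furniture; the prefixes html and furn in the lookup mapping are arbitrary choices.

```python
import xml.etree.ElementTree as ET

DOC = """<root
  xmlns:h="http://www.w3.org/TR/html4/"
  xmlns:f="http://www.w3schools.com/furniture">
  <h:table>
    <h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>
  </h:table>
  <f:table>
    <f:name>African Coffee Table</f:name>
    <f:width>80</f:width>
    <f:length>120</f:length>
  </f:table>
</root>"""

# Prefix-to-URI mapping used only for the lookups below; the prefixes chosen
# here do not have to match the ones written in the document.
NS = {
    "html": "http://www.w3.org/TR/html4/",
    "furn": "http://www.w3schools.com/furniture",
}

root = ET.fromstring(DOC)
html_table = root.find("html:table", NS)
furniture_table = root.find("furn:table", NS)

# The parser stores each name as "{namespace URI}local-name".
print(html_table.tag)
print(furniture_table.findtext("furn:name", namespaces=NS))
```

The URI, not the prefix, is what distinguishes the two <table> elements once the document has been parsed.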
Uniform Resource Identifier (URI)

A Uniform Resource Identifier (URI) is a string of characters which identifies an Internet
resource.
The most common URI is the Uniform Resource Locator (URL) which identifies an
Internet domain address. Another, not so common type of URI is the Uniform Resource
Name (URN).
Default Namespaces

Defining a default namespace for an element saves us from using prefixes in all the child
elements. It has the following syntax:
xmlns="namespaceURI"

This XML carries HTML table information:


<table xmlns="http://www.w3.org/TR/html4/">
<tr>
<td>Apples</td>
<td>Bananas</td>
</tr>
</table>

This XML carries information about a piece of furniture:


<table xmlns="http://www.w3schools.com/furniture">
<name>African Coffee Table</name>
<width>80</width>
<length>120</length>
</table>
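
The effect of a default namespace on child elements can be checked with a parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree module and the namespace URI http://www.w3.org/TR/html4/:

```python
import xml.etree.ElementTree as ET

TABLE_XML = """<table xmlns="http://www.w3.org/TR/html4/">
  <tr>
    <td>Apples</td>
    <td>Bananas</td>
  </tr>
</table>"""

table = ET.fromstring(TABLE_XML)

# Every unprefixed element inherits the default namespace of <table>.
for element in table.iter():
    print(element.tag)
```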
Namespaces in Real Use

XSLT is a language that can be used to transform XML documents into other formats.
The XML document below is a document used to transform XML into HTML.
The namespace "http://www.w3.org/1999/XSL/Transform" identifies XSLT elements inside
an HTML document:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">

<html>
<body>
<h2>My CD Collection</h2>
<table border="1">
<tr>
<th style="text-align:left">Title</th>
<th style="text-align:left">Artist</th>
</tr>
<xsl:for-each select="catalog/cd">
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="artist"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

Displaying XML
Raw XML files can be viewed in all major browsers.
Don't expect XML files to be displayed as HTML pages.
Viewing XML Files
<?xml version="1.0" encoding="UTF-8"?>
- <note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

Look at the XML file above in your browser: note.xml


Most browsers will display an XML document with color-coded elements.
Often a plus (+) or minus sign (-) to the left of the elements can be clicked to expand or
collapse the element structure.
To view raw XML source, try to select "View Page Source" or "View Source" from the
browser menu.

Note: In Safari 5 (and earlier), only the element text will be displayed. To view the raw
XML, you must right click the page and select "View Source".

Viewing an Invalid XML File

If an erroneous XML file is opened, some browsers will report the error, and some will
display it, or display it incorrectly.
<?xml version="1.0" encoding="UTF-8"?>
- <note>
<to>Tove</to>
<from>Jani</Ffrom>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

Try to open the following XML file: note_error.xml
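
Parsers outside the browser report the same kind of error. The sketch below assumes Python's standard xml.etree.ElementTree module and a copy of the erroneous file held in a string; the exact wording of the message depends on the parser.

```python
import xml.etree.ElementTree as ET

NOTE_ERROR_XML = "\n".join([
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<note>",
    "<to>Tove</to>",
    "<from>Jani</Ffrom>",
    "<heading>Reminder</heading>",
    "<body>Don't forget me this weekend!</body>",
    "</note>",
])

try:
    ET.fromstring(NOTE_ERROR_XML)
    error = None
except ET.ParseError as exc:
    error = exc
    line, column = exc.position  # where the parser gave up
    print("Not well formed:", exc, "(line", line, ")")
```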


Why Does XML Display Like This?

XML documents do not carry information about how to display the data.
Since XML tags are "invented" by the author of the XML document, browsers do not know if
a tag like <table> describes an HTML table or a dining table.
Without any information about how to display the data, the browsers can just display the
XML document as it is.
