XML Syntax Rules of XML Language: © 2008 Mindtree Consulting

This presentation covers XML Syntax for beginners.

Uploaded by

Neeraj Singh
Copyright
© Attribution Non-Commercial (BY-NC)
XML Syntax

Rules of XML Language


Sep-2009

© 2008 MindTree Consulting


Agenda

Need for XML


Quiz
XML Syntax - Rules of XML language

Slide 2
Need For XML
Revision of previous session
Quiz



A First Look at XML

The idea behind XML is deceptively simple. It aims at answering the
conflicting demands that arrive at the W3C for the future of HTML.
On one hand, people need more tags, and these new tags are
increasingly specialized. For example, mathematicians want tags
for formulas. Chemists also want tags for formulas, but they are not
the same.
On the other hand, authors and developers want fewer tags.
HTML is already so complex! As handheld devices gain in
popularity, the need for a simpler markup language is also apparent,
because small devices, like the PalmPilot, are not powerful enough
to process HTML pages.

Slide 4
XML

How can you have both more tags and fewer tags in a single
language?
To resolve this dilemma, XML makes essentially two changes to HTML:
It predefines no tags.
It is stricter.

Slide 5
What is Markup

In an electronic document, the markup is the set of codes, embedded
in the document text, that store the information required for
electronic processing, such as font name, boldness or, in the case of
XML, the document structure. This is not specific to XML. Every
electronic document standard uses some sort of markup.

Slide 6
Applications of XML

Publishing
XML is being used by an increasing number of publishers as the format
for documents.
For example, an XML document for a monthly newsletter uses elements
for the title, abstract, paragraphs, and other concepts common in
publishing.

Business Document Exchange
For example, an order can be placed in XML rather than on paper. The
advantage is that software can process it: an application could read
the order and automatically fulfill it.

RSS / Atom
E.g. Bloglines

Slide 7
XML Introduction - Quiz
Basic questions on XML Introduction



XML Introduction - Quiz

XML stands for


XML is about the description of data, and not its presentation.
XML allows us to define our own tags, so we can create our own
markup languages.
The XML specification is owned by W3C
XML is designed to be both machine readable and human readable.
XML provides a platform-neutral, language-independent means of
describing data.
Obviously, it’s the markup that differentiates the XML document
from plain text.

Slide 9
The XML Syntax
Start & End Tags, Elements, Element Nesting, XML Names, Attributes,
XML Declaration, Entities, CDATA, Comments, Processing Instructions,
Well-Formed XML



XML - Example
Listing 2.1: An Address Book in XML
<?xml version="1.0"?>
<address-book>
  <entry>
    <name>John Doe</name>
    <address>
      <street>34 Fountain Square Plaza</street>
      <region>OH</region>
      <postal-code>45202</postal-code>
      <locality>Cincinnati</locality>
      <country>US</country>
    </address>
    <tel preferred="true">513-555-8889</tel>
    <tel>513-555-7098</tel>
    <email href="mailto:[email protected]"/>
  </entry>
  <entry>
    <name><fname>Jack</fname><lname>Smith</lname></name>
    <tel>513-555-3465</tel>
    <email href="mailto:[email protected]"/>
  </entry>
</address-book>

Slide 11
Element’s Start and End Tags

The building block of XML is the element, as that’s what comprises
XML documents. Each element has a name and content.
<tel>513-555-7098</tel>
The content of an element is delimited by special markup known
as the start tag and the end tag.
Unlike HTML, both start and end tags are required. The following is
not correct in XML:
<tel>513-555-7098

Slide 12
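The end-tag rule from the slide above can be tried out with any XML parser. The sketch below assumes Python and its standard xml.etree.ElementTree module, which the slides do not prescribe; the helper name parses_ok is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def parses_ok(text: str) -> bool:
    """Return True if the parser accepts `text` as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses_ok("<tel>513-555-7098</tel>"))  # start and end tag present: True
print(parses_ok("<tel>513-555-7098"))        # end tag missing: False
```

The parser does not try to guess where the missing end tag belongs; it simply reports an error for the whole document.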
Names in XML

Element names must follow certain rules. As we will see, there are other
names in XML that follow the same rules.
Names in XML must start with either a letter or the underscore character
(“_”). The rest of the name consists of letters, digits, the underscore
character, the dot (“.”), or the hyphen (“-”). Spaces are not allowed in
names. The colon (“:”) is also accepted by the grammar, but it is reserved
for namespaces.
Finally, names cannot start with the string “xml” (in any mix of upper and
lower case), which is reserved for the XML specification itself.
Unlike HTML, names are case sensitive in XML.
By convention, XML elements are frequently written in lowercase. When a
name consists of several words, the words are usually separated by a
hyphen, as in address-book, or written as AddressBook. Choose the
convention that works best for you but try to be consistent.
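One way to experiment with the naming rules is to let a parser decide. This is only a sketch: it assumes Python's standard xml.etree.ElementTree module, the function name is_valid_element_name and the sample names are invented for illustration, and the parser does not enforce the reservation of names starting with "xml".

```python
import xml.etree.ElementTree as ET


def is_valid_element_name(name: str) -> bool:
    """Check a candidate name by asking the parser to read <name/>."""
    try:
        root = ET.fromstring(f"<{name}/>")
    except ET.ParseError:
        return False
    # Guard against input such as 'a b="1"', which parses as element 'a'.
    return root.tag == name


candidates = ["address-book", "AddressBook", "_private", "postal.code",
              "2nd-entry", "address book", "-entry"]
for candidate in candidates:
    print(candidate, is_valid_element_name(candidate))
```

The first four candidates are accepted; the last three are rejected because they start with a digit, contain a space, or start with a hyphen.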

Slide 13
Names in XML - Quiz

The following are examples of valid or invalid element names in XML:
<copyright-information> <p> <base64> <décompte.client>
<firstname> <123> <first name> <tom&jerry>

Slide 14
Attributes

It is possible to attach additional information to elements in the form of
attributes.
Attributes have a name and a value. The names follow the same rules as
element names.
The syntax is similar to HTML. Elements can have one or more attributes in
the start tag, and the name is separated from the value by the equals
sign.
The value of the attribute is enclosed in double or single quotation marks.
For example, the tel element can have a preferred attribute:
<tel preferred="true">513-555-8889</tel>
Unlike HTML, XML insists on the quotation marks. The XML processor would
reject the following:
<tel preferred=true>513-555-8889</tel>

Slide 15
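The attribute rules from the slide above can be observed from a program. The following sketch assumes Python's standard xml.etree.ElementTree module; the variable names are chosen only for this example.

```python
import xml.etree.ElementTree as ET

# A quoted attribute value is accepted and can be read back as a string.
tel = ET.fromstring('<tel preferred="true">513-555-8889</tel>')
print(tel.get("preferred"))  # true
print(tel.text)              # 513-555-8889

# The same element without quotation marks is rejected by the parser.
try:
    ET.fromstring("<tel preferred=true>513-555-8889</tel>")
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False
print(unquoted_accepted)     # False
```

Note that the attribute value comes back as the text "true", not as a boolean: XML itself only knows character data, and any typing is left to the application.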
Attributes - Quiz

Correct / Incorrect
<confidentiality level="I don't know">
This document is not confidential.
</confidentiality>
or
<confidentiality level='approved "for your eyes only"'>
This document is top-secret.
</confidentiality>
Attribute names without values are not allowed. (Yes / No)
Attribute values without delimiters are not allowed. (Yes / No)
Only one instance of an attribute name is allowed within a given tag.
(Yes / No)

Slide 16
Empty Element

Elements that have no content are known as empty elements.
Usually, they are included in the document for the value of their
attributes.
There is a shorthand notation for empty elements: the start and
end tags merge, and the slash from the end tag is added at the end
of the opening tag.
For XML, the following two elements are identical:
<email href="mailto:[email protected]"/>
<email href="mailto:[email protected]"></email>

Quiz
An empty element tag can have attributes. ( Yes / no)
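To see that the two notations for an empty element really are equivalent, both can be parsed and compared. This is a sketch assuming Python's standard xml.etree.ElementTree module; the address someone@example.com is a placeholder and does not come from the slides.

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring('<email href="mailto:someone@example.com"/>')
long_form = ET.fromstring('<email href="mailto:someone@example.com"></email>')

# Both forms yield an element with the same attributes and no content.
print(short_form.attrib == long_form.attrib)
print(short_form.text, long_form.text)
print(ET.tostring(short_form) == ET.tostring(long_form))
```

After parsing, the application cannot tell which notation the author used, which is why the choice between them is purely a matter of style.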

Slide 17
Nesting of Elements

Element content is not limited to text; elements can contain other
elements that in turn can contain text or elements and so on.
An XML document is a tree of elements. There is no limit to the depth of
the tree, and elements can repeat. As you see in Listing 2.1, there are two
entry elements in the address-book element. The entry for John Doe has
two tel elements. Figure 2.1 is the tree of Listing 2.1. [Refer: XML Example
slide]
An element that is enclosed in another element is called a child. The
element that encloses it is its parent.
<name>
  <fname>Jack</fname>
  <lname>Smith</lname>
</name>

Start and end tags must always be balanced, and children are always
completely enclosed in their parents. Is the following legal or illegal?
<name><fname>Jack</fname><lname>Smith</name></lname>

Slide 18
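The parent and child relationships described on the nesting slide map directly onto the tree a parser builds. The sketch below assumes Python's standard xml.etree.ElementTree module; the helpers outline and parses_ok are hypothetical names used only here.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return one indented line per element, children below their parent."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


def parses_ok(text):
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


name = ET.fromstring("<name><fname>Jack</fname><lname>Smith</lname></name>")
print("\n".join(outline(name)))

# Overlapping tags cannot be arranged into a tree, so the parser refuses them.
overlapping = "<name><fname>Jack</fname><lname>Smith</name></lname>"
print(parses_ok(overlapping))  # False
```

The outline prints name on the first line with fname and lname indented beneath it, mirroring the tree of elements.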
Root

At the root of the document there must be one and only one
element. In other words, all the other elements in the document must
be descendants of a single element.
Quiz: Is the following example legal or illegal?
<?xml version="1.0"?>
<entry>
  <name>John Doe</name>
  <email href="mailto:[email protected]"/>
</entry>
<entry>
  <name>Jack Smith</name>
  <email href="mailto:[email protected]"/>
</entry>

Slide 19
XML Declaration

The XML declaration, when present, is the first line of the document.
The declaration identifies the document as an XML document. The
declaration also lists the version of XML used in the document.
<?xml version="1.0"?>
The declaration can contain other attributes to support other
features such as character set encoding.
The XML declaration is optional.
If the declaration is included, however, it must start on the first
character of the first line of the document. The XML
recommendation suggests you include the declaration in every XML
document.

Slide 20
XML Declaration – Stand-alone document

If an XML document can be read with no reference to external sources, it is said to
be a stand-alone document. Such documents can be annotated with a standalone
attribute with a value of yes in the XML declaration. If an XML document requires
external sources to be resolved to parse correctly and/or to construct the entire
data tree (for example, a document with references to external general entities),
then it is not a stand-alone document. Such documents may be marked
standalone='no', but because this is the default, such an annotation rarely appears in
XML documents.
XML declarations
<?xml version='1.0' ?>
<?xml version='1.0' encoding='US-ASCII' ?>
<?xml version='1.0' encoding='US-ASCII' standalone='yes' ?>
<?xml version='1.0' encoding='UTF-8' ?>
<?xml version='1.0' encoding='UTF-16' ?>
<?xml version='1.0' encoding='ISO-10646-UCS-2' ?>
<?xml version='1.0' encoding='ISO-8859-1' ?>
<?xml version='1.0' encoding='Shift_JIS' ?>

Slide 21
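The three parts of the declaration listed on the slide above (version, encoding, standalone) can be read back programmatically. This sketch assumes Python and the XmlDeclHandler callback of its standard xml.parsers.expat module; the function name read_declaration is invented for the example.

```python
from xml.parsers import expat


def read_declaration(document: bytes):
    """Return (version, encoding, standalone) from the XML declaration.

    expat reports standalone as 1 for 'yes', 0 for 'no' and -1 when the
    attribute is absent; encoding is None when it is not declared.
    Returns None if the document has no declaration at all.
    """
    found = {}

    def on_declaration(version, encoding, standalone):
        found["decl"] = (version, encoding, standalone)

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(document, True)
    return found.get("decl")


print(read_declaration(b"<?xml version='1.0' ?><doc/>"))
print(read_declaration(
    b"<?xml version='1.0' encoding='US-ASCII' standalone='yes' ?><doc/>"))
print(read_declaration(b"<doc/>"))
```

The last call shows the optional nature of the declaration: the document is still parsed, but no declaration is reported.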
Comments

To insert comments in a document, enclose them between “<!--”
and “-->”.
Comments are used for notes, indication of ownership, and more.
They are intended for the human reader and they are ignored by
the XML processor.
<!-- This is a comment -->
Comments cannot be inserted inside the markup, that is, inside a tag.
They must appear before or after the markup.
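Both statements about comments can be checked with a parser. The sketch assumes Python's standard xml.etree.ElementTree module, whose default tree builder drops comments; parses_ok is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def parses_ok(text):
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A comment between tags is skipped: only the tel element is reported.
entry = ET.fromstring(
    "<entry><!-- preferred number --><tel>513-555-8889</tel></entry>")
print([child.tag for child in entry])  # ['tel']

# A comment placed inside a start tag makes the document ill-formed.
print(parses_ok("<tel <!-- note --> >513-555-8889</tel>"))  # False
```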

Slide 22
Unicode

Characters in XML documents follow the Unicode standard.
XML uses the Unicode (ISO/IEC 10646) character set, which is not limited
to 16-bit character codes.
XML processors must recognize the UTF-8 and UTF-16 encodings.
Most processors support other encodings. In particular, for Western
European languages, they support ISO 8859-1 (the official name for
Latin-1).
Documents that use an encoding other than UTF-8 or UTF-16 must start with
an XML declaration. The declaration must have an encoding attribute to
announce the encoding used. For example, a document written in Latin-1
(such as with Windows Notepad) could use the following declaration:
<?xml version="1.0" encoding="ISO-8859-1"?>
<entrée>
  <nom>José Dupont</nom>
  <email href="mailto:[email protected]"/>
</entrée>

Slide 23
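The effect of the encoding attribute is easiest to see on raw bytes. The following sketch assumes Python's standard xml.etree.ElementTree module and a shortened version of the Latin-1 example; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The same text stored as Latin-1 bytes, with and without a declaration.
declared = ('<?xml version="1.0" encoding="ISO-8859-1"?>'
            '<nom>José Dupont</nom>').encode("iso-8859-1")
undeclared = "<nom>José Dupont</nom>".encode("iso-8859-1")

# With the declaration, the parser decodes the accented letter correctly.
nom = ET.fromstring(declared)
print(nom.text)  # José Dupont

# Without it, the parser assumes UTF-8 and the Latin-1 byte is invalid.
try:
    ET.fromstring(undeclared)
    undeclared_accepted = True
except ET.ParseError:
    undeclared_accepted = False
print(undeclared_accepted)  # False
```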
XML Declaration - Quiz

How can the XML processor read the encoding parameter? Indeed,
to reach the encoding parameter, the processor must read the
declaration. However, to read the declaration, the processor needs
to know which encoding is being used.
What about those documents that have no declaration (since the
declaration is optional)?

Slide 24
Entities

XML organizes documents physically in entities. In some cases,
entities are equivalent to files; in others, they are not.
Entities are inserted in the document through entity references
(the name of the entity between an ampersand character and a
semicolon).
For the application, the entity reference is replaced by the content
of the entity.
If we assume we have defined an entity “us,” which has the value
“United States,” the following two lines are equivalent:
<country>&us;</country>
<country>United States</country>

Slide 25
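The slide on entities assumes that an entity named us has already been defined, without showing how. One possibility, sketched here under the assumption of a declaration in the document's internal DTD subset and of Python's standard xml.etree.ElementTree module, is the following.

```python
import xml.etree.ElementTree as ET

# The DOCTYPE block declares the entity; &us; then refers to it.
document = """<!DOCTYPE country [
  <!ENTITY us "United States">
]>
<country>&us;</country>"""

with_reference = ET.fromstring(document)
literal = ET.fromstring("<country>United States</country>")

# The application sees the replacement text, not the reference.
print(with_reference.text)
print(with_reference.text == literal.text)  # True
```

A reference to an entity that has not been declared is an error, so the declaration in the DOCTYPE block is what makes &us; usable.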
Predefined Entities in XML

XML predefines entities for the characters used in markup (angle brackets,
quotes, and so on). The entities are used to escape these characters in
element or attribute content. The entities are
&lt; left angle bracket “<” must be escaped with &lt;
&amp; ampersand “&” must be escaped with &amp;
&gt; right angle bracket “>” must be escaped with &gt; when it appears in
the combination ]]>, which ends a CDATA section (see the following)
&apos; single quote “'” can be escaped with &apos;, mainly in attribute
values
&quot; double quote “"” can be escaped with &quot;, mainly in attribute
values
Quiz – Correct / Incorrect?
<company>Mark & Spencer</company>
<company>Mark &amp; Spencer</company>

Slide 26
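In practice, escaping is usually left to a library rather than done by hand. The sketch below assumes Python, using escape from the standard xml.sax.saxutils module together with xml.etree.ElementTree; the company name is taken from the quiz above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Mark & Spencer"

# escape() replaces &, < and > with the predefined entities.
escaped = escape(raw)
print(escaped)  # Mark &amp; Spencer

# The escaped form parses, and the application gets the original text back.
company = ET.fromstring(f"<company>{escaped}</company>")
print(company.text)  # Mark & Spencer

# The unescaped form is rejected because & must start an entity reference.
try:
    ET.fromstring(f"<company>{raw}</company>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False
print(raw_accepted)  # False
```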
Character references

XML also supports character references, where a character is replaced by
its Unicode character code.
&#DecimalUnicodeValue;
Character references that start with &# provide a decimal representation
of the character code.

&#xHexadecimalUnicodeValue;
Character references that start with &#x provide a hexadecimal
representation of the character code.

Example - Character references


<?xml version='1.0' encoding='US-ASCII' ?>
<Personne occupation='&#xe9;tudiant' >
<nom>Martin</nom>
<langue>Fran&#231;ais</langue>
</Personne>

Slide 27
Processing Instructions

Processing instructions (abbreviated PI) are a mechanism to insert
non-XML statements, such as scripts, in the document.
A processing instruction is enclosed in <? and ?>.
The first name is the target. It identifies the application or the
device to which the instructions are directed. The rest of the
processing instruction is in a format specific to the target. It
does not have to be XML.
<?xml-stylesheet href="simple-ie5.xsl" type="text/xsl"?>
<?xml version="1.0" encoding="ISO-8859-1"?>
The XML declaration in the second line uses the same <? ... ?> syntax,
although the XML specification treats it separately and reserves the
target name xml.

Slide 28
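The split of a processing instruction into target and data can be inspected with a DOM parser. The sketch assumes Python's standard xml.dom.minidom module and wraps the stylesheet instruction from the slide in a minimal document whose entry element is only a placeholder.

```python
from xml.dom import minidom

document = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet href="simple-ie5.xsl" type="text/xsl"?>'
    '<entry/>'
)

# The XML declaration is not a node; the first child is the instruction.
instruction = document.firstChild
print(instruction.nodeType == instruction.PROCESSING_INSTRUCTION_NODE)  # True
print(instruction.target)  # xml-stylesheet
print(instruction.data)    # href="simple-ie5.xsl" type="text/xsl"
```

The data is handed over as one uninterpreted string: although it looks like attributes, the parser leaves its format entirely to the target application.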
CDATA Sections

As you have seen, markup characters (left angle bracket and ampersand)
that appear in the content of an element must be escaped with an entity.
For some applications, it is difficult to escape markup characters, if only
because there are too many of them. Also, it is difficult to include an XML
document in an XML document.
CDATA (Character DATA) sections are intended for these cases. CDATA
sections are delimited by “<![CDATA[” and “]]>”. Inside the section, the
XML processor ignores all markup except for “]]>”.
PCDATA stands for parsed character data and means the element can
contain text. #PCDATA is often (but not always) used for leaf elements.
The difference between CDATA and PCDATA is that PCDATA is parsed, so it
cannot contain unescaped markup characters, whereas the content of a
CDATA section is taken literally.

Slide 29
More on CDATA Sections

Syntax
<![CDATA[…]]>

The ‘…’ section can contain any character string that does not contain
the “]]>” string literal.

May contain markup characters such as “<” and “&” without escaping.
May occur anywhere that character data may occur.
Cannot be nested.
May be empty: <![CDATA[]]> is legal.
The content is passed on as plain text; the parser does not interpret
it as markup.
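The properties listed above can be checked by parsing a small CDATA section. This sketch assumes Python's standard xml.etree.ElementTree and xml.sax.saxutils modules; the sample content is a shortened address-book entry.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

inner = "<entry><name>John Doe</name></entry>"

# Inside the CDATA section the tags are plain text, not child elements.
with_cdata = ET.fromstring(f"<example><![CDATA[{inner}]]></example>")
print(len(with_cdata))  # 0 child elements
print(with_cdata.text)  # <entry><name>John Doe</name></entry>

# Escaping every markup character by hand gives the same result.
with_entities = ET.fromstring(f"<example>{escape(inner)}</example>")
print(with_entities.text == with_cdata.text)  # True

# An empty CDATA section is accepted as well.
empty = ET.fromstring("<example><![CDATA[]]></example>")
print(len(empty))       # 0
```

A CDATA section is therefore only a convenience for authors: the application receives the same text either way.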

Slide 30
CDATA Section - Example

The following example uses a CDATA section to insert an XML
example into an XML document:
<?xml version="1.0"?>
<example>
<![CDATA[
<?xml version="1.0"?>
<entry>
  <name>John Doe</name>
  <email href="mailto:[email protected]"/>
</entry>]]>
</example>

Slide 31
Well Formed XML

A document is well formed when its syntax conforms to the XML
specification. In particular:
Start-tags all have matching end-tags (or are empty-element tags).
Element tags do not overlap.
Attributes have unique names within an element.
Markup characters are properly escaped.
Elements form a hierarchical tree, with a single root node.
There are no references to external entities, except if a DTD is
provided.

Slide 32
Well formed XML - example

<?xml version="1.0" encoding="UTF-8"?>
<employees>
  <employee id="IN9999">
    <name>
      <firstname>Suraj</firstname>
      <middlename>Kumar</middlename>
      <surname>Verma</surname>
    </name>
    <department>IT Services</department>
    <project>C2</project>
    <details><![CDATA[Some data here >, even these symbols don't bother it.]]></details>
  </employee>
  <employee id="IN9498">
    <name>
      <firstname>Abhi</firstname>
      <surname>Dhar</surname>
    </name>
    <department>R&amp;D Services</department>
    <project/>
  </employee>
</employees>

Slide 33
Four Common Errors in XML Syntax

Forget End Tags
Forget That XML Is Case Sensitive
Introduce Spaces in the Name of Element
<address book>
  <entry>
    <name>John Doe</name>
    <email href="mailto:[email protected]"/>
  </entry>
</address book>

Forget the Quotes for Attribute Value
<tel preferred=true>513-555-8889</tel>

Slide 34
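A parser reports each of the four common errors as a well-formedness error. The sketch below assumes Python's standard xml.etree.ElementTree module; the function name well_formedness_error and the sample snippets are made up for illustration, and the exact wording of the messages depends on the parser.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(text):
    """Return the parser's error message, or None if `text` is well formed."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as error:
        return str(error)


samples = {
    "missing end tag": "<entry><name>John Doe</entry>",
    "case mismatch": "<Name>John Doe</name>",
    "space in name": "<address book></address book>",
    "unquoted attribute": "<tel preferred=true>513-555-8889</tel>",
    "corrected": '<tel preferred="true">513-555-8889</tel>',
}

for label, text in samples.items():
    print(label, "->", well_formedness_error(text) or "well formed")
```

Only the last sample is reported as well formed. Note that such a check covers syntax only: it says nothing about whether the element names make sense for a given application.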
Questions

Slide 35
Thank you

XML Technology, Semester 4


SICSR Executive MBA(IT) @ MindTree, Bangalore, India

By Neeraj Singh (toneeraj(AT)gmail(DOT)com)
Slide 36
