XML Syntax Rules of XML Language: © 2008 Mindtree Consulting
Slide 2
Need For XML
Revision of previous session
Quiz
Slide 4
XML
How can you have both more tags and fewer tags in a single
language?
To resolve this dilemma, XML makes essentially two changes to HTML:
It predefines no tags.
It is stricter.
Slide 5
What is Markup
Slide 6
Applications of XML
Publishing
XML is being used by an increasing number of publishers as the format
for documents.
Example XML document for a monthly newsletter. As you can see, it uses
elements for the title, abstract, paragraphs, and other concepts common
in publishing.
RSS / Atom
E.g., Bloglines
Slide 7
XML Introduction - Quiz
Basic questions on XML Introduction
Slide 9
The XML Syntax
Start & End Tags, Elements, Element nesting, XML Names, Attributes,
XML Declaration, Entities, CDATA, Comments, Processing Instructions,
Well formed XML
Slide 11
Element’s Start and End Tags
Slide 12
Names in XML
Element names must follow certain rules. As we will see, there are other
names in XML that follow the same rules.
Names in XML must start with either a letter or the underscore character
(“_”). The rest of the name consists of letters, digits, the underscore
character, the dot (“.”), or a hyphen (“-”). Spaces are not allowed in
names.
Finally, names cannot start with the string “xml”, which is reserved for the
XML specification itself.
Unlike HTML, names are case sensitive in XML.
By convention, XML elements are frequently written in lowercase. When a
name consists of several words, the words are usually separated by a
hyphen, as in address-book or written as AddressBook. Choose the
convention that works best for you but try to be consistent.
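The naming rules above can be sketched as a small check. This is only an illustration of the simplified rules on this slide: the function name is_valid_name is hypothetical, the pattern covers ASCII letters only, and the XML specification itself allows many more Unicode characters as well as the colon (reserved for namespaces).

```python
import re

# Simplified pattern taken from the slide: a letter or underscore first,
# then letters, digits, underscores, dots, or hyphens. ASCII only (assumption).
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_valid_name(name: str) -> bool:
    """Return True if `name` follows the simplified naming rules above."""
    # Names beginning with "xml" are reserved; the check ignores case here.
    if name.lower().startswith("xml"):
        return False
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["address-book", "AddressBook", "_id", "2nd-line", "first name", "xml-data"]:
    print(candidate, is_valid_name(candidate))
```

Because names are case sensitive, address-book and Address-Book would both pass this check yet still be two different names.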
Slide 13
Names in XML - Quiz
Slide 14
Attributes
Slide 15
Attributes - Quiz
Correct / Incorrect
<confidentiality level="I don't know">
This document is not confidential.
</confidentiality>
or
<confidentiality level='approved "for your eyes only"'>
This document is top-secret
</confidentiality>
Attribute names without values are not allowed. (Yes / No)
Attribute values without quote delimiters are not allowed. (Yes / No)
Only one instance of an attribute name is allowed within a given tag.
(Yes / No)
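One way to check answers to questions like these is to hand small snippets to an XML parser and see whether it accepts them. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name parses and the doc element are made up for the illustration.

```python
import xml.etree.ElementTree as ET


def parses(snippet: str) -> bool:
    """Return True if the parser accepts `snippet` as well-formed XML."""
    try:
        ET.fromstring(snippet)
        return True
    except ET.ParseError:
        return False


# Hypothetical snippets, one per attribute rule.
samples = {
    "quoted value": '<doc level="public"/>',
    "single-quoted value": "<doc level='public'/>",
    "name without value": "<doc draft/>",
    "value without quotes": "<doc level=public/>",
    "duplicate attribute": '<doc level="a" level="b"/>',
}

for label, snippet in samples.items():
    print(label, parses(snippet))
```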
Slide 16
Empty Element
Quiz
An empty element tag can have attributes. ( Yes / no)
Slide 17
Nesting of Elements
Start and end tags must always be balanced, and children must always be
completely enclosed in their parents. Is the following legal or illegal?
<name><fname>Jack</fname><lname>Smith</name></lname>
Slide 18
Root
At the root of the document there must be one and only one
element. In other words, all the elements in the document must be
the children of a single element.
Quiz: Is the following example legal or illegal?
<?xml version="1.0"?>
<entry>
<name>John Doe</name>
<email href="mailto:[email protected]"/>
</entry>
<entry>
<name>Jack Smith</name>
<email href="mailto:[email protected]"/>
</entry>
</entry>
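The single-root rule can be tried out with a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the wrapper element name entries is an assumption chosen for the illustration, and the email elements are left out to keep the snippet short.

```python
import xml.etree.ElementTree as ET

# Two sibling elements at the top level, with no common parent.
two_top_level = (
    "<entry><name>John Doe</name></entry>"
    "<entry><name>Jack Smith</name></entry>"
)

# The same content wrapped in one hypothetical root element.
one_root = "<entries>" + two_top_level + "</entries>"


def root_children(text: str):
    """Return the child tag names of the root, or None if parsing fails."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return None
    return [child.tag for child in root]


print(root_children(two_top_level))
print(root_children(one_root))
```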
Slide 19
XML Declaration
Slide 20
XML Declaration – Stand-alone document
If an XML document can be read with no reference to external sources, it is said to
be a stand-alone document. Such documents can be annotated with a standalone
attribute with a value of yes in the XML declaration. If an XML document requires
external sources to be resolved to parse correctly and/or to construct the entire
data tree (for example, a document with references to external general entities),
then it is not a stand-alone document. Such documents may be marked
standalone='no', but because this is the default, such an annotation rarely appears in
XML documents.
XML declarations
<?xml version='1.0' ?>
<?xml version='1.0' encoding='US-ASCII' ?>
<?xml version='1.0' encoding='US-ASCII' standalone='yes' ?>
<?xml version='1.0' encoding='UTF-8' ?>
<?xml version='1.0' encoding='UTF-16' ?>
<?xml version='1.0' encoding='ISO-10646-UCS-2' ?>
<?xml version='1.0' encoding='ISO-8859-1' ?>
<?xml version='1.0' encoding='Shift_JIS' ?>
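To see that the encoding named in the declaration has to match the bytes actually stored, the sketch below encodes the same small document in three of the encodings listed above and parses each one with Python's standard xml.etree.ElementTree module. The sample word and the helper name decode_with are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

WORD = "café"  # sample text with one non-ASCII character (assumption)


def decode_with(encoding_name: str) -> str:
    """Build a document declared and encoded as `encoding_name`, then parse it."""
    declaration = f"<?xml version='1.0' encoding='{encoding_name}'?>"
    document = (declaration + f"<word>{WORD}</word>").encode(encoding_name)
    # The parser receives raw bytes and relies on the declaration to decode them.
    return ET.fromstring(document).text


for name in ["UTF-8", "UTF-16", "ISO-8859-1"]:
    print(name, decode_with(name) == WORD)
```

The bytes differ from one encoding to the next, but the parsed text is the same in each case.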
Slide 21
Comments
Slide 22
Unicode
Slide 23
XML Declaration - Quiz
How can the XML processor read the encoding parameter? To reach the
encoding parameter, the processor must read the declaration. However,
to read the declaration, the processor needs to know which encoding is
being used.
What about those documents that have no declaration (since the
declaration is optional)?
Slide 24
Entities
Slide 25
Predefined Entities in XML
XML predefines entities for the characters used in markup (angle brackets,
quotes, and so on). The entities are used to escape these characters in
element or attribute content. The entities are
&lt; left angle bracket “<” must be escaped with &lt;
&amp; ampersand “&” must be escaped with &amp;
&gt; right angle bracket “>” must be escaped with &gt; when it follows “]]”
in content, because “]]>” marks the end of a CDATA section (see the following)
&apos; single quote “'” can be escaped with &apos;, mainly in attribute
values
&quot; double quote “"” can be escaped with &quot;, mainly in attribute
values
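In practice, escaping is usually left to a library rather than done by hand. As a sketch, Python's standard library provides escape and quoteattr in xml.sax.saxutils; the note element and the sample strings below are made up for the illustration.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

raw_text = "R&D <draft>"                      # text containing markup characters
raw_attr = 'approved "for your eyes only"'    # attribute value containing quotes

# escape() replaces &, < and > with entities;
# quoteattr() escapes the value and adds suitable quote delimiters.
element = f"<note level={quoteattr(raw_attr)}>{escape(raw_text)}</note>"
print(element)

# Parsing the result gives back the original, unescaped strings.
parsed = ET.fromstring(element)
print(parsed.text == raw_text, parsed.get("level") == raw_attr)
```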
Quiz – Correct / Incorrect?
<company>Mark & Spencer</company>
<company>Mark &amp; Spencer</company>
Slide 26
Character references
&#xHexadecimalUnicodeValue;
Character references that start with &#x provide a hexadecimal representation of the
character code.
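A short sketch of how character references resolve, using Python's standard xml.etree.ElementTree module. The element name c and the helper hex_reference are hypothetical. The decimal form &#233; is shown next to the hexadecimal form for comparison.

```python
import xml.etree.ElementTree as ET


def hex_reference(ch: str) -> str:
    """Return the hexadecimal character reference for a single character."""
    return f"&#x{ord(ch):X};"


# The hexadecimal and decimal references below name the same character.
document = "<c>&#xE9; &#233; &#x20AC;</c>"
text = ET.fromstring(document).text

print(hex_reference("é"))
print([hex(ord(ch)) for ch in text if ch != " "])
```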
Slide 27
Processing Instructions
Slide 28
CDATA Sections
As you have seen, markup characters (left angle bracket and ampersand)
that appear in the content of an element must be escaped with an entity.
For some applications, it is difficult to escape markup characters, if only
because there are too many of them. Also, it is difficult to include an XML
document in an XML document.
CDATA (Character DATA) sections are intended for these cases. CDATA
sections are delimited by “<![CDATA[” and “]]>”. The XML processor ignores
all markup inside the section except for “]]>”, which ends it.
PCDATA stands for parsed character data and means the element can
contain text. #PCDATA is often (but not always) used for leaf elements.
The difference between CDATA and PCDATA is that PCDATA cannot contain
unescaped markup characters.
Slide 29
More on CDATA Sections
Syntax
<![CDATA[…]]>
The ‘…’ section can contain any character string that does not contain
the “]]>” string literal.
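The syntax can be tried out with a parser. The sketch below uses Python's standard xml.etree.ElementTree module, and the helper to_cdata is hypothetical. It also shows a common workaround that is not from these slides: text containing “]]>” is split across two adjacent CDATA sections.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap `text` in a CDATA section, splitting any "]]>" it contains."""
    # "]]>" cannot appear inside a section, so end the section after "]]"
    # and start a new one before ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


markup = "<p>Tom & Jerry</p>"   # markup characters left unescaped
tricky = "a ]]> b"              # contains the CDATA end delimiter

for content in (markup, tricky):
    document = "<example>" + to_cdata(content) + "</example>"
    print(document)
    # The parser returns the section content as plain text.
    print(ET.fromstring(document).text == content)
```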
Slide 30
CDATA Section - Example
</example>
Slide 31
Well Formed XML
The end tag matches the corresponding start tag, and there is:
No overlapping in element definitions.
No instances of multiple attributes with the same name for one element
Syntax conforms to the XML Specifications
Start-tags all have matching end-tags (or are empty-element tags).
Element tags do not overlap.
Attributes have unique names.
Markup characters are properly escaped.
Elements form a hierarchical tree, with a single root node.
There are no references to external entities, except if a DTD is
provided.
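Several of these rules can be exercised with a non-validating parser, which rejects any document that is not well formed. The following is a rough sketch using Python's standard xml.etree.ElementTree module; the helper name first_error and the sample documents are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET


def first_error(text: str):
    """Return the parser's first error message, or None if `text` is well formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


# Hypothetical documents, each breaking one rule, plus one correct document.
samples = {
    "overlapping tags": "<a><b>text</a></b>",
    "missing end tag": "<a><b>text</b>",
    "duplicate attribute": '<a id="1" id="2"/>',
    "unescaped ampersand": "<a>R&D</a>",
    "two root elements": "<a/><b/>",
    "well formed": '<a id="1"><b>R&amp;D</b></a>',
}

for label, document in samples.items():
    print(label, "->", first_error(document))
```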
Slide 32
Well formed XML - example
Slide 33
Four Common Errors in XML Syntax
Slide 34
Questions
Slide 35
Thank you