Unit-3 XML
XML
INTRODUCTION:
What is XML
Students often underline or highlight a passage so that it is easy to revise. Modern
mark-up languages work on the same idea, with tags taking the place of highlighting or underlining.
HTML VS XML
HTML can ignore small errors. XML does not allow errors.
HTML tags are predefined tags. XML tags are user-defined tags.
HTML tags are used for displaying the data. XML tags are used for describing the data, not for displaying it.
HTML does not carry data, it just displays it. XML carries the data to and from the database.
In this chapter, we will discuss the simple syntax rules to write an XML document.
Following is a complete XML document −
<?xml version = "1.0"?>
<contact-info>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</contact-info>
Notice that there are two kinds of information in the above example −
Markup, like <contact-info>
The text, or the character data, like TutorialsPoint and (011) 123-4567.
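The split between markup and character data can be seen by loading the document into a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The helper name markup_and_text is hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# The contact-info document from the text above.
DOCUMENT = """<?xml version = "1.0"?>
<contact-info>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
   <phone>(011) 123-4567</phone>
</contact-info>"""


def markup_and_text(xml_source):
    """Return (tag name, character data) pairs for each child of the root."""
    root = ET.fromstring(xml_source)
    return [(child.tag, child.text) for child in root]


if __name__ == "__main__":
    for tag, text in markup_and_text(DOCUMENT):
        print(f"markup: <{tag}>   text: {text}")
```

The tag names come from the markup, while the strings between the tags are the character data.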
The following diagram depicts the syntax rules to write different types of markup
and text in an XML document.
XML Declaration
The XML document can optionally have an XML declaration. It is written as follows −
<?xml version = "1.0" encoding = "UTF-8"?>
Where version is the XML version and encoding specifies the character encoding used
in the document.
Syntax Rules for XML Declaration
The XML declaration is case sensitive and must begin with "<?xml" where "xml"
is written in lower-case.
If the document contains an XML declaration, then it strictly needs to be the first
statement of the XML document.
The HTTP protocol can override the value of encoding that you put in the XML
declaration.
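The first two rules can be tried out with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module, which is not part of the original text. The helper is_accepted and the three sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_accepted(xml_bytes):
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


# The declaration is the very first thing in the document.
DECLARATION_FIRST = b'<?xml version = "1.0" encoding = "UTF-8"?><note/>'
# A single space before the declaration breaks the "first statement" rule.
DECLARATION_NOT_FIRST = b' <?xml version = "1.0" encoding = "UTF-8"?><note/>'
# "XML" written in upper case is not a valid start for the declaration.
DECLARATION_UPPERCASE = b'<?XML version = "1.0"?><note/>'

if __name__ == "__main__":
    for label, doc in [
        ("declaration first", DECLARATION_FIRST),
        ("declaration not first", DECLARATION_NOT_FIRST),
        ("upper-case XML", DECLARATION_UPPERCASE),
    ]:
        print(label, "->", "accepted" if is_accepted(doc) else "rejected")
```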
Tags and Elements
An XML file is structured as a set of XML elements, also called XML nodes or XML
tags. The names of XML elements are enclosed in angle brackets < > as shown
below −
<element>
Syntax Rules for Tags and Elements
Element Syntax − Each XML element needs to be closed, either with a matching end
tag as shown below −
<element>....</element>
or, for an empty element, just this way −
<element/>
Nesting of Elements − An XML element can contain multiple XML elements as its
children, but the child elements must not overlap, i.e., the end tag of an element must
have the same name as the most recent unmatched start tag.
The following example shows incorrect nested tags −
<?xml version = "1.0"?>
<contact-info>
<company>TutorialsPoint
</contact-info>
</company>
The following example shows correct nested tags −
<?xml version = "1.0"?>
<contact-info>
<company>TutorialsPoint</company>
</contact-info>
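The nesting rule can be checked mechanically. The sketch below feeds an overlapping and a properly nested version of the contact-info example to Python's standard xml.etree.ElementTree parser. The function name nesting_error is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

# Overlapping tags: <company> is still open when </contact-info> appears.
BAD_NESTING = """<?xml version = "1.0"?>
<contact-info>
<company>TutorialsPoint
</contact-info>
</company>"""

# Properly nested: the inner element is closed before the outer one.
GOOD_NESTING = """<?xml version = "1.0"?>
<contact-info>
<company>TutorialsPoint</company>
</contact-info>"""


def nesting_error(xml_source):
    """Return the parser's error message, or None if the document parses."""
    try:
        ET.fromstring(xml_source)
        return None
    except ET.ParseError as err:
        return str(err)


if __name__ == "__main__":
    print("bad :", nesting_error(BAD_NESTING))
    print("good:", nesting_error(GOOD_NESTING))
```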
Root Element − An XML document can have only one root element. For example,
following is not a correct XML document, because both the x and y elements occur at
the top level without a root element −
<x>...</x>
<y>...</y>
The following example shows a correctly formed XML document −
<root>
<x>...</x>
<y>...</y>
</root>
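As a quick check of the single-root rule, the sketch below parses both variants with Python's standard xml.etree.ElementTree module. The helper root_tag is hypothetical. It returns None when the parser rejects the input.

```python
import xml.etree.ElementTree as ET

TWO_TOP_LEVEL_ELEMENTS = "<x>...</x>\n<y>...</y>"
SINGLE_ROOT = "<root>\n<x>...</x>\n<y>...</y>\n</root>"


def root_tag(xml_source):
    """Return the name of the root element, or None if parsing fails."""
    try:
        return ET.fromstring(xml_source).tag
    except ET.ParseError:
        return None


if __name__ == "__main__":
    print(root_tag(TWO_TOP_LEVEL_ELEMENTS))
    print(root_tag(SINGLE_ROOT))
```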
Case Sensitivity − The names of XML elements are case-sensitive. That means the
names in the start tag and the end tag need to be in exactly the same case.
For example, <contact-info> is different from <Contact-Info>
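A small sketch of case sensitivity, again assuming Python's standard xml.etree.ElementTree parser. The helper child_tags and the wrapper element root are hypothetical names used only for this illustration.

```python
import xml.etree.ElementTree as ET


def child_tags(xml_source):
    """Return the child element names of the root, or None if parsing fails."""
    try:
        return [child.tag for child in ET.fromstring(xml_source)]
    except ET.ParseError:
        return None


# Two distinct elements whose names differ only in case.
DISTINCT_NAMES = "<root><contact-info/><Contact-Info/></root>"
# The start and end tags differ in case, so they do not match each other.
MISMATCHED_CASE = "<root><contact-info></Contact-Info></root>"

if __name__ == "__main__":
    print(child_tags(DISTINCT_NAMES))
    print(child_tags(MISMATCHED_CASE))
```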
XML Attributes:
An attribute specifies a single property for the element, using a name/value pair. An
XML-element can have one or more attributes. For example −
<a href = "http://www.tutorialspoint.com/">Tutorialspoint!</a>
Here href is the attribute name and http://www.tutorialspoint.com/ is the attribute value.
Syntax Rules for XML Attributes
Attribute names in XML (unlike HTML) are case sensitive. That
is, HREF and href are considered two different XML attributes.
The same attribute cannot appear twice in one start tag. The following example shows
incorrect syntax because the attribute b is specified twice −
<a b = "x" c = "y" b = "z">....</a>
Attribute names are written without quotation marks, whereas attribute values
must always appear in quotation marks. The following example demonstrates
incorrect XML syntax −
<a b = x>....</a>
In the above syntax, the attribute value is not defined in quotation marks.
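The three attribute rules above can be tried against a parser. This is a sketch using Python's standard xml.etree.ElementTree module. The helper attributes_or_error is a hypothetical name. It returns the attribute dictionary on success and the parser's message on failure.

```python
import xml.etree.ElementTree as ET


def attributes_or_error(xml_source):
    """Return the root element's attributes, or the parser error as a string."""
    try:
        return ET.fromstring(xml_source).attrib
    except ET.ParseError as err:
        return str(err)


# HREF and href are two different attributes, so both are kept.
CASE_SENSITIVE = '<a HREF = "upper" href = "lower">....</a>'
# The attribute b is specified twice in the same start tag.
DUPLICATE = '<a b = "x" c = "y" b = "z">....</a>'
# The attribute value x is not enclosed in quotation marks.
UNQUOTED = "<a b = x>....</a>"

if __name__ == "__main__":
    for doc in (CASE_SENSITIVE, DUPLICATE, UNQUOTED):
        print(doc, "->", attributes_or_error(doc))
```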
XML References
References usually allow you to add or include additional text or markup in an XML
document. References always begin with the symbol "&" which is a reserved character
and end with the symbol ";". XML has two types of references −
Entity References − An entity reference contains a name between the start and
the end delimiters. For example, &amp; where amp is the name. The name refers to
a predefined string of text and/or markup.
Character References − These contain a hash mark ("#") followed by a number,
as in &#65;. The number always refers to the Unicode
code of a character. In this case, 65 refers to the letter "A".
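To see both kinds of reference being resolved, the sketch below parses a small element with Python's standard xml.etree.ElementTree module. The element name t and the helper resolved_text are hypothetical. The hexadecimal form &#x41; is included as an extra illustration and is not mentioned in the text above.

```python
import xml.etree.ElementTree as ET


def resolved_text(xml_source):
    """Parse the element and return its text with all references resolved."""
    return ET.fromstring(xml_source).text


ENTITY_REFERENCE = "<t>&amp;</t>"       # named reference
CHARACTER_REFERENCE = "<t>&#65;</t>"    # decimal Unicode code 65
HEX_CHARACTER_REFERENCE = "<t>&#x41;</t>"  # the same code in hexadecimal

if __name__ == "__main__":
    print(resolved_text(ENTITY_REFERENCE))
    print(resolved_text(CHARACTER_REFERENCE))
    print(resolved_text(HEX_CHARACTER_REFERENCE))
```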
XML Text
The names of XML elements and XML attributes are case-sensitive, which means the
names in the start and end tags need to be written in the same case. To avoid character
encoding problems, all XML files should be saved as Unicode UTF-8 or UTF-16 files.
Whitespace characters like blanks, tabs and line-breaks between XML elements and
between XML attributes are generally treated as insignificant and can be ignored.
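Some characters are reserved by the XML syntax itself. Hence, they cannot be used
directly. To use them, some replacement-entities are used, which are listed below −
&lt; for < (less than)
&gt; for > (greater than)
&amp; for & (ampersand)
&quot; for " (double quote)
&apos; for ' (apostrophe)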
<note>
<to>Tove</to>
<from>Jani</from>
</note>
Entity References
Some characters have a special meaning in XML.
If you place a character like "<" inside an XML element, it will generate an error
because the parser interprets it as the start of a new element.
To avoid this error, replace the "<" character with the entity reference &lt;.
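When XML is generated by a program, the replacement is usually done by a helper rather than by hand. The sketch below uses escape from Python's standard xml.sax.saxutils module. The function wrap_in_element, the element name message and the sample text are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap_in_element(tag, text):
    """Build '<tag>text</tag>' with reserved characters in text escaped."""
    return f"<{tag}>{escape(text)}</{tag}>"


def parses(xml_source):
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(xml_source)
        return True
    except ET.ParseError:
        return False


RAW = "<message>salary < 1000</message>"           # "<" used directly
ESCAPED = wrap_in_element("message", "salary < 1000")

if __name__ == "__main__":
    print(RAW, "->", parses(RAW))
    print(ESCAPED, "->", parses(ESCAPED))
    print("text read back:", ET.fromstring(ESCAPED).text)
```

The parser turns the entity reference back into the original character, so the application still sees "salary < 1000".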
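In XML, there are no rules about when to use attributes and when
to use child elements. The two examples below provide the same information:
in the first, sex is an attribute; in the second, sex is a child element.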
<person sex="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
<person>
<sex>female</sex>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
My Favorite Way
I like to store data in child elements.
The following two XML documents contain exactly the same
information:
A date attribute is used in the first example:
<note date="12/11/2002">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
A date element is used in the second example:
<note>
<date>12/11/2002</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
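From the point of view of a program reading the note, the two forms differ only in how the date is looked up. The following sketch assumes Python's standard xml.etree.ElementTree module. The function note_date is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NOTE_WITH_ATTRIBUTE = """<note date="12/11/2002">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

NOTE_WITH_ELEMENT = """<note>
<date>12/11/2002</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""


def note_date(xml_source):
    """Read the date from the attribute if present, else from the child element."""
    root = ET.fromstring(xml_source)
    if "date" in root.attrib:
        return root.get("date")
    return root.findtext("date")


if __name__ == "__main__":
    print(note_date(NOTE_WITH_ATTRIBUTE))
    print(note_date(NOTE_WITH_ELEMENT))
```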
Figure: Well-formed XML document and Valid XML document
Well-formed XML Document
An XML document is said to be well-formed if it adheres to the following rules −
Non-DTD XML files must use the predefined character entities
for amp(&), apos(single quote), gt(>), lt(<), quot(double quote). Entities other than
these must be declared.
It must follow the ordering of the tags, i.e., the inner tag must be closed before
closing the outer tag.
Each of its opening tags must have a closing tag or it must be a self-ending tag
(<title>....</title> or <title/>).
An attribute name must appear only once in a start tag, and its value needs to be quoted.
Example
Following is an example of a well-formed XML document −
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes" ?>
<!DOCTYPE address
[
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
The above example is said to be well-formed as −
It defines the type of document. Here, the DOCTYPE declaration names address as the document type.
It includes a root element named address.
Each of the child elements name, company and phone is enclosed in its
self-explanatory tag.
The order of the tags is maintained.
Valid XML Document
If an XML document is well-formed and also conforms to the rules of its associated
Document Type Definition (DTD), then it is said to be a valid XML document. We will
study more about DTD in the chapter XML - DTDs.
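The difference between well-formed and valid shows up when a document breaks its DTD but not the syntax rules. The sketch below is an illustration under assumptions: it uses Python's standard xml.etree.ElementTree module, whose underlying parser checks well-formedness only and does not validate against the DTD. The constant MISSING_PHONE is a hypothetical variant of the address example with the phone element left out.

```python
import xml.etree.ElementTree as ET

# Well-formed, but NOT valid: the DTD requires name, company and phone,
# and the phone element is missing from the document body.
MISSING_PHONE = """<?xml version = "1.0" standalone = "yes" ?>
<!DOCTYPE address
[
   <!ELEMENT address (name,company,phone)>
   <!ELEMENT name (#PCDATA)>
   <!ELEMENT company (#PCDATA)>
   <!ELEMENT phone (#PCDATA)>
]>
<address>
   <name>Tanmay Patil</name>
   <company>TutorialsPoint</company>
</address>"""


def child_names(xml_source):
    """Parse the document (well-formedness check only) and list child tags."""
    return [child.tag for child in ET.fromstring(xml_source)]


if __name__ == "__main__":
    # No error is raised, because this parser does not validate.
    print(child_names(MISSING_PHONE))
```

Checking validity needs a validating parser, which is outside the scope of this sketch.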
Introduction
A well-formed XML document must have a corresponding end tag for all of its start tags.
Nesting of elements within each other in an XML document must be proper. For
example, <tutorial><topic>XML</topic></tutorial> is a correct way of nesting but
<tutorial><topic>XML</tutorial></topic> is not.
Within an element, two attributes must not have the same name. For example, <tutorial
id="001"><topic>XML</topic></tutorial> is right, but <tutorial id="001"
id="w3r"><topic>XML</topic></tutorial> is incorrect.
An XML document can contain only one root element. The root element is present
only once in an XML document and does not appear as a child element within any
other element.
Example of a well-formed XML document
<?xml version="1.0" ?>
<w3resource>
<design>
html
xhtml
css
svg
xml
</design>
<programming>
php
mysql
</programming>
</w3resource>
XML DTD
INTERNAL DTD:
DTD stands for Document Type Definition, and it is used to define the structure
and content of an XML document. An XML document can have an internal DTD
or an external DTD, depending on the needs of the user. This section discusses
the differences between these two types of DTD, including their syntax and
sample usage.
Internal DTD: Internal DTD is a type of Document Type Definition (DTD) in
XML that is written within the XML document itself. It specifies the structure
and rules for the elements and attributes of the XML document. An internal DTD
is enclosed within the <!DOCTYPE> declaration of the XML document and is
defined using a set of predefined keywords and syntax. Internal DTDs are
suitable for smaller XML documents where the complexity of the structure is
not very high. It is easier to maintain and modify the internal DTD as it is part of
the XML document itself.
Syntax:
<!DOCTYPE root_element [
<!ELEMENT element_name (element_content)>
<!ELEMENT another_element_name (another_element_content)>
]>
Example: In this example, we will show the internal DTD.
]>
<customers>
<customer>
<name>Satyam Nayak</name>
<email>[email protected]</email>
<phone>112-123-1234</phone>
</customer>
<customer>
<name>Sonu N</name>
<email>[email protected]</email>
<phone>112-455-9969</phone>
</customer>
</customers>
Output:
Internal DTD
The DTD defines the structure of an XML document that contains customer
information. The XML document contains two customer elements, and each
customer element contains a name, email, and phone element.
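The sketch below shows how a program might read a document that carries an internal DTD, using Python's standard xml.dom.minidom module. Everything in it is an assumption made for illustration: the element declarations are one possible DTD consistent with the description above, not the original declarations, and the email addresses are placeholders at example.com. The function summarize is a hypothetical helper.

```python
from xml.dom import minidom

# Hypothetical internal DTD matching the description: customers holds
# customer elements, each with a name, an email and a phone.
CUSTOMERS_XML = """<?xml version="1.0"?>
<!DOCTYPE customers [
  <!ELEMENT customers (customer+)>
  <!ELEMENT customer (name, email, phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>
<customers>
  <customer>
    <name>Satyam Nayak</name>
    <email>satyam@example.com</email>
    <phone>112-123-1234</phone>
  </customer>
  <customer>
    <name>Sonu N</name>
    <email>sonu@example.com</email>
    <phone>112-455-9969</phone>
  </customer>
</customers>"""


def summarize(xml_source):
    """Return the declared document type name and the customer names."""
    doc = minidom.parseString(xml_source)
    names = [node.firstChild.data for node in doc.getElementsByTagName("name")]
    return doc.doctype.name, names


if __name__ == "__main__":
    print(summarize(CUSTOMERS_XML))
```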
External DTD:
External DTD is a type of Document Type Definition (DTD) in XML that is
located outside of the actual XML document it describes. It can be stored
in a separate file or accessed via a URL, and it defines the structure,
rules, and constraints for the elements and attributes within an XML
document. By using an external DTD, multiple XML documents can share
the same set of rules and constraints, leading to more consistency and
easier maintenance. External DTD can also be updated independently
without having to modify the XML documents themselves.
Syntax:
<!DOCTYPE root_element SYSTEM "DTD_file_name">
Example: In this example, we will show the external DTD
<?xml version="1.0" encoding="UTF-8"?>
<customers>
<customer>
<name>Satyam Nayak</name>
<email>[email protected]</email>
<phone>122-112-1234</phone>
</customer>
<customer>
<name>Sonu N</name>
<email>[email protected]</email>
<phone>112-554-9969</phone>
</customer>
</customers>
Output:
External DTD
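With an external DTD, the document only carries a reference to the DTD file. The sketch below reads that reference with Python's standard xml.dom.minidom module. The file name customers.dtd is a hypothetical placeholder, the document is a shortened version of the example, and the parser is used here only to read the DOCTYPE line, not to load or apply the DTD.

```python
from xml.dom import minidom

# "customers.dtd" is an assumed file name used only for this illustration.
EXTERNAL_XML = """<?xml version="1.0"?>
<!DOCTYPE customers SYSTEM "customers.dtd">
<customers>
  <customer>
    <name>Satyam Nayak</name>
    <phone>122-112-1234</phone>
  </customer>
</customers>"""


def doctype_reference(xml_source):
    """Return (root element name, system identifier) from the DOCTYPE."""
    doctype = minidom.parseString(xml_source).doctype
    return doctype.name, doctype.systemId


if __name__ == "__main__":
    print(doctype_reference(EXTERNAL_XML))
```

Because the rules live in a separate file, several documents can point at the same system identifier and share one set of declarations.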
Seen from a DTD point of view, XML documents are made up of the following building blocks:
Elements
Attributes
Entities
PCDATA
CDATA
Elements
Elements are the main building blocks of both XML and HTML documents.
Examples of HTML elements are "body" and "table". Examples of XML elements
could be "note" and "message". Elements can contain text, other elements, or
be empty. Examples of empty HTML elements are "hr", "br" and "img".
Examples:
<body>some text</body>
<message>some text</message>
Attributes
Attributes provide extra information about elements.
Attributes are always placed inside the opening tag of an element. Attributes
always come in name/value pairs. The following "img" element has additional
information about a source file:
<img src="computer.gif" />
The name of the element is "img". The name of the attribute is "src". The value
of the attribute is "computer.gif". Since the element itself is empty it is closed
by a " /".
Entities
Some characters have a special meaning in XML, like the less than sign (<) that
defines the start of an XML tag.
Most of you know the HTML entity: "&nbsp;". This "no-breaking-space" entity is
used in HTML to insert an extra space in a document. Entities are expanded
when a document is parsed by an XML parser.
The following entities are predefined in XML:
&lt; <
&gt; >
&amp; &
&quot; "
&apos; '
PCDATA
PCDATA means parsed character data.
Think of character data as the text found between the start tag and the end tag
of an XML element.
Tags inside the text will be treated as markup and entities will be expanded.
However, parsed character data should not contain any &, <, or > characters;
these need to be represented by the &amp; &lt; and &gt; entities, respectively.
CDATA
CDATA means character data.
CDATA is text that will NOT be parsed by a parser. Tags inside the text will
NOT be treated as markup and entities will not be expanded.
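The contrast between parsed and unparsed text can be seen with a CDATA section inside a document. This is a sketch under assumptions: it uses Python's standard xml.etree.ElementTree module and a CDATA section (the <![CDATA[ ... ]]> form) as the example of unparsed text. The element name message and the helper text_and_children are hypothetical.

```python
import xml.etree.ElementTree as ET


def text_and_children(xml_source):
    """Return the root element's text and its number of child elements."""
    root = ET.fromstring(xml_source)
    return root.text, len(root)


# Parsed character data: the entity is expanded and <b> becomes a child element.
PARSED = "<message>a &lt; b <b>bold</b></message>"
# CDATA section: the entity and the tag are both kept as literal text.
UNPARSED = "<message><![CDATA[a &lt; b <b>bold</b>]]></message>"

if __name__ == "__main__":
    print(text_and_children(PARSED))
    print(text_and_children(UNPARSED))
```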