Chapter 5 XML


CHAPTER FIVE

eXtensible Markup Language (XML)

Why XML?

Although HTML is widely used for formatting and structuring Web documents, it is not
suitable for specifying structured data that is extracted from databases.

A new language, namely XML, has emerged as the standard for structuring and
exchanging data over the Web.

What is XML?

• XML stands for eXtensible Markup Language.
• A markup language is used to provide information about a document.
• Tags are added to the document to provide the extra information.
• XML tags give a reader some idea of what the data means.
• XML and HTML have a similar syntax; both are derived from SGML.

The Basic Rules

• XML is case sensitive.
• All start tags must have end tags.
• Elements must be properly nested.
• The XML declaration, if present, must be the first statement.
• Every document must contain exactly one root element.
• Attribute values must be enclosed in quotation marks.
• Certain characters, such as < and &, are reserved for parsing and must be escaped
  (for example as &lt; and &amp;).
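
The rules above can be checked mechanically. The following is a minimal sketch, assuming Python and its standard xml.etree.ElementTree parser, which raises ParseError for any input that is not well-formed. The helper name is_well_formed and the sample strings are illustrative assumptions, not part of the original slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One sample per rule; only the first and last are well-formed.
samples = {
    "ok": '<note lang="en"><to>Ali</to></note>',
    "case mismatch": "<note></Note>",
    "missing end tag": "<note><to>Ali</note>",
    "bad nesting": "<a><b></a></b>",
    "unquoted attribute": "<note lang=en></note>",
    "two roots": "<a></a><b></b>",
    "raw ampersand": "<a>Tom & Jerry</a>",
    "escaped ampersand": "<a>Tom &amp; Jerry</a>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

A parser that stops at the first violation, as this one does, reports only whether a document is well-formed; it says nothing about whether the document follows a DTD or schema.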

Encoding

• XML uses Unicode to encode characters.
• Unicode comes in many flavors (encodings).
• The most common one used in the West is UTF-8.
• UTF-8 is a variable-length code. Each character is encoded in 1, 2, 3, or 4 bytes.
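
UTF-8 uses between one and four bytes per character, depending on the code point. A small sketch in Python (the four sample characters are chosen here only for illustration) makes the variable length visible:

```python
# Four characters from different Unicode ranges, written as escapes so the
# source file itself stays plain ASCII.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, emoji

# Encode each character on its own and count the resulting bytes.
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in lengths.items():
    print(f"U+{ord(ch):04X} -> {n} byte(s)")
```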

Example:
<?xml version="1.0"?>
<address>
  <name>
    <first>Mohammed</first>
    <last>Ali</last>
  </name>
  <email>[email protected]</email>
  <phone>05278743</phone>
  <birthday>
    <year>2001</year>
    <month>01</month>
    <day>09</day>
  </birthday>
</address>

XML Files are Trees

• An XML document has a single root node.
• A preorder traversal is usually used to visit the nodes.

address
    name
        first
        last
    email
    phone
    birthday
        year
        month
        day
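
A preorder traversal visits each node before its children, which is the same order in which the start tags appear in the file. The sketch below assumes Python's standard xml.etree.ElementTree; the function name preorder and the placeholder email address are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The address document from the example; the email value is a placeholder.
ADDRESS_XML = """<address>
  <name><first>Mohammed</first><last>Ali</last></name>
  <email>user@example.com</email>
  <phone>05278743</phone>
  <birthday><year>2001</year><month>01</month><day>09</day></birthday>
</address>"""


def preorder(elem, depth=0):
    """Yield (depth, tag) pairs, visiting each node before its children."""
    yield depth, elem.tag
    for child in elem:
        yield from preorder(child, depth + 1)


root = ET.fromstring(ADDRESS_XML)
visited = list(preorder(root))

# Print the tree with one indentation level per depth.
for depth, tag in visited:
    print("  " * depth + tag)
```

ElementTree's own root.iter() walks the elements in the same document order, so the explicit recursion is only needed when the depth is wanted as well.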

HTML vs XML

HTML                                   XML
• Fixed set of tags                    • Extensible set of tags
• Presentation oriented                • Content oriented
• No data validation capabilities      • Standard data infrastructure
• Single presentation                  • Allows multiple output forms
• Tags are used for display            • Tags are used to describe documents and data
Validation
• A well-formed document has a tree structure and obeys all the XML rules.
• A particular application may add more rules in either a DTD (document type
  definition) or a schema. A well-formed document that also satisfies those rules
  is called valid.
• Many specialized DTDs and schemas have been created to describe particular areas.
• DTDs were developed first, so they are not as comprehensive as schemas.

DTD: Document Type Definition
• A DTD describes the tree structure of a document and something about its data.
• There are two data types, PCDATA and CDATA.
    • PCDATA is parsed character data.
    • CDATA is character data, not usually parsed.
• A DTD determines how many times a node may appear, and how child nodes are ordered.

DTD for address Example

<!ELEMENT address (name, email, phone, birthday)>
<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT birthday (year, month, day)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT day (#PCDATA)>
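
To make the meaning of these declarations concrete, the sketch below mirrors them in a Python dictionary and checks a parsed document against it. This is a hand-written illustration, not a real DTD validator (Python's standard library does not include one). The names CONTENT_MODEL and conforms and the placeholder email address are assumptions, and the check looks only at child elements, ignoring text and whitespace.

```python
import xml.etree.ElementTree as ET

# Each entry mirrors one <!ELEMENT ...> declaration: a list means
# "exactly these children, in this order"; None stands for (#PCDATA).
CONTENT_MODEL = {
    "address": ["name", "email", "phone", "birthday"],
    "name": ["first", "last"],
    "birthday": ["year", "month", "day"],
    "first": None, "last": None, "email": None, "phone": None,
    "year": None, "month": None, "day": None,
}


def conforms(elem):
    """Recursively compare an element's children with CONTENT_MODEL (hypothetical helper)."""
    if elem.tag not in CONTENT_MODEL:
        return False                      # undeclared element
    expected = CONTENT_MODEL[elem.tag]
    children = list(elem)
    if expected is None:                  # #PCDATA: no child elements allowed
        return not children
    if [c.tag for c in children] != expected:
        return False                      # wrong children or wrong order
    return all(conforms(c) for c in children)


good = ET.fromstring(
    "<address><name><first>Mohammed</first><last>Ali</last></name>"
    "<email>user@example.com</email><phone>05278743</phone>"
    "<birthday><year>2001</year><month>01</month><day>09</day></birthday>"
    "</address>"
)
# Same content, but email comes before name, which the sequence forbids.
bad = ET.fromstring(
    "<address><email>user@example.com</email>"
    "<name><first>Mohammed</first><last>Ali</last></name>"
    "<phone>05278743</phone>"
    "<birthday><year>2001</year><month>01</month><day>09</day></birthday>"
    "</address>"
)

print(conforms(good), conforms(bad))
```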

Schemas

• Schemas are themselves XML documents.
• They were standardized after DTDs and provide more information about the document.
• They have a number of data types including string, decimal, integer, boolean,
  date, and time.
• They divide elements into simple and complex types.
• They also determine the tree structure and how many children a node may have.

Schema for First address Example

This schema describes a flat form of the address document, in which name is a single
string and birthday is a single xs:date value (such as 2001-01-09), rather than the
nested name and birthday elements of the earlier example.

<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="email" type="xs:string"/>
        <xs:element name="phone" type="xs:string"/>
        <xs:element name="birthday" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
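
The built-in types named in a schema constrain the text of each element. As a rough illustration only (the Python standard library has no XML Schema validator), the sketch below parses a flat address document of the kind this schema describes and converts the birthday text with date.fromisoformat, which accepts the common YYYY-MM-DD form of xs:date. The sample values, the CONVERTERS table, and the function load_address are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from datetime import date

# A flat address document; all values are placeholders.
FLAT_ADDRESS = """<address>
  <name>Mohammed Ali</name>
  <email>user@example.com</email>
  <phone>05278743</phone>
  <birthday>2001-01-09</birthday>
</address>"""

# Hypothetical mapping from child element name to a Python converter,
# following the order and types declared in the xs:sequence.
CONVERTERS = [
    ("name", str),
    ("email", str),
    ("phone", str),
    ("birthday", date.fromisoformat),
]


def load_address(text):
    """Parse a flat address and convert each field to its declared type."""
    root = ET.fromstring(text)
    tags = [child.tag for child in root]
    if root.tag != "address" or tags != [name for name, _ in CONVERTERS]:
        raise ValueError("children do not match the declared sequence")
    # A converter raises ValueError when the text does not fit the type.
    return {name: convert(root.findtext(name)) for name, convert in CONVERTERS}


record = load_address(FLAT_ADDRESS)
print(record["birthday"].year)
```

Keeping phone as a string rather than an integer preserves the leading zero, which is one reason the schema declares it as xs:string.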

XSLT
Extensible Stylesheet Language Transformations

• XSLT is used to transform one XML document into another, often an HTML document.
• An XSLT processor takes one XML document as input, applies the style sheet, and
  produces another document as output.
• If the resulting document is in HTML, it can be viewed by a web browser.
• This is a good way to display XML data.

A Style Sheet to Transform address.xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="address">
    <html><head><title>Address Book</title></head>
    <body>
      <xsl:value-of select="name"/>
      <br/><xsl:value-of select="email"/>
      <br/><xsl:value-of select="phone"/>
      <br/><xsl:value-of select="birthday"/>
    </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

Result:
AMU MCA
[email protected]
123-45-6789
1920-01-09

Parsers

• There are two principal models for parsers.
• SAX (Simple API for XML)
    • Uses a call-back method
    • Similar to Java event listeners
• DOM (Document Object Model)
    • Creates a parse tree
    • Requires a tree traversal
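
The difference between the two models shows up clearly in code. The sketch below uses Python's standard xml.sax and xml.dom.minidom modules to collect the element names of a small document both ways; the class name TagCollector, the function walk, and the sample document are assumptions made for the example.

```python
import xml.sax
from xml.dom import minidom

XML_TEXT = (
    "<address><name><first>Mohammed</first><last>Ali</last></name>"
    "<phone>05278743</phone></address>"
)


class TagCollector(xml.sax.ContentHandler):
    """SAX handler: the parser calls startElement for every start tag it reads."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


# SAX: the document is streamed through the callbacks; no tree is kept.
handler = TagCollector()
xml.sax.parseString(XML_TEXT.encode("utf-8"), handler)
sax_tags = handler.tags


def walk(node, out):
    """DOM: traverse the finished parse tree in preorder."""
    if node.nodeType == node.ELEMENT_NODE:
        out.append(node.tagName)
    for child in node.childNodes:
        walk(child, out)
    return out


# DOM: the whole document is parsed into a tree first, then traversed.
dom = minidom.parseString(XML_TEXT)
dom_tags = walk(dom.documentElement, [])

print(sax_tags)
print(dom_tags)
```

Both approaches see the elements in the same order. SAX needs little memory because nothing is stored unless the handler stores it, while DOM keeps the full tree available for repeated or random access.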

Advantages of XML

• XML uses human, not computer, language. XML is readable and understandable, even
  by novices, and no more difficult to code than HTML.
• XML is platform independent and programming language independent, so it can be
  used on any system and continues to work when the underlying technology changes.
• XML is text (Unicode) based.
    • Takes up less space.
    • Can be transmitted efficiently.
• One XML document can be displayed differently in different media.
    • HTML, video, CD, DVD
    • You only have to change the XML document in order to change all the rest.
• XML documents can be modularized. Parts can be reused.
Disadvantages of XML

• More difficult, demanding, and precise than HTML.
• Lack of browser support and end-user applications.
• The redundancy in XML syntax causes higher storage and transmission costs when
  the volume of data is large.
• XML files are usually very large because of the format's verbose nature; the
  actual size depends heavily on who writes the document.
