Unit 4 XML
Dr Anupama Jha
Introduction : XML
XML stands for Extensible Markup Language. It is a simpler, easier-to-use text-based
markup language derived from Standard Generalized Markup Language (SGML), the
standard for how to create a document structure.
It is a language used to create other markup languages to describe data in a structured
format.
It is an approach for describing, capturing, processing and publishing information.
XML is "extensible" because we can use it to make up our own tags, i.e. user-defined tags;
hence XML is designed to be self-descriptive.
XML 1.0 was officially adopted as a W3C recommendation in 1998.
XML was designed to carry data, not to display data.
XML supports CSS, XSL, and DOM.
XML does not qualify as a programming language because it does not perform any
computation or express algorithms.
It is usually stored in a simple text file and is processed by special software that is capable
of interpreting XML.
Features/Characteristics of XML
XML documents are highly portable.
One important characteristic of XML is that it is both human-readable and
machine-readable.
Processing an XML document requires a software program called an XML parser or
processor.
Parsers check an XML document's syntax and enable software programs to process the
marked-up data.
XML parsers support the DOM.
An XML document can reference a DTD (Document Type Definition) or a Schema that
describes the proper structure of the XML document.
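As a rough illustration of a parser exposing a document through the DOM, the sketch below uses Python's standard xml.dom.minidom module. The choice of Python, the function name child_element_names and the sample document are assumptions made for illustration; they are not part of the slides.

```python
from xml.dom import minidom

# Sample document (hypothetical data, used only for illustration).
SAMPLE = "<note><to>Joe</to><from>Smith</from></note>"

def child_element_names(xml_text):
    """Parse the text into a DOM tree and return the root's child element names."""
    dom = minidom.parseString(xml_text)
    root = dom.documentElement
    # Keep element nodes only; text and comment nodes are skipped.
    return [n.tagName for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]

print(child_element_names(SAMPLE))
```

The parser does the syntax checking; the program only walks the resulting tree.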
XML Based system
If an XML parser can process an XML document successfully, the XML document is
well-formed. If the XML document also conforms to its DTD/Schema, the XML document is
valid. Hence, every valid document is well-formed.
[Figure: XML-based system. Components shown: XML document, XML DTD, XML parser (or XML processor), application.]
The Difference between XML and HTML
1. HTML is about displaying information, whereas XML is about carrying information. In other
words, XML was created to structure, store, and transport information; HTML was designed to
display data.
2. Using XML, we can create our own tags, whereas HTML does not allow this and instead offers
a set of built-in tags.
3. XML is platform-neutral and language-independent.
4. XML tag and attribute names are case-sensitive, whereas in HTML they are not.
5. XML attribute values must be single- or double-quoted, whereas in HTML quoting is not compulsory.
6. XML elements must be properly nested.
7. All XML elements must have a closing tag.
Well Formed XML Documents
A "Well Formed" XML document must follow these syntax rules:
XML documents must have a root element
XML elements must have a closing tag (every start tag must have a matching end tag)
XML tags are case sensitive
XML elements must be properly nested. Ex: <one><two>Hello</two></one>
XML attribute values must be quoted
XML with correct syntax is "Well Formed" XML. XML validated against a DTD is "Valid"
XML.
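The rules above can be tried out with any XML parser. The sketch below assumes Python's standard xml.etree.ElementTree module, which is a non-validating parser: it checks well-formedness only, not validity against a DTD. The helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<one><two>Hello</two></one>"))  # properly nested
print(is_well_formed("<one><two>Hello</one></two>"))  # badly nested
print(is_well_formed("<one>Hello</One>"))             # case mismatch in end tag
print(is_well_formed("<one id=1>Hello</one>"))        # unquoted attribute value
```

Only the first document is accepted; each of the others breaks one of the listed rules.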
XML Naming Rules
XML elements must follow these naming rules:
• Element names are case-sensitive
• Element names must start with a letter or underscore
• Element names cannot start with the letters xml (or XML, or Xml, etc.)
• Element names can contain letters, digits, hyphens, underscores, and periods
• Element names cannot contain spaces
Any name can be used; no words are reserved (except xml).
XML is a markup language that defines a set of rules for encoding documents in a format
that is both human-readable and machine-readable.
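The naming rules listed above can be approximated with a regular expression. This is a minimal sketch in Python: it is an ASCII-only simplification (the actual XML grammar also permits many non-ASCII letters), and the function name is_valid_element_name is hypothetical.

```python
import re

# ASCII-only simplification of the naming rules listed above:
# first character is a letter or underscore, the rest may be
# letters, digits, hyphens, underscores, or periods.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def is_valid_element_name(name):
    """Check a candidate element name against the simplified rules."""
    if name.lower().startswith("xml"):
        return False  # names beginning with xml (any case) are reserved
    return _NAME.fullmatch(name) is not None

for candidate in ["note", "first-name", "_id", "1st", "first name", "XmlData"]:
    print(candidate, is_valid_element_name(candidate))
```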
Example of an XML document (x1.xml):
<?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Joe</to>
<from>Smith</from>
<heading>Reminder</heading>
<body>Hello world!</body>
</note>
The XML document begins with the XML declaration statement: <?xml version="1.0" encoding="UTF-8"?>
The next line describes the root element of the document: <note>
This element is "the parent" of all other elements.
The next 4 lines describe 4 child elements of the root: to, from, heading, and body.
And finally the last line defines the end of the root element: </note>
The XML declaration has no closing tag, i.e. there is no </?xml>.
The file name extension used for an XML document is .xml.
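To see the root/child structure described above from a program's point of view, here is a small sketch that assumes Python's standard xml.etree.ElementTree module. The document text is the note example from this slide; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The note document from the example, as bytes so that the
# encoding named in the XML declaration is honoured by the parser.
NOTE = b"""<?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Joe</to>
<from>Smith</from>
<heading>Reminder</heading>
<body>Hello world!</body>
</note>"""

root = ET.fromstring(NOTE)
print("root element:", root.tag)
for child in root:
    # Each child is one of the four elements nested inside <note>.
    print("child:", child.tag, "->", child.text)
```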
Valid XML document
If an XML document is well-formed and conforms to its associated Document Type
Definition (DTD), then it is said to be a valid XML document.
DTDs are covered in more detail under XML DTD.
XML DTD
The purpose of a Document Type Definition (DTD) is to define the structure of an XML document.
It defines the structure with a list of the elements allowed in the XML document.
Using a DTD we can specify the various element types, their attributes and their
relationships with one another.
Basically, a DTD is used to specify the set of rules for structuring data in an XML file.
Why use a DTD?
DTD - XML building blocks
1. Elements:
The basic building block is the element. Elements are used for defining the tags. An element typically
consists of an opening and a closing tag. Usually a single element declaration defines a single tag.
Elements with children (sequences)
Elements with one or more children are declared with the names of the child elements
inside parentheses, for example: <!ELEMENT note (to, from, heading, body)>
Elements with children (sequences) contd…
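When children are declared in a sequence separated by commas, the children must appear
in the same sequence in the document. In a full declaration, the children must also be
declared, and the children can also have children. The full declaration of the note
document will be:
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>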
2. Tags
Tags are used to mark up elements. A starting tag like <element_name> marks the
beginning of an element, and an ending tag like </element_name> marks the end of an
element.
Examples:
A body element: <body>body text in between</body>
A message element: <message>some message in between</message>
3. Attributes
Attributes are generally used to specify additional values for an element. Attribute values
are specified within quotes. Ex:
<company name="TCS">welcome</company>
[Figure: attribute declaration examples for #FIXED and #IMPLIED, including an invalid XML case.]
#FIXED: Use the #FIXED keyword when you want an attribute to have a fixed (constant) value
without allowing the author to change it. If an author includes another value, the XML parser will
return an error.
#IMPLIED: In this example we have used the keyword #IMPLIED as we do not want to require any
attribute to be included in the element; it is optional. In other words, if the attribute you are
declaring has no default value, has no fixed value, and is not required, then you must declare the
attribute as implied.
4. Entities
Entities are variables used to define common text. Entity references are references to
entities.
Most of you will know the HTML entity reference "&nbsp;", which is used to insert an
extra space in an HTML document. Entities are expanded when a document is parsed by an
XML parser.
The following entities are predefined in XML:
&lt; (<), &gt; (>), &amp; (&), &quot; (") and &apos; (')
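A short sketch of entity expansion, assuming Python's standard xml.etree.ElementTree parser and the escape helper from xml.sax.saxutils. The entity name org and its replacement text "Example Org" are made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Predefined entities are expanded by the parser.
parsed = ET.fromstring("<m>5 &lt; 7 &amp; 7 &gt; 5</m>")
print(parsed.text)

# A user-defined entity (hypothetical name "org") declared in an
# internal DTD acts like a variable holding common text.
DOC = '<!DOCTYPE m [<!ENTITY org "Example Org">]><m>Sent by &org;</m>'
custom = ET.fromstring(DOC)
print(custom.text)

# Going the other way: escape markup characters before writing text into XML.
print(escape("a < b & c"))
```

The parser hands the application the expanded text, so the program never sees the entity references themselves.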
CDATA and PCDATA
CDATA: It stands for character data. CDATA is text that will NOT be parsed by a parser.
Tags inside the text will NOT be treated as markup and entities will not be expanded.
PCDATA: It stands for Parsed Character Data (i.e., text). Parsed character data should
not contain the markup characters <, > or & directly. If we want to
use these characters, we use &lt;, &gt; or &amp; instead. Think of character data as the
text found between the start tag and the end tag of an XML element. PCDATA is text that
will be parsed by a parser. Tags inside the text will be treated as markup and entities will
be expanded.
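The difference can be observed with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module and uses a CDATA section (<![CDATA[ ... ]]>) in the document body to show unparsed text next to ordinary parsed text; the element names c and b are arbitrary.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section, tags and entity references are kept as literal text.
cdata = ET.fromstring("<c><![CDATA[<b>x</b> &amp;]]></c>")
print(repr(cdata.text), "child elements:", len(cdata))

# The same characters as parsed character data: <b> becomes a child
# element and &amp; is expanded to a single ampersand.
pcdata = ET.fromstring("<c><b>x</b> &amp;</c>")
print(pcdata[0].tag, repr(pcdata[0].tail), "child elements:", len(pcdata))
```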
<!DOCTYPE note
[
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
Here PCDATA refers to parsed character data. In the XML document above, the elements to, from, heading
and body carry text, so these elements are declared in the DTD as containing text.
When the definitions are kept in a separate file, that file is stored with the .dtd extension.
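Python's standard xml.etree.ElementTree parser is non-validating, so it does not enforce a DTD. As a hand-written stand-in for the sequence rule in <!ELEMENT note (to, from, heading, body)>, the sketch below simply compares the order of the child elements with the declared order. It is an illustration of the rule, not a DTD validator, and the function name follows_sequence is hypothetical.

```python
import xml.etree.ElementTree as ET

# Child order taken from <!ELEMENT note (to, from, heading, body)>.
EXPECTED_ORDER = ["to", "from", "heading", "body"]

def follows_sequence(xml_text, expected=EXPECTED_ORDER):
    """Return True if the root's children appear exactly in the declared order."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root] == expected

GOOD = ("<note><to>Joe</to><from>Smith</from>"
        "<heading>Reminder</heading><body>Hello world!</body></note>")
# Same content with <from> and <to> swapped, which breaks the declared sequence.
BAD = ("<note><from>Smith</from><to>Joe</to>"
       "<heading>Reminder</heading><body>Hello world!</body></note>")

print(follows_sequence(GOOD))
print(follows_sequence(BAD))
```

Both documents are well-formed; only the first would be valid against the DTD.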
Types of DTD:
1. Internal DTD and 2. External DTD
1. Internal DTD
A DTD is referred to as an internal DTD if the elements are declared within the XML file itself.
To use an internal DTD, the standalone attribute in the XML declaration must be set to yes. This means the
declaration works independently of any external source.
Syntax:
<!DOCTYPE root-element [element-declarations]>
where root-element is the name of the root element and element-declarations is where you declare the elements.
Example: Internal DTD
<!DOCTYPE address [
<!ELEMENT address (name, company, phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
Start Declaration - The DOCTYPE declaration has an exclamation mark (!) at the start of the
element name. The DOCTYPE informs the parser that a DTD is associated with this XML document.
DTD Body - The DOCTYPE declaration is followed by the body of the DTD, where you declare
elements, attributes, entities, and notations. Several elements are declared here that make up
the vocabulary of the <address> document. <!ELEMENT name (#PCDATA)> defines the element
name to be of type "#PCDATA". Here #PCDATA means parse-able text data.
End Declaration - Finally, the declaration section of the DTD is closed using a closing
bracket and a closing angle bracket (]>). This effectively ends the definition, and
thereafter, the XML document follows immediately.
Rules
The document type declaration must appear at the start of the document (preceded only by
the XML header) — it is not permitted anywhere else within the document.
Similar to the DOCTYPE declaration, the element declarations must start with an
exclamation mark.
The Name in the document type declaration must match the element type of the root
element.
2. External DTD
XML document:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note SYSTEM "extdtd.dtd">
<note>
<to>Joe</to>
<from>Smith</from>
<heading>Reminder</heading>
<body>Hello World!</body>
</note>
File extdtd.dtd
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
XML example
<?xml version="1.0" encoding="UTF-8"?>
<books>
<heading> web technology </heading>
<book>
<title> WT</title>
<author> Deitel</author>
<ISBN>123-456-789</ISBN>
<publisher>wiley</publisher>
<edition>3</edition>
<price>350</price>
</book>
<book>
<title> internet worldwideweb</title>
<author> ditel&amp;ditel</author>
<ISBN>123-456-781</ISBN>
<publisher>pearson</publisher>
<edition>3</edition>
<price>450</price>
</book>
</books>
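Once a document like this is well-formed, a program can pull data out of it. The sketch below assumes Python's standard xml.etree.ElementTree module and a shortened copy of the books document (only title and price are kept); the function name titles_and_total is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the books document above (title and price only).
BOOKS = """<books>
<heading> web technology </heading>
<book><title> WT</title><price>350</price></book>
<book><title> internet worldwideweb</title><price>450</price></book>
</books>"""

def titles_and_total(xml_text):
    """Return the list of book titles and the sum of all prices."""
    root = ET.fromstring(xml_text)
    books = root.findall("book")
    titles = [b.findtext("title").strip() for b in books]
    total = sum(int(b.findtext("price")) for b in books)
    return titles, total

print(titles_and_total(BOOKS))
```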
XML with CSS
XML document:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="rule.css"?>
<books>
<heading> web technology </heading>
<book>
<title> WT</title>
<author> Deitel</author>
<ISBN>123-456-789</ISBN>
<publisher>wiley</publisher>
<edition>3</edition>
<price>350</price>
</book>
<book>
<title> internet worldwideweb</title>
<author> ditel&amp;ditel</author>
<ISBN>123-456-781</ISBN>
<publisher>pearson</publisher>
<edition>3</edition>
<price>450</price>
</book>
</books>
rule.css
books
{
background-color: blue;
width: 100%;
}
heading
{
color: green;
font-size: 40px;
background-color: yellow;
}
heading, title, author, publisher, edition, price
{
color: green;
}
title
{
font-size: 25px;
font-weight: bold;
}
XML SCHEMA
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="children">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title lang="en">XQuery Kick Start</title>
<author>James McGovern</author>
<author>Per Bothner</author>
<author>Kurt Cagle</author>
<author>James Linn</author>
<author>Vaidyanathan Nagarajan</author>
<year>2003</year>
<price>49.99</price>
</book>
<book category="web" cover="paperback">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
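The bookstore document uses attributes (category, lang, cover) alongside child elements. As a small sketch of reading them, assuming Python's standard xml.etree.ElementTree module and a shortened copy of the document, the hypothetical helper titles_in_category selects books by attribute value:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the bookstore document (titles only).
BOOKSTORE = """<bookstore>
<book category="cooking"><title lang="en">Everyday Italian</title></book>
<book category="web"><title lang="en">XQuery Kick Start</title></book>
<book category="web" cover="paperback"><title lang="en">Learning XML</title></book>
</bookstore>"""

def titles_in_category(xml_text, category):
    """Return the titles of all <book> elements with the given category attribute."""
    root = ET.fromstring(xml_text)
    books = root.findall(f"book[@category='{category}']")
    return [b.findtext("title") for b in books]

print(titles_in_category(BOOKSTORE, "web"))

# Attributes are read with get(); a missing attribute yields None.
all_books = ET.fromstring(BOOKSTORE).findall("book")
print([b.get("cover") for b in all_books])
```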
XML Namespaces
Name Conflicts
Solving the Name Conflict Using a Prefix
In the example above, there will be no conflict because the two <table>
elements have different names.
XML Namespaces - The xmlns Attribute
When using prefixes in XML, a namespace for the prefix must be defined.
The namespace can be defined by an xmlns attribute in the start tag of an
element. The namespace declaration has the following syntax:
xmlns:prefix="URI"
Namespaces can be declared in the XML element where they are used.
A Uniform Resource Identifier (URI) is a string of characters
which identifies an Internet resource. The most common URI
is the Uniform Resource Locator (URL), which identifies an
Internet domain address.
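A minimal sketch of prefixes and xmlns declarations at work, assuming Python's standard xml.etree.ElementTree module. The prefixes h and f, the example.com URIs and the element contents are placeholders chosen for illustration; they are not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Two different <table> elements, told apart by their namespace prefixes.
# The URIs are placeholders and do not need to point at real resources.
DOC = """<root xmlns:h="http://example.com/html"
      xmlns:f="http://example.com/furniture">
<h:table><h:td>Apples</h:td></h:table>
<f:table><f:name>Coffee Table</f:name></f:table>
</root>"""

# Prefix-to-URI mapping used when searching; it mirrors the xmlns attributes.
NS = {"h": "http://example.com/html", "f": "http://example.com/furniture"}

root = ET.fromstring(DOC)
html_table = root.find("h:table", NS)
furniture_table = root.find("f:table", NS)

# The parser stores each name together with its namespace URI.
print(html_table.tag)
print(furniture_table.tag)
```

The parser identifies elements by the URI rather than by the prefix, so the two table elements never collide.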