
Unit 4 XML

The document provides an introduction to XML (Extensible Markup Language). It states that XML is a markup language used to describe data in a structured format. It describes some key features of XML including that it is extensible, human and machine readable, uses tags to structure documents, and is commonly stored in text files. The document then discusses several aspects of XML such as its syntax, elements, tags, attributes, parsers, and differences between XML and HTML.

Uploaded by

Priyanshu Mohta

Unit – IV

XML(Extensible Markup Language)

Dr Anupama Jha
Introduction : XML
 XML stands for Extensible Markup Language. It is a simple, easy-to-use, text-based markup language derived from the Standard Generalized Markup Language (SGML), the standard for defining document structure.
 It is a language used to create other markup languages that describe data in a structured format.
 It is an approach for describing, capturing, processing and publishing information.
 XML is "extensible" because we can use it to make up our own tags, i.e. user-defined tags; hence XML is designed to be self-descriptive.
 XML 1.0 was officially adopted as a W3C Recommendation in 1998.
 XML was designed to carry data, not to display data.
 XML supports CSS, XSL and DOM.
 XML does not qualify as a programming language because it does not perform any computation or algorithms.
 It is usually stored in a simple text file and is processed by special software that is capable of interpreting XML.

Features/Characteristics of XML
 XML documents are highly portable.
 One important characteristic of XML is that it is both human readable and machine readable.
 Processing an XML document requires a software program called an XML parser or processor.
 Parsers check an XML document's syntax and enable software programs to process marked-up data.
 XML parsers support the DOM.
 An XML document can reference a DTD (Document Type Definition) or a Schema that describes the proper structure of the XML document.

XML Based system
If an XML parser can process an XML document successfully, the document is well-formed. If the document also conforms to its DTD/Schema, it is valid. Hence, every valid document is also well-formed.

[Figure: XML document, XML DTD, XML parser (or XML processor), application]

The Difference between XML and HTML
1. HTML is about displaying information, whereas XML is about carrying information. In other words, XML was created to structure, store, and transport information, while HTML was designed to display data.
2. In XML we can create our own tags, whereas HTML offers only a fixed set of built-in tags.
3. XML is platform independent and language independent.
4. XML tag and attribute names are case-sensitive, whereas HTML names are not.
5. XML attribute values must be enclosed in single or double quotes, whereas in HTML quoting is not compulsory.
6. XML elements must be properly nested.
7. All XML elements must have a closing tag.

Well Formed XML Documents
A "well-formed" XML document must follow these syntax rules:
 XML documents must have a root element.
 XML elements must have a closing tag (every start tag must have a matching end tag).
 XML tags are case sensitive.
 XML elements must be properly nested. Ex: <one><two>Hello</two></one>
 XML attribute values must be quoted.
 XML with correct syntax is "well-formed" XML. XML validated against a DTD is "valid" XML.
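
The well-formedness rules can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the helper name is_well_formed is our own and not part of any library. This parser checks well-formedness only; it does not validate a document against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One properly nested document followed by three rule violations.
samples = [
    "<one><two>Hello</two></one>",   # properly nested
    "<one><two>Hello</one></two>",   # badly nested
    "<one>Hello</One>",              # start/end tag differ in case
    "<one id=1>Hello</one>",         # attribute value not quoted
]
for sample in samples:
    print(sample, "->", is_well_formed(sample))
```

Only the first sample is accepted; each of the others breaks one of the rules listed on this slide.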

XML Naming Rules
XML elements must follow these naming rules:
•Element names are case-sensitive
•Element names must start with a letter or underscore
•Element names cannot start with the letters xml (or XML, or Xml, etc)
•Element names can contain letters, digits, hyphens, underscores, and periods
•Element names cannot contain spaces
Any name can be used, no words are reserved (except xml).
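
As an illustration, the naming rules can be expressed as a small check. This is a simplified sketch that only considers ASCII letters (real XML names also allow many non-English letters); the function name is_valid_element_name is hypothetical.

```python
import re

# ASCII-only simplification: start with a letter or underscore, then
# letters, digits, hyphens, underscores or periods. No spaces allowed.
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_valid_element_name(name: str) -> bool:
    """Apply the slide's naming rules to a candidate element name."""
    if name.lower().startswith("xml"):
        return False  # names beginning with xml/XML/Xml are reserved
    return _NAME_PATTERN.fullmatch(name) is not None


for candidate in ["book_title", "first-name", "_id", "1st", "xmlData", "first name"]:
    print(candidate, "->", is_valid_element_name(candidate))
```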

Best Naming Practices


Create descriptive names, like this: <person>, <firstname>, <lastname>.
Create short and simple names, like this: <book_title> not like this: <the_title_of_the_book>.
Avoid "-". If you name something "first-name", some software may think you want to subtract
"name" from "first".
Avoid ".". If you name something "first.name", some software may think that "name" is a
property of the object "first".
Avoid ":". Colons are reserved for namespaces (more later).
Non-English letters like éòá are perfectly legal in XML, but watch out for problems if your
software doesn't support them.
The XML prolog does not have a closing tag. This is not an error: the prolog is a declaration, not an element of the XML document.
What is Markup?

 XML is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.

Example of an XML document (x1.xml):
<?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Joe</to>
<from>Smith</from>
<heading>Reminder</heading>
<body>Hello world!</body>
</note>
 The XML document begins with the XML declaration statement: <?xml version="1.0" encoding="UTF-8"?>
 The next line describes the root element of the document: <note>
 This element is "the parent" of all other elements.
 The next 4 lines describe 4 child elements of the root: to, from, heading, and body.
 And finally the last line defines the end of the root element: </note>
 The XML declaration has no closing tag, i.e. there is no </?xml>.
 The file name extension used for XML documents is .xml.
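
To see the parent/child structure in practice, the note document can be loaded with a parser. This is a minimal sketch assuming Python's standard xml.etree.ElementTree module; the document is embedded as a byte string rather than read from x1.xml so that the example is self-contained.

```python
import xml.etree.ElementTree as ET

NOTE = b"""<?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Joe</to>
<from>Smith</from>
<heading>Reminder</heading>
<body>Hello world!</body>
</note>"""

root = ET.fromstring(NOTE)            # the root element, i.e. <note>
children = [(child.tag, child.text) for child in root]

print("root:", root.tag)
for tag, text in children:
    print(f"  child {tag}: {text}")
```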

Valid XML document
 If an XML document is well-formed and also conforms to the rules of its associated Document Type Definition (DTD), it is said to be a valid XML document.
 DTDs are described in the slides that follow.

XML DTD
 The purpose of a Document Type Definition is to define the structure of an XML document.
 It defines the structure with a list of the elements allowed in the XML document.
 Using a DTD we can specify the various element types, attributes and their relationships with one another.
 Basically, a DTD is used to specify the set of rules for structuring data in any XML file.

Why use a DTD?

 XML provides an application independent way of sharing data.


 With a DTD, independent groups of people can agree to use a common DTD for
interchanging data.
 Your application can use a standard DTD to verify that data that you receive from the
outside world is valid.
 You can also use a DTD to verify your own data.

DTD - XML building blocks

Various building blocks of XML are:


1. Elements
2. Tags
3. Attribute
4. Entities
5. CDATA
6. PCDATA

1. Elements:
 Elements are the main building blocks of an XML document. An element typically consists of an opening tag, its content, and a closing tag. In a DTD, each element is declared with an <!ELEMENT> declaration.

Syntax 1: <!ELEMENT element-name (element-content)>

Syntax 2: <!ELEMENT element-name (#PCDATA)>
#PCDATA means that the element contains character data that IS going to be parsed by a parser. Or
Syntax 3: <!ELEMENT element-name ANY>
The keyword ANY (written without parentheses) declares an element with any content.
Note: #CDATA is not a valid content model in an element declaration. CDATA is an attribute type used in <!ATTLIST> declarations; text inside an element is declared with #PCDATA.
Example:
<!ELEMENT note (#PCDATA)>

Elements with children (sequences)

 Elements with one or more children are defined with the names of the child elements inside the parentheses:

<!ELEMENT parent-name (child-element-name)>
Example:
<!ELEMENT student (id)>
<!ELEMENT id (#PCDATA)>
or
<!ELEMENT element-name (child-element-name, child-element-name, ...)>
Example: <!ELEMENT note (to, from, heading, body)>

Elements with children (sequences) contd…

 When children are declared in a sequence separated by commas, the children must appear in the same sequence in the document. In a full declaration, the children must also be declared, and the children can also have children. The full declaration of the note document will be:

<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
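
The ordering rule for a sequence can be illustrated with a hand-written check. This is only a sketch of the idea, not DTD validation: Python's standard xml.etree.ElementTree parser does not validate, so the helper children_in_sequence (a hypothetical name) simply compares the child tags with the declared order.

```python
import xml.etree.ElementTree as ET


def children_in_sequence(element, expected):
    """True if the element's children appear exactly in the declared order."""
    return [child.tag for child in element] == list(expected)


declared = ["to", "from", "heading", "body"]   # from <!ELEMENT note (to, from, heading, body)>

in_order = ET.fromstring(
    "<note><to>Joe</to><from>Smith</from>"
    "<heading>Reminder</heading><body>Hello</body></note>"
)
swapped = ET.fromstring(
    "<note><from>Smith</from><to>Joe</to>"
    "<heading>Reminder</heading><body>Hello</body></note>"
)

print(children_in_sequence(in_order, declared))
print(children_in_sequence(swapped, declared))
```

Both documents are well-formed, but only the first follows the declared sequence; a validating parser would reject the second.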

2. Tags
 Tags are used to mark up elements. A starting tag like <element_name> marks the beginning of an element, and an ending tag like </element_name> marks the end of an element.
Examples:
 A body element: <body>body text in between</body>
 A message element: <message>some message in between</message>

3. Attribute
 Attributes provide additional information about an element. Attribute values are specified within single or double quotes. Ex:

<?xml version="1.0" encoding="UTF-8"?>


<!DOCTYPE note [
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ATTLIST to id CDATA "123">
]>
<note>
<to id="123">Joe</to>
<from>Smith</from>
<heading>Reminder</heading>
<body>Hello World!</body>
</note>
Attribute with a default value:
<?xml version = "1.0"?>
<!DOCTYPE address [
<!ELEMENT address ( name )>
<!ELEMENT name ( #PCDATA )>
<!ATTLIST name id CDATA "0">
]>
<address>
<name id = "123"> Smith</name>
</address>

In this example the name element has an attribute id whose default value is 0. The default value is enclosed within double quotes.

Attribute declared as #REQUIRED:
<?xml version = "1.0"?>
<!DOCTYPE address [
<!ELEMENT address ( name )>
<!ELEMENT name ( #PCDATA )>
<!ATTLIST name id CDATA #REQUIRED>
]>
<address>
<name id = "123"> Smith</name>
</address>

In this example we have used the #REQUIRED keyword to specify that the attribute id must be provided for the element name.
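
The effect of a default attribute value can be observed with a parser. The sketch below assumes Python's standard xml.parsers.expat module, which by default reports attributes that come from ATTLIST defaults in the internal DTD. The document is a variation of the default-value example: a second name element without an id is added for illustration, and the helper collect_ids is a hypothetical name. Expat does not validate, so the changed content model is not enforced here.

```python
from xml.parsers import expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE address [
<!ELEMENT address (name, name)>
<!ELEMENT name (#PCDATA)>
<!ATTLIST name id CDATA "0">
]>
<address>
<name id="123">Smith</name>
<name>Joe</name>
</address>"""


def collect_ids(document):
    """Return the id attribute seen on each <name> element, in order."""
    ids = []

    def on_start(tag, attributes):
        if tag == "name":
            ids.append(attributes.get("id"))

    parser = expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return ids


print(collect_ids(DOC))
```

The first name element keeps the value written by the author, while the second receives the declared default.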
<?xml version = "1.0"?>
<!DOCTYPE address [
<!ELEMENT address (company)*>
<!ELEMENT company (#PCDATA)>
<!ATTLIST company name CDATA #FIXED "Microsoft">
]>
<address>
<company name = "Microsoft"> welcome</company>
</address>

Invalid XML:
<company name = "TCS"> welcome</company>

Use the #FIXED keyword when you want an attribute to have a fixed (constant) value without allowing the author to change it. If an author includes another value, a validating XML parser will return an error.

Use the #IMPLIED keyword when an attribute is optional and you do not want to force the author to include it. In other words, if the attribute you are declaring has no default value, has no fixed value, and is not required, then you must declare the attribute as implied.
4. Entities
 Entities are variables used to define common text. Entity references are references to entities.
 Most of you will know the HTML entity reference "&nbsp;", which is used to insert a non-breaking space in an HTML document. Entities are expanded when a document is parsed by an XML parser.
The following entities are predefined in XML:
&lt; (<), &gt; (>), &amp; (&), &quot; (") and &apos; (')

CDATA and PCDATA
 CDATA: It stands for character data. CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as markup and entities will not be expanded.
 PCDATA: It stands for parsed character data (i.e., text). Think of character data as the text found between the start tag and the end tag of an XML element. PCDATA is text that will be parsed by a parser: tags inside the text will be treated as markup and entities will be expanded. Parsed character data should therefore not contain the markup characters <, > or & literally; to use these characters, write &lt;, &gt; or &amp; instead.
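
The difference between parsed and unparsed text, and the expansion of the predefined entities, can be tried out with a parser. This is a minimal sketch assuming Python's standard xml.etree.ElementTree and xml.sax.saxutils modules; the sample strings are made up for illustration. The unparsed case uses a CDATA section, written <![CDATA[ ... ]]>.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Predefined entity references are expanded while parsing.
expanded = ET.fromstring("<body>5 &lt; 7 &amp; 7 &gt; 5</body>").text

# Parsed character data: <b> becomes a child element, &amp; becomes &.
parsed = ET.fromstring("<body><b>bold</b> &amp; more</body>")

# CDATA section: the same characters are kept as plain text.
unparsed = ET.fromstring("<body><![CDATA[<b>bold</b> &amp; more]]></body>")

# escape() prepares raw text for use as PCDATA by replacing &, < and >.
escaped = escape("Deitel & Deitel <authors>")

print(expanded)
print(parsed[0].tag, repr(parsed[0].tail))
print(unparsed.text)
print(escaped)
```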

<!DOCTYPE note
[
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
Here PCDATA refers to parsed character data. In the XML document above, the elements to, from, heading and body carry text, so they are declared in the DTD as containing text.
When the declarations are kept in a separate file, that file is stored with the .dtd extension.

Types of DTD:
1. Internal DTD and 2. External DTD

1. Internal DTD
 A DTD is referred to as an internal DTD if the elements are declared within the XML file.
 For an internal DTD, the standalone attribute in the XML declaration can be set to yes. This means the declaration works independently of any external source.

Syntax:
 The syntax of an internal DTD is as shown:
<!DOCTYPE root-element [element-declarations]>
where root-element is the name of the root element and element-declarations is where you declare the elements.

Example: Internal DTD

Following is a simple example of internal DTD:


<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE address [
<!ELEMENT address (name, company, phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<address>
<name>Piyush</name>
<company>TCS</company>
<phone>(040) 123-4567</phone>
</address>
 Let us go through the above code:
Start Declaration - Begin the XML declaration with the following statement:
<?xml version="1.0" encoding="UTF-8" ?>
DTD - Immediately after the XML header, the document type declaration follows, commonly referred to as the DOCTYPE:
<!DOCTYPE address [
The DOCTYPE declaration has an exclamation mark (!) at the start of the element name. The DOCTYPE informs the parser that a DTD is associated with this XML document.
DTD Body - The DOCTYPE declaration is followed by the body of the DTD, where you declare elements, attributes, entities, and notations:
<!ELEMENT address (name, company, phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
Several elements are declared here that make up the vocabulary of the <address> document. <!ELEMENT name (#PCDATA)> defines the element name to be of type "#PCDATA". Here #PCDATA means parse-able text data.
End Declaration - Finally, the declaration section of the DTD is closed using a closing bracket and a closing angle bracket (]>). This effectively ends the definition, and thereafter, the XML document follows immediately.

Rules
 The document type declaration must appear at the start of the document (preceded only by
the XML header) — it is not permitted anywhere else within the document.
 Similar to the DOCTYPE declaration, the element declarations must start with an
exclamation mark.
 The Name in the document type declaration must match the element type of the root
element.

Example: External DTD
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note SYSTEM "extdtd.dtd">
<note>
<to>Joe</to>
<from>Smith</from>
<heading>Reminder</heading>
<body>Hello World!</body>
</note>

File extdtd.dtd
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>

XML example
<?xml version="1.0" encoding="UTF-8"?>
<books>
<heading> web technology </heading>
<book>
<title> WT</title>
<author> Deitel</author>
<ISBN>123-456-789</ISBN>
<publisher>wiley</publisher>
<edition>3</edition>
<price>350</price>
</book>
<book>
<title> internet worldwideweb</title>
<author> ditel&amp;ditel</author>
<ISBN>123-456-781</ISBN>
<publisher>pearson</publisher>
<edition>3</edition>
<price>450</price>
</book>
</books>
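
Because XML carries data rather than presentation, an application can read the values directly. The following is a minimal sketch assuming Python's standard xml.etree.ElementTree module; it uses an abridged copy of the books document (only title, author and price are kept) and strips the leading spaces found in the original text.

```python
import xml.etree.ElementTree as ET

BOOKS = b"""<?xml version="1.0" encoding="UTF-8"?>
<books>
<heading> web technology </heading>
<book>
<title> WT</title>
<author> Deitel</author>
<price>350</price>
</book>
<book>
<title> internet worldwideweb</title>
<author> ditel&amp;ditel</author>
<price>450</price>
</book>
</books>"""

root = ET.fromstring(BOOKS)
books = root.findall("book")

titles = [book.findtext("title").strip() for book in books]
authors = [book.findtext("author").strip() for book in books]   # &amp; is expanded to &
total_price = sum(int(book.findtext("price")) for book in books)

print(titles)
print(authors)
print(total_price)
```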

XML with CSS
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="rule.css"?>
<books>
<heading> web technology </heading>
<book>
<title> WT</title>
<author> Deitel</author>
<ISBN>123-456-789</ISBN>
<publisher>wiley</publisher>
<edition>3</edition>
<price>350</price>
</book>
<book>
<title> internet worldwideweb</title>
<author> ditel&amp;ditel</author>
<ISBN>123-456-781</ISBN>
<publisher>pearson</publisher>
<edition>3</edition>
<price>450</price>
</book>
</books>

rule.css
books
{
    background-color: blue;
    width: 100%;
}
heading
{
    color: green;
    font-size: 40px;
    background-color: yellow;
}
heading, title, author, publisher, edition, price
{
    color: green;
}
title
{
    font-size: 25px;
    font-weight: bold;
}

XML SCHEMA

XML Schemas are More Powerful than DTD


•XML Schemas are written in XML
•XML Schemas are extensible to additions
•XML Schemas support data types
•XML Schemas support namespaces

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>

  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>

  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>

  <book category="web">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
  </book>

  <book category="web" cover="paperback">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>

</bookstore>
XML Namespaces

XML Namespaces provide a method to avoid element name conflicts.

Name Conflicts

In XML, element names are defined by the developer. This often results in a conflict when trying to mix XML documents from different XML applications.

[Figure: two XML files, file1 and file2, each containing a <table> element]

If these XML files were added together, there would be a name conflict. Both contain a <table> element, but the elements have different content and meaning. A user or an XML application will not know how to handle these differences.

Solving the Name Conflict Using a Prefix

Name conflicts in XML can easily be avoided using a name prefix.


This XML carries information about an HTML table, and a piece of furniture:

In the example above, there will be no conflict because the two <table>
elements have different names.

XML Namespaces - The xmlns Attribute
When using prefixes in XML, a namespace for the prefix must be defined. The namespace can be defined by an xmlns attribute in the start tag of an element. The namespace declaration has the following syntax:
xmlns:prefix="URI"

A Uniform Resource Identifier (URI) is a string of characters which identifies an Internet resource. The most common URI is the Uniform Resource Locator (URL), which identifies an Internet domain address.

First way: Namespaces can be declared in the XML element:

In the example above:
The xmlns attribute in the first <table> element gives the h: prefix a qualified namespace.
The xmlns attribute in the second <table> element gives the f: prefix a qualified namespace.
When a namespace is defined for an element, all child elements with the same prefix are associated with the same namespace.

Second way: Namespaces can also be declared in the XML root element:

Note: The namespace URI is not used by the parser to look up information. The purpose of using a URI is to give the namespace a unique name.
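
A namespace-aware parser shows how prefixes keep the two <table> elements apart. This is a minimal sketch assuming Python's standard xml.etree.ElementTree module. The document, the element contents and both namespace URIs are illustrative assumptions (declared on the root element, as in the second way); the helper parses is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Illustrative URIs: they only need to be unique names, nothing is downloaded.
NS = {
    "h": "http://www.w3.org/TR/html4/",
    "f": "https://www.example.com/furniture",
}

DOC = """<root xmlns:h="http://www.w3.org/TR/html4/"
      xmlns:f="https://www.example.com/furniture">
  <h:table><h:tr><h:td>Apples</h:td></h:tr></h:table>
  <f:table><f:name>Coffee Table</f:name></f:table>
</root>"""

root = ET.fromstring(DOC)
html_table = root.find("h:table", NS)        # the HTML table
furniture_table = root.find("f:table", NS)   # the piece of furniture

# Internally each tag is stored with its namespace URI in braces.
print(html_table.tag)
print(furniture_table.tag)
print(root.find("f:table/f:name", NS).text)


def parses(text):
    """True if the namespace-aware parser accepts the document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A prefix without an xmlns declaration is rejected by the parser.
print(parses("<h:table/>"))
```

The two elements share the local name table, but the parser treats them as different names because their namespace URIs differ.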