XML DTD


XML (eXtensible Markup Language)


o Extensible Markup Language (XML) is used to describe data.
o XML is designed to store and transport data.
o XML was released in the late 1990s. It was created to provide an easy way to store and use
self-describing data.
o XML became a W3C Recommendation on February 10, 1998.
o XML is not a replacement for HTML.
o XML is designed to be self-descriptive.
o XML is designed to carry data, not to display data.
o XML tags are not predefined. We must define our own tags.
o XML is platform independent and language independent.

XML BASICS :
XML, or eXtensible markup language, is all about creating a universal way for both formatting and
presenting data. Once data is coded or marked up with XML tags, data can then be used in many
different ways.

Main features of XML:

 XML files are text files, which can be managed by any text editor.
 XML is very simple, because it has fewer than 10 syntax rules.

Because of these features, XML offers the following advantages:

 XML provides a basic syntax that can be used to share information between different kinds of
computers, different applications, and different organizations. XML data is stored in plain
text format.
 With XML, our data can be available to all kinds of "reading machines" (handheld
computers, voice machines, news feeds, etc.), which also makes it more accessible to blind people
and people with other disabilities.
 Databases can trade tables, business applications can trade updates, and document systems
can share information.
 It supports Unicode, allowing almost any information in any written human language to be
communicated.
 Its self-documenting format describes structure and field names as well as specific values.
 Content-based XML markup enhances searchability, making it possible for agents and search
engines to categorize data instead of wasting processing power on context-based full-text
searches.
 XML is heavily used as a format for document storage and processing, both online and
offline.
 It is based on international standards.
 It is platform-independent, thus relatively immune to changes in technology.
 Forward and backward compatibility are relatively easy to maintain despite changes in DTD or
Schema.

How can XML be used?


 XML can keep data separated from HTML
 XML can be used to store data inside HTML documents
 XML can be used as a format to exchange information
 XML can be used to store data in files or in databases

Why XML?
Platform Independent and Language Independent:
The main benefit of XML is that we can take data from a program such as Microsoft SQL Server,
convert it into XML, and then share that XML with other programs and platforms. This lets two
platforms communicate even when exchanging data between them directly would be difficult.

The main thing that makes XML truly powerful is its international acceptance. Many corporations
use XML interfaces for databases, programming, office applications, mobile phones and more,
largely because XML is platform independent.

XML Example:
XML documents form a hierarchical structure that looks like a tree, known as the XML tree, which
starts at "the root" and branches to "the leaves".

Example of XML Document


XML documents use a self-describing and simple syntax:

<?xml version="1.0" encoding="ISO-8859-1"?>


<note>
<to>cam</to>
<from>Jam</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
The first line is the XML declaration. It defines the XML version (1.0) and the encoding used (ISO
8859-1 = Latin-1/West European character set).
The next line describes the root element of the document (like saying: "this document is a note"):
<note>
The next 4 lines describe 4 child elements of the root (to, from, heading, and body).
<to>cam</to>
<from>Jam</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
And finally the last line defines the end of the root element.
</note>

XML documents must contain a root element. This element is "the parent" of all other elements.
The elements in an XML document form a document tree. The tree starts at the root and branches to
the lowest level of the tree.
All elements can have sub elements (child elements).

<root>
<child>
<subchild>.....</subchild>
</child>
</root>
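
The tree structure described above can be inspected programmatically. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name outline is an assumption made for illustration and is not part of any library.

```python
import xml.etree.ElementTree as ET

# The note document from the example above (XML declaration omitted).
NOTE_XML = """<note>
<to>cam</to>
<from>Jam</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""


def outline(element, depth=0):
    """Return one indented line per element, starting at the given element."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(NOTE_XML)   # the root element is "the parent" of all others
tree_lines = outline(root)
print("\n".join(tree_lines))
```

The root element comes first and every child is indented one level below its parent, which mirrors the root-to-leaves picture of the document tree.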

The main difference between XML and HTML:
HTML is an abbreviation for HyperText Markup Language, while XML stands for eXtensible
Markup Language. The differences are as follows:
1. HTML was designed to display data with focus on how data looks while XML was
designed to be a software and hardware independent tool used to transport and store data,
with focus on what data is.
2. HTML is a markup language itself while XML provides a framework for defining
markup languages.
3. HTML is a presentation language while XML is neither a programming language nor a
presentation language.
4. HTML is case insensitive while XML is case sensitive.
5. HTML is used for designing a web-page to be rendered on the client side while XML is
used basically to transport data between the application and the database.
6. HTML has its own predefined tags while what makes XML flexible is that custom tags can
be defined and the tags are invented by the author of the XML document.
7. HTML is not strict if the user omits closing tags, but XML makes it mandatory
for the user to close each tag that has been used.
8. HTML does not preserve white space while XML does.
9. HTML is about displaying data and is in that sense static, while XML is about carrying
information and is in that sense dynamic.
Thus, it can be said that HTML and XML are not competitors but rather complement each
other, serving altogether different purposes.

The building blocks of XML documents

XML documents (and HTML documents) are made up of the following building blocks:
Elements, Tags, Attributes, Entities, PCDATA, and CDATA
This is a brief explanation of each of the building blocks:
Elements
Elements are the main building blocks of both XML and HTML documents. Examples of HTML
elements are "body" and "table".
Examples of XML elements could be "note" and "message". Elements can contain text, other
elements, or be empty. Examples of empty HTML elements are "hr", "br" and "img".
In a DTD, elements are declared with an ELEMENT declaration.
Tags
Tags are used to markup elements.
A starting tag like <element_name> marks the beginning of an element, and an ending tag like
</element_name> marks the end of an element.
Examples: A body element: <body>body text in between</body>. A message element:
<message>some message in between</message>
Attributes
Attributes provide extra information about elements.
Attributes are placed inside the start tag of an element. Attributes come in name/value pairs. The
following "img" element carries additional information about a source file:
<img src="computer.gif" />
The name of the element is "img". The name of the attribute is "src". The value of the attribute is
"computer.gif". Since the element itself is empty it is closed by a " /".
PCDATA
PCDATA stands for Parsed Character data. PCDATA is the text that will be parsed by a parser.
Tags inside the PCDATA will be treated as markup and entities will be expanded.

CDATA
CDATA stands for (unparsed) Character Data. CDATA is text that is not parsed further by the
parser. Tags inside the CDATA text are not treated as markup and entities will not be
expanded.
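
The difference between parsed and unparsed character data can be observed with a parser. This is a small sketch using Python's standard xml.etree.ElementTree; the message element and its text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# PCDATA: the parser expands &lt; and treats <b> as a child element.
parsed = ET.fromstring("<message>a &lt; b and <b>bold</b></message>")

# CDATA section: everything between <![CDATA[ and ]]> is kept as literal text.
unparsed = ET.fromstring("<message><![CDATA[a < b and <b>bold</b>]]></message>")

print(repr(parsed.text), len(parsed))      # text before the <b> child, 1 child
print(repr(unparsed.text), len(unparsed))  # the whole string, no children
```

In the first document the parser produces a child element named b; in the second the same characters survive as plain text.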
Entities
Entities are variables used to define common text. Entity references are references to entities.
Most of you will know the HTML entity reference "&nbsp;", which inserts a non-breaking
space in an HTML document. Entities are expanded when a document is parsed by an XML
parser.
The following entities are predefined in XML:

Entity reference    Character
&lt;                <
&gt;                >
&amp;               &
&quot;              "
&apos;              '
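
As an illustration of how these predefined entities are used in practice, the sketch below relies on the escape and unescape helpers from Python's standard xml.sax.saxutils module. The sample strings are invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = "if a < b & b > c"

# escape() replaces &, < and > with their entity references.
escaped = escape(raw)

# Quotes are only replaced when asked for through the optional mapping.
quoted = escape('say "hi"', {'"': "&quot;"})

# A parser expands the references again when it reads the document.
round_trip = ET.fromstring("<code>" + escaped + "</code>").text

print(escaped)
print(quoted)
print(unescape(escaped) == raw, round_trip == raw)
```

Escaping text before embedding it in a document keeps characters such as < and & from being mistaken for markup.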

Wrapping:
If the DTD is to be included in your XML source file, it should be wrapped in a DOCTYPE
definition with the following syntax:
<!DOCTYPE root-element [element-declarations]>
example:
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>cam</to>
<from>Jam</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>

Types of XML Documents:


There are two kinds of XML documents:
1. Well-formed
2. Valid

1. Well-formed
A "Well Formed" XML document is a document that conforms to the XML syntax rules. It
contains text and XML tags, and everything is entered correctly. It does not, however, need to
refer to a DTD.
The following is a "Well Formed" XML document:

<?xml version="1.0"?>
<note>
<to>cam</to>
<from>Jam</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
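
Well-formedness can be checked by simply trying to parse a document. The following is a minimal sketch with Python's standard xml.etree.ElementTree; the function name is_well_formed is an assumption for illustration. Note that this checks syntax only and performs no validation against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


good = "<note><to>cam</to><from>Jam</from></note>"
unclosed = "<note><to>cam</to><from>Jam</note>"   # <from> is never closed
wrong_case = "<Note><to>cam</to></note>"          # tags are case sensitive
two_roots = "<note/><note/>"                      # only one root is allowed

for sample in (good, unclosed, wrong_case, two_roots):
    print(is_well_formed(sample), sample)
```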

2. Valid
Valid documents not only conform to XML syntax but are also checked for errors against a
Document Type Definition (DTD) or schema.
The following is the same document as above but with an added reference to a DTD:
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "InternalNote.dtd">
<note>
<to>cam</to>
<from>Jam</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

XML DTD

DTD stands for Document Type Definition. It defines the legal building blocks of an XML
document. It is used to define document structure with a list of legal elements and attributes.

Purpose of DTD:
Its main purpose is to define the structure of an XML document. It contains a list of legal elements
and defines the structure with the help of them.

Checking Validation:
Before proceeding with XML DTD, we must understand validation. An XML document is called
"well-formed" if it follows the correct syntax.
A well-formed and valid XML document is one which has been validated against a DTD.

Valid and well-formed XML document with DTD:

employee.xml
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<firstname>vimal</firstname>
<lastname>kumar</lastname>
<email>[email protected]</email>
</employee>
In this example, the DOCTYPE declaration refers to an external DTD file. The content of the file is
shown below.
employee.dtd
<!ELEMENT employee (firstname,lastname,email)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT email (#PCDATA)>

Description of DTD:
<!DOCTYPE employee : It defines that the root element of the document is employee.
<!ELEMENT employee: It defines that the employee element contains 3 elements, "firstname,
lastname and email", in that order.
<!ELEMENT firstname: It defines that the firstname element is of type #PCDATA (parsed character
data).
<!ELEMENT lastname: It defines that the lastname element is of type #PCDATA (parsed character
data).
<!ELEMENT email: It defines that the email element is of type #PCDATA (parsed character data).
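
To make the meaning of these declarations concrete, here is a toy check written in Python. It is only a sketch under strong assumptions: Python's standard library does not validate against DTDs, and this hand-written code understands nothing beyond a plain comma-separated sequence and (#PCDATA). The function names parse_declarations and conforms are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

EMPLOYEE_DTD = """<!ELEMENT employee (firstname,lastname,email)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT email (#PCDATA)>"""


def parse_declarations(dtd_text):
    """Map each declared element name to its list of content tokens."""
    pattern = re.compile(r"<!ELEMENT\s+(\S+)\s+\(([^)]*)\)>")
    return {name: [token.strip() for token in model.split(",")]
            for name, model in pattern.findall(dtd_text)}


def conforms(element, declarations):
    """Toy check: children must match the declared sequence exactly."""
    model = declarations.get(element.tag)
    if model is None:
        return False                      # element was never declared
    if model == ["#PCDATA"]:
        return len(element) == 0          # text only, no child elements
    if [child.tag for child in element] != model:
        return False
    return all(conforms(child, declarations) for child in element)


declarations = parse_declarations(EMPLOYEE_DTD)
valid_doc = ET.fromstring(
    "<employee><firstname>vimal</firstname>"
    "<lastname>kumar</lastname><email>x</email></employee>")
wrong_order = ET.fromstring(
    "<employee><lastname>kumar</lastname>"
    "<firstname>vimal</firstname><email>x</email></employee>")
print(conforms(valid_doc, declarations), conforms(wrong_order, declarations))
```

A real validating parser supports the full DTD content-model syntax (choices, repetition, attributes); the point here is only that the order and names of the children are part of what the DTD constrains.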

XML DTD with entity declaration:

A doctype declaration can also define special strings (entities) that can be used in the XML file.
An entity reference has three parts:
1. An ampersand (&)
2. An entity name
3. A semicolon (;)
Syntax to declare an entity:
<!ENTITY entity-name "entity-value">

Let's see an example that defines an ENTITY in the doctype declaration.


author.xml

<?xml version="1.0" standalone="yes" ?>


<!DOCTYPE author [
<!ELEMENT author (#PCDATA)>
<!ENTITY sk "Sunil kumar ">
]>
<author>&sk;</author>
In the above example, sk is an entity that is used inside the author element. When the document is
parsed, the reference &sk; is replaced by the value of the sk entity, that is "Sunil kumar ".
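
The expansion can be observed by parsing author.xml. The sketch below assumes Python's standard xml.etree.ElementTree, whose underlying parser expands entities declared in the internal DTD subset.

```python
import xml.etree.ElementTree as ET

AUTHOR_XML = (
    '<?xml version="1.0" standalone="yes" ?>'
    "<!DOCTYPE author ["
    "<!ELEMENT author (#PCDATA)>"
    '<!ENTITY sk "Sunil kumar ">'
    "]>"
    "<author>&sk;</author>"
)

author = ET.fromstring(AUTHOR_XML)
# The parser has already replaced &sk; with the declared entity value.
print(repr(author.text))
```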

TYPES of DTD
1. Internal DTD:

<!DOCTYPE root-element [element-declarations]>


example:
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>cam</to>
<from>Jam</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>

2. External DTD:
In this type, an external DTD file is created and its name must be specified in the corresponding
XML file. The following XML document illustrates the use of an external DTD.
Step 1: Creation of DTD file [employee.dtd]
employee.dtd
<!ELEMENT employee (firstname,lastname,email)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT email (#PCDATA)>

Step 2: Creation of XML document. [employee.xml]

employee.xml
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<firstname>vimal</firstname>
<lastname>kumar</lastname>
<email>[email protected]</email>
</employee>
In this example, the DOCTYPE declaration refers to an external DTD file. The content of the file is
shown in Step 1 above.

XML Parsers

An XML parser is a software library or package that provides interfaces for client applications to
work with an XML document. The XML Parser is designed to read the XML and create a way for
programs to use XML.
An XML parser checks that the document is well-formed and may also validate it.
Let's understand the working of XML parser by the figure given below:

Types of XML Parsers:

These are the two main types of XML Parsers:


1. DOM
2. SAX

1. DOM (Document Object Model):


A DOM document is an object which contains all the information of an XML document. It is
composed like a tree structure. The DOM Parser implements a DOM API. This API is very simple
to use.

Features of DOM Parser:


A DOM Parser creates an internal structure in memory which is a DOM document object and the
client applications get information of the original XML document by invoking methods on this
document object.
DOM Parser has a tree based structure.

Advantages:
1. It supports both read and write operations and the API is very simple to use.
2. It is preferred when random access to widely separated parts of a document is required.
Disadvantages:
1. It is memory inefficient (it consumes more memory because the whole XML document needs
to be loaded into memory).
2. It is comparatively slower than other parsers.
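
The read and write operations mentioned above can be sketched with Python's standard xml.dom.minidom module. The document content is illustrative, and the e-mail address is a hypothetical placeholder, not a value from the original example.

```python
from xml.dom.minidom import parseString

# The whole document is loaded into memory as a tree of nodes.
document = parseString(
    "<employee><firstname>vimal</firstname><lastname>kumar</lastname></employee>")
root = document.documentElement

# Read: random access to any part of the tree.
lastname_text = root.getElementsByTagName("lastname")[0].firstChild.data

# Write: create a new element and remove an existing one.
email = document.createElement("email")
email.appendChild(document.createTextNode("vimal@example.com"))  # placeholder value
root.appendChild(email)
root.removeChild(root.getElementsByTagName("firstname")[0])

result = root.toxml()
print(lastname_text)
print(result)
```

Because the whole tree lives in memory, nodes can be visited in any order and changed in place, which is the trade-off against the higher memory use listed under the disadvantages.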

2. SAX (Simple API for XML):


A SAX Parser implements the SAX API. This API is event-based and less intuitive than the DOM API.

Features of SAX Parser:

It does not create any internal structure.


Clients do not call the parser's methods directly; they override the methods of the API and place
their own code inside those methods.
It is an event-based parser; it works like an event handler in Java.
Advantages
1) It is simple and memory efficient.
2) It is very fast and works for huge documents.

Disadvantages
1) It is event-based so its API is less intuitive.
2) Clients never know the full information because the data is broken into pieces.
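
The event-based style can be sketched with Python's standard xml.sax module. The handler class name EventRecorder and the sample document are assumptions for illustration; the overridden method names (startElement, characters, endElement) are the ones defined by xml.sax.ContentHandler.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Records the events the parser reports while it reads the document."""

    def __init__(self):
        super().__init__()
        self.events = []
        self.text = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Text may arrive in several pieces, so it is collected, not assumed whole.
        self.text.append(content)


handler = EventRecorder()
xml.sax.parseString(
    b"<employee><firstname>vimal</firstname><lastname>kumar</lastname></employee>",
    handler)
print(handler.events)
print("".join(handler.text))
```

The parser calls the overridden methods in document order and keeps no tree, so anything the client needs later must be stored by the handler itself.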

Difference between SAX and DOM parsers.


DOM | SAX
Tree model parser (tree of nodes) | Event-based parser (sequence of events)
Loads the file into memory and then parses it | Parses the file as it reads it, node by node
Has memory constraints, since it loads the whole XML file before parsing | No memory constraints, as it does not store the XML content in memory
Read and write (can insert or delete nodes) | Read only (cannot insert or delete nodes)
Preferred when the XML content is small | Preferred when the XML content is large
Backward and forward search is possible for finding tags and evaluating the information inside them, which eases navigation | Reads the XML file from top to bottom; backward navigation is not possible
Slower at runtime | Faster at runtime

XML DOM
DOM is an acronym that stands for Document Object Model. It defines a standard way to access and
manipulate documents. The Document Object Model (DOM) is a programming API for HTML and
XML documents. It defines the logical structure of documents and the way a document is accessed
and manipulated.
As a W3C specification, one important objective for the Document Object Model is to provide a
standard programming interface that can be used in a wide variety of environments and applications.
The Document Object Model can be used with any programming language.
XML DOM defines a standard way to access and manipulate XML documents.

What does the XML DOM do?

The XML DOM makes a tree-structure view for an XML document.


We can access all elements through the DOM tree.
We can modify or delete their content and also create new elements. The elements and their content
(text and attributes) are all known as nodes.
For example, consider this table, taken from an HTML document:

<TABLE>
<ROWS>
<TR>
<TD>A</TD>

<TD>B</TD>
</TR>
<TR>
<TD>C</TD>
<TD>D</TD>
</TR>
</ROWS>
</TABLE>
The Document Object Model represents this table like this:
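
A rough text rendering of that tree can be produced with Python's standard xml.dom.minidom module. This is an illustrative sketch only: the helper name outline is invented, and whitespace-only text nodes between the tags are skipped to keep the output readable.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

TABLE = """<TABLE>
<ROWS>
<TR>
<TD>A</TD>
<TD>B</TD>
</TR>
<TR>
<TD>C</TD>
<TD>D</TD>
</TR>
</ROWS>
</TABLE>"""


def outline(node, depth=0):
    """Return indented lines for element nodes and non-blank text nodes."""
    lines = []
    if node.nodeType == Node.ELEMENT_NODE:
        lines.append("  " * depth + "<" + node.tagName + ">")
        for child in node.childNodes:
            lines.extend(outline(child, depth + 1))
    elif node.nodeType == Node.TEXT_NODE and node.data.strip():
        lines.append("  " * depth + repr(node.data.strip()))
    return lines


table_lines = outline(parseString(TABLE).documentElement)
print("\n".join(table_lines))
```

Each element becomes a node whose children are the nested elements, and the cell contents A, B, C and D appear as text nodes at the leaves.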

XML Schema
What is XML schema:

XML schema is a language which is used for expressing constraints on XML documents. Several
schema languages are in use nowadays, for example RELAX NG and XSD (XML
Schema Definition).
An XML schema is used to define the structure of an XML document. It is like a DTD but provides
more control over the XML structure.

Checking Validation

An XML document is called "well-formed" if it follows the correct syntax. A well-formed and
valid XML document is one which has been validated against a Schema.

What is XML Schema Used For?

A Schema can be used:


1. to provide a list of elements and attributes in a vocabulary;
2. to associate data types, such as integer, string, etc., or more specifically such as hatsize,
sock_colour, etc., with values found in documents;

3. to constrain where elements and attributes can appear, and what can appear inside those
elements, such as saying that a chapter title occurs inside a chapter, and that a chapter must
consist of a chapter title followed by one or more paragraphs of text;
4. to provide documentation that is both human-readable and machine-processable;
5. to give a formal description of one or more documents.

What is the difference between XML Schema and DTD?

1. DTD is the predecessor of XML schema.


2. While DTD provides the basic structure/grammar for defining an XML document, XML schema
additionally provides methods to define constraints on the data contained in the
document. Therefore XML schema is considered to be richer and more powerful than DTD.
3. Also, XML schema provides an object oriented approach for defining the structure of an XML
document. However, not every XML parser supports XML Schema validation.
4. XML Schema is namespace aware, while DTD is not.
5. XML Schemas are written in XML, while DTDs are not.
6. XML Schema is strongly typed, while DTD is not.
7. XML Schema has a wealth of derived and built-in data types that are not available in DTD.
8. XML Schema does not allow inline definitions, while DTD does.

How to write a Simple Schema?


Step 1: We will first write a simple xsd file in which the desired structure of XML document is
defined.

employee.xsd

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="employee">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
<xs:element name="email" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
The xs prefix is a qualifier used to identify the schema elements and types. The document (root)
element of the schema is xs:schema. It takes the attribute xmlns:xs, which has
the value http://www.w3.org/2001/XMLSchema. This declaration indicates that the document
follows the rules of XML Schema, which were defined by a W3C Recommendation
in 2001.
Then comes xs:element, which is used to define an XML element. In the above example the element
employee is of a complex type that has three child elements: firstname, lastname, email. All these
elements are of type string.
<xs:complexType> : It defines that the element 'employee' is complex type.
<xs:sequence> : It defines that the complex type is a sequence of elements.
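
Because an XSD file is itself XML, its declarations can be read with an ordinary parser. The sketch below uses Python's standard xml.etree.ElementTree to list what employee.xsd declares. It only reads the schema; the standard library does not perform XSD validation, and the variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace mapping used in the search paths below.
NS = {"xs": "http://www.w3.org/2001/XMLSchema"}

EMPLOYEE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="employee">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
<xs:element name="email" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>"""

schema = ET.fromstring(EMPLOYEE_XSD)
employee_decl = schema.find("xs:element", NS)

# Child elements declared inside xs:complexType / xs:sequence, in order.
declared_children = [
    (child.get("name"), child.get("type"))
    for child in employee_decl.findall("xs:complexType/xs:sequence/xs:element", NS)
]
print(employee_decl.get("name"), declared_children)
```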

Step 2: Now develop the XML document in which the desired values to the XML elements can be
given.
employee.xml
<?xml version="1.0"?>
<employee
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="employee.xsd">

<firstname>vimal</firstname>
<lastname>kumar</lastname>
<email>[email protected]</email>
</employee>
 The attribute xmlns:xsi indicates that the XML document is an instance of an XML schema; the xsi
prefix is bound to the namespace http://www.w3.org/2001/XMLSchema-instance
 The xsi:noNamespaceSchemaLocation attribute takes the name of the xsd file as its value.
Step 3: See the output in browser window.

XML Schema Data types

There are two types of data types in XML schema.


1. simpleType
2. complexType
SimpleType
The simpleType allows us to have text-based elements. Such an element contains only text; it cannot
contain child elements or attributes.
Syntax:
<xs:element name="element_name" type="data_type"/>
Here type can be any built-in data type, for example:
xs:string
xs:boolean
xs:decimal
xs:date
xs:time
An element of simple type does not have attributes, but attributes themselves are declared with
simple types.
Syntax:
<xs:attribute name="name_of_attribute" type="data_type"/>
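
To give a feel for what these built-in types mean for the text inside an element, the sketch below converts sample values into Python objects. It is a simplification under stated assumptions: the CONVERTERS table and function names are invented for the example, and details such as time zones on xs:date and xs:time or special decimal forms are not handled the way a real schema processor would handle them.

```python
from datetime import date, time
from decimal import Decimal


def parse_boolean(text):
    """xs:boolean accepts the literals true, false, 1 and 0."""
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError("not an xs:boolean literal: " + repr(text))


# Hypothetical mapping from schema type names to Python converters.
CONVERTERS = {
    "xs:string": str,
    "xs:boolean": parse_boolean,
    "xs:decimal": Decimal,
    "xs:date": date.fromisoformat,   # e.g. 2001-05-02
    "xs:time": time.fromisoformat,   # e.g. 09:30:00
}


def convert(type_name, text):
    """Convert element text according to its declared simple type."""
    return CONVERTERS[type_name](text)


print(convert("xs:boolean", "true"))
print(convert("xs:decimal", "12.50"))
print(convert("xs:date", "2001-05-02"))
print(convert("xs:time", "09:30:00"))
```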
ComplexType
The complexType allows an element to hold attributes and other elements. It can contain sub
elements and can be left empty.
Complex type elements include:
 empty elements
 elements that contain text
 elements that contain other elements
 elements that contain text as well as other elements.
