XML DTD
XML BASICS :
XML, or eXtensible Markup Language, is all about creating a universal way to structure and
describe data. Once data is marked up with XML tags, it can be used in many
different ways.
XML files are text files, which can be managed by any text editor.
XML is very simple, because it has only a small number of syntax rules.
XML provides a basic syntax that can be used to share information between different kinds of
computers, different applications, and different organizations. XML data is stored in plain
text format.
With XML, data can be made available to all kinds of "reading machines" (handheld
computers, voice machines, news feeds, etc.), which also makes it more accessible to blind people
and people with other disabilities.
Databases can trade tables, business applications can trade updates, and document systems
can share information.
It supports Unicode, allowing almost any information in any written human language to be
communicated.
Its self-documenting format describes structure and field names as well as specific values.
Content-based XML markup enhances searchability, making it possible for agents and search
engines to categorize data instead of wasting processing power on context-based full-text
searches.
XML is heavily used as a format for document storage and processing, both online and
offline.
It is based on international standards.
It is platform-independent, thus relatively immune to changes in technology.
Forward and backward compatibility are relatively easy to maintain despite changes in DTD or
Schema.
Why XML?
Platform Independent and Language Independent:
The main benefit of XML is that we can take data from a program like Microsoft SQL Server,
convert it into XML, and then share that XML with other programs and platforms. This makes it
possible to communicate between platforms that are otherwise very difficult to connect.
The main thing that makes XML truly powerful is its international acceptance. Many corporations
use XML interfaces for databases, programming, office applications, mobile phones and more,
largely because XML is platform independent.
XML Example:
XML documents form a hierarchical structure that looks like a tree, so it is known as an XML tree.
It starts at "the root" and branches to "the leaves".
XML documents must contain a root element. This element is "the parent" of all other elements.
The elements in an XML document form a document tree. The tree starts at the root and branches to
the lowest level of the tree.
All elements can have sub elements (child elements).
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
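To see the tree in practice, here is a minimal Python sketch that uses the standard library module xml.etree.ElementTree. Python is only our choice for illustration. The helper `walk` and the leaf text are hypothetical. The sketch parses the root/child/subchild skeleton and lists every element with its depth in the tree.

```python
import xml.etree.ElementTree as ET

# The generic root/child/subchild skeleton, with sample leaf text filled in.
DOC = """<root>
  <child>
    <subchild>first leaf</subchild>
  </child>
  <child>
    <subchild>second leaf</subchild>
  </child>
</root>"""

def walk(element, depth=0, out=None):
    """Return (depth, tag) pairs in document order, starting at the root."""
    if out is None:
        out = []
    out.append((depth, element.tag))
    for child in element:          # iterating an Element yields its direct children
        walk(child, depth + 1, out)
    return out

root = ET.fromstring(DOC)          # parse the string; returns the root element
for depth, tag in walk(root):
    print("  " * depth + tag)      # indent each tag by its depth in the tree
```

The root element has depth 0, its children have depth 1, and so on down to the leaves.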
The main differences between XML and HTML:
HTML is an abbreviation for HyperText Markup Language, while XML stands for eXtensible
Markup Language. The differences are as follows:
1. HTML was designed to display data with focus on how data looks while XML was
designed to be a software and hardware independent tool used to transport and store data,
with focus on what data is.
2. HTML is a markup language itself while XML provides a framework for defining
markup languages.
3. HTML is a presentation language while XML is neither a programming language nor a
presentation language.
4. HTML is case insensitive while XML is case sensitive.
5. HTML is used for designing web pages to be rendered on the client side, while XML is
used mainly to transport and store data, for example between an application and a database.
6. HTML has its own predefined tags while what makes XML flexible is that custom tags can
be defined and the tags are invented by the author of the XML document.
7. HTML is not strict if the user omits closing tags, but XML makes it mandatory
for the user to close every tag that has been opened.
8. HTML does not preserve white space while XML does.
9. HTML is about displaying data, hence static but XML is about carrying information, hence
dynamic.
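Points 4, 7 and 8 can be checked directly with an XML parser. The following sketch uses Python's standard xml.etree.ElementTree. The helper `is_well_formed` is hypothetical and simply reports whether the parser accepts the text.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the XML parser accepts text, False on a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Point 4: tag names are case sensitive, so <Note> ... </note> is a mismatch.
print(is_well_formed("<Note>hi</note>"))      # False
# Point 7: every tag must be closed; a bare <br> is accepted in HTML but not in XML.
print(is_well_formed("<p>line<br></p>"))      # False
print(is_well_formed("<p>line<br/></p>"))     # True
# Point 8: whitespace inside an element is passed through unchanged.
print(repr(ET.fromstring("<a>  two  spaces </a>").text))   # '  two  spaces '
```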
Thus, HTML and XML are not competitors; rather, they complement each other and serve
altogether different purposes.
XML documents (and HTML documents) are made up of the following building blocks:
Elements, Tags, Attributes, Entities, PCDATA, and CDATA
This is a brief explanation of each of the building blocks:
Elements
Elements are the main building blocks of both XML and HTML documents. Examples of HTML
elements are "body" and "table".
Examples of XML elements could be "note" and "message". Elements can contain text, other
elements, or be empty. Examples of empty HTML elements are "hr", "br" and "img".
In a DTD, elements are declared with an ELEMENT declaration.
Tags
Tags are used to markup elements.
A start tag like <element_name> marks up the beginning of an element, and an end tag like
</element_name> marks up the end of an element.
Examples: A body element: <body>body text in between</body>. A message element:
<message>some message in between</message>
Attributes
Attributes provide extra information about elements.
Attributes are placed inside the start tag of an element. Attributes come in name/value pairs. The
following "img" element carries additional information about a source file:
<img src="computer.gif" />
The name of the element is "img". The name of the attribute is "src". The value of the attribute is
"computer.gif". Since the element itself is empty, it is closed by "/>".
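A parser exposes attributes as name/value pairs. This is a small sketch with Python's standard xml.etree.ElementTree. The default string passed to `get` is only an illustration.

```python
import xml.etree.ElementTree as ET

# The empty "img" element: one attribute, no content, closed by "/>".
img = ET.fromstring('<img src="computer.gif" />')

print(img.tag)              # img
print(img.get("src"))       # computer.gif
print(img.attrib)           # {'src': 'computer.gif'}
print(img.text)             # None, because the element is empty
# get() returns the given default when the attribute is absent.
print(img.get("alt", "(no alt attribute)"))   # (no alt attribute)
```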
PCDATA
PCDATA stands for Parsed Character data. PCDATA is the text that will be parsed by a parser.
Tags inside the PCDATA will be treated as markup and entities will be expanded.
CDATA
CDATA (unparsed character data) is text that is not parsed further in an XML document.
Tags inside CDATA text are not treated as markup, and entities are not
expanded.
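The difference between the two is easiest to see in a document. Inside a document, CDATA is written as a section between <![CDATA[ and ]]>. In a DTD, element text is declared as #PCDATA, and CDATA appears only as an attribute type. The sketch below uses Python's standard xml.etree.ElementTree and compares an entity reference in normal (parsed) text with markup-like text inside a CDATA section.

```python
import xml.etree.ElementTree as ET

# PCDATA: the parser expands the entity reference &lt; into "<".
pcdata = ET.fromstring("<message>if a &lt; b</message>")
# CDATA section: everything between <![CDATA[ and ]]> is taken literally,
# so "<b>" is not treated as a tag and "&" needs no escaping.
cdata = ET.fromstring("<message><![CDATA[<b>bold</b> & more]]></message>")

print(pcdata.text)      # if a < b
print(cdata.text)       # <b>bold</b> & more
print(len(cdata))       # 0: no child element was created from "<b>"
```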
Entities
Entities are variables used to define common text. Entity references are references to entities.
Most of you will know the HTML entity reference "&nbsp;", which is used to insert an extra
space in an HTML document. Entities are expanded when a document is parsed by an XML
parser.
The following entities are predefined in XML:
Entity reference    Character
&lt;                <
&gt;                >
&amp;               &
&quot;              "
&apos;              '
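The predefined entities work in both directions. A parser replaces each reference with its character. A serializer escapes characters that would otherwise be read as markup. This sketch uses Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Parsing: each predefined entity reference is replaced by its character.
doc = "<t>&lt; &gt; &amp; &quot; &apos;</t>"
print(ET.fromstring(doc).text)                 # < > & " '

# Serializing: characters that would break the markup are escaped again.
el = ET.Element("t")
el.text = "a < b & c"
print(ET.tostring(el, encoding="unicode"))     # <t>a &lt; b &amp; c</t>
```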
Wrapping:
If the DTD is to be included in your XML source file, it should be wrapped in a DOCTYPE
definition with the following syntax:
<!DOCTYPE root-element [element-declarations]>
example:
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>cam</to>
<from>Jam</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>
1. Well-formed
A "well-formed" XML document is a document that conforms to the XML syntax rules. It
contains text and XML tags, all entered correctly. It need not, however, refer to a DTD.
The following is a "Well Formed" XML document:
<?xml version="1.0"?>
<note>
<to>cam</to>
<from>Jam</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
2. Valid
Valid documents not only conform to XML syntax but are also checked against a
Document Type Definition (DTD) or schema.
The following is the same document as above but with an added reference to a DTD:
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "InternalNote.dtd">
<note>
<to>cam</to>
<from>Jam</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
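Python's built-in parser (expat) is non-validating. It checks well-formedness but never checks a document against a DTD. The sketch below makes the distinction concrete with a toy check. The function `check_note` is hypothetical. It assumes that InternalNote.dtd declares note as (to,from,heading,body), as in the internal DTD example, and it hand-codes only that one content model. It is not a general DTD validator.

```python
import xml.etree.ElementTree as ET

# Assumed content model, copied by hand from <!ELEMENT note (to,from,heading,body)>.
NOTE_MODEL = ["to", "from", "heading", "body"]

def check_note(text):
    """Classify a document as 'not well-formed', 'well-formed only' or 'valid'.

    Toy check of one sequence content model only; not a general DTD validator.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return "not well-formed"
    children = [child.tag for child in root]
    # Each child must hold text only (#PCDATA), i.e. no nested elements.
    text_only = all(len(child) == 0 for child in root)
    if root.tag == "note" and children == NOTE_MODEL and text_only:
        return "valid"
    return "well-formed only"

good = ("<note><to>cam</to><from>Jam</from><heading>Reminder</heading>"
        "<body>Don't forget me this weekend!</body></note>")
swapped = ("<note><from>Jam</from><to>cam</to><heading>Reminder</heading>"
           "<body>Don't forget me this weekend!</body></note>")
broken = "<note><to>cam</to></note"     # the final ">" is missing

print(check_note(good))      # valid
print(check_note(swapped))   # well-formed only
print(check_note(broken))    # not well-formed
```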
XML DTD
DTD stands for Document Type Definition. It defines the legal building blocks of an XML
document. It is used to define document structure with a list of legal elements and attributes.
Purpose of DTD:
Its main purpose is to define the structure of an XML document. It contains a list of legal elements
and defines the structure with the help of them.
Checking Validation:
Before going further with XML DTD, it is important to understand validation. An XML document is called
"well-formed" if it has correct XML syntax.
A well-formed XML document that has also been validated against a DTD is called a valid XML document.
employee.xml
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<firstname>vimal</firstname>
<lastname>kumar</lastname>
<email>[email protected]</email>
</employee>
In this example, the DOCTYPE declaration refers to an external DTD file, employee.dtd. The content of
that file is shown below:
employee.dtd
<!ELEMENT employee (firstname,lastname,email)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT email (#PCDATA)>
Description of DTD:
<!DOCTYPE employee: defines that the root element of the document is employee.
<!ELEMENT employee: defines that the employee element contains three child elements, "firstname",
"lastname" and "email", in that order.
<!ELEMENT firstname: defines that the firstname element is of type #PCDATA (parsed character
data).
<!ELEMENT lastname: defines that the lastname element is of type #PCDATA (parsed character
data).
<!ELEMENT email: defines that the email element is of type #PCDATA (parsed character data).
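The declarations above can be read mechanically. Here is a deliberately small sketch in Python. The helper `read_models` and its regular expression are hypothetical and are not a real DTD parser. They handle only the simple form <!ELEMENT name (a,b,...)> used in employee.dtd. They do not handle choices (|), the occurrence markers ?, * and +, EMPTY, ANY, or nested groups.

```python
import re

# The declarations from employee.dtd, as a string.
DTD = """<!ELEMENT employee (firstname,lastname,email)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT email (#PCDATA)>"""

# Matches only simple declarations of the form <!ELEMENT name (item,item,...)>.
DECL = re.compile(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)\s*>")

def read_models(dtd):
    """Map each declared element name to the items of its content model."""
    return {name: [part.strip() for part in body.split(",")]
            for name, body in DECL.findall(dtd)}

models = read_models(DTD)
for name, model in models.items():
    print(name, "->", model)
# employee -> ['firstname', 'lastname', 'email']
# firstname -> ['#PCDATA']
# lastname -> ['#PCDATA']
# email -> ['#PCDATA']
```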
A DOCTYPE declaration can also define entities, which are special strings that can be used in the XML file.
An entity reference has three parts:
1. An ampersand (&)
2. An entity name
3. A semicolon (;)
Syntax to declare an entity:
<!ENTITY entity-name "entity-value">
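Entities declared in an internal DTD are expanded by the parser even when it does not validate. The sketch below uses Python's standard xml.etree.ElementTree. The entity name "writer" and its value are made up for illustration.

```python
import xml.etree.ElementTree as ET

# An internal DTD subset declaring one entity (hypothetical name and value).
doc = """<?xml version="1.0"?>
<!DOCTYPE note [
<!ENTITY writer "Jam">
]>
<note><from>&writer;</from></note>"""

root = ET.fromstring(doc)
print(root.find("from").text)   # Jam: the reference &writer; was expanded
```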
Types of DTD
1. Internal DTD:
In this type, the DTD is declared inside the XML file itself, wrapped in a DOCTYPE definition.
2. External DTD:
In this type, an external DTD file is created and its name must be specified in the corresponding
XML file. The following XML document illustrates the use of an external DTD.
Step 1: Create the DTD file (employee.dtd)
employee.dtd
<!ELEMENT employee (firstname,lastname,email)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT email (#PCDATA)>
employee.xml
<?xml version="1.0"?>
<!DOCTYPE employee SYSTEM "employee.dtd">
<employee>
<firstname>vimal</firstname>
<lastname>kumar</lastname>
<email>[email protected]</email>
</employee>
In this example, the DOCTYPE declaration refers to the external DTD file employee.dtd created in
Step 1.
XML Parsers
An XML parser is a software library or package that provides interfaces for client applications to
work with an XML document. The XML Parser is designed to read the XML and create a way for
programs to use XML.
An XML parser checks that the document is well-formed; a validating parser also checks it against
a DTD or schema.
The figure given below illustrates the working of an XML parser.
There are two main types of XML parser: DOM (tree-based) and SAX (event-based).
DOM parser
Advantages:
1. It supports both read and write operations, and the API is very simple to use.
2. It is preferred when random access to widely separated parts of a document is required.
Disadvantages:
1. It is memory inefficient (it consumes more memory because the whole XML document needs
to be loaded into memory).
2. It is comparatively slower than other parsers.
SAX parser
Disadvantages:
1. It is event-based, so its API is less intuitive.
2. Clients never know the full information because the data is broken into pieces.
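The two parser styles can be contrasted with Python's standard library. xml.dom.minidom builds a DOM tree, and xml.sax delivers events to a handler. This is only a sketch. The class `NameCollector` is a hypothetical handler written for this example, and the sample employee document leaves out the email element.

```python
import xml.sax
from xml.dom import minidom

DOC = b"""<employee>
<firstname>vimal</firstname>
<lastname>kumar</lastname>
</employee>"""

# DOM: the whole document is loaded into an in-memory tree first,
# after which any node can be reached directly (random access).
dom = minidom.parseString(DOC)
last = dom.getElementsByTagName("lastname")[0].firstChild.data
print("DOM:", last)                          # DOM: kumar

# SAX: the parser pushes events to a handler while it reads;
# nothing is kept unless the handler stores it itself.
class NameCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

handler = NameCollector()
xml.sax.parseString(DOC, handler)
print("SAX:", handler.events[:3])
# SAX: [('start', 'employee'), ('start', 'firstname'), ('end', 'firstname')]
```

The DOM version can jump straight to lastname. The SAX handler sees the document only as a stream of start and end events.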
XML DOM
DOM stands for Document Object Model. It defines a standard way to access and
manipulate documents. The Document Object Model (DOM) is a programming API for HTML and
XML documents. It defines the logical structure of documents and the way a document is accessed
and manipulated.
As a W3C specification, one important objective for the Document Object Model is to provide a
standard programming interface that can be used in a wide variety of environments and applications.
The Document Object Model can be used with any programming language.
XML DOM defines a standard way to access and manipulate XML documents.
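Both parts of the definition, access and manipulation, can be shown with Python's standard xml.dom.minidom, which implements the W3C DOM interfaces. This is a minimal sketch. The new text value and the added heading element are illustrative choices.

```python
from xml.dom import minidom

doc = minidom.parseString("<note><to>cam</to></note>")

# Access: read the text node inside <to>.
to = doc.getElementsByTagName("to")[0]
print(to.firstChild.data)                    # cam

# Manipulate: change existing text and append a new element with text.
to.firstChild.data = "Cam"
heading = doc.createElement("heading")
heading.appendChild(doc.createTextNode("Reminder"))
doc.documentElement.appendChild(heading)

print(doc.documentElement.toxml())
# <note><to>Cam</to><heading>Reminder</heading></note>
```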
<TABLE>
<ROWS>
<TR>
<TD>A</TD>
<TD>B</TD>
</TR>
<TR>
<TD>C</TD>
<TD>D</TD>
</TR>
</ROWS>
</TABLE>
The Document Object Model represents this table like this:
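The same structure can be printed from a DOM built in code. This sketch uses Python's standard xml.dom.minidom. The helper `outline` is hypothetical. It prints element nodes and non-blank text nodes, indented by depth, and skips the whitespace-only text nodes between rows.

```python
from xml.dom import minidom

TABLE = """<TABLE><ROWS>
<TR><TD>A</TD><TD>B</TD></TR>
<TR><TD>C</TD><TD>D</TD></TR>
</ROWS></TABLE>"""

def outline(node, depth=0, lines=None):
    """List element nodes and non-blank text nodes, indented by depth."""
    if lines is None:
        lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
    elif node.nodeType == node.TEXT_NODE and node.data.strip():
        lines.append("  " * depth + repr(node.data))
    for child in node.childNodes:
        outline(child, depth + 1, lines)
    return lines

doc = minidom.parseString(TABLE)
print("\n".join(outline(doc.documentElement)))
```

Each TD is an element node, and the letters A to D are text nodes one level below their cells.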
XML Schema
What is XML schema:
An XML schema language is used to express constraints on XML documents. Many schema
languages are in use today, for example RELAX NG and XSD (XML Schema
Definition).
An XML schema is used to define the structure of an XML document. It is like a DTD but provides
more control over the XML structure.
Checking Validation
An XML document is called "well-formed" if it has correct XML syntax. A well-formed XML
document that has also been validated against a schema is called valid.
3. to constrain where elements and attributes can appear, and what can appear inside those
elements, such as saying that a chapter title occurs inside a chapter, and that a chapter must
consist of a chapter title followed by one or more paragraphs of text;
4. to provide documentation that is both human-readable and machine-processable;
5. to give a formal description of one or more documents.
Step 1: Create the schema file.
employee.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="employee">
<xs:complexType>
<xs:sequence>
<xs:element name="firstname" type="xs:string"/>
<xs:element name="lastname" type="xs:string"/>
<xs:element name="email" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
xs is a namespace prefix used to identify the schema elements and types. The document (root) element
of the schema is xs:schema. It takes the attribute xmlns:xs, which has the value
http://www.w3.org/2001/XMLSchema. This declaration binds the xs prefix to the XML Schema
namespace and indicates that the document follows the rules of XML Schema. These rules are
defined by the W3C XML Schema Recommendation of 2001.
Next, xs:element is used to define an XML element. In the above example, the element
employee is of complex type and has three child elements: firstname, lastname and email. All of
these elements are of type string (xs:string).
<xs:complexType> : It defines that the element 'employee' is complex type.
<xs:sequence> : It defines that the complex type is a sequence of elements.
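A schema is itself an XML document, so it can be read like any other. Python's standard library has no XSD validator, though third-party packages such as lxml provide one. The sketch below uses xml.etree.ElementTree only to show how the xs prefix resolves to its namespace URI and how the element sequence declared for employee can be listed. The mapping name `NS` is our own choice.

```python
import xml.etree.ElementTree as ET

XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="employee">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
        <xs:element name="email" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# ElementTree expands the xs: prefix into the full namespace URI,
# so path lookups need a prefix-to-URI mapping.
NS = {"xs": "http://www.w3.org/2001/XMLSchema"}

schema = ET.fromstring(XSD)
print(schema.tag)   # {http://www.w3.org/2001/XMLSchema}schema

fields = [(el.get("name"), el.get("type"))
          for el in schema.findall("xs:element/xs:complexType/xs:sequence/xs:element", NS)]
for name, xsd_type in fields:
    print(name, xsd_type)
# firstname xs:string
# lastname xs:string
# email xs:string
```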
Step 2: Now write the XML document, in which values are given to the XML
elements.
employee.xml
<?xml version="1.0"?>
<employee
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="employee.xsd">
<firstname>vimal</firstname>
<lastname>kumar</lastname>
<email>[email protected]</email>
</employee>
The xmlns:xsi attribute binds the xsi prefix to the namespace
http://www.w3.org/2001/XMLSchema-instance, which marks the document as an instance of an XML schema.
The xsi:noNamespaceSchemaLocation attribute takes the name of the XSD file as its value.
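An application can read the schema location like any namespaced attribute. In Python's standard xml.etree.ElementTree, the attribute key is the namespace URI in braces followed by the local name. This is a sketch only. The email element is left out of the sample document.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

DOC = """<employee
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:noNamespaceSchemaLocation="employee.xsd">
<firstname>vimal</firstname>
<lastname>kumar</lastname>
</employee>"""

root = ET.fromstring(DOC)
# Namespaced attributes are keyed as "{namespace-URI}localname".
location = root.get("{%s}noNamespaceSchemaLocation" % XSI)
print(location)   # employee.xsd
print(root.tag)   # employee (no default namespace, so the tag has no URI part)
```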
Step 3: See the output in browser window.