DTD - Overview
An XML Document Type Definition, commonly known as a DTD, is a way to describe an XML language
precisely. A DTD is used to check the structure and vocabulary of an XML document against
the grammatical rules of the appropriate XML language.
An XML document can be described as −
Well-formed − If the XML document adheres to all the general XML rules, such as tags must
be properly nested, opening and closing tags must be balanced, and empty tags must end with
'/>', then it is called well-formed.
OR
Valid − An XML document is said to be valid when it is not only well-formed but also
conforms to an available DTD that specifies which tags it uses, what attributes those tags can
contain, and which tags can occur inside other tags, among other properties.
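As a hedged illustration of the difference, the following document is well-formed but not valid. It reuses the address vocabulary from later in this tutorial; leaving out the phone element is a choice made here only for the sake of the example −

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE address [
  <!ELEMENT address (name,company,phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT company (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>
<!-- Well-formed: every tag is closed and properly nested. -->
<!-- Not valid: the content model requires a phone element after company. -->
<address>
  <name>Tanmay Patil</name>
  <company>TutorialsPoint</company>
</address>
```

A non-validating parser would accept this document, whereas a validating parser should report the missing phone element.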
The following diagram represents that a DTD is used to structure the XML document −
Types
A DTD can be classified by where it is declared relative to the XML document −
Internal DTD
External DTD
When a DTD is declared within the XML file, it is called an internal DTD; when it is declared in a
separate file, it is called an external DTD.
We will learn more about these in the chapter DTD Syntax.
Features
DTD - Syntax
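A DTD can either be specified inside the XML document, or it can be kept in a separate file
to which the XML document is then linked.
Syntax
The basic form of a DOCTYPE declaration with an internal DTD is as follows, where root-element is
the name of the document's root element −
<!DOCTYPE root-element [element-declarations]>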
Example
<!DOCTYPE address [
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
The DOCTYPE declaration has an exclamation mark (!) at the start of the element name. The
DOCTYPE informs the parser that a DTD is associated with this XML document.
DTD Body − The DOCTYPE declaration is followed by body of the DTD, where you declare
elements, attributes, entities, and notations −
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
Several elements are declared here that make up the vocabulary of the <address> document.
<!ELEMENT name (#PCDATA)> defines the element name to be of type "#PCDATA". Here
#PCDATA means parsed character data, that is, text that the parser reads for markup.
End Declaration − Finally, the declaration section of the DTD is closed using a closing bracket and a
closing angle bracket (]>). This effectively ends the definition, and thereafter, the XML document
follows immediately.
Rules
The document type declaration must appear at the start of the document (preceded only by the
XML header) - it is not permitted anywhere else within the document.
Similar to the DOCTYPE declaration, the element declarations must start with an exclamation
mark.
The Name in the document type declaration must match the element type of the root element.
External DTD
In an external DTD, elements are declared outside the XML file. They are accessed by specifying the
SYSTEM identifier, which may be either the path to a .dtd file or a valid URL. To reference an external
DTD, the standalone attribute in the XML declaration must be set to no. This means that the declaration
includes information from an external source.
Syntax
<!DOCTYPE root-element SYSTEM "file-name">
where file-name is the file with the .dtd extension.
Example
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE address SYSTEM "address.dtd">
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
The content of the DTD file address.dtd are as shown −
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
Types
You can refer to an external DTD by either using system identifiers or public identifiers.
System Identifiers
A system identifier enables you to specify the location of an external file containing DTD
declarations. The syntax is as follows −
<!DOCTYPE name SYSTEM "address.dtd" [...]>
As you can see, it contains the keyword SYSTEM and a URI reference pointing to the location of the
document.
DTD - Components
Element
Attributes
Entities
Elements
XML elements can be defined as building blocks of an XML document. Elements can behave as a
container to hold text, elements, attributes, media objects or mix of all.
Each XML document contains one or more elements, the boundaries of which are delimited either by
start-tags and end-tags or, for empty elements, by an empty-element tag.
Example
<name>
Tutorials Point
</name>
As you can see, we have defined a <name> tag. There is text between the start and end tags of <name>.
Elements used in an XML DTD need to be declared, which is discussed in detail in the
chapter DTD Elements.
Attributes
Attributes are part of XML elements. An element can have any number of unique attributes.
An attribute gives more information about the XML element or, more precisely, defines a property of
the element. An XML attribute is always a name-value pair.
Example
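The example for this section is not reproduced here, so the following is only a minimal sketch of an element carrying attributes. The element name img and the file name flowers.jpg are hypothetical and chosen purely for illustration −

```xml
<!-- img is the element name; src and alt are attribute names, -->
<!-- and the quoted strings are the attribute values. -->
<img src="flowers.jpg" alt="A bunch of flowers"/>
```

Each attribute appears once, as a name-value pair inside the start tag.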
Entities
Entities are placeholders in XML. These can be declared in the document prolog or in a DTD. Entities
can be primarily categorized as −
Built-in entities
Character entities
General entities
Parameter entities
There are five built-in entities that play a role in well-formed XML; they are −
ampersand: &amp;
Single quote: &apos;
Greater than: &gt;
Less than: &lt;
Double quote: &quot;
DTD - Elements
XML elements can be defined as building blocks of an XML document. Elements can behave as a
container to hold text, elements, attributes, media objects or mix of all.
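A DTD element is declared with an ELEMENT declaration. When an XML file is validated against a DTD,
the parser first checks the root element and then validates the child elements.
Syntax
<!ELEMENT elementname content-model>
Here elementname is the name of the element being declared, and content-model is the keyword EMPTY,
the keyword ANY, or a parenthesized description of the allowed content. The content model falls into
one of the following categories −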
Empty content
Element content
Mixed content
Any content
Empty Content
This is a special case of element declaration in which the element has no content.
Such elements are declared with the keyword EMPTY.
Syntax
<!ELEMENT elementname EMPTY>
Example
<!DOCTYPE address [
<!ELEMENT address EMPTY>
]>
<address />
In this example address is declared as an empty element. The markup for address element would
appear as <address />.
Element Content
In an element declaration with element content, the content is the list of allowable child elements
within parentheses. More than one element can be included.
Syntax
<!ELEMENT elementname (child1, child2...)>
Example
The following example demonstrates an element declaration with element content −
<!DOCTYPE address [
<!ELEMENT address (name,company,phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
]>
<address>
<name>Tanmay Patil</name>
<company>TutorialsPoint</company>
<phone>(011) 123-4567</phone>
</address>
In the above example, address is the parent element and name, company and phone are its child
elements.
The following table lists the operators and syntax rules that can be applied when defining child
elements −
Operator | Syntax | Description | Example
, | <!ELEMENT element-name (child1, child2)> | Gives a sequence of child elements, separated by commas, which must be included in element-name. | <!ELEMENT address (name, company)> declares the sequence of child elements name, company, which must occur in the same order inside the element address.
Rules
We need to follow certain rules if there is more than one element content −
Sequences − Often the elements within DTD documents must appear in a distinct order. If
this is the case, you define the content using a sequence. For example −
<!ELEMENT address (name,company,phone)>
This declaration indicates that the <address> element must have exactly three children,
<name>, <company>, and <phone>, and that they must appear in this order.
Choices − Suppose you need to allow one element or another, but not both. In such cases you
must use the pipe (|) character. The pipe functions as an exclusive OR. For example −
<!ELEMENT address (mobile | landline)>
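Sequences and choices can be combined and given occurrence indicators. The sketch below assumes the standard DTD indicators (? for zero or one, + for one or more, * for zero or more), which do not appear in the table above, and the note element is a hypothetical addition used only for illustration −

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE address [
  <!-- name first, then one or more mobile/landline entries, then an optional note. -->
  <!ELEMENT address (name, (mobile | landline)+, note?)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT mobile (#PCDATA)>
  <!ELEMENT landline (#PCDATA)>
  <!ELEMENT note (#PCDATA)>
]>
<address>
  <name>Tanmay Patil</name>
  <mobile>(011) 123-4567</mobile>
  <landline>(011) 765-4321</landline>
</address>
```

The document conforms to the declaration because the choice group is repeated twice and the optional note is simply left out.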
Mixed Element Content
This is the combination of (#PCDATA) and child elements. PCDATA stands for parsed character
data, that is, text that is not markup. Within mixed content models, text can appear by itself or it can
be interspersed between elements. The rules for mixed content models are similar to those for element
content, as discussed in the previous section, except that #PCDATA must come first in the group and,
when child elements are listed, the group must be followed by an asterisk (*).
Syntax
<!ELEMENT elementname (#PCDATA|child1|child2)*>
Example
The following is a simple example demonstrating a mixed content element declaration in a DTD −
<!DOCTYPE address [
<!ELEMENT address (#PCDATA|name)*>
<!ELEMENT name (#PCDATA)>
]>
<address>
Here's a bit of text mixed up with the child element.
<name>
Tanmay Patil
</name>
</address>
ANY Element Content
You can declare an element using the ANY keyword in the content. It is most often referred to as a
mixed category element. ANY is useful when you have yet to decide the allowable contents of the
element.
Syntax
<!ELEMENT elementname ANY>
Here, the ANY keyword indicates that text (PCDATA) and/or any elements declared within the DTD
can be used within the content of the <elementname> element. They can be used in any order any
number of times. However, the ANY keyword does not allow you to include elements that are not
declared within the DTD.
Example
Following is a simple example demonstrating the element declaration with ANY content −
<!DOCTYPE address [
<!ELEMENT address ANY>
]>
<address>
Here's a bit of sample text
</address>
DTD - Attributes
In this chapter we will discuss DTD attributes. An attribute gives more information about an
element or, more precisely, defines a property of an element. An XML attribute is always in the form
of a name-value pair. An element can have any number of unique attributes.
An attribute declaration is similar to an element declaration in many ways except one: instead
of declaring allowable content for elements, you declare a list of allowable attributes for each element.
These lists are called ATTLIST declarations.
Syntax
<!ATTLIST element-name attribute-name attribute-type attribute-value>
In the above syntax −
The attribute declaration starts with the <!ATTLIST keyword if the element contains the attribute.
element-name specifies the name of the element to which the attribute applies.
attribute-name specifies the name of the attribute that is included with the element-name.
attribute-type defines the type of the attribute. We will discuss more on this in the following
sections.
attribute-value takes a default value or one of the keywords #REQUIRED, #IMPLIED, or #FIXED
(followed by a value) that governs the attribute's value. We will discuss more on this in the
following sections.
Example
<!DOCTYPE address [
<!ELEMENT address ( name )>
<!ELEMENT name ( #PCDATA )>
<!ATTLIST name id CDATA #REQUIRED>
]>
<address>
<name id = "123">Tanmay Patil</name>
</address>
Attribute Types
When declaring attributes, you can specify how the processor should handle the data that appears in
the value. We can categorize attribute types in three main categories −
String type
Tokenized types
Enumerated types
Sr.No. | Type & Description
1 | CDATA − CDATA is character data (text and not markup). It is a String Attribute Type.
2 | ID − It is a unique identifier for the element that carries it. The same ID value should not appear more than once in a document. It is a Tokenized Attribute Type.
3 | IDREF − It is used to reference an ID of another element. It is used to establish connections between elements. It is a Tokenized Attribute Type.
4 | IDREFS − It is used to reference multiple IDs. It is a Tokenized Attribute Type.
5 | ENTITY − It represents an external (unparsed) entity in the document. It is a Tokenized Attribute Type.
6 | ENTITIES − It represents a list of external (unparsed) entities in the document. It is a Tokenized Attribute Type.
7 | NMTOKEN − It is similar to CDATA, but the attribute value must consist of a valid XML name token. It is a Tokenized Attribute Type.
8 | NMTOKENS − It is similar to CDATA, but the attribute value must consist of a list of valid XML name tokens. It is a Tokenized Attribute Type.
9 | NOTATION − The attribute value refers to a notation declared in the DTD. It is an Enumerated Attribute Type.
10 | Enumeration − It allows defining a specific list of values, one of which the attribute value must match. It is an Enumerated Attribute Type.
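The following sketch shows a few of these types working together. The staff and employee element names, the attribute names, and the sample values are all hypothetical and introduced only for this illustration −

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE staff [
  <!ELEMENT staff (employee+)>
  <!ELEMENT employee (#PCDATA)>
  <!-- id must be unique in the document; manager must match some id value; -->
  <!-- status is an enumeration with "active" as its default. -->
  <!ATTLIST employee
            id      ID                  #REQUIRED
            manager IDREF               #IMPLIED
            status  (active | on-leave) "active">
]>
<staff>
  <employee id="e1">Tanmay Patil</employee>
  <employee id="e2" manager="e1" status="on-leave">A. Colleague</employee>
</staff>
```

A validating parser would be expected to reject a second employee with id="e1", a manager value that matches no id, or a status outside the enumerated list.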
Within each attribute declaration, you must specify how the value will appear in the document. You
can specify whether an attribute has a default value, has a fixed value, is required, or is implied.
Default Values
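An attribute declaration can contain a default value, which applies when the attribute is omitted from
the element. The value can be enclosed in single quotes (') or double quotes (").
Syntax
<!ATTLIST element-name attribute-name attribute-type "default-value">
Example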
<!DOCTYPE address [
<!ELEMENT address ( name )>
<!ELEMENT name ( #PCDATA )>
<!ATTLIST name id CDATA "0">
]>
<address>
<name id = "123">
Tanmay Patil
</name>
</address>
In this example, the name element has an attribute id whose default value is 0. The default value is
enclosed within double quotes.
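Because the example above writes id explicitly, the default never comes into play. As a small variation on it (a sketch, not part of the original example), the attribute can be omitted so that the declared default applies −

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE address [
  <!ELEMENT address ( name )>
  <!ELEMENT name ( #PCDATA )>
  <!ATTLIST name id CDATA "0">
]>
<address>
  <!-- No id is written here, so a parser that reads the DTD reports id="0". -->
  <name>Tanmay Patil</name>
</address>
```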
FIXED Values
The #FIXED keyword followed by the fixed value is used when you want to specify that the attribute
value is constant and cannot be changed. A common use of fixed attributes is specifying version
numbers.
Syntax
<!ATTLIST element-name attribute-name attribute-type #FIXED "value">
Example
<!DOCTYPE address [
<!ELEMENT address (company)*>
<!ELEMENT company (#PCDATA)>
<!ATTLIST company name NMTOKEN #FIXED "tutorialspoint">
]>
<address>
<company name = "tutorialspoint">we are a free online teaching faculty</company>
</address>
In this example, the #FIXED keyword indicates that "tutorialspoint" is the only value allowed for
the attribute name of the element <company>. If we try to change the attribute value, the parser
reports an error. For example, the following <company> element is invalid under this DTD −
<address>
<company name = "abc">we are a free online teaching faculty</company>
</address>
REQUIRED values
Whenever you want to specify that an attribute is required, use the #REQUIRED keyword.
Syntax
<!ATTLIST element-name attribute-name attribute-type #REQUIRED>
Example
<!DOCTYPE address [
<!ELEMENT address ( name )>
<!ELEMENT name ( #PCDATA )>
<!ATTLIST name id CDATA #REQUIRED>
]>
<address>
<name id = "123">
Tanmay Patil
</name>
</address>
In this example, the #REQUIRED keyword specifies that the attribute id must be provided
for the element name.
IMPLIED Values
When declaring attributes, you must always specify a value declaration. If the attribute you are
declaring has no default value, has no fixed value, and is not required, then you must declare the
attribute as implied. The keyword #IMPLIED is used to specify an attribute as implied.
Syntax
<!ATTLIST element-name attribute-name attribute-type #IMPLIED>
Example
<!DOCTYPE address [
<!ELEMENT address ( name )>
<!ELEMENT name ( #PCDATA )>
<!ATTLIST name id CDATA #IMPLIED>
]>
<address>
<name />
</address>
DTD - Entities
Entities are used to define shortcuts to special characters within the XML documents. Entities can be
primarily of four types −
Built-in entities
Character entities
General entities
Parameter entities
Internal Entity
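If an entity is declared within a DTD, it is called an internal entity.
Syntax
<!ENTITY entity_name "entity_value">
In the above syntax −
entity_name is the name of the entity, followed by its value within double quotes or single
quotes.
entity_value holds the value for the entity name.
The entity value of the internal entity is de-referenced by adding the prefix & and the suffix ;
to the entity name, i.e. &entity_name;.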
Example
<!DOCTYPE address [
<!ELEMENT address (#PCDATA)>
<!ENTITY name "Tanmay patil">
<!ENTITY company "TutorialsPoint">
<!ENTITY phone_no "(011) 123-4567">
]>
<address>
&name;
&company;
&phone_no;
</address>
In the above example, the respective entity names name, company and phone_no are replaced by their
values in the XML document. The entity values are de-referenced by writing & before and ; after
the entity name.
Save this file as sample.xml and open it in any browser. You will notice that the entity references
for name, company, and phone_no are replaced by their values.
External Entity
If an entity is declared outside a DTD, it is called an external entity. You can refer to an external
entity by using either system identifiers or public identifiers.
Syntax
<!ENTITY name SYSTEM "URI/URL">
System Identifiers − A system identifier enables you to specify the location of an external
file containing DTD declarations. The syntax is as follows −
<!DOCTYPE name SYSTEM "address.dtd" [...]>
As you can see, it contains the keyword SYSTEM and a URI reference pointing to the document's
location.
Public Identifiers − Public identifiers provide a mechanism to locate DTD resources and are
written as below −
<!DOCTYPE name PUBLIC "-//Beginning XML//DTD Address Example//EN" "address.dtd">
As you can see, it begins with the keyword PUBLIC, followed by a specialized identifier. In an XML
DOCTYPE declaration the public identifier must itself be followed by a system identifier, shown here
as address.dtd. Public identifiers are used to identify an entry in a catalog. Public identifiers can
follow any format; however, a commonly used format is called Formal Public Identifiers, or FPIs.
Example
Let us understand the external entity with the following example −
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE address SYSTEM "address.dtd">
<address>
<name>
Tanmay Patil
</name>
<company>
TutorialsPoint
</company>
<phone>
(011) 123-4567
</phone>
</address>
Below is the content of the DTD file address.dtd −
<!ELEMENT address (name, company, phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
Built-in entities
All XML parsers must support built-in entities. In general, you can use these entity references
anywhere normal text is allowed within the XML document, such as in element contents and
attribute values.
There are five built-in entities that play their role in well-formed XML; they are −
ampersand: &amp;
Single quote: &apos;
Greater than: &gt;
Less than: &lt;
Double quote: &quot;
Example
<note>
<description>I&apos;m a technical writer &amp; programmer</description>
</note>
As you can see here, the &amp; reference is replaced by the & character whenever the processor
encounters it.
Character entities
Character entities are used to name some of the entities that are symbolic representations of
information, i.e. characters that are difficult or impossible to type can be substituted by character
entities.
Example
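The original example is not available here, so the following is only a sketch using numeric character references, which XML supports without any declaration. The author and copyright element names are hypothetical −

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- &#169; (decimal) and &#xA9; (hexadecimal) both refer to U+00A9, the copyright sign. -->
<author>
  <copyright>&#169; TutorialsPoint</copyright>
  <copyright>&#xA9; TutorialsPoint</copyright>
</author>
```

Named references familiar from HTML, such as one for the copyright sign, are not predefined in XML and would have to be declared in a DTD before use.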
General entities
General entities must be declared within the DTD before they can be used within an XML document.
Instead of representing only a single character, general entities can represent characters, paragraphs,
and even entire documents.
Syntax
To declare a general entity, use a declaration of this general form in your DTD −
<!ENTITY ename "text">
Example
<!DOCTYPE note [
<!ENTITY source-text "tutorialspoint">
]>
<note>
&source-text;
</note>
Whenever an XML parser encounters a reference to source-text entity, it will supply the replacement
text to the application at the point of the reference.
Parameter entities
The purpose of a parameter entity is to enable you to create reusable sections of replacement text.
Syntax
<!ENTITY % ename "entity_value">
Here ename is the name of the parameter entity and entity_value is its replacement text.
Parameter entities are dereferenced in the same way as a general entity reference, only with a percent
sign instead of an ampersand, i.e. %ename;.
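As a minimal sketch of a parameter entity in use, the external DTD below declares a content model once and reuses it in two element declarations. The file name address.dtd, the entity name contact, and the office element are assumptions made for this illustration. Note that a parameter entity reference placed inside another declaration, as here, is only permitted in an external DTD, not in the internal subset −

```xml
<!-- address.dtd: an external DTD (hypothetical file name) -->
<!-- Declare the reusable content model once. -->
<!ENTITY % contact "name, company, phone">

<!-- Reference it with a percent sign and a semicolon wherever it is needed. -->
<!ELEMENT address (%contact;)>
<!ELEMENT office  (%contact;)>

<!ELEMENT name    (#PCDATA)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT phone   (#PCDATA)>
```

Changing the replacement text of contact updates both element declarations at once.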
XSL
Before learning XSLT, we should first understand XSL, which stands for
EXtensible Stylesheet Language. XSL is to XML what CSS is to HTML.
In an HTML document, tags such as table, div, and span are predefined, and the browser knows
how to style and display them using CSS. In an XML document, however, tags are not predefined.
To make it possible to understand and style an XML document, the World Wide Web Consortium
(W3C) developed XSL, which acts as an XML-based stylesheet language. An XSL document
specifies how a browser should render an XML document.
XSL consists of three parts −
XSLT − used to transform an XML document into various other types of document.
XPath − used to navigate an XML document.
XSL-FO − used to format an XML document.
What is XSLT
XSLT, Extensible Stylesheet Language Transformations, provides the ability to transform XML data
from one format to another automatically.
An XSLT stylesheet defines the transformation rules to be applied to the target XML
document. The XSLT stylesheet is itself written in XML format. An XSLT processor takes the XSLT
stylesheet, applies the transformation rules to the target XML document, and generates a formatted
document in XML, HTML, or text format. This formatted document is then used by an
XSLT formatter to generate the actual output that is displayed to the end user.
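To make the idea concrete, here is a minimal sketch of an XSLT 1.0 stylesheet. It assumes the input is the address document used in the DTD chapters (an address root with name, company, and phone children); the HTML layout it produces is an arbitrary choice for illustration −

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Ask the processor to serialize the result as HTML. -->
  <xsl:output method="html"/>

  <!-- This rule fires for the address root element of the source document. -->
  <xsl:template match="/address">
    <html>
      <body>
        <h1><xsl:value-of select="name"/></h1>
        <p><xsl:value-of select="company"/>, <xsl:value-of select="phone"/></p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

The stylesheet is itself a well-formed XML document, and each select attribute holds an XPath expression evaluated relative to the matched element.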
Advantages
Independent of programming. Transformations are written in a separate xsl file, which is itself
an XML document.
Output can be altered by simply modifying the transformations in the xsl file, with no need to change
any code. Web designers can therefore edit the stylesheet and see the change in the output
quickly.
XLink
XLink is used to create hyperlinks in XML documents.
XLink Browser Support
XLink Syntax
In HTML, the <a> element defines a hyperlink. However, this is not how it works in XML. In XML
documents, you can use whatever element names you want - therefore it is impossible for browsers to
predict what link elements will be called in XML documents.
Below is a simple example of how to use XLink to create links in an XML document:
<homepages xmlns:xlink="http://www.w3.org/1999/xlink">
<homepage xlink:type="simple" xlink:href="https://www.w3schools.com">Visit W3Schools</homepage>
<homepage xlink:type="simple" xlink:href="http://www.w3.org">Visit W3C</homepage>
</homepages>
To get access to the XLink features we must declare the XLink namespace. The XLink namespace is:
"http://www.w3.org/1999/xlink". The xlink:type and the xlink:href attributes in the <homepage>
elements come from the XLink namespace.
The xlink:type="simple" creates a simple "HTML-like" link (means "click here to go there").
XLink Example
<bookstore xmlns:xlink="http://www.w3.org/1999/xlink">
</bookstore>
Example explained:
Attribute | Value | Description
xlink:actuate | onLoad, onRequest, other, none | Defines when the linked resource is read and shown: onLoad - the resource should be loaded and shown when the document loads; onRequest - the resource is not read or shown before the link is clicked
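The body of the bookstore example is not available here, so the following is only an illustrative sketch of a link that sets xlink:actuate. The book title, the image path, and the description text are hypothetical; xlink:show is a further standard XLink attribute that is not described above −

```xml
<?xml version="1.0" encoding="UTF-8"?>
<bookstore xmlns:xlink="http://www.w3.org/1999/xlink">
  <book title="Sample Book">
    <!-- xlink:show="new" asks for the target to open in a new window; -->
    <!-- xlink:actuate="onRequest" defers loading until the link is activated. -->
    <description xlink:type="simple"
                 xlink:href="/images/sample-book.gif"
                 xlink:show="new"
                 xlink:actuate="onRequest">
      A short description of the book.
    </description>
  </book>
</bookstore>
```

Switching xlink:actuate to onLoad would instead request that the resource be loaded as soon as the document loads.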
XPointer
There is no browser support for XPointer. But XPointer is used in other XML languages.
XPointer Example
In this example, we will use XPointer in conjunction with XLink to point to a specific part of
another document.
We will start by looking at the target XML document (the document we are linking to):
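The target document itself is not reproduced here, so the following is only an illustrative sketch of what dogbreeds.xml might contain. The element names, the second breed, and the internal DTD are assumptions made for this example. The DTD declares id as type ID, which is what lets an id-based pointer find the element −

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE dogbreeds [
  <!ELEMENT dogbreeds (dog+)>
  <!ELEMENT dog (temperament)>
  <!ATTLIST dog
            breed CDATA #REQUIRED
            id    ID    #REQUIRED>
  <!ELEMENT temperament (#PCDATA)>
]>
<dogbreeds>
  <dog breed="Rottweiler" id="Rottweiler">
    <temperament>Confident, bold, alert</temperament>
  </dog>
  <dog breed="FCRetriever" id="FCRetriever">
    <temperament>Friendly, active</temperament>
  </dog>
</dogbreeds>
```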
So, instead of linking to the entire document (as with XLink), XPointer allows you to link to specific
parts of the document. To link to a specific part of a page, add a number sign (#) and an XPointer
expression after the URL in the xlink:href attribute, like this:
xlink:href="https://dog.com/dogbreeds.xml#xpointer(id('Rottweiler'))". The expression refers to the
element in the target document with the id value of "Rottweiler".
XPointer also allows a shorthand method for linking to an element with an id. You can use the value
of the id directly, like this: xlink:href="https://dog.com/dogbreeds.xml#Rottweiler".
The following XML document contains links to more information of the dog breed for each of my
dogs:
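The linking document is not included in this section, so here is a hedged sketch of what it could look like. The mydogs, mydog, description, and fact element names and the dog descriptions are hypothetical; the links use the shorthand id form against the dogbreeds.xml URL mentioned above −

```xml
<?xml version="1.0" encoding="UTF-8"?>
<mydogs xmlns:xlink="http://www.w3.org/1999/xlink">
  <mydog>
    <description>Anton is my favorite dog.</description>
    <!-- The fragment after # selects the element whose id is Rottweiler. -->
    <fact xlink:type="simple"
          xlink:href="https://dog.com/dogbreeds.xml#Rottweiler">Fact about Rottweiler</fact>
  </mydog>
  <mydog>
    <description>Pluto is the sweetest dog on earth.</description>
    <fact xlink:type="simple"
          xlink:href="https://dog.com/dogbreeds.xml#FCRetriever">Fact about flat-coated Retriever</fact>
  </mydog>
</mydogs>
```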