DTD
Elements
Elements are the main building blocks
of both XML and HTML documents.
Examples of HTML elements are "body"
and "table". Examples of XML elements
could be "note" and "message". Elements
can contain text, other elements, or be
empty. Examples of empty HTML elements
are "hr", "br" and "img".
Examples:
<body>some text</body>
<message>some text</message>
In a DTD, XML elements are declared with
an element declaration with the following
syntax:
<!ELEMENT element-name category>
or
<!ELEMENT element-name (element-content)>
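To see the second form in context, here is a minimal sketch of a complete document with an internal DTD. The element names "note", "to", "from", "heading" and "body" are borrowed from the note examples later in this text; the choice of (#PCDATA) for the text-only children is an assumption made for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!-- (element-content) form: note holds these four children, in this order -->
  <!ELEMENT note (to, from, heading, body)>
  <!-- (#PCDATA) means the element holds text only -->
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
```

A validating parser should accept this document and reject one where, for example, "from" comes before "to".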
Empty Elements
Empty elements are declared with the
category keyword EMPTY:
<!ELEMENT element-name EMPTY>
Example:
<!ELEMENT br EMPTY>
XML example:
<br />
Attributes
Attributes provide extra information
about elements.
Attributes are always placed inside the
opening tag of an element. Attributes
always come in name/value pairs. The
following "img" element has additional
information about a source file:
<img src="computer.gif" />
The name of the element is "img". The
name of the attribute is "src". The value of
the attribute is "computer.gif". Since the
element itself is empty it is closed by a
" /".
Declaring Attributes
An attribute declaration has the following
syntax:
<!ATTLIST element-name attribute-name
attribute-type default-value>
XML example:
<payment type="check" />
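A declaration that would make the "payment" element above valid could look like the sketch below. The attribute type CDATA and the default value "check" are assumptions chosen to match the XML example, not declarations taken from the text.

```xml
<?xml version="1.0"?>
<!DOCTYPE payment [
  <!ELEMENT payment EMPTY>
  <!-- element-name: payment, attribute-name: type,
       attribute-type: CDATA, default-value: "check" (assumed) -->
  <!ATTLIST payment type CDATA "check">
]>
<payment type="check" />
```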
The attribute-type can be one of the
following:
Type          Description
CDATA         The value is character data
(en1|en2|..)  The value must be one from an enumerated list
ID            The value is a unique id
IDREF         The value is the id of another element
IDREFS        The value is a list of other ids
NMTOKEN       The value is a valid XML name token
NMTOKENS      The value is a list of valid XML name tokens
ENTITY        The value is an entity
ENTITIES      The value is a list of entities
NOTATION      The value is a name of a notation
xml:          The value is a predefined xml value
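As an illustration of two of these types working together, the sketch below declares an ID attribute and an IDREF attribute. The attribute name "reply-to" and the id values are hypothetical, and the #REQUIRED and #IMPLIED keywords are used here only to keep the example short.

```xml
<?xml version="1.0"?>
<!DOCTYPE messages [
  <!ELEMENT messages (note+)>
  <!ELEMENT note (#PCDATA)>
  <!-- id must be unique within the document;
       reply-to must match the id of some other element -->
  <!ATTLIST note id       ID    #REQUIRED
                 reply-to IDREF #IMPLIED>
]>
<messages>
  <note id="p501">Reminder</note>
  <note id="p502" reply-to="p501">Re: Reminder</note>
</messages>
```

A validating parser should report an error if two notes share an id, or if "reply-to" names an id that does not exist in the document.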
Valid XML:
<square width="100" />
In the example above, the "square"
element is defined to be an empty
element with a "width" attribute of type
CDATA. If no width is specified, it has a
default value of 0.
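The declarations described in this paragraph could be written as in the following sketch. It is reconstructed from the description only (an empty "square" element with a CDATA "width" attribute that defaults to 0).

```xml
<?xml version="1.0"?>
<!DOCTYPE square [
  <!ELEMENT square EMPTY>
  <!-- if width is omitted, the parser supplies the default "0" -->
  <!ATTLIST square width CDATA "0">
]>
<square width="100" />
```

With this DTD, a bare `<square />` is also valid and is treated as if it had been written with width="0".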
#REQUIRED
Syntax
<!ATTLIST element-name attribute-name
attribute-type #REQUIRED>
Example
DTD:
<!ATTLIST person number CDATA
#REQUIRED>
Valid XML:
<person number="5677" />
Invalid XML:
<person />
Use the #REQUIRED keyword if you don't
have an option for a default value, but still
want to force the attribute to be present.
#IMPLIED
Syntax
<!ATTLIST element-name attribute-name
attribute-type #IMPLIED>
Example
DTD:
<!ATTLIST contact fax CDATA #IMPLIED>
Valid XML:
<contact fax="555-667788" />
Valid XML:
<contact />
Use the #IMPLIED keyword if you don't
want to force the author to include an
attribute, and you don't have an option for
a default value.
#FIXED
Syntax
<!ATTLIST element-name attribute-name
attribute-type #FIXED "value">
Example
DTD:
<!ATTLIST sender company CDATA
#FIXED "Microsoft">
Valid XML:
<sender company="Microsoft" />
Invalid XML:
<sender company="W3Schools" />
Use the #FIXED keyword when you want
an attribute to have a fixed value without
allowing the author to change it. If an
author includes another value, a validating
XML parser will report an error.
Enumerated Attribute Values
XML example:
<payment type="check" />
or
<payment type="cash" />
Use enumerated attribute values when
you want the attribute value to be one of
a fixed set of legal values.
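A minimal sketch of an enumerated declaration that would accept both "payment" examples above is shown below. The set (check|cash) follows from the two XML examples; the default value "cash" is an assumption.

```xml
<?xml version="1.0"?>
<!DOCTYPE payment [
  <!ELEMENT payment EMPTY>
  <!-- type may only be "check" or "cash"; "cash" is the assumed default -->
  <!ATTLIST payment type (check|cash) "cash">
]>
<payment type="check" />
```

A value outside the list, such as type="card", should be rejected by a validating parser.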
Use of Elements vs. Attributes
Data can be stored in child elements or in
attributes.
Take a look at these examples:
<person sex="female">
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
<person>
<sex>female</sex>
<firstname>Anna</firstname>
<lastname>Smith</lastname>
</person>
In the first example sex is an attribute. In
the second, sex is a child element. Both
examples provide the same information.
There are no rules about when to use
attributes, and when to use child
elements. My experience is that attributes
are handy in HTML, but in XML you should
try to avoid them. Use child elements if
the information feels like data.
Avoid using attributes?
Should you avoid using attributes?
Some of the problems with attributes are:
• attributes cannot contain multiple
values (child elements can)
• attributes are not easily expandable
(for future changes)
• attributes cannot describe structures
(child elements can)
• attributes are more difficult to
manipulate by program code
• attribute values are not easy to test
against a DTD
If you use attributes as containers for
data, you end up with documents that are
difficult to read and maintain. Try to use
elements to describe data. Use attributes
only to provide information that is not
relevant to the data.
Don't end up like this (this is not how XML
should be used):
<note day="12" month="11"
year="2002"
to="Tove" from="Jani"
heading="Reminder"
body="Don't forget me this weekend!">
</note>
An id attribute that only identifies a
note is a reasonable exception:
<messages>
<note id="p502">
<to>Jani</to>
<from>Tove</from>
<heading>Re: Reminder</heading>
<body>I will not!</body>
</note>
</messages>
The ID in these examples is just a
counter, or a unique identifier, to identify
the different notes in the XML file, and not
a part of the note data.
What I am trying to say here is that metadata (data
about data) should be stored as attributes, and that
data itself should be stored as elements.
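Applied to the attribute-heavy "note" shown earlier, this advice might lead to something like the sketch below. The "date" wrapper element, the id value "p501" and the DTD itself are assumptions made for illustration; only the data values come from the earlier example.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (date, to, from, heading, body)>
  <!ELEMENT date (day, month, year)>
  <!ELEMENT day (#PCDATA)>
  <!ELEMENT month (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!-- metadata: an identifier, not part of the note's content -->
  <!ATTLIST note id ID #REQUIRED>
]>
<note id="p501">
  <date><day>12</day><month>11</month><year>2002</year></date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
```

Here the date can be given structure and validated against the DTD, which is hard to do when it is packed into attribute values.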
Entities
Some characters have a special meaning
in XML, like the less than sign (<) that
defines the start of an XML tag.
Most of you know the HTML entity:
"&nbsp;". This "non-breaking-space" entity
is used in HTML to insert an extra space in
a document. Entities are expanded when a
document is parsed by an XML parser.
The following entities are predefined in
XML:
Entity References   Character
&lt;                <
&gt;                >
&amp;               &
&quot;              "
&apos;              '
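The five predefined entities can be used without declaring them in a DTD. The short sketch below uses each of them once; the element name "message" comes from the examples above, while the attribute name "note" and the text are made up for illustration.

```xml
<?xml version="1.0"?>
<!-- &quot; keeps the attribute value from ending early -->
<message note="she said &quot;hello&quot;">
  if salary &lt; 1000 &amp; age &gt; 30 then it&apos;s flagged
</message>
```

After parsing, the attribute value reads: she said "hello", and the element text contains the literal characters <, &, > and '.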
PCDATA
PCDATA means parsed character data.
Think of character data as the text found
between the start tag and the end tag of
an XML element.
PCDATA is text that WILL be parsed
by a parser. The text will be examined
by the parser for entities and markup.
Tags inside the text will be treated as
markup and entities will be expanded.
However, parsed character data should
not contain any &, <, or > characters;
these need to be represented by the
&amp;, &lt; and &gt; entities, respectively.
CDATA
CDATA means character data.
CDATA is text that will NOT be parsed
by a parser. Tags inside the text will NOT
be treated as markup and entities will not
be expanded.
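One place where the difference is easy to see is a CDATA section inside element content. The sketch below puts the same text into two elements, once as ordinary parsed character data with entities and once inside a CDATA section. The element names "examples", "parsed" and "unparsed" are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE examples [
  <!ELEMENT examples (parsed, unparsed)>
  <!ELEMENT parsed (#PCDATA)>
  <!ELEMENT unparsed (#PCDATA)>
]>
<examples>
  <!-- parsed: special characters must be written as entities -->
  <parsed>a &lt; b &amp;&amp; b &gt; c</parsed>
  <!-- CDATA section: the parser does not look for markup inside it -->
  <unparsed><![CDATA[a < b && b > c]]></unparsed>
</examples>
```

Both elements end up with the same text content, a < b && b > c.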
In a DTD, elements are declared with an
ELEMENT declaration.
Entities are variables used to define
shortcuts to standard text or special
characters.
• Entity references are references to
entities
• Entities can be declared internal or
external
XML example:
<author>&writer;&copyright;</author>
Note: An entity has three parts: an
ampersand (&), an entity name, and a
semicolon (;).
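For the "author" example to work, the two entities have to be declared in the DTD. A minimal sketch with internal entity declarations follows; the entity names "writer" and "copyright" come from the example, but the replacement texts are placeholders, not values from the original.

```xml
<?xml version="1.0"?>
<!DOCTYPE author [
  <!ELEMENT author (#PCDATA)>
  <!-- internal entities: name followed by the replacement text -->
  <!ENTITY writer "Jane Doe. ">
  <!ENTITY copyright "Copyright Example Press.">
]>
<author>&writer;&copyright;</author>
```

After parsing, the content of "author" is the two replacement texts joined together: Jane Doe. Copyright Example Press.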
Example
<!DOCTYPE TVSCHEDULE [