TCP Lec03
Lecture 3
<note>
<subject>Reminder</subject>
<content>Bring exercise on next Monday</content>
<sender>Wong</sender>
</note>
DTD Declaration – External DTD
External DTD Declaration
DTD is declared in a separate file.
This is the same XML document with an external DTD:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE note SYSTEM "message.dtd">
<note>
<subject>Reminder</subject>
<content>Bring exercise on next Monday</content>
<sender>Wong</sender>
</note>
In the DOCTYPE declaration, note is the root element and "message.dtd" is a
URI reference pointing to the location of the DTD document.
The SYSTEM identifier tells the XML parser where to find the DTD
file on the system.
Use SYSTEM for external DTDs that you define yourself, and use
PUBLIC for official, published DTDs.
This is a copy of the file "message.dtd" containing the DTD:
<!ELEMENT note (subject, content, sender)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT content (#PCDATA)>
<!ELEMENT sender (#PCDATA)>
Anatomy of DTD
Element type declaration
Attribute declaration
Entity declaration
Element Type Declaration
It declares each element type that appears within the XML
document.
An element declaration has the following syntax:
<!ELEMENT element-name (element-content)>
Example: <!ELEMENT student (first_name, last_name)>
Rules:
All element types used in an XML document must be declared
in the DTD.
An element type cannot be declared more than once.
Element names are case sensitive.
The keyword ELEMENT must be in upper case.
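The first rule above (every element type used must be declared) can be checked mechanically. The following is a minimal sketch, assuming Python's standard expat bindings and a DTD written as an internal subset; the sample document and the nickname element are made up for illustration. expat does not validate, so it parses the document even though nickname is undeclared, which is what lets the sketch compare the two sets of names itself.

```python
from xml.parsers import expat

# Hypothetical sample: <nickname> is used in the document but never declared.
DOC = """<?xml version="1.0"?>
<!DOCTYPE student [
<!ELEMENT student (first_name, last_name)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT last_name (#PCDATA)>
]>
<student>
<first_name>Mei</first_name>
<last_name>Wong</last_name>
<nickname>MW</nickname>
</student>"""


def undeclared_elements(xml_text):
    """Return the element names used in the document but not declared in its DTD."""
    declared, used = set(), set()
    parser = expat.ParserCreate()
    # Called once for every <!ELEMENT ...> declaration in the DTD.
    parser.ElementDeclHandler = lambda name, model: declared.add(name)
    # Called once for every start tag in the document body.
    parser.StartElementHandler = lambda name, attrs: used.add(name)
    parser.Parse(xml_text, True)
    return sorted(used - declared)


print(undeclared_elements(DOC))
```

Because element names are case sensitive, a tag written as Student would also be reported here: it does not match the declared name student.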
Element Content
Element content consists of EMPTY, ANY, Mixed, or
children element types.
Element Content     Definition
Children elements   Any number of element types can be placed within another element
                    type. These are called children elements, and the element types they
                    are placed in are called parent elements.
EMPTY               The element has no content.
                    For example, <!ELEMENT book EMPTY> allows <book />.
ANY                 Any content is allowed, as long as XML rules are followed.
                    ANY is useful when you have yet to decide the contents of
                    the element.
Mixed content       A combination of #PCDATA and children elements.
                    PCDATA stands for parsed character data, that is, text that is not
                    markup. Within mixed content models, text can appear by itself or it
                    can be interspersed between elements.
PCDATA and CDATA
PCDATA means parsed character data
It is text that will be parsed by a parser. The text will be
examined by the parser for entities and markup.
Tags inside the text are treated as markup and entities are
expanded.
It must not contain literal & or < characters, and > is usually escaped
as well; these are represented by the &amp;, &lt; and &gt; entity
references, respectively.
Example:
detail.dtd
<!ELEMENT student (id)>
<!ELEMENT id (#PCDATA)>
XML
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE student SYSTEM "detail.dtd">
<student>
<id>1151108766</id>
</student>
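The escaping rule for PCDATA can be tried out with a short sketch. It assumes Python's standard library: xml.sax.saxutils.escape replaces &, < and > with entity references, and the parser turns them back into the original characters. The sample text and the content element are only illustrative.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Text that contains characters which are markup-significant in PCDATA.
raw = "if (a < b && b > 0)"

# escape() replaces & first, then < and >, with entity references.
escaped = escape(raw)
print(escaped)

# Once escaped, the text can be placed safely inside an element.
doc = "<content>" + escaped + "</content>"

# The parser expands the entity references back into the original characters.
parsed = ET.fromstring(doc).text
print(parsed == raw)
```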
Element Content: Children Element
To declare Multiple Children in Sequence:
Multiple children are declared using commas (,).
<!ELEMENT parent_name (child1_name,child2_name,child3_name)>
<!ELEMENT child1_name element_content>
<!ELEMENT child2_name element_content>
<!ELEMENT child3_name element_content>
<student>
<birthdate>20.05.98</birthdate>
</student>
Element Content: Children Element
To declare Zero or More Children
Zero or more children are declared using the (*) operator.
<!ELEMENT parent_name (child_name*)>
<!ELEMENT child_name element_content>
Example:
content.dtd
<!ELEMENT student (subject*)>
<!ELEMENT subject (#PCDATA)>
XML
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE student SYSTEM "content.dtd">
<student>
<subject>English</subject>
<subject>Programming</subject>
<subject>Database</subject>
</student>
Element Content: Children Element
To declare One or More Children:
One or more children are declared using the (+) operator.
<!ELEMENT parent_name (child_name+)>
<!ELEMENT child_name element_content>
Example:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE student [
<!ELEMENT student (subject+)>
<!ELEMENT subject (#PCDATA)>
]>
<student>
<subject>English</subject>
</student>
Element Content: Children Element
Combinations of Children (Choice):
A choice between children element types is declared using the (|)
operator.
<!ELEMENT parent_name (child1_name | child2_name)>
<!ELEMENT child1_name element_content>
<!ELEMENT child2_name element_content>
Example:
content.dtd
<!ELEMENT student (id | surname)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT surname (#PCDATA)>
XML
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE student SYSTEM "content.dtd">
<student>
<id>1151108766</id>
</student>
Element Content: Empty & ANY Element
Example (EMPTY):
<!DOCTYPE room [
<!ELEMENT room (box)>
<!ELEMENT box EMPTY>
]>
<room>
<box/>
</room>
Example (ANY):
<!DOCTYPE room [
<!ELEMENT room (box)>
<!ELEMENT box ANY>
]>
<room>
<box>pink box</box>
</room>
Element Content: Mixed Element
Mixed content is used to declare elements that contain a mixture
of children elements and text (PCDATA).
Text can appear by itself or it can be interspersed between elements.
<!ELEMENT parent_name (#PCDATA|child1_name)*>
Example:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE student [
<!ELEMENT student (description)>
<!ELEMENT description (#PCDATA | title | detail)*>
<!ELEMENT title (#PCDATA)>
<!ELEMENT detail (#PCDATA)>
]>
<student>
<description> Muthu likes to teach
<title> Java Programming </title> using software called
<detail> NetBeans </detail> </description>
</student>
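To see how a program receives mixed content, here is a minimal sketch using Python's xml.etree.ElementTree on a compact version of the example above. ElementTree is one possible API, not the only one. It stores the text before the first child in .text and the text that follows each child in that child's .tail.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE student [
<!ELEMENT student (description)>
<!ELEMENT description (#PCDATA | title | detail)*>
<!ELEMENT title (#PCDATA)>
<!ELEMENT detail (#PCDATA)>
]>
<student><description>Muthu likes to teach <title>Java Programming</title> using software called <detail>NetBeans</detail></description></student>"""

description = ET.fromstring(DOC).find("description")

# Walk the mixed content in document order: the leading text first, then
# each child element followed by the text that trails it (if any).
parts = [("text", description.text)]
for child in description:
    parts.append((child.tag, child.text))
    if child.tail:
        parts.append(("text", child.tail))

for kind, value in parts:
    print(kind, repr(value))

# itertext() yields the same pieces with the markup stripped out.
plain = "".join(description.itertext())
```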
DTD Cardinality
An element’s cardinality defines how many times it may
appear within a content model.
Indicator   Description
[none]      Element must appear exactly once
?           Element may appear once or not at all
+           Element must appear one or more times
*           Element may appear zero or more times
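The indicators ?, + and * mean the same thing as in regular expressions, so a children content model can be checked with an ordinary regex. The sketch below is a simplification for illustration, not how a real validating parser works: it handles only children content models (no #PCDATA, EMPTY or ANY), and the helper names model_to_regex and matches are hypothetical.

```python
import re


def model_to_regex(model):
    """Translate a DTD children content model into a Python regex.

    Each element name becomes a token followed by one space, the sequence
    operator ',' becomes plain concatenation, and | ? + * carry over as-is.
    """
    compact = re.sub(r"\s+", "", model)
    tokenised = re.sub(
        r"[A-Za-z_][\w.\-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        compact,
    )
    return tokenised.replace(",", "")


def matches(model, children):
    """Check a list of child element names against a content model."""
    sequence = "".join(name + " " for name in children)
    return re.fullmatch(model_to_regex(model), sequence) is not None


print(matches("(subject*)", []))                      # zero children is allowed
print(matches("(subject+)", []))                      # at least one is required
print(matches("(id | surname)", ["id", "surname"]))   # a choice allows only one
```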
Attribute Declaration
Attributes are additional information associated with an
element type.
They are intended for interpretation by an application.
The ATTLIST declarations identify which element types
may have attributes, what types the attributes may be,
and what their default values are.
<!ATTLIST element_name attribute_name attribute_type default_value>
The attribute is then used in the document as follows:
<element_name attribute_name="attribute_value">
Attribute Declaration: Attribute Type
Type         Description
CDATA        Character data. Any text is allowed, including digits and
             space characters.
ID           A unique identifier for the element; no two ID values in the
             document may be the same. An element type may have only one
             ID attribute, and its default must be #IMPLIED or #REQUIRED.
IDREF        Establishes a connection between elements. The value must
             match an ID value that appears elsewhere in the document.
IDREFS       Multiple IDREF values separated by whitespace.
ENTITY       The name of one entity, which must be an unparsed entity
             declared in the DTD.
ENTITIES     Multiple ENTITY names separated by whitespace.
NMTOKEN      A single name token: only name characters (letters, digits,
             '.', '-', '_', ':') are allowed, and no whitespace.
NMTOKENS     One or more name tokens separated by whitespace.
NOTATION     Specifies the format of non-XML data. A common use of
             notations is to describe MIME types such as image/gif or image/jpeg.
Enumerated   A fixed list of values; the attribute value must match one of them.
             Example: <!ATTLIST element_name attribute_name (value1 | value2 | value3) default_value>
Attribute Declaration: Default Value
Default value: specifies the value of the attribute when the
document does not give one.
Type                     Description
Character data (CDATA)   The default value of the attribute, written as a quoted string.
                         Example: <!ATTLIST course degree CDATA "PhD">
#FIXED                   Fixes the attribute to the given quoted value; a document may
                         not assign it a different value.
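Default values can be observed from a program. The sketch below assumes Python's xml.etree.ElementTree, whose underlying expat parser supplies attribute defaults declared in an internal DTD subset. The courses document, the campus attribute and its value are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: degree has a plain default, campus is #FIXED.
DOC = """<!DOCTYPE courses [
<!ELEMENT courses (course*)>
<!ELEMENT course (#PCDATA)>
<!ATTLIST course degree CDATA "PhD">
<!ATTLIST course campus CDATA #FIXED "Cyberjaya">
]>
<courses>
<course>Networks</course>
<course degree="MSc">Databases</course>
</courses>"""

courses = ET.fromstring(DOC).findall("course")

# The first course omits degree, so the parser fills in the declared default.
degrees = [course.get("degree") for course in courses]

# campus is never written in the document; the #FIXED value is supplied.
campuses = [course.get("campus") for course in courses]

print(degrees)
print(campuses)
```

The other two settings used in this lecture's examples take no value: #REQUIRED means the document must supply the attribute, and #IMPLIED means it is optional with no default.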
XML
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE individuals SYSTEM "content.dtd">
<individuals>
<name individual_id="a8904885">Alex Kok</name>
<name individual_id="a9011133">Sarah Tan</name>
<name individual_id="a9216735" parent1_id="a9011133"
parent2_id="a8904885">Mary Kok</name>
</individuals>
Example of Attribute: ID, IDREFS
<?xml version="1.0" encoding="UTF-8" standalone= "yes" ?>
<!DOCTYPE individuals [
<!ELEMENT individuals (name)*>
<!ELEMENT name (#PCDATA)>
<!ATTLIST name individual_id ID #REQUIRED>
<!ATTLIST name parent_id IDREFS #IMPLIED>
]>
<individuals>
<name individual_id="A8908">Alex Kok</name>
<name individual_id="A9113">Sarah Tan</name>
<name individual_id="A9273" parent_id="A9113 A8908">Mary Kok</name>
</individuals>
Example of Attribute: Entity
project.dtd
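<!ELEMENT experiment (results)*>
<!ELEMENT results EMPTY>
<!ATTLIST results image ENTITY #REQUIRED>
<!-- An ENTITY attribute must name an unparsed entity, so the entity needs
     NDATA and a notation; the notation name gif is assumed here. -->
<!NOTATION gif SYSTEM "image/gif">
<!ENTITY test SYSTEM "http://www.mmu.edu.my/image/exp.gif" NDATA gif>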
XML
<?xml version="1.0" encoding="UTF-8" standalone= "no" ?>
<!DOCTYPE experiment SYSTEM "project.dtd">
<experiment>
<results image="test"/>
</experiment>
Example of Attribute: Entities
<experiment>
<results image="test1 test2 test3"/>
</experiment>
Example of Attribute: NMTOKEN
<?xml version="1.0" encoding="UTF-8" standalone= "yes" ?>
<!DOCTYPE performances [
<!ELEMENT performances (#PCDATA)>
<!ATTLIST performances dates NMTOKEN #REQUIRED>
]>
<performances dates="08-06-2016">
Kat and the Kings
</performances>
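The value 08-06-2016 is accepted because digits and hyphens are name characters. The sketch below approximates the NMTOKEN rule with an ASCII-only regular expression. This is a simplification: the XML specification also allows many non-ASCII name characters, and the function names are hypothetical.

```python
import re

# ASCII approximation of XML name characters: letters, digits, '.', '-', '_', ':'.
NMTOKEN = re.compile(r"[A-Za-z0-9._:\-]+")


def is_nmtoken(value):
    """True if the value is a single name token (no whitespace allowed)."""
    return NMTOKEN.fullmatch(value) is not None


def is_nmtokens(value):
    """True if the value is one or more name tokens separated by whitespace."""
    tokens = value.split()
    return bool(tokens) and all(is_nmtoken(token) for token in tokens)


print(is_nmtoken("08-06-2016"))              # hyphens and digits are fine
print(is_nmtoken("08 06 2016"))              # spaces are not allowed
print(is_nmtoken("08/06/2016"))              # '/' is not a name character
print(is_nmtokens("08-06-2016 09-06-2016"))  # NMTOKENS allows a list
```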
Example of Attribute: NMTOKENS
list.dtd
<!ELEMENT code (#PCDATA)>
<!NOTATION jv SYSTEM "Java JDK 8u66">
<!ENTITY jversion SYSTEM "jversion.jpg" NDATA jv>
<!ATTLIST code description ENTITY #REQUIRED>
XML
<?xml version="1.0" encoding="UTF-8" standalone= "no" ?>
<!DOCTYPE code SYSTEM "list.dtd">
<ToDoList>
<task status="important">This is an important task</task>
<task status="normal">This task can wait for few days</task>
</ToDoList>
Entity
An entity is a storage unit that holds a particular part of an XML
document.
It may be some special characters, common text, a file, a database record or
a network resource.
Entity data can act as an abbreviation or be stored at an external
location.
It helps to reduce repetitive information and allows for easier
editing (by reducing the number of occurrences of data to edit).
Entity declaration:
The entity name and value are declared in the DTD.
Entity reference:
In the XML document, when the entity name is referenced, the entity value is
substituted in its place.
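The declare-once, reference-many idea can be sketched with an internal parsed entity. The example assumes Python's xml.etree.ElementTree, which expands entities declared in the internal DTD subset. The entity name uni and the note text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The entity uni is declared once in the DTD and referenced twice below.
DOC = """<!DOCTYPE note [
<!ELEMENT note (#PCDATA)>
<!ENTITY uni "Multimedia University">
]>
<note>&uni; lecture notes. Copyright &uni;.</note>"""

# The parser replaces each &uni; reference with the declared value.
text = ET.fromstring(DOC).text
print(text)
```

Changing the quoted value in the single ENTITY declaration updates every reference, which is the editing benefit described above.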
Entity Declaration & Entity Reference Example
Source: http://www.csse.monash.edu.au/hons/projects/2000/Susanti/thesis.html
Entity Type
Entities can be declared internally or externally.
Internal (parsed)
External (parsed)
External (unparsed)
Example of an external parsed entity declared with a PUBLIC identifier:
DTD:
<!ELEMENT copyright (#PCDATA)>
<!ENTITY cpy PUBLIC "-//W3C//TEXT copy//EN"
"http://www.w3.org/copyright.html">
XML:
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE copyright SYSTEM "list.dtd">
<copyright>&cpy;</copyright>
public_ID: An XML processor may use this to generate an alternative URI where the external parsed entity can be found.
If the entity cannot be found at that URI, the XML processor must fall back to the system identifier (the normal URL).
Entity Type
External unparsed entity:
It refers to non-XML data (such as an image or other binary data), identified by a notation.
A NOTATION declaration describes the format of the non-XML data.
The XML parser does not parse this entity.
<!ENTITY entity_name SYSTEM "URL" NDATA notation_name>
DTD:
<!ELEMENT figure EMPTY>
<!ATTLIST figure source ENTITY #REQUIRED>
<!ENTITY logo SYSTEM "http://www.abc.net/logo.png" NDATA pinfo>
<!NOTATION pinfo SYSTEM "image/png">
<!ENTITY entity_name PUBLIC "public_ID" "URL" NDATA notation_name>
DTD:
<!ELEMENT figure EMPTY>
<!ATTLIST figure source ENTITY #REQUIRED>
<!ENTITY logo PUBLIC "-//W3C//GIF logo//EN" "http://www.w3.org/logo.png" NDATA PNG>
<!NOTATION PNG SYSTEM "image/png">
XML:
<?xml version="1.0" standalone="no" ?>
<figure source="logo"/>
Limitations of DTDs
DTDs are a very weak specification language
Restrictions cannot be put on the text content of elements (for example, requiring it to be a number)
It is difficult to specify:
All the children must occur, but may be in any order
This element must occur a certain number of times
There are only ten data types for attribute values
But most of all: DTDs are not written in XML!
If you want to do any validation, you need one parser for the XML
and another for the DTD
This makes XML parsing harder than it needs to be
There is a newer and more powerful technology: XML Schemas
However, DTDs are still very much in use