TCP Lec03


DTD in XML

Lecture 3

DTD video: https://www.youtube.com/watch?


XML and DTDs
 A DTD (Document Type Definition) defines the structure of
one or more XML documents.
 Specifically, a DTD describes:
 Elements
 Attributes
 Entities
 An XML document is well-formed if it follows certain
simple syntactic rules
 An XML document is valid if it also specifies and
conforms to a DTD
Well-Formed XML Documents
 A “Well-Formed” XML document has correct XML
syntax.
 A “Well-Formed” XML document is a document that
conforms to the XML syntax rules described in previous
chapters:
 XML documents must have a root element
 XML elements must have start and end tags
 XML tags are case sensitive
 XML elements must be properly nested
 XML attribute values must always be quoted
Valid XML Documents
 An XML document is valid if there is a DTD or XML
schema associated with it, and if the XML document
complies with that DTD or schema.
 A valid XML document is a well-formed XML document.
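As a small illustration of the difference, the sketch below is well-formed but not valid. The element names are made up for this example and are not taken from the lecture.

```xml
<?xml version="1.0"?>
<!DOCTYPE greeting [
<!ELEMENT greeting (to, text)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT text (#PCDATA)>
]>
<!-- Well-formed: one root element, tags closed and nested correctly.
     Not valid: the DTD requires <to> followed by <text>,
     but <text> is missing. -->
<greeting>
<to>Class</to>
</greeting>
```

A non-validating parser would accept this document; a validating parser would report the missing text element.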
Why use a DTD?
 XML documents provide an application independent way
of sharing data.
 XML documents may be processed by computer programs:
 It is very hard to write a program that knows how to process the
tags since any tags can be included in an XML document.
 A DTD specifies what tags may occur, when they may occur,
and what attributes they may (or must) have
 A DTD allows the XML document to be verified.
 With a DTD, independent groups of people agree to use a
common DTD for interchanging data.
 Thus, a DTD that is shared across groups allows the groups
to produce consistent XML documents.
Parsers
 An XML parser is an API that reads the content of an
XML document
 A validating parser is an XML parser that compares the
XML document to a DTD and reports any errors
 Validation is the process of checking a document against a DTD
(more generally against a set of construction rules).
 Most browsers don’t use validating parsers
XML Example
<?xml version="1.0"?>
<!DOCTYPE novel SYSTEM "record.dtd">
<novel>
<foreword>
<paragraph>This is an Asian novel.</paragraph>
<paragraph>The story was happening in a trimester break.</paragraph>
</foreword>
<chapter number="1">
<paragraph>It was a dark and stormy night.</paragraph>
<paragraph>Suddenly, a shot rang out!</paragraph>
</chapter>
</novel>

 An XML document contains (and the DTD describes):


 Elements, such as novel and paragraph, consisting of tags and content
 Attributes, such as number="1", consisting of a name and a value
 Entities (not used in this example)
DTD Example

<!ELEMENT novel (foreword, chapter+)>


<!ELEMENT foreword (paragraph+)>
<!ELEMENT chapter (paragraph+)>
<!ELEMENT paragraph (#PCDATA)>
<!ATTLIST chapter number CDATA #REQUIRED>

 A novel consists of a foreword and one or more chapters, in that order


 Each chapter must have a number attribute
 A foreword consists of one or more paragraphs
 A chapter also consists of one or more paragraphs
 A paragraph consists of parsed character data (text that cannot contain
any other elements)
Another DTD Example
<?xml version="1.0"?>
<!DOCTYPE page SYSTEM "page.dtd">
<page>
<title>Hello friend</title>
<content>Here is some content :)</content>
<comment>Written by TD/MK, at S.M./the Cocoon samples</comment>
</page>
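The slide shows only the XML document. A DTD that would accept it could look like the sketch below. The actual contents of page.dtd are not shown in the lecture, so this content model is an assumption.

```xml
<!-- page.dtd (hypothetical contents) -->
<!ELEMENT page (title, content, comment)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT content (#PCDATA)>
<!ELEMENT comment (#PCDATA)>
```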
DTD Declaration – Internal DTD
 There are two approaches in DTD declaration:
 Internal DTD Declaration
 External DTD Declaration
 Internal DTD Declaration
 If the DTD is declared inside the XML file, it must be wrapped
inside the <!DOCTYPE> definition
<!DOCTYPE name_of_root [
elements/attributes/entities
]>
 name_of_root names the root element of the document; a validating
parser checks that the actual root element has this name
 The square brackets [ ] enclose a list of elements/attributes/entities
declarations
DTD Declaration – Internal DTD
 Internal DTD Declaration
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!-- "note" after DOCTYPE names the root element -->
<!DOCTYPE note [
<!-- element declarations -->
<!ELEMENT note (subject, content, sender)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT content (#PCDATA)>
<!ELEMENT sender (#PCDATA)>
]>

<note>
<subject>Reminder</subject>
<content>Bring exercise on next Monday</content>
<sender>Wong</sender>
</note>
DTD Declaration – External DTD
 External DTD Declaration
 DTD is declared in a separate file.
 This is the same XML document with an external DTD:
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!-- "note" is the root element; "message.dtd" is a URI reference
     pointing to the location of the DTD document -->
<!DOCTYPE note SYSTEM "message.dtd">
<note>
<subject>Reminder</subject>
<content>Bring exercise on next Monday</content>
<sender>Wong</sender>
</note>

 The SYSTEM identifier tells the XML parser where to find the DTD
file on the system.
 Use SYSTEM for external DTDs that you define yourself, and use
PUBLIC for official, published DTDs
 This is a copy of the file "message.dtd" containing the DTD:
<!ELEMENT note (subject, content, sender)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT content (#PCDATA)>
<!ELEMENT sender (#PCDATA)>
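For comparison, a PUBLIC declaration carries a public identifier followed by a system identifier. A well-known instance is the XHTML 1.0 Strict document type declaration published by the W3C:

```xml
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```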
Anatomy of DTD
 Element type declaration
 Attribute declaration
 Entity declaration
Element Type Declaration
 It declares each element that appears within the XML
document.
 An element declaration has the following syntax:
<!ELEMENT element-name (element-content)>
 Example: <!ELEMENT student (first_name, last_name)>
 Rules:
 All element types used in an XML document must be declared
in the DTD.
 An element type cannot be declared more than once.
 Element names are case sensitive.
 The keyword ELEMENT must be in upper case.
Element Content
 Element content consists of EMPTY, ANY, mixed content, or
children element types.
Element Content | Definition
Children elements | Any number of element types can be placed within another element type. These are called children elements, and the elements they are placed in are called parent elements.
EMPTY | Refers to elements that have no content. For example, <!ELEMENT book EMPTY> allows <book />
ANY | Refers to anything at all, as long as XML rules are followed. ANY is useful when you have yet to decide the contents of the element.
Mixed content | Refers to a combination of (#PCDATA) and children elements. PCDATA stands for parsed character data, that is, text that is not markup. Within mixed content models, text can appear by itself or it can be interspersed between elements.
PCDATA and CDATA
 PCDATA means parsed character data
 It is text that will be parsed by a parser. The text will be
examined by the parser for entities and markup.
 Tags inside the text will be treated as markup and entities will be
expanded.
 It must not contain literal & or < characters (and > is usually escaped too);
these need to be represented by the &amp; &lt; and &gt; entities, respectively.

 CDATA means character data


 It is text that will NOT be parsed by a parser.
 Tags inside the text will NOT be treated as markup and entities
will NOT be expanded.
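A short sketch of the difference, using a hypothetical note element: in parsed character data the special characters are written as entity references, while a CDATA section passes its text through without parsing.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (#PCDATA)>
]>
<note>
Parsed: 3 &lt; 5 &amp;&amp; 5 &gt; 3
<![CDATA[Not parsed: 3 < 5 && 5 > 3, and <b> is plain text here]]>
</note>
```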
Element Content: Children Element
 Children element types are declared using parentheses in the
parent element type's declaration.
<!ELEMENT parent_name (child_name)>
<!ELEMENT child_name element_content>

 Example:
detail.dtd
<!ELEMENT student (id)>
<!ELEMENT id (#PCDATA)>

XML
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE student SYSTEM "detail.dtd">
<student>
<id>1151108766</id>
</student>
Element Content: Children Element
 To declare Multiple Children in Sequence:
 Multiple children are declared using commas (,).
<!ELEMENT parent_name (child1_name,child2_name,child3_name)>
<!ELEMENT child1_name element_content>
<!ELEMENT child2_name element_content>
<!ELEMENT child3_name element_content>

 Example:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE student [
<!ELEMENT student (id, surname, firstname)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT surname (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
]>
<student>
<id>1151108766</id>
<surname>Wong</surname>
<firstname>Steven</firstname>
</student>
Element Content: Children Element
 To declare Optional Children
 Optional children are declared using the (?) operator.
 Optional means zero or one times
<!ELEMENT parent_name (child_name?)>
<!ELEMENT child_name element_content>
 Example:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE student [
<!ELEMENT student (birthdate?)>
<!ELEMENT birthdate (#PCDATA)>
]>

<student>
<birthdate>20.05.98</birthdate>
</student>
Element Content: Children Element
 To declare Zero or More Children
 Zero or more children are declared using the (*) operator.
<!ELEMENT parent_name (child_name*)>
<!ELEMENT child_name element_content>
 Example:
content.dtd
<!ELEMENT student (subject*)>
<!ELEMENT subject (#PCDATA)>

XML
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE student SYSTEM "content.dtd">
<student>
<subject>English</subject>
<subject>Programming</subject>
<subject>Database</subject>
</student>
Element Content: Children Element
 To declare One or More Children:
 One or more children are declared using the (+) operator.
<!ELEMENT parent_name (child_name+)>
<!ELEMENT child_name element_content>

 Example:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE student [
<!ELEMENT student (subject+)>
<!ELEMENT subject (#PCDATA)>
]>

<student>
<subject>English</subject>
</student>
Element Content: Children Element
 Combinations of Children (Choice):
 A choice between children element types is declared using the (|)
operator.
<!ELEMENT parent_name (child1_name | child2_name)>
<!ELEMENT child1_name element_content>
<!ELEMENT child2_name element_content>
 Example:
content.dtd
<!ELEMENT student (id | surname)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT surname (#PCDATA)>

XML
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE student SYSTEM "content.dtd">
<student>
<id>1151108766</id>
</student>
Element Content: Empty & ANY Element

Empty element
Definition: The element has no child elements or character data.
Syntax: <!ELEMENT element_name EMPTY>
Example:
<!DOCTYPE room [
<!ELEMENT room (box)>
<!ELEMENT box EMPTY>
]>

<room>
<box/>
</room>

ANY element
Definition: The element contains zero or more child elements of any declared type, as well as text (PCDATA).
Syntax: <!ELEMENT element_name ANY>
Example:
<!DOCTYPE room [
<!ELEMENT room (box)>
<!ELEMENT box ANY>
]>

<room>
<box>pink box</box>
</room>
Element Content: Mixed Element
 Mixed content is used to declare elements that contain a mixture
of children elements and text (PCDATA).
 Text can appear by itself or it can be interspersed between elements
<!ELEMENT parent_name (#PCDATA|child1_name)*>
 Example:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE student [
<!ELEMENT student (description) >
<!ELEMENT description (#PCDATA | title | detail)* >
<!ELEMENT title (#PCDATA)>
<!ELEMENT detail (#PCDATA)>
]>

<student>
<description> Muthu likes to teach
<title> Java Programming </title> using software called
<detail> Netbeans </detail> </description>
</student>
DTD Cardinality
 An element’s cardinality defines how many times it will
appear within a content model

Indicator | Description
[none] | Element must appear once and only once
? | Element may appear once or not at all
+ | Element may appear one or more times
* | Element may appear zero or more times
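The indicators can be combined in one content model. The following is a minimal sketch; the nickname and award elements are hypothetical and only serve to show each indicator once.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE student [
<!-- id: exactly once; nickname: optional;
     subject: one or more; award: zero or more -->
<!ELEMENT student (id, nickname?, subject+, award*)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT nickname (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT award (#PCDATA)>
]>
<student>
<id>1151108766</id>
<subject>English</subject>
<subject>Database</subject>
</student>
```

The document is valid even though nickname and award are absent, because ? and * both allow zero occurrences.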
Attribute Declaration
 Attributes are additional information associated with an
element type.
 They are intended for interpretation by an application.
 The ATTLIST declarations identify which element types
may have attributes, what type of attributes they may be,
and what the default values of the attributes are.
<!ATTLIST element_name attribute_name attribute_type default_value>
.
.
.
<element attribute_name="attribute_value">
Attribute Declaration: Attribute Type
Type | Description
CDATA | CDATA stands for character data. This type allows any characters, including digits and spaces.
ID | ID is a unique identifier for the element. The value must be unique within the document. An element type may have only one ID attribute, and an ID attribute can only have an #IMPLIED or #REQUIRED setting.
IDREF | IDREF is used to establish connections between elements. The IDREF value of the attribute must match an ID value that appears elsewhere in the document.
IDREFS | It allows multiple ID values separated by whitespace.
ENTITY | It allows one entity name, which must be the name of an unparsed entity declared in the DTD.
ENTITIES | It allows multiple ENTITY names separated by whitespace.
NMTOKEN | It is a single name token made up of name characters (such as letters, digits, periods, hyphens, underscores, and colons), so unlike CDATA it cannot contain spaces.
NMTOKENS | It accepts one or more tokens (where one token is a sequence of name characters without spaces); whitespace separates the tokens.
NOTATION | NOTATION is used to specify the format of non-XML data. A common use of notations is to describe MIME types such as image/gif, image/jpeg etc.
Enumerated | It provides a specific list of values; the attribute value must match one of them.
Example: <!ATTLIST element_name attribute_name (value1 | value2 | value3) default_value>
Attribute Declaration: Default Value
 Default Value - Specifies the value of the attribute if no value
is given in the document.
Type | Description
Character data (quoted string) | The default value of the attribute, given as a quoted string. Example: <!ATTLIST course degree CDATA "PhD">
#FIXED | The attribute has a fixed value, given as a quoted string after the keyword. If the attribute appears in the document, it must have exactly this value.
#IMPLIED | The attribute is optional and has no default value.
#REQUIRED | The attribute must be present; a value is required.
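The four kinds of default can be seen side by side in the sketch below. It reuses the course element from the table; the attribute names code, room, and language and their values are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE course [
<!ELEMENT course (#PCDATA)>
<!ATTLIST course
  code     CDATA #REQUIRED
  room     CDATA #IMPLIED
  degree   CDATA "PhD"
  language CDATA #FIXED "English">
]>
<!-- code must be given; room is omitted; degree defaults to "PhD";
     language is always "English" -->
<course code="TCP1234">XML Basics</course>
```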


Example of Attribute: CDATA

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>


<!DOCTYPE image [
<!ELEMENT image EMPTY>
<!ATTLIST image height CDATA #REQUIRED>
<!ATTLIST image width CDATA #REQUIRED>
]>

<image height="32" width="51"/>


Example of Attribute: ID, IDREF
content.dtd
<!ELEMENT individuals (name)*>
<!ELEMENT name (#PCDATA)>
<!ATTLIST name individual_id ID #REQUIRED>
<!ATTLIST name parent1_id IDREF #IMPLIED>
<!ATTLIST name parent2_id IDREF #IMPLIED>

XML
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE individuals SYSTEM "content.dtd">

<individuals>
<name individual_id="a8904885">Alex Kok</name>
<name individual_id="a9011133">Sarah Tan</name>
<name individual_id="a9216735" parent1_id="a9011133"
parent2_id="a8904885">Mary Kok</name>
</individuals>
Example of Attribute: ID, IDREFS
<?xml version="1.0" encoding="UTF-8" standalone= "yes" ?>
<!DOCTYPE individuals [
<!ELEMENT individuals (name)*>
<!ELEMENT name (#PCDATA)>
<!ATTLIST name individual_id ID #REQUIRED>
<!ATTLIST name parent_id IDREFS #IMPLIED>
]>

<individuals>
<name individual_id="A8908">Alex Kok</name>
<name individual_id="A9113">Sarah Tan</name>
<name individual_id="A9273" parent_id="A9113 A8908">Mary Kok</name>
</individuals>
Example of Attribute: Entity

project.dtd
<!ELEMENT experiment (results)*>
<!ELEMENT results EMPTY>
<!ATTLIST results image ENTITY #REQUIRED>
<!NOTATION gif SYSTEM "image/gif">
<!ENTITY test SYSTEM "http://www.mmu.edu.my/image/exp.gif" NDATA gif>

XML
<?xml version="1.0" encoding="UTF-8" standalone= "no" ?>
<!DOCTYPE experiment SYSTEM "project.dtd">

<experiment>
<results image="test"/>
</experiment>
Example of Attribute: Entities

<?xml version= "1.0" encoding="UTF-8" standalone= "yes" ?>


<!DOCTYPE experiment [
<!ELEMENT experiment (results)*>
<!ELEMENT results EMPTY>
<!ATTLIST results image ENTITIES #REQUIRED>
<!NOTATION gif SYSTEM "image/gif">
<!ENTITY test1 SYSTEM "http://www.mmu.edu.my/image/exp1.gif" NDATA gif>
<!ENTITY test2 SYSTEM "http://www.mmu.edu.my/image/exp2.gif" NDATA gif>
<!ENTITY test3 SYSTEM "http://www.mmu.edu.my/image/exp3.gif" NDATA gif>
]>

<experiment>
<results image="test1 test2 test3"/>
</experiment>
Example of Attribute: NMTOKEN
<?xml version="1.0" encoding="UTF-8" standalone= "yes" ?>
<!DOCTYPE performances [
<!ELEMENT performances (#PCDATA)>
<!ATTLIST performances dates NMTOKEN #REQUIRED>
]>

<performances dates="08-06-2016">
Kat and the Kings
</performances>
Example of Attribute: NMTOKENS

<?xml version="1.0" encoding="UTF-8" standalone= "yes" ?>


<!DOCTYPE performances [
<!ELEMENT performances (#PCDATA)>
<!ATTLIST performances dates NMTOKENS #REQUIRED>
]>

<performances dates="08-06-2016 10-06-2016 11-06-2016">


Kat and the Kings
</performances>
Example of Attribute: Notation

list.dtd
<!ELEMENT code (#PCDATA)>
<!NOTATION jv SYSTEM "Java JDK 8u66">
<!ENTITY jversion SYSTEM "jversion.jpg" NDATA jv>
<!ATTLIST code description ENTITY #REQUIRED>

XML
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE code SYSTEM "list.dtd">

<code description="jversion">Java instructions</code>


Example of Attribute: Enumerated

<?xml version="1.0" encoding="UTF-8" standalone= "yes" ?>


<!DOCTYPE ToDoList [
<!ELEMENT ToDoList (task)*>
<!ELEMENT task (#PCDATA)>
<!ATTLIST task status (important | normal) #REQUIRED>
]>

<ToDoList>
<task status="important">This is an important task</task>
<task status="normal">This task can wait for a few days</task>
</ToDoList>
Entity
 An entity is a storage unit that holds a particular part of an XML
document.
 It may be special characters, commonly used text, a file, a database record, or a
network resource.
 Entity data can act as an abbreviation or be stored at an external
location.
 It helps to reduce repetitive information and allows for easier
editing (by reducing the number of occurrences of data to edit).

 Entity declaration:
 The entity name and value are declared in the DTD.
 Entity reference:
 In the XML, when the entity name is referenced, the entity value is read
in its place.
Entity Declaration & Entity Reference Example

Source: http://www.csse.monash.edu.au/hons/projects/2000/Susanti/thesis.html
Entity Type
 Entities can be declared internally or externally.
 Internal (parsed)
 External (parsed)
 External (unparsed)

 Internal parsed entity:


 It is defined completely within the XML document
 It refers to data or text that an XML parser has to parse.
<!ENTITY entity_name "entity_value">

<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE author [
<!ELEMENT author (#PCDATA)>
<!-- entity declaration -->
<!ENTITY mmu "Multimedia University">
]>

<!-- entity reference -->
<author>&mmu;</author>


Entity Type
 External parsed entity:
 It refers to data or text that an XML parser has to parse.
 It acquires its content from another source located via a URL (Uniform Resource
Locator).
 It creates a common reference that can be shared between multiple documents.
 Any changes made to an external entity are automatically reflected in the
documents that reference it.
 There are two types of external entities: private, and public.
Entity Type
External parsed entity:

Private external entities
Keyword: SYSTEM
It is intended for use by a single author or group of authors.
<!ENTITY entity_name SYSTEM "URL">
DTD:
<!ELEMENT copyright (#PCDATA)>
<!ENTITY cpy SYSTEM "copyright.html">

Public external entities
Keyword: PUBLIC
It is intended for broad use.
<!ENTITY entity_name PUBLIC "public_ID" "URL">
DTD:
<!ELEMENT copyright (#PCDATA)>
<!ENTITY cpy PUBLIC "-//W3C//TEXT copy//EN"
"http://www.w3.org/copyright.html">

XML (the same in both cases):
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE copyright SYSTEM "list.dtd">
<copyright>&cpy;</copyright>

public_ID: This may be used by an XML processor to generate an alternate URI where the external parsed entity can be found.
If it cannot be found at this URI, the XML processor must use the normal URL.
Entity Type
 External unparsed entity:
 It refers to non-XML data (such as image or binary data), identified by a notation.
 NOTATION can be used to describe the format of Non-XML data.
 XML parser does not parse this entity.
<!ENTITY entity_name SYSTEM "URL" NDATA notation_name>
DTD:
<!ELEMENT figure EMPTY>
<!ATTLIST figure source ENTITY #REQUIRED>
<!ENTITY logo SYSTEM "http://www.abc.net/logo.png" NDATA pinfo>
<!NOTATION pinfo SYSTEM "image/png">

<!ENTITY entity_name PUBLIC "public_ID" "URL" NDATA notation_name>
DTD:
<!ELEMENT figure EMPTY>
<!ATTLIST figure source ENTITY #REQUIRED>
<!ENTITY logo PUBLIC "-//W3C//GIF logo//EN" "http://www.w3.org/logo.png" NDATA PNG>
<!NOTATION PNG SYSTEM "image/png">

XML:
<?xml version="1.0" standalone="no" ?>
<figure source="logo"/>
Limitations of DTDs
 DTDs are a very weak specification language
 Restrictions cannot be put on the element contents (for example, that the text must be a number or a date)
 It is difficult to specify:
 All the children must occur, but may be in any order
 This element must occur a certain number of times
 There are only ten data types for attribute values
 But most of all: DTDs are not written in XML!
 If you want to do any validation, you need one parser for the XML
and another for the DTD
 This makes XML parsing harder than it needs to be
 There is a newer and more powerful technology: XML Schemas
 However, DTDs are still very much in use
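The "any order" limitation can be seen in a small sketch with hypothetical element names. To allow two required children in either order, every permutation has to be listed explicitly:

```xml
<!-- surname and firstname must both occur, in either order -->
<!ELEMENT student ((surname, firstname) | (firstname, surname))>
<!ELEMENT surname (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
```

With two children this is manageable, but the number of permutations grows factorially with the number of children, so the approach quickly becomes impractical.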
