Lecture 3
[Figure: Data warehouse architecture. Operational DBs and other sources pass through Extract, Transform, Load and Refresh (with a Monitor & Integrator and Metadata) into the Data Warehouse and Data Marts; an OLAP Server then serves Analysis, Query, Reports and Data mining.]
2008/1/9 2
XML is Extensible
• The tags used to mark up HTML documents and the structure of HTML documents are predefined.
• XML has no predefined tags: authors define their own tags and their own document structure.
XML Syntax
• An example XML document.
<?xml version="1.0"?>
<note>
<to>Tan Siew Teng</to>
<from>Lee Sim Wee</from>
<heading>Reminder</heading>
<body>Don't forget the Golf Championship this
weekend!</body>
</note>
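As a quick illustration, here is a minimal sketch that parses the note document above and lists its root and child elements. It assumes Python 3 and the standard-library module xml.etree.ElementTree, which the lecture itself does not use.

```python
import xml.etree.ElementTree as ET

# The example document from this slide, unchanged.
NOTE = """<?xml version="1.0"?>
<note>
<to>Tan Siew Teng</to>
<from>Lee Sim Wee</from>
<heading>Reminder</heading>
<body>Don't forget the Golf Championship this
weekend!</body>
</note>"""

root = ET.fromstring(NOTE)
print(root.tag)  # note
for child in root:
    # Each child element has a tag name and the text between its tags.
    print(child.tag, "->", child.text)
```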
Example (cont’d)
• The first line in the document is the XML declaration, which should always be included.
• It defines the XML version of the document.
• In this case the document conforms to the 1.0
specification of XML.
<?xml version="1.0"?>
• The next line defines the first element of the
document (the root element):
<note>
Example (cont’d)
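• The next lines define 4 child elements of the root (to, from, heading, and body):
<to>Tan Siew Teng</to>
<from>Lee Sim Wee</from>
<heading>Reminder</heading>
<body>Don't forget the Golf Championship this
weekend!</body>
• The last line defines the end of the root element:
</note>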
What is an XML element?
• An XML element is made up of a start tag, an end
tag, and data in between.
<Sport>Golf</Sport>
• The name of the element is enclosed by the less than
and greater than characters, and these are called tags.
• The start and end tags describe the data within the
tags, which is considered the value of the element.
• For example, the following XML element is a
<player> element with the value “Tiger Wood.”
<player>Tiger Wood</player>
There are 3 types of tags
• Start-Tag
– In the example, <Sport> is the start tag. It defines the type of the element and any attribute specifications:
<Player firstname="Wood" lastname="Tiger">
• End-Tag
– In the example, </Sport> is the end tag. It identifies the type of element that the tag is ending. Unlike a start tag, an end tag cannot contain attribute specifications.
• Empty Element Tag
– Like a start tag, it can contain attribute specifications, but it does not need an end tag. It denotes that the element is empty (it has no content). Note the '/' before the closing '>':
<Player firstname="Wood" lastname="Tiger"/>
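The sketch below, assuming Python 3's standard xml.etree.ElementTree module (not part of the lecture), shows that an empty-element tag and a start/end pair with nothing between them produce the same element: same attributes, no children and no text.

```python
import xml.etree.ElementTree as ET

# Empty-element tag form.
empty_tag = ET.fromstring('<Player firstname="Wood" lastname="Tiger"/>')
# Start tag immediately followed by its end tag.
start_end = ET.fromstring('<Player firstname="Wood" lastname="Tiger"></Player>')

for elem in (empty_tag, start_end):
    # tag name, attribute dictionary, number of children, text content
    print(elem.tag, elem.attrib, len(elem), elem.text)
```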
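XML elements must have a closing tag
In HTML some elements do not have to have a closing tag, e.g. <p>This is a paragraph
In XML every element must have a closing tag, e.g. <p>This is a paragraph</p>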
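One way to see this rule in action is to let an XML parser decide. The helper below is a hypothetical sketch (the name is_well_formed is ours), assuming Python 3's standard xml.etree.ElementTree module, which raises ParseError for input that is not well-formed.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<p>This is a paragraph</p>"))  # True
print(is_well_formed("<p>This is a paragraph"))      # False: no closing tag
```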
Rules for Naming Elements
• XML names must start with a letter or the underscore character.
• The rest of the name can contain letters, digits, dots, underscores or hyphens.
• Names cannot contain spaces.
• Names cannot start with 'xml' (in any combination of upper and lower case), which is reserved.
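The naming rules above can be sketched as a regular expression plus a check for the reserved prefix. This is a simplified, ASCII-only sketch in Python 3: the full XML specification also allows non-ASCII letters and the colon (used for namespaces), which it does not cover, and is_valid_name is a hypothetical helper.

```python
import re

# Letter or underscore first, then letters, digits, dots, underscores or hyphens.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def is_valid_name(name: str) -> bool:
    if not NAME_RE.fullmatch(name):
        return False  # bad first character, a space, or another symbol
    # Names starting with 'xml' in any case are reserved.
    return not name.lower().startswith("xml")

for n in ["first_name", "_id", "2nd_place", "first name", "xmlData"]:
    print(n, is_valid_name(n))
```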
XML tags are case sensitive
• XML tags are case sensitive. The tag <Message> is
different from the tag <message>.
<message>This is correct</message>
<Message>This is incorrect</message>
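A small sketch, assuming Python 3's standard xml.etree.ElementTree module, shows both sides of case sensitivity: element lookups match names exactly, and an end tag that differs in case from its start tag is rejected by the parser.

```python
import xml.etree.ElementTree as ET

inbox = ET.fromstring("<inbox><message>Hi</message><Message>Hello</Message></inbox>")

# <message> and <Message> are two different element types.
print(inbox.find("message").text)  # Hi
print(inbox.find("Message").text)  # Hello
print(inbox.find("MESSAGE"))       # None: no element has exactly that name

# An end tag whose case differs from its start tag is a well-formedness error.
try:
    ET.fromstring("<Message>This is incorrect</message>")
    mismatch_rejected = False
except ET.ParseError:
    mismatch_rejected = True
print(mismatch_rejected)  # True
```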
All XML elements must be properly nested
In HTML some elements can be improperly nested within each other, like this:
<b><i>This text is bold and italic</b></i>
In XML all elements must be properly nested within each other, like this:
<b><i>This text is bold and italic</i></b>
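Proper nesting is essentially a stack discipline: each end tag must close the most recently opened element. The sketch below illustrates that idea in Python 3 with a hypothetical helper, properly_nested; it is not how a real parser is written, and it deliberately ignores comments, CDATA sections, declarations and '>' characters inside quoted attribute values.

```python
import re

# Captures: optional '/' of an end tag, the element name, optional '/' of an empty-element tag.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^>]*?(/?)>")

def properly_nested(text: str) -> bool:
    stack = []
    for closing, name, self_closing in TAG_RE.findall(text):
        if closing:
            # An end tag must match the innermost open element.
            if not stack or stack.pop() != name:
                return False
        elif not self_closing:
            stack.append(name)  # open a new element
    return not stack  # every opened element must have been closed

print(properly_nested("<b><i>This text is bold and italic</i></b>"))  # True
print(properly_nested("<b><i>This text is bold and italic</b></i>"))  # False
```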
XML documents must have a root tag
• Documents must contain a single tag pair to define the
root element.
• All other elements must be nested within the root element.
• Any element can have sub-elements (child elements).
• Sub-elements must have matching start and end tags and be correctly nested within their parent element:
<root>
<child>
<subchild>
</subchild>
</child>
</root>
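The root/child/subchild hierarchy can be made visible by walking the parsed tree. This is a sketch assuming Python 3's standard xml.etree.ElementTree module; outline is a hypothetical helper. It also shows that a document with two top-level elements is rejected.

```python
import xml.etree.ElementTree as ET

def outline(elem, depth=0):
    """Return the element hierarchy as indented lines, one per element."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring("<root><child><subchild></subchild></child></root>")
print("\n".join(outline(root)))
# root
#   child
#     subchild

# A second top-level element is rejected: there can be only one root.
try:
    ET.fromstring("<root/><root/>")
    single_root_enforced = False
except ET.ParseError:
    single_root_enforced = True
```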
XML Attributes
• XML attributes are normally used to describe
XML elements, or to provide additional
information about elements.
• An element can optionally contain one or more
attributes. An attribute is a name-value pair
separated by an equal sign (=).
• Most commonly, attributes provide information that is not part of the content of the XML document.
• Often the attribute data is more important to the
XML parser than to the reader.
XML Attributes (cont’d)
• Attributes are always contained within the start
tag of an element. Here are some examples:
<Player firstname="Wood" lastname="Tiger"/>
Player - Element Name
firstname - Attribute Name
Wood - Attribute Value
• HTML examples:
<img src="computer.gif">
<a href="demo.asp">
• XML examples:
<file type="gif">
<person id="3344">
Attribute values must always be quoted
• XML elements can have attributes in name/value
pairs just like in HTML.
• An element can optionally contain one or more
attributes.
• In XML the attribute value must always be quoted, using either double or single quotes.
• An attribute is a name-value pair separated by an
equal sign (=).
<CITY ZIP="01085">Westfield</CITY>
• ZIP="01085" is an attribute of the <CITY> element.
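A short sketch, assuming Python 3's standard xml.etree.ElementTree module, reads the ZIP attribute back as a string and checks the quoting rule: single quotes are accepted, an unquoted value is not.

```python
import xml.etree.ElementTree as ET

city = ET.fromstring('<CITY ZIP="01085">Westfield</CITY>')
print(city.get("ZIP"))  # 01085 (a string, so the leading zero is kept)
print(city.text)        # Westfield

# Single quotes are also accepted.
single = ET.fromstring("<CITY ZIP='01085'>Westfield</CITY>")
print(single.get("ZIP") == city.get("ZIP"))  # True

# An unquoted value makes the document not well-formed.
try:
    ET.fromstring("<CITY ZIP=01085>Westfield</CITY>")
    unquoted_rejected = False
except ET.ParseError:
    unquoted_rejected = True
print(unquoted_rejected)  # True
```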
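What is a Comment?
• Comments are informational help for the reader; parsers are not required to pass them on to the application.
• XML comments use the same syntax as HTML comments: <!-- This is a comment -->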
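To see that comments are reader-oriented, this sketch (assuming Python 3.8 or later and the standard xml.etree.ElementTree module) parses the same document twice: the default parser drops the comment, while a TreeBuilder created with insert_comments=True keeps it as a special comment node.

```python
import xml.etree.ElementTree as ET

doc = "<note><!-- reminder for the weekend --><to>Tan Siew Teng</to></note>"

# Default parsing: comments are discarded.
default_root = ET.fromstring(doc)
print([c.tag for c in default_root])  # ['to']

# Ask the tree builder to keep comments (Python 3.8+).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(doc, parser=parser)
comments = [c.text for c in root if c.tag is ET.Comment]
print(comments)  # [' reminder for the weekend ']
```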
What is a DTD?
• A Document Type Definition (DTD) is a mechanism (set of rules) to describe the structure, syntax and vocabulary of XML documents.
Document Type Definition (DTD)
• Defines the legal building blocks of an XML document.
• A set of rules that defines document structure with a list of legal elements.
• Declared inline in the XML document or as an external reference.
• All names are user defined.
• Derived from SGML.
• One DTD can be used for multiple documents.
• Written in plain-text format.
• Attached to a document with the DOCTYPE keyword.
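The sketch below, assuming Python 3's standard xml.etree.ElementTree module, declares a small DTD inline with DOCTYPE. The document deliberately breaks that DTD, yet it parses without error, because ElementTree only checks well-formedness and does not validate against the DTD. Checking validity needs a validating parser (lxml is one commonly used option).

```python
import xml.etree.ElementTree as ET

# An internal (inline) DTD. The document violates it on purpose:
# <note> should contain <to> and <body>, not <from>.
DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
]>
<note><from>Lee Sim Wee</from></note>"""

root = ET.fromstring(DOC)  # well-formed, so it parses; validity is not checked
print([child.tag for child in root])  # ['from']
```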
Element Declaration
• The following lines show examples of the syntax for element declarations:
<!ELEMENT reports (employee*)>
<!ELEMENT employee (ss_number, first_name, middle_name,
last_name, email, extension, birthdate, salary)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT extension EMPTY>
#PCDATA - Parsed Character Data, meaning that the element can contain text. When the content model is (#PCDATA) alone, no child elements may be present in the element.
EMPTY - Indicates that the element has no content at all: no child elements and no text (a leaf element).
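To make the difference between (#PCDATA) and EMPTY concrete, here is a hypothetical checker for just these two content models, sketched in Python 3 with the standard xml.etree.ElementTree module. It is an illustration, not a DTD validator.

```python
import xml.etree.ElementTree as ET

def check_simple_model(elem: ET.Element, model: str) -> bool:
    """Check an element against '(#PCDATA)' or 'EMPTY' (only these two)."""
    if model == "EMPTY":
        # EMPTY: no child elements and no character data, not even spaces.
        return len(elem) == 0 and not elem.text
    if model == "(#PCDATA)":
        # Text only: any amount of text is fine, but no child elements.
        return len(elem) == 0
    raise ValueError(f"unsupported content model: {model}")

print(check_simple_model(ET.fromstring("<email>tan@example.com</email>"), "(#PCDATA)"))  # True
print(check_simple_model(ET.fromstring("<email><user/></email>"), "(#PCDATA)"))          # False
print(check_simple_model(ET.fromstring("<extension/>"), "EMPTY"))                        # True
print(check_simple_model(ET.fromstring("<extension>1234</extension>"), "EMPTY"))         # False
```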
Occurrence
• Occurrence indicators specify how many times a child element can occur within its parent element.
• An indicator is written at the end of the child element's name in the content model:
+ - Element can appear one or more times
* - Element can appear zero or more times
? - Element can appear zero or one time
(no indicator) - Element must appear exactly once
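Each occurrence indicator corresponds to a range of allowed counts. The hypothetical function below, a Python 3 sketch using the standard xml.etree.ElementTree module for the example, states those ranges directly; it checks counts only, not element order.

```python
import xml.etree.ElementTree as ET

def occurrence_ok(indicator: str, count: int) -> bool:
    """Is 'count' allowed by the DTD indicator? ("" means no indicator.)"""
    if indicator == "+":
        return count >= 1
    if indicator == "*":
        return True  # zero or more: any count is fine
    if indicator == "?":
        return count <= 1
    if indicator == "":
        return count == 1
    raise ValueError(f"unknown indicator: {indicator!r}")

reports = ET.fromstring("<reports><employee/><employee/></reports>")
n = len(reports.findall("employee"))
print(occurrence_ok("*", n), occurrence_ok("?", n))  # True False
```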
Separators
, - Elements separated by commas must appear in the same order as listed.
| - Only one of the elements separated by bars may appear (a choice).
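Occurrence indicators and the comma separator behave like regular-expression operators over the sequence of child element names. The sketch below (Python 3, standard re and xml.etree.ElementTree modules) translates a flat content model into a regex; it also handles the choice separator '|' (exactly one of the listed elements), which DTDs allow as well. The helpers model_to_regex and children_match are hypothetical, and nested groups, mixed content and #PCDATA are not supported.

```python
import re
import xml.etree.ElementTree as ET

def model_to_regex(model: str) -> re.Pattern:
    """Translate e.g. '(to, from, heading?, body+)' or '(Home | Office)'
    into a regex over child names, each followed by a space."""
    inner, group_ind = model.strip(), ""
    if inner[-1] in "+*?":  # indicator on the whole group, e.g. '(a | b)*'
        inner, group_ind = inner[:-1], inner[-1]
    inner = inner.strip()[1:-1]  # drop the outer parentheses
    sep = "|" if "|" in inner else ","  # a DTD group never mixes ',' and '|'
    parts = []
    for item in inner.split(sep):
        item = item.strip()
        ind = item[-1] if item[-1] in "+*?" else ""
        name = item[:-1] if ind else item
        parts.append(f"(?:{re.escape(name)} ){ind}")
    joiner = "|" if sep == "|" else ""
    return re.compile(f"(?:{joiner.join(parts)}){group_ind}")

def children_match(elem: ET.Element, model: str) -> bool:
    names = "".join(child.tag + " " for child in elem)
    return model_to_regex(model).fullmatch(names) is not None

note = ET.fromstring("<note><to/><from/><body/></note>")
print(children_match(note, "(to, from, heading?, body+)"))  # True
print(children_match(note, "(from, to, heading?, body+)"))  # False: wrong order
```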
Why use a DTD?
• XML provides an application independent way of
sharing data.
• With a DTD, independent groups of people can
agree to use a common DTD for interchanging
data.
• Your application can use a standard DTD to
verify that data that you receive from the outside
world is valid.
• You can also use a DTD to verify your own data.
Well-formed Document
<?xml version="1.0"?>
<TITLE>
<title>A Well-formed Document</title>
<first>
This is a simple
<bold>well-formed</bold>
document.
</first>
</TITLE>
Source: L1.xml
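Parsing L1.xml confirms it is well-formed and shows how mixed content (text around the <bold> element) is represented. This is a sketch assuming Python 3's standard xml.etree.ElementTree module, where text before a child is the parent's .text and text after a child is that child's .tail. Note that <TITLE> and <title> are different elements, because XML is case sensitive.

```python
import xml.etree.ElementTree as ET

L1 = """<?xml version="1.0"?>
<TITLE>
<title>A Well-formed Document</title>
<first>
This is a simple
<bold>well-formed</bold>
document.
</first>
</TITLE>"""

root = ET.fromstring(L1)  # no ParseError, so the document is well-formed
first = root.find("first")
bold = first.find("bold")
print(repr(first.text))  # text before <bold>
print(repr(bold.text))   # 'well-formed'
print(repr(bold.tail))   # text after </bold>, before </first>
```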
Rules for Well-formed Documents
• A well-formed XML document should begin with an XML declaration; the declaration is optional in XML 1.0, but if present it must come first.
• All non-empty elements must have start tags and end tags with matching element names.
• All empty-element tags must end with />.
• Every document must contain exactly one root element.
• Nested elements must be completely nested within their higher-level elements.
• The only predefined entity references are &amp;, &apos;, &gt;, &lt;, and &quot;.
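The sketch below, assuming Python 3's standard xml.etree.ElementTree and xml.sax.saxutils modules, decodes the five predefined entity references and shows why raw '&' and '<' in text must be escaped before they are placed in a document.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The five predefined entity references and the characters they stand for.
decoded = ET.fromstring("<v>&amp;&apos;&gt;&lt;&quot;</v>").text
print(decoded)  # &'><"

raw = "Fish & Chips < $10"
safe = escape(raw)  # escapes &, < and >
print(safe)         # Fish &amp; Chips &lt; $10
print(ET.fromstring(f"<item>{safe}</item>").text == raw)  # True

# Unescaped '&' and '<' make the document not well-formed.
try:
    ET.fromstring(f"<item>{raw}</item>")
    raw_rejected = False
except ET.ParseError:
    raw_rejected = True
```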
DTD Graph
• Given the DTD information of the XML to be stored, we can create a structure called the Document Type Definition (DTD) Graph that mirrors the structure of the DTD. In the DTD Graph, an XML element is drawn as a rectangle, an XML attribute as a semicircle, and an operator as a circle. The nodes are arranged in a hierarchical containment under a root element node, with element nodes placed under their parent element node and connected through occurrence indicators drawn in circles.
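A DTD Graph can be represented as a small tree of element nodes, and walking that tree top-down produces the DTD. The Python 3 sketch below is a hypothetical illustration of that idea, not the mapping procedure from the reading: ElementNode and to_dtd are invented names, every attribute is assumed to map to CDATA #REQUIRED, and elements with no child elements are emitted as EMPTY (their data is carried in attributes, as in the mapped DTD that follows).

```python
from dataclasses import dataclass, field

@dataclass
class ElementNode:
    """An element node (rectangle) in a DTD Graph."""
    name: str
    attributes: list = field(default_factory=list)  # attribute nodes (semicircles)
    children: list = field(default_factory=list)    # (indicator, ElementNode); indicator is the circle

def to_dtd(node: ElementNode) -> list:
    """Walk the graph top-down and emit one declaration per element/attribute list."""
    decls = []
    if node.children:
        model = ", ".join(child.name + ind for ind, child in node.children)
        decls.append(f"<!ELEMENT {node.name} ({model})>")
    else:
        decls.append(f"<!ELEMENT {node.name} EMPTY>")
    if node.attributes:
        # Assumption: every attribute is mapped to CDATA #REQUIRED.
        attrs = "\n".join(f"  {a} CDATA #REQUIRED" for a in node.attributes)
        decls.append(f"<!ATTLIST {node.name}\n{attrs}>")
    for _, child in node.children:
        decls.extend(to_dtd(child))
    return decls

address = ElementNode("Customer_address", ["City"])
customer = ElementNode("Customer", ["Customer_name"], [("*", address)])
print("\n".join(to_dtd(customer)))
```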
[Figure: Mapping an Extended Entity Relationship Model into a DTD Graph. Top: a generic EER model (Entity A, Entity B, Entity E; relationships R1 to R4 with 1:n cardinalities) mapped to a DTD Graph (Root, +, Element A, Element E, Element F, Element H). Bottom: the Customer Sales EER model with entities Invoice (Invoice_no, Quantity, Invoice_amount, Invoice_date, Shipment_date), Customer (Customer_no, Customer_name, Sex, Postal_code, Telephone, Email), Item (Item_no, Item_name, Author, Publisher, Item_price), Monthly Sales (Year, Month, Quantity, Total), Invoice Item (Invoice_no, Item_no, Quantity, Unit_price, Invoice_price, Discount), Customer Address (Customer_no, Address, City, State, Country, Is_default), Item Sales (Year, Month, Item_no, Quantity, Total) and Customer Sales (Year, Month, Customer_no, Quantity, Total), linked by relationships R1 to R11.]
A Sample DTD graph for Customer Sales
[Figure: DTD Graph with Root, then +, then the Sales element; under Sales (each via *) the elements Invoice, Customer, Item and Monthly Sales; Invoice contains Invoice Item, Customer contains Customer Address, and Monthly Sales contains Item Sales and Customer Sales. Attributes are drawn as semicircles; Customer and Item carry ID attributes that are referenced by idref attributes.]
The mapped DTD from DTD Graph
<!ELEMENT Sales (Invoice*, Customer*, Item*, Monthly_sales*)>
<!ATTLIST Sales
Status (New | Updated | History) #REQUIRED>
<!ELEMENT Invoice (Invoice_item*)>
<!ATTLIST Invoice
Invoice_no CDATA #REQUIRED
Quantity CDATA #REQUIRED
Invoice_amount CDATA #REQUIRED
Invoice_date CDATA #REQUIRED
Shipment_date CDATA #IMPLIED
Customer_idref IDREF #REQUIRED>
<!ELEMENT Customer (Customer_address*)>
<!ATTLIST Customer
Customer_id ID #REQUIRED
Customer_name CDATA #REQUIRED
Customer_no CDATA #REQUIRED
Sex CDATA #IMPLIED
Postal_code CDATA #IMPLIED
Telephone CDATA #IMPLIED
Email CDATA #IMPLIED>
<!ELEMENT Customer_address EMPTY>
<!ATTLIST Customer_address
Address_type (Home|Office) #REQUIRED
Address NMTOKENS #REQUIRED
City CDATA #IMPLIED
State CDATA #IMPLIED
Country CDATA #IMPLIED
Customer_idref IDREF #REQUIRED
Is_default (Y|N) "Y">
<!ELEMENT Invoice_item EMPTY>
<!ATTLIST Invoice_item
Quantity CDATA #REQUIRED
Unit_price CDATA #REQUIRED
Invoice_price CDATA #REQUIRED
Discount CDATA #REQUIRED
Item_idref IDREF #REQUIRED>
<!ELEMENT Item EMPTY>
<!ATTLIST Item
Item_id ID #REQUIRED
Item_name CDATA #REQUIRED
Author CDATA #IMPLIED
Publisher CDATA #IMPLIED
Item_price CDATA #REQUIRED>
<!ELEMENT Monthly_sales (Item_sales*, Customer_sales*)>
<!ATTLIST Monthly_sales
Year CDATA #REQUIRED
Month CDATA #REQUIRED
Quantity CDATA #REQUIRED
Total CDATA #REQUIRED>
<!ELEMENT Item_sales EMPTY>
<!ATTLIST Item_sales
Quantity CDATA #REQUIRED
Total CDATA #REQUIRED
Item_idref IDREF #REQUIRED>
<!ELEMENT Customer_sales EMPTY>
<!ATTLIST Customer_sales
Quantity CDATA #REQUIRED
Total CDATA #REQUIRED
Customer_idref IDREF #REQUIRED>
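One check a validating parser performs for this DTD is that every IDREF value matches some ID in the document. The Python 3 sketch below imitates just that check with the standard xml.etree.ElementTree module, which does not validate on its own. The sample instance and its values are invented for illustration, and dangling_idrefs is a hypothetical helper that relies on this DTD's naming convention (ID attributes Customer_id and Item_id, IDREF attributes ending in _idref).

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of the Sales DTD above (values invented for illustration).
SAMPLE = """<Sales Status="New">
  <Invoice Invoice_no="1" Quantity="2" Invoice_amount="50"
           Invoice_date="2008-01-09" Customer_idref="c1">
    <Invoice_item Quantity="2" Unit_price="25" Invoice_price="50"
                  Discount="0" Item_idref="i1"/>
  </Invoice>
  <Customer Customer_id="c1" Customer_name="Tan Siew Teng" Customer_no="1"/>
  <Item Item_id="i1" Item_name="Sample item" Item_price="25"/>
</Sales>"""

ID_ATTRS = {"Customer_id", "Item_id"}  # attributes declared as ID in the DTD

def dangling_idrefs(root: ET.Element) -> list:
    """Return IDREF values that do not match any ID in the document."""
    ids = {v for e in root.iter() for k, v in e.attrib.items() if k in ID_ATTRS}
    refs = [v for e in root.iter() for k, v in e.attrib.items() if k.endswith("_idref")]
    return [r for r in refs if r not in ids]

root = ET.fromstring(SAMPLE)
print(dangling_idrefs(root))  # []
root.find("Invoice").set("Customer_idref", "c9")
print(dangling_idrefs(root))  # ['c9']
```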
Review Question 1
What are the similarities and differences between a DTD and a well-formed document?
Tutorial question 1
Map the following Extended Entity Relationship Model into a DTD Graph and a Document Type Definition (DTD):
[Figure: EER model. Department (Department_ID) has (1:n) Trip (Trip_ID); Trip is linked through "taken" (1:n) to Car_rental (Staff_ID, Car_model); the figure also shows a Salary attribute.]
Reading assignment
Chapter 3 Schema Translation in “Information Systems Reengineering and Integration” Second Edition, by Joseph Fong, published by Springer, 2006, pp. 142-154.