Lecture 3

This document discusses the multi-tiered architecture of data warehousing and data mining. It shows a diagram of the typical components involved, including data sources, extraction/transformation/loading processes, data storage in a data warehouse, online analytical processing engines, and front-end analysis and reporting tools. Data is also stored in smaller data marts for specific business units or departments.

Uploaded by

Phan Hoàng

Multi-Tiered Architecture of data warehousing and data mining

[Figure: multi-tiered architecture in four tiers. Data Sources: operational DBs and other sources, passed through Extract, Transform, Load and Refresh steps (with a Monitor & Integrator). Data Storage: data warehouse, metadata and data marts. OLAP Engine: OLAP server. Front-End Tools: analysis, query, reports and data mining.]


2008/1/9 1
What is XML?

• EXtensible Markup Language (XML)
• World Wide Web Consortium (W3C) Recommendation; Version 1.0 dates from 10 February 1998.
• Describes data, rather than instructing a system on how to process it.
• Provides powerful capabilities for data integration and data-driven styling.
• Introduces new processing paradigms and requires new ways of thinking about Web development.
• A meta-markup language: a set of rules for creating semantic tags used to describe data.

XML is Extensible
• The tags used to mark up HTML documents and the structure of HTML documents are predefined.
• The author of an HTML document can only use tags that are defined in the HTML standard.
• XML allows the author to define his own tags and his own document structure.
Benefits of using XML
• It is structured.
• Documents are easily committed to a persistence
layer.
• Platform independent, textual information.
• An open standard.
• Language independent.
• DOM and SAX are open, language-independent sets of interfaces.
• It is Web enabled.
Typical XML System
[Figure: the XML Document (content) and the XML DTD (rules) feed the XML Parser (processor), whose output goes to the XML Application.]
• XML Document (content)
• XML Document Type Definition, DTD (structure definition; this is an operational part)
• XML Parser (conformity checker)
• XML Application (uses the output of the parser to achieve your unique objectives)
How can XML be used?
• XML can keep data separated from your
HTML document.
• XML can also store data inside HTML
documents (Data Islands).
• XML can be used to exchange data.
• XML can be used to store data.

XML Syntax
• An example XML document.

<?xml version="1.0"?>
<note>
<to>Tan Siew Teng</to>
<from>Lee Sim Wee</from>
<heading>Reminder</heading>
<body>Don't forget the Golf Championship this
weekend!</body>
</note>
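The note above can be read with any XML parser. As a minimal sketch, assuming Python and its standard-library `xml.etree.ElementTree` module (neither is part of the lecture), the helper `read_note` below is a hypothetical name that returns the root tag and the text of each child element:

```python
import xml.etree.ElementTree as ET

# The example document from the slide, re-indented for readability.
NOTE_XML = """<?xml version="1.0"?>
<note>
  <to>Tan Siew Teng</to>
  <from>Lee Sim Wee</from>
  <heading>Reminder</heading>
  <body>Don't forget the Golf Championship this weekend!</body>
</note>"""


def read_note(xml_text):
    """Parse the document and return (root tag, {child tag: child text})."""
    root = ET.fromstring(xml_text)
    return root.tag, {child.tag: child.text for child in root}


if __name__ == "__main__":
    tag, fields = read_note(NOTE_XML)
    print(tag)
    for name, value in fields.items():
        print(f"{name}: {value}")
```

The parser hands back a tree: one root element (`note`) whose children appear in document order.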
Example (cont’d)
• The first line in the document: The XML
declaration should always be included.
• It defines the XML version of the document.
• In this case the document conforms to the 1.0
specification of XML.
<?xml version="1.0"?>
• The next line defines the first element of the
document (the root element):
<note>

Example (cont’d)
• The next lines define four child elements of the root
(to, from, heading, and body):

<to>Tan Siew Teng</to>
<from>Lee Sim Wee</from>
<heading>Reminder</heading>
<body>Don't forget the Golf Championship this weekend!</body>

• The last line defines the end of the root element:

</note>
What is an XML element?
• An XML element is made up of a start tag, an end
tag, and data in between.
<Sport>Golf</Sport>
• The name of the element is enclosed by the less than
and greater than characters, and these are called tags.
• The start and end tags describe the data within the
tags, which is considered the value of the element.
• For example, the following XML element is a
<player> element with the value “Tiger Wood.”
<player>Tiger Wood</player>

There are 3 types of tags
• Start-Tag
– In the example, <Sport> is the start tag. It defines the type of the element and may carry attribute specifications:
<Player firstname="Wood" lastname="Tiger">
• End-Tag
– In the example, </Sport> is the end tag. It identifies the type of element that the tag is ending. Unlike a start tag, an end tag cannot contain attribute specifications.
• Empty Element Tag
– Like a start tag, this can carry attribute specifications, but it does not need an end tag. It denotes that the element is empty (it has no content). Note the '/' before the closing '>':
<Player firstname="Wood" lastname="Tiger"/>

XML elements must have a closing tag
In HTML some elements do not have to have a closing tag.

The following code is legal in HTML:


<p>This is a paragraph
<p>This is another paragraph

In XML all elements must have a closing tag like this:


<p>This is a paragraph</p>
<p>This is another paragraph</p>

Rules for Naming Elements
• XML names should start with a letter or the underscore character.
• The rest of the name can contain letters, digits, dots, underscores or hyphens.
• No spaces are allowed in names.
• Names cannot start with 'xml' (in any letter case), which is reserved.
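The naming rules above can be checked mechanically. The following is a minimal sketch in Python, restricted to ASCII letters as a simplifying assumption (the XML specification also accepts many non-ASCII letters and the colon); `is_valid_name` is a hypothetical helper, not part of any library:

```python
import re

# First character: letter or underscore. Rest: letters, digits, dots,
# underscores or hyphens. ASCII only, which is narrower than the XML spec.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def is_valid_name(name):
    """Return True if `name` follows the slide's element naming rules."""
    if not _NAME_RE.match(name):
        return False
    # Names beginning with "xml" in any letter case are reserved.
    return not name.lower().startswith("xml")


if __name__ == "__main__":
    for candidate in ["player", "_id", "first.name-2", "2nd", "first name", "xmlData"]:
        print(candidate, is_valid_name(candidate))
```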
XML tags are case sensitive
• XML tags are case sensitive. The tag <Message> is
different from the tag <message>.

• Opening and closing tags must therefore be written with the same case:

<message>This is correct</message>

<Message>This is incorrect</message>

All XML elements must be properly nested
In HTML some elements can be improperly nested within each other, like this:

<b><i>This text is bold and italic</b></i>

In XML all elements must be properly nested within each other, like this:

<b><i>This text is bold and italic</i></b>
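A conforming parser rejects improperly nested, unclosed or case-mismatched tags. As a small sketch, assuming Python's standard-library `xml.etree.ElementTree` (the helper name `is_well_formed` is illustrative only), the rules from the last few slides can be tried out directly:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts `xml_text`, False on a parse error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    samples = [
        "<b><i>This text is bold and italic</i></b>",  # properly nested
        "<b><i>This text is bold and italic</b></i>",  # improperly nested
        "<p>This is a paragraph",                      # missing closing tag
        "<Message>This is incorrect</message>",        # case mismatch
    ]
    for text in samples:
        print(is_well_formed(text), text)
```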

XML documents must have a root tag
• Documents must contain a single tag pair to define the
root element.
• All other elements must be nested within the root element.
• All elements can have sub (children) elements.
• Sub elements must be in pairs and correctly nested within
their parent element:
<root>
<child>
<subchild>
</subchild>
</child>
</root>

XML Attributes
• XML attributes are normally used to describe
XML elements, or to provide additional
information about elements.
• An element can optionally contain one or more
attributes. An attribute is a name-value pair
separated by an equal sign (=).
• Most commonly, attributes are used to provide information that is not a part of the content of the XML document.
• Often the attribute data is more important to the
XML parser than to the reader.
XML Attributes (cont’d)
• Attributes are always contained within the start tag of an element. Here are some examples:
<Player firstname="Wood" lastname="Tiger" />
Player - Element Name
firstname - Attribute Name
Wood - Attribute Value
• HTML examples:
<img src="computer.gif">
<a href="demo.asp">
• XML examples:
<file type="gif">
<person id="3344">
Attribute values must always be quoted
• XML elements can have attributes in name/value
pairs just like in HTML.
• An element can optionally contain one or more
attributes.
• In XML the attribute value must always be quoted.
• An attribute is a name-value pair separated by an
equal sign (=).
<CITY ZIP="01085">Westfield</CITY>
• ZIP="01085" is an attribute of the <CITY> element.
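The quoting rule can be observed with a parser. A minimal sketch, again assuming Python's standard-library `xml.etree.ElementTree`; `root_attributes` is a hypothetical helper that returns the attributes of the root element as a dictionary:

```python
import xml.etree.ElementTree as ET


def root_attributes(xml_text):
    """Parse `xml_text` and return the root element's attributes as a dict."""
    return dict(ET.fromstring(xml_text).attrib)


if __name__ == "__main__":
    # Quoted value: accepted, and kept as a string (the leading zero survives).
    print(root_attributes('<CITY ZIP="01085">Westfield</CITY>'))

    # Unquoted value: the parser reports an error instead of guessing.
    try:
        root_attributes("<CITY ZIP=01085>Westfield</CITY>")
    except ET.ParseError as error:
        print("rejected:", error)
```

Attribute values always arrive as text; converting them to numbers or dates is left to the application.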

What is a Comment ?
• Comments are informational help for the reader.
• They are ignored by XML processors.
• They are enclosed within "<!--" and "-->".
<!-- This is a comment -->
What is a Processing Instruction ?
• Processing Instructions provide a way to send instructions to computer programs or applications. They are enclosed within "<?" and "?>".

<?xml-stylesheet type="text/xsl" href="styler.xsl"?>

xml-stylesheet - Target (application name)
type="text/xsl" href="styler.xsl" - Instructions to the application

What is a DTD ?
• A Document Type Definition (DTD) is a mechanism (set of rules) to describe the structure, syntax and vocabulary of XML documents.

• It is a modeling language for XML, but it does not follow the same syntax as XML.

Document Type Definition (DTD)
• Define the legal building blocks of an XML document.
• Set of rules to define document structure with a list of
legal elements.
• Declared inline in the XML document or as an external
reference.
• All names are user defined.
• Derived from SGML.
• One DTD can be used for multiple documents.
• Has ASCII format.
• Attached to a document with the DOCTYPE keyword.

Element Declaration
• The following lines show the possible syntaxes for element declarations:
<!ELEMENT reports (employee*)>
<!ELEMENT employee (ss_number, first_name, middle_name, last_name, email, extension, birthdate, salary)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT extension EMPTY>
#PCDATA - Parsed Character Data, meaning that the element can contain text. Declared on its own, as in (#PCDATA), it means that no child elements may be present in the element.
EMPTY - Indicates that the element has no content: it can contain neither text nor child elements.

Occurrence
• Notations specify how many times a child element can occur within its parent element.
• These notations appear at the end of each child element name:
+ - Element can appear one or more times
* - Element can appear zero or more times
? - Element can appear zero or one time
nothing - Element must appear exactly once
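The four occurrence notations amount to a lower and an upper bound on a count. A minimal sketch in Python, where `OCCURRENCE_BOUNDS` and `occurrence_ok` are illustrative names and the empty string stands for "nothing":

```python
# (minimum, maximum) number of occurrences for each DTD indicator.
# A maximum of None means there is no upper limit.
OCCURRENCE_BOUNDS = {
    "+": (1, None),  # one or more
    "*": (0, None),  # zero or more
    "?": (0, 1),     # zero or one
    "": (1, 1),      # no indicator: exactly once
}


def occurrence_ok(indicator, count):
    """Return True if `count` occurrences satisfy the given indicator."""
    low, high = OCCURRENCE_BOUNDS[indicator]
    return count >= low and (high is None or count <= high)


if __name__ == "__main__":
    # <!ELEMENT reports (employee*)> allows a reports element with no employees.
    print(occurrence_ok("*", 0))
    # With (employee+), at least one employee would be needed.
    print(occurrence_ok("+", 0))
```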

Separators
, - The elements on the left and right of the comma must appear in that order.
| - Only one of the elements on the left or right of this symbol must appear.
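Because the comma means sequence and the bar means choice, an element content model behaves much like a regular expression over child element names. The sketch below, in Python, makes that analogy concrete under stated assumptions: it handles element-only content models (no #PCDATA, EMPTY or ANY), and both function names are hypothetical.

```python
import re


def content_model_to_regex(model):
    """Translate a DTD content model into a regex over space-terminated names."""
    compact = re.sub(r"\s+", "", model)
    # Wrap each element name in a group that also consumes one trailing space,
    # so that adjacent names cannot run together.
    pattern = re.sub(
        r"[A-Za-z_][A-Za-z0-9._-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        compact,
    )
    # A comma is plain concatenation in a regex; |, *, + and ? keep their meaning.
    return pattern.replace(",", "")


def children_match(model, child_names):
    """Return True if the ordered list of child names satisfies the model."""
    sequence = "".join(name + " " for name in child_names)
    return re.fullmatch(content_model_to_regex(model), sequence) is not None


if __name__ == "__main__":
    model = "(to, from, heading, body)"
    print(children_match(model, ["to", "from", "heading", "body"]))
    print(children_match(model, ["from", "to", "heading", "body"]))
    print(children_match("(email | phone)", ["email", "phone"]))
```

A validating parser does this kind of check for every element; the sketch only illustrates the idea for a single parent.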
Attribute Declaration
Syntaxes for attribute declaration
<!ATTLIST customer ID CDATA #REQUIRED>
<!ATTLIST customer Preferred (true | false) "false">
customer - Element name
ID, Preferred - Attribute names (in the first example the attribute is named ID but has type CDATA)
(true | false) - Possible attribute values
"false" - Default attribute value
CDATA - Attribute type: character data
ID - Attribute type: the value uniquely identifies an element
IDREF - Attribute type: the value points to an element with an ID attribute
NMTOKEN - Attribute type: a name token consisting of letters, digits, periods, underscores, hyphens and colons
#REQUIRED - Attribute value must be provided
#IMPLIED - If no value is provided, the application must use its own default
#FIXED - Attribute value must be the one that is provided in the DTD

Why use a DTD?
• XML provides an application independent way of
sharing data.
• With a DTD, independent groups of people can
agree to use a common DTD for interchanging
data.
• Your application can use a standard DTD to
verify that data that you receive from the outside
world is valid.
• You can also use a DTD to verify your own data.

Well-formed Document

<?xml version="1.0"?>
<TITLE>
<title>A Well-formed Document</title>
<first>
This is a simple
<bold>well-formed</bold>
document.
</first>
</TITLE>
Source: L1.xml

Rules for Well-formed Documents
• If an XML declaration is present, it must be the first line of the document.
• All non-empty elements must have start tags and end tags with matching element names.
• All empty elements must end with />.
• Every document must contain exactly one root element.
• Nested elements must be completely nested within their higher-level elements.
• The only predefined entity references are &amp;, &apos;, &gt;, &lt;, and &quot;.
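The predefined entity references exist so that markup characters can appear inside data. As a minimal sketch, assuming Python's standard-library `xml.sax.saxutils` module (whose `escape` handles `&`, `<` and `>` by default and takes further replacements as a dictionary), the wrapper names below are illustrative:

```python
from xml.sax.saxutils import escape, unescape

# Quote characters are not escaped by default, so they are supplied explicitly.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def to_xml_text(raw):
    """Replace the five special characters with their entity references."""
    return escape(raw, QUOTE_ENTITIES)


def from_xml_text(escaped):
    """Turn the five entity references back into plain characters."""
    return unescape(escaped, {ref: char for char, ref in QUOTE_ENTITIES.items()})


if __name__ == "__main__":
    encoded = to_xml_text('Tom & "Jerry" <1>')
    print(encoded)
    print(from_xml_text(encoded))
```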
DTD Graph
• Given the DTD information of the XML to be stored, we can create a structure called the Data Type Definition Graph that mirrors the structure of the DTD. Each node in the graph represents an XML element (drawn as a rectangle), an XML attribute (a semicircle), or an operator (a circle). The nodes are put together in a hierarchical containment under a root element node, with element nodes placed under their parent element node and separated by an occurrence indicator in a circle.

• Facilities are available to link elements together with an Identifier (ID) and an Identifier Reference (IDREF). An element with an IDREF refers to an element with an ID, and each ID value must be unique. Nodes can thus refer to each other: nodes with an IDREF refer to nodes with an ID.
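The ID/IDREF link is a referential-integrity constraint: every IDREF value should match some ID value, and ID values should not repeat. The standard-library Python parser used below does not validate against a DTD, so this sketch checks the constraint by hand for one pair of attribute names. The names `Customer_id` and `Customer_idref` follow the mapped DTD later in this lecture; the sample document and the helper `dangling_idrefs` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the second invoice points at a missing customer.
SAMPLE = """<Sales>
  <Customer Customer_id="C1"/>
  <Invoice Customer_idref="C1"/>
  <Invoice Customer_idref="C9"/>
</Sales>"""


def dangling_idrefs(xml_text, id_attr, idref_attr):
    """Return the IDREF values that do not match any ID in the document."""
    root = ET.fromstring(xml_text)
    ids = [el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None]
    if len(ids) != len(set(ids)):
        raise ValueError("ID values must be unique")
    known = set(ids)
    refs = [el.get(idref_attr) for el in root.iter() if el.get(idref_attr) is not None]
    return [ref for ref in refs if ref not in known]


if __name__ == "__main__":
    print(dangling_idrefs(SAMPLE, "Customer_id", "Customer_idref"))
```

A validating parser would enforce this across all ID and IDREF attributes declared in the DTD, not just one pair.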

Extended Entity Relationship Model DTD Graph
[Figure: an Extended Entity Relationship model (Entities A to H, connected by relationships with 1:1 and 1:n cardinalities) and the DTD graph it is mapped to (a Root node above Elements A to H, linked through + and * occurrence indicators).]
An EER model for Customer Sales

[Figure: EER model with the entities Invoice (Invoice_no, Quantity, Invoice_amount, Invoice_date, Shipment_date), Customer (Customer_no, Customer_name, Sex, Postal_code, Telephone, Email), Monthly Sales (Year, Month, Quantity, Total) and Item (Item_no, Item_name, Author, Publisher, Item_price), plus the dependent entities Invoice Item, Customer Address, Customer Sales and Item Sales, connected by 1:n relationships.]
A Sample DTD graph for Customer Sales
[Figure: DTD graph with a Root node above the Sales element (+), which contains the Invoice, Customer, Item and Monthly Sales elements (each *). Below them sit the Invoice Item, Customer Address, Item Sales and Customer Sales elements (each *). Elements carry ID attributes and refer to one another through idref attributes.]
The mapped DTD from DTD Graph
<!ELEMENT Sales (Invoice*, Customer*, Item*, Monthly_sales*)>
<!ATTLIST Sales
Status (New | Updated | History) #REQUIRED>
<!ELEMENT Invoice (Invoice_Item*)>
<!ATTLIST Invoice
Invoice_no CDATA #REQUIRED
Quantity CDATA #REQUIRED
Invoice_amount CDATA #REQUIRED
Invoice_date CDATA #REQUIRED
Shipment_date CDATA #IMPLIED
Customer_idref IDREF #REQUIRED>
<!ELEMENT Customer (Customer_address*)>
<!ATTLIST Customer
Customer_id ID #REQUIRED
Customer_name CDATA #REQUIRED
Customer_no CDATA #REQUIRED
Sex CDATA #IMPLIED
Postal_code CDATA #IMPLIED
Telephone CDATA #IMPLIED
Email CDATA #IMPLIED>
<!ELEMENT Customer_address EMPTY>
<!ATTLIST Customer_address
Address_type (Home|Office) #REQUIRED
Address NMTOKENS #REQUIRED
City CDATA #IMPLIED
State CDATA #IMPLIED
Country CDATA #IMPLIED
Customer_idref IDREF #REQUIRED
Is_default (Y|N) "Y">
<!ELEMENT Invoice_Item EMPTY>
<!ATTLIST Invoice_Item
Quantity CDATA #REQUIRED
Unit_price CDATA #REQUIRED
Invoice_price CDATA #REQUIRED
Discount CDATA #REQUIRED
Item_idref IDREF #REQUIRED>
<!ELEMENT Item EMPTY>
<!ATTLIST Item
Item_id ID #REQUIRED
Item_name CDATA #REQUIRED
Author CDATA #IMPLIED
Publisher CDATA #IMPLIED
Item_price CDATA #REQUIRED>
<!ELEMENT Monthly_sales (Item_sales*, Customer_sales*)>
<!ATTLIST Monthly_sales
Year CDATA #REQUIRED
Month CDATA #REQUIRED
Quantity CDATA #REQUIRED
Total CDATA #REQUIRED>
<!ELEMENT Item_sales EMPTY>
<!ATTLIST Item_sales
Quantity CDATA #REQUIRED
Total CDATA #REQUIRED
Item_idref IDREF #REQUIRED>
<!ELEMENT Customer_sales EMPTY>
<!ATTLIST Customer_sales
Quantity CDATA #REQUIRED
Total CDATA #REQUIRED
Customer_idref IDREF #REQUIRED>
Review Question 1
What are the similarities and differences between a DTD and a well-formed document?

Tutorial question 1
Map the following Extended Entity Relationship Model into a DTD Graph and a Document Type Definition (DTD).

[Figure: An ER model for car rental. Department (Department_ID) 1 to n "has" Trip (Trip_ID); Trip 1 to n "taken" Car_rental (Car_model). The figure also shows the attributes Salary and Staff_ID.]

Reading assignment
Chapter 3, Schema Translation, in “Information Systems Reengineering and Integration”, Second Edition, by Joseph Fong, published by Springer, 2006, pp. 142-154.
