DMDW (Olap)


Data Warehouse and OLAP

Week 5

Midterm I

• Friday, March 4
• Scope
– Homework assignments 1 – 4
– Open book
Team Homework Assignment #7

• Read pp. 121 – 139 and 146 – 150 of the textbook.
• Do Examples 3.8, 3.10 and Exercise 3.4 (b) and (c). Prepare for 
the results of the homework assignment.
• Due date
– beginning of the lecture on Friday, March 11.
Topics

• Definition of data warehouse
• Multidimensional data model
• Data warehouse architecture
• From data warehousing to data mining
What Is a Data Warehouse? (1)

• A data warehouse is a repository of information 
collected from multiple sources, stored under a 
unified schema, and that usually resides at a single 
site
• A data warehouse is a semantically consistent data 
store that serves as a physical implementation of a 
decision support data model and stores the 
information on which an enterprise needs to make 
strategic decisions
What Is a Data Warehouse? (2)
• Data warehouses provide on‐line analytical 
processing (OLAP) tools for the interactive analysis of 
multidimensional data of varied granularities, which 
facilitate effective data generalization and data 
mining
• Many other data mining functions, such as 
association, classification, prediction, and clustering, 
can be integrated with OLAP operations to enhance 
interactive mining of knowledge at multiple levels of 
abstraction
What Is a Data Warehouse? (3)

• A decision support database that is maintained 
separately from the organization’s operational 
database
• “A data warehouse is a subject‐oriented, integrated, 
time‐variant, and nonvolatile collection of data in 
support of management’s decision‐making process
[Inm96].”—W. H. Inmon
Data Warehouse Framework

Figure 1.7 Typical framework of a data warehouse for AllElectronics
Data Warehouse is Subject-Oriented

• Organized around major subjects, such as customer, 
product, sales, etc.
• Focused on the modeling and analysis of data for 
decision makers, not on daily operations or 
transaction processing
• Provides a simple and concise view of particular 
subject issues by excluding data that are not useful in 
the decision support process
Data Warehouse is Integrated

• Constructed by integrating multiple, heterogeneous data 
sources
– relational databases, flat files, on‐line transaction records
• Data cleaning and data integration techniques are applied.
– Ensure consistency in naming conventions, encoding 
structures, attribute measures, etc. among different data 
sources
• E.g., Hotel price: currency, tax, breakfast covered, etc.  
Data Warehouse is Time-Variant

• The time horizon for the data warehouse is significantly 
longer than that of operational systems
– Operational database: current value data
– Data warehouse data: provide information from a 
historical perspective (e.g., past 5‐10 years)
• Every key structure in the data warehouse
– Contains an element of time, explicitly or implicitly
Data Warehouse is Nonvolatile

• A physically separate store of data transformed from the 
operational environment
• Operational update of data does not occur in the data 
warehouse environment
– Does not require transaction processing, recovery, and 
concurrency control mechanisms
– Requires only two operations in data accessing: 
• initial loading of data and access of data
OLTP vs. OLAP

Table 3.1 Comparison between OLTP and OLAP


Why Is a Separate Data Warehouse Needed? (1)

• Why not perform on‐line analytical processing directly on 
operational databases instead of spending additional time 
and resources to construct a separate data warehouse?
Why Is a Separate Data Warehouse Needed? (2)

• High performance for both systems
– DBMS (tuned for OLTP): searching for particular records, 
indexing, hashing, concurrency control, recovery
– Warehouse (tuned for OLAP): complex OLAP queries, 
multidimensional view, consolidation (summarization and 
aggregation)
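The difference between the two workloads can be sketched with a tiny in-memory table. This is only an illustration: the `sales` table, its columns, and all rows are made up, and SQLite stands in for both kinds of system. The first query is an OLTP-style lookup of one particular record; the second is an OLAP-style consolidation over many records.

```python
import sqlite3

# Hypothetical table and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "sale_id INTEGER PRIMARY KEY, item TEXT, quarter TEXT, dollars_sold REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "computer", "Q1", 1200.0),
        (2, "phone", "Q1", 300.0),
        (3, "computer", "Q2", 1500.0),
        (4, "phone", "Q2", 450.0),
    ],
)

# OLTP-style access: fetch one particular record by its key.
oltp_row = conn.execute(
    "SELECT item, dollars_sold FROM sales WHERE sale_id = ?", (3,)
).fetchone()

# OLAP-style access: scan many records and consolidate (summarize) them.
olap_rows = conn.execute(
    "SELECT item, SUM(dollars_sold) FROM sales GROUP BY item ORDER BY item"
).fetchall()

print(oltp_row)
print(olap_rows)
```

On a real operational database, running many queries of the second kind would compete with the short transactions the system is tuned for.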
Topics

• Definition of data warehouse
• Multidimensional data model
• Data warehouse architecture
• From data warehousing to data mining
From Tables and Spreadsheets to Data Cubes

• A data warehouse is based on a multidimensional 
data model
• This model views data in the form of a data cube
• A data cube allows data to be modeled and viewed in 
multiple dimensions
From Tables and Spreadsheets to Data Cubes (1)
• A data cube is defined by facts and dimensions
– Facts are the data that the data warehouse focuses on
• Fact tables contain numeric measures (such as 
dollars_sold) and keys to each of the related dimension 
tables
– Dimensions are the perspectives from which the facts 
are viewed
• Dimension tables describe the dimension with 
attributes. For example, item (item_name, brand, type), 
or time (day, week, month, quarter, year)
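A minimal sketch of how a fact table and a dimension table work together, using plain Python structures. The rows, key values, and the helper `total_by_attribute` are hypothetical; only the attribute names (item_name, brand, type, dollars_sold) come from the slide.

```python
from collections import defaultdict

# Dimension table: item_key -> descriptive attributes (made-up rows).
item_dim = {
    1: {"item_name": "laptop", "brand": "BrandA", "type": "computer"},
    2: {"item_name": "desktop", "brand": "BrandB", "type": "computer"},
    3: {"item_name": "handset", "brand": "BrandA", "type": "phone"},
}

# Fact table: keys into the dimension tables plus a numeric measure.
sales_fact = [
    {"item_key": 1, "time_key": 101, "dollars_sold": 500.0},
    {"item_key": 2, "time_key": 101, "dollars_sold": 300.0},
    {"item_key": 3, "time_key": 101, "dollars_sold": 200.0},
    {"item_key": 1, "time_key": 102, "dollars_sold": 400.0},
]


def total_by_attribute(facts, dim, key, attribute, measure):
    """Sum a measure over the facts, grouped by one dimension attribute."""
    totals = defaultdict(float)
    for row in facts:
        # Follow the key from the fact row to its dimension row.
        totals[dim[row[key]][attribute]] += row[measure]
    return dict(totals)


by_type = total_by_attribute(sales_fact, item_dim, "item_key", "type", "dollars_sold")
by_brand = total_by_attribute(sales_fact, item_dim, "item_key", "brand", "dollars_sold")
print(by_type)
print(by_brand)
```

The same facts are summarized from two perspectives (type and brand) without changing the fact table, which is the role the dimension attributes play.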
Figure 1.6 Fragments of relations from a relational database for AllElectronics
From Tables and Spreadsheets to Data Cubes (2)

dimensions

Facts (numerical measures)

Table 3.2 A 2-D view of sales data for AllElectronics according to the
dimensions time and item, where the sales are from branches located in
the city of Vancouver. The measure displayed is dollars_sold (in thousands).
From Tables and Spreadsheets to Data Cubes (3)

Table 3.3 A 3-D view of sales data for AllElectronics according to the
dimensions time, item, and location. The measure displayed is dollars_sold (in
thousands).
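The relationship between the 3-D view and a 2-D view can be sketched with a dictionary keyed by (time, item, location). All cell values below are made up and are not the figures from Tables 3.2 or 3.3; the helper names `view_2d` and `aggregate_out_location` are hypothetical.

```python
from collections import defaultdict

# Cells keyed by (time, item, location); values stand in for dollars_sold.
# The numbers are invented for illustration.
cube = {
    ("Q1", "computer", "Vancouver"): 10,
    ("Q1", "phone", "Vancouver"): 4,
    ("Q2", "computer", "Vancouver"): 12,
    ("Q2", "phone", "Vancouver"): 5,
    ("Q1", "computer", "Toronto"): 8,
    ("Q2", "phone", "Toronto"): 3,
}


def view_2d(cells, location):
    """Fix one location and return the (time, item) table for it."""
    return {(t, i): v for (t, i, loc), v in cells.items() if loc == location}


def aggregate_out_location(cells):
    """Sum over all locations, leaving a (time, item) table."""
    totals = defaultdict(int)
    for (t, i, _loc), v in cells.items():
        totals[(t, i)] += v
    return dict(totals)


vancouver = view_2d(cube, "Vancouver")
all_locations = aggregate_out_location(cube)
print(vancouver)
print(all_locations)
```

Fixing a location gives a table shaped like the Vancouver-only 2-D view, while summing over locations gives a different 2-D table over the same two dimensions.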
From Tables and Spreadsheets to Data Cubes (4)

Figure 3.1 A 3-D data cube representation of the data in Table 3.3,
according to the dimensions time, item, and location. The measure
displayed is dollars_sold (in thousands).
From Tables and Spreadsheets to Data Cubes (5)

Figure 3.2 A 4-D data cube representation, according to the dimensions
time, item, location, and supplier. The measure displayed is dollars_sold (in
thousands).
Cuboid

• A data cube is a lattice of cuboids
• The total number of cuboids
• The apex cuboid
• The base cuboid

Figure 3.14 Lattice of cuboids, making up a 3-D data cube. Each
cuboid represents a different group-by. The base cuboid contains
the three dimensions city, item, and year.
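The lattice in Figure 3.14 can be enumerated directly, since each cuboid corresponds to one subset of the dimensions. The sketch below is one way to list them level by level; the function name `cuboids` is hypothetical.

```python
from itertools import combinations


def cuboids(dimensions):
    """Return every group-by (subset of dimensions), one list per lattice level."""
    lattice = []
    for k in range(len(dimensions) + 1):
        # Level k holds all k-D cuboids, i.e. group-bys on exactly k dimensions.
        lattice.append(list(combinations(dimensions, k)))
    return lattice


lattice = cuboids(("city", "item", "year"))
for k, level in enumerate(lattice):
    print(f"{k}-D cuboids: {level}")
```

Level 0 holds the empty group-by (the apex cuboid, a single total over everything) and the last level holds the group-by on all three dimensions (the base cuboid).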
The Curse of Dimensionality

• How many cuboids are there in an n‐dimensional data cube?
• How many cuboids are there in an n‐dimensional data cube 
when each dimension i has Li levels?
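For reference, the standard answers are 2^n cuboids when no dimension has a hierarchy, and (L1 + 1) × (L2 + 1) × ... × (Ln + 1) cuboids when dimension i has Li levels, where the extra 1 accounts for the virtual top level "all". A small sketch of both counts follows; the function names are hypothetical.

```python
from math import prod


def count_cuboids(n):
    """Number of cuboids for n dimensions without concept hierarchies."""
    return 2 ** n


def count_cuboids_with_hierarchies(levels):
    """levels[i] is Li, the number of levels of dimension i (excluding 'all')."""
    return prod(l + 1 for l in levels)


print(count_cuboids(3))
print(count_cuboids_with_hierarchies([1, 1, 1]))
print(count_cuboids_with_hierarchies([4] * 10))
```

With one level per dimension the second formula reduces to 2^n. With 10 dimensions of 4 levels each, the count is 5^10, which is why materializing every cuboid quickly becomes impractical.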
Conceptual Modeling of Data Warehouses
• Modeling data warehouses: dimensions & measures
– Star schema: A fact table in the middle connected to a set 
of dimension tables
– Snowflake schema: A refinement of star schema where 
some dimensional hierarchy is normalized into a set of 
smaller dimension tables, forming a shape similar to 
a snowflake
– Fact constellations: Multiple fact tables share dimension 
tables, viewed as a collection of stars, therefore called 
galaxy schema or fact constellation
Star Schema
• Sales Fact Table (time_key, item_key, branch_key, location_key, dollars_sold, units_sold)
• time (time_key, day, day_of_the_week, month, quarter, year)
• item (item_key, item_name, brand, type, supplier_type)
• branch (branch_key, branch_name, branch_type)
• location (location_key, street, city, province_or_state, country)

Figure 3.4 Star schema of a data warehouse for sales.
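A reduced version of the star schema can be tried out in SQLite. This is a sketch under assumptions: only two of the four dimensions are included, the table names `time_dim`, `item_dim`, and `sales_fact` are chosen for the example, and all rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (subset of the attributes in the figure).
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, type TEXT);
-- Central fact table: one foreign key per dimension, plus the measures.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    dollars_sold REAL,
    units_sold INTEGER
);
-- Made-up rows for illustration.
INSERT INTO time_dim VALUES (1, 'Q1', 2010), (2, 'Q2', 2010);
INSERT INTO item_dim VALUES (1, 'laptop', 'computer'), (2, 'handset', 'phone');
INSERT INTO sales_fact VALUES (1, 1, 1000.0, 2), (1, 2, 200.0, 1), (2, 1, 1500.0, 3);
""")

# A typical star join: the fact table joined to each dimension it needs,
# then grouped by dimension attributes.
star_rows = conn.execute("""
    SELECT t.quarter, i.type, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON f.time_key = t.time_key
    JOIN item_dim AS i ON f.item_key = i.item_key
    GROUP BY t.quarter, i.type
    ORDER BY t.quarter, i.type
""").fetchall()
print(star_rows)
```

Every dimension is exactly one join away from the fact table, which is what gives the schema its star shape.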
Snowflake Schema
• Sales Fact Table (time_key, item_key, branch_key, location_key, dollars_sold, units_sold)
• time (time_key, day, day_of_the_week, month, quarter, year)
• item (item_key, item_name, brand, type, supplier_key)
• supplier (supplier_key, supplier_type)
• branch (branch_key, branch_name, branch_type)
• location (location_key, street, city_key)
• city (city_key, city, province_or_state, country)

Figure 3.4 Snowflake schema of a data warehouse for sales.
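The effect of normalizing a dimension can be sketched with the location and city tables from the snowflake schema. As before, this is an illustration with made-up rows and a fact table reduced to one key and one measure; the table name `sales_fact` is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The location dimension is split: city attributes live in their own table.
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE location (
    location_key INTEGER PRIMARY KEY,
    street TEXT,
    city_key INTEGER REFERENCES city(city_key)
);
CREATE TABLE sales_fact (
    location_key INTEGER REFERENCES location(location_key),
    dollars_sold REAL
);
-- Made-up rows for illustration.
INSERT INTO city VALUES (1, 'Vancouver', 'Canada'), (2, 'Toronto', 'Canada');
INSERT INTO location VALUES (10, '1 Main St', 1), (11, '2 Oak Ave', 1), (12, '3 King St', 2);
INSERT INTO sales_fact VALUES (10, 100.0), (11, 50.0), (12, 70.0);
""")

# Reaching the city name now takes two joins instead of one.
city_totals = conn.execute("""
    SELECT c.city, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN location AS l ON f.location_key = l.location_key
    JOIN city AS c ON l.city_key = c.city_key
    GROUP BY c.city
    ORDER BY c.city
""").fetchall()
print(city_totals)
```

Each city's attributes are stored once rather than repeated per street, at the cost of an extra join when querying by city.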
Fact Constellation
• Sales Fact Table (time_key, item_key, branch_key, location_key, dollars_sold, units_sold)
• Shipping Fact Table (item_key, time_key, shipper_key, from_location, to_location, dollars_sold, units_shipped)
• time (time_key, day, day_of_the_week, month, quarter, year)
• item (item_key, item_name, brand, type, supplier_type)
• branch (branch_key, branch_name, branch_type)
• location (location_key, street, city, province_or_state, country)
• shipper (shipper_key, shipper_name, location_key, shipper_type)

Figure 3.5 Fact constellation schema of a data warehouse for sales
and shipping.
Exercise

• Exercise 3.5 (a) – page 153
