Data Warehousing
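[Figure: lattice of cuboids for the dimensions time, item, location, supplier]
0-D (apex) cuboid: all
3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)
4-D (base) cuboid: (time, item, location, supplier)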
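The lattice above can be enumerated mechanically: each cuboid is one subset of the dimensions. The following is a minimal sketch, not part of the original slides; the function name `cuboid_lattice` is a hypothetical helper, and concept hierarchies within each dimension are ignored.

```python
from itertools import combinations

# Dimension names taken from the lattice above.
DIMENSIONS = ("time", "item", "location", "supplier")


def cuboid_lattice(dimensions):
    """Return {level: [cuboids]}; each cuboid is a tuple of dimension names."""
    return {
        k: list(combinations(dimensions, k))
        for k in range(len(dimensions) + 1)
    }


lattice = cuboid_lattice(DIMENSIONS)
for level, cuboids in lattice.items():
    # The empty tuple is the apex cuboid, which the slides label "all".
    names = [", ".join(c) if c else "all" for c in cuboids]
    print(f"{level}-D: {names}")

# With n dimensions and no hierarchies there are 2**n cuboids in total.
total_cuboids = sum(len(c) for c in lattice.values())
```

Level 0 holds only the apex cuboid and level 4 only the base cuboid; the levels in between hold the partial group-bys.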
August 18, 2022 Data Mining: Concepts and Techniques 15
Conceptual Modeling of Data Warehouses
◼ Modeling data warehouses: dimensions & measures
◼ Star schema: A fact table in the middle connected to a
set of dimension tables
◼ Snowflake schema: A refinement of the star schema
where some dimensional hierarchies are normalized into a
set of smaller dimension tables, forming a shape
similar to a snowflake
◼ Fact constellations: Multiple fact tables share
dimension tables, viewed as a collection of stars,
therefore called galaxy schema or fact constellation
Example of Star Schema
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
Dimension tables:
◼ time: time_key, day, day_of_the_week, month, quarter, year
◼ item: item_key, item_name, brand, type, supplier_type
◼ branch: branch_key, branch_name, branch_type
◼ location: location_key, street, city, province_or_street, country
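As a rough illustration of how a star schema is queried, the sketch below builds a reduced version of the sales schema in an in-memory SQLite database and aggregates the fact table through two dimension tables. The table names `time_dim`, `item_dim`, and `sales_fact`, and all row values, are assumptions made for this example; the branch and location dimensions are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER,
    quarter TEXT, year INTEGER
);
CREATE TABLE item_dim (
    item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT
);
-- The fact table holds foreign keys to the dimensions plus the measures.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    units_sold INTEGER,
    dollars_sold REAL
);
""")

# Hypothetical sample rows, invented for illustration only.
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)",
                 [(1, 15, 1, "Q1", 2022), (2, 20, 4, "Q2", 2022)])
conn.executemany("INSERT INTO item_dim VALUES (?, ?, ?, ?)",
                 [(1, "TV-A", "BrandX", "TV"), (2, "PC-B", "BrandY", "PC")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                 [(1, 1, 10, 5000.0), (1, 2, 5, 4000.0), (2, 1, 7, 3500.0)])

# A typical star join: one join per dimension, then group by dimension attributes.
rows = conn.execute("""
    SELECT t.quarter, i.type, SUM(s.units_sold), SUM(s.dollars_sold)
    FROM sales_fact AS s
    JOIN time_dim AS t ON s.time_key = t.time_key
    JOIN item_dim AS i ON s.item_key = i.item_key
    GROUP BY t.quarter, i.type
    ORDER BY t.quarter, i.type
""").fetchall()

for row in rows:
    print(row)
```

Every query follows the same pattern: the fact table sits in the middle and each dimension is reached by a single join on its key.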
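Snowflake variant of the schema (partial; location is normalized into a separate city table):
Sales Fact Table: branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
Dimension tables:
◼ branch: branch_key, branch_name, branch_type
◼ location: location_key, street, city_key
◼ city: city_key, city, province_or_street, country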
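The step from the star to the snowflake form of the location dimension can be sketched as a small normalization routine. This is an illustrative sketch only: `normalize_location` is a hypothetical helper and the sample rows are invented; the column names follow the schema above.

```python
def normalize_location(star_rows):
    """Split star-schema location rows into snowflake location and city tables."""
    city_keys = {}   # (city, province_or_street, country) -> city_key
    cities = []      # normalized city dimension table
    locations = []   # location table that now references city by key
    for row in star_rows:
        city_tuple = (row["city"], row["province_or_street"], row["country"])
        if city_tuple not in city_keys:
            city_keys[city_tuple] = len(city_keys) + 1
            cities.append({
                "city_key": city_keys[city_tuple],
                "city": row["city"],
                "province_or_street": row["province_or_street"],
                "country": row["country"],
            })
        locations.append({
            "location_key": row["location_key"],
            "street": row["street"],
            "city_key": city_keys[city_tuple],
        })
    return locations, cities


# Hypothetical star-schema rows: city attributes are repeated per location.
star_location_rows = [
    {"location_key": 1, "street": "1 Main St", "city": "Vancouver",
     "province_or_street": "BC", "country": "Canada"},
    {"location_key": 2, "street": "9 Oak Ave", "city": "Vancouver",
     "province_or_street": "BC", "country": "Canada"},
    {"location_key": 3, "street": "5 King St", "city": "Toronto",
     "province_or_street": "ON", "country": "Canada"},
]

locations, cities = normalize_location(star_location_rows)
```

The repeated city attributes are stored once in the city table, at the cost of one extra join when a query needs them.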
Snowflake (the cloud data platform, not the snowflake schema) is marketed as enabling data storage,
processing, and analytic solutions that are faster, easier to use, and more flexible than traditional offerings.
<dimension_name_first_time> in cube <cube_name_first_time>
Specification of hierarchies
◼ Schema hierarchy
day < {month < quarter; week} < year
◼ Set_grouping hierarchy
{1..10} < inexpensive
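Both kinds of hierarchy can be sketched as mapping functions. The example below is an assumption-laden illustration: `roll_up` and `price_group` are hypothetical names, weeks are taken to be ISO weeks, and only the one group the slide gives ({1..10} < inexpensive) is defined.

```python
from datetime import date


def roll_up(day, level):
    """Map a day to a coarser level of the schema hierarchy.

    month rolls up to quarter, and both month and week roll up to year;
    week does not nest inside month, so the order is only partial.
    """
    if level == "month":
        return (day.year, day.month)
    if level == "quarter":
        return (day.year, (day.month - 1) // 3 + 1)
    if level == "week":
        iso = day.isocalendar()   # (ISO year, ISO week number, weekday)
        return (iso[0], iso[1])
    if level == "year":
        return day.year
    raise ValueError(f"unknown level: {level}")


def price_group(price):
    """Set-grouping hierarchy: values 1..10 belong to the group 'inexpensive'."""
    if 1 <= price <= 10:
        return "inexpensive"
    return None   # the slide defines no other groups


d = date(2022, 8, 18)
for level in ("month", "quarter", "week", "year"):
    print(level, roll_up(d, level))
print(price_group(7))
```

A schema hierarchy is defined on attributes of the dimension table, while a set-grouping hierarchy is defined by explicitly grouping values.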
A Sample Data Cube
[Figure: data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum), and Country (U.S.A., Canada, Mexico, sum). The annotated cell is the total annual sales of TV in U.S.A.]
[Figure: lattice of cuboids for the sample data cube]
0-D (apex) cuboid: all
1-D cuboids: (product); (date); (country)
3-D (base) cuboid: (product, date, country)
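To make the cuboids concrete, the sketch below aggregates a handful of fact rows into every cuboid of the product, date, country cube. The sales figures are hypothetical values invented for illustration, and `compute_cube` is an assumed helper name; it is a naive full materialization, not an efficient cube algorithm.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("product", "date", "country")

# Hypothetical fact rows; the numbers are made up for this example.
FACTS = [
    {"product": "TV", "date": "1Qtr", "country": "U.S.A.", "sales": 100},
    {"product": "TV", "date": "2Qtr", "country": "U.S.A.", "sales": 120},
    {"product": "PC", "date": "1Qtr", "country": "U.S.A.", "sales": 200},
    {"product": "TV", "date": "1Qtr", "country": "Canada", "sales": 50},
    {"product": "VCR", "date": "3Qtr", "country": "Mexico", "sales": 30},
]


def compute_cube(facts, dims, measure):
    """Return {cuboid: {group_key: total}} for every subset of dims."""
    cube = {}
    for k in range(len(dims) + 1):
        for cuboid in combinations(dims, k):
            totals = defaultdict(int)
            for row in facts:
                # Group by the dimensions kept in this cuboid.
                key = tuple(row[d] for d in cuboid)
                totals[key] += row[measure]
            cube[cuboid] = dict(totals)
    return cube


cube = compute_cube(FACTS, DIMS, "sales")

# Summing over date gives "total annual sales of TV in U.S.A." (a 2-D cell).
tv_usa = cube[("product", "country")][("TV", "U.S.A.")]
# The apex cuboid holds the single grand total.
grand_total = cube[()][()]
print(tv_usa, grand_total)
```

Each "sum" row or column in the sample cube corresponds to a cell in one of the lower-dimensional cuboids computed here.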
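[Figure: star-net query model. Radial lines carry the labels ORDER, TRUCK, PRODUCT LINE, Time (ANNUALLY, QTRLY, DAILY), Product (PRODUCT ITEM, PRODUCT GROUP), Location, Promotion, and Organization; further labels are CITY, SALES PERSON, COUNTRY, DISTRICT, REGION, and DIVISION. Each circle is called a footprint.]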
Chapter 2: Data Warehousing and OLAP Technology for Data Mining
◼ Choose the dimensions that will apply to each fact table record
◼ Choose the measure that will populate each fact table record
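[Figure: multi-tier data warehouse architecture. Operational DBs and other sources feed the Extract, Transform, Load, and Refresh steps (with a Monitor & Integrator and Metadata) into the Data Warehouse and Data Marts; these serve an OLAP Server, which supports Analysis, Query, Reports, and Data mining.]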
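The extract, transform, load, and refresh steps named in the architecture figure can be sketched in a few lines. This is a toy illustration under stated assumptions: the record fields (`order_id`, `country`, `amount`), the cleaning rules, and all four function bodies are hypothetical, and a plain dictionary stands in for the warehouse.

```python
def extract(source):
    """Copy records out of an operational source."""
    return list(source)


def transform(records):
    """Clean records: drop incomplete ones and standardize field formats."""
    cleaned = []
    for r in records:
        if r.get("amount") is None:
            continue   # incomplete record, skipped in this sketch
        cleaned.append({
            "order_id": r["order_id"],
            "country": r["country"].strip().upper(),
            "amount": float(r["amount"]),
        })
    return cleaned


def load(warehouse, records):
    """Add records not yet in the warehouse; return how many were added."""
    added = 0
    for r in records:
        if r["order_id"] not in warehouse:
            warehouse[r["order_id"]] = r
            added += 1
    return added


def refresh(warehouse, source):
    """Propagate new source records into the warehouse."""
    return load(warehouse, transform(extract(source)))


# Hypothetical operational data.
operational_db = [
    {"order_id": 1, "country": " usa ", "amount": "10.5"},
    {"order_id": 2, "country": "Canada", "amount": None},
]
warehouse = {}
first_added = refresh(warehouse, operational_db)

operational_db.append({"order_id": 3, "country": "mexico", "amount": "7"})
second_added = refresh(warehouse, operational_db)
print(first_added, second_added, sorted(warehouse))
```

Real back-end tools do far more (schema integration, error handling, incremental change detection); the sketch only shows how the four steps compose.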
Data Warehouse Development: A Recommended Approach
[Figure: recommended development approach, with the labels Multi-Tier Data Warehouse, Distributed Data Marts, Data Mart (twice), and Enterprise Data Warehouse.]
techniques)
◼ fast indexing to pre-computed summarized data
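[Figure: a 3-D array with dimensions A (a0 to a3), B (b0 to b3), and C (c0 to c3), partitioned into 64 chunks numbered 1 to 64. Chunks 1 to 4 lie along A at b0, c0; chunks 5, 9, and 13 start rows b1, b2, and b3 of plane c0; chunks 29, 45, and 61 start row b3 of planes c1, c2, and c3.]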
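The chunk numbers visible in the figure are consistent with a simple linear ordering in which A varies fastest, then B, then C. The sketch below encodes that ordering; the function names `chunk_number` and `chunk_indices` are hypothetical, and the formula is inferred from the figure's labels rather than stated in the slides.

```python
CHUNKS_PER_DIM = 4   # a0..a3, b0..b3, c0..c3 in the figure


def chunk_number(a, b, c, n=CHUNKS_PER_DIM):
    """1-based chunk number for chunk indices (a, b, c); A varies fastest."""
    return 1 + a + n * b + n * n * c


def chunk_indices(number, n=CHUNKS_PER_DIM):
    """Inverse of chunk_number: recover (a, b, c) from a chunk number."""
    c, rest = divmod(number - 1, n * n)
    b, a = divmod(rest, n)
    return a, b, c


# Chunks are read in numeric order, so one full AB plane (fixed c)
# is scanned before the next plane begins.
first_plane = [chunk_number(a, b, 0) for b in range(4) for a in range(4)]
print(first_plane)
print(chunk_number(0, 3, 1), chunk_indices(64))
```

Under this ordering, chunks 1 to 16 make up the c0 plane, which is why the figure shows 13 to 16 in row b3 of the front face and 29 to 32 in the same row of the c1 plane.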
[Figure labels: Layer2, MDDB, Meta Data]