Lecture 04
MINING (SE-409)
Lecture-4
Dimensional Modeling
Dr. Huma
Software Engineering department
Dimensional Modeling (DM)
The need for ER modeling?
• Problems with early COBOLian data processing
systems.
• Collection of data
• Data redundancies
Why has ER Modeling been so successful?
– Coupled with normalization, it drives all the
redundancy out of the database.
ER Modeling
[Figure: ER diagram of a retail database. Entities: district, division,
zone, store, sale_header, sale_detail, week, month, quarter, year,
item_x_cat, item_x_splir and cat_x_dept, connected by one-to-many (1:M)
relationships. Attributes shown: CITY, DISTRICT, DIVISION, PROVINCE,
ZONE, STORE #, STREET, RECEIPT #, ITEM #, DATE, WEEK, MONTH, QTR, YEAR,
CATEGORY, SUPPLIER, DEPT and $.]
Need for DM: Un-answered Qs
• Let's have a look at a typical ER data model first.
• Some observations:
– All tables look alike; as a consequence, it is difficult to identify
the important ones.
• Too complex for queries that span multiple tables with a large
number of records.
ER vs. DM
ER: Constituted to optimize OLTP performance.
DM: Constituted to optimize DSS query performance.

ER: Models the micro/detail relationships among data elements.
DM: Models the macro [aggregate] relationships among data elements
with an overall deterministic strategy.

ER: A wild variability of the structure of ER models.
DM: All dimensions serve as equal entry points to the fact table.

ER: Very vulnerable to changes in the user's querying habits,
because such schemas are asymmetrical.
DM: Changes in users' querying habits can be accommodated by
automatic SQL generators.
How to simplify an ER data model?
• Bring it to DSS
• Two general methods:
– De-Normalization
– Dimensional Modeling (DM)
What is DM?…
• A simpler logical model optimized for decision support.
• Inherently dimensional in nature [fact + dimension], with a
single central fact table and a set of smaller dimensional
tables.
• Multi-part key for the fact table. The fact table is long in terms
of data and contains numerical data: how many items were sold, what
revenue the sales brought in, and how much sale is needed. Each
dimension table has a single-column primary key.
What is DM?...
Dimensions have Hierarchies
[Figure: hierarchy tree. Items branches into Books and Clothes; Books
branches into Engg and Medical.]
The two Schemas
Star
Snow-flake
“Simplified” 3NF (Retail)
[Figure: the retail ER diagram in 3NF. Entities: district, division,
zone, store, sale_header, sale_detail, week, month, quarter, year,
item_x_cat, item_x_splir and cat_x_dept, connected by one-to-many (1:M)
relationships.]
Vastly Simplified Star Schema
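[Figure: star schema with a central fact table joined to three
dimension tables, each through a 1:M relationship.]
• Fact Table: RECEIPT#, STORE#, ITEM#, DATE, ..., facts
• Product Dim: ITEM#, CATEGORY, DEPT, SUPPLIER
• Geography Dim: STORE#, ZONE, CITY, DISTRICT, DIVISION, PROVINCE
• Time Dim: DATE, WEEK, QUARTER, YEAR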
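As a rough illustration of the star schema on this slide, the sketch below builds the four tables in an in-memory SQLite database and answers one decision-support question by joining the fact table out to two dimensions. The snake_case table and column names, the `amount` fact column, the particular multi-part key and every sample row are assumptions made for the example; none of them come from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One fact table in the centre, three dimension tables around it.
# Each dimension has a single-column key; the fact table has a multi-part key.
conn.executescript("""
CREATE TABLE product_dim   (item_no INTEGER PRIMARY KEY, category TEXT,
                            dept TEXT, supplier TEXT);
CREATE TABLE geography_dim (store_no INTEGER PRIMARY KEY, zone TEXT, city TEXT,
                            district TEXT, division TEXT, province TEXT);
CREATE TABLE time_dim      (date TEXT PRIMARY KEY, week INTEGER,
                            quarter INTEGER, year INTEGER);
CREATE TABLE sales_fact (
    receipt_no INTEGER,
    store_no   INTEGER REFERENCES geography_dim(store_no),
    item_no    INTEGER REFERENCES product_dim(item_no),
    date       TEXT    REFERENCES time_dim(date),
    amount     REAL,   -- hypothetical numeric fact
    PRIMARY KEY (receipt_no, store_no, item_no, date)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?, ?)", [
    (1, "Books", "Education", "S1"),
    (2, "Clothes", "Apparel", "S2"),
])
conn.executemany("INSERT INTO geography_dim VALUES (?, ?, ?, ?, ?, ?)", [
    (10, "Z1", "Lahore", "Lahore", "Lahore", "Punjab"),
    (20, "Z2", "Karachi", "Karachi", "Karachi", "Sindh"),
])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?)", [
    ("2024-01-01", 1, 1, 2024),
    ("2024-01-08", 2, 1, 2024),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (100, 10, 1, "2024-01-01", 500.0),
    (100, 10, 2, "2024-01-01", 300.0),
    (101, 20, 1, "2024-01-08", 200.0),
    (102, 10, 1, "2024-01-08", 100.0),
])

# Sales by zone and category: every join goes from the fact table
# directly to a dimension, never dimension to dimension.
rows = conn.execute("""
    SELECT g.zone, p.category, SUM(f.amount)
    FROM sales_fact f
    JOIN geography_dim g ON g.store_no = f.store_no
    JOIN product_dim   p ON p.item_no  = f.item_no
    GROUP BY g.zone, p.category
    ORDER BY g.zone, p.category
""").fetchall()

for zone, category, total in rows:
    print(zone, category, total)
```

The same question against the 3NF model would have to walk through store, zone, sale_header, sale_detail and item_x_cat, which is the complexity the star schema is meant to remove.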
The Benefit of Simplicity
Features of Star Schema
• Dimensional hierarchies are collapsed into a single table for
each dimension. Loss of information? The explicit relationships
between hierarchy levels are lost.
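To make the collapsing step concrete, here is a minimal sketch that flattens three normalized lookup tables into a single product dimension. The dictionary names echo the item_x_cat, item_x_splir and cat_x_dept tables of the ER diagram, but the rows and the output field names are hypothetical.

```python
# Normalized lookup tables (hypothetical rows).
item_x_cat = {1: "Books", 2: "Clothes", 3: "Books"}
item_x_splir = {1: "S1", 2: "S2", 3: "S1"}
cat_x_dept = {"Books": "Education", "Clothes": "Apparel"}

# Collapse the hierarchy item -> category -> dept into one row per item.
product_dim = [
    {
        "item": item,
        "category": category,
        "dept": cat_x_dept[category],
        "supplier": item_x_splir[item],
    }
    for item, category in sorted(item_x_cat.items())
]

# The category -> dept link is no longer a table of its own;
# it survives only as values repeated on every row of that category.
books_rows = [row for row in product_dim if row["category"] == "Books"]

for row in product_dim:
    print(row)
```

Every "Books" row now carries its own copy of the department, so the flat table is redundant and the category-to-department relationship is no longer stated in one place.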
The Process of Dimensional Modeling
Four-Step Method from ER to DM: ER covers the whole business
[visualization is high and complexity is high, whether a part is
required or not].
Star-1
Snow-flake
Step-2: Choosing the Grain
• Grain is the fundamental, atomic [cannot be broken down further] level
of data to be represented.
• Typical grains
– Individual transactions [single + multiple]
– Daily aggregates (snapshots)
– Monthly aggregates
• Example: someone runs a promotion scheme and wants to see how
people respond to it.
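The three typical grains can be compared with a small sketch: starting from individual transactions, the daily and monthly aggregates are derived by summing, while the reverse direction is not possible. The transaction rows and variable names below are hypothetical.

```python
from collections import defaultdict

# Finest grain: one (date, amount) pair per transaction (hypothetical data).
transactions = [
    ("2024-01-01", 120),
    ("2024-01-01", 80),
    ("2024-01-02", 50),
    ("2024-02-01", 200),
]

daily = defaultdict(int)    # grain: one row per day
monthly = defaultdict(int)  # grain: one row per month

for date, amount in transactions:
    daily[date] += amount
    monthly[date[:7]] += amount  # "YYYY-MM" prefix identifies the month

print(dict(daily))
print(dict(monthly))
```

Once only the monthly figures are stored, the two separate purchases on 2024-01-01 can no longer be told apart, which is why the grain has to be chosen before the facts.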
Aggregation hides crucial facts
[Figure: chart of sales (y-axis 0 to 250) for zones Z1, Z2, Z3 and Z4
over Week-1 to Week-4. Annotations on the chart: "Sale wise",
"Week wise", "Wrong grain setting".]
Z1: Sale is constant (need to work on it)
Z2: Sale went up, then fell (cause for concern)
Z3: Sale is on the rise, why?
Z4: Sale dropped sharply, need to look deeply.
W2: Static sale
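The effect can be reproduced with made-up numbers. The chart's actual values are not recoverable, so the figures below are invented to follow the four zone patterns listed above and are chosen so that the weekly totals come out identical.

```python
# Hypothetical weekly sales per zone (Week-1 .. Week-4).
sales = {
    "Z1": [50, 50, 50, 50],    # constant
    "Z2": [40, 70, 60, 40],    # went up, then fell
    "Z3": [10, 20, 50, 100],   # on the rise
    "Z4": [100, 60, 40, 10],   # dropped sharply
}

# Coarse grain: one total per week across all zones.
weekly_totals = [sum(week) for week in zip(*sales.values())]

# Fine grain: how much each zone moved over the four weeks.
spread = {zone: max(values) - min(values) for zone, values in sales.items()}

print(weekly_totals)
print(spread)
```

At the week-only grain every week looks the same, so the rise in Z3 and the collapse in Z4 cancel out and disappear; keeping zone in the grain is what makes them visible.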
Step 3: Choose Facts
Statement: “We need monthly sales volume and Rs. by week, product
and Zone”
• Facts (data): sales volume and Rs.
• Dimensions (reference): week, product and Zone
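Read this way, the statement maps onto a grouped aggregation: the dimensions form the grouping key and the facts are what gets summed. The sketch below assumes a hypothetical list of transaction rows; the field order and values are illustrative only.

```python
from collections import defaultdict

# Hypothetical rows: (week, product, zone, volume, rs)
rows = [
    ("W1", "Books", "Z1", 3, 900.0),
    ("W1", "Books", "Z1", 2, 600.0),
    ("W1", "Clothes", "Z2", 1, 1500.0),
    ("W2", "Books", "Z1", 4, 1200.0),
]

# Key = the dimensions (week, product, zone); values = the facts (volume, Rs.).
totals = defaultdict(lambda: {"volume": 0, "rs": 0.0})
for week, product, zone, volume, rs in rows:
    cell = totals[(week, product, zone)]
    cell["volume"] += volume
    cell["rs"] += rs

for key in sorted(totals):
    print(key, totals[key])
```

Anything that appears after "by" in such a requirement is a candidate dimension, and anything that is counted or summed is a candidate fact.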