Lecture 04

Uploaded by

Syed Badshah

Data Warehousing & Data Mining (SE-409)
Lecture-4
Dimensional Modeling

Dr. Huma
Software Engineering department

University of Engineering and Technology, Taxila

Dimensional Modeling (DM)

The need for ER modeling?
• Problems with early COBOL-era data processing systems.
• Collection of data
• Data redundancies
• From flat file to table: each entity ultimately becomes a table in the physical schema.
• Simple O(n²) join to work with tables
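The O(n²) join mentioned in the last bullet can be pictured as a nested loop that compares every row of one table with every row of the other. The following is a minimal sketch under that reading; the function name `nested_loop_join` and the sample rows are hypothetical and are not taken from the lecture.

```python
def nested_loop_join(left, right, key):
    """Join two lists of dict rows by comparing every pair of rows.

    The number of comparisons is len(left) * len(right), which is O(n^2)
    when both tables have about n rows.
    """
    joined = []
    comparisons = 0
    for l_row in left:
        for r_row in right:
            comparisons += 1
            if l_row[key] == r_row[key]:
                joined.append({**l_row, **r_row})
    return joined, comparisons


# Hypothetical rows for illustration only.
stores = [{"store_no": 1, "zone": "Z1"}, {"store_no": 2, "zone": "Z2"}]
sales = [
    {"receipt_no": 10, "store_no": 1},
    {"receipt_no": 11, "store_no": 2},
    {"receipt_no": 12, "store_no": 1},
]

rows, comparisons = nested_loop_join(sales, stores, "store_no")
```

With 3 sales rows and 2 store rows the loop makes 6 comparisons. Real database engines usually have cheaper join strategies available, so this is only the simplest case.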

Why has ER modeling been so successful?
– Coupled with normalization, it drives all the redundancy out of the database.
– Change (or add or delete) the data at just one point.
– Can be used with indexing for very fast access.
– Resulted in the success of OLTP systems.

ER Modeling
[Figure: ER diagram of a retail database. Tables: district, division, zone, store, week, month, quarter, year, sale_header, sale_detail, item_x_cat, item_x_splir, cat_x_dept, linked by 1-to-M relationships.]
Need for DM: Un-answered Qs
• Let's have a look at a typical ER data model first.
• Some observations:
– All tables look alike; as a consequence it is difficult to identify:
• Which table is more important?
• Which is the largest?
• Which tables contain numerical measurements of the business?
• Which tables contain nearly static descriptive attributes? [dimension info]
Need for DM: Complexity of Representation
– Many topologies for the same ER diagram, all appearing different.
• Very hard to visualize and remember.
[Figure: two different layouts of the same ER diagram, each with tables numbered 1 to 12.]
• A large number of possible connections between any two (or more) tables
Need for DM: The Paradox
• The Paradox: trying to make information accessible using tables resulted in an inability to query them!
• ER and normalization result in a large number of tables which are:
– Hard to understand by the users (DB programmers): an ERP system spans multiple tables
– Hard to navigate optimally by DBMS software
• The real value of ER is in using tables individually or in pairs [good performance when an operation involves one table or a join of two].
• Too complex for queries that span multiple tables with a large number of records
ER vs. DM
ER: Constituted to optimize OLTP performance.
DM: Constituted to optimize DSS query performance.

ER: Models the micro/detail relationships among data elements.
DM: Models the macro [aggregate] relationships among data elements with an overall deterministic strategy.

ER: A wild variability of the structure of ER models.
DM: All dimensions serve as equal entry points to the fact table.

ER: Very vulnerable to changes in the user's querying habits, because such schemas are asymmetrical.
DM: Changes in users' querying habits can be accommodated by automatic SQL generators.
How to simplify an ER data model?
• Bring it to DSS
• Two general methods:
– De-Normalization
– Dimensional Modeling (DM)

What is DM?…
• A simpler logical model optimized for decision support.
• Inherently dimensional in nature [fact + dimension], with a single central fact table and a set of smaller dimension tables.
• Multi-part key for the fact table [the fact table is long in terms of data and holds numerical data: how many items were sold, what revenue the sales brought in, how much we need to sell].
• Dimension tables with a single-part PK [one or more tables, each small, with a single-column key, holding information about the time, geography and product dimensions].

What is DM?...
• Results in a star-like structure, called a star schema or star join, with the fact table in the center and the dimensions around it.
– All relationships mandatory M-1.
– Single path between any two levels [fact vs. dimension table].

Dimensions have Hierarchies
Items
  Books
    Fiction
    Text
      Engg
      Medical
  Clothes
    Men
    Women
Analysts tend to look at the data through a dimension at a particular “level” in the hierarchy.
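Looking at data at a particular level of a hierarchy amounts to summing the leaf-level figures up to that level. Here is a small sketch of the idea using the Items hierarchy from this slide; the sales figures, the `HIERARCHY` mapping and the `roll_up` helper are made up for illustration.

```python
from collections import defaultdict

# Each leaf of the Items hierarchy maps to its path from the top level down.
# The paths follow the tree on the slide; the sales figures below are made up.
HIERARCHY = {
    "Fiction": ("Items", "Books", "Fiction"),
    "Engg": ("Items", "Books", "Text", "Engg"),
    "Medical": ("Items", "Books", "Text", "Medical"),
    "Men": ("Items", "Clothes", "Men"),
    "Women": ("Items", "Clothes", "Women"),
}

sales = [("Fiction", 40), ("Engg", 25), ("Medical", 15), ("Men", 30), ("Women", 50)]


def roll_up(sales, level):
    """Sum sales at a given depth of the hierarchy (0 = Items, 1 = Books/Clothes, ...)."""
    totals = defaultdict(int)
    for leaf, amount in sales:
        path = HIERARCHY[leaf]
        # A leaf that sits above the requested depth is reported under its own name.
        node = path[min(level, len(path) - 1)]
        totals[node] += amount
    return dict(totals)


top_level = roll_up(sales, 0)
by_department = roll_up(sales, 1)
by_section = roll_up(sales, 2)
```

The same detail rows answer questions at every level; only the grouping key changes.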

The two Schemas

Star
Snow-flake

“Simplified” 3NF (Retail)
[Figure: ER diagram of the retail database in 3NF. Tables: district, division, zone, store, week, month, quarter, year, sale_header, sale_detail, item_x_cat, item_x_splir, cat_x_dept, linked by 1-to-M relationships.]
Vastly Simplified Star Schema
Fact Table: RECEIPT#, STORE#, ITEM#, DATE, ..., facts: Sale Rs.
Product Dim (1-M to the fact table on ITEM#): ITEM#, CATEGORY, DEPT, SUPPLIER
Geography Dim (1-M to the fact table on STORE#): STORE#, ZONE, CITY, DISTRICT, DIVISION, PROVINCE
Time Dim (1-M to the fact table on DATE): DATE, WEEK, MONTH, QUARTER, YEAR
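To make the star schema concrete, the sketch below builds it in an in-memory SQLite database and runs one typical decision-support query. The column names follow the slide, with DATE renamed to `sale_date`. The table names, the choice of key for the fact table and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim   (item_no INTEGER PRIMARY KEY, category TEXT, dept TEXT, supplier TEXT);
CREATE TABLE geography_dim (store_no INTEGER PRIMARY KEY, zone TEXT, city TEXT,
                            district TEXT, division TEXT, province TEXT);
CREATE TABLE time_dim      (sale_date TEXT PRIMARY KEY, week INTEGER, month INTEGER,
                            quarter INTEGER, year INTEGER);
-- Fact table: multi-part key built from the receipt and the dimension keys.
CREATE TABLE sales_fact (
    receipt_no INTEGER,
    store_no   INTEGER REFERENCES geography_dim(store_no),
    item_no    INTEGER REFERENCES product_dim(item_no),
    sale_date  TEXT    REFERENCES time_dim(sale_date),
    sale_rs    REAL,
    PRIMARY KEY (receipt_no, store_no, item_no, sale_date)
);
""")

# Made-up sample rows.
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?, ?)",
                 [(1, "Books", "Media", "S1"), (2, "Clothes", "Apparel", "S2")])
conn.executemany("INSERT INTO geography_dim VALUES (?, ?, ?, ?, ?, ?)",
                 [(10, "Z1", "C1", "D1", "V1", "P1"), (20, "Z2", "C2", "D2", "V2", "P2")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)",
                 [("2024-01-15", 3, 1, 1, 2024), ("2024-04-10", 15, 4, 2, 2024)])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
                 [(100, 10, 1, "2024-01-15", 500.0),
                  (100, 10, 2, "2024-01-15", 300.0),
                  (101, 20, 1, "2024-04-10", 200.0),
                  (102, 10, 2, "2024-04-10", 150.0)])

# Sales by province and quarter: one join per dimension used, nothing more.
query = """
SELECT g.province, t.quarter, SUM(f.sale_rs) AS total_rs
FROM sales_fact f
JOIN geography_dim g ON g.store_no = f.store_no
JOIN time_dim t      ON t.sale_date = f.sale_date
GROUP BY g.province, t.quarter
ORDER BY g.province, t.quarter
"""
result = conn.execute(query).fetchall()
```

In the normalized retail model the same question would have to walk from store through zone, city, district and division to reach the province; here the province is one join away from the fact table.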

The Benefit of Simplicity
Beauty lies in close correspondence with the business, obvious even to business users [means simplicity].

Features of Star Schema
• Dimensional hierarchies are collapsed into a single table for each dimension. Loss of information? The relationships between hierarchy levels are no longer explicit.
• A single fact table is created with a single header from the detail records, resulting in:
– A vastly simplified physical data model!
– Fewer tables (thousands of tables in some ERP systems).
– Fewer joins resulting in high performance.
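Collapsing a dimensional hierarchy into a single table can be sketched in a few lines. The three lookup dictionaries below stand in for separate normalized tables; their contents and the helper `build_geography_dim` are hypothetical.

```python
# Normalized lookup tables: each level points to the next one up.
# All values are hypothetical.
store_to_zone = {1: "Z1", 2: "Z2"}
zone_to_city = {"Z1": "C1", "Z2": "C1"}
city_to_district = {"C1": "D1"}


def build_geography_dim():
    """Collapse the three lookups into one flat row per store."""
    dim = {}
    for store_no, zone in store_to_zone.items():
        city = zone_to_city[zone]
        dim[store_no] = {
            "zone": zone,
            "city": city,
            "district": city_to_district[city],
        }
    return dim


geography_dim = build_geography_dim()
```

Every store row now carries its city and district, so one lookup replaces three. The price is that C1 and D1 are repeated on each row, and the flat table alone no longer states that a zone belongs to exactly one city.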
The Process of Dimensional Modeling
Four-step method from ER to DM. [ER covers the whole business: there is a lot to visualize and complexity is high, whether or not it is required.]
1. Choose the Business Process [e.g. accounting, inventory, invoices, even processes that do not run simultaneously, and focus on it].
2. Choose the Grain [high granularity means high detail; each row shows the grain level].
3. Choose the Facts
4. Choose the Dimensions [small in size, more in number]
Step-1: Choose the Business Process
• A business process is a major operational process in an organization.
• Typically supported by a legacy system (database) or an OLTP system.
– Examples: Orders, Invoices, Inventory etc.
• Business processes are often termed Data Marts, and that is why many people criticize DM as being data mart oriented. [Two schools of thought]
Step-1: Separating the Process
[Figure labels: Star-1, Snow-flake]
Step-2: Choosing the Grain
• Grain is the fundamental, atomic [cannot be broken down further] level of data to be represented.
• Grain is also termed the unit of analysis.
• Example grain statements
• Typical grains
– Individual transactions [single + multiple]
– Daily aggregates (snapshots)
– Monthly aggregates
• Relationship between grain and expressiveness [more detail means knowing more about the business].
• Grain vs. hardware trade-off [keep the focus on grain, not on hardware].
Step-2: Relationship b/w Grain
LOW Granularity
• Two aggregates per week: 2 x 4 = 8 values
• Four aggregates per week: 4 x 4 = 16 values
• Daily aggregates: 6 x 4 = 24 values
HIGH Granularity
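The counts on this slide follow one simple rule: values stored = aggregates per week x number of weeks. The sketch below reproduces them, assuming that the "x 4" stands for four weeks and that daily aggregation means six business days per week; both readings are assumptions, as are the names used.

```python
WEEKS = 4  # assumption: the "x 4" on the slide stands for four weeks


def values_stored(aggregates_per_week, weeks=WEEKS):
    """Number of values kept for one measure at a given grain."""
    return aggregates_per_week * weeks


# Grain labels are illustrative; the daily case assumes six days per week.
grains = {"two per week": 2, "four per week": 4, "daily": 6}
counts = {name: values_stored(n) for name, n in grains.items()}
```

Moving from the lowest to the highest granularity triples the number of stored values in this example, which is the storage side of the grain trade-off.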
The case FOR data aggregation
• Works well for repetitive queries.
• Justifiable if used for the maximum number of queries.
• Provides a “big picture” or macroscopic view.
• Application dependent, usually fixed to business changes [constructed according to the business].
The case AGAINST data aggregation
• Aggregation is irreversible.
– Can create monthly sales data from weekly sales data, but the reverse is not possible.
– [Do not throw away the detail data.]
• Aggregation limits the questions that can be answered.
– What, when, why, where, what-else, what-next
– When [time]
– Where [zone 1 or zone 2]
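The irreversibility of aggregation can be seen in a tiny example: weekly figures roll up to a monthly total, but different weekly histories give the same total, so the total cannot be unrolled. The numbers and the helper `to_monthly` are made up for illustration.

```python
def to_monthly(weekly_sales):
    """Roll four weekly figures up into one monthly total."""
    return sum(weekly_sales)


# Two different weekly histories (made-up numbers) ...
steady = [100, 100, 100, 100]
spiky = [250, 50, 50, 50]

# ... collapse to the same monthly value, so the weekly detail cannot be recovered.
same_month = to_monthly(steady) == to_monthly(spiky)
```

Once only the monthly total is kept, nothing in it says which of the two histories produced it.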
The case AGAINST data aggregation
• Aggregation can hide crucial facts.
– The average of 100 & 100 is the same as that of 150 & 50.
Aggregation hides crucial facts: Example

          Week-1  Week-2  Week-3  Week-4  Average
Zone-1    100     100     100     100     100
Zone-2    50      100     150     100     100
Zone-3    50      100     100     150     100
Zone-4    200     100     50      50      100
Average   100     100     100     100

Just looking at the averages, i.e. the aggregate, the sale data is the same in every zone and every week.
Say someone runs a promotion scheme and wants to see how people respond to it.
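The averages in the example can be checked directly. The sketch below recomputes the zone and week averages from the weekly figures and adds the spread (maximum minus minimum) as one simple measure of the detail that the average throws away; the spread is an illustrative choice, not part of the lecture.

```python
from statistics import mean

# Weekly sales per zone, copied from the example table.
sales = {
    "Zone-1": [100, 100, 100, 100],
    "Zone-2": [50, 100, 150, 100],
    "Zone-3": [50, 100, 100, 150],
    "Zone-4": [200, 100, 50, 50],
}

# Row averages (per zone) and column averages (per week).
zone_avg = {zone: mean(weeks) for zone, weeks in sales.items()}
week_avg = [mean(column) for column in zip(*sales.values())]

# Spread within each zone: an illustrative measure of what the average hides.
zone_spread = {zone: max(weeks) - min(weeks) for zone, weeks in sales.items()}
```

Every average comes out as 100, while the spread ranges from 0 in Zone-1 to 150 in Zone-4.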
Aggregation hides crucial facts: chart
[Figure: sale (y-axis, 0 to 250) plotted week-wise (x-axis, Week-1 to Week-4) for zones Z1 to Z4. Annotation: wrong grain setting.]
Z1: Sale is constant (need to work on it)
Z2: Sale went up, then fell (cause for concern)
Z3: Sale is on the rise, why?
Z4: Sale dropped sharply, need to look deeply.
W2: Static sale
Step 3: Choose Facts statement
“We need monthly sales volume and Rs. by week, product and Zone”
• Facts [data]: monthly sales volume and Rs.
• Dimensions [reference]: week, product and Zone
[The decision maker asks; you answer as the warehouse architect.]
Step 3: Choose Facts
• Choose the facts that will populate each fact table record.
– Remember that the best facts are numeric, continuously valued and additive.
– Example: Quantity Sold, Amount etc.
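Additive means that a fact can be summed across any dimension and still give a meaningful total. The sketch below sums Quantity Sold and Amount by week, by product and by zone; the fact rows and the helper `total_by` are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (week, product, zone, quantity_sold, amount_rs)
fact_rows = [
    (1, "Books", "Z1", 2, 400.0),
    (1, "Books", "Z2", 1, 200.0),
    (1, "Clothes", "Z1", 3, 900.0),
    (2, "Books", "Z1", 4, 800.0),
]


def total_by(rows, dim_index):
    """Sum the two additive facts, grouped by one dimension column."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        totals[row[dim_index]][0] += row[3]  # quantity_sold
        totals[row[dim_index]][1] += row[4]  # amount_rs
    return {key: tuple(value) for key, value in totals.items()}


by_week = total_by(fact_rows, 0)
by_product = total_by(fact_rows, 1)
by_zone = total_by(fact_rows, 2)
```

Whichever dimension is used for grouping, the group totals add up to the same grand total of 10 units and Rs. 2300. A value such as a unit price would not behave this way, since adding prices across rows gives no useful number.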

Step 4: Choose Dimensions
• Choose the dimensions that apply to each fact in the fact table.
– Typical dimensions: time, product, geography etc. [these go in the WHERE clause]
– Identify the descriptive attributes that explain each dimension.
– Determine hierarchies within each dimension.

Step-4: How to Identify a Dimension?
• The attributes that are single-valued during the recording of a transaction are dimensions [i.e. the value does not change].
Fact Table
Dim: Calendar_Date, Time_of_Day, Account_No, ATM_Location, Transaction_Type
Fact: Transaction_Rs

Time_of_Day: Morning, Mid Morning, Lunch Break etc.
Transaction_Type: Withdrawal, Deposit, Check balance etc.
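The ATM example can be sketched as code that splits one recorded transaction into its dimension values and its numeric fact. The column names come from the slide. The `AtmTransaction` class, the hour boundaries in `time_of_day` and the sample transaction are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AtmTransaction:
    timestamp: datetime
    account_no: str
    atm_location: str
    transaction_type: str  # e.g. "Withdrawal", "Deposit", "Check balance"
    transaction_rs: float


def time_of_day(ts):
    """Bucket a timestamp; the hour boundaries are assumptions, not from the slide."""
    if ts.hour < 10:
        return "Morning"
    if ts.hour < 12:
        return "Mid Morning"
    if ts.hour < 14:
        return "Lunch Break"
    return "Other"


def to_fact_row(txn):
    """Split one transaction into single-valued dimension values and the numeric fact."""
    dimensions = {
        "Calendar_Date": txn.timestamp.date().isoformat(),
        "Time_of_Day": time_of_day(txn.timestamp),
        "Account_No": txn.account_no,
        "ATM_Location": txn.atm_location,
        "Transaction_Type": txn.transaction_type,
    }
    return dimensions, txn.transaction_rs


# Made-up transaction for illustration.
txn = AtmTransaction(datetime(2024, 1, 15, 12, 30), "ACC-001", "Taxila", "Withdrawal", 5000.0)
dims, fact = to_fact_row(txn)
```

Each dimension takes exactly one value for the transaction, while Transaction_Rs is the measurement that gets summed in queries.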
