0% found this document useful (0 votes)
42 views18 pages

DWH Spring 2011 Lecture Slides Week6&7

The document discusses the need for and benefits of entity-relationship (ER) modeling and dimensional modeling. ER modeling helped address issues with early data processing systems by reducing data redundancies but resulted in complex schemas that were difficult for users to understand and query across multiple tables. Dimensional modeling provides a simpler representation optimized for decision support queries through a star schema with a central fact table linked to smaller dimension tables. This structure supports analysis and is easier for users to navigate compared to complex ER schemas.

Uploaded by

merajaslam
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
42 views18 pages

DWH Spring 2011 Lecture Slides Week6&7

The document discusses the need for and benefits of entity-relationship (ER) modeling and dimensional modeling. ER modeling helped address issues with early data processing systems by reducing data redundancies but resulted in complex schemas that were difficult for users to understand and query across multiple tables. Dimensional modeling provides a simpler representation optimized for decision support queries through a star schema with a central fact table linked to smaller dimension tables. This structure supports analysis and is easier for users to navigate compared to complex ER schemas.

Uploaded by

merajaslam
Copyright
© Attribution Non-Commercial (BY-NC)
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 18

Dr.

Abdul Basit Siddiqui


FUIEMS
(Lecture Slides Weeks # 6&7)
The need for ER modeling?
Problems with early COBOLian data processing
systems.

Data redundancies

From flat file to Table, each entity ultimately


becomes a Table in the physical schema.

Simple O(n2) Join to work with Tables

FUIEMS Data Warehousing - Spring2011


Why ER Modeling has been so successful?

Coupled with normalization drives out all the


redundancy out of the database.

Change (or add or delete) the data at just one


point.

Can be used with indexing for very fast access.

Resulted in success of OLTP systems.

FUIEMS Data Warehousing - Spring2011


Need for DM: Un-answered Qs
 Lets have a look at a typical ER data model first.

 Some Observations:
 All tables look-alike, as a consequence it is difficult to identify:

 Which table is more important ?

 Which is the largest?

 Which tables contain numerical measurements of the


business?

 Which table contain nearly static descriptive attributes?

FUIEMS Data Warehousing - Spring2011


Need for DM: Complexity of Representation
 Many topologies for the same ER diagram, all appearing
different.
 Very hard to visualize and remember.
12
7 6
3 12 7
11 4 8
8
9
1 10
10 9 11
6 1

 A large number 3
of possible 2 5
connections to 4any two (or
2 5
more) tables

FUIEMS Data Warehousing - Spring2011


Need for DM: The Paradox
 The Paradox: Trying to make information accessible using tables
resulted in an inability to query them!

 ER and Normalization result in large number of tables which are:


 Hard to understand by the users (DB programmers)

 Hard to navigate optimally by DBMS software

 Real value of ER is in using tables individually or in pairs

 Too complex for queries that span multiple tables with a large
number of records

FUIEMS Data Warehousing - Spring2011


ER vs. DM
ER DM
 Constituted to optimize  Constituted to optimize DSS
OLTP performance. query performance.
 Models the micro  Models the macro
relationships among data relationships among data
elements. elements with an overall
deterministic strategy.
A wild variability of the  All dimensions serve as equal
structure of ER models. entry points to the fact table.
 Changes in users' querying
 Very vulnerable to changes in habits can be accommodated
the user's querying habits, by automatic SQL generators.
because such schemas are
asymmetrical.

FUIEMS Data Warehousing - Spring2011


How to simplify a ER data model?

Two general methods:

 De-Normalization

 Dimensional Modeling (DM)

FUIEMS Data Warehousing - Spring2011


What is DM?

A simpler logical model optimized for decision


support.
Inherently dimensional in nature, with a single
central fact table and a set of smaller
dimensional tables.
Multi-part key for the fact table
Dimensional tables with a single-part PK.
Keys are usually system generated

FUIEMS Data Warehousing - Spring2011


What is DM?
Results in a star like structure, called star schema
or star join.

All relationships mandatory M-1.

Single path between any two levels.

Supports ROLAP operations.

FUIEMS Data Warehousing - Spring2011


Dimensions have Hierarchies

Items

Books Cloths

Fiction Text Men Women

Engg Medical

Analysts tend to look at the data through


dimension at a particular “level” in the
hierarchy

FUIEMS Data Warehousing - Spring2011


The two Schemas

Star
Snow-flake

FUIEMS Data Warehousing - Spring2011


“Simplified” 3NF (Retail)
CITY DISTRICT M DIVISION PROVINCE
1 district
1 1
zone M division
M DISTRICT DIVISION
ZONE CITY
1
store M week
1
STORE # STREET ZONE ... DATE WEEK
1 M
sale_header quarter
M M
RECEIPT # STORE # DATE ... MONTH QTR
1 1
M M
1
WEEK MONTH
M sale_detail month 1
RECEIPT # ITEM # ... $
YEAR QTR
1 M M
1 year
ITEM # CATEGORY
ITEM # SUPPLIER
item_x_cat M
1 item_x_splir
CATEGORY DEPT
cat_x_dept
FUIEMS Data Warehousing - Spring2011
Vastly Simplified Star Schema
Product Dim
Geography Dim
1 ITEM#
STORE# 1
Fact Table CATEGORY
ZONE
RECEIPT#
DEPT
CITY
STORE#
M SUPPLIER
DISTRICT
ITEM# M
DIVISION
DATE Time Dim
M
PROVINCE . DATE
. 1
facts . WEEK

Sale Rs. MONTH

QUARTER

YEAR
FUIEMS Data Warehousing - Spring2011
The Benefit of Simplicity

Beauty lies in close


correspondence with the
business, evident even to
business users.

FUIEMS Data Warehousing - Spring2011


Features of Star Schema
Dimensional hierarchies are collapsed into a single table
for each dimension. Loss of Information?

A single fact table created with a single header from the


detail records, resulting in:

A vastly simplified physical data model!

Fewer tables (thousands of tables in some ERP systems).

Fewer joins resulting in high performance.

Some requirement of additional space.

FUIEMS Data Warehousing - Spring2011


Quantifying space requirement
Quantifying use of additional space using star schema

There are about 10 million mobile phone users in Pakistan.


Say the top company has half of them = 500,000

Number of days in 1 year = 365


Number of calls recorded each day = 250,000 (assumed)
Maximum number of records in fact table = 91 billion rows
Assuming a relatively small header size = 128 bytes
Fact table storage used = 11 Tera bytes
Average length of city name = 8 characters  8 bytes
Total number of cities with telephone access = 170 (1 byte)
Space used for city name in fact table using Star = 8 x 0.091 = 0.728 TB
Space used for city code using snow-flake = 1x 0.091= 0.091 TB
Additional space used  0.637 Tera byte i.e. about 5.8%

FUIEMS Data Warehousing - Spring2011

You might also like