0% found this document useful (0 votes)
7 views18 pages

Data Base

It's a short but reliable lesson of database systems.it's useful and helpful for you .

Uploaded by

MUHAMMAD ANSAR
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views18 pages

Data Base

It's a short but reliable lesson of database systems.it's useful and helpful for you .

Uploaded by

MUHAMMAD ANSAR
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 18

Data Warehousing

Lecture-11
Dimensional Modeling (DM)

1
Dimensional Modeling (DM)

2
The need for ER modeling?
 Data redundancies

 From flat file to Table, each entity ultimately


becomes a Table in the physical schema.

 Simple O(n2) Join to work with Tables

3
Why ER Modeling has been so successful?
 Coupled with normalization drives out all
the redundancy out of the database.

 Change (or add or delete) the data at just


one point.

 Can be used with indexing for very fast


access.

 Resulted in success of OLTP systems.


4
Need for DM: Un-answered Qs
 Lets have a look at a typical ER data model first.

 Some Observations:
 All tables look-alike, as a consequence it is difficult to
identify:

 Which table is more important ?

 Which is the largest?

 Which tables contain numerical measurements of the


business?

 Which table contain nearly static descriptive attributes?


5
Need for DM: Complexity of Representation
 Many topologies for the same ER
diagram, all appearing different.
 Very hard to visualize and remember.
12
7 6
3 12 7
11 4 8
8
9
1 10
10 9 11
6 1

3 2 5
2 5 4

 A large number of possible connections to


any two (or more) tables 6
Need for DM: The Paradox
 The Paradox: Trying to make information accessible using
tables resulted in an inability to query them!

 ER and Normalization result in large number of tables which


are:
 Hard to understand by the users (DB programmers)

 Hard to navigate optimally by DBMS software

 Real value of ER is in using tables individually or in pairs

 Too complex for queries that span multiple tables with a


large number of records
7
ER vs. DM
ER DM
Constituted to optimize OLTP Constituted to optimize DSS
performance. query performance.

Models the macro


Models the micro relationships relationships among data
among data elements. elements with an overall
deterministic strategy.
All dimensions serve as
A wild variability of the
equal entry points to the
structure of ER models.
fact table.
Very vulnerable to changes in Changes in users' querying
the user's querying habits, habits can be
because such schemas are accommodated by
asymmetrical. automatic SQL generators.
8
How to simplify a ER data model?

 Two general methods:

 De-Normalization

 Dimensional Modeling (DM)

9
What is DM?…
 A simpler logical model optimized for decision
support.

 Inherently dimensional in nature, with a single


central fact table and a set of smaller
dimensional tables.

 Multi-part key for the fact table

 Dimensional tables with a single-part PK.

 Keys are usually system generated


10
What is DM?...
 Results in a star like structure, called star
schema or star join.

 All relationships mandatory M-1.

 Single path between any two levels.

 Supports ROLAP operations.

11
Dimensions have Hierarchies

Items

Books Cloths

Fiction Text Men Women

Engg Medical

Analysts tend to look at the data through dimension at a


particular “level” in the hierarchy

12
The two Schemas

Star
Snow-flake

13
“Simplified” 3NF (Retail)
CITY DISTRICT M DIVISIONPROVINCE
1 district BACK
1 1
zone M division
M DISTRICT DIVISION
ZONE CITY
1
store M week
1
STORE #STREET ZONE ... DATE WEEK
1 M
sale_header M quarter
M
RECEIPT #STORE # DATE ... MONTH QTR
1 1
M M
1
WEEK MONTH
M sale_detail month 1
RECEIPT # ITEM # ... $
YEAR QTR
1 M M
1 year
ITEM #CATEGORY
ITEM # SUPPLIER
item_x_cat M
1 item_x_splir
CATEGORY DEPT
14
cat_x_dept
Vastly Simplified Star Schema
Product Dim
Geography Dim
1 ITEM#
STORE# 1
Fact Table CATEGORY
ZONE
RECEIPT#
DEPT
CITY
STORE#
M SUPPLIER
DISTRICT
ITEM# M
DIVISION
DATE Time Dim
M
PROVINCE . DATE
. 1
facts . WEEK

Sale Rs. MONTH

QUARTER

YEAR

15
The Benefit of Simplicity

Beauty lies in close correspondence


with the business, evident even to
business users.

16
Features of Star Schema
Dimensional hierarchies are collapsed into a single
table for each dimension. Loss of Information?

 A single fact table created with a single header from the detail
records, resulting in:

 A vastly simplified physical data model!

 Fewer tables (thousands of tables in some ERP systems).


 Fewer joins resulting in high performance.

 Some requirement of additional space.


17
Quantifying space requirement
Quantifying use of additional space using star schema

There are about 10 million mobile phone users in Pakistan.


Say the top company has half of them = 500,000

Number of days in 1 year = 365


Number of calls recorded each day = 250,000 (assumed)
Maximum number of records in fact table = 91 billion rows
Assuming a relatively small header size = 128 bytes
Fact table storage used = 11 Tera bytes
Average length of city name = 8 characters  8 bytes
Total number of cities with telephone access = 170 (1 byte)
Space used for city name in fact table using Star = 8 x 0.091 =
0.728 TB
Space used for city code using snow-flake = 1x 0.091= 0.091 TB
Additional space used  0.637 Tera byte i.e. about 5.8%
18

You might also like