Virtual University of Pakistan

This document discusses issues related to de-normalization in data warehousing. It covers four issues: storage, performance, ease-of-use, and maintenance. Storage issues arise from the larger table sizes caused by adding redundant data. Performance issues can occur when de-normalization produces larger tables or forces a sort (e.g., for a count distinct). Ease-of-use and maintenance suffer because some splits, such as hash splitting, are hard to reverse, and because techniques such as range splitting can distribute data unevenly. Horizontal and vertical splitting are discussed as techniques that can improve performance for some queries; vertical splitting, however, degrades performance for the remaining queries because of join overhead.

Uploaded by

hamza abbas

Virtual University of Pakistan

Data Warehousing
Lecture-9
Issues of De-normalization

Ahsan Abdullah
Assoc. Prof. & Head
Center for Agro-Informatics Research
www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Email: [email protected]
Issues of De-normalization

Why Issues?

Issues of Denormalization

• Storage
• Performance
• Ease-of-use
• Maintenance

Industry Characteristics
Master:Detail Ratios
• Health care: 1:2 ratio
• Video rental: 1:3 ratio
• Retail: 1:30 ratio

Storage Issues: Pre-joining Facts
• Assume a 1:2 record-count ratio between the claim master and claim detail tables for a health-care application.
• Assume 10 million members (10 million master records and 20 million records in claim detail).
• Assume a 10-byte member_ID.
• Assume a 40-byte header (row size) for the master table and a 60-byte header for the detail table.

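Storage Issues: Pre-joining (Calculations)
With normalization:
Total space used = 10 million × 40 bytes + 20 million × 60 bytes = 0.4 GB + 1.2 GB = 1.6 GB

After denormalization (master and detail pre-joined; the 10-byte member_ID is stored once per row instead of twice):
Total space used = 20 million × (60 + 40 − 10) bytes = 20 million × 90 bytes = 1.8 GB

Net result: 12.5% (0.2 GB / 1.6 GB) additional space is required for the raw data tables in the database.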
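The arithmetic above can be reproduced with a short Python sketch. It only restates the slide's assumptions (10 million master rows, 20 million detail rows, 40- and 60-byte headers, 10-byte member_ID); the function name `storage_bytes` is hypothetical, and decimal units (1 GB = 10^9 bytes) are assumed.

```python
# Storage arithmetic for pre-joining, using the slide's assumptions.
# Assumption: decimal units, 1 GB = 10**9 bytes.
GB = 10**9

def storage_bytes(master_rows, detail_rows, master_hdr, detail_hdr, key_size):
    """Return (normalized_bytes, prejoined_bytes) for a master:detail pair."""
    normalized = master_rows * master_hdr + detail_rows * detail_hdr
    # The pre-joined table has one row per detail record; the join key
    # (member_ID) is kept once per row instead of once in each table.
    prejoined = detail_rows * (master_hdr + detail_hdr - key_size)
    return normalized, prejoined

normalized, prejoined = storage_bytes(
    master_rows=10_000_000, detail_rows=20_000_000,
    master_hdr=40, detail_hdr=60, key_size=10,
)
extra = (prejoined - normalized) / normalized
print(f"normalized: {normalized / GB:.1f} GB")  # normalized: 1.6 GB
print(f"pre-joined: {prejoined / GB:.1f} GB")   # pre-joined: 1.8 GB
print(f"extra space: {extra:.1%}")              # extra space: 12.5%
```

Keeping the record counts and header sizes as parameters makes it easy to rerun the estimate for the other master:detail ratios listed earlier (e.g., retail at 1:30).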

Performance Issues: Pre-joining
Consider the query “How many members were paid claims during last year?”

With normalization:
Simply count the number of records in the master table.

After denormalization:
The member_ID is repeated in the pre-joined table, so a count distinct is needed. This forces a sort on a larger table and degrades performance.
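A minimal sketch of the difference, using Python's built-in `sqlite3` module with a tiny in-memory data set. The table and column names (`claim_master`, `claim_detail`, `claim_prejoined`, `plan`, `amount`) and the data are hypothetical, and the "during last year" filter is left out for brevity. The point is only that the pre-joined table needs `COUNT(DISTINCT ...)`, which engines typically evaluate with a sort (or hash) over the whole, larger table.

```python
import sqlite3

# Tiny hypothetical data set: 3 members with 2 claim-detail rows each (1:2).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE claim_master (member_id INTEGER PRIMARY KEY, plan TEXT)")
cur.execute("CREATE TABLE claim_detail (member_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO claim_master VALUES (?, ?)",
                [(1, "A"), (2, "B"), (3, "A")])
cur.executemany("INSERT INTO claim_detail VALUES (?, ?)",
                [(1, 10.0), (1, 20.0), (2, 5.0), (2, 7.5), (3, 1.0), (3, 2.0)])

# Pre-joined (denormalized) table: one row per detail record,
# with the master columns repeated on every row.
cur.execute("""
    CREATE TABLE claim_prejoined AS
    SELECT d.member_id, m.plan, d.amount
    FROM claim_detail AS d JOIN claim_master AS m ON d.member_id = m.member_id
""")

# Normalized: a plain count over the small master table.
normalized = cur.execute("SELECT COUNT(*) FROM claim_master").fetchone()[0]
# Denormalized: member_id repeats, so COUNT(DISTINCT ...) is required.
denormalized = cur.execute(
    "SELECT COUNT(DISTINCT member_id) FROM claim_prejoined").fetchone()[0]
# A plain COUNT(*) on the pre-joined table over-counts members.
naive = cur.execute("SELECT COUNT(*) FROM claim_prejoined").fetchone()[0]
print(normalized, denormalized, naive)  # 3 3 6
conn.close()
```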

Why Performance Issues: Pre-joining
Depending on the query, performance can actually deteriorate with denormalization! For the query above, this is due to the following three reasons:
• Forcing a sort due to the count distinct.
• Using a table with 2.25 times the header size (90 bytes per row instead of 40 bytes).
• Using a table with 2 times as many rows (20 million instead of 10 million).
• Together, the last two mean scanning 1.8 GB instead of 0.4 GB (4.5 times as much data), before counting the cost of the sort.

Bottom line: in addition to the 0.2 GB of extra space, the 0.4 GB master table must also be kept.

Performance Issues: Adding redundant columns
Continuing with the previous health-care example, assume a 60-byte detail table header and a 10-byte Sale_Person key.
• Copying the Sale_Person key to the detail table makes each row 70 bytes instead of 60, so all scans of the detail table take about 16.7% longer than before.
• This is justifiable only if a significant portion of the queries benefit from accessing the denormalized detail table.
• The cost-benefit trade-off needs to be examined for each denormalization decision.
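The scan penalty can be sketched as a one-line ratio. This assumes, as a simplification, that scan time is proportional to the number of bytes read per row; the helper name `scan_overhead` is hypothetical.

```python
# Assumption: scan time is proportional to the bytes read per row.
def scan_overhead(row_bytes: int, added_bytes: int) -> float:
    """Fractional increase in bytes scanned after adding a redundant column."""
    return added_bytes / row_bytes

# 60-byte detail row with a 10-byte Sale_Person key copied into it.
overhead = scan_overhead(row_bytes=60, added_bytes=10)
print(f"every detail-table scan reads {overhead:.1%} more data")
# every detail-table scan reads 16.7% more data
```

The same ratio is the cost side of the trade-off: every query that scans the detail table pays it, whether or not it uses the copied column.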

Other Issues: Adding redundant columns
Other issues include an increase in table size, maintenance overhead, and loss of information:
• The size of the transaction table (i.e., the largest table) increases by the size of the Sale_Person key.
• For the example being considered, the detail table grows from 1.2 GB (20 million × 60 bytes) to 1.4 GB (20 million × 70 bytes).
• If the Sale_Person key changes (e.g., to a new 12-digit NID), the update must be propagated all the way down to the transaction table.
• In the absence of a 1:M relationship, moving the column will actually result in loss of data.
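The maintenance cost of a copied key can be made concrete with a small `sqlite3` sketch. The schema (`sale_person`, `claim_detail`, `sp_id`), the old key value, and the 1,000 detail rows are all hypothetical; the sketch only shows that one key change in the master becomes one update per copied value in the transaction table.

```python
import sqlite3

# Hypothetical schema: the Sale_Person key is copied into every detail row.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sale_person (sp_id TEXT PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE claim_detail "
            "(claim_id INTEGER PRIMARY KEY, sp_id TEXT, amount REAL)")
cur.execute("INSERT INTO sale_person VALUES ('SP-01', 'Sales rep A')")
cur.executemany("INSERT INTO claim_detail (sp_id, amount) VALUES (?, ?)",
                [("SP-01", float(i)) for i in range(1000)])

# The key changes (e.g., to a new 12-digit NID): the master needs one update...
cur.execute("UPDATE sale_person SET sp_id = '123456789012' WHERE sp_id = 'SP-01'")
master_rows = cur.rowcount
# ...but every redundant copy in the transaction table must change too.
cur.execute("UPDATE claim_detail SET sp_id = '123456789012' WHERE sp_id = 'SP-01'")
detail_rows = cur.rowcount
conn.commit()
print(master_rows, detail_rows)  # 1 1000
conn.close()
```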
Ahsan Abdullah

Ease of use Issues: Horizontal Splitting
Horizontal splitting is a divide-and-conquer technique that exploits parallelism. The conquer part of the technique is about combining the results.

Let's see how it works for hash-based splitting/partitioning.
• Assuming uniform hashing, hash splitting distributes data evenly across all partitions in a pre-defined manner.
• However, hash-based splitting is not easily reversible to eliminate the split.
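A minimal sketch of hash splitting, assuming four partitions and using SHA-256 from Python's `hashlib` as a stand-in for whatever hash function a DBMS would use. The member IDs and the name `hash_partition` are hypothetical. Placement is pre-defined (the same key always lands in the same partition) and roughly even, but the original ordering is scattered, so eliminating the split means reading every partition and re-sorting.

```python
import hashlib

NUM_PARTITIONS = 4  # assumption: one partition per processor

def hash_partition(member_id: str) -> int:
    """Pre-defined placement: the same key always lands in the same partition.
    SHA-256 stands in for the DBMS's hash function (an assumption)."""
    digest = hashlib.sha256(member_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

# Hypothetical member IDs, generated in ascending order.
keys = [f"M{i:07d}" for i in range(100_000)]
partitions = [[] for _ in range(NUM_PARTITIONS)]
for k in keys:
    partitions[hash_partition(k)].append(k)

# Uniform hashing gives roughly 25,000 rows per partition.
print([len(p) for p in partitions])

# A point lookup can go straight to one partition...
target = "M0004242"
assert target in partitions[hash_partition(target)]

# ...but the original ordering is scattered: undoing the split means
# reading every partition and re-sorting on the key.
merged = sorted(k for p in partitions for k in p)
assert merged == keys
```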

Ease of use Issues: Horizontal Splitting
• Round robin and random splitting:
  ◦ Guarantee good data distribution.
  ◦ Almost impossible to reverse (or undo).
  ◦ Not pre-defined.
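A sketch of the two schemes, with hypothetical `(member_id, year)` rows and four partitions. In both, placement depends on arrival order or chance rather than on any column value, which is what "not pre-defined" means here: no function maps a member_id back to its partition, so a lookup (or an undo) has to touch every partition. The function names are illustrative, not a DBMS API.

```python
import random
from itertools import cycle

NUM_PARTITIONS = 4  # hypothetical processor count

def round_robin_split(rows, n=NUM_PARTITIONS):
    """Row i goes to partition i mod n; placement ignores column values."""
    parts = [[] for _ in range(n)]
    for part, row in zip(cycle(parts), rows):
        part.append(row)
    return parts

def random_split(rows, n=NUM_PARTITIONS, seed=None):
    """Each row goes to a randomly chosen partition."""
    rng = random.Random(seed)
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[rng.randrange(n)].append(row)
    return parts

# Hypothetical (member_id, year) rows.
rows = [(f"M{i:04d}", 1998 + i % 4) for i in range(10)]

rr = round_robin_split(rows)
print([len(p) for p in rr])  # [3, 3, 2, 2]

# Nothing maps a member_id to its partition, so a lookup has to probe
# every partition.
hits = [i for i, p in enumerate(rr) if any(mid == "M0005" for mid, _ in p)]
print(hits)  # [1]

rnd = random_split(rows, seed=42)
print(sum(len(p) for p in rnd))  # 10
```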

Ease of use Issues: Horizontal Splitting
• Range and expression splitting:
  ◦ Can facilitate partition elimination with a smart optimizer.
  ◦ Generally lead to “hot spots” (uneven distribution of data).
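A sketch of range splitting on year, loosely mirroring the 1998 to 2001 layout used in the next example. The boundaries, partition names, and the "made-up" rows are hypothetical. It shows the two properties above: a predicate on the split column lets whole partitions be skipped (partition elimination), and skewed data piles into one partition.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical range split on year:
# P1 holds years < 1999, P2 = 1999, P3 = 2000, P4 = years >= 2001.
BOUNDARIES = [1999, 2000, 2001]
NAMES = ["P1", "P2", "P3", "P4"]

def range_partition(year: int) -> str:
    return NAMES[bisect_right(BOUNDARIES, year)]

def partitions_to_scan(year_from: int, year_to: int) -> list:
    """Partition elimination: keep only partitions whose year range can
    satisfy year_from <= year <= year_to; the rest are never read."""
    lo = bisect_right(BOUNDARIES, year_from)
    hi = bisect_right(BOUNDARIES, year_to)
    return NAMES[lo:hi + 1]

print(partitions_to_scan(2000, 2000))  # ['P3']
print(partitions_to_scan(1998, 1999))  # ['P1', 'P2']

# Made-up rows: if most rows fall in one year, one partition
# (and its processor) holds most of the data, i.e. a "hot spot".
row_years = [1998, 1999, 2000] + [2001] * 7
load = Counter(range_partition(y) for y in row_years)
print(load.most_common(1))  # [('P4', 7)]
```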

Performance Issues: Horizontal Splitting

[Figure: Splitting based on year. The years 1998, 1999, 2000 and 2001 are assigned to processors P1, P2, P3 and P4. The dramatic cancellation of airline reservations after 9/11 results in a processor “hot spot” on the processor holding 2001 (P4).]
Performance issues: Vertical Splitting Facts
Example: Consider a 100-byte header for the member table such that 20 bytes provide complete coverage for 90% of the queries.

Split the member table into two parts as follows:

1. Frequently accessed portion of the table (20 bytes), and
2. Infrequently accessed portion of the table (80+ bytes). Why 80+?

Note that the primary key (member_id) must be present in both tables so that the split can be eliminated, i.e., so that the original rows can be rebuilt by joining the two parts.
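A minimal sketch of vertical splitting on in-memory rows. The column names and their assignment to the "hot" and "cold" parts are hypothetical. The key, member_id, is repeated in both parts, which is presumably why the infrequently accessed portion is "80+" bytes rather than exactly 80. Frequent queries read only the narrow hot part; rebuilding full rows requires a join on member_id.

```python
# Hypothetical column assignment: member_id is kept in both parts.
HOT_COLUMNS = ["member_id", "plan", "status"]        # assumed frequently queried
COLD_COLUMNS = ["member_id", "address", "employer"]  # assumed rarely queried

def vertical_split(rows):
    """Split each row into a hot part and a cold part."""
    hot = [{c: r[c] for c in HOT_COLUMNS} for r in rows]
    cold = [{c: r[c] for c in COLD_COLUMNS} for r in rows]
    return hot, cold

def eliminate_split(hot, cold):
    """Rebuild the original rows by joining the two parts on member_id."""
    cold_by_id = {r["member_id"]: r for r in cold}
    return [{**h, **cold_by_id[h["member_id"]]} for h in hot]

members = [
    {"member_id": 1, "plan": "gold", "status": "active",
     "address": "Street 1", "employer": "X"},
    {"member_id": 2, "plan": "basic", "status": "lapsed",
     "address": "Street 2", "employer": "Y"},
]
hot, cold = vertical_split(members)

# Frequent queries touch only the narrow hot part.
active = [r["member_id"] for r in hot if r["status"] == "active"]
print(active)  # [1]

# Infrequent queries need both parts and pay for the join.
rebuilt = eliminate_split(hot, cold)
assert rebuilt == members
```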

Performance issues: Vertical Splitting Good vs. Bad

Scanning the member table for the most frequently used queries will be about 5 times faster with vertical splitting (20 bytes read per row instead of 100).

Ironically, for the “infrequently” accessed queries, performance will be worse than with the un-split table because of the join overhead.
