Data Warehousing - C02 - OLAP
Lecture-2
Online Analytical Processing (OLAP)
DWH & OLAP
Supporting the human thought process
THOUGHT PROCESS: Profit is down by a large percentage, consistently, during the last quarter only. The rest is OK.
QUERY SEQUENCE: What were the quarterly sales at the regional level during the last year?
• Analysis is directional
  • Drill Down
  • Roll Up
  • Pivot
(More in subsequent slides.)
Challenges…
• It is not feasible to write predefined queries.
  • The analysis fails to remain user-driven (it becomes programmer-driven).
Challenges
• Contradiction
  • We want to compute answers in advance, but don't know the questions.
• Solution
  • Compute answers to “all” possible “queries”. But how?
OLAP: Facts & Dimensions
Where Does OLAP Fit In?
• It is a classification of applications, NOT a database
design technique.
Where does OLAP fit in?
[Figure: pipeline diagram with the labels Transaction Data, Data Loading, Data Cube (MOLAP), OLAP, Presentation Tools, Reports, and Decision Maker.]
OLTP vs. OLAP
Feature                         OLTP                      OLAP
Level of data                   Detailed                  Aggregated
Amount of data per transaction  Small                     Large
Views                           Pre-defined               User-defined
Typical write operation         Update, insert, delete    Bulk insert
“Age” of data                   Current (60-90 days)      Historical (5-10 years) and also current
Number of users                 High                      Low-Med
Tables                          Flat tables               Multi-dimensional tables
Database size                   Med (10^9 B to 10^12 B)   High (10^12 B to 10^15 B)
Query optimizing                Requires experience       Already “optimized”
Data availability               High                      Low-Med
OLAP FASMI Test
Fast: Delivers information to the user at a fairly constant rate. Most
queries answered in under five seconds.
OLAP Implementations
Multidimensional OLAP (MOLAP)
MOLAP Implementations
OLAP has historically been implemented using a
multidimensional data structure or “cube”.
MOLAP Implementations
🞭 No standard query language for querying MOLAP
  - No SQL!
Aggregations in MOLAP
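[Figure: a data cube with axes Product (Bread 8, Eggs 45, Butter 13, Jam 12, Juice 10), Time (weeks w1 to w6) and Zone, shown next to the dimension hierarchy levels: Category and Product; Division, District and City; Quarter, Month, Week and Day.]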
Cube operations
• Drill down: get more details
  • e.g., given summarized sales as above, find the breakdown of sales by city
    within each region, or within Sindh
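Drill-down and its inverse, roll-up, can be sketched on a tiny in-memory data set. This is a minimal illustration only: the rows, the city names and the helper names `roll_up` and `drill_down` are assumptions made for the example, not part of the lecture.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, city, amount). All values are invented.
SALES = [
    ("Sindh", "Karachi", 120),
    ("Sindh", "Hyderabad", 40),
    ("Punjab", "Lahore", 90),
    ("Punjab", "Multan", 30),
]

def roll_up(rows):
    """Summarize sales at the region level (less detail)."""
    totals = defaultdict(int)
    for region, _city, amount in rows:
        totals[region] += amount
    return dict(totals)

def drill_down(rows, region):
    """Break one region's total down by city (more detail)."""
    totals = defaultdict(int)
    for row_region, city, amount in rows:
        if row_region == region:
            totals[city] += amount
    return dict(totals)

by_region = roll_up(SALES)
sindh_by_city = drill_down(SALES, "Sindh")
print(by_region)
print(sindh_by_city)
```

The city figures for Sindh add up to the regional figure, which is what makes moving between the two levels consistent.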
Querying the cube
[Figure: three bar charts of sales. Rolled-up view: Juices and Soda Drinks by year (2001, 2002). Drill-down: Juices and Soda Drinks by quarter (Q1 to Q4 of 2001 and 2002). Further drill-down: individual products (OJ, RK, 8UP, PK, MJ, BU, AJ) by quarter. Arrows labeled Drill-Down and Roll-Up connect the views.]
Querying the cube: Pivoting
[Figure: two bar charts of the same sales data. Before pivoting: Juices and Soda Drinks grouped by year (2001, 2002). After pivoting: years 2001 and 2002 compared per product (Orange juice, Mango juice, Apple juice, Rola-Kola, 8-UP, Bubbly-UP, Pola-Kola).]
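Pivoting changes which dimension is laid out along which axis, without changing any cell value. A minimal sketch, assuming invented sales figures and a hypothetical helper `cross_tab`:

```python
# Hypothetical sales figures keyed by (year, category); values are invented.
SALES = {
    (2001, "Juices"): 30, (2001, "Soda Drinks"): 20,
    (2002, "Juices"): 35, (2002, "Soda Drinks"): 25,
}

def cross_tab(cells, row_axis):
    """Lay the same cells out with key position `row_axis` (0 or 1) on the rows."""
    table = {}
    for key, value in cells.items():
        row, col = key[row_axis], key[1 - row_axis]
        table.setdefault(row, {})[col] = value
    return table

by_year = cross_tab(SALES, row_axis=0)      # rows: years, columns: categories
by_category = cross_tab(SALES, row_axis=1)  # pivoted: rows: categories, columns: years
print(by_year)
print(by_category)
```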
MOLAP evaluation
Advantages of MOLAP:
MOLAP evaluation
Drawbacks of MOLAP:
MOLAP Implementation issues
Maintenance issue: Every data item received must be
aggregated into every cube (assuming “to-date”
summaries are maintained), which is a lot of work.
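The cost of this maintenance can be made concrete with a small sketch. It assumes three hypothetical dimensions and keeps one running-total table per combination of dimensions; real MOLAP engines organize their aggregates differently, so this only illustrates the fan-out of a single insert.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("product", "zone", "week")  # hypothetical dimension names

# One running-total table for every subset of the dimensions (2**3 = 8 tables).
aggregates = {
    subset: defaultdict(int)
    for size in range(len(DIMENSIONS) + 1)
    for subset in combinations(DIMENSIONS, size)
}

def load_fact(fact, amount):
    """Fold one incoming fact into every aggregate; return the number of updates."""
    updates = 0
    for subset, table in aggregates.items():
        key = tuple(fact[dim] for dim in subset)
        table[key] += amount
        updates += 1
    return updates

updates = load_fact({"product": "Bread", "zone": "Z1", "week": "w1"}, 8)
print(updates)  # one incoming item touches all 8 aggregates
```

With n dimensions, and ignoring hierarchy levels, each arriving item triggers 2**n updates under this scheme.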
Partitioned Cubes
• To overcome the space limitation of MOLAP, the cube is
partitioned.
Partitioned Cubes: What Do They Look Like?
[Figure: a cube with axes Time, Product and Geography, split into partitions labeled Men’s clothing, Children’s clothing and Bed linen.]
• Logically similar to a relational view i.e. linking two (or more) cubes along
common dimension(s).
Example: Joining the store cube and the list price cube along the product
dimension, to calculate the sale price without redundant storage of the sale
price data.
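The store and list-price example can be sketched as two small cubes linked on the product dimension. The cube contents, and the reading of the store cube as units sold, are assumptions made for illustration.

```python
# Hypothetical cubes; all keys and numbers are invented.
store_cube = {("S1", "P1"): 10, ("S1", "P2"): 4, ("S2", "P1"): 7}  # units sold per (store, product)
list_price_cube = {"P1": 25.0, "P2": 50.0}                        # list price per product

def sale_amount(store, product):
    """Derive the sale amount on demand by linking both cubes on product."""
    return store_cube[(store, product)] * list_price_cube[product]

# The derived values are computed when asked for; nothing extra is stored.
derived = {key: sale_amount(*key) for key in store_cube}
print(derived)
```

Because the price lives only in the list-price cube, changing a price needs one update there rather than one per store.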
Relational OLAP (ROLAP)
The Necessity of ROLAP
Issue of scalability, i.e. the curse of dimensionality for
MOLAP
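A rough way to see the scalability problem: a dense cube reserves one cell per combination of dimension values, so its size is the product of the dimension cardinalities. The cardinalities and the fact count below are hypothetical.

```python
from math import prod

def dense_cube_cells(cardinalities):
    """Number of cells in a dense cube: the product of the dimension sizes."""
    return prod(cardinalities)

def sparsity(cardinalities, fact_rows):
    """Fraction of cells that hold no data, given the number of stored facts."""
    return 1 - fact_rows / dense_cube_cells(cardinalities)

# Assumed sizes: 1000 products, 100 stores, 365 days, then 50 promotions added.
three_dims = dense_cube_cells([1000, 100, 365])
four_dims = dense_cube_cells([1000, 100, 365, 50])
empty_fraction = sparsity([1000, 100, 365], fact_rows=1_000_000)
print(three_dims, four_dims, round(empty_fraction, 3))
```

Adding one 50-value dimension multiplies the cell count by 50 while the number of facts stays the same, so the cube also becomes sparser.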
ROLAP as a “Cube”
🞭 OLAP data is stored in a relational database (e.g. a star
schema).
🞭 The fact table can be visualized as an “unrolled”
cube.
🞭 So where is the cube?
  🞭 It’s a matter of perception.
  🞭 Visualize the fact table as an elementary cube.
Fact Table
Month  Product  Zone  Sale (K Rs.)
M1     P1       Z1    250
M2     P2       Z1    500
[Figure: the same data drawn as a cube with Product and Time axes.]
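The correspondence between a fact table and a cube can be sketched by folding the two rows of the fact table above into a dense array and unrolling it again. The helper names `to_cube` and `unroll` are hypothetical, as is the choice of a nested list for the cube.

```python
# Dimension members and rows as in the fact table above.
MONTHS, PRODUCTS, ZONES = ["M1", "M2"], ["P1", "P2"], ["Z1"]
FACT_TABLE = [("M1", "P1", "Z1", 250), ("M2", "P2", "Z1", 500)]

def to_cube(rows):
    """Fold fact rows into a dense array indexed [month][product][zone]."""
    cube = [[[None for _ in ZONES] for _ in PRODUCTS] for _ in MONTHS]
    for month, product, zone, sale in rows:
        cube[MONTHS.index(month)][PRODUCTS.index(product)][ZONES.index(zone)] = sale
    return cube

def unroll(cube):
    """Turn the non-empty cells of the cube back into fact rows."""
    rows = []
    for i, month in enumerate(MONTHS):
        for j, product in enumerate(PRODUCTS):
            for k, zone in enumerate(ZONES):
                if cube[i][j][k] is not None:
                    rows.append((month, product, zone, cube[i][j][k]))
    return rows

cube = to_cube(FACT_TABLE)
print(unroll(cube) == FACT_TABLE)
```

The dense array also holds the empty cells (here, M1/P2 and M2/P1), whereas the fact table stores only the combinations that actually occurred.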
How to create “Cube” in ROLAP
• Cube is a logical entity containing values of a certain fact at a
certain aggregation level at an intersection of a combination of
dimensions.
SUM(Sales_Amt) by Product_ID (rows) and Month_ID (columns):

Product_ID   M1   M2   M3   ALL
P1
P2
P3
Total
How to create “Cube” in ROLAP using SQL
🞭 For the table entries, without the totals:
SELECT S.Month_Id, S.Product_Id,
       SUM(S.Sales_Amt)
FROM Sales S
GROUP BY S.Month_Id, S.Product_Id;
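The query can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the column types and the sample rows are invented, the alias `S` is declared in the `FROM` clause so that the `S.`-qualified columns resolve, and an `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Month_Id TEXT, Product_Id TEXT, Sales_Amt INTEGER)")
# Hypothetical detail rows; several rows may fall into the same cell.
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [("M1", "P1", 100), ("M1", "P1", 150), ("M1", "P2", 200),
     ("M2", "P1", 300), ("M2", "P2", 500)],
)

# One result row per (month, product) cell of the cross-tab, without totals.
cells = conn.execute(
    """
    SELECT S.Month_Id, S.Product_Id, SUM(S.Sales_Amt)
    FROM Sales S
    GROUP BY S.Month_Id, S.Product_Id
    ORDER BY S.Month_Id, S.Product_Id
    """
).fetchall()
print(cells)
```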
Problem With Simple Approach
• Number of required queries increases exponentially with
the increase in number of dimensions.
• In the example, the first query can do most of the work of the
other two queries
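Both points can be illustrated in a few lines. With n dimensions there are 2**n possible GROUP BY combinations, and a coarser aggregate can be derived from a finer one instead of rescanning the detail data. The helper names and the sample sums are assumptions for this sketch.

```python
from collections import defaultdict
from itertools import combinations

def grouping_sets(dims):
    """All subsets of the dimensions, finest first: one GROUP BY query each."""
    return [combo for size in range(len(dims), -1, -1)
            for combo in combinations(dims, size)]

def coarsen(fine, keep):
    """Derive a coarser aggregate from a finer one by keeping some key positions."""
    coarse = defaultdict(int)
    for key, total in fine.items():
        coarse[tuple(key[i] for i in keep)] += total
    return dict(coarse)

queries_2d = grouping_sets(["Month_Id", "Product_Id"])
queries_10d = grouping_sets([f"D{i}" for i in range(10)])  # hypothetical dimensions

# Hypothetical result of the finest query: sums per (month, product).
by_month_product = {("M1", "P1"): 250, ("M1", "P2"): 200, ("M2", "P1"): 300}
by_month = coarsen(by_month_product, keep=(0,))
print(len(queries_2d), len(queries_10d), by_month)
```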
CUBE Clause
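In database systems that support it, `GROUP BY CUBE (Month_Id, Product_Id)` returns the cells together with all subtotals and the grand total in one statement. SQLite has no CUBE support, so the sketch below emulates the same result set with `UNION ALL` over the four grouping combinations. The sample rows are invented, and the literal 'ALL' stands in for the NULL markers that a real CUBE query would return in the rolled-up columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Month_Id TEXT, Product_Id TEXT, Sales_Amt INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [("M1", "P1", 250), ("M1", "P2", 200), ("M2", "P1", 300), ("M2", "P2", 500)],
)

# Emulation of GROUP BY CUBE (Month_Id, Product_Id): one branch per grouping set.
cube_rows = conn.execute(
    """
    SELECT Month_Id, Product_Id, SUM(Sales_Amt) FROM Sales GROUP BY Month_Id, Product_Id
    UNION ALL
    SELECT Month_Id, 'ALL', SUM(Sales_Amt) FROM Sales GROUP BY Month_Id
    UNION ALL
    SELECT 'ALL', Product_Id, SUM(Sales_Amt) FROM Sales GROUP BY Product_Id
    UNION ALL
    SELECT 'ALL', 'ALL', SUM(Sales_Amt) FROM Sales
    """
).fetchall()

cube = {(month, product): total for month, product, total in cube_rows}
print(cube[("ALL", "ALL")])
```

The emulation scans the table once per branch; the point of a native CUBE operator is that the engine can share work between the grouping sets.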
ROLAP & Space Requirement
If one is not careful, with the increase in number of
dimensions, the number of summary tables gets very
large
EXAMPLE: ROLAP & Space Requirement
A naïve implementation will require all combinations of summary
tables at each and every aggregation level.
…
24 summary tables; adding
geography results in 120 tables
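One common way to count summary tables is to let each dimension contribute its number of hierarchy levels plus one "ALL" level and multiply across dimensions. The level counts below are assumptions chosen so that the totals match the 24 and 120 quoted above; the slide's own breakdown is not shown, so this may not be how those figures were derived.

```python
from math import prod

def summary_table_count(levels_per_dimension):
    """Each dimension offers its hierarchy levels plus one 'ALL' level."""
    return prod(levels + 1 for levels in levels_per_dimension.values())

# Assumed hierarchy depths (hypothetical): product 3 levels, time 5, geography 4.
without_geography = summary_table_count({"product": 3, "time": 5})
with_geography = summary_table_count({"product": 3, "time": 5, "geography": 4})
print(without_geography, with_geography)
```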
ROLAP Issues
• Maintenance.
• Aggregation pit-falls.
ROLAP Issue: Maintenance
ROLAP Issue: Hierarchies
Dimensions are NOT always simple hierarchies
Dimensions can be more than simple hierarchies such as
item, subcategory, category, etc.
The product dimension might also branch off by trade styles
that cross simple hierarchy boundaries, such as:
• Looking at sales of air conditioners across
manufacturer boundaries, such as COY1, COY2,
COY3, etc.
• Looking at sales of all “green colored” items, which even cross
product categories (washing machine, refrigerator, split-AC,
etc.).
• Looking at a combination of both.
ROLAP Issue: Convention
Conventions are NOT absolute
• Calendar:
01 Jan. to 31 Dec. or
01 Jul. to 30 Jun. or
01 Sep. to 31 Aug.
• Week:
Mon. to Sat. or Thu. to Wed.
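Because the calendar convention varies, the same date can fall into different years and quarters depending on where the year starts. A minimal sketch, assuming a hypothetical `start_month` parameter and the (itself conventional) choice of naming a fiscal year after the calendar year in which it ends:

```python
from datetime import date

def fiscal_year(day, start_month=1):
    """Fiscal year containing `day`, labeled by the calendar year in which it ends."""
    if start_month == 1:
        return day.year
    return day.year + 1 if day.month >= start_month else day.year

def fiscal_quarter(day, start_month=1):
    """Quarter (1 to 4) of `day` within a year that starts in `start_month`."""
    return ((day.month - start_month) % 12) // 3 + 1

d = date(2002, 8, 15)
for start in (1, 7, 9):  # Jan., Jul. and Sep. year starts
    print(start, fiscal_year(d, start), fiscal_quarter(d, start))
```

A summary table built by quarter under one convention cannot be reused under another, which is why the convention has to be fixed before aggregates are precomputed.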
ROLAP Issue: Storage space explosion
ROLAP Issues: Aggregation pitfalls
How to Reduce Summary tables?
Many ROLAP products have developed means to reduce
the number of summary tables by:
Performance vs. Space Trade-Off
Performance vs. Space Trade-off using Wizard
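[Figure: wizard chart with storage space in MB (2 to 8) on the horizontal axis and a scale of 20 to 60 on the vertical axis; annotation: “Aggregation answers few queries”.]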
Hybrid OLAP (HOLAP)
HOLAP