Online Analytical Processing (OLAP): An Overview
Overview
Motivation
Multi-Dimensional Data Model
Research Areas
Optimizations
  Materializing multiple aggregates simultaneously
  Materialization strategy
Motivation
Aggregation, summarization, and exploration of historical data, to help management make informed decisions.
Different Goal
Aggregation, summarization, and exploration of historical data, to help management make informed decisions.

Product             Branch         Time                 Price
Coke (0.5 gallon)   Convoy Street  2006-03-01 09:00:01  $1.00
Pepsi (0.5 gallon)  UTC            2006-03-01 09:00:01  $1.03
Coke (1 gallon)     UTC            2006-03-01 09:00:02  $1.50
Altoids             Costa Verde    2006-03-01 09:01:33  $0.30
...

Find the total sales for each product and month.
Find the percentage change in the total monthly sales for each product.
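The two queries above can be sketched in a few lines of Python. This is only an illustration, not the method from the slides: the row layout follows the sample table, the February row is invented so that a month-over-month change exists, and the function names `monthly_sales` and `monthly_change` are hypothetical.

```python
from collections import defaultdict

# Hypothetical transactions as (product, branch, timestamp, price).
# The February row is made up for illustration; it is not from the slides.
transactions = [
    ("Coke (0.5 gallon)", "Convoy Street", "2006-02-01 09:00:01", 0.50),
    ("Coke (0.5 gallon)", "Convoy Street", "2006-03-01 09:00:01", 1.00),
    ("Pepsi (0.5 gallon)", "UTC", "2006-03-01 09:00:01", 1.03),
]


def monthly_sales(rows):
    """Total sales per (product, 'YYYY-MM')."""
    totals = defaultdict(float)
    for product, _branch, timestamp, price in rows:
        totals[(product, timestamp[:7])] += price
    return dict(totals)


def previous_month(month):
    """Return the 'YYYY-MM' string for the month before `month`."""
    year, mon = int(month[:4]), int(month[5:7])
    if mon == 1:
        return f"{year - 1:04d}-12"
    return f"{year:04d}-{mon - 1:02d}"


def monthly_change(totals):
    """Percentage change against the previous month, where that month has sales."""
    changes = {}
    for (product, month), total in totals.items():
        prev = totals.get((product, previous_month(month)))
        if prev:  # skip products with no (or zero) sales in the previous month
            changes[(product, month)] = 100.0 * (total - prev) / prev
    return changes


totals = monthly_sales(transactions)
changes = monthly_change(totals)
```

Both queries scan and aggregate history rather than touching a single recent record, which is the kind of workload the slide contrasts with day-to-day transaction processing.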
Different Requirements

                   OLTP                               OLAP
                   (On-Line Transaction Processing)   (On-Line Analytical Processing)
Tasks              Day-to-day operation               High-level decision support
Size of database   Gigabytes                          Terabytes
Time span          Recent, up-to-date
Workload
Performance        Transaction throughput
Multi-Dimensional Data Model

Sales by Category: Drinks

Product    Feb     Mar     Total
Coke       30.3    93.9    124.2
Heineken   34.8    123.8   158.6
Total      65.1    217.7   282.8
Sales by Category as a relation with ALL values

Category  Product   Month  SUM
Drinks    Coke      Feb    30.3
Drinks    Coke      Mar    93.9
Drinks    Coke      ALL    124.2
Drinks    Heineken  Feb    34.8
Drinks    Heineken  Mar    123.8
Drinks    Heineken  ALL    158.6
Drinks    ALL       Feb    65.1
Drinks    ALL       Mar    217.7
Drinks    ALL       ALL    282.8
ALL       ALL       ALL    964.0
Idea: Group by the CUBE list. Union the aggregates. Introduce the ALL values.
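The three steps of this idea can be sketched in Python as follows. It is a minimal illustration, assuming rows are dictionaries and the measure is summed; the function name `cube` and the column names are hypothetical. Only the Drinks rows are included, so the grand total here is 282.8 rather than the 964.0 shown for all categories.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # placeholder for a dimension that has been aggregated away


def cube(rows, dims, measure):
    """Group by every subset of `dims`, union the results, and fill in ALL."""
    result = {}
    for k in range(len(dims) + 1):
        for kept in combinations(dims, k):
            groups = defaultdict(float)
            for row in rows:
                # Dimensions outside `kept` collapse to the ALL value.
                key = tuple(row[d] if d in kept else ALL for d in dims)
                groups[key] += row[measure]
            result.update(groups)  # union of this group-by with the others
    return result


sales = [
    {"category": "Drinks", "product": "Coke", "month": "Feb", "sales": 30.3},
    {"category": "Drinks", "product": "Coke", "month": "Mar", "sales": 93.9},
    {"category": "Drinks", "product": "Heineken", "month": "Feb", "sales": 34.8},
    {"category": "Drinks", "product": "Heineken", "month": "Mar", "sales": 123.8},
]

totals = cube(sales, ["category", "product", "month"], "sales")
```

This sketch rescans the rows once per group-by, which is simple but wasteful; it also assumes that no real dimension value is the string "ALL".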
[Figure: example relation with columns Month, Day, State, Product Name, and Sales, with ALL values in the aggregated rows]
Research Areas
SQL language extensions
Server architecture
Parallel processing
Index structures
Materialized views
Materializing multiple aggregates simultaneously

An optimization that calculates multiple aggregates simultaneously. Useful for materializing aggregate views.
Multiple Aggregates

Transactional table:

Product  City         Month   Sales
Coke     San Diego    Feb 06  12
Pepsi    Los Angeles  Feb 06  13
Doritos  San Diego    Mar 06  72
Altoids  San Diego    Mar 06  65
...

Aggregate on Month / Product:

Month  Altoids  Coke  Doritos  Heineken  Pepsi  Pringles  Total
Feb    36       37    21       44        31     37        206
Mar    131      138   136      110       122    126       764
Total  167      175   157      154       153    164       970

Aggregate on City / Product:

City         Altoids  Coke  Doritos  Heineken  Pepsi  Pringles  Total
San Diego    90       89    74       74        68     73        469
Los Angeles  77       86    83       80        85     90        501
Total        167      175   157      154       153    164       970

Aggregate on Month / City:

Month  San Diego  Los Angeles  Total
Feb    112        95           206
Mar    358        407          764
Total  469        501          970
Multiple Aggregates

Aggregates to compute:

1. Sales by Product / City
2. Sales by Product / Month
3. Sales by Month / City
4. Sales by Product
5. Sales by City
6. Sales by Month
7. Sales (Total)

Is it possible to
  make a single pass over the transactional table?
  calculate multiple aggregates simultaneously?
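For data that fits in memory, the answer to both questions is yes, as the sketch below suggests: each row is read once and folded into all seven aggregates at the same time. The function name `aggregate_in_one_pass` and the tuple layout are assumptions for illustration; the rows are the four shown in the transactional table.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("product", "city", "month")

# The four rows visible in the transactional table.
rows = [
    ("Coke", "San Diego", "Feb 06", 12),
    ("Pepsi", "Los Angeles", "Feb 06", 13),
    ("Doritos", "San Diego", "Mar 06", 72),
    ("Altoids", "San Diego", "Mar 06", 65),
]


def aggregate_in_one_pass(rows):
    """Maintain one running table per group-by while scanning the rows once."""
    # All proper subsets of DIMS: three pairs, three singles, and the total.
    group_bys = [g for k in range(len(DIMS)) for g in combinations(DIMS, k)]
    tables = {g: defaultdict(int) for g in group_bys}
    for product, city, month, sales in rows:
        record = {"product": product, "city": city, "month": month}
        for g in group_bys:
            tables[g][tuple(record[d] for d in g)] += sales
    return tables


tables = aggregate_in_one_pass(rows)
```

The catch is memory: all seven tables are held at once, which is what motivates partitioning the data into chunks and choosing the scan order carefully.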
Chunking
Partition transactional data into array chunks
[Figure: array chunks numbered 1 to 64 over Dimension A (Product), Dimension B, and Dimension C (Month), with the example row: Product Coke, City San Diego, Month Feb 06, Sales 12]
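A small sketch of the partitioning step follows. It rests on assumptions that the slides do not state outright: each dimension has 16 positions split into 4 chunks of size 4 (giving the 64 chunks in the figure), coordinates are 0-based, and chunks are numbered with Dimension A varying fastest, then B, then C. The names `chunk_number` and `partition` are hypothetical.

```python
from collections import defaultdict

CHUNK_SIZE = 4        # assumed cells per chunk along each dimension
CHUNKS_PER_DIM = 4    # assumed chunks along each dimension (4 * 4 * 4 = 64)


def chunk_number(a, b, c):
    """Map 0-based cell coordinates (a, b, c) to a chunk number in 1..64."""
    ia, ib, ic = a // CHUNK_SIZE, b // CHUNK_SIZE, c // CHUNK_SIZE
    # Dimension A varies fastest, then B, then C.
    return 1 + ia + CHUNKS_PER_DIM * ib + CHUNKS_PER_DIM ** 2 * ic


def partition(cells):
    """Group sparse cells {(a, b, c): value} by the chunk they fall into."""
    chunks = defaultdict(list)
    for (a, b, c), value in cells.items():
        chunks[chunk_number(a, b, c)].append(((a, b, c), value))
    return dict(chunks)


cells = {(0, 0, 0): 12, (1, 2, 3): 7, (15, 15, 15): 5}
chunks = partition(cells)
```

Here the first two cells land in chunk 1 and the last in chunk 64. Mapping a transactional row such as (Coke, San Diego, Feb 06) to coordinates would need a dictionary from values to positions, which is left out of the sketch.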
Naïve Algorithm

[Figure: array chunks numbered 1 to 64 over Dimensions A, B, and C]

Pivot on AB: aggregate on all C
Pivot on AC: aggregate on all B
Pivot on BC: aggregate on all A
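The three pivots can be sketched as three independent scans of the array. This is an illustrative reading of the naïve approach, assuming a sparse array stored as a dictionary of cells; the names `pivot` and `naive_aggregates` are hypothetical.

```python
from collections import defaultdict


def pivot(cells, keep):
    """Sum a sparse 3-D array onto the two dimensions listed in `keep`.

    `keep` holds coordinate positions: 0 for A, 1 for B, 2 for C.
    """
    plane = defaultdict(int)
    for coords, value in cells.items():
        plane[tuple(coords[i] for i in keep)] += value
    return dict(plane)


def naive_aggregates(cells):
    """Compute AB, AC, and BC separately: one full scan of the array each."""
    return {
        "AB": pivot(cells, (0, 1)),  # aggregate on all C
        "AC": pivot(cells, (0, 2)),  # aggregate on all B
        "BC": pivot(cells, (1, 2)),  # aggregate on all A
    }


cells = {(0, 0, 0): 1, (0, 0, 1): 2, (1, 0, 1): 3}
planes = naive_aggregates(cells)
```

Each pivot rereads every cell, so the array is scanned three times here; computing the aggregates simultaneously aims to get the same three planes from one scan.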
[Figure, three steps: the array chunks numbered 1 to 64 over Dimensions A, B, and C, with the AC and BC planes marked]
[Figure: nodes ABC, AB, AC, BC, A, B, C, and all, each annotated with a size]

Node               Size
ABC (array chunk)  4 x 4 x 4
AB                 16 x 4 x 4
AC                 4 x 4 x 4
BC                 4 x 4
A                  4 x 4
B                  4
C                  4
all                1
[Figure: lattice of aggregates over four dimensions, with nodes ABC, ABD, ACD, BCD, AB, AC, BC, AD, BD, CD, and all]
Materialization strategy

[Figure: schema around the Sales table, with attributes Time Id, Week, Month; City Id, City, State; Product Id, Name, Category; Category Id, Category Name]
Simple idea: Q1 depends on Q2 (Q1 ⪯ Q2) if Q1 can be fully answered using the results of Q2.
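For plain group-by queries without hierarchies, the dependence relation reduces to a subset test, as the sketch below illustrates. It assumes each query is identified by the set of attributes it groups on, written as a string of one-letter names (p, t, c); the function names `depends_on` and `answerable_from` are hypothetical.

```python
def depends_on(q1, q2):
    """True if q1 can be answered from q2: q1 groups on a subset of q2's attributes."""
    return frozenset(q1) <= frozenset(q2)


def answerable_from(query, materialized):
    """Return the materialized views that `query` could be computed from."""
    return [view for view in materialized if depends_on(query, view)]


# "pc" groups by p and c; it can be derived from "ptc" by summing over t.
sources = answerable_from("pc", ["ptc", "pt", "tc", "pc"])
```

The relation is a partial order rather than a total one: neither of "pt" and "pc" depends on the other, which is why the views form a lattice and not a chain.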
[Figure: direct-product lattice of the dimension hierarchies, with nodes including ptc, pt, pcatt, pwc, pyc, tc, pc, pmc, pts, and ps, down to none]
Assume that all queries are identical to some view in the lattice
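A direct-product lattice can be sketched by combining one level from each dimension hierarchy and comparing nodes component by component. The hierarchies below are assumptions inferred from the node labels and schema attributes (product to category, city to state, and time to week or to month and then year); they are not spelled out in the slides, and all names in the code are hypothetical.

```python
from itertools import product

# Each map sends a level to the coarser levels directly above it (assumed).
PRODUCT = {"p": ["cat"], "cat": ["none"], "none": []}
TIME = {"t": ["w", "m"], "w": ["none"], "m": ["y"], "y": ["none"], "none": []}
CITY = {"c": ["s"], "s": ["none"], "none": []}
HIERARCHIES = (PRODUCT, TIME, CITY)


def computable_levels(hierarchy, level):
    """All levels that can be derived from `level`, including itself."""
    seen, stack = set(), [level]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(hierarchy[current])
    return seen


def lattice_nodes(hierarchies=HIERARCHIES):
    """Every combination of one level per dimension."""
    return list(product(*hierarchies))


def depends_on(q1, q2, hierarchies=HIERARCHIES):
    """q1 depends on q2 if every level of q1 is derivable from q2's level."""
    return all(
        l1 in computable_levels(h, l2)
        for h, l1, l2 in zip(hierarchies, q1, q2)
    )


nodes = lattice_nodes()
```

Under these assumed hierarchies the lattice has 3 x 5 x 3 = 45 nodes, and a yearly view can be derived from a monthly one while a weekly view cannot, since weeks do not nest inside months.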
Discussion
Questions from the audience