Online Analytical Processing (OLAP) : An Overview

This document provides an overview of Online Analytical Processing (OLAP). It discusses the motivation for OLAP: aggregating, summarizing and exploring historical data to help management make informed decisions. It describes how OLAP differs from OLTP in database size, time span, workload and performance goals. It outlines research areas in OLAP, including query language extensions, server architecture, parallel processing, index structures and materialized views. It then details two optimizations: computing multiple aggregates simultaneously in a single pass over chunked array data, guided by a minimum memory spanning tree, and choosing which views of the data cube to materialize.

Uploaded by

Neha Kohli

Online Analytical Processing (OLAP)

An Overview

Kian Win Ong, Nicola Onose

Mar 3rd 2006

Overview
- Motivation
- Multi-Dimensional Data Model
- Research Areas
- Optimizations
  - Materializing multiple aggregates simultaneously
  - Materialization strategy

Motivation
Aggregation, summarization and exploration of historical data, to help management make informed decisions.

Different Goal
Transactional data records individual sales:

Product | Branch | Time | Price
Coke (0.5 gallon) | Convoy Street | 2006-03-01 09:00:01 | $1.00
Pepsi (0.5 gallon) | UTC | 2006-03-01 09:00:01 | $1.03
Coke (1 gallon) | UTC | 2006-03-01 09:00:02 | $1.50
Altoids | Costa Verde | 2006-03-01 09:01:33 | $0.30
...

Example queries:
- Find the total sales for each product and month.
- Find the percentage change in the total monthly sales for each product.
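
The two example queries translate directly into aggregation code. The sketch below is illustrative only: it assumes a hypothetical in-memory list of (product, branch, timestamp, price) tuples and hypothetical helper names `monthly_totals` and `pct_change`; the February rows are invented so that a percentage change can be computed.

```python
from collections import defaultdict

# Hypothetical transactions (product, branch, timestamp, price). The first four
# rows mirror the table above; the February rows are invented for illustration.
TRANSACTIONS = [
    ("Coke (0.5 gallon)", "Convoy Street", "2006-03-01 09:00:01", 1.00),
    ("Pepsi (0.5 gallon)", "UTC", "2006-03-01 09:00:01", 1.03),
    ("Coke (1 gallon)", "UTC", "2006-03-01 09:00:02", 1.50),
    ("Altoids", "Costa Verde", "2006-03-01 09:01:33", 0.30),
    ("Coke (1 gallon)", "UTC", "2006-02-14 17:20:00", 1.20),
    ("Altoids", "UTC", "2006-02-20 08:05:10", 0.60),
]

def monthly_totals(rows):
    """Query 1: total sales for each (product, month) pair."""
    totals = defaultdict(float)
    for product, _branch, timestamp, price in rows:
        month = timestamp[:7]  # "YYYY-MM"
        totals[(product, month)] += price
    return dict(totals)

def pct_change(totals, product, prev_month, month):
    """Query 2: percentage change of one product's total between two months."""
    before = totals.get((product, prev_month), 0.0)
    after = totals.get((product, month), 0.0)
    if before == 0:
        return None  # no baseline in the previous month
    return 100.0 * (after - before) / before
```

For example, `pct_change(monthly_totals(TRANSACTIONS), "Coke (1 gallon)", "2006-02", "2006-03")` gives a 25% increase. Even this toy version has to scan every transaction to answer a question about two numbers, which is the workload OLAP systems are built for.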

Different Requirements

Aspect | OLTP (On-Line Transaction Processing) | OLAP (On-Line Analytical Processing)
Tasks | Day-to-day operation | High-level decision support
Size of database | Gigabytes | Terabytes
Time span | Recent, up-to-date | Spanning months / years
Size of working set | Tens of records, accessed through primary keys | Consolidated data from multiple databases
Workload | Structured / repetitive | Ad-hoc, exploratory queries
Performance goal | Transaction throughput | Query latency

Query Language Extensions

In the real world, data is stored in relational databases (RDBs).
How do we express N-dimensional problems using 2D tables?
Can we combine OLAP and SQL queries?

Jim Gray et al.: Data Cube: A Relational Aggregation Operator, 1997

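Query Language Extensions

Problems with GROUP BY (1): histograms. Standard GROUP BY cannot group on a value computed from each row, such as Population(city, state). The intended query

    SELECT population, SUM(sales) AS sales
    FROM sales_history
    GROUP BY Population(city, state) AS population

is not legal SQL-92; it has to be written with a nested query:

    SELECT population, SUM(sales) AS sales
    FROM (SELECT sales, Population(city, state) AS population
          FROM sales_history) AS t
    GROUP BY population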

Query Language Extensions

Problems with GROUP BY (2): roll-up / drill-down. A non-relational representation:

Product Category | Product Name | Month | Sales | Sales by Cat., by Name | Sales by Cat.
Drinks | Coke | Feb | 30.3 | |
 | | Mar | 93.9 | 124.2 |
 | Heineken | Feb | 34.8 | |
 | | Mar | 123.8 | 158.6 | 282.8

Query Language Extensions

Problems with GROUP BY (2), continued: a relational representation, but the roll-up is huge, since every row repeats its subtotals:

Product Category | Product Name | Month | Sales | Sales by Cat., by Name | Sales by Cat.
Drinks | Coke | Feb | 30.3 | 124.2 | 282.8
Drinks | Coke | Mar | 93.9 | 124.2 | 282.8
Drinks | Heineken | Feb | 34.8 | 158.6 | 282.8
Drinks | Heineken | Mar | 123.8 | 158.6 | 282.8

Query Language Extensions

Problems with GROUP BY (3): cross tabulations. 2-D aggregation is more compact and more natural:

Drinks | Feb | Mar | Total
Coke | 30.3 | 93.9 | 124.2
Heineken | 34.8 | 123.8 | 158.6
Total | 65.1 | 217.7 | 282.8

Query Language Extensions

Reducing the number of attributes: introduce a new value, ALL, which denotes the set over which we aggregate. The Total row and column of the cross tab become ALL, and the subtotal columns collapse into ordinary rows:

Product Category | Product Name | Month | Sales
Drinks | Coke | Feb | 30.3
Drinks | Coke | Mar | 93.9
Drinks | Coke | ALL | 124.2
Drinks | Heineken | Feb | 34.8
Drinks | Heineken | Mar | 123.8
Drinks | Heineken | ALL | 158.6
Drinks | ALL | ALL | 282.8
Drinks | ALL | Feb | 65.1
Drinks | ALL | Mar | 217.7

Query Language Extensions

General approach: GROUP BY (1D). Summing over the product names gives one total per month:

Month | Coke | Heineken | SUM
Feb | 30.3 | 34.8 | 65.1
Mar | 93.9 | 123.8 | 217.7

General approach: Cross Tab (2D), with ALL as the margin label:

Drinks | Feb | Mar | ALL
Coke | 30.3 | 93.9 | 124.2
Heineken | 34.8 | 123.8 | 158.6
ALL | 65.1 | 217.7 | 282.8

The corresponding relation has one (Product Category, Product Name, Month, Sales) row per cell of this table, nine in all, e.g. (Drinks, Coke, ALL, 124.2) and (Drinks, ALL, ALL, 282.8).

Query Language Extensions

General approach: Cube (3D). The faces of the cube hold the 2D aggregates: by category and name (does it make sense?), by category and month, and by month and name. Sample rows of the corresponding relation:

Product Category | Product Name | Month | Sales
Drinks | Coke | Feb | 30.3
Drinks | Coke | Mar | 93.9
Drinks | Coke | ALL | 124.2
Snacks | Doritos | Feb | ...
Snacks | Doritos | Mar | ...
Snacks | Doritos | ALL | ...
ALL | ALL | ALL | 964.0

Query Language Extensions

General approach: GROUP BY (1D), Cross Tab (2D), Cube (3D)

Any hypercube can be represented as a relation!

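Query Language Extensions

General approach: a CUBE relation with aggregation function f(·) contains, for each combination of dimension values, every variant in which any subset of the coordinates is replaced by ALL (2^n variants):

(x1, x2, ..., xn-1, xn, f(·))
(x1, x2, ..., xn-1, ALL, f(·))
(x1, x2, ..., ALL, xn, f(·))
...
(ALL, ALL, ..., ALL, ALL, f(·))

ROLLUP keeps only the variants in which a suffix of the coordinates is ALL, which reduces the output to a linear number of tuples (n + 1 variants instead of 2^n):

(x1, x2, ..., xn-1, xn, f(·))
(x1, x2, ..., xn-1, ALL, f(·))
(x1, x2, ..., ALL, ALL, f(·))
...
(ALL, ALL, ..., ALL, ALL, f(·))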
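
To make the contrast concrete, a minimal Python sketch can enumerate the grouping patterns each operator produces for n dimensions. The string "ALL" stands in for the special value, and the function names are hypothetical.

```python
from itertools import product

ALL = "ALL"

def cube_patterns(dims):
    """All 2^n ways of replacing a subset of the dimensions by ALL."""
    return [tuple(ALL if to_all else d for d, to_all in zip(dims, mask))
            for mask in product((False, True), repeat=len(dims))]

def rollup_patterns(dims):
    """Only the n + 1 patterns that replace a suffix of the dimensions by ALL."""
    n = len(dims)
    return [tuple(dims[:k]) + (ALL,) * (n - k) for k in range(n, -1, -1)]

dims = ("x1", "x2", "x3")
print(len(cube_patterns(dims)))   # 8
print(rollup_patterns(dims))
# [('x1', 'x2', 'x3'), ('x1', 'x2', 'ALL'), ('x1', 'ALL', 'ALL'), ('ALL', 'ALL', 'ALL')]
```

Every ROLLUP pattern is also a CUBE pattern; ROLLUP simply drops the combinations that do not follow the order of the listed dimensions, which is why it suits hierarchies such as month, day.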

Query Language Extensions

The new operators: CUBE, ROLLUP

    SELECT prod_category, prod_name, month, SUM(sales) AS sales
    FROM sales_history
    GROUP BY CUBE (prod_category, prod_name, month)

Product Category | Product Name | Month | Sales
Drinks | Coke | Feb | 30.3
Drinks | Coke | Mar | 93.9
Drinks | Coke | ALL | 124.2
...
Drinks | ALL | Feb | 99.8
...
ALL | ALL | ALL | 964.0

Idea: group by every subset of the CUBE list, union the aggregates, and introduce the ALL values. (In SQL:1999 and later, ALL is represented as NULL, and the GROUPING() function tells it apart from a genuine NULL.)
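
A deliberately naive Python sketch of the same idea: one scan per subset of the CUBE list, with the results unioned into a dictionary keyed by tuples that contain "ALL". The rows reuse the Coke/Heineken figures from the cross tab earlier; `group_by_cube` is a hypothetical name, and a real engine would avoid the 2^n separate scans.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"

def group_by_cube(rows, dims, measure):
    """SUM(measure) ... GROUP BY CUBE (dims), using ALL for aggregated columns.

    Follows the slide literally: group by every subset of the CUBE list
    (one scan of `rows` per subset), union the results, and fill the
    columns that were aggregated away with ALL.
    """
    result = defaultdict(float)
    for k in range(len(dims) + 1):
        for kept in combinations(range(len(dims)), k):
            for row in rows:
                key = tuple(row[d] if i in kept else ALL
                            for i, d in enumerate(dims))
                result[key] += row[measure]
    return dict(result)

ROWS = [
    {"prod_category": "Drinks", "prod_name": "Coke", "month": "Feb", "sales": 30.3},
    {"prod_category": "Drinks", "prod_name": "Coke", "month": "Mar", "sales": 93.9},
    {"prod_category": "Drinks", "prod_name": "Heineken", "month": "Feb", "sales": 34.8},
    {"prod_category": "Drinks", "prod_name": "Heineken", "month": "Mar", "sales": 123.8},
]

CUBE = group_by_cube(ROWS, ["prod_category", "prod_name", "month"], "sales")
```

With one category, two names and two months, the result has 2 × 3 × 3 = 18 rows, including (Drinks, Coke, ALL) = 124.2 and (ALL, ALL, ALL) = 282.8, matching the cross tab.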

Query Language Extensions

The new operators can be combined with ordinary grouping columns:

    SELECT prod_category, month, day, state, prod_name, SUM(sales) AS sales
    FROM sales_history
    GROUP BY prod_category, ROLLUP (month, day), CUBE (state, prod_name)

Product Category | Month | Day | State | Product Name | Sales
Drinks | Feb | 26 | CA | Coke | 12.3
Drinks | Feb | 26 | CA | Heineken | 5.4
...

Further rows cover other categories and products (e.g. Snacks, Doritos) and the subtotals, which carry ALL in the rolled-up or cubed columns.
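
In SQL:1999-style syntax, comma-separated grouping elements combine as a cross product of their grouping sets. The sketch below (hypothetical helper names) enumerates the grouping sets this query produces: 1 × 3 × 4 = 12.

```python
from itertools import product

def rollup_sets(cols):
    # ROLLUP (c1, ..., cn): the n + 1 prefixes, longest first
    return [tuple(cols[:k]) for k in range(len(cols), -1, -1)]

def cube_sets(cols):
    # CUBE (c1, ..., cn): all 2^n subsets, keeping the column order
    return [tuple(c for c, keep in zip(cols, mask) if keep)
            for mask in product((True, False), repeat=len(cols))]

def combine(*elements):
    # One grouping set per choice from each element, concatenated
    return [sum(choice, ()) for choice in product(*elements)]

GROUPING_SETS = combine([("prod_category",)],
                        rollup_sets(["month", "day"]),
                        cube_sets(["state", "prod_name"]))

for s in GROUPING_SETS:
    print(s)
```

Each printed tuple is one GROUP BY that the engine evaluates; the columns missing from a tuple are the ones reported as ALL in the result.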

Research Areas
- SQL language extensions
- Server architecture
- Parallel processing
- Index structures
- Materialized views

Simultaneous Multi-Dimensional Aggregates

Y. Zhao, P. Deshpande, J. Naughton: An Array-Based Algorithm for Simultaneous Multidimensional Aggregates, SIGMOD 1997
- Optimization to calculate multiple aggregates simultaneously
- Useful for materializing aggregate views

Multiple Aggregates

Transactional data (sample):

Product | City | Month | Sales
Coke | San Diego | Feb 06 | 12
Pepsi | Los Angeles | Feb 06 | 13
Doritos | San Diego | Mar 06 | 72
Altoids | San Diego | Mar 06 | 65
...

Aggregate on month and product:

Month / Product | Altoids | Coke | Doritos | Heineken | Pepsi | Pringles | Total
Feb | 36 | 37 | 21 | 44 | 31 | 37 | 206
Mar | 131 | 138 | 136 | 110 | 122 | 126 | 764
Total | 167 | 175 | 157 | 154 | 153 | 164 | 970

Aggregate on city and product:

City / Product | Altoids | Coke | Doritos | Heineken | Pepsi | Pringles | Total
San Diego | 90 | 89 | 74 | 74 | 68 | 73 | 469
Los Angeles | 77 | 86 | 83 | 80 | 85 | 90 | 501
Total | 167 | 175 | 157 | 154 | 153 | 164 | 970

Aggregate on month and city:

Month / City | San Diego | Los Angeles | Total
Feb | 112 | 95 | 206
Mar | 358 | 407 | 764
Total | 469 | 501 | 970

Multiple Aggregates

From the same transactional data, we want seven aggregates:
1. Sales by product / city
2. Sales by product / month
3. Sales by month / city
4. Sales by product
5. Sales by city
6. Sales by month
7. Sales (total)

Is it possible to make a single pass over the transactional table and calculate multiple aggregates simultaneously?
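
For data that fits in memory, the simplest answer is a hash-based scan that updates all seven aggregates for every row. This sketch is not the array-based algorithm from the paper: it keeps every group-by in memory at once, which is exactly the cost the chunked approach tries to reduce. The rows are the four sample tuples above; `all_aggregates` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("product", "city", "month")

def all_aggregates(rows):
    """One scan over (product, city, month, sales) rows, updating the seven
    group-bys listed above at the same time."""
    # Every proper subset of DIMS, from the grand total () up to pairs.
    groupbys = [g for k in range(len(DIMS))
                for g in combinations(range(len(DIMS)), k)]
    out = {tuple(DIMS[i] for i in g): defaultdict(int) for g in groupbys}
    for *values, sales in rows:          # a single pass over the table
        for g in groupbys:               # every aggregate updated per row
            name = tuple(DIMS[i] for i in g)
            out[name][tuple(values[i] for i in g)] += sales
    return out

ROWS = [("Coke", "San Diego", "Feb 06", 12),
        ("Pepsi", "Los Angeles", "Feb 06", 13),
        ("Doritos", "San Diego", "Mar 06", 72),
        ("Altoids", "San Diego", "Mar 06", 65)]

AGGS = all_aggregates(ROWS)
print(AGGS[()][()])                    # 162, the total
print(AGGS[("city",)][("San Diego",)]) # 149
```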

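Chunking

Partition the transactional data into array chunks. Each tuple becomes a cell of a 3D array with dimension A = Product, dimension B = City and dimension C = Month; e.g. the cell (Coke, San Diego, Feb 06) holds sales 12. The array is divided into small sub-arrays (array chunks), numbered in the order they are stored and read: 1-4 along dimension A, 5-8 in the next row along B, up to 16 on the first layer of C, and so on.

[Figure: the Product × City × Month array partitioned into numbered array chunks.]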
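
The mapping from a cell to its chunk can be sketched as follows, assuming a 16 × 16 × 16 array cut into 4 × 4 × 4 chunks (the sizes implied by the memory figures later in the deck), A varying fastest, and 1-based chunk numbers as in the figure. `chunk_of` is a hypothetical helper.

```python
def chunk_of(cell, dim_sizes=(16, 16, 16), chunk_sizes=(4, 4, 4)):
    """Map a cell (a, b, c) to (chunk number, offset inside the chunk).

    Dimension A varies fastest, then B, then C, so chunks 1-4 run along A,
    chunks 5-8 form the next row along B, and so on.
    """
    chunk_idx, offset = 0, 0
    chunk_stride, cell_stride = 1, 1
    for x, n, c in zip(cell, dim_sizes, chunk_sizes):
        if not 0 <= x < n:
            raise IndexError(f"coordinate {x} outside [0, {n})")
        chunk_idx += (x // c) * chunk_stride
        offset += (x % c) * cell_stride
        chunk_stride *= n // c  # number of chunks along this dimension
        cell_stride *= c        # number of cells per chunk along it
    return chunk_idx + 1, offset

print(chunk_of((4, 0, 0)))   # (2, 0): second chunk along A
print(chunk_of((0, 4, 0)))   # (5, 0): first chunk of the next row along B
```

Because a chunk is stored contiguously, a scan in chunk order touches each cell exactly once while keeping neighbouring values of all three dimensions together.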

Naïve Algorithm

Compute each 2D aggregate separately, with its own pass over the array chunks:
- Pivot on AB: aggregate over all of C
- Pivot on AC: aggregate over all of B
- Pivot on BC: aggregate over all of A
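
For comparison with the single-pass version, the naïve strategy can be sketched as one full scan per 2D aggregate. The sketch assumes the cube is already a dense nested list `arr[a][b][c]` and ignores chunking; the function names are hypothetical.

```python
def aggregate_out(arr, drop):
    """One full scan of arr, summing away dimension `drop` (0=A, 1=B, 2=C)."""
    sizes = [len(arr), len(arr[0]), len(arr[0][0])]
    keep = [d for d in range(3) if d != drop]
    out = [[0] * sizes[keep[1]] for _ in range(sizes[keep[0]])]
    for a in range(sizes[0]):
        for b in range(sizes[1]):
            for c in range(sizes[2]):
                idx = (a, b, c)
                out[idx[keep[0]]][idx[keep[1]]] += arr[a][b][c]
    return out

def naive(arr):
    # Three separate scans of the whole array, one per pivot
    return {"AB": aggregate_out(arr, 2),   # aggregate over all C
            "AC": aggregate_out(arr, 1),   # aggregate over all B
            "BC": aggregate_out(arr, 0)}   # aggregate over all A

ARR = [[[100 * a + 10 * b + c for c in range(4)] for b in range(3)] for a in range(2)]
RES = naive(ARR)
print(RES["AB"][1][2])   # 486
```

The cost is the problem: every additional aggregate means another full read of the data, which is what the single-pass algorithm avoids.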

Single Pass Algorithm

- Make a single pass over the data: read the array chunks once, in order 1, 2, 3, 4, 5, ...
- Simultaneously maintain multiple aggregates: each chunk is aggregated into AB, AC and BC at the same time.
- Write out completed aggregates: e.g. once chunks 1-4 (one full row along A) have been read, the BC chunk they project onto is complete and can be written out.
- Only allocate memory that is necessary: an aggregate chunk is allocated when the first chunk contributing to it is read, and freed once it has been written out.

[Figure: the numbered array chunks and their projections onto the AB, AC and BC planes.]
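
The steps above can be simulated in a short Python sketch. Assumptions: an n × n × n dense array, cs × cs × cs chunks read with A varying fastest, and an aggregate chunk counted as complete once all n/cs array chunks along the summed-away dimension have been read. It computes only AB, AC and BC; the function name `single_pass` is hypothetical.

```python
def single_pass(arr, cs):
    """Compute AB, AC and BC in one pass over an n x n x n array read in
    cs x cs x cs chunks (A varies fastest, then B, then C).

    Returns (written, peak): the written-out cells per aggregate and the
    largest number of aggregate cells held in memory at any one time.
    """
    n = len(arr)
    k = n // cs                                  # chunks per dimension
    drop = {"AB": 2, "AC": 1, "BC": 0}           # dimension summed away
    live = {g: {} for g in drop}                 # partial aggregate chunks in memory
    seen = {g: {} for g in drop}                 # array chunks merged into each
    written = {g: {} for g in drop}              # completed, written-out cells
    peak = {g: 0 for g in drop}

    for cc in range(k):                          # C varies slowest ...
        for bc in range(k):
            for ac in range(k):                  # ... A fastest
                chunk = (ac, bc, cc)
                for g, d in drop.items():
                    key = tuple(x for i, x in enumerate(chunk) if i != d)
                    part = live[g].setdefault(key, {})   # allocate on first use
                    for a in range(ac * cs, (ac + 1) * cs):
                        for b in range(bc * cs, (bc + 1) * cs):
                            for c in range(cc * cs, (cc + 1) * cs):
                                cell = tuple(x for i, x in enumerate((a, b, c)) if i != d)
                                part[cell] = part.get(cell, 0) + arr[a][b][c]
                    peak[g] = max(peak[g], len(live[g]) * cs * cs)
                    seen[g][key] = seen[g].get(key, 0) + 1
                    if seen[g][key] == k:        # all contributing chunks read
                        written[g].update(live[g].pop(key))  # write out, free
    return written, peak

ARR = [[[a + 2 * b + 3 * c for c in range(16)] for b in range(16)] for a in range(16)]
WRITTEN, PEAK = single_pass(ARR, 4)
print(PEAK)   # {'AB': 256, 'AC': 64, 'BC': 16}
```

The peak values show why the read order matters: the aggregate that drops the slowest-varying dimension (AB) must stay resident the longest.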

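Single Pass Algorithm

Memory needed per group-by (in cells) when a 16 × 16 × 16 array is read in 4 × 4 × 4 array chunks:

Group-by | Memory
ABC | 4 × 4 × 4 (one array chunk)
AB | 16 × 4 × 4 = 256
AC | 4 × 4 × 4 = 64
BC | 4 × 4 = 16
A | 4 × 4 = 16
B | 4
C | 4
all | 1

Minimum memory spanning tree: each group-by is computed from the parent that requires the least memory.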
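
The AB, AC and BC figures follow from a simple rule, sketched here under the same assumptions (dimension 0 varies fastest; only group-bys computed directly from ABC): dimensions read faster than the summed-away one must be held at full size, slower ones only at chunk size. `child_memory` is a hypothetical name.

```python
from math import prod

def child_memory(dim_sizes, chunk_sizes, drop):
    """Cells needed to compute, in one pass over the chunked array, the
    group-by that sums away dimension `drop` (dimension 0 varies fastest).

    Dimensions read faster than `drop` are kept at full size; slower ones
    need only one chunk's extent.
    """
    return prod(dim_sizes[i] if i < drop else chunk_sizes[i]
                for i in range(len(dim_sizes)) if i != drop)

DIMS, CHUNKS = (16, 16, 16), (4, 4, 4)
for name, drop in (("AB", 2), ("AC", 1), ("BC", 0)):
    print(name, child_memory(DIMS, CHUNKS, drop))
# AB 256
# AC 64
# BC 16
```

Together the three children need 256 + 64 + 16 = 336 cells, far less than the 4,096 cells of the whole array; choosing the dimension order and each node's parent to keep such sums small is what the minimum memory spanning tree is for.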

Multi Pass Algorithm

Recursively aggregate down the lattice, each level computed from the one above:
ABCD
ABC   ABD   ACD   BCD
AB   AC   AD   BC   BD   CD
A   B   C   D
all
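
A simplified, in-memory sketch of recursive aggregation: every group-by is computed from an already computed parent with one more dimension (here the parent with the fewest cells), never from the raw data. It ignores memory limits; as we read the slide, the multi-pass variant is for the case where the whole tree of aggregates does not fit in memory and must be computed in several passes. Names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def aggregate(table, parent, child):
    """Sum a group-by over dimensions `parent` down to dimensions `child`."""
    pos = [parent.index(d) for d in child]
    out = defaultdict(int)
    for key, value in table.items():
        out[tuple(key[i] for i in pos)] += value
    return dict(out)

def full_cube(base, dims):
    """Compute every group-by of `dims`, level by level, each one from the
    smallest already-computed parent instead of from the base data."""
    results = {tuple(dims): base}
    for k in range(len(dims) - 1, -1, -1):
        for child in combinations(dims, k):
            parents = [p for p in results
                       if len(p) == k + 1 and set(child) <= set(p)]
            parent = min(parents, key=lambda p: len(results[p]))
            results[child] = aggregate(results[parent], parent, child)
    return results

BASE = {(a, b, c, d): 1 for a in range(2) for b in range(3)
        for c in range(2) for d in range(2)}
CUBE = full_cube(BASE, ("A", "B", "C", "D"))
print(len(CUBE))        # 16 group-bys, from ABCD down to all
print(CUBE[()][()])     # 24
```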

Implementing Data Cubes

The biggest problem for data warehouses is their size.
Space / time trade-off: queries are accelerated by materializing the cube, but then the stored relations get even bigger.
- M(ultidimensional)OLAP: good query performance, but poor scalability
- R(elational)OLAP: very scalable; query performance is improved by materializing (partial) results

Implementing Data Cubes

V. Harinarayan, A. Rajaraman, J. D. Ullman: Implementing Data Cubes Efficiently, SIGMOD 1996
Presents a materialization strategy for the cells of the cube.

Implementing Data Cubes

[Figure: example schema with a Sales fact table (Time Id, City Id, Product Id, Sales) and dimension tables for time (Day, Week, Month, Year), city (City, State) and product (Name, Category; Category Id, Category Name).]

Implementing Data Cubes

The problem is cast as a particular case of rewriting queries using views: deciding which cells to materialize amounts to deciding which SQL views to materialize.

Lattice of views (p = product, t = time, c = city):
ptc
pt   pc   tc
p    t    c
none

Simple idea: Q1 depends on Q2 (Q1 ⪯ Q2) if Q1 can be fully answered using the results of Q2.

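Implementing Data Cubes

But cube dimensions are usually hierarchical:
- product: product_name → product_category → none
- time: day → week → none, and day → month → year → none
- city: city → state → none

The lattice of views is then the direct product of these hierarchies: each view picks one level per dimension, e.g. ptc (product, time, city), pmc (product, month, city) or ps (product, state).

p = product, t = time, c = city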
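
The dependence relation on the direct-product lattice can be sketched by encoding each hierarchy as a small graph of roll-ups. The encoding follows the hierarchies listed above; the names `HIERARCHIES`, `VIEWS` and `depends_on` are hypothetical.

```python
from itertools import product

# Each hierarchy maps a level to the coarser levels it rolls up to directly.
PRODUCT = {"product_name": ["product_category"], "product_category": ["none"], "none": []}
TIME = {"day": ["week", "month"], "week": ["none"], "month": ["year"],
        "year": ["none"], "none": []}
CITY = {"city": ["state"], "state": ["none"], "none": []}
HIERARCHIES = [PRODUCT, TIME, CITY]

def reachable(hierarchy, level):
    """All levels that can be computed from `level`, including itself."""
    seen, stack = {level}, [level]
    while stack:
        for up in hierarchy[stack.pop()]:
            if up not in seen:
                seen.add(up)
                stack.append(up)
    return seen

# A view picks one level per dimension: 3 x 5 x 3 = 45 views.
VIEWS = list(product(*(h.keys() for h in HIERARCHIES)))

def depends_on(q1, q2):
    """q1 can be answered from q2 iff each level of q1 is a roll-up of
    (or equal to) the matching level of q2."""
    return all(l1 in reachable(h, l2) for h, l1, l2 in zip(HIERARCHIES, q1, q2))

print(len(VIEWS))  # 45
print(depends_on(("product_name", "month", "city"),
                 ("product_name", "week", "city")))  # False: weeks do not nest in months
```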

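Implementing Data Cubes

Definition: the cost of answering Q is the number of rows in the table of the materialized ancestor of Q used to answer it (the smallest one, if several are materialized). It can be estimated without materializing the views.

Assume that every query is identical to some view in the lattice.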

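Implementing Data Cubes

For a set S of materialized views and a candidate view v, the benefit of adding v to S is

$B(v, S) = \sum_{w \preceq v} \max\{C_S(w) - |v|,\ 0\}$

where $C_S(w)$ is the cost of answering w from S (the size of the smallest view in S that w depends on) and $|v|$ is the number of rows of v.

Greedy algorithm for selecting k views to materialize from the lattice:
1. S := {top view}
2. For i = 1 to k: add to S the view v ∉ S that maximizes B(v, S).

The benefit achieved by the greedy algorithm is at least (e − 1)/e ≈ 0.63 of the optimal benefit.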
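
A minimal sketch of the greedy selection on the ptc lattice, using the linear cost model above. The view sizes are invented for illustration (not taken from the paper), views are identified by their attribute letters, and the function names are hypothetical.

```python
VIEWS = ["ptc", "pt", "pc", "tc", "p", "t", "c", "none"]

# Hypothetical view sizes (rows); a view is never larger than its parents.
SIZES = {"ptc": 100, "pt": 50, "pc": 75, "tc": 20,
         "p": 30, "t": 5, "c": 10, "none": 1}

def attrs(v):
    return set() if v == "none" else set(v)

def depends(w, v):
    """w can be answered from v iff w's attributes are a subset of v's."""
    return attrs(w) <= attrs(v)

def cost(w, S, size):
    """Linear cost model: rows of the smallest materialized view answering w."""
    return min(size[u] for u in S if depends(w, u))

def benefit(v, S, size):
    return sum(max(cost(w, S, size) - size[v], 0) for w in VIEWS if depends(w, v))

def greedy(size, k, top="ptc"):
    S = [top]                                  # the top view is always stored
    for _ in range(k):
        candidates = [v for v in VIEWS if v not in S]
        S.append(max(candidates, key=lambda v: benefit(v, S, size)))
    return S[1:]

print(greedy(SIZES, 3))   # ['tc', 'pt', 't']
```

In the first round, tc wins with benefit 4 × (100 − 20) = 320, since it cuts the cost of tc, t, c and none from 100 rows to 20.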

Discussion
Questions from the audience
