Data Cube
Online Analytic Processing (OLAP)
• OLAP: Online Analytic Processing
• OLAP queries are complex queries that
– Touch large amounts of data
– Discover patterns and trends in the data
– Are typically expensive and take a long time
• Also called decision-support queries
• In contrast to OLAP:
– OLTP: Online Transaction Processing
– OLTP queries are simple queries, e.g., over banking or airline systems
– OLTP queries touch a small amount of data for fast transactions
– Example OLTP query: Select salary From Emp Where ID = 100;
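To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The Emp table layout (the dept column in particular) and all row values are assumptions made up for illustration; only the salary lookup by ID comes from the slide.

```python
import sqlite3

# Hypothetical in-memory Emp table; the dept column and all values are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (ID INTEGER PRIMARY KEY, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?)",
    [(100, "sales", 50000.0), (101, "sales", 70000.0), (102, "ops", 60000.0)],
)

# OLTP-style query: a point lookup that touches a single row.
oltp_row = conn.execute("SELECT salary FROM Emp WHERE ID = 100").fetchone()

# OLAP-style query: scans the whole table and aggregates it per department.
olap_rows = conn.execute(
    "SELECT dept, AVG(salary) FROM Emp GROUP BY dept ORDER BY dept"
).fetchall()

print(oltp_row)
print(olap_rows)
```

The first query can be answered from one row via the primary key, while the second has to read every row before it can report anything, which is why OLAP queries grow expensive on large tables.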
OLTP vs. OLAP
On-Line Transaction Processing (OLTP):
– technology used to perform updates on operational
or transactional systems (e.g., point of sale systems)
[Figure: data warehouse architecture. Labels: internal sources, operational DBs, external sources, data integration component, data warehouse, metadata, OLAP server, query and analysis component, client tools, reports, data mining.]
OLAP AND DATA WAREHOUSE
• Typically, OLAP queries are executed over a separate copy
of the working data, i.e., over a data warehouse
EXAMPLE OLAP APPLICATIONS
• Market Analysis
• Which items are frequently sold over the summer
but not over the winter?
[Figure: example data cube with axes labeled gender, age, and accidents.]
DATA CUBES
• A data cube is a structure that enables OLAP to
achieve multidimensional functionality.
• Some dimensions can have multiple levels forming
a hierarchy. For example:
– dates have year, month, day;
– geography has country, region, city;
– product might have category, subcategory, and
the product.
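A dimension hierarchy lets the same facts be summed at different levels. The sketch below rolls a handful of sales up the date hierarchy (day, month, year); the fact rows and the roll_up helper are hypothetical and exist only to illustrate the idea.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows: (sale date, amount). The values are made up.
sales = [
    (date(2023, 1, 5), 10.0),
    (date(2023, 1, 20), 5.0),
    (date(2023, 2, 3), 7.0),
    (date(2024, 1, 9), 4.0),
]

def roll_up(rows, level):
    """Sum amounts at one level of the date hierarchy: 'day', 'month', or 'year'."""
    key_of = {
        "day": lambda d: (d.year, d.month, d.day),
        "month": lambda d: (d.year, d.month),
        "year": lambda d: (d.year,),
    }[level]
    totals = defaultdict(float)
    for day, amount in rows:
        totals[key_of(day)] += amount
    return dict(totals)

by_month = roll_up(sales, "month")
by_year = roll_up(sales, "year")
print(by_month)
print(by_year)
```

Moving from by_year to by_month is a drill-down along the hierarchy; moving the other way is a roll-up.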
Dimensions and Measures
Data Cube Concepts
• Three important concepts associated
with data cubes:
1. Slicing.
2. Dicing.
3. Rotating.
Slicing
• The term slice most often refers to a
two-dimensional page selected from
the cube.
[Figure: Slicing, Wireless Mouse]
[Figure: Slicing, Asia]
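As a rough sketch of slicing, the cube below is a plain dictionary keyed by (product, region, quarter). The quarter dimension, the Keyboard product, and every number are assumptions added for illustration; only the "Wireless Mouse" and "Asia" slices are named on the slides.

```python
# Hypothetical cube: (product, region, quarter) -> units sold. All values are made up.
cube = {
    ("Wireless Mouse", "Asia", "Q1"): 120,
    ("Wireless Mouse", "Asia", "Q2"): 150,
    ("Wireless Mouse", "Europe", "Q1"): 90,
    ("Keyboard", "Asia", "Q1"): 60,
    ("Keyboard", "Europe", "Q2"): 75,
}
DIMS = ("product", "region", "quarter")

def slice_cube(cells, dim, value):
    """Fix one dimension to a single value and drop it from the cell keys."""
    axis = DIMS.index(dim)
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cells.items()
        if key[axis] == value
    }

mouse_page = slice_cube(cube, "product", "Wireless Mouse")  # region x quarter page
asia_page = slice_cube(cube, "region", "Asia")              # product x quarter page
print(mouse_page)
print(asia_page)
```

Each result has one dimension fewer than the cube, which is what makes a slice of a three-dimensional cube a two-dimensional page.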
Dicing
• A related operation to slicing.
Rotating
• For example …
– rotating may consist of swapping the rows and
columns, or moving one of the row dimensions
into the column dimension
– or swapping an off-spreadsheet dimension with
one of the dimensions in the page display
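The slides describe dicing only as related to slicing. The sketch below assumes the common reading in which a dice restricts one or more dimensions to a subset of values without dropping any dimension. Rotating is shown as swapping the row and column dimensions of a two-dimensional page. The cube contents and the helper names dice and rotate are hypothetical.

```python
# Hypothetical cube: (product, region, quarter) -> units sold. All values are made up.
cube = {
    ("Mouse", "Asia", "Q1"): 120,
    ("Mouse", "Asia", "Q2"): 150,
    ("Mouse", "Europe", "Q1"): 90,
    ("Keyboard", "Asia", "Q1"): 60,
    ("Keyboard", "Europe", "Q2"): 75,
}

def dice(cells, allowed):
    """Keep cells whose coordinate on every listed axis is in the allowed set.

    `allowed` maps an axis index to a set of permitted values. Unlike a slice,
    no dimension is dropped, so the result is a smaller cube.
    """
    return {
        key: measure
        for key, measure in cells.items()
        if all(key[axis] in values for axis, values in allowed.items())
    }

def rotate(page):
    """Swap the row and column dimensions of a two-dimensional page."""
    return {(col, row): measure for (row, col), measure in page.items()}

sub_cube = dice(cube, {1: {"Asia"}, 2: {"Q1"}})

# A region x quarter page for one product, then the same page as quarter x region.
page = {(region, quarter): m for (prod, region, quarter), m in cube.items() if prod == "Mouse"}
rotated = rotate(page)
print(sub_cube)
print(rotated)
```

Rotating changes only how the page is laid out; the measures in the cells stay the same.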
Dimensions
• A dimension represents descriptive categories of data,
such as time or location.
Categories
• A child category is the next lower-level category in a
drill-down path.
Measures
• The measures are the actual data values
that occupy the cells as defined by the
dimensions selected.
• Measures include facts or variables
typically stored as numerical fields.
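One way to picture the relationship: dimension values form the address of a cell, and the measures are the numbers stored at that address. The fact rows, the column names (year, city, units, revenue), and the choice of summing are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: two dimension columns followed by two numeric measures.
facts = [
    # (year, city, units, revenue)
    (2023, "Cairo", 3, 30.0),
    (2023, "Cairo", 2, 20.0),
    (2023, "Paris", 1, 15.0),
    (2024, "Paris", 4, 60.0),
]

# Each cell is addressed by its dimension values and holds the summed measures.
cells = defaultdict(lambda: {"units": 0, "revenue": 0.0})
for year, city, units, revenue in facts:
    cell = cells[(year, city)]
    cell["units"] += units
    cell["revenue"] += revenue

cells = dict(cells)
print(cells[(2023, "Cairo")])
```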
Computed versus Stored Data Cubes
• The goal is to retrieve the information
from the data cube in the most efficient
way possible.
• Three possible solutions are:
– Pre-compute all cells in the cube.
– Pre-compute no cells.
– Pre-compute some of the cells.
• If the whole cube is pre-computed:
– Advantage
• queries run on the cube will be very
fast.
– Disadvantage
• the pre-computed cube requires a lot of
memory.
• To minimize memory requirements, we can
pre-compute none of the cells in the cube.
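The two extremes can be sketched side by side. The code below assumes a cube built by summing one measure over every combination of kept and aggregated dimensions; the fact rows, the "*" marker for an aggregated dimension, and both function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (product, region, quarter, units). Values are made up.
facts = [
    ("Mouse", "Asia", "Q1", 120),
    ("Mouse", "Asia", "Q2", 150),
    ("Mouse", "Europe", "Q1", 90),
    ("Keyboard", "Asia", "Q1", 60),
]
N_DIMS = 3
ALL = "*"  # marker meaning "aggregated over this dimension"

def precompute_all(rows):
    """Materialize every cell of every group-by (2**N_DIMS groupings)."""
    cube = defaultdict(int)
    for size in range(N_DIMS + 1):
        for kept in combinations(range(N_DIMS), size):
            for row in rows:
                key = tuple(row[i] if i in kept else ALL for i in range(N_DIMS))
                cube[key] += row[-1]
    return dict(cube)

def compute_on_demand(rows, key):
    """Pre-compute nothing: scan the fact rows each time a cell is requested."""
    return sum(
        row[-1]
        for row in rows
        if all(k == ALL or k == v for k, v in zip(key, row))
    )

full_cube = precompute_all(facts)
query = ("Mouse", ALL, "Q1")
print(full_cube[query], compute_on_demand(facts, query))
print(len(facts), len(full_cube))
```

Both approaches return the same answer for the query, but they pay for it differently: the pre-computed cube answers with a single dictionary lookup and stores 20 cells for these 4 fact rows, while the on-demand version stores nothing extra and rescans the facts on every query.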