Data Cube

OLAP (Online Analytical Processing) allows users to perform complex analysis on large amounts of data in order to discover patterns and trends. OLAP queries are run against a data warehouse that collects and organizes data from multiple sources. Data warehouses use a multi-dimensional model with measures and dimensions to enable users to analyze data from different perspectives.

Uploaded by

GauravBhatt
Copyright
© All Rights Reserved

DATA CUBES
Online Analytic Processing (OLAP)

OLAP
• OLAP: Online Analytic Processing
• OLAP queries are complex queries that
  • Touch large amounts of data
  • Discover patterns and trends in the data
  • Are typically expensive and take a long time
• Also called decision-support queries
• In contrast to OLAP:
  • OLTP: Online Transaction Processing
  • OLTP queries are simple queries, e.g., over banking or airline systems
  • OLTP queries touch a small amount of data for fast transactions
  • Example OLTP query: Select salary From Emp Where ID = 100;
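To make the contrast concrete, the sketch below runs one query of each kind with Python's built-in sqlite3 module. The Emp table and the ID = 100 lookup come from the slide; the dept column, the sample rows, and the aggregate query are assumptions added for illustration.

```python
import sqlite3

# In-memory database; the Emp rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (ID INTEGER PRIMARY KEY, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO Emp (ID, dept, salary) VALUES (?, ?, ?)",
    [(100, "Sales", 5000.0), (101, "Sales", 7000.0), (102, "IT", 6000.0)],
)

# OLTP-style query: touches a single row through its key.
oltp_row = conn.execute("SELECT salary FROM Emp WHERE ID = 100").fetchone()

# OLAP-style query: scans the whole table and aggregates it.
olap_rows = conn.execute(
    "SELECT dept, AVG(salary) FROM Emp GROUP BY dept ORDER BY dept"
).fetchall()

print(oltp_row)
print(olap_rows)
```

On a real warehouse the second query would scan far more rows than the first, which is why the two workloads are usually kept apart.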
OLTP vs. OLAP
• On-Line Transaction Processing (OLTP):
– technology used to perform updates on operational or transactional systems (e.g., point of sale systems)
• On-Line Analytical Processing (OLAP):
– technology used to perform complex analysis of the data in a data warehouse

OLAP is a category of software technology that enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the dimensionality of the enterprise as understood by the user.
[source: OLAP Council: www.olapcouncil.org]
OLAP AND DATA WAREHOUSE
[Figure: data warehouse diagram. Labels: Internal Sources, Operational DBs, External Sources, Data Integration Component, Data Warehouse, Meta data, OLAP Server, Query and Analysis Component, Reports, OLAP, Data Mining, Client Tools]
OLAP AND DATA WAREHOUSE
• Typically, OLAP queries are executed over a separate copy of the working data
  • Over a data warehouse
• The data warehouse is periodically updated, e.g., overnight
  • OLAP queries tolerate such out-of-date gaps
• Why run OLAP queries over a data warehouse?
  • The warehouse collects and combines data from multiple sources
  • The warehouse may organize the data in certain formats to support OLAP queries
  • OLAP queries are complex and touch large amounts of data
    • They may lock the database for long periods of time
    • This negatively affects all other OLTP transactions
OLAP ARCHITECTURE
[Figure: OLAP architecture diagram]
EXAMPLE OLAP APPLICATIONS
• Market Analysis
  • Find which items are frequently sold over the summer but not over the winter
• Credit Card Companies
  • Given a new applicant, is (s)he credit-worthy?
  • Need to check other similar applicants (age, gender, income, etc.) and observe how they perform, then make a prediction for the new applicant

OLAP queries are also called "decision-support" queries
MULTI-DIMENSIONAL VIEW
• Data is typically viewed as points in multi-dimensional space
[Figure: raw data cube with dimensions Items (bread, orange juice, milk 2% fat, milk 1% fat), Location (NY, MA, CA), and Time (3/1, 3/2, 3/3, 3/4); cell values shown include 10, 47, 30, and 12]
• Raw data cubes (raw level without aggregation)
• Typical OLAP applications have many dimensions
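One possible way to hold such a cube in memory is a mapping from coordinates to the measure. The sketch below is only an illustration: the dimension members come from the figure labels, while the specific coordinates, the extra cell, and the total_by helper are assumptions.

```python
from collections import defaultdict

# Each cell of the raw cube is addressed by (item, location, date).
# The coordinates and quantities are illustrative, not read from the figure.
cube = {
    ("bread", "NY", "3/1"): 10,
    ("orange juice", "MA", "3/2"): 47,
    ("milk 2% fat", "CA", "3/3"): 30,
    ("milk 1% fat", "NY", "3/4"): 12,
    ("bread", "CA", "3/1"): 5,
}

def total_by(cube, axis):
    """Aggregate the measure along one dimension (0=item, 1=location, 2=date)."""
    totals = defaultdict(int)
    for coords, value in cube.items():
        totals[coords[axis]] += value
    return dict(totals)

by_location = total_by(cube, 1)
by_item = total_by(cube, 0)
```

Adding a dimension only lengthens the coordinate tuple, which is one reason this representation extends naturally to cubes with many dimensions.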
ANOTHER EXAMPLE
[Figure: cube with dimensions gender, age, and accidents]
DATA CUBES
• A data cube is a structure that enables OLAP to achieve multidimensional functionality.
• The data cube is used to represent data along some measure of interest.
• Data cubes are an easy way to look at the data (they allow us to look at complex data in a simple format).
• Although called a "cube", it can be 2-dimensional, 3-dimensional, or higher-dimensional.
DATA CUBES
• Database design is for OLTP and efficiency in data storage.
• Data cube design is for efficiency in data retrieval (ensures report optimization).
• The cube is comparable to a table in a relational database.
Dimensions, Measures and Hierarchies
• Data cubes have categories of data called dimensions and measures.
• Measure
– represents some fact (or number) such as cost or units of service.
• Dimension
– represents descriptive categories of data such as time or location.
Hierarchy
Some dimensions can have multiple levels forming a hierarchy.
For example, dates have year, month, day; geography has country, region, city; product might have category, subcategory, and the product.
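A hierarchy lets the same measure be summed at coarser levels. The sketch below rolls hypothetical daily figures up the day, month, year hierarchy of a time dimension; the data, the LEVELS table, and the roll_up function are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales; the measure is units sold.
daily_units = {
    date(2023, 3, 1): 10,
    date(2023, 3, 2): 47,
    date(2023, 4, 1): 30,
    date(2024, 1, 5): 12,
}

# Each level of the time hierarchy maps a day to its category at that level.
LEVELS = {
    "day": lambda d: d.isoformat(),
    "month": lambda d: f"{d.year}-{d.month:02d}",
    "year": lambda d: str(d.year),
}

def roll_up(facts, level):
    """Sum the measure over all days that share a category at the given level."""
    to_category = LEVELS[level]
    totals = defaultdict(int)
    for day, units in facts.items():
        totals[to_category(day)] += units
    return dict(totals)

by_month = roll_up(daily_units, "month")
by_year = roll_up(daily_units, "year")
```

Moving from by_year back to by_month is the drill-down direction; both views are derived from the same day-level facts.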
Dimensions and Measures
Data Cubes Concepts
• Three important concepts associated with data cubes:
1. Slicing.
2. Dicing.
3. Rotating.
Slicing
• The term slice most often refers to a two-dimensional page selected from the cube.
• A slice is a subset of a multidimensional array corresponding to a single value for one or more members of the dimensions not in the subset.
Slicing
[Figure: slicing on Wireless Mouse]
Slicing
[Figure: slicing on Asia]
Dicing
• A related operation to slicing.
• In the case of dicing, we define a subcube of the original space.
• Dicing provides the smallest available slice.
Dicing
EXAMPLE:
SELECT PRODUCT, SUM(REVENUE) FROM SALES
WHERE PRODUCT = 'OPV' GROUP BY PRODUCT; -- Slicing
SELECT PRODUCT, SUM(REVENUE) FROM SALES
WHERE PRODUCT = 'EL' AND LOCATION = 'EUROPE'
GROUP BY PRODUCT; -- Dicing
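The two queries can be tried out with Python's built-in sqlite3 module. This is a minimal sketch that assumes a SALES table with columns PRODUCT, LOCATION, and REVENUE; the rows are invented, and only the values 'OPV', 'EL', and 'EUROPE' come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALES (PRODUCT TEXT, LOCATION TEXT, REVENUE REAL)")
conn.executemany(
    "INSERT INTO SALES VALUES (?, ?, ?)",
    [
        ("OPV", "EUROPE", 100.0),
        ("OPV", "ASIA", 50.0),
        ("EL", "EUROPE", 70.0),
        ("EL", "EUROPE", 30.0),
        ("EL", "ASIA", 20.0),
    ],
)

# Slice: fix one dimension (PRODUCT) to a single value.
slice_rows = conn.execute(
    "SELECT PRODUCT, SUM(REVENUE) FROM SALES "
    "WHERE PRODUCT = ? GROUP BY PRODUCT",
    ("OPV",),
).fetchall()

# Dice: restrict two dimensions (PRODUCT and LOCATION) at once.
dice_rows = conn.execute(
    "SELECT PRODUCT, SUM(REVENUE) FROM SALES "
    "WHERE PRODUCT = ? AND LOCATION = ? GROUP BY PRODUCT",
    ("EL", "EUROPE"),
).fetchall()
```

With the sample rows, the slice sums both OPV rows, while the dice keeps only the EL rows recorded for EUROPE.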
Usage
Slice selects a single value on one dimension of a given cube and provides a new subcube.
Dice selects values on two or more dimensions of a given cube and provides a new subcube.
Rotating
• Sometimes called pivoting.
• Rotating changes the dimensional orientation of the report from the cube data.
• For example:
– rotating may consist of swapping the rows and columns, or moving one of the row dimensions into the column dimension
– or swapping an off-spreadsheet dimension with one of the dimensions in the page display
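The simplest case, swapping rows and columns, can be sketched on a small two-dimensional report. The nested-dict layout, the sample values, and the rotate function below are assumptions chosen for illustration.

```python
# A 2-D report: rows are products, columns are regions (hypothetical values).
report = {
    "Mouse": {"Asia": 5, "Europe": 7},
    "Keyboard": {"Asia": 3, "Europe": 9},
}

def rotate(table):
    """Swap the row and column dimensions of a nested-dict report."""
    rotated = {}
    for row_key, row in table.items():
        for col_key, value in row.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated

by_region = rotate(report)
```

Rotating changes only the orientation: every cell value is preserved, and rotating twice returns the original report.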
Dimensions
• Dimensions represent descriptive categories of data such as time or location.
• Each dimension includes different levels of categories.
Categories
• A category is an item that matches a specific description or classification, such as years in a time dimension.
• Categories can be at different levels of information within a dimension.
Categories
• Parent category
– is the next higher level of another category in a drill-up path.
• Child category
– is the next lower level category in a drill-down path.
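Parent and child categories can be modelled as a simple parent map. The sketch below uses a hypothetical geography dimension; the category names and both helper functions are assumptions for illustration.

```python
# Hypothetical geography dimension: each category points to its parent.
PARENT = {
    "Paris": "France",
    "Lyon": "France",
    "Tokyo": "Japan",
    "France": "Europe",
    "Japan": "Asia",
}

def drill_up_path(category):
    """List the category followed by its ancestors, lowest level first."""
    path = [category]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def children(category):
    """Return the categories one level below, i.e. one drill-down step."""
    return sorted(child for child, parent in PARENT.items() if parent == category)

paris_path = drill_up_path("Paris")
france_children = children("France")
```

Here "France" is the parent category of "Paris" and a child category of "Europe", so the same category can play both roles depending on the direction of travel.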
Measures
• The measures are the actual data values that occupy the cells as defined by the dimensions selected.
• Measures include facts or variables typically stored as numerical fields.
Computed versus Stored Data Cubes
• The goal is to retrieve the information from the data cube in the most efficient way possible.
• Three possible solutions are:
– Pre-compute all cells in the cube.
– Pre-compute no cells.
– Pre-compute some of the cells.
Computed versus Stored Data Cubes
• If the whole cube is pre-computed
– Advantage
  • the queries run on the cube will be very fast.
– Disadvantage
  • the pre-computed cube requires a lot of memory.
Computed versus Stored Data Cubes
• To minimize memory requirements, we can pre-compute none of the cells in the cube.
• But the queries on the cube will run more slowly.
• As a compromise between these two, we can pre-compute only those cells in the cube which will most likely be used for decision support queries.
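The compromise can be sketched as a lookup that answers from stored aggregates when they exist and falls back to scanning the facts otherwise. Everything in this example is an assumption for illustration: the fact rows, the choice of which group-by to store, and the function names.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, revenue).
FACTS = [
    ("Mouse", "Asia", 5),
    ("Mouse", "Europe", 7),
    ("Keyboard", "Asia", 3),
    ("Keyboard", "Europe", 9),
]
DIMS = {"product": 0, "region": 1}

def aggregate(group_by):
    """Scan all facts and sum revenue per combination of the given dimensions."""
    totals = defaultdict(int)
    for row in FACTS:
        key = tuple(row[DIMS[d]] for d in group_by)
        totals[key] += row[2]
    return dict(totals)

# Pre-compute only the group-bys expected to be queried often (an assumption).
PRECOMPUTED = {("product",): aggregate(("product",))}

def query(group_by):
    """Answer from the stored aggregate if present, else compute on demand."""
    if group_by in PRECOMPUTED:
        return PRECOMPUTED[group_by], "stored"
    return aggregate(group_by), "computed"

by_product, product_source = query(("product",))
by_region, region_source = query(("region",))
```

Storing every group-by would make all queries take the fast path at the cost of memory; storing none would send every query through the full scan.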
Representation of Totals
• A simple data cube does not contain totals.
• Storing totals increases the size of the data cube but can also decrease the time needed to answer total-based queries.
• A simple way to represent totals is to add an additional layer on n sides of the n-dimensional data cube.
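One way to picture the extra layer is an additional member on each dimension that holds the sum over that dimension. The sketch below assumes a small two-dimensional cube and uses the label "ALL" for the totals member; both the data and the label are hypothetical.

```python
from collections import defaultdict
from itertools import product

ALL = "ALL"  # hypothetical label for the extra totals member

# Hypothetical 2-D cube: (product, region) -> revenue.
cube = {
    ("Mouse", "Asia"): 5,
    ("Mouse", "Europe"): 7,
    ("Keyboard", "Asia"): 3,
    ("Keyboard", "Europe"): 9,
}

def with_totals(cube):
    """Return a copy of the cube with an ALL member added on every dimension."""
    extended = defaultdict(int)
    for coords, value in cube.items():
        # Each coordinate either stays as-is or collapses to ALL.
        for mask in product([False, True], repeat=len(coords)):
            key = tuple(ALL if m else c for c, m in zip(coords, mask))
            extended[key] += value
    return dict(extended)

totals_cube = with_totals(cube)
```

The 2 x 2 cube grows to 3 x 3 = 9 cells, which shows the size increase, while a total such as totals_cube[("Mouse", ALL)] becomes a single lookup instead of a sum over cells.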