1.6 Efficient Data Cube Computation & Indexing OLAP


SRI KRISHNA COLLEGE OF ENGINEERING AND TECHNOLOGY

M.Tech. Computer Science and Engineering

21CSI501 DATA WAREHOUSING AND MINING

MODULE 1

1.6 DATA WAREHOUSE IMPLEMENTATION


Efficient data cube computation – Indexing OLAP data

Faculty - Dr.D.Prabha
DATA WAREHOUSE IMPLEMENTATION

• Data warehouses contain huge volumes of data.
• Business people use data warehousing to make decisions from historical data.
• To support those decisions, their queries must be answered in the order of seconds.
• It is therefore crucial for data warehouse systems to support highly efficient cube computation techniques.
Efficient Data Cube Computation

• Multidimensional data analysis depends on the efficient computation of aggregations across many sets of dimensions.
• In SQL, these aggregations are referred to as group-by's.
• A cuboid holds the aggregates for one particular combination of dimensions.
• Each group-by can be represented by a cuboid, and the set of group-by's forms a lattice of cuboids defining a data cube.
Compute cube Operator and Curse of Dimensionality
Compute cube operator :
It computes aggregates over all subsets of the dimensions specified in the operation.
Syntax :
compute cube cubename
Example
Suppose we define a data cube for an electronics store, “Best Electronics”.
Dimensions are :
• City
• Item
• Year

Measure :
• Sales_in_dollars
Example : Compute cube operator
The statement “compute cube sales” :
• Explicitly instructs the system to compute the sales aggregate cuboids for all subsets of the set {city, item, year}.
• Generates a lattice of cuboids making up a 3-D data cube, ‘sales’.
• Each cuboid in the lattice corresponds to one subset, so three dimensions give 2^3 = 8 cuboids.
Example : Compute cube operator
Cont...
BASE CUBOID
• Returns the total sales for any combination of the three dimensions.
• Least generalized (most specific) of the cuboids.
• Reached by exploring downwards in the lattice, that is, by drilling down within the data cube.

APEX CUBOID
• The group-by is empty; it contains the total sum of all sales.
• Most generalized (least specific) of the cuboids.
• Reached by exploring upwards in the lattice, that is, by drilling up / rolling up within the data cube.
Cont...

SQL-LIKE SYNTAX :
define cube sales_cube [city, item, year] : sum (sales_in_dollars)

compute cube sales_cube
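
As a rough illustration of what “compute cube” asks the system to do, the sketch below enumerates every subset of {city, item, year} and sums sales_in_dollars for each group-by. The fact rows and the function name compute_cube are hypothetical, chosen only for this example; a real system would use far more efficient algorithms than this brute-force loop.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (city, item, year, sales_in_dollars).
ROWS = [
    ("V", "H", 2023, 100.0),
    ("V", "C", 2023, 250.0),
    ("T", "H", 2023, 80.0),
    ("T", "H", 2024, 120.0),
]
DIMENSIONS = ("city", "item", "year")


def compute_cube(rows, dimensions):
    """Return {group-by dimensions: {cell key: summed measure}} for every subset."""
    cube = {}
    for k in range(len(dimensions) + 1):
        for group_by in combinations(range(len(dimensions)), k):
            cuboid = defaultdict(float)
            for row in rows:
                # The cell key keeps only the values of the grouped dimensions.
                key = tuple(row[i] for i in group_by)
                cuboid[key] += row[-1]
            cube[tuple(dimensions[i] for i in group_by)] = dict(cuboid)
    return cube


sales_cube = compute_cube(ROWS, DIMENSIONS)
apex = sales_cube[()]          # empty group-by: one cell with the grand total
base = sales_cube[DIMENSIONS]  # group by all three dimensions
```

Here sales_cube holds one entry per cuboid in the lattice; apex is the most generalized cuboid and base is the most specific one.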


Compute cube operator
Advantages
• Computes all the cuboids for the cube in advance.
• Online analytical processing may need to access different cuboids for different queries, so having them ready avoids aggregating at query time.
• Precomputation leads to fast response time.

Disadvantages
• The required storage space may explode if all of the cuboids in the data cube are precomputed, especially when the cube has many dimensions.
Cont...
Consider the following two cases for an n-dimensional cube.
Case 1 : Dimensions have no hierarchies
• The total number of cuboids for an n-dimensional cube = $2^n$
Case 2 : Dimensions have hierarchies
• The total number of cuboids for an n-dimensional cube :

Total number of cuboids = $\prod_{i=1}^{n} (L_i + 1)$

where $L_i$ is the number of levels associated with dimension i. The 1 added to each $L_i$ accounts for the virtual top level, all.
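
The two cases can be checked with a few lines of arithmetic. This sketch assumes the standard count of (L_i + 1) choices per dimension, where L_i excludes the virtual top level "all"; the function name cuboid_count and the level counts used are hypothetical.

```python
from math import prod


def cuboid_count(levels_per_dimension):
    """Multiply (L_i + 1) over all dimensions; the +1 is the virtual top level."""
    return prod(levels + 1 for levels in levels_per_dimension)


# Case 1: no hierarchies, so each dimension has a single level (L_i = 1).
no_hierarchy = cuboid_count([1, 1, 1])        # 2 * 2 * 2

# Case 2: hypothetical hierarchies with 4, 3, and 2 levels.
with_hierarchy = cuboid_count([4, 3, 2])      # 5 * 4 * 3

# Ten dimensions with four levels each already give millions of cuboids.
ten_dimensions = cuboid_count([4] * 10)       # 5 ** 10
```

With one level per dimension the product reduces to 2^n, which matches Case 1.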
Curse of dimensionality

• Storage requirements become even more excessive when many of the dimensions have multiple levels of concept hierarchy. This problem is referred to as the curse of dimensionality.
• The size of each cuboid also depends on the cardinality of its dimensions.
• Cardinality: the number of distinct values in each dimension.
• Because many cuboids are large, it is realistic to materialize only some of the cuboids.
Types of Materialization
No Materialization :
• Do not precompute any of the “non-base” cuboids.
• Leads to computing expensive multidimensional aggregates on the fly, which can be extremely slow.

Full Materialization :
• Precompute all of the cuboids.
• The resulting lattice of computed cuboids is called the full cube.
• Requires a huge amount of memory space to store all of the precomputed cuboids.
Cont....
Partial Materialization :
• Selectively compute a proper subset of the whole set of possible cuboids.
• The resulting lattice of computed cuboids is called a subcube.

Factors to consider :
• Identify the subset of cuboids or subcubes to materialize.
• Exploit the materialized cuboids or subcubes during query processing.
• Efficiently update the materialized cuboids or subcubes during load and refresh.
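
One way to picture the second factor, exploiting materialized cuboids during query processing, is to answer a query from the smallest materialized cuboid that contains all of the query's dimensions. The sketch below is only an assumption about how such a choice could be made; the cuboid sizes and the name pick_cuboid are hypothetical, and concept hierarchies are ignored.

```python
# Hypothetical materialized cuboids, mapped to their (made-up) row counts.
MATERIALIZED = {
    ("city", "item", "year"): 1_000_000,  # base cuboid
    ("city", "item"): 50_000,
    ("item", "year"): 8_000,
}


def pick_cuboid(query_dims, materialized):
    """Return the smallest materialized cuboid whose dimensions cover the query."""
    needed = set(query_dims)
    candidates = [dims for dims in materialized if needed <= set(dims)]
    if not candidates:
        return None  # no materialized cuboid can answer this query
    return min(candidates, key=lambda dims: materialized[dims])


by_item = pick_cuboid(["item"], MATERIALIZED)
by_city_year = pick_cuboid(["city", "year"], MATERIALIZED)
```

A query grouped by item alone can be rolled up from the small (item, year) cuboid, while a query on (city, year) has to fall back to the base cuboid because no smaller cuboid contains both dimensions.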
Cont....
ICEBERG CUBE:
• A data cube that stores only those cube cells whose aggregate value (for example, count) is above some minimum support threshold.
SHELL CUBE:
• Precomputes the cuboids for only a small number of dimensions of the data cube.
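
The iceberg idea can be sketched as a filter on cell counts. The rows, the threshold, and the name iceberg_cube below are hypothetical, and the code naively counts every cell before filtering; practical iceberg algorithms prune during computation instead.

```python
from collections import Counter
from itertools import combinations

# Hypothetical fact rows: (city, item).
ROWS = [("V", "H"), ("V", "H"), ("V", "C"), ("T", "H"), ("T", "P")]
DIMENSIONS = ("city", "item")


def iceberg_cube(rows, dimensions, min_support):
    """Keep only cells whose row count reaches min_support."""
    cells = {}
    for k in range(len(dimensions) + 1):
        for group_by in combinations(range(len(dimensions)), k):
            counts = Counter(tuple(row[i] for i in group_by) for row in rows)
            names = tuple(dimensions[i] for i in group_by)
            for key, count in counts.items():
                if count >= min_support:
                    cells[(names, key)] = count
    return cells


iceberg = iceberg_cube(ROWS, DIMENSIONS, min_support=2)
full = iceberg_cube(ROWS, DIMENSIONS, min_support=1)
```

With a threshold of 2, sparse cells such as (V, C) are dropped, so the iceberg cube stores fewer cells than the full cube.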
Indexing of OLAP Data

• To facilitate efficient data access, data warehouses support index structures and materialized views.
Index OLAP data by
• Bitmap Indexing
• Join Indexing
Bitmap Indexing
• It allows quick searching in data cubes.
• The bitmap index is an alternative representation of the record_ID (RID) list.
• In the bitmap index for a given attribute, there is a distinct bit vector, Bv, for each value v in the attribute's domain.
• If the attribute has the value v for a given row, then the bit representing that value is set to 1 for that row, and all other bits for that row are set to 0.
Advantages of Bitmap Indexing
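• Useful for low-cardinality domains, because comparison, join, and aggregation operations are reduced to bit arithmetic.
• Leads to a significant reduction in space, since a character string can be represented by a single bit.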

Example
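In the ABC Electronics data warehouse, suppose the dimension item at the top level has four values (types) : “home entertainment”, “computer”, “phone”, and “security”, abbreviated H, C, P, and S.
Suppose that the cube is stored as a relational table with the dimensions item and city. Because item has four values, its bitmap index has four bit vectors. The tables that follow show the base table and its mapping to the bitmap index table for each dimension.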
Cont...
Base Table              Item Bitmap Index Table
RID  Item  City         RID  H  C  P  S
R1   H     V            R1   1  0  0  0
R2   C     V            R2   0  1  0  0
R3   P     V            R3   0  0  1  0
R4   S     V            R4   0  0  0  1
R5   H     T            R5   1  0  0  0
R6   C     T            R6   0  1  0  0
R7   P     T            R7   0  0  1  0
R8   S     T            R8   0  0  0  1


Cont...
Base Table              City Bitmap Index Table
RID  Item  City         RID  V  T
R1   H     V            R1   1  0
R2   C     V            R2   1  0
R3   P     V            R3   1  0
R4   S     V            R4   1  0
R5   H     T            R5   0  1
R6   C     T            R6   0  1
R7   P     T            R7   0  1
R8   S     T            R8   0  1
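
The two index tables can be rebuilt from the base table in a few lines. The sketch below stores one bit vector per attribute value, as a Python list with one bit per row, and answers the query "item = H and city = T" with a bitwise AND. The function names are hypothetical, and a production index would use packed, often compressed, bitmaps rather than lists.

```python
# Base table from the example: (RID, item, city).
BASE_TABLE = [
    ("R1", "H", "V"), ("R2", "C", "V"), ("R3", "P", "V"), ("R4", "S", "V"),
    ("R5", "H", "T"), ("R6", "C", "T"), ("R7", "P", "T"), ("R8", "S", "T"),
]


def build_bitmap_index(rows, column):
    """Map each distinct value in the column to a bit vector with one bit per row."""
    index = {}
    for position, row in enumerate(rows):
        value = row[column]
        index.setdefault(value, [0] * len(rows))[position] = 1
    return index


def bitwise_and(left, right):
    """AND two bit vectors of equal length."""
    return [a & b for a, b in zip(left, right)]


item_index = build_bitmap_index(BASE_TABLE, 1)
city_index = build_bitmap_index(BASE_TABLE, 2)

# Rows where item = H and city = T, found without scanning attribute values.
hits = bitwise_and(item_index["H"], city_index["T"])
matching_rids = [row[0] for row, bit in zip(BASE_TABLE, hits) if bit]
```

Each bit vector here is one column of the index tables above; for example, item_index["H"] is the H column read from R1 to R8.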


Join Indexing
• Join indexing registers the joinable rows of two relations.
• Join index records can identify joinable tuples without performing costly join operations.
• Useful for maintaining the relationship between a foreign key and its matching primary keys in the joinable relation.
• The star schema model of data warehouses makes join indexing attractive, because the linkage between a fact table and its corresponding dimension tables consists of the fact table's foreign key and the dimension table's primary key.
Cont...
• Join indexing maintains relationships between the attribute values of a dimension and the corresponding rows in the fact table.
• Join indices may span multiple dimensions to form composite join indices.
Example
In ABC Electronics, the star schema is defined as “sales_star [time, item, branch, location] : dollars_sold = sum (sales_in_dollars)”. The tables that follow illustrate join index relationships between the sales fact table and the location and item dimension tables.
Cont...
Join index table for location/sales     Join index table for item/sales
LOCATION      SALES_KEY                 ITEM      SALES_KEY
Main street   T57                       Sony-TV   T57
Main street   T238                      Sony-TV   T459
Main street   T884                      ....      ...
....          .....

Join index table linking location and item to sales
LOCATION      ITEM      SALES_KEY
Main street   ......
Main street   Sony-TV   T57
Main street   ....      .....
....          ......    .....
Cont...
[Figure: linkages between the sales fact table and the location and item dimension tables. Location “Main Street” links to sales tuples T57, T238, and T884; item “Sony_TV” links to sales tuples T57 and T459.]
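
The join index tables for location/sales and item/sales can be sketched as lookups from a dimension value to the matching sales keys. In the code below, the fact rows carry the dimension values directly to keep the example short; in an actual star schema the fact table would hold foreign keys into the dimension tables. The placeholder values "item-x", "item-z", and "location-y" are hypothetical, since the source does not give them, and the function names are illustrative only.

```python
from collections import defaultdict

# Hypothetical sales fact rows: (sales_key, location, item).
SALES_FACT = [
    ("T57", "Main street", "Sony-TV"),
    ("T238", "Main street", "item-x"),   # item is a placeholder
    ("T459", "location-y", "Sony-TV"),   # location is a placeholder
    ("T884", "Main street", "item-z"),   # item is a placeholder
]


def build_join_index(fact_rows, column):
    """Map each dimension value to the sales keys of its joinable fact rows."""
    index = defaultdict(list)
    for row in fact_rows:
        index[row[column]].append(row[0])
    return dict(index)


def build_composite_join_index(fact_rows):
    """Map each (location, item) pair to the sales keys of its fact rows."""
    index = defaultdict(list)
    for sales_key, location, item in fact_rows:
        index[(location, item)].append(sales_key)
    return dict(index)


location_index = build_join_index(SALES_FACT, 1)
item_index = build_join_index(SALES_FACT, 2)
composite_index = build_composite_join_index(SALES_FACT)
```

Looking up "Main street" or "Sony-TV" returns the joinable sales tuples directly, and the composite index answers the combined lookup without performing a join.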
Cont...
• To speed up query processing further, the join indexing and bitmap indexing methods can be integrated to form bitmapped join indices.
