A Survey on OLAP

K. Dhanasree
Dept of CSE, DRKIST
Hyderabad, Telangana, India

C. Shobabindu
Dept of CSE, JNTUA College of Engineering
Anantapuramu, Andhra Pradesh, India

Abstract--Online analytical processing (OLAP) is today's major database technology and has completely changed the face of decision support systems. Many enterprise real-time analytical solutions are built on advanced OLAP methods. In this paper we present an overview of the various OLAP technologies and their access paths. The focus of the paper is on OLAP in the distributed scenario, where we pinpoint the drawback of OLAP's natural indexing search. We designed a new translated lattice, called the pchrome lattice, whose nodes are binary. We implemented natural indexing on this translated lattice and showed a drastic reduction in indexing search space, search time and distributed communication cost.

Keywords- MOLAP, ROLAP, HOLAP, B-tree, Bitmap, R-trees, R*-trees, R-cube.

I. INTRODUCTION

In the past decades various database technologies have been used to answer user queries, both simple and complex. The most prominent use of database technology is in business enterprises, where decision making takes priority over transactions. Traditional database systems are transaction processing systems, which access only a few tuples per database read or write [1]. Their major drawback is that they cannot handle decision making queries. Decision making requires an immediate comparison of past and present data, and traditional databases do not store past data. To handle enormous volumes of past and present data and to support decision making queries, many enterprises use an extended database technology called the data warehouse. Data warehouses differ considerably from traditional database applications. They are mainly used by major business enterprises to analyze business trends and to track business profits. Analysts use the data warehouse to extract the business information that enables better decision making. This type of interactive decision making is provided by OLAP (On-line Analytical Processing) tools [2]. OLAP applications mostly perform data reads for their decision making. Complex real-time analytical queries are answered using OLAP.

The most commonly used OLAP technologies are Multidimensional On-line Analytical Processing (MOLAP), Relational On-line Analytical Processing (ROLAP) and Hybrid On-line Analytical Processing (HOLAP) [3]. They differ in their data processing capabilities, and each has its own supporting data accessing methodologies. Though they are competing technologies, they are all widely recognized by today's decision making enterprises.

The remaining parts of the paper are organized as follows: In Section II we briefly discuss the classification of OLAP technologies. In Section III we discuss the data accessing methods. In Section IV we discuss when and where to use these technologies. In Section V we discuss OLAP in the distributed scenario. Finally, Section VI concludes the paper.

II. OLAP TECHNOLOGIES

An organization's huge volume of data is a critical resource that needs powerful tools to fetch queried information. OLAP is one such technology, providing sophisticated tools for an enterprise to meet its competitive goals. Currently there are three dominant OLAP technologies:
• Multidimensional OLAP (MOLAP).
• Relational OLAP (ROLAP).
• Hybrid OLAP (HOLAP).

A. MOLAP

In MOLAP the preprocessed data is aggregated and uploaded periodically into a multidimensional array structure called the data cube [4]. Based on the dimensional hierarchies, the data cube is divided into sub-cubes. For a data cube with n dimensions and no hierarchies there can be a total of $2^n$ sub-cubes. With hierarchies defined, the number of sub-cubes increases. As the dimensions and dimensional hierarchies increase, the cube becomes larger, with many sub-cubes. As a result, a MOLAP query for a user-requested sub-cube has to spend time on on-the-fly analysis. To make this analysis faster, MOLAP relies on pre-computation. Pre-computation is a generic means of achieving short response times, in which some of the sub-cubes are materialized [5]. Materialization means that some of the needed measures, such as sum and average, are calculated beforehand and the values are stored in the sub-cubes. In MOLAP all these measures are stored in arrays, referenced by dimension names that are strings. The MOLAP cube sits between the warehouse and the user front-end tools, analyzing the user-requested data. For a MOLAP cube with large dimensional hierarchies, many of the smaller granules of the cube will be left without pre-computed values. This is the curse of dimensionality [6] of the data cube, where many sparse sub-cubes are generated. The main problem with sparsity is that many OLAP methodologies search through the sparse cube to identify whether the user-requested sub-cube is materialized or not. This may increase the query waiting time. Research has provided many methodologies on which sub-cubes to materialize [7]. To our knowledge, less work has been done on identifying

978-1-5090-0612-0/16/$31.00 ©2016 IEEE

2016 IEEE International Conference on Computational Intelligence and Computing Research

which sub-cubes have already been materialized. MOLAP has an inherent advantage in its natural array structure, which suits many OLAP accessing methods and makes analysis of present and past data easy. The outer cube layer contains the most recent data and the inner sub-cubes contain the past data. MOLAP uses many operations to perform on-the-fly analysis. All queries are posed directly on the MOLAP array-based lattice structure shown in Figure 5. Using a string matching technique, the requested view can be retrieved quickly. The MOLAP structure also supports easy aggregation of data along multiple dimensions.

Fig. 1. MOLAP Architecture

B. ROLAP

In ROLAP the warehouse data is stored in a relational or extended-relational database. ROLAP uses tables to store the past and the present data [8]. A ROLAP server scales better for large data sets. The ROLAP server, which is a collection of multiple tables, sits between the data warehouse and the client front-end tools. The problem of sparsity does not arise here because tables can be joined to answer the user query, if needed with multiple group-bys. In ROLAP pre-computed data is not stored in advance. The aggregates from multiple tables are calculated on the fly. User-requested aggregates may be spread over multiple tables, so the ROLAP server translates the user query into a multi-statement SQL (Structured Query Language) query posed on multiple tables. On-the-fly analysis over multiple tables may take considerable time, and this is the main drawback of ROLAP.

Fig. 2. ROLAP Architecture

C. HOLAP

HOLAP combines the features of both MOLAP and ROLAP. It supports both MOLAP's multidimensional structure and ROLAP's SQL constructs. It uses the features of MOLAP for fast processing of user queries and the features of ROLAP for processing large data sets. HOLAP stores the most recent data in MOLAP for faster access and stores past data in ROLAP. HOLAP addresses a complex query by dividing it into sub-queries [9]. The sub-queries that span dense data sets are directed to MOLAP and the sub-queries that span sparse data sets are directed to ROLAP.

Fig. 3. HOLAP Architecture

D. OLAP OPERATIONS AND OPERATORS

To perform multidimensional analysis on the fly and to give faster query responses, OLAP includes the following basic operations:
Roll-Up: Also called aggregation; data at lower levels is aggregated to provide a summarization at higher levels.
Drill-down: Allows navigation from higher-level data to lower-level data.
Slicing: Selects data along a single dimension; the resulting view is a table.
Dicing: Selects data along multiple dimensions; the resulting view is again a sub-cube.

Using the above operations OLAP presents the user-requested multidimensional analysis. With roll-up, sub-totals can be aggregated to grand totals; with drill-down, the application can navigate from a grand total to sub-totals. With dicing, a sub-cube can be selected. With slicing, a cross section of the cube, i.e. a table, is selected.

To perform the above operations OLAP uses two types of operators:
• The group-by.
• The cube-by.

TABLE 1. SALES DATA

| Parts (A) | Locations (B) | Time (C) | Sales quantity |
|---|---|---|---|
| A1 | B1 | C1 | 4 |
| A1 | B2 | C1 | 2 |
| A2 | B3 | C1 | 2 |
| A3 | B3 | C2 | 4 |
| A3 | B3 | C3 | 1 |

1. Group-by operator

The group-by is a standard relational operator. It typically operates by partitioning the relation into disjoint groups of tuples and then aggregating along the given dimension. The group-by operator does not allow direct aggregation along a column, i.e. rolling sub-totals up to a grand total or drilling down from a grand total to sub-totals [10]. For instance, consider the sales data shown in Table 1, where location B1 has the dimensional hierarchy levels b1 and b2.

A query of the type
SQL Query 1:
Select A, B, SUM(quantity)

from sales
group by B, A;

Query 1 returns the total quantity for each combination of the dimensions B and A, but it does not return sub-totals on the single dimension B or on the single dimension A. Even a group-by on B alone, which gives B1=4, B2=2, B3=7, cannot drill down into B1 and give sub-totals on the hierarchy levels within B1, i.e. it cannot give b1=2, b2=2. Sub-totals of this kind can be valuable to decision makers, yet the group-by operator cannot provide them.

2. The problem with group-by operator

The main problem with the group-by operator is that it does not provide multidimensional group-bys, nor can it analyze the aggregates of dimensional hierarchies. The group-by operator cannot support the drill-down and roll-up operations that multidimensional analysis requires. Queries of the type "Display the total sales from the sub-location b1" cannot be answered. To provide a multidimensional aggregate analysis, the application has to perform the union of multiple SQL statements as follows, where 'ALL' is a placeholder for a dimension that has been aggregated away:

SQL Query 2:
Select A, B, C, Sum(quantity)
from sales
Group by A, B, C
UNION
Select A, B, 'ALL', Sum(quantity)
from sales
Group by A, B
UNION
Select A, 'ALL', C, Sum(quantity)
from sales
Group by A, C
UNION
Select 'ALL', B, C, Sum(quantity)
from sales
Group by B, C
UNION
Select A, 'ALL', 'ALL', Sum(quantity)
from sales
Group by A
UNION
Select 'ALL', B, 'ALL', Sum(quantity)
from sales
Group by B
UNION
Select 'ALL', 'ALL', C, Sum(quantity)
from sales
Group by C
UNION
Select 'ALL', 'ALL', 'ALL', Sum(quantity)
from sales;

Even such a union of SQL group-bys cannot address analysis on dimensional hierarchies, and this is the failure case of the group-by operator.

3. The cube operator

The cube operator is designed to address the drawback of the SQL group-by. The union of multiple SQL statements given in SQL Query 2 can be replaced with a single cube-by operator as follows:

SQL Query 3:
Select A, B, C, Sum(quantity)
from sales
Group by cube (A, B, C);

The above cube-by query calculates aggregates along all possible dimensional combinations. Since there are 3 dimensions, the cube-by operator calculates $2^3 = 8$ possible aggregates, ABC, AB, AC, BC, A, B, C and ALL, thus addressing the failure case of the group-by operator. The cube-by operator combined with where conditions can provide aggregates on dimensional hierarchies.

4. Drawback of Cube-by operator

The major drawback of the cube-by operator is that it is more expensive to use [11]. For roll-up or drill-down operations the cube-by operator has to carry a large number of string dimensions. This may increase the operator file size. In SQL Query 3 the cube-by operator carries all 3 dimensions A, B, C, which are single characters, so the operator occupies three bytes of search space. In reality these dimensions are strings, which, when included with the cube-by operator, may considerably increase the search space. Carrying all this string data from the outer layer of the OLAP structure to the inner layers may increase the search space further. This has made the cube operator a less attractive choice even though it is advantageous over the group-by operator. In the context of this survey we state a research problem:

Problem 1 description: Can there be a better transformation of the cube-by string dimensions into something like binary codes that exactly identify the requested views and thus reduce the search space of the cube-by operator?

III. OLAP ACCESSING METHODS

To increase the performance of OLAP, research has provided many accessing methods. In this paper we present our survey on two major accessing methodologies:
• OLAP Pre-computation.
• OLAP Indexing.

A. OLAP Pre-computation

Pre-computation is a technique in which parts of the whole cube are materialized to provide fast query responses. The advent of warehouse technology has led to many sophisticated pre-computation methods. The parts of the data cube that are pre-computed can also be termed materialized views [12]. With pre-computation the questions are which views to materialize and how to materialize them optimally. Choosing which views to materialize is based on three important factors: the query cost, the view materialization cost and the storage space. Sub-cubes requested by common queries are given priority for materialization, so the most frequently requested sub-cubes are pre-computed.
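To make the notion of materialized views concrete, the following is a minimal sketch, not a method taken from the surveyed systems, that computes every group-by of the Table 1 sales data in memory. These are the $2^3 = 8$ views that the cube operator produces and from which a pre-computation method would choose a subset to store. The function name `materialize_cube` and the dictionary layout are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations

# Table 1 sales data: (A = part, B = location, C = time, sales quantity)
SALES = [
    ("A1", "B1", "C1", 4),
    ("A1", "B2", "C1", 2),
    ("A2", "B3", "C1", 2),
    ("A3", "B3", "C2", 4),
    ("A3", "B3", "C3", 1),
]
DIMS = ("A", "B", "C")


def materialize_cube(rows, dims=DIMS):
    """Hypothetical helper: return {view name: {key tuple: SUM(quantity)}}
    for every subset of the dimensions, i.e. all 2^n group-bys."""
    cube = {}
    for k in range(len(dims) + 1):
        for idx in combinations(range(len(dims)), k):
            view = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in idx)
                view[key] += row[-1]  # last field is the sales quantity
            name = "".join(dims[i] for i in idx) or "ALL"
            cube[name] = dict(view)
    return cube


cube = materialize_cube(SALES)
print(sorted(cube))      # names of the eight views
print(cube["B"])         # sub-totals per location
print(cube["ALL"][()])   # grand total
```

In this sketch every view is computed by a separate pass over the rows, which is exactly the cost that the pre-computation methods surveyed in this section try to reduce by sharing scans, sorts or partial results.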
There are many approaches to performing pre-computation:

1. Multi-way

Multi-way array aggregation, discussed in [13], pre-computes the aggregates using the array as its basic structure and is a full cube computation method. It makes use of the chunk concept, where the entire cube memory is partitioned into chunks. These chunks are then aggregated simultaneously across multiple dimensions to pre-compute various sub-cubes. Multi-way array aggregation is fast because it is done on the MOLAP structure using direct array addressing.

Figure 4 shows multi-way array aggregation, where ABCD is the base cuboid. Memory chunking is done to fit ABCD, and from ABCD the cuboids ABC, AB, A etc. can be calculated, allowing multiple aggregations across various dimensions.

[Figure: base cuboid ABCD aggregated to ABC and BCD, then to AB and BC, then to A]

Fig. 4. Multi-way

The multi-way algorithm is infeasible for a large number of dimensions, because the larger arrays may not fit into the chunks. The method also continues to compute unnecessary aggregates because it does not prune them beforehand.

2. BUC

The bottom-up computation (BUC) method addresses partial cube computation with iceberg conditions [14]. It uses an Apriori-based pruning method in which the cells that do not satisfy a minimum threshold are pruned so that they are not included in the pre-computation of other aggregates.

For the dimension A, if a cell of A does not satisfy the minimum threshold, then it cannot contribute to the aggregation of AB and ABC either. The BUC method uses the available memory efficiently by pruning unnecessary aggregations in advance.

3. Sorting

The sort-based methods rely on optimizations over sorted aggregates [15]. Usually a data warehouse model follows a fixed order of the dimensions in its design. Irrespective of the order of dimensions in the query, the application has to sort the query order in accordance with the design order. The sort-based method optimizes the sorts by first sorting the required group-by and then pre-computing the prefix group-bys from that initial sort.

For instance, with an initial sort on ABCD, the prefix group-bys ABC, AB and A can be pre-computed without sorting them again, thus avoiding additional sorts.

4. Hashing

The hash-based method relies on optimizations of cached results and scans [16]. Usual pre-computation methods incur multiple scans of the dimensional attributes, which is costly. For instance, in one scan the aggregate ABC is pre-computed. To compute AB, the attributes A and B have to be scanned again, so the same attributes are scanned twice.

The hash-based method instead caches results to reduce the number of scans. For example, it maintains hash tables in memory in which AB and AC can fit. Then, in one scan of ABC, both AB and AC can be pre-computed.

5. H-Cubing

A better cube computation is offered by H-cubing, discussed in [17]. H-cubing computes on a tree-like data structure called the H-tree. From the lattice structure shown in Figure 5, H-cubing constructs an H-tree from which it computes the multidimensional aggregates. The advantage of H-cubing is that, at a given level, the method calculates all possible aggregates within that level before proceeding to the next higher level.

6. Star Cubing

Star cubing, discussed in [18], combines the features of multi-way, BUC and H-cubing. It combines both top-down and bottom-up computation approaches. From the lattice structure shown in Figure 5 it constructs a star tree, identifies as star nodes the nodes that do not satisfy the iceberg conditions, and prunes them. The advantage of star cubing is that, at a given level, the method also computes the aggregates of the next lower and higher levels using shared dimensions and simultaneously prunes the aggregates that do not satisfy the iceberg conditions.

B. OLAP Indexing

To support fast access to multidimensional aggregates, OLAP systems use indexing. Many existing indexing methods are used by both MOLAP and ROLAP. We examine each in the context of our survey.

1. Natural Indexing

Natural indexing, also called array-based indexing, is supported by MOLAP. The array structure of MOLAP itself forms the natural index. Natural indexing is the only indexing method that operates on the storage layer of the data cube [19]. Because the indexing is done on the storage layer, it is fast and the requested views are retrieved almost immediately. The lattice structure of the cube in Figure 5 presents arrays as layered sub-cubes at various levels. Each end point of a layer represents one possible group-by of the dimensions, which are usually strings. In natural indexing all these string nodes are stored in string arrays. Whenever a user queries for
an aggregate ABC, the aggregate is mapped directly onto the end points of the layers using natural indexing, and the corresponding sub-cube, which is highlighted in Figure 5, is retrieved. Thus MOLAP's natural indexing offers improved performance by indexing directly on the structure of MOLAP. But many enterprises prefer other indexing techniques because natural indexing does not suit large data sets. The major drawbacks of data cube natural indexing are:
• When the number of dimensions is large the cube becomes sparser [20], which means that several cells representing particular attribute combinations will not contain any aggregated data. The natural indexing search for which sub-cubes are pre-computed thereby becomes time consuming.
• The natural indexing search directly uses the user's group-by dimensions, which are string data. Searching with string data also increases the storage space for large dimensions.

Level 0: ALL
Level 1: A B C D
Level 2: AB AC AD BC BD CD
Level 3: ABC ABD ACD BCD
Level 4: ABCD

Fig. 5. Lattice structure of cube with 4 dimensions

For instance, consider the SQL query
SQL Query 4:
Select A, B, C, Sum(quantity)
from sales
Group by cube (A, B, C);

Here the MOLAP indexing directly indexes the outer view ABC, which is fetched from the string array using a string comparison technique, and the view highlighted in Figure 5 is retrieved. This type of string comparison may require a large index file and a long comparison time, and the retrieval time also increases slightly.

To reduce the index file size and the long string comparison time, the MOLAP-based model IBM Cognos8 uses a transformer module in which user queries are transformed to BEx queries (Business Explorer queries) [21]. In the BEx queries the user-requested string group-bys are represented using BEx variables, which are also long strings. For example, in the above query the group-by ABC can be assigned a BEx variable of the type SAPBWOODPP2. Then, using the Cognos locate option, the BEx variables are matched.

In the context of our survey we point out the drawback of BEx queries: BEx variables are also long strings, and with string comparisons the search index file may be large and the search time therefore long. We want a method in which the user-requested string group-bys in the transformed query are converted to unique binaries, thereby reducing the search index file size and the search time.

2. Tree Based Indexing And Variations

Both MOLAP and ROLAP support tree-based indexing methods. One of the traditional tree-based indexes is the B-tree [22]. A B-tree index includes sub-trees corresponding to each dimension of the data cube. As the values of the cube dimensions are unique, the B-tree uses these dimensions as index pointers that point to the sub-trees. By tracing the pointers, data can be easily retrieved. For an 8-byte column the B-tree index file size is 326 MB and the construction time is 1580 s. Building an index on a large column with a B-tree is expensive in terms of space and construction time. The main drawback of B-tree indexes is that the tree needs rebalancing on updates. Other popular tree-based indexing structures supported by both MOLAP and ROLAP technologies are R-trees [23] and aR-trees [24]. R-tree indexing supports complex range queries to some extent. Much research has been done on R*-trees to extend them into structures like Ra*-trees [25] and Hilbert R-trees. All of these use more sophisticated update algorithms; they can answer complex range queries; and they can dynamically rebalance the tree structure whenever updates are performed. The major drawbacks of tree-based indexing are:
• Huge storage.
• Supports only few dimensions.

3. Bitmap Indexing And Variations

Bitmap indexing was introduced to enhance performance on various query types [26]. One bitmap index is associated with each attribute of the table. Each row of the bitmap vector is given a row-id starting from 0. Rows will have distinct attribute values. The basic idea behind bitmap indexes is to use a string of binary digits to indicate whether the indexed attribute in a table is equal to a specific value or not. If the bit is set to 1, the row with the corresponding row-id contains the key value; otherwise the bit is set to 0. Complex queries on one or more dimensions can be answered by intersecting the bitmaps over multiple dimensions and by using AND/OR operations. The major advantages of bitmaps are:
• Overcomes the storage limitation of B-trees.
• Bitmaps are retrieval efficient for low cardinalities.
• Sparse data can be efficiently handled.
• More CPU efficient because of their simple representation.
The major disadvantages of bitmaps are:
• Efficiency decreases for high cardinalities.
• AND/OR operations are expensive.
• As the number of dimensions increases more bitmap vectors are needed, which results in storage space overhead.
• Cannot support huge reads/updates.

To address this storage overhead, encoded bitmaps [27] and hybrid bitmap methods [28] were introduced. Encoded bitmap indexing can be used for large cardinalities. The basic idea of encoded bitmap indexing is to encode the attribute domain. This reduces the number of bit vectors and thus the storage space.

Other variations of bitmaps, namely projection bitmaps and bit-sliced indexes, are discussed in [29].

4. OLAP Join Indexes

The most popular OLAP index is the join index [30]. All the traditional indexing methods discussed above index by mapping a column value to the group of rows having that value within the same relation. Every relation therefore needs to be indexed separately, and when tuples from two or more tables are needed, traditional indexing methods increase the cost of joins. The index size may also grow as the number of relations increases. In contrast, the join index provides a grouped index on two or more tables. It contains indexed records that hold the joinable rows of the relations, making it easy to identify the joinable tuples without performing further costly joins.

IV. CHOICE BETWEEN OLAP TECHNOLOGIES

Not all real-world enterprises use strictly MOLAP, strictly ROLAP or strictly HOLAP. Their choice varies according to which potential benefits of OLAP they seek: improved decision making, accurate analysis, availability of all user-required information, improved working efficiency and increased user productivity.

A. Choice Between MOLAP / ROLAP / HOLAP

• MOLAP best suits non-sophisticated users as it uses user-friendly graphical visualization techniques, whereas ROLAP suits sophisticated users [31].
• If the users need consistent information over a period of time, MOLAP is preferred. If the requirements change frequently, ROLAP should be used because of its flexible query capability.
• For decision making on past data MOLAP should be adopted, whereas for decision making on current data ROLAP is adopted.
• Because of the ease of use of MOLAP, it is recommended at the beginning of an enterprise. After considerable decision making experience, a ROLAP system is preferred because of its flexibility and ability to handle complex queries.

Table 2 shows the various OLAP technologies, their features and the enterprises implementing them.

TABLE 2. OLAP TECHNOLOGIES

| OLAP Technology | Features | Adopted by |
|---|---|---|
| MOLAP | High performance, less scalable for huge dimensions | Microsoft SQL Server 2005, Essbase server from Hyperion |
| ROLAP | Low performance, more scalable for huge dimensions | Microsoft SQL Server 2005, MicroStrategy's DSS Server, Informix's MetaCube |
| HOLAP | High performance and scalable for huge dimensions | SAS server |

B. Choice between OLAP Indexes

The main problem with OLAP indexing is that there is still no definite guideline for an analyst to choose the best suited indexing method:

• For a DW system with very few dimensions MOLAP natural indexing is advisable. An enterprise that is at its beginning stage can adopt MOLAP natural indexing.
• For a DW system that is frequently updated tree indexing is advisable, because the tree variants discussed above rebalance dynamically on updates. For an enterprise using only a few dimensions B-trees are advisable. To answer complex range queries the R*-tree variation of tree indexing is advisable [32].
• A bitmap index is best suited for columns having a small number of distinct values. Bitmap indexing supports a larger number of dimensions than B-trees. For a DW system that is not frequently updated bitmap indexing is advisable.

Table 3 shows the various OLAP indexing techniques, their features and the enterprises implementing them.

TABLE 3. OLAP INDEXING

| OLAP Indexing | Features | Adopted by |
|---|---|---|
| Natural Indexing | Easy and fast retrieval, supports few dimensions | Microsoft Corporation, IBM's Cognos |
| Tree Indexing | Supports huge dimensions, huge storage space, and more retrieval time when compared to natural and bitmap | IBM SPSS, Oracle, Red Brick |
| Bitmap Indexing | Easy, supports huge dimensions, less storage when compared to tree index | InterSystems Corporation, Oracle, Red Brick, DB2 |
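As an illustration of the bitmap indexing compared in Table 3, the following is a minimal sketch, assuming one bit vector per distinct attribute value and Python integers as the bit vectors. The column contents are the Parts and Locations columns of Table 1; the helper names `build_bitmap_index` and `row_ids` are hypothetical and do not correspond to any vendor's implementation.

```python
from collections import defaultdict

# Columns A (parts) and B (locations) of Table 1, one entry per row-id 0..4.
PART = ["A1", "A1", "A2", "A3", "A3"]
LOCATION = ["B1", "B2", "B3", "B3", "B3"]


def build_bitmap_index(column):
    """Map each distinct value to an integer whose bit i is 1
    exactly when the row with row-id i holds that value."""
    index = defaultdict(int)
    for row_id, value in enumerate(column):
        index[value] |= 1 << row_id
    return dict(index)


def row_ids(bitmap):
    """Decode a bitmap back into the sorted list of matching row-ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


part_index = build_bitmap_index(PART)
loc_index = build_bitmap_index(LOCATION)

# WHERE B = 'B3' AND A = 'A3': intersect the two bitmaps.
both = loc_index["B3"] & part_index["A3"]
# WHERE B = 'B1' OR B = 'B2': union of two bitmaps of the same attribute.
either = loc_index["B1"] | loc_index["B2"]

print(row_ids(both))
print(row_ids(either))
```

Because the sketch keeps one bit vector per distinct value, the number of vectors grows with the cardinality of the column, which is the storage overhead that encoded bitmap indexing aims to reduce.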
C. Indexing Performance Study

Here we study the performance of various OLAP indexing methods by comparing the index file size and search time for a varying number of dimensions. For example, consider a warehouse table sales on three dimensions A, B, C, and suppose each dimension contains 10000 tuples. Consider the query
SQL Query 5:

Select count(*) from sales
Where A='X'
Group by A, B, C;

The index file size (in bytes), the index construction time and the query retrieval time are compared in Table 4.

TABLE 4. INDEX COMPARISON

| Index Type | File Size (bytes) | Construction Time (µs) | Retrieval Time (µs) |
|---|---|---|---|
| Array | 10 | 1200 | 10 |
| B-tree | 10 | 1000 | 40 |
| Bitmap | 6 | 800 | 20 |

The bitmap index requires less storage space than the B-tree, as shown in Table 4. As the number of dimensions increases, B-tree indexing fails to scale to the increased file size. Figure 6 shows the search time comparison of the various indexing methods. Natural indexing gives a fast search for a small number of dimensions, but as the dimensions increase the search time also increases. This is because the natural indexing search is carried out with large dimensional group-bys that are string data, and searching with string data takes comparatively more time. The B-tree and bitmap searches scale well even for high dimensions.

Fig. 6. Search time comparison

V. OLAP IN DISTRIBUTED SCENARIO

Data warehouses usually contain huge amounts of real-world data that must be analyzed. Most of today's OLAP applications work on a data warehouse with a centralized structure, in which a single database contains huge amounts of data. As data warehouses tend to be extremely large, a centralized data warehouse is very expensive. This has led to the distributed scenario, where the large data warehouse can be partitioned and distributed across various locations. All the OLAP technologies scale considerably well in the distributed scenario.

A. Distributed MOLAP

Many current distributed OLAP systems use the MOLAP approach. The main reason is its fast execution of OLAP queries. MOLAP does not scale well for a large number of dimensions. However, many distributed enterprises follow a vertical fragmentation of the dimensions, which places a few dimensions at each location and then joins these fragments to answer user queries. The scenario also supports centralized accessing methods like natural indexing, B-tree indexing and bitmap indexing.

B. Distributed ROLAP

Though many vendors sacrifice scalability for performance, enterprises are adopting scalable ROLAP to support today's huge data warehouses. Indexing by R-trees and R*-trees is still supported. Many of today's ROLAP enterprises use a novel distributed indexing method called RCUBE indexing [33]. RCUBE indexing is fast and is a combination of packed R-trees with distributed striping and Hilbert curve based data ordering.
The supporting features of RCUBE are:
• Low communication volume.
• Scalable in terms of data sizes and dimensions.

C. Distributed HOLAP

Many current distributed systems use HOLAP. The distributed HOLAP approach stores the frequently requested sub-cubes in an MDDB (Multidimensional Database) and the less popular parts on a remote RDB (Relational Database) [34]. In addition to its flexibility for larger data sets, distributed HOLAP provides other features like:
• Caching - The distributed HOLAP saves the results of a query so that they can be reused later.
• Logging - The distributed HOLAP creates a log file where the information about each query is stored.

D. Distributed OLAP Querying

The distributed nature of the OLAP architecture results in the following costs [35]:
• Query processing cost.
• Communication cost.
E. Query Processing Cost

Early OLAP systems were criticized for being inefficient in
handling complex queries with many operations. The efficiency
of distributed OLAP is measured by how well the query is
optimized. Query optimization transforms a complex query so
that it uses less costly operations. Many distributed OLAP
systems follow query optimization to reduce the processing
cost by applying transformation mechanisms in the search
space. A desirable optimization is one that lowers the cost by
reducing the search space and minimizing the response time.

The distributed OLAP optimizations include many
transformation techniques in which the original query is
translated into algebraic expressions that represent it.
Evaluating the user query through these algebraic expressions
costs less than evaluating the original query. Many of the
distributed query evaluation techniques succeed in minimizing
the search time but fail to reduce the search space.

The SKALLA system discussed in [36] uses
Multidimensional Join (MDJ) and Generalized MDJ (GMDJ)
operators for expressing OLAP queries. The GMDJ operator
optimizes complex OLAP queries by separating the aggregate
functions, the definitions and the dimensions from the complex
query into operator notations that are of less cost. While
separating the query into GMDJ expressions, the SKALLA
system still includes the string dimensions of the cube-by
query. GMDJ expressions with string dimensions may decrease
the query response time, but the search space, which includes
the string dimensions, may still be large.

For instance, let A and B be two tables, f1, f2, …, fn be the
list of aggregate functions, and a1, a2, …, an and b1, b2, …, bn
be the dimensional attributes of A and B. The GMDJ expression
is also a relation of the type (f1Aa1, f2Aa2, …, fnAan, f1Bb1,
…, fnBbn). Since the search is done using these GMDJ
expressions, which include the dimensions and dimensional
hierarchies Aa1, Aa2, …, Bb1, Bb2, … that are strings, the
search space increases.

F. Communication Cost

A decreased communication cost increases the efficiency
of the OLAP technology.

For example consider the SQL query:

SQL query 6:
Select A, B, C, D, sum(s)
Cube by A, B, C, D;

The query is a request for the group-by with 4 attributes
A, B, C, D that are strings. This group-by represents a
materialized view of the whole MOLAP cube shown in Figure
5. In a distributed scenario, if this view is not present at a node,
then the query has to be redirected to other nodes. Our survey
found that, in a distributed scenario, OLAP redirects the query
in a translated form that includes the group-by attributes, which
are strings. Communicating string group-bys to more than one
node may increase the communication cost.

The GMDJ relations discussed above are used to count the
number of query redirects to the various distributed nodes.
While redirecting these GMDJ relations, the SKALLA system
includes a reduced base table of the original query, thus
reducing the communication cost. But the reduced base
relations still include the string dimensions, which may
increase the communication cost. This leaves us with the
following problem:

Problem 2 description: Can group-bys be translated from
string data to something like binary, so that communicating
binaries instead of strings decreases the communication cost?

We started our work by combining these two problems and
plan to publish it as our future extension to distributed OLAP
technologies. Using this translation of cube-by dimensions to
binaries, we can address the problem of the cube-by operator
and make MOLAP less costly, and thus make the MOLAP
technology adoptable by all.

VI. CONCLUSIONS AND FUTURE WORK

In this paper we discussed prominent OLAP technologies
and their accessing methods. Though MOLAP and ROLAP
differ in features, both are considerably extending their
services to real-time decision making. In the beginning many
enterprises adopt MOLAP; then, after becoming better
acquainted with its usage, they switch to ROLAP. We provided
a survey of various standard OLAP accessing methods and
their disadvantages. In the context of our survey we highlighted
MOLAP's advantage of fast search and retrieval. Because of
MOLAP's dimensional sparsity problem, many enterprises are
moving to ROLAP. Even distributed OLAP suffers from
increasing communication cost. Research in data cube
technology is still arriving with new indexing methods. Some
of these methods are still under our study and we will present
them in our future work.

We are working towards making MOLAP technology
usable in a way that reduces sparsity and renders fast search,
and towards communicating the group-bys in the distributed
MOLAP cube architecture so as to reduce the communication
cost.

As a future enhancement we want to map the MOLAP
lattice structure to a compressed lattice structure whose nodes
are binaries rather than strings. We want to use a query
transformation mechanism that translates the string cube-by
dimensions to binaries that exactly represent the lattice nodes.
The search can then be carried out on the compressed lattice
with binaries, thereby reducing the query retrieval time as well
as the search space. In the distributed OLAP architecture, if the
cube-by view is not present at a location, then instead of
communicating the string cube-by dimensions our method
communicates binaries that stand for the same requested view,
thereby reducing the communication cost.
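
The idea of translating string cube-by dimensions into binaries can be illustrated with a small sketch. This is only one possible encoding, assumed here for illustration, and is not the authors' lattice design: each dimension is given a fixed bit position (the order A, B, C, D and all function names below are hypothetical), so every group-by in the cube lattice becomes a small integer mask.

```python
from itertools import combinations

# Hypothetical dimension order; position i maps to bit i of the mask.
DIMENSIONS = ["A", "B", "C", "D"]


def encode_group_by(attrs, dimensions=DIMENSIONS):
    """Return an integer bitmask with one bit set per attribute in the group-by."""
    mask = 0
    for attr in attrs:
        mask |= 1 << dimensions.index(attr)
    return mask


def decode_group_by(mask, dimensions=DIMENSIONS):
    """Recover the attribute names represented by a bitmask."""
    return [d for i, d in enumerate(dimensions) if mask & (1 << i)]


def lattice(dimensions=DIMENSIONS):
    """Enumerate every group-by of the cube lattice as (attributes, mask)."""
    nodes = []
    for k in range(len(dimensions) + 1):
        for combo in combinations(dimensions, k):
            nodes.append((list(combo), encode_group_by(combo, dimensions)))
    return nodes


def is_ancestor(parent_mask, child_mask):
    """True if every attribute of the child group-by also appears in the parent."""
    return (parent_mask & child_mask) == child_mask
```

With the assumed order, the group-by on (A, C) is encoded as the mask 5 and the full cube-by on (A, B, C, D) as 15, and the lattice for four dimensions has 2^4 = 16 nodes. A node that lacks a view could forward the mask rather than the attribute strings, and a bitwise test such as `is_ancestor` can check whether a stored view covers the requested group-by.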

REFERENCES

[1] H. Hasan and P. Hyland, “Using OLAP and Multidimensional Data for Decision Making”, IT Professional, 3(5), 44-50, September-October 2001, IEEE.
[2] S. Agarwal, R. Agrawal, P. M. Deshpande, A. Gupta, J. F. Naughton, R. Ramakrishnan, and S. Sarawagi, “On the Computation of Multidimensional Aggregates”, VLDB, 1996.
[3] S. Chaudhuri, U. Dayal, V. Ganti, “Database Technology for Decision Support Systems”, IEEE, 2001.
[4] S. Chaudhuri, U. Dayal, “An overview of Data warehousing and OLAP Technology”, ACM SIGMOD, 1997.
[5] V. Poosala, V. Ganti, “Fast approximate query answering using precomputed statistics”, Data engineering proceedings, 1999.
[6] D. L. Donoho, “High dimensional data analysis: The curses and blessings of dimensionality”, AMS math challenges, 2000.
[7] E. Baralis, S. Paraboschi, E. Teniente, “Materialized view selection in a multidimensional database”, VLDB, 1997.
[8] K. Morfonios, S. Konakas, Y. Ioannidis, “ROLAP implementation of the data cube”, ACM computing surveys, 2007.
[9] C. Salka, “Ending the ROLAP/MOLAP debate: usage based aggregation and flexible HOLAP”, Data engineering, 1998.
[10] J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, “Data cube: A relational aggregation operator generalizing group-by, cross tab and sub totals”, Data mining and knowledge discovery, Springer, 1997.
[11] W. Wang, J. Feng, H. Lu, J. X. Yu, “Condensed Cube: An effective approach to reducing data cube size”, Data engineering, 2002.
[12] H. Karloff, M. Mihail, “On the complexity of the view selection problem”, ACM SIGMOD, 1999.
[13] Y. Zhao, P. M. Deshpande, J. F. Naughton, “An array based algorithm for simultaneous multidimensional aggregates”, ACM SIGMOD, 1997.
[14] K. Beyer, R. Ramakrishnan, “Bottom-Up computation of sparse and iceberg cube”, ACM SIGMOD, 1999.
[15] R. Agrawal, A. Gupta, S. Sarawagi, “Database system and method employing data cube operator for group by operations”, US patent 5832475, 1998.
[16] R. T. Ng, A. Wagner, Y. Yin, “Iceberg-cube computation with PC clusters”, ACM SIGMOD, 2001.
[17] X. Li, J. Han, H. Gonzalez, “High dimensional OLAP: a minimal cubing approach”, ACM, 2004.
[18] D. Xin, J. Han, X. Li, B. W. Wah, “Star-Cubing: computing iceberg cubes by top-down and bottom-up integration”, ACM, 2003.
[19] S. Chaudhuri, U. Dayal, V. Ganti, “Database technology for decision support systems”, Data engineering, IEEE, 2001.
[20] J. S. Vitter, M. Wang, “Approximate computation of multidimensional aggregates of sparse data using wavelets”, ACM SIGMOD, 1999.
[21] IBM Cookbook for IBM Cognos 8.4 for use with SAP NetWeaver business warehouse.
[22] R. Bayer, “The universal B-tree for multidimensional indexing: general concepts”, World wide computing and its applications, Springer, 1997.
[23] A. Guttman, “R-trees: A dynamic index structure for spatial searching”, ACM SIGMOD, 1984.
[24] D. Papadias, P. Kalnis, J. Zhang, Y. Tao, “Efficient OLAP operations in spatial data warehouses”, Advances in spatial and temporal databases, Springer, 2001.
[25] M. Jurgens, H. J. Lenz, “The Ra*-tree: an improved R*-tree with materialized data for supporting range queries on OLAP data”, ACM SIGMOD, 1998.
[26] C. Y. Chan, Y. E. Ioannidis, “Bitmap index design and evaluation”, ACM SIGMOD, 1997.
[27] M. C. Wu, A. P. Buchmann, “Encoded bitmap indexing for data warehouses”, Data engineering, IEEE, 1998.
[28] K. Wu, E. J. Otoo, A. Shoshani, “Compressing bitmap indexes for faster search operations”, Scientific and statistical database, IEEE, 2002.
[29] P. O'Neil, D. Quass, “Improved query performance with variant indexes”, ACM SIGMOD, 1997.
[30] K. Aouiche, J. Darmont, O. Boussaid, “Automatic selection of bitmap join indexes in data warehouses”, Data warehouse and knowledge discovery, Springer, 2005.
[31] N. Gorla, “Features to consider in a data warehousing system”, Communications of the ACM, 2003.
[32] H. Gupta, V. Harinarayan, A. Rajaraman, “Index selection for OLAP”, Data engineering, IEEE, 1997.
[33] F. Dehne, T. Eavis, A. Rau-Chaplin, “RCube: Parallel multidimensional ROLAP indexing”, International journal of data warehousing and mining.
[34] A. Weinberger, M. Ender, “The power of OLAP in Multidimensional world”, SUGI, 2000.
[35] A. Bauer, W. Lehner, “On solving the view selection problem in distributed data warehouse architecture”, Scientific and statistical databases, IEEE, 2003.
[36] M. O. Akinde, M. H. Bohlen, T. Johnson, L. V. S. Lakshmanan, D. Srivastava, “Efficient OLAP query processing in distributed data warehouses”, Information systems, 2003.

ABOUT THE AUTHORS

Dr. K. Dhanasree received her Ph.D. degree in Computer Science and Engineering from JNTU, Anantapur. Her research interests include Data Mining, Database Security, and Network Security.

Dr. C. Shoba Bindu received her Ph.D. degree in Computer Science and Engineering from JNTU, Anantapur. She is now serving as Professor, Dept. of CSE, JNTUA. She has guided many external and internal projects and has contributed to many reputed journals. Her research interests are Mobile and Ad hoc Networks, Network Security, Data Mining and Cloud Computing.

978-1-5090-0612-0/16/$31.00 ©2016 IEEE
