A Survey on OLAP

K. Dhanasree
Dept of CSE, DRKIST
Hyderabad, Telangana, India

C. Shobabindu
Dept of CSE, JNTUA College of Engineering
Anantapuramu, Andhra Pradesh, India

2016 IEEE International Conference on Computational Intelligence and Computing Research
978-1-5090-0612-0/16/$31.00 ©2016 IEEE

Abstract--Online analytical processing (OLAP) is today's major database technology and has completely changed the face of decision support systems. Many real-time enterprise analytical solutions are built on advanced OLAP methods. In this paper, we present an overview of the various OLAP technologies and their access paths. The paper focuses on OLAP in a distributed scenario, where we pinpoint the drawback of OLAP's natural indexing search. We designed a new translated lattice, called the pchrome lattice, whose nodes are binary. We implemented natural indexing on this translated lattice and showed a drastic reduction in indexing search space, search time and distributed communication cost.

Keywords- MOLAP, ROLAP, HOLAP, B-tree, Bitmap, R-trees, R*-trees, R-cube.

I. INTRODUCTION

Over the past decades, various database technologies have been used to answer user queries, both simple and complex. The most prominent use of database technology is in business enterprises, where decision making takes priority over transactions. Traditional database systems are transaction processing systems, which access only a few tuples per database read or write [1]. Their major drawback is that they cannot handle decision-making queries. Decision making requires an instant comparison of past and present data, and traditional databases do not store past data. Many enterprises therefore use an extended database technology called the data warehouse, which handles enormous amounts of past and present data and supports decision-making queries. Data warehouses differ considerably from traditional database applications. Major business enterprises use them mainly to analyze their business trends and track their profits. Analysts use the data warehouse to extract business information that enables better decision making. OLAP (On-line Analytical Processing) tools provide this type of interactive decision-making process [2]. OLAP applications mostly perform only data reads for decision making, and they answer real-time complex analytical queries.

The most commonly used OLAP technologies are Multidimensional On-line Analytical Processing (MOLAP), Relational On-line Analytical Processing (ROLAP) and Hybrid On-line Analytical Processing (HOLAP) [3]. They differ in their data processing capabilities, and each has its own supporting data access methodologies. Though they are competing technologies, they are widely adopted by today's decision-making enterprises.

The rest of the paper is organized as follows. In section 2 we briefly discuss the classification of OLAP technologies. In section 3 we discuss the data access methods. In section 4 we discuss when and where to use these technologies. In section 5 we discuss OLAP in a distributed scenario. Finally, section 6 concludes the paper.

II. OLAP TECHNOLOGIES

An organization's huge data is a critical resource, and powerful tools are needed to fetch the queried information. OLAP is one such technology, providing sophisticated tools that help an enterprise meet its competitive goals. Currently there are three dominant OLAP technologies:
• Multidimensional OLAP (MOLAP).
• Relational OLAP (ROLAP).
• Hybrid OLAP (HOLAP).

A. MOLAP
In MOLAP, the preprocessed data is aggregated and loaded periodically into a multidimensional array structure called the data cube [4]. The data cube is divided into sub-cubes based on the dimensional hierarchies. A data cube with n dimensions and no hierarchies has a total of $2^n$ sub-cubes, and defining hierarchies increases this number. As the dimensions and dimensional hierarchies grow, the cube becomes larger and contains many sub-cubes. A MOLAP query for a user-requested sub-cube therefore spends time on on-the-fly analysis. MOLAP speeds up this analysis with pre-computation. Pre-computation is a generic technique for short response times in which some of the sub-cubes are materialized [5]. In materialization, some of the needed measures, such as sum and average, are calculated beforehand and stored in the sub-cubes. MOLAP stores all these measures in arrays, referenced by dimension names that are strings. A MOLAP cube sits between the warehouse and the user front-end tools and analyzes the user-requested data. In a MOLAP cube with huge dimensional hierarchies, many of the smaller granules of the cube are left empty. This is the dimensional sparsity [6] of the data cube, which generates many sparse sub-cubes. The main problem with sparsity is that many OLAP methodologies search through the sparse cube to find whether the user-requested sub-cube is materialized, and this can increase the query waiting time. Research has provided many methodologies for deciding which sub-cubes to materialize [7]. To our knowledge, less work has been done on identifying
what sub-cubes are materialized. MOLAP has an inherent advantage in its natural array structure. This structure suits many OLAP access methods and makes analysis of present and past data easy. The outer cube layer contains the most recent data and the inner sub-cubes contain the past data. MOLAP uses many operations to perform on-the-fly analysis. All queries are posed directly on the MOLAP array-based lattice structure shown in Figure 5, and a string matching technique retrieves the requested view quickly. The MOLAP structure also supports easy aggregation of data along multiple dimensions.

Fig. 1. Molap Architecture

B. ROLAP
In ROLAP, the warehouse data is stored in a relational or extended-relational database. ROLAP uses tables to store the past and the present data [8]. A ROLAP server offers greater scalability for large data sets. The ROLAP server is a collection of multiple tables and sits between the data warehouse and the client front-end tools. The problem of sparsity does not arise here, because tables can be joined with multiple group-bys to answer the user query when needed. ROLAP does not store pre-computed data in advance; it calculates the aggregates from multiple tables on the fly. Because the requested aggregates may span several tables, the ROLAP server translates the user query into a multi-statement SQL (Structured Query Language) query posed on those tables. On-the-fly analysis over multiple tables may take much time, and this is the main drawback of ROLAP.

Fig. 2. Rolap Architecture

C. HOLAP
HOLAP combines the features of both MOLAP and ROLAP. It supports both MOLAP's multidimensional structure and ROLAP's SQL constructs. It uses the features of MOLAP for fast processing of user queries and the features of ROLAP for processing large data. HOLAP stores the most recent data in MOLAP for faster access and stores past data in ROLAP. HOLAP handles a complex query by dividing it into sub-queries [9]. It directs the sub-queries that span dense data sets to MOLAP and the sub-queries that span sparse data sets to ROLAP.

Fig. 3. Holap Architecture

D. OLAP OPERATIONS AND OPERATORS

OLAP includes the following basic operations for on-the-fly multidimensional analysis and faster query responses:
Roll-up: Also called aggregation; data is aggregated from lower levels to higher levels to provide a summary at the higher levels.
Drill-down: Allows navigation from higher-level data to lower-level data.
Slicing: Selects data along a single dimension; the resulting view is a table.
Dicing: Selects data along multiple dimensions; the resulting view is again a sub-cube.

OLAP uses the above operations to present the user-requested multidimensional analysis. Roll-up aggregates sub-totals into grand totals, and drill-down lets the application navigate from grand totals to sub-totals. Dicing selects a sub-cube, and slicing selects a cross-section of the cube, i.e. a table.

OLAP uses two types of operators to perform the above operations:
• The group-by.
• The cube-by.

TABLE 1. SALES DATA
Parts (A) | Locations (B) | Time (C) | Sales quantity
A1        | B1            | C1       | 4
A1        | B2            | C1       | 2
A2        | B3            | C1       | 2
A3        | B3            | C2       | 4
A3        | B3            | C3       | 1

1. Group-by operator

The group-by is a standard relational operator. It typically partitions the relation into disjoint groups of tuples and then aggregates each group along the given dimensions. The group-by operator does not allow direct aggregation along a column, i.e. rolling up sub-totals to a grand total or drilling down from a grand total to sub-totals [10]. For instance, consider the sales data shown in Table 1, where location B1 has the hierarchy members b1 and b2.

A query of the type
SQL Query 1:
    SELECT A, B, SUM(quantity)
    FROM sales
    GROUP BY B, A;

Query 1 groups the data on the dimensions B and A. It shows the total quantity of items grouped by dimension B with respect to dimension A. However, it does not show sub-totals on the single dimension B or the single dimension A. A separate group-by on B can show the totals of the distinct values of B individually (B1=4, B2=2, B3=7). It still cannot drill down into B1 to give sub-totals on the hierarchy of B1, i.e. b1=2, b2=2. Decision makers may need such sub-totals, but the group-by operator cannot provide them.

2. The problem with group-by operator

The main problem with the group-by operator is that it does not provide multidimensional group-bys and cannot analyze the aggregates of dimensional hierarchies. It also cannot support the drill-down and roll-up operations that multidimensional analysis requires. A query such as "Display the total sales from the sub-location b1" cannot be answered. To provide a multidimensional aggregate analysis, the application has to take the union of multiple SQL statements, one per combination of grouping attributes (NULL marks a dimension that has been aggregated away):

SQL Query 2:
    SELECT A, B, C, SUM(quantity) FROM sales GROUP BY A, B, C
    UNION ALL
    SELECT A, B, NULL, SUM(quantity) FROM sales GROUP BY A, B
    UNION ALL
    SELECT A, NULL, C, SUM(quantity) FROM sales GROUP BY A, C
    UNION ALL
    SELECT NULL, B, C, SUM(quantity) FROM sales GROUP BY B, C
    UNION ALL
    SELECT A, NULL, NULL, SUM(quantity) FROM sales GROUP BY A
    UNION ALL
    SELECT NULL, B, NULL, SUM(quantity) FROM sales GROUP BY B
    UNION ALL
    SELECT NULL, NULL, C, SUM(quantity) FROM sales GROUP BY C
    UNION ALL
    SELECT NULL, NULL, NULL, SUM(quantity) FROM sales;

Even such a union of SQL group-bys cannot address analysis on dimensional hierarchies, and this is the failure case of the group-by operator.

3. The cube operator
The cube operator is designed to address the drawback of the SQL group-by. A single cube-by operator can replace the union of SQL statements in SQL Query 2, as follows:

SQL Query 3:
    SELECT A, B, C, SUM(quantity)
    FROM sales
    GROUP BY CUBE (A, B, C);

The above cube-by query calculates aggregates along all possible dimensional combinations. With 3 dimensions, the cube-by operator calculates $2^3 = 8$ possible aggregates (ABC, AB, AC, BC, A, B, C and ALL), and so it addresses the failure case of the group-by operator. Combined with WHERE conditions, the cube-by operator can also provide aggregates on dimensional hierarchies.

4. Drawback of Cube-by operator

The major drawback of the cube-by operator is its high cost of use [11]. For roll-up or drill-down operations, the cube-by operator has to carry a large number of string cube-by dimensions, which may increase the operator file size. In SQL Query 3, the cube-by operator carries all 3 dimensions A, B, C. As single characters these occupy three bytes of search space. In reality, however, these dimensions are strings, and including them in the cube-by operator may increase the search space considerably. Carrying all this string data from the outer layer of the OLAP structure to the inner layer increases the search space further. For these reasons the cube operator is used less often, even though it has advantages over the group-by operator. In the context of this survey, we state a research problem:

Problem 1 description: Can the string dimensions of the cube-by be transformed into something like binaries that exactly identify the requested views, and so reduce the search space of the cube-by operator?

III. OLAP ACCESSING METHODS

Research has provided many access methods to increase the performance of OLAP. In this paper we survey two major access methodologies:
• OLAP Pre-computation.
• OLAP Indexing.

A. OLAP Pre-computation

Pre-computation materializes parts of the whole cube to provide fast query responses. The advent of warehouse technology has led to many sophisticated pre-computation methods. The pre-computed parts of the data cube are also called materialized views [12]. Pre-computation raises two questions: which views to materialize, and how to materialize them optimally. The choice of views rests on three factors: the query cost, the view materialization cost and the storage space. Sub-cubes requested by common queries get priority for materialization, so the most frequently requested sub-cubes are pre-computed.
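The following Python sketch illustrates the cube-by semantics of SQL Query 3 together with full pre-computation. It runs on the sales data of Table 1 and materializes every one of the $2^3 = 8$ group-bys. It is not the authors' implementation. The names `SALES`, `DIMS` and `compute_cube` are hypothetical, and the marker "ALL" stands for a rolled-up dimension, like the NULL placeholders of SQL Query 2.

```python
from itertools import combinations
from collections import defaultdict

# Table 1 (sales data): (A = part, B = location, C = time, sales quantity)
SALES = [
    ("A1", "B1", "C1", 4),
    ("A1", "B2", "C1", 2),
    ("A2", "B3", "C1", 2),
    ("A3", "B3", "C2", 4),
    ("A3", "B3", "C3", 1),
]
DIMS = ("A", "B", "C")


def compute_cube(rows, dims):
    """Materialize SUM(quantity) for every subset of dims (2**len(dims) group-bys).

    Returns {group_by_name: {key_tuple: total}}. Dimensions that are not part
    of a group-by appear as "ALL" in the key (hypothetical convention).
    """
    cube = {}
    n = len(dims)
    for k in range(n, -1, -1):                    # ABC first, ALL last
        for subset in combinations(range(n), k):
            name = "".join(dims[i] for i in subset) or "ALL"
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[i] if i in subset else "ALL" for i in range(n))
                totals[key] += row[n]             # the measure is the last column
            cube[name] = dict(totals)
    return cube


cube = compute_cube(SALES, DIMS)
print(sorted(cube))   # ['A', 'AB', 'ABC', 'AC', 'ALL', 'B', 'BC', 'C']
print(cube["B"])      # {('ALL', 'B1', 'ALL'): 4, ('ALL', 'B2', 'ALL'): 2, ('ALL', 'B3', 'ALL'): 7}
print(cube["ALL"])    # {('ALL', 'ALL', 'ALL'): 13}
```

The B group-by reproduces the per-location totals B1=4, B2=2, B3=7 discussed for Query 1. The hierarchy members b1 and b2 of B1 cannot be derived, because Table 1 does not record them. A partial pre-computation method would keep only some entries of `cube`, chosen by query cost, materialization cost and storage space.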

There are many approaches to performing pre-computation:

1. Multi-way
Multi-way array aggregation, discussed in [13], is a full cube computation method. It uses an array as its basic structure to pre-compute the aggregates. It relies on a chunk concept, in which the entire cube memory is partitioned into chunks. These chunks are then aggregated simultaneously across multiple dimensions to pre-compute various sub-cubes. Multi-way array aggregation is fast because it works on the MOLAP structure using direct array addressing.

Figure 4 shows multi-way array aggregation with ABCD as the base cuboid. Memory is chunked to fit ABCD, and the cuboids ABC, AB, A, etc. can be calculated from ABCD, allowing multiple aggregations across various dimensions.

Fig. 4. Multi-way (cuboids shown: ABCD; ABC, BCD; AB, BC; A)

The multi-way algorithm is infeasible for a large number of dimensions, because the larger arrays may not fit into the chunks. The method also keeps computing unnecessary aggregates, since it does not prune them first.

2. BUC
The bottom-up construction (BUC) method addresses partial cube computation with iceberg conditions [14]. It uses an Apriori-based pruning method: cells that do not satisfy a minimum threshold are pruned and excluded from the pre-computation of further aggregates.

For example, if a cell of dimension A (a particular value of A) does not satisfy the minimum threshold, then no cell of AB or ABC that extends it can satisfy the threshold either. By pruning unnecessary aggregations early, the BUC method makes efficient use of the available memory.

3. Sorting
The sort-based methods rely on optimizations over sorted aggregates [15]. A data warehouse model usually follows a fixed order in its dimensional design. Whatever the dimensional order of a query, the application has to sort it into the design order. The sort-based method optimizes the sorts by sorting the required group-by in advance and then pre-computing the prefix group-bys from that initial sort. For instance, with an initial sort on ABCD, the prefix group-bys ABC, AB and A can be pre-computed without sorting them again, which avoids additional sorts.

4. Hashing
The hash-based method relies on optimizations that cache results and reduce scans [16]. Usual pre-computation methods incur multiple scans of the dimensional attributes, which is costly. For instance, one scan pre-computes the aggregate ABC, and computing AB needs another scan, so the same attributes are scanned twice.

The hash-based method instead caches results to reduce the number of scans. For example, it maintains hash tables in memory large enough to hold AB and AC. A single scan of ABC can then pre-compute both AB and AC.

5. H-Cubing
H-cubing, discussed in [17], offers a better cube computation. H-cubing computes on a tree-like data structure called the H-tree. It builds the H-tree from the lattice structure shown in Figure 5 and computes the multidimensional aggregates from that tree. The advantage of H-cubing is that, at each level, it calculates all possible aggregates within that level before proceeding to the next higher level.

6. Star Cubing
Star cubing, discussed in [18], combines the features of multi-way, BUC and H-cubing, and uses both top-down and bottom-up computation. It builds a star tree from the lattice structure shown in Figure 5. It then identifies the star nodes, i.e. the nodes that do not satisfy the iceberg conditions, and prunes them. The advantage of star cubing is that, at each level, it also computes the aggregates of both the lower and the next higher levels using shared dimensions. At the same time it prunes the aggregates that do not satisfy the iceberg conditions.

B. OLAP Indexing

OLAP systems use indexing to provide fast access to multidimensional aggregates. Both MOLAP and ROLAP use many existing indexing methods. We examine each in the context of our survey.

1. Natural Indexing
Natural indexing, also called array-based indexing, is supported by MOLAP. The array structure of MOLAP itself forms the natural index. Natural indexing is the only indexing method that works on the storage layer of the data cube [19]. Because it works on the storage layer, it is fast and retrieves the requested views almost immediately. The lattice structure of the cube in Figure 5 presents arrays as layered sub-cubes at various levels. Each end point of a layer represents one possible group-by of the dimensions, which are usually strings. Natural indexing stores all these string nodes in string arrays. Whenever a user queries for
an aggregate such as ABC, natural indexing maps the request directly onto the end points of the layers and retrieves the corresponding sub-cube, highlighted in Figure 5. MOLAP's natural indexing thus improves performance by indexing directly on the structure of MOLAP. However, many enterprises prefer other indexing techniques, because natural indexing does not suit large data sets. The major drawbacks of data cube natural indexing are:
• As the number of dimensions grows, the cube becomes sparser [20]: many cells that represent particular attribute combinations contain no aggregated data. As a result, the natural indexing search for the pre-computed sub-cubes becomes time consuming.
• The natural indexing search directly uses the user's group-by dimensions, which are string data. Searching with string data also increases the storage space for large dimensions.

Fig. 5. Lattice structure of cube with 4 dimensions
    Level 0: ALL
    Level 1: A, B, C, D
    Level 2: AB, AC, AD, BC, BD, CD
    Level 3: ABC, ABD, ACD, BCD
    Level 4: ABCD

For instance, consider the SQL query
SQL query 4:
    SELECT A, B, C, SUM(quantity)
    FROM sales
    GROUP BY CUBE (A, B, C);

Here MOLAP indexing directly indexes the outer view ABC. It fetches the view from the string array by string comparison and retrieves the view highlighted in Figure 5. Such string comparison may require a large index file and long comparison times, and it also increases the retrieval time somewhat.

To reduce the index file size and the long string comparison time, the MOLAP-based IBM Cognos 8 uses a transformer module that transforms user queries into BEx queries (Business Explorer queries) [21]. In BEx queries, the user-requested string group-bys are represented by BEx variables, which are also long strings. For example, in the above query the group-by ABC can be assigned a BEx variable of the type SAPBWOODPP2. The Cognos locate option then matches the BEx variables.

In the context of our survey, we point out a drawback of BEx queries. BEx variables are also long strings, so string comparison can make the search index file large and the search time long. We aim for a method that transforms the user-requested string group-bys into unique binaries in the transformed query, and so reduces the search index file size and search time.

2. Tree Based Indexing And Variations
Both MOLAP and ROLAP support tree-based indexing methods. One of the traditional tree-based indexes is the B-tree [22]. A B-tree index includes sub-trees corresponding to each dimension of the data cube. As the values of the cube dimensions are unique, the B-tree uses these dimensions as index pointers to the sub-trees. By tracing the pointers, data can be retrieved easily. For an 8-byte column, the B-tree index file size is 326 MB and the construction time is 1580 s. A B-tree index on a large column is therefore expensive in terms of space and construction time. The main drawback of B-tree indexes is that the tree needs rebalancing after updates.

Other popular tree-based indexing structures supported by both MOLAP and ROLAP are R-trees [23] and aR-trees [24]. R-tree indexing supports complex range queries to some extent. Much research has extended R*-trees into structures such as Ra*-trees [25] and Hilbert R-trees. All of these use more sophisticated update algorithms. They can answer complex range queries and dynamically rebalance the tree structure whenever updates are performed. The major drawbacks of tree-based indexing are:
• Huge storage.
• Supports only a few dimensions.

3. Bitmap Indexing And Variations
Bitmap indexing was introduced to enhance performance on various query types [26]. Each attribute of the table is associated with one bitmap index, which keeps one bit vector for each distinct value of the attribute. Each position in a bit vector corresponds to a table row, identified by a row-id starting from 0. The basic idea behind bitmap indexes is to use a string of binary digits to indicate whether the indexed attribute of each row equals a specific value. A bit set to 1 indicates that the row with the corresponding row-id contains the key value; otherwise the bit is 0. Queries on one or more dimensions can be answered by combining the bitmaps of multiple dimensions with AND/OR operations. The major advantages of bitmaps are:
• Overcomes the storage limitation of B-trees.
• Bitmaps are retrieval efficient for low cardinalities.
• Sparse data can be efficiently handled.
• More CPU efficient because of their simple representation.
The major disadvantages of bitmaps are:
• Efficiency decreases for high cardinalities.
• AND/OR operations are expensive.
• As the number of dimensions increases, more bitmap vectors are needed, which results in storage overhead.
• Cannot support huge reads/updates.

Encoded bitmaps [27] and hybrid bitmap methods [28] were introduced to address this storage overhead. Encoded bitmap indexing can be used for large cardinalities. Its basic idea is to encode the attribute domain, which reduces the number of bit vectors and thus the storage space.

Other variations of bitmaps, such as projection bitmaps and bit-sliced indexes, are discussed in [29].

4. OLAP Join Indexes

The most popular OLAP index is the join index [30]. All the traditional indexing methods discussed above map a column value to the group of rows having that value within the same relation. Every relation must also be indexed separately, and queries that need tuples from two or more tables raise the cost of joins under traditional indexing. The index size may also grow with the number of relations. In contrast, the join index provides a combined index on two or more tables. Its indexed records contain the joinable rows of the relations, so the joinable tuples can be identified easily without further costly joins.

IV. CHOICE BETWEEN OLAP TECHNOLOGIES
Real-world enterprises do not all use strictly MOLAP, strictly ROLAP or strictly HOLAP. Their choice varies with the potential benefits of OLAP, such as improving decision making, providing accurate analysis, providing all user-required information, improving working efficiency and increasing user productivity.

A. Choice Between MOLAP / ROLAP / HOLAP

• MOLAP best suits non-sophisticated users, as it uses user-friendly graphical visualization techniques, whereas ROLAP suits sophisticated users [31].
• MOLAP is preferred when users need consistent information over a period of time. ROLAP should be used when requirements change frequently, because of its flexible query capability.
• MOLAP should be adopted for decision making on past data, whereas ROLAP is adopted for decision making on current data.
• Because MOLAP is easy to use, it is recommended when an enterprise is starting out. After considerable decision-making experience, a ROLAP system is preferred for its flexibility and its ability to handle complex queries.

Table 2 shows various OLAP technologies, their features and the enterprises implementing them.

TABLE 2. OLAP TECHNOLOGIES
OLAP Technology | Features | Adopted by
MOLAP | High performance, less scalable for huge dimensions | Microsoft SQL Server 2005, Essbase server from Hyperion
ROLAP | Low performance, more scalable for huge dimensions | Microsoft SQL Server 2005, MicroStrategy's DSS Server, Informix's MetaCube
HOLAP | High performance and scalable for huge dimensions | SAS server

B. Choice between OLAP Indexes

The main problem with OLAP indexing is that, even today, analysts have no definite guideline for choosing the best-suited indexing method:

• For a data warehouse (DW) system with very few dimensions, MOLAP natural indexing is advisable. An enterprise that is just starting out can adopt MOLAP natural indexing.
• For a frequently updated DW system, tree indexing is advisable, because tree indexes can be rebalanced dynamically as updates are performed. B-trees are advisable for an enterprise using only a few dimensions. R*-trees, a variation of tree indexing, are advisable for answering complex range queries [32].
• Bitmap indexing is best suited for columns with few distinct values, and it supports more dimensions than B-trees. Bitmap indexing is advisable for a DW system that is not frequently updated.

Table 3 shows various OLAP indexing techniques, their features and the enterprises implementing them.

TABLE 3. OLAP INDEXING
OLAP Indexing | Features | Adopted by
Natural indexing | Easy and fast retrieval, supports few dimensions | Microsoft Corporation, IBM's Cognos
Tree indexing | Supports huge dimensions; huge storage space and retrieval time when compared to natural and bitmap indexing | IBM SPSS, Oracle, Red Brick
Bitmap indexing | Easy, supports huge dimensions, less storage when compared to tree index | InterSystems Corporation, Oracle, Red Brick, DB2
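The bitmap ideas surveyed in the bitmap indexing subsection can be illustrated with a minimal Python sketch on the Parts and Locations columns of Table 1. A simple bitmap index keeps one bit vector per distinct value. An encoded bitmap index, in the spirit of the encoded bitmaps cited above, binary-encodes each value and keeps only ceil(log2 k) vectors for k distinct values. Bit vectors are modelled as Python integers in which bit i corresponds to row-id i. All function names are hypothetical, and this is not how any product listed in Table 3 implements its index.

```python
from math import ceil, log2

# Table 1 columns, indexed by row-id 0..4
PARTS = ["A1", "A1", "A2", "A3", "A3"]
LOCATIONS = ["B1", "B2", "B3", "B3", "B3"]


def bitmap_index(column):
    """Simple bitmap index: one bit vector (an int) per distinct value.
    Bit i is 1 iff the row with row-id i holds that value."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def row_ids(bits):
    """Decode a bit vector into the row-ids whose bit is set."""
    return [i for i in range(bits.bit_length()) if bits >> i & 1]


def encoded_bitmap_index(column):
    """Encoded bitmap index: give each distinct value an integer code and keep
    one bit vector per code bit, i.e. ceil(log2(k)) vectors for k values."""
    codes = {v: c for c, v in enumerate(sorted(set(column)))}
    width = max(1, ceil(log2(len(codes))))
    slices = [0] * width
    for row_id, value in enumerate(column):
        for b in range(width):
            if codes[value] >> b & 1:
                slices[b] |= 1 << row_id
    return codes, slices


def encoded_lookup(codes, slices, value, n_rows):
    """Rows equal to value: AND each slice, or its complement, per code bit."""
    all_rows = (1 << n_rows) - 1
    result = all_rows
    for b, s in enumerate(slices):
        result &= s if codes[value] >> b & 1 else (~s & all_rows)
    return result


parts_idx = bitmap_index(PARTS)
loc_idx = bitmap_index(LOCATIONS)
# Predicate A = 'A3' AND B = 'B3' answered with a single bitwise AND
print(row_ids(parts_idx["A3"] & loc_idx["B3"]))   # [3, 4]

codes, slices = encoded_bitmap_index(LOCATIONS)
print(len(loc_idx), len(slices))                   # 3 2  (plain vs encoded vectors)
print(row_ids(encoded_lookup(codes, slices, "B3", len(LOCATIONS))))  # [2, 3, 4]
```

The encoded variant stores fewer vectors (2 instead of 3 here, roughly log2(k) instead of k in general). In exchange, every lookup must combine all the slices.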
C. Indexing Performance Study

Here we study the performance of various OLAP indexing methods by comparing the index file size and search time over varied dimensions. For example, consider a sales warehouse with three dimensions A, B, C, and suppose each dimension holds 10000 tuples. Consider the query:
SQL query 5:
    SELECT COUNT(*)
    FROM sales
    WHERE A = 'X'
    GROUP BY A, B, C;

Table 4 compares the index file size in bytes with the index construction time and the query retrieval time in microseconds (µs).

TABLE 4. INDEX COMPARISON
Index Type | File Size (bytes) | Construction Time (µs) | Retrieval Time (µs)
Array      | 10                | 1200                   | 10
B-tree     | 10                | 1000                   | 40
Bitmap     | 6                 | 800                    | 20

As Table 4 shows, a bitmap index requires less storage space than a B-tree. As the number of dimensions increases, B-tree indexing fails to scale with the growing file size. Figure 6 compares the search times of the various indexing methods. Natural indexing searches quickly for a small number of dimensions, but its search time grows as the dimensions increase. This is because the natural indexing search runs on large dimensional group-bys of string data, and searching string data takes comparatively more time. The B-tree and bitmap searches scale well even for high dimensions.

Fig. 6. Search time comparison

V. OLAP IN DISTRIBUTED SCENARIO

Data warehouses usually contain huge amounts of real-world data that must be analyzed. Most of today's OLAP applications work on a centralized data warehouse, in which a single database holds huge amounts of data. Because data warehouses tend to be extremely large, a centralized data warehouse is very expensive. This has led to the distributed scenario, in which a large data warehouse is partitioned and distributed across various locations. All the OLAP technologies scale considerably well in the distributed scenario.

A. Distributed MOLAP

Many current distributed OLAP systems use the MOLAP approach, mainly because it executes OLAP queries quickly. However, MOLAP does not scale well to a large number of dimensions. Many distributed enterprises therefore fragment the dimensions vertically, placing a few dimensions at each location and joining these fragments to answer user queries. This scenario also supports centralized access methods such as natural indexing, B-tree indexing and bitmap indexing.

B. Distributed ROLAP

Though many vendors sacrifice scalability for performance, enterprises are adopting scalable ROLAP to support today's huge data warehouses. Indexing by R-trees and R*-trees is still supported. Many of today's ROLAP enterprises use a novel distributed index called RCUBE indexing [33]. RCUBE indexing is fast; it combines packed R-trees with distributed striping and Hilbert-curve-based data ordering.
The supporting features of RCUBE are:
• Low communication volume.
• Scalable in terms of data sizes and dimensions.

C. Distributed HOLAP
Many current distributed systems use HOLAP. Distributed HOLAP stores the frequently requested sub-cubes in an MDDB (Multidimensional Database) and the less popular parts in a remote RDB (Relational Database) [34]. Besides its flexibility for larger data sets, distributed HOLAP provides other features, such as:
• Caching - Distributed HOLAP saves the results of a query so that they can be reused later.
• Logging - Distributed HOLAP creates a log file that stores information about each query.

D. Distributed OLAP Querying

The distributed nature of the OLAP architecture results in the following costs [35]:
• Query processing cost.
• Communication cost.
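Part of the communication cost, as with Problem 1 in Section II, comes from the size of the string identifiers of group-bys. The following Python sketch illustrates the general idea of identifying views by bitmask instead of by string; it is not the authors' pchrome lattice. Each node of the lattice in Figure 5 gets a bitmask, with bit i set when dimension i is a grouping attribute. The sketch then compares the bytes needed to ship the requested group-bys as comma-separated names against packed masks. It assumes that all sites share one fixed dimension ordering, and the dimension names below are hypothetical.

```python
from itertools import combinations

# Hypothetical long dimension names (not from the paper); all sites must share this order
DIMENSIONS = ["product_category", "store_location", "sales_quarter", "customer_segment"]


def to_mask(group_by, dims=DIMENSIONS):
    """Bitmask for a group-by: bit i is set iff dims[i] is a grouping attribute."""
    mask = 0
    for name in group_by:
        mask |= 1 << dims.index(name)
    return mask


def from_mask(mask, dims=DIMENSIONS):
    """Inverse mapping: recover the grouping attributes from a bitmask."""
    return tuple(d for i, d in enumerate(dims) if mask >> i & 1)


def lattice(dims=DIMENSIONS):
    """All 2**n nodes of the cube lattice (Figure 5 for n = 4), as bitmasks."""
    n = len(dims)
    return [to_mask(c, dims) for k in range(n + 1) for c in combinations(dims, k)]


def string_payload(group_bys):
    """Bytes needed to send the group-bys as comma-separated names, one per line."""
    return len("\n".join(",".join(g) for g in group_bys).encode("utf-8"))


def mask_payload(group_bys, dims=DIMENSIONS):
    """Bytes needed to send the same group-bys as fixed-width bitmasks."""
    width = max(1, (len(dims) + 7) // 8)   # bytes per mask
    return len(group_bys) * width


nodes = lattice()
requests = [from_mask(m) for m in nodes]    # ask for every view of the cube
print(len(nodes))                           # 16
print(string_payload(requests), mask_payload(requests))
```

The mask form needs one byte per view for up to eight dimensions, however long the dimension names are. The string form grows with the names. The saving depends on every site agreeing on the dimension ordering, which this sketch simply assumes.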
2016 IEEE International Conference on Computational Intelligence and Computing Research

E. Query Processing Cost

Early OLAP systems were criticized as inefficient at handling complex queries involving many operations. The efficiency of a distributed OLAP system is measured by how well it optimizes queries. Query optimization transforms complex queries into equivalent forms built from lower-cost operations. Many distributed OLAP systems apply query optimization to reduce processing cost, using transformation mechanisms over the search space. A desirable optimization incurs less cost by reducing the search space and minimizing the response time.

Distributed OLAP optimizations include many transformation techniques in which the original query is translated into algebraic expressions that represent it. Evaluating the user query through these algebraic expressions costs less than evaluating the original query. Many distributed query evaluation techniques succeed in minimizing the search time but fail to reduce the search space.

The SKALLA system discussed in [36] uses the Multidimensional Join (MDJ) and Generalized MDJ (GMDJ) operators to express OLAP queries. The GMDJ operator optimizes complex OLAP queries by separating the aggregate functions, the definitions and the dimensions of the complex query into lower-cost operator notation. While separating the query into GMDJ expressions, however, the SKALLA system still carries the string dimensions of the cube-by query. GMDJ expressions with string dimensions may decrease the query response time, but the search space, which includes those string dimensions, may still be large.

For instance, let A and B be two tables, $f_1, f_2, \dots, f_n$ a list of aggregate functions, and $a_1, a_2, \dots, a_n$ and $b_1, b_2, \dots, b_n$ the dimensional attributes of A and B. The GMDJ expression is then also a relation, of the form $(f_1(A.a_1), f_2(A.a_2), \dots, f_n(A.a_n), f_1(B.b_1), \dots, f_n(B.b_n))$. Because the search is carried out over GMDJ expressions that include the dimensions and dimensional hierarchies $A.a_1, A.a_2, \dots, B.b_1, B.b_2, \dots$, which are strings, the search space grows.

F. Communication Cost

Reducing the communication cost increases the efficiency of OLAP technology.

For example, consider the following SQL query, where R denotes the fact table (left unnamed in the original query):

SQL query 6:
SELECT A, B, C, D, SUM(s)
FROM R
GROUP BY CUBE (A, B, C, D);

The query requests the group-by over the four attributes A, B, C and D, all of which are strings. This group-by represents a materialized view of the whole MOLAP cube shown in Figure 5. In a distributed scenario, if this view is not present at a node, the query has to be redirected to other nodes. Our survey found that in a distributed scenario OLAP redirects the query in a translated form that includes the group-by attributes, which are strings. Communicating string group-bys to more than one node may increase the communication cost.

The GMDJ relations discussed above are used to count the number of query redirects to the various distributed nodes. When redirecting these GMDJ relations, the SKALLA system includes a reduced base table of the original query, thus reducing the communication cost. However, the reduced base relations still include the string dimensions, which may increase the communication cost. This leaves us with the following problem:

Problem 2 description: Can group-bys be translated from string data into a binary-like form, so that communicating binaries instead of strings decreases the communication cost? We have started work on combining these two problems and plan to publish it as a future extension to distributed OLAP technologies. Using this translation of cube-by dimensions to binaries, we can address the problem of the cube-by operator, make MOLAP less costly, and thus make MOLAP technology suitable for wide adoption.

VI. CONCLUSIONS AND FUTURE WORK

In this paper we discussed prominent OLAP technologies and their access methods. Although MOLAP and ROLAP differ in their features, both are considerably extending their services to real-time decision making. Many enterprises initially adopt MOLAP and then, once better acquainted with its usage, switch to ROLAP. We surveyed various standard OLAP access methods and their disadvantages. In the context of our survey we highlighted MOLAP's advantage of fast search and retrieval. Because of MOLAP's sparsity problem as dimensionality grows, many enterprises are moving to ROLAP. Even distributed OLAP suffers from increasing communication cost. Research in data cube technology continues to produce new indexing methods; some of these methods are still under our study, and we will report on them in future work.

We are working on making MOLAP technology usable in a way that reduces sparsity and provides fast search, and on communicating the group-bys in the distributed MOLAP cube architecture so as to reduce the communication cost.

As a future enhancement, we want to map the MOLAP lattice structure to a compressed lattice structure whose nodes are binaries rather than strings. We plan to use a query transformation mechanism that translates the string cube-by dimensions into binaries that exactly represent the lattice nodes. The search can then be carried out on the compressed lattice of binaries, reducing both the query retrieval time and the search space. In the distributed OLAP architecture, if the cube-by view is not present at a location, then instead of communicating the string cube-by dimensions our method communicates binaries that are equivalent to the requested view, thereby reducing the communication cost.
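The compressed lattice above is left to future work, so the following Python sketch shows only one possible encoding, under our own assumptions rather than the authors' design. Each dimension of the running example A, B, C, D is given a fixed bit position (an order every node would have to agree on), so each of the 2^4 = 16 group-bys produced by a CUBE over A, B, C, D becomes a 4-bit integer. The names DIMS, encode, decode, lattice, children and payload_sizes are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# Hypothetical fixed dimension order shared by every node: bit i stands for DIMS[i].
DIMS: Tuple[str, ...] = ("A", "B", "C", "D")


def encode(group_by: Sequence[str], dims: Sequence[str] = DIMS) -> int:
    """Group-by (set of dimension names) -> bitmask with bit i set iff dims[i] is grouped."""
    mask = 0
    for name in group_by:
        mask |= 1 << dims.index(name)
    return mask


def decode(mask: int, dims: Sequence[str] = DIMS) -> Tuple[str, ...]:
    """Bitmask -> dimension names, in the fixed dimension order."""
    return tuple(d for i, d in enumerate(dims) if (mask >> i) & 1)


def lattice(dims: Sequence[str] = DIMS) -> List[int]:
    """All 2**n group-bys computed by a CUBE over dims, most detailed (ABCD) first, apex last."""
    n = len(dims)
    return [encode(c, dims) for k in range(n, -1, -1) for c in combinations(dims, k)]


def children(mask: int, n: int = len(DIMS)) -> List[int]:
    """Lattice edges: each child drops exactly one grouped dimension (clears one set bit)."""
    return [mask & ~(1 << i) for i in range(n) if (mask >> i) & 1]


def payload_sizes(group_by: Sequence[str], dims: Sequence[str] = DIMS) -> Tuple[int, int]:
    """Bytes needed to ship the requested view as text vs. as a packed bitmask."""
    as_text = ",".join(group_by).encode("utf-8")
    as_bits = encode(group_by, dims).to_bytes((len(dims) + 7) // 8, "big")
    return len(as_text), len(as_bits)
```

For the view requested by SQL query 6, the text form "A,B,C,D" takes 7 bytes while the bitmask 0b1111 fits in one byte, and with realistic dimension names the gap widens. Moving down the lattice is a single bit-clearing operation instead of string manipulation. The encoding only helps if every node uses the same dimension order, which is why DIMS is treated here as shared metadata.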

REFERENCES

[1] H. Hasan and P. Hyland, "Using OLAP and multidimensional data for decision making", IT Professional, vol. 3, no. 5, pp. 44-50, IEEE, September-October 2001.
[2] S. Agarwal, R. Agrawal, P. M. Deshpande, A. Gupta, J. F. Naughton, R. Ramakrishnan and S. Sarawagi, "On the computation of multidimensional aggregates", VLDB, 1996.
[3] S. Chaudhuri, U. Dayal and V. Ganti, "Database technology for decision support systems", IEEE, 2001.
[4] S. Chaudhuri and U. Dayal, "An overview of data warehousing and OLAP technology", ACM SIGMOD, 1997.
[5] V. Poosala and V. Ganti, "Fast approximate query answering using precomputed statistics", Data Engineering Proceedings, 1999.
[6] D. L. Donoho, "High-dimensional data analysis: The curses and blessings of dimensionality", AMS Math Challenges, 2000.
[7] E. Baralis, S. Paraboschi and E. Teniente, "Materialized view selection in a multidimensional database", VLDB, 1997.
[8] K. Morfonios, S. Konakas and Y. Ioannidis, "ROLAP implementation of the data cube", ACM Computing Surveys, 2007.
[9] C. Salka, "Ending the ROLAP/MOLAP debate: Usage based aggregation and flexible HOLAP", Data Engineering, 1998.
[10] J. Gray, S. Chaudhuri, A. Bosworth and A. Layman, "Data cube: A relational aggregation operator generalizing group-by, cross-tab and sub-totals", Data Mining and Knowledge Discovery, Springer, 1997.
[11] W. Wang, J. Feng, H. Lu and J. X. Yu, "Condensed cube: An effective approach to reducing data cube size", Data Engineering, 2002.
[12] H. Karloff and M. Mihail, "On the complexity of the view selection problem", ACM SIGMOD, 1999.
[13] Y. Zhao, P. M. Deshpande and J. F. Naughton, "An array-based algorithm for simultaneous multidimensional aggregates", ACM SIGMOD, 1997.
[14] K. Beyer and R. Ramakrishnan, "Bottom-up computation of sparse and iceberg cubes", ACM SIGMOD, 1999.
[15] R. Agrawal, A. Gupta and S. Sarawagi, "Database system and method employing data cube operator for group-by operations", US Patent 5832475, 1998.
[16] R. T. Ng, A. Wagner and Y. Yin, "Iceberg-cube computation with PC clusters", ACM SIGMOD, 2001.
[17] X. Li, J. Han and H. Gonzalez, "High-dimensional OLAP: A minimal cubing approach", ACM, 2004.
[18] D. Xin, J. Han, X. Li and B. W. Wah, "Star-Cubing: Computing iceberg cubes by top-down and bottom-up integration", ACM, 2003.
[19] S. Chaudhuri, U. Dayal and V. Ganti, "Database technology for decision support systems", Data Engineering, IEEE, 2001.
[20] J. S. Vitter and M. Wang, "Approximate computation of multidimensional aggregates of sparse data using wavelets", ACM SIGMOD, 1999.
[21] IBM, "Cookbook for IBM Cognos 8.4 for use with SAP NetWeaver Business Warehouse".
[22] R. Bayer, "The universal B-tree for multidimensional indexing: General concepts", Worldwide Computing and Its Applications, Springer, 1997.
[23] A. Guttman, "R-trees: A dynamic index structure for spatial searching", ACM SIGMOD, 1984.
[24] D. Papadias, P. Kalnis, J. Zhang and Y. Tao, "Efficient OLAP operations in spatial data warehouses", Advances in Spatial and Temporal Databases, Springer, 2001.
[25] M. Jurgens and H. J. Lenz, "The Ra*-tree: An improved R*-tree with materialized data for supporting range queries on OLAP data", ACM SIGMOD, 1998.
[26] C. Y. Chan and Y. E. Ioannidis, "Bitmap index design and evaluation", ACM SIGMOD, 1997.
[27] M. C. Wu and A. P. Buchmann, "Encoded bitmap indexing for data warehouses", Data Engineering, IEEE, 1998.
[28] K. Wu, E. J. Otoo and A. Shoshani, "Compressing bitmap indexes for faster search operations", Scientific and Statistical Database Management, IEEE, 2002.
[29] P. O'Neil and D. Quass, "Improved query performance with variant indexes", ACM SIGMOD, 1997.
[30] K. Aouiche, J. Darmont and O. Boussaid, "Automatic selection of bitmap join indexes in data warehouses", Data Warehousing and Knowledge Discovery, Springer, 2005.
[31] N. Gorla, "Features to consider in a data warehousing system", Communications of the ACM, 2003.
[32] H. Gupta, V. Harinarayan and A. Rajaraman, "Index selection for OLAP", Data Engineering, IEEE, 1997.
[33] F. Dehne, T. Eavis and A. Rau-Chaplin, "RCUBE: Parallel multidimensional ROLAP indexing", International Journal of Data Warehousing and Mining.
[34] A. Wrinberger and M. Ender, "The power of OLAP in multidimensional world", SUGI, 2000.
[35] A. Bauer and W. Lehner, "On solving the view selection problem in distributed data warehouse architectures", Scientific and Statistical Database Management, IEEE, 2003.
[36] M. O. Akinde, M. H. Bohlen, T. Johnson, L. V. S. Lakshmanan and D. Srivastava, "Efficient OLAP query processing in distributed data warehouses", Information Systems, 2003.

ABOUT THE AUTHORS

Dr. K. Dhanasree received her Ph.D. degree in Computer Science and Engineering from JNTU, Anantapur. Her research interests include Data Mining, Database Security, and Network Security.

Dr. C. Shoba Bindu received her Ph.D. degree in Computer Science and Engineering from JNTU, Anantapur. She is currently a Professor in the Dept. of CSE, JNTUA. She has guided many external and internal projects and has contributed to many reputed journals. Her research interests are Mobile and Ad hoc Networks, Network Security, Data Mining and Cloud Computing.
