Unit II OLAP
Analysis Services
Online analytical processing (OLAP) is a technology that organizes large business databases and supports
complex analysis. It can be used to perform complex analytical queries without negatively affecting
transactional systems.
The databases that a business uses to store all its transactions and records are called online transaction
processing (OLTP) databases. These databases usually have records that are entered one at a time. Often
they contain a great deal of information that is valuable to the organization. The databases that are used
for OLTP, however, were not designed for analysis. Therefore, retrieving answers from these databases is
costly in terms of time and effort. OLAP systems were designed to help extract this business intelligence
information from the data in a highly performant way. This is because OLAP databases are optimized for
heavy-read, low-write workloads.
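The contrast between the two workloads can be sketched in a few lines of Python. This is only an illustration using the standard sqlite3 module; the orders table and its values are hypothetical, and a real OLAP system would use a separate, read-optimized store rather than the same database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style workload: small writes, one record at a time.
for region, amount in [("North", 120.0), ("South", 80.0), ("North", 200.0)]:
    conn.execute("INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount))
conn.commit()

# OLAP-style workload: a read-only query that scans and summarizes many rows.
totals = dict(conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region"))
print(totals)
```

The insert statements touch one row each, while the summary query reads every row, which is the access pattern OLAP stores are tuned for.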
Semantic modeling
A semantic data model is a conceptual model that describes the meaning of the data elements it contains.
Organizations often have their own terms for things, sometimes with synonyms, or even different
meanings for the same term. For example, an inventory database might track a piece of equipment with
an asset ID and a serial number, but a sales database might refer to the serial number as the asset ID.
There is no simple way to relate these values without a model that describes the relationship.
Semantic modeling provides a level of abstraction over the database schema, so that users don't need to
know the underlying data structures. This makes it easier for end users to query data without performing
aggregates and joins over the underlying schema. Columns are also usually renamed to more
user-friendly names, so that the context and meaning of the data are more obvious.
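As a rough sketch of this idea, a database view can act as a very small semantic layer: it hides the join and exposes friendly column names. The table names, column names, and values below are hypothetical and follow the asset ID / serial number example above; real semantic models offer far more than a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inv_equipment (asset_id TEXT, serial_no TEXT, descr TEXT);
CREATE TABLE sales_lines (asset_id TEXT, amt REAL);  -- here asset_id holds the serial number
INSERT INTO inv_equipment VALUES ('A-1', 'SN-100', 'Forklift');
INSERT INTO sales_lines VALUES ('SN-100', 2500.0);

-- The semantic layer: the join rule and the friendly names live in one place.
CREATE VIEW equipment_sales AS
SELECT e.asset_id  AS "Asset ID",
       e.serial_no AS "Serial Number",
       e.descr     AS "Equipment",
       s.amt       AS "Sale Amount"
FROM inv_equipment e
JOIN sales_lines s ON s.asset_id = e.serial_no;
""")

# End users query the view without knowing how the two sources relate.
row = conn.execute(
    'SELECT "Asset ID", "Serial Number", "Sale Amount" FROM equipment_sales'
).fetchone()
print(row)
```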
Semantic modeling is predominantly used for read-heavy scenarios, such as analytics and business
intelligence (OLAP), as opposed to more write-heavy transactional data processing (OLTP). This is mostly
due to the nature of a typical semantic layer:
• Aggregation behaviors are set so that reporting tools display them properly.
• Business logic and calculations are defined.
• Time-oriented calculations are included.
• Data is often integrated from multiple sources.
Traditionally, the semantic layer is placed over a data warehouse for these reasons.
There are two primary types of semantic models:
• Tabular. Uses relational modeling constructs (model, tables, columns). Internally, metadata is
inherited from OLAP modeling constructs (cubes, dimensions, measures). Code and script use OLAP
metadata.
• Multidimensional. Uses traditional OLAP modeling constructs (cubes, dimensions, measures).
The multidimensional data model is a method of organizing the data in a database so that its
contents are well arranged and easy to assemble.
The multidimensional data model lets users ask analytical questions about market or business
trends, whereas relational databases give users access to data in the form of queries. It lets
users receive answers to their requests rapidly, because the data can be created and examined
comparatively fast.
OLAP (online analytical processing) and data warehousing use multidimensional databases to
show multiple dimensions of the data to users.
The model represents data in the form of data cubes. A data cube lets users model and view the
data from many dimensions and perspectives. It is defined by dimensions and facts and is
represented by a fact table. Facts are numerical measures, and the fact table contains the
measures (the names of the facts) for the related dimension tables.
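The relationship between dimensions, facts, and the fact table can be sketched in plain Python. The item and time dimensions and all figures below are hypothetical; the point is only that each fact row carries keys into the dimensions plus a numeric measure, and that a cube cell is addressed by one value per dimension.

```python
from collections import defaultdict

# Dimension tables: descriptive context, keyed by an id.
dim_item = {1: "Keyboard", 2: "Mouse"}
dim_time = {1: "Q1", 2: "Q2"}

# Fact table: each row holds keys into the dimensions plus a numeric measure.
fact_sales = [
    {"item_id": 1, "time_id": 1, "amount": 10},
    {"item_id": 1, "time_id": 2, "amount": 14},
    {"item_id": 2, "time_id": 1, "amount": 6},
    {"item_id": 2, "time_id": 1, "amount": 3},
]

def build_cube(facts):
    """Sum the measure into one cell per (item, quarter) coordinate."""
    cube = defaultdict(int)
    for row in facts:
        cell = (dim_item[row["item_id"]], dim_time[row["time_id"]])
        cube[cell] += row["amount"]
    return dict(cube)

cube = build_cube(fact_sales)
print(cube[("Mouse", "Q1")])  # 9
```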
The multidimensional data model is built by following a fixed sequence of steps.
Every project that builds a multidimensional data model should go through the following
stages :
Stage 1 : Assembling data from the client : In the first stage, the correct data is collected
from the client. Typically, software professionals explain to the client the range of data that
can be obtained with the selected technology, and then collect the complete data in detail.
Stage 2 : Grouping different segments of the system : In the second stage, all the data is
identified and classified into the sections it belongs to, which also makes the model easier to
apply step by step.
Stage 3 : Identifying the different dimensions : The third stage is the basis on which the
design of the system rests. In this stage, the main factors are identified from the
user’s point of view. These factors are also known as “Dimensions”.
Stage 4 : Preparing the factors and their respective qualities : In the fourth stage,
the factors identified in the previous stage are used to identify their related
qualities. These qualities are also known as “attributes” in the database.
Stage 5 : Finding the facts for the factors listed previously and their qualities : In
the fifth stage, the facts are separated from the factors collected so far. These facts
play a significant role in the arrangement of a multidimensional data model.
Stage 6 : Building the schema to hold the data, based on the information collected
in the stages above : In the sixth stage, a schema is built on the basis of the data
collected previously.
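The outcome of Stages 3 to 6 can be sketched as a small schema. The layout below is a star schema (one fact table referencing several dimension tables), which is one common choice and an assumption here, since the text does not name a schema type. All table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stages 3 and 4: dimensions and their attributes.
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_time     (time_id     INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);

-- Stage 5: the fact (a numeric measure), keyed by the dimensions.
CREATE TABLE fact_revenue (
    location_id INTEGER REFERENCES dim_location(location_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    time_id     INTEGER REFERENCES dim_time(time_id),
    revenue     REAL
);
""")

# Stage 6: the schema now exists and is ready to receive data.
tables = sorted(
    r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```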
For example :
1. Let us take the example of a firm. The revenue of a firm can be analyzed on the basis of
different factors such as the geographical location of the firm’s workplace, the products of the
firm, the advertisements run, the time taken to develop a product, etc.
Example 1
2. Let us take the example of a factory that sells products per quarter in Bangalore.
The data is represented in the table given below :
2D factory data
In the presentation above, the factory’s sales for Bangalore are shown along the time dimension,
which is organized into quarters, and the item dimension, which is sorted according to the kind
of item sold. The facts here are represented in rupees (in thousands).
Now, suppose we want to view the sales data along three dimensions. The three-dimensional data
can still be represented as a two-dimensional table.
Let us consider the data according to item, time and location (like Kolkata, Delhi, Mumbai). Here is
the table :
3D data representation as 2D
This data can be represented in the form of three dimensions conceptually, which is shown in the
image below :
3D data representation
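A short Python sketch of the same idea follows. It does not reproduce the tables from the example; the figures are hypothetical (in thousands of rupees). The cube is a mapping keyed by (location, quarter, item), and each location yields one two-dimensional table.

```python
# Hypothetical figures in thousands of rupees, keyed by (location, quarter, item).
sales = {
    ("Delhi", "Q1", "Keyboard"): 30,
    ("Delhi", "Q1", "Mouse"): 12,
    ("Delhi", "Q2", "Keyboard"): 35,
    ("Mumbai", "Q1", "Keyboard"): 22,
    ("Mumbai", "Q2", "Mouse"): 9,
}

def table_for(location):
    """Return the 2-D (quarter, item) table for one location."""
    return {(q, item): v for (loc, q, item), v in sales.items() if loc == location}

def print_table(location):
    """Print one location's table; missing cells are shown as 0."""
    table = table_for(location)
    quarters = sorted({q for q, _ in table})
    items = sorted({i for _, i in table})
    print(f"Location = {location}")
    print("quarter  " + "  ".join(f"{i:>8}" for i in items))
    for q in quarters:
        cells = "  ".join(f"{table.get((q, i), 0):>8}" for i in items)
        print(f"{q:<7}  {cells}")

for city in ("Delhi", "Mumbai"):
    print_table(city)
```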
Relational OLAP
ROLAP servers are placed between the relational back-end server and the client front-end tools. To
store and manage warehouse data, ROLAP uses a relational or extended-relational DBMS.
ROLAP includes the following −
Multidimensional OLAP
MOLAP uses array-based multidimensional storage engines for multidimensional views of data. With
multidimensional data stores, the storage utilization may be low if the data set is sparse. Therefore, many
MOLAP servers use two levels of data storage representation to handle dense and sparse data sets.
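One way to picture the two storage levels is a function that keeps a block of cells as a full array when it is mostly filled and as a coordinate map when it is mostly empty. This is a simplified sketch, not how any particular MOLAP server works; the 0.5 threshold and the function names are assumptions, and blocks are assumed to be non-empty.

```python
def choose_storage(block, threshold=0.5):
    """Pick a representation for one block of cube cells.

    `threshold` is an arbitrary cut-off chosen for this sketch.
    """
    cells = [v for row in block for v in row]
    density = sum(1 for v in cells if v != 0) / len(cells)
    if density >= threshold:
        return "dense", block  # keep the full array
    sparse = {
        (r, c): v
        for r, row in enumerate(block)
        for c, v in enumerate(row)
        if v != 0
    }
    return "sparse", sparse  # store only the non-empty cells

def read_cell(kind, store, r, c):
    """Read one cell regardless of how the block is stored."""
    return store[r][c] if kind == "dense" else store.get((r, c), 0)

dense_block = [[4, 7, 1], [3, 0, 8]]
sparse_block = [[0, 0, 5], [0, 0, 0], [7, 0, 0]]

kind, store = choose_storage(sparse_block)
print(kind, store)
```

The sparse form stores two entries instead of nine cells, which is the storage saving the paragraph above refers to.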
Hybrid OLAP
Hybrid OLAP is a combination of both ROLAP and MOLAP. It offers the higher scalability of ROLAP and
the faster computation of MOLAP. HOLAP servers can store large volumes of detailed
information. The aggregations are stored separately in a MOLAP store.
OLAP Operations
Since OLAP servers are based on a multidimensional view of data, we will discuss OLAP operations on
multidimensional data.
Here is the list of OLAP operations −
• Roll-up
• Drill-down
• Slice and dice
• Pivot (rotate)
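Slice, dice, and pivot can be sketched over a tiny cube held as a Python dictionary. The cube contents, dimension names, and function names are hypothetical; the sketch only illustrates what each operation does to the set of cells.

```python
DIMS = ("location", "quarter", "item")

# Hypothetical cube: one cell per (location, quarter, item).
cube = {
    ("Delhi", "Q1", "Mouse"): 10,
    ("Delhi", "Q2", "Mouse"): 12,
    ("Mumbai", "Q1", "Mouse"): 7,
    ("Mumbai", "Q1", "Keyboard"): 5,
}

def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value and drop it from the key."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}

def dice_cube(cube, **allowed):
    """Dice: keep cells whose coordinates fall in the allowed sets; no dimension is dropped."""
    return {
        k: v
        for k, v in cube.items()
        if all(k[DIMS.index(d)] in values for d, values in allowed.items())
    }

def pivot(table2d):
    """Pivot: swap the two axes of a 2-D view."""
    return {(b, a): v for (a, b), v in table2d.items()}

q1 = slice_cube(cube, "quarter", "Q1")           # 2-D view: (location, item)
delhi = dice_cube(cube, location={"Delhi"}, quarter={"Q1", "Q2"})
rotated = pivot(q1)                              # 2-D view: (item, location)
print(q1, delhi, rotated, sep="\n")
```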
Roll-up
Roll-up performs aggregation on a data cube in any of the following ways −
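Roll-up is commonly described as taking one of two forms: climbing up a concept hierarchy for a dimension (for example, city to country), or reducing the number of dimensions. The sketch below shows both on a hypothetical two-dimensional cube; the city-to-country mapping and the figures are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical cube keyed by (city, quarter).
cube = {
    ("Delhi", "Q1"): 10,
    ("Mumbai", "Q1"): 7,
    ("Toronto", "Q1"): 4,
    ("Delhi", "Q2"): 12,
}

# Concept hierarchy for the location dimension: city -> country.
CITY_TO_COUNTRY = {"Delhi": "India", "Mumbai": "India", "Toronto": "Canada"}

def roll_up_hierarchy(cube):
    """Aggregate cities into countries; the quarter dimension is kept."""
    out = defaultdict(int)
    for (city, quarter), value in cube.items():
        out[(CITY_TO_COUNTRY[city], quarter)] += value
    return dict(out)

def roll_up_drop_quarter(cube):
    """Aggregate away the quarter dimension, leaving totals per city."""
    out = defaultdict(int)
    for (city, _quarter), value in cube.items():
        out[city] += value
    return dict(out)

by_country = roll_up_hierarchy(cube)
by_city = roll_up_drop_quarter(cube)
print(by_country)
print(by_city)
```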
No. | OLAP | OLTP
2 | OLAP systems are used by knowledge workers such as executives, managers and analysts. | OLTP systems are used by clerks, DBAs, or database professionals.
3 | Useful in analyzing the business. | Useful in running the business.
7 | Provides summarized and consolidated data. | Provides primitive and highly detailed data.
8 | Provides summarized and multidimensional view of data. | Provides detailed and flat relational view of data.
The MOLAP engine in the application layer collects data from the databases in the data
layer. It then loads data cubes into the multi-dimensional databases (MDDBs). When the user
makes a query, data moves in a proprietary format from the MDDBs to the client desktop in the
presentation layer. This enables users to view data in multiple dimensions.
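The flow described above, where cubes are loaded ahead of time and queries only read them, can be sketched as follows. The source rows, the "*" marker for "all values", and the function names are hypothetical; real MOLAP engines use their own storage formats.

```python
from collections import defaultdict

# Hypothetical rows collected from the data layer: (location, quarter, amount).
rows = [("Delhi", "Q1", 10), ("Delhi", "Q2", 12), ("Mumbai", "Q1", 7)]

def build_cube(rows):
    """Load step: pre-aggregate every combination of the two dimensions once.

    "*" stands for "all values" of a dimension.
    """
    cube = defaultdict(int)
    for location, quarter, amount in rows:
        for key in [(location, quarter), (location, "*"), ("*", quarter), ("*", "*")]:
            cube[key] += amount
    return dict(cube)

mddb = build_cube(rows)  # plays the role of the multi-dimensional database

def query(location="*", quarter="*"):
    """Query step: a lookup only, with no aggregation at query time."""
    return mddb.get((location, quarter), 0)

print(query("Delhi"), query(quarter="Q1"), query())
```

Because the totals already exist when the query arrives, answering it costs a single lookup; the price is the extra work and storage at load time.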
Advantages
ROLAP
1. Database server: This exists in the data layer. This consists of data that is loaded into
the ROLAP server.
2. ROLAP server: This consists of the ROLAP engine that exists in the application
layer.
3. Front-end tool: This is the client desktop that exists in the presentation layer.
Let’s briefly look at how ROLAP works. When a user makes a (complex) query, the ROLAP
server fetches data from the RDBMS server. The ROLAP engine then creates data
cubes dynamically, and the user views the data from a multi-dimensional point of view.
Unlike MOLAP, where the multi-dimensional view is static, ROLAP provides a dynamic
multi-dimensional view. This explains why it is slower than MOLAP.
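The dynamic behavior can be sketched with a relational table that is aggregated only when a query arrives. This uses Python's standard sqlite3 module as a stand-in for the RDBMS server; the sales table, its values, and the rolap_query function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (location TEXT, quarter TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Delhi", "Q1", 10), ("Delhi", "Q2", 12), ("Mumbai", "Q1", 7)],
)

DIMENSIONS = {"location", "quarter"}

def rolap_query(group_by):
    """Build the multi-dimensional view on demand with a GROUP BY query."""
    # Only known column names are allowed into the SQL text.
    if not group_by or any(col not in DIMENSIONS for col in group_by):
        raise ValueError("group_by must name at least one known dimension")
    cols = ", ".join(group_by)
    sql = f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()

by_location = rolap_query(["location"])
by_both = rolap_query(["location", "quarter"])
print(by_location)
print(by_both)
```

Each call scans and aggregates the detail rows again, which illustrates why this approach is more flexible but slower than reading a precomputed cube.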
Image Source: Tech Differences
Advantages
Disadvantages
HOLAP is an abbreviation for Hybrid Online Analytical Processing. This type of analytical
processing addresses the limitations of MOLAP and ROLAP by combining their attributes. Data
in the database is divided into two parts: specialized storage and relational storage.
Integrating these two aspects addresses issues relating to performance and scalability.
HOLAP stores huge volumes of data in a relational database and keeps aggregations in a
MOLAP server.
The HOLAP model consists of a server that can support both ROLAP and MOLAP. It has
a complex architecture that requires frequent maintenance. Queries made in the HOLAP
model involve both the multi-dimensional database and the relational database. The front-end
tool presents data from the database management system either directly or through the
intermediate MOLAP.
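The split between the two stores can be sketched as a query function that answers summary requests from precomputed aggregations and detail requests from the relational table. The table, the values, and the holap_query function are hypothetical; sqlite3 stands in for the relational database and a dictionary for the MOLAP store.

```python
import sqlite3

# Relational side: the detailed data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (location TEXT, quarter TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Delhi", "Q1", 10), ("Delhi", "Q2", 12), ("Mumbai", "Q1", 7)],
)

# MOLAP side: aggregations computed ahead of time and kept separately.
aggregates = dict(
    conn.execute("SELECT location, SUM(amount) FROM sales GROUP BY location")
)

def holap_query(location, detail=False):
    """Serve totals from the aggregate store and detail rows from the relational table."""
    if not detail:
        return aggregates.get(location, 0)
    return conn.execute(
        "SELECT quarter, amount FROM sales WHERE location = ? ORDER BY quarter",
        (location,),
    ).fetchall()

print(holap_query("Delhi"))
print(holap_query("Delhi", detail=True))
```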
To process an OLAP query, select Tools, then Process Query, and then an option.
Since a document can contain multiple queries, the Process drop-down list has three processing options:
• Process Current—Processes the current object. In some cases more than one query may be processed, for
example, if a report references result sets from multiple queries. Process Current is the default selection
when using the toolbar button.
• Process All—Processes all the queries in the document.
• Process Custom—Opens the Process Custom dialog box so that you can indicate which queries to process
by selecting a query’s check box.
Interactive Reporting sends the query to the database and retrieves the data to the OLAPQuery section.
While the data is being retrieved, the Status bar displays a dynamic count indicating rate and progress of
server data processing and network transfer.