
Unit 3

OLAP

Objectives
OLTP
OLTP Applications, benefits
OLTP benchmarks
Data partitioning in OLTP
Comparison between OLTP and OLAP
Multi-Dimensional Data Model: Data Cube
OLAP types and operations
Data modeling: Star and Snowflake schema
Denormalization

OLTP
Online transaction processing, or OLTP, is a class of information systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing.
OLTP has also been used to refer to processing in which the system responds immediately to user requests.
OLTP
Online transaction processing (OLTP) involves gathering input information, processing that information, and updating existing information to reflect the gathered and processed information.
Most organizations use a database management system to support OLTP.
OLTP is carried out in a client-server system.
Online transaction processing is concerned with concurrency and atomicity.
OLTP applications
Online transaction processing applications are high-throughput and insert- or update-intensive in database management.
An automated teller machine (ATM) for a bank is an example of a commercial transaction processing application.
These applications are used concurrently by hundreds of users. The key goals of OLTP applications are availability, speed, concurrency and recoverability.
Online banking is completely based on online transaction processing systems.
RDBMS used for OLTP
Database systems have been used traditionally for OLTP:
◦ clerical data processing tasks
◦ detailed, up-to-date data
◦ structured, repetitive tasks
◦ read/update a few records
◦ isolation, recovery and integrity are critical
The data warehouse and the OLTP database are both relational databases; however, the objectives of these two databases are different.
Online transaction processing systems (Advantages)
A simple and effective solution for online shoppers.
These systems are highly efficient and have excellent response times.
Very easy to use; as simple as filling in a form, with the rest taken care of by the web and database servers.
Credit cards are also well handled by these systems.
You can access anything on the web and choose to buy it, because all financial transaction methods are supported by these systems.
Online transaction processing systems (Disadvantages)
At times there are millions of requests at once, which becomes difficult to handle.
During purchases, even if the servers hang for a few seconds, a large number of transactions are affected, in turn affecting the organization's reputation.
Databases store all user data and account information; if these servers are hacked, it could lead to financial and personal problems (theft).
In case of hardware failures of the online transaction processing systems, visitors of a website get in trouble and their online transactions are affected.
Online transaction processing involves a lot of staff working in groups to maintain inventory.
These online transaction systems impose processing costs on buyers and sellers as well.
The fundamental principle of operation of online transaction systems is atomicity (see the sketch after this list). Atomicity ensures that if any step fails in the course of a transaction, the entire transaction must fail, due to which the same steps have to be repeated again while filling forms, which causes dissatisfaction among buyers.
Electricity supply is another issue: if there is a shortage in the electric supply, additional backup facilities like generators and related hardware are a must.
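To make the atomicity point concrete, here is a minimal SQL-style sketch (the accounts table and its columns are hypothetical): either both updates commit, or the whole transfer rolls back.

-- Hypothetical funds transfer: both updates succeed or neither does.
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 500 WHERE account_id = 101;
UPDATE accounts SET balance = balance + 500 WHERE account_id = 202;
-- If either UPDATE fails, a ROLLBACK undoes all changes;
-- otherwise COMMIT makes them permanent as one atomic unit.
COMMIT;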
OLTP benchmarks
The Transaction Processing Performance Council (TPC) defines the standard benchmarks used to measure the performance and price/performance of transaction processing systems.
TPC–C benchmark
The term transaction is often applied to a wide variety of business and computer functions.
A transaction could refer to a set of operations including disk reads/writes, operating system calls, or some form of data transfer from one subsystem to another.
TPC-C is a mixture of read-only and update-intensive transactions that simulate the activities found in complex OLTP application environments.
A typical transaction, as defined by the TPC, would include updates to a database system for such things as inventory control (goods), airline reservations (services), or banking (money).
TPC–C benchmark
In these environments, a number of customers or service representatives input and manage their transactions via a terminal or desktop computer connected to a database.
Typically, the TPC produces benchmarks that measure transaction processing (TP) and database (DB) performance in terms of how many transactions a given system and database can perform per unit of time, e.g., transactions per second or transactions per minute.
TPC-C Benchmark Example
The workload consists of five OLTP transaction types:
New Order – enter a new order from a customer. (45%)
Payment – update the customer's balance to reflect a payment. (43%)
Delivery – deliver orders. (4%) The Delivery business transaction consists of processing a batch of 10 new (not yet delivered) orders.
Order Status – retrieve the status of a customer's most recent order. (4%)
Stock Level – monitor warehouse inventory. (4%) The Stock-Level business transaction determines the number of recently sold items that have a stock level below a specified threshold, as sketched below.
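As a sketch, the Stock-Level check described above boils down to a query of the following shape (table and column names are illustrative, not the official TPC-C schema):

-- Count distinct recently sold items whose stock is below a threshold.
SELECT COUNT(DISTINCT s.item_id) AS low_stock_items
FROM stock s
JOIN order_line ol ON ol.item_id = s.item_id
WHERE ol.order_id IN (SELECT order_id FROM orders
                      WHERE order_date > SYSDATE - 7)  -- "recent" orders
  AND s.quantity < 10;                                 -- threshold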
Data partitioning in OLTP
Scalability – the property of a system that can accommodate changes in transaction volume without affecting performance.
Partitioning is a common technique used for scaling databases, particularly for scaling updates, by distributing the partitions across a cluster of nodes and routing writes to their respective partitions.
Data partitioning is also the process of logically and/or physically partitioning data into segments that are more easily maintained or accessed.
Different partitioning strategies
Vertical partitioning
Horizontal partitioning
◦ Range partition
◦ Hash partition
◦ List partition
Vertical Partitioning
Resumes
SSN     Name  Address   Resume   Picture
234234  Mary  Houston   Clob1…   Blob1…
345345  Sue   Seattle   Clob2…   Blob2…
345343  Joan  Seattle   Clob3…   Blob3…
234234  Ann   Portland  Clob4…   Blob4…

The table is split vertically into T1, T2, T3:
T1: SSN, Name, Address      T2: SSN, Resume      T3: SSN, Picture
234234  Mary  Houston       234234  Clob1…       234234  Blob1…
345345  Sue   Seattle       345345  Clob2…       345345  Blob2…
...
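A minimal DDL sketch of the split above, with the wide table broken into three tables sharing the SSN key (column types are assumptions):

-- T1 holds the small, frequently accessed columns.
CREATE TABLE t1 (
  ssn     NUMBER PRIMARY KEY,
  name    VARCHAR2(50),
  address VARCHAR2(100)
);
-- T2 and T3 hold the bulky LOB columns, keyed by the same SSN.
CREATE TABLE t2 (
  ssn    NUMBER PRIMARY KEY REFERENCES t1(ssn),
  resume CLOB
);
CREATE TABLE t3 (
  ssn     NUMBER PRIMARY KEY REFERENCES t1(ssn),
  picture BLOB
);
-- Reassembling a full row now requires joins, e.g.:
-- SELECT p.name, r.resume FROM t1 p JOIN t2 r ON p.ssn = r.ssn;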
Horizontal Partitioning
Customers
SSN     Name   City      Country
234234  Mary   Houston   USA
345345  Sue    Seattle   USA
345343  Joan   Seattle   USA
234234  Ann    Portland  USA
--      Frank  Calgary   Canada
--      Jean   Montreal  Canada

The rows are split across partition tables:
CustomersInHouston
234234  Mary   Houston   USA

CustomersInSeattle
345345  Sue    Seattle   USA
345343  Joan   Seattle   USA

CustomersInCanada
--      Frank  Calgary   Canada
--      Jean   Montreal  Canada
Types of Horizontal Partitioning
Range partitioning
Range partitioning maps data to partitions based on ranges of values of the partitioning key that you establish for each partition. It is the most common type of partitioning and is often used with dates. For a table with a date column as the partitioning key, the January-2005 partition would contain rows with partitioning key values from 01-Jan-2005 to 31-Jan-2005. A minimal DDL sketch follows.
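An Oracle-style sketch of the January/February example (table and column names are assumptions):

CREATE TABLE sales (
  sale_id   NUMBER,
  sale_date DATE,
  amount    NUMBER
)
PARTITION BY RANGE (sale_date) (
  -- Rows with sale_date up to 31-Jan-2005 land in the January partition.
  PARTITION sales_jan2005 VALUES LESS THAN (TO_DATE('01-02-2005','DD-MM-YYYY')),
  PARTITION sales_feb2005 VALUES LESS THAN (TO_DATE('01-03-2005','DD-MM-YYYY'))
);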
List partitioning
List partitioning enables you to explicitly control how rows map to partitions by specifying a list of discrete values for the partitioning key in the description for each partition.
E.g., a warehouse table containing sales summary data by product, state, and month/year could be partitioned into geographic regions, as in the sketch below.
The advantage of list partitioning is that you can group and organize unordered and unrelated sets of data in a natural way.
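An Oracle-style sketch of grouping states into regional partitions (names and values are assumptions):

CREATE TABLE sales_summary (
  product VARCHAR2(50),
  state   VARCHAR2(20),
  total   NUMBER
)
PARTITION BY LIST (state) (
  -- Each partition names the discrete key values it holds.
  PARTITION region_west VALUES ('California', 'Oregon', 'Washington'),
  PARTITION region_east VALUES ('New York', 'New Jersey')
);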
Hash partitioning
Hash partitioning maps data to partitions based on a hashing algorithm that Oracle applies to the partitioning key that you identify. The hashing algorithm evenly distributes rows among partitions, giving partitions approximately the same size.
Hash partitioning is the ideal method for distributing data evenly across devices. It is also an easy-to-use alternative to range partitioning, especially when the data to be partitioned is not historical or has no obvious partitioning key. A sketch follows.
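An Oracle-style sketch; the hash of cust_id spreads rows roughly evenly over four partitions (names are assumptions):

CREATE TABLE customers (
  cust_id NUMBER,
  name    VARCHAR2(100)
)
PARTITION BY HASH (cust_id)
PARTITIONS 4;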
Online Analytical Processing (OLAP)
OLAP is a category of software tools that provides analysis of data stored in a database.
OLAP is a category of applications and technologies for collecting, managing, processing, and presenting multidimensional data for analysis and management purposes.
OLAP tools allow the user to query, browse, and summarize information in a very efficient, interactive, and dynamic way.
[Figure: a data warehouse cube with Product, Region, and Time dimensions]
Online analytical processing (OLAP)
Multidimensional data analysis
◦ 3-D graphics, pivot tables, crosstabs, etc.
◦ Compatible with spreadsheets and statistical packages
◦ Advanced data presentation functions
Advanced database support
◦ Access to many kinds of DBMSs, flat files, and internal and external data sources
◦ Support for very large databases
◦ Advanced data navigation
Easy-to-use end-user interfaces
Support for client/server architecture
Online Analytical Processing (OLAP)
A widely adopted definition for OLAP, in five key words, is: Fast Analysis of Shared Multidimensional Information (FASMI).
Fast refers to the speed with which an OLAP system is able to deliver most responses to the end user.
Analysis refers to the ability of an OLAP system to manage any business logic and statistical analysis relevant for the application and the user. In addition, the system must allow users to define new ad hoc calculations as part of the analysis and report without having to program them.
Shared refers to the ability of an OLAP system to implement all security requirements necessary for confidentiality, and the concurrent update locking at an appropriate level when multiple write access is required.
Multidimensional refers to the requirement that an OLAP system provide a multidimensional view of data. This includes supporting hierarchies and multiple hierarchies.
Online Analytical Processing (OLAP)
Implemented in a multi-user client/server mode
Offers consistently rapid response to queries, regardless of database size and complexity
OLAP helps the user synthesize enterprise information and analyze historical data
Operational v/s Information System

Features          Operational (OLTP)          Informational (OLAP)
Characteristics   Operational processing      Informational processing
Orientation       Transaction                 Analysis
User              Clerk, DBA, database        Knowledge workers
                  professional
Function          Day-to-day operation        Decision support
Data              Current                     Historical
View              Detailed, flat relational   Summarized, multidimensional
DB design         Application oriented        Subject oriented
Unit of work      Short, simple transaction   Complex query
Access            Read/write                  Mostly read
Focus             Data in                     Information out
Records accessed  Tens to hundreds            Millions
Number of users   Thousands                   Hundreds
DB size           100 MB to GB                100 GB to TB
Priority          High performance,           High flexibility,
                  high availability           end-user autonomy
Metric            Transaction throughput      Query throughput
OLTP vs. OLAP

                    OLTP                        OLAP
users               clerk, IT professional      knowledge worker
function            day to day operations       decision support
DB design           application-oriented        subject-oriented
data                current, up-to-date;        historical, summarized,
                    detailed, flat relational;  multidimensional;
                    isolated                    integrated, consolidated
usage               repetitive                  ad-hoc
access              read/write, index/hash      lots of scans
                    on primary key
unit of work        short, simple transaction   complex query
# records accessed  tens                        millions
# users             thousands                   hundreds
DB size             100MB-GB                    100GB-TB
metric              transaction throughput      query throughput, response
OLTP vs. OLAP
1. OLTP systems require high concurrency, reliability and locking, which provide good performance for short and simple OLTP queries. An OLAP query is very complex and does not require these properties. Running an OLAP query on an OLTP system degrades its performance.
2. An OLAP query reads a HUGE amount of data and generates the required result, and the query is very complex too. Thus special primitives have to be provided to support this kind of data access.
3. OLAP systems access historical data rather than current volatile data, while OLTP systems access current up-to-date data and do not need historical data.
Multi-Dimensional Data Model: Data Cube
The multidimensional data model views data in the form of a data cube.
A data cube allows data to be modeled and viewed in multiple dimensions.
Dimensions are entities with respect to which an organization wants to keep records, such as time, item, branch, location, etc.
◦ A dimension table gives further descriptions of a dimension, e.g. time (day, week, month, year, etc.)
◦ A fact table contains measures and keys to each of the related dimension tables, e.g. dollars sold
Multi-Dimensional Data Model: Data Cube
A cube is a visual representation of a multidimensional table and has just three dimensions: rows, columns and layers.
OLAP databases are often referred to as "cubes" because of their multidimensional nature.
OLAP cubes are easy to create and manipulate.
Users can have multiple cubes for their business data: one cube for customers, one for sales, one for production, one for geography, etc.
Multi-Dimensional Data Model: Data Cube
This multidimensional data can be represented using a data cube, as shown below.
The figure shows a 3-dimensional data model:
X dimension: Item type
Y dimension: Time/Period
Z dimension: Location
Each cell represents the items sold of type 'x', in location 'z', during the quarter 'y'.
This is easily visualized because there are only 3 dimensions.
What if we also want to represent the store where it was sold? We can add more dimensions, but this makes the representation complex.
A data cube is thus an n-dimensional data model.
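On the relational side, one common way to materialize all such aggregates at once is the CUBE grouping extension (available in Oracle, SQL Server and PostgreSQL); a sketch over a hypothetical sales table:

-- Aggregate units sold over every combination of the three dimensions,
-- including subtotals and the grand total (2^3 = 8 groupings).
SELECT item_type, quarter, location, SUM(units_sold) AS total_units
FROM sales
GROUP BY CUBE (item_type, quarter, location);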
Data cube
Sales volume as a function of product, month, and region.
Dimensions: Product, Location, Time
OLAP Operations
OLAP provides a user-friendly environment for interactive data analysis.
A number of OLAP data cube operations exist to materialize different views of data, allowing interactive querying and analysis of the data.
The most popular end-user operations on dimensional data are:
Roll-up
Drill-down
Slice and dice
Pivot (rotate)
Drill Up (Roll up)
Roll-up performs aggregation on a data cube in either of the following ways:
By climbing up a concept hierarchy for a dimension
By dimension reduction
[Figure: product hierarchy against Region and Time — Product (e.g. Toaster) < Sub-Category (e.g. Kitchen) < Category (e.g. Electrical Appliance)]
Drill Up (Roll up)
Here, roll-up is performed by climbing up a concept hierarchy for the dimension location.
Initially the concept hierarchy was "street < city < province < country".
On rolling up, the data is aggregated by ascending the location hierarchy from the level of city to the level of country.
The data is grouped into countries rather than cities.
When roll-up is performed, one or more dimensions from the data cube are removed. A relational sketch of this roll-up follows.
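In relational terms, this roll-up is just re-aggregation at a coarser level of the location hierarchy; a sketch against a hypothetical star schema (drill-down is simply the reverse: grouping by the finer column again):

-- Before roll-up: sales aggregated per city.
SELECT l.city, SUM(f.units_sold) AS units
FROM sales_fact f JOIN location_dim l ON f.location_key = l.location_key
GROUP BY l.city;

-- After roll-up: ascend the hierarchy (city -> country) and re-aggregate.
SELECT l.country, SUM(f.units_sold) AS units
FROM sales_fact f JOIN location_dim l ON f.location_key = l.location_key
GROUP BY l.country;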
Drill Down (roll down)
Drill-down is the reverse operation of roll-up. It is performed in either of the following ways:
By stepping down a concept hierarchy for a dimension
By introducing a new dimension
[Figure: the same product hierarchy — Category (e.g. Electrical Appliance) > Sub-Category (e.g. Kitchen) > Product (e.g. Toaster) — against Region and Time]
Drill Down (roll down)
Here, drill-down is performed by stepping down a concept hierarchy for the dimension time.
Initially the concept hierarchy was "day < month < quarter < year".
On drilling down, the time dimension is descended from the level of quarter to the level of month.
When drill-down is performed, one or more dimensions are added to the data cube.
It navigates the data from less detailed data to highly detailed data.
Drill Down (roll down)
The result of a drill-down operation performed on the central cube by stepping down the concept hierarchy for time can be defined as week → day (with temperature held at cool). Drill-down occurs by descending the time hierarchy from the level of week to the more detailed level of day. New dimensions can also be added to the cube, because drill-down adds more detail to the given data.
Slice
The slice operation is based on selecting one dimension and focusing on a portion of a cube.
It forms a new sub-cube by fixing a value for that single dimension.
[Figure: slicing the Product–Region–Time cube at Product = Toaster yields a 2-D Region–Time plane]
Slice
Slice performs a selection on one dimension of the given cube, thus resulting in a subcube. For example, in the cube example above, if we make the selection temperature = cool, we obtain the sliced subcube.
Dice
The dice operation creates a sub-cube by focusing on two or more dimensions.
Dice selects two or more dimensions from a given cube and provides a new sub-cube.
For example, a dice operation on a cube based on the following selection criteria involves three dimensions:
(location = "Toronto" or "Vancouver")
(time = "Q1" or "Q2")
(item = "Mobile" or "Modem")
Dice
The dice operation defines a subcube by performing a selection on two or more dimensions. For example, applying the selection (time = day 3 OR time = day 4) AND (temperature = cool OR temperature = hot) to the original cube yields a diced subcube. In relational terms, slice and dice are simple selections, as sketched below.
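A sketch of both operations against a hypothetical table holding the cube's cells: a slice fixes one dimension, a dice restricts several at once.

-- Slice: selection on a single dimension.
SELECT * FROM weather_cube
WHERE temperature = 'cool';

-- Dice: selection on two or more dimensions, as in the example above.
SELECT * FROM weather_cube
WHERE (time_day = 3 OR time_day = 4)
  AND (temperature = 'cool' OR temperature = 'hot');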
Pivot (rotate)
Pivot reorients the cube for visualization, turning a 3-D cube into a series of 2-D planes.
Pivoting, or rotation, changes the perspective in presenting the data to the user.
[Figure: rotating the Product–Region–Time cube so a different pair of dimensions forms the visible plane]
Pivot (rotate)
Pivot, otherwise known as rotate, changes the dimensional orientation of the cube, i.e. rotates the data axes to view the data from different perspectives.
Pivot groups data with different dimensions. A SQL sketch of a pivot follows.
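A pivot can be emulated in plain SQL with conditional aggregation, turning one dimension's members into columns (a sketch with assumed names; some databases also offer a native PIVOT clause):

-- Rotate the month dimension from rows into columns.
SELECT product,
       SUM(CASE WHEN month = 'Jan' THEN units ELSE 0 END) AS jan_units,
       SUM(CASE WHEN month = 'Feb' THEN units ELSE 0 END) AS feb_units,
       SUM(CASE WHEN month = 'Mar' THEN units ELSE 0 END) AS mar_units
FROM sales
GROUP BY product;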
OLAP Operations: Presentation
[Figure: a Product–Region–Time cube feeding a reporting tool, which produces the final report]
Data Warehouse Schema
The Data Warehouse environment usually transforms the relational data model into some special architectures.
Each schema has a fact table that stores all the facts about the subject/measure.
Each fact is associated with multiple dimension keys that are linked to dimension tables.
The most commonly used Data Warehouse Schemas are:
Data Warehouse Schema
Star Schema
◦ A single fact table with n dimension tables linked to it.
Snowflake Schema
◦ A single fact table with n dimension tables organized as a hierarchy.
Fact Constellation Schema
◦ Multiple fact tables sharing dimension tables.
Star Schema
A fact table in the middle connected to a set of dimension tables.
A single, large, central fact table and one table for each dimension.
Every fact points to one tuple in each of the dimensions and has additional attributes.
Usually the fact table in a star schema is in third normal form (3NF), whereas the dimension tables are de-normalized.
The star schema is the simplest architecture; it is the most commonly used today and is recommended by Oracle.
Star Schema
Fact Tables
A fact table typically has two types of columns: foreign keys to dimension tables, and measures, which contain numeric facts.
Dimension Tables
A dimension is a structure, usually composed of one or more hierarchies, that categorizes data. The primary keys of each of the dimension tables are part of the composite primary key of the fact table.
Dimension tables are generally smaller in size than the fact table.
Star Schema Example

Store Dimension      Fact Table        Time Dimension
Store Key            Store Key         Period Key
Store Name           Product Key       Year
City                 Period Key        Quarter
State                Units             Month
Region               Price

                     Product Dimension
                     Product Key
                     Product Desc

Benefits: easy to understand, easy to define hierarchies, reduces the number of physical joins. A DDL sketch of this schema follows.
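A minimal DDL sketch of this star schema (column types are assumptions):

CREATE TABLE store_dim (
  store_key  NUMBER PRIMARY KEY,
  store_name VARCHAR2(50),
  city       VARCHAR2(50),
  state      VARCHAR2(50),
  region     VARCHAR2(50)
);
CREATE TABLE time_dim (
  period_key NUMBER PRIMARY KEY,
  year       NUMBER,
  quarter    NUMBER,
  month      NUMBER
);
CREATE TABLE product_dim (
  product_key  NUMBER PRIMARY KEY,
  product_desc VARCHAR2(100)
);
-- The fact table's composite primary key is built from the dimension keys.
CREATE TABLE sales_fact (
  store_key   NUMBER REFERENCES store_dim(store_key),
  product_key NUMBER REFERENCES product_dim(product_key),
  period_key  NUMBER REFERENCES time_dim(period_key),
  units       NUMBER,
  price       NUMBER(10,2),
  PRIMARY KEY (store_key, product_key, period_key)
);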
Snowflake Schema
A variant of the star schema model.
A single, large, central fact table and one or more tables for each dimension.
Dimension tables are normalized, i.e. dimension table data is split into additional tables.
"Snowflaking" is a method of normalising the dimension tables in a star schema.
Snowflake Schema Example

Store Dimension      Fact Table        Time Dimension
Store Key            Store Key         Period Key
Store Name           Product Key       Year
City Key             Period Key        Quarter
                     Units             Month
City Dimension       Price
City Key
City                 Product Dimension
State                Product Key
Region               Product Desc

Drawbacks: time-consuming joins and slower report generation. A sketch of the snowflaked dimension follows.
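Relative to the star DDL sketched earlier, snowflaking only changes the store dimension: the geographic attributes move into a separate, normalized city table (types are assumptions):

CREATE TABLE city_dim (
  city_key NUMBER PRIMARY KEY,
  city     VARCHAR2(50),
  state    VARCHAR2(50),
  region   VARCHAR2(50)
);
-- store_dim now references city_dim instead of repeating city/state/region.
CREATE TABLE store_dim (
  store_key  NUMBER PRIMARY KEY,
  store_name VARCHAR2(50),
  city_key   NUMBER REFERENCES city_dim(city_key)
);

Queries that report by region now need one extra join, which is exactly the drawback noted above.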


Fact Constellation Schema
Multiple fact tables share dimension tables.
This schema is viewed as a collection of stars, hence it is called a galaxy schema or fact constellation.
For each star schema it is possible to construct a fact constellation schema, for example by splitting the original star schema into more star schemas, each of which describes facts at another level of the dimension hierarchies.
Sophisticated applications require such a schema.
Fact Constellation Example

Sales Fact Table     Shipping Fact Table
Store Key            Shipper Key
Product Key          Store Key
Period Key           Product Key
Units                Period Key
Price                Units
                     Price

Product Dimension    Store Dimension
Product Key          Store Key
Product Desc         Store Name
                     City
                     State
                     Region

The Sales and Shipping fact tables share the Product and Store dimension tables.
Fact Constellation Schema
The main shortcoming of the fact constellation schema is a more complicated design, because many variants for particular kinds of aggregation must be considered and selected.
Moreover, the dimension tables are still large.
Concept hierarchy
A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts.
A Concept Hierarchy example

all:      all
region:   Europe ...                  North_America
country:  Germany ... Spain           Canada ... Mexico
city:     Frankfurt ...               Vancouver ... Toronto
office:                               L. Chan ... M. Wind
Case Study
XYZ Foods & Beverages is a new company which produces dairy, bread and meat products, with its production unit located at Baroda.
Its products are sold in the North, North West and Western regions of India.
The company has sales units at Mumbai, Pune, Ahmedabad, Delhi and Baroda.
The President of the company wants sales information.
Sales Information
Report: The number of units sold in the first quarter: 113

Report: The number of units sold over time (per month):

January  February  March  April
14       41        33     25
Sales Information
Report: The number of items sold for each product over time

Product       Jan   Feb   Mar   Apr
Wheat Bread   -     -     6     17
Cheese        6     16    6     8
Swiss Rolls   8     25    21    -
Sales Information
Report: The number of items sold in each city for each product over time

City     Product       Jan   Feb   Mar   Apr
Mumbai   Wheat Bread   -     -     3     10
Mumbai   Cheese        3     16    6     -
Mumbai   Swiss Rolls   4     16    6     -
Pune     Wheat Bread   -     -     3     7
Pune     Cheese        3     -     -     8
Pune     Swiss Rolls   4     9     15    -
Sales Information
Report: The number of items sold (U) and income (Rs) in each region for each product over time

                       Jan          Feb           Mar          Apr
City     Product       Rs      U    Rs       U    Rs      U    Rs      U
Mumbai   Wheat Bread   -       -    -        -    7.44    3    24.80   10
Mumbai   Cheese        7.95    3    42.40    16   15.90   6    -       -
Mumbai   Swiss Rolls   7.32    4    29.98    16   10.98   6    -       -
Pune     Wheat Bread   -       -    -        -    7.44    3    17.36   7
Pune     Cheese        7.95    3    -        -    -       -    21.20   8
Sales Measures & Dimensions
Measures – Units sold, Amount.
Dimensions – Product, Time, Region.
Sales Data Warehouse Model
Fact Table

City     Product       Month     Units   Rupees
Mumbai   Cheese        January   3       7.95
Mumbai   Swiss Rolls   January   4       7.32
Pune     Cheese        January   3       7.95
Pune     Swiss Rolls   January   4       7.32
Mumbai   Cheese        February  16      42.40
Sales Data Warehouse Model
Fact Table

City_ID   Prod_ID   Time_ID    Units   Rupees
1         589       1/1/1998   3       7.95
1         1218      1/1/1998   4       7.32
2         589       1/1/1998   3       7.95
2         1218      1/1/1998   4       7.32
1         589       2/1/1998   16      42.40
Sales Data Warehouse Model
Product Dimension Tables

Prod_ID   Product_Name      Product_Category_ID
589       Cheese            1
590       Wheat Bread       2
288       Coconut Cookies   3
1218      Swiss Roll        2

Product_Category_ID   Product_Category
1                     Milk
2                     Bread
3                     Cookies
Sales Data Warehouse Model
Region Dimension Table

City_ID   City     Region      Country
1         Mumbai   West        India
2         Pune     NorthWest   India
Sales Data Warehouse Model
[Figure: star schema — Sales Fact linked to Time, Region, and Product dimensions, with Product linked to Product Category]
Sales Data Warehouse Model: Snowflake Schema
[Figure: the same model drawn as a snowflake schema]
Sales Data Warehouse Model

City        Product                  Time    Units   Dollars
All (M+P)   All (Cheese + Wheat      Qtr 1   113     251.26
            Bread + Swiss Roll)
Mumbai      All                      All     64      146.07
Mumbai      Cheese                   All     38      66.25
Mumbai      Wheat Bread              Qtr 1   13      32.24
Mumbai      Wheat Bread              March   3       7.44
Assignment 1
Suppose that a data warehouse consists of the three
dimensions time, doctor, and patient, and the two
measures count and charge, where charge is the fee
that a doctor charges a patient for a visit.
1. Enumerate three classes of schemas that are popularly
used for modeling data warehouses.
2. Draw a schema diagram for the above data warehouse
using one of the schema classes listed in (1).
3. Starting with the base cube [day, doctor, patient],
what specific OLAP operations should be performed in
order to list the total fee collected by each doctor in
2004?
4. To obtain the same list, write an SQL query assuming
the data are stored in a relational database with the
schema fee (day, month, year, doctor, hospital,
patient, count, charge).
Solution
OLAP operations (part 3):
1. roll up from day to month to year
2. slice for year = "2004"
3. roll up on patient from individual patient to all

SQL (part 4):
SELECT doctor, SUM(charge)
FROM fee
WHERE year = 2004
GROUP BY doctor;
Assignment 2
Design a data warehouse for a
regional weather bureau. The weather
bureau has about 1,000 probes, which
are scattered throughout various land
and ocean locations in the region to
collect basic weather data, including
air pressure, temperature, and
precipitation at each hour. All data are
sent to the central station, which has
collected such data for over 10 years.
Assignment 2 solution
Since the weather bureau has about 1,000 probes scattered throughout various land and ocean locations, we need to construct a spatial data warehouse, so that a user can view weather patterns on a map by month, by region, and by different combinations of temperature and precipitation, and can dynamically drill down or roll up along any dimension to explore desired patterns.
Assignment 3
Suppose that a data warehouse for Big University consists of the following four dimensions: student, course, semester, and instructor, and two measures: count and avg_grade. At the lowest conceptual level (e.g., for a given student, course, semester, and instructor combination), the avg_grade measure stores the actual course grade of the student. At higher conceptual levels, avg_grade stores the average grade for the given student.
Draw a snowflake schema diagram for the data warehouse.
What specific OLAP operations should one perform in order to list the average grade of CS courses for each Big University student?
To obtain the same list, write an SQL query assuming the data are stored in a relational database with the schema big_university (student, course, department, semester, instructor, grade).
OLAP Server
In order to offer consistent, rapid response to queries (regardless of database size and complexity), OLAP needs to be implemented in a multi-user client/server mode.
An OLAP server is a high-capacity, multi-user data manipulation engine specifically designed to support and operate on multidimensional data structures.
The server design and data structure are optimized for rapid ad-hoc information retrieval in any orientation.
Types of OLAP servers:
◦ MOLAP server
◦ ROLAP server
◦ HOLAP server
Multidimensional OLAP (MOLAP)
In MOLAP, data is stored in a multidimensional cube and not in the relational database.
It uses specialized data structures to organize, navigate and analyze data.
It uses array technology and efficient storage techniques that minimize the disk space requirements.
MOLAP differs significantly in that (in some software) it requires the pre-computation and storage of information in the cube: the operation known as processing.
Multidimensional OLAP (MOLAP)
Advantages:
Excellent performance: MOLAP cubes are built for fast data retrieval and are optimal for slicing and dicing operations.
It uses array technology and efficient storage techniques that minimize the disk space requirements.
Can perform complex calculations: all calculations have been pre-generated when the cube is created. Hence, complex calculations are not only doable, but they return quickly.

MOLAP examples:
Analysis and budgeting in a financial department
Sales analysis
Multidimensional OLAP (MOLAP)
Disadvantages:
Only a limited amount of data can be efficiently stored and analyzed: because all calculations are performed when the cube is built, it is not possible to include a large amount of data in the cube itself.
Underlying data structures are limited in their ability to support multiple subject areas and provide access to detailed data.
Storage, navigation and analysis of data are limited because the data is designed according to previously determined requirements. Data may need to be physically reorganized to optimally support new requirements.
Requires additional investment: cube technologies are often proprietary and do not already exist in the organization; therefore, adopting MOLAP technology often requires additional investment in human and capital resources.
Relational OLAP (ROLAP)
ROLAP is a form of online analytical processing that performs dynamic multidimensional analysis of data stored in a relational database rather than in a multidimensional database.
It is the fastest-growing type of OLAP tool.
It does not require the pre-computation and storage of information.
ROLAP servers stand between the relational back-end server and the client front-end tools.
Relational OLAP (ROLAP)
Advantages:
Can handle large amounts of data: ROLAP itself places no limitation on the data amount.
Can leverage functionalities inherent in the relational database: the relational database often already comes with a host of functionalities; ROLAP technologies, since they sit on top of the relational database, can therefore leverage these functionalities.
ROLAP technology tends to have greater scalability than MOLAP technology.

ROLAP examples:
◦ Telecommunication startup: call data records (CDRs)
◦ E-commerce site
◦ Credit card company
Relational OLAP (ROLAP)
Disadvantages:
Performance can be slow: because each ROLAP report is essentially a SQL query (or multiple SQL queries) in the relational database, the query time can be long if the underlying data size is large.
Limited by SQL functionalities: it is difficult to perform complex calculations using SQL.
Requires development of middleware to facilitate the building of multidimensional applications, that is, software that converts the two-dimensional relational model into a multidimensional structure.
Hybrid OLAP (HOLAP)
Combines ROLAP and MOLAP technology.
Allows storing large volumes of detailed data in an RDBMS and aggregated data in an MDBMS.
Users access the data via MOLAP tools.
Best of both worlds: the greater data capacity of ROLAP with the superior processing capability of MOLAP.
◦ Benefits from the greater scalability of ROLAP
◦ Benefits from the faster computation of MOLAP
It stores data in both a relational database (RDB) and a multidimensional database (MDD) and uses whichever is suited to the type of processing desired.
Hybrid OLAP (HOLAP)
HOLAP tools deliver selected data directly from the DBMS or via a MOLAP server in the form of a data cube, where it is stored, analyzed and maintained locally.
Issues:
◦ The architecture results in significant data redundancy and may cause problems for networks that support many users.
◦ Hybrid OLAP tools provide limited analysis capability.
◦ Only a limited amount of data can be efficiently maintained.
HOLAP examples:
◦ Sales department of a multi-national company
Denormalization
As the name indicates, denormalization is the reverse process of normalization.
It is the controlled introduction of redundancy into the database design.
It helps to improve query performance, as the number of joins can be reduced.
Denormalization is the process of trying to improve the read performance of a database, at the expense of losing some write performance, by adding redundant copies of data or by grouping data.
Denormalization
A normalized design will often store different but related pieces of information in separate logical tables (called relations).
If these relations are stored physically as separate disk files, completing a database query that draws information from several relations (a join operation) can be slow.
Denormalization
The solution is to denormalize tables.
Data is included in one table from another in order to eliminate the second table, which reduces the number of JOINs in a query and thus improves performance, as sketched below.
It's important to point out that you don't need to use denormalization if there are no performance issues in the application. Before going with it, consider other options, like query optimization and proper indexing.
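As a sketch over hypothetical tables, denormalization copies a frequently joined column into the referencing table so that reads no longer need the join:

-- Normalized: getting a client's name with each task requires a join.
SELECT t.task_id, c.client_name
FROM task t JOIN client c ON t.client_id = c.client_id;

-- Denormalized: copy the name into task...
ALTER TABLE task ADD client_name VARCHAR2(100);
UPDATE task t
SET client_name = (SELECT c.client_name FROM client c
                   WHERE c.client_id = t.client_id);

-- ...so the read becomes a single-table query, at the cost of keeping
-- the copy in sync whenever client.client_name changes.
SELECT task_id, client_name FROM task;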
Denormalization Example 1
Example 2: normalized model
[Figure: ER diagram of the normalized model]
Example 2: normalized model
The user_account table stores data about users who log into our application.
The client table contains some basic data about our clients.
The product table lists products offered to our clients.
The task table contains all the tasks we have created. Each task is a set of related actions towards clients. Each task has its related calls, meetings, and lists of offered and sold products.
The call and meeting tables store data about all calls and meetings and relate them with tasks and users.
The dictionaries task_outcome, meeting_outcome and call_outcome contain all possible options for the final state of a task, meeting or call.
The product_offered table stores a list of all products that were offered to clients on certain tasks, while product_sold contains a list of all the products that the client actually bought.
The supply_order table stores data about all orders we've placed, and the products_on_order table lists products and their quantity for specific orders.
The writeoff table is a list of products that were written off due to accidents.
Denormalized model
[Figure: ER diagram of the denormalized model]
Denormalized model: product
The only change in the product table is the addition of the units_in_stock attribute. In a normalized model we could compute this data as units ordered – units sold – (units offered) – units written off. We would repeat the calculation each time a client asks for that product, which would be extremely time-consuming. Instead, we'll compute the value up front, so when a customer asks us, we'll have it ready. Of course, this simplifies the select query a lot. On the other hand, the units_in_stock attribute must be adjusted after every insert, update, or delete in the products_on_order, writeoff, product_offered and product_sold tables; a trigger sketch follows.
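A sketch of how units_in_stock could be kept current, using an Oracle-style trigger (the trigger and the quantity column name are assumptions; similar triggers would be needed on the other three tables):

-- After each sale is recorded, decrement the cached stock figure.
CREATE OR REPLACE TRIGGER trg_product_sold_stock
AFTER INSERT ON product_sold
FOR EACH ROW
BEGIN
  UPDATE product
  SET units_in_stock = units_in_stock - :NEW.quantity
  WHERE product_id = :NEW.product_id;
END;
/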
Denormalized model: task
In the modified task table, we find two new attributes: client_name and user_first_last_name. Both of them store the values as of when the task was created. The reason is that both of these values can change over time. We'll also keep a foreign key that relates them to the original client and user IDs. There are more values that we might like to store, like client address, VAT ID, etc.
Denormalized model: product_offered
The denormalized product_offered table has two new attributes, price_per_unit and price. The price_per_unit attribute is stored because we need to record the actual price when the product was offered. The normalized model would only show its current state, so when the product price changes our 'history' prices would also change. Our change doesn't just make the database run faster: it also makes it work better. The price attribute is the computed value units_sold * price_per_unit. I added it here to avoid making that calculation each time we want to take a look at a list of offered products. It's a small cost, but it improves performance.
The changes made on the product_sold table are very similar. The table structure is the same, but it stores a list of sold items.
Denormalized model: statistics_per_year
The statistics_per_year table is completely new to our model. We should look at it as a denormalized table, because all its data can be computed from the other tables. The idea behind this table is to store the number of tasks, successful tasks, meetings and calls related to any given client. It also handles the sum total charged per year. After inserting, updating, or deleting anything in the task, meeting, call and product_sold tables, we should recalculate this table's data for that client and the corresponding year. We can expect that we'll mostly have changes only for the current year; reports for previous years shouldn't need to change. Values in this table are computed up front, so we'll spend less time and resources at the moment we need the calculation result. A sketch of the recalculation follows.
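A sketch of the recalculation for one client and one year (column names are assumptions):

-- Recompute the cached yearly counter for client 42, year 2015.
UPDATE statistics_per_year s
SET tasks_count = (SELECT COUNT(*) FROM task t
                   WHERE t.client_id = s.client_id
                     AND EXTRACT(YEAR FROM t.created_at) = s.year)
WHERE s.client_id = 42 AND s.year = 2015;

The same pattern extends to the meeting, call and charged-total columns.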
When to use denormalization
Maintaining history
Improving query performance
Speeding up reporting
Computing commonly needed values up front
Disadvantages of Denormalization
Disk space: duplicate data is stored.
Data anomalies: we must update every copy of duplicated data; that also applies to computed values and reports. We can achieve this by using triggers, transactions and/or procedures for all operations that must be completed together.
Documentation: we must properly document every denormalization rule that we have applied.
Slowing other operations: we can expect to slow down data insert, modification, and deletion operations.
More coding: it will require additional coding, but at the same time it will simplify some select queries a lot.
