Data Warehouse Unit 4 Complete
OLAP (Online Analytical Processing) is a software technology that allows analysts, managers,
and executives to quickly and interactively access data in various formats, turning raw
information into meaningful insights. It supports the multidimensional analysis of business
information and enables complex calculations, trend analysis, and advanced data modeling.
Uses of OLAP
OLAP applications are used across a variety of functions in an organization.
Production
○ Production planning
○ Defect analysis
OLAP cubes have two main purposes. The first is to provide business users with a data
model more intuitive to them than a tabular model. This model is called a Dimensional
Model.
The second purpose is to enable fast query response that is usually difficult to achieve
using tabular models.
Aggregation, historical information management, and query facilities are essential components of
data warehousing systems, enabling efficient data analysis and decision-making. Here’s an overview
of each:
Aggregation
1. Definition:
2. Purpose:
● Aggregated data provides valuable insights into trends, patterns, and anomalies
4. Usage:
ad-hoc analysis.
complex datasets.
Historical Information Management
2. Purpose:
● Historical data provides context and historical perspective for analysis and
decision-making.
● Organizations define data retention policies to determine how long historical data
● SCDs track historical changes to dimension attributes, allowing for accurate historical
analysis.
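The way an SCD keeps history can be sketched in a few lines. This is a minimal illustration assuming the common "Type 2" approach (each change closes the current row and adds a new one); the notes do not say which SCD type is meant, and the names CustomerRow, apply_change, and city_as_of are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    """One version of a customer in a hypothetical dimension table."""
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still current"


def apply_change(rows: List[CustomerRow], customer_id: int,
                 new_city: str, change_date: date) -> None:
    """Close the current version and append a new one (Type 2 style)."""
    for row in rows:
        if row.customer_id == customer_id and row.valid_to is None:
            if row.city == new_city:
                return  # nothing changed, keep the current version
            row.valid_to = change_date
    rows.append(CustomerRow(customer_id, new_city, change_date))


def city_as_of(rows: List[CustomerRow], customer_id: int,
               when: date) -> Optional[str]:
    """Return the city that was valid for the customer on a given date."""
    for row in rows:
        if (row.customer_id == customer_id and row.valid_from <= when
                and (row.valid_to is None or when < row.valid_to)):
            return row.city
    return None


dim = [CustomerRow(1, "Pune", date(2020, 1, 1))]
apply_change(dim, 1, "Mumbai", date(2023, 6, 1))
```

Because the old row is kept, a query such as city_as_of(dim, 1, date(2021, 1, 1)) still sees the value that was true at that time.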
Query Facility
1. Definition:
● Query facilities provide tools and interfaces for querying and accessing data stored in
● Users can write and execute SQL queries or use graphical interfaces to retrieve data.
2. Features:
● SQL Support: Query facilities support SQL (Structured Query Language) for data
● Ad-Hoc Querying: Allows users to create and execute ad-hoc queries to explore data
interactively.
● Parameterized Queries: Supports parameterized queries to enable dynamic filtering
and customization.
3. Performance Optimization:
● They leverage database engine capabilities for efficient query execution and
resource utilization.
4. Integration:
● Query facilities integrate with reporting tools, BI platforms, and data visualization
● They support integration with ETL (Extract, Transform, Load) tools for data
Aggregation, historical information management, and query facilities are critical components of data
warehousing systems, enabling efficient data analysis and decision-making. By aggregating data,
managing historical information, and providing robust query facilities, organizations can extract
valuable insights from their data warehouse and drive informed business decisions.
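As a small illustration of aggregation and of a parameterized query, the sketch below uses Python's built-in sqlite3 module as a stand-in for the warehouse database. The sales table, its columns, and the two helper functions are assumptions made for the example.

```python
import sqlite3

# In-memory database standing in for the warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "Pen", 10.0), ("North", "Book", 40.0),
     ("South", "Pen", 15.0), ("South", "Book", 25.0)],
)


def total_by_region(connection):
    """Aggregation: summarize detailed rows into one total per region."""
    rows = connection.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)


def sales_for_product(connection, product):
    """Parameterized query: the product filter is supplied at run time."""
    return connection.execute(
        "SELECT region, amount FROM sales WHERE product = ? ORDER BY region",
        (product,),
    ).fetchall()


region_totals = total_by_region(conn)
pen_sales = sales_for_product(conn, "Pen")
```

Passing the filter value through the `?` placeholder, rather than building the SQL string by hand, lets the same query be reused with different values.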
OLAP functions
Slice
The slice function allows users to examine a single level of data from a multidimensional array. It
extracts a 2D view from a 3D data cube by fixing one of the dimensions. For example, if you
have a sales data cube with dimensions like time, product, and location, slicing by "time" could
give you a report for a specific year. It is useful when you need to analyze data for a specific
period or focus on a single dimension while keeping others constant.
Dice
The dice function enables users to select specific ranges of data across multiple dimensions,
creating a sub-cube. This is ideal for comparing data across different parameters. For example,
you can slice the data by time and then dice it by selecting data for a particular product category
and a specific region, creating a smaller, more focused data subset for deeper analysis.
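Slice and dice can be illustrated on a tiny cube held in a Python dictionary. This is only a sketch: the cube contents and the function names slice_by_year and dice are made up for the example.

```python
# Hypothetical cube: (year, product, location) -> sales amount.
cube = {
    (2023, "Pen", "Delhi"): 100,
    (2023, "Book", "Delhi"): 250,
    (2023, "Pen", "Mumbai"): 80,
    (2024, "Pen", "Delhi"): 120,
    (2024, "Book", "Mumbai"): 300,
}


def slice_by_year(cube, year):
    """Fix the time dimension; the result is a 2D (product, location) view."""
    return {(prod, loc): v for (y, prod, loc), v in cube.items() if y == year}


def dice(cube, years, products, locations):
    """Keep cells whose coordinates fall in the chosen sets on every dimension."""
    return {
        key: v for key, v in cube.items()
        if key[0] in years and key[1] in products and key[2] in locations
    }


sales_2023 = slice_by_year(cube, 2023)
sub_cube = dice(cube, {2023, 2024}, {"Pen"}, {"Delhi"})
```

The slice drops the fixed dimension from the keys, while the dice keeps all three dimensions and only narrows their ranges.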
Drill-Down
The drill-down function lets users view more detailed data by moving to a lower level in the data
hierarchy. It helps users uncover underlying trends or outliers in the data. For example, if you're
viewing sales data for a country, drilling down can show data by state, city, or even individual
stores. This is particularly beneficial when you want to understand the detailed data behind
high-level reports.
Roll-Up
The roll-up function is the opposite of drill-down. It aggregates data to a higher level of the
hierarchy, summarizing detailed data into broader categories. For example, if you're viewing
sales data at the product level, rolling up could show you total sales for a product category or
the entire product line. This function is ideal for strategic overviews and is useful for
executive-level reporting.
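Roll-up and drill-down can be sketched over a simple location hierarchy (country, state, city). The data and the function names roll_up and drill_down are hypothetical and serve only to show the idea.

```python
from collections import defaultdict

# Hypothetical detailed facts: (country, state, city) -> sales.
city_sales = {
    ("India", "Maharashtra", "Mumbai"): 300,
    ("India", "Maharashtra", "Pune"): 150,
    ("India", "Karnataka", "Bengaluru"): 200,
}


def roll_up(facts, levels):
    """Aggregate to the first `levels` levels of the hierarchy."""
    totals = defaultdict(int)
    for key, value in facts.items():
        totals[key[:levels]] += value
    return dict(totals)


def drill_down(facts, prefix):
    """Show the next, more detailed level beneath the given prefix."""
    matching = {k: v for k, v in facts.items() if k[:len(prefix)] == prefix}
    return roll_up(matching, len(prefix) + 1)


by_state = roll_up(city_sales, 2)
by_country = roll_up(city_sales, 1)
maharashtra_cities = drill_down(city_sales, ("India", "Maharashtra"))
```

Rolling up shortens the key (fewer hierarchy levels), and drilling down lengthens it by one level for the selected branch.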
Pivot (Rotate)
The pivot function allows users to rearrange dimensions in the view to get a different
perspective on the data. This is helpful when users want to examine the data from different
angles or perspectives. For example, you can pivot a report showing sales by region and time,
so it shows sales by product and location instead. It helps users identify patterns or trends that
might be overlooked in a static view.
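In its simplest form, a pivot swaps the row and column dimensions of a report. The sketch below shows that case for a nested-dictionary report; the function name pivot and the sample figures are assumptions for illustration.

```python
def pivot(table):
    """Swap the row and column dimensions of a nested-dict report."""
    rotated = {}
    for row_key, columns in table.items():
        for col_key, value in columns.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated


# Rows are regions, columns are years.
by_region = {"North": {2023: 10, 2024: 12}, "South": {2023: 7, 2024: 9}}

# After the pivot, rows are years and columns are regions.
by_year = pivot(by_region)
```

The values are unchanged; only the orientation of the view differs.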
Drill-Up
The drill-up function is the inverse of drill-down. It helps users navigate back to a higher level in
the data hierarchy when they need to view more summarized data. After drilling down to see
detailed monthly sales, for example, a user might use drill-up to return to the yearly summary.
OLAP TOOLS
OLAP tools are software applications that provide the functionality for multidimensional data
analysis. These tools enable users to interact with data cubes, perform advanced queries, and
generate reports based on specific business needs.
Comparison of OLAP and OLTP (Online Transaction Processing)
Definition
○ OLAP: It is well known as an online database query management system.
○ OLTP: It is well known as an online database modifying system.
Data source
○ OLAP: Consists of historical data from various databases.
○ OLTP: Consists of only operational current data.
Application
○ OLAP: It is subject-oriented. Used for data mining, analytics, decision making, etc.
○ OLTP: It is application-oriented. Used for business tasks.
Normalized
○ OLAP: In an OLAP database, tables are not normalized.
○ OLTP: In an OLTP database, tables are normalized (3NF).
Task
○ OLAP: It provides a multidimensional view of different business tasks.
○ OLTP: It reveals a snapshot of present business tasks.
Types of OLAP
There are three main types of OLAP servers: ROLAP, MOLAP, and HOLAP.
Relational OLAP (ROLAP) Server
ROLAP servers use a relational or extended-relational DBMS to store and manage warehouse data,
and OLAP middleware to provide the missing pieces.
ROLAP servers contain optimization for each DBMS back end, implementation of
aggregation navigation logic, and additional tools and services.
ROLAP systems work primarily from the data that resides in a relational database, where
the base data and dimension tables are stored as relational tables. This model permits the
multidimensional analysis of data.
This technique relies on manipulating the data stored in the relational database to give the
appearance of traditional OLAP's slicing and dicing functionality. In essence, each slice or
dice action is equivalent to adding a "WHERE" clause to the SQL statement.
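The idea that each slice or dice becomes a WHERE clause can be sketched with Python's built-in sqlite3 module standing in for the relational back end. The sales table, the ALLOWED column list, and the function query_cube are assumptions for this example, not part of any ROLAP product.

```python
import sqlite3

ALLOWED = ("year", "product", "location")  # dimension columns of the hypothetical table


def query_cube(conn, filters):
    """Sum sales under the given dimension filters (each becomes a WHERE term)."""
    clauses, params = [], []
    for column, values in filters.items():
        if column not in ALLOWED:
            raise ValueError(f"unknown dimension: {column}")
        placeholders = ", ".join("?" for _ in values)
        clauses.append(f"{column} IN ({placeholders})")
        params.extend(values)
    sql = "SELECT COALESCE(SUM(amount), 0) FROM sales"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return conn.execute(sql, params).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (year INTEGER, product TEXT, location TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (2023, "Pen", "Delhi", 100), (2023, "Book", "Delhi", 250),
    (2024, "Pen", "Delhi", 120), (2024, "Book", "Mumbai", 300),
])

# Slice: one dimension fixed to a single value.
slice_total = query_cube(conn, {"year": [2023]})
# Dice: ranges chosen on more than one dimension.
dice_total = query_cube(conn, {"year": [2023, 2024], "product": ["Pen"]})
```

Every call runs a fresh SQL query against the base table, which is also why ROLAP response time grows with the size of the underlying data.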
ROLAP Architecture includes the following components:
○ Database server.
○ ROLAP server.
○ Front-end tool.
Advantages
Can handle large amounts of information: The data size limitation of ROLAP technology
depends on the data size of the underlying RDBMS. So, ROLAP itself does not restrict the
data amount.
Can use functionality built into the RDBMS: The RDBMS already comes with many features, so
ROLAP technologies, which work on top of the RDBMS, can make use of them.
Disadvantages
Performance can be slow: Because each ROLAP report is a SQL query (or multiple SQL queries)
against the relational database, query time can be long if the underlying data size is large.
Limited by SQL functionalities: ROLAP technology relies upon developing SQL statements
to query the relational database, and SQL statements do not suit all needs.
Multidimensional OLAP (MOLAP) Server
One significant distinction between MOLAP and ROLAP is that in MOLAP the data are summarized
and stored in an optimized format in a multidimensional cube, instead of in a relational
database. In the MOLAP model, data are structured into proprietary formats according to the
client's reporting requirements, with the calculations pre-generated on the cubes.
MOLAP Architecture
MOLAP Architecture includes the following components:
○ Database server.
○ MOLAP server.
○ Front-end tool.
Advantages
Excellent Performance: A MOLAP cube is built for fast information retrieval, and is optimal
for slicing and dicing operations.
Can perform complex calculations: All evaluations have been pre-generated when the cube
is created. Hence, complex calculations are not only possible, but they return quickly.
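Pre-generation can be sketched as follows: every total is computed once when the cube is built, so a later query is only a dictionary lookup. This is a toy model, not how any particular MOLAP engine stores data; the ALL marker and the function build_cube are hypothetical.

```python
from itertools import product as cartesian

ALL = "*"  # marker meaning "aggregated over this dimension"


def build_cube(facts):
    """Pre-compute totals for every combination of dimension values and ALL."""
    cube = {}
    for (year, item, city), amount in facts.items():
        # Each fact contributes to 2**3 = 8 cells (detail plus every roll-up).
        for key in cartesian((year, ALL), (item, ALL), (city, ALL)):
            cube[key] = cube.get(key, 0) + amount
    return cube


facts = {
    (2023, "Pen", "Delhi"): 100,
    (2023, "Book", "Delhi"): 250,
    (2024, "Pen", "Mumbai"): 80,
}
cube = build_cube(facts)

total_2023 = cube[(2023, ALL, ALL)]  # answered by lookup, no scan of the facts
pen_total = cube[(ALL, "Pen", ALL)]
grand_total = cube[(ALL, ALL, ALL)]
```

The same sketch hints at the storage cost: with n dimensions each fact feeds 2**n cells, so the cube grows quickly as dimensions are added.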
Disadvantages
Limited in the amount of information it can handle: Because all calculations are performed
when the cube is built, the cube itself cannot hold a large amount of data.
Requires additional investment: Cube technology is generally proprietary and does not
already exist in the organization. Therefore, to adopt MOLAP technology, chances are other
investments in human and capital resources are needed.
Hybrid OLAP (HOLAP) Server
HOLAP incorporates the best features of MOLAP and ROLAP into a single architecture.
HOLAP systems store large quantities of detailed data in the relational tables, while the
aggregations are stored in the pre-calculated cubes. HOLAP can also drill through from the
cube down to the relational tables for detailed data. Microsoft SQL Server 2000 provides a
hybrid OLAP server.
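The split between pre-calculated aggregates and relational detail can be sketched like this, with sqlite3 standing in for the relational store and a plain dictionary standing in for the cube. All names here (sales, yearly_totals, total_for_year, drill_through) are assumptions for illustration.

```python
import sqlite3

# Detail records stay in the relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, city TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (2023, "Delhi", 100), (2023, "Mumbai", 80), (2024, "Delhi", 120),
])

# Aggregations are computed once and kept outside the relational tables.
yearly_totals = dict(conn.execute(
    "SELECT year, SUM(amount) FROM sales GROUP BY year"))


def total_for_year(year):
    """Summary query: answered from the pre-computed aggregates."""
    return yearly_totals[year]


def drill_through(year):
    """Detail query: falls back to the relational table."""
    return conn.execute(
        "SELECT city, amount FROM sales WHERE year = ? ORDER BY city", (year,)
    ).fetchall()
```

Summary requests behave like MOLAP lookups, while detail requests pay the cost of a relational query, and no second copy of the detail rows is kept.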
Advantages of HOLAP
1. HOLAP provides benefits of both MOLAP and ROLAP.
2. It provides fast access at all levels of aggregation.
3. HOLAP balances the disk space requirement, as it only stores the aggregate
information on the OLAP server and the detail record remains in the relational
database. So no duplicate copy of the detail record is maintained.
Disadvantages of HOLAP
1. HOLAP architecture is very complicated because it supports both MOLAP and
ROLAP servers.
Comparison of ROLAP, MOLAP, and HOLAP
ROLAP
● ROLAP stands for Relational Online Analytical Processing.
● The ROLAP storage mode causes the aggregations of the partition to be stored in indexed
views in the relational database that was specified in the partition's data source.
● ROLAP does not cause a copy of the source data to be stored in the Analysis Services data
folders. Instead, when the result cannot be derived from the query cache, the indexed views
in the data source are accessed to answer queries.
● Query response is frequently slower with ROLAP storage than with the MOLAP or HOLAP
storage modes. Processing time is also frequently slower with ROLAP.
MOLAP
● MOLAP stands for Multidimensional Online Analytical Processing.
● The MOLAP storage mode causes the aggregations of the partition and a copy of its source
data to be stored in a multidimensional structure in Analysis Services when the partition is
processed.
● This MOLAP structure is highly optimized to maximize query performance. The storage
location can be on the computer where the partition is defined or on another computer
running Analysis Services. Because a copy of the source data resides in the multidimensional
structure, queries can be resolved without accessing the partition's source data.
● Query response times can be reduced substantially by using aggregations. The data in the
partition's MOLAP structure is only as current as the most recent processing of the
partition.
HOLAP
● HOLAP stands for Hybrid Online Analytical Processing.
● The HOLAP storage mode combines attributes of both MOLAP and ROLAP. Like MOLAP, HOLAP
causes the aggregations of the partition to be stored in a multidimensional structure in an
SQL Server Analysis Services instance.
● HOLAP does not cause a copy of the source data to be stored. For queries that access only
summary data in the aggregations of a partition, HOLAP is the equivalent of MOLAP.
● Queries that access source data (for example, drilling down to an atomic cube cell for
which there is no aggregation data) must retrieve data from the relational database and
will not be as fast as they would be if the source data were stored in the MOLAP structure.
Data Mining Interface
A data mining interface facilitates the exploration and extraction of actionable insights from large
datasets using data mining techniques. Here’s an overview:
1. Query Interface:
● Provides users with tools to define and execute data mining queries against the data
warehouse.
● Supports various query languages or graphical interfaces for defining mining tasks.
● Enables users to explore data visually and interactively to identify patterns, trends,
and anomalies.
● Allows users to build predictive models using machine learning algorithms and
● Provides tools for model training, testing, and validation using techniques like
cross-validation.
● Integrates with business intelligence (BI) tools and dashboards to visualize and
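The cross-validation mentioned above can be sketched without any library: the data is split into k folds, and each fold serves once as the test set while the remaining folds are used for training. The function name k_fold_indices is hypothetical, and this simple version does not shuffle the samples.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k roughly equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples)
                 if i < start or i >= start + size]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
```

A model would be trained on each train list and scored on the matching test list, and the k scores averaged to estimate how well it generalizes.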
Security
Data warehouse security is crucial for protecting sensitive information and ensuring compliance with
regulatory requirements. Here are key security measures:
1. Access Control:
● Implement role-based access control (RBAC) to restrict access to data based on
2. Data Encryption:
● Implement auditing and logging mechanisms to track user activities and changes to
data.
breaches.
● Mask sensitive data to anonymize personally identifiable information (PII) and protect
privacy.
confidentiality.
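Role-based access and masking can be combined in a small sketch: columns a role is not allowed to read are replaced with a keyed hash. The roles, column names, and SECRET_KEY below are hypothetical, and a keyed hash is only one possible masking approach.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-me"  # hypothetical key; keep real keys out of source code

ROLE_PERMISSIONS = {  # hypothetical roles and the columns they may read
    "analyst": {"region", "amount"},
    "auditor": {"region", "amount", "customer_email"},
}


def mask(value: str) -> str:
    """Replace a PII value with a keyed token that does not reveal the original."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def read_row(role: str, row: dict) -> dict:
    """Return the row with every column the role may not read masked."""
    allowed = ROLE_PERMISSIONS.get(role, set())
    return {col: (val if col in allowed else mask(str(val)))
            for col, val in row.items()}


row = {"region": "North", "amount": 40, "customer_email": "a@example.com"}
analyst_view = read_row("analyst", row)
auditor_view = read_row("auditor", row)
```

Because the same input always yields the same token, masked values can still be grouped or joined on without exposing the underlying email address.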
Backup and Recovery
Backup and recovery processes are essential for data warehouse reliability and resilience. Here’s
how they are managed:
1. Regular Backups:
● Schedule regular backups of the data warehouse to ensure data availability in case
requirements.
2. Redundant Storage:
● Store backup copies of data in redundant storage locations, such as cloud storage or
● Ensure data redundancy and fault tolerance to mitigate the risk of data loss due to
3. Point-in-Time Recovery:
● Enable rollback or recovery to restore the data warehouse to a consistent state after
● Develop and test disaster recovery plans to ensure business continuity in the event
● Establish procedures for failover, data restoration, and system recovery to minimize
● Use automated backup solutions and backup scheduling tools to streamline backup
● Monitor backup jobs and receive alerts for any failures or anomalies to ensure timely
resolution.
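A backup and restore cycle can be sketched with the backup method of Python's built-in sqlite3 connections. An in-memory database stands in for the warehouse here; the table and helper names are assumptions, and a real warehouse would use its own platform's backup tooling.

```python
import sqlite3


def backup(source: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the whole database into a new in-memory connection."""
    target = sqlite3.connect(":memory:")
    source.backup(target)
    return target


def count_rows(conn: sqlite3.Connection) -> int:
    """Count the rows in the hypothetical sales table."""
    return conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]


warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount INTEGER)")
warehouse.executemany("INSERT INTO sales (amount) VALUES (?)",
                      [(10,), (20,), (30,)])
warehouse.commit()

snapshot = backup(warehouse)            # regular backup
warehouse.execute("DELETE FROM sales")  # simulated accidental data loss
warehouse.commit()
snapshot.backup(warehouse)              # restore from the backup copy
```

After the restore, the warehouse again holds the rows that existed when the snapshot was taken; anything written after the snapshot would be lost, which is why backup frequency matters.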
A robust data mining interface facilitates data exploration and predictive analysis, while
comprehensive security measures protect sensitive information and ensure compliance. Backup and
recovery processes ensure data warehouse resilience and availability, safeguarding against data
loss and disruptions. By implementing these measures effectively, organizations can leverage their
data warehouse infrastructure securely and reliably to drive business insights and decision-making.