0% found this document useful (0 votes)
84 views21 pages

Data Warehouse Unit 4 Complete

Uploaded by

Sandeep Nayal
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

OLAP

OLAP (Online Analytical Processing) is a software technology that allows analysts, managers,
and executives to quickly and interactively access data in various formats, turning raw
information into meaningful insights. It supports the multidimensional analysis of business
information and enables complex calculations, trend analysis, and advanced data modeling.

OLAP is essential for business performance management, planning, budgeting, forecasting,
financial reporting, analysis, simulations, knowledge discovery, and data warehouse reporting. It
allows users to perform ad hoc analysis in multiple dimensions, helping them make better
decisions by providing deeper insights and understanding.

Uses of OLAP
OLAP applications are used across many functions of an organization.

Finance and accounting

○ Budgeting
○ Activity-based costing
○ Financial performance analysis
○ Financial modeling

Sales and Marketing

○ Sales analysis and forecasting
○ Market research analysis
○ Promotion analysis
○ Customer analysis
○ Market and customer segmentation

Production

○ Production planning
○ Defect analysis

OLAP cubes have two main purposes. The first is to provide business users with a data
model more intuitive to them than a tabular model. This model is called a Dimensional
Model.
The second purpose is to enable fast query response that is usually difficult to achieve
using tabular models.

Aggregation, historical information management, and query facilities are essential components of
data warehousing systems, enabling efficient data analysis and decision-making. Here’s an overview
of each:

Aggregation
1. Definition:

● Aggregation involves combining and summarizing data to derive meaningful insights.

● Aggregated data often represents higher-level summaries or totals, facilitating analysis and reporting.

2. Purpose:

● Aggregation reduces the volume of data by consolidating detailed information into more manageable and meaningful summaries.

● Aggregated data provides valuable insights into trends, patterns, and anomalies within the dataset.

3. Common Aggregation Functions:

● Sum: Adds up numerical values.

● Average: Calculates the mean value of numerical data.

● Count: Counts the number of occurrences.

● Min/Max: Finds the minimum or maximum value within a dataset.

● Group By: Not an aggregate function itself, but the clause that groups rows by specified attributes so that the functions above are applied per group.

4. Usage:

● Aggregation is essential for generating reports, creating dashboards, and performing ad-hoc analysis.

● It simplifies data exploration and visualization by presenting summarized views of complex datasets.
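
The aggregation functions listed above can be illustrated with a minimal sketch. The `sales` rows and the `aggregate_by_region` helper are hypothetical and exist only for this example; a real warehouse would usually express the same thing as a SQL `GROUP BY` query.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, product, amount)
sales = [
    ("North", "Laptop", 1200.0),
    ("North", "Phone", 800.0),
    ("South", "Laptop", 1000.0),
    ("South", "Phone", 600.0),
    ("South", "Phone", 400.0),
]

def aggregate_by_region(rows):
    """Group rows by region, then compute sum, average, count, min, and max."""
    groups = defaultdict(list)
    for region, _product, amount in rows:
        groups[region].append(amount)
    return {
        region: {
            "sum": sum(amounts),
            "avg": sum(amounts) / len(amounts),
            "count": len(amounts),
            "min": min(amounts),
            "max": max(amounts),
        }
        for region, amounts in groups.items()
    }

summary = aggregate_by_region(sales)
```

Five detail rows are reduced to one summary record per region, which is the volume reduction described under "Purpose".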

Historical Information Management


1. Definition:

● Historical information management involves storing and managing historical data within the data warehouse.

● Historical data represents past snapshots of business transactions, events, or states.

2. Purpose:

● Historical data provides context and historical perspective for analysis and decision-making.

● It supports trend analysis, forecasting, and predictive modeling by capturing past patterns and behaviors.

3. Data Retention Policies:

● Organizations define data retention policies to determine how long historical data should be retained in the data warehouse.

● Policies consider regulatory requirements, business needs, and storage constraints.

4. Temporal Data Modeling:

● Temporal data modeling techniques, such as slowly changing dimensions (SCDs), are used to manage changes in historical data over time.

● SCDs track historical changes to dimension attributes, allowing for accurate historical analysis.
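
As an illustration of how an SCD can preserve history, here is a minimal sketch of the common "Type 2" approach, in which each change closes the current row and adds a new one. The `CustomerVersion` class, its fields, and both helper functions are assumptions made for this example, not part of any particular product.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class CustomerVersion:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version

def apply_scd2(history: List[CustomerVersion], customer_id: int,
               new_city: str, change_date: date) -> None:
    """Close the current version and append a new one if the city changed."""
    current = next((v for v in history
                    if v.customer_id == customer_id and v.valid_to is None), None)
    if current is not None:
        if current.city == new_city:
            return  # attribute unchanged, nothing to record
        current.valid_to = change_date
    history.append(CustomerVersion(customer_id, new_city, change_date))

def city_as_of(history: List[CustomerVersion], customer_id: int,
               as_of: date) -> Optional[str]:
    """Return the city that was valid for the customer on the given date."""
    for v in history:
        if (v.customer_id == customer_id and v.valid_from <= as_of
                and (v.valid_to is None or as_of < v.valid_to)):
            return v.city
    return None

history = [CustomerVersion(1, "Delhi", date(2020, 1, 1))]
apply_scd2(history, 1, "Mumbai", date(2023, 6, 1))
```

Because the old row is kept with an end date, a report for 2021 still attributes the customer to Delhi, while current reports use Mumbai.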

Query Facility
1. Definition:

● Query facilities provide tools and interfaces for querying and accessing data stored in the data warehouse.

● Users can write and execute SQL queries or use graphical interfaces to retrieve data.

2. Features:

● SQL Support: Query facilities support SQL (Structured Query Language) for data retrieval and manipulation.

● OLAP (Online Analytical Processing): Supports multidimensional analysis with capabilities like slicing, dicing, drilling, and pivoting.

● Ad-Hoc Querying: Allows users to create and execute ad-hoc queries to explore data interactively.

● Parameterized Queries: Supports parameterized queries to enable dynamic filtering and customization.

3. Performance Optimization:

● Query facilities optimize query performance through techniques like query optimization, indexing, and caching.

● They leverage database engine capabilities for efficient query execution and resource utilization.

4. Integration:

● Query facilities integrate with reporting tools, BI platforms, and data visualization tools to enable seamless data analysis and reporting.

● They support integration with ETL (Extract, Transform, Load) tools for data preparation and loading.
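
The parameterized queries mentioned under "Features" can be sketched with Python's built-in `sqlite3` module standing in for a warehouse engine. The `sales` table, its columns, and the `total_sales` helper are hypothetical; the point is only that filter values are passed as parameters rather than pasted into the SQL text.

```python
import sqlite3

# In-memory database used as a stand-in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", 2023, 100.0), ("North", 2024, 150.0), ("South", 2024, 90.0)],
)

def total_sales(connection, region, year):
    """Run one query template with different filter values bound as parameters."""
    row = connection.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ? AND year = ?",
        (region, year),
    ).fetchone()
    return row[0]
```

Calling `total_sales(conn, "North", 2024)` and `total_sales(conn, "South", 2024)` reuses the same statement with different filters, which is what makes dynamic filtering in reports straightforward.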

Aggregation, historical information management, and query facilities are critical components of data
warehousing systems, enabling efficient data analysis and decision-making. By aggregating data,
managing historical information, and providing robust query facilities, organizations can extract
valuable insights from their data warehouse and drive informed business decisions.

OLAP functions
Slice

The slice function allows users to examine a single level of data from a multidimensional array. It
extracts a 2D view from a 3D data cube by fixing one of the dimensions. For example, if you
have a sales data cube with dimensions like time, product, and location, slicing by "time" could
give you a report for a specific year. It is useful when you need to analyze data for a specific
period or focus on a single dimension while keeping others constant.
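
A slice can be sketched in a few lines. The tiny `cube` below, keyed by (time, product, location), and the `slice_by_year` helper are made up for illustration only.

```python
# Hypothetical cube: (year, product, location) -> units sold
cube = {
    (2023, "Laptop", "Delhi"): 10,
    (2023, "Phone", "Delhi"): 20,
    (2023, "Laptop", "Mumbai"): 15,
    (2024, "Laptop", "Delhi"): 12,
    (2024, "Phone", "Mumbai"): 25,
}

def slice_by_year(cube, year):
    """Fix the time dimension and return a 2D (product, location) view."""
    return {
        (product, location): units
        for (t, product, location), units in cube.items()
        if t == year
    }

sales_2023 = slice_by_year(cube, 2023)
```

The result has only two dimensions left, product and location, because time has been fixed to a single value.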

Dice

The dice function enables users to select specific ranges of data across multiple dimensions,
creating a sub-cube. This is ideal for comparing data across different parameters. For example,
you can slice the data by time and then dice it by selecting data for a particular product category
and a specific region, creating a smaller, more focused data subset for deeper analysis.
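
A dice can be sketched the same way. The `cube` data and the `dice` helper are hypothetical; unlike a slice, the result keeps all three dimensions but restricts each to a chosen set of values.

```python
# Hypothetical cube: (year, product, location) -> units sold
cube = {
    (2023, "Laptop", "Delhi"): 10,
    (2023, "Phone", "Delhi"): 20,
    (2023, "Laptop", "Mumbai"): 15,
    (2024, "Laptop", "Delhi"): 12,
    (2024, "Phone", "Mumbai"): 25,
}

def dice(cube, years, products, locations):
    """Keep only the cells whose coordinates fall in all three selections."""
    return {
        key: units
        for key, units in cube.items()
        if key[0] in years and key[1] in products and key[2] in locations
    }

sub_cube = dice(cube, years={2023, 2024}, products={"Laptop"}, locations={"Delhi"})
```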

Drill-Down
The drill-down function lets users view more detailed data by moving to a lower level in the data
hierarchy. It helps users uncover underlying trends or outliers in the data. For example, if you're
viewing sales data for a country, drilling down can show data by state, city, or even individual
stores. This is particularly beneficial when you want to understand the detailed data behind
high-level reports.
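
Drill-down along a location hierarchy might look like the sketch below. The `facts` rows, the `LEVELS` tuple, and `totals_at_level` are assumptions for this example; the same totals are simply regrouped at a finer level of the hierarchy.

```python
from collections import defaultdict

# Hypothetical city-level facts: (country, state, city, sales)
facts = [
    ("India", "Maharashtra", "Mumbai", 500),
    ("India", "Maharashtra", "Pune", 300),
    ("India", "Karnataka", "Bengaluru", 400),
]

LEVELS = ("country", "state", "city")  # coarse to fine

def totals_at_level(rows, level):
    """Total sales grouped by the hierarchy path down to the given level."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(int)
    for row in rows:
        totals[row[:depth]] += row[3]
    return dict(totals)

by_country = totals_at_level(facts, "country")  # high-level view
by_state = totals_at_level(facts, "state")      # one drill-down step
```

Drilling down from `by_country` to `by_state` shows how the single country total of 1200 is split across the two states.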

Roll-Up

The roll-up function is the opposite of drill-down. It aggregates data to a higher level of the
hierarchy, summarizing detailed data into broader categories. For example, if you're viewing
sales data at the product level, rolling up could show you total sales for a product category or
the entire product line. This function is ideal for strategic overviews and is useful for
executive-level reporting.
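
Roll-up from product to category can be sketched as follows. The product figures and the `category_of` mapping are invented for the example.

```python
from collections import defaultdict

# Hypothetical product-level sales and a product -> category hierarchy
product_sales = {"Laptop": 1200, "Tablet": 300, "Desk": 250, "Chair": 150}
category_of = {
    "Laptop": "Electronics",
    "Tablet": "Electronics",
    "Desk": "Furniture",
    "Chair": "Furniture",
}

def roll_up(sales, mapping):
    """Sum detailed values into their parent level in the hierarchy."""
    totals = defaultdict(int)
    for product, amount in sales.items():
        totals[mapping[product]] += amount
    return dict(totals)

by_category = roll_up(product_sales, category_of)
grand_total = sum(by_category.values())  # rolling up once more, to "all products"
```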

Pivot (Rotate)

The pivot function allows users to rotate the axes of the view to get a different
perspective on the data. This is helpful when users want to examine the data from different
angles. For example, you can pivot a report that shows regions in rows and time periods in
columns so that time periods become the rows and regions the columns, or bring another
dimension such as product onto an axis. It helps users identify patterns or trends that
might be overlooked in a static view.
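
A pivot of a two-dimensional report can be sketched as a simple rotation of rows and columns. The `report` data and the `pivot` helper are hypothetical.

```python
# Hypothetical report: rows are regions, columns are quarters
report = {
    "North": {"Q1": 100, "Q2": 120},
    "South": {"Q1": 80, "Q2": 95},
}

def pivot(table):
    """Swap the row and column dimensions of a nested-dict table."""
    rotated = {}
    for row_key, columns in table.items():
        for col_key, value in columns.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated

by_quarter = pivot(report)  # rows are now quarters, columns are regions
```

The values do not change; only the orientation of the view does, so pivoting twice returns the original report.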

Drill-Up

Drill-up is another name for roll-up and is the inverse of drill-down. It helps users navigate
back to a higher level in the data hierarchy when they need to view more summarized data. After
drilling down to see detailed monthly sales, for example, a user might use drill-up to return to
the yearly summary.

OLAP Tools

OLAP tools are software applications that provide the functionality for multidimensional data
analysis. These tools enable users to interact with data cubes, perform advanced queries, and
generate reports based on specific business needs.

1. Microsoft SQL Server Analysis Services (SSAS):
○ SSAS is a powerful OLAP tool that allows users to build multidimensional cubes
and analyze data. It integrates with SQL Server and provides both MOLAP
(Multidimensional OLAP) and ROLAP (Relational OLAP) capabilities.
○ Key Features: Data mining, security features, and advanced calculations for
business intelligence applications.
2. IBM Cognos Analytics:
○ IBM Cognos is an OLAP tool that provides business intelligence and data
visualization capabilities. It allows users to analyze data, create reports, and
perform trend analysis.
○ Key Features: Ad-hoc reporting, drill-down and drill-through functionality, and
powerful visualizations.
3. Oracle OLAP:
○ Oracle OLAP is a part of the Oracle database that provides multidimensional
analysis within the database. It allows users to perform ad-hoc queries, create
data cubes, and gain insights from large datasets.
○ Key Features: Integration with Oracle databases, advanced calculations, and
support for both MOLAP and ROLAP.
4. Tableau:
○ Tableau is a popular data visualization and OLAP tool that enables users to
create interactive and shareable dashboards. It supports multidimensional
analysis and can connect to various data sources.
○ Key Features: Drag-and-drop interface, real-time collaboration, and integration
with cloud-based data sources.
5. QlikView:
○ QlikView is an OLAP tool that provides in-memory data processing and
self-service analytics. It allows users to explore data and gain insights through
interactive dashboards.
○ Key Features: In-memory data processing, associative data model, and powerful
visualization options.
6. SAP BusinessObjects:
○ SAP BusinessObjects is an enterprise BI tool that includes OLAP functionality for
multidimensional analysis. It supports complex queries and can integrate with
SAP data warehouses.
○ Key Features: Reporting, dashboards, ad-hoc queries, and advanced analytics
capabilities.
7. Pentaho:
○ Pentaho offers a suite of business analytics tools, including OLAP functionality
for multidimensional analysis. It supports the creation of cubes and reports from
various data sources.
○ Key Features: Data integration, reporting, and analysis in a unified platform.

Difference Between OLAP and OLTP


| Category | OLAP (Online Analytical Processing) | OLTP (Online Transaction Processing) |
|---|---|---|
| Definition | It is well-known as an online database query management system. | It is well-known as an online database modifying system. |
| Data source | Consists of historical data from various databases. | Consists of only current operational data. |
| Method used | It makes use of a data warehouse. | It makes use of a standard database management system (DBMS). |
| Application | It is subject-oriented. Used for data mining, analytics, decision-making, etc. | It is application-oriented. Used for business tasks. |
| Normalized | In an OLAP database, tables are not normalized. | In an OLTP database, tables are normalized (3NF). |
| Usage of data | The data is used in planning, problem-solving, and decision-making. | The data is used to perform day-to-day fundamental operations. |
| Task | It provides a multi-dimensional view of different business tasks. | It reveals a snapshot of present business tasks. |
| Purpose | It serves to extract information for analysis and decision-making. | It serves to insert, update, and delete information in the database. |
| Volume of data | A large amount of data is stored, typically in TB or PB. | The size of the data is relatively small, typically MB or GB, because historical data is archived. |
| Queries | Relatively slow, as the amount of data involved is large. Queries may take hours. | Very fast, as the queries operate on 5% of the data. |
| Update | The OLAP database is not often updated. As a result, data integrity is unaffected. | The data integrity constraint must be maintained in an OLTP database. |
| Backup and recovery | It only needs backup from time to time as compared to OLTP. | The backup and recovery process is maintained rigorously. |
| Processing time | The processing of complex queries can take a lengthy time. | It is comparatively fast in processing because of simple and straightforward queries. |
| Types of users | This data is generally managed by the CEO, MD, and GM. | This data is managed by clerks and managers. |
| Operations | Mostly read and rarely write operations. | Both read and write operations. |
| Updates | Data is refreshed on a regular basis with lengthy, scheduled batch operations. | The user initiates data updates, which are brief and quick. |
| Nature of audience | The process is focused on the market. | The process is focused on the customer. |
| Database design | Design with a focus on the subject. | Design that is focused on the application. |
| Productivity | Improves the efficiency of business analysts. | Enhances the user's productivity. |

Types of OLAP
There are three main types of OLAP servers:

ROLAP stands for Relational OLAP, an application based on relational DBMSs.

MOLAP stands for Multidimensional OLAP, an application based on multidimensional DBMSs.

HOLAP stands for Hybrid OLAP, an application using both relational and multidimensional
techniques.

Relational OLAP (ROLAP) Server


These are intermediate servers that sit between a relational back-end server and
user front-end tools.

They use a relational or extended-relational DBMS to store and manage warehouse data,
and OLAP middleware to provide the missing pieces.

ROLAP servers include optimizations for each DBMS back end, an implementation of
aggregation navigation logic, and additional tools and services.

ROLAP technology tends to have higher scalability than MOLAP technology.

ROLAP systems work primarily from the data that resides in a relational database, where
the base data and dimension tables are stored as relational tables. This model permits the
multidimensional analysis of data.

This technique relies on manipulating the data stored in the relational database to give the
appearance of traditional OLAP's slicing and dicing functionality. In essence, each slicing
or dicing action is equivalent to adding a "WHERE" clause to the SQL statement.
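
The correspondence between slicing or dicing and a "WHERE" clause can be sketched with Python's built-in `sqlite3` module standing in for the relational back end. The `sales_fact` table and its columns are hypothetical.

```python
import sqlite3

# In-memory relational store used as a stand-in for the RDBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact (year INTEGER, product TEXT, city TEXT, units INTEGER)"
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
    [
        (2023, "Laptop", "Delhi", 10),
        (2023, "Phone", "Delhi", 20),
        (2023, "Laptop", "Mumbai", 15),
        (2024, "Laptop", "Delhi", 12),
        (2024, "Phone", "Mumbai", 25),
    ],
)

# Slice: fix one dimension with a single WHERE condition.
slice_rows = conn.execute(
    "SELECT product, city, units FROM sales_fact "
    "WHERE year = ? ORDER BY product, city",
    (2023,),
).fetchall()

# Dice: restrict several dimensions at once, then aggregate what remains.
dice_rows = conn.execute(
    "SELECT year, SUM(units) FROM sales_fact "
    "WHERE product = ? AND city IN (?, ?) GROUP BY year ORDER BY year",
    ("Laptop", "Delhi", "Mumbai"),
).fetchall()
```

Each additional restriction on the cube becomes one more condition in the WHERE clause, which is why a ROLAP server can generate these statements mechanically.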

Relational OLAP Architecture


ROLAP Architecture includes the following components:

○ Database server.
○ ROLAP server.
○ Front-end tool.

Advantages
Can handle large amounts of information: The data size limitation of ROLAP technology
depends on the data size limitation of the underlying RDBMS. So, ROLAP itself does not
restrict the amount of data.

Can leverage RDBMS functionality: The RDBMS already comes with a lot of features. ROLAP
technologies, which work on top of the RDBMS, can take advantage of these functionalities.

Disadvantages
Performance can be slow: Because each ROLAP report is essentially a SQL query (or multiple
SQL queries) against the relational database, the query time can be long if the underlying
data size is large.

Limited by SQL functionalities: ROLAP technology relies on generating SQL statements
to query the relational database, and SQL statements do not suit all needs.

Multidimensional OLAP (MOLAP) Server


A MOLAP system is based on a native logical model that directly supports multidimensional
data and operations. Data are stored physically in multidimensional arrays, and positional
techniques are used to access them.

One of the significant distinctions between MOLAP and ROLAP is that data are summarized
and stored in an optimized format in a multidimensional cube, instead of in a relational
database. In the MOLAP model, data are structured into proprietary formats according to the
client's reporting requirements, with the calculations pre-generated on the cubes.
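
The idea of array storage with positional access can be sketched as follows. The dimension members, the flat `cells` array, and the `offset` function are assumptions for illustration; real MOLAP engines use proprietary, compressed formats.

```python
# Hypothetical dimension members; a cell's position is computed, not searched for.
years = [2023, 2024]
products = ["Laptop", "Phone"]
cities = ["Delhi", "Mumbai"]

year_pos = {v: i for i, v in enumerate(years)}
product_pos = {v: i for i, v in enumerate(products)}
city_pos = {v: i for i, v in enumerate(cities)}

# Dense cube stored as one flat array with one slot per coordinate combination.
cells = [0] * (len(years) * len(products) * len(cities))

def offset(year, product, city):
    """Row-major position of a cell in the flat array."""
    return ((year_pos[year] * len(products) + product_pos[product])
            * len(cities) + city_pos[city])

facts = [
    (2023, "Laptop", "Delhi", 10),
    (2023, "Phone", "Delhi", 20),
    (2023, "Laptop", "Mumbai", 15),
    (2024, "Laptop", "Delhi", 12),
    (2024, "Phone", "Mumbai", 25),
]
for year, product, city, units in facts:
    cells[offset(year, product, city)] += units

# Aggregates are computed once at build time and then only looked up.
block = len(products) * len(cities)
year_totals = [sum(cells[i * block:(i + 1) * block]) for i in range(len(years))]
```

Reading a cell is a single index calculation, and yearly totals are already available, which reflects the two properties described above: positional access and pre-generated calculations.
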

MOLAP Architecture
MOLAP Architecture includes the following components:

○ Database server.
○ MOLAP server.
○ Front-end tool.

Advantages
Excellent Performance: A MOLAP cube is built for fast information retrieval, and is optimal
for slicing and dicing operations.

Can perform complex calculations: All evaluations have been pre-generated when the cube
is created. Hence, complex calculations are not only possible, but they return quickly.

Disadvantages
Limited in the amount of information it can handle: Because all calculations are performed
when the cube is built, it is not possible to include a large amount of data in the cube itself.

Requires additional investment: Cube technology is generally proprietary and does not
already exist in the organization. Therefore, adopting MOLAP technology is likely to require
additional investment in human and capital resources.

Hybrid OLAP (HOLAP) Server
HOLAP incorporates the best features of MOLAP and ROLAP into a single architecture.
HOLAP systems keep the larger quantities of detailed data in relational tables,
while the aggregations are stored in pre-calculated cubes. HOLAP can also drill through
from the cube down to the relational tables for detailed data. Microsoft SQL Server
2000 provides a hybrid OLAP server.
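
The split between pre-calculated aggregates and relational detail can be sketched as below. This is a toy model under stated assumptions: `sqlite3` stands in for the relational store, a plain dictionary stands in for the cube, and the `units_sold` routing function is hypothetical.

```python
import sqlite3

# Detail records stay in the relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_detail (year INTEGER, city TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales_detail VALUES (?, ?, ?)",
    [(2023, "Delhi", 30), (2023, "Mumbai", 15), (2024, "Delhi", 12), (2024, "Mumbai", 25)],
)

# Aggregations are pre-calculated once and kept in a cube-like structure.
aggregate_cube = dict(
    conn.execute("SELECT year, SUM(units) FROM sales_detail GROUP BY year").fetchall()
)

def units_sold(year, city=None):
    """Answer summary questions from the cube; drill through for detail."""
    if city is None:
        return aggregate_cube.get(year, 0)  # MOLAP-style lookup
    row = conn.execute(
        "SELECT COALESCE(SUM(units), 0) FROM sales_detail WHERE year = ? AND city = ?",
        (year, city),
    ).fetchone()
    return row[0]  # ROLAP-style query against the detail table
```

Summary requests never touch the detail table, and only the aggregates are duplicated outside the relational database.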

Advantages of HOLAP
1. HOLAP provides benefits of both MOLAP and ROLAP.
2. It provides fast access at all levels of aggregation.
3. HOLAP balances the disk space requirement, as it only stores the aggregate
information on the OLAP server and the detail record remains in the relational
database. So no duplicate copy of the detail record is maintained.

Disadvantages of HOLAP
1. HOLAP architecture is very complicated because it supports both MOLAP and
ROLAP servers.

Difference between ROLAP, MOLAP, and HOLAP

| ROLAP | MOLAP | HOLAP |
|---|---|---|
| ROLAP stands for Relational Online Analytical Processing. | MOLAP stands for Multidimensional Online Analytical Processing. | HOLAP stands for Hybrid Online Analytical Processing. |
| The ROLAP storage mode causes the aggregations of the partition to be stored in indexed views in the relational database that was specified in the partition's data source. | The MOLAP storage mode causes the aggregations of the partition and a copy of its source data to be stored in a multidimensional structure in Analysis Services when the partition is processed. | The HOLAP storage mode combines attributes of both MOLAP and ROLAP. Like MOLAP, HOLAP causes the aggregations of the partition to be stored in a multidimensional structure in a SQL Server Analysis Services instance. |
| ROLAP does not cause a copy of the source data to be stored in the Analysis Services data folders. Instead, when the result cannot be derived from the query cache, the indexed views in the data source are accessed to answer queries. | This MOLAP structure is highly optimized to maximize query performance. The storage location can be on the computer where the partition is defined or on another computer running Analysis Services. Because a copy of the source data resides in the multidimensional structure, queries can be resolved without accessing the partition's source data. | HOLAP does not cause a copy of the source data to be stored. For queries that access only summary data in the aggregations of a partition, HOLAP is the equivalent of MOLAP. |
| Query response is frequently slower with ROLAP storage than with the MOLAP or HOLAP storage mode. Processing time is also frequently slower with ROLAP. | Query response times can be reduced substantially by using aggregations. The data in the partition's MOLAP structure is only as current as the most recent processing of the partition. | Queries that access source data (for example, drilling down to an atomic cube cell for which there is no aggregation data) must retrieve data from the relational database and will not be as fast as they would be if the source data were stored in the MOLAP structure. |

Data Mining Interface

A data mining interface facilitates the exploration and extraction of actionable insights from large
datasets using data mining techniques. Here’s an overview:

1. Query Interface:

● Provides users with tools to define and execute data mining queries against the data warehouse.

● Supports various query languages or graphical interfaces for defining mining tasks.

2. Data Exploration Tools:

● Enables users to explore data visually and interactively to identify patterns, trends, and anomalies.

● Includes features such as data visualization, clustering, classification, and association rule discovery.

3. Model Building and Evaluation:

● Allows users to build predictive models using machine learning algorithms and evaluate their performance.

● Provides tools for model training, testing, and validation using techniques like cross-validation.

4. Integration with BI Tools:

● Integrates with business intelligence (BI) tools and dashboards to visualize and present data mining results.

● Enables users to incorporate predictive insights into decision-making processes.

Security

Data warehouse security is crucial for protecting sensitive information and ensuring compliance with
regulatory requirements. Here are key security measures:

1. Access Control:

● Implement role-based access control (RBAC) to restrict access to data based on users’ roles and responsibilities.

● Enforce strong authentication mechanisms, such as multi-factor authentication (MFA), to prevent unauthorized access.

2. Data Encryption:

● Encrypt data at rest and in transit to prevent unauthorized access or interception.

● Use encryption techniques such as SSL/TLS for network communication and encryption algorithms for data storage.

3. Auditing and Monitoring:

● Implement auditing and logging mechanisms to track user activities and changes to data.

● Monitor access patterns and detect suspicious behavior to prevent security breaches.

4. Data Masking and Anonymization:

● Mask sensitive data to anonymize personally identifiable information (PII) and protect privacy.

● Replace sensitive data with pseudonymized or randomized values to ensure confidentiality.

5. Compliance and Governance:

● Ensure compliance with regulations such as GDPR, HIPAA, and PCI-DSS by implementing data governance policies and controls.

● Conduct regular security assessments and audits to identify vulnerabilities and ensure adherence to security standards.
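
The data masking and pseudonymization measures listed above can be sketched with Python's standard `hmac` and `hashlib` modules. The key value, the 16-character token length, and both helper functions are assumptions for this example; a production system would load the key from a managed secret store and follow its own masking policy.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; never hard-code real secrets.
SECRET_KEY = b"replace-with-a-managed-secret"

def pseudonymize(value: str) -> str:
    """Replace a value with a stable keyed token (same input gives same token)."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

def mask_email(email: str) -> str:
    """Keep the first character and the domain, hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain

masked = mask_email("alice@example.com")
token = pseudonymize("alice@example.com")
```

Because the token is stable, analysts can still join and count by customer without seeing the underlying identifier.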

Backup and Recovery

Backup and recovery processes are essential for data warehouse reliability and resilience. Here’s
how it’s managed:

1. Regular Backups:

● Schedule regular backups of the data warehouse to ensure data availability in case of data loss or corruption.

● Implement full, incremental, or differential backup strategies based on recovery requirements.

2. Redundant Storage:

● Store backup copies of data in redundant storage locations, such as cloud storage or off-site data centers.

● Ensure data redundancy and fault tolerance to mitigate the risk of data loss due to hardware failures or disasters.

3. Point-in-Time Recovery:

● Maintain transaction logs or incremental backups to facilitate point-in-time recovery to a specific moment in the past.

● Enable rollback or recovery to restore the data warehouse to a consistent state after data corruption or accidental changes.

4. Disaster Recovery Planning:

● Develop and test disaster recovery plans to ensure business continuity in the event of catastrophic failures or natural disasters.

● Establish procedures for failover, data restoration, and system recovery to minimize downtime and data loss.

5. Automated Backup Solutions:

● Use automated backup solutions and backup scheduling tools to streamline backup and recovery processes.

● Monitor backup jobs and receive alerts for any failures or anomalies to ensure timely resolution.
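
A full backup followed by a restore can be sketched with the `Connection.backup` method of Python's built-in `sqlite3` module. SQLite and the in-memory databases here are only stand-ins for a real warehouse and its backup storage, and the `sales` table is hypothetical.

```python
import sqlite3

# Stand-in "warehouse" with a small table (assumption for illustration).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
warehouse.executemany("INSERT INTO sales (amount) VALUES (?)", [(100.0,), (250.0,)])
warehouse.commit()

# Full backup: copy every page of the source database into the target.
backup_copy = sqlite3.connect(":memory:")
warehouse.backup(backup_copy)

# Simulate accidental data loss in the warehouse.
warehouse.execute("DELETE FROM sales")
warehouse.commit()
rows_after_loss = warehouse.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

# Recovery: copy the backup over the damaged database.
backup_copy.backup(warehouse)
restored_total = warehouse.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

In practice the backup target would be a file on redundant storage, and the job would be scheduled and monitored as described in the list above.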

A robust data mining interface facilitates data exploration and predictive analysis, while
comprehensive security measures protect sensitive information and ensure compliance. Backup and
recovery processes ensure data warehouse resilience and availability, safeguarding against data
loss and disruptions. By implementing these measures effectively, organizations can leverage their
data warehouse infrastructure securely and reliably to drive business insights and decision-making.
