Chapter 3 Data Warehouse & OLAP

Uploaded by

Hemant Kushwaha
Chapter 3
Data Warehouse & OLAP
Introduction to OLAP
• OLAP (Online Analytical Processing) is a category of data processing
that lets users extract and analyze large datasets interactively from
multiple perspectives. It organizes data into multidimensional cubes so
that complex queries can be answered efficiently.
• OLAP is widely used in business intelligence (BI), data mining, and
decision support systems (DSS) to provide insights that support
decision-making.
Characteristics of OLAP (Online Analytical Processing)
1. Multidimensional Data Model
• Data is stored in OLAP cubes with multiple dimensions (e.g., Time,
Geography, Product).
• Allows users to analyze data from various perspectives.
2. Fast Query Performance
• Uses pre-aggregated and indexed data for rapid query responses.
• Reduces processing time for complex analytical queries.
3. Aggregation & Summarization
• Data is pre-aggregated at different levels (e.g., daily, monthly, yearly).
• Enhances efficiency for reporting and analytics.
4. Historical Data Analysis
• Stores large volumes of historical data for trend analysis.
• Essential for forecasting and decision-making.
5. Data Integration
• Combines data from multiple sources (OLTP databases, ERP, CRM, etc.).
• Ensures a unified view of business information.
6. High Scalability
• Designed to handle large datasets efficiently.
• Can scale to accommodate growing business needs.
7. Business Intelligence Support
• Integrates with BI tools for reporting, visualization, and dashboards.
• Helps organizations make data-driven decisions.
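The pre-aggregation described in characteristics 2 and 3 can be illustrated with a minimal Python sketch. The rows in DAILY_SALES and the helper build_aggregates are hypothetical, chosen only for illustration; a real OLAP engine would maintain such summaries internally.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales facts: (day, amount).
DAILY_SALES = [
    (date(2024, 1, 5), 100.0),
    (date(2024, 1, 20), 150.0),
    (date(2024, 2, 3), 200.0),
    (date(2025, 1, 10), 50.0),
]

def build_aggregates(rows):
    """Pre-compute totals at the monthly and yearly levels in one pass."""
    monthly = defaultdict(float)
    yearly = defaultdict(float)
    for day, amount in rows:
        monthly[(day.year, day.month)] += amount
        yearly[day.year] += amount
    return dict(monthly), dict(yearly)

MONTHLY, YEARLY = build_aggregates(DAILY_SALES)

# Queries become dictionary lookups instead of scans over the detail rows.
print(MONTHLY[(2024, 1)])  # 250.0
print(YEARLY[2024])        # 450.0
```

The trade-off is that the summaries must be rebuilt or updated whenever new daily rows arrive.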
OLAP Creation Process
1. Requirement Analysis
📌 Objective: Identify business needs and analytical requirements.
🔹 Define key performance indicators (KPIs) and metrics.
🔹 Identify data sources (OLTP databases, ERP, CRM, etc.).
🔹 Determine the dimensions and measures for analysis.
2. Data Extraction, Transformation, and Loading (ETL)
📌 Objective: Collect, clean, and load data into the data warehouse.
🔹 Extract data from multiple sources (OLTP, CSV files, APIs).
🔹 Transform data (remove duplicates, standardize formats, compute aggregations).
🔹 Load cleaned data into a Data Warehouse.
Example Tools: Informatica, Talend, SSIS (SQL Server Integration Services).
3. Data Warehouse Design
📌 Objective: Organize data efficiently for OLAP analysis.
🔹 Choose a suitable schema:
• Star Schema (simpler, faster queries).
• Snowflake Schema (normalized, less redundancy).
🔹 Create fact tables (e.g., Sales data).
🔹 Define dimension tables (e.g., Time, Region, Product).
Example Tools: Amazon Redshift, Snowflake, Google BigQuery.
4. Deployment & Access Control
📌 Objective: Make OLAP data available to users securely.
🔹 Deploy OLAP cubes to BI platforms.
🔹 Set user roles & permissions (e.g., read-only, admin).
Example BI Tools: Power BI, Tableau, QlikView, Looker.
5. Reporting & Analysis
📌 Objective: Enable business users to analyze and visualize data.
🔹 Use BI tools to create dashboards & reports.
🔹 Perform OLAP operations (Slice, Dice, Drill-down, Pivot).
🔹 Support trend analysis, forecasting, and decision-making.
6. Maintenance & Performance Tuning
📌 Objective: Ensure long-term efficiency and scalability.
🔹 Monitor query performance and optimize indexing.
🔹 Periodically update cubes with new data.
🔹 Apply security patches and system updates.
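Steps 2 and 3 of the process above (ETL and star-schema design) can be sketched with Python's built-in sqlite3 module. This is an illustration under stated assumptions, not a production pipeline: the rows in RAW_ROWS, the table names (dim_region, dim_product, fact_sales), and the helper functions are all hypothetical, and the Time dimension is simplified to a date column on the fact table.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from an OLTP export:
# (date, region, product, amount). One row is an exact duplicate.
RAW_ROWS = [
    ("2024-01-05", "north", "Laptop", "1200.50"),
    ("2024-01-05", "north", "Laptop", "1200.50"),  # duplicate
    ("2024-02-11", "South", "Phone", "650.00"),
    ("2024-02-12", "North", "Phone", "700.00"),
]

def transform(rows):
    """Remove duplicates, normalize region names, and cast amounts to float."""
    seen = set()
    clean = []
    for day, region, product, amount in rows:
        record = (day, region.strip().title(), product.strip(), float(amount))
        if record not in seen:
            seen.add(record)
            clean.append(record)
    return clean

def load(conn, rows):
    """Create a small star schema and load the cleaned rows into it."""
    conn.executescript("""
        CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE fact_sales (
            sale_date  TEXT,
            region_id  INTEGER REFERENCES dim_region(region_id),
            product_id INTEGER REFERENCES dim_product(product_id),
            amount     REAL
        );
    """)
    for day, region, product, amount in rows:
        # Dimension members are inserted once; repeats are ignored.
        conn.execute("INSERT OR IGNORE INTO dim_region (name) VALUES (?)", (region,))
        conn.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (product,))
        # The fact row stores the dimension keys, not the names.
        conn.execute(
            """INSERT INTO fact_sales
               SELECT ?, r.region_id, p.product_id, ?
               FROM dim_region r, dim_product p
               WHERE r.name = ? AND p.name = ?""",
            (day, amount, region, product),
        )
    conn.commit()

def sales_by_region(conn):
    """Join the fact table to a dimension table and aggregate."""
    query = """
        SELECT r.name, SUM(f.amount)
        FROM fact_sales f JOIN dim_region r USING (region_id)
        GROUP BY r.name ORDER BY r.name
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
load(conn, transform(RAW_ROWS))
print(sales_by_region(conn))
```

The fact table holds only keys and measures, while descriptive names live in the dimension tables, which is the defining feature of a star schema.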
OLAP Operations
• OLAP operations are used in data warehouses to analyze multidimensional
data efficiently. They let users interact with data cubes to extract
meaningful insights. The main OLAP operations are:
1. Roll-up (Aggregation)
• Increases the level of aggregation by moving up a dimension hierarchy.
• Example: Summarizing sales data from daily to monthly, or from regional to national.
2. Drill-down (Disaggregation)
• The opposite of roll-up; moves down the hierarchy to view more detailed data.
• Example: Breaking down yearly sales into quarterly or monthly data.
3. Slice
• Fixes a single value on one dimension to create a new sub-cube.
• Example: Analyzing sales data for only a specific year (e.g., 2024).
4. Dice
• Similar to slicing, but selects values on two or more dimensions to
create a more refined sub-cube.
• Example: Filtering sales data for a specific year and region.
5. Pivot (Rotation)
• Reorients the data cube to view it from different perspectives.
• Example: Swapping rows and columns in a report.
6. Drill-through
• Lets users reach the transactional data behind an OLAP cube cell for
further analysis.
• Example: Clicking on a summary report to view detailed invoices.
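The six operations above can be sketched on a tiny in-memory fact set. Everything here is an assumption made for illustration: the rows in FACTS and the helpers aggregate, select, and pivot are hypothetical names, and select only supports equality filters, whereas a real dice may also pick several values per dimension.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales).
FACTS = [
    (2023, "Q4", "North", "Laptop", 90),
    (2024, "Q1", "North", "Laptop", 100),
    (2024, "Q1", "South", "Phone", 80),
    (2024, "Q2", "North", "Phone", 60),
    (2024, "Q2", "South", "Laptop", 120),
]
DIMS = ("year", "quarter", "region", "product")

def aggregate(rows, by):
    """Sum sales grouped by the named dimensions.
    Fewer or coarser dimensions is a roll-up; more is a drill-down."""
    idx = [DIMS.index(d) for d in by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

def select(rows, **fixed):
    """Keep rows matching every dimension=value pair.
    One pair is a slice; two or more pairs is a dice."""
    return [r for r in rows
            if all(r[DIMS.index(d)] == v for d, v in fixed.items())]

def pivot(totals):
    """Swap the two key positions of a two-dimensional result (rotation)."""
    return {(b, a): v for (a, b), v in totals.items()}

by_year = aggregate(FACTS, by=["year"])                  # roll-up
by_quarter = aggregate(FACTS, by=["year", "quarter"])    # drill-down
slice_2024 = select(FACTS, year=2024)                    # slice
dice = select(FACTS, year=2024, region="North")          # dice
region_by_product = aggregate(slice_2024, by=["region", "product"])
product_by_region = pivot(region_by_product)             # pivot
# Drill-through: return the detail rows behind the (2024, Q1) summary cell.
drill_through = select(FACTS, year=2024, quarter="Q1")

print(by_year)     # {(2023,): 90, (2024,): 360}
print(by_quarter)  # {(2023, 'Q4'): 90, (2024, 'Q1'): 180, (2024, 'Q2'): 180}
```

Roll-up and drill-down share one function because they differ only in how many hierarchy levels appear in the grouping key.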
Advantages of OLAP
1. Fast Data Retrieval
• OLAP is optimized for querying and analyzing large datasets quickly.
• Pre-aggregated data and indexing techniques improve response times.
2. Multidimensional Analysis
• Allows users to analyze data across multiple dimensions (e.g., time, region,
product).
• Provides a more intuitive and structured way to explore data relationships.
3. Enhanced Decision-Making
• Helps businesses make data-driven decisions by providing in-depth insights.
• Supports trend analysis, forecasting, and strategic planning.
4. Interactive & Flexible Analysis
• Users can perform operations like drill-down, roll-up, slice, dice, and pivot for
detailed insights.
• Enables dynamic reporting and ad hoc querying.
5. Data Integration
• Consolidates data from multiple sources into a single, unified view.
• Ensures consistency and accuracy in reporting.
6. Reduced Workload on Transactional Databases
• OLAP cubes store pre-aggregated data separately from operational databases.
• Reduces the load on transactional systems, improving overall performance.
Multidimensional Data
• Multidimensional data refers to data organized in multiple dimensions, allowing
complex analysis and insights. It is commonly used in OLAP (Online Analytical
Processing) to analyze business metrics across different perspectives.
Key Concepts of Multidimensional Data
1. Dimension
• Represents a perspective for analysis (e.g., Time, Product, Region).
• Each dimension has a hierarchy (e.g., Year → Quarter → Month → Day).
2. Fact
• The measurable data or key business metric (e.g., Sales, Revenue, Profit).
• Stored in fact tables, linked to dimensions.
3. Hierarchy
• Represents levels within a dimension (e.g., City → State → Country).
4. Data Cube
• A multidimensional structure that stores aggregated data for analysis.
• Example: A sales cube with dimensions Time, Product, and Region.
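The four concepts above can be tied together in a short sketch: facts carry a measure, a hierarchy maps members to coarser levels, and a cube is keyed by one member per dimension. The city and state mappings, the SalesFact class, and the helpers region_member and build_cube are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Hypothetical Region hierarchy: City -> State -> Country.
CITY_TO_STATE = {"Pune": "Maharashtra", "Mumbai": "Maharashtra", "Jaipur": "Rajasthan"}
STATE_TO_COUNTRY = {"Maharashtra": "India", "Rajasthan": "India"}

@dataclass(frozen=True)
class SalesFact:
    """One fact row: a member of each dimension plus a numeric measure."""
    month: str
    product: str
    city: str
    sales: float

FACTS = [
    SalesFact("2024-01", "Laptop", "Pune", 100.0),
    SalesFact("2024-01", "Laptop", "Mumbai", 50.0),
    SalesFact("2024-01", "Phone", "Jaipur", 30.0),
    SalesFact("2024-02", "Laptop", "Jaipur", 70.0),
]

def region_member(city, level):
    """Map a city to its member at the requested hierarchy level."""
    if level == "city":
        return city
    state = CITY_TO_STATE[city]
    if level == "state":
        return state
    if level == "country":
        return STATE_TO_COUNTRY[state]
    raise ValueError(f"unknown level: {level}")

def build_cube(facts, region_level):
    """Aggregate sales into cells keyed by (Time, Product, Region)."""
    cube = {}
    for fact in facts:
        key = (fact.month, fact.product, region_member(fact.city, region_level))
        cube[key] = cube.get(key, 0.0) + fact.sales
    return cube

state_cube = build_cube(FACTS, "state")
country_cube = build_cube(FACTS, "country")
print(state_cube[("2024-01", "Laptop", "Maharashtra")])  # 150.0
```

Changing region_level from "state" to "country" is a roll-up along the Region hierarchy; the fact rows themselves are unchanged.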
OLAP Architectures
1. ROLAP (Relational OLAP)
🔹 Stores data in relational databases (RDBMS) and performs OLAP operations
using SQL queries.
🔹 Uses indexing and aggregation to optimize performance.
✅ Advantages:
✔ Handles large datasets efficiently.
✔ No need for pre-aggregated data, making it flexible.
✔ Supports dynamic and complex queries.
❌ Disadvantages:
✖ Slower query performance compared to MOLAP due to on-the-fly calculations.
✖ Heavy reliance on SQL queries, which may require optimization.
💡 Best for: Organizations dealing with massive datasets that require flexibility.
2. MOLAP (Multidimensional OLAP)
🔹 Uses a multidimensional data cube to store pre-aggregated data for fast
access.
🔹 Data is structured in an optimized format for OLAP queries.
✅ Advantages:
✔ Extremely fast query performance due to precomputed aggregates.
✔ Efficient for complex calculations.
✔ Sparse cube data can be compressed, which limits storage overhead.
❌ Disadvantages:
✖ Storage grows quickly as dimensions and aggregation levels are added.
✖ Data loading can be slow due to pre-aggregation.
💡 Best for: Fast, pre-defined analytics on structured data with predictable
queries.
3. HOLAP (Hybrid OLAP)
🔹 Combines the best of ROLAP and MOLAP by storing detailed data in relational
databases (ROLAP) and aggregations in multidimensional cubes (MOLAP).
✅ Advantages:
✔ Balances speed and flexibility.
✔ Uses MOLAP for fast access to precomputed data.
✔ Uses ROLAP for detailed, on-the-fly analysis.
❌ Disadvantages:
✖ More complex architecture requiring both relational and cube-based storage.
✖ May need extra configuration for optimal performance.
💡 Best for: Organizations needing both quick summaries and deep drill-downs
into large datasets.
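The split between the three architectures can be sketched in a toy hybrid store: detail rows stay in a relational table and are queried with SQL on the fly (the ROLAP side), while totals are computed once at load time and served from memory (the MOLAP side). The class HybridStore, its methods, and the rows in DETAIL_ROWS are hypothetical; real HOLAP servers are considerably more involved.

```python
import sqlite3

# Hypothetical invoice-level rows: (invoice, year, region, amount).
DETAIL_ROWS = [
    ("INV-1", 2024, "North", 100.0),
    ("INV-2", 2024, "North", 40.0),
    ("INV-3", 2024, "South", 75.0),
    ("INV-4", 2025, "North", 20.0),
]

class HybridStore:
    """Toy HOLAP store: relational detail plus an in-memory summary."""

    def __init__(self, rows):
        # ROLAP side: detail rows stay in a relational table.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE sales (invoice TEXT, year INTEGER, region TEXT, amount REAL)"
        )
        self.conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
        # MOLAP side: totals per (year, region) are computed once at load time.
        self.summary = {
            (year, region): total
            for year, region, total in self.conn.execute(
                "SELECT year, region, SUM(amount) FROM sales GROUP BY year, region"
            )
        }

    def total(self, year, region):
        """Answer a summary query from the precomputed aggregates."""
        return self.summary.get((year, region), 0.0)

    def invoices(self, year, region):
        """Answer a detail query with SQL evaluated on the fly."""
        cursor = self.conn.execute(
            "SELECT invoice, amount FROM sales "
            "WHERE year = ? AND region = ? ORDER BY invoice",
            (year, region),
        )
        return cursor.fetchall()

store = HybridStore(DETAIL_ROWS)
print(store.total(2024, "North"))     # 140.0
print(store.invoices(2024, "North"))  # [('INV-1', 100.0), ('INV-2', 40.0)]
```

Using only total corresponds to a pure MOLAP workload and using only invoices to a pure ROLAP one; the extra cost of the hybrid is keeping the summary consistent with the detail table.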
Data Warehouse vs. OLAP

Feature      | Data Warehouse                               | OLAP
-------------|----------------------------------------------|-------------------------------------------------------------
Purpose      | Stores large amounts of structured data      | Analyzes data efficiently
Data Storage | Relational databases (Star/Snowflake schema) | Multidimensional cubes (MOLAP) or relational storage (ROLAP)
Processing   | Batch processing (ETL)                       | Interactive querying
Performance  | Optimized for storage                        | Optimized for fast queries
Users        | Data engineers, IT                           | Business analysts, decision-makers
Hypercube vs. Multi-Cube

Feature     | Hypercube                                    | Multi-Cube
------------|----------------------------------------------|-----------------------------------------------
Structure   | Single, large multidimensional cube          | Multiple smaller OLAP cubes
Complexity  | High (handles many dimensions)               | Moderate (simpler, focused cubes)
Performance | Can be slow for large datasets               | Faster, as queries run on smaller cubes
Flexibility | Less flexible, tightly integrated            | More flexible, modular analysis
Use Case    | Enterprise-wide analytics with many variables | Separate business areas with focused analysis
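One way to see why smaller cubes can be cheaper is to count potential cells, which is the product of the dimension sizes. The sketch below uses hypothetical cardinalities and treats every cube as dense; real cubes are usually sparse, so these figures are upper bounds rather than actual storage sizes.

```python
from math import prod

# Hypothetical number of members in each dimension.
CARDINALITY = {"day": 365, "product": 1000, "store": 50, "employee": 200}

def cell_count(dimensions):
    """Potential cells in a dense cube: the product of dimension sizes."""
    return prod(CARDINALITY[d] for d in dimensions)

# Hypercube: one cube over all four dimensions.
hypercube_cells = cell_count(["day", "product", "store", "employee"])

# Multi-cube: two focused cubes, each over three dimensions.
sales_cells = cell_count(["day", "product", "store"])
staffing_cells = cell_count(["day", "store", "employee"])

print(hypercube_cells)               # 3650000000
print(sales_cells + staffing_cells)  # 21900000
```

The multi-cube design pays for its smaller size by being unable to answer questions that need product and employee together in one query.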
