DWH 03

The document provides an overview of data warehousing, defining it as a subject-oriented, integrated, nonvolatile, and time-variant collection of data that supports management decisions. It discusses key features of data warehouses, including data separation, integration, time-variance, nonvolatility, and granularity, along with their advantages for business intelligence and reporting. Additionally, it contrasts data warehouses with application-oriented databases and introduces the concept of data marts as focused subsets of data warehouses.

Uploaded by

idreesmujaddidy

Data Warehousing

By
Sayed Mortaza Kazemi
Overview
Data Warehouse Features
• Separate
• Available (No reset)
• Integrated
• Time stamped
• Subject oriented
• Nonvolatile (No data deletion)
• Accessible (Summary, Raw and Historical)
Data Warehouse
• Bill Inmon, regarded as the father of data warehousing, defines a data warehouse as:
• A data warehouse is a subject-oriented, integrated, nonvolatile, and time-variant collection of data in support of management's decisions.
Data Warehouse
• According to Sean Kelly, the data in a data warehouse is:
• Separate
• Available
• Integrated
• Time stamped
• Subject oriented
• Nonvolatile
• Accessible
Definition of Separate Data
• In a data warehouse, data is stored separately from operational databases.
• This ensures that analytical queries do not interfere with real-time business
transactions.
• Separation allows for optimized storage, better performance, and specialized
data processing.
Advantages of Keeping Data Separate in a Data Warehouse
✅ Faster Query Performance – Separating analytical workloads from operational
data avoids performance bottlenecks.
✅ Better Data Governance – Historical and structured data is preserved without
impacting daily business operations.
✅ Enhanced Reporting & BI – Businesses can generate accurate reports without
affecting live databases.
Data Warehouse Features
Subject Oriented Data

• In operational systems, each data set contains the data needed for a particular application.
• For example, in a bank, checking accounts, savings accounts, ATM usage, credit cards, bank loans, and van leasing all relate to the customer. The bank may also run separate HR and accounting systems. None of these systems provides a single view of the customer.
• When an application crosses lines of business and becomes subject centric, it is called subject oriented.
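A minimal Python sketch of the single customer view, assuming three hypothetical application data sets that share a `customer_id` key. The record layouts and the `build_customer_view` helper are illustrative assumptions, not part of any real banking system.

```python
from collections import defaultdict

# Hypothetical application-oriented data sets: each system knows only its own product.
checking_accounts = [
    {"customer_id": 1, "balance": 1200.0},
    {"customer_id": 2, "balance": 300.0},
]
savings_accounts = [
    {"customer_id": 1, "balance": 5000.0},
]
credit_cards = [
    {"customer_id": 2, "outstanding": 450.0},
]


def build_customer_view(checking, savings, cards):
    """Group records from every application under the customer they belong to."""
    view = defaultdict(lambda: {"checking": 0.0, "savings": 0.0, "card_debt": 0.0})
    for row in checking:
        view[row["customer_id"]]["checking"] += row["balance"]
    for row in savings:
        view[row["customer_id"]]["savings"] += row["balance"]
    for row in cards:
        view[row["customer_id"]]["card_debt"] += row["outstanding"]
    return dict(view)


customer_view = build_customer_view(checking_accounts, savings_accounts, credit_cards)
for customer_id, summary in sorted(customer_view.items()):
    print(customer_id, summary)
```

Each application still holds only its own slice of the data. The subject-oriented view is what lets an analyst ask a question about the customer as a whole.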
Subject-Oriented Data (Data Warehouse Approach)
• Data is organized around key business subjects such as sales, finance,
marketing, healthcare, or customer behavior rather than specific
applications.
• The goal is to support decision-making by providing a holistic view of
business trends and analytics.
• 📌 Characteristics:
✅ Focuses on high-level business areas (not individual transactions).
✅ Data is collected from multiple sources and integrated for analytical
processing.
✅ Provides historical and summarized views rather than operational data.
✅ Used in Online Analytical Processing (OLAP) for insights and reporting.
Example:
• A retail company's data warehouse stores total monthly sales per
region, customer purchase patterns, and seasonal trends instead of each
individual transaction.
• A healthcare data warehouse analyzes patient history, disease trends,
and hospital performance instead of real-time appointments.
Application-Oriented Data (Database Approach)
• 📌 Definition:
• Data is organized based on the needs of a specific application or process
such as banking transactions, inventory management, or e-commerce
order processing.
• The goal is to support real-time operations with fast read/write access.
Characteristics:
✅ Designed for transaction processing rather than historical analysis.
✅ Data is application-specific and optimized for quick retrieval.
✅ Used in Online Transaction Processing (OLTP) systems.
✅ Typically normalized to reduce redundancy and maintain consistency.
• 📌 Example:
• A banking database stores real-time account balances, deposits,
withdrawals, and transactions.
• An e-commerce database tracks order details, product availability, and
customer addresses in real-time.
Key Differences: Subject-Oriented vs. Application-Oriented Data

| Feature | Subject-Oriented (Data Warehouse) | Application-Oriented (Database) |
| --- | --- | --- |
| Focus | Business subjects (e.g., Sales, Customer Analysis) | Operational processes (e.g., Order Processing, Payroll) |
| Purpose | Analytics, decision-making, and trend analysis | Transaction processing and real-time operations |
| Data Structure | Denormalized for faster querying | Normalized to reduce redundancy |
| Time Perspective | Stores historical data for long-term trends | Stores current data for immediate transactions |
| Data Sources | Integrated from multiple sources (ERP, CRM, financial systems) | Comes from a single application or operational system |
| Example | A retail data warehouse analyzing customer spending patterns | A retail database tracking daily orders and inventory updates |
Data Warehouse Features
Integrated Data

• In a data warehouse, data is populated from various data sources.
• These sources are typically heterogeneous, so the incoming data varies from source to source.
Definition of Integrated Data
• Data from multiple sources (e.g., sales, customer service, finance) is
combined into a unified format.
• Ensures consistency, accuracy, and completeness across the organization.
• Uses ETL (Extract, Transform, Load) processes to merge, clean, and
standardize data.
• 📌 Example:
A bank integrates customer data from different branches to create a
centralized profile for each client.
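The extract, transform, and load steps can be sketched in Python as follows. The two branch record layouts, the date formats, and the function names are assumptions chosen for illustration. A real pipeline would read from the actual source systems.

```python
from datetime import datetime

# Extract: hypothetical records from two branches with different conventions.
branch_a = [{"id": "C001", "name": "Ali Khan ", "joined": "2021-03-15"}]
branch_b = [
    {"cust_no": "c001", "full_name": "ali khan", "join_date": "15/03/2021"},
    {"cust_no": "C002", "full_name": "Sara Noor", "join_date": "01/07/2022"},
]


def transform_a(row):
    """Map a branch A record to the unified format."""
    return {
        "customer_id": row["id"].upper(),
        "name": row["name"].strip().title(),
        "joined": datetime.strptime(row["joined"], "%Y-%m-%d").date(),
    }


def transform_b(row):
    """Map a branch B record to the unified format."""
    return {
        "customer_id": row["cust_no"].upper(),
        "name": row["full_name"].strip().title(),
        "joined": datetime.strptime(row["join_date"], "%d/%m/%Y").date(),
    }


def load(rows):
    """Load into a dict keyed by customer_id; the first record seen for an id wins."""
    warehouse = {}
    for row in rows:
        warehouse.setdefault(row["customer_id"], row)
    return warehouse


unified = [transform_a(r) for r in branch_a] + [transform_b(r) for r in branch_b]
customer_profiles = load(unified)
print(customer_profiles)
```

After the transform step, both branches use the same key, name format, and date type, so the duplicate record for customer `C001` collapses into one profile.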
Why Integration Matters in Data Warehouses
✅ Removes Redundancy – Eliminates duplicate records.
✅ Improves Data Consistency – Standardized formats prevent mismatches.
✅ Enhances Decision-Making – Unified data allows for better business
insights.
• 📌 Example:
A retail company integrates data from:
• Point-of-sale systems (transaction details).
• Inventory databases (stock levels).
• Customer relationship management (CRM) software (customer
preferences).
This allows accurate demand forecasting and customer trend analysis.
Data Warehouse Features
Time-Variant Data

• A data warehouse contains historical data as well as current values. Every data structure in the data warehouse contains a time element.
Time-Variant Data
• A data warehouse stores historical data that allows tracking of changes
over time.
• Data includes timestamps to show trends and comparisons.
• Useful for trend analysis, forecasting, and business intelligence.
• Example:
A supermarket records monthly sales trends over five years to analyze
customer buying patterns.
Data Warehouse Features
Time-Variant Data

• Example
• To study the buying patterns of customers, a user needs data not only about current purchases but about past purchases as well.
• To find the reason for a drop in sales in a region, the owner of an organization needs all the sales data for that region over a period extending back in time.
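To make the second example concrete, here is a small Python sketch that looks back over several months of sales for one region and flags the months in which sales fell. The figures and the `monthly_change` helper are invented for illustration.

```python
# Hypothetical time-stamped sales totals for one region: (year-month, units sold).
region_sales = [
    ("2024-01", 120),
    ("2024-02", 130),
    ("2024-03", 90),
    ("2024-04", 95),
]


def monthly_change(history):
    """Return (month, change versus the previous month) for each month after the first."""
    changes = []
    for (_, prev_units), (month, units) in zip(history, history[1:]):
        changes.append((month, units - prev_units))
    return changes


changes = monthly_change(region_sales)
drops = [month for month, delta in changes if delta < 0]
print(changes)
print(drops)
```

With only the current month available, the comparison would have nothing to compare against. The stored history is what makes the drop visible.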
Data Warehouse Features
Time-Variant Data

• The time-variant nature of the data in a data warehouse:
• Allows analysis of the past
• Relates information to the present
• Enables forecasts for the future
Importance of Time-Stamped Data in Data Warehousing
✅ Historical Analysis – Helps analyze past trends and patterns.
✅ Business Intelligence & Reporting – Enables reports for daily, monthly, or
yearly comparisons.
✅ Time-Based Decision-Making – Useful for seasonal demand forecasting.
📌 Example:
A telecom company uses time-stamped data to analyze peak call hours for
optimizing network traffic.
Types of Time-Stamped Data in Data Warehouses
1. Transaction Time – The time when a transaction occurs.
2. Valid Time – The time when the data is valid in the real world.
3. Loading Time – When data is added to the warehouse.
📌 Example:
A banking system tracks when a loan application was submitted,
processed, and approved.
A hospital tracks patient admission, discharge, and treatment times.
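One way to picture the three kinds of time is a record that carries all of them. The Python sketch below assumes a hypothetical `LoanRecord` with invented field names and dates. Valid time is modeled here as a single `valid_from` date, which is a simplification.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class LoanRecord:
    loan_id: str
    transaction_time: datetime  # when the transaction occurred in the source system
    valid_from: date            # when the fact starts to hold in the real world
    loaded_at: datetime         # when the row was added to the warehouse


record = LoanRecord(
    loan_id="L-100",
    transaction_time=datetime(2025, 3, 20, 10, 15),
    valid_from=date(2025, 4, 1),
    loaded_at=datetime(2025, 3, 21, 2, 0),
)

# The load lag is the delay between the source transaction and the warehouse load.
load_lag = record.loaded_at - record.transaction_time
print(load_lag)
```

Keeping the three values separate lets an analyst tell when something happened, when it took effect, and when the warehouse first knew about it.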
Definition of Time-Stamped Data
• Data in a data warehouse is always time-referenced.
• Each record is associated with a specific time or date.
• Helps in trend analysis, forecasting, and historical comparisons.
• 📌 Example:
A sales database records each transaction with a timestamp (e.g., "March
20, 2025, at 10:15 AM").
Data Warehouse Features
Nonvolatile Data

• Once data is loaded into the data warehouse, it is neither changed nor removed. As an exception, false or incorrect data that was inserted into the data warehouse by mistake is removed.
• In an operational system, data can be added, updated, or deleted, but data in the data warehouse is usually neither updated nor deleted.
• Once the data is captured in the data warehouse, you cannot run transactions to change it.
Data Warehouse Features
Data Granularity

• In an operational system, data is kept at the lowest level of detail, and summarized data is usually not stored.
• In a data warehouse, summarized data is usually stored.
• Data granularity in a data warehouse refers to the level of detail.
• In a data warehouse, the analysis of data starts at a high level and moves down to lower levels of detail.
Nonvolatile Data
• Nonvolatility means data is not deleted or modified, only added.
• Ensures data consistency and reliability for business intelligence (BI).
• Helps in historical trend analysis and reporting.
• Example:
A university stores student admission records permanently to analyze
long-term enrollment trends.
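The "only added" rule can be illustrated with a toy append-only store in Python. The `AppendOnlyStore` class and the admission records are hypothetical. Real warehouses enforce this through load procedures and permissions rather than a class like this one.

```python
class AppendOnlyStore:
    """A toy nonvolatile store: rows can be added and read, never updated or deleted."""

    def __init__(self):
        self._rows = []

    def append(self, row):
        # Store a copy so later changes to the caller's dict do not alter history.
        self._rows.append(dict(row))

    def rows(self):
        # Return copies so readers cannot modify the stored rows either.
        return [dict(row) for row in self._rows]


admissions = AppendOnlyStore()
admissions.append({"student_id": 1, "year": 2023, "program": "CS"})
admissions.append({"student_id": 2, "year": 2024, "program": "Math"})

snapshot = admissions.rows()
snapshot[0]["program"] = "changed"  # affects only the copy, not the store

print(admissions.rows())
```

The class exposes no update or delete method, so the stored history grows but does not change. Python only protects `_rows` by naming convention.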
Definition of Data Granularity
• Granularity refers to the level of detail or summarization in stored data.
• Fine-grained data = high detail, large volume.
• Coarse-grained data = less detail, smaller volume.
• 📌 Example:
• A bank records each transaction (fine-grained).
• A monthly financial report summarizes transactions (coarse-grained).
Types of Data Granularity
• 1. Fine-Grained Data (Highly Detailed)
• Definition: Stores individual transaction records.
• Pros:
• Provides detailed insights.
• Enables customized analysis.
• Cons:
• Requires large storage.
• Slow query performance for large datasets.
• 📌 Example:
• A supermarket stores every product scan at checkout.
• Useful for inventory tracking and customer preferences.
Definition of Medium-Grained Data
• Medium-grained data is a balance between fine-grained and coarse-grained data.
• It aggregates some details while still retaining enough specificity for
analysis.
• Typically used for daily or weekly summaries rather than individual
transactions or high-level summaries.
• 📌 Example:
• A retail store tracks daily sales per product category rather than each individual sale (fine-grained) or just total monthly revenue (coarse-grained).
Coarse-Grained Data (Summarized Information)
• Definition: Stores aggregated or summarized data.
• Pros:
• Faster query performance.
• Requires less storage.
• Cons:
• Loses individual details.
• Limits deep-dive analysis.
• 📌 Example:
• A telecom company aggregates daily call records into monthly usage
summaries.
• Helps in trend analysis without excessive storage use.
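The three granularity levels can be compared side by side with a short Python sketch. The sales rows are invented, and the choice of daily totals per category for medium and monthly totals for coarse is an assumption that follows the examples above.

```python
from collections import defaultdict

# Fine-grained: one hypothetical row per sale (date, category, amount).
sales = [
    ("2025-03-01", "dairy", 4.0),
    ("2025-03-01", "dairy", 6.0),
    ("2025-03-01", "bakery", 3.0),
    ("2025-03-02", "dairy", 5.0),
    ("2025-04-01", "bakery", 7.0),
]

# Medium-grained: daily totals per category.
daily = defaultdict(float)
for day, category, amount in sales:
    daily[(day, category)] += amount

# Coarse-grained: monthly totals across all categories.
monthly = defaultdict(float)
for day, _category, amount in sales:
    monthly[day[:7]] += amount  # "YYYY-MM" prefix of the ISO date

print(len(sales), len(daily), len(monthly))
print(dict(daily))
print(dict(monthly))
```

The row count shrinks at each level (5, then 4, then 2 in this sample), which is the storage saving. The individual sale amounts cannot be recovered from the monthly totals, which is the detail that is lost.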
Choosing the Right Granularity

| Granularity Level | When to Use | Example |
| --- | --- | --- |
| Fine-Grained | Need high detail | Customer transaction logs |
| Medium-Grained | Need summary + some detail | Daily sales reports |
| Coarse-Grained | Need high-level summary | Quarterly revenue trends |

📌 Example:
A healthcare provider stores daily patient visits (fine) but analyzes annual disease trends (coarse).
Impact of Granularity on Data Warehouses
• ✅ Storage Requirements:
• Fine-grained needs more space, while coarse-grained is storage-efficient.
• ✅ Query Performance:
• Fine-grained queries take longer, while coarse-grained queries are faster.
• ✅ Decision-Making:
• Fine-grained supports detailed decision-making.
• Coarse-grained helps in high-level strategic planning.
• 📌 Example:
A banking data warehouse stores every transaction (fine) but generates
monthly reports (coarse) for executives.
Data Warehouse Features
Data Granularity

Example
• The first step is to analyze the total unit sales of a product across an entire state.
• The next level of analysis breaks the state down by region and examines the unit sales of individual stores.
• In a data warehouse, it is efficient to keep summarized data at different levels.
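The drill-down described above can be sketched in Python by summarizing the same rows at several levels. The state, region, and store hierarchy, the sample figures, and the `summarize` helper are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical unit sales at the lowest level: (state, region, store, units).
store_sales = [
    ("Texas", "North", "Store 1", 40),
    ("Texas", "North", "Store 2", 25),
    ("Texas", "South", "Store 3", 10),
]


def summarize(rows, depth):
    """Total units grouped by the first `depth` columns (1=state, 2=region, 3=store)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[:depth]] += row[-1]
    return dict(totals)


by_state = summarize(store_sales, 1)   # highest level
by_region = summarize(store_sales, 2)  # one level down
by_store = summarize(store_sales, 3)   # lowest level of detail

print(by_state)
print(by_region)
print(by_store)
```

Keeping each summary precomputed means a query at the state level does not have to scan every store row.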
Data Warehouse Features
Data Granularity

• Data granularity in a data warehouse refers to the level of detail. The lower the level of detail, the finer the data granularity.
• If a data warehouse stores data at its lowest level of detail, it holds a large amount of data.
• You have to decide on the granularity levels based on the data granularity types and the expected system performance for queries.
What is a Data Mart?
• Definition:
• A Data Mart is a subset of a Data Warehouse, typically focused on a single subject
area or department (e.g., sales, finance, marketing).
• Key Characteristics:
• Smaller in scope compared to Data Warehouse.
• Optimized for querying specific business needs.
• Easier and quicker access to relevant data.
Data Mart vs. Data Warehouse
• Data Warehouse:
• Large, centralized repository.
• Contains data from multiple subject areas.
• Complex structure for enterprise-wide data storage.
• Data Mart:
• A focused version of the Data Warehouse.
• Dedicated to specific departments or business functions.
• Easier to deploy and maintain.
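The subset relationship can be shown with a short Python sketch. The `warehouse` rows, the `subject` field, and the `build_data_mart` helper are hypothetical. In practice a data mart is usually built with its own schema and load process rather than a simple filter.

```python
# A hypothetical warehouse holding rows from several subject areas.
warehouse = [
    {"subject": "sales", "region": "East", "amount": 100},
    {"subject": "sales", "region": "West", "amount": 150},
    {"subject": "finance", "account": "payroll", "amount": 900},
    {"subject": "marketing", "campaign": "spring", "amount": 50},
]


def build_data_mart(rows, subject):
    """Return only the rows for one subject area, copied so the mart is independent."""
    return [dict(row) for row in rows if row["subject"] == subject]


sales_mart = build_data_mart(warehouse, "sales")
print(len(warehouse), len(sales_mart))
print(sum(row["amount"] for row in sales_mart))
```

The sales department queries two rows instead of four here. At real scale, that narrower scope is what makes a data mart quicker to query and easier to maintain.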
Thank You
