Data Warehouse Components
What is a Data Warehouse?
• Loosely speaking, a data warehouse refers to a database that
is maintained separately from an organization’s operational
database
• Officially speaking:
• “A data warehouse is a subject-oriented, integrated, time-
variant, and nonvolatile collection of data in support of
management’s decision-making process.”—W. H. Inmon
Data Warehouse—Subject-Oriented
[Figure: data subjects (Claims, Losses, Premium) and operational applications (Accounting System, Processing System, Billing System)]
Data Warehouse—Integrated
• Constructed by integrating multiple, heterogeneous
data sources
– relational databases, flat files, on-line transaction
records
• Data cleaning and data integration techniques are
applied.
– Ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different
data sources
• E.g., Hotel price: currency, tax, breakfast covered, etc.
– Before data is moved to the warehouse, it is
transformed to a common schema.
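The hotel price example can be sketched in code. This is a minimal illustration, not a prescribed method: the two source layouts, the field names, the `HotelPrice` type, and the exchange rate are all hypothetical. Each source is converted to one common form (price in USD, tax included, breakfast as a boolean) before loading.

```python
from dataclasses import dataclass

# Hypothetical exchange rates to USD; the numbers are illustrative only.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10}


@dataclass
class HotelPrice:
    """Common warehouse form (assumed): USD, tax included, boolean breakfast flag."""
    hotel: str
    price_usd: float
    breakfast_included: bool


def from_source_a(row: dict) -> HotelPrice:
    # Assumed source A: price in EUR with tax included, breakfast coded 'Y'/'N'.
    return HotelPrice(
        hotel=row["hotel_name"].strip(),
        price_usd=round(row["price_eur"] * RATES_TO_USD["EUR"], 2),
        breakfast_included=(row["breakfast"] == "Y"),
    )


def from_source_b(row: dict) -> HotelPrice:
    # Assumed source B: price in USD before tax, separate tax rate, boolean breakfast.
    return HotelPrice(
        hotel=row["name"].strip(),
        price_usd=round(row["net_usd"] * (1 + row["tax_rate"]), 2),
        breakfast_included=row["has_breakfast"],
    )


a = from_source_a({"hotel_name": " Grand ", "price_eur": 100.0, "breakfast": "Y"})
b = from_source_b({"name": "Plaza", "net_usd": 100.0, "tax_rate": 0.1, "has_breakfast": False})
```

After conversion, records from both sources share the same attribute names, currency, and tax treatment, so their prices can be compared directly.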
Integrated
• Data is stored once in a single integrated location
(e.g. insurance company)
[Figure: customer data stored in several databases (Auto Policy Processing System, Fire Policy Processing System, FACTS, LIFE, Commercial, Accounting Applications) is integrated into the Data Warehouse Database under Subject = Customer]
Data Warehouse—Time Variant
[Figure: Time, Data, Key]
Data Warehouse—Nonvolatile
• A physically separate store of data transformed from the
operational environment
• Operational update of data does not occur in the data
warehouse environment
– Does not require transaction processing, recovery,
and concurrency control mechanisms
– Requires only two operations in data accessing:
• initial loading of data and access of data
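The two access operations can be illustrated with a small sketch. The `WarehouseStore` class below is hypothetical and kept in memory only; it exposes loading and reading, and deliberately offers no update or delete method.

```python
class WarehouseStore:
    """Hypothetical in-memory store that supports only load and read."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Loading appends copies of the rows; existing rows are never modified.
        self._rows.extend(dict(r) for r in rows)

    def read(self, predicate=lambda r: True):
        # Reading returns copies, so callers cannot change stored data.
        return [dict(r) for r in self._rows if predicate(r)]


store = WarehouseStore()
store.load([{"policy": 1, "premium": 500}, {"policy": 2, "premium": 750}])

result = store.read(lambda r: r["premium"] > 600)
result[0]["premium"] = 0  # changes only the returned copy, not the store
```

Because nothing is updated in place, the sketch needs no locking or rollback logic, which mirrors the point about transaction processing, recovery, and concurrency control.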
Non-Volatile
• Existing data in the warehouse is not overwritten or
updated.
[Figure: External Sources and Production Databases feed the Data Warehouse Database. Production applications environment: Insert, Update, Delete. Data warehouse environment: Load, Read-Only.]
Data Warehouse vs. Heterogeneous DBMS
• Two-layer
– Real-time + derived data
[Figure: operational systems and informational systems]
Three-layer Architecture: Conceptual View
• Transformation of real-time data to derived
data really requires two steps
[Figure: three data layers between operational systems and informational systems: Real-time data; Reconciled Data (physical implementation of the data warehouse); Derived Data (view level, "particular informational needs")]
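The two steps in the three-layer view can be sketched as two separate functions: one that reconciles real-time records into a common form, and one that derives data for a particular informational need. The record layouts and the names `reconcile` and `derive` are assumptions made for illustration.

```python
# Hypothetical real-time records from two operational systems with different layouts.
realtime = [
    {"src": "billing", "cust": "C1", "amt": "120.50"},
    {"src": "claims", "customer_id": "C1", "amount": 30.0},
    {"src": "billing", "cust": "C2", "amt": "75.00"},
]


def reconcile(record):
    """Step 1: map each source layout to common names and types."""
    if record["src"] == "billing":
        return {"customer": record["cust"], "amount": float(record["amt"])}
    return {"customer": record["customer_id"], "amount": float(record["amount"])}


def derive(reconciled_rows):
    """Step 2: derive a view for one informational need (total per customer)."""
    totals = {}
    for row in reconciled_rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


reconciled = [reconcile(r) for r in realtime]
derived = derive(reconciled)
```

Keeping the steps separate means the reconciled layer can feed several different derived views without repeating the source-specific cleanup.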
Data Warehouse: A Multi-Tiered Architecture
[Figure: Operational DBs and other sources pass through Extract, Transform, Load, Refresh (with Monitor & Integrator and Metadata) into the Data Warehouse, which serves an OLAP Server feeding Analysis, Query, Reports, and Data mining]
Data Marts
• Characteristics include:
– Unlike data warehouses, do not normally contain detailed
operational data.
– May contain certain levels of aggregation
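As a rough sketch of these characteristics, a data mart can be built by aggregating detailed warehouse rows. The sales rows, field names, and the `build_sales_mart` function are hypothetical; the mart keeps only totals per region and month, not the individual transactions.

```python
from collections import defaultdict

# Hypothetical detailed rows as they might sit in the warehouse.
warehouse_sales = [
    {"date": "2024-01-05", "region": "East", "amount": 100},
    {"date": "2024-01-20", "region": "East", "amount": 50},
    {"date": "2024-02-03", "region": "West", "amount": 70},
]


def build_sales_mart(rows):
    """Aggregate detailed rows into totals keyed by (region, month)."""
    totals = defaultdict(int)
    for row in rows:
        month = row["date"][:7]  # "YYYY-MM" prefix of an ISO date string
        totals[(row["region"], month)] += row["amount"]
    return dict(totals)


mart = build_sales_mart(warehouse_sales)
```

The level of aggregation (here, month and region) is a design choice that depends on what the mart's users need to analyze.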