Data Warehousing Building Blocks
Definition
• Bill Inmon, considered to be the father of data warehousing, provides the following definition:
  • “A data warehouse is a subject-oriented, integrated, nonvolatile, and time-variant collection of data in support of management’s decisions.”
• Defining features are:
  • Subject oriented
  • Integrated
  • Nonvolatile
  • Time variant
  • Data granularity
Data Warehouse—Subject-Oriented
3 September 1,2012
Subject Oriented
Data Warehouse—Integrated
• Constructed by integrating multiple, heterogeneous data sources
  • Relational databases, flat files, on-line transaction records
• Data cleaning and data integration techniques are applied.
  • Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
  • E.g., hotel price: currency, tax, breakfast covered, etc.
• When data is moved to the warehouse, it is converted.
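
The conversion step can be sketched in Python as follows. This is a minimal illustration, assuming two hypothetical source systems that record hotel prices differently (one in USD excluding tax, one in EUR including tax). The field names, the exchange rate, and the tax rate are invented for the example and do not come from any real system.

```python
from dataclasses import dataclass

# Assumed illustrative constants; real values would come from reference data.
EUR_TO_USD = 1.25
TAX_RATE = 0.10


@dataclass(frozen=True)
class WarehousePrice:
    """Assumed warehouse convention: price in USD, tax included."""
    hotel: str
    usd_incl_tax: float
    breakfast_included: bool


def from_source_a(row: dict) -> WarehousePrice:
    # Hypothetical source A: USD, tax excluded, breakfast flag as "Y"/"N".
    return WarehousePrice(
        hotel=row["hotel_name"].strip().title(),
        usd_incl_tax=round(row["price_usd"] * (1 + TAX_RATE), 2),
        breakfast_included=row["bkfst"] == "Y",
    )


def from_source_b(row: dict) -> WarehousePrice:
    # Hypothetical source B: EUR, tax included, breakfast flag as a boolean.
    return WarehousePrice(
        hotel=row["name"].strip().title(),
        usd_incl_tax=round(row["price_eur"] * EUR_TO_USD, 2),
        breakfast_included=bool(row["breakfast"]),
    )


# The same offer, described in two different source conventions.
rows = [
    from_source_a({"hotel_name": "grand plaza ", "price_usd": 100.0, "bkfst": "Y"}),
    from_source_b({"name": "Grand Plaza", "price_eur": 88.0, "breakfast": True}),
]
```

After conversion both rows follow one naming, currency, and tax convention, so they can be compared directly in the warehouse.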
Integrated Data
Data Warehouse—Nonvolatile
• A physically separate store of data transformed from the operational environment
• Operational update of data does not occur in the data warehouse environment
  • Does not require transaction processing, recovery, and concurrency control mechanisms
  • Requires only two operations in data accessing: initial loading of data and access of data
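
The two operations can be illustrated with a small Python sketch. The class name `NonvolatileStore` and its methods are hypothetical, chosen only to show a store that offers loading and reading but no update or delete.

```python
class NonvolatileStore:
    """Sketch of a store that supports only loading and reading."""

    def __init__(self):
        self._rows = []

    def load(self, batch):
        # Loading appends copies of rows; existing rows are never modified.
        self._rows.extend(dict(row) for row in batch)

    def query(self, predicate=lambda row: True):
        # Access returns copies so callers cannot mutate stored data.
        return [dict(row) for row in self._rows if predicate(row)]


store = NonvolatileStore()
store.load([{"order_id": 1, "amount": 250}, {"order_id": 2, "amount": 90}])

large = store.query(lambda row: row["amount"] > 100)
large[0]["amount"] = 0  # changes only the caller's copy, not the store
```

Because there is no in-place update, the sketch needs no locking or rollback logic, which mirrors the point about transaction processing above.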
Non Volatile
Data Warehouse—Time Variant
• The time horizon for the data warehouse is significantly longer than that of operational systems
  • Operational database: current-value data
  • Data warehouse data: provides information from a historical perspective (e.g., the past 5-10 years)
• Every key structure in the data warehouse
  • Contains an element of time, explicitly or implicitly
  • But the key of operational data may or may not contain a “time element”
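
The difference in key structure can be sketched in Python. The customer ID, balances, and dates below are made-up sample values, and plain dictionaries stand in for the two kinds of store.

```python
from datetime import date

# Operational store: keyed by customer only, so an update overwrites the old value.
operational = {}
# Warehouse: the key includes a date, so every snapshot is kept.
warehouse = {}


def record_balance(customer_id, balance, as_of):
    operational[customer_id] = balance
    warehouse[(customer_id, as_of)] = balance


record_balance("C001", 500, date(2011, 12, 31))
record_balance("C001", 750, date(2012, 6, 30))

# The warehouse can answer historical questions; the operational store cannot.
history = sorted(
    (as_of, bal) for (cust, as_of), bal in warehouse.items() if cust == "C001"
)
```

Here `operational` holds only the current balance, while `history` lists both snapshots in date order.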
Data Granularity
Approaches for Data Warehouse Design
• Top-down or bottom-up approach?
• Enterprise-wide or departmental?
• Which first: data warehouse or data mart?
• Build a pilot or go with a full-fledged implementation?
• Dependent or independent data marts?
Three Data Warehouse Models
• Enterprise warehouse
  • Collects all of the information about subjects spanning the entire organization
• Data mart
  • A subset of corporate-wide data that is of value to a specific group of users. Its scope is confined to specific, selected groups, such as a marketing data mart
  • Independent vs. dependent (sourced directly from the warehouse) data mart
• Virtual warehouse
  • A set of views over operational databases
  • Only some of the possible summary views may be materialized
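
A virtual warehouse can be sketched with Python's built-in `sqlite3` module. The `orders` table, the `sales_by_region` view, and the sample rows are hypothetical, and an in-memory SQLite database stands in for an operational database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical operational table
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO orders (region, amount) VALUES
        ('East', 100.0), ('East', 50.0), ('West', 75.0);

    -- The "virtual warehouse": a summary view, computed at query time
    CREATE VIEW sales_by_region AS
        SELECT region, SUM(amount) AS total_amount
        FROM orders
        GROUP BY region;
    """
)

summary = conn.execute(
    "SELECT region, total_amount FROM sales_by_region ORDER BY region"
).fetchall()
```

SQLite views are always computed when queried. Materializing a summary would mean storing its result in a table and refreshing it, which trades storage and refresh cost for faster reads.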
Data Warehouse vs Data Marts
Top Down Approach
• The advantages of this approach are:
  • A truly corporate effort, an enterprise view of data
  • Inherently architected, not a union of disparate data marts
  • Single, central storage of data about the content
  • Centralized rules and control
  • May see quick results if implemented with iterations
• The disadvantages are:
  • Takes longer to build even with an iterative method
  • High exposure to the risk of failure
  • Needs a high level of cross-functional skills
  • High outlay without proof of concept
Bottom Up Approach
• The advantages of this approach are:
  • Faster and easier implementation of manageable pieces
  • Favorable return on investment and proof of concept
  • Less risk of failure
  • Inherently incremental; can schedule important data marts first
  • Allows the project team to learn and grow
• The disadvantages are:
  • Each data mart has its own narrow view of data
  • Spreads redundant data across every data mart
  • Perpetuates inconsistent and irreconcilable data
  • Proliferates unmanageable interfaces
Practical Approach
• The steps in this practical approach are as follows:
  1. Plan and define requirements at the overall corporate level
  2. Create a surrounding architecture for a complete warehouse
  3. Conform and standardize the data content
  4. Implement the data warehouse as a series of supermarts, one at a time
Data Warehouse Development: A Recommended Approach
[Figure: diagram with the labels Multi-Tier Data Warehouse, Distributed Data Marts, Enterprise Data Warehouse, Data Mart, Data Mart]