CH3 Data Warehousing
TOPIC 3: Data Warehousing
Course Overview
• Introduction
• Data Warehousing
• Data Warehousing, OLAP

Introduction
A producer wants to know…
• Which are our lowest/highest margin customers?
• Who are my customers and what products are they buying?
• What is the most effective distribution channel?
• What product promotions have the biggest impact on revenue?
• Which customers are most likely to go to the competition?
• What impact will new products/services have on revenue and margins?
Data, Data everywhere yet ...
• I can’t find the data I need
– data is scattered over the network
– many versions, subtle differences
What are the users saying...
• Data should be integrated
across the enterprise
• Summary data has a real
value to the organization
• Historical data holds the
key to understanding data
over time
• What-if capabilities are
required
What is Data Warehousing?
A process of transforming data into information and making it available to users in a timely enough manner to make a difference.
[Figure: Data → Information]
Evolution
• 60’s: Batch reports
– hard to find and analyze information
– inflexible and expensive, reprogram every new request
[Figure: survey chart, percentage of respondents (0–35%), initial vs. projected 2Q96]
Data Warehouse
• A data warehouse is a
– subject-oriented
– integrated
– time-varying
– non-volatile
collection of data that is used primarily in organizational decision making.
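To make "time-varying" and "non-volatile" concrete, here is a minimal sketch, assuming a hypothetical table of customer balance snapshots (`Snapshot`, `WarehouseTable`, and `balance` are illustrative names, not from the slides). Rows are only appended, each carries its own load date, and history can be queried as of any point in time.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass(frozen=True)
class Snapshot:
    """One immutable warehouse row (hypothetical schema)."""
    subject_key: str  # subject-oriented: keyed by a business subject such as a customer
    load_date: date   # time-varying: every row carries its own time element
    balance: float    # the measure being tracked

class WarehouseTable:
    """Append-only store: rows are loaded and queried but never updated in place."""

    def __init__(self) -> None:
        self._rows: List[Snapshot] = []

    def load(self, row: Snapshot) -> None:
        # Non-volatile: a change in the source arrives as a new snapshot row.
        self._rows.append(row)

    def history(self, subject_key: str) -> List[Snapshot]:
        # All snapshots for one subject, oldest first.
        return sorted((r for r in self._rows if r.subject_key == subject_key),
                      key=lambda r: r.load_date)

    def as_of(self, subject_key: str, when: date) -> Optional[Snapshot]:
        # Point-in-time view: the latest snapshot loaded on or before `when`.
        past = [r for r in self.history(subject_key) if r.load_date <= when]
        return past[-1] if past else None

table = WarehouseTable()
table.load(Snapshot("C1", date(2024, 1, 31), 100.0))
table.load(Snapshot("C1", date(2024, 2, 29), 150.0))
```

An operational system would overwrite the balance with 150.0; the warehouse keeps both rows, so `table.as_of("C1", date(2024, 2, 15))` still answers with the January value.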
Data Warehouse for Decision Support & OLAP
Industry                  Application
Finance                   Credit card analysis
Insurance                 Claims, fraud analysis
Telecommunication         Call record analysis
Transport                 Logistics management
Consumer goods            Promotion analysis
Data service providers    Value-added data
Utilities                 Power usage analysis
Why Separate Data Warehouse?
• Performance
– Operational databases are designed and tuned for known transactions and workloads.
– Complex OLAP queries would degrade performance for operational transactions.
– Special data organization, access, and implementation methods are needed for multidimensional views and queries.
• Function
– Missing data: Decision support requires historical data, which operational databases do not typically maintain.
– Data consolidation: Decision support requires consolidation (aggregation, summarization) of data from many heterogeneous sources: operational databases and external sources.
– Data quality: Different sources typically use inconsistent data representations, codes, and formats, which have to be reconciled.
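The data-quality and consolidation points can be sketched as follows, assuming two hypothetical operational extracts (`loans_db`, `cards_db`) that name fields differently, encode gender differently, and store amounts in different units. Each source is first mapped to one common representation, and only then are the rows aggregated per customer.

```python
from collections import defaultdict

# Hypothetical extracts from two operational systems describing the same customers.
loans_db = [
    {"cust": "C1", "gender": "M", "amount_cents": 150000},
    {"cust": "C2", "gender": "F", "amount_cents": 90000},
]
cards_db = [
    {"customer_id": "C1", "sex": 1, "amount": 200.0},
    {"customer_id": "C3", "sex": 0, "amount": 50.0},
]

GENDER_FROM_CARDS = {1: "M", 0: "F"}  # assumed code mapping used by the cards system

def from_loans(row):
    # Reconcile: rename fields and convert cents to currency units.
    return {"customer": row["cust"], "gender": row["gender"],
            "amount": row["amount_cents"] / 100, "source": "loans"}

def from_cards(row):
    # Reconcile: rename fields and translate the numeric gender code.
    return {"customer": row["customer_id"], "gender": GENDER_FROM_CARDS[row["sex"]],
            "amount": row["amount"], "source": "cards"}

def consolidate(rows):
    # Consolidation: aggregate the reconciled rows per customer.
    totals = defaultdict(float)
    for r in rows:
        totals[r["customer"]] += r["amount"]
    return dict(totals)

integrated = [from_loans(r) for r in loans_db] + [from_cards(r) for r in cards_db]
totals = consolidate(integrated)  # {'C1': 1700.0, 'C2': 900.0, 'C3': 50.0}
```

Summing before reconciling would add cents to currency units for customer C1; the mapping step has to come first.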
What are Operational Systems?
• They are OLTP systems
• Run mission-critical applications
• Need to work with stringent performance requirements for routine tasks
• Used to run a business!
RDBMS used for OLTP
Operational Systems
• Run the business in real time
• Based on up-to-the-second data
• Optimized to handle large numbers
of simple read/write transactions
• Optimized for fast response to
predefined transactions
• Used by people who deal with
customers, products -- clerks,
salespeople etc.
• They are increasingly used by
customers
Examples of Operational Data
[Table columns: Data, Industry, Usage, Technology, Volumes]
[Figure: Operational Database (Loans, Credit Card, Trust, Savings) vs. Data Warehouse (Customer, Vendor, Product, Activity)]
OLTP vs. Data Warehouse
OLTP vs Data Warehouse
• OLTP
– Application oriented
– Used to run business
– Detailed data
– Current, up to date
– Isolated data
– Repetitive access
– Clerical user
• Warehouse (DSS)
– Subject oriented
– Used to analyze business
– Summarized and refined
– Snapshot data
– Integrated data
– Ad-hoc access
– Knowledge user (manager)
OLTP vs Data Warehouse
OLTP vs Data Warehouse
• OLTP
– Transaction throughput is the performance metric
– Thousands of users
– Managed in entirety
• Data Warehouse
– Query throughput is the performance metric
– Hundreds of users
– Managed by subsets
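The difference in workload shape can be illustrated with a minimal sketch using Python's built-in `sqlite3` and a hypothetical `account` table. The OLTP-style work is a short, predefined transaction that touches one row by key; the warehouse-style work is an ad-hoc query that scans and aggregates many rows. One small database stands in for both only to show the contrast, not as a recommended architecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, branch TEXT, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)",
                 [(1, "North", 100.0), (2, "North", 250.0), (3, "South", 80.0)])

# OLTP-style: a short transaction that updates a single row by primary key.
with conn:  # commits on success, rolls back on an exception
    conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (30.0, 1))

# Warehouse-style: an ad-hoc query that scans and aggregates many rows.
by_branch = dict(conn.execute(
    "SELECT branch, SUM(balance) FROM account GROUP BY branch ORDER BY branch"))
```

Throughput for the first kind is counted in transactions per second; for the second, in queries answered, which is why the two are tuned and measured differently.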
To summarize ...
• OLTP Systems are
used to “run” a business
Wal*Mart Case Study
• Founded by Sam Walton
• One of the largest supermarket chains in the US
Old Retail Paradigm
• Wal*Mart
– Inventory Management
– Merchandise Accounts Payable
– Purchasing
– Supplier Promotions: National, Region, Store Level
• Suppliers
– Accept Orders
– Promote Products
– Provide special Incentives
– Monitor and Track the Incentives
– Bill and Collect Receivables
– Estimate Retailer Demands
New (Just-In-Time) Retail Paradigm
• No more deals
• Shelf-Pass Through (POS Application)
– One Unit Price
• Suppliers paid once a week on ACTUAL items sold
– Wal*Mart Manager
• Daily Inventory Restock
• Suppliers (sometimes same day) ship to Wal*Mart
• Warehouse-Pass Through
– Stock some Large Items
• Delivery may come from supplier
– Distribution Center
• Supplier’s merchandise unloaded directly onto Wal*Mart Trucks
Wal*Mart System
• NCR 5100M: 96 Nodes; 24 TB Raw Disk; 700–1000 Pentium CPUs
• Number of Rows: > 5 Billion
• Historical Data: 65 weeks (5 Quarters)
• New Daily Volume: Current Apps: 75 Million; New Apps: 100 Million +
• Number of Users: Thousands
• E.g.: if the chosen aggregation hierarchy is “LOCATION”: CITY → STATE → COUNTRY → CONTINENT
[Figure: data cube with LOCATION and PRODUCT dimensions]
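Rolling a measure up the LOCATION hierarchy (CITY → STATE → COUNTRY → CONTINENT) can be sketched as repeated child-to-parent aggregation. The members, parent mapping, and sales figures below are hypothetical; only the level names come from the slide.

```python
from collections import defaultdict

# Hypothetical LOCATION hierarchy: each member maps to its parent one level up.
LEVELS = ["city", "state", "country", "continent"]
PARENT = {
    "city": {"Austin": "Texas", "Dallas": "Texas", "Seattle": "Washington"},
    "state": {"Texas": "USA", "Washington": "USA"},
    "country": {"USA": "North America"},
}

def roll_up(totals, from_level, to_level):
    """Aggregate a measure from one level of the hierarchy up to a coarser level."""
    start, end = LEVELS.index(from_level), LEVELS.index(to_level)
    if end < start:
        raise ValueError("roll-up must move to a coarser level")
    for level in LEVELS[start:end]:
        parents = PARENT[level]
        coarser = defaultdict(float)
        for member, value in totals.items():
            coarser[parents[member]] += value  # add each child's value into its parent
        totals = dict(coarser)
    return totals

sales_by_city = {"Austin": 120.0, "Dallas": 80.0, "Seattle": 50.0}
by_state = roll_up(sales_by_city, "city", "state")          # {'Texas': 200.0, 'Washington': 50.0}
by_continent = roll_up(sales_by_city, "city", "continent")  # {'North America': 250.0}
```

Because each step only needs the previous level's totals, a warehouse can store pre-aggregated STATE totals and roll those up further instead of rescanning city-level rows.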
Representing Data in Cube
[Figure: data cube with dimensions Location/Area (Central, East, South), Type of car, and Product (camera)]
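A cube can be sketched as a mapping from one coordinate per dimension to a measure. This is a minimal illustration, assuming hypothetical dimensions (area, car type, quarter) and invented unit counts. Slicing fixes one dimension, and aggregating sums the measure over the dimensions that are dropped.

```python
from collections import defaultdict

# Hypothetical 3-D cube: (area, car_type, quarter) -> units sold.
DIMS = ("area", "car_type", "quarter")
cube = {
    ("Central", "Sedan", "Q1"): 12,
    ("Central", "SUV", "Q1"): 7,
    ("Central", "Sedan", "Q2"): 9,
    ("East", "Sedan", "Q1"): 5,
    ("East", "SUV", "Q2"): 8,
    ("South", "SUV", "Q1"): 4,
}

def slice_cube(cube, dim, member):
    """Fix one dimension to a single member; the result has one fewer dimension."""
    i = DIMS.index(dim)
    return {key[:i] + key[i + 1:]: v for key, v in cube.items() if key[i] == member}

def aggregate(cube, keep):
    """Sum the measure over every dimension of the full cube not named in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, v in cube.items():
        out[tuple(key[i] for i in idx)] += v
    return dict(out)

q1 = slice_cube(cube, "quarter", "Q1")  # 2-D (area, car_type) view of Q1
per_area = aggregate(cube, ["area"])    # {('Central',): 28, ('East',): 13, ('South',): 4}
```

A sparse dictionary stores only non-empty cells; OLAP engines often use dense arrays instead when most cells are populated.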
Snowflake Schema
• Fact (major) table SALES: Product Code, Day Code, Location Code, Quantity, Unit Price
• Dimension (minor) tables:
– PRODUCT: Product Code, Description, Type
– DAY: Day Code, Month, Year
– LOCATION: Location Code, Post Code
– POST CODE: Post Code, State (“expanded from LOCATION minor table”)
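The schema above can be sketched in SQLite through Python's `sqlite3` module. Table and column names follow the slide; the rows are hypothetical. POST CODE is normalized out of LOCATION, which is what makes this a snowflake rather than a star, so a query by state needs one extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension (minor) tables; POST CODE is normalized out of LOCATION.
    CREATE TABLE post_code (post_code TEXT PRIMARY KEY, state TEXT);
    CREATE TABLE location  (location_code INTEGER PRIMARY KEY,
                            post_code TEXT REFERENCES post_code(post_code));
    CREATE TABLE product   (product_code TEXT PRIMARY KEY, description TEXT, type TEXT);
    CREATE TABLE day       (day_code INTEGER PRIMARY KEY, month INTEGER, year INTEGER);
    -- Fact (major) table: a key into each dimension plus the measures.
    CREATE TABLE sales (
        product_code  TEXT    REFERENCES product(product_code),
        day_code      INTEGER REFERENCES day(day_code),
        location_code INTEGER REFERENCES location(location_code),
        quantity      INTEGER,
        unit_price    REAL
    );
""")
conn.executemany("INSERT INTO post_code VALUES (?, ?)", [("1000", "North"), ("2000", "South")])
conn.executemany("INSERT INTO location VALUES (?, ?)", [(1, "1000"), (2, "1000"), (3, "2000")])
conn.execute("INSERT INTO product VALUES ('P1', 'Camera', 'Electronics')")
conn.execute("INSERT INTO day VALUES (1, 1, 2024)")
conn.executemany("INSERT INTO sales VALUES ('P1', 1, ?, ?, 100.0)", [(1, 2), (2, 1), (3, 4)])

# Revenue by state: the snowflake needs the extra LOCATION -> POST CODE join.
revenue_by_state = dict(conn.execute("""
    SELECT pc.state, SUM(s.quantity * s.unit_price)
    FROM sales s
    JOIN location l   ON s.location_code = l.location_code
    JOIN post_code pc ON l.post_code = pc.post_code
    GROUP BY pc.state
"""))
```

In a star schema, `state` would be stored directly in LOCATION, trading some redundancy for one fewer join.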
Data Warehouse vs. Data Marts
What comes first?

From the Data Warehouse to Data Marts
[Figure: layers from the organizationally structured Data Warehouse (normalized, detailed, more history) through departmentally structured data to individually structured data (less history); the vertical axis runs from data at the base to information at the top]
Data Warehouse and Data Marts
• Data Mart (OLAP): lightly summarized, departmentally structured
• Data Warehouse: organizationally structured, atomic, detailed data
Characteristics of the Departmental Data Mart
• OLAP
• Small
• Flexible
• Customized by department
• Source is the departmentally structured data warehouse
Techniques for Creating Departmental Data Mart
• OLAP
• Summarized
• Superset
• Indexed
• Arrayed
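The "Summarized" technique can be sketched as deriving a small departmental mart from atomic warehouse rows. This assumes a hypothetical row layout of (department, product, month, amount). The mart keeps only one department's rows and sums away transaction-level detail.

```python
from collections import defaultdict

# Hypothetical atomic warehouse rows: (department, product, month, amount).
warehouse = [
    ("Marketing", "P1", "2024-01", 100.0),
    ("Marketing", "P1", "2024-01", 50.0),
    ("Marketing", "P2", "2024-02", 70.0),
    ("Finance",   "P1", "2024-01", 30.0),
]

def build_summarized_mart(rows, department):
    """Derive a small, lightly summarized mart for one department."""
    mart = defaultdict(float)
    for dept, product, month, amount in rows:
        if dept == department:                # departmental subset of the warehouse
            mart[(product, month)] += amount  # summarize away transaction detail
    return dict(mart)

marketing_mart = build_summarized_mart(warehouse, "Marketing")
# {('P1', '2024-01'): 150.0, ('P2', '2024-02'): 70.0}
```

The mart is rebuilt from the warehouse rather than from the operational sources, matching the point that its source is the departmentally structured data warehouse.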
Data Mart Centric
[Figure: Data Sources → Data Marts → Data Warehouse]
Problems with Data Mart Centric Solution
True Warehouse
[Figure: Data Sources → Data Warehouse → Data Marts]