Data Warehouse Fundamentals: Instructor: Paul Chen
Chapter 9
Time Line
Topic 2: Decision Processing
Overview
Decision processing systems, and their underlying
analytical applications, provide business users with the
information they need to track and analyze business
trends, and to explore new business opportunities. As
businesses become increasingly competitive and
complex, effective decision processing systems are
essential for success.
The Next Generation of Business
Intelligence
A decision processing system analyzes business
information captured from operational systems
(back-office, front-office, and e-business applications).
Business information is distributed to business users
via corporate intranets and extranets.
The flow of data can be thought of as an information
supply chain whose objective is to convert operational
data into useful business information.
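As a rough illustration of the supply-chain idea, the sketch below turns a handful of raw operational records into one business metric (total sales per region). The record layout, the sample rows, and the function name `sales_by_region` are assumptions made for this example only; they are not taken from the course material.

```python
from collections import defaultdict

# Hypothetical operational records: (source system, region, sale amount).
TRANSACTIONS = [
    ("back-office", "East", 120.0),
    ("front-office", "East", 80.0),
    ("e-business", "West", 200.0),
    ("e-business", "East", 50.0),
]


def sales_by_region(transactions):
    """Aggregate raw transaction rows into one sales total per region."""
    totals = defaultdict(float)
    for _source, region, amount in transactions:
        totals[region] += amount
    return dict(totals)


if __name__ == "__main__":
    # Print the derived metric, one region per line.
    for region, total in sorted(sales_by_region(TRANSACTIONS).items()):
        print(f"{region}: {total:.2f}")
```

The operational rows carry detail that matters to the source systems; the aggregated totals are the kind of summarized information a business user would actually review.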
The Decision Processing Information
Supply Chain
[Figure: information supply chain diagram. Labels recovered from
the diagram: Operational Systems (Back-Office Applications,
Front-Office Applications, E-Business Applications), External Data,
Collaborative Office Systems, DW & Staging Area, Transaction
Information, Analytic Applications, Business Intelligence Tools,
Business Metrics, Business Decisions]
Decision Processing: Four Tasks
Medicine
Characterizing patient behaviour to predict
surgery visits
Identifying successful medical therapies for
different illnesses.
Examples of Applications of Data
Mining via relationships and patterns
Customer profiling: characteristics of good customers are
identified with the goals of predicting who will become
one and helping marketers target new prospects.
Operational (OLTP)
Data Mining
Level of Modeling vs. Level of Analytical Processing
Decision Trees
Clustering
Factor Analysis
Neural Network
Association Rules
Rule Induction
* Based on Sakhr Youness’s book “Professional Data Warehousing with SQL Server 7.0
and OLAP Services”
Data Visualization
A pie chart showing the sales of a product by region is
sometimes much more effective than presenting the same
data in text or tabular form.
[Pie chart: product sales by region. Regions: Northeast, South,
North, West, East. Slice values: 9%, 11%, 39%, 21%, 20%.]
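To make the comparison concrete, the sketch below converts regional sales totals into percentage shares (the numbers a pie chart would display) and prints a crude text bar for each region. The sales figures and the function names `region_shares` and `text_chart` are hypothetical and are not the values from the chart above.

```python
def region_shares(sales):
    """Convert absolute sales per region into percentage shares."""
    total = sum(sales.values())
    if total == 0:
        raise ValueError("total sales must be non-zero")
    return {region: 100.0 * amount / total for region, amount in sales.items()}


def text_chart(shares, width=40):
    """Render one text bar per region, largest share first."""
    lines = []
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    for region, pct in ranked:
        bar = "#" * round(width * pct / 100)
        lines.append(f"{region:<10} {bar} {pct:.0f}%")
    return lines


if __name__ == "__main__":
    # Hypothetical sales totals, for illustration only.
    sales = {"North": 50, "South": 30, "West": 20}
    for line in text_chart(region_shares(sales)):
        print(line)
```

Even in this plain-text form, the relative size of each region is easier to take in at a glance than a column of raw totals.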
Decision Tree
Cluster Analysis
First segment (high income > 8,000):
Have children
Second segment (8,000 > middle income > 3,000):
Married
Own car
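Segments like the two above are the sort of result a clustering run produces. The sketch below uses a very small one-dimensional k-means on income values; k-means is an assumption chosen for illustration, since the slide does not name a specific clustering algorithm, and the income figures and starting centers are hypothetical.

```python
def kmeans_1d(values, centers, iterations=20):
    """Tiny 1-D k-means. Returns (final centers, cluster index per value)."""
    centers = list(centers)
    assignments = []
    for _ in range(iterations):
        # Step 1: assign each value to its nearest center.
        assignments = [
            min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            for v in values
        ]
        # Step 2: move each center to the mean of its assigned values.
        new_centers = []
        for i, old in enumerate(centers):
            members = [v for v, a in zip(values, assignments) if a == i]
            new_centers.append(sum(members) / len(members) if members else old)
        if new_centers == centers:
            break  # centers stopped moving, so the clusters are stable
        centers = new_centers
    return centers, assignments


if __name__ == "__main__":
    # Hypothetical incomes; starting centers are rough guesses.
    incomes = [3500, 4200, 5000, 9000, 9500, 12000]
    centers, labels = kmeans_1d(incomes, centers=[3000, 8000])
    for income, label in zip(incomes, labels):
        print(income, "-> segment", label + 1)
```

Once the segments are found, each one can be described by the attributes its members share, as the slide does with children, marital status, and car ownership.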
Factor Analysis
Unlike cluster analysis, factor analysis builds a model from data.
The technique finds underlying factors, also called “latent
variables”, and provides models for these factors based on
variables in the data. For example, a software company is
considering a survey to identify the nine most commonly perceived
attributes of one of its products. It might then group these
attributes into categories such as technical support service,
training availability, and the help system.
T-Tests
Analysis of Variance
Linear Regression
Logistic Regression
Discriminant Analysis
Forecasting Methods
Topic 7: The Data Mining Process
Information Model
Validation
Deployment
ARE YOU READY FOR DATA
MINING?
Just because you have a data warehouse doesn’t mean
you’re necessarily ready for data mining. Much of the
work our company does in the data mining arena has
more to do with data mining readiness assessment than
with actually performing data mining.
Metrics you can use to gauge your data
mining readiness
Do you have a staff of experienced knowledge workers?
Do you have the data?
Do you have marketing processes in place that can use this
data?
Do you have a business champion who can embrace the
process and results?
Do you have the technology infrastructure to support
advanced analysis?
Topic 8: Data Mining Tools
ASSOCIATION
SEQUENCE
CLUSTERING
PREDICTIVE MODELING
In the predictive mode, patterns discovered from the database are used
to predict future patterns or trends. Predictive modeling allows the
user to submit records with some unknown field values, and the
system guesses the unknown values based on patterns previously
discovered from the database.
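A minimal sketch of this idea follows: a record with an unknown field is compared with historical records, and the unknown value is guessed by majority vote among the rows that agree with it on the most known fields. This matching rule is an assumption chosen for simplicity, not the method of any particular data mining tool, and the field names and history rows are hypothetical.

```python
from collections import Counter

# Hypothetical historical records with all fields known.
HISTORY = [
    {"has_children": True, "owns_car": True, "responded": "yes"},
    {"has_children": True, "owns_car": False, "responded": "yes"},
    {"has_children": False, "owns_car": True, "responded": "no"},
    {"has_children": False, "owns_car": False, "responded": "no"},
    {"has_children": True, "owns_car": True, "responded": "yes"},
]


def predict_field(record, history, target):
    """Guess record[target] from the history rows most similar to the record."""
    if not history:
        raise ValueError("history must contain at least one record")
    # Fields with value None are treated as unknown and ignored.
    known = {k: v for k, v in record.items() if k != target and v is not None}

    def overlap(row):
        # Number of known fields on which this history row agrees.
        return sum(1 for k, v in known.items() if row.get(k) == v)

    best = max(overlap(row) for row in history)
    votes = Counter(row[target] for row in history if overlap(row) == best)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    new_record = {"has_children": True, "owns_car": True, "responded": None}
    print(predict_field(new_record, HISTORY, "responded"))
```

Real predictive modeling tools use more robust techniques (decision trees, regression, neural networks), but the workflow is the same: learn from known records, then fill in the unknown field.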
Comparing the two models, one can state that “Verification” can be
very inefficient, time-consuming, and costly, whereas “Discovery”
modeling can be very efficient and cost-effective, is less dependent
on user input, and increases modeling accuracy.