
DBMS4

Discriminant analysis is a technique used to discriminate between two or more groups based on a set of predictor variables. It identifies the variables that best discriminate between the groups and develops functions to classify new observations. The analysis finds the combination of variables that maximizes differences between groups relative to differences within groups. It can be used in segmentation analysis to classify prospects into groups for targeting strategies.

Uploaded by Neeraj Panchal

Discriminant Analysis

Database Marketing
Instructor: Nanda Kumar
Multiple Regression
▪ Y = b0 + b1 X1 + b2 X2 + … + bn Xn
▪ Same as Simple Regression in principle
▪ New Issues:
– Each Xi should capture something distinct (avoid redundant, highly correlated predictors)
– Variable selection
Multiple Regression
▪ Example 1:
– Spending = a + b*Income + c*Age
▪ Example 2:
– Weight = a + b*Height + c*Sex + d*Age
Real Estate Example
▪ How is price related to the characteristics of the house?
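SAS Code

/* Regress house price on its characteristics. With no DATA= option,
   PROC REG uses the most recently created data set. */
proc reg;
  model price = section lotsize
                bed bath age other;
run;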
Interpreting the Regression Output
▪ Parameter estimates (slope coefficients) capture the marginal impact of each explanatory variable on price, holding the other variables constant
▪ Example: the coefficient of the variable bed represents the impact on price of increasing the number of bedrooms by one
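As a minimal sketch of what a slope coefficient means, the Python/NumPy snippet below fits a multiple regression by ordinary least squares on a small hypothetical data set (the house values and the 50/20/10 coefficients are invented for illustration, not taken from the real estate example) and checks that raising `bed` by one, with `bath` held fixed, moves the fitted price by exactly the `bed` coefficient.

```python
import numpy as np

# HYPOTHETICAL data for illustration only (not from the slides):
# columns are bed and bath for six houses.
X_raw = np.array([[2, 1], [3, 1], [3, 2], [4, 2], [4, 3], [5, 3]], dtype=float)
# Prices generated exactly from price = 50 + 20*bed + 10*bath (made-up units),
# so ordinary least squares should recover these coefficients.
price = 50 + 20 * X_raw[:, 0] + 10 * X_raw[:, 1]

# Prepend a column of ones so the first coefficient is the intercept.
X = np.column_stack([np.ones(len(X_raw)), X_raw])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
b0, b_bed, b_bath = coef

def predict(bed, bath):
    """Fitted price for a house with the given numbers of bedrooms and baths."""
    return b0 + b_bed * bed + b_bath * bath

# Adding one bedroom while holding bath fixed changes the prediction by b_bed.
marginal_bed = predict(4, 2) - predict(3, 2)
print("coefficients:", np.round(coef, 4), "marginal effect of bed:", round(marginal_bed, 4))
```

Because the model is linear, the same one-unit change gives the same difference no matter which starting house is used.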
Significance of the Coefficients
▪ Are they significantly different from zero?
– Look at the t values and p values
• |t| above roughly 2, or p < 0.05, is good
• Sometimes p < 0.10 is considered reasonably significant
▪ Overall Goodness of Fit
– Look at R² (also refer to note in Session 1)
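The t values and R² that a regression procedure reports can be reproduced by hand. The sketch below assumes ordinary least squares with an intercept and uses simulated data (the seed, sample size, and true coefficients are arbitrary choices, not from the slides). Each t value is the estimate divided by its standard error, and R² is one minus the residual sum of squares over the total sum of squares.

```python
import numpy as np

# Simulated data for illustration only: y depends on x1 but not on x2.
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x1 + rng.normal(0, 1.0, n)

X = np.column_stack([np.ones(n), x1, x2])
k = X.shape[1]                                # number of estimated coefficients
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ beta
rss = resid @ resid                           # residual sum of squares
sigma2 = rss / (n - k)                        # estimated error variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_values = beta / se                          # t value for H0: coefficient = 0

tss = np.sum((y - y.mean()) ** 2)             # total sum of squares
r2 = 1 - rss / tss

print("t values:", np.round(t_values, 2), "R^2:", round(r2, 3))
```

Here x2 has no real effect on y, so its t value will usually be small; p values would come from a t distribution with n - k degrees of freedom.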
Where are we Now?
[Diagram: overview of the segmentation process, with boxes for Behavior, Secondary Data, Cluster Analysis, Segment 1, Segment 2, Discriminant Analysis, Factor/Logit Analysis, Distinguishing Characteristics, and Targeting]
Web Browsing
▪ Identified two groups of consumers
– One that visits your website frequently
– One that doesn’t
▪ Can the differences in behavior be related to socio-demographic variables?
▪ Can we use these discriminators to classify prospects into one of these two groups?
Catalog Business
▪ Identified two consumer segments
– One which buys a lot
– The other, which does not buy as much
▪ Can we find variables that help discriminate the behavior of these two groups?
▪ Can we use these discriminators to classify other consumers into one of these two groups?
Promotional Campaigns
▪ Identify groups based on their response to promotional campaigns
– One group purchases a lot on promotion
– The other does not
▪ Identify characteristics that distinguish these two groups
▪ Can we use these discriminators to distinguish price-sensitive prospects from less price-sensitive ones?
Segmentation Analysis
▪ General Problem
– Identified segments in the population based on behavior
– Want to find targetable characteristics that discriminate these groups
– Classify prospects into different groups
Data
Stock # GE/A ROI Stock # GE/A ROI
1 0.158 0.182 13 -0.012 -0.031
2 0.21 0.206 14 0.036 0.053
3 0.207 0.188 15 0.038 0.036
4 0.28 0.236 16 -0.063 -0.074
5 0.197 0.193 17 -0.054 -0.119
6 0.227 0.173 18 0 -0.005
7 0.148 0.196 19 0.005 0.039
8 0.254 0.212 20 0.091 0.122
9 0.079 0.147 21 -0.036 -0.072
10 0.149 0.128 22 0.045 0.064
11 0.2 0.15 23 -0.026 -0.024
12 0.187 0.191 24 0.016 0.026
[Figure: scatter plots of ROI against GE/A for Good Stocks, Bad Stocks, and All Stocks]
Identifying the Best Discriminators
▪ The two groups appear to be well separated on each ratio: ROI and GE/A
▪ They are also well separated in two-dimensional space
▪ But this need not always be the case!
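A quick way to check separation on each ratio is to compare group means and ranges. This sketch assumes that stocks 1 to 12 form the good group and stocks 13 to 24 the bad group, which is inferred from the axis ranges of the Good Stocks and Bad Stocks plots rather than stated in the table. A positive gap means every good stock scores above every bad stock on that ratio.

```python
import numpy as np

# Data table values; columns are GE/A, ROI.
# ASSUMPTION: stocks 1-12 are "good" and stocks 13-24 are "bad".
GOOD = np.array([
    [0.158, 0.182], [0.210, 0.206], [0.207, 0.188], [0.280, 0.236],
    [0.197, 0.193], [0.227, 0.173], [0.148, 0.196], [0.254, 0.212],
    [0.079, 0.147], [0.149, 0.128], [0.200, 0.150], [0.187, 0.191]])
BAD = np.array([
    [-0.012, -0.031], [0.036, 0.053], [0.038, 0.036], [-0.063, -0.074],
    [-0.054, -0.119], [0.000, -0.005], [0.005, 0.039], [0.091, 0.122],
    [-0.036, -0.072], [0.045, 0.064], [-0.026, -0.024], [0.016, 0.026]])

for j, name in enumerate(["GE/A", "ROI"]):
    # Gap > 0: the lowest good stock is still above the highest bad stock.
    gap = GOOD[:, j].min() - BAD[:, j].max()
    print(f"{name}: good mean {GOOD[:, j].mean():.3f}, "
          f"bad mean {BAD[:, j].mean():.3f}, range gap {gap:+.3f}")
```

On this data the ROI ranges do not overlap, while the GE/A ranges overlap slightly (stock 9 at 0.079 versus stock 20 at 0.091), a small example of why a single variable may not separate the groups on its own.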
Discriminating Variables
[Figure: two groups plotted against discriminating variables X1 and X2]
Discriminant Analysis
▪ Identifies a set of variables that best discriminates between the two groups
▪ Does so by choosing a new line (a weighted combination of the variables) that maximizes the similarity between members of the same group and minimizes the similarity between members of different groups
Discriminant Function
Z = w1*GEA + w2*ROI
▪ Between-Group Sum of Squares: SSb
▪ Within-Group Sum of Squares: SSw
▪ Choose w1 and w2 to maximize λ = SSb/SSw
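For a given pair of weights, SSb and SSw are ordinary sums of squares of the projected scores Z. The sketch below computes λ = SSb/SSw for the stock data, assuming stocks 1 to 12 are the good group and 13 to 24 the bad group, using the weights of the canonical discriminant function reported in the slides. The helper name `ss_ratio` is hypothetical.

```python
import numpy as np

# Data table values (GE/A, ROI). ASSUMPTION: stocks 1-12 good, 13-24 bad.
GOOD = np.array([
    [0.158, 0.182], [0.210, 0.206], [0.207, 0.188], [0.280, 0.236],
    [0.197, 0.193], [0.227, 0.173], [0.148, 0.196], [0.254, 0.212],
    [0.079, 0.147], [0.149, 0.128], [0.200, 0.150], [0.187, 0.191]])
BAD = np.array([
    [-0.012, -0.031], [0.036, 0.053], [0.038, 0.036], [-0.063, -0.074],
    [-0.054, -0.119], [0.000, -0.005], [0.005, 0.039], [0.091, 0.122],
    [-0.036, -0.072], [0.045, 0.064], [-0.026, -0.024], [0.016, 0.026]])

def ss_ratio(w, groups=(GOOD, BAD)):
    """Return (SSb, SSw, SSb/SSw) for the projection Z = X @ w."""
    zs = [g @ w for g in groups]               # Z scores for each group
    grand = np.concatenate(zs).mean()          # overall mean of Z
    ssb = sum(len(z) * (z.mean() - grand) ** 2 for z in zs)
    ssw = sum(((z - z.mean()) ** 2).sum() for z in zs)
    return ssb, ssw, ssb / ssw

# Weights from the canonical discriminant function in the slides. The constant
# term shifts every Z equally, so it does not change SSb or SSw.
ssb, ssw, lam = ss_ratio(np.array([15.0919, 5.769]))
print(f"SSb = {ssb:.3f}, SSw = {ssw:.3f}, lambda = {lam:.3f}")
```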
More on the Criterion
▪ For Z to provide maximum separation between the groups, the following must be satisfied:
– The means of Z for the two groups should be as far apart as possible (high SSb)
– Values of Z within each group should be as homogeneous as possible (low SSw)
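For two groups, the weights that maximize λ = SSb/SSw have a closed form (Fisher's linear discriminant): w is proportional to W⁻¹(m1 - m2), where W is the within-group sums of squares and cross-products matrix and m1, m2 are the group mean vectors. The sketch below computes this direction for the stock data (again assuming stocks 1 to 12 are good and 13 to 24 bad) and compares its λ with a few arbitrary weightings.

```python
import numpy as np

# Data table values (GE/A, ROI). ASSUMPTION: stocks 1-12 good, 13-24 bad.
GOOD = np.array([
    [0.158, 0.182], [0.210, 0.206], [0.207, 0.188], [0.280, 0.236],
    [0.197, 0.193], [0.227, 0.173], [0.148, 0.196], [0.254, 0.212],
    [0.079, 0.147], [0.149, 0.128], [0.200, 0.150], [0.187, 0.191]])
BAD = np.array([
    [-0.012, -0.031], [0.036, 0.053], [0.038, 0.036], [-0.063, -0.074],
    [-0.054, -0.119], [0.000, -0.005], [0.005, 0.039], [0.091, 0.122],
    [-0.036, -0.072], [0.045, 0.064], [-0.026, -0.024], [0.016, 0.026]])

def ss_ratio(w):
    """SSb/SSw of the projected scores Z = X @ w for the two groups."""
    z1, z2 = GOOD @ w, BAD @ w
    grand = np.concatenate([z1, z2]).mean()
    ssb = len(z1) * (z1.mean() - grand) ** 2 + len(z2) * (z2.mean() - grand) ** 2
    ssw = ((z1 - z1.mean()) ** 2).sum() + ((z2 - z2.mean()) ** 2).sum()
    return ssb / ssw

# Within-group SSCP matrix W and difference in group mean vectors d.
W = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in (GOOD, BAD))
d = GOOD.mean(axis=0) - BAD.mean(axis=0)
w_fisher = np.linalg.solve(W, d)               # proportional to W^{-1} d

candidates = {
    "Fisher": w_fisher,
    "GE/A only": np.array([1.0, 0.0]),
    "ROI only": np.array([0.0, 1.0]),
    "equal weights": np.array([1.0, 1.0]),
}
for label, w in candidates.items():
    print(f"{label:>13}: lambda = {ss_ratio(w):.3f}")
print("Fisher w1/w2 =", round(w_fisher[0] / w_fisher[1], 3))
```

Only the ratio w1/w2 matters for λ, so the printed ratio can be compared with 15.0919/5.769 ≈ 2.616 from the canonical discriminant function in the slides.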
Classification
▪ Discriminant Function: the line that separates the members of the two groups
▪ Methods of Classification
– Cut-Off Value Method
– Decision Theory Approach
– Classification Function Approach
– Mahalanobis Distance Method
Cut-Off Value Method
▪ Uses the discriminant function line to score new observations (prospects) and classify them into one of the two groups based on a cut-off value
Classification
[Figure: the Z axis divided at the cut-off value into classification regions R2 and R1]
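The sketch below applies the cut-off method to the canonical discriminant function given in the slides. The cut-off itself is an assumption: with equal group sizes and equal misclassification costs, a natural choice is the midpoint of the two group mean Z scores, with Z above it assigned to group 1 (good) and Z below it to group 2 (bad). The grouping of stocks 1 to 12 as good and 13 to 24 as bad is also assumed, and the function names are illustrative.

```python
import numpy as np

# Data table values (GE/A, ROI). ASSUMPTION: stocks 1-12 good, 13-24 bad.
GOOD = np.array([
    [0.158, 0.182], [0.210, 0.206], [0.207, 0.188], [0.280, 0.236],
    [0.197, 0.193], [0.227, 0.173], [0.148, 0.196], [0.254, 0.212],
    [0.079, 0.147], [0.149, 0.128], [0.200, 0.150], [0.187, 0.191]])
BAD = np.array([
    [-0.012, -0.031], [0.036, 0.053], [0.038, 0.036], [-0.063, -0.074],
    [-0.054, -0.119], [0.000, -0.005], [0.005, 0.039], [0.091, 0.122],
    [-0.036, -0.072], [0.045, 0.064], [-0.026, -0.024], [0.016, 0.026]])

def z_score(x):
    """Canonical discriminant function from the slides: Z = -2.0018 + 15.0919*GEA + 5.769*ROI."""
    return -2.0018 + 15.0919 * x[..., 0] + 5.769 * x[..., 1]

# ASSUMPTION: equal group sizes and costs, so use the midpoint of the group mean Z.
cutoff = (z_score(GOOD).mean() + z_score(BAD).mean()) / 2

def classify(x):
    """Group 1 (good) if Z exceeds the cut-off, otherwise group 2 (bad)."""
    return np.where(z_score(np.asarray(x, dtype=float)) > cutoff, 1, 2)

print(f"cut-off = {cutoff:.4f}")
print(classify([[0.28, 0.236], [-0.063, -0.074]]))   # stocks 4 and 16
```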
Classification Function Approach
▪ Classifications based on this approach are identical to those from the Decision Theory approach
▪ Classification functions are computed for each group:
C1 = -7.87 + 61.237*GEA + 21.027*ROI
C2 = -0.004 + 2.551*GEA - 1.404*ROI
Basic Idea
▪ Score each new observation using these two scoring functions
▪ The observation is assigned to the group with the higher score
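A minimal sketch of this scoring rule, using the two classification functions C1 and C2 exactly as given in the slides and applying them to stocks 4 and 16 from the data table. Ties (C1 = C2) are sent to group 2 here, which is an arbitrary choice.

```python
import numpy as np

def classification_scores(gea, roi):
    """Score observations with the two classification functions from the slides."""
    c1 = -7.87 + 61.237 * gea + 21.027 * roi
    c2 = -0.004 + 2.551 * gea - 1.404 * roi
    return c1, c2

def assign(gea, roi):
    """Assign each observation to the group whose function gives the higher score."""
    c1, c2 = classification_scores(gea, roi)
    return np.where(c1 > c2, 1, 2)

gea = np.array([0.28, -0.063])    # stocks 4 and 16 from the data table
roi = np.array([0.236, -0.074])
print(assign(gea, roi))
```

Because C1 - C2 = -7.866 + 58.686*GEA + 22.431*ROI is itself linear, this rule is the same as classifying on the sign of a single linear score.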
What To Look For In The Results?
▪ Significance of the Discriminating Variables
– The idea is to test whether the means of the discriminating variables are statistically different across the two groups
– Statistic: Wilks’ Lambda should be small (look at the p value/significance level)
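Wilks' Lambda compares within-group variation to total variation: Λ = det(W)/det(T), where W is the within-group and T the total sums of squares and cross-products matrix; for a single variable this reduces to SSw/SSt. Values near 0 mean the group means differ a lot relative to the spread within groups. The sketch computes both forms for the stock data under the same good/bad grouping assumption; the p value comes from an F-based test that is not reproduced here.

```python
import numpy as np

# Data table values (GE/A, ROI). ASSUMPTION: stocks 1-12 good, 13-24 bad.
GOOD = np.array([
    [0.158, 0.182], [0.210, 0.206], [0.207, 0.188], [0.280, 0.236],
    [0.197, 0.193], [0.227, 0.173], [0.148, 0.196], [0.254, 0.212],
    [0.079, 0.147], [0.149, 0.128], [0.200, 0.150], [0.187, 0.191]])
BAD = np.array([
    [-0.012, -0.031], [0.036, 0.053], [0.038, 0.036], [-0.063, -0.074],
    [-0.054, -0.119], [0.000, -0.005], [0.005, 0.039], [0.091, 0.122],
    [-0.036, -0.072], [0.045, 0.064], [-0.026, -0.024], [0.016, 0.026]])

def sscp(a):
    """Sums of squares and cross-products about the column means of a."""
    c = a - a.mean(axis=0)
    return c.T @ c

W = sscp(GOOD) + sscp(BAD)          # within-group SSCP
T = sscp(np.vstack([GOOD, BAD]))    # total SSCP

# Univariate Wilks' Lambda for each variable: SSw / SSt.
univariate = np.diag(W) / np.diag(T)
# Multivariate Wilks' Lambda for both variables jointly: det(W) / det(T).
multivariate = np.linalg.det(W) / np.linalg.det(T)

print("univariate (GE/A, ROI):", np.round(univariate, 3))
print("multivariate:", round(multivariate, 3))
```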
Estimate of The Discriminant Function
▪ Canonical Discriminant Function
Z = -2.0018 + 15.0919*GEA + 5.769*ROI
▪ The group means can be statistically different even though, for all practical purposes, the differences between the groups are not large
▪ Look at the squared canonical correlation: the ratio of between-group SS to total SS (high is good)
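The squared canonical correlation can be computed directly from the Z scores. The sketch below uses the canonical discriminant function from the slides and the assumed grouping (stocks 1 to 12 good, 13 to 24 bad). Since SSt = SSb + SSw, it also equals λ/(1 + λ) with λ = SSb/SSw.

```python
import numpy as np

# Data table values (GE/A, ROI). ASSUMPTION: stocks 1-12 good, 13-24 bad.
GOOD = np.array([
    [0.158, 0.182], [0.210, 0.206], [0.207, 0.188], [0.280, 0.236],
    [0.197, 0.193], [0.227, 0.173], [0.148, 0.196], [0.254, 0.212],
    [0.079, 0.147], [0.149, 0.128], [0.200, 0.150], [0.187, 0.191]])
BAD = np.array([
    [-0.012, -0.031], [0.036, 0.053], [0.038, 0.036], [-0.063, -0.074],
    [-0.054, -0.119], [0.000, -0.005], [0.005, 0.039], [0.091, 0.122],
    [-0.036, -0.072], [0.045, 0.064], [-0.026, -0.024], [0.016, 0.026]])

def z_score(x):
    """Canonical discriminant function from the slides."""
    return -2.0018 + 15.0919 * x[:, 0] + 5.769 * x[:, 1]

z_good, z_bad = z_score(GOOD), z_score(BAD)
z_all = np.concatenate([z_good, z_bad])

ss_total = ((z_all - z_all.mean()) ** 2).sum()
ss_within = ((z_good - z_good.mean()) ** 2).sum() + ((z_bad - z_bad.mean()) ** 2).sum()
ss_between = ss_total - ss_within

rc2 = ss_between / ss_total       # squared canonical correlation
lam = ss_between / ss_within      # lambda = SSb / SSw
print(f"squared canonical correlation = {rc2:.3f}")
```

For two groups, when Z is the estimated discriminant function, 1 - Rc² also equals the multivariate Wilks' Lambda.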
Importance of the Discriminant Variables and the Discriminant Function
▪ How important is a variable to the discriminant function?
▪ Look at the structure loadings: Pooled Within Canonical Structure
– The variable with the higher loading is relatively more important
– Caution: if the variables are highly correlated, their relative importance can change from sample to sample
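A pooled within-group structure loading is the correlation between a variable and Z after both are centered on their own group means. The sketch below computes it for the stock data with the slides' canonical function, assuming stocks 1 to 12 are good and 13 to 24 bad; by the Cauchy-Schwarz inequality each loading lies between -1 and 1.

```python
import numpy as np

# Data table values (GE/A, ROI). ASSUMPTION: stocks 1-12 good, 13-24 bad.
GOOD = np.array([
    [0.158, 0.182], [0.210, 0.206], [0.207, 0.188], [0.280, 0.236],
    [0.197, 0.193], [0.227, 0.173], [0.148, 0.196], [0.254, 0.212],
    [0.079, 0.147], [0.149, 0.128], [0.200, 0.150], [0.187, 0.191]])
BAD = np.array([
    [-0.012, -0.031], [0.036, 0.053], [0.038, 0.036], [-0.063, -0.074],
    [-0.054, -0.119], [0.000, -0.005], [0.005, 0.039], [0.091, 0.122],
    [-0.036, -0.072], [0.045, 0.064], [-0.026, -0.024], [0.016, 0.026]])

def z_score(x):
    """Canonical discriminant function from the slides."""
    return -2.0018 + 15.0919 * x[:, 0] + 5.769 * x[:, 1]

# Center each variable and Z within its own group, then pool the groups.
xc = np.vstack([g - g.mean(axis=0) for g in (GOOD, BAD)])
zc = np.concatenate([z_score(g) - z_score(g).mean() for g in (GOOD, BAD)])

# Pooled within-group correlation of each variable with Z.
loadings = (xc * zc[:, None]).sum(axis=0) / np.sqrt((xc ** 2).sum(axis=0) * (zc ** 2).sum())
print(dict(zip(["GE/A", "ROI"], np.round(loadings, 3))))
```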
Classification Summary
▪ Look at the cross-validation results
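A minimal sketch of leave-one-out cross-validation: each stock is classified by a discriminant function estimated without it, so the hit rate is not flattered by scoring the same data used for fitting. The two-group linear discriminant with pooled covariance, equal priors, and a midpoint cut-off, as well as the good/bad grouping of stocks 1 to 12 and 13 to 24, are assumptions of this sketch rather than settings taken from the slides.

```python
import numpy as np

# Data table values (GE/A, ROI). ASSUMPTION: stocks 1-12 good, 13-24 bad.
GOOD = np.array([
    [0.158, 0.182], [0.210, 0.206], [0.207, 0.188], [0.280, 0.236],
    [0.197, 0.193], [0.227, 0.173], [0.148, 0.196], [0.254, 0.212],
    [0.079, 0.147], [0.149, 0.128], [0.200, 0.150], [0.187, 0.191]])
BAD = np.array([
    [-0.012, -0.031], [0.036, 0.053], [0.038, 0.036], [-0.063, -0.074],
    [-0.054, -0.119], [0.000, -0.005], [0.005, 0.039], [0.091, 0.122],
    [-0.036, -0.072], [0.045, 0.064], [-0.026, -0.024], [0.016, 0.026]])
X = np.vstack([GOOD, BAD])
y = np.array([1] * len(GOOD) + [2] * len(BAD))

def fit_lda(X, y):
    """Two-group linear discriminant with pooled covariance and equal priors."""
    g1, g2 = X[y == 1], X[y == 2]
    m1, m2 = g1.mean(axis=0), g2.mean(axis=0)
    W = (g1 - m1).T @ (g1 - m1) + (g2 - m2).T @ (g2 - m2)
    S = W / (len(X) - 2)                  # pooled within-group covariance
    w = np.linalg.solve(S, m1 - m2)
    cut = w @ (m1 + m2) / 2               # midpoint cut-off
    return w, cut

def predict(model, x):
    w, cut = model
    return 1 if x @ w > cut else 2

# Leave-one-out: refit without observation i, then classify observation i.
hits = 0
for i in range(len(X)):
    keep = np.arange(len(X)) != i
    model = fit_lda(X[keep], y[keep])
    hits += int(predict(model, X[i]) == y[i])
accuracy = hits / len(X)
print(f"leave-one-out hit rate: {accuracy:.3f}")
```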
Web Browsing
▪ Can use the discriminant function to classify prospects into one of these two groups
▪ Target appropriately
Catalog Business
▪ Classify other consumers into one of these two groups
▪ Act on the classification (target each segment appropriately)
Promotional Campaigns
▪ Classify prospects into price-sensitive and less price-sensitive segments
▪ Target appropriately
Summary
▪ Discriminant Analysis
▪ Extremely useful segmentation analysis tool
▪ Intermediate step in the overall picture: helps classify prospects and devise the appropriate targeting strategies
