Topic 7 - Discriminant and Cluster Analysis

Uploaded by Ne Ne
Copyright © All Rights Reserved

TOPIC 7

Discriminant and Cluster Analysis
BRM9 - Group 1
Table of contents

01 Technique overview
02 Technique practicing
03 References
01 Technique overview
DISCRIMINANT ANALYSIS

A statistical technique used to classify observations into different categories based on a set of predictor variables.

It predicts a categorical outcome variable.

It identifies the characteristics that distinguish between different groups of customers.

It can be used to develop marketing strategies that are tailored to specific customer segments.
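A minimal sketch of how a two-group discriminant function can be built, assuming Fisher's linear discriminant with a pooled within-group covariance matrix, equal group priors, and pure Python with no statistics library. The function names and the toy data are hypothetical and for illustration only; SPSS additionally reports standardized coefficients and significance tests that are not reproduced here.

```python
# Two-group Fisher linear discriminant in pure Python (illustrative sketch).

def mean_vector(rows):
    """Column means of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_covariance(g1, g2):
    """Pooled within-group covariance matrix of two groups."""
    p = len(g1[0])
    s = [[0.0] * p for _ in range(p)]
    for rows in (g1, g2):
        m = mean_vector(rows)
        for r in rows:
            d = [r[j] - m[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j]
    dof = len(g1) + len(g2) - 2
    return [[v / dof for v in row] for row in s]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fisher_lda(g1, g2):
    """Return (weights, cutoff, score): classify x into group 1 if score(x) > cutoff."""
    m1, m2 = mean_vector(g1), mean_vector(g2)
    w = solve(pooled_covariance(g1, g2), [a - b for a, b in zip(m1, m2)])

    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x))

    # Midpoint between the two group centroids (assumes equal priors).
    cutoff = (score(m1) + score(m2)) / 2
    return w, cutoff, score


g1 = [[5, 4], [6, 5], [5, 6], [6, 4]]  # hypothetical group 1, two predictors
g2 = [[1, 2], [2, 1], [1, 1], [2, 2]]  # hypothetical group 2
w, cutoff, score = fisher_lda(g1, g2)
```

The weights `w` play the role of the discriminant function coefficients, and a new observation is assigned to whichever side of the cutoff its score falls on.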
CLUSTER ANALYSIS

A class of techniques used to classify objects or cases into relatively homogeneous groups called clusters.

Objects in each cluster tend to be similar to each other and relatively dissimilar to objects in the other clusters.

There is no a priori information about the group or cluster membership for any of the objects.
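A minimal sketch of one common clustering procedure, K-Means (Lloyd's algorithm), in pure Python with squared Euclidean distance. It assumes a deterministic farthest-first choice of starting centroids, which is our own simplification (SPSS and other tools use their own initialization rules); the function names and the toy points are hypothetical.

```python
# K-Means (Lloyd's algorithm) in pure Python (illustrative sketch).

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def k_means(points, k, max_iter=100):
    """Return (centroids, labels) where labels[i] is the cluster of points[i]."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


points = [[1, 1], [1.2, 0.8], [0.9, 1.1], [8, 8], [8.1, 7.9], [7.8, 8.2]]
centroids, labels = k_means(points, k=2)
```

No group labels are given to the algorithm: membership emerges only from the distances, which is the "no a priori information" property described above.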
In which situations, in terms of the variables' characteristics, is each technique appropriate?

Discriminant analysis
Useful for analyzing data when the criterion (dependent) variable is categorical and the predictor (independent) variables are interval scaled.

Two-group discriminant analysis: a discriminant analysis technique where the criterion variable has two categories.

Multiple discriminant analysis: a discriminant analysis technique where the criterion variable involves three or more categories.
Cluster analysis
Numerical: cluster analysis algorithms use numerical data to calculate distances between data points.

Continuous: cluster analysis algorithms typically work best with continuous data.

Uncorrelated: cluster analysis algorithms group data points together based on their similarity, and strongly correlated variables would give the dimension they share extra weight in that similarity measure.
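Before clustering, a quick check of these requirements can be sketched in pure Python, assuming hypothetical Likert-scale responses for three of the survey variables. The 0.8 threshold for flagging strongly correlated pairs is an arbitrary illustrative choice, not a rule from the source; `zscores` shows the usual fix when variables are on different scales.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def zscores(col):
    """Standardize a column to mean 0 and sample standard deviation 1."""
    n = len(col)
    m = sum(col) / n
    sd = sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
    return [(v - m) / sd for v in col]


def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for variable pairs with |r| >= threshold."""
    names = list(columns)
    out = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                out.append((a, b, r))
    return out


columns = {  # hypothetical responses from five students
    "UR": [5, 4, 3, 5, 2],
    "FA": [5, 4, 3, 4, 2],
    "CF": [3, 1, 4, 2, 5],
}
pairs = correlated_pairs(columns)
```

Here only UR and FA are flagged (r of about 0.94); such a pair would effectively count twice in every distance the clustering computes.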
The number of clusters is unknown a priori: cluster analysis algorithms can be used to identify the optimal number of clusters in the data.

The clusters are not well-defined: cluster analysis algorithms can be used to identify clusters that are not well-defined in terms of their shape or boundaries.
Differences

                   Discriminant Analysis                               Cluster Analysis
Number of groups   Known                                               Unknown
Groups             Well-defined                                        Not well-defined
Goal               Predict the group membership of new observations    Identify groups of similar data points
BUSINESS QUESTIONS

With discriminant analysis:
Predict the group membership of new data points
Identify the variables that are most important for discriminating between the groups
Assess the accuracy of the model in predicting the group membership of new data points

With cluster analysis:
Identify groups of similar data points
Understand the patterns in the data
Segment the data into different groups for targeted marketing or other purposes
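Assessing the accuracy of a discriminant model, the third discriminant-analysis question above, usually means comparing predicted with actual group membership. A minimal sketch in pure Python with hypothetical labels: the proportion of correctly classified cases is the hit ratio, and the classification table is a count of (actual, predicted) pairs.

```python
from collections import Counter


def hit_ratio(actual, predicted):
    """Proportion of cases whose predicted group equals the actual group."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


def classification_table(actual, predicted):
    """Count of cases for each (actual, predicted) pair."""
    return Counter(zip(actual, predicted))


actual = ["adopt", "adopt", "refuse", "refuse", "adopt"]      # hypothetical
predicted = ["adopt", "refuse", "refuse", "refuse", "adopt"]  # hypothetical
print(hit_ratio(actual, predicted))  # 0.8
```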
02 Technique practicing
Datasets: dependent variable and independent variables

1. Which factors significantly explain the differences between the online shopping adopting group and the online shopping refusing group?

Perceived value (PEV): consumers who perceive more value in online shopping are more likely to adopt online channels for shopping.

Computer skill (COM): consumers with higher computer skills are more likely to adopt online channels for shopping.

Shopping experience (SHE): consumers with more shopping experience are more likely to adopt online channels for shopping.

Income (INC): consumers with higher income are more likely to adopt online channels for shopping.

Education (EDU): consumers with higher education are more likely to adopt online channels for shopping.
Tests of Equality of Group Means

A predictor with Sig. > 0.05 is not significant in explaining the differences between consumers who adopt online channels and consumers who refuse to make online purchases.
2. What percentage of the differences between these two groups of consumers is explained by the predictors?

In other words: how much can the factors account for why the two groups of consumers differ from each other? One group adopts online shopping and the other refuses to adopt it, and we want to understand why they differ.
3. Among the significant factors, which one contributes most and which least to the differences between the adopting and refusing groups?

That is, which factors have the biggest and smallest impact on the differences between the group of consumers who adopt online shopping and the group that refuses to adopt it?

For example, a predictor with a standardized discriminant function coefficient of 0.5 (in absolute value) is more meaningful than one with a coefficient of 0.1.
4. Assume that there are two potential consumers with the following characteristics; identify which group (adopting or refusal) each of them may belong to.

Consumer  Shopping experience  Income  Education  Perceived value  Computer skill  Gender  Age
A         7                    2       5          3                8               Female  25
B         1                    6       7          6                3               Male    30

Compare each consumer's discriminant score with the group centroids (the mean discriminant scores of the adopting group and the refusal group): cases with scores near a centroid are predicted as belonging to that group.
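The nearest-centroid rule can be sketched in a few lines of Python. The centroid values below are hypothetical placeholders, not results from this analysis; in SPSS they come from the group centroids reported for the discriminant function.

```python
def nearest_centroid(score, centroids):
    """Return the group whose centroid (mean discriminant score) is closest to score."""
    return min(centroids, key=lambda group: abs(score - centroids[group]))


centroids = {"adopting": -0.6, "refusal": 0.9}  # hypothetical centroid values
print(nearest_centroid(0.5, centroids))   # refusal
print(nearest_centroid(-0.2, centroids))  # adopting
```

With two groups this is the same as comparing the score with the midpoint between the two centroids.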
Analysis of a survey conducted to understand students' choices to enroll at UEH:
1. As many groups as possible
2. 2 groups
3. 3 groups
4. Optimal number of groups
Datasets: independent variables measured on a 5-point Likert scale


As many groups as possible: K-Means cluster analysis; each case belongs to a specific cluster.
Two groups: K-Means clustering (k = 2); two distinct groups will be formed.
Three groups: K-Means clustering (k = 3); three distinct groups will be formed.
Optimal number of groups: hierarchical clustering identifies the optimal number of clusters.
Discriminant Analysis
Managers of an online retailer want to investigate the differences between two groups of consumers, those who adopt online channels for shopping and those who refuse to (OSA), based on several criteria: shopping experience (SHE), income (INC), education (EDU), perceived value (PEV), computer skill (COM), gender and age.

Dependent variable: OSA
Independent variables: SHE, INC, EDU, PEV, COM, GENDER AND AGE
Steps for running a two-group discriminant analysis in SPSS:
1. Select ANALYZE from the SPSS menu bar.
2. Click CLASSIFY and then DISCRIMINANT.
3. Move “OSA” into the GROUPING VARIABLE box.
4. Click DEFINE RANGE and enter 1 for MINIMUM and 2 for MAXIMUM.
5. Move “SHE,” “INC,” “EDU,” “PEV,” “COM,” “Gender” and “Age” into the INDEPENDENTS box.
6. Click STATISTICS.
7. Click CLASSIFY.
OUTPUT
1. Which factors significantly explain the differences between the online shopping adopting group and the online shopping refusing group?
Sig (SHE), Sig (Gender), Sig (Age) > 0.05: NOT significant in explaining the differences.

Sig (INC), Sig (EDU), Sig (PEV), Sig (COM) < 0.05: significantly explain the differences.

=> Income (INC), education (EDU), perceived value (PEV) and computer skill (COM) are the factors that significantly explain the differences between consumers who adopt and consumers who refuse to make online purchases.
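The "Tests of Equality of Group Means" table is a one-way ANOVA run separately for each predictor. A minimal sketch in pure Python, assuming hypothetical scores for one predictor in the two groups: it returns Wilks' lambda (within-group over total sum of squares) and the F statistic with its degrees of freedom. The Sig. value would additionally need the F distribution, for example `scipy.stats.f.sf(F, df1, df2)`, which is not used here.

```python
def group_mean_test(groups):
    """One-way ANOVA for a single predictor across groups.

    Returns (wilks_lambda, f_stat, df1, df2).
    """
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df1, df2 = k - 1, n - k
    f_stat = (ss_between / df1) / (ss_within / df2)
    wilks_lambda = ss_within / (ss_between + ss_within)
    return wilks_lambda, f_stat, df1, df2


adopt = [4, 5, 5, 4]   # hypothetical scores, adopting group
refuse = [2, 3, 2, 3]  # hypothetical scores, refusal group
lam, f_stat, df1, df2 = group_mean_test([adopt, refuse])
```

A small Wilks' lambda (close to 0) and a large F indicate that the group means of that predictor differ.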
2. What percentage of the differences between these two groups of consumers is explained by the predictors?

The canonical correlation coefficient is 0.57.

=> The predictors (income, education, perceived value and computer skill) explain 32.49% (= 0.57²) of the differences between the two groups.
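The arithmetic behind this answer, sketched in Python. The squared canonical correlation is the share of the variance in the discriminant scores associated with group membership. With a single discriminant function, the eigenvalue and Wilks' lambda follow from it directly; those two values are implied by the reported 0.57, not read from the SPSS output.

```python
from math import sqrt

rc = 0.57                         # canonical correlation from the output
rc_sq = rc ** 2                   # 0.3249, i.e. 32.49%
eigenvalue = rc_sq / (1 - rc_sq)  # between-group / within-group variation
wilks_lambda = 1 - rc_sq          # valid for one discriminant function
rc_back = sqrt(eigenvalue / (1 + eigenvalue))  # recovers the canonical correlation

print(f"{rc_sq:.2%}")  # 32.49%
```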
3. Among the significant factors, which one contributes most and which least to the differences between the adopting and refusing groups?

Shopping experience (SHE), gender and age have NEGATIVE coefficients and, as shown in question 1, are not significant.
=> They cannot be used to differentiate the two groups of consumers.

Education (0.585) is the MOST meaningful factor in explaining the differences between the two groups of consumers, followed by computer skill (0.308) and perceived value (0.242); income (0.074) is the LEAST meaningful.
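Ranking the significant predictors by the absolute size of these values can be sketched as below. The numbers are the ones reported above; treating them as standardized discriminant function coefficients is an assumption based on the rule stated earlier for comparing coefficients. Sorting on the absolute value matters because a large negative coefficient contributes as much as a large positive one.

```python
# Values reported above for the four significant predictors.
coefficients = {"EDU": 0.585, "COM": 0.308, "PEV": 0.242, "INC": 0.074}

# Larger absolute value = larger contribution to separating the groups.
ranking = sorted(coefficients, key=lambda v: abs(coefficients[v]), reverse=True)
print(" > ".join(ranking))  # EDU > COM > PEV > INC
```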
4. Assume that there are two potential consumers with the following characteristics; identify which group (adopting or refusal) each of them may belong to.
Consumer A: D = (0.038 × 2) + (0.324 × 5) + (0.133 × 3) + (0.185 × 8) + (-0.015 × 7) - 2.621 = 0.849
=> Near the refusal group centroid (0.949) -> REFUSAL GROUP

Consumer B: D = (0.038 × 6) + (0.324 × 7) + (0.133 × 6) + (0.185 × 3) + (-0.015 × 1) - 2.621 = 1.213
=> Near the refusal group centroid (0.949) -> REFUSAL GROUP

=> Cases with scores near a centroid are predicted as belonging to that group.
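The two scores can be reproduced with a short Python sketch. The coefficients and constant are the ones used in the formulas; the mapping of each consumer's values to SHE, INC, EDU, PEV and COM follows the column order of the table in question 4. Only the refusal group centroid (0.949) appears in this material, so the sketch reports the distance to that centroid rather than a full nearest-centroid comparison, which would also need the adopting-group centroid from the SPSS "Functions at Group Centroids" table.

```python
# Unstandardized discriminant function coefficients used in the formulas.
COEF = {"INC": 0.038, "EDU": 0.324, "PEV": 0.133, "COM": 0.185, "SHE": -0.015}
CONSTANT = -2.621
REFUSAL_CENTROID = 0.949

consumers = {
    "A": {"SHE": 7, "INC": 2, "EDU": 5, "PEV": 3, "COM": 8},
    "B": {"SHE": 1, "INC": 6, "EDU": 7, "PEV": 6, "COM": 3},
}


def discriminant_score(x):
    """D = constant + sum of coefficient * value over the predictors."""
    return CONSTANT + sum(COEF[v] * x[v] for v in COEF)


for name, x in consumers.items():
    d = discriminant_score(x)
    gap = abs(d - REFUSAL_CENTROID)
    print(f"{name}: D = {d:.3f}, distance to refusal centroid = {gap:.3f}")
```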
Cluster Analysis
UEH DATA
The survey was conducted to investigate students' choice to enrol at UEH. The sample includes 50 participants who are currently UEH students. The questionnaire measures several key factors (5-point Likert scale):
University reputation – UR
Lecturers – FA
Learning program – PC
Financial support – CF
Facility – FACI
Student career development – CD
Social influence – PI
Classify as many groups as possible

Analyze > Classify > Hierarchical Cluster
Move the variables and label cases by SID.
Choose Plots, then tick Dendrogram.
Choose Method, then choose Ward's method and Squared Euclidean distance.
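What Ward's method with squared Euclidean distance does can be sketched in pure Python: start with every case as its own cluster and repeatedly merge the pair whose union increases the total within-cluster sum of squares the least. This is a naive O(n³) sketch; `ward` and `cut` are hypothetical helpers of our own, the cases are made up, and the merge costs recorded here are not guaranteed to match the scale of the coefficients SPSS prints in its agglomeration schedule.

```python
# Ward's agglomerative clustering on squared Euclidean distance (sketch).

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward(points):
    """Return merge history as (members_a, members_b, cost) tuples, where cost is
    the increase in total within-cluster sum of squares caused by that merge."""
    clusters = [([i], list(p)) for i, p in enumerate(points)]  # (members, centroid)
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                na, nb = len(ma), len(mb)
                cost = na * nb / (na + nb) * sq_dist(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        history.append((ma, mb, cost))
        del clusters[b]  # b > a, so index a is still valid
        clusters[a] = (ma + mb, centroid)
    return history


def cut(history, n_points, k):
    """Replay merges until k clusters remain; return sorted member lists."""
    groups = [{i} for i in range(n_points)]
    for ma, mb, _ in history[: n_points - k]:
        ga = next(g for g in groups if ma[0] in g)
        gb = next(g for g in groups if mb[0] in g)
        groups.remove(gb)
        ga |= gb
    return sorted(sorted(g) for g in groups)


points = [[1, 1], [1, 2], [5, 5], [5, 6], [9, 1]]  # hypothetical cases
history = ward(points)
```

Cutting the tree at different heights corresponds to choosing different numbers of clusters from the same dendrogram.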
Classify as many groups as possible

Using the classification tree (dendrogram) to determine the number of clusters is a selective process: the analyst decides where to cut the tree. Cutting it at the lowest level gives 50 clusters, one per student.

Create 50 clusters by Analyze > Classify > K-Means Cluster:
Move the variables and set label cases by SID.
Type “50” in Number of Clusters.
Classify as 2 groups

Create 2 clusters by K-Means Cluster.

Group 1: Moderate Satisfaction (19 students)
Group 2: High Satisfaction with Emphasis on Reputation and Teaching (31 students)
Classify as 3 groups

Group 1: Moderate Satisfaction (3 students)
Group 2: High Satisfaction Group (27 students)
Group 3: Moderate Satisfaction Group with Emphasis on Reputation and Lecturers (20 students)
Optimal number of groups
Reading the dendrogram from the right, there are two clear clusters between distances 10 and 25.
The gap is bridged between 3 clusters and 4 clusters.
However, moving to 5 clusters brings a sudden jump (gap).
=> The solution just before the gap indicates a good solution; in this case, 4 clusters.
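The "sudden jump" rule can also be applied numerically to the coefficients column of an agglomeration schedule. A minimal sketch, assuming the coefficients are listed stage by stage in increasing order; the schedules below are hypothetical, and in practice the result should still be checked against the dendrogram and the interpretability of the clusters.

```python
def clusters_before_largest_jump(coefficients):
    """Suggest a number of clusters from an agglomeration schedule.

    coefficients[i] is the fusion coefficient at stage i + 1 (n - 1 stages
    for n cases). The largest increase between consecutive stages marks the
    merge of two dissimilar clusters, so we stop just before it.
    """
    if len(coefficients) < 2:
        raise ValueError("need at least two stages")
    n_cases = len(coefficients) + 1
    jumps = [coefficients[i + 1] - coefficients[i] for i in range(len(coefficients) - 1)]
    i = max(range(len(jumps)), key=jumps.__getitem__)
    # After stage i + 1 (1-based), n_cases - (i + 1) clusters remain.
    return n_cases - (i + 1)


schedule = [0.5, 1.0, 2.0, 3.0, 20.0]  # hypothetical coefficients for 6 cases
print(clusters_before_largest_jump(schedule))  # 2
```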
Optimal number of groups

Using K-Means Cluster, we create 4 groups:
Group 1: High Satisfaction (20 students)
Group 2: High Satisfaction with Varied Preferences (11 students)
Group 3: Low Satisfaction Group (3 students)
Group 4: Moderate Satisfaction with Emphasis on Reputation (16 students)
Thanks for listening
03 References
(1) Malhotra NK. Marketing research: an applied orientation. Upper Saddle River, NJ; London: Prentice Hall; 2010.
