SPSS Tutorial Cluster Analysis

This document provides an overview of cluster analysis techniques. Cluster analysis groups cases into relatively homogeneous clusters. It has various applications in marketing research, such as market segmentation, understanding buyer behaviour, and identifying new product opportunities. The key steps are selecting a distance measure, selecting a clustering algorithm, determining the number of clusters, and validating the analysis. Hierarchical and k-means clustering are common algorithms. The number of clusters can be chosen by examining the agglomeration schedule for a large jump in the distance coefficients or by drawing a scree diagram. The SPSS example runs principal components analysis, hierarchical clustering with Ward's method, and k-means clustering on the supermarkets.sav dataset to identify clusters of similar consumers.

Uploaded by

Anirban Bhowmick

SPSS Tutorial

AEB 37 / AE 802
Marketing Research Methods
Week 7
Cluster analysis
Lecture / Tutorial outline
• Cluster analysis
• Example of cluster analysis
• Work on the assignment
Cluster Analysis
• It is a class of techniques used to
classify cases into groups that are
relatively homogeneous within
themselves and heterogeneous
between each other, on the basis of
a defined set of variables. These
groups are called clusters.
Cluster Analysis and
marketing research
• Market segmentation. E.g. clustering of
consumers according to their attribute
preferences
• Understanding buyer behaviour.
Consumers with similar
behaviours/characteristics are clustered
• Identifying new product opportunities.
Clusters of similar brands/products can help
identify competitors / market opportunities
• Reducing data. E.g. in preference mapping
Steps to conduct a
Cluster Analysis
1. Select a distance measure
2. Select a clustering algorithm
3. Determine the number of clusters
4. Validate the analysis
[Figure: scatter plot of REGR factor score 1 for analysis 1 (y-axis, -4 to 3) against REGR factor score 2 for analysis 1 (x-axis, -3 to 4)]


Defining distance: the
Euclidean distance
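$$D_{ij} = \sqrt{\sum_{k=1}^{n} \left( x_{ki} - x_{kj} \right)^2}$$

D_ij: distance between cases i and j
x_ki: value of variable X_k for case i
x_kj: value of variable X_k for case j
n: number of variables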
Problems:
• Different measures = different weights
• Correlation between variables (double
counting)
Solution: Principal component analysis
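
The first problem (different measures = different weights) can be seen in a small sketch. The cases below are invented for illustration, and the `standardise` helper is an assumption of this example rather than part of the tutorial: it only rescales the variables and does nothing about correlation, which is why the slides go on to principal component analysis.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two cases given as equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def standardise(rows):
    """Rescale each column to mean 0 and (population) standard deviation 1.

    Assumes no column is constant, otherwise the division below fails.
    """
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

# Hypothetical cases: (yearly household income, household size)
cases = [[20000, 1], [20500, 5], [30000, 1]]

# On the raw scale, income dominates and household size barely matters.
raw_01 = euclidean(cases[0], cases[1])
raw_02 = euclidean(cases[0], cases[2])

# After standardising, both variables contribute on a comparable scale.
z = standardise(cases)
std_01 = euclidean(z[0], z[1])
std_02 = euclidean(z[0], z[2])

print(raw_01, raw_02)
print(std_01, std_02)
```

On the raw data, case 2 looks about twenty times farther from case 0 than case 1 does, only because income is measured in larger units. After standardising, the two distances are of similar size.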
Clustering procedures
• Hierarchical procedures
– Agglomerative (start from n clusters,
to get to 1 cluster)
– Divisive (start from 1 cluster, to get to
n clusters)
• Non-hierarchical procedures
– K-means clustering
Agglomerative clustering
• Linkage methods
– Single linkage (minimum distance)
– Complete linkage (maximum distance)
– Average linkage
• Ward’s method
1. Compute sum of squared distances within clusters
2. Aggregate clusters with the minimum increase in the
overall sum of squares
• Centroid method
– The distance between two clusters is defined as the
distance between their centroids (cluster averages)
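
A minimal sketch of how the linkage and centroid definitions differ, assuming Euclidean distance between cases. The function names and the five sample points are hypothetical, and Ward's method is left out to keep the example short. This is a teaching sketch, not how SPSS implements the procedure.

```python
import math
from itertools import combinations

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(cluster):
    """Cluster average, variable by variable."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]

def cluster_distance(c1, c2, method):
    """Distance between two clusters under the chosen method."""
    pairwise = [dist(a, b) for a in c1 for b in c2]
    if method == "single":       # minimum distance
        return min(pairwise)
    if method == "complete":     # maximum distance
        return max(pairwise)
    if method == "average":      # average over all pairs
        return sum(pairwise) / len(pairwise)
    if method == "centroid":     # distance between cluster averages
        return dist(centroid(c1), centroid(c2))
    raise ValueError(f"unknown method: {method}")

def agglomerate(points, method="single"):
    """Start from n clusters, merge the closest pair until 1 remains.

    Returns the merge distance recorded at each stage.
    """
    clusters = [[p] for p in points]
    schedule = []
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]], method),
        )
        schedule.append(cluster_distance(clusters[i], clusters[j], method))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return schedule

# Hypothetical data: two tight pairs and one outlying point
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0), (10.0, 0.0)]
single = agglomerate(points, "single")
complete = agglomerate(points, "complete")
print(single)
print(complete)
```

Both methods merge the two tight pairs first, but the later stages are recorded at larger distances under complete linkage, because it measures clusters by their farthest members.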
K-means clustering
1. The number k of clusters is fixed
2. An initial set of k “seeds” (aggregation centres) is
provided
• First k elements
• Other seeds
3. Given a certain threshold, all units are assigned to
the nearest cluster seed
4. New seeds are computed
5. Go back to step 3 until no reclassification is
necessary
Units can be reassigned in successive steps
(optimising partitioning)
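
The steps above can be sketched as follows. This is an illustrative version under simplifying assumptions: the first k elements are used as seeds, Euclidean distance is used, and the threshold mentioned in step 3 is ignored so that every unit is always assigned to its nearest seed. The data points are invented.

```python
import math

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(points, k, max_iter=100):
    """Return (labels, seeds) for a fixed number k of clusters."""
    # Step 2: the first k elements serve as initial seeds.
    seeds = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Step 3: assign every unit to the nearest cluster seed.
        new_labels = [min(range(k), key=lambda c: dist(p, seeds[c])) for p in points]
        # Step 5: stop when no reclassification is necessary.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: recompute each seed as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old seed if a cluster is empty
                seeds[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, seeds

# Hypothetical data: two well-separated groups
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8), (0.8, 1.1), (4.9, 5.1)]
labels, seeds = k_means(points, k=2)
print(labels)
print(seeds)
```

Because the result depends on the initial seeds, changing the order of the cases can change the clusters, which is one reason the slides list the seeds as an arbitrary choice.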
Hierarchical vs non-hierarchical
methods
Hierarchical clustering
• No decision about the number of clusters
• Problems when data contain a high level of error
• Can be very slow
• Initial decisions are more influential (one-step only)
Non-hierarchical clustering
• Faster, more reliable
• Need to specify the number of clusters (arbitrary)
• Need to set the initial seeds (arbitrary)
Suggested approach
1. First perform a hierarchical
method to define the number of
clusters
2. Then use the k-means procedure
to actually form the clusters
Defining the number of
clusters: elbow rule (1)
Agglomeration Schedule

        Number of    Cluster Combined                       Stage Cluster First Appears
Stage   clusters     Cluster 1   Cluster 2   Coefficients   Cluster 1   Cluster 2        Next Stage
0       12
1       11           4           7           .015           0           0                4
2       10           6           10          .708           0           0                5
3       9            8           9           .974           0           0                4
4       8            4           8           1.042          1           3                6
5       7            1           6           1.100          0           2                7
6       6            4           5           3.680          4           0                7
7       5            1           4           3.492          5           6                8
8       4            1           11          6.744          7           0                9
9       3            1           2           8.276          8           0                10
10      2            1           12          8.787          9           0                11
11      1            1           3           11.403         10          0                0
Elbow rule (2): the
scree diagram
[Figure: scree diagram plotting distance (y-axis, 0 to 12) against the number of clusters (x-axis, 11 down to 1)]
Validating the
analysis
• Impact of initial seeds / order of
cases
• Impact of the selected method
• Consider the relevance of the
chosen set of variables
SPSS Example
[Figure: scatter plot of ten cases (MATTHEW, JULIA, LUCY, JENNIFER, NICOLE, JOHN, PAMELA, THOMAS, ARTHUR, FRED) on Component 1 (x-axis, -1.5 to 2.0) and Component 2 (y-axis, -2.0 to 1.5)]
Agglomeration Schedule

        Cluster Combined                       Stage Cluster First Appears
Stage   Cluster 1   Cluster 2   Coefficients   Cluster 1   Cluster 2        Next Stage
1       3           6           .026           0           0                8
2       2           5           .078           0           0                7
3       4           9           .224           0           0                5
4       1           7           .409           0           0                6
5       4           10          .849           3           0                8
6       1           8           1.456          4           0                7
7       1           2           4.503          6           2                9
8       3           4           9.878          1           5                9
9       1           3           18.000         7           8                0

Number of clusters: 10 cases - 6 (the stage after which the coefficient jumps, from 1.456 to 4.503) = 4
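
A small sketch of one way to mechanise this reading of the agglomeration schedule. In the slides the jump is identified by eye; taking the largest relative increase between consecutive coefficients is an assumption of this example, and other criteria can point to a different stage. The coefficients are the ones from the schedule for the ten named cases.

```python
def clusters_from_schedule(coefficients, n_cases):
    """Return (elbow_stage, number_of_clusters).

    The elbow stage is taken to be the stage followed by the largest
    relative jump in the coefficient (a heuristic; it assumes all
    coefficients are positive).
    """
    ratios = [later / earlier
              for earlier, later in zip(coefficients, coefficients[1:])]
    elbow_stage = ratios.index(max(ratios)) + 1  # stages are numbered from 1
    return elbow_stage, n_cases - elbow_stage

# Coefficients for stages 1 to 9 of the agglomeration schedule
coefficients = [0.026, 0.078, 0.224, 0.409, 0.849, 1.456, 4.503, 9.878, 18.000]
stage, k = clusters_from_schedule(coefficients, n_cases=10)
print(stage, k)
```

With these figures the heuristic selects stage 6 and therefore 10 - 6 = 4 clusters, matching the count above. The early stages also show large relative increases on very small coefficients, so the result is worth checking against a scree diagram.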
[Figure: the same scatter plot of Component 1 against Component 2, with the ten cases marked by cluster number (1 to 4)]
Open the dataset
supermarkets.sav
• From your N: directory (if you saved it
there last time)
• Or download it from:
http://www.rdg.ac.uk/~aes02mm/supermarket.sav
• Open it in SPSS
The supermarkets.sav
dataset
Run Principal
Components Analysis
and save scores
• Select the variables to perform the
analysis
• Set the rule to extract principal
components
• Give instruction to save the
principal components as new
variables
Cluster analysis:
basic steps
• Apply Ward’s method to the
principal component scores
• Check the agglomeration schedule
• Decide the number of clusters
• Apply the k-means method
Analyse / Classify
Select the component
scores
[Screenshot callouts: “Select from here”; “Untick this”]
Select Ward’s algorithm
[Screenshot callouts: “Click here first”; “Select method here”]
Output: Agglomeration
schedule
Number of clusters
Identify the step where the “distance coefficient” makes the biggest
jump
The scree diagram
(Excel needed)
[Figure: scree diagram plotting distance (y-axis, 0 to 800) against the agglomeration step (x-axis, steps 118 to 148)]
Number of clusters
Number of cases: 150
Step of ‘elbow’: 144
Number of clusters: 150 - 144 = 6
Now repeat the
analysis
• Choose the k-means technique
• Set 6 as the number of clusters
• Save cluster number for each case
• Run the analysis
K-means
K-means dialog box
[Screenshot callout: “Specify number of clusters”]
Save cluster membership
[Screenshot callouts: “Click here first”; “Tick here”]
Final output
Cluster membership
Component meaning
(tutorial week 5)
1. “Old Rich Big Spender”
2. Family shopper
3. Vegetarian TV lover
4. Organic radio listener
5. Vegetarian TV and web hater

Component Matrix (a)

                                  Component
                                  1           2           3           4           5
Monthly amount spent              .810        -.294       -4.26E-02   .183        .173
Meat expenditure                  .480        -.152       .347        .334        -5.95E-02
Fish expenditure                  .525        -.206       -.475       -4.35E-02   .140
Vegetables expenditure            .192        -.345       -.127       .383        .199
% spent in own-brand product      .646        -.281       -.134       -.239       -.207
Own a car                         .536        .619        -.102       -.172       6.008E-02
% spent in organic food           .492        -.186       .190        .460        .342
Vegetarian                        1.784E-02   -9.24E-02   .647        -.287       .507
Household Size                    .649        .612        .135        -6.12E-02   -3.29E-03
Number of kids                    .369        .663        .247        .184        1.694E-02
Weekly TV watching (hours)        .124        -9.53E-02   .462        .232        -.529
Weekly Radio listening (hours)    2.989E-02   .406        -.349       .559        -8.14E-02
Surf the web                      .443        -.271       .182        -5.61E-02   -.465
Yearly household income           .908        -4.75E-02   -7.46E-02   -.197       -3.26E-02
Age of respondent                 .891        -5.64E-02   -6.73E-02   -.228       6.942E-04

Extraction Method: Principal Component Analysis.
a. 5 components extracted.
Final Cluster Centers

                                     Cluster
                                     1          2          3          4          5          6
REGR factor score 1 for analysis 1   -1.34392   .21758     .13646     .77126     .40776     .72711
REGR factor score 2 for analysis 1   .38724     -.57755    -1.12759   .84536     .57109     -.58943
REGR factor score 3 for analysis 1   -.22215    -.09743    1.41343    .17812     1.05295    -1.39335
REGR factor score 4 for analysis 1   .15052     -.28837    -.30786    1.09055    -1.34106   .04972
REGR factor score 5 for analysis 1   .04886     -.93375    1.23631    -.11108    .31902     .87815
Cluster interpretation
through mean component values
• Cluster 1 is very far from profile 1 (-1.34) and
more similar to profile 2 (0.39)
• Cluster 2 is very far from profile 5 (-0.93) and
not particularly similar to any profile
• Cluster 3 is extremely similar to profiles 3 and 5
and very far from profile 2
• Cluster 4 is similar to profiles 2 and 4
• Cluster 5 is very similar to profile 3 and very far
from profile 4
• Cluster 6 is very similar to profile 5 and very far
from profile 3
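
The same reading of the Final Cluster Centers table can be sketched in code. The values are copied from that table; treating the highest mean score as the "most similar" profile and the lowest as the "farthest" profile is a simplification made for this example, so it reports only one profile per cluster even where the slide names two.

```python
# Final cluster centres: mean score on components 1 to 5 for each cluster
centres = {
    1: [-1.34392, 0.38724, -0.22215, 0.15052, 0.04886],
    2: [0.21758, -0.57755, -0.09743, -0.28837, -0.93375],
    3: [0.13646, -1.12759, 1.41343, -0.30786, 1.23631],
    4: [0.77126, 0.84536, 0.17812, 1.09055, -0.11108],
    5: [0.40776, 0.57109, 1.05295, -1.34106, 0.31902],
    6: [0.72711, -0.58943, -1.39335, 0.04972, 0.87815],
}

def describe(centre):
    """Return (most similar profile, farthest profile), numbered from 1."""
    closest = max(range(len(centre)), key=lambda i: centre[i]) + 1
    farthest = min(range(len(centre)), key=lambda i: centre[i]) + 1
    return closest, farthest

summary = {cluster: describe(centre) for cluster, centre in centres.items()}
for cluster, (closest, farthest) in summary.items():
    print(f"Cluster {cluster}: closest to profile {closest}, "
          f"farthest from profile {farthest}")
```

How large a mean score must be before a cluster counts as "similar" to a profile remains a judgement call. Cluster 2, for example, has no clearly positive score on any component.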
Which cluster to
target?
• Objective: target the organic
consumer
• Which cluster looks most
“organic”?
• Compute the descriptive statistics
on the original variables for that
cluster
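
A hedged sketch of this last step outside SPSS, assuming the cluster membership has been saved for each case. The records below are invented for illustration and are not taken from supermarkets.sav; the variable shown is the % spent in organic food.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (saved cluster number, % spent in organic food)
records = [(1, 5.0), (4, 30.0), (4, 26.0), (2, 8.0), (1, 7.0), (2, 10.0)]

def mean_by_cluster(records):
    """Average an original variable within each cluster."""
    groups = defaultdict(list)
    for cluster, value in records:
        groups[cluster].append(value)
    return {cluster: mean(values) for cluster, values in groups.items()}

organic_share = mean_by_cluster(records)
# The cluster with the highest average organic share is a candidate target.
target = max(organic_share, key=organic_share.get)
print(organic_share, target)
```

The same grouping can be repeated for the other original variables (income, age, household size) to profile the chosen cluster.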
Representation of factors 1
and 4
(and cluster membership)
[Figure: scatter plot of REGR factor score 1 for analysis 1 (x-axis, -3 to 2) against REGR factor score 4 for analysis 1 (y-axis, -3 to 3), with cases marked by cluster number (1 to 6)]
