
Binder 3

The document discusses using a hierarchical clustering method to define the optimal number of clusters in a dataset and k-means clustering to form the clusters. It describes the elbow rule: plot the agglomeration coefficient (distance) against the number of clusters and select the elbow point. It also discusses validating the cluster analysis by examining the impact of initial seeds, the selected method, and the relevant variables. An SPSS example illustrates defining 4 clusters from a dataset.

Uploaded by Atiqul Islam
Copyright © All Rights Reserved

Suggested approach

1. First perform a hierarchical method to define the number of clusters
2. Then use the k-means procedure to actually form the clusters
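The two-step approach can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SPSS procedure: it assumes Euclidean distance and Ward's merge cost for the hierarchical step (the slides do not name a linkage method), and the function names and toy points are hypothetical. The hierarchical step is stopped at k clusters, and its cluster centroids become the initial seeds for k-means.

```python
"""Two-step clustering sketch: hierarchical (Ward-style) step picks seeds, k-means refines.

Assumptions (not from the slides): Euclidean distance, Ward's merge cost,
hypothetical helper names and toy data.
"""
from math import dist


def centroid(points):
    """Mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * dist(centroid(a), centroid(b)) ** 2


def hierarchical(points, k):
    """Naive agglomerative clustering: merge the cheapest pair until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected by the pop
    return clusters


def kmeans(points, seeds, max_iter=100):
    """Lloyd's k-means starting from the given seed centroids."""
    centres = list(seeds)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centres)), key=lambda c: dist(p, centres[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = centroid(members)
    return labels, centres


def two_step(points, k):
    """Step 1: hierarchical clustering to k clusters; step 2: k-means seeded with their centroids."""
    seeds = [centroid(c) for c in hierarchical(points, k)]
    return kmeans(points, seeds)


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
    labels, centres = two_step(data, k=2)
    print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Seeding k-means with the hierarchical centroids is one common way to combine the two steps. The naive pairwise search is O(n³), which is only acceptable for small datasets like the ones in these slides.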
Defining the number of clusters: elbow rule (1)

Agglomeration Schedule

                        Cluster Combined                    Stage Cluster First Appears
Stage  No. of clusters  Cluster 1  Cluster 2  Coefficients  Cluster 1  Cluster 2  Next Stage
0      12
1      11               4          7          .015          0          0          4
2      10               6          10         .708          0          0          5
3      9                8          9          .974          0          0          4
4      8                4          8          1.042         1          3          6
5      7                1          6          1.100         0          2          7
6      6                4          5          3.680         4          0          7
7      5                1          4          3.492         5          6          8
8      4                1          11         6.744         7          0          9
9      3                1          2          8.276         8          0          10
10     2                1          12         8.787         9          0          11
11     1                1          3          11.403        10         0          0
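Reading the schedule for the elbow amounts to looking at how much the coefficient grows from one stage to the next. The sketch below assumes the coefficients are copied by hand from the table above; each stage merges two clusters, so 12 − stage clusters remain after it. The helper names are hypothetical. The reported stage 7 coefficient (3.492) is lower than the stage 6 value (3.680), so that step prints as a negative jump.

```python
# Agglomeration coefficients for stages 1-11, copied from the 12-case schedule above.
COEFFICIENTS = [0.015, 0.708, 0.974, 1.042, 1.100, 3.680, 3.492, 6.744, 8.276, 8.787, 11.403]
N_CASES = 12


def clusters_after(stage, n_cases=N_CASES):
    """Each stage merges two clusters, so n_cases - stage clusters remain."""
    return n_cases - stage


def jumps(coefficients):
    """Change in the coefficient from the previous stage, keyed by the later stage (1-indexed)."""
    return {stage: round(coefficients[stage - 1] - coefficients[stage - 2], 3)
            for stage in range(2, len(coefficients) + 1)}


for stage, jump in jumps(COEFFICIENTS).items():
    print(f"stage {stage - 1} -> {stage} "
          f"({clusters_after(stage - 1)} -> {clusters_after(stage)} clusters): {jump:+.3f}")
```

The elbow is the point where these jumps become large; the listing only supports that judgment and does not make it.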
Elbow rule (2): the scree diagram

[Figure: scree diagram of Distance (y-axis, 0 to 12) against Number of clusters (x-axis, from 11 down to 1)]
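A scree diagram like this can be drawn with any plotting tool. The sketch below draws a text-only version so it needs nothing beyond the Python standard library. It assumes the coefficients from the 12-case schedule in elbow rule (1), and `scree_lines` is a hypothetical helper. Bars are scaled to the largest coefficient, and rows run from 11 clusters down to 1, as on the diagram's x-axis.

```python
# Agglomeration coefficients (stages 1-11) from the 12-case schedule in elbow rule (1).
COEFFICIENTS = [0.015, 0.708, 0.974, 1.042, 1.100, 3.680, 3.492, 6.744, 8.276, 8.787, 11.403]


def scree_lines(coefficients, n_cases, width=40):
    """One text row per stage: clusters remaining, then a bar scaled to the largest coefficient."""
    top = max(coefficients)
    lines = []
    for stage, coef in enumerate(coefficients, start=1):
        bar = "#" * round(width * coef / top)
        lines.append(f"{n_cases - stage:>2} | {bar} {coef:.3f}")
    return lines


print("\n".join(scree_lines(COEFFICIENTS, n_cases=12)))
```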
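Validating the analysis
• Impact of initial seeds / order of cases: rerun the analysis with different starting seeds or a different case order and check whether the clusters change
• Impact of the selected method: compare the solutions produced by different clustering methods
• Consider the relevance of the chosen set of variables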
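One way to check the impact of initial seeds is to rerun k-means from several random starting points and measure how much the resulting partitions agree. The sketch below assumes Euclidean k-means with seeds drawn from the data and uses the Rand index (the share of case pairs that two partitions treat alike) as the agreement measure. The function names and toy data are hypothetical and not part of SPSS. Values below 1.0 signal that the solution depends on the starting seeds.

```python
import random
from itertools import combinations
from math import dist


def kmeans(points, k, rng, max_iter=100):
    """Lloyd's k-means, seeded with k distinct data points drawn by rng."""
    centres = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist(p, centres[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def rand_index(a, b):
    """Share of case pairs both partitions treat alike (together or apart); 1.0 means identical."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def seed_stability(points, k, seeds=range(5)):
    """Rerun k-means with different random seeds and compare each run with the first."""
    runs = [kmeans(points, k, random.Random(s)) for s in seeds]
    return [rand_index(runs[0], r) for r in runs[1:]]


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
    print(seed_stability(data, k=2))
```

The Rand index ignores how clusters are numbered, which matters because different runs can label the same groups differently.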
SPSS Example
[Figure: scatter plot of the ten cases (MATTHEW, JULIA, LUCY, JENNIFER, NICOLE, JOHN, PAMELA, THOMAS, ARTHUR, FRED) on Component1 (x-axis) and Component2 (y-axis)]
Agglomeration Schedule

       Cluster Combined                    Stage Cluster First Appears
Stage  Cluster 1  Cluster 2  Coefficients  Cluster 1  Cluster 2  Next Stage
1      3          6          .026          0          0          8
2      2          5          .078          0          0          7
3      4          9          .224          0          0          5
4      1          7          .409          0          0          6
5      4          10         .849          3          0          8
6      1          8          1.456         4          0          7
7      1          2          4.503         6          2          9
8      3          4          9.878         1          5          9
9      1          3          18.000        7          8          0

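Number of clusters: 10 cases – 6 stages = 4 (the coefficient jumps from 1.456 at stage 6 to 4.503 at stage 7, so merging stops after stage 6)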
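The "cases minus stages" calculation can be replayed from the schedule. The sketch assumes a hypothetical stopping rule (stop just before the first coefficient jump that is larger than the average jump), which happens to reproduce the choice of stage 6 here; in practice, the elbow is a judgment call. It also replays the first six merges to list which case numbers end up together, relying on what the table shows: a merged cluster keeps its Cluster 1 number (for example, stage 5 merges cluster 4, formed at stage 3, with case 10). Case numbers are not mapped to names because the slides do not give the case order.

```python
# Agglomeration schedule of the SPSS example (10 cases): (cluster 1, cluster 2, coefficient).
SCHEDULE = [
    (3, 6, 0.026), (2, 5, 0.078), (4, 9, 0.224), (1, 7, 0.409), (4, 10, 0.849),
    (1, 8, 1.456), (1, 2, 4.503), (3, 4, 9.878), (1, 3, 18.000),
]
N_CASES = 10


def stopping_stage(schedule):
    """Hypothetical rule of thumb: stop just before the first jump larger than the average jump."""
    coefs = [c for _, _, c in schedule]
    jumps = [later - earlier for earlier, later in zip(coefs, coefs[1:])]
    mean_jump = sum(jumps) / len(jumps)
    for i, jump in enumerate(jumps):
        if jump > mean_jump:
            return i + 1  # jumps[i] leads from stage i + 1 into stage i + 2
    return len(schedule)


def memberships(schedule, n_cases, stop):
    """Replay merges 1..stop; the merged cluster keeps the Cluster 1 number, as in the table."""
    clusters = {case: [case] for case in range(1, n_cases + 1)}
    for c1, c2, _ in schedule[:stop]:
        clusters[c1].extend(clusters.pop(c2))
    return sorted(sorted(members) for members in clusters.values())


stop = stopping_stage(SCHEDULE)
print(N_CASES - stop, "clusters:", memberships(SCHEDULE, N_CASES, stop))
# → 4 clusters: [[1, 7, 8], [2, 5], [3, 6], [4, 9, 10]]
```

The replayed memberships are what the hierarchical step hands to the k-means step, either as the number of clusters or as starting centroids.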
[Figure: the same scatter plot of the ten cases on Component1 and Component2, with each case marked by its cluster number (1 to 4)]
Open the dataset supermarkets.sav
• From your N: directory (if you saved it there last time)
• Or download it from: http://www.rdg.ac.uk/~aes02mm/supermarket.sav
• Open it in SPSS
The supermarkets.sav dataset
