Cluster Analysis: Hierarchical Agglomerative Cluster Analysis; Use of a Created Cluster Variable in Secondary Analysis
Cluster Analysis
Cluster Analysis: Charles M. Friel Ph.D., Criminal Justice Center, Sam Houston State University
KEY CONCEPTS
*****
Cluster Analysis
Factor Analysis
Q-Analysis
Density Methods
Multivariate probability approaches (NORMIX, NORMAP)
Clumping Methods
Graphic Methods
Glyphs & Metroglyphs
Fourier Series
Chernoff Faces
Agglomeration Schedule
Fusion coefficient
Alternative ways to determine the optimal number of clusters
Criteria: clusters as internally homogeneous and significantly different from each other
Dendrogram
Scaled distance
Cluster scores
Profiling clusters
Using a cluster variable as an IV or DV in secondary analysis
Sokal, Robert & Sneath, Peter, Principles of Numerical Taxonomy (1963)
Steps in cluster analysis
Variable selection, construction of data base, testing assumptions
Selecting measure of similarity/distance
Selecting clustering algorithm
Determining number of clusters
Profile clusters
Validation
Cluster Analysis
Interdependency Technique
Assumptions
Cluster Analysis
Numerical Taxonomy
Q-Analysis
Typology Analysis
Classification Analysis
Example
Matrix of Dissimilarities
Subjects 1 2 3 … N
… … … … …
Technique
* Available in SPSS
An Example of Squared Euclidean Distances
Variable   Subject 1 (Si)   Subject 2 (Sj)   (Si - Sj)   (Si - Sj)^2
X1         18               19               -1          1
X2         15               17               -2          4
X3          9               10               -1          1
X4         12               10               +2          4
X5          0                1               -1          1
X6          1                1                0          0
X7          9                8               +1          1
Total      NA               NA               NA          12
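The arithmetic in the table above can be checked with a short sketch. It assumes only the seven scores listed for the two subjects; the function name `squared_euclidean` is illustrative and not part of any package. The squared Euclidean distance is the sum over variables of the squared differences between the two subjects' scores.

```python
# Scores of two subjects on the seven variables X1..X7 (values from the table above).
subject_1 = [18, 15, 9, 12, 0, 1, 9]
subject_2 = [19, 17, 10, 10, 1, 1, 8]


def squared_euclidean(a, b):
    """Sum of squared differences between two equal-length score profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of variables")
    return sum((x - y) ** 2 for x, y in zip(a, b))


distance = squared_euclidean(subject_1, subject_2)
print(distance)
```

Repeating this calculation for every pair of subjects fills in the matrix of dissimilarities that the clustering algorithm works from.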
Hierarchical methods
Non-hierarchical methods
Centroid Method *
Median Clustering
* Available in SPSS
Factor Analysis
Q-Factor Analysis
Density Methods
Clumping Methods
Graphic Methods
Glyphs
Metroglyphs
Fourier Series
Chernoff Faces
The variables
Age: age
Subjects 1 2 3 … 70
… … … … …
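Agglomeration Schedule
Columns: Stage, Cluster Combined (Cluster 1, Cluster 2), Coefficient, Stage Cluster First Appears (Cluster 1, Cluster 2), Next Stage
1 62 63 .255 0 0 40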
2 31 33 .610 0 0 37
3 2 3 1.021 0 0 43
4 7 8 1.502 0 0 31
5 29 30 1.984 0 0 45
6 14 15 2.495 0 0 31
7 52 67 3.031 0 0 34
8 18 19 3.588 0 0 49
9 46 47 4.191 0 0 35
10 27 28 4.803 0 0 44
11 36 40 5.437 0 0 33
12 9 13 6.095 0 0 49
13 48 49 6.760 0 0 51
14 32 38 7.435 0 0 42
15 20 21 8.128 0 0 39
16 22 64 8.844 0 0 39
17 35 39 9.580 0 0 52
18 5 12 10.324 0 0 36
19 23 24 11.093 0 0 29
20 57 59 11.878 0 0 32
21 37 43 12.702 0 0 42
22 6 10 13.551 0 0 55
23 1 4 14.439 0 0 28
24 11 45 15.358 0 0 46
25 41 44 16.284 0 0 33
26 55 56 17.220 0 0 41
27 51 66 18.237 0 0 48
28 1 50 19.329 23 0 47
29 17 23 20.483 0 19 38
30 54 69 21.732 0 0 41
31 7 14 23.076 4 6 46
32 57 58 24.425 20 0 53
33 36 41 25.784 11 25 40
34 52 53 27.173 7 0 51
35 42 46 28.626 0 9 58
36 5 16 30.251 18 0 54
37 31 34 32.018 2 0 62
38 17 68 33.905 29 0 59
39 20 22 35.806 15 16 57
40 36 62 37.855 33 1 56
41 54 55 39.918 30 26 50
42 32 37 42.118 14 21 52
43 2 65 44.428 3 0 47
44 25 27 46.758 0 10 45
45 25 29 49.344 44 5 59
46 7 11 52.395 31 24 54
47 1 2 55.709 28 43 63
48 26 51 59.223 0 27 61
49 9 18 62.772 12 8 57
50 54 70 66.383 41 0 65
51 48 52 70.076 13 34 60
52 32 35 73.798 42 17 58
53 57 60 77.659 32 0 65
54 5 7 81.736 36 46 55
55 5 6 86.189 54 22 64
56 36 61 90.955 40 0 66
57 9 20 97.853 49 39 60
58 32 42 105.430 52 35 62
59 17 25 114.736 38 45 67
60 9 48 125.105 57 51 61
61 9 26 136.517 60 48 63
62 31 32 150.461 37 58 68
63 1 9 167.695 47 61 64
64 1 5 194.756 63 55 66
65 54 57 222.045 50 53 67
66 1 36 258.210 64 56 68
67 17 54 298.955 59 65 69
68 1 31 361.556 66 62 69
69 1 17 483.000 68 67 0
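One common way to read the schedule is to look at how much the fusion coefficient rises at each of the final stages. The sketch below is a minimal illustration, assuming the coefficients for stages 60 to 69 as listed above and 70 cases; the helper names are hypothetical. It only reports the increases and the number of clusters before and after each stage. Choosing a cut point from them remains a judgment call.

```python
# Fusion coefficients for the last ten stages (60-69) of the schedule above.
N_CASES = 70
coefficients = {
    60: 125.105, 61: 136.517, 62: 150.461, 63: 167.695, 64: 194.756,
    65: 222.045, 66: 258.210, 67: 298.955, 68: 361.556, 69: 483.000,
}


def clusters_after(stage, n_cases=N_CASES):
    """Each stage merges two clusters, so n_cases - stage clusters remain."""
    return n_cases - stage


def coefficient_jumps(coefs):
    """Increase in the fusion coefficient produced by each stage, keyed by stage."""
    stages = sorted(coefs)
    return {s: coefs[s] - coefs[prev] for prev, s in zip(stages, stages[1:])}


jumps = coefficient_jumps(coefficients)
for stage in sorted(jumps):
    before = clusters_after(stage - 1)
    after = clusters_after(stage)
    print(f"stage {stage}: {before} -> {after} clusters, increase {jumps[stage]:.3f}")
```

A large increase at a stage suggests that two dissimilar clusters were forced together, so the solution just before that stage is a candidate.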
What is a Dendrogram?
* * * * * * HIERARCHICAL CLUSTER ANALYSIS * * * * * *
C A S E 0 5 10 15 20 25
Label Num +---------+---------+---------+---------+---------+
Case 62 62 -+
Case 63 63 -+
Case 36 36 -+
Case 40 40 -+-------------+
Case 41 41 -+ |
Case 44 44 -+ |
Case 61 61 -+ |
Case 6 6 -+ |
Case 10 10 -+ |
Case 5 5 -+---------+ +---------+
Case 12 12 -+ | | |
Case 16 16 -+ | | |
Case 11 11 -+ | | |
Case 45 45 -+ | | |
Case 7 7 -+ | | |
Case 8 8 -+ | | |
Case 14 14 -+ +---+ |
Case 15 15 -+ | |
Case 1 1 -+ | |
Case 4 4 -+ | |
Case 50 50 -+-----+ | |
Case 2 2 -+ | | |
Case 3 3 -+ | | |
Case 65 65 -+ | | |
Case 51 51 -+ +---+ |
Case 66 66 -+---+ | |
Case 26 26 -+ | | +-----------------------+
Case 48 48 -+ | | | |
Case 49 49 -+---+-+ | |
Case 52 52 -+ | | |
Case 67 67 -+ | | |
Case 53 53 -+ | | |
Case 20 20 -+ | | |
Case 21 21 -+-+ | | |
Case 22 22 -+ | | | |
Case 64 64 -+ +-+ | |
Case 18 18 -+ | | |
Case 19 19 -+-+ | |
Case 9 9 -+ | |
Case 13 13 -+ | |
Case 31 31 -+ | |
Case 33 33 -+---+ | |
Case 34 34 -+ | | |
Case 46 46 -+ +-------------------+ |
Case 47 47 -+-+ | |
Case 42 42 -+ +-+ |
Case 35 35 -+ | |
Case 39 39 -+-+ |
Case 32 32 -+ |
Case 38 38 -+ |
Case 37 37 -+ |
Case 43 43 -+ |
Case 23 23 -+ |
Case 24 24 -+ |
Case 17 17 -+-+ |
Case 68 68 -+ +-------------+ |
Case 29 29 -+ | | |
Case 30 30 -+-+ | |
Case 27 27 -+ | |
Case 28 28 -+ | |
Case 25 25 -+ +-------------------------------+
Case 55 55 -+ |
Case 56 56 -+ |
Case 54 54 -+---------+ |
Case 69 69 -+ | |
Case 70 70 -+ +-----+
Case 57 57 -+ |
Case 59 59 -+ |
Case 58 58 -+---------+
Case 60 60 -+
The Dendrogram
Interpretation
Cluster scores
5-Cluster Solution
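A cluster membership variable for the 5-cluster solution can be recovered by replaying the agglomeration schedule and stopping when five clusters remain. The sketch below is one way to do that, not the routine SPSS itself uses. It assumes the merged cluster keeps the number in the first "cluster combined" column, as the schedule suggests. The function name `cut_schedule` is hypothetical, and the final renumbering by lowest case number is an assumption made for readability.

```python
from collections import Counter

# (cluster 1, cluster 2) pairs for stages 1-69, copied from the agglomeration schedule.
MERGES = [
    (62, 63), (31, 33), (2, 3), (7, 8), (29, 30), (14, 15), (52, 67), (18, 19),
    (46, 47), (27, 28), (36, 40), (9, 13), (48, 49), (32, 38), (20, 21), (22, 64),
    (35, 39), (5, 12), (23, 24), (57, 59), (37, 43), (6, 10), (1, 4), (11, 45),
    (41, 44), (55, 56), (51, 66), (1, 50), (17, 23), (54, 69), (7, 14), (57, 58),
    (36, 41), (52, 53), (42, 46), (5, 16), (31, 34), (17, 68), (20, 22), (36, 62),
    (54, 55), (32, 37), (2, 65), (25, 27), (25, 29), (7, 11), (1, 2), (26, 51),
    (9, 18), (54, 70), (48, 52), (32, 35), (57, 60), (5, 7), (5, 6), (36, 61),
    (9, 20), (32, 42), (17, 25), (9, 48), (9, 26), (31, 32), (1, 9), (1, 5),
    (54, 57), (1, 36), (17, 54), (1, 31), (1, 17),
]


def cut_schedule(merges, n_cases, k):
    """Replay the first n_cases - k merges and return {case: cluster label}."""
    label = {case: case for case in range(1, n_cases + 1)}
    for keep, absorb in merges[: n_cases - k]:
        for case in label:
            if label[case] == absorb:
                label[case] = keep  # the merged cluster keeps the first number
    return label


raw = cut_schedule(MERGES, n_cases=70, k=5)

# Renumber the surviving clusters 1..5 in order of their lowest case number.
renumber = {old: new for new, old in enumerate(sorted(set(raw.values())), start=1)}
membership = {case: renumber[label] for case, label in raw.items()}
sizes = dict(sorted(Counter(membership.values()).items()))
print(sizes)
```

The resulting `membership` dictionary plays the role of the saved cluster variable, which can then be used to profile the clusters or as a grouping variable in later analyses.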
Cluster Means (Ward Method)
Variable    Cluster 1   Cluster 2   Cluster 3   Cluster 4   Cluster 5
SENTENCE     4.6         7.3         2.4         3.1        16.3
PR_CONV      1.5         4.8          .9          .9         2.1
DR_SCORE     7.5         5.7         3.3         3.0         8.1
AGE         21.6        24.7        21.3        20.6        30.2
AGE_FIRS    16.2        14.4        19.3        19.0        14.7
EDUC_EQV     7.3         3.4         3.3        10.7         5.3
SKL_INDX     6.0         2.8         2.5         8.1         3.8
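Variable    Cluster 1   Cluster 2   Cluster 3   Cluster 4   Cluster 5
Age         M           H           L           LL          HH
Age_Firs    M           LL          HH          H           L
Dr_Score    H           M           L           LL          HH
Educ_Eqv    H           L           LL          HH          M
Pr_Conv     M           HH          L           LL          H
Sentence    M           H           LL          L           HH
Skl_Indx    H           L           LL          HH          M
HH = highest cluster mean, H = high, M = middle, L = low, LL = lowest cluster mean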
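The letter codes in this profile can be derived from the cluster means by ranking the five clusters on each variable. The sketch below assumes that reading of the codes (LL for the lowest mean through HH for the highest) and uses the means reported for the Ward 5-cluster solution. Ties, such as the two clusters with a Pr_Conv mean of .9, are broken by cluster order, which is an arbitrary choice.

```python
# Cluster means for the five-cluster Ward solution (clusters 1-5, left to right).
MEANS = {
    "Age":      [21.6, 24.7, 21.3, 20.6, 30.2],
    "Age_Firs": [16.2, 14.4, 19.3, 19.0, 14.7],
    "Dr_Score": [7.5, 5.7, 3.3, 3.0, 8.1],
    "Educ_Eqv": [7.3, 3.4, 3.3, 10.7, 5.3],
    "Pr_Conv":  [1.5, 4.8, 0.9, 0.9, 2.1],
    "Sentence": [4.6, 7.3, 2.4, 3.1, 16.3],
    "Skl_Indx": [6.0, 2.8, 2.5, 8.1, 3.8],
}
LABELS = ["LL", "L", "M", "H", "HH"]  # lowest rank to highest rank


def rank_labels(means):
    """Label each cluster by the rank of its mean among the five clusters."""
    order = sorted(range(len(means)), key=lambda i: means[i])
    labels = [None] * len(means)
    for rank, cluster_index in enumerate(order):
        labels[cluster_index] = LABELS[rank]
    return labels


profile = {variable: rank_labels(means) for variable, means in MEANS.items()}
for variable, labels in profile.items():
    print(f"{variable:<9}" + "".join(f"{label:>4}" for label in labels))
```

Reading down a column of the printed profile gives a verbal description of each cluster, which is the basis for naming the clusters.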
Cluster 2
Cluster 3
Cluster 4
Cluster 5
Example
Ward Method   N
1             33
2              9
3             12
4              7
5              9
Interpretation
Multiple Comparisons
(I) Ward Method   (J) Ward Method   Mean Difference (I-J)   Std. Error   Sig.    95% CI Lower Bound   95% CI Upper Bound
1                 2                 -2.5152*                .5264        .000    -3.9921              -1.0382
1                 3                  1.2348                 .4718        .079     -.0891               2.5588
1                 4                  1.3420                 .5825        .157     -.2923               2.9763
1                 5                 -2.8485*                .5264        .000    -4.3254              -1.3716
2                 1                  2.5152*                .5264        .000     1.0382               3.9921
2                 3                  3.7500*                .6172        .000     2.0182               5.4818
2                 4                  3.8571*                .7054        .000     1.8779               5.8364
2                 5                  -.3333                 .6598        .987    -2.1847               1.5181
3                 1                 -1.2348                 .4718        .079    -2.5588                .0891
3                 2                 -3.7500*                .6172        .000    -5.4818              -2.0182
3                 4                   .1071                 .6657       1.000    -1.7607               1.9750
3                 5                 -4.0833*                .6172        .000    -5.8152              -2.3515
4                 1                 -1.3420                 .5825        .157    -2.9763                .2923
4                 2                 -3.8571*                .7054        .000    -5.8364              -1.8779
4                 3                  -.1071                 .6657       1.000    -1.9750               1.7607
4                 5                 -4.1905*                .7054        .000    -6.1697              -2.2112
5                 1                  2.8485*                .5264        .000     1.3716               4.3254
5                 2                   .3333                 .6598        .987    -1.5181               2.1847
5                 3                  4.0833*                .6172        .000     2.3515               5.8152
5                 4                  4.1905*                .7054        .000     2.2112               6.1697
Based on observed means.
*. The mean difference is significant at the .05 level.
SER_INDX
Tukey HSD (a,b,c)
Ward Method   N     Subset 1   Subset 2
4              7    2.1429
3             12    2.2500
1             33    3.4848
2              9               6.0000
5              9               6.3333
Sig.                .196       .982
Means for groups in homogeneous subsets are displayed.
Based on Type III Sum of Squares
The error term is Mean Square(Error) = 1.959.
a. Uses Harmonic Mean Sample Size = 10.445.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.
c. Alpha = .05.
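Two of the quantities in the post hoc output can be reproduced by hand: the harmonic mean sample size in footnote a, and the standard error of each pairwise mean difference. The sketch below assumes the five cluster sizes and the reported Mean Square(Error) of 1.959; the function names are illustrative. Because the error term is rounded in the output, the results agree with the tables only to about three decimals.

```python
import math

group_n = {1: 33, 2: 9, 3: 12, 4: 7, 5: 9}  # cluster sizes from the Ward solution
mse = 1.959                                  # Mean Square(Error) reported in the output


def harmonic_mean(sizes):
    """Harmonic mean of the group sizes: k / sum(1 / n_i)."""
    sizes = list(sizes)
    return len(sizes) / sum(1.0 / n for n in sizes)


def se_difference(n_i, n_j, mean_square_error):
    """Standard error of the difference between two group means."""
    return math.sqrt(mean_square_error * (1.0 / n_i + 1.0 / n_j))


n_h = harmonic_mean(group_n.values())
se_1_2 = se_difference(group_n[1], group_n[2], mse)
se_3_4 = se_difference(group_n[3], group_n[4], mse)
print(round(n_h, 3), round(se_1_2, 4), round(se_3_4, 4))
```

The standard errors differ from pair to pair only because the cluster sizes differ, which is why comparisons involving the small clusters are less precise.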
Example
Discriminant
Group Statistics
Ward Method   Variable   Valid N (listwise) Unweighted   Valid N (listwise) Weighted
1             COUNSEL    33                              33.000
1             JAIL_TM    33                              33.000
1             TM_DISP    33                              33.000
2             COUNSEL     9                               9.000
2             JAIL_TM     9                               9.000
2             TM_DISP     9                               9.000
3             COUNSEL    12                              12.000
3             JAIL_TM    12                              12.000
3             TM_DISP    12                              12.000
4             COUNSEL     7                               7.000
4             JAIL_TM     7                               7.000
4             TM_DISP     7                               7.000
5             COUNSEL     9                               9.000
5             JAIL_TM     9                               9.000
5             TM_DISP     9                               9.000
Total         COUNSEL    70                              70.000
Total         JAIL_TM    70                              70.000
Total         TM_DISP    70                              70.000
Analysis 1
Summary of Canonical Discriminant Functions
Eigenvalues
Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          .492(a)      89.5             89.5          .574
2          .042(a)       7.6             97.1          .200
3          .016(a)       2.9            100.0          .125
a. First 3 canonical discriminant functions were used in the analysis.
Wilks' Lambda
Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1 through 3           .633            29.686       12   .003
2 through 3           .945             3.678        6   .720
3                     .984             1.019        2   .601
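The summary statistics above all follow from the three eigenvalues. The sketch below is a hedged illustration: it assumes the standard relations (canonical correlation = sqrt(eigenvalue / (1 + eigenvalue)), Wilks' lambda = product of 1 / (1 + eigenvalue) over the functions being tested) and Bartlett's chi-square approximation with 70 cases, 3 discriminating variables and 5 groups. Because the eigenvalues are printed to three decimals, the results match the output only approximately.

```python
import math

eigenvalues = [0.492, 0.042, 0.016]       # from the Eigenvalues table
n_cases, n_vars, n_groups = 70, 3, 5      # cases, discriminating variables, clusters

total = sum(eigenvalues)
pct_variance = [100 * ev / total for ev in eigenvalues]
canonical_r = [math.sqrt(ev / (1 + ev)) for ev in eigenvalues]


def wilks_lambda(evs):
    """Wilks' lambda for the set of functions whose eigenvalues are given."""
    return math.prod(1 / (1 + ev) for ev in evs)


def bartlett_chi_square(lam, n, p, g):
    """Bartlett's chi-square approximation for testing Wilks' lambda."""
    return -(n - 1 - (p + g) / 2) * math.log(lam)


tests = []
for k in range(len(eigenvalues)):
    lam = wilks_lambda(eigenvalues[k:])            # functions k+1 through 3
    chi2 = bartlett_chi_square(lam, n_cases, n_vars, n_groups)
    df = (n_vars - k) * (n_groups - k - 1)
    tests.append((lam, chi2, df))

for (lam, chi2, df), r in zip(tests, canonical_r):
    print(f"lambda={lam:.3f} chi2={chi2:.2f} df={df} canonical r={r:.3f}")
```

Only the first test (functions 1 through 3) is significant in the output, so only the first discriminant function reliably separates the clusters on these three variables.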
Standardized Canonical Discriminant Function Coefficients
Variable   Function 1   Function 2   Function 3
COUNSEL     .549         .863         .523
JAIL_TM    -.627         .807         .607
TM_DISP     .102         .384        -.962
Structure Matrix
Variable   Function 1   Function 2   Function 3
JAIL_TM    -.867*        .488         .103
COUNSEL     .848*        .455         .271
TM_DISP    -.086         .555        -.827*
Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions.
Variables ordered by absolute size of correlation within function.
*. Largest absolute correlation between each variable and any discriminant function
Canonical Discriminant Function Coefficients
Variable     Function 1   Function 2   Function 3
COUNSEL      1.235        1.943        1.176
JAIL_TM      -.016         .020         .015
TM_DISP       .004         .015        -.039
(Constant)   -.304       -3.221        2.205
Unstandardized coefficients
Functions at Group Centroids
Ward Method   Function 1   Function 2   Function 3
1               .213        -.115        -.0976
2              -.803         .140        -.0289
3               .673         .366         .04822
4               .618        -.266         .291
5             -1.357        -.00151       .09600
Unstandardized canonical discriminant functions evaluated at group means
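The unstandardized coefficients turn a case's raw scores into discriminant scores, which can then be compared with the group centroids. The sketch below shows the arithmetic for a single case. The case's values are hypothetical (the coding and units of COUNSEL, JAIL_TM and TM_DISP are not given here), and assigning the case to the nearest centroid is a simplification of the classification rule a statistics package would apply.

```python
# Unstandardized coefficients for functions 1-3 and their constants.
COEFFICIENTS = {
    "COUNSEL": (1.235, 1.943, 1.176),
    "JAIL_TM": (-0.016, 0.020, 0.015),
    "TM_DISP": (0.004, 0.015, -0.039),
}
CONSTANTS = (-0.304, -3.221, 2.205)

# Functions evaluated at the group means (one centroid per cluster).
CENTROIDS = {
    1: (0.213, -0.115, -0.0976),
    2: (-0.803, 0.140, -0.0289),
    3: (0.673, 0.366, 0.04822),
    4: (0.618, -0.266, 0.291),
    5: (-1.357, -0.00151, 0.0960),
}


def discriminant_scores(case):
    """Score on each function: constant + sum of coefficient * raw value."""
    return tuple(
        CONSTANTS[f] + sum(COEFFICIENTS[var][f] * case[var] for var in COEFFICIENTS)
        for f in range(len(CONSTANTS))
    )


def nearest_centroid(scores):
    """Cluster whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((s - c) ** 2 for s, c in zip(scores, centroid))
    return min(CENTROIDS, key=lambda group: sq_dist(CENTROIDS[group]))


# Hypothetical raw scores, chosen only to illustrate the calculation.
hypothetical_case = {"COUNSEL": 1, "JAIL_TM": 30, "TM_DISP": 90}
scores = discriminant_scores(hypothetical_case)
group = nearest_centroid(scores)
print([round(s, 3) for s in scores], group)
```

Since only the first function was significant, differences among centroids on functions 2 and 3 should be read with caution.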