Cluster Analysis: Hierarchical Agglomerative Cluster Analysis and Use of a Created Cluster Variable in Secondary Analysis

Cluster analysis is a technique used to group subjects into clusters based on similarities across multiple variables. It involves creating a distance matrix to quantify the similarity between each pair of subjects, then using clustering algorithms to sort subjects into groups that are as internally homogeneous as possible but distinctly different from other groups. There are many options for measuring distances, clustering algorithms, and determining the optimal number of clusters, and results can vary, so there is no single best approach to cluster analysis.


Cluster Analysis

• Hierarchical agglomerative cluster analysis
• Use of a created cluster variable in secondary analysis

Cluster Analysis: Charles M. Friel Ph.D., Criminal Justice Center, Sam Houston State University

KEY CONCEPTS
Cluster Analysis

Research questions addressed by cluster analysis


Cluster analysis assumptions
Alternative names for cluster analysis
Caveats in using cluster analysis
Similarity/dissimilarity matrix, also called a distance matrix
• Squared Euclidean distance
• Euclidean distance
• Cosine of vector variables
• City block (Manhattan distance)
• Chebychev distance metric
• Distances in absolute power metric
• Pearson product-moment correlation coefficient
• Minkowski metric
• Mahalanobis D2
• Jaccard's coefficient(s)
• Gower's coefficient
• Simple matching coefficient
Cluster-seeking vs. cluster-imposing methods
Clustering algorithms
• Hierarchical Methods
Agglomerative Methods
Single linkage (nearest neighbor)
Complete linkage (furthest neighbor)
Average linkage
Ward's error sum of squares
Centroid method
Median clustering
Divisive Methods
K-means clustering
Trace methods
A Splinter-Average Distance method
Automatic Interaction Detection (AID)
• Non-Hierarchical Methods
Iterative Methods
Sequential threshold method
Parallel threshold method
Optimizing methods

KEY CONCEPTS (CONT.)


Factor Analysis
Q-Analysis
Density Methods
Multivariate probability approaches
(NORMIX, NORMAP)
Clumping Methods
Graphic Methods
Glyphs & Metroglyphs
Fourier Series
Chernoff Faces
Agglomeration Schedule
Fusion coefficient
Alternative ways to determine the optimal number of clusters
Criteria: clusters as internally homogeneous and significantly different from each other
Dendrogram
Scaled distance
Cluster scores
Profiling clusters
Using a cluster variable as an IV or DV in secondary analysis
Sokal, Robert & Sneath, Peter, Principles of Numerical Taxonomy (1963)
Steps in cluster analysis
Variable selection, construction of data base, testing assumptions
Selecting measure of similarity/distance
Selecting clustering algorithm
Determining number of clusters
Profile clusters
Validation


Cluster Analysis

Interdependency Technique

• Designed to group a sample of subjects into significantly different groups, based upon a number of variables
• The groups are constructed to be as different from one another as statistically possible, and as internally homogeneous as statistically possible

Assumptions

• The sample needs to be representative of the population
• Multicollinearity among the variables should be minimal
• Absence of outliers and a good N-to-k ratio


Cluster Analysis by Other Names

Similar techniques have been developed independently in various fields (e.g. biology, archeology, etc.), giving rise to different names for this statistical technique:

• Cluster Analysis
• Numerical Taxonomy
• Q-Analysis
• Typology Analysis
• Classification Analysis

There are a number of different clustering techniques, depending upon …

• The procedure used to measure the similarity or distance among subjects
• And the clustering algorithm used


Caveats in Using Cluster Analysis

• There is no one best way to perform a cluster analysis
• There are many methods, and most lack rigorous statistical reasoning or proofs
• Cluster analysis is used in different disciplines, which favor different techniques for measuring the similarity or distance among subjects relative to the variables, and for the clustering algorithm used
• Different clustering techniques can produce different cluster solutions
• Cluster analysis is supposed to be "cluster-seeking", but in fact it is "cluster-imposing"


Applications of Cluster Analysis

Cluster analysis seeks to reduce a sample of cases to a few statistically different groups, i.e. clusters, based upon differences/similarities across a set of multiple variables.

A useful tool for constructing typologies among cases.

Example

Is each case filed with the court unique, or can cases be sorted into distinctly different types based upon the amount of the evidence, quality of the defense, complexity of the charges, etc.?

Example

Is a murder a murder, or can cases be sorted into distinctively different types on the basis of victim/offender characteristics, circumstances, motives, etc.?


The Logic of Cluster Analysis

Step 1 Cluster analysis begins with an N x k database

Step 2 Using one of several methods, an N x N matrix is created that indicates the similarity (or dissimilarity) of every case to every other case, based on the k number of variables

Matrix of Dissimilarities

Subjects        1         2         3       …        N
   1                    1.782     2.538     …     47.236
   2          1.782               0.821     …     39.902
   3          2.538     0.821               …     41.652
   …            …         …         …       …        …
   N         47.236    39.902    41.652     …
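As a sketch of Step 2, SciPy's distance utilities can build such an N x N matrix; the data below are hypothetical, not the slides' values:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical N x k database: 4 subjects measured on 3 variables
X = np.array([[1.0, 2.0, 3.0],
              [1.5, 2.5, 2.0],
              [8.0, 7.0, 6.0],
              [7.5, 6.0, 5.5]])

# pdist returns the condensed pairwise distances; squareform expands
# them into the symmetric N x N matrix of dissimilarities
D = squareform(pdist(X, metric="sqeuclidean"))
print(D.shape)  # (4, 4); the diagonal is zero
```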


The Logic of Cluster Analysis (cont.)

Step 3 Using one of several clustering algorithms, the subjects are sorted into significantly different groups where …

• The subjects within each group are as homogeneous as possible, and …
• The groups are as different from one another as possible


Measures of Similarity or Difference

Cluster analysis begins by creating a matrix indicating the similarity between (or the distance between) each pair of subjects relative to the k variables in the database.

There are a number of ways that this can be done.

Squared Euclidean Distance *             Pearson Correlation Coefficient *
Euclidean Distance *                     Mahalanobis D2 *
Cosine of Vector Variables *             Minkowski Metric *
City Block or Manhattan Distances *      Jaccard's Coefficient
Chebychev Distance Metric *              Gower's Coefficient
Distances in the Absolute Power Metric   Simple Matching Coefficient

* Available in SPSS


An Example of Squared Euclidean Distances

Variable   Subject 1   Subject 2   (Si - Sj)   (Si - Sj)2

X1            18          19          -1           1
X2            15          17          -2           4
X3             9          10          -1           1
X4            12          10          +2           4
X5             0           1          -1           1
X6             1           1           0           0
X7             9           8          +1           1

Totals        NA          NA          NA          12

Squared Euclidean Distance = Σ (Si - Sj)2 = 12
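The same arithmetic takes a few lines of Python, reusing the two subjects' values from the table above:

```python
# Squared Euclidean distance between the two subjects from the table
s1 = [18, 15, 9, 12, 0, 1, 9]   # Subject 1
s2 = [19, 17, 10, 10, 1, 1, 8]  # Subject 2

# Sum the squared differences (Si - Sj)^2 across the k = 7 variables
sq_euclidean = sum((a - b) ** 2 for a, b in zip(s1, s2))
print(sq_euclidean)  # 12, matching the worked example
```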


A Variety of Clustering Algorithms

• There is no proven best way to cluster subjects into homogeneous groups
• Different techniques have been developed in different fields based upon different logics (e.g. biology, archeology, etc.)
• Given the same database, similar clustering results can be achieved using different clustering algorithms, but not always
• Clustering algorithms are generally classified into two broad types …

Hierarchical methods
Non-hierarchical methods


Hierarchical Clustering Algorithms

Agglomerative Methods                    Divisive Methods
Single Linkage (Nearest Neighbor) *      K-Means Clustering *
Complete Linkage (Furthest Neighbor) *   Trace Methods
Average Linkage *                        Splinter-Average Distance Method
Ward's Error Sum of Squares *            Automatic Interaction Detection (AID)
Centroid Method *
Median Clustering

* Available in SPSS


Non-hierarchical Clustering Algorithms

Iterative Methods
• Sequential Threshold Method
• Parallel Threshold Method
• Optimization Methods

Factor Analysis
• Q-Factor Analysis

Density Methods
• Multivariate Probability Approaches (NORMIX, NORMAP)

Clumping Methods

Graphic Methods
• Glyphs & Metroglyphs
• Fourier Series
• Chernoff Faces


An Example of a Clustering Algorithm

Ward's Error Sum of Squares Algorithm

Imagine that data on seven variables (Xk) were gathered on 70 subjects (N).

Imagine further that a dissimilarity matrix was constructed indicating the differences among all pairs of subjects, using squared Euclidean distances.

Step 1 Ward's algorithm begins with each of the 70 subjects in its own cluster

Step 2 Next it finds the two subjects that are most similar and creates a cluster with two subjects

• Now there are 69 clusters: one with two subjects, and 68 with one subject each

Step 3 Now it finds the next two most similar subjects and creates a second two-subject cluster

• Now there are 68 clusters: two with two subjects each, and 66 with one subject each


An Example of a Clustering Algorithm

Ward's Error Sum of Squares Algorithm (cont.)

As Ward's algorithm progresses, it will begin to combine single subjects into pre-existing clusters, and then begins to combine one pre-existing cluster with another.

This process continues until all 70 subjects are finally combined into one cluster.

Ward's algorithm forms clusters by selecting that subject (or that cluster, if combining clusters) which minimizes the within-cluster sum of squares (i.e. the error sum of squares).
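This agglomeration loop is what SciPy's `linkage` routine implements; a minimal sketch, using simulated data as a stand-in for the 70-subject example:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
X = rng.normal(size=(70, 7))  # simulated stand-in for the 70 x 7 database

# method="ward": each merge is chosen to minimize the increase in the
# within-cluster (error) sum of squares
Z = linkage(X, method="ward")
print(Z.shape)  # (69, 4): one row per merge, 70 - 1 = 69 stages
```

The last row of `Z` is the final merge combining everything into a single 70-case cluster.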


A Seven-Variable Example of Cluster Analysis

The database: 70 subjects and 7 variables

The variables:
• Sentence in years: sentence
• Number of prior convictions: pr_conv
• Degree of drug dependency: dr_score
• Age: age
• Age at first arrest: age_firs
• Educational equivalency: educ_eqv
• Level of work skill: skl_indx


Steps in the Cluster Analysis

Step 1 Transform the seven variables to standard scores, i.e. Z-scores

Step 2 Create a dissimilarity matrix using squared Euclidean distances

Squared Euclidean Distances

Subjects        1         2         3       …       70
   1                    1.782     2.538     …     47.236
   2          1.782               0.821     …     39.902
   3          2.538     0.821               …     41.652
   …            …         …         …       …        …
  70         47.236    39.902    41.652     …
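Steps 1 and 2 can be sketched with SciPy; the 70 x 7 data here are simulated, not the worked example's actual values:

```python
import numpy as np
from scipy.stats import zscore
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(42)
X = rng.normal(size=(70, 7))   # simulated 70 x 7 database

# Step 1: standardize each variable so that no variable dominates the
# distances simply because of its measurement scale
Xz = zscore(X, axis=0)

# Step 2: the 70 x 70 matrix of squared Euclidean distances
D = squareform(pdist(Xz, metric="sqeuclidean"))
print(D.shape)  # (70, 70)
```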


Steps in the Cluster Analysis (cont.)

Step 3 Use Ward's algorithm to cluster the 70 subjects, beginning with 70 clusters of one subject each and terminating with one cluster containing all 70 subjects

Agglomeration Schedule

Stage   Cluster 1   Cluster 2   Coefficient   First Appears: Cluster 1   Cluster 2   Next Stage

1 62 63 .255 0 0 40
2 31 33 .610 0 0 37
3 2 3 1.021 0 0 43
4 7 8 1.502 0 0 31
5 29 30 1.984 0 0 45
6 14 15 2.495 0 0 31
7 52 67 3.031 0 0 34
8 18 19 3.588 0 0 49
9 46 47 4.191 0 0 35
10 27 28 4.803 0 0 44
11 36 40 5.437 0 0 33
12 9 13 6.095 0 0 49
13 48 49 6.760 0 0 51
14 32 38 7.435 0 0 42
15 20 21 8.128 0 0 39
16 22 64 8.844 0 0 39
17 35 39 9.580 0 0 52
18 5 12 10.324 0 0 36
19 23 24 11.093 0 0 29
20 57 59 11.878 0 0 32
21 37 43 12.702 0 0 42
22 6 10 13.551 0 0 55
23 1 4 14.439 0 0 28
24 11 45 15.358 0 0 46
25 41 44 16.284 0 0 33
26 55 56 17.220 0 0 41
27 51 66 18.237 0 0 48


28 1 50 19.329 23 0 47
29 17 23 20.483 0 19 38
30 54 69 21.732 0 0 41
31 7 14 23.076 4 6 46
32 57 58 24.425 20 0 53
33 36 41 25.784 11 25 40
34 52 53 27.173 7 0 51
35 42 46 28.626 0 9 58
36 5 16 30.251 18 0 54
37 31 34 32.018 2 0 62
38 17 68 33.905 29 0 59
39 20 22 35.806 15 16 57
40 36 62 37.855 33 1 56
41 54 55 39.918 30 26 50
42 32 37 42.118 14 21 52
43 2 65 44.428 3 0 47
44 25 27 46.758 0 10 45
45 25 29 49.344 44 5 59
46 7 11 52.395 31 24 54
47 1 2 55.709 28 43 63
48 26 51 59.223 0 27 61
49 9 18 62.772 12 8 57
50 54 70 66.383 41 0 65
51 48 52 70.076 13 34 60
52 32 35 73.798 42 17 58
53 57 60 77.659 32 0 65
54 5 7 81.736 36 46 55
55 5 6 86.189 54 22 64
56 36 61 90.955 40 0 66
57 9 20 97.853 49 39 60
58 32 42 105.430 52 35 62
59 17 25 114.736 38 45 67
60 9 48 125.105 57 51 61
61 9 26 136.517 60 48 63
62 31 32 150.461 37 58 68
63 1 9 167.695 47 61 64
64 1 5 194.756 63 55 66
65 54 57 222.045 50 53 67
66 1 36 258.210 64 56 68
67 17 54 298.955 59 65 69
68 1 31 361.556 66 62 69
69 1 17 483.000 68 67 0


Interpretation of the Agglomeration Schedule

Stage 1 Cases 62 and 63 are combined into a cluster. Now there is one cluster with two cases and 68 clusters with one case each, for 69 total clusters (70 - 1 = 69).

Coefficient The squared Euclidean distance over which these two cases were joined = 0.255, called a fusion coefficient.

Next Stage The next stage at which one of these cases is joined to a cluster is Stage 40, when case 62 is joined to case 36.

Stage 33 Cases 36 and 41 are joined together over a distance = 25.784. At this stage 37 clusters remain (70 - 33 = 37).

Stage Cluster First Appears

Cluster 1 Notice that case 36 was previously joined with case 40 at Stage 11.

Cluster 2 Again, notice that case 41 was previously joined with case 44 at Stage 25.
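SciPy's linkage matrix records the same bookkeeping as SPSS's agglomeration schedule (which clusters were combined, the fusion coefficient, and the size of the new cluster); a small simulated sketch:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(3)
X = rng.normal(size=(6, 2))   # six hypothetical cases

Z = linkage(X, method="ward")
# Each row of Z is one stage of the agglomeration schedule:
# [cluster joined, cluster joined, fusion coefficient, cases in new cluster]
print(Z[0, 3])  # 2.0: the first stage always joins two single cases
```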


Interpretation of the Agglomeration Schedule (cont.)

Next Stage The next stage at which one of these cases is joined to a cluster is Stage 40, when case 36 is joined with case 62.

Stage 69 Case 1 is joined with case 17 at a squared Euclidean distance of 483.0, clearly joining two clusters that are very dissimilar.

At Stage 69 all 70 cases have been included in a single cluster. Obviously this one cluster is heterogeneous, containing many very dissimilar cases.


How Do You Determine the Optimal Number of Clusters in the Final Solution?

In this example, Ward's algorithm yields solutions ranging from 70 clusters with one case each to one cluster containing all 70 cases.

Somewhere between these two extremes is an optimal number of clusters which best satisfies the following conditions …

• The clusters are as internally homogeneous as possible (i.e. minimum within-cluster sum of squares)
• And the various clusters are as different from one another as possible

Determining the optimal number of clusters:

• Theory about the number of underlying groups
• Ease of profiling the groups
• Magnitude of change in the fusion coefficient
• Dendrogram with rescaled distance measure
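The "magnitude of change in the fusion coefficient" heuristic can be sketched as follows, with two deliberately well-separated simulated groups so that the jump is obvious:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Two well-separated hypothetical groups of 10 cases each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(10, 2)),
               rng.normal(5.0, 0.5, size=(10, 2))])

Z = linkage(X, method="ward")
heights = Z[:, 2]          # fusion coefficients, one per stage
jumps = np.diff(heights)   # change between successive stages
# The largest jump marks the stage where dissimilar clusters are being
# forced together; stop just before that merge
print(int(np.argmax(jumps)))
```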


What is a Dendrogram?
* * * * * * H I E R A R C H I C A L C L U S T E R A N A L Y S I S * * * * * *

Dendrogram using Ward Method

Rescaled Distance Cluster Combine

C A S E 0 5 10 15 20 25
Label Num +---------+---------+---------+---------+---------+

Case 62 62 -+
Case 63 63 -+
Case 36 36 -+
Case 40 40 -+-------------+
Case 41 41 -+ |
Case 44 44 -+ |
Case 61 61 -+ |
Case 6 6 -+ |
Case 10 10 -+ |
Case 5 5 -+---------+ +---------+
Case 12 12 -+ | | |
Case 16 16 -+ | | |
Case 11 11 -+ | | |
Case 45 45 -+ | | |
Case 7 7 -+ | | |
Case 8 8 -+ | | |
Case 14 14 -+ +---+ |
Case 15 15 -+ | |
Case 1 1 -+ | |
Case 4 4 -+ | |
Case 50 50 -+-----+ | |
Case 2 2 -+ | | |
Case 3 3 -+ | | |
Case 65 65 -+ | | |
Case 51 51 -+ +---+ |
Case 66 66 -+---+ | |
Case 26 26 -+ | | +-----------------------+
Case 48 48 -+ | | | |
Case 49 49 -+---+-+ | |
Case 52 52 -+ | | |
Case 67 67 -+ | | |
Case 53 53 -+ | | |
Case 20 20 -+ | | |
Case 21 21 -+-+ | | |
Case 22 22 -+ | | | |
Case 64 64 -+ +-+ | |
Case 18 18 -+ | | |
Case 19 19 -+-+ | |
Case 9 9 -+ | |
Case 13 13 -+ | |
Case 31 31 -+ | |
Case 33 33 -+---+ | |
Case 34 34 -+ | | |
Case 46 46 -+ +-------------------+ |
Case 47 47 -+-+ | |
Case 42 42 -+ +-+ |
Case 35 35 -+ | |
Case 39 39 -+-+ |
Case 32 32 -+ |
Case 38 38 -+ |
Case 37 37 -+ |
Case 43 43 -+ |
Case 23 23 -+ |
Case 24 24 -+ |
Case 17 17 -+-+ |
Case 68 68 -+ +-------------+ |
Case 29 29 -+ | | |
Case 30 30 -+-+ | |
Case 27 27 -+ | |
Case 28 28 -+ | |
Case 25 25 -+ +-------------------------------+
Case 55 55 -+ |
Case 56 56 -+ |
Case 54 54 -+---------+ |
Case 69 69 -+ | |
Case 70 70 -+ +-----+
Case 57 57 -+ |
Case 59 59 -+ |
Case 58 58 -+---------+
Case 60 60 -+


What is a Dendrogram? (cont.)

The Scaled Distance

• The fusion coefficient transformed to a scale ranging from 0 to 25

The Dendrogram

• The dendrogram shows which cases were joined together into clusters and at what distance, and at later stages, which clusters were joined together into larger clusters, and at what distance

Interpretation

• The point at which the "foothills" become the "mountain peaks" is probably the optimal number of clusters

Optimal Number of Clusters

• A five-cluster solution appears about optimal


Computing a Five-Cluster Solution

Having hypothesized that a five-cluster solution may be optimal …

• The next step is to compute a five-cluster solution and …
• Save the cluster scores

Cluster scores

• In this case, a cluster score is a number between 1 and 5 assigned to each case, indicating the cluster to which that particular case has been assigned

5-Cluster Solution

• This is accomplished by repeating the cluster analysis and specifying that five clusters are to be extracted and the cluster scores saved
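With SciPy, extracting and saving the five cluster scores amounts to cutting the hierarchical tree at five clusters; a sketch on simulated data standing in for the worked example:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
X = rng.normal(size=(70, 7))   # simulated stand-in for the 70 x 7 database

Z = linkage(X, method="ward")
# Cut the tree so that five clusters remain; each of the 70 cases
# receives a cluster score between 1 and 5
scores = fcluster(Z, t=5, criterion="maxclust")
print(sorted(set(scores)))  # [1, 2, 3, 4, 5]
```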


Saved Cluster Scores

Case    Cluster
1.0 1
2.0 1
3.0 1
4.0 1
5.0 1
6.0 1
7.0 1
8.0 1
9.0 1
10.0 1
11.0 1
12.0 1
13.0 1
14.0 1
15.0 1
16.0 1
17.0 2
18.0 1
19.0 1
20.0 1
21.0 1
22.0 1
23.0 2
24.0 2
25.0 2
26.0 1
27.0 2



46.0 3
47.0 3
48.0 1
49.0 1
50.0 1
51.0 1
52.0 1
53.0 1
54.0 5
55.0 5
56.0 5
57.0 5
58.0 5
59.0 5
60.0 5
61.0 4
62.0 4
63.0 4
64.0 1
65.0 1
66.0 1
67.0 1
68.0 2
69.0 5
70.0 5


Profiling the Five Clusters

One way to profile the characteristics of the five clusters is to compute the means of the seven variables for each of the five clusters.

Ward Method

Cluster 1
+------------------------+-----------+
| | Mean |
+------------------------+-----------+
|SENTENCE | 4.6 |
| | |
|PR_CONV | 1.5 |
| | |
|DR_SCORE | 7.5 |
| | |
|AGE | 21.6 |
| | |
|AGE_FIRS | 16.2 |
| | |
|EDUC_EQV | 7.3 |
| | |
|SKL_INDX | 6.0 |
+------------------------+-----------+

Ward Method

Cluster 2
+------------------------+-----------+
| | Mean |
+------------------------+-----------+
|SENTENCE | 7.3 |
| | |
|PR_CONV | 4.8 |
| | |
|DR_SCORE | 5.7 |
| | |
|AGE | 24.7 |
| | |
|AGE_FIRS | 14.4 |
| | |
|EDUC_EQV | 3.4 |
| | |
|SKL_INDX | 2.8 |
+------------------------+-----------+


Profiling the Five Clusters (cont.)


Ward Method

Cluster 3
+------------------------+-----------+
| | Mean |
+------------------------+-----------+
|SENTENCE | 2.4 |
| | |
|PR_CONV | .9 |
| | |
|DR_SCORE | 3.3 |
| | |
|AGE | 21.3 |
| | |
|AGE_FIRS | 19.3 |
| | |
|EDUC_EQV | 3.3 |
| | |
|SKL_INDX | 2.5 |
+------------------------+-----------+

Ward Method

Cluster 4
+------------------------+-----------+
| | Mean |
+------------------------+-----------+
|SENTENCE | 3.1 |
| | |
|PR_CONV | .9 |
| | |
|DR_SCORE | 3.0 |
| | |
|AGE | 20.6 |
| | |
|AGE_FIRS | 19.0 |
| | |
|EDUC_EQV | 10.7 |
| | |
|SKL_INDX | 8.1 |
+------------------------+-----------+

Ward Method

Cluster 5
+------------------------+-----------+
| | Mean |
+------------------------+-----------+
|SENTENCE | 16.3 |
| | |
|PR_CONV | 2.1 |
| | |
|DR_SCORE | 8.1 |
| | |
|AGE | 30.2 |
| | |
|AGE_FIRS | 14.7 |
| | |
|EDUC_EQV | 5.3 |
| | |
|SKL_INDX | 3.8 |
+------------------------+-----------+
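The per-cluster means above can be computed generically as a group-by mean; the data and scores below are small hypothetical values, not the slides' output:

```python
import numpy as np

# Hypothetical: four cases measured on two variables, with a saved
# cluster score for each case
X = np.array([[4.0, 1.0],
              [5.0, 2.0],
              [16.0, 8.0],
              [17.0, 9.0]])
scores = np.array([1, 1, 2, 2])   # cluster score for each case

# Profile each cluster as the mean of each variable within it
profiles = {c: X[scores == c].mean(axis=0) for c in np.unique(scores)}
print(profiles[1])  # variable means within cluster 1
```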


Ranking the Variable Means of the Five Clusters

Variable     Cluster 1   Cluster 2   Cluster 3   Cluster 4   Cluster 5

Age             M           H           L           LL          HH
Age_Firs        M           L           HH          H           LL
Dr_Score        H           M           L           LL          HH
Educ_Eqv        H           L           LL          HH          M
Pr_Conv         M           HH          L           LL          H
Sentence        M           H           LL          L           HH
Skl_Indx        H           L           LL          HH          M

LL = lowest   L = low   M = median   H = high   HH = highest


Profile Descriptions of the Five Clusters

Cluster 1
Better-educated drug users who are highly skilled workers, about median age

Cluster 2
Older offenders, unskilled, poorly educated, with some history of drug use; career criminals serving long sentences

Cluster 3
Young first offenders, unskilled, poorly educated, with little drug history, serving very short sentences

Cluster 4
Very young, highly educated, skilled first offenders serving short sentences, with little history of drug use

Cluster 5
Severely drug-dependent older offenders with long criminal careers, serving very long sentences


Secondary Applications of the Results of a Cluster Analysis

Some statistical techniques, such as analysis of variance or discriminant analysis, use a priori categorical independent or dependent variables.

Cluster analysis allows us to create an empirically derived categorical variable wherein the groups or clusters are determined to be homogeneous and significantly different from each other.

Other statistical tests can then be conducted using the cluster variable as a categorical IV or DV.

Example

Do the five clusters of offenders differ significantly in the seriousness of the crime of which they were convicted? This is a one-way ANOVA problem.
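A sketch of that one-way ANOVA with SciPy, using hypothetical seriousness scores for three of the clusters:

```python
from scipy.stats import f_oneway

# Hypothetical seriousness scores for three clusters of offenders
c1 = [3, 4, 3, 4, 3]
c2 = [6, 6, 7, 5, 6]
c3 = [2, 2, 3, 2, 2]

# One-way ANOVA: do the cluster means differ significantly?
F, p = f_oneway(c1, c2, c3)
print(p < 0.05)  # True: these hypothetical group means clearly differ
```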


Secondary Applications of the Results of a Cluster Analysis (cont.)

Univariate Analysis of Variance

Between-Subjects Factors

Ward Method     N
1              33
2               9
3              12
4               7
5               9

Tests of Between-Subjects Effects

Dependent Variable: SER_INDX

Source            Type III Sum of Squares    df    Mean Square       F       Sig.
Corrected Model         152.593a              4       38.148      19.471     .000
Intercept               853.296               1      853.296     435.527     .000
CLU5_1                  152.593               4       38.148      19.471     .000
Error                   127.350              65        1.959
Total                  1306.000              70
Corrected Total         279.943              69

a. R Squared = .545 (Adjusted R Squared = .517)

Post Hoc Tests


Ward Method

Interpretation

There are significant mean differences in the crime seriousness of the offences committed by the five clusters of offenders.

Tukey's HSD test is used to determine which group mean differences are significant.


Secondary Applications of the Results of a Cluster Analysis (cont.)

Multiple Comparisons

Dependent Variable: SER_INDX


Tukey HSD

(I) Ward Method   (J) Ward Method   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower Bound   95% CI Upper Bound
1 2 -2.5152* .5264 .000 -3.9921 -1.0382
3 1.2348 .4718 .079 -8.9081E-02 2.5588
4 1.3420 .5825 .157 -.2923 2.9763
5 -2.8485* .5264 .000 -4.3254 -1.3716
2 1 2.5152* .5264 .000 1.0382 3.9921
3 3.7500* .6172 .000 2.0182 5.4818
4 3.8571* .7054 .000 1.8779 5.8364
5 -.3333 .6598 .987 -2.1847 1.5181
3 1 -1.2348 .4718 .079 -2.5588 8.908E-02
2 -3.7500* .6172 .000 -5.4818 -2.0182
4 .1071 .6657 1.000 -1.7607 1.9750
5 -4.0833* .6172 .000 -5.8152 -2.3515
4 1 -1.3420 .5825 .157 -2.9763 .2923
2 -3.8571* .7054 .000 -5.8364 -1.8779
3 -.1071 .6657 1.000 -1.9750 1.7607
5 -4.1905* .7054 .000 -6.1697 -2.2112
5 1 2.8485* .5264 .000 1.3716 4.3254
2 .3333 .6598 .987 -1.5181 2.1847
3 4.0833* .6172 .000 2.3515 5.8152
4 4.1905* .7054 .000 2.2112 6.1697
Based on observed means.
*. The mean difference is significant at the .05 level.


Secondary Applications of the Results of a Cluster Analysis (cont.)

SER_INDX (Tukey HSD a,b,c)

                            Subset
Ward Method     N        1         2
4               7      2.1429
3              12      2.2500
1              33      3.4848
2               9                6.0000
5               9                6.3333
Sig.                   .196      .982
Means for groups in homogeneous subsets are displayed.
Based on Type III Sum of Squares
The error term is Mean Square(Error) = 1.959.
a. Uses Harmonic Mean Sample Size = 10.445.
b. The group sizes are unequal. The harmonic mean
of the group sizes is used. Type I error levels are
not guaranteed.
c. Alpha = .05.


Using the Categorical Cluster Variable as a Dependent Variable

Example

To what extent does the type of defense counsel, pretrial jail time, and time to case disposition predict differences among the five groups of offenders?

This is a discriminant analysis problem with the cluster variable as the DV. (If the cluster variable were used as the IV, this would be a MANOVA problem.)

Discriminant analysis results

Three discriminant functions were extracted, since there are 3 IVs, which is less than the 5 groups. (The number of functions is the smaller of g - 1 and k.)

Only the 1st discriminant function is significant.

Z1 = -0.313 - 0.866 counsel + 0.021 jail_tm - 0.002 tm_disp
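Applying the function above to score a case is simple arithmetic; the predictor values below are hypothetical, chosen only to illustrate the computation:

```python
# Scoring a hypothetical case with the first discriminant function as
# written above; these predictor values are made up for illustration
counsel, jail_tm, tm_disp = 1, 30, 90

z1 = -0.313 - 0.866 * counsel + 0.021 * jail_tm - 0.002 * tm_disp
print(round(z1, 3))  # -0.729
```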


Using the Cluster Variable as a Dependent Variable (cont.)

Discriminant

Group Statistics

Valid N (listwise)
Ward Method Unweighted Weighted
1 COUNSEL 33 33.000
JAIL_TM 33 33.000
TM_DISP 33 33.000
2 COUNSEL 9 9.000
JAIL_TM 9 9.000
TM_DISP 9 9.000
3 COUNSEL 12 12.000
JAIL_TM 12 12.000
TM_DISP 12 12.000
4 COUNSEL 7 7.000
JAIL_TM 7 7.000
TM_DISP 7 7.000
5 COUNSEL 9 9.000
JAIL_TM 9 9.000
TM_DISP 9 9.000
Total COUNSEL 70 70.000
JAIL_TM 70 70.000
TM_DISP 70 70.000

Analysis 1
Summary of Canonical Discriminant Functions
Eigenvalues

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1 .492a 89.5 89.5 .574
2 .042a 7.6 97.1 .200
3 .016a 2.9 100.0 .125
a. First 3 canonical discriminant functions were used in the
analysis.

Wilks' Lambda

Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1 through 3 .633 29.686 12 .003
2 through 3 .945 3.678 6 .720
3 .984 1.019 2 .601


Using the Cluster Variable as a Dependent Variable (cont.)

Standardized Canonical Discriminant Function Coefficients

Function
1 2 3
COUNSEL .549 .863 .523
JAIL_TM -.627 .807 .607
TM_DISP .102 .384 -.962

Structure Matrix

Function
1 2 3
JAIL_TM -.867* .488 .103
COUNSEL .848* .455 .271
TM_DISP -.086 .555 -.827*
Pooled within-groups correlations between discriminating
variables and standardized canonical discriminant functions
Variables ordered by absolute size of correlation within function.
*. Largest absolute correlation between each variable and
any discriminant function

Canonical Discriminant Function Coefficients

Function
1 2 3
COUNSEL 1.235 1.943 1.176
JAIL_TM -.016 .020 .015
TM_DISP .004 .015 -.039
(Constant) -.304 -3.221 2.205
Unstandardized coefficients

Functions at Group Centroids

Function
Ward Method 1 2 3
1 .213 -.115 -9.76E-02
2 -.803 .140 -2.89E-02
3 .673 .366 4.822E-02
4 .618 -.266 .291
5 -1.357 -1.51E-03 9.600E-02
Unstandardized canonical discriminant functions
evaluated at group means
