Cluster Analysis
• Used to classify objects (cases) into
homogeneous groups called clusters.
• Objects in each cluster tend to be similar to each other
and dissimilar to objects in the other clusters.
• In cluster analysis, groups are suggested by the
data rather than defined in advance.
An Ideal Clustering Situation
[Figure: plot of objects on the axes Variable 1 and Variable 2]
More Common Clustering Situation
[Figure: plot of objects on the axes Variable 1 and Variable 2]
Statistics Associated with Cluster Analysis
Complete Linkage
[Figure: Cluster 1 and Cluster 2 joined by the maximum distance between their members]
Average Linkage
[Figure: Cluster 1 and Cluster 2 joined by the average distance between their members]
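The two linkage rules named above can be sketched in a few lines. This is only an illustration, assuming Euclidean distance and clusters stored as lists of coordinate tuples. The function names and the toy data are hypothetical and do not come from the slides.

```python
from itertools import product
from math import dist


def complete_linkage(cluster_a, cluster_b):
    """Largest distance between a member of A and a member of B."""
    return max(dist(p, q) for p, q in product(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    """Mean of all pairwise distances between members of A and members of B."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


# Hypothetical toy data: two small clusters measured on two variables.
cluster_1 = [(0.0, 0.0), (0.0, 1.0)]
cluster_2 = [(3.0, 0.0), (4.0, 0.0)]

d_complete = complete_linkage(cluster_1, cluster_2)
d_average = average_linkage(cluster_1, cluster_2)
print(d_complete, d_average)
```

Because the maximum of a set of distances is never smaller than their mean, complete linkage always reports a distance at least as large as average linkage for the same pair of clusters.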
Hierarchical Agglomerative Clustering: Variance and Centroid Methods
• Variance methods generate clusters to minimize the
within-cluster variance.
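To make the within-cluster variance criterion concrete, here is a minimal sketch. It assumes the criterion is measured as the sum of squared Euclidean distances from each object to its cluster centroid, and that candidate merges are ranked by how much they increase that sum, as in Ward's procedure, one common variance method. The slides do not say which variance method is meant, so treat this choice, the function names, and the toy data as assumptions.

```python
def centroid(cluster):
    """Mean of each variable over the objects in the cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def within_ss(cluster):
    """Sum of squared distances from each object to the cluster centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(point, c)) for point in cluster)


def merge_cost(cluster_a, cluster_b):
    """Increase in total within-cluster sum of squares if A and B are merged."""
    merged = cluster_a + cluster_b
    return within_ss(merged) - within_ss(cluster_a) - within_ss(cluster_b)


# Hypothetical toy data: cluster b lies close to a, cluster c lies far away.
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(1.0, 1.0)]
c = [(10.0, 10.0)]

print(merge_cost(a, b), merge_cost(a, c))
```

Under this criterion the agglomerative step would merge a with b rather than with c, because that merge adds less within-cluster variation.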
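• In the centroid method, the distance between two clusters
is the distance between their centroids (the means of all
the variables).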
Idea Behind K-Means
• Algorithm for K-means clustering
1. Partition the items into K initial clusters
2. Assign each item to the cluster with the nearest
centroid (mean)
3. Recalculate the centroids of both the cluster
receiving and the cluster losing the item
4. Repeat steps 2 and 3 until no more
reassignments occur
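The four steps above can be sketched as follows. This is a minimal illustration, not a reference implementation. It assumes Euclidean distance, an initial partition in which every cluster has at least one item, and a rule (added here, not in the slides) that an item is never moved out of a cluster it occupies alone. The names `k_means`, `mean_point`, and the toy data are hypothetical.

```python
from math import dist


def mean_point(points):
    """Centroid: the mean of each variable over the given points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(items, initial_assignment, k, max_passes=100):
    """Item-by-item K-means starting from a given partition (step 1)."""
    assignment = list(initial_assignment)
    # Centroids of the initial partition; assumes no cluster starts empty.
    centroids = [
        mean_point([x for x, a in zip(items, assignment) if a == c])
        for c in range(k)
    ]
    for _ in range(max_passes):
        moved = False
        for i, x in enumerate(items):
            old = assignment[i]
            # Step 2: find the cluster with the nearest centroid.
            new = min(range(k), key=lambda c: dist(x, centroids[c]))
            # Assumption: never empty a cluster by taking its last item.
            if new == old or assignment.count(old) == 1:
                continue
            assignment[i] = new
            # Step 3: update the centroids of the losing and receiving clusters.
            for c in (old, new):
                centroids[c] = mean_point(
                    [y for y, a in zip(items, assignment) if a == c]
                )
            moved = True
        # Step 4: stop once a full pass makes no reassignment.
        if not moved:
            break
    return assignment, centroids


# Hypothetical toy data with a deliberately poor starting partition.
items = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
start = [0, 1, 0, 1, 0, 1]
labels, centers = k_means(items, start, k=2)
print(labels)
print(centers)
```

On this toy data the procedure separates the three items near the origin from the three items near (10, 10) after one pass of reassignments.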
Select a Clustering Procedure
• The hierarchical and nonhierarchical methods should be
used in tandem.
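One way to use the two families in tandem is to run a hierarchical pass first and feed its cluster centroids to a nonhierarchical pass as starting points. The sketch below assumes that workflow, which the slides do not spell out. It also assumes Euclidean distance, centroid-based merging for the hierarchical pass, and a batch form of K-means for the refinement. All names and the toy data are hypothetical.

```python
from math import dist


def mean_point(points):
    """Centroid: the mean of each variable over the given points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def hierarchical_seeds(items, k):
    """Agglomerative pass: merge the two clusters with the closest
    centroids until k clusters remain, then return their centroids."""
    clusters = [[x] for x in items]
    while len(clusters) > k:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = min(
            pairs,
            key=lambda p: dist(mean_point(clusters[p[0]]), mean_point(clusters[p[1]])),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [mean_point(c) for c in clusters]


def refine(items, seeds, max_iter=100):
    """Nonhierarchical pass: reassign items to the nearest centroid and
    recompute centroids until the assignment stops changing."""
    centroids = list(seeds)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda c: dist(x, centroids[c]))
            for x in items
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [x for x, a in zip(items, labels) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = mean_point(members)
    return labels, centroids


# Hypothetical toy data: two well-separated groups.
items = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = hierarchical_seeds(items, k=2)
labels, centroids = refine(items, seeds)
print(seeds)
print(labels)
```

The hierarchical pass supplies both the number of clusters and reasonable starting centroids, which reduces the dependence of the nonhierarchical pass on an arbitrary initial partition.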
Means of Variables
Cluster No. V1 V2 V3 V4 V5 V6