Linkage Methods
The linkage method that you choose determines how the distance between two clusters is
defined. At each amalgamation stage, the two closest clusters are joined. At the beginning,
when each observation constitutes a cluster, the distance between clusters is just the inter-
observation distance. Subsequently, after observations are joined together, a linkage rule is
necessary for calculating inter-cluster distances when there are multiple observations in a
cluster.
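The amalgamation procedure described above can be sketched in Python as follows. This is a minimal, unoptimized illustration, not Minitab's implementation. It assumes Euclidean distance between observations, and the names `agglomerate`, `linkage`, and `nearest` are hypothetical. The linkage rule is passed in as a function, so observation-based rules such as single, average, or complete linkage can be swapped in.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Cluster = List[Point]
# A linkage rule maps two clusters to the distance between them.
Linkage = Callable[[Cluster, Cluster], float]


def agglomerate(points: Sequence[Point], linkage: Linkage) -> List[Tuple[Cluster, Cluster, float]]:
    """Hypothetical sketch: repeatedly join the two closest clusters.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    # At the beginning, each observation constitutes its own cluster.
    clusters: List[Cluster] = [[tuple(p)] for p in points]
    history: List[Tuple[Cluster, Cluster, float]] = []
    while len(clusters) > 1:
        # Find the pair of clusters that the linkage rule says is closest.
        best: Optional[Tuple[float, int, int]] = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        a, b = clusters[i], clusters[j]
        history.append((a, b, d))
        # Delete the higher index first so the lower index stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append(a + b)
    return history


def nearest(a: Cluster, b: Cluster) -> float:
    # For two single-observation clusters this is just the inter-observation distance.
    return min(math.dist(p, q) for p in a for q in b)


for a, b, d in agglomerate([(0.0,), (1.0,), (5.0,)], nearest):
    print(a, "+", b, "at distance", d)
```

With the one-dimensional observations 0, 1, and 5, the sketch first joins 0 and 1 at distance 1.0, then joins 5 to that cluster at distance 4.0.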
You might want to try several linkage methods and compare results. Depending on the
characteristics of your data, such as how well separated the clusters are or whether
the data contain outliers, some methods may produce more meaningful clusters than others.
Single
With the single linkage method (also called the nearest neighbor method), the distance
between two clusters is the minimum distance between an observation in one cluster
and an observation in the other cluster. The single linkage method is a good choice
when clusters are clearly separated. When observations lie close together, the
single linkage method tends to identify long chain-like clusters that can have a
relatively large distance separating observations at either end of the chain.
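The nearest neighbor rule fits in a short Python function. This is a minimal sketch that assumes Euclidean distance; `single_linkage` is a hypothetical name used only for illustration. The example shows where chaining comes from: only the single closest pair matters, no matter how far apart the other observations are.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def single_linkage(a: List[Point], b: List[Point]) -> float:
    """Hypothetical helper: minimum distance over all cross-cluster pairs."""
    return min(math.dist(p, q) for p in a for q in b)


# Evenly spaced observations on a line, split into two clusters.
left = [(0.0,), (1.0,), (2.0,)]
right = [(3.0,), (4.0,)]

# Only the closest pair, 2 and 3, determines the distance, so the
# clusters are 1 apart even though observations 0 and 4 are 4 apart.
print(single_linkage(left, right))
```

On a long, evenly spaced sequence, each step adds just one neighbor to the growing cluster, which produces the chain-like clusters described above.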
Average
With the average linkage method, the distance between two clusters is the mean
distance between an observation in one cluster and an observation in the other cluster.
Whereas the single and complete linkage methods base the distance between clusters on a
single pair of observations, the average linkage method uses a more central measure of location.
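A minimal Python sketch of the average linkage distance follows, assuming Euclidean distance; `average_linkage` is a hypothetical name. Every cross-cluster pair contributes equally to the mean.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def average_linkage(a: List[Point], b: List[Point]) -> float:
    """Hypothetical helper: mean distance over all cross-cluster pairs."""
    total = sum(math.dist(p, q) for p in a for q in b)
    return total / (len(a) * len(b))


a = [(0.0,), (2.0,)]
b = [(5.0,)]
# The pair distances are 5 and 3, so their mean is 4.
print(average_linkage(a, b))
```

For the same two clusters, single linkage would give 3 and complete linkage would give 5, so the average linkage distance falls between the two extremes.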
Centroid
With the centroid linkage method, the distance between two clusters is the distance
between the cluster centroids or means. Like the average linkage method, this method
is an averaging technique.
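The following Python sketch computes each centroid as the coordinate-wise mean of the raw observations and then takes the Euclidean distance between the two centroids. It is an illustration under those assumptions, not necessarily how Minitab computes the quantity; `centroid` and `centroid_linkage` are hypothetical names.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def centroid(cluster: List[Point]) -> Point:
    """Coordinate-wise mean of the observations in a cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def centroid_linkage(a: List[Point], b: List[Point]) -> float:
    """Hypothetical helper: Euclidean distance between the two centroids."""
    return math.dist(centroid(a), centroid(b))


a = [(0.0, 0.0), (2.0, 0.0)]  # centroid (1, 0)
b = [(1.0, 3.0), (1.0, 5.0)]  # centroid (1, 4)
print(centroid_linkage(a, b))
```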
Complete
With the complete linkage method (also called furthest neighbor method), the distance
between two clusters is the maximum distance between an observation in one cluster
and an observation in the other cluster. This method ensures that all observations in a
cluster are within a maximum distance and tends to produce clusters with similar
diameters. The results can be sensitive to outliers.
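A minimal Python sketch of the furthest neighbor rule follows, assuming Euclidean distance; `complete_linkage` is a hypothetical name. The second call shows the outlier sensitivity mentioned above: one distant observation determines the whole distance.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def complete_linkage(a: List[Point], b: List[Point]) -> float:
    """Hypothetical helper: maximum distance over all cross-cluster pairs."""
    return max(math.dist(p, q) for p in a for q in b)


a = [(0.0,), (1.0,)]
b = [(2.0,), (3.0,)]
# The farthest pair is 0 and 3.
print(complete_linkage(a, b))

# Adding a single outlier at 20 to cluster b dominates the result.
print(complete_linkage(a, b + [(20.0,)]))
```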
Median
With the median linkage method, the distance between two clusters is the median
distance between an observation in one cluster and an observation in the other cluster.
This is another averaging technique, but it uses the median instead of the mean,
which downweights the effect of outliers.
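The following Python sketch follows the description above literally: it takes the median of all cross-cluster pair distances, assuming Euclidean distance. Other packages define a "median" method differently (for example, SciPy's `linkage(method="median")` uses a centroid-based update), so results may not match across software. `median_linkage` and `mean_distance` are hypothetical names.

```python
import math
import statistics
from typing import List, Tuple

Point = Tuple[float, ...]


def pair_distances(a: List[Point], b: List[Point]) -> List[float]:
    return [math.dist(p, q) for p in a for q in b]


def median_linkage(a: List[Point], b: List[Point]) -> float:
    """Hypothetical helper: median distance over all cross-cluster pairs."""
    return statistics.median(pair_distances(a, b))


def mean_distance(a: List[Point], b: List[Point]) -> float:
    """Mean of the same pair distances, for comparison."""
    return statistics.fmean(pair_distances(a, b))


a = [(0.0,), (1.0,)]
b = [(2.0,), (3.0,), (50.0,)]  # 50 is an outlier

# Pair distances are 1, 2, 2, 3, 49, and 50.
print(median_linkage(a, b))  # middle two values are 2 and 3
print(mean_distance(a, b))   # pulled upward by the outlier
```

The median is 2.5, while the mean is 107/6 (about 17.8), which illustrates how the median limits the influence of the outlier.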
McQuitty
With McQuitty's linkage method, when two clusters are joined, the distance of the
new cluster to any other cluster is calculated as the average of the distances of the
two clusters being joined to that other cluster. For example, if clusters 1 and 3 are to
be joined into a new cluster, say 1*, then the distance from 1* to cluster 4 is the
average of the distances from 1 to 4 and from 3 to 4. Here, distance depends on a
combination of clusters rather than on the individual observations in the clusters.
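Because McQuitty's rule updates cluster-to-cluster distances rather than looking at observations, the following Python sketch works directly on a distance matrix. It reproduces the 1* example with made-up distances and a hypothetical helper, `mcquitty_merge`. Each of the two old clusters counts equally, no matter how many observations it contains.

```python
from typing import Dict, FrozenSet, Hashable

# Pairwise distances between clusters, keyed by an unordered pair of labels.
DistMatrix = Dict[FrozenSet[Hashable], float]


def dist(matrix: DistMatrix, x: Hashable, y: Hashable) -> float:
    return matrix[frozenset((x, y))]


def mcquitty_merge(matrix: DistMatrix, i: Hashable, j: Hashable, new: Hashable) -> DistMatrix:
    """Hypothetical helper: join clusters i and j into `new` using McQuitty's rule."""
    labels = {label for pair in matrix for label in pair}
    # Distances between clusters not involved in the merge are unchanged.
    updated = {pair: v for pair, v in matrix.items() if i not in pair and j not in pair}
    for k in labels - {i, j}:
        # Plain average of the two old distances; cluster sizes are ignored.
        updated[frozenset((new, k))] = (dist(matrix, i, k) + dist(matrix, j, k)) / 2
    return updated


# Illustrative (made-up) distances between clusters 1 to 4.
matrix = {
    frozenset((1, 2)): 4.0,
    frozenset((1, 3)): 2.0,
    frozenset((1, 4)): 6.0,
    frozenset((2, 3)): 5.0,
    frozenset((2, 4)): 7.0,
    frozenset((3, 4)): 10.0,
}

# Join clusters 1 and 3 into a new cluster labeled "1*".
merged = mcquitty_merge(matrix, 1, 3, "1*")
print(dist(merged, "1*", 4))  # average of 6 and 10
```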
Ward
With Ward's linkage method, the distance between two clusters is based on the sum of squared
deviations from points to their cluster centroids. The goal of Ward's linkage method is to minimize
the within-cluster sum of squares, so at each stage it joins the two clusters whose merger
increases that sum the least. It tends to produce clusters with similar numbers of
observations, but it is sensitive to outliers. In Ward's linkage method, the distance between
two clusters can be larger than dmax, the maximum value in the original distance matrix.
If this occurs, the similarity is negative.
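The following Python sketch shows how a Ward distance can exceed dmax. It rests on three assumptions made only for illustration: the Ward distance is taken to be the increase in within-cluster sum of squares caused by a merge (software may scale this differently), the original distance matrix uses squared Euclidean distance, and the similarity level is computed as 100(1 - d/dmax). All function names are hypothetical.

```python
import itertools
from typing import List, Tuple

Point = Tuple[float, ...]


def centroid(cluster: List[Point]) -> Point:
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def sq_dist(p: Point, q: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(p, q))


def within_ss(cluster: List[Point]) -> float:
    """Sum of squared deviations from each point to its cluster centroid."""
    c = centroid(cluster)
    return sum(sq_dist(p, c) for p in cluster)


def ward_distance(a: List[Point], b: List[Point]) -> float:
    """Hypothetical helper: increase in within-cluster SS caused by merging a and b."""
    return within_ss(a + b) - within_ss(a) - within_ss(b)


def similarity(d: float, d_max: float) -> float:
    # ASSUMPTION: similarity level = 100 * (1 - d / d_max).
    return 100.0 * (1.0 - d / d_max)


# Two tight groups of four points, 10 units apart.
a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
b = [(10.0, 0.0), (11.0, 0.0), (10.0, 1.0), (11.0, 1.0)]

# Largest entry of the original (squared Euclidean) distance matrix.
d_max = max(sq_dist(p, q) for p, q in itertools.combinations(a + b, 2))
d_ward = ward_distance(a, b)
print(d_max, d_ward, similarity(d_ward, d_max))
```

Here dmax is 122 while the Ward distance is 200, so the similarity is negative. The Ward distance grows with cluster size (it equals n_a * n_b / (n_a + n_b) times the squared distance between centroids), which is why merging large clusters can push it past any single entry of the original matrix.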