Linkage Methods

The document discusses different linkage methods for hierarchical cluster analysis. There are several linkage methods that determine how the distance between clusters is calculated as observations are joined together, including single, average, centroid, complete, median, McQuitty, and Ward's methods. Each method calculates inter-cluster distances differently, so trying multiple methods is recommended to see which works best for a given dataset as some may produce better results than others depending on the characteristics of the data.

Uploaded by

santi

Linkage methods

Learn more about Minitab 17

The linkage method that you choose determines how the distance between two clusters is
defined. At each amalgamation stage, the two closest clusters are joined. At the beginning,
when each observation constitutes a cluster, the distance between clusters is just the inter-
observation distance. Subsequently, after observations are joined together, a linkage rule is
necessary for calculating inter-cluster distances when there are multiple observations in a
cluster.
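The amalgamation procedure described above can be sketched as a naive loop. This is only an illustration, not Minitab's implementation: the names `agglomerate` and `single`, the numbering of merged clusters, and the sample points are all assumptions made for this example.

```python
from itertools import combinations
from math import dist


def agglomerate(points, linkage):
    """Naive agglomerative clustering; returns one (id_a, id_b, distance) per stage."""
    # At the beginning, each observation constitutes its own cluster.
    clusters = {i: [p] for i, p in enumerate(points)}
    steps = []
    next_id = len(points)  # hypothetical numbering scheme for merged clusters
    while len(clusters) > 1:
        # Find the two closest clusters under the chosen linkage rule.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        d = linkage(clusters[a], clusters[b])
        # Join them into a new cluster and record the amalgamation stage.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        steps.append((a, b, d))
        next_id += 1
    return steps


def single(c1, c2):
    """Single linkage: minimum inter-observation distance between two clusters."""
    return min(dist(p, q) for p in c1 for q in c2)


# Illustrative data only.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 3.0)]
steps = agglomerate(points, single)
for a, b, d in steps:
    print(f"join {a} and {b} at distance {d:.3f}")
```

Passing a different `linkage` function changes only how the distance between multi-observation clusters is defined; the loop itself stays the same.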

You might want to try several linkage methods and compare results. Depending on the
characteristics of your data, some methods may provide "better" results than others.
Single
With the single linkage method (also called the nearest neighbor method), the distance
between two clusters is the minimum distance between an observation in one cluster
and an observation in the other cluster. The single linkage method is a good choice
when clusters are obviously separated. When observations lie close together, the
single linkage method tends to identify long chain-like clusters that can have a
relatively large distance separating observations at either end of the chain.
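The chaining behavior can be seen in a small one-dimensional sketch. The evenly spaced values below are made up for illustration: the two groups are joined at a small single linkage distance even though the ends of the resulting cluster are far apart.

```python
# Illustrative observations lying close together along a line.
chain = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
left, right = chain[:3], chain[3:]

# Single linkage: minimum distance between an observation in each cluster.
single_link = min(abs(p - q) for p in left for q in right)

# Distance separating the observations at either end of the merged chain.
merged = left + right
span = max(merged) - min(merged)

print(single_link, span)  # 1.0 5.0
```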
Average
With the average linkage method, the distance between two clusters is the mean
distance between an observation in one cluster and an observation in the other cluster.
Whereas the single and complete linkage methods group clusters based on single
pair distances, the average linkage method uses a more central measure of location.
Centroid
With the centroid linkage method, the distance between two clusters is the distance
between the cluster centroids or means. Like the average linkage method, this method
is another averaging technique.
Complete
With the complete linkage method (also called the furthest neighbor method), the distance
between two clusters is the maximum distance between an observation in one cluster
and an observation in the other cluster. This method ensures that all observations in a
cluster are within a maximum distance and tends to produce clusters with similar
diameters. The results can be sensitive to outliers.
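As a rough sketch, the single, average, centroid, and complete definitions above can be computed directly from the observations in two clusters. The function names and the two sample clusters are assumptions for illustration, and Euclidean distance is assumed as the inter-observation distance.

```python
from math import dist
from statistics import mean


def single(c1, c2):
    """Minimum distance between an observation in each cluster."""
    return min(dist(p, q) for p in c1 for q in c2)


def complete(c1, c2):
    """Maximum distance between an observation in each cluster."""
    return max(dist(p, q) for p in c1 for q in c2)


def average(c1, c2):
    """Mean distance over all cross-cluster pairs of observations."""
    return mean([dist(p, q) for p in c1 for q in c2])


def centroid(c1, c2):
    """Distance between the cluster centroids (coordinate-wise means)."""
    m1 = [mean(col) for col in zip(*c1)]
    m2 = [mean(col) for col in zip(*c2)]
    return dist(m1, m2)


# Illustrative clusters only.
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 0.0), (3.0, 2.0)]

for rule in (single, average, centroid, complete):
    print(f"{rule.__name__:>8}: {rule(a, b):.3f}")
```

For any pair of clusters, the average linkage distance falls between the single and complete linkage distances, because a mean cannot lie outside the minimum and maximum of the values it averages.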
Median
With the median linkage method, the distance between two clusters is the median
distance between an observation in one cluster and an observation in the other cluster.
This is another averaging technique, but it uses the median instead of the mean, thus
downweighting the effect of outliers.
McQuitty
With McQuitty's linkage method, when two clusters are to be joined, the distance of the
new cluster to any other cluster is calculated as the average of the distances of the
soon-to-be-joined clusters to that other cluster. For example, if clusters 1 and 3 are to
be joined into a new cluster, say 1*, then the distance from 1* to cluster 4 is the
average of the distances from 1 to 4 and 3 to 4. Here, distance depends on a
combination of clusters instead of individual observations in the clusters.
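The example with clusters 1, 3, and 4 can be worked through in a few lines. The distance values and the helper name `mcquitty_update` are hypothetical; only the averaging rule comes from the description above.

```python
def mcquitty_update(d, a, b, new):
    """Distances from the merged cluster `new` to every remaining cluster.

    `d` maps frozenset({i, j}) to the current distance between clusters i and j.
    """
    others = {k for pair in d for k in pair} - {a, b}
    return {
        frozenset((new, k)): (d[frozenset((a, k))] + d[frozenset((b, k))]) / 2
        for k in others
    }


# Hypothetical inter-cluster distances before clusters 1 and 3 are joined.
d = {
    frozenset((1, 3)): 2.0,
    frozenset((1, 4)): 6.0,
    frozenset((3, 4)): 10.0,
}

updated = mcquitty_update(d, 1, 3, "1*")
print(updated[frozenset(("1*", 4))])  # 8.0, the average of 6.0 and 10.0
```

The update uses only the two previous cluster-level distances, so it does not need the individual observations or the cluster sizes.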
Ward
With Ward's linkage method, the distance between two clusters is the sum of squared
deviations from points to centroids. The goal of Ward's linkage method is to minimize
the within-cluster sum of squares. It tends to produce clusters with similar numbers of
observations, but it is sensitive to outliers. In Ward's linkage method, it is possible for
the distance between two clusters to be larger than dmax, the maximum value in the
original distance matrix. If this occurs, the similarity will be negative.
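The following sketch illustrates how a Ward distance can exceed dmax. It rests on two assumptions that are not stated in the text above: the Ward distance is taken as the square root of twice the increase in within-cluster sum of squares (a convention under which two single-observation clusters get their plain Euclidean distance), and similarity is taken as 100 * (1 - d / dmax). The function names and sample points are also illustrative.

```python
from math import dist, sqrt


def centroid(cluster):
    """Coordinate-wise mean of the observations in a cluster."""
    return tuple(sum(col) / len(cluster) for col in zip(*cluster))


def within_ss(cluster):
    """Sum of squared deviations from the points to the cluster centroid."""
    c = centroid(cluster)
    return sum(dist(p, c) ** 2 for p in cluster)


def ward_distance(c1, c2):
    """Assumed convention: sqrt(2 * increase in within-cluster sum of squares)."""
    increase = within_ss(c1 + c2) - within_ss(c1) - within_ss(c2)
    return sqrt(2 * increase)


# Illustrative clusters only.
a = [(0.0, 0.0), (0.0, 1.0)]
b = [(4.0, 0.0), (4.0, 1.0)]

# dmax: the maximum value in the original inter-observation distance matrix.
d_max = max(dist(p, q) for p in a + b for q in a + b)
d_ward = ward_distance(a, b)

# Assumed similarity formula; it becomes negative when d_ward > d_max.
similarity = 100 * (1 - d_ward / d_max)
print(f"dmax={d_max:.3f}  ward={d_ward:.3f}  similarity={similarity:.1f}")
```

Here joining the two clusters increases the within-cluster sum of squares by 16, so the assumed Ward distance is about 5.66 while dmax is about 4.12, and the similarity comes out negative.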