Hierarchical Clustering
Ryan P. Adams
COS 324 – Elements of Machine Learning
Princeton University
K-Means clustering is a good general-purpose way to think about discovering groups in data,
but there are several aspects of it that are unsatisfying. For one, it requires the user to specify the
number of clusters in advance, or to perform some kind of post hoc selection. For another, the
notion of what forms a group is very simple: a datum belongs to cluster k if it is closer to the kth
center than it is to any other center. Third, K-Means is nondeterministic; the solution it finds will
depend on the initialization, and even good initialization algorithms such as K-Means++ have a
randomized aspect. Finally, we might reasonably think that our data are more complicated than
can be described by simple partitions. For example, we partition organisms into different species,
but science has also developed a rich taxonomy of living things: kingdom, phylum, class, etc.
Hierarchical clustering is one framework for thinking about how to address these shortcomings.
Hierarchical clustering constructs a (usually binary) tree over the data. The leaves are individual
data items, while the root is a single cluster that contains all of the data. Between the root and
the leaves are intermediate clusters that contain subsets of the data. The main idea of hierarchical
clustering is to make “clusters of clusters” going upwards to construct a tree. There are two main
conceptual approaches to forming such a tree. Hierarchical agglomerative clustering (HAC)
starts at the bottom, with every datum in its own singleton cluster, and merges groups together.
Divisive clustering starts with all of the data in one big group and then chops it up until every
datum is in its own singleton group.
1 Agglomerative Clustering
The basic algorithm for hierarchical agglomerative clustering is shown in Algorithm 1. Essentially,
this algorithm maintains an “active set” of clusters and at each stage decides which two clusters to
merge. When two clusters are merged, they are each removed from the active set and their union
is added to the active set. This iterates until there is only one cluster in the active set. The tree is
formed by keeping track of which clusters were merged.
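To make the procedure concrete, here is a minimal Python sketch of the same loop as Algorithm 1. The names hac and group_distance are mine, and the sketch deliberately rescans every pair of active clusters at each step, so it is quadratic per merge and intended only for clarity.

```python
import itertools

def hac(data, group_distance):
    """Naive hierarchical agglomerative clustering (cf. Algorithm 1).

    `data` is a sequence of data vectors; `group_distance` maps two tuples
    of vectors to a dissimilarity. Written for clarity, not efficiency.
    """
    active = [(x,) for x in data]   # every datum starts as a singleton cluster
    merges = []                     # record of merges; this encodes the tree

    while len(active) > 1:
        # Find the pair of active clusters with the smallest group distance.
        (i, g1), (j, g2) = min(
            itertools.combinations(enumerate(active), 2),
            key=lambda pair: group_distance(pair[0][1], pair[1][1]),
        )
        # Remove both clusters from the active set and add their union.
        active = [g for k, g in enumerate(active) if k not in (i, j)]
        active.append(g1 + g2)
        merges.append((g1, g2))
    return merges
```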
Algorithm 1 Hierarchical Agglomerative Clustering. Note: written for clarity, not efficiency.
1: Input: data vectors {x_n}_{n=1}^N, group-wise distance D(G, G′)
2: A ← ∅ ⊲ Active set starts out empty.
3: for n ← 1 . . . N do ⊲ Loop over the data.
4:   A ← A ∪ {{x_n}} ⊲ Add each datum as its own cluster.
5: end for
6: T ← A ⊲ Store the tree as a sequence of merges. In practice, pointers.
7: while |A| > 1 do ⊲ Loop until the active set only has one item.
8:   G1, G2 ← argmin_{G1, G2 ∈ A, G1 ≠ G2} D(G1, G2) ⊲ Choose pair in A with best distance.
9:   A ← (A \ {G1}) \ {G2} ⊲ Remove each from active set.
10:  A ← A ∪ {G1 ∪ G2} ⊲ Add union to active set.
11:  T ← T ∪ {G1 ∪ G2} ⊲ Add union to tree.
12: end while
13: Return: Tree T.
The clustering found by HAC can be examined in several different ways. Of particular interest
is the dendrogram, which is a visualization that highlights the kind of exploration enabled by
hierarchical clustering over flat approaches such as K-Means. A dendrogram shows data items
along one axis and distances along the other axis. The dendrograms in these notes will have the
data on the y-axis. A dendrogram shows a collection of ⊐-shaped paths, where the legs show
the groups that have been joined together. These groups may be the base of another ⊐ or may
be singleton groups represented as the data along the axis. A key property of the dendrogram
is that the vertical base of the ⊐ is located along the x-axis according to the distance between
the two groups that are being merged. For this to result in a sensible clustering – and a valid
dendrogram – these distances must be monotonically increasing. That is, the distance between two
merged groups G and G′ must always be greater than or equal to the distance between any of the
previously-merged subgroups that formed G and G′.
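To produce such a dendrogram in practice, one might use SciPy's hierarchical clustering utilities. The sketch below assumes a small synthetic data matrix X and placeholder labels, and plots the tree with the data on the y-axis and merge distances on the x-axis, matching the orientation used in these notes.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical data: 20 items with 5 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
labels = ["item %d" % i for i in range(20)]

# The linkage matrix records the sequence of merges and their distances.
Z = linkage(X, method="single")

# Leaves (data items) on the y-axis, merge distances on the x-axis.
dendrogram(Z, labels=labels, orientation="right")
plt.xlabel("group distance")
plt.tight_layout()
plt.show()
```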
Figure 1b shows a dendrogram for a set of professional basketball players, based on some per-
game performance statistics in the 2012-13 season. Figure 1a, to its left, shows the pairwise
distance matrix that was used to compute the dendrogram. Notice how some distinct groups appear
as blocks in the distance matrix and as subtrees in the dendrogram. When we explore these data,
we might observe that this structure seems to correspond to position; all of the players in the
bottom subtree between Dwight Howard and Paul Millsap are centers or power forwards (except for
Paul Pierce, who is considered more of a small forward) and play near the basket. Above these is a
somewhat messier subtree that contains point guards (e.g., Stephen Curry and Tony Parker) and
shooting guards (e.g., Dwyane Wade and Kobe Bryant). At the top are Kevin Durant and LeBron
James, as they are outliers in several categories. Anderson Varejao also appears to be an unusual
player according to these data; I attribute this to him having an exceptionally large number of
rebounds for a high-scoring player.

Figure 1: (a) Pairwise Distances. (b) Single-Linkage Dendrogram.
The main decision to make when using HAC is what the distance criterion¹ should be between
groups – the D(G, G′) function in the pseudocode. In K-Means, we looked at distances between
data items; in HAC we look at distances between groups of data items. Perhaps not surprisingly,
there are several different ways to think about such distances. In each of the cases below, we
consider the distances between two groups G = {x_n}_{n=1}^N and G′ = {y_m}_{m=1}^M, where N and M are
not necessarily the same. Figure 2 illustrates these four types of “linkages”. Figures 3 and 4 show
the effects of these linkages on some simple data.

¹These are not necessarily “distances” in the formal sense that they arise from a metric space. Here we’ll be thinking
of distances as a measure of dissimilarity.
The Single-Linkage Criterion: The single-linkage criterion for hierarchical clustering merges
groups based on the shortest distance over all possible pairs. That is,
\[
D_{\mathrm{SL}}\bigl(\{x_n\}_{n=1}^{N}, \{y_m\}_{m=1}^{M}\bigr) = \min_{n,m} \|x_n - y_m\|, \tag{1}
\]
where ||x − y|| is an appropriately chosen distance metric between data examples. See Figure 2a.
This criterion merges a group with its nearest neighbor and has an interesting interpretation. Think
of the data as the vertices in a graph. When we merge a group using the single-linkage criterion, we
add an edge between the two vertices that minimize Equation 1. As we never add an edge between two
members of an existing group, we never introduce loops as we build up the graph. Ultimately, when
the algorithm terminates, we have a tree. As we were adding edges at each stage that minimize
the distance between groups (subject to not adding a loop), we actually end up with the tree that
connects all the data but for which the sum of the edge lengths is smallest. That is, single-linkage
HAC produces the minimum spanning tree for the data.
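This connection is easy to check numerically: the merge distances produced by single-linkage HAC should coincide with the edge weights of the minimum spanning tree. The sketch below uses SciPy on a hypothetical random data set.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                 # hypothetical data set

# Single-linkage merge distances (third column of the linkage matrix).
heights = linkage(X, method="single")[:, 2]

# Edge weights of the minimum spanning tree of the complete distance graph.
mst = minimum_spanning_tree(squareform(pdist(X)))

# The two sets of N-1 values should match (up to ordering).
print(np.allclose(np.sort(heights), np.sort(mst.data)))
```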
Eponymously, to merge two clusters with the single-linkage criterion, you just need a single pair
of items to be close. This can result in “chaining” and long, stringy clusters. This may be good
or bad, depending on your data and your desires. Figure 4a shows an example where it seems like
a good thing because it is able to capture the elongated shape of the pinwheel lobes. On the other
hand, this effect can result in premature merging of clusters in the tree.
The Complete-Linkage Criterion: Rather than choosing the shortest distance, in complete-
linkage clustering the distance between two groups is determined by the largest distance over all
possible pairs, i.e.,
\[
D_{\mathrm{CL}}\bigl(\{x_n\}_{n=1}^{N}, \{y_m\}_{m=1}^{M}\bigr) = \max_{n,m} \|x_n - y_m\|, \tag{2}
\]
where again ||x − y|| is an appropriate distance measure. See Figure 2b. This has the opposite of
the chaining effect and prefers to make highly compact clusters, as it requires all of the distances
to be small. Figures 3b and 4b show how this results in tighter clusters.
The Average-Linkage Criterion: Rather than the worst or best distances, when using the average-
linkage criterion we average over all possible pairs between the groups:
\[
D_{\mathrm{A}}\bigl(\{x_n\}_{n=1}^{N}, \{y_m\}_{m=1}^{M}\bigr) = \frac{1}{NM} \sum_{n=1}^{N} \sum_{m=1}^{M} \|x_n - y_m\|. \tag{3}
\]
This linkage can be thought of as a compromise between the single and complete linkage criteria.
It produces compact clusters that can still have some elongated shape. See Figures 2c, 3c, and 4c.
The Centroid Criterion: Another approach to computing the distance between clus-
ters is to look at the difference between their centroids:
\[
D_{\mathrm{C}}\bigl(\{x_n\}_{n=1}^{N}, \{y_m\}_{m=1}^{M}\bigr) = \Bigl\| \frac{1}{N}\sum_{n=1}^{N} x_n - \frac{1}{M}\sum_{m=1}^{M} y_m \Bigr\|. \tag{4}
\]
Note that this is something that only makes sense if an average of data items is sensible; recall the
motivation for K-Medoids versus K-Means. See Figures 2d, 3d, and 4d.
Although this criterion is appealing when thinking of HAC as a next step beyond K-Means, it
does present some difficulties. Specifically, the centroid linkage criterion breaks the assumption of
monotonicity of merges and can result in an inversion in the dendrogram.
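For reference, the four criteria above can be written compactly with NumPy and SciPy, treating each group as an array with one row per data item. This is a sketch with function names of my choosing; any of them could serve as the group-wise distance D(G, G′) in Algorithm 1.

```python
import numpy as np
from scipy.spatial.distance import cdist

def single_linkage(G1, G2):
    """Equation 1: minimum distance over all inter-group pairs."""
    return cdist(G1, G2).min()

def complete_linkage(G1, G2):
    """Equation 2: maximum distance over all inter-group pairs."""
    return cdist(G1, G2).max()

def average_linkage(G1, G2):
    """Equation 3: average distance over all inter-group pairs."""
    return cdist(G1, G2).mean()

def centroid_linkage(G1, G2):
    """Equation 4: distance between the two group centroids."""
    return np.linalg.norm(G1.mean(axis=0) - G2.mean(axis=0))
```

If the data are NumPy vectors, these functions can also be plugged into the earlier hac sketch by stacking each group with np.vstack before computing the distance.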
Figure 2: Four different types of linkage criteria for hierarchical agglomerative clustering (HAC).
(a) Single linkage looks at minimum distance between all inter-group pairs. (b) Complete linkage
looks at the maximum distance between all inter-group pairs. (c) Average linkage uses the average
distance between all inter-group pairs. (d) Centroid linkage first computes the centroid of each
group and then looks at the distance between them. Inspired by Figure 17.3 of Manning et al.
(2008).

1.1 Discussion
Hierarchical agglomerative clustering is our first example of a nonparametric, or instance-based,
machine learning method. When thinking about machine learning methods, it is useful to think
about the space of possible things that can be learned from data, i.e., our hypothesis space.
Parametric methods such as K-Means decide in advance how large this hypothesis space will be;
in clustering that means how many clusters there can be and what possible shapes they can have.
Nonparametric methods such as HAC allow the effective number of parameters to grow with the
size of the data. This can be appealing because we have to make fewer choices when applying
our algorithm. The downside of nonparametric methods is that their flexibility can increase
computational complexity. Also, nonparametric methods often depend on some notion of distance
in the data space and distances become less meaningful in higher dimensions. This phenomenon
is known as the curse of dimensionality and it unfortunately comes up quite often in machine
learning. There are various ways to get an intuition for this behavior. One useful way to see the
curse of dimensionality is to observe that squared Euclidean distances are sums over dimensions. If
the data are random, the squared differences in each dimension are also random, and by the central
limit theorem their sum converges to a Gaussian distribution as the dimension grows. Figure 5 shows
this effect in the unit hypercube for 2, 10, 100, and 1000 dimensions.
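This figure is easy to reproduce. A short sketch (assuming NumPy, SciPy, and Matplotlib are available) draws 1000 points uniformly from the unit hypercube at each dimensionality and histograms their pairwise distances.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
dims = [2, 10, 100, 1000]

fig, axes = plt.subplots(1, len(dims), figsize=(16, 3))
for ax, d in zip(axes, dims):
    X = rng.random((1000, d))      # uniform points in the d-dimensional unit hypercube
    ax.hist(pdist(X), bins=50)     # all pairwise Euclidean distances
    ax.set_title("%d dimensions" % d)
    ax.set_xlabel("interpoint distance")
plt.tight_layout()
plt.show()
```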
Figure 3: These figures show clusterings from the four different group distance criteria, applied to
the joint durations and waiting times between Old Faithful eruptions. The data were normalized
before clustering. In each case, HAC was run, the tree was truncated at six groups, and these groups
are shown as different colors.
Figure 4: These figures show clusterings from the four different group distance criteria, applied to
1500 synthetic “pinwheel” data. In each case, HAC was run, the tree was truncated at three groups,
and these groups are shown as different colors. (a) The single-linkage criterion can give stringy
clusters, so it can capture the pinwheel shapes. (b-d) Complete, average, and centroid linkages try
to create more compact clusters and so tend not to identify the lobes.
(a) 2 Dimensions (b) 10 Dimensions (c) 100 Dimensions (d) 1000 Dimensions
Figure 5: Histograms of inter-point Euclidean distances for 1000 points in a unit hypercube of
increasing dimensionality.
Notice how the distribution concentrates relative to the minimum (zero) and maximum (√D) values.
The curse of dimensionality is the idea that this concentration means differences in data will
become less meaningful as dimension increases.
2 Divisive Clustering
Agglomerative clustering is a widely-used and intuitive procedure for data exploration and the
construction of hierarchies. While HAC is a bottom-up procedure, divisive clustering is a top-down
hierarchical clustering approach. It starts with all of the data in a single group and then applies a flat
clustering method recursively. That is, it first divides the data into K clusters using, e.g., K-Means
or K-Medoids, and then it further subdivides each of these clusters into smaller groups. This can
be performed until the desired granularity is achieved or each datum belongs to a singleton cluster.
One advantage of divisive clustering is that it does not require binary trees. However, it suffers
from all of the difficulties and non-determinism of flat clustering, so it is less commonly used than
HAC. A sketch of the divisive clustering algorithm is shown in Algorithm 2.
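As a concrete counterpart to Algorithm 2, the sketch below uses scikit-learn's K-Means as the flat clustering subroutine. The function name divisive_cluster and its arguments are mine, and the nested-list output is just one convenient way to represent the resulting hierarchy.

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive_cluster(indices, X, K=2, min_size=1):
    """Recursively split the rows of X indexed by `indices` into K groups."""
    # Stop at singletons (or groups too small to split K ways).
    if len(indices) <= min_size or len(indices) < K:
        return [indices]
    labels = KMeans(n_clusters=K, n_init=10).fit_predict(X[indices])
    children = []
    for k in range(K):
        child = indices[labels == k]
        if len(child) == len(indices):   # degenerate split; stop recursing
            return [indices]
        children.append(divisive_cluster(child, X, K=K, min_size=min_size))
    return children

# Usage: build a top-down hierarchy over a small synthetic data set.
X = np.random.default_rng(0).normal(size=(100, 2))
tree = divisive_cluster(np.arange(len(X)), X, K=2)
```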
Algorithm 2 K-Wise Divisive Clustering. Note: written for clarity, not efficiency.
1: Input: data vectors {x_n}_{n=1}^N, flat clustering procedure FC(G, K)
2:
3: function SD(G, K) ⊲ Function to call recursively.
4:   {H_k}_{k=1}^K ← FC(G, K) ⊲ Perform flat clustering of this group.
5:   S ← ∅
6:   for k ← 1 . . . K do ⊲ Loop over the resulting partitions.
7:     if |H_k| = 1 then
8:       S ← S ∪ {H_k} ⊲ Add singleton.
9:     else
10:      S ← S ∪ SD(H_k, K) ⊲ Recurse on non-singletons and add.
11:    end if
12:  end for
13:  Return: S ⊲ Return a set of sets.
14: end function
15:
16: Return: SD({x_n}_{n=1}^N, K) ⊲ Call and recurse on the whole data set.

3 Additional Reading
• Chapter 17 of Manning et al. (2008) is freely available online and is an excellent resource.
References
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information
Retrieval. Cambridge University Press, 2008. URL http://nlp.stanford.edu/IR-book/.
Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification. Wiley-Interscience,
2001.
(a) Animals and Features
Figure 6: These figures show the result of running HAC on a data set of 50 animals, each with 85
binary features. (a) The feature matrix, where the rows are animals and the columns are binary
features. (b) The distance matrix computed using pairwise Hamming distances. The ordering
shown here is chosen to highlight the block structure. Darker colors are smaller distances. (c) A
dendrogram arising from HAC with average-linkage.
(a) Nations and Features
Figure 7: These figures show the result of running HAC on a data set of 14 nations, with binary
features. (a) The feature matrix, where the rows are nations and the columns are binary features.
When a feature was missing it was replaced with 1/2. (b) The distance matrix computed using
pairwise Euclidean distances. The ordering shown here is chosen to highlight the block structure.
Darker colors are smaller distances. (c) A dendrogram arising from HAC with complete linkage.
(a) Senators and Votes
Figure 8: These figures show the result of running HAC on a data set of 104 senators in the 113th
US Congress, with binary features corresponding to votes on 172 bills. (a) The feature matrix,
where the rows are senators and the columns are votes. When a vote was missing or there was an
abstention it was replaced with 1/2. (b) The distance matrix computed using pairwise Euclidean
distances. The ordering shown here is chosen to highlight the block structure. Darker coloring
corresponds to smaller distance. (c) A dendrogram arising from HAC with complete linkage, along
with two colored clusters.