Hierarchical Clustering in Machine Learning
Dendrogram
A dendrogram, a tree-like figure produced by
hierarchical clustering, depicts the hierarchical
relationships between groups.
Individual data points are located at the bottom of
the dendrogram,
while the largest clusters, which include all the data
points, are located at the top.
In order to generate different numbers of clusters, the
dendrogram can be sliced at various heights.
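As a minimal illustration of cutting the tree at different heights, the sketch below assumes a small made-up 2-D NumPy array of observations and uses SciPy's linkage and fcluster functions (both of which appear again later in these slides).

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Made-up sample observations for illustration only.
X = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.2], [5.3, 4.9], [9.0, 0.5]])
Z = linkage(X, method='ward')  # build the hierarchy (the dendrogram's merge tree)

# Cutting the dendrogram at different heights yields different cluster counts.
print(fcluster(Z, t=1.0, criterion='distance'))  # lower cut height -> more clusters
print(fcluster(Z, t=8.0, criterion='distance'))  # higher cut height -> fewer clusters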
A dendrogram
The x-axis of the dendrogram represents the individual data observations (the leaves),
while the y-axis represents the distance (for example, the Euclidean distance) at which observations or clusters are merged.
Hierarchical clustering types
1. Agglomerative: Initially, each object is considered to be its own cluster. According to a particular criterion, the clusters are then merged step by step until a single cluster remains. At the end of the merging process, one cluster containing all the elements is formed.
Hierarchical clustering types
2. Divisive: The Divisive method is the opposite of the Agglomerative method. Initially, all objects are considered to be in a single cluster. The division process is then performed step by step until each object forms its own cluster. The splitting is carried out according to a criterion such as the maximum distance between neighboring objects in the cluster.
Agglomerative
Agglomerative: The steps for agglomerative clustering can be summarized as follows:
1. Treat each data point as a single cluster and compute the proximity (distance) matrix.
2. Merge the two closest clusters.
3. Update the proximity matrix to reflect the newly formed cluster.
4. Repeat steps 2 and 3 until only a single cluster remains.
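The following from-scratch sketch mirrors these steps using single (minimum-distance) linkage; the small data array is a made-up example, and in practice the SciPy and Scikit-Learn functions shown later are used instead.

import numpy as np

def naive_agglomerative(X, n_clusters=1):
    # Step 1: every observation starts as its own cluster.
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        # Step 2: find the pair of clusters with the smallest
        # single-linkage (minimum pairwise) distance.
        best = (0, 1, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        # Step 3: merge the two closest clusters and repeat.
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters

X = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.2], [5.3, 4.9], [9.0, 0.5]])
print(naive_agglomerative(X, n_clusters=2))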
Euclidean Distance
The Pythagorean Theorem can be used to calculate the
distance between two points, as shown in the figure below.
If the points are (x1, y1) and (x2, y2) in 2-dimensional space, the distance is d = sqrt((x2 - x1)^2 + (y2 - y1)^2).
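A quick numeric check of the formula, using two made-up example points; math.dist is the standard-library helper that computes the same value.

import math

p1, p2 = (1, 2), (4, 6)
d = math.sqrt((p2[0] - p1[0]) ** 2 + (p2[1] - p1[1]) ** 2)
print(d)                  # 5.0
print(math.dist(p1, p2))  # same result via the standard library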
Manhattan Distance
Euclidean distance may not be suitable for measuring the distance between different locations, for example when movement is restricted to a street grid.
The Manhattan distance is the sum of the absolute horizontal and vertical components: d = |x2 - x1| + |y2 - y1|.
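The same made-up points as above illustrate the Manhattan distance: it is simply the sum of the absolute coordinate differences.

p1, p2 = (1, 2), (4, 6)
manhattan = abs(p2[0] - p1[0]) + abs(p2[1] - p1[1])
print(manhattan)  # 3 + 4 = 7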
Computing a proximity matrix
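A minimal sketch of building a proximity matrix, assuming a small made-up 2-D array of observations; pdist computes all pairwise distances and squareform arranges them into a symmetric matrix.

import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.2], [5.3, 4.9]])
proximity = squareform(pdist(X, metric='euclidean'))
print(np.round(proximity, 2))  # symmetric matrix of pairwise Euclidean distances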
Similarity between Clusters
The main question in hierarchical clustering is
how to calculate the distance between clusters
and update the proximity matrix.
There are many different approaches used to
answer that question.
The choice will depend on whether there is noise in the data set, whether the shape of the clusters is circular or not, and the density of the data points.
A numerical example
There are two clusters in the sample data set, as shown in the figure.
Min (Single) Linkage
One way to measure the distance between
clusters is to find the minimum distance
between points in those clusters.
That is, we can find the point in the first
cluster nearest to a point in the other cluster
and calculate the distance between those
points.
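A small sketch of the single (min) linkage idea with two made-up example clusters: the cluster-to-cluster distance is the smallest entry in the matrix of pairwise point distances.

import numpy as np
from scipy.spatial.distance import cdist

cluster_a = np.array([[1.0, 1.0], [1.5, 1.2], [1.2, 0.8]])
cluster_b = np.array([[5.0, 5.2], [5.3, 4.9]])
print(cdist(cluster_a, cluster_b).min())  # single-linkage (minimum) distance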
Max (Complete) Linkage
Another approach is to use the maximum distance between points in the two clusters.
That is, we find the points in each cluster that are furthest away from each other and calculate the distance between those points.
Max (Complete) Linkage
MAX is less sensitive to noise and outliers than the MIN method.
However, MAX can break large clusters and tends to be biased towards globular clusters.
Centroid Linkage
The Centroid method defines the distance between clusters as being the
distance between their centers/centroids.
After calculating the centroid for each cluster, the distance between those
centroids is computed using a distance function.
Average Linkage
The Average method defines the distance between clusters as
the average pairwise distance among all pairs of points in the
clusters.
For simplicity, only some of the lines connecting pairs of points are shown in the figure.
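For the same kind of made-up example clusters, the complete (max), average, and centroid linkage distances can be sketched directly from the pairwise distance matrix and the cluster centroids:

import numpy as np
from scipy.spatial.distance import cdist

cluster_a = np.array([[1.0, 1.0], [1.5, 1.2], [1.2, 0.8]])
cluster_b = np.array([[5.0, 5.2], [5.3, 4.9]])
pairwise = cdist(cluster_a, cluster_b)

print(pairwise.max())   # complete (max) linkage: furthest pair of points
print(pairwise.mean())  # average linkage: mean of all pairwise distances
print(np.linalg.norm(cluster_a.mean(axis=0) - cluster_b.mean(axis=0)))  # centroid linkage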
Ward Linkage
The Ward approach analyzes the variance of the clusters rather than measuring distances directly, minimizing the increase in within-cluster variance at each merge.
With the Ward method, the distance between
two clusters is related to how much the sum of
squares (SS) value will increase when
combined.
Ward Linkage
In other words, the Ward method attempts to
minimize the sum of the squared distances of
the points from the cluster centers.
Compared to the distance-based measures, the
Ward method is less susceptible to noise and
outliers.
Therefore, Ward's method is often preferred over the other linkage methods.
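The sketch below illustrates the Ward idea numerically for two made-up clusters: merging them increases the total within-cluster sum of squares, and Ward linkage joins the pair of clusters for which this increase is smallest.

import numpy as np

def ss(points):
    # Sum of squared distances of the points from their centroid.
    return ((points - points.mean(axis=0)) ** 2).sum()

a = np.array([[1.0, 1.0], [1.5, 1.2], [1.2, 0.8]])
b = np.array([[5.0, 5.2], [5.3, 4.9]])
increase = ss(np.vstack([a, b])) - (ss(a) + ss(b))
print(increase)  # the SS increase caused by merging clusters a and b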
Hierarchical Clustering with Python
In Python, the Scipy and Scikit-Learn libraries provide functions for hierarchical clustering.
First, we'll import NumPy, matplotlib, and seaborn
(for plot styling):
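The slides' import-and-setup code is not reproduced in this text, so the sketch below is an assumed stand-in: it imports NumPy, matplotlib, and seaborn, and generates a small sample data set with scikit-learn's make_blobs (the data set used in the original slides may differ).

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import make_blobs

sns.set()  # seaborn plot styling

# Hypothetical sample data: 50 two-dimensional points around 3 centers.
X, _ = make_blobs(n_samples=50, centers=3, cluster_std=1.0, random_state=42)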
Hierarchical Clustering with Python
We can graph this data set as a scatter plot:
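Continuing the assumed setup above, a scatter plot of the stand-in data set:

import matplotlib.pyplot as plt

plt.scatter(X[:, 0], X[:, 1], s=40)
plt.show()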
Hierarchical Clustering using Scipy
The Scipy library has the linkage function for
hierarchical (agglomerative) clustering.
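A sketch of calling linkage on the stand-in data set X from the setup above, using the Ward method (the method referred to later with fcluster):

from scipy.cluster.hierarchy import linkage

Z = linkage(X, method='ward')  # Z encodes the sequence of cluster merges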
Hierarchical Clustering using Scipy
By passing the linkage matrix to the dendrogram function, we can view a plot of these linkages with matplotlib:
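A sketch of plotting the dendrogram for the linkage matrix Z computed above:

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram

dendrogram(Z)
plt.show()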
RESULT: Dendrogram
Hierarchical Clustering using Scipy
Finally, let's use the fcluster function to find
the clusters for the Ward linkage:
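A sketch of extracting flat cluster labels from the Ward linkage, assuming we ask for 3 clusters to match the stand-in data generated above:

from scipy.cluster.hierarchy import fcluster

labels = fcluster(Z, t=3, criterion='maxclust')
print(labels)  # cluster id (1..3) for each observation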
Hierarchical Clustering using Scikit-Learn
Hierarchical Clustering using Scikit-Learn
Using sklearn is slightly different from using scipy.
We need to import the AgglomerativeClustering
class, then instantiate it with the number of desired
clusters and the distance (linkage) function to use.
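A sketch of the scikit-learn version, again assuming the stand-in data set X and 3 clusters:

from sklearn.cluster import AgglomerativeClustering

model = AgglomerativeClustering(n_clusters=3, linkage='ward')
labels = model.fit_predict(X)  # cluster id (0..2) for each observation
print(labels)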
Hierarchical Clustering using Scikit-Learn
Result:
Clustering a real dataset
This example uses a dataset from the book Biostatistics with R, which contains information on nine different protein sources and their respective consumption across various countries.
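A hedged sketch of clustering this data with pandas and SciPy; the file name protein.csv and its column layout are assumptions, so adjust them to your copy of the Biostatistics with R data.

import pandas as pd
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Assumed layout: rows are countries, columns are the nine protein sources.
df = pd.read_csv('protein.csv', index_col=0)
Z = linkage(df.values, method='ward')
dendrogram(Z, labels=df.index.to_list())
plt.show()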
Dendrogram
Result