
Hierarchical Clustering in Machine Learning

Fundamentals of Data Science: UNIT II
Dr. C. Sivaraj
Hierarchical clustering
• Hierarchical clustering is a popular unsupervised machine learning method for grouping objects.
• It creates groups so that objects within a group are similar to each other and different from objects in other groups.
• Clusters are visually represented in a hierarchical tree called a dendrogram.
• The underlying assumption is that data points that are close to each other are more similar or related than data points that are farther apart.
Let's consider that we have a set of cars (sedans and SUVs) and we want to group similar ones together.
• For starters, we have four cars that we can put into two clusters of car types.
• Next, we'll bunch the sedans and the SUVs together.
• In the last step, we can group everything into one cluster and finish when we're left with only one cluster.
[Figure: dendrogram of the car example]
A dendrogram
• A dendrogram, a tree-like figure produced by hierarchical clustering, depicts the hierarchical relationships between groups.
• Individual data points are located at the bottom of the dendrogram, while the largest cluster, which includes all the data points, is located at the top.
• To generate different numbers of clusters, the dendrogram can be cut at various heights.
A dendrogram
• The X axis of the dendrogram represents the individual data observations of the dataset,
• The Y axis represents the Euclidean distance at which those observations (or clusters of observations) are merged.
Hierarchical clustering types
1. Agglomerative: Initially, each object is considered to be its own cluster. According to a particular procedure, the clusters are then merged step by step until a single cluster remains. At the end of the merging process, a cluster containing all the elements is formed.
Hierarchical clustering types
2. Divisive: The Divisive method is the opposite of the Agglomerative method. Initially, all objects are considered to be in a single cluster. The division process is then performed step by step until each object forms its own cluster. The cluster division or splitting procedure is carried out according to the principle of maximizing the distance between neighboring objects in the cluster.
Agglomerative

Agglomerative: The steps for agglomerative clustering can be summarized as follows:

• Hierarchical clustering employs a measure of distance/similarity to create new clusters.
• Step 1: Compute the proximity matrix using a particular distance metric.
• Step 2: Assign each data point to its own cluster.
• Step 3: Merge the clusters based on a metric for the similarity between clusters.
• Step 4: Update the distance matrix.
• Step 5: Repeat Step 3 and Step 4 until only a single cluster remains (a minimal code sketch follows).
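The sketch below assumes a small made-up 2-D data set and uses SciPy's linkage function, whose output lists each merge the algorithm performs:

import numpy as np
from scipy.cluster.hierarchy import linkage

# A small made-up 2-D data set, used only for illustration.
X = np.array([[1.0, 1.0], [1.5, 1.0], [5.0, 5.0], [5.5, 5.5], [3.0, 4.0]])

# linkage() starts with each point as its own cluster and repeatedly merges
# the two closest clusters until a single cluster remains (Steps 3-5 above).
Z = linkage(X, method='single', metric='euclidean')

# Each row of Z records one merge: [cluster i, cluster j, distance, size of the new cluster]
print(Z)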
Computing a proximity matrix
• The first step of the algorithm is to create a distance matrix.
• The values of the matrix are calculated by applying a distance function between each pair of objects.
• The Euclidean distance function is commonly used for this operation.
Euclidean Distance
• The Euclidean distance is the most widely used distance measure when the variables are continuous (either interval or ratio scale).
• The Euclidean distance between two points is the length of the segment connecting them. It is the most intuitive way of representing the distance between two points.
Euclidean Distance
• The Pythagorean Theorem can be used to calculate the distance between two points, as shown in the figure below.
• For points (x1, y1) and (x2, y2) in 2-dimensional space, the distance is
  d = sqrt((x2 - x1)^2 + (y2 - y1)^2)
Manhattan Distance
• Euclidean distance may not be suitable for measuring the distance between different locations, for example when travel is restricted to a street grid.
• The Manhattan distance is the simple sum of the horizontal and vertical components:
  d = |x2 - x1| + |y2 - y1|
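As a quick illustration with two made-up points, SciPy provides both measures directly (cityblock is SciPy's name for the Manhattan distance):

from scipy.spatial.distance import euclidean, cityblock

p = (1, 2)   # two made-up points
q = (4, 6)

print(euclidean(p, q))   # sqrt(3**2 + 4**2) = 5.0
print(cityblock(p, q))   # |3| + |4| = 7  (Manhattan distance)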
Computing a proximity matrix
[Figure: the proximity matrix computed for the sample data set]
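A minimal sketch of this computation, again assuming a small made-up data set, using SciPy's pdist and squareform:

import numpy as np
from scipy.spatial.distance import pdist, squareform

# A small made-up data set, used only for illustration.
X = np.array([[1.0, 1.0], [1.5, 1.0], [5.0, 5.0], [3.0, 4.0]])

# pdist returns the condensed pairwise Euclidean distances;
# squareform expands them into the full symmetric proximity matrix.
D = squareform(pdist(X, metric='euclidean'))
print(np.round(D, 2))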
Similarity between Clusters
• The main question in hierarchical clustering is how to calculate the distance between clusters and update the proximity matrix.
• There are many different approaches used to answer that question.
• The choice will depend on:
  - whether there is noise in the data set,
  - whether the shape of the clusters is circular or not,
  - the density of the data points.
A numerical example

There are two clusters in the sample data set, as shown in the figure.
Min (Single) Linkage
• One way to measure the distance between clusters is to find the minimum distance between points in those clusters.
• That is, we can find the point in the first cluster nearest to a point in the other cluster and calculate the distance between those points.
Min (Single) Linkage
• The advantage of the Min method is that it can accurately handle non-elliptical shapes.
• The disadvantage is that it is sensitive to noise and outliers.
Max (Complete) Linkage
• Another option is to use the maximum distance between points in two clusters.
• We can find the points in each cluster that are furthest away from each other and calculate the distance between those points.
• In the figure, the distance between the two furthest points is taken as the distance between the clusters.
Max (Complete) Linkage
• The Max method uses the maximum distance between points in two clusters.
• Find the points in each cluster that are furthest away from each other and calculate the distance between those points.
Max (Complete) Linkage
 Max is less sensitive to noise and outliers in comparison to
MIN method.
 However, MAX can break large clusters and tends to be
biased towards globular clusters.

22
Centroid Linkage
• The Centroid method defines the distance between clusters as the distance between their centers/centroids.
• After calculating the centroid for each cluster, the distance between those centroids is computed using a distance function.
Average Linkage
• The Average method defines the distance between clusters as the average pairwise distance among all pairs of points in the clusters.
• For simplicity, only some of the lines connecting pairs of points are shown in the figure.
Ward Linkage
• The Ward approach analyzes the variance of the clusters rather than measuring distances directly, minimizing the increase in within-cluster variance at each merge.
• With the Ward method, the distance between two clusters is related to how much the total sum of squares (SS) will increase when they are combined.
Ward Linkage
• In other words, the Ward method attempts to minimize the sum of the squared distances of the points from the cluster centers.
• Compared to the purely distance-based measures, the Ward method is less susceptible to noise and outliers.
• Therefore, Ward's method is often preferred over the others in clustering (a small numerical sketch follows).
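A minimal sketch of Ward's criterion on two made-up clusters, computing how much the total within-cluster sum of squares would grow if they were merged:

import numpy as np

# Two made-up clusters of 2-D points.
A = np.array([[1.0, 1.0], [1.5, 1.2], [1.2, 0.8]])
B = np.array([[5.0, 5.0], [5.5, 5.4]])

def ss(points):
    """Sum of squared distances of the points from their own centroid."""
    return ((points - points.mean(axis=0)) ** 2).sum()

# Ward's criterion: the increase in total within-cluster sum of squares
# if clusters A and B were merged into one cluster.
increase = ss(np.vstack([A, B])) - (ss(A) + ss(B))
print(increase)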
Hierarchical Clustering with Python
• In Python, the Scipy and Scikit-Learn libraries have defined functions for hierarchical clustering.
• First, we'll import NumPy, matplotlib, and seaborn (for plot styling):
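The slide's original code is not reproduced here; a minimal equivalent sketch, assuming a synthetic data set generated with scikit-learn's make_blobs (the slide's actual data set is not shown), might look like this:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()  # apply seaborn's plot styling

# Assumption: a synthetic data set stands in for the slide's data set.
from sklearn.datasets import make_blobs
X, _ = make_blobs(n_samples=100, centers=3, random_state=42)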
Hierarchical Clustering with Python
• Next, we graph this data set as a scatter plot:
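A sketch of the plotting step, reusing the X array assumed above:

# Scatter plot of the (assumed) data set.
plt.scatter(X[:, 0], X[:, 1], s=30)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()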
Hierarchical Clustering using Scipy
• The Scipy library has the linkage function for hierarchical (agglomerative) clustering.
• The linkage function has several methods available for calculating the distance between clusters: single, complete, average, weighted, centroid, median, and ward.
• To draw the dendrogram, we'll use the dendrogram function.
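A minimal sketch of the linkage step on the assumed X array, computing several of the available methods:

from scipy.cluster.hierarchy import linkage

# One linkage matrix per method, so the resulting dendrograms can be compared.
linkages = {m: linkage(X, method=m) for m in ['single', 'complete', 'average', 'ward']}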
Hierarchical Clustering using Scipy
• By passing the linkage output to the dendrogram function, we can view a plot of these linkages with matplotlib:
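A sketch of the plotting step, using the Ward linkage computed above:

from scipy.cluster.hierarchy import dendrogram

plt.figure(figsize=(10, 5))
dendrogram(linkages['ward'])   # the Ward linkage matrix from the previous sketch
plt.xlabel('Data point index')
plt.ylabel('Distance')
plt.show()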
RESULT: Dendrogram

Hierarchical Clustering using Scipy
• Finally, let's use the fcluster function to find the clusters for the Ward linkage:
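A sketch of this step; the choice of three clusters here is an assumption for illustration:

from scipy.cluster.hierarchy import fcluster

# Cut the Ward dendrogram into a fixed number of flat clusters.
labels = fcluster(linkages['ward'], t=3, criterion='maxclust')
print(labels)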
Hierarchical Clustering using Scikit-Learn
• The Scikit-Learn library has its own class for agglomerative hierarchical clustering: AgglomerativeClustering.
• Options for calculating the distance between clusters include ward, complete, average, and single.
Hierarchical Clustering using Scikit-Learn
• Using sklearn is slightly different than scipy.
• We need to import the AgglomerativeClustering class, then instantiate it with the number of desired clusters and the distance (linkage) function to use.
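A minimal sketch of that step, again assuming three clusters and the X array from before:

from sklearn.cluster import AgglomerativeClustering

model = AgglomerativeClustering(n_clusters=3, linkage='ward')
labels = model.fit_predict(X)

# Colour the scatter plot by the assigned cluster labels.
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.show()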
Hierarchical Clustering using Scikit-Learn
• Result:
[Figure: clustering result]
Clustering a real dataset
• We use a dataset from the book Biostatistics with R, which contains information on nine different protein sources and their respective consumption in various countries.
• We'll use this data to group countries according to their protein consumption.
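A sketch of this workflow; the file name, column layout, and the choice of five clusters are assumptions, since the slide's code is not shown:

import pandas as pd
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from sklearn.preprocessing import StandardScaler

# Assumed layout: one row per country, nine protein-source columns.
protein = pd.read_csv('protein.csv', index_col='Country')

# Standardize the features so no single protein source dominates the distances.
scaled = StandardScaler().fit_transform(protein)

# Ward linkage and a dendrogram labelled with the country names.
Z = linkage(scaled, method='ward')
dendrogram(Z, labels=protein.index.to_list())
plt.show()

# Cut the tree into the assumed number of clusters.
clusters = fcluster(Z, t=5, criterion='maxclust')
print(pd.Series(clusters, index=protein.index))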
Dendrogram

Result
