Lecture 11: Hierarchical Clustering

By: Abdul Hameed

1
Hierarchical Clustering

• Hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis which seeks to build a hierarchy of clusters. E.g. all files and folders on our hard disk are organized in a hierarchy.

2
Hierarchical Clustering - Dendrogram
• It constructs a binary tree over the data that successively combines related groups of points. The graphical representation of the resulting hierarchy is a tree-structured graph called a dendrogram.

3
Hierarchical Clustering - Strategies

• Agglomerative (bottom-up):
• Begin with singletons (sets with one element each)
• Merge them until the full set S is reached as the root
• In each step, the two closest clusters are aggregated into a new combined cluster
• In this way, the number of clusters in the dataset is reduced at each step
• Eventually, all records/elements are combined into a single huge cluster
• It is the most common approach
• Divisive (top-down):
• All records start combined in one big cluster
• The most dissimilar records are then split off, recursively partitioning S until
  singleton sets are reached

4
Hierarchical Clustering - Strategies

5
Hierarchical Agglomerative Clustering - Steps

1. Start by assigning each item to its own cluster, so that if you have N items,
   you now have N clusters, each containing just one item. Let the distances
   (similarities) between the clusters equal the distances (similarities)
   between the items they contain.
2. Find the closest (most similar) pair of clusters and merge them into a single
   cluster, so that now you have one less cluster.
3. Compute distances (similarities) between the new cluster and each of the old
   clusters.
4. Repeat steps 2 and 3 until all items are clustered into a single cluster of
   size N. (A short code sketch of these steps follows.)
6
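
The loop below is a from-scratch sketch of steps 1-4 (my own illustration, not from the slides), using single-link distances on 1-D data; library routines such as SciPy's scipy.cluster.hierarchy.linkage implement the same procedure far more efficiently.

```python
# Minimal sketch of steps 1-4 on 1-D data using single-link distances.
# Illustrative only; the function and variable names are not from the lecture.
def agglomerate(points):
    # Step 1: every item starts in its own cluster.
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Step 2: find the closest pair of clusters (single link = smallest
        # distance between any member of one and any member of the other).
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(abs(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        # Step 3: replace the two clusters with their union; distances to the
        # new cluster are recomputed on the next pass.
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    # Step 4: the loop ends when a single cluster of size N remains.
    return merges

print(agglomerate([10, 7, 28, 20, 35]))  # the marks used in the next example
```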
Example - Problem

• Suppose I want to divide my students into different groups.


• I have the marks scored by each student in an assignment and based on
these marks, I want to segment them into groups.
• There’s no fixed target here as to how many groups to have.
• Since I don’t know what type of students should be assigned to which
group, it cannot be solved as a supervised learning problem.
• So, I shall try to apply hierarchical clustering here and segment the students
into different groups.

7
Example - Dataset

• Let’s take a sample of five students

Reg# Marks
1 10
2 7
3 28
4 20
5 35

8
Example – Proximity Matrix

• The proximity matrix stores the distance between every pair of points.

Reg#  Marks
1     10
2     7
3     28
4     20
5     35

Distances have been calculated using the Euclidean distance. For example, the
distance between students 1 and 2 is |10 - 7| = 3.

9
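
For reference, the whole proximity matrix can be reproduced in a couple of lines (a sketch of mine, assuming NumPy and SciPy are installed; not part of the slides):

```python
# Pairwise Euclidean distances between the students' marks.
import numpy as np
from scipy.spatial.distance import pdist, squareform

marks = np.array([[10], [7], [28], [20], [35]], dtype=float)
proximity = squareform(pdist(marks, metric='euclidean'))
print(proximity)  # e.g. the entry for students 1 and 2 is |10 - 7| = 3
```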
Example – Step 1

• Assign each point to its own individual cluster.

10
Example – Step 2

• Find the smallest distance in the proximity matrix and merge the points with
  that smallest distance. Here the smallest distance is 3, between students 1
  and 2.

11
Example – Step 2

• Update the tables. Before the merge:

Reg#  Marks
1     10
2     7
3     28
4     20
5     35

After merging students 1 and 2 into the cluster (1, 2):

Reg#   Marks
(1,2)  10
3      28
4      20
5      35

12
Example – Step 3

• Now repeat step 2 until only a single cluster is left.

[Dendrogram built over the marks 10, 7, 28, 20, 35]

13
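
SciPy can build and draw the full dendrogram in a few lines. This is my own sketch (assuming scipy and matplotlib are installed), not the slides' method: the worked example above represents a merged cluster by a single mark, so the exact merge heights depend on which linkage rule you choose.

```python
# Build and plot a dendrogram for the marks data (sketch).
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

marks = np.array([[10], [7], [28], [20], [35]], dtype=float)
Z = linkage(marks, method='average')              # average-link agglomeration
dendrogram(Z, labels=['1', '2', '3', '4', '5'])   # leaves labelled by Reg#
plt.ylabel('Distance')
plt.show()
```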
Example – How many clusters?

• Now we can set a threshold distance and draw a horizontal line. (Generally, we
  try to set the threshold in such a way that it cuts the tallest vertical
  line.) Let's set this threshold as 12 and draw a horizontal line.
• The number of clusters will be the number of vertical lines intersected by the
  line drawn at the threshold. Since this line intersects 2 vertical lines, we
  will have 2 clusters: one cluster with the samples (1, 2, 4) and the other
  with the samples (3, 5).

14
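
Cutting the tree at a chosen threshold can also be done programmatically with SciPy's fcluster; a sketch under the same assumptions as above (the exact grouping can depend on the linkage rule and on how distance ties are broken):

```python
# Cut the hierarchy at distance threshold 12 to obtain flat cluster labels.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

marks = np.array([[10], [7], [28], [20], [35]], dtype=float)
Z = linkage(marks, method='average')
labels = fcluster(Z, t=12, criterion='distance')  # one label per student
print(labels)
```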
Distance Measures

• Single link: smallest distance between an element in one cluster and an
  element in the other, i.e.
  dist(Ci, Cj) = min { d(x, y) : x ∈ Ci, y ∈ Cj }
• Complete link: largest distance between an element in one cluster and an
  element in the other, i.e.
  dist(Ci, Cj) = max { d(x, y) : x ∈ Ci, y ∈ Cj }
• Average: average distance between elements in one cluster and elements in the
  other, i.e.
  dist(Ci, Cj) = (1 / (|Ci| |Cj|)) Σ_{x ∈ Ci, y ∈ Cj} d(x, y)

15
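
As a small numerical check of the three rules, the pairwise distances between two hypothetical 1-D clusters can be aggregated directly (example values are mine, not from the slides):

```python
# Single, complete and average link distances between two clusters.
from itertools import product

A, B = [10, 7], [20, 28]                       # hypothetical 1-D clusters
pairs = [abs(a - b) for a, b in product(A, B)]

single = min(pairs)                            # smallest pairwise distance
complete = max(pairs)                          # largest pairwise distance
average = sum(pairs) / len(pairs)              # mean of all pairwise distances

print(single, complete, average)               # 10 21 15.5
```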
Summary

• Hierarchical Clustering
  • For a dataset consisting of n points:
  • O(n²) space: it requires storing the distance matrix
  • O(n³) time complexity in most cases (agglomerative clustering)

• Advantages
  • Dendrograms are great for visualization
  • Provides hierarchical relations between clusters

• Disadvantages
  • Not easy to define levels for clusters
  • Can never undo what was done previously
  • Sensitive to cluster distance measures and noise/outliers
  • Experiments showed that other clustering techniques outperform hierarchical
    clustering

• There are several variants to overcome its weaknesses
  • BIRCH: scalable to a large data set
  • ROCK: clustering categorical data
  • CHAMELEON: hierarchical clustering using dynamic modelling
16
Divisive Clustering

17
Divisive Hierarchical Clustering

• Divisive clustering, or DIANA (DIvisive ANAlysis Clustering), is a top-down
  clustering approach.
• The process starts at the root with all the points as one cluster.
• It recursively splits the higher-level clusters to build the dendrogram.
• It can be considered a global approach.
• Divisive clustering is good at identifying large clusters, while agglomerative
  clustering is good at identifying small clusters.
18
Hierarchical Clustering - Strategies

19
Hierarchical Clustering - Steps

1. Start with all data points in one cluster.
2. After each iteration, remove the outsiders/heterogeneous objects from the
   cluster.
3. Stop when each example is in its own singleton cluster; otherwise go to
   step 2.
20
Example - Divisive Hierarchical Clustering

Data points:

ID  X1  X2  X3
1   1   6   -1
2   3   7   0
3   3   5   -2
4   4   8   -1
5   5   8   0

Proximity Matrix using Manhattan Distance:

ID  1  2  3  4  5
1   0  4  4  5  7
2   4  0  4  3  3
3   4  4  0  5  7
4   5  3  5  0  2
5   7  3  7  2  0

The top-level cluster A = {1, 2, 3, 4, 5} contains all 5 points.
21
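
The Manhattan-distance proximity matrix above can be reproduced with SciPy's cityblock metric (my sketch, not part of the slides):

```python
# Pairwise Manhattan (cityblock) distances for the five 3-D points.
import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.array([[1, 6, -1],
              [3, 7,  0],
              [3, 5, -2],
              [4, 8, -1],
              [5, 8,  0]], dtype=float)
D = squareform(pdist(X, metric='cityblock'))
print(D.astype(int))
```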
Example

ID  1  2  3  4  5
1   0  4  4  5  7
2   4  0  4  3  3
3   4  4  0  5  7
4   5  3  5  0  2
5   7  3  7  2  0

Proximity Matrix using Manhattan Distance

Find the most dissimilar point: take the average distance to the other points.
For example, for point 1 we compute (4 + 4 + 5 + 7) / 4 = 5.00.

Similarly, the average distances for all the points are:

Point     1     2     3     4     5
Distance  5.00  3.50  5.00  3.75  4.75

22
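
The averages in the table are just row means of the proximity matrix (excluding the diagonal); a sketch of the computation (variable names are mine, not from the slides):

```python
# Average distance of each point to the other points, used to pick the most
# dissimilar point as the seed of the splinter group.
import numpy as np

D = np.array([[0, 4, 4, 5, 7],
              [4, 0, 4, 3, 3],
              [4, 4, 0, 5, 7],
              [5, 3, 5, 0, 2],
              [7, 3, 7, 2, 0]], dtype=float)
avg = D.sum(axis=1) / (D.shape[0] - 1)
print(avg)  # [5.   3.5  5.   3.75 4.75]
```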
Example

Point     1     2     3     4     5
Distance  5.00  3.50  5.00  3.75  4.75

• Since points 1 and 3 are tied for the most dissimilar, we pick one of these
  arbitrarily.
• I will use point 1.
• Now we have
  • A = {2, 3, 4, 5}
  • B = {1}

23
Example

ID  1  2  3  4  5
1   0  4  4  5  7
2   4  0  4  3  3
3   4  4  0  5  7
4   5  3  5  0  2
5   7  3  7  2  0

Proximity Matrix using Manhattan Distance

Now we want to move any points that are closer to B than to (the other points
in) A into B. So for each point x in A we compute d(x, A) and d(x, B). For
example, for point 2:

    d(2, A) = (4 + 3 + 3) / 3 = 3.33,  d(2, B) = 4,  so d(2, A) - d(2, B) = -0.67

Similarly, the d(x, A) - d(x, B) differences for all the points in A are:

Point       2      3     4      5
Difference  -0.67  1.33  -1.67  -3.00

24
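
The same matrix gives all of the d(x, A) - d(x, B) differences at once; a sketch for the current split A = {2, 3, 4, 5}, B = {1} (the code uses 0-based indices, and the names are mine):

```python
# For each x in A, compare its average distance to the rest of A with its
# average distance to B; a positive value means x is closer to B.
import numpy as np

D = np.array([[0, 4, 4, 5, 7],
              [4, 0, 4, 3, 3],
              [4, 4, 0, 5, 7],
              [5, 3, 5, 0, 2],
              [7, 3, 7, 2, 0]], dtype=float)
A, B = [1, 2, 3, 4], [0]                      # 0-based: points 2-5 and point 1
for x in A:
    d_A = D[x, [y for y in A if y != x]].mean()
    d_B = D[x, B].mean()
    print(x + 1, round(d_A - d_B, 2))         # 2 -0.67, 3 1.33, 4 -1.67, 5 -3.0
```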
Example

Point       2      3     4      5
Difference  -0.67  1.33  -1.67  -3.00

• Only point 3 has a difference greater than zero, so we move it to cluster B.
• Now we have
  • A = {2, 4, 5}
  • B = {1, 3}

25
Example

ID  1  2  3  4  5
1   0  4  4  5  7
2   4  0  4  3  3
3   4  4  0  5  7
4   5  3  5  0  2
5   7  3  7  2  0

Proximity Matrix using Manhattan Distance

We check whether any additional points should be moved. Again, we compute
d(x, A) - d(x, B) for each point in A. The differences are:

Point       2     4     5
Difference  -1.0  -2.5  -4.5

All are negative (that is, the remaining points in A are closer to A than to B),
so we stop this division and we have the two clusters {2, 4, 5} and {1, 3}.
26
Example

ID  1  2  3  4  5
1   0  4  4  5  7
2   4  0  4  3  3
3   4  4  0  5  7
4   5  3  5  0  2
5   7  3  7  2  0

Proximity Matrix using Manhattan Distance

For the next step, we choose the cluster with the largest diameter, that is, the
cluster with the greatest distance between two of its points. The diameter of
{2, 4, 5} is 3 (e.g. the distance between points 2 and 4), while the diameter of
{1, 3} is 4 (the distance between points 1 and 3).

So cluster {1, 3} has the largest diameter. Trivially, it is split into {1} and
{3}. So now we have clusters {2, 4, 5}, {1} and {3}.

Now recursively apply the same steps to {2, 4, 5} to split it further.

27
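
Putting the splinter-selection and point-moving steps together, here is a compact sketch of one DIANA-style split of a cluster (my own illustration, not the lecture's code); applied to the full point set it reproduces the {2, 4, 5} / {1, 3} split above:

```python
# One DIANA-style split: seed B with the most dissimilar point, then keep
# moving over any point whose average distance to B is smaller than to A.
import numpy as np

def diana_split(D, members):
    members = list(members)
    avg = [D[x, [y for y in members if y != x]].mean() for x in members]
    B = [members[int(np.argmax(avg))]]                 # splinter seed
    A = [x for x in members if x not in B]
    moved = True
    while moved and len(A) > 1:
        moved = False
        diffs = {x: D[x, [y for y in A if y != x]].mean() - D[x, B].mean()
                 for x in A}
        x, best = max(diffs.items(), key=lambda kv: kv[1])
        if best > 0:                                   # x is closer to B than to A
            A.remove(x)
            B.append(x)
            moved = True
    return A, B

D = np.array([[0, 4, 4, 5, 7],
              [4, 0, 4, 3, 3],
              [4, 4, 0, 5, 7],
              [5, 3, 5, 0, 2],
              [7, 3, 7, 2, 0]], dtype=float)
print(diana_split(D, range(5)))  # 0-based indices: ([1, 3, 4], [0, 2])
```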
Example - Dendrogram

ID  1  2  3  4  5
1   0  4  4  5  7
2   4  0  4  3  3
3   4  4  0  5  7
4   5  3  5  0  2
5   7  3  7  2  0

Proximity Matrix using Manhattan Distance

28
