Hierarchical Clustering Algorithm
ROSHINI SELVAKUMAR
2021503041
INTRODUCTION
A hierarchical clustering method works by grouping data into a tree of clusters.
Hierarchical clustering begins by treating every data point as a separate cluster.
Then, it repeatedly executes the following two steps:
1. Identify the two clusters that are closest together, and
2. Merge the two closest clusters into one.
This continues until all the clusters are merged together.
Agglomerative Clustering
Initially, consider every data point as an individual cluster; at every step, merge
the nearest pair of clusters. (It is a bottom-up method.)
At first, every data point is considered an individual entity or cluster.
At every iteration, the closest clusters merge, until only one cluster is
formed.
AGGLOMERATIVE CLUSTERING
The algorithm for agglomerative hierarchical clustering is:
1. Consider every data point as an individual cluster.
2. Calculate the similarity of each cluster with all the other clusters (the proximity matrix): compute the distance between each pair of data points using a distance function, such as Euclidean distance, and fill the matrix with these distances. The proximity matrix is a square matrix with dimensions n x n, where n is the number of data points.
3. Merge the clusters that are most similar, i.e. closest to each other.
4. Recalculate the proximity matrix for the merged cluster.
5. Repeat Steps 3 and 4 until only a single cluster remains.
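The steps above can be sketched as a minimal single-linkage implementation in Python. The function name `agglomerative` and the sample points are illustrative, not from the slides; real work would normally use a library such as scipy or scikit-learn.

```python
import math

def agglomerative(points, num_clusters=1):
    """Single-linkage agglomerative clustering (bottom-up) sketch.

    `points` is a list of coordinate tuples; each starts as its own cluster.
    The two closest clusters are merged repeatedly until `num_clusters` remain.
    """
    # Step 1: every data point is an individual cluster
    clusters = [[p] for p in points]

    def cluster_dist(c1, c2):
        # Single linkage: distance between the closest pair of members
        return min(math.dist(p, q) for p in c1 for q in c2)

    while len(clusters) > num_clusters:
        # Steps 2-3: find and merge the closest pair of clusters
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        # Steps 4-5: the proximity is recomputed on the next loop iteration
    return clusters

# Illustrative usage: two obvious groups in the plane
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
print(agglomerative(points, num_clusters=2))
# → [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
```

Recomputing every pairwise cluster distance each iteration keeps the sketch short but costs O(n^3) overall; library implementations cache the proximity matrix instead.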
AGGLOMERATIVE CLUSTERING -
EXAMPLE
• Step-1: Consider each letter as a single cluster and calculate the distance of each cluster from all the
other clusters.
• Step-2: Comparable clusters are merged together to form a single cluster. Say cluster (B) and
cluster (C) are very similar to each other, so we merge them; similarly for clusters (D) and (E).
We are left with the clusters [(A), (BC), (DE), (F)].
• Step-3: We recalculate the proximity matrix according to the algorithm and merge the two nearest
clusters, (DE) and (F), to form the new clusters [(A), (BC), (DEF)].
• Step-4: Repeating the same process, the clusters (DEF) and (BC) are the most comparable and are
merged together. We are now left with the clusters [(A), (BCDEF)].
• Step-5: Finally, the two remaining clusters are merged together to form a single cluster [(ABCDEF)].
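The walk-through above can be replayed in code. The 1-D coordinates below are hypothetical, chosen only so that single-linkage merging reproduces the slide's merge order; the slides themselves give no coordinates.

```python
# Hypothetical coordinates chosen so the merge order matches the example.
coords = {"A": 0.0, "B": 10.0, "C": 10.5, "D": 20.0, "E": 20.6, "F": 21.5}

clusters = [[name] for name in coords]  # each letter starts as its own cluster

def linkage(c1, c2):
    # Single linkage on the 1-D coordinates
    return min(abs(coords[p] - coords[q]) for p in c1 for q in c2)

merges = []
while len(clusters) > 1:
    # Find and merge the closest pair, recording the state after each merge
    i, j = min(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
    )
    clusters[i] = clusters[i] + clusters[j]
    del clusters[j]
    merges.append(["".join(c) for c in clusters])

for state in merges:
    print(state)
# → ['A', 'BC', 'D', 'E', 'F']
#   ['A', 'BC', 'DE', 'F']
#   ['A', 'BC', 'DEF']
#   ['A', 'BCDEF']
#   ['ABCDEF']
```

Each printed line corresponds to one step of the example: (B, C) merge first, then (D, E), then (DE, F), then (BC, DEF), and finally everything collapses into (ABCDEF).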
DIVISIVE CLUSTERING
We can say that divisive hierarchical clustering is
precisely the opposite of agglomerative hierarchical
clustering.
It is a top-down method.
In divisive hierarchical clustering, we take all of the
data points as a single cluster and, in every iteration, we
split off the data points that are not comparable to the rest
of their cluster.
In the end, we are left with N clusters, where N is the
number of data points.
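The top-down idea can be sketched as follows. The splitting rule here (seed each split with the two farthest-apart members of the widest cluster) is one simple illustrative choice, not prescribed by the slides; real divisive methods such as DIANA use more refined criteria.

```python
import math

def divisive(points, num_clusters):
    """Top-down (divisive) clustering sketch.

    Start with all points in one cluster; repeatedly split the cluster with
    the largest diameter by seeding with its two farthest-apart members and
    assigning every member to the nearer seed.
    """
    clusters = [list(points)]  # one all-inclusive cluster

    def diameter(c):
        return max((math.dist(p, q) for p in c for q in c), default=0.0)

    while len(clusters) < num_clusters:
        widest = max(clusters, key=diameter)
        if diameter(widest) == 0.0:
            break  # nothing left to split
        # The two farthest members seed the two new clusters
        a, b = max(((p, q) for p in widest for q in widest),
                   key=lambda pq: math.dist(*pq))
        left = [p for p in widest if math.dist(p, a) <= math.dist(p, b)]
        right = [p for p in widest if math.dist(p, a) > math.dist(p, b)]
        clusters.remove(widest)
        clusters += [left, right]
    return clusters

# Illustrative usage: the single starting cluster splits into two groups
print(divisive([(0, 0), (0, 1), (5, 5), (5, 6)], num_clusters=2))
```

Running the loop until every cluster holds one point yields the N singleton clusters the slide describes.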
ADVANTAGES AND DISADVANTAGES
ADVANTAGES:
• The ability to handle non-convex clusters and clusters of different sizes and densities.
• The ability to handle missing data and noisy data.
• The ability to reveal the hierarchical structure of the data, which can be useful for understanding the relationships among the clusters.
DISADVANTAGES:
• The need for a criterion to stop the clustering process and determine the final number of clusters.
• The computational cost and memory requirements of the method can be high, especially for large datasets.
• The results can be sensitive to the initial conditions, linkage criterion, and distance metric used.