Hierarchical Clustering

Hierarchical clustering is a method for grouping similar data points into a tree-like structure, starting with each point as its own cluster and progressively merging or splitting them. There are two main types: Agglomerative Clustering, which merges clusters from the bottom up, and Divisive Clustering, which splits clusters from the top down. Both methods do not require pre-specifying the number of clusters and involve iterative processes to achieve the final grouping.


Hierarchical clustering is a technique used to group similar data points together based on their similarity, creating a hierarchy or tree-like structure. The key idea is to begin with each data point as its own separate cluster and then progressively merge or split clusters based on their similarity.
Let's understand this with the help of an example.
Imagine you have four fruits with different weights: an apple (100g), a banana
(120g), a cherry (50g), and a grape (30g). Hierarchical clustering starts by
treating each fruit as its own group.
- It then merges the closest groups based on their weights.
- First, the cherry and grape are grouped together because they are the lightest.
- Next, the apple and banana are grouped together.
Finally, all the fruits are merged into one large group, showing how hierarchical
clustering progressively combines the most similar data points.
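The merging described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: as an assumption, the distance between two clusters is taken to be the gap between their mean weights, and the fruits are listed from lightest to heaviest so ties resolve the way the example describes.

```python
# Fruits from the example, ordered lightest to heaviest (an assumption
# made so that ties in distance resolve as the text describes).
weights = {"grape": 30, "cherry": 50, "apple": 100, "banana": 120}

# Step 1: treat each fruit as its own cluster.
clusters = [[name] for name in weights]
merge_order = []

def mean_weight(cluster):
    """Average weight of the fruits in a cluster (our cluster 'position')."""
    return sum(weights[f] for f in cluster) / len(cluster)

# Repeatedly merge the two closest clusters until one cluster remains.
while len(clusters) > 1:
    best = None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            d = abs(mean_weight(clusters[i]) - mean_weight(clusters[j]))
            if best is None or d < best[0]:
                best = (d, i, j)
    _, i, j = best
    merged = clusters[i] + clusters[j]
    merge_order.append(sorted(merged))
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

print(merge_order)
```

Running this reproduces the story above: cherry and grape merge first, then apple and banana, and finally everything joins into one group.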
Types of Hierarchical Clustering
Now that we understand the basics of hierarchical clustering, let’s explore the
two main types of hierarchical clustering.
1. Agglomerative Clustering
2. Divisive Clustering
Hierarchical Agglomerative Clustering
It is also known as the bottom-up approach or hierarchical agglomerative
clustering (HAC). Unlike flat clustering, hierarchical clustering provides a
structured way to group data. This clustering algorithm does not require us to
pre-specify the number of clusters. Bottom-up algorithms treat each data point
as a singleton cluster at the outset and then successively agglomerate pairs of
clusters until all clusters have been merged into a single cluster that contains
all the data.
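In practice, agglomerative clustering is usually done with a library rather than by hand. The sketch below uses SciPy's `linkage` and `fcluster` on the fruit weights from the earlier example; the choice of `single` linkage (merge the clusters containing the closest pair of points) is one of several criteria SciPy supports, not the only option.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# One feature (weight in grams) per fruit: grape, cherry, apple, banana.
X = np.array([[30.0], [50.0], [100.0], [120.0]])

# Build the full merge tree; "single" linkage merges the two clusters
# containing the closest pair of points.
Z = linkage(X, method="single")

# Cut the tree into two flat clusters: light fruits vs heavy fruits.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Note that even though the tree records every merge, we only pick a number of clusters at the very end, when cutting it with `fcluster`.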

Hierarchical Divisive Clustering

It is also known as the top-down approach. This algorithm also does not require
us to pre-specify the number of clusters. Top-down clustering requires a method
for splitting a cluster that contains the whole dataset and proceeds by
splitting clusters recursively until individual data points have been split
into singleton clusters.
Workflow for Hierarchical Divisive clustering:
1. Start with all data points in one cluster: Treat the entire dataset as a single
large cluster.
2. Split the cluster: Divide the cluster into two smaller clusters. The division is
typically done by finding the two most dissimilar points in the cluster and
using them to separate the data into two parts.
3. Repeat the process: For each of the new clusters, repeat the splitting process:
1. Choose the cluster with the most dissimilar points.
2. Split it again into two smaller clusters.
4. Stop when each data point is in its own cluster: Continue this process until
every data point is its own cluster, or until a stopping condition (such as a
predefined number of clusters) is met.
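The workflow above can be sketched as follows. This is a toy illustration on 1-D data (the fruit weights again): as assumptions, the "two most dissimilar points" in a 1-D cluster are simply its minimum and maximum, each point is assigned to the nearer of the two, and the cluster with the largest spread is split next.

```python
def split(cluster):
    """Step 2: split a cluster using its two most dissimilar points as seeds."""
    lo, hi = min(cluster), max(cluster)
    left = [x for x in cluster if abs(x - lo) <= abs(x - hi)]
    right = [x for x in cluster if abs(x - lo) > abs(x - hi)]
    return left, right

def divisive(data, n_clusters):
    # Step 1: start with all data points in one cluster.
    clusters = [sorted(data)]
    # Steps 3-4: keep splitting until the stopping condition is met.
    while len(clusters) < n_clusters:
        # Step 3.1: choose the cluster with the largest spread.
        widest = max(clusters, key=lambda c: max(c) - min(c))
        clusters.remove(widest)
        # Step 3.2: split it into two smaller clusters.
        clusters.extend(split(widest))
    return sorted(clusters)

print(divisive([100, 120, 50, 30], 2))
```

Asking for two clusters separates the light fruits (30 g, 50 g) from the heavy ones (100 g, 120 g); asking for four drives the recursion down to singleton clusters, mirroring step 4 of the workflow.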
