
HIERARCHICAL CLUSTERING
Unsupervised ML

HIERARCHICAL CLUSTERING ALGORITHM
Hierarchical clustering, also called hierarchical cluster analysis (HCA), is an unsupervised clustering algorithm that creates clusters with a predominant ordering from top to bottom.
For example, all files and folders on a hard disk are organized in a hierarchy.
The algorithm groups similar objects into groups called clusters. The endpoint is a set of clusters, where each cluster is distinct from every other cluster, and the objects within each cluster are broadly similar to each other.
This clustering technique is divided into two types:
Agglomerative Hierarchical Clustering
Divisive Hierarchical Clustering
AGGLOMERATIVE HIERARCHICAL CLUSTERING
Agglomerative hierarchical clustering is the most common type of hierarchical clustering, used to group objects into clusters based on their similarity. It is also known as AGNES (Agglomerative Nesting). It is a “bottom-up” approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
HOW DOES IT WORK?
1. Make each data point a single-point cluster → forms N clusters.
2. Take the two closest data points and merge them into one cluster → forms N−1 clusters.
3. Take the two closest clusters and merge them into one cluster → forms N−2 clusters.
4. Repeat step 3 until you are left with only one cluster.
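A minimal sketch of this bottom-up procedure, using scipy on made-up 2-D points (the data and the choice of single linkage are illustrative assumptions):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Made-up 2-D observations for illustration
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

# Each row of Z records one merge: the indices of the two clusters
# merged, the distance at which they merged, and the new cluster size.
# N points produce N-1 merges, ending in a single cluster.
Z = linkage(X, method="single", metric="euclidean")
print(Z)

# Cut the hierarchy into a flat assignment of, say, 2 clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)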
LINKAGE METHODS FOR CLUSTER OBSERVATIONS
There are several ways to measure the distance between clusters in order to decide the rules for clustering; these are often called linkage methods. Some of the common linkage methods are:
Complete-linkage: the distance between two clusters is defined as the longest distance between a point in one cluster and a point in the other. When clusters k and l merge into cluster m, its distance to any other cluster j is
d(m, j) = max(d(k, j), d(l, j))
Single-linkage: the distance between two clusters is defined as the shortest distance between a point in one cluster and a point in the other. Because outliers stay far from everything else, single linkage merges them only at the very end, which can help you detect them.
d(m, j) = min(d(k, j), d(l, j))
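To illustrate these update rules, here is a small sketch (the distance matrix values are made up) that merges two clusters and updates the distances under complete or single linkage:

import numpy as np

def merge_update(D, k, l, mode="complete"):
    # Merge clusters k and l into m; the merged cluster reuses row k.
    # Complete linkage: d(m, j) = max(d(k, j), d(l, j))
    # Single linkage:   d(m, j) = min(d(k, j), d(l, j))
    agg = np.maximum if mode == "complete" else np.minimum
    row = agg(D[k], D[l])
    D = D.copy()
    D[k], D[:, k] = row, row
    D[k, k] = 0.0
    keep = [i for i in range(len(D)) if i != l]
    return D[np.ix_(keep, keep)]

# Made-up symmetric distance matrix for three clusters
D = np.array([[0.0, 2.0, 6.0],
              [2.0, 0.0, 5.0],
              [6.0, 5.0, 0.0]])
print(merge_update(D, 0, 1, "complete"))  # d(m, j) = max(6, 5) = 6
print(merge_update(D, 0, 1, "single"))    # d(m, j) = min(6, 5) = 5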
LINKAGE METHODS
Average-linkage: the distance between two clusters is defined as the average distance between each point in one cluster and every point in the other cluster.
Centroid-linkage: finds the centroid of cluster 1 and the centroid of cluster 2, and then calculates the distance between the two before merging.
LINKAGE METHODS
Ward: with Ward's linkage method, the distance between two clusters is based on the sum of squared deviations from points to centroids. The objective of Ward's linkage is to minimize the within-cluster sum of squares.
Merging clusters A and B costs the increase in that sum:
Δ(A, B) = Σ_{x ∈ A∪B} ||x − c_(A∪B)||² − Σ_{x ∈ A} ||x − c_A||² − Σ_{x ∈ B} ||x − c_B||²
where c denotes the centroid of a cluster.
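A quick comparison of the linkage methods above with scipy (the data is made up; note that "centroid" and "ward" assume Euclidean distances):

import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))  # made-up 2-D observations

for method in ["single", "complete", "average", "centroid", "ward"]:
    Z = linkage(X, method=method)
    # The last row of Z is the final merge; column 2 holds its distance
    print(f"{method:9s} final merge distance: {Z[-1, 2]:.3f}")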
WHAT IS A DENDROGRAM?
A dendrogram is a type of tree diagram showing hierarchical relationships between different sets of data.
A dendrogram records the memory of the hierarchical clustering algorithm, so just by looking at it you can tell how each cluster was formed.
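As a minimal sketch (with made-up data), a dendrogram like the one on the next slide can be drawn with scipy and matplotlib:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(42)
X = rng.normal(size=(8, 2))    # made-up observations

Z = linkage(X, method="ward")  # full merge history
dendrogram(Z)                  # height of each join = merge distance
plt.xlabel("observation index")
plt.ylabel("distance")
plt.show()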
DENDROGRAM
EXAMPLE: HIERARCHICAL CLUSTER ANALYSIS
We asked people how many hours a week they spend on social media platforms and at the gym.
HOW IS A HIERARCHICAL CLUSTER ANALYSIS CALCULATED?
With this we can now start to create the clusters. In the first step we assign a cluster to each point, so we have as many clusters as we have persons.
The goal now is to merge more and more clusters little by little, until finally all points are in one cluster.
For this we need to determine two things:
1. How the distance between two points is measured.
2. How points in a cluster are connected.
DISTANCE BETWEEN TWO POINTS
Let's start with the question: how do we calculate the distance between two points? The most common distance measures are:
Euclidean Distance
Manhattan Distance
Maximum Distance
EUCLIDEAN DISTANCE
The straight-line distance between two points a and b:
d(a, b) = √((a₁ − b₁)² + (a₂ − b₂)²)
MANHATTAN DISTANCE
The sum of the absolute coordinate differences:
d(a, b) = |a₁ − b₁| + |a₂ − b₂|
MAXIMUM DISTANCE
The largest single coordinate difference (also called the Chebyshev distance):
d(a, b) = max(|a₁ − b₁|, |a₂ − b₂|)
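A minimal sketch of the three measures for two made-up points (hours on social media, hours at the gym):

import numpy as np

a = np.array([3.0, 7.0])  # person A: made-up weekly hours
b = np.array([1.0, 2.0])  # person B

euclidean = np.sqrt(np.sum((a - b) ** 2))  # straight-line distance
manhattan = np.sum(np.abs(a - b))          # sum of absolute differences
maximum = np.max(np.abs(a - b))            # largest coordinate difference

print(euclidean, manhattan, maximum)  # ≈5.385, 7.0, 5.0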
LINKING METHOD
Now that we know how to calculate the distances between points, we need to determine how to link the points within a cluster.
SINGLE-LINKAGE
Single-linkage uses the distance between the closest elements of the two clusters. In our example this is the distance between Caro and Joe.
COMPLETE-LINKAGE
Complete-linkage uses the distance between the farthest elements of the two clusters, here between Max and Joe.
AVERAGE-LINKAGE
Average-linkage uses the average of all pairwise distances: the distance is calculated for every combination of a point from one cluster and a point from the other, and these distances are averaged.
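To make this concrete, here is a small sketch. The names Caro, Max, and Joe follow the slides, but their weekly hours are made-up assumptions; with these numbers, single linkage is indeed the Caro-Joe distance and complete linkage the Max-Joe distance:

import numpy as np
from itertools import product

# Made-up (social media hours, gym hours) per person
points = {"Caro": np.array([2.0, 5.0]),
          "Max": np.array([1.0, 7.0]),
          "Joe": np.array([6.0, 1.0])}

cluster_a, cluster_b = ["Caro", "Max"], ["Joe"]
dists = {(p, q): np.linalg.norm(points[p] - points[q])
         for p, q in product(cluster_a, cluster_b)}

print("single  :", min(dists.values()))              # closest pair (Caro, Joe)
print("complete:", max(dists.values()))              # farthest pair (Max, Joe)
print("average :", sum(dists.values()) / len(dists))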
DIVISIVE HIERARCHICAL CLUSTERING
Divisive clustering, or DIANA (DIvisive ANAlysis Clustering), is a top-down clustering method: we assign all of the observations to a single cluster and then partition that cluster into the two least similar clusters. We proceed recursively on each cluster until there is one cluster for each observation. This clustering approach is exactly the opposite of agglomerative clustering.
THE STEPS TO FORM DIVISIVE CLUSTERING
Step 1: Start with all data points in one cluster.
Step 2: After each iteration, remove the “outsiders” from the least cohesive cluster.
Step 3: Stop when each example is in its own singleton cluster; otherwise go to step 2.
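DIANA itself is not shipped with scipy or scikit-learn. As a hedged stand-in for the top-down idea, the sketch below recursively bisects the largest cluster with 2-means (bisecting k-means); note this uses a splitting rule different from DIANA's "outsider" heuristic:

import numpy as np
from sklearn.cluster import KMeans

def divisive(X, n_clusters):
    # Top-down sketch: start with one cluster, repeatedly bisect
    # the most populous cluster until n_clusters remain.
    labels = np.zeros(len(X), dtype=int)
    while labels.max() + 1 < n_clusters:
        target = np.bincount(labels).argmax()
        idx = np.where(labels == target)[0]
        km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X[idx])
        labels[idx[km.labels_ == 1]] = labels.max() + 1
    return labels

# Made-up data: two loose blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(6, 1, (10, 2))])
print(divisive(X, 2))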
DENDROGRAM
