
Hierarchical clustering

Prepared By
Archana
AP/PE
IIT(ISM), Dhanbad
Monsoon 24-25
• Another powerful unsupervised ML algorithm is hierarchical clustering, an algorithm that groups similar instances into clusters.
• Like k-means clustering, hierarchical clustering uses a distance-based algorithm to measure the distance between clusters. There are two main types of hierarchical clustering:
1) Agglomerative hierarchical clustering (additive hierarchical clustering):
• In this type, each point starts as its own cluster. For instance, if there are 10 points in a data set, there will be 10 clusters at the beginning of applying hierarchical clustering.
• Afterward, based on a distance function such as Euclidean distance, the closest pair of clusters is merged. This process is repeated until a single cluster is left (see the short sketch after these two types).
2) Divisive hierarchical clustering:
• This type of hierarchical clustering works the opposite way of agglomerative hierarchical clustering.
• Hence, if there are 10 data points, all data points initially belong to one single cluster.
• Afterward, the point farthest from the rest of the cluster is split off, and this process continues until each cluster contains a single point.
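To make the agglomerative description concrete, here is a minimal sketch using scipy (scipy and scikit-learn provide only the agglomerative variant); the ten one-dimensional points are purely illustrative:

import numpy as np
from scipy.cluster.hierarchy import linkage

# Ten illustrative 1-D points: each starts in its own cluster
X = np.arange(10, dtype=float).reshape(-1, 1)

# Each row of Z records one merge: (cluster i, cluster j, merge distance, new cluster size)
Z = linkage(X, method='single', metric='euclidean')
print(Z.shape)  # (9, 4): ten singleton clusters are merged nine times into one cluster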
• To further explain the concept of hierarchical clustering, let's go through a step-by-step example of applying agglomerative hierarchical clustering to a small data set of 4 wells with their respective EURs per 1,000 ft.
• Note that since this is one-dimensional data, the data was not standardized prior to calculating the distances.
• In other words, it is OK not to standardize the data for this particular example.
• Step 1) The first step in solving this problem is creating a proximity matrix. A proximity matrix simply stores the distances between each pair of points.
• To create a proximity matrix for this example, a square matrix of n by n is created, where n represents the number of observations. Therefore, a proximity matrix of 4 × 4 can be created as shown in Table 4.2.
• The diagonal elements of the matrix will be 0 because the distance of
each element from itself is 0.
• To calculate the distance between points 1 and 2, let's use the Euclidean distance function, which for one-dimensional data reduces to the absolute difference in EUR per 1,000 ft:

distance(1, 2) = sqrt((EUR_1 - EUR_2)^2) = |EUR_1 - EUR_2| = 0.2
• Similarly, that's how the rest of the distances in Table 4.2 were calculated (see the short sketch below).
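A minimal sketch of building this proximity matrix with scipy is shown below. Table 4.2 is not reproduced here, so the EUR/1000 ft values [1.2, 1.4, 2.0, 2.5] are inferred from the distances quoted in the text and should be treated as illustrative:

import numpy as np
from scipy.spatial.distance import pdist, squareform

# EUR/1000 ft values inferred from the distances quoted in the text (illustrative)
eur = np.array([[1.2], [1.4], [2.0], [2.5]])

# 4 x 4 proximity matrix of pairwise Euclidean distances; the diagonal is 0
proximity = squareform(pdist(eur, metric='euclidean'))
print(np.round(proximity, 2))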
• Next, the smallest distance in the proximity matrix is identified, and
the points with the smallest distance are merged.
• As can be seen from this table, the smallest distance is 0.2 between
points 1 and 2. Therefore, these two points can be merged.
• Let's update the clusters, followed by updating the proximity matrix.
• To merge points 1 and 2 together, the average, maximum, or minimum can be chosen as the representative value of the merged cluster.
• For this example, the maximum was chosen. Therefore, the maximum EUR/1000 ft between well numbers 1 and 2, which is 1.4, represents the merged cluster.

• Let’s recreate the proximity matrix with the new merged clusters as
illustrated in Table 4.3.
• Clusters 3 and 4 (shown in bold in Table 4.3) can now be merged into one cluster, with the smallest distance of 0.5. The maximum EUR/1000 ft between well numbers 3 and 4, which is 2.5, represents the merged cluster. Let's update the table as follows:
• Finally, let's recreate the proximity matrix as shown in Table 4.4. Now, clusters 1, 2, 3, and 4 can all be combined into one cluster. This is essentially how agglomerative hierarchical clustering functions: the example problem started with four clusters and ended with one cluster (a short code sketch of this procedure follows below).
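The full procedure described above (repeatedly merge the closest pair of clusters and keep the maximum EUR as the representative value of the merged cluster) can be sketched as follows; again, the EUR values are the inferred, illustrative ones:

# Representative EUR/1000 ft per cluster (inferred, illustrative values)
clusters = {('1',): 1.2, ('2',): 1.4, ('3',): 2.0, ('4',): 2.5}

while len(clusters) > 1:
    labels = list(clusters)
    # Closest pair of clusters; in 1-D the Euclidean distance is the absolute difference
    pairs = [(abs(clusters[a] - clusters[b]), a, b)
             for i, a in enumerate(labels) for b in labels[i + 1:]]
    dist, a, b = min(pairs)
    # Merge the pair and keep the maximum EUR as the new representative value,
    # exactly as done in the worked example
    rep = max(clusters.pop(a), clusters.pop(b))
    clusters[a + b] = rep
    print(f"merged {a} and {b} at distance {dist:.1f}; representative EUR = {rep}")

# Printed merge distances are 0.2, 0.5, and 1.1, matching the worked example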
Dendrogram
• A dendrogram is used to show the hierarchical relationship between
objects and is the output of the hierarchical clustering.
• A dendrogram could potentially help with identifying the number of
clusters to choose when applying hierarchical clustering.
• A dendrogram is also helpful in understanding the overall structure of the data.
• To illustrate the concept of using a dendrogram, let’s create a
dendrogram for the hierarchical clustering example above.
• As illustrated in Fig. 4.15, the distance between well numbers 1 and 2 is
0.2 as shown on the y-axis (distance) and the distance between well
numbers 3 and 4 is 0.5.
• Finally, merged clusters 1,2 and 3,4 are connected and have a distance
of 1.1.
• Longer vertical lines in the dendrogram indicate a larger distance between clusters.
• As a general rule of thumb, identify clusters with the longest distances or branches (vertical lines); shorter branches are more similar to one another.
• For instance, in Fig. 4.15, one cluster combines two smaller branches
(clusters 1 and 2) and another cluster combines the other two smaller
branches (clusters 3 and 4). Therefore, two clusters can be chosen in
this example.
• Please note that the optimum number of clusters is subjective and
could be influenced by the problem, domain knowledge of the
problem, and application.
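A dendrogram for the 4-well example can be sketched with scipy as shown below, again using the inferred EUR values. Note that scipy's standard linkage rules differ slightly from the maximum-representative rule used in the hand calculation: the first two merge heights (0.2 and 0.5) match Fig. 4.15, while the height of the final merge depends on the linkage chosen.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# EUR/1000 ft values inferred from the worked example (illustrative)
eur = np.array([[1.2], [1.4], [2.0], [2.5]])

# 'complete' (maximum) linkage; wells 1-2 merge at a height of 0.2 and wells 3-4 at 0.5
Z = linkage(eur, method='complete')
dendrogram(Z, labels=['well 1', 'well 2', 'well 3', 'well 4'])
plt.ylabel('Distance')
plt.show()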
Implementing the dendrogram and hierarchical clustering in the scikit-learn library
• Let's use the scikit-learn library to apply the dendrogram and hierarchical clustering.
• Please create a new Jupyter Notebook, start by importing the main libraries, and use the link below to access the hierarchical clustering data set, which includes 200 wells with their respective Gas in Place (GIP) and EUR/1000 ft.
• Next, let’s standardize the data prior to applying hierarchical
clustering as follows:
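A minimal sketch of this step is shown below; the CSV file name and the use of both columns (GIP and EUR/1000 ft) are assumptions, since the original link and code are not reproduced here:

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical file name for the 200-well data set with GIP and EUR/1000 ft columns
df = pd.read_csv('Hierarchical_Clustering_DataSet.csv')

# Standardize both features to zero mean and unit variance
scaler = StandardScaler()
df_scaled = scaler.fit_transform(df)   # numpy array of z-scores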

• Next, let’s create the dendrogram.


• First, "import scipy.cluster.hierarchy as shc" library.
• Please make sure to pass along "df_scaled" which is the standardized
version of the data set in
"dend = shc.dendrogram( shc.linkage(df_scaled, method = 'ward'))."
• As illustrated in Fig. 4.17, the dashed black line intersects 5 vertical lines, which suggests choosing 5 clusters.
• This dashed black horizontal line is drawn based on the longest distances or branches observed, and its placement is subjective.
• Therefore, feel free to alter "n_clusters" and visualize the clustering outcome.
• Let's now import agglomerative clustering and apply it to the "df_scaled" data frame.
• Under "AgglomerativeClustering," the number of desired clusters is set with the "n_clusters" parameter, and "affinity" specifies the metric used to compute the linkage.
• In this example, Euclidean distance was selected. The linkage criterion determines which distance to use between sets of observations.
• The "linkage" parameter can be set to (i) ward, (ii) average, (iii) complete or maximum, and (iv) single.
• According to scikit-learn library, "ward" minimizes the variance of the
clusters being merged.
• "average" uses the average of the distances of each observation of
the two sets.
• "complete" or "maximum" linkage uses the maximum distances
between all observations of the two sets.
• "single" uses the minimum of the distances between all observations
of the two sets.
• The default linkage of "ward" was used in this example.
• After defining the hierarchical clustering criteria under "HC," apply
"fit_predict()" to the standardized data set (df_scaled).
• Let’s convert the "df_scaled" to a data frame using panda’s "pd.Data-
Frame."
• Afterward, apply the silhouette coefficient to this data set as
illustrated below.
• As illustrated in Fig. 4.20, the silhouette coefficient on this data set is
high (close to 0.7).
• It is recommended to use the silhouette coefficient to gain insight into the clustering effectiveness on the data set, as sketched below.
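A sketch of the conversion and the silhouette calculation, assuming the "labels" array returned by "fit_predict()" above and the original column names:

import pandas as pd
from sklearn.metrics import silhouette_score

# Convert the scaled array back to a data frame with the original column names
df_scaled = pd.DataFrame(df_scaled, columns=df.columns)

# Silhouette coefficient ranges from -1 to 1; values near 0.7 indicate well-separated clusters
print(silhouette_score(df_scaled, labels))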
• Next, let’s unstandardize the data and show the mean of each cluster:
• As illustrated above, the first cluster (cluster 0) represents low EUR/high GIP, the second cluster (cluster 1) represents high EUR/high GIP, the third cluster (cluster 2) represents medium EUR/medium GIP, the fourth cluster (cluster 3) represents high EUR/low GIP, and finally the last cluster (cluster 4) represents low EUR and low GIP.
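A sketch of how the unstandardized cluster means might be obtained, assuming the "scaler", "df_scaled", and "labels" objects defined above:

import pandas as pd

# Undo the standardization and attach the cluster labels
df_unscaled = pd.DataFrame(scaler.inverse_transform(df_scaled), columns=df.columns)
df_unscaled['cluster'] = labels

# Mean GIP and EUR/1000 ft per cluster, used to characterize clusters 0-4 as above
print(df_unscaled.groupby('cluster').mean())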
• Please note that the dendrogram in hierarchical clustering can be used to get a sense of the number of clusters to choose prior to applying k-means clustering.
• In other words, if unsure of the number of clusters prior to applying k-means clustering, a dendrogram can be used to estimate it (see the sketch below).
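A sketch of that workflow, assuming the number of clusters read off the dendrogram (five in this example) is passed to k-means:

from sklearn.cluster import KMeans

# k estimated from the number of branches the dendrogram cut crosses (five here)
km = KMeans(n_clusters=5, random_state=42)
km_labels = km.fit_predict(df_scaled)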
THANK YOU
