Hierarchical clustering
Prepared By
Archana
AP/PE
IIT(ISM), Dhanbad
Monsoon 24-25
• Another powerful unsupervised ML algorithm is hierarchical clustering, which groups similar instances into clusters.
• Hierarchical clustering, just like k-means clustering, uses a distance-based algorithm to measure the distance between clusters. There are two main types of hierarchical clustering, as follows:
1) Agglomerative hierarchical clustering (additive hierarchical clustering):
• In this type, each point is initially assigned to its own cluster. For instance, if there are 10 points in a data set, there will be 10 clusters at the beginning of applying hierarchical clustering.
• Afterward, based on a distance function such as the Euclidean distance, the closest pair of clusters is merged. This process is repeated until a single cluster is left.
2) Divisive hierarchical clustering:
• This type of hierarchical clustering works the opposite way of agglomerative hierarchical
clustering.
• Hence, if there are 10 data points, all data points will initially belong to one single cluster.
• Afterward, the farthest point is split off from the cluster, and this process continues until each cluster contains a single point.
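• As a side note, scikit-learn implements only the agglomerative variant (there is no divisive implementation in scikit-learn). A minimal sketch on a small illustrative 1-D array (not the lecture data) could look like the following:

import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Small illustrative 1-D data set (not the lecture example)
X = np.array([[1.0], [1.1], [4.0], [4.2], [9.0]])

# n_clusters sets where the bottom-up merging stops;
# linkage can be 'ward', 'complete', 'average', or 'single'
model = AgglomerativeClustering(n_clusters=2, linkage='average')
labels = model.fit_predict(X)
print(labels)   # cluster label assigned to each point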
• To further explain the concept of hierarchical clustering, let’s go
through a step-by-step example of applying agglomerative
hierarchical clustering to a small data set of 4 wells with their
respective EURs as shown below.
• Note that since this is one-dimensional data, the data was not standardized prior to calculating the distances.
• In other words, it is OK not to standardize the data for this particular example.
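• The EUR table itself is not listed in these notes, but the distances quoted later in the example (0.2 between wells 1 and 2, 0.5 between wells 3 and 4, and 1.1 between the two merged clusters) are consistent with EUR/1000 ft values of 1.2, 1.4, 2.0, and 2.5 for wells 1 through 4. These inferred (not original) values are used in the sketches below:

# Inferred EUR/1000 ft values for the 4-well example, chosen to be
# consistent with the distances quoted in Tables 4.2-4.4
eur = [1.2, 1.4, 2.0, 2.5]   # wells 1, 2, 3, 4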
• Step 1) The first step in solving this problem is creating a proximity matrix. The proximity matrix simply stores the distance between every pair of points.
• To create a proximity matrix for this example, a square matrix of n by n is created.
• Here, n represents the number of observations. Therefore, a proximity matrix of 4 × 4 can be created as shown in Table 4.2.
• The diagonal elements of the matrix will be 0 because the distance of
each element from itself is 0.
• To calculate the distance between points 1 and 2, let’s use the Euclidean distance function as follows:
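• In general, the Euclidean distance between two points p = (p1, …, pn) and q = (q1, …, qn) is d(p, q) = sqrt((p1 − q1)² + … + (pn − qn)²). For this one-dimensional example it reduces to the absolute difference, so with the inferred EUR/1000 ft values of 1.2 and 1.4 for wells 1 and 2, d(1, 2) = sqrt((1.4 − 1.2)²) = 0.2.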
• Similarly, that’s how the rest of the distances were calculated in Table
4.2.
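• A short sketch of how the full 4 × 4 proximity matrix of Table 4.2 can be computed, again using the inferred EUR/1000 ft values:

import numpy as np

# Inferred EUR/1000 ft values for wells 1-4
eur = np.array([1.2, 1.4, 2.0, 2.5])

# Proximity matrix: |EUR_i - EUR_j| for every pair of wells
prox = np.abs(eur[:, None] - eur[None, :])
print(np.round(prox, 1))
# [[0.  0.2 0.8 1.3]
#  [0.2 0.  0.6 1.1]
#  [0.8 0.6 0.  0.5]
#  [1.3 1.1 0.5 0. ]]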
• Next, the smallest distance in the proximity matrix is identified, and
the points with the smallest distance are merged.
• As can be seen from this table, the smallest distance is 0.2 between
points 1 and 2. Therefore, these two points can be merged.
• Let’s update the clusters, followed by updating the proximity matrix.
• To represent the merged cluster of points 1 and 2, the average, maximum, or minimum of their values can be chosen.
• For this example, the maximum was chosen. Therefore, the merged cluster is represented by the maximum EUR/1000 ft of well numbers 1 and 2, which is 1.4.
• Let’s recreate the proximity matrix with the new merged clusters as
illustrated in Table 4.3.
• Clusters 3 and 4 (shown in bold in Table 4.3) can now be merged into one cluster, since their distance of 0.5 is the smallest. The maximum EUR/1000 ft between well numbers 3 and 4 is 2.5. Let’s update the table as follows:
• Finally, let’s recreate the proximity matrix as shown in Table 4.4. Now,
all clusters 1, 2, 3, and 4 can be combined into one cluster. This is
essentially how agglomerative hierarchical clustering functions. The
example problem started with four clusters and ended with one
cluster.
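• The merging procedure described above (represent each merged cluster by its maximum EUR/1000 ft value, then repeatedly merge the closest pair) can be sketched directly in Python. With the inferred EUR values, the printed merge distances follow the 0.2, 0.5, 1.1 sequence of Tables 4.2-4.4:

import itertools

# Each cluster (a tuple of well labels) is represented by its maximum EUR/1000 ft
clusters = {("1",): 1.2, ("2",): 1.4, ("3",): 2.0, ("4",): 2.5}

while len(clusters) > 1:
    # find the pair of clusters whose representative values are closest
    a, b = min(itertools.combinations(clusters, 2),
               key=lambda pair: abs(clusters[pair[0]] - clusters[pair[1]]))
    dist = abs(clusters[a] - clusters[b])
    # the merged cluster is represented by the maximum value, as in the notes
    clusters[a + b] = max(clusters.pop(a), clusters.pop(b))
    print(f"merged {a} and {b} at distance {dist:.1f}")
# merged ('1',) and ('2',) at distance 0.2
# merged ('3',) and ('4',) at distance 0.5
# merged ('1', '2') and ('3', '4') at distance 1.1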
Dendrogram
• A dendrogram is used to show the hierarchical relationship between objects and is the output of hierarchical clustering.
• A dendrogram could potentially help with identifying the number of clusters to choose when applying hierarchical clustering.
• A dendrogram is also helpful in understanding the overall structure of the data.
• To illustrate the concept of using a dendrogram, let’s create a
dendrogram for the hierarchical clustering example above.
• As illustrated in Fig. 4.15, the distance between well numbers 1 and 2 is
0.2 as shown on the y-axis (distance) and the distance between well
numbers 3 and 4 is 0.5.
• Finally, the merged clusters {1, 2} and {3, 4} are connected at a distance of 1.1.
• Longer vertical lines in the dendrogram indicate a larger distance between clusters.
• As a general rule of thumb, identify clusters with the longest distances or branches (vertical lines); shorter branches indicate clusters that are more similar to one another.
• For instance, in Fig. 4.15, one cluster combines two smaller branches
(clusters 1 and 2) and another cluster combines the other two smaller
branches (clusters 3 and 4). Therefore, two clusters can be chosen in
this example.
• Please note that the optimum number of clusters is subjective and
could be influenced by the problem, domain knowledge of the
problem, and application.
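• To draw a dendrogram similar to Fig. 4.15, the dendrogram routine in scipy.cluster.hierarchy can be used (scipy, rather than scikit-learn, provides the dendrogram plot). A sketch with the inferred EUR values and complete linkage is shown below; note that complete linkage merges on the maximum pairwise distance, so the first two merges occur at 0.2 and 0.5 as in the example, while the final merge height (1.3) differs slightly from the 1.1 obtained with the max-representative shortcut above:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Inferred EUR/1000 ft values for wells 1-4, as a column of observations
eur = np.array([[1.2], [1.4], [2.0], [2.5]])

# Complete linkage: cluster-to-cluster distance = maximum pairwise distance
Z = linkage(eur, method='complete')

dendrogram(Z, labels=['well 1', 'well 2', 'well 3', 'well 4'])
plt.ylabel('distance')
plt.show()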
Implementing the dendrogram and hierarchical clustering with the scikit-learn library
• Let’s use the scikit-learn library to apply the dendrogram and hierarchical clustering.
• Please create a new Jupyter Notebook, start by importing the main libraries, and use the link below to access the hierarchical clustering data set, which includes 200 wells with their respective Gas in Place (GIP) and EUR/1000 ft.
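• A sketch of the import and loading step is shown below; the file name 'hierarchical_clustering.csv' and the column names used later are placeholders and should be adjusted to match the data set downloaded from the link:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Placeholder file name: replace with the data set downloaded from the course link
df = pd.read_csv('hierarchical_clustering.csv')
df.describe()   # quick check of the 200 wells (GIP and EUR/1000 ft)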
• Next, let’s standardize the data prior to applying hierarchical
clustering as follows:
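• A minimal sketch of the standardization step with scikit-learn’s StandardScaler, continuing from the cell above and assuming the two feature columns are named 'GIP' and 'EUR' (adjust to the actual column names):

from sklearn.preprocessing import StandardScaler

# Scale each feature to zero mean and unit variance so that GIP and
# EUR/1000 ft contribute equally to the distance calculations
scaler = StandardScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df[['GIP', 'EUR']]),
                         columns=['GIP', 'EUR'])
df_scaled.head()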