DWM Exp8 127 133 137

The experiment implements agglomerative hierarchical clustering on a dataset. It calculates the Euclidean distance matrix between data points to quantify their dissimilarity based on attributes like age, income, and spending score. This distance matrix is then used as input for the hierarchical clustering algorithm. A dendrogram plot is produced to visualize the hierarchy of clusters formed at different distance thresholds. The experiment demonstrates how hierarchical clustering can reveal natural groupings within unlabeled data.


Experiment 8

Aim: Implementation of any one Hierarchical Clustering method

Theory: Hierarchical clustering is an unsupervised machine learning algorithm used to group unlabeled data points into clusters; it is also known as hierarchical cluster analysis (HCA).
In this algorithm, the hierarchy of clusters is built in the form of a tree, and this tree-shaped structure is known as a dendrogram. The results of K-Means clustering and hierarchical clustering can sometimes look similar, but the two methods work differently: hierarchical clustering does not require the number of clusters to be specified in advance, as the K-Means algorithm does. The hierarchical clustering technique has two approaches:

1. Agglomerative: a bottom-up approach in which the algorithm starts by treating every data point as its own cluster and repeatedly merges the closest pair of clusters until only one cluster remains.
2. Divisive: the reverse of the agglomerative approach; it works top-down, starting with all data points in a single cluster and recursively splitting it into smaller clusters.

Agglomerative Hierarchical Clustering

The agglomerative hierarchical clustering algorithm is the most common example of HCA. It follows the bottom-up approach: each data point is treated as its own cluster at the start, and the closest pair of clusters is merged at every step. This continues until all points have been merged into a single cluster that contains the entire dataset.

This hierarchy of clusters is represented in the form of a dendrogram.
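As an illustration of this merge process, the short sketch below (not part of the original experiment; the toy points are assumed purely for demonstration) uses SciPy's linkage routine, whose output table records each bottom-up merge step.

# Minimal sketch of the agglomeration process using SciPy (assumed toy data)
import numpy as np
from scipy.cluster.hierarchy import linkage

points = np.array([[1.0, 1.0],
                   [1.2, 1.1],
                   [5.0, 5.0],
                   [5.1, 5.2],
                   [9.0, 1.0]])

# Each row of Z describes one merge: [cluster_i, cluster_j, distance, new_size].
# Indices 0-4 are the original points; indices 5, 6, ... are clusters created
# by earlier merges, so reading Z top to bottom gives the merge order.
Z = linkage(points, method='single')
print(Z)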

Measuring the distance between two clusters

As we have seen, the distance between two clusters is crucial for hierarchical clustering. There are various ways to calculate this distance, and the chosen rule determines how clusters are merged at each step. These measures are called linkage methods. Some of the popular linkage methods are given below, followed by a short code sketch comparing them:

Single Linkage: the shortest distance between the closest points of the two clusters.

Complete Linkage: the distance between the two farthest points of two different clusters. It is a popular linkage method because it forms tighter clusters than single linkage.

Average Linkage: the distances between every pair of points, one from each cluster, are summed and divided by the number of pairs to give the average distance between the two clusters. It is also one of the most popular linkage methods.

Centroid Linkage: the distance between the centroids (mean points) of the two clusters.
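The sketch below (assumed toy data, not taken from the experiment) shows how each of these linkage methods can be selected in SciPy and how the choice changes the height at which the final merge happens.

import numpy as np
from scipy.cluster.hierarchy import linkage

points = np.random.RandomState(0).rand(8, 2)  # eight random 2-D points

for method in ['single', 'complete', 'average', 'centroid']:
    Z = linkage(points, method=method)
    # The last row of the linkage matrix is the final merge joining the two
    # remaining clusters; its third column is the distance of that merge.
    print(f"{method:>9} linkage: final merge distance = {Z[-1, 2]:.3f}")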

CODE:

import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# Load the customer dataset and preview the first five rows
dataset = pd.read_csv('exp8.csv')
dataset.head()

OUTPUT:

import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Sample of five customers with the three attributes used for clustering
data = pd.DataFrame({
    'Age': [19, 21, 20, 23, 31],
    'Annual Income(k$)': [15, 15, 16, 16, 17],
    'Spending Score(1-100)': [39, 81, 6, 77, 40]
})

# Pairwise Euclidean distances in condensed form, then as a square symmetric matrix
distance_matrix = pdist(data, metric='euclidean')
distance_matrix_square = squareform(distance_matrix)
print(distance_matrix_square)

OUTPUT:
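The condensed distance matrix computed above can be fed directly into the hierarchical clustering routine. The lines below are a hedged follow-on sketch: the average linkage and the cut-off value t=50 are assumptions chosen only for illustration, not values from the experiment.

from scipy.cluster.hierarchy import linkage, fcluster

# 'distance_matrix' is the condensed output of pdist from the previous step
Z = linkage(distance_matrix, method='average')

# Cut the hierarchy at an assumed distance threshold to obtain flat cluster
# labels, one label per customer in the sample DataFrame
labels = fcluster(Z, t=50, criterion='distance')
print(labels)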

import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
import scipy.cluster.hierarchy as shc

dataset = pd.read_csv('exp8.csv')

# Select two feature columns by position (annual income and spending score)
x = dataset.iloc[:, [3, 4]].values

# Build the agglomerative hierarchy with Ward linkage and plot the dendrogram
dendro = shc.dendrogram(shc.linkage(x, method="ward"))
mtp.title("Dendrogram Plot")
mtp.ylabel("Euclidean Distances")
mtp.xlabel("Customers")
mtp.show()

OUTPUT:
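To turn the dendrogram above into concrete cluster labels without fixing the number of clusters in advance, scikit-learn's AgglomerativeClustering can cut the Ward hierarchy at a distance threshold. The sketch below is an assumption-laden illustration: the threshold value of 200 is arbitrary, and it reuses the feature matrix x from the code above.

from sklearn.cluster import AgglomerativeClustering

# Cut the Ward hierarchy at an assumed distance threshold instead of fixing
# the number of clusters up front (threshold chosen only for illustration)
hc = AgglomerativeClustering(n_clusters=None, distance_threshold=200,
                             linkage='ward')
y_pred = hc.fit_predict(x)              # x = income/spending-score matrix above
print("Clusters found:", hc.n_clusters_)
print(y_pred)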
Conclusion: In this experiment, we calculated the Euclidean distance matrix for a subset
of data points from the given dataset. This distance matrix quantifies the dissimilarity
between data points based on their 'Age', 'Annual Income(k$)', and 'Spending Score(1-100)'
attributes, and it provides the foundation for agglomerative hierarchical clustering. The
dendrogram produced with Ward linkage visualizes how the points merge into clusters at
increasing distance thresholds, revealing natural groupings within the data. The distance
matrix is a crucial input for clustering algorithms and allows us to identify similarities
and differences among data points for further analysis and decision-making.
