DWM Exp8 127 133 137
As we have seen, the way the distance between two clusters is measured is crucial for hierarchical
clustering. There are various ways to calculate the distance between two clusters, and the choice
decides the rule by which clusters are merged. These measures are called linkage methods. Some of
the popular linkage methods are given below:
Single Linkage: the distance between the two closest points of the two clusters.
Consider the image below.
Complete Linkage: the distance between the two farthest points of two different
clusters. It is one of the popular linkage methods, as it tends to form tighter clusters than
single linkage.
Average Linkage: the distances between every pair of points (one point from each cluster)
are added up and then divided by the total number of such pairs to obtain the
average distance between the two clusters. It is also one of the most popular linkage methods.
Centroid Linkage: the distance between the centroids (mean points) of the two
clusters is calculated. Consider the image below.
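The four definitions above can be compared directly on two tiny clusters. The sketch below is a minimal illustration, assuming two made-up 2-D clusters (not data from exp8.csv); it uses SciPy's `cdist` to get every point-to-point distance between the clusters and then reduces that matrix the way each linkage method prescribes.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two small illustrative clusters in 2-D (made-up points, not from exp8.csv)
cluster_a = np.array([[0.0, 0.0], [0.0, 2.0]])
cluster_b = np.array([[3.0, 0.0], [3.0, 2.0]])

# All |A| * |B| Euclidean distances between a point of A and a point of B
pairwise = cdist(cluster_a, cluster_b, metric="euclidean")

single = pairwise.min()     # single linkage: closest pair of points
complete = pairwise.max()   # complete linkage: farthest pair of points
average = pairwise.mean()   # average linkage: sum of all pair distances / number of pairs
# centroid linkage: distance between the mean points of the two clusters
centroid = np.linalg.norm(cluster_a.mean(axis=0) - cluster_b.mean(axis=0))

print(single, complete, average, centroid)
```

Single and centroid linkage happen to coincide here only because of the symmetric layout of these points; in general the four values differ.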
CODE:
import pandas as pd

# Load the experiment's dataset and show its first five rows
dataset = pd.read_csv('exp8.csv')
print(dataset.head())
OUTPUT:
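CODE:
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Subset of data points with 'Age', 'Annual Income(k$)' and 'Spending Score(1-100)';
# the values used in the experiment are not shown here
data = pd.DataFrame({
})
# Condensed vector of pairwise Euclidean distances, then the full square matrix
distance_matrix = pdist(data, metric='euclidean')
distance_matrix_square = squareform(distance_matrix)
print(distance_matrix_square)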
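To see what `pdist` and `squareform` compute, the same matrix can be built by hand with NumPy broadcasting. This is a minimal sketch on three hypothetical 3-feature points chosen so the distances come out as whole numbers; it is not the experiment's subset. `pdist` returns the condensed form (the n(n-1)/2 upper-triangle entries), and `squareform` expands it into the symmetric n-by-n matrix with zeros on the diagonal.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical points with three features each (not the experiment's data)
points = np.array([
    [0.0, 0.0, 0.0],
    [3.0, 4.0, 0.0],
    [0.0, 0.0, 12.0],
])

# Broadcasting: (n, 1, d) - (1, n, d) -> (n, n, d) array of coordinate differences
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
manual = np.sqrt((diff ** 2).sum(axis=-1))

# SciPy: condensed vector of pairwise distances, expanded to a square matrix
via_scipy = squareform(pdist(points, metric="euclidean"))

print(manual)
print(np.allclose(manual, via_scipy))
```

The broadcasting version materialises an n-by-n-by-d array of differences, so `pdist` scales better in memory for larger datasets.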
OUTPUT:
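CODE:
import matplotlib.pyplot as plt
import pandas as pd
import scipy.cluster.hierarchy as shc

dataset = pd.read_csv('exp8.csv')
# Use the 4th and 5th columns (indices 3 and 4) as the clustering features
x = dataset.iloc[:, [3, 4]].values

# Ward linkage merges the pair of clusters that gives the smallest
# increase in total within-cluster variance
dendro = shc.dendrogram(shc.linkage(x, method="ward"))
plt.title("Dendrogram Plot")
plt.ylabel("Euclidean Distances")
plt.xlabel("Customers")
plt.show()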
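A dendrogram is usually turned into flat clusters by cutting it at some height. A minimal sketch, assuming six made-up 2-D points and a target of two clusters, uses SciPy's `fcluster` with `criterion="maxclust"` on the same Ward linkage used in the experiment; the number of clusters is an assumption chosen for illustration, not a result from exp8.csv.

```python
import numpy as np
import scipy.cluster.hierarchy as shc

# Hypothetical 2-D points forming two well-separated groups (not from exp8.csv)
x = np.array([
    [1.0, 1.0], [1.5, 1.2], [1.2, 0.8],
    [8.0, 8.0], [8.5, 8.2], [7.8, 8.4],
])

# Same kind of linkage call as in the experiment: Ward's method
z = shc.linkage(x, method="ward")

# Cut the tree so that at most 2 flat clusters remain; labels start at 1
labels = shc.fcluster(z, t=2, criterion="maxclust")
print(labels)
```

Cutting by a distance threshold instead (`criterion="distance"`) corresponds to drawing a horizontal line across the dendrogram at that height.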
OUTPUT:
Conclusion: In this experiment, we calculated the Euclidean distance matrix for a subset
of data points from the given dataset. This matrix quantifies the dissimilarity
between data points based on their 'Age', 'Annual Income(k$)' and 'Spending Score(1-100)'.
It provides the foundation for hierarchical clustering analysis, which can reveal
natural groupings within the data; we also plotted a dendrogram using Ward linkage.
Because clustering algorithms take such distances as input, the matrix lets us identify
similarities and differences among data points for further analysis and decision-making.