9536 DWM Expt 7 Merged
-----------------
9536, 9649
TE Comps A
Expt 7
Step 2- Take the 2 closest data points and make them one cluster. The total number of clusters now becomes n-1.
Step 3- Take the 2 closest clusters and merge them into one cluster. The total number of clusters now becomes n-2.
2. Furthest Points- Another option is to take the two furthest points, calculate their distance, and consider this distance as the distance between the two clusters. It is also known as Complete-linkage.
3. Average Distance- In this method, you take the average distance between all pairs of data points and use this average distance as the distance between the two clusters. It is known as Average-linkage.
4. Distance between Centroids- Another option is to find the centroids of the clusters and then calculate the distance between the two centroids. It is known as Centroid-linkage.
Choosing the method for distance calculation is an important part of Hierarchical Clustering, because it affects the clusters you end up with.
That’s why you should keep in mind, while working on Hierarchical Clustering, that the distance measure between clusters is crucial.
Depending upon your problem, you can choose the appropriate option.
Now you understand the steps to perform Hierarchical Clustering.
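To make these linkage options concrete, here is a minimal sketch, using made-up toy points, that runs SciPy's linkage function with each method and prints the merge distances each one produces-
import numpy as np
from scipy.cluster.hierarchy import linkage

# Toy 2-D points (illustrative values only)
X_toy = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.0],
                  [5.2, 4.8], [9.0, 9.0], [9.1, 8.8]])

# Each method defines the distance between two clusters differently
for method in ['single', 'complete', 'average', 'centroid']:
    Z = linkage(X_toy, method=method)
    print(method, '-> merge distances:', np.round(Z[:, 2], 3))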
What is a Dendrogram?
A Dendrogram is a tree-like structure that stores each record of splitting and merging.
Let’s understand how a dendrogram is created and how it works-
How is a Dendrogram Created?
Suppose we have 6 data points.
A Dendrogram stores each record of splitting and merging in a chart.
Suppose this is our Dendrogram chart-
Here, all 6 data points P1, P2, P3, P4, P5, and P6 are shown.
So, whenever any merging happens between data points or clusters, the dendrogram records it on the chart.
So, let’s start with the 1st step.
Step 1
That is, combine the two closest data points into one cluster. Suppose these are the two closest data points, so we combine them into one cluster.
Here we combine P5 and P6 into one cluster, and the Dendrogram records this merge on the chart.
The Dendrogram stores the records by drawing a horizontal line in the chart. The height of this horizontal line is based on the Euclidean Distance:
the smaller the Euclidean distance, the lower the height of this horizontal line.
Step 2-
At step 2, find the next two closest data points and convert them into one cluster.
Suppose P2 and P3 are the next closest data points.
Again, the height of this horizontal line depends upon the Euclidean Distance.
Step 3-
At step 3, we again look for the closest clusters. P4 is closest to the Red cluster.
So P4, P5, and P6 form one cluster, and the dendrogram records this merge on the chart.
Step 4-
Again, we look for the closest clusters. P1 is closest to the green cluster, so we merge them into one cluster.
The Dendrogram draws the final horizontal line. The height of this line is large because the distance between the two clusters is large.
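Putting the four steps together, here is a minimal sketch with assumed coordinates for P1-P6 (chosen so the merge order matches the walkthrough above); the third column of SciPy's linkage matrix holds the merge distance, i.e. the height of each horizontal line in the dendrogram-
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import euclidean

# Assumed coordinates for P1-P6 (illustrative only)
points = np.array([[0.0, 0.0],   # P1
                   [2.0, 1.0],   # P2
                   [2.2, 1.1],   # P3
                   [5.0, 4.0],   # P4
                   [6.0, 5.0],   # P5
                   [6.1, 5.1]])  # P6

Z = linkage(points, method='single')
# Column 2 of the linkage matrix is the merge distance, i.e. the height
# of each horizontal line in the dendrogram
print(np.round(Z[:, 2], 3))
# The first merge height equals the Euclidean distance of the closest pair (P5, P6)
print(round(euclidean(points[4], points[5]), 3))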
So, that’s how a Dendrogram is created. I hope you understood. The dendrogram is the memory of Hierarchical Clustering.
Now that we have created a Dendrogram, it’s time to find the optimal number of clusters with the help of the Dendrogram.
So, we can find the optimal number of clusters by cutting the dendrogram with a horizontal line. The cut is made across the longest vertical line, the one that can be traversed the maximum distance up and down without intersecting any merging point.
Let’s understand with the help of this example-
Suppose that in this dendrogram, L1 is the longest vertical line, the one that can be traversed the maximum distance up and down without intersecting the merging points.
So, we make the cut by drawing a horizontal line through it.
This cutting line intersects two vertical lines, and the number of vertical lines it intersects is the optimal number of clusters, which here is two.
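The same cut can be made programmatically with SciPy's fcluster. A minimal sketch, where the threshold t is an assumed value that you would normally read off the longest vertical gap in your own dendrogram-
from scipy.cluster.hierarchy import linkage, fcluster

Z = linkage(X, method='ward')                      # X: your feature matrix
labels = fcluster(Z, t=150, criterion='distance')  # cut the tree at height t
print('number of clusters:', len(set(labels)))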
# From the K-Means part of the experiment: recover the cluster centers in
# the original (unscaled) feature space; assumes fitted 'scaler' and 'kmeans'
import pandas as pd
cluster_centers = pd.DataFrame(
    scaler.inverse_transform(kmeans.cluster_centers_),
    columns=X.columns)
print(cluster_centers)
import scipy.cluster.hierarchy as sch
import matplotlib.pyplot as plt

# Plot the dendrogram using Ward linkage
dendro = sch.dendrogram(sch.linkage(X, method='ward'))
plt.title('Dendrogram')
plt.xlabel('Customers')
plt.ylabel('Euclidean distances')
plt.show()
from sklearn.cluster import AgglomerativeClustering

# Fit agglomerative clustering with 5 clusters and Ward linkage
# (in scikit-learn versions before 1.2 the 'metric' parameter is named 'affinity')
hc = AgglomerativeClustering(n_clusters=5, metric='euclidean', linkage='ward')
y_hc = hc.fit_predict(X)
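As a quick usage sketch, assuming X has exactly two feature columns (e.g. annual income and spending score), the resulting labels can be visualized like this-
import numpy as np
import matplotlib.pyplot as plt

X_arr = np.asarray(X)
# Scatter-plot each of the 5 clusters found above in its own color
for c in range(5):
    plt.scatter(X_arr[y_hc == c, 0], X_arr[y_hc == c, 1], label=f'Cluster {c + 1}')
plt.legend()
plt.title('Clusters of customers')
plt.show()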
Program with code – Use different dataset
Links:
Hierarchical Clustering in Python, Step by Step Complete Guide (mltut.com)
scipy.cluster.hierarchy.linkage — SciPy v1.7.1 Manual