
TE Comp-V    Lab Experiment : 7    Date of Submission :

Name: Saville D'silva    Roll number : 9536


Course outcomes: On successful completion of the course, the learner will be able to:

-----------------

Rubrics for assessment of Lab Experiment :

Indicator | Average | Good | Excellent
Timeline - On time completion & submission (02) | Late submission (0) | 01 (On time) | 02 (Before deadline)
Completeness and neatness - Complete all parts of schema diagram / OLAP / Algorithm (02) | <60% complete (0) | <80% complete (1) | 100% complete (2)
Implementation - Extent of coding (04) | <60% complete (2) | <80% complete (3) | 100% complete (4)
Knowledge - In-depth knowledge of the post-assignment questions (02) | Unable to answer 2 questions (0) | Unable to answer 1 question (1) | Able to answer 2 questions (2)

Marks: Timeline (2) | Completeness and neatness (2) | Implementation (4) | Knowledge (2)

Teacher's Sign :    Total (10):


Saville D'silva - 9536
Ishita Yadav - 9649

TE Comps A

Expt 7

Aim : Implementation of any one Hierarchical Clustering method


Theory :
What is Clustering?
Clustering is the task of dividing data into groups such that items in one group are similar to each other, while items in different groups are dissimilar from each other. Clustering is a form of Unsupervised Learning.
Hierarchical clustering
Hierarchical Clustering groups similar objects into clusters step by step; the final step combines all the clusters into one single cluster. The result of Hierarchical Clustering is usually visualized as a Dendrogram.
Types of Hierarchical Clustering: Hierarchical Clustering is of 2 types -
1. Agglomerative Hierarchical Clustering.
2. Divisive Hierarchical Clustering.
1. Agglomerative Hierarchical Clustering.
Agglomerative Hierarchical Clustering uses a bottom-up approach to form clusters. That means it starts with each data point as its own cluster. It then merges the closest data points or clusters into one cluster. The same process repeats until only one single cluster is left.
2. Divisive Hierarchical Clustering.
Divisive Hierarchical Clustering is the opposite of Agglomerative Hierarchical Clustering. It is a Top-Down approach.
That means it starts from one single cluster that contains all the data points. At each step it splits the farthest (most dissimilar) cluster into separate clusters, until every data point stands alone.
The Agglomerative approach proceeds as follows:

Step 1 - Make each data point a single cluster. Suppose that forms n clusters.

Step 2 - Take the 2 closest data points and merge them into one cluster. Now the total number of clusters becomes n-1.

Step 3 - Take the 2 closest clusters and merge them into one cluster. Now the total number of clusters becomes n-2.

Step 4 - Repeat Step 3 until only one cluster is left.

When only one huge cluster is left, the algorithm stops.
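The merge history described in these steps can be inspected directly with SciPy. Below is a minimal sketch, assuming a small hypothetical array of six 2D points chosen only for illustration; each row of the matrix returned by linkage() records one merge step.

import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical sample of 6 points, used only for illustration
points = np.array([[1.0, 1.0], [1.2, 1.1], [5.0, 5.0],
                   [5.1, 5.2], [9.0, 1.0], [9.2, 1.1]])

# linkage() performs the bottom-up merging; each of the n-1 rows records
# one merge: the two cluster indices joined, the merge distance,
# and the number of points in the new cluster
merge_history = linkage(points, method='single')
print(merge_history)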


How to calculate Distance between Two Clusters?
For calculating the distance between two data points, we use the Euclidean Distance formula.
But to calculate the distance between two clusters, we can use one of four methods:
1. Closest Points - We take the distance between the two closest points of the two clusters. This is also known as Single Linkage.

2. Furthest Points - Another option is to take the two furthest points of the two clusters and use their distance as the distance between the clusters. This is also known as Complete Linkage.
3. Average Distance - In this method, we take the average distance between all pairs of data points across the two clusters and use this average as the distance between the clusters. This is known as Average Linkage.
4. Distance between Centroids - Another option is to find the centroid of each cluster and then calculate the distance between the two centroids. This is known as Centroid Linkage.

Choosing the method for distance calculation is an important part of Hierarchical Clustering, because it affects both the clusters you get and the performance of the algorithm. Keep in mind while working on Hierarchical Clustering that the way distance between clusters is measured is crucial; depending on your problem, you can choose the appropriate option. A small code comparison of these options is sketched below.
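The four distance measures above correspond to the method argument of SciPy's linkage function ('single', 'complete', 'average', 'centroid'). A short sketch, re-using the same hypothetical points as the earlier example, comparing the distance at which the final two clusters are joined under each method:

import numpy as np
from scipy.cluster.hierarchy import linkage

# Same hypothetical 6 points as in the earlier sketch
points = np.array([[1.0, 1.0], [1.2, 1.1], [5.0, 5.0],
                   [5.1, 5.2], [9.0, 1.0], [9.2, 1.1]])

for method in ['single', 'complete', 'average', 'centroid']:
    merges = linkage(points, method=method)
    # The last row is the final merge; its third column is the distance
    # at which the last two clusters were joined
    print(method, '-> final merge distance:', merges[-1, 2])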
What is a Dendrogram?
A Dendrogram is a tree-like structure that stores each record of splitting and merging.
Let's understand how a dendrogram is created and how it works.
How is a Dendrogram Created?
Suppose we have 6 data points.
A Dendrogram stores each record of splitting and merging in a chart.
In our dendrogram chart, all 6 data points P1, P2, P3, P4, P5, and P6 are marked along the horizontal axis.

So, whenever any merging happens between data points or clusters, the dendrogram records it on the chart.
So, let’s start with the 1st step.
Step 1 -
Combine the two closest data points into one cluster. Suppose P5 and P6 are the two closest data points, so we combine them into one cluster. The dendrogram records this merge on the chart.

The dendrogram stores the record by drawing a horizontal line in the chart. The height of this horizontal line is based on the Euclidean distance: the smaller the Euclidean distance, the lower the height of the horizontal line.
Step 2 -
At step 2, find the next two closest data points and merge them into one cluster.
Suppose P2 and P3 are the next closest data points.

The dendrogram records this merge on the chart.

Again, the height of the horizontal line depends upon the Euclidean distance.
Step 3 -
At step 3, we again look for the closest clusters. Suppose P4 is closest to the red cluster (P5 and P6).
So P4, P5, and P6 form one cluster, and the dendrogram records it on the chart.

Step 4 -
Again, we look for the closest clusters. Suppose P1 is closest to the green cluster (P2 and P3), so we merge them into one cluster.

The dendrogram again records it on the chart.


Step 5 -
Now, no smaller clusters are left. So, the last step is to merge all the clusters into one huge cluster.

The dendrogram draws the final horizontal line. The height of this line is large because the distance between the two remaining clusters is large.

So, that's how a dendrogram is created. The dendrogram is the memory of Hierarchical Clustering.
Now that we have created a dendrogram, it's time to find the optimal number of clusters with its help.
We find the optimal number of clusters by cutting the dendrogram with a horizontal line. The cut is made across the longest vertical line, that is, the line that can be extended the farthest up and down without intersecting any merging point.
Let's understand with the help of this example.
Suppose that in this dendrogram, L1 is the longest such vertical line: it can be extended the maximum distance up and down without intersecting the merging points.
So we make the cut by drawing a horizontal line through it.

This cutting line intersects two vertical lines, and that count gives the optimal number of clusters.

That's why, in this case, the optimal number of clusters is 2. The same cut can also be made programmatically, as sketched below.
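A sketch of cutting the dendrogram with SciPy's fcluster, either at an assumed distance threshold (the height of the horizontal cut line) or by asking directly for a fixed number of clusters. It assumes X is the 2D feature matrix used later in this experiment, and the threshold of 200 is purely illustrative:

from scipy.cluster.hierarchy import linkage, fcluster

merges = linkage(X, method='ward')

# Option 1: cut the dendrogram at an assumed distance threshold
labels_by_distance = fcluster(merges, t=200, criterion='distance')

# Option 2: ask directly for the number of clusters read off the dendrogram
labels_by_count = fcluster(merges, t=2, criterion='maxclust')

print('Clusters at the assumed threshold:', len(set(labels_by_distance)))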

Link to download dataset : Mall_Customers | Kaggle


Steps:
1) Load dataset
2) Select any two dimensions or attributes representing 2D data
3) Load libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
4) Create Dendrogram to find the Optimal Number of Clusters
import scipy.cluster.hierarchy as sch
dendro = sch.dendrogram(sch.linkage(X, method = 'ward'))
plt.title('Dendrogram')
plt.xlabel('Customers')
plt.ylabel('Euclidean distances')
plt.show()

5) Use Agglomerative Hierarchical Clustering to fit clusters to the dataset


from sklearn.cluster import AgglomerativeClustering
# Note: recent scikit-learn versions use metric= in place of the older affinity= argument
hc = AgglomerativeClustering(n_clusters = 5, metric = 'euclidean', linkage = 'ward')
y_hc = hc.fit_predict(X)
Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load the Mall Customers dataset and inspect the first rows
data = pd.read_csv('Mall_Customers.csv')
data.head(10)

# Select two attributes to form the 2D data
X = data[['Annual Income (k$)', 'Spending Score (1-100)']]

# Standardize the features before K-Means
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# K-Means clustering (for comparison with hierarchical clustering)
num_clusters = 5
kmeans = KMeans(n_clusters=num_clusters)
data['Cluster'] = kmeans.fit_predict(X_scaled)

plt.scatter(data['Annual Income (k$)'], data['Spending Score (1-100)'],
            c=data['Cluster'], cmap='rainbow')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.title(f'K-Means Clustering with {num_clusters} Clusters')
plt.show()

# Transform the cluster centers back to the original scale
cluster_centers = pd.DataFrame(scaler.inverse_transform(kmeans.cluster_centers_),
                               columns=X.columns)
print(cluster_centers)

# Dendrogram to find the optimal number of clusters
import scipy.cluster.hierarchy as sch
dendro = sch.dendrogram(sch.linkage(X, method='ward'))
plt.title('Dendrogram')
plt.xlabel('Customers')
plt.ylabel('Euclidean distances')
plt.show()

# Agglomerative Hierarchical Clustering with 5 clusters
from sklearn.cluster import AgglomerativeClustering
hc = AgglomerativeClustering(n_clusters=5, metric='euclidean', linkage='ward')
y_hc = hc.fit_predict(X)
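The agglomerative labels can be visualized the same way as the K-Means result above. A short sketch, assuming X and y_hc from the preceding code:

plt.scatter(X['Annual Income (k$)'], X['Spending Score (1-100)'],
            c=y_hc, cmap='rainbow')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.title('Agglomerative Hierarchical Clustering with 5 Clusters')
plt.show()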
Program with code – Use different dataset

Links:
Hierarchical Clustering in Python, Step by Step Complete Guide (mltut.com)
scipy.cluster.hierarchy.linkage — SciPy v1.7.1 Manual
