AIML Unit 3 & 4
CET3030B
Lab Assignment: 6
• Write a program to implement hierarchical agglomerative clustering for a given dataset, e.g. a customer dataset from Kaggle, and evaluate its performance.
• 1. Import the required Python libraries
• 2. Load and explore the data
• 3. Pre-process the data and train the hierarchical agglomerative clustering model on the dataset
• 4. Analyse the results and visualize them using dendrograms.
Introduction
• Hierarchical clustering is an unsupervised machine learning algorithm used to group unlabeled data into clusters. It is also known as Hierarchical Cluster Analysis (HCA).
• In this algorithm, we build a hierarchy of clusters in the form of a tree; this tree-shaped structure is known as a dendrogram.
• A dendrogram is a diagram that shows the hierarchical relationship between objects.
• It is most commonly produced as the output of hierarchical clustering.
• The main use of a dendrogram is to work out the best way to allocate objects to clusters.
• Hierarchical clustering algorithms group similar objects into groups called clusters.
Introduction
• There are two types of hierarchical clustering algorithms:
• Agglomerative — a bottom-up approach. Start with many small clusters and merge them together to create bigger clusters.
• Divisive — a top-down approach. Start with a single cluster, then break it up into smaller clusters.
Agglomerative Clustering
• Also known as the bottom-up approach or Hierarchical Agglomerative Clustering (HAC).
• This clustering algorithm does not require us to pre-specify the number of clusters.
• Bottom-up algorithms treat each data point as a singleton cluster at the outset and then successively merge pairs of clusters until all of them have been combined into a single cluster that contains all the data.
• In other words, the algorithm considers each data point as a single cluster at the beginning and then starts combining the closest pairs of clusters.
• It does this until all the clusters are merged into one cluster that contains the entire dataset.
Agglomerative Clustering
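• The merging process described above can also be inspected numerically through SciPy's linkage matrix. Below is a minimal sketch on a made-up five-point dataset (the points and their values are illustrative assumptions, not part of the lab data).
# Minimal sketch of agglomerative merging on a tiny made-up dataset.
import numpy as np
from scipy.cluster.hierarchy import linkage

# Five 1-D points; each one starts out as its own singleton cluster.
points = np.array([[1.0], [1.5], [5.0], [5.5], [12.0]])

# Each row of the linkage matrix records one merge:
# [index of cluster A, index of cluster B, distance between them, size of the merged cluster]
Z = linkage(points, method="ward")
print(Z)   # four rows: the five singletons are merged step by step into one cluster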
Customer Dataset
CustomerID  Gender  Age  Annual Income (k$)  Spending Score (1-100)  Cluster
1           1       19   15                  39                      3
2           1       21   15                  81                      4
3           0       20   16                  6                       3
4           0       23   16                  77                      4
5           0       31   17                  40                      3
6           0       22   17                  76                      4
7           0       35   18                  6                       3
8           0       23   18                  94                      4
9           1       64   19                  3                       3
10          0       30   19                  72                      4
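• The slides jump to Step-2 here, so the code that follows assumes the libraries have already been imported and the matrix of features x has been built in Step-1. A minimal sketch of that setup is given below; the file name "Mall_Customers.csv" and the column positions are assumptions and may need to be adjusted to match the actual Kaggle file.
# Step-1 setup (a sketch; adjust the file name and column positions to your dataset).
import pandas as pd
import matplotlib.pyplot as mtp   # the later slides use the alias "mtp" for matplotlib.pyplot

# Load and explore the customer dataset
dataset = pd.read_csv("Mall_Customers.csv")
print(dataset.head())       # first few rows
print(dataset.describe())   # basic per-column statistics

# Matrix of features x: Annual Income (k$) and Spending Score (1-100),
# the two columns plotted in the visualization step
x = dataset.iloc[:, [3, 4]].values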
Implementation
• Step-2: Finding the optimal number of clusters using the dendrogram
• We will now find the optimal number of clusters for our model using the dendrogram. For this, we use the SciPy library, as it provides a function that directly plots the dendrogram for our code.
#Finding the optimal number of clusters using the dendrogram
import scipy.cluster.hierarchy as shc
dendro = shc.dendrogram(shc.linkage(x, method="ward"))
mtp.title("Dendrogram Plot")
mtp.ylabel("Euclidean Distances")
mtp.xlabel("Customers")
mtp.show()
10
Implementation
• In the above lines of code, we have imported the hierarchy module of the SciPy library.
• This module provides the function shc.dendrogram(), which takes the output of linkage() as a parameter. The linkage function defines the distance between two clusters, so here we have passed x (the matrix of features) and the method "ward", a popular linkage method in hierarchical clustering.
• Ward linkage minimizes the sum of squared differences within all clusters.
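• As a rough illustration of the Ward criterion (a sketch with made-up numbers, not taken from the slides): merging two clusters always increases the total within-cluster sum of squares, and Ward linkage chooses the merge with the smallest increase.
# Illustration of the Ward criterion on two made-up 1-D clusters.
import numpy as np

def within_ss(cluster):
    # Sum of squared differences of the points from their cluster mean.
    c = np.asarray(cluster, dtype=float)
    return float(np.sum((c - c.mean()) ** 2))

a, b = [1.0, 2.0], [10.0, 11.0]
increase = within_ss(a + b) - (within_ss(a) + within_ss(b))
# Among all candidate pairs of clusters, Ward linkage picks the merge
# with the smallest such increase.
print(increase)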
Implementation
• Output:
• By executing the above lines of code, we get the dendrogram plot shown below:
Implementation
• Using this dendrogram, we will now determine the optimal number of clusters for our model. For this, we look for the longest vertical distance that can be drawn without cutting any horizontal bar. Consider the diagram below:
Implementation
• In the above diagram, we have marked the vertical distances that do not cut any horizontal bar. The 4th distance appears to be the longest, so according to this, the number of clusters will be 5 (the number of vertical lines in this range).
• So, the optimal number of clusters is 5, and we will use this value to train the model in the next step.
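• The same reading of the dendrogram can be approximated programmatically. The following is a heuristic sketch (not part of the slides): it looks for the largest jump between consecutive merge distances in the linkage matrix and cuts the tree inside that jump.
# Heuristic sketch: cut the tree inside the largest jump in merge distances.
import numpy as np
import scipy.cluster.hierarchy as shc

Z = shc.linkage(x, method="ward")   # same linkage used for the dendrogram
dists = Z[:, 2]                     # merge distances, in increasing order
gaps = np.diff(dists)               # jumps between consecutive merges
cut = dists[gaps.argmax()] + gaps.max() / 2.0   # threshold inside the largest jump

labels = shc.fcluster(Z, t=cut, criterion="distance")
print("Suggested number of clusters:", labels.max())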
Implementation
• Step-3: Training the hierarchical clustering model
• Now that we know the optimal number of clusters, we can train our model.
#training the hierarchical model on the dataset
from sklearn.cluster import AgglomerativeClustering
hc = AgglomerativeClustering(n_clusters=5, affinity='euclidean', linkage='ward')
y_pred = hc.fit_predict(x)
Implementation
• In the code, we have imported the AgglomerativeClustering class from the cluster module of the scikit-learn library.
• Then we have created an object of this class named hc.
• The AgglomerativeClustering class takes the following parameters:
• n_clusters=5: the number of clusters; we use 5 here because it is the optimal number of clusters found above.
• affinity='euclidean': the metric used to compute the linkage.
• linkage='ward': the linkage criterion; here we use "ward" linkage, the same popular linkage method we already used for creating the dendrogram.
• In the last line, fit_predict both trains the model on x and returns, in y_pred, the cluster to which each data point belongs.
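• Note that the affinity parameter shown above works in older scikit-learn releases; in scikit-learn 1.2 it was renamed to metric, and affinity was later removed. A sketch of the equivalent call for newer versions (with Ward linkage the metric must be Euclidean):
# Equivalent call for scikit-learn 1.2 and later, where "affinity" is called "metric".
from sklearn.cluster import AgglomerativeClustering

hc = AgglomerativeClustering(n_clusters=5, metric='euclidean', linkage='ward')
y_pred = hc.fit_predict(x)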
Implementation
• Step-4: Visualizing the clusters
• Having trained the model successfully, we can now visualize the clusters corresponding to the dataset.
#visualizing the clusters
mtp.scatter(x[y_pred == 0, 0], x[y_pred == 0, 1], s = 100, c = 'blue', label = 'Cluster 1')
mtp.scatter(x[y_pred == 1, 0], x[y_pred == 1, 1], s = 100, c = 'green', label = 'Cluster 2')
mtp.scatter(x[y_pred == 2, 0], x[y_pred == 2, 1], s = 100, c = 'red', label = 'Cluster 3')
mtp.scatter(x[y_pred == 3, 0], x[y_pred == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
mtp.scatter(x[y_pred == 4, 0], x[y_pred == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
mtp.title('Clusters of customers')
mtp.xlabel('Annual Income (k$)')
mtp.ylabel('Spending Score (1-100)')
mtp.legend()
mtp.show()
Implementation
• Output: By executing the above lines of code, we get the scatter plot of the five customer clusters shown below:
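• The assignment also asks us to evaluate the clustering performance, which the slides do not show. Below is a minimal sketch using the silhouette score; the choice of metric is an assumption, and values closer to 1 indicate better-separated clusters.
# Evaluating the clustering with the silhouette score (one common choice for unlabeled data).
from sklearn.metrics import silhouette_score

score = silhouette_score(x, y_pred, metric='euclidean')
print("Silhouette score for 5 clusters:", score)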
Thank you