K-Means Clustering Algorithm - Javatpoint

Uploaded by

akepogupallavi12


10/1/24, 7:43 AM K-Means Clustering Algorithm - Javatpoint

Python Implementation of K-means Clustering Algorithm

The previous section discussed the K-means algorithm. Now let's see how it can be implemented
using Python.

Before the implementation, let's understand what type of problem we will solve. We have a dataset,
Mall_Customers, which contains data about customers who visit the mall and spend there.

The dataset contains Customer_Id, Gender, Age, Annual Income ($), and Spending Score (a calculated
value of how much a customer has spent in the mall; the higher the value, the more the customer has
spent). From this dataset, we need to find some patterns. Because clustering is an unsupervised method,
we do not know in advance exactly what to look for.

The steps to be followed for the implementation are given below:

Data Pre-processing

Finding the optimal number of clusters using the elbow method

Training the K-means algorithm on the training dataset

Visualizing the clusters

Step-1: Data pre-processing Step

The first step is data pre-processing, as in our earlier topics of Regression and Classification.
For the clustering problem, however, it differs from the other models. Let's discuss it:

Importing Libraries
As in previous topics, we first import the libraries for our model, which is part of data
pre-processing. The code is given below:

# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

In the above code, numpy is imported for performing mathematical calculations, matplotlib for
plotting graphs, and pandas for managing the dataset.

https://www.javatpoint.com/k-means-clustering-algorithm-in-machine-learning 1/8

Importing the Dataset:

Next, we will import the dataset that we need to use. Here, we are using the
Mall_Customers_data.csv dataset. It can be imported using the code below:

# Importing the dataset

dataset = pd.read_csv('Mall_Customers_data.csv')

By executing the above lines of code, we will get our dataset in the Spyder IDE. The dataset looks like the
below image:

From the above dataset, we need to find some patterns in it.
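Before looking for patterns, it can help to inspect the loaded data. The snippet below is a minimal sketch, not part of the original tutorial: because the CSV file is not available here, it builds a tiny stand-in DataFrame whose column names and values are illustrative assumptions, then shows the kind of checks one might run on the real dataset.

```python
import pandas as pd

# Tiny stand-in for the CSV file; the column names and values below are
# illustrative assumptions, not the actual contents of the dataset.
dataset = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Gender": ["Male", "Male", "Female", "Female"],
    "Age": [19, 21, 20, 23],
    "Annual Income (k$)": [15, 15, 16, 16],
    "Spending Score (1-100)": [39, 81, 6, 77],
})

print(dataset.head())             # first rows of the table
print(dataset.dtypes)             # data type of each column
missing = dataset.isnull().sum()  # number of missing values per column
print(missing)
```

With the real file, the same calls would be applied to the DataFrame returned by pd.read_csv().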

Extracting Independent Variables

Here we don't need any dependent variable in the data pre-processing step, because this is a clustering
problem and we have no target to predict. So we will just add a line of code for the matrix of features.

x = dataset.iloc[:, [3, 4]].values

As we can see, we are extracting only the columns at index 3 and 4 (Annual Income and Spending Score;
indexing starts from 0). This is because we need a 2D plot to visualize the model, and some features,
such as customer_id, are not required.
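Selecting columns by position works, but selecting them by name can be easier to read and is less fragile if the column order changes. The following is a minimal sketch under the assumption that the two columns are named "Annual Income (k$)" and "Spending Score (1-100)" (names taken from the axis labels used later; the real CSV header may differ). The small DataFrame is a made-up stand-in for the dataset.

```python
import pandas as pd

# Made-up stand-in for the dataset; column names are assumptions.
dataset = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Gender": ["Male", "Male", "Female", "Female"],
    "Age": [19, 21, 20, 23],
    "Annual Income (k$)": [15, 15, 16, 16],
    "Spending Score (1-100)": [39, 81, 6, 77],
})

# Positional selection, as in the tutorial: columns at index 3 and 4.
x_by_position = dataset.iloc[:, [3, 4]].values

# Equivalent selection by column name.
x_by_name = dataset[["Annual Income (k$)", "Spending Score (1-100)"]].values

print(x_by_position.shape)  # (number of rows, 2)
```

Both approaches yield the same two-column NumPy array as long as the columns are in the expected order.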

Step-2: Finding the optimal number of clusters using the elbow method
In the second step, we will try to find the optimal number of clusters for our clustering problem. As
discussed above, we are going to use the elbow method for this purpose.

The elbow method uses the WCSS (Within-Cluster Sum of Squares) concept: it plots the WCSS values on the
Y-axis against the number of clusters on the X-axis. So we are going to calculate the WCSS for different k
values ranging from 1 to 10. Below is the code for it:

#finding optimal number of clusters using the elbow method

from sklearn.cluster import KMeans
wcss_list = []  #Initializing the list for the values of WCSS

#Using for loop for iterations from 1 to 10.

for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=42)
    kmeans.fit(x)
    wcss_list.append(kmeans.inertia_)  #inertia_ holds the WCSS of the fitted model
mtp.plot(range(1, 11), wcss_list)
mtp.title('The Elbow Method Graph')
mtp.xlabel('Number of clusters(k)')
mtp.ylabel('WCSS')
mtp.show()

As we can see in the above code, we have used the KMeans class of the sklearn.cluster module to form the
clusters.

Next, we have created the wcss_list variable as an empty list, which holds the WCSS values computed for
the different values of k ranging from 1 to 10.

After that, the for loop iterates over the values of k from 1 to 10. Since range() in Python excludes the
upper bound, it is given as 11 so that the 10th value is included.

The rest of the code is similar to what we did in earlier topics: we fit the model on the matrix of
features and then plot the graph between the number of clusters and WCSS.
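To make the link between WCSS and the inertia_ attribute concrete, the sketch below computes WCSS by hand and compares it with the value reported by scikit-learn. It is an illustration only: the data is synthetic (three made-up groups of 2D points, not the mall dataset), and n_init=10 is set explicitly as an assumption to keep behaviour consistent across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D points scattered around three made-up centres
# (illustrative data, not the Mall_Customers dataset).
rng = np.random.default_rng(0)
centres = np.array([[20.0, 20.0], [60.0, 50.0], [100.0, 80.0]])
x = np.vstack([c + rng.normal(scale=3.0, size=(30, 2)) for c in centres])

kmeans = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=42)
labels = kmeans.fit_predict(x)

# WCSS: squared distance of every point to the centroid of its own
# cluster, summed over all points.
assigned_centroids = kmeans.cluster_centers_[labels]
wcss_manual = float(((x - assigned_centroids) ** 2).sum())

print(wcss_manual, kmeans.inertia_)  # the two values should agree closely
```

This is why appending kmeans.inertia_ inside the loop is enough to build the elbow curve.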

Output: After executing the above code, we will get the below output:

From the above plot, we can see the elbow point is at 5. So the number of clusters here will be 5.


Step-3: Training the K-means algorithm on the training dataset

Now that we have the number of clusters, we can train the model on the dataset.

To train the model, we will use almost the same two lines of code as in the above section, but
instead of i we will use 5, as we know there are 5 clusters to be formed. The code is given
below:

#training the K-means model on a dataset

kmeans = KMeans(n_clusters=5, init='k-means++', random_state= 42)
y_predict= kmeans.fit_predict(x)

The first line is the same as above: it creates an object of the KMeans class.

In the second line of code, fit_predict() trains the model on x and returns y_predict, which holds the
cluster index assigned to each row. It is not a dependent variable in the supervised sense.

By executing the above lines of code, we will get the y_predict variable. We can check it under the variable
explorer option in the Spyder IDE. We can now compare the values of y_predict with our original dataset.
Consider the below image:

From the above image, we can see that CustomerID 1 belongs to cluster 3 (as the index starts from 0,
the value 2 is considered as cluster 3), CustomerID 2 belongs to cluster 4, and so on.
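Instead of comparing y_predict with the dataset by eye, the labels can be attached to the table as a new column. The sketch below is one possible way to do that; it uses synthetic data (five made-up groups of points standing in for the income and spending columns), and the column names "CustomerID" and "Cluster" as well as n_init=10 are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the two features (illustrative values only).
rng = np.random.default_rng(1)
centres = np.array([[25, 20], [25, 80], [55, 50], [90, 15], [90, 85]], dtype=float)
x = np.vstack([c + rng.normal(scale=4.0, size=(20, 2)) for c in centres])

dataset = pd.DataFrame(x, columns=["Annual Income (k$)", "Spending Score (1-100)"])
dataset.insert(0, "CustomerID", np.arange(1, len(dataset) + 1))

kmeans = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=42)
y_predict = kmeans.fit_predict(x)

# Labels start at 0, so add 1 to get the 1-based cluster number used in the text.
dataset["Cluster"] = y_predict + 1

# Number of customers assigned to each cluster.
cluster_sizes = dataset["Cluster"].value_counts().sort_index()
print(dataset.head())
print(cluster_sizes)
```

Each row now carries its cluster number, so the assignment of any CustomerID can be looked up directly.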

Step-4: Visualizing the Clusters

The last step is to visualize the clusters. As we have 5 clusters in our model, we will visualize each cluster
one by one.

To visualize the clusters, we will use a scatter plot, drawn with the mtp.scatter() function of matplotlib.

#visualizing the clusters

mtp.scatter(x[y_predict == 0, 0], x[y_predict == 0, 1], s = 100, c = 'blue', label = 'Cluster 1') #for first cluster
mtp.scatter(x[y_predict == 1, 0], x[y_predict == 1, 1], s = 100, c = 'green', label = 'Cluster 2') #for second cluster
mtp.scatter(x[y_predict == 2, 0], x[y_predict == 2, 1], s = 100, c = 'red', label = 'Cluster 3') #for third cluster
mtp.scatter(x[y_predict == 3, 0], x[y_predict == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4') #for fourth cluster
mtp.scatter(x[y_predict == 4, 0], x[y_predict == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5') #for fifth cluster
mtp.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s = 300, c = 'yellow', label = 'Centroid')
mtp.title('Clusters of customers')
mtp.xlabel('Annual Income (k$)')
mtp.ylabel('Spending Score (1-100)')
mtp.legend()
mtp.show()

In the above lines of code, we have written one mtp.scatter() call for each cluster, from 1 to 5. The first
argument of mtp.scatter(), e.g., x[y_predict == 0, 0], selects the x-axis values (the first column of the matrix
of features) of the points assigned to that cluster; the second argument selects the y-axis values in the same
way. The y_predict values range from 0 to 4.
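The expression x[y_predict == 0, 0] combines a boolean mask on the rows with a column index. The small example below, using hand-written values chosen only for illustration, shows what each part evaluates to.

```python
import numpy as np

# Four hand-written points (income, spending score) and their cluster labels.
x = np.array([[15, 39],
              [15, 81],
              [16, 6],
              [78, 88]])
y_predict = np.array([0, 1, 0, 1])

mask = y_predict == 0           # True for rows that belong to cluster 0
incomes_cluster0 = x[mask, 0]   # first column of those rows  -> [15 16]
scores_cluster0 = x[mask, 1]    # second column of those rows -> [39  6]

print(mask)
print(incomes_cluster0)
print(scores_cluster0)
```

Each mtp.scatter() call therefore receives only the points of one cluster, which is what lets every cluster be drawn in its own color.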

Output:

The output image clearly shows the five different clusters in different colors. The clusters are formed
between two parameters of the dataset: the Annual Income of the customer and the Spending Score. We can
change the colors and labels as per requirement or choice. We can also observe some points from the above
patterns, which are given below:

Cluster1 shows the customers with average salary and average spending so we can categorize
these customers as
Cluster2 shows the customers with high income but low spending, so we can categorize them
as careful.
Cluster3 shows the customers with low income and also low spending, so they can be categorized as sensible.
Cluster4 shows the customers with low income and very high spending, so they can be
categorized as careless.
Cluster5 shows the customers with high income and high spending, so they can be categorized as
target, and these customers can be the most profitable customers for the mall owner.
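Interpretations like the ones above can be supported with numbers by computing the average income and spending score of each cluster. The sketch below shows one way to do this with a pandas groupby; it runs on synthetic data (five made-up groups of points), so the resulting averages are illustrative and say nothing about the real mall customers. The column names and n_init=10 are assumptions for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the two features (illustrative values only).
rng = np.random.default_rng(2)
centres = np.array([[25, 20], [25, 80], [55, 50], [90, 15], [90, 85]], dtype=float)
x = np.vstack([c + rng.normal(scale=4.0, size=(20, 2)) for c in centres])

feature_names = ["Annual Income (k$)", "Spending Score (1-100)"]
dataset = pd.DataFrame(x, columns=feature_names)

kmeans = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=42)
y_predict = kmeans.fit_predict(x)
dataset["Cluster"] = y_predict  # 0-based labels, as returned by fit_predict

# Mean income and mean spending score per cluster.
summary = dataset.groupby("Cluster")[feature_names].mean()
print(summary.round(1))
```

A cluster with a high mean income and a low mean spending score, for example, would correspond to the kind of group the text calls careful.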
