K Means Clustering Algorithm

The document provides an overview of the K-Means clustering algorithm. It defines K-Means clustering as an unsupervised learning algorithm that groups unlabeled data points into K number of clusters based on their similarities. The document then describes the basic steps of the K-Means algorithm, including randomly selecting initial centroids, assigning data points to the closest centroid, recalculating centroids, and repeating until clusters no longer change. It also discusses how to determine the optimal number of K clusters using the elbow method of plotting WCSS against the number of K clusters.

Uploaded by

nandanvarma.dandu9


K-Means Clustering Algorithm

K-Means Clustering is an unsupervised learning algorithm used to solve clustering problems in
machine learning and data science. In this topic, we will learn what the K-means clustering
algorithm is and how it works, along with the Python implementation of k-means clustering.

What is K-Means Algorithm?

K-Means Clustering is an unsupervised learning algorithm that groups an unlabeled dataset into
different clusters. Here K is the number of clusters to be created, which must be chosen in
advance: if K=2, there will be two clusters; if K=3, there will be three clusters; and so on.

It is an iterative algorithm that divides the unlabeled dataset into K different clusters in
such a way that each data point belongs to only one group, whose members have similar properties.

It allows us to cluster the data into different groups and is a convenient way to discover the
categories of groups in an unlabeled dataset on its own, without the need for any labeled training data.

It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim
of this algorithm is to minimize the sum of the squared distances between the data points and the
centroids of their corresponding clusters.

The algorithm takes the unlabeled dataset as input, divides it into K clusters, and repeats the
process until the clusters stop changing. The value of K must be chosen before the algorithm
runs.

The k-means clustering algorithm mainly performs two tasks:

o Determines the best positions for the K center points, or centroids, by an iterative process.
o Assigns each data point to its closest centroid. The data points nearest to a particular
centroid form a cluster.

Hence each cluster contains data points with some commonalities and is separated from the other clusters.

The diagram below explains the working of the K-means clustering algorithm:

How does the K-Means Algorithm Work?

The working of the K-Means algorithm is explained in the steps below:

Step-1: Select the number K to decide the number of clusters.

Step-2: Select K random points as the initial centroids. (They need not be points from the input dataset.)

Step-3: Assign each data point to its closest centroid, which will form the predefined K
clusters.

Step-4: Calculate the mean (center of gravity) of the points in each cluster and place the new
centroid of that cluster there.

Step-5: Repeat the third step, i.e., reassign each data point to the closest of the new
centroids.

Step-6: If any reassignment occurred, go to Step-4; otherwise, go to FINISH.

Step-7: The model is ready.
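
The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the function name kmeans, the seed and max_iter parameters, the sample points, and the choice to draw the initial centroids from the data and to use Euclidean distance are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 2: pick k distinct data points as the initial centroids (an assumption;
    # the text also allows points outside the dataset).
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Steps 3 and 5: assign every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 6: stop as soon as no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Hypothetical sample data: two visibly separate groups of three points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

Which group receives label 0 and which receives label 1 depends on the random initial centroids, so only the grouping itself is meaningful, not the label values.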

Let's understand the above steps by considering the visual plots:

Suppose we have two variables, M1 and M2. The x-y scatter plot of these two variables is
given below:
o Let's take the number of clusters K=2 and try to group the data points into two different
clusters.
o We need to choose K random points, or centroids, to form the clusters. These points
can be either points from the dataset or any other points. Here we select the two points
shown below as the K points, which are not part of our dataset. Consider the image
below:

o Now we will assign each data point of the scatter plot to its closest K-point or centroid.
We compute this with the usual formula for the distance between two points. Equivalently,
we can draw the perpendicular bisector of the segment joining the two centroids (the
"median" line). Consider the image below:

From the above image, it is clear that the points to the left of the line are nearer to K1, the
blue centroid, and the points to the right of the line are nearer to the yellow centroid. Let's
color them blue and yellow for clear visualization.
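
As a small check of the assignment step, the sketch below compares the two views for K=2: picking the nearer centroid by Euclidean distance, and testing which side of the dividing line a point falls on. The coordinates of k1, k2, and the sample points are hypothetical values chosen only for illustration, and the helper names closest and side_of_line are not from the original text.

```python
import math

# Hypothetical coordinates chosen only for illustration.
k1 = (2.0, 6.0)  # "blue" centroid
k2 = (7.0, 3.0)  # "yellow" centroid
points = [(1.0, 5.0), (3.0, 7.0), (6.0, 2.0), (8.0, 4.0), (4.0, 4.0)]


def closest(p):
    """Return 'K1' or 'K2', whichever centroid is nearer to p."""
    return "K1" if math.dist(p, k1) <= math.dist(p, k2) else "K2"


def side_of_line(p):
    """Sign test against the perpendicular bisector of the segment K1-K2."""
    mid = ((k1[0] + k2[0]) / 2, (k1[1] + k2[1]) / 2)
    # Project (p - mid) onto the direction K1 -> K2; negative means K1's side.
    s = (p[0] - mid[0]) * (k2[0] - k1[0]) + (p[1] - mid[1]) * (k2[1] - k1[1])
    return "K1" if s <= 0 else "K2"


assignments = [closest(p) for p in points]
print(assignments)
print([side_of_line(p) for p in points])
```

Both lists agree because a point lies on K1's side of the perpendicular bisector exactly when it is closer to K1 than to K2.
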
o As we need to find the closest cluster, we will repeat the process with new
centroids. To choose the new centroids, we compute the center of gravity of the points
in each cluster, which gives the new centroids shown below:
o Next, we will reassign each data point to the closest new centroid. For this, we repeat the
same process of finding the dividing line. The line will look like the image below:

From the above image, we can see that one yellow point is on the left side of the line and two
blue points are to the right of the line. So, these three points will be assigned to the new centroids.
As reassignment has taken place, we again go to Step-4, which is finding new
centroids or K-points.
o We repeat the process by finding the center of gravity of the points in each cluster, so the
new centroids will be as shown in the image below:
o As we have new centroids, we again draw the dividing line and reassign the data
points. So, the image will be:

o We can see in the above image that there are no points of the other color on either side of
the line, which means no point changes cluster and our model is formed. Consider the image below:
As our model is ready, we can now remove the assumed centroids, and the two final clusters
will be as shown in the image below:

How to choose the value of "K number of clusters" in K-means Clustering?
The performance of the K-means clustering algorithm depends on the quality of the clusters that
it forms, and choosing the optimal number of clusters is a difficult task. There are several
ways to find the optimal number of clusters; here we discuss one of the most widely used
methods for finding the value of K:

Elbow Method
The Elbow method is one of the most popular ways to find the optimal number of clusters. This
method uses the WCSS value. WCSS stands for Within-Cluster Sum of Squares, which measures
the total variation within the clusters. The formula to calculate the value of WCSS
(for 3 clusters) is given below:

$$\mathrm{WCSS} = \sum_{P_i \in \text{Cluster}_1} \text{distance}(P_i, C_1)^2 + \sum_{P_i \in \text{Cluster}_2} \text{distance}(P_i, C_2)^2 + \sum_{P_i \in \text{Cluster}_3} \text{distance}(P_i, C_3)^2$$

In the above formula of WCSS,

$\sum_{P_i \in \text{Cluster}_1} \text{distance}(P_i, C_1)^2$ is the sum of the squares of the distances between each data
point in Cluster1 and its centroid C1, and likewise for the other two terms.
To measure the distance between the data points and the centroid, we can use a measure such as
Euclidean distance or Manhattan distance.
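
The WCSS formula can be evaluated directly, as in the sketch below. It assumes Euclidean distance, and the function name wcss, the three small clusters, and their centroids are hypothetical values made up for the example.

```python
import math


def wcss(clusters, centroids):
    """Sum over clusters of squared Euclidean distances to the cluster centroid."""
    return sum(
        math.dist(p, c) ** 2
        for cluster, c in zip(clusters, centroids)
        for p in cluster
    )


# Hypothetical data: three small clusters and their centroids (the cluster means).
clusters = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(5.0, 5.0), (5.0, 7.0)],
    [(10.0, 0.0), (10.0, 4.0)],
]
centroids = [(1.0, 0.0), (5.0, 6.0), (10.0, 2.0)]

total = wcss(clusters, centroids)
print(total)
```

For these values the three terms of the formula are 2, 2, and 8, so the total WCSS is 12.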

To find the optimal number of clusters, the elbow method follows the steps below:

o It runs K-means clustering on the given dataset for different values of K (ranging from
1 to 10).
o For each value of K, it calculates the WCSS value.
o It plots a curve of the calculated WCSS values against the number of clusters K.
o The point where the curve bends sharply, so that the plot looks like an arm, is
taken as the best value of K.

Since the graph shows a sharp bend that looks like an elbow, it is known as the elbow
method. The graph for the elbow method looks like the image below:
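
The elbow procedure can be sketched as a loop over K that records the WCSS of each fit. The example below is a minimal standard-library illustration under several assumptions: the helper kmeans_wcss, the twelve sample points, Euclidean distance, initial centroids drawn from the data, and keeping the best of five random restarts per K are all choices made for this sketch. It prints the values instead of plotting them.

```python
import math
import random


def kmeans_wcss(points, k, rng, max_iter=100):
    """Run one K-means fit and return its WCSS (squared Euclidean distances)."""
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no reassignment: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Hypothetical data: three loose groups of four points each.
points = [
    (1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (1.0, 2.5),
    (8.0, 8.0), (8.5, 9.0), (9.0, 8.0), (8.0, 9.5),
    (1.0, 9.0), (2.0, 8.5), (1.5, 9.5), (2.5, 9.0),
]

rng = random.Random(0)
wcss_by_k = {}
for k in range(1, 11):
    # Keep the best of a few restarts, since a single run may end in a poor local optimum.
    wcss_by_k[k] = min(kmeans_wcss(points, k, rng) for _ in range(5))

for k, value in wcss_by_k.items():
    print(f"K={k:2d}  WCSS={value:.2f}")
```

Plotting wcss_by_k against K with any plotting library gives the elbow curve; for data like this, the drop in WCSS should flatten once K reaches the number of natural groups.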

Note: The number of clusters can be at most the number of data points. If we choose
the number of clusters equal to the number of data points, every point is its own centroid,
so the value of WCSS becomes zero, and that will be the endpoint of the plot.
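
The limiting case in the note can be checked in a few lines. The three sample points below are hypothetical, and Euclidean distance is assumed.

```python
import math

# Hypothetical data: K equals the number of points, so each point is its own centroid.
points = [(1.0, 2.0), (3.0, 4.0), (5.0, 1.0)]
centroids = list(points)

# Each point contributes the squared distance to its nearest centroid, which is itself.
wcss = sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)
print(wcss)
```

Every point sits at distance zero from its own centroid, so the sum is zero.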
