K-Means Clustering
K-means is an iterative algorithm that divides an unlabeled dataset into k different clusters,
such that each data point belongs to exactly one group of points with similar properties.
It lets us cluster the data into different groups and is a convenient way to discover
the categories in an unlabeled dataset on its own, without the need for any training.
The algorithm takes the unlabeled dataset as input, divides it into k clusters, and
repeats the process until it finds the best clusters. The value of k must be
predetermined in this algorithm.
○ Determines the best K center points (centroids) through an iterative process.
○ Assigns each data point to its closest centroid; the data points nearest a
particular centroid form a cluster.
Hence each cluster contains data points with some commonalities and is kept apart from
the other clusters.
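Before walking through the mechanics, a quick usage sketch may help. This assumes scikit-learn is available; the dataset `X` and all values below are made up for illustration.

```python
# Minimal K-means usage sketch with scikit-learn (assumed available).
import numpy as np
from sklearn.cluster import KMeans

# Toy unlabeled dataset: two visible groups in 2-D.
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
              [8.0, 8.0], [8.3, 7.7], [7.8, 8.4]])

# K must be chosen in advance; here K=2.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(kmeans.labels_)           # cluster index for each point
print(kmeans.cluster_centers_)  # the two learned centroids
```

Note that `labels_` only identifies group membership; the cluster numbering itself is arbitrary.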
The below diagram explains the working of the K-means Clustering Algorithm:
Step-1: Select the number K to decide how many clusters to form.
Step-2: Select K random points as the initial centroids. (They can be points other than
those in the input dataset.)
Step-3: Assign each data point to its closest centroid, which forms the predefined
K clusters.
Step-4: Calculate the variance and place a new centroid for each cluster.
Step-5: Repeat the third step, i.e., reassign each data point to the new
closest centroid of its cluster.
Step-6: If any reassignment occurred, go back to step-4; otherwise, the clusters are
final and the model is ready.
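The steps above can be sketched from scratch with NumPy. Every name here (`kmeans`, `X`, `k`) is illustrative rather than part of the text, and the sketch ignores edge cases such as a cluster becoming empty.

```python
# From-scratch sketch of the listed steps, assuming NumPy only.
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Steps 1-2: K is given; pick K random data points as initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Step 3: assign each point to its closest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 4: place each new centroid at the mean of its cluster.
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Steps 5-6: if no centroid moved, no reassignment follows; stop.
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

A production implementation would also handle empty clusters and run several random restarts.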
Suppose we have two variables M1 and M2. The x-y axis scatter plot of these two
variables is given below:
○ Let's take the number of clusters K=2, to identify the dataset and put the
points into different clusters. That is, we will try to group this dataset into
two different clusters.
○ We need to choose k random points or centroids to form the clusters. These
points can be either points from the dataset or any other points. Here we
select the two points below as the k points, which are not part of our
dataset. Consider the below image:
○ Now we will assign each data point of the scatter plot to its closest K-point or
centroid. We compute this using the familiar formula for the distance between two
points. Equivalently, we can draw a median line (the perpendicular bisector)
between the two centroids. Consider the below image:
From the above image, it is clear that points on the left side of the line are closer to
the K1 (blue) centroid, and points to the right of the line are closer to the yellow
centroid. Let's color them blue and yellow for clear visualization.
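The median line here is the perpendicular bisector of the segment between the two centroids: a point lies on a centroid's side of that line exactly when it is closer to that centroid. A small check with made-up coordinates:

```python
# A point is on c1's side of the perpendicular bisector exactly when
# it is closer to c1. Centroids and point are illustrative.
import numpy as np

c1, c2 = np.array([0.0, 0.0]), np.array([4.0, 0.0])
p = np.array([1.0, 2.0])

closer_to_c1 = np.linalg.norm(p - c1) < np.linalg.norm(p - c2)
# Side test via the bisector: sign of the projection onto (c2 - c1)
# measured from the midpoint of the two centroids.
left_of_bisector = (p - (c1 + c2) / 2) @ (c2 - c1) < 0
print(closer_to_c1, left_of_bisector)  # both True here
```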
○ As we need to find the closest clusters, we repeat the process by choosing
new centroids. Each new centroid is the center of gravity (the mean) of the
points currently assigned to its cluster, which gives the new centroids shown below:
○ Next, we will reassign each data point to its new closest centroid. For this, we
repeat the same process of finding a median line. The new median line will be as in
the below image:
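The center-of-gravity step above is just a per-cluster mean. A tiny sketch with assumed points and assignments:

```python
# "Center of gravity" update: each new centroid is the mean of the
# points currently assigned to that cluster. Data here is illustrative.
import numpy as np

points = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0],
                   [8.0, 8.0], [9.0, 9.0]])
labels = np.array([0, 0, 0, 1, 1])  # current cluster assignments

new_centroids = np.array([points[labels == j].mean(axis=0) for j in (0, 1)])
print(new_centroids)  # [[2.  2. ] [8.5 8.5]]
```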
From the above image, we can see that one yellow point is on the left side of the line
and two blue points are to the right of the line. So, these three points will be
assigned to new centroids.
Since reassignment has taken place, we again go to step-4, which is finding new
centroids or K-points.
○ We repeat the process by finding the center of gravity (mean) of the points in
each cluster, so the new centroids will be as shown in the below image:
○ With the new centroids, we again draw the median line and reassign the
data points. The result looks like this:
○ We can see in the above image that no data points have switched sides of the
line, which means our model has converged. Consider the below image:
As our model is ready, we can now remove the assumed centroids, and the two final
clusters will be as shown in the below image:
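The whole walkthrough can be reproduced in code. The M1/M2 values below are assumed for illustration (the original figures are not available), and scikit-learn is used with K=2:

```python
# Reproducing the two-variable, K=2 walkthrough with assumed data.
import numpy as np
from sklearn.cluster import KMeans

M1 = [1.0, 1.5, 3.0, 5.0, 3.5, 4.5, 3.5]
M2 = [1.0, 2.0, 4.0, 7.0, 5.0, 5.0, 4.5]
X = np.column_stack([M1, M2])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(model.labels_)           # final cluster of each point
print(model.cluster_centers_)  # the two final centroids
```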
Elbow Method
The elbow method is one of the most popular ways to find the optimal number of
clusters. It uses the WCSS value, where WCSS stands for Within-Cluster Sum of
Squares, a measure of the total variation within the clusters. The formula for
the WCSS (for 3 clusters) is given below:
WCSS = ∑(Pi in Cluster1) distance(Pi, C1)² + ∑(Pi in Cluster2) distance(Pi, C2)² + ∑(Pi in Cluster3) distance(Pi, C3)²
To measure the distance between data points and centroid, we can use any method
such as Euclidean distance or Manhattan distance.
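The two distance measures mentioned can be written in a few lines of plain Python:

```python
# Two common point-to-centroid distance measures (pure-Python sketch).
def euclidean(p, c):
    # Straight-line distance: sqrt of the sum of squared differences.
    return sum((pi - ci) ** 2 for pi, ci in zip(p, c)) ** 0.5

def manhattan(p, c):
    # Sum of absolute per-coordinate differences.
    return sum(abs(pi - ci) for pi, ci in zip(p, c))

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```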
To find the optimal number of clusters, the elbow method follows the below steps:
○ Run K-means clustering on the dataset for a range of K values and calculate the
WCSS for each K.
○ Plot a curve of the calculated WCSS values against the number of clusters K.
○ The sharp point of bend, where the plot looks like an arm, is taken as the best
value of K.
Since the graph shows a sharp bend that looks like an elbow, this is known as
the elbow method. The graph for the elbow method looks like the below image:
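A sketch of the elbow computation, assuming scikit-learn (which exposes the WCSS as `inertia_`) and made-up toy data:

```python
# Elbow-method sketch: compute WCSS for K = 1..6 on assumed toy data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three well-separated blobs -> the elbow should appear near K=3.
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in (0.0, 5.0, 10.0)])

wcss = []
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)  # inertia_ is the within-cluster sum of squares
print(wcss)  # drops sharply up to K=3, then flattens
```

In practice one would plot `wcss` against K and read off the bend by eye (or with a heuristic).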
Note: We can choose the number of clusters to equal the number of data points. In that
case the WCSS value becomes zero, and that point is the endpoint of the plot.
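A quick sanity check of this note, again assuming scikit-learn and made-up points: with K equal to the number of points, every point is its own centroid, so the WCSS collapses to zero.

```python
# With n_clusters == number of points, each point is its own centroid
# and the within-cluster sum of squares (inertia_) is zero.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [3.0, 2.0]])
km = KMeans(n_clusters=len(X), n_init=10, random_state=0).fit(X)
print(km.inertia_)  # 0.0
```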