Data Mining-4

The K-Means algorithm involves selecting a number of clusters (K), initializing centroids, assigning data points to the nearest centroid, recalculating centroids, and repeating the assignment until no changes occur. The Elbow method is a popular technique for determining the optimal number of clusters by plotting the Within Cluster Sum of Squares (WCSS) against different K values and identifying the point where the curve bends sharply. This method helps in finding a balance between the number of clusters and the variance within them.

Uploaded by

Xyz

How does the K-Means Algorithm Work?

The K-Means algorithm works through the following steps:

Step-1: Choose the number of clusters, K.

Step-2: Select K random points as the initial centroids. (They need not be points from the input dataset.)

Step-3: Assign each data point to its closest centroid; this forms the K clusters.

Step-4: Place a new centroid for each cluster at the mean (center of gravity) of its points, which minimizes the variance within that cluster.

Step-5: Repeat Step-3, that is, reassign each data point to the closest of the new centroids.

Step-6: If any reassignment occurred, go to Step-4; otherwise, go to FINISH.

Step-7: The model is ready.
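
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (closest, mean, k_means), the sample points, the fixed random seed, the iteration cap, and the choice to keep an old centroid when a cluster becomes empty are all introduced here for illustration. Initial centroids are sampled from the dataset, and Euclidean distance is used.

```python
import math
import random


def closest(point, centroids):
    """Index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def mean(points):
    """Coordinate-wise mean (center of gravity) of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 2: pick K starting centroids (here: K distinct points of the dataset).
    centroids = rng.sample(points, k)
    # Step 3: assign every point to its closest centroid.
    labels = [closest(p, centroids) for p in points]
    for _ in range(max_iter):
        # Step 4: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = mean(members)
        # Step 5: reassign every point to the closest of the new centroids.
        new_labels = [closest(p, centroids) for p in points]
        # Step 6: stop as soon as no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

On this small dataset the first three points end up in one cluster and the last three in the other; which cluster gets index 0 depends on the random start.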

Let's understand the above steps with a visual walkthrough:

Suppose we have two variables, M1 and M2. The x-y scatter plot of these two variables is given below:

o Let's take the number of clusters as K=2, so we will try to group the data points into two different clusters.
o We need to choose K random points as centroids to form the clusters. These can be points from the dataset or any other points. Here we select two points that are not part of our dataset as the initial centroids. Consider the below image:

o Now we will assign each data point of the scatter plot to its closest centroid, using the usual formula for the distance between two points. With two centroids, this amounts to drawing the perpendicular bisector of the segment joining them (the "median" line). Consider the below image:

From the above image, it is clear that the points to the left of the line are nearer to K1, the blue centroid, and the points to the right of the line are nearer to the yellow centroid. Let's color them blue and yellow for clear visualization.
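
Why a single straight line separates the two groups can be seen from the distance comparison itself. As a sketch, write p for a data point and c_1, c_2 for the two centroids (symbols introduced here for illustration) and assume Euclidean distance. Then p is closer to c_1 exactly when:

$$
\lVert p - c_1 \rVert^2 < \lVert p - c_2 \rVert^2
\iff -2\,p \cdot c_1 + \lVert c_1 \rVert^2 < -2\,p \cdot c_2 + \lVert c_2 \rVert^2
\iff (c_2 - c_1) \cdot \left( p - \frac{c_1 + c_2}{2} \right) < 0
$$

The boundary, where the last expression equals zero, is the line through the midpoint of the two centroids and perpendicular to the segment joining them. For other distance measures the boundary is generally not this line.
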
o Since we need to find the best clusters, we will repeat the process with new centroids. To choose them, we compute the center of gravity of the points in each cluster and place the new centroids there, as below:
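
As a small worked example of this center-of-gravity step, suppose a cluster S_k contains the three hypothetical points (1, 2), (3, 4) and (5, 0); these values are not from the plots and are chosen only for illustration. The new centroid is the coordinate-wise mean of the cluster's points:

$$
c_k = \frac{1}{|S_k|} \sum_{p \in S_k} p
    = \left( \frac{1 + 3 + 5}{3},\ \frac{2 + 4 + 0}{3} \right)
    = (3,\ 2)
$$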

o Next, we will reassign each data point to its closest new centroid. For this, we repeat the same process of finding a median line. The median line will look like the below image:

From the above image, we can see that one yellow point is on the left side of the line and two blue points are on the right side of the line. So, these three points will be reassigned to the other centroids.
Since a reassignment has taken place, we go back to Step-4, which is finding new centroids.

o We repeat the process by finding the center of gravity of each cluster's points, so the new centroids will be as shown in the below image:

o With the new centroids, we again draw the median line and reassign the data points. So, the image will be:

o We can see in the above image that no data point lies on the wrong side of the line, so no reassignment occurs and our model is formed. Consider the below image:

As our model is ready, we can now remove the assumed centroids, and the two final clusters will be as shown in the below image:

How to choose the value of K (the number of clusters) in K-Means Clustering?

The performance of the K-Means clustering algorithm depends on the quality of the clusters it forms, and choosing the optimal number of clusters is a difficult task. There are several ways to find the optimal number of clusters; here we discuss one widely used method for choosing the value of K:

Elbow Method
The Elbow method is one of the most popular ways to find the optimal number of clusters. It is based on the WCSS value. WCSS stands for Within Cluster Sum of Squares, which measures the total variation within the clusters. The formula for WCSS (for 3 clusters) is given below:

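$$
\mathrm{WCSS} = \sum_{P_i \in \text{Cluster}_1} \operatorname{distance}(P_i, C_1)^2
              + \sum_{P_i \in \text{Cluster}_2} \operatorname{distance}(P_i, C_2)^2
              + \sum_{P_i \in \text{Cluster}_3} \operatorname{distance}(P_i, C_3)^2
$$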

In the above formula of WCSS,

$\sum_{P_i \in \text{Cluster}_1} \operatorname{distance}(P_i, C_1)^2$ is the sum of the squared distances between each data point in Cluster 1 and its centroid $C_1$; the other two terms are defined the same way for Clusters 2 and 3.

To measure the distance between a data point and its centroid, we can use a measure such as Euclidean distance or Manhattan distance; Euclidean distance is the usual choice for K-Means.
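
The WCSS formula can be checked by hand on a tiny example. The sketch below is illustrative only: the function names (euclidean, manhattan, wcss), the two hypothetical clusters and their centroids are assumptions made here, and the sum is written for any number of clusters rather than exactly three.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.dist(p, q)


def manhattan(p, q):
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def wcss(clusters, centroids, distance=euclidean):
    """Sum, over all clusters, of squared distances from each point to its centroid."""
    return sum(
        distance(p, c) ** 2
        for points, c in zip(clusters, centroids)
        for p in points
    )


# Hypothetical clusters, each paired with the mean of its points as centroid.
clusters = [[(0.0, 0.0), (2.0, 2.0)], [(5.0, 5.0), (5.0, 7.0)]]
centroids = [(1.0, 1.0), (5.0, 6.0)]

# Euclidean: 2 + 2 + 1 + 1, i.e. about 6 (up to floating-point rounding).
print(wcss(clusters, centroids))
# Manhattan: 4 + 4 + 1 + 1 = 10.
print(wcss(clusters, centroids, distance=manhattan))
```

The two distance measures give different WCSS values for the same clusters, so WCSS values are only comparable when the same measure is used throughout.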
To find the optimal number of clusters, the elbow method follows the below steps:

o It executes K-Means clustering on the given dataset for different values of K (typically ranging from 1 to 10).
o For each value of K, it calculates the WCSS value.
o It plots a curve of the calculated WCSS values against the number of clusters K.
o The point where the curve bends sharply, so that the plot looks like an arm, is considered the best value of K.
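
The elbow steps can be sketched as a loop over K. This is a hedged illustration, not a definitive procedure: the helper names (k_means_wcss, elbow_curve), the ten hypothetical data points, the use of several random restarts per K, and the choice to print the values instead of plotting them are all assumptions made here. A compact K-Means with Euclidean distance is included so that the block runs on its own.

```python
import math
import random


def k_means_wcss(points, k, seed):
    """Run a basic K-Means once and return the WCSS of the final clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # K distinct starting points
    labels = None
    for _ in range(100):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # no reassignment: finished
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))


def elbow_curve(points, k_values, restarts=10):
    """For each K, keep the lowest WCSS found over several random starts."""
    return {
        k: min(k_means_wcss(points, k, seed) for seed in range(restarts))
        for k in k_values
    }


# Hypothetical data: ten points in three loose groups.
points = [
    (1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
    (8.0, 8.0), (8.0, 9.0), (9.0, 8.0), (9.0, 9.0),
    (1.0, 8.0), (1.0, 9.0), (2.0, 9.0),
]

curve = elbow_curve(points, range(1, 11))
for k, value in curve.items():
    print(k, round(value, 2))  # plot these pairs to look for the bend
```

Plotting the printed pairs with K on the x-axis and WCSS on the y-axis gives the elbow curve. At K = 10, which equals the number of points here, every point is its own centroid and the WCSS is zero.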

Since the graph shows a sharp bend that looks like an elbow, it is known as the elbow method. The graph for the elbow method looks like the below image:

Note: We can choose a number of clusters up to the number of data points. If the number of clusters equals the number of data points, each point is its own centroid, so the value of WCSS becomes zero, and that will be the endpoint of the plot.
