KMeans Clustering

The document discusses the K-means clustering algorithm, an unsupervised machine learning technique that groups unlabeled data points into K number of clusters. It works by first selecting K random centroids, then assigning each data point to the closest centroid and recalculating the centroids until the clusters stabilize. The algorithm is commonly used for customer segmentation, text clustering, image compression, and anomaly detection. It aims to minimize distances between points and cluster centers, though it works best for spherical clusters and cannot determine overlapping clusters.

Uploaded by

Basant Kothari


K-Means Clustering Algorithm

K-Means Clustering is an unsupervised learning algorithm used to solve clustering
problems in machine learning and data science. In this topic, we will learn what the
K-means clustering algorithm is and how it works, along with the Python
implementation of k-means clustering.

What is K-Means Algorithm?

K-Means Clustering is an unsupervised learning algorithm that groups an unlabeled
dataset into different clusters. Here K defines the number of pre-defined clusters that
need to be created in the process: if K=2, there will be two clusters; for K=3,
there will be three clusters; and so on.

It is an iterative algorithm that divides the unlabeled dataset into K different clusters in
such a way that each data point belongs to only one group, whose members have similar properties.

It allows us to cluster the data into different groups and is a convenient way to discover
the categories of groups in an unlabeled dataset on its own, without the need for any
labeled training data.

It is a centroid-based algorithm, where each cluster is associated with a centroid. The
main aim of this algorithm is to minimize the sum of squared distances between the data points
and the centroids of their corresponding clusters.
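
As a rough illustration of this objective, the sketch below adds up the squared Euclidean distance from each point to the centroid of its assigned cluster. The function name `wcss` and the four sample points are made up for this example and do not come from the original text. Squared Euclidean distance is assumed as the distance measure, as in standard k-means.

```python
import math

def wcss(points, centroids, labels):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for point, label in zip(points, labels):
        total += math.dist(point, centroids[label]) ** 2
    return total

# Hypothetical data: two clusters of two points each.
points = [(1.0, 1.0), (1.0, 3.0), (8.0, 8.0), (10.0, 8.0)]
labels = [0, 0, 1, 1]                      # cluster index of each point
centroids = [(1.0, 2.0), (9.0, 8.0)]       # mean of each cluster

print(wcss(points, centroids, labels))     # 4.0 (each point is 1 unit from its centroid)
```

A smaller value means tighter clusters. K-means tries to lower this number by moving the centroids and reassigning the points.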

The algorithm takes the unlabeled dataset as input, divides the dataset into K
clusters, and repeats the process until the clusters stop changing, i.e. until it can no
longer find better clusters. The value of K must be chosen in advance.

The k-means clustering algorithm mainly performs two tasks:

o Determines the best positions for the K center points, or centroids, by an iterative process.
o Assigns each data point to its closest k-center. The data points that are nearest
to a particular k-center form a cluster.

Hence each cluster contains data points with some commonalities, and it is set apart from the other
clusters.

The diagram below explains the working of the K-means Clustering Algorithm:

How does the K-Means Algorithm Work?

The working of the K-Means algorithm is explained in the steps below:

Step-1: Select the number K to decide the number of clusters.

Step-2: Select K random points as centroids. (They need not be points from the input dataset.)

Step-3: Assign each data point to its closest centroid, which will form the predefined
K clusters.

Step-4: For each cluster, compute the mean of its points and place the cluster's new centroid
there (this is the position that minimizes the variance within the cluster).

Step-5: Repeat the third step, which means reassigning each data point to the closest of the
new centroids.

Step-6: If any reassignment occurred, go to Step-4; otherwise go to FINISH.

Step-7: The model is ready.
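
The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function name `k_means`, its parameters (`initial_centroids`, `seed`, `max_iter`) and the six sample points are assumptions made for this example. Euclidean distance is assumed, and an empty cluster simply keeps its previous centroid.

```python
import math
import random

def k_means(points, k, initial_centroids=None, seed=0, max_iter=100):
    """Minimal k-means following Step-1 to Step-7; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step-2: choose K starting centroids (random data points unless given).
    if initial_centroids is None:
        centroids = rng.sample(points, k)
    else:
        centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        # Step-3 / Step-5: assign every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step-6: if no point changed cluster, we are finished.
        if new_labels == labels:
            break
        labels = new_labels
        # Step-4: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Step-7: the model is ready.
    return centroids, labels

# Hypothetical data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = k_means(points, k=2, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)  # one centroid per cluster, at the mean of its three points
```

The starting centroids are passed in explicitly here only to make the run reproducible. Leaving `initial_centroids` out picks K random data points, which matches Step-2 more literally but can give different results from run to run.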

Let's understand the above steps by considering the visual plots:

Suppose we have two variables, M1 and M2. The x-y scatter plot of these two
variables is given below:

o Let's take the number of clusters K=2, so we will try to group the data points into two
different clusters.
o We need to choose K random points or centroids to form the clusters. These
points can be either points from the dataset or any other points. Here we
select two points that are not part of our dataset as the K points.
Consider the below image:

o Now we will assign each data point of the scatter plot to its closest K-point or
centroid, using the usual formula for the distance between two points. To make this
visual, we draw the line of points that are equally far from both centroids (the
perpendicular bisector of the segment joining them, called the median line here).
Consider the below image:

From the above image, it is clear that the points to the left of the line are nearer to K1, the blue
centroid, and the points to the right of the line are closer to the yellow centroid. Let's color
them blue and yellow for clear visualization.
o As we need to find the closest cluster, we will repeat the process by
choosing new centroids. To choose the new centroids, we compute the
center of gravity of the data points in each cluster and place the new centroids there, as below:
o Next, we will reassign each data point to the new centroids. For this, we repeat
the same process of finding a median line. The median line will be as in the below image:

From the above image, we can see that one yellow point is on the left side of the line and
two blue points are to the right of the line. So, these three points will be assigned to the new
centroids.

As reassignment has taken place, we go back to Step-4, which is finding new
centroids or K-points.
o We will repeat the process by finding the center of gravity of each cluster's points, so the
new centroids will be as shown in the below image:
o As we have new centroids, we again draw the median line and reassign the
data points. So, the image will be:

o We can see in the above image that there are no dissimilar data points on either side
of the line, which means our model is formed. Consider the below image:

As our model is ready, we can now remove the assumed centroids, and the two final
clusters will be as shown in the below image:
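
For K=2, the median line used in the walkthrough above is the set of points equally far from both centroids, so checking which side of the line a point lies on gives the same answer as comparing the two distances. The sketch below illustrates this. The function names, the two centroid positions and the six points are hypothetical and are not taken from the original plots.

```python
import math

def nearer_by_distance(p, c1, c2):
    """Return 0 if p is closer to c1, 1 if it is closer to c2 (ties go to c1)."""
    return 0 if math.dist(p, c1) <= math.dist(p, c2) else 1

def nearer_by_line(p, c1, c2):
    """Same decision, made by checking which side of the dividing line p is on."""
    mid = [(a + b) / 2 for a, b in zip(c1, c2)]      # midpoint between the centroids
    direction = [b - a for a, b in zip(c1, c2)]      # vector pointing from c1 to c2
    # Positive when p lies on the c2 side of the line through mid, perpendicular to direction.
    side = sum((x - m) * d for x, m, d in zip(p, mid, direction))
    return 0 if side <= 0 else 1

blue, yellow = (2.0, 6.0), (7.0, 3.0)                # hypothetical centroids
points = [(1.0, 5.0), (3.0, 7.0), (2.0, 2.0), (8.0, 4.0), (6.0, 1.0), (7.0, 6.0)]

by_distance = [nearer_by_distance(p, blue, yellow) for p in points]
by_line = [nearer_by_line(p, blue, yellow) for p in points]
print(by_distance)  # [0, 0, 0, 1, 1, 1]
print(by_line)      # [0, 0, 0, 1, 1, 1]
```

With more than two centroids there is no single dividing line, so comparing the distances directly, as `nearer_by_distance` does, is the approach that generalizes.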

Why K-Means?
K-means as a clustering algorithm is deployed to discover groups that haven’t been
explicitly labeled within the data. It is actively used today in a wide variety of business
applications, including:

o Customer segmentation: customers can be grouped in order to better tailor products
and offerings.
o Text, document, or search results clustering: grouping to find topics in text.
o Image grouping or image compression: groups similar images or colors.
o Anomaly detection: finds what isn’t similar, i.e. the outliers from clusters.
o Semi-supervised learning: clusters are combined with a smaller set of labeled data
and supervised machine learning in order to get more valuable results.

How K-Means Works

The K-means algorithm identifies a certain number of centroids within a data set, a centroid
being the arithmetic mean of all the data points belonging to a particular cluster. The
algorithm then allocates every data point to the nearest cluster as it attempts to keep the
clusters as compact as possible (the ‘means’ in K-means refers to the task of averaging the
data or finding the centroid). At the same time, K-means attempts to keep the clusters
as different from one another as possible.

In practice it works as follows:

o The K-means algorithm begins by initializing the coordinates of the K cluster
centers. (The number K is an input variable, and the locations can also be given as
input.)
o With every pass of the algorithm, each point is assigned to its nearest cluster center.
o The cluster centers are then updated to be the “centers” of all the points assigned to
them in that pass. This is done by re-calculating each cluster center as the average of the
points in its cluster.
o The algorithm repeats until the cluster centers change only minimally from the
last iteration.
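
The stopping rule in the last bullet can be sketched by measuring how far the centers move in each pass and stopping once that movement falls below a small tolerance. In this illustration the function name `k_means_tol`, the `tol` and `max_iter` parameters and the sample data are all assumptions. The starting centers are supplied as input, which the first bullet allows.

```python
import math

def k_means_tol(points, centers, tol=1e-4, max_iter=100):
    """Repeat assign/update passes until no center moves more than tol."""
    centers = [tuple(c) for c in centers]
    for iteration in range(1, max_iter + 1):
        # Assign each point to its nearest cluster center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Recompute each center as the average of its points (empty clusters stay put).
        new_centers = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else old
            for members, old in zip(clusters, centers)
        ]
        # Largest distance moved by any center in this pass.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, iteration

# Hypothetical data and starting centers.
points = [(1.0, 1.0), (2.0, 1.0), (9.0, 9.0), (10.0, 9.0)]
centers, passes = k_means_tol(points, centers=[(0.0, 0.0), (5.0, 5.0)])
print(centers)  # [(1.5, 1.0), (9.5, 9.0)]
print(passes)   # 2
```

Here the first pass moves both centers to the averages of their points and the second pass moves nothing, so the loop stops after two passes. A larger `tol` stops earlier at the cost of slightly less settled centers.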

K-means is very effective in capturing structure and making data inferences if the clusters
have a uniform, spherical shape. But if the clusters have more complex geometric shapes,
the algorithm does a poor job of clustering the data. Another shortcoming of K-means is that
the algorithm does not allow data points distant from one another to share the same cluster,
regardless of whether they belong in the cluster. K-means does not itself learn the number
of clusters from the data; rather, that information must be pre-defined. And finally, when
clusters overlap, K-means cannot determine how to assign the data points in the region
where the overlap occurs.

K-Means for Data Scientists

Owing to its intrinsic simplicity and popularity in unsupervised machine learning operations,
K-means has gained favor among data scientists. Its applicability in data mining operations
allows data scientists to leverage the algorithm to derive various inferences from business
data and enable more accurate data-driven decision-making, the limitations of the algorithm
notwithstanding. It’s widely considered among the most business-critical algorithms for data
scientists.
