Lecture 11: K-Means Clustering
K-means clustering is an unsupervised machine learning technique in which we cluster data points based on
their similarity or closeness to one another. How exactly do we cluster them?
Definition: It groups the data points based on their similarity or closeness to each other. In simple terms, the
algorithm finds data points whose values are similar to each other; these points then belong to the same cluster.
OR
The k-means clustering algorithm tries to group similar items into clusters. The number of groups is
represented by k.
‘Distance Measure’ - ‘Euclidean Distance’
Observations that are closer or more similar to each other have a low Euclidean distance and are
clustered together.
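For reference, for two points p = (p1, …, pn) and q = (q1, …, qn), the Euclidean distance is
d(p, q) = √((p1 − q1)² + … + (pn − qn)²).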
The k-means algorithm uses the concept of a centroid to create k clusters.
Steps in K-Means:
Step 1: Choose a value for k (for example, k = 2).
Step 2: Initialize the centroids randomly.
Step 3: Calculate the Euclidean distance from each centroid to every data point and form clusters by assigning each point to its closest centroid.
Step 4: Find the centroid of each cluster and update the centroids.
Step 5: Repeat steps 3 and 4.
Each time clusters are formed, the centroids are updated; the updated centroid is the center (mean) of all the points
that fall in the cluster. This process continues until the centroids no longer change, i.e. the solution converges.
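As an illustration, here is a minimal from-scratch sketch of these steps in Python (NumPy assumed; the function and variable names are our own, not from the lecture):

import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: initialize centroids by picking k random data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 3: assign each point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: update each centroid to the mean of its assigned points
        # (this sketch assumes no cluster ever ends up empty)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop once the centroids no longer change (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids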
Example: Suppose you go to a vegetable shop to buy some vegetables. There you will see different kinds of
vegetables, and one thing you will notice is that they are arranged in groups by type: all the carrots are kept
in one place, the potatoes are kept with their kind, and so on. Each vegetable is kept with others of its kind,
forming a group, or cluster.
Elbow Method
When the value of k is 1, the within-cluster sum of squares will be high. As the value of k increases, the
within-cluster sum of squares decreases.
Finally, we plot a graph of the k values against the within-cluster sum of squares to choose k. Examining the
graph carefully, at some point the curve bends sharply (the "elbow") and the decrease slows; that point is
taken as the value of k.
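A minimal sketch of the elbow method, assuming scikit-learn and matplotlib are available (KMeans exposes the within-cluster sum of squares as inertia_):

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

def plot_elbow(X, k_max=10):
    wcss = []
    for k in range(1, k_max + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        wcss.append(km.inertia_)  # within-cluster sum of squares for this k
    # plot WCSS against k and look for the bend ("elbow") in the curve
    plt.plot(range(1, k_max + 1), wcss, marker="o")
    plt.xlabel("k")
    plt.ylabel("Within-cluster sum of squares")
    plt.show()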
Silhouette Method
Like the elbow method, it also picks a range of k values and draws the silhouette graph.
It calculates the silhouette coefficient of every point.
For each point i, it calculates a(i), the average distance from the point to the other points within its own cluster,
and b(i), the average distance from the point to the points in its next closest cluster.
Note: For a good clustering, a(i) must be less than b(i), that is a(i) << b(i).
Now that we have the values of a(i) and b(i), we calculate the silhouette coefficient with the formula
s(i) = (b(i) − a(i)) / max(a(i), b(i)).
The silhouette coefficient ranges from -1 to 1; a coefficient equal to -1 is the worst-case scenario, and values
near 1 indicate well-separated clusters.
Observe the plot and check which of the k values gives a coefficient closest to 1.
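A minimal sketch of the silhouette method using scikit-learn (assumed installed; silhouette_score returns the mean s(i) over all points, and requires k ≥ 2):

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k_by_silhouette(X, k_values=range(2, 11)):
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)  # average silhouette coefficient
    # pick the k whose average silhouette coefficient is closest to 1
    return max(scores, key=scores.get), scores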
Advantages of K-means
1. It is very simple to implement.
2. It scales to huge datasets and is fast on large datasets.
3. It adapts easily to new examples.
4. It generalizes to clusters of different shapes and sizes.
Disadvantages of K-means
1. It is sensitive to outliers.
2. Choosing the value of k manually is a tough job.
3. As the number of dimensions increases, its scalability decreases.
Example
Numerical – Using the k-means clustering algorithm, form two clusters for the given data.
Height Weight
185 72
170 56
168 60
179 68
182 72
188 77
180 71
180 70
183 84
180 88
180 67
177 76
Note: As per the question, we need to form 2 clusters, so we take the first two data points of our data and assign
them as the centroid of each cluster.
Now we need to assign each remaining data point of our data to one of these clusters based on the Euclidean distance
d = √((X0 − Xc)² + (Y0 − Yc)²),
where (X0, Y0) is our data point and (Xc, Yc) is the centroid of a particular cluster. Let us consider the next data
point, i.e. the 3rd data point (168, 60), and check its distance to the centroid of each cluster:
Distance to k1 (185, 72): √((185 − 168)² + (72 − 60)²) = √(289 + 144) = √433 ≈ 20.8
Distance to k2 (170, 56): √((170 − 168)² + (56 − 60)²) = √(4 + 16) = √20 ≈ 4.5
From these calculations we can see that the 3rd data point (168, 60) is closer to k2 (cluster 2), so we assign it to k2.
After that we need to update the centroid of k2 using the old centroid value and the new data point we just
assigned to it:
new k2 centroid = ((170 + 168)/2, (56 + 60)/2) = (169, 58).
The k1 centroid value remains the same, as no new data point has been added to that cluster. We repeat the above
procedure until all data points have been assigned.
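A minimal sketch of this sequential procedure in Python (names are our own): the first two points seed the centroids, every later point joins the nearer cluster, and that cluster's centroid is recomputed as the mean of its points.

import math

data = [(185, 72), (170, 56), (168, 60), (179, 68), (182, 72), (188, 77),
        (180, 71), (180, 70), (183, 84), (180, 88), (180, 67), (177, 76)]

def euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

clusters = [[data[0]], [data[1]]]      # k1 and k2 seeded with the first two points
centroids = [data[0], data[1]]

for point in data[2:]:
    # assign the point to the cluster with the nearer centroid
    j = min(range(2), key=lambda i: euclidean(point, centroids[i]))
    clusters[j].append(point)
    # update that centroid to the mean of all points now in the cluster
    n = len(clusters[j])
    centroids[j] = (sum(p[0] for p in clusters[j]) / n,
                    sum(p[1] for p in clusters[j]) / n)

print(centroids)  # final centroids after a single pass over the data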
CW
The locations (x, y) of the points are given below. Cluster the given data points into three clusters.
A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)
Initial cluster centers are: A1(2, 10), A4(5, 8) and A7(1, 2).