K Means Clustering

Uploaded by

Sajjad Khan8254

Lecture 7

K Means Clustering

Abrar Hasan
Lecturer
Dept. of Software Engineering
Clustering
▪ Clustering: the process of grouping a set of objects into classes of similar objects
▪ Documents within a cluster should be similar.
▪ Documents from different clusters should be dissimilar.

▪ The most common form of unsupervised learning

▪ Unsupervised learning = learning from raw, unlabeled data, as opposed to supervised learning, where a classification of the examples is given
▪ A common and important task that finds many applications in information retrieval (IR) and other fields

[Figure: data points grouped into two clusters, labeled Class 1 and Class 2, with a centroid marked.]

Example of clustering:
In real life, when a company opens a store, it works out which location would be best for the store.

Ever wondered how it is done?

Let us give you an idea.

At first, the company collects a dataset of the addresses (latitude and longitude) of its potential customers.

The data set looks like this.

Latitude (x)   Longitude (y)
20.201         49.81513
25.72864       50.11764
23.8191        47.39496
18.69347       44.06723
17.98995       52.33613
22.81407       56.47059
14.37186       48.30252
14.37186       52.84034
16.68341       56.16806
51.65829       20.47059
44.52261       20.97479
48.34171       16.94118
43.51759       16.13446
53.16582       17.04202
54.87437       24.60504
48.34171       25.10924
56.38191       20.7731
50.85427       13.61345
57.18593       15.32773

Looks scary, right? Let's make it beautiful.

[Figure: scatter plot of the customer locations, x axis from 0 to 70, y axis from 0 to 60.]

Is it beneficial?

[Figure: the same scatter plot with a single point marked "Store Location".]

So the company actually makes clusters.

[Figure: the same scatter plot with two points marked "Store Location 1" and "Store Location 2".]

Now the question arises:

What should the store locations (latitude and longitude) be?

Or how should the clustering be done?

K-Means

Remember Euclidean distance? For the two points (2, 4) and (4, 5):

$$\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} = \sqrt{(2 - 4)^2 + (4 - 5)^2}$$

[Figure: the points (2, 4) and (4, 5) plotted on a grid.]
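
The distance formula above can be sketched in a few lines of Python. This is only an illustration: the function name `euclidean_distance` is an assumption made for this example and does not come from the slides.

```python
import math

def euclidean_distance(p, q):
    """Return the straight-line distance between two 2-D points."""
    (x1, y1), (x2, y2) = p, q
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)

# The two points from the slide: (2, 4) and (4, 5).
d = euclidean_distance((2, 4), (4, 5))
print(round(d, 3))  # sqrt(5), about 2.236
```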
Steps of K-means:

STEP 1: First, decide the value of K

(how many clusters you want to make).


STEP 2: Let's suppose K = 2, and you have the customer dataset (the latitude/longitude table from the store example).

As K is 2, the first two data points are used as the initial centroids of the two clusters.

Centroid of cluster 1 = (20.201, 49.81513)

Centroid of cluster 2 = (25.72864, 50.11764)

[Figure: the two initial centroids, (20.201, 49.81513) and (25.72864, 50.11764), plotted.]

If K = 3, we would select the first three data points instead.

Centroid of cluster 1 = (20.201, 49.81513)

Centroid of cluster 2 = (25.72864, 50.11764)

Centroid of cluster 3 = (23.8191, 47.39496)
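
As a rough sketch, picking the first K data points as starting centroids might look like the following. The helper name `initial_centroids` is hypothetical, and only the first four rows of the dataset are included to keep the example short. Taking the first K points is the choice made in these slides; other initialisations, such as picking random points, are also common.

```python
# First few rows of the lecture's customer dataset: (latitude, longitude).
points = [
    (20.201, 49.81513),
    (25.72864, 50.11764),
    (23.8191, 47.39496),
    (18.69347, 44.06723),
]

def initial_centroids(points, k):
    """Use the first k data points as starting centroids (the slides' choice)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    return list(points[:k])

centroids_k2 = initial_centroids(points, 2)
centroids_k3 = initial_centroids(points, 3)
print(centroids_k2)
print(centroids_k3)
```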

STEP 3: Handling data points

For the data point (23.8191, 47.39496), compute the Euclidean distance to each centroid.

For cluster 1, with centroid $(X_1, Y_1)$:

$$\sqrt{(X_1 - \text{data}_X)^2 + (Y_1 - \text{data}_Y)^2} = \sqrt{(20.201 - 23.8191)^2 + (49.81513 - 47.39496)^2} \approx 4.35$$

For cluster 2, with centroid $(X_2, Y_2)$:

$$\sqrt{(X_2 - \text{data}_X)^2 + (Y_2 - \text{data}_Y)^2} = \sqrt{(25.72864 - 23.8191)^2 + (50.11764 - 47.39496)^2} \approx 3.33$$

The distance to the centroid of cluster 2 is smaller, so the data point (23.8191, 47.39496) belongs to cluster 2.
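
The assignment step can be sketched as below, assuming Python 3.8 or later for `math.dist`. The function name `nearest_centroid` is an assumption for this example. It returns a zero-based index, so index 1 corresponds to cluster 2 in the slides.

```python
import math

def nearest_centroid(point, centroids):
    """Return (index, distance) of the centroid closest to `point`."""
    distances = [math.dist(point, c) for c in centroids]
    best = min(range(len(centroids)), key=lambda i: distances[i])
    return best, distances[best]

# Initial centroids for K = 2 and the third data point from the slides.
centroids = [(20.201, 49.81513), (25.72864, 50.11764)]
point = (23.8191, 47.39496)

index, distance = nearest_centroid(point, centroids)
print(f"cluster {index + 1}, distance {distance:.2f}")
```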

[Figure: the centroids (20.2, 49.8) and (25.7, 50.1) plotted together with the data point (23.8, 47.3).]

STEP 4: Update the centroid

As the new data point belongs to cluster 2, we have to update the centroid of cluster 2.

$$\text{New Centroid}_X = \frac{\text{Old Centroid}_X + \text{New Value}_X}{2} = \frac{25.72864 + 23.8191}{2} \approx 24.77387$$

$$\text{New Centroid}_Y = \frac{\text{Old Centroid}_Y + \text{New Value}_Y}{2} = \frac{50.11764 + 47.39496}{2} \approx 48.75630$$

[Figure: the same plot with the updated centroid marked.]
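
A minimal sketch of this update rule follows. `update_centroid` mirrors the slides' formula (average of the old centroid and the new point). `mean_of_points` is included for comparison, since the usual batch form of K-means recomputes a centroid as the mean of all points currently in the cluster. Both function names are assumptions for this example. When the cluster holds only one earlier point, as here, the two rules give the same result.

```python
def update_centroid(old_centroid, new_point):
    """Average the old centroid with the newly assigned point (slides' rule)."""
    (cx, cy), (px, py) = old_centroid, new_point
    return ((cx + px) / 2, (cy + py) / 2)

def mean_of_points(points):
    """Mean of all points in a cluster, the usual batch K-means update."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

old = (25.72864, 50.11764)   # centroid of cluster 2
new = (23.8191, 47.39496)    # data point just assigned to cluster 2

new_centroid = update_centroid(old, new)
batch_centroid = mean_of_points([old, new])
print(tuple(round(v, 5) for v in new_centroid))
```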
FIND OPTIMUM K VALUE

Calculating the total distance of cluster 1 and cluster 2

Total distance = the sum of the Euclidean distances from the centroid to all of the points of the cluster.
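
The total distance of one cluster could be computed as sketched below, assuming Python 3.8 or later for `math.dist`. The function name `total_distance` and the two-point toy cluster are hypothetical, chosen so the arithmetic is easy to check by hand.

```python
import math

def total_distance(points, centroid):
    """Sum of Euclidean distances from `centroid` to every point in the cluster."""
    return sum(math.dist(p, centroid) for p in points)

# Hypothetical toy cluster: each point lies exactly 5 units from the centroid.
cluster = [(0.0, 0.0), (6.0, 8.0)]
centroid = (3.0, 4.0)
print(total_distance(cluster, centroid))
```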

Now consider K = 3.
You get three clusters, and therefore three centroids.
Calculate the total error.

Do this for K = 1, 2, 3, ...
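
The loop over K might be sketched as follows. Several assumptions are made here and are not taken from the slides: the clustering uses the usual batch form of K-means (assign every point, then recompute each centroid as the mean of its cluster, repeated until nothing changes), the first K points are the starting centroids, and "total error" is taken to be the sum of point-to-centroid distances over all clusters. All function names are hypothetical.

```python
import math

# The lecture's dataset: (latitude, longitude) of potential customers.
POINTS = [
    (20.201, 49.81513), (25.72864, 50.11764), (23.8191, 47.39496),
    (18.69347, 44.06723), (17.98995, 52.33613), (22.81407, 56.47059),
    (14.37186, 48.30252), (14.37186, 52.84034), (16.68341, 56.16806),
    (51.65829, 20.47059), (44.52261, 20.97479), (48.34171, 16.94118),
    (43.51759, 16.13446), (53.16582, 17.04202), (54.87437, 24.60504),
    (48.34171, 25.10924), (56.38191, 20.7731), (50.85427, 13.61345),
    (57.18593, 15.32773),
]

def mean(points):
    """Coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def k_means(points, k, max_iter=100):
    """Batch K-means from the first k points; returns (centroids, clusters)."""
    centroids = list(points[:k])
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: keep the old centroid if a cluster ends up empty.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters

def total_error(points, k):
    """Sum of distances from every point to the centroid of its cluster."""
    centroids, clusters = k_means(points, k)
    return sum(math.dist(p, c) for c, cl in zip(centroids, clusters) for p in cl)

errors = {k: total_error(POINTS, k) for k in range(1, 6)}
for k, err in errors.items():
    print(k, round(err, 2))
```

The resulting `errors` dictionary holds one total-error value per K, which is the data needed for a total error versus K graph.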

Now plot the total error against K in a graph.

[Figure: graph of total error versus K.]
Thank You
