
Clustering Solved Examples

K-Means clustering is an unsupervised iterative technique that partitions data into k distinct clusters based on similarity. The algorithm involves selecting initial cluster centers, assigning data points to the nearest cluster, and recalculating cluster centers until convergence. While efficient, it requires predefined cluster numbers and struggles with noisy data and non-convex shapes.

Uploaded by

Raj


K-Means Clustering-

 K-Means clustering is an unsupervised iterative clustering technique.

 It partitions the given data set into k predefined distinct clusters.
 A cluster is defined as a collection of data points exhibiting certain similarities.

It partitions the data set such that-

 Each data point belongs to a cluster with the nearest mean.
 Data points belonging to the same cluster have a high degree of similarity.
 Data points belonging to different clusters have a high degree of dissimilarity.

K-Means Clustering Algorithm involves the following steps-

Step-01:

 Choose the number of clusters K.

Step-02:

 Randomly select any K data points as cluster centers.

 Select the cluster centers so that they are as far as possible from each other.
Step-03:

 Calculate the distance between each data point and each cluster center.
 The distance may be calculated either with a given distance function or with the Euclidean distance formula.

Step-04:

 Assign each data point to some cluster.

 A data point is assigned to that cluster whose center is nearest to that data point.
Step-05:

 Re-compute the center of newly formed clusters.

 The center of a cluster is computed by taking the mean of all the data points contained in that cluster.

Step-06:

Keep repeating the procedure from Step-03 to Step-05 until any of the following stopping criteria is met-
 The centers of the newly formed clusters do not change
 Data points remain in the same cluster
 The maximum number of iterations is reached
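The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function names `k_means` and `euclidean`, the `max_iter` default, and the choice to keep the old center when a cluster becomes empty are all assumptions made for this sketch. The initial centers (Step-02) are passed in by the caller.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def k_means(
    points: Sequence[Point],
    centers: Sequence[Point],
    distance: Callable[[Point, Point], float] = euclidean,
    max_iter: int = 100,
) -> Tuple[List[Point], List[int]]:
    """Run Steps 03-06 starting from the given initial centers."""
    centers = list(centers)
    labels: List[int] = []
    for _ in range(max_iter):
        # Step-03 and Step-04: assign each point to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: distance(p, centers[j]))
            for p in points
        ]
        # Step-05: recompute each center as the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(old)
        # Step-06: stop when assignments or centers no longer change.
        done = new_labels == labels or new_centers == centers
        centers, labels = new_centers, new_labels
        if done:
            break
    return centers, labels


# Hypothetical toy data, chosen only to show the call.
toy_points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (9.0, 10.0)]
final_centers, final_labels = k_means(toy_points, [(1.0, 1.0), (9.0, 9.0)])
print(final_centers, final_labels)
```

The `distance` parameter lets the same loop run with a problem-specific distance function instead of the Euclidean one.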
Advantages-

K-Means Clustering Algorithm offers the following advantages-

Point-01:

It is relatively efficient with time complexity O(nkt) where-

 n = number of instances
 k = number of clusters
 t = number of iterations
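As a rough illustration of where O(nkt) comes from: every iteration compares each of the n points with each of the k centers. The helper name `distance_evaluations` below is made up for this sketch, and the sample figures correspond to a run with 8 points, 3 clusters and 2 iterations.

```python
def distance_evaluations(n: int, k: int, t: int) -> int:
    """Number of point-to-center distances computed over t iterations."""
    return n * k * t


# 8 points, 3 clusters, 2 iterations
print(distance_evaluations(8, 3, 2))  # 48
```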
Point-02:

 It often terminates at a local optimum, which need not be the global optimum.

 Techniques such as simulated annealing or genetic algorithms may be used to search for the global optimum.

Disadvantages-

K-Means Clustering Algorithm has the following disadvantages-

 It requires the number of clusters (k) to be specified in advance.
 It is sensitive to noisy data and outliers.
 It is not suitable for identifying clusters with non-convex shapes.
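The sensitivity to outliers follows from the center being a mean. The short sketch below illustrates this with the values {20, 30, 25, 31}, which appear as a cluster in Problem 2 of this document; the extra value 200 is a made-up outlier added purely for illustration.

```python
def mean(values):
    """Arithmetic mean, which K-Means uses as the cluster center."""
    return sum(values) / len(values)


cluster = [20, 30, 25, 31]
with_outlier = cluster + [200]  # 200 is a hypothetical outlier

print(mean(cluster))       # 26.5
print(mean(with_outlier))  # 61.2
```

A single extreme value moves the center far away from the other four points, which can then change the assignments in the next iteration.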

Euclidean Distance:
 Formula: For two points (x1, y1) and (x2, y2), the Euclidean distance is calculated as:
d = √((x2 - x1)² + (y2 - y1)²)

Manhattan Distance:
 Formula: For two points (x1, y1) and (x2, y2), the Manhattan distance is calculated as:
d = |x2 - x1| + |y2 - y1|
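The two formulas translate directly into code. In this minimal sketch the function names `euclidean_distance` and `manhattan_distance` are chosen for illustration, and points are assumed to be (x, y) tuples.

```python
import math


def euclidean_distance(p, q):
    """d = sqrt((x2 - x1)^2 + (y2 - y1)^2)"""
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def manhattan_distance(p, q):
    """d = |x2 - x1| + |y2 - y1|"""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])


a, b = (2, 10), (5, 8)
print(euclidean_distance(a, b))  # sqrt(13), about 3.606
print(manhattan_distance(a, b))  # 5
```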
Problem-01:

Cluster the following eight points (with (x, y) representing locations) into three clusters:
A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)

Initial cluster centers are: A1(2, 10), A4(5, 8) and A7(1, 2).
The distance function between two points a = (x1, y1) and b = (x2, y2) is defined as-
Ρ(a, b) = |x2 – x1| + |y2 – y1|

Use K-Means Algorithm to find the three cluster centers after the second iteration.
Solution-

We follow the above discussed K-Means Clustering Algorithm-

Iteration-01:

 We calculate the distance of each point from each of the three cluster centers.
 The distance is calculated by using the given distance function.

The following illustration shows the calculation of the distance between point A1(2, 10) and each of the three cluster centers-

Calculating Distance Between A1(2, 10) and C1(2, 10)-

Ρ(A1, C1)
= |x2 – x1| + |y2 – y1|
= |2 – 2| + |10 – 10|
=0

Calculating Distance Between A1(2, 10) and C2(5, 8)-

Ρ(A1, C2)
= |x2 – x1| + |y2 – y1|
= |5 – 2| + |8 – 10|
= 3 + 2
=5
Calculating Distance Between A1(2, 10) and C3(1, 2)-

Ρ(A1, C3)
= |x2 – x1| + |y2 – y1|
= |1 – 2| + |2 – 10|
=1+8
=9

In a similar manner, we calculate the distance of the other points from each of the three cluster centers.
Next,
 We draw a table showing all the results.
 Using the table, we decide which point belongs to which cluster.
 The given point belongs to that cluster whose center is nearest to it.

| Given Points | Distance from center (2, 10) of Cluster-01 | Distance from center (5, 8) of Cluster-02 | Distance from center (1, 2) of Cluster-03 | Point belongs to Cluster |
|---|---|---|---|---|
| A1(2, 10) | 0 | 5 | 9 | C1 |
| A2(2, 5) | 5 | 6 | 4 | C3 |
| A3(8, 4) | 12 | 7 | 9 | C2 |
| A4(5, 8) | 5 | 0 | 10 | C2 |
| A5(7, 5) | 10 | 5 | 9 | C2 |
| A6(6, 4) | 10 | 5 | 7 | C2 |
| A7(1, 2) | 9 | 10 | 0 | C3 |
| A8(4, 9) | 3 | 2 | 10 | C2 |
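The Iteration-01 table can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic; the names `rho`, `points`, `centers` and `assignment` are chosen for illustration.

```python
def rho(a, b):
    """Distance function given in the problem: |x2 - x1| + |y2 - y1|."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])


points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
centers = {"C1": (2, 10), "C2": (5, 8), "C3": (1, 2)}

assignment = {}
for name, p in points.items():
    # Distance from this point to every cluster center.
    dists = {c: rho(p, xy) for c, xy in centers.items()}
    # The point belongs to the cluster whose center is nearest.
    assignment[name] = min(dists, key=dists.get)
    print(name, list(dists.values()), assignment[name])
```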

From here, New clusters are-

Cluster-01:

First cluster contains points-

 A1(2, 10)

Cluster-02:
Second cluster contains points-
 A3(8, 4)
 A4(5, 8)
 A5(7, 5)
 A6(6, 4)
 A8(4, 9)

Cluster-03:

Third cluster contains points-

 A2(2, 5)
 A7(1, 2)
Now,
 We re-compute the new cluster centers.
 The new cluster center is computed by taking the mean of all the points contained in that cluster.

For Cluster-01:
 We have only one point A1(2, 10) in Cluster-01.
 So, cluster center remains the same.

For Cluster-02:

Center of Cluster-02
= ((8 + 5 + 7 + 6 + 4)/5, (4 + 8 + 5 + 4 + 9)/5)
= (6, 6)
For Cluster-03:

Center of Cluster-03
= ((2 + 1)/2, (5 + 2)/2)
= (1.5, 3.5)

This completes Iteration-01.
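The re-computation of the centers at the end of Iteration-01 can be checked with a short sketch. The helper name `centroid` and the dictionary layout are assumptions made for illustration.

```python
def centroid(members):
    """Mean of the x values and mean of the y values of a cluster."""
    n = len(members)
    return (sum(x for x, _ in members) / n, sum(y for _, y in members) / n)


clusters = {
    "Cluster-01": [(2, 10)],
    "Cluster-02": [(8, 4), (5, 8), (7, 5), (6, 4), (4, 9)],
    "Cluster-03": [(2, 5), (1, 2)],
}

new_centers = {name: centroid(members) for name, members in clusters.items()}
print(new_centers)
```

With a single member, the mean equals that member, which is why the center of Cluster-01 does not move.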

Iteration-02:

 We calculate the distance of each point from each of the three cluster centers.
 The distance is calculated by using the given distance function.

The following illustration shows the calculation of the distance between point A1(2, 10) and each of the three cluster centers-

Calculating Distance Between A1(2, 10) and C1(2, 10)-

Ρ(A1, C1)
= |x2 – x1| + |y2 – y1|
= |2 – 2| + |10 – 10|
= 0

Calculating Distance Between A1(2, 10) and C2(6, 6)-

Ρ(A1, C2)
= |x2 – x1| + |y2 – y1|
= |6 – 2| + |6 – 10|
= 4 + 4
= 8

Calculating Distance Between A1(2, 10) and C3(1.5, 3.5)-

Ρ(A1, C3)
= |x2 – x1| + |y2 – y1|
= |1.5 – 2| + |3.5 – 10|
= 0.5 + 6.5
= 7

In a similar manner, we calculate the distance of the other points from each of the three cluster centers.

Next,
 We draw a table showing all the results.
 Using the table, we decide which point belongs to which cluster.
 The given point belongs to that cluster whose center is nearest to it.

| Given Points | Distance from center (2, 10) of Cluster-01 | Distance from center (6, 6) of Cluster-02 | Distance from center (1.5, 3.5) of Cluster-03 | Point belongs to Cluster |
|---|---|---|---|---|
| A1(2, 10) | 0 | 8 | 7 | C1 |
| A2(2, 5) | 5 | 5 | 2 | C3 |
| A3(8, 4) | 12 | 4 | 7 | C2 |
| A4(5, 8) | 5 | 3 | 8 | C2 |
| A5(7, 5) | 10 | 2 | 7 | C2 |
| A6(6, 4) | 10 | 2 | 5 | C2 |
| A7(1, 2) | 9 | 9 | 2 | C3 |
| A8(4, 9) | 3 | 5 | 8 | C1 |

From here, New clusters are-

Cluster-01:

First cluster contains points-

 A1(2, 10)
 A8(4, 9)

Cluster-02:

Second cluster contains points-

 A3(8, 4)
 A4(5, 8)
 A5(7, 5)
 A6(6, 4)
Cluster-03:

Third cluster contains points-

 A2(2, 5)
 A7(1, 2)

Now,
 We re-compute the new cluster centers.
 The new cluster center is computed by taking the mean of all the points contained in that cluster.

For Cluster-01:

Center of Cluster-01
= ((2 + 4)/2, (10 + 9)/2)
= (3, 9.5)
For Cluster-02:

Center of Cluster-02
= ((8 + 5 + 7 + 6)/4, (4 + 8 + 5 + 4)/4)
= (6.5, 5.25)

For Cluster-03:

Center of Cluster-03
= ((2 + 1)/2, (5 + 2)/2)
= (1.5, 3.5)

This completes Iteration-02.

After the second iteration, the three cluster centers are C1(3, 9.5), C2(6.5, 5.25) and C3(1.5, 3.5).
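Both iterations of Problem-01 can be replayed end to end as a check. The sketch below is illustrative only: the helper names `rho`, `assign` and `recompute` are invented here, clusters are indexed 0, 1, 2 for C1, C2, C3, and the code assumes no cluster becomes empty (which holds for this data).

```python
def rho(a, b):
    """Distance function given in the problem: |x2 - x1| + |y2 - y1|."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [
        min(range(len(centers)), key=lambda j: rho(p, centers[j]))
        for p in points
    ]


def recompute(points, labels, k):
    """Mean of the member points of each cluster (assumes none is empty)."""
    new_centers = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        n = len(members)
        new_centers.append(
            (sum(x for x, _ in members) / n, sum(y for _, y in members) / n)
        )
    return new_centers


# A1 .. A8 in order
points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
centers = [(2, 10), (5, 8), (1, 2)]  # initial centers A1, A4, A7

for iteration in (1, 2):
    labels = assign(points, centers)
    centers = recompute(points, labels, len(centers))
    print(f"Iteration-{iteration:02d}: labels={labels} centers={centers}")
```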

Problem 2:

Apply the K-means algorithm to the following data for two clusters. Data set: {10, 4, 2, 12, 3, 20, 30, 11, 25, 31}
Solution: Steps of the K-Means Algorithm (for 1D data):

1. Initialize centroids (Choose 2 random values as initial centroids).

2. Assign each data point to the nearest centroid.
3. Recalculate the centroids as the mean of the values assigned to each cluster.
4. Repeat steps 2–3 until the assignments do not change (convergence).

Step-by-step Execution:

Step 1: Initial Centroids

Let's choose:

 C₁ = 4
 C₂ = 30

Step 2: Cluster Assignment

Assign each point to the nearest centroid:

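| Data | Distance to C₁ (4) | Distance to C₂ (30) | Assigned Cluster |
|---|---|---|---|
| 10 | 6 | 20 | C₁ |
| 4 | 0 | 26 | C₁ |
| 2 | 2 | 28 | C₁ |
| 12 | 8 | 18 | C₁ |
| 3 | 1 | 27 | C₁ |
| 20 | 16 | 10 | C₂ |
| 30 | 26 | 0 | C₂ |
| 11 | 7 | 19 | C₁ |
| 25 | 21 | 5 | C₂ |
| 31 | 27 | 1 | C₂ |
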
Cluster 1 (C₁): {10, 4, 2, 12, 3, 11}
Cluster 2 (C₂): {20, 30, 25, 31}

Step 3: Recalculate Centroids

 New C₁ = (10 + 4 + 2 + 12 + 3 + 11)/6 = 42/6 = 7
 New C₂ = (20 + 30 + 25 + 31)/4 = 106/4 = 26.5
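Steps 2 and 3 amount to one assignment pass followed by one averaging pass. A minimal sketch for checking the numbers, with variable names chosen for illustration:

```python
data = [10, 4, 2, 12, 3, 20, 30, 11, 25, 31]
c1, c2 = 4, 30  # initial centroids from Step 1

# Step 2: each value goes to the centroid it is closer to.
cluster1 = [x for x in data if abs(x - c1) <= abs(x - c2)]
cluster2 = [x for x in data if abs(x - c1) > abs(x - c2)]

# Step 3: the new centroid is the mean of the assigned values.
new_c1 = sum(cluster1) / len(cluster1)
new_c2 = sum(cluster2) / len(cluster2)

print(cluster1, new_c1)  # [10, 4, 2, 12, 3, 11] 7.0
print(cluster2, new_c2)  # [20, 30, 25, 31] 26.5
```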

Step 4: Reassign Based on New Centroids

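| Data | Distance to C₁ (7) | Distance to C₂ (26.5) | Assigned Cluster |
|---|---|---|---|
| 10 | 3 | 16.5 | C₁ |
| 4 | 3 | 22.5 | C₁ |
| 2 | 5 | 24.5 | C₁ |
| 12 | 5 | 14.5 | C₁ |
| 3 | 4 | 23.5 | C₁ |
| 20 | 13 | 6.5 | C₂ |
| 30 | 23 | 3.5 | C₂ |
| 11 | 4 | 15.5 | C₁ |
| 25 | 18 | 1.5 | C₂ |
| 31 | 24 | 4.5 | C₂ |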

No change in assignments.

✅ Final Clusters:

 Cluster 1: {10, 4, 2, 12, 3, 11}, Centroid = 7
 Cluster 2: {20, 30, 25, 31}, Centroid = 26.5
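The whole of Problem 2 can be run as a loop that repeats assignment and averaging until the assignments stop changing. This is a sketch under stated assumptions: the function name `k_means_1d` and the `max_iter` cap are invented for illustration, and an empty cluster simply keeps its previous centroid.

```python
def k_means_1d(data, centroids, max_iter=100):
    """K-Means on 1D data, starting from the given initial centroids."""
    centroids = list(centroids)
    assignment = None
    for _ in range(max_iter):
        # Assign each value to the index of its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
            for x in data
        ]
        if new_assignment == assignment:
            break  # convergence: no value changed cluster
        assignment = new_assignment
        # Recompute each centroid as the mean of its assigned values.
        for j in range(len(centroids)):
            members = [x for x, a in zip(data, assignment) if a == j]
            if members:  # assumption: an empty cluster keeps its centroid
                centroids[j] = sum(members) / len(members)
    clusters = [
        [x for x, a in zip(data, assignment) if a == j]
        for j in range(len(centroids))
    ]
    return centroids, clusters


data = [10, 4, 2, 12, 3, 20, 30, 11, 25, 31]
centroids, clusters = k_means_1d(data, [4, 30])
print(centroids)  # [7.0, 26.5]
print(clusters)   # [[10, 4, 2, 12, 3, 11], [20, 30, 25, 31]]
```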
