
Clustering Solved Examples

K-Means clustering is an unsupervised iterative technique that partitions data into k distinct clusters based on similarity. The algorithm involves selecting initial cluster centers, assigning data points to the nearest cluster, and recalculating cluster centers until convergence. While efficient, it requires predefined cluster numbers and struggles with noisy data and non-convex shapes.

Uploaded by

Raj

K-Means Clustering-

 K-Means clustering is an unsupervised iterative clustering technique.


 It partitions the given data set into k predefined distinct clusters.
 A cluster is defined as a collection of data points exhibiting certain similarities.

It partitions the data set such that-


 Each data point belongs to a cluster with the nearest mean.
 Data points belonging to one cluster have a high degree of similarity.
 Data points belonging to different clusters have a high degree of dissimilarity.

K-Means Clustering Algorithm involves the following steps-

Step-01:

 Choose the number of clusters K.

Step-02:

 Randomly select any K data points as cluster centers.


 Select cluster centers in such a way that they are as far as possible from each other.
Step-03:

 Calculate the distance between each data point and each cluster center.
 The distance may be calculated either by using a given distance function or by using
the Euclidean distance formula.

Step-04:

 Assign each data point to some cluster.


 A data point is assigned to that cluster whose center is nearest to that data point.
Step-05:

 Re-compute the centers of the newly formed clusters.

 The center of a cluster is computed by taking the mean of all the data points contained in
that cluster.

Step-06:

Keep repeating the procedure from Step-03 to Step-05 until any of the following stopping
criteria is met-
 The centers of the newly formed clusters do not change.
 The data points remain in the same clusters.
 The maximum number of iterations is reached.
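
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names `k_means` and `manhattan`, the parameters, and the choice to keep the old center when a cluster becomes empty are all introduced here for illustration. Step-01 and Step-02 are left to the caller, who passes the initial centers. The sample points are hypothetical.

```python
def manhattan(a, b):
    """Sum of absolute coordinate differences between points a and b."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_means(points, centers, distance, max_iterations=100):
    """Repeat Step-03 to Step-05 until a stopping criterion of Step-06 is met."""
    centers = [tuple(c) for c in centers]
    assignment = None
    for _ in range(max_iterations):  # stopping criterion: maximum iterations
        # Step-03 and Step-04: assign each point to its nearest center.
        # Ties go to the center with the lowest index (an assumption).
        new_assignment = [
            min(range(len(centers)), key=lambda j: distance(p, centers[j]))
            for p in points
        ]
        # Stopping criterion: points remain in the same clusters.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step-05: re-compute each center as the mean of its members.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignment


# Hypothetical data: two well-separated groups of two points each.
sample = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, assignment = k_means(sample, [(0, 0), (10, 10)], manhattan)
print(centers)     # [(0.0, 0.5), (10.0, 10.5)]
print(assignment)  # [0, 0, 1, 1]
```

Passing the distance function as an argument mirrors Step-03, which allows either a given distance function or the Euclidean distance.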
Advantages-

K-Means Clustering Algorithm offers the following advantages-

Point-01:

It is relatively efficient with time complexity O(nkt) where-


 n = number of instances
 k = number of clusters
 t = number of iterations
Point-02:

 It often terminates at a local optimum.

 Techniques such as simulated annealing or genetic algorithms may be used to search for the
global optimum.

Disadvantages-

K-Means Clustering Algorithm has the following disadvantages-


 It requires the number of clusters (k) to be specified in advance.
 It cannot handle noisy data and outliers well.
 It is not suitable for identifying clusters with non-convex shapes.
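
The sensitivity to outliers follows from the cluster center being a mean. A small sketch with hypothetical one-dimensional values (not taken from the problems in this document) shows how a single outlier drags the center away from the bulk of the cluster:

```python
from statistics import mean

cluster = [1, 2, 3]             # hypothetical 1-D cluster
with_outlier = cluster + [100]  # the same cluster plus one outlier

print(mean(cluster))       # 2
print(mean(with_outlier))  # 26.5
```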

Euclidean Distance:
 Formula: For two points (x1, y1) and (x2, y2), the Euclidean distance is calculated as:
d = √((x2 - x1)² + (y2 - y1)²)

Manhattan Distance:
 Formula: For two points (x1, y1) and (x2, y2), the Manhattan distance is calculated as:
d = |x2 - x1| + |y2 - y1|
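
The two formulas can be written as small Python functions. The function names are chosen here for illustration, and the sample points (2, 10) and (5, 8) are only an example input.

```python
import math


def euclidean_distance(p, q):
    """d = sqrt((x2 - x1)^2 + (y2 - y1)^2) for 2-D points p and q."""
    (x1, y1), (x2, y2) = p, q
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)


def manhattan_distance(p, q):
    """d = |x2 - x1| + |y2 - y1| for 2-D points p and q."""
    (x1, y1), (x2, y2) = p, q
    return abs(x2 - x1) + abs(y2 - y1)


print(euclidean_distance((2, 10), (5, 8)))  # square root of 13, about 3.606
print(manhattan_distance((2, 10), (5, 8)))  # 5
```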
Problem-01:

Cluster the following eight points (with (x, y) representing locations) into three clusters:
A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)

Initial cluster centers are: A1(2, 10), A4(5, 8) and A7(1, 2).
The distance function between two points a = (x1, y1) and b = (x2, y2) is defined as-
Ρ(a, b) = |x2 – x1| + |y2 – y1|

Use K-Means Algorithm to find the three cluster centers after the second iteration.
Solution-

We follow the above discussed K-Means Clustering Algorithm-

Iteration-01:

 We calculate the distance of each point from each of the three cluster centers.
 The distance is calculated by using the given distance function.

The following illustration shows the calculation of distance between point A1(2, 10) and
each of the three cluster centers-

Calculating Distance Between A1(2, 10) and C1(2, 10)-


Ρ(A1, C1)
= |x2 – x1| + |y2 – y1|
= |2 – 2| + |10 – 10|
=0

Calculating Distance Between A1(2, 10) and C2(5, 8)-

Ρ(A1, C2)
= |x2 – x1| + |y2 – y1|
= |5 – 2| + |8 – 10|
=3+2
=5
Calculating Distance Between A1(2, 10) and C3(1, 2)-

Ρ(A1, C3)
= |x2 – x1| + |y2 – y1|
= |1 – 2| + |2 – 10|
=1+8
=9

In a similar manner, we calculate the distance of the other points from each of the
three cluster centers.
Next,
 We draw a table showing all the results.
 Using the table, we decide which point belongs to which cluster.
 The given point belongs to that cluster whose center is nearest to it.

Given       Distance from    Distance from    Distance from    Point
Points      center (2, 10)   center (5, 8)    center (1, 2)    belongs to
            of Cluster-01    of Cluster-02    of Cluster-03    Cluster

A1(2, 10)   0                5                9                C1
A2(2, 5)    5                6                4                C3
A3(8, 4)    12               7                9                C2
A4(5, 8)    5                0                10               C2
A5(7, 5)    10               5                9                C2
A6(6, 4)    10               5                7                C2
A7(1, 2)    9                10               0                C3
A8(4, 9)    3                2                10               C2

From here, New clusters are-

Cluster-01:

First cluster contains points-


 A1(2, 10)

Cluster-02:
Second cluster contains points-
 A3(8, 4)
 A4(5, 8)
 A5(7, 5)
 A6(6, 4)
 A8(4, 9)

Cluster-03:

Third cluster contains points-


 A2(2, 5)
 A7(1, 2)
Now,
 We re-compute the new cluster centers.
 The new cluster center is computed by taking the mean of all the points contained in that
cluster.

For Cluster-01:
 We have only one point A1(2, 10) in Cluster-01.
 So, cluster center remains the same.

For Cluster-02:

Center of Cluster-02
= ((8 + 5 + 7 + 6 + 4)/5, (4 + 8 + 5 + 4 + 9)/5)
= (6, 6)
For Cluster-03:

Center of Cluster-03
= ((2 + 1)/2, (5 + 2)/2)
= (1.5, 3.5)

This is completion of Iteration-01.
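
Iteration-01 can be checked with a short Python sketch. The variable and function names are chosen here for illustration. The script prints the three distances for every point, assigns each point to its nearest center, and then takes the mean of each cluster.

```python
points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
          "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}
centers = [(2, 10), (5, 8), (1, 2)]  # initial centers A1, A4 and A7


def rho(a, b):
    """The given distance function: |x2 - x1| + |y2 - y1|."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])


# Assign every point to the cluster whose center is nearest.
clusters = [[] for _ in centers]
for name, p in points.items():
    distances = [rho(p, c) for c in centers]
    nearest = distances.index(min(distances))
    clusters[nearest].append(name)
    print(name, distances, "-> C%d" % (nearest + 1))

# Re-compute each center as the mean of its members.
# This assumes no cluster is empty, which holds for this data.
new_centers = []
for members in clusters:
    xs = [points[m][0] for m in members]
    ys = [points[m][1] for m in members]
    new_centers.append((sum(xs) / len(xs), sum(ys) / len(ys)))

print(new_centers)  # [(2.0, 10.0), (6.0, 6.0), (1.5, 3.5)]
```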

Iteration-02:

 We calculate the distance of each point from each of the three cluster centers.
 The distance is calculated by using the given distance function.

The following illustration shows the calculation of distance between point A1(2, 10) and
each of the three cluster centers-

Calculating Distance Between A1(2, 10) and C1(2, 10)-

Ρ(A1, C1)
= |x2 – x1| + |y2 – y1|
= |2 – 2| + |10 – 10|
=0

Calculating Distance Between A1(2, 10) and C2(6, 6)-

Ρ(A1, C2)
= |x2 – x1| + |y2 – y1|
= |6 – 2| + |6 – 10|
=4+4
=8

Calculating Distance Between A1(2, 10) and C3(1.5, 3.5)-

Ρ(A1, C3)
= |x2 – x1| + |y2 – y1|
= |1.5 – 2| + |3.5 – 10|
= 0.5 + 6.5
=7

In a similar manner, we calculate the distance of the other points from each of the
three cluster centers.

Next,
 We draw a table showing all the results.
 Using the table, we decide which point belongs to which cluster.
 The given point belongs to that cluster whose center is nearest to it.

Given       Distance from    Distance from    Distance from       Point
Points      center (2, 10)   center (6, 6)    center (1.5, 3.5)   belongs to
            of Cluster-01    of Cluster-02    of Cluster-03       Cluster

A1(2, 10)   0                8                7                   C1
A2(2, 5)    5                5                2                   C3
A3(8, 4)    12               4                7                   C2
A4(5, 8)    5                3                8                   C2
A5(7, 5)    10               2                7                   C2
A6(6, 4)    10               2                5                   C2
A7(1, 2)    9                9                2                   C3
A8(4, 9)    3                5                8                   C1

From here, New clusters are-

Cluster-01:

First cluster contains points-


 A1(2, 10)
 A8(4, 9)

Cluster-02:

Second cluster contains points-


 A3(8, 4)
 A4(5, 8)
 A5(7, 5)
 A6(6, 4)
Cluster-03:

Third cluster contains points-


 A2(2, 5)
 A7(1, 2)

Now,
 We re-compute the new cluster centers.
 The new cluster center is computed by taking the mean of all the points contained in that
cluster.

For Cluster-01:

Center of Cluster-01
= ((2 + 4)/2, (10 + 9)/2)
= (3, 9.5)
For Cluster-02:

Center of Cluster-02
= ((8 + 5 + 7 + 6)/4, (4 + 8 + 5 + 4)/4)
= (6.5, 5.25)

For Cluster-03:

Center of Cluster-03
= ((2 + 1)/2, (5 + 2)/2)
= (1.5, 3.5)

This is completion of Iteration-02.
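
Both iterations can be reproduced by applying one assignment-and-update step twice. The following is a sketch with names chosen here for illustration (`one_iteration`, `history`). It assumes no cluster becomes empty, which holds for this data.

```python
POINTS = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]


def rho(a, b):
    """The given distance function: |x2 - x1| + |y2 - y1|."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])


def one_iteration(points, centers):
    """Assign each point to its nearest center, then return the new centers."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: rho(p, centers[j]))
        clusters[nearest].append(p)
    # Mean of x and mean of y per cluster (assumes no empty cluster).
    return [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            for c in clusters]


centers = [(2, 10), (5, 8), (1, 2)]  # initial centers A1, A4 and A7
history = []
for _ in range(2):
    centers = one_iteration(POINTS, centers)
    history.append(centers)

print(history[0])  # [(2.0, 10.0), (6.0, 6.0), (1.5, 3.5)]
print(history[1])  # [(3.0, 9.5), (6.5, 5.25), (1.5, 3.5)]
```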

Problem 2:

Apply the K-Means algorithm to the following data to form two clusters. Data set: {10,
4, 2, 12, 3, 20, 30, 11, 25, 31}
Solution: Steps of the K-Means Algorithm (for 1D data):

1. Initialize centroids (Choose 2 random values as initial centroids).


2. Assign each data point to the nearest centroid.
3. Recalculate the centroids as the mean of the values assigned to each cluster.
4. Repeat steps 2–3 until the assignments do not change (convergence).

Step-by-step Execution:

Step 1: Initial Centroids

Let's choose:

 C₁ = 4
 C₂ = 30

Step 2: Cluster Assignment

Assign each point to the nearest centroid:

Data   Distance to C₁ (4)   Distance to C₂ (30)   Assigned Cluster

10     6                    20                    C₁
4      0                    26                    C₁
2      2                    28                    C₁
12     8                    18                    C₁
3      1                    27                    C₁
20     16                   10                    C₂
30     26                   0                     C₂
11     7                    19                    C₁
25     21                   5                     C₂
31     27                   1                     C₂

Cluster 1 (C₁): {10, 4, 2, 12, 3, 11}
Cluster 2 (C₂): {20, 30, 25, 31}

Step 3: Recalculate Centroids

 New C₁ = (10 + 4 + 2 + 12 + 3 + 11)/6 = 42/6 = 7
 New C₂ = (20 + 30 + 25 + 31)/4 = 106/4 = 26.5

Step 4: Reassign Based on New Centroids


Data   Distance to C₁ (7)   Distance to C₂ (26.5)   Assigned Cluster

10     3                    16.5                    C₁
4      3                    22.5                    C₁
2      5                    24.5                    C₁
12     5                    14.5                    C₁
3      4                    23.5                    C₁
20     13                   6.5                     C₂
30     23                   3.5                     C₂
11     4                    15.5                    C₁
25     18                   1.5                     C₂
31     24                   4.5                     C₂

No change in assignments.

Final Clusters:

 Cluster 1: {10, 4, 2, 12, 3, 11}, Centroid = 7
 Cluster 2: {20, 30, 25, 31}, Centroid = 26.5
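
The whole of Problem 2 can be checked with a short one-dimensional sketch. The variable names and the iteration cap of 100 are assumptions made here for illustration. The initial centroids 4 and 30 are the ones chosen in Step 1.

```python
data = [10, 4, 2, 12, 3, 20, 30, 11, 25, 31]
centroids = [4, 30]  # initial centroids from Step 1

assignment = None
for _ in range(100):  # safety cap on iterations (an assumption)
    # Step 2: assign each value to the nearest centroid.
    new_assignment = [
        min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
        for x in data
    ]
    # Step 4: stop once the assignments no longer change.
    if new_assignment == assignment:
        break
    assignment = new_assignment
    # Step 3: recalculate each centroid as the mean of its values.
    for j in range(len(centroids)):
        members = [x for x, a in zip(data, assignment) if a == j]
        centroids[j] = sum(members) / len(members)

clusters = [[x for x, a in zip(data, assignment) if a == j]
            for j in range(len(centroids))]
print(clusters)   # [[10, 4, 2, 12, 3, 11], [20, 30, 25, 31]]
print(centroids)  # [7.0, 26.5]
```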
