ML-Unit III - K-Means Clustering


Machine Learning

Dr. Sunil Saumya


IIIT Dharwad
K-means clustering Algo.
K-means clustering: Intro
● K-means clustering is an unsupervised iterative clustering technique.
● It partitions the given data set into K predefined, distinct clusters.
● A cluster is defined as a collection of data points exhibiting certain
similarities.
● It partitions the data set such that each data point belongs to the cluster with the nearest mean.
● Data points belonging to the same cluster have a high degree of similarity.
● Data points belonging to different clusters have a high degree of dissimilarity.
K-means clustering: Algorithm
The K-means clustering algorithm involves the following steps:
● Step-01: Choose the number of clusters K.
● Step-02: Randomly select any K data points as the initial cluster centers. Preferably, select the cluster centers so that they are as far apart from each other as possible.
● Step-03: Calculate the distance between each data point and each cluster center. The distance may be calculated using a given distance function or the Euclidean distance formula.
● Step-04: Assign each data point to the cluster whose center is nearest to that data point.
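Steps 03 and 04 can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code: the names `manhattan`, `euclidean` and `assign_clusters` are hypothetical, points are tuples of numbers, and a tie is assumed to go to the lower-numbered center (the slides do not say how ties are broken).

```python
import math


def manhattan(a, b):
    """Manhattan (city-block) distance: sum of absolute coordinate differences."""
    return sum(abs(y - x) for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean (straight-line) distance."""
    return math.dist(a, b)


def assign_clusters(points, centers, distance=euclidean):
    """Steps 03-04: return the index of the nearest center for every point.

    min() keeps the first index among equal distances, so a tie goes to the
    lower-numbered center (an assumption, not specified by the slides).
    """
    labels = []
    for p in points:
        dists = [distance(p, c) for c in centers]  # Step 03
        labels.append(min(range(len(centers)), key=dists.__getitem__))  # Step 04
    return labels


points = [(2, 10), (2, 5), (8, 4), (5, 8)]
centers = [(2, 10), (5, 8), (1, 2)]
print(assign_clusters(points, centers, distance=manhattan))  # prints [0, 2, 1, 1]
```

Here the first four points and the initial centers of the exercise are used with the Manhattan distance: A1 goes to the first center, A2 to the third, and A3 and A4 to the second.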
K-means clustering: Algorithm (contd.)
● Step-05: Re-compute the center of each newly formed cluster. The center of a cluster is computed by taking the mean of all the data points contained in that cluster.
● Step-06: Keep repeating Step-03 to Step-05 until any of the following stopping criteria is met:
○ The centers of the newly formed clusters do not change.
○ Data points remain in the same clusters.
○ The maximum number of iterations is reached.
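Steps 03 to 06 can be combined into one loop. The sketch below is an illustrative pure-Python version under stated assumptions: the initial centers are passed in (Step-02's choice of starting centers is left to the caller), an empty cluster keeps its previous center, and `kmeans`, `distance` and `max_iter` are hypothetical names. All three stopping criteria of Step-06 are checked.

```python
import math


def kmeans(points, centers, distance=math.dist, max_iter=100):
    """Run K-means starting from the given initial centers (Steps 03-06).

    Returns (centers, labels, iterations_run).
    """
    centers = [tuple(c) for c in centers]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Steps 03-04: assign every point to its nearest center
        new_labels = [
            min(range(len(centers)), key=lambda j: distance(p, centers[j]))
            for p in points
        ]
        # Step 05: each new center is the mean of the points assigned to it
        new_centers = []
        for j, old_center in enumerate(centers):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(old_center)  # assumption: empty cluster keeps its center
        # Step 06: stop if the points stayed in the same clusters or the centers did not move
        converged = new_labels == labels or new_centers == centers
        centers, labels = new_centers, new_labels
        if converged:
            break
    # the third criterion (maximum number of iterations) is enforced by the loop bound
    return centers, labels, iteration


points = [(1, 1), (1, 2), (8, 8), (9, 8)]
centers, labels, n_iter = kmeans(points, [(1, 1), (9, 8)])
print(centers, labels, n_iter)  # [(1.0, 1.5), (8.5, 8.0)] [0, 0, 1, 1] 2
```

`math.dist` gives the Euclidean distance; any function of two points, such as a Manhattan distance, can be passed as `distance` instead.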
K-means clustering: Exercise
Cluster the following eight points (with (x, y) representing locations) into three
clusters:
A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)
The initial cluster centers are A1(2, 10), A4(5, 8) and A7(1, 2).
The distance function between two points a = (x1, y1) and b = (x2, y2) is defined as the Manhattan distance:
ρ(a, b) = |x2 – x1| + |y2 – y1|
Use the K-means algorithm to find the three cluster centers after the second iteration.
[Figure: plot of the points A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9), with the initial cluster centers A1(2, 10), A4(5, 8) and A7(1, 2)]
K-means clustering: Exercise
Solution: Iteration 1

A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)

Initial cluster centers are: C1(2, 10), C2(5, 8) and C3(1, 2).

We calculate the distance of each point from each of the three cluster centers.

Distance between A1(2, 10) and C1(2, 10): ρ(A1, C1) = |x2 – x1| + |y2 – y1| = |2 – 2| + |10 – 10| = 0

Distance between A1(2, 10) and C2(5, 8): ρ(A1, C2) = |x2 – x1| + |y2 – y1| = |5 – 2| + |8 – 10| = 3 + 2 = 5

Distance between A1(2, 10) and C3(1, 2): ρ(A1, C3) = |x2 – x1| + |y2 – y1| = |1 – 2| + |2 – 10| = 1 + 8 = 9

The distance between A1(2, 10) and C1(2, 10) is the smallest, so A1 is assigned to cluster C1.

In a similar manner, we calculate the distance of every other point from each of the three cluster centers and assign each point to its nearest center.
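The same calculation for all eight points can be tabulated with a short script. This is a sketch for checking the arithmetic; the variable names and the printed layout are illustrative choices. It computes ρ from every point to C1, C2 and C3 and records the nearest center, and the resulting assignments should match the new clusters formed in Iteration 1.

```python
# Data from the exercise: eight points and the initial centers
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
centers = {"C1": (2, 10), "C2": (5, 8), "C3": (1, 2)}


def rho(a, b):
    """Distance function of the exercise: |x2 - x1| + |y2 - y1|."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])


assignment = {}
print("Point " + "".join(f"{c:>5}" for c in centers) + "  Nearest")
for name, p in points.items():
    dists = {c: rho(p, center) for c, center in centers.items()}
    nearest = min(dists, key=dists.get)  # ties would go to the first center
    assignment[name] = nearest
    print(f"{name:<6}" + "".join(f"{d:>5}" for d in dists.values()) + f"  {nearest}")
```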

New clusters formed are:

Cluster-01: A1(2, 10)

Cluster-02: A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A8(4, 9)

Cluster-03: A2(2, 5), A7(1, 2)
Now, we re-compute the center of each new cluster. The new cluster center is computed by taking the mean of all the points contained in that cluster.

For Cluster-01:
We have only one point, A1(2, 10), in Cluster-01, so the cluster center remains the same: (2, 10).

For Cluster-02:
Center of Cluster-02 = ((8 + 5 + 7 + 6 + 4)/5, (4 + 8 + 5 + 4 + 9)/5) = (6, 6)

For Cluster-03:
Center of Cluster-03 = ((2 + 1)/2, (5 + 2)/2) = (1.5, 3.5)

This completes Iteration 1.
K-means clustering: Exercise
Solution: Iteration 2

Cluster-01: A1(2, 10)
Center = C1(2, 10)

Cluster-02: A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A8(4, 9)
Center = C2(6, 6)

Cluster-03: A2(2, 5), A7(1, 2)
Center = C3(1.5, 3.5)
We again calculate the distance of each point from each of the three centers C1(2, 10), C2(6, 6) and C3(1.5, 3.5), and assign each point to its nearest center. The new clusters are:

Cluster-01: A1(2, 10), A8(4, 9)

Cluster-02: A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4)

Cluster-03: A2(2, 5), A7(1, 2)
Now, we re-compute the center of each new cluster by taking the mean of all the points contained in that cluster.

For Cluster-01: A1(2, 10), A8(4, 9)
Center of Cluster-01 = ((2 + 4)/2, (10 + 9)/2) = (3, 9.5)

For Cluster-02: A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4)
Center of Cluster-02 = ((8 + 5 + 7 + 6)/4, (4 + 8 + 5 + 4)/4) = (6.5, 5.25)

For Cluster-03: A2(2, 5), A7(1, 2)
Center of Cluster-03 = ((2 + 1)/2, (5 + 2)/2) = (1.5, 3.5)

This completes Iteration 2. After the second iteration, the centers of the three clusters are:
C1(3, 9.5), C2(6.5, 5.25), C3(1.5, 3.5)
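Both iterations of this exercise can be checked with a short script. It is a sketch that hard-codes the exercise data and the distance ρ; the helper name `one_iteration` is hypothetical, and it assumes that no cluster becomes empty (true for this data).

```python
# Exercise data: eight points A1..A8
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}


def rho(a, b):
    """Distance function of the exercise: |x2 - x1| + |y2 - y1|."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])


def one_iteration(centers):
    """Assign each point to its nearest center, then return (clusters, new means).

    Assumes no cluster ends up empty, which holds for this exercise.
    """
    clusters = [[] for _ in centers]
    for name, p in points.items():
        dists = [rho(p, c) for c in centers]
        clusters[dists.index(min(dists))].append(name)  # first minimum wins ties
    new_centers = []
    for members in clusters:
        xs = [points[m][0] for m in members]
        ys = [points[m][1] for m in members]
        new_centers.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return clusters, new_centers


centers = [(2, 10), (5, 8), (1, 2)]  # initial centers A1, A4, A7
for it in (1, 2):
    clusters, centers = one_iteration(centers)
    print(f"Iteration {it}: clusters={clusters}, centers={centers}")
```

After the loop, `centers` holds (3.0, 9.5), (6.5, 5.25) and (1.5, 3.5), the same centers obtained by hand after the second iteration.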


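K-means clustering: Algorithm flowchart
Decide the number of clusters K → Initialize centroids → Assign clusters → Move centroids → Finish
(Assign clusters and Move centroids are repeated until a stopping criterion of Step-06 is met.)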
K-means Clustering: Elbow method
● How do we decide the number of clusters?
○ The elbow method is a graphical way of finding the optimal ‘K’ for K-means clustering.
○ It works by computing the WCSS (Within-Cluster Sum of Squares), i.e., the sum of the squared distances between each point and the centroid of its cluster, for different values of K.
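WCSS can be computed directly from its definition. This sketch assumes the squared Euclidean distance, as in the definition above; `wcss` is a hypothetical helper name. As an illustrative input it uses the clusters and centers obtained after Iteration 2 of the exercise (that exercise used the Manhattan distance for assignment, so the value is only an example).

```python
def wcss(points, labels, centers):
    """Within-Cluster Sum of Squares: sum of squared Euclidean distances
    from each point to the center of the cluster it is assigned to."""
    total = 0.0
    for p, lab in zip(points, labels):
        c = centers[lab]
        total += sum((pi - ci) ** 2 for pi, ci in zip(p, c))
    return total


# Clusters and centers after Iteration 2 of the exercise (labels are 0, 1, 2)
points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
labels = [0, 2, 1, 1, 1, 1, 2, 0]
centers = [(3, 9.5), (6.5, 5.25), (1.5, 3.5)]
print(wcss(points, labels, centers))  # 23.25
```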
K-means Clustering: Elbow method

WCSS(1) > WCSS(2) > WCSS(3) > ... > WCSS(n), where WCSS(K) is the WCSS obtained with K clusters.

● When the graph of WCSS against K shows an elbow shape, we pick the K value at which the elbow forms. We call this point the Elbow point.
● Beyond the Elbow point, increasing the
value of ‘K’ does not lead to a
significant reduction in WCSS.
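To draw the elbow graph, K-means is run for each candidate K and the final WCSS is recorded. The sketch below uses the exercise points with the Euclidean distance. As an assumption, it initializes the centers with a deterministic farthest-first rule (one way of choosing centers that are far apart, as Step-02 suggests) so that the output is reproducible; `farthest_first` and `kmeans_wcss` are hypothetical names. Since K-means can stop at a local optimum, the computed values are not guaranteed to decrease strictly with K.

```python
# Exercise points, reused here as a small illustrative data set
points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Assumed initialization: start at the first point, then keep adding the
    point whose nearest already-chosen center is farthest away."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans_wcss(points, k, max_iter=100):
    """Run K-means with k clusters and return the final WCSS."""
    centers = farthest_first(points, k)
    for _ in range(max_iter):
        # assign every point to its nearest center
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # move each center to the mean of its points (an empty cluster keeps its center)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(centers[j])
        if new_centers == centers:  # centers stopped moving
            break
        centers = new_centers
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


wcss_by_k = {k: kmeans_wcss(points, k) for k in range(1, len(points) + 1)}
for k, w in wcss_by_k.items():
    print(k, round(w, 3))
```

Plotting `wcss_by_k` against K gives the elbow graph. For K = 1 the WCSS of these points is 100.75, and for K = 8 (one cluster per point) it is 0.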
K-means Clustering: Silhouette score
● In many real-world datasets, it is not easy to identify the right ‘K’ using the elbow method, because the elbow may not be clearly visible.
● The Silhouette score is a useful way to choose K when the elbow method does not show a clear Elbow point.
● The Silhouette score ranges from -1 to +1.
○ +1: Points are well matched to their own cluster and far from other clusters, so the clusters are clearly distinguishable.
○ 0: Points lie on or near the boundary between clusters, i.e., the clusters overlap.
○ -1: Points are closer to another cluster than to their own, i.e., they have probably been assigned to the wrong cluster.
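● For a data point i, the silhouette value is s(i) = (b - a)/max(a, b)
where,
○ a = average intra-cluster distance, i.e., the average distance between point i and all other points in its own cluster.
○ b = average nearest-cluster distance, i.e., the smallest average distance between point i and the points of any other cluster.
● The Silhouette score of the whole clustering is the average of s(i) over all data points.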
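The per-point definition can be turned into a small function. This is an illustrative sketch using the Euclidean distance; `mean_silhouette` is a hypothetical name, it needs at least two clusters, and it follows the common convention that a point alone in its cluster scores 0. In practice, scikit-learn provides `sklearn.metrics.silhouette_score`.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette value over all points, using Euclidean distance.

    For point i: a = mean distance to the other points of its own cluster,
    b = smallest mean distance to the points of any other cluster,
    s(i) = (b - a) / max(a, b). Requires at least two clusters.
    """
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        own = [q for j, (q, lq) in enumerate(zip(points, labels)) if lq == lab and j != i]
        if not own:
            scores.append(0.0)  # convention: a point alone in its cluster scores 0
            continue
        a = sum(math.dist(p, q) for q in own) / len(own)
        b = min(
            sum(math.dist(p, q) for q, lq in zip(points, labels) if lq == other)
            / labels.count(other)
            for other in set(labels) if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(round(mean_silhouette(points, [0, 0, 1, 1]), 3))  # well separated: close to +1
print(round(mean_silhouette(points, [0, 1, 0, 1]), 3))  # badly assigned: negative
```

The two calls use the same four points: grouping the nearby points together gives a score near +1, while pairing each point with a far-away one gives a negative score, matching the interpretation of the range above.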
