Ai - W8L15
WEEK 06
LECTURE 12
TOPICS TO COVER IN THIS LECTURE
• K-means clustering assigns each data point to one of K clusters according to its
distance from the cluster centers.
• It starts by randomly placing the cluster centroids in the space.
• A centroid is a point that represents the center of a cluster.
• Each data point is then assigned to a cluster based on its distance from
that cluster's centroid.
• After every point has been assigned, new cluster centroids are
computed.
• This process runs iteratively until the clusters stabilize.
• In this analysis we assume that the number of clusters is given in advance and we
have to put each point into one of the groups.
• In some cases, K is not clearly defined, and we have to think about the optimal
value of K. K-means clustering performs best when the data is well separated.
• When data points overlap, this clustering is not suitable. K-means is faster as
• It provides strong coupling between the data points. K-means
clustering does not provide clear information regarding the quality of the
clusters.
• Different initial assignments of the cluster centroids may lead to
different clusters.
• Also, the K-means algorithm is sensitive to noise. It may get stuck
in local minima.
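The steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name kmeans, the max_iters and seed parameters, and the choice to start from K randomly picked data points are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Minimal K-means sketch: returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centroids
    # (an assumption; the notes only say the start is random).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Step 2: assign each point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stop when no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

Changing seed changes the initial centroids, which is one way to observe that different starting points may lead to different clusters on less well-separated data.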
WHAT IS THE OBJECTIVE OF K-MEANS CLUSTERING?
Now, we will calculate the new centroid for each cluster for the
third iteration.
• In cluster 1, we have 5 points, i.e. A2 (2,6), A5 (6,4), A6 (1,2), A10
(7,5), and A12 (4,6). To calculate the new centroid for cluster 1, we
find the mean of the x coordinates and the mean of the y coordinates of the
points in the cluster. Hence, the new centroid for cluster 1 is (4, 4.6).
• In cluster 2, we have 7 points, i.e. A1 (2,10), A4 (6,9), A7 (5,10),
A8 (4,9), A13 (3,10), A14 (3,8), and A15 (6,11). Hence, the new
centroid for cluster 2 is (4.143, 9.571).
• In cluster 3, we have 3 points, i.e. A3 (11,11), A9 (10,12), and A11
(9,11). Hence, the new centroid for cluster 3 is (10, 11.333).
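The centroid arithmetic above can be reproduced with a short Python sketch. The helper name centroid and the clusters dictionary are introduced here only for illustration; the coordinates are the ones listed for each cluster.

```python
def centroid(points):
    """Mean of the x coordinates and mean of the y coordinates."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


clusters = {
    1: [(2, 6), (6, 4), (1, 2), (7, 5), (4, 6)],
    2: [(2, 10), (6, 9), (5, 10), (4, 9), (3, 10), (3, 8), (6, 11)],
    3: [(11, 11), (10, 12), (9, 11)],
}

for name, pts in clusters.items():
    cx, cy = centroid(pts)
    print(f"cluster {name}: ({cx:.3f}, {cy:.3f})")
```

For cluster 1, for example, the x coordinates sum to 20 and the y coordinates sum to 23, so dividing by 5 gives (4, 4.6).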
At this point, we have calculated new centroids for
each cluster. Now, we will calculate the distance of
each data point from the new centroids. Then, we
will assign the points to clusters based on their
distance from the centroids.
Point        Distance from   Distance from     Distance from    Assigned
             centroid 1      centroid 2        centroid 3       cluster
             (4, 4.6)        (4.143, 9.571)    (10, 11.333)
A1 (2,10)    5.758           2.186             8.110            Cluster 2
A2 (2,6)     2.441           4.165             9.615            Cluster 1
A3 (11,11)   9.485           7.004             1.054            Cluster 3
A4 (6,9)     4.833           1.943             4.631            Cluster 2
A5 (6,4)     2.088           5.872             8.353            Cluster 1
A6 (1,2)     3.970           8.197             12.966           Cluster 1
A7 (5,10)    5.492           0.958             5.175            Cluster 2
A8 (4,9)     4.400           0.589             6.438            Cluster 2
A9 (10,12)   9.527           6.341             0.667            Cluster 3
A10 (7,5)    3.027           5.390             7.008            Cluster 1
A11 (9,11)   8.122           5.063             1.054            Cluster 3
A12 (4,6)    1.400           3.574             8.028            Cluster 1
A13 (3,10)   5.492           1.221             7.126            Cluster 2
A14 (3,8)    3.544           1.943             7.753            Cluster 2
A15 (6,11)   6.705           2.343             4.014            Cluster 2
Results from 3rd iteration of K means clustering
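The assignment step behind this table can be sketched as follows, assuming Euclidean distance (which matches the tabulated values). The names points, centroids, and nearest_cluster are chosen for this example only.

```python
import math

# Centroids computed for the third iteration.
centroids = {1: (4, 4.6), 2: (4.143, 9.571), 3: (10, 11.333)}

points = {
    "A1": (2, 10), "A2": (2, 6), "A3": (11, 11), "A4": (6, 9), "A5": (6, 4),
    "A6": (1, 2), "A7": (5, 10), "A8": (4, 9), "A9": (10, 12), "A10": (7, 5),
    "A11": (9, 11), "A12": (4, 6), "A13": (3, 10), "A14": (3, 8), "A15": (6, 11),
}


def nearest_cluster(point, centroids):
    """Return the id of the centroid closest to the point."""
    return min(centroids, key=lambda c: math.dist(point, centroids[c]))


# Distance of every point to every centroid, then the closest one.
distances = {
    name: {c: math.dist(p, xy) for c, xy in centroids.items()}
    for name, p in points.items()
}
assignments = {name: nearest_cluster(p, centroids) for name, p in points.items()}

for name in points:
    row = "  ".join(f"{distances[name][c]:.3f}" for c in centroids)
    print(f"{name}: {row}  -> Cluster {assignments[name]}")
```

Each point goes to the cluster whose distance column holds the smallest value in its row.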
Now, we have completed the third iteration of the K-means
clustering algorithm and assigned each point to an updated
cluster.
Here, you can observe that no point has changed its
cluster compared to the previous iteration. Because of this,
the centroids also remain constant. Therefore, we say
that the clusters have stabilized. Hence, the clusters
obtained after the third iteration are the final clusters
for the given dataset. If we plot the clusters on a
graph, the graph looks as follows.
EVALUATION METRICS FOR UNSUPERVISED