K-Means Clustering
Introduction
• K-Means Clustering is an unsupervised machine
learning algorithm that groups an unlabeled dataset
into clusters. This article explores the fundamentals
and working of k-means clustering, along with its
implementation.
Use of Clustering
• The goal of clustering is to divide a population or set
of data points into a number of groups so that the data
points within each group are more similar to one
another than to the data points in other groups. It is
essentially a grouping of things based on how similar
or different they are.
Algorithm
• The algorithm works as follows:
on the y-axis.
• Look for the "elbow point":
• The elbow point is where the WCSS (within-cluster sum
of squares) starts decreasing at a slower rate, so the
curve bends like an elbow. It suggests the optimal
number of clusters, because adding more clusters
beyond this point yields diminishing returns.
• Why it Works:
• At low k, clusters are large and have higher WCSS
because many points are far from their cluster center.
• As k increases, WCSS decreases since clusters become
smaller and tighter.
• Beyond the optimal k, WCSS reduction slows as clusters
start to overfit (splitting data unnecessarily).
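
The trend described above can be reproduced on a toy dataset. The following is a minimal sketch in plain Python, not a definitive implementation: the nine 2-D points, the function names (farthest_point_init, assign, kmeans, wcss), and the deterministic farthest-point initialization are all assumptions made for illustration (k-means is more commonly started from random or k-means++ centers). It runs standard assign/update iterations for k = 1 to 5 and records the WCSS for each k.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Assumed deterministic initialization: start from the first point,
    then repeatedly add the point farthest from its nearest chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    return centers


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        for p in points
    ]


def kmeans(points, k, max_iter=100):
    """Alternate assignment and mean-update steps until the centers stop moving."""
    centers = farthest_point_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Keep the old center if a cluster ends up empty.
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def wcss(points, centers, labels):
    """Within-cluster sum of squares: squared distance of each point to its center."""
    return sum(squared_distance(p, centers[label]) for p, label in zip(points, labels))


# Hypothetical toy data: three small, well-separated groups.
points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
    (8.0, 8.0), (8.5, 9.0), (9.0, 8.0),
    (1.0, 8.0), (1.5, 9.0), (2.0, 8.5),
]

wcss_by_k = {}
for k in range(1, 6):
    centers, labels = kmeans(points, k)
    wcss_by_k[k] = wcss(points, centers, labels)

for k, value in wcss_by_k.items():
    print(f"k={k}  WCSS={value:.3f}")
```

Because this data has three well-separated groups, the WCSS should fall sharply up to k = 3 and only slightly afterwards, which is the bend to look for when these values are plotted against k.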
• Limitations:
• The elbow point may not always be distinct, making the
method subjective.
• The method does not guarantee the best clustering
structure but provides a useful heuristic.
• What is the Silhouette Score?
• Measures how similar a data point is to its own cluster
(cohesion) compared to other clusters (separation).
• The score ranges from −1 to 1:
• +1: Perfect clustering (points are close to their cluster and far
from others).
• 0: Overlapping clusters.
• −1: Poor clustering (points are likely assigned to the wrong cluster).
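
The score can be computed directly from pairwise distances. The sketch below is a plain-Python illustration under stated assumptions: it uses the standard per-point definition s = (b − a) / max(a, b), where a is the mean distance to the other points in the same cluster and b is the smallest mean distance to the points of any other cluster, and it averages s over all points. The function names, the toy points, and both label lists are hypothetical and chosen only to contrast a sensible grouping with a deliberately mixed-up one.

```python
import math


def mean_distance(p, others):
    """Mean Euclidean distance from point p to every point in others."""
    return sum(math.dist(p, q) for q in others) / len(others)


def silhouette_score(points, labels):
    """Average silhouette value over all points (assumes Euclidean distance)."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # Other members of p's own cluster (p itself is excluded by index).
        own = [q for j, (q, m) in enumerate(zip(points, labels)) if m == label and j != i]
        if not own:
            # Convention: a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        a = mean_distance(p, own)  # cohesion
        b = min(                   # separation from the nearest other cluster
            mean_distance(p, members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: three small, well-separated groups.
points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
    (8.0, 8.0), (8.5, 9.0), (9.0, 8.0),
    (1.0, 8.0), (1.5, 9.0), (2.0, 8.5),
]
good_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # matches the visible groups
bad_labels = [0, 1, 2, 0, 1, 2, 0, 1, 2]   # deliberately mixes the groups

good_score = silhouette_score(points, good_labels)
bad_score = silhouette_score(points, bad_labels)
print(f"good labeling: {good_score:.3f}")
print(f"bad labeling:  {bad_score:.3f}")
```

The labeling that follows the visible groups should score close to +1, while the mixed-up labeling should score much lower, in line with the interpretation of the range given above.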