
K-Means Clustering

Introduction
• K-Means Clustering is an unsupervised machine learning
algorithm that groups an unlabeled dataset into different
clusters. The following slides cover the fundamentals and
working of k-means clustering along with its implementation.
Use of Clustering
• The goal of clustering is to divide a population or set of
data points into groups so that the data points within each
group are more similar to one another than to the data points
in other groups. In essence, it groups items based on how
similar or dissimilar they are to one another.
Algorithm
• The algorithm works as follows:

1. First, we randomly initialize k points, called means or
   cluster centroids.
2. We assign each item to its closest mean and update that
   mean's coordinates, which are the averages of the items
   assigned to the cluster so far.
3. We repeat the process for a given number of iterations,
   and at the end we have our clusters.
• Initialize k means with random values
• --> For a given number of iterations:
•     --> Iterate through the items:
•         --> Find the mean closest to the item by computing the Euclidean distance between the item and each mean
•         --> Assign the item to that mean
•         --> Update the mean by shifting it to the average of the items in its cluster
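The pseudocode above can be turned into a short NumPy implementation. The sketch below is illustrative rather than taken from the slides: the function name k_means, the fixed iteration count, the random seed, and the toy 2-D data are assumptions, and the means are initialized by picking k random items.

```python
import numpy as np

def k_means(items, k, iterations=100, seed=0):
    """Plain k-means: random init, assign to nearest mean, update means."""
    items = np.asarray(items, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the k means by picking k distinct items at random
    means = items[rng.choice(len(items), size=k, replace=False)]
    for _ in range(iterations):
        # Euclidean distance of every item to every mean, shape (n_items, k)
        distances = np.linalg.norm(items[:, None, :] - means[None, :, :], axis=2)
        # Assign each item to its closest mean
        labels = distances.argmin(axis=1)
        # Update each mean to the average of the items assigned to it
        for j in range(k):
            if np.any(labels == j):
                means[j] = items[labels == j].mean(axis=0)
    return means, labels

# Example usage: two loose blobs in 2-D
data = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
centroids, assignments = k_means(data, k=2)
print(centroids)
```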
• The elbow method is a technique used to determine
the optimal number of clusters (k) in k-means
clustering. It evaluates how well the data points fit
within the clusters by analyzing the within-cluster
sum of squares (WCSS), also known as inertia.
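• For reference (the slides do not write it out), the WCSS for centroids \mu_1, \dots, \mu_k is
  \mathrm{WCSS} = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
  where C_j denotes the set of points assigned to cluster j; the symbols C_j and \mu_j are introduced here purely for notation.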
• Key Steps in the Elbow Method:
1. Run k-means for different values of k:
   • Start with a range of cluster numbers (e.g., 1 to 10).
   • Compute the WCSS for each k.
2. Plot the results:
   • Create a plot with k values on the x-axis and the WCSS on the y-axis.
3. Look for the "elbow point":
   • The elbow point is where the WCSS starts decreasing at a slower
     rate, resembling an elbow. It indicates the optimal number of
     clusters because adding more clusters beyond this point yields
     diminishing returns.
• Why it Works:
• At low k, clusters are large and have higher WCSS
because many points are far from their cluster center.
• As k increases, WCSS decreases since clusters become
smaller and tighter.
• Beyond the optimal k, WCSS reduction slows as clusters
start to overfit (splitting data unnecessarily).
• Limitations:
• The elbow point may not always be distinct, making the
method subjective.
• The method does not guarantee the best clustering
structure but provides a useful heuristic.
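A minimal sketch of these steps, assuming scikit-learn (whose inertia_ attribute is the WCSS) and matplotlib are available; the three-blob toy data and the range 1 to 10 are illustrative choices, not from the slides.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Illustrative data: three blobs in 2-D
X = np.vstack([np.random.randn(50, 2) + offset
               for offset in ([0, 0], [5, 5], [0, 5])])

# Run k-means for k = 1..10 and record the WCSS (inertia) for each k
ks = range(1, 11)
wcss = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(model.inertia_)

# Plot k against WCSS and look for the "elbow"
plt.plot(ks, wcss, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("WCSS (inertia)")
plt.show()
```

The k at which the curve visibly flattens is read off the plot as the elbow.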
• What is the Silhouette Score?
• Measures how similar a data point is to its own cluster
(cohesion) compared to other clusters (separation).
• The score ranges from −1 to 1:
• +1: Perfect clustering (points are close to their own cluster and far
  from other clusters).
• 0: Overlapping clusters.
• −1: Poor clustering (points assigned to the wrong cluster).
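As a sketch of how the score is typically computed in practice, assuming scikit-learn's silhouette_score and make_blobs are available (the slides do not name a library); comparing the score across candidate values of k is one common way to choose the number of clusters.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Illustrative data: three well-separated blobs
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Compare the average silhouette score for a few candidate values of k
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}: silhouette score = {silhouette_score(X, labels):.3f}")
```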
