Working of K Means Algorithm - YashBhure
ALGORITHM
AN UNSUPERVISED LEARNING ALGORITHM
11 JULY 2023
In the field of data science, k-means clustering is a widely used
unsupervised learning algorithm that aims to discover underlying
patterns and group similar data points together. It is particularly useful
for exploratory data analysis and finding meaningful insights from
unlabelled data. In this article, we will delve into the working of the
k-means algorithm, step by step, and explore its applications in various
industries.
Step 1: Initialization
The k-means clustering algorithm splits a dataset into k clusters,
where k is the number of clusters you wish to create. If you choose to
split your dataset into three clusters, for example, then k is set to 3.
The algorithm begins by placing k starting centroids, commonly by
picking k data points at random. In Figure 2, we can see that the original
(unclustered) data has been transformed into three clusters (k is 3). If
we were to set k to 4, an additional cluster would be derived from the
dataset to produce four clusters.
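As a minimal sketch of the initialization step, the snippet below picks k data points at random to act as starting centroids. Random sampling is one common choice and is an assumption here, since the article does not name a method. The function name init_centroids and the sample points are hypothetical, made up for illustration.

```python
import random

def init_centroids(points, k, seed=None):
    """Pick k distinct data points to serve as the starting centroids."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)  # a fixed seed makes the choice repeatable
    return rng.sample(points, k)

# Hypothetical 2-D data, invented for this example only.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0), (3.5, 5.0), (4.5, 5.0)]
centroids = init_centroids(points, k=3, seed=0)
print(centroids)
```

Changing k to 4 would simply return four starting centroids instead of three.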
How does k-means clustering separate the data points?
Each data point can be assigned to only one cluster and each cluster is
discrete. This means that there is no overlap between clusters and no
case of nesting a cluster inside another cluster. Also, all data points,
including anomalies, are assigned to a centroid irrespective of how they
impact the final shape of the cluster. However, because each data point
is assigned to its nearest centroid, your clusters
will generally form an elliptical or spherical shape. After all data points
have been allocated to a centroid, the next step is to compute the
mean of all data points in each cluster, which can be found by
calculating the average x and y values of all data points in that cluster.
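The assignment and averaging described above can be sketched as follows. This is an illustration under stated assumptions: distance is taken to be Euclidean, and the function names and the four sample points are hypothetical.

```python
import math

def assign_to_nearest(points, centroids):
    """Return, for each point, the index of its closest centroid."""
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centroids]  # Euclidean distance
        labels.append(distances.index(min(distances)))
    return labels

def cluster_mean(points, labels, cluster):
    """Average the x and y values of the points assigned to one cluster."""
    members = [p for p, lab in zip(points, labels) if lab == cluster]
    xs = [p[0] for p in members]
    ys = [p[1] for p in members]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# Hypothetical data and centroids, invented for this example only.
points = [(1.0, 1.0), (2.0, 1.0), (8.0, 8.0), (9.0, 9.0)]
centroids = [(1.0, 1.0), (9.0, 9.0)]

labels = assign_to_nearest(points, centroids)
print(labels)                           # [0, 0, 1, 1]
print(cluster_mean(points, labels, 0))  # (1.5, 1.0)
```

Every point receives exactly one label, which matches the rule that clusters do not overlap.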
Step 3: Updating Cluster Centroids
Next, take the mean value of the data points in each cluster and plug in
those x and y values to update your centroid coordinates. This will most
likely result in a change to your centroids’ location. Your total number of
clusters, however, will remain the same. You are not creating new
clusters, but rather updating their position on the scatterplot. Like
musical chairs, the data points will then rush to the closest updated
centroid to form k clusters again. The assignment and update steps
repeat until the centroids stop moving.
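Putting the steps together, a rough sketch of the full loop might look like the code below. It assumes Euclidean distance, stops when the centroids no longer change, and leaves the centroid of an empty cluster where it is. These are illustrative choices, not details taken from the article, and the function name kmeans and the sample data are hypothetical.

```python
import math

def kmeans(points, centroids, max_iters=100):
    """Alternate assignment and centroid updates until the centroids stop moving."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iters):
        # Assignment: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # assumption: an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has settled
        centroids = new_centroids
    return centroids, labels

# Hypothetical data with deliberately poor starting centroids.
points = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
start = [(1.0, 1.0), (2.0, 1.0)]

centroids, labels = kmeans(points, start)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(centroids)
```

The number of centroids never changes inside the loop; only their positions do, which mirrors the musical chairs picture.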
1. Customer Segmentation:
K-means clustering is commonly used in customer
segmentation, where customers are grouped based on their
behavior, preferences, or demographics.
By clustering customers, businesses can tailor marketing
strategies, personalize recommendations, and improve
customer satisfaction.
2. Image Compression:
K-means clustering can be employed for image compression by
reducing the number of colors in an image.
Each pixel in the image is treated as a data point, and k-means
clustering is applied to cluster similar colors together.
The cluster centroids represent the reduced color palette,
resulting in a compressed image, often with little visible loss
of quality when k is large enough.
3. Anomaly Detection:
K-means clustering can be used to detect anomalies or outliers
in datasets.
Because k-means assigns every data point to a cluster, anomalies
appear as data points that are significantly different from the
other data points within their cluster.
By examining the data points that are farthest from their cluster
centroids, potential anomalies can be identified.
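The anomaly detection idea in item 3 can be sketched by ranking points by their distance to their own centroid. The snippet assumes a clustering has already been computed. The points, labels, rounded centroids, and the function name farthest_from_centroid are all hypothetical, and how many points to flag (top_n) is a judgment call rather than part of k-means itself.

```python
import math

def farthest_from_centroid(points, centroids, labels, top_n=1):
    """Return the top_n points lying farthest from their own cluster centroid."""
    scored = [
        (math.dist(p, centroids[lab]), p)
        for p, lab in zip(points, labels)
    ]
    scored.sort(reverse=True)  # largest distance first
    return [p for _, p in scored[:top_n]]

# Hypothetical clustering result, invented for this example only.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (9.0, 1.0)]
labels = [0, 0, 0, 1, 1, 1]
centroids = [(1.0, 1.0), (6.4, 3.6)]  # rounded cluster means

suspects = farthest_from_centroid(points, centroids, labels, top_n=1)
print(suspects)  # [(9.0, 1.0)]
```

The point (9.0, 1.0) still belongs to cluster 1, but it sits much farther from that centroid than its neighbours do, which is what marks it as a candidate anomaly.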
Imagine a retail company that wants to gain insights into its customer
base and target them more effectively. The company has a large
customer database with information such as purchase history,
demographic data, browsing behavior, and more. By applying k-means
clustering to this dataset, the company can discover distinct customer
segments and tailor its marketing strategies accordingly.
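A toy version of this retail scenario is sketched below. Everything in it is hypothetical: the six customers, the two features (annual spend and store visits per month), the choice of k = 3, and the hand-picked starting centroids. Rescaling each feature to the range 0 to 1 before clustering is an added assumption, used so that the larger spend figures do not dominate the distance calculation.

```python
import math

# Hypothetical customers: (annual spend, store visits per month).
customers = {
    "A": (200.0, 1.0),
    "B": (250.0, 2.0),
    "C": (2200.0, 9.0),
    "D": (2400.0, 10.0),
    "E": (1200.0, 5.0),
    "F": (1100.0, 4.0),
}

def min_max_scale(rows):
    """Rescale every feature to [0, 1] so no single unit dominates the distance."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def kmeans(points, centroids, max_iters=100):
    """Assign points to the nearest centroid and update means until stable."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iters):
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(
                tuple(sum(c) / len(members) for c in zip(*members)) if members else old
            )
        if updated == centroids:
            break
        centroids = updated
    return labels

names = list(customers)
scaled = min_max_scale([customers[n] for n in names])
start = [scaled[0], scaled[2], scaled[4]]  # assumption: hand-picked starting centroids
labels = kmeans(scaled, start)

# Group customer names by the cluster they ended up in.
segments = {}
for name, label in zip(names, labels):
    segments.setdefault(label, []).append(name)
print(segments)  # {0: ['A', 'B'], 1: ['C', 'D'], 2: ['E', 'F']}
```

Here the three segments loosely correspond to low, high, and mid-level spenders; with a real customer database, the segments would need to be inspected before being given any such interpretation.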