Unsupervised Learning


K-means Clustering

Presented By:
Akanksha Kaushik
Assistant Professor
The NorthCap University
Introduction to Unsupervised Machine Learning

• Unsupervised Machine Learning is the process of
teaching a computer to use unlabeled, unclassified data
and enabling the algorithm to operate on that data
without supervision.

• Without any prior training on labeled data, the machine’s
job in this case is to organize unsorted data according to
similarities, patterns, and differences.
Introduction to K-means Clustering

• K-means clustering is an unsupervised machine learning
algorithm that groups an unlabeled dataset into
different clusters.
• It assigns each data point to one of the K clusters
depending on its distance from the center of each
cluster.
• It starts by randomly placing the cluster centroids in
the feature space.
• Each data point is then assigned to the cluster whose
centroid is nearest to it.
• After every point has been assigned to a cluster, the
cluster centroids are recomputed.
• This process repeats iteratively until the clusters stop
changing or change very little.
• In this analysis we assume that the number of clusters is given
in advance, and each point must be placed in one of the groups.
• In some cases, K is not clearly defined, and we must think
about the optimal value of K.
• K-means performs best when the data is well separated. When
the clusters overlap, it is not a suitable choice.
• K-means is fast compared with many other clustering techniques.
• It produces compact clusters, since each point is grouped with
its nearest centroid.
• K-means does not provide clear information about the
quality of the clusters.
• Different initial placements of the cluster centroids may lead to
different clusters.
• K-means is also sensitive to noise, and it may get stuck in a
local minimum.
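The effect of the initial centroids can be seen on a tiny example. The following is a minimal sketch, not part of the original slides: the helper names `run_k_means` and `within_cluster_ss`, the four sample points, and the two starting positions are all assumptions chosen for illustration. It uses the common batch form of k-means and compares the total squared distance of points to their cluster means for two different starts.

```python
import math

def run_k_means(items, means, iterations=10):
    """Batch k-means from the given starting means; returns (means, labels)."""
    means = [list(m) for m in means]
    labels = [0] * len(items)
    for _ in range(iterations):
        # Assign every item to its nearest mean (Euclidean distance).
        labels = [min(range(len(means)), key=lambda j: math.dist(item, means[j]))
                  for item in items]
        # Move each mean to the average of the items assigned to it.
        for j in range(len(means)):
            members = [x for x, lab in zip(items, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mean
                means[j] = [sum(col) / len(members) for col in zip(*members)]
    return means, labels

def within_cluster_ss(items, means, labels):
    """Sum of squared distances from each item to the mean of its cluster."""
    return sum(math.dist(x, means[lab]) ** 2 for x, lab in zip(items, labels))

# Four corners of a wide rectangle (hypothetical data).
items = [[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]]

# Start A: one mean on the left side, one on the right side.
means_a, labels_a = run_k_means(items, [[0.0, 0.5], [4.0, 0.5]])
# Start B: one mean on the bottom edge, one on the top edge.
means_b, labels_b = run_k_means(items, [[2.0, 0.0], [2.0, 1.0]])

wcss_a = within_cluster_ss(items, means_a, labels_a)
wcss_b = within_cluster_ss(items, means_b, labels_b)
print(labels_a, wcss_a)
print(labels_b, wcss_b)
```

Start A settles on a left/right split with a total squared distance of 1.0, while start B settles on a bottom/top split with a total of 16.0 and never leaves it. Both are stable, but only the first is the better grouping, which is what getting stuck in a local minimum looks like in practice.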
Objective of k-means clustering
• The goal of clustering is to divide the population
or set of data points into a number of groups so
that the data points within each group are more
similar to one another and different from the
data points in the other groups.
• It is essentially a grouping of things based on how
similar and different they are to one another.
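One common way to make this goal precise is the within-cluster sum of squares, which k-means tries to make small. The slides do not state a formula, so the notation here is an assumption introduced for illustration: $C_j$ is the set of points in cluster $j$, $\mu_j$ is the mean of that cluster, and $K$ is the number of clusters.

```latex
J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

A small $J$ means the points in each group lie close to their own cluster mean.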
How does k-means clustering work?
• We are given a data set of items, with certain features, and
values for these features (like a vector).
• The task is to categorize those items into groups.
• To achieve this, we will use the K-means algorithm, an
unsupervised learning algorithm.
• ‘K’ in the name of the algorithm represents the number of
groups/clusters we want to classify our items into.
• The algorithm will categorize the items into k
groups or clusters of similarity.
• To calculate that similarity, we will use the
Euclidean distance as a measurement.
• The algorithm works as follows:

• First, we randomly initialize k points, called means
or cluster centroids.
• We assign each item to its closest mean, and
we update that mean’s coordinates, which are the
averages of the items assigned to that cluster so
far.
• We repeat the process for a given number of
iterations and at the end, we have our clusters.
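The two building blocks described above, measuring Euclidean distance and folding one item into a running mean, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `euclidean_distance`, `closest_mean`, and `update_mean` and the sample values are hypothetical and do not come from the slides.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_mean(item, means):
    """Index of the mean that is nearest to the item."""
    return min(range(len(means)), key=lambda j: euclidean_distance(item, means[j]))

def update_mean(mean, count, item):
    """Running average: fold `item` into a mean that already averages `count` items."""
    return [(m * count + x) / (count + 1) for m, x in zip(mean, item)]

# Two hypothetical means, each assumed to average one item so far.
means = [[0.0, 0.0], [10.0, 10.0]]
item = [1.0, 2.0]

j = closest_mean(item, means)               # the item is nearer to means[0]
means[j] = update_mean(means[j], 1, item)   # means[0] becomes [0.5, 1.0]
print(j, means[j])
```

The running average lets the mean be updated one item at a time without storing every item that was assigned to the cluster.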
• The “points” mentioned above are called means because they
are the mean values of the items categorized in them.
• To initialize these means, we have a lot of options.
• An intuitive method is to initialize the means at random
items in the data set.
• Another method is to initialize the means at random values
within the boundaries of the data set (if, for a feature x, the
items have values in [0,3], we will initialize the means with
values for x in [0,3]).
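The two initialization options above can be sketched as follows. The function names `init_from_items` and `init_within_bounds`, the sample data, and the fixed seed are assumptions made for this illustration.

```python
import random

def init_from_items(items, k, rng):
    """Option 1: pick k distinct items from the data set as the starting means."""
    return [list(item) for item in rng.sample(items, k)]

def init_within_bounds(items, k, rng):
    """Option 2: draw each coordinate uniformly between that feature's min and max."""
    n_features = len(items[0])
    lows = [min(item[f] for item in items) for f in range(n_features)]
    highs = [max(item[f] for item in items) for f in range(n_features)]
    return [[rng.uniform(lows[f], highs[f]) for f in range(n_features)]
            for _ in range(k)]

# Hypothetical data: the first feature lies in [0, 3], the second in [5, 9].
items = [[0.0, 5.0], [1.5, 7.0], [3.0, 6.0], [2.0, 9.0]]
rng = random.Random(0)  # fixed seed so the sketch is repeatable

means_a = init_from_items(items, 2, rng)
means_b = init_within_bounds(items, 2, rng)
print(means_a)
print(means_b)
```

Option 1 always starts on real data points, while option 2 may start in an empty region of the feature space, as long as it stays inside each feature's observed range.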
• The above algorithm in pseudocode is as follows:
Initialize k means with random values
--> For a given number of iterations:
    --> Iterate through items:
        --> Find the mean closest to the item by calculating the Euclidean distance of the item with each of the means
        --> Assign item to mean
        --> Update mean by shifting it to the average of the items in that cluster
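The pseudocode can be turned into a short runnable sketch. Note the assumptions: this version uses the common batch variant (often called Lloyd's algorithm), which recomputes each mean once after all items have been assigned, instead of updating the mean after every single item as the pseudocode does. The function name `k_means`, its parameters, and the sample data are hypothetical.

```python
import math
import random

def k_means(items, k, iterations=10, seed=0):
    """Batch k-means sketch; returns (means, assignments)."""
    rng = random.Random(seed)
    # Initialize the k means at randomly chosen items from the data set.
    means = [list(item) for item in rng.sample(items, k)]
    assignments = [0] * len(items)
    for _ in range(iterations):
        # Assign each item to the mean with the smallest Euclidean distance.
        for i, item in enumerate(items):
            assignments[i] = min(range(k), key=lambda j: math.dist(item, means[j]))
        # Shift each mean to the average of the items in its cluster.
        for j in range(k):
            members = [items[i] for i in range(len(items)) if assignments[i] == j]
            if members:  # an empty cluster keeps its previous mean
                means[j] = [sum(col) / len(members) for col in zip(*members)]
    return means, assignments

# Hypothetical data: two well-separated groups of three items each.
items = [[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
         [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]]
means, assignments = k_means(items, k=2)
print(means)
print(assignments)
```

On this data the first three items end up in one cluster and the last three in the other. Which cluster receives index 0 depends on the random initialization.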
Let’s get started
