Assignment No 5 K-Means Clustering
Clustering
Clustering is the process of grouping a set of objects into classes of similar
objects.
Clustering: Types
Clustering can be broadly divided into two subgroups:
Hard clustering: each data object or point belongs to exactly one cluster. For
example, in the Uber dataset, each location belongs to exactly one borough.
Soft clustering: a data point can belong to more than one cluster, with a
probability or likelihood value for each. For example, locations near a border
could be identified as belonging to two or more boroughs.
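The difference between the two types can be sketched in a few lines of Python. This is only an illustration: the cluster centers and points are made up, and the inverse-distance weighting used for the soft case is an assumption chosen for simplicity, not a method taken from this assignment.

```python
import math

def hard_assign(point, centers):
    """Return the index of the single nearest center (hard clustering)."""
    distances = [math.dist(point, c) for c in centers]
    return distances.index(min(distances))

def soft_assign(point, centers, eps=1e-9):
    """Return one membership weight per center, summing to 1 (soft clustering).

    Assumption: weights are proportional to inverse distance. This is an
    illustrative choice only.
    """
    inverse = [1.0 / (math.dist(point, c) + eps) for c in centers]
    total = sum(inverse)
    return [w / total for w in inverse]

centers = [(0.0, 0.0), (10.0, 0.0)]  # hypothetical cluster centers
inner_point = (1.0, 0.0)             # clearly closer to the first center
border_point = (5.0, 0.0)            # equidistant from both centers

print(hard_assign(inner_point, centers))   # one cluster index
print(soft_assign(border_point, centers))  # one weight per cluster
```

The hard assignment returns a single cluster index, while the soft assignment returns a weight for every cluster, so a border point can be shared between clusters.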
K-means algorithm
K-means is one of the most popular clustering methods. The algorithm was
published decades ago, and many improvements have been made to it since.
The algorithm tries to find groups by minimizing the distance between the
observations and the center of their group. It converges to a locally optimal
solution, which is not guaranteed to be the global optimum. The distances are
measured based on the coordinates of the observations.
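Written as a formula, the quantity k-means is usually described as minimizing is the within-cluster sum of squared distances. The notation is introduced here and does not come from the assignment: k is the number of clusters, C_j is the set of observations in cluster j, mu_j is its centroid, and x_i is an observation.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```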
Algorithm
The algorithm works as follows:
Step 1: Choose k initial cluster centers (centroids) at random in the feature
space.
Step 2: Assign each observation to its nearest centroid, which minimizes the
distance between the observations and their cluster center. This yields k groups
of observations.
Step 3: Shift each centroid to the mean of the coordinates within its group.
Step 4: Reassign each observation to its nearest new centroid. New boundaries are
created, so observations may move from one group to another.
Repeat steps 3 and 4 until no observation changes groups.
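The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function names, the seeded initialization from existing observations, and the six sample points are all assumptions made for the example.

```python
import math
import random

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: pick k distinct observations as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Steps 2 and 4: assign every observation to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        # Stop when no observation changes group.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its group.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a group is empty
                centroids[j] = mean_point(members)
    return centroids, labels

# Made-up data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

Because the starting centroids are random, different seeds can lead to different local optima on harder data, which is why k-means is often run several times.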
Algorithm
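Install and import the required packages.
Load the dataset.
Define K (the number of clusters).
Run k-means clustering.
Calculate the inertia for the given K, and repeat for several values of K.
Select the smallest K beyond which the inertia stops decreasing sharply (the
elbow point) and use it for prediction.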
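The inertia and K-selection steps might look like the sketch below. The assignment does not name a package or a dataset, so this version uses only the Python standard library and a synthetic stand-in dataset; the function names, the restart count, and the blob centers are all assumptions. In practice a library would normally do this work; scikit-learn's `KMeans`, for example, exposes the same quantity as `inertia_`.

```python
import math
import random

def kmeans_inertia(points, k, seed=0, max_iter=100):
    """Run a basic k-means and return its inertia.

    Inertia here means the sum of squared distances from each point
    to the centroid of the cluster it was assigned to.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return sum(math.dist(p, centroids[lab]) ** 2
               for p, lab in zip(points, labels))

def best_inertia(points, k, restarts=10):
    """Lowest inertia over several random initializations."""
    return min(kmeans_inertia(points, k, seed=s) for s in range(restarts))

# Synthetic stand-in dataset: three blobs around made-up centers.
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
points = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
          for cx, cy in blob_centers for _ in range(30)]

# Inertia for each candidate K from 1 to 6.
inertias = {k: best_inertia(points, k) for k in range(1, 7)}
for k, value in inertias.items():
    print(f"k={k}  inertia={value:.1f}")
```

Inertia generally keeps falling as K grows, so the lowest value alone is not a useful criterion. The usual reading is to plot inertia against K and pick the K where the curve bends and further increases bring only small improvements.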