Module 5
Introduction to clustering
• Let's take a simple example to understand the clustering technique.

Customer Name | Region    | Monthly Income | Age | Phone Model | Laptop Model
A             | Bangalore | 50,000         | 29  | iPhone      | MacBook
B             | Bangalore | 35,000         | 34  | Motorola    | Dell
C             | Mumbai    | 80,000         | 36  | iPhone      | Dell
D             | Mumbai    | 40,000         | 26  | iPhone      | Dell
E             | Mumbai    | 55,000         | 39  | iPhone      | MacBook
• Clustering by Region: Bangalore → {A, B}; Mumbai → {C, D, E}
• Clustering by Monthly Income: greater than 50,000 → {C, E}; 50,000 or less → {A, B, D}
• Clustering by Age: greater than 35 → {C, E}; between 30 and 35 → {B}; less than 30 → {A, D}
• Clustering by Phone Model: iPhone → {A, C, D, E}; Motorola → {B}
• Clustering by Laptop Model: MacBook → {A, E}; Dell → {B, C, D}
Introduction to clustering
• Properties of a cluster
• Property 1: Cohesion (Intra-cluster
Similarity)
• All the data points in a cluster should be similar to
each other.
• This property means that all data points within the
same cluster should be as similar to each other as
possible.
• In other words, the points within a single cluster
should be close together, indicating a strong internal
similarity. The more similar the data points are
within a cluster, the better the cohesion of that
cluster.
Introduction to clustering
• Properties of a cluster
• Property 2: Separation (Inter-cluster
Dissimilarity)
• The data points from different clusters should be as
different as possible
• This property focuses on ensuring that data points
from different clusters are as different from each
other as possible.
• This means that the clusters should be well-
separated, with a clear distinction between them.
The more dissimilar the clusters are, the better the
separation between them.
Overview of distance metrics
• Distance metrics are a key part of several machine learning algorithms.
• They are used in both supervised and unsupervised learning, generally to calculate the similarity
between data points.
• An effective distance metric improves the performance of our machine learning model, whether that’s
for classification tasks or clustering.
• Distance metrics allow us to numerically quantify how similar two points are by calculating the distance between them.
• When a distance metric yields a small value, the two points are similar; when it yields a large value, they are different.
Overview of distance metrics
• Types of Distance Metrics in Machine Learning
1. Euclidean Distance
2. Manhattan Distance
3. Minkowski Distance
4. Hamming Distance
Overview of distance metrics
• Euclidean Distance
• Euclidean Distance represents the shortest
distance between two vectors.
• It is the square root of the sum of squares of
differences between corresponding elements.
• Most machine learning algorithms, including
K-Means, use this distance metric to measure
the similarity between observations.
Overview of distance metrics
• Euclidean Distance
• Let's say we have two points A(p₁, p₂) and B(q₁, q₂), as shown in the figure.
• The Euclidean distance between these two points, A and B, is
  d(A, B) = √[(q₁ − p₁)² + (q₂ − p₂)²]
• We use this formula when we are dealing with 2 dimensions. We can generalize this for an n-dimensional space as
  d(p, q) = √( Σᵢ₌₁ⁿ (pᵢ − qᵢ)² )
Where,
n = number of dimensions
pᵢ, qᵢ = the i-th coordinates of data points p and q
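To make the formula concrete, here is a minimal sketch (not from the original slides; the function name and example points are illustrative) of the n-dimensional Euclidean distance in Python:

```python
import math

def euclidean_distance(p, q):
    """Square root of the sum of squared differences between corresponding elements."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# 2-D example with two illustrative points
print(euclidean_distance((2, 10), (5, 8)))  # ~3.61
```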
Overview of distance metrics
• Euclidean distance is a useful metric in many machine learning algorithms, such as:
• K-nearest neighbors
• K-means clustering
Overview of distance metrics
• Manhattan Distance
• Manhattan Distance is the sum of absolute
differences between points across all the
dimensions.
• Also known as the City Block distance.
• This involves measuring the distance
between two points by summing the
differences in their coordinates along each
dimension.
• It is often used in cases where movement
can only occur along grid lines.
Overview of distance metrics
• Manhattan Distance
• Manhattan Distance is the sum of absolute differences between points across all the dimensions.
• We can represent Manhattan Distance as
  d(p, q) = Σᵢ₌₁ⁿ |pᵢ − qᵢ|
• So, the Manhattan distance in a 2-dimensional space is given as
  d(A, B) = |q₁ − p₁| + |q₂ − p₂|
Where,
n = number of dimensions
pᵢ, qᵢ = data points
Overview of distance metrics
• Comparison between Manhattan Distance and Euclidean distance
• While Manhattan distance measures the path along grid lines, Euclidean distance measures the straight-line distance between two points.
• 2-D example: consider two points A(1, 1) and B(4, 5):
• Manhattan distance: |x₁ − x₂| + |y₁ − y₂| = |1 − 4| + |1 − 5| = 3 + 4 = 7 units
• Euclidean distance: √[(1 − 4)² + (1 − 5)²] = √25 = 5 units
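A quick check of the comparison above (an illustrative snippet, not from the slides):

```python
# Manhattan vs. Euclidean distance for A(1, 1) and B(4, 5)
ax, ay, bx, by = 1, 1, 4, 5
manhattan = abs(ax - bx) + abs(ay - by)               # 3 + 4 = 7
euclidean = ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5  # sqrt(25) = 5
print(manhattan, euclidean)  # 7 5.0
```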
Overview of distance metrics
• Applications of Manhattan Distance:
1. Pathfinding algorithms (e.g., A* algorithm)
2. Clustering techniques (e.g., K-Means clustering)
Overview of distance metrics
• Minkowski Distance
• The Minkowski distance is a generalization of the Euclidean and Manhattan distances.
• It allows the flexibility to consider different power values h, determining whether the distance calculation is influenced more by the larger differences or the smaller ones:
  D(p, q) = ( Σᵢ₌₁ⁿ |pᵢ − qᵢ|ʰ )^(1/h)
• For h = 1, this reduces to the Manhattan distance:
  D(p, q) = Σᵢ₌₁ⁿ |pᵢ − qᵢ|
• For h = 2, this reduces to the Euclidean distance:
  D(p, q) = ( Σᵢ₌₁ⁿ |pᵢ − qᵢ|² )^(1/2)
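A small sketch (illustrative names and points) showing that the Minkowski distance with h = 1 and h = 2 reproduces the Manhattan and Euclidean distances:

```python
def minkowski_distance(p, q, h):
    """Generalized distance: (sum of |p_i - q_i|^h) raised to the power 1/h."""
    return sum(abs(pi - qi) ** h for pi, qi in zip(p, q)) ** (1 / h)

a, b = (1, 1), (4, 5)
print(minkowski_distance(a, b, 1))  # 7.0 -> Manhattan
print(minkowski_distance(a, b, 2))  # 5.0 -> Euclidean
```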
Overview of distance metrics
• Minkowski Distance is used in machine learning algorithms such as:
1. K-Nearest Neighbors (KNN)
2. Learning Vector Quantization (LVQ)
3. Self-Organizing Map (SOM)
4. K-Means Clustering
Overview of distance metrics
• Hamming Distance
• Hamming distance is used to measure the difference between two binary vectors: it counts the number of positions at which the corresponding bits are different.
• Hamming distance is all about calculating the similarity between two strings of equal length.
• It is useful when calculating the distance between observations for which we have only binary features.
• Formula: D(A, B) = Σᵢ₌₁ⁿ 𝛿(aᵢ, bᵢ), where
  • aᵢ represents the i-th symbol of string A
  • bᵢ represents the i-th symbol of string B
  • 𝛿(aᵢ, bᵢ) is an indicator function that returns 0 if aᵢ and bᵢ are the same and 1 if they are different
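A minimal sketch of the Hamming distance for equal-length strings or bit vectors (illustrative code, not from the slides):

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings (or bit vectors) differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length inputs")
    return sum(1 for ai, bi in zip(a, b) if ai != bi)

print(hamming_distance("10110", "11100"))  # 2
```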
Major clustering approaches
• Approaches used in clustering algorithms
1. Hard Clustering
• Definition: In hard clustering, each data point is assigned to exactly one cluster. This means that every point
definitively belongs to a single cluster without any ambiguity.
Major clustering approaches
• Approaches used in clustering algorithms
2. Soft Clustering
• Definition: In soft clustering, each data point can belong to multiple clusters, with a certain probability or
degree of membership. Instead of a hard assignment to a single cluster, data points are assigned a set of
probabilities that sum to 1, representing their likelihood of belonging to each cluster.
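As an illustration of the hard/soft distinction (a sketch assuming scikit-learn is available; the toy data points are made up), K-Means produces one label per point while a Gaussian mixture model returns per-cluster membership probabilities:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

X = np.array([[1.0, 1.2], [1.1, 0.9], [5.0, 5.1], [5.2, 4.8], [3.0, 3.0]])

hard = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
soft = GaussianMixture(n_components=2, random_state=0).fit(X).predict_proba(X)

print(hard)            # hard clustering: one cluster label per point
print(soft.round(2))   # soft clustering: membership probabilities that sum to 1
```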
Learning with Clustering
5.1
Introduction to clustering with overview of distance metrics
Major clustering approaches
5.2
Graph Based Clustering: Clustering with minimal spanning tree
Model based Clustering: Expectation Maximization Algorithm
Density Based Clustering: DBSCAN
Types of clustering
• Partition Based: K-Means, K-Medoids
• Hierarchical: BIRCH, CURE, ROCK
• Density Based: DBSCAN, DENCLUE, OPTICS
• Grid Based: STING, CLIQUE
• Graph Based: MST, CLICK
• Model Based: EM, COBWEB
Partition Based Clustering: K-Means
K-Means Clustering Algorithm:
• Clustering is dividing data points into homogeneous classes or clusters
• Points in the same group are as similar as possible
• Points in different groups are as dissimilar as possible
Partition Based Clustering: K-Means
K-Means Clustering Algorithm:
• The K-means clustering algorithm tries to group similar items in the form of clusters.
• The number of groups is represented by K. If K = 2, there will be two clusters.
• It is a centroid-based algorithm where each cluster is associated with a centroid.
• The main aim of this algorithm is to minimize the sum of the distances between the data
points and their corresponding cluster centroids.
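A minimal sketch of the K-Means loop described above (illustrative Python, not the exact implementation used in the slides; it assumes no cluster ever becomes empty):

```python
import numpy as np

def kmeans(X, k, initial_centroids, max_iter=100):
    """Assign each point to the nearest centroid, recompute centroids, repeat until stable."""
    centroids = np.array(initial_centroids, dtype=float)
    for _ in range(max_iter):
        # Euclidean distance from every point to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # new centroid = mean of the points assigned to each cluster
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # no change in centroids -> no further reassignment
        centroids = new_centroids
    return labels, centroids
```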
Partition Based Clustering: K-Means
Cluster the following eight points (with (x, y) representing locations) into three clusters:
A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)
Solution:
Step 1: Select the number k of clusters: K = 3
Step 2: Select initial centroids; here A1(2, 10), A4(5, 8) and A7(1, 2) are taken as the initial centroids.
Step 3: Compute the Euclidean distance from each point to each centroid, e.g.
d(A1, A1) = √[(2 − 2)² + (10 − 10)²] = 0.00
Partition Based Clustering: K-Means
Iteration 1: distances to the initial centroids C1(2, 10), C2(5, 8), C3(1, 2)

Data Point | x | y  | Dist. to C1(2,10) | Dist. to C2(5,8) | Dist. to C3(1,2) | Cluster
A1         | 2 | 10 | 0.00              | 3.61             | 8.06             | 1
A2         | 2 | 5  | 5.00              | 4.24             | 3.16             | 3
A3         | 8 | 4  | 8.49              | 5.00             | 7.28             | 2
A4         | 5 | 8  | 3.61              | 0.00             | 7.21             | 2
A5         | 7 | 5  | 7.07              | 3.61             | 6.71             | 2
A6         | 6 | 4  | 7.21              | 4.12             | 5.39             | 2
A7         | 1 | 2  | 8.06              | 7.21             | 0.00             | 3
A8         | 4 | 9  | 2.24              | 1.41             | 7.62             | 2

Step 4: Calculate the new centroid for each cluster
Cluster 1 = {A1}: centroid stays (2, 10)
Cluster 2 = {A3, A4, A5, A6, A8}: centroid = ((8+5+7+6+4)/5, (4+8+5+4+9)/5) = (6, 6)
Cluster 3 = {A2, A7}: centroid = ((2+1)/2, (5+2)/2) = (1.5, 3.5)

Step 5: Reassign each point to the nearest new centroid.
Step 6: If any reassignment has occurred, go back to Step 4.

Partition Based Clustering: K-Means
Iteration 2: distances to the centroids C1(2, 10), C2(6, 6), C3(1.5, 3.5)

Data Point | x | y  | Dist. to C1(2,10) | Dist. to C2(6,6) | Dist. to C3(1.5,3.5) | Old Cluster | New Cluster
A1         | 2 | 10 | 0.00              | 5.66             | 6.52                 | 1           | 1
A2         | 2 | 5  | 5.00              | 4.12             | 1.58                 | 3           | 3
A3         | 8 | 4  | 8.49              | 2.83             | 6.52                 | 2           | 2
A4         | 5 | 8  | 3.61              | 2.24             | 5.70                 | 2           | 2
A5         | 7 | 5  | 7.07              | 1.41             | 5.70                 | 2           | 2
A6         | 6 | 4  | 7.21              | 2.00             | 4.53                 | 2           | 2
A7         | 1 | 2  | 8.06              | 6.40             | 1.58                 | 3           | 3
A8         | 4 | 9  | 2.24              | 3.61             | 6.04                 | 2           | 1

A8 has been reassigned, so the new centroids are computed again:
Cluster 1 = {A1, A8}: centroid = ((2+4)/2, (10+9)/2) = (3, 9.5)
Cluster 2 = {A3, A4, A5, A6}: centroid = ((8+5+7+6)/4, (4+8+5+4)/4) = (6.5, 5.25)
Cluster 3 = {A2, A7}: centroid = ((2+1)/2, (5+2)/2) = (1.5, 3.5)

Partition Based Clustering: K-Means
Iteration 3: distances to the centroids C1(3, 9.5), C2(6.5, 5.25), C3(1.5, 3.5)

Data Point | x | y  | Dist. to C1(3,9.5) | Dist. to C2(6.5,5.25) | Dist. to C3(1.5,3.5) | Old Cluster | New Cluster
A1         | 2 | 10 | 1.12               | 6.54                  | 6.52                 | 1           | 1
A2         | 2 | 5  | 4.61               | 4.51                  | 1.58                 | 3           | 3
A3         | 8 | 4  | 7.43               | 1.95                  | 6.52                 | 2           | 2
A4         | 5 | 8  | 2.50               | 3.13                  | 5.70                 | 2           | 1
A5         | 7 | 5  | 6.02               | 0.56                  | 5.70                 | 2           | 2
A6         | 6 | 4  | 6.26               | 1.35                  | 4.53                 | 2           | 2
A7         | 1 | 2  | 7.76               | 6.39                  | 1.58                 | 3           | 3
A8         | 4 | 9  | 1.12               | 4.51                  | 6.04                 | 1           | 1

A4 has been reassigned, so the new centroids are computed again:
Cluster 1 = {A1, A4, A8}: centroid = ((2+5+4)/3, (10+8+9)/3) = (3.67, 9)
Cluster 2 = {A3, A5, A6}: centroid = ((8+7+6)/3, (4+5+4)/3) = (7, 4.33)
Cluster 3 = {A2, A7}: centroid = ((2+1)/2, (5+2)/2) = (1.5, 3.5)

The process repeats until no point changes its cluster. The resulting clusters are:
Cluster 1 = {A1, A4, A8}, Cluster 2 = {A3, A5, A6}, Cluster 3 = {A2, A7},
with centroids (3.67, 9), (7, 4.33) and (1.5, 3.5).
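The worked example can be checked with a short script (a sketch assuming scikit-learn is available; the initial centroids A1, A4, A7 are passed explicitly so the run follows the same starting point as above):

```python
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[2, 10], [2, 5], [8, 4], [5, 8], [7, 5], [6, 4], [1, 2], [4, 9]])
init = np.array([[2, 10], [5, 8], [1, 2]])  # A1, A4, A7 as initial centroids

km = KMeans(n_clusters=3, init=init, n_init=1, max_iter=100).fit(points)
print(km.labels_)           # cluster index for each of A1..A8
print(km.cluster_centers_)  # final centroids, expected near (3.67, 9), (7, 4.33), (1.5, 3.5)
```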
Model based Clustering: Expectation Maximization Algorithm
Expectation Maximization Algorithm: Example
First Experiment
• We choose one of the two coins 5 times
• Each time, we toss the chosen coin 10 times
First, assume we know which coin was used for each set of tosses:

Coin | Toss sequence         | Result
B    | H T T T H H T H T H  | 5 H, 5 T
A    | H H H H T H H H H H  | 9 H, 1 T
A    | H T H H H H H T H H  | 8 H, 2 T
B    | H T H T T T H H T T  | 4 H, 6 T
A    | T H H H T H H H T H  | 7 H, 3 T

Totals: Coin A → 24 H, 6 T; Coin B → 9 H, 11 T

The maximum-likelihood estimates of the head probabilities are
𝜃₁ = (number of heads using coin A) / (total number of flips using coin A) = 24 / (24 + 6) = 0.8
𝜃₂ = (number of heads using coin B) / (total number of flips using coin B) = 9 / (9 + 11) = 0.45
Model based Clustering: Expectation Maximization Algorithm
Expectation Maximization Algorithm: Example
• Now assume a more challenging problem: we don't know the identities of the coins used for each set of tosses (we treat them as hidden variables).

Step 01: Initial Values
Consider 𝜃₁ = 0.60 (coin A) and 𝜃₂ = 0.50 (coin B).

Step 02: E-step
For each toss sequence, compute the likelihood of the sequence under each coin and normalise, e.g. for coin A: P(A) / (P(A) + P(B)). Then split the observed heads and tails between the coins in proportion to these probabilities:

Toss sequence        | Result   | P(coin A) | P(coin B) | Expected counts for A | Expected counts for B
H T T T H H T H T H  | 5 H, 5 T | 0.45      | 0.55      | 2.2 H, 2.2 T          | 2.8 H, 2.8 T
H H H H T H H H H H  | 9 H, 1 T | 0.80      | 0.20      | 7.2 H, 0.8 T          | 1.8 H, 0.2 T
H T H H H H H T H H  | 8 H, 2 T | 0.73      | 0.27      | 5.9 H, 1.5 T          | 2.1 H, 0.5 T
H T H T T T H H T T  | 4 H, 6 T | 0.35      | 0.65      | 1.4 H, 2.1 T          | 2.6 H, 3.9 T
T H H H T H H H T H  | 7 H, 3 T | 0.65      | 0.35      | 4.5 H, 1.9 T          | 2.5 H, 1.1 T

Totals: Coin A → 21.3 H, 8.6 T; Coin B → 11.7 H, 8.4 T

Step 03: M-step
Re-estimate the coin biases from the expected counts:
𝜃₁ = 21.3 / (21.3 + 8.6) ≈ 0.71
𝜃₂ = 11.7 / (11.7 + 8.4) ≈ 0.58
Repeat the E-step and M-step with the updated values until 𝜃₁ and 𝜃₂ converge.
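A compact sketch of the E-step and M-step for this coin example (illustrative Python assuming standard binomial likelihoods; the first iteration should reproduce the table above up to rounding):

```python
# Each row: (number of heads, number of tails) in one set of 10 tosses
tosses = [(5, 5), (9, 1), (8, 2), (4, 6), (7, 3)]
theta1, theta2 = 0.60, 0.50  # initial guesses for coins A and B

for _ in range(10):  # repeat E and M steps
    heads_a = tails_a = heads_b = tails_b = 0.0
    for h, t in tosses:
        # E-step: likelihood of the sequence under each coin, then normalise
        like_a = (theta1 ** h) * ((1 - theta1) ** t)
        like_b = (theta2 ** h) * ((1 - theta2) ** t)
        p_a = like_a / (like_a + like_b)
        p_b = 1 - p_a
        # expected head/tail counts attributed to each coin
        heads_a += p_a * h; tails_a += p_a * t
        heads_b += p_b * h; tails_b += p_b * t
    # M-step: re-estimate the biases from the expected counts
    theta1 = heads_a / (heads_a + tails_a)
    theta2 = heads_b / (heads_b + tails_b)

print(round(theta1, 2), round(theta2, 2))  # converged estimates
```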
Disadvantages of EM algorithm
• The convergence of the EM algorithm is very slow.
Model based Clustering: Expectation Maximization Algorithm
Applications of EM algorithm
• Data clustering,
• Natural language processing (NLP),
• Computer vision,
• Image reconstruction,
• Structural engineering
Learning with Clustering
5.1
Introduction to clustering with overview of distance metrics
Major clustering approaches
5.2
Graph Based Clustering: Clustering with minimal spanning tree
Model based Clustering: Expectation Maximization Algorithm
Density Based Clustering: DBSCAN
Density Based Clustering: DBSCAN
Density based clustering:
• Density-based clustering is a method used in data analysis to identify clusters in a dataset
based on the density of data points in a given region.
• The idea is that clusters are areas of high data point density, separated by areas of low
density.
• This approach does not require the number of clusters to be specified beforehand, unlike
other methods such as k-means.
• It's an unsupervised learning method.
Density Based Clustering: DBSCAN
DBSCAN:
• DBSCAN is the abbreviation for Density-Based Spatial Clustering of Applications with
Noise.
• It is an unsupervised clustering algorithm.
• DBSCAN clustering can work with clusters of any size from huge amounts of data and
can work with datasets containing a significant amount of noise.
• It is basically based on the criteria of a minimum number of points within a region.
• DBSCAN algorithm can cluster densely grouped points efficiently into one cluster.
• It can identify local density in the data points among large datasets. DBSCAN can very
effectively handle outliers.
• An advantage of DBSCAN over the K-means algorithm is that the number of centroids
need not be known beforehand in the case of DBSCAN.
Density Based Clustering: DBSCAN
DBSCAN:
• DBSCAN algorithm depends upon two parameters epsilon and minPoints.
• Epsilon is defined as the radius of each data point around which the density is considered.
• minPoints is the number of points required within the radius so that the data point becomes a
core point.(𝑚𝑖𝑛𝑃𝑜𝑖𝑛𝑡𝑠 = 4)
[Figure: sample data points B, C and D shown with their epsilon (e) radii]
Density Based Clustering: DBSCAN
DBSCAN:
• We can see that point B has no points inside its epsilon (e) radius.
• Hence it is a noise point.
[Figure: the epsilon (e) neighbourhood around object A]
Density Based Clustering: DBSCAN
DBSCAN:
Density reachable:
An object q is density-reachable from p w.r.t ε and
MinPts if there is a chain of objects q1, q2…, qn,
with q1=p, qn=q such that qi+1 is directly
density-reachable from qi w.r.t ε and MinPts for all
1 <= i <= n
Density Based Clustering: DBSCAN
DBSCAN: minPoints = 4
Density reachable:
• Point X is directly density-reachable from object Y.
[Figure: X lies inside the epsilon neighbourhood of object Y; points B and C are nearby]
Density Based Clustering: DBSCAN
DBSCAN:
Density connectivity:
Object q is density-connected to object p w.r.t ε
and MinPts if there is an object o such that both p
and q are density-reachable from o w.r.t ε and
MinPts.
Density Based Clustering: DBSCAN
DBSCAN Algorithm:
1. For each point, find all points within its epsilon (e) radius.
2. Mark a point as a core point if it has at least minPoints points in its neighbourhood.
3. Connect core points that lie within epsilon of each other into the same cluster.
4. Assign each non-core point within epsilon of a core point to that core point's cluster (border point).
5. Label the remaining points as noise.
Density Based Clustering: DBSCAN
DBSCAN Example:
Apply the DBSCAN algorithm to the given data points and create the clusters with
minPts = 4 and epsilon (e) = 1.9.
Use the Euclidean distance and calculate the distance between each pair of points.
P1 (3, 7)
P2 (4, 6)
P3 (5, 5)
P4 (6, 4)
P5 (7, 3)
P6 (6, 2)
P7 (7, 2)
P8 (8, 4)
P9 (3, 3)
P10 (2, 6)
P11 (3, 5)
P12 (2, 4)
Density Based Clustering: DBSCAN
DBSCAN Example: minPts = 4 and epsilon (e) = 1.9
Point | Points within epsilon (e = 1.9)
P1    | P2, P10
P2    | P1, P3, P11
P3    | P2, P4
P4    | P3, P5
P5    | P4, P6, P7, P8
P6    | P5, P7
P7    | P5, P6
P8    | P5
P9    | P12
P10   | P1, P11
P11   | P2, P10, P12
P12   | P9, P11
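This example can be run with scikit-learn's DBSCAN (a sketch under the assumption that sklearn is available; note that sklearn's min_samples counts the point itself when deciding whether it is a core point):

```python
import numpy as np
from sklearn.cluster import DBSCAN

points = np.array([[3, 7], [4, 6], [5, 5], [6, 4], [7, 3], [6, 2],
                   [7, 2], [8, 4], [3, 3], [2, 6], [3, 5], [2, 4]])

db = DBSCAN(eps=1.9, min_samples=4).fit(points)
print(db.labels_)               # cluster index per point; -1 marks noise points
print(db.core_sample_indices_)  # indices of the core points
```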
Density Based Clustering: DBSCAN
DBSCAN Advantages
• Robust to outliers : It is robust to outliers as it defines clusters based on dense regions of data, and
isolated points are treated as noise.
• No need to specify clusters : Unlike some clustering algorithms, DBSCAN does not require the user
to specify the number of clusters beforehand, making it more flexible and applicable to a variety of
datasets.
• Can find arbitrary shaped clusters : DBSCAN can identify clusters with complex shapes and is not
constrained by assumptions of cluster shapes, making it suitable for data with irregular structures.
• Only 2 hyperparameters to tune : DBSCAN has only two primary hyperparameters to tune: “eps”
(distance threshold for defining neighborhood) and “minPoints” (minimum number of points required to
form a dense region). This simplicity can make parameter tuning more straightforward.
Density Based Clustering: DBSCAN
DBSCAN Disadvantages
• Sensitivity to hyperparameters : The performance of DBSCAN can be sensitive to the choice of its
hyperparameters, especially the distance threshold (eps) and the minimum number of points
(min_samples). Suboptimal parameter selection may lead to under-segmentation or over-segmentation.
• Difficulty with varying density clusters : DBSCAN struggles with clusters of varying densities. It
may fail to connect regions with lower point density to the rest of the cluster, leading to suboptimal
cluster assignments in datasets with regions of varying densities.
• Does not predict : Unlike some clustering algorithms, DBSCAN does not predict the cluster
membership of new, unseen data points. Once the model is trained, it is applied to the existing dataset
without the ability to generalize to new observations outside the training set.
Density Based Clustering: DBSCAN
DBSCAN Applications
• Spatial Data Analysis: DBSCAN is particularly well-suited for spatial data clustering due to its ability to find
clusters of arbitrary shapes, which is common in geographic data. It’s used in applications like identifying regions of
similar land use in satellite images or grouping locations with similar activities in GIS (Geographic Information
Systems).
• Anomaly Detection: The algorithm’s effectiveness in distinguishing noise or outliers from core clusters makes it
useful in anomaly detection tasks, such as detecting fraudulent activities in banking transactions or identifying
unusual patterns in network traffic.
• Customer Segmentation: In marketing and business analytics, DBSCAN can be used for customer segmentation
by identifying clusters of customers with similar buying behaviors or preferences.
• Environmental Studies: DBSCAN can be used in environmental monitoring, for example, to cluster areas based
on pollution levels or to identify regions with similar environmental characteristics.
• Traffic Analysis: In traffic and transportation studies, DBSCAN is useful for identifying hotspots of traffic
congestion or for clustering routes with similar traffic patterns.
• Machine Learning and Data Mining: More broadly, in the fields of machine learning and data mining,
DBSCAN is employed for exploratory data analysis, helping to uncover natural structures or patterns in data that
might not be apparent otherwise.
Learning with Clustering
5.1
Introduction to clustering with overview of distance metrics
Major clustering approaches
5.2
Graph Based Clustering: Clustering with minimal spanning tree
Model based Clustering: Expectation Maximization Algorithm
Density Based Clustering: DBSCAN
Graph Based Clustering: Clustering with MST
Clustering with minimal spanning tree
• Clustering using a Minimum Spanning Tree (MST) is an approach that involves
transforming a set of data points into a graph, where the edges between the points
represent some distance or dissimilarity metric.
• The goal is to connect all points in the data set with the minimal total edge weight,
without forming any cycles, which results in the Minimum Spanning Tree.
• Construct a Graph: Create a fully connected graph where each vertex represents a data point. The edges
between vertices are weighted based on the distance (e.g., Euclidean distance) between data points.
• Compute the Minimum Spanning Tree: Use an algorithm like Kruskal’s or Prim’s to construct the MST.
The MST is a subgraph that connects all the points with the minimum total edge weight and no cycles.
• Remove Long Edges: Once the MST is constructed, the idea is to remove the longest edges (those that
span different clusters). These edges typically represent points that are far apart and hence, belong to
different clusters. By removing a certain number of edges (typically the ones with the largest weights),
you create disconnected subgraphs, which represent different clusters.
• Resulting Clusters: After removing the longest edges, the remaining connected components in the graph
represent the clusters.
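A sketch of these steps with SciPy (the helper name mst_clusters and the cut rule of dropping exactly the k−1 heaviest MST edges are illustrative choices, not from the slides):

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import pdist, squareform

def mst_clusters(X, n_clusters):
    """Build the MST of the complete distance graph, cut its k-1 heaviest edges, return labels."""
    dists = squareform(pdist(X))                  # fully connected graph of pairwise distances
    mst = minimum_spanning_tree(dists).toarray()  # MST as a weighted adjacency matrix
    rows, cols = np.nonzero(mst)
    order = np.argsort(mst[rows, cols])           # MST edges sorted by weight (ascending)
    for i in order[-(n_clusters - 1):]:           # remove the longest edges
        mst[rows[i], cols[i]] = 0
    _, labels = connected_components(mst, directed=False)
    return labels

# Points A..G from the example below
X = np.array([[0, 0], [6, 0], [0, 8], [6, 8], [3, 4], [9, 4], [3, 0]])
print(mst_clusters(X, 3))  # three connected components -> three cluster labels
```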
Graph Based Clustering: Clustering with MST
Clustering with minimal spanning tree: Algorithm
1. Make a graph of the given data points.
2. Construct the minimum spanning tree (MST) of the graph.
3. Identify the longest edges in the MST.
4. Remove those edges from the MST; the remaining connected components are the clusters.
[Figure: example point layout A–G]
Graph Based Clustering: Clustering with MST
Clustering with minimal spanning tree: Example
Create 3 clusters for the following data points:
A (0,0), B (6,0), C (0,8), D (6,8), E (3,4), F (9,4), G (3,0)
[Figure: the seven points plotted in the plane]
Graph Based Clustering: Clustering with MST
Clustering with minimal spanning tree: Example
Step 1: Construct a Graph:
1. Create a fully connected graph where each vertex represents a data point.
2. The edges between vertices are weighted based on the distance (e.g., Euclidean distance) between data points.
Step 2: Compute the Minimum Spanning Tree (e.g., using Kruskal's or Prim's algorithm).
[Figure: the fully connected weighted graph and its MST]
Graph Based Clustering: Clustering with MST
Clustering with minimal spanning tree: Example
Step 3: Remove the edges to create clusters
• Remove the longest edges (or inconsistent edges) from the MST, one at a time.
• Removing the longest edge (weight 9 in the figure) splits the tree into 2 clusters.
• Removing the next longest edge (weight 8) splits it into 3 clusters, which is the required number.
[Figure: the MST with its two longest edges removed, leaving three connected components]
Learning with Clustering
5.1
Introduction to clustering with overview of distance metrics
Major clustering approaches
5.2
Graph Based Clustering: Clustering with minimal spanning tree
Model based Clustering: Expectation Maximization Algorithm
Density Based Clustering: DBSCAN
Case Study on EM Algorithm
H T T T H H H H T H
H H H H T H H H H H
H T T T H H H T H H
H T H T T T H H T T
T H H H T T H H T H