Clustering and K-Means Algorithm
Unsupervised learning
Training set: $\{x^1, x^2, \dots, x^N\}$ (inputs only, no labels)
Clustering
• It is the task of identifying subgroups in the data such that data points
in the same subgroup (cluster) are very similar while data points in
different clusters are very different.
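• 1.Initialize the centroids $m_i$, $i = 1, \dots, k$ (for example, with k randomly chosen data points)
• 2.Assign each point $x^t$ to its nearest centroid:
$b_i^t = 1$ if $\|x^t - m_i\| = \min_j \|x^t - m_j\|$, 0 otherwise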
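• 3.Update the centroids:
$m_i = \dfrac{\sum_t b_i^t x^t}{\sum_t b_i^t}$
(each centroid becomes the mean of the points with $b_i^t = 1$, i.e. the points assigned to it)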
• 4.Repeat steps 2 and 3 until the centroids converge to stable values
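The iteration above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function names `nearest` and `kmeans`, the `max_iter` and `seed` parameters, the choice of k random data points as initial centroids, and the rule that an empty cluster keeps its old centroid are all assumptions made for the example.

```python
import math
import random


def nearest(point, centroids):
    """Return the index i of the centroid m_i closest to the point."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, max_iter=100, seed=0):
    """Illustrative k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: initialise the centroids with k data points picked at random
    # (an assumption; other initialisation schemes exist).
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid (b_i^t = 1 for that i).
        labels = [nearest(p, centroids) for p in points]
        # Step 3: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Assumed rule: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[i])
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Usage: two well-separated groups of 2-D points.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centres, assignment = kmeans(pts, k=2)
```

The result depends on the initial centroids, so in practice the algorithm is often run several times from different starting points.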
An empirical study
[Figure: initial centres]
• Look at the picture above: it does not use all 256^3 possible colors.
• Suppose we want to represent it using even fewer colors.
• K-means can be applied to this task, which is called color quantization.
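As a rough illustration of color quantization, the sketch below treats every pixel as a point in RGB space, runs k-means on those points, and replaces each pixel by the mean color of its cluster. The names `closest` and `quantize_colors`, the `max_iter` and `seed` parameters, and the representation of an image as a flat list of `(r, g, b)` tuples are assumptions for the example; it also assumes the image holds at least k distinct colors.

```python
import math
import random


def closest(pixel, palette):
    """Index of the palette color nearest to the pixel in RGB space."""
    return min(range(len(palette)), key=lambda i: math.dist(pixel, palette[i]))


def quantize_colors(pixels, k, max_iter=20, seed=0):
    """Map each (r, g, b) pixel to one of k representative colors."""
    rng = random.Random(seed)
    # Assumed initialisation: k distinct colors that occur in the image.
    # random.sample raises ValueError if there are fewer than k of them.
    palette = rng.sample(sorted(set(pixels)), k)
    for _ in range(max_iter):
        labels = [closest(p, palette) for p in pixels]
        new_palette = []
        for i in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == i]
            if members:
                # New palette color = mean of the pixels assigned to it.
                new_palette.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_palette.append(palette[i])
        if new_palette == palette:
            break
        palette = new_palette
    # Final assignment, with palette colors rounded to integer channel values.
    labels = [closest(p, palette) for p in pixels]
    rounded = [tuple(round(c) for c in color) for color in palette]
    return [rounded[lab] for lab in labels], rounded


# Usage: five pixels (three reddish, two bluish) reduced to two colors.
image = [(250, 0, 0), (255, 5, 5), (0, 0, 250), (5, 5, 255), (245, 10, 0)]
quantized, palette = quantize_colors(image, k=2)
```

The quantized image needs only the k palette entries plus one small index per pixel, which is where the saving comes from.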
[Figure: the image after K-means color quantization, with panels for k=2, k=3, k=10, K=15 and K=20]
T-shirt sizing
[Figure: scatter plot with axes Height and Weight]
K-Means and Globular/Non-Globular structures
Cost function
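Assuming the cost meant here is the usual k-means objective, the sum of squared distances from each point to its assigned (nearest) centroid, $J = \sum_t \sum_i b_i^t \|x^t - m_i\|^2$, it can be computed as in the sketch below. The function name `kmeans_cost` is hypothetical.

```python
import math


def kmeans_cost(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(math.dist(p, m) ** 2 for m in centroids) for p in points)


# Usage: three points on a line, scored against one and then two centroids.
data = [(0, 0), (2, 0), (10, 0)]
print(kmeans_cost(data, [(4, 0)]))           # 56.0
print(kmeans_cost(data, [(1, 0), (10, 0)]))  # 2.0
```

Each k-means iteration can only lower this value or leave it unchanged, and adding well-placed centroids lowers it as well, as the two calls above show.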
[Figure: two panels with axes Height and Weight]
Implementation of K-Means Algorithm