K-Means Clustering
Gabriela Ochoa
Clustering
The main task in unsupervised learning
Two Clustering Methods
Characteristics of the Clusters
Measuring Distance
Distance
Euclidean Distance
In the 2D plane, the Euclidean distance between $p_1 = (x_1, y_1)$ and $p_2 = (x_2, y_2)$ is given by the Pythagorean theorem:
$$d(p_1, p_2) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$
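The formula translates directly into code. This is a minimal Python sketch, assuming points are tuples of numbers with the same length; the function name `euclidean` is illustrative. Summing over every coordinate extends the 2D formula to D dimensions. Python 3.8+ also provides `math.dist`, which computes the same quantity.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    # Sum the squared coordinate differences, then take the square root
    # (the Pythagorean formula above, extended to any dimension D).
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

print(euclidean((0, 0), (3, 4)))        # 5.0
print(euclidean((1, 2, 3), (1, 2, 3)))  # 0.0
```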
K-means Clustering Algorithms
What is the centroid of a set of points?
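The centroid of $n$ points is their coordinate-wise mean, $c = \frac{1}{n}\sum_{i=1}^{n} p_i$. A minimal Python sketch, assuming points are equal-length tuples (the helper name `centroid` is illustrative):

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    if not points:
        raise ValueError("the centroid of an empty set is undefined")
    n = len(points)
    # zip(*points) groups all first coordinates together, all second coordinates, etc.
    return tuple(sum(coords) / n for coords in zip(*points))

print(centroid([(1, 4), (1, 2), (0, 2), (1, 7)]))  # (0.75, 3.75)
```

The centroid does not have to be one of the data points, which is why K-means centroids generally move away from the points they started on.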
K-means Algorithm
Example with D = 2 (two-dimensional points) and K = 2 (two clusters)
[Figure: scatter plot of the ten data points, numbered 1 to 10; x axis from 0 to 8]
K-means Algorithm
Randomly choose K = 2 initial centroids (here they coincide with points 6 and 10). Calculate the distance between every point and each centroid.
Distances
Point   C1     C2
1       6.08   5.39
2       5.10   5.10
3       4.24   3.16
4       2.24   5.39
5       1.00   6.40
6       0.00   7.21
7       7.28   6.08
8       6.08   5.00
9       8.06   3.61
10      7.21   0.00
[Figure: the ten numbered points; x axis from 0 to 8]
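The plot coordinates were lost in extraction. The coordinates in this sketch are a reconstruction (an assumption): they reproduce every distance table in this example to two decimals, but their exact placement on the axes is a guess, since shifting or reflecting all points leaves the distances unchanged. Taking points 6 and 10 as the initial centroids, the code recomputes the table above.

```python
import math

# Reconstructed coordinates (assumption), chosen to match the distance tables.
POINTS = {
    1: (1, 8), 2: (1, 7), 3: (3, 5), 4: (1, 4), 5: (1, 2),
    6: (0, 2), 7: (7, 0), 8: (6, 1), 9: (8, 3), 10: (6, 6),
}

def distance_table(points, centroids):
    """Map each point label to its Euclidean distances to each centroid."""
    return {label: [math.dist(p, c) for c in centroids]
            for label, p in points.items()}

# Initial centroids: points 6 and 10 (their own rows show a distance of 0.00).
table = distance_table(POINTS, [POINTS[6], POINTS[10]])
print("Point    C1    C2")
for label, (d1, d2) in table.items():
    print(f"{label:>5}  {d1:4.2f}  {d2:4.2f}")
```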
K-means Algorithm
Assign points to clusters: each point is assigned to its closest centroid.
Distances
Point   C1     C2
1       6.08   5.39
2       5.10   5.10
3       4.24   3.16
4       2.24   5.39
5       1.00   6.40
6       0.00   7.21
7       7.28   6.08
8       6.08   5.00
9       8.06   3.61
10      7.21   0.00
[Figure: the points with the initial centroids c1 and c2 marked; x axis from 0 to 8]
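The assignment step can be sketched as follows, reusing reconstructed coordinates (an assumption: they reproduce the slide tables, but the originals were lost). Point 2 is equidistant from both centroids (5.10); Python's `min` returns the first minimum, so this sketch breaks the tie toward C1, which is consistent with the distances on the next slide. Comparing squared distances selects the same nearest centroid and, with integer coordinates, makes that tie exact.

```python
# Reconstructed coordinates (assumption), chosen to match the distance tables.
POINTS = {
    1: (1, 8), 2: (1, 7), 3: (3, 5), 4: (1, 4), 5: (1, 2),
    6: (0, 2), 7: (7, 0), 8: (6, 1), 9: (8, 3), 10: (6, 6),
}

def sq_dist(p, q):
    """Squared Euclidean distance; it ranks centroids the same way as the distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Map each point label to the index of its nearest centroid (0 = C1, 1 = C2).

    min() keeps the first of several equal candidates, so ties go to the
    lower index; this tie-breaking rule is an assumption of the sketch.
    """
    return {label: min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for label, p in points.items()}

labels = assign(POINTS, [POINTS[6], POINTS[10]])
clusters = {j: [lab for lab, k in labels.items() if k == j] for j in (0, 1)}
print(clusters)  # {0: [2, 4, 5, 6], 1: [1, 3, 7, 8, 9, 10]}
```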
K-means Algorithm
Iteration 1: move each centroid to the mean of the points assigned to it, then recompute all distances.
Distances
Point   C1     C2
1       4.26   5.89
2       3.26   5.23
3       2.57   2.46
4       0.35   4.17
5       1.77   4.55
6       1.90   5.48
7       7.29   4.25
8       5.93   2.95
9       7.29   2.95
10      5.71   2.32
[Figure: the points with the updated centroids c1 and c2 marked; x axis from 0 to 8]
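The update step moves each centroid to the mean of its cluster. This sketch, again using reconstructed coordinates (an assumption), computes the new centroids from the clusters {2, 4, 5, 6} and {1, 3, 7, 8, 9, 10} and recomputes the Iteration 1 table.

```python
import math

# Reconstructed coordinates (assumption), chosen to match the distance tables.
POINTS = {
    1: (1, 8), 2: (1, 7), 3: (3, 5), 4: (1, 4), 5: (1, 2),
    6: (0, 2), 7: (7, 0), 8: (6, 1), 9: (8, 3), 10: (6, 6),
}

def centroid(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(pts)
    return tuple(sum(coords) / n for coords in zip(*pts))

# Clusters produced by the assignment step (C1 first, then C2).
clusters = [[2, 4, 5, 6], [1, 3, 7, 8, 9, 10]]
centroids = [centroid([POINTS[k] for k in members]) for members in clusters]
print([tuple(round(v, 2) for v in c) for c in centroids])  # [(0.75, 3.75), (5.17, 3.83)]

# Recompute the distance table against the updated centroids.
for label, p in POINTS.items():
    d1, d2 = (math.dist(p, c) for c in centroids)
    print(f"{label:>5}  {d1:4.2f}  {d2:4.2f}")
```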
K-means Algorithm
Iteration 2
Distances
Point   C1     C2
1       3.41   7.07
2       2.41   6.40
3       2.24   3.61
4       0.63   5.10
5       2.61   5.10
6       2.72   6.08
7       7.72   3.16
8       6.32   2.00
9       7.38   2.00
10      5.39   3.00
[Figure: the points with the updated centroids c1 and c2 marked; x axis from 0 to 8]
K-means Algorithm
Iteration 3: the assignments no longer change, so the centroids stay fixed and the algorithm stops
Distances
Point   C1     C2
1       3.34   7.96
2       2.34   7.30
3       1.86   4.51
4       0.69   5.94
5       2.67   5.77
6       2.91   6.77
7       7.47   2.51
8       6.07   1.68
9       7.03   1.35
10      5.01   3.58
[Figure: the points with the final centroids c1 and c2 marked; x axis from 0 to 8]
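Putting the two steps together gives the K-means loop: assign, update, and stop once the assignment no longer changes. This sketch assumes the reconstructed coordinates and the tie-breaking toward C1 used above. Starting from points 6 and 10, it performs three centroid updates before stopping, matching the three iterations on the slides.

```python
# Reconstructed coordinates (assumption), chosen to match the distance tables.
POINTS = {
    1: (1, 8), 2: (1, 7), 3: (3, 5), 4: (1, 4), 5: (1, 2),
    6: (0, 2), 7: (7, 0), 8: (6, 1), 9: (8, 3), 10: (6, 6),
}

def sq_dist(p, q):
    """Squared Euclidean distance (same nearest centroid as the plain distance)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(pts)
    return tuple(sum(coords) / n for coords in zip(*pts))

def kmeans(points, centroids, max_iter=100):
    """K-means from the given initial centroids.

    Returns (centroids, assignment, updates): updates counts how many times the
    centroids were recomputed before the assignment stopped changing.
    """
    assignment = None
    for updates in range(max_iter):
        # Assignment step: nearest centroid per point (ties go to the lower index).
        new_assignment = {
            label: min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for label, p in points.items()
        }
        if new_assignment == assignment:
            return centroids, assignment, updates
        assignment = new_assignment
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [points[lab] for lab, j in assignment.items() if j == k]
            new_centroids.append(centroid(members) if members else old)
        centroids = new_centroids
    return centroids, assignment, max_iter

centroids, assignment, updates = kmeans(POINTS, [POINTS[6], POINTS[10]])
print(updates)  # 3
print([tuple(round(v, 2) for v in c) for c in centroids])  # [(1.17, 4.67), (6.75, 2.5)]
print(sorted(lab for lab, j in assignment.items() if j == 0))  # [1, 2, 3, 4, 5, 6]
```

The stopping test compares assignments rather than centroids. Once the assignments repeat, another update would reproduce the same centroids, which is the "no change" condition on the Iteration 3 slide.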
Properties of the Algorithm
Local Optimum
[Figure: local optimum example; both axes range from -1 to 1]
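K-means is only guaranteed to reach a local optimum, so the result can depend on the initial centroids. This sketch uses the reconstructed example coordinates (an assumption) and scores each result by the within-cluster sum of squared distances (SSE). It compares the slides' start (points 6 and 10) with a hypothetical start from points 7 and 8. The second run settles on a different partition with a higher SSE. Neither run is claimed to be the global optimum.

```python
# Reconstructed coordinates (assumption), chosen to match the distance tables.
POINTS = {
    1: (1, 8), 2: (1, 7), 3: (3, 5), 4: (1, 4), 5: (1, 2),
    6: (0, 2), 7: (7, 0), 8: (6, 1), 9: (8, 3), 10: (6, 6),
}

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(pts):
    n = len(pts)
    return tuple(sum(coords) / n for coords in zip(*pts))

def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the assignment stops changing."""
    assignment = None
    for _ in range(max_iter):
        new_assignment = {
            label: min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for label, p in points.items()
        }
        if new_assignment == assignment:
            break
        assignment = new_assignment
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [points[lab] for lab, j in assignment.items() if j == k]
            new_centroids.append(centroid(members) if members else old)  # keep empty clusters
        centroids = new_centroids
    return centroids, assignment

def sse(points, centroids, assignment):
    """Within-cluster sum of squared distances (K-means never increases it)."""
    return sum(sq_dist(p, centroids[assignment[label]]) for label, p in points.items())

# (6, 10) is the slides' start; (7, 8) is a hypothetical alternative start.
for start in [(6, 10), (7, 8)]:
    cents, labels = kmeans(POINTS, [POINTS[s] for s in start])
    c1 = sorted(lab for lab, j in labels.items() if j == 0)
    print(start, c1, round(sse(POINTS, cents, labels), 2))
# (6, 10) [1, 2, 3, 4, 5, 6] 59.92
# (7, 8) [7, 8, 9] 64.38
```

A common mitigation is to run K-means from several starts and keep the lowest-SSE result; this lowers the risk of a poor local optimum but still does not guarantee the global one.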
Summary