Clustering and K-Means Algorithm

This lecture covers the concepts of supervised and unsupervised learning, focusing on clustering and the K-Means algorithm. It explains the steps involved in the K-Means algorithm, including initialization, membership calculation, and centroid updating, as well as methods for choosing the optimal number of clusters (K) using the Elbow method. Additionally, applications of K-Means, such as image color quantization and handling outliers, are discussed.

Machine Learning

Clustering and k-Means Algorithm


Lecture – 7

Instructor: Qamar Askari


Headlines
• Supervised vs. Unsupervised Learning
• Clustering
• K-Means algorithm
• Random initialization
• Choosing K – Elbow Method
• Implementation of K-Means Algorithm in Python
Supervised learning

Training set: {(x^1, y^1), (x^2, y^2), …, (x^N, y^N)}; every training example x comes with a target label y.

Unsupervised learning

Training set: {x^1, x^2, …, x^N}; no labels are given, so the algorithm must discover structure (such as clusters) on its own.
Clustering
• It is the task of identifying subgroups in the data such that data points
in the same subgroup (cluster) are very similar while data points in
different clusters are very different.

Examples: market segmentation, social network analysis.


K-means algorithm
An example on board
K-Means Algorithm
• 1. Initialize:
  Choose K random data points from the data set to represent the initial centers of the K partitions.

• 2. Calculate the group memberships:

  b_i^t =
  \begin{cases}
  1 & \text{if } \lVert x^t - m_i \rVert = \min_j \lVert x^t - m_j \rVert \\
  0 & \text{otherwise}
  \end{cases}
• 3. Update the centroids:

  m_i = \frac{\sum_t b_i^t \, x^t}{\sum_t b_i^t}
• 4. Repeat steps 2 and 3 until the centroids converge to stable values.
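
A minimal NumPy sketch of these four steps (the function name k_means and its parameters are illustrative, not part of the lecture):

import numpy as np

def k_means(X, K, max_iters=100, seed=0):
    """Basic K-means on an (n_samples, n_features) array X."""
    rng = np.random.default_rng(seed)

    # Step 1: initialize the centroids with K random data points.
    centroids = X[rng.choice(len(X), size=K, replace=False)]

    for _ in range(max_iters):
        # Step 2: memberships -- assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        # Step 3: move each centroid to the mean of its assigned points
        # (keep the old centroid if a cluster ends up empty).
        new_centroids = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(K)
        ])

        # Step 4: repeat until the centroids stop moving.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    return centroids, labels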
An empirical study
[Figure: initial centres chosen for the empirical study]

Ref: K. Javed et al., "The behavior of K-Means: An empirical study", International Conference on Electrical Engineering, 2008
Application – Image color quantization
[Figure: example photograph used for color quantization]

• Look at the picture above: it does not contain all 256³ possible colors.
• Suppose we want to represent it using even fewer colors.
• K-means has an application for exactly this, called color quantization.
[Figure: the original image (37,859 shades of color) alongside K-means quantized versions with K = 2 and K = 10]

[Figure: K-means quantized versions of the image with K = 15 and K = 20]
[Figure: an image quantized with k = 2, k = 3, and k = 10. Reference: Bishop, 2006]
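
A sketch of this application using scikit-learn's KMeans and Pillow (the file names and K = 10 are placeholders, not from the lecture):

import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

# Load the image and flatten it into an (n_pixels, 3) array of RGB values.
img = np.asarray(Image.open("photo.jpg").convert("RGB"), dtype=np.float64)
pixels = img.reshape(-1, 3)

# Cluster the pixel colors into K representative colors.
K = 10
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(pixels)

# Replace every pixel with the centroid of its cluster and save the result.
quantized = kmeans.cluster_centers_[kmeans.labels_].reshape(img.shape)
Image.fromarray(quantized.astype(np.uint8)).save("photo_k10.jpg")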


K-means for non-separated clusters

T-shirt sizing
[Figure: a weight-vs-height scatter plot of customers with no clearly separated clusters]
K-Means and Globular/Non-Globular Structures

[Figure: a globular (roughly spherical) structure contrasted with a non-globular structure]

K-Means on a Non-Globular Structure

[Figure: the partition K-means produces on a non-globular structure]
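
As an illustrative sketch (the two-moons dataset and parameters are my own choice, not from the lecture), K-means splits a non-globular shape by distance to the centroids rather than by its true shape:

from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

# Two interleaving half-circles: a classic non-globular structure.
X, true_labels = make_moons(n_samples=300, noise=0.05, random_state=0)

# K-means assumes roughly spherical clusters, so with K=2 it tends to cut
# each moon in half instead of recovering the two crescent shapes.
pred_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)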
K-Means is sensitive to outliers
Handling Outliers
1. Remove/ignore the outlier(s). Outliers can be identified by their distance from the assigned centroid, as sketched in the code after this list.
2. Make an extra cluster for the outlier(s).
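
A small sketch of approach 1 (the function name and the 3-standard-deviation threshold are illustrative assumptions):

import numpy as np
from sklearn.cluster import KMeans

def find_outliers(X, K=3, num_std=3.0):
    """Flag points whose distance to their own centroid is unusually large."""
    kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)
    # Distance of every point to the centroid of the cluster it was assigned to.
    dists = np.linalg.norm(X - kmeans.cluster_centers_[kmeans.labels_], axis=1)
    # Mark points lying more than num_std standard deviations above the mean distance.
    return dists > dists.mean() + num_std * dists.std()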
Random initialization
• Should have fewer clusters than training examples (K smaller than the number of data points).
• Randomly pick K training examples.
• Set the centroids m_1, …, m_K equal to these K examples.

Local optima
• Depending on the random initialization, K-means can converge to different local optima of the cost function, some of which correspond to poor clusterings.
Random initialization (multiple restarts)

For i = 1 to 100 {
    Randomly initialize K-means.
    Run K-means to convergence; record the resulting memberships and centroids.
    Compute the cost function (distortion / WCSS).
}

Pick the clustering that gave the lowest cost.
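
A runnable sketch of this loop, reusing the k_means function sketched earlier; note that scikit-learn's KMeans performs the same multi-restart strategy via its n_init parameter:

import numpy as np

def best_of_restarts(X, K, n_restarts=100):
    """Run K-means many times and keep the run with the lowest WCSS."""
    best_wcss, best_centroids, best_labels = np.inf, None, None
    for i in range(n_restarts):
        centroids, labels = k_means(X, K, seed=i)  # k_means as sketched earlier
        # Distortion / WCSS: sum of squared distances to the assigned centroids.
        wcss = np.sum((X - centroids[labels]) ** 2)
        if wcss < best_wcss:
            best_wcss, best_centroids, best_labels = wcss, centroids, labels
    return best_centroids, best_labels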


What is the right value of K?
Choosing the value of K
Elbow method: run K-means for a range of K values, plot the cost function (WCSS) against the number of clusters, and pick the K at the "elbow", beyond which the cost decreases only slowly.

[Figure: two example plots of the cost function vs. the number of clusters (1 to 8)]
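
A sketch of the elbow method with scikit-learn, where inertia_ is the library's name for the WCSS (the range of 1 to 8 clusters mirrors the plots above):

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

def plot_elbow(X, max_k=8):
    """Plot WCSS against K and look for the 'elbow'."""
    ks = range(1, max_k + 1)
    wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in ks]
    plt.plot(ks, wcss, marker="o")
    plt.xlabel("K (no. of clusters)")
    plt.ylabel("Cost function (WCSS)")
    plt.show()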


Choosing the value of K
Sometimes, you’re running K-means to get clusters to use for some
later/downstream purpose. Evaluate K-means based on a metric for
how well it performs for that later purpose.

E.g. T-shirt sizing
[Figure: two weight-vs-height scatter plots of customers, clustered with different values of K to define T-shirt sizes]
Implementation of K-Means Algorithm

Discussion from Google Colab
