Clustering: Unsupervised Learning

This document discusses clustering and the k-means algorithm. It begins with an introduction to clustering and unsupervised learning. It then explains the k-means algorithm, including how it initializes cluster centroids randomly, assigns examples to the closest centroid, and updates the centroids to be the average of each cluster. It discusses how to choose the number of clusters k using the elbow method by plotting cost against k.
Clustering: Unsupervised learning introduction

Machine Learning
Supervised learning

Training set: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$
Andrew Ng
Unsupervised learning

Training set: $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$
Applications of clustering

- Market segmentation
- Social network analysis
- Organize computing clusters
- Astronomical data analysis

Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)


Clustering: K-means algorithm
K-means algorithm
Input:
- $K$ (number of clusters)
- Training set $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$

$x^{(i)} \in \mathbb{R}^n$ (drop $x_0 = 1$ convention)
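K-means algorithm

Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$

Repeat {
    for $i$ = 1 to $m$
        $c^{(i)}$ := index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$
    for $k$ = 1 to $K$
        $\mu_k$ := average (mean) of points assigned to cluster $k$
}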
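The two steps inside the loop above can be sketched in NumPy as follows. This is a minimal illustration, not the course's reference code: the function names, the `n_iters` and `seed` parameters, the 0-based cluster indices (0 to K-1 rather than 1 to K), and the choice to leave an empty cluster's centroid where it was are all assumptions made for the example.

```python
import numpy as np

def assign_clusters(X, centroids):
    """Cluster assignment step: index of the closest centroid for each example."""
    # dists[i, k] = squared Euclidean distance from example i to centroid k
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def move_centroids(X, c, centroids):
    """Move-centroid step: each centroid becomes the mean of its assigned points."""
    new_centroids = centroids.copy()
    for k in range(len(centroids)):
        members = X[c == k]
        if len(members) > 0:  # assumption: an empty cluster keeps its old centroid
            new_centroids[k] = members.mean(axis=0)
    return new_centroids

def k_means(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Start the centroids at K distinct training examples (one common choice)
    centroids = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iters):
        c = assign_clusters(X, centroids)
        new_centroids = move_centroids(X, c, centroids)
        if np.allclose(new_centroids, centroids):  # centroids stopped moving
            break
        centroids = new_centroids
    # Final assignment, consistent with the centroids that are returned
    return assign_clusters(X, centroids), centroids
```

Calling `k_means(X, K=2)` on an `(m, n)` array returns one cluster index per example and a `(K, n)` array of centroids.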
K-means for non-separated clusters

[Figure: T-shirt sizing; scatter plot with axes Height and Weight]
Clustering: Optimization objective
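K-means optimization objective
$c^{(i)}$ = index of cluster (1, 2, …, $K$) to which example $x^{(i)}$ is currently assigned
$\mu_k$ = cluster centroid $k$ ($\mu_k \in \mathbb{R}^n$)
$\mu_{c^{(i)}}$ = cluster centroid of cluster to which example $x^{(i)}$ has been assigned
Optimization objective:
$$J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - \mu_{c^{(i)}} \right\|^2$$
$$\min_{c^{(1)}, \ldots, c^{(m)},\ \mu_1, \ldots, \mu_K} J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$$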
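As a small worked example, the sketch below computes the cost under the assumption that it is the usual K-means distortion: the mean squared Euclidean distance between each example and the centroid of the cluster it is assigned to. The function name `distortion`, the toy data, and the 0-based cluster indices are illustrative choices, not part of the slides.

```python
import numpy as np

def distortion(X, c, centroids):
    """Mean squared distance between each example and its assigned centroid."""
    diffs = X - centroids[c]  # row i is x_i minus the centroid of its cluster
    return float((diffs ** 2).sum(axis=1).mean())

# Three 2-D examples; the first two belong to cluster 0, the third to cluster 1
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
c = np.array([0, 0, 1])
centroids = np.array([[1.0, 0.0], [10.0, 0.0]])

J = distortion(X, c, centroids)  # squared distances are 1, 1 and 0, so J = 2/3
```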

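K-means algorithm

Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$

Repeat {
    for $i$ = 1 to $m$
        $c^{(i)}$ := index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$
    for $k$ = 1 to $K$
        $\mu_k$ := average (mean) of points assigned to cluster $k$
}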
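A standard property of these two steps is that neither can increase the distortion: the assignment step picks the best cluster for each example with the centroids held fixed, and the averaging step picks the best centroid for each cluster with the assignments held fixed. The sketch below checks this numerically on synthetic data. The helper names, the data, and the rule that an empty cluster keeps its previous centroid are assumptions made for illustration.

```python
import numpy as np

def distortion(X, c, mu):
    # Mean squared distance from each example to its assigned centroid
    return float(((X - mu[c]) ** 2).sum(axis=1).mean())

def assign(X, mu):
    # Index of the closest centroid for every example
    return ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def update(X, c, mu):
    # Assumption: an empty cluster keeps its previous centroid
    return np.array([X[c == k].mean(axis=0) if np.any(c == k) else mu[k]
                     for k in range(len(mu))])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))  # synthetic data, for illustration only
mu = X[rng.choice(len(X), size=3, replace=False)]
c = assign(X, mu)

history = [distortion(X, c, mu)]
for _ in range(10):
    mu = update(X, c, mu)
    history.append(distortion(X, c, mu))  # cost after the averaging step
    c = assign(X, mu)
    history.append(distortion(X, c, mu))  # cost after the assignment step
```

If the recorded cost ever went up between consecutive entries of `history`, that would point to a bug in the implementation.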
Clustering: Random initialization
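K-means algorithm

Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$

Repeat {
    for $i$ = 1 to $m$
        $c^{(i)}$ := index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$
    for $k$ = 1 to $K$
        $\mu_k$ := average (mean) of points assigned to cluster $k$
}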
Random initialization
Should have $K < m$

Randomly pick $K$ training examples.

Set $\mu_1, \ldots, \mu_K$ equal to these $K$ examples.
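One way this initialization might look in NumPy is sketched below. The function name `init_centroids`, the explicit `rng` argument, and the decision to sample without replacement (so that the chosen examples are distinct) are assumptions made for the example.

```python
import numpy as np

def init_centroids(X, K, rng):
    """Pick K distinct training examples and use them as the initial centroids."""
    m = len(X)
    if K >= m:
        raise ValueError("need fewer clusters than training examples")
    idx = rng.choice(m, size=K, replace=False)  # K distinct example indices
    return X[idx].astype(float)                 # a copy, so X is left untouched

rng = np.random.default_rng(42)
X = np.arange(20, dtype=float).reshape(10, 2)   # 10 toy examples in 2-D
mu = init_centroids(X, K=3, rng=rng)
```

Each row of `mu` is a copy of one training example, so every centroid starts in a region that actually contains data.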

Local optima

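Random initialization
For i = 1 to 100 {
    Randomly initialize K-means.
    Run K-means. Get $c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K$.
    Compute cost function (distortion) $J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$
}

Pick clustering that gave lowest cost $J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$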
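The restart loop above could be sketched as follows. It is a rough illustration under stated assumptions: `run_k_means` and `best_of_restarts` are hypothetical helper names, each run uses a fixed number of iterations instead of a convergence test, empty clusters keep their previous centroid, and the data is synthetic.

```python
import numpy as np

def closest(X, mu):
    # Index of the closest centroid for every example
    return ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def run_k_means(X, K, rng, n_iters=50):
    """One run of K-means from a random initialization; returns (c, mu, J)."""
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iters):
        c = closest(X, mu)
        for k in range(K):
            if np.any(c == k):  # assumption: empty clusters keep their centroid
                mu[k] = X[c == k].mean(axis=0)
    c = closest(X, mu)
    J = float(((X - mu[c]) ** 2).sum(axis=1).mean())  # distortion of this run
    return c, mu, J

def best_of_restarts(X, K, n_restarts=100, seed=0):
    """Run K-means n_restarts times and keep the run with the lowest cost."""
    rng = np.random.default_rng(seed)
    runs = [run_k_means(X, K, rng) for _ in range(n_restarts)]
    best = min(runs, key=lambda run: run[2])
    return best, [run[2] for run in runs]

# Synthetic data: three blobs in 2-D, for illustration only
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(loc=centre, scale=0.3, size=(30, 2))
               for centre in ([0.0, 0.0], [5.0, 0.0], [0.0, 5.0])])
(c, mu, J), costs = best_of_restarts(X, K=3, n_restarts=20)
```

Comparing `J` with the full list `costs` shows how much the result of a single run can depend on its initialization.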

Clustering: Choosing the number of clusters
What is the right value of K?

Choosing the value of K
Elbow method:

[Figure: two plots of the cost function $J$ against $K$ (no. of clusters), for $K$ = 1 to 8]
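The numbers behind such a plot can be produced by running K-means for each candidate K and recording the lowest cost found. The sketch below does this for K = 1 to 8 on synthetic data with three blobs. The helper names, the number of restarts and iterations, and the data are assumptions made for illustration, and plotting is left out to keep the example self-contained.

```python
import numpy as np

def closest(X, mu):
    # Index of the closest centroid for every example
    return ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def k_means_cost(X, K, rng, n_restarts=10, n_iters=50):
    """Lowest distortion found over several random initializations."""
    best = np.inf
    for restart in range(n_restarts):
        mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
        for step in range(n_iters):
            c = closest(X, mu)
            for k in range(K):
                if np.any(c == k):  # assumption: empty clusters keep their centroid
                    mu[k] = X[c == k].mean(axis=0)
        c = closest(X, mu)
        best = min(best, float(((X - mu[c]) ** 2).sum(axis=1).mean()))
    return best

# Synthetic data: three blobs in 2-D, for illustration only
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=centre, scale=0.5, size=(40, 2))
               for centre in ([0.0, 0.0], [6.0, 0.0], [0.0, 6.0])])

costs = {K: k_means_cost(X, K, rng) for K in range(1, 9)}
for K, J in costs.items():
    print(K, round(J, 3))
```

On data like this the cost typically drops sharply up to the true number of blobs and flattens afterwards, which is the "elbow". On real data the curve is often smoother and the elbow harder to identify.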

Choosing the value of K
Sometimes, you’re running K-means to get clusters to use for some
later/downstream purpose. Evaluate K-means based on a metric for
how well it performs for that later purpose.

E.g. T-shirt sizing

[Figure: two scatter plots titled "T-shirt sizing", each with axes Height and Weight]