
CS 221: Artificial Intelligence

Lecture 11:
Unsupervised Machine Learning

Peter Norvig and Sebastian Thrun


Slide credit: Marc Pollefeys, Dan Klein, Chris Manning
The unsupervised learning problem

Many data points, no labels


Unsupervised Learning?
• Google Street View

K-Means

Many data points, no labels


K-Means
• Choose a fixed number of clusters
• Choose cluster centers and point-cluster allocations to minimize the error

  $$\sum_{i \in \text{clusters}} \left\{ \sum_{j \in \text{elements of the } i\text{-th cluster}} \left\| x_j - \mu_i \right\|^2 \right\}$$

• Can't do this by exhaustive search, because there are too many possible allocations
• Algorithm (alternate the two steps; see the sketch below):
  • Fix cluster centers; allocate points to the closest cluster
  • Fix the allocation; compute the best cluster centers
• x could be any set of features for which we can compute a distance (careful about scaling)
* From Marc Pollefeys COMP 256 2003
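The alternating procedure on this slide is short enough to write out. Below is a minimal NumPy sketch; the function name `kmeans`, Euclidean distance, and random initialization are my choices, not something given in the slides.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Minimal k-means sketch: alternate between allocating points to the
    closest center and recomputing each center as the mean of its points.
    X is an (n, d) float array of feature vectors."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Allocation step: index of the closest center for every point.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Center step: for a fixed allocation, the best center is the cluster mean.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

Each of the two steps can only decrease the objective above, which is why the alternation terminates, though possibly at a poor local minimum, as discussed later in the lecture.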
K-Means
K-Means

* From Marc Pollefeys COMP 256 2003


Results of K-Means Clustering:

[Figure: original image, clusters on intensity, clusters on color]
K-means clustering using intensity alone and color alone


* From Marc Pollefeys COMP 256 2003
K-Means
• Is an approximation to EM
  • Model (hypothesis space): mixture of N Gaussians
  • Latent variables: correspondence of data and Gaussians
• We notice:
  • Given the mixture model, it's easy to calculate the correspondence
  • Given the correspondence, it's easy to estimate the mixture models
Expectation Maximization: Idea
• Data generated from a mixture of Gaussians
• Latent variables: correspondence between data items and Gaussians
Generalized K-Means (EM)
Learning a Gaussian Mixture
(with known covariance)

E-Step:
$$E[z_{ij}] \;=\; \frac{p(x = x_i \mid \mu = \mu_j)}{\sum_{n=1}^{k} p(x = x_i \mid \mu = \mu_n)}
          \;=\; \frac{e^{-\frac{1}{2\sigma^2}(x_i - \mu_j)^2}}{\sum_{n=1}^{k} e^{-\frac{1}{2\sigma^2}(x_i - \mu_n)^2}}$$

M-Step:
$$\mu_j \;\leftarrow\; \frac{\sum_{i=1}^{m} E[z_{ij}]\, x_i}{\sum_{i=1}^{m} E[z_{ij}]}$$
Expectation Maximization
• Converges!
• Proof [Neal/Hinton, McLachlan/Krishnan]:
  • The E and M steps never decrease the data likelihood
  • Converges to a local optimum of the likelihood or a saddle point
• But subject to local optima
EM Clustering: Results

http://www.ece.neu.edu/groups/rpl/kmeans/
Practical EM
• Number of clusters unknown
• Suffers (badly) from local minima
• Algorithm:
  • Start a new cluster center if many points are "unexplained"
  • Kill a cluster center that doesn't contribute
  • (Use an AIC/BIC criterion for all this, if you want to be formal; see the sketch below)
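One concrete way to make the AIC/BIC remark formal: fit a mixture for each candidate number of clusters and keep the one with the lowest BIC. The sketch below uses scikit-learn's GaussianMixture as an assumed helper, and it replaces the slide's grow/kill heuristic with plain model selection over k.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def pick_k_by_bic(X, k_max=10, seed=0):
    """Fit mixtures for k = 1..k_max and return the k with the lowest BIC."""
    best_k, best_bic, best_model = None, np.inf, None
    for k in range(1, k_max + 1):
        gm = GaussianMixture(n_components=k, n_init=5, random_state=seed).fit(X)
        bic = gm.bic(X)                    # BIC penalizes extra components
        if bic < best_bic:
            best_k, best_bic, best_model = k, bic, gm
    return best_k, best_model
```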

Spectral Clustering

Spectral Clustering

The Two Spiral Problem

Spectral Clustering: Overview

Data → Similarities → Block-Detection

* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University
Eigenvectors and Blocks
• Block matrices have block eigenvectors:

      A = [ 1   1   0   0
            1   1   0   0
            0   0   1   1
            0   0   1   1 ]

  eigensolver → λ1 = 2, λ2 = 2, λ3 = 0, λ4 = 0
  leading eigenvectors: e1 = (.71, .71, 0, 0), e2 = (0, 0, .71, .71)

• Near-block matrices have near-block eigenvectors: [Ng et al., NIPS 02]

      A = [ 1    1    .2   0
            1    1    0   -.2
            .2   0    1    1
            0   -.2   1    1 ]

  eigensolver → λ1 = 2.02, λ2 = 2.02, λ3 = -0.02, λ4 = -0.02
  leading eigenvectors: e1 = (.71, .69, .14, 0), e2 = (0, -.14, .69, .71)

* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University
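The claim about block and near-block eigenvectors is easy to check numerically; the sketch below feeds the two matrices from this slide to NumPy's symmetric eigensolver. For the exactly-block case the repeated eigenvalue means the solver may return any rotation of the two leading eigenvectors, so its output can differ from the slide by a sign or a mixing of the pair.

```python
import numpy as np

A_block = np.array([[1, 1, 0, 0],
                    [1, 1, 0, 0],
                    [0, 0, 1, 1],
                    [0, 0, 1, 1]], dtype=float)

A_near = np.array([[1.0,  1.0,  0.2,  0.0],
                   [1.0,  1.0,  0.0, -0.2],
                   [0.2,  0.0,  1.0,  1.0],
                   [0.0, -0.2,  1.0,  1.0]])

for name, A in [("block", A_block), ("near-block", A_near)]:
    vals, vecs = np.linalg.eigh(A)        # symmetric eigensolver, eigenvalues ascending
    print(name, "eigenvalues:", np.round(vals[::-1], 2))
    # Last two columns correspond to the two largest eigenvalues.
    print("two leading eigenvectors (columns):")
    print(np.round(vecs[:, -2:], 2))
```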
Spectral Space
• Can put items into blocks by their eigenvector coordinates:

      A = [ 1    1    .2   0
            1    1    0   -.2
            .2   0    1    1
            0   -.2   1    1 ]

  e1 = (.71, .69, .14, 0), e2 = (0, -.14, .69, .71)
  [Figure: plotting item i at (e1[i], e2[i]) separates the two blocks]

• Resulting clusters are independent of the row ordering:

      A' = [ 1    .2   1    0
             .2   1    0    1
             1    0    1   -.2
             0    1   -.2   1 ]

  e1 = (.71, .14, .69, 0), e2 = (0, .69, -.14, .71)
  [Figure: the same items land in the same blocks in (e1, e2) space]
* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University
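In code, "putting items into blocks by eigenvectors" just means using row i of the leading-eigenvector matrix as the coordinates of item i and clustering those rows. A minimal sketch, assuming a symmetric affinity matrix A; the k-means step can reuse the earlier sketch or any other implementation.

```python
import numpy as np

def spectral_embedding(A, num_clusters):
    """Embed item i at the point (e1[i], ..., ek[i]) given by the leading
    eigenvectors of the symmetric affinity matrix A."""
    vals, vecs = np.linalg.eigh(A)
    return vecs[:, -num_clusters:]        # one row of coordinates per item

# Usage sketch: items in the same block land close together in this space,
# independent of how the rows of A happen to be ordered, so ordinary
# k-means on the embedded rows recovers the blocks.
# coords = spectral_embedding(A, 2)
# centers, labels = kmeans(coords, 2)
```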
The Spectral Advantage
• The key advantage of spectral clustering is the spectral-space representation.

* Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University
Measuring Affinity
Intensity:
$$\mathrm{aff}(x, y) = \exp\!\left( -\frac{1}{2\sigma_i^2} \left\| I(x) - I(y) \right\|^2 \right)$$

Distance:
$$\mathrm{aff}(x, y) = \exp\!\left( -\frac{1}{2\sigma_d^2} \left\| x - y \right\|^2 \right)$$

Texture:
$$\mathrm{aff}(x, y) = \exp\!\left( -\frac{1}{2\sigma_t^2} \left\| c(x) - c(y) \right\|^2 \right)$$

* From Marc Pollefeys COMP 256 2003
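These three affinities are the same Gaussian kernel applied to different per-pixel features (intensity I(x), position x, or a texture descriptor c(x)). A minimal sketch of the shared computation, assuming the feature extraction has already produced an (n, d) array:

```python
import numpy as np

def affinity_matrix(features, sigma):
    """aff(x, y) = exp(-||f(x) - f(y)||^2 / (2 sigma^2)) for every pair of items.
    `features` holds one row per item: intensities, positions, or texture vectors."""
    diff = features[:, None, :] - features[None, :, :]
    sq_dist = (diff ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))
```

Feeding such a matrix into the eigenvector step above gives the full spectral pipeline; the point of the next slide is that the choice of sigma controls which points count as similar at all.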
Scale affects affinity

* From Marc Pollefeys COMP 256 2003


Other examples of unsupervised learning

Mean face (after alignment)

Slide credit: Santiago Serrano
Eigenfaces

Slide credit: Santiago Serrano
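The mean face and eigenfaces on these slides come from plain PCA on aligned, vectorized face images. A minimal SVD-based sketch; the array layout is an assumption, not something specified in the lecture.

```python
import numpy as np

def eigenfaces(faces, num_components=16):
    """faces: (n_images, n_pixels) array of aligned, vectorized face images.
    Returns the mean face and the top principal directions ("eigenfaces")."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    # Rows of Vt are orthonormal directions of maximal variance in pixel space.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, Vt[:num_components]
```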
SCAPE (Drago Anguelov et al.)

