Slides 11
INTRODUCTION TO Machine Learning, 2nd Edition
Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT
Press (V1.0)
CHAPTER 7:
Clustering
Clustering: Motivation
● Optical Character Recognition
– Two ways to write 7 (with or without the horizontal bar)
– Can't assume a single distribution
– Mixture of an unknown number of templates
● Compared to classification
– Number of classes is known
– Each training sample has a class label
– Supervised learning
Based on E ALPAYDIN 2004 Introduction to Machine Learning © The MIT Press (V1.1)
Example: Color quantization
● Image: each pixel is represented by a 24-bit color
● Colors come from different distributions (e.g., sky, grass)
● No labels telling us whether each pixel is sky or grass
● Want to use only 256 colors in the palette, yet represent the image as closely as possible to the original
● Quantize uniformly: assign a single color to each of the 256 intervals of size 2^24/256
● This wastes palette entries on rarely occurring intervals
Quantization
● Sample (pixels): $X = \{x^t\}_{t=1}^{N}$
● $k$ reference vectors (palette): $m_i,\ i = 1, \dots, k$
● Select the reference vector for each pixel: $\|x^t - m_i\| = \min_j \|x^t - m_j\|$
● Reference vectors are also called codebook vectors or code words
● Compress the image
● Reconstruction error: $E(\{m_i\}_{i=1}^{k} \mid X) = \sum_t \sum_i b_i^t \|x^t - m_i\|$ where
$b_i^t = \begin{cases} 1 & \text{if } \|x^t - m_i\| = \min_j \|x^t - m_j\| \\ 0 & \text{otherwise} \end{cases}$
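To make the encoding and reconstruction-error definitions concrete, here is a minimal NumPy sketch (not from the original slides; the array names pixels and codebook are assumptions for the example):

```python
import numpy as np

def encode(pixels, codebook):
    """Assign each pixel to its nearest reference vector (codebook entry).

    pixels:   (N, 3) array of RGB values
    codebook: (k, 3) array of reference vectors (the palette)
    Returns the index of the nearest codebook vector for each pixel.
    """
    # Squared Euclidean distance from every pixel to every codebook vector
    dists = ((pixels[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def reconstruction_error(pixels, codebook, labels):
    """E({m_i} | X) = sum_t ||x^t - m_{label(t)}||."""
    diffs = pixels - codebook[labels]
    return np.linalg.norm(diffs, axis=1).sum()
```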
Encoding/Decoding
Each pixel $x^t$ is encoded by the index $i$ of its nearest code word and decoded by looking up $m_i$:
$b_i^t = \begin{cases} 1 & \text{if } \|x^t - m_i\| = \min_j \|x^t - m_j\| \\ 0 & \text{otherwise} \end{cases}$
K-means clustering
● Minimize the reconstruction error
$E(\{m_i\}_{i=1}^{k} \mid X) = \sum_t \sum_i b_i^t \|x^t - m_i\|$
● Take derivatives with respect to $m_i$ and set them to zero:
$m_i = \dfrac{\sum_t b_i^t x^t}{\sum_t b_i^t}$
● Each reference vector is the mean of all instances it represents
K-Means clustering
● Iterative procedure for finding the reference vectors
● Start with random reference vectors
● Estimate the labels $b_i^t$
● Re-compute the reference vectors as the means of their assigned instances
● Repeat until convergence (see the sketch below)
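A minimal NumPy sketch of this iterative procedure (illustrative only; the function name kmeans, the random initialization, and the convergence test are assumptions):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: alternate label estimation and mean re-computation.

    X: (N, d) data matrix; returns (means, labels).
    """
    rng = np.random.default_rng(seed)
    # Start with k randomly chosen data points as reference vectors
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Estimate labels: assign each instance to its nearest reference vector
        dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Re-compute each reference vector as the mean of its instances
        new_means = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else means[i]
            for i in range(k)
        ])
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, labels
```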
k-means Clustering
Expectation Maximization:
Learning from Data
We want to learn a model with a set of parameter values θ.
We are given a set of data X.
An approach: choose argmax_θ Pr(X | θ).
This is the maximum likelihood (ML) model.
Super Simple Example
Coin I and Coin II (both biased).
Pick a coin at random (uniformly).
Flip it 4 times.
Repeat.
ML estimate of the heads probability: p = h / (h + t)
Missing Data
Observed flip sequences (which coin produced each is unknown):
HHHT  HTTH
TTTH  HTHH
THTT  HTTT
TTHT  HHHH
THHH  HTHT
Oh Boy, Now What!
If we knew the labels (which flips came from which coin), we could find ML values for p and q.
What could we use to compute the labels? p and q!
Computing Labels
p = 3/4, q = 3/10
Pr(Coin I | HHTH)
  = Pr(HHTH | Coin I) Pr(Coin I) / c
  = (3/4)^3 (1/4) (1/2) / c = 0.052734375 / c
Pr(Coin II | HHTH)
  = Pr(HHTH | Coin II) Pr(Coin II) / c
  = (3/10)^3 (7/10) (1/2) / c = 0.00945 / c
Expected Labels
Sequence   Pr(I)  Pr(II)    Sequence   Pr(I)  Pr(II)
HHHT       .85    .15       HTTH       .44    .56
TTTH       .10    .90       HTHH       .85    .15
THTT       .10    .90       HTTT       .10    .90
TTHT       .10    .90       HHHH       .98    .02
THHH       .85    .15       HTHT       .44    .56
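A small Python sketch (not from the slides) of how these expected labels can be computed from p and q; the helper name coin_posterior is made up for the example:

```python
def coin_posterior(seq, p, q, prior_I=0.5):
    """Pr(Coin I | seq) for a sequence of 'H'/'T' flips.

    p and q are the heads probabilities of Coin I and Coin II.
    """
    h = seq.count("H")
    t = seq.count("T")
    like_I = (p ** h) * ((1 - p) ** t) * prior_I
    like_II = (q ** h) * ((1 - q) ** t) * (1 - prior_I)
    return like_I / (like_I + like_II)   # normalizing constant c cancels

sequences = ["HHHT", "TTTH", "THTT", "TTHT", "THHH",
             "HTTH", "HTHH", "HTTT", "HHHH", "HTHT"]
for s in sequences:
    print(s, round(coin_posterior(s, p=0.75, q=0.3), 2))
```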
Wait, I Have an Idea
Pick some model θ0.
Expectation
● Compute expected labels using θi
Maximization
● Compute the ML model θi+1
Repeat.
Could This Work?
Expectation-Maximization (EM)
Pr(X | θi) will not decrease from one iteration to the next.
Sound familiar? It is a type of local (hill-climbing) search.
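Putting the two steps together for the coin example, a hedged sketch of the EM loop (it reuses the coin_posterior helper from the previous sketch; the starting values and iteration count are arbitrary):

```python
def em_two_coins(sequences, p=0.6, q=0.4, n_iter=50):
    """EM for the two-biased-coins example: alternate expected labels and ML updates."""
    for _ in range(n_iter):
        # E-step: expected label (responsibility of Coin I) for each sequence,
        # using coin_posterior defined in the previous sketch
        w = [coin_posterior(s, p, q) for s in sequences]
        # M-step: ML estimates of p and q from expected heads/flip counts
        h = [s.count("H") for s in sequences]
        n = [len(s) for s in sequences]
        p = sum(wi * hi for wi, hi in zip(w, h)) / sum(wi * ni for wi, ni in zip(w, n))
        q = sum((1 - wi) * hi for wi, hi in zip(w, h)) / sum((1 - wi) * ni for wi, ni in zip(w, n))
    return p, q
```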
Mixture Densities
$p(x) = \sum_{i=1}^{k} p(x \mid G_i)\, P(G_i)$
● where $G_i$ are the components/groups/clusters, $P(G_i)$ are the mixture proportions (priors), and $p(x \mid G_i)$ are the component densities
● Gaussian mixture: $p(x \mid G_i) \sim \mathcal{N}(\mu_i, \Sigma_i)$ with parameters $\Phi = \{P(G_i), \mu_i, \Sigma_i\}_{i=1}^{k}$, estimated from an unlabeled sample $X = \{x^t\}_t$ (unsupervised learning)
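To make the mixture-density formula concrete, a short illustrative sketch that evaluates p(x) for an invented two-component Gaussian mixture using SciPy:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical two-component mixture in 2-D
priors = [0.6, 0.4]                                   # P(G_i)
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]  # mu_i
covs = [np.eye(2), 2.0 * np.eye(2)]                   # Sigma_i

def mixture_density(x):
    """p(x) = sum_i p(x | G_i) P(G_i)."""
    return sum(P * multivariate_normal.pdf(x, mean=m, cov=S)
               for P, m, S in zip(priors, means, covs))

print(mixture_density(np.array([1.0, 1.0])))
```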
Example
Expectation Maximization (EM): Motivation
● Data came from several distributions
● Assume each distribution is known up to its parameters
● If we knew which distribution each data instance came from, we could use parametric estimation
● Introduce unobservable (latent) variables which indicate the source distribution
● Run an iterative process:
– Estimate the latent variables from the data and the current estimate of the distribution parameters
– Use the current values of the latent variables to refine the parameter estimates
EM
● Log likelihood:
$\mathcal{L}(\Phi \mid X) = \log \prod_t p(x^t \mid \Phi) = \sum_t \log \sum_{i=1}^{k} p(x^t \mid G_i)\, P(G_i)$
● Assume hidden variables $Z$ which, when known, make the optimization much simpler
● Complete likelihood, $\mathcal{L}_C(\Phi \mid X, Z)$, in terms of $X$ and $Z$
● Incomplete likelihood, $\mathcal{L}(\Phi \mid X)$, in terms of $X$ only
Latent Variables
● Unknown
● Can't compute the complete likelihood $\mathcal{L}_C(\Phi \mid X, Z)$
● Can compute its expected value
E-step: $Q(\Phi \mid \Phi^l) = E\left[\mathcal{L}_C(\Phi \mid X, Z) \mid X, \Phi^l\right]$
E- and M-steps
28
M-step:Φ l +1
= arg max Q ( Φ|Φ l
)
Φ
Example: Mixture of Gaussians
● Data came from a mixture of Gaussians
● Maximize the likelihood assuming we know the latent "indicator variables"
● E-step: compute the expected values of the indicator variables
[Figure: contour where P(G1 | x) = h1 = 0.5]
EM for Gaussian mixtures
● Assume all groups/clusters are Gaussians
● Multivariate, with uncorrelated features
● Same variance in every dimension
● Harden the indicators
– EM: expected values are between 0 and 1
– k-means: 0 or 1
● Under these assumptions, EM reduces to k-means (a sketch of the general Gaussian-mixture EM follows)
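For reference, a compact NumPy/SciPy sketch of EM for a general Gaussian mixture (restricting the covariances to be diagonal with equal variances would give the uncorrelated, same-variance case above); this is an illustrative implementation, not the book's code:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=100, seed=0):
    """EM for a Gaussian mixture: returns priors P(G_i), means mu_i, covariances Sigma_i."""
    N, d = X.shape
    rng = np.random.default_rng(seed)
    priors = np.full(k, 1.0 / k)
    means = X[rng.choice(N, size=k, replace=False)]
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
    for _ in range(n_iter):
        # E-step: responsibilities h_i^t = P(G_i | x^t)
        h = np.column_stack([
            priors[i] * multivariate_normal.pdf(X, mean=means[i], cov=covs[i])
            for i in range(k)
        ])
        h /= h.sum(axis=1, keepdims=True)
        # M-step: re-estimate priors, means, and covariances from responsibilities
        Nk = h.sum(axis=0)
        priors = Nk / N
        means = (h.T @ X) / Nk[:, None]
        for i in range(k):
            diff = X - means[i]
            covs[i] = (h[:, i, None] * diff).T @ diff / Nk[i] + 1e-6 * np.eye(d)
    return priors, means, covs
```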
Dimensionality Reduction vs. Clustering
● Dimensionality reduction methods find correlations between features and group features
– Age and income are correlated
● Clustering methods find similarities between instances and group instances
– Customers A and B are from the same cluster
Clustering: Usage for supervised
learning
● Describe data in terms of clusters
– Represent all data in a cluster by the cluster mean
– Or by the range of its attributes
● Map data into a new space (preprocessing), as sketched below
– d: dimensionality of the original space
– k: number of clusters
– Use the indicator (membership) variables as the new data representation
– k may be larger than d
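A brief sketch of that preprocessing idea (illustrative; it assumes cluster means such as those returned by the kmeans sketch earlier):

```python
import numpy as np

def cluster_features(X, means):
    """Map each d-dimensional instance to a k-dimensional indicator vector
    (1 for the nearest cluster, 0 elsewhere)."""
    dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    onehot = np.zeros((len(X), len(means)))
    onehot[np.arange(len(X)), labels] = 1.0
    return onehot

# e.g. means, _ = kmeans(X, k=10); Z = cluster_features(X, means)
# Z can now be fed to a supervised learner in place of (or alongside) X.
```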
Mixture of Mixtures
● In classification, the input comes from a mixture of classes (supervised).
● If each class is also a mixture, e.g., of Gaussians (unsupervised), we have a mixture of mixtures:
$p(x \mid C_i) = \sum_{j=1}^{k_i} p(x \mid G_{ij})\, P(G_{ij})$
$p(x) = \sum_{i=1}^{K} p(x \mid C_i)\, P(C_i)$
Hierarchical Clustering
● Probabilistic view
– Fit a mixture model to the data
– Find code words minimizing the reconstruction error
● Hierarchical clustering
– Group similar items together
– No specific model/distribution assumed
– Items within a group are more similar to each other than to items in different groups
Hierarchical Clustering
● City-block distance:
$d_{cb}(x^r, x^s) = \sum_{j=1}^{d} \left| x_j^r - x_j^s \right|$
Agglomerative Clustering
● Start with clusters each containing a single point
● At each step, merge the two most similar clusters (see the sketch after this list)
● Measures of similarity
– Minimal distance (single link): distance between the closest points of the two groups
– Maximal distance (complete link): distance between the most distant points of the two groups
– Average distance: distance between the group centers
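A short SciPy-based sketch of single-link agglomerative clustering with the city-block distance (illustrative; the sample data is made up):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical 2-D data: two loose groups
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])

# Single link: merge clusters with the minimal distance between closest points
Z = linkage(X, method="single", metric="cityblock")

# Cut the dendrogram to obtain 2 flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)   # e.g. [1 1 1 2 2 2]
```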
Example: Single-Link Clustering
[Figure: dendrogram]
Choosing k
● Defined by the application, e.g., image quantization
● Plot the data in two dimensions using PCA and inspect it visually
● Incremental (leader-cluster) algorithm: add clusters one at a time until an "elbow" appears in the reconstruction error, log likelihood, or intergroup distances (a sketch of the elbow inspection follows)
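An illustrative sketch of the elbow inspection, reusing the kmeans and reconstruction_error functions sketched earlier (the data matrix X is assumed to be an (N, d) array):

```python
import numpy as np

def elbow_curve(X, k_max=10):
    """Reconstruction error as a function of k; look for the 'elbow' where
    adding more clusters stops paying off."""
    errors = []
    for k in range(1, k_max + 1):
        means, labels = kmeans(X, k)                       # sketched earlier
        errors.append(reconstruction_error(X, means, labels))
    return errors

# errs = elbow_curve(X); then plot range(1, len(errs) + 1) against errs
```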