Lecture: Expectation Maximization
Classification Results
Input feature vectors: x1 = {r1, g1, b1}, x2 = {r2, g2, b2}, …, xi = {ri, gi, bi}, …
Classifier (K-Means): maps each input to a cluster label, x1 → C(x1), x2 → C(x2), …, xi → C(xi), …
Cluster parameters: mean m1 for C1, mean m2 for C2, …, mean mk for Ck
K-Means Classifier (Cont.)
K-Means (Cont.)
• Boot Step:
– Initialize K clusters: C1, …, CK
    Each cluster Cj is represented by its mean mj.
• Iteration Step:
  – Estimate the cluster of each data point: $x_i \rightarrow C(x_i)$
  – Re-estimate the cluster parameters: $m_j = \mathrm{mean}\{x_i \mid x_i \in C_j\}$
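As a concrete illustration of the boot and iteration steps above, here is a minimal K-Means sketch in Python/NumPy; the function name, the random initialization, and the convergence test are choices made for this sketch, not part of the lecture.

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Minimal K-Means: X is an (N, d) array of feature vectors (e.g. RGB)."""
    rng = np.random.default_rng(seed)
    # Boot step: pick K data points as the initial cluster means m_1..m_K
    means = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iter):
        # Estimate the cluster of each data point: x_i -> C(x_i)
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Re-estimate the cluster parameters: m_j = mean{x_i | x_i in C_j}
        new_means = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                              else means[j] for j in range(K)])
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, labels
```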
K-Means Example
K-Means → EM
• Boot Step:
– Initialize K clusters: C1, …, CK
• Iteration Step:
  – Expectation: estimate the cluster membership of each data point, $p(C_j \mid x_i)$
  – Maximization: re-estimate the cluster parameters $(\mu_j, \Sigma_j)$ and $p(C_j)$ for each cluster $j$
EM Classifier
Classification Results
Input feature vectors: x1 = {r1, g1, b1}, x2 = {r2, g2, b2}, …, xi = {ri, gi, bi}, …
Classifier (EM): assigns each input soft cluster memberships $p(C_1 \mid x_1)$, $p(C_j \mid x_2)$, …, $p(C_j \mid x_i)$, …
Cluster parameters: $(\mu_1, \Sigma_1), p(C_1)$ for C1; $(\mu_2, \Sigma_2), p(C_2)$ for C2; …; $(\mu_k, \Sigma_k), p(C_k)$ for Ck
EM Classifier (Cont.)
$$\mu_j = \frac{\sum_i p(C_j \mid x_i)\, x_i}{\sum_i p(C_j \mid x_i)}, \qquad
\Sigma_j = \frac{\sum_i p(C_j \mid x_i)\,(x_i - \mu_j)(x_i - \mu_j)^T}{\sum_i p(C_j \mid x_i)}, \qquad
p(C_j) = \frac{\sum_i p(C_j \mid x_i)}{N}$$
EM Algorithm
• Boot Step:
– Initialize K clusters: C1, …, CK
    with $(\mu_j, \Sigma_j)$ and $P(C_j)$ for each cluster j.
• Iteration Step:
– Expectation Step
$$p(C_j \mid x_i) = \frac{p(x_i \mid C_j)\, p(C_j)}{p(x_i)} = \frac{p(x_i \mid C_j)\, p(C_j)}{\sum_j p(x_i \mid C_j)\, p(C_j)}$$
– Maximization Step
$$\mu_j = \frac{\sum_i p(C_j \mid x_i)\, x_i}{\sum_i p(C_j \mid x_i)}, \qquad
\Sigma_j = \frac{\sum_i p(C_j \mid x_i)\,(x_i - \mu_j)(x_i - \mu_j)^T}{\sum_i p(C_j \mid x_i)}, \qquad
p(C_j) = \frac{\sum_i p(C_j \mid x_i)}{N}$$
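Below is a minimal sketch of this algorithm for a Gaussian mixture in Python/NumPy, with SciPy's multivariate normal density standing in for $p(x_i \mid C_j)$; the function name, the fixed iteration count, and the small covariance regularizer are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=50, seed=0):
    """EM for a K-component Gaussian mixture; X is an (N, d) array of data points."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    # Boot step: initialize (mu_j, Sigma_j) and P(C_j) for each cluster j
    mu = X[rng.choice(N, size=K, replace=False)].astype(float)
    cov = np.stack([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    prior = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # Expectation step: p(C_j|x_i) = p(x_i|C_j) p(C_j) / sum_j p(x_i|C_j) p(C_j)
        resp = np.column_stack([prior[j] * multivariate_normal.pdf(X, mean=mu[j], cov=cov[j])
                                for j in range(K)])
        resp /= resp.sum(axis=1, keepdims=True)
        # Maximization step: re-estimate mu_j, Sigma_j, and p(C_j)
        Nj = resp.sum(axis=0)                       # sum_i p(C_j|x_i)
        mu = (resp.T @ X) / Nj[:, None]
        for j in range(K):
            diff = X - mu[j]
            cov[j] = (resp[:, j, None] * diff).T @ diff / Nj[j] + 1e-6 * np.eye(d)
        prior = Nj / N                              # p(C_j) = sum_i p(C_j|x_i) / N
    return mu, cov, prior, resp
```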
Simpler Way to Understand
• In practical terms:
  – The Expectation-Maximization (EM) algorithm is an iterative statistical technique for
    estimating the parameters of probabilistic models when some of the data is missing or
    unobserved. EM is particularly useful when you have incomplete or partially observed
    data and want to estimate the underlying hidden variables or parameters of a
    statistical model.
EM
Step 2
Step 3
Image Segmentation using EM
E-Step:
$$p(j \mid x_i, \Theta) = \frac{\alpha_j\, f_j(x_i \mid \theta_j)}{\sum_{k=1}^{K} \alpha_k\, f_k(x_i \mid \theta_k)}$$
where each component density is a multivariate Gaussian:
$$f_j(x \mid \theta_j) = \frac{1}{(2\pi)^{d/2}\, |\Sigma_j|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x - \mu_j)^T \Sigma_j^{-1} (x - \mu_j)\right)$$
M-Step
$$\mu_j^{new} = \frac{\sum_{i=1}^{N} x_i\, p(j \mid x_i, \Theta^{old})}{\sum_{i=1}^{N} p(j \mid x_i, \Theta^{old})}, \qquad
\Sigma_j^{new} = \frac{\sum_{i=1}^{N} p(j \mid x_i, \Theta^{old})\,(x_i - \mu_j^{new})(x_i - \mu_j^{new})^T}{\sum_{i=1}^{N} p(j \mid x_i, \Theta^{old})}, \qquad
\alpha_j^{new} = \frac{1}{N} \sum_{i=1}^{N} p(j \mid x_i, \Theta^{old})$$
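In practice these E/M updates for segmentation are usually run on per-pixel color features. The sketch below uses scikit-learn's GaussianMixture, whose fit() runs the same EM loop internally; the file name "image.jpg", the number of components K = 4, and the use of raw RGB features are illustrative assumptions, not part of the lecture.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture

# Hypothetical input image; any RGB image file would do.
img = plt.imread("image.jpg").astype(float)
H, W, _ = img.shape
pixels = img.reshape(-1, 3)          # x_i = (r_i, g_i, b_i) feature vectors

# Fit a K-component Gaussian mixture with EM (E-step / M-step as above).
K = 4
gmm = GaussianMixture(n_components=K, covariance_type="full", random_state=0)
gmm.fit(pixels)

# Label each pixel with its most probable component -> segmentation map.
labels = gmm.predict(pixels).reshape(H, W)
plt.imshow(labels)
plt.title("EM / GMM segmentation into %d regions" % K)
plt.show()
```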
Sample Results
EM Clustering with Dataset
Data points: X1 = (3, 4), X2 = (6, 5), X3 = (9, 8), X4 = (14, 11)
Assume two clusters with identity covariance, initially centered at c1 = (3, 4) and c2 = (14, 11).
$P(x_1 \mid c_1) = P((3,4) \mid (3,4))$:
$$p(x_1 \mid c_1) = \frac{1}{2\pi} \exp\!\left(-\tfrac{1}{2}\begin{bmatrix}0\\0\end{bmatrix}^{T} \begin{bmatrix}1&0\\0&1\end{bmatrix}^{-1} \begin{bmatrix}0\\0\end{bmatrix}\right) = \frac{1}{2\pi}\, e^{0} = \frac{1}{2\pi} = 0.159 \quad \text{(higher)}$$
EM Clustering with Dataset
Data points: X1 = (3, 4), X2 = (6, 5), X3 = (9, 8), X4 = (14, 11)
$P(x_1 \mid c_2) = P((3,4) \mid (14,11))$ is computed in the same way and is far smaller than $P(x_1 \mid c_1)$, so $x_1$ belongs to $c_1$.
$P(x_2 \mid c_1) = P((6,5) \mid (3,4))$:
$$p(x_2 \mid c_1) = \frac{1}{(2\pi)^{2/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}\begin{bmatrix}6-3\\5-4\end{bmatrix}^{T} \begin{bmatrix}1&0\\0&1\end{bmatrix}^{-1} \begin{bmatrix}6-3\\5-4\end{bmatrix}\right) = \frac{1}{2\pi}\, e^{-5} = 0.00107 \quad \text{(higher)}$$
EM Clustering with Dataset
Data points: X1 = (3, 4), X2 = (6, 5), X3 = (9, 8), X4 = (14, 11)
$P(x_2 \mid c_2) = P((6,5) \mid (14,11))$:
$$p(x_2 \mid c_2) = \frac{1}{2\pi} \exp\!\left(-\tfrac{1}{2}\begin{bmatrix}-8\\-6\end{bmatrix}^{T} \begin{bmatrix}1&0\\0&1\end{bmatrix}^{-1} \begin{bmatrix}-8\\-6\end{bmatrix}\right) = \frac{1}{2\pi}\, e^{-\frac{1}{2}(64+36)} = \frac{1}{2\pi}\, e^{-50}$$
Since $p(x_2 \mid c_1) \gg p(x_2 \mid c_2)$, $x_2$ is assigned to cluster $c_1$.
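A quick numeric check of the three densities worked out above, using SciPy's multivariate normal; the cluster centers and identity covariance are taken from the example.

```python
import numpy as np
from scipy.stats import multivariate_normal

c1, c2 = (3, 4), (14, 11)
I = np.eye(2)                          # identity covariance, as in the example

print(multivariate_normal.pdf((3, 4), mean=c1, cov=I))   # p(x1|c1) ~ 0.159
print(multivariate_normal.pdf((6, 5), mean=c1, cov=I))   # p(x2|c1) ~ 0.00107
print(multivariate_normal.pdf((6, 5), mean=c2, cov=I))   # p(x2|c2) ~ 3.1e-23
```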
EM Clustering with Dataset
• M Step:
  – Calculate the new centroids:
    $\mu_1 = \mathrm{avg}(x_1, x_2) = \left(\tfrac{3+6}{2},\ \tfrac{4+5}{2}\right) = \left(\tfrac{9}{2},\ \tfrac{9}{2}\right) = (4.5,\ 4.5)$
    $\mu_2 = \mathrm{avg}(x_3, x_4) = \left(\tfrac{9+14}{2},\ \tfrac{8+11}{2}\right) = \left(\tfrac{23}{2},\ \tfrac{19}{2}\right) = (11.5,\ 9.5)$
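The same centroids fall out of a short NumPy check over the hard assignments implied above ({x1, x2} → c1 and {x3, x4} → c2):

```python
import numpy as np

X = np.array([[3, 4], [6, 5], [9, 8], [14, 11]], dtype=float)
labels = np.array([0, 0, 1, 1])        # hard assignments from the E-step above

mu1 = X[labels == 0].mean(axis=0)      # -> [ 4.5  4.5]
mu2 = X[labels == 1].mean(axis=0)      # -> [11.5  9.5]
print(mu1, mu2)
```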
EM Applications
• The Goal:
  Automatic image labeling (annotation) to enable object-based image retrieval.
Problem Statement
Training images are annotated with keyword sets, e.g. {trees, grass, cherry trees}, {cheetah, trunk}, {mountains, sky}, {beach, sky, trees, water}; the labels of new test images are unknown (shown as ?).
Abstract Regions
Example abstract regions: boat, building.
Object Model Learning (Ideal)
An image described by {sky, tree, water, boat} is segmented into regions; ideally each region would be matched to a single object so that a learned model could be built for each of Sky, Tree, Water, and Boat, but the correspondence between regions and labels is initially unknown (?).
EM Variant
• Assumptions:
  – The feature distribution of each object within a region is Gaussian;
  – Each image is a set of regions, each of which can be modeled as a mixture of multivariate Gaussian distributions.
1. Initialization Step (Example)
Image & description: I1 contains {O1, O2}, I2 contains {O1, O3}, I3 contains {O2, O3}.
E-Step: $q_{O_1}^{(p)},\ q_{O_2}^{(p)},\ q_{O_3}^{(p)}$
M-Step: $q_{O_1}^{(p+1)},\ q_{O_2}^{(p+1)},\ q_{O_3}^{(p+1)}$
Image Labeling
A test image is segmented into color regions; each region is compared against the object models in the database (e.g. Tree, Sky) to compute $p(\mathrm{tree} \mid \mathrm{region})$ for labeling.
Experiments
• 860 images
• 18 keywords: mountains (30), orangutan (37), track
(40), tree trunk (43), football field (43), beach (45),
prairie grass (53), cherry tree (53), snow (54), zebra
(56), polar bear (56), lion (71), water (76),
chimpanzee (79), cheetah (112), sky (259), grass
(272), tree (361).
• A set of cross-validation experiments was run, using 80% of the images as the training set and the remaining 20% as the test set.
ROC Charts
Two ROC charts plotting True Positive Rate against False Positive Rate, with both axes ranging from 0 to 1.
Sample Results: cheetah
Sample Results (Cont.): grass
Sample Results (Cont.): lion
Any Questions?