Lecture: Expectation Maximization

The document discusses clustering techniques in data mining, focusing on the K-means and Expectation-Maximization (EM) algorithms. It outlines the iterative processes of both algorithms for classifying data points into clusters based on their features, including initialization, expectation, and maximization steps. Additionally, it explains the application of the EM algorithm in image segmentation and provides examples of clustering with datasets.

Data Mining

Clustering with Expectation Maximization
Color Clustering by the K-means Algorithm

Form K-means clusters from a set of n-dimensional feature vectors:

1. Set ic (iteration count) to 1.

2. Choose randomly a set of K means m1(1), …, mK(1).

3. For each vector xi, compute D(xi, mk(ic)) for k = 1, …, K
   and assign xi to the cluster Cj with the nearest mean.

4. Increment ic by 1 and update the means to get m1(ic), …, mK(ic).

5. Repeat steps 3 and 4 until Ck(ic) = Ck(ic+1) for all k, i.e. until the cluster assignments no longer change.
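
A minimal sketch of these steps in Python with NumPy (an illustration, not code from the lecture): it assumes X is an (n, d) array of feature vectors, for example RGB pixel values, and uses Euclidean distance for D(xi, mk).

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: choose K initial means at random from the data
    means = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Step 3: distance of every vector to every mean, then nearest-mean assignment
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 5: stop when the cluster assignments no longer change
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: update each mean as the average of its assigned vectors
        for k in range(K):
            if np.any(labels == k):
                means[k] = X[labels == k].mean(axis=0)
    return means, labels
```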


K-Means Classifier

[Figure] The classifier takes the feature vectors x1 = {r1, g1, b1}, x2 = {r2, g2, b2}, …, xi = {ri, gi, bi} as input and produces the classification results x1 → C(x1), x2 → C(x2), …, xi → C(xi), together with the cluster parameters: mean m1 for C1, m2 for C2, …, mK for CK.

K-Means Classifier (Cont.)

[Figure] The feature vectors xi are the known input; the classification results C(xi) and the cluster parameters mj are the unknown output. Starting from an initial guess of the cluster parameters m1, m2, …, mK, the algorithm alternates between producing classification results C(x1), C(x2), …, C(xi) and re-estimating the cluster parameters, repeating this cycle (iteration ic) until convergence.
K-Means (Cont.)

• Boot Step:
  – Initialize K clusters: C1, …, CK
    Each cluster Cj is represented by its mean mj.
• Iteration Step:
  – Estimate the cluster of each data point: xi → C(xi)
  – Re-estimate the cluster parameters:
    mj = mean{ xi | xi ∈ Cj }
K-Means Example

[Figures: example K-means clustering results]
K-Means → EM

• Boot Step:
  – Initialize K clusters: C1, …, CK
    (μj, Σj) and P(Cj) for each cluster j.
• Iteration Step:
  – Expectation: estimate the cluster membership of each data point, p(Cj | xi)
  – Maximization: re-estimate the cluster parameters (μj, Σj) and p(Cj) for each cluster j
EM Classifier

[Figure] The EM classifier takes the feature vectors x1 = {r1, g1, b1}, x2 = {r2, g2, b2}, …, xi = {ri, gi, bi} as input and produces soft classification results p(Cj | x1), p(Cj | x2), …, p(Cj | xi), together with the cluster parameters: (μ1, Σ1), p(C1) for C1; (μ2, Σ2), p(C2) for C2; …; (μK, ΣK), p(CK) for CK.

EM Classifier (Cont.)

[Figure] The feature vectors xi are the known input; the classification results p(Cj | xi) and the cluster parameters (μj, Σj), p(Cj) are the unknown output.

Expectation Step

[Figure] Known input: the feature vectors xi. Estimated input: the current cluster parameters (μj, Σj), p(Cj). Output: the classification results p(Cj | xi).

p(Cj | xi) = p(xi | Cj) · p(Cj) / p(xi) = p(xi | Cj) · p(Cj) / ∑_k p(xi | Ck) · p(Ck)
Maximization Step

[Figure] Known input: the feature vectors xi. Estimated input: the classification results p(Cj | xi). Output: the new cluster parameters (μj, Σj), p(Cj).

μj = ∑_i p(Cj | xi) · xi / ∑_i p(Cj | xi)

Σj = ∑_i p(Cj | xi) · (xi − μj)(xi − μj)ᵀ / ∑_i p(Cj | xi)

p(Cj) = ∑_i p(Cj | xi) / N
EM Algorithm

• Boot Step:
  – Initialize K clusters: C1, …, CK
    (μj, Σj) and P(Cj) for each cluster j.
• Iteration Step:
  – Expectation Step

    p(Cj | xi) = p(xi | Cj) · p(Cj) / p(xi) = p(xi | Cj) · p(Cj) / ∑_k p(xi | Ck) · p(Ck)

  – Maximization Step

    μj = ∑_i p(Cj | xi) · xi / ∑_i p(Cj | xi)

    Σj = ∑_i p(Cj | xi) · (xi − μj)(xi − μj)ᵀ / ∑_i p(Cj | xi)

    p(Cj) = ∑_i p(Cj | xi) / N
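
The boot step plus one pass of the two updates above can be sketched as follows in Python. This is an illustrative sketch, not code from the lecture: it assumes X is an (N, d) data matrix, that p(xi | Cj) is a multivariate Gaussian evaluated with scipy.stats.multivariate_normal, and that mus, Sigmas, priors hold the current (μj, Σj) and p(Cj).

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, mus, Sigmas, priors):
    N, K = len(X), len(mus)
    # Expectation step: p(Cj | xi) via Bayes' rule
    resp = np.zeros((N, K))
    for j in range(K):
        resp[:, j] = priors[j] * multivariate_normal.pdf(X, mean=mus[j], cov=Sigmas[j])
    resp /= resp.sum(axis=1, keepdims=True)
    # Maximization step: re-estimate mu_j, Sigma_j and p(Cj)
    Nj = resp.sum(axis=0)                        # effective cluster sizes
    new_mus = (resp.T @ X) / Nj[:, None]
    new_Sigmas = np.zeros_like(Sigmas)
    for j in range(K):
        diff = X - new_mus[j]
        new_Sigmas[j] = (resp[:, j, None] * diff).T @ diff / Nj[j]
    new_priors = Nj / N
    return new_mus, new_Sigmas, new_priors, resp
```

Iterating em_step until the parameters stop changing gives the EM clustering described above.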
Simpler Way to Understand

• Imagine you have a puzzle where some pieces are missing. The EM algorithm helps you complete the puzzle by guessing what those missing pieces look like.
Steps You Would Follow

• Guess and Improve (Expectation Step)
  – First, you make a guess about what the missing puzzle pieces might look like. This is like saying, "Hmm, I think the missing pieces could be this color and shape." Your guess doesn't have to be perfect; it's just a starting point.
Steps You Would Follow

• Make It Better (Maximization Step)
  – Then you look at the pieces you have and the ones you guessed. You figure out how to adjust your guess to make it match the pieces you have as closely as possible. This step is like tweaking your guess to fit the puzzle better.
Steps You Would Follow

• Repeat Until Done
  – You keep doing these two steps over and over, making your guess better and better each time. It's like refining your guess until the puzzle is complete.
EM

– The EM algorithm is like a smart helper that makes educated guesses and keeps improving them until the puzzle is solved. It's great for figuring out things when you don't have all the information you need.
EM Algorithm

• In Actual Terms
  – The Expectation-Maximization algorithm is an iterative statistical technique used for estimating the parameters of probabilistic models when some of the data is missing or unobserved. EM is particularly useful when you have incomplete or partially observed data and you want to estimate the underlying hidden variables or parameters of a statistical model.
EM

– The Expectation-Maximization algorithm is an iterative optimization method, used across unsupervised machine learning, that finds maximum likelihood or maximum a posteriori estimates of parameters in statistical models that involve unobserved latent variables.
EM

– The EM algorithm is commonly used for latent variable models and can handle missing data. It consists of an expectation step (E-step) and a maximization step (M-step), forming an iterative process to improve model fit.
EM

– In the E-step, the algorithm computes the expectation of the log-likelihood over the latent variables, using the current parameter estimates.
– In the M-step, the algorithm determines the parameters that maximize the expected log-likelihood obtained in the E-step, and the corresponding model parameters are updated based on the estimated latent variables.
EM

– By iteratively repeating these steps, the EM algorithm seeks to maximize the likelihood of the observed data. It is commonly used for unsupervised learning tasks, such as clustering, where latent variables are inferred, and it has applications in various fields, including machine learning, computer vision, and natural language processing.
Problem

Step 1

Step 2

Step 3

[The problem and its three solution steps are shown only as figures on the original slides.]
Image Segmentation using EM

• Step 1: Feature Extraction


• Step 2: Image Segmentation using EM
Symbols

• The feature vector for pixel i is called xi.
• There are going to be K segments; K is given.
• The j-th segment has a Gaussian distribution with parameters θj = (μj, Σj).
• The αj's are the weights (which sum to 1) of the Gaussians. Θ is the collection of parameters:
  Θ = (α1, …, αK, θ1, …, θK)
Initialization

• Each of the K Gaussians will have parameters θj = (μj, Σj), where
  – μj is the mean of the j-th Gaussian.
  – Σj is the covariance matrix of the j-th Gaussian.
• The covariance matrices are initialized to be the identity matrix.
• The means can be initialized by finding the average feature vectors in each of K windows in the image; this is data-driven initialization.
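
A hedged sketch of this initialization in Python/NumPy, assuming the per-pixel features are stored in an (H, W, d) array; using K horizontal bands of the image as the "windows" is an illustrative assumption, since the slides do not say how the windows are chosen.

```python
import numpy as np

def init_params(features, K):
    H, W, d = features.shape
    Sigmas = np.tile(np.eye(d), (K, 1, 1))      # identity covariance matrices
    alphas = np.full(K, 1.0 / K)                # weights that sum to 1
    # Data-driven means: average feature vector in each of K bands of the image
    bands = np.array_split(features.reshape(-1, d), K)
    mus = np.array([band.mean(axis=0) for band in bands])
    return mus, Sigmas, alphas
```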
E-Step

p(j | xi, Θ) = αj · fj(xi | θj) / ∑_{k=1}^{K} αk · fk(xi | θk)

fj(x | θj) = 1 / ( (2π)^(d/2) · |Σj|^(1/2) ) · exp( −½ (x − μj)ᵀ Σj⁻¹ (x − μj) )
M-Step

μj_new = ∑_{i=1}^{N} xi · p(j | xi, Θ_old) / ∑_{i=1}^{N} p(j | xi, Θ_old)

Σj_new = ∑_{i=1}^{N} p(j | xi, Θ_old) · (xi − μj_new)(xi − μj_new)ᵀ / ∑_{i=1}^{N} p(j | xi, Θ_old)

αj_new = (1/N) · ∑_{i=1}^{N} p(j | xi, Θ_old)
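
The sketch below (illustrative, not from the lecture) applies the E-step formula to every pixel feature vector and labels each pixel with its most probable segment; the parameter shapes follow the initialization sketch above. In practice the E-step and M-step would be iterated to convergence before this final labeling.

```python
# Hard segmentation from the E-step responsibilities p(j | xi, Theta).
# Assumed shapes: features (H, W, d), mus (K, d), Sigmas (K, d, d), alphas (K,).
import numpy as np
from scipy.stats import multivariate_normal

def segment_image(features, mus, Sigmas, alphas):
    H, W, d = features.shape
    X = features.reshape(-1, d)                 # one feature vector per pixel
    K = len(mus)
    resp = np.zeros((len(X), K))
    for j in range(K):
        # alpha_j * f_j(x_i | theta_j) for every pixel
        resp[:, j] = alphas[j] * multivariate_normal.pdf(X, mean=mus[j], cov=Sigmas[j])
    resp /= resp.sum(axis=1, keepdims=True)     # normalize over segments
    labels = resp.argmax(axis=1)                # most probable segment per pixel
    return labels.reshape(H, W)
```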
Sample Results
EM Clustering with Dataset

Data points:
X1 (3, 4)
X2 (6, 5)
X3 (9, 8)
X4 (14, 11)

K = 2

Step 1: Choose the initial means randomly, e.g.
  μ1 = (3, 4),  μ2 = (14, 11)
with covariance Σ1 = [1 0; 0 1] (the identity matrix; the same covariance is used for c2 in the computations below).

Step 2 (E-step): P(x1 | c1) = P((3, 4) | μ = (3, 4)):

p(x1 | c1) = 1 / [(2π)² · (1 − 0)]^(1/2) · exp( −½ (3−3, 4−4) [1 0; 0 1]⁻¹ (3−3, 4−4)ᵀ )

           = 1 / (2π) · exp( −½ (0, 0) (0, 0)ᵀ )

           = 1 / (2π) · e^0 = 1 / (2π) ≈ 0.159   (higher → x1 is assigned to c1)
EM Clustering with Dataset

P(x1 | c2) = P((3, 4) | μ = (14, 11)):

p(x1 | c2) = 1 / [(2π)²]^(1/2) · exp( −½ (3−14, 4−11) [1 0; 0 1]⁻¹ (3−14, 4−11)ᵀ )

           = 1 / (2π) · exp( −½ (−11, −7) (−11, −7)ᵀ )

           = 1 / (2π) · e^(−170/2)   (much smaller than P(x1 | c1))
EM Clustering with Dataset

P(x2 | c1) = P((6, 5) | μ = (3, 4)):

p(x2 | c1) = 1 / [(2π)²]^(1/2) · exp( −½ (6−3, 5−4) [1 0; 0 1]⁻¹ (6−3, 5−4)ᵀ )

           = 1 / (2π) · exp( −½ (3, 1) (3, 1)ᵀ )

           = 1 / (2π) · e^(−5) ≈ 0.00107   (higher → x2 is assigned to c1)
EM Clustering with Dataset

P(x2 | c2) = P((6, 5) | μ = (14, 11)):

p(x2 | c2) = 1 / [(2π)²]^(1/2) · exp( −½ (6−14, 5−11) [1 0; 0 1]⁻¹ (6−14, 5−11)ᵀ )

           = 1 / (2π) · exp( −½ (−8, −6) (−8, −6)ᵀ )

           = 1 / (2π) · e^(−(64+36)/2) = 1 / (2π) · e^(−100/2) = 1 / (2π) · e^(−50)
EM Clustering with Dataset (Cont.)

• After this E-step, the status of the clusters is:
  – C1: x1, x2
  – C2: x3, x4

• M Step:
  – Calculate the new centroids:

    μ1 = average of x1 and x2 = ( (3+6)/2, (4+5)/2 ) = (9/2, 9/2) = (4.5, 4.5)

    μ2 = average of x3 and x4 = ( (9+14)/2, (8+11)/2 ) = (23/2, 19/2) = (11.5, 9.5)

Then, after a second iteration, you obtain the final result if the cluster assignments no longer change.
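
As a quick check of this arithmetic (illustrative only):

```python
# Verifying the M-step centroids with NumPy.
import numpy as np

print(np.mean([[3, 4], [6, 5]], axis=0))    # [ 4.5  4.5]  -> new mu1 from x1, x2
print(np.mean([[9, 8], [14, 11]], axis=0))  # [11.5  9.5]  -> new mu2 from x3, x4
```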
Object Class Recognition in CBIR

• The Goal:
Automatic image labeling (annotation)
to enable object-based image retrieval
Problem Statement

Known: Some images and their corresponding descriptions, e.g.
{trees, grass, cherry trees}, {cheetah, trunk}, {mountains, sky}, {beach, sky, trees, water}

To solve: Which object classes are present in new, unlabeled images?
Abstract Regions

[Figure] Original images are decomposed into abstract regions (color regions, texture regions, and line clusters), which are associated with labels such as boat, building, water, and sky.
Object Model Learning (Ideal)

[Figure] In the ideal case, every region of a training image is labeled (sky, tree, water, boat), so each object model (Sky, Tree, Water, Boat) is learned directly from its own regions: region attributes → object.

Object Model Learning (Real)

[Figure] In reality, only the image-level description {sky, tree, water, boat} is known; which region belongs to which object is unknown (the "?" marks in the figure), so the mapping region attributes → object must be learned with the region labels treated as hidden.

Model Initial Estimation

• Estimate the initial model of an object using all the region features from all images that contain the object.

[Figure: initial "Tree" and "Sky" models]
EM Variant

[Figure] An EM variant refines the initial model for "trees" into the final model for "trees", and the initial model for "sky" into the final model for "sky".


Object Model Learning

Assumptions
• The feature distribution of each
object within a region is a Gaussian;
• Each image is a set of regions, each
of which can be modeled as a
mixture of multivariate Gaussian
distributions.
1. Initialization Step (Example)

[Figure] Three images and their descriptions: I1 contains {O1, O2}, I2 contains {O1, O3}, I3 contains {O2, O3}. For the initial models q_O1^(0), q_O2^(0), q_O3^(0), every region of an image is given equal weight W = 0.5 for each object in that image's description.

2. Iteration Step (Example)

[Figure] In the E-step, the region weights are re-estimated under the current models q_O1^(p), q_O2^(p), q_O3^(p) (e.g. W = 0.8 for regions that match an object model well and W = 0.2 for those that do not); in the M-step, these weights are used to produce the updated models q_O1^(p+1), q_O2^(p+1), q_O3^(p+1).
Image Labeling

[Figure] A test image is decomposed into color regions, and each region is compared against the object models in the database (e.g. Tree, Sky).

To calculate p(tree | image), take the maximum over the image's regions:

p(o | F_I^a) = max_{r^a ∈ F_I^a} p(o | r^a)

e.g. p(tree | image) is the largest p(tree | region) over the color regions of the image.
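
The max-over-regions rule can be sketched in a few lines of Python; the probabilities below are hypothetical placeholders, since in the described system they come from the learned per-object region models.

```python
def object_score(region_probs):
    """Score of one object class for an image: max of p(o | r) over its regions."""
    return max(region_probs)

# e.g. p(tree | image) from three color regions of a test image
print(object_score([0.12, 0.71, 0.30]))   # 0.71
```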
Experiments

• 860 images
• 18 keywords: mountains (30), orangutan (37), track
(40), tree trunk (43), football field (43), beach (45),
prairie grass (53), cherry tree (53), snow (54), zebra
(56), polar bear (56), lion (71), water (76),
chimpanzee (79), cheetah (112), sky (259), grass
(272), tree (361).
• A set of cross-validation experiments (80% as the
training set and the other 20% as the test set)
ROC Charts

[Figure: two ROC charts, true positive rate vs. false positive rate. Left: independent treatment of color and texture. Right: using the intersection of color and texture.]
Sample Results

cheetah
Sample Results (Cont.)

grass
Sample Results (Cont.)

lion
Any Questions?
