Tutorial On Generalized Expectation
Javier R. Movellan
Preliminaries
The goal of this primer is to introduce the EM (expectation maximization) algorithm and some of its modern generalizations, including variational approximations.

Notational conventions

Unless otherwise stated, capital letters are used for random variables, small letters for specific values taken by random variables, and Greek letters for model parameters. We adhere to a Bayesian framework and treat model parameters as random variables with known priors. From this point of view, maximum-likelihood methods can be interpreted as using weak priors. The probability space in which random variables are defined is left implicit and is assumed to be endowed with the conditions needed to support the derivations being presented. We present the results using discrete random variables; conversion to continuous variables simply requires changing probability mass functions into probability density functions and sums into integrals. When the context makes it clear, we identify probability functions by their arguments and drop commas between arguments: e.g., p(xy) is shorthand for the joint probability mass or joint probability density that the random variable X takes the specific value x and the random variable Y takes the value y.

Let O, H be random vectors representing observable data and hidden states, and let $\theta$ represent the model parameters controlling the distribution of O, H. We treat $\theta$ as a random variable with a known prior. We have two problems of interest:

- For a fixed sample o from O, find values of $\theta$ with large posterior probability.
- For a fixed sample o from O, find values of H with large posterior probability.

Both problems are formally identical, so we will focus on the first one. Note that

$$p(\theta \mid o) = \frac{p(o, \theta)}{p(o)} \quad (1)$$

Thus

$$\operatorname*{argmax}_{\theta} p(\theta \mid o) = \operatorname*{argmax}_{\theta} \log p(o, \theta) \quad (2)$$
Let $q = \{q_\psi(\cdot \mid o) : \psi \in \Psi\}$ be a family of distributions of $H$ parameterized by $\psi$. We call $q$ a variational family, and $\psi$ the variational parameters of that family. Note

$$\log p(o, \theta) = \sum_h q_\psi(h \mid o) \log p(o, \theta) \quad (3)$$

$$= \sum_h q_\psi(h \mid o) \log \frac{p(o, h, \theta)\, q_\psi(h \mid o)}{q_\psi(h \mid o)\, p(h \mid o, \theta)} \quad (4)$$

$$= F(\psi, \theta) + K(\psi, \theta) \quad (5)$$

where

$$F(\psi, \theta) \stackrel{\text{def}}{=} \sum_h q_\psi(h \mid o) \log \frac{p(o, h, \theta)}{q_\psi(h \mid o)} \quad (6)$$

$$K(\psi, \theta) \stackrel{\text{def}}{=} \sum_h q_\psi(h \mid o) \log \frac{q_\psi(h \mid o)}{p(h \mid o, \theta)} \quad (7)$$
Note $K(\psi, \theta)$ is the KL divergence between the distribution $q_\psi(\cdot \mid o)$ and $p(\cdot \mid o, \theta)$. Since KL divergences are non-negative, it follows that $F(\psi, \theta)$ is a lower bound on $\log p(o, \theta)$, i.e.,

$$\log p(o, \theta) \geq F(\psi, \theta) \quad (8)$$

This inequality becomes an equality for values of $\psi$ for which $K(\psi, \theta) = 0$, i.e., values of $\psi$ such that $q_\psi(h \mid o) = p(h \mid o, \theta)$ for all $h$.
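To make the decomposition concrete, the short sketch below checks the identity $\log p(o, \theta) = F(\psi, \theta) + K(\psi, \theta)$ and the bound (8) numerically on a toy model with a discrete hidden state. The joint table $p(o, h \mid \theta)$, the prior weight on $\theta$, and the particular variational distribution $q_\psi$ are illustrative choices made for this check, not part of the text above.

```python
import numpy as np

# Toy model: hidden state h in {0, 1, 2}, one fixed observation o.
p_theta = 0.5                                      # prior mass placed on this particular theta
p_oh_given_theta = np.array([0.10, 0.25, 0.15])    # p(o, h | theta) for h = 0, 1, 2
p_o_h_theta = p_oh_given_theta * p_theta           # joint p(o, h, theta)

log_p_o_theta = np.log(p_o_h_theta.sum())          # log p(o, theta), marginalizing over h
posterior = p_o_h_theta / p_o_h_theta.sum()        # exact posterior p(h | o, theta)

q = np.array([0.5, 0.3, 0.2])                      # an arbitrary member q_psi(h | o)

F = np.sum(q * np.log(p_o_h_theta / q))            # F(psi, theta): the lower bound
K = np.sum(q * np.log(q / posterior))              # K(psi, theta): the KL divergence

print(np.isclose(log_p_o_theta, F + K))            # True: log p(o, theta) = F + K
print(F <= log_p_o_theta, K >= 0)                  # True True: F is indeed a lower bound
```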
We obtain a sequence $(\psi^{(1)}, \theta^{(1)}), (\psi^{(2)}, \theta^{(2)}), \ldots$ by iterating over two steps:

E Step:
$$\psi^{(k+1)} = \operatorname*{argmax}_{\psi} F(\psi, \theta^{(k)}) \quad (9)$$

Note that since
$$\log p(o, \theta) = F(\psi, \theta) + K(\psi, \theta) \quad (10)$$
and $\log p(o, \theta)$ is a constant with respect to $\psi$, this step amounts to minimizing $K(\psi, \theta^{(k)})$ with respect to $\psi$, i.e., choosing the member of the variational family $q$ that is as close as possible to the current posterior $p(\cdot \mid o, \theta^{(k)})$.

M Step:
$$\theta^{(k+1)} = \operatorname*{argmax}_{\theta} F(\psi^{(k+1)}, \theta) \quad (11)$$
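Read as an algorithm, the two steps are a coordinate ascent on $F$. Below is a minimal generic sketch in Python, assuming the caller supplies functions that perform each partial maximization; the names generalized_em, e_step, m_step, and F are illustrative, not taken from the text.

```python
def generalized_em(psi, theta, e_step, m_step, F, n_iters=100, tol=1e-8):
    """Coordinate ascent on the lower bound F(psi, theta).

    e_step(theta) -> psi maximizing F(psi, theta) over the variational family
    m_step(psi)   -> theta maximizing F(psi, theta) over the model parameters
    F(psi, theta) -> value of the lower bound (used only to monitor progress)
    """
    bound = F(psi, theta)
    for _ in range(n_iters):
        psi = e_step(theta)          # E step: minimize K(psi, theta) over psi
        theta = m_step(psi)          # M step: maximize F(psi, theta) over theta
        new_bound = F(psi, theta)
        if new_bound - bound < tol:  # the bound never decreases, so this gap is >= 0
            break
        bound = new_bound
    return psi, theta
```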
Successive application of the E and M steps maximizes the lower bound $F$ on $\log p(o, \theta)$, i.e.,

$$F(\psi^{(k+1)}, \theta^{(k)}) \geq F(\psi^{(k)}, \theta^{(k)}) \quad (12)$$
$$F(\psi^{(k+1)}, \theta^{(k+1)}) \geq F(\psi^{(k+1)}, \theta^{(k)}) \quad (13)$$

Interpretation

Optimizing $F(\psi, \theta)$ with respect to $\theta$ is equivalent to optimizing

$$\sum_h q_\psi(h \mid o) \log p(o, h, \theta) \quad (14)$$

since the entropy term $-\sum_h q_\psi(h \mid o) \log q_\psi(h \mid o)$ of $F$ does not depend on $\theta$. Moreover, since the log function is concave,

$$\sum_h q_\psi(h \mid o) \log p(o, h, \theta) \leq \log \sum_h q_\psi(h \mid o)\, p(o, h, \theta) \quad (15)$$
Successive applications of EM increase a lower bound $F$ on $\log p(o, \theta)$. This lower bound consists of two terms: a data-driven term $\log p(o, \theta)$ that measures how well the distribution $p(\cdot, \theta)$ fits the observable data, minus a term $K(\psi, \theta)$ that penalizes deviations from the variational family $q$:

$$F(\psi, \theta) = \log p(o, \theta) - K(\psi, \theta) \quad (16)$$
Thus we can think of the generalized EM algorithm as solving a penalized maximum-likelihood problem. Note that (13) and (16) imply

$$\log p(o, \theta^{(k+1)}) - \log p(o, \theta^{(k)}) \geq K(\psi^{(k+1)}, \theta^{(k+1)}) - K(\psi^{(k+1)}, \theta^{(k)}) \quad (17)$$

Note $q_{\psi^{(k+1)}}$ was chosen to be the member of the variational family closest to $p(\cdot \mid o, \theta^{(k)})$. Thus it is not unreasonable (but also not guaranteed) to expect that it may not be as close to $p(\cdot \mid o, \theta^{(k+1)})$. In other words, it is not unreasonable (but also not guaranteed) to expect that

$$K(\psi^{(k+1)}, \theta^{(k+1)}) - K(\psi^{(k+1)}, \theta^{(k)}) \geq 0 \quad (18)$$

and thus

$$\log p(o, \theta^{(k+1)}) \geq \log p(o, \theta^{(k)}) \quad (19)$$
An important special case occurs when the variational family $\{q_\psi(\cdot \mid o)\}$ equals the family of posteriors $\{p(\cdot \mid o, \theta)\}$. In this case the E step sets $\psi^{(k+1)} = \theta^{(k)}$, so that $K(\psi^{(k+1)}, \theta^{(k)}) = 0$, and we can guarantee that

$$\log p(o, \theta^{(k+1)}) - \log p(o, \theta^{(k)}) \geq K(\theta^{(k)}, \theta^{(k+1)}) \geq 0 \quad (20)$$
Moreover, in this case, to maximize $F$ with respect to $\theta^{(k+1)}$ we just need to maximize

$$Q(\theta^{(k)}, \theta^{(k+1)}) \stackrel{\text{def}}{=} \sum_h p(h \mid o, \theta^{(k)}) \log p(o, h, \theta^{(k+1)}) \quad (21)$$
which is the objective function maximized by the standard EM algorithm. In the same vein, note that $F(\psi, \theta)$ is a free energy, i.e., the expected energy of the states plus the entropy of the distribution under which the expected value is computed. In this case the energy of a state $h$ is $\log p(o, h, \theta)$. Thus, if there are no further constraints on the variational family, the optimal distribution $q(\cdot \mid o)$ is Boltzmann:

$$q(h \mid o) \propto \exp\bigl(\log p(o, h, \theta)\bigr) = p(o, h, \theta) \quad (23)$$
$$q(h \mid o) = \frac{p(o, h, \theta)}{\sum_{h'} p(o, h', \theta)} = p(h \mid o, \theta) \quad (24)$$
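As a quick numerical check of this last claim, the sketch below (reusing the same illustrative toy joint table as in the earlier check) verifies that the bound $F$ is tight at the exact posterior and that no other distribution over $h$ achieves a larger value; the numbers are again arbitrary assumptions, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)
p_o_h_theta = np.array([0.10, 0.25, 0.15]) * 0.5    # joint p(o, h, theta), as before
log_p_o_theta = np.log(p_o_h_theta.sum())
posterior = p_o_h_theta / p_o_h_theta.sum()          # p(h | o, theta)

def F(q):
    # Lower bound F for a variational distribution q over the hidden state h.
    return np.sum(q * np.log(p_o_h_theta / q))

print(np.isclose(F(posterior), log_p_o_theta))       # True: the bound is tight at the posterior
for _ in range(5):
    q = rng.dirichlet(np.ones(3))                    # a random distribution over h
    print(F(q) <= F(posterior))                      # True: the posterior maximizes F
```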
Consider the case in which we are given a set of iid observations $o = (o_1, \ldots, o_n)$. If we directly optimize $\log p(o \mid \theta)$ with respect to $\theta$ we get

$$\nabla_\theta \log p(o \mid \theta) = \sum_{i=1}^n \nabla_\theta \log p(o_i \mid \theta) = \sum_{i=1}^n \frac{1}{p(o_i \mid \theta)} \nabla_\theta\, p(o_i \mid \theta)$$
$$= \sum_{i=1}^n \frac{1}{p(o_i \mid \theta)} \sum_h \nabla_\theta\, p(o_i, h \mid \theta) = \sum_{i=1}^n \frac{1}{p(o_i \mid \theta)} \sum_h p(o_i, h \mid \theta)\, \nabla_\theta \log p(o_i, h \mid \theta)$$
$$= \sum_{i=1}^n \sum_h p(h \mid o_i, \theta)\, \nabla_\theta \log p(o_i, h \mid \theta) = 0 \quad (28)$$
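The gradient identity in (28) can be checked numerically. The sketch below compares its right-hand side against a finite-difference approximation of $\nabla_\theta \log p(o \mid \theta)$ for an illustrative two-component 1-D Gaussian mixture whose only free parameter is the mean of one component; the model, the helper names (norm_pdf, joint, log_lik), and all numbers are assumptions made for this check.

```python
import numpy as np

# Illustrative model: with probability PI the observation has mean mu (h = 1),
# otherwise mean MU0 (h = 0); standard deviation SIGMA is fixed.
PI, MU0, SIGMA = 0.4, -2.0, 1.0

def norm_pdf(x, m):
    return np.exp(-0.5 * ((x - m) / SIGMA) ** 2) / (SIGMA * np.sqrt(2.0 * np.pi))

def joint(o, h, mu):                        # p(o_i, h | mu)
    return np.where(h == 1, PI * norm_pdf(o, mu), (1 - PI) * norm_pdf(o, MU0))

def log_lik(o, mu):                         # log p(o | mu) = sum_i log sum_h p(o_i, h | mu)
    return np.log(joint(o, 1, mu) + joint(o, 0, mu)).sum()

rng = np.random.default_rng(1)
o = rng.normal(size=20)
mu = 0.7

# Right-hand side of (28): only the h = 1 term depends on mu, and
# d/dmu log p(o_i, 1 | mu) = (o_i - mu) / SIGMA**2.
resp1 = joint(o, 1, mu) / (joint(o, 1, mu) + joint(o, 0, mu))   # p(h = 1 | o_i, mu)
grad_identity = np.sum(resp1 * (o - mu) / SIGMA ** 2)

# Left-hand side via central finite differences of log p(o | mu).
eps = 1e-6
grad_fd = (log_lik(o, mu + eps) - log_lik(o, mu - eps)) / (2 * eps)

print(np.isclose(grad_identity, grad_fd))   # True: the two gradients agree
```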
Example 1

Consider a simple Gaussian mixture model and a vector of independent observations $o = (o_1, \ldots, o_n)^T$ from that model,

$$\log p(o \mid \theta) = \sum_{i=1}^n \log p(o_i \mid \theta) \quad (29)$$

where the prior mixture terms are fixed. Taking derivatives of $\log p(o \mid \theta)$ with respect to $\theta$ and setting them to zero gives a non-linear equation that is difficult to solve directly. However, EM asks us to optimize (keeping only the terms that depend on the mean of the first component)

$$\sum_{i=1}^n p(H = 1 \mid o_i, \theta) \log p(o_i, H = 1 \mid \theta) \quad (34)$$

whose maximization with respect to that mean has the closed-form solution

$$\frac{\sum_i o_i\, p(H = 1 \mid o_i, \theta)}{\sum_i p(H = 1 \mid o_i, \theta)} \quad (36)$$

i.e., a weighted average of the observations, with weights given by the posterior probability that each observation came from that component.
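The sketch below implements the resulting EM iteration for such a mixture, alternating the posterior responsibilities (E step) with the weighted-average update of equation (36) (M step), and asserts the monotone increase of the likelihood guaranteed by (20). The specific modeling details (two components, unit variance, a fixed mean for the second component, the particular mixing weight) are illustrative assumptions; the text only specifies a simple Gaussian mixture with fixed prior mixture terms.

```python
import numpy as np

# EM for a two-component 1-D Gaussian mixture in which the mixing weight PI,
# the variance, and the mean of component H = 0 are held fixed; only the mean
# mu of component H = 1 is estimated.
rng = np.random.default_rng(0)
PI, SIGMA, MU0 = 0.6, 1.0, 0.0           # fixed mixing weight, std, and mean of component 0
TRUE_MU = 3.0
h = rng.random(1000) < PI                 # simulate hidden component labels
o = np.where(h, rng.normal(TRUE_MU, SIGMA, 1000), rng.normal(MU0, SIGMA, 1000))

def norm_pdf(x, m):
    return np.exp(-0.5 * ((x - m) / SIGMA) ** 2) / (SIGMA * np.sqrt(2.0 * np.pi))

mu, prev_ll = 1.0, -np.inf                # initial guess for the free parameter
for _ in range(50):
    # E step: responsibilities p(H = 1 | o_i, theta) under the current mu.
    w1 = PI * norm_pdf(o, mu)
    w0 = (1 - PI) * norm_pdf(o, MU0)
    resp = w1 / (w1 + w0)
    ll = np.log(w1 + w0).sum()            # log-likelihood at the current mu
    assert ll >= prev_ll - 1e-9           # EM never decreases the likelihood
    prev_ll = ll
    # M step: the closed-form weighted-average update of equation (36).
    mu = np.sum(resp * o) / np.sum(resp)

print(mu)                                 # close to TRUE_MU
```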
History
The first version of this document was written by Javier R. Movellan in January 2005, as part of the Kolmogorov project.