
Statistical Inference III

(Expectation Maximization (EM) Algorithm)

Mohammad Samsul Alam


Assistant Professor of Applied Statistics
Institute of Statistical Research and Training (ISRT)
University of Dhaka

https://www.isrt.ac.bd/people/msalam

Email: [email protected]


Background I
The maximum likelihood estimate (mle) of a population quantity θ is the value of θ for which the probability of observing the data in hand is maximum. It is obtained by maximizing the likelihood function L(θ|y) = f(y_1, y_2, ..., y_n | θ) with respect to θ. In notation,

    θ̂ = arg max_θ L(θ|y)                                                                    (1)

This approach implicitly assumes that the data vector y is complete, in the sense that there are no missing values.
Now suppose the data set is not complete, so that we can write the complete data as y = {y_o, y_m}.
In such a case, a simple approach is to discard the y_m observations and obtain the mle using y_o alone.
Background II
This approach is not always good practice, because we are not using the information in y_m while estimating the parameter θ.
Dempster, Laird, and Rubin (1977) formalized a two-step iterative procedure for obtaining the mle in such cases.
The first step is the Expectation step and the second is the Maximization step; hence the name EM algorithm.
The EM algorithm uses two likelihood functions: one is the complete data likelihood L(θ | y = {y_o, y_m}), and the other is the incomplete data likelihood, or observed data likelihood, L(θ | y_o).
The main idea is to maximize the observed data likelihood by repeatedly maximizing the (expected) complete data likelihood.



EM Algorithm I

Assume that, among the n units of the sample, n_1 observations have been observed and are the elements of y_o, while the other n_2 = n − n_1 are the elements of y_m.
Also assume that the elements of y_o are iid with common probability distribution P_θ(·), where θ ∈ Ω, and that the elements of y_o and y_m are mutually independent.
From the definition of conditional probability we can write

    P_θ(y_m | y_o) = P_θ(y_o, y_m) / P_θ(y_o),

which we can rewrite as

    log P_θ(y_o) = log P_θ(y_o, y_m) − log P_θ(y_m | y_o).


EM Algorithm II
Let θ_k be a particular realization of θ obtained at the k-th iteration. Multiplying both sides by P_{θ_k}(y_m | y_o) and integrating over y_m gives

    ∫ log P_θ(y_o) P_{θ_k}(y_m | y_o) dy_m = ∫ log P_θ(y_o, y_m) P_{θ_k}(y_m | y_o) dy_m
                                             − ∫ log P_θ(y_m | y_o) P_{θ_k}(y_m | y_o) dy_m   (2)

Using the facts

    ∫ log P_θ(y_o) P_{θ_k}(y_m | y_o) dy_m = log P_θ(y_o)   and   E_θ[g(X)] = ∫ g(x) f(x; θ) dx,

equation (2) can be written as


EM Algorithm III
    log P_θ(y_o) = E_{θ_k}[log P_θ(y_o, y_m) | y_o] − E_{θ_k}[log P_θ(y_m | y_o) | y_o].

This can be written compactly as

    l_θ(y_o) = Q(θ, θ_k) − ν(θ, θ_k)                                                         (3)

by letting

    Q(θ, θ_k) = E_{θ_k}[log P_θ(y_o, y_m) | y_o],
    ν(θ, θ_k) = E_{θ_k}[log P_θ(y_m | y_o) | y_o].

The EM algorithm maximizes the observed data log-likelihood in (3), the left-hand side, by maximizing the complete data log-likelihood.



EM Algorithm IV

Computing the expectation in the first term on the right-hand side of (3), that is, finding Q(θ, θ_k), is the E step, and maximizing Q(θ, θ_k) with respect to θ is the M step of the EM algorithm.



EM Algorithm V
EM Algorithm
Let θ̂^(m) denote the estimate at the m-th step. To compute the estimate at the (m+1)-st step:
1. Expectation Step: Compute

    Q(θ, θ̂^(m)) = E_{θ̂^(m)}[log P_θ(y_o, y_m) | y_o],

   where the expectation is taken under the conditional pdf P_{θ̂^(m)}(y_m | y_o).
2. Maximization Step: Let

    θ̂^(m+1) = arg max_θ Q(θ, θ̂^(m)).
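The two steps above translate directly into a fixed-point loop. Below is a minimal sketch of that loop in Python, assuming a scalar θ and user-supplied model-specific functions e_step and m_step (both names are ours, not part of the lecture).

```python
# A minimal sketch of the generic EM loop, assuming scalar theta and
# user-supplied model-specific functions (hypothetical names):
#   e_step(theta_k) -> a callable Q with Q(theta) = E_{theta_k}[log P_theta(y_o, Y_m) | y_o]
#   m_step(Q)       -> argmax over theta of Q(theta)
def em(theta0, e_step, m_step, tol=1e-8, max_iter=500):
    theta = theta0
    for _ in range(max_iter):
        Q = e_step(theta)                 # E step: build Q(., theta_k)
        theta_new = m_step(Q)             # M step: maximize Q over theta
        if abs(theta_new - theta) < tol:  # stop once the estimates converge
            return theta_new
        theta = theta_new
    return theta
```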



EM Algorithm VI

Let us now study the difference between the log-likelihood function l_θ(y_o) evaluated at two different values θ and θ_k:

    l_θ(y_o) − l_{θ_k}(y_o) = (Q(θ, θ_k) − ν(θ, θ_k)) − (Q(θ_k, θ_k) − ν(θ_k, θ_k))          (4)
                            = (Q(θ, θ_k) − Q(θ_k, θ_k)) + (ν(θ_k, θ_k) − ν(θ, θ_k)).

The second term on the right-hand side of this equation can be simplified as follows.



EM Algorithm VII

    ν(θ_k, θ_k) − ν(θ, θ_k)
      = E_{θ_k}[log P_{θ_k}(y_m | y_o) | y_o] − E_{θ_k}[log P_θ(y_m | y_o) | y_o]
      = ∫ log P_{θ_k}(y_m | y_o) P_{θ_k}(y_m | y_o) dy_m − ∫ log P_θ(y_m | y_o) P_{θ_k}(y_m | y_o) dy_m
      = ∫ [log P_{θ_k}(y_m | y_o) − log P_θ(y_m | y_o)] P_{θ_k}(y_m | y_o) dy_m
      = ∫ −log [ P_θ(y_m | y_o) / P_{θ_k}(y_m | y_o) ] P_{θ_k}(y_m | y_o) dy_m
      = E_{θ_k}[ −log { P_θ(y_m | y_o) / P_{θ_k}(y_m | y_o) } ]



EM Algorithm VIII
Since the negative logarithm is a convex function, by Jensen's inequality¹,

    E_{θ_k}[ −log { P_θ(y_m | y_o) / P_{θ_k}(y_m | y_o) } ] ≥ −log E_{θ_k}[ P_θ(y_m | y_o) / P_{θ_k}(y_m | y_o) ]
      = −log ∫ [ P_θ(y_m | y_o) / P_{θ_k}(y_m | y_o) ] P_{θ_k}(y_m | y_o) dy_m
      = −log ∫ P_θ(y_m | y_o) dy_m
      = −log(1)
      = 0.
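The quantity on the left is the Kullback–Leibler divergence between P_{θ_k}(y_m | y_o) and P_θ(y_m | y_o), and its nonnegativity can also be checked numerically. The snippet below is only an illustration, not part of the lecture; the two normal densities are arbitrary stand-ins for P_{θ_k}(· | y_o) and P_θ(· | y_o).

```python
# Numerical illustration (not part of the proof) that E_q[-log(p/q)] >= 0.
# Here q stands in for P_{theta_k}(y_m | y_o) and p for P_theta(y_m | y_o);
# the two normal densities are arbitrary example choices.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

q = norm(loc=0.0, scale=1.0).pdf
p = norm(loc=1.0, scale=1.5).pdf

kl, _ = quad(lambda z: q(z) * (np.log(q(z)) - np.log(p(z))), -np.inf, np.inf)
print(kl)  # about 0.35 here; it equals 0 only when the two densities coincide
```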



EM Algorithm IX

So we can conclude that the left-hand side of equation (4) is nonnegative whenever the first term on the right-hand side is nonnegative, since the second term is nonnegative as shown in the above equation.
In particular, the first term is made as large as possible by maximizing Q(θ, θ_k), the expected complete data log-likelihood; any such maximizer θ satisfies Q(θ, θ_k) ≥ Q(θ_k, θ_k) and hence l_θ(y_o) ≥ l_{θ_k}(y_o).
This implies that, starting from an initial value, iterative maximization of the expectation of the complete data log-likelihood never decreases the observed data log-likelihood, and under regularity conditions the iterates converge to a (possibly local) maximum of it.



EM Algorithm X

Case I: Censoring Model

Suppose X_1, X_2, ..., X_{n_1} are iid with pdf f(x − θ), for −∞ < x < ∞, where −∞ < θ < ∞. Denote the cdf of X_i by F(x − θ).
Let Z_1, Z_2, ..., Z_{n_2} denote the censored observations. For these observations we only know that Z_j > a, for some known a, and that the Z_j's are independent of the X_i's.

¹Jensen's inequality states that if f is a convex function, then

    E{f(X)} ≥ f(E{X}),

provided that both expectations exist.


Solution I

The observed data and complete data likelihoods are given, respectively, by

    L(θ|x) = [1 − F(a − θ)]^{n_2} ∏_{i=1}^{n_1} f(x_i − θ),                                  (5)

    L^c(θ|x, z) = ∏_{i=1}^{n_1} f(x_i − θ) ∏_{i=1}^{n_2} f(z_i − θ).                         (6)



Solution II

By expressions (5) and (6), the conditional pdf of Z given X is the ratio of (6) to (5), that is,

    k(z|θ, x) = ∏_{i=1}^{n_1} f(x_i − θ) ∏_{i=1}^{n_2} f(z_i − θ) / { [1 − F(a − θ)]^{n_2} ∏_{i=1}^{n_1} f(x_i − θ) }
              = [1 − F(a − θ)]^{−n_2} ∏_{i=1}^{n_2} f(z_i − θ),   a < z_i, i = 1, ..., n_2.   (7)

Thus Z and X are independent, and Z_1, Z_2, ..., Z_{n_2} are iid with common pdf f(z − θ)/[1 − F(a − θ)], for z > a.
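This truncated pdf can be sampled by inverting its cdf, which is occasionally handy (for example, in a Monte Carlo version of the E step). The sketch below is our own illustration, with the standard normal chosen as an example f; the function name is hypothetical.

```python
# Our own illustration: drawing Z from the truncated pdf
# f(z - theta) / [1 - F(a - theta)], z > a, by inverting the cdf.
# The standard normal is used as an example choice of f/F.
import numpy as np
from scipy.stats import norm

def sample_truncated(theta, a, size, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=size)
    lower = norm.cdf(a - theta)                        # F(a - theta)
    return theta + norm.ppf(lower + u * (1.0 - lower))

print(sample_truncated(theta=0.0, a=1.0, size=5))      # every draw exceeds a = 1.0
```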



Solution III
Based on these observations and expression (7), we have the following derivation:

    Q(θ|θ_0, x) = E_{θ_0}[log L^c(θ|x, Z)]                                                   (8)
                = E_{θ_0}[ Σ_{i=1}^{n_1} log f(x_i − θ) + Σ_{i=1}^{n_2} log f(Z_i − θ) ]
                = Σ_{i=1}^{n_1} log f(x_i − θ) + n_2 E_{θ_0}[log f(Z − θ)]
                = Σ_{i=1}^{n_1} log f(x_i − θ) + n_2 ∫_a^∞ log f(z − θ) · f(z − θ_0)/[1 − F(a − θ_0)] dz   (9)

This last result is the E step of the EM algorithm.
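For a concrete f, the integral in (9) can be evaluated numerically, so Q(θ|θ_0, x) can be computed (and then maximized) without further algebra. The sketch below is our own assumed illustration, not part of the lecture; the example call plugs in the standard normal pdf/cdf and made-up data.

```python
# An assumed sketch (not from the slides) of evaluating (9) by numerical
# integration; logf, pdf and cdf are passed in so any f(x - theta) can be used.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def Q(theta, theta0, x_obs, n2, a, logf, pdf, cdf):
    obs_term = sum(logf(xi - theta) for xi in x_obs)
    tail_density = lambda z: pdf(z - theta0) / (1.0 - cdf(a - theta0))  # pdf of Z given Z > a
    tail, _ = quad(lambda z: logf(z - theta) * tail_density(z), a, np.inf)
    return obs_term + n2 * tail

# example call with the standard normal as f and made-up data
print(Q(0.5, 0.0, [0.2, 1.1, -0.4], n2=2, a=1.0,
        logf=norm.logpdf, pdf=norm.pdf, cdf=norm.cdf))
```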



Solution IV
For the M step we need the partial derivative of Q(θ|θ_0, x) with respect to θ. This is easily found to be

    ∂Q/∂θ = −{ Σ_{i=1}^{n_1} f'(x_i − θ)/f(x_i − θ) + n_2 ∫_a^∞ [f'(z − θ)/f(z − θ)] · f(z − θ_0)/[1 − F(a − θ_0)] dz }.   (10)

Taking θ_0 = θ̂^(0), an initial estimate, the first-step EM estimate is the value of θ, say θ̂^(1), which solves ∂Q/∂θ = 0.

Example: Censoring Model


For the censoring model discussed so far, assume that X has a N(θ, 1) distribution. Find the M step of the EM algorithm.



Solution V

Then

    f(x) = φ(x) = (2π)^{−1/2} exp{−x²/2},

and it is easy to show that f'(x)/f(x) = −x.
Letting Φ(z) denote, as usual, the cdf of a standard normal random variable, by (10) the partial derivative of Q(θ|θ_0, x) with respect to θ for the censoring model simplifies to



Solution VI

    ∂Q/∂θ = Σ_{i=1}^{n_1}(x_i − θ) + n_2 ∫_a^∞ (z − θ) (1/√(2π)) exp{−(z − θ_0)²/2} / [1 − Φ(a − θ_0)] dz
          = n_1(x̄ − θ) + n_2 ∫_a^∞ (z − θ_0) (1/√(2π)) exp{−(z − θ_0)²/2} / [1 − Φ(a − θ_0)] dz − n_2(θ − θ_0)
          = n_1(x̄ − θ) − n_2 ∫_a^∞ [f'(z − θ_0)/f(z − θ_0)] · f(z − θ_0)/[1 − Φ(a − θ_0)] dz − n_2(θ − θ_0)
          = n_1(x̄ − θ) − n_2 ∫_a^∞ f'(z − θ_0)/[1 − Φ(a − θ_0)] dz − n_2(θ − θ_0)
          = n_1(x̄ − θ) − n_2 ∫_a^∞ (d/dz) f(z − θ_0)/[1 − Φ(a − θ_0)] dz − n_2(θ − θ_0),

where the second line writes (z − θ) = (z − θ_0) − (θ − θ_0) and the third uses f'(x)/f(x) = −x.



Solution VII
    ∂Q/∂θ = n_1(x̄ − θ) − n_2 ∫_a^∞ (d/dz) f(z − θ_0)/[1 − Φ(a − θ_0)] dz − n_2(θ − θ_0)
          = n_1(x̄ − θ) + n_2 φ(a − θ_0)/[1 − Φ(a − θ_0)] − n_2(θ − θ_0),

since ∫_a^∞ (d/dz) f(z − θ_0) dz = −f(a − θ_0) = −φ(a − θ_0).

Solving ∂Q/∂θ = 0 for θ determines the EM step estimates. In particular, given that θ̂^(m) is the EM estimate at the m-th step, the (m+1)-st step estimate is

    θ̂^(m+1) = (n_1/n) x̄ + (n_2/n) θ̂^(m) + (n_2/n) · φ(a − θ̂^(m)) / [1 − Φ(a − θ̂^(m))],

where n = n_1 + n_2.
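This recursion is simple to run in code. The following is a small sketch of the fixed-point iteration for the N(θ, 1) censoring model; the function name, stopping rule, and example data are our own choices, not part of the lecture.

```python
# A sketch of the fixed-point iteration above for the N(theta, 1) censoring
# model; the function name, stopping rule and example data are our own.
import numpy as np
from scipy.stats import norm

def em_censored_normal(x_obs, n2, a, theta0=0.0, tol=1e-10, max_iter=1000):
    x_obs = np.asarray(x_obs, dtype=float)
    n1 = x_obs.size
    n = n1 + n2
    theta = theta0
    for _ in range(max_iter):
        mills = norm.pdf(a - theta) / (1.0 - norm.cdf(a - theta))
        theta_new = (n1 / n) * x_obs.mean() + (n2 / n) * theta + (n2 / n) * mills
        if abs(theta_new - theta) < tol:
            break
        theta = theta_new
    return theta_new

print(em_censored_normal(x_obs=[1.2, 0.4, -0.3, 0.9], n2=3, a=1.5))
```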



Solution VIII

Example II
Let X_i and Z_i have identical exponential distributions with rate λ, and suppose they are independent of each other. Assume that the Z_i are censored observations (we only know that Z_i > a), and that there are n_2 = n − n_1 of them, where n_1 is the number of observed values. Estimate λ using the EM algorithm.

For the censoring model, we have

    Q(λ|x, λ_0) = Σ_{i=1}^{n_1} log(λ e^{−λ x_i}) + n_2 ∫_a^∞ log f(z; λ) · f(z; λ_0)/[1 − F(a; λ_0)] dz



Solution IX
Using the fact that, for an exponential distribution, F(a; λ) = 1 − exp{−λa}, we can write

    Q(λ|x, λ_0) = Σ_{i=1}^{n_1} log(λ e^{−λ x_i}) + n_2 ∫_a^∞ log f(z; λ) · f(z; λ_0)/e^{−λ_0 a} dz.

Now,

    n_2 ∫_a^∞ log f(z; λ) · f(z; λ_0)/e^{−λ_0 a} dz = n_2 ∫_a^∞ log(λ e^{−λz}) · λ_0 e^{−λ_0 z}/e^{−λ_0 a} dz



Solution X
    = n_2 ∫_a^∞ [log λ − λz] · λ_0 e^{−λ_0 z}/e^{−λ_0 a} dz
    = (n_2 log λ / e^{−λ_0 a}) ∫_a^∞ λ_0 e^{−λ_0 z} dz − (n_2 λ λ_0 / e^{−λ_0 a}) ∫_a^∞ z e^{−λ_0 z} dz
    = (n_2 log λ / e^{−λ_0 a}) [1 − F(a; λ_0)] − (n_2 λ λ_0 / e^{−λ_0 a}) ∫_a^∞ z e^{−λ_0 z} dz
    = n_2 log λ − (n_2 λ λ_0 / e^{−λ_0 a}) ∫_a^∞ z e^{−λ_0 z} dz

To apply ∫_a^b u dv = uv|_a^b − ∫_a^b v du, let us assume that

    u = z  ⟹  du = dz,   and
    dv = e^{−λ_0 z} dz  ⟹  v = ∫ e^{−λ_0 z} dz = −(1/λ_0) e^{−λ_0 z}.
Solution XI
Therefore,

    (n_2 λ λ_0 / e^{−λ_0 a}) ∫_a^∞ z e^{−λ_0 z} dz
      = (n_2 λ λ_0 / e^{−λ_0 a}) { [ −(z/λ_0) e^{−λ_0 z} ]_a^∞ − ∫_a^∞ −(1/λ_0) e^{−λ_0 z} dz }
      = (n_2 λ λ_0 / e^{−λ_0 a}) { (a/λ_0) e^{−λ_0 a} + [ −(1/λ_0²) e^{−λ_0 z} ]_a^∞ }
      = (n_2 λ λ_0 / e^{−λ_0 a}) { (a/λ_0) e^{−λ_0 a} + (1/λ_0²) e^{−λ_0 a} }
      = n_2 a λ + n_2 λ/λ_0



Solution XII
Finally, we have

    Q(λ|x, λ_0) = Σ_{i=1}^{n_1} log(λ e^{−λ x_i}) + n_2 log λ − n_2 a λ − n_2 λ/λ_0
                = n_1 log λ − n_1 λ x̄ + n_2 log λ − n_2 a λ − n_2 λ/λ_0
                = n log λ − n_1 λ x̄ − n_2 a λ − n_2 λ/λ_0

Then, for the M step, we set ∂Q(λ|x, λ_0)/∂λ = 0 and solve for λ. This yields, for the (m+1)-th iteration,

    λ^(m+1) = n / ( n_1 x̄ + n_2 a + n_2/λ^(m) )
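A small sketch of this update follows; the function name, stopping rule, and example numbers are our own choices, not part of the lecture.

```python
# A sketch of the update above for the censored exponential model; the
# function name, stopping rule and example numbers are our own.
import numpy as np

def em_censored_exponential(x_obs, n2, a, lam0=1.0, tol=1e-12, max_iter=1000):
    x_obs = np.asarray(x_obs, dtype=float)
    n1 = x_obs.size
    n = n1 + n2
    lam = lam0
    for _ in range(max_iter):
        lam_new = n / (n1 * x_obs.mean() + n2 * a + n2 / lam)
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return lam_new

print(em_censored_exponential(x_obs=[0.3, 1.7, 0.8, 2.4], n2=2, a=3.0))
```

As a check on the algebra, the fixed point of this recursion is λ̂ = n_1/(n_1 x̄ + n_2 a), which is also the maximizer of the corresponding observed data likelihood [1 − F(a; λ)]^{n_2} ∏_{i=1}^{n_1} f(x_i; λ) for this model.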



Solution XIII

Case 2: Mixture Distribution

Consider a mixture problem involving normal distributions. Suppose Y_1 has a N(µ_1, σ_1²) distribution and Y_2 has a N(µ_2, σ_2²) distribution. Let W be a Bernoulli random variable, independent of Y_1 and Y_2, with probability of success ε = P(W = 1). Suppose the random variable we observe is X = (1 − W)Y_1 + W Y_2. In this case, the vector of parameters is θ = (µ_1, σ_1², µ_2, σ_2², ε)'.

The pdf of the mixture random variable X = (1 − W)Y_1 + W Y_2 is

    f(x) = (1 − ε) f_1(x) + ε f_2(x),   −∞ < x < ∞,



Solution XIV
where f_j(x) = σ_j^{−1} φ[(x − µ_j)/σ_j], j = 1, 2, and φ(z) is the pdf of a standard normal random variable (see Section 3.4 of Hogg et al., 2013).
Suppose we observe a random sample X' = (X_1, X_2, ..., X_n) from this mixture distribution with pdf f(x). Then the log of the likelihood function is

    l(θ|x) = Σ_{i=1}^n log [ (1 − ε) f_1(x_i) + ε f_2(x_i) ].                                (11)

In this mixture problem, the unobserved data are the random variables that identify the distribution membership.
For i = 1, 2, ..., n, define the random variables

    W_i = 0 if X_i has pdf f_1(x),
    W_i = 1 if X_i has pdf f_2(x).
Solution XV
These variables, of course, constitute a random sample on the Bernoulli random variable W.
Accordingly, assume that W_1, W_2, ..., W_n are iid Bernoulli random variables with probability of success ε.
The complete likelihood function is

    L^c(θ|x, w) = ∏_{W_i=0} f_1(x_i) ∏_{W_i=1} f_2(x_i).                                     (12)

Hence the log of the complete likelihood function is

    l^c(θ|x, w) = Σ_{W_i=0} log f_1(x_i) + Σ_{W_i=1} log f_2(x_i)
                = Σ_{i=1}^n [ (1 − w_i) log f_1(x_i) + w_i log f_2(x_i) ].                   (13)



Solution XVI
For the E step of the algorithm, we need the conditional expectation of W_i given x under θ_0; that is,

    E_{θ_0}[W_i | θ_0, x] = P[W_i = 1 | θ_0, x].                                             (14)

An estimate of this expectation is the likelihood that x_i was drawn from distribution f_2(x), which is given by

    γ_i = ε̂ f_{2,0}(x_i) / [ (1 − ε̂) f_{1,0}(x_i) + ε̂ f_{2,0}(x_i) ],                        (15)

where the subscript 0 signifies that the parameters at θ_0 are being used.
Expression (15) is intuitively evident; see McLachlan and Krishnan (1997) for more discussion.



Solution XVII

Replacing w_i by γ_i in expression (13), the M step of the algorithm is to maximize

    Q(θ|θ_0, x) = Σ_{i=1}^n [ (1 − γ_i) log f_1(x_i) + γ_i log f_2(x_i) ].                   (16)

This maximization is easy to carry out by taking partial derivatives of Q(θ|θ_0, x) with respect to the parameters. For example,

    ∂Q/∂µ_1 = Σ_{i=1}^n (1 − γ_i)(−1/(2σ_1²))(−2)(x_i − µ_1) = (1/σ_1²) Σ_{i=1}^n (1 − γ_i)(x_i − µ_1).



Solution XVIII

Setting this to 0 and solving for µ1 yields the estimate of µ1 .


The estimates of the other mean and the variances can be
obtained similarly. Thesee estiamtes are
P1
(1 − γi )xi
µ̂1 = Pi=1
1
i=1 (1 − γi )
P1
(1 − γi )(xi − µ̂1 )2
σ̂12 = i=1 P1
i=1 (1 − γi )
P1
γi xi
µ̂2 = Pi=1
1
i=1 γi
P1
γi (xi − µ̂1 )2
σ̂22 = i=1 P1
i=1 γi



Solution XIX

Since γ_i is an estimate of P[W_i = 1 | θ_0, x], the average (1/n) Σ_{i=1}^n γ_i is an estimate of ε = P[W_i = 1]. This average is the estimate ε̂.
So, to start the algorithm, we need an initial guess for each component of θ = (µ_1, σ_1², µ_2, σ_2², ε)'.
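Putting the E step (15), the M step estimates above, and the update of ε together gives one complete EM pass. The sketch below is our own implementation outline; the function name, starting values, fixed iteration count, and simulated data in the example call are arbitrary choices, not part of the lecture.

```python
# A sketch of the full EM iteration for the two-component normal mixture,
# combining (15) with the closed-form M-step estimates above; the function
# name, starting values, iteration count and simulated data are our own.
import numpy as np
from scipy.stats import norm

def em_mixture(x, eps, mu1, var1, mu2, var2, n_iter=200):
    x = np.asarray(x, dtype=float)
    for _ in range(n_iter):
        # E step: gamma_i estimates P(W_i = 1 | x), equation (15)
        f1 = norm.pdf(x, loc=mu1, scale=np.sqrt(var1))
        f2 = norm.pdf(x, loc=mu2, scale=np.sqrt(var2))
        gamma = eps * f2 / ((1 - eps) * f1 + eps * f2)
        # M step: weighted means and variances, then the mixing proportion
        w1, w2 = (1 - gamma).sum(), gamma.sum()
        mu1 = np.sum((1 - gamma) * x) / w1
        var1 = np.sum((1 - gamma) * (x - mu1) ** 2) / w1
        mu2 = np.sum(gamma * x) / w2
        var2 = np.sum(gamma * (x - mu2) ** 2) / w2
        eps = gamma.mean()  # the average of the gamma_i estimates epsilon
    return eps, mu1, var1, mu2, var2

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 150), rng.normal(4.0, 1.5, 50)])
print(em_mixture(x, eps=0.5, mu1=-1.0, var1=1.0, mu2=1.0, var2=1.0))
```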

