
Expectation Maximization
– Introduction to the EM algorithm

TLT-5906 Advanced Course in Digital Transmission

Jukka Talvitie, M.Sc. (eng)


[email protected]
Department of Communication Engineering
Tampere University of Technology

M.Sc. Jukka Talvitie 5.12.2013


Outline
• Expectation Maximization (EM) algorithm
  – Motivation, background
  – Where can the EM algorithm be used?
• EM principle
  – Formal definition
  – How does the algorithm really work?
  – Coin toss example
  – About some practical issues
• More advanced examples
  – Line fitting with the EM algorithm
  – Parameter estimation of a multivariate Gaussian mixture
• Conclusions



Motivation

• Consider the classical line-fitting problem:
  – Assume the measurements below come from a linear model y = ax + b + n (here
    the line parameters are a and b, and n is zero-mean noise)

[Figure: scatter plot of the measurements over x ∈ [0, 1]]



Motivation

• We use LS (Least Squares) to find the best fit:
• Is this the best solution?

[Figure: the measurements and the LS fit]
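For concreteness, a minimal sketch of the ordinary LS fit in Python/NumPy; the data below are synthetic stand-ins for the measurements in the figure, with illustrative parameter values (not the slide's actual data):

```python
import numpy as np

# Synthetic stand-in data for the figure: y = a*x + b + n with illustrative
# values a = 0.5, b = 1.6 and small white noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 30)
y = 0.5 * x + 1.6 + 0.05 * rng.standard_normal(x.size)

# Ordinary LS: stack the design matrix [x, 1] and solve in the least-squares sense.
X = np.column_stack([x, np.ones_like(x)])
(a_ls, b_ls), *_ = np.linalg.lstsq(X, y, rcond=None)
print(a_ls, b_ls)   # close to 0.5 and 1.6 when the noise is white
```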



Motivation

• LS would be the best linear unbiased estimator if the noise were
  uncorrelated with constant variance
• Here the noise term is actually correlated, and the true linear model of
  this realization is shown in the figure below as the black line
  – Here LS gives too much weight to a group of samples in the middle
[Figure: the measurements, the LS fit, and the correct line]



Motivation
• Taking the correlation of the noise term into account, we can use the
  Generalized LS method, and the result can be improved considerably
• However, in many cases we do not know the correlation model
  – It is hidden in the observations and we cannot access it directly
  – Therefore, e.g. here we would need to estimate the covariance and the
    line parameters simultaneously
• These sorts of problems can quickly become very complicated
  – How to estimate the covariance without knowing the line parameters,
    and vice versa?
• Intuitive (heuristic) solution:
  – Iteratively estimate one parameter, then the other, and continue…
  – No guarantee of performance in this case (e.g. compared to the maximum
    likelihood (ML) solution)
• The EM algorithm provides the ML solution for these sorts of problems

[Figure: the measurements, the LS fit, the correct line, and the Generalized LS fit]
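As a point of comparison, a minimal sketch of the Generalized LS estimate when the noise covariance C is assumed known; the AR(1)-style covariance, sample points and parameter values are illustrative assumptions, not the slide's data. When C is unknown, one would have to alternate between estimating C and the line parameters, which is exactly the kind of problem the EM algorithm formalizes:

```python
import numpy as np

# Generalized LS with a known noise covariance C:
#   [a, b] = (X^T C^-1 X)^-1 X^T C^-1 y
def gls_fit(x, y, C):
    X = np.column_stack([x, np.ones_like(x)])
    Ci = np.linalg.inv(C)
    return np.linalg.solve(X.T @ Ci @ X, X.T @ Ci @ y)

# Illustrative correlated-noise realization: AR(1)-type covariance with rho = 0.9.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 30)
idx = np.arange(x.size)
C = 0.9 ** np.abs(np.subtract.outer(idx, idx))
n = 0.05 * (np.linalg.cholesky(C) @ rng.standard_normal(x.size))
y = 0.5 * x + 1.6 + n
print(gls_fit(x, y, C))   # close to [0.5, 1.6]
```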



Expectation Maximization
Algorithm
• Presented by Dempster, Laird and Rubin in [1] in 1977
  – Basically the same principle had already been proposed earlier by
    other authors in specific circumstances
• The EM algorithm is an iterative estimation algorithm that can derive
  the maximum likelihood (ML) estimates in the presence of
  missing/hidden data ("incomplete data")
  – e.g. the classical case is the Gaussian mixture, where we have
    a set of unknown Gaussian distributions (see the example later on)
Many-to-one mapping [2]:
  – X: underlying space; x: complete data (required for ML)
  – Y: observation space; y: observation
  – x is observed only by means of y(x); X(y) is the subset of X determined by y



Expectation Maximization
Algorithm
• The basic functioning of the EM algorithm can be divided into two
  steps (the parameter to be estimated is θ):
  – Expectation step (E-step)
    • Take the expected value of the complete-data log-likelihood given the
      observation and the current parameter estimate $\hat{\theta}_k$:

      $Q(\theta, \hat{\theta}_k) = E\{\log f(x \mid \theta) \mid y, \hat{\theta}_k\}$

  – Maximization step (M-step)
    • Maximize the Q-function of the E-step (basically, the expected complete
      data are used as if they were measured observations):

      $\hat{\theta}_{k+1} = \arg\max_{\theta} Q(\theta, \hat{\theta}_k)$

• The likelihood of the parameter increases at every iteration
  – EM converges towards some local maximum of the likelihood function
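A minimal, generic sketch of this two-step loop; the functions e_step and m_step are placeholders for problem-specific implementations and are not defined on the slides:

```python
# Generic EM loop for a scalar parameter theta: alternate the E-step
# (build the Q-function from the current estimate) and the M-step (maximize it).
def em(y, theta0, e_step, m_step, max_iter=100, tol=1e-9):
    theta = theta0
    for _ in range(max_iter):
        q = e_step(y, theta)      # Q(., theta_k): expected complete-data log-likelihood
        theta_new = m_step(q)     # theta_{k+1} = argmax_theta Q(theta, theta_k)
        if abs(theta_new - theta) < tol:
            break                 # likelihood is non-decreasing, so we are at a local maximum
        theta = theta_new
    return theta
```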
An example: ML estimation vs. EM
algorithm [3]
• We wish to estimate the variance of S:
  – observation Y = S + N
    • S and N are normally distributed with zero means and
      variances θ and 1, respectively
  – Now Y is also normally distributed (zero mean, variance θ + 1)
• The ML estimate can be derived easily:

  $\hat{\theta}_{ML} = \arg\max_{\theta}\, p(y \mid \theta) = \max\{0,\, y^2 - 1\}$

• The zero in the above result comes from the fact that we know the
  variance is always non-negative
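A small numerical check of this closed-form result, assuming an arbitrary illustrative observation y (a sketch, not part of the slides):

```python
import numpy as np

# For Y ~ N(0, theta + 1) and a single observation y, the grid maximizer of the
# likelihood over theta >= 0 matches the closed form max(0, y^2 - 1).
def neg_log_lik(theta, y):
    var = theta + 1.0
    return 0.5 * np.log(2.0 * np.pi * var) + y**2 / (2.0 * var)

y = 1.8
grid = np.linspace(0.0, 10.0, 100001)            # theta constrained to be non-negative
theta_grid = grid[np.argmin(neg_log_lik(grid, y))]
theta_closed = max(0.0, y**2 - 1.0)
print(theta_grid, theta_closed)                  # both approximately 2.24
```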



An example: ML estimation vs. EM
algorithm
• The same with the EM algorithm:
  – The complete data now consists of S and N
  – The E-step is then:

    $Q(\theta, \hat{\theta}_k) = E\!\left[\ln p(s, n \mid \theta) \mid y, \hat{\theta}_k\right]$

  – The logarithmic probability distribution of the complete data is then

    $\ln p(s, n \mid \theta) = \ln p(n) + \ln p(s \mid \theta) = C - \tfrac{1}{2}\ln\theta - \frac{S^2}{2\theta}$

    (C contains all the terms independent of θ), which gives

    $Q(\theta, \hat{\theta}_k) = C - \tfrac{1}{2}\ln\theta - \frac{E[S^2 \mid Y, \hat{\theta}_k]}{2\theta}$



An example: ML estimation vs. EM
algorithm
• M-step:
  – Maximize the Q-function from the E-step
  – Setting the derivative to zero and using standard results for conditional
    means and variances (the law of total variance) gives

    $\hat{\theta}_{k+1} = E\!\left[S^2 \mid Y, \hat{\theta}_k\right]
      = E^2\!\left[S \mid Y, \hat{\theta}_k\right] + \operatorname{var}\!\left[S \mid Y, \hat{\theta}_k\right]
      = \left(\frac{\hat{\theta}_k}{\hat{\theta}_k + 1}\, Y\right)^{\!2} + \frac{\hat{\theta}_k}{\hat{\theta}_k + 1}$

• At the steady state ($\hat{\theta}_{k+1} = \hat{\theta}_k$) we get the same value for the
  estimate as in ML estimation, $\max\{0, y^2 - 1\}$
• What about the convergence? What if we choose the initial value $\hat{\theta}_0 = 0$?
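A minimal sketch of this fixed-point iteration (y and the starting points are arbitrary illustrative values); running it also shows what happens with the initial value $\hat{\theta}_0 = 0$, which is a fixed point of the update:

```python
# Fixed-point iteration of the M-step update derived above.
def em_variance(y, theta0, n_iter=200):
    theta = theta0
    for _ in range(n_iter):
        g = theta / (theta + 1.0)
        theta = (g * y) ** 2 + g
    return theta

y = 1.8
print(em_variance(y, 1.0))   # converges towards max(0, y**2 - 1) = 2.24
print(em_variance(y, 0.0))   # theta_0 = 0 is a fixed point: the estimate never moves
```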



An example: ML estimation vs. EM
algorithm
• In the previous example, the ML estimate could be solved in closed form
  – In this case there was no need for the EM algorithm, since the ML
    estimate is available directly (we just showed that the EM algorithm
    converges to the peak of the likelihood function)
• Next we consider a coin toss example:
  – The target is to figure out the probability of heads for two coins
  – The ML estimate can be calculated directly from the results
• We will raise the bets a little higher and assume that we don't even know
  which one of the coins is used for each sample set
  – i.e. we are estimating the coin probabilities without knowing
    which one of the coins is being tossed



An example: Coin toss [4]

• We have two coins, A and B, with head probabilities θ_A and θ_B
• We have 5 measurement sets with 10 coin tosses in each set:

  Set   Tosses         Result
  1     HTTTHHTHTH     5H, 5T
  2     HHHHTHHHHH     9H, 1T
  3     HTHHHHHTHH     8H, 2T
  4     HTHTTTTHHTT    4H, 6T
  5     THHHTHHHTH     7H, 3T

• Maximum likelihood (if we know which coin is tossed in each set):
  – Coin A (sets 2, 3 and 5): 24H, 6T; Coin B (sets 1 and 4): 9H, 11T
  – $\hat{\theta}_A = 24/(24+6) = 0.80$ and $\hat{\theta}_B = 9/(9+11) = 0.45$
• If we don't know which of the coins is tossed in each set, the ML estimates
  cannot be calculated directly → EM algorithm

Expectation Maximization:
1. Initialization: $\hat{\theta}_A^{(0)} = 0.6$, $\hat{\theta}_B^{(0)} = 0.5$
2. E-step: for each set, compute the probability that it was produced by coin A
   or coin B, using the binomial distribution $\binom{n}{k} p^k (1-p)^{n-k}$.
   For example, for the first set (5H, 5T):
   $\binom{10}{5} 0.6^5\, 0.4^5 \approx 0.201$ (coin A) and $\binom{10}{5} 0.5^5\, 0.5^5 \approx 0.246$ (coin B),
   which normalize to $0.201/(0.201+0.246) \approx 0.45$ for coin A and $0.55$ for coin B.
   Weighting the observed counts with these probabilities gives the expected counts:

   Set    P(A)   P(B)   Coin A          Coin B
   1      0.45   0.55   ≈2.2H, 2.2T     ≈2.8H, 2.8T
   2      0.80   0.20   ≈7.2H, 0.8T     ≈1.8H, 0.2T
   3      0.73   0.27   ≈5.9H, 1.5T     ≈2.1H, 0.5T
   4      0.35   0.65   ≈1.4H, 2.1T     ≈2.6H, 3.9T
   5      0.65   0.35   ≈4.5H, 1.9T     ≈2.5H, 1.1T
   Total                ≈21.3H, 8.6T    ≈11.7H, 8.4T

3. M-step: re-estimate the head probabilities from the expected counts:
   $\hat{\theta}_A^{(1)} = 21.3/(21.3+8.6) \approx 0.71$ and $\hat{\theta}_B^{(1)} = 11.7/(11.7+8.4) \approx 0.58$
4. Repeat steps 2 and 3 until convergence: after 10 iterations
   $\hat{\theta}_A^{(10)} \approx 0.80$ and $\hat{\theta}_B^{(10)} \approx 0.52$


About some practical issues

• Although many examples in the literature show excellent results with the EM
  algorithm, the reality is often less glamorous
  – As the number of uncertain parameters in the modeled system increases,
    even the best available guess (in the ML sense) might not be adequate
  – NB! This is not the algorithm's fault; it still provides the best
    possible solution in the ML sense
• Depending on the form of the likelihood function (provided in the E-step),
  the convergence rate of the EM algorithm may vary considerably
• Notice that the algorithm converges towards a local maximum
  – To locate the global peak, one must try different initial guesses for the
    estimated parameters or use some other, more advanced method (a simple
    multi-start strategy is sketched below)
  – With multiple unknown (hidden/latent) parameters, the number of
    local peaks usually increases
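A minimal sketch of the multi-start idea mentioned above; run_em and log_likelihood are assumed problem-specific callables, and the initialization range is arbitrary (none of these names come from the slides):

```python
import numpy as np

# Multi-start strategy: run EM from several random initial guesses and keep the
# result with the highest final log-likelihood.
def multi_start_em(y, run_em, log_likelihood, n_starts=20, seed=None):
    rng = np.random.default_rng(seed)
    best_theta, best_ll = None, -np.inf
    for _ in range(n_starts):
        theta0 = rng.uniform(0.1, 10.0)       # random initial guess
        theta = run_em(y, theta0)             # one full EM run from this starting point
        ll = log_likelihood(y, theta)
        if ll > best_ll:                      # keep the best local maximum found so far
            best_theta, best_ll = theta, ll
    return best_theta
```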
Further examples
• Line fitting (shown only in the lecture)
• Parameter estimation of a multivariate Gaussian mixture (a simplified sketch
  is given after this list)
  – See the additional PDF file for:
    • Problem definition
    • Equations
      – Definition of the log-likelihood function
      – E-step
      – M-step
  – See the additional Matlab m-file for an illustration of:
    • The example in numerical form
      – Dimensions and value spaces for each parameter
    • The iterative nature of the EM algorithm
      – Study how the parameters change at each iteration
    • How the initial guesses for the estimated parameters affect the final
      result
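The accompanying PDF and m-file are not reproduced here, but the following simplified sketch illustrates the same E/M alternation for a two-component, one-dimensional Gaussian mixture; the synthetic data and initial values are arbitrary, and the actual course example is multivariate:

```python
import numpy as np

# Simplified 1-D, two-component Gaussian mixture fitted with EM.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.5, 700)])

w = np.array([0.5, 0.5])        # mixing weights
mu = np.array([-1.0, 1.0])      # component means (initial guesses)
var = np.array([1.0, 1.0])      # component variances (initial guesses)

def gauss(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

for _ in range(100):
    # E-step: posterior responsibility of each component for each sample
    r = w * gauss(x[:, None], mu, var)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means and variances from the responsibilities
    n_k = r.sum(axis=0)
    w = n_k / x.size
    mu = (r * x[:, None]).sum(axis=0) / n_k
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n_k

print(w.round(2), mu.round(2), var.round(2))   # ~[0.3 0.7], ~[-2 3], ~[1 2.25]
```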
Conclusions
• EM iteratively finds ML estimates in estimation problems with hidden
  (incomplete) data
  – The likelihood increases at every step of the iteration process
• The algorithm consists of two iteratively repeated steps:
  – Expectation step (E-step)
    • Take the expected value of the complete-data log-likelihood given the
      observation and the current parameter estimate
  – Maximization step (M-step)
    • Maximize the Q-function of the E-step (basically, the expected complete
      data are used as if they were measured observations)
• The algorithm converges to a local maximum
  – The global maximum can be elsewhere
• See the reference list for literature on use cases of the EM algorithm in
  communications
  – These are references [5]-[16] (not mentioned in the previous slides)



References
1. Dempster, A.P.; Laird, N.M.; Rubin, D.B., "Maximum Likelihood from Incomplete Data via the EM Algorithm," Journal
   of the Royal Statistical Society, Series B (Methodological), vol. 39, no. 1, pp. 1-38, 1977.
2. Moon, T.K., "The Expectation Maximization Algorithm," IEEE Signal Processing Magazine, vol. 13, pp. 47-60, Nov.
   1996.
3. Do, C.B.; Batzoglou, S., "What is the expectation maximization algorithm?" [Online]. No longer available; was
   originally available at: courses.ece.illinois.edu/ece561/spring08/EM.pdf
4. The Expectation-Maximization Algorithm. [Online]. No longer available; was originally available at:
   ai.stanford.edu/~chuongdo/papers/em_tutorial.pdf

Some communications-related papers using the EM algorithm (continues on the next slide):
5. Borran, M.J.; Nasiri-Kenari, M., "An efficient detection technique for synchronous CDMA communication systems
   based on the expectation maximization algorithm," IEEE Transactions on Vehicular Technology, vol. 49, no. 5,
   pp. 1663-1668, Sep. 2000.
6. Cozzo, C.; Hughes, B.L., "The expectation-maximization algorithm for space-time communications," in Proc. IEEE
   International Symposium on Information Theory, p. 338, 2000.
7. Rad, K.R.; Nasiri-Kenari, M., "Iterative detection for V-BLAST MIMO communication systems based on expectation
   maximisation algorithm," Electronics Letters, vol. 40, no. 11, pp. 684-685, 27 May 2004.
8. Barembruch, S.; Scaglione, A.; Moulines, E., "The expectation and sparse maximization algorithm," Journal of
   Communications and Networks, vol. 12, no. 4, pp. 317-329, Aug. 2010.
9. Panayirci, E., "Advanced signal processing techniques for wireless communications," in Proc. 5th International
   Workshop on Signal Design and its Applications in Communications (IWSDA), p. 1, 10-14 Oct. 2011.
10. O'Sullivan, J.A., "Message passing expectation-maximization algorithms," in Proc. IEEE/SP 13th Workshop on
    Statistical Signal Processing, pp. 841-846, 17-20 July 2005.
11. Etzlinger, B.; Haselmayr, W.; Springer, A., "Joint Detection and Estimation on MIMO-ISI Channels Based on
    Gaussian Message Passing," in Proc. 9th International ITG Conference on Systems, Communication and Coding (SCC),
    pp. 1-6, 21-24 Jan. 2013.



References
12. Groh, I.; Staudinger, E.; Sand, S., "Low Complexity High Resolution Maximum Likelihood Channel Estimation in
    Spread Spectrum Navigation Systems," in Proc. IEEE Vehicular Technology Conference (VTC Fall), pp. 1-5,
    5-8 Sept. 2011.
13. Wang, W.; Jost, T.; Dammann, A., "Estimation and Modelling of NLoS Time-Variant Multipath for Localization
    Channel Model in Mobile Radios," in Proc. IEEE Global Telecommunications Conference (GLOBECOM 2010), pp. 1-6,
    6-10 Dec. 2010.
14. Nasir, A.A.; Mehrpouyan, H.; Blostein, S.D.; Durrani, S.; Kennedy, R.A., "Timing and Carrier Synchronization With
    Channel Estimation in Multi-Relay Cooperative Networks," IEEE Transactions on Signal Processing, vol. 60, no. 2,
    pp. 793-811, Feb. 2012.
15. Wang, T.-Y.; Pu, J.-W.; Li, C.-P., "Joint Detection and Estimation for Cooperative Communications in
    Cluster-Based Networks," in Proc. IEEE International Conference on Communications (ICC '09), pp. 1-5,
    14-18 June 2009.
16. Xie, Y.; Georghiades, C.N., "Two EM-type channel estimation algorithms for OFDM with transmitter diversity," in
    Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 3,
    pp. III-2541 - III-2544, 13-17 May 2002.

