The document describes the Expectation-Maximization (EM) algorithm. It begins by explaining Kalman filtering and smoothing for state space models. It then introduces the EM algorithm, which simultaneously optimizes state estimates and model parameters given observed data. The document proceeds to derive the log-likelihood function for state space models and shows how to maximize it with respect to the model parameters to perform the M-step of the EM algorithm.

EM Algorithm

Jur van den Berg

Kalman Filtering vs. Smoothing

Dynamics and observation model:
  x_{t+1} = A x_t + w_t,  w_t ~ N(0, Q)
  y_t = C x_t + v_t,  v_t ~ N(0, R)

Kalman Filter:
  Compute X_t | Y_0 = y_0, ..., Y_t = y_t
  Real-time, given data so far

Kalman Smoother:
  Compute X_t | Y_0 = y_0, ..., Y_T = y_T, for 0 <= t <= T
  Post-processing, given all data
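To make the model concrete, here is a minimal simulation sketch of these two equations; the specific matrices A, C, Q, R, the dimensions, and the random seed are illustrative assumptions, not values from the slides.

import numpy as np

rng = np.random.default_rng(0)

# Assumed example system: 2-D state, 1-D observation.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])      # dynamics matrix
C = np.array([[1.0, 0.0]])      # observation matrix
Q = 0.01 * np.eye(2)            # process noise covariance
R = np.array([[0.1]])           # observation noise covariance

T = 100
x = np.zeros((T + 1, 2))
y = np.zeros((T + 1, 1))
for t in range(T + 1):
    if t > 0:
        x[t] = A @ x[t - 1] + rng.multivariate_normal(np.zeros(2), Q)   # dynamics: x_t = A x_{t-1} + w
    y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(1), R)           # observation: y_t = C x_t + v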

EM Algorithm

  x_{t+1} = A x_t + w_t,  w_t ~ N(0, Q)
  y_t = C x_t + v_t,  v_t ~ N(0, R)

Kalman smoother:
  Compute distributions X_0, ..., X_T
  given parameters A, C, Q, R, and data y_0, ..., y_T.

EM Algorithm:
  Simultaneously optimize X_0, ..., X_T and A, C, Q, R
  given data y_0, ..., y_T.

Probability vs. Likelihood

Probability: predict unknown outcomes based on known parameters:
  p(x | θ)

Likelihood: estimate unknown parameters based on known outcomes:
  L(θ | x) = p(x | θ)

Coin-flip example:
  θ is the probability of heads (parameter)
  x = HHHTTH is the outcome

Likelihood for Coin-flip Example

Probability of outcome given parameter:
  p(x = HHHTTH | θ = 0.5) = 0.5^6 ≈ 0.016

Likelihood of parameter given outcome:
  L(θ = 0.5 | x = HHHTTH) = p(x | θ) ≈ 0.016

Likelihood is maximal when θ = 2/3 ≈ 0.667
The likelihood function is not a probability density
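A quick numerical check of this example (a minimal sketch; the grid resolution is an arbitrary choice):

import numpy as np

def likelihood(theta, heads=4, tails=2):
    # L(theta | x) = p(x | theta) for the sequence HHHTTH (4 heads, 2 tails)
    return theta**heads * (1.0 - theta)**tails

print(likelihood(0.5))                         # 0.5**6 ~= 0.016
thetas = np.linspace(0.0, 1.0, 1001)
print(thetas[np.argmax(likelihood(thetas))])   # ~0.667, i.e. 4/6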

Likelihood for Continuous Distributions

Six samples {-3, -2, -1, 1, 2, 3} believed to be drawn from some Gaussian N(0, σ²)

Likelihood of σ given the samples:
  L(σ | {-3, -2, -1, 1, 2, 3}) = p(x = -3 | σ) · p(x = -2 | σ) · ... · p(x = 3 | σ)

Maximum likelihood:
  σ = sqrt( ((-3)² + (-2)² + (-1)² + 1² + 2² + 3²) / 6 ) ≈ 2.16
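The same maximization can be checked numerically (a minimal sketch; the search grid is an arbitrary assumption):

import numpy as np

samples = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])

def log_likelihood(sigma, x=samples):
    # log L(sigma | x) for i.i.d. draws from N(0, sigma^2)
    return np.sum(-0.5 * np.log(2.0 * np.pi * sigma**2) - x**2 / (2.0 * sigma**2))

sigmas = np.linspace(0.5, 5.0, 4501)
best = sigmas[np.argmax([log_likelihood(s) for s in sigmas])]
print(best)                              # ~2.16
print(np.sqrt(np.mean(samples**2)))      # closed form sqrt(28/6) ~= 2.16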

Likelihood for Stochastic Model

Dynamics and observation model:
  x_{t+1} = A x_t + w_t,  w_t ~ N(0, Q)
  y_t = C x_t + v_t,  v_t ~ N(0, R)

Suppose x_t and y_t are given for 0 <= t <= T; what is the likelihood of A, C, Q and R?

  L(A, C, Q, R | x, y) = p(x, y | A, C, Q, R) = Π_{t=0}^{T} p(x_t | x_{t-1}) p(y_t | x_t)

Compute the log-likelihood: log p(x, y | A, C, Q, R)

Log-likelihood

log p(x, y | A, C, Q, R) = log Π_{t=0}^{T} p(x_t | x_{t-1}) p(y_t | x_t)
  = Σ_{t=0}^{T-1} log p(x_{t+1} | x_t) + Σ_{t=0}^{T} log p(y_t | x_t) = ...

Multivariate normal distribution N(μ, Σ) has pdf:
  p(x) = (2π)^{-k/2} |Σ|^{-1/2} exp( -½ (x - μ)^T Σ^{-1} (x - μ) )

From the model: x_{t+1} ~ N(A x_t, Q),  y_t ~ N(C x_t, R), so

  = -½ Σ_{t=0}^{T-1} ( log|Q| + (x_{t+1} - A x_t)^T Q^{-1} (x_{t+1} - A x_t) )
    - ½ Σ_{t=0}^{T} ( log|R| + (y_t - C x_t)^T R^{-1} (y_t - C x_t) ) + const

Log-likelihood #2

  = -½ Σ_{t=0}^{T-1} ( log|Q| + (x_{t+1} - A x_t)^T Q^{-1} (x_{t+1} - A x_t) )
    - ½ Σ_{t=0}^{T} ( log|R| + (y_t - C x_t)^T R^{-1} (y_t - C x_t) ) + const = ...

Using a = Tr(a) if a is scalar, and bringing the summation inward:

  = -T/2 log|Q| - ½ Σ_{t=0}^{T-1} Tr( (x_{t+1} - A x_t)^T Q^{-1} (x_{t+1} - A x_t) )
    - (T+1)/2 log|R| - ½ Σ_{t=0}^{T} Tr( (y_t - C x_t)^T R^{-1} (y_t - C x_t) ) + const

Log-likelihood #3

  = -T/2 log|Q| - ½ Σ_{t=0}^{T-1} Tr( (x_{t+1} - A x_t)^T Q^{-1} (x_{t+1} - A x_t) )
    - (T+1)/2 log|R| - ½ Σ_{t=0}^{T} Tr( (y_t - C x_t)^T R^{-1} (y_t - C x_t) ) + const = ...

Using Tr(AB) = Tr(BA) and Tr(A) + Tr(B) = Tr(A + B):

  = T/2 log|Q^{-1}| - ½ Tr( Q^{-1} Σ_{t=0}^{T-1} (x_{t+1} - A x_t)(x_{t+1} - A x_t)^T )
    + (T+1)/2 log|R^{-1}| - ½ Tr( R^{-1} Σ_{t=0}^{T} (y_t - C x_t)(y_t - C x_t)^T ) + const

Log-likelihood #4

  = T/2 log|Q^{-1}| - ½ Tr( Q^{-1} Σ_{t=0}^{T-1} (x_{t+1} - A x_t)(x_{t+1} - A x_t)^T )
    + (T+1)/2 log|R^{-1}| - ½ Tr( R^{-1} Σ_{t=0}^{T} (y_t - C x_t)(y_t - C x_t)^T ) + const = ...

Expand:

l(A, C, Q, R | x, y) =
  T/2 log|Q^{-1}| - ½ Tr( Q^{-1} Σ_{t=0}^{T-1} ( x_{t+1} x_{t+1}^T - x_{t+1} x_t^T A^T - A x_t x_{t+1}^T + A x_t x_t^T A^T ) )
  + (T+1)/2 log|R^{-1}| - ½ Tr( R^{-1} Σ_{t=0}^{T} ( y_t y_t^T - y_t x_t^T C^T - C x_t y_t^T + C x_t x_t^T C^T ) ) + const
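As a numerical cross-check, this log-likelihood can be evaluated directly; a minimal sketch assuming complete sequences x and y are available and dropping the additive constant (the function name is an assumption).

import numpy as np

def log_likelihood(A, C, Q, R, x, y):
    # l(A, C, Q, R | x, y) up to the additive constant; x has shape (T+1, n), y has shape (T+1, m).
    T = len(x) - 1
    Qi, Ri = np.linalg.inv(Q), np.linalg.inv(R)
    Sq = sum(np.outer(x[t + 1] - A @ x[t], x[t + 1] - A @ x[t]) for t in range(T))
    Sr = sum(np.outer(y[t] - C @ x[t], y[t] - C @ x[t]) for t in range(T + 1))
    return (T / 2.0 * np.log(np.linalg.det(Qi)) - 0.5 * np.trace(Qi @ Sq)
            + (T + 1) / 2.0 * np.log(np.linalg.det(Ri)) - 0.5 * np.trace(Ri @ Sr))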

Maximize likelihood

log is a monotone function:
  max log(f(x))  ⇔  max f(x)

Maximize l(A, C, Q, R | x, y) in turn for A, C, Q and R:
  Solve ∂l(A, C, Q, R | x, y) / ∂A = 0 for A
  Solve ∂l(A, C, Q, R | x, y) / ∂C = 0 for C
  Solve ∂l(A, C, Q, R | x, y) / ∂Q = 0 for Q
  Solve ∂l(A, C, Q, R | x, y) / ∂R = 0 for R

Matrix derivatives

Defined for scalar functions f : R^{n×m} → R

Key identities:
  ∂(x^T A x) / ∂x = x^T (A^T + A)
  ∂(B^T A B) / ∂B = B^T (A^T + A)
  ∂Tr(AB) / ∂A = ∂Tr(BA) / ∂A = ∂Tr(B^T A^T) / ∂A = B^T
  ∂log|A| / ∂A = A^{-T}
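These identities are easy to verify numerically; the sketch below checks the Tr(AB) rule by finite differences on arbitrary random matrices (sizes and seed are assumptions).

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3))

# Finite-difference estimate of d Tr(AB) / dA, entry by entry.
eps = 1e-6
grad = np.zeros_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        dA = np.zeros_like(A)
        dA[i, j] = eps
        grad[i, j] = (np.trace((A + dA) @ B) - np.trace(A @ B)) / eps

print(np.allclose(grad, B.T, atol=1e-4))   # True: d Tr(AB)/dA = B^T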

Optimizing A

Derivative:
  ∂l(A, C, Q, R | x, y) / ∂A = ½ Q^{-1} Σ_{t=0}^{T-1} ( 2 x_{t+1} x_t^T - 2 A x_t x_t^T )

Maximizer:
  A = ( Σ_{t=0}^{T-1} x_{t+1} x_t^T ) ( Σ_{t=0}^{T-1} x_t x_t^T )^{-1}

Optimizing C

Derivative:
  ∂l(A, C, Q, R | x, y) / ∂C = ½ R^{-1} Σ_{t=0}^{T} ( 2 y_t x_t^T - 2 C x_t x_t^T )

Maximizer:
  C = ( Σ_{t=0}^{T} y_t x_t^T ) ( Σ_{t=0}^{T} x_t x_t^T )^{-1}

Optimizing Q

Derivative with respect to the inverse:
  ∂l(A, C, Q, R | x, y) / ∂Q^{-1} = T/2 Q - ½ Σ_{t=0}^{T-1} ( x_{t+1} x_{t+1}^T - x_{t+1} x_t^T A^T - A x_t x_{t+1}^T + A x_t x_t^T A^T )

Maximizer:
  Q = 1/T Σ_{t=0}^{T-1} ( x_{t+1} x_{t+1}^T - x_{t+1} x_t^T A^T - A x_t x_{t+1}^T + A x_t x_t^T A^T )

Optimizing R

Derivative with respect to the inverse:
  ∂l(A, C, Q, R | x, y) / ∂R^{-1} = (T+1)/2 R - ½ Σ_{t=0}^{T} ( y_t y_t^T - y_t x_t^T C^T - C x_t y_t^T + C x_t x_t^T C^T )

Maximizer:
  R = 1/(T+1) Σ_{t=0}^{T} ( y_t y_t^T - y_t x_t^T C^T - C x_t y_t^T + C x_t x_t^T C^T )
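Collecting the four maximizers from the preceding slides, here is a minimal M-step sketch; it assumes complete sequences x and y are available (with only distributions over X, the sums are replaced by their expectations, as described on the Update Parameters slide).

import numpy as np

def m_step(x, y):
    # Closed-form maximizers for A, C, Q, R from sequences x (T+1, n) and y (T+1, m).
    T = len(x) - 1
    Sxx     = sum(np.outer(x[t], x[t]) for t in range(T))          # sum_{t=0}^{T-1} x_t x_t^T
    Sx1x    = sum(np.outer(x[t + 1], x[t]) for t in range(T))      # sum_{t=0}^{T-1} x_{t+1} x_t^T
    Sxx_all = sum(np.outer(x[t], x[t]) for t in range(T + 1))      # sum_{t=0}^{T} x_t x_t^T
    Syx     = sum(np.outer(y[t], x[t]) for t in range(T + 1))      # sum_{t=0}^{T} y_t x_t^T

    A = Sx1x @ np.linalg.inv(Sxx)
    C = Syx @ np.linalg.inv(Sxx_all)
    # With the new A and C plugged in, the expanded forms above reduce to residual covariances.
    Q = sum(np.outer(x[t + 1] - A @ x[t], x[t + 1] - A @ x[t]) for t in range(T)) / T
    R = sum(np.outer(y[t] - C @ x[t], y[t] - C @ x[t]) for t in range(T + 1)) / (T + 1)
    return A, C, Q, R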

EM-algorithm

  x_{t+1} = A x_t + w_t,  w_t ~ N(0, Q)
  y_t = C x_t + v_t,  v_t ~ N(0, R)

Start with initial guesses of A, C, Q, R.

Kalman smoother (E-step):
  Compute distributions X_0, ..., X_T
  given data y_0, ..., y_T and A, C, Q, R.

Update parameters (M-step):
  Update A, C, Q, R such that the expected log-likelihood is maximized.

Repeat until convergence (local optimum).
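In outline, the procedure is the loop sketched below; kalman_smoother and expected_m_step are hypothetical placeholders for the E-step and M-step routines on the following slides, passed in as arguments rather than defined here.

def em(y, A, C, Q, R, kalman_smoother, expected_m_step, iters=50):
    # Generic EM loop: alternate the E-step (smoothing) and M-step (parameter update).
    for _ in range(iters):
        stats = kalman_smoother(y, A, C, Q, R)      # E-step: distributions over X_0, ..., X_T
        A, C, Q, R = expected_m_step(stats, y)      # M-step: maximize expected log-likelihood
    return A, C, Q, R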

Kalman Smoother

for (t = 0; t < T; ++t)                      // Kalman filter
  x_{t+1|t}   = A x_{t|t}
  P_{t+1|t}   = A P_{t|t} A^T + Q
  K_{t+1}     = P_{t+1|t} C^T ( C P_{t+1|t} C^T + R )^{-1}
  x_{t+1|t+1} = x_{t+1|t} + K_{t+1} ( y_{t+1} - C x_{t+1|t} )
  P_{t+1|t+1} = P_{t+1|t} - K_{t+1} C P_{t+1|t}

for (t = T - 1; t >= 0; --t)                 // Backward pass
  L_t     = P_{t|t} A^T P_{t+1|t}^{-1}
  x_{t|T} = x_{t|t} + L_t ( x_{t+1|T} - x_{t+1|t} )
  P_{t|T} = P_{t|t} + L_t ( P_{t+1|T} - P_{t+1|t} ) L_t^T
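A direct transcription of this pseudocode, as a minimal sketch; it assumes an initial filtered estimate x0 with covariance P0 (which the slide does not specify) and returns the smoothed means and covariances together with the gains L_t needed later.

import numpy as np

def kalman_smoother(y, A, C, Q, R, x0, P0):
    # Forward Kalman filter followed by the backward (RTS) pass above.
    T = len(y) - 1
    n = len(x0)
    xf = np.zeros((T + 1, n)); Pf = np.zeros((T + 1, n, n))     # filtered x_{t|t}, P_{t|t}
    xp = np.zeros((T + 1, n)); Pp = np.zeros((T + 1, n, n))     # predicted x_{t+1|t}, P_{t+1|t}
    xf[0], Pf[0] = x0, P0
    for t in range(T):                                          # Kalman filter
        xp[t + 1] = A @ xf[t]
        Pp[t + 1] = A @ Pf[t] @ A.T + Q
        K = Pp[t + 1] @ C.T @ np.linalg.inv(C @ Pp[t + 1] @ C.T + R)
        xf[t + 1] = xp[t + 1] + K @ (y[t + 1] - C @ xp[t + 1])
        Pf[t + 1] = Pp[t + 1] - K @ C @ Pp[t + 1]
    xs, Ps = xf.copy(), Pf.copy()                               # smoothed x_{t|T}, P_{t|T}
    L = np.zeros((T, n, n))
    for t in range(T - 1, -1, -1):                              # backward pass
        L[t] = Pf[t] @ A.T @ np.linalg.inv(Pp[t + 1])
        xs[t] = xf[t] + L[t] @ (xs[t + 1] - xp[t + 1])
        Ps[t] = Pf[t] + L[t] @ (Ps[t + 1] - Pp[t + 1]) @ L[t].T
    return xs, Ps, L, xf, xp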

Update Parameters

The likelihood is in terms of x, but only distributions over X are available:

l(A, C, Q, R | x, y) =
  T/2 log|Q^{-1}| - ½ Tr( Q^{-1} Σ_{t=0}^{T-1} ( x_{t+1} x_{t+1}^T - x_{t+1} x_t^T A^T - A x_t x_{t+1}^T + A x_t x_t^T A^T ) )
  + (T+1)/2 log|R^{-1}| - ½ Tr( R^{-1} Σ_{t=0}^{T} ( y_t y_t^T - y_t x_t^T C^T - C x_t y_t^T + C x_t x_t^T C^T ) ) + const

The likelihood function is linear in x_t, x_t x_t^T and x_t x_{t+1}^T.

Expected likelihood: replace them with
  E( X_t | y ) = x_{t|T}
  E( X_t X_t^T | y ) = P_{t|T} + x_{t|T} x_{t|T}^T
  E( X_t X_{t+1}^T | y ) = x_{t|t} x_{t+1|T}^T + L_t ( P_{t+1|T} + ( x_{t+1|T} - x_{t+1|t} ) x_{t+1|T}^T )

Use the maximizers to update A, C, Q and R.
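Given the smoother output, these expectations can be assembled as follows; a minimal sketch whose variable names follow the hypothetical kalman_smoother sketch above (xs = x_{t|T}, Ps = P_{t|T}, xf = x_{t|t}, xp = x_{t+1|t}). The resulting sums replace the x_t terms in the maximizers.

import numpy as np

def expected_statistics(xs, Ps, L, xf, xp):
    # E[X_t], E[X_t X_t^T], and E[X_t X_{t+1}^T] from smoother output.
    T = len(xs) - 1
    Ex   = xs                                                        # E(X_t | y) = x_{t|T}
    Exx  = [Ps[t] + np.outer(xs[t], xs[t]) for t in range(T + 1)]    # P_{t|T} + x_{t|T} x_{t|T}^T
    Exx1 = [np.outer(xf[t], xs[t + 1])                               # x_{t|t} x_{t+1|T}^T + L_t (...)
            + L[t] @ (Ps[t + 1] + np.outer(xs[t + 1] - xp[t + 1], xs[t + 1]))
            for t in range(T)]
    return Ex, Exx, Exx1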

Convergence

Convergence to a local optimum is guaranteed.
Similar to coordinate ascent.

Conclusion

The EM-algorithm simultaneously optimizes state estimates and model parameters.
Given "training data", the EM-algorithm can be used (off-line) to learn the model for subsequent use in (real-time) Kalman filters.

Next time
Learning from demonstrations
Dynamic Time Warping
