Lecture 8: Bayesian Estimation of Parameters in State Space Models
Simo Särkkä
4 Summary
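$$
\begin{aligned}
\theta &\sim p(\theta) \\
x_0 &\sim p(x_0 \mid \theta) \\
x_k &\sim p(x_k \mid x_{k-1}, \theta) \\
y_k &\sim p(y_k \mid x_k, \theta).
\end{aligned}
$$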
Advantages:
A simple static Bayesian model.
We can take any numerical method (e.g., MCMC) to attack
the model.
Disadvantages:
We are not utilizing the Markov structure of the model.
Dimensionality is huge, computationally very challenging.
Hard to utilize the already developed approximations for
filters and smoothers.
Requires computation of high-dimensional integral over the
state trajectories.
For computational reasons, we will select another, filtering
and smoothing based route.
$p(x_k \mid y_{1:k-1}, \theta)$
Metropolis–Hastings
Draw the starting point $\theta^{(0)}$ from an arbitrary initial
distribution.
For $i = 1, 2, \ldots, N$ do
1 Sample a candidate point $\theta^* \sim q(\theta^* \mid \theta^{(i-1)})$.
2 Evaluate the acceptance probability
$$
\alpha_i = \min\left\{ 1, \exp\left( \varphi_T(\theta^{(i-1)}) - \varphi_T(\theta^*) \right) \frac{q(\theta^{(i-1)} \mid \theta^*)}{q(\theta^* \mid \theta^{(i-1)})} \right\}.
$$
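The sampling loop above can be sketched in Python as follows. This is a minimal illustration, not the lecture's implementation: it assumes a symmetric Gaussian random-walk proposal, so the proposal ratio in the acceptance probability equals 1, and it uses the standard Metropolis–Hastings accept/reject rule, which is not shown in the excerpt above. The names `metropolis_hastings`, `energy`, and `step` are hypothetical; `energy` stands for the energy function $\varphi_T(\theta)$.

```python
import numpy as np

def metropolis_hastings(energy, theta0, n_samples, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings driven by an energy function.

    energy(theta) plays the role of phi_T(theta). The Gaussian random-walk
    proposal is symmetric (an assumption of this sketch), so the ratio
    q(theta_prev | theta_star) / q(theta_star | theta_prev) is 1.
    """
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    phi = energy(theta)
    samples = np.empty((n_samples, theta.size))
    n_accept = 0
    for i in range(n_samples):
        # Step 1: sample a candidate from the proposal.
        theta_star = theta + step * rng.standard_normal(theta.size)
        phi_star = energy(theta_star)
        # Step 2: log of alpha = min(1, exp(phi(theta_prev) - phi(theta_star))).
        log_alpha = min(0.0, phi - phi_star)
        # Standard accept/reject step: accept with probability alpha.
        if np.log(rng.uniform()) < log_alpha:
            theta, phi = theta_star, phi_star
            n_accept += 1
        samples[i] = theta
    return samples, n_accept / n_samples
```

For example, passing `energy = lambda th: 0.5 * float(np.sum((th - 2.0) ** 2))` targets a unit-variance Gaussian centred at 2. Working with the log of the acceptance probability avoids overflow in the exponential when the energy difference is large.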
Abstract EM
The lower bound can be maximized by coordinate ascent as follows:
1 Start from initial guesses $q^{(0)}$, $\theta^{(0)}$.
2 For $n = 0, 1, 2, \ldots$ do the following steps:
   1 E-step: Find $q^{(n+1)} = \arg\max_q F[q, \theta^{(n)}]$.
   2 M-step: Find $\theta^{(n+1)} = \arg\max_\theta F[q^{(n+1)}, \theta]$.
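We now get
$$
\begin{aligned}
F[q^{(n+1)}(x_{0:T}), \theta]
&= \int p(x_{0:T} \mid y_{1:T}, \theta^{(n)}) \log p(x_{0:T}, y_{1:T} \mid \theta) \, dx_{0:T} \\
&\quad - \int p(x_{0:T} \mid y_{1:T}, \theta^{(n)}) \log p(x_{0:T} \mid y_{1:T}, \theta^{(n)}) \, dx_{0:T}.
\end{aligned}
$$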
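The step from this expression to the function $Q$ used in the EM algorithm can be spelled out as follows. This sketch relies on the standard result that the E-step maximizer is the posterior of the states, $q^{(n+1)}(x_{0:T}) = p(x_{0:T} \mid y_{1:T}, \theta^{(n)})$, which is the distribution substituted above. The second integral does not depend on $\theta$, so the M-step only needs the first one:

```latex
\begin{aligned}
Q(\theta, \theta^{(n)})
  &= \int p(x_{0:T} \mid y_{1:T}, \theta^{(n)})
     \log p(x_{0:T}, y_{1:T} \mid \theta) \, dx_{0:T}, \\
\arg\max_{\theta} F[q^{(n+1)}, \theta]
  &= \arg\max_{\theta} Q(\theta, \theta^{(n)}).
\end{aligned}
```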
EM algorithm
The EM algorithm consists of the following steps:
1 Start from an initial guess $\theta^{(0)}$.
2 For $n = 0, 1, 2, \ldots$ do the following steps:
   1 E-step: compute $Q(\theta, \theta^{(n)})$.
   2 M-step: compute $\theta^{(n+1)} = \arg\max_\theta Q(\theta, \theta^{(n)})$.
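Thus we get
$$
\begin{aligned}
p(y_k \mid y_{1:k-1}, \theta)
&= \int N(y_k \mid H(\theta) x_k, R(\theta)) \, N(x_k \mid m_k^-(\theta), P_k^-(\theta)) \, dx_k \\
&= N(y_k \mid H(\theta) m_k^-(\theta), H(\theta) P_k^-(\theta) H^T(\theta) + R(\theta)).
\end{aligned}
$$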
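$$
\varphi_k(\theta) = \varphi_{k-1}(\theta) + \frac{1}{2} \log |2\pi S_k(\theta)| + \frac{1}{2} v_k^T(\theta) S_k^{-1}(\theta) v_k(\theta),
$$
where the terms $v_k(\theta)$ and $S_k(\theta)$ are given by the Kalman filter
with the parameters fixed to $\theta$:
Prediction:
$$
\begin{aligned}
m_k^-(\theta) &= A(\theta) m_{k-1}(\theta) \\
P_k^-(\theta) &= A(\theta) P_{k-1}(\theta) A^T(\theta) + Q(\theta).
\end{aligned}
$$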
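Update:
$$
\begin{aligned}
v_k(\theta) &= y_k - H(\theta) m_k^-(\theta) \\
S_k(\theta) &= H(\theta) P_k^-(\theta) H^T(\theta) + R(\theta) \\
K_k(\theta) &= P_k^-(\theta) H^T(\theta) S_k^{-1}(\theta) \\
m_k(\theta) &= m_k^-(\theta) + K_k(\theta) v_k(\theta) \\
P_k(\theta) &= P_k^-(\theta) - K_k(\theta) S_k(\theta) K_k^T(\theta).
\end{aligned}
$$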
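The energy recursion and the Kalman filter equations above can be combined into one function, sketched here in Python with NumPy. The function name `kalman_energy` and its argument layout are hypothetical. The caller is assumed to have already evaluated the matrices $A(\theta)$, $H(\theta)$, $Q(\theta)$, $R(\theta)$ at the parameter value of interest, and the starting value `phi0` defaults to 0, which amounts to ignoring the prior term (an assumption of this sketch).

```python
import numpy as np

def kalman_energy(A, H, Q, R, m0, P0, ys, phi0=0.0):
    """Run the Kalman filter over ys and accumulate the energy function.

    A, H, Q, R are the model matrices already evaluated at a fixed theta.
    ys has shape (T, dim_y). With phi0 = 0 the returned value is the
    negative log marginal likelihood of the measurements.
    """
    m, P = m0, P0
    phi = phi0
    for y in ys:
        # Prediction
        m_pred = A @ m
        P_pred = A @ P @ A.T + Q
        # Update
        v = y - H @ m_pred                      # innovation v_k
        S = H @ P_pred @ H.T + R                # innovation covariance S_k
        K = np.linalg.solve(S, H @ P_pred).T    # P_pred H^T S^{-1} (S, P_pred symmetric)
        m = m_pred + K @ v
        P = P_pred - K @ S @ K.T
        # Energy recursion: add 0.5 log|2 pi S_k| + 0.5 v_k^T S_k^{-1} v_k
        _, logdet = np.linalg.slogdet(2.0 * np.pi * S)
        phi += 0.5 * logdet + 0.5 * v @ np.linalg.solve(S, v)
    return phi
```

Using `slogdet` and `solve` instead of an explicit determinant and inverse is a numerical-stability choice of this sketch; it does not change the quantities being computed.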
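$$
\begin{aligned}
Q(\theta, \theta^{(n)}) \simeq{}
& -\frac{1}{2} \log |2\pi P_0(\theta)| - \frac{T}{2} \log |2\pi Q(\theta)| - \frac{T}{2} \log |2\pi R(\theta)| \\
& - \frac{1}{2} \operatorname{tr}\left\{ P_0^{-1}(\theta) \left[ P_0^s + (m_0^s - m_0(\theta)) (m_0^s - m_0(\theta))^T \right] \right\} \\
& - \frac{1}{2} \sum_{k=1}^{T} \operatorname{tr}\left\{ Q^{-1}(\theta) \, E\left[ (x_k - f(x_{k-1}, \theta)) (x_k - f(x_{k-1}, \theta))^T \mid y_{1:T} \right] \right\} \\
& - \frac{1}{2} \sum_{k=1}^{T} \operatorname{tr}\left\{ R^{-1}(\theta) \, E\left[ (y_k - h(x_k, \theta)) (y_k - h(x_k, \theta))^T \mid y_{1:T} \right] \right\},
\end{aligned}
$$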
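To make the EM iteration concrete, here is a minimal Python sketch for a special case chosen for illustration only; it is not taken from the lecture. The model is a scalar random walk, $f(x) = x$ and $h(x) = x$, with the process noise variance known and only the measurement noise variance $R$ unknown. In that case only the last term of $Q(\theta, \theta^{(n)})$ depends on the parameter, and setting its derivative to zero gives $R = \frac{1}{T} \sum_k E[(y_k - x_k)^2 \mid y_{1:T}]$, where each expectation equals $(y_k - m_k^s)^2 + P_k^s$ in terms of the smoothed mean and variance. All function and variable names are hypothetical.

```python
import numpy as np

def kalman_rts_scalar(ys, q, r, m0, P0):
    """Kalman filter and RTS smoother for x_k = x_{k-1} + noise(q), y_k = x_k + noise(r)."""
    T = len(ys)
    m_p, P_p = np.empty(T), np.empty(T)   # predicted moments
    m_f, P_f = np.empty(T), np.empty(T)   # filtered moments
    energy = 0.0                          # negative log marginal likelihood
    m, P = m0, P0
    for k in range(T):
        m_p[k], P_p[k] = m, P + q         # prediction (A = 1)
        v = ys[k] - m_p[k]                # innovation (H = 1)
        S = P_p[k] + r
        K = P_p[k] / S
        m, P = m_p[k] + K * v, P_p[k] - K * S * K
        m_f[k], P_f[k] = m, P
        energy += 0.5 * np.log(2 * np.pi * S) + 0.5 * v**2 / S
    m_s, P_s = m_f.copy(), P_f.copy()     # smoothed moments
    for k in range(T - 2, -1, -1):        # backward RTS pass
        G = P_f[k] / P_p[k + 1]
        m_s[k] = m_f[k] + G * (m_s[k + 1] - m_p[k + 1])
        P_s[k] = P_f[k] + G * (P_s[k + 1] - P_p[k + 1]) * G
    return m_s, P_s, energy

def em_measurement_noise(ys, q, r0, m0=0.0, P0=1.0, n_iter=100):
    """EM for the measurement noise variance r, with q, m0, P0 held fixed."""
    r, energies = r0, []
    for _ in range(n_iter):
        # E-step: smoothing distributions under the current estimate of r.
        m_s, P_s, energy = kalman_rts_scalar(ys, q, r, m0, P0)
        energies.append(energy)
        # M-step: r = (1/T) sum_k E[(y_k - x_k)^2 | y_{1:T}].
        r = np.mean((ys - m_s) ** 2 + P_s)
    return r, energies

# Usage on simulated data (values chosen arbitrarily for the illustration).
rng = np.random.default_rng(0)
T, q_true, r_true = 1000, 0.1, 0.5
xs = np.cumsum(np.sqrt(q_true) * rng.standard_normal(T))
ys = xs + np.sqrt(r_true) * rng.standard_normal(T)
r_hat, energies = em_measurement_noise(ys, q_true, r0=2.0)
```

The recorded `energies` give a simple sanity check: with an exact E-step, each EM iteration should not increase the negative log marginal likelihood.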