
Sequential Importance Sampling

Prof. Nicholas Zabaras

Email: [email protected]
URL: https://www.zabaras.com/

November 12, 2020

Statistical Computing and Machine Learning, Fall 2020, N. Zabaras


Contents
• Sequential Bayesian Inference
• Importance sampling for the state space model
• SEQUENTIAL IMPORTANCE SAMPLING: Sequential IS, Factorization of the importance density, Variance of the IS Estimates, IS in High-Dimensions, The Bootstrap Particle Filter, Resampling
• BAYESIAN RECURSION FORMULAS: Filtering and marginal likelihood, The Bootstrap Filter implementing the prediction/update recursions, The Kalman filter updates for a LG-SSM, Forward-Filtering Backward-Smoothing relations, Forward-Backward two-filter smoother
• ONLINE BAYESIAN PARAMETER ESTIMATION: Introduction, MLE Solution and Fisher's identity, Expectation-Maximization approach, Gaussian Process SSM


Goals
• The goals for today's lecture include the following:
  • Learn about sequential importance sampling for state space models
  • Understand online Bayesian parameter estimation in state space models


Bayesian Inference for the SSM
• While our overall estimation problem is to compute the joint filtering distribution $p(x_{1:n} \mid y_{1:n})$, the following inference problems are also of interest:
  • Filtering: compute $p(x_n \mid y_{1:n})$
  • Prediction: compute $p(x_{n+1} \mid y_{1:n})$
  • Joint smoothing: $p(x_{1:T} \mid y_{1:T})$
  • Marginal smoothing: $p(x_n \mid y_{1:T})$, $n \le T$
• The SSM is specified by the likelihood and the prior over state paths:
$$\text{Likelihood: } p(y_1,\dots,y_n \mid x_1,\dots,x_n) = \prod_{i=1}^{n} g(y_i \mid x_i), \qquad \text{Prior: } p(x_{1:n}) = \mu(x_1)\prod_{k=2}^{n} f(x_k \mid x_{k-1})$$
• Note: the Kalman filter provides an analytical solution to the filtering problem for a LG-SSM.
[Graphical model: each observation $y_1, y_2, \dots, y_n$ depends on the corresponding hidden state $x_1, x_2, \dots, x_n$, which form a Markov chain; the targets $p(x_1 \mid y_1), p(x_{1:2} \mid y_{1:2}), \dots, p(x_{1:n} \mid y_{1:n})$ are built up sequentially.]
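
To have a concrete model for the code sketches used later in these notes, the snippet below simulates data from a scalar linear Gaussian SSM. It is only an illustration: the parameter values and the name simulate_lgssm are assumptions, not part of the lecture.

```python
# Minimal sketch (assumed example, not from the slides): simulate a scalar LG-SSM with
#   mu = N(0, 1),  f(x_n | x_{n-1}) = N(a*x_{n-1}, q),  g(y_n | x_n) = N(c*x_n, r).
import numpy as np

def simulate_lgssm(T, a=0.9, c=1.0, q=1.0, r=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(T)
    y = np.zeros(T)
    x[0] = rng.normal(0.0, 1.0)                      # x_1 ~ mu
    y[0] = rng.normal(c * x[0], np.sqrt(r))          # y_1 ~ g(. | x_1)
    for n in range(1, T):
        x[n] = rng.normal(a * x[n - 1], np.sqrt(q))  # x_n ~ f(. | x_{n-1})
        y[n] = rng.normal(c * x[n], np.sqrt(r))      # y_n ~ g(. | x_n)
    return x, y

x_true, y_obs = simulate_lgssm(T=100)
```
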


Importance Sampling for the State Space Model
• Let us draw $N$ samples from our importance distribution:
$$X_{1:n}^{(i)} \sim q(x_{1:n} \mid y_{1:n}), \qquad \hat q_N(x_{1:n} \mid y_{1:n}) = \frac{1}{N}\sum_{i=1}^{N} \delta_{X_{1:n}^{(i)}}(x_{1:n})$$
• Then, using the identity in the earlier slide, we obtain the following approximation of our target distribution:
$$\hat p_N(x_{1:n} \mid y_{1:n}) = \frac{w(x_{1:n}, y_{1:n})\,\hat q_N(x_{1:n} \mid y_{1:n})}{\int w(x_{1:n}, y_{1:n})\,\hat q_N(x_{1:n} \mid y_{1:n})\,dx_{1:n}}
= \frac{w(x_{1:n}, y_{1:n})\,\frac{1}{N}\sum_{i=1}^{N}\delta_{X_{1:n}^{(i)}}(x_{1:n})}{\int w(x_{1:n}, y_{1:n})\,\frac{1}{N}\sum_{i=1}^{N}\delta_{X_{1:n}^{(i)}}(x_{1:n})\,dx_{1:n}}
= \sum_{i=1}^{N} W_n^{(i)}\,\delta_{X_{1:n}^{(i)}}(x_{1:n}), \qquad W_n^{(i)} = \frac{w(X_{1:n}^{(i)}, y_{1:n})}{\sum_{j=1}^{N} w(X_{1:n}^{(j)}, y_{1:n})}$$
• Note that:
$$\hat Z_n \equiv \hat p_N(y_{1:n}) = \int w(x_{1:n}, y_{1:n})\,\frac{1}{N}\sum_{i=1}^{N}\delta_{X_{1:n}^{(i)}}(x_{1:n})\,dx_{1:n} = \frac{1}{N}\sum_{i=1}^{N} w(X_{1:n}^{(i)}, y_{1:n})$$
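
As a small practical note on the self-normalized estimator above, the sketch below computes the normalized weights $W_n^{(i)}$ and $\hat Z_n$ from unnormalized weights; working with log-weights is an implementation choice (an assumption, not something the slides prescribe) that avoids numerical overflow.

```python
# Minimal sketch: normalized IS weights W_n^(i) and Z_hat_n = (1/N) * sum_i w^(i),
# computed from log-weights for numerical stability.
import numpy as np

def normalize_weights(log_w):
    m = np.max(log_w)
    w = np.exp(log_w - m)                # unnormalized weights, up to the constant exp(m)
    W = w / np.sum(w)                    # normalized weights W_n^(i)
    log_Z_hat = m + np.log(np.mean(w))   # log of (1/N) * sum_i w(X_{1:n}^(i), y_{1:n})
    return W, log_Z_hat

W, log_Z_hat = normalize_weights(np.array([-1.2, -0.3, -2.5, -0.7]))
```
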
Bias & Variance of Importance Sampling Estimates
• We are interested in an importance sampling approximation of $\mathbb{E}_{p(x_{1:n} \mid y_{1:n})}[\varphi]$:
$$I_n^{IS}(\varphi) \equiv \mathbb{E}_{\hat p_N(x_{1:n} \mid y_{1:n})}[\varphi] = \sum_{i=1}^{N} W_n^{(i)}\,\varphi\big(X_{1:n}^{(i)}\big)$$
• This is a biased estimate for finite $N$, and we have shown in our earlier lecture on importance sampling that:
$$\lim_{N\to\infty} N\left(\mathbb{E}_{\hat p_N(x_{1:n} \mid y_{1:n})}[\varphi] - \mathbb{E}_{p(x_{1:n} \mid y_{1:n})}[\varphi]\right) = -\int \frac{p^2(x_{1:n} \mid y_{1:n})}{q(x_{1:n} \mid y_{1:n})}\left(\varphi(x_{1:n}) - \mathbb{E}_{p(x_{1:n} \mid y_{1:n})}[\varphi]\right)dx_{1:n}$$
$$\sqrt{N}\left(\mathbb{E}_{\hat p_N(x_{1:n} \mid y_{1:n})}[\varphi] - \mathbb{E}_{p(x_{1:n} \mid y_{1:n})}[\varphi]\right) \xrightarrow{d} \mathcal{N}\!\left(0,\ \int \frac{p^2(x_{1:n} \mid y_{1:n})}{q(x_{1:n} \mid y_{1:n})}\left(\varphi(x_{1:n}) - \mathbb{E}_{p(x_{1:n} \mid y_{1:n})}[\varphi]\right)^2 dx_{1:n}\right)$$
• The asymptotic bias is of order $1/N$ (negligible) and the MSE is:
$$\text{MSE} = \underbrace{\text{bias}^2}_{O(N^{-2})} + \underbrace{\text{variance}}_{O(N^{-1})}$$


Sequential Importance Sampling
for the SSM



Sequential Importance Sampling
• Let us return to our state space model and consider a sequential Monte Carlo approximation of $p(x_{1:n} \mid y_{1:n}) \propto p(x_{1:n}, y_{1:n})$.
• The distributions $\pi_n = p(x_{1:n} \mid y_{1:n})$ are known up to a normalizing constant:
$$\pi_n(x_{1:n}) = \frac{\gamma_n(x_{1:n})}{Z_n} = \frac{p(x_{1:n}, y_{1:n})}{Z_n}$$
• We want to estimate expectations of functions $\varphi_n : \mathcal{X}^n \to \mathbb{R}$,
$$\mathbb{E}_{\pi_n}[\varphi_n] = \int \varphi_n(x_{1:n})\,\pi_n(x_{1:n})\,dx_{1:n},$$
and/or the normalizing constants $Z_n$.
• One can use MCMC to sample from $\pi_n$, $n = 1, 2, \dots$, but this calculation will be slow and cannot provide the normalizing constants $Z_n$, $n = 1, 2, \dots$


Sequential Importance Sampling
• We want to do these calculations sequentially, starting with $\pi_1$ and $Z_1$ at step 1 (time 1), then proceeding to $\pi_2$ and $Z_2$, etc.
• Sequential Monte Carlo (SMC) provides the means to do so, as an alternative algorithm to MCMC.
• The key idea is that if $\pi_{n-1}$ does not differ much from $\pi_n$, we should be able to reuse our estimate of $\pi_{n-1}$ to approximate $\pi_n$.
• In sequential importance sampling, the proposal distribution is defined sequentially and the weights are evaluated sequentially.


Sequential Importance Sampling
• We want to design a sequential importance sampling method to approximate $\{\pi_n\}_{n\ge 1}$ and $\{Z_n\}_{n\ge 1}$.
• Assume that at time 1 we have approximations $\hat\pi_1(x_1) = \hat p_N(x_1 \mid y_1)$ and $\hat Z_1$ using an importance density $q_1(x_1 \mid y_1)$:
$$X_1^{(i)} \sim q_1(x_1 \mid y_1), \quad i = 1, 2, \dots, N$$
$$\hat p_N(x_1 \mid y_1)\,dx_1 = \sum_{i=1}^{N} W_1^{(i)}\,\delta_{X_1^{(i)}}(dx_1), \qquad \text{where } W_1^{(i)} = \frac{w_1(X_1^{(i)}, y_1)}{\sum_{j=1}^{N} w_1(X_1^{(j)}, y_1)}$$
$$\hat Z_1 = \frac{1}{N}\sum_{i=1}^{N} w_1(X_1^{(i)}, y_1), \qquad \text{with } w_1(x_1, y_1) = \frac{\gamma_1(x_1)}{q_1(x_1 \mid y_1)} = \frac{p(x_1, y_1)}{q_1(x_1 \mid y_1)}$$


Sequential Importance Sampling
• At time 2, we want to approximate $\hat\pi_2(x_{1:2}) = \hat p_N(x_{1:2} \mid y_{1:2})$ and $\hat Z_2$ using an importance density $q_2(x_{1:2} \mid y_{1:2})$.
• We want to reuse the samples $X_1^{(i)}$ and $q_1(x_1 \mid y_1)$ in building the importance sampling approximation of $\pi_2(x_{1:2})$ and $Z_2$. Let us select a proposal distribution that factorizes as:
$$q_2(x_{1:2} \mid y_{1:2}) = q_1(x_1 \mid y_1)\,q_2(x_2 \mid y_{1:2}, x_1)$$
• To obtain $X_{1:2}^{(i)} \sim q_2(x_{1:2} \mid y_{1:2})$, we need to sample as follows:
$$X_2^{(i)} \mid X_1^{(i)} \sim q_2(x_2 \mid y_{1:2}, X_1^{(i)})$$
• The importance sampling weight for this step is then:
$$w_2(x_{1:2}, y_{1:2}) = \frac{\gamma_2(x_{1:2})}{q_2(x_{1:2} \mid y_{1:2})} = \frac{p(x_{1:2}, y_{1:2})}{q_1(x_1 \mid y_1)\,q_2(x_2 \mid y_{1:2}, x_1)}
= \underbrace{\frac{p(x_1, y_1)}{q_1(x_1 \mid y_1)}}_{\text{weight from step 1}}\;\underbrace{\frac{p(x_{1:2}, y_{1:2})}{p(x_1, y_1)\,q_2(x_2 \mid y_{1:2}, x_1)}}_{\text{incremental weight}}
= w_1(x_1, y_1)\,\frac{p(x_{1:2}, y_{1:2})}{p(x_1, y_1)\,q_2(x_2 \mid y_{1:2}, x_1)}$$


Sequential Importance Sampling
• The normalized weights for step 2 are then given as:
$$W_2^{(i)} \propto w_2(x_{1:2}, y_{1:2}) = \underbrace{w_1(x_1, y_1)}_{\text{weight from step 1}}\;\underbrace{\frac{p(x_{1:2}, y_{1:2})}{p(x_1, y_1)\,q_2(x_2 \mid y_{1:2}, x_1)}}_{\text{incremental weight}}$$
• Generalizing to step $n$, we can write:
$$q_n(x_{1:n} \mid y_{1:n}) = q_{n-1}(x_{1:n-1} \mid y_{1:n-1})\,q_n(x_n \mid y_{1:n}, x_{1:n-1}) = q_1(x_1 \mid y_1)\prod_{k=2}^{n} q_k(x_k \mid y_{1:k}, x_{1:k-1})$$
• Thus if
$$X_{1:n-1}^{(i)} \sim q_{n-1}(x_{1:n-1} \mid y_{1:n-1}),$$
we sample $X_n^{(i)}$ from
$$X_n^{(i)} \mid X_{1:n-1}^{(i)} \sim q_n(x_n \mid y_{1:n}, X_{1:n-1}^{(i)})$$


Sequential Importance Sampling
• The weights for step $n$ are then given as:
$$w_n(X_{1:n}^{(i)}, y_{1:n}) = \frac{p(X_{1:n}^{(i)}, y_{1:n})}{q_n(X_{1:n}^{(i)} \mid y_{1:n})}
= \underbrace{\frac{p(X_{1:n-1}^{(i)}, y_{1:n-1})}{q_{n-1}(X_{1:n-1}^{(i)} \mid y_{1:n-1})}}_{w_{n-1}(X_{1:n-1}^{(i)},\, y_{1:n-1})}\;\frac{p(X_{1:n}^{(i)}, y_{1:n})}{p(X_{1:n-1}^{(i)}, y_{1:n-1})\,q_n(X_n^{(i)} \mid X_{1:n-1}^{(i)}, y_{1:n})}
= w_{n-1}(X_{1:n-1}^{(i)}, y_{1:n-1})\,\frac{p(X_{1:n}^{(i)}, y_{1:n})}{p(X_{1:n-1}^{(i)}, y_{1:n-1})\,q_n(X_n^{(i)} \mid X_{1:n-1}^{(i)}, y_{1:n})}$$
• Similarly, the normalized weights are:
$$W_n^{(i)} = W_n(X_{1:n}^{(i)}, y_{1:n}) \propto w_n(X_{1:n}^{(i)}, y_{1:n})$$
• For our state space model, the above update formula takes the form:
$$w_n(X_{1:n}^{(i)}, y_{1:n}) = w_{n-1}(X_{1:n-1}^{(i)}, y_{1:n-1})\,\frac{f(X_n^{(i)} \mid X_{n-1}^{(i)})\,g(y_n \mid X_n^{(i)})}{q_n(X_n^{(i)} \mid y_{1:n}, X_{1:n-1}^{(i)})}$$
• At each time we have $\hat p_N(x_{1:n} \mid y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\,\delta_{X_{1:n}^{(i)}}(x_{1:n})$ and $\hat Z_n = \frac{1}{N}\sum_{i=1}^{N} w_n^{(i)}$.
• In general, we may need to store all the paths $X_{1:n}^{(i)}$ even if our interest is only to compute $\pi_n(x_n) = p(x_n \mid y_{1:n})$.

Variance of the IS Estimates
• In this sequential framework, it would seem that the only freedom the user has at time $n$ is the choice of $q_n(x_n \mid x_{1:n-1}, y_{1:n})$.
• A sensible strategy consists of selecting it so as to minimize the variance of $w_n(x_{1:n})$. It is straightforward to check that this is achieved by selecting
$$q_n^{opt}(x_n \mid y_{1:n}, x_{1:n-1}) = p(x_n \mid y_{1:n}, x_{1:n-1}),$$
as in this case the variance of $w_n(x_{1:n})$ conditional upon $x_{1:n-1}$ is zero, and the associated incremental weight is given by
$$\alpha_n^{opt}(x_{1:n}) = \frac{p(x_{1:n}, y_{1:n})}{p(x_{1:n-1}, y_{1:n-1})\,q_n(x_n \mid x_{1:n-1}, y_{1:n})}
= \frac{p(x_{1:n-1}, y_{1:n})}{p(x_{1:n-1}, y_{1:n-1})}\,\frac{p(x_n \mid x_{1:n-1}, y_{1:n})}{q_n(x_n \mid x_{1:n-1}, y_{1:n})}
= \frac{\gamma_n(x_{1:n-1})}{\gamma_{n-1}(x_{1:n-1})} = \frac{\int \gamma_n(x_{1:n})\,dx_n}{\gamma_{n-1}(x_{1:n-1})}$$
• It is not always possible to sample from $p(x_n \mid y_{1:n}, x_{1:n-1})$ nor to compute $\alpha_n^{opt}(x_{1:n})$. In these cases, one should employ an approximation of $q_n^{opt}(x_n \mid y_{1:n}, x_{1:n-1})$ for $q_n(x_n \mid y_{1:n}, x_{1:n-1})$.

Variance of the IS Estimates
• In those scenarios in which the time required to sample from $q_n(x_n \mid y_{1:n}, x_{1:n-1})$ and to compute $\alpha_n(x_{1:n})$ is independent of $n$ (and this is indeed the case if $q_n$ is chosen sensibly and one is concerned with a problem such as filtering), it appears that we have provided a solution for Problem 2, the computational complexity being $\mathcal{O}(n)$.
• However, the methodology presented here suffers from severe drawbacks.
• Even for standard IS, the variance of the resulting estimates increases exponentially with $n$.
• The variance of the weights grows unboundedly (weight degeneracy: after some time, only one weight has a non-negligible value).
• As SIS is nothing but a special version of IS in which the importance distribution is of the form
$$q_n(x_{1:n} \mid y_{1:n}) = q_1(x_1 \mid y_1)\prod_{k=2}^{n} q_k(x_k \mid y_{1:k}, x_{1:k-1}),$$
it suffers from the same problems.
• We demonstrate this using a very simple toy example.

Variance of the IS Estimates
• Consider the case $\mathcal{X} = \mathbb{R}$ with $\pi_n(x_{1:n}) = \prod_{k=1}^{n}\mathcal{N}(x_k; 0, 1)$, i.e. $\gamma_n(x_{1:n}) = \prod_{k=1}^{n}\exp\!\big(-\tfrac{x_k^2}{2}\big)$ and $Z_n = (2\pi)^{n/2}$.
• Select the following reasonable importance sampling distribution (with $\sigma^2 > \tfrac12$, $\sigma^2 \neq 1$):
$$q_n(x_{1:n}) = \prod_{k=1}^{n} q_k(x_k) = \prod_{k=1}^{n} \mathcal{N}(x_k; 0, \sigma^2)$$
• Note that
$$w_\sigma(x_{1:n}) = \frac{\gamma_n(x_{1:n})}{q_n(x_{1:n})} = (2\pi)^{n/2}\,\sigma^n \exp\!\left(-\frac{1}{2}\Big(1 - \frac{1}{\sigma^2}\Big)\sum_{i=1}^{n} x_i^2\right) \;\le\; (2\pi)^{n/2}\,\sigma^n \quad \forall x \ \ (\text{when } \sigma^2 \ge 1),$$
and that
$$\mathrm{Var}_{q_\sigma}\!\big[w_\sigma(x_{1:n})\big] = \int \frac{\gamma_n^2(x_{1:n})}{q_\sigma(x_{1:n})}\,dx_{1:n} - \left(\int \gamma_n(x_{1:n})\,dx_{1:n}\right)^2
= (2\pi)^n\left[\int \frac{\sigma^n}{(2\pi)^{n/2}}\exp\!\left(-\Big(1 - \frac{1}{2\sigma^2}\Big)\sum_{i=1}^{n} x_i^2\right)dx_{1:n} - 1\right]
= (2\pi)^n\left[\left(\frac{\sigma^4}{2\sigma^2 - 1}\right)^{n/2} - 1\right]$$
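
The weight degeneracy discussed on the previous slide can be observed directly in this toy example; the short sketch below accumulates the incremental log-weights $\log[\mathcal{N}(x_k; 0, 1)/\mathcal{N}(x_k; 0, \sigma^2)]$ and prints the largest normalized weight, which approaches 1 as $n$ grows (the values of $N$ and $\sigma^2$ used here are assumptions for illustration).

```python
# Minimal sketch: weight degeneracy for the Gaussian toy example above
# (target prod_k N(x_k; 0, 1), proposal prod_k N(x_k; 0, sigma^2)).
import numpy as np

rng = np.random.default_rng(0)
N, sigma2 = 1000, 1.44
log_w = np.zeros(N)
for n in range(1, 101):
    x = rng.normal(0.0, np.sqrt(sigma2), size=N)                    # one more component per particle
    log_w += 0.5 * np.log(sigma2) - 0.5 * (1 - 1 / sigma2) * x**2   # log[N(x;0,1) / N(x;0,sigma^2)]
    W = np.exp(log_w - log_w.max())
    W /= W.sum()
    if n % 25 == 0:
        print(n, W.max())   # the largest normalized weight tends towards 1
```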


Importance Sampling in High-Dimensions
$$\mathrm{Var}_{q_\sigma}\!\big[w_\sigma(x_{1:n})\big] = (2\pi)^n\left[\left(\frac{\sigma^4}{2\sigma^2 - 1}\right)^{n/2} - 1\right]
\;\Rightarrow\;
\frac{\mathrm{Var}\big[\hat Z_n\big]}{Z_n^2} = \frac{1}{N}\left[\left(\frac{\sigma^4}{2\sigma^2 - 1}\right)^{n/2} - 1\right]$$
• It is easy to see that $\sigma^4 > 2\sigma^2 - 1 \Leftrightarrow (\sigma^2 - 1)^2 > 0$ for $\sigma^2 \neq 1$, $\sigma^2 > \tfrac12$. Therefore:
$$\mathrm{Var}_{q_\sigma}\!\big[w_\sigma(x_{1:n})\big] \to \infty \quad \text{as } n \to \infty$$
• The variance of the weights increases exponentially fast with dimensionality. This is despite the good choice of $q_n(x_{1:n})$.
• For example, if we select $\sigma = 1.2$ (i.e. $\sigma^2 = 1.44$), we have a reasonably good importance distribution, as $q_k(x_k) \approx \pi_n(x_k)$, but $N\,\mathrm{Var}[\hat Z_n]/Z_n^2 \approx (1.103)^{n/2}$, which is approximately equal to $1.9 \times 10^{21}$ for $n = 1000$! We would need $N \approx 2 \times 10^{23}$ particles to obtain a relative variance $\mathrm{Var}[\hat Z_n]/Z_n^2 \approx 0.01$. This is impractical.
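
The closed-form relative variance above is easy to verify numerically; the sketch below compares the empirical relative variance of $\hat Z_n$ over repeated runs with $\frac{1}{N}\big[(\sigma^4/(2\sigma^2-1))^{n/2} - 1\big]$ (the number of repetitions, $N$ and $\sigma^2$ used here are assumptions for illustration).

```python
# Minimal sketch: empirical vs. theoretical relative variance of Z_hat_n for the toy example,
# target gamma_n(x_{1:n}) = prod_k exp(-x_k^2/2) with Z_n = (2*pi)^(n/2), proposal prod_k N(0, sigma^2).
import numpy as np

rng = np.random.default_rng(1)
sigma2, N, n_reps = 1.44, 1000, 500
for n in [2, 5, 10, 20]:
    Z_n = (2 * np.pi) ** (n / 2)
    Z_hats = np.empty(n_reps)
    for r in range(n_reps):
        x = rng.normal(0.0, np.sqrt(sigma2), size=(N, n))
        # log w_sigma(x_{1:n}) = (n/2) log(2*pi*sigma^2) - 0.5 * (1 - 1/sigma^2) * sum_k x_k^2
        log_w = (n / 2) * np.log(2 * np.pi * sigma2) - 0.5 * (1 - 1 / sigma2) * np.sum(x**2, axis=1)
        Z_hats[r] = np.mean(np.exp(log_w))          # Z_hat_n = (1/N) sum_i w_sigma(X^(i))
    rel_var_emp = np.var(Z_hats) / Z_n**2
    rel_var_theory = ((sigma2**2 / (2 * sigma2 - 1)) ** (n / 2) - 1) / N
    print(n, rel_var_emp, rel_var_theory)
```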


Proposal Distribution Factorization
• From a practical perspective, we use proposal distributions of the form:
$$q_n(x_n \mid y_{1:n}, x_{1:n-1}) = q_n(x_n \mid y_n, x_{n-1})$$
• Given $x_{n-1}$ and $y_n$, the earlier observations $y_{1:n-1}$ and states $x_{1:n-2}$ do not bring any new information about $X_n$.
• Our sequential importance sampling update now looks as follows:
$$\underbrace{q_n(x_{1:n} \mid y_{1:n})}_{\text{importance sampling at } n} = \underbrace{q_{n-1}(x_{1:n-1} \mid y_{1:n-1})}_{\text{distribution of the paths } X_{1:n-1}^{(i)}}\;\underbrace{q_n(x_n \mid y_n, x_{n-1})}_{\text{conditional distribution of } X_n^{(i)}} = q_1(x_1 \mid y_1)\prod_{k=2}^{n} q_k(x_k \mid y_k, x_{k-1})$$
• Thus we assume that at $n-1$ we have sampled $X_{1:n-1}^{(i)} \sim q_{n-1}(x_{1:n-1} \mid y_{1:n-1})$; to obtain $X_{1:n}^{(i)} \sim q(x_{1:n} \mid y_{1:n})$, we sample $X_n^{(i)} \sim q_n(x_n \mid y_n, X_{n-1}^{(i)})$ and then set
$$X_{1:n}^{(i)} = \Big(\underbrace{X_{1:n-1}^{(i)}}_{\text{previously sampled path}},\ \underbrace{X_n^{(i)}}_{\text{component sampled at time } n}\Big)$$


Sequential Importance Sampling
• We now need to show that we can recursively compute estimates of our target distribution $p(x_{1:n} \mid y_{1:n})$ as well as of $p(y_{1:n})$.
• From our earlier importance sampling approximations:
$$\hat p_N(x_{1:n} \mid y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\,\delta_{X_{1:n}^{(i)}}(x_{1:n}), \qquad W_n^{(i)} = \frac{w(X_{1:n}^{(i)}, y_{1:n})}{\sum_{j=1}^{N} w(X_{1:n}^{(j)}, y_{1:n})}, \qquad \hat p_N(y_{1:n}) = \frac{1}{N}\sum_{i=1}^{N} w(X_{1:n}^{(i)}, y_{1:n})$$
• We can show the following recursion for the calculation of these weights:
$$w(x_{1:n}, y_{1:n}) = \frac{p(x_{1:n}, y_{1:n})}{q(x_{1:n} \mid y_{1:n})} = \underbrace{\frac{p(x_{1:n-1}, y_{1:n-1})}{q(x_{1:n-1} \mid y_{1:n-1})}}_{w(x_{1:n-1},\, y_{1:n-1})}\;\underbrace{\frac{f(x_n \mid x_{n-1})\,g(y_n \mid x_n)}{q(x_n \mid y_n, x_{n-1})}}_{\text{incremental weight}}$$
• This suggests the following sequential importance sampling algorithm.


Sequential Importance Sampling
At step n = 1:
• Sample $X_1^{(i)} \sim q(x_1 \mid y_1)$, $i = 1, \dots, N$, and then approximate:
$$\hat p_N(x_1 \mid y_1) = \sum_{i=1}^{N} W_1^{(i)}\,\delta_{X_1^{(i)}}(x_1), \qquad W_1^{(i)} \propto w(X_1^{(i)}, y_1) = \frac{\mu(X_1^{(i)})\,g(y_1 \mid X_1^{(i)})}{q(X_1^{(i)} \mid y_1)}$$
At step n ≥ 2:
• Sample $X_n^{(i)} \sim q(x_n \mid y_n, X_{n-1}^{(i)})$, $i = 1, \dots, N$, and compute:
$$\hat p_N(x_{1:n} \mid y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\,\delta_{X_{1:n}^{(i)}}(x_{1:n}), \qquad W_n^{(i)} \propto w(X_{1:n}^{(i)}, y_{1:n}) = w(X_{1:n-1}^{(i)}, y_{1:n-1})\,\frac{f(X_n^{(i)} \mid X_{n-1}^{(i)})\,g(y_n \mid X_n^{(i)})}{q(X_n^{(i)} \mid y_n, X_{n-1}^{(i)})}$$
• The algorithm has computational complexity $\mathcal{O}(N)$ per time step, independent of $n$.
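
A minimal sketch of the SIS recursion above for the scalar linear Gaussian SSM assumed in the earlier simulation sketch. The transition density $f$ is used as the proposal $q$ (an assumption to keep the code short), so the incremental weight reduces to $g(y_n \mid X_n^{(i)})$; any proposal with a computable density could be substituted.

```python
# Minimal sketch: sequential importance sampling (no resampling) with the prior transition as proposal,
# for the scalar LG-SSM mu = N(0,1), f = N(a*x_{n-1}, q), g = N(c*x_n, r).
import numpy as np
from scipy.stats import norm

def sis_filter(y, N=1000, a=0.9, c=1.0, q=1.0, r=0.5, seed=1):
    rng = np.random.default_rng(seed)
    T = len(y)
    X = np.zeros((T, N))        # particles X_n^(i)
    W = np.zeros((T, N))        # normalized weights W_n^(i)
    log_w = np.zeros(N)         # unnormalized log-weights, accumulated over time
    for n in range(T):
        if n == 0:
            X[0] = rng.normal(0.0, 1.0, size=N)            # X_1^(i) ~ mu = q
        else:
            X[n] = rng.normal(a * X[n - 1], np.sqrt(q))    # X_n^(i) ~ q(.|y_n, X_{n-1}^(i)) = f
        log_w += norm.logpdf(y[n], c * X[n], np.sqrt(r))   # incremental weight g(y_n | X_n^(i))
        W[n] = np.exp(log_w - log_w.max())
        W[n] /= W[n].sum()
    return X, W

# Posterior-mean estimate of x_n given y_{1:n}: np.sum(W[n] * X[n])
```
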
Sequential Importance Sampling
• Note that the complexity of the algorithm does not increase with $n$.
• The algorithm is fully parallelizable.
• Also note that if our interest is in computing the marginal posterior $\hat p_N(x_n \mid y_{1:n})$ (the filtering density), then we only need to store $X_{n-1:n}^{(i)}$ rather than all the paths $X_{1:n}^{(i)}$:
$$\hat p_N(x_n \mid y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\,\delta_{X_n^{(i)}}(x_n), \qquad W_n^{(i)} \propto w(X_{1:n}^{(i)}, y_{1:n}) = w(X_{1:n-1}^{(i)}, y_{1:n-1})\,\frac{f(X_n^{(i)} \mid X_{n-1}^{(i)})\,g(y_n \mid X_n^{(i)})}{q(X_n^{(i)} \mid y_n, X_{n-1}^{(i)})}$$
• One can show that this approaches the true posterior as $N \to \infty$.
• Crisan, D., P. Del Moral, and T. Lyons (1999). Discrete filtering using branching and interacting particle systems. Markov Processes and Related Fields, 5(3), 293-318.

The Bootstrap Particle Filter
• A simple choice of importance sampling distribution $q_n(x_{1:n} \mid y_{1:n})$ is the prior:
$$q_n(x_{1:n} \mid y_{1:n}) = p(x_{1:n}),$$
that is, $q_1(x_1 \mid y_1) = \mu(x_1)$ and
$$q_n(x_n \mid x_{1:n-1}, y_{1:n}) = \frac{q_n(x_{1:n} \mid y_{1:n})}{q_{n-1}(x_{1:n-1} \mid y_{1:n-1})} = \frac{p(x_{1:n})}{p(x_{1:n-1})} = f(x_n \mid x_{n-1})$$
• We also have:
$$w_n(x_{1:n}, y_{1:n}) = \frac{p(x_{1:n}, y_{1:n})}{q_n(x_{1:n} \mid y_{1:n})} = \frac{p(x_{1:n-1}, y_{1:n-1})}{q_{n-1}(x_{1:n-1} \mid y_{1:n-1})}\,\frac{f(x_n \mid x_{n-1})\,g(y_n \mid x_n)}{q_n(x_n \mid x_{1:n-1}, y_{1:n})} = w_{n-1}(x_{1:n-1}, y_{1:n-1})\,g(y_n \mid x_n)$$
• This choice is extremely poor if the data are very informative (peaky likelihood), since the proposal distribution does not include any information from the data $y_{1:n}$:
$$w_n(x_{1:n}, y_{1:n}) = w_{n-1}(x_{1:n-1}, y_{1:n-1})\,g(y_n \mid x_n) = \prod_{k=1}^{n} g(y_k \mid x_k)$$
• In the bootstrap particle filter, the particles are simulated according to the dynamical model and the weights are assigned according to the likelihood.

Bootstrap Particle Filter
• One selects
$$q_1(x_1) = \mu(x_1) \quad \text{and} \quad q_n(x_n \mid x_{1:n-1}) = q_n(x_n \mid x_{n-1}) = f(x_n \mid x_{n-1})$$
• At time $n = 1$, we sample $X_1^{(i)} \sim \mu(\cdot)$ and set $w_1^{(i)} = g(y_1 \mid X_1^{(i)})$.
• At time $n$ ($n > 1$):
  • sample $X_n^{(i)} \sim f(\cdot \mid X_{n-1}^{(i)})$ and set $X_{1:n}^{(i)} = \big(X_{1:n-1}^{(i)}, X_n^{(i)}\big)$
  • evaluate the importance weights
$$w_n^{(i)} = w_{n-1}^{(i)}\,g(y_n \mid X_n^{(i)}), \qquad \text{or normalized:}\quad W_n^{(i)} = \frac{w_n^{(i)}}{\sum_{j=1}^{N} w_n^{(j)}}$$
• At any time $n$ we have:
$$X_{1:n}^{(i)} \sim \mu(x_1)\prod_{k=2}^{n} f(x_k \mid x_{k-1}), \qquad w_n\big(X_{1:n}^{(i)}\big) = g(y_1 \mid X_1^{(i)})\prod_{k=2}^{n} g(y_k \mid X_k^{(i)}) = \prod_{k=1}^{n} g\big(y_k \mid X_k^{(i)}\big)$$


Resampling
• As $n$ increases, the mass of our approximation to the target distribution concentrates on a few particles (degeneracy problem):
$$\hat p_N(x_{1:n} \mid y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\,\delta_{X_{1:n}^{(i)}}(x_{1:n}) \approx \delta_{X_{1:n}^{(i_0)}}(x_{1:n})$$
• Here a single delta mass remains (weight approximately 1 for some particle $i_0$).
• When the variance of the weights $W_n^{(i)}$ is high, the resampling idea is essentially to kill the samples with low weights $W_n^{(i)}$ (relative to $1/N$) and to multiply the particles with higher weights.
• Of course, the assumption here is that particles with low weights (relative to $1/N$) at step $n$ will have even lower weights at later steps.
• Resampling techniques are a key ingredient of SMC methods and can partially address the degeneracy problem of the SIS algorithm.
• Ref: J. S. Liu and R. Chen (1995). Blind deconvolution via sequential imputations. Journal of the American Statistical Association, 90:567.

Resampling
• Let us assume that at time $n$ the following approximation holds:
$$\hat p_N(x_{1:n} \mid y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\,\delta_{X_{1:n}^{(i)}}(x_{1:n})$$
• With resampling (sampling with replacement from the categorical distribution $\mathcal{C}\big(\{W_n^{(j)}\}_{j=1}^{N}\big)$, i.e. in proportion to the weights $W_n^{(i)}$), we sample $N$ times from the above distribution,
$$\widetilde X_{1:n}^{(i)} \sim \hat p_N(x_{1:n} \mid y_{1:n}), \quad i = 1, \dots, N,$$
to build a new approximation:
$$\tilde p_N(x_{1:n} \mid y_{1:n}) = \frac{1}{N}\sum_{i=1}^{N} \delta_{\widetilde X_{1:n}^{(i)}}(x_{1:n})$$
• Note that the resampled particles $\widetilde X_{1:n}^{(i)}$ are approximately distributed according to $p(x_{1:n} \mid y_{1:n})$, but they are statistically dependent (so the standard CLT approximations no longer hold directly).

Bayesian Recursion Formulas for the
State Space Model



Filtering and Marginal Likelihood
• Let us return to the SSM, where the objective is to compute $p(x_{1:n} \mid y_{1:n})$. We want to calculate this sequentially.
• We can write the following recursion equation:
$$p(x_{1:n} \mid y_{1:n}) = \frac{p(x_{1:n}, y_{1:n})}{p(x_{1:n-1}, y_{1:n-1})}\,\frac{p(y_{1:n-1})}{p(y_{1:n})}\,p(x_{1:n-1} \mid y_{1:n-1})
= \frac{g(y_n \mid x_n)}{p(y_n \mid y_{1:n-1})}\,\underbrace{f(x_n \mid x_{n-1})\,p(x_{1:n-1} \mid y_{1:n-1})}_{\text{predictive: } p(x_{1:n} \mid y_{1:n-1})},$$
where the prediction of $y_n$ given $y_{1:n-1}$ is:
$$p(y_n \mid y_{1:n-1}) = \int p(y_n, x_n \mid y_{1:n-1})\,dx_n = \int g(y_n \mid x_n)\,p(x_n \mid y_{1:n-1})\,dx_n
= \int g(y_n \mid x_n)\,p(x_n, x_{n-1} \mid y_{1:n-1})\,dx_{n-1:n} = \int g(y_n \mid x_n)\,f(x_n \mid x_{n-1})\,p(x_{n-1} \mid y_{1:n-1})\,dx_{n-1:n}$$
• We can write our update equation above in two recursive steps:
$$\text{Step I (prediction):}\quad p(x_{1:n} \mid y_{1:n-1}) = f(x_n \mid x_{n-1})\,p(x_{1:n-1} \mid y_{1:n-1})$$
$$\text{Step II (update):}\quad p(x_{1:n} \mid y_{1:n}) = \frac{g(y_n \mid x_n)\,p(x_{1:n} \mid y_{1:n-1})}{p(y_n \mid y_{1:n-1})} \propto g(y_n \mid x_n)\,p(x_{1:n} \mid y_{1:n-1})$$

Filtering and Marginal Likelihood
• A two-step prediction/update recursion for the marginal (filtering) distributions $p(x_n \mid y_{1:n})$ can also be easily derived:
$$\text{Step I (prediction):}\quad p(x_n \mid y_{1:n-1}) = \int p(x_{n-1:n} \mid y_{1:n-1})\,dx_{n-1} = \int p(x_n \mid x_{n-1}, y_{1:n-1})\,p(x_{n-1} \mid y_{1:n-1})\,dx_{n-1} = \int f(x_n \mid x_{n-1})\,p(x_{n-1} \mid y_{1:n-1})\,dx_{n-1}$$
$$\text{Step II (update):}\quad p(x_n \mid y_{1:n}) = p(x_n \mid y_n, y_{1:n-1}) = \frac{g(y_n \mid x_n)\,p(x_n \mid y_{1:n-1})}{p(y_n \mid y_{1:n-1})},$$
where $p(y_n \mid y_{1:n-1}) = \int g(y_n \mid x_n)\,p(x_n \mid y_{1:n-1})\,dx_n$.
• Our key emphasis remains on the calculation of $p(x_{1:n} \mid y_{1:n})$, even if our interest is in computing $p(x_n \mid y_{1:n})$.
• This recursion leads to the Kalman filter for the LG-SSM.
• SMC is a simple simulation-based implementation of this recursion.

Filtering and Marginal Likelihood
• To compute the normalizing factor $p(y_{1:n})$, one can use a recursive calculation that avoids high-dimensional integration:
$$p(y_{1:n}) = p(y_1)\prod_{k=2}^{n} p(y_k \mid y_{1:k-1})$$
• To compute $p(y_k \mid y_{1:k-1})$, we use the recursion derived earlier:
$$p(y_k \mid y_{1:k-1}) = \int p(y_k, x_k \mid y_{1:k-1})\,dx_k = \int g(y_k \mid x_k)\,p(x_k \mid y_{1:k-1})\,dx_k
= \int g(y_k \mid x_k)\,p(x_k, x_{k-1} \mid y_{1:k-1})\,dx_{k-1:k} = \int g(y_k \mid x_k)\,f(x_k \mid x_{k-1})\,p(x_{k-1} \mid y_{1:k-1})\,dx_{k-1:k}$$
• The calculation of $p(y_{1:n})$ is thus a product of lower-dimensional integrals.


MC Implementation of the Prediction Step
• The bootstrap particle filter (Gordon et al. 1993) considered earlier can be seen as a natural Monte Carlo, simulation-based implementation of the prediction and update recursive relations.
• Assume that at time $n-1$ you have
$$\tilde p_N(x_{1:n-1} \mid y_{1:n-1}) = \frac{1}{N}\sum_{i=1}^{N}\delta_{\widetilde X_{1:n-1}^{(i)}}(x_{1:n-1}), \qquad \text{where } \widetilde X_{1:n-1}^{(i)} \sim p(x_{1:n-1} \mid y_{1:n-1})$$
• By sampling $X_n^{(i)} \sim f(x_n \mid \widetilde X_{n-1}^{(i)})$, setting $X_{1:n}^{(i)} = \big(\widetilde X_{1:n-1}^{(i)}, X_n^{(i)}\big)$ and using $p(x_{1:n} \mid y_{1:n-1}) = f(x_n \mid x_{n-1})\,p(x_{1:n-1} \mid y_{1:n-1})$, we obtain:
$$\hat p_N(x_{1:n} \mid y_{1:n-1}) = \frac{1}{N}\sum_{i=1}^{N}\delta_{X_{1:n}^{(i)}}(x_{1:n})$$
• Sampling from $f(x_n \mid x_{n-1})$ is straightforward.


Importance Sampling Implementation of the Updating Step
• Our target at time $n$ is $p(x_{1:n} \mid y_{1:n}) = \dfrac{g(y_n \mid x_n)\,p(x_{1:n} \mid y_{1:n-1})}{p(y_n \mid y_{1:n-1})}$, where $p(y_n \mid y_{1:n-1}) = \int g(y_n \mid x_n)\,p(x_n \mid y_{1:n-1})\,dx_n$.
• Substitute $\hat p_N(x_{1:n} \mid y_{1:n-1}) = \frac{1}{N}\sum_{i=1}^{N}\delta_{X_{1:n}^{(i)}}(x_{1:n})$ for $p(x_{1:n} \mid y_{1:n-1})$ and note that:
$$\hat p_N(y_n \mid y_{1:n-1}) = \int g(y_n \mid x_n)\,\hat p_N(x_{1:n} \mid y_{1:n-1})\,dx_{1:n} = \int g(y_n \mid x_n)\,\frac{1}{N}\sum_{i=1}^{N}\delta_{X_{1:n}^{(i)}}(x_{1:n})\,dx_{1:n} = \frac{1}{N}\sum_{i=1}^{N} g(y_n \mid X_n^{(i)})$$
• Finally, $p(x_{1:n} \mid y_{1:n})$ becomes:
$$\hat p_N(x_{1:n} \mid y_{1:n}) = \frac{g(y_n \mid x_n)\,\hat p_N(x_{1:n} \mid y_{1:n-1})}{\hat p_N(y_n \mid y_{1:n-1})} = \frac{\sum_{i=1}^{N} g(y_n \mid X_n^{(i)})\,\delta_{X_{1:n}^{(i)}}(x_{1:n})}{\sum_{i=1}^{N} g(y_n \mid X_n^{(i)})} = \sum_{i=1}^{N} W_n^{(i)}\,\delta_{X_{1:n}^{(i)}}(x_{1:n}),$$
where the normalized weights are defined by $W_n^{(i)} \propto g(y_n \mid X_n^{(i)})$, $\sum_{i=1}^{N} W_n^{(i)} = 1$.

Multinomial Resampling
• We have a weighted approximation $\hat p_N(x_{1:n} \mid y_{1:n})$ of $p(x_{1:n} \mid y_{1:n})$:
$$\hat p_N(x_{1:n} \mid y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\,\delta_{X_{1:n}^{(i)}}(x_{1:n})$$
• To obtain $N$ samples $\widetilde X_{1:n}^{(i)}$ approximately distributed according to $p(x_{1:n} \mid y_{1:n})$, resample $N$ times with replacement according to the weights $W_n^{(i)}$,
$$\widetilde X_{1:n}^{(i)} \sim \hat p_N(x_{1:n} \mid y_{1:n}), \quad i = 1, \dots, N,$$
to build a new approximation:
$$\tilde p_N(x_{1:n} \mid y_{1:n}) = \frac{1}{N}\sum_{i=1}^{N}\delta_{\widetilde X_{1:n}^{(i)}}(x_{1:n}) = \sum_{i=1}^{N}\frac{N_n^{(i)}}{N}\,\delta_{X_{1:n}^{(i)}}(x_{1:n})$$
• Here the counts $N_n^{(i)}$ follow a multinomial distribution with $\mathbb{E}\big[N_n^{(i)}\big] = N W_n^{(i)}$ and $\mathrm{Var}\big[N_n^{(i)}\big] = N W_n^{(i)}\big(1 - W_n^{(i)}\big)$.
• The computational cost is $\mathcal{O}(N)$.
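
A minimal sketch of multinomial resampling as described above: draw $N$ indices with replacement from the categorical distribution defined by the weights and duplicate or discard particles accordingly (the function name multinomial_resample is an assumption).

```python
# Minimal sketch: multinomial resampling. The returned indices are i.i.d. draws from
# C({W^(1), ..., W^(N)}), so the count of copies of particle i has mean N*W^(i).
import numpy as np

def multinomial_resample(W, rng):
    N = len(W)
    return rng.choice(N, size=N, replace=True, p=W)   # ancestor indices a^i ~ C(W)

rng = np.random.default_rng(0)
W = np.array([0.7, 0.1, 0.1, 0.1])
X = np.array([-1.0, 0.5, 2.0, 3.5])
idx = multinomial_resample(W, rng)
X_tilde = X[idx]   # resampled particles, each now carrying equal weight 1/N
```
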
Multinomial Resampling
• The resampling algorithm is based on the following two steps. For $i = 1, \dots, N$:
  • Select one of the components: $a_n^i \sim \mathcal{C}\big(\{W_{n-1}^{(j)}\}_{j=1}^{N}\big)$ (categorical distribution).
  • Generate a sample from the selected component: $X_n^{(i)} \sim f(x_n \mid X_{n-1}^{(a_n^i)})$.
• The particle $X_{n-1}^{(a_n^i)}$ is referred to as the ancestor of $X_n^{(i)}$, since $X_n^{(i)}$ is generated conditionally on $X_{n-1}^{(a_n^i)}$.
• The variable $a_n^i \in \{1, 2, \dots, N\}$ is referred to as the ancestor index, since it indexes the ancestor of particle $X_n^{(i)}$ at time $n-1$.
• The ancestor indices are essentially random variables that are used to make the stochasticity of the resampling step explicit by keeping track of which particles get resampled.

Vanilla SMC: Bootstrap Filter (Gordon et al.)
For $i = 1, \dots, N$:
• Time 1:
  • Sample $N$ particles $X_1^{(i)} \sim \mu(x_1)$ and compute:
$$\hat p_N(x_1 \mid y_1) = \sum_{i=1}^{N} W_1^{(i)}\,\delta_{X_1^{(i)}}(x_1), \qquad W_1^{(i)} \propto g(y_1 \mid X_1^{(i)})$$
  • Resample $\widetilde X_1^{(i)} \sim \hat p_N(x_1 \mid y_1)$ to obtain $\tilde p_N(x_1 \mid y_1) = \frac{1}{N}\sum_{i=1}^{N}\delta_{\widetilde X_1^{(i)}}(x_1)$.
• Time $n$, $n \ge 2$. Given $\hat p_N(x_{1:n-1} \mid y_{1:n-1}) = \sum_{i=1}^{N} W_{n-1}^{(i)}\,\delta_{X_{1:n-1}^{(i)}}(x_{1:n-1})$:
  • Resample: sample $a_n^i \sim \mathcal{C}\big(\{W_{n-1}^{(j)}\}_{j=1}^{N}\big)$ to obtain $\tilde p_N(x_{1:n-1} \mid y_{1:n-1}) = \frac{1}{N}\sum_{i=1}^{N}\delta_{\widetilde X_{1:n-1}^{(i)}}(x_{1:n-1})$.
  • Propagate: sample $X_n^{(i)} \sim f(x_n \mid \widetilde X_{n-1}^{(i)})$ and set $X_{1:n}^{(i)} = \big(\widetilde X_{1:n-1}^{(i)}, X_n^{(i)}\big)$ to obtain $\hat p_N(x_{1:n} \mid y_{1:n-1}) = \frac{1}{N}\sum_{i=1}^{N}\delta_{X_{1:n}^{(i)}}(x_{1:n})$.
  • Weight: compute $W_n^{(i)} \propto g(y_n \mid X_n^{(i)})$ and normalize to obtain $\hat p_N(x_{1:n} \mid y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\,\delta_{X_{1:n}^{(i)}}(x_{1:n})$.
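
A minimal sketch of the resample/propagate/weight loop above for the scalar linear Gaussian SSM assumed in the earlier sketches; it also accumulates the log of the marginal likelihood estimate $\hat p_N(y_{1:n})$ discussed later on the SMC Output slide. The parameter values are assumptions for illustration.

```python
# Minimal sketch: bootstrap particle filter (resample -> propagate -> weight) for the scalar LG-SSM
# mu = N(0,1), f = N(a*x_{n-1}, q), g = N(c*x_n, r), with multinomial resampling at every step.
import numpy as np
from scipy.stats import norm

def bootstrap_pf(y, N=1000, a=0.9, c=1.0, q=1.0, r=0.5, seed=2):
    rng = np.random.default_rng(seed)
    T = len(y)
    filt_mean = np.zeros(T)
    log_evidence = 0.0
    for n in range(T):
        if n == 0:
            x = rng.normal(0.0, 1.0, size=N)                   # X_1^(i) ~ mu
        else:
            anc = rng.choice(N, size=N, replace=True, p=W)     # resample: a_n^i ~ C({W_{n-1}^(j)})
            x = rng.normal(a * x[anc], np.sqrt(q))             # propagate: X_n^(i) ~ f(. | ancestor)
        log_w = norm.logpdf(y[n], c * x, np.sqrt(r))           # weight: W_n^(i) propto g(y_n | X_n^(i))
        m = log_w.max()
        w = np.exp(log_w - m)
        W = w / w.sum()                                        # normalized weights
        log_evidence += m + np.log(w.mean())                   # log p_hat(y_n | y_{1:n-1}) = log[(1/N) sum_i g]
        filt_mean[n] = np.sum(W * x)                           # estimate of E[x_n | y_{1:n}]
    return filt_mean, log_evidence
```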


Ancestral Path
• Consider the case with $N = 3$ and $n = 3$.
[Figure: particles $x_n^i$, $i = 1, 2, 3$, at times $n = 1, 2, 3$, with arrows indicating which particles are selected in the resampling steps.]
• Assume the resampling shown, e.g. particle 2 at $n = 1$ is resampled twice and particle 3 at $n = 2$ is resampled twice.
• The ancestral paths were defined earlier in the form $X_{1:n}^{(i)} = \big(\widetilde X_{1:n-1}^{(i)}, X_n^{(i)}\big)$, where $\widetilde X_{1:n-1}^{(i)}$ refers to the resampled path at $n-1$; e.g. the ancestral path of particle 1 at time 3 is:
$$\big(\widetilde X_{1:2}^{(1)}, X_3^1\big) = \big(\widetilde X_1^{(1)}, \widetilde X_2^{(1)}, X_3^1\big) = \big(\widetilde X_1^{(2)}, X_2^2, X_3^1\big) = \big(X_1^2, X_2^2, X_3^1\big)$$
• To make the notation for the ancestral paths explicit, one can represent them in the form
$$X_{1:n}^{(i)} = \big(X_{1:n-1}^{(a_n^i)}, X_n^{(i)}\big),$$
where $X_{1:n-1}^{(a_n^i)}$ is the path that terminates at the ancestor of $X_n^{(i)}$, i.e. $X_n^{(i)} \sim f(x_n \mid X_{n-1}^{(a_n^i)})$.
• For example, the ancestral path for particle 1 at time 3 can be written as:
$$\big(X_1^{a_2^{a_3^1}}, X_2^{a_3^1}, X_3^1\big) = \big(X_1^2, X_2^2, X_3^1\big)$$

Vanilla SMC: Bootstrap Filter (Gordon et al.)
• At time $n$, $n \ge 2$, given $\hat p_N(x_{1:n-1} \mid y_{1:n-1}) = \sum_{i=1}^{N} W_{n-1}^{(i)}\,\delta_{X_{1:n-1}^{(i)}}(x_{1:n-1})$:
  • After resampling: it produces $\tilde p_N(x_{1:n-1} \mid y_{1:n-1}) = \frac{1}{N}\sum_{i=1}^{N}\delta_{\widetilde X_{1:n-1}^{(i)}}(x_{1:n-1})$.
  • After propagation: it produces $\hat p_N(x_{1:n} \mid y_{1:n-1}) = \frac{1}{N}\sum_{i=1}^{N}\delta_{X_{1:n}^{(i)}}(x_{1:n})$, where $X_{1:n}^{(i)} = \big(\widetilde X_{1:n-1}^{(i)}, X_n^{(i)}\big)$.
  • After weighting: it produces $\hat p_N(x_{1:n} \mid y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\,\delta_{X_{1:n}^{(i)}}(x_{1:n})$.
• In the original bootstrap particle filter of Gordon et al. (particles simulated with the dynamical model and weights assigned according to the likelihood), the focus was on computing an approximation $\hat p_N(x_n \mid y_{1:n})$ of the filtering marginal.

SMC Output
• At time $n$ we have:
$$\hat p_N(x_{1:n} \mid y_{1:n-1}) = \frac{1}{N}\sum_{i=1}^{N}\delta_{X_{1:n}^{(i)}}(x_{1:n}), \qquad X_{1:n}^{(i)} = \big(\widetilde X_{1:n-1}^{(i)}, X_n^{(i)}\big)$$
$$\hat p_N(x_{1:n} \mid y_{1:n}) = \sum_{i=1}^{N} W_n^{(i)}\,\delta_{X_{1:n}^{(i)}}(x_{1:n}), \qquad \tilde p_N(x_{1:n} \mid y_{1:n}) = \frac{1}{N}\sum_{i=1}^{N}\delta_{\widetilde X_{1:n}^{(i)}}(x_{1:n})$$
• With $\hat p_N(y_k \mid y_{1:k-1}) = \int g(y_k \mid x_k)\,\hat p_N(x_{1:k} \mid y_{1:k-1})\,dx_{1:k} = \frac{1}{N}\sum_{i=1}^{N} g(y_k \mid X_k^{(i)})$, the marginal likelihood estimate is:
$$\hat p_N(y_{1:n}) = \hat p_N(y_1)\prod_{k=2}^{n}\hat p_N(y_k \mid y_{1:k-1}) = \prod_{k=1}^{n}\left[\frac{1}{N}\sum_{i=1}^{N} g(y_k \mid X_k^{(i)})\right]$$
• Computational complexity is $\mathcal{O}(N)$ at each time step, and the memory requirements are $\mathcal{O}(nN)$.
• If we are only interested in $p(x_n \mid y_{1:n})$, or in $p(s_n(x_{1:n}) \mid y_{1:n})$ where $s_n(x_{1:n}) = \Psi_n\big(x_n, s_{n-1}(x_{1:n-1})\big)$ is of fixed dimension (e.g. $s_n(x_{1:n}) = \sum_{k=1}^{n} x_k^2$), then the memory requirements are $\mathcal{O}(N)$.

Kalman Filtering Solution for LG-SSM
• Recall the LG-SSM:
$$X_n = A X_{n-1} + B u_n + V_n, \qquad Y_n = C X_n + D u_n + E_n,$$
$$\begin{pmatrix} X_0 \\ V_n \\ E_n \end{pmatrix} \sim \mathcal{N}\!\left(\begin{pmatrix}\mu \\ 0 \\ 0\end{pmatrix},\ \begin{pmatrix} P_0 & 0 & 0 \\ 0 & Q & S \\ 0 & S^T & R \end{pmatrix}\right)$$
• Measurement update:
$$p(x_n \mid y_{1:n}) = \mathcal{N}\big(x_n \mid \hat x_{n|n}, P_{n|n}\big)$$
$$\hat x_{n|n} = \hat x_{n|n-1} + K_n\big(y_n - C\hat x_{n|n-1} - D u_n\big), \qquad P_{n|n} = (I - K_n C)\,P_{n|n-1}, \qquad K_n = P_{n|n-1} C^T\big(C P_{n|n-1} C^T + R\big)^{-1}$$
• Prediction:
$$p(x_{n+1} \mid y_{1:n}) = \mathcal{N}\big(x_{n+1} \mid \hat x_{n+1|n}, P_{n+1|n}\big)$$
$$\hat x_{n+1|n} = A\hat x_{n|n} + B u_n, \qquad P_{n+1|n} = A P_{n|n} A^T + Q$$
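
A minimal sketch of the Kalman recursions above, written for the case with no control input and no cross-covariance (B = D = 0 and S = 0 are assumptions made to keep the sketch short).

```python
# Minimal sketch: Kalman filter for x_n = A x_{n-1} + v_n, y_n = C x_n + e_n,
# with v_n ~ N(0, Q), e_n ~ N(0, R) and initial prediction p(x_1) = N(mu0, P0) (an assumption here).
import numpy as np

def kalman_filter(y, A, C, Q, R, mu0, P0):
    T = y.shape[0]
    dx = A.shape[0]
    x_filt = np.zeros((T, dx))
    P_filt = np.zeros((T, dx, dx))
    x_pred, P_pred = mu0, P0
    for n in range(T):
        # measurement update
        S_n = C @ P_pred @ C.T + R
        K = P_pred @ C.T @ np.linalg.inv(S_n)           # Kalman gain K_n
        x_filt[n] = x_pred + K @ (y[n] - C @ x_pred)
        P_filt[n] = (np.eye(dx) - K @ C) @ P_pred
        # prediction of the next state
        x_pred = A @ x_filt[n]
        P_pred = A @ P_filt[n] @ A.T + Q
    return x_filt, P_filt
```
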
Forward-Filtering Backward-Smoothing
• One can also estimate the marginal smoothing distribution $p(x_n \mid y_{1:T})$, $n = 1, \dots, T$ (an offline estimate once all measurements $y_{1:T}$ are collected):
• Step I (forward pass): compute and store $p(x_n \mid y_{1:n})$ and $p(x_{n+1} \mid y_{1:n})$ for $n = 1, 2, \dots, T$, using the update and prediction recursions derived earlier.
• Step II (backward pass, $n = T-1, T-2, \dots, 1$):
$$p(x_n \mid y_{1:T}) = p(x_n \mid y_{1:n})\int \frac{f(x_{n+1} \mid x_n)}{p(x_{n+1} \mid y_{1:n})}\,p(x_{n+1} \mid y_{1:T})\,dx_{n+1}$$
• Indeed, one can show:
$$p(x_n \mid y_{1:T}) = \int p(x_n, x_{n+1} \mid y_{1:T})\,dx_{n+1} = \int p(x_n \mid x_{n+1}, y_{1:T})\,p(x_{n+1} \mid y_{1:T})\,dx_{n+1}
= \int p(x_n \mid x_{n+1}, y_{1:n})\,p(x_{n+1} \mid y_{1:T})\,dx_{n+1} = p(x_n \mid y_{1:n})\int \frac{f(x_{n+1} \mid x_n)}{p(x_{n+1} \mid y_{1:n})}\,p(x_{n+1} \mid y_{1:T})\,dx_{n+1}$$
• Here we used (see the next slide) $p(x_n \mid x_{n+1}, y_{1:T}) = p(x_n \mid x_{n+1}, y_{1:n})$.
• Fredrik Lindsten and Thomas B. Schön. Backward simulation methods for Monte Carlo statistical inference. Foundations and Trends in Machine Learning, 6(1):1-143, 2013.
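
Since the forward and backward relations above involve only one- and two-dimensional integrals in x, they can be checked on a finite (discretized) state space, where the integrals become sums. The sketch below is such a finite-state illustration; the discretization itself is an assumption for illustration, not part of the lecture.

```python
# Minimal sketch: forward-filtering backward-smoothing on a finite state space.
# F[i, j] = f(x_j | x_i) (row-stochastic transition matrix), G[n, i] = g(y_n | x_i), mu[i] = mu(x_i).
# Assumes the predictive probabilities are strictly positive.
import numpy as np

def ffbs_discrete(mu, F, G):
    T, K = G.shape
    pred = np.zeros((T, K))      # p(x_n | y_{1:n-1})
    filt = np.zeros((T, K))      # p(x_n | y_{1:n})
    # forward pass: prediction / update recursions
    pred[0] = mu
    for n in range(T):
        if n > 0:
            pred[n] = filt[n - 1] @ F                # sum_{x_{n-1}} f(x_n|x_{n-1}) p(x_{n-1}|y_{1:n-1})
        filt[n] = pred[n] * G[n]
        filt[n] /= filt[n].sum()                     # divide by p(y_n | y_{1:n-1})
    # backward pass: p(x_n|y_{1:T}) = p(x_n|y_{1:n}) * sum_{x_{n+1}} f(x_{n+1}|x_n) p(x_{n+1}|y_{1:T}) / p(x_{n+1}|y_{1:n})
    smooth = np.zeros((T, K))
    smooth[T - 1] = filt[T - 1]
    for n in range(T - 2, -1, -1):
        smooth[n] = filt[n] * (F @ (smooth[n + 1] / pred[n + 1]))
    return filt, smooth
```
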
Forward-Filtering Backward-Smoothing
• Here we highlight the proof of the relation used in the earlier slide:
$$p(x_n \mid x_{n+1}, y_{1:T}) = p(x_n \mid x_{n+1}, y_{1:n})$$
• Note that:
$$p(x_n \mid x_{n+1}, y_{1:T}) = p(x_n \mid x_{n+1}, y_{1:n}, y_{n+1:T}) = \frac{p(y_{n+1:T} \mid x_n, x_{n+1}, y_{1:n})\,p(x_n \mid x_{n+1}, y_{1:n})}{p(y_{n+1:T} \mid x_{n+1}, y_{1:n})}
= \frac{p(y_{n+1:T} \mid x_{n+1})\,p(x_n \mid x_{n+1}, y_{1:n})}{p(y_{n+1:T} \mid x_{n+1})} = p(x_n \mid x_{n+1}, y_{1:n})$$


Forward-Backward (Two-Filter) Smoother
• One can also estimate the marginal smoothing distribution $p(x_n \mid y_{1:T})$, $n = 1, \dots, T$, as follows:
$$\text{Step I (backward information filter):}\quad p(y_{n:T} \mid x_n) = \int p(y_{n:T}, x_{n+1} \mid x_n)\,dx_{n+1} = \int p(y_{n:T} \mid x_{n+1}, x_n)\,f(x_{n+1} \mid x_n)\,dx_{n+1}$$
$$= \int p(y_n, y_{n+1:T} \mid x_{n+1}, x_n)\,f(x_{n+1} \mid x_n)\,dx_{n+1} = g(y_n \mid x_n)\int p(y_{n+1:T} \mid x_{n+1})\,f(x_{n+1} \mid x_n)\,dx_{n+1}$$
$$\text{Step II (update):}\quad p(x_n \mid y_{1:T}) = p(x_n \mid y_{1:n-1}, y_{n:T}) = \frac{p(x_n \mid y_{1:n-1})\,p(y_{n:T} \mid x_n)}{p(y_{n:T} \mid y_{1:n-1})}$$
• Note that we can have $\int p(y_{n:T} \mid x_n)\,dx_n = \infty$. This precludes the direct use of SMC algorithms.
• To address this, a generalized version was proposed using a set of artificial distributions $\tilde p_n(x_n)$.
• Briers, M., Doucet, A. and Maskell, S. (2010). Smoothing algorithms for state-space models. Annals of the Institute of Statistical Mathematics, 62:61-89.

Online Bayesian Parameter
Estimation



Online Bayesian Parameter Estimation
• Let the SSM be defined with some unknown static parameter $\theta$ with prior $p(\theta)$:*
$$X_1 \sim \mu_\theta(\cdot), \qquad X_n \mid (X_{n-1} = x_{n-1}) \sim f_\theta(x_n \mid x_{n-1}), \qquad Y_n \mid (X_n = x_n) \sim g_\theta(y_n \mid x_n)$$
• Given data $y_{1:n}$, inference now is based on:
$$p(\theta, x_{1:n} \mid y_{1:n}) = p(\theta \mid y_{1:n})\,p_\theta(x_{1:n} \mid y_{1:n}), \qquad \text{where } p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\,p(\theta)$$
• We need to learn both $x_{1:n}$ and $\theta$ from the observations $y_{1:n}$. We can use standard SMC but on the extended space $Z_n = (X_n, \theta_n)$:
$$f(z_n \mid z_{n-1}) = \delta_{\theta_{n-1}}(\theta_n)\,f_{\theta_{n-1}}(x_n \mid x_{n-1}), \qquad g(y_n \mid z_n) = g_{\theta_n}(y_n \mid x_n)$$
• Note that $\theta$ is a static parameter: it does not evolve with $n$.
• *M. Kok, J. D. Hol and T. B. Schön. Using inertial sensors for position and orientation estimation. Foundations and Trends in Signal Processing, 11(1-2):1-153, 2017. (In motion capture, $X_n$ can represent the position/orientation of different body segments of a person, $\theta$ the body shape, and $Y_n$ the measurements from sensors.)
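
A minimal sketch of running the bootstrap filter on the extended state $Z_n = (X_n, \theta)$: each particle carries its own $\theta$, drawn once from the prior and then copied unchanged by the transition $\delta_{\theta_{n-1}}(\theta_n)$. The concrete scalar model, in which $\theta$ plays the role of the transition coefficient, and the $\mathcal{N}(0, 1)$ prior on $\theta$ are assumptions for illustration.

```python
# Minimal sketch: bootstrap filter on the extended state z_n = (x_n, theta), where theta is the
# (static) transition coefficient of the scalar model x_n = theta*x_{n-1} + v_n, y_n = x_n + e_n
# (this concrete model and the N(0, 1) prior on theta are assumptions for illustration).
import numpy as np
from scipy.stats import norm

def bootstrap_pf_static_param(y, N=2000, q=1.0, r=0.5, seed=3):
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, 1.0, size=N)        # theta^(i) ~ p(theta), sampled once (static component)
    x = rng.normal(0.0, 1.0, size=N)            # X_1^(i) ~ mu
    for n in range(len(y)):
        if n > 0:
            anc = rng.choice(N, size=N, replace=True, p=W)   # resample the whole pair (x, theta)
            x, theta = x[anc], theta[anc]
            x = rng.normal(theta * x, np.sqrt(q))            # propagate x; theta is copied unchanged
        log_w = norm.logpdf(y[n], x, np.sqrt(r))             # weight with g(y_n | x_n)
        w = np.exp(log_w - log_w.max())
        W = w / w.sum()
    return np.sum(W * x), np.sum(W * theta)     # estimates of E[x_T | y_{1:T}] and E[theta | y_{1:T}]
```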


Maximum Likelihood Parameter Estimation
• Standard approaches to parameter estimation consist of computing the maximum likelihood (ML) estimate
$$\theta_{ML} = \arg\max_\theta\, \log p_\theta(y_{1:n})$$
• The likelihood function can be multimodal and there is no guarantee of finding its global optimum.
• Standard (stochastic) gradient algorithms can be used (e.g. based on Fisher's identity) to find a local optimum:
$$\nabla_\theta \log p_\theta(y_{1:n}) = \int \nabla_\theta \log p_\theta(x_{1:n}, y_{1:n})\,p_\theta(x_{1:n} \mid y_{1:n})\,dx_{1:n}$$
• These algorithms can work decently, but it can be difficult to scale the components of the gradients.
• Note that these algorithms involve computing $p_\theta(x_{1:n} \mid y_{1:n})$, which is the key result of the SMC algorithm.

Expectation/Maximization for HMM
• One can also use the EM algorithm:
$$\theta^{(i)} = \arg\max_\theta\, Q\big(\theta^{(i-1)}, \theta\big)$$
$$Q\big(\theta^{(i-1)}, \theta\big) = \int \log p_\theta(x_{1:n}, y_{1:n})\,p_{\theta^{(i-1)}}(x_{1:n} \mid y_{1:n})\,dx_{1:n}
= \int \log\big[\mu_\theta(x_1)\,g_\theta(y_1 \mid x_1)\big]\,p_{\theta^{(i-1)}}(x_1 \mid y_{1:n})\,dx_1
+ \sum_{k=2}^{n}\int \log\big[f_\theta(x_k \mid x_{k-1})\,g_\theta(y_k \mid x_k)\big]\,p_{\theta^{(i-1)}}(x_{k-1:k} \mid y_{1:n})\,dx_{k-1:k}$$
• Above we used:
$$p_\theta(x_{1:n}, y_{1:n}) = \mu_\theta(x_1)\prod_{k=2}^{n} f_\theta(x_k \mid x_{k-1})\prod_{k=1}^{n} g_\theta(y_k \mid x_k)$$
• Implementation of the EM algorithm requires computing expectations with respect to the smoothing distributions $p_{\theta^{(i-1)}}(x_{k-1:k} \mid y_{1:n})$.


Gaussian Process SSM
• The Gaussian process (GP) is a non-parametric probabilistic model for nonlinear functions.
• Consider an SSM of the form:
$$X_n = f(X_{n-1}) + V_n, \quad \text{with } f(x) \sim \mathcal{GP}\big(0, k_{\eta,f}(x, x')\big),$$
$$Y_n = g(X_n) + E_n, \quad \text{with } g(x) \sim \mathcal{GP}\big(0, k_{\eta,g}(x, x')\big)$$
• The model functions $f$ and $g$ are assumed to be realizations from Gaussian process priors, and $V_n \sim \mathcal{N}(0, Q)$, $E_n \sim \mathcal{N}(0, R)$.
• The inference task becomes the calculation of the joint posterior $p(f, g, Q, R, \eta, x_{1:n} \mid y_{1:n})$.
• Roger Frigola, Fredrik Lindsten, Thomas B. Schön, and Carl Rasmussen. Bayesian inference and learning in Gaussian process state-space models with particle MCMC. NIPS, 2013.
• Andreas Svensson and Thomas B. Schön. A flexible state space model for learning nonlinear dynamical systems. Automatica, 80:189-199, June 2017.
