Introduction to State Space Models and Sequential Bayesian Inference
Prof. Nicholas Zabaras
Email: [email protected]
URL: https://www.zabaras.com/
INTRODUCING THE STATE SPACE MODEL: Discrete-Time Markov Models, the Tracking
Problem, Speech Enhancement, the Dynamics of an Asset, the State Space Model with
Observations, the Linear Gaussian SSM (LG-SSM), Stochastic Volatility, Bearings-Only
Tracking, Probabilistic Programming and SSMs, Bayesian Inference Tasks for the SSM
J.S. Liu, Monte Carlo Strategies in Scientific Computing, Chapter 3, Springer-Verlag, New York
A. Doucet, N. de Freitas and N. Gordon (eds.), Sequential Monte Carlo Methods in Practice, Springer-Verlag, 2001
A. Doucet, N. de Freitas and N.J. Gordon, An Introduction to Sequential Monte Carlo Methods, in Sequential Monte Carlo Methods in Practice, 2001
J.S. Liu and R. Chen, Sequential Monte Carlo Methods for Dynamic Systems, JASA, 1998
P. Del Moral, Feynman-Kac Models and Interacting Particle Systems (SMC resources)
N. de Freitas and A. Doucet, Sequential Monte Carlo Methods, Video Lectures, 2010
Statistical Computing and Machine Learning, Fall 2020, N. Zabaras 4
References
M.K. Pitt and N. Shephard, Filtering via Simulation: Auxiliary Particle Filters, JASA, 1999
A. Doucet, S.J. Godsill and C. Andrieu, On Sequential Monte Carlo Sampling Methods for Bayesian Filtering, Statistics and Computing, 2000
J. Carpenter, P. Clifford and P. Fearnhead, An Improved Particle Filter for Non-linear Problems, IEE Proceedings, 1999
A. Kong, J.S. Liu and W.H. Wong, Sequential Imputations and Bayesian Missing Data Problems, JASA, 1994
O. Cappe, E. Moulines and T. Ryden, Inference in Hidden Markov Models, Springer-Verlag, 2005
W. Gilks and C. Berzuini, Following a Moving Target: Monte Carlo Inference for Dynamic Bayesian Models, JRSS B, 2001
G. Poyadjis, A. Doucet and S.S. Singh, Maximum Likelihood Parameter Estimation using Particle Methods, Joint Statistical Meeting, 2005
N. Gordon, D.J. Salmond and A.F.M. Smith, Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation, IEE Proceedings-F, 1993
R. Chen and J.S. Liu, Predictive Updating Methods with Application to Bayesian Classification, JRSS B, 1996
N. Kantas, A. Doucet, S.S. Singh and J.M. Maciejowski, An Overview of Sequential Monte Carlo Methods for Parameter Estimation in General State-Space Models, Proceedings IFAC System Identification (SySid) Meeting, 2009
C. Andrieu, A. Doucet and R. Holenstein, Particle Markov Chain Monte Carlo Methods (with discussion), JRSS B, 2010
C. Andrieu, N. de Freitas and A. Doucet, Sequential MCMC for Bayesian Model Selection, Proc. IEEE Workshop HOS, 1999
G. Storvik, Particle Filters for State-Space Models in the Presence of Unknown Static Parameters, IEEE Trans. Signal Processing, 2002
G. Poyadjis, A. Doucet and S.S. Singh, Particle Approximations of the Score and Observed Information Matrix in State-Space Models with Application to Parameter Estimation, Biometrika, 2011
C. Caron, R. Gottardo and A. Doucet, On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis, Statistics and Computing, 2011
R. Martinez-Cantin, J. Castellanos and N. de Freitas, Analysis of Particle Methods for Simultaneous Robot Localization and Mapping and a New Algorithm: Marginal-SLAM, International Conference on Robotics and Automation
A. Doucet, Sequential Monte Carlo Methods and Particle Filters: List of Papers, Codes, and Video Lectures on SMC and Particle Filters
P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006
P. Del Moral, A. Doucet and S.S. Singh, Forward Smoothing using Sequential Monte Carlo, Technical Report, Cambridge University, 2009
A. Doucet and A. Johansen, Particle Filtering and Smoothing: Fifteen Years Later, in Handbook of Nonlinear Filtering (eds. D. Crisan and B. Rozovsky), Oxford University Press, 2011
A. Johansen and A. Doucet, A Note on Auxiliary Particle Filters, Statistics and Probability Letters, 2008
A. Doucet, M. Briers and S. Senecal, Efficient Block Sampling Strategies for Sequential Monte Carlo, JCGS, 2006
F. Lindsten, M.I. Jordan and T.B. Schön, Particle Gibbs with Ancestor Sampling, JMLR, 2014
$$X_n \mid (X_{n-1} = x) \sim f(\cdot \mid x)$$
$$p(\boldsymbol{x}_{1:n}) = p(x_1, \ldots, x_n) = \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1})$$
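As a concrete sketch of drawing a path $x_{1:n}$ from this prior, the snippet below assumes the linear-Gaussian choices $\mu = \mathcal{N}(0, \sigma^2/(1-\phi^2))$ and $f(x_k \mid x_{k-1}) = \mathcal{N}(\phi x_{k-1}, \sigma^2)$; these particular densities (and the helper name) are illustrative assumptions, not part of the general definition.

```python
import numpy as np

def sample_markov_path(n, phi=0.9, sigma=1.0, rng=None):
    """Draw x_{1:n} with x_1 ~ mu and x_k | x_{k-1} ~ f(. | x_{k-1}).

    Illustrative choices: mu = N(0, sigma^2/(1-phi^2)) (the stationary law)
    and f(x_k | x_{k-1}) = N(phi * x_{k-1}, sigma^2).
    """
    rng = np.random.default_rng(rng)
    x = np.empty(n)
    x[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - phi**2))  # x_1 ~ mu
    for k in range(1, n):
        x[k] = phi * x[k - 1] + sigma * rng.normal()       # Markov propagation
    return x

path = sample_markov_path(100, rng=0)
```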
$$A = \begin{pmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_d \\ 1 & & & 0 \\ & \ddots & & \vdots \\ & & 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
The transition density is now:
$$f_U(u_k \mid u_{k-1}) = \mathcal{N}\!\left(u_{k,1};\, A_k u_{k-1},\, \sigma_s^2\right)\, \delta_{(u_{k-1})_{1:d-1}}\!\left(u_{k,2:d}\right)$$
$$f_\alpha(\alpha_k \mid \alpha_{k-1}) = \mathcal{N}(\alpha_k;\, \alpha_{k-1},\, \sigma^2 I_d)$$
The process $X_k = (\alpha_k, U_k)$ is Markov with transition density
$$f(x_k \mid x_{k-1}) = \mathcal{N}(\alpha_k;\, \alpha_{k-1},\, \sigma^2 I_d)\; \mathcal{N}\!\left(u_{k,1};\, A_k u_{k-1},\, \sigma_s^2\right)\, \delta_{(u_{k-1})_{1:d-1}}\!\left(u_{k,2:d}\right)$$
with
$$A_k u_{k-1} = (\alpha_{k,1}, \ldots, \alpha_{k,d}) \begin{pmatrix} s_{k-1} \\ \vdots \\ s_{k-d} \end{pmatrix}$$
Econometrics
The Heston model (1993) describes the dynamics of an asset price 𝑆𝑡 through a
stochastic volatility model for the log-price 𝑋𝑡 = log(𝑆𝑡).
$$Y_n \mid (X_n = x_n) \sim g(\cdot \mid x_n)$$
The observations $\{y_n\}$ are conditionally independent given the Markov states
$\{x_n\}$. Thus the likelihood is
$$p(y_1, \ldots, y_n \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} g(y_i \mid x_i)$$
$$X_n = A X_{n-1} + B u_n + V_n$$
$$Y_n = C X_n + D u_n + E_n$$
$$\begin{pmatrix} X_0 \\ V_n \\ E_n \end{pmatrix} \sim \mathcal{N}\!\left(\begin{pmatrix} \mu \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} P_0 & 0 & 0 \\ 0 & Q & S \\ 0 & S^T & R \end{pmatrix}\right)$$
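A minimal simulation sketch of such an LG-SSM, assuming for simplicity that $S = 0$ (uncorrelated state and observation noise); the function name and the scalar example values are illustrative assumptions.

```python
import numpy as np

def simulate_lgssm(n, A, B, C, D, mu0, P0, Q, R, u=None, rng=None):
    """Simulate X_t = A X_{t-1} + B u_t + V_t, Y_t = C X_t + D u_t + E_t
    with X_0 ~ N(mu0, P0), V_t ~ N(0, Q), E_t ~ N(0, R) (here S = 0)."""
    rng = np.random.default_rng(rng)
    dx, dy = A.shape[0], C.shape[0]
    if u is None:
        u = np.zeros((n, B.shape[1]))         # no exogenous input by default
    X, Y = np.empty((n, dx)), np.empty((n, dy))
    x = rng.multivariate_normal(mu0, P0)      # X_0
    for t in range(n):
        x = A @ x + B @ u[t] + rng.multivariate_normal(np.zeros(dx), Q)
        Y[t] = C @ x + D @ u[t] + rng.multivariate_normal(np.zeros(dy), R)
        X[t] = x
    return X, Y

# Scalar example: a stable AR(1) state observed in Gaussian noise.
A = np.array([[0.95]]); B = np.zeros((1, 1))
C = np.array([[1.0]]);  D = np.zeros((1, 1))
X, Y = simulate_lgssm(50, A, B, C, D, mu0=np.zeros(1), P0=np.eye(1),
                      Q=0.1 * np.eye(1), R=0.5 * np.eye(1), rng=0)
```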
A Stochastic Volatility Model
$$X_1 \sim \mathcal{N}\!\left(0, \frac{\sigma^2}{1-\alpha^2}\right) \quad \text{and} \quad X_n = \alpha X_{n-1} + V_n$$
$$Y_n = \beta \exp\left(X_n/2\right) W_n, \quad \text{where} \quad |\alpha| < 1, \quad V_n \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2) \quad \text{and} \quad W_n \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1)$$
so that
$$f(x_n \mid x_{n-1}) = \mathcal{N}(x_n;\, \alpha x_{n-1},\, \sigma^2), \qquad g(y_n \mid x_n) = \mathcal{N}(y_n;\, 0,\, \beta^2 \exp(x_n))$$
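A short simulation sketch of this stochastic volatility model; the parameter values and the helper name `simulate_sv` are illustrative assumptions.

```python
import numpy as np

def simulate_sv(n, alpha=0.91, sigma=1.0, beta=0.5, rng=None):
    """Simulate X_1 ~ N(0, sigma^2/(1-alpha^2)), X_k = alpha X_{k-1} + V_k,
    Y_k = beta exp(X_k / 2) W_k with V_k ~ N(0, sigma^2), W_k ~ N(0, 1)."""
    rng = np.random.default_rng(rng)
    x = np.empty(n)
    x[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - alpha**2))  # stationary start
    for k in range(1, n):
        x[k] = alpha * x[k - 1] + sigma * rng.normal()       # latent log-volatility
    y = beta * np.exp(x / 2.0) * rng.normal(size=n)          # observation equation
    return x, y

x, y = simulate_sv(200, rng=1)
```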
Probabilistic programming creates a clear separation between the model and the inference
methods and opens up the automation of inference.
L. Murray and T.B. Schön, Automated learning with a probabilistic programming language: Birch, Annual Reviews in Control, 46:29-43, 2018
Z. Ghahramani, Probabilistic machine learning and artificial intelligence, Nature, 521:452-459, 2015
N.D. Goodman and A. Stuhlmüller, The design and implementation of probabilistic programming languages, retrieved 2019-8-29 from http://dippl.org
J.-W. van de Meent et al., An introduction to probabilistic programming, arXiv preprint arXiv:1809.10756, 2018
Bayesian Inference for the SSM
At time 𝑛, we have a total of 𝑛 observations and the target distribution to be
estimated is the posterior 𝑝 𝒙1:𝑛 |𝒚1:𝑛 .
The target distribution is “time-varying”: the posterior must be updated as new
observations arrive, so we need to estimate a sequence of distributions indexed by
time.
$$y_1, y_2, \ldots, y_n \quad \text{(observations)}$$
$$x_1, x_2, \ldots, x_n \quad \text{(Markov chain)}$$
$$\pi_n(\boldsymbol{x}_{1:n}) \equiv p(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n}) = \frac{\gamma_n(\boldsymbol{x}_{1:n})}{Z_n}, \qquad \gamma_n(\boldsymbol{x}_{1:n}) = p(\boldsymbol{x}_{1:n}, \boldsymbol{y}_{1:n}), \qquad Z_n = p(\boldsymbol{y}_{1:n})$$
The posterior and marginal likelihood do not admit closed forms unless the state space
is finite or the model is linear Gaussian.
The posterior mean (minimum mean square estimate) can also be estimated as:
$$\mathbb{E}\left[X_k \mid \boldsymbol{y}_{1:n}\right] = \int x_k\, p(x_k \mid \boldsymbol{y}_{1:n})\, dx_k$$
Let 𝑇 be the time at which the particle is killed. We want to compute the
probability Pr(𝑇 > 𝑛).
At time 1, the distribution of the particle conditioned on survival is
$$\pi_1(x_1) = \frac{\mu(x_1)\, g(x_1)}{\int \mu(x_1)\, g(x_1)\, dx_1}$$
By integration over the state variables $x_k$, we obtain the probability for the
particle to survive up to time $t = n$:
$$\Pr(T > n) = \int \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1}) \prod_{k=1}^{n} g(x_k)\, d\boldsymbol{x}_{1:n}$$
To place this calculation in our SMC framework, we define the following:
$$\gamma_n(\boldsymbol{x}_{1:n}) = \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1}) \prod_{k=1}^{n} g(x_k)$$
Then the integration needed to compute the required probability is just the
normalization constant of 𝛾𝑛 𝒙1:𝑛 , i.e.
$$Z_n = \int \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1}) \prod_{k=1}^{n} g(x_k)\, d\boldsymbol{x}_{1:n}$$
and
$$\pi_n(\boldsymbol{x}_{1:n}) = \frac{\gamma_n(\boldsymbol{x}_{1:n})}{Z_n}, \qquad Z_n = \Pr(T > n)$$
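Before introducing SMC, $Z_n = \Pr(T > n)$ can be estimated by naive Monte Carlo: simulate paths of the chain and average the product of the per-step survival probabilities $g(x_k)$. The sketch below assumes a Gaussian random-walk transition and the potential $g(x) = \exp(-x^2/2)$; both are illustrative choices, as the slides leave $\mu$, $f$ and $g$ generic.

```python
import numpy as np

def survival_prob_naive(n, num_paths=100_000, sigma=1.0, rng=None):
    """Naive Monte Carlo estimate of Z_n = Pr(T > n) for a Gaussian random
    walk killed at each step with probability 1 - g(x_k), using the
    illustrative potential g(x) = exp(-x^2 / 2)."""
    rng = np.random.default_rng(rng)
    steps = rng.normal(0.0, sigma, size=(num_paths, n))
    x = np.cumsum(steps, axis=1)          # x_k: random-walk positions
    g = np.exp(-0.5 * x**2)               # per-step survival probabilities
    return np.prod(g, axis=1).mean()      # E[ prod_k g(x_k) ] = Pr(T > n)

z5 = survival_prob_naive(5, rng=0)
```

Note how the estimate decays quickly with $n$: this inefficiency of naive sampling is exactly what the SMC framework below addresses.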
Closed Form Inference in HMM
We have closed-form solutions for finite state-space HMMs, as all integrals become
finite sums.
For linear Gaussian models (LG-SSM), all the posterior distributions are
Gaussian (Kalman filter).
We assume that $\pi(\boldsymbol{x}) = \dfrac{\gamma(\boldsymbol{x})}{Z}$, where $Z = \int \gamma(\boldsymbol{x})\, d\boldsymbol{x}$ is unknown and $\gamma(\boldsymbol{x})$ is known
pointwise.
The basic idea in Monte Carlo methods is to sample $N$ i.i.d. random variables
$X^{(i)} \overset{\text{i.i.d.}}{\sim} \pi(\cdot)$ and build the empirical measure
$$\hat{\pi}(\boldsymbol{x})\, d\boldsymbol{x} = \frac{1}{N} \sum_{i=1}^{N} \delta_{X^{(i)}}(d\boldsymbol{x})$$
Using this: $\mathbb{E}_{\hat{\pi}}[f(\boldsymbol{x})] = \frac{1}{N}\sum_{i=1}^{N} f(X^{(i)})$, where $X^{(i)} \overset{\text{i.i.d.}}{\sim} \pi(\cdot)$.
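A one-line numerical check of this empirical-measure approximation, taking $\pi = \mathcal{N}(0,1)$ and $f(x) = x^2$ (so the exact value is $\mathbb{E}_\pi[f] = 1$); both choices are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
X = rng.normal(size=N)        # i.i.d. draws from pi = N(0, 1)
f_hat = np.mean(X**2)         # empirical-measure estimate of E_pi[x^2] = 1
```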
J.S. Liu, Monte Carlo Strategies in Scientific Computing, Chapter 3, Springer-Verlag, New York.
Similarly, we can approximate the normalization factor of our target
distribution as follows:
$$\hat{Z} = \int \frac{\gamma(\boldsymbol{x})}{q(\boldsymbol{x})}\, \hat{q}(\boldsymbol{x})\, d\boldsymbol{x} = \int w(\boldsymbol{x})\, \hat{q}(\boldsymbol{x})\, d\boldsymbol{x} = \frac{1}{N} \sum_{i=1}^{N} w(X^{(i)}) = \frac{1}{N} \sum_{i=1}^{N} \frac{\gamma(X^{(i)})}{q(X^{(i)})}$$
$$\hat{\pi}(\boldsymbol{x})\, d\boldsymbol{x} = \sum_{i=1}^{N} W^{(i)}\, \delta_{X^{(i)}}(d\boldsymbol{x}), \quad \text{where} \quad W^{(i)} \propto w(X^{(i)}) \quad \text{and} \quad \sum_{i=1}^{N} W^{(i)} = 1$$
$$\mathbb{E}_{\hat{\pi}}[f(\boldsymbol{x})] = \int f(\boldsymbol{x})\, \hat{\pi}(\boldsymbol{x})\, d\boldsymbol{x} = \sum_{i=1}^{N} f(X^{(i)})\, W^{(i)}$$
$$w(\boldsymbol{x}) = \frac{\gamma(\boldsymbol{x})}{q(\boldsymbol{x})}$$
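A self-contained sketch of self-normalized importance sampling with these quantities, using the unnormalized target $\gamma(x) = \exp(-x^2/2)$ (so $Z = \sqrt{2\pi}$ and $\pi = \mathcal{N}(0,1)$) and the heavier-tailed proposal $q = \mathcal{N}(0, 2^2)$; both densities are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

gamma = lambda x: np.exp(-0.5 * x**2)        # unnormalized target; Z = sqrt(2*pi)

s = 2.0                                      # proposal q = N(0, s^2): heavier tails
X = rng.normal(0.0, s, size=N)
q = np.exp(-0.5 * (X / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

w = gamma(X) / q                             # unnormalized weights w(x) = gamma(x)/q(x)
Z_hat = w.mean()                             # estimates Z = sqrt(2*pi) ~ 2.5066
W = w / w.sum()                              # normalized weights W^(i)
mean_hat = np.sum(W * X)                     # estimates E_pi[x] = 0
```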
This is equivalent to saying that $q(\boldsymbol{x})$ should have heavier tails than $\pi(\boldsymbol{x})$.
The estimate is unbiased and its variance gives the following convergence
properties:
$$\mathrm{Var}_{X^{(i)}_{1:n}}\!\left[\mathbb{E}_{\hat{p}_N(\boldsymbol{x}_{1:n}|\boldsymbol{y}_{1:n})}[\varphi]\right] = \frac{1}{N}\, \mathrm{Var}_{p(\boldsymbol{x}_{1:n}|\boldsymbol{y}_{1:n})}[\varphi]$$
$$\sqrt{N}\left(\mathbb{E}_{\hat{p}_N(\boldsymbol{x}_{1:n}|\boldsymbol{y}_{1:n})}[\varphi] - \mathbb{E}_{p(\boldsymbol{x}_{1:n}|\boldsymbol{y}_{1:n})}[\varphi]\right) \xrightarrow{d} \mathcal{N}\!\left(0,\, \mathrm{Var}_{p(\boldsymbol{x}_{1:n}|\boldsymbol{y}_{1:n})}[\varphi]\right)$$
The rate of convergence is independent of 𝑛. This does not imply that Monte
Carlo beats the curse of dimensionality of 𝒳 𝑛, since it is possible that
𝑉𝑎𝑟𝑝 𝒙1:𝑛 |𝒚1:𝑛 𝜑 increases with time 𝑛.
The marginal posterior $p(x_k \mid \boldsymbol{y}_{1:n})$ is approximated by integrating the empirical
measure over the remaining states:
$$\hat{p}_N(x_k \mid \boldsymbol{y}_{1:n}) = \int_{\mathcal{X}^{n-1}} \frac{1}{N} \sum_{i=1}^{N} \delta_{X^{(i)}_{1:n}}(\boldsymbol{x}_{1:n})\, d\boldsymbol{x}_{1:k-1}\, d\boldsymbol{x}_{k+1:n} = \frac{1}{N} \sum_{i=1}^{N} \delta_{X^{(i)}_{k}}(x_k)$$
Note that the marginal likelihood $p(\boldsymbol{y}_{1:n})$ cannot be estimated as easily using
$X^{(i)}_{1:n} \sim p(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})$.
MCMC methods are not useful in this context (they are not recursive).
SMC methods partially solve both problems by breaking the sampling from 𝑝 𝒙1:𝑛 |𝒚1:𝑛
into a collection of simpler subproblems: first approximate 𝑝 𝑥1 |𝑦1 and 𝑝 𝑦1
at time 1, then 𝑝 𝒙1:2 |𝒚1:2 and 𝑝 𝒚1:2 at time 2, and so on.
The support of $q(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})$ includes the support of $p(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})$, i.e.
$$p(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n}) > 0 \;\Rightarrow\; q(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n}) > 0$$
We use the following identity:
$$p(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n}) = \frac{p(\boldsymbol{x}_{1:n}, \boldsymbol{y}_{1:n})}{\int p(\boldsymbol{x}_{1:n}, \boldsymbol{y}_{1:n})\, d\boldsymbol{x}_{1:n}} = \frac{\dfrac{p(\boldsymbol{x}_{1:n}, \boldsymbol{y}_{1:n})}{q(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})}\, q(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})}{\int \dfrac{p(\boldsymbol{x}_{1:n}, \boldsymbol{y}_{1:n})}{q(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})}\, q(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})\, d\boldsymbol{x}_{1:n}} = \frac{w(\boldsymbol{x}_{1:n}, \boldsymbol{y}_{1:n})\, q(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})}{\int w(\boldsymbol{x}_{1:n}, \boldsymbol{y}_{1:n})\, q(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})\, d\boldsymbol{x}_{1:n}}$$
Importance Sampling for the State Space Model
Let us draw $N$ samples from our importance distribution:
$$X^{(i)}_{1:n} \sim q(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n}), \qquad \hat{q}_N(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{X^{(i)}_{1:n}}(\boldsymbol{x}_{1:n})$$
Then, using the identity on the earlier slide, we obtain the following approximation of our target
distribution:
$$\hat{p}_N(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n}) = \frac{w(\boldsymbol{x}_{1:n}, \boldsymbol{y}_{1:n})\, \hat{q}_N(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})}{\int w(\boldsymbol{x}_{1:n}, \boldsymbol{y}_{1:n})\, \hat{q}_N(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})\, d\boldsymbol{x}_{1:n}} = \frac{w(\boldsymbol{x}_{1:n}, \boldsymbol{y}_{1:n})\, \frac{1}{N} \sum_{i=1}^{N} \delta_{X^{(i)}_{1:n}}(\boldsymbol{x}_{1:n})}{\int w(\boldsymbol{x}_{1:n}, \boldsymbol{y}_{1:n})\, \frac{1}{N} \sum_{i=1}^{N} \delta_{X^{(i)}_{1:n}}(\boldsymbol{x}_{1:n})\, d\boldsymbol{x}_{1:n}}$$
$$= \sum_{i=1}^{N} W^{(i)}_n\, \delta_{X^{(i)}_{1:n}}(\boldsymbol{x}_{1:n}), \qquad W^{(i)}_n = \frac{w(X^{(i)}_{1:n}, \boldsymbol{y}_{1:n})}{\sum_{j=1}^{N} w(X^{(j)}_{1:n}, \boldsymbol{y}_{1:n})}$$
Note that:
$$\hat{Z}_n \equiv \hat{p}_N(\boldsymbol{y}_{1:n}) = \int w(\boldsymbol{x}_{1:n}, \boldsymbol{y}_{1:n})\, \frac{1}{N} \sum_{i=1}^{N} \delta_{X^{(i)}_{1:n}}(\boldsymbol{x}_{1:n})\, d\boldsymbol{x}_{1:n} = \frac{1}{N} \sum_{i=1}^{N} w(X^{(i)}_{1:n}, \boldsymbol{y}_{1:n})$$
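A sketch of these formulas on the stochastic volatility model from earlier, taking the prior $p(\boldsymbol{x}_{1:n})$ as the proposal $q$ so that $w(\boldsymbol{x}_{1:n}, \boldsymbol{y}_{1:n}) = \prod_k g(y_k \mid x_k)$. Parameter values and the toy observation sequence are illustrative assumptions, and the weights are computed in log-space for numerical stability.

```python
import numpy as np

def is_ssm_prior(y, num_particles=5000, alpha=0.91, sigma=1.0, beta=0.5, rng=None):
    """Importance sampling for the stochastic volatility SSM using the prior
    p(x_{1:n}) as proposal, so w(x_{1:n}, y_{1:n}) = prod_k g(y_k | x_k)."""
    rng = np.random.default_rng(rng)
    n = len(y)
    # Sample N paths X_{1:n}^(i) from the prior (the proposal q) ...
    x = np.empty((num_particles, n))
    x[:, 0] = rng.normal(0.0, sigma / np.sqrt(1.0 - alpha**2), size=num_particles)
    for k in range(1, n):
        x[:, k] = alpha * x[:, k - 1] + sigma * rng.normal(size=num_particles)
    # ... and weight each whole path by its likelihood, g(y_k|x_k) = N(y_k; 0, beta^2 e^{x_k}).
    var = beta**2 * np.exp(x)
    logw = np.sum(-0.5 * np.log(2.0 * np.pi * var) - 0.5 * y**2 / var, axis=1)
    logw -= logw.max()                 # guard against underflow before exponentiating
    w = np.exp(logw)
    W = w / w.sum()                    # normalized weights W_n^(i)
    return x, W

y = np.array([0.3, -0.5, 0.1, 0.8, -0.2])   # toy observations, purely illustrative
x, W = is_ssm_prior(y, rng=0)
```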
Normalized Weights in Importance Sampling
The unnormalized weights were defined as follows:
$$w(\boldsymbol{x}_{1:n}, \boldsymbol{y}_{1:n}) = \frac{p(\boldsymbol{x}_{1:n}, \boldsymbol{y}_{1:n})}{q(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})} = p(\boldsymbol{y}_{1:n})\, \underbrace{\frac{p(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})}{q(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})}}_{\substack{\text{discrepancy between target}\\ \text{and importance distribution}}}$$
The relative variance of $\hat{Z}_n$ is
$$\frac{\mathrm{Var}[\hat{Z}_n]}{Z_n^2} = \frac{1}{N}\left[\int \frac{p^2(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})}{q(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})}\, d\boldsymbol{x}_{1:n} - 1\right]$$
You can bring this variance to zero (variance of the unnormalized weights equal to zero) with
the selection $q(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n}) = p(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})$.
Of course this is what we wanted to avoid (we want to sample from an easier distribution).
However, this result points to the fact that the choice of 𝑞 needs to be as close as possible to
the target distribution.
This is a biased estimate for finite $N$, and we have shown that for importance sampling:
$$\lim_{N \to \infty} N\left(\mathbb{E}_{\hat{p}_N(\boldsymbol{x}_{1:n}|\boldsymbol{y}_{1:n})}[\varphi] - \mathbb{E}_{p(\boldsymbol{x}_{1:n}|\boldsymbol{y}_{1:n})}[\varphi]\right) = -\int \frac{p^2(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})}{q(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})} \left(\varphi(\boldsymbol{x}_{1:n}) - \mathbb{E}_{p(\boldsymbol{x}_{1:n}|\boldsymbol{y}_{1:n})}[\varphi]\right) d\boldsymbol{x}_{1:n}$$
$$\sqrt{N}\left(\mathbb{E}_{\hat{p}_N(\boldsymbol{x}_{1:n}|\boldsymbol{y}_{1:n})}[\varphi] - \mathbb{E}_{p(\boldsymbol{x}_{1:n}|\boldsymbol{y}_{1:n})}[\varphi]\right) \xrightarrow{d} \mathcal{N}\!\left(0,\; \int \frac{p^2(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})}{q(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})} \left(\varphi(\boldsymbol{x}_{1:n}) - \mathbb{E}_{p(\boldsymbol{x}_{1:n}|\boldsymbol{y}_{1:n})}[\varphi]\right)^2 d\boldsymbol{x}_{1:n}\right)$$
The asymptotic bias is of order $1/N$ (negligible), so the MSE, being the squared bias plus the
variance, is dominated by the $O(1/N)$ variance term.
The optimal distribution for estimating 𝜑 𝒙1:𝑛−1 will almost certainly not even be
similar to the marginal distribution of 𝒙1:𝑛−1 under the optimal distribution for
estimating 𝜑 𝒙1:𝑛 , and this will prove to be problematic.
Selection of Importance Sampling Distribution
A more appropriate approach in this context is to attempt to select the 𝑞 𝒙1:𝑛 |𝒚1:𝑛 which
minimizes the variance of the importance weights (or, equivalently, the variance of 𝑍መ𝑛 ).
Clearly, this variance is minimized for 𝑞 𝒙1:𝑛 |𝒚1:𝑛 = 𝑝 𝒙1:𝑛 |𝒚1:𝑛 , in which case all the
unnormalized importance weights are equal and their variance is zero. We cannot do this, as
avoiding direct sampling from the posterior is the reason we used IS in the first place.
However, this simple result indicates that we should aim at selecting an IS distribution as
close as possible to the target.
As discussed before, the importance sampling distribution should be selected so that the
weights are bounded, or equivalently 𝑞 𝒙1:𝑛 |𝒚1:𝑛 has heavier tails than 𝑝 𝒙1:𝑛 |𝒚1:𝑛 .
Note that the selection of the importance sampling distribution needs to be not only such that it
covers the support of the target, but also a clever one for the particular problem of interest.
To assess the quality of the importance sampling approximation, note that for flat functions,
$$\frac{\text{Variance of IS estimate}}{\text{Variance of standard MC estimate}} \approx 1 + \mathrm{Var}_{q(\boldsymbol{x}_{1:n}|\boldsymbol{y}_{1:n})}\!\left[W(\boldsymbol{X}_{1:n} \mid \boldsymbol{y}_{1:n})\right]$$
This is often interpreted as the effective sample size ($N$ weighted samples from $q(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})$
are approximately equivalent to $M$ unweighted samples from $p(\boldsymbol{x}_{1:n} \mid \boldsymbol{y}_{1:n})$):
$$M = \frac{N}{1 + \mathrm{Var}_{q(\boldsymbol{x}_{1:n}|\boldsymbol{y}_{1:n})}\!\left[W(\boldsymbol{X}_{1:n} \mid \boldsymbol{y}_{1:n})\right]}$$
In practice the variance of the weights is estimated from the samples:
$$\mathrm{Var}_{q(\boldsymbol{x}_{1:n}|\boldsymbol{y}_{1:n})}\!\left[W(X^{(i)}_{1:n} \mid \boldsymbol{y}_{1:n})\right] \approx N \sum_{i=1}^{N} W^2(X^{(i)}_{1:n} \mid \boldsymbol{y}_{1:n}) - 1$$
We can clearly see from
$$ESS = \frac{N}{1 + \mathrm{Var}_{q(\boldsymbol{x}_{1:n}|\boldsymbol{y}_{1:n})}\!\left[W(\boldsymbol{X}_{1:n} \mid \boldsymbol{y}_{1:n})\right]}$$
that
$$1 \le ESS = \left(\sum_{i=1}^{N} \left(W_n^{(i)}\right)^2\right)^{-1} \le N$$
We can thus have anything from $ESS = 1$ (one weight equal to one and all others zero,
very inefficient) to $ESS = N$ (all weights equal to $1/N$, as in i.i.d. sampling from the target).
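The empirical ESS is a one-liner on normalized weights (a minimal sketch; the function name is illustrative):

```python
import numpy as np

def ess(W):
    """Effective sample size ESS = 1 / sum_i W_i^2 for normalized weights W,
    ranging from 1 (all mass on one sample) to N (uniform weights)."""
    W = np.asarray(W, dtype=float)
    return 1.0 / np.sum(W**2)

print(ess([0.25, 0.25, 0.25, 0.25]))  # uniform weights: ESS = N = 4.0
print(ess([1.0, 0.0, 0.0, 0.0]))      # degenerate weights: ESS = 1.0
```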