Exact Particle Filtering and Parameter Learning
Abstract
In this paper, we provide an exact particle filtering and parameter learning algorithm.
Our approach exactly samples from a particle approximation to the joint posterior
distribution of both parameters and latent states, thus avoiding the use of and the
degeneracies inherent to sequential importance sampling. Exact particle filtering
algorithms for pure state filtering are also provided. We illustrate the efficiency of our
approach by sequentially learning parameters and filtering states in two models. First,
we analyze a robust linear state space model with t-distributed errors in both the
observation and state equation. Second, we analyze a log-stochastic volatility model.
Using both simulated and actual stock index return data, we find that the algorithm
efficiently learns all of the parameters and states in both models.
∗ Johannes is at the Graduate School of Business, Columbia University, 3022 Broadway, NY, NY, 10027, [email protected]. Polson is at the Graduate School of Business, University of Chicago, 5807 S. Woodlawn, Chicago IL 60637, [email protected]. We thank Seung Yae for valuable research assistance.
1 Introduction
Sequential parameter learning and state filtering is a central problem in the statistical
analysis of state space models. State filtering has been extensively studied using the Kalman
filter, analytical approximations, and particle filtering methods; however, these methods
assume that any static model parameters are known. In practice, parameters are typically
unknown and filtered states are highly sensitive to parameter uncertainty. A complete
solution to the sequential inference problem delivers not only filtered state variables, but
also estimates of any unknown static model parameters.
This paper provides an exact particle filtering algorithm for sequentially filtering unobserved state variables, x_t, and learning unknown static parameters, θ, for a wide class of models. Our algorithm generates exact samples from a particle approximation, p^N(θ, x_t | y^t), to
the joint posterior distribution of parameters and states, p (θ, xt |y t ), where N is the number
of particles and y t = (y1 , ..., yt ) is the vector of observations up to time t. Our algorithm
is “optimal” in the sense that we provide exact draws from the particle approximation
to p(θ, x_t | y^t), thus avoiding the use of, and the inherent degeneracies associated with, importance sampling. The algorithm applies generally to nonlinear, non-Gaussian models
assuming a conditional sufficient statistics structure for the parameter posteriors.
The algorithm relies on three main insights. First, we track a triple consisting of
parameters, sufficient statistics, and states, denoted by (θ, s_t, x_t), as in Storvik (2002) and
Fearnhead (2002). Second, by tracking this triple, we can factorize the joint posterior
density via
p(θ, s_{t+1}, x_{t+1} | y^{t+1}) ∝ p(θ | s_{t+1}) p(s_{t+1} | x_{t+1}, y^{t+1}) p(x_{t+1} | y^{t+1}).   (1)
This representation suggests sampling the joint density via a marginalization procedure:
first update the states via the filtering distribution, p(x_{t+1} | y^{t+1}), then update
the sufficient statistics, s_{t+1}, given the data and updated state, and finally draw the
parameters via p(θ | s_{t+1}). Third, the key to operationalizing this factorization is generating
draws from the particle approximation to p (xt+1 |y t+1 ). We essentially follow this outline.
To do this, we use an alternative representation to express pN (xt+1 |y t+1 ) as a mixture dis-
tribution that can be directly sampled. Given samples from pN (xt+1 |y t+1 ), updating the
sufficient statistics and parameters is straightforward.
The key advantage to our algorithm is that it does not rely on sequential importance
sampling (SIS). SIS methods are popular and have dominated previous attempts to implement particle-based sequential learning algorithms. Importance sampling, however, suffers
from well known problems related to the compounding of approximation errors, which leads
to sample impoverishment and weight degeneracies. Since our algorithm exactly samples
from the particle distribution, it avoids the particle degeneracies of SIS algorithms.
To demonstrate the algorithm, we analyze in detail the class of models with linear
observation and state evolutions and non-Gaussian errors. This class includes robust spec-
ifications such as models with t, stable, and discrete mixtures of normals errors, as well
as dynamic discrete-choice models. In this class of models, the key to efficient inference
is to represent the errors as a scale mixture of normals, to introduce an auxiliary latent
scaling variable, and to use data augmentation. This scale mixture representation has been
extensively used to analyze the state and parameter smoothing via MCMC methods (see,
for example, Carlin and Polson (1991), Carlin, Polson, and Stoffer (1992), Carter and Kohn
(1994, 1996), and Shephard (1994)) and for pure state filtering using standard particle filtering (Gordon, Salmond, and Smith (1993)) and extensions such as the auxiliary particle filter (Pitt and Shephard (1999)) and the mixture Kalman filter (Chen and Liu (2000)).
Pure state filtering is a special case of our algorithm that arises when the static parameters are known.
In this case, our general algorithm simplifies and generates an exact algorithm for particle-
based state filtering. Again, this state filtering algorithm has the advantage that it does
not resort to sequential importance sampling (SIS) methods. This algorithm provides an
exact alternative to popular SIS algorithms that include the approach in Gordon, Salmond,
and Smith (1993) and extended in Pitt and Shephard (1999) and Chen and Liu (2000).
We illustrate our approach using two models. The first is a model with a latent autoregressive state process controlling the mean and t-distributed observation and state
equation errors, a robust version of the classic linear Gaussian state space model. In the
case of pure filtering, models with t-distributed errors in either (but not both) the state or
observation equation have been analyzed in depth using approximate filters; see, for example, Masreliez and Martin (1977), Meinhold and Singpurwalla (1987), West (1981), and Gordon
and Smith (1993). We also analyze a log-stochastic volatility model parameterized via a mixture
of normals error term as in Kim, Shephard, and Chib (1998). In both cases, we show that
the algorithm is able to accurately learn all of the parameters and state variables in both
simulated and real data examples. We view our algorithms as simulation based robust
extensions of the Kalman filter that handle both parameter and state learning in models
with non-normalities.
To date, algorithms for parameter learning and state filtering have achieved varying
degrees of success. Previous attempts include the particle filters in Liu and West (2001),
Storvik (2002), Chopin (2002, 2005), Doucet, and Tadic (2003), Johansen, Doucet, and
Davey (2006), Andrieu, Doucet, and Tadic (2006), and Johannes, Polson, and Stroud
(2005, 2006), the pure MCMC approach of Polson, Stroud, and Muller (2006), and the
hybrid approach of Berzuini, Best, Gilks, and Larizza (1997) and Del Moral, Doucet, and
Jasra (2006). Most of these algorithms have limited scope or difficulties even in standard
models. For example, Polson, Stroud, and Muller (2006) document that Storvik's algorithm
has difficulties handling outliers in an autoregressive model, while their MCMC approach
has difficulties estimating the volatility of volatility in a stochastic volatility model.
The rest of the paper is outlined as follows. Section 2 describes our general approach
and explains the updating mechanism. We discuss in detail the simple case of state filtering
and parameter learning in a linear Gaussian state space model and the special case of
pure filtering. We introduce latent auxiliary variables to transform non-normal models
into conditionally Gaussian models with a sufficient statistic structure. Section 3 provides
examples of the methodology in the case of t−distributed errors and a stochastic volatility
model using simulated and real data examples. Finally, Section 4 concludes.
We use a particle filtering approach to characterize p(θ, xt |y t ). Particle methods use a
discrete representation of p(θ, xt |y t )
p^N(θ, x_t | y^t) = (1/N) Σ_{i=1}^N δ_{(x_t, θ)^{(i)}},
where N is the number of particles and (xt , θ)(i) denotes the particle vector. As in the
case of pure state filtering, the particle approximation simplifies many of the hurdles that
are inherent to sequential problems. Liu and West (2001), Chopin (2002), Storvik (2002),
Andrieu, Doucet, and Tadic (2005), Johansen, Doucet, and Davey (2006), and Johannes,
Polson, and Stroud (2005, 2006) all use particle methods for sequential parameter learning.
Given the particle approximation, the key problem is how to jointly propagate the parameter and state particles. This step is complicated because the state propagation depends
on the parameters, and vice versa. To circumvent the codependence in a joint draw, it is
common to use importance sampling. This, however, can lead to particle degeneracies, as
the importance densities may not closely match the target densities. Degeneracies are also
apparent in hybrid MCMC schemes due to the long range dependence between the parameters and state variables. One essential key to breaking this dependence is to track a vector
of conditionally sufficient statistics, s_t, as in Storvik (2002) and Fearnhead (2002). We characterize p(θ, s_t, x_t | y^t) via a particle approximation and update the particles in three steps,
in which each component is sequentially updated. As we now show, this allows us to generate
an exact draw from p^N(θ, s_{t+1}, x_{t+1} | y^{t+1}), given existing samples from p^N(θ, s_t, x_t | y^t).
The sufficient statistic is a function of the random variables x_{t+1} and s_t, while y_{t+1}
is observed. Viewed at this level, our algorithm uses the common mechanism of expressing
a joint distribution as a product of conditional and marginal distributions. Our approach
essentially follows these steps, taking advantage of the mixture structure generated by a
discrete particle approximation pN (θ, st , xt |y t ). We now discuss the mechanics of each step.
We first express p (xt+1 |y t+1 ) relative to p (xt , θ|y t ), via
p(x_{t+1} | y^{t+1}) ∝ ∫ p(y_{t+1} | x_t, θ) p(x_{t+1} | x_t, θ, y_{t+1}) dp(x_t, θ | y^t).   (3)
This representation is somewhat nonstandard, and we discuss this issue further below in
Section 2.2. Given a particle approximation, pN (xt , θ|y t ), to the previous period’s posterior,
this implies that pN (xt+1 |y t+1 ) is given by
p^N(x_{t+1} | y^{t+1}) = ∫ p(y_{t+1} | x_t, θ) p(x_{t+1} | x_t, θ, y_{t+1}) dp^N(x_t, θ | y^t)   (4)
= Σ_{i=1}^N w((x_t, θ)^{(i)}) p(x_{t+1} | (x_t, θ)^{(i)}, y_{t+1}).   (5)
The distribution pN (xt+1 |y t+1 ) is a discrete mixture distribution, where w (xt , θ) are the
mixing probabilities and p (xt+1 |xt , θ, yt+1 ) is the conditional state distribution. Standard
simulation methods can now be applied to sample from pN (xt+1 |y t+1 ) by first resampling
the particle vector (θ, xt , st ):
(θ, x_t, s_t)^{(i)} ∼ Mult_N({w((x_t, θ)^{(i)})}_{i=1}^N).
6
The exact particle filtering and parameter learning algorithm is given by the following four steps.
––––––––––––––––––––––––––––––––––––—
Algorithm: Exact state filtering and parameter learning
Step 1: Draw (θ, x_t, s_t)^{(i)} ∼ Mult_N({w((x_t, θ)^{(i)})}_{i=1}^N) for i = 1, ..., N
Step 2: Draw x_{t+1}^{(i)} ∼ p(x_{t+1} | (x_t, θ)^{(i)}, y_{t+1}) for i = 1, ..., N
Step 3: Update sufficient statistics: s_{t+1}^{(i)} = S(s_t^{(i)}, x_{t+1}^{(i)}, y_{t+1}) for i = 1, ..., N
Step 4: Draw θ^{(i)} ∼ p(θ | s_{t+1}^{(i)}) for i = 1, ..., N.
––––––––––––––––––––––––––––––––––––—
From the representation in equation (2), the algorithm provides an exact draw from
pN (θ, xt+1 , st+1 |y t+1 ). Since there are fast algorithms to draw multinomial random variables
(see Carpenter, Clifford, and Fearnhead (1998)), the algorithm is O (N). For convergence
proofs as N increases, see Doucet, Godsill, and West (2004) in the state filtering case and
Hansen and Polson (2006) for the case with state filtering and parameter learning. As
with any Monte Carlo procedure, the choice of N will depend on the model, the dimension
of the state and parameter vectors, and T. In particular, to mitigate the accumulation of
approximation errors, increasing N with T is important for long datasets.
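The four steps above can be sketched in code. This is an illustrative Python sketch, not the authors' implementation: the callables `predictive`, `propagate`, `update_stats`, and `draw_theta` are hypothetical names standing in for the model-specific quantities p(y_{t+1} | x_t, θ), p(x_{t+1} | x_t, θ, y_{t+1}), the recursion S, and p(θ | s_{t+1}).

```python
import random

def exact_step(particles, y_next, predictive, propagate, update_stats, draw_theta):
    """One step of the four-step algorithm: resample on the predictive
    likelihood, propagate the state, update sufficient statistics, and
    draw new parameters.  `particles` is a list of (theta, s, x) triples."""
    # Step 1: resample triples with weights proportional to p(y_{t+1} | x_t, theta)
    weights = [predictive(y_next, x, theta) for (theta, s, x) in particles]
    resampled = random.choices(particles, weights=weights, k=len(particles))
    new_particles = []
    for theta, s, x in resampled:
        x_new = propagate(x, theta, y_next)       # Step 2: draw from p(x_{t+1}|x_t, theta, y_{t+1})
        s_new = update_stats(s, x_new, y_next)    # Step 3: s_{t+1} = S(s_t, x_{t+1}, y_{t+1})
        theta_new = draw_theta(s_new)             # Step 4: theta ~ p(theta | s_{t+1})
        new_particles.append((theta_new, s_new, x_new))
    return new_particles
```

Because the resampling step is an exact multinomial draw from the discrete mixture, the output particles carry equal weights.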
Discussion: The algorithm requires three ingredients: (1) a sufficient statistic structure for
the parameters, (2) an ability to evaluate p(y_{t+1} | x_t, θ), and (3) an ability to sample from
p (xt+1 |xt , θ, yt+1 ). In the next section, we use a linear Gaussian model as an example, as all
of these distributions are known. Section 2.2 shows how to tailor the algorithm to models
with discrete or continuous scale-mixture-of-normals error distributions. This modification
introduces auxiliary variables indexing the mixture components in the error distributions,
and generates a conditional sufficient statistic structure.
For nonlinear models, the only formal requirement is that there exists a conditional
sufficient statistic structure. The distribution p (yt+1 |xt , θ) can be computed in many mod-
els using, for example, accurate and efficient numerical integration schemes. Similarly, if
p(x_{t+1} | x_t, θ, y_{t+1}) cannot be directly sampled, indirect methods such as rejection sampling
or MCMC can be used, although the computational efficiency of these methods will depend
on the dimensionality of the distribution. In models for which these densities are
not known, sequential importance sampling can be used to approximate p(y_{t+1} | x_t, θ) and
p(x_{t+1} | x_t, θ, y_{t+1}). Johannes, Polson, and Stroud (2006) develop a general algorithm for
this case, and provide an example using an inherently nonlinear model.
For a concrete example, consider the latent autoregressive, AR(1), with noise model:
y_{t+1} = x_{t+1} + σ ε_{t+1}
x_{t+1} = α_x + β_x x_t + σ_x ε^x_{t+1},
where the shocks are independent standard normal random variables and θ = (α_x, β_x, σ_x², σ²).
We assume an initial state distribution, x_0 ∼ N(μ_0, σ_0²), and standard conjugate priors for
the parameters: σ² ∼ IG(a, A) and (α_x, β_x, σ_x²) ∼ NIG(b, B), where NIG denotes the
normal/inverse gamma distribution.
In order to implement our algorithm, we need the following quantities: the predictive
likelihood, the updated state distribution, the sufficient statistics, and the parameter posterior. The predictive likelihood used in the initial resampling step is
p(y_{t+1} | x_t, θ) ∼ N(α_x + β_x x_t, σ² + σ_x²),
and the updated state distribution is p(x_{t+1} | x_t, θ, y_{t+1}) ∼ N(μ_{t+1}, σ_{t+1}²), where
μ_{t+1}/σ_{t+1}² = y_{t+1}/σ² + (α_x + β_x x_t)/σ_x²   and   1/σ_{t+1}² = 1/σ² + 1/σ_x².
The form of p(x_{t+1} | x_t, θ, y_{t+1}) shows how sensitive the state updating is to the model parameters.
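The precision-weighted state update can be computed directly; here is a minimal sketch (the function name `state_update` is ours, not from the paper):

```python
def state_update(y_next, x_t, alpha_x, beta_x, sigma2, sigma2_x):
    """Moments of p(x_{t+1} | x_t, theta, y_{t+1}) in the linear Gaussian model:
    a precision-weighted combination of the observation y_{t+1} (precision
    1/sigma^2) and the prior mean alpha_x + beta_x * x_t (precision 1/sigma_x^2)."""
    prec = 1.0 / sigma2 + 1.0 / sigma2_x          # 1 / sigma_{t+1}^2
    var = 1.0 / prec
    mean = var * (y_next / sigma2 + (alpha_x + beta_x * x_t) / sigma2_x)
    return mean, var
```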
For the parameters and sufficient statistics, we re-write the state evolution as
x_{t+1} = Z_t′ β + σ_x ε^x_{t+1},
where Z_t = (1, x_t)′ and β = (α_x, β_x)′. To update the parameters, we note that the posterior
is given by
p(θ | s_t) = p(β | σ_x², s_t) p(σ² | s_t) p(σ_x² | s_t),
and we can update first the volatilities and then the regression coefficients. The conditional
posteriors are known and given by
p(σ² | s_{t+1}) ∼ IG(a_{t+1}, A_{t+1})
p(σ_x² | s_{t+1}) ∼ IG(b_{t+1}, B_{t+1})
p(β | σ_x², s_{t+1}) ∼ N(c_{t+1}, σ_x² C_{t+1}^{-1}),
where the vector of sufficient statistics, s_{t+1} = (A_{t+1}, B_{t+1}, c_{t+1}, C_{t+1}), is updated via
the functional recursions
A_{t+1} = A_t + (y_{t+1} − x_{t+1})²
B_{t+1} = B_t + c_t′ C_t c_t + x_{t+1}² − c_{t+1}′ C_{t+1} c_{t+1}
c_{t+1} = C_{t+1}^{-1} (C_t c_t + Z_{t+1} x_{t+1})
C_{t+1} = C_t + Z_{t+1} Z_{t+1}′.
The hyperparameters are deterministic and given by a_{t+1} = 1/2 + a_t and b_{t+1} = 1/2 + b_t.
The full algorithm consists of the following steps:
––––––––––––––––––––––––––––––––––––—
Algorithm: AR(1) model state filtering and parameter learning
Step 1: Draw (θ, s_t, x_t)^{(i)} ∼ Mult_N({w((x_t, θ)^{(i)})}_{i=1}^N) for i = 1, ..., N
Step 2: Draw x_{t+1}^{(i)} ∼ p(x_{t+1} | x_t^{(i)}, θ^{(i)}, y_{t+1}) for i = 1, ..., N
Step 3: Update s_{t+1}^{(i)} = S(s_t^{(i)}, x_{t+1}^{(i)}, y_{t+1}) for i = 1, ..., N
Step 4: Draw (σ²)^{(i)} ∼ IG(a_{t+1}, A_{t+1}^{(i)}), (σ_x²)^{(i)} ∼ IG(b_{t+1}, B_{t+1}^{(i)}), and β^{(i)} ∼ N(c_{t+1}^{(i)}, (σ_x²)^{(i)} (C_{t+1}^{-1})^{(i)}) for i = 1, ..., N.
––––––––––––––––––––––––––––––––––––—
This algorithm essentially provides a simulation based extension to the Kalman filter
that can also estimate the parameters. Johannes, Polson, and Stroud (2006) develop a
similar algorithm using a slightly different interacting particle systems approach. Johannes
and Polson (2006) provide extensions to the multivariate case, in which the observed vector
or the states are multivariate. These multivariate Gaussian state space models are used
extensively in the modeling of macroeconomic time series.
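Under the conjugate structure described above, one full step of the AR(1) algorithm might look as follows. This is an illustrative pure-Python sketch, not the authors' code: the particle layout and helper names are our own, and the 2x2 linear algebra is written out by hand to keep the example self-contained.

```python
import math
import random

def inv2(M):
    # inverse of a 2x2 matrix
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

def chol2(M):
    # Cholesky factor of a symmetric positive definite 2x2 matrix
    l11 = math.sqrt(M[0][0])
    l21 = M[1][0] / l11
    return [[l11, 0.0], [l21, math.sqrt(M[1][1] - l21 * l21)]]

def draw_inv_gamma(shape, scale):
    # X ~ IG(shape, scale) via X = 1 / Gamma(shape, scale=1/scale)
    return 1.0 / random.gammavariate(shape, 1.0 / scale)

def quad(v, M):
    # quadratic form v' M v
    return sum(v[i] * M[i][j] * v[j] for i in range(2) for j in range(2))

def ar1_step(particles, y_new):
    """One exact filtering/learning step for the AR(1)-plus-noise model.
    A particle is ((alpha, beta, s2, s2x), (a, A, b, B, c, C), x)."""
    def weight(p):
        (alpha, beta, s2, s2x), _, x = p
        m, v = alpha + beta * x, s2 + s2x
        # predictive likelihood N(y; alpha + beta x_t, sigma^2 + sigma_x^2)
        return math.exp(-0.5 * (y_new - m) ** 2 / v) / math.sqrt(v)
    # Step 1: resample (theta, s, x) triples with predictive weights
    res = random.choices(particles, weights=[weight(p) for p in particles],
                         k=len(particles))
    out = []
    for (alpha, beta, s2, s2x), (a, A, b, B, c, C), x in res:
        # Step 2: propagate from p(x_{t+1} | x_t, theta, y_{t+1})
        prec = 1.0 / s2 + 1.0 / s2x
        mean = (y_new / s2 + (alpha + beta * x) / s2x) / prec
        x_new = random.gauss(mean, math.sqrt(1.0 / prec))
        # Step 3: conjugate sufficient-statistic recursions, regressor Z = (1, x_t)'
        Z = [1.0, x]
        A2 = A + (y_new - x_new) ** 2
        C2 = [[C[i][j] + Z[i] * Z[j] for j in range(2)] for i in range(2)]
        rhs = [C[i][0] * c[0] + C[i][1] * c[1] + Z[i] * x_new for i in range(2)]
        Ci = inv2(C2)
        c2 = [Ci[i][0] * rhs[0] + Ci[i][1] * rhs[1] for i in range(2)]
        B2 = B + quad(c, C) + x_new ** 2 - quad(c2, C2)
        a2, b2 = a + 0.5, b + 0.5
        # Step 4: draw theta ~ p(theta | s_{t+1}) from the conjugate posteriors
        s2_new = draw_inv_gamma(a2, A2)
        s2x_new = draw_inv_gamma(b2, B2)
        L = chol2([[s2x_new * Ci[i][j] for j in range(2)] for i in range(2)])
        z = [random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)]
        alpha_n = c2[0] + L[0][0] * z[0]
        beta_n = c2[1] + L[1][0] * z[0] + L[1][1] * z[1]
        out.append(((alpha_n, beta_n, s2_new, s2x_new),
                    (a2, A2, b2, B2, c2, C2), x_new))
    return out
```

Note that every output particle carries equal weight, so no weight bookkeeping is needed between steps.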
We utilize a somewhat nonstandard expression for p (xt+1 |y t+1 ) in updating the states. To
understand the mechanics of this step and to contrast it with common particle filtering
algorithms, we consider the simpler case of pure filtering. For the rest of this subsection,
we assume the parameters are known and fixed at their true values.
The distribution p(x_{t+1}, y_{t+1} | x_t) can be expressed in different ways. We express
p(x_{t+1}, y_{t+1} | x_t) = p(y_{t+1} | x_t) p(x_{t+1} | x_t, y_{t+1}),
which combines the predictive likelihood p(y_{t+1} | x_t) and the conditional state posterior
p(x_{t+1} | x_t, y_{t+1}). This leads to the marginal distribution
p(x_{t+1} | y^{t+1}) = ∫ p(y_{t+1} | x_t) p(x_{t+1} | x_t, y_{t+1}) p(x_t | y^t) dx_t.   (7)
Given a particle approximation, p^N(x_t | y^t), this yields
p^N(x_{t+1} | y^{t+1}) = Σ_{i=1}^N w(x_t^{(i)}) p(x_{t+1} | x_t^{(i)}, y_{t+1}),   (8)
where w(x_t^{(i)}) ∝ p(y_{t+1} | x_t^{(i)}) are normalized resampling weights.
It is important to note that the mixture probabilities are a function of xt not xt+1 . This
implies a two-step direct draw from pN (xt+1 |y t+1 ).
The state filtering algorithm consists of the following steps:
––––––––––––––––––––––––––––––––––––—
Algorithm: Exact state filtering
Step 1: (Resample) Draw x_t^{(i)} ∼ Mult_N(w(x_t^{(1)}), ..., w(x_t^{(N)})) for i = 1, ..., N
Step 2: (Propagate) Draw x_{t+1}^{(i)} ∼ p(x_{t+1} | x_t^{(i)}, y_{t+1}).
––––––––––––––––––––––––––––––––––––—
In contrast, the standard particle filtering approach expresses p(y_{t+1}, x_{t+1} | x_t) = p(y_{t+1} | x_{t+1}) p(x_{t+1} | x_t), which leads to
p^N(x_{t+1} | y^{t+1}) = Σ_{i=1}^N w(x_{t+1}^{(i)}) p(x_{t+1} | x_t^{(i)}),   (12)
where
w(x_{t+1}^{(i)}) = p(y_{t+1} | x_{t+1}^{(i)}) / Σ_{i=1}^N p(y_{t+1} | x_{t+1}^{(i)}).
Sampling from this mixture distribution is difficult because the natural mixing prob-
abilities depend on xt+1 , which has yet to be simulated. Instead of direct sampling, the
common approach is to use importance sampling and the sampling-importance resampling
(SIR) algorithm of Rubin (1988) or Smith and Gelfand (1992). This generates the classic
SIR PF algorithm:
(Propagate) Draw x_{t+1}^{(i)} ∼ p(x_{t+1} | x_t^{(i)}) for i = 1, ..., N
(Resample) Draw x_{t+1}^{(i)} ∼ Mult_N({w(x_{t+1}^{(i)})}_{i=1}^N).
We use a multinomial resampling step, although other approaches are available (see Liu
and Chen (1998) or Carpenter, Clifford, and Fearnhead (1999)). The classic PF algorithm
suffers from a number of well-known problems as it blindly simulates states, even though
yt+1 is observed, and relies on importance sampling. Importance sampling typically results
in weight degeneracy or sample impoverishment.
Notice that our algorithm proceeds in exactly the opposite order to the classic particle filter.
First, the algorithm selects particles to propagate forward via their likelihood p(y_{t+1} | x_t^{(i)}).
This results in propagating high-likelihood particles multiple times and is key to an efficient algorithm. Second, the algorithm propagates states via p(x_{t+1} | x_t^{(i)}, y_{t+1}), taking into
account the new observation. The draws all have equal probability weights, so there is no
need to track the weights.
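The contrast between the two orderings can be made concrete. This is a hedged sketch with generic callables (all names are ours): the exact filter resamples on p(y_{t+1} | x_t) and then propagates from the conditional p(x_{t+1} | x_t, y_{t+1}), while classic SIR blindly propagates from p(x_{t+1} | x_t) and only then reweights.

```python
import random

def exact_filter_step(xs, y_next, lik_prev, propagate_cond):
    """Exact order: resample on p(y_{t+1} | x_t), then propagate from
    p(x_{t+1} | x_t, y_{t+1}); output particles carry equal weight."""
    ws = [lik_prev(y_next, x) for x in xs]
    xs = random.choices(xs, weights=ws, k=len(xs))
    return [propagate_cond(x, y_next) for x in xs]

def sir_filter_step(xs, y_next, propagate_prior, lik):
    """Classic SIR order: blindly propagate from p(x_{t+1} | x_t),
    then resample on p(y_{t+1} | x_{t+1})."""
    xs = [propagate_prior(x) for x in xs]
    ws = [lik(y_next, x) for x in xs]
    return random.choices(xs, weights=ws, k=len(xs))
```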
Our algorithm is closely related to the optimal importance function algorithms derived
in Doucet, et al. (2000). Their algorithm effectively reverses our Steps 1 and 2, by first
simulating from p(x_{t+1} | x_t^{(i)}, y_{t+1}) and then reweighting those draws. Like Doucet, et al.
(2000), our algorithm requires that p(y_{t+1} | x_t) is known and that p(x_{t+1} | x_t, y_{t+1}) can be
simulated. However, our algorithm is not an importance sampling algorithm, as it provides
exact draws from the target distribution, p^N(x_{t+1} | y^{t+1}).
where the specification of λ_{t+1} and ω_{t+1} determines the error distribution. For example, λ_{t+1} ∼ IG(ν/2, ν/2) generates a marginal distribution for the observation errors that is
t-distributed with ν degrees of freedom. The case of discrete mixtures is handled similarly. We also assume that there exist conditional sufficient statistics for the parameters,
p(θ | s_t), with deterministic recursions for the sufficient statistics.
It is important to note that the parameter posteriors generally do not admit sufficient
statistics unless we introduce the latent auxiliary variables.
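The inverse-gamma scale mixture construction can be checked directly: if λ ∼ IG(ν/2, ν/2) and z is standard normal, then √λ · z is t-distributed with ν degrees of freedom. A small sketch using only the standard library (the function name is ours):

```python
import random

def t_error(nu, rng=random):
    """Draw a t_nu shock via the scale mixture: lambda ~ IG(nu/2, nu/2),
    then epsilon = sqrt(lambda) * z with z standard normal."""
    # If G ~ Gamma(nu/2, scale=1), then (nu/2)/G ~ IG(nu/2, nu/2).
    lam = (nu / 2) / rng.gammavariate(nu / 2, 1.0)
    return (lam ** 0.5) * rng.gauss(0.0, 1.0)
```

A quick sanity check is that the sample variance of many draws should be close to ν/(ν − 2), the variance of a t_ν random variable.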
This class of shocks has a long history in state space models. t-distributed errors in the
observation equation were analyzed by Masreliez (1975) and Masreliez and Martin (1977),
but they did not consider t-distributed state shocks. In the case of smoothing, this class of
shocks is considered using MCMC methods by Carlin, Polson, and Stoffer (1992), Carter
and Kohn (1994, 1996), and Shephard (1994). This implies that we allow for t-distributed
errors, stable errors, double exponential errors, and discrete mixture errors. This latter case
includes the important class of log-stochastic volatility models using the representation of
Kim, Shephard, and Chib (1998).
The algorithm outlined in Section 2.1 requires an analytical form for p(y_{t+1} | x_t, θ) and
an ability to simulate from p(θ | s_{t+1}) and p(x_{t+1} | x_t, θ, y_{t+1}). For the mixture models, these
densities are not analytically known. However, the algorithm can be slightly modified to handle
these non-Gaussian and non-linear components. The key is twofold: utilizing the fact that
p(y_{t+1} | x_{t+1}, λ_{t+1}, θ) and p(x_{t+1} | x_t, λ_{t+1}, θ) are Gaussian distributions, and then a careful
marginalization to sequentially update x_{t+1} and λ_{t+1}.
The algorithm proceeds via an analog to (1). For notational parsimony, we will denote the
latent variables by just λt+1 from now on. The factorization is
p(θ, s_{t+1}, λ_{t+1}, x_{t+1} | y^{t+1}) = p(θ | s_{t+1}) p(s_{t+1} | x_{t+1}, λ_{t+1}, y^{t+1}) p(x_{t+1} | λ_{t+1}, y^{t+1}) p(λ_{t+1} | y^{t+1}),
which we sample by first updating λ_{t+1}, then x_{t+1}, then s_{t+1}, and finally θ. As in Section 2.1, to generate
samples from the joint distribution, we rely on the factorization and careful marginalization arguments.
Given existing particles, the first step is to propagate the mixture variables, λ_{t+1}. We
generate draws from a higher dimensional distribution, and then obtain draws
from p(λ_{t+1} | y^{t+1}) as the marginal distribution. To do this, first note that
p(λ_{t+1}, x_t, θ | y^{t+1}) ∝ p(y_{t+1} | θ, λ_{t+1}, x_t) p(θ, λ_{t+1}, x_t | y^t).
Since p (yt+1 |λt+1 , xt , θ) is known for all of the models that we consider, this step is feasible.
To propagate the states, express
p(x_{t+1} | λ_{t+1}, y^{t+1}) = ∫ p(x_{t+1} | θ, λ_{t+1}, x_t, y_{t+1}) p(θ, x_t | y^{t+1}) d(θ, x_t),
and note that we already have draws from the particle approximation to p (xt , θ|y t+1 ) as a
marginal from p (λt+1 , xt , θ|y t+1 ). Therefore, we can sample xt+1 via
x_{t+1}^{(i)} ∼ p(x_{t+1} | (θ, λ_{t+1}, x_t)^{(i)}, y_{t+1}),
since the distribution p (xt+1 |θ, λt+1 , xt , yt+1 ) is known for all of the mixture models. Given
the updated states, we update the sufficient statistics via
s_{t+1}^{(i)} = S(s_t^{(i)}, x_{t+1}^{(i)}, λ_{t+1}^{(i)}, y_{t+1}),
and draw θ^{(i)} ∼ p(θ | s_{t+1}^{(i)}).
The full algorithm is given by the following steps.
––––––––––––––––––––––––––––––––––––—
Algorithm: Non-Gaussian sequential parameter learning and state filtering
Step 1: Draw λ_{t+1}^{(i)} ∼ p(λ_{t+1}) for i = 1, ..., N
Step 2: Draw (θ, s_t, λ_{t+1}, x_t)^{(i)} ∼ Mult_N({w((θ, λ_{t+1}, x_t)^{(i)})}_{i=1}^N) for i = 1, ..., N
Step 3: Draw x_{t+1}^{(i)} ∼ p(x_{t+1} | (θ, λ_{t+1}, x_t)^{(i)}, y_{t+1}) for i = 1, ..., N
Step 4: Update s_{t+1}^{(i)} = S(s_t^{(i)}, x_{t+1}^{(i)}, λ_{t+1}^{(i)}, y_{t+1}) for i = 1, ..., N
Step 5: Draw θ^{(i)} ∼ p(θ | s_{t+1}^{(i)}) for i = 1, ..., N.
––––––––––––––––––––––––––––––––––––—
This algorithm provides an exact sample from pN (θ, st+1 , λt+1 , xt+1 |y t+1 ).
After Step 3, an additional step can be introduced to update λ_{t+1} from p(λ_{t+1} | θ, x_{t+1}, y_{t+1}).
This is effectively a one-step MCMC replenishment step. As the algorithm is already
approximately sampling from the "equilibrium" distribution, the marginal for λ_{t+1}, an
additional replenishment step for λ_{t+1} may help by introducing additional sample diversity.
2.2.2 Pure state filtering
If we assume the parameters are known and focus on the state filtering problem, we can
adapt the algorithms from the previous section to provide exact particle filtering algorithms.
Existing state filtering algorithms for these models rely on importance sampling methods
either via the auxiliary particle filter of Pitt and Shephard (1999) or the mixture-Kalman
filter of Chen and Liu (2000). Both of the algorithms above provide exact O(N) state filtering
algorithms, and we briefly discuss them as they offer generic improvements
on the existing literature.
There are two ways to factor the joint filtering densities:
p(x_{t+1}, λ_{t+1} | y^{t+1}) = p(x_{t+1} | λ_{t+1}, y^{t+1}) p(λ_{t+1} | y^{t+1})
or
p(x_{t+1}, λ_{t+1} | y^{t+1}) = p(λ_{t+1} | x_{t+1}, y^{t+1}) p(x_{t+1} | y^{t+1}),
with the differences based on the order of auxiliary variable or state variable updates.
The first factorization leads to an initial draw from p (λt+1 |y t+1 ). Since,
p(λ_{t+1}, x_t | y^{t+1}) ∝ p(y_{t+1} | λ_{t+1}, x_t) p(λ_{t+1}, x_t | y^t)
and the latent auxiliary variables are i.i.d., we have that p (λt+1 , xt |y t ) ∝ p (λt+1 ) p (xt |y t ) .
Therefore, to draw from
p(λ_{t+1} | y^{t+1}) = ∫ p(y_{t+1} | λ_{t+1}, x_t) p(λ_{t+1}) p(x_t | y^t) dx_t,
we can augment the existing particles x_t^{(i)} from p^N(x_t | y^t) with draws λ_{t+1}^{(i)}, and resample
with probabilities given by
w((λ_{t+1}, x_t)^{(i)}) = p(y_{t+1} | (λ_{t+1}, x_t)^{(i)}) / Σ_{i=1}^N p(y_{t+1} | (λ_{t+1}, x_t)^{(i)}).
Given the resampled (λ_{t+1}, x_t)^{(i)}, we can then draw
x_{t+1}^{(i)} ∼ p(x_{t+1} | (λ_{t+1}, x_t)^{(i)}, y_{t+1}).
This generates an exact draw from pN (xt+1 , λt+1 |y t+1 ).
The second approach updates the state variables and then the latent auxiliary variables.
To sample from pN (xt+1 |y t+1 ), we use a slight modification of the filtering distribution,
p(x_{t+1} | y^{t+1}) = ∫ p(y_{t+1} | λ_{t+1}, x_t) p(x_{t+1} | x_t, λ_{t+1}, y_{t+1}) p(λ_{t+1}, x_t | y^t) d(λ_{t+1}, x_t).
Since λ_{t+1} is independent of y^t and x_t, we can simulate λ_{t+1} ∼ p(λ_{t+1}) and create an
augmented particle vector (x_t^{(i)}, λ_{t+1}^{(i)}). Given this particle approximation for (x_t, λ_{t+1}), we
have that
p^N(x_{t+1} | y^{t+1}) = Σ_{i=1}^N w(x_t^{(i)}, λ_{t+1}^{(i)}) p(x_{t+1} | x_t^{(i)}, λ_{t+1}^{(i)}, y_{t+1}),
where
w(x_t^{(i)}, λ_{t+1}^{(i)}) = p(y_{t+1} | x_t^{(i)}, λ_{t+1}^{(i)}) / Σ_{i=1}^N p(y_{t+1} | x_t^{(i)}, λ_{t+1}^{(i)}).
This mixture distribution can again be exactly sampled. Updating the auxiliary variables
is straightforward since
p(λ_{t+1} | x_{t+1}, y^{t+1}) ∝ p(y_{t+1} | x_{t+1}, λ_{t+1}) p(λ_{t+1}).
3 Illustrative Examples
In this section, we provide the details of our sequential parameter learning and state filtering
algorithms for the two models that we consider.
where the auxiliary variables are independent and λt+1 ∼ IG (ν/2, ν/2) and ωt+1 ∼
IG (ν x /2, ν x /2) . Conditional on λt+1 and ωt+1 , the model is conditionally Gaussian.
Masreliez and Martin (1977) develop approximate robust state filters for models with
t−distributed errors in either the state or observation equation, but not both. West (1981)
and Gordon and Smith (1993) analyze the pure filtering problem. Storvik (2002) uses
importance sampling to analyze sequential parameter learning and state filtering, assuming
the observation errors, but not the state errors, are t-distributed.
To our knowledge, ours is the first algorithm for parameter and state
learning with t-errors in both equations.
To apply the general algorithm in Section 2.2.1, the distributions p(y_{t+1} | θ, λ_{t+1}, ω_{t+1}, x_t)
and p(x_{t+1} | θ, λ_{t+1}, ω_{t+1}, x_t) are required. The first distribution,
p(y_{t+1} | θ, λ_{t+1}, ω_{t+1}, x_t), defines the weights, which are given by
w((x_t, θ)^{(i)}) ∝ [(σ²)^{(i)} λ_{t+1}^{(i)} + (σ_x²)^{(i)} ω_{t+1}^{(i)}]^{-1/2} exp(−(y_{t+1} − α_x^{(i)} − β_x^{(i)} x_t^{(i)})² / (2[(σ²)^{(i)} λ_{t+1}^{(i)} + (σ_x²)^{(i)} ω_{t+1}^{(i)}])).
The conditional state posterior is p(x_{t+1} | θ, λ_{t+1}, ω_{t+1}, x_t, y_{t+1}) ∼ N(μ_{t+1}, σ_{t+1}²), where
μ_{t+1}/σ_{t+1}² = y_{t+1}/(σ² λ_{t+1}) + (α_x + β_x x_t)/(σ_x² ω_{t+1})   and   1/σ_{t+1}² = 1/(σ² λ_{t+1}) + 1/(σ_x² ω_{t+1}).
For the parameter posteriors and sufficient statistics, we re-write the state equation as
x_{t+1} = Z_t′ β + σ_x √ω_{t+1} ε^x_{t+1},
where Z_t = (1, x_t)′ and β = (α_x, β_x)′. Given this parameterization, the sufficient statistic structure implies that p(σ² | s_{t+1}) ∼ IG(a_{t+1}, A_{t+1}), p(σ_x² | s_{t+1}) ∼ IG(b_{t+1}, B_{t+1}),
and p(β | σ_x², s_{t+1}) ∼ N(c_{t+1}, σ_x² C_{t+1}^{-1}). The hyperparameters are given by a_t = 1/2 + a_{t−1},
b_t = 1/2 + b_{t−1}, and
A_{t+1} = A_t + (y_{t+1} − x_{t+1})² / λ_{t+1}
B_{t+1} = B_t + c_t′ C_t c_t + x_{t+1}² / ω_{t+1} − c_{t+1}′ C_{t+1} c_{t+1}
c_{t+1} = C_{t+1}^{-1} (C_t c_t + Z_{t+1} x_{t+1} / ω_{t+1})
C_{t+1} = C_t + Z_{t+1} Z_{t+1}′ / ω_{t+1},
which defines the vector of sufficient statistics, st+1 = (At+1 , Bt+1 , ct+1 , Ct+1 ), for sequential
parameter learning.
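The recursions above can be written as a single update function S. This is an illustrative sketch (the function name and particle layout are ours, and we index the regressor by the previous state, Z = (1, x_t)′):

```python
def update_suff_stats(s, x_prev, x_new, y_new, lam, omega):
    """One step of the recursion s_{t+1} = S(s_t, x_{t+1}, lambda_{t+1},
    omega_{t+1}, y_{t+1}) for the t-errors model; regressor Z = (1, x_t)'."""
    A, B, c, C = s                      # c: 2-vector, C: 2x2 matrix
    Z = [1.0, x_prev]
    # A_{t+1} = A_t + (y_{t+1} - x_{t+1})^2 / lambda_{t+1}
    A_new = A + (y_new - x_new) ** 2 / lam
    # C_{t+1} = C_t + Z Z' / omega_{t+1}
    C_new = [[C[i][j] + Z[i] * Z[j] / omega for j in range(2)] for i in range(2)]
    # c_{t+1} = C_{t+1}^{-1} (C_t c_t + Z x_{t+1} / omega_{t+1}), 2x2 inverse by hand
    rhs = [C[i][0] * c[0] + C[i][1] * c[1] + Z[i] * x_new / omega for i in range(2)]
    det = C_new[0][0] * C_new[1][1] - C_new[0][1] * C_new[1][0]
    c_new = [(C_new[1][1] * rhs[0] - C_new[0][1] * rhs[1]) / det,
             (-C_new[1][0] * rhs[0] + C_new[0][0] * rhs[1]) / det]
    def quad(v, M):                     # quadratic form v' M v
        return sum(v[i] * M[i][j] * v[j] for i in range(2) for j in range(2))
    # B_{t+1} = B_t + c_t' C_t c_t + x_{t+1}^2 / omega_{t+1} - c_{t+1}' C_{t+1} c_{t+1}
    B_new = B + quad(c, C) + x_new ** 2 / omega - quad(c_new, C_new)
    return A_new, B_new, c_new, C_new
```

Setting lam = omega = 1 recovers the Gaussian AR(1) recursions.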
The t-distributed error model requires the specification of the degrees of freedom parameters of the t-distributions. Here, we leave (ν, ν_x) as known parameters. It is not possible
to include these parameters in the state vector, but one could compute their posterior distribution
by discretizing the support.
3.2 SV errors
Consider next the log-stochastic volatility model, first analyzed in Jacquier, Polson, and
Rossi (1994) and subsequently by many others:
y_t = exp(x_t / 2) ε_t
x_t = α_x + β_x x_{t−1} + σ_v v_t,
where the errors are independent standard normal random variables. To estimate the
model, we use the transformation approach of Kim, Shephard, and Chib (1998) and the
10-component mixture approximation developed in Omori, Chib and Shephard (2006).
The Kim, Shephard, and Chib (1998) transformation analyzes the logarithm of squared
returns, y*_t = ln y_t². With the K = 10-component normal mixture approximation, we have a
state space model of the form
y*_t = x_t + ε_t
x_t = α_x + β_x x_{t−1} + σ_v v_t,
where ε_t is a log(χ₁²) random variable, approximated by a discrete mixture of normals with fixed
weights, Σ_{j=1}^K p_j N(μ_j, σ_j²). The indicator variable I_t tracks the mixture
components, with, for example, I_t = j indicating a current state in mixture component j.
Our state filtering and sequential learning algorithm will track particles and sufficient statistics
(x_t^{(i)}, θ^{(i)}, s_t^{(i)}). Here s_t are the usual sufficient statistics for estimating the parameters
θ = (α_x, β_x, σ_v). The sufficient statistics are conditional on the indicator variables, and are
of the same form as in a standard AR(1) model, since conditional on the indicators the error distribution is known.
To implement the algorithm, first note that we can calculate the following conditional
density:
p(y*_{t+1} | x_t, θ) = Σ_{j=1}^K p_j N(μ_j + α_x + β_x x_t, σ_j² + σ_v²).
That is, the predictive density of the next observation given the current state is a mixture
of normals. We use this to define the weights w(I_{t+1}, x_t, θ) as
à ¡ ∗ ¢2 !
1 1 yt+1 − μIt+1 − αx − βx xt
w (It+1 , xt , θ) ∝ q exp − .
σ2 + σ2 2 σI2t+1 + σv2
It+1 v
Now we can compute the updated filtering distribution of the next log-volatility state and
component indicator as follows
p(x_{t+1}, I_{t+1} = j | x_t, θ, y*_{t+1}) ∝ p(y*_{t+1} | x_{t+1}, I_{t+1} = j, θ) p(x_{t+1} | I_{t+1} = j, x_t, θ) p(I_{t+1} = j).
Using the definition of the predictive in terms of the weight function and the fact that
p(I_{t+1} = j) = p_j, we obtain a density proportional to
Σ_{i=1}^N w((I_{t+1}, x_t, θ)^{(i)}) p(x_{t+1} | (I_{t+1}, x_t, θ)^{(i)}, y*_{t+1}).
Hence the next particle filtering distribution, p^N(x_{t+1} | (y*)^{t+1}), is a mixture of normals
which can be sampled from directly.
The density is given by the Kalman filter recursion and is a conditional normal:
$$p\left(x_{t+1} | (I_{t+1}, x_t, \theta)^{(i)}, y_{t+1}^*\right) \sim N\left(\hat{x}_{t+1, I_{t+1}}, \hat{s}_{t+1, I_{t+1}}^2\right)$$
where
$$\hat{x}_{t+1,j} = \frac{\sigma_v^2}{\sigma_v^2 + \sigma_j^2}\left(y_{t+1}^* - \mu_j\right) + \frac{\sigma_j^2}{\sigma_v^2 + \sigma_j^2}\left(\alpha_x + \beta_x x_t\right)$$
$$\hat{s}_{t+1,j}^{-2} = \sigma_v^{-2} + \sigma_j^{-2}.$$
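This update is a scalar precision-weighted combination of the prior mean and the mean-adjusted observation. A minimal sketch with a hypothetical function name:

```python
def conditional_update(y_star, mu_j, sig2_j, prior_mean, sig_v2):
    """Moments of x_{t+1} given mixture component j and observation y*_{t+1}.

    Combines the AR(1) prior N(prior_mean, sig_v2), where
    prior_mean = alpha_x + beta_x * x_t, with the observation equation
    y* = x_{t+1} + eps, eps ~ N(mu_j, sig2_j), by adding precisions.
    """
    s2 = 1.0 / (1.0 / sig_v2 + 1.0 / sig2_j)                    # s-hat^2_{t+1,j}
    m = s2 * ((y_star - mu_j) / sig2_j + prior_mean / sig_v2)   # x-hat_{t+1,j}
    return m, s2
```

With equal prior and observation variances, the posterior mean is the midpoint of the adjusted observation and the prior mean, exactly as the shrinkage form above requires.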
Hence the next filtering distribution for $x_{t+1}^{(i)}$ is easy to sample from. We then update
the sufficient statistics $s_{t+1}^{(i)}$ and draw a new parameter value from $p(\theta | s_{t+1}^{(i)})$.
Since the algorithm is slightly different from the ones above, we provide the details.
The algorithm requires the following steps:

1. Draw $I_{t+1}^{(i)} \sim p(I_{t+1} | x_t, \theta) = p(I_{t+1} = j) = p_j$

2. Re-sample triples $(I_{t+1}^{(i)}, x_t^{(i)}, \theta^{(i)})$ with weights $w(I_{t+1}, x_t, \theta)$

3. Draw $x_{t+1}^{(i)} \sim p\left(x_{t+1} | (I_{t+1}, x_t, \theta)^{(i)}, y_{t+1}^*\right)$

4. Update sufficient statistics $s_{t+1}^{(i)} = S\left(s_t^{(i)}, I_{t+1}^{(i)}, x_{t+1}^{(i)}, y_{t+1}\right)$

5. Draw $\theta^{(i)} \sim p\left(\theta | s_{t+1}^{(i)}\right)$
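Steps 1-3 can be sketched in vectorized form as follows. This is an illustrative sketch only: the parameter-learning steps 4-5 are omitted (so $\theta$ is held fixed within the step), and the function and argument names are hypothetical:

```python
import numpy as np

def pf_step(particles, y_star, p, mu, sig2, rng):
    """One filtering step (steps 1-3 of the algorithm above), vectorized.

    particles holds arrays x, alpha, beta, sig_v2, one entry per particle.
    Parameter learning (steps 4-5) is omitted, so theta is fixed here.
    """
    x, alpha, beta, sv2 = (particles[k] for k in ("x", "alpha", "beta", "sig_v2"))
    N, K = x.size, len(p)
    # Step 1: draw mixture indicators from the prior weights p_j.
    I = rng.choice(K, size=N, p=p)
    # Step 2: re-sample triples (I, x, theta) with weights w(I, x, theta).
    var = sig2[I] + sv2
    resid = y_star - mu[I] - alpha - beta * x
    w = np.exp(-0.5 * resid ** 2 / var) / np.sqrt(var)
    idx = rng.choice(N, size=N, p=w / w.sum())
    I, x, alpha, beta, sv2 = I[idx], x[idx], alpha[idx], beta[idx], sv2[idx]
    # Step 3: draw x_{t+1} from the conditional normal (Kalman update).
    s2 = 1.0 / (1.0 / sv2 + 1.0 / sig2[I])
    m = s2 * ((y_star - mu[I]) / sig2[I] + (alpha + beta * x) / sv2)
    x_new = m + np.sqrt(s2) * rng.standard_normal(N)
    return {"x": x_new, "alpha": alpha, "beta": beta, "sig_v2": sv2}
```

Because the indicators are drawn before re-sampling, the weights depend only on $(I_{t+1}, x_t, \theta)$ and the new state is then drawn exactly from the conditional normal, with no importance-sampling correction needed.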
Our approach uses exact sampling from the particle approximation to the joint distribution.
Other authors have considered sequential state and parameter learning, but their algorithms
are approximate and also have difficulty learning $\sigma_v$. Johannes, Polson and Stroud (2005)
propose an alternative to the exact sampling scheme used here, based on interacting particle
systems and importance sampling. They also analyze the nonlinear model without using the
mixture-of-errors transformation.
3.3 Numerical results
3.3.1 T-errors model: Nasdaq 100 data
We consider a real data example using daily Nasdaq 100 stock index returns from 1999
to August 2006 for a total of T = 1953 observations. The priors are given by αx |σx2 ∼
N (0, 0.1σx2 ), βx |σx2 ∼ N (0.7, 2σx2 ), σx2 ∼ IG (20, 0.154), and σ2 ∼ IG (5, 10). Again, the
algorithm was run with N = 5000 particles. The results are in Figures (4) to (6).
The results provide a number of interesting findings. First, Figure (4) indicates that
there is little evidence for a time-varying mean for Nasdaq 100 stock returns. This is not
surprising because stock returns, and Nasdaq returns in particular, are quite noisy and past
evidence indicates that it is difficult to identify mean predictability over short frequencies
such as daily. Predictability, if it is present, is commonly seen over longer horizons such
as quarterly or annually. The filtered quantiles in the bottom panel of Figure (4) indicate
that there could be predictability, as the (5, 95)% bands are roughly -0.5% and 0.5%, but
there is too much uncertainty to identify it.
Second, one source of the uncertainty, especially in the early parts of the sample, is the
uncertainty over the parameters, which is shown in Figure (5). For each of the parameters,
the priors are relatively loose. This generates substantial uncertainty for the early portion
of the sample, and contributes to the highly uncertain filtered state distribution.
Third, a closer examination of Figure (5) shows that the posterior for σ appears to vary
over time: it increases in the early portion of the sample and decreases in the latter portion.
This captures time-varying volatility, as volatility in equity markets has declined since the
early part of 2003. This can be seen from the data in the upper panel of Figure (4) and
will be clear in the stochastic volatility example below. It is important to note that this
is not due to outliers, since we allow for fat-tailed t-errors in both the observation and
state equation. This provides a useful diagnostic for slow time-variation: the fact that the
posterior for σ appears to be varying over time indicates that the model is misspecified
and a more general specification with stochastic volatility is warranted. Finally, Figure
(6) shows the posterior distribution at time T , and shows that the posteriors are slightly
non-normal, consistent with the findings in Jacquier, Polson, and Rossi (1994).
learn the true parameter values. Of note is that, despite the near-unit-root behavior of
stochastic volatility, we are able to accurately estimate the persistence parameter.
We consider a real data example using daily Nasdaq 100 stock index returns from 1999 to
August 2006. The priors used are given by αx |σx2 ∼ N (0, 0.1σx2 ), βx |σx2 ∼ N (0.7, σx2 /4),
and σx2 ∼ IG (30, 0.725) . The algorithm was run using N = 5000 particles. The results are
given in Figures (10) to (12).
The previous results, in Figure (5), indicated that the estimates of σ varied over time in the
t-errors model. This is seen more easily in the bottom panel of Figure (10), which displays
the posterior quantiles of daily volatility, exp (xt /2). Daily volatility was high and variable
in the 2000-2002 period, and it declined almost monotonically in 2003-2006. This slow
time-variation is exactly what the stochastic volatility model aims to capture.
Figure (11) shows the posterior quantiles over time and provides some evidence of
time-variation. In the early portion of the sample, volatility was higher than in the latter
portion. This feature is captured in the top panel of Figure (11) by time-variation in the
posterior for αx , which controls the mean of log-volatility. The posterior means for αx are
much higher in 1999-2000 than in the later years, although there is greater uncertainty in
the early portion of the sample. It is interesting to note that the posteriors for βx and σx
vary less over time. Figure (12) displays the posterior at time T . Given the large sample,
there is relatively little evidence for non-normality in the posteriors.
4 Conclusions
In this paper, we provide an exact sampling algorithm for performing sequential parameter
learning and state filtering for nonlinear, non-Gaussian state space models. This means that
we do not resort to importance sampling, and thus avoid the well-known degeneracies
associated with sequential importance sampling methods. Formally, the only assumption
we require is that the parameter posterior admits a sufficient statistic structure. We analyze
the class of linear non-Gaussian models in detail, and exact state filtering is a special case
of our algorithm. Thus, we provide an exact sampling alternative to algorithms such as
the auxiliary particle filter of Pitt and Shephard (1999) and the mixture Kalman filter of
Chen and Liu (2000). We provide both simulation and real data examples to document
the efficacy of the approach.
We are currently working on two extensions. First, in Johannes and Polson (2006),
we examine sequential parameter learning and state filtering algorithms for multivariate
Gaussian models, deriving the exact distributions required to implement the algorithms.
Second, in Johannes, Polson, and Yae (2006), we consider the problem of robust filtering.
Here, we adapt our algorithms to handle sequential parameter and state filtering via “ro-
bust” non-differentiable criterion functions such as least absolute deviations and quantiles.
Our algorithms compare favorably with those in the existing literature.
5 References
Andrieu, C., A. Doucet, and V.B. Tadic, 2006, Online simulation-based methods for pa-
rameter estimation in non linear non Gaussian state-space models, Proc. IEEE CDC,
forthcoming.
Berzuini, C., Best, N., Gilks, W. and Larizza, C., 1997, Dynamic conditional independence
models and Markov chain Monte Carlo methods. Journal of the American Statistical
Association, 92, 1403-1412.
Carlin, B. and N.G. Polson, 1992, Monte Carlo Bayesian Methods for Discrete Regression
Models and Categorical Time Series. Bayesian Statistics 4, J.M. Bernardo, et al. (Eds.),
Oxford, Oxford University Press, 577-586.
Carlin, B., N.G. Polson, and D. Stoffer, 1992, A Monte Carlo Approach to Nonnormal
and Nonlinear State-Space Modeling, Journal of the American Statistical Association, 87,
493-500.
Carpenter, J., P. Clifford, and P. Fearnhead, 1999, An Improved Particle Filter for Nonlinear
Problems. IEE Proceedings - Radar, Sonar and Navigation, 146, 2-7.
Carter, C.K., and R. Kohn, 1994, On Gibbs Sampling for State Space Models, Biometrika,
81, 541-553.
Carter, C.K., and R. Kohn, 1996, Markov chain Monte Carlo in conditionally Gaussian
state space models, Biometrika, 83, 589-601.
Chen, R. and J. Liu, 2000, Mixture Kalman filters, Journal of Royal Statistical Society
Series B. 62, 493-508.
Chopin, N., 2002, A sequential particle filter method for static models. Biometrika, 89,
539-552.
Chopin, N., 2005, Inference and model choice for time-ordered hidden Markov models,
working paper, University of Bristol.
Del Moral, P., Doucet, A. and Jasra, A., 2006, Sequential Monte Carlo Samplers, Journal
of Royal Statistical Society, B, 68, 411-436.
Doucet, A., Godsill, S. and Andrieu, C., 2000, On sequential Monte Carlo sampling methods
for Bayesian filtering. Statistics and Computing, 10, 197-208.
Doucet, A., N. de Freitas, and N. Gordon, 2001, Sequential Monte Carlo Methods in Prac-
tice, New York: Springer-Verlag, Series Statistics for Engineering and Information Science.
Doucet, A. and Tadic, V., 2003, Parameter estimation in general state-space models using
particle methods. Annals of the Institute of Statistical Mathematics, 55, 409-422.
Fearnhead, P., 2002, Markov chain Monte Carlo, sufficient statistics, and particle filters.
Journal of Computational and Graphical Statistics, 11, 848-862.
Godsill, S.J., Doucet, A. and West, M., 2004, Monte Carlo Smoothing for Nonlinear Time
Series. Journal of the American Statistical Association, 99, 156-168.
Gordon, N., Salmond, D. and Smith, Adrian, 1993, Novel approach to nonlinear/non-
Gaussian Bayesian state estimation. IEE Proceedings, F-140, 107—113.
Gordon, N. and A.F.M. Smith (1993). Approximate Non-Gaussian Bayesian Estimation
and modal consistency. Journal of Royal Statistical Society, B., 55, 913-918.
Hansen, L. and N.G. Polson, 2006, Tractable Filtering. Working paper, University of
Chicago.
Jacquier, E., N.G. Polson, and P. Rossi, 1994, Bayesian analysis of Stochastic Volatility
Models, (with discussion). Journal of Business and Economic Statistics 12, 4.
Johannes, M., and Polson, N.G., 2006, Multivariate sequential parameter learning and state
filtering, working paper, University of Chicago.
Johannes, M., Polson, N.G., and S. Yae, 2006, Robust sequential parameter learning and
state filtering, working paper, University of Chicago.
Johannes, M., Polson, N.G. and Stroud, J.R., 2005, Sequential parameter estimation in
stochastic volatility models with jumps. Working paper, University of Chicago.
Johannes, M., Polson, N.G. and Stroud, J.R., 2006, An interacting particle systems ap-
proach to sequential parameter and state learning, working paper, University of Chicago.
Johansen, A., A. Doucet and M. Davy, 2006, Maximum likelihood parameter estimation
for latent variable models using sequential Monte Carlo, to appear Proc. IEEE ICASSP.
Kim, S., N. Shephard and S. Chib, 1998, Stochastic volatility: likelihood inference and
comparison with ARCH models, Review of Economic Studies 65, 361-93.
Kunsch, H., 2005, Recursive Monte Carlo filters: Algorithms and theoretical analysis,
Annals of Statistics, 33, 1983-2021.
Liu, J. and Chen, R., 1995, Blind deconvolution via sequential imputations, Journal of
the American Statistical Association, 89, 278-288.
Liu, J. and Chen, R., 1998, Sequential Monte Carlo Methods for Dynamical Systems.
Journal of the American Statistical Association, 93, 1032-1044.
Liu, J. and M. West, 2001, Combined parameter and state estimation in simulation-based
filtering, in Sequential Monte Carlo Methods in Practice, A. Doucet, N. de Freitas, and N.
Gordon, Eds. New York: Springer-Verlag, 197-217.
Masreliez, C.J., 1975, Approximate non-Gaussian filtering with linear state and observation
relations, IEEE Transactions on Automatic Control 20, 107-110.
Masreliez, C.J. and R.D. Martin, 1977, Robust Bayesian Estimation for the linear model
and robustifying the Kalman filter, IEEE Transactions on Automatic Control 22, 361-371.
Meinhold, R.J. and Singpurwalla, N.D. (1987). Robustification of Kalman Filter Models.
Journal of American Statistical Association, 84, 479-486.
Omori, Y., S. Chib, N. Shephard, and J. Nakajima, 2006, Stochastic Volatility with Lever-
age: Fast and Efficient Likelihood Inference, forthcoming Journal of Econometrics.
Pitt, M., and N. Shephard, 1999, Filtering via simulation: auxiliary particle filters, Journal
of the American Statistical Association, 94, 590-599.
Polson, N.G, J. Stroud, and P. Mueller, 2006, Practical filtering with Sequential Parameter
Learning. Working paper, University of Chicago.
Shephard, N. 1994, Partial non-Gaussian time series models, Biometrika 81, 115-31.
Smith, Adrian, and Alan Gelfand, 1992, Bayesian statistics without tears: a sampling-
resampling perspective, American Statistician 46, 84-88.
Storvik, G., 2002, Particle filters in state space models with the presence of unknown static
parameters, IEEE Transactions on Signal Processing, 50, 281-289.
Stroud, Jonathan, Nicholas Polson and Peter Müller, (2004), Practical Filtering for Sto-
chastic Volatility Models. In State Space and Unobserved Components Models (eds Harvey,
A. et al), 236-247. Oxford University Press.
West, M. (1981) Robust Sequential Approximate Bayesian Estimation. Journal of Royal
Statistical Society, B., 43(2), 157-166.
West, M. and J. Harrison, 1997, Bayesian Forecasting and dynamic models, New York,
Springer-Verlag.
Figure 1: The top panel plots the observed time series, yt+1 , simulated from the t-
distributed AR(1) model. The second panel plots the true simulated xt series (thick line)
as well as the (5, 50, 95) posterior quantiles of p (xt |y t ).
Figure 2: This figure displays sequential summaries of the parameter posterior, p (θ|y t ).
Each panel plots the (5, 50, 95) posterior quantiles for the given parameter; the true
parameter value used in the simulation is denoted by the horizontal line.
Figure 3: This figure summarizes the posterior distribution of the parameters at time
T = 300. Each panel provides a histogram of the posterior, a smoothed estimate of the
posterior, and the true parameter value, denoted by a solid vertical line.
Figure 4: The top panel plots the observed time series, yt+1 , of Nasdaq 100 stock returns.
The second panel plots the (5, 50, 95) posterior quantiles of p (xt |y t ).
Figure 5: This figure displays sequential summaries of the parameter posterior, p (θ|y t ).
Each panel plots the (5, 50, 95) posterior quantiles for the given parameter.
Figure 6: This figure summarizes the posterior distribution of the parameters at time
T = 1953. Each panel provides a histogram of the posterior, a smoothed estimate of the
posterior, and the posterior mean, indicated by a solid vertical line.
Figure 7: The top panel plots the observed time series, yt+1 , simulated from the stochastic
volatility model. The second panel plots the true simulated xt series (thick line) as well as
the (5, 50, 95) posterior quantiles of p (xt |y t ).
Figure 8: This figure displays sequential summaries of the parameter posterior, p (θ|y t ).
Each panel plots the (5, 50, 95) posterior quantiles for the given parameter; the true
parameter value used in the simulation is denoted by the horizontal line.
Figure 9: This figure summarizes the posterior distribution of the parameters at time
T = 300. Each panel provides a histogram of the posterior, a smoothed estimate of the
posterior, and the true parameter value, denoted by a solid vertical line.
Figure 10: The top panel plots the observed time series, yt+1 , of Nasdaq 100 stock returns.
The second panel plots the (5, 50, 95) posterior quantiles of daily volatility, exp (xt /2).
Figure 11: This figure displays sequential summaries of the parameter posterior, p (θ|y t ).
Each panel plots the (5, 50, 95) posterior quantiles for the given parameter.
Figure 12: This figure summarizes the posterior distribution of the parameters at time
T = 1953. Each panel provides a histogram of the posterior, a smoothed estimate of the
posterior, and the posterior mean, indicated by a solid vertical line.