
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 50, NO. 2, FEBRUARY 2002

Particle Filters for State-Space Models With the Presence of Unknown Static Parameters

Geir Storvik

Abstract—In this paper, particle filters for dynamic state-space models handling unknown static parameters are discussed. The approach is based on marginalizing the static parameters out of the posterior distribution such that only the state vector needs to be considered. Such a marginalization can always be applied. However, real-time applications are only possible when the distribution of the unknown parameters given both observations and the hidden state vector depends on some low-dimensional sufficient statistics. Such sufficient statistics are present in many of the commonly used state-space models. Marginalizing the static parameters avoids the problem of impoverishment, which typically occurs when static parameters are included as part of the state vector. The filters are tested on several different models, with promising results.

Index Terms—Global parameters, marginalization, particle filters, sequential updating, state-space models, sufficient statistics.

Manuscript received January 5, 2001; revised October 2, 2001. The associate editor coordinating the review of this paper and approving it for publication was Prof. Simon J. Godsill. The author is with the University of Oslo and the Norwegian Computing Center, Oslo, Norway (e-mail: [email protected]). Publisher Item Identifier S 1053-587X(02)00550-0.

I. INTRODUCTION

DYNAMIC state-space models [1]–[3] are useful for describing data in many different areas, for instance, engineering [4], finance mathematics [5], environmental data [6], geophysical science [7], and disease data [8].

Using $p(\cdot \mid \cdot)$ for a generic conditional distribution, the general (discrete time) state-space model is given by

  $x_t \sim p(x_t \mid x_{t-1}, \theta)$   (system)   (1a)
  $y_t \sim p(y_t \mid x_t, \theta)$   (observations)   (1b)

where $y_t$ contains the observations at time $t$, whereas $\{x_t\}$ is an unobserved underlying stochastic process. In some cases, $x_t$ may have a physical meaning, whereas in other cases, it is merely included in order to describe the distribution of the observation process properly. $\theta$ is a vector containing static parameters, which in some cases can be specified but in many cases are unknown. Typically, some prior distribution is placed on $\theta$.

An important task when analyzing data by state-space models is estimation of the underlying state process based on measurements from the observation process. The interest might be on $x_t$ itself (or a function of $x_t$), or $x_t$ is a tool for making prediction on future observations. Given data $y_{1:s} = (y_1, \ldots, y_s)$, estimation of $x_t$ is of interest. Estimation based on a fixed data set is usually referred to as offline estimation, whereas on-line estimation is sequential estimation of $x_t$ based on $y_{1:t}$ for $t = 1, 2, \ldots$. In this paper, our effort will be on the latter problem. In particular, the focus will be on problems where new observations arrive frequently (hours/minutes/seconds), and real-time estimation/prediction is essential.

In a few cases, including linear Gaussian models and hidden Markov chains, the distribution of $x_t$ given the observations $y_{1:t}$ can be computed exactly using recursive formulas. For situations where analytical solutions are impossible to obtain, stochastic simulation can be applied. Numerous papers have been written on construction of algorithms based on Markov chain Monte Carlo (MCMC) dealing with general state-space models (see Gamerman [9] and the references therein). Although such procedures may be effective for offline estimation, there are problems with full MCMC in the case of on-line estimation. The MCMC algorithm needs to be restarted at each time point. Further, the dimension of the vector to be simulated increases with time.

An alternative to full MCMC at each time point is construction of simulation algorithms for sequential updating of the posterior distributions. Such (slightly different) algorithms have been developed independently in many fields [10]–[15] with different names (bootstrap filter, Monte Carlo filter, particle filter, condensation algorithm). The excellent review by Doucet [16] even contains some references back to the late 1960s. A collection of papers describing the state of the art in this field can be found in [17]. In this paper, such algorithms will be denoted particle filters.

The main idea behind particle filters is to represent the posterior distribution $p(x_{1:t} \mid y_{1:t})$ through a finite set of samples or particles that can be used to estimate any property of $x_{1:t}$ in an ordinary Monte Carlo estimation framework. When a new observation arrives, the particles are updated in order to represent the new posterior $p(x_{1:t+1} \mid y_{1:t+1})$. Techniques for performing this updating include rejection sampling [10], importance sampling [18], sampling/importance resampling [11], and MCMC [14], [19]. A main computational problem with the general approach is that the dimension of the distribution increases with time. In many cases, fast computation is only possible when earlier state variables can be disengaged [14]. Disengagement in this setting means that variables at previous time points can be neglected in the sequential computation algorithm.

Although particle filters have been successful in many simulation experiments and in analysis of real data, a main problem with such an approach is how to handle the presence of unknown static parameters. A common trick in engineering is to include the parameters as part of the state-space vector $x_t$. Berzuini et al. [14] put this approach into a formal Bayesian setting. However, the nondynamics in the parameters makes the parameter samples degenerate into one or a few different values when $t$ increases.
Gordon et al. [11], West [20], Bølviken et al. [21], and Liu and West [22] introduced diversity in the set of particles by adding random noise to the particles, which in this context is similar to approximating the nondynamic parameters by some slowly changing dynamic ones. In addition to the problem of choosing the "diversity" procedure, this results in old observations being downweighted, and the parameter estimates obtained at a given time are mainly dependent on the most recent observations.

This paper considers an alternative approach. Suppose that for given $x_{1:t}$ and $y_{1:t}$, the distribution of $\theta$ is analytically tractable. In particular, the distribution of $\theta$ is assumed to depend on $(x_{1:t}, y_{1:t})$ only through some low-dimensional sufficient statistics. In such cases, only samples of $x_{1:t}$, given $y_{1:t}$, are needed since estimates of the posterior distribution for the static parameters can be obtained either through Rao–Blackwellization or by a simple additional simulation step. Furthermore, updating the particle set to a new particle set one time step further on can be performed by simulation of the state vector and the parameters simultaneously. The approach can be considered to be a marginalization of $\theta$ from the posterior.

The main idea has been suggested in several places before. Liu and Chen [18, Sec. 5] call the procedure Rao–Blackwellization but state that "when disengagement is implemented, Rao–Blackwellization is no longer directly applicable." In this paper, we demonstrate that by including the sufficient statistics for $\theta$ into the state vector, it is possible to combine the procedure with disengagement. Sufficient statistics were also applied in [23] for models with discrete-valued state variables. No general treatment of this approach has, however, been given in the literature. The main contribution of this paper is to demonstrate the usefulness of marginalization in certain classes of state models when estimation of static parameters and dynamic state variables is performed simultaneously.

Although the required assumptions will restrict the set of models possible to process by this approach, many important (and widely used) models are included. In particular, models for which the underlying process is Gaussian and linear in the parameters involved (but not necessarily linear in the process) can be handled by this approach. Further, the assumption about Gaussian noise in the system process can be relaxed to include "partial non-Gaussian" processes, as defined by Shephard [24]. Both T-distributions and mixtures of Gaussians fall into this group of models. In addition, discrete-valued Markov models and mixtures of these and Gaussian-based models can be used. These system processes can be combined with any observational distribution that does not contain any additional unknown parameters. Many observational distributions with unknown parameters can be handled by this approach, but typically, special treatment is needed in each case.

In Section II, the general particle filters are reviewed. Section III introduces the particle filters for situations with unknown static parameters. Section IV considers some particular classes of models that fit into this framework, whereas the filters are applied on different types of models in Section V. Finally, a summary and discussion is given in Section VI.

II. PARTICLE FILTERS IN GENERAL

This section discusses particle filters in situations where the static parameters are known. For the time being, $\theta$ will be suppressed in the notation.

Today, many different versions of particle filters exist (see Doucet et al. [17]). Two different motivations are typically used in the construction of a filter. One approach is based on importance sampling. In this case, $x_t$ is simulated sequentially from some importance distribution $q_t(x_t \mid x_{1:t-1}, y_{1:t})$, and the whole trajectory is given importance weight

  $w_t = w_{t-1} \, \dfrac{p(x_t \mid x_{t-1}) \, p(y_t \mid x_t)}{q_t(x_t \mid x_{1:t-1}, y_{1:t})}.$   (2)

A total of $N$ such sequences are simulated in parallel, giving a weighted particle set $\{(x_{1:t}^i, w_t^i),\ i = 1, \ldots, N\}$ at each time point $t$. Restrictions on the importance distributions are needed both for ease of simulation and in order to make the computation of the weights possible. See [18] for further discussion on this approach, which usually is named sequential importance sampling (SIS). A problem with this approach is that when time evolves, the variance of the weights will increase [25], making the estimate (5) unstable. A common trick to avoid this is to resample from $\{x_{1:t}^i\}$ with probabilities proportional to $w_t^i$. Liu and Chen [18] give some heuristics on when to resample.

An alternative approach is based on the ordinary histogram approximation of the density $p(x_{1:t-1} \mid y_{1:t-1})$:

  $\hat p(x_{1:t-1} \mid y_{1:t-1}) = \dfrac{1}{N} \sum_{i=1}^{N} I(x_{1:t-1} = x_{1:t-1}^i)$   (3)

where $I(A)$ is the indicator function for event $A$. By the Bayes rule and (1)

  $p(x_{1:t} \mid y_{1:t}) \propto p(x_{1:t-1} \mid y_{1:t-1}) \, p(x_t \mid x_{t-1}) \, p(y_t \mid x_t).$   (4)

A new particle set can now be obtained by simulating from this approximative distribution.

A possible rejection sampling procedure [10] for simulation from (4) is to sample $x_{1:t-1}$ from (3), $x_t$ from $p(x_t \mid x_{t-1})$, and accept the sample with probability proportional to $p(y_t \mid x_t)$. This procedure can be repeated $N$ times in order to obtain a new (unweighted) particle set $\{x_{1:t}^i,\ i = 1, \ldots, N\}$. In practice, the acceptance probability for this simple algorithm will be far too low, making other approaches necessary. In [19], constructions of more efficient proposal distributions are given, and some other sampling approaches, including sampling/importance resampling and MCMC, are discussed. Note that the use of MCMC here is on a much smaller dimension than if a full MCMC scheme was to be applied.
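Before turning to the limitations of these filters, a minimal sketch may help make the generic construction concrete. The following Python/NumPy listing is a simple SIS filter with resampling at every step, using the system distribution itself as importance distribution so that the incremental weight in (2) reduces to the observation likelihood. The function names (`sample_x0`, `sample_system`, `loglik_obs`) are user-supplied placeholders introduced here for illustration only; the listing is a sketch, not code from the paper.

```python
import numpy as np

def sisr_filter(y, sample_x0, sample_system, loglik_obs, N=1000, rng=None):
    """Minimal SIS-with-resampling particle filter (parameters known).

    y             : sequence of observations y_1, ..., y_T
    sample_x0     : function(N, rng) -> N initial particles
    sample_system : function(x_prev, rng) -> draws from p(x_t | x_{t-1})
    loglik_obs    : function(y_t, x) -> log p(y_t | x) per particle
    Returns the per-time-step posterior mean estimates of x_t, cf. (5).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = sample_x0(N, rng)
    means = []
    for y_t in y:
        # Propagate with q = p(x_t | x_{t-1}); the incremental weight is
        # then p(y_t | x_t), as follows from (2).
        x = sample_system(x, rng)
        logw = loglik_obs(y_t, x)
        w = np.exp(logw - np.max(logw))
        w /= w.sum()
        means.append(float(np.sum(w * x)))      # Monte Carlo estimate (5)
        # Multinomial resampling with probabilities proportional to w.
        idx = rng.choice(N, size=N, p=w)
        x = x[idx]
    return np.array(means)
```

For example, for a scalar Gaussian AR(1) state observed with unit-variance Gaussian noise, one could pass `sample_x0=lambda N, rng: rng.normal(size=N)`, `sample_system=lambda x, rng: 0.9 * x + rng.normal(size=x.shape)`, and `loglik_obs=lambda y, x: -0.5 * (y - x) ** 2`.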
A main problem with the particle filters described so far is that when simulating $x_{1:t+1}$ at time $t+1$, the first $t$ components can only take the values given in the current particle set. Gilks and Berzuini [26] and Carpenter et al. [27] introduced the possibility of changing the whole vector according to some Markov transition kernel having $p(x_{1:t} \mid y_{1:t})$ as a stationary distribution. Such an approach does solve some of the degeneracy problems that can occur for the more standard particle filters, but not in general. The complexity of simulating from the Markov transition kernel will increase with time, giving similar problems as for full MCMC simulation.

Given the main framework, the user is free to specify the algorithm. However, care has to be taken in order to make the algorithm work properly both at a fixed time point for $N$ large enough and for fixed $N$ with time increasing. Storvik [28] demonstrated through some simulation experiments that the numerical errors introduced can accumulate linearly for some particle filters. This is, in particular, the case where unknown static parameters are present. Using more sophisticated filters can, however, remove this error accumulation. This is the aim of the filter that will be presented in the next section.

Assuming $E[h(x_t) \mid y_{1:t}]$ is of interest, the posterior expectation is approximated by

  $\hat E[h(x_t) \mid y_{1:t}] = \sum_{i=1}^{N} w_t^i \, h(x_t^i) \Big/ \sum_{i=1}^{N} w_t^i.$   (5)

Recently, some theoretical results on particle filters and their associated Monte Carlo estimates have appeared. Berzuini et al. [14] established a central limit theorem for the estimator (5) for the sequential importance sampling approach with resampling at each stage. More general results on convergence are given in [29], which shows that most algorithms proposed will converge properly. These results are, however, based on increasing the number of particles to infinity. In [3], it is proven that, under certain conditions, the error in the approximative distribution remains stable if $N$ grows sufficiently fast as $t$ increases. The order can probably be improved by introducing additional conditions or by constructing more efficient filters. Still, however, to the author's knowledge, there are no general results on how errors propagate in time when $N$ is fixed.

III. PARTICLE FILTERS INCLUDING NONDYNAMIC PARAMETERS

In this section, an approach for particle filtering in the presence of unknown parameters will be discussed. The usual approach is to include the parameters $\theta$ as part of the state vector $x_t$. Because of the nondynamic feature of the parameters, samples of $\theta$ at time $t$ can only take the values generated at the initial time point. Since some of these values become very unlikely when new observations arrive, this will result in an impoverishment of the set of distinct values.

The approach taken in this paper is based on a different idea. Assume that the posterior distribution of $\theta$ given $x_{1:t}$ and $y_{1:t}$ depends on some sufficient statistics $T_t = T_t(x_{1:t}, y_{1:t})$, where $T_t$ is easy to update recursively. Assume that an approximate particle set $\{x_{1:t-1}^i,\ i = 1, \ldots, N\}$ is available from the posterior distribution $p(x_{1:t-1} \mid y_{1:t-1})$. Again, the particle set is to be updated to a new particle set at time $t$. Even though only the $x$ process in addition to the sufficient statistics will be stored, simulation simplifies if $\theta$ is included as an ancillary variable in the simulation step. The approach is based on the following:

  $p(x_{1:t}, \theta \mid y_{1:t}) = p(x_{1:t-1} \mid y_{1:t-1}) \, p(\theta \mid T_{t-1}) \, p(x_t \mid x_{t-1}, \theta) \, p(y_t \mid x_t, \theta) / C_t$   (6)

where $C_t = p(y_t \mid y_{1:t-1})$, which is a constant not depending on $\theta$ or $x_{1:t}$. Using the approximation in (3), simulation from (6) can be performed as before, but with the additional step that $\theta$ also needs to be simulated. The simplest approach would be to simulate $x_{1:t-1}$ from (3), $\theta$ from $p(\theta \mid T_{t-1})$, $x_t$ from $p(x_t \mid x_{t-1}, \theta)$, and accept with probability proportional to $p(y_t \mid x_t, \theta)$. However, any simulation technique such as sampling/importance resampling or MCMC can be applied. In addition, the SIS approach can be used.

The important part about this approach is that the parameter simulated at time $t$ does not depend on values simulated at previous time points. This avoids the problem with impoverishment.

In principle, the existence of a low-dimensional sufficient statistic for $\theta$ is not necessary because only evaluation or simulation from $p(\theta \mid x_{1:t-1}, y_{1:t-1})$ is needed, as noted by Liu and Chen [18]. However, in order to make the filter run fast and not have increasing complexity as time evolves, the need for $p(\theta \mid x_{1:t-1}, y_{1:t-1})$ only to depend on $(x_{1:t-1}, y_{1:t-1})$ through $T_{t-1}$ becomes apparent.

Following [25], an SIS with resampling (SISR) algorithm, which includes static parameters, is defined as follows.

Importance Sampling: For $i = 1, \ldots, N$
• sample $\theta^i \sim q_1(\theta \mid x_{1:t-1}^i, y_{1:t})$;
• sample $x_t^i \sim q_2(x_t \mid x_{t-1}^i, \theta^i, y_t)$ and define $x_{1:t}^i = (x_{1:t-1}^i, x_t^i)$;
• evaluate the importance weights

  $w_t^i = \dfrac{p(\theta^i \mid T_{t-1}^i) \, p(x_t^i \mid x_{t-1}^i, \theta^i) \, p(y_t \mid x_t^i, \theta^i)}{q_1(\theta^i \mid x_{1:t-1}^i, y_{1:t}) \, q_2(x_t^i \mid x_{t-1}^i, \theta^i, y_t)}.$

Resampling: For $i = 1, \ldots, N$
• sample an index $j(i)$ from $\{1, \ldots, N\}$ with probabilities proportional to $w_t^1, \ldots, w_t^N$;
• put $\tilde x_{1:t}^i = x_{1:t}^{j(i)}$, $\tilde\theta^i = \theta^{j(i)}$, and $\tilde T_t^i = T_t(T_{t-1}^{j(i)}, x_t^{j(i)}, y_t)$.

Here, $q_1$ and $q_2$ are proposal distributions for $\theta$ and $x_t$, respectively. Typically, $q_1(\theta \mid x_{1:t-1}, y_{1:t}) = p(\theta \mid T_{t-1})$ and $q_2(x_t \mid x_{t-1}, \theta, y_t) = p(x_t \mid x_{t-1}, \theta)$ in order to make both simulation and computation fast. $T_t = T_t(T_{t-1}, x_t, y_t)$ is a function updating the sufficient statistics.
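A sketch of the bookkeeping in this algorithm, again in Python/NumPy, may help fix ideas. It assumes the typical proposal choices $q_1(\theta \mid \cdot) = p(\theta \mid T_{t-1})$ and $q_2(x_t \mid \cdot) = p(x_t \mid x_{t-1}, \theta)$, so that the weights reduce to $p(y_t \mid x_t, \theta)$. The functions `sample_theta_given_T`, `sample_system`, `loglik_obs`, and `update_T` are model-specific placeholders introduced for this illustration; the listing shows the structure of the method, not the author's implementation.

```python
import numpy as np

def storvik_sisr(y, init_particles, sample_theta_given_T, sample_system,
                 loglik_obs, update_T, rng=None):
    """SISR with static parameters handled via sufficient statistics.

    init_particles        : (x0, T0), arrays of length N (states, statistics)
    sample_theta_given_T  : function(T, rng) -> theta drawn from p(theta | T)
    sample_system         : function(x_prev, theta, rng) -> p(x_t | x_{t-1}, theta)
    loglik_obs            : function(y_t, x, theta) -> log p(y_t | x, theta)
    update_T              : function(T_prev, x_t, y_t) -> T_t (recursive update)
    """
    rng = np.random.default_rng() if rng is None else rng
    x, T = init_particles
    N = len(x)
    for y_t in y:
        # Importance sampling step: theta is redrawn from p(theta | T_{t-1}),
        # so it never depends on values simulated at earlier time points.
        theta = sample_theta_given_T(T, rng)
        x = sample_system(x, theta, rng)
        logw = loglik_obs(y_t, x, theta)
        w = np.exp(logw - np.max(logw))
        w /= w.sum()
        # Resampling step: indices drawn with probability proportional to w.
        idx = rng.choice(N, size=N, p=w)
        x = x[idx]
        theta = theta[idx]
        # Update the sufficient statistics for the surviving particles.
        T = update_T(T[idx], x, y_t)
    return x, theta, T
```

The resampled `theta` values are the draws of the static parameters that, as noted above, are directly available through the algorithm; Rao–Blackwellized estimates would instead average $p(\theta \mid T_t^i)$ over the particles.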
As for the ordinary SISR algorithm, resampling can be performed at each time step or according to some specific rules [12]. Stratified sampling [27] and other variance-reduction methods can also easily be incorporated.

In the case of resampling at each time step and with $q_1(\theta \mid x_{1:t-1}, y_{1:t}) = p(\theta \mid T_{t-1})$, the algorithm can be seen as a special case of the resample-move algorithm due to [26], where the move step at time $t$ corresponds to sampling $\theta$ from its full conditional distribution.

Samples of $\theta$ from $p(\theta \mid y_{1:t})$ are directly available through the algorithm. In order to estimate $\theta$, a better approach is to use Rao–Blackwellization, as described in [25].

IV. GAUSSIAN-BASED SYSTEM PROCESSES

In this section, some particular classes of models will be discussed. Only models for system processes will be considered because these can be discussed in general terms. When unknown parameters are present in the observational distribution, special treatment usually is needed. This is therefore discussed through specific examples in Section V. In many cases, the parameters involved in the observation process are given from other sources. In such cases, no extra treatment is needed.

We will concentrate on Gaussian-based models. This is both because such models are commonly applied as system processes and because sufficient statistics are easily calculated for this class of models.

A. Gaussian System Processes

A particularly useful class of models is obtained when the underlying state process is Gaussian but the observation distribution is arbitrary [though following (1b)]. Assume

  $x_t = F(x_{t-1}) a + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2 Q)$   (7)

where $F(x_{t-1})$ is a matrix with elements possibly nonlinear functions of $x_{t-1}$. The unknown parameters are, in this case, $a$ and $\sigma^2$ ($Q$ is assumed known; the general case can also be handled but becomes much more complex).

Assume $a$ and $\sigma^2$ to have priors $a \mid \sigma^2 \sim N(a_0, \sigma^2 C_0)$ and $\sigma^2 \sim IG(\alpha_0, \gamma_0)$, where $IG$ is the inverse Gamma distribution. Then, a trivial extension of the standard theory [2] yields

  $a \mid x_{1:t}, \sigma^2 \sim N(a_t, \sigma^2 C_t)$   (8a)
  $\sigma^2 \mid x_{1:t} \sim IG(\alpha_t, \gamma_t)$   (8b)

where the sufficient statistics $a_t$, $C_t$, $\alpha_t$, and $\gamma_t$ are updated according to the equations

  $F_t = F(x_{t-1})$   (9a)
  $C_t^{-1} = C_{t-1}^{-1} + F_t' Q^{-1} F_t$   (9b)
  $a_t = C_t \left( C_{t-1}^{-1} a_{t-1} + F_t' Q^{-1} x_t \right)$   (9c)
  $\alpha_t = \alpha_{t-1} + d/2$   (9d)
  $\gamma_t = \gamma_{t-1} + \tfrac{1}{2}\left( x_t' Q^{-1} x_t + a_{t-1}' C_{t-1}^{-1} a_{t-1} - a_t' C_t^{-1} a_t \right)$   (9e)

where $d$ is the dimension of $x_t$. The particle filter approach described in Section III can therefore easily be applied.

The choice of distributions for priors on $a$ and $\sigma^2$ is crucial in order to obtain the analytically tractable forms (8a) and (8b) but should be sufficient in most cases. Priors with little information can be obtained by choosing $C_0$ large and $\alpha_0$ and $\gamma_0$ small. More complex priors are possible to obtain by using mixtures of Gaussians (for $a$) and mixtures of inverse gamma distributions (for $\sigma^2$). Choosing such mixtures as priors will change the posteriors (8a) and (8b) to mixtures of Gaussians and inverse gamma distributions, respectively.
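The recursions in (9), as reconstructed above, translate directly into code. The following sketch assumes the simplest case $Q = I$ and shows both the statistic update and a draw of $(a, \sigma^2)$ from the resulting posterior (8); the function and variable names are illustrative choices, not taken from the paper.

```python
import numpy as np

def update_sufficient_stats(stats, F_t, x_t):
    """One-step update of (a_t, C_t^{-1}, alpha_t, gamma_t) for the conjugate
    model a | sigma^2 ~ N(a_t, sigma^2 C_t), sigma^2 ~ InvGamma(alpha_t, gamma_t),
    given x_t = F_t a + N(0, sigma^2 I) noise (case Q = I).
    F_t is the regression matrix F(x_{t-1}); x_t is the new state vector."""
    mu, Cinv, alpha, gamma = stats
    Cinv_new = Cinv + F_t.T @ F_t                        # precision update (9b)
    mu_new = np.linalg.solve(Cinv_new, Cinv @ mu + F_t.T @ x_t)   # mean (9c)
    alpha_new = alpha + len(x_t) / 2.0                   # shape update (9d)
    gamma_new = gamma + 0.5 * (x_t @ x_t + mu @ (Cinv @ mu)
                               - mu_new @ (Cinv_new @ mu_new))    # scale (9e)
    return mu_new, Cinv_new, alpha_new, gamma_new

def sample_theta(stats, rng):
    """Draw (a, sigma^2) from the posterior (8) defined by the statistics."""
    mu, Cinv, alpha, gamma = stats
    sigma2 = 1.0 / rng.gamma(alpha, 1.0 / gamma)         # inverse-gamma draw
    a = rng.multivariate_normal(mu, sigma2 * np.linalg.inv(Cinv))
    return a, sigma2
```

In the particle filter of Section III, `update_sufficient_stats` plays the role of $T_t(T_{t-1}, x_t, y_t)$ and `sample_theta` the role of drawing from $p(\theta \mid T_{t-1})$.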
B. Partial Non-Gaussian State Space

Shephard [24] introduced a class of non-Gaussian time-series models allowing the noise process to be T-distributed or a mixture of Gaussians. The particle filter approach presented in this paper is also applicable for this type of model. For illustration purposes, only T-distributed noise will be considered. Rewrite model (7) as

  $x_t = F(x_{t-1}) a + \varepsilon_t, \quad \varepsilon_t \mid \lambda_t \sim N(0, \sigma^2 \lambda_t^{-1} Q) \quad \text{and} \quad \lambda_t \sim \mathrm{Gamma}(\nu/2, \nu/2)$   (10)

with the $\{\varepsilon_t\}$ and $\{\lambda_t\}$ sequences being independent. The posterior distributions of $a$ and $\sigma^2$ given both $x_{1:t}$ and $\lambda_{1:t}$ become equal to (8), but the updating formulas (9) are slightly changed in that $F_t$ and $x_t$ enter scaled by $\lambda_t^{1/2}$.

The state vector needs in this case to be extended to include $\lambda_t$. Simulation conditional on $\lambda_t$ can be performed as in the Gaussian case with small modifications. Simulation of $\lambda_t$ given all the other variables is also easy, since its full conditional is again a gamma distribution with degrees-of-freedom parameter $\nu + d$, where $d$ is the dimension of $x_t$. Direct simulation of all variables involved is, however, no longer possible, but a blocked Gibbs sampler approach switching between sampling $\lambda_t$ and a block containing all the other variables can be applied.

V. EXPERIMENTS

In this section, some examples of dynamic models will be considered in order to evaluate the performance of the particle filter when some static parameters are unknown. For each example, a SISR filter including static parameters using $q_1(\theta \mid x_{1:t-1}, y_{1:t}) = p(\theta \mid T_{t-1})$ and $q_2(x_t \mid x_{t-1}, \theta, y_t) = p(x_t \mid x_{t-1}, \theta)$ is applied. Resampling is performed at each time step. The result is that the weights in (2) reduce to $p(y_t \mid x_t, \theta)$. More efficient filters, where the proposal distribution depends on the new observation $y_t$, can, however, be constructed similarly to the standard SISR filters.

In each case, the estimated posterior distributions are compared with those obtained from a full MCMC run at each time step using a huge number of iterations. For the example in Section V-A, a comparison is made toward other methods used in the literature.
In all examples, the number of particles $N$ has been chosen such that a reasonable agreement with the results obtained by the MCMC runs was obtained. Guidelines for specifying $N$ in practice are still missing, the rule of thumb being trial and error.

A. Linear Partial Gaussian Process

The first example is a simple linear model where the observation noise is assumed to follow a t-distribution. The model can be written as

  $x_t = a x_{t-1} + \sigma \varepsilon_t$   (11a)
  $y_t = x_t + \sigma_y \eta_t$   (11b)

where $\{\varepsilon_t\}$ and $\{\eta_t\}$ are independent zero-mean white noise processes, the first being Gaussian with variance equal to one, whereas the other follows a t-distribution with $\nu$ degrees of freedom. Such models can be used to allow for outliers in the observations [24]. The unknown static parameters are, in this case, $\theta = (a, \sigma^2, \sigma_y^2)$ ($\nu$ is assumed known).

Since the system process follows the model discussed in Section IV-A, the sufficient statistics for $(a, \sigma^2)$ can be updated according to the equations in (9). Given the model formulation above, no sufficient statistic for $\sigma_y^2$ is available. Rewrite, however, the observation model as

  $y_t = x_t + \sigma_y (\lambda_t/\nu)^{-1/2} \tilde\eta_t$

where $\tilde\eta_t$ is a standard Gaussian variable, whereas $\lambda_t$ is an independent $\chi^2_\nu$ variable. The distribution for $y_t$ is, of course, unaltered, but the point is that a sufficient statistic for $\sigma_y^2$ given $x_{1:t}$, $y_{1:t}$, and $\lambda_{1:t}$ is available. In particular, conditional on $\lambda_t$, the residual $y_t - x_t$ is Gaussian with variance $\sigma_y^2 \nu / \lambda_t$, so the conditional distribution of $\sigma_y^2$ is an inverse gamma distribution whose parameters are updated recursively through the scaled squared residuals $\lambda_t (y_t - x_t)^2$.

Similar to the approach discussed in Section IV-B, $(x_t, \lambda_t)$ is used as state vector in the particle filter. Note that simulation from the prior of $\lambda_t$ is simply to draw from a $\chi^2_\nu$ distribution since there is no dynamic structure in this variable.

Data were simulated from this model with fixed values of $a$, $\sigma^2$, $\sigma_y^2$, and $\nu$. For $\sigma^2$ and $\sigma_y^2$, an inverse Gamma prior distribution with both shape and scale parameters equal to 0.5 was used. The prior distribution for $a$ was chosen to be Gaussian.
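To illustrate the scale-mixture trick used above, the following small sketch simulates data from a model of the form (11) and represents the $t_\nu$ observation noise through the auxiliary $\chi^2_\nu$ variable $\lambda_t$, which is precisely the representation that makes a sufficient statistic for $\sigma_y^2$ available. The numerical constants are arbitrary illustration values, not the values used in the paper.

```python
import numpy as np

def simulate_model11(T, a=0.8, sigma=1.0, sigma_y=0.5, nu=4, rng=None):
    """Simulate the AR(1) state (11a) with t-distributed observation noise (11b),
    writing the t_nu noise as sigma_y * eta / sqrt(lambda / nu), lambda ~ chi2_nu."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.zeros(T)
    y = np.zeros(T)
    lam = rng.chisquare(nu, size=T)          # auxiliary scale variables lambda_t
    for t in range(T):
        x_prev = x[t - 1] if t > 0 else 0.0
        x[t] = a * x_prev + sigma * rng.normal()
        # Conditional on lam[t], the observation noise is Gaussian, which is
        # what yields conjugate inverse-gamma updates for sigma_y**2.
        y[t] = x[t] + sigma_y * np.sqrt(nu / lam[t]) * rng.normal()
    return x, y, lam
```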
In Fig. 1, posterior means and quantiles obtained from the SISR filter using $N$ particles are plotted together with the same quantities calculated using full MCMC at each time step. For the $x$ process and the estimates of $x_t$ at each time step, both posterior means and quantiles are almost identical. Some discrepancy is present for the estimates of the variance parameters, but the results are still acceptable. In addition, the estimates for $a$ are very good.

Fig. 1. Model (11). Posterior expectations (solid lines) and 0.025 and 0.975 quantiles (dashed lines) based on a particle filter (black) and full MCMC (grey). In the middle left panel, the estimates of $\{x_t\}$ are given in black, whereas the true values are given in grey. (a) Estimates of $x_t$. (b) Estimates and true values of $x_t$. (c) Estimates of $a$. (d), (e) Estimates of the two variance parameters.

Other filters have been proposed in the literature for handling static parameters. We have compared the filters discussed in this paper with two other approaches. Both approaches include $\theta$ into the state vector. The first approach retains the assumption that $\theta$ is static (as in Kitagawa [30]). In the second approach, following Bølviken et al. [21] and Liu and West [22], diversity is introduced into the parameter. In both cases, variance parameters are included on a log scale in order to ensure positivity. For further reference, the filter based on marginalization and sufficient statistics is called the marginalization filter, whereas the filters based on including $\theta$ into the state vector are called the static-augmentation filter and the diversity filter, the latter including diversity into the parameter.

Neither the static-augmentation filter nor the diversity filter worked properly with the priors used for the marginalization filter. The main problem was to get proper estimates of $\sigma_y^2$. In order to give some comparison, the prior for $\sigma_y^2$ was therefore changed to an inverse Gamma distribution with shape and scale parameters equal to 10. This gives a more informative prior, making estimation of $\sigma_y^2$ easier. For all filters, the same number of particles was used. Fig. 2 compares these approaches. In this case, the marginalization filter gave almost identical results compared with MCMC. For the filters that include $\theta$ in the state vector, the problems in estimating $\sigma_y^2$ are clearly seen. Reasonable estimates of $x_t$ are obtained, but the quantiles are very different from the ones obtained by MCMC. The problems in estimating the static parameters also influence the inference for $x_t$ in that the credibility intervals become wider. The problem with impoverishment is clearly seen for the filter that keeps $\theta$ static: eventually, only a single value of $\theta$ from the particles initially drawn has survived. This value is then impossible to change at later time points.

For this example, a comparison of computer times for the different algorithms was performed. Fig. 3 shows computer times used for each time point for the marginalization filter (solid line), the static-augmentation filter (dotted line), and the MCMC algorithm (dashed line). Times used for the two particle filters are almost identical. The number of iterations used for MCMC was fixed for each time step, giving the linear increase in computation time.
This number was chosen such that there was reasonable agreement with results obtained using a huge number of MCMC iterations. A block Gibbs sampler [with the whole state vector $x_{1:t}$ sampled simultaneously] was used for the MCMC algorithm. All computer times were obtained from C programs run on a dual Pentium II 350-MHz computer with 128 MB RAM and 512 kB cache. The differences in computer time between the two particle filter algorithms are negligible. The advantage in computer time of the particle filters compared with MCMC is clearly seen. In practice, with the number of time points increasing, the number of iterations in the MCMC algorithm also typically needs to be increased, giving even higher computational cost for the MCMC algorithm. Better MCMC algorithms using Metropolis–Hastings steps could possibly decrease computation time but would not change the main picture: while the computational cost for particle filters remains constant with time, it increases (linearly or more) for MCMC algorithms.

Fig. 2. Model (11) with an informative prior on the observation variance. Posterior expectations (solid lines) and 0.025 and 0.975 quantiles (dashed lines) from particle filters (black) and MCMC (grey). First row shows estimates for $x_t$, second row for $a$, and the third and fourth rows for the two variance parameters. First column is the marginalization filter, second the static-augmentation filter, and third the diversity filter.
Fig. 3. Comparison of computation times using the two particle filters (solid and dotted lines) and an MCMC algorithm (dashed line). The scale on the y axis is in seconds.

B. Dynamic Generalized Linear Model

West et al. [31] considered a general class of dynamic Bayesian models. They studied the case where the underlying system process is linear and the distribution for the observation conditioned on the underlying state vector is in the exponential family. We will consider applications where the observed data are (possibly multivariate) binary, making the logistic model an obvious choice. Such models have been used in, e.g., ecology [32], where a number of observers indicate whether the population at the current time is either high or low. Here, only a simplified version will be considered:

  $x_t = a x_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2)$   (12a)
  $y_{i,t} \mid x_t \sim \mathrm{Binom}(1, p_t), \qquad \mathrm{logit}(p_t) = c + b x_t$   (12b)

for $i = 1, \ldots, r$. The unknown static parameters are $\theta = (a, \sigma^2, c, b)$.

Data were simulated according to the model with fixed parameter values. Assume first that the parameters in the observation process, $c$ and $b$, are known, while, a priori, $a$ is given a Gaussian prior and $\sigma^2$ an inverse gamma prior. In this particular case, the recursions given in (9) can be applied to update the sufficient statistics. Fig. 4 shows posterior means and quantiles obtained from the SISR filter using $N$ particles. The same quantities calculated using full MCMC at each time step are also plotted for comparison. For all estimates, both posterior means and quantiles are almost identical.

Fig. 4. Model (12) with $c$ and $b$ known. Posterior expectations (solid lines) and 0.025 and 0.975 quantiles (dashed lines) based on a particle filter (black) and full MCMC (grey). In the upper right panel, the estimates of $\{x_t\}$ are given in black, whereas the true values are given in grey. (a) Estimates of $x_t$. (b) Estimates and true values of $x_t$. (c) Estimates of $a$. (d) Estimates of $\sigma^2$.

Turn now to the case when $c$ and $b$ are also unknown. In this case, it will be advantageous to reparameterize the model such that all the parameters become part of the system process. This can be done by defining $z_t = b x_t$. Then, the model can be written as

  $z_t = a z_{t-1} + \tau \varepsilon_t$   (13a)
  $y_{i,t} \mid z_t \sim \mathrm{Binom}(1, p_t), \qquad \mathrm{logit}(p_t) = c + z_t$   (13b)

where $\tau = b\sigma$. Note that $b$ and $\sigma$ are only identifiable through their product. This means that the $\{x_t\}$ process can only be recovered up to an unknown scale factor. In the ecology example referred to previously, this is not a serious concern since interest is primarily in the seasonal patterns of the process.
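As an illustration of how the binary observations enter the filter, the following sketch computes the log-likelihood weight contribution for an observation model of the form (12b) with $r$ conditionally independent binary observers. The logistic parameterization `c + b * x` and the default values are assumptions made for this example only.

```python
import numpy as np

def loglik_binary_obs(y_t, x, c=0.0, b=1.0):
    """log p(y_t | x_t) for r independent Bernoulli observers sharing
    logit(p_t) = c + b * x_t.  y_t is a 0/1 NumPy vector of length r,
    x an array of particle values; returns one log-weight per particle."""
    eta = c + b * x                              # logit of success probability
    # Stable Bernoulli log-likelihood: sum_i [y_i * eta - log(1 + exp(eta))]
    log1pexp = np.logaddexp(0.0, eta)
    return y_t.sum() * eta - len(y_t) * log1pexp
```

In a filter, this function would play the role of `loglik_obs` in the earlier sketches, with the state propagated by the Gaussian AR(1) system (12a) or (13a).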
We will assume a priori Gaussian priors for $a$ and $c$ and an inverse gamma prior for $\tau^2$. In this case, the model does not fit into (7) because $b$ and $\sigma$ appear in the model only through their product. In addition, direct simulation from the conditional distribution of the parameters given $z_{1:t}$ and $y_{1:t}$ is not possible. It is, however, easy to show that this conditional distribution depends on $(z_{1:t}, y_{1:t})$ only through a low-dimensional sufficient statistic. A SISR algorithm that samples approximately from this conditional distribution using a few Gibbs sampling steps was therefore applied. $z_t$ was simulated from the system distribution $p(z_t \mid z_{t-1}, \theta)$. The weights were put to $p(y_t \mid z_t, \theta)$, ignoring the error introduced by using the approximative Gibbs sampling algorithm for simulating the static parameters.

In Fig. 5, posterior means and quantiles obtained from the SISR filter using $N$ particles and five Gibbs sampler steps are plotted together with the same quantities calculated using full MCMC at each time step. The free BUGS software [33] was used for the MCMC runs. Also in this case, there is a nice agreement between the estimates obtained by the particle filter and the ones given by the full MCMC runs.

Fig. 5. Model (12) with $c$ and $b$ unknown. Posterior expectations (solid lines) and 0.025 and 0.975 quantiles (dashed lines) based on a particle filter (black) and full MCMC (grey). (a) Estimates of $x_t$. (b) Estimates and true values of $x_t$ (true values in grey). (c)–(e) Estimates of the static parameters.

C. Gamma–Poisson Model

Consider the model

  $x_t = a x_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2)$   (14a)
  $y_t \mid x_t \sim \mathrm{Poisson}(x_t^2)$   (14b)

Note that marginally, $x_t^2$ follows a gamma distribution; therefore, this can also be described as a Gamma–Poisson process, where $a$ controls the autocorrelation of the Gamma process, whereas the stationary variance $\sigma^2/(1-a^2)$ defines the scale of the Gamma variable. Gamma–Poisson processes have been considered in, e.g., [8].

Fig. 6. Model (14). Posterior expectations (solid lines) and 0.025 and 0.975 quantiles (dashed lines) based on a particle filter using N = 2000 particles (black) and N = 50 000 particles (grey). In (b), the estimates of $\{|x_t|\}$ are given in black, whereas the true values are given in grey. (a) Estimates of $|x_t|$. (b) Estimates and true values of $|x_t|$. (c) Estimates of $a$. (d) Estimates of $\sigma^2$.

Fig. 6 shows the results obtained by using a SISR filter on this example, where data were simulated from the true model using fixed values of $a$ and $\sigma^2$. An inverse Gamma distribution with shape and scale parameters equal to 0.5 was used as prior for $\sigma^2$, whereas a Gaussian prior was used for $a$. Because the posterior distribution of $x_t$ becomes bimodal in this case, the ordinary Monte Carlo estimate (5) for $x_t$ would not be sensible to use. Rather, the whole distribution should be reported. In order to evaluate the performance of the algorithm in this case, estimates and quantiles of $|x_t|$ are shown instead.

The filter was applied with N = 2000 particles. Constructing efficient MCMC algorithms for this model is difficult. Instead, a comparison with a filter using N = 50 000 particles was performed. Again, the results are quite satisfactory.
SISR filter using particles and five Gibbs sam-
pler steps is plotted together with the same quantities calculated VI. DISCUSSION
using full MCMC at each time step. The free software [33] The particle filter is a powerful method for processing a huge
was used for the MCMC runs. In addition, in this case, there is range of dynamic models. This paper discusses an approach
a nice agreement between the estimates obtained by the particle based on the particle filter for tackling unknown parameters.
filter and the ones given by the full MCMC runs. The approach has been tested on several different models, all
giving estimates almost identical to the ones obtained by run-
C. Gamma–Poisson Model ning full MCMC at each time point. Running full MCMC is not
Consider the model practical in real-time processing because the number of vari-
(14a) ables to be simulated increases in time. In contrast, the particle
filters discussed in this paper only need a fixed amount of com-
(14b) putation at each time point, making real-time processing pos-
Note that marginally, ; there- sible.
fore, this can also be described as a Gamma–Poisson process, In order to make this approach work in real time, a crucial
where controls the autocorrelation of the Gamma-process, assumption is that the posterior distribution for the parameters
whereas defines the scale of the Gamma vari- depends on the underlying system process only through some
able. Gamma–Poisson processes has been considered in, e.g., sufficient statistics that can be updated recursively. For many
[8]. models commonly applied, such sufficient statistics exist. In
In some other cases, the state vector can be extended by an additional variable, for which the extended model fits into the framework. When sufficient statistics are not available, the approach can still be applied, but the computational complexity will increase with time.

In this paper, only Gaussian-based system processes combined with general observation distributions have been considered, in which case the sufficient statistics involved can be updated using Kalman-type filters. The approach should, however, be possible to apply also for many other types of models for the system process. In particular, cases where the underlying state vector is a discrete-valued Markov model can be handled using hidden Markov chain algorithms for updating the sufficient statistics [23]. In addition, mixtures of discrete-valued Markov models and Gaussian-based models can be handled. Any kind of distribution can be approximated by Gaussian mixtures by making the number of mixtures large enough. The computation time will, however, increase with the number of mixtures.

ACKNOWLEDGMENT

This work was finalized during the author's sabbatical at the Department of Statistics and Demography at the University of Odense. The author would like to thank the department for allowing him to use all their facilities during this period. He is also grateful to B. Storvik and B. Natvig and the anonymous referees for valuable comments on this paper.

REFERENCES

[1] A. C. Harvey, Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge, U.K.: Cambridge Univ. Press, 1989.
[2] M. West and J. Harrison, Bayesian Forecasting and Dynamic Models, 2nd ed. New York: Springer-Verlag, 1997.
[3] H. R. Künsch, "State space and hidden Markov models," in Complex Stochastic Systems, O. E. Barndorff-Nielsen, D. R. Cox, and C. Klüppelberg, Eds. London, U.K.: Chapman & Hall, 2001, vol. 87, pp. 109–173.
[4] B. D. O. Anderson and J. B. Moore, Optimal Filtering. Englewood Cliffs, NJ: Prentice-Hall, 1979.
[5] N. Shephard, "Statistical aspects of ARCH and stochastic volatility," in Time Series Models with Econometric, Finance and Other Applications, D. R. Cox, D. V. Hinkley, and O. E. Barndorff-Nielsen, Eds. London, U.K.: Chapman & Hall, 1996, pp. 1–67.
[6] C. K. Wikle, M. Berliner, and N. Cressie, "Hierarchical Bayesian space-time models," Environ. Ecolog. Statist., vol. 5, no. 2, 1998.
[7] M. Ghil and K. Ide, "Data assimilation in meteorology and oceanography: Theory and practice," J. Meteor. Soc. Jpn., vol. 75, no. 1B, pp. 111–496, 1997.
[8] B. Jørgensen, S. Lundbye-Christensen, P. X. K. Song, and L. Sun, "A longitudinal study of emergency room visits and air pollution for Prince George, British Columbia," Statist. Med., vol. 15, pp. 823–836, 1996.
[9] D. Gamerman, "Markov chain Monte Carlo for dynamic generalized linear models," Biometrika, vol. 85, no. 1, pp. 215–227, 1998.
[10] P. Müller, "Monte Carlo integration in general dynamic models," Contemp. Math., vol. 115, pp. 145–163, 1991.
[11] N. Gordon, D. Salmond, and A. F. M. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation," Proc. Inst. Elect. Eng. F, vol. 140, no. 2, pp. 107–113, 1993.
[12] A. Kong, J. S. Liu, and W. H. Wong, "Sequential imputations and Bayesian missing data problems," J. Amer. Statist. Assoc., vol. 89, no. 425, pp. 278–288, 1994.
[13] G. Kitagawa, "Monte Carlo filter and smoother for non-Gaussian nonlinear state-space models," J. Comput. Graph. Statist., vol. 5, pp. 1–25, 1996.
[14] C. Berzuini, N. G. Best, W. R. Gilks, and C. Larizza, "Dynamic conditional independence models and Markov chain Monte Carlo methods," J. Amer. Statist. Assoc., vol. 92, pp. 1403–1412, 1997.
[15] M. Isard and A. Blake, "Condensation—Conditional density propagation for visual tracking," Int. J. Comput. Vision, vol. 29, no. 1, pp. 5–28, 1998.
[16] A. Doucet, "On sequential simulation-based methods for Bayesian filtering," CUED/F-INFENG, Tech. Rep. TR.310, 1998.
[17] A. Doucet, N. de Freitas, and N. Gordon, Eds., Sequential Monte Carlo in Practice. New York: Springer-Verlag, 2001.
[18] J. S. Liu and R. Chen, "Sequential Monte Carlo methods for dynamic systems," J. Amer. Statist. Assoc., vol. 93, no. 443, pp. 1032–1044, 1998.
[19] M. K. Pitt and N. Shephard, "Filtering via simulation: Auxiliary particle filters," J. Amer. Statist. Assoc., vol. 94, no. 446, pp. 590–599, 1999.
[20] M. West, "Mixture models, Monte Carlo, Bayesian updating and dynamic models," Comput. Sci. Statist., vol. 24, pp. 325–333, 1993.
[21] E. Bølviken, P. J. Acklam, N. Christophersen, and J. M. Størdal, "Monte Carlo filters for nonlinear state estimation," Automatica, vol. 37, no. 2, pp. 177–183, 1997.
[22] J. Liu and M. West, "Combined parameter and state estimation in simulation-based filtering," in Sequential Monte Carlo in Practice, A. Doucet, N. de Freitas, and N. Gordon, Eds. New York: Springer-Verlag, 2001, pp. 197–223.
[23] T. C. Clapp and S. J. Godsill, "Fixed-lag smoothing using sequential importance sampling," in Bayesian Statistics 6, J. M. Bernardo, J. O. Berger, P. Dawid, and A. F. M. Smith, Eds. Oxford, U.K.: Oxford Univ. Press, 1999, pp. 743–751.
[24] N. Shephard, "Partial non-Gaussian state space," Biometrika, vol. 81, no. 1, pp. 115–131, 1994.
[25] A. Doucet, S. Godsill, and C. Andrieu, "On sequential Monte Carlo sampling methods for Bayesian filtering," Statist. Comput., vol. 10, pp. 197–208, 2000.
[26] W. R. Gilks and C. Berzuini, "Following a moving target—Monte Carlo inference for dynamic Bayesian models," J. R. Statist. Soc. B, vol. 63, pp. 1–20, 2001.
[27] T. R. Carpenter, P. Clifford, and P. Fearnhead, "Building robust simulation-based filters for evolving data sets," Tech. Rep., Dept. Statist., Univ. Oxford, Oxford, U.K., 1998.
[28] G. Storvik, "Some further topics on Monte Carlo methods for dynamic Bayesian problems," in Models and Inference in HSSS: Recent Developments and Perspectives, 2002, to be published.
[29] D. Crisan and A. Doucet, "Convergence of sequential Monte Carlo methods," 2000, submitted for publication.
[30] G. Kitagawa, "A self-organized state-space model," J. Amer. Statist. Assoc., vol. 93, no. 443, pp. 1203–1215, 1998.
[31] M. West, J. Harrison, and H. S. Migon, "Dynamic generalized linear models and Bayesian forecasting," J. Amer. Statist. Assoc., vol. 80, pp. 73–97, 1985.
[32] E. Bølviken and G. Storvik, "Deterministic and stochastic particle filters in state-space models," in Sequential Monte Carlo in Practice, A. Doucet, N. de Freitas, and N. Gordon, Eds. New York: Springer-Verlag, 2001, pp. 97–116.
[33] D. Spiegelhalter, A. Thomas, N. Best, and W. Gilks, "BUGS 0.5: Bayesian inference using Gibbs sampling, manual (version ii)," MRC Biostatist. Unit, Inst. Public Health, Cambridge, U.K., 1996.

Geir Storvik was born in Hitra, Norway, in 1962. He received the Cand.Sci. degree in mathematical statistics in 1986 and the Dr.Sci. degree in mathematical statistics in 1993, both from the University of Oslo, Oslo, Norway.

From 1987 to 1993, he was a Research Statistician at the Norwegian Computing Center, working with reservoir simulation and image analysis, while he pursued the Dr.Sci. degree in mathematical statistics at the University of Oslo, working with contour detection in noisy images. He was with Stanford University, Stanford, CA, from 1988 to 1989 and with Odense University, Odense, Denmark, from 2000 to 2001. Currently, he is an Associate Professor with the Mathematical Institute, University of Oslo. His main research interests are in the fields of stochastic simulation, statistical computing, state-space modeling, space- and space-time modeling, and image analysis.
