STRUCTURAL CONTROL AND HEALTH MONITORING

Struct. Control Health Monit. (2016)


Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/stc.1874

Particle filtering and marginalization for parameter identification in structural systems

Audrey Olivier and Andrew W. Smyth*,†


Department of Civil Engineering and Engineering Mechanics, Columbia University, New York, NY 10027, USA

SUMMARY
In structural health monitoring, one wishes to use available measurements from a structure to assess structural
condition, localize damage if present, and quantify remaining life. Nonlinear system identification methods are
considered that use a parametric, nonlinear, physics-based model of the system, cast in the state-space framework.
Various nonlinear filters and parameter learning algorithms can then be used to recover the parameters and quantify
uncertainty. This paper focuses on the particle filter (PF), which shows the advantage of not assuming Gaussianity
of the posterior densities. However, the PF is known to behave poorly in high dimensional spaces, especially when static
parameters are added to the state vector. To improve the efficiency of the PF, the concept of Rao–Blackwellisation is
applied, that is, we use conditional linearities present in the equations to marginalize out some of the
states/parameters and infer their conditional posterior pdf using the Kalman filtering equations. This method has been
studied extensively in the particle filtering literature, and we start our discussion by improving upon and applying
two well-known algorithms on a benchmark structural system. Then, noticing that in structural systems, high nonlinearities
are often localized while the remaining equations are bilinear in the states and parameters, a novel algorithm is
proposed, which combines this marginalization approach with a second-order extended Kalman filter. This new
approach enables us to marginalize out all the states/parameters, which do not contribute to any high nonlinearity in
the equations and, thus, improve identification of the unknown parameters. Copyright © 2016 John Wiley & Sons, Ltd.

Received 16 September 2015; Revised 30 March 2016; Accepted 1 April 2016

KEY WORDS: particle filter; Rao–Blackwellisation; parameter identification; nonlinear estimation; second-order EKF

1. INTRODUCTION
In the last decade, structural health monitoring (SHM) has become an important area of research as it
shows great potential for life-safety and economic benefits for improved and responsible management
of our aging infrastructure. SHM provides the potential to detect damage and, if possible, its location
and extent, in order to inform the owner about the remaining lifetime of a structure.
As has been noted by others ([1]), consideration of nonlinear effects in SHM is very important as 1)
damage can cause a structure that initially behaved linearly to exhibit nonlinear behavior (e.g.,
opening/closing of a crack in a beam is one simple example of a nonlinear behavior), and 2) many
structures actually behave nonlinearly even within their undamaged state, for example around connections
and joints. Furthermore, a common challenge in SHM is that damage is usually a local phenomenon
([2]) and may not significantly influence the global response of the structure typically measured
by features used for linear systems (natural frequencies, mode shapes, etc.). Using a more precise (high
fidelity) parametric representation of the system accounting for numerous degrees of freedom and
localized nonlinearities would enable one to more easily locate and quantify damage.

*Correspondence to: Andrew Smyth, Department of Civil Engineering and Engineering Mechanics, Columbia University, New
York, NY 10027, USA.

E-mail: [email protected]

The idea is then to use a parametric model of the structure (using the equations of motion) that can be
discretized in time and cast in state-space form. In the event that there is uncertainty in the excitation,
environmental disturbance, or modeling error, some process noise may exist in the system equation. The
response of structural systems is assumed to be measured at (or between) certain degrees of freedom.
Because of real-life imperfections and sensor noise, one must also consider a potential measurement noise
in the measurement equations. Nonlinear system identification tools are then used to recover the parameters
of the model (stiffness, damping, and possible nonlinear parameters) and thus identify possible damage.
For state filtering, that is, inference of the hidden (non-measured) states, the unscented Kalman filter
(UKF) and the particle filter (PF) are both extensively studied in the literature. For parameter estimation,
many different algorithms can be used: point estimates can be obtained via optimization algorithms
(minimization of an error function, maximization of the likelihood, and expectation–maximization algorithm),
or one can perform Bayesian inference, that is, infer the whole posterior density function (pdf) of the
parameters knowing the measurements (Markov Chain Monte Carlo methods, joint state/parameter filters).
However, learning algorithms usually behave very poorly for medium and high dimensional systems,
because of the so-called curse of dimensionality. For example, maximizing the likelihood
function over a high dimensional parameter space can become very challenging as many directions
need to be searched and thus cannot usually be performed with simple optimization algorithms.
Markov Chain Monte Carlo methods, even though they are known to be able to beat the curse of
dimensionality, that is, they will always converge to the true posterior pdf, will need more and more
iterations as the dimension of the searching space increases. Thus, methods are being developed to
perform a more optimized search of the parameter space (for example the Asymptotically Independent
Markov Sampling algorithm, presented in [3]).
In this paper, we will look at online Bayesian inference algorithms, more precisely the PF. The PF
enables us to infer the true posterior pdf of the states and parameters, even for highly nonlinear,
non-Gaussian systems. However, it is also known to behave poorly in high dimensional spaces, and it also
suffers from a sample impoverishment issue, which makes parameter learning very hard. In this paper,
we will discuss a possible refinement for parameter learning, based on the concept of
Rao–Blackwellisation. The idea is to use the structure of the equations (more precisely conditional
linearities) to marginalize out some of the states and thus use the PF in a smaller dimensional space while
the remaining states are inferred using a bank of Kalman filters (KFs).
The concept of Rao–Blackwellisation has already been used for structural systems in [4,5]. The
authors Sajeeb et al. show that the Rao–Blackwellised PF for conditionally linear Gaussian (CLG) systems
cannot usually be directly applied to joint state/parameter estimation in structural systems because
equations are coupled. In their first paper, this issue is overcome by substructuring the system into several
linear and nonlinear sets. However, this method requires that such substructures exist, which will
not be the case if all parameters are unknown (which is the case treated here), because all equations will
be nonlinear through multiplication of an unknown parameter with an unknown state (e.g., kx and c ẋ).
In their second paper, Sajeeb et al. instead obtain the conditional pdf of all the states using a bank of
Kalman filters, conditioned on few states previously propagated with the particle filter. They show that
increasing the analyticity of the filtering algorithm (by using Kalman filtering as much as possible)
reduces the variance of the estimate, a conclusion that we will also observe with our algorithms.
Finally, in a third paper, [6], the same authors Sajeeb et al. propose a semi-analytical PF that consists
in locally linearizing the equations to obtain an ensemble of linearized systems, which can then be
solved using Kalman filtering and further be exploited to obtain samples for the particle filter. All these
methods perform considerably better than the generic PF as they provide more accurate results and
lower variance estimates.
The two algorithms presented in this paper in Sections 3 and 4 are in many ways related to the
algorithms presented in [4,5]. However, they differ on one major point: in those two sections, we
are trying to directly apply and improve upon algorithms that already exist and are extensively used
in the particle filtering literature in other fields (tracking, navigation…); namely, the marginalized
PF for mixed linear/nonlinear state-space models (called MLN-RBPF in the remainder of the paper)
developed in [7] and Storvik’s algorithm, developed in [8] and extensively used for static parameter
estimation. In this way, we hope to present algorithms which are easily generalizable. More importantly,
we present in the last section a new algorithm that combines the marginalized PF with

second-order extended Kalman filter (EKF2) propagation and update steps. This is carried out because
in structural systems, high nonlinearities are often localized while the remaining equations are bilinear
in the dynamic states and static parameters; thus, a second-order Taylor series expansion enables us to
find exactly the mean and covariance of many of the states and parameters, conditioned on the few
highly nonlinear ones. In this way, we are able to greatly reduce the dimension of the sampling space
of the particle filter, which would hopefully enable us to use this algorithm for medium size systems.
To summarize, we will first review basics of the particle filter, and then we will introduce the concept
of Rao–Blackwellisation in Section 3. In Section 4, this concept is applied to a structural system
previously studied in [9], using both the MLN-RBPF and Storvik’s algorithm. Finally, our new algorithm,
which combines the marginalized PF and EKF2 updates, is presented and applied to this same
system.

2. THE PARTICLE FILTER: A BRIEF REVIEW OF ITS STRENGTHS AND WEAKNESSES


2.1. Bayesian inference in state-space models
Many dynamical models can be written as state-space models and discretized in time to follow a process
equation of the form $x_{k+1} = f(x_k)$. In Bayesian filtering, one wants to estimate the states $x$ (or
parameters) from a series of measurements $y_{1:N}$, which can be contaminated by noise. We will consider
the generic nonlinear dynamic system in state-space form defined by the following system and
measurement equations:
$$\text{system equation:} \quad x_k = f(x_{k-1}, e_k) + v_{k-1} \tag{1a}$$

$$\text{measurement equation:} \quad y_k = h(x_k) + \eta_k \tag{1b}$$

where $f(\cdot)$ and $h(\cdot)$ are usually nonlinear functions, $e_k$ is a known forcing function, and $v_{k-1}$ and $\eta_k$ are
the system noise and measurement noise. We consider here for simplicity additive noise terms; however,
the generic PF is capable of handling any type of noise (non-Gaussian and non-additive). In
the following, we will consider that they follow a zero-mean Gaussian distribution, with covariance
matrices $Q$ and $R$ respectively, which gives the following state-transition and likelihood functions:

$$\text{state-transition density:} \quad p(x_k \mid x_{k-1}) = \mathcal{N}(x_k; f(x_{k-1}, e_k), Q_k) \tag{2a}$$

$$\text{likelihood:} \quad p(y_k \mid x_k) = \mathcal{N}(y_k; h(x_k), R_k) \tag{2b}$$

where $\mathcal{N}(a; B, C)$ means that we compute the value of the Gaussian density function with mean $B$ and
covariance matrix $C$ at point $a$.
In Bayesian inference, one wants to infer the posterior pdf of the states (and parameters), knowing a
set of measurements $y_{1:N}$. The prior pdf of the hidden states $p(x_0)$ is also known. Sequential filtering
techniques infer at each time step $k$ the posterior pdf of the states knowing all the past measurements,
that is, $p(x_k \mid y_1, \cdots, y_k)$. Using this pdf, one can compute expectations of the form:

$$E[g(x_k) \mid y_{1:k}] = \int g(x_k)\, p(x_k \mid y_{1:k})\, dx_k \tag{3}$$

where $g$ is an arbitrary function. We will often be interested in the expectation of the state, so $g(x_k) = x_k$,
and its second-order moment, so $g(x_k) = (x_k - E(x_k))(x_k - E(x_k))^T$.
When the functions f and h are both linear and both noise terms are Gaussian, exact inference can be
performed using the KF. However, for nonlinear non-Gaussian models, the posterior pdfs become
intractable. The unscented Kalman filter overcomes this issue by assuming that all the posterior pdfs
are Gaussian, so that only knowledge of the first and second moments is needed. Those are computed
using a small set of so-called sigma points, which are propagated using the system equation and updated
using the measurements. If this Gaussian assumption does not hold, one may want to use a more general
method that can estimate any kind of posterior density function, using a PF for example. To do so,
particle filtering schemes use Monte Carlo approximations of the posterior density; that is, it is represented

by a finite number of samples (called particles), and the quantities such as Eq. (3) are computed at each
time step by sample averages:

$$E[g(x_k) \mid y_{1:k}] \approx \sum_{i=1}^{n_p} w_k^{(i)}\, g\big(x_k^{(i)}\big) \tag{4}$$

with $n_p$ the number of particles used for the approximation and $w_k^{(i)}$ weights, which will be defined in the
following section ($\sum_{i=1}^{n_p} w_k^{(i)} = 1$). The posterior density itself can be written as follows:

$$p(x_k \mid y_{1:k}) \approx \sum_{i=1}^{n_p} w_k^{(i)}\, \delta\big(x_k - x_k^{(i)}\big) \tag{5}$$
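
As an illustration of the sample average in Eq. (4), the following minimal Python sketch computes a weighted expectation from a small set of particles. The function name `weighted_expectation` and the numerical values are hypothetical choices made for this example only; they do not come from the paper.

```python
def weighted_expectation(particles, weights, g=lambda x: x):
    """Particle approximation of E[g(x_k) | y_{1:k}], in the spirit of Eq. (4)."""
    total = sum(weights)
    # Normalise defensively so that the weights sum to one, as Eq. (4) assumes.
    return sum((w / total) * g(x) for x, w in zip(particles, weights))


# Three scalar particles with normalised weights (illustrative values).
particles = [0.0, 1.0, 2.0]
weights = [0.2, 0.5, 0.3]

# Posterior mean: g(x) = x.
mean = weighted_expectation(particles, weights)
# Posterior variance: g(x) = (x - E[x])^2.
var = weighted_expectation(particles, weights, lambda x: (x - mean) ** 2)
```

The same function covers both moments mentioned after Eq. (3) simply by changing `g`.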

2.2. The particle filter


Only a brief review of the theory of particle filtering is given here. For more details on the theory of
particle filtering, one can refer, for example, to [10–12], as well as [13] for some convergence results.
The main steps of the bootstrap PF (the simplest version of the PF) can be summarized as in Figure 1.
Briefly, from the posterior pdf at time step $k-1$, $p(x_{k-1} \mid y_{1:k-1})$, one samples particles, propagates
them to time step $k$ via the process equation and then weights them according to their proximity to
the actual measurements.
In a more general context, the PF can also be cast in the sequential importance sampling framework,
as explained in [10]. Importance sampling relates to the fact that, as the posterior pdf at time step $k$ is
unknown, one instead samples particles from an importance distribution $\pi(x_k \mid x_{0:k-1}, y_{1:k})$ (sometimes
also called proposal distribution) from which samples can be easily drawn. The discrepancy between the
true posterior pdf and the importance distribution is taken into account by weighting the particles. However,
as this is carried out sequentially over a large number of steps, one usually ends up with a set of
particles among which only a few have significant weights (impoverishment). The worst case scenario
is when a single particle carries a weight of one: this is called collapse of the PF (the posterior pdf becomes a
single Dirac function). To overcome this issue, one adds a resampling scheme to the algorithm, the goal
being to duplicate particles with high weights while getting rid of particles with low weights. In this way,
at each time step, the system is ‘reset’, and one focuses on regions of high likelihood, thus avoiding the
degeneracy issue. The algorithm of the PF is as presented in Algorithm 1.

Figure 1. Main steps of the bootstrap particle filter.

In the bootstrap PF, the importance density function is chosen as the state-transition density function

$$\pi\big(\tilde{x}_k^{(i)} \mid x_{0:k-1}^{(i)}, y_{1:k}\big) = p\big(\tilde{x}_k^{(i)} \mid x_{k-1}^{(i)}\big) \tag{6a}$$

from which samples are easily drawn. The recursive formula used to compute the weights is then greatly
simplified:

$$\tilde{w}_k^{(i)} = w_{k-1}^{(i)}\, p\big(y_k \mid \tilde{x}_k^{(i)}\big) \tag{6b}$$

In the remainder of the paper, we will always use this standard importance density function.
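
The propagate, weight, and resample steps described in this section can be sketched as follows for a scalar model with additive Gaussian noise. This is only a minimal illustration under stated assumptions, not the paper's Algorithm 1: the helper names (`gaussian_pdf`, `bootstrap_step`), the toy model ($f(x) = 0.9x$, $h(x) = x$), the noise variances, and the use of multinomial resampling at every step are all choices made for this example.

```python
import math
import random


def gaussian_pdf(a, mean, var):
    """Value at point a of the scalar Gaussian density with given mean and variance."""
    return math.exp(-0.5 * (a - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bootstrap_step(particles, weights, y, f, h, q, r, rng):
    """One bootstrap PF step for a scalar model with additive Gaussian noise."""
    # Propagation: sample from the state-transition density, as in Eq. (6a).
    proposed = [f(x) + rng.gauss(0.0, math.sqrt(q)) for x in particles]
    # Weighting: multiply previous weights by the likelihood, as in Eq. (6b).
    unnorm = [w * gaussian_pdf(y, h(x), r) for x, w in zip(proposed, weights)]
    total = sum(unnorm)
    normed = [w / total for w in unnorm]
    # Resampling (multinomial, an assumption of this sketch): duplicate
    # high-weight particles, drop low-weight ones, and reset weights to uniform.
    resampled = rng.choices(proposed, weights=normed, k=len(proposed))
    uniform = [1.0 / len(proposed)] * len(proposed)
    return resampled, uniform


rng = random.Random(0)
n_p = 500
particles = [rng.gauss(0.0, 1.0) for _ in range(n_p)]  # samples from the prior p(x_0)
weights = [1.0 / n_p] * n_p
particles, weights = bootstrap_step(
    particles, weights, y=1.0,
    f=lambda x: 0.9 * x, h=lambda x: x, q=0.1, r=0.1, rng=rng,
)
estimate = sum(w * x for x, w in zip(particles, weights))  # posterior mean estimate
```

For this linear Gaussian toy model the exact posterior mean is available from the KF (about 0.90), which gives a convenient sanity check for the particle estimate.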

2.3. Problems which arise in high dimensional systems


Particle filters have so far been applied mostly to low dimensional state-space models, as they rapidly
tend to fail when the number of states increases. Many authors have examined in detail what obstacles
arise when one tries to apply a PF to a very high dimensional problem. References [14,15] examine the behavior of
the mean square error (MSE) between the filter approximation of the integral in Eq. (3) and its true
value. More precisely, it can be shown that the error associated with the weighting step of the PF is
inversely proportional to the following quantity:

$$\int_{\mathbb{R}^d} p(y_k \mid x_k)\, p(x_k \mid y_{1:k-1})\, dx_k \tag{7}$$

If the particles sampled from the prior $p(x_k \mid y_{1:k-1})$ fall in the tail of the likelihood $p(y_k \mid x_k)$, many
weights will be zero, and the PF will most likely degenerate and produce poor results.
On the other hand, [16–18] examine more precisely the collapse issue (i.e., when only one particle
gets all the weight), and its dependence on the dimension of the system. They show for a simple

example that to avoid collapse, the number of particles should grow exponentially with the number of
measurements, indirectly related to the dimension of the state vector.
Recall that the main objective here is to estimate at each time step $k$ the posterior distribution
$p(x_{0:k} \mid y_{1:k})$, where $x_k$ is a state vector of possibly very large size, using a weighted average over
the so-called particles. It is well known that it is very hard to estimate a probability density function
in high dimensional spaces from a finite number of data points: as explained in [19], the
number of required data points increases exponentially with the dimension. When the dimension
of the space increases, the data become very sparse, which explains why the number of required
data points necessary to estimate a density distribution increases exponentially and not linearly
with the dimension of the space (curse of dimensionality). Numerically, the PF will collapse during
the weighting step, because in high dimensions, it is very probable that the prior and the posterior
will be mutually singular; thus, a particle sampled from the prior will have very low
probability under the posterior distribution (it will be assigned a very low weight).
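
The collapse of the weights with increasing dimension can be observed on a toy problem. The sketch below is an illustration only, under assumptions that are not taken from the paper: the prior is a standard Gaussian in `dim` dimensions, every component is observed directly with unit-variance noise, and the measurement is zero. The function name `max_normalised_weight` is hypothetical.

```python
import math
import random


def max_normalised_weight(dim, n_particles, rng):
    """Largest normalised weight when particles drawn from a standard Gaussian
    prior in `dim` dimensions are weighted by a unit-variance Gaussian
    likelihood centred on the measurement y = 0 (toy setting)."""
    log_w = []
    for _ in range(n_particles):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Log-likelihood up to an additive constant common to all particles.
        log_w.append(-0.5 * sum(xi * xi for xi in x))
    top = max(log_w)
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    w = [math.exp(lw - top) for lw in log_w]
    return max(w) / sum(w)


rng = random.Random(1)
low_dim = max_normalised_weight(1, 200, rng)     # weights remain well spread
high_dim = max_normalised_weight(100, 200, rng)  # a few particles dominate
```

With the same number of particles, the largest normalised weight is markedly larger in the 100-dimensional case, which is the degeneracy discussed above.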

2.4. What about static parameter estimation?


As explained in [12], in Bayesian frameworks, parameters are modeled as random variables with a known
prior distribution p(θ). The state-space model with unknown parameters is then formulated as follows:

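$$\text{priors:} \quad \theta \sim p(\theta) \ \text{and} \ x_0 \sim p(x_0 \mid \theta) \tag{8a}$$

$$\text{transition density:} \quad x_k \sim p(x_k \mid x_{k-1}, \theta) \tag{8b}$$

$$\text{likelihood:} \quad y_k \sim p(y_k \mid x_k, \theta) \tag{8c}$$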

The full filtering posterior distribution that we want to estimate at each time step $k$ is now $p(x_{0:k}, \theta \mid y_{1:k})$.
Several ways can be used to perform parameter estimation using particle filters (i.e., find $p(\theta \mid y_{1:N})$). The
most common is to augment the state vector with the vector of parameters and run a PF over the augmented
dimensional space (this is known as joint estimation). Introduction of static parameters in the state vector is
known to exacerbate the impoverishment issue of the PF. As explained in [13], the PF will be stable (in the
sense that error does not accumulate in time) only if the system is quickly mixing, which means, roughly,
that the state vector at time step $k$ is more or less independent from $x_{k-l}$, for a relatively small value of the
lag $l$. When static parameters are added to the state vector, this condition does not hold any more, as
parameters are propagated in time without any variation. One possible way to bypass this issue is to add
a random-walk noise with decreasing variance to each parameter in the process equation. However, this
can create some convergence and accuracy issues if the level of noise added is not well calibrated.
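
The random-walk device mentioned above can be sketched in a few lines. The geometric decay schedule $\sigma_k = \sigma_0\, d^k$, the function name `jitter_parameters`, and the numerical values are assumptions made for this illustration; the paper does not specify a schedule, and choosing one is precisely the calibration issue noted in the text.

```python
import random
import statistics


def jitter_parameters(theta_particles, k, sigma0, decay, rng):
    """Add random-walk noise to static-parameter particles at time step k.
    The standard deviation sigma0 * decay**k shrinks with k (assumed schedule)."""
    sigma_k = sigma0 * decay ** k
    return [theta + rng.gauss(0.0, sigma_k) for theta in theta_particles]


rng = random.Random(2)
theta = [1.0] * 1000  # fully impoverished parameter particles (all identical)

early = jitter_parameters(theta, k=0, sigma0=0.1, decay=0.9, rng=rng)
late = jitter_parameters(theta, k=50, sigma0=0.1, decay=0.9, rng=rng)

spread_early = statistics.pstdev(early)  # close to 0.1
spread_late = statistics.pstdev(late)    # close to 0.1 * 0.9**50
```

If the decay is too fast, the particles freeze before the parameters have converged; if it is too slow, the artificial noise inflates the posterior variance.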
Another issue that will arise when doing joint estimation is the dimension of the state vector. Indeed
if many parameters need to be identified, the size of the state vector will quickly increase. As mentioned
before, this may become a problem as an exponentially increasing number of particles will be
needed to obtain good results.
Figure 2 summarizes some of the learning algorithms (online vs. offline, point estimates vs. full
inference of the posterior pdf p(θ|y1 : N)), which can be used for inference of the parameters while
the PF is used for inference of the hidden states.
In the following, we will focus on joint state/parameter estimation, using the concept of
Rao–Blackwellisation to overcome the impoverishment issue which arises with the PF when static
parameters are added to the state vector.

3. REDUCING THE SIZE OF THE SAMPLING SPACE VIA MARGINALIZATION


3.1. The concept of Rao–Blackwellisation
 
The main idea in Rao–Blackwellisation ([20]) is to decompose the state vector as $\{x_k\} = \begin{Bmatrix} u_k \\ z_k \end{Bmatrix}$ and
the full posterior distribution as follows:

Figure 2. Parameter learning algorithms, which use the particle filter for dynamic state inference.

$$p(x_{0:k} \mid y_{1:k}) = p(u_{0:k}, z_{0:k} \mid y_{1:k}) = \underbrace{p(z_{0:k} \mid u_{0:k}, y_{1:k})}_{\text{analytically tractable}}\; \overbrace{p(u_{0:k} \mid y_{1:k})}^{\text{inferred with MC sampling}} \tag{9}$$

where $p(z_{0:k} \mid u_{0:k}, y_{1:k})$ can be computed exactly. This is easily carried out for example for CLG
models (Eq. (10)), where the state vector $\{x_k\}$ can be divided into two parts $\begin{Bmatrix} u_k \\ z_k \end{Bmatrix}$, where $z_k$ is
conditionally Gaussian given the nonlinear variable $u_k$:

$$\begin{aligned} u_k &\sim p(u_k \mid u_{k-1}) \\ z_k &= G(u_{k-1})\, z_{k-1} + v_{k-1} \\ y_k &= H(u_k)\, z_k + \eta_k \end{aligned} \tag{10}$$

with $\eta_k \sim \mathcal{N}(0, R(u_k))$ and $v_{k-1} \sim \mathcal{N}(0, Q(u_{k-1}))$. It is important to notice that in CLG models, the
propagation equation of the nonlinear part $u$ does not depend explicitly on the linear part $z$.
In this case, assuming a Gaussian prior for the linear part $z$, $z$ conditioned on $u$ can be inferred
exactly using a KF. Thus, only the nonlinear part $u$ will be inferred using solely sequential importance
sampling (particle filter). At each time step, the filtering density for $u_k$ is a particle approximation while
it is a mixture of Gaussians for $z_k$:

$$p(u_k, z_k \mid y_{1:k}) = \sum_{i=1}^{n_p} w_k^{(i)}\, \mathcal{N}\big(z_k; z_{k|k}^{(i)}, P_{k|k}^{(i)}\big)\, \delta\big(u_k - u_k^{(i)}\big) \tag{11}$$

with $\{z_{k|k}^{(i)}, P_{k|k}^{(i)}\}$ being the mean and covariance of the KF conditioned on the history of the corresponding
particle $\{u_0^{(i)}, \cdots, u_k^{(i)}\}$. Regarding the algorithm, at each time step $k$, for each particle $(i)$, the
propagation and update steps combine PF and KF equations (see for instance [12]):

  
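• Propagate the nonlinear part as in the PF: $u_k^{(i)} \sim p\big(u_k \mid u_{k-1}^{(i)}\big)$
• Propagate the linear part, conditioned on the sampled nonlinear part $u_{k-1}^{(i)}$, as in the KF*:
$\big\{z_{k|k-1}^{(i)}, P_{k|k-1}^{(i)}\big\} = \mathrm{KFpropagate}_{(G(u_{k-1}^{(i)}))}\big(z_{k-1|k-1}^{(i)}, P_{k-1|k-1}^{(i)}\big)$
• Weight the particles as in the PF: $w_k^{(i)} = w_{k-1}^{(i)}\, p\big(y_k \mid u_{0:k}^{(i)}, y_{1:k-1}\big)$
• Update the linear part as in the KF*:
$\big\{z_{k|k}^{(i)}, P_{k|k}^{(i)}\big\} = \mathrm{KFupdate}_{(y_k, H(u_k^{(i)}))}\big(z_{k|k-1}^{(i)}, P_{k|k-1}^{(i)}\big)$

where $p\big(y_k \mid u_{0:k}^{(i)}, y_{1:k-1}\big)$ and $p\big(u_k \mid u_{k-1}^{(i)}\big)$ are Gaussian pdfs, thus easy to evaluate and sample from.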
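
The KF part of the four steps above can be sketched for a single particle of a scalar CLG model. This is a minimal illustration using the standard scalar Kalman filter equations; the function names `kf_propagate` and `kf_update` mirror the notations KFpropagate and KFupdate of the text, but their signatures and the numerical values are assumptions of this example rather than the equations of Appendix A.1.

```python
import math


def kf_propagate(z, p, g, q):
    """KF prediction for z_k = g * z_{k-1} + v_{k-1}, v ~ N(0, q) (scalar case)."""
    return g * z, g * p * g + q


def kf_update(z_pred, p_pred, y, h, r):
    """KF measurement update for y_k = h * z_k + eta_k, eta ~ N(0, r) (scalar case).
    Also returns the predictive likelihood of y_k, which the PF uses to weight
    the particle carrying this Kalman filter."""
    s = h * p_pred * h + r          # innovation variance
    gain = p_pred * h / s           # Kalman gain
    innovation = y - h * z_pred
    z_post = z_pred + gain * innovation
    p_post = (1.0 - gain * h) * p_pred
    lik = math.exp(-0.5 * innovation ** 2 / s) / math.sqrt(2.0 * math.pi * s)
    return z_post, p_post, lik


# One particle: g and h stand for G(u_{k-1}) and H(u_k) evaluated at that
# particle's sampled nonlinear state (illustrative values).
w_prev = 0.01
z_pred, p_pred = kf_propagate(z=0.0, p=1.0, g=1.0, q=0.0)
z_post, p_post, lik = kf_update(z_pred, p_pred, y=1.0, h=1.0, r=1.0)
w_unnorm = w_prev * lik  # to be normalised over all particles afterwards
```

In a full filter this pair of calls runs once per particle, which is why the approach is described as a bank of Kalman filters.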

3.2. Rao–Blackwellisation for mixed linear–nonlinear systems


However, as explained in [4], because the equations for structural systems are coupled, it is
usually not possible to write our state-space model as in Eq. (10) (i.e., our systems are not CLG).
Recently, a new marginalized PF was derived in [7] that can be applied to models where the
nonlinear part u also depends explicitly on the linear part z. Those models are called, following [7], mixed
linear/nonlinear Gaussian models and can be written as follows:
$$u_k = F_{k-1} z_{k-1} + f_{k-1} + v_{k-1}^{u} \tag{12a}$$

$$z_k = G_{k-1} z_{k-1} + g_{k-1} + v_{k-1}^{z} \tag{12b}$$

$$y_k = H_k z_k + h_k + \eta_k \tag{12c}$$

$F_{k-1}$, $G_{k-1}$, and $H_k$ are matrices, and $f_{k-1}$, $g_{k-1}$, and $h_k$ are vectors that can integrate terms nonlinear
in $u_k$, $u_{k-1}$. Marginalizing out the linear states, we can decompose the posterior distribution as
in (9).
Again, the density $p(z_k \mid u_{0:k}, y_{1:k})$ is estimated using the KF (which gives the exact posterior density
for a linear system with Gaussian noise) while the density $p(u_{0:k} \mid y_{1:k})$ is estimated using the PF,
which will now be used in a lower dimensional space. The $k$th step of the MLN-RBPF, which
takes as an input an ensemble $\big\{w_{k-1}^{(i)}, u_{k-1}^{(i)}, z_{k-1|k-1}^{(i)}, P_{k-1|k-1}^{(i)}\big\}_{i=1,\cdots,n_p}$ and outputs an ensemble
$\big\{w_k^{(i)}, u_k^{(i)}, z_{k|k}^{(i)}, P_{k|k}^{(i)}\big\}_{i=1,\cdots,n_p}$, can be decomposed in four steps (prediction of nonlinear/linear states
and measurement update of nonlinear/linear states), as for CLG models. However, because of the
explicit dependence of $u_k$ on $z_{k-1}$ in Eq. (12a), the prediction step is now more complex. The full
algorithm is presented in Algorithm 3 in Appendix B, in the case of diagonal covariance matrices.
The reader is referred to [7] for full proofs and derivation of the algorithm.

3.3. Marginalization applied to parameter learning


In this section, we consider systems in which the vector $z$ of states/parameters, which appear linearly in
the equations, is composed solely of static parameters, for which the propagation equation is simply
$z_k = z_{k-1}$. Thus, the system equations can be written as follows:

$$u_k = F_{k-1} z_{k-1} + f_{k-1} + v_{k-1} \tag{13a}$$

$$z_k = z_{k-1} \tag{13b}$$

$$y_k = H_k z_k + h_k + \eta_k \tag{13c}$$
* Notations: the function KFpropagate takes as input $\big\{z_{k-1|k-1}^{(i)}, P_{k-1|k-1}^{(i)}\big\}$ (i.e., posterior at time step $k-1$), and outputs
$\big\{z_{k|k-1}^{(i)}, P_{k|k-1}^{(i)}\big\}$ (i.e., prior at time step $k$), by applying the propagation equations of the KF to the equation $z_k = G\big(u_{k-1}^{(i)}\big) z_{k-1} + v_{k-1}$,
and similarly for KFupdate. The actual equations contained in the functions KFpropagate and KFupdate are explicitly written in
Appendix A.1. In the remainder of the paper, we will write for simplicity $G_{k-1}$ for $G(u_{k-1})$ and so on.

with $F_{k-1}$, $H_k$, $f_{k-1}$, and $h_k$ matrices and vectors, which can contain nonlinear terms in $u_{k-1}$, $u_k$. This
special case is of interest for structural systems because the stiffness and damping parameters $c$, $k$
usually appear linearly in the equations of motion. It is also important to notice that the vector $u$ is
composed of both the dynamic states and the parameters which appear nonlinearly in the equations,
if any. In our numerical example in Section 4, the Bouc–Wen parameter $n$ appears nonlinearly in the
equations and thus $u_k = \begin{Bmatrix} x_k \\ n \end{Bmatrix}$, while all other parameters appear linearly in the equations and thus form
the vector $z$.
For this type of system, two solvers, which make use of the concept of Rao–Blackwellisation, can
be used: the MLN-RBPF and Storvik’s algorithm.

3.3.1. Using the marginalized particle filter for mixed linear/nonlinear Gaussian systems. In [21], the
method described in Section 3.2 is applied to parameter estimation in the case where the parameters are
conditionally Gaussian given the dynamic states (or vice versa). Such a system can be written in the
form Eq. (13), which is the same as Eq. (12) with $G = I$ and $g = 0$; thus, the MLN-RBPF can be
directly applied. We furthermore write $z_{k|k}^{(i)} = \mu_k^{(i)}$ and $P_{k|k}^{(i)} = C_k^{(i)}$ in Eq. (11).

3.3.2. Using Storvik’s algorithm. In online parameter estimation, a similar concept is used in [8] to
marginalize static parameters out from the full pdf, while using the PF to recover the dynamic states.
Storvik’s algorithm can be applied to systems in which the pdf of the static parameters $p(\theta \mid x_{0:k}, y_{1:k})$
depends only on some low dimensional sufficient statistics $s_k$, which can be easily updated at each
time step, that is:

$$p(\theta \mid x_{0:k}, y_{1:k}) = p(\theta \mid s_k) \tag{14a}$$

$$s_k = \mathrm{update}(s_{k-1}, x_k, y_k) \tag{14b}$$

where $p(\theta \mid s_k)$ is known and easy to sample from. In this way, a new parameter vector can be
simulated at each time step without using previous values of the parameter vector, which helps
prevent the problem of impoverishment. Going back to notations in Eq. (13), if the static
parameters in vector $z$ appear linearly in the propagation equation, the pdf $p\big(z \mid s_k^{(i)}\big)$ is Gaussian;
thus, the sufficient statistics are its mean vector and covariance matrix $s_k^{(i)} = \big\{\mu_k^{(i)}, C_k^{(i)}\big\}$, and the
update function consists of a KF update, in a similar fashion as for the MLN-RBPF. For each
time step $k$ and for each particle $(i)$, Storvik’s algorithm can then be decomposed into 2 steps:
(i) sample a realization $z^{(i)}$ from $p\big(z \mid u_{0:k-1}^{(i)}, y_{1:k-1}\big) = p\big(z \mid s_{k-1}^{(i)}\big)$, and (ii) knowing this realization
$z^{(i)}$, run a PF for $u_k$, that is, sample from $p\big(u_k \mid u_{k-1}^{(i)}, z^{(i)}\big)$ and weight proportionally to
$p\big(y_k \mid u_k^{(i)}, z^{(i)}\big)$. The detailed algorithm is given in Algorithm 2.
It can be noted here that even though both Storvik’s algorithm and the MLN-RBPF are based
on the same decomposition of the full pdf, there exists an important difference between the two
algorithms. In Storvik’s algorithm, a realization of $z \mid u_{0:k}^{(i)}, y_{1:k}$ is simulated at each time step,
while in the MLN-RBPF, all the equations are re-written in terms of $\big\{\mu_k^{(i)}, C_k^{(i)}\big\}$, where
$p\big(z \mid u_{0:k}^{(i)}, y_{1:k}\big) = \mathcal{N}\big(z; \mu_k^{(i)}, C_k^{(i)}\big)$, and no realization is ever generated from this pdf.
Again, Storvik’s algorithm involves, for each particle, an update of the sufficient statistics of the
static parameters, conditioned on sampled particle $u_k^{(i)}$ and measurement $y_k$ at time step $k$:

$$s_k^{(i)} = \mathrm{update}\big(s_{k-1}^{(i)}, u_k^{(i)}, y_k\big) \tag{15}$$
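
For a single scalar parameter appearing linearly in the propagation equation, the sufficient-statistics update and the sampling step can be sketched as below. This is a hedged illustration, not Algorithm 2: it only uses the information carried by $u_k$ through Eq. (13a), it ignores the measurement equation, and the function names (`update_sufficient_stats`, `sample_parameter`), the scalar setting, and all numerical values are assumptions of this example.

```python
import math
import random


def update_sufficient_stats(mu, c, u_k, big_f, f, q):
    """KF-type update of the Gaussian sufficient statistics (mu, c) of a scalar
    static parameter z, given a sampled state u_k = big_f * z + f + v,
    with v ~ N(0, q)."""
    s = big_f * c * big_f + q       # predictive variance of u_k
    gain = c * big_f / s
    mu_new = mu + gain * (u_k - f - big_f * mu)
    c_new = (1.0 - gain * big_f) * c
    return mu_new, c_new


def sample_parameter(mu, c, rng):
    """Draw a realization of z from p(z | s) = N(mu, c)."""
    return rng.gauss(mu, math.sqrt(c))


rng = random.Random(0)
true_z, q = 2.0, 0.01
mu, c = 0.0, 10.0                   # vague Gaussian prior on z
for _ in range(200):
    big_f = rng.uniform(0.5, 1.5)   # stands for F_{k-1}, known given the particle
    u_k = big_f * true_z + rng.gauss(0.0, math.sqrt(q))
    mu, c = update_sufficient_stats(mu, c, u_k, big_f, 0.0, q)

z_draw = sample_parameter(mu, c, rng)  # fresh parameter value for the next step
```

Because a new value of the parameter is drawn from the updated Gaussian at every step, the parameter particles do not degenerate into a few repeated values.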

Examples for systems where the parameters appear linearly in the propagation equation are
presented in [8]. However, the systems of interest for SHM purposes are more complex because
the same static parameters will appear in both the propagation and the measurement equations
(Eq. (13)), a case not often considered in the literature.

The way we developed the update Eq. (15) in the case where the same parameters appear in both the
propagation and measurement equation is as follows:
ði Þ
• Write the measurement yk as a function of vector uk1 and static parameters z, by combining the
propagation and measurement equations:
 
ðiÞ
yk ¼ g uk1 ; z; vk1 ; ηk (16a)
   
ðiÞ
¼ h f uk1 ; z; vk1 ; z; ηk (16b)

If the same parameters appear linearly in both f and h, they will appear quadratically in g.
• Using also the propagation equation, one can write:
 
ðiÞ ðiÞ
uk ¼ f uk1 ; z; vk1 (17a)
 
ðiÞ
yk ¼ g uk1 ; z; vk1 ; ηk (17b)

The first equation of this system is linear in $z$, so one can directly apply the KF update equations to update $z$, conditioned on $u_k^{(i)}$. On the contrary, $g$ is quadratic in $z$; thus, one needs to linearize it as $g \simeq \tilde{g}$, where $\tilde{g}$ is the first-order expansion of $g$ built from $G_z^{(i)}$, the Jacobian of $g$ with respect to $z$ computed using realizations $u_{k-1}^{(i)}$ and $z_{k-1}^{(i)}$ (same procedure as the one used in the extended KF, see note in Appendix A.2 for more details on the EKF).
• Now one can write the following system, linear in $z$:

$$u_k^{(i)} = f\left(u_{k-1}^{(i)}, z, v_{k-1}\right) \tag{18a}$$

$$y_k = \tilde{g}\left(u_{k-1}^{(i)}, z, v_{k-1}, \eta_k\right) \tag{18b}$$


Thus, at each time step, for each particle, one can apply the KF or EKF update equations to update the sufficient statistics $s_k^{(i)}$, knowing both $u_k^{(i)}$ and $y_k$.
Several comments can be made on this update algorithm. First, one can see that the way we tackle the
measurements yk by creating a new function g is similar to what is carried out in the dual UKF (presented
in [22]). However, in the dual UKF, this step is carried out only once per time step; thus, the correlation
between dynamic states and static parameters is lost. Here instead, this update is performed for each
particle; thus, this correlation is preserved. The second comment is that linearizing g can create some
error. However, as will be shown later in the numerical example, in the systems that are of interest
for us, this error will be negligible, because the propagation equations are discretized with
small time steps.
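
The effect of the time step on this linearization error can be illustrated on a toy problem. The sketch below is not the structural system of this paper: it assumes a hypothetical scalar model in which the same parameter z appears linearly in both the propagation and the measurement equation, so that the composite map g is quadratic in z. All function names are illustrative.

```python
def f(u_prev, z, dt):
    """Toy scalar propagation, linear in z: u_k = u_{k-1} - dt * z * u_{k-1}."""
    return u_prev - dt * z * u_prev


def h(u, z):
    """Toy scalar measurement, linear in z: y_k = -z * u_k."""
    return -z * u


def g(u_prev, z, dt):
    """Composite map y_k = h(f(u_{k-1}, z), z); quadratic in z."""
    return h(f(u_prev, z, dt), z)


def g_jacobian(u_prev, z, dt):
    """Analytical derivative of g with respect to z."""
    return -u_prev + 2.0 * dt * z * u_prev


def g_linearized(u_prev, z, z_ref, dt):
    """First-order (EKF-style) expansion of g around z_ref."""
    return g(u_prev, z_ref, dt) + g_jacobian(u_prev, z_ref, dt) * (z - z_ref)


def linearization_error(dt, u_prev=1.0, z_ref=8.0, dz=0.5):
    """Absolute gap between g and its linearization at z_ref + dz."""
    z = z_ref + dz
    return abs(g(u_prev, z, dt) - g_linearized(u_prev, z, z_ref, dt))
```

In this toy model the neglected second-order term equals dt · u · dz², so the error shrinks proportionally to the time step, which is consistent with the argument made above for small dt.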

3.3.3. Posterior density function of the marginalized static parameters. Using those two
algorithms (MLN-RBPF and Storvik’s algorithm), the posterior filtering distribution of the states and
parameters can be written as follows:

$$p(u_k, z_k \mid y_{1:k}) = \sum_{i=1}^{n_p} w_k^{(i)}\, \mathcal{N}\left(z_k; \mu_k^{(i)}, C_k^{(i)}\right) \delta\left(u_k - u_k^{(i)}\right) \tag{19}$$

where z is a vector composed of static parameters which appear linearly in the equations. Then the
expected value of these marginalized static parameters is computed as follows:

$$E[z \mid y_{1:k}] \approx \sum_{i=1}^{n_p} w_k^{(i)}\, E\left[z \mid u_{0:k}^{(i)}, y_{1:k}\right] = \sum_{i=1}^{n_p} w_k^{(i)} \mu_k^{(i)} \tag{20}$$

that is, the posterior $z \mid y_{1:k}$ is a mixture of Gaussians, and the expected value of $z$ is the weighted sum
of the means.
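
As a minimal sketch of Eq. (20), the mean of the Gaussian mixture can be computed from the particle weights and the per-particle means. The covariance of the mixture is also returned here using the standard law of total covariance, which is not written out in the text and is added only for illustration; the function and argument names are assumptions.

```python
import numpy as np


def mixture_moments(weights, means, covs):
    """Mean and covariance of a Gaussian mixture sum_i w_i N(mu_i, C_i).

    weights: (n_p,), means: (n_p, n_z), covs: (n_p, n_z, n_z).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                       # make sure the weights are normalized
    mu = np.asarray(means, dtype=float)
    C = np.asarray(covs, dtype=float)
    mean = w @ mu                         # Eq. (20): weighted sum of the means
    diff = mu - mean
    # Law of total covariance: average covariance + spread of the means
    cov = np.einsum("i,ijk->jk", w, C) + np.einsum("i,ij,ik->jk", w, diff, diff)
    return mean, cov
```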

3.4. Rao–Blackwellisation outputs smaller variance estimates


Rao–Blackwellisation algorithms output estimates with lower variance than the standard PF estimates. This can be understood intuitively as the dimension of the space where importance sampling is performed is reduced, while in the remaining dimensions, inference is performed analytically using an optimal filter (here, the KF for linear Gaussian systems). Reference [20], which first presented this concept of Rao–Blackwellisation for PFs, provides mathematical proof of this property.
In practice, this means that methods that use Rao–Blackwellisation should produce more consistent
results over several runs. Indeed, as we will see at the end of this paper, our new algorithm, which uses
two levels of marginalization, produces parameter estimates with lower variance.

4. NUMERICAL VALIDATION
4.1. Structural system
These algorithms were tested on a highly nonlinear three-degree-of-freedom dynamic system, which was already presented in [9]. The Bouc–Wen model for hysteresis is used for the first degree of freedom to model high nonlinearities at the base of the structure, as shown in Figure 3. According to this model, the restoring force is as follows:

$$F(t) = k_1 r_1(t) \tag{21}$$

$r_1(t)$ is the hysteretic displacement and follows a highly nonlinear equation of motion:

$$\dot{r}_1(t) = \dot{x}_1(t) - \beta\,|\dot{x}_1(t)|\,|r_1(t)|^{n-1}\, r_1(t) - \gamma\, \dot{x}_1(t)\, |r_1(t)|^{n} \tag{22}$$
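
The hysteretic equation of motion, Eq. (22), can be sketched in a few lines. The helper names below are illustrative, and the forward Euler step mirrors the discretization used later in this paper; the loop is only a sanity check under a constant velocity, not the simulation used by the authors.

```python
def bouc_wen_rate(x1_dot, r1, beta, gamma, n):
    """Right-hand side of Eq. (22): time derivative of the hysteretic displacement."""
    return (x1_dot
            - beta * abs(x1_dot) * abs(r1) ** (n - 1) * r1
            - gamma * x1_dot * abs(r1) ** n)


def integrate_r1(x1_dot_series, dt, beta, gamma, n, r0=0.0):
    """Forward Euler integration of r1 for a given velocity time series."""
    r = r0
    out = []
    for v in x1_dot_series:
        r = r + dt * bouc_wen_rate(v, r, beta, gamma, n)
        out.append(r)
    return out
```

Under a constant positive velocity, r1 saturates at (1/(β+γ))^(1/n), which is the bounded behavior expected from this model.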


Figure 3. Structural system considered, adapted from [25].

The dynamic equations of motion are then as follows:


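$$
\begin{bmatrix} m_1 & 0 & 0\\ 0 & m_2 & 0\\ 0 & 0 & m_3 \end{bmatrix}
\begin{Bmatrix} \ddot{x}_1\\ \ddot{x}_2\\ \ddot{x}_3 \end{Bmatrix}
+ \begin{bmatrix} c_1+c_2 & -c_2 & 0\\ -c_2 & c_2+c_3 & -c_3\\ 0 & -c_3 & c_3 \end{bmatrix}
\begin{Bmatrix} \dot{x}_1\\ \dot{x}_2\\ \dot{x}_3 \end{Bmatrix}
+ \begin{bmatrix} k_1 & k_2 & -k_2 & 0\\ 0 & -k_2 & k_2+k_3 & -k_3\\ 0 & 0 & -k_3 & k_3 \end{bmatrix}
\begin{Bmatrix} r_1\\ x_1\\ x_2\\ x_3 \end{Bmatrix}
= -\ddot{u}_g(t) \begin{Bmatrix} m_1\\ m_2\\ m_3 \end{Bmatrix} \tag{23}
$$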

This system equation can then be discretized using a forward Euler scheme (integration time step
dt = 0.005s in our case) and cast in state-space form, using the state vector:
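$$x = [\,x_1\ \ x_2\ \ x_3\ \ \dot{x}_1\ \ \dot{x}_2\ \ \dot{x}_3\ \ r_1\,]^T \in \mathbb{R}^7 \tag{24a}$$

Assuming $m_1 = m_2 = m_3 = 1$, the system is parameterized by the following:

$$\theta = [\,k_1\ \ k_2\ \ k_3\ \ c_1\ \ c_2\ \ c_3\ \ \beta\ \ \gamma\ \ n\,]^T \in \mathbb{R}^9 \tag{24b}$$

We simulate data by running the problem with zero initial conditions, using a fourth-order
Runge–Kutta scheme, using as true parameter vector:

$$\theta_{true} = [\,8\ \ 8\ \ 8\ \ 0.25\ \ 0.25\ \ 0.25\ \ 2\ \ 1\ \ 2\,]^T \tag{25}$$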

We also generate measurements (accelerations at three dofs), to which we add 10% root mean
square (RMS) Gaussian noise. We also add 10% RMS noise to the excitation üg, to represent possible
error made when measuring the excitation.
In those equations, the nonlinearities appear in two forms:
• the Bouc–Wen model itself; and
• the multiplication of an unknown parameter with an unknown state (e.g., $kx$ or $c\dot{x}$), in both the
process and the measurement equations.
Also, one can see that, except for the Bouc–Wen parameters β, γ, and n, which do not appear in the measurement equations, the same parameters (stiffness and damping) appear in both the process and the measurement equations. This is very common for this type of structural system, but not commonly considered in the particle filtering literature. Indeed, problems solved using PFs usually exhibit unknown static parameters in only one of the two equations, or different parameters in the two equations, which simplifies the parameter learning part. Finally, as explained in [4], in such structural systems, the equations are coupled; thus, one cannot easily extract a nonlinear subspace that would follow a Markov process, that is, this system is not CLG. However, we can see that


except for the Bouc–Wen parameter n, all the parameters appear linearly in all of the equations. This
means that we can use a marginalized PF for mixed linear/nonlinear states (MLN-RBPF), or
Storvik’s algorithm with our new update equations, marginalizing out the linear parameters.

4.2. Quantities of interest computed for comparison of performance between several methods
From an SHM perspective, accurate estimation of the structural parameters is crucial. As we work here on simulated data, that is, we know the value of the true parameter vector $\theta_{true}$, we can look at the convergence or the relative error at each time step: $e(\theta_k) = \dfrac{|\theta_k - \theta_{true}|}{\theta_{true}}$.
Also, to validate the method, we use the identified parameter $\theta_{val}$ and re-run the equations of motion, using zero initial conditions and the ode45 solver. This will output a time series $\{x_{val,k}\}_{k=1:N}$, and we can then plot the hysteresis loop and compare with the true one. Both the true loop and the validation one are generated using the clean excitation, as we want to compare the behaviors of the true and identified system to the true excitation. The identified parameter can be taken as the expected value

Figure 4. Generic bootstrap particle filter: convergence results, highlighting the particle degeneracy phenomenon
observed when using a generic particle filter.


of $p(\theta \mid y_{1:N})$, that is, the last parameter value; however, in order not to overemphasize the effect of this one final point, we decided here to average over the last few time steps, that is:

$$\theta_{val} = \frac{1}{l} \sum_{k=N-l+1}^{N} \theta_k \tag{26}$$

with l = 80, that is, we average the results over the last 0.40s, which represents roughly a quarter of a
loop of the hysteresis variable.
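
The two quantities defined in this section, the relative error at each time step and the averaged estimate of Eq. (26), can be sketched as follows. The array layout (one row per time step, one column per parameter) and the function names are assumptions made for illustration.

```python
import numpy as np


def relative_error(theta_hist, theta_true):
    """e(theta_k) = |theta_k - theta_true| / theta_true, for every time step k.

    theta_hist: (N, n_theta) history of parameter estimates.
    theta_true: (n_theta,) true parameters, assumed strictly positive.
    """
    theta_hist = np.asarray(theta_hist, dtype=float)
    theta_true = np.asarray(theta_true, dtype=float)
    return np.abs(theta_hist - theta_true) / theta_true


def tail_average(theta_hist, l=80):
    """Eq. (26): average of the estimates over the last l time steps."""
    theta_hist = np.asarray(theta_hist, dtype=float)
    return theta_hist[-l:].mean(axis=0)
```

With dt = 0.005 s, l = 80 corresponds to the 0.40 s window quoted above.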
To look at errors on the dynamic states, we will compute the coefficient of variation of the root
mean squared error, used to compare two time series and defined as follows:

$$\mathrm{CV(RMSE)}_{states} = \frac{\sqrt{\dfrac{1}{N}\sum_{k=1}^{N}\left(x_k - x_{true,k}\right)^2}}{\dfrac{1}{N}\sum_{k=1}^{N} x_{true,k}} \tag{27}$$

where xk can be either the time series obtained during online learning or the one obtained after valida-
tion with the last parameter value.
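
A minimal sketch of this metric, following Eq. (27) as read here (root mean squared error divided by the mean of the true time series), is given below. The function name is illustrative, and the normalization is taken directly from the equation; the sketch assumes the mean of the true series is not zero.

```python
import numpy as np


def cv_rmse(x, x_true):
    """Coefficient of variation of the RMSE between two time series (Eq. 27)."""
    x = np.asarray(x, dtype=float)
    x_true = np.asarray(x_true, dtype=float)
    rmse = np.sqrt(np.mean((x - x_true) ** 2))
    return rmse / np.mean(x_true)       # normalization by the mean of the true series
```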

4.3. Degeneracy of the bootstrap particle filter


First, we ran the problem with the generic bootstrap PF using 500 particles, and no noise added to the
static parameters. We chose lognormal priors for damping and stiffness parameters (admissible region
for those parameters is ℝ+), a uniform prior U(1,5) for n, and a uniform prior over an admissible region
of {β, γ}, which is {β ≥ 0, β ≤ 5, |γ| ≤ β}.
We used diagonal covariance matrices for both the process and measurement equations. For the measurement equations, we used the same noise level as the one actually added to the measurements and excitation. The same holds for the process noise corresponding to the velocity states. For the displacements, we used a very small variance. Because we have simulated data, we also know that our Bouc–Wen model of hysteresis is exact; thus, there should not be noise in this equation either. However, because in real life it is not possible to assume that the system is perfect, we used a non-zero variance for this equation ($\sigma_r = dt \times$ 5% RMS noise on $\dot{x}$) to see how the PF reacts to non-zero noise.
Figures 4a and b show the convergence results and validation hysteresis loop, respectively. We can see that the PF degenerates quickly and then the particles are stuck on one value and cannot learn anymore from the incoming data. Then the final parameter vector obtained is not able to reproduce the characteristics of the system, as can be seen by plotting the validation hysteresis loop.
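
The mechanism behind this collapse can be illustrated with a toy experiment. The sketch below is not the filter used in this paper: it assumes a single static parameter, a hypothetical Gaussian-shaped likelihood, and plain multinomial resampling. Because no noise is added to the static parameter, resampling can only copy existing values, so the number of distinct particle values can never increase.

```python
import numpy as np


def count_unique_after_resampling(n_particles=500, n_steps=200, seed=0):
    """Track how many distinct static-parameter values survive repeated resampling."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(1.0, 5.0, size=n_particles)   # samples from a uniform prior
    history = []
    for _ in range(n_steps):
        # Hypothetical likelihood weights, peaked around theta = 2
        w = np.exp(-0.5 * ((theta - 2.0) / 0.5) ** 2)
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)
        theta = theta[idx]            # static parameter: values are copied, never perturbed
        history.append(np.unique(theta).size)
    return history
```

The returned list is non-increasing, which mirrors the behavior described above: once diversity is lost, the particles cannot learn from new data.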

4.4. Rao–Blackwellisation
4.4.1. Equations for the MLN-RBPF. To use Rao–Blackwellisation for this system, we define the
linear and nonlinear vectors as follows:

$$u_k = [\,x_1\ \ x_2\ \ x_3\ \ \dot{x}_1\ \ \dot{x}_2\ \ \dot{x}_3\ \ r_1\ \ n\,]_k^T \in \mathbb{R}^8 \tag{28a}$$

$$z_k = [\,k_1\ \ k_2\ \ k_3\ \ c_1\ \ c_2\ \ c_3\ \ \beta\ \ \gamma\,]_k^T \in \mathbb{R}^8 \tag{28b}$$

The static parameters, which appear linearly in the equations, form the vector $z_k$, while the vector $u_k$ contains the dynamic states and the nonlinear Bouc–Wen parameter $n$. Then we can use the MLN-RBPF presented in Algorithm 3 for a system of the form Eq. (13). More precisely, the matrices and vectors $F$, $f$, $H$, and $h$ of the process and measurement equations respectively can be computed as follows (using notation $x_{ij} = x_i - x_j$):


$$
\begin{Bmatrix} x_1\\ x_2\\ x_3\\ \dot{x}_1\\ \dot{x}_2\\ \dot{x}_3\\ r_1\\ n \end{Bmatrix}_{[k]}
= \overbrace{\begin{Bmatrix} x_1 + dt\,\dot{x}_1\\ x_2 + dt\,\dot{x}_2\\ x_3 + dt\,\dot{x}_3\\ \dot{x}_1 - dt\,\ddot{u}_g\\ \dot{x}_2 - dt\,\ddot{u}_g\\ \dot{x}_3 - dt\,\ddot{u}_g\\ r_1 + dt\,\dot{x}_1\\ n \end{Bmatrix}_{[k-1]}}^{f(u_{k-1})}
+ \underbrace{dt \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
-r_1 & x_{21} & 0 & -\dot{x}_1 & \dot{x}_{21} & 0 & 0 & 0\\
0 & x_{12} & x_{32} & 0 & \dot{x}_{12} & \dot{x}_{32} & 0 & 0\\
0 & 0 & x_{23} & 0 & 0 & \dot{x}_{23} & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & -|\dot{x}_1||r_1|^{n-1} r_1 & -\dot{x}_1 |r_1|^{n}\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}_{[k-1]}}_{F(u_{k-1})}
\begin{Bmatrix} k_1\\ k_2\\ k_3\\ c_1\\ c_2\\ c_3\\ \beta\\ \gamma \end{Bmatrix} \tag{29a}
$$

$$
y_k = \begin{Bmatrix} \ddot{x}_1\\ \ddot{x}_2\\ \ddot{x}_3 \end{Bmatrix}_{[k]}
= \overbrace{\begin{Bmatrix} -\ddot{u}_g\\ -\ddot{u}_g\\ -\ddot{u}_g \end{Bmatrix}_{[k]}}^{h(u_k)}
+ \underbrace{\begin{bmatrix}
-r_1 & x_{21} & 0 & -\dot{x}_1 & \dot{x}_{21} & 0 & 0 & 0\\
0 & x_{12} & x_{32} & 0 & \dot{x}_{12} & \dot{x}_{32} & 0 & 0\\
0 & 0 & x_{23} & 0 & 0 & \dot{x}_{23} & 0 & 0
\end{bmatrix}_{[k]}}_{H(u_k)}
\begin{Bmatrix} k_1\\ k_2\\ k_3\\ c_1\\ c_2\\ c_3\\ \beta\\ \gamma \end{Bmatrix} \tag{29b}
$$
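
The terms of Eqs. (29a) and (29b) can be assembled numerically as sketched below, assuming unit masses and the orderings of Eqs. (28a) and (28b). The function names are illustrative, and noise terms are left out; this is a sketch of the deterministic part only.

```python
import numpy as np


def process_terms(u, ug_ddot, dt):
    """Return f(u_{k-1}) and F(u_{k-1}) of Eq. (29a), with m1 = m2 = m3 = 1."""
    x1, x2, x3, v1, v2, v3, r1, n = u
    f = np.array([x1 + dt * v1, x2 + dt * v2, x3 + dt * v3,
                  v1 - dt * ug_ddot, v2 - dt * ug_ddot, v3 - dt * ug_ddot,
                  r1 + dt * v1, n])
    F = np.zeros((8, 8))  # columns follow z = [k1, k2, k3, c1, c2, c3, beta, gamma]
    F[3] = [-r1, x2 - x1, 0.0, -v1, v2 - v1, 0.0, 0.0, 0.0]
    F[4] = [0.0, x1 - x2, x3 - x2, 0.0, v1 - v2, v3 - v2, 0.0, 0.0]
    F[5] = [0.0, 0.0, x2 - x3, 0.0, 0.0, v2 - v3, 0.0, 0.0]
    F[6, 6] = -abs(v1) * abs(r1) ** (n - 1) * r1   # Bouc-Wen term multiplying beta
    F[6, 7] = -v1 * abs(r1) ** n                   # Bouc-Wen term multiplying gamma
    return f, dt * F


def measurement_terms(u, ug_ddot):
    """Return h(u_k) and H(u_k) of Eq. (29b): accelerations at the three dofs."""
    _, F_unit = process_terms(u, ug_ddot, dt=1.0)  # with dt = 1 this is the bare matrix
    H = F_unit[3:6]                                # velocity rows give the accelerations
    h = -ug_ddot * np.ones(3)
    return h, H
```

One propagation step is then `u_next = f + F @ z`, and the predicted measurement is `H @ z + h`; both are linear in z once u is fixed, which is the property exploited by the marginalization.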
4.4.2. Equations for Storvik’s algorithm. To use Storvik’s algorithm, because the same parameters appear in both the process and measurement equations, we need to write the measurements at step $k$ as a function of $z$ and $u_{k-1}$ and then compute its Jacobian to use the EKF (method presented in Section 3.3.2).

4.4.3. Performance of the marginalized algorithms. Figure 5 shows the behavior of the MLN-RBPF
and Storvik’s algorithm for the same problem, again with 500 particles and the same prior for the
parameters. We can see that the PF no longer collapses and the results are much better for the
stiffness and damping parameters. It also seems that the MLN-RBPF performs a little bit better than
Storvik’s algorithm.
However, we still have problems learning the Bouc–Wen parameters β, γ, and n. Let us recall that the propagation equation for static parameters is $\theta_{k+1} = \theta_k$; thus, static parameters are not learnt during propagation. They can only be learnt via their correlation with other states and through the measurement update. However, the three parameters β, γ, and n do not appear directly in the measurement equation h, and they only influence the likelihood indirectly through the hysteresis variable $r_1$, which appears in the equation of the first measurement $y_1 = \ddot{x}_1$. This renders learning of these three parameters more difficult†;


†To see more clearly this sensitivity of the likelihood to the parameters, we also run a maximum-likelihood estimator (described in [23]), that is, while the states are recovered through the PF, the estimated static parameters are computed at each time step by maximization of the likelihood $p(y_{1:k} \mid \theta)$. Our numerical experiments show that if the acceleration at dof 1 is not measured, the gradients $\partial p(y_{1:k} \mid \theta) / \partial \{\beta, \gamma, n\}$ are zero at each time step, that is, one cannot learn those parameters via online likelihood maximization. Indeed, this can be easily explained because if $\ddot{x}_1$ is not measured, neither β, γ, n nor $r_1$ appear explicitly in the measurement equation h. Thus, the parameters β, γ, n do not have any influence on the likelihood at the current time step, which explains why the gradients are zero.


Figure 5. Performance of the MLN-RBPF (blue) versus Storvik’s algorithm (green).

thus, it seems that either more particles or an improved algorithm would be needed to obtain good
estimates of those Bouc–Wen parameters.

5. IMPROVEMENTS ON LEARNING THE BOUC–WEN MODEL PARAMETERS


5.1. Second-order extended Kalman filter for second level of marginalization
As mentioned previously, the nonlinearities in the problem studied and more generally in problems of
interest for SHM purposes come from two sources:
• multiplication of an unknown state with an unknown parameter (bilinear nonlinearities in
dynamic states and parameters); and
• Bouc–Wen model of hysteresis (localized high nonlinearity).

Table I. Nonlinear approximation of $x^T x$, where $x$ is $n_x$-dimensional and $x \sim \mathcal{N}(0, I_{n_x})$, adapted from [24].

              Theory    T1    T2       UT
Mean          n_x       0     n_x      n_x
Covariance    2n_x      0     2n_x     2n_x^2


Figure 6. Performance of the MLN-RBPF (blue) versus the marginalized particle filter with second-order extended
Kalman filter (MPF-EKF2) (red).

In many structural systems of interest for SHM, high nonlinearities are localized; thus, only a few states are actually involved in highly nonlinear equations. Thus, conditioned on those few states and parameters (in our example, conditioned on $\dot{x}_1, r_1, n$), the system is bilinear in the states and parameters. Recalling basics of nonlinear Kalman filtering (extended Kalman filtering for instance), we know that it is possible to find exactly the mean and covariance of a multivariate Gaussian random vector, which undergoes a quadratic transformation, through a second-order Taylor series expansion of this transformation (for a quadratic, or bilinear, transformation, the third-order and higher derivatives are 0; thus, there is no error term in the second-order Taylor series expansion). Using this concept in nonlinear Kalman filtering gives rise to the so-called EKF2, whose detailed equations are given in Appendix A.2 (see for instance [12] for a detailed derivation of those equations). This filter has been gaining some interest lately, because it is exact for quadratic transformations, unlike other filters. For instance, Table I shows the mean and covariance of a transformed Gaussian random variable (prior is $x \sim \mathcal{N}(0, I_{n_x})$), for a quadratic transformation $z = g(x) = x^T x$, obtained with different methods. The theoretical distribution of the transformed variable $z$ is $\chi^2(n_x)$, with mean $n_x$ and variance $2n_x$. Table I, adapted from [24], shows that using a second-order Taylor series expansion (T2) gives the correct mean and covariance, while the first-order Taylor expansion (T1) and even the unscented transform, used in the UKF, do not output exact results.
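
The T1 and T2 columns of Table I can be reproduced with a few lines, and compared against a Monte Carlo estimate. The sketch below applies the first- and second-order Taylor moment formulas to $g(x) = x^T x$ around the prior mean; the function names, sample size, and seed are arbitrary choices made for illustration.

```python
import numpy as np


def taylor_moments_xtx(nx):
    """T1 and T2 approximations of mean/variance of g(x) = x^T x, x ~ N(0, I)."""
    mean = np.zeros(nx)
    P = np.eye(nx)
    jac = 2.0 * mean                 # gradient of x^T x at the prior mean
    hess = 2.0 * np.eye(nx)          # Hessian of x^T x (constant)
    t1_mean = mean @ mean
    t1_var = jac @ P @ jac
    t2_mean = t1_mean + 0.5 * np.trace(hess @ P)
    t2_var = t1_var + 0.5 * np.trace(hess @ P @ hess @ P)
    return (t1_mean, t1_var), (t2_mean, t2_var)


def monte_carlo_moments_xtx(nx, n_samples=200_000, seed=0):
    """Sample-based estimate of the mean and variance of x^T x."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, nx))
    g = np.sum(x ** 2, axis=1)
    return g.mean(), g.var()
```

For any $n_x$, the T2 values equal $n_x$ and $2n_x$, matching the chi-squared moments, while T1 returns zero for both because the gradient vanishes at the prior mean.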


Figure 7. Performance of MLN-RBPF (blue) versus Storvik’s algorithm versus the marginalized particle filter
with second-order extended Kalman filter (red) on the final identification of the static parameters, that is, identi-
fying θval. A cross is drawn for each parameter estimate; its position indicates the mean value over 50 runs, and its
size indicates the s.t.d. over 50 runs in each direction.

Thus, here, let us consider in our numerical example the partition of the augmented state vector

$$x^{bl} = [\,x_1\ \ x_2\ \ x_3\ \ \dot{x}_2\ \ \dot{x}_3\ \ k_1\ \ k_2\ \ k_3\ \ c_1\ \ c_2\ \ c_3\ \ \beta\ \ \gamma\,]^T \in \mathbb{R}^{13} \tag{30a}$$

$$x^{n} = [\,\dot{x}_1\ \ r_1\ \ n\,]^T \in \mathbb{R}^{3} \tag{30b}$$

The vector $x^{bl}$ undergoes only quadratic transformations; thus, knowing the measurements $y_k$ and the nonlinear states $x^n$, one can find its mean and covariance using an EKF2. Plugging this idea into the Rao–Blackwellised concept can be carried out by running a marginalized PF for mixed linear–nonlinear Gaussian systems, using EKF2 propagation and update equations instead of the normal KF ones. More precisely, in Algorithm 3 (Appendix B), use $u = x^n$ and $z = x^{bl}$ and replace the KFpropagate and KFupdate functions with the EKF2propagate and EKF2update ones respectively, detailed in Appendix A.2. This marginalized PF with EKF2 updates will be referred to in the remainder of the paper as MPF-EKF2.
It is very important to recognize that no linearization of a highly nonlinear equation is performed here.
The posterior pdf of any state or parameter, which undergoes a transformation that is neither quadratic
nor linear, is inferred using the particle filter. Any linearization of a high nonlinearity would create
some error; here, we only use the EKF2 inference for quadratic functions, and simple EKF for linear
functions. However, there is a Gaussian approximation done for the quadratically transformed vari-
ables, that is, we assume that conditioned on the measurements yk and the highly nonlinear variables
xn, the pdf of xbl is Gaussian, which is not exact. However, this approximation is carried out for each
particle, not for the overall pdf; thus, the overall pdf can still be non-Gaussian (mixture of Gaussians).
This small assumption enables us to compute analytically the mean and covariance of xbl conditioned
on yk and xn exactly, thus outputting better results with the same number of particles, as will be shown
in the following section.
Another important observation is that with this last algorithm, the number of states which are
inferred only via particle approximation (here $x^n \in \mathbb{R}^3$) no longer scales with the dimension


Figure 8. Marginalized particle filter with second-order extended Kalman filter updates.

of the problem as it was for the MLN-RBPF and Storvik’s algorithm. This will generalize to
larger structures as long as the high nonlinearities are localized, which is often the case for
structural systems. Thus, this provides some optimism for the use of this last algorithm on
higher dimensional systems.

5.2. Results and discussion


Results of this new algorithm versus the MLN-RBPF are shown in Figure 6. Again, we can see that this
new algorithm performs very well for the stiffness and damping parameters. Results for the Bouc–Wen
parameters vary from one run to another. To look more precisely at results for the static parameters, the
three solvers were run 50 times each; then for each run, the identified parameter θval was computed (as
explained in Section 4.2). Then we looked at the statistics (mean and standard deviation) of this identified parameter vector. Results are shown in Figure 7: a cross is drawn for each parameter estimate; its


position indicates the mean value over 50 runs, and its size indicates the standard deviation (s.t.d.) over 50 runs in each direction. First, we can observe that the s.t.d. for damping and stiffness parameters is much lower for our new algorithm, meaning that each run outputs almost the same results and those results are very good for the stiffness, as well as $c_1$ and $c_3$. We are overestimating $c_2$ with all three algorithms; however, with our last algorithm, we obtained about 20% error on average over the damping parameters, which is generally acceptable for damping parameters since they are usually harder to recover than stiffness parameters. Concerning the Bouc–Wen model parameters, we can see that, on average, γ and n are pretty well recovered with our new algorithm. However, the parameter β is not well recovered. This is a trend that we have observed on all runs (even with other filters), that is, the β parameter seems to be the hardest to recover. Note however that this would most likely vary with the excitation, response levels, frequency content, etc.
We can also observe in Figure 6 that our new algorithm performs better when looking at the normalized root-mean-square error of the dynamic states (i.e., error between inferred hidden states and the true states).
Also, even though results on the damping parameters are acceptable (a little bit less than 20% error on average with our last algorithm), we can observe some bias in the identified parameter, especially for parameter $c_2$. Indeed, Figure 7 shows that our last algorithm consistently outputs overestimated values of the damping parameters, especially for $c_2$. Also, the parameters β and γ, which govern the shape of the hysteresis loop, are usually a little bit overestimated. However, when we look at the validation loop, it can be very well recovered, meaning that the error on some parameters compensates the error on others. For example, Figure 8 shows the results for one run of our new algorithm, where the validation loop is very good, which is pretty hard to achieve for such hysteretic systems. However, the identified parameter vector shows clear bias for $c_2$ and β:

$$\theta_{val} = [\,7.973\ \ 7.912\ \ 7.943\ \ 0.277\ \ 0.342\ \ 0.278\ \ 3.418\ \ 0.960\ \ 2.488\,]^T \tag{31}$$

to be compared with the true values:

$$\theta_{true} = [\,8\ \ 8\ \ 8\ \ 0.25\ \ 0.25\ \ 0.25\ \ 2\ \ 1\ \ 2\,]^T \tag{32}$$

We believe that those errors could be reduced if we could use a better discretization scheme than
the forward Euler. Actually, because the underlying process is continuous, one could use a
continuous-discrete sequential importance resampling algorithm, as derived in [26]. Furthermore, in
this same paper, Rao–Blackwellisation algorithms are also derived for continuous-discrete systems
(i.e., continuous process equation and discrete measurements, as is actually the case for us), so those
algorithms might be adapted and applied to our systems of interest.

6. CONCLUSIONS
In this paper, we have tackled state estimation and parameter learning in structural systems using particle filtering algorithms. These algorithms present the advantage that no Gaussian approximation is made on the posterior pdf of the states and parameters, which can be very valuable for highly nonlinear systems because we cannot be sure that the Gaussian approximation is correct (it will probably depend a lot on the excitation, the level of nonlinearity, and maybe even the type of measurements). However, those algorithms are not easy to use for large dimensional systems because they tend to collapse, especially when static parameters are added to the state vector.
We discuss here methods based on the concept of Rao–Blackwellisation, where states and parame-
ters, which appear linearly in the equations, are marginalized out and inferred using KF equations, thus
reducing the dimension of the sampling space where the PF is used. We applied two well-known
algorithms in the PF literature, namely, the marginalized PF for mixed linear/nonlinear Gaussian sys-
tems and Storvik’s algorithm. We developed a new way to tackle parameter learning with Storvik’s
algorithm in the case where the same parameters appear in both the propagation and the measurement
equation, a case not often treated in the literature but very important for SHM purposes. We were able
to learn quite accurately all the stiffness and damping parameters of the system using those two


algorithms and a limited number of particles (500). Thus, this renders possible learning of parameters
in structural systems of higher dimensions.
Finally, we improved upon this algorithm by focusing sampling around the states involved in the
high nonlinearity of the system, observing that most of the equations are bilinear in the
states/parameters; thus, their mean and covariance can be inferred exactly using an EKF2. This can be
very helpful for structural systems because high nonlinearities are often localized. We showed that this
new algorithm outperforms the two algorithms previously mentioned, both in terms of state estimation
and parameter identification.
This paper shows promising results regarding application of particle filtering algorithms for parameter identification and thus damage detection purposes; however, all experiments were performed here using synthetic data. Experimental validation of Bayesian methods for damage detection purposes was presented for instance in [27], where the UKF is used to perform joint state/parameter estimation on experimental data. Further assessment of the methods proposed here, based on the PF, would also require a similar validation using experimental data.
To move towards higher dimensional systems, one could think about adapting our last algorithm with some well-known enhancements of the generic PF, such as using an optimal proposal, or an auxiliary particle filter, a method already used in [28] as an improvement upon Storvik’s algorithm for parameter learning. One could also look at the block PF derived in [15]. This algorithm uses the fact that high dimensional systems usually represent phenomena or structures, which are spatially distributed over relatively large distances (for us, numerous degrees of freedom, not all connected to each other). We can then assume that the dynamics at one degree of freedom are mostly dependent on the dofs in its neighborhood, and more importantly, that they will be mostly dependent on measurements performed in its vicinity. This translates mathematically into a decay of correlation as the actual spatial distance between the degrees of freedom increases. In [15], it is explained that using this property can provide a mechanism to overcome the curse of dimensionality. However, the problem of parameter estimation, usually more complicated than state estimation, is not treated and should then be studied carefully.

APPENDIX A: KALMAN FILTER AND EXTENDED KALMAN FILTER EQUATIONS

A.1. KALMAN FILTER EQUATIONS


Let us consider the linear Gaussian system:
$$z_k = G_{k-1} z_{k-1} + g_{k-1} + v_{k-1} \tag{33a}$$
$$y_k = H_k z_k + h_k + \eta_k \tag{33b}$$
At each time step k, the filtering posterior pdf of the state zk knowing the measurements y1 : k
can be inferred exactly using the Kalman filter equations. As for any Bayesian filtering scheme,
these equations consist in a propagation and an update step:

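• Start with posterior at time step $k-1$: $\mathcal{N}\left(z_{k-1}; z_{k-1|k-1}, P_{k-1|k-1}\right)$
• Propagation equation

$$\left(z_{k|k-1}, P_{k|k-1}\right) = \mathrm{KFpropagate}_{(G_{k-1},\, g_{k-1})}\left(z_{k-1|k-1}, P_{k-1|k-1}\right)$$
which consists of
$$z_{k|k-1} = G_{k-1} z_{k-1|k-1} + g_{k-1}$$

$$P_{k|k-1} = G_{k-1} P_{k-1|k-1} G_{k-1}^T + Q_{k-1}$$

• Update equation

$$\left(z_{k|k}, P_{k|k}\right) = \mathrm{KFupdate}_{(y_k,\, H_k,\, h_k)}\left(z_{k|k-1}, P_{k|k-1}\right)$$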


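which consists of
$$z_{k|k} = z_{k|k-1} + K_k\left(y_k - H_k z_{k|k-1} - h_k\right)$$
$$P_{k|k} = \left(I - K_k H_k\right) P_{k|k-1}$$
$$K_k = P_{k|k-1} H_k^T L_k^{-1}$$
$$L_k = H_k P_{k|k-1} H_k^T + R_k$$

• End with posterior at time step $k$: $\mathcal{N}\left(z_k; z_{k|k}, P_{k|k}\right)$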
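
The two steps above translate directly into code. The following is a minimal sketch of KFpropagate and KFupdate for the system of Eq. (33); the function signatures are assumptions made for illustration, and the gain is obtained with a linear solve rather than an explicit inverse.

```python
import numpy as np


def kf_propagate(z, P, G, g, Q):
    """Propagation step: returns z_{k|k-1}, P_{k|k-1}."""
    z_pred = G @ z + g
    P_pred = G @ P @ G.T + Q
    return z_pred, P_pred


def kf_update(z_pred, P_pred, y, H, h, R):
    """Update step: returns z_{k|k}, P_{k|k}."""
    L = H @ P_pred @ H.T + R                       # innovation covariance L_k
    # K = P H^T L^{-1}, computed via a solve (L and P_pred are symmetric)
    K = np.linalg.solve(L, H @ P_pred).T
    z = z_pred + K @ (y - H @ z_pred - h)
    P = (np.eye(len(z_pred)) - K @ H) @ P_pred
    return z, P
```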

A.2 SECOND-ORDER EXTENDED KALMAN FILTER EQUATIONS


Let us consider the nonlinear Gaussian system:
$$z_k = \varphi(z_{k-1}) + g_{k-1} + v_{k-1} \tag{34a}$$

$$y_k = \psi(z_k) + h_k + \eta_k \tag{34b}$$

where $\varphi$ and $\psi$ are nonlinear functions. In the EKFs, one linearizes those two functions using their Taylor series expansion around the mean. In the EKF2, terms up to the second order are kept in the expansion. Thus, one can write for the propagation equation:
$$\varphi(z_{k-1}) = \varphi\left(z_{k-1|k-1} + \delta z\right) \approx \varphi\left(z_{k-1|k-1}\right) + G_z\, \delta z + \frac{1}{2} \sum_i \delta z^T G_{zz}^{i}\, \delta z\, \epsilon_i$$
with $G_z$ the Jacobian of $\varphi$ and $G_{zz}^{i}$ the Hessian of the $i$th component of $\varphi$, both computed at point $z_{k-1|k-1}$. Also, $\epsilon_i$ is a vector of zeros with 1 at position $i$. For the measurement equation one has
$$\psi(z_k) = \psi\left(z_{k|k-1} + \delta z\right) \approx \psi\left(z_{k|k-1}\right) + H_z\, \delta z + \frac{1}{2} \sum_i \delta z^T H_{zz}^{i}\, \delta z\, \epsilon_i$$
with $H_z$ the Jacobian of $\psi$ and $H_{zz}^{i}$ the Hessian of the $i$th component of $\psi$, both computed at point $z_{k|k-1}$.
One can then show (see for example [12]) that the propagation and update equations of the EKF2
can be written as follows:
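• Propagation equation

$$\left(z_{k|k-1}, P_{k|k-1}\right) = \mathrm{EKF2propagate}_{(\varphi)}\left(z_{k-1|k-1}, P_{k-1|k-1}\right)$$
which consists of
$$z_{k|k-1} = \varphi\left(z_{k-1|k-1}\right) + \frac{1}{2} \sum_i \epsilon_i\, \mathrm{tr}\left\{G_{zz}^{i} P_{k-1|k-1}\right\}$$
$$P_{k|k-1} = G_z P_{k-1|k-1} G_z^T + \frac{1}{2} \sum_{i,i'} \epsilon_i \epsilon_{i'}^T\, \mathrm{tr}\left\{G_{zz}^{i} P_{k-1|k-1} G_{zz}^{i'} P_{k-1|k-1}\right\} + Q_{k-1}$$

• Update equation

$$\left(z_{k|k}, P_{k|k}\right) = \mathrm{EKF2update}_{(y_k,\, \psi)}\left(z_{k|k-1}, P_{k|k-1}\right)$$
which consists of:
$$z_{k|k} = z_{k|k-1} + K_k\left(y_k - \psi\left(z_{k|k-1}\right) - \frac{1}{2} \sum_i \epsilon_i\, \mathrm{tr}\left\{H_{zz}^{i} P_{k|k-1}\right\}\right)$$
$$P_{k|k} = P_{k|k-1} - K_k L_k K_k^T$$
$$K_k = P_{k|k-1} H_z^T L_k^{-1}$$
$$L_k = H_z P_{k|k-1} H_z^T + \frac{1}{2} \sum_{i,i'} \epsilon_i \epsilon_{i'}^T\, \mathrm{tr}\left\{H_{zz}^{i} P_{k|k-1} H_{zz}^{i'} P_{k|k-1}\right\} + R_k$$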


Note: in the first-order extended Kalman filter (EKF), only the first-order terms are kept in the Taylor expansion. The
equations for this filter can be easily obtained from the previous equations by setting all the Hessian
terms to 0.
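
A minimal sketch of the EKF2 propagation step is given below. The caller is assumed to supply the function, its Jacobian, and the list of Hessians (one per output component) as callables; these argument names are illustrative and not part of the paper. Setting all Hessians to zero recovers the first-order EKF propagation, as stated in the note above.

```python
import numpy as np


def ekf2_propagate(z, P, phi, jac, hessians, Q):
    """Second-order EKF propagation of a Gaussian N(z, P) through phi.

    phi(z) -> (m,), jac(z) -> (m, n), hessians(z) -> list of m arrays (n, n).
    """
    Gz = jac(z)
    Gzz = hessians(z)
    m = len(Gzz)
    # Mean: first-order term plus half the trace correction for each component
    z_pred = phi(z) + 0.5 * np.array([np.trace(Gi @ P) for Gi in Gzz])
    # Covariance: Jacobian term plus second-order trace terms for every pair (i, i')
    second = np.array([[np.trace(Gzz[i] @ P @ Gzz[j] @ P) for j in range(m)]
                       for i in range(m)])
    P_pred = Gz @ P @ Gz.T + 0.5 * second + Q
    return z_pred, P_pred
```

For a bilinear map such as $\varphi(z) = z_0 z_1$ with independent Gaussian inputs, the returned mean and variance coincide with the exact moments of the product, which is the property exploited in the MPF-EKF2.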

APPENDIX B: THE RAO–BLACKWELLISED PARTICLE FILTER FOR MIXED LINEAR/NONLINEAR GAUSSIAN SYSTEMS

Recall that the system of interest (mixed linear/nonlinear Gaussian system) can be written as follows:

$$u_k = F_{k-1} z_{k-1} + f_{k-1} + v_{k-1}^{u} \tag{35a}$$

$$z_k = G_{k-1} z_{k-1} + g_{k-1} + v_{k-1}^{z} \tag{35b}$$

$$y_k = H_k z_k + h_k + \eta_k \tag{35c}$$

Eq. (35c) is the measurement equation for both the linear and nonlinear parts of the state vector,
using $y_k$ as a measurement. Eqs. (35a) and (35b) are the propagation equations for the nonlinear
and linear parts, respectively. The key property exploited in Algorithm 3, however, is that Eq.
(35a) can also be read as a linear measurement equation for the linear part $z$, with $u_k$ playing the
role of the measurement. Thus, once $u_k$ is known, one can use the Kalman filter update equations to
update the mean and covariance of the linear part, and then propagate it using Eq. (35b).
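
For a single particle, this two-stage treatment of the linear part can be sketched as follows. This is a minimal illustration under stated assumptions, not a transcription of Algorithm 3: the function name `linear_part_step` and the argument names are hypothetical, the matrices $F$, $f$, $G$, $g$ are assumed to have already been evaluated for the particle at step $k-1$, and the noises $v^u$ and $v^z$ are assumed uncorrelated, with covariances `Q_u` and `Q_z`.

```python
import numpy as np

def linear_part_step(u_next, z_mean, z_cov, F, f, G, g, Q_u, Q_z):
    """One step for the linear part z of a single particle.

    u_next is the newly sampled nonlinear state u_k. All model terms
    (F, f, G, g) are assumed to be evaluated at step k-1 for this particle.
    """
    # 1) Kalman update, reading Eq. (35a) as a measurement of z_{k-1}:
    #    u_k = F z_{k-1} + f + v^u, with v^u ~ N(0, Q_u)
    S = F @ z_cov @ F.T + Q_u                  # innovation covariance
    K = np.linalg.solve(S, F @ z_cov).T        # gain P F^T S^{-1} (S, P symmetric)
    z_upd = z_mean + K @ (u_next - f - F @ z_mean)
    P_upd = z_cov - K @ S @ K.T

    # 2) Propagation through Eq. (35b): z_k = G z_{k-1} + g + v^z
    z_pred = G @ z_upd + g
    P_pred = G @ P_upd @ G.T + Q_z
    return z_pred, P_pred

# Illustrative scalar example with made-up numbers
z_pred, P_pred = linear_part_step(
    u_next=np.array([2.0]),
    z_mean=np.array([0.0]), z_cov=np.array([[1.0]]),
    F=np.array([[1.0]]), f=np.array([0.0]),
    G=np.array([[2.0]]), g=np.array([1.0]),
    Q_u=np.array([[1.0]]), Q_z=np.array([[0.1]]),
)
print(z_pred, P_pred)  # [3.] [[2.1]]
```

In a full Rao–Blackwellised filter this step would be run once per particle, alongside the sampling of $u_k$ and the weight update from Eq. (35c), which are not shown here.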
The following algorithm considers only diagonal covariance matrices $Q_k$ and $R_k$. For more general
cases, see the main reference [7].

ACKNOWLEDGEMENTS

The authors would like to acknowledge the support of the US National Science Foundation, which partially
supported this research under Grant No. CMMI-1100321.

REFERENCES

1. Worden K, Farrar CR, Haywood J, Todd M. A review of nonlinear dynamics applications to structural health monitoring.
Structural Control and Health Monitoring 2008; 15(4):540–567. doi:10.1002/stc.215.
2. Farrar CR, Worden K. An introduction to structural health monitoring. Philosophical Transactions of the Royal Society A
2007; 365:303–315. doi:10.1098/rsta.2006.1928.
3. Zuev KM, Beck JL. Asymptotically independent Markov sampling: a new MCMC scheme for Bayesian inference.
International Journal for Uncertainty Quantification 2013; 3:445–474. doi:10.1615/Int.J.UncertaintyQuantification.2012004713.
4. Sajeeb R, Manohar CS, Roy D. Rao–Blackwellisation with substructuring for identification of a class of noisy nonlinear
dynamical systems. International Journal of Engineering under Uncertainty: Hazards, Assessment and Mitigation 2009;
1:81–99.
5. Sajeeb R, Manohar CS, Roy D. A conditionally linearized Monte Carlo filter in non-linear structural dynamics. International
Journal of Non-Linear Mechanics 2009; 44(7):776–790. doi:10.1016/j.ijnonlinmec.2009.04.001.
6. Sajeeb R, Manohar CS, Roy D. A semi-analytical particle filter for identification of nonlinear oscillators. Probabilistic
Engineering Mechanics 2010; 25(1):35–48. doi:10.1016/j.probengmech.2009.05.004.
7. Schön T, Gustafsson F, Nordlund P-J. Marginalized particle filters for mixed linear/nonlinear state-space models. IEEE
Transactions on Signal Processing 2005; 53:2279–2289. doi:10.1109/TSP.2005.849151.
8. Storvik G. Particle filters for state-space models with the presence of unknown static parameters. IEEE Transactions on
Signal Processing 2002; 50(2):281–289. doi:10.1109/78.978383.
9. Chatzi EN, Smyth AW. The unscented Kalman filter and particle filter methods for nonlinear structural system identification
with non-collocated heterogeneous sensing. Structural Control and Health Monitoring 2009; 16:99–123.
doi:10.1002/stc.290.
10. Doucet A, Johansen AM. A tutorial on particle filtering and smoothing: fifteen years later. In Handbook of Nonlinear
Filtering, Vol. 12. Oxford University Press: Oxford, UK, 2011; 656–704.
11. Cappé O, Godsill SJ, Moulines E. An overview of existing methods and recent advances in sequential Monte Carlo.
Proceedings of the IEEE 2007; 95:899–924. doi:10.1109/JPROC.2007.893250.
12. Särkkä S. Bayesian filtering and smoothing. Cambridge University Press: Cambridge, 2013.
13. Crisan D, Doucet A. A survey of convergence results on particle filtering methods for practitioners. IEEE Transactions on
Signal Processing 2002; 50(3):736–746. doi:10.1109/78.984773.
14. Bui Quang P, Musso C, Le Gland F. An insight into the issue of dimensionality in particle filtering. In 13th Conference on
Information Fusion (FUSION), July 2010; 1–8. doi:10.1109/ICIF.2010.5712050.
15. Rebeschini P, van Handel R. Can local particle filters beat the curse of dimensionality? The Annals of Applied Probability
2015; 25(5):2809–2866. doi:10.1214/14-AAP1061.
16. Snyder C, Bengtsson T, Bickel P, Anderson J. Obstacles to high-dimensional particle filtering. Monthly Weather Review
2008; 136:4629–4640.
17. Snyder C. Particle filters, the “optimal” proposal and high-dimensional systems. In ECMWF Seminar on Data Assimilation
for Atmosphere and Ocean, September 2011.
18. Bengtsson T, Bickel P, Li B. Curse-of-dimensionality revisited: collapse of the particle filter in very large
scale systems, 2008.
19. Silverman BW. Density Estimation for Statistics and Data Analysis. Chapman and Hall: London, 1986.
20. Doucet A, de Freitas N, Murphy KP, Russell SJ. Rao–Blackwellised particle filtering for dynamic
Bayesian networks. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, UAI’00, 2000;
176–183.
21. Schön T, Gustafsson F. Particle filters for system identification of state-space models linear in either
parameters or states. In Proceedings of the 13th IFAC Symposium on System Identification, 2003; 1287–1292.
22. Wan EA, van der Merwe R. The unscented Kalman filter for nonlinear estimation. In Adaptive Systems for
Signal Processing, Communications, and Control Symposium 2000 (AS-SPCC), IEEE, 2000; 153–158.
doi:10.1109/ASSPCC.2000.882463.
23. Poyiadjis G, Singh SS, Doucet A. Gradient-free maximum likelihood parameter estimation with
particle filters. In American Control Conference, June 2006; 6 pp. doi:10.1109/ACC.2006.1657187.
24. Gustafsson F, Hendeby G. Some relations between extended and unscented Kalman filters. IEEE Transactions
on Signal Processing 2012; 60:545–555. doi:10.1109/TSP.2011.2172431.
25. Chatzis MN, Chatzi EN, Smyth AW. On the observability and identifiability of nonlinear structural and mechanical systems.
Structural Control and Health Monitoring 2015; 22(3):574–593. doi:10.1002/stc.1690.
26. Sottinen T, Särkkä S. Application of Girsanov theorem to particle filtering of discretely observed continuous-time non-linear
systems. Bayesian Analysis 2008; 3(3):555–584. doi:10.1214/08-BA322.
27. Chatzis MN, Chatzi EN, Smyth AW. An experimental validation of time domain system identification methods with fusion
of heterogeneous data. Earthquake Engineering & Structural Dynamics 2015; 44(4):523–547.
doi:10.1002/eqe.2528.
28. Carvalho CM, Johannes MS, Lopes HF, Polson NG. Particle learning and smoothing. Statistical Science 2010;
25(1):88–106.
