Statistical Solutions For and From Signal Processing
by
Luke Bornn
Master of Science
in
(Statistics)
Table of Contents

Abstract
List of Figures
Acknowledgements
Statement of Co-Authorship
1 Introduction
  1.1 Particle Filtering
  1.2 Support Vector Forecasters
  1.3 References
2 An Efficient Computational Approach for Prior Sensitivity Analysis and Cross-Validation
3 Structural Health Monitoring with Autoregressive Support Vector Machines
4 Conclusion
5 Technical Appendix
Acknowledgements
I am indebted to my mentors, Dr. Arnaud Doucet and Dr. Raphael Gottardo, for the wealth of opportunities and immense support they have provided me. I am also grateful to the many people I have had a chance to interact with at both the University of British Columbia and Los Alamos National Labs. In particular I thank Dr. Dave Higdon and Dr. Todd Graves for guiding me down a fruitful path.
Statement of Co-Authorship
Excluding chapters 2 and 3, all material in this thesis was solely authored.

While I identified and carried out the research in chapter 2, Arnaud Doucet and Raphael Gottardo provided much guidance, criticism, and feedback.

The data in chapter 3 was produced by Gyuhae Park and Kevin Farinholt. Chuck Farrar assisted in writing some of the details pertaining to the structural health monitoring field. In addition, his guidance and feedback contributed immensely to the development of the work.
Chapter 1
Introduction
The research area of signal processing is concerned with analyzing signals, including sound, video, and radar. There are many components to this task, including storage and compression, removing noise, and extracting features of interest. As an example, we might have a noisy recording of a telephone conversation for which we want to store the signal, remove the noise, and identify the speakers. These signals can take many forms, either digital or analog. We focus on statistical signal processing, which is concerned with studying signals based on their statistical properties. We begin by describing two statistical methods employed for signal processing, the first being particle filtering and the second being support vector forecasters. Because chapter 3 contains a detailed development of support vector forecasters, we forego these details here. Later chapters then extend these methodologies to fields for which they were not intended, specifically prior sensitivity analysis and cross-validation as well as structural health monitoring.
1.1 Particle Filtering

Consider the general state-space model

$$x_t \mid x_{t-1} \sim p_{x,t}(x_t \mid x_{t-1}), \qquad y_t \mid x_t \sim p_{y,t}(y_t \mid x_t),$$
where $x_t$ and $y_t$ denote the unobserved state and observation at time $t$, respectively; $p_{x,t}$ and $p_{y,t}$ are the state transition and measurement models, respectively. Also, we assume a prior distribution $p(x_0)$ on $x_0$. In the case
of linearly additive noise, we may write this state-space model as

$$x_t = f(x_{t-1} \mid \theta) + \nu_t, \qquad y_t = h(x_t) + \epsilon_t. \tag{1.1}$$
Here both the stochastic noise $\nu_t$ and the measurement noise $\epsilon_t$ are mutually independent and identically distributed sequences with known density functions. In addition, $f(x_{t-1} \mid \theta)$ and $h(x_t)$ are known functions up to some parameters $\theta$.
In order to build the framework on which to describe the filtering methodologies, we first frame the above state-space model as a recursive Bayesian estimation problem. Specifically, we are interested in obtaining the posterior distribution

$$p(x_{0:t} \mid y_{1:t}) \tag{1.2}$$

where $x_{0:t} = \{x_0, x_1, \ldots, x_t\}$ and $y_{1:t} = \{y_1, y_2, \ldots, y_t\}$. Often we don't require the entire posterior distribution, but merely one of its marginals. For instance, we are often interested in the estimate of the state given all observations up to that point; we call this distribution the filtering density and denote it as

$$p(x_t \mid y_{1:t}). \tag{1.3}$$
By knowing this density, we are able to make estimates about the system’s
state, including measures of uncertainty such as confidence intervals.
If the functions $f$ and $h$ are linear and both $\nu_t$ and $\epsilon_t$ are Gaussian, Kalman filtering is able to obtain the filtering distribution in analytic form. In fact it can be seen that all of the distributions of interest are Gaussian with means and covariances that can be simply calculated. However, when the dynamics are non-linear or the noise non-Gaussian, alternative methods must be used.
In the case of non-linear dynamics with Gaussian noise, the standard methodology is the extended Kalman filter, a nonlinear Kalman filter which linearizes around the current mean and covariance. However, as a result of this linearization, the filter may diverge if the initial state estimate is wrong or the process is incorrectly modeled. In addition, the calculation of the Jacobian in the extended Kalman filter can become tedious in high-dimension problems. One attempted solution to this problem has been the unscented Kalman filter (Wan and van der Merwe, 2001), which approximates the nonlinearity by transforming a random variable instead of through a Taylor expansion, as the extended Kalman filter does. By employing a deterministic sampling technique known as the unscented transform (Julier and Uhlmann, 1997), the UKF selects a minimal set of sample points around the mean which are then propagated through the non-linear functions while recovering the covariance matrix.
When either the stochastic or measurement noise is non-Gaussian, Monte Carlo methods must be employed, in particular particle filters. This Monte Carlo based filtering method relies on a large set of samples, called particles, which are evolved through the system dynamics with potentially non-Gaussian noise using importance sampling and bootstrap techniques. At each time step the empirical distribution of these particles is used to approximate the distribution of interest and its associated features. By sampling from some proposal distribution $q(x_{0:t} \mid y_{1:t})$ in order to approximate (1.2), we may use importance sampling with corresponding unnormalized weights

$$w_t = \frac{p(y_{1:t} \mid x_{0:t})\, p(x_{0:t})}{q(x_{0:t} \mid y_{1:t})}.$$
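To make the recursion concrete, the following is a minimal bootstrap particle filter for a scalar instance of model (1.1), using the state transition as proposal so that the unnormalized weights reduce to the likelihood $p(y_t \mid x_t)$. The dynamics f and h below are illustrative stand-ins, not a model from this thesis.

```python
import numpy as np

def bootstrap_particle_filter(y, f, h, sigma_v, sigma_e, n_particles=1000, seed=0):
    """Bootstrap particle filter for x_t = f(x_{t-1}) + v_t, y_t = h(x_t) + e_t."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n_particles)          # draws from the prior p(x_0)
    filtering_means = np.empty(len(y))
    for t, y_t in enumerate(y):
        # Propagate each particle through the state transition (proposal = prior).
        x = f(x) + rng.normal(0.0, sigma_v, n_particles)
        # Unnormalized importance weights reduce to the likelihood p(y_t | x_t).
        log_w = -0.5 * ((y_t - h(x)) / sigma_e) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        filtering_means[t] = np.sum(w * x)         # estimate under p(x_t | y_{1:t})
        # Multinomial resampling to combat weight degeneracy.
        x = rng.choice(x, size=n_particles, p=w)
    return filtering_means

# Illustrative nonlinear dynamics and observation functions:
f = lambda x: 0.5 * x + 25 * x / (1 + x ** 2)
h = lambda x: x ** 2 / 20
```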
estimation, namely the use of parallel filters to estimate state and model parameters (Wan et al., 2000). Specifically, a state-space representation is used for both the state and parameter estimation problems. While the state-space representation for the state is given in equation (1.1), the representation of the model parameters is given by

$$\theta_t = \theta_{t-1} + \zeta_t, \qquad y_t = h\big(f(x_{t-1} \mid \theta_t) + \nu_t\big) + \epsilon_t.$$
Here $\nu_t$ and $\epsilon_t$ are as in (1.1), while $\zeta_t$ is an additional iid noise term. Thus we can run two parallel filters for state and parameters. At each time step the current state estimate is used in the parameter filter and the current parameter estimate is used in the state filter. The situation is complicated in the particle filter setting, due to the well-known problem of degenerate weights (Casarin and Marin, 2008).
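The alternating structure of dual estimation can be sketched schematically; state_filter and param_filter below are hypothetical objects exposing a one-step update (for example, two unscented Kalman filters or two particle filters):

```python
def dual_estimation(y, state_filter, param_filter, x0, theta0):
    """Schematic dual filter: at each time step, update the state with the
    current parameter estimate held fixed, then update the parameters with
    the current state estimate held fixed."""
    x, theta = x0, theta0
    for y_t in y:
        x = state_filter.step(y_t, x, theta)       # state update given theta
        theta = param_filter.step(y_t, theta, x)   # parameter update given x
    return x, theta
```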
Through this filtering methodology we are able to estimate the state of a dynamic system from noisy measurements, as well as the associated uncertainty of these estimates. In addition, the dual framework provides a mechanism for estimating model parameters along with the state. These filtering tools approximate a sequence of distributions of increasing dimension. In later chapters, we show how the particle filtering methodology may be adapted for situations involving distributions of equal dimension, and subsequently build an algorithm for efficiently performing prior sensitivity analysis and cross-validation.
1.3 References
Casarin, R. and Marin, J.-M. (2008). "Online data processing: comparison of Bayesian regularized particle filters." arXiv:0806.4242v1.

Douc, R., Cappé, O., Moulines, E. (2005). "Comparison of resampling schemes for particle filtering." Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis. pp. 64-69.

Julier, S. and Uhlmann, J. (1997). "A new extension of the Kalman filter to nonlinear systems." Proceedings of AeroSense: The 11th Symposium on Aerospace/Defence Sensing, Simulation and Controls.

Schölkopf, B. and Smola, A.J. (2001). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge.

Steinwart, I. and Christmann, A. (2008). Support Vector Machines. Springer, New York.

van der Merwe, R., Doucet, A., de Freitas, N., Wan, E. (2001). "The unscented particle filter." Advances in Neural Information Processing Systems. 13:584-590.

Wan, E. and van der Merwe, R. (2001). "The unscented Kalman filter." Kalman Filtering and Neural Networks. Ch. 7. Ed. S. Haykin.

Wan, E., van der Merwe, R., Nelson, A. (2000). "Dual estimation and the unscented transformation." Advances in Neural Information Processing Systems. 12:666-672.
Chapter 2
An Efficient Computational Approach for Prior Sensitivity Analysis and Cross-Validation
2.1 Introduction and Motivation
An important step in any Bayesian analysis is to assess the prior distribution's influence on the final inference. In order to check prior sensitivity, the posterior distribution must be studied using a variety of prior distributions. If these posteriors are not available analytically, they are usually approximated using Markov chain Monte Carlo (MCMC). Since obtaining the posterior distribution for one given prior can be very expensive computationally, repeating the process for a large range of prior distributions is often prohibitive. Importance sampling has been implemented as an attempted solution (Besag et al., 1995), but the potential of infinite-variance importance weights makes this technique useless if the posterior distribution changes more than a trivial amount as the prior is altered. Additionally, this importance weight degeneracy increases with the dimension of the parameter space.
One such prior sensitivity problem is the creation of regularization path plots, a commonly used tool when performing penalized regression. In the Bayesian version (Vidakovic, 1998; Park and Casella, 2008), we may want to plot the posterior means of the regression coefficients $\beta \in \mathbb{R}^p$ for a range of the tuning (or penalty) parameter $\lambda$. The corresponding posterior distributions are proportional to

$$\exp\left(-\tfrac{1}{2}(y - X\beta)^T(y - X\beta) - \lambda\|\beta\|_1\right). \tag{2.1}$$
One technique used in SMC is importance sampling, where particles $\{W^{(i)}, \theta^{(i)}\}_{i=1}^N$ distributed as $\pi_{t-1}$ may be reused, reweighting them (before normalization) according to

$$W_t^{(i)} \propto W_{t-1}^{(i)}\, \frac{\pi_t(\theta^{(i)})}{\pi_{t-1}(\theta^{(i)})}. \tag{2.2}$$

If the particles are instead moved through a sequence of Markov kernels $K_t(\theta_{t-1}, \theta_t)$, their marginal density

$$\eta_t(\theta_t) = \int \eta_1(\theta_1) \prod_{k=2}^{t} K_k(\theta_{k-1}, \theta_k)\, d\theta_{1:t-1}$$

is usually impossible to compute and therefore we cannot calculate the necessary importance weights. Additionally, this assumes we are able to sample from $\pi_1(\theta_1)$, which is not always the case. Alternatives attempt to approximate $\eta_t$ pointwise when possible, but the computation of these algorithms is in $O(N^2)$ (Del Moral et al., 2006).
The central idea of SMC samplers (Del Moral et al., 2006) is to employ an auxiliary backward kernel with density $L_{t-1}(\theta_t, \theta_{t-1})$ to get around this intractable integral. This backward kernel relates to a time-reversed SMC sampler giving the same marginal distribution as the forward SMC sampler induced by $K_t$. The backward kernel is essentially arbitrary, but should be optimized to minimize the variance of the importance weights. Del Moral et al. (2006) prove that the sequence of backward kernels minimizing the variance of the importance weights is, for any $t$, $L_t^{\mathrm{opt}}(\theta_t, \theta_{t-1}) = \eta_{t-1}(\theta_{t-1})K_t(\theta_{t-1}, \theta_t)/\eta_t(\theta_t)$. However, it is typically impossible to use this optimal kernel since it relies on intractable marginals. Thus, we should select a backward kernel that approximates this optimal kernel. Del Moral et al. (2006) give two suboptimal backward kernels to approximate $L_{t-1}^{\mathrm{opt}}$, with corresponding incremental weights
$$w_t(\theta_{t-1}, \theta_t) = \frac{\pi_t(\theta_t)}{\int \pi_{t-1}(\theta_{t-1})\, K_t(\theta_{t-1}, \theta_t)\, d\theta_{t-1}} \tag{2.3a}$$

$$w_t(\theta_{t-1}, \theta_t) = \frac{\pi_t(\theta_{t-1})}{\pi_{t-1}(\theta_{t-1})} \tag{2.3b}$$
These incremental weights are then multiplied by the weights at the previous time and normalized to sum to 1. We note that the suboptimal kernel resulting in (2.3b) is actually an approximation of that resulting in (2.3a), and coincidentally has the same form as (2.2), the reweighting mechanism for importance sampling. In this manner the first kernel should perform better, particularly when successive distributions are considerably different (Del Moral et al., 2006). Although the weights (2.3a) are a better approximation of the optimal backward kernel weights, the second kernel is convenient since the resulting incremental weights (2.3b) do not depend on the position of the moved particles $\theta_t$, and hence we are able to reweight the particles prior to moving them. We include the incremental weight (2.3a) because, when $K_t$ is a Gibbs kernel moving one component at a time, it simplifies to $\pi_t(\theta_{t-1,-k})/\pi_{t-1}(\theta_{t-1,-k})$, where $k$ is the index of the component being moved by the Gibbs sampler and $\theta_{t-1,-k}$ is the particle excluding the $k$th component. By a simple Rao-Blackwell argument it can be seen that this choice, by conditioning on the variable being moved, results in reduced variance of the importance weights compared to (2.3b).
for t = 1 do
    Obtain N weighted samples $\theta_1^{(i)}$ from $\pi_1$ (directly, via MCMC, etc.)
end
for t = 2, ..., T do
    Copy $\theta_{t-1}^{(i)}$ to $\tilde\theta_t^{(i)}$ and calculate weights $\tilde W_t^{(i)}$ according to (2.2)
    if ESS($\tilde\theta_t$) > c then
        Copy $(\tilde\theta_t^{(i)}, \tilde W_t^{(i)})$ to $(\theta_t^{(i)}, W_t^{(i)})$
    else
        Reweight: Calculate weights $W_t^{(i)} \propto W_{t-1}^{(i)} \times w_t(\theta_{t-1}, \theta_t)$, where $w_t(\theta_{t-1}, \theta_t)$ is given by either (2.3a) or (2.3b)
        Resample: Resample particles according to the above weights. Set all weights to 1/N
        Move: Move particles with a Markov kernel of invariant distribution $\pi_t$
    end
end

note 1: If a backward kernel is chosen such that the incremental weights depend on the position of the moved particle $\theta_t$, the reweight step comes after the move step and resampling is performed with the weights $W_{t-1}^{(i)}$.
note 2: c is a user-specified threshold on the effective sample size.
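A compact sketch of this adaptive reweight/resample/move scheme, assuming the user supplies the sequence of unnormalized log-densities and a $\pi_t$-invariant MCMC move; all function and variable names below are ours:

```python
import numpy as np

def smc_sampler(log_pis, theta0, mcmc_move, ess_frac=2/3, seed=0):
    """Adaptive SMC sampler over targets pi_1, ..., pi_T.

    log_pis[t](theta) returns log pi_{t+1} at each particle row of theta (N, d);
    mcmc_move(theta, log_pi, rng) is a pi-invariant Markov kernel.
    """
    rng = np.random.default_rng(seed)
    theta = theta0.copy()                              # particles from pi_1
    n = theta.shape[0]
    log_w = np.zeros(n)
    means = []
    for t in range(1, len(log_pis)):
        # Reweight with the incremental weights (2.3b): pi_t / pi_{t-1}.
        log_w += log_pis[t](theta) - log_pis[t - 1](theta)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        if 1.0 / np.sum(w ** 2) < ess_frac * n:        # ESS dropped below c
            idx = rng.choice(n, size=n, p=w)           # resample
            theta, log_w = theta[idx], np.zeros(n)
            theta = mcmc_move(theta, log_pis[t], rng)  # move
            w = np.full(n, 1.0 / n)
        means.append(w @ theta)                        # posterior mean under pi_t
    return np.array(means)
```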
Since successive target distributions differ only in their priors, the likelihood cancels in the incremental weights (2.3b), which reduce to a ratio of prior densities:

$$W_t^{(i)} \propto W_{t-1}^{(i)} \times \frac{f(y \mid \theta_{t-1}^{(i)})\,\nu_t(\theta_{t-1}^{(i)})}{f(y \mid \theta_{t-1}^{(i)})\,\nu_{t-1}(\theta_{t-1}^{(i)})} = W_{t-1}^{(i)} \times \frac{\nu_t(\theta_{t-1}^{(i)})}{\nu_{t-1}(\theta_{t-1}^{(i)})} \tag{2.4}$$

where $\theta_t^{(i)}$ is the $i$th particle sampled at time $t$ and $\nu_t(\theta)$ is the $t$th prior distribution evaluated at the point $\theta^{(i)}$. If the ESS falls below a given threshold, the particles are resampled and moved as described above.
the column vectors of predictors (including the unit intercept vector). For clarity of presentation we present the model with a continuous response; however, it is simple to extend to binary responses (Albert and Chib, 1993). We use the prostate data of Stamey et al. (1989), which has eight predictors and a response (logarithm of prostate-specific antigen) with likelihood
$$y \mid X, \beta, \sigma^2 \sim N_n(X\beta, \sigma^2 I_n). \tag{2.5}$$
Using a double exponential prior distribution with parameter $\lambda$ on the regression coefficients $\beta_j$, $j = 1, \ldots, p$, the corresponding posterior distribution is proportional to (2.1). We see from the form of this posterior distribution that if $\lambda = 0$ the MAP estimate of $\beta$ will correspond to the least squares solution. However, as $\lambda$ increases there will be shrinkage on $\beta$, which may be displayed using a regularization path plot. Because the shrinkage as $\lambda$ varies is nonlinear, we set a schedule $\lambda_t = e^{t/20}$, $t = 1, \ldots, 100$. We create a "gold standard" Bayesian Lasso regularization path plot for this data by running MCMC with a Markov chain of length 50,000 at each level of $\lambda$ and plotting the posterior mean of the resulting regression coefficients (Figure 2.1). It should be noted that the creation of this plot took over 5 hours.
[Figure 2.1: Gold-standard Bayesian Lasso regularization path plot for the prostate data, computed by MCMC]
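For this example the incremental weights (2.4) are available in closed form: with the double exponential prior the likelihood cancels, leaving only the ratio of prior densities. A minimal sketch (variable names ours):

```python
import numpy as np

# Penalty schedule from the text: lambda_t = exp(t / 20), t = 1, ..., 100.
lambdas = np.exp(np.arange(1, 101) / 20)

def log_incremental_weight(beta, lam_new, lam_old):
    """Log of (2.4) for the prior prod_j (lam/2) exp(-lam |beta_j|): the
    likelihood cancels, leaving the prior ratio nu_t / nu_{t-1}."""
    p = beta.shape[1]                    # beta: (n_particles, p) particle array
    l1 = np.abs(beta).sum(axis=1)
    return p * np.log(lam_new / lam_old) - (lam_new - lam_old) * l1
```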
Since the idea is to create these plots quickly for exploratory analysis, we will compare our SMC-based method to MCMC, with both constrained to work in 5 minutes (+/- 5 seconds) and both using the same Markov kernel. In order to perform MCMC in our time frame of 5 minutes, the Markov chain had a length of 1200 for each of the 100 levels of $\lambda$. The mean of each resulting posterior distribution was used to plot the regularization path plots in Figure 2.2(a). In comparison, to run in 5 minutes our SMC algorithm used $N = 4200$ particles and resampled and moved the particles when the ESS dropped below $c = 2N/3 = 2800$ (Figure 2.2(b)).
we can employ the following prior distributions for $\beta$ and $\sigma^2$ (Zellner, 1986; Marin and Robert, 2007):

$$\pi(\beta, \sigma^2 \mid g) \propto (g\sigma^2)^{-(p+1)/2}\,\sigma^{-2}\exp\left(-\frac{\beta^T X^T X \beta}{2g\sigma^2}\right).$$
[Figure 2.2: Regularization path plots using MCMC and SMC for a fixed computational time of 5 minutes. The plots are of standardized coefficients $\beta$ vs. $\|\beta\|_1/\max(\|\beta\|_1)$.]
What then may be taken from these marginal probability plots? When performing simple forward selection regression, the variables 1, 2, 6, 9, and 14 are chosen. Slightly different results come from doing backward selection; in particular, variables 1 and 14 are replaced by variables 12 and 13. The LASSO solution (using 5-fold cross-validation) is the same as the forward solution with the additional variables 7 and 8. In addition, the LASSO solution contains some shrinkage on the regression coefficients (see Example 2.3.1). Using g-priors, the variables that clearly stand out (see Figure 2.3(a)) are 1, 2, 6, 9, and 14. Thus the g-prior solution taken from the plot corresponds to the forward selection model. Also, for a given $g$, the plot obtained with SMC shows the correct top 4 variables for inclusion, whereas the variability of the MCMC-based plot makes it impossible to do so.
2.4 Cross-Validation
[Figure 2.3: Exact marginal and model probabilities for variable selection using g-priors as a function of $\log(g)$. Plot (a) highlights several variables ($X_1, X_2, X_6, X_9, X_{14}$) which show high marginal probabilities of inclusion. Plot (b) shows the posterior probabilities of 5 models chosen to highlight the effect of $g$ on model size.]
We move from the full posterior $\pi(\theta \mid X, y)$ through a sequence of intermediate distributions to each case-deleted posterior:

$$\pi(\theta \mid X, y) \longrightarrow \begin{array}{ccc} \pi_1(\theta \mid X_{\setminus S_1}, y_{\setminus S_1}) & \cdots & \pi_1(\theta \mid X_{\setminus S_{\max}}, y_{\setminus S_{\max}}) \\ \pi_2(\theta \mid X_{\setminus S_1}, y_{\setminus S_1}) & \cdots & \pi_2(\theta \mid X_{\setminus S_{\max}}, y_{\setminus S_{\max}}) \\ \vdots & & \vdots \\ \pi_T(\theta \mid X_{\setminus S_1}, y_{\setminus S_1}) & \cdots & \pi_T(\theta \mid X_{\setminus S_{\max}}, y_{\setminus S_{\max}}) \end{array}$$
$$f_{\setminus S}(y \mid \beta, \sigma^2) = (2\pi\sigma^2)^{-(n-s)/2}\exp\left\{-\frac{1}{2\sigma^2}\left[(y - X\beta)^T(y - X\beta) - (y_S - X_S\beta)^T(y_S - X_S\beta)\right]\right\}$$

$$q_{\setminus S}(\theta) = f_{\setminus S}(y \mid \beta, \sigma^2) \times \pi(\beta) \times \pi(\sigma^2)$$
We assume that the prior distributions for $\beta$ and $\sigma^2$ are proper and independent. Epifani et al. (2005) show that if the weights $w_{\setminus S}(\theta) = q_{\setminus S}(\theta)/q(\theta)$ are used to move to the case-deletion posterior directly, then the $r$th moment of these weights is finite if and only if all of the following conditions hold:

a) the leverage of the deleted observations is less than $1/r$

b) $n - rs > 1$

c) $\mathrm{RSS}^*_{\setminus S}(r) > 0$

where $s$ is the number of deleted observations and RSS denotes the residual sum of squares of the least squares fit of the full data set. This result should not be taken lightly: as Geweke (1989) points out, if the 2nd moment does not exist, the importance sampling estimator will follow neither a $\sqrt{n}$ asymptotic nor a central limit theorem. Condition (a) states that if the leverage of the deleted observations is too large, then the importance weights will have infinite variance. Condition (b) gives a condition relating sample size to the allowable test set size $s$. Condition (c) says that if the influence of the deleted observations is large relative to RSS, then the importance weights will have infinite variance. We show here how using a sequence of artificial intermediate distributions with SMC can help to mitigate this problem.
We introduce a sequence of distributions

$$q_{\gamma^*}(\theta) \propto (q(\theta))^{1-\gamma^*}(q_{\setminus S}(\theta))^{\gamma^*}$$

with corresponding incremental importance weights

$$w_{\setminus S,\gamma^*}(\theta) = \left(\frac{q_{\setminus S}(\theta)}{q(\theta)}\right)^{\gamma^*}.$$
Theorem 1. Provided that $\mathrm{RSS}^*_{\setminus S}(1) > 0$ and the prior distributions for $\beta$ and $\sigma^2$ are proper and independent, a sequence of distributions proportional to $\{(q(\theta))^{1-\gamma}(q_{\setminus S}(\theta))^{\gamma};\ \gamma = 0, \epsilon, 2\epsilon, \ldots, 1-\epsilon, 1\}$ may be constructed to move from $q(\theta)$ to $q_{\setminus S}(\theta)$ such that each successive importance weight has finite variance.
largest sequence of distributions was of length 10. In most cases a single step (moving $\gamma$ directly from 0 to 1) sufficed, which is equivalent to importance sampling. Thus SMC does not waste time transitioning to case-deleted posteriors if importance sampling will suffice.
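In code, the tempered move to a case-deleted posterior needs only the two unnormalized log-densities; each increment of $\gamma$ by $\epsilon$ multiplies the weights by $(q_{\setminus S}/q)^{\epsilon}$. A sketch (names ours), with the resample-and-move steps between increments handled exactly as in the algorithm of the previous section:

```python
import numpy as np

def tempered_log_increment(log_q, log_q_del, theta, eps):
    """One step along q_gamma proportional to q^(1-gamma) * q_delS^gamma:
    raising gamma by eps adds eps * log(q_delS(theta) / q(theta)) to the
    log-weights of the particles."""
    return eps * (log_q_del(theta) - log_q(theta))
```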
We use a Gibbs sampler to approximate the posterior distribution of $(\beta, \sigma^2)$ for $\lambda = e^{2.5}$ on the full data set and then use SMC to move to the case-deleted posteriors.
Using MCMC with a Markov chain of length 10,000 (not shown), we observe that the average squared loss $\sum_k (y_k - X_k\beta)^2$ is a smooth function of $\lambda$, with a small bump at $\lambda = e^{2}$ and minimum near $\lambda = e^{0.312}$. Thus to minimize prediction error (at least in terms of the squared loss) we should set $\lambda = e^{0.312}$. To perform this task in a time-restricted manner we constrained both the MCMC and SMC algorithms to work in 10 minutes (+/- 30 seconds). Figures 2.6(a) and 2.6(b) show the resulting plots. The reduced variability of the SMC-based plot allows us to make more accurate conclusions. For instance, it is clear in the plot obtained with SMC (Figure 2.6(b)) that the minimum error lies somewhere around $\lambda = e^{0.312}$, whereas from the MCMC plot (Figure 2.6(a)) it could be anywhere between $1$ and $e^{2}$.
2.6 References
Albert, J.H. and Chib, S. (1993). "Bayesian Analysis of Binary and Polytomous Response Data." Journal of the American Statistical Association. 88:669-679.

Alqallaf, F. and Gustafson, P. (2001). "On Cross-validation of Bayesian Models." Canadian Journal of Statistics. 29:333-340.

Besag, J., Green, P., Higdon, D., Mengersen, K. (1995). "Bayesian Computation and Stochastic Systems (with discussion)." Statistical Science. 10:3-66.
Chapter 3
Structural Health Monitoring with Autoregressive Support Vector Machines
3.1 Introduction
The extensive literature on structural health monitoring (SHM)² has documented the critical importance of detecting damage in aerospace, civil, and mechanical engineering systems at the earliest possible time. For instance, airlines may be interested in maximizing the lifespan and reliability of their jet engines, or governmental authorities might like to monitor the condition of bridges and other civil infrastructure in an effort to develop cost-effective lifecycle maintenance strategies. These examples indicate that the ability to efficiently and accurately monitor all types of structural systems is crucial for both economic and life-safety issues. One such monitoring technique is vibration-based damage detection, which is based on the principle that damage in a structure, such as a loosened connection or crack, will alter the dynamic response of that structure. There has been much recent work in this area; in particular, Doebling et al. (1998) and Sohn et al. (2004) present detailed reviews of vibration-based SHM. Because of random and systematic variability in experimentally measured dynamic response data, statistical approaches are necessary to ensure that changes in a structure's measured dynamic response are a result of damage and not caused by operational and environmental variability. Although much of the vibration-based SHM literature focuses on deterministic methods for identifying damage from changes in dynamic system response, we will focus on approaches that follow a statistical pattern recognition paradigm for SHM (Farrar and Worden, 2007).

² A version of this chapter has been accepted for publication: Bornn, L., Farrar, C.R., Park, G., Farinholt, K. (2008). "Structural Health Monitoring with Autoregressive Support Vector Machines." Journal of Vibration and Acoustics.
This paradigm consists of the four steps of 1. operational evaluation, 2. data acquisition, 3. feature extraction, and 4. statistical classification of features. The work presented herein focuses on steps 3 and 4 of this paradigm.

One approach for performing SHM is to fit a time series predictive model, such as an autoregressive (AR) model, to each sensor output using data known to be acquired from the structure in its undamaged state. These models are then used to predict subsequent measured data, and the residuals (the difference between the model's prediction and the observed value) are the damage-sensitive feature that is used to check for anomalies. This process provides many estimates (one at each time step) of a single-dimension feature, which is advantageous for subsequent statistical classification. The logic behind this approach is that if the model fit to the undamaged sensor data no longer predicts the data subsequently obtained from the system (and hence the residuals are large and/or correlated), there has been some sort of change in the process underlying the generation of the data. This change is assumed to be caused by damage to the system. These linear time series models have been used in damage detection processes spanning a wide range of structures and associated damage scenarios, including cracking in concrete columns (Fugate et al., 2001; Sohn et al., 2000), loose connections in a bolted metallic frame structure (Allen et al., 2002), and damage to insulation on wiring (Clark, 2008). However, the linear nature of this modeling approach limits the scope of application and the ability to accurately assess the condition of systems that exhibit nonlinearity in their undamaged state. In this paper, we demonstrate how support vector machines (SVM) may be used to create a non-linear time series model that provides an alternative to these linear AR models.

Once a model has been chosen and the predictions from this model have been compared to actual sensor data, there are several statistical methods for analyzing the resulting residuals. Sequential hypothesis tests, such as the sequential probability ratio test (Allen et al., 2002), may be used to test for changes in the residuals. Alternatively, statistical process control procedures, typically in the form of control charts, may be used to indicate abnormalities in the residuals (Fugate et al., 2001). In addition, sliding window approaches look at the features of successive subsets of data to detect anomalies (e.g. Clark, 2008). For example, the sliding window approach of Ma and Perkins (2003) looks at thresholds for the residuals such that the probability of an undamaged residual exceeding this threshold is 5%. A subset of $n$ consecutive data points is then checked, and large values of the number $g$ of points exceeding the threshold indicate damage, where $g$ has a
binomial distribution under the hypothesis of no damage.

The linear AR model of order $p$, denoted AR($p$), takes the form

$$x_t = \sum_{j=1}^{p} \phi_j x_{t-j} + e_t \tag{3.1}$$
variables. Note that an $n$ point time series will yield $n - p$ equations (one for each time point after the first $p$ points). Next we must decide the order $p$ of our model. There are many methods for selecting $p$, such as partial autocorrelation or the Akaike Information Criterion (AIC), which are discussed in more detail in Fugate et al. (2001). In general, we seek the lowest order model that captures the underlying physical process and hence will generalize to other data sets. As with linear AR modeling, we create the training set on which to build our
Ideally we would like to find a function $f$ such that $f(x^k_{t-p:t-1}) = x^k_t$ for all $k$ and $t \le t_0$. However, the form of $f$ is often restricted to the class of linear functions (as is the case for AR models),

$$f(x_{t-p:t-1}) = \langle w, x_{t-p:t-1} \rangle, \tag{3.2}$$

where $\langle \cdot, \cdot \rangle$ denotes the dot (or inner) product and $w$ is a vector of model parameters. This restricted form makes perfect fit of the data impossible in most scenarios. As a result, we allow prediction using $f$ to have an error bounded by $\epsilon$, and find $w$ under this constraint. With the recent advances in penalized regression methods such as ridge regression and the lasso, the improved prediction performance of shrunken (or smoothed) models is now well-understood (Copas, 1997; Fu, 1998). Thus in order to provide a model that maximizes prediction performance, we seek to incorporate shrinkage on the model parameters $w$. Such shrunken $w$ may be found by minimizing the Euclidean norm subject to the error constraint, namely
$$\text{minimize } \|w\|^2 \quad \text{subject to } \left|x^k_t - \langle w, x^k_{t-p:t-1}\rangle\right| \le \epsilon \quad \text{for all } k \text{ and } t = p+1, \ldots, t_0. \tag{3.3}$$
This model relies on the assumption that a linear model is able to fit the data to within precision $\epsilon$. However, typically such a linear model does not exist, even for moderate settings of $\epsilon$. As such, we introduce the slack variables $\xi_t, \xi_t^*$ to allow for deviations beyond $\epsilon$. The resulting formulation is

$$\text{minimize } \|w\|^2 + C\sum_{t=p+1}^{t_0}(\xi_t + \xi_t^*), \tag{3.4}$$

subject to the error constraints of (3.3), relaxed by the nonnegative slack variables.
The constant $C$ controls the tradeoff between giving small $w$ and penalizing deviations larger than $\epsilon$. In this form we see that only points that lie outside of the bound $\epsilon$ have an effect on $w$. Figure 3.1 illustrates the process graphically.
To solve this optimization problem we form the Lagrangian

$$\mathcal{L} = \tfrac{1}{2}\|w\|^2 + C\sum_{t=p+1}^{t_0}(\xi_t + \xi_t^*) - \sum_{t=p+1}^{t_0}\alpha_t\big(\epsilon + \xi_t - x_t + \langle w, x_{t-p:t-1}\rangle\big) - \sum_{t=p+1}^{t_0}\alpha_t^*\big(\epsilon + \xi_t^* + x_t - \langle w, x_{t-p:t-1}\rangle\big) - \sum_{t=p+1}^{t_0}(\eta_t\xi_t + \eta_t^*\xi_t^*) \tag{3.5}$$

and set its partial derivatives to zero:

$$\partial_w \mathcal{L} = w - \sum_{t=p+1}^{t_0}(\alpha_t - \alpha_t^*)\,x_{t-p:t-1} = 0 \tag{3.6}$$

$$\partial_{\xi_t}\mathcal{L} = C - \alpha_t - \eta_t = 0, \qquad \partial_{\xi_t^*}\mathcal{L} = C - \alpha_t^* - \eta_t^* = 0.$$
Substituting these conditions into (3.5) yields a quadratic dual optimization problem in the $\alpha_t, \alpha_t^*$, in which the training inputs enter only through dot products. Consequently, we may map the inputs into a higher-dimensional feature space $F$ and compute the dot products in the transformed space. Such mappings allow us to extend beyond the linear framework presented above. Specifically, the mapping allows us to fit linear functions in $F$ which, when converted back to $\mathbb{R}^p$, are nonlinear. A toy example of this process is illustrated for a mapping $\mathbb{R}^2 \to \mathbb{R}^3$, namely $\Phi(x, y) = (x^2, xy, y)$, in Figure 3.1. Here the data is generated using the relationship $y = x^2$. To make use of this transformed space, we replace the dot product term with a kernel function; for example,
$$(x \cdot y)^2 = \big((x_1, x_2) \cdot (y_1, y_2)\big)^2 = \big((x_1^2, \sqrt{2}x_1x_2, x_2^2) \cdot (y_1^2, \sqrt{2}y_1y_2, y_2^2)\big) = \big(\Phi(x) \cdot \Phi(y)\big) \tag{3.10}$$

defining $\Phi(x) = (x_1^2, \sqrt{2}x_1x_2, x_2^2)$. More generally, it has been shown that every kernel that gives a positive matrix $(k(x_i, x_j))_{ij}$ has a corresponding map $\Phi(x)$ (Smola and Schölkopf, 2004). One such family of kernels we focus on is Radial Basis Function (RBF) kernels, which have the form

$$k(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right),$$

where $\sigma^2$ is the kernel variance. This parameter controls fit, with large values leading to smoother functions and small values leading to better fit. In practice moderate values are preferred as a trade-off between model fit and prediction performance.
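Both the polynomial-kernel identity (3.10) and the RBF kernel are easy to verify numerically; a short check with $\sigma^2 = 1$ (the value used in the experiments below):

```python
import numpy as np

x, y = np.array([1.0, 2.0]), np.array([3.0, 0.5])
phi = lambda v: np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])
assert np.isclose(np.dot(x, y) ** 2, phi(x) @ phi(y))   # the identity (3.10)

rbf = lambda u, v, s2=1.0: np.exp(-np.sum((u - v) ** 2) / (2 * s2))  # RBF kernel
print(rbf(x, y))
```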
Whereas a traditional AR($p$) model employs a linear model that is a function of the previous $p$ time points, the SVM model compares the previous $p$ time points to all groups of $p$ successive data points from the training sample. Specifically, the model has the form

$$f(x_{t-p:t-1}) = \sum_{j=p+1}^{t_0} \beta_j\, k(x_{j-p:j-1},\, x_{t-p:t-1}). \tag{3.12}$$
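Fitting (3.12) reduces to an off-the-shelf $\epsilon$-SVR once the series is embedded into lagged vectors; a sketch using scikit-learn's SVR (helper names ours; $C$, $\epsilon$, and the kernel variance match the settings reported below):

```python
import numpy as np
from sklearn.svm import SVR

def lag_matrix(x, p):
    """Rows x_{t-p:t-1} with targets x_t, one training equation per time point."""
    X = np.column_stack([x[i:len(x) - p + i] for i in range(p)])
    return X, x[p:]

def fit_svm_ar(train, p=5, C=1.0, eps=0.1, sigma2=1.0):
    X, y = lag_matrix(train, p)
    # scikit-learn's gamma is 1 / (2 sigma^2) in the RBF parameterization above.
    return SVR(kernel="rbf", C=C, epsilon=eps, gamma=1.0 / (2 * sigma2)).fit(X, y)

def residuals(model, x, p=5):
    X, y = lag_matrix(x, p)
    return y - model.predict(X)     # the damage-sensitive features
```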
Control lines based on the residuals from the undamaged training data are then extended through the remaining (potentially damaged) data, and damage is indicated when a statistically significant number of residuals, in this case more than 1%, lie outside these lines. Note damage can also be indicated when the residuals no longer have a random distribution, even though they may not lie outside the control lines.
RBF neural networks, which have the same form as Equation (3.12), have previously been used to perform SHM (e.g. Rytter and Kirkegaard (1997)). However, fitting these networks requires much more user input, such as selecting which $\beta_j$ are non-zero as well as selecting the corresponding training points. In addition, the fitting of the neural network model is a rather complicated nonlinear optimization process relative to the simple quadratic optimization used in the support vector framework. Although the SVM models are more easily developed, Schölkopf et al. (1997) have demonstrated that SVMs still more accurately predict the data than RBF neural networks despite their simplicity.
To compare the two approaches, we simulate a signal

$$x_t = \sin^3(400\pi t/1200) + 2\sin(400\pi t/1200) + \sin(200\pi t/1200) + \sin(100\pi t/1200) + \Psi_t + \epsilon_t \tag{3.13}$$

where $\epsilon_t$ is Gaussian random noise with mean 0 and standard deviation 0.1 and $\Psi_t$ is a damage term. Three different damage cases are added to this time series at various times as defined by

$$\Psi_t = \begin{cases} \epsilon_{1,t} & \text{for } t = 600, \ldots, 650 \\ \sin(1000\pi t/1200) & \text{for } t = 800, \ldots, 850 \\ \epsilon_{2,t} & \text{for } t = 1000, \ldots, 1050 \\ 0 & \text{otherwise} \end{cases} \tag{3.14}$$

where $\epsilon_1$ and $\epsilon_2$ are Gaussian random noise with means 0 and 1 and standard deviations 0.5 and 0.2, respectively. Through the use of $\Psi$ we attempt to simulate several different types of damage to compare the models' performance handling each. This raw signal is plotted in Figure 3.3, where it can be seen that the changes caused by the damage are somewhat subtle.
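The simulated signal can be generated directly from (3.13) and (3.14) as written above:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(1200)

# Baseline signal of (3.13) with N(0, 0.1^2) measurement noise.
x = (np.sin(400 * np.pi * t / 1200) ** 3 + 2 * np.sin(400 * np.pi * t / 1200)
     + np.sin(200 * np.pi * t / 1200) + np.sin(100 * np.pi * t / 1200)
     + rng.normal(0.0, 0.1, t.size))

# Damage term Psi of (3.14): three distinct damage types.
psi = np.zeros(t.size)
psi[600:651] = rng.normal(0.0, 0.5, 51)                   # noise burst
psi[800:851] = np.sin(1000 * np.pi * t[800:851] / 1200)   # added high frequency
psi[1000:1051] = rng.normal(1.0, 0.2, 51)                 # mean shift
x += psi
```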
[Figure 3.3: raw simulated signal. Figure 3.4: autocorrelation of the signal by time lag]
The order $p$ for both models was set at 5, as determined from the autocorrelation plot in Figure 3.4. This plot shows the correlation between successive time points for a given time lag. We see from the plot that after a lag of 5 the correlation is quite small, and hence little information is gained by including a longer past history $p$. This is a standard method for determining model order for traditional AR models, and as such should maximize that method's performance, ensuring the SVM-based model isn't afforded an unfair advantage.
The results of applying both the SVM model and a traditional AR model to the undamaged portion of the signal between time points 400 and 600 are shown in Figure 3.5, where the signals predicted by these models are overlaid on the actual signal. A qualitative visual assessment of Figure 3.5 shows that the SVM more accurately predicts this signal. A quantitative assessment is made by examining the distribution of the residual errors obtained with each model. The standard deviation of the residual errors from the SVM model is 0.26 while for the traditional AR it is 0.71, again indicating that the SVM is more accurately predicting the undamaged portion of this time series.

In order for a model to excel at detecting damage, it must fit the undamaged data well (i.e. small and randomly distributed residual errors) while fitting the damaged data poorly, as identified by increased residual errors in the presence of
[Figure 3.5: SVM (top) and linear AR models (bottom) fit to a subset of the data]
[Figure 3.6: Residuals from SVM (top) and linear AR models (bottom) applied to the simulated data. The 99% control lines based on the residuals from the undamaged portion of the signal are shown in red.]
damage. Since the traditional AR model fits a single model to the entire data, model fit will be very poor if the data is non-stationary (for instance if the excitation is in the form of hammer impacts). Additionally, since the traditional AR model as presented above does not contain a moving average term, it will continue to fit when damage is in the form of a shift up or down in the raw time series (as demonstrated by the third damage scenario above). Conversely, the SVM-based method works by comparing each window of $p$ data points to all corresponding windows in the training set. Thus, if a similar sequence exists in the training set, we can expect the fit to be quite good. We see two scenarios in which the SVM-based method will perform poorly. Firstly, if there is some damage in the undamaged scenario, and similar damage occurs in the testing set, the model will likely fit this portion quite well. Secondly, if damage manifests itself in such a way that the time-series data is extremely similar to the undamaged time-series, the SVM methodology will be unable to detect it. However, we should emphasize that other methods, including the AR model, will suffer in such scenarios as well. As an attempted solution when the sensitivity of the method to a given type of damage is unknown and simulation tests are impossible, various damage detection methods could potentially be combined to boost detection power.
[Figure 3.7: schematic of the test structure, showing the floors and the four mounted accelerometers]
The SVM time series models are developed for each of the accelerometer measurements from the undamaged data as follows:

1. Select the number of time lags that will be used in the time series models. In this case eight time lags were used, based on the AIC. Note the number of time lags is analogous to the order of an AR model.

2. Select the parameters of the SVM model, including the kernel type and corresponding parameters as well as $C$ and $\epsilon$, which control model fit as described earlier. In our case we used a Gaussian kernel with variance 1 and set $C = 1$ and $\epsilon = 0.1$. We have found the methodology to be robust to choices of variance ranging over an order of magnitude. In addition, $C$ could be increased to force fitting of extreme values, and $\epsilon$ could be lowered to enforce a closer fit to the training data.

3. Train the SVM model of Equation (3.12) on the undamaged training data.
4. Once the SVM model is trained (i.e. the $\beta_j$ in Equation (3.12) are selected) in step 3, make predictions based on the new test data from the structure in its undamaged or damaged condition. Next, calculate the residual between the measured data and the output of the time series prediction.

5. Square and add the residuals from each sensor as described by Equation (3.15). Build a control chart for these combined residuals to detect damage (perhaps in conjunction with statistical tests such as a sliding window approach).

Note that steps 1 through 4 of this process are applied to each time series recorded by the four accelerometers shown in Figure 3.7.
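Steps 4 and 5 and the sliding-window monitor can be sketched as follows; we take Equation (3.15) to be a sum of squared standardized residuals across sensors, which is consistent with the near chi-squared density reported below (function names ours):

```python
import numpy as np

def combined_residuals(residual_lists, train_len=6000):
    """Step 5: standardize each sensor's residuals by its undamaged mean and
    standard deviation, then square and sum across sensors."""
    total = 0.0
    for r in residual_lists:                 # one residual series per sensor
        mu, sd = r[:train_len].mean(), r[:train_len].std()
        total = total + ((r - mu) / sd) ** 2
    return total

def sliding_window_alarm(combined, limit, window=6):
    """Flag damage when any point in a length-6 window exceeds the 99% control
    limit (exceedance probability of roughly 0.05 per window under no damage)."""
    exceed = combined > limit
    return np.array([exceed[max(0, i - window + 1):i + 1].any()
                     for i in range(len(exceed))])
```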
First we will revisit the normality assumption that was made in constructing the control chart. Figure 3.8 shows the resulting Q-Q plot for the residuals from the SVM model fit to sensor 4 data obtained with the structure in its undamaged state. The Q-Q plot compares the sample quantiles of the residuals to theoretical quantiles of a Gaussian distribution.

[Figure 3.8: Q-Q plot of the sensor 4 SVM residuals against theoretical Gaussian quantiles]

We see
in this figure that the sample quantiles fall very close to the theoretical line,
and hence our residuals are approximately Gaussian.
Figure 3.9 shows the residual errors from the SVM fit to each of the accelerometer readings and the corresponding 99% control limits, which are based on the first 6000 points from the undamaged portion of each signal. There are 8192 undamaged points and 8192 damaged ones; thus when we concatenate the data, the damage occurs at time point 8193 of 16384. Figure 3.10 shows the density of the normalized residual errors from all the sensors, combined according to Equation (3.15). We see that the distribution is very nearly chi-squared. In situations where the original residuals aren't normal, this result won't be true, and hence probabilistic statements regarding the presence of damage must be made based on control charts.

Figure 3.11 shows the combined residuals as a function of time. The blue points in the plot show damage indication using the sliding window approach of Ma and Perkins (2003), as described in the introduction and based on the 99% control lines. Specifically, we use a window size of 6 which, when combined with the 99% control limit, detects damage whenever 1 or more of the 6 points in the window exceeds the control line (equivalent to a binomial exceedance probability of 0.05). We see from Figure 3.9 that sensors 3 and 4 respond most strongly to the damage.
[Figure 3.9: residual time histories (sensors 1-4) with 99% control limits around the onset of damage. Figure 3.10: density of the combined normalized residuals. Figure 3.11: combined residuals over time with sliding-window damage indication]
3.4 Conclusion
Although the application of statistical techniques to structural health monitoring has been investigated in the past, these techniques have predominantly been limited to identifying damage-sensitive features derived from linear models fit to the output from individual sensors. As such, they are typically limited to identifying only that damage has occurred. In general,
these methods are not able to identify which sensors are associated with the damage in an effort to locate the damage within the resolution of the sensor array. To improve upon this approach to damage detection, we have applied support vector machines to model sensor output time histories and have shown that such nonlinear regression models more accurately predict the time series when compared to linear autoregressive models. Here the metric for this comparison is the residual errors between the measured response data and the predictions of the time series model.

The support vector machine autoregressive method is superior to traditional linear AR in both its ability to handle nonlinear dynamics and the structure of the model. Specifically, the support vector approach compares each new testing point to the entire training set, whereas the traditional AR model finds a simple linear relationship to best describe the entire training set, which is then used on the testing data. For example, when dealing with transient impact data, the AR model will fail in trying to fit the entire time domain with a simple linear model. Whereas in the past RBF neural networks have been used to tackle this problem, these networks require significant user input and complex methods for fitting the model to the training data, and hence the simple support vector framework is preferred.

Furthermore, we have also shown how the residuals from the SVM prediction of each sensor time history may be combined in a statistically rigorous manner to provide probabilistic statements regarding the presence of damage as assessed from the amalgamation of all available sensors. In addition, this methodology allows us to pinpoint the sensors that are contributing most to the anomalous readings and therefore locate the damage within the sensor network's spatial resolution. The process was demonstrated on a test structure where damage was simulated by introducing an impact type of nonlinearity between the measured degrees of freedom. The authors acknowledge that the approach has only been demonstrated on a structure that was tested in a well-controlled laboratory setting. This approach will have to be extended to structures subjected to real-world operational and environmental variability before it can be used in practice. However, the approach has the ability to adapt to such changes through the analysis of appropriate training data that span these conditions. Therefore, follow-on studies will focus on applying this approach to systems with operational and environmental variability as well as systems that exhibit nonlinear response in their undamaged state.
3.5 References
Allen, D., Sohn, H., Worden, K., and Farrar, C. (2002). "Utilizing the Sequential Probability Ratio Test for Building Joint Monitoring." Proc. of SPIE Smart Structures Conference. San Diego, March 2002.

Bulut, A., Singh, A.K., Shin, P., Fountain, T., Jasso, H., Yan, L., and Elgamal, A. (2005). "Real-time Nondestructive Structural Health Monitoring Using Support Vector Machines and Wavelets." Proc. SPIE. 5770:180-189.

Brockwell, P. and Davis, R. (1991). Time Series: Theory and Methods. Springer.

Chattopadhyay, A., Das, S., and Coelho, C.K. (2007). "Damage Diagnosis Using a Kernel-based Method." Insight - Non-Destructive Testing and Condition Monitoring. 49:451-458.

Chang, C.-C. and Lin, C.-J. (2001). LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Clark, G. (2008). "Cable Damage Detection Using Time Domain Reflectometry and Model-Based Algorithms." Lawrence Livermore National Laboratory document LLNL-CONF-402567.

Copas, J.B. (1997). "Using Regression Models for Prediction: Shrinkage and Regression to the Mean." Statistical Methods in Medical Research. 6:167-183.

Doebling, S., Farrar, C., Prime, M., and Shevitz, D. (1998). "A Review of Damage Identification Methods that Examine Changes in Dynamic Properties." Shock and Vibration Digest. 30:91-105.

Farrar, C.R. and Worden, K. (2007). "An Introduction to Structural Health Monitoring." Philosophical Transactions of the Royal Society A. 365:303-315.

Fu, W.J. (1998). "Penalized Regressions: The Bridge Versus The Lasso." Journal of Computational and Graphical Statistics. 7:397-416.

Fugate, M., Sohn, H., and Farrar, C.R. (2001). "Vibration-Based Damage Detection Using Statistical Process Control." Mechanical Systems and Signal Processing. 15:707-721.

Herzog, J., Hanlin, J., Wegerich, S., and Wilks, A. (2005). "High Performance Condition Monitoring of Aircraft Engines." Proc. of GT2005 ASME Turbo Expo. June 6-9, 2005.

Ma, J. and Perkins, S. (2003). "Online Novelty Detection on Temporal Sequences." Proc. of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 613-618.

Rytter, A. and Kirkegaard, P. (1997). "Vibration Based Inspection Using Neural Networks."
Chapter 4
Conclusion
Because research in signal processing is being undertaken by physicists, computer scientists, statisticians, and engineers, among others, many tools developed by one group aren't fully adopted by others. This is partially due to differences in jargon, but also because of each group's different focus and goals. However, this thesis shows that methods developed by one group for a given purpose may often be employed quite successfully by another group for an entirely different problem.

With the state of the art in particle filtering focusing on limiting degeneracy of the algorithm, it is likely that future research in the area might be applied to the material in chapter 2 to extend its scope of application. In addition, the development of support vector machines is moving toward implementing the method quickly and online, while minimizing space requirements. These advances might extend the structural health monitoring approach discussed in chapter 3 to long time series for which storage and computation become difficult.
While this thesis successfully implements two separate statistical methods, each is developed in a fairly specific setting, when in fact the scope of application is much more general and may apply to problems not covered in this work. As future research, prior sensitivity and cross-validation need to be studied with the goal of easing implementation for multi-dimensional parameters. Since existing methods, including the one presented, have computational complexity which scales exponentially with dimension, alternative methods must be found. In regards to structural health monitoring, more attention must be paid to jointly modeling all sensors simultaneously, taking their correlation into account. In addition, more studies must be undertaken to understand the effect of varying environmental conditions, as well as the case where the initial system is slightly damaged and hence nonlinear. Whether the solutions to these problems come from the world of signal processing remains to be seen.
Both of the ideas presented in this thesis have been greeted with enthusiasm by researchers at Los Alamos National Laboratories, who daily analyze complex and computationally expensive systems. In particular, the use of sequential Monte Carlo for prior sensitivity and cross-validation has the potential to reduce the computational time of building models for understanding complex systems such as those present in biological and weapons research. In addition, the power gained from using SVMs for structural health monitoring will allow for earlier detection of damage, and hence ensure a structure's economic viability as well as the safety of its operators.
Chapter 5
Technical Appendix
Proof of Theorem 1. (following along the lines of Peruggia (1997) and Epifani et al. (2005)) We seek to show that the $r$th moment of successive importance weights is finite. So we need to find the conditions under which $\int \zeta_\gamma(\theta)\, d\theta$ is finite, where $\zeta_\gamma(\theta) = (q(\theta))^{1-\gamma}(q_{\setminus S}(\theta))^{\gamma} \times (w_{\setminus S,\epsilon}(\theta))^r$. We expand and simplify $\zeta_\gamma(\theta)$ to obtain

$$\begin{aligned} \zeta_\gamma(\theta) &= f(y \mid \beta, \sigma^2)^{1-\gamma} \times f_{\setminus S}(y \mid \beta, \sigma^2)^{\gamma} \times \pi(\beta) \times \pi(\sigma^2) \times (w_{\setminus S,\epsilon}(\theta))^r \\ &= f(y \mid \beta, \sigma^2) \times \left[w_{\setminus S}(\theta)\right]^{\gamma + r\epsilon} \times \pi(\beta) \times \pi(\sigma^2) \\ &= (2\pi\sigma^2)^{-\frac{n - s(\gamma + r\epsilon)}{2}} \times \pi(\beta) \times \pi(\sigma^2) \\ &\qquad \times \exp\left\{-\frac{1}{2\sigma^2}\left[(y - X\beta)^T(y - X\beta) - (\gamma + r\epsilon)(y_S - X_S\beta)^T(y_S - X_S\beta)\right]\right\} \\ &= \zeta_1(\theta) \times \zeta_2(\theta), \end{aligned}$$

where

$$\zeta_1(\theta) = \pi(\beta) \times (2\pi\sigma^2)^{-\frac{n - s(\gamma + r\epsilon)}{2}} \times \exp\left\{-\frac{1}{2\sigma^2}(\beta - \tilde\beta)^T\left[X^TX - (\gamma + r\epsilon)X_S^TX_S\right](\beta - \tilde\beta)\right\}$$

$$\zeta_2(\theta) = \pi(\sigma^2) \times \exp\left\{-\frac{1}{2\sigma^2}\left[y^Ty - (\gamma + r\epsilon)y_S^Ty_S - \tilde\beta^T\left[X^TX - (\gamma + r\epsilon)X_S^TX_S\right]\tilde\beta\right]\right\}$$

and $\tilde\beta = \left[X^TX - (\gamma + r\epsilon)X_S^TX_S\right]^{-1}\left[X^Ty - (\gamma + r\epsilon)X_S^Ty_S\right]$. We will show that

$$\left[X^TX - (\gamma + r\epsilon)X_S^TX_S\right] > 0.$$
Thus, aside from showing conditions under which $\left[X^TX - (\gamma + r\epsilon)X_S^TX_S\right]$ is positive definite, it remains to verify that the integrals of $\zeta_1$ and $\zeta_2$ are finite. Provided $\mathrm{RSS}^*_{\setminus S}(1) > 0$, the matrix may be written as the sum of a positive definite and a positive semi-definite matrix, and hence $\left[X^TX - (\gamma + r\epsilon)X_S^TX_S\right]$ is positive definite.