Machine Learning for Structural Monitoring and Anomaly Detection
ALMA MATER STUDIORUM - UNIVERSITY OF BOLOGNA
09/F2 - Telecommunications
ING-INF/03 - Telecommunications
Cycle XXXIII
Final Exam Year 2021
Keywords
Machine Learning
Deep Learning
Neural Network
Anomaly Detection
Structural Health Monitoring
Abstract

List of Figures
2.11 Block diagram of the one-class classifier neural network (OCCNN) algorithm.
3.3 a) Residual modes after mode selection over time with one acquisition per hour (first 200 acquisitions). b) Natural frequencies selected after the initial phase of the tracking algorithm. c) First two natural frequencies estimated after the density-based tracking algorithm. Blue points represent the residual modes; red and yellow tracks represent, respectively, the first and second fundamental frequency estimated after the tracking algorithm; vertical dashed lines highlight the period when the average measured temperatures are below 0 °C [15] (it has been demonstrated in [16] that when the temperature goes below 0 °C the natural frequencies of the bridge increase). Blue and green backgrounds highlight the acquisitions made during the normal condition of the bridge, used respectively as training and test sets, while the red background stands for damaged-condition acquisitions used in the test phase.
4.1 Data acquisition setup along the Z-24 bridge: the selected accelerometers, their positions, and the measured acceleration direction [17].
4.2 The block diagram for signal acquisition, processing, feature extraction, and detection.
4.3 Error function evolution over the epochs during training.
4.4 Comparison of the classification algorithms in terms of F1 score, recall, precision, and accuracy.
4.5 F1 score varying the number of points, Nx, used for training, with Ny = 854 and Nu = 854.
4.6 F1 score varying the number of points during damage, Nu, used to test the algorithms, with Nx = 2399 and Ny = 854.
List of Acronyms
AI artificial intelligence
AP access point
FA false alarm
LOS line-of-sight
ML machine learning
MP mean phase
NLOS non-line-of-sight
NN neural network
RF radio-frequency
Contents
Abstract
List of Acronyms
1 Introduction
1.1 Scenario
1.2 Main Contributions
1.3 Thesis Structure and Notation
4 Z-24 Bridge
4.1 System Configuration and Data Collection
4.1.1 Data collection
4.1.2 Data pre-processing
4.2 Algorithmic Complexity and Processing Time
4.3 Performance
4.3.1 Algorithm Comparison
4.3.2 Impact of the training set and responsiveness
5 Dimensionality Reduction
5.0.1 Feature extraction
5.0.2 Feature selection
5.1 Performance
5.1.1 Frequencies selection
5.1.2 Algorithm Comparison
6 Data Management
6.1 Performance
6.1.1 Sensors Relevance
6.1.2 Number of Sensors
6.1.3 Number of Samples
6.1.4 Number of Bits
6.2 Observations
9 Conclusion
Acknowledgements
Chapter 1
Introduction
1.1 Scenario
Nowadays, the widespread adoption of automation and sensors in a broad set of applications (e.g., industry, telecommunications, structural monitoring, autonomous driving, crowdsensing) has generated the opportunity to access a variety of data that can be used for new types of services and to increase the reliability of existing ones (see Fig. 1.1). Moreover, the increasing number of devices per km², which with 6G is expected to exceed 10 million, leads to new network paradigms that must be properly designed [18–23], and represents a tremendous source of data that must be organized and compressed.
In this scenario, where several new research areas have grown, both in data analysis and big data management, this thesis focuses on the adoption of innovative ML strategies applied to the important field of SHM.
The study of methods able to determine the state of health of a structure, and to localize and quantify the damage, has generated a vast literature, reviewed in several works [24–28]. Over the years, with the aim of performing damage detection on structures, several techniques have been developed to extract the most significant damage-sensitive features. Such techniques can be divided into model-free and model-based. In model-free methods, the only information is gathered by measurements (e.g., acceleration, temperature).
Such tools can solve general tasks if adequately used. After the extraction of the most significant damage-sensitive features performed by SHM techniques, a wide set of ML algorithms able to perform anomaly detection on a general dataset will be presented and extensively discussed. Moreover, a collection of innovative tools with increased detection capabilities, compared to classical approaches, will be proposed and meticulously examined.
• We investigate strategies to select the best features among all the extracted ones, comparing feature extraction and feature selection techniques in order to provide the best solution for bridge monitoring.
Chapter 6 explores the reliability of the monitoring system and the minimum operating conditions necessary to ensure a target performance of the anomaly detection chain, varying the number of sensors available, the acquisition time, and the number of bits per sample used to quantize the measurements.
Throughout the thesis, capital boldface letters denote matrices and tensors, lowercase bold letters denote vectors, $(\cdot)^{\mathrm{T}}$ stands for transposition, $(\cdot)^{+}$ indicates the Moore-Penrose pseudoinverse operator, $\|\cdot\|_2$ is the $\ell_2$-norm of a vector, $\|\cdot\|_{\mathrm{F}}$ is the Frobenius norm of a matrix, $\Re\{\cdot\}$ and $\Im\{\cdot\}$ are the real and imaginary parts of a complex number, respectively, $\mathbb{V}\{\cdot\}$ is the variance operator, and $\mathbb{1}\{a, b\}$ is the indicator function equal to 1 when $a = b$, and zero otherwise.
Chapter 2
Machine Learning Algorithms
2.1 Introduction
Recognising patterns, classifying elements, clustering data, and performing regression from sampled data are fundamental tasks for solving problems in engineering. In the last few decades, many efforts have been spent to develop algorithms able to accomplish these tasks with few hyper-parameters, i.e., few parameters to be tuned.
Generally, a ML algorithm workflow is composed of a training phase and a test phase. During the training phase the algorithm is fed with a training dataset with the aim of setting some parameters θ, typical of the considered algorithm, in order to minimize a cost function. After that, a validation dataset is used to set some hyper-parameters φ of the algorithm, and finally a test dataset is used to evaluate the performance. In more detail, given a matrix of data D of dimension Nd × D, where Nd represents the number of observations and D stands for the dimensionality of each observation (number of features), we can define the following subsets:
It can be noted that the dimensionality D is constant among the sets of data, and the total number of observations can be written as Nd = Nx + Nv + Ny. It is good practice to partition the data with the following proportions: Nx = 60% Nd, Nv = 20% Nd, Ny = 20% Nd. When the number of observations Nd is low, some strategies like k-fold validation, cross validation, and one-vs-rest can be implemented to use a greater number of observations for the training set [29]. Another important step to apply before using the data is normalization. Let us define the offset x̂ as the column vector containing the row-wise mean of the matrix X, and the rescaling factor xm = maxn,d |x̄n,d − x̂n|. Before proceeding with the application of ML algorithms, the matrices X, Y and V are centered and normalized by subtracting the offset x̂ row-wise and dividing each entry by the rescaling factor xm; in this way the training data fall in the interval [−1, +1]. A target matrix T(·) is defined for the whole dataset, of dimensions Nd × Dout, where Dout stands for the dimensionality of the output; this matrix represents the output of the observations. For instance, in a classification problem, for the training, the target matrix is actually a vector t(X) of dimension Nx × 1 that contains the classes to which each training point belongs.
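A minimal sketch of this partitioning and normalization step (assuming NumPy; the function name and the 60/20/20 shuffle are illustrative, and the mean is taken feature-wise, a common variant of the centering described above):

```python
import numpy as np

def split_and_normalize(data, seed=0):
    """Split into 60/20/20 training/validation/test sets and normalize
    every subset with statistics computed on the training set only."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_x = int(0.6 * len(data))
    n_v = int(0.2 * len(data))
    X = data[idx[:n_x]]                          # training set
    V = data[idx[n_x:n_x + n_v]]                 # validation set
    Y = data[idx[n_x + n_v:]]                    # test set
    offset = X.mean(axis=0)                      # centering offset
    scale = np.abs(X - offset).max()             # rescaling factor x_m
    return [(S - offset) / scale for S in (X, V, Y)]

D_mat = np.random.randn(1000, 4)                 # toy dataset, N_d = 1000, D = 4
X, V, Y = split_and_normalize(D_mat)
print(X.shape, V.shape, Y.shape)                 # (600, 4) (200, 4) (200, 4)
```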
2.1.1 Metrics
Several metrics can be defined to estimate the performance of a ML algorithm, usually related to the particular application. Generally, the most common metrics for regression purposes are the following:

$$\mathrm{SSE}(\boldsymbol{\theta}, \boldsymbol{\phi}) = \sum_{n=1}^{N} \left( g(\mathbf{x}_n, \boldsymbol{\theta}, \boldsymbol{\phi}) - T(\mathbf{x}_n) \right)^2$$

$$\mathrm{MSE}(\boldsymbol{\theta}, \boldsymbol{\phi}) = \frac{1}{N} \sum_{n=1}^{N} \left( g(\mathbf{x}_n, \boldsymbol{\theta}, \boldsymbol{\phi}) - T(\mathbf{x}_n) \right)^2$$

$$\mathrm{NLL}(\boldsymbol{\theta}, \boldsymbol{\phi}) = -\sum_{n=1}^{N} T(\mathbf{x}_n) \ln g(\mathbf{x}_n, \boldsymbol{\theta}, \boldsymbol{\phi}).$$
• Accuracy
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$
• Precision
$$\mathrm{Prec} = \frac{TP}{TP + FP}$$
• Recall
$$\mathrm{Rec} = \frac{TP}{TP + FN}$$
• F1 score
$$F_1 = 2 \cdot \frac{\mathrm{Rec} \cdot \mathrm{Prec}}{\mathrm{Rec} + \mathrm{Prec}}$$
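These four metrics recur throughout the thesis; a minimal sketch of their computation from the confusion-matrix counts (function name illustrative):

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 score from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * rec * prec / (rec + prec)
    return acc, prec, rec, f1

# example: 80 true positives, 90 true negatives, 10 false alarms, 20 misses
print(classification_metrics(tp=80, tn=90, fp=10, fn=20))
```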
Figure 2.2: Gradient descent applied to a convex function; red points represent the iterative evaluation of the error function E(θ) due to the update of the weight vector θ.
$$\tilde{E}(\boldsymbol{\theta}) = E(\boldsymbol{\theta}^{(0)}) + \nabla E(\boldsymbol{\theta}^{(0)})^{\mathrm{T}} (\boldsymbol{\theta} - \boldsymbol{\theta}^{(0)}). \tag{2.1}$$
Now we take the first step by travelling in the direction in which the tangent hyper-plane most sharply angles downward (referred to as the steepest descent direction). It can be shown that this steepest descent direction is given precisely by $-\nabla E(\boldsymbol{\theta}^{(0)})$. Thus, we descend in the direction of the negative gradient (hence the name gradient descent), taking our first step to a point $\boldsymbol{\theta}^{(1)}$ where

$$\boldsymbol{\theta}^{(1)} = \boldsymbol{\theta}^{(0)} - \rho \nabla E(\boldsymbol{\theta}^{(0)}).$$
The same process can be repeated iteratively; in general, the number of iterations represents the number of epochs Ne (if all the dataset is used for each iteration of the algorithm), and the hyper-parameter ρ represents the learning rate. For the k-th iteration we can write the update rule as

$$\boldsymbol{\theta}^{(k)} = \boldsymbol{\theta}^{(k-1)} - \rho \nabla E(\boldsymbol{\theta}^{(k-1)}).$$
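A minimal sketch of this iteration (assuming NumPy; the convex toy error function and all names are illustrative):

```python
import numpy as np

def gradient_descent(grad_E, theta0, rho=0.05, n_epochs=100):
    """Iterate theta^(k) = theta^(k-1) - rho * grad E(theta^(k-1))."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_epochs):
        theta = theta - rho * grad_E(theta)      # one step of steepest descent
    return theta

# toy convex problem: E(theta) = ||theta - c||^2 with gradient 2 (theta - c)
c = np.array([1.0, -2.0])
theta_hat = gradient_descent(lambda th: 2 * (th - c), theta0=[0.0, 0.0])
print(theta_hat)                                 # converges towards c
```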
[Figure: training data y(z) and the polynomial model g(z, θ, φ) for φ = 10, φ = 15, and φ = 40, illustrating model underfitting, correct fitting, and overfitting.]
$$g(z, \boldsymbol{\theta}, \phi) = \theta_0 + \theta_1 z + \theta_2 z^2 + \dots + \theta_\phi z^\phi = \sum_{j=0}^{\phi} \theta_j z^j$$
where φ represents the order of the polynomial model, the only hyper-parameter for this model, and z is a generic point. The values of the coefficients θ can be determined by fitting the polynomial to the training data. This can be done by minimizing an error function that measures the misfit between the function g(x, θ, φ), for any given value of θ, and the training set data point targets t(x). One simple choice of error function, widely used in the literature, is the sum of the squares of the errors between the predictions g(xn, θ, φ) for each data point xn and the corresponding target:
$$E(\boldsymbol{\theta}, \phi) = \sum_{n=1}^{N_x} \left( g(\mathbf{x}_n, \boldsymbol{\theta}, \phi) - t(\mathbf{x}_n) \right)^2.$$
The layers of a neural network can be divided into the following groups:
Figure 2.4: General NN structure; grey neurons represent the input layer,
white neurons the hidden layers, and yellow neurons the output one.
• Input layer, with a number of neurons equal to the number of features of the problem (D), used to feed the network with the dataset.
• Hidden layers, used to create the non-linear functions able to follow the data.
In the next chapter we will see how to set the activation functions, and the
number of neurons and layers for several problems. Generally, in this thesis
the number of hidden layers and the relative number of neurons in each
hidden layer will be represented with the vector h of dimension Nh × 1 where
Nh stands for the number of hidden layers and the elements of h represent
the number of neurons in the relative hidden layer.
NNs, like all ML algorithms, comprise a training phase and an online phase. During the training phase the network is fed with the training dataset X and the training target T(X) to set the network weights θ in order to minimize the selected error function E(θ, φ). To set the hyper-parameters φ (i.e., λ, ρ, h, and Ne), several trainings can be performed to maximize the performance on the validation set V, and finally the real performance can be evaluated on the test set Y. As we can infer from the previous observations, the training phase can be computationally onerous and requires a long time.
2.2.1 Regression
As shown previously, a regression problem is a possible application for ML
algorithms and particularly for NNs. In this kind of problem, the goal is
to reconstruct a curve starting from a set of measurements. This type of
application can be useful for:
• filtering, when the goal is to reduce the noise present on a set of mea-
surements.
Figure 2.5: NN structure for regression; different neuron colours stand for different activation functions.
represent the bias terms; they are not connected to the previous layer and are able to follow the bias values among the data.
2.2.2 Classification

As said before, a quite similar structure can be used to perform classification on a given dataset. A classification problem is an application of NNs where the target function t(X) is discrete and represents the class to which the points belong. In this case the number of input neurons is still equal to the number of input features D, and the neurons in the hidden layers remain the same as in the previous application, but the output layer has a number of neurons equal to the number of classes of the problem, C. The activation function is a sigmoid with the following expression:

$$S(z) = \frac{e^z}{e^z + 1} \tag{2.4}$$

where z represents the input of the considered neuron and S(z) its output. In this case a sigmoid activation function with excursion between 0 and 1 is preferred to a linear one, because the output of each neuron can be interpreted as a value related to the probability that the input point belongs to the class represented by the considered neuron.
• Mapping layers, which consist of one or more hidden layers, with the number of neurons in each layer decreasing progressively until the last one, called the bottleneck. In the bottleneck the number of neurons is usually lower than the number of input features;
The input and output layers have the same dimension as the feature space, and the labels during the training phase must be set equal to the input data points. With this structure, the data are mapped into a lower-dimensional feature space (with dimension equal to the number of neurons present in
the bottleneck layer), and then reconstructed through the demapping layers
minimizing the error with respect to the input data. After the training phase,
if we feed the network with a new data point, the activation function in the
bottleneck layer outputs a mapped version of the starting data point in a
lower dimensional feature space. Afterwards, we can reconstruct the data
using the demapping layers with a low reconstruction error. An example of
ANN is shown in Fig. 2.7.
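As a minimal sketch of such an autoencoder (assuming PyTorch; the layer sizes, toy data, and all names are illustrative, not the architectures used later in the thesis):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Mapping layers down to a bottleneck, then demapping layers back."""
    def __init__(self, d_in=4, d_bottleneck=2):
        super().__init__()
        self.mapping = nn.Sequential(nn.Linear(d_in, 8), nn.ReLU(),
                                     nn.Linear(8, d_bottleneck))
        self.demapping = nn.Sequential(nn.Linear(d_bottleneck, 8), nn.ReLU(),
                                       nn.Linear(8, d_in))

    def forward(self, x):
        return self.demapping(self.mapping(x))

model = Autoencoder()
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()
X = torch.randn(500, 4)                    # placeholder normal-condition data
for _ in range(200):                       # training labels equal the inputs
    opt.zero_grad()
    loss = loss_fn(model(X), X)
    loss.backward()
    opt.step()

x_new = torch.randn(1, 4)                  # reconstruction error of a new point,
score = loss_fn(model(x_new), x_new).item()  # usable as an anomaly score
```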
Now, focusing on the specific problems we are investigating, the following two questions arise:
Q1: How can we use tools for dimensionality reduction to perform anomaly
detection?
Q2: What are the most effective algorithms able to perform anomaly detec-
tion?
Figure 2.8: Example of PCA; blue points represent the starting points in the
original feature space, the projected ones are depicted in red.
minimizes the error (defined as Euclidean distance) between the data in the
feature space and their projection in the selected subspace [41]. More in
detail, to find the best subspace over which to project the training data, we
need to evaluate the D × D sample covariance matrix
$$\boldsymbol{\Sigma}_x = \frac{\mathbf{X}^{\mathrm{T}} \mathbf{X}}{N_x - 1}. \tag{2.5}$$
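A minimal sketch of PCA-based anomaly scoring built on (2.5) (assuming NumPy; thresholding at the 99th percentile mirrors the 1% training false-alarm target used later in the thesis, and all names are illustrative):

```python
import numpy as np

def pca_anomaly_scores(X_train, X_test, P=1):
    """Project onto the top-P eigenvectors of the sample covariance and
    score each test point by its Euclidean reconstruction error."""
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    cov = Xc.T @ Xc / (len(X_train) - 1)         # sample covariance (2.5)
    _, eigvec = np.linalg.eigh(cov)              # eigenvectors, ascending order
    W = eigvec[:, -P:]                           # top-P principal directions
    Yc = X_test - mean
    recon = Yc @ W @ W.T                         # project, then back-project
    return np.linalg.norm(Yc - recon, axis=1)

X_train = np.random.randn(500, 2) @ np.array([[2.0, 0.3], [0.3, 0.2]])
scores = pca_anomaly_scores(X_train, X_train, P=1)
threshold = np.quantile(scores, 0.99)            # ~1% false alarms on training data
```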
In many cases, the linear boundaries found by PCA represent a severe limi-
tation [42]. KPCA firstly maps the data with a non-linear function, named
kernel, then applies the standard PCA to find a linear boundary in the new
feature space. Such boundary becomes non-linear in the original feature
space. A crucial point in KPCA is the selection of the kernel that leads to
linearly separable data in the new feature space. In [43], when the data distribution is unknown, the RBF kernel is proposed as a good candidate to accomplish this task.
Figure 2.9: Example of KPCA; on the left, the points in the original feature space are depicted; on the right, the same points are projected into a new feature space through the RBF kernel.

Given a generic point z, corresponding to a 1 × D vector, we can define the RBF kernel as

$$K_n(\mathbf{z}) = e^{-\gamma \|\mathbf{z} - \mathbf{x}_n\|^2} \quad \text{with } n = 1, 2, \ldots, N_x \tag{2.6}$$
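A minimal sketch of this kernel mapping (assuming NumPy; γ = 8 anticipates the value selected later in the thesis, and the names are illustrative):

```python
import numpy as np

def rbf_features(z, X_train, gamma=8.0):
    """Map a point z to the N_x kernel values K_n(z) = exp(-gamma ||z - x_n||^2);
    standard PCA is then applied in this new feature space."""
    d2 = np.sum((X_train - z) ** 2, axis=1)      # squared distances to every x_n
    return np.exp(-gamma * d2)

X_train = np.random.randn(500, 2)
k = rbf_features(np.array([0.0, 0.0]), X_train)
print(k.shape)                                   # (500,)
```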
Figure 2.10: Example of GMM; blue points represent the training data, the surface represents the Gaussian mixture of order 2 that fits the data.

The parameters of the algorithm are the covariance matrices, Σm, and the mean values, µm, of the Gaussian functions, with m = 1, 2, . . . , M. The GMM algorithm finds the set of parameters Σm and µm of the Gaussian mixture that best fits the data distribution. This strategy can also be used to perform clustering on a set of data [29, 30]. An example of GMM applied to a set of data is shown in Fig. 2.10.
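A minimal sketch of GMM-based anomaly detection (assuming scikit-learn; the order M = 2, the toy data, and the 1% false-alarm threshold are illustrative):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X_train = np.random.randn(500, 2)                    # normal-condition training data
gmm = GaussianMixture(n_components=2, covariance_type="full").fit(X_train)

# points with low log-likelihood under the fitted mixture are flagged as anomalous
threshold = np.quantile(gmm.score_samples(X_train), 0.01)
x_new = np.array([[4.0, 4.0]])
is_anomaly = gmm.score_samples(x_new)[0] < threshold
print(is_anomaly)
```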
Now, considering the high versatility of NNs, and the need for effective
OCCs, the following question arises:
2.4 OCCNN
In this section, a novel technique based on NN classifiers is presented. For the sake of clarity, this section is divided into several parts, each one describing a block of the diagram depicted in Fig. 2.11.
where λ̂x is the estimated density for a dataset X, rn is the Euclidean distance between the n-th point and its kn-th neighbour, and Np is the number of points considered for the estimation. It is possible to show that the mean value and the variance of the estimator are, respectively,

$$\mathbb{E}\{\hat{\lambda}_x\} = \lambda_x \tag{2.8}$$

$$\mathbb{V}\{\hat{\lambda}_x\} = \frac{\lambda_x}{\sum_{n=1}^{N_p} k_n - 2}. \tag{2.9}$$
As the number of points grows, the estimator variance decreases and accordingly the accuracy increases. However, when the spatial distribution of data points deviates from Poisson, some countermeasures are taken, especially to account for the finite boundaries of the normal class:
This block solves the task of generating uniformly distributed random points with density λi = λ̂x αi in a specific portion of the feature space defined by θi−1, where i is the iteration index. This is fundamental to create an adversarial class Z from which the NN can learn the boundaries between the training data X and the adversarial ones. The adversarial points must be generated in the portion of space where the training points are absent, hence where the output layer activation function is greater than 0.5 (we suppose to associate label 1 with the training set points and 0 with the adversarial ones). The function gets as input the estimated training point density multiplied by a factor αi, which we will discuss in Section 2.4.4, and the NN weight matrix θi−1, which represents the network state after the previous training iteration and defines the currently estimated boundaries. At the first iteration (i = 1) we suppose θ0 equal to a null matrix; this means that the adversarial points will be generated in all the feature space, including the area filled by the training points.
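A minimal sketch of this generator as a rejection sampler (assuming NumPy; the 2-D normalized feature space, batch size, and names are illustrative; `nn_predict` stands for the output-layer activation of the network trained at the previous iteration):

```python
import numpy as np

def generate_adversarial(nn_predict, lam_hat, alpha, bounds=(-1.0, 1.0), batch=1000):
    """Draw uniform candidates in the feature space and keep those the current
    network labels as adversarial (activation > 0.5), at density lam_hat * alpha."""
    area = (bounds[1] - bounds[0]) ** 2          # 2-D normalized feature space
    target = int(lam_hat * alpha * area)         # number of adversarial points
    kept, n_kept = [], 0
    while n_kept < target:
        cand = np.random.uniform(bounds[0], bounds[1], size=(batch, 2))
        good = cand[nn_predict(cand) > 0.5]
        kept.append(good)
        n_kept += len(good)
    return np.concatenate(kept)[:target]

# first iteration: theta_0 = 0, the network accepts the whole feature space
Z = generate_adversarial(lambda pts: np.ones(len(pts)), lam_hat=125.0, alpha=0.3)
print(Z.shape)
```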
2.4.4 Dimensioning

In this section, a wide set of tests is presented with the aim of finding the best configuration of hyper-parameters. The main goals are:
• comparing the OCCNN solution with PCA, KPCA, GMM, and ANN.
The same metric can also be used to evaluate the reconstruction error of all
the test set points. A target false alarm rate on the training set is fixed equal
to 0.01, consequently a threshold is selected for each algorithm to guarantee
such constraint.
Test on α1

The first important step is to find the value of α1 that maximizes the accuracy of the first iteration, evaluated on a validation set and checked on the test set. Validation and test sets have the same distribution, but in order to avoid generalization error it is good practice to define two different datasets [31]. The value of Nx = Ny = Nv = Np is tested between 200 and 1000; for lower values the density estimator variance becomes too high and the dataset too small to train the NN. The value of α1 is varied in the range between 0.01 and 1; higher values would generate too many adversarial points in the area of the training set, and the NN would classify all the feature space as adversarial. Fig. 2.12 reports a heatmap showing the average accuracy (evaluated over the six distributions shown in Fig. 2.14) of the OCCNN algorithm varying Np and α1. Note that accuracy increases with the number of points in the training set, and the best value for α1 is 0.3, irrespective of the number of training points. From now on the parameter α1 is set to 0.3, and the number of training, test, and validation points for the next simulations is fixed at 500.
Test on α2
Another important test is the definition of the parameter α2 that sets the
adversarial point density after the first iteration. This test is executed with
a fixed number of points in the training set, and the accuracy, when α2 varies between 0.01 and 2, is shown in Fig. 2.13. It is important to highlight that in this case α2 can assume values greater than 1 because in this phase of the OCCNN algorithm the boundaries are already defined, so the adversarial points will only be generated outside the boundaries estimated by the NN and defined by the weight matrix θ1. We can see this step as a fine-tuning of the boundaries roughly estimated in the first iteration. All six curves are shown along with the average accuracy (thick blue curve). It can be noted that for values of α2 lower than 0.6 the accuracy becomes unpredictable, because the low number of adversarial points generated is not enough to define the boundaries correctly. On the other hand, increasing α2 beyond 1.5 slightly decreases the accuracy, because the boundary region tends to be shaped by the adversarial class. The value α2 = 0.8 is selected to maximize the accuracy and improve the performance with respect to the first step. With these parameters, an image of the boundaries estimated by the OCCNN at the end of the two iterations is shown at the bottom of Fig. 2.14. Comparing these with the point distributions at the top of Fig. 2.14, it is possible to see
Figure 2.14: At the top, some examples of normalised dataset point distributions. Blue circles denote features corresponding to normal conditions, red circles denote features corresponding to anomalous conditions. The boundaries estimated by the OCCNN algorithm are shown at the bottom.
that the estimated boundaries are very close to the real ones. Now that all the parameters are set, we are ready to compare the various algorithms.
2.4.5 OCCNN²

The OCCNN performance is rather sensitive to the density estimation; therefore, depending on the data set, its ability to detect anomalies may be dominated by this first step [36]. Moreover, Pollard's estimator (2.7) may exhibit accuracy degradation when the distribution of the data set points deviates from Poisson. Based on these considerations, a possible solution is to replace the density-based first step with an ANN, as done in the OCCNN² algorithm evaluated later in this thesis.
Now that a set of tools for the detection of anomalies is available, two questions concerning the goal of this thesis arise:

Q4: How can we perform anomaly detection based on machine learning tools to detect damage in a structure?
Chapter 3
Structural Health Monitoring

3.1 Introduction
In this section we present some tools, belonging to the so-called OMA, able to perform structural monitoring and to extract damage-sensitive features from structural characteristics. Over the years, several strategies have been developed to extract relevant parameters that can represent the health state of a structure (e.g., natural frequencies, fundamental modes, damping ratios). These strategies can be divided into two groups:

Among the model-based techniques we can find the finite element method; this strategy provides a simulated model of a complex structure as a sum of basic elements that are easier to model with respect to the whole system. It provides an accurate estimation of the modal parameters of the structure as long as its simulation is accurate. The main problem with this strategy is the computational cost and the impossibility of generalizing it to different structures. In fact, the need for a model requires a targeted
stadia), mechanical and industrial engineering (ships, trucks, car bodies, en-
gines, rotating machineries), aerospace engineering (in-flight modal identifi-
cation of aircrafts and shuttles, studies about flutter).
OMA is based on the following assumptions:
Moreover, unlike the traditional modal testing where the input is con-
trolled, OMA is based on the dynamic response of the structure under test
to non-controllable and immeasurable loadings such as environmental and op-
erational loads (traffic, wind, microtremors, and so on). As a consequence,
some assumptions about the input are needed. If the structure is excited
by white noise, that is to say, the input spectrum is constant, all modes are
equally excited and the output spectrum contains full information about the
structure [14]. However, this is rarely the case, since the excitation has a
spectral distribution of its own. Modes are, therefore, weighted by the spec-
tral distribution of the input and both the properties of the input and the
modal parameters of the structure are observed in the response.
Additionally, noise and spurious harmonics due to rotating equipment
are observed in the response. Thus, generally, the structure is assumed to
be excited by unknown forces that are the output of the so-called excitation
system loaded by white noise (see also Fig. 3.1). Under this assumption, the
measured response can be interpreted as the output of the combined system,
made by the excitation system and the structure under test in series, to a
stationary, zero mean, Gaussian white noise.
Since the excitation system and the structure under test are in cascade, the frequency response function of the combined system is the product of the two:

$$H_c(\omega) = H_f(\omega) H_s(\omega) \tag{3.1}$$

where Hc(ω), Hf(ω), and Hs(ω) are the frequency responses of the combined system, the excitation system, and the structure under test, respectively. In fact, for each subsystem, output and input are related by the following equations [14]:

$$F(\omega) = N(\omega) H_f(\omega), \qquad Y(\omega) = F(\omega) H_s(\omega)$$
where N (ω), F (ω), and Y (ω) denote the Fourier transforms of the white
noise input to the excitation system, the excitation system output, and the
structure output, respectively. In this context, the measured response in-
cludes information about the excitation system and the structure under test,
but the modal parameters of the structure are preserved and identifiable, and
the characteristics of the excitation system have no influence on the accuracy
of modal parameter estimates.
The discrimination between structural modes and properties of the ex-
citation system is possible since the structural system has a narrowband
response and time invariant properties, while the excitation system has a
broadband response and it may have either time variance or time invariance
properties.
The estimation of the modal model of the structure gives the opportunity
to estimate also the unknown forces, according to (3.1). The assumption of
broadband excitation ensures that all the structural modes in the frequency
range of interest are excited. Assuming that the combined system is excited
by a random input, the second order statistics of the response carries all
the physical information about the system and plays a fundamental role in
output-only modal identification. The focus on second order statistics is
justified by the central limit theorem. In fact, the structural response is
approximately Gaussian in most cases, regardless of the distributions of the
(independent) input loads, which are often not Gaussian.
The spatial distribution of the input also affects the performance of OMA
methods, especially in the presence of closely spaced modes. A random dis-
tribution of inputs in time and space provides better modal identification
results. The presence of measurement noise and spurious harmonics in the response measurements requires appropriate data processing to mitigate
their effects and discriminate them from actual structural modes. These
strategies take the name of mode selection and will be discussed in the fol-
lowing section.
3.2.1 Peak-Picking

$$Y(f) = \int_0^T y(t)\, e^{-j 2\pi f t}\, dt$$

and the auto-spectral and cross-spectral density functions are defined as follows:

$$S_{xx}(f) = \int_{-\infty}^{+\infty} R_{xx}(\tau)\, e^{-j 2\pi f \tau}\, d\tau$$

$$S_{yy}(f) = \int_{-\infty}^{+\infty} R_{yy}(\tau)\, e^{-j 2\pi f \tau}\, d\tau$$

$$S_{xy}(f) = S_{yx}(f) = \int_{-\infty}^{+\infty} R_{xy}(\tau)\, e^{-j 2\pi f \tau}\, d\tau$$
$$\mathbf{M} \frac{d^2 \mathbf{y}(t)}{dt^2} + \mathbf{D} \frac{d \mathbf{y}(t)}{dt} + \mathbf{K} \mathbf{y}(t) = \mathbf{w}(t) \tag{3.2}$$

where y(t) stands for the measured system response, w(t) is a continuous-time, zero mean Gaussian white noise, M represents the mass matrix, D the damping matrix, and K the stiffness matrix [14]. It is possible to show that this system can also be described by a discrete-time ARMA vector model (referred to as such to point out its multivariate character) by approximating the differential operator with finite differences over a finite time step ∆t.
Historically, ARMA vector models have been used for the estimation of
the modal parameters of civil structures. Due to a number of shortcomings
(in particular for systems with many outputs and many modes, where the
large set of parameters to be estimated leads to large computational burden
and convergence problems), stochastic state-space models have progressively
replaced them in the context of modal identification.
In order to explain how modal parameters can be extracted from an
ARMA model, assume that a continuous-time system is observed at discrete
time instants k with a sampling interval ∆t. Since the input on the structure
is not available (it is immeasurable), the equivalent discrete-time system can
be obtained only by requiring that the covariance function of its response
to a Gaussian white noise input be coincident at all discrete time lags with
that of the continuous-time system. This implies that the first and second
order moments of the response of the discretized model are equal to the
first and second order moments of the response of the continuous-time sys-
tem at all the considered discrete time instants. Under the assumption that
the response of the system is Gaussian distributed, the covariance equiva-
lent model is the most accurate approximated model, since it is exact at all
discrete time lags. When the dynamic response of the system is driven by
the Gaussian white noise w(t) but there are also some disturbances (process
and measurement noise), the latter must also be taken into account by the equivalent discrete-time model. In the presence of such disturbances, an ARMA(NΩ, NΓ) model has the following form:

$$\mathbf{y}_k + \sum_{i=1}^{N_\Omega} \boldsymbol{\Omega}_i \mathbf{y}_{k-i} = \mathbf{e}_k + \sum_{j=1}^{N_\Gamma} \boldsymbol{\Gamma}_j \mathbf{e}_{k-j} \tag{3.3}$$

where yk is the vector of the output at the time instant tk, and ek is the innovation, modeled as a zero mean Gaussian white noise. The left-hand side of (3.3) is the auto-regressive part, while the right-hand side is the moving average part. The matrices Ωi contain the auto-regressive parameters, while the matrices Γj contain the moving average parameters; NΩ and NΓ represent the auto-regressive and moving average orders of the model. It is possible to show that a covariance equivalent ARMA vector model can be converted into a forward innovation state-space model, and vice versa. If the order of the state-space model is too large, the model will contain redundant information; on the contrary, if the state-space dimension is too small, a certain amount of information about the modelled system will be lost. This transformation provides a good estimation of the modal parameters, but ARMA models usually present stability issues and are computationally expensive.
This technique has been applied to the Z-24 environment. The SSI is a time-domain, parametric, covariance-driven procedure for blind modal analysis (i.e., it can extract the modal frequencies from the accelerometric measurements without any a priori knowledge about the structure [14]). SSI is characterized by the system order n ∈ N and the time-lag i ≥ 1. To apply the algorithm it is enough to satisfy the constraint l · i ≥ n [14], where l stands for the number of sensors. In the Z-24 monitoring, the system order n is considered unknown, so it is kept as a parameter varied in the range n ∈ [2, 160] (with step 2), while the time-lag is i = 60 [14].
Let us define the block Toeplitz matrix for a given time-lag i and shift s:

$$\mathbf{T}_{s|i} = \begin{bmatrix} \mathbf{R}_i^{(a)} & \mathbf{R}_{i-1}^{(a)} & \cdots & \mathbf{R}_{s+1}^{(a)} & \mathbf{R}_s^{(a)} \\ \mathbf{R}_{i+1}^{(a)} & \mathbf{R}_i^{(a)} & \cdots & \mathbf{R}_{s+2}^{(a)} & \mathbf{R}_{s+1}^{(a)} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ \mathbf{R}_{2i-1}^{(a)} & \mathbf{R}_{2i-2}^{(a)} & \cdots & \mathbf{R}_{i+1}^{(a)} & \mathbf{R}_i^{(a)} \end{bmatrix} \tag{3.4}$$

of dimensions li × li, where

$$\mathbf{R}_i^{(a)} = \frac{1}{N - i}\, \mathbf{D}_{(a,\,1:N-i,\,:)}\, \mathbf{D}_{(a,\,i:N,\,:)}^{\mathrm{T}}. \tag{3.5}$$
The Toeplitz matrix is then factorized via the SVD,

$$\mathbf{T}_{1|i} = \mathbf{U}^{(n)} \boldsymbol{\Sigma} \mathbf{V}^{(n)\mathrm{T}} \tag{3.6}$$

where U^(n) and V^(n)T are matrices of dimensions li × n and n × li, respectively, and Σ is a diagonal matrix, of dimension n × n, that contains the singular values.
Retaining the first n singular values, the Toeplitz matrix can be decomposed as

$$\mathbf{T}_{1|i} = \mathbf{O}_i \boldsymbol{\Gamma}_i \tag{3.7}$$

with

$$\mathbf{O}_i = \mathbf{U} \boldsymbol{\Sigma}^{1/2} \mathbf{S} \tag{3.8}$$

$$\boldsymbol{\Gamma}_i = \mathbf{S}^{-1} \boldsymbol{\Sigma}^{1/2} \mathbf{V}^{\mathrm{T}} \tag{3.9}$$

where

$$\mathbf{O}_i = \begin{bmatrix} \mathbf{C} \\ \mathbf{C}\mathbf{A} \\ \vdots \\ \mathbf{C}\mathbf{A}^{i-1} \end{bmatrix} \quad \text{and} \quad \boldsymbol{\Gamma}_i = \begin{bmatrix} \mathbf{A}^{i-1}\mathbf{G} & \cdots & \mathbf{A}\mathbf{G} & \mathbf{G} \end{bmatrix} \tag{3.10}$$

represent, respectively, the observability matrix and the reversed controllability matrix. In (3.8) and (3.9) the matrix S plays the role of a similarity transformation applied to the state-space model, therefore it can be set equal to the identity matrix I. The matrices A, C, and G in (3.10) represent the state matrix, the output influence matrix, and the next state-output covariance matrix, respectively. Note that C and G can be easily extracted from the matrices Oi and Γi, while A can be calculated from (3.4) as

$$\mathbf{A} = \mathbf{O}_i^{+} \mathbf{T}_{2|i+1} \boldsymbol{\Gamma}_i^{+}. \tag{3.11}$$
$$\mathbf{A} = \boldsymbol{\Psi} \boldsymbol{\Omega} \boldsymbol{\Psi}^{\mathrm{T}} \tag{3.12}$$
Figure 3.2: Example of stabilization diagram for the first hour monitoring:
a) through SSI, b) after mode selection and clustering.
where φp^(a,n) is an l × 1 vector, and ψp^(a,n) is the p-th column vector of Ψ^(a,n) defined in (3.12). The natural frequencies extracted through this approach in the Z-24 environment for the first acquisition (a = 1), varying the order n, are depicted in the stabilization diagram shown in Fig. 3.2a.
damping ratio check, and complex conjugate poles check. Among several criteria, we choose to use these four metrics as a good compromise between computational cost and accuracy of spurious mode detection. The selected metrics are widely used and discussed in the literature [46, 48–50].
MAC

$$\mathrm{MAC}\left(\boldsymbol{\phi}_p^{(a,n)}, \boldsymbol{\phi}_q^{(a,j)}\right) = \frac{\left|\boldsymbol{\phi}_p^{(a,n)\mathrm{T}} \boldsymbol{\phi}_q^{(a,j)}\right|^2}{\left\|\boldsymbol{\phi}_p^{(a,n)}\right\|_2^2 \left\|\boldsymbol{\phi}_q^{(a,j)}\right\|_2^2} \tag{3.13}$$
with values between 0 and 1. A MAC larger than 0.9 indicates a consistent correspondence between the modes and so, very likely, physical modes, whereas small values indicate poor resemblance of the two shapes and so a spurious mode. A validation criterion based on the MAC is the following [51]:

$$d_m(a, n, j, p, q) = \frac{\left|\lambda_p^{(a,n)} - \lambda_q^{(a,j)}\right|}{\max\left(\left|\lambda_p^{(a,n)}\right|, \left|\lambda_q^{(a,j)}\right|\right)} + 1 - \mathrm{MAC}\left(\boldsymbol{\phi}_p^{(a,n)}, \boldsymbol{\phi}_q^{(a,j)}\right) \tag{3.14}$$

where the first term is a measure of the distance between the p-th and q-th eigenvalues. With (3.14), a mode is considered physical when dm(a, n, j, p, q) < 0.1.
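A minimal sketch of (3.13) and (3.14) (assuming NumPy; the Hermitian product is used so that complex mode shapes are also handled, and all names are illustrative):

```python
import numpy as np

def mac(phi_p, phi_q):
    """Modal assurance criterion (3.13) between two mode shapes."""
    num = np.abs(np.vdot(phi_p, phi_q)) ** 2
    return num / (np.linalg.norm(phi_p) ** 2 * np.linalg.norm(phi_q) ** 2)

def mode_distance(lam_p, lam_q, phi_p, phi_q):
    """Validation criterion (3.14): the mode pair is physical when < 0.1."""
    d_eig = abs(lam_p - lam_q) / max(abs(lam_p), abs(lam_q))
    return d_eig + 1 - mac(phi_p, phi_q)

phi = np.array([1.0, 0.8, 0.3])
print(mac(phi, 1.01 * phi))                        # ~1 for nearly identical shapes
print(mode_distance(4.00, 4.01, phi, 1.01 * phi))  # small value -> physical mode
```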
MPD

It measures the deviation of the components of a mode shape from the mean phase (MP) of all its components. A widely used technique to evaluate the MP is through the SVD [51]:

$$\mathbf{P} \boldsymbol{\Lambda} \mathbf{Q}^{\mathrm{T}} = \left[ \Re\{\boldsymbol{\phi}_p^{(a,n)}\} \;\; \Im\{\boldsymbol{\phi}_p^{(a,n)}\} \right] \tag{3.15}$$
3.4.1 Clustering
The stabilization diagram obtained after mode selection needs to be further
processed to select the natural frequencies. For this purpose, K-means can
be used to cluster the data, associating each cluster to a prospective natural
frequency corresponding to its centroid [29, 52]. For each acquisition the
algorithm starts with an initial number of centroids (e.g., K = 4) initialized
at random frequencies within the range of interest, and then their position
will be updated until convergence.
Since the number of natural frequencies is unknown, the approach is re-
peated for different K values, ranging between 2 and 6; a large K tends to
produce many spurious modes while small values may discard real funda-
mental frequencies.
Finally, the centroids configuration that leads to the solution with the
lowest error (evaluated as the sum of the Euclidean distances between the
centroids and the associated points) is selected. The output of K-means is the
number of natural frequencies and their estimation (red lines in Fig. 3.2(b)).
Repeating this procedure for each acquisition the blue points depicted in
Fig. 3.3a are obtained.
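A minimal sketch of this clustering step (assuming scikit-learn; as described above, the lowest-error configuration over K = 2, …, 6 is kept; in practice a penalty on K, not shown here, avoids systematically preferring the largest K):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_frequencies(freqs, k_range=range(2, 7)):
    """Cluster the residual modal frequencies of one acquisition and return
    the centroids of the K-means run with the lowest total distance."""
    pts = np.asarray(freqs).reshape(-1, 1)
    fits = [KMeans(n_clusters=k, n_init=10).fit(pts) for k in k_range]
    best = min(fits, key=lambda km: km.inertia_)   # sum of squared distances
    return np.sort(best.cluster_centers_.ravel())

# two prospective natural frequencies around 4 Hz and 10 Hz
freqs = np.concatenate([4.0 + 0.1 * np.random.randn(20),
                        10.0 + 0.1 * np.random.randn(15)])
print(cluster_frequencies(freqs))
```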
At the end of clustering, the estimated frequencies no longer depend on the model order n; consequently, the notation µ̄p^(a) is adopted (see Fig. 4.2).
Initial phase
Without any assumption about the structure, the idea is to analyze the data, µ̄p^(a), and find some clusters of points that could be the starting ones. To perform this task, the first Nt = 200 acquisitions (i.e., µ̄p^(a) with a = 1, 2, . . . , Nt) are considered (see Fig. 3.3a). From this initial data, the number of points
Figure 3.3: a) Residual modes after mode selection over time with one acqui-
sition per hour (first 200 acquisitions). b) Natural frequencies selected after
the initial phase of the tracking algorithm. c) First two natural frequencies
estimation after the density-based tracking algorithm. Blue points represent
the residual modes, red and yellow tracks represent, respectively, the first
and second fundamental frequency estimated after the tracking algorithm;
vertical dashed lines highlight the period when the average measured tem-
peratures are below 0 ◦ C [15], in particular, it has been demonstrated in [16]
that when the temperature goes below 0 ◦ C the natural frequencies of the
bridge increase. Blue and green backgrounds highlight the acquisitions made
during the normal condition of the bridge, used respectively as training and
test sets, while red background stands for damaged condition acquisitions
used in the test phase.
that fall into frequency bins of bandwidth Bf = 0.4 Hz are counted. The histogram obtained is depicted in Fig. 3.3b. Selecting the largest values of the histogram, the number of starting points and the corresponding frequencies, fs^(0), are estimated. Specifically, each starting frequency is evaluated as the average value of the frequencies that fall into the respective bin. For example, according to Fig. 3.3b, the values of the starting points fs^(0), in this case with s = 1, . . . , 4, correspond to 4.0, 5.2, 10.1 and 12.8 Hz.
Online phase
In this phase, for each starting point a Gaussian kernel of the form
$$G\left(\theta; f_s^{(a)}, \sigma_f\right) = e^{-\frac{\left(\theta - f_s^{(a)}\right)^2}{2 \sigma_f^2}} \tag{3.17}$$
is used to track the frequency evolution over time. The parameter σf controls the kernel width and has been chosen equal to σf = 0.01 Hz; larger values of σf make the system more reactive to fast frequency changes during the tracking, but more sensitive to outliers due to the noisy measurements. The tracking algorithm is initialized with fs^(0) and iteratively updated through the following rule:

$$f_s^{(a)} = (1 - \epsilon) f_s^{(a-1)} + \epsilon \tilde{f}_s^{(a)} \tag{3.18}$$

where the parameter ε ∈ [0, 1] controls the impact of the new observation. Large values of ε reduce smoothness but allow capturing sudden changes of the modal frequencies. For the specific data set, ε = 0.7 is selected. The innovation term f̃s^(a) in (3.18) is evaluated through the Gaussian kernel as

$$\tilde{f}_s^{(a)} = \frac{\sum_p \bar{\mu}_p^{(a)}\, G\left(\bar{\mu}_p^{(a)}; f_s^{(a-1)}, \sigma_f\right)}{\sum_p G\left(\bar{\mu}_p^{(a)}; f_s^{(a-1)}, \sigma_f\right)}. \tag{3.19}$$
The four tracks fs = {fs^(a)}, a = 1, . . . , Na, are shown in Fig. 3.3 and stored in the following matrix:

$$\mathbf{F} = \begin{bmatrix} \mathbf{f}_1^{\mathrm{T}} \\ \mathbf{f}_2^{\mathrm{T}} \\ \mathbf{f}_3^{\mathrm{T}} \\ \mathbf{f}_4^{\mathrm{T}} \end{bmatrix} = \begin{bmatrix} f_1^{(1)} & f_1^{(2)} & \cdots & f_1^{(N_a)} \\ f_2^{(1)} & f_2^{(2)} & \cdots & f_2^{(N_a)} \\ f_3^{(1)} & f_3^{(2)} & \cdots & f_3^{(N_a)} \\ f_4^{(1)} & f_4^{(2)} & \cdots & f_4^{(N_a)} \end{bmatrix} \tag{3.20}$$
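A minimal sketch of the tracking loop (3.17)-(3.19) (assuming NumPy; the synthetic acquisitions and all names are illustrative):

```python
import numpy as np

def gaussian_kernel(theta, f, sigma_f=0.01):
    """Gaussian kernel of (3.17) centred on the current track estimate f."""
    return np.exp(-(theta - f) ** 2 / (2 * sigma_f ** 2))

def track(f0, acquisitions, eps=0.7, sigma_f=0.01):
    """Track one natural frequency over the acquisitions via (3.18)-(3.19)."""
    f, history = f0, [f0]
    for mu in acquisitions:                  # mu: residual modes of one acquisition
        w = gaussian_kernel(mu, f, sigma_f)
        if w.sum() > 0:                      # skip acquisitions with no nearby mode
            f_tilde = np.sum(mu * w) / np.sum(w)   # innovation term (3.19)
            f = (1 - eps) * f + eps * f_tilde      # update rule (3.18)
        history.append(f)
    return np.array(history)

acqs = [4.0 + 0.02 * np.random.randn(30) for _ in range(100)]
print(track(4.0, acqs)[-1])                  # stays close to 4 Hz
```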
Chapter 4
Z-24 Bridge
• A long-term continuous monitoring test took place during the year be-
fore demolition. The aim was to quantify the environmental variability
of the bridge dynamics.
• Progressive damage tests took place over a month, shortly before the complete demolition of the bridge.
Figure 4.1: Data acquisition setup along the Z-24 bridge: the selected ac-
celerometers, their positions, and the measured acceleration direction [17].
are: full scale range ±9.81 m/s2 , output ±2.5 V, natural frequency critical
damping 0.7, and supply voltage ±12 V DC. The A/D converter used is the
IOtech ADC 488/8SA [55], an 8 channel sample and hold with 8 differential
inputs, 16 bit, and 100 kHz sampling rate.
Figure 4.2: The block diagram for signal acquisition, processing, feature
extraction, and detection.
The block diagram depicted in Fig. 4.2 represents the sequence of tasks per-
formed for the fully automatic anomaly detection approach presented in this
work.
To reduce the computational cost and memory needs of the subsequent elaborations, some pre-processing steps have been applied to the raw data Draw. First, a decimation by a factor 2 is applied to each acquisition; thus the sampling frequency is scaled to fsamp = 50 Hz. Such a sampling frequency is deemed sufficient because the Z-24 fundamental frequencies fall in the [2.5, 20] Hz frequency range [53]. After decimation, the data are processed with a finite impulse response (FIR) band-pass filter of order 30 with band [2.5, 20] Hz, to remove disturbances. The pre-processed data are then stored in a tensor D of size Na × l × N, where N = 32000 ≈ Ns/2. Regarding the environmental parameters, the collected temperatures are averaged over time and among sensors to obtain one estimate for the whole bridge each hour. The main matrices, vectors and scalars introduced are summarised in Table 4.2.
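A minimal sketch of this pre-processing chain (assuming SciPy; the raw toy signal and names are illustrative):

```python
import numpy as np
from scipy.signal import decimate, firwin, lfilter

def preprocess(acquisition):
    """Decimate by 2 (100 Hz -> 50 Hz), then band-pass filter in [2.5, 20] Hz
    with an order-30 FIR filter."""
    x = decimate(acquisition, q=2)                            # anti-aliased decimation
    taps = firwin(31, [2.5, 20.0], pass_zero=False, fs=50.0)  # 31 taps = order 30
    return lfilter(taps, 1.0, x)

raw = np.random.randn(65536)          # one raw accelerometer acquisition (N_s samples)
clean = preprocess(raw)
print(clean.shape)                    # ~32768 samples, i.e. N ~ N_s / 2
```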
• SSI requires the SVD of the matrix T1|i (3.6), of size li × li, and the eigenvalue decomposition of the matrix A (3.12), of size n × n. The SVD complexity is O(k(li)² + k′(li)³) (where k and k′ are constants equal to 4 and 22, respectively, for the R-SVD algorithm) [56]; the complexity of the eigenvalue decomposition is O(n³). All these procedures are applied for each system order n considered in the range n ∈ [2, 160] (with step 2, for a total of ns = 80), and repeated for all the Na acquisitions. Thus, the overall complexity is O(Na Σn (k(li)² + k′(li)³ + n³)).
4.3 Performance

In this section, the proposed algorithms are applied to the Z-24 bridge data set to detect anomalies based on the fundamental frequency estimates [14, 58, 59]. The performance is evaluated, considering only the test set, through the metrics described previously: accuracy, precision, recall, and F1 score. In these numerical results, the dataset is divided into a training set and a test set. For this comparison, we decided not to use a validation phase because it is not needed in OCCNN and OCCNN², as there are no hyper-parameters to set. For the other algorithms, we set their parameters to ensure the maximum accuracy on the test set. With this methodology, we slightly overestimate the accuracy of PCA, KPCA, and GMM. Although this setup favors PCA, KPCA, and GMM, the following paragraphs show that the proposed solution, OCCNN², outperforms them.
The feature space has dimension D = 2 because only the first two funda-
mental frequencies are considered (this decision will be widely discussed in
the next chapter), and unless otherwise specified the three data sets used for
training, test in normal condition, and damaged condition, have cardinality
Nx = 2399, Ny = 854, and Nu = 854, respectively. For PCA, the number of
components selected is P = 1. For KPCA, after several tests the values of P
and γ that ensure the minimum reconstruction error are P = 3 and γ = 8.
Figure 4.3: Error function evolution over the epochs during training.
For GMM, the model order that maximizes performance is M = 10. Regarding the ANN, we adopted a fully connected network with 5 layers of, respectively, 100, 50, 1, 50 and 100 neurons, with ReLU activation functions. Note that the same ANN is also used for the first step of the OCCNN² algorithm. The parameters of OCCNN and OCCNN² are set according to [36], resulting in α1 = 0.3 and α2 = 0.8, and are largely independent of the spatial distribution of the feature points; this suggests that such OCCs can work on different structures and bridges. The NN has 2 hidden layers with L = 50 neurons each. All the NNs are trained for a number of epochs Ne = 5000 with a learning rate ρ = 0.05. The error function adopted for a training set X is

$$E_X = -\sum_{n=1}^{N_x} \sum_{c=1}^{C} t_{n,c} \ln \hat{t}_{n,c} \tag{4.1}$$
where t̂n,c is the value produced by the c-th output neuron [30]. As can be seen in Fig. 4.3, for each step of the algorithms the networks are trained for Ne = 5000 epochs, quite enough for the error function to reach its minimum. A target false alarm rate on the training set is fixed equal to 0.01, from which a threshold is selected for each algorithm to guarantee such constraint. In Table 4.4 the principal hyper-parameters are summarized.
[Figure 4.4: Comparison of the classification algorithms in terms of F1 score, recall, precision, and accuracy.]
Figure 4.5: F1 score varying the number of points, Nx , used for training, with
Ny = 854 and Nu = 854.
[Figure 4.6: F1 score varying the number of points during damage, Nu, used to test the algorithms, with Nx = 2399 and Ny = 854.]
and the number of anomaly test points, Nu, varies from 86 to 854 in steps of 10 points. The plot shows that OCCNN² exhibits the best responsiveness (i.e., it requires the lowest number of damaged points to achieve high values of the F1 score).
Now that the performance has been analyzed in detail, the following
question arises:
Chapter 5
Dimensionality Reduction
and then reconstructed through the demapping layers minimizing the error
with respect to the input data.
5.1 Performance
The performance is evaluated through the following metrics, described pre-
viously, considering only the test set:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$\mathrm{F1\ score} = 2 \cdot \frac{\mathrm{Recall} \cdot \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}$$
where TP, TN, FP, and FN represent, respectively, true positive, true negative, false positive, and false negative predictions.
The feature space has dimension D = 4, and the three data sets used for training, test in normal condition, and damaged condition have cardinality Nx = 2399, Ny = 854, and Nu = 854. For PCA, the number of components selected is P = 1. For KPCA, after several tests, the values of P and γ that ensure the minimum reconstruction error are P = 3 and γ = 8. Regarding the ANN, we adopted a fully connected network with 7 layers of, respectively,
50, 20, 10, k, 10, 20 and 50 neurons, with k the number of features extracted and ReLU activation functions, for the feature extraction task, and a fully connected network with 5 layers of, respectively, 100, 50, 1, 50 and 100 neurons for anomaly detection. All the NNs are trained for a number of epochs Ne = 5000 with a learning rate ρ = 0.05. The error function adopted for a training set X is

$$E_X = -\sum_{n=1}^{N_x} \sum_{c=1}^{C} t_{n,c} \ln \hat{t}_{n,c}, \tag{5.1}$$
Figure 5.2: Modal frequencies variance for different points; vertical dashed
red lines indicate the minimum number of points for the correct frequency
sorting.
Figure 5.3: Modal frequencies skewness varying the number of points; verti-
cal dashed red lines indicate the minimum number of points for the correct
frequency sorting.
where Nc is the number of acquisitions considered, fs^(i) is the i-th acquisition of the s-th fundamental frequency, and f̄s stands for the mean value of the s-th frequency evaluated in the interval {1, . . . , Nc}. As shown in Fig. 5.2, this feature works well after Nc = 30 observations; as depicted, the variance of reliable features is lower with respect to the noisy ones, so this method can be successfully used to sort the frequencies in the correct order.
The second metric proposed is the skewness, which measures the asymmetry of the probability distribution, i.e.,

$$\mathrm{Skewness} = \frac{\frac{1}{N_c} \sum_{i=1}^{N_c} \left(f_s^{(i)} - \bar{f}_s\right)^3}{\sqrt{\frac{1}{N_c} \sum_{i=1}^{N_c} \left(f_s^{(i)} - \bar{f}_s\right)^2}^{\;3}}.$$
As shown in Fig. 5.3, this metric can also be used to sort the frequencies
correctly; the method becomes reliable after around Nc = 100 observations.
[Figure 5.4: Modal frequencies entropy varying the number of points Nc.]
$$\mathrm{Entropy} = -\sum_{i=1}^{N_c} P\left(f_s^{(i)}\right) \log_{10} P\left(f_s^{(i)}\right)$$
where the probability density function P(fs^(i)) is evaluated numerically by implementing data binning. In this case the trend is decreasing, because the information introduced by new measurements diminishes as the number of observations increases. This metric can also be used to sort the frequencies, but in this case f3 and f4 are inverted for some values of Nc, and f2 is very close to the previous two and can be missorted.
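A minimal sketch of the three sorting features (assuming NumPy; the binned empirical pmf is one possible way to implement the data binning mentioned above, and the toy tracks are illustrative):

```python
import numpy as np

def sorting_features(f, n_bins=20):
    """Variance, skewness, and entropy of one tracked frequency over Nc points."""
    d = f - f.mean()
    variance = np.mean(d ** 2)
    skewness = np.mean(d ** 3) / np.mean(d ** 2) ** 1.5
    counts, _ = np.histogram(f, bins=n_bins)
    p = counts[counts > 0] / len(f)              # empirical pmf via data binning
    entropy = -np.sum(p * np.log10(p))
    return variance, skewness, entropy

f1 = 4.0 + 0.05 * np.random.randn(200)           # a stable, reliable track
f4 = 12.8 + 0.5 * np.random.randn(200)           # a noisier track
print(sorting_features(f1))
print(sorting_features(f4))                      # larger variance than f1
```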
[Figure: comparison of the algorithms in terms of F1 score, recall, precision, and accuracy.]
Q9: Is it possible to further reduce the amount of data that must be stored
by the network?
Chapter 6
Data Management
This chapter analyzes the minimum amount of data that must be stored to perform anomaly detection on the vibrational waveforms, and some strategies that can be implemented to reduce such a volume of data, both representing important topics often studied in the literature [6, 64].

Considering a network of l = 8 synchronized sensors interconnected to a coordinator that stores the accelerometric measurements, where each sensor acquires Ns = 65536 samples per acquisition with Nb = 16 resolution bits for Na = 4107 acquisitions, it is trivial to observe that the total amount of data stored by the coordinator is equal to Mt = Ns Nb Na l ≈ 32 Gbit = 4 GB. This considerable amount of data has been stored in a year of non-continuous measurements. In fact, the effective acquisition time can be estimated as Tt = Ta Na ≈ 44860 min ≈ 748 h, which shows how large the volume of data of a continuous measurement system could be: in a year it would be around 47 GB. To reduce the amount of data, the first step is decimation. In this application the fundamental frequencies of the bridge fall in the interval [0, 20] Hz, so, to respect the Nyquist-Shannon sampling theorem with a guard band of 5 Hz, a sampling frequency fsamp = 50 Hz is enough to capture the bridge oscillations; since the measurements are acquired by accelerometers with fsamp = 100 Hz, a decimation by a factor 2 can be adopted and the total amount of data is halved: Md = Mt/2 ≈ 2 GB. Moreover, starting from the decimated acquisitions, three other parameters can be modified to find possible configurations
that do not deteriorate the performance of the OCC algorithms but can reduce the volume of data.

Figure 6.1: Examples of feature transformation due to the effect of a low number of sensors, a low number of bits, and a low number of samples with respect to the standard measurement condition reported on the left.
All these possibilities will be analyzed and widely discussed in the next section. In Fig. 6.1 some working points of the system are reported and compared to the reference working condition after decimation (l = 8, Nd = 32768, Nb = 16).
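As a back-of-the-envelope check of the storage figures used in this chapter (plain Python arithmetic; binary prefixes, 2³⁰, reproduce the quoted values):

```python
Ns, Nb, Na, l = 65536, 16, 4107, 8       # samples, bits/sample, acquisitions, sensors
Mt = Ns * Nb * Na * l                    # total stored volume in bits
print(Mt / 2**30)                        # ~32 Gbit, i.e. the ~4 GB stated above
Md = Mt / 2                              # after decimation by a factor 2
print(Md / 8 / 2**30)                    # ~2 GB
Msen = (Ns // 2) * Nb * Na * 3           # decimated data, 3 sensors only
print(Msen / 8 / 2**30)                  # ~0.8 GB
```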
6.1 Performance
The performance is evaluated through the accuracy, considering only the test
set. The feature space has dimension D = 2, and the three data sets used for
training, test in normal condition, and damaged condition. The error introduced on the features is quantified through the RMSE, where Ns is the number of features (Ns = 2), fs^(n) is the s-th feature of the n-th acquisition in the initial configuration, and f̄s^(n) is the corresponding data point in the modified configuration.
[Figure: RMSE Ef [Hz] and accuracy of OCCNN², PCA, KPCA, and GMM varying the number of sensors l.]
The accuracy of the algorithms remains almost the same as long as the number of sensors available is greater than 2, while the error presents a significant increase between 2 and 3 sensors. Thus, we can deduce that the minimum number of sensors that can be used to monitor the Z-24 bridge is equal to 3. In this configuration, the amount of data stored is reduced to Msen = Ns Nb Na 3 ≈ 0.8 GB.
[Figure: RMSE Ef [Hz] and accuracy of OCCNN², PCA, KPCA, and GMM varying the number of bits Nb.]
6.2 Observations

Three different approaches are proposed to reduce the volume of data stored and to limit the costs of the sensors and of the network infrastructure necessary to monitor the structure. In this sense, when the goal is reducing the amount of data stored, it is good practice to cut down the observation time using a number of accurate sensors; when the target is to minimize the sensor cost, a good practice is to adopt low-cost sensors (with low resolution, i.e., few bits per sample), combined with a long observation time and a network of several sensors; when the objective is to contain the network infrastructure cost, a low number of accurate sensors and a long observation time can be considered. To evaluate the error introduced by these strategies and the performance of the algorithms, the RMSE and the accuracy are used as metrics. The results show that these strategies can be adopted without significant loss of performance; in fact, all the algorithms except PCA ensure an accuracy greater than 94% in all the proposed configurations, with the maximum performance reached by OCCNN², whose accuracy never drops below 95%.
Chapter 7
Human Activities Classification Using Biaxial Seismic Sensors

7.1 Introduction
The problem of identifying and classifying the presence of a target in a particular environment with low-cost sensors remains a key issue for outdoor security applications [35, 36, 67, 68]. The variability of the ground and environment characteristics (e.g., weather conditions, humidity, temperature, wind speed) makes target detection more complicated than in a controlled indoor environment.
In the literature, many works propose to use networks of geophones (which present weak dependence on the environment) to capture the ground vibration in order to detect the presence of persons in indoor scenarios [69], or to classify several vehicles with different weights in a well-defined outdoor area [70]. In [71], a method for detecting intruders and predicting their activities outdoors using a seismic sensor is presented. Similarly, in [72], the objective is to detect and classify different targets (e.g., humans, vehicles, and animals led by a human) using seismic and passive infrared sensors. Other solutions exploit cooperative sensors (i.e., microphones and geophones) and data fusion techniques to improve vehicle classification accuracy and estimate their velocity [73, 74]. In [75], a system architecture for the classification of moving
Figure 7.1: Illustration of the processing data chain to extract features from the geophone signals and perform classification.
converter (ADC) with sampling frequency $f_s$ and resolution $N_{\mathrm{bit}}$. The output of the conversion consists of two time series, $\mathbf{x}_h = \{x_h(k/f_s)\}_{k=0}^{K-1}$ and $\mathbf{x}_v = \{x_v(k/f_s)\}_{k=0}^{K-1}$, of $K$ samples each. Subsequently, each time series is first split into $N_w$ row vectors, obtained through a partially overlapped sliding window of length $W$ samples, with $W \leq K$, and a sliding step $\Delta_w$, and then rearranged into the matrices $\mathbf{Z}_h$ and $\mathbf{Z}_v$, of size $N_w \times W$, by stacking. From now on, since the processing stages apply to $\mathbf{Z}_h$ and $\mathbf{Z}_v$ separately, we indicate both with $\mathbf{Z}$ for the sake of conciseness. In this phase, the samples of each observation window are normalized such that the result is a set of zero-mean row vectors with $\max_j |z_{n,j}| = 1$, $n = 1, \dots, N_w$. Finally, the data are labeled by activity for the classification during the training phase.
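As an illustration, a minimal NumPy sketch of this conditioning stage might read as follows (function and parameter names are ours, not from the acquisition software):

import numpy as np

def condition_channel(x, W, step):
    # Split the K-sample time series x into Nw partially overlapped
    # windows of length W (W <= K) with sliding step `step`, and stack
    # them into an Nw x W matrix Z.
    starts = range(0, len(x) - W + 1, step)
    Z = np.stack([x[s:s + W] for s in starts]).astype(float)
    # Normalize each window to zero mean with max_j |z_{n,j}| = 1.
    Z -= Z.mean(axis=1, keepdims=True)
    Z /= np.max(np.abs(Z), axis=1, keepdims=True)
    return Z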
7.2.2 PCA
Principal component analysis (PCA) distills the essential information from the data set, which is then represented through a set of new orthogonal variables, called principal components, obtained as linear combinations of the original data [78]. For the calculation of the principal components, we consider only the $N_o$ points of the training subset. After centering the matrix $\mathbf{S}$ by subtracting its column-wise sample mean, we evaluate the sample covariance matrix $\mathbf{\Sigma} = \frac{1}{N_o} \mathbf{S}^{\mathsf{T}} \mathbf{S}$. Then, $\mathbf{\Sigma}$ is factorized by the eigenvalue decomposition (EVD) $\mathbf{\Sigma} = \mathbf{Q} \mathbf{\Lambda} \mathbf{Q}^{\mathsf{T}}$, where $\mathbf{\Lambda}$ is the diagonal matrix of eigenvalues, ordered from the largest to the smallest, and $\mathbf{Q}$ is the matrix of eigenvectors [79].
In order to perform dimensionality reduction, we only keep the first $D_h$ or $D_v$ eigenvalues of $\mathbf{\Lambda}$ and the corresponding eigenvectors $\tilde{\mathbf{Q}}$ (i.e., selected columns of $\mathbf{Q}$). The projections $\tilde{\mathbf{S}}$ of the observations onto the component subspace through the new projection matrix $\tilde{\mathbf{Q}}$ are $\tilde{\mathbf{S}} = \mathbf{S} \tilde{\mathbf{Q}}$, where $\tilde{\mathbf{S}}$ is an $N_w \times D_h$ or $N_w \times D_v$ matrix, and $D_h, D_v \leq D$ are the numbers of principal components considered for $\tilde{\mathbf{S}}_h$ and $\tilde{\mathbf{S}}_v$. Note that, while the principal components are calculated solely over the $N_o$ training points, all the $N_w$ points are projected onto the component subspace. These two matrices represent the selected features used to train and test the classifiers described in Section 7.3.
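A minimal sketch of this computation, mirroring the EVD-based procedure just described (names are illustrative; we assume the training mean is also used to center the points to be projected):

import numpy as np

def pca_features(S, No, D_keep):
    # Principal components computed on the No training rows of S only.
    mean = S[:No].mean(axis=0)
    Sc = S[:No] - mean                 # centered training points
    Sigma = (Sc.T @ Sc) / No           # sample covariance matrix
    eigval, Q = np.linalg.eigh(Sigma)  # EVD, eigenvalues in ascending order
    Q_tilde = Q[:, np.argsort(eigval)[::-1][:D_keep]]
    # All Nw points are projected onto the component subspace.
    return (S - mean) @ Q_tilde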
• Single classifier. For each observation window, the data of the two channels are jointly processed. The $N_w \times (D_h + D_v)$ matrix $\tilde{\mathbf{S}}_{hv}$ is built by concatenating the matrices $\tilde{\mathbf{S}}_h$ and $\tilde{\mathbf{S}}_v$ as $\tilde{\mathbf{S}}_{hv} = [\tilde{\mathbf{S}}_h \; \tilde{\mathbf{S}}_v]$. The monoaxial (i.e., horizontal or vertical) geophone configuration is obtained by setting $D_v = 0$ or $D_h = 0$, respectively.
7.3 Classification Techniques
Hereafter, the two classifiers used in this work are briefly described. In addition, the ordinary cross-correlation-based classification method, used both in the time and in the frequency domain, is reviewed. For the sake of clarity, $\mathbf{y}$ is the vector of actual classification labels of length $N_w$, $\hat{\mathbf{y}}$ is the vector of classification labels estimated by the algorithms, $\tilde{\mathbf{s}}_n$ is the $n$th row of $\tilde{\mathbf{S}}_h$ (either $\tilde{\mathbf{S}}_v$ or $\tilde{\mathbf{S}}_{hv}$), and $\mathcal{S}$ is the feature space of dimension $D$ (either $D_h$ or $D_v$). Thus, each point, both for training and test, is represented by the pair $(\tilde{\mathbf{s}}_n, y_n)$.
where $\tilde{\mathbf{w}}$ are the weights of the parametric model and $\lambda$ is the regularization parameter [29, 30]. The hyperplane that performs a good separation is the one that maximizes its distance from the nearest training points of each class, and it is identified by the set of weights $\tilde{\mathbf{w}}$ that minimizes the error function. Given a test point $(\tilde{\mathbf{s}}_m, y_m)$, the estimated label is $\hat{y}_m = \tilde{\mathbf{s}}_m \tilde{\mathbf{w}}^{\mathsf{T}}$.
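As a sketch only (not the implementation used for the results in this chapter), such a regularized maximum-margin linear classifier can be realized, e.g., with scikit-learn, where the parameter C plays the role of the inverse of the regularization parameter $\lambda$:

import numpy as np
from sklearn.svm import LinearSVC

# Illustrative stand-ins for the projected features and labels;
# in the thesis these are the rows of S~ and the entries of y.
rng = np.random.default_rng(0)
S_train = rng.normal(size=(200, 6))
y_train = (S_train[:, 0] > 0).astype(int)

svm = LinearSVC(C=1.0 / 0.1)   # lambda = 0.1, the value later found by cross-validation
svm.fit(S_train, y_train)      # training pairs (s~_n, y_n)
y_hat = svm.predict(S_train)   # estimated labels y^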
7.3.2 k-NN
With (7.1) we can assign $\tilde{\mathbf{s}}_m$ to the same class as the nearest $\tilde{\mathbf{s}}_n$. If the number of training points is large enough, it makes sense to use the majority rule over the $k$ nearest neighbors instead of the single nearest neighbor [80]. In the case of binary classification, $k$ is chosen to be odd to avoid ties between the two classes.
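A minimal sketch of this majority rule (illustrative names; Euclidean distance assumed):

import numpy as np

def knn_predict(S_train, y_train, s_test, k=3):
    # Distances from the test point to all training points.
    d = np.linalg.norm(S_train - s_test, axis=1)
    # Majority vote among the k nearest neighbors (labels 0/1).
    votes = y_train[np.argsort(d)[:k]]
    return np.bincount(votes).argmax()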
• Training and validation: 60% of all PSDs of each activity are randomly chosen to calculate the PCA projection matrix $\tilde{\mathbf{Q}}$. Then, the projected points are used to train the classifiers. To set the hyperparameters of SVM and k-NN, 10% of these points are randomly chosen and used to perform cross-validation. The resulting optimal values are $k = 3$ and $\lambda = 0.1$.
• Test: the other 40% of PSDs are used to test the performance of the
algorithms. In this case, the projection in the principal components
Figure 7.2: Accuracy varying the number of PCA components of the vertical and horizontal channels, with $T_{\mathrm{ob}} = 20$ s.
The same ratio of training and test points has also been used for the template-matching-based approaches and the LDA. However, no cross-validation is required in these cases. As a figure of merit to evaluate the performance of the classifiers, we consider the accuracy $\eta$, defined as the ratio between the number of correct classifications and the total number of points classified.
As shown in Fig. 7.2, the single SVM (S-SVM) classifier always provides the best performance for each value of $D_h$ and $D_v$; in particular, with only $D_h = D_v = 3$ components, it reaches an accuracy greater than 96%, while achieving the same accuracy with the cascade SVM (C-SVM) configuration requires a higher number of PCA components ($D_h, D_v \geq 5$). On the contrary, the single k-NN (S-k-NN) converges to an accuracy roughly equal to 93%, much lower than that of the cascade k-NN (C-k-NN), which stands at almost 97%. These results show that, for k-NN and for this experimental setup, it is better to adopt the cascade solution and use the samples coming from the two channels separately. With regard to the monoaxial scenario, for the SVM, using only the samples of the vertical channel yields performance comparable to that of the C-SVM and S-SVM solutions when $D_v = D_h = 10$; on the contrary, using only the horizontal channel always results in worse accuracy. In contrast, the performance of k-NN is always better when using a biaxial geophone.
Figure 7.3: Accuracy varying the observation window duration, $T_{\mathrm{ob}}$. For the PCA-based classifiers, $D_h = D_v = 5$.
7.5 Observations
A passive human activity classification method exploiting the ground vibrations observed by a two-channel geophone has been proposed. The data collected by the two channels are processed by a PCA-based dimensionality reduction to extract significant features in the frequency domain. The classification step is performed by either a single classifier or a cascade of two classifiers. The analysis considered the most important parameters (observation window and number of PCA components) and setups (joint or cascade processing) to provide a complete set of results which may assist the system
8.1 Introduction
With the advent of the technological revolution named the Internet of things (IoT), increasingly pervasive and context-adaptive communication systems are conquering the radio-frequency (RF) spectrum [10]. Since spectrum crowding may represent an issue in some frequency bands, e.g., the overcrowded industrial, scientific and medical (ISM) ones, there is an increasing interest in exploiting existing over-the-air signals, devised for some specific purpose, to perform other tasks, thus avoiding dedicated radio emissions. WiFi routers, broadcast stations, and mobile cellular networks are only a few examples of sources of such signals of opportunity (SoOp) [84–87].
Security in homes, industrial environments, and facilities is becoming a critical aspect of modern society, and for this reason ambient intelligence has recently gained attention [37]. Camera-based video surveillance systems are the dominant technology in such scenarios; however, personal privacy concerns still deter users. The ambient intelligence paradigm is not only beneficial for security purposes but, more generally, acts as an enabler for context-aware applications such as smart homes, to name one example.
Figure 8.1: The four scenarios considered with the anomaly represented by
a human being.
• The tests have been performed in both line-of-sight (LOS) and non-
line-of-sight (NLOS) conditions.
packets emitted by the AP. In the RF sensor adopted, the sampling rate $f_s$ can be varied according to the needs. On the one hand, a high sampling frequency can guarantee a larger signal bandwidth at the cost of an increased computational load to process the samples. On the other hand, a low sampling rate may alleviate the computational burden but result in poor detection performance.
As shown in Fig. 8.2, the first step in the proposed approach is to extract beacon packets from the observed samples. In order to keep the computational burden low, it is possible to use non-coherent detection of beacons starting from the envelope of the received samples, $|r_i|$, $i \in \mathbb{N}$. After beacon detection from the ultra-dense over-the-air packet flow, $N_P$ samples are extracted. In particular, denoting with $N_B$ the number of beacons detected, we have
$$\mathbf{b}_j = \{r_i\}_{i=\mathrm{ToA}_j}^{\mathrm{ToA}_j + N_P} \quad \text{with } j = 1, \dots, N_B \qquad (8.1)$$
where $\mathbf{b}_j$ is a vector containing the first $N_P$ samples of the $j$th beacon, while $\mathrm{ToA}_j$ is the $j$th beacon time of arrival. We want to emphasize that precise
[Figure 8.2: processing chain from the received samples to the features: envelope $|r_i|$, beacon detection, preamble extraction of the beacons $\mathbf{b}_j$ used as SoOp, and PSD estimation yielding $\mathbf{s}_j$.]
To further mitigate the impact of outliers in training the algorithms, the PSDs have been averaged over $M$ points and organized in vectors $\mathbf{x}_k$, as
$$\mathbf{x}_k = \frac{1}{M} \sum_{j=(k-1)M+1}^{kM} \mathbf{s}_j \quad \text{with } k = 1, \dots, N \qquad (8.2)$$
detection as
$$T_D = \frac{M}{B_R E_C} \qquad (8.3)$$
where $B_R$ is the beacon rate (number of beacons per second), and $E_C$ represents the ratio between the number of beacons extracted and the number of beacons transmitted. Finally, the features are organized in the matrix
$$\mathbf{X} = [\mathbf{x}_1^{\mathsf{T}}, \mathbf{x}_2^{\mathsf{T}}, \dots, \mathbf{x}_N^{\mathsf{T}}] \qquad (8.4)$$
of size $N \times D$, where, from now on, $N$ is the number of input points and $D$ is their dimension. Note that if $M = 1$ then $\mathbf{X} = \mathbf{S}$.
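A minimal sketch of the feature construction in (8.2) (names are illustrative):

import numpy as np

def build_features(S, M):
    # S is the N_B x D matrix with one PSD s_j per row; the output X is
    # N x D with N = N_B // M, each row averaging M consecutive PSDs.
    # For M = 1, X = S.
    N_B, D = S.shape
    N = N_B // M
    return S[:N * M].reshape(N, M, D).mean(axis=1)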
Denoting with $\mathbf{X}_m$ the mean-centered version of $\mathbf{X}$, the sample covariance matrix is
$$\mathbf{C} = \frac{\mathbf{X}_m^{\mathsf{T}} \mathbf{X}_m}{N-1}. \qquad (8.6)$$
The covariance matrix is then factorized through the EVD
$$\mathbf{C} = \mathbf{V} \mathbf{\Lambda} \mathbf{V}^{\mathsf{T}} \qquad (8.7)$$
and the first $P$ eigenvectors, associated with the largest eigenvalues, are collected in
$$\mathbf{V}_P = [\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_P] \qquad (8.8)$$
so that the projected data set is
$$\mathbf{X}_P = \mathbf{X} \mathbf{V}_P. \qquad (8.9)$$
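The training phase of this PCA, following (8.6)–(8.9), can be sketched as (illustrative names):

import numpy as np

def fit_pca(X, P):
    mean = X.mean(axis=0)
    Xm = X - mean                             # mean-centered data
    C = (Xm.T @ Xm) / (X.shape[0] - 1)        # covariance, (8.6)
    eigval, V = np.linalg.eigh(C)             # EVD, (8.7)
    V_P = V[:, np.argsort(eigval)[::-1][:P]]  # first P eigenvectors, (8.8)
    return mean, V_P                          # projection: X_P = X @ V_P, (8.9)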
This approach takes inspiration from the standard PCA and overcomes the limitation of the linear mapping, which corresponds to finding linear boundaries in the original feature space. In many applications, this constraint represents a severe limitation and can sharply decrease the classification accuracy. KPCA first maps the data with a non-linear function, after which it applies the standard PCA to find a linear boundary in the new feature space. Such a boundary becomes non-linear when going back to the original feature space. A crucial point in KPCA is the selection of a non-linear function that leads to linearly separable data in the new feature space. In the literature, when the data distribution is unknown, the RBF kernel is often proposed as the right candidate to accomplish this task [43]. Given a generic point $\mathbf{z}$, corresponding to a vector of length $D$, we can apply the RBF as follows:
$$K_{zj} = e^{-\gamma \|\mathbf{z} - \mathbf{x}_j\|^2} \quad \text{with } j = 1, 2, \dots, N \qquad (8.10)$$
where $K_{zj}$ is the $j$th component of the point $\mathbf{z}$ in the kernel space. Overall, the starting vector $\mathbf{z}$ is mapped into a vector $\mathbf{K}_z$ of length $N$. Applying the PCA described in Section 8.3.1 to the new data set, obtained by remapping all the training points, it is possible to find non-linear boundaries in the starting feature space for a better classification. It is good practice to center the points mapped with the RBF because the mapping in the new feature space could be non-zero mean.
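A minimal sketch of the RBF mapping of (8.10), followed by the centering mentioned above (names are illustrative; the linear PCA of Section 8.3.1 is then applied to the mapped points):

import numpy as np

def rbf_map(Z, X_train, gamma):
    # Squared Euclidean distances between each row z of Z and each
    # training point x_j; one kernel component K_zj per training point.
    d2 = ((Z[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-gamma * d2)        # mapped points of length N
    return K - K.mean(axis=0)      # center, since the mapping is not zero mean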
• Validation: composed of 25% of the empty room points and the same number of adversarial points (the adversarial class depends on

² The accuracy is defined as the number of correct classifications over the number of points classified.
• Test: composed of the remaining empty room points and the same number of adversarial points.

In this configuration, training, validation, and test sets have the same number of points, $N = 400$, when averaging is not performed ($M = 1$). For each simulation point, 100 Monte Carlo iterations are performed, and the data partitioning is repeated, selecting random points at each iteration. Given a generic point $\mathbf{z}$ and its reconstructed version $\tilde{\mathbf{z}} = \mathbf{z}_P \mathbf{V}_P^{\mathsf{T}}$ from its projection $\mathbf{z}_P$, the error function $e_z$ is defined as
$$e_z = \|\mathbf{z} - \tilde{\mathbf{z}}\|. \qquad (8.11)$$
Evaluating the error function over the validation set, we selected a threshold that ensures a false alarm (FA) probability lower than 0.01 on the training set.
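A sketch of the resulting detector (illustrative names; we take the threshold as the empirical quantile of the validation errors corresponding to the target FA probability):

import numpy as np

def reconstruction_error(Z, V_P):
    # Error function (8.11): distance between each point and its
    # reconstruction from the projection onto the P principal components.
    Z_rec = (Z @ V_P) @ V_P.T
    return np.linalg.norm(Z - Z_rec, axis=1)

def pick_threshold(e_val, p_fa=0.01):
    # Smallest threshold whose empirical FA probability is below p_fa.
    return np.quantile(e_val, 1.0 - p_fa)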
Table 8.2: PCA and KPCA parameters for the considered scenarios.

                M =  1   2   3   4   5   6   7   8   9  10  11  12
(a)  PCA   P         4   2   1   1   1   1   1   1   1   1   1   1
     KPCA  P         6   2   1   1   2   2   2   2   2   2   2   2
     KPCA  γ        30  10   5   5   5   5   5   5   5   5   5   5
(b)  PCA   P         4   2   1   1   1   1   1   1   1   1   1   1
     KPCA  P        10   2   1   1   2   2   2   2   2   2   2   2
     KPCA  γ        25  20  15  15  10  10   5   5   5   5   5   5
(c)  PCA   P         2   1   1   1   1   1   1   1   1   1   1   1
     KPCA  P         4   4   3   3   2   2   2   2   2   2   2   1
     KPCA  γ        10  10  10  20  10  10  15   5  15   5  25   5
(d)  PCA   P         2   1   1   1   1   1   1   1   1   1   1   1
     KPCA  P         4   3   3   3   2   2   2   2   2   2   1   2
     KPCA  γ        10  15  10  10   5  10  10   5  10   5  20  10
In the NLOS case depicted in Fig. 8.1c and Fig. 8.1d, an obstacle between the AP and the RF sensor is present. Even in this scenario, PCA, KPCA, and the RSS-based method are trained and optimized with the same procedure described in the previous subsection. As we can see from Fig. 8.5, the performance differs from the LOS case. In this configuration, the PCA accuracy with a static target slightly decreases, whereas KPCA works well also for low $M$. For $M = 2$, in the case of a static object, we experience a remarkable performance degradation for PCA, while KPCA preserves high accuracy. This effect is not present in the case of a moving target. This happens because the absence of a direct path between the transmitter and the RF sensor makes the channel transfer function strongly influenced by a target that intersects the reflected paths. Therefore, the detection of a moving target, for low values of $M$, is favored by a NLOS scenario: presumably, the intersection of a reflected path is more likely than the intersection of the direct path in the LOS case. Even in these scenarios, the accuracy of the RSS-based technique is clearly outperformed by the proposed approaches, particularly for low values of $M$.
[Figure: radar chart of the accuracy (from 50% to 100%) achieved by PCA, KPCA, and the RSS-based method in the four scenarios (a)–(d), for $M = 2$ and $M = 8$.]
KPCA in all the scenarios, suggesting the use of PCA, which is less complex and computationally faster. Finally, it is also important to highlight that PCA presents lower accuracy in the NLOS case with respect to the LOS one when $M$ is low. This happens because the interception of the direct path strongly changes the channel transfer function when the target is present.
8.5 Observations
In this work, we proposed and studied an RF-based automatic indoor anomaly detector exploiting SoOp from WiFi devices and machine learning techniques. The numerical results, based on real waveforms collected by an RF sensor,
Conclusion
area, these last two chapters present original results that originated from it.
This thesis presents the endeavor to explore the use of ML algorithms in the area of SHM. While most of the results are based on the Z-24 database, the extension of the proposed solutions to a wide class of structures appears feasible and is supported by the analysis of their robustness. Therefore, an interesting direction to pursue is applying and validating the designed methodologies on other structures. Moreover, since deep learning is a growing area capturing the attention of many researchers worldwide, its application in this context seems a thriving research direction that could lead to attractive solutions.
Questions Recall
Q1: How can we use tools for dimensionality reduction to perform anomaly
detection?
Q2: What are the most effective algorithms able to perform anomaly detec-
tion?
A3: Yes, it is. In this thesis, we propose the OCCNN and OCCNN2 algorithms, which exploit the NN flexibility in the anomaly detection task. These algorithms generate adversarial dummy points to train the anomaly detector as a two-class NN classifier.
Q4: How can we perform anomaly detection based on machine learning tools
to detect damage in a structure?
A5: Yes, there is. In this thesis, we focused our efforts on processing accelerometric measurements gathered by sensors installed on bridges. In order to maintain generality, we decided not to use strategies that require a model of the monitored structure; hence, we only exploited information extracted from the data (model-free or data-driven strategies). Among the several approaches offered by the literature, we decided to use SSI, which satisfies all the requirements described above.
A8: Yes, it is. We proposed several metrics to accomplish this task, and we showed that the variance and skewness of the extracted frequencies can be effectively used to determine the most reliable frequencies among all the extracted ones.
Q9: Is it possible to further reduce the amount of data that the network must
store?
A9: Yes, it is. In the thesis, we proposed three different strategies to reduce the amount of data stored without affecting the anomaly detectors' performance: i) reducing the number of sensors used to monitor the structure; ii) reducing the number of resolution bits of the sensors; iii) reducing the acquisition time of each measurement.
A10: Yes, it is. Experimental results showed that, when the objective is to reduce the network infrastructure costs, the best strategy is to reduce the number of sensors, using a few accurate accelerometers to achieve good performance. Instead, when the goal is to reduce the sensor cost, a large network of low-cost sensors is the best solution to maintain a high anomaly detection capability.
A12: Yes, it is. As said before, ML and anomaly detection techniques are flexible strategies that can be used in different fields. For instance, in Chapter 8, we propose an application that exploits WiFi signals of opportunity to detect the presence of a target in an environment through anomaly detection algorithms. Experimental results show that this approach works well and provides very good accuracy.
Publications
[1] E. Favarelli and A. Giorgetti, “Machine Learning for Automatic Processing of Modal Analysis in Damage Detection of Bridges,” IEEE Transactions on Instrumentation and Measurement (TIM), 2020.
Bibliography

[2] D. Brunelli, L. Benini, C. Moser, and L. Thiele, “An efficient solar energy harvester for wireless sensor nodes,” in Design, Automation and Test in Europe, 2008, pp. 104–109.
[10] M. Chiani and A. Elzanaty, “On the LoRa modulation for IoT: Wave-
form properties and spectral analysis,” IEEE Internet of Things Jour-
nal, vol. 6, no. 5, pp. 8463–8470, 2019.
[35] E. Testi, E. Favarelli, and A. Giorgetti, “Machine learning for user traf-
fic classification in wireless systems,” in 26th European Signal Process-
ing Conference (EUSIPCO), Rome, Italy, Sep. 2018, pp. 2040–2044.
[38] P. Perera and V. M. Patel, “Learning deep features for one-class classi-
fication,” IEEE Transactions on Image Processing, vol. 28, no. 11, pp.
5450–5463, Nov. 2019.
[77] P. Welch, “The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms,” IEEE Transactions on Audio and Electroacoustics, vol. 15, no. 2, pp. 70–73, Jun. 1967.
[85] S. Bartoletti, A. Conti, and M. Z. Win, “Passive radar via LTE signals of opportunity,” in IEEE International Conference on Communication Workshops (ICC), Jun. 2014, pp. 181–185.
[92] Y. Jin, Z. Tian, M. Zhou, Z. Li, and Z. Zhang, “An adaptive and
robust device-free intrusion detection using ubiquitous WiFi signals,”
in IEEE International Conference on Digital Signal Processing (DSP),
Shanghai, China, Nov. 2018, pp. 1–5.
[94] Z. Tian, Y. Li, M. Zhou, and Z. Li, “WiFi-based adaptive indoor pas-
sive intrusion detection,” in IEEE International Conference on Digital
Signal Processing (DSP), Shanghai, China, Nov. 2018, pp. 1–5.
[97] M. Chiani, A. Giorgetti, and E. Paolini, “Sensor radar for object track-
ing,” Proceedings of the IEEE, vol. 106, no. 6, pp. 1022–1041, Jun.
2018.
There are several people I should thank for where I am today, several people who gave me support when I wanted to give up, several people who showed me the light when the only thing I could see was the darkness.
My father, Daniele, my green light at the end of the pier, a man who had that rare smile you probably meet once in your life, an indelible example of the person I want to be.
All my friends and relatives who support me every day, sometimes with a simple gesture, sometimes with great support; I hope to spend as much time as possible with them and to give them back at least half of the good they gave me.
Finally, I would like to thank my advisor and mentor, Prof. Andrea Giorgetti, a great person and an illuminating professor who always supported me during these years; I hope to continue working with him as long as possible. I have the merit of having followed him; he has the merit of having believed in me.
Thank you all, for your trust, your support, and your love. It is not always easy to believe in yourself; there are moments when the only thought in your mind is “I am not gonna make it, I am not good enough”. Well, today I finally proved to myself that I am good enough. Thank you.
“A true master is an eternal student”