Using LSTM Recurrent Neural Networks To Predict Excess Vibration Events in Aircraft Engines

Abstract—This paper examines building viable Recurrent Neural Networks (RNNs) using Long Short Term Memory (LSTM) neurons to predict aircraft engine vibrations. The model is trained on a large database of flight data records obtained from an airline, containing flights that suffered from excessive vibration. RNNs can provide a more generalizable and robust method for prediction than analytical calculations of engine vibration, as analytical calculations must be solved iteratively based on specific empirical engine parameters, and this database contains multiple types of engines. Further, LSTM RNNs provide a "memory" of the contribution of previous time series data, which can further improve predictions of future vibration values. LSTM RNNs were used over traditional RNNs, as the latter suffer from vanishing/exploding gradients when trained with backpropagation. The study predicted vibration values 5, 10, and 20 seconds in the future, with 3.3%, 5.51%, and 10.19% mean absolute error, respectively. These neural networks provide a promising means for the future development of warning systems, so that suitable actions can be taken before the occurrence of excess vibration to avoid unfavorable situations during flight.

I. INTRODUCTION

Aircraft engine vibration is a critical aspect of the aviation industry, and accurate predictions of excessive engine vibration have the potential to save time, effort, and money, as well as human lives. An aircraft engine, as turbomachinery, normally vibrates because it has many dynamic parts; however, it must not exceed resonance limits, so as not to destroy the engine [1].

Reference [1] discusses vibrations generated by the fluttering of engine blades. Engine blades are the rotating components with the largest dimensions among the engine's parts. While rotating at high speeds, they withstand high centrifugal forces and thus logically make the largest contribution to engine vibrations.

Engine vibrations are not simple to calculate or predict analytically, because various parameters contribute to their occurrence. This has always been a problem for aviation performance monitors, especially since engines vary in design, size, operating conditions, service life span, the aircraft they are mounted on, and many other parameters. Most of these parameters' contributions can be captured in key parameters measured and recorded on the flight data recorder. Nonetheless, vibrations are likely to be the result of a mixture of these contributions, making it very hard to predict the real cause behind the excess in vibrations.

This paper presents a means to make these predictions viable for the aviation industry within a reasonable time window. The problem is approached using LSTM RNNs, which have seen widespread recent use with strong results in image [2], speech [3], and language prediction [4]. LSTM RNNs were chosen for this work in particular because of their generalizability and predictive power, which stem from having a memory of the contribution of previous time series data when predicting future vibration values. This study provides another dimension for the use of this promising type of recurrent neural network.

II. RELATED WORK

A. Aircraft Engine Vibration

According to Reference [1]: "The most common types of vibration problems that concern the designer of jet engines include (a) resonant vibration occurring at an integral order, i.e. multiple of rotation speed, and (b) flutter, an aeroelastic instability occurring generally as a nonintegral order vibration, having the potential to escalate, unless checked by any means available to the operator, into larger and larger stresses resulting in serious damage to the machine. The associated failures of engine blades are referred to as high cycle fatigue failures". The means available to the operator in practical aviation operations are mainly: i) maintenance engine checks scheduled in maintenance programs based on engine reliability observations, and ii) engine vibration monitoring to forecast excess vibration occurrences based on statistical and analytical methods that consider empirical factors of safety. Some effort has been made to use neural networks to classify engine abnormalities without doing analytical computation; e.g., Nairac et al. [5] worked on this aspect to detect abnormalities in engine vibrations based on recorded data.

Clifton et al. [6] presented work on predicting abnormalities in engine vibration based on statistical analysis of vibration signatures. The paper presents two modes of prediction. One is ground-based (off-line), where prediction is done by run-by-run analysis to predict abnormalities based on previous engine runs.
The success of this approach was in predicting abnormalities two runs ahead. The other mode is a flight-based mode (online), in which detection is done either by sending reduced data to the ground base or onboard the aircraft. The paper reports successful predictions 2.5 hours into the future. However, these predictions are made after half an hour of flight data collection, which might itself be a critical period, as excess vibration may occur during this data collection time. The paper did not mention how much data was required to obtain a sound prediction.
B. LSTM RNN

The LSTM RNN was first introduced by S. Hochreiter and J. Schmidhuber [7]. The paper introduced a solution to the following problem: "Learning to store information over extended time intervals via recurrent backpropagation takes a very long time, mostly due to insufficient, decaying error back flow". It was a solution for the exploding/vanishing gradients that arise when backpropagation is used to modify the weights of the network. This study paved the way for many interesting projects. LSTM RNNs have since been used with strong performance in image recognition [2], audio-visual emotion recognition [3], music composition [8], and other areas.
III. EXPERIMENTAL DATA

The data used consists of 76 different parameters recorded on the aircraft Flight Data Recorder (FDR), as well as the vibration parameter. A subset of these parameters was chosen based on the likelihood of their contribution to the vibration, judged from an aerodynamics/turbomachinery background. Some parameters, such as Inlet Guide Vanes Configuration, Fuel Flow, Spoilers Configuration (preliminarily considered because of the special position of the engine mount), High Pressure Valve Configuration, and Static Air Temperature, were excluded because it was found that they generated noise rather than contributing positively to the vibration prediction.

The finally chosen parameters were:

1) Altitude
2) Angle of Attack
3) Bleed Pressure
4) Turbine Inlet Temperature
5) Mach Number
6) Primary Rotor/Shaft Rotation Speed
7) Secondary Rotor/Shaft Rotation Speed
8) Engine Oil Pressure
9) Engine Oil Quantity
10) Engine Oil Temperature
11) Aircraft Roll
12) Total Air Temperature
13) Wind Direction
14) Wind Speed
15) Engine Vibration
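To illustrate how training examples can be formed from these parameters, the following is a minimal sketch that pairs the parameter values at each second with the vibration value several seconds ahead as the prediction target. The column names, the one-second sampling rate, and the min-max normalization are illustrative assumptions, not details taken from this study:

```python
import pandas as pd

# Hypothetical column names for the 15 chosen FDR parameters; the FDR
# is assumed here to be sampled once per second.
PARAMS = ["altitude", "angle_of_attack", "bleed_pressure",
          "turbine_inlet_temp", "mach", "n1_speed", "n2_speed",
          "oil_pressure", "oil_quantity", "oil_temp", "roll",
          "total_air_temp", "wind_direction", "wind_speed", "vibration"]

def make_training_pairs(fdr_csv_path, horizon_seconds=5):
    """Build (input, target) pairs for predicting the vibration value
    `horizon_seconds` into the future from one FDR recording."""
    df = pd.read_csv(fdr_csv_path)[PARAMS]
    # Min-max normalize each parameter to [0, 1] (assumed preprocessing).
    df = (df - df.min()) / (df.max() - df.min())
    data = df.to_numpy()
    inputs = data[:-horizon_seconds]  # all 15 parameters at time t
    # Target: the vibration column, horizon_seconds rows (seconds) later.
    targets = data[horizon_seconds:, PARAMS.index("vibration")]
    return inputs, targets
```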
IV. METHODOLOGY

Three LSTM RNN architectures were designed to predict engine vibration 5 seconds, 10 seconds, and 20 seconds in the future. Each of the 15 selected FDR parameters is represented by a node in the inputs of the neural network, and an additional node is used for a bias. Each neural network in the three designs consists of LSTM cells that receive both an initial input and the output of the previous cell as inputs. Each cell has three gates to control the flow of information through the cell and, accordingly, the output of the cell. Each cell also has a cell-memory, which is the core of the LSTM RNN design. The cell-memory allows the flow of information from previous states into current predictions.

Fig. 1. LSTM cell design

The gates that control the flow are shown in Figure 1. They are: i) the input gate, which controls how much information will flow from the inputs of the cell, ii) the forget gate, which controls how much information will flow from the cell-memory, and iii) the output gate, which controls how much information will flow out of the cell. This design allows the network to learn not only about the target values, but also about how to tune its controls to reach the target values.

Fig. 2. Level 1 LSTM cell design

All the utilized architectures have a common LSTM cell design, shown in Figure 1. However, there are two variations of this common design used in the utilized architectures, shown in Figures 2 and 3, with the difference being the number of inputs to the cell. Cells that take initial inputs from more input nodes are denoted 'M1' cells. As the input nodes need to be reduced through the neural network, the design of the cell differs; such cells are denoted 'M2' cells. The equations used in the forward propagation through the neural network are:
i_t = Sigmoid(w_i · x_t + u_i · a_{t-1} + bias_i)    (1)

f_t = Sigmoid(w_f · x_t + u_f · a_{t-1} + bias_f)    (2)

o_t = Sigmoid(w_o · x_t + u_o · a_{t-1} + bias_o)    (3)

g_t = Sigmoid(w_g · x_t + u_g · a_{t-1} + bias_g)    (4)

c_t = f_t · c_{t-1} + i_t · g_t    (5)

a_t = o_t · Sigmoid(c_t)    (6)

where (see Figure 1):

i_t: input-gate output
f_t: forget-gate output
o_t: output-gate output
g_t: input's sigmoid
c_t: cell-memory output
w_i: weights associated with the input and the input gate
u_i: weights associated with the previous output and the input gate
w_f: weights associated with the input and the forget gate
u_f: weights associated with the previous output and the forget gate
w_o: weights associated with the input and the output gate
u_o: weights associated with the previous output and the output gate
w_g: weights associated with the cell input
u_g: weights associated with the previous output and the cell input
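To make Equations 1-6 concrete, the following is a minimal NumPy sketch of one cell's forward pass, written to follow the paper's equations exactly (including the use of the sigmoid, rather than the more common tanh, for the cell input g_t and for squashing the cell memory in Equation 6). The parameter-dictionary interface is an illustrative assumption:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_forward(x_t, a_prev, c_prev, p):
    """One forward step of the LSTM cell of Equations 1-6.

    x_t    : current input vector (e.g., 15 FDR parameters + 1 bias node)
    a_prev : previous cell output a_{t-1}
    c_prev : previous cell memory c_{t-1}
    p      : dict holding the matrices w_*, u_* and vectors bias_*
    """
    i_t = sigmoid(p["w_i"].dot(x_t) + p["u_i"].dot(a_prev) + p["bias_i"])  # (1) input gate
    f_t = sigmoid(p["w_f"].dot(x_t) + p["u_f"].dot(a_prev) + p["bias_f"])  # (2) forget gate
    o_t = sigmoid(p["w_o"].dot(x_t) + p["u_o"].dot(a_prev) + p["bias_o"])  # (3) output gate
    g_t = sigmoid(p["w_g"].dot(x_t) + p["u_g"].dot(a_prev) + p["bias_g"])  # (4) cell input
    c_t = f_t * c_prev + i_t * g_t                                          # (5) new cell memory
    a_t = o_t * sigmoid(c_t)                                                # (6) cell output
    return a_t, c_t
```

For a Level 1 ('M1') cell, each w and u matrix is 16×16 (see Table I below), so x_t, a_prev, and c_prev are all 16-dimensional vectors.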
V. LSTM RNN ARCHITECTURES

The three architectures are as follows, with the dimensions of the weights of these architectures shown in Table I and the total number of weights shown in Table II:

TABLE I
ARCHITECTURES' WEIGHTS-MATRICES DIMENSIONS

Architecture I
         w_i    u_i    w_f    u_f    w_o    u_o    w_g    u_g
Level 1  16×16  16×16  16×16  16×16  16×16  16×16  16×16  16×16
Level 2  16×1   1×1    16×1   1×1    16×1   1×1    16×1   1×1
Level 3  16×1

Architecture II
         w_i    u_i    w_f    u_f    w_o    u_o    w_g    u_g
Level 1  16×16  16×16  16×16  16×16  16×16  16×16  16×16  16×16
Level 2  16×1   1×1    16×1   1×1    16×1   1×1    16×1   1×1

Architecture III
         w_i    u_i    w_f    u_f    w_o    u_o    w_g    u_g
Level 1  16×16  16×16  16×16  16×16  16×16  16×16  16×16  16×16
Level 2  16×16  16×16  16×16  16×16  16×16  16×16  16×16  16×16
Level 3  16×1   1×1    16×1   1×1    16×1   1×1    16×1   1×1
Level 4  16×1

TABLE II
ARCHITECTURES' WEIGHTS MATRICES' TOTAL ELEMENTS

Arch. I   Arch. II   Arch. III
21,170    21,160     83,290

A. Architecture I

As shown in Figure 4, this architecture takes inputs from ten time series (the current time instant and the past nine). It feeds the second level of the neural network with its output. The output of the first level of the neural network is considered the first hidden layer. The second level of the neural network then reduces the number of nodes fed to it from 16 nodes (15 input nodes + bias) per cell to only one node per cell. The output of the second level of the neural network is considered the second hidden layer. Finally, the output of the second level of the neural network is only 10 nodes, a node from each cell. These nodes are fed to a final neuron in the third level to compute the output of the whole network.

B. Architecture II

As shown in Figure 5, this architecture is almost the same as the previous one, except that it does not have the third level. Instead, the output of the second level is averaged to compute the output of the whole network.

C. Architecture III

Figure 6 presents a deeper neural network architecture. In this design, the neural network takes inputs from twenty time series (the current time instant and the past nineteen). It feeds the second level of the neural network with its output. The second level does the same procedure as the first level, giving a chance for more abstract decision making. The output of the first level of the neural network is considered the first hidden layer, and the output of the second level is considered the second hidden layer. The third level of the neural network then reduces the number of nodes fed to it from 16 nodes (15 input nodes + bias) per cell to only one node per cell. The output of the third level of the neural network is considered the third hidden layer. Finally, the output of the third level of the neural network is twenty nodes, a node from each cell. These nodes are fed to a final neuron in the fourth level to compute the output of the whole network.
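The difference between the output stages of these designs can be made concrete with a short sketch; the weight and output values here are placeholders, not values from the study:

```python
import numpy as np

# Hypothetical Level 2 outputs: one scalar per cell (10 cells for
# Architectures I and II, which look 10 seconds into the past).
level2_out = np.random.rand(10)

# Architectures I and III: a final neuron dot multiplies the stacked
# outputs by a learned weight vector to produce one scalar prediction.
w_final = np.random.rand(10)
prediction_arch_i = level2_out.dot(w_final)

# Architecture II: no final neuron; the Level 2 outputs are averaged.
prediction_arch_ii = level2_out.mean()
```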
Fig. 4. Architecture I

Fig. 5. Architecture II

Fig. 6. Architecture III

D. Forward Propagation

The following is a general description of the forward propagation path. This example uses Architecture I, but similar steps are taken in the other architectures, with minor changes apparent in their diagrams.
With Figure 4 presenting an overview of the structure of the whole network, and considering Figure 2 as an overview of the structure of the cells in Level 1 and in Level 2, the input at each iteration consists of 10 seconds of time series data of the 15 input parameters and 1 bias (Input in Figure 2) in one vector (x_t in Figure 4), and the output of the previous cell (Previous Cell Output in Figure 2) in another vector (a_{t-1} in Figure 4). Each second of time series input is fed to the corresponding cell (i.e., the first second's 15 parameters and 1 bias are fed to the first cell, the second second's 15 parameters and 1 bias are fed to the second cell, ...) into the cell gate (shown in black), the input gate (shown in green), the forget gate (shown in blue), and the output gate (shown in red). If the gates (input gate, forget gate, and output gate) are seen as valves that control how much of the data flows through, the outputs of these gates (i_t, f_t, and o_t) can be considered as how far these valves are opened or closed.

First, at the cell gate, x_t is dot multiplied by its weights matrix w_g and a_{t-1} is dot multiplied by its weights matrix u_g. The output vectors are summed and an activation function is applied to the sum, as in Equation 4. The output is called g_t.

Second, at the input gate, x_t is dot multiplied by its weights matrix w_i and a_{t-1} is dot multiplied by its weights matrix u_i. The output vectors are summed and an activation function is applied to the sum, as in Equation 1. The output is called i_t.

Third, at the forget gate, x_t is dot multiplied by its weights matrix w_f and a_{t-1} is dot multiplied by its weights matrix u_f. The output vectors are summed and an activation function is applied to the sum, as in Equation 2. This gate controls how much of the cell memory (saved from the previous time step) should pass. The output is called f_t.

Fourth, at the output gate, x_t is dot multiplied by its weights matrix w_o and a_{t-1} is dot multiplied by its weights matrix u_o. The output vectors are summed and an activation function is applied to the sum, as in Equation 3. The output is called o_t.

Fifth, the contributions of the cell input g_t and the cell memory c_{t-1} are decided in Equation 5 by dot multiplying them by i_t and f_t, respectively. The output of this step is the new cell memory c_t.

Sixth, the cell output is also regulated by the output gate (valve). This is done by applying the sigmoid function to the cell memory c_t and dot multiplying it by o_t, as shown in Equation 6. The output of this step is the final output of the cell at the current time step, a_t. a_t is fed to the next cell in the same level and also fed to the cell in the level above as an input.

The same procedure is applied at Level 2, but with different weight vectors and different dimensions. Weights at Level 2 have smaller dimensions to reduce their input dimensions from vectors with 16 dimensions to vectors with one dimension. The output from Level 2 is a one-dimensional vector from each of the 10 cells in Level 2. These vectors are fed as one 10-dimensional vector to a simple neuron, shown in Figure 4 at Level 3, to be dot multiplied by a weight vector that reduces the vector to a single scalar value: the final output of the network.
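Putting these steps together, a sketch of the complete Architecture I forward pass might look as follows, reusing lstm_cell_forward from the earlier sketch. Per-cell (unshared) weights and zero-initialized states are assumptions made for illustration:

```python
import numpy as np

def architecture_i_forward(inputs, level1_params, level2_params, w3):
    """Forward pass of Architecture I.

    inputs        : list of 10 vectors (one per second), each 16-dimensional
                    (15 FDR parameters + 1 bias node)
    level1_params : 10 parameter dicts of 16x16 matrices ('M1' cells)
    level2_params : 10 parameter dicts whose w's map 16 dimensions to 1
                    and whose u's are scalars ('M2' cells)
    w3            : 10-element weight vector of the final Level 3 neuron
    """
    a1, c1 = np.zeros(16), np.zeros(16)  # Level 1 state, assumed zero-initialized
    a2, c2 = np.zeros(1), np.zeros(1)    # Level 2 state
    level2_outputs = []
    for t in range(len(inputs)):
        # Level 1 cell t: this second's data plus the previous cell's output.
        a1, c1 = lstm_cell_forward(inputs[t], a1, c1, level1_params[t])
        # Level 2 cell t: reduces the 16-dimensional Level 1 output to one node.
        a2, c2 = lstm_cell_forward(a1, a2, c2, level2_params[t])
        level2_outputs.append(a2)
    # Level 3: the 10 one-dimensional outputs form a single vector that a
    # simple neuron dot multiplies down to one scalar: the network output.
    return np.concatenate(level2_outputs).dot(w3)
```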
TABLE III
TRAINING RESULTS

TABLE IV
RUN TIME (HOURS)
TABLE V
TESTING RESULTS

                    Error at     Error at     Error at
                    5 seconds    10 seconds   20 seconds
Architecture I      0.033048     0.055124     0.101991
Architecture II     0.097588     0.096054     0.112320
Architecture III    0.048056     0.070360     0.202609

TABLE VI
NEW TESTING RESULTS
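The values in Table V are mean absolute errors on the normalized vibration signal; a minimal sketch of the computation, assuming predictions and actual values normalized to [0, 1] so that 0.033048 reads as 3.3%:

```python
import numpy as np

def mean_absolute_error(predicted, actual):
    """Mean absolute error between predicted and actual vibration series."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return np.abs(predicted - actual).mean()
```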
(a) ART I Results Plot @ 05 SEC
(a) ART II Results Plot @ 05 SEC
(b) ART II Results Plot @ 10 SEC
(c) ART II Results Plot @ 20 SEC
Fig. 10. Plotted results for Architecture II for the three scenarios.

(a) ART III Results Plot @ 05 SEC
(b) ART III Results Plot @ 10 SEC
(c) ART III Results Plot @ 20 SEC
Fig. 11. Plotted results for Architecture III for the three scenarios.
3) Results of Architecture III: Although it was the most computationally expensive and had a chance for deeper learning, its results were not as good as expected, as shown in Figure 11. The results of this architecture in Table V show that its prediction accuracy was lower than that of the simpler Architecture I. As this ran counter to the expectations for deeper learning, it opens the door for investigating deeper learning for this problem; this LSTM RNN was one layer deeper and also had 20 seconds of memory from the past, which was not available to the other two LSTM RNNs used. It is also observed that the overall error in Table V for the prediction at 20 future seconds came out relatively high. Looking at Figure 11c between times 10,000-15,000, 20,000-25,000, and 35,000-40,000, it can be seen that the calculated curve rose well above the actual vibration curve. This behaviour is unusual, as the calculated vibration rarely exceeds the actual vibration in the curves plotted for all the architectures in all scenarios, and when it does, it is by a relatively small value. This network could potentially gain further improvement if trained for more epochs than the other, simpler architectures.

VIII. CONCLUSIONS AND FUTURE WORK

This paper presents early work in utilizing long short term memory (LSTM) recurrent neural networks (RNNs) of different types to predict engine vibrations and other critical aviation parameters. The results obtained from this study are very encouraging, given the accuracy of the predictions rather far in the future – 3.3% error for 5 second predictions, 5.51% error for 10 second predictions, and 10.19% error for 20 second predictions. This work opens up many avenues for future work, such as fine tuning the neural network designs and their hyperparameters, changing the design of the layers, and/or combining different types of RNNs to further refine the results. Selecting flight parameters also had a great influence on the results. This work could be extended by further investigating the flight parameters and their contributions to the prediction process. This could be achieved either by statistical means or by going deeper into the analytical and empirical theories and equations to provide a deeper understanding of the relations between parameters, and thus more precise future predictions.

Accelerator cards such as GPUs could also be used in this research to further improve training times and to allow the neural networks to be trained longer (which could potentially improve the performance of Architecture III). This can save time if well implemented: the weights vectors and matrices for all gates (inputs, input gates, forget gates, and output gates) can be grouped together in one matrix/vector saved in one global memory variable, to be transferred as one group to the GPU. This would reduce the penalty of data transfer between the CPU and GPU. Similar measures can be followed when processing the data for several files instead of one data file (FDR reading) at a time. Subsequently, processing the data for several future vibration predictions (e.g., at 5 sec, 10 sec, 20 sec, ...) could be performed together at the same time, further reducing data transfer between the CPU and GPU.
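As an illustrative sketch of this grouping idea (not an implementation from this study), the four gates' weight matrices can be stacked into single arrays so that one transfer and one matrix product serve all gates at once:

```python
import numpy as np

def grouped_gate_preactivations(x_t, a_prev, p):
    """Compute all four gate pre-activations (Equations 1-4) in one product.
    p holds the per-gate matrices and biases as in the earlier sketches."""
    # Stack the gate matrices into one block each for the inputs and the
    # recurrent connections; each block can live in a single global (GPU)
    # memory variable and be transferred as one group.
    W = np.concatenate([p["w_i"], p["w_f"], p["w_o"], p["w_g"]], axis=0)
    U = np.concatenate([p["u_i"], p["u_f"], p["u_o"], p["u_g"]], axis=0)
    b = np.concatenate([p["bias_i"], p["bias_f"], p["bias_o"], p["bias_g"]])
    z = W.dot(x_t) + U.dot(a_prev) + b
    # Split back into the input-, forget-, output-, and cell-input-gate
    # pre-activations, ready for the sigmoid of Equations 1-4.
    return np.split(z, 4)
```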
Overall, this work provides a promising initial step toward engine vibration prediction that could be integrated into future warning systems, so that pilots can act to prevent excessive vibration events before unfavorable situations happen during flight.

REFERENCES

[1] A. V. Srinivasan, "Flutter and resonant vibration characteristics of engine blades," 1997. [Online]. Available: http://www.energy.kth.se/compedu/webcompedu/WebHelp
[2] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, "Long-term recurrent convolutional networks for visual recognition and description," June 2015.
[3] L. Chao et al., "Audio visual emotion recognition with temporal alignment and perception attention," Mar. 2016.
[4] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," Dec. 2014.
[5] A. Nairac et al., "A system for the analysis of jet engine vibration data," 1999.
[6] D. A. Clifton et al., "A framework for novelty detection in jet engine vibration data," 2007.
[7] S. Hochreiter and J. Schmidhuber, "Long short-term memory."
[8] D. Eck and J. Schmidhuber, "A first look at music composition using LSTM recurrent neural networks."
[9] Theano Development Team, "Theano: A Python framework for fast computation of mathematical expressions," arXiv e-prints, vol. abs/1605.02688, May 2016. [Online]. Available: http://arxiv.org/abs/1605.02688