10.3390@aerospace7090132
Article
Deep Neural Network Feature Selection Approaches
for Data-Driven Prognostic Model of Aircraft Engines
Phattara Khumprom * , David Grewell and Nita Yodo
Industrial and Manufacturing Engineering, North Dakota State University, Fargo, ND 58102, USA;
[email protected] (D.G.); [email protected] (N.Y.)
* Correspondence: [email protected]; Tel.: +1-701-231-9818
Received: 29 July 2020; Accepted: 3 September 2020; Published: 4 September 2020
Abstract: Predicting Remaining Useful Life (RUL) of systems has played an important role in various
fields of reliability engineering analysis, including aircraft engines. RUL prediction is a critically
important part of Prognostics and Health Management (PHM), which is the reliability science
that is aimed at increasing the reliability of the system and, in turn, reducing the maintenance cost.
The majority of the PHM models proposed during the past few years have shown a significant
increase in the amount of data-driven deployments. While more complex data-driven models are
often associated with higher accuracy, there is a corresponding need to reduce model complexity.
One possible way to reduce the complexity of the model is to apply feature (attribute or variable)
selection and dimensionality reduction methods prior to the model training process. In this work,
the effectiveness of multiple filter and wrapper feature selection methods (correlation analysis,
relief forward/backward selection, and others), along with Principal Component Analysis (PCA)
as a dimensionality reduction method, was investigated. A basic deep learning algorithm, the
Feedforward Artificial Neural Network (FFNN), was used as a benchmark modeling algorithm.
All of these approaches can also be applied to the prognostics of aircraft gas turbine engines. In this
paper, aircraft gas turbine engine data from the NASA Ames prognostics data repository were used
to test the effectiveness of the filter and wrapper feature selection methods not only for the vanilla
FFNN model but also for the Deep Neural Network (DNN) model. The findings show that applying
feature selection methods helps to improve overall model accuracy and significantly reduces the
complexity of the models.
Keywords: data-driven; machine learning; deep learning; DNN; feature selection; Prognostic and
Health Management; aircraft gas turbine engines; C-MAPSS
1. Introduction
Modern computational capability has become more powerful over the past decades. This has
induced a new trend of employing various data-driven models in many fields. Despite the fact that
modern computers can complete complex tasks, researchers are still searching for solutions to reduce
the computational time and complexity of the data-driven models to increase the likelihood that the
models can be employed in real-time operation.
The same challenge also applies to a certain type of aerospace data problem, which in this case is
the estimation of the Remaining Useful Life (RUL) of aircraft gas turbine engines. The main purpose
of this work is to prove the theory that a particular group or set of prognostics features (attributes
or variables) from the aircraft gas turbine engines data can be selected prior to the training phase of
Artificial Neural Network (ANN) modeling in order to reduce the complexity of the model. The same
assumption is also believed to be applicable to the Deep Neural Network (DNN) model. It might
also be applied to other complex deep learning models, e.g., Convolutional Neural Network (CNN),
Recurrent Neural Network (RNN), and their variations.
In order to validate the aforementioned theory, the prognostics of aircraft gas turbine engines
dataset or Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset derived from
NASA Ames Prognostics Center of Excellence (PCoE) [1] was used to develop preliminary vanilla
ANN models with selected features from different feature selection methods. Furthermore, to prove
that similar assumptions can also be deployed to other deep learning algorithms, the Deep Neural
Network or DNN models have also been developed based on some selected features derived from
the ANN validation models. The final goal was to determine which feature selection method was the
most suitable for deep learning models in general to predict the prognostics state or Remaining Useful
Life for aircraft gas turbine engines data. End results from the various feature selection methods were
compared against those obtained using the original features. The ANN and DNN models with selected
features were studied and compared based on their performance.
Based on the aforementioned goal, the main contributions of this work are summarized as follows:
1. Extract meaningful features for neural network-based and deep learning data-driven models
from the C-MAPSS dataset.
2. Suggest a novel neural network-based feature selection method for aircraft gas turbine engines
RUL prediction.
3. Develop deep neural network models from selected features.
4. Show how the developed methodology can improve the RUL prediction model by comparing its
performance/error and complexity to the model derived from original features.
The hidden neurons in ANN measure the distance between the input vector x and the centroid c
from the data cluster. The measured values are the output of the ANN. In Equation (1), the σ parameter
represents the radius of the hypersphere determined by iteratively selecting the optimum width.
The weights of the neural network are updated at the neural nodes using error back-propagation,
which is a stochastic gradient descent technique. Then the weights of each individual neural node
are fed forward to the next layer. This technique is often referred to as Feedforward Neural Network
(FFNN). This is how ANN “learns” the data pattern through its weights [8].
Aerospace 2020, 7, 132 3 of 32
In 2006, Geoffrey Hinton suggested the early design of deep learning algorithms based on the
aforementioned FFNN [9]. The vanilla FFNN generally consists of only the hidden layer with a sigmoid
activation function described in Equation (1). Multiple configurations of deep learning algorithms,
such as Deep Neural Network (DNN), Convolutional Neural Network (CNN), Recurrent Neural
Network (RNN), etc., have been widely used as data-driven modeling algorithms. Most of them have
outperformed other well-known data-driven algorithms proposed in the past.
One aspect to keep in mind before employing any deep learning algorithm is that each deep
learning algorithm might be suitable for different tasks. This heavily depends upon the different data
characteristics and type of target models. The deep learning algorithms also include different types of
activation functions and optimizers. These are the key differences between deep learning algorithms
and the vanilla ANN or FFNN that were proposed in the early years [10].
In this work, we only employed a DNN with auto-encoder as the modeling algorithm. All encoding
and decoding processes happen inside the hidden layers of the network through a parameterized
function [9,10]. The construction of the DNN with auto-encoder is briefly illustrated in Figure 1. Unlike
the ANN, which uses the sigmoid function as its activation function, our DNN layers used Rectified Linear
Units (ReLU) as the activation function. The ReLU function can be simply expressed as:
$$f(x) = x^{+} = \max(0, x) \qquad (2)$$
where $x$ is the input to a neuron and $+$ represents the positive part of its argument. The ReLU
function has been demonstrated to achieve better training on general regression tasks for deeper
networks compared to other activation functions such as the logistic sigmoid and the hyperbolic
tangent (tanh) [10]. Therefore, the ReLU function was chosen for modeling the Remaining Useful
Life (RUL) prediction for our PHM data, while the ANN with the sigmoid function was used as a
validation algorithm for the feature selection methods.
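The contrast between the two activation functions can be illustrated with a minimal NumPy sketch. The input values are hypothetical and chosen only to show that ReLU passes positive inputs unchanged while the sigmoid squashes every input into the interval (0, 1); this is not the authors' code.

```python
import numpy as np

def relu(x):
    """Equation (2): f(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Logistic sigmoid, the activation used by the vanilla ANN."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical pre-activation values for one layer.
z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
relu_out = relu(z)
sigmoid_out = sigmoid(z)
```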
data-driven models have been proven to work relatively well with these types of PHM tasks [11].
However, one of the challenges is to reduce the complexity of the neural network prior to the training
stage. This might be done by reducing the input training data. One way to reduce the complexity
of the model is to select only meaningful features or attributes from
the raw dataset before model training.
The rest of the paper is organized as follows: Section 2 covers the methodology outlining all
methods and approaches used for the defined problem. Section 3 describes the experimental setup
with details of the data description and a comparison of the final results from all models. Section 4
discusses and compares results from all models. Lastly, a final conclusion and possible future work
highlights are discussed in Section 5.
2. Methodology
In this section, all essential details of the auto-encoder deep neural network used in our experiment
will be discussed. The problem definition and all notations will also be clearly defined, as well as the
illustration of how our proposed deep neural network architecture can be applied for RUL aircraft gas
turbine engines prediction with feature selection and neural network modeling framework.
2.1. Problem Definition
Starting with the raw data, which is denoted as $D_S = \{(x_S^i, y_S^i)\}_{i=1}^{N_s}$, the data contains $N_s$ training
samples, where $x_S^i \in \mathcal{X}_S$ is a feature with a length of $T_i$ and $q_S$ is the number of features, in which
$x_S^i = \{x_t^i\}_{t=1}^{T_i} \in \mathbb{R}^{q_S \times T_i}$. In addition, $y_S^i \in \mathcal{Y}_S$ is denoted as the Remaining Useful Life (RUL), also with
the length $T_i$ (the feature space and the RUL space are within the same length), with
$y_S^i = \{y_t^i\}_{t=1}^{T_i} \in \mathbb{R}_{\geq 0}^{T_i}$, where $t \in \{1, 2, \ldots, T_i\}$, and $x_t^i \in \mathbb{R}^{q_S}$ and $y_t^i \in \mathbb{R}_{\geq 0}$ represent the $t$-th
measurement of all variables and the RUL label, respectively. Similarly, the estimated target domain is
$D_T = \{x_T^i\}_{i=1}^{N_T}$, where $x_T^i \in \mathcal{X}_T$ and $\mathcal{X}_T \in \mathbb{R}^{q_T \times T_i}$, with no labels. The source and target domains,
$D_S$ and $D_T$, are assumed to possibly have different probability distributions, $P(X_S) \neq P(X_T)$. The
primary goal is to define a function $g$ that can derive or learn from the source data and approximate
the corresponding RUL for the target domain at testing time, such that $y_T^i \approx g(x_T^i)$, with the
preliminary assumption that the mapping between input ($x$) and output ($y$) is somehow similar
across all domains.
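The notation above can be made concrete with a small sketch of how the source domain $D_S$ might be held in memory. The feature count, sequence lengths, random feature values, and the countdown RUL labeling (reaching 0 at the last cycle) are all assumptions made for illustration; only the shapes follow the problem definition.

```python
import numpy as np

rng = np.random.default_rng(0)
q_s = 4                      # number of features q_S (hypothetical value)
lengths = [5, 8, 6]          # T_i for each of N_s = 3 engines (hypothetical)

source = []                  # D_S = {(x_S^i, y_S^i)}, i = 1..N_s
for T_i in lengths:
    # x_S^i in R^{q_S x T_i}: one column per measurement time t.
    x_i = rng.normal(size=(q_s, T_i))
    # Assumed labeling: RUL counts down to 0 at the final cycle,
    # so y_S^i is non-negative and has the same length T_i as x_S^i.
    y_i = np.arange(T_i - 1, -1, -1, dtype=float)
    source.append((x_i, y_i))
```

Each pair keeps the feature matrix and the RUL vector aligned along the time axis, which is the "same length" condition stated in the definition.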
2.2. Deep Neural Network Architecture
While there are existing deep learning algorithms that have been proposed to accommodate for
PHM of aircraft gas turbine engines data modeling [12–20], this work focuses on using a deep neural
network with auto-encoder with a specific use case and specifications that fit into the problem definition
previously identified.
The DNN used in this work focused on the feedforward architecture provided by the H2O package
through its Python API [21]. H2O is based on multi-layer feedforward neural networks for predictive
modeling [22]. The following are some of the H2O DNN features used for this experiment.
• Supervised training protocol for regression tasks
• A multi-threaded and distributed parallel computation that can be run on a single or a
multi-node cluster
• Automatic, per-neuron, adaptive learning rate for fast convergence
• Optional specification of the learning rate, annealing, and momentum options
• Regularization options to prevent model overfitting
• Elegant and intuitive web interface (Flow)
• Grid search for hyperparameter optimization and model selection
• Automatic early stopping based on the convergence of a user-specified metric to a
user-specified tolerance
• Model check-pointing for reduced run times and model tuning
• Automatic pre- and post-processing for categorical and numerical data
• Additional expert parameters for model tuning
• Deep auto-encoders for unsupervised feature learning.
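As a rough illustration of how several of the listed features map onto estimator settings, the sketch below collects keyword arguments intended for `H2ODeepLearningEstimator`. It is not the authors' configuration: the layer sizes, epoch count, regularization strength, and stopping values are hypothetical, and applying the 0.4 dropout rate to every hidden layer is an assumption. Only the rectifier activation, the Huber loss, and the 0.4 dropout rate come from this section.

```python
# Keyword arguments intended for h2o.estimators.H2ODeepLearningEstimator.
# Built as a plain dict so the settings can be inspected without a running cluster.
dnn_params = {
    "hidden": [64, 32],                    # hypothetical layer sizes
    "activation": "RectifierWithDropout",  # rectifier variant that permits hidden dropout
    "hidden_dropout_ratios": [0.4, 0.4],   # 0.4 from the text; per-layer placement assumed
    "loss": "Huber",                       # loss function named in this section
    "adaptive_rate": True,                 # automatic, per-neuron adaptive learning rate
    "l2": 1e-5,                            # regularization strength (hypothetical)
    "epochs": 50,                          # hypothetical
    "stopping_metric": "RMSE",             # early stopping on a user-specified metric
    "stopping_rounds": 5,                  # hypothetical
    "stopping_tolerance": 1e-3,            # hypothetical
}

# One dropout ratio is required per hidden layer.
assert len(dnn_params["hidden"]) == len(dnn_params["hidden_dropout_ratios"])
```

With H2O installed and a cluster started through `h2o.init()`, these settings could be passed as `H2ODeepLearningEstimator(**dnn_params)` and fitted with its `train` method on an H2O frame.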
In the proposed DNN model, deep neural network layers are used to extract the temporal features
from the time length, $T_i$. The hidden state units of the neural network consist of the hidden state vector
$h_{t-1} \in \mathbb{R}^h$, the input vector (as defined in the problem definition) $x_t^i \in \mathbb{R}^q$, and the activation function $f$.
All operations in the DNN layers can be written as:
$$i_t = f\left(W_t x_t^i + W_t' h_{t-1} + b_i\right) \qquad (3)$$
$$o_t = f\left(W_o x_t^i + W_o' h_{t-1} + b_o\right) \qquad (4)$$
where $i$ and $o$ represent the input and output states, $W$ and $W'$ are the matrices of updated weights and
weights from the hidden state, respectively, and $b$ is the bias vector.
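A minimal NumPy sketch of Equations (3) and (4) follows, taking $f$ to be the rectifier as this section states. The dimensions and the randomly initialized weights are hypothetical stand-ins for learned parameters, and the variable names are chosen here for illustration only.

```python
import numpy as np

def relu(a):
    """Rectifier activation f, applied element-wise."""
    return np.maximum(0.0, a)

rng = np.random.default_rng(1)
q, h = 3, 4                          # input and hidden sizes (hypothetical)

x_t = rng.normal(size=q)             # input vector x_t^i in R^q
h_prev = np.zeros(h)                 # hidden state h_{t-1} in R^h

# Randomly initialized stand-ins for the learned weights and biases.
W_t, W_t_hidden, b_i = rng.normal(size=(h, q)), rng.normal(size=(h, h)), np.zeros(h)
W_o, W_o_hidden, b_o = rng.normal(size=(h, q)), rng.normal(size=(h, h)), np.zeros(h)

i_t = relu(W_t @ x_t + W_t_hidden @ h_prev + b_i)   # Equation (3): input state
o_t = relu(W_o @ x_t + W_o_hidden @ h_prev + b_o)   # Equation (4): output state
```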
Unlike in the vanilla ANN, in the proposed DNN the activation function $f$ is the Rectifier Linear
function [23] instead of the sigmoid function. The DNN activation function can be represented as:
$$f(\alpha) = \max(0, \alpha) \in \mathbb{R}^{+} \qquad (5)$$
where, in this case, $\alpha$ represents the state functions (Equations (3) and (4)) that fire into the input
neurons.
Another important aspect of the DNN model architecture is the loss function, denoted by $\mathcal{L}$. For this
work, the Huber loss function [24] was selected because it has proven to work best in terms of accurately
projecting the RUL, $y_S^i \in \mathcal{Y}_S$, of the source domain, $D_S$. The Huber loss function can be described as:
$$\mathcal{L}_y^i(\theta_f, \theta_y) = \begin{cases} \frac{1}{2}\left\|\hat{y}_t^i - y_t^i\right\|_2^2, & \text{for } \left\|\hat{y}_t^i - y_t^i\right\|_1 \leq 1 \\ \left\|\hat{y}_t^i - y_t^i\right\|_1 - \frac{1}{2}, & \text{otherwise} \end{cases} \qquad (6)$$
where $\theta_f$ is the space representation of the target input that is mapped through the feature extraction
layers into a new space. In addition, $\theta_y$ is the domain regression space generated by the logistic
regressor [24], and $\hat{y}_t^i$ is the RUL prediction from the source domain.
The objective in training the DNN is to minimize the prediction loss, $\mathcal{L}_y^i$, which can be described by:
$$\min_{\theta_f, \theta_y} \left[ \frac{1}{N_s} \sum_{i=1}^{N_s} \mathcal{L}_y^i(\theta_f, \theta_y) \right] \qquad (7)$$
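The loss in Equation (6) and the averaged objective in Equation (7) can be sketched in a few lines of NumPy. The example treats each entry as one scalar sample, which is a simplification of the per-sequence formulation, and the label and prediction values are hypothetical.

```python
import numpy as np

def huber_loss(y_pred, y_true):
    """Equation (6) with threshold 1, evaluated element-wise."""
    err = np.abs(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float))
    # Quadratic for small errors, linear (absolute error minus 1/2) otherwise.
    return np.where(err <= 1.0, 0.5 * err**2, err - 0.5)

def mean_objective(per_sample_losses):
    """Equation (7): average loss over the N_s training samples."""
    return float(np.mean(per_sample_losses))

y_true = np.array([10.0, 9.0, 8.0])     # hypothetical RUL labels
y_pred = np.array([10.5, 9.0, 11.0])    # hypothetical predictions
losses = huber_loss(y_pred, y_true)     # errors 0.5, 0, 3 -> 0.125, 0.0, 2.5
objective = mean_objective(losses)
```

The large error of 3 cycles contributes 2.5 rather than the 4.5 that a squared-error loss would give, which shows how the Huber loss limits the influence of outlying predictions.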
The DNN model used in this work is depicted in Figure 2. This DNN model architecture is
trained to predict, for each input $x^i$, the real value $y^i$ and its domain label $d^i$ for the source domain, and
only the domain label for the target domain. The first part of the DNN architecture is the feature extractor,
$g_f$, that decomposes the inputs and maps them into the hidden state, $h_{t-1} \in \mathbb{R}^h$. The model then
embeds the output space as a feature space $f$ of the deeper layers and repeats this process as needed.
As previously detailed, the vector space parameter that results from the feature mapping is $\theta_f$, i.e.,
$f = g_f(\theta_f)$. This feature space $f$ is first mapped to a real-valued variable $y_t^i$ by the function
$g_y(f; \theta_y)$, which is composed of fully-connected neural network layers with parameter $\theta_y$. The
dropout layer with a rate of 0.4 was applied to avoid the overfitting issue [25].
Another goal is to find the feature space that is domain invariant, i.e., finding a feature space $f$
in which $P(X_S)$ and $P(X_T)$ are similar. This is one of the challenges in training, which can be
improved by applying the “feature selection” prior to training (detailed in the further section).
Another objective is to minimize the weights of the feature extractor in the direction of the regression
loss, $\mathcal{L}_y^i$. In more detail, the model loss function can be used to derive the final learning function, $g$,
through parameter $\theta$, which means the RUL prediction result (described in Equation (6)) is
$\hat{y}_t^i = g_y(g_f(\theta_f); \theta_y)$.
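The composition of the feature extractor and the regressor can be sketched as two small functions. This is a toy illustration under stated assumptions, not the H2O implementation: a single ReLU layer stands in for $g_f$, a linear map stands in for $g_y$, the sizes and weights are hypothetical, and the inverted-dropout rescaling is one common convention that may differ from what the library does internally.

```python
import numpy as np

rng = np.random.default_rng(2)

def relu(a):
    return np.maximum(0.0, a)

def g_f(x, theta_f):
    """Feature extractor: one ReLU layer mapping the input to the feature space f."""
    W, b = theta_f
    return relu(W @ x + b)

def g_y(f, theta_y, drop_rate=0.4, train=False):
    """Regressor: optional dropout on f, then a linear map to a scalar RUL."""
    w, b = theta_y
    if train:
        keep = rng.random(f.shape) >= drop_rate   # drop each unit with probability drop_rate
        f = f * keep / (1.0 - drop_rate)          # inverted-dropout rescaling (assumed)
    return float(w @ f + b)

q, h = 3, 5                                        # hypothetical sizes
theta_f = (rng.normal(size=(h, q)), np.zeros(h))   # stand-in parameters of g_f
theta_y = (rng.normal(size=h), 0.0)                # stand-in parameters of g_y

x = rng.normal(size=q)
features = g_f(x, theta_f)
rul_hat = g_y(features, theta_y)                   # prediction with dropout disabled
```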
The way the DNN algorithm updates its learning weights, $\theta$, is through the gradient descent
update [26] in the form of:
$$\theta_f \leftarrow \theta_f - \lambda \left( \frac{\partial \mathcal{L}_y^i}{\partial \theta_f} \right) \qquad (8)$$
$$\theta_y \leftarrow \theta_y - \lambda \left( \frac{\partial \mathcal{L}_y^i}{\partial \theta_y} \right) \qquad (9)$$
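The two update rules can be traced on a toy problem with scalar parameters. In this sketch the feature extractor is reduced to $f = \theta_f x$ and the regressor to $\hat{y} = \theta_y f$, both hypothetical simplifications; the data point, initial weights, and learning rate are also assumed values. Plain gradient descent is used, following Equations (8) and (9) literally.

```python
def huber(err):
    """Huber loss with threshold 1 for a scalar error."""
    a = abs(err)
    return 0.5 * a * a if a <= 1.0 else a - 0.5

def huber_grad(err):
    """Derivative of the Huber loss with respect to the error: err clipped to [-1, 1]."""
    return max(-1.0, min(1.0, err))

x, y = 2.0, 3.0                 # one hypothetical training pair
theta_f, theta_y = 0.5, 0.5     # initial weights (hypothetical)
lam = 0.1                       # learning rate lambda (hypothetical)

history = []
for _ in range(50):
    f = theta_f * x             # toy feature extractor
    y_hat = theta_y * f         # toy regressor
    err = y_hat - y
    history.append(huber(err))
    g = huber_grad(err)         # dL/dy_hat
    grad_f = g * theta_y * x    # dL/dtheta_f by the chain rule
    grad_y = g * f              # dL/dtheta_y
    theta_f -= lam * grad_f     # Equation (8)
    theta_y -= lam * grad_y     # Equation (9)
```

The recorded losses start in the linear branch of the Huber loss, where the gradient magnitude is capped at 1, and then move into the quadratic branch as the prediction approaches the target.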
described as;
1 𝑖 2
‖𝓎̂𝑡 − 𝓎𝑖𝑡 ‖2 , 𝑓𝑜𝑟 ‖𝓎̂𝑖𝑡 − 𝓎𝑖𝑡 ‖1 ≤ 1
ℒ𝑦𝑖 (𝜃𝑓 , 𝜃𝑦 ) = { 2
1
‖𝓎̂𝑖𝑡 − 𝓎𝑖𝑡 ‖1 − , 𝑂𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
Aerospace 2020,7,7,132
Aerospace2020, x FOR PEER REVIEW 2 77ofof32
33
where, 𝜃𝑓 is the space representation of the target input that mapped through th
layers into a new space. In addition, 𝜃𝑦 is the domain regression space ge
repressor [24], and, 𝓎̂𝑖𝑡 is RUL prediction from the source domain.
The objective in training DNN is to minimize the prediction loss, ℒ𝑦𝑖 , which c
1
𝑁𝑠
min [ ∑𝑖=1 ℒ𝑦𝑖 (𝜃𝑓 , 𝜃𝑦 )]
𝜃𝑓 ,𝜃𝑦 𝑁𝑠
The DNN model used in this work is depicted in Figure 2. This DNN m
trained to predict for each input, 𝑥 𝑖 , real value 𝑦 𝑖 and its domain label 𝑑 𝑖 for the
only domain label for the target domain. The first part of the DNN architecture is the feature extractor, $g_f$, which decomposes the inputs and maps them into the hidden state, $h_{t-1} \in \mathbb{R}$ […] embeds the output space as a feature space $f$ of the deeper layers and repeats this […]. As previously detailed, this vector space parameter that is the result of feature […] $f = g_f(\theta_f)$. This feature space $f$ is first mapped to a real-valued variable $y_i^t$ […] $g_y(f; \theta_y)$, which is composed of fully-connected neural network layers with […]. A dropout layer with a rate of 0.4 was applied to avoid the overfitting issue [25].
Figure 2. Proposed Deep Neural Networks Model Architecture.
Another goal is to find a feature space that is domain invariant, i.e., a feature space $f$ in which $P(X_S)$ and $P(X_T)$ are similar. This is one of the challenges in training, which can be improved by applying "feature selection" prior to training (detailed in the further section). Another objective is to minimize the weights of the feature extractor in the direction of the regression loss, $\mathcal{L}_y^i$. In more detail, the model loss function can be used to derive the final learning function, $g$, through the parameter $\theta$, which gives the RUL prediction result (described in Equation (6)), $\hat{y}_i = g_y(g_f(\theta_f); \theta_y)$. The DNN algorithm updates its learning weights, $\theta$, through the gradient descent update [26] in the form of:
$$\theta_f \leftarrow \theta_f - \lambda \left( \frac{\partial \mathcal{L}_y^i}{\partial \theta_f} \right) \tag{8}$$
$$\theta_y \leftarrow \theta_y - \lambda \left( \frac{\partial \mathcal{L}_y^i}{\partial \theta_y} \right) \tag{9}$$
Usually, the Stochastic Continuous Greedy (SCG) estimate is used to update Equations (8) and (9). The learning rate, $\lambda$, represents the learning steps taken by the SCG during the training process.
2.3. Feature Selection Methods for Neural Network Architectures
In prognostic applications, feature extraction occurs after receiving raw data from sensors. Feature extraction usually involves signal processing and analysis in the time or frequency domain. The purpose is to transform raw signals into more informative data that represents the system well [27]. In other words, feature extraction is the process of translating sensor signals into data. In contrast, the purpose of feature selection is to select a particular set of features in the dataset that is believed to be more relevant for modeling. Feature selection always executes after feature extraction and occurs between pre-processing and the training or pre-training phase of the data modeling framework.
Three common feature selection strategies have been discussed in the literature: (1) the filter approach, (2) the wrapper approach, and (3) the embedded approach. This paper discusses only the filter and wrapper approaches. Figure 3 shows the process flow and the different roles of feature extraction and feature selection in the data modeling process.
Figure 3. Role of feature extraction and feature selection in the prognostics modeling process.
Filter methods employ statistical, correlation, and information theory to identify the importance of the features. The performance measurement metrics of filter methods usually use local criteria that do not directly relate to model performance [28].
There are currently multiple baseline filter methods popularly employed for feature selection. However, the results from the experiments showed that only the correlation-based methods were suitable for the case study data. This is because correlation-based methods evaluate each feature by its direct correlation to the target variable. In other words, the correlation-based filter methods make selections based on the modeling objectives, which suggests that these methods are better suited to data with a target variable. The correlation-based filter method included in this work is Pearson correlation [29,30]. Additionally, the results from other statistical-based methods, namely the Relief algorithm, Deviation selection, SVM selection, and PCA selection [31], were also included to provide a complete comparison.
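As an illustration of the correlation-based filter idea, the following is a minimal sketch in plain Python, not the tooling used in this study. The function names, the toy sensor columns, and the threshold of 0.5 are all assumptions made for the example; the threshold actually applied in the experiments is reported later in the text.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient; returns None if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant column
    return sxy / math.sqrt(sxx * syy)

def pearson_filter(columns, target, threshold):
    """Keep the features whose correlation with the target reaches the threshold."""
    kept = {}
    for name, values in columns.items():
        r = pearson(values, target)
        if r is not None and r >= threshold:
            kept[name] = r
    return kept

# Hypothetical toy data: one sensor that trends with RUL, one that is pure
# noise, and one that never changes.
random.seed(0)
rul = [150.0 - t for t in range(150)]
columns = {
    "s_trend": [0.5 * r + random.gauss(0, 1) for r in rul],
    "s_noise": [random.gauss(0, 1) for _ in rul],
    "s_const": [3.0] * len(rul),
}
kept = pearson_filter(columns, rul, threshold=0.5)
```

Because the filter scores each feature against the target independently, no model has to be trained, which is why filter methods are cheap compared with wrapper methods.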
Wrapper methods use a data-driven algorithm that performs the modeling for the dataset to
select the set of features that yield the highest modeling performance [32]. Wrapper methods are
typically more computationally intensive compared to filter methods. There are four main baseline
wrapper methods [32]: (1) forward selection, (2) backward elimination, (3) brute force selection, and
(4) evolutionary selection.
Forward selection and backward elimination are search algorithms with different starting and
stopping conditions. The forward selection starts with an empty selection set of features, then adds an
attribute in each searching round. Only the attribute that provides the highest increase in performance
is retained. Afterwards, another new searching cycle is started with the modified set of selected
features. The searching of forward selection stops when the added attribute in the next round does not
further improve the model performance.
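The forward selection loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the experiments: `score` stands in for the model-validation step (in this work, ANN validation), and the `GAIN` table and `toy_score` function are hypothetical values invented so the example runs without training a model.

```python
def forward_selection(features, score):
    """Greedy forward selection; `score` maps a feature list to a value to maximise."""
    selected = []
    best = score(selected)
    remaining = list(features)
    while remaining:
        # Try adding each remaining feature to the current selection.
        trial = {f: score(selected + [f]) for f in remaining}
        candidate = max(trial, key=trial.get)
        if trial[candidate] <= best:
            break  # the best addition no longer improves performance
        selected.append(candidate)
        remaining.remove(candidate)
        best = trial[candidate]
    return selected, best

# Hypothetical per-feature gains with a small penalty per selected feature.
GAIN = {"T30": 3.0, "Ps30": 2.0, "BPR": 1.0, "P2": 0.0, "Nf": 0.0}

def toy_score(subset):
    return sum(GAIN[f] for f in subset) - 0.5 * len(subset)

chosen, chosen_score = forward_selection(list(GAIN), toy_score)
```

With an error metric such as RMSE, where lower is better, the same loop applies if `score` returns the negated error.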
In contrast, backward elimination performs the reverse process. It starts with the set of all attributes and eliminates attributes in each searching round until the next elimination no longer improves modeling performance. The brute force selection method uses a search algorithm that tries all combinations of attributes. Evolutionary selection employs a genetic algorithm to select the best set of features based on a fitness function measurement [33]. Because of computational and time limitations, brute force selection could not be included in this experiment. Only forward selection, backward elimination, and evolutionary selection were implemented [34].
training deep learning models, while it also helped to improve some aspects of model performance.
The details of these two additional phases have also been detailed by others [34,37].
As mentioned in Section 2.1, one of the challenges of training the deep learning model is to seek a feature space $f$ in which $P(X_S)$ and $P(X_T)$ are similar. Selecting only the meaningful features is believed to help reduce the dissimilarity in the feature space that affects the predictability of the model. This is also a way to reduce the complexity of the model architecture, and it might also improve the prediction accuracy of the deep learning models. One possible framework that incorporates the feature engineering phase and pre-training phase into the CRISP-DM standard is illustrated in Figure 4.
Table 1. Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset description [38].
Description                   FD001   FD002   FD003   FD004
Number of training engines    100     260     100     248
Number of testing engines     100     259     100     248
Operational conditions        1       6       1       6
Fault modes                   1       1       2       2
Each sub-dataset (FD001, FD002, FD003, and FD004) contains a number of training engines with run-to-failure information and a number of testing engines whose records terminate before failure is observed. Each dataset has either one or six operational conditions, defined by altitude (0–42,000 feet), throttle resolver angle (20–100°), and Mach number (0–0.84). Each dataset also has either one or two fault modes: HPC degradation and fan degradation.
Sub-datasets FD002 and FD004 are generated with six operational conditions, which are believed to represent general aircraft gas turbine engine operation better than FD001 and FD003, which are generated from only one operational condition. Therefore, data from either FD002 or FD004 can be selected for a complete experiment. In this study, the FD002 set was selected as the training dataset. With our current model validation set-up (described in Section 3.2), the wrapper methods required roughly 2 to 3 weeks to complete a run. We also kept the number of data points consistent between feature selection validation and model training, that is, in both the ANN feature selection validation and the DNN model training. Our experiments were designed this way in order to clearly demonstrate the effectiveness of the feature selection methods used for neural network-based algorithms.
There are 21 features included in the C-MAPSS dataset for every sub-dataset. These attributes
represent the sensor signals from the different parts of the aircraft gas turbine engines, as illustrated in
Figure 5 [39]. Short descriptions of the features and the plots of all 21 sensor signals of sub-dataset
FD002 are illustrated in Figure 6.
Figure 5. Engine and sensor points (left) and engine parts modules connections (right) [39].
Figure 6. Example of Sensor signals (NRc and Ps30) and all feature descriptions.
Multiple literature references suggest normalizing the raw signal before performing modeling and analysis [13–15]. Figure 7 shows the data signals before and after applying z-normalization, implemented here as the min-max scaling of Equation (10):
$$\tilde{x}_t^{i,j} = \frac{x_t^{i,j} - \min(x^j)}{\max(x^j) - \min(x^j)} \tag{10}$$
where $x_t^{i,j}$ denotes the original $i$-th data point of the $j$-th feature at time $t$ and $x^j$ is the vector of all inputs of the $j$-th feature. Each attribute value was normalized individually and scaled down to the same range across all data points.
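A minimal sketch of the per-feature scaling in Equation (10) is given below. The function name and the sample readings are hypothetical, and the handling of a constant column (mapped to zeros to avoid division by zero) is an assumption that the text does not address.

```python
def min_max_normalize(column):
    """Scale one feature column to the range [0, 1], as in Equation (10)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

# Hypothetical raw readings of a single sensor.
raw = [518.67, 520.0, 530.5, 549.9]
scaled = min_max_normalize(raw)
```

Each feature is scaled with its own minimum and maximum, so sensors with very different physical units end up on the same range.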
In the dataset, the aircraft gas turbine engines start with various initial wear levels, but all are considered to be in a "healthy state" at the start of each record. The engines begin to degrade at some point in time at higher operation cycles until they can no longer function normally. This is considered the time at which the engine system is in the "unhealthy state". The training datasets contain run-to-failure information that covers the entire life of each engine until it fails.
It is also reasonable to estimate RUL as a constant value when the engines operate in normal
conditions [38]. Therefore, a piece-wise linear degradation model can be used to define the observed
RUL value in the training datasets. That is, after an initial period with constant RUL values, it can be
assumed that the RUL targets decrease linearly.
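The piece-wise linear target described above can be sketched as follows. This is an illustration only: the function name, the 200-cycle engine life, and the cap of 130 cycles are hypothetical choices, not the values used in this study.

```python
def piecewise_rul(total_cycles, cap):
    """RUL target for cycles 1..total_cycles of one run-to-failure engine.

    The target stays constant at `cap` while the engine is healthy, then
    decreases linearly by one per cycle until it reaches 0 at failure.
    """
    return [min(cap, total_cycles - t) for t in range(1, total_cycles + 1)]

# Hypothetical engine that fails after 200 cycles, with RUL capped at 130.
targets = piecewise_rul(total_cycles=200, cap=130)
```

Capping the early-life target reflects the assumption that a healthy engine gives no observable evidence of its exact remaining life.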
Figure 7. Example of before (left) and after (right) z-normalization.
Figure 8 illustrates the RUL curves of all unseen (test) datasets containing testing engines from the FD002 and FD004 datasets. Figure 9 shows an example of the RUL curves of one degrading engine from each of the FD002 and FD004 datasets. The same degradation behavior also applies to the training set. These RUL curves represent the health state, or prognostic, of the aircraft gas turbine engines over cycles until the end-of-life, that is, the point at which the engines can no longer operate normally.
Figure 8. RUL curve of all testing engines: FD002 (top) and FD004 (bottom).
The degradation behavior of the aircraft gas turbine engines can be observed more clearly in Figure 9. We presume that the RUL is constant until the critical point at which the performance of the engine starts to degrade. In the degradation phase, the RUL is represented by a linear function. Hence, the entire RUL curve is identified as a piece-wise linear degradation function. The critical point, Rth, is the point where the aircraft engines start to degrade. The critical points of the aircraft gas turbine engines were predefined based on the condition described by the data source, the NASA Ames prognostics data repository [1].
Figure 9. Example of RUL curve of one testing engine: FD002 (top) and FD004 (bottom).
To measure and evaluate the performance of the models with selected features, root mean square error (RMSE) and the scoring algorithm suggested in [39] were used.
RMSE is commonly used as a performance indicator for regression models. The following is the formula of RMSE:
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( x_i - \bar{x}_i \right)^2} \tag{11}$$
where $n$ is the number of prediction data points, $x_i$ is the real value, and $\bar{x}_i$ is the prediction value. In this case, the $x$ parameters refer to the data points in the RUL curve: $x_i$ is the actual RUL value and $\bar{x}_i$ is the RUL value predicted by our models.
The scoring algorithm is described by the formula below:
$$s = \begin{cases} \sum_{i=1}^{n} \left( e^{-\left( \frac{d}{a_1} \right)} - 1 \right) & \text{for } d < 0 \\ \sum_{i=1}^{n} \left( e^{\left( \frac{d}{a_2} \right)} - 1 \right) & \text{for } d \geq 0 \end{cases} \tag{12}$$
where $s$ is the computed score, $n$ is the number of units under test (UUT), and $d = \hat{t}_{RUL} - t_{RUL}$, or Estimated RUL minus True RUL, while $a_1 = 10$ and $a_2 = 13$. In other words, $d$ is the difference between the predicted and observed RUL values, and $s$ is summed over all examples. From the formula, the scoring metric penalizes positive errors more than negative errors, as these have a higher impact on maintenance policies. Also, note that a lower score means better prediction performance of the model [39].
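The two metrics in Equations (11) and (12) can be computed as in the sketch below. The function names and the three sample values are hypothetical, and the constants `a1` and `a2` are passed as parameters whose defaults are simply the values printed in the text above.

```python
import math

def rmse(actual, predicted):
    """Root mean square error, Equation (11)."""
    n = len(actual)
    return math.sqrt(sum((x - p) ** 2 for x, p in zip(actual, predicted)) / n)

def rul_score(actual, predicted, a1=10.0, a2=13.0):
    """Asymmetric exponential score, Equation (12), summed over all units."""
    total = 0.0
    for x, p in zip(actual, predicted):
        d = p - x  # estimated RUL minus true RUL
        if d < 0:
            total += math.exp(-d / a1) - 1.0
        else:
            total += math.exp(d / a2) - 1.0
    return total

# Hypothetical true and predicted RUL values for three units.
actual = [100.0, 80.0, 60.0]
predicted = [90.0, 80.0, 73.0]
example_rmse = rmse(actual, predicted)
example_score = rul_score(actual, predicted)
```

Unlike RMSE, the score grows exponentially with the size of each error and is not averaged, so it is sensitive to a few large mispredictions and to the number of units evaluated.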
• 5 Folds Cross-Validation
• 1000 Training cycles
• 0.001 Learning rate
• 0.9 Momentum
• Linear sampling.
For the DNN hyperparameter selection, the model parameters in the H2O DNN algorithm were varied as described in Table 2. The grid search to identify the range of the learning rate, λ, was performed after fine-tuning the remaining parameters manually. Additionally, the training samples per iteration were set to auto-tuning, and the batch size was set to 1 for all variations.
Table 2. Hyperparameter values evaluated in the proposed Deep Neural Network (DNN) model.
Hyperparameters                                                        Range
Epoch                                                                  {100, 1000, 5000, 7000, 10,000}
Training sample per iteration                                          AUTO
Batch size                                                             1
Learning rate annealing                                                {$10^{-10}$, $10^{-8}$, $10^{-5}$, $10^{-1}$}
Momentum                                                               {0.1, 0.2, 0.3, 0.5, 0.6, 0.8, 0.99}
L1: Regularization that constrains the absolute value                  {$10^{-20}$, $10^{-15}$, $10^{-10}$, $10^{-5}$, $10^{-1}$, 0}
L2: Regularization that constrains the sum of squared weights          {$10^{-20}$, $10^{-15}$, $10^{-10}$, $10^{-5}$, $10^{-1}$, 0}
Max w2: Maximum sum of squares of incoming weights into the neuron     {0, 10, 100, 10,000, ∞}
The best-case scenario is the combination of the following hyperparameters: Epoch = 5000, Learning rate = $10^{-8}$, Momentum = 0.99, L1 = $10^{-5}$, L2 = 0, and Max w2 set to infinity. These are all hyperparameters employed in the final DNN model proposed.
For the Pearson correlation, the attributes were not selected if the coefficient was less than −0.01 [29,30]. For PCA, the features were selected based on weight (selected if the weight is more than 0.2) and the PCA matrix [31]. For the Relief algorithm, the attributes were not selected if the calculated weight was below zero [31]. For deviation selection, a feature was selected if its weight was higher than 1 [31]. It is important to note that the weights of the attributes calculated using the Relief algorithm were unacceptably low (less than $10^{-12}$) and there were very large gaps between the calculated weights. Similar results were observed with other filter selection methods, including the SVM. It was found that when features were selected by filter methods that produced such statistically low weights, the models trained on those features were unable to provide usable prediction results.
The following are the features selected based on these two filtering methods. In addition to the
feature weights from Pearson correlation selection and PCA selection in Table 3, the Pearson correlation
matrix and PCA matrix are also provided in Appendices A and B.
• Pearson correlation; 8 attributes: T30, T50, Ne, Ps30, NRc, BPR, farB, and htBleed.
• Relief algorithm; 2 attributes: P15 and Nf_dmd.
• SVM selection; 11 attributes: T2, T24, P30, Nf, epr, phi, NRF, Nf_dmd, PCNfR_dmd, W31, and W32.
• PCA selection; 17 attributes: T2, T24, T30, T50, P2, P15, P30, Nf, Ne, epr, Ps30, phi, farB, htBleed,
Nf_dmd, W31, and W32.
• Deviation selection; 11 attributes: T2, T24, T50, P2, P15, Ne, epr, Ps30, farB, PCNfR_dmd, and W32.
For the wrapper methods, the sets of features selected by each method are listed below. It is important to note that the wrapper methods used ANN validation with the modeling set-up described in Section 3.2. Figure 10 shows the validation process using ANN for evolutionary selection.
Unlike the forward selection and backward elimination methods, which are both based on search algorithms [32], Evolutionary selection is based on genetic algorithms [40]. However, instead of using a fitness function from genetic theory, the evolutionary selection method used ANN validation as the fitness measurement. The parameter set-up in our evolutionary selection experiment was: population size = 10, maximum number of generations = 200, tournament selection with 0.25 size, initial probability for attributes (features) to be switched = 0.5, crossover probability = 0.5 with uniform crossover, and mutation probability = 1/(number of attributes).
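A genetic-algorithm feature selector with the parameter set-up listed above can be sketched as follows. This is a simplified illustration, not the tool used in the experiments. The toy fitness function replaces the ANN validation, the tournament size of 0.25 is interpreted as a fraction of the population, and carrying the best mask across generations is an assumption added for the example.

```python
import random

def evolve(n_features, fitness, pop_size=10, generations=200,
           tournament_frac=0.25, p_init=0.5, p_cross=0.5, seed=0):
    """Return the best boolean feature mask found by a simple genetic algorithm."""
    rng = random.Random(seed)
    p_mut = 1.0 / n_features  # mutation probability = 1 / number of attributes
    pop = [[rng.random() < p_init for _ in range(n_features)]
           for _ in range(pop_size)]
    k = max(2, round(tournament_frac * pop_size))  # tournament size

    def pick():
        # Tournament selection: best of k randomly drawn individuals.
        return max(rng.sample(pop, k), key=fitness)

    best = max(pop, key=fitness)
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            a, b = pick()[:], pick()[:]  # copies, so parents stay unchanged
            if rng.random() < p_cross:
                # Uniform crossover: swap each gene with probability 0.5.
                for i in range(n_features):
                    if rng.random() < 0.5:
                        a[i], b[i] = b[i], a[i]
            for child in (a, b):
                for i in range(n_features):
                    if rng.random() < p_mut:
                        child[i] = not child[i]  # flip: switch the feature on/off
                children.append(child)
        pop = children[:pop_size]
        best = max(pop + [best], key=fitness)  # assumption: remember the best mask
    return best

# Hypothetical fitness: count of positions that agree with a known "ideal" mask.
IDEAL = [True, False, True, True, False, False, True, False]

def toy_fitness(mask):
    return sum(m == t for m, t in zip(mask, IDEAL))

best_mask = evolve(len(IDEAL), toy_fitness)
```

In the study each fitness evaluation is a full ANN validation run, which is why the wrapper methods are far more expensive than the filter methods.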
It is also important to note that, in this case, the brute force algorithm was not used. The brute
force algorithm is the selection algorithm that can derive the best features set from the data. However,
with limited computational capability, it cannot be used in real-time. Therefore, we did not include the
Brute force algorithm in this experiment.
• Backward elimination; validation RMSE 46.429 from 19 attributes; T2, T30, P2, P15, P30, Nf, epr, Ps30, phi, NRF, NRc, BPR, farB, htBleed, Nf_dmd, PCNfR_dmd, W31, and W32.
• Evolutionary selection; validation RMSE 46.451 from 14 attributes; T2, T30, T50, P2, Nf, Ne, epr, Ps30, NRc, BPR, farB, htBleed, W31, and W32.
• Forward selection; validation RMSE 46.480 from 11 attributes; T2, T30, T50, P2, P15, Ps30, NRc, BPR, farB, htBleed, and Nf_dmd.
Table 4. Best root mean square error (RMSE) and Prediction Score results of RUL prediction from all DNN models.
Methods                   RMSE                Score
                          FD002    FD004      FD002        FD004
Original data             45.439   45.302     645,121      427,968
SVM                       Unusable
Relief algorithm          Unusable
Backward elimination      45.121   45.436     645,132      211,129
Deviation                 45.374   45.630     740,936      256,776
Evolutionary selection    44.717   44.953     518,025      355,458      (Best overall)
Forward selection         45.242   46.505     1,353,749    423,997
PCA                       45.368   45.108     1,450,397    406,872
Pearson correlation       45.272   46.216     502,579      338,400
[Plot panels for FD002: RUL against Cycles, with Actual RUL and DNN Prediction series. Recoverable panel titles: (b) Backward Elimination, (c) Deviation Selection, (d) Evolutionary Selection, (e) Forward Selection, (f) PCA Selection, (g) Pearson Correlation Selection.]
Figure 12. (a–g) All RUL prediction curves for FD004.
Figure 13. (a–g) RUL prediction points for one engine of FD002 test data.
Table 5. Cont. [Table body not recoverable; header row: Feature Selection Method, Model Output Weights Errors.]
Figure 16. (a–g) Prediction Error Distributions.
Table 6. Mean RMSE from all DNN models.
Method            Average RMSE
                  FD002     FD004
Original Data     48.398    50.541
BW Elimination    47.907    50.331
Deviation         48.160    50.081
Evo Selection     47.452    49.650
FW Selection      48.434    50.708
PCA               48.072    49.737
Pearson           49.203    52.111
4. Discussion
As mentioned in the related works (Section 1.2), there have been a number of efforts in developing deep learning models for the C-MAPSS aircraft gas turbine engines dataset [12–20]. Currently, the deep learning model with the highest accuracy was proposed by Zhengmin Kong et al. [17]. Their deep learning architecture consists of combined CNN and LSTM-RNN layers and can achieve 16.13 RMSE, while our best Evolutionary DNN model can achieve 44.71 RMSE. This indicates that the performance of our DNN models is poorer than that of the modern hybrid deep learning models developed in recent years.
However, to the best of our knowledge, no work has addressed the complexity of the models and the computational burden of model training. Hybrid deep neural network layers are generally overly complex and require exponentially more computational time and resources compared to our proposed Evolutionary DNN. The models proposed in recent years also took all features from the C-MAPSS dataset and disregarded the feature performance benchmark. Unlike those models, our proposed approach applies feature selection prior to the model training phase to help reduce the number of input attributes and, as a result, to reduce the model complexity. The reduction in complexity when using fewer input features is more evident for the high-complexity hybrid deep neural network layers.
Additionally, as illustrated in Figures 15 and 16, fluctuations in the prediction errors can be noticed when training deep learning models. This effect occurs not only in DNN but also in other types of network layers, such as LSTM-RNN, CNN, and other modern hybrid layers. Based on the results demonstrated in Table 4 and Figures 12–16, the key observations of such an effect are as follows:
(1) Utilizing fewer features to train the model has been shown to lower the error distribution range, compared to using more features. This is because the initial random weights assigned to the hidden nodes are smaller when using fewer features in model training. In other words, the models are more robust and reliable when using fewer features. The same observation applies to the fluctuation of the prediction errors, in that the prediction results are more stable when using fewer features in model training.
(2) In terms of model performance and accuracy, although using selected features does not always guarantee better results, the feature selection methods still help to reduce the computational burden and can offer better prediction performance. In our experiment, the Evolutionary selection achieved both better performance and complexity reduction.
We emphasize that our current goal is not to improve on model performance compared against other existing works; rather, we aim to provide baseline results and demonstrate the significant effect of using feature selection on deep learning models, which has not been addressed before. We believe that the end results can be further improved when applying our feature selection results in the modern hybrid deep neural network architectures.
For our experimental results in general, as mentioned, the best accuracy based on the RMSE results in Table 4 was obtained with the Evolutionary method. The complexity of the model was also significantly reduced by using a reduced set of features, from 21 attributes to only 14 attributes.
When considering the complexity and computational time, the filter methods were less complex and faster to run because they do not require training and testing multiple ANN validation models in the process. In this study, when performing the selection process, most of the filter methods required only 5–10 min, while the wrapper methods required 10 h to 10 days to complete.
It is also important to note that the curve fitting and pattern recognition have been vastly improved,
as can be seen when comparing the RUL prediction curves in Figures 11–14. In greater detail, the DNN
model from most of the selected features can reasonably capture the trend of both before and after
aircraft gas turbine engines’ degradation intervals.
In summary, our Evolutionary DNN model architecture performs best as a simplified deep neural
network data-driven model for C-MAPSS aircraft gas turbine engines data. The feature selection
phase (as described in the modeling framework in Figure 4) must be included as a standard in the
modeling framework for such a PHM dataset. This is one way to potentially improve the overall
performance for RUL prediction for the prognostics of aircraft gas turbine engines data as well as other
prognostic datasets.
It is also possible to use a dimensionality reduction technique, such as PCA, to transform the data
from the selected features, which can possibly improve prediction accuracy and reduce
complexity. These are the key aspects that should be tested and experimented with in the future.
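As a rough illustration of this future direction, the sketch below applies PCA to a subset of already selected columns. It is a minimal example under stated assumptions, not a tested part of this work: the data are synthetic, the 14 selected column indices and the choice of 5 components are hypothetical, and PCA is computed directly from the singular value decomposition of the standardized data.

```python
import numpy as np

def pca_transform(X, n_components):
    """Project standardized columns of X onto the leading principal components."""
    # Standardize each column; constant columns (zero variance) must be dropped beforehand.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)
    scores = Z @ Vt[:n_components].T
    return scores, explained[:n_components]

# Synthetic stand-in for 21 sensor columns driven by 3 latent factors (an assumption).
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 21))
X_all = latent @ mixing + 0.2 * rng.normal(size=(500, 21))

selected = list(range(14))  # hypothetical indices of the 14 selected features
scores, explained = pca_transform(X_all[:, selected], n_components=5)

print("reduced shape:", scores.shape)
print("explained variance ratio:", np.round(explained, 3))
```

The resulting `scores` matrix would replace the selected features as model input. How many components to keep, and whether the transformation actually helps RUL prediction, are the open questions to be tested.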
Lastly, we believe that our study will be of great benefit to the aviation community. We aim to
raise awareness and discussion of how each aircraft gas turbine engine feature can
help improve the overall life span of the engines. Although we provided insights only from a
data science perspective, we strongly believe that the results achieved in this work will motivate
further study within the aviation community.
Author Contributions: Conceptualization, P.K.; methodology, P.K.; software, P.K.; validation, P.K.; formal analysis,
P.K.; investigation, P.K., and D.G.; resources, P.K.; data curation, P.K.; writing—original draft preparation, P.K.;
writing—review and editing, P.K., D.G., and N.Y.; visualization, P.K.; supervision, D.G.; project administration, D.G.
and N.Y.; funding acquisition, P.K. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design of the
study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to
publish the results.
Appendix A
Appendix B
Figure A1. Final Proposed Evolutionary DNN Model description.
References
1. Saxena, A.; Goebel, K. Turbofan Engine Degradation Simulation Data Set. NASA Ames Prognostics Data
Repository, NASA Ames Research Center, Moffett Field. 2008. Available online: http://ti.arc.nasa.gov/project/
prognostic-data-repository (accessed on 10 May 2019).
2. Atamuradov, V.; Medjaher, K.; Dersin, P.; Lamoureux, B.; Zerhouni, N. Prognostics and health management
for maintenance practitioners-review, implementation and tools evaluation. Int. J. Progn. Health Manag.
2017, 8, 1–31.
3. Papakostas, N.; Papachatzakis, P.; Xanthakis, V.; Mourtzis, D.; Chryssolouris, G. An approach to operational
aircraft maintenance planning. Decis. Support Syst. 2010, 48, 604–612. [CrossRef]
4. Cubillo, A.; Perinpanayagam, S.; Esperon-Miguez, M. A review of physics-based models in prognostics:
Application to gears and bearings of rotating machinery. Adv. Mech. Eng. 2016, 8, 1687814016664660.
[CrossRef]
5. Si, X.-S.; Wang, W.; Hu, C.-H.; Zhou, D.-H. Remaining useful life estimation—A review on the statistical data
driven approaches. Eur. J. Oper. Res. 2011, 213, 1–14. [CrossRef]
6. Lei, Y.; Li, N.; Guo, L.; Li, N.; Yan, T.; Lin, J. Machinery health prognostics: A systematic review from data
acquisition to RUL prediction. Mech. Syst. Signal Process. 2018, 104, 799–834. [CrossRef]
7. Faghih-Roohi, S.; Hajizadeh, S.; Nunez, A.; Babuška, R.; De Schutter, B. Deep Convolutional Neural Networks
for Detection of Rail Surface Defects. In Proceedings of the 2016 International Joint Conference on Neural
Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 2584–2589.
8. Mehrotra, K.; Mohan, C.K.; Ranka, S. Elements of Artificial Neural Networks; MIT Press: Cambridge, MA,
USA, 1997.
9. Hinton, G.E.; Salakhutdinov, R.R. Reducing the Dimensionality of Data with Neural Networks. Science 2006,
313, 504–507. [CrossRef]
10. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117. [CrossRef]
11. Zhao, G.; Zhang, G.; Ge, Q.; Liu, X. Research Advances in Fault Diagnosis and Prognostic Based on
Deep Learning. In Proceedings of the 2016 Prognostics and System Health Management Conference
(PHM-Chengdu), Chengdu, China, 19–21 October 2016; pp. 1–6.
12. Xiongzi, C.; Jinsong, Y.; Diyin, T.; Yingxun, W. Remaining Useful Life Prognostic Estimation for Aircraft
Subsystems or Components: A Review. In Proceedings of the 2011 10th International Conference on
Electronic Measurement & Instruments, Chengdu, China, 16–19 August 2011; Volume 2, pp. 94–98.
13. Yuan, M.; Wu, Y.-T.; Lin, L. Fault Diagnosis and Remaining Useful Life Estimation of Aero Engine Using
LSTM Neural Network. In Proceedings of the 2016 IEEE International Conference on Aircraft Utility Systems
(AUS), Beijing, China, 10–12 October 2016; pp. 135–140.
14. Khan, F.; Eker, O.F.; Khan, A.; Orfali, W. Adaptive Degradation Prognostic Reasoning by Particle Filter with
a Neural Network Degradation Model for Turbofan Jet Engine. Data 2018, 3, 49. [CrossRef]
15. Li, X.; Ding, Q.; Sun, J.-Q. Remaining useful life estimation in prognostics using deep convolution neural
networks. Reliab. Eng. Syst. Saf. 2018, 172, 1–11. [CrossRef]
16. Zhang, A.; Wang, H.; Li, S.; Cui, Y.; Liu, Z.; Yang, G.; Hu, J. Transfer Learning with Deep Recurrent Neural
Networks for Remaining Useful Life Estimation. Appl. Sci. 2018, 8, 2416. [CrossRef]
17. Kong, Z.; Cui, Y.; Xia, Z.; Lv, H. Convolution and Long Short-Term Memory Hybrid Deep Neural Networks
for Remaining Useful Life Prognostics. Appl. Sci. 2019, 9, 4156. [CrossRef]
18. Zheng, S.; Ristovski, K.; Farahat, A.; Gupta, C. Long Short-Term Memory Network for Remaining Useful
Life estimation. In Proceedings of the 2017 IEEE International Conference on Prognostics and Health
Management (ICPHM), Dallas, TX, USA, 19–21 June 2017; pp. 88–95.
19. Wu, Y.-T.; Yuan, M.; Dong, S.; Lin, L.; Liu, Y. Remaining useful life estimation of engineered systems using
vanilla LSTM neural networks. Neurocomputing 2018, 275, 167–179. [CrossRef]
20. Ellefsen, A.L.; Bjoerlykhaug, E.; Æsøy, V.; Ushakov, S.; Zhang, H. Remaining useful life predictions for
turbofan engine degradation using semi-supervised deep architecture. Reliab. Eng. Syst. Saf. 2019, 183,
240–251. [CrossRef]
21. Candel, A.; Parmar, V.; LeDell, E.; Arora, A. Deep Learning with H2 O; H2O. ai Inc.: Mountain View, CA,
USA, 2016.
22. Bengio, Y. Learning Deep Architectures for AI. Found. Trends Mach. Learn. 2009, 2, 1–127. [CrossRef]
Aerospace 2020, 7, 132 32 of 32
23. Goodfellow, I.; Warde-Farley, D.; Mirza, M.; Courville, A.; Bengio, Y. Maxout networks. In Proceedings of
the International Conference on Machine Learning, Atlanta, GA, USA, 16−21 June 2013; pp. 1319–1327.
24. Ben-David, S.; Blitzer, J.; Crammer, K.; Kulesza, A.; Pereira, F.C.N.; Vaughan, J.W. A theory of learning from
different domains. Mach. Learn. 2010, 79, 151–175. [CrossRef]
25. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent
neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
26. Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; LaRochelle, H.; LaViolette, F.; Marchand, M.; Lempitsky, V.
Domain-Adversarial Training of Neural Networks. J. Mach. Learn. Res. 2016, 17, 189–209. [CrossRef]
27. Guyon, I.; Gunn, S.; Nikravesh, M.; Zadeh, L.A. (Eds.) Feature Extraction: Foundations and Applications.
Springer: New York, NY, USA, 2008.
28. Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3,
1157–1182.
29. Benesty, J.; Chen, J.; Huang, Y.; Cohen, I. Pearson Correlation Coefficient. In Noise Reduction in Speech
Processing; Springer: Berlin/Heidelberg, Germany, 2009; pp. 1–4.
30. Sarwate, D. Mean-square correlation of shift-register sequences. IEEE Proc. F Commun. Radar Signal Process.
1984, 131, 101. [CrossRef]
31. Sun, Y. Iterative RELIEF for Feature Weighting: Algorithms, Theories, and Applications. IEEE Trans. Pattern
Anal. Mach. Intell. 2007, 29, 1035–1051. [CrossRef]
32. Derksen, S.; Keselman, H.J. Backward, forward and stepwise automated subset selection algorithms:
Frequency of obtaining authentic and noise variables. Br. J. Math. Stat. Psychol. 1992, 45, 265–282. [CrossRef]
33. Vafaie, H.; Imam, I.F. Feature Selection Methods: Genetic Algorithms vs. Greedy-Like Search. In Proceedings
of the International Conference on Fuzzy and Intelligent Control Systems, LoUIsville, KY, USA,
26 June−2 July 1994; Volume 51, p. 28.
34. Javed, K.; Gouriveau, R.; Zemouri, R.; Zerhouni, N. Features Selection Procedure for Prognostics:
An Approach Based on Predictability. IFAC Proc. Vol. 2012, 45, 25–30. [CrossRef]
35. Wirth, R.; Hipp, J. CRISP-DM: Towards a Standard Process Model for Data Mining. In Proceedings
of the 4th International Conference on the Practical Applications of Knowledge Discovery and Data Mining;
Springer: London, UK, 2000; pp. 29–39.
36. Khumprom, P.; Yodo, N. A Data-Driven Predictive Prognostic Model for Lithium-Ion Batteries based on a
Deep Learning Algorithm. Energies 2019, 12, 660. [CrossRef]
37. Erhan, D.; Manzagol, P.A.; Bengio, Y.; Bengio, S.; Vincent, P. The difficulty of training deep architectures and
the effect of unsupervised pre-training. AISTATS 2009, 5, 153–160.
38. Saxena, A.; Goebel, K.; Simon, D.; Eklund, N. Damage Propagation Modeling for Aircraft Engine
Run-to-Failure Simulation. In Proceedings of the 2008 International Conference on Prognostics and
Health Management, Denver, CO, USA, 6–9 October 2008; pp. 1–9.
39. Frederick, D.K.; DeCastro, J.A.; Litt, J.S. User’s Guide for the Commercial Modular Aero-Propulsion System
Simulation (C-MAPSS); NASA/TM-2007-215026; NASA: Washington, DC, USA, 1 October 2007.
40. Van der Drift, A. Evolutionary selection, a principle governing growth orientation in vapour-deposited
layers. Philips Res. Rep. 1967, 22, 267–288.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).