Sensors: Bearing Fault Diagnosis Method Based On Deep Convolutional Neural Network and Random Forest Ensemble Learning
Article
Bearing Fault Diagnosis Method Based on Deep
Convolutional Neural Network and Random Forest
Ensemble Learning
Gaowei Xu 1 , Min Liu 1, *, Zhuofu Jiang 1 , Dirk Söffker 2 and Weiming Shen 3,4
1 School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China;
[email protected] (G.X.); [email protected] (Z.J.)
2 Dynamics and Control, University of Duisburg-Essen, 47057 Duisburg, Germany; [email protected]
3 Key Laboratory of Embedded System and Service Computing, Tongji University, Shanghai 201804, China;
[email protected]
4 State Key Laboratory of Digital Manufacturing Equipment and Technology,
Huazhong University of Science and Technology, Wuhan 430074, China
* Correspondence: [email protected]
Received: 19 January 2019; Accepted: 27 February 2019; Published: 3 March 2019
Abstract: Recently, research on data-driven bearing fault diagnosis methods has attracted increasing
attention due to the availability of massive condition monitoring data. However, most existing
methods still have difficulties in learning representative features from the raw data. In addition,
they assume that the feature distribution of training data in source domain is the same as that of
testing data in target domain, which is invalid in many real-world bearing fault diagnosis problems.
Since deep learning has the automatic feature extraction ability and ensemble learning can improve
the accuracy and generalization performance of classifiers, this paper proposes a novel bearing
fault diagnosis method based on deep convolutional neural network (CNN) and random forest (RF)
ensemble learning. Firstly, time-domain vibration signals are converted into two-dimensional (2D)
gray-scale images containing abundant fault information by continuous wavelet transform (CWT).
Secondly, a CNN model based on LeNet-5 is built to automatically extract multi-level features that
are sensitive to the detection of faults from the images. Finally, the multi-level features containing
both local and global information are utilized to diagnose bearing faults by the ensemble of multiple
RF classifiers. In particular, low-level features containing local characteristics and accurate details
in the hidden layers are combined to improve the diagnostic performance. The effectiveness of the
proposed method is validated by two sets of bearing data collected from a Reliance Electric motor
and a rolling mill, respectively. The experimental results indicate that the proposed method achieves
high accuracy in bearing fault diagnosis under complex operational conditions and is superior to
traditional methods and standard deep learning methods.
Keywords: bearing fault diagnosis; convolutional neural network (CNN); random forest (RF);
continuous wavelet transform (CWT); ensemble learning
1. Introduction
Nowadays, with the rapid development of modern industry, fault diagnosis technology, as a core
of Prognostics and Health Management (PHM) system, is playing an increasingly important role in
intelligent equipment maintenance [1,2]. Bearings are essential components of most machinery
and electrical equipment, and their failures may result in considerable productivity and economic losses.
Accurate and efficient bearing fault diagnosis can not only reduce the maintenance costs, but also
improve the reliability and stability of equipment [2,3].
With the advent of the Internet of Things (IoT) and Cyber Physical System (CPS), a massive
amount of historic data known as industrial Big Data is being collected from various equipment
and systems. Therefore, research on data-driven fault diagnosis methods has attracted increasing
attention [4–6]. Compared with signal-based and model-based fault diagnosis methods, they eliminate
the complexity of signal processing and model establishment for different engineered systems [7].
Intelligent data-driven fault diagnosis methods usually consist of three critical steps: (1) data
preprocessing (e.g., outlier elimination); (2) feature extraction and selection; (3) fault classification [8].
Feature extraction methods are commonly used to analyze waveform signal data including vibration
data and extract signal-based features in the fault diagnosis of equipment [3,9–11]. However, there
is still some redundant information in the extracted features. Then, feature selection techniques are
adopted to significantly reduce feature dimensions, which can improve the classification efficiency
while retaining important and representative features [12,13]. Finally, the selected features are used
to diagnose faults by many fault classification methods based on traditional statistical and machine
learning models [11,14,15]. It can be seen that traditional data-driven fault diagnosis methods have
achieved some great progress. Nevertheless, there are still two important issues in these methods:
(1) The important and representative features containing enough fault information are manually
extracted and selected from raw data, which depend heavily on prior knowledge and diagnostic
expertise of signal processing techniques. In addition, the feature extraction and selection
processes for different diagnostic problems are case sensitive, thus they are also time-consuming
and laborious.
(2) The shallow architectures of traditional machine learning methods have problems in approximating
nonlinear mapping relationship accurately in complex systems [7,16–23].
Deep learning is a new branch in the field of machine learning and can overcome the
above-mentioned issues in fault diagnosis. It can replace the manual feature extraction and selection
with automatic learning of representative features and construct input-output relationship in complex
systems with a deep nonlinear network [16–20]. CNN model, as one of the most effective deep
learning models, has also shown promising capability in useful feature learning and intelligent fault
diagnosis [17–23]. It is widely accepted that only the extracted features in the last convolutional layer
are most suitable as the input vector of the classifier in most researches and applications of CNN
models [7,16–23]. Although the last layer contains more global and invariant high-level features for
category-level fault classification, it is still questionable whether it is most appropriate to directly use
high-level features for practical fault classification problems. The following four points need to be
further considered:
(1) In many practical fault diagnosis applications, the training data and testing data are collected
under different operational conditions, thus they are drawn from different feature
distributions [23]. It is well known that the extracted high-level features in CNN models are
specific for particular dataset or task, while the low-level features in the hidden layers are
universal and similar for different but related distribution datasets or tasks [24–26]. That is to
say, low-level features extracted from training data in source domain are also applicable to test
data in target domain. The generalization ability of the CNN-based fault diagnosis methods only
taking high-level features into account would be poor. Therefore, low-level features in the hidden
layers should be combined to obtain better accuracy and generalization performance.
(2) Bearing health conditions with the same fault types under different damage sizes need to
be classified in some practical applications. Since there are still some accurate details and
local characteristics existing in low-level features which are not well preserved in high-level
features [25,26], the classifier should make full use of multi-level features for accurate and complex
fault diagnosis tasks.
(3) The extracted features in each layer are another abstract representation of raw data, thus the
features in all layers can directly impact the diagnosis results.
Sensors 2019, 19, 1088 3 of 21
(4) Sometimes, the extracted low-level features already contain enough fault information for
effective fault classification, so there is no need to extract high-level features, which would cost more
time and computer memory.
Recently, several studies have investigated multi-level and multi-scale feature aggregation
in CNN models, which has proved to be more effective in fault diagnosis and many other
applications [24,25,27–29]. These models summarize multi-level or multi-scale features altogether
into a category-level feature, then use it as the input of the fully-connected layer for more accurate
classification. However, only a small proportion of low-level features in hidden layers are used in these
models. In contrast, this paper takes full advantage of the extracted multi-level features [30–32].
This is achieved by the following steps. Firstly, a CWT-based signal-to-image conversion method is
presented. The presented conversion method can effectively capture enough fault information under
different conditions from nonlinear and non-stationary bearing vibration signals [33–35], and obtain
the time-frequency spectrums of signals that can be regarded as gray-scale images. The problem
of fault diagnosis can be solved by classifying these images. An improved CNN model based on
LeNet-5 is then proposed to automatically extract representative multi-level features from these
images. Finally, the extracted multi-level features at different layers in CNN are fed into multiple RF
classifiers to classify faults independently, and the outputs of multiple classifiers are aggregated by the
winner-take-all ensemble strategy to give the final diagnostic result.
The rest of this paper is organized as follows: Section 2 briefly reviews the related literature.
Section 3 introduces the theory of CNN and RF. Section 4 presents the proposed method using CWT,
CNN and RF. Section 5 analyzes the experimental results. Finally, the conclusions and further work
are given in Section 6.
2. Literature Review
3. Theoretical Background
3.1. CNN
In general, a CNN is composed of an input layer, multiple convolutional layers, pooling layers,
fully-connected layers, and an output layer. The input layer contains the data or images to
be processed. At a convolutional layer, a set of new feature maps are obtained. For each feature map,
firstly, the input is convolved with a kernel which has a local receptive field. Then, a bias term is added
to the convolution result. Finally, an activation function is applied. The operation is defined as:
$$x_i^j = f\left(\sum_{k=1}^{n} W_{i,k}^j \times x_k^{j-1} + b_i^j\right), \quad (1)$$
where $x_i^j$ denotes the $i$-th output feature map of the $j$-th layer; $x_k^{j-1}$ denotes the $k$-th input feature map
of the $(j-1)$-th layer; $W_{i,k}^j$ is the convolution kernel between the $i$-th output feature map at the $j$-th
layer and the $k$-th input feature map at the $(j-1)$-th layer; $n$ is the number of input feature maps;
$b_i^j$ is the bias of the $i$-th output feature map at the $j$-th layer; and $f(\cdot)$ is the activation function. The most
commonly used activation functions are the hyperbolic tangent, ReLU, and sigmoid functions [12]. The ReLU function can
increase the nonlinearity of CNN. It is adopted in this paper due to its excellent performance when
applied in CNN. ReLU is defined as:
$$x_i^j = \max\left(0, x_i^{j'}\right), \quad (2)$$
To decrease the number of parameters in CNN, the convolution kernels for the same feature map
share the same weight vectors and bias. Generally, a max-pooling layer is added after each convolutional
layer, which produces lower-resolution feature maps by a sub-sampling operation. Max-pooling is
defined as:
$$x_i^{a,b} = \max\left(x_i^{a',b'} : a \le a' < a + p,\; b \le b' < b + p\right), \quad (3)$$
where $x_i^{a',b'}$ and $x_i^{a,b}$ are the pixels of the $i$-th feature map before and after the max-pooling operation,
respectively, and $p$ is the stride size of the pooling window, which should be larger than 1. Note that
an excessive value of $p$ may result in a certain degree of information loss. The pooling layer decreases the
size of the input feature maps while maintaining the number of feature maps. With the sub-sampling
technique, the number of parameters in CNN is further reduced.
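As a minimal sketch of Equations (1)–(3), the following pure-Python functions compute one output feature map (sum over input maps, bias, ReLU) and then apply max-pooling. The function names are hypothetical, the kernel is slid without flipping (as in most CNN libraries), and the pooling windows are assumed to be non-overlapping with stride equal to the window size p; none of these details are specified in the text above.

```python
def conv_output_map(inputs, kernels, bias):
    """One output map of Eq. (1) followed by the ReLU of Eq. (2).

    inputs  : list of n square input feature maps (lists of lists)
    kernels : list of n square kernels, one per input map
    bias    : scalar bias shared by the whole output map
    """
    n, c = len(inputs[0]), len(kernels[0])
    size = n - c + 1                      # 'valid' convolution, stride 1
    out = []
    for r in range(size):
        row = []
        for s in range(size):
            acc = bias
            for x, w in zip(inputs, kernels):   # sum over the input maps k
                for u in range(c):
                    for v in range(c):
                        acc += w[u][v] * x[r + u][s + v]
            row.append(max(0.0, acc))           # ReLU
        out.append(row)
    return out


def max_pool(fmap, p):
    """Max-pooling of Eq. (3) with non-overlapping p x p windows (assumed)."""
    out = len(fmap) // p
    return [[max(fmap[a * p + da][b * p + db]
                 for da in range(p) for db in range(p))
             for b in range(out)]
            for a in range(out)]


# Toy example: a 5 x 5 input, one 2 x 2 kernel, bias -4, then 2 x 2 pooling.
image = [[float(r * s) for s in range(5)] for r in range(5)]
kernel = [[0.0, 0.0], [0.0, 1.0]]
fmap = conv_output_map([image], [kernel], bias=-4.0)   # 4 x 4 map
pooled = max_pool(fmap, 2)                             # [[0.0, 4.0], [4.0, 12.0]]
```

The pooled map has a quarter of the pixels of the convolution output but the number of maps is unchanged, which is the size reduction described above.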
The last pooling layer is followed by a fully-connected layer. Each neuron in the fully-connected
layer is connected to all feature maps in the last pooling layer. The high-level features in the
fully-connected layer are extracted and taken as the input of the output layer. Then, the prediction
output of the CNN model is generated. Finally, the parameters $\{W, b\}$ (weight vectors and bias) of the
network are fine-tuned by minimizing the loss function, which measures the error between the predicted
output $y$ and the targeted output $t$:
$$\{W, b\}^* = \arg\min_{\{W, b\}} \frac{1}{n} \sum_{i=1}^{n} J(t, y), \quad (4)$$
where $\{W, b\}^*$ are the optimized parameters; $n$ is the number of labelled samples; and $J$ is a loss
function. In addition, the gradient-based supervised training of the network is performed through the
back propagation algorithm [41].
Figure 1. The architecture of ensemble learning.
RF is one of the most popular ensemble learning methods, which consists of a bootstrap
aggregating (bagging) of N decision trees with a randomized selection of features at each split.
Given a training dataset, the RF algorithm is as follows:
1. For n = 1 to N:
(a) Generate a bootstrap sample with replacement from the training dataset.
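The bagging idea described above (bootstrap samples combined with a random choice of features) can be illustrated with a small, self-contained sketch. It is not the paper's implementation: the base learners here are depth-1 decision stumps rather than full decision trees, the predictions are aggregated by majority vote, and all function names and parameters are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, feature_ids):
    """Best single-threshold split (a depth-1 tree) over the given features."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:                       # no valid split: predict the majority
        maj = Counter(y).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), maj, maj)
    return best[1:]


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def fit_forest(X, y, n_trees, n_sub_features, seed=0):
    """Bagging: each stump sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]               # with replacement
        feats = rng.sample(range(len(X[0])), n_sub_features)   # random features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]
```

Because a stump has a single split, drawing the feature subset once per stump coincides with drawing it at each split; a full-depth tree would redraw the subset at every node.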
4. Proposed Method
In this section, a novel data-driven rolling bearing fault diagnosis method based on CWT, CNN
and RF is presented. The flowchart of this method is shown in Figure 2.
Figure 2. The flowchart of the proposed method.
[Flowchart stages: bearing vibration signal; CWT conversion; time-frequency spectrum (1024 × S size); normalization and compression; gray-scale images; training dataset; feature extraction; multi-layer features; multiple RFs; ensemble method.]
It contains three major steps. Firstly, the raw vibration signal data from bearing dataset are
converted into time-frequency spectrums using CWT, which are presented in the form of gray-scale
images. The converted images contain enough fault information that is beneficial for fault classification.
Then the images are compressed in order to reduce the computational complexity. Secondly, an
improved CNN model based on LeNet-5 [43] is designed; the parameters of the CNN model are
randomly initialized; and the CNN model is pre-trained on the converted images. The pre-trained
CNN model is used as the feature extractor to learn the representative multi-level features from these
images. Next, the extracted multi-layer features in different layers are fed into multiple RF classifiers
separately and the results of multiple RF classifiers are combined by the winner-take-all ensemble
method. Finally, the combined classification results are used as the final diagnostic result.
The basic wavelet function $\psi(t)$ is usually called the mother wavelet function, based on which
a family of time-scale wavelets $\psi_{a,b}(t)$ can be formulated by scale and translation adjustment,
described by:
$$\psi_{a,b}(t) = |a|^{-\frac{1}{2}}\, \psi\left(\frac{t-b}{a}\right), \quad a, b \in \mathbb{R},\ a > 0, \quad (6)$$
where a and b represent the scale and translation factors, respectively. Specifically, the scale factor
a either stretches or compresses the wavelet function to change the oscillating frequency and the
translation factor b changes the position of the time window. A larger scale stretches the wavelet and
decreases the frequency, while a smaller scale compresses the wavelet and increases the frequency.
For an arbitrary signal function $f(t) \in L^2(\mathbb{R})$, the corresponding CWT is defined as:
$$CWT_f(a, b) = \langle f(t), \psi_{a,b}(t) \rangle = |a|^{-\frac{1}{2}} \int_{-\infty}^{+\infty} f(t)\, \bar{\psi}\left(\frac{t-b}{a}\right) dt, \quad (7)$$
where $\bar{\psi}(t)$ is the complex conjugate of the mother wavelet function $\psi(t)$, and $CWT_f(a, b)$ is the inner
product of $f(t)$ and $\psi_{a,b}(t)$, which reflects the similarity between the signal function and wavelet
function. Wavelet functions have focal features and time-frequency localization properties and can
effectively capture non-stationary signal characteristics. There are many mother wavelet functions,
such as Haar, Meyer, Coiflet, Symlet, Gabor, and Morlet. Among them, Morlet wavelet has been
proven to be superior to others in terms of non-stationary rolling bearing vibration signal analysis due
to its similarity to the transient impulse components of bearing faults [44–47]. Thus it is chosen as the
mother wavelet function for bearing fault diagnosis in this paper. The Morlet wavelet in time domain
is defined as:
$$\psi(t) = \exp\left(-\beta^2 t^2 / 2\right) \cos(\pi t), \quad (8)$$
where β is the only parameter which controls the shape of the basic wavelet. As the β value increases,
the resolution of time domain increases and the resolution of frequency domain decreases. CWT
using Morlet can fully capture the signal characteristics and obtain good resolution in both time
and frequency domains [45]. In this paper, the continuous Morlet wavelet transform converts the
one-dimensional vibration signals in the time domain into a two-dimensional time-frequency spectrum with
abundant condition information.
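To make Equations (6)–(8) concrete, the sketch below evaluates the CWT by direct summation for a handful of scales. It is an illustrative discretization under stated assumptions, not the implementation used in the paper: the integral is replaced by a plain Riemann sum over the sample instants, translations b are taken at the sample positions, β = 1 is an arbitrary default, and the function names are hypothetical. Since the Morlet wavelet of Equation (8) is real-valued, its complex conjugate is the wavelet itself.

```python
import math


def morlet(t, beta=1.0):
    """Real Morlet mother wavelet of Eq. (8)."""
    return math.exp(-(beta ** 2) * t ** 2 / 2.0) * math.cos(math.pi * t)


def cwt(signal, scales, beta=1.0, dt=1.0):
    """Discrete approximation of Eq. (7).

    Returns one row of coefficients per scale a and one column per
    translation b (b runs over the sample indices of the signal).
    """
    n = len(signal)
    coeffs = []
    for a in scales:
        norm = dt / math.sqrt(abs(a))            # |a|^(-1/2) and the Riemann step
        row = []
        for b in range(n):
            acc = 0.0
            for k in range(n):
                acc += signal[k] * morlet((k - b) * dt / a, beta)
            row.append(acc * norm)
        coeffs.append(row)
    return coeffs


# A pure tone with period 8 samples matches the wavelet at scale a = 4,
# because cos(pi * t / a) also has period 2a = 8.
tone = [math.cos(math.pi * k / 4.0) for k in range(64)]
scales = [1, 2, 4, 8]
energy = [sum(c * c for c in row) for row in cwt(tone, scales)]
best_scale = scales[energy.index(max(energy))]
```

The coefficient matrix returned by `cwt` is the time-frequency spectrum: rows index the scale (frequency) axis and columns index the time axis. The direct double sum costs O(S · n²) operations, so it is only suitable for small illustrative inputs.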
The specific process of the signal-to-image conversion method is shown in Figure 3. Firstly,
1024 continuous points are sampled randomly from the raw signals [4,7,21,25]. Then the 1024 points
are converted into a 1024 × S time-frequency spectrum that consists of coefficient matrices by
the continuous Morlet wavelet transform. Here S indicates that the value of the scale factor a ranges
from 1 to S. In a practical application, as long as the value of S is sufficiently large, sufficient raw
signal characteristics can be obtained. Finally, the time-frequency spectrum is presented in the form of a
gray-scale image. However, the CNN model usually has difficulty in dealing with a 1024 × S image,
and the extra-large size of the image may result in considerable computational complexity. A simple
image compression method based on bicubic interpolation [48] is used to decrease the size of the
image. In this paper, the size of the compressed gray-scale image varies due to the different volumes
of the signal data.
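The last two steps of the conversion (turning the coefficient matrix into gray levels and shrinking the image) can be sketched as follows. Two points are assumptions rather than details taken from the text: the gray levels are obtained by min-max scaling to the range 0 to 255, and the compression uses simple block averaging as a stand-in for the bicubic interpolation that the paper actually uses. The function names are hypothetical.

```python
def to_grayscale(matrix):
    """Min-max scale a coefficient matrix to integer gray levels 0..255."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0        # avoid division by zero for a constant matrix
    return [[round(255 * (v - lo) / span) for v in row] for row in matrix]


def shrink(image, out_rows, out_cols):
    """Downsample by averaging equal blocks.

    Block averaging is NOT bicubic interpolation; it is used here only to
    keep the sketch dependency-free. The input dimensions are assumed to be
    integer multiples of the output dimensions.
    """
    rh, cw = len(image) // out_rows, len(image[0]) // out_cols
    return [[sum(image[r * rh + i][c * cw + j]
                 for i in range(rh) for j in range(cw)) / (rh * cw)
             for c in range(out_cols)]
            for r in range(out_rows)]


gray = to_grayscale([[0.0, 1.0], [3.0, 4.0]])          # [[0, 64], [191, 255]]
small = shrink([[0, 2, 4, 6],
                [2, 4, 6, 8],
                [1, 1, 3, 3],
                [1, 1, 3, 3]], 2, 2)                   # [[2.0, 6.0], [1.0, 3.0]]
```

With a 1024 × 1024 spectrum, `shrink(image, 32, 32)` would average 32 × 32 blocks, giving an image of the same 32 × 32 size that the first case study reports after compression.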
Figure 3. The specific process of the signal-to-image conversion method.
[Diagram stages: CWT conversion; coefficient matrix; image compression; gray-scale image.]
4.2. Design of the Proposed CNN
Based on the gray-scale images converted from raw vibration signals, a CNN model based on
LeNet-5 is designed and pre-trained for feature learning. The training dataset is used to update the
parameters of CNN by back-propagating errors. Once the training process finishes, the representative
multi-level features can be extracted automatically from these images.
Figure 4 illustrates the architecture of the proposed CNN which contains seven layers, including
one input layer, two convolutional layers, two pooling layers, one fully-connected layer, and one
softmax output layer.
map in C1 is subsampled to the corresponding feature map in layer S2 by the max-pooling operation
of size s1 × s1 . Layer S2 is a pooling layer which is composed of the same number of feature maps of
size ((n1 − c1 + 1)/s1 ) × ((n1 − c1 + 1)/s1 ). Layer C3 and S4 are formed in a similar way. Layer FC5
is a fully-connected layer and the size of each feature map in this layer is 1 × 1. Each pixel in layer FC5
is connected to a variable-sized neighborhood in all feature maps in S4. Finally, a softmax output layer
follows to output the classification results.
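The size formulas above can be checked with a few lines of arithmetic. The helper below is a hypothetical illustration: it assumes stride-1 convolutions without padding and pooling sizes that divide the feature-map size exactly. In the usage example, the 32 × 32 input, the 5 × 5 kernel of C1 and the 2 × 2 pooling of S2 are the values reported in the first case study; reusing the same kernel and pooling sizes for C3 and S4 is an assumption made only for illustration.

```python
def feature_map_sizes(n_input, conv_kernels, pool_sizes):
    """Side length of the feature maps after each convolution and pooling stage.

    conv_kernels : kernel side lengths c1, c2, ...
    pool_sizes   : pooling side lengths s1, s2, ...
    """
    sizes, n = [], n_input
    for c, s in zip(conv_kernels, pool_sizes):
        n = n - c + 1                  # convolutional layer: n - c + 1
        sizes.append(n)
        if n % s:
            raise ValueError("pooling size must divide the feature-map size")
        n //= s                        # pooling layer: (n - c + 1) / s
        sizes.append(n)
    return sizes


# C1, S2, C3, S4 side lengths for a 32 x 32 input.
print(feature_map_sizes(32, [5, 5], [2, 2]))   # [28, 14, 10, 5]
```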
In particular, the feature maps $x^{(i)}$ in layer FC5 are put into the softmax classifier and the probability
distribution of the input sample belonging to each class is calculated as:
$$p\left(y^{(i)} = j \mid x^{(i)}; \theta\right) = \frac{\exp\left(\theta_j^T x^{(i)}\right)}{\sum_{l=1}^{k} \exp\left(\theta_l^T x^{(i)}\right)}, \qquad y = \arg\max_j\, p\left(y^{(i)} = j \mid x^{(i)}; \theta\right), \quad (9)$$
where $i = 1, 2, \ldots, n$; $n$ is the number of training data; $j = 1, 2, \ldots, k$; $k$ is the dimension of the output layer
and it should be set as the number of fault types. Additionally, $\theta$ denotes the parameters of the softmax
classifier.
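Equation (9) can be sketched in a few lines. The code below is an illustration only: `theta` is represented as a list of k weight vectors (one per class), class indices are zero-based, and the maximum score is subtracted before exponentiation for numerical stability, which leaves the probabilities unchanged. The function names are hypothetical.

```python
import math


def softmax_probs(theta, x):
    """Class probabilities of Eq. (9) for one feature vector x."""
    scores = [sum(w * v for w, v in zip(theta_j, x)) for theta_j in theta]
    m = max(scores)                        # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(theta, x):
    """Index j of the most probable class (the argmax in Eq. (9))."""
    probs = softmax_probs(theta, x)
    return max(range(len(probs)), key=probs.__getitem__)


theta = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # k = 3 classes, 2 features
label = predict(theta, [2.0, 0.5])               # class 0 has the largest score
```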
The loss function of the softmax classifier is defined as:
Figure 5. The ensemble of multiple classifiers.
[Diagram labels: S2, S4 and FC5 feature maps; RF1, RF2, RF3; Result1, Result2, Result3; output the ensemble classification result.]
Each RF classifier is trained independently using the feature maps in different layers of CNN.
Once all the RF classifiers are completely trained, the outputs of three RF classifiers can be combined
to obtain better classification performance by the winner-take-all ensemble strategy [38]. This is
because ensemble learning technique has a great impact on the improvement of model performance
and has been widely applied to fault diagnosis [30–32]. In the winner-take-all ensemble strategy,
several base RF classifiers are competing with each other, so the ensemble output is consistent with the
output of the base RF classifier which obtains the best classification performance in different layers.
The general procedure of the proposed fault diagnosis method based on CNN and RF ensemble is
given in Algorithm 1.
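A minimal sketch of the winner-take-all selection is given below. The text does not state on which data the "best classification performance" is measured, so the sketch assumes accuracy on a labelled held-out set; the function names and the data layout (one list of predictions per base classifier) are likewise hypothetical.

```python
def accuracy(pred, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def winner_take_all(val_preds, val_truth, test_preds):
    """Select the base classifier with the best held-out accuracy.

    val_preds  : one list of held-out predictions per base classifier
    val_truth  : true labels of the held-out samples
    test_preds : one list of test predictions per base classifier
    Returns (index of the winner, all accuracies, the winner's test predictions).
    """
    scores = [accuracy(p, val_truth) for p in val_preds]
    winner = max(range(len(scores)), key=scores.__getitem__)
    return winner, scores, test_preds[winner]


# Three base classifiers (for example, one per feature layer).
val_truth = [0, 1, 2, 1]
val_preds = [[0, 1, 1, 1], [0, 1, 2, 1], [2, 1, 2, 0]]
test_preds = [[0, 0], [1, 2], [2, 2]]
winner, scores, final = winner_take_all(val_preds, val_truth, test_preds)
# winner == 1, scores == [0.75, 1.0, 0.5], final == [1, 2]
```

Unlike majority voting, this strategy does not mix the outputs: the ensemble result is exactly the output of the single best-performing base classifier.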
5. Experimental Results
In order to evaluate the effectiveness of the proposed method for bearing fault diagnosis, two
case studies are conducted with two bearing datasets from a Reliance Electric motor and a rolling mill,
respectively. All the experiments are carried out with Matlab R2018a on a desktop computer equipped
with an Intel 4-core 2.3 GHz processor, 8 GB memory and a 500 GB hard disk.
5.1. Case Study 1: Bearing Fault Diagnosis for Reliance Electric Motor
In this experiment, two datasets are generated. Firstly, for each health condition, 50 samples with
1024 data points are randomly selected under each load condition in the training dataset. That is to say,
there are 2000 training samples with 10 health conditions under four load conditions. On the other
hand, in the test dataset, 2000 samples are randomly selected in the same manner. In addition, another
dataset is generated to further verify the robustness and generalization ability of the proposed method.
Furthermore, the training and testing samples of Dataset II are selected under different operational
loads. 1500 samples with 10 health conditions are randomly selected under the loads of 0, 1, and 2
in the training dataset, while the test dataset is composed of 500 samples with 10 health conditions
under the load of 3. More details of the two datasets, named Dataset I and Dataset II, are listed in
Table 1. It should be noted that it is important to randomize the order of all the samples in the dataset
before training.
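The sample counts above follow from 50 segments per health condition and load: 50 × 10 × 4 = 2000 for Dataset I, and 50 × 10 × 3 = 1500 training samples plus 50 × 10 × 1 = 500 test samples for Dataset II. The sketch below reproduces this bookkeeping on synthetic placeholder signals. The dictionary layout keyed by (condition, load), the function names and the fixed seed are assumptions made for illustration and do not describe the actual data files.

```python
import random


def sample_segments(signal, n_segments, length, rng):
    """Cut n_segments windows of `length` consecutive points at random offsets."""
    starts = [rng.randrange(len(signal) - length + 1) for _ in range(n_segments)]
    return [signal[s:s + length] for s in starts]


def build_dataset(signals, loads, per_condition=50, length=1024, seed=0):
    """Return shuffled (segment, condition label) pairs for the chosen loads.

    signals : dict mapping (condition, load) to a list of vibration samples
    loads   : collection of load conditions to include
    """
    rng = random.Random(seed)
    data = []
    for (cond, load), sig in sorted(signals.items()):
        if load in loads:
            segments = sample_segments(sig, per_condition, length, rng)
            data += [(seg, cond) for seg in segments]
    rng.shuffle(data)                  # randomize the sample order before training
    return data


# Placeholder signals: 10 health conditions under 4 loads.
signals = {(c, l): [float(c)] * 2048 for c in range(10) for l in range(4)}
dataset_1_train = build_dataset(signals, loads={0, 1, 2, 3})   # 2000 samples
dataset_2_train = build_dataset(signals, loads={0, 1, 2})      # 1500 samples
dataset_2_test = build_dataset(signals, loads={3})             # 500 samples
```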
The raw vibration signal data are converted to gray-scale images by CWT and the scale factor
of CWT is set as 1024. Because of the volume of the signal data in this case study, all the images are
compressed to the size of 32 × 32 by the imresize function based on bicubic interpolation in Matlab.
In this case study, the detailed structure of the CNN model is shown in Table 2. The performance of
CNN model reaches the peak with the above configuration. Here C1(6@28 × 28) denotes that 6 feature
maps of size 28 × 28 in layer C1 are used, C1(6@5 × 5) denotes that the layer C1 is obtained by the
convolution between the 6 kernels of size 5×5 and the input layer. In addition, S2(2 × 2) denotes that
the pooling layer S2 is obtained by the max-pooling operation on layer C1 of size 2 × 2. The parameters
of the CNN model are optimized in a heuristic way [22]. The value of initial learning rate is selected
from 0.0001 to 0.1 with the step of 0.005. As a result, the model achieves the best training performance
when the initial learning rate is set as 0.05. Considering the number of samples in the training dataset, the
batch size is selected from the values 30, 60, 90 and 120, and the model performance is checked for
each value. The best batch size is 120 for Dataset I and 90 for Dataset II. The number of epochs is set as 60.
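The heuristic search over the learning rate and batch size can be pictured as a small grid search. The sketch below is only a schematic: `evaluate` is a hypothetical callback standing for "train the CNN with these settings and return its performance", and the learning-rate grid is read literally as an arithmetic sequence starting at 0.0001 with step 0.005 (under that reading the grid point nearest to the reported 0.05 is 0.0501).

```python
def candidate_learning_rates(start=0.0001, stop=0.1, step=0.005):
    """Arithmetic grid start, start + step, ... not exceeding stop."""
    n = int((stop - start) / step + 1e-9) + 1
    return [round(start + i * step, 4) for i in range(n)]


def grid_search(learning_rates, batch_sizes, evaluate):
    """Return the (learning rate, batch size) pair with the highest score.

    evaluate(lr, batch_size) is a user-supplied (hypothetical) function that
    trains the model with the given settings and returns a score to maximize.
    """
    best_pair, best_score = None, float("-inf")
    for lr in learning_rates:
        for bs in batch_sizes:
            score = evaluate(lr, bs)
            if score > best_score:
                best_pair, best_score = (lr, bs), score
    return best_pair


# Dummy scoring function used purely to exercise the search loop.
def dummy_score(lr, bs):
    return -abs(lr - 0.05) - abs(bs - 120) / 1000.0


best = grid_search(candidate_learning_rates(), [30, 60, 90, 120], dummy_score)
```

With 20 learning rates and 4 batch sizes the grid contains 80 configurations, each of which would require a full training run in practice.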
Figure 6. The raw vibration signal waveform and conversion results: (a) normal condition; (b) inner
race fault condition; (c) ball fault condition; (d) outer race fault condition.
The weights and bias of the CNN model are optimized by training on the gray-scale image
samples. The training accuracy curves for Dataset I and II are shown in Figure 7. It can be seen that the
proposed CNN model can converge with a limited number of iterations so that the training accuracy
for both datasets can reach almost 100%.
Figure 7. The training accuracy curves of the proposed CNN model: (a) Dataset I; (b) Dataset II.
To demonstrate the representative and robust feature extraction ability of the proposed CNN
model, taking Dataset I as an example, the dimension of the extracted features is reduced to two for
visualization by the t-distributed stochastic neighbor embedding (t-SNE) method. The t-SNE
technique [50], an efficient nonlinear dimensionality reduction method, is used for embedding
high-dimensional data for visualization in a low-dimensional space. The two-dimensional
visualizations of the extracted features in the last layer under load 0, 1, 2, 3, and loads 0–3, and also the
multi-level features under loads 0–3, are shown in Figure 8, in which different colors represent
different health conditions. As shown in Figure 8a–e, it can be found that under all different
operational conditions the extracted high-level features for the same health conditions are relatively
centralized except for very few samples, while the features for the different health conditions are
separated. Therefore, it can be concluded that the proposed CNN model has strong ability in
extracting the representative features, and is certainly effective for multi-classification problems in
fault diagnosis. On the other hand, by the contrast analysis between Figure 8e,f and Figure 8g, it is
worth mentioning that the vast majority of samples belonging to the same health conditions can also
be well gathered together in the hidden layers. In addition, the two-dimensional visualizations of the
extracted features in different levels under the complex operational condition (loads: 0–3) are different
from each other, thus concluding that the low-level features are also useful for fault classification and
the multi-level features can contribute different knowledge to the diagnosis results.
the multi-level features can contribute different knowledge to the diagnosis results.
Figure 8. Visualization of multi-level features via t-SNE: (a) FC5 (Load: 0); (b) FC5 (Load: 1);
(c) FC5 (Load: 2); (d) FC5 (Load: 3); (e) FC5 (Loads: 0–3); (f) S2 (Loads: 0–3); (g) S4 (Loads: 0–3).
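For reference, the standard t-SNE formulation behind such visualizations can be written as below. This is the textbook definition rather than anything specific to this paper; the symbols are notation introduced here: $x_i$ is a high-dimensional feature vector (for example, an FC5 activation), $y_i$ is its two-dimensional embedding, $\sigma_i$ is a per-point bandwidth set by the perplexity, and $N$ is the number of samples.

```latex
% Pairwise similarities in the high-dimensional feature space
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}

% Pairwise similarities in the 2D embedding (Student-t kernel, one degree of freedom)
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

% Cost minimized over the embedding points y_1, ..., y_N
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

Because the cost penalizes placing similar points far apart, tight and well-separated clusters in the 2D plot indicate that samples of the same health condition are also close in the original feature space.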
The extracted feature maps in layers S2, S4 and FC5 are fed into three RF classifiers separately,
and the training error curves are shown in Figure 9. It should be noted that the training errors
for the three RF classifiers are close to zero, which further indicates that the feature maps in
the hidden layers also contain important information that can contribute to the diagnosis results.
Figure 9. The training error curves of 3 RF classifiers: (a) Dataset I; (b) Dataset II.
Based on the learned multi-layer features, the diagnosis experiments on both datasets are executed
10 times. The mean and standard deviation of the diagnostic accuracy of the softmax, RF1, RF2, RF3
and ensemble classifiers are shown in Table 3. As shown in Table 3, the method using different
classifiers in different layers achieves remarkable results. The method using the RF2 classifier
in layer S4 has the highest accuracy for both datasets: 99.73% for Dataset I and 99.08% for
Dataset II. Especially for Dataset II, the accuracies of the classifiers using the feature maps in
other layers range from 95.02% to 98.26%, which are inferior to that of the ensemble classifier.
It is worth emphasizing that the training and testing samples of Dataset II are selected under
different loads, that is to say, the feature distribution of the training data is different from
that of the testing data, so the clustering performance of different layers does not have a direct
impact on classification accuracy for Dataset II. The results show that the different loads in the
training and test datasets have little effect on the classification accuracy of the proposed
method (99.08% for Dataset II versus 99.73% for Dataset I), and that the proposed method improves
the diagnostic accuracy for Dataset II from 97.04% to 99.08% compared to the standard CNN model.
This is because local characteristics and accurate details of faults exist in low-level features
but are not well preserved in high-level features. Therefore, it is not always the best way to
directly use the high-level features for bearing fault diagnosis, where health conditions with
different damage sizes from the same fault type need to be classified. On the other hand, the
extracted low-level features in the hidden layers are universal and similar for training and test
datasets with different but related distributions, while the more abstract high-level features are
sensitive and specific to a particular fault diagnosis dataset or task. Furthermore, the standard
deviation of the proposed method is 0.109% for Dataset I and 0.379% for Dataset II. It can be
concluded that the introduction of the ensemble method of multiple RF classifiers using
multi-level features can improve the robustness and generalization ability of the proposed method.
Table 3. The mean and standard deviation of accuracy results.
Methods            Accuracy for     Standard         Accuracy for      Standard
                   Dataset I (%)    Deviation (%)    Dataset II (%)    Deviation (%)
CNN + Softmax      99.66            0.101            97.04             0.45
CNN + RF1          99.46            0.243            98.26             0.542
CNN + RF2          99.73            0.109            99.08             0.379
CNN + RF3          99.66            0.128            95.02             0.447
Proposed method    99.73            0.109            99.08             0.379
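The mean and standard deviation reported in Table 3 are summary statistics over the 10 repeated runs. A minimal sketch of that computation follows. The per-run accuracies are made-up placeholders, not the paper's results, and the text does not say whether the sample or population standard deviation was used, so the sample form is an assumption.

```python
import statistics

# Hypothetical per-run test accuracies (%) for one classifier over 10 runs.
# These numbers are placeholders for illustration only.
run_accuracies = [99.6, 99.8, 99.7, 99.9, 99.6, 99.7, 99.8, 99.7, 99.6, 99.8]


def summarize(accuracies):
    """Return (mean, sample standard deviation) of a list of accuracies."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


mean_acc, std_acc = summarize(run_accuracies)
print(f"mean = {mean_acc:.2f}%, std = {std_acc:.3f}%")
```

A small standard deviation across runs is what the text refers to as robustness: the result depends little on the random initialization of a given run.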
5.1.3. Comparison with Other Methods
Five fault diagnosis methods are also implemented in this case study for comparison purposes: two
based on traditional machine learning models, namely BPNN (back-propagation neural network) and
SVM, and three based on standard deep learning models, namely DBN, DAE and CNN. A manual feature
extraction process is conducted in the fault diagnosis methods based on BPNN and SVM. Four
time-domain features and six frequency-domain features are manually selected to assist BPNN and
SVM in diagnosing faults. For more details of the adopted manual feature extraction technique,
please refer to Ref. [9]. In the fault diagnosis methods based on standard DBN [17], DAE [20] and
CNN, the representative features are automatically extracted and selected with the aid of the
corresponding data pre-processing methods.
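To illustrate what manual time-domain feature extraction involves, the sketch below computes four statistics that are commonly used for vibration signals. The specific features used for BPNN and SVM are not listed in this section (they are deferred to Ref. [9]), so the choice of RMS, peak, crest factor and kurtosis here is an assumption, and the input is a synthetic sine wave rather than bearing data.

```python
import math


def time_domain_features(signal):
    """Compute four common time-domain statistics of a vibration segment."""
    n = len(signal)
    mean = sum(signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    variance = sum((x - mean) ** 2 for x in signal) / n
    # Kurtosis: fourth central moment normalized by the squared variance.
    kurtosis = sum((x - mean) ** 4 for x in signal) / (n * variance ** 2)
    return {"rms": rms, "peak": peak, "crest_factor": peak / rms, "kurtosis": kurtosis}


# Synthetic test segment: one sine period sampled at 64 points (not bearing data).
segment = [math.sin(2 * math.pi * k / 64) for k in range(64)]
features = time_domain_features(segment)
print(features)
```

Hand-crafted features of this kind must be chosen and tuned per application, which is the manual effort that the automatic feature learning of the deep models avoids.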
All the comparison experiments are conducted 10 times on Datasets I and II. In order to
verify the superiority of the proposed method, the mean accuracy and average computation time of
the above-mentioned methods are compared to those of the proposed method. The comparison results
are shown in Table 4.
It can be seen that the mean diagnosis accuracies of the traditional machine learning-based
methods are significantly worse than those of the proposed method for the two datasets, because
the shallow architecture of these methods cannot explore the complex relationships between the
health conditions and the signal data. In addition, the diagnosis performance of these methods
depends heavily on manual feature extraction, and the manually extracted features represent the
raw data poorly. The proposed method also achieves better results than the other deep
learning-based methods, especially for Dataset II, which further demonstrates the superiority of
the proposed method. On the other hand, the average computation time of the proposed method is
much longer than that of the traditional machine learning-based methods; however, the time spent
on signal preprocessing and manual feature extraction in these methods, which is a very
time-consuming and labor-intensive task, is not taken into account. Additionally, compared with
the standard deep learning-based methods, the proposed method needs more computation time because
the ensemble requires more computation. However, given the development of hardware technology,
the additional computation time of less than 10 seconds is acceptable in view of the accuracy of
the proposed method. The excellent performance of the proposed method is mainly due to the strong
automatic feature learning ability of the proposed CNN model and the generalization ability of
the ensemble classifier.
5.2. Case Study 2: Bearing Fault Diagnosis for BaoSteel Rolling Mill
It should be noted that the zero padding technique is used to keep the dimension of feature maps
unchanged in this case study. The optimal parameters of the CNN model are as follows: the batch
size is set as 120, the number of epochs is set as 50, and the learning rate is set as 0.05.
Table 5. The detailed structure of the CNN model.
Layer Name    Configuration    Kernel/Pooling Size
Input         16 × 16
C1            32@16 × 16       32@3 × 3
S2            32@8 × 8         2 × 2
C3            64@8 × 8         64@3 × 3
S4            64@4 × 4         2 × 2
FC5           128
Output        4
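The feature map sizes in Table 5 follow from zero-padded 3 × 3 convolutions, which preserve the spatial size, and 2 × 2 pooling, which halves it. The sketch below reproduces that arithmetic. Stride-1 convolutions, non-overlapping pooling and a single gray-scale input channel are assumptions that are consistent with the table but not stated explicitly in it.

```python
def conv_same(size, kernel):
    """Output size of a stride-1 convolution with zero ('same') padding."""
    pad = (kernel - 1) // 2
    return size + 2 * pad - kernel + 1


def pool(size, window):
    """Output size of non-overlapping pooling with the given window."""
    return size // window


# (layer name, kind, number of output maps, kernel or window size), from Table 5
layers = [("C1", "conv", 32, 3), ("S2", "pool", 32, 2),
          ("C3", "conv", 64, 3), ("S4", "pool", 64, 2)]

size, shapes = 16, {"Input": (1, 16, 16)}  # one gray-scale channel (assumed)
for name, kind, maps, k in layers:
    size = conv_same(size, k) if kind == "conv" else pool(size, k)
    shapes[name] = (maps, size, size)

# Number of values passed from S4 to the fully connected layer FC5.
flattened = shapes["S4"][0] * shapes["S4"][1] * shapes["S4"][2]
for name, shape in shapes.items():
    print(name, shape)
print("flattened S4 features:", flattened)
```

The same shapes determine the input dimensionality of the RF classifiers trained on the S2 and S4 feature maps once those maps are flattened.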
5.2.2. Results and Discussion
In this experiment, the same signal-to-image conversion process as in case study 1 is executed,
and the conversion results are shown in Figure 10.
Figure 10. The raw vibration signal waveform and conversion results: (a) Normal condition;
(b) Rolling ball defect condition; (c) Inner race defect condition; (d) Outer race defect condition.
From the conversion results, it can be seen that the differences between the gray-scale images,
which contain the signal distribution characteristics in the time-frequency domain, can be easily
detected. This further demonstrates the effectiveness of the conversion method. The CNN model is
pre-trained, and the training accuracy finally reaches 99.5% through the adjustment of weights and
biases. The multi-layer features are also extracted to train different classifiers. The
experiments are executed 10 times, and the mean and standard deviation of the diagnostic accuracy
are shown in Table 6. From Table 6, it can be seen that the method using the RF3 classifier based
on feature maps in layer FC5 outperforms the methods using other classifiers based on feature maps
in other layers. Notably, the accuracy in this case study is lower than that in case study 1. The
architecture of the proposed CNN model was adjusted, but the accuracy showed no noticeable
improvement. Finally, the raw data were checked, and it was found that the training samples
contain some “dirty data”, including redundant data and missing data.
(1) Presenting a novel signal-to-image conversion method based on CWT. The time-frequency spectra
obtained from CWT not only fully capture the non-stationary signal characteristics and contain
abundant fault information, but also can be regarded as two-dimensional gray-scale images,
which are suitable as the input of CNN.
(2) Implementing an automatic feature extractor using CNN. An improved CNN model with
optimized configuration is constructed and pre-trained in a supervised manner. The pre-trained
model has the ability of extracting sensitive and representative multi-level features automatically
from gray-scale images. The extracted multi-level features contain both local and global fault
information and can contribute different knowledge to diagnosis results.
(3) Applying the winner-take-all ensemble strategy of multiple RF classifiers to improve the accuracy
and generalization performance of the fault diagnosis method. The multi-level features, especially
the low-level features in hidden layers, are used for classifying faults independently in this paper.
The extracted features in each layer are fed into an RF classifier and each RF classifier separately
outputs the diagnostic result. The outputs of all classifiers are combined by the winner-take-all
ensemble strategy to generate a more accurate diagnostic result.
(4) Validating the proposed method on two bearing datasets, one from a public website and one from
the BaoSteel MRO Management System. The former achieves a higher diagnostic accuracy (99.73%) than
the latter (97.38%). In particular, the different feature distributions of the training and test
datasets have almost no effect on the classification accuracy, which indicates strong
generalization ability. The experimental results demonstrate the effectiveness and superiority of
the proposed method.
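The winner-take-all combination in item (3) can be sketched as follows. The exact selection rule is not spelled out in this section. Table 3 reports identical figures for CNN + RF2 and the proposed method, which is consistent with selecting the single best-performing classifier and using its output, so that reading is assumed here. The function name and the predicted labels are hypothetical, and in practice the ranking accuracies would need to come from held-out validation data rather than from the test set.

```python
def winner_take_all(accuracies, predictions):
    """Return (winner name, its predictions) for the most accurate classifier.

    accuracies:  dict mapping classifier name -> accuracy used for ranking
    predictions: dict mapping classifier name -> list of predicted labels
    """
    winner = max(accuracies, key=accuracies.get)
    return winner, predictions[winner]


# Mean Dataset II accuracies (%) of the three RF classifiers, from Table 3.
accuracies = {"RF1": 98.26, "RF2": 99.08, "RF3": 95.02}

# Hypothetical predicted labels for five test samples (illustrative only).
predictions = {
    "RF1": [0, 1, 2, 3, 1],
    "RF2": [0, 1, 2, 3, 3],
    "RF3": [0, 2, 2, 3, 3],
}

winner, final_labels = winner_take_all(accuracies, predictions)
print(winner, final_labels)
```

Unlike majority voting, this rule lets a single strong classifier decide, so a weak classifier (here RF3 on Dataset II) cannot pull the ensemble result down.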
Although the proposed method has made some achievements, two limitations still need to be
addressed in future work. Firstly, the raw data quality significantly affects the performance
of the proposed method, so a data cleaning process is vital in practical applications. Secondly,
the enormous amount of training data results in slow convergence. The efficiency of the proposed
method can be enhanced by introducing parallel computing architectures. In addition, the time
complexity of the proposed method needs to be considered in the design and validation of the
deep learning model. For example, there is no need to extract higher-level CNN features when the
low-level features are enough to realize high-precision fault diagnosis, which may improve
efficiency. Finally, the proposed data-driven method is expected to be widely applicable to
the fault diagnosis of other similar types of rotating machinery, such as gearboxes, pumps, and
optical scanning systems.
Author Contributions: G.X.: Methodology, Validation, Writing–original draft; Z.J.: Software; M.L., D.S. and W.S.:
Writing–review & editing.
Funding: Research was funded by the National Nature Science Foundation of China, grant number 61573257
and 71690234.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Jiang, W.; Xie, C.H.; Zhuang, M.Y.; Shou, Y.H.; Tang, Y.C. Sensor Data Fusion with Z-Numbers and Its
Application in Fault Diagnosis. Sensors 2016, 16, 1509. [CrossRef] [PubMed]
2. Liu, R.N.; Yang, B.Y.; Zio, E.; Chen, X.F. Artificial intelligence for fault diagnosis of rotating machinery:
A review. Mech. Syst. Signal Process. 2018, 108, 33–47. [CrossRef]
3. Li, C.; Sánchez, R.; Zurita, G.; Cerrada, M.; Cabrera, D. Fault Diagnosis for Rotating Machinery Using
Vibration Measurement Deep Statistical Feature Learning. Sensors 2016, 16, 895. [CrossRef] [PubMed]
4. Sun, J.; Yan, C.; Wen, J. Intelligent Bearing Fault Diagnosis Method Combining Compressed Data Acquisition
and Deep Learning. IEEE Trans. Instrum. Meas. 2018, 23, 101–110. [CrossRef]
5. Lu, S.; He, Q.; Yuan, T.; Kong, F. Online Fault Diagnosis of Motor Bearing via Stochastic-Resonance-Based
Adaptive Filter in an Embedded System. IEEE Trans. Syst. Man Cybern. Syst. 2017, 47, 1111–1122. [CrossRef]
6. Cai, B.; Zhao, Y.; Liu, H.; Xie, M. A Data-Driven Fault Diagnosis Methodology in Three-Phase Inverters for
PMSM Drive Systems. IEEE Trans. Power Electron. 2016, 32, 5590–5600. [CrossRef]
7. Wen, L.; Li, X.; Gao, L.; Zhang, Y. A New Convolutional Neural Network Based Data-Driven Fault Diagnosis
Method. IEEE Trans. Ind. Electron. 2018, 65, 5990–5998. [CrossRef]
8. Flores-Fuentes, W.; Sergiyenko, O.; Gonzalez-Navarro, F.F.; Rivas-López, M.; Rodríguez-Quiñonez, J.C.;
Hernández-Balbuena, D.; Tyrsa, V.; Lindner, L. Multivariate outlier mining and regression feedback for 3D
measurement improvement in opto-mechanical system. Opt. Quantum Electron. 2016, 48, 403. [CrossRef]
9. Lei, Y.; He, Z.; Zi, Y. EEMD method and WNN for fault diagnosis of locomotive roller bearings.
Expert Syst. Appl. 2011, 38, 7334–7341. [CrossRef]
10. Lin, L.; Ji, H. Signal feature extraction based on an improved EMD method. Measurement 2009, 42, 796–803.
[CrossRef]
11. Ngaopitakkul, A.; Bunjongjit, S. An application of a discrete wavelet transform and a back-propagation
neural network algorithm for fault diagnosis on single-circuit transmission line. Int. J. Syst. Sci. 2013, 44,
1745–1761. [CrossRef]
12. Sun, W.; Chen, J.; Li, J. Decision tree and PCA-based fault diagnosis of rotating machinery. Noise Vib. Worldw.
2007, 21, 1300–1317. [CrossRef]
13. Jin, X.; Zhao, M.; Chow, T.W.S.; Pecht, M. Motor Bearing Fault Diagnosis Using Trace Ratio Linear
Discriminant Analysis. IEEE Trans. Ind. Electron. 2013, 61, 2441–2451. [CrossRef]
14. Yang, Y.; Yu, D.; Cheng, J. A fault diagnosis approach for roller bearing based on IMF envelope spectrum
and SVM. Measurement 2007, 40, 943–950. [CrossRef]
15. Pandya, D.H.; Upadhyay, S.H.; Harsha, S.P. Fault diagnosis of rolling element bearing with intrinsic mode
function of acoustic emission data using APF-KNN. Expert Syst. Appl. 2013, 40, 4137–4145. [CrossRef]
16. Lecun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef] [PubMed]
17. Shao, H.; Jiang, H.; Zhang, X.; Niu, M. Rolling bearing fault diagnosis using an optimization deep belief
network. Meas. Sci. Technol. 2015, 26, 11500. [CrossRef]
18. He, M.; He, D. Deep Learning Based Approach for Bearing Fault Diagnosis. IEEE Trans. Ind. Appl. 2017, 53,
3057–3065. [CrossRef]
19. Qi, Y.; Shen, C.; Wang, D.; Shi, J.; Jiang, X.; Zhu, Z. Stacked Sparse Autoencoder-Based Deep Network for
Fault Diagnosis of Rotating Machinery. IEEE Access 2017, 5, 15066–15079. [CrossRef]
20. Shao, H.; Jiang, H.; Liu, X.; Wu, S. Intelligent fault diagnosis of rolling bearing using deep wavelet
auto-encoder with extreme learning machine. Knowl.-Based Syst. 2018, 140, 1–14.
21. Xia, M.; Li, T.; Xu, L.; Liu, L.; Silva, C.W. Fault Diagnosis for Rotating Machinery Using Multiple Sensors and
Convolutional Neural Networks. IEEE/ASME Trans. Mechatron. 2018, 23, 101–110. [CrossRef]
22. Lee, K.B.; Cheon, S.; Chang, O.K. A Convolutional Neural Network for Fault Classification and Diagnosis in
Semiconductor Manufacturing Processes. IEEE Trans. Semicond. Manuf. 2017, 30, 135–142. [CrossRef]
23. Wen, L.; Gao, L.; Li, X. A New Deep Transfer Learning Based on Sparse Auto-Encoder for Fault Diagnosis.
IEEE Trans. Syst. Man Cybern. Syst. 2017, 99, 1–9. [CrossRef]
24. Li, H.; Chen, J.; Lu, H.; Chi, Z. CNN for saliency detection with low-level feature integration. Neurocomputing
2017, 226, 212–220. [CrossRef]
25. Ding, X.; He, Q. Energy-Fluctuated Multiscale Feature Learning with Deep ConvNet for Intelligent Spindle
Bearing Fault Diagnosis. IEEE Trans. Instrum. Meas. 2017, 66, 1926–1935. [CrossRef]
26. Zeiler, M.D.; Fergus, R. Visualizing and Understanding Convolutional Networks. In Proceedings of the
European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 818–833.
27. Sun, Y.; Wang, X.; Tang, X. Deep Learning Face Representation from Predicting 10,000 Classes. In Proceedings
of the IEEE International Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA,
23–28 June 2014; pp. 1891–1898.
28. Lee, J.; Nam, J. Multi-Level and Multi-Scale Feature Aggregation Using Pre-trained Convolutional Neural
Networks for Music Auto-tagging. IEEE Signal Process. Lett. 2017, 24, 1208–1212. [CrossRef]
29. Sermanet, P.; Lecun, Y. Traffic sign recognition with multi-scale Convolutional Networks. In Proceedings
of the International Joint Conference on Neural Networks, San Jose, CA, USA, 31 July–5 August 2011;
pp. 2809–2813.
30. Zheng, J.; Pan, H.; Cheng, J. Rolling bearing fault detection and diagnosis based on composite multiscale
fuzzy entropy and ensemble support vector machines. Mech. Syst. Signal Process. 2017, 85, 746–759.
[CrossRef]
31. Zhang, X.; Wang, B.; Chen, X. Intelligent fault diagnosis of roller bearings with multivariable ensemble-based
incremental support vector machine. Knowl.-Based Syst. 2015, 89, 56–85. [CrossRef]
32. Shao, H.; Jiang, H.; Lin, Y.; Li, X. A novel method for intelligent fault diagnosis of rolling bearings using
ensemble deep auto-encoders. Mech. Syst. Signal Process. 2018, 102, 278–297. [CrossRef]
33. Li, X.; Ding, Q.; Sun, J.Q. Remaining useful life estimation in prognostics using deep convolution neural
networks. Reliab. Eng. Syst. Saf. 2018, 172, 1–11. [CrossRef]
34. Do, V.T.; Chong, U.P. Signal Model-Based Fault Detection and Diagnosis for Induction Motors Using Features
of Vibration Signal in Two-Dimension Domain. Stroj. Vestn. 2011, 57, 655–666. [CrossRef]
35. Chohra, A.; Kanaoui, N.; Amarger, V. A Soft Computing Based Approach Using Signal-To-Image Conversion
for Computer Aided Medical Diagnosis. Inf. Process. Secur. Syst. 2005, 365–374. [CrossRef]
36. Flores-Fuentes, W.; Rodríguez-Quiñonez, J.C.; Hernandez-Balbuena, D.; Rivas-López, M.; Sergiyenko, O.;
Gonzalez-Navarro, F.F.; Rivera-Castillo, J. Machine vision supported by artificial intelligence. In Proceedings
of the IEEE International Symposium on Industrial Electronics, Istanbul, Turkey, 1–4 June 2014.
37. Wang, G.; Sun, J.; Ma, J.; Xu, K.; Gu, J. Sentiment classification: The contribution of ensemble learning.
Decis. Support Syst. 2014, 57, 77–93. [CrossRef]
38. Guo, C.; Yang, Y.; Pan, H.; Li, T.; Jin, W. Fault analysis of High Speed Train with DBN hierarchical
ensemble. In Proceedings of the International Joint Conference on Neural Networks, Vancouver, BC,
Canada, 24–29 July 2016.
39. Wang, Z.; Lai, C.; Chen, X.; Yang, B.; Zhao, S.; Bai, X. Flood hazard risk assessment model based on random
forest. J. Hydrol. 2015, 527, 1130–1141. [CrossRef]
40. Santur, Y.; Karaköse, M.; Akin, E. Random forest based diagnosis approach for rail fault inspection in
railways. In Proceedings of the National Conference on Electrical, Electronics and Biomedical Engineering,
Bursa, Turkey, 1–3 December 2016.
41. Qian, Q.; Jin, R.; Yi, J.; Zhang, L.; Zhu, S. Efficient distance metric learning by adaptive sampling and
mini-batch stochastic gradient descent (SGD). Mach. Learn. 2013, 99, 353–372. [CrossRef]
42. Safavian, S.R.; Landgrebe, D. A survey of decision tree classifier methodology. IEEE Trans. Syst. Man Cybern.
2002, 21, 660–674. [CrossRef]
43. Lécun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition.
Proc. IEEE 1998, 86, 2278–2324. [CrossRef]
44. Huang, S.J.; Hsieh, C.T. High-impedance fault detection utilizing a Morlet wavelet transform approach.
IEEE Trans. Power Deliv. 1999, 14, 1401–1410. [CrossRef]
45. Lin, J.; Qu, L. Feature extraction based on morlet wavelet and its application for mechanical fault diagnosis.
J. Sound Vib. 2000, 234, 135–148. [CrossRef]
46. Peng, Z.K.; Chu, F.L. Application of the wavelet transform in machine condition monitoring and fault
diagnostics: A review with bibliography. Mech. Syst. Signal Process. 2004, 18, 199–221. [CrossRef]
47. Verstraete, D.; Ferrada, A.; Droguett, E.L.; Meruane, V.; Modarres, M. Deep Learning Enabled Fault Diagnosis
Using Time-Frequency Image Analysis of Rolling Element Bearings. Shock Vib. 2017, 2017, 1–17. [CrossRef]
48. MathWorks. Available online: http://cn.mathworks.com/help/images/ref/imresize.html (accessed on
12 December 2018).
49. Loparo, K. Case Western Reserve University Bearing Data Centre Website. 2012. Available online:
http://csegroups.case.edu/bearingdatacenter/pages/download-data-file (accessed on 12 December 2018).
50. Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9,
2579–2605.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).