An autoencoder based algorithm for fault detection of rotating machines, suitable for online learning and standalone applications
Abstract— An autoencoder-based algorithm for fault detection in rotating machines is presented in this paper. As the cornerstone of any machine learning model, feature engineering is studied thoroughly to select the best set of features for the application. An autoencoder architecture is then employed to detect anomalous behavior of the machine. A laboratory setup was designed and built to provide the training and test data and to validate the proposed algorithm. It is also shown that the proposed method is well suited for online learning and real-time applications.

Keywords— Predictive maintenance; Novelty detection; Autoencoders; Wavelet Transform; Rotating machines

I. INTRODUCTION

Modern rotating machines such as turbomachines, pumps, centrifugal compressors, motors, generators, and machine tool spindles are widely used in industrial applications, and their failure leads to substantial economic loss. A reliable method to detect impending faults at a very early stage is therefore of the highest importance. During the past few decades, researchers and engineers have developed condition monitoring and predictive maintenance techniques to address this issue. Until recently, fault diagnosis methods were labor-intensive and expensive. There have been vast and ongoing efforts to automate this process by using intelligent algorithms [1].

Fault diagnosis is typically done in three steps: acquiring raw signals, feature extraction, and fault detection. Powerful feature engineering is the core of successful fault detection. Features used in fault detection are typically classified into the time domain, frequency domain, and time-frequency domain.

Traditional methods for feature extraction in the time domain were mostly based on visual inspection of the vibration signal using an oscilloscope [2]. Gustafsson and Tallian [3] proposed a method of defect detection based on the number of vibration peaks of "almost periodical" character produced by rotating a damaged bearing at a constant speed. This method had limitations, such as the inability to detect faults at varying speeds and the need for bulky instrumentation.

The most common way of extracting time domain features is to measure the signal root-mean-square (RMS) level and the crest factor, i.e., the ratio of the maximum signal value to its RMS [2]. Miyachi and Seki [4] used RMS and crest factor as time domain features to diagnose localized faults in ball bearings. The RMS value is very useful for detecting imbalance in rotating machines, since it is a measure of the signal's energy level [1].

Kurtosis is another well-known time domain feature. It is the normalized fourth central moment of the signal distribution and measures how peaked or heavy-tailed the distribution is relative to a normal distribution. Altmann and Mathew [5] used kurtosis to detect gearbox and bearing faults.

Williams et al. [6] used both accelerometer and acoustic emission sensors to detect faults in rolling bearings. They combined RMS, peak value, crest factor, and kurtosis with high-frequency resonance techniques to diagnose faults in both roller and ball bearings.

Frequency domain features are another category of useful features in condition monitoring and fault detection. They have been studied extensively in the last three decades and have generally shown better performance than time domain features [1]. The Fast Fourier Transform (FFT) is the most traditional approach to feature engineering in the frequency domain. Baydar and Ball [7] successfully used the power spectrum, which is the square of the FFT amplitude, to detect gear deterioration under varying load conditions.

Tandon [8] successfully used the power cepstrum, which is computed as the inverse Fourier transform of the logarithm of the power spectrum, to detect outer race defects in bearings. However, the method was unable to detect inner race faults.

During the past decade, researchers have studied time-frequency techniques to overcome the shortcomings of frequency domain features. Time-frequency analysis is extremely effective when working with non-stationary vibration signals.
Early techniques in time-frequency analysis were modified versions of the Fourier transform, such as the windowed Fourier transform [9] and the Short-Time Fourier Transform (STFT) [10]. However, the STFT and the windowed Fourier transform were unable to provide multiple resolutions in both time and frequency, a vital property for processing signals that have both low-frequency and high-frequency components [11,12].

The Continuous Wavelet Transform (CWT) is probably the most popular time-frequency method in condition monitoring due to its efficient computational implementation and flexibility [13]. Wang and Wong [14] used the CWT to diagnose helicopter gear faults. Their results showed the superiority of wavelet-based methods over traditional approaches such as the synchronous signal averaging technique, especially in detecting faults under low-load conditions.

The Discrete Wavelet Transform (DWT) and Wavelet Packet Decomposition (WPD) have also been used to diagnose faults in rolling elements [13]. Goumas et al. [11] used WPD for online quality control of washing machines, but the results were inferior to those of simple Fourier-based features.

The last step in fault diagnosis is fault detection and identification. Neural networks have been widely used in fault detection for rotating machines, generally as fault classifiers [15]. However, some kinds of neural networks can be used as so-called novelty detectors. These models can detect faulty signals even though they were trained only on healthy signals. Principal component analysis (PCA), Self-Organizing Maps (SOM), and autoencoders are among the models of this kind that have been investigated by researchers [16]. Using the Mahalanobis distance as the criterion is another novelty detection method. In this method, if the Mahalanobis distance between the new sample and the healthy data set is less than a threshold, the sample is classified as healthy; if the distance is greater than the threshold, it is labeled as faulty. Timusk et al. [17] studied these novelty detectors and found that a combined classifier, using the majority vote of multiple classifiers, was the most accurate.

Petsche et al. [18] used an autoencoder in an online motor fault detection system. They measured the current on a single phase of the power supply to estimate the vibration of the motor. The signals were acquired online and then sent to a PC to check for anomalous signals. The performance of the model was acceptable in real-world applications.

Many other fault detection methods have been proposed, such as fuzzy set-based and expert system-based techniques. Their main weakness is that nearly all of them lack the ability to self-learn [15]. This means that using them requires extensive prior knowledge about the system and the characteristics of its faults.

In this paper, we try to overcome this issue by proposing a method well suited for online learning in monitoring applications. In Section II, we study the feature engineering process, in which the most suitable features are selected. In Section III, the autoencoder architecture is discussed as the novelty detection algorithm. The experimental setup and the results are presented in Section IV.

II. FEATURE ENGINEERING

Common features in the time and frequency domains used in condition monitoring are summarized in Tables I and II. The important features in the time domain are RMS, square root of the amplitude (SRA), kurtosis value (KV), skewness value (SV), and peak-peak value (PPV). In addition, there are some dimensionless features, such as crest factor (CF), impulse factor (IF), margin factor (MF), shape factor (SF), and kurtosis factor (KF) [19].

TABLE I. TIME DOMAIN FEATURES
Formula | Description
$X_{rms} = \left( \frac{1}{N} \sum_{i=1}^{N} x_i^2 \right)^{1/2}$ | Root mean square of the signal (RMS)
$X_{sra} = \left( \frac{1}{N} \sum_{i=1}^{N} \sqrt{|x_i|} \right)^{2}$ | Square root of the amplitude
$X_{kv} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{x_i - \bar{x}}{\sigma} \right)^{4}$ | Kurtosis value
$X_{sv} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{x_i - \bar{x}}{\sigma} \right)^{3}$ | Skewness value
$X_{ppv} = \max(x_i) - \min(x_i)$ | Peak-peak value
$X_{cf} = \frac{\max(|x_i|)}{\left( \frac{1}{N} \sum_{i=1}^{N} x_i^2 \right)^{1/2}}$ | Crest factor
$X_{if} = \frac{\max(|x_i|)}{\frac{1}{N} \sum_{i=1}^{N} |x_i|}$ | Impulse factor
$X_{mf} = \frac{\max(|x_i|)}{\left( \frac{1}{N} \sum_{i=1}^{N} \sqrt{|x_i|} \right)^{2}}$ | Margin factor
$X_{sf} = \frac{\left( \frac{1}{N} \sum_{i=1}^{N} x_i^2 \right)^{1/2}}{\frac{1}{N} \sum_{i=1}^{N} |x_i|}$ | Shape factor
$X_{kf} = \frac{\frac{1}{N} \sum_{i=1}^{N} \left( \frac{x_i - \bar{x}}{\sigma} \right)^{4}}{\left( \frac{1}{N} \sum_{i=1}^{N} x_i^2 \right)^{2}}$ | Kurtosis factor

Here $N$ is the number of samples $x_i$, and $\bar{x}$ and $\sigma$ are the mean and standard deviation of the signal.

Three features were also extracted from the frequency domain based on the FFT of the signals. These features are frequency center (FC), RMS frequency (RMSF), and root variance frequency (RVF), defined as follows:
TABLE II. FREQUENCY DOMAIN FEATURES
Formula | Description
$X_{fc} = \frac{\sum_{i=0}^{N-1} f_i \, s(f_i)}{\sum_{i=0}^{N-1} s(f_i)}$ | Frequency center
$X_{rmsf} = \left( \frac{\sum_{i=0}^{N-1} f_i^2 \, s(f_i)}{\sum_{i=0}^{N-1} s(f_i)} \right)^{1/2}$ | RMS frequency
$X_{rvf} = \left( \frac{\sum_{i=0}^{N-1} (f_i - X_{fc})^2 \, s(f_i)}{\sum_{i=0}^{N-1} s(f_i)} \right)^{1/2}$ | Root variance frequency

Here $s(f_i)$ is the value of the spectrum at frequency $f_i$.

The wavelet transform maps a signal from the time domain to the time-frequency domain, in contrast to the Fourier transform, in which the signal is mapped only to the frequency domain. The continuous wavelet transform does this by introducing a wavelet function that is much shorter than the raw signal and calculating the inner product of part of the signal with the wavelet function. The wavelet function is shifted along the signal, and the inner products, as a measure of similarity, are calculated. In the next step, the frequency of the wavelet function is changed, i.e., the function becomes denser or wider, and the process of shifting the wavelet and calculating the inner products is repeated, according to

$CWT(a, b) = \frac{1}{\sqrt{a}} \int s(t) \, w\left( \frac{t - b}{a} \right) dt$    (1)

where s is the signal in the time domain and a and b are the scaling and shifting parameters of the wavelet function w(t). The output of this continuous transformation is a two-dimensional function. The value of the function at each point shows the similarity between the signal and the wavelet function at a specific moment in time and a specific frequency of the wavelet function. The main advantage of this transformation, compared to the Fourier transform, is its capability to capture non-stationary and transient changes in the signal.

From another point of view, a wavelet function is a band-pass filter that halves the frequency band at each scaling level [20]. The discrete wavelet transform (DWT) can be computed by passing the signal through a series of filters. First, the samples are passed through a low-pass filter. The output is an approximation of the signal, called the approximation coefficients. The signal is also passed through a high-pass filter, whose output gives the detail coefficients. To further decompose the signal, this process is repeated on the approximation coefficients of each level, as depicted in Figure 1, where A and D are the approximation and detail coefficients. The high-pass and low-pass filters are extracted from the discretized form of the wavelet function, sampled at a finite number of points.

Fig. 1. Discrete Wavelet transform [11]

As mentioned, the DWT only decomposes the approximation part of the signal at each level, i.e., further levels of the transformation refine only the low-frequency part of the signal. At the same time, there is much valuable information in the high-frequency part for extracting discriminative features.

Wavelet Packet Decomposition (WPD) is designed to tackle this issue. In the WPD, in contrast to the DWT, both the detail and the approximation coefficients are decomposed at each level, allowing the capture of information in both the low-frequency and high-frequency parts of the signal. WPD has been widely used in many areas such as image processing and fault diagnosis [21].

Wavelet Packet Decomposition has been used here to generate new features. DB4, one of the Daubechies wavelets, is selected, and the decomposition is carried out to the fourth level, creating 16 nodes at the last level. The energies of the coefficients in the last level are selected as the new features [22].

III. THE AUTOENCODER ARCHITECTURE

Since the aim of this research was to introduce an algorithm suitable for online learning and standalone applications, standard classification and clustering algorithms used in condition monitoring, such as common neural network classifiers, were not suitable. Those algorithms need both healthy and faulty samples in the training phase. However, in an online learning application, the device does not have access to faulty samples. The appropriate algorithm should identify anomalies even though it is trained only on healthy signals. These algorithms are called novelty detectors [16].

Autoencoders are a class of Multi-Layer Neural Networks (MLNN) consisting of two parts: an encoder and a decoder. The encoder reduces the dimension of the input vector, and the decoder tries to reconstruct the original input vector. If trained appropriately, the network will be able to reconstruct vectors that are similar to the vectors in its training set with a small margin of error. However, the network cannot reconstruct input vectors that are very different from the training set. This characteristic can be used
as a means for novelty detection. First, the network is trained
only by healthy data; then, when an input vector is fed to the
network, the less the reconstructed vector is similar to the
original input, the more anomalous the input is compared to
the training set.
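
The reconstruction-based criterion described above can be written compactly. The following is only a sketch in assumed notation: the symbols $f_{\mathrm{enc}}$, $f_{\mathrm{dec}}$, $e$, $z$, $\mu_{\mathrm{train}}$, $\sigma_{\mathrm{train}}$, and $\tau$ are introduced here for illustration and are not taken from the original text.

```latex
\hat{x} = f_{\mathrm{dec}}\bigl(f_{\mathrm{enc}}(x)\bigr), \qquad
e(x) = \lVert x - \hat{x} \rVert_2, \qquad
z(x) = \frac{e(x) - \mu_{\mathrm{train}}}{\sigma_{\mathrm{train}}}, \qquad
\mathrm{label}(x) =
\begin{cases}
\text{faulty},  & z(x) > \tau \\
\text{healthy}, & \text{otherwise}
\end{cases}
```

Here $\mu_{\mathrm{train}}$ and $\sigma_{\mathrm{train}}$ stand for the mean and standard deviation of $e$ over the healthy training set, and $\tau$ for a threshold chosen from the distribution of healthy scores.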
An autoencoder with two hidden layers was trained
using the back-propagation algorithm. The number of neurons
and activation functions are illustrated in Table III.
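
As an illustration of this training and scoring procedure, the following is a minimal sketch, assuming NumPy and purely synthetic data. The layer sizes, tanh activations, learning rate, epoch count, and the data themselves are hypothetical choices made for this example; they are not the configuration of Table III or the experimental data. The cutoff 2.6127 is the threshold value quoted in the results and is used here only as an example.

```python
import numpy as np

def forward(X, W, b):
    """Return the activations of every layer (tanh hidden layers, linear output)."""
    acts = [X]
    for k in range(len(W)):
        z = acts[-1] @ W[k] + b[k]
        acts.append(z if k == len(W) - 1 else np.tanh(z))
    return acts

def train_autoencoder(X, hidden=(3, 2), lr=0.01, epochs=3000, seed=0):
    """Fit an autoencoder to healthy data X by full-batch back-propagation."""
    rng = np.random.default_rng(seed)
    sizes = [X.shape[1], *hidden, X.shape[1]]  # hypothetical layer sizes
    W = [rng.normal(0.0, 0.3, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    b = [np.zeros(n) for n in sizes[1:]]
    for _ in range(epochs):
        acts = forward(X, W, b)
        # Gradient of the mean squared reconstruction error at the output.
        delta = 2.0 * (acts[-1] - X) / len(X)
        for k in reversed(range(len(W))):
            grad_W = acts[k].T @ delta
            grad_b = delta.sum(axis=0)
            if k > 0:
                # Back-propagate through the tanh of the previous layer.
                delta = (delta @ W[k].T) * (1.0 - acts[k] ** 2)
            W[k] -= lr * grad_W
            b[k] -= lr * grad_b
    return W, b

def reconstruction_error(X, W, b):
    """Euclidean norm of (reconstruction - input) for each row of X."""
    return np.linalg.norm(forward(X, W, b)[-1] - X, axis=1)

# Synthetic stand-in data (hypothetical): healthy samples lie near a line in
# a 4-D feature space; faulty samples are shifted off that line.
rng = np.random.default_rng(1)
direction = np.array([1.0, 0.5, -0.5, 0.0]) / np.sqrt(1.5)
healthy = rng.normal(size=(200, 1)) * direction + 0.05 * rng.normal(size=(200, 4))
faulty = rng.normal(size=(50, 1)) * direction + 0.05 * rng.normal(size=(50, 4))
faulty[:, 3] += 3.0

# Train on healthy samples only, then score both sets.
W, b = train_autoencoder(healthy)
err_healthy = reconstruction_error(healthy, W, b)
err_faulty = reconstruction_error(faulty, W, b)

# Normalize with the healthy-set mean and standard deviation before thresholding.
score_faulty = (err_faulty - err_healthy.mean()) / err_healthy.std()
print(err_healthy.mean(), err_faulty.mean(), (score_faulty > 2.6127).mean())
```

The sketch trains only on the healthy samples and never sees a faulty one, which mirrors the constraint of the online setting. In practice the input vectors would be the extracted time, frequency, and wavelet packet features rather than synthetic points.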
Fig. 4. The loss function for the training and test sets

As the last step, the results were normalized by subtracting the mean of the training set from the output values and dividing them by the training set standard deviation. Assuming that the results for the healthy data follow a Student's t-distribution, the 99% confidence level can be calculated. Values greater than the confidence level threshold are labeled as faulty. The threshold is 2.6127 in this case. Figure 5 shows the result for all samples.

Fig. 5. Normalized norm of the difference between the reconstructed vector and the original input

The accuracy of the model can be computed by counting the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) in the result, which are shown in Table IV. This table is also called the confusion matrix of a classifier.

TABLE IV. CONFUSION MATRIX
 | Predicted Healthy | Predicted Faulty
Actual Healthy | 169 (True Negative) | 1 (False Positive)
Actual Faulty | 53 (False Negative) | 157 (True Positive)

Four common criteria used in evaluating statistical models are precision, recall, accuracy, and F1 score, described in equations 2 to 5. The values of these criteria are shown in Table V.

$precision = \frac{TP}{TP + FP}$    (2)

$recall = \frac{TP}{TP + FN}$    (3)

$accuracy = \frac{TP + TN}{TP + FN + FP + TN}$    (4)

$F1\ Score = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}$    (5)

TABLE V. MODEL PERFORMANCE SCORE
Criteria | Precision | Recall | Accuracy | F1 Score
Score | 0.994 | 0.748 | 0.858 | 0.853

V. CONCLUSION

In this article, we developed an algorithm that is suitable for online fault detection of rotating machines. Data were acquired for both healthy and faulty conditions using an experimental setup, and time, frequency, and time-frequency features were extracted from the signals. An autoencoder was used as the model. The main challenge was that the model had to be trained only on the features of healthy signals, since it is going to be used in online learning applications, where there is no labeled data in the process of training the model. The model was trained, and the results were discussed. The autoencoder does well in labeling healthy signals, yet there is still room for improvement in correctly labeling the faulty signals.

REFERENCES
[1] Hongyu Yang, Joseph Mathew and Lin Ma, "Vibration Feature Extraction Techniques for Fault Diagnosis of Rotating Machinery - A Literature Survey", in Asia-Pacific Vibration Conference, Gold Coast, Australia, 12-14 November 2003.
[2] N. Tandon and A. Choudhury, "A review of vibration and acoustic measurement methods for the detection of defects in rolling element bearings", Tribology International, vol. 32, pp. 469-480, 1999.
[3] Olof G. Gustafsson and Tibor Tallian, "Detection of Damage in Assembled Rolling Element Bearings", Tribology Transactions, vol. 5(1), pp. 197-209, 1962.
[4] Miyachi T and Seki K, "An investigation of the early detection of defects in ball bearings using vibration monitoring — practical limit of detectability and growth speed of defects", in Proceedings of the International Conference on Rotordynamics, JSME, IFToMM, Tokyo, pp. 403-408, 14-17 September 1986.
[5] Altmann J and Mathew J, "High Frequency Transient Analysis for the Detection and Diagnosis of Faults in Low Speed Rolling Element Bearings", Asia-Pacific Vibration Conference, pp. 730-735, 1997.
[6] T. Williams, X. Ribadeneira, S. Billington and T. Kurfess, "Rolling element bearing diagnostics in run-to-failure lifetime testing", Mechanical Systems and Signal Processing, vol. 15(5), pp. 979-993, 2002.
[7] Baydar N and Ball A, "Detection of gear deterioration under varying load conditions by using the instantaneous power spectrum", vol. 14(6), pp. 907-921, 2000.
[8] Tandon N, "A comparison of some vibration parameters for the condition monitoring of rolling element bearings", Measurement, vol. 12(3), pp. 285-289, January 1994.
[9] Pan M-C, Sas P, and van Brussel H, "Non-stationary time-frequency analysis for machine condition monitoring", in Proceedings of the Third International Symposium on Time-Frequency and Time-Scale Analysis (TFTS-96), pp. 477-480, 1996.
[10] Klein R, Ingman D, and Braun S, "Non-stationary signals: phase-energy approach-theory and simulations", Mechanical Systems and Signal Processing, vol. 15(6), pp. 1061-1089, 2001.
[11] S. Goumas, M. Zervakis, A. Pouliezos and G.S. Stavrakakis, "Intelligent online quality control of washing machines using discrete wavelet analysis features and likelihood classification", Proceedings of SPIE - The International Society for Optical Engineering, vol. 14(5), pp. 655-666, 2001.
[12] Qian S and Chen D, "Joint Time-Frequency Analysis: Methods and Applications", Prentice-Hall PTR, New Jersey, 1996.
[13] Nikolaou N G and Antoniadis I A, "Rolling element bearing fault diagnosis using wavelet packets", NDT & E International, vol. 35(3), pp. 197-205, 2002.
[14] W. Wang and A. K. Wong, "Some new signal processing approaches for gear fault diagnosis", ISSPA '99, Proceedings of the Fifth International Symposium on Signal Processing and its Applications (IEEE Cat. No.99EX359), Brisbane, Queensland, Australia, vol. 2, pp. 587-590, 1999.
[15] Hongyu Yang, Joseph Mathew and Lin Ma, "Intelligent Diagnosis of Rotating Machinery Faults - A Review", 3rd Asia-Pacific Conference on Systems Integrity and Maintenance, Cairns, Australia, pp. 385-392, 2002.
[16] Markos Markou and Sameer Singh, "Novelty detection: a review—part 2: neural network based approaches", Signal Processing, vol. 83(12), pp. 2499-2521, December 2003.
[17] Markus Timusk, Maike Lipsett and Chris K. Mechefske, "Fault detection using transient machine signals", Mechanical Systems and Signal Processing, vol. 22(7), pp. 1724-1749, 2008.
[18] Thomas Petsche, Angelo Marcantonio, Christian Darken, Stephen J. Hanson, Gary M. Kuhn and Iwan Santoso, "A Neural Network Autoassociator for Induction Motor Failure Prediction", Advances in Neural Information Processing Systems, vol. 8, pp. 924-930, 1995.
[19] Zhanguo Xia, Shixiong Xia, Ling Wan and Shiyu Cai, "Spectral Regression Based Fault Feature Extraction for Bearing Accelerometer Sensor Signals", Sensors, vol. 12, pp. 13694-13719, 2012.
[20] S. Gergely, M.N. Roman and R.V. Ciupa, "Wavelet Transform Using DSP Microcontroller", in Jobbágy Á. (ed.), 5th European Conference of the International Federation for Medical and Biological Engineering, IFMBE Proceedings, vol. 37, pp. 117-120, 2011.
[21] Hutian Feng, Rong Chen and Yiwei Wang, "Feature extraction for fault diagnosis based on wavelet packet decomposition: An application on linear rolling guide", Advances in Mechanical Engineering, vol. 10(8), pp. 1-12, 2018.
[22] Paramita Chattopadhyay and Pratyay Konar, "Feature Extraction using Wavelet Transform for Multi-class Fault Detection of Induction Motor", Journal of The Institution of Engineers (India) Series B, vol. 95(1), pp. 73-81, May 2014.