Preprocessing-Free Gear Fault Diagnosis Using Small Datasets With Deep Convolutional Neural Network-Based Transfer Learning
ABSTRACT Early diagnosis of gear transmission systems has been a significant challenge, because gear faults occur primarily at the microstructure or even material level but their effects can only be observed indirectly at a system level. The performance of a gear fault diagnosis system depends significantly on the features extracted and the classifier subsequently applied. Traditionally, fault-related features are extracted and identified based on domain expertise through data preprocessing, which is system-specific and may not be easily generalized. On the other hand, although deep neural network-based approaches featuring adaptive feature extraction and inherent classification have recently attracted attention, they usually require a substantial set of training data. Aiming at tackling these issues, this paper presents a deep convolutional neural network-based transfer learning approach. The proposed transfer learning architecture consists of two parts: the first part is constructed with a pre-trained deep neural network that serves to extract features automatically from the input, and the second part is a fully connected stage, trained using gear fault experimental data, that classifies the features. Case analyses using experimental data from a benchmark gear system indicate that the proposed approach not only enables preprocessing-free adaptive feature extraction, but also requires only a small set of training data.
INDEX TERMS AlexNet, deep convolutional neural network, gear fault diagnosis, transfer learning.
2169-3536 © 2018 IEEE. IEEE Access, Volume 6, 2018. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
analysis utilizing Wigner-Ville distribution [9], [16], short-time Fourier transform [10], [17], and various wavelet transforms [11], [18]. The time-frequency distribution in such analysis can in theory lead to rich analysis results regarding the time- and frequency-related events in signals.

Although the manual and empirical methods of feature extraction have seen various levels of success, their effectiveness obviously hinges upon the specific features adopted in the diagnostic analysis. It is worth emphasizing that the choices of features, as well as the often-applied signal preprocessing techniques, are generally based on domain expertise and subjective decisions for a specific gear system. For example, while wavelet transforms have been popular and it is well known that each wavelet coefficient can be interpreted as the energy concentration at a specific time-frequency point, it is evident from a large amount of literature that there does not seem to be a consensus on what kind of wavelet to use for gear fault diagnosis. This should not come as a surprise. On one hand, gear faults occur primarily at the microstructure or even material level, but their effects can only be observed indirectly at a system level; consequently, there exists a many-to-many relationship between actual faults and the observable quantities (i.e., features) for a given gear system [19]. On the other hand, different gear systems have different designs, which lead to very different dynamic characteristics. As such, the features manually selected and, to a large extent, the methodology employed to extract these features for one gear system design may not be easily extrapolated to a different gear system design.

Fundamentally, condition monitoring and fault diagnosis of gear systems belong to the general field of pattern recognition. The advancements in related algorithms, along with the rapid enhancement of computational power, have triggered the widespread adoption of machine learning techniques in various applications. Most recently, deep neural network-based methods are progressively being investigated. When the parameters of a deep neural network are properly trained by available data, representative features can be extracted in a hierarchy of conceptual abstractions, which are free of human interference compared to manual selection of features. Some recent studies have adopted such approaches in gear fault diagnosis, aiming at identifying features implicitly and adaptively and then classifying damage/faults in an automated manner with minimal tuning. For example, Zhang et al. [20] developed a deep learning network for degradation pattern classification and demonstrated its efficacy using a turbofan engine dataset. Li et al. [21] proposed a deep random forest fusion technique for gearbox fault diagnosis which achieves 97.68% classification accuracy. Weimer et al. [22] examined the usage of deep convolutional neural networks for industrial inspection and demonstrated excellent defect detection results. Ince et al. [23] developed a fast motor condition monitoring system using a 1-D convolutional neural network with a classification accuracy of 97.4%. Abdeljaber et al. [24] performed real-time damage detection using convolutional neural networks and showcased satisfactory efficiency.

Deep neural network is undoubtedly a powerful tool in pattern recognition and data mining. As an end-to-end hierarchical system, it inherently blends the two essential elements in condition monitoring, feature extraction and classification, into a single adaptive learning frame. It should be noted that the amount of training data required for satisfactory results depends on many aspects of the specific problem being tackled, such as the correctness of training samples, the number of pattern classes to be classified, and the degree of separation between different classes. In most machinery diagnosis investigations, the lack of labeled training samples, i.e., experimental data of known failure patterns, is a common issue, because it is impractical to collect experimental data for each failure type and especially each severity for a machinery system. To improve the performance given limited training data, some recent studies have attempted to combine preprocessing and data augmentation techniques, e.g., discrete wavelet transform [25], antialiasing/decimation filter [23], and wavelet packet transform [21], with neural networks for fault diagnosis. Nevertheless, the preprocessing techniques employed, which are subject to selection based on domain expertise, may negatively impact the objective nature of neural networks and to some extent undermine the usage of such tools.

In this research, aiming at advancing the state-of-the-art, we present a deep neural network-based transfer learning approach utilizing limited time-domain data for gearbox fault diagnosis. One-dimensional time-domain data of vibration responses related to gear fault patterns are converted into graphical images as input. The approach inherits the non-biased nature of neural networks, which avoids the manual selection of features. Meanwhile, the issue of limited data is overcome by formulating a new neural network architecture that consists of two parts. Massive image data (1.2 million images) from ImageNet (http://www.image-net.org/challenges/LSVRC/2010/) are used first to train an original deep neural network model, denoted as neural network A. The parameters of neural network A are transferred (copied) to the new architecture as the first part. The second part of the architecture, an untrained neural network B, accommodates the gear fault diagnosis task and is further trained using experimentally generated gear fault data. Unlike traditional neural networks, the training sets of transfer learning do not necessarily belong to the same category or come from the same physical background [26]. As will be demonstrated later, with this new architecture, highly accurate gear fault diagnosis can be achieved using limited time-domain data directly, without involving any subjective preprocessing techniques to assist feature extraction. The rest of this paper is organized as follows. In Section II, building upon convolutional neural network and transfer learning, we develop the specific architecture for gear fault diagnosis. In Section III, experimental data are analyzed using the proposed approach with uncertainties and noise; comparisons with respect to different approaches are conducted as well. Concluding remarks are summarized in Section IV.
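The signal-to-image conversion mentioned above is not detailed in this excerpt; as a minimal sketch under assumptions (the image size and the min-max normalization are illustrative, not the paper's settings), a 1-D vibration record can be wrapped into a square grayscale image suitable for a CNN input:

```python
import numpy as np

def signal_to_image(signal, size=64):
    """Wrap the first size*size samples of a 1-D signal into a grayscale image.

    The image size and min-max normalization here are illustrative assumptions;
    the paper's actual conversion parameters are not specified in this excerpt.
    """
    s = np.asarray(signal[: size * size], dtype=float)
    s = (s - s.min()) / np.ptp(s)  # normalize to [0, 1] (assumes a non-constant signal)
    return (255 * s).astype(np.uint8).reshape(size, size)

# Example: a synthetic vibration-like record
img = signal_to_image(np.sin(0.07 * np.arange(64 * 64)))
print(img.shape)  # (64, 64)
```

The resulting array can then be fed to a network expecting image input, e.g., after resizing to the input dimensions of the pre-trained model.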
II. TRANSFER LEARNING FOR GEAR FAULT DIAGNOSIS
The proposed transfer learning approach is built upon deep convolutional neural networks. Deep neural networks have enjoyed great success but require a substantial amount of training instances for satisfactory performance. In this section, for the sake of completeness in presentation, we start from the essential formulations of convolutional neural network and transfer learning, followed by the specific architecture developed for gear fault diagnosis with limited training data.

A. CONVOLUTIONAL NEURAL NETWORKS (CNNs)
Convolutional Neural Networks (CNNs) are a class of biologically inspired neural networks featuring one or multiple convolutional layers that simulate the human visual system [27]. In recent years, due to the enhancement in computational power and the dramatic increase in the amount of data available in various applications, CNN-based methods have shown significant improvements in performance and thus have become the most popular class of approaches for pattern recognition tasks such as image classification [28], natural language processing [29], recommender systems [30] and fault detection [23]. CNNs learn how to extract and recognize characteristics of the target task by combining and stacking convolutional layers, pooling layers and fully connected layers in their architecture. Figure 1 illustrates a simple CNN with an input layer to accept input images, a convolutional layer to extract features, a ReLU layer to augment features through non-linear transformation, a max pooling layer to reduce data size, and a fully connected layer combined with a softmax layer to classify the input into pre-defined labels. The parameters are trained through a training dataset and updated using the back propagation algorithm to reflect the features of the task that may not be recognized otherwise. The basic mechanism of the layers in CNNs is outlined as follows.

The convolution operation can be expressed as,

$$y_{d_1,d_2,k} = \sum_{i=0}^{p} \sum_{j=0}^{q} x_{d_1+i,\,d_2+j} \times f_{i,j,k} \qquad (1)$$

where $y$, $x$ and $f$ denote the elements in the feature map, input and convolution filter, respectively. $f_{i,j,k}$ represents the element on the $i$-th column and $j$-th row of filter $k$. $y_{d_1,d_2,k}$ is the element on the $d_1$-th column and $d_2$-th row of feature map $k$. And $x_{d_1+i,d_2+j}$ refers to the input element on the $i$-th column and $j$-th row of the stride window specified by $d_1$ and $d_2$. Equation (1) gives a concise representation of the convolution operation when the input is 2-dimensional and the stride and padding are 1 and 0, respectively. Higher dimension convolution operations can be conducted in a similar manner. To be more evocative, suppose the input image can be represented by a 4 × 7 matrix and the convolution kernel is a 3 × 3 identity matrix. As we take the kernel and stride it over the image matrix, dot products are taken in each step and recorded in a feature map matrix (Figure 2). Such an operation is called convolution. In CNNs, multiple convolution filters are used in a convolutional layer, each acquiring a feature piece in its own perspective from the input image specified by the filter parameters. Regardless of what and where a feature appears in the input, the convolutional layer will try to characterize it from various perspectives that have been tuned automatically by the training dataset.

FIGURE 2. Illustration of convolution operation.
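The operation of Equation (1) can be sketched directly; a minimal single-filter version with stride 1 and no padding, using the 4 × 7 input and 3 × 3 identity kernel of the example above (the input values are illustrative):

```python
import numpy as np

def conv2d(x, f):
    """Valid 2-D convolution (cross-correlation) per Equation (1): stride 1, no padding."""
    p, q = f.shape
    rows, cols = x.shape[0] - p + 1, x.shape[1] - q + 1
    y = np.zeros((rows, cols))
    for d1 in range(rows):
        for d2 in range(cols):
            # y[d1, d2] = sum over i, j of x[d1+i, d2+j] * f[i, j]
            y[d1, d2] = np.sum(x[d1:d1 + p, d2:d2 + q] * f)
    return y

# 4 x 7 input and 3 x 3 identity kernel, as in the text's example
x = np.arange(28, dtype=float).reshape(4, 7)
f = np.eye(3)
y = conv2d(x, f)
print(y.shape)  # (2, 5): each entry sums one diagonal of a 3 x 3 window
```

With the identity kernel, each feature map entry is the sum of the main diagonal of the corresponding 3 × 3 window, illustrating how a filter responds to one specific pattern wherever it appears in the input.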
FIGURE 3. Illustration of ReLU and max pooling.

where $L_1 \le i \le U_1$ and $L_2 \le j \le U_2$ define the sub-region. The max pooling layer not only makes the network less sensitive to location changes of a feature but also reduces the number of parameters, thus alleviating the computational burden and controlling overfitting.

B. TRANSFER LEARNING
CNNs are powerful tools, and their performance can generally be improved by up-scaling the network. The scale of a CNN should be commensurate with the scale of the training dataset. Naturally, the deeper the CNN, the more parameters need to be trained, which requires a substantial amount of valid training samples. Nevertheless, in gear fault diagnosis, the training data is not as plentiful as that of data-rich tasks such as natural image classification. In fact, it is impractical to collect physical data for each failure type and especially each severity, since the severity level is continuous in nature and there are infinitely many possible fault profiles.

Figure 4 illustrates a representative relationship between data size and performance for different learning methods. While the performance of a large-scale CNN has the potential to top other methods, it is also profoundly correlated with the size of training data. Transfer learning, on the other hand, is capable of achieving prominent performance commensurate with large-scale CNNs using only a small set of training data [31], [32]. By applying knowledge and skills (in the form of parameters) learned and accumulated in previous tasks that have sufficient training data, transfer learning provides a possible solution to improve the performance of a neural network when applied to a novel task with a small training dataset. Classic transfer learning approaches transfer (copy) the first $n$ layers of a well-trained network to the target network of $m > n$ layers. Initially, the last $(m - n)$ layers of the target network are left untrained. They are trained subsequently using the training data from the novel task. Let the training datasets from the previous task $\mathbf{D}_{pre}$ and the novel task $\mathbf{D}_{nov}$ be represented as

$$\mathbf{D}_{pre} = \{X_{pre}, L_{pre}\}, \quad \mathbf{D}_{nov} = \{X_{nov}, L_{nov}\} \qquad (4a, b)$$

where $X$ is the input and $L$ is the output label. The CNNs for both tasks can then be regarded as,

$$\hat{L}_{pre} = \mathrm{CNN}_{pre}(X_{pre}, \theta_{pre}), \quad \hat{L}_{nov} = \mathrm{CNN}_{nov}(X_{nov}, \theta_{nov}) \qquad (5a, b)$$

The CNN operator denotes the mapping of a convolutional neural network, given parameters $\theta$, from input to predicted output $\hat{L}$. The parameters of the previous task are trained through

$$\theta'_{pre} = \arg\min_{\theta_{pre}}(L_{pre} - \hat{L}_{pre}) = \arg\min_{\theta_{pre}}(L_{pre} - \mathrm{CNN}_{pre}(X_{pre}, \theta_{pre})) \qquad (6)$$

where $\theta'_{pre}$ stands for the parameters after training. Thereupon, the trained parameters of the first $n$ layers can be transferred to the new task as,

$$\theta_{nov}(1:n)' := \theta_{pre}(1:n)' \qquad (7)$$

The rest of the parameters can be trained using training samples from the novel task,

$$\theta_{nov}(1:m)' = [\theta_{nov}(1:n)'', \theta_{nov}(n:m)'] = \arg\min_{\theta_{nov}(1:m)}(L_{nov} - \mathrm{CNN}_{nov}(X_{nov}, [\theta_{nov}(1:n)', \theta_{nov}(n:m)])) \qquad (8)$$

In Equation (8), by setting differential learning rates, the parameters in the first $n$ layers are fine-tuned as $\theta_{nov}(1:n)''$ using a smaller learning rate, and the parameters in the last $(m - n)$ layers are trained from scratch as $\theta_{nov}(n:m)'$. The phrase ''differential learning rates'' refers to different learning rates for different parts of the network during training. In general, the transferred layers (i.e., the first $n$ layers) are pre-trained to detect and extract generic features of inputs which are less sensitive to the domain of application. Therefore, the learning rate for the transferred layers is usually very small. In the extreme case where the learning rate for the transferred layers is zero, the parameters in the first $n$ transferred layers are left frozen.

Therefore, the CNN used for the novel task for future fault classification and diagnosis can be represented as,

$$\mathrm{CNN}_{nov}(X_{nov}, [\theta_{nov}(1:n)'', \theta_{nov}(n:m)']) \qquad (9)$$

where the parameters in the first $n$ layers are first transferred from a previous task. Meanwhile, as the last $(m - n)$ layers are trained using the training dataset of the novel task, the first $n$ layers are fine-tuned for better results.
$$\theta'_{nov} = [\theta_{nov}(1:n)'', \theta_{nov}(n:m)'] \qquad (10)$$
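The transfer step of Equation (7) and the differential learning rates of Equation (8) can be sketched as follows; this is a toy stand-in, where the layer shapes, $m = 5$, $n = 3$, and the learning-rate values are illustrative assumptions rather than the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the pre-trained network A: one weight array per layer (m = 5).
theta_pre = [rng.standard_normal((8, 8)) for _ in range(5)]

n = 3  # number of layers to transfer
# Equation (7): copy the first n trained layers into the novel network.
theta_nov = [w.copy() for w in theta_pre[:n]]
# The last (m - n) layers start untrained (fresh random initialization).
theta_nov += [rng.standard_normal((8, 8)) for _ in range(len(theta_pre) - n)]

# Differential learning rates per Equation (8): small rate for the transferred
# layers (fine-tuning), larger rate for the new layers (training from scratch).
lrs = [1e-4] * n + [1e-2] * (len(theta_nov) - n)
```

Setting the first three rates to zero would correspond to the frozen-layer extreme described above.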
Transfer learning becomes possible and promising because, as has been discovered by recent studies, the layers at the

FIGURE 4. Learning methods: data size vs. performance.
TABLE 1. Specifications of the proposed architecture.

$$\theta_{i+1} = \theta_i - \alpha \nabla E(\theta_i) + \beta(\theta_i - \theta_{i-1}) \qquad (12)$$

where $\alpha$ is the learning rate, $i$ is the iteration number, and $\beta$ stands for the contribution of the previous gradient step. While classical SGD and momentum SGD are frequently adopted in training CNNs for their simplicity and efficiency, other techniques, such as AdaGrad, AdaDelta or Adam [38], can also be applied to carry out the optimization of Equation (11). The transferability of the base architecture and the performance of the proposed architecture for gear fault diagnosis will be investigated in the next section.

FIGURE 7. Gearbox system employed in experimental study.
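The momentum update of Equation (12) can be sketched on a toy one-parameter objective; the quadratic loss and the values of $\alpha$ and $\beta$ are illustrative assumptions, not the paper's training settings:

```python
def momentum_sgd_step(theta, theta_prev, grad, alpha=0.01, beta=0.9):
    """One step of Equation (12): theta_{i+1} = theta_i - alpha*grad + beta*(theta_i - theta_prev)."""
    return theta - alpha * grad + beta * (theta - theta_prev)

# Toy objective E(theta) = theta**2, whose gradient is 2*theta; minimum at theta = 0.
theta_prev = theta = 1.0
for _ in range(200):
    theta, theta_prev = momentum_sgd_step(theta, theta_prev, 2.0 * theta), theta
```

The momentum term $\beta(\theta_i - \theta_{i-1})$ carries over a fraction of the previous step, which damps oscillations and accelerates progress along consistent gradient directions.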
TABLE 2. Classification results (3,600 sampling points).

TABLE 3. Computational time comparison (average of 5 attempts).
FIGURE 15. Vibration signal of a spalling gear. (a) 3,600 sampling points,
(b) 900 sampling points.
FIGURE 17. Box plots of classification results of the three methods after
down sampling. (a) 2%, (b) 5%, (c) 10%, (d) 20%, (e) 40%, (f) 60%,
(g) 80%.
Table 5 lists the comparison of the classification results of the three methods with different training data sizes. Similar to Case 1, the proposed transfer learning approach is the best performer. Figure 16 illustrates the classification results before and after down-sampling. While lowering the sampling rate deteriorates the overall performance of all approaches, each method exhibits a trend similar to that seen in Section III.C. Transfer learning starts with 60.11% classification accuracy and reaches 95.88% when only 20% of the data is used for training, whilst the accuracies of local CNN and AFS-SVM are 43.56% and 70.07%, respectively. Local CNN performs better than AFS-SVM when 80% of the data is used for training. Unlike AFS-SVM, the performance of local CNN can be largely improved if significantly more training data is incorporated, because the parameters of the lower stages can be learned from scratch. Eventually, the performance of local CNN could reach that of the transfer learning approach. Nevertheless, for cases with limited data, the proposed transfer learning approach has an extensive performance margin compared to local CNN or other preprocessing-based shallow learning methods such as AFS-SVM. Even with ample training data, initializing with transferred parameters can improve the classification accuracy in general. Moreover, the proposed approach requires no preprocessing. Similar to Case 1 in Section III.C, the proposed approach is very robust, especially when 40% or more of the data is used for training (Figure 17).

IV. CONCLUDING REMARKS
In this research, a deep convolutional neural network-based transfer learning approach is developed for deep feature extraction and applied to gear fault diagnosis. The proposed approach does not require manual feature extraction, and can be effective even with a small set of training data. Experimental studies are conducted using preprocessing-free raw vibration data for gear fault diagnosis. The performance of the proposed approach is highlighted by varying the size of the training data. The classification accuracies of the proposed approach outperform those of other methods, such as a locally trained convolutional neural network and angle-frequency analysis-based support vector machine, by as much as 50%. The achieved accuracy indicates that the proposed approach is not only viable and robust, but also has the potential to be applied to fault diagnosis of other systems.

REFERENCES
[1] D. Kang, Z. Xiaoyong, and C. Yahua, ''The vibration characteristics of typical gearbox faults and its diagnosis plan,'' J. Vib. Shock, vol. 20, no. 3, pp. 7–12, 2001.
[2] R. B. Randall, Vibration-Based Condition Monitoring: Industrial, Aerospace and Automotive Applications. West Sussex, U.K.: Wiley, 2011.
[3] F. P. G. Márquez, A. M. Tobias, J. M. P. Pérez, and M. Papaelias, ''Condition monitoring of wind turbines: Techniques and methods,'' Renew. Energy, vol. 46, pp. 169–178, Oct. 2012.
[4] W. Zhou, T. G. Habetler, and R. G. Harley, ''Bearing fault detection via stator current noise cancellation and statistical control,'' IEEE Trans. Ind. Electron., vol. 55, no. 12, pp. 4260–4269, Dec. 2008.
[5] A. Parey and R. B. Pachori, ''Variable cosine windowing of intrinsic mode functions: Application to gear fault diagnosis,'' Measurement, vol. 45, no. 3, pp. 415–426, 2012.
[6] T. Fakhfakh, F. Chaari, and M. Haddar, ''Numerical and experimental analysis of a gear system with teeth defects,'' Int. J. Adv. Manuf. Technol., vol. 25, nos. 5–6, pp. 542–550, 2005.
[7] D. Z. Li, W. Wang, and F. Ismail, ''An enhanced bispectrum technique with auxiliary frequency injection for induction motor health condition monitoring,'' IEEE Trans. Instrum. Meas., vol. 64, no. 10, pp. 2679–2687, Oct. 2015.
[8] W. Wen, Z. Fan, D. Karg, and W. Cheng, ''Rolling element bearing fault diagnosis based on multiscale general fractal features,'' Shock Vib., vol. 2015, Jul. 2015, Art. no. 167902.
[9] B. Tang, W. Liu, and T. Song, ''Wind turbine fault diagnosis based on Morlet wavelet transformation and Wigner–Ville distribution,'' Renew. Energy, vol. 35, no. 12, pp. 2862–2866, 2010.
[10] F. Chaari, W. Bartelmus, R. Zimroz, T. Fakhfakh, and M. Haddar, ''Gearbox vibration signal amplitude and frequency modulation,'' Shock Vib., vol. 19, no. 4, pp. 635–652, 2012.
[11] R. Yan, R. X. Gao, and X. Chen, ''Wavelets for fault diagnosis of rotary machines: A review with applications,'' Signal Process., vol. 96, pp. 1–15, Mar. 2014.
[12] X. Chen and Z. Feng, ''Time-frequency analysis of torsional vibration signals in resonance region for planetary gearbox fault diagnosis under variable speed conditions,'' IEEE Access, vol. 5, pp. 21918–21926, 2017.
[13] S. Zhang and J. Tang, ''Integrating angle-frequency domain synchronous averaging technique with feature extraction for gear fault diagnosis,'' Mech. Syst. Signal Process., vol. 99, pp. 711–729, Jan. 2018.
[14] C. Pachaud, R. Salvetat, and C. Fray, ''Crest factor and kurtosis contributions to identify defects inducing periodical impulsive forces,'' Mech. Syst. Signal Process., vol. 11, no. 6, pp. 903–916, 1997.
[15] S. Qian and D. Chen, ''Joint time-frequency analysis,'' IEEE Signal Process. Mag., vol. 16, no. 2, pp. 52–67, Mar. 1999.
[16] N. Baydar and A. Ball, ''A comparative study of acoustic and vibration signals in detection of gear failures using Wigner–Ville distribution,'' Mech. Syst. Signal Process., vol. 15, no. 6, pp. 1091–1107, 2001.
[17] W. Bartelmus and R. Zimroz, ''Vibration condition monitoring of planetary gearbox under varying external load,'' Mech. Syst. Signal Process., vol. 23, no. 1, pp. 246–257, 2009.
[18] J. Lin and M. J. Zuo, ''Gearbox fault diagnosis using adaptive wavelet filter,'' Mech. Syst. Signal Process., vol. 17, no. 6, pp. 1259–1269, 2003.
[19] Y. Lu, J. Tang, and H. Luo, ''Wind turbine gearbox fault detection using multiple sensors with features level data fusion,'' J. Eng. Gas Turbines Power, vol. 134, no. 4, p. 042501, 2012.
[20] C. Zhang, J. H. Sun, and K. C. Tan, ''Deep belief networks ensemble with multi-objective optimization for failure diagnosis,'' in Proc. IEEE Int. Conf. Syst., Man, Cybern. (SMC), Oct. 2015, pp. 32–37.
[21] C. Li, R.-V. Sanchez, G. Zurita, M. Cerrada, D. Cabrera, and R. E. Vásquez, ''Gearbox fault diagnosis based on deep random forest fusion of acoustic and vibratory signals,'' Mech. Syst. Signal Process., vols. 76–77, pp. 283–293, Aug. 2016.
[22] D. Weimer, B. Scholz-Reiter, and M. Shpitalni, ''Design of deep convolutional neural network architectures for automated feature extraction in industrial inspection,'' CIRP Ann.-Manuf. Technol., vol. 65, no. 1, pp. 417–420, 2016.
[23] T. Ince, S. Kiranyaz, L. Eren, M. Askar, and M. Gabbouj, ''Real-time motor fault detection by 1-D convolutional neural networks,'' IEEE Trans. Ind. Electron., vol. 63, no. 11, pp. 7067–7075, Nov. 2016.
[24] O. Abdeljaber, O. Avci, S. Kiranyaz, M. Gabbouj, and D. J. Inman, ''Real-time vibration-based structural damage detection using one-dimensional convolutional neural networks,'' J. Sound Vib., vol. 388, pp. 154–170, Feb. 2017.
[25] N. Saravanan and K. I. Ramachandran, ''Incipient gear box fault diagnosis using discrete wavelet transform (DWT) for feature extraction and classification using artificial neural network (ANN),'' Expert Syst. Appl., vol. 37, no. 6, pp. 4168–4181, 2010.
[26] J. Yang, S. Li, and W. Xu, ''Active learning for visual image classification method based on transfer learning,'' IEEE Access, vol. 6, pp. 187–198, 2018.
[27] Y. Le Cun et al., ''Handwritten digit recognition with a back-propagation network,'' in Proc. Adv. Neural Inf. Process. Syst., 1990, pp. 396–404.
[28] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ''ImageNet classification with deep convolutional neural networks,'' in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[29] Y. Kim. (2014). ''Convolutional neural networks for sentence classification.'' [Online]. Available: https://arxiv.org/abs/1408.5882
[30] A. van den Oord, S. Dieleman, and B. Schrauwen, ''Deep content-based music recommendation,'' in Proc. Adv. Neural Inf. Process. Syst., 2013, pp. 2643–2651.
[31] C.-K. Shie, C.-H. Chuang, C.-N. Chou, M.-H. Wu, and E. Y. Chang, ''Transfer representation learning for medical image analysis,'' in Proc. 37th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), Aug. 2015, pp. 711–714.
[32] R. Zhang, H. Tao, L. Wu, and Y. Guan, ''Transfer learning with neural networks for bearing fault diagnosis in changing working conditions,'' IEEE Access, vol. 5, pp. 14347–14357, 2017.
[33] M. D. Zeiler and R. Fergus. (2013). ''Stochastic pooling for regularization of deep convolutional neural networks.'' [Online]. Available: https://arxiv.org/abs/1301.3557
[34] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. (2013). ''OverFeat: Integrated recognition, localization and detection using convolutional networks.'' [Online]. Available: https://arxiv.org/abs/1312.6229
[35] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, ''How transferable are features in deep neural networks?'' in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 3320–3328.
[36] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, ''Dropout: A simple way to prevent neural networks from overfitting,'' J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
[37] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, ''On the importance of initialization and momentum in deep learning,'' in Proc. Int. Conf. Mach. Learn., Feb. 2013, pp. 1139–1147.
[38] D. P. Kingma and J. Ba. (2014). ''Adam: A method for stochastic optimization.'' [Online]. Available: https://arxiv.org/abs/1412.6980
[39] H. S. Prashanth, H. L. Shashidhara, and M. K. N. Balasubramanya, ''Image scaling comparison using universal image quality index,'' in Proc. Int. Conf. Adv. Comput., Control, Telecommun. Technol. (ACT), Dec. 2009, pp. 859–863.

PEI CAO received the B.S. degree in automation from Northwestern Polytechnical University, Xi'an, China, in 2011. He is currently pursuing the Ph.D. degree in mechanical engineering with the University of Connecticut, Storrs, CT, USA. His research interests include global optimization, dynamic programming, statistical inference, layout design, and machine learning.

SHENGLI ZHANG received the B.S. degree from Northwestern Polytechnical University, Xi'an, China, in 2009, the M.S. degree from Xi'an Jiaotong University, Xi'an, in 2012, and the Ph.D. degree from the University of Connecticut, Storrs, USA, in 2017, all in mechanical engineering. After graduation, he joined Stanley Black & Decker, Towson, MD, USA, as a CAE Engineer, where he is involved in product development and experimental data analysis as well as product performance prediction and analytical dynamic analysis.

JIONG TANG (M'09) received the B.S. and M.S. degrees in applied mechanics from Fudan University, China, in 1989 and 1992, respectively, and the Ph.D. degree in mechanical engineering from the Pennsylvania State University, USA, in 2001. He was with the GE Global Research Center as a Mechanical Engineer from 2001 to 2002. He then joined the Mechanical Engineering Department, University of Connecticut, where he is currently a Professor and the Director of the Dynamics, Sensing, and Controls Laboratory. His research interests include structural dynamics and system dynamics, control, and sensing and monitoring. He currently serves as an Associate Editor for the IEEE/ASME Transactions on Mechatronics, and served as an Associate Editor for the IEEE Transactions on Instrumentation and Measurement from 2009 to 2012. He also served as an Associate Editor for the ASME Journal of Vibration and Acoustics and the ASME Journal of Dynamic Systems, Measurement, and Control.