
Received April 17, 2018, accepted May 11, 2018, date of publication May 16, 2018, date of current version June 5, 2018.

Digital Object Identifier 10.1109/ACCESS.2018.2837621

Preprocessing-Free Gear Fault Diagnosis Using Small Datasets With Deep Convolutional Neural Network-Based Transfer Learning

PEI CAO1, SHENGLI ZHANG2, AND JIONG TANG1, (Member, IEEE)
1 Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA
2 Stanley Black & Decker, Towson, MD 21286, USA

Corresponding author: Jiong Tang ([email protected])

This work was supported by the National Science Foundation under Grant IIS-1741171.

ABSTRACT Early fault diagnosis of gear transmissions has been a significant challenge, because gear faults occur primarily at the microstructure or even material level, whereas their effects can only be observed indirectly at the system level. The performance of a gear fault diagnosis system depends significantly on the features extracted and the classifier subsequently applied. Traditionally, fault-related features are extracted and identified, based on domain expertise, through data preprocessing techniques that are system-specific and may not be easily generalized. On the other hand, although deep neural network-based approaches featuring adaptive feature extraction and inherent classification have recently attracted attention, they usually require a substantial set of training data. Aiming at tackling these issues, this paper presents a deep convolutional neural network-based transfer learning approach. The proposed transfer learning architecture consists of two parts: the first part is constructed with a pre-trained deep neural network that serves to extract features automatically from the input, and the second part is a fully connected stage, trained using gear fault experimental data, that classifies the extracted features. Case analyses using experimental data from a benchmark gear system indicate that the proposed approach not only enables preprocessing-free adaptive feature extraction, but also requires only a small set of training data.

INDEX TERMS AlexNet, deep convolutional neural network, gear fault diagnosis, transfer learning.

I. INTRODUCTION

Condition monitoring and fault diagnosis play an essential role in ensuring the safe and sustainable operation of modern machinery systems. The gearbox, a common component in such systems, is prone to faults or even failure because of its severe working conditions, with high mechanical loading and typically long operational times. Currently, vibration signals are most widely used to infer the health condition of a gear system, because they contain rich information and can be easily measured using off-the-shelf, low-cost sensors. Indeed, gear vibration signals contain three components: periodic meshing frequencies, their harmonics, and random noise. For a healthy gear system, the meshing frequencies and their harmonics dominate the vibration response. Fault conditions cause additional dynamic effects.

The practice of fault diagnosis of gear systems using vibration signals has proved to be a very challenging subject. The mainstream of gear condition monitoring is built upon various feature extraction methods that are manual and empirical in nature [1]–[3]. Generally, a certain signal processing technique is applied to vibration signals to identify fault-related features that are selected based on engineering judgment. Subsequently, a classifier is developed and applied to new signals to predict fault occurrence in terms of type and severity. There have been extensive and diverse attempts at manually and empirically identifying and extracting useful features from gear vibration signals, which fall into three main categories: time-domain analysis [4], [5], frequency-domain analysis [6]–[8], and time-frequency analysis [9]–[13]. Time-domain statistical approaches can capture the changes in amplitude and phase modulation caused by faults [5], [14]. In comparison, spectrum analysis may extract features more easily to detect distributed faults with clear sidebands [6], [8], [15]. To deal with noise and at the same time utilize the transient components in vibration signals, many efforts have focused on joint time-frequency domain

2169-3536 © 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

VOLUME 6, 2018 — 26241
P. Cao et al.: Preprocessing-Free Gear Fault Diagnosis Using Small Data sets

analysis utilizing the Wigner-Ville distribution [9], [16], the short-time Fourier transform [10], [17], and various wavelet transforms [11], [18]. The time-frequency distribution in such analysis can in theory lead to rich analysis results regarding the time- and frequency-related events in signals.

Although the manual and empirical methods of feature extraction have seen various levels of success, their effectiveness obviously hinges upon the specific features adopted in the diagnostic analysis. It is worth emphasizing that the choices of features, as well as the often-applied signal preprocessing techniques, are generally based on domain expertise and subjective decisions for a specific gear system. For example, while wavelet transforms have been popular and it is well known that each wavelet coefficient can be interpreted as the energy concentration at a specific time-frequency point, it is evident from a large amount of literature that there does not seem to be a consensus on what kind of wavelet to use for gear fault diagnosis. This should not come as a surprise. On the one hand, gear faults occur primarily at the microstructure or even material level, but their effects can only be observed indirectly at the system level; consequently, there exists a many-to-many relationship between actual faults and the observable quantities (i.e., features) for a given gear system [19]. On the other hand, different gear systems have different designs, which lead to very different dynamic characteristics. As such, the features manually selected and, to a large extent, the methodology employed to extract these features for one gear system design may not be easily extrapolated to a different gear system design.

Fundamentally, condition monitoring and fault diagnosis of gear systems belong to the general field of pattern recognition. The advancements in related algorithms, along with the rapid enhancement of computational power, have triggered the widespread adoption of machine learning techniques in various applications. Most recently, deep neural network-based methods are progressively being investigated. When the parameters of a deep neural network are properly trained by available data, representative features can be extracted in a hierarchy of conceptual abstractions, free of the human interference involved in manual selection of features. Some recent studies have adopted this type of approach in gear fault diagnosis, aiming at identifying features implicitly and adaptively and then classifying damage/fault in an automated manner with minimal tuning. For example, Zhang et al. [20] developed a deep learning network for degradation pattern classification and demonstrated its efficacy using a turbofan engine dataset. Li et al. [21] proposed a deep random forest fusion technique for gearbox fault diagnosis which achieves 97.68% classification accuracy. Weimer et al. [22] examined the usage of deep convolutional neural networks for industrial inspection and demonstrated excellent defect detection results. Ince et al. [23] developed a fast motor condition monitoring system using a 1-D convolutional neural network with a classification accuracy of 97.4%. Abdeljaber et al. [24] performed real-time damage detection using a convolutional neural network and showcased satisfactory efficiency.

The deep neural network is undoubtedly a powerful tool in pattern recognition and data mining. As an end-to-end hierarchical system, it inherently blends the two essential elements in condition monitoring, feature extraction and classification, into a single adaptive learning frame. It should be noted that the amount of training data required for satisfactory results depends on many aspects of the specific problem being tackled, such as the correctness of the training samples, the number of pattern classes to be classified, and the degree of separation between different classes. In most machinery diagnosis investigations, the lack of labeled training samples, i.e., experimental data of known failure patterns, is a common issue, because it is impractical to collect experimental data for each failure type and especially each severity of a machinery system. To improve the performance given limited training data, some recent studies have attempted to combine preprocessing and data augmentation techniques, e.g., the discrete wavelet transform [25], antialiasing/decimation filters [23], and the wavelet packet transform [21], with neural networks for fault diagnosis. Nevertheless, the preprocessing techniques employed, which are subject to selection based on domain expertise, may negatively impact the objective nature of neural networks and to some extent undermine the usage of such tools.

In this research, aiming at advancing the state-of-the-art, we present a deep neural network-based transfer learning approach utilizing limited time-domain data for gearbox fault diagnosis. One-dimensional time-domain data of vibration responses related to gear fault patterns are converted into graphical images as input. The approach inherits the non-biased nature of neural networks, which avoids the manual selection of features. Meanwhile, the issue of limited data is overcome by formulating a new neural network architecture that consists of two parts. Massive image data (1.2 million images) from ImageNet (http://www.image-net.org/challenges/LSVRC/2010/) are used first to train an original deep neural network model, denoted as neural network A. The parameters of neural network A are transferred (copied) to the new architecture as the first part. The second part of the architecture, an untrained neural network B, accommodates the gear fault diagnosis task and is further trained using experimentally generated gear fault data. Unlike with traditional neural networks, the training sets in transfer learning do not necessarily belong to the same category or come from the same physical background [26]. As will be demonstrated later, with this new architecture, highly accurate gear fault diagnosis can be achieved using limited time-domain data directly, without involving any subjective preprocessing techniques to assist feature extraction. The rest of this paper is organized as follows. In Section II, building upon convolutional neural networks and transfer learning, we develop the specific architecture for gear fault diagnosis. In Section III, experimental data are analyzed using the proposed approach with uncertainties and noise; comparisons with respect to different approaches are conducted as well. Concluding remarks are summarized in Section IV.
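The two-part idea described above can be caricatured in a few lines of NumPy. This is a minimal illustrative sketch, not the authors' implementation: the layer count, the 8 × 8 weight shapes, and the helper name `build_transfer_model` are all hypothetical placeholders, with nine output classes chosen to match the nine gear conditions studied later.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for "neural network A": a list of per-layer weight
# matrices pre-trained on a large source dataset (e.g., ImageNet).
pretrained_params = [rng.standard_normal((8, 8)) for _ in range(5)]

def build_transfer_model(pretrained, n_transfer, n_classes):
    """Copy the first n_transfer pre-trained layers (the feature-extraction
    part) and append a freshly initialized classification stage ("network B")
    that will be trained on the small gear-fault dataset."""
    feature_part = [w.copy() for w in pretrained[:n_transfer]]
    classifier_part = [rng.standard_normal((8, n_classes)) * 0.01]
    return feature_part, classifier_part

# Transfer four layers; the classifier outputs nine gear-condition classes.
features, classifier = build_transfer_model(pretrained_params, n_transfer=4, n_classes=9)
```

Only `classifier` would then be trained from scratch on the experimental gear data, while `features` arrives already trained on the data-rich source task.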




II. TRANSFER LEARNING FOR GEAR FAULT DIAGNOSIS
The proposed transfer learning approach is built upon a deep convolutional neural network. Deep neural networks have enjoyed great success but require a substantial number of training instances for satisfactory performance. In this section, for the sake of completeness in presentation, we start from the essential formulations of convolutional neural networks and transfer learning, followed by the specific architecture developed for gear fault diagnosis with limited training data.

A. CONVOLUTIONAL NEURAL NETWORKS (CNNs)
Convolutional neural networks (CNNs) are a class of biologically inspired neural networks featuring one or multiple convolutional layers that simulate the human visual system [27]. In recent years, due to the enhancement in computational power and the dramatic increase in the amount of data available in various applications, CNN-based methods have shown significant improvements in performance and thus have become the most popular class of approaches for pattern recognition tasks such as image classification [28], natural language processing [29], recommender systems [30], and fault detection [23]. CNNs learn how to extract and recognize characteristics of the target task by combining and stacking convolutional layers, pooling layers, and fully connected layers in their architecture. Figure 1 illustrates a simple CNN with an input layer to accept input images, a convolutional layer to extract features, a ReLU layer to augment features through non-linear transformation, a max pooling layer to reduce data size, and a fully connected layer combined with a softmax layer to classify the input into pre-defined labels. The parameters are trained on a training dataset and updated using the back-propagation algorithm to reflect features of the task that may not be recognized otherwise. The basic mechanism of the layers in CNNs is outlined as follows.

FIGURE 1. An example of convolutional neural network.

1) CONVOLUTIONAL LAYER
Each feature map in the convolutional layer shown in Figure 1 is generated by a convolution filter. Generally, the input and convolution filters are tensors of size m × n and p × q × K (K is the number of filters used), respectively. The stride (i.e., the step size of the filter sliding over the input) is set to 1, and the padding (i.e., the number of rows and columns inserted around the original input) is set to 0. The convolution operation can be expressed as

$y_{d_1,d_2,k} = \sum_{i=0}^{p} \sum_{j=0}^{q} x_{d_1+i,\,d_2+j} \times f_{i,j,k}$   (1)

where y, x, and f denote elements of the feature map, the input, and the convolution filter, respectively. $f_{i,j,k}$ represents the element in the i-th column and j-th row of filter k; $y_{d_1,d_2,k}$ is the element in the $d_1$-th column and $d_2$-th row of feature map k; and $x_{d_1+i,\,d_2+j}$ refers to the input element in the i-th column and j-th row of the stride window specified by $d_1$ and $d_2$. Equation (1) gives a concise representation of the convolution operation when the input is 2-dimensional and the stride and padding are 1 and 0, respectively. Higher-dimension convolution operations can be conducted in a similar manner. To be more evocative, suppose the input image can be represented by a 4 × 7 matrix and the convolution kernel is a 3 × 3 identity matrix. As we take the kernel and stride it over the image matrix, dot products are taken at each step and recorded in a feature map matrix (Figure 2). Such an operation is called convolution. In CNNs, multiple convolution filters are used in a convolutional layer, each acquiring a feature piece from its own perspective on the input image, as specified by the filter parameters. Regardless of what a feature is and where it appears in the input, the convolutional layer will try to characterize it from the various perspectives that have been tuned automatically by the training dataset.

FIGURE 2. Illustration of convolution operation.

2) ReLU LAYER
In CNNs, ReLU (rectified linear unit) layers are commonly used after convolutional layers. In most cases, the relationship between the input and output is not linear. While the convolution operation is linear, the ReLU layer is designed to take the non-linear relationship into account, as shown in the equation below:

$\bar{y} = \max(0, y)$   (2)

The ReLU operation is applied to each feature map and returns an activation map (Figure 3). The depth of the ReLU layer equals that of the convolutional layer.

3) MAX POOLING LAYER
Max pooling down-samples a sub-region of the activation map to its maximum value,

$\hat{y} = \max_{L_1 \le i \le U_1,\; L_2 \le j \le U_2} \bar{y}_{i,j}$   (3)
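The three operations of Eqs. (1)–(3) can be sketched in plain NumPy using the 4 × 7 input and 3 × 3 identity-kernel example described above. This is a minimal, hedged sketch for illustration, not the implementation used in the paper; the function names and the toy input values are assumptions.

```python
import numpy as np

def convolve2d(x, f):
    """Valid 2-D convolution, stride 1, no padding (cf. Eq. (1))."""
    p, q = f.shape
    rows, cols = x.shape[0] - p + 1, x.shape[1] - q + 1
    y = np.empty((rows, cols))
    for d1 in range(rows):
        for d2 in range(cols):
            # Dot product of the filter with the current stride window.
            y[d1, d2] = np.sum(x[d1:d1 + p, d2:d2 + q] * f)
    return y

def relu(y):
    """Element-wise rectification (cf. Eq. (2))."""
    return np.maximum(0.0, y)

def max_pool(y, size=2):
    """Reduce non-overlapping size x size sub-regions to their maxima (cf. Eq. (3))."""
    rows, cols = y.shape[0] // size, y.shape[1] // size
    return y[:rows * size, :cols * size].reshape(rows, size, cols, size).max(axis=(1, 3))

x = np.arange(28, dtype=float).reshape(4, 7)   # toy 4 x 7 "image"
f = np.eye(3)                                  # 3 x 3 identity kernel
feature_map = convolve2d(x, f)                 # 2 x 5 feature map
activation = relu(feature_map)
pooled = max_pool(activation, size=2)
```

With this toy input, the valid convolution yields a 2 × 5 feature map, which ReLU leaves unchanged (all entries are non-negative here) and 2 × 2 max pooling reduces to a 1 × 2 map.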




where $L_1 \le i \le U_1$ and $L_2 \le j \le U_2$ define the sub-region. The max pooling layer not only makes the network less sensitive to location changes of a feature but also reduces the number of parameters, thus alleviating the computational burden and controlling overfitting.

FIGURE 3. Illustration of ReLU and max pooling.

B. TRANSFER LEARNING
CNNs are powerful tools, and performance can generally be improved by up-scaling the CNN employed. The scale of a CNN should concur with the scale of the training dataset: naturally, the deeper the CNN, the more parameters need to be trained, which requires a substantial number of valid training samples. Nevertheless, in gear fault diagnosis the training data is not as plentiful as in data-rich tasks such as natural image classification. In fact, it is impractical to collect physical data for each failure type and especially each severity, since the severity level is continuous in nature and there are infinitely many possible fault profiles.

Figure 4 illustrates a representative relationship between data size and performance for different learning methods. While the performance of a large-scale CNN has the potential to top other methods, it is also profoundly correlated with the size of the training data. Transfer learning, on the other hand, is capable of achieving performance commensurate with large-scale CNNs using only a small set of training data [31], [32]. By applying knowledge and skills (in the form of parameters) learned and accumulated in previous tasks that have sufficient training data, transfer learning provides a possible solution to improve the performance of a neural network applied to a novel task with a small training dataset. Classic transfer learning approaches transfer (copy) the first n layers of a well-trained network to a target network of m > n layers. Initially, the last (m − n) layers of the target network are left untrained. They are trained subsequently using the training data from the novel task. Let the training datasets from the previous task, $D_{pre}$, and the novel task, $D_{nov}$, be represented as

$D_{pre} = \{X_{pre}, L_{pre}\}, \quad D_{nov} = \{X_{nov}, L_{nov}\}$   (4a, b)

where X is the input and L is the output label. The CNNs for both tasks can then be regarded as

$\hat{L}_{pre} = CNN_{pre}(X_{pre}, \theta_{pre}), \quad \hat{L}_{nov} = CNN_{nov}(X_{nov}, \theta_{nov})$   (5a, b)

The CNN operator denotes the mapping of a convolutional neural network, given parameters θ, from input to predicted output $\hat{L}$. The parameters of the previous task are trained through

$\theta'_{pre} = \arg\min_{\theta_{pre}}(L_{pre} - \hat{L}_{pre}) = \arg\min_{\theta_{pre}}(L_{pre} - CNN_{pre}(X_{pre}, \theta_{pre}))$   (6)

where $\theta'_{pre}$ stands for the parameters after training. Thereupon, the trained parameters of the first n layers can be transferred to the new task as

$\theta_{nov}(1:n)' := \theta_{pre}(1:n)'$   (7)

The rest of the parameters can be trained using training samples from the novel task,

$\theta_{nov}(1:m)' = [\theta_{nov}(1:n)'', \theta_{nov}(n:m)'] = \arg\min_{\theta_{nov}(1:m)}\big(L_{nov} - CNN_{nov}(X_{nov}, [\theta_{nov}(1:n)', \theta_{nov}(n:m)])\big)$   (8)

In Equation (8), by setting differential learning rates, the parameters in the first n layers are fine-tuned as $\theta_{nov}(1:n)''$ using a smaller learning rate, and the parameters in the last (m − n) layers are trained from scratch as $\theta_{nov}(n:m)'$. The phrase "differential learning rates" refers to different learning rates for different parts of the network during training. In general, the transferred layers (i.e., the first n layers) are pre-trained to detect and extract generic features of inputs that are less sensitive to the domain of application. Therefore, the learning rate for the transferred layers is usually very small. In the extreme case where the learning rate for the transferred layers is zero, the parameters in the first n transferred layers are left frozen.

Therefore, the CNN used for the novel task, for future fault classification and diagnosis, can be represented as

$CNN_{nov}(X_{nov}, [\theta_{nov}(1:n)'', \theta_{nov}(n:m)'])$   (9)

where the parameters in the first n layers are first transferred from a previous task. Meanwhile, as the last (m − n) layers are trained using the training dataset of the novel task, the first n layers are fine-tuned for better results:

$\theta'_{nov} = [\theta_{nov}(1:n)'', \theta_{nov}(n:m)']$   (10)
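The transfer-and-fine-tune scheme of Eqs. (7)–(8) with differential learning rates can be illustrated with a toy gradient step. This is a hedged sketch only: each "layer" is collapsed to a single scalar parameter and the gradient is a placeholder, not the paper's actual network or training loop.

```python
import numpy as np

# Toy network: m = 5 "layers", each parameterized by one scalar.
theta_pre = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # trained on the previous task
n = 3                                              # number of transferred layers

# Eq. (7): copy the first n trained parameters; the rest start from scratch.
theta_nov = np.zeros(5)
theta_nov[:n] = theta_pre[:n]

# Differential learning rates: tiny for the transferred layers (fine-tuning),
# larger for the new layers trained from scratch (cf. Eq. (8)).
lr = np.where(np.arange(5) < n, 1e-4, 1e-1)

grad = np.ones(5)                 # placeholder gradient of the loss
theta_nov -= lr * grad            # one gradient-descent step

# Frozen-transfer extreme: learning rate zero leaves the first n layers fixed.
lr_frozen = np.where(np.arange(5) < n, 0.0, 1e-1)
```

After the step, the transferred parameters have barely moved while the new classifier parameters have taken a full-size update, which is the intended effect of the differential learning rates.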
FIGURE 4. Learning methods: data size vs. performance.

Transfer learning becomes possible and promising because, as has been discovered by recent studies, the layers at the convolutional stages (convolutional layers, ReLU layers, and pooling layers) of a convolutional neural network trained on a large dataset indeed extract general features of the inputs, while the layers of the fully connected stages (fully connected layers, softmax layers, classification layers) are more task-specific [33], [34]. Therefore, the n layers transferred to the new task can, as a whole, be regarded as a well-trained feature extraction tool for similar tasks, and the last few layers serve as a classifier to be trained. Even with substantial training data, initializing with transferred parameters can improve performance in general [35].

In this research, transfer learning is applied to gearbox fault diagnosis. The CNN is well-trained in terms of pulling characteristics from images. As illustrated in Figure 5, the parameters in the convolutional stage, i.e., the parameters used in the convolution filters, the ReLU operators, and the max pooling operators, are transferred to the fault diagnosis task. The parameters used in the fully connected layer and the softmax layer are trained subsequently using a small amount of training data generated from gear fault experiments.

FIGURE 5. Illustration of transfer learning.

C. PROPOSED ARCHITECTURE
In this sub-section we present the proposed architecture. In gear fault diagnosis, vibration responses are recorded using accelerometers during gearbox operation. The time-domain vibration signals can then be represented directly by 2D grey-scale/true-color images (as shown in Figure 5), which serve as inputs to the deep CNN. More details on the image representation of time-domain data will be provided in Section III.A. The deep CNN adopted as the base architecture in this study was originally proposed by Krizhevsky et al. [28]; it is essentially composed of five convolutional stages and three fully connected stages (Figure 6). This base architecture showed extraordinary performance in the Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) and has since been repurposed for other learning tasks [31].

FIGURE 6. Illustration of the transfer learning architecture.

In the base architecture, the parameters are trained using approximately 1.2 million human/software-labeled 3D true-color nature images from the ImageNet Large Scale Visual Recognition Challenge 2010 (http://www.image-net.org/challenges/LSVRC/2010/). The trained parameters in the first five stages are well-polished in characterizing high-level abstractions of the input image and thus have the potential to be used for other tasks with image inputs. Meanwhile, the last three stages are trained to nonlinearly combine the high-level features. Although the images of vibration signals may look different from the images used to train the original CNN, useful features can be extracted in a similar manner as long as the CNN adopted is capable of identifying high-level abstractions [35]. Stage 8 of the original architecture is configured for the 1000 classes of the previous image classification task. Therefore, the first seven stages of the base architecture can possibly be transferred to facilitate gear fault diagnosis. As discussed in Section II.B, the first seven stages indeed serve as a general, well-trained tool for automatic feature extraction; the more stages and layers used, the higher the level of features that can be obtained. The final stage is left to be trained as a classifier using the experimental data specific to the fault diagnosis task. As specified in Table 1, a total of 24 layers are used in the proposed architecture; the




TABLE 1. Specifications of the proposed architecture.

parameters and specifications used in the first 21 layers can be transferred from the base architecture.

Observe Table 1. Overfitting of the learning model is essentially controlled by the max pooling layers in Stages 1, 2, and 5, and the dropout layers in Stages 6 and 7. As explained in Section II.A, a max pooling layer not only makes the network less sensitive to location changes of a feature but also reduces the number of parameters; therefore, max pooling can reduce the computational burden and control overfitting. In our architecture, dropout layers are employed after the ReLU layers in Stages 6 and 7. Because a fully connected layer possesses a large number of parameters, it is prone to overfitting. A simple and effective way to prevent overfitting is dropout [36]. In our study, individual nodes are "dropped out of" (temporarily removed from) the net with probability 50%, as suggested in [36]. Dropout can be interpreted as a stochastic regularization technique that not only decreases overfitting by avoiding training all nodes, but also significantly improves training efficiency.

The loss function used is the cross-entropy function given as follows:

$E(\theta) = -L \ln(CNN(X, \theta)) + \gamma \|\theta\|^2 = -L \ln \hat{L} + \gamma \|\theta\|^2$   (11)

where $\gamma \|\theta\|^2$ is an $l_2$ regularization term, which also contributes to preventing the network from overfitting. Equation (11) quantifies the difference between the correct output labels and the predicted labels. The loss is then back-propagated to update the parameters using the stochastic gradient descent (SGD) method [37], given as

$\theta_{i+1} = \theta_i - \alpha \nabla E(\theta_i) + \beta(\theta_i - \theta_{i-1})$   (12)

where α is the learning rate, i is the iteration number, and β weights the contribution of the previous gradient step. While classical SGD and momentum SGD are frequently adopted in training CNNs for their simplicity and efficiency, other techniques such as AdaGrad, AdaDelta, or Adam [38] can also be applied to carry out the optimization of Equation (11). The transferability of the base architecture and the performance of the proposed architecture for gear fault diagnosis will be investigated in the next section.

FIGURE 7. Gearbox system employed in experimental study.

III. GEAR FAULT DIAGNOSIS IMPLEMENTATION AND DEMONSTRATION
A. DATA ACQUISITION
Many types of faults and failure modes can occur in gear transmissions in various machinery systems. Vibration signals collected from such a system are usually used to reveal its health condition. In this research, experimental data are collected from a benchmark two-stage gearbox with replaceable gears, as shown in Figure 7. The gear speed is controlled by a motor. The torque is supplied by a magnetic brake and can be adjusted by changing its input voltage. A 32-tooth pinion and an 80-tooth gear are installed on the first-stage input shaft. The second stage consists of a 48-tooth pinion and a 64-tooth gear. The input shaft speed is measured by a tachometer, and gear vibration signals are measured by an accelerometer. The signals are recorded through a dSPACE system (DS1006 processor board, dSPACE Inc.) with a sampling frequency of 20 kHz. As shown in Figure 8, nine different gear conditions are introduced to the pinion on the input shaft, including the healthy condition, missing tooth, root crack, spalling, and chipping tip with five different levels of severity. The dynamic responses of a system involving a gear mechanism are angle-periodic. In reality, while the gearbox system is recorded at a fixed sampling rate, the time-domain responses are generally not time-periodic, due to speed variations under load disturbance, geometric tolerance, motor control error, etc. [13]. In order to resolve this non-stationarity and eliminate the uncertainty caused by speed variation, here we apply the time synchronous averaging (TSA) approach, in which the time-even signals are resampled based on the shaft speed measured by the tachometer and averaged in the angular domain. As TSA converts the signals from the time-even representation to the angle-even representation, it can significantly reduce the non-coherent components in the system response. It is worth mentioning that TSA is




generate a polyline. Figure 9(b) shows an example of such


polyline represented in an 875 × 656 image generated
by MATLAB plot function. The original matrix or image
representation of the vibration signal is then resized to
a 227 × 227 gray scale image using Bicubic interpola-
tion [39] as shown in Figure 9(c). There are 51,529 pixels per
image. Figure 10 showcases some example images generated
from angle-even vibration signals. For each gear condition,
104 signals are collected using the experimental gearbox sys-
tem. For each signal, 3,600 angle-even samples are recorded
in the course of four gear revolutions first for the case study
in Section III.C, and then down-sampled to 900 angle-even
points for the case study in Section III.D. Figure 10 shows
20 example signals of each type of gear condition where the
vertical axis is the acceleration of the gear (rad/s2 ) and the
horizontal axis corresponds to the 3,600 angel-even sampling
points. All the data used in this study is made public at
https://doi.org/10.6084/m9.figshare.6127874.v1.

B. SETUP OF CASE ILLUSTRATION AND COMPARISON
In this study, in order to highlight its effectiveness, the proposed transfer learning approach is examined and compared with two contemporary approaches. As indicated, the proposed transfer learning approach does not rely on manual selection of features, and we use this approach to analyze the angle-even representation of the original time-domain signals. The first approach adopted for comparison is a three-stage (nine-layer) CNN, hereafter referred to as the local CNN, which consists of two convolutional stages and a fully connected stage and uses the angle-even representation of the time-domain signals as inputs. Different from the proposed approach, the local CNN is trained only with the data generated from the gearbox experiments. Its specifications are the same as stage 1, stage 2, and stage 8 given in Table 1. The other approach adopted for comparison is based upon manual identification/selection of features. In a recent investigation, it was recognized that angle-frequency domain synchronous analysis (AFS) can significantly enhance fault-induced features in gearbox responses [13]. AFS resamples the time-domain signal into the angle domain based on the speed information collected from a tachometer. The angle-domain signal is then sliced into a series of segments every four gear revolutions. Subsequently, angle-frequency analysis based on the short-time Fourier transform is carried out on each segment of the angle-domain signal. The resultant spectrogram coefficients are then averaged to remove the noise and non-coherent components. As such, the features related to the gear health conditions are highly enhanced, and a feature extraction technique, i.e., principal component analysis, is employed to reduce the dimensionality. In this research, these low-dimensional data extracted by AFS are imported into a support vector machine (SVM) for fault classification.

FIGURE 8. Nine pinions with different health conditions (five levels of severity for chipping tip).
FIGURE 9. Construction of input for transfer learning. (a) 875 × 656 image representation of 3,600 samples, (b) 875 × 656 image representation of the samples connected, (c) 227 × 227 image representation of the samples connected.

For the proposed transfer learning approach and the locally trained CNN approach (local CNN), the mini-batch size is set to 5, and 15 epochs are conducted, meaning the training datasets are used to train the neural net 15 times throughout.



The learning rate α is set to 1e−4 and 1e−2 for the transferred layers and non-transferred layers, respectively, following the suggestion in [28]. The momentum β in Equation (12) is set to 0.9 for transfer learning and 0.5 for the local CNN. For the SVM approach based on manual feature selection, a Gaussian kernel is adopted. In the next two sub-sections, the relative performance of the three approaches is highlighted as we change the sampling frequency as well as the size of the training dataset, i.e., the portion of measured gear vibration signals used for training.

Neural networks are inherently parallel algorithms. Therefore, graphical processing units (GPUs) are frequently adopted as the execution environment to take advantage of the natural parallelism of CNNs and expedite the classification process. In this research, both CNNs are trained and implemented using a single CUDA-enabled NVIDIA Quadro M2000 GPU, while the AFS-SVM approach is run on an Intel Xeon E5-2640 v4 CPU.
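Under these settings, the number of SGD iterations is fixed by the batch size, epoch count, and training-set size; a small sketch using the 80% split (83 signals × 9 conditions) as an example (variable names are hypothetical):

```python
import math

# Two learning rates, as stated above (values from the text).
lr = {"transferred_layers": 1e-4, "non_transferred_layers": 1e-2}

batch_size = 5
epochs = 15
n_train = 83 * 9                               # 80% split: 83 signals x 9 gear conditions
iters_per_epoch = math.ceil(n_train / batch_size)
total_iters = epochs * iters_per_epoch
print(total_iters)  # 2250
```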

C. CASE 1 – 3,600 SAMPLING POINTS WITH VARYING TRAINING DATA SIZE
As mentioned in Section III.A, 104 vibration signals are generated for each gear condition. In the case studies, a portion of the signals is randomly selected as training data while the rest serves as validation data. To demonstrate the performance of the proposed approach for various data sizes, the size of the training dataset ranges from 80% (83 training data per condition, 83 × 9 data in total) down to 2% (2 training data per condition, 2 × 9 data in total) of all the 104 signals for each health condition.
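The random split described above can be sketched per condition as follows (a hypothetical helper, not the authors' code; note that rounding the fractions of 104 signals reproduces the 83/5/2 training counts quoted in the text):

```python
import random

def split_signals(signals, train_fraction, seed=0):
    """Randomly pick a portion of one condition's signals for training;
    the rest serve as validation (as in the 2%-80% splits above)."""
    rng = random.Random(seed)
    shuffled = signals[:]
    rng.shuffle(shuffled)
    n_train = max(1, round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]

signals = list(range(104))            # 104 signals per gear condition
train, val = split_signals(signals, 0.05)
print(len(train), len(val))           # 5 99
```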
Table 2 shows the classification results, where the mean accuracy is the average of five training attempts. The classification accuracy is the ratio of the correctly classified validation data to the total validation dataset. As illustrated in Figure 11, the proposed transfer learning approach has the best classification accuracy for all training data sizes. Even when only five vibration signals per condition are selected for training, the proposed approach is able to achieve an excellent 94.90% classification accuracy, which further increases to 99%-100% when 10% or more of the data is used for training. On the other hand, while the performance of AFS-SVM reaches a plateau (showing only minimal increments) after 20% of the data is used for training, the classification accuracy of the local CNN gradually increases with data size from 27.99% to 97.57% and eventually surpasses AFS-SVM when 80% of the data is used for training, indicating the significance of the size of the training data for properly training a neural network. Although the data size greatly affects the performance of a CNN in the general sense, the proposed transfer learning architecture exhibits very high classification accuracy because only one fully connected stage needs to be trained locally, which notably lowers the amount of data required by a CNN to achieve a satisfactory outcome.

FIGURE 10. Vibration signal examples under different gear health conditions. (a) Healthy, (b) Missing tooth, (c) Root crack, (d) Spalling, (e) Chipping tip_5 (least severe), (f) Chipping tip_4, (g) Chipping tip_3, (h) Chipping tip_2, (i) Chipping tip_1 (most severe).
The average computational time consumed by each method is reported in Table 3, which includes the portions used for both training and classification. Generally speaking, deep neural networks are more time consuming in training compared to traditional approaches. The computational cost per iteration of mini-batch back-propagation is proportional to the number of weights involved.


TABLE 2. Classification results (3,600 sampling points).
TABLE 3. Computational time comparison (average of 5 attempts).
FIGURE 11. Comparison of classification results when the training data size varies.
FIGURE 12. Convergent histories of transfer learning and the local CNN for 5% training data. (a) Accuracy, (b) Mini-batch loss.

The overall computational time is linearly proportional to the size of the training data. As shown in Table 3, when the size of the training data is small (2%), the transfer learning approach leads not only in accuracy but also in computational efficiency compared to AFS-SVM.

Figure 12 shows the convergent histories (mini-batch accuracy and mini-batch loss) of the proposed approach and the local CNN when 5% of the data is used for training. As can be seen from the comparisons, transfer learning gradually converges in terms of both accuracy and loss as the training iterates, while the local CNN inclines to a 'random walk' due to insufficient data. Compared with AFS-SVM, the proposed approach not only excels in performance but also requires no preprocessing effort, which makes it more unbiased in feature extraction and readily applicable to other fault diagnosis practices. The proposed approach also shows satisfactory outcomes in regard to robustness. As demonstrated in Figure 13, it has the smallest variance among all cases. On the other hand, the performance of the under-trained local CNN oscillates the most.

As mentioned in Section II.C, the parameters in the first five convolutional stages of the original CNN are well trained in characterizing high-level abstractions, while the last three fully connected stages are trained to nonlinearly combine the high-level features.



TABLE 4. Classification results of transfer learning (average of 5 attempts).
FIGURE 13. Comparison of box plots of classification results when the training data size varies. (a) 2%, (b) 5%, (c) 10%, (d) 20%, (e) 40%, (f) 60%, (g) 80%.
FIGURE 14. Feature maps extracted by the 5 convolution layers of the proposed transfer learning approach.
FIGURE 15. Vibration signal of a spalling gear. (a) 3,600 sampling points, (b) 900 sampling points.

Hence, it is recommended to repurpose Stages 1 to 5 for novel tasks so as to adaptively extract image features. Whether to transfer Stages 6 and 7 remains optional, depending on the training data size. In our previous comparisons, only Stage 8 is reconstructed (from 1,000 classes to 9 classes) and trained using the local dataset. Here, we also compare the accuracy of the transfer learning approach when different aggregates of stages are transferred. As shown in Table 4, transferring Stages 1 to 7 and transferring Stages 1 to 6 yield similar performances, which are better than transferring merely Stages 1 to 5, especially when the data size is small. Recall Table 1: Stage 6 contains 4,096 more weighting parameters, which apparently require more training data to fine-tune even though the feature extraction passage is well established. Moreover, transferring more layers may indeed prevent the model from overfitting, because the transferred layers are already extensively trained, so the generalization error is naturally reduced when the model is repurposed and only a small portion is trained by a different set of data.

As discussed in Section II.B and Section II.C, the transferred stages of the proposed architecture tend to extract the high-level abstract features of the input that cannot be recognized otherwise, even if the input is different from that of the previous task.
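The repurposing scheme — keep the transferred stages frozen and train only the rebuilt output stage — can be sketched as follows (a schematic with hypothetical stage names mirroring Table 1, not the authors' implementation):

```python
# Mark which stages are transferred (frozen) vs. locally trained.
stages = {f"stage{i}": {"frozen": i <= 7} for i in range(1, 9)}
stages["stage8"]["frozen"] = False   # reconstructed: 1,000 -> 9 classes, trained locally

def trainable(stages):
    """Return the names of stages whose weights the local data will update."""
    return [name for name, cfg in stages.items() if not cfg["frozen"]]

print(trainable(stages))  # ['stage8']
```

Transferring fewer stages (e.g., only Stages 1 to 5) simply flips more entries to `frozen: False`, at the cost of more parameters that the small local dataset must fine-tune.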



TABLE 5. Classification results (900 sampling points).
FIGURE 16. Classification results of the three methods after down-sampling.
FIGURE 17. Box plots of classification results of the three methods after down-sampling. (a) 2%, (b) 5%, (c) 10%, (d) 20%, (e) 40%, (f) 60%, (g) 80%.

Figure 14 gives an example of such a procedure by showing the feature maps generated in each convolutional layer of the proposed architecture when it is used to classify a gearbox vibration signal. It is seen that the abstraction level of the input image continuously escalates from the 1st feature map to the 5th feature map. In general, the number of convolutional stages equipped is correlated with the level of abstraction at which the features can be represented in CNNs. As demonstrated in this case study, the base architecture is indeed transferable towards gear fault diagnosis tasks, and the proposed approach performs well with raw image signal inputs, which indicates that the transferred layers constructed in this study are generally applicable to represent useful features of an input image in high-level abstraction.

D. CASE 2 – 900 SAMPLING POINTS WITH VARYING TRAINING DATA SIZE
In Case 1, each vibration signal is composed of 3,600 angle-even data points in the course of 4 gear revolutions. In some practical fault diagnosis systems, however, the sampling rate may be lower, which means that some features could have been lost. To take this factor into consideration and further examine the approach, we now down-sample the original vibration signals to 900 angle-even data points (Figure 15) and apply the same three methods for classification.
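One simple way to obtain the 900-point signals is uniform decimation, keeping every fourth angle-even sample (an assumption for illustration; the paper does not state the exact down-sampling scheme):

```python
import numpy as np

signal = np.sin(np.linspace(0, 8 * np.pi, 3600))  # stand-in for a 3,600-point signal
downsampled = signal[::4]                          # keep every 4th angle-even point
print(downsampled.shape)  # (900,)
```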



Table 5 lists the comparison of the classification results of the three methods with different training data sizes. Similar to Case 1, the proposed transfer learning approach is the best performer. Figure 16 illustrates the classification results before and after down-sampling. While lowering the sampling rate deteriorates the overall performance of all approaches, each method exhibits a trend similar to that seen in Section III.C. Transfer learning starts with 60.11% classification accuracy and reaches 95.88% when only 20% of the data is used for training, whilst the accuracies of the local CNN and AFS-SVM are 43.56% and 70.07%, respectively. The local CNN performs better than AFS-SVM when 80% of the data is used for training. Unlike AFS-SVM, the performance of the local CNN can be largely improved if significantly more training data is incorporated, because the parameters of the lower stages can be learned from scratch. Eventually, the performance of the local CNN could reach that of the transfer learning approach. Nevertheless, for cases with limited data, the proposed transfer learning approach has an extensive performance margin over the local CNN and other preprocessing-based shallow learning methods such as AFS-SVM. Even with ample training data, initializing with transferred parameters can improve the classification accuracy in general. Moreover, the proposed approach requires no preprocessing. Similar to Case 1 in Section III.C, the proposed approach is very robust, especially when 40% or more of the data is used for training (Figure 17).

IV. CONCLUDING REMARKS
In this research, a deep convolutional neural network-based transfer learning approach is developed for deep feature extraction and applied to gear fault diagnosis. The proposed approach does not require manual feature extraction and can be effective even with a small set of training data. Experimental studies are conducted using preprocessing-free raw vibration data for gear fault diagnosis. The performance of the proposed approach is highlighted by varying the size of the training data. The classification accuracies of the proposed approach outperform those of other methods, such as a locally trained convolutional neural network and an angle-frequency analysis-based support vector machine, by as much as 50%. The achieved accuracy indicates that the proposed approach is not only viable and robust, but also has the potential to be applied to fault diagnosis of other systems.

REFERENCES
[1] D. Kang, Z. Xiaoyong, and C. Yahua, ''The vibration characteristics of typical gearbox faults and its diagnosis plan,'' J. Vib. Shock, vol. 20, no. 3, pp. 7–12, 2001.
[2] R. B. Randall, Vibration-Based Condition Monitoring: Industrial, Aerospace and Automotive Applications. West Sussex, U.K.: Wiley, 2011.
[3] F. P. G. Márquez, A. M. Tobias, J. M. P. Pérez, and M. Papaelias, ''Condition monitoring of wind turbines: Techniques and methods,'' Renew. Energy, vol. 46, pp. 169–178, Oct. 2012.
[4] W. Zhou, T. G. Habetler, and R. G. Harley, ''Bearing fault detection via stator current noise cancellation and statistical control,'' IEEE Trans. Ind. Electron., vol. 55, no. 12, pp. 4260–4269, Dec. 2008.
[5] A. Parey and R. B. Pachori, ''Variable cosine windowing of intrinsic mode functions: Application to gear fault diagnosis,'' Measurement, vol. 45, no. 3, pp. 415–426, 2012.
[6] T. Fakhfakh, F. Chaari, and M. Haddar, ''Numerical and experimental analysis of a gear system with teeth defects,'' Int. J. Adv. Manuf. Technol., vol. 25, nos. 5–6, pp. 542–550, 2005.
[7] D. Z. Li, W. Wang, and F. Ismail, ''An enhanced bispectrum technique with auxiliary frequency injection for induction motor health condition monitoring,'' IEEE Trans. Instrum. Meas., vol. 64, no. 10, pp. 2679–2687, Oct. 2015.
[8] W. Wen, Z. Fan, D. Karg, and W. Cheng, ''Rolling element bearing fault diagnosis based on multiscale general fractal features,'' Shock Vib., vol. 2015, Jul. 2015, Art. no. 167902.
[9] B. Tang, W. Liu, and T. Song, ''Wind turbine fault diagnosis based on Morlet wavelet transformation and Wigner–Ville distribution,'' Renew. Energy, vol. 35, no. 12, pp. 2862–2866, 2010.
[10] F. Chaari, W. Bartelmus, R. Zimroz, T. Fakhfakh, and M. Haddar, ''Gearbox vibration signal amplitude and frequency modulation,'' Shock Vib., vol. 19, no. 4, pp. 635–652, 2012.
[11] R. Yan, R. X. Gao, and X. Chen, ''Wavelets for fault diagnosis of rotary machines: A review with applications,'' Signal Process., vol. 96, pp. 1–15, Mar. 2014.
[12] X. Chen and Z. Feng, ''Time-frequency analysis of torsional vibration signals in resonance region for planetary gearbox fault diagnosis under variable speed conditions,'' IEEE Access, vol. 5, pp. 21918–21926, 2017.
[13] S. Zhang and J. Tang, ''Integrating angle-frequency domain synchronous averaging technique with feature extraction for gear fault diagnosis,'' Mech. Syst. Signal Process., vol. 99, pp. 711–729, Jan. 2018.
[14] C. Pachaud, R. Salvetat, and C. Fray, ''Crest factor and kurtosis contributions to identify defects inducing periodical impulsive forces,'' Mech. Syst. Signal Process., vol. 11, no. 6, pp. 903–916, 1997.
[15] S. Qian and D. Chen, ''Joint time-frequency analysis,'' IEEE Signal Process. Mag., vol. 16, no. 2, pp. 52–67, Mar. 1999.
[16] N. Baydar and A. Ball, ''A comparative study of acoustic and vibration signals in detection of gear failures using Wigner–Ville distribution,'' Mech. Syst. Signal Process., vol. 15, no. 6, pp. 1091–1107, 2001.
[17] W. Bartelmus and R. Zimroz, ''Vibration condition monitoring of planetary gearbox under varying external load,'' Mech. Syst. Signal Process., vol. 23, no. 1, pp. 246–257, 2009.
[18] J. Lin and M. J. Zuo, ''Gearbox fault diagnosis using adaptive wavelet filter,'' Mech. Syst. Signal Process., vol. 17, no. 6, pp. 1259–1269, 2003.
[19] Y. Lu, J. Tang, and H. Luo, ''Wind turbine gearbox fault detection using multiple sensors with features level data fusion,'' J. Eng. Gas Turbines Power, vol. 134, no. 4, p. 042501, 2012.
[20] C. Zhang, J. H. Sun, and K. C. Tan, ''Deep belief networks ensemble with multi-objective optimization for failure diagnosis,'' in Proc. IEEE Int. Conf. Syst., Man, Cybern. (SMC), Oct. 2015, pp. 32–37.
[21] C. Li, R.-V. Sanchez, G. Zurita, M. Cerrada, D. Cabrera, and R. E. Vásquez, ''Gearbox fault diagnosis based on deep random forest fusion of acoustic and vibratory signals,'' Mech. Syst. Signal Process., vols. 76–77, pp. 283–293, Aug. 2016.
[22] D. Weimer, B. Scholz-Reiter, and M. Shpitalni, ''Design of deep convolutional neural network architectures for automated feature extraction in industrial inspection,'' CIRP Ann.-Manuf. Technol., vol. 65, no. 1, pp. 417–420, 2016.
[23] T. Ince, S. Kiranyaz, L. Eren, M. Askar, and M. Gabbouj, ''Real-time motor fault detection by 1-D convolutional neural networks,'' IEEE Trans. Ind. Electron., vol. 63, no. 11, pp. 7067–7075, Nov. 2016.
[24] O. Abdeljaber, O. Avci, S. Kiranyaz, M. Gabbouj, and D. J. Inman, ''Real-time vibration-based structural damage detection using one-dimensional convolutional neural networks,'' J. Sound Vib., vol. 388, pp. 154–170, Feb. 2017.
[25] N. Saravanan and K. I. Ramachandran, ''Incipient gear box fault diagnosis using discrete wavelet transform (DWT) for feature extraction and classification using artificial neural network (ANN),'' Expert Syst. Appl., vol. 37, no. 6, pp. 4168–4181, 2010.
[26] J. Yang, S. Li, and W. Xu, ''Active learning for visual image classification method based on transfer learning,'' IEEE Access, vol. 6, pp. 187–198, 2018.
[27] Y. Le Cun et al., ''Handwritten digit recognition with a back-propagation network,'' in Proc. Adv. Neural Inf. Process. Syst., 1990, pp. 396–404.
[28] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ''ImageNet classification with deep convolutional neural networks,'' in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[29] Y. Kim. (2014). ''Convolutional neural networks for sentence classification.'' [Online]. Available: https://arxiv.org/abs/1408.5882



[30] A. van den Oord, S. Dieleman, and B. Schrauwen, ''Deep content-based music recommendation,'' in Proc. Adv. Neural Inf. Process. Syst., 2013, pp. 2643–2651.
[31] C.-K. Shie, C.-H. Chuang, C.-N. Chou, M.-H. Wu, and E. Y. Chang, ''Transfer representation learning for medical image analysis,'' in Proc. 37th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), Aug. 2015, pp. 711–714.
[32] R. Zhang, H. Tao, L. Wu, and Y. Guan, ''Transfer learning with neural networks for bearing fault diagnosis in changing working conditions,'' IEEE Access, vol. 5, pp. 14347–14357, 2017.
[33] M. D. Zeiler and R. Fergus. (2013). ''Stochastic pooling for regularization of deep convolutional neural networks.'' [Online]. Available: https://arxiv.org/abs/1301.3557
[34] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. (2013). ''OverFeat: Integrated recognition, localization and detection using convolutional networks.'' [Online]. Available: https://arxiv.org/abs/1312.6229
[35] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, ''How transferable are features in deep neural networks?'' in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 3320–3328.
[36] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, ''Dropout: A simple way to prevent neural networks from overfitting,'' J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
[37] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, ''On the importance of initialization and momentum in deep learning,'' in Proc. Int. Conf. Mach. Learn., Feb. 2013, pp. 1139–1147.
[38] D. P. Kingma and J. Ba. (2014). ''Adam: A method for stochastic optimization.'' [Online]. Available: https://arxiv.org/abs/1412.6980
[39] H. S. Prashanth, H. L. Shashidhara, and M. K. N. Balasubramanya, ''Image scaling comparison using universal image quality index,'' in Proc. Int. Conf. Adv. Comput., Control, Telecommun. Technol. (ACT), Dec. 2009, pp. 859–863.

PEI CAO received the B.S. degree in automation from Northwestern Polytechnical University, Xi'an, China, in 2011. He is currently pursuing the Ph.D. degree in mechanical engineering with the University of Connecticut, Storrs, CT, USA. His research interests include global optimization, dynamic programming, statistical inference, layout design, and machine learning.

SHENGLI ZHANG received the B.S. degree from Northwestern Polytechnical University, Xi'an, China, in 2009, the M.S. degree from Xi'an Jiaotong University, Xi'an, in 2012, and the Ph.D. degree from the University of Connecticut, Storrs, USA, in 2017, all in mechanical engineering. After graduation, he joined Stanley Black & Decker, Towson, MD, USA, as a CAE Engineer, where he is involved in product development and experimental data analysis as well as product performance prediction and analytical dynamic analysis.

JIONG TANG (M'09) received the B.S. and M.S. degrees in applied mechanics from Fudan University, China, in 1989 and 1992, respectively, and the Ph.D. degree in mechanical engineering from the Pennsylvania State University, USA, in 2001. He was with the GE Global Research Center as a Mechanical Engineer from 2001 to 2002. He then joined the Mechanical Engineering Department, University of Connecticut, where he is currently a Professor and the Director of the Dynamics, Sensing, and Controls Laboratory. His research interests include structural dynamics and system dynamics, control, and sensing and monitoring. He currently serves as an Associate Editor for the IEEE/ASME TRANSACTIONS ON MECHATRONICS, and served as an Associate Editor for the IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT from 2009 to 2012. He also served as an Associate Editor for the ASME Journal of Vibration and Acoustics and the ASME Journal of Dynamic Systems, Measurement, and Control.

