0% found this document useful (0 votes)

69 views13 pages

ISA Transactions: Te Han, Chao Liu, Wenguang Yang, Dongxiang Jiang

Uploaded by

Felipe Andres Figueroa Videla

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

69 views13 pages

ISA Transactions: Te Han, Chao Liu, Wenguang Yang, Dongxiang Jiang

Uploaded by

Felipe Andres Figueroa Videla

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 13

ISA Transactions 97 (2020) 269–281

Contents lists available at ScienceDirect

ISA Transactions
journal homepage: www.elsevier.com/locate/isatrans

Practice article

Deep transfer network with joint distribution adaptation: A new

intelligent fault diagnosis framework for industry application
∗
Te Han a,b , Chao Liu a,c , , Wenguang Yang a,b , Dongxiang Jiang a,b
a
Department of Energy and Power Engineering, Tsinghua University, Beijing 100084, China
b
State Key Laboratory of Control and Simulation of Power System and Generation Equipment, Tsinghua University, Beijing 100084, China
c
Key Laboratory for Thermal Science and Power Engineering of Ministry of Education, Tsinghua University, Beijing 100084, China

article info a b s t r a c t

Article history: In recent years, an increasing popularity of deep learning model for intelligent condition monitoring
Received 30 January 2019 and diagnosis as well as prognostics used for mechanical systems and structures has been observed.
Received in revised form 2 August 2019 In the previous studies, however, a major assumption accepted by default, is that the training and
Accepted 4 August 2019
testing data are taking from same feature distribution. Unfortunately, this assumption is mostly invalid
Available online 12 August 2019
in real application, resulting in a certain lack of applicability for the traditional diagnosis approaches.
Keywords: Inspired by the idea of transfer learning that leverages the knowledge learnt from rich labeled data in
Transfer learning source domain to facilitate diagnosing a new but similar target task, a new intelligent fault diagnosis
Domain adaptation framework, i.e., deep transfer network (DTN), which generalizes deep learning model to domain
Joint distribution adaptation adaptation scenario, is proposed in this paper. By extending the marginal distribution adaptation
Intelligent fault diagnosis (MDA) to joint distribution adaptation (JDA), the proposed framework can exploit the discrimination
Convolutional neural networks structures associated with the labeled data in source domain to adapt the conditional distribution of
unlabeled target data, and thus guarantee a more accurate distribution matching. Extensive empirical
evaluations on three fault datasets validate the applicability and practicability of DTN, while achieving
many state-of-the-art transfer results in terms of diverse operating conditions, fault severities and fault
types.
© 2019 ISA. Published by Elsevier Ltd. All rights reserved.

1. Introduction data and testing data follow a similar distribution. Take bearing
fault diagnosis as an example. Lei et al. [1] utilized ensemble
In modern industry, machines and equipment are developing empirical mode decomposition (EEMD) and statistical param-
towards the direction of high-precision, high-efficiency, more eters to extract features, and wavelet neural network (WNN)
automatic and more complicated, making the breakdown or even to intelligently classify and diagnose bearing health conditions.
accidents more frequent. Intelligent monitoring and fault diagno- Verstraete et al. [10] designed a deep feature learning method
sis systems, in a broad sense, have always been key to attaining using time–frequency images and convolutional neural networks
the enhancement of security and reliability of industry equip- (CNN) for bearing fault diagnosis. Feng et al. [11] presented a
ment [1]. Over the past decade, various attempts have been made local connection network constructed by stacked auto-encoder
to design efficient algorithms or new ways for achieving superior (SAE) to extract shift-invariant features from bearing fault signals.
diagnostic performance. These studies usually merge advanced Numerous other works can be found in related reviews [15,16].
signal processing algorithms and machine learning techniques to In these works, the monitored signal is generally divide into
process machine data and make diagnostic decisions intelligently, many segments, i.e., samples. These samples are randomly par-
leading to impressive results in many diagnosis cases [2–6]. titioned into the training data and testing data. In this manner,
Marvelous success on diverse intelligent fault diagnosis frame- the designed methods or algorithms are actually validated in
works have been reported over the past decade [7–14]. However, the same data distribution. These reported works contribute to
two latent problems in these works may restrict extensive and the development of more effective diagnosis methods utilizing
flexible industry applications. (1) Most of designed methods or expert knowledge or adaptively feature learning, while ignore
algorithms are validated based on a assumption: the training the fact of distribution discrepancy. Due to the multiple loading
conditions, working environments and fault severities for bear-
∗ Corresponding author at: Department of Energy and Power Engineering, ing, the distributions between training data and testing data are
Tsinghua University, Beijing 100084, China. different in real situations. The diagnostic model is generally
E-mail address: [email protected] (C. Liu). learned with the training data of limited conditions, and the

https://doi.org/10.1016/j.isatra.2019.08.012
0019-0578/© 2019 ISA. Published by Elsevier Ltd. All rights reserved.
270 T. Han, C. Liu, W. Yang et al. / ISA Transactions 97 (2020) 269–281

generalization error cannot be large enough to guarantee the 2. Related works

success on the testing data for diverse application domains. (2)
The success of intelligent fault diagnosis methods relies on the To date, according to the procedure of diagnosis framework,
supervised training of labeled data. A large amount of training most of previous studies can be divided into two stages. In the
data is often required so that the hierarchical features can be first stage, the diagnosis framework mainly includes three steps
fully learned and a stronger generalization ability can be achieved (1) data collection, (2) feature extraction and selection, and (3)
by the deep networks with millions of parameters. However, in fault classification (Fig. 1(a)) [7,23,24]. In this framework, a mas-
real problems, especially for those unseen conditions, collecting sive efforts have been devoted to manual feature extraction and
sufficient typically labeled samples is usually an expensive or selection. This process benefits from the extensive domain ex-
even impossible task. Considering the previous problem, it is even pertise captured by diagnosis specialist, but inevitably requires
impossible to relearn the diagnostic model from sufficient fault a large expenditure of labor and time. Besides, the designed
samples for new diagnosis tasks. features always aim at special application object, and thus have
Consequently, there is, in particular, a need to develop a limited adaptability when facing new diagnosis issues or chang-
framework that can solve the problem of distribution discrep- ing the physical characteristics of the original systems [25]. More-
ancy between training data and testing data. And the obtained over, the final decision-making resorts to pattern recognition
useful information from historical training tasks can be fur- methods, and the diagnostic performance is often sensitive to
ther borrowed to assist the diagnosis of new but similar testing model parameters, such as the penalty factor and kernel func-
tasks, instead of reconstructing and re-training a new diagnosis tion parameter in support vector machine (SVM), indicating the
model from scratch. This could make the diagnosis systems more additional parameter optimization procedure need to be exe-
practical and flexible to be deployed in a variety of applica- cuted [26]. To tackle these issues, an adaptive feature learning
tions. Transfer learning provides a new tool for these problems based diagnostic framework with deep learning technology is
and has proven its wide applicability spanning through various emerging in the second stage [12,27–29]. With the aid of multi-
fields [17–21]. Different from traditional machine learning pro- layer nonlinear modeling scheme, this framework provides an
cedure, transfer learning framework focuses on enhancing the end-to-end learning procedure from input signals to output diag-
model performance and reduces the quantity of required sample nosis labels, as shown in Fig. 1(b). The training process, in which
in target domain by leveraging the transferable features or knowl- the error estimated by the upper classification layer is back-
edge from source domain [22]. It is definitely a promising way
propagated to update the parameters of lower feature descriptor
for tackling the aforementioned challenges. Motivated by this,
layers, further guarantees the co-adaptation of the whole net-
a new intelligent fault diagnosis framework, i.e., deep transfer
work. In the past few years, the deep learning models, including
network (DTN), is proposed in this work. A base network, CNN,
SAE [11,30], deep belief networks (DBN) [31] and, in particular,
is first learned on sufficient training data (source domain). Then,
CNN [10,13,14,32–35], have gained much popularity and success
a joint distribution adaptation (JDA) scheme is devised to reduce
in mechanical fault diagnosis issues, showing an extraordinary
the discrepancy and learn the shared features representation be-
feature learning and fitting capacity. These studies significantly
tween training data and real-time monitored testing data (target
facilitate the application of artificial intelligence in fault diagno-
domain) in real deployment scenario. Finally, the adapted model
sis. However, as mentioned above, most of developed methods
can be competent in the target tasks. On the basis of absorbing
are still categorized into traditional machine learning without
and drawing upon informed research, DTN utilizes the widely
considering the problem of domain discrepancy. For real indus-
reported deep learning-based fault diagnosis framework, and
trial diagnosis tasks,typically fault data is often limited, and the
further realize the goal of cross-domain fault diagnosis, making
training data are generally from the experimental environment
the diagnosis framework be more fit to practical engineering.
The major contributions of this work can be summarized or the historical database of associated equipment. Due to the
as: (1) A new intelligent fault diagnosis framework, exploiting complex application scenarios of mechanical equipment, the real-
the idea of transfer learning, is proposed for more practical ap- time testing data may follow the different feature distribution.
plication scenarios. The DTN with JDA method can utilize the Consequently, the researches on cross-domain fault diagnosis
unlabeled target data to realize domain adaption, which conforms have a significant practical sense.
better with the real situation. (2) Compared with traditional max- In recent years, the broad application prospect of transfer
imum mean discrepancy (MMD) minimization in marginal distri- learning has been viewed in different research areas. Several com-
bution, the superiority of JDA has been demonstrated in the field prehensive surveys were made to review the present develop-
of fault diagnosis. The experimental results show the DTN with ment of transfer learning [36,37]. In the surveys, transfer learning
JDA is able to provide a more accurate matching of feature distri- technology is categorized into many branches, such as inductive
bution between domains. (3) Three benchmark datasets, i.e., wind transfer learning, transductive transfer learning and unsupervised
turbine dataset, bearing dataset and gearbox dataset, are used for transfer learning. Domain adaptation, as a transductive transfer
extensive empirical evaluations. Three novel transfer scenarios learning, fits the situation where the source domain labels are
in mechanical fault diagnosis, namely, various operating condi- available, the target domain labels are unavailable. As this sit-
tions, diverse fault severity levels and different fault types, are uation is normally seen in practical problems for various fields,
considered. The presented framework achieves superior diagnosis the domain adaptation has also received wide attention. Some
performance in comparison with the state-of-the-art algorithms algorithms concerned, such as transfer joint matching (TJM) [38],
including supervised and domain adaption algorithms. transfer component analysis (TCA) [39], joint distribution adap-
The remainder of this work includes. In Section 2, the liter- tation (JDA) [40] and deep transfer learning [41–43] have been
atures about intelligent fault diagnosis are reviewed to further designed gradually for image classification. In intelligent fault
show the advantages and limitations of existing studies. In Sec- diagnosis, there are only a few works considering the appli-
tion 3, the preliminaries are described. In Section 4, the proposed cation of transfer learning to strengthen the applicability and
intelligent fault diagnosis framework, DTN with JDA, is presented. flexibility of diagnosis framework for diverse domain tasks, as
The comparison methods and implementation details are also shown in Fig. 1(c). In [44], the authors developed a SAE based
explained. The experiments, results and discussion are given in domain adaptation method for bearing diagnosis across diverse
Section 5 and Section 6, respectively. Finally, the conclusions are operating conditions, where a MMD term is utilized to measure
drawn in Section 7. the domain discrepancy. In [45], the authors proposed a domain
T. Han, C. Liu, W. Yang et al. / ISA Transactions 97 (2020) 269–281 271

Fig. 1. Intelligent fault diagnosis framework. (a) Stage I, (b) Stage II and (c) New one.

adaptation method, which employed a MMD term to evaluate the where ℓ means the loss function to calculate the cost between
discrepancy of normal category between source and target do- true label h(X ) and predicted label by CNN model f (X , {W i }Ki=1 ,
mains, and retained the sophisticated fault features with a weight {bi }Ki=1 , U ).
regularization term. These studies have preliminarily explored
the effectiveness of transfer learning in the field of intelligent 3.2. Transfer learning
fault diagnosis, but further works are needed to improve this
framework in the following two aspects. (1) The transfer scenario For completeness, the definitions of transfer learning are first
should be extended to more challenging diagnosis tasks, such as presented.
the diverse fault severity levels and diverse fault types. (2) The
previous studies only adapted the marginal distribution without
Definition 1 (Domain). A domain D is composed of two compo-
considering the conditional distribution, leading to the neglect of
nents: a feature space X and a marginal probability distribution
the discrimination structures in rich labeled source data. Jointly
reducing the discrepancy in both marginal distribution and con- P(X ), where X = {x1 , . . . , xn } ∈ X is a particular training dataset,
ditional distribution may hold the potential to achieve superior i.e., D = {X , P(X )}.
transfer performance.
Definition 2 (Task). A task T consists of two parts, a label space
3. Preliminaries Y and a predictive function f (X ), which can be learned from the
instance set X , i.e., T = {Y , f (X )}. Also, f (X ) = Q (Y |X ) is the
3.1. Convolutional neural network conditional probability distribution.

CNN, as a type of most effective deep learning models, has Definition 3 (Transfer Learning). Given a source domain Ds with
been widely used in image processing, computer vision and a learning task Ts and a target domain Dt with a learning task Tt ,
speech recognition. Typically, a CNN is composed of three types transfer learning aims to facilitate the learning process of target
of layers, which are convolutional layers, pooling layers and fully- predictive function ft (X ) in Dt by using the related information
connected layers. The first step of CNN is to convolve the input or knowledge in Ds and Ts , where Ds ̸ = Dt , or Ts ̸ = Tt . When
signal with a set of filter kernels (1D for time-series signal and 2D the Ds = Dt and Ts = Tt , it will be categorized into traditional
for image). All the feature activations by convolution operation at machine learning task.
different locations constitute the feature map. A nonlinear activa-
tion function, generally rectified linear unit (ReLU), is applied on Two remarks should⋁ be emphasized here. The condition Ds ̸=
the sum of feature maps. The operation of convolutional layer can Dt means Xs ̸ = ⋁ Xt Ps (Xs ) ̸ = Pt (Xt ). And the condition of Ts ̸ = Tt
be expressed as: implies Ys ̸ = Yt Qs (Ys |Xs ) ̸ = Qt (Yt |Xt ).
∑
cnr = ReLU( vmr −1 ∗ wnr + brn ) (1)
3.3. Maximum mean discrepancy
m

where cnr
is the nth output of convolutional layer r, n represents MMD is an index to measure the discrepancy of two distribu-
the number of filter in layer r, wnr and brn are the nth filter and bias tions. Given two dataset Xs , Xt , Ps (Xs ) ̸ = Pt (Xt ) and a nonlinear
of layer r respectively, vmr −1
is the mth output from previous layer
mapping function φ in a reproducing Kernel Hilbert space H
r − 1, ∗ denotes the convolution operation. The obtained feature
(RKHS), the formulation of MMD can be defined as:
map is then processed with a pooling layer by taking the mean
ns nt
or maximum feature activation over disjoint regions. By cascad- 1 ∑ 1 ∑
ing the combination of convolutional layer and pooling layer, a MMDH (Xs , Xt ) = ∥ φ (xsi ) − φ (xti )∥2H (3)
ns nt
multi-layer structure is built for feature description. Finally, the i=1 i=1
fully-connected layers, just like the layers in multi-layer neural In (3), we can find that the empirical estimation of the discrep-
network, are employed for classification. Given the training set ancy for two distributions is considered as the distance between
{Xj }j , the learning process of a CNN with K convolutional layers, the two data distributions in RKHS. A value near zero for MMD
including the parameters of filters {W i }Ki=1 , the biases {bi }Ki=1 and
means the two distributions are matched. In transfer learning,
classification layers U, can be defined as an optimization task:
∑ MMD is generally used to construct the regularization term for
min ℓ(h(X j ), f ({W i }Ki=1 , {bi }Ki=1 , U )) (2) the constraint in feature learning, making the learned feature
{Wi }Ki=1 ,{bi }Ki=1 distributions more similar between different domains.
j
272 T. Han, C. Liu, W. Yang et al. / ISA Transactions 97 (2020) 269–281

Fig. 2. An illustration of MDA and CDA, f : discriminative hyperplane, Ds : feature distribution in source domain, Dt : feature distribution in target domain.

4. Deep transfer network with joint distribution adaptation same distribution. If the marginal distribution for (4) holds, the
optimization problem in (7) becomes
4.1. Joint distribution adaptation
min D(Qs (φ (Xs )|Ys ), Qt (φ (Xt )|Yt )) (8)
Generally, the probability distributions of diverse domains The above objective function is noted as CDA. This step is
may exhibit significant difference not only in marginal distribu- essential for an accurate and robust distribution adaptation. How-
tion, which represents the cluster center of feature distributions, ever, it is still intractable as Yt is unknown. Some previous studies
but also in conditional distribution for large amount of practical proposed a circuitous way by exploiting the pseudo labels for
applications. From Fig. 2(a) to (b), it is clear the distributions target data to handle the CDA in unsupervised domain adapta-
for source and target domains are different. The direct use of tion [40,46]. With the aid of the pre-trained models on labeled
trained discriminative hyperplane in source domain will lead to source data, pseudo labels for target data can be preliminarily
the extensive misclassification in target domain. The marginal supplied. Supposing a total of C categories and the category c ∈
distribution adaptation (MDA) contributes to improving transfer {1, . . . , C }, the distance index, MMD, can be defined to measure
performance by aligning the two distribution centers. However, the mismatch of conditional distributions Qs (xs |ys = c) and
only adapting the marginal distributions is insufficient, since the Qt (xt |yt = c) of c category,
discriminative hyperplanes may be different for diverse domain (c) 1 ∑ 1 ∑
tasks. The conditional distribution adaptation (CDA), which aims MMD2H (Qs(c) , Qt ) = ∥ (c)
φ (xsi ) − (c)
φ (xtj )∥2H (9)
ns nt
to match the discriminative structures between labeled source (c)
xsi ∈Ds
(c)
xtj ∈Dt
data and unlabeled target data, is also indispensable and highly
(c)
effective. An intuitive description of this consideration is illus- where Ds = {xi : xi ∈ Ds ∧ y(xi ) = c }, y(xi ) is the true label, and
(c) (c) (c)
trated from Fig. 2(b) to (c). Hence, in this part, we are dedicated to ns = |Ds |, Dt = {xj : xj ∈ Dt ∧ ŷ(xj ) = c }, ŷ(xj ) is the pseudo
presenting a simple mathematical formulation of JDA, and further (c) (c)
label and nt = |Dt |.
providing a specific deep transfer framework. It should be noted that, although there are probably many
Problem formulation (joint distribution adaptation) In a mistakes in the initial pseudo labels, one can iteratively update
n
fault diagnosis task, given a labeled source dataset Xs = {xsi , ysi }i=s 1 the pseudo labels in the stage of model optimization to ob-
t nt
and a unlabeled target dataset Xt = {xi }i=1 , Xs = Xt , Ys = Yt , tain the optimal prediction accuracy under the current learning
Ps (Xs ) ̸ = Pt (Xt ), Qs (Ys |Xs ) ̸ = Qt (Yt |Xt ). The weak form of transfer conditions.
learning with domain adaptation is to learn a feature transform (3) JDA: By integrating marginal MMD and conditional MMD,
that simultaneously minimizes the discrepancy between marginal a regularization term of JDA can be written as:
distribution and conditional distribution [39], i.e., C
∑ (c)
min D(Ps (φ (Xs )), Pt (φ (Xt ))) (4) DH (Js , Jt ) = MMD2H (Ps , Pt ) + MMD2H (Qs(c) , Qt ) (10)
c =1

where Js and Jt is the joint probability distribution of Ds and Dt ,

and min D(Qs (Ys |φ (Xs )), Qt (Yt |φ (Xt ))) (5) respectively. Minimizing the (10) can guarantee the match both
in marginal distribution and marginal distribution with sufficient
where D is the function to evaluate the domain discrepancy.
statistics.
(1) MDA: The objective function of (4) is to minimize the
distance between the two data distributions in RKHS, where we 4.2. Deep transfer network
can apply MMD (3) to tackle it. The formula is described as:
ns
1 ∑
nt
1 ∑ Having introduced the regularization term of JDA, we now
MMD2H (Ps , Pt ) = ∥ φ (xsi ) − φ (xti )∥2H (6) turn to the establishment of DTN, attempting to realize the goal
ns nt of domain adaptation under deep learning framework. CNN is
i=1 i=1
utilized as the basic model in this work.
where φ : X → H is the nonlinear mapping function in RKHS.
Generally, we can train a CNN model on the sufficient source
(2) CDA: The conditional distribution in (5) is intractable in
data from scratch with the optimization task defined in (2). The
the absence of classification ground truth. We rewrite it into the
cross-entropy ℓce between estimated probability distribution and
following form:
true label is served as the loss function. When applying the
QS (φ (Xs )|Ys ) · Ps (Ys ) Qt (φ (Xt )|Yt ) · Pt (Yt ) pre-trained CNN model to domain adaptation, a new objective
min D( , ) (7)
Ps (φ (Xs )) Pt (φ (Xt )) function is redefined by integrating the ℓce and regularization
term of JDA, rewritten as:
For our problem, we have P(Ys ) = P(Yt ), as the labels of
the source domain and the target domain are assumed with the L(Θ ) = ℓce + λDH (Js , Jt ) (11)
T. Han, C. Liu, W. Yang et al. / ISA Transactions 97 (2020) 269–281 273

Fig. 3. The architecture of DTN for unsupervised domain adaptation.

l
where θ = {W i , bi i=1 } is the parameter collection of a CNN with Algorithm 1 Training Procedure of DTN with JDA
l layers and λ is non-negative regularization parameter. It should n
Input: Given the dataset Ds = {xsi , ysi }i=s 1 in source domain, unlabeled
be emphasized that the mapping function φ in RKHS H is the n
dataset Dt = {xti }i=t 1 in target domain, the architecture of deep neural
nonlinear feature transform learned by deep models herein. For network, the trade-off parameters λ.
CNNs, the features always change from general to specific with Output: Transferred network and predicted labels for target samples
the increase of layer depth. The upper layers tend to represent 1: begin
more abstract features, which may result in a larger domain dis- 2: Train a base deep network on the source dataset Ds
n
3: Predict the pseudo labels Ŷ0 = {yti }i=t 1 for target samples with base
crepancy [47]. Consequently, we deploy the regularization term
network
on the last hidden fully-connected layer, namely the layer in front 4: repeat
of discrimination layer, that is, φ (x) = hl−1 (x), where hl−1 (·) is the 5: j = j + 1
feature map by the nonlinear feature transform of the first (l − 1) 6: Compute the regularization term of JDA according to (10)
layers. The JDA regularization term employed in conjunction with 7: Network optimization with respect to (11)
deep models can generate the mapping function φ by adaptively 8: Update the pseudo labels Ŷj with optimized network
learning from data, and avoid to manually set the parameterized 9: until convergence or Ŷj = Ŷj−1
10: Check the diagnosis performance of transferred network on other
kernel function. target samples.
The architecture of proposed DTN with JDA is illustrated in
Fig. 3. A domain-shared CNN is utilized to extract signal charac-
teristics for both source data and target data. That is, the struc-
ture and weights of convolutional blocks and fully-connected
layers keep consistent in source and target domains. By execut- ⎧ ns nt
ing the forward pass, the two terms in (11) can be calculated, 2 1 ∑ 1 ∑
φ s
φ (xtj )),
⎪
namely, the traditional cross-entropy loss ℓce and the regular-
⎪
⎪ ( (x i ) − x ∈ Ds
⎨ ns ns
⎪ nt
i=1 j=1
ization term of JDA. Then, the backpropagation algorithm and ∇ MMD2H (Ps , Pt ) = nt ns
mini-batch stochastic gradient descent (SGD) are utilized for net- 2 1 ∑ 1 ∑
φ (xtj ) − φ (xsi )),
⎪
⎩ nt ( nt x ∈ Dt
⎪
work optimization. On the one hand, by optimizing the loss
⎪
ns
⎪
ℓce , the model is animated to capture the discriminant structure j=1 i=1

from the labeled source data. On the other hand, by optimizing (14)
the regularization term of JDA, the model can further reduce
the discrepancy of feature distributions between domains and
learn domain-invariant feature representation so that the learnt (c)
and ∇ MMD2H (Qs(c) , Qt ) =
discriminant structure in source domain can also be applied to ⎧
target data. 2 1 ∑ 1 ∑
⎪
⎪ (c)
( (c) φ (xsi ) − (c)
φ (xtj )), x ∈ Ds
The gradient of objective function with respect to network
⎪
⎪ n
⎨ s
⎪ n s (c)
nt (c)
parameters is xsi ∈Ds xtj ∈Dt (15)
∂ℓce ∂φ (x) 2 1 ∑ 1 ∑
φ (xtj ) − φ (xsi )),
⎪
∇θ l = + λ(∇ DH (Js , Jt ))T ( ) (12)
⎪
⎪ (c)
( (c) (c)
x ∈ Dt
∂Θl ∂Θl ⎩ nt nt t (c) ns
⎪
⎪
(c)
xj ∈Ds xsi ∈Ds
∂φ (x)
where ∂ Θ l are the partial derivatives of the output of (l −
1)th layer with network parameters. The detailed formulations 4.3. Training strategy
of ∇ D2H (Js , Jt ) are described as:
C
∑
∇ DH (Js , Jt ) = ∇ MMD2H (Ps , Pt ) + ∇ MMD2H (Qs(c) , Qt(c) ) (13) The training procedure of this framework mainly consists of
c =1
two parts: (1) the pre-training on labeled source data and (2)
274 T. Han, C. Liu, W. Yang et al. / ISA Transactions 97 (2020) 269–281

Fig. 4. Illustration of wind turbine experimental platform.

the network adaptation in target domain with the input of both loading conditions (i.e., varying wind speeds). The experiments
labeled source data and unlabeled target data. It should be noted are performed under six different wind speeds ranging from 5.8
that the dataset is generally divided into small batches, which m/s to 11.5 m/s (loads 0–5). And the corresponding speeds of
are fed into the network for training. A desirable batch size wind wheel range from 255 rpm to 300 rpm. The raw vibration
should be as large as possible to cover the variance of the whole data is collected by accelerometers. The sampling frequency is
dataset, whereas a too large batch size will also increase the 20 kHz. The time-domain waveforms of diverse machine condi-
calculation burden. It is a trade-off between transfer performance tions under load 5 are presented in Fig. 5. When the machine is
and computational effectiveness. Besides, the same amount of health (condition 0), it is clear the vibration amplitude maintains
samples from source and target domain are used for network in low level and the signal components related to rotating fre-
adaptation. When the data sizes are different across domains, the quency is dominated. When the faults are introduced to machines
re-sampling can be applied in the smaller dataset to keep the (conditions 1 to 9), obvious impulse characteristics appear, espe-
same sample size in the source and target domains. The whole cially for bearing-related faults (conditions 3 to 5). And the signal
adaptation steps of DTN with JDA are listed in Algorithm 1. components are more complex.
For clarity, the denotation of A→B is utilized to represent the
5. Experiments transfer task from source dataset A to target dataset B. In the
wind turbine fault dataset, we aim to explore the transfer ability
In this section, experiments on three mechanical fault datasets of proposed framework across diverse operating conditions. Con-
are conducted to demonstrate the efficiency, superiority as well sequently, six transfer tasks are designed for empirical evaluation
as practical value of proposed transfer framework. Mechanical (listed in Table 1). For instance, A→B: the source dataset A
equipment may appear diverse failure mode during the long-time contains the samples of ten machine conditions under load 0–
operation. Different faults may present different characteristics. 2, while the target dataset B is composed of the samples under
The studies of intelligent fault diagnosis focus on classifying the load 3–5. In Table 1, the unlabeled target samples are utilized for
signal samples from different health conditions and make di- domain adaptation. No information of label can be used in this
agnostic decisions automatically. In the three datasets, the fre- process. After domain adaptation, another set of testing target
quently occurred faults in mechanical systems are artificially samples with labels are used to evaluate the performance of
introduced to machines so as to simulate diverse health condi- transferred diagnosis model.
tions. The vibration signal under diverse machine conditions are (2) Bearing Fault Dataset: The bearing fault dataset is an open-
collected. The performance of proposed method and comparative access dataset from Case Western Reserve University [48]. Four
methods can be further tested in these fault datasets. different bearing conditions: health, outer ring fault (OF), rolling
element fault (RF) and inner ring fault (IF) (corresponding labels
5.1. Data description 0–3), are considered in this dataset. The experiments are per-
formed under four motor speeds (1797, 1772, 1750 and 1730
(1) Wind Turbine Fault Dataset: The first dataset is from our rpm) at a sampling frequency of 12 kHz. For each kind of fault,
wind turbine experimental platform, whose schematic diagram is single point faults with different severity levels are introduced
illustrated in Fig. 4. This dataset contains ten machine conditions, to test bearing respectively. In most existing studies, the samples
which are health, front bearing pedestal loosening (FB), back with the same fault type but different severity levels are treated
bearing pedestal loosening (BB), rolling element fault of front as distinct categories. Indeed, the signal characteristic of certain
bearing (RF), inner-ring fault of front bearing (IF), outer-ring fault fault type always varies with the severity level. Therefore, we aim
of front bearing (OF), misalignment in horizontal direction (MH), to investigate the performance of proposed transfer framework
misalignment in vertical direction (MV), variation in airfoil of across diverse fault severities in this dataset.
blades (VB) and yaw fault (YF) respectively (corresponding labels For simplicity, we select two fault severity levels with the fault
0–9). All these faults can basically simulate the typical failure diameters (FD) of 0.18 mm and 0.53 mm to construct transfer
modes from wind wheel to drive chain of a real wind turbine. tasks: G→H, H→G. The dataset G is composed of the samples of
To create working conditions close to reality, we change the four bearing conditions under four motor speeds and the fault
power of axial flow fan in the wind tunnel to generate varying diameter of OF, RF and IF cases is 0.18 mm. The dataset H is
T. Han, C. Liu, W. Yang et al. / ISA Transactions 97 (2020) 269–281 275

Fig. 5. Time waveforms of collected vibration data.

Table 1
Designed transfer tasks across diverse operating conditions.
Transfer tasks Source domain Target domain Unlabeled target samples Testing target sample Machine conditions
A→B Load 0–2 Load 3–5 24000 4000
B→A Load 3–5 Load 0–2 24000 4000 10
C→ D Load 2 Load 3–5 24000 4000 conditions
D→C Load 3–5 Load 2 12000 4000 (labels
E→ F Load 2 Load 5 12000 4000 0–9)
F→E Load 5 Load 2 12000 4000

formed by the health samples and fault samples with 0.53 mm 5.2. Comparison studies
fault diameter.
(3) Gearbox Fault Dataset: The gearbox fault dataset collected (1) Comparison methods: The proposed framework will be
from our single-stage cylindrical straight gearbox test rig (as compared with several state-of-the-art methods in the field of
shown in Fig. 6) is analyzed in the scenario where the domain dis- intelligent fault diagnosis: (1) SVM [26]; (2) Random forest (RF)
crepancy between specific fault types are expected to be bridged [26]; (3) Empirical mode decomposition analysis (EMD) [1]; (4)
by transfer learning. Sometimes, it may be more practical to CNN [13,33]; (5) TJM [38]; (6) TCA [39]; (7) JDA [40]; (8) DTN with
confirm the location of failure instead of specific types. Consid- MDA and (9) DTN with JDA (this work). These baseline methods
ering the example of gearbox, identifying the fault location, such can be categorized into two subsettings: the standard diagno-
as gear fault or bearing fault, is beneficial for monitoring and sis methods (1)–(4) and the transfer learning based techniques
maintenance. That said, certain types of fault occurred in one (5)–(9).
In (1)–(2), the popular statistical features, such as root mean
component, such as bearing inner race fault or outer race fault,
square and kurtosis, are extracted from raw data in time and
can be defined as one category. Besides, it may be impossible to
frequency domains to form the input of the classifiers [7,26,49]
obtain the fault data of various fault types and train a diagnosis
. In (3), EMD is applied to decompose the raw signal into a se-
model with high accuracy for a complex mechanical system.
quence of intrinsic mode functions (IMF). The energy distribution
Consequently, the transfer performance across similar but diverse
of first five IMFs is calculated as the input features for classifier.
fault types is of great practical significance. In the experiments,
In (4), using the deep learning flow, CNN. In the transfer learning
we introduced two types of faults, i.e., gear root crack (RC) and based techniques (5)–(9), TJM, TCA and JDA are the shallow
tooth surface spalling (TS), to high-speed cylindrical gearing, and transfer learning methods, and thus we also extract the statistical
another two types of faults, i.e., outer race fault (OR) and roller features from raw data, then conduct the unsupervised domain
fault (RO), to high-speed conical bearing. The vibration data is adaptation, finally make diagnosis results with classifier. In deep
collected with a sampling frequency of 20 kHz. learning flow, a comparison between the proposed method and
We state three conditions of gearbox, including health, gear DTN with MDA method by removing the CDA term in objective
fault and bearing fault in this dataset (corresponding labels 0–2), function is made. The pre-trained base network resorts to the
and design two transfer tasks: I→J, J→I. The dataset I contains optimal CNN model in source domain, that is, the trained model
the samples of health, bearing OR and gear RC. The dataset J is in (4).
formed by the samples of health, bearing RO and gear TS (see (2) Implementation details: For (1)–(4), we use the labeled
Table 2). source data to train the model, which will be applied to diagnose
276 T. Han, C. Liu, W. Yang et al. / ISA Transactions 97 (2020) 269–281

Fig. 6. The single-stage cylindrical straight gearbox test rig: (a) Schematic diagram of gearbox test rig; (b) The damaged components.

Table 2
Designed transfer tasks across diverse fault severities and types.
Transfer tasks Source domain Target domain Unlabeled target samples Testing target sample Machine conditions
G→H FD 0.18 FD 0.53 12000 4000 4 conditions
H→ G FD 0.53 FD 0.18 12000 4000 (labels 0–3)
I→J H, OR, RC H, RO, TS 12000 4000 3 conditions
J→ I H, RO, TS H, OR, RC 12000 4000 (labels 0–2)

Fig. 7. Comparison of the diagnosis accuracy of diverse methods on ten transfer tasks.

Table 3 validation. In RF, the performance is substantially robust to the

Model structure of used CNN. two structure parameters in RF, i.e., number of trees ntry and the
No. Layer type Kernel number Kernel size Activation number of random feature subset⌊√ ⌋mtry . Hence, the ntry and mtry
1 Convolution1 16 128 × 1 ReLU are empirically set to 500 and m respectively, where m is the
2 Max-pooling1 – 2 × 1 ReLU
3 Convolution2 32 64 × 1 ReLU dimension of input vector [26]. Referring to literatures of TJM,
4 Max-pooling2 – 2 × 1 ReLU TCA and JDA [38–40], the adaptation regularization parameter
5 Convolution3 64 16 × 1 ReLU λ is set by searching λ ∈ {1e−2 , 1e−1 , 1, 10, 20, 50, 100}. The
6 Max-pooling3 – 2 × 1 ReLU kernel type is selected from RBF and linear, and the dimension
7 Convolution4 128 3 × 1 ReLU after adaptation is optimized with the strategy of trial and error.
8 Max-pooling4 – 2 × 1 ReLU
In DTN methods, the adaptation regularization parameter λ is
9 Convolution5 256 2 × 1 ReLU
10 Max-pooling5 – 2 × 1 ReLU set by searching λ ∈ {1e−4 , 1e−3 , 1e−2 , 5e−2 , 1e−1 , 5e−1 , 1}. The
11 Fully-connected 1 128 ReLU learning rate of SGD is set to 0.01.
12 Fully-connected 1 64 ReLU In CNN and DTN, the architecture and parameters setting are
13 Output 1 n Softmax listed in Table 3. In this work, the structure of 5 convolutional and
max-pooling layers is used. Considering the overfitting problem,
the L2-norm regularization term are introduced for the network
parameters, whose weight decay is set to 1e-4. The batch size is
unlabeled target data. For (5)–(7), TJM, TCA and JDA simulta-
chosen from 16 to 128, the SGD is used as the optimizer with a
neously process the labeled source data and unlabeled target
learning rate of 0.01.
data for dimension reduction. The classification model is then
trained with low-dimensional features in source domain, and 6. Results and discussion
deployed on target domain. Herein, both the SVM and RF are
adopted to achieve an optimal diagnosis performance as the final 6.1. Results
results. The RBF is adopted as the kernel in SVM. The model
parameters, i.e., penalty factor and kernel function parameter, The diagnosis results of ten tasks are shown in Tables 4–
are both decided by grid search in the sense of 5-fold cross 6 and Figs. 7–9, respectively. Each result is an average of 20
T. Han, C. Liu, W. Yang et al. / ISA Transactions 97 (2020) 269–281 277

Table 4
Diagnosis accuracy (%) on ten transfer tasks with different methods.
Methods A→B B →A C →D D→C E →F F→E G→H H →G I→J J→I Avg
SVM 72.8 ± 1.9 74.5 ± 1.5 89.8 ± 0.8 90.4 ± 1.2 62.6 ± 1.0 63.4 ± 1.5 73.4 ± 1.3 75.7 ± 1.3 78.2 ± 0.9 46.6 ± 0.6 72.7 ± 1.2
RF 84.4 ± 0.9 78.3 ± 2.4 89.1 ± 0.3 92.7 ± 1.1 60.9 ± 0.7 61.0 ± 0.7 80.8 ± 1.1 49.9 ± 1.3 69.4 ± 0.9 69.2 ± 7.8 73.6 ± 1.7
EMD 79.8 ± 0.9 72.2 ± 0.9 77.7 ± 4.2 71.7 ± 6.7 64.5 ± 6.2 61.8 ± 1.2 72.5 ± 4.4 64.8 ± 10.7 57.0 ± 6.9 43.4 ± 2.8 66.5 ± 4.5
CNN 91.8 ± 0.3 93.8 ± 0.4 89.4 ± 3.1 94.9 ± 0.2 80.4 ± 4.0 82.1 ± 4.6 81.7 ± 5.3 64.5 ± 10.0 79.5 ± 1.5 72.3 ± 1.3 83.0 ± 3.1
TJM 87.4 ± 1.9 81.4 ± 2.4 92.5 ± 1.3 93.9 ± 0.4 78.3 ± 5.4 67.6 ± 1.3 92.2 ± 5.8 96.0 ± 6.9 77.1 ± 2.2 58.7 ± 5.2 82.5 ± 3.3
TCA 87.8 ± 1.7 79.0 ± 2.8 88.7 ± 0.5 92.9 ± 0.5 76.1 ± 4.0 68.6 ± 1.2 92.8 ± 7.2 94.4 ± 8.5 75.5 ± 4.7 56.4 ± 6.9 81.2 ± 3.8
JDA 86.0 ± 2.2 81.0 ± 2.6 91.3 ± 1.6 94.1 ± 1.3 83.6 ± 2.8 61.4 ± 3.4 93.9 ± 10.9 94.8 ± 11.0 79.2 ± 1.2 55.4 ± 5.4 82.1 ± 4.2
DTN.w.MDA 95.9 ± 2.9 96.9 ± 0.4 94.0 ± 1.1 97.4 ± 0.4 87.3 ± 1.2 87.4 ± 1.7 81.3 ± 5.0 68.2 ± 7.9 80.1 ± 1.0 83.6 ± 1.8 87.2 ± 2.3
DTN.w.JDA 98.3 ± 0.2 98.9 ± 0.5 96.6 ± 0.2 98.5 ± 0.2 96.8 ± 0.5 97.3 ± 0.2 99.3 ± 0.4 97.1 ± 8.7 99.9 ± 0.1 96.3 ± 0.5 97.9 ± 1.2

Table 5
Missing alarm rate (%) on ten transfer tasks with different methods.
Methods A→B B →A C →D D→C E →F F→E G→H H→G I→J J→I Avg
SVM 18.4 ± 2.2 17.7 ± 1.1 8.0 ± 0.6 7.5 ± 0.8 29.9 ± 0.9 38.3 ± 2.2 27.0 ± 1.1 14.2 ± 2.2 15.4 ± 0.8 52.5 ± 0.4 22.9 ± 1.2
RF 12.4 ± 0.8 11.6 ± 1.7 8.5 ± 0.2 5.0 ± 0.6 40.0 ± 1.0 39.5 ± 1.5 12.7 ± 0.9 62.6 ± 0.4 24.3 ± 6.2 28.6 ± 11.7 24.5 ± 2.5
EMD 19.5 ± 0.9 27.2 ± 0.9 21.3 ± 3.7 29.0 ± 7.0 35.6 ± 7.2 37.4 ± 1.3 27.8 ± 5.0 35.4 ± 13.2 44.6 ± 6.6 58.4 ± 2.9 33.6 ± 4.9
CNN 8.2 ± 0.3 6.1 ± 0.4 10.6 ± 3.0 5.0 ± 0.2 19.7 ± 4.2 17.9 ± 4.7 18.3 ± 5.3 35.3 ± 9.9 21.0 ± 2.0 27.1 ± 1.6 16.9 ± 3.2
TJM 9.8 ± 1.0 13.4 ± 2.0 6.4 ± 1.1 5.4 ± 0.4 16.3 ± 3.4 29.6 ± 5.4 6.0 ± 4.2 2.5 ± 3.8 17.0 ± 2.7 25.7 ± 4.7 13.2 ± 2.9
TCA 9.9 ± 1.2 15.6 ± 2.5 10.4 ± 0.9 6.3 ± 0.5 19.6 ± 1.4 25.3 ± 5.0 5.3 ± 4.4 3.4 ± 4.7 16.1 ± 2.7 31.0 ± 11.2 14.3 ± 3.5
JDA 11.9 ± 1.5 13.9 ± 2.1 7.1 ± 1.5 5.4 ± 1.1 14.1 ± 2.2 46.6 ± 2.5 8.3 ± 15.7 7.7 ± 16.2 14.9 ± 1.0 35.2 ± 11.1 16.5 ± 5.5
DTN.w.MDA 4.2 ± 0.3 3.2 ± 0.5 6.0 ± 1.0 2.6 ± 0.4 12.8 ± 1.2 12.7 ± 1.8 18.9 ± 5.0 32.2 ± 7.5 19.8 ± 1.0 16.4 ± 1.8 12.9 ± 2.1
DTN.w.JDA 1.7 ± 0.2 1.1 ± 0.5 3.4 ± 0.2 1.6 ± 0.2 3.2 ± 0.5 2.6 ± 0.5 0.8 ± 0.4 2.8 ± 8.5 0.1 ± 0.1 3.7 ± 0.5 2.1 ± 1.2

Table 6
False alarm rate(%) on ten transfer tasks with different methods.
Methods A →B B→A C→D D →C E →F F→E G →H H →G I→J J →I Avg
SVM 22.1 ± 1.9 25.8 ± 1.5 11.8 ± 0.7 11.1 ± 1.3 36.1 ± 0.9 36.6 ± 1.3 26.8 ± 1.0 24.3 ± 0.3 22.9 ± 1.1 53.4 ± 0.5 27.1 ± 1.1
RF 15.8 ± 0.9 21.4 ± 2.4 11.2 ± 0.4 8.7 ± 1.1 39.2 ± 0.7 38.9 ± 0.7 19.3 ± 1.1 49.3 ± 1.3 32.8 ± 0.6 31.6 ± 7.6 26.8 ± 1.7
EMD 20.1 ± 0.9 27.6 ± 0.9 21.9 ± 3.8 27.9 ± 6.6 34.8 ± 6.6 37.0 ± 2.1 27.0 ± 4.0 35.3 ± 11.2 44.0 ± 6.7 57.7 ± 3.6 33.3 ± 4.6
CNN 7.3 ± 0.3 5.4 ± 0.3 9.5 ± 2.6 4.0 ± 0.2 17.2 ± 3.8 16.6 ± 3.5 17.5 ± 5.3 48.1 ± 9.7 12.7 ± 0.7 18.1 ± 1.6 15.6 ± 2.8
TJM 12.6 ± 1.8 18.4 ± 2.4 7.6 ± 1.4 6.7 ± 0.3 20.6 ± 5.0 32.5 ± 1.5 7.7 ± 5.9 3.9 ± 6.6 22.8 ± 1.9 41.8 ± 5.0 17.5 ± 3.2
TCA 12.3 ± 1.7 21.2 ± 2.8 11.6 ± 0.6 7.4 ± 0.6 23.3 ± 3.4 30.5 ± 1.2 7.3 ± 7.2 5.6 ± 8.5 25.1 ± 4.4 43.6 ± 7.4 18.8 ± 3.8
JDA 14.1 ± 2.2 19.2 ± 2.5 8.8 ± 1.5 6.3 ± 1.4 15.7 ± 2.7 43.8 ± 5.0 5.9 ± 10.7 5.0 ± 10.5 21.1 ± 1.1 44.6 ± 4.5 18.5 ± 4.2
DTN.w.MDA 3.8 ± 0.2 2.7 ± 0.3 5.5 ± 0.9 2.4 ± 0.2 10.8 ± 0.9 11.5 ± 1.0 18.4 ± 5.2 39.6 ± 2.8 19.7 ± 1.0 12.5 ± 0.9 12.7 ± 1.3
DTN.w.JDA 1.6 ± 0.2 1.1 ± 0.4 2.3 ± 1.0 1.4 ± 0.2 2.9 ± 0.3 2.5 ± 0.3 0.7 ± 0.3 4.4 ± 13.0 0.1 ± 0.1 3.3 ± 0.4 2.0 ± 1.6

Fig. 8. Comparison of the MAR of diverse methods on ten transfer tasks.

random tests, where the training set and testing set are randomly standard methods (the first four) is much improved with domain
splitted. To comprehensively show the capabilities of proposed adaptation for most cases. Especially, the average accuracy of
method, three performance indices, i.e., average diagnosis accu- DTN with JDA is 97.9%, and makes a 14.9% transfer improvement,
racy, missing alarm rate (MAR), false alarm rate (FAR) [50], are comparing with the baseline CNN, 83.0%. (3) The deep learning
reported. Several encouraging observations are firstly noted. (1)
methods always present a superior performance to the shallow
The DTN with JDA method in this work significantly outperforms
methods no matter in standard diagnosis framework or transfer
the other methods. The stable average accuracies and low root-
learning framework, conforming its extraordinary feature learn-
mean-square error under different transfer scenarios (over 96%
for all tasks) validate the effective and robust domain adaptation ing and representation capacity as well as a stronger feature
ability of proposed method. The better performances of DTN transferability. (4) By jointly adapting the marginal distribution
with JDA can also be found for MAR and FAR (much lower and conditional distribution, the DTN with JDA in this work
than comparative methods). (2) The diagnosis performance in significantly promotes the adaptation ability of previous DTN
278 T. Han, C. Liu, W. Yang et al. / ISA Transactions 97 (2020) 269–281

Fig. 9. Comparison of the FAR of diverse methods on ten transfer tasks.

Fig. 10. Network visualization in task E→F: t-SNE is applied on the feature representation of last hidden fully-connected layer for both the source data and target
data. There are total 10 categories in wind turbine dataset (corresponding labels 0–9). S represents the samples in source domain and T means the target domain.
For instance, the c5-T corresponds to the samples of category 5 (inner-ring fault of bearing, as introduced above) in target domain.

Table 7 visualization results of standard CNN (that is, the pre-trained base
Computation complexity for diverse deep methods with wind turbine dataset in
network for further domain adaptation), DTN with MDA and DTN
task A→B.
with JDA in three transfer tasks are presented in Figs. 10–12,
Methods Time (s/epoch) Memory (MB)
respectively.
CNN 4.52 1016.6
Task E→F is to realize the domain adaptation across diverse
DTN.w.MDA 7.12 1548.0
DTN.w.JDA 33.78 1548.5 operating conditions. First, as shown in Fig. 10(a), most of the 10
categories of source samples are well separated with the standard
CNN, while the feature distributions of same category between
source and target domains are not aligned well. And even worse,
with MDA, especially under the transfer scenarios of diverse fault
a large overlapping areas can be inspected among the target
severity levels and diverse fault types.
samples of certain categories, such as 2, 3 and 8. These observa-
To show the real-time practicality of the proposed framework,
tions suggest the domain discrepancy exists not only in marginal
the computation complexity of diverse methods in task A→B
distribution, but also in conditional distribution, which may result
is compared in Table 7. Generally, the deep learning methods
in the degraded diagnosis results in conventional framework. In
require higher computation complexity but achieve better perfor-
Fig. 10(b) and (c), under the transfer learning framework, we
mance than shallow methods, and thus we only listed the results
can find the obvious improvement of distribution adaptation. In
of three deep methods here. Since the DTN with JDA calculates
more intermediate variables in CDA, it needs more computing particular, the same category between domains is aligned very
time and memory than standard CNN and DTN with MDA. This well by DTN with JDA, and a consonant and legible discriminant
work focuses on the investigation of effectiveness in DTN. The structure can be observed for both source and target categories.
training process is implemented in batch manner. Another online Task G→H is to adapt the distribution across diverse fault
learning manner can train the deep network from sequential severity levels. In Fig. 11(a), the standard CNN assembles the
data flow. The transform of transfer diagnosis framework from distributions of OF and IF in target domain, and the source OF
batch learning to online learning will largely reduce the real-time and target IF are mixed, explaining the unsatisfactory accuracy
computing time and memory [51]. in Table 4. As a contrast, in Fig. 11(c), the distribution of same
category between source and target domains are well matched
6.2. Network visualization with JDA. Interestingly, in Fig. 11(b), we can observe that MDA
relocates the target OF and IF away from the corresponding
In order to give a clear and intuitive understanding of pro- distribution in source domain. Naturally, marginal distribution
posed framework, t-distributed stochastic neighbor embedding only reflects the cluster structure for the feature distribution of
(t-SNE), is utilized for network visualization. For comparison, the all categories, and MDA aims to explicitly reduce the distance
T. Han, C. Liu, W. Yang et al. / ISA Transactions 97 (2020) 269–281 279

Fig. 11. Network visualization in task G→H: there are total 4 categories in bearing dataset (corresponding labels 0–3).

Fig. 12. Network visualization in task I→J: there are total 3 categories in gearbox dataset (corresponding labels 0–2).

Fig. 13. The transfer loss curves and test accuracy via DTN with JDA. Fig. 14. The transfer loss curves and test accuracy via DTN with MDA.

between the cluster centers of different domains. When the con- in task G→H are plot in Figs. 13 and 14, respectively. Here, we
ditional distributions are same across domains, MDA helps to separately display the ℓce term and regularization term for ease
correct the overall shift of feature space. However, in the field of observation. At the beginning, the losses of regularization term
of fault diagnosis, the difference in conditional distributions may for the two methods are both around 0.1, and the ones of ℓce term
be prevalent. Consequently, unlike single MDA, the JDA which are almost negligible. From Fig. 13, the loss of JDA regularization
simultaneously adapts the marginal distribution and conditional term converges to a certain degree after a series of iterations,
distribution is promising in these cases. As shown in Fig. 12, accompanied by the continued increase of test accuracy of tar-
similar results can be found in transfer task I→J across diverse get data. However, from Fig. 14, the loss of MDA regularization
fault types. Both the transfer accuracy and network visualization term finally fluctuates in a high level, and the test accuracy is
show that JDA supersedes the performance of MDA. confined around 87%. Besides, it is clear to observe the loss of
ℓce term presents an abrupt increase after around 300 iterations.
6.3. Convergence analysis Essentially, the ℓce term and regularization term in objective
function try to reduce domain discrepancy while preserving the
Since the additional regularization term is appended to objec- original discriminant structure in source domain. One possible
tive function for the transfer training, the convergence analysis is reason for the jump is that the gradient direction of the parameter
necessary to illustrate the transfer ability. The transfer loss curves optimization for regularization term conflicts with that of ℓce
and test accuracy curve for DTN with JDA and DTN with MDA term, causing a significant spike in transfer loss and test accuracy.
280 T. Han, C. Liu, W. Yang et al. / ISA Transactions 97 (2020) 269–281

The analysis reveals that the use of JDA regularization term is target domain (where the model is applied). However, re-training
capable of facilitating the network training and guaranteeing a the model is challenging and probably unrealistic as of the lack
stronger feature transferability. of sufficient labeled data in practical applications. To address this
issue, this work presents a DTN to take advantage of a pre-trained
6.4. Discussion network from the source domain and get the model transferred
with unlabeled data from the target domain, where a novel do-
In the traditional intelligent fault diagnosis framework, main adaptation approach, JDA, is presented. Through extensive
whether for shallow methods or the deep learning methods, the experiments on three datasets, the results show that the DTN
diagnosis performance varies a lot on different tasks. For instance, with JDA outperforms the state-of-the-art approaches. Compared
in the first six tasks on wind turbine dataset, RF get the best with the shallow methods, i.e., SVM, RF, EMD, TJM, TCA and JDA,
performance in task C→D and D→C (89.1% and 92.7%), while DTN with JDA achieves 25.2%, 24.3%, 31.4% 15.4%, 16.7%, 15.8%
degraded results in tasks E→F and F→E (60.9% and 61.0%). It improvements on the average accuracy in ten diagnosis tasks.
is reasonable because the operating conditions between C and In deep learning framework, DTN also effectively increase the
D are closer than those between E and F, and thus the data diagnosis accuracy from 83.0%, 87.2% to 97.9% in comparison with
for C and D shares a more similar feature space, leading to basic CNN and DTN with MDA. The network visualization further
a higher diagnosis accuracy. This phenomenon actually reveals provides the interpretation of diagnosis results, and DTN with JDA
the inherent drawback in conventional diagnosis framework, is shown to obtain a more accurate feature distribution alignment
that the feature distribution discrepancy between source domain across domains. Moreover, the DTN with JDA presents smooth
and target domain is neglected. The success much relies on the convergence and avoids negative adaptation in comparison with
similarity between source and target distributions, whereas a MDA.
large discrepancy across domains is common and inevitable in Using DTN with JDA, it is promising that the learnt diagnosis
practical diagnosis applications. The proposed transfer diagnosis models from experimental or real datasets can be transferred to
framework provides an effective measure for resolving the prob- new but similar applications in a more efficient and accurate way,
lem mentioned above, and DTN with JDA achieves the desirable which could benefit kinds of industrial applications. Further work
performance both in diagnosis indices and feature visualization. will pursue (i) quantitative assessment approaches of similarity
In domain adaptation methods, the DTN achieves superior and transferability between diverse domains, (ii) application on
performance than shallow domain adaptation methods, such as imbalanced distribution of machine conditions and (iii) hyper-
TJM, TCA and JDA. The shallow methods require manual feature parameter selection with intelligent optimization algorithms [52,
extraction, which may suffer from the interference of redundant 53].
and irrelevant features. And more importantly, this process is
not flexible and not able to meet the need of adaptivity. DTN Declaration of competing interest
establishes the domain adaptation in deep learning flow, and is
capable of adaptively learning intrinsic fault characteristics. It The authors declare that they have no known competing finan-
is also worth noting that the complexity of the domain adap- cial interests or personal relationships that could have appeared
tation process always changes with the scenarios. In the easy to influence the work reported in this paper.
transfer tasks, e.g. C→D and D→C, all these transfer learning
based techniques get the relatively satisfactory results. However, Acknowledgments
in several hard tasks, e.g. E→F and J→I, where the source and tar-
get data could be substantially dissimilar, the performance drop This work was supported by National Natural Science Founda-
in the comparative transfer methods, such as DTN with MDA, tion of China (Grant No. 11572167 and 11802152). The authors
convincingly illustrates that the difficulties of domain adaptation would like to express their sincere gratitude to Mr. Shaohua Li
will accordingly increase. The comprehensive assessments under for his contributions on the acquisition of experimental data.
diverse transfer scenarios further demonstrate the pivotal role of
JDA in DTN. References
This work proposes a novel diagnosis framework for consid-
ering the deep feature learning and cross-domain feature dis- [1] Lei Y, He Z, Zi Y. Eemd method and wnn for fault diagnosis of locomotive
tribution alignment simultaneously. It may overcome the short- roller bearings. Expert Syst Appl 2011;38(6):7334–41.
comings in existing studies and have a certain significance in [2] Jiang D, Liu C. Machine condition classification using deterioration feature
the practical diagnosis application. Although the effectiveness of extraction and anomaly determination. IEEE Trans Reliab 2011;60(1):41–8.
[3] Cui L, Huang J, Hao Z, Zhang F. Research on the meshing stiffness and vi-
proposed DTN with JDA has been demonstrated from the aspects bration response of fault gears under an angle-changing crack based on the
of diagnosis indices, feature visualization and loss convergence in universal equation of gear profile. Mech Mach Theory 2016;105:554–67.
ten experimental tasks, it still has limits in assumed conditions, [4] Gong X, Qiao W. Current-based mechanical fault detection for direct-drive
where the faults occur both in source and target domains, and wind turbines via synchronous sampling and impulse detection. IEEE Trans
Ind Electron 2015;62(3):1693–702.
the fault labels are also the same. However, the monitoring data
[5] Yunusa-Kaltungo A, Sinha JK, Nembhard AD. A novel fault diagnosis
of industrial process is mostly under health conditions, and the technique for enhancing maintenance and reliability of rotating machines.
occurred fault types may differ from the known ones in source Struct Health Monit 2015;14(6):231–62.
domain. As a result, these factors introduce additional difficul- [6] Cui L, Huang J, Zhang F, Chu F. Hvsrms localization formula and localization
ties into the application of transfer diagnosis framework. The law: Localization diagnosis of a ball bearing outer ring fault. Mech Syst
Signal Process 2019;120:608–29.
integration of data cleaning and selection techniques into this
[7] Shen Z, Chen X, Zhang X, He Z. A novel intelligent gear fault di-
framework has an important significance. agnosis model based on emd and multi-class tsvm. Measurement
2012;45(1):30–40.
7. Conclusion [8] Li Y, Wang X, Liu Z, Liang X, Si S. The entropy algorithm and its
variants in the fault diagnosis of rotating machinery: A review. IEEE Access
2018;6:66723–41.
Intelligent fault diagnosis in real industrial applications is suf- [9] Li Y, Wang X, Si S, Huang S. Entropy based fault classification using the
fering the difficulty of model re-training as of the discrepancy Case western reserve university data: A benchmark study. IEEE Trans
between the source domain (where the model is learnt) and the Reliab 2019. DOI: 10.1109/TR.2019.2896240.
T. Han, C. Liu, W. Yang et al. / ISA Transactions 97 (2020) 269–281 281

[10] Verstraete D, Ferrada A, Droguett EL, Meruane V, Modarres M. Deep [31] Shao H, Jiang H, Wang F, Wang Y. Rolling bearing fault diagnosis using
learning enabled fault diagnosis using time-frequency image analysis of adaptive deep belief network with dual-tree complex wavelet packet. ISA
rolling element bearings. Shock Vib 2017;2017:1–17. Trans 2017;187–201.
[11] Feng J, Lei Y, Guo L, Lin J, Xing S. A neural network constructed by [32] Jing L, Zhao M, Li P, Xu X. A convolutional neural network based feature
deep learning technique and its application to intelligent fault diagnosis learning and fault diagnosis method for the condition monitoring of
of machines. Neurocomputing 2018;272:619–28. gearbox. Measurement 2017;111:1–10.
[12] Jia F, Lei Y, Lin J, Zhou X, Lu N. Deep neural networks: A promising tool for [33] Zhang W, Peng G, Li C, Chen Y, Zhang Z. A new deep learning model for
fault characteristic mining and intelligent diagnosis of rotating machinery fault diagnosis with good anti-noise and domain adaptation ability on raw
with massive data. Mech Syst Signal Process 2016;72–73:303–15. vibration signals. Sensors 2017;17(2):425.
[13] Han T, Liu C, Yang W, Jiang D. A novel adversarial learning framework in [34] Liu R, Meng G, Yang B, Sun C, Chen X. Dislocated time series convolutional
deep convolutional neural network for intelligent diagnosis of mechanical neural architecture: An intelligent fault diagnosis approach for electric
faults. Knowl-Based Syst 2019;165:474–87. machine. IEEE Trans Ind Inf 2017;13(3):1310–20.
[14] Wen L, Li X, Gao L, Zhang Y. A new convolutional neural network [35] Sun W, Zhao R, Yan R, Shao S, Chen X. Convolutional discriminative
based data-driven fault diagnosis method. IEEE Trans Ind Electron feature learning for induction motor fault diagnosis. IEEE Trans Ind Inf
2017;65(7):5990–8. 2017;13(3):1350–9.
[15] Liu R, Yang B, Zio E, Chen X. Artificial intelligence for fault diagnosis of [36] Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng
rotating machinery: A review. Mech Syst Signal Process 2018;108:33–47. 2010;22(10):1345–59.
[16] Cerrada M, Sánchez RV, Li C, Pacheco F, Cabrera D, Oliveira JVD, [37] Weiss K, Khoshgoftaar TM, Wang DD. A survey of transfer learning. J Big
Vásquez RE. A review on data-driven fault severity assessment in rolling Data 2016;3(1):9.
bearings. Mech Syst Signal Process 2018;99:169–96. [38] Long M, Wang J, Ding G, Sun J, Yu PS. Transfer joint matching for
[17] Oquab M, Bottou L, Laptev I, Sivic J. Learning and transferring mid- unsupervised domain adaptation. In: IEEE conference on computer vision
level image representations using convolutional neural networks. In: IEEE and pattern recognition. 2014, p. 1410–7.
conference on computer vision and pattern recognition. 2014, p. 1717–24. [39] Pan SJ, Tsang IW, Kwok JT, Yang Q. Domain adaptation via transfer
[18] Mun S, Shin M, Shon S, Kim W, Han DK, Ko H. DNN Transfer learning component analysis. IEEE Trans Neural Netw 2011;22(2):199.
based non-linear feature extraction for acoustic event classification. IEICE [40] Long M, Wang J, Ding G, Sun J, Yu PS. Transfer feature learning with joint
Trans Inf Syst 2017;100(9). distribution adaptation. In: IEEE international conference on computer
[19] Qureshi AS, Khan A, Zameer A, Usman A. Wind power prediction using vision. 2014, p. 2200–7.
deep neural network based meta regression and transfer learning. Appl [41] Long M, Cao Y, Wang J, Jordan MI. Learning transferable features with deep
Soft Comput 2017;58:742–55. adaptation networks, Eprint Arxiv (2015) 97–105.
[20] Khatami A, Babaie M, Tizhoosh HR, Khosravi A, Nguyen T, Nahavandi S. [42] Long M, Zhu H, Wang J, Jordan MI. Deep transfer learning with joint
A sequential search-space shrinking using CNN transfer learning and adaptation networks, Eprint Arxiv (2016).
a radon projection pool for medical image retrieval. Expert Syst Appl [43] Ghifary M, Kleijn WB, Zhang M, Balduzzi D, Li W. Deep reconstruction-
2018;100:224–33. classification networks for unsupervised domain adaptation. In: European
[21] Han T, Liu C, Yang W, Jiang D. Learning transferable features in deep conference on computer vision. 2016, p. 597–613.
convolutional neural networks for diagnosing unseen machine conditions. [44] Wen L, Gao L, Li X, Wen L, Gao L, Li X. A new deep transfer learning based
ISA Trans 2019. http://dx.doi.org/10.1016/j.isatra.2019.03.017. on sparse auto-encoder for fault diagnosis. IEEE Trans Syst Man Cybern
[22] Wei Y, Zhang Y, Yang Q. Learning to transfer, Eprint Arxiv (2017). 2017;1–9.
[23] Liu C, Jiang D, Yang W. Global geometric similarity scheme for feature [45] Lu W, Liang B, Cheng Y, Meng D, Yang J, Zhang T. Deep model
selection in fault diagnosis. Expert Syst Appl 2014;41(8):3585–95. based domain adaptation for fault diagnosis. IEEE Trans Ind Electron
[24] Zhao C, Feng Z, Wei X, Qin Y. Sparse classification based on dictio- 2017;64(3):2296–305.
nary learning for planet bearing fault identification. Expert Syst Appl [46] Xu Zhang S-FCSW, Yu FX. Deep transfer network: Unsupervised domain
2018;108:233–45. adaptation, Eprint Arxiv, arXiv:1503.00591 (2015).
[25] Lei Y, Jia F, Lin J, Xing S, Ding SX. An intelligent fault diagnosis method [47] Yosinski J, Clune J, Bengio Y, Lipson H. How transferable are features in
using unsupervised feature learning towards mechanical big data. IEEE deep neural networks?, Eprint Arxiv 27 (2014) 3320–3328.
Trans Ind Electron 2016;63(5):3137–47. [48] Center BD. Case western reserve university bearing data, http://csegroups.
[26] Han T, Jiang D, Zhao Q, Wang L, Yin K. Comparison of random forest, arti- case.edu/bearingdatacenter/pages/download-data-file, 2013.
ficial neural networks and support vector machine for intelligent diagnosis [49] Rauber TW, Boldt FDA, Varejao FM. Heterogeneous feature models and
of rotating machinery. Trans Inst Meas Control 2018;40(8):2681–93. feature selection applied to bearing fault diagnosis. IEEE Trans Ind Electron
[27] Costilla-Reyes O, Scully P, Ozanyan KB. Deep neural networks for learning 2015;62(1):637–46.
spatio-temporal features from tomography sensors. IEEE Trans Ind Electron [50] Xu J, Wang J, Izadi I, Chen T. Performance assessment and design for
2018;65(1):645–53. univariate alarm systems based on far, mar, and AAD. IEEE Trans Autom
[28] Han T, Liu C, Yang W, Jiang D. An adaptive spatiotemporal feature learning Sci Eng 2012;9(2):296–307.
approach for fault diagnosis in complex systems. Mech Syst Signal Process [51] Wang X, Hou Z, Yu W, Jin Z. Online fast deep learning tracker based on
2019;117:170–87. deep sparse neural networks. In: International conference on image and
[29] Jiao J, Zhao M, Lin J, Zhao J. A multivariate encoder information based graphics. Springer; 2017, p. 186–98.
convolutional neural network for intelligent fault diagnosis of planetary [52] Patwal RS, Narang N, Garg H. A novel TVAC-PSO based mutation strategies
gearboxes. Knowl-Based Syst 2018. algorithm for generation scheduling of pumped storage hydrothermal
[30] Lu C, Wang ZY, Qin WL, Ma J. Fault diagnosis of rotary machinery system incorporating solar units. Energy 2018;142:822–37.
components using a stacked denoising autoencoder-based health state [53] Garg H. A hybrid GSA-GA algorithm for constrained optimization problems.
identification. Signal Process 2017;130(C):377–88. Inform Sci 2019;478:499–523.

Bryson 2015
No ratings yet
Bryson 2015
17 pages
A New Convolutional Neural Network Based Data-Driven Fault Diagnosis Method
100% (1)
A New Convolutional Neural Network Based Data-Driven Fault Diagnosis Method
9 pages
Multisource Domain Feature Adaptation Network For Bearing Fault Diagnosis Under Time-Varying Working Conditions
No ratings yet
Multisource Domain Feature Adaptation Network For Bearing Fault Diagnosis Under Time-Varying Working Conditions
10 pages
IFDS An Intelligent Fault Diagnosis System With Multisource Unsupervised Domain Adaptation For Different Working Conditions
No ratings yet
IFDS An Intelligent Fault Diagnosis System With Multisource Unsupervised Domain Adaptation For Different Working Conditions
10 pages
PSNN-TADA Prototype and Stochastic Neural Network-Based Twice Adversarial Domain Adaptation For Fault Diagnosis Under Varying Working Conditions
No ratings yet
PSNN-TADA Prototype and Stochastic Neural Network-Based Twice Adversarial Domain Adaptation For Fault Diagnosis Under Varying Working Conditions
12 pages
Deep Convolutional Transfer Learning Network A New Method For Intelligent Fault Diagnosis of Machines With Unlabeled Data
No ratings yet
Deep Convolutional Transfer Learning Network A New Method For Intelligent Fault Diagnosis of Machines With Unlabeled Data
10 pages
Tie 2018 2877090
No ratings yet
Tie 2018 2877090
10 pages
Domain Conditioned Joint Adaptation Network For Intelligent Bearing Fault Diagnosis Across Different Positions and Machines
No ratings yet
Domain Conditioned Joint Adaptation Network For Intelligent Bearing Fault Diagnosis Across Different Positions and Machines
11 pages
Gaussian Mixture Variational-Based Transformer Domain Adaptation Fault Diagnosis Method and Its Application in Bearing Fault Diagnosis
No ratings yet
Gaussian Mixture Variational-Based Transformer Domain Adaptation Fault Diagnosis Method and Its Application in Bearing Fault Diagnosis
11 pages
Deep Transfer Learning in Mechanical Intelligent Fault Diagnosis: Application and Challenge
No ratings yet
Deep Transfer Learning in Mechanical Intelligent Fault Diagnosis: Application and Challenge
23 pages
An Intelligent Fault Diagnosis Scheme For Rotating Machinery Based On Supervised Domain Adaptation With Manifold Embedding
No ratings yet
An Intelligent Fault Diagnosis Scheme For Rotating Machinery Based On Supervised Domain Adaptation With Manifold Embedding
20 pages
Deep Convolutional Transfer Learning Network: A New Method For Intelligent Fault Diagnosis of Machines With Unlabeled Data
No ratings yet
Deep Convolutional Transfer Learning Network: A New Method For Intelligent Fault Diagnosis of Machines With Unlabeled Data
10 pages
Intelligent Fault Diagnosis With Deep Adversarial Domain Adaptation
No ratings yet
Intelligent Fault Diagnosis With Deep Adversarial Domain Adaptation
9 pages
Fault Diagnosis For Limited Annotation Signals and Strong Noise Based On Interpretable Attention Mechanism
No ratings yet
Fault Diagnosis For Limited Annotation Signals and Strong Noise Based On Interpretable Attention Mechanism
16 pages
07961149transfer Learning
No ratings yet
07961149transfer Learning
11 pages
EMDS2023
No ratings yet
EMDS2023
22 pages
A Multisource Domain Adaptation Network For Process Fault Diagnosis Under Different Working Conditions
No ratings yet
A Multisource Domain Adaptation Network For Process Fault Diagnosis Under Different Working Conditions
12 pages
Sensors: Bearing Fault Diagnosis Method Based On Deep Convolutional Neural Network and Random Forest Ensemble Learning
No ratings yet
Sensors: Bearing Fault Diagnosis Method Based On Deep Convolutional Neural Network and Random Forest Ensemble Learning
21 pages
Optimal Transport-Based Deep Domain Adaptation Approach For Fault Diagnosis of Rotating Machine
No ratings yet
Optimal Transport-Based Deep Domain Adaptation Approach For Fault Diagnosis of Rotating Machine
12 pages
Unsupervised Joint Subdomain Adaptation Network For Fault Diagnosis
No ratings yet
Unsupervised Joint Subdomain Adaptation Network For Fault Diagnosis
13 pages
Deep Transfer Learning Strategy For Efficient Domain Generalisation in Machine Fault Diagnosis
No ratings yet
Deep Transfer Learning Strategy For Efficient Domain Generalisation in Machine Fault Diagnosis
9 pages
An Intelligent Fault Diagnosis Method Based On Domain Adaptation and Its Application For Bearings Under Polytropic Working Conditions
No ratings yet
An Intelligent Fault Diagnosis Method Based On Domain Adaptation and Its Application For Bearings Under Polytropic Working Conditions
14 pages
Deep Neural Networks - A Promising Tool For Fault Characteristic Mining and Intelligent Diagnosis of Rotating Machinery With Massive Data PDF
No ratings yet
Deep Neural Networks - A Promising Tool For Fault Characteristic Mining and Intelligent Diagnosis of Rotating Machinery With Massive Data PDF
13 pages
Research Article
No ratings yet
Research Article
12 pages
Fault Diagnosis For Electromechanical Drivetrains Using A Joint Distribution Optimal Deep Domain Adaptation Approach
No ratings yet
Fault Diagnosis For Electromechanical Drivetrains Using A Joint Distribution Optimal Deep Domain Adaptation Approach
10 pages
A Coarse-To-Fine Bilevel Adversarial Domain Adaptation Method For Fault Diagnosis of Rolling Bearings
No ratings yet
A Coarse-To-Fine Bilevel Adversarial Domain Adaptation Method For Fault Diagnosis of Rolling Bearings
14 pages
Applsci 14 00551 v2
No ratings yet
Applsci 14 00551 v2
20 pages
Jia 2016
No ratings yet
Jia 2016
13 pages
A New Health Indicator Extracted by Unsupervised Learning Using Autoencoder in Tandem With T-Sne and Multi-Kernel CNN To Enhance The Early Detection and Classification of Bearings Multi-Faults
No ratings yet
A New Health Indicator Extracted by Unsupervised Learning Using Autoencoder in Tandem With T-Sne and Multi-Kernel CNN To Enhance The Early Detection and Classification of Bearings Multi-Faults
14 pages
A Novel Bearing Imbalance Fault-Diagnosis Method Based On A Wasserstein
No ratings yet
A Novel Bearing Imbalance Fault-Diagnosis Method Based On A Wasserstein
9 pages
A Novel Intelligent Fault Diagnosis Method For Gearbox Based On Multi-Dimensional Attention Denoising Convolution
No ratings yet
A Novel Intelligent Fault Diagnosis Method For Gearbox Based On Multi-Dimensional Attention Denoising Convolution
24 pages
An Adaptive Fault Diagnosis Framework Under Class-Imbalanced Conditions Based On Contrastive Augmented Deep Reinforcement Learning
No ratings yet
An Adaptive Fault Diagnosis Framework Under Class-Imbalanced Conditions Based On Contrastive Augmented Deep Reinforcement Learning
14 pages
Open-Set Domain Adaptation in Machinery Fault Diagnostics Using Instance-Level Weighted Adversarial Learning
No ratings yet
Open-Set Domain Adaptation in Machinery Fault Diagnostics Using Instance-Level Weighted Adversarial Learning
11 pages
Novel Convolutional Neural Network (NCNN) For The Diagnosis of Bearing Defects in Rotary Machinery
No ratings yet
Novel Convolutional Neural Network (NCNN) For The Diagnosis of Bearing Defects in Rotary Machinery
10 pages
A Multimodal Feature Fusion-Based Deep Learning Method For Online Fault Diagnosis of Rotating Machinery
No ratings yet
A Multimodal Feature Fusion-Based Deep Learning Method For Online Fault Diagnosis of Rotating Machinery
26 pages
Embedding-Enhanced Graph Attention Networks For Imbalanced Industrial Fault Diagnosis
No ratings yet
Embedding-Enhanced Graph Attention Networks For Imbalanced Industrial Fault Diagnosis
12 pages
A Systematic Review of Deep Transfer Lea
No ratings yet
A Systematic Review of Deep Transfer Lea
57 pages
Energies 12 03937
No ratings yet
Energies 12 03937
19 pages
LSTM Deep Learning Approach For Bearing Fault Diagnosis
No ratings yet
LSTM Deep Learning Approach For Bearing Fault Diagnosis
14 pages
Sensors: Fault Diagnosis From Raw Sensor Data Using Deep Neural Networks Considering Temporal Coherence
No ratings yet
Sensors: Fault Diagnosis From Raw Sensor Data Using Deep Neural Networks Considering Temporal Coherence
17 pages
Deep Adversarial Domain Adaptation Model For Bearing Fault Diagnosis
No ratings yet
Deep Adversarial Domain Adaptation Model For Bearing Fault Diagnosis
10 pages
Highly-Accurate Machine Fault Diagnosis Using Deep Transfer Learning
100% (1)
Highly-Accurate Machine Fault Diagnosis Using Deep Transfer Learning
9 pages
Wang Et Al - 2020 - Multi-Scale Deep Intra-Class Transfer Learning For Bearing Fault Diagnosis
No ratings yet
Wang Et Al - 2020 - Multi-Scale Deep Intra-Class Transfer Learning For Bearing Fault Diagnosis
15 pages
IJRPR26093
No ratings yet
IJRPR26093
6 pages
J Neucom 2019 05 052
No ratings yet
J Neucom 2019 05 052
38 pages
Deep Learning Enabled Fault Diagnosis
No ratings yet
Deep Learning Enabled Fault Diagnosis
18 pages
A Convolutional Neural Network Based On A Capsule Network With Strong Generalization For Bearing Fault Diagnosis
No ratings yet
A Convolutional Neural Network Based On A Capsule Network With Strong Generalization For Bearing Fault Diagnosis
14 pages
A Novel Data Augmentation Method Based On Denoising Diffusion Probabilistic Model For Fault Diagnosis Under Imbalanced Data
No ratings yet
A Novel Data Augmentation Method Based On Denoising Diffusion Probabilistic Model For Fault Diagnosis Under Imbalanced Data
12 pages
Neurocomputing: Feng Jia, Yaguo Lei, Liang Guo, Jing Lin, Saibo Xing
No ratings yet
Neurocomputing: Feng Jia, Yaguo Lei, Liang Guo, Jing Lin, Saibo Xing
10 pages
Bearing Fault Diagnosis Under Variable Working Conditions Base On Contrastive Domain Adaptation Method
No ratings yet
Bearing Fault Diagnosis Under Variable Working Conditions Base On Contrastive Domain Adaptation Method
11 pages
Energies 16 07680 v2
No ratings yet
Energies 16 07680 v2
19 pages
Machines 12 00787
No ratings yet
Machines 12 00787
17 pages
Entropy 27 00181
No ratings yet
Entropy 27 00181
25 pages
A Regularized Deep Clustering Method For Fault Trend Analysis
No ratings yet
A Regularized Deep Clustering Method For Fault Trend Analysis
7 pages
Deep Fuzzy Echo State Networks For Machinery Fault Diagnosis
No ratings yet
Deep Fuzzy Echo State Networks For Machinery Fault Diagnosis
14 pages
A Bearing Fault and Severity Diagnostic Technique Using Adaptive Deep Belief Networks and Dempster-Shafer Theory
No ratings yet
A Bearing Fault and Severity Diagnostic Technique Using Adaptive Deep Belief Networks and Dempster-Shafer Theory
22 pages
Enhancing Reliability Through Interpretability A Comprehensive Survey of Interpretable Intelligent Fault Diagnosis in Rotating Machinery
No ratings yet
Enhancing Reliability Through Interpretability A Comprehensive Survey of Interpretable Intelligent Fault Diagnosis in Rotating Machinery
32 pages
Balanced Adaptation Regularization Based Transfer Learning For Unsupervised Cross-Domain Fault Diagnosis
No ratings yet
Balanced Adaptation Regularization Based Transfer Learning For Unsupervised Cross-Domain Fault Diagnosis
13 pages
1 s2.0 S0003682X2500009X Main
No ratings yet
1 s2.0 S0003682X2500009X Main
11 pages
Uncertainty Theories and Multisensor Data Fusion
From Everand
Uncertainty Theories and Multisensor Data Fusion
Alain Appriou
No ratings yet
Data Science through R. Unsupervised Learning. Dimension Reduction Techniques: Principal Components, Factor Analysis and Correspondence Analysis
From Everand
Data Science through R. Unsupervised Learning. Dimension Reduction Techniques: Principal Components, Factor Analysis and Correspondence Analysis
César Pérez López
No ratings yet
Huang 2018
No ratings yet
Huang 2018
5 pages
Rolling Element Bearing Failure Analysis A Case ST PDF
0% (1)
Rolling Element Bearing Failure Analysis A Case ST PDF
3 pages
ISA Transactions: Te Han, Chao Liu, Wenguang Yang, Dongxiang Jiang
No ratings yet
ISA Transactions: Te Han, Chao Liu, Wenguang Yang, Dongxiang Jiang
13 pages
Detecting Rolling Elements Bearings Faults: February 2020
No ratings yet
Detecting Rolling Elements Bearings Faults: February 2020
57 pages
Envelope Bearing Analysis: Theory and Practice: IEEE Aerospace Conference Proceedings April 2005
No ratings yet
Envelope Bearing Analysis: Theory and Practice: IEEE Aerospace Conference Proceedings April 2005
10 pages
Huang 2018
No ratings yet
Huang 2018
5 pages
Deep Learning Algorithms For Bearing Fault Diagnostics-A Comprehensive Review
No ratings yet
Deep Learning Algorithms For Bearing Fault Diagnostics-A Comprehensive Review
25 pages
A Transfer Convolutional Neural Network For Fault Diagnosis Based On Resnet-50
No ratings yet
A Transfer Convolutional Neural Network For Fault Diagnosis Based On Resnet-50
14 pages
Five Star Health Safety Audit Factsheet
No ratings yet
Five Star Health Safety Audit Factsheet
1 page
Sba-Intro 1
No ratings yet
Sba-Intro 1
40 pages
Preliminaries
No ratings yet
Preliminaries
11 pages
Other Checklist Example - Lab
No ratings yet
Other Checklist Example - Lab
2 pages
KITE Report Biogas Ghana 2008
No ratings yet
KITE Report Biogas Ghana 2008
80 pages
Denney, A. & Tewksbury, R. (2013) - How To Write...
No ratings yet
Denney, A. & Tewksbury, R. (2013) - How To Write...
18 pages
10 1 1 108 40 PDF
No ratings yet
10 1 1 108 40 PDF
62 pages
Eapp 4TH Q Adm
No ratings yet
Eapp 4TH Q Adm
32 pages
Mba Tancet Cutoff 2021
No ratings yet
Mba Tancet Cutoff 2021
23 pages
Thesis Vs Directional Statement
100% (2)
Thesis Vs Directional Statement
4 pages
A Comprehensive Review of Adaptive Building Energy Management Systems Based On Users Feedback
No ratings yet
A Comprehensive Review of Adaptive Building Energy Management Systems Based On Users Feedback
7 pages
6 - Event Studies
No ratings yet
6 - Event Studies
31 pages
Visual Scrapbook Introducing Sex Education
No ratings yet
Visual Scrapbook Introducing Sex Education
19 pages
Perception of Students and Faculty On Problem - Based Learning As A Teaching Strategy
No ratings yet
Perception of Students and Faculty On Problem - Based Learning As A Teaching Strategy
20 pages
Principal Components Analysis
No ratings yet
Principal Components Analysis
50 pages
Recruitment, Selection, and Assessment: Are The CV and Interview Still Worth Using?
No ratings yet
Recruitment, Selection, and Assessment: Are The CV and Interview Still Worth Using?
6 pages
A Comparative Study On Teacher Leadership in Special Education Classroom Between China and Malaysia
No ratings yet
A Comparative Study On Teacher Leadership in Special Education Classroom Between China and Malaysia
5 pages
M6 Assignment-SpringA25 - WFAvQj1
No ratings yet
M6 Assignment-SpringA25 - WFAvQj1
20 pages
Group 3 Oral Com...
No ratings yet
Group 3 Oral Com...
26 pages
Bio Teknik Menjawab
No ratings yet
Bio Teknik Menjawab
33 pages
2007 Census Admin Report
No ratings yet
2007 Census Admin Report
125 pages
Michael Virant
No ratings yet
Michael Virant
2 pages
Research Methodology 2024-25
No ratings yet
Research Methodology 2024-25
3 pages
Gregmat Advice: (My Plan)
No ratings yet
Gregmat Advice: (My Plan)
11 pages
Lincoln Hospital Analysis
No ratings yet
Lincoln Hospital Analysis
20 pages
Cecily CV2023
No ratings yet
Cecily CV2023
3 pages
Fesen - Space Sciences
No ratings yet
Fesen - Space Sciences
61 pages
Pine Needle Length Comparisons in Conifers
0% (1)
Pine Needle Length Comparisons in Conifers
6 pages
Tejas Bhai
No ratings yet
Tejas Bhai
13 pages

ISA Transactions: Te Han, Chao Liu, Wenguang Yang, Dongxiang Jiang

Uploaded by

ISA Transactions: Te Han, Chao Liu, Wenguang Yang, Dongxiang Jiang

Uploaded by

ISA Transactions 97 (2020) 269–281

Contents lists available at ScienceDirect

Deep transfer network with joint distribution adaptation: A new

generalization error cannot be large enough to guarantee the 2. Related works

where Js and Jt is the joint probability distribution of Ds and Dt ,

Fig. 3. The architecture of DTN for unsupervised domain adaptation.

Fig. 4. Illustration of wind turbine experimental platform.

Fig. 5. Time waveforms of collected vibration data.

Table 3 validation. In RF, the performance is substantially robust to the

Fig. 8. Comparison of the MAR of diverse methods on ten transfer tasks.

Fig. 9. Comparison of the FAR of diverse methods on ten transfer tasks.

You might also like