0% found this document useful (0 votes)
32 views14 pages

An Adaptive Fault Diagnosis Framework Under Class-Imbalanced Conditions Based On Contrastive Augmented Deep Reinforcement Learning

Uploaded by

Huynh Duy Phuong
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
32 views14 pages

An Adaptive Fault Diagnosis Framework Under Class-Imbalanced Conditions Based On Contrastive Augmented Deep Reinforcement Learning

Uploaded by

Huynh Duy Phuong
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 14

Expert Systems With Applications 234 (2023) 121001

Contents lists available at ScienceDirect

Expert Systems With Applications


journal homepage: www.elsevier.com/locate/eswa

An adaptive fault diagnosis framework under class-imbalanced conditions


based on contrastive augmented deep reinforcement learning
Qin Zhao a, b, c, Yu Ding a, b, c, *, Chen Lu a, b, c, Chao Wang a, b, c, Liang Ma a, b, c, Laifa Tao a, b, c,
Jian Ma a, b, c
a
Institute of Reliability Engineering, Beihang University, Beijing 100191, China
b
Science & Technology on Reliability & Environmental Engineering Laboratory, Beijing 100191, China
c
School of Reliability and Systems Engineering, Beihang University, Beijing 100191, China

A R T I C L E I N F O A B S T R A C T

Keywords: In practical scenarios, it is difficult to acquire fault data from rotating machinery, resulting in class-imbalanced
Class-imbalanced condition problems in the fault diagnosis field. Training a fault diagnosis model directly on an imbalanced dataset may lead
Fault diagnosis to overfitting for minority classes and skewness toward majority classes. Thus, we propose a fault diagnosis
Deep reinforcement learning
framework based on contrastive augmented deep reinforcement learning (CADRL) with two-stage training. In the
Contrastive learning
pretraining stage, we obtain sample pairs based on the batch construction strategy to calculate the contrastive
Rotating machinery
loss. Then, this loss is used to train a feature extraction model to reduce intraclass distances and increase
interclass distances. During the fine-tuning stage, an adaptive reward function that updates with the sample label
distribution is adopted in the fault diagnosis model. This function can balance the attention given by the model to
different fault modes and improve fault diagnosis performance on imbalanced data without prior knowledge.
Case studies conducted on two public datasets demonstrate that the pretraining stage can provide a well-trained
feature extraction model, which can be merged into the proposed fault diagnosis model to achieve better fault
diagnosis performance than that of other advanced models.

1 Introduction Furthermore, adequate and balanced labeled samples are needed to


train intelligent fault diagnosis models (Z. He, Shao, Cheng, et al., 2020).
Rotating machinery is widely used as a key equipment component in However, in practical fault diagnosis scenarios involving rotating ma­
the aerospace, rail transit, and wind power generation fields. The fail­ chinery, the number of normal samples is far greater than the number of
ures of such machinery directly affect the normal operations of the fault samples. Thus, the training processes of intelligent data-driven
associated equipment and can even cause serious property damage or fault diagnosis models usually suffer from the class imbalance problem
human casualties. Thus, it is necessary to study fault diagnosis methods caused by sample quantity differences. Intelligent fault diagnosis models
for rotating machinery to guarantee the reliability and safety of equip­ trained on imbalanced datasets may have problems such as minority
ment. Traditional fault diagnosis methods mostly use signal processing overfitting, majority dominance, and poor generalization performance
techniques to extract the characteristics of vibration signals and recog­ (T. Zhang et al., 2022). Therefore, it is important to construct an effec­
nize fault modes based on artificial experience. They are usually inef­ tive fault diagnosis framework that is suitable for imbalanced datasets to
ficient when dealing with high-dimensional or massive data due to their achieve accurate fault recognition.
heavy reliance on human experience. Recently, with the development of The recently developed typical class-imbalanced fault diagnosis
artificial intelligence technology, intelligent fault diagnosis methods methods can be divided into two types, including data-level methods
that can adaptively mine deep features without relying on expert and model-level methods. Data-level methods are mainly divided into
experience have attracted extensive attention (R. Liu et al., 2018). two categories: resampling and data augmentation approaches. Such

* Corresponding author.
E-mail addresses: [email protected] (Q. Zhao), [email protected] (Y. Ding), [email protected] (C. Lu), [email protected] (C. Wang),
[email protected] (L. Ma), [email protected] (L. Tao), [email protected] (J. Ma).

https://doi.org/10.1016/j.eswa.2023.121001
Received 11 November 2022; Received in revised form 7 July 2023; Accepted 13 July 2023
Available online 20 July 2023
0957-4174/© 2023 Published by Elsevier Ltd.
Q. Zhao et al. Expert Systems With Applications 234 (2023) 121001

methods tend to reduce the impact of imbalanced distributions by with deep reinforcement learning to get substantial sample-efficiency
rebalancing the datasets during the preprocessing stage. Resampling gains. It is a type of discriminative models of representation learning,
methods mainly include undersampling (Prakash et al., 2021), over­ which can fully learn abstract semantic information representation and
sampling (Yi et al., 2020), and mixture sampling (Mao et al., 2017; extract high-level features of original datasets. Many works combining
Swana et al., 2022). However, when the number of majority class CL with deep reinforcement learning (Laskin et al., 2020; Ma et al.,
samples is far greater than the number of minority class samples, 2021; Zhu et al., 2022) have achieved remarkable success in improving
resampling methods can hardly improve the fault diagnosis performance the data learning efficiency and feature representation ability. However,
of the resulting models. In contrast, they may change the distributions of these works focused on off-policy control and robot tasks, which are
the original imbalanced datasets and aggravate both minority over­ quite different from fault diagnosis problems. Inspired by the above
fitting and majority underfitting problems. The most widely used data methods, we propose a class-imbalanced fault diagnosis framework
augmentation methods in the fault diagnosis field are generative based on contrastive augmented DRL (CADRL), which has the ability to
adversarial networks (GANs) (Goodfellow et al., 2020) and their perform CL and possesses an adaptive reward mechanism. The proposed
improved variants (Shao et al., 2019; Z. Wang et al., 2018). These net­ framework uses CL to improve data efficiency and effectiveness of DRL-
works rebalance datasets by generating minority class samples in based fault diagnosis models. It is implemented through two stages: a
imbalanced datasets(S. Liu et al., 2021; W. Li et al., 2022). Nevertheless, pretraining stage and a fine-tuning stage. In the pretraining stage, the
a data augmentation method trained on an imbalanced dataset with a feature extraction model is optimized by contrastive loss to acquire the
serious lack of fault samples is unstable. The fault samples generated by ability to represent contrastive features. During the fine-tuning stage,
the data augmentation method are similar to the small number of real the fault diagnosis agent constructed based on the pretrained model is
fault samples in the original imbalanced dataset, which leads to a very trained on the original imbalanced datasets to enhance the fault
limited fault recognition accuracy improvement. Additionally, model- recognition ability of the model. Furthermore, we design an adaptive
level approaches include reweighting-based methods and model reward function for the fault diagnosis model, which can help reduce the
modification-based methods. Reweighting-based methods, such as the rewards of the majority classes and increase the attention given to mi­
imbalance-weighted loss, cost-weighted loss, and weighted cross- nority classes.
entropy loss (Geng et al., 2020; H. Liu et al., 2021; Ren et al., 2022), In summary, the main contributions of the paper are summarized as
can balance the attention given by models to different fault categories by follows:
assigning higher weights to minority classes or lower weights to ma­ (1) Focusing on the interclass overlap problem that leads to low fault
jority classes. However, the weight values of different fault categories recognition performance, CL is incorporated into the feature extraction
are usually provided based on prior knowledge or according to the model, which can help to reduce the intraclass distances and increase
number of samples (Cui et al., 2019; Du et al., 2020). In addition, the interclass distances in imbalanced datasets.
reweighting-based methods may alter or distort the original distribu­ (2) To address the minority overfitting problem, an adaptive reward
tions of imbalanced datasets, further affecting the deep feature repre­ function is adopted in the proposed DRL-based fault diagnosis model to
sentations of such datasets. Model modification-based methods mainly replace the classic reward function. The proposed reward function can
include transfer learning (Z. He, Shao, Wang, et al., 2020; Kuang et al., adaptively update the model according to the real-time sample label
2021), contrastive learning (CL) (Hou et al., 2022; J. Zhang et al., 2022), distribution, contributing to enhancing the attention given by the model
and ensemble learning (Jia et al., 2020; Liang et al., 2022). These to the minority classes.
methods can obtain more prior knowledge related to fault modes and (3) A novel two-stage fault diagnosis framework, which systemati­
transfer it to the classification task, which can further enhance the fault cally integrates the CL method and DRL-based model, is proposed. The
recognition ability of the resulting model. However, modifying the performance of the proposed two-stage framework is further verified in
model architecture for a concrete task requires a specific architecture both comparison experiments and ablation experiments, illustrating its
design based on prior knowledge, so this technique is difficult to apply to advantages and generalization to imbalanced conditions.
different task scenarios. The rest of the paper is organized as follows. The related work on CL
In the fault diagnosis field, the aforementioned methods can solve and deep Q networks (DQNs) are introduced in Section 2. In Section 3,
the problems caused by imbalanced datasets to a certain extent, but they we introduce the feature extraction model, the fault diagnosis frame­
still have three main limitations: (I) most methods concentrate on one work based on CADRL, and the entire fault diagnosis process. In Section
strategy, while few combine multiple strategies to address class- 4, case studies are implemented on imbalanced bearing datasets from
imbalanced fault diagnosis problems; (II) when the given datasets are Paderborn University (PU) and Jiangnan University (JNU) to verify the
extremely class-imbalanced, data-level methods cannot increase the effectiveness and applicability of the CADRL fault diagnosis framework.
diversity of the minority classes and solve the overfitting problem; and
(III) the weights and architecture of the class-imbalanced fault diagnosis 2 Related work
models are set based on prior knowledge obtained under specific sce­ 2.1 CL
narios, leading to poor adaptability and generalization. Hence, it is
important to propose an effective and adaptive fault diagnosis frame­ CL (Chopra et al., 2005) was first proposed as a self-supervised
work to decrease the impact of imbalanced data distributions. learning method that forms contrastive sample representations by
Recently, deep reinforcement learning (DRL)-based methods have mining the similarities among samples. Furthermore, supervised CL
been widely used to solve mechanical fault diagnosis problems (Ding (SCL) (Khosla et al., 2020) has been proposed as an extension of self-
et al., 2019; H. Wang et al., 2021). Meanwhile, a lot of variants and supervised CL; this method uses category labels to construct sample
improvements of DRL-based methods are proposed to address the class- pairs for model training. The main idea of CL is to learn contrastive
imbalance fault diagnosis problems. For example, Lin et al. (Lin et al., feature representations in the given feature space. CL can help reduce
2020) proposed an imbalanced classification model based on DRL to the distances between the anchors and positive samples and increase the
address the class imbalance issue in image recognition. Fan et al. (Fan distances between the anchors and negative samples (Le-Khac et al.,
et al., 2021) proposed an imbalanced sample selection strategy based on 2020).
DRL to achieve better fault diagnosis performance on imbalanced In recent years, the research content and application fields related to
datasets. Such studies have demonstrated that DRL-based methods have CL have become increasingly extensive. Among them, the typical
great potential in class-imbalanced fault diagnosis problems. Further­ application of CL is to provide improved visual representation effects in
more, inspired by the significant success in CV, NLP, and other fields (K. areas including image classification and object detection. For example,
He et al., 2020; Rethmeier & Augenstein, 2023), CL is used to combine Chen et al. (Chen et al., 2020) proposed a simple CL framework for

2
Q. Zhao et al. Expert Systems With Applications 234 (2023) 121001

visual representations (SimCLR). He et al. (K. He et al., 2020) proposed the model works.
the momentum contrastive (MOCO) learning framework. Recently, re­ DDQN first uses an evaluation network Qeval to calculate the Q value
searchers have incorporated CL into the field of mechanical fault diag­ corresponding to each action in state st+1 . Then, it selects the action
nosis, which can help to learn deep representations of mechanical value amax (st+1 ; θ) corresponding to the maximum Q value. Finally, it
signals and improve fault recognition accuracy. Ragab et al. (Ragab uses the target network Qtarget to calculate the target value based on the
et al., 2022) proposed a conditional contrastive domain generalization selected action value. Furthermore, the loss of the DDQN algorithm is
method for rolling mechanical fault diagnosis, in which CL is used to defined as:
enhance the applicability and generalization of the model in different ∑
conditions. In addition, CL is also adopted to solve class-imbalanced L(θ) = [r + γQ(st+1 , amax (st+1 ; θ); θ− ) − Q(st , at ; θ)]2 (2)
fault diagnosis problems and improve minority identification perfor­
(st ,at ,rt ,st+1 )∈D

mance. Hou et al. (Hou et al., 2022) proposed an adaptive weighted CL Recently, DQN and its improved variants have achieved good
method as an imbalanced learning strategy for mechanical long-tailed application results in many fields, such as games (Osband et al., 2016),
datasets. Peng et al. (Peng et al., 2022) proposed progressively decision making (Andriotis & Papakonstantinou, 2019), robotics
balanced supervised CL to learn deep feature representations and (Sivaranjani & Vinod, 2023), and pattern recognition (Qiao et al., 2018),
improve fault diagnosis performance on long-tailed datasets. by utilizing their powerful optimization ability and autonomous
decision-making ability. Some researchers learned from successful ex­
2.2 DQN periences in the field of pattern recognition and adopted DQN to solve
fault diagnosis problems involving mechanical equipment. Ding et al.
DQN is a classic DRL method that combines deep learning with (Ding et al., 2019) proposed an intelligent fault diagnosis method based
nonlinear mapping ability and reinforcement learning with long-term on DQN, which realizes fault diagnosis by autonomously mining the
decision-making ability. Thus, it can achieve precise mapping from relationship between the original signal and the corresponding fault
high-dimensional inputs to optimal policies through end-to-end rein­ mode. Wang et al. (H. Wang et al., 2021) proposed a fault diagnosis
forcement learning (Mnih et al., 2015). DQN utilizes deep convolutional method based on DQN, which transformed the fault diagnosis problem
neural networks (DCNNs) to approximate the optimal action-value into a sequential decision-making problem. Dangut et al. (Dangut et al.,
function Q(st , at ; θ). In addition, DQN uses an experience replay mech­ 2022) proposed an aircraft fault prediction method based on the DQN
anism to randomly select a certain number of samples in the experience model with an improved reward function and a replay mechanism for
replay buffer to calculate the loss: situations with rare fault samples. Therefore, we use a DRL-based
[( ) ] 2
method to construct a fault diagnosis model for imbalanced fault diag­
L(θ) = E(st ,at ,rt ,st+1 ) r + γmaxQ(st+1 , at+1 ; θ− ) − Q(st , at ; θ) (1) nosis, which has self-learning ability, strategy optimization ability, and a
at+1
special reward mechanism.

where st+1 represents the next state, at+1 represents the next action, st 3 The proposed class-imbalanced fault diagnosis framework
represents the current state, and at represents the current action. θ de­
notes the parameters of the action-value function Q, and θ− represents In this section, first, we introduce a feature extraction model based
the parameters of the target action-value function Q.
̂ on the contrastive loss. Then, a detailed description of the fault diagnosis
DQN calculates all the Q values in state st+1 and then selects the procedure based on the DRL method, especially the adaptive reward
action corresponding to the maximum Q value to participate in the function, is introduced. Finally, our fault diagnosis method based on the
calculation of the target value through Q.
̂ The DQN learning strategy CADRL framework is introduced in detail, including imbalanced data­
introduces some problems, such as overestimation, non-convergence, sets under-sampling, feature extraction model pretraining, fault diag­
and unstable training processes. Therefore, van Hasselt et al. (van nosis model fine-tuning, and pretrained model testing.
Hasselt et al., 2016) proposed the double DQN (DDQN). The general
architecture of the DDQN model is shown in Fig. 1, which explains how 3.1 Feature extraction model based on CL

Fig. 1. The general architecture of the DDQN model.

3
Q. Zhao et al. Expert Systems With Applications 234 (2023) 121001

In the pretraining stage of the feature extraction model, we adopt a process, the agent receives the first sample x1 as its initial state s1 . At
supervised CL loss to train the anchor to move closer to the positive each timestep, the state st corresponds to sample xt . When the next
samples and gradually move away from the negative samples. Referring training iteration begins, the order of the samples in the training set is
to the batch construction practice for sample pairs in the supervised shuffled, and the state space is changed accordingly.
contrastive (SupCon) loss (Khosla et al., 2020), the sample pairs are Action space A: The actions of the fault diagnosis agent correspond to
constructed to compute the contrastive loss of the feature extraction the category labels of the fault samples. The agent takes action at ∈ A in
model. Considering the difficulty of generating new samples from vi­ state st ∈ S. For the multiclassification problem studied in this paper,
bration signals, a single batch of samples is directly used to construct A = {0, 1, 2, ⋯, N}, where 0 represents the category label of normal
sample pairs to reduce the difficulty of implementing the pretraining samples and 1 ∼ N represents the category labels of fault samples.
model. In addition, the dot product between each pair of samples is Discount factor γ: The general range of the discount factor is [0, 1],
adopted as the similarity measure, and the cross-entropy loss is adopted which is used to balance the rewards of the fault diagnosis results at the
to constrain the similarity between samples. Furthermore, due to the current time and at the future time. The closer gamma is to 0, the more
large sample number gap between different categories in imbalanced attention the agent pays to the current fault diagnosis result.
datasets, the number of positive samples matching a majority class Reward function R: When the DDQN method is used to solve a fault
sample is much larger than that of negative samples. Thus, the number diagnosis problem, if the agent can correctly identify the category label
of positive samples is used to normalize the contrastive loss of the of the given sample, the environment feeds positive rewards back to the
sample. Finally, the loss value obtained by the feature extraction model agent according to the reward function; otherwise, it feeds back negative
based on a batch of samples is the average of the loss values of all rewards. The reward function can be expressed as follows:
samples in the batch. The loss function of the feature extraction model {
1 if at = lt
can be expressed as follows: R= (4)
− 1 otherwise
1 ∑N
1 ∑N
exp(xi xTi ) However, the fault diagnosis model constructed based on the DDQN
Lsup = ∑N 1yi =yj log ∑N (3)
N i=1 j=1 1yi =yj j=1 method under imbalanced data conditions often skews toward majority
T
k=1 exp(xi xk )

class samples and ignores the characteristics of minority class samples,


where xi represents the sample and yi is the category label of xi . N resulting in low fault diagnosis accuracy for minority class samples. To
represents the batch size and the number of sample pairs in a batch. improve the fault classification performance achieved on imbalanced
In addition, considering that the fault samples constructed based on datasets, a new adaptive reward function is proposed to construct an
the vibration signals of rotating machinery are one-dimensional (1D), improved DDQN (IDDQN) as the fault diagnosis model for class-
the 1D-DCNN is adopted for constructing a feature extraction model to imbalanced scenarios. During each epoch of the training process, the
retain the original features of the vibration signals. The structure of the IDDQN model can adaptively provide the corresponding reward values
feature extraction model based on the contrastive loss is shown in Fig. 2, of various fault modes according to their sample distributions in the
which mainly includes two convolutional (Conv) layers, two maximum experience replay buffer. There is no need to give rewards in advance or
pooling (Pool) layers, and two fully connected (FC) layers. manually update rewards. The IDDQN model produces higher rewards
or punishments for health states with fewer samples and lower rewards
3.2 Fundamental fault diagnosis framework based on DRL or punishments for health states with more samples. The reward func­
tion of the IDDQN model is defined as follows:
The proposed fault diagnosis process for class-imbalanced conditions {
is based on the DDQN, which is a value-based DRL algorithm. The fault λi = λ0 ⋅nmin /ni at = lt
R(st , at , lt ) = (5)
diagnosis process contains four main elements, including the state space, − λi = − λ0 ⋅nmin /ni at ∕
= lt
action space, reward function, and discount factor for calculating the
cumulative reward. The details of these elements are described as where ni represents the sample size of a fault with category ci , and its
follows. reward value is recorded as ±λi . nmin represents the sample size of fault
State space S: In fault diagnosis problems, the state space S is cmin with the smallest sample size among all fault categories, and its
determined by the training set samples. At the beginning of the training reward value is set to ±λ0 . According to the reward function of the

Fig. 2. The architecture of the feature extraction model based on CL.

4
Q. Zhao et al. Expert Systems With Applications 234 (2023) 121001

classic DQN algorithm, in which the reward value is within the range [0, pairs. Then, the sample pairs are used to train the feature extraction
1], the reward value λ0 of fault cmin is set to 1 as the initial reward value. model and calculate the contrastive loss of the batch samples. The
When the agent correctly or incorrectly diagnoses the sample of fault contrastive loss is calculated by Eq. (1). Furthermore, the stochastic
cmin , the environment feeds a reward value of 1 or − 1 back toward the gradient descent (SGD) method is adopted to update the parameters of
agent. When the agent correctly or incorrectly diagnoses the samples in the feature extraction model. Finally, the loss value of the feature
fault ci , the environment adaptively gives reward values of λi or − λi extraction model converges after all training iterations are completed.
according to the proportion of the number of various fault samples in the The obtained pretrained feature extraction model can decrease the
current experience replay buffer. intraclass distance and increase the interclass distance. Finally, a well-
trained feature extraction model with prior knowledge of the contras­
3.3 Class-imbalanced fault diagnosis method based on CADRL tive feature representation of the imbalanced dataset is acquired, and
this model can efficiently extract fault features for the subsequent
The fault diagnosis process for class-imbalanced conditions based on diagnosis process.
the CADRL framework is shown in Fig. 3. It involves three steps, Step 3: Training the fault diagnosis model based on DRL.
including preparing the training and testing datasets, pretraining the CL- First, the fault diagnosis problem is defined as a sequential decision-
based feature extraction model, training the fault diagnosis model based making problem in the CADRL framework. Each state is a sample, the
on CADRL, and testing the obtained fault diagnosis model. action corresponds to the category label of the sample, and the reward is
Step 1: Preparing the training and testing datasets. the feedback sent from the environment to the fault diagnosis agent.
Vibration signals are collected from rotating machinery in each fault Aiming at the weak generalization ability of the existing DRL-based
mode to form the training set and validation set. To increase the number models in class-imbalanced fault diagnosis scenarios, an improved
of samples, a fixed sliding window is used to divide the original vibration reward function is introduced to the training process of the IDDQN
signals into multiple samples. Directly using an imbalanced dataset to model, which replaces the reward function in the classic DDQN model.
train the DCNN-based feature extraction model usually leads to some The proposed improved reward function can be dynamically and
problems, including skewness toward the majority class samples and adaptively updated according to the sample distribution of each cate­
overfitting for the minority class samples. Therefore, the rebalancing gory. Furthermore, the pretrained feature extraction model is integrated
training sets for the feature extraction model are obtained by the random into the IDDQN model, which plays a key role in building the CADRL-
under-sampling (RUS) method. based fault diagnosis model.
Step 2: Pretraining the feature extraction model based on CL. During the training process of the fault diagnosis model, the agent
First, N samples are randomly selected from the training dataset to observes the state st that matches sample xt and executes a corre­
obtain one batch of samples, which is later used to construct sample sponding action at at each timestep. Furthermore, the fault diagnosis

Fig. 3. Fault diagnosis process based on the proposed CADRL framework.

5
Q. Zhao et al. Expert Systems With Applications 234 (2023) 121001

simulation environment feeds the current reward rt , the current state of (continued )
the diagnosis process terminalt , and the next state st+1 back to the agent. Algorithm 1: Fault diagnosis model training process
Then, the experience (st , at , rt , st+1 , terminalt ) is stored in the experience Calculate the loss according to Eq. (6)
replay buffer. At certain timestep intervals, mini-batch-sized samples are Update the parameters θ of Qeval via gradient descent
randomly selected from the replay buffer to train the fault diagnosis Update ε according to Eq. (8)
model. The loss of the model can be calculated as follows: If t/stepcopy = 0 do
∑ Update Qtarget parameters θ− = θ
L(θ) = [y − Qeval (st , at ; θ)]2 (6) End for
(st ,at ,rt ,st+1 )∈D Output: The fault diagnosis agent with the optimal action policy π*θ

where y represents the target Q value of the proposed fault diagnosis


model.
If the fault diagnosis iteration is terminated after performing action Algorithm 2: Adaptive environment simulation

at , the target Q value is the reward value at the current timestep. If the Definition: Fault from category ci with a sample size of ni ; The fault category
iteration is not terminated, the target Q value is calculated by Qtarget . with the smallest sample size among all fault categories is denoted as cmin , and its
sample size is nmin .
First, Qeval is used to obtain the Q values in the next state st+1 by per­ Function STEP(at ∈ A, lt ∈ L):
forming all actions. Then, the action amax (st+1 ; θ) corresponding to the If at = lt then
largest Q value is selected. Finally, the target Q value is calculated by Set rt = λ0 ⋅nmin /ni
Qtarget based on amax (st+1 ; θ). The expression of y is as follows: Else
Set rt = − λ0 ⋅nmin /ni
{
r terminal = True Output: reward rt , next state st+1 and terminalt
y= (7)
r + γQtarget (st+1 , amax (st+1 ; θ); θ− ) terminal = False

where terminal represents the process of the fault diagnosis iteration. A


true value for terminal indicates that the iteration is terminated, while a 4 Case study
false value indicates that the iteration continues.
Furthermore, the parameters of the proposed fault diagnosis model In this section, the effectiveness of the proposed fault diagnosis
are updated by performing the SGD method during the training stage. model is verified through two class-imbalanced rolling bearing
When the training process reaches the maximum timestep, the model diagnosis-based case studies. First, we introduce the details of the ex­
acquires the optimal fault diagnosis strategy and the maximum cumu­ periments, including the parameter settings of the feature extraction
lative reward. In addition, in the initial training phase of the model, to model and the fault diagnosis model. Additionally, the details of eight
accelerate the exploration process of the agent in the fault diagnosis other comparison methods are illustrated. Moreover, we implement case
environment, the dynamic ε-greedy strategy is used to determine the studies on the PU and JNU bearing datasets to demonstrate the perfor­
action at in state st . The probability ε of randomly selecting action at can mance of the proposed fault diagnosis model.
be expressed as follows:
{ 4.1 Experimental settings
ε × εdecay if ε ≥ εmin
ε= (8) 4.1.1 Model parameter settings
εmin others

where ε represents the initial probability, εmin represents the minimum In the pretraining stage of both fault diagnosis case studies, the 1D-
value of ε, and εdecay represents the decay factor. DCNN is used to build a feature extraction model, which includes con­
Step 4: Testing the trained fault diagnosis model. volutional layers, pooling layers, and fully connected layers. To mitigate
Finally, we achieve fault diagnosis based on an agent that has the overfitting problem, the dropout regularization approach is incor­
completed the training process. However, this agent cannot be used to porated into the DCNN model. Furthermore, the optimizer of the feature
recognize fault modes directly, as it can only output the predicted Q extraction model is the SGD method with a learning rate of 0.0025. In
values rather than fault recognition results. Therefore, we acquire a fault both cases, the input dimensionality of the feature extraction model is
diagnosis agent that can achieve fault recognition in testing sets by 2048, while the output dimensionality is 64. The parameter settings of
extracting the trained agent and modifying its output layer. Then, the the feature extraction model are shown in Table 1.
fault diagnosis result can be obtained by identifying the action corre­ The pretrained feature extraction model is ensembled into the fault
sponding to the maximum Q value. diagnosis model acting as the agent, which is the core state inception
The concrete training process of the proposed class-imbalanced fault part of the proposed CADRL framework. Then, the output layer
diagnosis model based on the CADRL framework is shown in Algorithm composed of a single fully connected layer is connected behind this
1. The adaptive fault diagnosis environment is shown in Algorithm 2. model to output the predicted Q-values. Furthermore, the mean squared
Algorithm 1: Fault diagnosis model training process
error (MSE) loss is introduced to the proposed model, and the loss is
equal to the average difference between the predicted Q-values and the
Initialize the experience replay buffer with a capacity of M
target Q-values. The SGD optimizer with a rate decay strategy is applied
Initialize Qeval with random weights θ
Initialize Qeval with random weights θ− = θ to minimize the loss. The capacity of the experience replay buffer of the
Initialize the sequence {st } = {xt } based on the training set Dtr = {x1 , x2 , ⋯xN } model is 2000, which means that the buffer can store the experience {st ,
For t = to T do at , rt , st+1 , terminaltt } generated within 2000 training steps. Before the
Randomly select an action at with probability ε or obtain an action at by at = model starts its iterative training process, random strategies are
argmaxQeval (st , a; θ) executed 2000 times to observe the induced experience, which is stored
a
rt , st , terminalt = STEP(at , lt ) in the replay buffer. The proposed model adopts the ε-greedy strategy to
Store (st , at , rt , st+1 , terminalt ) in the experience replay buffer explore actions during the initial learning process, and the exploration
If t > batch size andt/stepepi = 1to E do rate decays linearly from 1.0 to 0.1 within 100,000 iterations. Further­
Randomly sample the training subset with the batch size from the experience more, considering that the observations of the agent in the fault diag­
replay buffer
Calculate the accumulated reward value R
nosis problem are relatively independent, the reward discount factor is
set to 0.1, which could guide the model to pay more attention to the
(continued on next column)
current reward. The parameter settings of the fault diagnosis model are

6
Q. Zhao et al. Expert Systems With Applications 234 (2023) 121001

Table 1
The parameter settings of the feature extraction model.
Layer Conv1 MaxPooling1 Conv2 MaxPooling2 FC1 FC2

Kernel Size 5 2 3 2 – –
Number of Kernels 64 64 16 16 – –
Padding 2 – 1 – – –
Input Size 1*2048 64*2048 64*512 16*512 4096 512
Output Size 64*2048 64*512 16*512 16*256 512 64

shown in Table 2. imbalanced datasets contains 2048 data points. The 6 datasets with
basic distributions are referred to as D1-D6 and possess different
4.1.2 Comparison method settings imbalance ratios. The imbalance ratios of the imbalanced datasets are
calculated by dividing N by P, where N is the sample size of the majority
To demonstrate the effectiveness of the proposed fault diagnosis class and P is the sample size of the minority classes. Detailed infor­
model, we compare it with eight other fault diagnosis methods in both mation about the imbalanced datasets is shown in Table 4.
fault diagnosis case studies. These comparison methods include the
convolutional neural network (CNN) model-based diagnosis method, 4.2.2 Fault diagnosis results and analysis
the contrastive representation CNN (CRCNN) model-based diagnosis
method, the supervised contrastive loss CNN (SCCNN) model-based In the bearing fault case study, the number of samples in each class in
diagnosis method (J. Zhang et al., 2022), the magnet loss CNN the testing set is the same so that the average accuracy (Acc) and stan­
(MLCNN) model-based diagnosis method (Gao et al., 2022), the focal dard deviation (Std) of the fault diagnosis results can effectively
loss-equipped CNN (FLCNN) model-based diagnosis method, the DDQN demonstrate the performance of different fault diagnosis models. To
model-based diagnosis method, a diagnosis method combining the effectively verify and illustrate the fault diagnosis results, every fault
DDQN model with RUS, and a diagnosis method combining the DDQN diagnosis experiment is conducted 5 times to obtain average statistical
model with random oversampling (ROS). Such class-imbalanced fault results. The statistical results include the average accuracy and the
diagnosis methods can be divided into data-level methods and model- standard deviation produced through repeated experiments.
level methods. In addition, these eight comparison models have the
same backbone architecture and parameter settings as the proposed a) Imbalanced training sets with basic distributions.
method. Their core structure is the 1D-DCNN, which has been described b) Comparison experiments.
in Section 3.1 in detail.
First, we conduct comparative experiments on 6 imbalanced datasets
4.2 Case study 1: Fault diagnosis on the PU dataset (D1-D6) with basic distributions, which are used to train the proposed
4.2.1 PU dataset model and eight types of comparison models. Then, the fault diagnosis
performance of the trained models is reflected by experiments con­
The bearing fault dataset provided by PU (Lessmeier et al., 2016) is ducted on the testing set. The statistical results of the proposed model
used in fault diagnosis case study 1. It includes 6 sets of normal bearings, and the comparison models are shown in Fig. 6.
12 sets of artificially damaged bearings and 14 sets of real damaged The fault diagnosis results in Fig. 6 indicate that the fault diagnosis
bearings. The dataset is collected on a modular test rig with a sampling performance varies with different models and different imbalance ratios.
rate of 64 kHz, as shown in Fig. 4, including an electric motor, a torque As the imbalance ratio increases, the performance levels of different
measurement shaft, a rolling bearing test module, a flywheel and a load models decrease to different degrees. However, the proposed model
motor. The normal bearing dataset and artificially damaged bearing outperforms the comparison models on various imbalanced datasets,
datasets, which are collected under a 9000 rpm rotational speed, a 0.7 and its average accuracies are improved by 1.94%, 3.81%, 3.90%,
Nm load torque and a 1000 N radial force, are used in this case study. 3.26%, 1.67%, and 1.09% over those of the second-best models on
The dataset contains three health states, which include the normal state datasets D1-D6, which demonstrates the effectiveness of the proposed
(NS), outer ring fault (OF), and inner ring fault (IF). According to the model. Fig. 6 shows that the proposed model performs better than the
different artificial damage methods and failure severity levels, the deep learning models, including the CNN model, CRCNN model, SCCNN
bearing faults can be divided into 9 categories. The details of the bearing model, MLCNN model, and FLCNN model. The main reason for this
datasets used for fault diagnosis are summarized in Table 3. finding is that the proposed model based on DRL has stronger optimi­
To verify the effectiveness of the proposed method on imbalanced zation ability for fault diagnosis strategies and more effective adaptive
datasets, we construct two types of typical imbalanced datasets with the reward mechanism than those of the deep learning models on imbal­
basic distribution and the long-tailed distribution based on bearing vi­ anced datasets. The proposed model has significant advantages over the
bration signals, which are shown in Fig. 5. Each sample in the DDQN model. Benefitting from the contrastive loss in the pretraining
stage and the adaptive reward in the fine-tuning stage, stronger feature
representation capabilities are acquired to further improve the diagnosis
Table 2
performance on the minority classes. Furthermore, both ROS and RUS
The parameter settings of the proposed fault diagnosis model.
are used to obtain balanced training sets for the fault diagnosis model,
Parameters Case 1 Case 2
which contributes to preventing the model from skewing toward the
Minibatch size 128 64 majority class. However, when the imbalance ratios are very large, the
Maximum timestep 1,000,000 1,000,000 balanced datasets obtained by RUS or ROS may aggravate the overfitting
Episodes 200,000 200,000
problem on the minority classes. This may lead to poor generalization
Initial learning rate 0.025 0.025
Learning rate decay 0.99 0.99 and poor fault diagnosis performance, which can be observed in the
Replay buffer capacity M 2000 2000 diagnosis results obtained on dataset D1 in Fig. 6. Furthermore, the
Discount factor γ 0.1 0.1 proposed model achieves more competitive fault diagnosis performance
Initial probability ε 1 1
on D1-D4 than on other imbalanced datasets. The main reason is that
Number of exploration steps 100,000 100,000
Minimum probability εmin 0.1 0.1
D1-D4, with larger imbalance ratios, have fewer minority class samples,

7
Q. Zhao et al. Expert Systems With Applications 234 (2023) 121001

Fig. 4. The bearing fault simulation test rig.

which further proves that our model has better stability and general­
Table 3 ization on imbalanced datasets in independent repeated diagnosis
Bearing failure mode details.
experiments.
Bearing Fault Artificial damage Failure Labels
code categories methods severity
c) Ablation experiments.
K001 NS – – 0
KA01 OF electrical discharge 1 1 To further verify the effectiveness of the two key innovations in the
machining
KA03 OF electric engraving 2 2
proposed model, a series of ablation experiments are conducted. First,
KA05 OF electric engraving 1 3 we abandon the pretraining part of the proposed model to obtain the
KA07 OF drilling 1 4 IDDQN model for comparison, which adopts a reward function with an
KA08 OF drilling 2 5 adaptive update strategy in the fault diagnosis stage. Then, we abandon
KI01 IF electrical discharge 1 6
the adaptive reward function to obtain the contrastive representation
machining
KI03 IF electric engraving 1 7 DDQN (CRDDQN) model for comparison, which adopts the contrastive
KI07 IF electric engraving 2 8 loss in the pretraining stage to obtain the contrastive representation of
each sample. Finally, four models with the same architecture and
parameter settings are included in the ablation experiments, including
based on which the proposed model can mine more minority class in­ the DDQN model, IDDQN model, CRDDQN model and the proposed
formation than the comparison models. As the imbalance ratio de­ model. Ablation experiments are conducted on 6 imbalanced datasets
creases, the imbalanced datasets have more minority class samples, with basic distributions to identify the included bearing fault categories.
which weakens the advantages of the proposed model. Compared to the The Acc and Std measures are used to evaluate the performance and
other models implemented on imbalanced datasets D1-D6, the proposed stability levels of different models on the testing set. Furthermore, we
model has a lower standard deviation of its fault diagnosis accuracy,

10000 10000

8000 8000
Sample number

Sample number

6000 6000

4000 4000

2000 2000

0 0
0 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 8
Fault category labels Fault category labels
(a) imbalanced dataset with basic distribution (b) imbalanced dataset with long-tailed distribution

Fig. 5. The imbalanced datasets with different distributions.

Table 4
Introduction to imbalanced training set and testing set.
Label Training sets with basic distributions Training set with long-tailed distribution Testing set

D1 D2 D3 D4 D5 D6

0 10,000 10,000 10,000 10,000 10,000 10,000 10,000 4950


1 50 100 200 500 1000 2000 800 4950
2 50 100 200 500 1000 2000 700 4950
3 50 100 200 500 1000 2000 600 4950
4 50 100 200 500 1000 2000 500 4950
5 50 100 200 500 1000 2000 400 4950
6 50 100 200 500 1000 2000 300 4950
7 50 100 200 500 1000 2000 200 4950
8 50 100 200 500 1000 2000 100 4950

8
Q. Zhao et al. Expert Systems With Applications 234 (2023) 121001

Fig. 6. The statistical results of different fault diagnosis models.

provide a column chart with error bars in Fig. 7 to reflect the different contrastive feature representations of majority class samples and mi­
effects of the various innovations proposed in this paper. nority class samples. Additionally, the results of the CRDDQN model
As shown in Fig. 7, the proposed model achieves the highest fault show that the adaptive reward function significantly improves the
diagnosis accuracy compared to other models on each imbalanced model’s capability to recognize the fault categories in imbalanced
dataset. By comparing our approach with the IDDQN model, we find datasets. Furthermore, the DDQN model attains worse fault diagnosis
that the pretraining stage contributes to the improved fault diagnosis performance than the other models on each imbalanced dataset,
performance. Because of the contrastive loss in the pretrained feature demonstrating that the proposed innovations can effectively improve
extraction model, the fault diagnosis model can effectively mine the the fault recognition results and have a synergistic effect on class-
imbalanced fault diagnosis problems.

d) Imbalanced training set with long-tailed distribution.

In the second comparative experiment, the imbalanced dataset with


the long-tailed distribution is used as the training set of fault diagnosis
models. The experimental results obtained on the long-tailed dataset
further verify the generalization and applicability of the proposed
model. The Acc and F1-score values produced by different fault diag­
nosis models on the testing set are shown in Fig. 8.
As shown by the Acc and F1-score in the fault diagnosis results in
Fig. 8, the proposed model significantly outperforms the other models,
mainly because it has a self-learning mechanism based on DRL and an
adaptive reward function suitable for imbalanced datasets. Compared
with the CRCNN model with the second-highest Acc, the proposed
model is 3.92% higher. In addition, the SCCNN model has the lowest F1-
Fig. 7. The fault diagnosis results obtained in the ablation experiments. score, which has better stability than the proposed method. The

9
Q. Zhao et al. Expert Systems With Applications 234 (2023) 121001

Fig. 8. Acc and F1-score values of different fault diagnosis models.

proposed model, CRCNN model and SCCNN model outperform other models. The reduced-dimensionality features obtained by the t-SNE
models because they both apply CL in the pretraining stage. The CL method are shown in Fig. 9 to reflect the feature representation abilities
method helps these models learn the contrastive feature representation of the feature extraction model and the proposed fault diagnosis model.
of samples, which can alleviate the interclass overlap problem and In Fig. 9, (a) represents the sample features of the original long-tailed
improve the stability of the fault diagnosis models. Benefiting from the dataset, (b) represents the sample features output by the feature
magnet loss suitable for class-imbalanced data, the MLCNN model has extraction model, and (c) represents the sample features output by the
the third highest Acc and the third lowest F1-score. Magnet loss is a proposed fault diagnosis model.
typical distance metric learning method, which can adaptively sculpt the As shown in Fig. 9 (a), significant interclass overlap occurs between
representation space of samples and respect intra-class variation and the majority class with the normal state and minority classes with fault
inter-class similarity. Furthermore, due to the application of contrastive states in the original imbalanced dataset. However, Fig. 9 (b) indicates
loss, magnet loss, and focal loss, the CRCNN model, SCCNN model and that very few fault sample features extracted by the feature extraction
MLCNN model and FLCNN model both have better fault diagnosis per­ model overlap with the normal sample features. This indicates that the
formance than that of the typical CNN model. In addition, the proposed feature extraction model based on CL can learn the contrastive feature
model has better performance than the DDQN model, the DDQN model representations of samples and increase the interclass distances in
combined with RUS and the DDQN model combined with ROS. The main imbalanced datasets. Furthermore, from Fig. 9 (c), we find that the
reason is that the proposed model applies an adaptive reward function, sample features output by the diagnosis model have almost no overlap
which helps the proposed model pay more attention to the minor classes. problems. This demonstrates that the proposed model has excellent fault
As shown in Fig. 8, the performance of the DDQN model combined with category distinguishing capabilities for imbalanced datasets. In addi­
RUS is much worse than that of other models, demonstrating that the tion, the application of the adaptive reward function enables the pro­
RUS method loses some sample information during the resampling posed model to focus on minority classes, which further improves the
process. Meanwhile, the values of Std of all evaluation criteria for the ability of the model to extract distinguishable fault category features.
proposed model are lower than those of the DQN-based comparison
models, indicating that the proposed model has better stability on 4.3 Case study 2: Fault diagnosis on the JNU dataset
imbalanced datasets than classical DDQN scheme. According to the 4.3.1 JNU dataset
above experimental results, the proposed model has obvious advantages
on long-tailed datasets, and it is more suitable for class-imbalanced fault The bearing fault dataset provided by JNU (K. Li et al., 2013) is
diagnosis problems. adopted in fault diagnosis case study 2. The dataset is collected on an
Furthermore, we use the T-distributed stochastic neighbor embed­ inductor motor test bench, as shown in Fig. 10. The nameplate of the
ding (t-SNE) method to reduce the dimensionality of the original machine is a 3.7 kW three-phase induction motor, with Vmax = 220 V, P
imbalanced dataset and the feature outputs produced by the two-stage = 4 pole pairs, and rated speed S = 1,800 rpm. The rotor of the motor is

Fig. 9. Feature visualization results obtained on the long-tailed dataset based on t-SNE.

10
Q. Zhao et al. Expert Systems With Applications 234 (2023) 121001

Fig. 10. Inductor motor test bench. (a) illustrates the test bench, and (b) represents the real motor in the field.

carried by two bearings, one of which is defective. The bearing dataset diagnosis performance on balanced datasets, and the F1-score reflects
includes 12 sets of data obtained under 3 different working conditions, the model’s fault diagnosis performance on imbalanced datasets. Ac­
where the rotation speeds are separately set to 600 r/min, 800 r/min, cording to Table 6, the proposed model achieves the highest Acc and F1-
and 1000 r/min. Four health states are collected under one working score and has significant advantages over other models. Moreover, the
condition, including NS, roller element fault (RF), outer-race fault (OF) DDQN model combined with ROS achieves the second-best performance
and inner-race fault (IF). The bearing vibration signals are measured by on both the balanced testing set and the long-tailed testing set. On the
an accelerometer, whose sampling frequency is 50 kHz. balanced testing set, the proposed model’s Acc and F1-score are 9.93%
In the following case study, the data with 4 health states collected and 10.02% higher than those of the second-best model. On the long-
under the 600 r/min rotation speed are used. To obtain sufficient tailed testing set, the Acc and the F1-score of the proposed model are
bearing fault samples, the original vibration signals are resampled by a 0.92% and 3.58% higher than those of the second-best model. In addi­
sliding window with a length of 2048. Furthermore, to verify the fault tion, on the balanced testing set, the proposed model has the best sta­
diagnosis performance of the proposed model on imbalanced datasets, bility, which is demonstrated by the standard deviations of both
we first organize a long-tailed imbalanced dataset. There are 4000 evaluation criteria. And SCCNN and CRCNN are the second and third
samples with NS in the dataset, representing the head class. In addition, stable models respectively, as shown in Fig. 11. On the long-tailed
200 samples with RF, 150 samples with OF, and 100 samples with IF testing set, the standard deviations of F1-score of SCCNN model is
represent the tail classes. Then, the testing sets with balanced distribu­ higher than other models. Such results illustrate that the contrastive
tion and long-tailed distribution are organized. The details of the learning-based pretraining stage improves the stability of the fault
training set and testing sets are summarized in Table 5. recognition process. This is mainly because CL can mine the contrastive
representation features of the input samples and increase the distances
4.3.2 Fault diagnosis results and analysis between features belonging to different categories.

We conduct fault diagnosis experiments on the JNU bearing dataset 4.4 Discussion
to validate the effectiveness and suitability of the proposed model in
class-imbalanced fault diagnosis problems. The proposed model and Furthermore, we analyze and discuss the performance of the pro­
comparison models are trained on the long-tailed training set and tested posed fault diagnosis framework on imbalanced datasets, and the find­
on the balanced testing set and long-tailed testing set. Furthermore, two ings are summarized as follows.
typical evaluation criteria, the Acc and F1-score, and their standard (1) The effectiveness and stability of the proposed framework are
deviations are adopted to evaluate the fault diagnosis performance of fully verified in experiments conducted on two public datasets.
the different models, as shown in Table 6. In addition, the boxplots of the In case study 1, the proposed model and eight comparison models are
Acc and F1-score results are shown in Fig. 11, which can intuitively trained on six datasets with different imbalance ratios. Their fault
reflect the effectiveness and stability of all tested fault diagnosis models. diagnosis performance is shown in Fig. 6. For example, in the compar­
Then, the confusion matrices obtained by the proposed model on testing ison experiment conducted on D2 with basic distribution, the fault
sets with different distributions are shown in Fig. 12. diagnosis accuracy of the proposed model is 7.27%/3.81%/4.80%/
As shown in Table 6 and Fig. 11, the proposed model has the best 5.53%/6.96%/6.19%/6.60%/5.90% higher than that of the CNN model,
fault diagnosis performance both on the balanced testing set and on the CRCNN model, SCCNN model, MLCNN model, FLCNN model, DDQN
long-tailed testing set. The Acc and F1-score of different fault diagnosis model, DDQN model with RUS, and DDQN model with ROS, respec­
models demonstrate similar change trends, indicating that a model with tively. The standard deviation of the accuracy of the proposed model is
a higher Acc has a higher F1-score. The Acc reflects the model’s fault the smallest among all the comparison methods, which is close to 0 for
almost all imbalanced datasets. In case study 2, the experimental results
on balanced testing set in Table 6 show that the accuracy of the proposed
Table 5 model is 15.61%/14.48%/9.93%/12.52%/12.88%/17.96%/ 10.89%/
Details of the training set and testing sets. 10.10% higher than that of the other models on the long-tailed dataset.
Health Label Training Testing set with Testing set with Long- The F1-score is 16.66%/14.93%/ 10.02%/12.46%/13.08%/18.77%/
state set balanced distribution tailed distribution 11.23%/10.83% higher than that of the other models. In addition, the
NS 0 4000 900 900 standard deviations of the two model evaluation criteria are very small.
RF 1 200 900 200 The fault diagnosis results obtained on two public datasets demonstrate
OF 2 150 900 150 that the proposed framework is more effective and more stable than the
IF 3 100 900 100
compared models.

11
Q. Zhao et al. Expert Systems With Applications 234 (2023) 121001

Table 6
The fault diagnosis results obtained on testing sets with different distributions.
Model Balanced testing set Long-tailed testing set

Acc Std F1 Std Acc Std F1 Std

CNN 0.6817 0.0193 0.6685 0.0232 0.8736 0.0033 0.7521 0.0044


CRCNN 0.6929 0.0041 0.6857 0.0040 0.8787 0.0036 0.7731 0.0059
SCCNN 0.7384 0.0018 0.7367 0.0018 0.8037 0.0026 0.7029 0.0012
MLCNN 0.7126 0.0061 0.7105 0.0063 0.8033 0.0015 0.6885 0.0025
FLCNN 0.7090 0.0110 0.7043 0.0123 0.8498 0.0056 0.7267 0.0108
DDQN 0.6582 0.0079 0.6474 0.0089 0.8741 0.0082 0.7614 0.0127
RUS + DDQN 0.7288 0.0147 0.7228 0.0157 0.8224 0.0089 0.7221 0.0071
ROS + DDQN 0.7367 0.0179 0.7268 0.0171 0.8958 0.0035 0.7904 0.0064
Proposal 0.8377 0.0015 0.8351 0.0015 0.9050 0.0014 0.8262 0.0032

Fig. 11. The fault diagnosis results obtained on balanced and long-tailed testing sets.

(2) The proposed model has better generalization and suitability for than that of the proposed model, as shown in Fig. 8. In case study 2, the
different training sets and different testing sets. proposed model achieves significant improvements in the experiments
In case study 1, in the comparison experiment conducted on the conducted on both a balanced dataset and a long-tailed dataset. From
imbalanced datasets with basic distributions, the proposed model ach­ the experimental results on the balanced testing set shown in Table 6,
ieves the best performance from the perspective of the accuracy and the F1-score of the proposed model is 16.66%/14.93%/10.02%/
standard deviation. As shown in Fig. 6, our model achieves 1.94%/ 12.46%/13.08%/18.77%/11.23%/10.83% higher than that of the other
3.81%/3.90%, 3.26%/1.67%/1.09% accuracy improvements over the models. In the experiments conducted on the long-tailed testing set, the
second-best model on datasets D1-D6 with different imbalance ratios. In F1-score of the proposed model is 7.41%/5.31%/12.33%/13.77%/
addition, in the experiment conducted on the imbalanced datasets with 9.95%/6.47%/10.41%/ 3.58% higher than that of the other models.
long-tailed distributions, the accuracy of the comparison models is These experimental results show that the proposed model has better
7.47%/3.93%/5.28%/4.56%/ 5.75%/7.60%/17.36%/6.64% lower performance on imbalanced datasets with different distributions or

12
Q. Zhao et al. Expert Systems With Applications 234 (2023) 121001

Fig. 12. The confusion matrices produced by the proposed model on testing sets with different distributions.

different imbalance ratios, which further indicates that our model has knowledge under specific scenarios. To address the above issues, this
better generalization and suitability than other models. research proposes two new strategies, namely the pretraining strategy
(3) Pretraining the feature extraction model using the contrastive and the adaptive reward mechanism. The pretraining feature extraction
loss helps improve the feature representation ability of the model and model based on CL learns the contrastive feature representation of fault
avoid feature overlap problems. samples, avoiding interclass overlap problems and improving fault
According to the ablation experiment in case study 1, the CRDDQN diagnosis performance. The DRL-based model with an adaptive reward
model has better performance than the classic DDQN model, and the function achieves better fault diagnosis performance on imbalanced
proposed model achieves higher fault recognition accuracy and smaller datasets by paying more attention to minority classes. Furthermore, as a
standard deviations than the IDDQN model. The experimental results data-driven method without prior expert knowledge, the CADRL
verify that the introduction of CL improves the feature representation framework can easily be transferred into other class-imbalanced sce­
ability and fault classification accuracy of the model, as shown in Fig. 7. narios. In addition, both strategies can be combined with other methods
According to the features obtained by t-SNE in Fig. 9, there are serious separately to improve fault feature mining ability and fault diagnosis
interclass overlap problems in the original imbalanced dataset. The performance under class-imbalanced conditions.
feature extraction model pretrained on the contrastive loss can increase
the distances between samples belonging to different categories and 5 Conclusion
effectively reduce the distances between samples belonging to the same
category. In addition, the CRCNN model built on the contrastive loss- In this paper, we propose a CADRL framework combining DRL with
based pretrained model has better fault diagnosis performance than CL to address class-imbalanced fault diagnosis problems. The proposed
the classic CNN model trained on the cross-entropy loss. This further framework contains a pretraining feature extraction stage and a fine-
demonstrates that the model optimized on contrastive loss has better tuning stage in the fault diagnosis model. In the pretraining stage, CL
fault feature mining and fault diagnosis capabilities. is adapted to construct sample pairs and optimize the feature extraction
(4) The proposed adaptive reward function can significantly improve model. During the fine-tuning stage, an adaptive reward function is
the fault classification performance of the model on imbalanced datasets incorporated into the fault diagnosis model to achieve class-imbalanced
without prior knowledge. fault diagnosis. The results of two case studies demonstrate the advan­
In the ablation experiment of case study 1, the IDDQN model has tages of the proposed framework. The CADRL framework has more
better performance than the DDQN model, and the proposed model has effective and more stable performance on imbalanced fault diagnosis
better performance than the CRDDQN model from the perspective of the datasets, which is illustrated by the statistical results. Additionally, the
accuracy and F1-score metrics. The experimental results are shown in proposed model can effectively achieve improved fault diagnosis accu­
Fig. 7, indicating that the adaptive reward function can help the pro­ racy in the experiments conducted on imbalanced datasets with different
posed model automatically learn the optimal fault diagnosis strategy. distributions or imbalance ratios, which indicates that the model has
The reward values of different categories dynamically and adaptively better generalization ability than comparison models. Furthermore, the
update with the sample distribution changes observed in the experience CL strategy and the adaptive reward function can both help to improve
replay buffer without relying on prior knowledge. Furthermore, the the fault diagnosis performance attained on imbalanced datasets. In
adaptive reward function can help rebalance the attention given by the conclusion, the proposed CADRL framework has significant advantages
model to the majority class and minority class and alleviate overfitting in handling imbalanced fault diagnosis problems. In addition, as a data-
problems. Additionally, the proposed reward function can significantly driven model, the proposed framework can be easily transferred to
improve the fault diagnosis performance achieved on imbalanced similar class-imbalanced scenarios.
datasets.
In summary, the proposed framework provides new inspiration for
class-imbalanced fault diagnosis problems. As described in the afore­ Declaration of Competing Interest
mentioned sections, datasets with large imbalance ratios may lead to
serious interclass overlap problems. In addition, most reweighting-based The authors declare that they have no known competing financial
methods and model modification-based methods rely on prior interests or personal relationships that could have appeared to influence
the work reported in this paper.

13
Q. Zhao et al. Expert Systems With Applications 234 (2023) 121001

Data availability Le-Khac, P. H., Healy, G., & Smeaton, A. F. (2020). Contrastive representation learning: A
framework and review. IEEE Access, 8, 193907–193934.
Lessmeier, C., Kimotho, J. K., Zimmer, D., & Sextro, W. (2016). Condition monitoring of
Data will be made available on request. bearing damage in electromechanical drive systems by using motor current signals of
electric motors: A benchmark data set for data-driven classification. PHM Society
Acknowledgment European Conference, 3(1).
Liang, Z., Wang, H., Yang, K., & Shi, Y. (2022). Adaptive Fusion Based Method for
Imbalanced Data Classification. Frontiers in Neurorobotics, 16, Article 827913.
This work was supported by the National Key R&D Program of China Li, K., Ping, X., Wang, H., Chen, P., & Cao, Y. (2013). Sequential fuzzy diagnosis method
under Grant 2021ZD0201300, the National Natural Science Foundation for motor roller bearing in variable operating conditions based on vibration analysis.
Sensors, 13(6), 8013–8041.
of China [Grant Nos. 62103030 and 61973011], the National Defense Lin, E., Chen, Q., & Qi, X. (2020). Deep reinforcement learning for imbalanced
Basic Scientific Research Program (Grant No. 2022601C009), and the classification. Applied Intelligence, 50(8), 2488–2502.
Fundamental Research Funds for the Central Universities (Grant No. Liu, H., Liu, Z., Jia, W., Zhang, D., & Tan, J. (2021). A novel imbalanced data
classification method based on weakly supervised learning for fault diagnosis. IEEE
YWF-22-L-516). Transactions on Industrial Informatics, 18(3), 1583–1593.
Liu, R., Yang, B., Zio, E., & Chen, X. (2018). Artificial intelligence for fault diagnosis of
References rotating machinery: A review. Mechanical Systems and Signal Processing, 108, 33–47.
Liu, S., Chen, J., He, S., Xu, E., Lv, H., & Zhou, Z. (2021). Intelligent fault diagnosis under
small sample size conditions via Bidirectional InfoMax GAN with unsupervised
Andriotis, C. P., & Papakonstantinou, K. G. (2019). Managing engineering systems with
representation learning. Knowledge-Based Systems, 232, Article 107488.
large state and action spaces through deep reinforcement learning. Reliability
Li, W., Zhong, X., Shao, H., Cai, B., & Yang, X. (2022). Multi-mode data augmentation
Engineering & System Safety, 191, Article 106483.
and fault diagnosis of rotating machinery using modified ACGAN designed with new
Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A simple framework for
framework. Advanced Engineering Informatics, 52, Article 101552.
contrastive learning of visual representations. International Conference on Machine
Mao, W., He, L., Yan, Y., & Wang, J. (2017). Online sequential prediction of bearings
Learning, 1597–1607.
imbalanced fault diagnosis by extreme learning machine. Mechanical Systems and
Chopra, S., Hadsell, R., & LeCun, Y. (2005). Learning a similarity metric discriminatively,
Signal Processing, 83, 450–473.
with application to face verification, 1. In 2005 IEEE computer society conference on
Ma, X., Chen, S., Hsu, D., & Lee, W. S. (2021). Contrastive variational reinforcement
computer vision and pattern recognition (CVPR’05) (pp. 539–546).
learning for complex observations. Conference on Robot Learning, 959–972.
Cui, Y., Jia, M., Lin, T.-Y., Song, Y., & Belongie, S. (2019). Class-balanced loss based on
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., …
effective number of samples. In Proceedings of the IEEE/CVF conference on computer
Ostrovski, G. (2015). Human-level control through deep reinforcement learning.
vision and pattern recognition (pp. 9268–9277).
Nature, 518(7540), 529–533.
Dangut, M. D., Jennions, I. K., King, S., & Skaf, Z. (2022). Application of deep
Osband, I., Blundell, C., Pritzel, A., & van Roy, B. (2016). Deep exploration via
reinforcement learning for extremely rare failure prediction in aircraft maintenance.
bootstrapped DQN. In Advances in Neural Information Processing Systems (p. 29).
Mechanical Systems and Signal Processing, 171, Article 108873.
Peng, P., Lu, J., Tao, S., Ma, K., Zhang, Y., Wang, H., & Zhang, H. (2022). Progressively
Ding, Y., Ma, L., Ma, J., Suo, M., Tao, L., Cheng, Y., & Lu, C. (2019). Intelligent fault
balanced supervised contrastive representation learning for long-tailed fault
diagnosis for rotating machinery using deep Q-network based health state
diagnosis. IEEE Transactions on Instrumentation and Measurement, 71, 1–12.
classification: A deep reinforcement learning approach. Advanced Engineering
Prakash, J., Kankar, P. K., & Miglani, A. (2021). Internal Leakage Detection in a
Informatics, 42, Article 100977.
Hydraulic Pump using Exhaustive Feature Selection and Ensemble Learning.
Du, G., Zhang, J., Luo, Z., Ma, F., Ma, L., & Li, S. (2020). Joint imbalanced classification
International Conference on Maintenance and Intelligent Asset Management (ICMIAM),
and feature selection for hospital readmissions. Knowledge-Based Systems, 200,
2021, 1–6.
Article 106020.
Qiao, J., Wang, G., Li, W., & Chen, M. (2018). An adaptive deep Q-learning strategy for
Fan, S., Zhang, X., & Song, Z. (2021). Imbalanced sample selection with deep
handwritten digit recognition. Neural Networks, 107, 61–71.
reinforcement learning for fault diagnosis. IEEE Transactions on Industrial Informatics,
Ragab, M., Chen, Z., Zhang, W., Eldele, E., Wu, M., Kwoh, C.-K., & Li, X. (2022).
18(4), 2518–2527.
Conditional Contrastive Domain Generalization for Fault Diagnosis. IEEE
Gao, Y., Gao, L., Li, X., & Cao, S. (2022). A hierarchical training-convolutional neural
Transactions on Instrumentation and Measurement, 71, 1–12.
network for imbalanced fault diagnosis in complex equipment. IEEE Transactions on
Ren, Z., Zhu, Y., Kang, W., Fu, H., Niu, Q., Gao, D., … Hong, J. (2022). Adaptive cost-
Industrial Informatics, 18(11), 8138–8145.
sensitive learning: Improving the convergence of intelligent diagnosis models under
Geng, Y., Wang, Z., Jia, L., Qin, Y., & Chen, X. (2020). Bogie fault diagnosis under
imbalanced data. Knowledge-Based Systems, 241, Article 108296.
variable operating conditions based on fast kurtogram and deep residual learning
Rethmeier, N., & Augenstein, I. (2023). A Primer on Contrastive Pretraining in Language
towards imbalanced data. Measurement, 166, Article 108191.
Processing: Methods, Lessons Learned, and Perspectives. ACM Computing Surveys, 55
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., …
(10), 1–17.
Bengio, Y. (2020). Generative adversarial networks. Communications of the ACM, 63
Shao, S., Wang, P., & Yan, R. (2019). Generative adversarial networks for data
(11), 139–144.
augmentation in machine fault diagnosis. Computers in Industry, 106, 85–93.
He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum contrast for
Sivaranjani, A., & Vinod, B. (2023). Artificial Potential Field Incorporated Deep-Q-
unsupervised visual representation learning. In Proceedings of the IEEE/CVF
Network Algorithm for Mobile Robot Path Prediction. Intelligent Automation And Soft
conference on computer vision and pattern recognition (pp. 9729–9738).
Computing, 35(1), 1135–1150.
He, Z., Shao, H., Cheng, J., Zhao, X., & Yang, Y. (2020). Support tensor machine with
Swana, E. F., Doorsamy, W., & Bokoro, P. (2022). Tomek Link and SMOTE Approaches
dynamic penalty factors and its application to the fault diagnosis of rotating
for Machine Fault Classification with an Imbalanced Dataset. Sensors, 22(9), 3246.
machinery with unbalanced data. Mechanical Systems and Signal Processing, 141,
van Hasselt, H., Guez, A., & Silver, D. (2016). Deep reinforcement learning with double
Article 106441.
q-learning. Proceedings of the AAAI Conference on Artificial Intelligence.
He, Z., Shao, H., Wang, P., Lin, J. J., Cheng, J., & Yang, Y. (2020). Deep transfer multi-
Wang, H., Xu, J., Sun, C., Yan, R., & Chen, X. (2021). Intelligent fault diagnosis for
wavelet auto-encoder for intelligent fault diagnosis of gearbox with few target
planetary gearbox using time-frequency representation and deep reinforcement
training samples. Knowledge-Based Systems, 191, Article 105313.
learning. IEEE/ASME Transactions on Mechatronics, 27(2), 985–998.
Hou, R., Chen, J., Feng, Y., Liu, S., He, S., & Zhou, Z. (2022). Contrastive-weighted self-
Wang, Z., Wang, J., & Wang, Y. (2018). An intelligent diagnosis scheme based on
supervised model for long-tailed data classification with vision transformer
generative adversarial learning deep neural networks and its application to
augmented. Mechanical Systems and Signal Processing, 177, Article 109174.
planetary gearbox fault pattern recognition. Neurocomputing, 310, 213–222.
Jia, F., Li, S., Zuo, H., & Shen, J. (2020). Deep neural network ensemble for the
Yi, H., Jiang, Q., Yan, X., & Wang, B. (2020). Imbalanced classification based on minority
intelligent fault diagnosis of machines under imbalanced data. IEEE Access, 8,
clustering synthetic minority oversampling technique with wind turbine fault
120974–120982.
detection application. IEEE Transactions on Industrial Informatics, 17(9), 5867–5875.
Khosla, P., Teterwak, P., Wang, C., Sarna, A., Tian, Y., Isola, P., … Krishnan, D. (2020).
Zhang, J., Zou, J., Su, Z., Tang, J., Kang, Y., Xu, H., … Fan, S. (2022). A class-aware
Supervised contrastive learning. Advances in Neural Information Processing Systems,
supervised contrastive learning framework for imbalanced fault diagnosis.
33, 18661–18673.
Knowledge-Based Systems, 252, Article 109437.
Kuang, J., Xu, G., Tao, T., & Wu, Q. (2021). Class-imbalance adversarial transfer learning
Zhang, T., Chen, J., Li, F., Zhang, K., Lv, H., He, S., & Xu, E. (2022). Intelligent fault
network for cross-domain fault diagnosis with imbalanced data. IEEE Transactions on
diagnosis of machines with small & imbalanced data: A state-of-the-art review and
Instrumentation and Measurement, 71, 1–11.
possible extensions. ISA Transactions, 119, 152–171.
Laskin, M., Srinivas, A., & Abbeel, P. (2020). Curl: Contrastive unsupervised
Zhu, J., Xia, Y., Wu, L., Deng, J., Zhou, W., Qin, T., … Li, H. (2022). Masked contrastive
representations for reinforcement learning. International Conference on Machine
representation learning for reinforcement learning. IEEE Transactions on Pattern
Learning, 5639–5650.
Analysis and Machine Intelligence.

14

You might also like