ARTICLE INFO

Keywords:
Prognostics and health management
Remaining useful life estimation
Aero-engines
Deep learning
Data augmentation
Diffusion model

ABSTRACT

Predicting the remaining useful life (RUL) of aero-engines is essential for their prognostics and health management (PHM). Deep learning technologies are effective in this area, but their success depends critically on acquiring sufficient engine monitoring data from operation to failure, a process that is expensive and challenging in practice. Insufficient data limit the training of deep learning methods, thereby affecting their predictive performance. To address this issue, this study proposes a novel method named DiffRUL for augmenting multivariate engine monitoring data and generating high-quality samples mimicking degradation trends in real data. Initially, a specialized degradation trend encoder is designed to extract degradation trend representations from monitoring data, which serve as generative conditions. Subsequently, the diffusion model is adapted to the scenario of generating multivariate monitoring data, reconstructing and synthesizing data from Gaussian noise through a reverse process. Additionally, a denoising network is developed to incorporate generative conditions and capture spatio-temporal correlations in the data, accurately estimating noise levels during the reverse process. Experimental results on the C-MAPSS and N-CMAPSS datasets show that DiffRUL successfully generates high-fidelity multivariate monitoring data. Furthermore, these generated data effectively support the RUL prediction task and significantly enhance the predictive ability of the underlying deep learning models.
explored. Jayasinghe et al. [14] used 1D temporal convolutions and stacked LSTM for precise RUL estimation. Liu et al. [15] created a feature-attention method for turbofan engines' RUL prediction, using a two-layer BGRU for long-term dependencies and a multilayer CNN for local sequence analysis. Recently, the Transformer has become a promising alternative for RUL prediction, with works by Chen et al. [12] and Zhang et al. [13] developing Transformer-based networks for effective long-term trend analysis in RUL estimation.

Despite significant advances in deep learning for RUL prediction, the effectiveness of these methods largely depends on the availability of extensive run-to-failure system monitoring datasets [16]. In the big data era, although collecting large datasets is increasingly feasible, acquiring comprehensive full-lifecycle data for equipment continues to pose a major challenge [17]. This lack of complete lifecycle data remains a primary obstacle to deploying deep learning techniques in RUL prediction. Data generation technologies, such as the widely used Generative Adversarial Network (GAN) introduced by Goodfellow [18], provide a creative approach to overcoming the challenges of obtaining large real-world datasets by learning to mimic actual data distributions and create realistic synthetic datasets. In PHM research, GANs are utilized to synthesize data, thereby enhancing the accuracy of health prognostics and facilitating more precise RUL predictions. Behera et al. [19] developed a prognostics framework using a Conditional GAN (CGAN) and a deep gated recurrent unit network to generate multivariate data samples and predict RUL for complex systems. Gu et al. [20] employed a Wasserstein GAN with Gradient Penalty (WGAN-GP) to create synthetic data for lithium-ion batteries, thus boosting battery health prognostic performance. Shangguan et al. [21] employed the Time-series GAN (TimeGAN) to generate synthetic degradation samples of train wheels, improving the prediction of wheel degradation; this enhancement enables more accurate RUL calculations and supports further reliability analysis of railways. Cheng et al. [22] combined dynamic mechanisms with GAN to produce diverse full-cycle degradation data, while Yang et al. [16] enhanced rolling bearing sample quality using a conditional GAN-based framework. Zhang et al. [23] developed a Convolutional Recurrent GAN (CR-GAN) to produce high-quality time-series data, improving RUL prediction techniques.

However, employing GANs to synthesize system monitoring data presents challenges, particularly with the stability of training and the convergence of the generator and discriminator components. A common problem, mode collapse, may restrict output variety and, in severe cases, lead to the repetitive generation of identical outputs, reducing data diversity [24]. Moreover, system monitoring data, often multivariate, include a broad spectrum of operational metrics from aero-engines, such as temperature, pressure, and vibration [25]. Generating such multivariate data requires careful handling of complex spatio-temporal correlations to preserve quality. Additionally, current GAN-based methods struggle to fully capture scenarios with aero-engines operating under various conditions and experiencing multiple failure modes [23]. These scenarios involve diverse altitudes, temperatures, speeds, and failure types, each contributing to data with multiple distributional attributes. Accurately capturing these intricate patterns is crucial to ensuring the fidelity of the synthetic data produced.

Building on the introduction above, this study proposes an innovative method to augment multivariate engine monitoring data, termed DiffRUL. This approach is grounded in the principles of diffusion probabilistic models, a rapidly evolving class of generative models inspired by the natural process of physical diffusion. Characteristically, diffusion models commence by integrating noise into original datasets, thereafter embarking on a process to inversely model this integration.

In the area of health indicator (HI) prediction, Chen [36] utilized diffusion models to assess the current HI of gears, employing an attention-guided multi-hierarchy LSTM for the prognosis of future HIs, thus enabling the determination of the gear's RUL upon exceeding a predetermined threshold. Lu et al. [37] developed a method for the prediction of rolling bearing RUL, integrating a coupled diffusion process with a temporal attention mechanism, and introducing a specialized network to substitute the inverse process within the coupled diffusion model for RUL prediction. Beyond these applications, the potential of exploiting diffusion models for the generation of system monitoring data to enhance RUL predictions has remained largely untapped. Accordingly, this study leverages diffusion models for the generation of multivariate engine monitoring data, adeptly capturing complex patterns and distributions, and producing samples that accurately mimic real degradation trends.

The main contributions of this study can be summarized as follows:

1) In response to scenarios involving equipment operating under diverse conditions or exhibiting multiple failure modes, and where monitoring data display varied distribution characteristics, we introduce a degradation trend encoder, DTE. This encoder effectively captures accurate representations of degradation trends in system monitoring data. These distinctive representations then act as generative conditions, enabling the generative model to more effectively learn complex patterns and distributions within the data.
2) We adapt the diffusion model to generate multivariate monitoring data, introducing an innovative method named DiffRUL. This method uses a forward diffusion process to incrementally introduce noise into the data, followed by a reverse generation process that trains a denoising network across various noise levels to progressively restore the original data samples. This technique effectively captures the probabilistic distribution characteristics of multivariate monitoring data, ensuring that the distribution of the generated samples closely matches that of authentic data and accurately reflects degradation trends seen in real systems, thereby providing an efficient way to generate multivariate system monitoring data.
3) We develop an advanced denoising network called DiffNet, specifically designed to incorporate generative conditions and to precisely capture the complex spatio-temporal correlations in multivariate system monitoring data. By accurately predicting noise levels at each diffusion timestep, DiffNet supports the DiffRUL framework, enhancing the generation of high-quality data.
4) The effectiveness of our proposed method is validated experimentally, resulting in the generation of higher-quality monitoring data and the enhancement of deep learning models' performance in predicting RUL.

This paper is organized as follows: Section 2 outlines data preparation and segmentation; Section 3 describes the proposed method; Section 4 discusses the experimental results; Section 5 concludes the paper.

2. Problem formulation

2.1. Data preparation

The system monitoring data obtained from sensors are typically multivariate. Therefore, let X = {X^(i) | i = 1, 2, …, n_engine} represent the run-to-failure monitoring data of n_engine engines, where the monitoring data for engine i is X^(i), denoted as Eq. (1):

X^(i) = {x_1^(i), x_2^(i), …, x_t^(i), …, x_{T_i}^(i)}    (1)
where T_i is the total operating time from the start of operation to failure for engine i. Let Y = {Y^(i) | i = 1, 2, …, n_engine} represent the corresponding RUL values, where Y^(i) = {y_1^(i), y_2^(i), …, y_t^(i), …, y_{T_i}^(i)}. y_t^(i) is the RUL value of engine i at time step t, representing the time interval from the current time step to the engine's failure, as shown in Eq. (2):

y_t^(i) = T_i − t    (2)

In practical applications, system degradation is typically negligible during initial operation, allowing us to disregard the linear RUL target in that early phase.

2.2. Data segmentation

A sliding time window is used to segment the normalized multivariate monitoring data for each engine, employing a window size of L and a step size of 1 [7]. This segmentation method expands the data volume of engine i from 1 to T_i − L + 1, denoted as X^(i) = {X^(i)(s) | s = 1, 2, …, T_i − L + 1}. After similar treatment of all engines' data, the total data volume expands from n_engine to N = Σ_{i=1}^{n_engine} (T_i − L + 1). For simplicity, the segmented data are still represented as X, with X = {X^(i)(s) | i = 1, 2, …, n_engine, s = 1, 2, …, T_i − L + 1}. For each segmented data sample X^(i)(s), we predict its RUL at the last time step, with the corresponding RUL value y^(i)(s) serving as its label.
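As a concrete illustration of Eqs. (1)-(2) and the sliding-window segmentation above, the following sketch builds the T_i − L + 1 segments of one engine together with their RUL labels. It is a minimal NumPy version under stated assumptions: the array shapes, function names, and the optional cap rul_max (standing in for the early-phase treatment of the linear RUL target mentioned above) are ours, not the paper's implementation.

```python
import numpy as np

def segment_engine(X_i, L, rul_max=None):
    """Split one engine's run-to-failure record X_i (shape: T_i x n_sensors)
    into windows of length L with step size 1, yielding T_i - L + 1 segments."""
    T_i = X_i.shape[0]
    segments = np.stack([X_i[s:s + L] for s in range(T_i - L + 1)])
    # Eq. (2): the RUL at time step t is T_i - t; each segment is labelled
    # with the RUL of its last time step (zero-based indices here).
    last_steps = np.arange(L - 1, T_i)       # index of each window's end
    labels = (T_i - 1) - last_steps          # 170, 169, ..., 1, 0
    if rul_max is not None:                  # optional cap on early-life RUL
        labels = np.minimum(labels, rul_max)
    return segments, labels

# Example: one engine with T_i = 200 cycles and 14 sensors, window size L = 30.
X_i = np.random.randn(200, 14)
segs, ruls = segment_engine(X_i, L=30)
print(segs.shape, ruls[:3], ruls[-1])        # (171, 30, 14) [170 169 168] 0
```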
The log-likelihood log p(X) can be decomposed as Eq. (8):

log p(X) = ∫_Z q(Z|X) log[ (p(X|Z) p(Z) / q(Z|X)) · (q(Z|X) / p(Z|X)) ] dZ
         = ∫_Z q(Z|X) log( p(X|Z) p(Z) / q(Z|X) ) dZ + D_KL( q(Z|X) ‖ p(Z|X) )
         ≥ ∫_Z q(Z|X) log( p(X|Z) p(Z) / q(Z|X) ) dZ    (8)

where D_KL represents the Kullback–Leibler (KL) divergence, a metric used to measure the difference between two probability distributions. We define L_b as:

L_b = ∫_Z q(Z|X) log( p(X|Z) p(Z) / q(Z|X) ) dZ    (9)

which is obviously the evidence lower bound (ELBO) of log p(X), so that the following equation is given:

log p(X) = L_b + D_KL( q(Z|X) ‖ p(Z|X) )    (10)

At this point, the optimization objective becomes to find p(X|Z) and q(Z|X) that maximize L_b. We rewrite L_b as follows:

L_b = ∫_Z q(Z|X) log( p(Z) / q(Z|X) ) dZ + ∫_Z q(Z|X) log p(X|Z) dZ
    = −D_KL( q(Z|X) ‖ p(Z) ) + E_{q(Z|X)}[ log p(X|Z) ]    (11)

q(Z|X) is modeled by the Encoder in DTE, and p(X|Z) by the Decoder. The Encoder receives the input data X and maps it to a latent space. It generates two outputs for each input: the mean and the variance, which represent the parameters of the probability distribution in the latent space. Since the Bidirectional Long Short-Term Memory network (Bi-LSTM) captures context information more effectively when processing time series data [40], we utilize it to construct the Encoder. The outputs of the Bi-LSTM are then passed to two separate fully connected layers, tasked with estimating the mean μ_Z and variance σ²_Z of the latent distribution q(Z|X). Direct sampling of the latent variable Z from the distribution N(μ_Z, σ²_Z) poses a challenge due to its randomness and non-differentiability, which complicates the backpropagation algorithm; the reparameterization trick is therefore used, sampling Z = μ_Z + σ_Z ⊙ ε with ε ∼ N(0, I) (Eq. (13)). Based on Eq. (11), the regularization term and the reconstruction term yield the two loss functions L_1 = D_KL( q(Z|X) ‖ p(Z) ) and L_2 = −E_{q(Z|X)}[ log p(X|Z) ].

L_1 is the regularization loss, which ensures that the latent variable distribution learned by the Encoder closely aligns with the prior distribution, thereby preventing overfitting. L_2 is the reconstruction loss, which ensures that the Decoder's output closely matches the original input data, preserving the data's integrity. The loss functions L_1 and L_2 can effectively enable the DTE to learn useful representations in the latent space, which can represent certain key features of the input data.

To guarantee that the latent representation encompasses the degradation trend of the system monitoring data, we incorporate a Regressor to predict the RUL of the system. Specifically, we use the latent variable Z to predict the RUL corresponding to the input data X and define an additional loss function L_3, as indicated in Eq. (15):

L_3 = √( (1/N) Σ_{i=1}^{n_engine} Σ_{s=1}^{T_i−L+1} ( ŷ^(i)(s) − y^(i)(s) )² )    (15)

The Regressor's structure is straightforward, consisting of a fully connected layer with a Tanh activation function and a single-neuron layer dedicated to RUL prediction. L_3 imposes an extra constraint on the DTE, compelling the model not only to learn the representation of the input data X in the latent space but also to accurately differentiate data with varying RUL values. This ensures that the latent variable Z effectively represents the degradation trend in the system monitoring data.

To further enhance the distinctiveness of representations for varying degradation trends in the latent space, we integrate contrastive learning [41] into our approach. The main idea is to spatially align data with similar degradation trends closer together in the latent space, while distancing data with differing trends. This approach bolsters the model's proficiency in identifying unique data structures and distinguishing between various degradation trends. A pivotal aspect of contrastive learning involves creating pairs of positive and negative samples. In our approach, we initially select a specific sample X^(i)(s) from the segmented data of engine i as the anchor sample, with its corresponding RUL marked as y^(i)(s). We then choose a positive sample X^(i)(s)_pos and a negative sample X^(i)(s)_neg from other segmented data of the same engine, basing our selection on their RUL values, y^(i)(s)_pos and y^(i)(s)_neg, which must satisfy certain criteria. Specifically, y^(i)(s)_pos should align with |y^(i)(s) − y^(i)(s)_pos| ≤ δ_1, while y^(i)(s)_neg must differ from y^(i)(s) by more than a second, larger threshold. The resulting loss L_4 takes the standard triplet (margin) form:

L_4 = max( d(Z^(i)(s), Z^(i)(s)_pos) − d(Z^(i)(s), Z^(i)(s)_neg) + α, 0 )

where d(Z^(i)(s), Z^(i)(s)_pos) = ‖Z^(i)(s) − Z^(i)(s)_pos‖_2 represents the distance between the representation Z^(i)(s) of the anchor sample in the latent space and the representation Z^(i)(s)_pos of the positive sample, d(Z^(i)(s), Z^(i)(s)_neg) = ‖Z^(i)(s) − Z^(i)(s)_neg‖_2 represents the distance between the representation Z^(i)(s) of the anchor sample and the representation Z^(i)(s)_neg of the negative sample, and α is a predefined margin value.
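A minimal NumPy sketch of this margin objective is given below for illustration; the batched interface and names are our assumptions, and the value α = 0.4 anticipates the setting reported in Section 4.

```python
import numpy as np

def triplet_loss(z_anchor, z_pos, z_neg, alpha=0.4):
    """Mean hinge loss over a batch of latent triplets (shape: batch x latent_dim).
    Pushes d(anchor, pos) below d(anchor, neg) by at least the margin alpha."""
    d_pos = np.linalg.norm(z_anchor - z_pos, axis=-1)  # d(Z, Z_pos): L2 distance
    d_neg = np.linalg.norm(z_anchor - z_neg, axis=-1)  # d(Z, Z_neg)
    return float(np.maximum(d_pos - d_neg + alpha, 0.0).mean())
```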
The optimization objective of L_4 is to ensure that the distance between the representations of an anchor sample and a positive sample in the latent space is smaller than the distance between the anchor sample and a negative sample by at least a margin α. Achieving this goal results in zero loss; otherwise, a penalty is imposed based on the discrepancy. The advantage of adding the margin α is to require the model to push dissimilar samples apart by at least a preset distance margin [44], thus enhancing the distinctiveness of sample representations. Furthermore, this margin acts as a form of regularization, preventing the model from perfectly distinguishing training data with the minimal distance between positive and negative pairs, thereby improving its generalization ability. Through this approach, a latent space can be learned where samples with similar degradation trends cluster together, while samples with different degradation trends are further apart. This enhances the distinctiveness of samples with different degradation trends in the latent space, more effectively supporting the subsequent conditional generation.

The forward diffusion process introduces controllable amounts of Gaussian noise into X_0. Specifically, at each step of the Markov chain, Gaussian noise with variance β_t is added to X_{t−1}, generating a new variable X_t that follows the distribution q(X_t|X_{t−1}), as depicted in Eq. (18):

q(X_t | X_{t−1}) = N( X_t ; μ_t = √(1 − β_t) X_{t−1}, Σ_t = β_t I )    (18)

Notably, q(X_t|X_{t−1}) is still a Gaussian distribution, defined by mean μ_t and variance Σ_t, where μ_t = √(1 − β_t) X_{t−1} and Σ_t = β_t I. Here, {β_t}_{t=1}^T, the variance schedule, is an adjustable hyperparameter used to control the intensity of the noise. In this study, we adopt a linear schedule for β_t, where its value linearly increases from β_1 to β_T. After T steps of diffusion, the original distribution q(X_0) eventually transforms into a simple Gaussian distribution q(X_T), where T is also an adjustable hyperparameter. The forward process is expressed in Eq. (19):

q(X_{1:T} | X_0) = ∏_{t=1}^{T} q(X_t | X_{t−1})    (19)
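The forward process is easy to simulate. The sketch below uses the linear schedule with β_1 = 0.0004, β_T = 0.05, and T = 50 (the settings later reported in Section 4) and draws X_t from q(X_t | X_0) in one shot via the cumulative product ᾱ_t = ∏_s α_s — the closed-form identity the text later references as Eq. (20). Shapes and names are illustrative assumptions.

```python
import numpy as np

T = 50
betas = np.linspace(4e-4, 0.05, T)        # linear schedule beta_1 ... beta_T
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)            # \bar{alpha}_t = prod_{s<=t} alpha_s

def q_sample(x0, t, rng=np.random.default_rng()):
    """Draw X_t ~ q(X_t | X_0): sqrt(alpha_bar)*X_0 + sqrt(1 - alpha_bar)*eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

x0 = np.random.randn(30, 14)              # one segment: L x n_sensors
x_T, eps = q_sample(x0, t=T - 1)          # nearly pure Gaussian noise after T steps
```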
Starting from X_T, the original data distribution can then be recovered through the reverse generation process, thereby generating new data points. However, directly calculating q(X_{t−1}|X_t) is challenging as it involves statistical parameter estimation of the real distribution q(X_0). To address this, we utilize a parameterized neural network, denoted as p_ϕ, to approximate q(X_{t−1}|X_t). Evidently, the distribution p_ϕ(X_{t−1}|X_t) of the reverse process and the distribution q(X_t|X_{t−1}) of the forward process share the same Gaussian form. Thus, we only need to parameterize and approximate their means and variances, as shown in Eq. (22):

p_ϕ(X_{t−1} | X_t) = N( X_{t−1} ; μ_ϕ(X_t, t), Σ_ϕ(X_t, t) )    (22)

If we start with X_T and gradually remove the Gaussian noise until we fully restore the original monitoring data distribution of X_0, then this reverse generation process can be represented as Eq. (23):

p_ϕ(X_{0:T}) = p_ϕ(X_T) ∏_{t=1}^{T} p_ϕ(X_{t−1} | X_t)    (23)

3.2.3. Training and sampling
To estimate the parameters of p_ϕ, we use the maximum likelihood estimation method, focusing on maximizing the log-likelihood function log p_ϕ(X_0). Directly solving log p_ϕ(X_0) is rather complex, so this study instead optimizes its evidence lower bound [26], which can be expressed as Eq. (24):

log p_ϕ(X_0) ≥ E_{q(X_1|X_0)}[ log p_ϕ(X_0|X_1) ] − D_KL( q(X_T|X_0) ‖ p_ϕ(X_T) ) − Σ_{t=2}^{T} E_{q(X_t|X_0)}[ D_KL( q(X_{t−1}|X_t, X_0) ‖ p_ϕ(X_{t−1}|X_t) ) ]
             = L_0 − L_T − Σ_{t=2}^{T} L_{t−1}    (24)

where L_T = D_KL( q(X_T|X_0) ‖ p_ϕ(X_T) ), lacking trainable parameters, can be treated as a constant and ignored during model training. L_{t−1} = E_{q(X_t|X_0)}[ D_KL( q(X_{t−1}|X_t, X_0) ‖ p_ϕ(X_{t−1}|X_t) ) ] represents the distribution discrepancy between q(X_{t−1}|X_t, X_0), the true reverse process under the known condition of X_0, and p_ϕ(X_{t−1}|X_t), the parameterized neural network used to approximate q(X_{t−1}|X_t). L_0 is essentially a specific instance of L_{t−1} at t = 1. Thus, our primary focus is on L_{t−1}, aiming to minimize the discrepancy between p_ϕ(X_{t−1}|X_t) and q(X_{t−1}|X_t, X_0). It can be demonstrated that q(X_{t−1}|X_t, X_0) also follows a Gaussian distribution [26], which can be expressed as Eq. (25):

q(X_{t−1} | X_t, X_0) = N( X_{t−1} ; μ̃_t(X_t, X_0), β̃_t I )    (25)

where the variance β̃_t and mean μ̃_t(X_t, X_0) are given by Eq. (26) and Eq. (27):

β̃_t = ( (1 − ᾱ_{t−1}) / (1 − ᾱ_t) ) β_t    (26)

μ̃_t(X_t, X_0) = ( √α_t (1 − ᾱ_{t−1}) / (1 − ᾱ_t) ) X_t + ( √ᾱ_{t−1} β_t / (1 − ᾱ_t) ) X_0    (27)

Since ᾱ_{t−1}, ᾱ_t, and β_t are constants, the variance β̃_t is a fixed value independent of X_t. Based on Eq. (20), we derive Eq. (28):

X_0 = (1/√ᾱ_t) ( X_t − √(1 − ᾱ_t) ε )    (28)

Combining Eq. (27) and (28), we can deduce that the mean μ̃_t depends solely on X_t, as shown in Eq. (29):

μ̃_t(X_t) = (1/√α_t) ( X_t − ( β_t / √(1 − ᾱ_t) ) ε )    (29)

As the variance β̃_t is a constant and does not affect the optimization process, we only need a denoising network ε_ϕ to approximate the Gaussian noise ε, as illustrated in Eq. (30):

μ_ϕ(X_t, t) = (1/√α_t) ( X_t − ( β_t / √(1 − ᾱ_t) ) ε_ϕ(X_t, t) )    (30)

The denoising network ε_ϕ is a neural network that predicts the Gaussian noise ε based on the time step t and the noised data X_t. Details of the model structure will be presented in Section 3.3.1.

Since both p_ϕ(X_{t−1}|X_t) and q(X_{t−1}|X_t, X_0) follow Gaussian distributions, the KL divergence is only related to their means and variances. Ignoring the variance term, L_{t−1} can be simplified to Eq. (31):

L_{t−1} = E_{X_0, ε, t}[ ( β_t² / ( 2 α_t (1 − ᾱ_t) ‖Σ_ϕ‖²_2 ) ) ‖ ε − ε_ϕ( √ᾱ_t X_0 + √(1 − ᾱ_t) ε, t ) ‖²_2 ]    (31)

Further discarding the weighting term [26,28], Eq. (31) reduces to Eq. (32):

L^simple_{t−1} = E_{X_0, ε, t}[ ‖ ε − ε_ϕ( √ᾱ_t X_0 + √(1 − ᾱ_t) ε, t ) ‖²_2 ]    (32)

Thus, the training process can be summarized as follows: for an arbitrary time step t and its corresponding noised data X_t = √ᾱ_t X_0 + √(1 − ᾱ_t) ε, the denoising network ε_ϕ predicts the Gaussian noise ε added at time step t, and the loss function is computed according to Eq. (32). By this method, we can effectively optimize the log-likelihood function log p_ϕ(X_0), thus completing the training of the denoising network ε_ϕ. After this training process, we can start with X_T, progressively remove the Gaussian noise, and ultimately obtain sampled data from the real distribution q(X_0). Specifically, by combining Eq. (30) and (22), we derive Eq. (33):

X_{t−1} ∼ p_ϕ(X_{t−1} | X_t) = (1/√α_t) ( X_t − ( β_t / √(1 − ᾱ_t) ) ε_ϕ(X_t, t) ) + √β̃_t π    (33)

where π ∼ N(0, I). Lastly, applying Eq. (23) enables us to procure the desired sampled data.
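Read together, Eq. (32) and Eq. (33) amount to one short training step and one short sampling step. The following NumPy sketch shows both, with a placeholder eps_net standing in for the denoising network ε_ϕ; the convention of adding no noise at the final step is a standard DDPM detail assumed here rather than stated explicitly above.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50
betas = np.linspace(4e-4, 0.05, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)
# posterior variance \tilde{beta}_t of Eq. (26); alpha_bar_{t-1} = 1 at t = 0
alpha_bar_prev = np.concatenate([[1.0], alpha_bar[:-1]])
beta_tilde = (1.0 - alpha_bar_prev) / (1.0 - alpha_bar) * betas

def eps_net(xt, t):                      # placeholder for the denoising network
    return np.zeros_like(xt)

def training_loss(x0):
    """L_simple, Eq. (32): predict the noise injected at a random time step."""
    t = rng.integers(T)
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps
    return np.mean((eps - eps_net(xt, t)) ** 2)

def p_sample(xt, t):
    """One reverse step, Eq. (33): denoise X_t, then add scaled Gaussian noise."""
    mean = (xt - betas[t] / np.sqrt(1 - alpha_bar[t]) * eps_net(xt, t)) / np.sqrt(alphas[t])
    if t == 0:
        return mean                      # no noise is added at the final step
    return mean + np.sqrt(beta_tilde[t]) * rng.standard_normal(xt.shape)
```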
3.3. Detailed designs for DiffRUL

3.3.1. Denoising network
To effectively capture the complex spatio-temporal correlations in multivariate monitoring data and accurately predict noise levels at each step, we meticulously design the architecture of the denoising network, named DiffNet. This network adopts a configuration akin to that of the inverted bottleneck [45], as illustrated in Fig. 3. It begins by expanding the channel number of the input data to N_c through a 1 × 1 convolution layer. Then, it captures spatio-temporal correlations in a high-dimensional space through a feature extraction network and finally compresses the channel number back through another 1 × 1 convolution layer. The key advantage of this design strategy is that, by increasing the channel number of the input data, the network can more effectively capture and learn various complex features. This allows each channel to focus on detecting different patterns or aspects of the input data, thereby enriching the feature set learned by the model.

The function of the feature extraction network is to apprehend the spatio-temporal correlations inherent in multivariate system monitoring data within a high-dimensional space. Drawing inspiration from [47], the structural foundation of the feature extraction network is predicated on stacked dilated convolutions, as depicted in Fig. 4. Specifically, the feature extraction network consists of N_d dilated layers, divided into m dilation blocks, with each block containing n = N_d/m dilated layers. For the n dilated layers within each dilation block, the dilation factor of the dilated convolution is {2^i}_{i=0}^{n−1}. To expedite model convergence and facilitate the training of deeper networks, we integrate residual connections into each dilated layer. Moreover, we implement gated activation units [47] within these layers for non-linearity, followed by a 3 × 3 convolution for feature extraction.

Each dilated layer features two additional inputs: the diffusion time step and the generative condition derived from DTE, with a comprehensive explanation provided in Section 3.3.2. This section focuses primarily on incorporating the diffusion time step, crucial for model training as the denoising network predicts Gaussian noise based on the diffusion time step t and the corresponding noised data X_t. To facilitate recognition of varying diffusion steps, we employ a time step embedding scheme akin to positional encoding in Attention mechanisms [46]. For each diffusion time step t, a d-dimensional vector t_e is used as its embedding. For i = 0, 1, …, d/2 − 1, the calculation method for this vector can be expressed as t_e(i) = sin( 10^{4i/(d/2−1)} t ) and t_e(i + d/2) = cos( 10^{4i/(d/2−1)} t ). t_e then goes through three fully connected layers. The first two layers employ the GELU activation function, and their parameters are shared across all dilated layers. The final layer has unique parameters for each dilated layer. Its output is added to the input of the dilated convolution, serving as the diffusion time step embedding for that dilated layer. Finally, we sum the skip connections of all dilated layers and, after processing through a 3 × 3 convolution, a batch normalization layer, and a GELU activation function, reconstruct the data.

3.3.2. Guided generation
To enhance the fidelity of degradation trend representations within data samples produced by DiffRUL, we extend the unconditional DiffRUL to a conditional one. This extension integrates degradation trend representations from DTE as generative conditions into DiffRUL's reverse process. Specifically, the process initially defined by Eq. (22) and (23) is modified to Eq. (34) and (35):

p_ϕ(X_{t−1} | X_t, Z) = N( X_{t−1} ; μ_ϕ(X_t, t, Z), Σ_ϕ(X_t, t) )    (34)

p_ϕ(X_{0:T} | Z) = p_ϕ(X_T) ∏_{t=1}^{T} p_ϕ(X_{t−1} | X_t, Z)    (35)

Accordingly, the loss function for training, initially defined in Eq. (32), is revised to Eq. (36):

L^simple_{t−1} = E_{X_0, ε, t}[ ‖ ε − ε_ϕ( √ᾱ_t X_0 + √(1 − ᾱ_t) ε, t, Z ) ‖²_2 ]    (36)

The data sampling method defined in Eq. (33) is also adjusted to Eq. (37):

X_{t−1} ∼ p_ϕ(X_{t−1} | X_t, Z) = (1/√α_t) ( X_t − ( β_t / √(1 − ᾱ_t) ) ε_ϕ(X_t, t, Z) ) + √β̃_t π    (37)

In the extended DiffRUL, when predicting the Gaussian noise ε at time step t, the denoising network additionally receives the degradation trend representation Z as the generative condition. It is important to note that during the training of DiffRUL, the weights of the Encoder are kept frozen. The specific training and sampling procedures for the extended DiffRUL are described in Algorithm 1.

Algorithm 1. Training and sampling process of DiffRUL.
Input: Diffusion time step T; variance schedule {β_t}_{t=1}^T; {α_t = 1 − β_t}_{t=1}^T; {ᾱ_t = ∏_{s=1}^{t} α_s}_{t=1}^T; real system monitoring data distribution q(X_0); pre-trained Encoder in DTE.
Output: Denoising network ε_ϕ; sampled monitoring data X_0^sample from the real distribution q(X_0).
Training process:
1. For each iteration do
2.   Sample real system monitoring data X_0 from q(X_0).
3.   Sample the time step t from the uniform distribution {1, 2, …, T}.
4.   Sample the Gaussian noise ε from N(0, I).
5.   Obtain the representation Z of X_0 through the Encoder.
6.   Predict the Gaussian noise ε with the denoising network ε_ϕ, and calculate the loss by Eq. (36).
7.   Update the parameters ϕ of ε_ϕ using gradient descent.
8. End for
Sampling process:
9. Sample X_T from N(0, I).
10. Sample real system monitoring data X_0 from q(X_0).
11. Obtain the representation Z of X_0 through the Encoder.
12. For t = T, T − 1, …, 1 do
13.   Sample a random variable π from N(0, I).
14.   Sample X_{t−1} from p_ϕ(X_{t−1} | X_t, Z) by Eq. (37).
15. End for

4. Experiments

The N-CMAPSS dataset delineates the degradation trajectory of aero-engines from normal operation to failure under actual flight conditions, featuring two distinct failure modes: the abnormal high-pressure turbine (HPT) and low-pressure turbine (LPT). This dual-mode configuration signifies a more complex degradation pattern compared to single-failure scenarios. In our experiments, we utilize eight engines from subsets DS01, DS02, DS03, and DS04 to validate the proposed method, with details in Table 2. We employ data from 20 sensors per engine for model training, as cited in [49], and the dataset provides accurate RULs, facilitating precise label assignment. Data normalization and segmentation are conducted in accordance with Eq. (4) and the procedures in Section 2.2, respectively.

Table 2
Details of the N-CMAPSS dataset used in this study.
Subsets | Data type | Engines | Failure modes
DS01 | Training data | Unit 4 | HPT
DS01 | Testing data | Unit 7 | HPT
DS02 | Training data | Unit 20 | HPT + LPT
DS02 | Testing data | Unit 15 | HPT + LPT
DS03 | Training data | Unit 5 | HPT
DS03 | Testing data | Unit 12 | HPT
DS04 | Training data | Unit 1 | HPT
DS04 | Testing data | Unit 9 | HPT

The Score metric is computed as Eq. (39):

Score = Σ_{i=1}^{n} s_i,  s_i = { e^{ −( ŷ_t^(i) − y_t^(i) ) / 13 } − 1,  if ŷ_t^(i) − y_t^(i) < 0
                                { e^{ ( ŷ_t^(i) − y_t^(i) ) / 10 } − 1,   if ŷ_t^(i) − y_t^(i) ≥ 0    (39)

where y_t^(i) is the actual RUL value of engine i at time step t, while ŷ_t^(i) is the predicted RUL value.
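The two evaluation metrics can be written in a few lines of Python; the late-prediction branch (divisor 10) appears in Eq. (39), the early-prediction branch (divisor 13) follows the standard C-MAPSS scoring convention, and the RMSE below is the usual root-mean-square error.

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)))

def score(y_true, y_pred):
    """Asymmetric C-MAPSS score: late predictions (d >= 0) are penalised
    more heavily (divisor 10) than early ones (divisor 13)."""
    d = np.asarray(y_pred) - np.asarray(y_true)
    s = np.where(d < 0, np.exp(-d / 13.0) - 1.0, np.exp(d / 10.0) - 1.0)
    return float(np.sum(s))
```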
Fig. 5. The degradation trajectories of real data and sampled data in FD001.
The margin α in the loss function L_4 is set to α = 0.4. Because the loss function L_4 is significantly lower than L_1, L_2, and L_3, to ensure that the model does not neglect L_4 during training we set the coefficient λ_4 higher than the other coefficients. Specifically, we utilize the grid search method, setting λ_1, λ_2, and λ_3 to 1 and defining the search space for λ_4 as {5, 10, 15, 20}. After conducting validation on the C-MAPSS dataset, we determine that the coefficient combination with the best validation performance is λ_1 = 1, λ_2 = 1, λ_3 = 1, and λ_4 = 10. The time step of DiffRUL is set to T = 50, and the variance of the Gaussian noise β_t is linearly increased from β_1 = 0.0004 to β_T = 0.05. In DiffNet, the number of channels N_c is set to 64, the number of dilated layers N_d to 30, the number of dilation blocks m to 3, and the dimension of the diffusion time step embedding d to 128. In augmenting the dataset, we encode the real engine monitoring data using DTE to capture their degradation trends. These representations then serve as generative conditions for producing sampled data via DiffRUL, effectively doubling the quantity of the original training data. To confirm the robustness of our results, we conduct 10 trials per subset for all methods and report the average outcomes.

4.4. Experiment results on the C-MAPSS dataset

4.4.1. Results of the generated data
To evaluate DiffRUL's proficiency in mimicking the distribution and degradation trends of system monitoring data, we analyze the complete degradation trajectories of engines running to failure, comparing real and sampled data. Specifically, we first randomly select an engine from the subset FD001 and use DTE to encode its monitoring data, obtaining representations of its degradation trends. Subsequently, these representations are used as generative conditions in the sampling process of DiffRUL to generate sampled data. Since the real engine data are processed in segments, the resulting sampled data are also segmented. We concatenate these segmented samples by retaining the value at the last time step of each segment, in the order of data segmentation, ultimately obtaining a complete sample of the engine running until failure. Fig. 5 visually contrasts the real data with the generated samples, with each subplot illustrating the trajectory of a sensor's monitoring data over time. Following the description in Section 4.1, we retain only the data from 14 valuable sensors. In the figure, real data are shown in blue and sampled data in yellow. Across all subplots, the congruent degradation trends between the blue and yellow lines validate DiffRUL's ability to capture the global degradation information of real engine monitoring data. Additionally, the presence of local fluctuations in both real and sampled data, with the sampled data closely following the real data, underscores DiffRUL's ability to capture nuanced local degradation details. Moreover, no significant anomalies or unnatural phenomena appear in the sampled data, suggesting that the data generated by DiffRUL are of high quality and do not introduce large errors or unrealistic values. Finally, in the monitoring data from different sensors, DiffRUL consistently fits the real data's degradation trends well, maintaining a comparably consistent performance level. This is equally crucial for enhancing the subsequent prediction of RUL.
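One reading of the concatenation rule just described, as a sketch: since the windows advance with step size 1, keeping the last time step of each generated segment reassembles a trajectory of T_i − L + 1 points in segmentation order. Names and shapes are assumptions.

```python
import numpy as np

def concat_segments(segments):
    """segments: (n_segments, L, n_sensors) array of generated windows.
    Keeping each window's final time step yields one (n_segments, n_sensors)
    run-to-failure trajectory, ordered as the windows were cut."""
    return segments[:, -1, :]
```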
Table 3
Comparison results of RMSE and Score with different methods on the C-MAPSS dataset (columns: FD001 / FD002 / FD003 / FD004).

Methods | RMSE (FD001, FD002, FD003, FD004) | Score (FD001, FD002, FD003, FD004)
CNN-based Method | 16.65, 18.72, 15.62, 22.77 | 461.25, 2448.96, 513.58, 4109.14
CNN-based Method + DiffRUL | 13.71, 18.36, 12.92, 21.15 | 259.36, 1749.45, 336.95, 2850.36
Gain | 17.65%, 1.95%, 17.32%, 7.10% | 43.77%, 28.56%, 34.39%, 30.63%
RNN-based Method | 13.79, 17.55, 13.75, 20.31 | 351.60, 1711.56, 375.11, 2784.96
RNN-based Method + DiffRUL | 12.78, 16.59, 12.36, 21.13 | 256.68, 1200.43, 289.90, 2091.82
Gain | 7.36%, 5.48%, 10.13%, -4.01% | 27.00%, 29.86%, 22.72%, 24.89%
Hybrid Method | 13.53, 18.01, 13.37, 21.20 | 293.62, 1848.61, 396.67, 2449.59
Hybrid Method + DiffRUL | 12.65, 17.51, 13.47, 19.63 | 254.69, 1270.47, 357.83, 1884.86
Gain | 6.54%, 2.76%, -0.76%, 7.40% | 13.26%, 31.27%, 9.79%, 23.05%
Transformer-based Method | 12.60, 16.99, 13.52, 19.49 | 271.59, 1401.99, 314.41, 2097.70
Transformer-based Method + DiffRUL | 11.71, 15.90, 11.77, 18.43 | 199.29, 1162.84, 240.37, 1627.37
Gain | 7.02%, 6.41%, 12.94%, 5.45% | 26.62%, 17.06%, 23.55%, 22.42%
4.4.2. Results of RUL prediction
To verify whether the data generated by DiffRUL can effectively enhance the performance of deep learning models in RUL prediction tasks, we integrate the sampled data with real data for data augmentation, and then use this augmented data for RUL prediction. Specifically, we conduct comparative experiments using different types of model architectures, including the convolutional neural network-based method, the recurrent neural network-based method, the hybrid method, and the transformer-based method. The fundamental designs of these methods are inspired by established RUL prediction models, specifically CNN [6], LSTM [10], TCMN [14], and Transformer [12]. Given that some of these models were proposed earlier, we have optimized their structures and parameters, applied more advanced activation functions and normalization techniques, and employed Dropout to prevent overfitting. Specific configuration details are provided in Tables D1 to D4 in the Appendix.
To ensure the fairness and comparability of the experimental results, all methods use the same data processing approaches (including sensor selection, data preprocessing, and data segmentation) and the same training approaches (including optimizer choice, learning rate adjustment strategy, and parameter initialization method), and are run on the same hardware. Table 3 presents the comparative results, demonstrating that DiffRUL significantly enhances the performance of the baseline methods in nearly all cases. Crucially, DiffRUL improves predictions not only in subsets with limited operating conditions and failure modes, like FD001 and FD003, but also in more varied subsets such as FD002 and FD004. This suggests that DiffRUL effectively captures complex patterns and distributions in diverse and complex data scenarios, generating high-quality sampled data that substantially increases the accuracy of RUL predictions.

4.4.3. Comparison with existing methods
To further validate the effectiveness of our proposed DiffRUL, we compare it with four established generative algorithms: CGAN [19], WGAN-GP [20], TimeGAN [21], and CR-GAN [23], with results shown in Figs. 6 to 9. Fig. 6 illustrates the performance of the CNN-based method and its enhancement through CGAN, WGAN-GP, TimeGAN, CR-GAN, and DiffRUL across the four subsets. The introduction of DiffRUL notably improves the RMSE and Score in all subsets, outperforming the other generative algorithms, which show inconsistent effects across different subsets. Specifically, while they provide some improvement in FD001 and FD003, their enhancements are less substantial than those achieved by DiffRUL. In FD002 and FD004, they even negatively impact the base method's performance. Figs. 7, 8, and 9 display the performance of the RNN-based, hybrid, and Transformer-based methods and their variants enhanced by CGAN, WGAN-GP, TimeGAN, CR-GAN, and DiffRUL across the four subsets. Analysis of these figures leads to a similar conclusion: DiffRUL improves the RMSE and Score in almost all subsets, highlighting its effectiveness. Conversely, the other four generative algorithms lack consistent performance enhancements, showing benefits in FD001 and FD003 but limited or adverse effects in FD002 and FD004. Synthesizing the results from Figs. 6 to 9, DiffRUL significantly outperforms the other four generative algorithms in enhancing RUL prediction effectiveness, with CR-GAN also surpassing CGAN, WGAN-GP, and TimeGAN. To further explore the performance differences, we calculate gain rates in RMSE and Score for methods using CR-GAN and DiffRUL, presented in Figs. 10 and 11. The results show that CR-GAN improves RMSE in FD001 and FD003, with gains of 6.35% and 7.32%, respectively, but decreases RMSE in FD002 and FD004 by 2.91% and 4.08%, with similar trends in Score. Conversely, DiffRUL enhances performance across nearly all subsets and notably outperforms CR-GAN in the more challenging FD002 and FD004. In summary, while CGAN, WGAN-GP, TimeGAN, and CR-GAN falter in complex subsets like FD002 and FD004, leading to suboptimal enhancements, DiffRUL not only excels in the simpler subsets (FD001 and FD003) but also shows marked improvement in the more complex ones (FD002 and FD004).
Fig. 10. The gain rates of RMSE for different methods enhanced by CR-GAN and DiffRUL.
Fig. 11. The gain rates of Score for different methods enhanced by CR-GAN and DiffRUL.
These findings highlight DiffRUL's widespread applicability and superior efficacy in RUL prediction, particularly in scenarios demanding accurate capture and replication of complex data distributions.

4.4.4. Ablation experiments
In this section, we perform ablation experiments to validate that DTE effectively extracts distinctive representations of degradation trends from the system monitoring data, aiding DiffRUL in generating superior samples. Initially, we encode the monitoring data from all engines across FD001, FD002, FD003, and FD004 with DTE to obtain these representations, visualized in Fig. 12. As the full-lifecycle monitoring data for each engine are segmented, DTE produces a series of representations for each engine, each linked to a specific RUL value. In Fig. 12, each dot represents these representations within the latent space, color-coded to reflect an engine's RUL value, as indicated by the color bar. A yellow region in each subplot's lower-left corner denotes the engine's early operational stage, representing a longer RUL. As degradation progresses, the dots move towards the green and ultimately to the deep blue area in the upper-right corner, signaling imminent failure. Additionally, we randomly select an engine in each subset and highlight its series of representations in distinct colors, as shown in Fig. 12. These visual results clearly demonstrate DTE's capability to effectively distinguish various degradation trends, exemplified by engine unit #78 in FD001, where the representation shifts from the yellow area, indicating a longer RUL, to the deep blue area, indicating a shorter RUL, culminating in failure.
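A Fig. 12-style view can be produced with a few lines of matplotlib; here Z and rul are placeholders for the DTE representations (assumed already reduced to two dimensions, e.g. by t-SNE) and their RUL labels.

```python
import numpy as np
import matplotlib.pyplot as plt

Z = np.random.randn(500, 2)                  # degradation-trend representations
rul = np.random.uniform(0, 125, size=500)    # RUL value linked to each dot

sc = plt.scatter(Z[:, 0], Z[:, 1], c=rul, cmap="viridis", s=8)
plt.colorbar(sc, label="RUL")
plt.xlabel("latent dim 1")
plt.ylabel("latent dim 2")
plt.show()
```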
Fig. 13. Degradation trend representations obtained by DTE without contrastive learning on four subsets.
Table 4
Comparison results of RMSE and Score augmented by different methods on the C-MAPSS dataset (columns: FD001 / FD002 / FD003 / FD004).

Base Architecture | Augmented Methods | RMSE (FD001, FD002, FD003, FD004) | Score (FD001, FD002, FD003, FD004)
CNN-based | Method I | 15.06, 19.94, 13.55, 21.85 | 403.07, 2812.83, 398.59, 4460.15
CNN-based | Method II | 14.88, 19.67, 13.77, 22.08 | 297.83, 1839.75, 360.26, 3072.19
CNN-based | Proposed | 13.71, 18.36, 12.92, 21.15 | 259.36, 1749.45, 336.95, 2850.36
RNN-based | Method I | 12.85, 17.85, 12.89, 21.58 | 284.69, 1910.51, 272.98, 3183.54
RNN-based | Method II | 13.08, 17.08, 12.81, 21.12 | 282.75, 1324.19, 253.22, 2123.29
RNN-based | Proposed | 12.78, 16.59, 12.36, 21.13 | 256.68, 1200.43, 289.90, 2091.82
Hybrid | Method I | 13.09, 18.86, 13.70, 23.16 | 254.00, 2368.68, 365.16, 4018.00
Hybrid | Method II | 12.99, 17.59, 13.30, 21.54 | 250.67, 1502.43, 336.30, 2879.97
Hybrid | Proposed | 12.65, 17.51, 13.47, 19.63 | 254.69, 1270.47, 357.83, 1884.86
Transformer-based | Method I | 12.72, 16.38, 12.57, 19.28 | 278.88, 1372.57, 285.56, 2196.97
Transformer-based | Method II | 12.59, 16.70, 13.25, 18.63 | 261.09, 1090.46, 287.21, 1786.53
Transformer-based | Proposed | 11.71, 15.90, 11.77, 18.43 | 199.29, 1162.84, 240.37, 1627.37
Secondly, to evaluate the impact of integrating contrastive learning within DTE to enhance the distinctiveness of degradation trend representations in the latent space, we set the coefficient of the triplet loss function to zero in the joint optimization framework of DTE. We then re-visualize the engine representations, with the results displayed in Fig. 13. Compared to Fig. 12, the representations in Fig. 13 show notably less distinctiveness, with degradation trends clustering more closely in the latent space. For instance, engine unit #167 in FD004 exhibits a clear degradation trajectory in Fig. 12, while in Fig. 13 its representations are more clustered, displaying a less defined degradation path. This contrast highlights the significance of incorporating contrastive learning, as it enables the model to more effectively discern and represent different degradation trends, capturing more distinctive data structures and patterns.

Finally, to validate that the degradation trend representations extracted by DTE can guide DiffRUL in generating higher-quality data samples, we conduct a comparative experiment. This experiment compares the performance improvements of the base RUL prediction methods enhanced by DiffRUL, using representations extracted by DTE versus using actual RUL values directly as generative conditions [23]. Additionally, to further demonstrate the effectiveness of contrastive learning in DTE, we set the triplet loss function coefficient to zero and replicate the experiment with this modified version. The results are detailed in Table 4, where Method I represents DiffRUL using actual RUL values as generative conditions, and Method II applies DiffRUL with the triplet loss function coefficient set to zero.
Fig. 14. The degradation trajectories of real data and sampled data in DS01.
Table 5
Comparison results of RMSE and Score augmented by different methods on the N-CMAPSS dataset (columns: DS01 / DS02 / DS03 / DS04).

Base Architecture | Augmented Methods | RMSE (DS01, DS02, DS03, DS04) | Score (DS01, DS02, DS03, DS04)
CNN-based | Non-augmentation | 7.18, 8.40, 10.60, 11.34 | 78.80, 53.03, 111.95, 125.82
CNN-based | CGAN | 7.20, 9.44, 10.95, 11.12 | 74.33, 63.43, 119.53, 121.59
CNN-based | Gain | -0.28%, -12.38%, -3.30%, 1.94% | 5.67%, -19.61%, -6.77%, 3.36%
CNN-based | WGAN-GP | 7.13, 9.21, 11.07, 10.95 | 76.52, 64.84, 118.70, 111.25
CNN-based | Gain | 0.70%, -9.64%, -4.43%, 3.44% | 2.89%, -22.27%, -6.03%, 11.58%
CNN-based | TimeGAN | 7.01, 8.86, 10.14, 10.77 | 70.48, 60.50, 108.71, 112.46
CNN-based | Gain | 2.37%, -5.48%, 4.34%, 5.03% | 10.56%, -14.09%, 2.89%, 10.62%
CNN-based | CR-GAN | 6.67, 8.97, 9.88, 10.47 | 65.85, 55.69, 113.88, 108.75
CNN-based | Gain | 7.10%, -6.79%, 6.79%, 7.67% | 16.43%, -5.02%, -1.72%, 13.57%
CNN-based | Proposed | 6.30, 7.47, 9.90, 9.56 | 57.43, 42.98, 104.62, 95.49
CNN-based | Gain | 12.26%, 11.07%, 6.60%, 15.70% | 27.12%, 18.95%, 6.55%, 24.11%
RNN-based | Non-augmentation | 6.61, 7.83, 10.26, 10.90 | 70.38, 51.64, 105.19, 117.68
RNN-based | CGAN | 6.48, 8.08, 10.23, 10.79 | 64.39, 56.91, 103.42, 121.48
RNN-based | Gain | 1.97%, -3.19%, 0.29%, 1.01% | 8.51%, -10.21%, 1.68%, -3.23%
RNN-based | WGAN-GP | 6.44, 8.11, 10.15, 10.66 | 62.42, 55.67, 106.31, 112.28
RNN-based | Gain | 2.57%, -3.58%, 1.07%, 2.20% | 11.31%, -7.80%, -1.06%, 4.59%
RNN-based | TimeGAN | 6.52, 8.01, 9.89, 10.37 | 59.77, 53.85, 99.89, 108.75
RNN-based | Gain | 1.36%, -2.30%, 3.61%, 4.86% | 15.08%, -4.28%, 5.04%, 7.59%
RNN-based | CR-GAN | 5.65, 7.99, 9.58, 9.89 | 54.85, 50.31, 97.57, 103.23
RNN-based | Gain | 14.52%, -2.04%, 6.63%, 9.27% | 22.07%, 2.58%, 7.24%, 12.28%
RNN-based | Proposed | 5.72, 6.93, 8.97, 9.36 | 51.09, 40.62, 84.09, 94.42
RNN-based | Gain | 13.46%, 11.49%, 12.57%, 14.13% | 27.41%, 21.34%, 20.06%, 19.77%
Hybrid | Non-augmentation | 6.39, 6.78, 9.77, 10.14 | 69.57, 48.26, 98.91, 103.83
Hybrid | CGAN | 6.52, 7.02, 9.67, 9.89 | 71.02, 57.60, 97.80, 101.28
Hybrid | Gain | -2.03%, -3.54%, 1.02%, 2.47% | -2.08%, -19.35%, 1.12%, 2.46%
Hybrid | WGAN-GP | 6.27, 6.93, 9.49, 9.77 | 65.74, 55.87, 102.41, 95.46
Hybrid | Gain | 1.88%, -2.21%, 2.87%, 3.65% | 5.51%, -15.77%, -3.54%, 8.06%
Hybrid | TimeGAN | 6.05, 6.87, 9.42, 9.65 | 63.42, 51.23, 91.32, 97.77
Hybrid | Gain | 5.32%, -1.33%, 3.58%, 4.83% | 8.84%, -6.15%, 7.67%, 5.84%
Hybrid | CR-GAN | 5.96, 6.85, 9.04, 9.27 | 58.62, 52.06, 90.20, 92.16
Hybrid | Gain | 6.73%, -1.03%, 7.47%, 8.58% | 15.74%, -7.87%, 8.81%, 11.24%
Hybrid | Proposed | 5.76, 6.29, 8.56, 8.69 | 51.23, 36.76, 82.18, 80.31
Hybrid | Gain | 9.86%, 7.23%, 12.38%, 14.30% | 26.36%, 23.83%, 16.91%, 22.65%
Transformer-based | Non-augmentation | 6.06, 6.61, 9.50, 9.86 | 62.60, 43.18, 93.37, 98.98
Transformer-based | CGAN | 6.23, 7.13, 9.44, 10.05 | 70.28, 52.42, 91.77, 96.64
Transformer-based | Gain | -2.81%, -7.87%, 0.63%, -1.93% | -12.27%, -21.40%, 1.71%, 2.36%
Transformer-based | WGAN-GP | 6.15, 7.05, 9.28, 9.74 | 65.50, 49.67, 90.23, 91.78
Transformer-based | Gain | -1.49%, -6.66%, 2.32%, 1.22% | -4.63%, -15.03%, 3.36%, 7.27%
Transformer-based | TimeGAN | 5.72, 6.92, 9.12, 9.17 | 57.36, 50.31, 87.78, 92.23
Transformer-based | Gain | 5.61%, -4.69%, 4.00%, 7.00% | 8.37%, -16.51%, 5.99%, 6.82%
Transformer-based | CR-GAN | 4.77, 6.88, 8.72, 8.97 | 51.20, 47.52, 85.55, 87.45
Transformer-based | Gain | 21.29%, -4.08%, 8.21%, 9.03% | 18.21%, -10.05%, 8.38%, 11.65%
Transformer-based | Proposed | 4.86, 5.81, 7.95, 8.48 | 44.73, 29.53, 69.11, 77.22
Transformer-based | Gain | 19.80%, 12.10%, 16.32%, 14.00% | 28.55%, 31.61%, 25.98%, 21.98%
The results indicate that, using our proposed DiffRUL, the four base methods achieve average RMSE values of 12.71, 17.09, 12.63, and 20.09 and average Score values of 242.50, 1345.80, 306.26, and 2113.60 across FD001, FD002, FD003, and FD004, respectively. In contrast, under Method I, these methods yield RMSE averages of 13.43, 18.26, 13.18, and 21.47 and Score averages of 305.16, 2116.15, 330.57, and 3464.67 on the same subsets. These results affirm that employing degradation trend representations extracted by DTE as generative conditions enhances base method performance more effectively in RUL prediction tasks than using actual RUL values. This confirms that degradation trend representations effectively guide DiffRUL in generating higher-quality data samples. Furthermore, with Method II, the base methods record RMSE averages of 13.39, 17.76, 13.28, and 20.84, and Score averages of 273.09, 1439.21, 309.25, and 2465.49, respectively, which further highlights the efficacy of DiffRUL, particularly in enhancing RUL prediction performance and substantiating the benefits of integrating contrastive learning into DTE.

4.5. Experiment results on the N-CMAPSS dataset

4.5.1. Results of the generated data
For the N-CMAPSS dataset, we compare the real and sampled degradation trajectories of a selected engine within the subset DS01. Given the longer operational cycles of engines in DS01, to facilitate visual comparison we confine the comparison to the last 3,000 samples. Although 20 sensors are used in the model training process, due to space constraints this section presents data from only 14 sensors. As shown in Fig. 14, through a series of sub-figures, each tracing the evolution of sensor-derived data over time, the real data are marked by blue lines and the sampled data by yellow lines. The comparison reveals a high degree of consistency between the real and sampled data in terms of global change trends and local fluctuations, with no significant anomalies observed in the sampled data. These results further confirm DiffRUL's effectiveness in accurately capturing and understanding the distribution and degradation patterns of system monitoring data.

4.5.2. RUL prediction results of different methods
To further evaluate the effectiveness of our proposed method, we conduct a comparative assessment of various RUL prediction methods on the N-CMAPSS dataset, enhanced by DiffRUL and four other generative algorithms, as detailed in Table 5. The "Non-augmentation" category represents the baseline architecture, while "Gain" refers to the percentage improvement in RMSE and Score achieved by the generative algorithms. Notably, our proposed DiffRUL significantly enhances the performance metrics of the baseline RUL prediction methods across all subsets.
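For clarity, the "Gain" entries are read here as the relative reduction of an error metric after augmentation, with positive values meaning improvement; the exact formula is our assumption, but it reproduces the reported entries.

```python
def gain(base: float, augmented: float) -> float:
    """Percentage reduction of an error metric (RMSE or Score) after augmentation."""
    return (base - augmented) / base * 100.0

print(round(gain(7.18, 6.30), 2))  # 12.26 -> matches the CNN-based DS01 RMSE row
```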
In contrast, the other four generative algorithms do not show the uniform effectiveness across different subsets that is seen with DiffRUL. For instance, CR-GAN improves RMSE and Score in subsets DS01, DS03, and DS04, but not to the extent achieved by DiffRUL. In DS02, CR-GAN even negatively impacts performance, with the baseline methods showing an average RMSE decrease of 3.49% and a Score decrease of 5.09%. Similarly, the influence exerted by TimeGAN varies, benefiting DS01, DS03, and DS04 while worsening outcomes in DS02. WGAN-GP and CGAN show minimal positive impact on the baseline methods, with improvements in some subsets overshadowed by notable performance decrements in others. These results underscore DiffRUL's superior ability to enhance prediction effectiveness compared to the other generative algorithms, excelling in the simpler subsets (DS01, DS03, and DS04) and consistently improving in the more complex subset (DS02), further proving its capability to capture complex distributions and synthesize high-quality data.

As a supplementary assessment to the RMSE and Score, we analyze the distribution of prediction errors for selected engines in the four subsets using histograms, further illustrating DiffRUL's enhancement of the baseline RUL prediction methods. These results are shown in Figs. 15 to 18. The histograms display the RUL prediction error on the horizontal axis and the probability distribution on the vertical axis, with the blue sections representing the error distribution of the baseline prediction model, the red sections showing the error distribution of the baseline model enhanced by DiffRUL, and the dark red sections indicating the overlap between the two. A method is considered superior if most prediction errors cluster towards lower values. Fig. 15 reveals that for subsets DS01, DS02, DS03, and DS04, the error distribution of the CNN-based method, enhanced by DiffRUL, markedly shifts towards lower values, confirming DiffRUL's ability to refine the accuracy of RUL predictions.
Figs. 16, 17, and 18 depict the error distributions for the RNN-based, hybrid, and Transformer-based methods, respectively, leading to similar findings: DiffRUL significantly reduces RUL prediction errors, boosting performance. These analyses further verify that DiffRUL effectively generates high-quality sample data on the N-CMAPSS dataset, markedly enhancing RUL prediction accuracy.

5. Conclusions

This paper introduces DiffRUL, an innovative method designed to augment multivariate engine monitoring data using a diffusion model framework. This approach excels in generating samples that replicate both the quality and degradation trajectories of real monitoring data. Initially employing DTE to extract distinctive representations of the degradation process from operation to failure, these representations significantly enhance the model's ability to identify complex data patterns. Moreover, the diffusion model, adapted for multivariate system monitoring, includes a forward process that incrementally adds noise, followed by a reverse process in which a denoising model, refined at successive noise levels, restores the original samples. This methodology enriches understanding of the data's underlying distribution. Additionally, the creation of DiffNet, integrated with generative conditions, accurately captures complex spatio-temporal correlations, ensuring precise noise-level predictions at each diffusion time step. Empirical results confirm DiffRUL's effectiveness, demonstrating its superiority in producing high-quality data and significantly enhancing the predictive accuracy of deep learning models in RUL estimation tasks.
In future work, we aim to tackle two challenges. First, we will attempt to overcome the issue of slow model training induced by the diffusion process. Second, we will focus on using diffusion models to augment incomplete system monitoring data, a common issue in practical scenarios where old systems are often replaced before scheduled maintenance. This results in a substantial amount of incomplete data, lacking failure times, which poses challenges for training RUL prediction models. The key research question is how to effectively utilize the diffusion model's strengths in fitting complex data distributions to extend these incomplete data sequences.

CRediT authorship contribution statement

Wei Wang: Writing – review & editing, Writing – original draft, Validation, Software, Methodology, Conceptualization. Honghao Song: Writing – review & editing. Shubin Si: Writing – review & editing, Funding acquisition. Wenhao Lu: Writing – review & editing. Zhiqiang Cai: Writing – review & editing, Supervision, Resources, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgments

This work was supported by the National Natural Science Foundation of China [72271200, 72231008], the Open Fund of Intelligent Control Laboratory [ICL-2023-0304], the Science and Technology Innovation Group Program of Shaanxi Province [2024RS-CXTD-28], and the Distinguished Young Scholar Program of Shaanxi Province [2023-JQ-JC-10].
Appendix

First, we use the Evidence Lower Bound (ELBO) to optimize the log-likelihood function:

log p_ϕ(X_0) = log ∫ ( p_ϕ(X_{0:T}) / q(X_{1:T}|X_0) ) q(X_{1:T}|X_0) d(X_{1:T}) ≥ E_{q(X_{1:T}|X_0)}[ log ( p_ϕ(X_{0:T}) / q(X_{1:T}|X_0) ) ]    (40)

The result obtained from Eq. (40) is the ELBO, which can be further expanded as follows:

ELBO = E_{q(X_{1:T}|X_0)}[ log ( p_ϕ(X_{0:T}) / q(X_{1:T}|X_0) ) ]
     = E_{q(X_{1:T}|X_0)}[ log ( p_ϕ(X_T) ∏_{t=1}^{T} p_ϕ(X_{t−1}|X_t) / ∏_{t=1}^{T} q(X_t|X_{t−1}) ) ]
     = E_{q(X_{1:T}|X_0)}[ log ( p_ϕ(X_T) p_ϕ(X_0|X_1) / q(X_1|X_0) ) + log ∏_{t=2}^{T} ( p_ϕ(X_{t−1}|X_t) / q(X_t|X_{t−1}) ) ]    (41)

ELBO = E_{q(X_1|X_0)}[ log p_ϕ(X_0|X_1) ] − D_KL( q(X_T|X_0) ‖ p_ϕ(X_T) ) − Σ_{t=2}^{T} E_{q(X_t|X_0)}{ E_{q(X_{t−1}|X_0,X_t)}[ log ( q(X_{t−1}|X_t, X_0) / p_ϕ(X_{t−1}|X_t) ) ] }
     = E_{q(X_1|X_0)}[ log p_ϕ(X_0|X_1) ] − D_KL( q(X_T|X_0) ‖ p_ϕ(X_T) ) − Σ_{t=2}^{T} E_{q(X_t|X_0)}[ D_KL( q(X_{t−1}|X_t, X_0) ‖ p_ϕ(X_{t−1}|X_t) ) ]    (45)

In our study, we set the variances of the two Gaussian distributions, p_ϕ(X_{t−1}|X_t) and q(X_{t−1}|X_t, X_0), to be exactly matched, which results in:

D_KL( q(X_{t−1}|X_t, X_0) ‖ p_ϕ(X_{t−1}|X_t) ) = D_KL( N(X_{t−1}; μ_q, Σ_q(t)) ‖ N(X_{t−1}; μ_ϕ, Σ_q(t)) )
 = (1/2) [ log 1 − d + d + (μ_ϕ − μ_q)^T Σ_q(t)^{−1} (μ_ϕ − μ_q) ]
 = ( 1 / ( 2 ‖Σ_q(t)‖²_2 ) ) ‖ μ_ϕ − μ_q ‖²_2    (52)

 = E_{X_0, ε, t}[ ( 1 / ( 2 ‖Σ_ϕ(X_t, t)‖²_2 ) ) ‖ (1/√α_t)( X_t − ( β_t / √(1 − ᾱ_t) ) ε_ϕ(X_t, t) ) − (1/√α_t)( X_t − ( β_t / √(1 − ᾱ_t) ) ε ) ‖²_2 ]
 = E_{X_0, ε, t}[ ( β_t² / ( 2 α_t (1 − ᾱ_t) ‖Σ_ϕ‖²_2 ) ) ‖ ε − ε_ϕ( √ᾱ_t X_0 + √(1 − ᾱ_t) ε, t ) ‖²_2 ]    (53)
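The simplification in Eqs. (52)-(53) rests on the KL divergence between two Gaussians with identical covariance reducing to a scaled squared distance between their means. The short check below verifies this numerically for an isotropic covariance σ²I; it is a minimal sanity-check sketch under that assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
d, sigma2 = 4, 0.3
mu_q, mu_phi = rng.normal(size=d), rng.normal(size=d)

closed_form = np.sum((mu_phi - mu_q) ** 2) / (2 * sigma2)

# Monte-Carlo estimate of KL(q || p) = E_q[log q - log p]; the normalization
# constants cancel because both Gaussians share the covariance sigma2 * I.
x = mu_q + np.sqrt(sigma2) * rng.standard_normal((200_000, d))
log_q = -np.sum((x - mu_q) ** 2, axis=1) / (2 * sigma2)
log_p = -np.sum((x - mu_phi) ** 2, axis=1) / (2 * sigma2)
mc_estimate = np.mean(log_q - log_p)

print(closed_form, mc_estimate)   # the two values agree closely
```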
Table D1
Specific configuration of the convolutional neural network-based method.
Table D2
Specific configuration of the recurrent neural network-based method.
Table D3
Specific configuration of the hybrid method.
Table D4
Specific configuration of the transformer-based method.
[41] Le-Khac PH, Healy G, Smeaton AF. Contrastive representation learning: a framework and review. IEEE Access 2020;8:193907–34. https://doi.org/10.1109/ACCESS.2020.3031549.
[42] Robinson J, Chuang C-Y, Sra S, Jegelka S. Contrastive learning with hard negative samples. 2021. https://doi.org/10.48550/arXiv.2010.04592.
[43] Jiang R, Nguyen T, Ishwar P, Aeron S. Supervised contrastive learning with hard negative samples. 2022. https://doi.org/10.48550/arXiv.2209.00078.
[44] Schroff F, Kalenichenko D, Philbin J. FaceNet: a unified embedding for face recognition and clustering. In: 2015 IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR). IEEE; 2015. p. 815–23. https://doi.org/10.1109/CVPR.2015.7298682.
[45] Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C. MobileNetV2: inverted residuals and linear bottlenecks. IEEE Computer Society; 2018. p. 4510–20. https://doi.org/10.1109/CVPR.2018.00474.
[46] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. In: Adv. Neural Inf. Process. Syst., vol. 30. Curran Associates, Inc.; 2017.
[47] Van Den Oord A, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, et al. WaveNet: a generative model for raw audio. 2016. https://doi.org/10.48550/arXiv.1609.03499.
[48] Arias Chao M, Kulkarni C, Goebel K, Fink O. Aircraft engine run-to-failure dataset under real flight conditions for prognostics and diagnostics. Data 2021;6:5. https://doi.org/10.3390/data6010005.
[49] Arias Chao M, Kulkarni C, Goebel K, Fink O. Fusing physics-based and deep learning models for prognostics. Reliab Eng Syst Saf 2022;217:107961. https://doi.org/10.1016/j.ress.2021.107961.
[50] Kingma DP, Ba J. Adam: a method for stochastic optimization. 2017. https://doi.org/10.48550/arXiv.1412.6980.
[51] Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. In: Proc. Thirteen. Int. Conf. Artif. Intell. Stat. JMLR Workshop and Conference Proceedings; 2010. p. 249–56.