ARTICLE INFO

Keywords:
Prognostics and health management
Remaining useful life estimation
Aero-engines
Deep learning
Data augmentation
Diffusion model

ABSTRACT

Predicting the remaining useful life (RUL) of aero-engines is essential for their prognostics and health management (PHM). Deep learning technologies are effective in this area, but their success depends critically on acquiring sufficient engine monitoring data from operation to failure, a process that is expensive and challenging in practice. Insufficient data limit the training of deep learning methods, thereby affecting their predictive performance. To address this issue, this study proposes a novel method named DiffRUL for augmenting multivariate engine monitoring data and generating high-quality samples mimicking degradation trends in real data. Initially, a specialized degradation trend encoder is designed to extract degradation trend representations from monitoring data, which serve as generative conditions. Subsequently, the diffusion model is adapted to the scenario of generating multivariate monitoring data, reconstructing and synthesizing data from Gaussian noise through a reverse process. Additionally, a denoising network is developed to incorporate generative conditions and capture spatio-temporal correlations in the data, accurately estimating noise levels during the reverse process. Experimental results on the C-MAPSS and N-CMAPSS datasets show that DiffRUL successfully generates high-fidelity multivariate monitoring data. Furthermore, these generated data effectively support the RUL prediction task and significantly enhance the predictive ability of the underlying deep learning models.
explored. Jayasinghe et al. [14] used 1D temporal convolutions and stacked LSTM for precise RUL estimation. Liu et al. [15] created a feature-attention method for turbofan engines' RUL prediction, using a two-layer BGRU for long-term dependencies and a multilayer CNN for local sequence analysis. Recently, the Transformer has become a promising alternative for RUL prediction, with works by Chen et al. [12] and Zhang et al. [13] developing Transformer-based networks for effective long-term trend analysis in RUL estimation.

Despite significant advances in deep learning for RUL prediction, the effectiveness of these methods largely depends on the availability of extensive run-to-failure system monitoring datasets [16]. In the big data era, although collecting large datasets is increasingly feasible, acquiring comprehensive full-lifecycle data for equipment continues to pose a major challenge [17]. This lack of complete lifecycle data remains a primary obstacle to deploying deep learning techniques in RUL prediction. Data generation technologies, such as the widely used Generative Adversarial Network (GAN) introduced by Goodfellow [18], provide a creative approach to overcoming the challenges of obtaining large real-world datasets by learning to mimic actual data distributions and create realistic synthetic datasets. In PHM research, GANs are utilized to synthesize data, thereby enhancing the accuracy of health prognostics and facilitating more precise RUL predictions. Behera et al. [19] developed a prognostics framework using a Conditional GAN (CGAN) and a deep gated recurrent unit network to generate multivariate data samples and predict RUL for complex systems. Gu et al. [20] employed a Wasserstein GAN with Gradient Penalty (WGAN-GP) to create synthetic data for lithium-ion batteries, thus boosting battery health prognostic performance. Shangguan et al. [21] employed the Time-series GAN (TimeGAN) to generate synthetic degradation samples of train wheels, improving the prediction of wheel degradation; this enhancement enables more accurate RUL calculations and supports further reliability analysis of railways. Cheng et al. [22] combined dynamic mechanisms with GAN to produce diverse full-cycle degradation data, while Yang et al. [16] enhanced rolling bearing sample quality using a conditional GAN-based framework. Zhang et al. [23] developed a Convolutional Recurrent GAN (CR-GAN) to produce high-quality time-series data, improving RUL prediction techniques.

However, employing GANs to synthesize system monitoring data presents challenges, particularly with the stability of training and the convergence of the generator and discriminator components. A common problem, mode collapse, may restrict output variety and, in severe cases, lead to the repetitive generation of identical outputs, reducing data diversity [24]. Moreover, system monitoring data, often multivariate, include a broad spectrum of operational metrics from aero-engines, such as temperature, pressure, and vibration [25]. Generating such multivariate data requires careful handling of complex spatio-temporal correlations to preserve quality. Additionally, current GAN-based methods struggle to fully capture scenarios with aero-engines operating under various conditions and experiencing multiple failure modes [23]. These scenarios involve diverse altitudes, temperatures, speeds, and failure types, each contributing to data with multiple distributional attributes. Accurately capturing these intricate patterns is crucial to ensuring the fidelity of the synthetic data produced.

Building on the introduction above, this study proposes an innovative method to augment multivariate engine monitoring data, termed DiffRUL. This approach is grounded in the principles of diffusion probabilistic models, a rapidly evolving class of generative models inspired by the natural process of physical diffusion. Characteristically, diffusion models commence by integrating noise into original datasets, thereafter embarking on a process to inversely model this integration.

In the area of health indicator (HI) prediction, Chen [36] utilized diffusion models to assess the current HI of gears, employing an attention-guided multi-hierarchy LSTM for the prognosis of future HIs, thus enabling the determination of the gear's RUL upon exceeding a predetermined threshold. Lu et al. [37] developed a method for the prediction of rolling bearing RUL, integrating a coupled diffusion process with a temporal attention mechanism, and introducing a specialized network to substitute the inverse process within the coupled diffusion model for RUL prediction. Beyond these applications, the potential of exploiting diffusion models for the generation of system monitoring data to enhance RUL predictions has remained largely untapped. Accordingly, this study leverages diffusion models for the generation of multivariate engine monitoring data, adeptly capturing complex patterns and distributions, and producing samples that accurately mimic real degradation trends.

The main contributions of this study can be summarized as follows:

1) In response to scenarios involving equipment operating under diverse conditions or exhibiting multiple failure modes, and where monitoring data display varied distribution characteristics, we introduce a degradation trend encoder, DTE. This encoder effectively captures accurate representations of degradation trends in system monitoring data. These distinctive representations then act as generative conditions, enabling the generative model to more effectively learn complex patterns and distributions within the data.
2) We adapt the diffusion model to generate multivariate monitoring data, introducing an innovative method named DiffRUL. This method uses a forward diffusion process to incrementally introduce noise into the data, followed by a reverse generation process that trains a denoising network across various noise levels to progressively restore the original data samples. This technique effectively captures the probabilistic distribution characteristics of multivariate monitoring data, ensuring that the distribution of the generated samples closely matches that of authentic data and accurately reflects degradation trends seen in real systems, thereby providing an efficient way to generate multivariate system monitoring data.
3) We develop an advanced denoising network called DiffNet, specifically designed to incorporate generative conditions and to precisely capture the complex spatio-temporal correlations in multivariate system monitoring data. By accurately predicting noise levels at each diffusion timestep, DiffNet supports the DiffRUL framework, enhancing the generation of high-quality data.
4) The effectiveness of our proposed method is validated experimentally, resulting in the generation of higher-quality monitoring data and the enhancement of deep learning models' performance in predicting RUL.

This paper is organized as follows: Section 2 outlines data preparation and segmentation; Section 3 describes the proposed method; Section 4 discusses the experimental results; Section 5 concludes the paper.

2. Problem formulation

2.1. Data preparation

The system monitoring data obtained from sensors are typically multivariate. Therefore, let X = {X^(i) | i = 1, 2, …, n_engine} represent the run-to-failure monitoring data of n_engine engines, where the monitoring data for engine i is X^(i), denoted as Eq. (1):

X^(i) = {x_1^(i), x_2^(i), …, x_t^(i), …, x_{T_i}^(i)}    (1)
where T_i is the total operating time from the start of operation to failure for engine i. Let Y = {Y^(i) | i = 1, 2, …, n_engine} represent the corresponding RUL values, where Y^(i) = {y_1^(i), y_2^(i), …, y_t^(i), …, y_{T_i}^(i)}. y_t^(i) is the RUL value of engine i at time step t, representing the time interval from the current time step to the engine's failure, as shown in Eq. (2):

y_t^(i) = T_i − t    (2)

In practical applications, system degradation is typically negligible during initial operation, allowing us to disregard the linear RUL target in that early phase.

2.2. Data segmentation

A sliding time window is used to segment the normalized multivariate monitoring data for each engine, employing a window size of L and a step size of 1 [7]. This segmentation method expands the data volume of engine i from 1 to T_i − L + 1, denoted as X^(i) = {X^(i)(s) | s = 1, 2, …, T_i − L + 1}. After similar treatment of all engines' data, the total data volume expands from n_engine to N = Σ_{i=1}^{n_engine} (T_i − L + 1). For simplicity, the segmented data are still represented as X, with X = {X^(i)(s) | i = 1, 2, …, n_engine, s = 1, 2, …, T_i − L + 1}. For each segmented data sample X^(i)(s), we predict its RUL at the last time step, with the corresponding RUL value y^(i)(s) serving as its label.
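As a concrete illustration of Eqs. (1)-(2) and the sliding-window segmentation above, the following sketch builds the T_i − L + 1 segments of one engine together with their RUL labels. It is a minimal NumPy version under stated assumptions: the array shapes, function names, and the optional cap rul_max (standing in for the early-phase treatment of the linear RUL target mentioned above) are ours, not the paper's implementation.

```python
import numpy as np

def segment_engine(X_i, L, rul_max=None):
    """Split one engine's run-to-failure record X_i (shape: T_i x n_sensors)
    into windows of length L with step size 1, yielding T_i - L + 1 segments."""
    T_i = X_i.shape[0]
    segments = np.stack([X_i[s:s + L] for s in range(T_i - L + 1)])
    # Eq. (2): the RUL at time step t is T_i - t; each segment is labelled
    # with the RUL of its last time step (zero-based indices here).
    last_steps = np.arange(L - 1, T_i)       # index of each window's end
    labels = (T_i - 1) - last_steps          # 170, 169, ..., 1, 0
    if rul_max is not None:                  # optional cap on early-life RUL
        labels = np.minimum(labels, rul_max)
    return segments, labels

# Example: one engine with T_i = 200 cycles and 14 sensors, window size L = 30.
X_i = np.random.randn(200, 14)
segs, ruls = segment_engine(X_i, L=30)
print(segs.shape, ruls[:3], ruls[-1])        # (171, 30, 14) [170 169 168] 0
```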
The log-likelihood log p(X) can be decomposed as Eq. (8):

log p(X) = ∫_Z q(Z|X) log[ (p(X|Z) p(Z) / q(Z|X)) · (q(Z|X) / p(Z|X)) ] dZ
         = ∫_Z q(Z|X) log( p(X|Z) p(Z) / q(Z|X) ) dZ + D_KL( q(Z|X) ‖ p(Z|X) )
         ≥ ∫_Z q(Z|X) log( p(X|Z) p(Z) / q(Z|X) ) dZ    (8)

where D_KL represents the Kullback–Leibler (KL) divergence, a metric used to measure the difference between two probability distributions. We define L_b as:

L_b = ∫_Z q(Z|X) log( p(X|Z) p(Z) / q(Z|X) ) dZ    (9)

which is obviously the evidence lower bound (ELBO) of log p(X), so that the following equation is given:

log p(X) = L_b + D_KL( q(Z|X) ‖ p(Z|X) )    (10)

At this point, the optimization objective becomes to find p(X|Z) and q(Z|X) that maximize L_b. We rewrite L_b as follows:

L_b = ∫_Z q(Z|X) log( p(Z) / q(Z|X) ) dZ + ∫_Z q(Z|X) log p(X|Z) dZ
    = −D_KL( q(Z|X) ‖ p(Z) ) + E_{q(Z|X)}[ log p(X|Z) ]    (11)

q(Z|X) is modeled by the Encoder in DTE, and p(X|Z) by the Decoder. The Encoder receives the input data X and maps it to a latent space. It generates two outputs for each input: the mean and the variance, which represent the parameters of the probability distribution in the latent space. Since the Bidirectional Long Short-Term Memory network (Bi-LSTM) captures context information more effectively when processing time series data [40], we utilize it to construct the Encoder. The outputs of the Bi-LSTM are then passed to two separate fully connected layers, tasked with estimating the mean μ_Z and variance σ²_Z of the latent distribution q(Z|X). Direct sampling of the latent variable Z from the distribution N(μ_Z, σ²_Z) poses a challenge due to its randomness and non-differentiability, which complicates the backpropagation algorithm; the reparameterization trick is therefore used, sampling Z = μ_Z + σ_Z ⊙ ε with ε ∼ N(0, I) (Eq. (13)). Based on Eq. (11), the regularization term and the reconstruction term yield the two loss functions L_1 = D_KL( q(Z|X) ‖ p(Z) ) and L_2 = −E_{q(Z|X)}[ log p(X|Z) ].

L_1 is the regularization loss, which ensures that the latent variable distribution learned by the Encoder closely aligns with the prior distribution, thereby preventing overfitting. L_2 is the reconstruction loss, which ensures that the Decoder's output closely matches the original input data, preserving the data's integrity. The loss functions L_1 and L_2 can effectively enable the DTE to learn useful representations in the latent space, which can represent certain key features of the input data.

To guarantee that the latent representation encompasses the degradation trend of the system monitoring data, we incorporate a Regressor to predict the RUL of the system. Specifically, we use the latent variable Z to predict the RUL corresponding to the input data X and define an additional loss function L_3, as indicated in Eq. (15):

L_3 = √( (1/N) Σ_{i=1}^{n_engine} Σ_{s=1}^{T_i−L+1} ( ŷ^(i)(s) − y^(i)(s) )² )    (15)

The Regressor's structure is straightforward, consisting of a fully connected layer with a Tanh activation function and a single-neuron layer dedicated to RUL prediction. L_3 imposes an extra constraint on the DTE, compelling the model not only to learn the representation of the input data X in the latent space but also to accurately differentiate data with varying RUL values. This ensures that the latent variable Z effectively represents the degradation trend in the system monitoring data.

To further enhance the distinctiveness of representations for varying degradation trends in the latent space, we integrate contrastive learning [41] into our approach. The main idea is to spatially align data with similar degradation trends closer together in the latent space, while distancing data with differing trends. This approach bolsters the model's proficiency in identifying unique data structures and distinguishing between various degradation trends. A pivotal aspect of contrastive learning involves creating pairs of positive and negative samples. In our approach, we initially select a specific sample X^(i)(s) from the segmented data of engine i as the anchor sample, with its corresponding RUL marked as y^(i)(s). We then choose a positive sample X^(i)(s)_pos and a negative sample X^(i)(s)_neg from other segmented data of the same engine, basing our selection on their RUL values, y^(i)(s)_pos and y^(i)(s)_neg, which must satisfy certain criteria. Specifically, y^(i)(s)_pos should align with |y^(i)(s) − y^(i)(s)_pos| ≤ δ_1, while y^(i)(s)_neg must differ from y^(i)(s) by more than a second, larger threshold. The resulting loss L_4 takes the standard triplet (margin) form:

L_4 = max( d(Z^(i)(s), Z^(i)(s)_pos) − d(Z^(i)(s), Z^(i)(s)_neg) + α, 0 )

where d(Z^(i)(s), Z^(i)(s)_pos) = ‖Z^(i)(s) − Z^(i)(s)_pos‖_2 represents the distance between the representation Z^(i)(s) of the anchor sample in the latent space and the representation Z^(i)(s)_pos of the positive sample, d(Z^(i)(s), Z^(i)(s)_neg) = ‖Z^(i)(s) − Z^(i)(s)_neg‖_2 represents the distance between the representation Z^(i)(s) of the anchor sample and the representation Z^(i)(s)_neg of the negative sample, and α is a predefined margin value.
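A minimal NumPy sketch of this margin objective is given below for illustration; the batched interface and names are our assumptions, and the value α = 0.4 anticipates the setting reported in Section 4.

```python
import numpy as np

def triplet_loss(z_anchor, z_pos, z_neg, alpha=0.4):
    """Mean hinge loss over a batch of latent triplets (shape: batch x latent_dim).
    Pushes d(anchor, pos) below d(anchor, neg) by at least the margin alpha."""
    d_pos = np.linalg.norm(z_anchor - z_pos, axis=-1)  # d(Z, Z_pos): L2 distance
    d_neg = np.linalg.norm(z_anchor - z_neg, axis=-1)  # d(Z, Z_neg)
    return float(np.maximum(d_pos - d_neg + alpha, 0.0).mean())
```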
The optimization objective of L_4 is to ensure that the distance between the representations of an anchor sample and a positive sample in the latent space is smaller than the distance between the anchor sample and a negative sample by at least a margin α. Achieving this goal results in zero loss; otherwise, a penalty is imposed based on the discrepancy. The advantage of adding the margin α is to require the model to push dissimilar samples apart by at least a preset distance margin [44], thus enhancing the distinctiveness of sample representations. Furthermore, this margin acts as a form of regularization, preventing the model from perfectly distinguishing training data with the minimal distance between positive and negative pairs, thereby improving its generalization ability. Through this approach, a latent space can be learned where samples with similar degradation trends cluster together, while samples with different degradation trends are further apart. This enhances the distinctiveness of samples with different degradation trends in the latent space, more effectively supporting the subsequent conditional generation.

The forward diffusion process introduces controllable amounts of Gaussian noise into X_0. Specifically, at each step of the Markov chain, Gaussian noise with variance β_t is added to X_{t−1}, generating a new variable X_t that follows the distribution q(X_t|X_{t−1}), as depicted in Eq. (18):

q(X_t | X_{t−1}) = N( X_t ; μ_t = √(1 − β_t) X_{t−1}, Σ_t = β_t I )    (18)

Notably, q(X_t|X_{t−1}) is still a Gaussian distribution, defined by mean μ_t and variance Σ_t, where μ_t = √(1 − β_t) X_{t−1} and Σ_t = β_t I. Here, {β_t}_{t=1}^T, the variance schedule, is an adjustable hyperparameter used to control the intensity of the noise. In this study, we adopt a linear schedule for β_t, where its value linearly increases from β_1 to β_T. After T steps of diffusion, the original distribution q(X_0) eventually transforms into a simple Gaussian distribution q(X_T), where T is also an adjustable hyperparameter. The forward process is expressed in Eq. (19):

q(X_{1:T} | X_0) = ∏_{t=1}^{T} q(X_t | X_{t−1})    (19)
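The forward process is easy to simulate. The sketch below uses the linear schedule with β_1 = 0.0004, β_T = 0.05, and T = 50 (the settings later reported in Section 4) and draws X_t from q(X_t | X_0) in one shot via the cumulative product ᾱ_t = ∏_s α_s — the closed-form identity the text later references as Eq. (20). Shapes and names are illustrative assumptions.

```python
import numpy as np

T = 50
betas = np.linspace(4e-4, 0.05, T)        # linear schedule beta_1 ... beta_T
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)            # \bar{alpha}_t = prod_{s<=t} alpha_s

def q_sample(x0, t, rng=np.random.default_rng()):
    """Draw X_t ~ q(X_t | X_0): sqrt(alpha_bar)*X_0 + sqrt(1 - alpha_bar)*eps."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

x0 = np.random.randn(30, 14)              # one segment: L x n_sensors
x_T, eps = q_sample(x0, t=T - 1)          # nearly pure Gaussian noise after T steps
```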
Starting from X_T, the original data distribution can then be recovered through the reverse generation process, thereby generating new data points. However, directly calculating q(X_{t−1}|X_t) is challenging as it involves statistical parameter estimation of the real distribution q(X_0). To address this, we utilize a parameterized neural network, denoted as p_ϕ, to approximate q(X_{t−1}|X_t). Evidently, the distribution p_ϕ(X_{t−1}|X_t) of the reverse process and the distribution q(X_t|X_{t−1}) of the forward process share the same Gaussian form. Thus, we only need to parameterize and approximate their means and variances, as shown in Eq. (22):

p_ϕ(X_{t−1} | X_t) = N( X_{t−1} ; μ_ϕ(X_t, t), Σ_ϕ(X_t, t) )    (22)

If we start with X_T and gradually remove the Gaussian noise until we fully restore the original monitoring data distribution of X_0, then this reverse generation process can be represented as Eq. (23):

p_ϕ(X_{0:T}) = p_ϕ(X_T) ∏_{t=1}^{T} p_ϕ(X_{t−1} | X_t)    (23)

3.2.3. Training and sampling
To estimate the parameters of p_ϕ, we use the maximum likelihood estimation method, focusing on maximizing the log-likelihood function log p_ϕ(X_0). Directly solving log p_ϕ(X_0) is rather complex, so this study instead optimizes its evidence lower bound [26], which can be expressed as Eq. (24):

log p_ϕ(X_0) ≥ E_{q(X_1|X_0)}[ log p_ϕ(X_0|X_1) ] − D_KL( q(X_T|X_0) ‖ p_ϕ(X_T) ) − Σ_{t=2}^{T} E_{q(X_t|X_0)}[ D_KL( q(X_{t−1}|X_t, X_0) ‖ p_ϕ(X_{t−1}|X_t) ) ]
             = L_0 − L_T − Σ_{t=2}^{T} L_{t−1}    (24)

where L_T = D_KL( q(X_T|X_0) ‖ p_ϕ(X_T) ), lacking trainable parameters, can be treated as a constant and ignored during model training. L_{t−1} = E_{q(X_t|X_0)}[ D_KL( q(X_{t−1}|X_t, X_0) ‖ p_ϕ(X_{t−1}|X_t) ) ] represents the distribution discrepancy between q(X_{t−1}|X_t, X_0), the true reverse process under the known condition of X_0, and p_ϕ(X_{t−1}|X_t), the parameterized neural network used to approximate q(X_{t−1}|X_t). L_0 is essentially a specific instance of L_{t−1} at t = 1. Thus, our primary focus is on L_{t−1}, aiming to minimize the discrepancy between p_ϕ(X_{t−1}|X_t) and q(X_{t−1}|X_t, X_0). It can be demonstrated that q(X_{t−1}|X_t, X_0) also follows a Gaussian distribution [26], which can be expressed as Eq. (25):

q(X_{t−1} | X_t, X_0) = N( X_{t−1} ; μ̃_t(X_t, X_0), β̃_t I )    (25)

where the variance β̃_t and mean μ̃_t(X_t, X_0) are given by Eq. (26) and Eq. (27):

β̃_t = ( (1 − ᾱ_{t−1}) / (1 − ᾱ_t) ) β_t    (26)

μ̃_t(X_t, X_0) = ( √α_t (1 − ᾱ_{t−1}) / (1 − ᾱ_t) ) X_t + ( √ᾱ_{t−1} β_t / (1 − ᾱ_t) ) X_0    (27)

Since ᾱ_{t−1}, ᾱ_t, and β_t are constants, the variance β̃_t is a fixed value independent of X_t. Based on Eq. (20), we derive Eq. (28):

X_0 = (1/√ᾱ_t) ( X_t − √(1 − ᾱ_t) ε )    (28)

Combining Eq. (27) and (28), we can deduce that the mean μ̃_t depends solely on X_t, as shown in Eq. (29):

μ̃_t(X_t) = (1/√α_t) ( X_t − ( β_t / √(1 − ᾱ_t) ) ε )    (29)

As the variance β̃_t is a constant and does not affect the optimization process, we only need a denoising network ε_ϕ to approximate the Gaussian noise ε, as illustrated in Eq. (30):

μ_ϕ(X_t, t) = (1/√α_t) ( X_t − ( β_t / √(1 − ᾱ_t) ) ε_ϕ(X_t, t) )    (30)

The denoising network ε_ϕ is a neural network that predicts the Gaussian noise ε based on the time step t and the noised data X_t. Details of the model structure will be presented in Section 3.3.1.

Since both p_ϕ(X_{t−1}|X_t) and q(X_{t−1}|X_t, X_0) follow Gaussian distributions, the KL divergence is only related to their means and variances. Ignoring the variance term, L_{t−1} can be simplified to Eq. (31):

L_{t−1} = E_{X_0, ε, t}[ ( β_t² / ( 2 α_t (1 − ᾱ_t) ‖Σ_ϕ‖²_2 ) ) ‖ ε − ε_ϕ( √ᾱ_t X_0 + √(1 − ᾱ_t) ε, t ) ‖²_2 ]    (31)

Further discarding the weighting term [26,28], Eq. (31) reduces to Eq. (32):

L^simple_{t−1} = E_{X_0, ε, t}[ ‖ ε − ε_ϕ( √ᾱ_t X_0 + √(1 − ᾱ_t) ε, t ) ‖²_2 ]    (32)

Thus, the training process can be summarized as follows: for an arbitrary time step t and its corresponding noised data X_t = √ᾱ_t X_0 + √(1 − ᾱ_t) ε, the denoising network ε_ϕ predicts the Gaussian noise ε added at time step t, and the loss function is computed according to Eq. (32). By this method, we can effectively optimize the log-likelihood function log p_ϕ(X_0), thus completing the training of the denoising network ε_ϕ. After this training process, we can start with X_T, progressively remove the Gaussian noise, and ultimately obtain sampled data from the real distribution q(X_0). Specifically, by combining Eq. (30) and (22), we derive Eq. (33):

X_{t−1} ∼ p_ϕ(X_{t−1} | X_t) = (1/√α_t) ( X_t − ( β_t / √(1 − ᾱ_t) ) ε_ϕ(X_t, t) ) + √β̃_t π    (33)

where π ∼ N(0, I). Lastly, applying Eq. (23) enables us to procure the desired sampled data.
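Read together, Eq. (32) and Eq. (33) amount to one short training step and one short sampling step. The following NumPy sketch shows both, with a placeholder eps_net standing in for the denoising network ε_ϕ; the convention of adding no noise at the final step is a standard DDPM detail assumed here rather than stated explicitly above.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50
betas = np.linspace(4e-4, 0.05, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)
# posterior variance \tilde{beta}_t of Eq. (26); alpha_bar_{t-1} = 1 at t = 0
alpha_bar_prev = np.concatenate([[1.0], alpha_bar[:-1]])
beta_tilde = (1.0 - alpha_bar_prev) / (1.0 - alpha_bar) * betas

def eps_net(xt, t):                      # placeholder for the denoising network
    return np.zeros_like(xt)

def training_loss(x0):
    """L_simple, Eq. (32): predict the noise injected at a random time step."""
    t = rng.integers(T)
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps
    return np.mean((eps - eps_net(xt, t)) ** 2)

def p_sample(xt, t):
    """One reverse step, Eq. (33): denoise X_t, then add scaled Gaussian noise."""
    mean = (xt - betas[t] / np.sqrt(1 - alpha_bar[t]) * eps_net(xt, t)) / np.sqrt(alphas[t])
    if t == 0:
        return mean                      # no noise is added at the final step
    return mean + np.sqrt(beta_tilde[t]) * rng.standard_normal(xt.shape)
```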
3.3. Detailed designs for DiffRUL

3.3.1. Denoising network
To effectively capture the complex spatio-temporal correlations in multivariate monitoring data and accurately predict noise levels at each step, we meticulously design the architecture of the denoising network, named DiffNet. This network adopts a configuration akin to that of the inverted bottleneck [45], as illustrated in Fig. 3. It begins by expanding the channel number of the input data to N_c through a 1 × 1 convolution layer. Then, it captures spatio-temporal correlations in a high-dimensional space through a feature extraction network and finally compresses the channel number back through another 1 × 1 convolution layer. The key advantage of this design strategy is that, by increasing the channel number of the input data, the network can more effectively capture and learn various complex features. This allows each channel to focus on detecting different patterns or aspects of the input data, thereby enriching the feature set learned by the model.

The function of the feature extraction network is to apprehend the spatio-temporal correlations inherent in multivariate system monitoring data within a high-dimensional space. Drawing inspiration from [47], the structural foundation of the feature extraction network is predicated on stacked dilated convolutions, as depicted in Fig. 4. Specifically, the feature extraction network consists of N_d dilated layers, divided into m dilation blocks, with each block containing n = N_d/m dilated layers. For the n dilated layers within each dilation block, the dilation factor of the dilated convolution is {2^i}_{i=0}^{n−1}. To expedite model convergence and facilitate the training of deeper networks, we integrate residual connections into each dilated layer. Moreover, we implement gated activation units [47] within these layers for non-linearity, followed by a 3 × 3 convolution for feature extraction.

Each dilated layer features two additional inputs: the diffusion time step and the generative condition derived from DTE, with a comprehensive explanation provided in Section 3.3.2. This section focuses primarily on incorporating the diffusion time step, crucial for model training as the denoising network predicts Gaussian noise based on the diffusion time step t and the corresponding noised data X_t. To facilitate recognition of varying diffusion steps, we employ a time step embedding scheme akin to positional encoding in Attention mechanisms [46]. For each diffusion time step t, a d-dimensional vector t_e is used as its embedding. For i = 0, 1, …, d/2 − 1, the calculation method for this vector can be expressed as t_e(i) = sin( 10^{4i/(d/2−1)} t ) and t_e(i + d/2) = cos( 10^{4i/(d/2−1)} t ). t_e then goes through three fully connected layers. The first two layers employ the GELU activation function, and their parameters are shared across all dilated layers. The final layer has unique parameters for each dilated layer. Its output is added to the input of the dilated convolution, serving as the diffusion time step embedding for that dilated layer. Finally, we sum the skip connections of all dilated layers and, after processing through a 3 × 3 convolution, a batch normalization layer, and a GELU activation function, reconstruct the data.

3.3.2. Guided generation
To enhance the fidelity of degradation trend representations within data samples produced by DiffRUL, we extend the unconditional DiffRUL to a conditional one. This extension integrates degradation trend representations from DTE as generative conditions into DiffRUL's reverse process. Specifically, the process initially defined by Eq. (22) and (23) is modified to Eq. (34) and (35):

p_ϕ(X_{t−1} | X_t, Z) = N( X_{t−1} ; μ_ϕ(X_t, t, Z), Σ_ϕ(X_t, t) )    (34)

p_ϕ(X_{0:T} | Z) = p_ϕ(X_T) ∏_{t=1}^{T} p_ϕ(X_{t−1} | X_t, Z)    (35)

Accordingly, the loss function for training, initially defined in Eq. (32), is revised to Eq. (36):

L^simple_{t−1} = E_{X_0, ε, t}[ ‖ ε − ε_ϕ( √ᾱ_t X_0 + √(1 − ᾱ_t) ε, t, Z ) ‖²_2 ]    (36)

The data sampling method defined in Eq. (33) is also adjusted to Eq. (37):

X_{t−1} ∼ p_ϕ(X_{t−1} | X_t, Z) = (1/√α_t) ( X_t − ( β_t / √(1 − ᾱ_t) ) ε_ϕ(X_t, t, Z) ) + √β̃_t π    (37)

In the extended DiffRUL, when predicting the Gaussian noise ε at time step t, the denoising network additionally receives the degradation trend representation Z as the generative condition. It is important to note that during the training of DiffRUL, the weights of the Encoder are kept frozen. The specific training and sampling procedures for the extended DiffRUL are described in Algorithm 1.

Algorithm 1. Training and sampling process of DiffRUL.
Input: Diffusion time step T; variance schedule {β_t}_{t=1}^T; {α_t = 1 − β_t}_{t=1}^T; {ᾱ_t = ∏_{s=1}^{t} α_s}_{t=1}^T; real system monitoring data distribution q(X_0); pre-trained Encoder in DTE.
Output: Denoising network ε_ϕ; sampled monitoring data X_0^sample from the real distribution q(X_0).
Training process:
1. For each iteration do
2.   Sample real system monitoring data X_0 from q(X_0).
3.   Sample the time step t from the uniform distribution {1, 2, …, T}.
4.   Sample the Gaussian noise ε from N(0, I).
5.   Obtain the representation Z of X_0 through the Encoder.
6.   Predict the Gaussian noise ε with the denoising network ε_ϕ, and calculate the loss by Eq. (36).
7.   Update the parameters ϕ of ε_ϕ using gradient descent.
8. End for
Sampling process:
9. Sample X_T from N(0, I).
10. Sample real system monitoring data X_0 from q(X_0).
11. Obtain the representation Z of X_0 through the Encoder.
12. For t = T, T − 1, …, 1 do
13.   Sample a random variable π from N(0, I).
14.   Sample X_{t−1} from p_ϕ(X_{t−1} | X_t, Z) by Eq. (37).
15. End for

4. Experiments

The N-CMAPSS dataset delineates the degradation trajectory of aero-engines from normal operation to failure under actual flight conditions, featuring two distinct failure modes: the abnormal high-pressure turbine (HPT) and low-pressure turbine (LPT). This dual-mode configuration signifies a more complex degradation pattern compared to single-failure scenarios. In our experiments, we utilize eight engines from subsets DS01, DS02, DS03, and DS04 to validate the proposed method, with details in Table 2. We employ data from 20 sensors per engine for model training, as cited in [49], and the dataset provides accurate RULs, facilitating precise label assignment. Data normalization and segmentation are conducted in accordance with Eq. (4) and the procedures in Section 2.2, respectively.

Table 2
Details of the N-CMAPSS dataset used in this study.
Subsets | Data type | Engines | Failure modes
DS01 | Training data | Unit 4 | HPT
DS01 | Testing data | Unit 7 | HPT
DS02 | Training data | Unit 20 | HPT + LPT
DS02 | Testing data | Unit 15 | HPT + LPT
DS03 | Training data | Unit 5 | HPT
DS03 | Testing data | Unit 12 | HPT
DS04 | Training data | Unit 1 | HPT
DS04 | Testing data | Unit 9 | HPT

The Score metric is computed as Eq. (39):

Score = Σ_{i=1}^{n} s_i,  s_i = { e^{ −( ŷ_t^(i) − y_t^(i) ) / 13 } − 1,  if ŷ_t^(i) − y_t^(i) < 0
                                { e^{ ( ŷ_t^(i) − y_t^(i) ) / 10 } − 1,   if ŷ_t^(i) − y_t^(i) ≥ 0    (39)

where y_t^(i) is the actual RUL value of engine i at time step t, while ŷ_t^(i) is the predicted RUL value.
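The two evaluation metrics can be written in a few lines of Python; the late-prediction branch (divisor 10) appears in Eq. (39), the early-prediction branch (divisor 13) follows the standard C-MAPSS scoring convention, and the RMSE below is the usual root-mean-square error.

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2)))

def score(y_true, y_pred):
    """Asymmetric C-MAPSS score: late predictions (d >= 0) are penalised
    more heavily (divisor 10) than early ones (divisor 13)."""
    d = np.asarray(y_pred) - np.asarray(y_true)
    s = np.where(d < 0, np.exp(-d / 13.0) - 1.0, np.exp(d / 10.0) - 1.0)
    return float(np.sum(s))
```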
Fig. 5. The degradation trajectories of real data and sampled data in FD001.
The margin α in the loss function L_4 is set to α = 0.4. Because the loss function L_4 is significantly lower than L_1, L_2, and L_3, to ensure that the model does not neglect L_4 during training we set the coefficient λ_4 higher than the other coefficients. Specifically, we utilize the grid search method, setting λ_1, λ_2, and λ_3 to 1 and defining the search space for λ_4 as {5, 10, 15, 20}. After conducting validation on the C-MAPSS dataset, we determine that the coefficient combination with the best validation performance is λ_1 = 1, λ_2 = 1, λ_3 = 1, and λ_4 = 10. The time step of DiffRUL is set to T = 50, and the variance of the Gaussian noise β_t is linearly increased from β_1 = 0.0004 to β_T = 0.05. In DiffNet, the number of channels N_c is set to 64, the number of dilated layers N_d to 30, the number of dilation blocks m to 3, and the dimension of the diffusion time step embedding d to 128. In augmenting the dataset, we encode the real engine monitoring data using DTE to capture their degradation trends. These representations then serve as generative conditions for producing sampled data via DiffRUL, effectively doubling the quantity of the original training data. To confirm the robustness of our results, we conduct 10 trials per subset for all methods and report the average outcomes.

4.4. Experiment results on the C-MAPSS dataset

4.4.1. Results of the generated data
To evaluate DiffRUL's proficiency in mimicking the distribution and degradation trends of system monitoring data, we analyze the complete degradation trajectories of engines running to failure, comparing real and sampled data. Specifically, we first randomly select an engine from the subset FD001 and use DTE to encode its monitoring data, obtaining representations of its degradation trends. Subsequently, these representations are used as generative conditions in the sampling process of DiffRUL to generate sampled data. Since the real engine data are processed in segments, the resulting sampled data are also segmented. We concatenate these segmented samples by retaining the value at the last time step of each segment, in the order of data segmentation, ultimately obtaining a complete sample of the engine running until failure. Fig. 5 visually contrasts the real data with the generated samples, with each subplot illustrating the trajectory of a sensor's monitoring data over time. Following the description in Section 4.1, we retain only the data from 14 valuable sensors. In the figure, real data are shown in blue and sampled data in yellow. Across all subplots, the congruent degradation trends between the blue and yellow lines validate DiffRUL's ability to capture the global degradation information of real engine monitoring data. Additionally, the presence of local fluctuations in both real and sampled data, with the sampled data closely following the real data, underscores DiffRUL's ability to capture nuanced local degradation details. Moreover, no significant anomalies or unnatural phenomena appear in the sampled data, suggesting that the data generated by DiffRUL are of high quality and do not introduce large errors or unrealistic values. Finally, in the monitoring data from different sensors, DiffRUL consistently fits the real data's degradation trends well, maintaining a comparably consistent performance level. This is equally crucial for enhancing the subsequent prediction of RUL.
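One reading of the concatenation rule just described, as a sketch: since the windows advance with step size 1, keeping the last time step of each generated segment reassembles a trajectory of T_i − L + 1 points in segmentation order. Names and shapes are assumptions.

```python
import numpy as np

def concat_segments(segments):
    """segments: (n_segments, L, n_sensors) array of generated windows.
    Keeping each window's final time step yields one (n_segments, n_sensors)
    run-to-failure trajectory, ordered as the windows were cut."""
    return segments[:, -1, :]
```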
Table 3
Comparison results of RMSE and Score with different methods on the C-MAPSS dataset (columns: FD001 / FD002 / FD003 / FD004).

Methods | RMSE (FD001, FD002, FD003, FD004) | Score (FD001, FD002, FD003, FD004)
CNN-based Method | 16.65, 18.72, 15.62, 22.77 | 461.25, 2448.96, 513.58, 4109.14
CNN-based Method + DiffRUL | 13.71, 18.36, 12.92, 21.15 | 259.36, 1749.45, 336.95, 2850.36
Gain | 17.65%, 1.95%, 17.32%, 7.10% | 43.77%, 28.56%, 34.39%, 30.63%
RNN-based Method | 13.79, 17.55, 13.75, 20.31 | 351.60, 1711.56, 375.11, 2784.96
RNN-based Method + DiffRUL | 12.78, 16.59, 12.36, 21.13 | 256.68, 1200.43, 289.90, 2091.82
Gain | 7.36%, 5.48%, 10.13%, -4.01% | 27.00%, 29.86%, 22.72%, 24.89%
Hybrid Method | 13.53, 18.01, 13.37, 21.20 | 293.62, 1848.61, 396.67, 2449.59
Hybrid Method + DiffRUL | 12.65, 17.51, 13.47, 19.63 | 254.69, 1270.47, 357.83, 1884.86
Gain | 6.54%, 2.76%, -0.76%, 7.40% | 13.26%, 31.27%, 9.79%, 23.05%
Transformer-based Method | 12.60, 16.99, 13.52, 19.49 | 271.59, 1401.99, 314.41, 2097.70
Transformer-based Method + DiffRUL | 11.71, 15.90, 11.77, 18.43 | 199.29, 1162.84, 240.37, 1627.37
Gain | 7.02%, 6.41%, 12.94%, 5.45% | 26.62%, 17.06%, 23.55%, 22.42%
4.4.2. Results of RUL prediction
To verify whether the data generated by DiffRUL can effectively enhance the performance of deep learning models in RUL prediction tasks, we integrate the sampled data with real data for data augmentation, and then use this augmented data for RUL prediction. Specifically, we conduct comparative experiments using different types of model architectures, including the convolutional neural network-based method, the recurrent neural network-based method, the hybrid method, and the transformer-based method. The fundamental designs of these methods are inspired by established RUL prediction models, specifically CNN [6], LSTM [10], TCMN [14], and Transformer [12]. Given that some of these models were proposed earlier, we have optimized their structures and parameters, applied more advanced activation functions and normalization techniques, and employed Dropout to prevent overfitting. Specific configuration details are provided in Tables D1 to D4 in the Appendix.
To ensure the fairness and comparability of the experimental results, all methods use the same data processing approaches (including sensor selection, data preprocessing, and data segmentation) and the same training approaches (including optimizer choice, learning rate adjustment strategy, and parameter initialization method), and are run on the same hardware. Table 3 presents the comparative results, demonstrating that DiffRUL significantly enhances the performance of the baseline methods in nearly all cases. Crucially, DiffRUL improves predictions not only in subsets with limited operating conditions and failure modes, like FD001 and FD003, but also in more varied subsets such as FD002 and FD004. This suggests that DiffRUL effectively captures complex patterns and distributions in diverse and complex data scenarios, generating high-quality sampled data that substantially increases the accuracy of RUL predictions.

4.4.3. Comparison with existing methods
To further validate the effectiveness of our proposed DiffRUL, we compare it with four established generative algorithms: CGAN [19], WGAN-GP [20], TimeGAN [21], and CR-GAN [23], with results shown in Figs. 6 to 9. Fig. 6 illustrates the performance of the CNN-based method and its enhancement through CGAN, WGAN-GP, TimeGAN, CR-GAN, and DiffRUL across the four subsets. The introduction of DiffRUL notably improves the RMSE and Score in all subsets, outperforming the other generative algorithms, which show inconsistent effects across different subsets. Specifically, while they provide some improvement in FD001 and FD003, their enhancements are less substantial than those achieved by DiffRUL. In FD002 and FD004, they even negatively impact the base method's performance. Figs. 7, 8, and 9 display the performance of the RNN-based, hybrid, and Transformer-based methods and their variants enhanced by CGAN, WGAN-GP, TimeGAN, CR-GAN, and DiffRUL across the four subsets. Analysis of these figures leads to a similar conclusion: DiffRUL improves the RMSE and Score in almost all subsets, highlighting its effectiveness. Conversely, the other four generative algorithms lack consistent performance enhancements, showing benefits in FD001 and FD003 but limited or adverse effects in FD002 and FD004. Synthesizing the results from Figs. 6 to 9, DiffRUL significantly outperforms the other four generative algorithms in enhancing RUL prediction effectiveness, with CR-GAN also surpassing CGAN, WGAN-GP, and TimeGAN. To further explore the performance differences, we calculate gain rates in RMSE and Score for methods using CR-GAN and DiffRUL, presented in Figs. 10 and 11. The results show that CR-GAN improves RMSE in FD001 and FD003, with gains of 6.35% and 7.32%, respectively, but decreases RMSE in FD002 and FD004 by 2.91% and 4.08%, with similar trends in Score. Conversely, DiffRUL enhances performance across nearly all subsets and notably outperforms CR-GAN in the more challenging FD002 and FD004. In summary, while CGAN, WGAN-GP, TimeGAN, and CR-GAN falter in complex subsets like FD002 and FD004, leading to suboptimal enhancements, DiffRUL not only excels in the simpler subsets (FD001 and FD003) but also shows marked improvement in the more complex ones (FD002 and FD004).
Fig. 10. The gain rates of RMSE for different methods enhanced by CR-GAN and DiffRUL.
Fig. 11. The gain rates of Score for different methods enhanced by CR-GAN and DiffRUL.
These findings highlight DiffRUL's widespread applicability and superior efficacy in RUL prediction, particularly in scenarios demanding accurate capture and replication of complex data distributions.

4.4.4. Ablation experiments
In this section, we perform ablation experiments to validate that DTE effectively extracts distinctive representations of degradation trends from the system monitoring data, aiding DiffRUL in generating superior samples. Initially, we encode the monitoring data from all engines across FD001, FD002, FD003, and FD004 with DTE to obtain these representations, visualized in Fig. 12. As the full-lifecycle monitoring data for each engine are segmented, DTE produces a series of representations for each engine, each linked to a specific RUL value. In Fig. 12, each dot represents these representations within the latent space, color-coded to reflect an engine's RUL value, as indicated by the color bar. A yellow region in each subplot's lower-left corner denotes the engine's early operational stage, representing a longer RUL. As degradation progresses, the dots move towards the green and ultimately to the deep blue area in the upper-right corner, signaling imminent failure. Additionally, we randomly select an engine in each subset and highlight its series of representations in distinct colors, as shown in Fig. 12. These visual results clearly demonstrate DTE's capability to effectively distinguish various degradation trends, exemplified by engine unit #78 in FD001, where the representation shifts from the yellow area, indicating a longer RUL, to the deep blue area, indicating a shorter RUL, culminating in failure.
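A Fig. 12-style view can be produced with a few lines of matplotlib; here Z and rul are placeholders for the DTE representations (assumed already reduced to two dimensions, e.g. by t-SNE) and their RUL labels.

```python
import numpy as np
import matplotlib.pyplot as plt

Z = np.random.randn(500, 2)                  # degradation-trend representations
rul = np.random.uniform(0, 125, size=500)    # RUL value linked to each dot

sc = plt.scatter(Z[:, 0], Z[:, 1], c=rul, cmap="viridis", s=8)
plt.colorbar(sc, label="RUL")
plt.xlabel("latent dim 1")
plt.ylabel("latent dim 2")
plt.show()
```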
Fig. 13. Degradation trend representations obtained by DTE without contrastive learning on four subsets.
Table 4
Comparison results of RMSE and Score augmented by different methods on the C-MAPSS dataset (columns: FD001 / FD002 / FD003 / FD004).

Base Architecture | Augmented Methods | RMSE (FD001, FD002, FD003, FD004) | Score (FD001, FD002, FD003, FD004)
CNN-based | Method I | 15.06, 19.94, 13.55, 21.85 | 403.07, 2812.83, 398.59, 4460.15
CNN-based | Method II | 14.88, 19.67, 13.77, 22.08 | 297.83, 1839.75, 360.26, 3072.19
CNN-based | Proposed | 13.71, 18.36, 12.92, 21.15 | 259.36, 1749.45, 336.95, 2850.36
RNN-based | Method I | 12.85, 17.85, 12.89, 21.58 | 284.69, 1910.51, 272.98, 3183.54
RNN-based | Method II | 13.08, 17.08, 12.81, 21.12 | 282.75, 1324.19, 253.22, 2123.29
RNN-based | Proposed | 12.78, 16.59, 12.36, 21.13 | 256.68, 1200.43, 289.90, 2091.82
Hybrid | Method I | 13.09, 18.86, 13.70, 23.16 | 254.00, 2368.68, 365.16, 4018.00
Hybrid | Method II | 12.99, 17.59, 13.30, 21.54 | 250.67, 1502.43, 336.30, 2879.97
Hybrid | Proposed | 12.65, 17.51, 13.47, 19.63 | 254.69, 1270.47, 357.83, 1884.86
Transformer-based | Method I | 12.72, 16.38, 12.57, 19.28 | 278.88, 1372.57, 285.56, 2196.97
Transformer-based | Method II | 12.59, 16.70, 13.25, 18.63 | 261.09, 1090.46, 287.21, 1786.53
Transformer-based | Proposed | 11.71, 15.90, 11.77, 18.43 | 199.29, 1162.84, 240.37, 1627.37
Secondly, to evaluate the impact of integrating contrastive learning within DTE to enhance the distinctiveness of degradation trend representations in the latent space, we set the coefficient of the triplet loss function to zero in the joint optimization framework of DTE. We then re-visualize the engine representations, with the results displayed in Fig. 13. Compared to Fig. 12, the representations in Fig. 13 show notably less distinctiveness, with degradation trends clustering more closely in the latent space. For instance, engine unit #167 in FD004 exhibits a clear degradation trajectory in Fig. 12, while in Fig. 13 its representations are more clustered, displaying a less defined degradation path. This contrast highlights the significance of incorporating contrastive learning, as it enables the model to more effectively discern and represent different degradation trends, capturing more distinctive data structures and patterns.

Finally, to validate that the degradation trend representations extracted by DTE can guide DiffRUL in generating higher-quality data samples, we conduct a comparative experiment. This experiment compares the performance improvements of the base RUL prediction methods enhanced by DiffRUL, using representations extracted by DTE versus using actual RUL values directly as generative conditions [23]. Additionally, to further demonstrate the effectiveness of contrastive learning in DTE, we set the triplet loss function coefficient to zero and replicate the experiment with this modified version. The results are detailed in Table 4, where Method I represents DiffRUL using actual RUL values as generative conditions, and Method II applies DiffRUL with the triplet loss function coefficient set to zero.
Fig. 14. The degradation trajectories of real data and sampled data in DS01.
Table 5
Comparison results of RMSE and Score augmented by different methods on the N-CMAPSS dataset (columns: DS01 / DS02 / DS03 / DS04).

Base Architecture | Augmented Methods | RMSE (DS01, DS02, DS03, DS04) | Score (DS01, DS02, DS03, DS04)
CNN-based | Non-augmentation | 7.18, 8.40, 10.60, 11.34 | 78.80, 53.03, 111.95, 125.82
CNN-based | CGAN | 7.20, 9.44, 10.95, 11.12 | 74.33, 63.43, 119.53, 121.59
CNN-based | Gain | -0.28%, -12.38%, -3.30%, 1.94% | 5.67%, -19.61%, -6.77%, 3.36%
CNN-based | WGAN-GP | 7.13, 9.21, 11.07, 10.95 | 76.52, 64.84, 118.70, 111.25
CNN-based | Gain | 0.70%, -9.64%, -4.43%, 3.44% | 2.89%, -22.27%, -6.03%, 11.58%
CNN-based | TimeGAN | 7.01, 8.86, 10.14, 10.77 | 70.48, 60.50, 108.71, 112.46
CNN-based | Gain | 2.37%, -5.48%, 4.34%, 5.03% | 10.56%, -14.09%, 2.89%, 10.62%
CNN-based | CR-GAN | 6.67, 8.97, 9.88, 10.47 | 65.85, 55.69, 113.88, 108.75
CNN-based | Gain | 7.10%, -6.79%, 6.79%, 7.67% | 16.43%, -5.02%, -1.72%, 13.57%
CNN-based | Proposed | 6.30, 7.47, 9.90, 9.56 | 57.43, 42.98, 104.62, 95.49
CNN-based | Gain | 12.26%, 11.07%, 6.60%, 15.70% | 27.12%, 18.95%, 6.55%, 24.11%
RNN-based | Non-augmentation | 6.61, 7.83, 10.26, 10.90 | 70.38, 51.64, 105.19, 117.68
RNN-based | CGAN | 6.48, 8.08, 10.23, 10.79 | 64.39, 56.91, 103.42, 121.48
RNN-based | Gain | 1.97%, -3.19%, 0.29%, 1.01% | 8.51%, -10.21%, 1.68%, -3.23%
RNN-based | WGAN-GP | 6.44, 8.11, 10.15, 10.66 | 62.42, 55.67, 106.31, 112.28
RNN-based | Gain | 2.57%, -3.58%, 1.07%, 2.20% | 11.31%, -7.80%, -1.06%, 4.59%
RNN-based | TimeGAN | 6.52, 8.01, 9.89, 10.37 | 59.77, 53.85, 99.89, 108.75
RNN-based | Gain | 1.36%, -2.30%, 3.61%, 4.86% | 15.08%, -4.28%, 5.04%, 7.59%
RNN-based | CR-GAN | 5.65, 7.99, 9.58, 9.89 | 54.85, 50.31, 97.57, 103.23
RNN-based | Gain | 14.52%, -2.04%, 6.63%, 9.27% | 22.07%, 2.58%, 7.24%, 12.28%
RNN-based | Proposed | 5.72, 6.93, 8.97, 9.36 | 51.09, 40.62, 84.09, 94.42
RNN-based | Gain | 13.46%, 11.49%, 12.57%, 14.13% | 27.41%, 21.34%, 20.06%, 19.77%
Hybrid | Non-augmentation | 6.39, 6.78, 9.77, 10.14 | 69.57, 48.26, 98.91, 103.83
Hybrid | CGAN | 6.52, 7.02, 9.67, 9.89 | 71.02, 57.60, 97.80, 101.28
Hybrid | Gain | -2.03%, -3.54%, 1.02%, 2.47% | -2.08%, -19.35%, 1.12%, 2.46%
Hybrid | WGAN-GP | 6.27, 6.93, 9.49, 9.77 | 65.74, 55.87, 102.41, 95.46
Hybrid | Gain | 1.88%, -2.21%, 2.87%, 3.65% | 5.51%, -15.77%, -3.54%, 8.06%
Hybrid | TimeGAN | 6.05, 6.87, 9.42, 9.65 | 63.42, 51.23, 91.32, 97.77
Hybrid | Gain | 5.32%, -1.33%, 3.58%, 4.83% | 8.84%, -6.15%, 7.67%, 5.84%
Hybrid | CR-GAN | 5.96, 6.85, 9.04, 9.27 | 58.62, 52.06, 90.20, 92.16
Hybrid | Gain | 6.73%, -1.03%, 7.47%, 8.58% | 15.74%, -7.87%, 8.81%, 11.24%
Hybrid | Proposed | 5.76, 6.29, 8.56, 8.69 | 51.23, 36.76, 82.18, 80.31
Hybrid | Gain | 9.86%, 7.23%, 12.38%, 14.30% | 26.36%, 23.83%, 16.91%, 22.65%
Transformer-based | Non-augmentation | 6.06, 6.61, 9.50, 9.86 | 62.60, 43.18, 93.37, 98.98
Transformer-based | CGAN | 6.23, 7.13, 9.44, 10.05 | 70.28, 52.42, 91.77, 96.64
Transformer-based | Gain | -2.81%, -7.87%, 0.63%, -1.93% | -12.27%, -21.40%, 1.71%, 2.36%
Transformer-based | WGAN-GP | 6.15, 7.05, 9.28, 9.74 | 65.50, 49.67, 90.23, 91.78
Transformer-based | Gain | -1.49%, -6.66%, 2.32%, 1.22% | -4.63%, -15.03%, 3.36%, 7.27%
Transformer-based | TimeGAN | 5.72, 6.92, 9.12, 9.17 | 57.36, 50.31, 87.78, 92.23
Transformer-based | Gain | 5.61%, -4.69%, 4.00%, 7.00% | 8.37%, -16.51%, 5.99%, 6.82%
Transformer-based | CR-GAN | 4.77, 6.88, 8.72, 8.97 | 51.20, 47.52, 85.55, 87.45
Transformer-based | Gain | 21.29%, -4.08%, 8.21%, 9.03% | 18.21%, -10.05%, 8.38%, 11.65%
Transformer-based | Proposed | 4.86, 5.81, 7.95, 8.48 | 44.73, 29.53, 69.11, 77.22
Transformer-based | Gain | 19.80%, 12.10%, 16.32%, 14.00% | 28.55%, 31.61%, 25.98%, 21.98%
The results indicate that, using our proposed DiffRUL, the four base methods achieve average RMSE values of 12.71, 17.09, 12.63, and 20.09 and average Score values of 242.50, 1345.80, 306.26, and 2113.60 across FD001, FD002, FD003, and FD004, respectively. In contrast, under Method I, these methods yield RMSE averages of 13.43, 18.26, 13.18, and 21.47 and Score averages of 305.16, 2116.15, 330.57, and 3464.67 on the same subsets. These results affirm that employing degradation trend representations extracted by DTE as generative conditions enhances base method performance more effectively in RUL prediction tasks than using actual RUL values. This confirms that degradation trend representations effectively guide DiffRUL in generating higher-quality data samples. Furthermore, with Method II, the base methods record RMSE averages of 13.39, 17.76, 13.28, and 20.84, and Score averages of 273.09, 1439.21, 309.25, and 2465.49, respectively, which further highlights the efficacy of DiffRUL, particularly in enhancing RUL prediction performance and substantiating the benefits of integrating contrastive learning into DTE.

4.5. Experiment results on the N-CMAPSS dataset

4.5.1. Results of the generated data
For the N-CMAPSS dataset, we compare the real and sampled degradation trajectories of a selected engine within the subset DS01. Given the longer operational cycles of engines in DS01, to facilitate visual comparison we confine the comparison to the last 3,000 samples. Although 20 sensors are used in the model training process, due to space constraints this section presents data from only 14 sensors. As shown in Fig. 14, through a series of sub-figures, each tracing the evolution of sensor-derived data over time, the real data are marked by blue lines and the sampled data by yellow lines. The comparison reveals a high degree of consistency between the real and sampled data in terms of global change trends and local fluctuations, with no significant anomalies observed in the sampled data. These results further confirm DiffRUL's effectiveness in accurately capturing and understanding the distribution and degradation patterns of system monitoring data.

4.5.2. RUL prediction results of different methods
To further evaluate the effectiveness of our proposed method, we conduct a comparative assessment of various RUL prediction methods on the N-CMAPSS dataset, enhanced by DiffRUL and four other generative algorithms, as detailed in Table 5. The "Non-augmentation" category represents the baseline architecture, while "Gain" refers to the percentage improvement in RMSE and Score achieved by the generative algorithms. Notably, our proposed DiffRUL significantly enhances the performance metrics of the baseline RUL prediction methods across all subsets.
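For clarity, the "Gain" entries are read here as the relative reduction of an error metric after augmentation, with positive values meaning improvement; the exact formula is our assumption, but it reproduces the reported entries.

```python
def gain(base: float, augmented: float) -> float:
    """Percentage reduction of an error metric (RMSE or Score) after augmentation."""
    return (base - augmented) / base * 100.0

print(round(gain(7.18, 6.30), 2))  # 12.26 -> matches the CNN-based DS01 RMSE row
```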
In contrast, the other four generative algorithms do not show the uniform effectiveness across different subsets that is seen with DiffRUL. For instance, CR-GAN improves RMSE and Score in subsets DS01, DS03, and DS04, but not to the extent achieved by DiffRUL. In DS02, CR-GAN even negatively impacts performance, with the baseline methods showing an average RMSE decrease of 3.49% and a Score decrease of 5.09%. Similarly, the influence exerted by TimeGAN varies, benefiting DS01, DS03, and DS04 while worsening outcomes in DS02. WGAN-GP and CGAN show minimal positive impact on the baseline methods, with improvements in some subsets overshadowed by notable performance decrements in others. These results underscore DiffRUL's superior ability to enhance prediction effectiveness compared to the other generative algorithms, excelling in the simpler subsets (DS01, DS03, and DS04) and consistently improving in the more complex subset (DS02), further proving its capability to capture complex distributions and synthesize high-quality data.

As a supplementary assessment to the RMSE and Score, we analyze the distribution of prediction errors for selected engines in the four subsets using histograms, further illustrating DiffRUL's enhancement of the baseline RUL prediction methods. These results are shown in Figs. 15 to 18. The histograms display the RUL prediction error on the horizontal axis and the probability distribution on the vertical axis, with the blue sections representing the error distribution of the baseline prediction model, the red sections showing the error distribution of the baseline model enhanced by DiffRUL, and the dark red sections indicating the overlap between the two. A method is considered superior if most prediction errors cluster towards lower values. Fig. 15 reveals that for subsets DS01, DS02, DS03, and DS04, the error distribution of the CNN-based method, enhanced by DiffRUL, markedly shifts towards lower values, confirming DiffRUL's ability to refine the accuracy of RUL predictions.
Figs. 16, 17, and 18 depict the error distributions for the RNN-based, hybrid, and Transformer-based methods, respectively, leading to similar findings: DiffRUL significantly reduces RUL prediction errors, boosting performance. These analyses further verify that DiffRUL effectively generates high-quality sample data on the N-CMAPSS dataset, markedly enhancing RUL prediction accuracy.

5. Conclusions

This paper introduces DiffRUL, an innovative method designed to augment multivariate engine monitoring data using a diffusion model framework. This approach excels in generating samples that replicate both the quality and degradation trajectories of real monitoring data. Initially employing DTE to extract distinctive representations of the degradation process from operation to failure, these representations significantly enhance the model's ability to identify complex data patterns. Moreover, the diffusion model, adapted for multivariate system monitoring, includes a forward process that incrementally adds noise, followed by a reverse process in which a denoising model, refined at successive noise levels, restores the original samples. This methodology enriches understanding of the data's underlying distribution. Additionally, the creation of DiffNet, integrated with generative conditions, accurately captures complex spatio-temporal correlations, ensuring precise noise-level predictions at each diffusion time step. Empirical results confirm DiffRUL's effectiveness, demonstrating its superiority in producing high-quality data and significantly enhancing the predictive accuracy of deep learning models in RUL estimation tasks.
In future work, we aim to tackle two challenges. First, we will attempt to overcome the issue of slow model training induced by the diffusion process. Second, we will focus on using diffusion models to augment incomplete system monitoring data, a common issue in practical scenarios where old systems are often replaced before scheduled maintenance. This results in a substantial amount of incomplete data, lacking failure times, which poses challenges for training RUL prediction models. The key research question is how to effectively utilize the diffusion model's strengths in fitting complex data distributions to extend these incomplete data sequences.

CRediT authorship contribution statement

Wei Wang: Writing – review & editing, Writing – original draft, Validation, Software, Methodology, Conceptualization. Honghao Song: Writing – review & editing. Shubin Si: Writing – review & editing, Funding acquisition. Wenhao Lu: Writing – review & editing. Zhiqiang Cai: Writing – review & editing, Supervision, Resources, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgments

This work was supported by the National Natural Science Foundation of China [72271200, 72231008], the Open Fund of Intelligent Control Laboratory [ICL-2023-0304], the Science and Technology Innovation Group Program of Shaanxi Province [2024RS-CXTD-28], and the Distinguished Young Scholar Program of Shaanxi Province [2023-JQ-JC-10].
Appendix

First, we use the Evidence Lower Bound (ELBO) to optimize the log-likelihood function:

log p_ϕ(X_0) = log ∫ ( p_ϕ(X_{0:T}) / q(X_{1:T}|X_0) ) q(X_{1:T}|X_0) d(X_{1:T}) ≥ E_{q(X_{1:T}|X_0)}[ log ( p_ϕ(X_{0:T}) / q(X_{1:T}|X_0) ) ]    (40)

The result obtained from Eq. (40) is the ELBO, which can be further expanded as follows:

ELBO = E_{q(X_{1:T}|X_0)}[ log ( p_ϕ(X_{0:T}) / q(X_{1:T}|X_0) ) ]
     = E_{q(X_{1:T}|X_0)}[ log ( p_ϕ(X_T) ∏_{t=1}^{T} p_ϕ(X_{t−1}|X_t) / ∏_{t=1}^{T} q(X_t|X_{t−1}) ) ]
     = E_{q(X_{1:T}|X_0)}[ log ( p_ϕ(X_T) p_ϕ(X_0|X_1) / q(X_1|X_0) ) + log ∏_{t=2}^{T} ( p_ϕ(X_{t−1}|X_t) / q(X_t|X_{t−1}) ) ]    (41)

ELBO = E_{q(X_1|X_0)}[ log p_ϕ(X_0|X_1) ] − D_KL( q(X_T|X_0) ‖ p_ϕ(X_T) ) − Σ_{t=2}^{T} E_{q(X_t|X_0)}{ E_{q(X_{t−1}|X_0,X_t)}[ log ( q(X_{t−1}|X_t, X_0) / p_ϕ(X_{t−1}|X_t) ) ] }
     = E_{q(X_1|X_0)}[ log p_ϕ(X_0|X_1) ] − D_KL( q(X_T|X_0) ‖ p_ϕ(X_T) ) − Σ_{t=2}^{T} E_{q(X_t|X_0)}[ D_KL( q(X_{t−1}|X_t, X_0) ‖ p_ϕ(X_{t−1}|X_t) ) ]    (45)

In our study, we set the variances of the two Gaussian distributions, p_ϕ(X_{t−1}|X_t) and q(X_{t−1}|X_t, X_0), to be exactly matched, which results in:

D_KL( q(X_{t−1}|X_t, X_0) ‖ p_ϕ(X_{t−1}|X_t) ) = D_KL( N(X_{t−1}; μ_q, Σ_q(t)) ‖ N(X_{t−1}; μ_ϕ, Σ_q(t)) )
 = (1/2) [ log 1 − d + d + (μ_ϕ − μ_q)^T Σ_q(t)^{−1} (μ_ϕ − μ_q) ]
 = ( 1 / ( 2 ‖Σ_q(t)‖²_2 ) ) ‖ μ_ϕ − μ_q ‖²_2    (52)

 = E_{X_0, ε, t}[ ( 1 / ( 2 ‖Σ_ϕ(X_t, t)‖²_2 ) ) ‖ (1/√α_t)( X_t − ( β_t / √(1 − ᾱ_t) ) ε_ϕ(X_t, t) ) − (1/√α_t)( X_t − ( β_t / √(1 − ᾱ_t) ) ε ) ‖²_2 ]
 = E_{X_0, ε, t}[ ( β_t² / ( 2 α_t (1 − ᾱ_t) ‖Σ_ϕ‖²_2 ) ) ‖ ε − ε_ϕ( √ᾱ_t X_0 + √(1 − ᾱ_t) ε, t ) ‖²_2 ]    (53)
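The simplification in Eqs. (52)-(53) rests on the KL divergence between two Gaussians with identical covariance reducing to a scaled squared distance between their means. The short check below verifies this numerically for an isotropic covariance σ²I; it is a minimal sanity-check sketch under that assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
d, sigma2 = 4, 0.3
mu_q, mu_phi = rng.normal(size=d), rng.normal(size=d)

closed_form = np.sum((mu_phi - mu_q) ** 2) / (2 * sigma2)

# Monte-Carlo estimate of KL(q || p) = E_q[log q - log p]; the normalization
# constants cancel because both Gaussians share the covariance sigma2 * I.
x = mu_q + np.sqrt(sigma2) * rng.standard_normal((200_000, d))
log_q = -np.sum((x - mu_q) ** 2, axis=1) / (2 * sigma2)
log_p = -np.sum((x - mu_phi) ** 2, axis=1) / (2 * sigma2)
mc_estimate = np.mean(log_q - log_p)

print(closed_form, mc_estimate)   # the two values agree closely
```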
Table D1
Specific configuration of the convolutional neural network-based method.
Table D2
Specific configuration of the recurrent neural network-based method.
Table D3
Specific configuration of the hybrid method.
Table D4
Specific configuration of the transformer-based method.
[41] Le-Khac PH, Healy G, Smeaton AF. Contrastive representation learning: a framework and review. IEEE Access 2020;8:193907–34. https://doi.org/10.1109/ACCESS.2020.3031549.
[42] Robinson J, Chuang C-Y, Sra S, Jegelka S. Contrastive learning with hard negative samples. 2021. https://doi.org/10.48550/arXiv.2010.04592.
[43] Jiang R, Nguyen T, Ishwar P, Aeron S. Supervised contrastive learning with hard negative samples. 2022. https://doi.org/10.48550/arXiv.2209.00078.
[44] Schroff F, Kalenichenko D, Philbin J. FaceNet: a unified embedding for face recognition and clustering. In: 2015 IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR). IEEE; 2015. p. 815–23. https://doi.org/10.1109/CVPR.2015.7298682.
[45] Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C. MobileNetV2: inverted residuals and linear bottlenecks. IEEE Computer Society; 2018. p. 4510–20. https://doi.org/10.1109/CVPR.2018.00474.
[46] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. In: Adv. Neural Inf. Process. Syst., vol. 30. Curran Associates, Inc.; 2017.
[47] Van Den Oord A, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, et al. WaveNet: a generative model for raw audio. 2016. https://doi.org/10.48550/arXiv.1609.03499.
[48] Arias Chao M, Kulkarni C, Goebel K, Fink O. Aircraft engine run-to-failure dataset under real flight conditions for prognostics and diagnostics. Data 2021;6:5. https://doi.org/10.3390/data6010005.
[49] Arias Chao M, Kulkarni C, Goebel K, Fink O. Fusing physics-based and deep learning models for prognostics. Reliab Eng Syst Saf 2022;217:107961. https://doi.org/10.1016/j.ress.2021.107961.
[50] Kingma DP, Ba J. Adam: a method for stochastic optimization. 2017. https://doi.org/10.48550/arXiv.1412.6980.
[51] Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. In: Proc. Thirteen. Int. Conf. Artif. Intell. Stat. JMLR Workshop and Conference Proceedings; 2010. p. 249–56.