Application of Non Gaussian Feature Enhancement Extraction - 2024 - Expert Syste
Keywords: Batch production processes; Fault detection; Feature enhancement extraction; Enhancement gate recurrent unit network

The nonlinear, time-correlated, and non-Gaussian features in process data present significant challenges for effective fault detection in batch production processes. While the Gated Recurrent Unit (GRU) network is renowned for its capacity to manage time correlation, it falls short in capturing non-Gaussian features in process data, which can lead to suboptimal monitoring results. To address this limitation, the Enhancement Gate Recurrent Unit (ENGRU) is developed to improve the fault detection accuracy of the network. Specifically, the ENGRU effectively extracts high-order statistical information by employing the overcomplete independent component analysis method, thereby augmenting its ability to capture non-Gaussian properties. The extracted feature information is then fed into the ENGRU model to uncover additional hidden features beyond what the GRU can achieve. The ENGRU network, built upon the extracted characteristic information, further enhances the accuracy of fault detection. The merits of the proposed model are demonstrated by comparing it with excellent fault detection algorithms on a benchmark platform.
1. Introduction

The importance of batch production processes in assuring plant safety and product quality has piqued the interest of academic and industrial researchers. However, in mass batch production processes, the consequences of operating without timely and effective monitoring and fault elimination are often disastrous and compounding, including diminished product quality, damaged equipment, and personal casualties (Granados et al., 2020; Jiang et al., 2019; Wang et al., 2020). Hence it is imperative that batch production processes be subject to process monitoring. It is notable, however, that the manufacturing techniques used in batch production processes tend to be fairly complicated, and that the process variables are highly nonlinear, non-Gaussian, and time correlated, all of which pose serious barriers to fault detection (Jiang & Yan, 2018). As a result of these factors, appropriate process monitoring based on superior methods is vital to reduce the losses caused by faults and anomalies (Chang et al., 2022; Yang et al., 2023).

Data-driven methods based on multivariate statistical process monitoring (MSPM) techniques have been developed and widely employed for monitoring batch production processes, which alleviates the challenge of fault detection (Tong & Yan, 2015). Unlike knowledge-based and mechanism-model-based methods, MSPM constructs representations from process data without requiring a thorough understanding of the precise kinematic equations. These representations are then used to obtain associated residuals and establish monitoring statistics to evaluate the process condition (Lou et al., 2022; Zhang & Zhao, 2018). Such methods include principal component analysis (PCA) (Lv et al., 2014; Stief et al., 2019), partial least squares (PLS) (Si et al., 2020), and the like, which are able to characterize the condition of monitored process systems by mapping raw data from a high-dimensional space into a reduced variable set for the purpose of dimensionality reduction. The literature cited above is restricted to linear processes, whereas nonlinear characteristics are prevalent in batch production processes. As a result, these established approaches fail to take nonlinear features into consideration.

Kernel techniques have shown remarkable success in learning nonlinear features. The fundamental idea of kernel models is to transform nonlinear data into a high-dimensional space where information about the features can be extracted through linear approximation. Various kernel methods, including kernel principal component analysis (KPCA) (Deng et al., 2018), kernel partial least squares (KPLS) (Fazai et al., 2019), and kernel independent component analysis (KICA) (Lee et al., 2007), have been successively presented. Deng et al. have devised a hierarchical statistical framework that depends on the KPCA model to capture linear and nonlinear principal components more deeply (Deng et al., 2018). In Cai et al. (2015), the authors
∗ Corresponding author at: Faculty of Information Technology, Beijing University of Technology, Beijing 100124, PR China.
E-mail address: [email protected] (C. Peng).
https://doi.org/10.1016/j.eswa.2023.121348
Received 13 May 2023; Received in revised form 11 August 2023; Accepted 26 August 2023
Available online 1 September 2023
0957-4174/© 2023 Elsevier Ltd. All rights reserved.
C. Peng et al. Expert Systems With Applications 237 (2024) 121348
have introduced a novel model for fault detection based on weighted KICA, which achieves superior detection performance. An excellent method, named dynamic kernel independent component analysis (DKICA), has been developed for extracting fault detection characteristics from dynamic processes (Feng & Sun, 2017). Additionally, the data-driven approach of the dynamic Bayesian network has emerged as a highly efficacious tool in addressing uncertain problems caused by nonlinear and dynamic factors of closed-loop control system links, and it has found widespread application in fault detection and diagnosis within industrial processes (Amin et al., 2019; Kong et al., 2022b; Wee et al., 2015). For instance, Kong et al. (2022b) considered the impact of sampling frequency, noise, and redundancy level on fault detection and diagnostic accuracy, and then proposed a novel dynamic Bayesian network (DBN) for performance evaluation and fault diagnosis inference in the dual-redundant subsea blowout preventer control system. This framework adeptly captures dynamic data fluctuations, thereby effectively mitigating the cascading repercussions arising from faults within the subsea blowout preventer control system. Nonetheless, several limitations hinder enhanced fault detection within industrial processes, including the significant computational load arising from kernel projection and the absence of prior knowledge to aid in the selection of ideal kernel parameters. Similarly, modeling methods based on Bayesian networks exhibit a strong reliance on data, necessitating an ample supply of historical data for the establishment of a comprehensive and precise model. This is why the majority of kernel-trick approaches and Bayesian networks fall short of yielding desirable results (Chen et al., 2021; Jiang et al., 2020).

Artificial neural networks (ANNs) are increasingly being used as the preferred method for fault detection in a wide variety of industrial processes owing to their excellent ability to interpret raw data nonlinearly. Yu et al. employed the SAE to describe the distribution of process information and to discover beneficial features for nonlinear process monitoring, merging the preserved local and global information into the encoding section of the SAE to acquire internal information (Yu & Zhang, 2020). In Yu and Yan (2021), the SAE is developed to extract the most advantageous features by amplifying those that are beneficial to fault detection. While deep learning networks like the SAE exhibit exceptional capabilities in nonlinear feature extraction, another crucial aspect of batch production processes is the time correlation of the process data. In batch production processes such as penicillin fermentation, data from previous moments can provide critical information for fault detection at the present moment (Chang et al., 2022). The SAE model, however, ignores the time correlation of the data, and thus lacks the ability to assess the process condition along the time dimension. The Gated Recurrent Unit (GRU) monitoring framework is a natural solution when considering the time correlation of process data (Ahmad & Wu, 2023). Like other excellent models such as the Recurrent Neural Network and Long Short-Term Memory models (Ren et al., 2021; Yuan et al., 2020), the GRU network can remember past information and selectively forget unimportant and irrelevant information to mine the time correlation. Ahmad et al. developed the Bi-GRU model, which efficiently learns the temporal dynamics of frame sequences in both the forward and backward directions, aiming to tackle the problem of time-correlation modeling (Ahmad et al., 2023). However, non-Gaussianity is also one of the significantly important features of batch processes, and neglecting non-Gaussian characteristics will have a negative effect on monitoring accuracy to some extent (Chang et al., 2021; Ding et al., 2020; Xie et al., 2013). This is another factor that makes it hard for the GRU monitoring framework to work in industrial batch process applications. To detect faults in non-Gaussian processes, a promising approach is to use overcomplete independent component analysis, which effectively handles the non-Gaussianity problem by extracting independent features with non-Gaussian distributions (Chang et al., 2020). The results presented in Chang and Lu (2021) demonstrate that the overcomplete broad learning system (OBLS) fault detection network outperforms classic monitoring methods. The OBLS network utilizes the overcomplete independent component analysis method based on enhanced features to extract non-Gaussian natures, resulting in satisfactory monitoring results.

Inspired by the works mentioned above, the objective of this paper is to explore ways of deriving meaningful feature representations from complex batch production processes that exhibit nonlinearity, time correlation, and non-Gaussianity simultaneously. By maximizing the extraction of the feature information hidden within the data, the ENGRU method offers a promising approach for achieving more accurate and effective fault detection in batch production processes. The contributions of this paper can be summarized as follows:

1. An excellent ENGRU model has been developed and applied for the first time to the penicillin fermentation batch production process. This model fills the gaps where the GRU model struggles to mine non-Gaussian information, and enables effective mining of the nonlinearity, time correlation, and non-Gaussianity present in the dataset to achieve improved fault detection accuracy.

2. Neurons in the ENGRU network are given more freedom in establishing weights and biases, since the size of the independent component matrix is larger than the dimension of the raw matrix.

The remainder of the paper is laid out as follows. The ENGRU theory is introduced in Section 2. In Section 3, the modeling process is described. Section 4 validates the proposed method on a batch penicillin fermentation process to prove the validity of the ENGRU network. In Section 5, the conclusion of this work is delivered.

2. The construction of the enhancement gate recurrent unit network

This section primarily presents the specifics of the proposed ENGRU network. Figs. 1 and 2 depict the GRU and ENGRU network architectures, respectively. The fundamental idea underlying the neural-network-based modeling approach utilized in this paper is to fully explore the features in the data. The data gathered from the batch production process clearly deviate from a Gaussian distribution. Merely relying on the GRU model enables the extraction of nonlinearity and time correlation, but fails to uncover the non-Gaussian information embedded within. The ENGRU model incorporates an overcomplete independent component analysis (OICA) window to extract high-order statistics containing non-Gaussian information, enhancing the capacity of the model to learn and extract the diverse features of batch production process data. At its core, the OICA window of the ENGRU model employs a mixing matrix, denoted as 𝐿, to extract independent components that exhibit non-Gaussian properties. ICA aims to address the problem of finding the mixing matrix, which can be simplified to finding the appropriate orthogonal matrix using the fast independent component analysis (FastICA) algorithm or other algorithms. In contrast to how ICA produces the mixing matrix, ENGRU obtains it via a novel semi-adaptive deflation procedure:

𝑆 = 𝑋𝐿 (1)

where 𝑆 represents the independent components. The primary purpose of the OICA layer of the ENGRU model is to perform the necessary computations for the mixing matrix 𝐿. Steps 1 through 8 of Algorithm 2 provide a concise description of the steps involved in acquiring the mixing components.

The overcomplete layer of the ENGRU model derives the covariance matrix by employing the second-order cumulant generating function (SCGF). The cumulant generating function of 𝑥 is defined as follows:

𝜙_𝑥(𝜎) = log E(e^{𝜎ᵀ𝑥}) (2)

where 𝜎 ∈ ℝ^𝑀 and 𝑀 symbolizes the dimensionality of the process variable. Specifically, the second-order cumulant is consistent with the
follows:

𝐶_𝑆(𝑑) = E(𝑆𝑆ᵀ e^{𝑆ᵀ𝑑}) / E(e^{𝑆ᵀ𝑑}) − 𝑆̄(𝑑)𝑆̄(𝑑)ᵀ − cov(𝑆) (8)

𝑆̄(𝑑) = ∇𝜙_𝑆(𝑑) = E(𝑆 e^{𝑑ᵀ𝑆}) / E(e^{𝑑ᵀ𝑆}) (9)

𝐶_𝑥(𝜎) = Σ_{𝑖=1}^{𝑘} 𝜔_𝑖(𝜎) 𝑙_𝑖 𝑙_𝑖ᵀ (10)

where 𝜔_𝑖(𝜎) = [𝛼(𝐿ᵀ𝜎)]_{𝑖𝑖} is the generalized covariance of the 𝑖th independent source. A sufficiently large number 𝑠 > 𝑘 of generalized covariance matrices is chosen to span the subspace 𝑊 = [𝐻_1, 𝐻_2, …, 𝐻_𝑠]. Given enough non-zero vectors 𝜎_1, …, 𝜎_𝑠, the corresponding matrices 𝐻_𝑗 = 𝐶_𝑥(𝜎_𝑗), 𝑗 ∈ [𝑠], are constructed.

On the basis of the space 𝑊, a semi-definite program (SDP) is utilized to obtain the atomic estimations 𝑙_𝑖𝑙_𝑖ᵀ, from which the mixing components 𝑙_𝑖 are then made available. The ENGRU model employs a novel semi-adaptive deflation method to finalize all atomic estimations. The atomic estimation problem can be transformed into an SDP problem, specifically expressed as:

𝐵*_{𝑠𝑑𝑝} = argmax_{𝐵∈Span} ⟨𝐺, 𝐵⟩ (11)

Fig. 2. The network flowchart for fault detection focusing upon ENGRU.
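As an illustration of Eqs. (2) and (8)–(10), the generalized covariance can be estimated empirically by replacing the expectations with sample averages. The sketch below is illustrative only (not the authors' implementation): the data, the probe vectors 𝜎_𝑗, and their number are hypothetical, and the ordinary covariance is subtracted following the −cov(𝑆) term that appears in Eq. (8).

```python
import numpy as np

def generalized_cov(X, sigma):
    """Empirical generalized covariance at probe vector sigma: the Hessian of
    the cumulant generating function phi_x(sigma) = log E[exp(sigma^T x)]
    (Eq. (2)), with the ordinary covariance subtracted as in Eq. (8)."""
    t = X @ sigma
    w = np.exp(t - t.max())              # exp(sigma^T x), shifted for stability
    w /= w.sum()                         # normalized sample weights
    m = X.T @ w                          # gradient of phi_x, cf. Eq. (9)
    second = (X * w[:, None]).T @ X      # weighted second moment
    return second - np.outer(m, m) - np.cov(X, rowvar=False)

rng = np.random.default_rng(0)
X = rng.laplace(size=(2000, 4))          # non-Gaussian process data (illustrative)
# Matrices H_j = C_x(sigma_j) of the kind used to span the subspace W (s > k).
H = [generalized_cov(X, rng.normal(scale=0.2, size=4)) for _ in range(8)]
```

Each 𝐻_𝑗 is a symmetric 𝑀 × 𝑀 matrix; stacking sufficiently many of them spans the subspace 𝑊 on which the SDP of Eq. (11) operates.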
can only be approximated. Consequently, this paper introduces relaxation constraints instead of retaining the first hard constraint. Therefore, Eqs. (13)–(14) can be written equivalently as:

𝐵* = argmax_{𝐵∈𝒞} ⟨𝐺, 𝐵⟩ − (𝜇/2) Σ_{𝑗∈[𝑚−𝑘]} ⟨𝐵, 𝐹_𝑗⟩², 𝒞 = {Tr(𝐵) = 1, 𝐵 ⪰ 0} (15)

where 𝜇 > 0 is a regularization parameter that aids in mitigating the noise interference existing in the actual batch production process. The ENGRU model uses the fast iterative shrinkage-thresholding algorithm (FISTA) (Beck & Teboulle, 2009) and the majorization–maximization principle (MMP) (Hunter & Lange, 2004) to efficiently address the relaxation problem and derive the atomic estimate 𝐵*. Based on the clustering algorithm of the semi-adaptive deflation process, the negative objective of problem Eq. (15) can be further expressed as:

𝑓(𝐵) = −⟨𝐺, 𝐵⟩ + (𝜇/2) Σ_{𝑗∈[𝑚−𝑘]} ⟨𝐵, 𝐹_𝑗⟩² (16)

Assume that 𝑔(𝐵) is the indicator function of the set 𝒞. Hence, the negative objective function of Eq. (15) can be written further as follows:

argmin_𝐵 𝑓(𝐵) + 𝑔(𝐵) (17)

The following expression represents the gradient of the differentiable portion of the objective function:

∇𝑓(𝐵) = −𝐺ᵀ + 𝜇 Σ_{𝑗∈[𝑚−𝑘]} Tr(𝐹_𝑗𝐵) 𝐹_𝑗ᵀ (18)

The FISTA algorithm is employed to address the objective function problem, as described in Algorithm 1. Let the Lipschitz constant be 𝐿 = 𝜇. In Algorithm 1, 𝑛_max denotes the maximum number of iterations, 𝑛 represents the current number of iterations, and 𝑧_𝑛 signifies the acceleration factor. Proj_𝒞(𝐵) denotes the projection of 𝐵 onto the set 𝒞: one can perform an eigenvalue decomposition of the matrix 𝐵, that is, 𝐵 = 𝑉𝛬𝑉ᵀ, and then project its eigenvalues 𝜆 = diag(𝛬) onto the probability simplex, represented by the symbol 𝛥_𝑝. As a result, Proj_𝒞(𝐵) can be written as Proj_𝒞(𝐵) = 𝑉 Diag[Proj_{𝛥_𝑝}(𝜆)] 𝑉ᵀ. The specific definition of the probability simplex 𝛥_𝑝 can be found in Duchi et al. (2008).

Algorithm 1 FISTA for Eq. (15)
Input: 𝑌^(1) = 𝐵^(0) ∈ Span, 𝑧_1 = 1.
Output: 𝐵* = 𝐵^(𝑛).
1: while not converged and 𝑛 ≤ 𝑛_max do
2:   𝐵^(𝑛) = Proj_𝒞(𝑌^(𝑛) − (1/𝐿)∇𝑓(𝑌^(𝑛))).
3:   𝑧_{𝑛+1} = (1/2)(1 + √(1 + 4𝑧_𝑛²)).
4:   𝑌^(𝑛+1) = 𝐵^(𝑛) + ((𝑧_𝑛 − 1)/𝑧_{𝑛+1})(𝐵^(𝑛) − 𝐵^(𝑛−1)).
5:   𝑛 = 𝑛 + 1.
6: end while

The semi-definite programming problem represented by Eq. (12) and its relaxation Eq. (15) aim to estimate only specific elements. Given that 𝐺 influences the deflation direction of the atoms, it is natural to resample 𝐺 in multiples of 𝑘. On the basis of FISTA and the majorization–maximization principle, the corresponding deflation directions 𝐺 select a multiple of the specified number of atoms 𝑘. The final step in obtaining 𝐷_clust involves clustering the obtained atoms, whose number is the specified multiple of 𝑘, into 𝑘 clusters and selecting an atom from each cluster (by default, the first column is chosen), so that 𝐷 is obtained. The 𝑘 atoms in 𝐷_clust are then subjected to the following screening procedure:

𝐺_clust = |𝐷_clustᵀ 𝐷_clust| − 𝐼_𝑘 (19)

𝑁𝑢𝑚 = argmin(max(𝐺_clust)) (20)

𝑁𝑢𝑚 is the index of the smallest value among all the maximum values. The corresponding column of 𝐷_clust is the most independent atom, i.e., an atom utilized as a clustering filter, against which the other atoms in 𝐷_clust are screened. If the maximum value corresponding to a remaining atom of 𝐷_clust is less than the specified threshold, that atom is retained as a new estimated atom. After all the screening is completed, if the number 𝑎 of atoms screened by the clustering algorithm is less than 𝑘, the adaptive deflation algorithm is used to estimate the remaining atoms.

(2) Adaptive Deflation Process of the Semi-adaptive Deflation Algorithm

The adaptive deflation process excludes all currently obtained atoms from the search to update the constraint set. The subspace 𝑊 is updated to be spanned by atoms other than the selected estimated atoms, while the atoms found through the clustering algorithm are added to the orthogonal complementary bases (null space) of 𝑊. Subsequently, the remaining atoms are estimated using the FISTA algorithm (see Algorithm 1) and the majorization–maximization principle. Based on the novel semi-adaptive deflation process, the negative objective function of Eq. (15) can be further expressed as follows:

𝑓(𝐵) = −⟨𝐺, 𝐵⟩ + (𝜇/2) Σ_{𝑗∈[𝑚−𝑘+𝑡]} ⟨𝐵, 𝐹_𝑗⟩² (21)

where 𝑡 represents the current number of acquired atoms, and 𝑡 ∈ [𝑎, 𝑘]. One notable difference between the objective function of the adaptive deflation algorithm and that of the clustering process is the inclusion of an additional variable, 𝑡. The introduction of the variable 𝑡 is primarily due to the adaptive deflation algorithm adding the currently found atoms to the orthogonal complementary bases of 𝑊. Consequently, the space of the complement basis 𝐹_𝑗 for the subspace 𝑊 continuously expands. Considering the constraint set 𝒞 = {Tr(𝐵) = 1, 𝐵 ⪰ 0}, the gradient of the differentiable portion of the objective function is as follows:

∇𝑓(𝐵) = −𝐺ᵀ + 𝜇 Σ_{𝑗∈[𝑚−𝑘+𝑡]} Tr(𝐹_𝑗𝐵) 𝐹_𝑗ᵀ (22)

The majorization–maximization principle is applied to terminate the contraction process early when suitable atoms are found. Thus, after this adaptive deflation algorithm, 𝑘 atoms are obtained. Each atomic matrix is then subjected to an eigenvalue decomposition, and the eigenvector corresponding to the largest eigenvalue is selected as a mixing component, denoted 𝑙_𝑖, for the mixing matrix 𝐿. Consequently, the entire mixing matrix 𝐿 = [𝑙_1, 𝑙_2, …, 𝑙_𝑘] is obtained. The independent components 𝑆 are then gained from the mixing components, where the size of the independent component matrix exceeds the dimension of the raw matrix. Consequently, the neurons in the ENGRU network have more flexibility in establishing weights and biases, and the reset gate 𝑟_𝑡 and update gate 𝑢_𝑡 in the ENGRU network perform the following updates:

𝑢_𝑡 = sigmoid(𝑊_𝑠𝑢 𝑆_𝑡 + 𝑊_𝑢ℎ ℎ_{𝑡−1} + 𝑏_𝑢) (23)

𝑟_𝑡 = sigmoid(𝑊_𝑠𝑟 𝑆_𝑡 + 𝑊_𝑟ℎ ℎ_{𝑡−1} + 𝑏_𝑟) (24)

where ℎ_{𝑡−1} and ℎ_𝑡 denote the hidden layer states at time steps 𝑡 − 1 and 𝑡, respectively. The update and reset gate biases are represented by 𝑏_𝑢 and 𝑏_𝑟, correspondingly. Furthermore, the reset gate 𝑟_𝑡 and the update gate 𝑢_𝑡 have a range of values between 0 and 1. Specifically, when the sigmoid activation function returns a value of 0, the update gate 𝑢_𝑡 decides to retain the candidate hidden state ℎ̃_𝑡, while the reset gate 𝑟_𝑡 chooses to disregard the previous hidden layer state ℎ_{𝑡−1}. Conversely, when the sigmoid activation function returns a value of 1, the update gate 𝑢_𝑡 decides to disregard the candidate hidden state ℎ̃_𝑡, while the reset gate 𝑟_𝑡 chooses to maintain the previous hidden state ℎ_{𝑡−1}.

ℎ̃_𝑡 = tanh(𝑊_ℎ̃ℎ(𝑟_𝑡 ∗ ℎ_{𝑡−1}) + 𝑊_ℎ̃𝑠 𝑆_𝑡 + 𝑏_ℎ) (25)

where the candidate hidden state ℎ̃_𝑡 utilizes the reset gate 𝑟_𝑡 to regulate the intake of the previous hidden state ℎ_{𝑡−1}, which stores historical data. The previous hidden state ℎ_{𝑡−1} will be eliminated if the reset gate 𝑟_𝑡 becomes somewhat
near to zero. The reset gate 𝑟_𝑡 thus offers a method for erasing a previous hidden state ℎ_{𝑡−1} that is unrelated to the current state; in other words, it measures how much information from the past has been ignored.

ℎ_𝑡 = 𝑢_𝑡 ∗ ℎ_{𝑡−1} + (1 − 𝑢_𝑡) ∗ ℎ̃_𝑡 (26)

The weight matrices denoting the connections from the hidden unit to the hidden unit and from the input unit to the hidden unit are respectively denoted as 𝑊_ℎ̃ℎ and 𝑊_ℎ̃𝑠. The hidden state ℎ_𝑡 employs the update gate 𝑢_𝑡 to combine the previous hidden state ℎ_{𝑡−1} and the candidate hidden state ℎ̃_𝑡. In the case where the update gate 𝑢_𝑡 is near 1, the previous hidden state ℎ_{𝑡−1} is preserved and carried forward to the present time. The learned information is ultimately mapped to the output layer for the purpose of fault detection in batch processes:

𝑜_𝑦 = sigmoid(𝑊_𝑜 ℎ_𝑛 + 𝑏_𝑜) (27)

where 𝑊_𝑜 denotes the weight of the output unit and 𝑏_𝑜 the corresponding bias.

The ENGRU model shares a similar structure with other recurrent neural networks, but updating its hidden state requires more care. Instead of simply replacing the current hidden state with the previous hidden state, the ENGRU employs the reset gate and update gate to evaluate the usefulness of the information in the previous hidden state; only if the information is deemed beneficial will it be kept. The ENGRU requires less parameter updating and calculation, which makes network training more efficient. The ENGRU monitoring framework is a suitable approach to address the issue of unstable gradients in deep neural networks, and it performs well in handling nonlinearity, time correlation, and non-Gaussianity. Algorithm 2 presents a pseudocode description of one portion of the suggested approach, while Fig. 2 depicts the overall framework architecture of the ENGRU model.

Remark: The ENGRU network possesses the following benefits. Firstly, the independent component matrix size is larger than the raw matrix dimension, which provides more freedom to the neurons of the ENGRU network in establishing weights and biases. Secondly, the ENGRU is able to display the characteristics of the data itself more accurately than the GRU does, allowing the network to reveal and exploit further feature information hidden within the dataset. This helps to fill in the blanks left by the ineffective mining of non-Gaussian information by the classic GRU model.

Algorithm 2 The pseudocode description of the ENGRU network
Input: Parameter initialization for the ENGRU model, and preprocessed data 𝑋.
Output: The results for the batch production process.
1: STEP I: Estimation of the subspace 𝑊.
2: By utilizing the second-order cumulant generating function, the subspace 𝑊 = Span{𝑙_1𝑙_1ᵀ, 𝑙_2𝑙_2ᵀ, …, 𝑙_𝑘𝑙_𝑘ᵀ}, which is spanned by atoms, can be constructed.
3: STEP II: Estimation of the atoms.
4: Estimate the atoms based on the novel semi-adaptive deflation algorithm.
5: Given 𝐺^(𝑖) for every deflation step 𝑖 = 1, 2, …, 𝑘:
6:   Solve the relaxation of the semi-definite program Eq. (16) with 𝐺^(𝑖).
7:   Estimate the 𝑖-th mixing component 𝑙_𝑖 from 𝐵*.
8: Mixing matrix 𝐿 = [𝑙_1, 𝑙_2, …, 𝑙_𝑘].
9: The loop is executed at every time increment.
10: for 𝑡 = 1 to 𝑛 do
11:   The reset gate 𝑟_𝑡 and update gate 𝑢_𝑡 are determined by employing Eqs. (23)–(24).
12:   The candidate hidden state ℎ̃_𝑡 and the hidden state ℎ_𝑡 are computed by applying Eqs. (25)–(26).
13: end for
14: 𝑜_𝑦 is determined by applying Eq. (27).

3. The ENGRU network modeling process for fault detection

This paper presents a systematic strategy for detecting faults in batch processes. The proposed monitoring strategy can be divided into three stages: preprocessing, network training, and online application. The following steps provide an overview of the procedure for implementing the approach.

Preprocessing stage
Step 1: Initialize both the parameters and the hyperparameters.
Step 2: Load the data and preprocess it.

Network training stage
Step 3: Initialize the required parameters.
Step 4: The OICA window of the ENGRU model employs the mixing matrix, denoted as 𝐿, to extract independent components that exhibit non-Gaussian information, which enhances the capacity of the model to learn and mine diverse features from batch production process data.
Step 5: To obtain the mixing matrix mentioned in Step 4, it is essential first to estimate the subspace 𝑊 based on Eqs. (2)–(10). Notably, a sufficiently large number 𝑠 > 𝑘 of generalized covariance matrices must be chosen to span the subspace 𝑊 = [𝐻_1, 𝐻_2, …, 𝐻_𝑠].
Step 6: Building on the derived space 𝑊, the atomic estimation problem transforms into an SDP problem. The ENGRU model employs an innovative semi-adaptive deflation approach to finalize all atomic estimations 𝑙_𝑖𝑙_𝑖ᵀ, subsequently making the mixing components 𝑙_𝑖 available.
Step 7: Within this process, the solution of the SDP problem yields a single atom denoted as 𝐵*_{𝑠𝑑𝑝}, attained through Eqs. (11)–(18). Following this, the novel semi-adaptive deflation technique is applied to derive all atomic estimations. Significantly, this technique integrates the clustering and adaptive deflation algorithms.
Step 8: Both the clustering and adaptive deflation algorithms leverage FISTA (see Algorithm 1) and the MMP principle to efficiently address the relaxation problem, leading to the derivation of the atomic estimate 𝐵*.
Step 9: Upon completion of the screening process involving the clustering algorithm of semi-adaptive deflation, a decision is made. If the number 𝑎 of atoms screened by the clustering algorithm is less than 𝑘, the adaptive deflation algorithm of semi-adaptive deflation is engaged to estimate the remaining atoms. Refer to Eqs. (13)–(20) for detailed elucidation.
Step 10: The key distinction between the clustering and adaptive deflation algorithms lies in the adaptive deflation process, which excludes all currently obtained atoms from the search and updates the constraint set. Consequently, the subspace 𝑊 is refreshed, spanned by atoms other than the selected estimated atoms. Meanwhile, atoms identified through the clustering algorithm are incorporated into the orthogonal complementary bases (null space) of 𝑊. See Eqs. (21)–(22) for comprehensive clarification.
Step 11: Building upon the aforementioned steps, the independent components 𝑆 are obtained, where the size of the independent component matrix exceeds the dimension of the raw matrix. Consequently, the neurons within the ENGRU network attain heightened flexibility in establishing weights and biases. This facilitates the update operations of the reset gate 𝑟_𝑡 and update gate 𝑢_𝑡 within the ENGRU network. Furthermore, the calculations for the candidate hidden state ℎ̃_𝑡, the hidden state ℎ_𝑡, and the output state 𝑜_𝑦 are executed. Detailed insights can be found in Eqs. (23)–(27).
Step 12: Drawing from Steps 1–11, backpropagation is employed to iteratively update the weight matrices and bias vectors, thereby training the ENGRU network.
Step 13: The accuracy and other performance indicators pertinent to fault detection are evaluated, and the network parameters are preserved for the subsequent online application stage.
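Steps 6–9 rely on the relaxed SDP of Eq. (15) and Algorithm 1. The following NumPy sketch is illustrative only (not the authors' code): it implements FISTA with the projection Proj_𝒞(𝐵) = 𝑉 Diag[Proj_{𝛥_𝑝}(𝜆)]𝑉ᵀ via the simplex projection of Duchi et al. (2008), and assumes a symmetric deflation direction 𝐺 and symmetric penalty matrices 𝐹_𝑗 supplied by the caller; the example inputs are hypothetical.

```python
import numpy as np

def proj_simplex(v):
    # Euclidean projection of v onto the probability simplex (Duchi et al., 2008).
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def proj_C(B):
    # Projection onto C = {Tr(B) = 1, B >= 0} via eigenvalue decomposition.
    lam, V = np.linalg.eigh((B + B.T) / 2.0)
    return V @ np.diag(proj_simplex(lam)) @ V.T

def fista(G, F_list, mu=1.0, n_max=200):
    # FISTA (Algorithm 1) for f(B) = -<G,B> + (mu/2) sum_j <B,F_j>^2, cf. Eqs. (15)-(18).
    p = G.shape[0]
    B_prev = B = np.eye(p) / p                  # feasible start Y^(1) = B^(0)
    Y, z = B.copy(), 1.0
    for _ in range(n_max):
        grad = -G + mu * sum(np.trace(F @ Y) * F.T for F in F_list)  # Eq. (18)
        B_prev, B = B, proj_C(Y - grad / mu)    # Lipschitz constant L = mu
        z_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * z * z))
        Y = B + (z - 1.0) / z_next * (B - B_prev)
        z = z_next
    return B

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
G = (A + A.T) / 2.0                             # illustrative deflation direction
B_star = fista(G, [np.eye(5) / np.sqrt(5.0)], mu=1.0)
```

The returned 𝐵* has unit trace and is positive semi-definite; its leading eigenvector then serves as a mixing component 𝑙_𝑖, as described after Eq. (22).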
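The per-time-step gate computations of Eqs. (23)–(27), used in Step 11, can be sketched as follows. This is a minimal illustrative cell, not the trained network: the weights are randomly initialized, and the dimensions (24 independent components, 8 hidden units, 5 time steps) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ENGRUCell:
    """One recurrence step of the ENGRU, Eqs. (23)-(26) (illustrative sketch)."""
    def __init__(self, k, hidden, seed=0):
        rng = np.random.default_rng(seed)
        init = lambda *shape: rng.normal(scale=0.1, size=shape)
        self.W_su, self.W_uh, self.b_u = init(hidden, k), init(hidden, hidden), np.zeros(hidden)
        self.W_sr, self.W_rh, self.b_r = init(hidden, k), init(hidden, hidden), np.zeros(hidden)
        self.W_hh, self.W_hs, self.b_h = init(hidden, hidden), init(hidden, k), np.zeros(hidden)

    def step(self, S_t, h_prev):
        u = sigmoid(self.W_su @ S_t + self.W_uh @ h_prev + self.b_u)              # Eq. (23)
        r = sigmoid(self.W_sr @ S_t + self.W_rh @ h_prev + self.b_r)              # Eq. (24)
        h_tilde = np.tanh(self.W_hh @ (r * h_prev) + self.W_hs @ S_t + self.b_h)  # Eq. (25)
        return u * h_prev + (1.0 - u) * h_tilde                                   # Eq. (26)

rng = np.random.default_rng(1)
cell = ENGRUCell(k=24, hidden=8)
h = np.zeros(8)
for S_t in rng.standard_normal((5, 24)):   # rows of S from Eq. (1), one per time step
    h = cell.step(S_t, h)
W_o, b_o = rng.normal(scale=0.1, size=(1, 8)), np.zeros(1)
o_y = sigmoid(W_o @ h + b_o)               # Eq. (27): output used for fault detection
```

Because Eq. (26) forms a convex combination of the previous state and a tanh output, the hidden state stays bounded in (−1, 1), which matches the gating behavior described above.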
Table 1
Process variables involved in the experiment.
No.  Process variable             Units
1    Aeration rate                L/h
2    Agitator power               RPM
3    Substrate feed flow rate     L/h
4    Substrate feed temperature   K
5    DO concentration             mg/L
6    CO2                          %
7    pH                           pH
8    Temperature                  K
9    Generated heat               K
10   Cold water flow rate         L/h
Table 2
Specific fault information used in the experiment.
Fault no.  Fault variable             Magnitude    Fault type  Start time  End time
1          Substrate feed flow rate   −5%          Step        45          100
2          Aeration rate              +3%          Step        20          150
3          Substrate feed flow rate   +0.005 L/h   Ramp        45          150
4          Aeration rate              −3%          Step        150         250
5          Substrate feed flow rate   +5%          Step        45          150
6          Substrate feed flow rate   −5%          Step        60          150
7          Aeration rate              −3 L/h       Ramp        60          300
8          Agitator power             +4%          Step        60          300
9          Aeration rate              −2 L/h       Ramp        100         200
10         Agitator power             +5%          Step        100         200

Fig. 3. A flow chart of the penicillin fermentation process.

Online application stage
Unplanned faults frequently occur in batch processes; failure to monitor such faults online could result in an emergency shutdown of the operating state and damage to property.
Step 14: Load the testing data 𝑋^{𝑛𝑒𝑤} and preprocess it.
Step 15: According to the parameters generated during the network training stage, evaluate the accuracy and the other performance indicators employed for process monitoring.
Step 16: The superiority of the ENGRU network, which can promptly detect and distinguish whether the system is abnormal, is validated. This serves as a timely reminder for technicians to stay focused on the production process and maintain the batch process in a safe state.

4. Experimental verification

4.1. Industrial penicillin fermentation process

Penicillin has enormous medicinal potential as a therapeutic antibiotic. The fermentation of penicillin is a sophisticated industrial batch process involving biochemical reactions, strain screening, strain improvement, and other operational activities. On the basis of the penicillin fermentation mechanism, a benchmark platform has been created by Birol et al. that can be applied to imitate the fermentation process (Birol et al., 2002). The Benchmark Pensim platform is employed throughout this section to assess whether the ENGRU is capable of detecting and responding appropriately to abnormalities in the penicillin fermentation process. A simplified schematic of the penicillin fermentation process is depicted in Fig. 3. The sampling interval is 1 h, and the fermentation time for penicillin production is 400 h. As seen in Table 1, multiple process parameters, including the aeration rate, agitator power, and substrate feed flow rate, among others, have a substantial impact on the efficiency of penicillin fermentation. Table 2 describes the fault types involved in the experiment.

4.2. Analysis of experimental results

For comparison, conventional statistical methods such as KICA and DKICA, deep neural network methods such as SAE and GRU, as well as methods that exploit the potential of wide learning systems such as BLS and OBLS, are chosen. In the experimental configuration, KICA and DKICA select 10 and 14 independent components, respectively, and the control limits for the 𝑇² and SPE statistics are set to 95%. The time step of DKICA is selected as 2. The Gaussian kernel function is chosen as the kernel method. The architecture of the SAE consists of three sequential layers: two autoencoder layers followed by a softmax layer, comprising 100, 50, and 4 neurons, respectively. The GRU architecture comprises a GRU layer and a Dense layer, containing 200 and 64 neurons, respectively. The ENGRU, on the other hand, features an independent component layer with 24 independent components, a GRU layer, and a Dense layer, consisting of 300 and 64 neurons, respectively. The feature window number, the node number within each window, and the enhancement node number for both BLS and OBLS are set to 10, 10, and 50, respectively. The OBLS also contains 24 independent components, matching the amount in the ENGRU. As evaluation metrics, the False Alarm Rate (FAR), Miss Alarm Rate (MAR), and Accuracy Rate (ACC) are selected to accurately evaluate the performance of the proposed method. The calculation formulas are provided below:

FAR = (No. of false alarms / No. of total normal samples) × 100% (28)

MAR = (No. of missing alarms / No. of total fault samples) × 100% (29)

ACC = (1 − No. of false and missed alarm samples / No. of total samples) × 100% (30)

Tables 3 and 4 demonstrate that the ENGRU approach outperforms the other comparison algorithms in terms of testing accuracy in the majority of cases. Table 5 contrasts the time overhead of the ENGRU model with those of its five competitors. In contrast to the other approaches, the accuracy of the 𝑇² and SPE statistics for KICA is subpar due to the high computational complexity of dealing with high-dimensional data, resulting in a 15.517 s time overhead. In addition, KICA is prone to instability and falls short in its ability to extract independent features, both of which contribute to an elevated likelihood of false alarms and missed alarms. Therefore, the accuracy of fault detection is drastically diminished. On the other hand, the ACC of the enhanced kernel independent component analysis algorithm, particularly the DKICA model, performs exceptionally well in comparison with the standard KICA, while the additional computational overhead it incurs is only approximately 12 s longer than that of KICA. The SAE is a dependable fault detection model that enhances nonlinear processing capabilities. However, it demonstrates a lower monitoring
The data generated during the fermentation process are nonlinear, accuracy with a mean accuracy of 0.8660. This limitation primarily
non-Gaussian, and time correlated. For comparison experiments, more stems from its failure to accounts for the wide variety of fault types
C. Peng et al. Expert Systems With Applications 237 (2024) 121348
Table 3
Experimental results of classical monitoring methods: KICA and DKICA (FAR, MAR and ACC).
Fault no. KICA(T2 ) KICA(SPE) DKICA(T2 ) DKICA(SPE)
FAR MAR ACC FAR MAR ACC FAR MAR ACC FAR MAR ACC
1 0.0494 0.9643 0.8225 0.0552 0.9643 0.8175 0.0581 0.8929 0.8250 0.0407 0.8571 0.8450
2 0.0223 0.0000 0.9850 0.0260 0.0000 0.9825 0.0446 0.0000 0.9700 0.0260 0.0000 0.9825
3 0.0374 0.9340 0.7250 0.0578 0.9623 0.7025 0.0408 0.8491 0.7450 0.0374 0.8491 0.7475
4 0.0535 0.0000 0.9600 0.0635 0.0000 0.9525 0.0502 0.0000 0.9625 0.0669 0.0000 0.9500
5 0.0408 0.9434 0.7200 0.0578 0.9623 0.7025 0.0476 0.9057 0.7250 0.0374 0.9057 0.7325
6 0.0421 0.9231 0.7575 0.0615 0.9670 0.7325 0.0615 0.9011 0.7475 0.0356 0.9231 0.7625
7 0.0629 0.0041 0.9725 0.1006 0.0041 0.9575 0.0818 0.0000 0.9675 0.0755 0.0000 0.9700
8 0.0566 0.6763 0.5700 0.0943 0.6598 0.5650 0.0755 0.2365 0.8275 0.0692 0.1909 0.8575
9 0.0435 0.0396 0.9575 0.0569 0.0396 0.9475 0.0569 0.0000 0.9575 0.0602 0.0000 0.9550
10 0.0435 0.4059 0.8650 0.0569 0.2871 0.8850 0.0569 0.0792 0.9375 0.0602 0.0693 0.9375
AVG 0.0452 0.4891 0.8335 0.0631 0.4846 0.8245 0.0574 0.3864 0.8665 0.0509 0.3795 0.8740
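As a quick consistency check on Table 3, the AVG row is simply the arithmetic mean of the ten per-fault values in each column. A minimal Python sketch (the list literals are transcribed from the KICA(T²) columns above; the function name is illustrative):

```python
# Reproduce the AVG row of Table 3 for the KICA(T2) columns.
# Per-fault values transcribed from the table above.
kica_t2_far = [0.0494, 0.0223, 0.0374, 0.0535, 0.0408,
               0.0421, 0.0629, 0.0566, 0.0435, 0.0435]
kica_t2_acc = [0.8225, 0.9850, 0.7250, 0.9600, 0.7200,
               0.7575, 0.9725, 0.5700, 0.9575, 0.8650]

def column_average(values):
    """Arithmetic mean over the ten faults, rounded as in the table."""
    return round(sum(values) / len(values), 4)

print(column_average(kica_t2_far))  # 0.0452, matching the AVG row
print(column_average(kica_t2_acc))  # 0.8335, matching the AVG row
```

The same averaging reproduces the AVG rows of the remaining columns in Tables 3 and 4.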
Table 4
Experimental results of some monitoring methods: SAE, BLS, OBLS, GRU and ENGRU (FAR, MAR and ACC).
Fault no. SAE BLS OBLS GRU ENGRU
FAR MAR ACC FAR MAR ACC FAR MAR ACC FAR MAR ACC FAR MAR ACC
1 0.0814 0.8036 0.8175 0.0436 0.5179 0.8900 0.0087 0.3750 0.9400 0.0029 0.1273 0.9800 0.0000 0.1636 0.9775
2 0.0000 0.2748 0.9100 0.0000 0.1832 0.9400 0.0000 0.0840 0.9725 0.0000 0.1077 0.9650 0.0000 0.0846 0.9725
3 0.0034 0.9528 0.7450 0.0238 0.6981 0.7975 0.0000 0.5000 0.8675 0.0000 0.1905 0.9500 0.0000 0.1619 0.9575
4 0.0000 1.0000 0.7475 0.0000 1.0000 0.7475 0.0000 1.0000 0.7475 0.0000 1.0000 0.7500 0.0033 1.0000 0.7475
5 0.0680 0.8962 0.7125 0.0272 0.8019 0.7675 0.0000 0.6415 0.8300 0.0034 0.2667 0.9275 0.0034 0.2762 0.9250
6 0.0259 0.7582 0.8075 0.0227 0.7033 0.8225 0.0065 0.5604 0.8675 0.0065 0.5889 0.8625 0.0032 0.5667 0.8700
7 0.0189 0.0539 0.9600 0.0000 0.6307 0.6200 0.0000 0.5975 0.6400 0.0000 0.0083 0.9950 0.0000 0.0000 1.0000
8 0.0000 0.0000 1.0000 0.0000 0.4647 0.7200 0.0000 0.4813 0.7100 0.0063 0.0917 0.9425 0.0063 0.0125 0.9900
9 0.0000 0.1584 0.9600 0.0000 0.1188 0.9700 0.0033 0.2475 0.9350 0.0000 0.1000 0.9750 0.0000 0.0700 0.9825
10 0.0000 0.0000 1.0000 0.0000 0.0000 1.0000 0.0000 0.0000 1.0000 0.0033 0.0000 0.9975 0.0033 0.0000 0.9975
AVG 0.0198 0.4898 0.8660 0.0117 0.5119 0.8275 0.0019 0.4487 0.8510 0.0022 0.2481 0.9345 0.0020 0.2336 0.9420
Table 5
Time overhead of models in penicillin fermentation process.
Models Time overhead (s)
KICA 15.517
DKICA 27.459
BLS 0.398
OBLS 5.816
GRU 145.207
ENGRU 204.345
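The three statistics in Eqs. (28)–(30) are straightforward to compute from binary alarm decisions. A minimal sketch, assuming 0/1 labels; the function name and toy label vectors are illustrative, not from the paper:

```python
def monitoring_metrics(y_true, y_pred):
    """FAR, MAR and ACC as in Eqs. (28)-(30).
    y_true/y_pred: 0 = normal, 1 = fault (alarm raised)."""
    false_alarms = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    missed_alarms = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n_normal = sum(1 for t in y_true if t == 0)
    n_fault = sum(1 for t in y_true if t == 1)
    far = false_alarms / n_normal * 100                              # Eq. (28)
    mar = missed_alarms / n_fault * 100                              # Eq. (29)
    acc = (1 - (false_alarms + missed_alarms) / len(y_true)) * 100   # Eq. (30)
    return far, mar, acc

# Toy example: 5 normal samples (1 false alarm), 10 fault samples (2 missed).
y_true = [0, 0, 0, 0, 0] + [1] * 10
y_pred = [1, 0, 0, 0, 0] + [0, 0] + [1] * 8
print(monitoring_metrics(y_true, y_pred))  # (20.0, 20.0, 80.0)
```

Note that FAR is normalized by the normal samples only and MAR by the fault samples only, while ACC pools both error types over all samples; this is why a method can attain a low FAR yet a poor ACC when it misses many faults.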
and the evident time correlation among process variables, leading to incomplete feature extraction and suboptimal performance in fault detection. In contrast, the GRU network excels at capturing the dynamic characteristics of time series data. Experimental results reveal that the GRU achieves an average accuracy of 0.9345, while the ENGRU attains a higher 0.9420. This improvement in mean accuracy highlights the advantages of the GRU and ENGRU regarding their memory cells and gating mechanisms. These features enable the networks to selectively retain or discard information over time, effectively addressing the evident time correlation among process variables. Results from the GRU model are generally positive, and the majority of faults can be monitored, though there are a few notable exceptions, such as faults 4 and 6. An analysis of the agitator power fault (fault 8), the aeration rate faults (faults 7 and 9), and the substrate feed flow rate faults (faults 3 and 6) reveals that the ENGRU model is consistently more accurate than the other comparison algorithms, demonstrating the potential of the proposed fault detection method.

The FAR, MAR, ACC, and time overhead serve as the evaluation metrics for the models. Fig. 4 illustrates the mean FAR, MAR, and ACC for all fault types, facilitating a straightforward visual comparison of the monitoring results analyzed in this paper. The experimental outcomes reveal that the ENGRU method yields the lowest MAR while achieving the highest average ACC, thereby signifying its effectiveness in detecting faults. Although the FAR for the ENGRU method, one of the evaluation indicators, is not optimal, its value remains within an acceptable range and is less than 0.2%. It is worth noting that the time overhead of the ENGRU amounts to 204.345 s, a slight increase compared with the standard GRU. This minor increase can be attributed to the larger dimension of the independent component matrix in the ENGRU, which necessitates learning more parameters. Despite the increased computational overhead, the ENGRU model achieves an average improvement in accuracy, which still falls within the acceptable range for batch production processes. The overall results suggest that the method enhanced by overcomplete independent component analysis surpasses the raw model in fault detection. For instance, the average ACC of the BLS is 0.8275, while that of the OBLS is 0.8510. Similarly, the ENGRU attains an average ACC of 0.9420, whereas the GRU achieves a comparable 0.9345. An improvement of this magnitude in the mean monitoring accuracy indicates that models may greatly benefit from the overcomplete independent component extraction mechanism to further enhance fault detection accuracy. In summary, based on the aforementioned four indices, ENGRU delivers the best performance, providing the most effective and reliable monitoring results for batch production processes.

Fig. 4. Mean value of each index (FAR, MAR and ACC) for KICA, DKICA, SAE, BLS, OBLS, GRU and ENGRU in fault monitoring.

5. Conclusion

Efficient fault detection continues to face significant challenges due to the nonlinearity, time correlation, and non-Gaussianity of process
data. The ENGRU neural network offers an effective and accurate solution for feature extraction from process data, as it successfully alleviates the challenges of nonlinearity, time correlation, and non-Gaussianity simultaneously. To evaluate the performance of ENGRU, a comparative analysis is conducted on the Pensim platform alongside other neural networks. Experimental results demonstrate that ENGRU outperforms conventional approaches in terms of fault detection, as it is capable of exploiting more information to enhance monitoring accuracy. Consequently, ENGRU exhibits superior performance in fault detection. However, the current ENGRU network encounters certain limitations, especially when applied to batch processes that involve composite faults; regrettably, the focus of this paper was directed solely towards a single type of fault. Furthermore, drawing inspiration from the works of Kong et al. (2022a) and Yang et al. (2023), we have come to realize the paramount importance of determining the optimal placement and sampling frequency of sensors. This determination is crucial for enhancing the efficiency and accuracy of fault detection. Therefore, future research endeavors will concentrate on establishing an optimization model for determining the optimal number and placement of sensors to enhance the performance of the fault detection model. Additionally, we also intend to develop a fault detection framework tailored for composite faults that integrates real-world data from the physical system with virtual data from a digital twin model, in order to significantly enhance the accuracy of detecting composite faults within industrial processes.

CRediT authorship contribution statement

Chang Peng: Investigation, Project administration, Supervision, Writing – original draft, Formal analysis, Validation. Xu Ying: Investigation, Writing – original draft, Software, Formal analysis, Validation. Shi ShanQi: Writing, Formal analysis, Validation. Fang ZiYun: Writing, Validation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgments

This paper has received support from both the Beijing Natural Science Foundation, PR China (4232042) and the National Natural Science Foundation of China (62273190).

References

Ahmad, T., & Wu, J. (2023). SDIGRU: Spatial and deep features integration using multilayer gated recurrent unit for human activity recognition. IEEE Transactions on Computational Social Systems.

Ahmad, T., Wu, J., Alwageed, H. S., Khan, F., Khan, J., & Lee, Y. (2023). Human activity recognition based on deep-temporal learning using convolution neural networks features and bidirectional gated recurrent unit with features selection. IEEE Access, 11, 33148–33159.

Amin, M. T., Khan, F., & Imtiaz, S. (2019). Fault detection and pathway analysis using a dynamic Bayesian network. Chemical Engineering Science, 195, 777–790.

Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), 183–202.

Birol, G., Ündey, C., & Cinar, A. (2002). A modular simulation package for fed-batch fermentation: penicillin production. Computers & Chemical Engineering, 26(11), 1553–1565.

Cai, L., Tian, X., & Chen, S. (2015). Monitoring nonlinear and non-Gaussian processes using Gaussian mixture model-based weighted kernel independent component analysis. IEEE Transactions on Neural Networks and Learning Systems, 28(1), 122–135.

Chang, P., Li, Z., Wang, G., & Wang, P. (2021). An effective deep recurrent network with high-order statistic information for fault monitoring in wastewater treatment process. Expert Systems with Applications, 167, Article 114141.

Chang, P., & Lu, R. (2021). Process monitoring of batch process based on overcomplete broad learning network. Engineering Applications of Artificial Intelligence, 99, Article 104139.

Chang, P., Wang, K., & Wang, P. (2020). Quality relevant over-complete independent component analysis based monitoring for non-linear and non-Gaussian batch process. Chemometrics and Intelligent Laboratory Systems, 205, Article 104140.

Chang, P., Xu, Y., & Hu, Z. (2022). Industrial process monitoring based on dynamic overcomplete broad learning network. IEEE Transactions on Neural Networks and Learning Systems.

Chen, Q., Liu, Z., Ma, X., & Wang, Y. (2021). Artificial neural correlation analysis for performance-indicator-related nonlinear process monitoring. IEEE Transactions on Industrial Informatics, 18(2), 1039–1049.

Deng, X., Tian, X., Chen, S., & Harris, C. J. (2018). Deep principal component analysis based on layerwise feature extraction and its application to nonlinear process monitoring. IEEE Transactions on Control Systems Technology, 27(6), 2526–2540.

Ding, C., Chang, P., & Olivia, K. (2020). Enhanced high-order information extraction for multiphase batch process fault monitoring. The Canadian Journal of Chemical Engineering, 98(10), 2187–2204.

Duchi, J., Shalev, S., Singer, Y., & Chandra, T. (2008). Efficient projections onto the l1-ball for learning in high dimensions. In Proceedings of the 25th international conference on machine learning (pp. 272–279).

Fazai, R., Mansouri, M., Abodayeh, K., Nounou, H., & Nounou, M. (2019). Online reduced kernel PLS combined with GLRT for fault detection in chemical systems. Process Safety and Environmental Protection, 128, 228–243.

Feng, L., & Sun, R. (2017). Dynamic kernel independent component analysis approach for fault detection and diagnosis. In 2017 Chinese automation congress (pp. 2193–2197). IEEE.

Granados, G. E., Lacroix, L., & Medjaher, K. (2020). Condition monitoring and prediction of solution quality during a copper electroplating process. Journal of Intelligent Manufacturing, 31(2), 285–300.

Hunter, D. R., & Lange, K. (2004). A tutorial on MM algorithms. The American Statistician, 58(1), 30–37.

Jiang, Q., & Yan, X. (2018). Parallel PCA–KPCA for nonlinear process monitoring. Control Engineering Practice, 80, 17–25.

Jiang, Q., Yan, S., Cheng, H., & Yan, X. (2020). Local–global modeling and distributed computing framework for nonlinear plant-wide process monitoring with industrial big data. IEEE Transactions on Neural Networks and Learning Systems, 32(8), 3355–3365.

Jiang, Q., Yan, X., & Huang, B. (2019). Deep discriminative representation learning for nonlinear process fault detection. IEEE Transactions on Automation Science and Engineering, 17(3), 1410–1419.

Kong, X., Cai, B., Liu, Y., Zhu, H., Liu, Y., Shao, H., Yang, C., Li, H., & Mo, T. (2022a). Optimal sensor placement methodology of hydraulic control system for fault diagnosis. Mechanical Systems and Signal Processing, 174, Article 109069.

Kong, X., Cai, B., Liu, Y., Zhu, H., Yang, C., Gao, C., Liu, Y., Liu, Z., & Ji, R. (2022b). Fault diagnosis methodology of redundant closed-loop feedback control systems: Subsea blowout preventer system as a case study. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 53(3), 1618–1629.

Lee, J.-M., Qin, S. J., & Lee, I.-B. (2007). Fault detection of non-linear processes using kernel independent component analysis. The Canadian Journal of Chemical Engineering, 85(4), 526–536.

Lou, Z., Wang, Y., Si, Y., & Lu, S. (2022). A novel multivariate statistical process monitoring algorithm: Orthonormal subspace analysis. Automatica, 138, Article 110148.

Lv, Z., Yan, X., & Jiang, Q. (2014). Batch process monitoring based on just-in-time learning and multiple-subspace principal component analysis. Chemometrics and Intelligent Laboratory Systems, 137, 128–139.

Podosinnikova, A., Bach, F., & Lacoste-Julien, S. (2016). Beyond CCA: Moment matching for multi-view models. In International conference on machine learning (pp. 458–467). PMLR.

Ren, L., Wang, T., Laili, Y., & Zhang, L. (2021). A data-driven self-supervised LSTM-DeepFM model for industrial soft sensor. IEEE Transactions on Industrial Informatics, 18(9), 5859–5869.

Si, Y., Wang, Y., & Zhou, D. (2020). Key-performance-indicator-related process monitoring based on improved kernel partial least squares. IEEE Transactions on Industrial Electronics, 68(3), 2626–2636.

Stief, A., Ottewill, J. R., Baranowski, J., & Orkisz, M. (2019). A PCA and two-stage Bayesian sensor fusion approach for diagnosing electrical and mechanical faults in induction motors. IEEE Transactions on Industrial Electronics, 66(12), 9510–9520.

Tong, C., & Yan, X. (2015). A novel decentralized process monitoring scheme using a modified multiblock PCA algorithm. IEEE Transactions on Automation Science and Engineering, 14(2), 1129–1138.

Wang, J., He, Y.-L., & Zhu, Q.-X. (2020). Energy and production efficiency optimization of an ethylene plant considering process operation and structure. Industrial and Engineering Chemistry Research, 59(3), 1202–1217.

Wee, Y. Y., Cheah, W. P., Tan, S. C., & Wee, K. (2015). A method for root cause analysis with a Bayesian belief network and fuzzy cognitive map. Expert Systems with Applications, 42(1), 468–487.

Xie, L., Zeng, J., & Gao, C. (2013). Novel just-in-time learning-based soft sensor utilizing non-Gaussian information. IEEE Transactions on Control Systems Technology, 22(1), 360–368.

Yang, C., Cai, B., Wu, Q., Wang, C., Ge, W., Hu, Z., Zhu, W., Zhang, L., & Wang, L. (2023). Digital twin-driven fault diagnosis method for composite faults by combining virtual and real data. Journal of Industrial Information Integration, 33, Article 100469.

Yu, J., & Yan, X. (2021). A new deep model based on the stacked autoencoder with intensified iterative learning style for industrial fault detection. Process Safety and Environmental Protection, 153, 47–59.

Yu, J., & Zhang, C. (2020). Manifold regularized stacked autoencoders-based feature learning for fault detection in industrial processes. Journal of Process Control, 92, 119–136.

Yuan, X., Li, L., Shardt, Y. A., Wang, Y., & Yang, C. (2020). Deep learning with spatiotemporal attention-based LSTM for industrial soft sensor model development. IEEE Transactions on Industrial Electronics, 68(5), 4404–4414.

Zhang, S., & Zhao, C. (2018). Slow-feature-analysis-based batch process monitoring with comprehensive interpretation of operation condition deviation and dynamic anomaly. IEEE Transactions on Industrial Electronics, 66(5), 3773–3783.