
Computers and Chemical Engineering 186 (2024) 108707

Contents lists available at ScienceDirect

Computers and Chemical Engineering


journal homepage: www.elsevier.com/locate/compchemeng

A data enhancement method based on generative adversarial network for small sample-size with soft sensor application

Zhongyi Zhang a, Xueting Wang b, Guan Wang b, Qingchao Jiang a,*, Xuefeng Yan a, Yingping Zhuang b

a Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai 200237, China
b State Key Laboratory of Bioreactor Engineering, East China University of Science and Technology, Shanghai 200237, China

ARTICLE INFO

Keywords:
Small sample-size
Generative adversarial network
Soft sensor
Process modeling

ABSTRACT

Soft sensors play an important role in improving product quality; however, practical applications often face the problem of small sample size, which is challenging for developing data-driven models in terms of feature selection and good generalization. This paper proposes a data enhancement approach for small sample-size data-driven problems based on generative adversarial networks integrated with maximum relevance minimum redundancy (MRMR). First, sample expansion is performed on the initial data by using a generative adversarial network. Second, irrelevant variables are eliminated by the MRMR and the optimal features are obtained. Finally, neural network-based soft sensor modeling is performed using the augmented dataset and the selected features. The proposed method is tested on a simulated penicillin case, an actual penicillin production case and an actual erythromycin production case. Experimental results show that the proposed method outperforms state-of-the-art existing methods, verifying its effectiveness and superiority.

1. Introduction

Modern industrial processes are becoming more intensified and integrated, which means increased challenges for process data analysis and modeling due to high dimensionality and complicated nonlinearities. Establishing real-time quality indicators is of vital importance for the safe and efficient operation of industrial processes (Souza et al., 2016). However, in most cases direct quality variables are not available due to problems such as the cost of measurement equipment and immeasurable variables (Yeo et al., 2023). Therefore, soft sensor techniques have been widely used to develop quality indicators for industrial processes (Souza et al., 2016; Yeo et al., 2023; Kadlec et al., 2009; Kadlec et al., 2011). Due to the high dimensionality of industrial process variables, feature selection is particularly important for soft sensor modeling (Zaman et al., 2022). Traditional soft sensor modeling methods such as least squares (LS) (Kim et al., 2022; Ding, 2023), partial least squares (PLS) (Xie et al., 2022; Zhou et al., 2023) and principal component analysis (PCA) (Gaigole and Rajiv, 2023) can be well applied to soft sensors of linear processes. However, when facing complex nonlinear processes, LS, PLS and PCA become insufficient for capturing and describing the nonlinear relationships among variables.

To handle nonlinear and high-dimensional processes, various machine learning and deep learning methods have been proposed, including kernel-based methods, locally weighted methods and artificial neural network (ANN) methods. The main idea of kernel-based methods is to use kernel functions to project the data into a high-dimensional space so that the variables are more likely to be linearly related (Rosipal and Trejo, 2002). Jiang et al. proposed a parallel monitoring method based on PCA and kernel PCA, which achieved good results for highly nonlinear industrial process modeling (Jiang and Yan, 2018). Zhang et al. proposed a concept-drift-adaptive dynamic partial least squares method to overcome transient and concept drift problems, which showed good performance on several industrial datasets (Zhang et al., 2023). Yan et al. proposed local depth-specific view auto kernels mimicking deep neural networks to effectively construct a deep matching model for multi-view kernel learning (Yan et al., 2023). The basic idea of locally weighted approaches is to linearize the nonlinear process locally, which has the advantage of good model interpretability (Cleveland and Devlin, 1988). For instance, Wang et al. proposed a variable-wise weighted parallel stacked autoencoder (SAE) model for nonlinear multimodal process monitoring, which was applied to develop quality soft sensors in a nitrogen synthesis process

* Corresponding author.
E-mail address: [email protected] (Q. Jiang).

https://doi.org/10.1016/j.compchemeng.2024.108707
Received 9 October 2023; Received in revised form 19 March 2024; Accepted 20 April 2024
Available online 27 April 2024
0098-1354/© 2024 Elsevier Ltd. All rights reserved.

(Wang et al., 2021). However, it is challenging to optimize the weighting parameters when facing complex and high-dimensional nonlinear models with a small data size (Yan et al., 2020).

ANNs have universal fitting capabilities, making ANN-based methods suitable for data-driven environments. Jung et al. proposed a dataset fusion method for predicting the aging status of catalysts in heavy oil conversion processes; their model adds a mechanistic model balance penalty term to the neural network training constraints to keep the relative prediction error below 10% (Jung et al., 2023). Yuan et al. developed a variable-weighted deep learning method for layer-by-layer weighted optimization of neural networks for soft sensor development (Yuan et al., 2018). Song et al. proposed a neural network embedded variable selection method based on the false discovery rate, which achieved superior results in patient lesion prediction (Song and Li, 2021). Feng et al. proposed adversarial smoothing regularization and an adversarial smoothing tri-regression model for soft sensor development to overcome the problems of noise and sample imbalance in industrial data (Feng et al., 2021). Jiang et al. proposed the use of neural networks to assist the inference of non-Markovian mode parameters, and achieved good results in intracellular substance prediction (Jiang et al., 2021). Lee et al. proposed a CNN-based sensor layout optimization strategy for industrial safety problem detection and early warning (Lee et al., 2023). However, the above methods generally require a large amount of high-quality data to achieve good results, and in the real process industry the acquisition of large amounts of quality data is generally costly.

For data enhancement and data representation, generative adversarial network (GAN) methods have been proposed. GAN can effectively augment the data to learn its intrinsic relationships and representations, so it has been widely used in data enhancement. Pu et al. proposed a cycle-consistent generative adversarial network by incorporating the sliced Wasserstein distance and a system of cyclic equations into the generative adversarial network; the convergence rate and data enhancement capability of the network are remarkably improved (Pu et al., 2023). Zhang et al. proposed a data-driven recursive generative adversarial network that can effectively avoid the neglect of inter-category differences in traditional generative networks and generate more representative visual features (Zhang et al., 2023). Christian et al. proposed a temporal-sample bootstrap generative adversarial network to generate high-quality temporal data on a simulated dataset (Dahl and Sorensen, 2022). Mumbelli et al. incorporated GAN into a computer vision system to improve fault detection and reduce the sample size required for training in automotive manufacturing (Mumbelli et al., 2023). These approaches illustrate that GAN can be applied to various types of data, including image data, temporal data and non-temporal data. At the same time, GAN can reduce the sample-size requirement of a model and enhance the characterization of the original data by generating new data.

GAN has been applied as a means of sample enhancement in soft sensor modeling due to the data insufficiency that occurs in real industrial process soft sensors. Wang et al. proposed a soft sensor method based on Wasserstein GAN to solve the problem of insufficient data in industrial process modeling (Wang and Liu, 2020). Zhu et al. introduced the local anomaly factor and the K-means method to find the scarcity space of the original data, which makes the GAN-generated samples tend toward scarcity-space samples, so that the generated samples can be better applied to subsequent soft sensors (Zhu et al., 2021). Gao et al. combined a supervised variational autoencoder (SVAE) and GAN to propose a new data supplementation method to enhance soft measurement accuracy, which was successfully applied to thermal power industrial processes (Gao et al., 2022). Jin et al. proposed a new virtual sample generation model by combining SVAE with Wasserstein GAN; the generated samples are optimally pruned using multi-objective optimization to obtain enhanced datasets that improve the accuracy of soft sensors (Jin et al., 2023). All of the above studies have successfully applied GAN to soft measurement to improve accuracy. Meanwhile, combined with the current research status of ANNs, insufficient data volume is a common problem in real industrial process modeling.

Therefore, it would be helpful to integrate GAN into soft measurement to improve accuracy. However, during the data enhancement procedure the generative network is complex and has a high computational cost for training. Meanwhile, irrelevant features increase the computational cost and also influence model accuracy. Hence, we propose to incorporate feature selection into the GAN-based data enhancement process. The incorporation of feature selection eliminates irrelevant variables, thus improving the prediction accuracy of the model and reducing its overall complexity. However, facing small samples, a feature selection method cannot play its role well. Therefore, the generative adversarial network can be employed to perform data augmentation to assist feature selection. The penicillin fermentation process is well known to suffer from long fermentation times, a high potential for sensor contamination by the bacterium, the high cost of high-precision sensors, and the need for offline measurement of quality variables. Actual industrial data generally suffer from compromised data quality and inadequate sample sizes. Therefore, methods for developing a good soft sensor for an actual penicillin production process are well worth investigating.

Considering the advantages and disadvantages of GAN and feature selection methods, this paper proposes a GAN-based feature relevance learning (GAN-FRL) method for data enhancement. The proposed method retains the end-to-end learning approach. Although the dual network structure contains both a generative adversarial loss and a prediction error loss, the generative adversarial loss is reduced in order to obtain a smaller prediction error loss. In addition, compared with the single maximum correlation minimum redundancy (MRMR) approach, the introduction of generative adversarial networks allows MRMR to perform variable screening well even for a small sample size. Compared with a single GAN, the proposed method reduces the complexity of the whole model and also improves its accuracy and interpretability by removing irrelevant redundant variables through the MRMR.

The rest of this paper is structured as follows. Section 2 briefly introduces GAN and MRMR. Section 3 details the proposed GAN-FRL data enhancement method. Section 4 presents three case studies of developing a soft sensor with the proposed method: a penicillin simulation case, an actual penicillin production process from the National Engineering Research Center for Biotechnology in China, and a real production case from a biopharmaceutical company in China. Finally, Section 5 concludes with a summary and outlook.

2. Preliminary knowledge

2.1. Generative adversarial networks

An introduction to generative adversarial networks is provided in this section. Fig. 1 is the schematic of a GAN, where the green dotted line

Fig. 1. Schematic diagram for generating adversarial networks.


with an arrow indicates that D has a renewing effect on G and that D has a renewing effect on itself.

Suppose there are two models trained simultaneously. One of them is the generator G, while the other is the discriminator D. G is responsible for generating simulated samples by transforming data drawn from a known distribution F_Z (usually designated as white noise). D discriminates whether given data come from the true distribution F_X, i.e., the distribution of the real data, which is unknown. This is a game of training: G has to generate fake samples true enough to deceive the discriminator D, while D needs to discriminate as reliably as possible whether the given samples come from the real data. Therefore, the loss function of GAN is set as (Arjovsky et al., 2017):

min_G max_D f(D, G) = E_{F_X}[log D(x; θ_D)] + E_{F_Z}[log(1 − D(G(z; θ_G); θ_D))]   (1)

where f(D, G) denotes the loss function of GAN; θ_D denotes the internal parameters of the discriminator D; θ_G denotes the internal parameters of the generator G; x denotes a true sample (x ∼ F_X); z denotes a generated sample (z ∼ F_Z); and D(·) and G(·) denote the mappings of the discriminator and the generator, respectively.

The gradients of GAN are updated alternately. Normally the discriminator is updated first, as follows (Arjovsky et al., 2017):

θ_D^{l+1} = θ_D^l − η ∇_{θ_D^l}   (2)

where θ_D^{l+1} denotes the parameters of the updated D, θ_D^l denotes the parameters of D before the update, and η denotes the learning rate. Then the discriminator is fixed and the parameters of the generator are updated according to a similar principle:

θ_G^{l+1} = θ_G^l − η ∇_{θ_G^l}   (3)

where θ_G^{l+1} denotes the parameters of the updated G, and θ_G^l denotes the parameters of G before the update. The discriminator and generator are then updated in turn until convergence to a pre-determined threshold value.

2.2. Maximum correlation minimum redundancy

Mutual information (MI) is a measure of correlation between variables and is a quantitative expression, based on information entropy, of the mutual certainty between variables (Kraskov et al., 2004). In contrast to the Pearson correlation coefficient, MI measures nonlinear relationships between variables, not only linear ones (Kraskov et al., 2004). Therefore, MI has been widely used in feature selection and variable correlation analysis. For discrete variables, MI is calculated as shown in Eq. (4) (Kraskov et al., 2004):

I(x, y) = Σ_{x∈X} Σ_{y∈Y} p(x, y) log[ p(x, y) / (p(x) p(y)) ]   (4)

where I(x, y) denotes the MI.

MRMR selects the set of variables with maximum relevance and minimum redundancy on the basis of mutual information as the measure of relevance (Peng and Long, 2005). Let S denote the set of features; then, in order to select k_s variables such that the correlation is maximized:

max_S D_R(S, y) = (1/|S|²) Σ_{x_i∈S} I(x_i, y)   (5)

where x_i denotes a variable in the set S and y denotes the quality variable. At this point the set S contains the features with the highest correlation. The redundancy of the set S then needs to be minimized, namely:

min_S R(S) = (1/|S|²) Σ_{x_i,x_j∈S} I(x_i, x_j)   (6)

where x_i and x_j denote variables in the set S. Combining Eqs. (5) and (6) yields the objective function:

max Φ(D_R, R) = D_R − R   (7)

where Φ denotes the metric to be optimized.

3. GAN-FRL for small sample soft sensor method

Feature selection is important in modeling complex processes. However, in an actual industrial production process it is costly to obtain high-quality samples, meaning that the sample size is often inadequate. GAN can effectively alleviate the problem of insufficient initial data samples by generating new samples. Therefore, fusing GAN and feature selection methods can, in theory, achieve good modeling results.

Following this idea, this paper proposes a data enhancement method that combines GAN and MRMR. Fig. 2 presents the basic idea of the proposed method; the red dotted line in the graph represents the update.

Here the key effort is devoted to the estimation of quality indicators at critical moments, and the data are ordinary transient data. Therefore, the loss function for the generation errors is the reconstruction error:

loss_G = (1/n) Σ_{i=1}^{n} ‖ x^(i) − x_gen^(i) ‖₂²   (8)

where x^(i) and x_gen^(i) denote the initial sample and the generated sample, respectively.

After generating data with the GAN, the generated dataset and the initial dataset are merged. The new dataset is then used as input for the subsequent variable selection. The MRMR method is employed for feature selection to eliminate irrelevant and redundant variables. In the production process, the quality variable is generally defined as the titer. Thus, the variable screening is calculated as follows:

max_{S_o} D_titer(S_o, y_titer) = (1/|S_o|²) Σ_{x_i∈S_o} I(x_i, y_titer)   (9)

min_{S_o} R_titer(S_o) = (1/|S_o|²) Σ_{x_i,x_j∈S_o} I(x_i, x_j)   (10)

max Φ_titer(D_titer, R_titer) = D_titer − R_titer   (11)

where S_o denotes the best set of variables for the quality variable, y_titer denotes the quality variable, D_titer denotes the relevance between the set of preferred variables and y_titer, R_titer denotes the redundancy of the set of preferred variables, and Φ_titer denotes the final optimization objective. The number of variables in S_o is a hyperparameter, generally chosen from experience, and is here denoted m_o.

After MRMR-based variable selection, the data of the optimal variable set are obtained. The new dataset is then used as input for soft sensor modeling by fully connected neural networks. The loss function of this network is:

loss_F = (1/n_c) Σ_{j=1}^{n_c} (y_true − y_pre)²   (12)

where n_c denotes the number of samples in the new dataset and y_pre denotes the final predicted quality variable. Overall, the proposed GAN-FRL method consists of three networks with two structures. The purpose is to build a soft sensor model with high accuracy and low complexity in the case of insufficient initial samples. The main steps of the proposed GAN-FRL method are as follows:


Fig. 2. Method flowchart of GAN-FRL.

1) Use the initial data {X_initial, Y_initial} (after normalization) as input for data amplification with the GAN. Combine the generated data with the initial data to form the augmented dataset {X_amp, Y_amp}.
2) Set Y as the quality variable and X as the candidate screening variables, and use MRMR to select the important variables. Remove the information of irrelevant and redundant variables to obtain a new dataset {X_impor, Y_amp}.
3) Model with a new fully connected network and the dataset {X_impor, Y_amp}.

The proposed method is superior and necessary for the following reasons:

1) Neither a single GAN nor a feature selection method can model well in a small-sample setting. A small data sample size sets a barrier for feature selection methods; however, even a small sample may contain irrelevant variables.
2) Combining GAN and MRMR is complementary: GAN can compensate for the difficulty of feature selection caused by insufficient samples, while MRMR can reduce model complexity by eliminating irrelevant variables.

Section 4 will verify the effectiveness and practicality of the method through a simulation case and real industrial production cases. The main evaluation metrics are the root mean square error (RMSE) and R-squared (R²), which are calculated as follows:

RMSE = √[ Σ_{j=1}^{N_T} (y^(j) − ŷ^(j))² / (N_T − 1) ]   (13)

R² = 1 − Σ_{j=1}^{N_T} (y^(j) − ŷ^(j))² / Σ_{j=1}^{N_T} (y^(j) − ȳ)²   (14)

4. Case studies and application example

4.1. Penicillin simulation

Penicillin is one of the most common and widely used antibiotics. After years of research and improvement, penicillin has a mature production process. However, there are still some problems in the penicillin production process, such as insufficient process detection, key performance indicators that are difficult to measure, and a lack of process control and data analysis methods. With the deepening of the concept of green bio-manufacturing, how to ensure high-quality and efficient production of penicillin has become a research hotspot. The data used in this work are generated by the penicillin simulation platform designed by (Birol, 2002), and the main parameters of the process are shown in Table 1. In total, 1500 batches of data were generated, and the penicillin titer at the final moment was taken as the quality variable. The penicillin production process is time-series, and the variables at adjacent moments have an impact on the quality variables at the batch-end moment. Therefore, the data are expanded in time series toward the previous moments with an expansion degree of 3 (giving a total of 60 variables). The data-unfolding method is shown in Fig. 3.

To generate high-quality data while taking the overall complexity of the model into account, the generative network is set to two hidden layers after experimental debugging. The fully connected neural network used for regression modeling is also set to two hidden layers. In addition, the sample size of the generated data is the same as the initial training-data sample size in order to facilitate validation. After cross-validation debugging, the number of retained variables m_o in MRMR is set to 20. The data in all experiments are divided into training and test sets by 4:1. The sample size of the initial dataset used for the experiments was incremented from 500 to 1500 with an interval of 100; the subsequent experiments try to find the critical sample size of the proposed method.

Fig. 4 shows the prediction performance of the proposed method at a sample size of 1500 (the value of the predicted variable is the true absolute value). It is apparent that the proposed method provides a good overall fit when the sample size is 1500. To better show the advantages and effectiveness of the proposed method for modeling on small samples, MRMR (with m_o set the same) and single-GAN were chosen as the main comparison methods for the experiment. The single-

Table 1
Variable description of the penicillin simulation.

Variable   Description
x1         Aeration rate
x2         Agitator power
x3         Flow rate of feeding substrate
x4         Temperature of feeding substrate
x5         Substrate concentration
x6         Dissolved oxygen concentration
x7         Biomass concentration
x8         Culture volume
x9         Carbon dioxide concentration
x10        Hydrogen ion concentration
x11        Temperature
x12        Heat generation
x13        Acid flow rate
x14        Base flow rate
x15        Cooling/Heating water flow rate

Fig. 3. Data processing chart.
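The time-series unfolding sketched in Fig. 3 (expansion degree 3, so each sample stacks the current instant with its three predecessors, 15 × 4 = 60 features) can be illustrated as below. This is a minimal sketch; the function and array names are illustrative and not taken from the paper.

```python
import numpy as np

def unfold_time_series(batch, degree=3):
    """Stack each sampling instant with its `degree` previous instants.

    batch: array of shape (T, m) -- T time points, m process variables.
    Returns an array of shape (T - degree, m * (degree + 1)), where each
    row holds the current instant followed by its predecessors.
    """
    T, m = batch.shape
    rows = [np.concatenate([batch[t - d] for d in range(degree + 1)])
            for t in range(degree, T)]
    return np.asarray(rows)

# 15 simulated process variables (as in Table 1) over 20 sampling instants
batch = np.random.rand(20, 15)
X = unfold_time_series(batch, degree=3)
print(X.shape)  # (17, 60): 15 variables x 4 instants = 60 features
```

The first unfolded row corresponds to the fourth sampling instant, since the three earlier instants are consumed as history.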

Fig. 4. GAN-FRL performance in the penicillin simulation case with a sample size of 1500.

GAN comparison is to verify the effectiveness and necessity of introducing MRMR. Comparison results of the three methods are provided in Fig. 5, where the proposed method outperforms both MRMR and single-GAN over the sample-size interval 500–1500. From Fig. 5, we can see that the improvement of GAN-FRL accuracy with sample size in the interval 700–1300 is higher than that of the remaining two methods. When the sample size is less than 700, GAN-FRL only slightly outperforms the two methods; the reason is that the quality of the samples generated by the GAN is not high due to the too-small sample size. When the sample size is greater than 1300, the accuracy improvement of all methods tends to converge.

In addition, the black dotted-line boxes in Fig. 5 are worth noting. The flatter box has two dots: the orange pentagram denotes the R² of GAN-FRL at a sample size of 1000, and the blue circle represents the R² of Single-GAN at a sample size of 1200. This result shows that the initial sample size used by GAN-FRL is 200 less than that of Single-GAN for a comparable fit. Looking further down in Fig. 5, the circular box contains two dots: the red pentagram is the RMSE of GAN-FRL at a sample size of 1100, and the dark blue circle is the RMSE of Single-GAN at a sample size of 1200. This means that GAN-FRL achieves, at a sample size of 1100, the same accuracy as Single-GAN at 1200, i.e., with an initial sample size reduced by 100. The above analysis illustrates that GAN-FRL can achieve higher accuracy with fewer samples, validating its superiority in data enhancement. Finally, at a sample size of 1500, the modeling accuracy and fit of GAN-FRL are better than those of the other two methods.

To show the improvement of the proposed method more precisely, Table 2 provides the performance of all methods at sample sizes of 500, 1000, 1100, 1200 and 1500. Note that 'M' in the tables denotes the method and 'SPL' denotes the sample size used (containing both the training and test sets). From Table 2, GAN-FRL improves by 5.2% compared with Single-GAN at a sample size of 1500. In addition, the accuracy of GAN-FRL at a sample size of 1100 is comparable to that of Single-GAN at a sample size of 1500, indicating


Fig. 5. Performance of the ablation experiments with sample size.
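For reference, the two evaluation metrics of Eqs. (13) and (14), used throughout the comparisons in Fig. 5 and Tables 2–3, can be computed as below. Note that the paper's RMSE divides by N_T − 1 rather than N_T; this sketch follows the paper's formula and is not the authors' code.

```python
import numpy as np

def rmse(y_true, y_pred):
    # Eq. (13): note the (N_T - 1) divisor used in the paper
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.sum((y_true - y_pred) ** 2) / (len(y_true) - 1))

def r_squared(y_true, y_pred):
    # Eq. (14): 1 - residual sum of squares / total sum of squares
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y, y_hat = [1.0, 2.0, 3.0], [1.0, 2.0, 5.0]
print(rmse(y, y_hat))       # sqrt(4 / 2) = 1.414...
print(r_squared(y, y_hat))  # 1 - 4/2 = -1.0
```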

Table 2
Performance of RMSE of the 8 methods (Simulation case).

SPL \ M  GPR      LSR      MLP      MRMR     Single-GAN          WGAN                CGAN                GAN-FRL (Proposed)
<500     0.01455  0.01310  0.01332  0.01323  0.01543 (Unstable)  0.01572 (Unstable)  0.01588 (Unstable)  0.01533 (Unstable)
500      0.01437  0.01385  0.01328  0.01284  0.01283             0.01283             0.01282             0.01281
1000     0.01343  0.01339  0.01258  0.01222  0.01168             0.01167             0.01132             0.01121
1100     0.01327  0.01305  0.01247  0.01210  0.01146             0.01144             0.01104             0.01085
1200     0.01325  0.01273  0.01227  0.01141  0.01085             0.01085             0.01073             0.01045
1500     0.01285  0.01241  0.01180  0.01129  0.01083             0.01082             0.01048             0.01027
>2000    0.01206  0.01237  0.01034  0.01027  0.01029             0.01028             0.01025(4)          0.01024(3)
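The MRMR screening of Eqs. (9)–(11) is commonly implemented greedily: at each step the candidate variable maximizing relevance minus redundancy (Φ = D − R) is added to the selected set. The sketch below works on discretized variables with the mutual information of Eq. (4); the incremental per-candidate scoring is the standard greedy variant rather than the paper's exact set-averaged objective, and all names are illustrative.

```python
import numpy as np
from collections import Counter

def mutual_info(x, y):
    """Eq. (4) for discrete variables: sum p(x,y) log[p(x,y) / (p(x)p(y))]."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * np.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def mrmr_select(X, y, m_o):
    """Greedy MRMR: start from the most relevant variable, then repeatedly
    add the variable with the best relevance-minus-mean-redundancy score."""
    n_vars = X.shape[1]
    relevance = [mutual_info(X[:, j], y) for j in range(n_vars)]
    selected = [int(np.argmax(relevance))]
    while len(selected) < m_o:
        best, best_score = None, -np.inf
        for j in range(n_vars):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(X[:, j], X[:, k]) for k in selected])
            score = relevance[j] - redundancy   # Phi = D - R
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

# toy data: x0 and x2 jointly determine y; x1 is a redundant copy of x0
rng = np.random.default_rng(0)
a, c = rng.integers(0, 2, 200), rng.integers(0, 2, 200)
y = a + c
X = np.column_stack([a, a, c])
print(sorted(mrmr_select(X, y, 2)))  # [0, 2]: the duplicate column 1 is skipped
```

With m_o = 2 the greedy search keeps one informative variable and its complementary one, while the exact duplicate is rejected for its redundancy.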

that the proposed method can achieve higher accuracy in an environment with fewer samples. Overall, the proposed GAN-FRL outperforms all other methods on the RMSE evaluation metric. From Table 3, GAN-FRL improves by 2.06% compared with Single-GAN at a sample size of 1500 on the R² evaluation metric. In addition, the R² of GAN-FRL at a sample size of 1000 is higher than that of MRMR at a sample size of 1500. As mentioned before, the sample size generated by the generative network equals the initial training sample size; the actual training sample size given to the regression part of the GAN-FRL model is therefore 1600. In contrast, the training sample size given to the regression model in MRMR at an initial sample of 1500 is only 1200. This indicates that the variable selection method is greatly constrained by the small-sample environment. Meanwhile, GAN-FRL has a better R² at a sample size of 1100 than Single-GAN at a sample size of 1500.

In addition, as shown in Table 2 and Table 3, WGAN has a small advantage over GAN, while CGAN brings a large improvement over GAN (the conditional data for CGAN is a time-series expansion from T-7 to T-4, and y is the titer at the T-4 moment). From Fig. 6, the performance of the proposed method is superior to CGAN in all sample-size incremental experiments, especially at sample sizes of 800–1100, where the improvement is significant. From Table 2, compared with CGAN, GAN-FRL improves the most at a sample size of 1200, with an RMSE improvement of about 2.6%. The boxed values in Fig. 6 represent the RMSE and R², showing the maximum advantage that GAN-FRL can achieve over CGAN at the same sample size. Calculations show that the R² improvement is greatest at a sample size of 900, with an

Table 3
Performance of R2 of the 8 methods (Simulation case).

SPL \ M  GPR     LSR     MLP     MRMR    Single-GAN         WGAN               CGAN               GAN-FRL (Proposed)
<500     0.6348  0.7112  0.7147  0.7192  0.5927 (Unstable)  0.5913 (Unstable)  0.5892 (Unstable)  0.5933 (Unstable)
500      0.6820  0.7163  0.7377  0.7443  0.7484             0.7484             0.7488             0.7492
1000     0.7266  0.7357  0.7685  0.7723  0.7823             0.7823             0.7835             0.7894
1100     0.7359  0.7415  0.7716  0.7741  0.7868             0.7870             0.7886             0.7929
1200     0.7364  0.7525  0.7758  0.7827  0.7897             0.7899             0.7941             0.7960
1500     0.7466  0.7633  0.7798  0.7881  0.7915             0.7918             0.8052             0.8078
>2000    0.7735  0.7641  0.8062  0.8074  0.8074             0.8074             0.8081(8)          0.8081(3)
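The improvement percentages quoted in the text can be reproduced from the table entries, assuming they are relative improvements between GAN-FRL and Single-GAN at a sample size of 1500:

```python
# RMSE at SPL = 1500 (Table 2) and R2 at SPL = 1500 (Table 3)
rmse_single_gan, rmse_gan_frl = 0.01083, 0.01027
r2_single_gan, r2_gan_frl = 0.7915, 0.8078

rmse_gain = (rmse_single_gan - rmse_gan_frl) / rmse_single_gan * 100
r2_gain = (r2_gan_frl - r2_single_gan) / r2_single_gan * 100
print(round(rmse_gain, 1))  # 5.2  -- the "5.2%" quoted for Table 2
print(round(r2_gain, 2))    # 2.06 -- the "2.06%" quoted for Table 3
```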


Fig. 6. Performance of the CGAN and GAN-FRL with sample size.

improvement about 1.21%, and the accuracy improvement is greatest at sizes of 900–1500.
a sample size of 1300, about 3.27% accuracy improvement. Parentheses
in Table 2 and Table 3 indicate the value of the next bit.
Therefore, it can be concluded that the combination of generative models and variable selection methods for data enhancement is necessary and superior to the other methods, and that the proposed method is effective and superior in the simulation case.

Finally, it should be noted that the sample size (<500) in the first row of the table is the smallest number of samples on which the GAN can still run (its training is no longer stable). The sample size (>1500) in the last row indicates the point at which prediction accuracy has basically converged (only the simulation process is available here; the subsequent real cases cannot cover this setting because of limited data). As can be seen from the first row of data in Table 2, continued reduction of the sample size does not have a large impact on GPR and LSR. However, it has a significant impact on neural-network-based modeling approaches, especially GAN-based methods. This is in line with the perception that neural networks require a large number of samples. Additionally, from the last two rows of Table 2, neural-network-based methods can basically reach the upper limit of prediction accuracy on these data.

Therefore, the effect of GAN, MRMR and MLP can be tested by first developing models using all the available data. Recommendations for practical applications are given based on the results:

1) If the GAN cannot be stabilized for data augmentation, the method is not applicable to the scenario, and other machine learning methods that cope with very small sample sizes can be attempted.
2) If the accuracy of GAN is comparable to that of MRMR, it is more appropriate to use the MRMR method directly, considering the complexity and high time cost of GAN training.
3) If the accuracy of GAN is significantly improved compared to MRMR, or MRMR cannot meet the accuracy requirement, the proposed GAN-FRL method is more suitable.

Combined with Fig. 5, GAN-FRL is suitable in this case.

4.2. Real penicillin production process

Real penicillin production process data was provided by the National Engineering Research Center for Biotechnology in Shanghai, China. Penicillin fermentation was carried out in a 50-liter bioreactor using a proprietary process control strategy and a Penicillium chrysogenum strain. The metric used to assess the production performance of the fermenter was penicillin titer. In general, penicillin titer can only be obtained through offline sampling and subsequent analysis with long delays. Currently, real-time monitoring of this quality metric can only be achieved by a soft sensor modeled in conjunction with other control and measurement variables (Fig. 7). However, the cost of experimental data is quite high, especially in time. The fermentation of penicillin typically takes 180–220 h, with titer measurements taken once every 8 h. This means that a batch can only yield 23–27 labeled samples (including one sample for the initial conditions), whereas modeling such a complex process with high accuracy requires a large number of samples. Therefore, a data enhancement method is necessary for the actual modeling.

A total of more than 60 batches from the laboratory provided 1484 labeled samples after data cleaning. The penicillin fermentation process is a time series and suffers from varying measurement frequencies among the measured variables. Therefore, in this paper, the data are unfolded forward 2 h from the moment of titer measurement as the base time, and the data include 65 variables. The sample size of the initial data set used for the experiment was incremented from 500 to 1300 with an interval of 100, ending with the full sample size of 1484. Considering the practical situation, the training set and test set are split 9:1.

Fig. 8 shows the prediction results of the proposed method using all the samples. On the whole, the prediction results are very good, and the model is basically able to fit the test set. To represent the effect of sample size on modeling accuracy more intuitively, comparison experiments of GAN-FRL, GAN and MRMR are set up. From Fig. 9, the proposed method outperforms both

Z. Zhang et al. Computers and Chemical Engineering 186 (2024) 108707

Fig. 7. Flowchart of the actual laboratory penicillin production process.

Fig. 8. GAN-FRL performance in the penicillin production process with a sample size of 1484.

MRMR and GAN in terms of full sample size. Besides, the prediction accuracy and goodness-of-fit of GAN-FRL are significantly improved compared to MRMR, and also compared to GAN when the sample size is 600–1000, especially at a sample size of 900. Note the black dotted oval in Fig. 9: despite having 100 fewer samples than GAN and 200 fewer than MRMR, GAN-FRL still outperforms both methods in prediction accuracy.

In terms of the RMSE metric, GAN-FRL at the full sample size is improved significantly. When all samples are used, GAN offers only a limited improvement over MRMR, while the proposed method achieves a larger improvement in both accuracy and fit. This shows that fusing feature selection not only reduces the overall complexity but also improves the final prediction of the model. Overall, fused modeling of GAN with variable selection is very necessary and suitable for this case.

As in the simulation case, CGAN is added here as a comparison test (WGAN works similarly to GAN in scenarios with uniform data distribution). From Fig. 10, CGAN shows a more substantial improvement over GAN when the sample size is incremented from 600 to 1000. However, as the sample size continues to increase, CGAN's results become comparable to GAN's, while GAN-FRL remains significantly better than the other two GAN-based methods. It is important to note that the simulation data cover the whole process, meaning the sampling interval is fixed and labeled data exist at every sampling point, which is beneficial for constructing valid conditional data. Therefore, compared to GAN, the performance of CGAN is significantly improved (Table 2). However, in real cases, the intervals between label collection and measurement-variable collection are not uniform, and there is a large environmental gap between batches with different end times. It is therefore difficult to construct effective conditional data, which leads to the performance of CGAN gradually becoming merely comparable to that of GAN. In summary, although CGAN can improve the stability of virtual-data generation and the quality of the generated data through the introduction of conditional data, it requires extra effort in data integrity and standardization. GAN-FRL adapts well to the low-quality data of real scenes and still achieves good accuracy, which makes it suitable for practical applications on industrial data and highlights the usefulness of the proposed method.

To better analyze the GAN-FRL improvement, Table 4 shows the RMSE of the compared methods at each sample size. From the last row of Table 4, the RMSE of GAN-FRL is improved by 19.21% compared to MRMR, the highest percentage of improvement. When the sample


Fig. 9. Performance of the ablation experiments with sample size in the penicillin production process.

Fig. 10. Performance of the GAN-based methods versus sample size.

size is 600, the absolute value of the improvement is the largest compared to the other five methods. Overall, the proposed method improves the accuracy by more than 10% at every sample size. In addition, from Table 5, the improvement in fit is relatively small when all samples are used: the proposed method improves by 0.62% compared to MRMR and only 0.47% compared to GAN. When the sample size is 1000, GAN-FRL improves the most compared to MRMR, with an absolute value of 0.0154 and an improvement of 1.63%. Finally, it should be noted that the proposed method is not applicable when the sample size is <600 because the GAN training is unstable. In summary, the proposed method


Table 4
Performance of RMSE of the 7 methods (Real penicillin).

SPL     LSR       GPR       MLP       MRMR      Single-GAN           CGAN                 GAN-FRL (Proposed)
<600    0.10388   0.08105   0.08420   0.08035   0.09265 (Unstable)   0.09184 (Unstable)   0.09753 (Unstable)
600     0.08963   0.07986   0.07744   0.07530   0.07535              0.07501              0.06530
1000    0.08011   0.06975   0.06827   0.05915   0.05529              0.05433              0.05134
1100    0.07972   0.06630   0.06356   0.05377   0.05009              0.04924              0.04552
1200    0.07898   0.06299   0.05909   0.04853   0.04319              0.04302              0.03945
1300    0.07705   0.05980   0.05487   0.04261   0.03924              0.03919              0.03389
1484    0.06894   0.05193   0.04721   0.03467   0.03364              0.03362              0.02801

Table 5
Performance of R2 of the 7 methods (Real penicillin).

SPL     LSR      GPR      MLP      MRMR     Single-GAN          CGAN                GAN-FRL (Proposed)
500     0.8647   0.9061   0.8898   0.9049   0.8737 (Unstable)   0.8763 (Unstable)   0.8614 (Unstable)
600     0.8863   0.9080   0.9111   0.9220   0.9237              0.9244              0.9309
1000    0.9076   0.9297   0.9365   0.9469   0.9563              0.9603              0.9623
1100    0.9127   0.9365   0.9445   0.9587   0.9641              0.9650              0.9703
1200    0.9168   0.9427   0.9522   0.9663   0.9733              0.9736              0.9777
1300    0.9205   0.9483   0.9581   0.9752   0.9781              0.9782              0.9836
1484    0.9275   0.9611   0.9668   0.9828   0.9843              0.9843              0.9889

outperforms the other five methods at all sample sizes for which the GAN is stable. This shows that GAN-FRL can be applied with good results in this case, and further that the proposed method can be applied to real data at the laboratory level.

Note: in this case, GAN-FRL is applicable because it is significantly improved when the sample size is not higher than 1300. When all samples are used, the proposed method still provides an effective accuracy improvement, so the modeling cost can be weighed when deciding whether to use this method.

4.3. Real erythromycin production process

Data from a real erythromycin production process provided by a biopharmaceutical company in China was used to develop soft sensor models. Fed-batch erythromycin fermentation was carried out in a 100 m3 bioreactor with proprietary process control strategies using Saccharopolyspora erythraea. The indicator used to evaluate the production performance of the fermenters is the erythromycin titer. Generally, erythromycin titers can only be obtained through off-line sampling and later analysis with a long time delay. At present, real-time monitoring of erythromycin titers is only possible by modeling them from other control and measurement variables with a soft sensor. However, the erythromycin production process is a nonlinear, high-dimensional process. In actual industrial production, only a small sample size is available for each batch, because erythromycin titers are measured offline and historical batch data are limited. Therefore, data generation and variable selection are critical to process modeling and simulation. The overall flow of industrial-scale erythromycin production is shown in Fig. 11.

Through expert experience and data cleaning, 900 samples were selected. The process contains 33 measurement or control variables and one quality variable. Since the offline data are not aligned with the online data and the time interval of the offline data is variable, the data are not expanded in the time series. The hidden layers of both the generators and the regression fitter are two layers deep. Combining experimental commissioning and experience, the number of retained variables mo in MRMR is set to 16. The sample size of the initial data set used for the experiment was incremented from 400 to 900 with an interval of 50.

Fig. 12 shows the prediction results of the proposed method at the maximum sample size. Overall, the prediction is good enough for practical application. As in the simulation case and the real penicillin laboratory case, the real erythromycin case is also set up with MRMR and Single-GAN as the main comparison experiments. From Fig. 13, GAN-FRL outperforms the other two methods in prediction accuracy and fit at every sample size in the experiment. In particular, the improvement of model performance by GAN-FRL is particularly

Fig. 11. Flowchart of the actual industrial erythromycin production process.


Fig. 12. GAN-FRL performance in the erythromycin production process with a sample size of 900.

Fig. 13. Performance of the three methods with sample size in the erythromycin production process.

significant when the sample size is below 600. As the sample size increases, the advantage of GAN-FRL gradually decreases, but its performance remains better than that of the other two methods. Especially in the RMSE metric, even at a sample size of 900 the proposed method still yields a large improvement. In terms of trend, GAN-FRL tends to converge, Single-GAN is close to converging, while MRMR has not yet converged. In other words, MRMR and Single-GAN could achieve performance strictly comparable to GAN-FRL only if new samples could continue to be provided. This result supports the advantage of GAN-FRL in a small-sample environment and its suitability for data enhancement, and shows that GAN-FRL is well suited to solving the small-sample-size problem of the actual erythromycin production process.

Table 6
Performance of RMSE of the 6 methods (Erythromycin).

SPL    LSR      GPR      MLP      MRMR     Single-GAN   GAN-FRL (Proposed)
400    382.85   323.88   320.73   316.88   292.54       283.17
450    364.57   323.27   316.23   310.14   290.02       282.00
500    337.43   316.35   309.49   304.68   288.64       279.72
600    310.07   301.41   295.59   295.31   287.22       278.58
900    291.03   288.39   286.38   282.67   279.91       272.44

To better analyze the improvement of the proposed method, Table 6 shows the performance of the six methods at sample sizes of 400, 450, 500, 600 and 900. In the samples used in the experiment, the mean value of erythromycin titer is 6198.34. In terms of RMSE, at the sample size of 400 GAN-FRL improves the accuracy by 3.2% compared to Single-GAN, the best-performing comparison method. The proposed method can still improve 2.7%


compared to Single-GAN when all samples are used. Concerning the R2 index, from Table 7 at the sample size of 400, the proposed method improves on Single-GAN and MRMR by 0.0093 and 0.0289, respectively. In summary, the performance of the proposed method is remarkably improved compared to the other methods in small-sample-size modeling. The result shows that the proposed GAN-FRL is necessary and superior to the state-of-the-art existing methods, and further that the proposed method can be applied in a factory-level environment.

Table 7
Performance of R2 of the 6 methods (Erythromycin).

SPL    LSR      GPR      MLP      MRMR     Single-GAN   GAN-FRL (Proposed)
400    0.9253   0.9557   0.9580   0.9601   0.9797       0.9890
450    0.9341   0.9567   0.9609   0.9652   0.9819       0.9900
500    0.9489   0.9604   0.9661   0.9703   0.9831       0.9901
600    0.9661   0.9721   0.9772   0.9778   0.9846       0.9908
900    0.9815   0.9835   0.9856   0.9881   0.9902       0.9921

It should be stated that a sample size of 400 is already the lower limit at which GANs can perform data enhancement stably. In conjunction with the last part of Section 4.1, the proposed method is significantly enhanced in this case when the sample size is 400–600, and there is a large enhancement over the MRMR method at all sample sizes. Therefore, this case is suitable for the proposed method. In addition, from Table 6, the accuracy of GAN-FRL is improved by 3.62% compared to MRMR when the sample is fully utilized, and by 0.19% compared to Single-GAN. Both methods use a GAN, but the proposed method reduces variable redundancy by introducing MRMR, which lowers the complexity of the GAN-based soft sensor model and improves overall model performance.

5. Discussion and conclusion

Performing data enhancement before feature selection is necessary: on the one hand, in practical industrial applications, low data quality can lead to unreliable MRMR feature-selection results; on the other hand, MRMR can reduce sample complexity and thus help generate better samples. Therefore, in the early stages of the study, a preparatory experiment was carried out. Its results show that performing feature selection first can discard some key correlations among the variables, so that the subsequent use of the GAN fails to generate some fundamental information, limiting the performance of data enhancement. The paper combines the preparatory experiments and theoretical analysis to finalize the proposed method. Because including the preparatory experiment in the main text would have made the paper read more like an analysis of ideas, the detailed explanation is provided in the Appendix (Table A1, Table A2).

In this paper, a data enhancement method based on GAN and MRMR is proposed and applied to soft sensor development. The proposed method combines the advantages of MRMR and GAN to overcome the difficulties of modeling high-dimensional nonlinear processes with a small sample size. As a result, GAN-FRL can model with high accuracy and low redundancy for small-sample problems and shows performance superior to other state-of-the-art existing methods. The experimental results of a simulation case prove that the modeling performance of the proposed method is better than that of several state-of-the-art existing methods. In addition, a real penicillin laboratory data case validates that the proposed method can model effectively in a small-sample setting. Further, in the real erythromycin production case, experimental results show that the proposed method can substantially reduce the sample requirement while ensuring the accuracy of the model. Superior prediction performance is observed compared with existing methods, indicating that the proposed GAN-FRL method can achieve good results in the data enhancement of real industrial production processes. It is noted that the hyperparameters of the GAN in the proposed method were obtained by cross-validation and interval search. Besides, the proposed method is application-driven and is a black-box model in nature; hence, the developed model lacks interpretability, and it is challenging to provide theoretical support for the approach. Future work can be devoted to exploring parameter optimization methods and theoretical derivation of the model to further improve its performance and interpretability.

CRediT authorship contribution statement

Zhongyi Zhang: Writing – review & editing, Writing – original draft, Visualization, Methodology, Investigation. Xueting Wang: Validation, Resources. Guan Wang: Validation, Resources. Qingchao Jiang: Funding acquisition, Formal analysis, Conceptualization, Supervision. Xuefeng Yan: Writing – review & editing, Methodology. Yingping Zhuang: Project administration, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgements

The authors gratefully acknowledge the support of the following foundations: National Key R&D Program of China under Grant No. 2021YFC2101100, National Natural Science Foundation of China under Grants 62322309 and 31900073, Shanghai Science and Technology Commission Program 23S41900500, and Shanghai Rising-Star Program under Grant 21QA1402400.

Appendix 1

For reading fluency and completeness, the pre-experimental results of using MRMR first are shown in this appendix. The pre-experiment uses data from the Case 1 penicillin simulation process, treated in the same way as in Case 1 of the main text. Four methods are compared in this preparatory experiment: MLP, MRMR, Single-GAN and MRMR-GAN (using MRMR first). The purpose of the pre-experiment was to assess the feasibility of the ordering, so the sample size interval was set to 300.
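For context, the MRMR selection compared here can be sketched as a greedy max-relevance/min-redundancy ranking. The sketch below is illustrative only: it uses the absolute Pearson correlation as a cheap stand-in for the mutual-information scores of the actual MRMR criterion, and all names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    # Plain Pearson correlation coefficient of two equal-length sequences.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mrmr_rank(features, target, n_keep):
    """Greedily pick n_keep feature columns: maximize relevance to the
    target while penalizing average redundancy with features already kept."""
    selected, remaining = [], list(range(len(features)))
    while remaining and len(selected) < n_keep:
        def score(j):
            rel = abs(pearson(features[j], target))
            red = (sum(abs(pearson(features[j], features[k])) for k in selected)
                   / len(selected)) if selected else 0.0
            return rel - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

target = [1.0, 2.0, 3.0, 4.0, 5.0]
f0 = [1.1, 2.0, 2.9, 4.2, 5.0]   # highly relevant
f1 = list(f0)                     # exact duplicate of f0 (pure redundancy)
f2 = [3.0, 1.0, 4.0, 2.0, 5.0]   # weakly relevant but non-redundant
print(mrmr_rank([f0, f1, f2], target, 2))  # [0, 2]: the duplicate is skipped
```

The redundancy penalty is what separates MRMR from a plain relevance ranking: the duplicated column f1 has the highest relevance but is rejected in favor of the less redundant f2.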


Appendix Table 1
Experimental results of the four methods in RMSE.

SPL     MLP        MRMR       Single-GAN   MRMR-GAN
600     0.013032   0.012804   0.012759     0.019413
900     0.012752   0.012490   0.012440     0.019622
1200    0.012268   0.011409   0.010852     0.012331
1500    0.011801   0.011284   0.010831     0.010447
Appendix Table 1 shows the experimental results of the four methods. When the sample size is 600 or 900, MRMR-GAN training is very unstable, and the generated samples harm the model rather than help it. When the sample size is 1200, MRMR-GAN training is more stable, but the generated samples are of low quality, so it only achieves accuracy comparable to MLP. When the sample size is 1500, MRMR-GAN training is stable and shows a significant improvement over Single-GAN. The reason is that 1500 samples can basically already support independent MRMR modeling, so MRMR-GAN training is stable and, thanks to the variable selection, it surpasses Single-GAN.
modeling, so MRMR-GAN training is stable, due to variable selection, so due to Single-GAN.
To better illustrate the quality of the data generated by using MRMR first and GAN later, the following will be modeled by training directly with
GAN-generated data (soft measurements modeled without original data).

Appendix Table 2
Experimental results of the two methods (without original data).

SPL     Single-GAN   MRMR-GAN
600     0.012682     0.37512 (not convergent)
900     0.012424     0.40118 (not convergent)
1200    0.010849     0.013037
1500    0.010832     0.010473
From Appendix Table 2, at sample sizes of 600 and 900, MRMR-GAN is essentially incapable of effective modeling. Combined with Appendix Table 1, the GAN is simply unable to generate valid samples in this scenario. When the sample size is 1200, MRMR-GAN modeling using purely generated data is a little less effective than modeling with the original data, and the results are somewhat worse than MLP. This indicates that, although the GAN is able to generate data of a certain quality at this point, it has not yet learned features effective for data enhancement. Finally, when the sample size is 1500, Single-GAN and MRMR-GAN are essentially the same as the modeling results obtained with the original data.
In summary, this experimental ordering is not in line with the idea of small-sample modeling and was abandoned.
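The "without original data" protocol behind Appendix Table 2 can be sketched as follows; `fit_mean` is a deliberately trivial stand-in for the soft sensor regressor, and all names are illustrative:

```python
from statistics import mean

def rmse(y_true, y_pred):
    return mean((a - b) ** 2 for a, b in zip(y_true, y_pred)) ** 0.5

def eval_generated_only(train_fn, generated, real_test):
    """Fit a model ONLY on GAN-generated (x, y) pairs, then score it on
    held-out real samples - the Appendix Table 2 protocol."""
    model = train_fn(generated)
    y_hat = [model(x) for x, _ in real_test]
    return rmse([y for _, y in real_test], y_hat)

def fit_mean(data):
    # Trivial "regressor": always predicts the mean training label.
    m = mean(y for _, y in data)
    return lambda x: m

generated = [([0.1], 1.8), ([0.9], 2.2)]   # synthetic pairs, mean label 2.0
real_test = [([0.5], 2.0), ([0.4], 2.4)]
print(round(eval_generated_only(fit_mean, generated, real_test), 3))  # 0.283
```

Swapping `fit_mean` for the actual regression fitter reproduces the comparison: if the generated samples carry the real distribution's information, this score approaches the score of a model trained on the original data.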

Appendix 2

NOTE: CGAN requires conditional data. In the simulation case, the conditional data for CGAN are constructed by expanding the data from moment T-7 to moment T-4 as the baseline (the penicillin value is taken at moment T-4). Generation is balanced, meaning the training set is given as many virtual samples as real samples. In the real penicillin production process, the conditional data for CGAN are the averages of all batches at a point in the middle of the batch (the exact moment is not disclosed due to confidentiality agreements).
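Structurally, the conditioning described in this note amounts to concatenating a condition vector with the generator's noise input, and appending the same condition to each sample the discriminator judges. A minimal sketch, with all dimensions and names illustrative:

```python
import random

def generator_input(condition, noise_dim=8, rng=random):
    # A CGAN generator sees the condition vector alongside random noise.
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return list(condition) + noise

def discriminator_input(sample, condition):
    # The discriminator judges (sample, condition) pairs, so the same
    # condition vector is appended to real and generated samples alike.
    return list(sample) + list(condition)

cond = [0.42, 0.17]        # e.g. baseline values at the chosen moment
z = generator_input(cond)
print(len(z))              # 10 = len(cond) + noise_dim
print(z[:2] == cond)       # True: the condition is preserved verbatim
```

This makes concrete why CGAN is sensitive to data integrity: if the condition values are missing or misaligned across batches, both network inputs are corrupted, which matches the degradation observed in the real cases.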
