
Applied Soft Computing Journal 164 (2024) 111974


Deep learning with local spatiotemporal structure preserving for soft sensor
development of complex industrial processes
Xiao Wang *, Xiaomei Qi, Yong Zhang
School of Electrical and Electronic Engineering, Shandong University of Technology, Zibo 255300, China

H I G H L I G H T S

• A deep learning method for soft sensor modeling uses a semisupervised pretraining strategy integrating spatiotemporal structure preservation.
• The developed soft sensor is deployed in an industrial scenario where soft sensing technology has not been widely adopted.
• This paper provides a theoretical and practical foundation for deep learning to address problems involving spatiotemporal data structures.

A R T I C L E  I N F O

Keywords:
Soft sensor
Data-driven modeling
Deep learning
Local feature learning

A B S T R A C T

Data-driven soft sensors have emerged as indispensable tools for predicting quality variables in complex industrial processes because of their cost-effectiveness and ease of maintenance. In particular, soft sensors based on deep learning have been utilized in extensive research and successful applications in recent years. However, traditional deep learning methods capture hierarchical data features by minimizing global fitting errors, neglecting the local structural characteristics implied in the original data. In this paper, we propose a new deep learning method for soft sensor development. Utilizing autoencoders as the foundational architecture of our network, a new semisupervised strategy is adopted for layerwise pretraining optimization. On the one hand, more representative data features are extracted by maintaining the local spatiotemporal structure of the data; on the other hand, layer-by-layer supervised learning is employed to identify the critical features that are aligned with the ultimate task, which aids in obtaining the optimal network parameters and improving the resulting prediction accuracy. Subsequently, a local spatiotemporal structure-preserving stacked semisupervised autoencoder (LSP-SuAE) is established. To evaluate the feasibility and effectiveness of the proposed approach, experiments are carried out in a real industrial process. A soft sensor based on the LSP-SuAE is developed to predict the rate of ethylbenzene conversion during dehydrogenation-based styrene production. The experimental results demonstrate that, compared to five other common or similar data-driven modeling methods, LSP-SuAE exhibits higher prediction accuracy and better stability.

* Corresponding author.
E-mail address: [email protected] (X. Wang).

https://doi.org/10.1016/j.asoc.2024.111974
Received 6 November 2023; Received in revised form 10 June 2024; Accepted 3 July 2024; Available online 6 July 2024
1568-4946/© 2024 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

1. Introduction

Manufacturing is often viewed as a complex industrial process, and with its continual development, increasing demand for critical performance indicators such as product quality, energy conservation and emission reduction has been observed. In practical production scenarios, the real-time measurement of quality-related key process variables is an important premise for enhancing product quality and optimizing process control levels [1–3]. However, due to technical or economic constraints, some quality variables (e.g., the reaction rate, component concentration, and viscosity) cannot be directly measured with hardware sensors and can only be measured by manual offline analyses in the laboratory, which leads to a serious time lag. Thus, this strategy fails to satisfy the requirements of industrial process control [4,5].

Soft sensors represent virtual sensing solutions that use easily measured related process variables as explanatory variables (inputs) and quality variables as target variables (outputs) to estimate the target variables by constructing mathematical models for describing the input–output behavior of the system [6]. With the popularization of distributed measurement systems and the development of the storage and calculation capabilities of computers, data-driven methods have become the mainstream development direction for soft sensor modeling [7,8].

This approach relies on site data to establish models, enhancing process understanding and providing optimization; it is particularly suitable for complex industrial process modeling where the underlying mechanisms are unclear [9–11]. With the advent of the Industry 4.0 era, the integration of advanced machine learning techniques with soft sensor development has provided new avenues for improving process efficiency and decision-making. Among the various available machine learning paradigms, deep learning, with its significant advantage regarding hierarchical data representation, has shown great potential for capturing the complex nonlinear relationships between process variables [12]. With the successful applications of deep learning in computer vision, natural language processing, and other fields, soft sensors based on deep learning (DLSSs) have been widely researched and validated in various scenarios, demonstrating outstanding performance in handling complex, modern, and large-scale industrial data [13,14].

Despite the satisfactory results produced by DLSSs in many variable prediction tasks, these sensors still face some nonnegligible challenges in practical industrial applications [15,16]. First, many large-scale industrial processes operate in harsh environments with slow changes. Due to the strong mechanical and topological relevance levels of process variables, industrial process data often exhibit strong local behaviors. Therefore, preserving these structures during the model learning process can lead to the development of more robust and interpretable soft sensors [17]. However, traditional deep learning methods capture hierarchical data features by minimizing global fitting errors, ignoring the local structural features that are implicit in the original data, which is not conducive to soft sensor modeling and accurate prediction. Most existing methods either overlook the importance of local structures or fail to effectively integrate them into the learning process. Second, unsupervised layerwise pretraining can be employed to effectively extract the intrinsic features of data in cases with scarce labeled data, but this approach exhibits certain flaws in soft sensing tasks that require accurate regression-based prediction. Since the unsupervised training stage is not optimized for specific tasks, the constructed model may fail to capture the most critical features for the final task, and the irrelevant features learned during this stage may lead to overfitting problems. Therefore, it is necessary to develop deep learning methods that are more suitable for soft sensor modeling, combining global and local perspectives during model training and enhancing the target orientation of the interlayer features to achieve accurate quality prediction.

Based on these considerations, in this paper, we propose a new DLSS called a local spatiotemporal structure-preserving stacked semisupervised autoencoder (LSP-SuAE) and apply it to the complex industrial process of ethylbenzene dehydrogenation-based styrene production to predict the conversion rate of ethylbenzene. During the pretraining process of a single autoencoder, local data patterns are integrated on the basis of traditional global feature extraction, capturing the complex dynamic dependencies between variables by encoding local spatiotemporal features. Then, by conducting fine-tuning with labeled data, the task orientation of the features is further enhanced. On this basis, the LSP-SuAE model is established through multilayer stacking. This semisupervised pretraining strategy can help the model better understand the dynamic changes that occur in complex industrial processes, which is crucial for enhancing the performance of soft sensors. The main contributions of this paper are as follows.

• A new deep learning-based soft sensor modeling method is proposed. By maintaining the intrinsic spatiotemporal correlations of process data, the model becomes more sensitive to minor process changes, improving the accuracy and robustness of predictions. A new attempt is made to measure the spatiotemporal adjacency of data, enhancing the feasibility and effectiveness of identifying potential spatiotemporal dependencies in industrial data.
• The developed soft sensor is deployed in an industrial scenario where soft sensing technology has not been widely adopted, verifying the effectiveness and practicality of the proposed method. This case study demonstrates the significant potential of our method for use in practical applications, not only by effectively reducing energy consumption and production costs but also by further expanding the application boundaries of deep learning technology in the field of industrial process monitoring and optimization.
• This paper provides a new soft sensor development perspective for complex industrial processes and offers a theoretical and practical foundation for deep learning to address a wide range of problems involving complex spatiotemporal data structures. Furthermore, the proposed framework and method provide a reference for subsequent research on similar issues.

The remainder of this paper is organized as follows. The current research status and background are summarized in Section 2. Section 3 presents the proposed LSP-SuAE. Section 4 describes the industrial application of our method, the experimental setup, and an analysis of the experimental results. Section 5 summarizes this study.

2. Related work

In recent years, data-driven soft sensors have emerged as crucial tools for industrial process monitoring and quality prediction. These models leverage process data to construct nonlinear relationships between variables, realizing the prediction of quality variables. Traditional methods such as principal component analysis (PCA) [18], partial least squares (PLS) and its variants including kernel-based PLS [19,20], support vector regression (SVR) [21], and artificial neural networks (ANNs) [22], have laid the groundwork for early research. While these approaches perform well with linear or mildly nonlinear relationships, their capabilities are limited when addressing large-volume, high-dimensional, and highly nonlinear industrial data, often failing to capture complex data characteristics [23].

With the widespread application of deep learning in tasks such as image recognition and natural language processing, research related to DLSSs has rapidly advanced. Through unsupervised layerwise pretraining, deep learning effectively avoids the issues of vanishing and exploding gradients, demonstrating its powerful ability to extract high-level feature representations from complex data. Various DLSSs, including those based on convolutional neural networks (CNNs) [24], deep belief networks (DBNs) [25], stacked autoencoders (SAEs) [26], long short-term memory (LSTM) [27] and recurrent neural networks (RNNs) [28], have produced substantial research results. They have demonstrated superior performance to that of traditional methods in multiple fields, such as chemical engineering, metallurgy, and biology [29,30].

Despite the clear advantage of DLSSs in terms of data representation capabilities, deep learning, which acts as a global feature extractor, often focuses only on the global characteristics of data, neglecting the preservation of local data features [31]. This limitation has resulted in restricted generalizability and predictive performance in specific industrial applications. To address this issue, some recent studies have begun exploring deep learning methods that can extract local features. For instance, a manifold regularized convolutional layer (MRCL) was proposed in [32] to preserve the neighborhood structure of data. In [33], a deep Laplacian autoencoder (DLapAE) was developed to constrain neighboring information in the given data space. A deep stacked sparse embedded clustering (DSSEC) method was proposed in [34] to form an improved local structure retention strategy. Although these methods have achieved good results in tasks such as image processing, the feasibility of their soft sensing applications is worth validating further. Some studies have incorporated deep learning methods with local structure preservation into soft sensor modeling and achieved satisfactory results [35,36], but they primarily optimized the performance of these methods by preserving the local geometric structure of the input data. For soft sensing tasks, given the temporal correlation characteristics of process data, it is also necessary to further consider the adjacency relationships of data in the time domain.


To learn the neighboring information of data more comprehensively, some recent studies have explored incorporating spatiotemporal feature preservation into deep learning to enhance the effects of model training [37]. Xie et al. proposed an attention- and bidirectional LSTM (BiLSTM)-based spatiotemporal feature extraction network (Att. BiLSTM-STFE) to obtain local features by integrating an attention mechanism and LSTM [38]. Liu et al. utilized a data mode related self-attention mechanism to learn features, enhancing the perceptual ability of data within the same mode [39]. Although attention mechanisms are good at capturing the long-distance dependencies between data, their suitability for industrial data with strong local behaviors still requires investigation. Liu et al. developed a spatiotemporal neighborhood-preserving stacked autoencoder (STNP-SAE) and applied it to perform soft sensor modeling in a hydrocracking process [40]. Based on certain rules in the spatial and temporal domains, two sets of neighboring samples were integrated into the model training procedure. Wang et al. developed a prediction model by learning the spatiotemporal features of images through multiscale feature extraction [41]. These approaches offer new insights into exploring the spatiotemporal correlations of industrial data, yet further research remains valuable in terms of data adjacency criteria and the applicability of these methods to different industrial scenarios and data types.

An analysis of the existing research indicates that some studies on DLSSs have already incorporated local structure preservation. However, these methods still have areas that are worthy of exploration and improvement and encounter the problem of poor correlations between the features extracted during the pretraining phase and quality variables. Although hidden layer features effectively express input variables, it is difficult to achieve improved quality prediction performance by relying only on fine-tuning at the end of the modeling process. Therefore, developing new deep learning methods with comprehensive local feature mining for DLSS modeling and deploying them in new industrial application scenarios can not only enhance the ability of the resulting model to learn complex dependencies in process data but also drive the practical deployment and widespread application of soft sensors.

3. Methods

3.1. Preliminaries

An autoencoder (AE) is an ANN that can obtain an effective representation of input data through unsupervised learning and can be used as a powerful feature detector when pretraining deep neural networks. A typical AE is a single-hidden-layer network consisting of an encoder and a decoder, whose structure is illustrated in Fig. 1. The input and output layers have the same number of neurons, and the hidden features are extracted by reproducing the input data. The data reconstruction process is as follows.

Fig. 1. Model structure of an AE with fully connected layers.

For an input sample x, after encoding, the hidden layer representation h is obtained as follows:

$$h = f(W^{(1)} x + b^{(1)}) \tag{1}$$

Then, h is decoded to obtain the reconstructed output:

$$\hat{x} = f(\hat{W}^{(1)} h + \hat{b}^{(1)}) \tag{2}$$

where {W^(1), b^(1)} and {Ŵ^(1), b̂^(1)} denote the weight parameters of the encoder and decoder, respectively, and f is an activation function such as the sigmoid or rectified linear unit (ReLU) function. The above transformations are all nonlinear mappings, and their weight parameters are usually optimized through the backpropagation (BP) algorithm.

If the original dataset is X = {x_1, x_2, …, x_N} and N is the number of samples, the loss function of the AE can be defined by minimizing the reconstruction error:

$$J_{AE} = \frac{1}{2N}\sum_{i=1}^{N} \|x_i - \hat{x}_i\|^2 \tag{3}$$

where ||⋅|| is the L2 norm. When the dimensions of h and x are constrained to be different, if the system can reconstruct x, then h carries the information of the original data; i.e., h is a good expression of x.

An SAE is based on an AE, but it increases the depth of the hidden layers. Taking the feature h of the hidden layer of the encoder as the original information, a new AE is trained to obtain a new feature. Through this layerwise stacking procedure, multiple hidden layers are retained to obtain a deep neural network, as shown in Fig. 2. The feature of the deepest layer l is calculated as follows:

$$h^{(l)} = f(W^{(l)} h^{(l-1)} + b^{(l)}) \tag{4}$$

Fig. 2. Model structure of the SAE.

In soft sensor modeling scenarios, to predict quality variables, a regression layer is usually added at the top of the utilized network to obtain the final output as follows:

$$\hat{y} = f(W^{(p)} h^{(l)} + b^{(p)}) \tag{5}$$

where ŷ is the predicted value of the target variable y and {W^(p), b^(p)} is the network weight of the regression layer. Once the pretraining stages of all AEs have been completed, the labeled data are used for supervised fine-tuning. The loss function is defined as follows:

$$J_{fine\text{-}tuning} = \frac{1}{2N}\sum_{i=1}^{N} \|y_i - \hat{y}_i\|^2 \tag{6}$$

Finally, the optimal network parameters {W^(1)*, b^(1)*, …, W^(l)*, b^(l)*, W^(p)*, b^(p)*} are obtained after iterative training.
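To make these preliminaries concrete, the sketch below implements a single AE, greedy layerwise SAE pretraining per Eq. (3), and supervised fine-tuning with a regression layer per Eqs. (5)–(6). It is a minimal PyTorch illustration, not the authors' implementation; the random data are placeholders, and the layer sizes and learning schedule are borrowed from the experimental settings reported later in Table 2.

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    """Single-hidden-layer autoencoder: h = f(Wx + b), x_hat = f(W'h + b') (Eqs. (1)-(2))."""
    def __init__(self, n_in, n_hidden):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(n_hidden, n_in), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)
        return h, self.decoder(h)

def pretrain_sae(X, layer_sizes, epochs=10, lr=0.1):
    """Greedy layerwise pretraining: each AE reconstructs the previous layer's features (Eq. (3))."""
    encoders, features = [], X
    for n_hidden in layer_sizes:
        ae = AE(features.shape[1], n_hidden)
        opt = torch.optim.SGD(ae.parameters(), lr=lr)
        for _ in range(epochs):
            _, x_hat = ae(features)
            loss = ((features - x_hat) ** 2).sum(dim=1).mean() / 2   # Eq. (3)
            opt.zero_grad(); loss.backward(); opt.step()
        encoders.append(ae.encoder)
        features = ae.encoder(features).detach()   # Eq. (4): features feed the next AE
    return encoders

# Placeholder data: 2050 samples of 12 process variables, as in Section 4.
X, y = torch.rand(2050, 12), torch.rand(2050, 1)
encoders = pretrain_sae(X, layer_sizes=[7, 4])

# Stack the encoders, add a regression layer (Eq. (5)), and fine-tune with labels (Eq. (6)).
model = nn.Sequential(*encoders, nn.Linear(4, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.5)
for _ in range(185):
    loss = ((y - model(X)) ** 2).sum(dim=1).mean() / 2               # Eq. (6)
    opt.zero_grad(); loss.backward(); opt.step()
```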

3.2. Local spatiotemporal structure-preserving stacked semisupervised AE (LSP-SuAE)

Based on the fundamental theory of the SAE, a new LSP-SuAE network is developed for industrial soft sensor modeling in this section.

3.2.1. LSP-uAE

To obtain more accurate data features, local data feature learning is included in the training processes of individual AEs. Within the learning dataset, two samples are deemed neighbors if they are closely aligned in the input space. A proficient prediction model should maintain these neighborhood relationships from the inputs to the outputs. Therefore, during the data reconstruction process of an AE, the local neighborhood structure of the data is maintained while reproducing the overall input data, which is conducive to obtaining representative and useful hidden layer features.

Industrial data exhibit strong dynamic correlations, and process data are often slow-changing continuous time series that maintain significant time dependencies over short periods, which is very important for establishing dynamic models and predicting future data points. Therefore, when establishing local adjacency criteria, we incorporate both the temporal and spatial correlations of the input data, with a special focus on the significance of temporal adjacency. This approach, which enables the capture of more reliable local data relationships, is implemented through the following specific steps.

First, for each sample in the dataset X, neighbors are selected within the time domain. Since time series neighbors can track the potential dynamic changes that occur near data, sequential adjacent points are chosen as the nearest neighbors. For a sample x_i collected at time t, K neighbors can be identified by searching along the time axis:

$$D_K(x_i) = \{x_{ik}\}_{k=1}^{K} = \{x(t - K/2),\, x(t - K/2 + 1),\, \ldots,\, x(t + K/2)\} \tag{7}$$

where D_K(x_i) represents a set composed of the neighboring samples. Notably, these K neighbors are also included within the dataset X, i.e., D_K(x_i) ⊂ X.

Second, to ensure that each neighbor contributes differently during the modeling process, we introduce a weighting coefficient to measure the importance levels of neighboring samples. This weight coefficient is defined between a sample and its neighbors, representing the strength of their adjacency relationship as well as their importance for maintaining local structures. In this paper, we employ the heat kernel function to measure the spatial proximity between samples, and the function is defined as follows:

$$W_{ik} = \begin{cases} \exp\!\left(-\|x_i - x_{ik}\|^2 / c\right), & \text{if } x_{ik} \in D_K(x_i) \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

where W_ik is the weight coefficient between x_i and its neighbor x_ik, ||x_i − x_ik||² denotes the squared Euclidean distance between x_i and x_ik, and c is a scaling parameter for controlling the width of the kernel function. Although the heat kernel function is related to the Euclidean distance between two points, it provides a similarity measure based on the distances between points, which decreases as the distance increases. Compared to previously developed research methods based on spatial distance, the heat kernel function, through its sensitivity to local data structures, can better handle high-dimensional data. By adjusting parameter c, the similarity decay rate can be controlled, thereby flexibly capturing different levels of data structures. Additionally, this measurement method is smoother and more robust to noise and small disturbances.
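The two steps can be read directly off Eqs. (7) and (8). The following PyTorch sketch builds the temporal neighborhoods and their heat kernel weights; it is an illustrative reading of the equations rather than the authors' code, and clamping the window at the ends of the series is an assumption.

```python
import torch

def build_spatiotemporal_neighborhood(X, K=7, c=30.0):
    """Eq. (7): K temporally adjacent samples per point; Eq. (8): heat kernel weights."""
    N = len(X)
    # symmetric window of K time offsets around each sample (the sample itself excluded)
    offsets = torch.tensor([o for o in range(-(K // 2), K // 2 + 2) if o != 0][:K])
    nbr_idx = (torch.arange(N).unsqueeze(1) + offsets).clamp(0, N - 1)   # Eq. (7)
    sq_dist = ((X.unsqueeze(1) - X[nbr_idx]) ** 2).sum(dim=2)            # ||x_i - x_ik||^2
    w = torch.exp(-sq_dist / c)                                          # Eq. (8)
    return nbr_idx, w

# Example: 2050 samples of 12 process variables; K = 7 and c = 30 as in Table 2.
X = torch.rand(2050, 12)
nbr_idx, w = build_spatiotemporal_neighborhood(X)   # both of shape (2050, 7)
```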
Fig. 3. Structure of the LSP-uAE.

Afterward, we incorporate the adjacency relationships between x_i and its K neighbors into the optimization objective of the AE to preserve the local behavioral characteristics of the input data. During the data reconstruction process, these neighboring points should be as close together as possible. Therefore, we introduce a regularization term to preserve the local structure in the global reconstruction optimization objective and construct a new AE loss function as follows:

$$J_{LSP\text{-}AE} = \frac{1}{2N}\sum_{i=1}^{N} \|x_i - \hat{x}_i\|^2 + \lambda \sum_{i=1}^{N}\sum_{k=1}^{K} \|\hat{x}_i - \hat{x}_{ik}\|^2 W_{ik} \tag{9}$$

where λ is a tradeoff parameter that is used to balance the relationship between reconstructing the target sample itself and reconstructing adjacent samples.

The above strategy first identifies adjacent samples based on their temporal proximity and then determines the adjacent weights based on spatial correlation measurements, accurately and quickly constructing a spatiotemporal neighborhood structure for the original samples. By maintaining the local spatiotemporal structure in the data reconstruction process of the AE, the global and local structural features of the input data can be comprehensively identified and extracted, and more diverse and representative hidden layer features can thus be obtained.

To further improve the resulting prediction accuracy, an appropriate amount of supervised tuning is added to the training process of the AE to increase the correlation between the extracted features and the quality variables. Some labeled data are added to the learning dataset, and data reconstruction is carried out on the unlabeled data and the input parts of the labeled data. Prediction neurons are added to the output layer to obtain the predicted values of the quality variables. The quality prediction error is included in the optimization procedure, and the network weights are optimized via semisupervised learning. The framework of the LSP-uAE is shown in Fig. 3, and the loss function is as follows:


$$J_{LSP\text{-}uAE} = J_{LSP\text{-}AE}^{(U+L)} + J_{suv\text{-}tuning}^{(L)} \tag{10}$$

$$J_{LSP\text{-}AE}^{(U+L)} = \frac{1}{2N_{U+L}}\sum_{i=1}^{N_{U+L}} \|x_i - \hat{x}_i\|^2 + \lambda \sum_{i=1}^{N_{U+L}}\sum_{k=1}^{K} \|\hat{x}_i - \hat{x}_{ik}\|^2 W_{ik} \tag{11}$$

$$J_{suv\text{-}tuning}^{(L)} = \frac{1}{2N_L}\sum_{i=1}^{N_L} \|y_i - \hat{y}_i\|^2 \tag{12}$$

where y is the measured value of a quality variable, ŷ is the predicted value obtained by the LSP-uAE, N_U is the number of unlabeled data points contained in the learning dataset, N_L is the number of labeled data points, and N_{U+L} is the total amount of data in the learning dataset. The new loss function is used during pretraining, and the optimal parameters of the LSP-uAE are obtained.

In summary, compared with the original AE, the LSP-uAE considers the local features of the input data while learning the global nonlinear mapping relationship. This enforces the local smoothness of the output and helps avoid overfitting. Supervised tuning is adopted, as this method can fully mine the dynamic process behavior reflected by both the labeled and unlabeled samples, improving the correlations between the hierarchical features and quality variables as well as the final prediction performance.
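A sketch of the semisupervised objective in Eqs. (10)–(12) follows; the tensor shapes assume the neighborhood structure produced by the earlier sketch, and the batch-level normalization is a simplification rather than the authors' exact procedure.

```python
import torch

def lsp_uae_loss(x, x_hat, x_hat_neighbors, w, y=None, y_hat=None, lam=200.0):
    """Semisupervised LSP-uAE loss of Eqs. (10)-(12).

    x, x_hat:        inputs and reconstructions, shape (N, d)
    x_hat_neighbors: reconstructions of each sample's K neighbors, shape (N, K, d)
    w:               heat kernel weights W_ik, shape (N, K)
    y, y_hat:        labels and predictions for the N_L labeled samples (or None)
    """
    n = x.shape[0]
    recon = ((x - x_hat) ** 2).sum() / (2 * n)                        # Eq. (11), term 1
    local = (w * ((x_hat.unsqueeze(1) - x_hat_neighbors) ** 2).sum(dim=2)).sum()
    loss = recon + lam * local                                        # Eq. (11), term 2
    if y is not None:
        loss = loss + ((y - y_hat) ** 2).sum() / (2 * y.shape[0])     # Eq. (12)
    return loss

# Example with placeholder shapes: 32 samples, 12 variables, K = 7 neighbors.
x = torch.rand(32, 12); x_hat = torch.rand(32, 12, requires_grad=True)
loss = lsp_uae_loss(x, x_hat, torch.rand(32, 7, 12), torch.rand(32, 7))
```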


3.2.2. Stacked LSP-uAE (LSP-SuAE)

After training an LSP-uAE, its encoder is retained, and the hidden layer features are used as inputs for training a new LSP-uAE, thereby establishing a deep LSP-SuAE model with multiple hidden layers. In this way, the local spatiotemporal structure features of the given data are preserved in a layer-by-layer manner, and we can obtain multiple hidden layer features and network parameters that are more representative of the input data and conducive to quality prediction. Therefore, as a deep feature extractor, the LSP-SuAE is well suited for soft sensor modeling. The learning procedures of the LSP-SuAE are shown in Algorithm 1, where minibatch gradient descent (MBGD) is used for parameter optimization. Fig. 4 provides the basic framework of the soft sensor modeling method based on the LSP-SuAE.

Fig. 4. Framework of the proposed LSP-SuAE algorithm for soft sensor modeling.

Algorithm 1: LSP-SuAE Algorithm
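Algorithm 1 appears as a figure in the original layout and is not reproduced in this text version. The outline below sketches the layerwise procedure it summarizes, reusing build_spatiotemporal_neighborhood and lsp_uae_loss from the earlier sketches; the minibatch handling and function names are illustrative assumptions, not the published pseudocode.

```python
import torch
import torch.nn as nn

def train_lsp_suae(Xu, Xl, yl, layer_sizes=(7, 4), epochs=10, lr=0.1,
                   batch_size=32, K=7, c=30.0, lam=200.0):
    """Greedy layerwise LSP-SuAE training (an outline of Algorithm 1)."""
    X = torch.cat([Xu, Xl])          # learning dataset: unlabeled + labeled inputs
    encoders = []
    for n_hidden in layer_sizes:
        d = X.shape[1]
        enc = nn.Sequential(nn.Linear(d, n_hidden), nn.Sigmoid())
        dec = nn.Sequential(nn.Linear(n_hidden, d), nn.Sigmoid())
        pred = nn.Linear(n_hidden, 1)    # prediction neurons of the LSP-uAE (Fig. 3)
        params = list(enc.parameters()) + list(dec.parameters()) + list(pred.parameters())
        opt = torch.optim.SGD(params, lr=lr)
        nbr_idx, w = build_spatiotemporal_neighborhood(X, K, c)   # Eqs. (7)-(8)
        for _ in range(epochs):
            for batch in torch.randperm(len(X)).split(batch_size):   # MBGD
                x_hat = dec(enc(X))      # full pass kept so neighbor reconstructions exist
                lab = batch[batch >= len(Xu)]          # labeled samples in this batch
                loss = lsp_uae_loss(X[batch], x_hat[batch], x_hat[nbr_idx[batch]],
                                    w[batch],
                                    y=yl[lab - len(Xu)] if len(lab) else None,
                                    y_hat=pred(enc(X[lab])) if len(lab) else None,
                                    lam=lam)                          # Eq. (10)
                opt.zero_grad(); loss.backward(); opt.step()
        encoders.append(enc)
        X = enc(X).detach()              # hidden features feed the next LSP-uAE
        Xl = enc(Xl).detach()
    # stack the encoders and append a regression layer for final fine-tuning (Eq. (6))
    return nn.Sequential(*encoders, nn.Linear(layer_sizes[-1], 1))
```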
4.1. Background

4. Application Styrene is a widely used organic chemical material that is an


important monomer for synthetic resins, ion exchange resins, and rubber
In this section, a soft sensor based on the LSP-SuAE was applied to the materials. It can also be used in pharmaceuticals, dyes, and pesticides.


China leads the world in styrene production. Due to strong demand, the styrene production capacity has continued to expand in recent years, yet a gap remains between the supply and demand of styrene. Under the pressure of market competition, production facilities must improve the effectiveness of their quality inspection and process control mechanisms and implement product quality monitoring processes to save energy and increase profits.

Styrene is mainly produced from the dehydrogenation of ethylbenzene, which accounts for more than 80 % of all global production. A chemical factory in East China uses a negative-pressure adiabatic dehydrogenation method to produce styrene. The styrene output of this company in 2023 was 400,000 tons. The production process has three steps: a dehydrogenation reaction; process condensation with an off-gassing treatment; and liquid-phase dehydrogenation, distillation, and separation. The process flow is shown in Fig. 5. In the ethylbenzene dehydrogenation unit, the catalytic dehydrogenation reaction of ethylbenzene is used to obtain crude styrene, which is the most critical step of the process. The main reaction is as follows:

C6H5CH2CH3 ⇌ C6H5CH=CH2 + H2

Fig. 5. Flowchart of the styrene production process.

The raw material (ethylbenzene) is mixed with preheated primary water to form a gas-phase mixture with a stable ratio. This mixture is mixed with the main steam produced by a steam superheating furnace at the entrance of the first-stage reactor, and the central pipe passes through a reaction bed covered with catalyst in the radial direction to enable the dehydrogenation reaction, with a reaction temperature of 550–650 °C. The product of the first reactor is heated by the main steam flowing through the interstage reheater, and it enters the second-stage reactor again for dehydrogenation. The output temperature of the second reactor is approximately 570 °C. The product of the second-stage reaction passes through a heat exchanger to lower the temperature of the ethylbenzene, and crude styrene is obtained after subsequent condensation and gas–liquid separation steps. During the above process, the conversion rate of the dehydrogenation reaction is an important quality index that directly impacts the concentration of styrene in the dehydrogenation product. A conversion rate that is too high will cause severe coking of the catalyst bed in the dehydrogenation reactor, which will affect the polymerization rate. If the conversion rate is too low, it will be difficult to obtain styrene with good purity, and the energy consumption level will increase.

Due to the complex chemical background of the dehydrogenation reaction, production occurs in a high-temperature environment, and relative lags occur between the various process variables. Therefore, the conversion rate of ethylbenzene cannot be directly measured using hardware sensors; it can only be measured through an offline analysis of the product in a laboratory.

This process has a lengthy time lag, which inhibits timely and effective process monitoring and leads to risks and inconvenience during the production procedure. To better control the product quality and reduce the energy consumption and production costs, we developed a soft sensor to predict the total conversion rate of ethylbenzene in multistage dehydrogenation reactions to improve the timeliness and accuracy of production process control.

4.2. Experimental setup

4.2.1. Explanatory variables

The data in this study were collected at the styrene production site of a chemical company in East China. Taking the total conversion rate of ethylbenzene in the dehydrogenation reaction as the target variable, 12 process variables that are closely related to the conversion rate of ethylbenzene were selected from the collected variables as the input variables of the soft sensor (Table 1). The activity index of the catalyst also has a certain influence on the conversion rate of ethylbenzene. However, the StyroMAX-9 catalyst used in production scenarios has stable catalytic activity, and hence, its activity was not taken as an input variable.

Table 1
Explanatory variables of the soft sensor for predicting the conversion rate of ethylbenzene.

Variable | Meaning | Unit | Data range
m_eb | Amount of feedstock (ethylbenzene) | t/h | ≤26.5
t_i1, t_i2 | Inlet temperatures of the first and second reactors | °C | 605–630
δ | Water–oil ratio (WOR) | – | 1.3–1.6
t_1, t_2 | Reaction temperatures of the first and second reactors | °C | 550–650
p_1, p_2 | Reaction pressures of the first and second reactors | kPa | 40–100
H_1, H_2 | Liquid hourly space velocities (LHSVs) of the first and second reactors | h⁻¹ | 0.5–1.5
m_ca | Catalyst filling amount | m³ | ≤23
ρ_ca | Bulk density of the catalyst | kg/m³ | 1.0–1.4

4.2.2. Datasets

The experimental data were collected from November to December 2023. The temperature and pressure variables among the explanatory variables were obtained from hardware sensors, while the other variables were estimated on-site. The target variable, the ethylbenzene conversion rate, was determined via manual analysis in the laboratory. After implementing preprocessing steps such as outlier removal, normalization, and alignment on the data labels, datasets were selected for model training. The training set consisted of 2050 samples, including 1750 unlabeled samples and 300 labeled samples. Both the validation and test datasets contained 100 labeled samples each. The computer configuration used in the experiment was as follows: Windows 10 (64-bit), an Intel Core i7-8550U (1.80 GHz) CPU, and 8.0 GB of RAM.

4.3. Results and analysis

4.3.1. Parameter sensitivity analysis

During the modeling process of the LSP-SuAE, the sigmoid function was used as the nonlinear activation function for neurons, and the network structure was set to 12–7–4 based on trial and error. After pretraining, a regression layer was connected to the top of the LSP-SuAE to form a 12–7–4–1 soft sensor network. To test the sensitivity of the hyperparameters and choose the best parameter settings, we conducted multiple experiments to study the influence of parameter selection on the performance of the LSP-SuAE. The root mean square error (RMSE) was used for training error comparisons; this metric is defined as follows:


$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2} \tag{13}$$

The most influential parameters included the labeled sample ratio (LSR) in layerwise pretraining, the neighborhood parameter K, the neighboring sample weight adjustment parameter c, and the tradeoff parameter of the local spatiotemporal structure-preserving regularization term λ.

Since the LSR and λ directly determine the participation levels of local structure preservation and layerwise supervised adjustment during model learning, we first discuss their impacts on the performance of the LSP-SuAE. LSR = 1/n indicates that the labeled samples used for supervised tuning during layerwise pretraining accounted for 1/n of all training samples. In the experiments, the value of λ was searched in the range from 2e−1 to 2e+3, and the LSR was searched in the range from 1/32 to 1. Fig. 6 shows the RMSE variation induced under different LSR and λ values. When the LSR was too small, the effect of supervised tuning during pretraining was not significant, and when it was too large, it was prone to overfitting. Moreover, a large λ led to a large error, and a moderate value resulted in good prediction accuracy.

Fig. 6. Variation in the RMSE of the LSP-SuAE under different LSR and λ values.

The parameters c and K affect the learning results of the model by impacting the size and weight of the neighborhood structure. Fig. 7(a) shows the prediction errors induced under different K values while c was kept constant. The value of K was searched within the range from 1 to 16. An appropriate K value helps improve the prediction accuracy. However, if K is too large, the neighborhood becomes too wide, making the algorithm ineffective. When K is too small, the data neighborhood is too small to effectively extract the local geometric features. Fig. 7(b) illustrates the RMSEs generated by different values of c under a fixed K value. The c value was searched in the range from 10 to 100. To demonstrate the effect of local neighborhood structure preservation on the performance of the original SAE, the prediction error induced by the SAE was plotted as the baseline in Fig. 7. The RMSE values fluctuated significantly under different c and K values. In most cases, however, the RMSE of the LSP-SuAE was below that of the baseline.

Fig. 7. Variation in the RMSE of the LSP-SuAE under different c and K values.

In addition to the above parameters, the learning rate was determined by a grid search conducted in the range from 0.001 to 10. The MBGD algorithm was used to optimize the parameters. The trial-and-error strategy was used to select the minibatch size and the number of epochs. The experimental results provided references for the hyperparameter selection process.
moderate value resulted in good prediction accuracy.


The optimal values of each parameter of the LSP-SuAE are shown in Table 2.

Table 2
Optimal values of the hyperparameters in the LSP-SuAE.

Hyperparameter | Optimal value
Tradeoff parameter λ | 2e+2
Labeled sample ratio LSR | 1/8
Parameter c | 30
Parameter K | 7
Minibatch size m | 32
Pretraining learning rate | 0.1
Number of pretraining epochs | 10
Number of fine-tuning epochs (layerwise pretraining) | 5
Fine-tuning learning rate | 0.5
Number of fine-tuning epochs | 185

Based on the parameter settings listed in Table 2, model training was conducted for the LSP-SuAE. In addition to the RMSE, we also utilized the mean absolute percentage error (MAPE), the coefficient of determination (R²), and Spearman's correlation coefficient (ρ_SCC) as performance evaluation metrics for the soft sensors; these measures are defined as follows:

$$\mathrm{MAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \tag{14}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N_t} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N_t} (y_i - \bar{y})^2} \tag{15}$$

$$\rho_{SCC} = 1 - \frac{6 \sum_{i=1}^{N} \left(R(y_i) - R(\hat{y}_i)\right)^2}{N(N^2 - 1)} \tag{16}$$

where ȳ is the average value of y_i, and R(y_i) and R(ŷ_i) are the Spearman ranks of y_i and ŷ_i, respectively. Among the aforementioned metrics, the RMSE and MAPE denote the absolute magnitude and relative magnitude of the prediction deviations, respectively. R² represents how much of the total variance in the output variable data could be explained by the model, ranging from 0 to 1. The closer the value is to 1, the greater the correlation between the predicted and actual values. ρ_SCC indicates the direction of the correlation between two variables. When the value approaches 1, it signifies greater relative consistency between the predicted values and the actual trend of the values. The final learning results are presented in Table 3.

Table 3
Training results of the LSP-SuAE.

Phase | RMSE | MAPE | R² | ρ_SCC
Training | 0.0579 | 0.51 % | 0.9584 | 0.9752
Validation | 0.0658 | 0.53 % | 0.9532 | 0.9742
Testing | 0.0665 | 0.53 % | 0.9527 | 0.9738

The final training and validation error curves are shown in Fig. 8. Throughout the training process, the gap between the training and validation curves remained stable and diminished gradually, indicating stable model convergence.

Fig. 8. Training and validation loss curves produced by the LSP-SuAE.
stable model convergence.


4.3.2. Comparison with different models

To evaluate the quality prediction performance of our method, we conducted comparative experiments between the LSP-SuAE and five other commonly used or similar methods under the same experimental conditions. These five methods included the traditional data-driven SVR method [21]; two deep neural network methods based on autoencoders, namely, an SAE [26] and a stacked semisupervised autoencoder (SSAE) [42]; and two deep learning soft sensors related to spatiotemporal feature preservation, namely, the Att. BiLSTM-STFE [38] and STNP-SAE [40]. The basic settings of the four deep models were as follows.

• SAE: The network structure was the same as that of the LSP-SuAE. The pretraining objective function did not contain additional regularization terms.
• SSAE: The network structure was the same as that of the SAE. A supervised fine-tuning regularization term was added to the pretraining objective function.
• STNP-SAE: The network structure was the same as that of the SAE. The number of spatial neighbors was set to 3, and the number of temporal neighbors was set to 4.
• Att. BiLSTM-STFE: The number of neural units in the unidirectional LSTM was set to 12, and that in the BiLSTM was set to 6. The attention module consisted of a two-layer neural network, where the output dimensionality of the first layer was set to 8. A fully connected layer with 4 neurons was added to produce a variable output.

To ensure the reliability of the results, the same parameter settings (e.g., the learning rate and number of epochs) were used for each model.

The experimental results in Table 4 reflect the predictive performance achieved by soft sensors based on six different methods on the test set.

Table 4
Comparison among the prediction performances and training times of the six methods.

Method | Training time (s) | RMSE | MAPE | R² | ρ_SCC
LSP-SuAE | 18.3367 | 0.0665 | 0.53 % | 0.9527 | 0.9738
SAE | 9.5027 | 0.0955 | 0.79 % | 0.8812 | 0.9371
SSAE | 14.1044 | 0.0841 | 0.65 % | 0.9124 | 0.9621
SVR | 2.8830 | 0.0942 | 0.79 % | 0.8854 | 0.9373
STNP-SAE | 17.7805 | 0.0688 | 0.53 % | 0.9499 | 0.9733
Att. BiLSTM-STFE | 32.5556 | 0.0803 | 0.64 % | 0.9331 | 0.9692

As shown in the table, among the six methods, the prediction results of SVR and the SAE were not satisfactory, and further methodological improvements are needed when addressing industrial data with strong local characteristics. In comparison, the SSAE exhibited overall improvements in its prediction accuracy and other metrics, indicating that adding supervised fine-tuning for the pretraining objectives helps to learn data features that are more relevant to the target variable. However, none of the aforementioned methods consider manifold learning for process data, and the intrinsic information contained in the extracted features is not sufficiently comprehensive. Compared to the above three methods, Att. BiLSTM-STFE achieved a certain sensing performance improvement, benefiting from its consideration of the spatiotemporal correlations of process data, thus retaining more intrinsic information during the data representation process. However, this method has limited capabilities when addressing process data with strong local behaviors. By constructing manifold regularization terms to preserve the spatiotemporal neighborhood structure of the input data, the STNP-SAE produced better experimental results than did Att. BiLSTM-STFE. Compared to the other methods, the LSP-SuAE achieved superior experimental results in terms of several evaluation metrics, confirming the synergistic effectiveness of local feature learning and interlayer supervised learning for optimizing the quality prediction performance of the model.

Additionally, Table 4 lists the total runtime required for model training (excluding the validation time) by each method. Compared to the other deep models, SVR had the shortest runtime because it did not require hierarchical pretraining. Among the six methods, the LSP-SuAE required more training time, but its training time was within an acceptable range.

Fig. 9 presents boxplots of the prediction errors induced by the six methods. Compared with those of the other methods, the box and whiskers of the LSP-SuAE are narrower, and its median absolute error is the smallest, indicating good prediction accuracy and robustness. Fig. 10 shows a histogram of the errors induced by the six methods. The LSP-SuAE exhibited high stability with a small error range.

Fig. 9. Boxplot of the RMSEs induced by the six methods.
Fig. 10. Error histogram results produced by the six methods.

Fig. 11 shows the prediction curves yielded by different methods for the conversion rate of ethylbenzene in the dehydrogenation reaction. The predictive curve of the LSP-SuAE closely mirrors the actual values, and no large prediction errors were produced even when the data fluctuated violently, indicating good tracking capabilities and robustness.

Fig. 11. Prediction curves produced by the six methods on the test dataset.

Fig. 12 shows a scatter plot of the predicted values produced by the different methods. The axes represent the actual values versus the predicted values, with the diagonal signifying perfect prediction alignment. Proximity to this diagonal line signifies superior predictive accuracy. Overall, the LSP-SuAE predictions are closest to the diagonal line, with the highest concentration of scatter points occurring near the diagonal area.

Fig. 12. Prediction scatter plot yielded by the six methods.

4.3.3. Ablation experiments

To validate the necessity of the synergy between local spatiotemporal feature preservation and layerwise supervised learning in the LSP-SuAE, we conducted ablation experiments targeting these two aspects.

First, we retained the layerwise supervised learning process during model training while removing the mechanism for learning local spatiotemporal features from the data. We found in practice that completely removing the part preserving the local spatiotemporal structure resulted in a model equivalent to the SSAE. A comparative analysis of the experimental results is presented in Section 4.3.2.

Therefore, here, we only removed the part related to spatially correlated weight learning while preserving the selection of temporally adjacent samples and their corresponding regularization terms.

Second, we retained the spatiotemporal neighborhood structure preservation mechanism in the model training process while removing the layerwise supervised learning part. Following these two settings, we removed the corresponding terms from the loss function (10), conducted model training and testing separately, and compared the results with those of the LSP-SuAE.

Table 5 shows the final error data induced by model training, where LSP-SuAE/-l and LSP-SuAE/-s represent the first and second experimental settings, respectively.

Table 5
Experimental results of the ablation experiments.

Method | Training: RMSE, MAPE, R², ρ_SCC | Testing: RMSE, MAPE, R², ρ_SCC
LSP-SuAE | 0.0579, 0.51 %, 0.9584, 0.9752 | 0.0665, 0.53 %, 0.9527, 0.9738
LSP-SuAE/-l | 0.0736, 0.59 %, 0.9355, 0.9736 | 0.0785, 0.62 %, 0.9418, 0.9703
LSP-SuAE/-s | 0.0632, 0.53 %, 0.9442, 0.9749 | 0.0694, 0.56 %, 0.9474, 0.9730

After conducting both ablation operations, the sensing capability of the LSP-SuAE decreased to varying degrees. Comparatively, removing the spatial local feature learning process led to a more severe performance decline. Thus, it can be concluded that for the LSP-SuAE, unsupervised pretraining with spatiotemporal adjacent sample reconstruction and supervised learning complement each other when integrated, jointly optimizing the training results and sensing performance of the model.
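For concreteness, the two ablation settings can be expressed as switches on the terms of the loss in Eq. (10); the flag and variable names below are illustrative, not from the paper.

```python
# Ablation switches on the loss terms of Eq. (10) (illustrative names only).
# LSP-SuAE/-l: keep the temporal neighbor selection but drop the spatially
#              correlated heat kernel weighting, i.e., use uniform weights.
# LSP-SuAE/-s: drop the layerwise supervised term J_suv-tuning (Eq. (12)).
def ablated_loss(recon, local_weighted, local_uniform, sup,
                 spatial_weights=True, layerwise_supervision=True):
    loss = recon + (local_weighted if spatial_weights else local_uniform)
    return loss + sup if layerwise_supervision else loss
```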
In summary, we utilized a soft sensor based on the LSP-SuAE to predict the ethylbenzene conversion rate and conducted different experiments to comprehensively evaluate the resulting model performance. First, the optimal hyperparameters of the model were determined through experiments. Subsequently, the sensing performance of the LSP-SuAE was validated from various perspectives, including its quality prediction accuracy, predictive tracking capability, prediction stability, and computational burden. Moreover, the necessity of the core modules in the algorithm was verified through ablation experiments. Overall, the results of all the experiments demonstrate the feasibility and effectiveness of the LSP-SuAE in modeling the soft sensors encountered in typical industrial processes. However, some challenges remain during the application phase. For instance, in the industrial dehydrogenation process for the preparation of styrene from ethylbenzene, on-site control requires the prediction error of the ethylbenzene conversion rate to be within the range of [0, 0.1]. While the proposed method largely meets these requirements, a few instances in which the prediction error exceeded the specified range were observed. Therefore, further attention is needed to optimize the method and improve its performance. Additionally, the LSP-SuAE faces the challenge of accuracy degradation caused by the dynamic changes occurring during the actual deployment process, which can result in higher maintenance costs.

5. Conclusions

In this paper, we proposed a deep learning strategy, the LSP-SuAE, for industrial process quality prediction. Different from the unsupervised pretraining process of the traditional SAE, the LSP-SuAE utilizes a local structure-preserving semisupervised pretraining approach. By reconstructing a local spatiotemporal structure of the input data, the smoothness of the local feature space is maintained to learn deep features that are more consistent with the manifold of the input data. Furthermore, a certain amount of labeled data is used to carry out layerwise supervised tuning to improve the quality correlations among the hierarchical data features. The experimental results obtained in actual industrial processes indicate that, compared to five other popular methods, the LSP-SuAE could extract data features that more accurately represented the original data, and it exhibited a better ability to fit the target variables. This approach significantly improves the sensing capabilities and stability of soft sensors while adding only a minimal computational burden. Finally, in future research, we will apply the LSP-SuAE to more soft sensor modeling application scenarios to further optimize its generalizability and scalability based on the diverse types and scales of process data. In addition, considering the characteristics of industrial applications and regulatory requirements, it may be beneficial to incorporate explanatory strategies during the modeling phase to enhance the transparency of the model, which could help soft sensors gain widespread public trust and attain improved reliability during the application deployment phase.

CRediT authorship contribution statement

Yong Zhang: Visualization, Validation, Investigation, Data curation. Xiaomei Qi: Writing – original draft, Validation, Resources, Investigation, Formal analysis. Xiao Wang: Writing – review & editing, Writing – original draft, Validation, Software, Methodology, Funding acquisition, Data curation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The authors do not have permission to share data.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC) (92270117), and the Natural Science Foundation of Shandong Province (ZR2022MF308).

Appendix A. Supporting information

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.asoc.2024.111974.

References

[1] H.K. Mohanta, A.K. Pani, Adaptive non-linear soft sensor for quality monitoring in refineries using just-in-time learning-generalized regression neural network approach, Appl. Soft Comput. 119 (2022) 108546, https://doi.org/10.1016/j.asoc.2022.108546.
[2] Y.S. Perera, D.A.A.C. Ratnaweera, C.H. Dasanayaka, C. Abeykoon, The role of artificial intelligence-driven soft sensors in advanced sustainable process industries: a critical review, Eng. Appl. Artif. Intell. 121 (2023) 105988, https://doi.org/10.1016/j.engappai.2023.105988.
[3] M. Jia, D. Xu, T. Yang, Y. Liu, Y. Yao, Graph convolutional network soft sensor for process quality prediction, J. Process Control 123 (2023) 12–25, https://doi.org/10.1016/j.jprocont.2023.01.010.
[4] C. Liu, K. Wang, Y. Wang, X. Yuan, Learning deep multimanifold structure feature representation for quality prediction with an industrial application, IEEE Trans. Ind. Inform. 18 (9) (2021) 5849–5858, https://doi.org/10.1109/TII.2021.3130411.
[5] Q. Cheng, Z. Chunhong, L. Qianglin, Development and application of random forest regression soft sensor model for treating domestic wastewater in a sequencing batch reactor, Sci. Rep. 13 (1) (2023) 9149, https://www.nature.com/articles/s41598-023-36333-8#Sec16.
[6] Ž. Stržinar, B. Pregelj, I. Škrjanc, Soft sensor for non-invasive detection of process events based on Eigenresponse Fuzzy Clustering, Appl. Soft Comput. 132 (2023) 109859, https://doi.org/10.1016/j.asoc.2022.109859.
[7] C. Yang, C. Yang, J. Li, Y. Li, F. Yan, Forecasting of iron ore sintering quality index: a latent variable method with deep inner structure, Comput. Ind. 141 (2022) 103713, https://doi.org/10.1016/j.compind.2022.103713.
[8] L. Feng, C. Zhao, Y. Sun, Dual attention-based encoder-decoder: a customized sequence-to-sequence learning for soft sensor development, IEEE Trans. Neural Netw. Learn. Syst. 32 (8) (2021) 3306–3317, https://doi.org/10.1109/TNNLS.2020.3015929.
[9] W. Anupong, K. Surendra, A. Monhammed, S. Ulaganathan, J. Mukta, K. Ravi, Artificial intelligence-enabled soft sensor and internet of things for sustainable agriculture using ensemble deep learning architecture, Comput. Electr. Eng. 102 (2022) 108128, https://doi.org/10.1016/j.compeleceng.2022.108128.
[10] Q. Jiang, X. Yan, H. Yi, F. Gao, Data-driven batch-end quality modeling and monitoring based on optimized sparse partial least squares, IEEE Trans. Ind. Electron. 67 (5) (2020) 4098–4107, https://doi.org/10.1109/TIE.2019.2922941.
[11] L. Ma, M. Wang, K. Peng, A missing manufacturing process data imputation framework for nonlinear dynamic soft sensor modeling and its application, Expert Syst. Appl. 237 (2024) 121428, https://doi.org/10.1016/j.eswa.2023.121428.


[12] E.A. Costa, C.M. Rebello, V.V. Santana, A.E. Rodrigues, A.M. Ribeiro, L. Schnitman, I.B. Nogueira, Mapping uncertainties of soft-sensors based on deep feedforward neural networks through a novel Monte Carlo uncertainties training process, Processes 10 (2) (2022) 409, https://doi.org/10.3390/pr10020409.
[13] L. Zeng, Z. Ge, Pyramid dynamic Bayesian networks for key performance indicator prediction in long time-delay industrial processes, IEEE Trans. Artif. Intell. 5 (2) (2024) 661–671, https://doi.org/10.1109/TAI.2023.3258938.
[14] Y. Yao, T. Han, J. Yu, M. Xie, Uncertainty-aware deep learning for reliable health monitoring in safety-critical energy systems, Energy 291 (2024) 130419, https://doi.org/10.1016/j.energy.2024.130419.
[15] E.C. Rivera, C.K. Yamakawa, C.E. Rossell, J. Nolasco Jr, H.J. Kwon, Prediction of intensified ethanol fermentation of sugarcane using a deep learning soft sensor and process analytical technology, J. Chem. Technol. Biotechnol. 99 (1) (2024) 207–216, https://doi.org/10.1002/jctb.7525.
[16] F. Alassery, Predictive maintenance for cyber physical systems using neural network based on deep soft sensor and industrial internet of things, Comput. Electr. Eng. 101 (2022) 108062, https://doi.org/10.1016/j.compeleceng.2022.108062.
[17] T. Jia, C. Cai, Forecasting citywide short-term turning traffic flow at intersections using an attention-based spatiotemporal deep learning model, Transp. B: Transp. Dyn. 11 (1) (2023) 683–705, https://doi.org/10.1080/21680566.2022.2116125.
[18] L. Lian, X. Zong, K. He, Z. Yang, Soft sensing of calcination zone temperature of lime rotary kiln based on principal component analysis and stochastic configuration networks, Chemom. Intell. Lab. Syst. 240 (2023) 104923, https://doi.org/10.1016/j.chemolab.2023.104923.
[19] D. Balram, K.Y. Lian, N. Sebastian, A novel soft sensor based warning system for hazardous ground-level ozone using advanced damped least squares neural network, Ecotoxicol. Environ. Saf. 205 (2020) 111168, https://doi.org/10.1016/j.ecoenv.2020.111168.
[20] Y. Chen, D. Wang, An improved deep kernel partial least squares and its application to industrial data modeling, IEEE Trans. Ind. Inform. (2024), https://doi.org/10.1109/TII.2024.3359453.
[21] P. Lian, H. Liu, X. Wang, R. Guo, Soft sensor based on DBN-IPSO-SVR approach for rotor thermal deformation prediction of rotary air-preheater, Measurement 165 (2020) 108109, https://doi.org/10.1016/j.measurement.2020.108109.
[22] L. Freitas, B.H. Barbosa, L.A. Aguirre, Including steady-state information in nonlinear models: an application to the development of soft-sensors, Eng. Appl. Artif. Intell. 102 (2021) 104253, https://doi.org/10.1016/j.engappai.2021.104253.
[23] X. Yin, Z. Niu, Z. He, Z.S. Li, D.H. Lee, Ensemble deep learning based semi-supervised soft sensor modeling method and its application on quality prediction for coal preparation process, Adv. Eng. Inform. 46 (2020) 101136, https://doi.org/10.1016/j.aei.2020.101136.
[24] S. Gao, J. Xu, Z. Ma, R. Tian, X. Dang, X. Dong, Research on modeling of industrial soft sensor based on ensemble learning, IEEE Sens. J. (2024), https://doi.org/10.1109/JSEN.2024.3375072.
[25] Y. Wang, Z. Pan, X. Yuan, C. Yang, W. Gui, A novel deep learning based fault diagnosis approach for chemical process with extended deep belief network, ISA Trans. 96 (2020) 457–467, https://doi.org/10.1016/j.isatra.2019.07.001.
[26] X. Wang, H. Liu, Soft sensor based on stacked auto-encoder deep neural network for air preheater rotor deformation prediction, Adv. Eng. Inform. 36 (2018) 112–119, https://doi.org/10.1016/j.aei.2018.03.003.
[27] D. Li, C. Yang, Y. Li, A multi-subsystem collaborative Bi-LSTM-based adaptive soft sensor for global prediction of ammonia-nitrogen concentration in wastewater treatment processes, Water Res. (2024) 121347, https://doi.org/10.1016/j.watres.2024.121347.
[28] P. Chang, Z. Li, Over-complete deep recurrent neutral network based on wastewater treatment process soft sensor application, Appl. Soft Comput. 105 (2021) 107227, https://doi.org/10.1016/j.asoc.2021.107227.
[29] R. Guo, H. Liu, Semisupervised dynamic soft sensor based on complementary ensemble empirical mode decomposition and deep learning, Measurement 183 (2021) 109788, https://doi.org/10.1016/j.measurement.2021.109788.
[30] S. Hong, N. An, H. Cho, J. Lim, I.S. Han, I. Moon, J. Kim, A dynamic soft sensor based on hybrid neural networks to improve early off-spec detection, Eng. Comput. 39 (4) (2023) 3011–3021, https://doi.org/10.1007/s00366-022-01731-5.
[31] M. Karnati, A. Seal, D. Bhattacharjee, A. Yazidi, O. Krejcar, Understanding deep learning techniques for recognition of human emotions using facial expressions: a comprehensive survey, IEEE Trans. Instrum. Meas. 72 (2023) 5006631, https://doi.org/10.1109/TIM.2023.3243661.
[32] C. Hong, J. Yu, J. Zhang, X. Jin, K.H. Lee, Multimodal face-pose estimation with multitask manifold deep learning, IEEE Trans. Ind. Inform. 15 (7) (2019) 3952–3961, https://doi.org/10.1109/TII.2018.2884211.
[33] X. Zhao, M. Jia, M. Lin, Deep Laplacian Auto-encoder and its application into imbalanced fault diagnosis of rotating machinery, Measurement 152 (2020) 107320, https://doi.org/10.1016/j.measurement.2019.107320.
[34] J. Cai, S. Wang, W. Guo, Unsupervised embedded feature learning for deep clustering with stacked sparse auto-encoder, Expert Syst. Appl. 186 (2021) 115729, https://doi.org/10.1016/j.eswa.2021.115729.
[35] Y. Wang, C. Liu, X. Yuan, Stacked locality preserving autoencoder for feature extraction and its application for industrial process data modeling, Chemom. Intell. Lab. Syst. 203 (2020) 104086, https://doi.org/10.1016/j.chemolab.2020.104086.
[36] C. Liu, Y. Wang, K. Wang, X. Yuan, Deep learning with nonlocal and local structure preserving stacked autoencoder for soft sensor in industrial processes, Eng. Appl. Artif. Intell. 104 (2021) 104341, https://doi.org/10.1016/j.engappai.2021.104341.
[37] S. Abirami, P. Chitra, Regional air quality forecasting using spatiotemporal deep learning, J. Clean. Prod. 283 (2021) 125341, https://doi.org/10.1016/j.jclepro.2020.125341.
[38] C. Xie, R. Yao, L. Zhu, H. Gong, H. Li, X. Chen, Soft-sensor development through deep learning with spatial and temporal feature extraction of complex processes, Ind. Eng. Chem. Res. 62 (1) (2022) 519–534, https://doi.org/10.1021/acs.iecr.2c03137.
[39] D. Liu, Y. Wang, C. Liu, X. Yuan, C. Yang, W. Gui, Data mode related interpretable transformer network for predictive modeling and key sample analysis in industrial processes, IEEE Trans. Ind. Inform. 19 (2022) 9325–9336, https://doi.org/10.1109/TII.2022.3227731.
[40] C. Liu, K. Wang, Y. Wang, S. Xie, C. Yang, Deep nonlinear dynamic feature extraction for quality prediction based on spatiotemporal neighborhood preserving SAE, IEEE Trans. Instrum. Meas. 70 (2021) 1–10, https://doi.org/10.1109/TIM.2021.3122187.
[41] Y. Wang, S. Li, C. Liu, K. Wang, X. Yuan, C. Yang, W. Gui, Multiscale feature fusion and semi-supervised temporal-spatial learning for performance monitoring in the flotation industrial process, IEEE Trans. Cybern. 54 (2) (2023) 974–987, https://doi.org/10.1109/TCYB.2023.3295852.
[42] Y. Zhuang, Y. Liu, A. Akhil, Z. Zhong, A.C. Ehecatl, P.H. Colin, M. Mwhmet, A hybrid data-driven and mechanistic model soft sensor for estimating CO2 concentrations for a carbon capture pilot plant, Comput. Ind. 143 (2022) 103747, https://doi.org/10.1016/j.compind.2022.103747.
