Deep learning with local spatiotemporal structure preserving for soft sensor
development of complex industrial processes
Xiao Wang*, Xiaomei Qi, Yong Zhang
School of Electrical and Electronic Engineering, Shandong University of Technology, Zibo 255300, China
HIGHLIGHTS
• A deep learning method for soft sensor modeling uses a semisupervised pretraining strategy integrating spatiotemporal structure preservation.
• The developed soft sensor is deployed in an industrial scenario where soft sensing technology has not been widely adopted.
• This paper provides a theoretical and practical foundation for deep learning to address problems involving spatiotemporal data structures.
ARTICLE INFO

Keywords: Soft sensor; Data-driven modeling; Deep learning; Local feature learning

ABSTRACT

Data-driven soft sensors have emerged as indispensable tools for predicting quality variables in complex industrial processes because of their cost-effectiveness and ease of maintenance. In particular, soft sensors based on deep learning have been utilized in extensive research and successful applications in recent years. However, traditional deep learning methods capture hierarchical data features by minimizing global fitting errors, neglecting the local structural characteristics implied in the original data. In this paper, we propose a new deep learning method for soft sensor development. Utilizing autoencoders as the foundational architecture of our network, a new semisupervised strategy is adopted for layerwise pretraining optimization. On the one hand, more representative data features are extracted by maintaining the local spatiotemporal structure of the data; on the other hand, layer-by-layer supervised learning is employed to identify the critical features that are aligned with the ultimate task, which aids in obtaining the optimal network parameters and improving the resulting prediction accuracy. Subsequently, a local spatiotemporal structure-preserving stacked semisupervised autoencoder (LSP-SuAE) is established. To evaluate the feasibility and effectiveness of the proposed approach, experiments are carried out in a real industrial process. A soft sensor based on the LSP-SuAE is developed to predict the rate of ethylbenzene conversion during dehydrogenation-based styrene production. The experimental results demonstrate that, compared to five other common or similar data-driven modeling methods, the LSP-SuAE exhibits higher prediction accuracy and better stability.
1. Introduction

Manufacturing is often viewed as a complex industrial process, and with its continual development, increasing demand for critical performance indicators such as product quality, energy conservation and emission reduction has been observed. In practical production scenarios, the real-time measurement of quality-related key process variables is an important premise for enhancing product quality and optimizing process control levels [1–3]. However, due to technical or economic constraints, some quality variables (e.g., the reaction rate, component concentration, and viscosity) cannot be directly measured with hardware sensors and can only be measured by manual offline analyses in the laboratory, which leads to a serious time lag. Thus, this strategy fails to satisfy the requirements of industrial process control [4,5].

Soft sensors represent virtual sensing solutions that use easily measured related process variables as explanatory variables (inputs) and quality variables as target variables (outputs) to estimate the target variables by constructing mathematical models for describing the input–output behavior of the system [6]. With the popularization of distributed measurement systems and the development of the storage and calculation capabilities of computers, data-driven methods have become the mainstream development direction for soft sensor modeling
* Corresponding author.
E-mail address: [email protected] (X. Wang).
https://doi.org/10.1016/j.asoc.2024.111974
Received 6 November 2023; Received in revised form 10 June 2024; Accepted 3 July 2024
Available online 6 July 2024
1568-4946/© 2024 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
X. Wang et al. Applied Soft Computing 164 (2024) 111974
[7,8]. This approach relies on site data to establish models, enhancing process understanding and providing optimization; it is particularly suitable for complex industrial process modeling where the underlying mechanisms are unclear [9–11]. With the advent of the Industry 4.0 era, the integration of advanced machine learning techniques with soft sensor development has provided new avenues for improving process efficiency and decision-making. Among the various available machine learning paradigms, deep learning, with its significant advantage regarding hierarchical data representation, has shown great potential for capturing the complex nonlinear relationships between process variables [12]. With the successful applications of deep learning in computer vision, natural language processing, and other fields, soft sensors based on deep learning (DLSSs) have been widely researched and validated in various scenarios, demonstrating outstanding performance in handling complex, modern, and large-scale industrial data [13,14].

Despite the satisfactory results produced by DLSSs in many variable prediction tasks, these sensors still face some nonnegligible challenges in practical industrial applications [15,16]. First, many large-scale industrial processes operate in harsh environments with slow changes. Due to the strong mechanical and topological relevance levels of process variables, industrial process data often exhibit strong local behaviors. Therefore, preserving these structures during the model learning process can lead to the development of more robust and interpretable soft sensors [17]. However, traditional deep learning methods capture hierarchical data features by minimizing global fitting errors, ignoring the local structural features that are implicit in the original data, which is not conducive to soft sensor modeling and accurate prediction. Most existing methods either overlook the importance of local structures or fail to effectively integrate them into the learning process. Second, unsupervised layerwise pretraining can be employed to effectively extract the intrinsic features of data in cases with scarce labeled data, but this approach exhibits certain flaws in soft sensing tasks that require accurate regression-based prediction. Since the unsupervised training stage is not optimized for specific tasks, the constructed model may fail to capture the most critical features for the final task, and the irrelevant features learned during this stage may lead to overfitting problems. Therefore, it is necessary to develop deep learning methods that are more suitable for soft sensor modeling, combining global and local perspectives during model training and enhancing the target orientation of the interlayer features to achieve accurate quality prediction.

Based on these considerations, in this paper, we propose a new DLSS called a local spatiotemporal structure-preserving stacked semisupervised autoencoder (LSP-SuAE) and apply it to the complex industrial process of ethylbenzene dehydrogenation-based styrene production to predict the conversion rate of ethylbenzene. During the pretraining process of a single autoencoder, local data patterns are integrated on the basis of traditional global feature extraction, capturing the complex dynamic dependencies between variables by encoding local spatiotemporal features. Then, by conducting fine-tuning with labeled data, the task orientation of the features is further enhanced. On this basis, the LSP-SuAE model is established through multilayer stacking. This semisupervised pretraining strategy can help the model better understand the dynamic changes that occur in complex industrial processes, which is crucial for enhancing the performance of soft sensors. The main contributions of this paper are as follows.

• A new deep learning-based soft sensor modeling method is proposed. By maintaining the intrinsic spatiotemporal correlations of process data, the model becomes more sensitive to minor process changes, improving the accuracy and robustness of predictions. A new attempt is made to measure the spatiotemporal adjacency of data, enhancing the feasibility and effectiveness of identifying potential spatiotemporal dependencies in industrial data.
• The developed soft sensor is deployed in an industrial scenario where soft sensing technology has not been widely adopted, verifying the effectiveness and practicality of the proposed method. This case study demonstrates the significant potential of our method for use in practical applications, not only by effectively reducing energy consumption and production costs but also by further expanding the application boundaries of deep learning technology in the field of industrial process monitoring and optimization.
• This paper provides a new soft sensor development perspective for complex industrial processes and offers a theoretical and practical foundation for deep learning to address a wide range of problems involving complex spatiotemporal data structures. Furthermore, the proposed framework and method provide a reference for subsequent research on similar issues.

The remainder of this paper is organized as follows. The current research status and background are summarized in Section 2. Section 3 presents the proposed LSP-SuAE. Section 4 describes the industrial application of our method, the experimental setup, and an analysis of the experimental results. Section 5 summarizes this study.

2. Related work

In recent years, data-driven soft sensors have emerged as crucial tools for industrial process monitoring and quality prediction. These models leverage process data to construct nonlinear relationships between variables, realizing the prediction of quality variables. Traditional methods such as principal component analysis (PCA) [18], partial least squares (PLS) and its variants including kernel-based PLS [19,20], support vector regression (SVR) [21], and artificial neural networks (ANNs) [22], have laid the groundwork for early research. While these approaches perform well with linear or mildly nonlinear relationships, their capabilities are limited when addressing large-volume, high-dimensional, and highly nonlinear industrial data, often failing to capture complex data characteristics [23].

With the widespread application of deep learning in tasks such as image recognition and natural language processing, research related to DLSSs has rapidly advanced. Through unsupervised layerwise pretraining, deep learning effectively avoids the issues of vanishing and exploding gradients, demonstrating its powerful ability to extract high-level feature representations from complex data. Various DLSSs, including those based on convolutional neural networks (CNNs) [24], deep belief networks (DBNs) [25], stacked autoencoders (SAEs) [26], long short-term memory (LSTM) [27] and recurrent neural networks (RNNs) [28], have produced substantial research results. They have demonstrated superior performance to that of traditional methods in multiple fields, such as chemical engineering, metallurgy, and biology [29,30].

Despite the clear advantage of DLSSs in terms of data representation capabilities, deep learning, which acts as a global feature extractor, often focuses only on the global characteristics of data, neglecting the preservation of local data features [31]. This limitation has resulted in restricted generalizability and predictive performance in specific industrial applications. To address this issue, some recent studies have begun exploring deep learning methods that can extract local features. For instance, a manifold regularized convolutional layer (MRCL) was proposed in [32] to preserve the neighborhood structure of data. In [33], a deep Laplacian autoencoder (DLapAE) was developed to constrain neighboring information in the given data space. A deep stacked sparse embedded clustering (DSSEC) method was proposed in [34] to form an improved local structure retention strategy. Although these methods have achieved good results in tasks such as image processing, the feasibility of their soft sensing applications is worth validating further. Some studies have incorporated deep learning methods with local structure preservation into soft sensor modeling and achieved satisfactory results [35,36], but they primarily optimized the performance of these methods by preserving the local geometric structure of the input data. For soft sensing tasks, given the temporal correlation
characteristics of process data, it is also necessary to further consider the adjacency relationships of data in the time domain. To learn the neighboring information of data more comprehensively, some recent studies have explored incorporating spatiotemporal feature preservation into deep learning to enhance the effects of model training [37]. Xie et al. proposed an attention- and bidirectional LSTM (BiLSTM)-based spatiotemporal feature extraction network (Att. BiLSTM-STFE) to obtain local features by integrating an attention mechanism and LSTM [38]. Liu et al. utilized a data mode related self-attention mechanism to learn features, enhancing the perceptual ability of data within the same mode [39]. Although attention mechanisms are good at capturing the long-distance dependencies between data, their suitability for industrial data with strong local behaviors still requires investigation. Liu et al. developed a spatiotemporal neighborhood-preserving stacked autoencoder (STNP-SAE) and applied it to perform soft sensor modeling in a hydrocracking process [40]. Based on certain rules in the spatial and temporal domains, two sets of neighboring samples were integrated into the model training procedure. Wang et al. developed a prediction model by learning the spatiotemporal features of images through multiscale feature extraction [41]. These approaches offer new insights into exploring the spatiotemporal correlations of industrial data, yet further

3. Methods

For an input sample x, an autoencoder (AE) first encodes x into a hidden feature h and then decodes h into a reconstruction x̂:

h = f(W(1)x + b(1))   (1)

x̂ = f(Ŵ(1)h + b̂(1))   (2)

where {W(1), b(1)} and {Ŵ(1), b̂(1)} denote the weight parameters of the encoder and decoder, respectively, and f is an activation function such as the sigmoid or rectified linear unit (ReLU) function. The above transformations are all nonlinear mappings, and their weight parameters are usually optimized through the backpropagation (BP) algorithm.

If the original dataset is X = {x_1, x_2, …, x_N} and N is the number of samples, the loss function of the AE can be defined by minimizing the reconstruction error:

J_AE = (1/(2N)) Σ_{i=1}^{N} ‖x_i − x̂_i‖²   (3)

where ‖·‖ is the L2 norm. When the dimensions of h and x are constrained to be different, if the system can reconstruct x, then h carries the information of the original data; i.e., h is a good expression of x.

An SAE is based on an AE, but it increases the depth of the hidden layers. Taking the feature h of the hidden layer of the encoder as the original information, a new AE is trained to obtain a new feature. Through this layerwise stacking procedure, multiple hidden layers are retained to obtain a deep neural network, as shown in Fig. 2. The feature of the deepest layer l is calculated as follows:

h(l) = f(W(l)h(l−1) + b(l))   (4)

Fig. 2. Model structure of the SAE.

In soft sensor modeling scenarios, to predict quality variables, a regression layer is usually added at the top of the utilized network to obtain the final output as follows:

ŷ = f(W(p)h(l) + b(p))   (5)
where ŷ is the predicted value of the target variable y and {W(p), b(p)} is the network weight of the regression layer. Once the pretraining stages of all AEs have been completed, the labeled data are used for supervised fine-tuning. The loss function is defined as follows:

J_fine-tuning = (1/(2N)) Σ_{i=1}^{N} ‖y_i − ŷ_i‖²   (6)

Finally, the optimal network parameters {W(1)*, b(1)*, …, W(l)*, b(l)*, W(p)*, b(p)*} are obtained after iterative training.

3.2. Local spatiotemporal structure-preserving stacked semisupervised AE (LSP-SuAE)

Based on the fundamental theory of the SAE, a new LSP-SuAE network is developed for industrial soft sensor modeling in this section.

3.2.1. LSP-uAE

To obtain more accurate data features, local data feature learning is included in the training processes of individual AEs. Within the learning dataset, two samples are deemed neighbors if they are closely aligned in the input space. A proficient prediction model should maintain these neighborhood relationships from the inputs to the outputs. Therefore, during the data reconstruction process of an AE, the local neighborhood structure of the data is maintained while reproducing the overall input data, which is conducive to obtaining representative and useful hidden layer features.

Industrial data exhibit strong dynamic correlations, and process data are often slow-changing continuous time series that maintain significant time dependencies over short periods, which is very important for establishing dynamic models and predicting future data points. Therefore, when establishing local adjacency criteria, we incorporate both the temporal and spatial correlations of the input data, with a special focus on the significance of temporal adjacency. This approach, which enables the capture of more reliable local data relationships, is implemented through the following specific steps.

First, for each sample in the dataset X, neighbors are selected within the time domain. Since time series neighbors can track the potential dynamic changes that occur near data, sequential adjacent points are chosen as the nearest neighbors. For a sample x_i collected at time t, K neighbors can be identified by searching along the time axis:

D_K(x_i) = {x_ik}_{k=1}^{K} = {x(t − K/2), x(t − K/2 + 1), …, x(t + K/2)}   (7)

Then, the adjacency weight of each neighbor is determined with the heat kernel function:

W_ik = exp(−‖x_i − x_ik‖²/c)   (8)

where W_ik is the weight coefficient between x_i and its neighbor x_ik, ‖x_i − x_ik‖² denotes the squared Euclidean distance between x_i and x_ik, and c is a scaling parameter for controlling the width of the kernel function. Although the heat kernel function is related to the Euclidean distance between two points, it provides a similarity measure based on the distances between points, which decreases as the distance increases. Compared to previously developed research methods based on spatial distance, the heat kernel function, through its sensitivity to local data structures, can better handle high-dimensional data. By adjusting parameter c, the similarity decay rate can be controlled, thereby flexibly capturing different levels of data structures. Additionally, this measurement method is smoother and more robust to noise and small disturbances.

Fig. 3. Structure of the LSP-uAE.

Afterward, we incorporate the adjacency relationships between x_i and its K neighbors into the optimization objective of the AE to preserve the local behavioral characteristics of the input data. During the data reconstruction process, these neighboring points should be as close together as possible. Therefore, we introduce a regularization term to preserve the local structure in the global reconstruction optimization objective and construct a new AE loss function as follows:

J_LSP-AE = (1/(2N)) Σ_{i=1}^{N} ‖x_i − x̂_i‖² + λ Σ_{i=1}^{N} Σ_{k=1}^{K} W_ik ‖x̂_i − x̂_ik‖²   (9)

where λ is a tradeoff parameter that is used to balance the relationship between reconstructing the target sample itself and reconstructing adjacent samples.

The above strategy first identifies adjacent samples based on their temporal proximity and then determines the adjacent weights based on spatial correlation measurements, accurately and quickly constructing a spatiotemporal neighborhood structure for the original samples. By maintaining the local spatiotemporal structure in the data reconstruction process of the AE, the global and local structural features of the input data can be comprehensively identified and extracted, and more diverse and representative hidden layer features can thus be obtained.

To further improve the resulting prediction accuracy, an appropriate amount of supervised tuning is added to the training process of the AE to increase the correlation between the extracted features and the quality variables. Some labeled data are added to the learning dataset, and data reconstruction is carried out on the unlabeled data and the input parts of the labeled data. Prediction neurons are added to the output layer to obtain the predicted values of the quality variables. The quality prediction error is included in the optimization procedure, and the network weights are optimized via semisupervised learning. The framework of the LSP-uAE is shown in Fig. 3, and the loss function is as follows:
Fig. 4. Framework of the proposed LSP-SuAE algorithm for soft sensor modeling.
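As a concrete illustration of Eqs. (7)–(9), the temporal neighbor selection, the heat-kernel weighting, and the structure-preserving reconstruction loss can be sketched in NumPy as follows. This is a minimal sketch under assumed array shapes (samples as rows, ordered in time), not the authors' implementation; the function names and the default values of K, c, and λ (the latter taken from Table 2) are illustrative only.

```python
import numpy as np

def temporal_neighbors(i, n_samples, K):
    """Eq. (7): indices of the sequentially adjacent samples of x_i along the
    time axis (up to 2*(K//2) of them), clipped at the series boundaries."""
    half = K // 2
    idx = [t for t in range(i - half, i + half + 1) if t != i]
    return [t for t in idx if 0 <= t < n_samples]

def heat_kernel_weight(x_i, x_k, c):
    """Eq. (8): W_ik = exp(-||x_i - x_ik||^2 / c); the similarity decays
    smoothly with the squared Euclidean distance, with c setting the decay rate."""
    return np.exp(-np.sum((x_i - x_k) ** 2) / c)

def lsp_ae_loss(X, X_hat, K=7, c=30.0, lam=200.0):
    """Eq. (9): global reconstruction error plus a weighted penalty that keeps
    the reconstructions of spatiotemporally adjacent samples close together."""
    N = X.shape[0]
    recon = np.sum((X - X_hat) ** 2) / (2 * N)
    local = 0.0
    for i in range(N):
        for k in temporal_neighbors(i, N, K):
            local += heat_kernel_weight(X[i], X[k], c) * np.sum((X_hat[i] - X_hat[k]) ** 2)
    return recon + lam * local
```

In a pretraining loop, this loss would stand in for Eq. (3) when optimizing each LSP-uAE layer; the weights W_ik can be precomputed once per layer because they depend only on the layer inputs, not on the reconstructions.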
modeling. The learning procedures of the LSP-SuAE are shown in Algorithm 1, where minibatch gradient descent (MBGD) is used for parameter optimization. Fig. 4 provides the basic framework of the soft sensor modeling method based on the LSP-SuAE.

Algorithm 1: LSP-SuAE Algorithm

industrial process of styrene production to predict the conversion rate of ethylbenzene during the dehydrogenation reaction. Furthermore, the performance of the LSP-SuAE was validated by comparing its experimental results with those of five data-driven methods.

4.1. Background
China leads the world in styrene production. Due to strong demand, the styrene production capacity has continued to expand in recent years, yet a gap remains between the supply and demand of styrene. Under the pressure of market competition, production facilities must improve the effectiveness of their quality inspection and process control mechanisms and implement product quality monitoring processes to save energy and increase profits.

Styrene is mainly produced from the dehydrogenation of ethylbenzene, which accounts for more than 80 % of all global production. A chemical factory in East China uses a negative-pressure adiabatic dehydrogenation method to produce styrene. The styrene output of this company in 2023 was 400,000 tons. The production process has three steps: a dehydrogenation reaction; process condensation with an off-gassing treatment; and liquid-phase dehydrogenation, distillation, and separation. The process flow is shown in Fig. 5. In the ethylbenzene dehydrogenation unit, the catalytic dehydrogenation reaction of ethylbenzene is used to obtain crude styrene, which is the most critical step of the process. The main reaction is as follows:

C6H5CH2CH3 ⇌ C6H5CH=CH2 + H2

The raw material (ethylbenzene) is mixed with preheated primary water to form a gas-phase mixture with a stable ratio. This mixture is mixed with the main steam produced by a steam superheating furnace at the entrance of the first-stage reactor, and the central pipe passes through a reaction bed covered with catalyst in the radial direction to enable the dehydrogenation reaction, with a reaction temperature of 550–650 °C. The product of the first reactor is heated by the main steam flowing through the interstage reheater, and it enters the second-stage reactor again for dehydrogenation. The output temperature of the second reactor is approximately 570 °C. The product of the second-stage reaction passes through a heat exchanger to lower the temperature of the ethylbenzene, and crude styrene is obtained after subsequent condensation and gas–liquid separation steps. During the above process, the conversion rate of the dehydrogenation reaction is an important quality index that directly impacts the concentration of styrene in the dehydrogenation product. A conversion rate that is too high will cause severe coking of the catalyst bed in the dehydrogenation reactor, which will affect the polymerization rate. If the conversion rate is too low, it will be difficult to obtain styrene with good purity, and the energy consumption level will increase.

Due to the complex chemical background of the dehydrogenation reaction, production occurs in a high-temperature environment, and relative lags occur between the various process variables. Therefore, the

Fig. 6. Variation in the RMSE of the LSP-SuAE under different LSR and λ values.

Table 2
Optimal values of the hyperparameters in the LSP-SuAE.

Hyperparameter | Optimal value
Tradeoff parameter λ | 2e+2
Labeled sample ratio LSR | 1/8
Parameter c | 30
Parameter K | 7
Minibatch size m | 32
Pretraining learning rate | 0.1
Number of pretraining epochs | 10
Number of fine-tuning epochs (layerwise pretraining) | 5
Fine-tuning learning rate | 0.5
Number of fine-tuning epochs | 185

Table 3
Training results of the LSP-SuAE.

Phase | RMSE | MAPE | R² | ρSCC
Training | 0.0579 | 0.51 % | 0.9584 | 0.9752
Validation | 0.0658 | 0.53 % | 0.9532 | 0.9742
Testing | 0.0665 | 0.53 % | 0.9527 | 0.9738

Fig. 7. Variation in the RMSE of the LSP-SuAE under different c and K values.
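The performance indicators reported in Table 3, namely the RMSE (Eq. (13)), MAPE, R², and Spearman correlation ρSCC, can be computed as in the following sketch; the variable names are illustrative, and the Spearman implementation assumes no tied values.

```python
import numpy as np

def rmse(y, y_hat):
    """Root-mean-square error, Eq. (13)."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    """Mean absolute percentage error (in percent)."""
    return np.mean(np.abs((y - y_hat) / y)) * 100.0

def r2(y, y_hat):
    """Coefficient of determination."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def spearman(y, y_hat):
    """Spearman rank correlation via the classical rank-difference formula
    (valid when there are no ties)."""
    ry = np.argsort(np.argsort(y)).astype(float)
    rh = np.argsort(np.argsort(y_hat)).astype(float)
    n = len(y)
    d2 = np.sum((ry - rh) ** 2)
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))
```

For data with many repeated conversion-rate readings, a tie-aware implementation (e.g., one based on averaged ranks) would be preferable.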
4.2.2. Datasets
The experimental data were collected from November to December
2023. The temperature and pressure variables among the explanatory
variables were obtained from hardware sensors, while the other vari-
ables were estimated on-site. The target variable, the ethylbenzene
conversion rate, was determined via manual analysis in the laboratory.
After implementing preprocessing steps such as outlier removal,
normalization, and alignment on the data labels, datasets were selected
for model training. The training set consisted of 2050 samples, including
1750 unlabeled samples and 300 labeled samples. Both the validation
and test datasets contained 100 labeled samples each. The computer
configuration used in the experiment was as follows: Windows 10 (64-bit), an Intel Core i7-8550U (1.80 GHz) CPU, and 8.0 GB of RAM.
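The dataset preparation described above can be sketched as follows. The split sizes follow the text (1750 unlabeled and 300 labeled training samples, 100 validation, 100 test); the 3σ outlier rule is an assumption, since the paper does not state which outlier-removal criterion was used, and label alignment is omitted for brevity.

```python
import numpy as np

def remove_outliers(X, n_sigma=3.0):
    """Drop rows with any feature more than n_sigma standard deviations from
    its mean (an assumed rule; the paper does not specify its criterion)."""
    z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
    return X[(z < n_sigma).all(axis=1)]

def normalize(X):
    """Z-score normalization per variable."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def split(X, n_unlabeled=1750, n_labeled=300, n_val=100, n_test=100):
    """Sequential split into the subsets used in the experiments."""
    i1 = n_unlabeled
    i2 = i1 + n_labeled
    i3 = i2 + n_val
    return X[:i1], X[i1:i2], X[i2:i3], X[i3:i3 + n_test]

# illustrative run on synthetic data standing in for the process measurements
X = np.random.RandomState(1).randn(2400, 8)
Xn = normalize(remove_outliers(X))
unlab, lab, val, test = split(Xn)
```

A sequential split is shown because the data are a time series; shuffling before splitting would leak temporally adjacent samples between the training and test sets.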
RMSE = √( (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)² )   (13)

The most influential parameters included the labeled sample ratio (LSR) in layerwise pretraining, the neighborhood parameter K, the neighboring sample weight adjustment parameter c, and the tradeoff parameter of the local spatiotemporal structure-preserving regularization term, λ.

Since the LSR and λ directly determine the participation levels of local structure preservation and layerwise supervised adjustment during model learning, we first discuss their impacts on the performance of the LSP-SuAE. LSR = 1/n indicates that the labeled samples used for supervised tuning during layerwise pretraining accounted for 1/n of all training samples. In the experiments, the value of λ was searched in the range from 2e−1 to 2e+3, and the LSR was searched in the range from 1/32 to 1. Fig. 6 shows the RMSE variation induced under different LSR and λ values. When the LSR was too small, the effect of supervised tuning during pretraining was not significant, and when it was too large, it was prone to overfitting. Moreover, a large λ led to a large error, and a moderate value resulted in good prediction accuracy.

The parameters c and K affect the learning results of the model by impacting the size and weight of the neighborhood structure. Fig. 7(a) shows the prediction errors induced under different K values while c was kept constant. The value of K was searched within the range from 1 to 16. An appropriate K value helps improve the prediction accuracy. However, if K is too large, the neighborhood becomes too wide, making the algorithm ineffective. When K is too small, the data neighborhood is too small to effectively extract the local geometric features. Fig. 7(b) illustrates the RMSEs generated by different values of c under a fixed K value. The c value was searched in the range from 10 to 100. To demonstrate the effect of local neighborhood structure preservation on the performance of the original SAE, the prediction error induced by the SAE was plotted as the baseline in Fig. 7. The RMSE values fluctuated significantly under different c and K values. In most cases, however, the RMSE of the LSP-SuAE was below that of the baseline.

In addition to the above parameters, the learning rate was determined by a grid search conducted in the range from 0.001 to 10. The MBGD algorithm was used to optimize the parameters. The trial-and-error strategy was used to select the minibatch size and the number of epochs. The experimental results provided references for the
Fig. 11. Prediction curves produced by the six methods on the test dataset.
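The hyperparameter selection described above, searching λ, the LSR, c, and K and keeping the combination with the lowest validation RMSE, amounts to a plain grid search, which can be sketched as follows. The `train_and_eval` callable is a placeholder for training an LSP-SuAE and returning its validation RMSE; the toy stand-in objective below is purely illustrative and has its minimum at the optima reported in Table 2.

```python
import itertools
import math

def grid_search(train_and_eval, lambdas, lsrs, cs, ks):
    """Return the (lambda, LSR, c, K) combination with the lowest validation
    RMSE, as produced by the supplied train_and_eval callable."""
    best, best_rmse = None, math.inf
    for lam, lsr, c, k in itertools.product(lambdas, lsrs, cs, ks):
        r = train_and_eval(lam=lam, lsr=lsr, c=c, k=k)
        if r < best_rmse:
            best, best_rmse = (lam, lsr, c, k), r
    return best, best_rmse

# toy stand-in objective (not a trained model) with its optimum at the
# hyperparameter values reported in Table 2
fake = lambda lam, lsr, c, k: abs(lam - 200) + abs(lsr - 0.125) + abs(c - 30) + abs(k - 7)
best, r = grid_search(fake, [0.2, 2, 20, 200, 2000],
                      [1/32, 1/16, 1/8, 1/4, 1/2, 1],
                      [10, 30, 100], range(1, 17))
```

In practice each evaluation retrains the network, so the grid is usually searched coordinate-wise (as Figs. 6 and 7 suggest the authors did) rather than over the full Cartesian product.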
Table 5
Experimental results of the ablation experiments.

Method | Performance indicators (Training) | Performance indicators (Testing)
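The two ablation settings compared in Table 5 (LSP-SuAE/-l, which removes the learned spatial weights and falls back to uniform neighbor weighting, and LSP-SuAE/-s, which removes the layerwise supervised term) can be expressed as switches on an assembled loss. This is a schematic sketch using the Eq. (9) notation, not the authors' code; `gamma` and the precomputed inputs are assumptions for illustration.

```python
import numpy as np

def lsp_suae_loss(recon_err, local_terms, weights, sup_err,
                  lam=200.0, keep_spatial_weights=True, keep_supervised=True):
    """Assemble the training loss from precomputed pieces.
    recon_err   : global reconstruction error (the Eq. (3) term)
    local_terms : per-pair squared distances ||x_hat_i - x_hat_ik||^2
    weights     : matching heat-kernel weights W_ik (Eq. (8))
    sup_err     : layerwise supervised prediction error
    LSP-SuAE/-l corresponds to keep_spatial_weights=False (uniform weights);
    LSP-SuAE/-s corresponds to keep_supervised=False."""
    w = weights if keep_spatial_weights else np.ones_like(weights)
    loss = recon_err + lam * float(np.sum(w * local_terms))
    if keep_supervised:
        loss += sup_err
    return loss
```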
Therefore, here, we only removed the part related to spatially correlated weight learning while preserving the selection of temporally adjacent samples and their corresponding regularization terms.

Second, we retained the spatiotemporal neighborhood structure preservation mechanism in the model training process while removing the layerwise supervised learning part. Following these two settings, we removed the corresponding terms from the loss function (10), conducted model training and testing separately, and compared the results with those of the LSP-SuAE.

Table 5 shows the final error data induced by model training, where LSP-SuAE/-l and LSP-SuAE/-s represent the first and second experimental settings, respectively. After conducting both ablation operations, the sensing capability of the LSP-SuAE decreased to varying degrees. Comparatively, removing the spatial local feature learning process led to a more severe performance decline. Thus, it can be concluded that for the LSP-SuAE, unsupervised pretraining with spatiotemporal adjacent sample reconstruction and supervised learning complement each other when integrated, jointly optimizing the training results and sensing performance of the model.

In summary, we utilized a soft sensor based on the LSP-SuAE to predict the ethylbenzene conversion rate and conducted different experiments to comprehensively evaluate the resulting model performance. First, the optimal hyperparameters of the model were determined through experiments. Subsequently, the sensing performance of the LSP-SuAE was validated from various perspectives, including its quality prediction accuracy, predictive tracking capability, prediction stability, and computational burden. Moreover, the necessity of the core modules in the algorithm was verified through ablation experiments. Overall, the results of all the experiments demonstrate the feasibility and effectiveness of the LSP-SuAE in modeling the soft sensors encountered in typical industrial processes. However, some challenges remain during the application phase. For instance, in the industrial dehydrogenation process for the preparation of styrene from ethylbenzene, on-site control requires the prediction error of the ethylbenzene conversion rate to be within the range of [0, 0.1]. While the proposed method largely meets these requirements, a few instances in which the prediction error exceeded the specified range were observed. Therefore, further attention is needed to optimize the method and improve its performance. Additionally, the LSP-SuAE faces the challenge of accuracy degradation caused by the dynamic changes occurring during the actual deployment process, which can result in higher maintenance costs.

5. Conclusions

enhance the transparency of the model, which could help soft sensors gain widespread public trust and attain improved reliability during the application deployment phase.

CRediT authorship contribution statement

Yong Zhang: Visualization, Validation, Investigation, Data curation. Xiaomei Qi: Writing – original draft, Validation, Resources, Investigation, Formal analysis. Xiao Wang: Writing – review & editing, Writing – original draft, Validation, Software, Methodology, Funding acquisition, Data curation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The authors do not have permission to share data.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC) (92270117) and the Natural Science Foundation of Shandong Province (ZR2022MF308).

Appendix A. Supporting information

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.asoc.2024.111974.

References

[1] H.K. Mohanta, A.K. Pani, Adaptive non-linear soft sensor for quality monitoring in refineries using just-in-time learning-generalized regression neural network approach, Appl. Soft Comput. 119 (2022) 108546, https://doi.org/10.1016/j.asoc.2022.108546.
[2] Y.S. Perera, D.A.A.C. Ratnaweera, C.H. Dasanayaka, C. Abeykoon, The role of artificial intelligence-driven soft sensors in advanced sustainable process industries: a critical review, Eng. Appl. Artif. Intell. 121 (2023) 105988, https://doi.org/10.1016/j.engappai.2023.105988.
[3] M. Jia, D. Xu, T. Yang, Y. Liu, Y. Yao, Graph convolutional network soft sensor for process quality prediction, J. Process Control 123 (2023) 12–25, https://doi.org/10.1016/j.jprocont.2023.01.010.
[4] C. Liu, K. Wang, Y. Wang, X. Yuan, Learning deep multimanifold structure feature
In this paper, we proposed a deep learning strategy, the LSP-SuAE, representation for quality prediction with an industrial application, IEEE Trans.
for industrial process quality prediction. Different from the unsuper- Ind. Inform. 18 (9) (2021) 5849–5858, https://doi.org/10.1109/
vised pretraining process of the traditional SAE, the LSP-SuAE utilizes a TII.2021.3130411.
[5] Q. Cheng, Z. Chunhong, L. Qianglin, Development and application of random forest
local structure-preserving semisupervised pretraining approach. By regression soft sensor model for treating domestic wastewater in a sequencing
reconstructing a local spatiotemporal structure of the input data, the batch reactor, Sci. Rep. 13 (1) (2023) 9149. 〈https://www.nature.
smoothness of the local feature space is maintained to learn deep fea- com/articles/s41598-023-36333-8#Sec16〉.
[6] Ž. Stržinar, B. Pregelj, I. Škrjanc, Soft sensor for non-invasive detection of process
tures that are more consistent with the manifold of the input data. events based on Eigenresponse Fuzzy Clustering, Appl. Soft Comput. 132 (2023)
Furthermore, a certain amount of labeled data is used to carry out 109859, https://doi.org/10.1016/j.asoc.2022.109859.
layerwise supervised tuning to improve the quality correlations among [7] C. Yang, C. Yang, J. Li, Y. Li, F. Yan, Forecasting of iron ore sintering quality index:
a latent variable method with deep inner structure, Comput. Ind. 141 (2022)
the hierarchical data features. The experimental results obtained in 103713, https://doi.org/10.1016/j.compind.2022.103713.
actual industrial processes indicate that, compared to five other popular [8] L. Feng, C. Zhao, Y. Sun, Dual attention-based encoder-decoder: a customized
methods, the LSP-SuAE could extract data features that more accurately sequence-to-sequence learning for soft sensor development, IEEE Trans. Neural
Netw. Learn. Syst. 32 (8) (2021) 3306–3317, https://doi.org/10.1109/
represented the original data, and it exhibit a better ability to fit the TNNLS.2020.3015929.
target variables. This approach significantly improves the sensing ca- [9] W. Anupong, K. Surendra, A. Monhammed, S. Ulaganathan, J. Mukta, K. Ravi,
pabilities and stability of soft sensors while adding only a minimal Artificial intelligence - enabled soft sensor and internet of things for sustainable
agriculture using ensemble deep learning architecture, Comput. Electr. Eng. 102
computational burden. Finally, in future research, we will apply the LSP-
(2022) 108128, https://doi.org/10.1016/j.compeleceng.2022.108128.
SuAE to more soft sensor modeling application scenarios to further [10] Q. Jiang, X. Yan, H. Yi, F. Gao, Data-driven batch-end quality modeling and
optimize its generalizability and scalability based on the diverse types monitoring based on optimized sparse partial least squares, IEEE Trans. Ind.
and scales of process data. In addition, considering the characteristics of Electron. 67 (5) (2020) 4098–4107, https://doi.org/10.1109/TIE.2019.2922941.
[11] L. Ma, M. Wang, K. Peng, A missing manufacturing process data imputation
industrial applications and regulatory requirements, it may be beneficial framework for nonlinear dynamic soft sensor modeling and its application, Expert
to incorporate explanatory strategies during the modeling phase to Syst. Appl. 237 (2024) 121428, https://doi.org/10.1016/j.eswa.2023.121428.
X. Wang et al., Applied Soft Computing 164 (2024) 111974