Data-Driven Modeling of Multimode Chemical Process - Validation With A Real-World Distillation Column
Keywords: LSTM; Clustering; Feature selection; Distillation; Multimode process

Abstract: Real-world industrial processes frequently operate in different modes such as start-up, transient, and steady-state operation. Since different operating modes are governed by different process dynamics, deriving a single data-driven model representing the entire operation of such multimode processes is not a viable option. A reasonable solution to this problem is to develop a separate model for each operating mode, which requires the extraction of data for each operating mode from raw data. Based on this viewpoint, this work develops a data-driven modeling approach using clustering and feature selection techniques to improve the quality of raw data and develop a predictive model for a multimode industrial process. In particular, the developed method focuses on training a steady-state predictive model, as monitoring steady-state conditions is crucial for achieving the desired product quality. First, K-means clustering is performed to extract data describing the steady-state operating mode from the available raw data. Next, feature selection is applied to the clustered data using Pearson's correlation coefficient to identify input features relevant to target features. Finally, an LSTM model is trained using the clustered data and identified features to predict the steady-state operation. The validity and effectiveness of the developed method are demonstrated using a real-world 2,3-Butanediol distillation process dataset.
∗ Corresponding author at: Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX 77845, USA.
∗∗ Corresponding author at: Green Materials and Processes R&D Group, Korea Institute of Industrial Technology, 55 Jongga-ro, Ulsan, 44413, South Korea.
E-mail addresses: [email protected] (J.S.-I. Kwon), [email protected] (J. Kim).
1 The authors have contributed equally.
https://doi.org/10.1016/j.cej.2022.141025
Received 26 August 2022; Received in revised form 12 December 2022; Accepted 16 December 2022; Available online 24 December 2022.
1385-8947/© 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Y. Choi et al. Chemical Engineering Journal 457 (2023) 141025
developed predictive model. From Dataset 1 and the training data from Dataset 2, we extract data corresponding to steady-state operation using K-means clustering.

K-means clustering is a simple data clustering technique that partitions data into $K$ different clusters based on their similarity. The algorithm follows two steps: first, $K$ centroid values are randomly selected for the $K$ cluster groups; next, each data sample is assigned to a cluster by examining its distance from the cluster centroids. Generally, the Euclidean distance is used to evaluate the distance between a data sample and a cluster centroid. After performing this initial grouping, we re-calculate the centroid values based on the clustered data obtained in the previous step. This procedure of calculating centroid values and grouping data samples into clusters is iteratively executed until the centroid values converge and the variation within each cluster, given by the following objective function, is minimized:

$$J = \sum_{i=1}^{K} \sum_{j=1}^{N} \left\| x_j^i - c_i \right\|^2 \tag{1}$$

where $x_j^i$ and $c_i$ are the $j$th data sample and the centroid of cluster $i$, respectively. The above objective function is expressed in terms of the Euclidean distance between the $j$th data sample and the centroid of cluster $i$.

Since it is difficult to exactly identify the number of operating modes represented by the multimode data, an optimal $K$ value must be chosen. To this end, we utilize the Silhouette score method for deciding the optimum number of clusters [26]. The Silhouette score indicates how similar a data sample is to its own cluster in comparison with the other clusters and is calculated as follows:

$$S_j = \frac{b_j - a_j}{\max(a_j, b_j)} \tag{2a}$$
$$a_j = \mathrm{distance}(x_j^i, x^i) \tag{2b}$$
$$b_j = \min_{m \neq i} \mathrm{distance}(x_j^i, x^m) \tag{2c}$$

where $a_j$ is the average distance between a data sample $x_j^i$ and the other data samples within its own cluster $i$, and $b_j$ is the minimum of the average distances between $x_j^i$ and the data samples in the other clusters $x^m$ ($m \neq i$). The Silhouette score ranges between $-1$ and $1$. A high Silhouette score represents a high similarity within each cluster, indicating that the data samples are correctly grouped, whereas a low Silhouette score means that the data are not correctly clustered. Accordingly, we evaluate Silhouette scores for a range of $K$ values, and the $K$ value corresponding to the highest Silhouette score is selected as the optimal number of clusters. Using this optimal $K$, the above-described K-means
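The $K$-selection procedure described above (K-means minimizing Eq. (1), scored by the Silhouette coefficient of Eq. (2a)) can be sketched with scikit-learn. This is an illustrative example on synthetic data, not the authors' code; the three Gaussian "modes" and the candidate range of $K$ are placeholder assumptions.

```python
# Illustrative sketch: choose K for K-means via the Silhouette score.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Hypothetical data: three operating "modes" as well-separated 2-D blobs.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 2))
               for c in ([0, 0], [3, 3], [6, 0])])

scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # mean S_j over all samples

best_k = max(scores, key=scores.get)  # K with the highest Silhouette score
```

For the synthetic blobs above, the score peaks at the true number of modes; on real multimode process data, the peak suggests how many operating regimes the raw dataset contains.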
Fig. 3. Example of grouping data into two clusters using K-means clustering.
distribution of clustered data. If a data sample $x_j$ is far from either the 75th or the 25th percentile of the data distribution, it is identified as an outlier [28]. The IQR method is summarized as follows:

$$x_j:\; \begin{cases} \text{Outlier}, & \text{if } x_j < Q_1 - \beta \times \mathrm{IQR} \\ \text{Outlier}, & \text{if } x_j > Q_3 + \beta \times \mathrm{IQR} \\ \text{Normal}, & \text{otherwise} \end{cases}$$

where $\mathrm{IQR} = Q_3 - Q_1$ and $\beta$ is a parameter controlling the dependence of the outlier detection on the IQR value. Typically, $\beta$ is taken as 1.5. Both the SS and IQR methods follow a similar outlier identification procedure: first finding the distribution of the data, and then identifying an outlier based on a cut-off value. In the case study presented in Section 3, we implement these techniques for outlier removal.

After dealing with noise and outliers, feature selection is implemented. It is beneficial to utilize informative data for training a predictive model so as to best capture the underlying process dynamics. Utilizing a large number of input features for model training and real-time prediction can lead to poor generalization and be computationally expensive. Thus, PCC analysis is performed to identify key input features that are significant to the process and use them for model training. PCC measures the strength and direction of the relationship between two linearly related variables. The correlation between an input and target
Table 1
Input and target features for the 2,3-BDO distillation process (Fig. 4).

Feature symbol   Description
Y                Bottom product temperature
X01              Control valve for feedstock flowrate
X02              Control valve for reboiler steam flowrate
X03              Control valve for reflux flowrate
X04              Control valve for bottom liquid level
X05              Control valve for top pressure
X06              Bottom liquid level
X07              Feedstock flowrate
X08              Reboiler steam flowrate
X09              Reflux flowrate
X10              Bottom product flowrate
X11              Top pressure
X12              Reboiler outlet pressure
X13              Reflux tank pressure
X14              Top temperature
X15              Rectifying section temperature
X16              Feedstock temperature
X17              Stripping section temperature
X18              Reboiler outlet temperature
X19              Reflux tank temperature

Fig. 6. Silhouette scores for multimode data with a varying number of clusters.
feature based on $s$ input–target data samples is calculated as:

$$r_{pq} = \frac{\sum_{i=1}^{s} (p_i - \bar{p})(q_i - \bar{q})}{\sqrt{\sum_{i=1}^{s} (p_i - \bar{p})^2} \, \sqrt{\sum_{i=1}^{s} (q_i - \bar{q})^2}} \tag{5}$$

where $p$ is an input feature with an average value of $\bar{p}$, and $q$ is a target feature with an average value of $\bar{q}$. The correlation value ranges between $-1$ and $1$. The correlation between $p$ and $q$ is considered weak if the $r_{pq}$ value is less than 0.3, and strong if it is greater than 0.6. In this manner, we execute PCC and select significant input features that are relevant to the target of interest. Then, these relevant input features are used in training an LSTM model.

Remark 1. Note that it is important to first cluster the data before performing feature selection. Feature selection is not effective when implemented on raw data (before clustering). Different operating modes described in raw data are represented by different features. Because of this, finding features relevant to steady-state operation from the high-dimensional feature space of raw data is a challenging task. On the other hand, clustering partitions raw data into groups of different operating modes and reduces the dimensionality of the feature space. As a result, it becomes easier to identify the right input features from clustered data.

2.2.3. LSTM model training

Unlike a traditional neural network, an LSTM is capable of processing time-series data by accounting for both long-term and short-term input–output dependencies. Its network contains recurrent units that retain information from the previous time step and use it for prediction at the current time step. Typically, an LSTM model is a multi-layered network with input, output, and hidden layers. The input layer contains the data related to the input variables and is followed by the hidden layers. Each hidden layer contains an LSTM memory cell wherein the time-series dependencies are processed. The final output layer is a fully-connected layer that provides the target variables for the given input variables.

In this work, we train an LSTM model using the training data obtained after clustering and feature selection. Specifically, the input features identified by PCC are considered as input data for model training. Further, we evaluate the performance of the developed LSTM model based on the coefficient of determination (R²), given by the following equation:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \tag{6}$$

where $N$ is the number of data samples, $y_i$ is the actual value, $\hat{y}_i$ is the model-predicted value, and $\bar{y}$ is the mean of the actual data samples. A large R² value indicates a well-trained model, which can be further used for prediction. A detailed discussion on the LSTM model development, including the network architecture and training process, is presented in Section 3.4, where we train an LSTM for the real-world distillation process.

3. Case study: 2,3-Butanediol (BDO) distillation column

This section presents the application of the developed method to a real-world 2,3-BDO distillation column. In the following subsections, we describe the process and discuss the results obtained from clustering, feature selection, and LSTM model training and testing.

3.1. Process description

Bio-based 2,3-BDO is a highly valuable compound produced by microbial fermentation using low-cost renewable carbon sources, such as sucrose, lactose, and glucose, as raw materials. It is an eco-friendly material with minimal greenhouse gas emissions and is hence extensively used in many industrial applications, including pharmaceuticals, cosmetics, food additives, and fuels [29–31]. The raw 2,3-BDO product from microbial fermentation contains many impurities, which are separated in a distillation column to obtain a high-purity 2,3-BDO compound. In this work, we applied the developed method to a historical process dataset obtained from a 2,3-BDO distillation column of GS Caltex in South Korea. A schematic of this distillation process is presented in Fig. 4. The distillation column separates acetoin, water, and other contaminants from a 50 wt% 2,3-BDO feedstock. The pressure at the top of the column is controlled between −0.9 kg/cm²·g and −0.8 kg/cm²·g. At this pressure, the temperature at the bottom of the column (Y in Fig. 4) is targeted to be maintained at a steady-state value of 120 °C to produce 99% pure 2,3-BDO as the bottom product. The bottom product purity can be determined from the pressure and temperature dynamics. Therefore, predicting the temperature at the column bottom is beneficial for optimizing the distillation operation. Accordingly, the goal of this case study is to build a predictive steady-state data-driven model to forecast the trend of the bottom product temperature (Y, the target feature). The other features highlighted in Fig. 4 are the input features. Table 1 lists all the input and target features for this process.
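The PCC screening of Eq. (5), with the 0.3 weak-correlation cut-off, can be sketched as follows. The data and feature names here are synthetic placeholders standing in for the plant measurements, not the actual dataset.

```python
# Sketch of PCC-based feature screening (Eq. (5)): keep only input
# features whose absolute correlation with the target exceeds 0.3.
import numpy as np

def pearson_r(p, q):
    """Pearson correlation coefficient between two 1-D arrays."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    pc, qc = p - p.mean(), q - q.mean()
    return (pc @ qc) / np.sqrt((pc @ pc) * (qc @ qc))

rng = np.random.default_rng(1)
n = 500
target = rng.normal(size=n)
# Hypothetical features: strongly related, moderately related, unrelated.
features = {
    "X18": 0.9 * target + rng.normal(scale=0.2, size=n),
    "X07": 0.5 * target + rng.normal(scale=1.0, size=n),
    "X05": rng.normal(size=n),
}
selected = [name for name, col in features.items()
            if abs(pearson_r(col, target)) > 0.3]
```

With these placeholder signals, the unrelated feature falls below the 0.3 threshold and is dropped, mirroring the white boxes in Fig. 8.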
Fig. 8. Pearson correlation coefficients between input features and the target bottom product temperature for 46 feature selection cases from Table 2.
3.2. K-means clustering

A raw dataset containing 5,760 data samples, recorded at a 1-min sampling interval from start-up to steady-state operation of the column, is used to develop a predictive model for the bottom product temperature; its profile is shown in Fig. 5. The temperature profile shows that the distillation column operates in different modes (start-up, transient (unsteady), and steady-state operation).

Following the procedure described in Section 2.1, we first divided the raw dataset into multimode data (Dataset 1) and steady-state operation data (Dataset 2). We used half of Dataset 2 for training purposes, and the other half is used for testing the developed model. Next, K-means clustering is applied to Dataset 1, with the optimum number of clusters selected through the Silhouette scoring technique. The Silhouette scores for cluster numbers between 2 and 10 are presented in Fig. 6. The figure shows that the highest Silhouette score is obtained for $K = 5$, indicating that dividing Dataset 1 into 5 clusters is optimum. Accordingly, Dataset 1 is clustered into 5 different groups representing multimode characteristics, highlighted by different colors in Fig. 7. The clustering model is further applied to Dataset 2 to extract data with steady-state operation characteristics common to Datasets 1 and 2. Here, Dataset 2, which contains steady-state data, helps in providing information about the steady-state characteristics. Using this information, more steady-state data with identical characteristics are automatically extracted from Dataset 1. It can be observed that the data represented by the purple and green colors in Dataset 1 have characteristics similar to the steady-state operation data in Dataset 2. Also, we further evaluated the clustering model by validating its performance using test data collected from
steady-state operation. The validation result shows that the test data were grouped into Clusters 4 and 5, indicating that the clustering model successfully identified the steady-state characteristics of the test data. Thus, we extracted the identified common steady-state operation data from Dataset 1 and Dataset 2 as the clustered data, which is next preprocessed before being used for model training.

Please note that it is challenging to cluster high-dimensional data because of the difficulty in assessing the distance between data objects in a high-dimensional space [32]. In such scenarios, it is often recommended to reduce the dimensionality before performing clustering. This can be done by employing dimensionality reduction techniques such as PCA as a preprocessing step, which performs a linear transformation of the high-dimensional data to low-dimensional data, to which the clustering algorithm can then be applied. In this work, dimensionality reduction is not implemented prior to K-means clustering, as the considered distillation column case study has only 18 input features. However, when dealing with high-dimensional data, it is suggested to implement PCA as an additional preprocessing step in the developed method to improve clustering.

Table 2
Feature selection cases based on different preprocessing methods for clustered data.

Case index   Preprocessing method   Tuning parameter
Case 1       None                   –
Case 2–4     Only LPF               α = 0.3, 0.6, 0.9 (LPF)
Case 5–7     Only IQR               β = 1.0, 1.5, 2.0 (IQR)
Case 8–16    LPF and IQR            α = 0.3, 0.6, 0.9 (LPF); β = 1.0, 1.5, 2.0 (IQR)
Case 17–25   IQR and LPF            β = 1.0, 1.5, 2.0 (IQR); α = 0.3, 0.6, 0.9 (LPF)
Case 26–28   SS                     γ = 2.5, 3.0, 3.5 (SS)
Case 29–37   LPF and SS             α = 0.3, 0.6, 0.9 (LPF); γ = 2.5, 3.0, 3.5 (SS)
Case 38–46   SS and LPF             γ = 2.5, 3.0, 3.5 (SS); α = 0.3, 0.6, 0.9 (LPF)

3.3. Feature selection

In this section, we improve the quality of the clustered data contaminated with noise and outliers using the LPF, SS, and IQR techniques. The LPF is used for noise removal, and the SS and IQR techniques are implemented to identify and remove outliers. After dealing with noise and outliers, the PCC values between the target and input features are calculated to select significant input features for LSTM model training. Since different preprocessing methods have a different impact on data quality, we calculated the PCC values for 46 feature selection cases to select the best feature selection approach for the given data. These cases are prepared by applying different combinations of the LPF, SS, and IQR methods prior to feature selection, as presented in Table 2. Also, the effect of different tuning parameter values for each preprocessing method is analyzed in these cases. Case 1 in Table 2 represents the scenario in which feature selection is implemented on the clustered data with no preprocessing. Cases 2, 3, and 4 describe the cases in which feature selection is applied to clustered data preprocessed using the LPF with an α value of 0.3, 0.6, and 0.9, respectively. Similarly, the remaining cases represent different feature selection scenarios with the clustered data being cleaned using various preprocessing techniques. In practice, α in the LPF ranges between 0 and 1, γ in SS is typically taken as 3, and β in IQR is taken as 1.5. Based on this notion, tuning parameter values of α = [0.3, 0.6, 0.9], β = [1.0, 1.5, 2.0], and γ = [2.5, 3.0, 3.5] are considered in the case study.

A summary of the PCC values estimated for all 46 cases is presented in Fig. 8. In the figure, white, green, and dark green boxes indicate PCC values between the target and input features, $r_{pq}$, that are less than 0.3, greater than 0.3, and greater than 0.6, respectively. Since the objective is to find relevant input features, the features with white boxes, representing weak correlation, are not selected for model training. The figure suggests that the PCC values are significantly affected by the type of preprocessing method but not by its tuning parameters. To better analyze the result presented in this figure, we reduced the 46 cases to 4 cases, as described in Table 3. The second column in the table presents the significant input features selected by PCC for each case. The base case in the table corresponds to applying feature selection to clustered data with no preprocessing. The LPF case represents the scenario in which feature selection is performed on clustered data with LPF preprocessing. The LPF + IQR case corresponds to noise removal using the LPF and IQR-based outlier detection being implemented on
Fig. 10. Comparing (a) bottom product temperature prediction and (b) absolute prediction errors for the developed LSTM model and the LSTM with raw data (without feature selection and preprocessing).
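The preprocessing combinations of Table 2 can be sketched as below. The paper states only that $0 < \alpha < 1$ for the LPF, so the filter here is assumed to take the common first-order (exponential) form $y_t = \alpha x_t + (1-\alpha)\, y_{t-1}$; the SS method flags samples more than $\gamma$ standard deviations from the mean, and the IQR rule is as given in Section 2.2.2. The data are synthetic placeholders.

```python
# Hedged sketch of the Table 2 preprocessing steps (illustrative only).
import numpy as np

def low_pass(x, alpha):
    """Assumed first-order LPF: y[t] = alpha*x[t] + (1 - alpha)*y[t-1]."""
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]
    for t in range(1, len(x)):
        y[t] = alpha * x[t] + (1 - alpha) * y[t - 1]
    return y

def ss_mask(x, gamma=3.0):
    """Standard-score rule: True where |z| exceeds gamma."""
    z = (x - x.mean()) / x.std()
    return np.abs(z) > gamma

def iqr_mask(x, beta=1.5):
    """IQR rule: True outside [Q1 - beta*IQR, Q3 + beta*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    lo, hi = q1 - beta * (q3 - q1), q3 + beta * (q3 - q1)
    return (x < lo) | (x > hi)

# Example combination "LPF and IQR" (Cases 8-16): smooth, then drop outliers.
rng = np.random.default_rng(2)
raw = 120.0 + rng.normal(scale=0.5, size=300)  # mock temperature trace
raw[100] = 150.0                               # injected spike
smoothed = low_pass(raw, alpha=0.6)
cleaned = smoothed[~iqr_mask(smoothed, beta=1.5)]
```

Swapping the order of the two steps (IQR then LPF, Cases 17-25) or substituting `ss_mask` for `iqr_mask` reproduces the remaining combinations in the table.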
Table 3
Relevant input features identified based on average PCC values from Fig. 8.

Case        Features identified                                           Number of relevant features
Base        X01, X02, X06, X07, X08, X11, X12, X13, X16, X17, X18         11
LPF         X01, X02, X03, X06, X07, X08, X11, X12, X13, X16, X17, X18    12
LPF + IQR   X02, X03, X08, X09, X11, X12, X14, X16, X17, X18              9
LPF + SS    X02, X03, X08, X09, X12, X14, X16, X17, X18                   10

Table 4
Hyperparameters and training options for LSTM model development.

List                              Training options/hyperparameters
Algorithm                         LSTM
Hidden layers and nodes           3 layers (10-5-10)
Dropout                           0.1
Activation function               ReLU
Weight initialization             He initialization
Optimizer                         Adam
Learning rate                     0.001
Batch number                      512
Early stopping option (trigger)   Patience (10)
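The hidden layers listed in Table 4 are built from the LSTM memory cell described in Section 2.2.3. A minimal NumPy forward pass of one such cell is sketched below; the weights are random placeholders (a real model would learn them with Adam at the learning rate in Table 4), and the layer width of 10 mirrors the first hidden layer.

```python
# Illustrative forward pass of a single LSTM memory cell (not a trained model).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x, h_prev, c_prev, W, U, b):
    """One time step. W, U, b stack the input, forget, cell, and output gates."""
    n = h_prev.size
    z = W @ x + U @ h_prev + b      # stacked pre-activations, shape (4n,)
    i = sigmoid(z[:n])              # input gate
    f = sigmoid(z[n:2 * n])         # forget gate
    g = np.tanh(z[2 * n:3 * n])     # candidate cell update
    o = sigmoid(z[3 * n:])          # output gate
    c = f * c_prev + i * g          # long-term memory
    h = o * np.tanh(c)              # short-term output
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 9, 10                 # e.g., 9 selected features, 10 nodes
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

h = c = np.zeros(n_hid)
for x_t in rng.normal(size=(10, n_in)):   # a 10-step input window
    h, c = lstm_cell_step(x_t, h, c, W, U, b)
```

The forget and input gates are what let the cell retain long-term dependencies while updating with short-term information, as described in Section 2.2.3.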
Fig. 11. Comparing (a) bottom product temperature prediction and (b) absolute prediction errors for the developed LSTM model and the linear model (with reboiler outlet as input).
Note that both feature selection and preprocessing are applied only to the steady-state data obtained after clustering. Using this data, we train an LSTM to predict the bottom product temperature. The resulting LSTM's prediction performance is compared with that of an LSTM trained using the entire dataset (i.e., without clustering) and all 18 input features, with no feature selection or preprocessing. The comparison result is shown in Fig. 10. Here, to predict an output at $t_k$, 10 data samples collected from $t_{k-9}, \ldots, t_k$ are given as input to the LSTM model. Fig. 10 shows that the model with clustered data successfully predicted the actual temperature profile, unlike the model trained using raw data (without any preprocessing and feature selection).

From Fig. 8, it can be seen that the reboiler outlet temperature (input feature X18) is highly correlated with the bottom temperature. Because of this high correlation, one could claim that a simple linear model with the reboiler outlet as input is sufficient to accurately predict the bottom temperature. To test this, we compared the performance of the developed LSTM model with that of a linear model in predicting the bottom temperature for a test dataset. The linear model is trained with the same data utilized in LSTM training. The comparison results are shown in Fig. 11. The results show that the developed method predicts better than the linear model. It is important to note that the real-world data we have are highly nonlinear, with random and chaotic disturbances. Hence, a simple linear model is not suitable for capturing the nonlinearities present in the data. Further, the prediction accuracies of the developed model with clustered data, the model with raw data, and the linear model (with X18 as input) are quantified using R², as presented in Table 5. It can be observed that, compared to the raw data-based model that considered all the data and input features for prediction, the clustered data-based model predicted well with fewer input features, thereby highlighting the importance of feature selection for predictive modeling. These results indicate that the developed method provides a good direction for employing feature selection and clustering techniques for modeling and predicting multimode industrial processes using real-world datasets.

Remark 2. It can be observed that, for the presented case study, the performance of the linear model is only slightly less accurate than that of the developed model. In general, a linear model only fits the linear relationship between the input and output variables. However, many input and output variables in chemical processes have complex nonlinear interactions, which cannot be captured using a linear model. Therefore, in view of developing a generalized modeling approach, we considered building a nonlinear model.
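The sliding-window input construction described above (10 samples $t_{k-9},\ldots,t_k$ per prediction) and the R² metric of Eq. (6) can be sketched as follows; the series here is a synthetic stand-in for the temperature data.

```python
# Sketch: build 10-step input windows and compute R^2 as in Eq. (6).
import numpy as np

def make_windows(X, y, width=10):
    """Stack `width` consecutive input rows for each target y[k]."""
    inputs = np.stack([X[k - width + 1:k + 1]
                       for k in range(width - 1, len(X))])
    targets = y[width - 1:]
    return inputs, targets

def r_squared(y_true, y_pred):
    """Coefficient of determination, Eq. (6)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

X = np.arange(200 * 3, dtype=float).reshape(200, 3)  # 200 samples, 3 features
y = np.arange(200, dtype=float)                      # mock target series
inputs, targets = make_windows(X, y, width=10)
# inputs has one 10-step window of all features per prediction target.
```

A perfect predictor gives R² = 1, while predicting the mean of the data gives R² = 0, which is the sense in which Table 5 ranks the three models.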
Table 5
R² between the actual and predicted temperature.

Model                       R²
Model with raw data         0.024 (no preprocessing)
Model with clustered data   0.694 (LPF+IQR)
Linear model                0.600 (LPF+IQR)

4. Conclusion

In this work, a multimode process modeling framework based on feature selection and machine learning is developed. The developed method uses clustering to extract steady-state operation data from raw data constituting multiple data characteristics from different operating modes, and implements feature selection to identify significant input features from the clustered data. First, K-means clustering based on the Silhouette scoring approach is employed to extract the steady-state operation data. Next, noise and outliers are removed from the clustered data based on the LPF, SS, and IQR methods to improve the data quality. Then, Pearson's correlation coefficient between the input and target features is calculated to select relevant input features for model training. Finally, an LSTM model is trained to predict the steady-state operation of a multimode process. As a case study, we applied the developed method to real-world industrial data collected from a 2,3-BDO distillation column of GS Caltex, South Korea. In this case study, we analyzed the performance of LSTM models trained using raw data and clustered data for different preprocessing techniques. The clustered data-based model accurately predicted the bottom product temperature of a test dataset, which highlights the significance of feature selection and clustering in modeling the steady-state operation of multimode industrial processes.

The developed method can be seen as a general modeling framework for multimode industrial processes. A good amount of training data is essential to attain reliable performance with the developed LSTM. Since the developed method applies noise removal and outlier detection strategies to the clustered data prior to using it for model training, it is capable of handling noise and outliers in the data. Nonetheless, a high prediction accuracy is not guaranteed when data availability is limited. To address this challenge, complementing the LSTM model training with a known first-principles model accelerates the network learning using less data [11,33,34]. Furthermore, in this work, the developed method was utilized for modeling only steady-state operation. As a next step, the developed method will be extended to a multimode process monitoring framework for modeling and monitoring multiple operating modes and the in-between mode-to-mode transitions.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data that has been used is confidential.

Acknowledgments

This work was supported by the Korea Institute of Industrial Technology within the framework of the project: Development of AI Platform Technology for Smart Chemical Process (grant number: JH-22-0004), the Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (grant number: P0017304, Human Resource Development Program for Industrial Innovation), the Texas A&M Energy Institute, USA, and the Artie McFerrin Department of Chemical Engineering, USA.

References

[1] Z. Gao, H. Saxen, C. Gao, Guest editorial: Special section on data-driven approaches for complex industrial systems, IEEE Trans. Ind. Inform. 9 (4) (2013) 2210–2212.
[2] C. Ji, W. Sun, A review on data-driven process monitoring methods: Characterization and mining of industrial data, Processes 10 (2) (2022) 335.
[3] Y.-N. Sun, Z.-L. Zhuang, H.-W. Xu, W. Qin, M.-J. Feng, Data-driven modeling and analysis based on complex network for multimode recognition of industrial processes, J. Manuf. Syst. 62 (2022) 915–924.
[4] Z. Yan, B.-L. Huang, Y. Yao, Multivariate statistical process monitoring of batch-to-batch startups, AIChE J. 61 (11) (2015) 3719–3727.
[5] A. Narasingam, J.S.-I. Kwon, Development of local dynamic mode decomposition with control: Application to model predictive control of hydraulic fracturing, Comput. Chem. Eng. 106 (2017) 501–511.
[6] A. Narasingam, P. Siddhamshetty, J.S.-I. Kwon, Temporal clustering for order reduction of nonlinear parabolic PDE systems with time-dependent spatial domains: Application to a hydraulic fracturing process, AIChE J. 63 (9) (2017) 3818–3831.
[7] A. Narasingam, P. Siddhamshetty, J.S.-I. Kwon, Handling spatial heterogeneity in reservoir parameters using proper orthogonal decomposition based ensemble Kalman filter for model-based feedback control of hydraulic fracturing, Ind. Eng. Chem. Res. 57 (11) (2018) 3977–3989.
[8] M.S.F. Bangi, A. Narasingam, P. Siddhamshetty, J.S.-I. Kwon, Enlarging the domain of attraction of the local dynamic mode decomposition with control technique: Application to hydraulic fracturing, Ind. Eng. Chem. Res. 58 (14) (2019) 5588–5601.
[9] B. Bhadriraju, M.S.F. Bangi, A. Narasingam, J.S.-I. Kwon, Operable adaptive sparse identification of systems: Application to chemical processes, AIChE J. 66 (11) (2020) e16980.
[10] P. Kumari, B. Bhadriraju, Q. Wang, J.S.-I. Kwon, Development of parametric reduced-order model for consequence estimation of rare events, Chem. Eng. Res. Des. 169 (2021) 142–152.
[11] P. Shah, M.Z. Sheriff, M.S.F. Bangi, C. Kravaris, J.S.-I. Kwon, C. Botre, J. Hirota, Deep neural network-based hybrid modeling and experimental validation for an industry-scale fermentation process: Identification of time-varying dependencies among parameters, Chem. Eng. J. 441 (2022) 135643.
[12] G. Wang, J. Liu, Y. Zhang, Y. Li, A novel multi-mode data processing method and its application in industrial process monitoring, J. Chemom. 29 (2) (2015) 126–138.
[13] Y. Sebzalli, X. Wang, Knowledge discovery from process operational data using PCA and fuzzy clustering, Eng. Appl. Artif. Intell. 14 (5) (2001) 607–616.
[14] C. Rosén, Z. Yuan, Supervisory control of wastewater treatment plants by combining principal component analysis and fuzzy c-means clustering, Water Sci. Technol. 43 (7) (2001) 147–156.
[15] Z. Zhu, Z. Song, A. Palazoglu, Transition process modeling and monitoring based on dynamic ensemble clustering and multiclass support vector data description, Ind. Eng. Chem. Res. 50 (24) (2011) 13969–13983.
[16] E. Gallup, T. Quah, D. Machalek, K.M. Powell, Enhancing fault detection with clustering and covariance analysis, IFAC-PapersOnLine 55 (2) (2022) 258–263.
[17] J. Zhang, P. Wang, R. Yan, R.X. Gao, Long short-term memory for machine remaining life prediction, J. Manuf. Syst. 48 (2018) 78–86.
[18] J.-T. Zhou, X. Zhao, J. Gao, Tool remaining useful life prediction method based on LSTM under variable working conditions, Int. J. Adv. Manuf. Technol. 104 (9) (2019) 4715–4726.
[19] Y. Bai, J. Xie, D. Wang, W. Zhang, C. Li, A manufacturing quality prediction model based on AdaBoost-LSTM with rough knowledge, Comput. Ind. Eng. 155 (2021) 107227.
[20] Y. Choi, N. An, S. Hong, H. Cho, J. Lim, I.-S. Han, I. Moon, J. Kim, Time-series clustering approach for training data selection of a data-driven predictive model: Application to an industrial bio 2,3-butanediol distillation process, Comput. Chem. Eng. 161 (2022) 107758.
[21] B. Wang, Y. Li, Y. Luo, X. Li, T. Freiheit, Early event detection in a deep-learning driven quality prediction model for ultrasonic welding, J. Manuf. Syst. 60 (2021) 325–336.
[22] Z. Li, J. Li, Y. Wang, K. Wang, A deep learning approach for anomaly detection based on SAE and LSTM in mechanical equipment, Int. J. Adv. Manuf. Technol. 103 (1) (2019) 499–510.
[23] P. Xu, R. Du, Z. Zhang, Predicting pipeline leakage in petrochemical system through GAN and LSTM, Knowl.-Based Syst. 175 (2019) 50–61.
[24] J. Lee, Y.C. Lee, J.T. Kim, Fault detection based on one-class deep learning for manufacturing applications limited to an imbalanced database, J. Manuf. Syst. 57 (2020) 357–366.
[25] J. Wang, J. Zhang, X. Wang, Bilateral LSTM: A two-dimensional long short-term memory model with multiply memory units for short-term cycle time forecasting in re-entrant manufacturing systems, IEEE Trans. Ind. Inform. 14 (2) (2017) 748–758.
[26] P.J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math. 20 (1987) 53–65.
[27] G. Casiez, N. Roussel, D. Vogel, 1€ filter: A simple speed-based low-pass filter for noisy input in interactive systems, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2012, pp. 2527–2530.
[28] M. Mahajan, S. Kumar, B. Pant, U.K. Tiwari, Incremental outlier detection in air quality data using statistical methods, in: 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI), IEEE, 2020, pp. 1–5.
[29] J.M. Park, H. Song, H.J. Lee, D. Seung, In silico aided metabolic engineering of Klebsiella oxytoca and fermentation optimization for enhanced 2,3-butanediol production, J. Ind. Microbiol. Biotechnol. 40 (9) (2013) 1057–1066.
[30] C.W. Song, J.M. Park, S.C. Chung, S.Y. Lee, H. Song, Microbial production of 2,3-butanediol for industrial applications, J. Ind. Microbiol. Biotechnol. 46 (11) (2019) 1583–1601.
[31] D. Tinoco, S. Borschiver, P.L. Coutinho, D.M. Freire, Technological development of the bio-based 2,3-butanediol process, Biofuels Bioprod. Biorefin. 15 (2) (2021) 357–376.
[32] I. Assent, Clustering high dimensional data, Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2 (4) (2012) 340–350.
[33] D. Lee, A. Jayaraman, J.S. Kwon, Development of a hybrid model for a partially known intracellular signaling pathway through correction term estimation and neural network modeling, PLoS Comput. Biol. 16 (12) (2020) e1008472.
[34] M.S.F. Bangi, J.S.-I. Kwon, Deep hybrid modeling of chemical process: Application to hydraulic fracturing, Comput. Chem. Eng. 134 (2020) 106696.