
Chemical Engineering Journal 457 (2023) 141025


Data-driven modeling of multimode chemical process: Validation with a real-world distillation column

Yeongryeol Choi a,b,c,d,1, Bhavana Bhadriaju c,d,1, Hyungtae Cho a, Jongkoo Lim e, In-Su Han e, Il Moon b, Joseph Sang-Il Kwon c,d,∗, Junghwan Kim a,b,c,d,∗∗

a Green Materials and Processes R&D Group, Korea Institute of Industrial Technology, 55 Jongga-ro, Ulsan, 44413, South Korea
b Department of Chemical and Biomolecular Engineering, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 03722, South Korea
c Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX 77845, USA
d Texas A&M Energy Institute, Texas A&M University, College Station, TX 77845, USA
e Research and Development Center, GS Caltex Corporation, 359 Expo-ro, Yuseong-gu, Daejeon 34122, South Korea

ARTICLE INFO

Keywords: LSTM; Clustering; Feature selection; Distillation; Multimode process

ABSTRACT

Real-world industrial processes frequently operate in different modes such as start-up, transient, and steady-state operation. Since different operating modes are governed by different process dynamics, deriving a single data-driven model representing the entire operation of such multimode processes is not a viable option. A reasonable solution to this problem is to develop a separate model for each operating mode, which requires the extraction of data for each operating mode from raw data. Based on this viewpoint, this work develops a data-driven modeling approach using clustering and feature selection techniques to improve the quality of raw data and develop a predictive model for a multimode industrial process. In particular, the developed method focuses on training a steady-state predictive model, as monitoring steady-state conditions is crucial for achieving the desired product quality. Firstly, K-means clustering is performed to extract data describing the steady-state operation mode from the available raw data. Next, feature selection is applied to the clustered data using Pearson's correlation coefficient to identify input features relevant to target features. Finally, an LSTM model is trained using the clustered data and identified features to predict the steady-state operation. The validity and effectiveness of the developed method are demonstrated using a real-world 2,3-butanediol distillation process dataset.

1. Introduction

Technology advances and the growing global economy have increased the complexity of industrial processes. For an efficient operation of these processes, an insight into their intricate dynamics is important [1]. Traditionally, a mathematical model derived based on first principles can provide a deeper insight into a process. However, developing a first-principles model for a complex industrial process is difficult or not even possible because of insufficient process knowledge. In such cases, data-driven models can be seen as a better alternative to describe complex process dynamics as they do not require any a priori process knowledge. Many of the data-driven models proposed in the literature are developed based on process data obtained from numerical experiments (i.e., synthetic data). However, these models cannot be applied to real-world industrial processes as simulated processes do not fully represent their complex characteristics [2]. Specifically, industrial processes operate under multiple operating modes such as start-up, transient, and steady-state operation [3]. Additionally, fluctuations in product demand, varying product specifications, changes in feed flow, and process shut-down can also contribute to multimode operation of industrial processes.

Developing a single model describing the entire operation of a multimode process is difficult. For example, the quality of manufactured products during a process start-up will be worse than that of its steady-state operation. As a result, start-up dynamics will differ from the steady-state operation [4]. Therefore, instead of a single model, it is feasible to develop separate models for each operating mode. For example, several data-driven modeling frameworks in the literature identify local models to handle spatially or temporally varying dynamics in various applications such as hydraulic fracturing processes [5–8], chemical reactor control [9], consequence estimation of rare events [10], and

∗ Corresponding author at: Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX 77845, USA.
∗∗ Corresponding author at: Green Materials and Processes R&D Group, Korea Institute of Industrial Technology, 55 Jongga-ro, Ulsan, 44413, South Korea.
E-mail addresses: [email protected] (J.S.-I. Kwon), [email protected] (J. Kim).
1 The authors have contributed equally.

https://doi.org/10.1016/j.cej.2022.141025
Received 26 August 2022; Received in revised form 12 December 2022; Accepted 16 December 2022; Available online 24 December 2022.
1385-8947/© 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Nomenclature

LSTM  Long short-term memory network
PCC   Pearson correlation coefficient
PCA   Principal component analysis
LPF   Low-pass filter
SS    Standard score
IQR   Interquartile range

hybrid modeling of a fermentation process [11]. Developing separate models requires the identification of data corresponding to different operating modes from raw data recorded over the course of start-up to steady-state operation of an industrial process. Raw data contains information from different operating modes [12]. In these circumstances, data belonging to a specific operating mode can be distinguished from other modes through clustering analysis. When dealing with high-dimensional data, it is recommended to perform dimensionality reduction prior to clustering. In [13], the authors proposed a two-step clustering technique: principal component analysis (PCA) is first implemented to reduce dimensionality, and in the second step, a fuzzy c-means algorithm is employed to cluster the data. Further, several works in the literature used dynamic PCA with clustering to account for the temporal dependence of the data. In [14], the authors used dynamic PCA with fuzzy c-means clustering to control a wastewater treatment plant. Also, dynamic PCA was combined with multiclass support vector data description for transition process monitoring [15]. Recently, the authors in [16] explored the application of dynamic PCA with correlation clustering for enhanced fault detection.

Since many chemical processes are targeted to operate at steady-state conditions for achieving the desired product quality, it is important to develop a reliable predictive model for steady-state operation. Therefore, the objective of this work is to develop a data-driven model for predicting the steady-state operation of multimode industrial processes. In the developed method, we utilize the K-means clustering algorithm to extract steady-state operation data from raw data. Owing to their ability to capture the complex dynamics of nonlinear processes and accurately predict the time-series evolution of a process, long short-term memory networks (LSTMs) have gained much attention in various applications, including remaining useful life prediction [17,18], quality prediction [19–21], anomaly detection [22–24], and forecasting of manufacturing cycle time [25]. To this end, we develop an LSTM model using the steady-state operation data obtained from clustering. However, it is challenging to train a generalizable model if all input features of an industrial process are used for model training. Instead, applying a feature selection scheme to extract relevant input features for model training will enhance the prediction accuracy. In this context, we calculate the correlation between input and target features using the well-known Pearson's correlation coefficient (PCC) technique with the steady-state operation data available from clustering. Based on these identified input features, a predictive model for steady-state operation is developed.

In summary, the main contribution of this work is developing an LSTM model for predicting the steady-state operation of multimode industrial processes. The first step is to extract training data associated with steady-state operation from raw data, which exhibits multimode behaviors, through K-means clustering. Then, by selecting significant features for efficient model training, an LSTM is attained that is capable of accurately predicting industrial processes. We demonstrate the developed method on a real-world 2,3-butanediol (BDO) distillation column of GS Caltex, South Korea.

The remainder of this paper is organized as follows: Section 2 presents the developed methodology and details the implementation of K-means clustering, feature selection, and LSTM model training. Next, Section 3 describes the application of the developed method to an industrial 2,3-BDO distillation process and discusses the observed simulation results. Finally, conclusions are presented in Section 4.

2. Methodology

This section briefly discusses the problem addressed in this work and presents the developed methodology.

2.1. Problem description

A process running in different operating conditions is known as a multimode process. Generally speaking, multimode processes are prevalent in the chemical industries due to various operation routines (modes) such as shut-down, start-up, product transition, production volume variation, maintenance tasks, alterations in feed materials, and weather conditions. Under these circumstances, the process dynamics vary from mode to mode, and hence, no single model can fully describe all the operating modes. Also, the collected data will not have the same set of features for all the modes. Thus, not all the available historical data and the constituting features can be used in modeling a specific operating mode of a multimode process. Therefore, developing a systematic approach to extract the useful data and significant features relevant to a particular mode of interest is crucial to accurately predict multimode processes. Most importantly, validating the developed framework using datasets from real-world processes is of utmost importance to enhance its usability for real-world applications. Motivated by these considerations, in this work, we focus on improving the quality of raw data to develop a predictive data-driven model for steady-state process operation. Specifically, we develop an LSTM model based on K-means clustering and feature selection using a real-world dataset of a multimode industrial process, as detailed in the following section.

2.2. Developed methodology

An overview of the developed framework is provided in Fig. 1. In this work, all the data collected from multiple operating modes of an industrial process, such as the start-up mode, the transient operation mode, and the beginning of the steady-state operation mode, are considered multimode data, and steady-state data are the data collected solely from the steady-state operation mode. The developed method uses K-means clustering to extract steady-state operation data from raw data. An example of grouping data into two clusters using K-means is presented in Fig. 3. Once the clustering model is obtained, it is further validated using test data before finally implementing it to extract data representing common steady-state operation characteristics from both the multimode and steady-state operation datasets. A detailed procedure for clustering is provided in the following subsection. Based on the collected clustered data, feature selection is implemented to identify input features relevant to the steady-state operation by calculating the correlation between input and target features. Following the clustering and feature selection steps, we attain a good-quality training dataset and use it for building an LSTM model for predicting steady-state operation. A step-by-step procedure of K-means clustering, feature selection, and LSTM model development is detailed as follows:

2.2.1. K-means clustering for training data acquisition

Let us consider that raw data for an industrial process are recorded from its start-up to steady-state operation, thereby exhibiting multiple dynamical characteristics from different operating modes. In the developed method, we first examine and partition the available raw data into two categories, as shown in Fig. 2: multimode data (Dataset 1) and steady-state operation data (Dataset 2). Data recorded from start-up to the beginning of steady-state operation conditions constitute Dataset 1. A portion of Dataset 2 is used for training a predictive model and the remaining Dataset 2 is used for testing the performance of the


Fig. 1. Overview of the developed methodology.

developed predictive model. From Dataset 1 and the training data from Dataset 2, we extract data corresponding to steady-state operation using K-means clustering.

K-means clustering is a simple data clustering technique that partitions data into $K$ different clusters based on their similarity. The algorithm follows two steps: first, $K$ centroid values are randomly selected for the $K$ cluster groups, and next, each data sample is assigned to a cluster by examining its distance from the cluster centroid. Generally, the Euclidean distance is used to evaluate the distance between a data sample and a cluster centroid. After performing the initial grouping, we re-calculate the centroid values based on the clustered data obtained in the previous step. This procedure of calculating centroid values and grouping data samples into different clusters is iteratively executed until convergence of the centroid values is attained and the variation within each cluster, given by the following objective function, is minimized:

$$J = \sum_{i=1}^{K} \sum_{j=1}^{N} \left\| x_j^i - c_i \right\|^2 \quad (1)$$

where $x_j^i$ and $c_i$ are the $j$th data sample and the centroid of cluster $i$, respectively. The above objective function is expressed in terms of the Euclidean distance between the $j$th data sample and the centroid of cluster $i$.

Since it is difficult to exactly identify the number of operating modes represented by the multimode data, choosing an optimal $K$ value is required. To this end, we utilize the Silhouette score method for deciding the optimum number of clusters [26]. The Silhouette score provides an idea of how similar a data sample is to its own cluster in comparison with other clusters and is calculated as follows:

$$S_j = \frac{b_j - a_j}{\max(a_j, b_j)} \quad (2a)$$
$$a_j = \mathit{distance}(x_j^i, x^i) \quad (2b)$$
$$b_j = \min_{m \neq i} \left( \mathit{distance}(x_j^i, x^m) \right) \quad (2c)$$

where $a_j$ is the average distance between a data sample $x_j^i$ and the other data samples within its own cluster $i$, and $b_j$ is the minimum of the average distances between $x_j^i$ and the data samples in the other remaining clusters, $x^m$ ($m \neq i$). The Silhouette score ranges between −1 and 1. A high Silhouette score represents a high similarity within each cluster, indicating that the data samples are correctly grouped, and a low Silhouette score means that the data are not correctly clustered. Accordingly, we evaluate Silhouette scores for a range of $K$ values, and the $K$ value corresponding to the highest Silhouette score is selected as the optimal number of clusters. Using this optimal $K$, the above-described K-means


Fig. 2. Example illustration of raw data characteristics.

Fig. 3. Example of grouping data into two clusters using K-means clustering.

algorithm is implemented on Dataset 1, and a clustering model is identified. Subsequently, this clustering model is utilized to extract training data corresponding to common steady-state operation conditions from Datasets 1 and 2.

2.2.2. Feature selection for extracting informative training data

This section discusses the feature selection procedure followed in the developed framework. Since real-world datasets are often contaminated with noise and outliers, it is recommended to improve the clustered data quality by removing noise and outliers before selecting important input features. To this end, the developed method uses a low-pass filter (LPF) to deal with noisy data. The LPF is a moving-average filtering approach that filters a data signal based on its frequency. Specifically, the LPF attenuates high-frequency signals (noisy signals) above a cut-off frequency and allows low-frequency signals (process signals) to pass through. As a result, the overall signal-to-noise ratio is improved while minimizing degradation of the signal. In this work, we use a first-order filter represented by the following discrete-time realization equation [27]:

$$\hat{x}_{t+1} = \alpha x_{t+1} + (1 - \alpha) \hat{x}_t \quad (3)$$

where $x$ and $\hat{x}$ denote the measured and filtered data, respectively, and $\alpha$ represents a tunable smoothing factor that balances sensitivity to noise against lag in the filtered signal. Its value varies between 0 and 1. Following the noise removal, outlier detection is performed.

The standard score (SS), or Z-score, and the interquartile range (IQR) are two popular data preprocessing techniques to detect and remove outliers from data. The SS is a simple outlier detection method that measures the distance between a data sample and the mean value of the data. The SS of the $j$th data sample is calculated as follows:

$$\mathrm{SS}(j) = \frac{x_j - \mu}{\sigma} \quad (4)$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the clustered data. Usually, an SS cut-off value of $\gamma = 3$ is considered to detect an outlier: if a data sample is more than 3 standard deviations away from the mean, it is regarded as an outlier. The IQR method detects an outlier based on the distance of a data sample from the third (Q3) and first (Q1) quartiles of the

Fig. 4. Schematic of 2,3-BDO-Distillation column.

Fig. 5. Bottom product temperature profile.

distribution of the clustered data. If a data sample $x_j$ is far from either the 75th or the 25th percentile of the data distribution, it is identified as an outlier [28]. The IQR method is summarized as follows:

$$x_j = \begin{cases} \text{Outlier}, & \text{if } x_j < Q1 - \beta \times \mathrm{IQR} \\ \text{Outlier}, & \text{if } x_j > Q3 + \beta \times \mathrm{IQR} \\ \text{Normal}, & \text{otherwise} \end{cases}$$

where $\mathrm{IQR} = Q3 - Q1$ and $\beta$ is a parameter controlling the dependence of the outlier detection on the IQR value. Typically, $\beta$ is taken as 1.5. Both the SS and IQR methods follow a similar outlier identification procedure by first finding the distribution of the data and then identifying an outlier based on a cut-off value. In the case study presented in Section 3, we implement these techniques for outlier removal.

After dealing with noise and outliers, feature selection is implemented. It is beneficial to utilize informative data for training a predictive model so as to best capture the underlying process dynamics. Utilizing a large number of input features for model training and real-time prediction can lead to poor generalization and be computationally expensive. Thus, PCC analysis is performed to identify key input features that are significant to the process and to use them for model training. The PCC measures the strength and direction of the relationship between two linearly related variables. The correlation between an input and a target

Table 1
Input and target features for the 2,3-BDO distillation process (Fig. 4).

Feature symbol | Description
Y   | Bottom product temperature
X01 | Control valve for feedstock flowrate
X02 | Control valve for reboiler steam flowrate
X03 | Control valve for reflux flowrate
X04 | Control valve for bottom liquid level
X05 | Control valve for top pressure
X06 | Bottom liquid level
X07 | Feedstock flowrate
X08 | Reboiler steam flowrate
X09 | Reflux flowrate
X10 | Bottom product flowrate
X11 | Top pressure
X12 | Reboiler outlet pressure
X13 | Reflux tank pressure
X14 | Top temperature
X15 | Rectifying section temperature
X16 | Feedstock temperature
X17 | Stripping section temperature
X18 | Reboiler outlet temperature
X19 | Reflux tank temperature

Fig. 6. Silhouette scores for multimode data with a varying number of clusters.

feature based on $s$ input–target data samples is calculated as:

$$r_{pq} = \frac{\sum_{i=1}^{s} (p_i - \bar{p})(q_i - \bar{q})}{\sqrt{\sum_{i=1}^{s} (p_i - \bar{p})^2} \sqrt{\sum_{i=1}^{s} (q_i - \bar{q})^2}} \quad (5)$$

where $p$ is an input feature with an average value of $\bar{p}$, and $q$ is a target feature with an average value of $\bar{q}$. The correlation value ranges between −1 and 1. The correlation between $p$ and $q$ is considered weak if the $r_{pq}$ value is less than 0.3, and strong if it is greater than 0.6. In this manner, we execute the PCC analysis and select significant input features that are relevant to the target of interest. Then, these relevant input features are used in training an LSTM model.

Remark 1. Note that it is important to first cluster the data before performing feature selection. Feature selection is not effective when implemented on raw data (before clustering). The different operating modes described in raw data are represented by different features. Because of this, finding features relevant to steady-state operation from the high-dimensional feature space of raw data is a challenging task. On the other hand, clustering partitions raw data into groups of different operating modes and reduces the dimensionality of the feature space. As a result, it becomes easier to identify the right input features from clustered data.

2.2.3. LSTM model training

Unlike a traditional neural network, an LSTM is capable of processing time-series data by accounting for both long-term and short-term input–output dependencies. Its network contains recurrent units to retain information from the previous time step and use it for prediction in the current time step. Typically, an LSTM model is a multi-layered network with input, output, and hidden layers. The input layer contains the initial data related to the input variables, which is followed by the hidden layers. Each hidden layer contains an LSTM memory cell wherein the time-series dependencies are processed. The final output layer is a fully-connected layer that provides the target variables for the given input variables.

In this work, we train an LSTM model using the training data obtained after clustering and feature selection. Specifically, the input features identified from the PCC analysis are considered as input data for model training. Further, we evaluate the performance of the developed LSTM model based on the coefficient of determination (R²), given by the following equation:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \quad (6)$$

where $N$ is the number of data samples, $y_i$ is the actual value, $\hat{y}_i$ is the model-predicted value, and $\bar{y}$ is the mean of the actual data samples. A large R² value indicates a well-trained model, which can be further used for prediction. A detailed discussion of the LSTM model development, including the network architecture and training process, is presented in Section 3.4, where we train an LSTM for the real-world distillation process.

3. Case study: 2,3-Butanediol (BDO) distillation column

This section presents the application of the developed method to a real-world 2,3-BDO distillation column. In the following subsections, we describe the process and discuss the simulation results obtained from clustering, feature selection, and LSTM model training and testing.

3.1. Process description

Bio-based 2,3-BDO is a highly valuable compound produced from microbial fermentation using low-cost renewable carbon sources, such as sucrose, lactose, and glucose, as raw materials. It is an eco-friendly material with minimal greenhouse gas emissions and hence is extensively used in many industrial applications, including pharmaceuticals, cosmetics, food additives, and fuels [29–31]. The raw 2,3-BDO product from microbial fermentation contains many impurities. These impurities are separated in a distillation column to obtain a high-purity 2,3-BDO compound. In this work, we applied the developed method to the historical process dataset obtained from a 2,3-BDO distillation column of GS Caltex in South Korea. A schematic of this distillation process is presented in Fig. 4. The distillation column separates acetoin, water, and other contaminants from a 50 wt% 2,3-BDO feedstock. The pressure at the top of the column is controlled at −0.9 kg/cm² g to −0.8 kg/cm² g. At this pressure, the temperature at the bottom of the column ($Y$ in Fig. 4) is targeted to be maintained at a steady-state temperature of 120 °C to produce 99% pure 2,3-BDO as the bottom product. The bottom product purity can be determined based on the pressure and temperature dynamics. Therefore, predicting the temperature at the column bottom is beneficial for optimizing the distillation operation. Accordingly, the goal of this case study is to build a predictive steady-state data-driven model to forecast the trend of the bottom product temperature ($Y$, the target feature). The other features highlighted in Fig. 4 are the input features. Table 1 lists all the input and target features for this process.
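The clustering workflow of Section 2.2.1 — fitting K-means for a range of $K$ and keeping the $K$ with the highest Silhouette score (Eqs. (1) and (2)) — can be sketched with scikit-learn. This is an illustrative sketch, not the authors' code: the synthetic two-mode data below merely stands in for Dataset 1.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Synthetic stand-in for Dataset 1: two well-separated operating "modes".
transient = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2))
steady = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(200, 2))
data = np.vstack([transient, steady])

# Evaluate Silhouette scores over a range of K and keep the best K.
scores = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
    scores[k] = silhouette_score(data, labels)
best_k = max(scores, key=scores.get)

# Refit with the optimal K; the cluster(s) matching known steady-state
# behaviour would then be used to extract training data.
model = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(data)
print(best_k, model.cluster_centers_.round(1))
```

For this synthetic example the Silhouette score peaks at the true number of modes; on plant data the loop over $K$ plays the role of Fig. 6.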


Fig. 7. Extracting steady-state operation data using K-means clustering.

Fig. 8. Pearson correlation coefficients between input features and the target bottom product temperature for 46 feature selection cases from Table 2.
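The preprocessing-and-screening pipeline behind Fig. 8 — first-order low-pass filtering (Eq. (3)), IQR-based outlier removal, and a Pearson correlation check against the target (Eq. (5)) with the paper's 0.3 weak-correlation cut-off — can be sketched as follows. The signals here are synthetic placeholders, not the plant data.

```python
import numpy as np

def low_pass(x, alpha=0.6):
    """First-order filter x_hat[t+1] = alpha*x[t+1] + (1-alpha)*x_hat[t] (Eq. (3))."""
    out = np.empty(len(x), dtype=float)
    out[0] = x[0]
    for t in range(1, len(x)):
        out[t] = alpha * x[t] + (1 - alpha) * out[t - 1]
    return out

def iqr_mask(x, beta=1.5):
    """True for samples kept by the IQR rule; outliers fall outside [Q1-beta*IQR, Q3+beta*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x >= q1 - beta * iqr) & (x <= q3 + beta * iqr)

rng = np.random.default_rng(1)
t = np.linspace(0, 10, 500)
target = np.sin(t)                                   # stand-in for bottom temperature Y
relevant = np.sin(t) + 0.1 * rng.normal(size=500)    # correlated input feature
irrelevant = rng.normal(size=500)                    # uncorrelated input feature

selected = []
for name, feat in [("relevant", relevant), ("irrelevant", irrelevant)]:
    smooth = low_pass(feat)                          # noise removal
    keep = iqr_mask(smooth)                          # outlier removal
    r = np.corrcoef(smooth[keep], target[keep])[0, 1]  # Pearson's r (Eq. (5))
    if abs(r) > 0.3:                                 # drop weakly correlated features
        selected.append(name)
print(selected)
```

With the order swapped (IQR before LPF, or SS in place of IQR), the same loop reproduces the other preprocessing cases enumerated in Table 2.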

3.2. K-means clustering

A raw dataset containing 5,760 data samples, recorded at a 1-min sampling interval from the start-up to steady-state operation of the column, is used to develop a predictive model for the bottom product temperature, whose profile is shown in Fig. 5. The temperature profile shows that the distillation column operates in different modes (start-up, transient (unsteady), and steady-state operation).

Following the procedure described in Section 2.1, we first divided the raw dataset into multimode data (Dataset 1) and steady-state operation data (Dataset 2). We used half of Dataset 2 for training purposes, and the other half is used for testing the developed model. Next, K-means clustering is applied to Dataset 1 by selecting the optimum number of clusters through the Silhouette scoring technique. The Silhouette scores for cluster numbers between 2 and 10 are presented in Fig. 6. The figure shows that the highest Silhouette score is obtained for $k = 5$, indicating that dividing Dataset 1 into 5 clusters is optimum. Accordingly, Dataset 1 is clustered into 5 different groups representing multimode characteristics, highlighted by different colors in Fig. 7. The clustering model is further applied to Dataset 2 to extract data with steady-state operation characteristics common to Datasets 1 and 2. Here, Dataset 2 with steady-state data helps in providing information about the steady-state characteristics. Using this information, more steady-state data with identical characteristics are automatically extracted from Dataset 1. It can be observed that the data represented by the purple and green colors in Dataset 1 have characteristics similar to the steady-state operation data in Dataset 2. Also, we further evaluated the clustering model by validating its performance using test data collected from


Fig. 9. Schematic of LSTM training.

steady-state operation. The validation result shows that the test data were grouped into Clusters 4 and 5, indicating that the clustering model successfully identified the steady-state characteristics of the test data. Thus, we extracted the identified common steady-state operation data from Dataset 1 and Dataset 2 as the clustered data, which is next preprocessed before being used for model training.

Please note that it is challenging to cluster high-dimensional data because of the difficulty in assessing the distance between data objects in a high-dimensional space [32]. In such scenarios, it is often recommended to reduce the dimensionality before performing clustering. This can be done by employing dimensionality reduction techniques such as PCA as a preprocessing step, which performs a linear transformation of high-dimensional data to low-dimensional data, to which the clustering algorithm can then be applied. In this work, dimensionality reduction is not implemented prior to K-means clustering, as the considered distillation column case study has only 18 input features. But when dealing with high-dimensional data, it is suggested to implement PCA as an additional preprocessing step in the developed method to improve clustering.

Table 2
Feature selection cases based on different preprocessing methods for clustered data.

Case index | Preprocessing method | Tuning parameter
Case 1     | None        | –
Case 2–4   | Only LPF    | α = 0.3, 0.6, 0.9 (LPF)
Case 5–7   | Only IQR    | β = 1.0, 1.5, 2.0 (IQR)
Case 8–16  | LPF and IQR | α = 0.3, 0.6, 0.9 (LPF); β = 1.0, 1.5, 2.0 (IQR)
Case 17–25 | IQR and LPF | β = 1.0, 1.5, 2.0 (IQR); α = 0.3, 0.6, 0.9 (LPF)
Case 26–28 | SS          | γ = 2.5, 3.0, 3.5 (SS)
Case 29–37 | LPF and SS  | α = 0.3, 0.6, 0.9 (LPF); γ = 2.5, 3.0, 3.5 (SS)
Case 38–46 | SS and LPF  | γ = 2.5, 3.0, 3.5 (SS); α = 0.3, 0.6, 0.9 (LPF)

3.3. Feature selection

In this section, we improve the quality of the clustered data contaminated with noise and outliers using the LPF, SS, and IQR techniques. The LPF is used for noise removal, and the SS and IQR techniques are implemented to identify and remove outliers. After dealing with noise and outliers, the PCC values between the target and input features are calculated to select significant input features for LSTM model training. Since different preprocessing methods will have a different impact on data quality, we calculated the PCC values for 46 feature selection cases to select the best feature selection approach for the given data. These cases are prepared by applying different combinations of the LPF, SS, and IQR methods prior to feature selection, as presented in Table 2. Also, the effect of different tuning parameter values for each preprocessing method is analyzed in these cases. Case 1 in Table 2 represents a scenario in which feature selection is implemented on clustered data with no preprocessing. Cases 2, 3, and 4 describe the cases in which feature selection is applied to clustered data preprocessed using the LPF with an α value of 0.3, 0.6, and 0.9, respectively. Similarly, the remaining cases represent different feature selection scenarios with the clustered data being cleaned using various preprocessing techniques. In practice, α in the LPF ranges between 0 and 1, γ in the SS is typically taken as 3, and β in the IQR is taken as 1.5. Based on this notion, tuning parameter values of α = [0.3, 0.6, 0.9], β = [1.0, 1.5, 2.0], and γ = [2.5, 3.0, 3.5] are considered in the case study.

A summary of the PCC values estimated for all 46 cases is highlighted in Fig. 8. In the figure, the white, green, and dark-green boxes indicate PCC values between the target and input features, $r_{pq}$, of less than 0.3, greater than 0.3, and greater than 0.6, respectively. Since the objective is to find relevant input features, the features with white boxes, representing weak correlation, are not selected for model training. The figure suggests that the PCC values are more significantly affected by the type of preprocessing method than by its tuning parameters. To better analyze the result presented in this figure, we reduced the 46 cases to 4 cases, as described in Table 3. The second column in the table presents the significant input features selected from the PCC analysis for each case. The base case in the table corresponds to applying feature selection to clustered data with no preprocessing. The LPF case represents a scenario in which feature selection is performed on clustered data with LPF preprocessing. The LPF + IQR case corresponds to noise removal using the LPF and IQR-based outlier detection being implemented on
Y. Choi et al. Chemical Engineering Journal 457 (2023) 141025

Fig. 10. Comparing (a) bottom product temperature prediction and (b) absolute prediction errors for the developed LSTM model and LSTM with raw data (without feature selection
and preprocessing).
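The cleaning steps described in Section 3.3 use standard formulations: a first-order low-pass filter with smoothing factor α, and IQR fences scaled by a multiplier β. The sketch below is an illustrative reconstruction, not the authors' implementation; the filter form and the fence rule follow the common textbook definitions, and the sample values are synthetic.

```python
def low_pass_filter(x, alpha):
    """First-order (exponential) low-pass filter: y[k] = alpha*x[k] + (1 - alpha)*y[k-1].

    Smaller alpha smooths more aggressively; alpha lies in (0, 1].
    """
    y = [x[0]]
    for xk in x[1:]:
        y.append(alpha * xk + (1 - alpha) * y[-1])
    return y


def iqr_bounds(x, beta=1.5):
    """Tukey-style fences: values outside [Q1 - beta*IQR, Q3 + beta*IQR] are outliers."""
    s = sorted(x)

    def quantile(q):
        # Linear interpolation between closest ranks.
        pos = q * (len(s) - 1)
        lo, hi = int(pos), min(int(pos) + 1, len(s) - 1)
        return s[lo] + (s[hi] - s[lo]) * (pos - lo)

    q1, q3 = quantile(0.25), quantile(0.75)
    spread = q3 - q1
    return q1 - beta * spread, q3 + beta * spread


def remove_outliers(x, beta=1.5):
    """Drop samples that fall outside the IQR fences."""
    lo, hi = iqr_bounds(x, beta)
    return [v for v in x if lo <= v <= hi]
```

Applying `remove_outliers` after `low_pass_filter` corresponds to the LPF + IQR case; swapping the IQR step for a 3σ rule gives the LPF + SS case.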

Table 3
Relevant input features identified based on average PCC values from Fig. 8.

Case         Features identified                                           Number of relevant features
Base         X01, X02, X06, X07, X08, X11, X12, X13, X16, X17, X18        11
LPF          X01, X02, X03, X06, X07, X08, X11, X12, X13, X16, X17, X18   12
LPF + IQR    X02, X03, X08, X09, X11, X12, X14, X16, X17, X18             9
LPF + SS     X02, X03, X08, X09, X12, X14, X16, X17, X18                  10

Table 4
Hyperparameters and training options for LSTM model development.

List                               Training options/hyperparameters
Algorithm                          LSTM
Hidden layers and nodes            3 layers (10-5-10)
Dropout                            0.1
Activation function                ReLU
Weight initialization              He initialization
Optimizer                          Adam
Learning rate                      0.001
Batch number                       512
Early stopping option (trigger)    Patience (10)
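The PCC screening behind Table 3 amounts to computing r_pq between each input feature and the target and keeping only the features whose correlation magnitude clears the 0.3 cutoff. A minimal sketch follows — illustrative only, with synthetic series in place of the 18 plant tags X01–X18:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r_pq of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_features(inputs, target, threshold=0.3):
    """Keep input features whose |r_pq| with the target exceeds the threshold."""
    return [name for name, series in sorted(inputs.items())
            if abs(pearson(series, target)) > threshold]
```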

clustered data prior to performing feature selection. Similarly, before feature selection, LPF preprocessing and outlier detection using SS are implemented on clustered data in the LPF + SS case. We compared these 4 cases to see how different preprocessing techniques influence the number of significant features identified. It can be seen from Table 3 that fewer input features are identified by the LPF + IQR and LPF + SS cases than by the base case and the LPF-only case. This indicates that implementing noise removal and outlier detection aids in removing redundant input features and selecting only the major input features relevant to the target feature.

3.4. LSTM model development

In this subsection, we develop an LSTM model using the clustered data to predict the bottom product temperature. The network architecture and the considered training parameters are provided in Table 4. The network hyperparameters are tuned based on trial and error. As a regularization technique to prevent overfitting, early stopping is implemented to terminate training when the network's performance stops improving on the validation data. The input features identified in Table 3 are used in model training. Fig. 9 presents a schematic illustrating the development of the LSTM model following the developed method.
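Because an LSTM consumes sequences, each training sample pairs a target value with a short window of preceding measurements; the comparison in the next subsection feeds 10 samples, t_{k-9}, ..., t_k, per prediction. A minimal windowing sketch, with synthetic univariate data standing in for the selected feature vectors:

```python
def make_windows(series, targets, width=10):
    """Pair each target y[k] with the preceding `width` samples x[k-width+1..k].

    In the multivariate case, each element of `series` would be a vector of
    the selected input features rather than a scalar.
    """
    windows, labels = [], []
    for k in range(width - 1, len(series)):
        windows.append(series[k - width + 1 : k + 1])
        labels.append(targets[k])
    return windows, labels
```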


Fig. 11. Comparing (a) bottom product temperature prediction and (b) absolute prediction errors for the developed LSTM model and the linear model (with reboiler outlet as
input).
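The model accuracies in Table 5 are reported as the coefficient of determination, R² = 1 − SS_res / SS_tot. For reference, a small implementation is sketched below; the series in the assertions are synthetic, not the plant data.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    1.0 is a perfect fit; 0.0 is no better than predicting the mean.
    """
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_res / ss_tot
```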

Note that both feature selection and preprocessing are applied only to the steady-state data obtained after clustering. Using this data, we train an LSTM to predict the bottom product temperature. The resulting LSTM's prediction performance is compared with that of an LSTM trained using the entire dataset (i.e., without clustering) and all 18 input features, with no feature selection or preprocessing. The comparison result is shown in Fig. 10. Here, to predict an output at t_k, 10 data samples collected from t_{k-9}, ..., t_k are given as model input to the LSTM model. Fig. 10 shows that the model with clustered data successfully predicted the actual temperature profile, as compared to the model trained using raw data (without any preprocessing and feature selection).

From Fig. 8, it can be seen that the reboiler outlet (input feature X18) is highly correlated with the bottoms temperature. Because of this high correlation, one could claim that a simple linear model with the reboiler outlet as input is sufficient to accurately predict the bottoms temperature. To test this, we compared the performance of the developed LSTM model with that of a linear model in predicting the bottoms temperature for a test dataset. The linear model is trained with the same data utilized in LSTM training. The comparison results are shown in Fig. 11, and they show that the developed method predicts better than the linear model. It is important to note that the real-world data we have is highly nonlinear, with random and chaotic disturbances. Hence, a simple linear model is not suitable for capturing the nonlinearities present in the data. Further, the prediction accuracies for the developed model with clustered data, the model with raw data, and the linear model (with X18 as input) are quantified using R², as presented in Table 5. It can be observed that, compared to the raw data-based model that considered all the data and input features for model prediction, the clustered data-based model predicted well with a smaller number of input features, thereby highlighting the importance of feature selection for predictive modeling. These results indicate that the developed method provides a good direction for employing feature selection and clustering techniques for modeling and predicting multimode industrial processes using real-world datasets.

Remark 2. It can be observed that, for the presented case study, the performance of the linear model is only slightly less accurate than that of the developed model. In general, a linear model only fits the linear relationship between the input and output variables. However, many input and output variables in chemical processes have complex nonlinear interactions, which cannot be captured using a linear model. Therefore, in view of developing a generalized modeling approach, we considered building a nonlinear model.

4. Conclusion

In this work, a multimode process modeling framework based on feature selection and machine learning is developed. The developed method uses clustering to extract steady-state operation data from raw


Table 5
R² between the actual and predicted temperature.

Model                                     R²
Model with raw data (no preprocessing)    0.024
Model with clustered data (LPF+IQR)       0.694
Linear model (LPF+IQR)                    0.600

data, constituting multiple data characteristics from different operating modes, and implements feature selection to identify significant input features from the clustered data. First, K-means clustering based on a Silhouette scoring approach is employed to extract steady-state operation data. Next, noise and outliers are removed from the clustered data using the LPF, SS, and IQR methods to improve the data quality. Then, Pearson's correlation coefficient between the input and target features is calculated to select relevant input features for model training. Finally, an LSTM model is trained to predict the steady-state operation of a multimode process. As a case study, we applied the developed method to real-world industrial data collected from a 2,3-BDO distillation column of GS Caltex, South Korea. In this case study, we analyzed the performance of LSTM models trained using raw data and clustered data for different preprocessing techniques. The clustered data-based model accurately predicted the bottom product temperature of a test dataset, which highlights the significance of feature selection and clustering in modeling the steady-state operation of multimode industrial processes.

The developed method can be seen as a general modeling framework for multimode industrial processes. A good amount of training data is essential to attain reliable performance using the developed LSTM. Since the developed method utilizes noise removal and outlier detection strategies on clustered data prior to using it for model training, it is capable of handling noise and outliers in the data. Nonetheless, a high prediction accuracy is not guaranteed when there is limited availability of data. To address this challenge, complementing the LSTM model training with a known first-principles model accelerates the network learning using less data [11,33,34]. Furthermore, in this work, the developed method was utilized for modeling only steady-state operation. As a next step, the developed method will be extended to a multimode process monitoring framework for modeling and monitoring multiple operating modes and the in-between mode-to-mode transitions.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data that has been used is confidential.

Acknowledgments

This work was supported by the Korean Institute of Industrial Technology within the framework of the project: Development of AI Platform Technology for Smart Chemical Process (grant number: JH-22-0004), the Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (grant number: P0017304, Human Resource Development Program for Industrial Innovation), Texas A&M Energy Institute, USA, and the Artie McFerrin Department of Chemical Engineering, USA.

References

[1] Z. Gao, H. Saxen, C. Gao, Guest editorial: Special section on data-driven approaches for complex industrial systems, IEEE Trans. Ind. Inform. 9 (4) (2013) 2210–2212.
[2] C. Ji, W. Sun, A review on data-driven process monitoring methods: Characterization and mining of industrial data, Processes 10 (2) (2022) 335.
[3] Y.-N. Sun, Z.-L. Zhuang, H.-W. Xu, W. Qin, M.-J. Feng, Data-driven modeling and analysis based on complex network for multimode recognition of industrial processes, J. Manuf. Syst. 62 (2022) 915–924.
[4] Z. Yan, B.-L. Huang, Y. Yao, Multivariate statistical process monitoring of batch-to-batch startups, AIChE J. 61 (11) (2015) 3719–3727.
[5] A. Narasingam, J.S.-I. Kwon, Development of local dynamic mode decomposition with control: Application to model predictive control of hydraulic fracturing, Comput. Chem. Eng. 106 (2017) 501–511.
[6] A. Narasingam, P. Siddhamshetty, J. Sang-Il Kwon, Temporal clustering for order reduction of nonlinear parabolic PDE systems with time-dependent spatial domains: Application to a hydraulic fracturing process, AIChE J. 63 (9) (2017) 3818–3831.
[7] A. Narasingam, P. Siddhamshetty, J.S.-I. Kwon, Handling spatial heterogeneity in reservoir parameters using proper orthogonal decomposition based ensemble Kalman filter for model-based feedback control of hydraulic fracturing, Ind. Eng. Chem. Res. 57 (11) (2018) 3977–3989.
[8] M.S.F. Bangi, A. Narasingam, P. Siddhamshetty, J.S.-I. Kwon, Enlarging the domain of attraction of the local dynamic mode decomposition with control technique: Application to hydraulic fracturing, Ind. Eng. Chem. Res. 58 (14) (2019) 5588–5601.
[9] B. Bhadriraju, M.S.F. Bangi, A. Narasingam, J.S.-I. Kwon, Operable adaptive sparse identification of systems: Application to chemical processes, AIChE J. 66 (11) (2020) e16980.
[10] P. Kumari, B. Bhadriraju, Q. Wang, J.S.-I. Kwon, Development of parametric reduced-order model for consequence estimation of rare events, Chem. Eng. Res. Des. 169 (2021) 142–152.
[11] P. Shah, M.Z. Sheriff, M.S.F. Bangi, C. Kravaris, J.S.-I. Kwon, C. Botre, J. Hirota, Deep neural network-based hybrid modeling and experimental validation for an industry-scale fermentation process: Identification of time-varying dependencies among parameters, Chem. Eng. J. 441 (2022) 135643.
[12] G. Wang, J. Liu, Y. Zhang, Y. Li, A novel multi-mode data processing method and its application in industrial process monitoring, J. Chemom. 29 (2) (2015) 126–138.
[13] Y. Sebzalli, X. Wang, Knowledge discovery from process operational data using PCA and fuzzy clustering, Eng. Appl. Artif. Intell. 14 (5) (2001) 607–616.
[14] C. Rosén, Z. Yuan, Supervisory control of wastewater treatment plants by combining principal component analysis and fuzzy c-means clustering, Water Sci. Technol. 43 (7) (2001) 147–156.
[15] Z. Zhu, Z. Song, A. Palazoglu, Transition process modeling and monitoring based on dynamic ensemble clustering and multiclass support vector data description, Ind. Eng. Chem. Res. 50 (24) (2011) 13969–13983.
[16] E. Gallup, T. Quah, D. Machalek, K.M. Powell, Enhancing fault detection with clustering and covariance analysis, IFAC-PapersOnLine 55 (2) (2022) 258–263.
[17] J. Zhang, P. Wang, R. Yan, R.X. Gao, Long short-term memory for machine remaining life prediction, J. Manuf. Syst. 48 (2018) 78–86.
[18] J.-T. Zhou, X. Zhao, J. Gao, Tool remaining useful life prediction method based on LSTM under variable working conditions, Int. J. Adv. Manuf. Technol. 104 (9) (2019) 4715–4726.
[19] Y. Bai, J. Xie, D. Wang, W. Zhang, C. Li, A manufacturing quality prediction model based on AdaBoost-LSTM with rough knowledge, Comput. Ind. Eng. 155 (2021) 107227.
[20] Y. Choi, N. An, S. Hong, H. Cho, J. Lim, I.-S. Han, I. Moon, J. Kim, Time-series clustering approach for training data selection of a data-driven predictive model: Application to an industrial bio 2,3-butanediol distillation process, Comput. Chem. Eng. 161 (2022) 107758.
[21] B. Wang, Y. Li, Y. Luo, X. Li, T. Freiheit, Early event detection in a deep-learning driven quality prediction model for ultrasonic welding, J. Manuf. Syst. 60 (2021) 325–336.
[22] Z. Li, J. Li, Y. Wang, K. Wang, A deep learning approach for anomaly detection based on SAE and LSTM in mechanical equipment, Int. J. Adv. Manuf. Technol. 103 (1) (2019) 499–510.
[23] P. Xu, R. Du, Z. Zhang, Predicting pipeline leakage in petrochemical system through GAN and LSTM, Knowl.-Based Syst. 175 (2019) 50–61.
[24] J. Lee, Y.C. Lee, J.T. Kim, Fault detection based on one-class deep learning for manufacturing applications limited to an imbalanced database, J. Manuf. Syst. 57 (2020) 357–366.
[25] J. Wang, J. Zhang, X. Wang, Bilateral LSTM: A two-dimensional long short-term memory model with multiply memory units for short-term cycle time forecasting in re-entrant manufacturing systems, IEEE Trans. Ind. Inform. 14 (2) (2017) 748–758.
[26] P.J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math. 20 (1987) 53–65.


[27] G. Casiez, N. Roussel, D. Vogel, 1€ filter: a simple speed-based low-pass filter for noisy input in interactive systems, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2012, pp. 2527–2530.
[28] M. Mahajan, S. Kumar, B. Pant, U.K. Tiwari, Incremental outlier detection in air quality data using statistical methods, in: 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy, ICDABI, IEEE, 2020, pp. 1–5.
[29] J.M. Park, H. Song, H.J. Lee, D. Seung, In silico aided metabolic engineering of Klebsiella oxytoca and fermentation optimization for enhanced 2,3-butanediol production, J. Ind. Microbiol. Biotechnol. 40 (9) (2013) 1057–1066.
[30] C.W. Song, J.M. Park, S.C. Chung, S.Y. Lee, H. Song, Microbial production of 2,3-butanediol for industrial applications, J. Ind. Microbiol. Biotechnol. 46 (11) (2019) 1583–1601.
[31] D. Tinoco, S. Borschiver, P.L. Coutinho, D.M. Freire, Technological development of the bio-based 2,3-butanediol process, Biofuels, Bioprod. Biorefin. 15 (2) (2021) 357–376.
[32] I. Assent, Clustering high dimensional data, Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2 (4) (2012) 340–350.
[33] D. Lee, A. Jayaraman, J.S. Kwon, Development of a hybrid model for a partially known intracellular signaling pathway through correction term estimation and neural network modeling, PLoS Comput. Biol. 16 (12) (2020) e1008472.
[34] M.S.F. Bangi, J.S.-I. Kwon, Deep hybrid modeling of chemical process: Application to hydraulic fracturing, Comput. Chem. Eng. 134 (2020) 106696.

