
Journal of Analytical and Applied Pyrolysis 162 (2022) 105448


Biomass fast pyrolysis prediction model through data-based prediction models coupling with CPFD simulation
Tae-Hoon Kim a, Myung Kyu Choi b, Hang Seok Choi b,*
a Department of Computer Information Technology, Purdue University Northwest, Hammond, IN 46323, USA
b Department of Environmental Engineering, Yonsei University, Wonju, Gangwon-do 26493, Republic of Korea

Keywords: Fast pyrolysis; Biomass; Computational particle fluid dynamics; Machine learning; Regression

Abstract

Biomass fast pyrolysis has been traditionally investigated through experiments or numerical simulations including hydrodynamics, transport phenomena of heat and mass, and chemical reactions. Instead of traditional methods, this study utilized data-based prediction algorithms to model a biomass fast pyrolyzer using a spouted fluidized bed, to predict the yields of major products. We used the labeled dataset generated by the computational particle fluid dynamics (CPFD) simulation to train the data-based prediction models, and used the reaction temperature and gas residence time as inputs. Eight data-based prediction models, including machine learning and deep learning, were selected and compared. The selected data-based prediction methods indicated better agreement with the CPFD product yields than those of the lumped process models. This study provides new guidelines for modeling fast pyrolysis reactions using CPFD and data-based prediction methods.

1. Introduction

Recently, carbon-neutral biomass has become an attractive renewable energy source due to global warming caused by the use of fossil fuels. Several conventional technologies, such as incineration, pyrolysis, and gasification, can be employed to obtain renewable energy from the thermal conversion of biomass [1–4]. Fast pyrolysis is a technique for obtaining liquid fuel from biomass. Fluidized bed reactors are widely used for fast pyrolysis; however, recently, spouted bed reactors have also been studied [5–9]. Spouted bed reactors are more flexible than conventional fluidized beds considering fuel particle size and pressure drop [9]. Process analysis (PA) is indispensable for scaling up fast pyrolysis systems. Generally, lumped models are used for the unit processes or unit devices in the entire system. However, the lumped models adopted in process analysis have uncertainties because of their theoretical limitations. Various numerical approaches have been used to increase the accuracy of the models for process intensification.

Machine learning (ML) and deep learning are data-based prediction methods that have been introduced to improve model accuracy. Currently, data-based prediction methods are actively adopted and practically used in various fields, such as finance, healthcare, and life sciences [10,11]. The wide adoption of these models is attributed to their ability to learn from experience and improve the system efficiently without explicit programming. Supervised and unsupervised learning are two subsets of data-based prediction methods. Supervised learning uses a set of labeled data to train the model for prediction, whereas unsupervised learning uses a set of non-labeled data. Therefore, supervised learning is used to estimate or predict values from known data patterns and is mainly used for regression or classification in applications such as weather forecasting, email filtering, and intrusion detection systems [12–15]. Unsupervised learning is appropriate for data with unknown patterns and is mainly used for clustering or grouping data in applications such as market segmentation, social network analysis, and organizing computer clusters [16,17]. Although data-based prediction methods have been widely adopted and applied in many areas, they have not been actively adapted and implemented for analyzing biomass energy. Ozbas et al. [18] predicted hydrogen production from biomass gasification by comparing four ML models: linear regression (LR), K-nearest neighbors (KNN) regression, support vector regression (SVR), and decision tree regression. In their study, the coefficient of determination (R^2), mean absolute error (MAE), and root mean square error (RMSE) were used to compare the models, and LR had the best performance. Monroy et al. [19] implemented a support vector machine (SVM) to predict light intensity using both experimental and simulated data for batch hydrogen production. Whiteman and Kana [20] proposed the use of an artificial neural network (ANN) to determine the

* Correspondence to: Department of Environmental Engineering, Yonsei University, 1 Yonseidae-gil, Wonju, Gangwon-do 26493, Republic of Korea.
E-mail address: [email protected] (H.S. Choi).

https://doi.org/10.1016/j.jaap.2022.105448
Received 8 September 2021; Received in revised form 8 December 2021; Accepted 13 January 2022
Available online 19 January 2022
0165-2370/© 2022 Elsevier B.V. All rights reserved.

relationships between process inputs for fermentative biohydrogen production, and their results showed a high accuracy.

In this study, we developed and compared several data-based prediction models, ML and deep learning, for the biomass fast pyrolysis reaction in a spouted bed reactor. We developed a new computational particle fluid dynamics (CPFD) model of biomass fast pyrolysis for a spouted bed. The data from the developed CPFD simulation were coupled to train and evaluate the selected data-based prediction models. In addition, the results of PA with the lumped process model were used to compare and validate the accuracy of the selected data-based prediction models. Fig. 1 shows the conceptual framework of the data-based prediction model coupled with the CPFD.

2. Methods

2.1. CPFD modeling

The data-based prediction method requires a dataset to train the model, which can be generated either by experiments or by simulations in this field of study. Data generation from experiments is relatively expensive considering cost and time. Therefore, several studies utilize CFD datasets to train data-based prediction, ML, or deep learning models [21–23]. We used simulation data from CPFD in a spouted bed to train the data-based prediction models. Fig. 2 shows the computational domain and structured grid of the spouted-bed reactor [24].

In the fast pyrolysis of biomass, the reaction temperature and gas residence time are important process variables [25]. Hence, biomass fast pyrolysis was simulated using the commercial code Barracuda VR 17.2.0 for different reaction temperatures and gas residence times. The CPFD model using the multiphase particle-in-cell (MP-PIC) method was used to simulate the gas–solid multiphase flow fields, accompanied by fast pyrolysis reactions. The fluid phase is described by the volume-averaged Navier–Stokes equation, while the particle momentum follows the MP-PIC method, which is a Lagrangian description of particle motion in terms of ordinary differential equations in bi-directional coupling with the fluid. In the present study, the Wen–Yu and Ergun drag models [26] were adopted to calculate the drag force on the particles. In addition, the particle normal stress model developed by Harris and Crighton [27] was adopted. A two-stage semi-global mechanism, including secondary tar cracking, was applied for the chemical reactions. The kinetic data of the reaction mechanism were measured using a newly developed micro-spouted-bed thermogravimetric analyzer [24], and the kinetics were applied to an Arrhenius-type model [24]. Table 1 lists the governing equations for the gas–solid multiphase flow used in the CPFD model.

For the CPFD simulation, the reaction temperature and inlet gas velocity ranged from 673 to 823 K and 4–7 m/s, respectively. Hence, the total dataset included 256 ML training cases. Before performing the main calculation for the training data, the CPFD results were compared with the experimental data to validate the CPFD models. As shown in Fig. 3, except for the gas yield, the CPFD results corresponded well with the experimental results. The difference in gas yield between the experiment and CPFD may come from experimental uncertainties and numerical errors. However, the purpose of this study was to develop a data-based prediction model for fast pyrolysis by coupling CPFD and ML, and the CPFD data could be applied as the training sets. In the future, the CPFD model will be optimized; then, the accuracy of the data-based model will be increased using the presently developed numerical methods. Fig. 4 shows the contours of the time-averaged mass concentration of tar with respect to the inlet gas velocity in the reactor. The dataset for ML was derived from the CPFD simulation results. The complete details of the CPFD modeling and the results can be found in a previous study [24].

2.2. Machine learning method: regression models

During biomass fast pyrolysis, CPFD considers the reaction temperature and gas residence time for the product as inputs to compute the ratios of tar, gas, and char and the outlet temperature. The scatter plot indicates that the computed outputs and input features have a relationship; therefore, ML and deep learning models can be utilized to predict the outputs. Because labeled data are available from the set of simulations, supervised ML and deep learning regression models are suitable for our study. We implemented and compared several types of ML regression models: LR, SVR, KNN, decision tree (DT), random forest (RF), and deep learning models with three hidden layers.

2.2.1. Linear regression
LR is the simplest and most common statistical technique for prediction modeling. LR utilizes a given dataset of independent variables with the corresponding dependent variables to provide a linear regression equation. LR determines the coefficient for each independent variable to induce the least residual error (i.e., the difference between the dependent variable value and the predicted value). The hypothesis function of LR with multiple variables is shown in Eq. (1), and the cost function for the parameter vector θ in R^(n+1) is given by Eq. (2), where θ is the coefficient vector, x_i represents the variables, m is the number of samples or datapoints, and y is the target value.
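As a concrete picture of the labeled dataset described in Section 2.1, the 256 CPFD cases span reaction temperatures of 673–823 K and inlet gas velocities of 4–7 m/s. The following is a minimal sketch; the uniform 16 × 16 grid spacing is an assumption for illustration only, since the exact case spacing is not stated here.

```python
# Hypothetical sketch of the 256-case CPFD parameter grid; the uniform
# 16 x 16 spacing is an illustrative assumption, not from the paper.
import itertools

temperatures = [673 + 10 * i for i in range(16)]           # 673 ... 823 K
velocities = [round(4.0 + 0.2 * i, 1) for i in range(16)]  # 4.0 ... 7.0 m/s

# Each (temperature, velocity) pair is one labeled CPFD training case
# whose outputs are the tar/gas/char yields and the outlet temperature.
cases = list(itertools.product(temperatures, velocities))
print(len(cases))  # 256
```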

Fig. 1. Conceptual framework of this study.


Fig. 2. Computational domain and grid system of spouted bed reactor [24].

Table 1
Governing equations for gas–solid multiphase flow in the CPFD model.

1. Fluid phase
- Continuity equation:
$\frac{\partial(\theta_f \rho_f)}{\partial t} + \nabla \cdot (\theta_f \rho_f u_f) = \delta \dot{m}_p$  (1)
- Momentum equation:
$\frac{\partial(\theta_f \rho_f u_f)}{\partial t} + \nabla \cdot (\theta_f \rho_f u_f u_f) = -\nabla p + F + \theta_f \rho_f g + \nabla \cdot (\theta_f \tau_f)$  (2)
- Particle–fluid interaction:
$F = -\iiint f \left\{ m_p \left[ D_p (u_f - u_p) - \frac{\nabla p}{\rho_p} \right] + u_p \frac{d m_p}{d t} \right\} d m_p \, d u_p \, d T_p$  (3)
- Energy equation:
$\frac{\partial(\theta_f \rho_f h_f)}{\partial t} + \nabla \cdot (\theta_f \rho_f h_f u_f) = \theta_f \left( \frac{\partial p}{\partial t} + u_f \cdot \nabla p \right) + \Phi - \nabla \cdot (\theta_f q) + \dot{Q} + S_h + \dot{q}_D$  (4)
- Species transport equation:
$\frac{\partial(\theta_f \rho_f Y_{f,i})}{\partial t} + \nabla \cdot (\theta_f \rho_f Y_{f,i} u_f) = \nabla \cdot (\rho_f D \theta_f \nabla Y_{f,i}) + \delta \dot{m}_{i,\mathrm{chem}}$  (5)

2. Solid phase
- Particle acceleration equation:
$A = \frac{d u_p}{d t} = D_p (u_f - u_p) - \frac{1}{\rho_p} \nabla p - \frac{1}{\theta_p \rho_p} \nabla \tau_p + g + \frac{\bar{u}_p - u_p}{\tau_D}$  (6)
- Particle normal stress equation [27]:
$\tau_p = \frac{P_s \, \varepsilon_p^{\gamma}}{\max\left[ \varepsilon_{cp} - \varepsilon_p, \; \theta (1 - \varepsilon_p) \right]}$  (7)

3. Drag model
- Gidaspow drag model [28]:
$D_p = \begin{cases} D_1, & \varepsilon_p < 0.75 \varepsilon_{CP} \\ (D_2 - D_1) \dfrac{\varepsilon_p - 0.75 \varepsilon_{CP}}{0.85 \varepsilon_{CP} - 0.75 \varepsilon_{CP}} + D_1, & 0.75 \varepsilon_{CP} \le \varepsilon_p \le 0.85 \varepsilon_{CP} \\ D_2, & \varepsilon_p > 0.85 \varepsilon_{CP} \end{cases}$  (8)
- Wen and Yu:
$D_1 = \frac{3}{8} C_d \frac{\rho_g |u_g - u_p|}{\rho_p r_p}$  (9)
$C_d = \begin{cases} \frac{24}{Re} \varepsilon_g^{-2.65}, & Re < 0.5 \\ \frac{24}{Re} \varepsilon_g^{-2.65} \left( 1 + 0.15 Re^{0.687} \right), & 0.5 \le Re \le 1000 \\ 0.44 \varepsilon_g^{-2.65}, & Re > 1000 \end{cases}$  (10)
$Re = \frac{2 \rho_g r_p |u_g - u_p|}{\mu_g}$  (11)
- Ergun:
$D_2 = 0.5 \left( \frac{C_1 \varepsilon_p}{\varepsilon_g Re} + C_2 \right) \frac{\rho_g |u_g - u_p|}{\rho_p r_p}$, where $C_1 = 180$, $C_2 = 2$  (12)

Fig. 3. Comparison of product yields between the CPFD data and experimental data [24].

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \cdots + \theta_n x_n$  (1)

$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$  (2)

LR in ML uses the gradient descent technique to find the coefficient vector θ that minimizes the cost function J(θ). The ordinary least squares (OLS) LR model is a type of linear least squares method for estimating the unknown parameters, the coefficient vector, of the linear regression equation in Eq. (1) by minimizing the cost function J(θ) in Eq. (2). Multicollinearity has to be considered for multiple-variable problems in LR; it indicates the existence of near-linear relationships among the independent variables. When multicollinearity exists in the independent variable dataset, the prediction of OLS LR may be unbiased, but the prediction variance would be large, and the predicted
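As a minimal illustration of Eqs. (1)–(2), the following sketch fits the coefficient vector θ by batch gradient descent on a toy dataset; the learning rate and iteration count are illustrative choices, not values from the paper.

```python
# Sketch: multivariable linear regression (Eq. (1)) trained by batch
# gradient descent on the cost function J(theta) of Eq. (2). Toy data only.
import numpy as np

def fit_linear_gd(X, y, lr=0.1, n_iter=5000):
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])     # prepend x0 = 1 for theta_0
    theta = np.zeros(n + 1)
    for _ in range(n_iter):
        residual = Xb @ theta - y            # h_theta(x^(i)) - y^(i)
        theta -= lr * (Xb.T @ residual) / m  # gradient of J(theta)
    return theta

# Toy check against a known linear target y = 2 + 3 * x1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 + 3.0 * X[:, 0]
theta = fit_linear_gd(X, y)                  # approaches [2, 3]
```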


Fig. 4. Contours of time-averaged tar mass concentration for different inlet velocities at a reaction temperature of 400 °C [24].

value would not be the true value. To reduce this type of error, a degree of bias should be added to the regression estimates. Ridge and Lasso are the two common regularized LR types that add a penalty to the cost function: Ridge LR adds a penalty equivalent to the sum of the squares of the coefficients, and Lasso LR adds the sum of the absolute values of the coefficients to the cost function. Eqs. (3) and (4) show the cost functions of Ridge and Lasso, respectively, where p is the number of independent variables and α is a regularization parameter. When α is zero, the cost function is equivalent to that of the OLS LR model. When α becomes large, it penalizes the coefficients more and reduces the model complexity and multicollinearity. A ridge model was selected for the comparison.

$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \alpha \sum_{j=1}^{p} \theta_j^2$  (3)

$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \alpha \sum_{j=1}^{p} |\theta_j|$  (4)

2.2.2. Support vector regression
SVM can also be used for regression, and SVR uses the same basic principles with some differences: the regression function should be as flat as possible, with a low prediction error [29]. For a given dataset of independent variables, SVR determines the linear function of Eq. (5) such that all values of the dependent variable lie within a given tolerance.

$y = w x + b$  (5)

SVR seeks a small w to make the line represented by Eq. (5) flat. Therefore, SVR attempts to find the minimum w that keeps all data points close to the learner function within a variation of ε. This problem can be formulated as Eqs. (6) and (7) for i = 1, …, n.

$\text{minimize} \quad \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} \left( \xi_i + \xi_i^* \right)$  (6)

$\text{subject to} \quad \begin{cases} y_i - w x_i - b \le \varepsilon + \xi_i \\ w x_i + b - y_i \le \varepsilon + \xi_i^* \\ \xi_i, \, \xi_i^* \ge 0 \end{cases}$  (7)

SVR utilizes C to indicate the trade-off between the flatness of the function and the number of deviations greater than the tolerance ε. Fig. 5 illustrates the linear function that SVR tries to find for a given dataset of independent variables, as described in Eqs. (5)–(7).

However, the SVR discussed above applies only to a linear dataset. For non-linear datasets, SVR uses a kernel function that transforms the data into a higher-dimensional feature space, turning a nonlinear dataset into a linear form to which linear separation can be applied. An optimized model for nonlinear SVR can be formulated using Eq. (8), where α_i and α_i^* are Lagrange multipliers and K(x_i, x) is the kernel.

$y = \sum_{i=1}^{N} \left( \alpha_i - \alpha_i^* \right) K(x_i, x) + b$  (8)

The radial basis function (RBF) is one of the most commonly used kernel functions, as shown in Eq. (9), where γ is a free parameter:

$K(x, x') = \exp\left( -\gamma \|x - x'\|^2 \right)$  (9)

Fig. 5. Approximated linear function of SVR with tolerance.
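Two of the pieces above can be sketched briefly, assuming a NumPy-only setup: the closed-form ridge solution (one standard way to minimize Eq. (3), shown here as an alternative to gradient descent) and the RBF kernel of Eq. (9). The α and γ values are illustrative, not the tuned values from this study.

```python
# Sketch: ridge regression via its normal equations and the RBF kernel
# K(x, x') = exp(-gamma * ||x - x'||^2) of Eq. (9). Toy data only.
import numpy as np

def ridge_fit(X, y, alpha=1e-3):
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])
    P = np.eye(n + 1)
    P[0, 0] = 0.0                    # do not penalize the intercept
    return np.linalg.solve(Xb.T @ Xb + alpha * P, Xb.T @ y)

def rbf_kernel(x, x2, gamma=0.125):
    d = np.asarray(x, float) - np.asarray(x2, float)
    return float(np.exp(-gamma * np.sum(d * d)))  # note the minus sign

# y = 2 + 3x; the small penalty only slightly shrinks the coefficients.
theta = ridge_fit(np.array([[0.0], [1.0], [2.0], [3.0]]),
                  np.array([2.0, 5.0, 8.0, 11.0]))
```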


The optimization parameters of SVR (RBF) are C and γ, which should be selected carefully.

2.2.3. K-nearest neighbor
KNN is also a supervised learning algorithm. Unlike other methods, KNN simply uses a training dataset to predict the result based on the outputs of the nearest neighbors, without generalizing the model. As KNN is known as instance-based or lazy learning, it finds the average of the values from the k nearest neighbors of the testing point. The weights are generated based on the distance. The Euclidean, Manhattan, and Minkowski distances are used as the metrics in the regression. Eqs. (10), (11), and (12) show the Euclidean, Manhattan, and Minkowski distances, respectively, for two points x and y with n features i. Minkowski with q = 1 is the Manhattan distance, and q = 2 is the Euclidean distance.

$\text{Euclidean distance} = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$  (10)

$\text{Manhattan distance} = \sum_{i=1}^{n} |x_i - y_i|$  (11)

$\text{Minkowski distance} = \left( \sum_{i=1}^{n} |x_i - y_i|^q \right)^{1/q}$  (12)

Based on the distance from the testing point, the k nearest neighbors are selected, and a prediction is made from the selected neighbors with different weights; a nearer neighbor contributes more (i.e., has a larger weight) than the others in the prediction. The k value is a key hyperparameter and should be determined carefully. A smaller k value results in greater variance or less stable results, whereas a larger k value results in higher bias or less precise results. Generally, an adaptive method, heuristics, or cross-validation is used to select an appropriate k value.

2.2.4. Decision tree
DT regression attempts to determine the structural patterns in data using tree structures. Instead of using the relationships of input and output features, it uses decision rules in the tree structure to predict the output based on the input features [30,31]. The root, internal, and terminal nodes are the main components of the decision tree structure. A root node is located at the top of the tree, the internal nodes branch out from the root, and a set of terminal nodes (leaves) is at the bottom. The decision starts from the root and follows the next internal nodes based on the probabilities obtained from the training process; a prediction is made upon reaching a terminal node. Classification and regression trees (CART), chi-square automatic interaction detection, Iterative Dichotomiser 3, and C4.5 are the four major algorithms used for decision tree modeling [30]. We used CART for regression because of its efficiency [31].

2.2.5. Random forest
RF utilizes multiple decision trees for prediction [32]. A single decision tree tends to overfit the training dataset, which induces a higher error on the testing dataset. To overcome this deficiency, RF computes and compares multiple decision trees to reduce the error on the testing dataset. The parameter in RF sets the number of decision trees for comparison. The greater the number of decision trees, the better the prediction; however, more decision trees require a longer computation time, which is a limitation.

2.2.6. Artificial and deep neural network
Deep neural networks (DNNs) are deep learning architectures that utilize neural networks. Deep learning is widely used in applications such as decision making, computer vision, natural language processing, and pattern recognition. The basic structure of a DNN follows an ANN inspired by biological neural networks [33]. The structure of an ANN is a collection of artificial neurons and their connections (i.e., synapses) organized into input, hidden, and output layers. Signals travel from one neuron to connected neurons, where each neuron has its own weight obtained from training. While ANNs have a single hidden layer, DNNs have multiple hidden layers, as shown in Fig. 6. As shown in Fig. 6(a), the ANN has an input layer with three input features, a second layer with two neurons, and an output layer with one output. DNNs, as shown in Fig. 6(b), have multiple hidden layers with sets of neurons that are fully connected from the input layer to the output layer. The number of hidden layers and the number of neurons in each hidden layer should be determined carefully to achieve optimal prediction.

2.3. Process modeling with multi-step reaction mechanism

A process simulation was performed for the fast pyrolysis system, and the results were used for comparison with the CPFD- and ML-based predictions. Numerous kinetic models have been developed for the fast pyrolysis of biomass. Among these models, the multi-step reaction mechanism for the fast pyrolysis reaction includes various biomass compositions. Furthermore, Lignin-C, Lignin-H, and Lignin-O reactions have been proposed for the fast pyrolysis reaction of lignin with complex structures. Multi-step mechanisms can clarify the yields of detailed chemical species, such as levoglucosan, acetic acid, and phenol. In the present study, three types of multi-step mechanisms, proposed by Ranzi [34], Corbetta [35], and Blondeau [36], were selected. The multi-step mechanisms of Ranzi and Corbetta differ in terms of the Arrhenius constants for the pre-exponential factor and activation energy in the primary reactions of cellulose, hemicellulose, and lignin. For these multi-step mechanisms, comprehensive kinetic schemes have been developed, implying that more than 2000 reactions occur in the gaseous phase; however, heterogeneous reactions were not considered in these schemes. Hence, Blondeau proposed a multi-step mechanism that includes secondary heterogeneous tar-cracking reactions. For reference, the mechanism of Blondeau was developed based on the Ranzi mechanism. The complete details of the process modeling and multi-step mechanisms can be found in a previous study [37]. Also, the two-stage, semi-global reaction mechanism used in the CPFD model was compared [24].

Fig. 6. Architecture of (a) artificial neural network and (b) deep neural network.
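The KNN regression of Section 2.2.3 can be sketched as follows; uniform neighbor weighting is assumed here for brevity, although the text notes that distance-based weights can be used.

```python
# Sketch: Minkowski distance (Eq. (12); q=1 Manhattan, q=2 Euclidean)
# and an unweighted KNN regression prediction for a single query point.
import numpy as np

def minkowski(x, y, q=2):
    d = np.abs(np.asarray(x, float) - np.asarray(y, float))
    return float(np.sum(d ** q) ** (1.0 / q))

def knn_predict(X_train, y_train, x, k=3, q=2):
    dists = [minkowski(xt, x, q) for xt in X_train]
    nearest = np.argsort(dists)[:k]           # indices of the k nearest
    return float(np.mean(np.asarray(y_train)[nearest]))

X_train = [[0.0], [1.0], [2.0], [10.0]]
y_train = [0.0, 1.0, 2.0, 10.0]
pred = knn_predict(X_train, y_train, [1.2], k=3)  # averages y at x = 0, 1, 2
```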


3. Results and discussion

Eight ML models were used in this study to predict the ratios of tar, gas, and char and the outlet temperature, and their results were compared. In this section, we discuss the hyperparameters of the selected ML-based prediction models and the training process, and we compare the results.

3.1. Performance of regression models

We used 256 datapoints from the CPFD simulation study, which computed the mass fractions of gas, tar, and char and the outlet temperature from two input features, reaction temperature and inlet gas velocity. We normalized the dataset to the range of 1–3 to avoid the influence of larger-valued input data on the prediction. For the comparison study, we selected the ridge model from linear regression, SVR (RBF), KNN, DT, RF, ANN, and DNN with two and three hidden layers to predict the gas, tar, and char contents and the outlet temperatures. First, we determined the hyperparameter(s) for each model that produced the best prediction by using a grid search: we defined the search range of each hyperparameter and evaluated the prediction results to find the best value. Alpha (α) in Ridge was searched in the range of 10 to 10^-7. In SVR, the search range of the cost was from 2^-5 to 2^16, and gamma ranged from 2^-15 to 2^4. The k value in KNN ranged from 1 to 50, the random number in DT ranged from 1 to 20, and the number of trees in RF ranged from 10 to 10,000. The number of neurons was searched in the range of 1–2 times the number of neurons in the previous layer; for example, if the number of input features was three, the search range of the first hidden layer was 1–6, and the search range of the second hidden layer was 1–12. Tables 2 and 3 show the selected hyperparameters for each model. The epoch and batch size in the ANN and DNN indicate the number of iterations and the number of samples propagated through the network, respectively. The chosen epoch was 500, with a batch size of 5.

Table 2
Hyperparameters for Ridge, SVR, KNN, DT, and RF.
  Ridge: α = 10^-3
  SVR: cost = 4, gamma = 0.125
  KNN: k = 3
  DT: n = 0
  RF: n = 1000

Table 3
Hyperparameters for ANN and DNN with 2 and 3 hidden layers.
  ANN: h1 = 3
  DNN (H = 2): h1 = 2, h2 = 3
  DNN (H = 3): h1 = 3, h2 = 3, h3 = 2

Using the selected hyperparameters, we split the dataset into two subsets: a training set and a test set. The model was trained using the training set, and the trained model was evaluated using both the training and test sets to detect underfitting or overfitting. The root mean square error (RMSE) is generally used to evaluate trained models. The RMSE is determined from the square root of the sum of the squared differences between the target and the prediction, as shown in Eq. (13), where y_i is the target and p_i is a prediction.

$RMSE = \sqrt{\sum_i \left( y_i - p_i \right)^2}$  (13)

The model suffers from an underfitting problem when the RMSE of the trained model is high. An overfitting problem can be seen when the test RMSE is high while the training RMSE is low. However, the model may be biased depending on the splitting of the training and test sets; therefore, validation is required to evaluate the model. There are two validation methods. One method, cross-validation, splits the dataset into three folds: training, testing, and validation. Generally, 60% of the dataset is used for training, 20% for testing, and the remaining 20% for validation. The other method is K-fold cross-validation, which randomly splits the dataset into K subsets and evaluates the model K times. At each iteration, a different subset from the K groups is selected as the test set, and the remaining K − 1 subsets act as the training set. The training and test RMSEs are computed at each iteration and averaged for the model evaluation. In this study, we used K-fold cross-validation with K = 10 to evaluate the models. In addition to the average training and test RMSEs, we also computed the minimum, mean, and maximum error rates from the test set to compare the models. The error rate for the ith element is shown in Eq. (14), where y_i and p_i are the target and prediction of the ith element, respectively.

$\text{Error rate of the } i\text{-th element} = \frac{|y_i - p_i|}{y_i}$  (14)

The models were compared using the average training and test RMSEs and the averages of the minimum, mean, and maximum error rates from 10-fold cross-validation. Table 4 lists the steps of the evaluation process.

Table 4
K-fold cross-validation process with K = 10.

  K = 10
  Split the data into K groups: X = {X1, X2, X3, …, XK}, Y = {Y1, Y2, Y3, …, YK}
  Train_error[], Test_error[]
  Error_mean[], Error_min[], Error_max[]
  for i = 1 to K
      Fit the model with the train set {X − {Xi}}
      Find the train prediction set (Ptrain) using the train set
      Find the test prediction set (Ptest) using the test set Xi
      Compute the train RMSE from (Ptrain, {Y − {Yi}})
      Compute the test RMSE from (Ptest, Yi)
      Compute the error set from (Ptest, Yi)
      Train_error[i] = train RMSE
      Test_error[i] = test RMSE
      Error_min[i] = min(Error)
      Error_mean[i] = mean(Error)
      Error_max[i] = max(Error)
  end for
  Average train error = mean(Train_error[])
  Average test error = mean(Test_error[])
  Average minimum error = mean(Error_min[])
  Average mean error = mean(Error_mean[])
  Average maximum error = mean(Error_max[])

3.2. Comparison of regression models

In this section, we compare the predicted yields of gas, tar, and char and the outlet temperature, the average training and test RMSEs, and the averages of the minimum, mean, and maximum error rates from 10-fold cross-validation for the eight regression models. In each model fitting with validation, the RMSEs of the training and test datasets were computed, and the minimum, mean, and maximum error rates were computed from the test dataset using the fitted model.

The average training and test RMSEs from the 10-fold cross-validations for the gas, tar, and char yields and the outlet temperatures are shown in Figs. 7, 9, 11, and 13, respectively. According to the results, KNN, DT, and RF outperform the other models, while ANN and DNN are the worst in both training and test RMSE. The DT model fits the training dataset best in all cases, whereas the trained RF model predicts best on the test set. In most cases, KNN presents a larger RMSE for the training dataset than DT or RF, but its RMSE on the test set is better than that of DT (Figs. 8, 10, 12, 14).

In terms of error rates, the ANN and DNN produced relatively higher mean and maximum error rates. Three models, KNN, DT, and RF, did not exceed a maximum error rate of 5%, while RF showed the lowest maximum error rate of 2.96%. Deep learning models, such as ANN and DNN, are fully connected networks, with the neurons between layers being fully connected, and the weights of the links between neurons are trained and optimized from the dataset.

Therefore, deep learning generally requires a larger dataset for good prediction modeling compared with the machine learning models. Owing to the limited size of the dataset in this study, the ANN and DNN do not perform as well as the machine learning models. It is noted that the deep learning models may perform better and become meaningful as the data keep accumulating.

Fig. 7. Comparison of train and test RMSE in gas prediction.
Fig. 8. Comparison of error rates in gas prediction.
Fig. 9. Comparison of train and test RMSE in tar prediction.
Fig. 10. Comparison of error rates in tar prediction.
Fig. 11. Comparison of train and test RMSE in char prediction.

3.2.1. Gas yield
Comparing the RMSEs of all models, it was found that KNN, DT, and RF outperform the other models, while KNN converges the best (i.e., shows the smallest difference between training and test RMSE). The worst RMSE was produced by the DNN with two hidden layers in both the training and test datasets.

Considering the error in prediction, RF outperforms all other models, with the lowest minimum, average, and maximum error rates. Only the trained KNN, DT, and RF models predicted the mass fraction of gas with a maximum prediction error of less than 5%. The DNN with two hidden layers had the largest maximum error rate of 46%, as expected from the RMSE analysis.

3.2.2. Tar yield
The RMSE results are similar to those of the gas yield, and the three lowest training and test RMSEs were those of KNN, DT, and RF. DT performs the best model fitting with the training dataset, whereas RF shows the lowest RMSE in model fitting with the test dataset. ANN and DNN


showed the highest mean and maximum error rates, as observed for the gas yield. Additionally, a maximum error rate of less than 5% is observed for KNN, DT, and RF, with RF producing the lowest error.

3.2.3. Char yield
For the char yield, the RMSEs of the training and test sets are similar to those for the gas and tar yields. The best-fitting model for the training dataset was DT, whereas the best-fitting model for the test dataset was RF. In terms of error rates, KNN, DT, and RF all had low minimum, mean, and maximum error rates. However, none of the models achieved a maximum error rate below 5%; the minimum error rate of the RF model was the lowest, at 5.79%.

Fig. 12. Comparison of error rates in char prediction.

3.2.4. Outlet temperature
The test and training RMSEs indicated that the KNN, DT, and RF models fit the outlet temperature better than the other models. DT had the best fit for the training dataset, whereas RF had the best fit for the test dataset, and the DT and RF models produced the lowest error rates.

Fig. 13. Comparison of train and test RMSE in outlet temperature prediction.
Fig. 14. Comparison of error rates in outlet temperature prediction.

3.3. Comparison of prediction model results: CPFD, machine learning, and process analysis with a lumped model

Fig. 15 shows the yields of the fast pyrolysis products for each prediction model (CPFD, ML, and PA) over the reaction temperature range of 673–823 K at an inlet gas velocity of 4 m/s. KNN, DT, and RF, which showed high accuracy close to the CPFD values in Section 3.2, and the traditional LR ridge model were selected as comparison targets. As shown in Fig. 15(a), the ML models produced results similar to CPFD, whereas PA reproduced the CPFD tendencies only at reaction temperatures above 733 K. In CPFD, the tar yield increased from 55.6 wt% to 58.7 wt% until the reaction temperature reached 723 K; as the temperature increased further to 823 K, the tar yield decreased to 50.9 wt%. Apart from the LR_Ridge model, the ML models showed trends similar to CPFD, with an error rate of less than 1%. The LR_Ridge prediction did not follow the CPFD result but decreased monotonically from 59.7 wt% to 51.8 wt% with increasing reaction temperature. Considering PA, the multi-step mechanism of Corbetta, compared with that of Ranzi, resulted in higher biomass degradation at reaction temperatures above 753 K and, consequently, a higher tar yield. Moreover, the tar yield was almost constant for both mechanisms at reaction temperatures above 773 K; these two multi-step mechanisms cannot simulate heterogeneous tar-cracking reactions. The mechanism presented by Blondeau, which includes a secondary tar-cracking reaction, shows a peak in tar yield at a reaction temperature of 753 K and was the most similar to the CPFD data among the PA models. As the reaction temperature increased, the gas yield for the CPFD- and ML-based prediction models in Fig. 15(b) increased to slightly less than 30 wt%, whereas the char yield in Fig. 15(c) decreased slightly. For PA, the predicted gas and char yields show steep increases and decreases, respectively, as the reaction temperature rose above 703 K, resulting in a relatively higher error rate compared with the CPFD result. The two-stage semi-global model showed low reactivity compared with the other multi-step schemes, yielding correspondingly lower or higher product yields.

Figs. 16 and 17 show the product yields at specific reaction temperatures (753 and 823 K, respectively) with respect to the inlet gas velocity. A higher inlet gas velocity leads to a shorter residence time of the product gas inside the pyrolysis reactor: for the given reactor volume, the residence time decreased from 1 to 0.5 s as the inlet gas velocity increased from 4 to 7 m/s. In Figs. 16 and 17, for CPFD and ML, the tar yield increased by only approximately 5 wt% when the inlet gas velocity increased from 4 to 7 m/s. The tar yield at a reaction temperature of 823 K was lower than that at 753 K in all cases, because secondary tar cracking occurs more actively at higher reaction temperatures. The gas and char yields were almost unchanged. As evidenced by the CPFD and ML results, the product yields were influenced by the inlet gas velocity but not by the reaction temperature. Ridge showed lower accuracy than the other methods, particularly for the gas yield; furthermore, a higher reaction temperature led to lower accuracy, as shown in Figs. 16(b) and 17(b). In PA, the tar yield decreased as the inlet gas velocity increased, in contrast to the CPFD and ML results. However, as in CPFD and ML, the gas yield decreased and the char yield increased with increasing inlet gas velocity, although the error rates were larger. Among the PA models, the mechanism proposed by Blondeau was again the most similar to the CPFD results, with an average error rate of approximately 5.4%. The two-stage semi-global model again showed lower reactivity than the other multi-step reaction schemes.
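The residence times quoted above follow from the ideal plug-flow estimate τ = V/(u·A), i.e., reactor volume divided by volumetric flow rate. The sketch below uses hypothetical reactor dimensions chosen only to illustrate the trend; it is not the actual spouted-bed geometry of this study, which gives the slightly shorter 0.5 s reported at 7 m/s.

```python
import math

def residence_time(reactor_volume_m3, inlet_velocity_m_s, inlet_area_m2):
    """Ideal gas residence time: reactor volume / volumetric flow rate."""
    volumetric_flow = inlet_velocity_m_s * inlet_area_m2  # m^3/s
    return reactor_volume_m3 / volumetric_flow

# Hypothetical geometry scaled so that u = 4 m/s gives tau = 1 s
area = math.pi * 0.05 ** 2   # inlet cross-section, 10 cm diameter (assumed)
volume = 4.0 * area          # reactor volume implied by tau(4 m/s) = 1 s
print(residence_time(volume, 4.0, area))  # 1.0 s
print(residence_time(volume, 7.0, area))  # ~0.57 s
```

The inverse proportionality between velocity and residence time is what links the 4–7 m/s sweep to the shorter gas-phase reaction times discussed here.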


Fig. 15. Yields of fast pyrolysis products for various prediction models with respect to the reaction temperature (Inlet gas velocity: 4 m/s).
Fig. 16. Yields of fast pyrolysis products for various prediction models with respect to the inlet gas velocity (Reaction temperature: 753 K).

4. Conclusion

In this study, we proposed a new numerical method that couples ML and CPFD. In particular, a high-fidelity CPFD dataset was created and utilized to train, test, and evaluate the ML models, which were used to predict the product yields of biomass fast pyrolysis in a spouted bed. Finally, the results of the ML prediction models were compared with those of the CPFD and PA lumped models for all products. Our comparison study indicates that several ML models produce highly accurate predictions: DT and Ridge had the highest and lowest accuracies of 99.99% and 96.71%, respectively, while RF and KNN had accuracies of 99.77% and 99.27%, respectively. However, considering accuracy, deep
learning showed relatively higher mean and maximum error rates because of insufficient data. There are some limitations to generalizing the results of this study. First, because the sample size of the data affects the accuracy of ML, comparing the accuracy of ML models trained on a larger number of samples is necessary in subsequent studies. Nevertheless, the results of this study showed that the ML-based models predicted the CPFD results with high accuracy. More importantly, the ML models can significantly reduce the computation time required for the same conditions; for example, in aerodynamic analysis, a previous study reduced the computation time by about 150 times compared with the equivalent CFD calculation [21]. Beyond computation time, CPFD also requires considerably more effort, cost, and experience, whereas a data-based approach using ML can lower these barriers.

CRediT authorship contribution statement

Tae-Hoon Kim: Conceptualization, Methodology, Software, Validation, Formal analysis, Data curation, Writing – original draft, Visualization. Myung Kyu Choi: Methodology, Software, Validation, Writing – original draft, Writing – review & editing. Hang Seok Choi: Conceptualization, Resources, Writing – review & editing, Supervision, Project administration, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This study was carried out with the support of the ‘R&D Program for Forest Science Technology (Project No. 2021356A00-2123-AC03)’ provided by the Korea Forest Service (Korea Forestry Promotion Institute). This work was also supported by the Korea Institute of Energy Technology Evaluation and Planning (KETEP) and the Ministry of Trade, Industry and Energy (MOTIE) of the Republic of Korea (No. 20173010092430).

References

[1] P. McKendry, Energy production from biomass (part 1): overview of biomass,
Bioresour. Technol. 83 (1) (2002) 37–46.
[2] M. Balat, Biomass energy and biochemical conversion processing for fuels and
chemicals, Energy Sources Part A 28 (6) (2006) 517–525.
[3] H. Long, X. Li, H. Wang, J. Jia, Biomass resources and their bioenergy potential
estimation: a review, Renew. Sustain. Energy Rev. 26 (2013) 344–352.
[4] J. François, L. Abdelouahed, G. Mauviel, F. Patisson, O. Mirgaux, C. Rogaume,
A. Dufour, Detailed process modeling of a wood gasification combined heat and
power plant, Biomass Bioenergy 51 (2013) 68–82.
[5] M.J. San José, M. Olazar, F.J. Peñas, J.M. Arandes, J. Bilbao, Correlation for
calculation of the gas dispersion coefficient in conical spouted beds, Chem. Eng.
Sci. 50 (13) (1995) 2161–2172.
[6] M. Olazar, R. Aguado, J. Bilbao, A. Barona, Pyrolysis of sawdust in a conical
spouted-bed reactor with a HZSM-5 catalyst, AIChE J. 46 (5) (2000) 1025–1033.
[7] R. Aguado, M. Olazar, M.J. San José, G. Aguirre, J. Bilbao, Pyrolysis of sawdust in a
conical spouted bed reactor. Yields and product composition, Ind. Eng. Chem. Res.
39 (6) (2000) 1925–1933.
[8] G. Elordi, G. Lopez, R. Aguado, M. Olazar, J. Bilbao, Catalytic pyrolysis of high
density polyethylene on a HZSM-5 zeolite catalyst in a conical spouted bed reactor,
Int. J. Chem. React. Eng. 5 (2007) 1–14.
[9] A.R. Fernandez-Akarregi, J. Makibar, G. Lopez, M. Amutio, M. Olazar, Design and
operation of a conical spouted bed reactor pilot plant (25 kg/h) for biomass fast
pyrolysis, Fuel Process. Technol. 112 (2013) 48–56.
[10] H. Ghoddusi, G.G. Creamer, N. Rafizadeh, Machine learning in energy economics
and finance: a review, Energy Econ. 81 (2019) 709–727.
Fig. 17. Yields of fast pyrolysis products for various prediction models with respect to the inlet gas velocity (Reaction temperature: 823 K).

[11] Y. Yu, Z. Ye, Healthcare data-based prediction algorithm for potential knee joint injury of football players, J. Healthc. Eng. 2021 (2021).
[12] F. Al-Obeidat, B. Spencer, O. Alfandi, Consistently accurate forecasts of temperature within buildings from sensor data using ridge and lasso regression, Future Gener. Comput. Syst. 110 (2020) 382–392.
[13] G. Manogaran, D. Lopez, A survey of big data architectures and machine learning algorithms in healthcare, Int. J. Biomed. Eng. Technol. 25 (2–4) (2017) 182–211.
[14] A. Aybar-Ruiz, S. Jiménez-Fernández, L. Cornejo-Bueno, C. Casanova-Mateo, J. Sanz-Justo, P. Salvador-González, S. Salcedo-Sanz, A novel grouping genetic algorithm–extreme learning machine approach for global solar radiation prediction from numerical weather models inputs, Sol. Energy 132 (2016) 129–142.
[15] C. Yin, Y. Zhu, J. Fei, X. He, A deep learning approach for intrusion detection using recurrent neural networks, IEEE Access 5 (2017) 21954–21961.


[16] D. Borthakur, H. Dubey, N. Constant, L. Mahler, K. Mankodiya, Smart fog: fog computing framework for unsupervised clustering analytics in wearable internet of things, in: Proceedings of the 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP), IEEE, 2017, pp. 472–476.
[17] M. Di Capua, E. Di Nardo, A. Petrosino, Unsupervised cyber bullying detection in social networks, in: Proceedings of the 23rd International Conference on Pattern Recognition (ICPR), IEEE, 2016, pp. 432–437.
[18] E.E. Ozbas, D. Aksu, A. Ongen, M.A. Aydin, H.K. Ozcan, Hydrogen production via biomass gasification, and modeling by supervised machine learning algorithms, Int. J. Hydrog. Energy 44 (32) (2019) 17260–17268.
[19] I. Monroy, E. Guevara-López, G. Buitrón, A mechanistic model supported by data-based classification models for batch hydrogen production with an immobilized photo-bacteria consortium, Int. J. Hydrog. Energy 41 (48) (2016) 22802–22811.
[20] J.K. Whiteman, E.G. Kana, Comparative assessment of the artificial neural network and response surface modelling efficiencies for biohydrogen production on sugar cane molasses, BioEnergy Res. 7 (1) (2014) 295–305.
[21] V. Sekar, Q. Jiang, C. Shu, B.C. Khoo, Fast flow field prediction over airfoils using deep learning approach, Phys. Fluids 31 (5) (2019) 057103.
[22] J. Tian, C. Qi, Y. Sun, Z.M. Yaseen, B.T. Pham, Permeability prediction of porous media using a combination of computational fluid dynamics and hybrid machine learning methods, Eng. Comput. (2020) 1–17.
[23] H. Bazai, E. Kargar, M. Mehrabi, Using an encoder-decoder convolutional neural network to predict the solid holdup patterns in a pseudo-2d fluidized bed, Chem. Eng. Sci. 246 (2021) 116886.
[24] H.C. Park, H.S. Choi, Fast pyrolysis of biomass in a spouted bed reactor: hydrodynamics, heat transfer and chemical reaction, Renew. Energy 143 (2019) 1268–1284.
[25] A.V. Bridgwater, Review of fast pyrolysis of biomass and product upgrading, Biomass Bioenergy 38 (2012) 68–94.
[26] C.Y. Wen, Mechanics of fluidization, Chemical Engineering Progress Symposium Series, Vol. 62, 1966, pp. 100–111.
[27] S.E. Harris, D.G. Crighton, Solitons, solitary waves, and voidage disturbances in gas-fluidized beds, J. Fluid Mech. 266 (1994) 243–276.
[28] D. Gidaspow, Multiphase Flow and Fluidization: Continuum and Kinetic Theory Descriptions, Academic Press, 1994.
[29] A.J. Smola, B. Schölkopf, A tutorial on support vector regression, Stat. Comput. 14 (3) (2004) 199–222.
[30] H. Saito, D. Nakayama, H. Matsuyama, Comparison of landslide susceptibility based on a decision-tree model and actual landslide occurrence: the Akaishi Mountains, Japan, Geomorphology 109 (3–4) (2009) 108–121.
[31] B. Pradhan, A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS, Comput. Geosci. 51 (2013) 350–365.
[32] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32.
[33] Y. Bengio, Learning deep architectures for AI, Found. Trends Mach. Learn. 2 (1) (2009) 1–127.
[34] E. Ranzi, P.E.A. Debiagi, A. Frassoldati, Mathematical modeling of fast biomass pyrolysis and bio-oil formation. Note I: kinetic mechanism of biomass pyrolysis, ACS Sustain. Chem. Eng. 5 (4) (2017) 2867–2881.
[35] M. Corbetta, S. Pierucci, E. Ranzi, H. Bennadji, E. Fisher, Multistep kinetic model of biomass pyrolysis, in: Proceedings from the XXXVI Meeting of the Italian Section of the Combustion Institute, 2013.
[36] J. Blondeau, H. Jeanmart, Biomass pyrolysis at high temperatures: prediction of gaseous species yields from an anisotropic particle, Biomass Bioenergy 41 (2012) 107–121.
[37] M.K. Choi, H.C. Park, H.S. Choi, Comprehensive evaluation of various pyrolysis reaction mechanisms for pyrolysis process simulation, Chem. Eng. Process. Process Intensif. 130 (2018) 19–35.
