
Knowledge-Based Systems 289 (2024) 111481


A self-organizing deep network architecture designed based on LSTM network via elitism-driven roulette-wheel selection for time-series forecasting
Kun Zhou a, Sung-Kwun Oh b,c,d,*, Witold Pedrycz e,f,g, Jianlong Qiu a,h, Kisung Seo c

a School of Automation and Electrical Engineering, Linyi University, Linyi, Shandong 276000, China
b School of Electrical & Electronic Engineering, The University of Suwon, 17 Wauan-gil, Bongdam-eup, Hwaseong-si, Gyeonggi-do 18323, South Korea
c Department of Electronic Engineering, Seokyeong University, Seoul 02713, South Korea
d Research Center for Big Data and Artificial Intelligence, Linyi University, Linyi 276005, China
e Department of Electrical & Computer Engineering, University of Alberta, Edmonton, AB T6R 2V4, Canada
f Systems Research Institute, Polish Academy of Sciences, Warsaw 00-901, Poland
g Faculty of Engineering and Natural Sciences, Department of Computer Engineering, Istinye University, Sariyer, Istanbul, Turkiye
h Key Laboratory of Complex Systems and Intelligent Computing in University of Shandong, Linyi University, Linyi, Shandong 276000, China

ARTICLE INFO

Keywords:
Fuzzy polynomial neural networks (FPNN)
Long short-term memory network (LSTM)
Fuzzy c-means (FCM) clustering
Elitism-driven roulette-wheel selection (E_RWS)
Time-series forecasting

ABSTRACT

In this study, we propose a new self-organizing deep network architecture of fuzzy polynomial neural networks (FPNN) based on fuzzy rule-based polynomial neurons (FPNs) and a long short-term memory (LSTM) network to solve the task of time-series forecasting. With the existing regression models based on polynomial neural networks (PNN), it is difficult to achieve high-quality performance when predicting time-series data, because such models lack the ability to extract temporal and spatial information. Therefore, we propose a new architecture consisting of one LSTM (temporal) layer and several fuzzy polynomial (spatial) layers to overcome these shortcomings of PNN and enhance its predictive ability. The temporal layer consists of LSTM neurons, which have inherently strong modeling capabilities for learning sequential information. The spatial layers are composed of fuzzy rule-based polynomial neurons (FPNs), which can effectively reflect the complex nonlinear structure found in the input space and granulate it using the fuzzy C-means (FCM) clustering method. An elitism-driven roulette-wheel selection (E_RWS) is used to select appropriate neurons. E_RWS not only ensures that the neurons with the strongest fitting ability are selected but also increases the diversity of candidate neurons. According to the experimental results, the proposed model has high prediction performance and outperforms many state-of-the-art prediction methods when applied to real-world time series.

1. Introduction

Artificial Neural Networks (ANNs) are a category of important intelligent algorithms that perform well in classification and prediction tasks such as image classification, text emotion classification and stock forecasting [1-3]. There are numerous ANN models, such as fuzzy neural networks (FNN), radial basis function neural networks (RBFNN), polynomial neural networks (PNNs), and deep learning models including convolutional neural networks (CNNs), recurrent neural networks (RNNs) and LSTM.

As a part of advanced and extended neural networks, PNN was proposed to deal with prediction problems due to its effective nonlinear fitting ability [4]. PNN is one of the variants of the Group Method of Data Handling (GMDH), a popular method for determining high-order nonlinear relationships between the input and output space [5-7]. The most evident benefit of GMDH is its ability to handle high-dimensional data and approximate the nonlinear relationship present in input-output data, so it is widely applied in prediction modeling of complex nonlinear processes. Compared with other modified GMDH models, PNN exhibits a high degree of flexibility in its compositional structure, and the composition of each layer is determined by the learning process.

* Corresponding author at: School of Electrical & Electronic Engineering, The University of Suwon, 17 Wauan-gil, Bongdam-eup, Hwaseong-si, Gyeonggi-do 18323,
South Korea.
E-mail address: [email protected] (S.-K. Oh).

https://doi.org/10.1016/j.knosys.2024.111481
Received 27 February 2023; Received in revised form 19 December 2023; Accepted 4 February 2024
Available online 5 February 2024
0950-7051/© 2024 Published by Elsevier B.V.

A great number of improved approaches based on PNN have achieved excellent performance in prediction tasks. Oh and Pedrycz presented fuzzy polynomial neural networks (FPNNs), which combine fuzzy logic with PNN to create a novel fuzzy polynomial structure, and developed several extensions and optimizations of FPNNs [8-10]. Oh and Park et al. designed another regression model that combines radial basis functions and PNN [11]. Huang presented a new topology of PNN that combines fuzzy rules and wavelet neural networks [12]. After that, an improved model called the hybrid fuzzy wavelet neural network (HFWNN) was developed to remove the limitations of the PNNs and FWNNs from the previous study [13]. Zhang designed a novel reinforced hybrid model called RHFNN that achieves the best prediction accuracy on machine learning datasets [14]. Roh proposed a novel hierarchical structure in which the generation of layers is dynamic and multiple least-squares (MLS) support vector regression (SVR) is utilized as neurons [15].

The fuzzy-based models are well-known models for time-series forecasting. Lin et al. proposed an air quality prediction system based on the neuro-fuzzy network approach [16]: a four-layer fuzzy neural network is constructed from fuzzy clusters automatically selected from training data, and then PSO and BP algorithms are used to optimize the parameters. Inspired by kernel methods and support vector machines, Yuan et al. proposed a time-series prediction framework based on kernel mapping and high-order HFCM filtering [17]. Xu et al. presented a correlation-based neuro-fuzzy Wiener-type wind power forecasting model that uses special separate signals to decouple the dynamic linear and static nonlinear characteristics of wind power systems for high-accuracy wind power forecasting [18]. Mohammadi et al. proposed an accurate, precise, and efficient combined forecasting framework with ridge regression, high-order FCM (HFCM), and the empirical wavelet transform (EWT) [19]: EWT is used to transform non-stationary time series into multivariate sequences, and HFCM is used to model and predict the multivariate time series and help identify trend patterns. In [20], an assignment scheme based on K-nearest neighbors was proposed, which sets the membership degree according to the category attribute of the nearest-neighbor samples; the Gustafson-Kessel (GK) clustering algorithm is used to obtain the membership values of the input set of the type-1 fuzzy regression functions (T1FRFs) approach. Compared with traditional fuzzy neural networks (FNNs), IT2FNN has a stronger ability to address uncertainty and anti-disturbance problems. Wang et al. proposed an intelligent hybrid multivariable air quality forecasting system based on feature selection and a modified evolving interval type-2 quantum fuzzy neural network (eIT2QFNN) [21]. That study proposes a two-stage feature selection model based on the Pearson correlation coefficient (PCC) and relief-F, which can extract feature variables and remove redundant information, and a novel multi-objective chaotic Bonobo optimizer algorithm to improve the eIT2QFNN. Salimi-Badr et al. proposed a new type of shapeable interval type-2 fuzzy sets to construct a correlation-aware IT2FNN; this type of fuzzy set can take different shapes to better handle uncertainty [22]. Zhang et al. presented a non-stationary fuzzy neural network (NFNN) model by combining fuzzy inference systems and neural networks [23]. Similar to type-2 FNNs, the NFNN has the ability to directly handle uncertainties; however, it should be emphasized that an NFNN is simply a repetition of a type-1 FNN with slightly different instantiations of the MFs over time, and is therefore fundamentally different from the type-2 FNNs.

Recently, deep learning has emerged as a novel and promising approach that shows excellent performance in many fields for solving classification and prediction problems. CNN is a typical deep learning method that has unique superiority in image classification due to its appropriate feature extraction ability and nonlinear classification capability [24]. In prediction problems, LSTM is a modified RNN architecture that has attracted much attention for its sufficiency in capturing nonlinear trends and dependencies, and it has been applied to alleviate the problem of long-term dependency in time-series processing [25]. Currently, combining deep learning models with uncertainty estimation methods, such as fuzzy logic/fuzzy inference systems, is the most commonly used approach to construct predictive models. Li et al. proposed a pruning algorithm based on partial least squares (PLS) regression for a novel simplified LSTM neural network, which is applied to time-series forecasting [26]. Fan et al. proposed a hybrid forecasting method based on gray wolf optimization, variational mode decomposition, artificial gorilla optimization, a convolutional neural network, and a bidirectional long short-term memory network, namely the GWO-VMD-GTO CNN-BiLSTMS forecasting model [27]; the experiments prove that the proposed GWO-VMD-GTO CNN-BiLSTMS model is better than other models in terms of prediction accuracy. Qin et al. proposed the deep attention fuzzy cognitive maps (DAFCM), which are composed of spatiotemporal fuzzy cognitive maps (STFCM), a long short-term memory (LSTM) neural network, temporal fuzzy cognitive maps (TFCM) and residual structures [28]; DAFCM can make interpretable predictions of multivariate non-stationary long time series through deep learning. Ahmed et al. used a fusion of polylinear regression, LSTM, and data augmentation to predict time series [29]; the model can effectively predict future financial markets. Zheng and Zhang proposed a novel Dynamic Spatial-Temporal Adjacent Graph Convolutional Network (DSTAGCN), which connects the latest time slice with each past time slice to construct the spatiotemporal graph; experiments on public datasets show that the method outperforms baselines with fast convergence [30]. Moreover, time-series prediction methods based on fuzzy information granulation have attracted the attention of many scholars. Tang et al. built an LSTM-based trend fuzzy granulation to perform long-term prediction of financial data [31]. In [32], a multi-stage point-interval combined significant wave height forecasting system (CWHFs) based on the multi-objective grasshopper optimization algorithm and the fuzzy information granulation strategy is designed to forecast the half-hour actual wave height at different buoy locations. Li et al. used LSTM and fuzzy information granulation to construct a building load interval prediction method and improved the interval prediction accuracy [33].

However, there are some limitations and research gaps in the related works. The PNN-based models (Refs. [8-15]) rely on two basic assumptions when designing the new neurons: one is that there is no or only weak correlation between the input variables and that the relationship between the input variables remains unchanged, and the other is that these studies are mostly applied to small datasets. The model is severely constrained by these assumptions, which often prevent it from exhibiting the desired flexibility. Moreover, few studies have focused on a hybrid algorithm of PNN and LSTM applied to time-series forecasting. In addition, certain FNN-based models, as noted in [17,20,21,23], have shown good results but mainly on small datasets. In some hybrid methods, the network structure is fixed from the beginning and cannot be adapted to the situation [26-31].

Based on the analysis of the related time-series prediction works, it can be concluded that the prediction accuracy, the combination strategy of PNN and LSTM, the data processing capabilities and the ability of hybrid models to handle time-series prediction still need improvement and investigation. Therefore, the motivation of this study can be summarized as follows:

(1) In today's complex world, time-series forecasting is essential due to the prevalence of uncertainty and complexity in various daily applications, industries, and natural phenomena. Accurate time-series prediction is of great importance in finance, economy, environmental protection, agriculture, etc.
(2) The traditional PNN model does not address the prediction problem of time series. Therefore, research on time-series prediction using PNN is a work of theoretical significance.
(3) For PNN, elitism selection is used as the selection criterion of neurons to construct the network structure. It makes the selected neurons too similar, causing the training process to fall into a local optimum. Effective neuron selection is of great importance


for constructing the network structure and enhancing the generalization of prediction models.
(4) The effectiveness and efficiency of the model must not only be estimated on benchmark time-series datasets; its application in the real world also needs to be explored.

Therefore, we propose a self-organizing deep architecture consisting of a long short-term memory-based fuzzy polynomial neural network (LSTM-FPNN) built with the aid of elitism-driven roulette-wheel selection. LSTMs with a simple architecture are employed as the neurons of the first layer of the network. FPNs are utilized to replace the original polynomial neurons (PNs) as the basic neurons in the remaining layers. A hybrid selection strategy of elitism-driven roulette-wheel selection (E_RWS) is proposed for the selection of neurons during layer generation.

The contributions of this study can be outlined as follows:

(1) A novel neuron, designed to learn temporal features through deep learning using LSTM, serves as the basic neuron in the first layer. Notably, this is the first application of the LSTM mechanism to the PNN model, effectively addressing the challenge of long-range dependency. LSTM demonstrates its ability to capture extensive dependencies within sequential data without succumbing to the problems of exploding or vanishing gradients. As a result, it proves to be a suitable tool for capturing nonlinear trends and dependencies.
(2) In the second and subsequent layers, FPN is used as the basic neuron to capture nonlinear trends between the input and output variables of the model. FPN is a construct that combines fuzzy logic and PNN; it serves a dual purpose: it dynamically generates the model structure with adaptability and inherits complex nonlinear fitting capabilities. At the same time, FPN effectively conveys data uncertainty by using a clustering method (FCM), thus facilitating a clear representation of the relationships between data points.
(3) In the proposed LSTM-FPNN, a hybrid selection strategy known as E_RWS is developed to select neurons and construct the network. In conventional PNN, candidate neurons are selected by choosing the ones with the best performance. However, this approach can often result in the selection of neurons that are too similar, leading to the trapping of the optimization process in local optima. E_RWS addresses this problem by selecting neurons with higher predictive potential to increase network diversity, while transferring elite neurons to the next layer.
(4) The proposed LSTM-FPNN based on E_RWS was successfully applied to time-series data and outperformed many classical forecasting methods. Furthermore, this model has been effectively utilized to address real-world challenges, including the estimation of cement compressive strength and the prediction of key parameters in the activated sludge process.

The novelties of the proposed model are summarized below. A novel topology of LSTM-FPNN based on E_RWS is proposed for time-series prediction. LSTM and FPN are used as the basic neurons of the layers to capture the temporal features and the nonlinear relationship between inputs and outputs. E_RWS increases the flexibility of network construction and improves the generalization ability of the model. This innovative model addresses a critical limitation of the original PNN, namely its insensitivity to temporal and spatial information, and effectively fills the gap where the original PNN falls short in handling time-series data. The synergistic effect of LSTM, fuzzy rule-based PNN, and the hybrid selection strategy is exploited to construct a sound forecasting model with higher forecasting accuracy and reliability.

The study is organized as follows. In Section 2, we discuss our proposed LSTM-FPNN and the selection method of E_RWS. In Section 3, we introduce the learning paradigms used in our architecture. Section 4 describes the framework of our proposed model. The experimental setup and results, as well as the subsequent analysis, are reported in Section 5. Finally, Section 6 concludes the study.

2. Architecture of the long short-term memory based fuzzy polynomial neural network

The topology of LSTM-FPNN is constructed by involving LSTM, FPNN and E_RWS. LSTM neurons are employed due to their superiority in capturing features in time series. FPN neurons are used to capture the details of the data and represent the uncertainty involved in the data. E_RWS is designed to select the neurons during the generation of layers.

2.1. Structure of LSTM network neurons

LSTM is an extension of the RNN. The LSTM model can identify nonlinear trends in data and retain information from the past over an extended period of time, because it is based on memory and gates and uses nonlinear activation functions in each layer, making it suitable for learning long-term dependencies. The cell state, which functions like a conveyor belt carrying the information, is the key of LSTM. The advantage of LSTM is that three types of gates (input, forget, and output gates), as illustrated in Fig. 1, are carefully designed to protect and control the cell state. Let {x_1, x_2, ..., x_T} denote a typical input sequence for an LSTM, where x_t ∈ R^k represents a k-dimensional vector of real values at the t-th time step.

Fig. 1. Structure of LSTM cell.

Forget gate: it decides what information will be thrown away from the cell state and can be described as follows:

f_t = σ(W_f x_t + U_f h_{t-1} + b_f)   (1)

where f_t denotes the forgetting threshold, σ denotes the sigmoid activation function that converts an input value to a value between 0 and 1, x_t is the input value, W_f is the input weight, U_f is the recurrent weight, h_{t-1} is the output value at time t - 1, and b_f is the bias term.

Input gate: it decides which information will be stored in the cell state.

i_t = σ(W_i x_t + U_i h_{t-1} + b_i)   (2)

C̃_t = tanh(W_c x_t + U_c h_{t-1} + b_c)   (3)

where i_t denotes the input threshold; W_i and W_c are the corresponding input weights; U_i and U_c are the recurrent weights; b_i and b_c are the bias terms. Eq. (4) is used to update the cell state at time t, where C_{t-1} is the memory content at time t - 1.

C_t = f_t × C_{t-1} + i_t × C̃_t   (4)
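To make the cell mechanics concrete, the following minimal NumPy sketch implements one time step of the gates of Eqs. (1)-(4), together with the output gate and hidden-state update introduced next in Eqs. (5) and (6); the function and variable names, dimensions, and random initialization are our illustrative assumptions, not the authors' implementation.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, C_prev, W, U, b):
    # W, U, b each hold one weight set per gate: 'f' (forget), 'i' (input),
    # 'c' (candidate), 'o' (output), matching Eqs. (1)-(6).
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])       # Eq. (1)
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])       # Eq. (2)
    C_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])   # Eq. (3)
    C_t = f_t * C_prev + i_t * C_tilde                           # Eq. (4)
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])       # Eq. (5)
    h_t = o_t * np.tanh(C_t)                                     # Eq. (6)
    return h_t, C_t

# Toy usage: k = 3 input features, H = 4 hidden units, a sequence of 5 steps.
rng = np.random.default_rng(0)
k, H = 3, 4
W = {g: rng.normal(scale=0.1, size=(H, k)) for g in 'fico'}
U = {g: rng.normal(scale=0.1, size=(H, H)) for g in 'fico'}
b = {g: np.zeros(H) for g in 'fico'}
h, C = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, k)):
    h, C = lstm_cell_step(x_t, h, C, W, U, b)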


Output gate: it produces the output information and can be described as follows:

O_t = σ(W_o x_t + U_o h_{t-1} + b_o)   (5)

where O_t denotes the output threshold at time t, W_o denotes the input weight, U_o is the recurrent weight, and b_o is the bias term. The output value at time step t can be expressed as follows:

h_t = O_t × tanh(C_t)   (6)

The cell state is passed through tanh (to normalize values to the range between -1 and 1) and multiplied by O_t, where C_t is the cell state at time t. All the weights W, U and b must be determined by learning in the training process. The gating architecture of the LSTM allows it to handle both short- and long-term temporal correlations present in sequences.

The structure of an individual LSTM network is depicted in Fig. 2, which is composed of an input layer, an LSTM layer, a fully connected layer and an output layer. The proposed LSTM neuron (LSTMN) is more efficient and powerful in feature representation than the original PN, especially when extracting temporal features from sequential data to deal with the challenge of sequence prediction problems.

Fig. 2. Structure of the LSTM network.

2.2. Structure of fuzzy rule-based polynomial neurons (FPNs)

PNNs can be seen as dynamic models whose topology can be adjusted during the learning procedure [34]. FPNN is an advanced model of PNN, which fully captures the complicated nonlinear trends encountered in the data space by combining the advantages of fuzzy logic with PNN. The FPNN is composed of a number of layers that are implemented using FPN neurons. The fuzzification of FPN is implemented by using a clustering algorithm (CA). CA is a typical information granulation technique, which assigns each data point to a specific group based on similar characteristics or features between data points [35]. CA is a typical analysis tool that can be used to extract important structural properties and insights from data. The architecture of FPN is composed of premise, consequent and inference parts.

The FPN uses fuzzy clustering to define the premise part of the fuzzy rules. The number of clusters equals the number of fuzzy rules, and the degree of belonging to each cluster is used as the degree of the membership function (MF). Compared to traditional approaches, fuzzy clustering serves directly as the firing strength, so the parameters (e.g., widths and centers) of the membership function do not have to be calculated. The entries of the partition matrix (membership functions) are taken as the values of the firing strength.

First, the prototypes (cluster centers) are initialized, and then their values are updated through the iterative learning implemented by the FCM algorithm.

v_i = Σ_{k=1}^{N} u_{ik}^m x_k / Σ_{k=1}^{N} u_{ik}^m   (7)

where v_i ∈ R^n is the prototype of the ith cluster, x_k ∈ R^n denotes the input variables, m denotes a fuzzification coefficient (m > 1.0), N is the number of data samples, and u_{ik} is the membership function, which comes from FCM and also serves as the firing strength.

u_{ik} = 1 / Σ_{j=1}^{c} (‖x_k - v_i‖ / ‖x_k - v_j‖)^{2/(m-1)}   (8)

Here, ‖·‖ represents the Euclidean distance and c is the number of clusters. u_{ik} must satisfy the following constraints:

Σ_{i=1}^{c} u_{ik} = 1, 0 ≤ u_{ik} ≤ 1   (9)

and

0 < Σ_{k=1}^{N} u_{ik} < N, 1 ≤ i ≤ c   (10)
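The FCM iteration of Eqs. (7) and (8) can be condensed into a few lines of NumPy; the random prototype initialization, fixed iteration count, and function names below are our illustrative assumptions.

import numpy as np

def fcm(X, c, m=2.0, iters=100, seed=0):
    # X: (N, n) data matrix; returns prototypes v (c, n) and partition u (c, N).
    rng = np.random.default_rng(seed)
    v = X[rng.choice(len(X), size=c, replace=False)]   # initial prototypes
    for _ in range(iters):
        d = np.linalg.norm(X[None, :, :] - v[:, None, :], axis=2) + 1e-12
        # Eq. (8): u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        u = 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0)),
                         axis=1)
        # Eq. (7): v_i = sum_k u_ik^m x_k / sum_k u_ik^m
        um = u ** m
        v = (um @ X) / um.sum(axis=1, keepdims=True)
    return v, u

# Each column of u sums to 1 (Eq. (9)), so the entries u_ik can be used
# directly as the firing strengths of the fuzzy rules.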

Fig. 3. Structure of PNN: (a) Structure of PNN. (b) Description of PN.

Fig. 4. Structure of FPNN: (a) Structure of FPNN. (b) Description of FPN.

In the consequent part, a linear function, quadratic polynomial function or reduced quadratic polynomial function is used as the connection weights in place of the fixed (constant) weights, in order to enhance the capacity of FPN for situation adaptation.

Constant (Co):

f_i(x_k) = a_{i0}, (x_k ∈ R^n)   (11)

Linear function (LI):

f_i(x_k) = a_{i0} + Σ_{j=1}^{n} a_{ij} x_j, (x_k ∈ R^n)   (12)

Quadratic polynomial function (QP):

f_i(x_k) = a_{i0} + Σ_{j=1}^{n} a_{ij} x_j + Σ_{j=1}^{n} Σ_{s=1}^{n} a_{ijs} x_j x_s, (x_k ∈ R^n)   (13)

Reduced quadratic polynomial function (RQ):

f_i(x_k) = a_{i0} + Σ_{j=1}^{n} a_{ij} x_j + Σ_{j=1}^{n} Σ_{s=2}^{n} a_{ijs} x_j x_s, (j ≠ s, x_k ∈ R^n)   (14)

The inference part involves the fuzzy reasoning and mapping process. The membership functions and connections (f_i) are combined to produce the output ŷ_k(x_k) of the FPN.

ŷ_k(x_k) = Σ_{i=1}^{c} f_i(x_k) u_{ik}(x_k) / Σ_{r=1}^{c} u_{rk}(x_k) = Σ_{i=1}^{c} f_i(x_k) ũ_{ik}(x_k)   (15)

The structure of FPN can be expressed as a set of fuzzy rules:

R_{ik}: If x_k is A_{ki} with v_i then y_{ik} = f_i(x_k)   (16)

where R_{ik} denotes the ith rule, i = 1, 2, ..., c. The input variables are denoted by x_k, k = 1, 2, ..., n. A_{ki} denotes the membership function, and f_i represents the connection weights.

In our proposed model, FPN is used as the basic neuron to replace the original PN in the second and higher layers. The structures of PNN and FPNN are displayed in Figs. 3 and 4, respectively.

The detailed description of PN is shown in Fig. 3(b), where x_i and x_j stand for the outputs of the neurons in the previous layer, and a_0, a_1, a_2, a_3 are the coefficients of the polynomial. In Fig. 4(a), the FPNN is composed of several FPN neurons in each layer. The architecture of FPN is composed of premise, consequent and inference parts, as displayed in Fig. 4(b). In the premise part, "u_ik" represents the membership function produced by the CA algorithm. In the consequent part, "f_i" represents the connection weights, which can take four different types of polynomial functions. The output y of the inference part stands for the combination of the connections "f_i" and the membership functions "u_ik".

Fig. 5. Generation procedure of existing PNN.
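As a hedged illustration of how Eqs. (12) and (15) interact, the sketch below assembles an FPN with linear consequents: the FCM memberships act as firing strengths, and the rule coefficients are fitted with the least-squares method described later in Section 3.2. The design-matrix layout follows Eq. (32); the function names and the use of numpy.linalg.lstsq are our assumptions.

import numpy as np

def fpn_design_matrix(X, U):
    # X: (N, n) inputs; U: (c, N) normalized memberships from FCM.
    # Columns: [u_1k, ..., u_ck, x_1k*u_1k, ..., x_nk*u_ck], cf. Eq. (32).
    N, n = X.shape
    c = U.shape[0]
    cols = [U[i] for i in range(c)]
    cols += [X[:, j] * U[i] for i in range(c) for j in range(n)]
    return np.stack(cols, axis=1)            # shape (N, c*(n+1))

def fit_fpn(X, y, U):
    # Least-squares estimate of the consequent coefficients A, cf. Eq. (38).
    A, *_ = np.linalg.lstsq(fpn_design_matrix(X, U), y, rcond=None)
    return A

def fpn_predict(X, U, A):
    # Eq. (15) with linear f_i: y_hat = sum_i u_ik * (a_i0 + sum_j a_ij * x_jk).
    return fpn_design_matrix(X, U) @ A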


Fig. 6. Hybrid selection strategy of E_RWS being applied to LSTM/FPN layers of the proposed network.

Fig. 7. Overall structure of LSTM-FPNN. (a) Topology of LSTM-FPNN (b) Data processing (c) Elitism-driven roulette-wheel selection (d) Structure of LSTM network
(e) Structure of FPN (f) Learning method.

2.3. Hybrid selection strategy of elitism-driven roulette-wheel selection

In the generation procedure of conventional PNN, as illustrated in Fig. 5, the number of neurons increases in the deeper layers, which causes the computational load of the model to grow. In the first layer, the number of neurons (PN) is determined by the input variables: combinations of the initial input variables are taken into account when constructing each PN. The outputs of these PNs are utilized as the inputs fed to the second layer, and the construction of the higher layers follows the same process. The overall number of PNs in each layer is given below:

T = m(m - 1)/2! for K = 2;  T = m(m - 1)(m - 2)/3! for K = 3   (17)

Here K (set to 2 or 3) denotes the dimensionality of the inputs of a PN, and m is the original dimensionality of the inputs. The number of neurons increases with the number of layers. To reduce the computational overhead and the redundant information, it is necessary to select a few promising neurons from among all of them.

For PNN, the selection criterion for neurons is the best fitness on the training data (elitism selection): the neurons with the highest fitness in the current layer are selected and fed to the next layer as inputs. Although this method can make the network converge quickly, it may lead to two problems: 1) it makes the selected neurons too similar, causing the training process to fall into a local optimum; 2) it reduces the diversity of neurons in each layer, so some potentially better prediction neurons are discarded.

Currently, a variety of selection strategies, such as rank-based selection and the roulette wheel, are extensively utilized in genetic algorithms [36-37], and different selection strategies impact the forecasting results of the model differently. Inspired by those selection strategies, a hybrid selection strategy that combines the roulette-wheel selection algorithm and elitism selection is designed in our model. The hybrid selection strategy of E_RWS can alleviate the problem of getting stuck in a local optimum and increase the diversity of nodes, while ensuring that the best individuals are not lost in the transition from the current layer to the next layer. E_RWS consists of the following steps:

[Step 1] Estimate all candidate neurons of the current layer to obtain the fitness value (performance index) of all neurons. Sorting guarantees that the first n neurons in the population of candidate neurons are the best n neurons.

[Step 2] Calculate the sum of the fitness values of all neurons in the candidate set.

[Step 3] Calculate the selection probability of each neuron from its fitness value.

P_i = F_i / Σ_{j=1}^{m} F_j   (18)

where F_i stands for the fitness, P_i is the selection probability, and m denotes the number of neurons.

[Step 4] A random number (ranging from 0 to 1) is compared with the cumulative probabilities of the neurons until a value larger than it is obtained; the neuron corresponding to that value is then selected. Repeat this selection process until the preset number of candidate neurons has been met.

[Step 5] Merge the neurons obtained in Steps 1 and 4 and remove duplicate neurons. Fig. 6 illustrates the E_RWS selection strategy.

2.4. Structure of the LSTM-FPNN

The LSTM-FPNN is composed of two different types of neurons based on a deep learning method (LSTM), a clustering algorithm (FCM) and PNN. The architecture of the proposed LSTM-FPNN is outlined in Fig. 7, and it is composed of several dynamically generated layers. In the temporal layer, the LSTM network is designed to capture the temporal features of the time series. In the fuzzy layers, FPNN is utilized to flexibly approximate the relationship between input and output and to capture the uncertainty present in the data.

3. Learning techniques of LSTM-FPNN

3.1. Back-propagation (BP) through time algorithm for LSTM

The first layer of LSTM-FPNN is composed of LSTM neurons; each neuron represents an independent LSTM network. To train the LSTM network, the back-propagation-through-time algorithm with the mean squared error is employed [25,38].

The forward pass in an LSTM involves sequentially processing each time step of the input data, updating the hidden and cell states using the gates of the LSTM, and producing an output at each step.

In the backward pass, we propagate step by step by computing the gradients (δ) of the hidden state h_t and the cell state C_t.

ŷ_t = σ(V h_t + c)   (19)

δh_t = ∂L/∂h_t   (20)

δC_t = ∂L/∂C_t   (21)

where ŷ_t denotes the prediction output, L is the loss function, V is the weight and c is the bias. The variable L(t) allows us to express the following recursion:

L(t) = l(t) + L(t + 1) if t < τ;  L(t) = l(t) if t = τ   (22)

For the time slot t = τ:

δh_τ = (∂O_τ/∂h_τ)^T (∂L_τ/∂O_τ) = V^T (ŷ_τ - y_τ)   (23)

δC_τ = (∂h_τ/∂C_τ)^T (∂L_τ/∂h_τ) = δh_τ · O_τ · (1 - tanh²(C_τ))   (24)

Based on δC_{t+1} and δh_{t+1}, we calculate δh_t and δC_t:

δh_t = ∂l(t)/∂h_t + (∂h_{t+1}/∂h_t)^T (∂L(t + 1)/∂h_{t+1}) = V^T (ŷ_t - y_t) + (∂h_{t+1}/∂h_t)^T δh_{t+1}   (25)

According to Eqs. (4) and (6), we have:

∂h_{t+1}/∂h_t = diag[O_{t+1} · (1 - O_{t+1}) · tanh(C_{t+1})] W_o + diag[ΔC · f_{t+1} · (1 - f_{t+1}) · C_t] W_f + diag[ΔC · i_{t+1} · (1 - (C̃_{t+1})²)] W_c + diag[ΔC · C̃_{t+1} · i_{t+1} · (1 - i_{t+1})] W_i   (26)

ΔC = O_{t+1} · (1 - tanh²(C_{t+1}))   (27)

Based on δC_{t+1} and δh_t, we have:

δC_t = (∂C_{t+1}/∂C_t)^T (∂L/∂C_{t+1}) + (∂h_t/∂C_t)^T (∂L/∂h_t) = δC_{t+1} · f_{t+1} + δh_t · O_t · (1 - tanh²(C_t))   (28)

Based on δh_t and δC_t, we have:

∂L/∂W_f = Σ_{t=1}^{τ} [δC_t · C_{t-1} · f_t · (1 - f_t)] (h_{t-1})^T   (29)

W_f^{new} = W_f - η (∂L/∂W_f)   (30)

where η is the learning rate. The calculation of the other weights W_i, W_o, W_c, U_f, U_i, U_o, U_c, b_f, b_i, b_o, b_c is similar to that of W_f.
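Rather than re-deriving Eqs. (19)-(30), an implementation is often easiest to validate with a finite-difference check; the sketch below approximates ∂L/∂W_f numerically and applies the update of Eq. (30). It reuses lstm_cell_step from the sketch in Section 2.1, and the linear readout (in place of the sigmoid of Eq. (19)) and all names are our simplifying assumptions.

import numpy as np

def sequence_loss(xs, y, W, U, b, V, c_out):
    # Forward pass over the whole sequence; squared error on the last output.
    H = len(b['f'])
    h, C = np.zeros(H), np.zeros(H)
    for x_t in xs:
        h, C = lstm_cell_step(x_t, h, C, W, U, b)
    y_hat = V @ h + c_out          # simplified linear readout, cf. Eq. (19)
    return 0.5 * np.sum((y_hat - y) ** 2)

def numeric_grad_Wf(xs, y, W, U, b, V, c_out, eps=1e-6):
    # Central differences on every entry of W_f: approximates dL/dW_f.
    g = np.zeros_like(W['f'])
    for idx in np.ndindex(*W['f'].shape):
        W['f'][idx] += eps
        Lp = sequence_loss(xs, y, W, U, b, V, c_out)
        W['f'][idx] -= 2 * eps
        Lm = sequence_loss(xs, y, W, U, b, V, c_out)
        W['f'][idx] += eps
        g[idx] = (Lp - Lm) / (2 * eps)
    return g

# Gradient step of Eq. (30):  W_f <- W_f - eta * dL/dW_f
# W['f'] -= eta * numeric_grad_Wf(xs, y, W, U, b, V, c_out)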

Fig. 8. Overall design framework of the proposed LSTM-FPNN through hybrid selection technique and LSTM network.


Table 1
Parameters of the existing PNN and the proposed LSTM-FPNN with E_RWS.

                              Existing PNN                      LSTM-FPNN with E_RWS
                              1st layer   2nd & higher layers   1st layer   2nd & higher layers
No. of inputs for each node   2 (3)       2 (3)                 20          2 (3)
No. of layers                 1           5                     1           5
No. of neurons per layer      ≤20         ≤20                   ≤20         ≤20
Polynomial type               LI (QP)     LI (QP)               N/A         LI (QP)
Fuzzification coefficient     N/A         N/A                   N/A         2
No. of elitisms               N/A         N/A                   N/A         1~3
(LI: linear, QP: quadratic polynomial, N/A: not applicable).

Table 2
Experimental performance of the benchmark PNN and the LSTM-FPNN model for the MDT data (denoted by Mean±Std).

        Existing PNN with elitism selection     LSTM-FPNN with E_RWS
Layer   TPI             EPI                     TPI             EPI
1       2.4776±0.010    2.4852±0.041            2.3732±0.010    2.3842±0.057
2       2.4494±0.009    2.4571±0.038            2.3587±0.017    2.3771±0.057
3       2.4305±0.007    2.4396±0.025            2.3513±0.014    2.3779±0.064
4       2.4183±0.008    2.4326±0.028            2.3429±0.015    2.3759±0.059
5       2.4154±0.008    2.4318±0.029            2.3345±0.015    2.3839±0.062
6       2.4122±0.010    2.4319±0.032            2.3312±0.016    2.3849±0.057

In addition, the training data is divided into batches of many instances each, and the weights are updated after the completion of each batch. Adam is used as the optimization method in the training process of the network [39].

3.2. Least square error (LSE) method for FPN

The LSE method is utilized to adjust the weights of FPN. The loss function is as follows:

Loss(A) = Σ_{k=1}^{N} (y_k - Σ_{i=1}^{c} f_i(x_k) u_{ik}(x_k))² = ‖y - XA‖²   (31)

Taking the case where the connection weights use the linear function as an example, A, X and Y are expressed in the form

X = [ u_11 ⋯ u_c1   x_11u_11 ⋯ x_11u_c1 ⋯ x_n1u_c1
      u_12 ⋯ u_c2   x_12u_12 ⋯ x_12u_c2 ⋯ x_n2u_c2
      ⋮     ⋮       ⋮          ⋮          ⋮
      u_1N ⋯ u_cN   x_1Nu_1N ⋯ x_1Nu_cN ⋯ x_nNu_cN ]   (32)

A = [a_10 ⋯ a_c0 ⋯ a_c1 ⋯ a_cn]^T,  Y = [y_1 y_2 ⋯ y_N]^T   (33)

where N denotes the number of data, u_{ik}(x_k) is the firing strength (membership function) calculated according to FCM clustering, and f_i(x_k) are the connection weights, which take one of the four polynomial types of Eqs. (11)-(14).

To find A, we minimize the loss function:

min_A Loss(A) = (y - XA)′(y - XA) = y′y - y′XA - (XA)′y + (XA)′XA = y′y - 2y′XA + A′X′XA   (34)

The necessary condition for a minimum is:

∂Loss/∂A = -2X′y + 2X′XA = 0   (35)

X′y = X′XA   (36)

Premultiplying both sides by (X′X)^{-1} gives

(X′X)^{-1} X′y = (X′X)^{-1} X′XA   (37)

The weights can then be obtained as follows:

A = (X^T X)^{-1} X^T Y   (38)

4. Design framework of the LSTM-FPNN

In general, the design procedure of LSTM-FPNN involves the following steps, which are illustrated in Fig. 8.

[Step 1] Transform the time series into an n-variate time series.
A time series is a sequence of numbers ordered by a time index. To transform time-series data so that it can be used in supervised learning, data preprocessing is applied. Given a time series X = {x_1, x_2, ..., x_N}, transform X into an n-variate time series D = {d_1, d_2, ..., d_{N-n+1}}.
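A minimal sketch of this windowing step (names and interface are illustrative):

import numpy as np

def to_supervised(x, n):
    # x: 1-D series of length N  ->  rows d_t = (x_t, ..., x_{t+n-1});
    # the first n-1 entries of each row are inputs, the last is the target.
    N = len(x)
    D = np.stack([x[t:t + n] for t in range(N - n + 1)])
    return D[:, :-1], D[:, -1]

# e.g. with n = 21, as in the MDT experiment of Section 5, a series of
# 3650 points yields 3630 instances with 20 inputs and one output each.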

Fig. 9. Performance of the LSTM-FPNN determined by E_RWS method for NE = 2, C = 3, K = 2. (a) Training data. (b) Testing data.


Fig. 10. Performance of LSTM-FPNN for NE = 2, C = 3, K = 2. (a) Training data. (b) Testing data.

Fig. 11. Optimized topology of the LSTM-FPNN. (a) Topology of LSTM-FPNN with E_RWS (b) Input nodes of the 3rd layer of LSTM-FPNN selected through RWS/
Elite selection.

Table 3
Performance (RMSE) of the PNN and LSTM-FPNN models for predicting time series (M = 20).

        Benchmark model:                        Proposed model:
        Existing PNN with elitism selection     LSTM-FPNN with E_RWS
Data    K=2             K=3                     K=2             K=3
MDT     2.4318±0.029    2.4341±0.060            2.3759±0.059    2.3768±0.038
GE      0.5135±0.029    0.5108±0.020            0.5089±0.030    0.5112±0.037
JD      0.7665±0.013    0.7693±0.050            0.7677±0.039    0.7705±0.029
OC      51.951±1.057    55.543±3.042            34.508±2.938    32.291±3.215
TI      1.0726±0.028    1.0739±0.035            0.9832±0.055    0.9840±0.059
AQ      3.4375±0.094    3.6537±0.194            2.9800±0.173    2.9924±0.068
TB      0.9563±0.016    0.9722±0.017            0.9720±0.025    0.9113±0.023
AEP     0.1070±0.005    0.1068±0.005            0.0991±0.003    0.0967±0.003
PM25    41.175±0.516    40.844±1.358            37.772±1.001    37.128±0.670

Table 4
Prediction performance of the LSTM-FPNN model adopting two kinds of selection methods (M = 20).

        LSTM-FPNN with elitism selection        LSTM-FPNN with E_RWS
Data    K=2             K=3                     K=2             K=3
MDT     2.3780±0.043    2.3759±0.041            2.3759±0.059    2.3768±0.038
GE      0.5107±0.040    0.5117±0.039            0.5089±0.030    0.5112±0.037
JD      0.7691±0.073    0.7713±0.081            0.7677±0.039    0.7705±0.029
OC      32.556±3.923    33.368±3.553            34.508±2.938    32.291±3.215
TI      0.9869±0.038    0.9840±0.039            0.9832±0.055    0.9840±0.059
AQ      2.9956±0.051    2.9970±0.047            2.9800±0.173    2.9924±0.068
TB      0.9135±0.013    0.9123±0.012            0.9720±0.025    0.9113±0.023
AEP     0.0996±0.002    0.0970±0.003            0.0991±0.003    0.0967±0.003
PM25    37.699±1.630    37.429±1.248            37.772±1.001    37.128±0.670
Fig. 12. Comparison of the effect of the node selection strategy between the proposed selection (E_RWS) and the existing Elitism selection in the LSTM-FPNN to show
the superiority of convergence and output performance.
Transform X = [x_1, x_2, ..., x_N]^T into D = [d_1, d_2, ..., d_{N-n+1}]^T, where

d_1 = (x_1, x_2, ..., x_n), d_2 = (x_2, x_3, ..., x_{n+1}), ..., d_{N-n+1} = (x_{N-n+1}, x_{N-n+2}, ..., x_N).

The last column can be taken as the output value and the others as input values.

[Step 2] Split the data with k-fold cross-validation (k-fcv) and set the parameters.
The data are split into training and testing parts by k-fcv; here, the number of folds is set to 5. The parameters of the model, such as the number of hidden units in LSTM, the polynomial type of FPN, the number of clusters, etc., are determined.

[Step 3] Build the LSTM-FPNN model.
1) In the first layer, the original inputs are fed into several LSTM neurons. The training data is utilized to estimate the parameters (weights) of each LSTM neuron. According to the constructed LSTM neurons, the training and testing outputs of each neuron, as well as its errors with respect to the actual output, are calculated. The E_RWS method is used to select the different neurons among all the LSTM neurons.
2) In the second and subsequent layers, the input of the current layer is selected from the outputs of the previous layer by the E_RWS method. The input of the second layer has been obtained in 1), and the coefficients of the polynomial in the FPN are estimated by LSE. In the same way, we can select the inputs of the remaining layers (including the last layer) in sequence.

Algorithm 1. Elitism-driven roulette-wheel selection
Input: LSTM neurons or FPNs
Output: selected LSTM neurons or FPNs
1: Set the number of elitisms (NE) and the maximum number of neurons in each layer (MN).
2: Evaluate the fitness of all neurons in the previous layer.
3: Sort the neurons (LSTM or FPN) according to their fitness.
4: Select the top NE neurons with the best fitness.
5: If i < (MN-NE)
6:   Compute the sum of the fitness values of the remaining neurons (excluding the NE neurons).
7:   Calculate the cumulative selection values for each neuron.
8:   Compare a generated random number (0-1) with the cumulative fitness values of the neurons until a value larger than it is obtained; the neuron corresponding to this value is selected (SE).
9:   Remove the selected neuron from the remaining neurons.
10:  Repeat 6-9 until the number of neurons is equal to (MN-NE).
11: Else
12:  Combine the NE neurons and SE neurons of the current layer.
13: Repeat 2 to 12 until the last layer of the model is finished.
neurons. The training data is utilized to estimate the parameters 13: Repeat 2 to 12 until the last layer of the model is finished.
(weights) of each LSTM neuron. According to the constructed LSTM


Fig. 13. Performance (RMSE) according to different hidden units in LSTM.

[Step 4] Determine the termination condition.
After the current layer is generated, check whether the maximum number of layers has been reached, and record the result of the best neuron as the model output.

[Step 5] Report the prediction performance of LSTM-FPNN.
The proposed neural network is constructed with the use of the training data, and then the quality of the network is evaluated with the testing data. The training performance index (TPI) and testing performance index (EPI) of FPN are taken as the root mean square error (RMSE) computed for the training and testing data.

TPI(EPI) = sqrt((1/N) Σ_{n=1}^{N} (y_n - ŷ_n)²)   (39)

where y_n and ŷ_n represent the desired output and the prediction output, and N stands for the number of samples.

5. Experimental studies and discussion

To validate our LSTM-FPNN, we conduct experiments on nine time-series datasets coming from the UCI database and https://github.com/zhouk92/research.git [40]. The parameter settings of the proposed model are reported in Table 1. Data structure, model effectiveness, and complexity are also taken into consideration when selecting parameters. To evaluate the performance, we take the RMSE expressed in Eq. (39) as the evaluation metric. The performance index of training (TPI) and testing (EPI) is taken as the RMSE calculated for the training and testing data, respectively.

5.1. Datasets description and setting

The description of the datasets is as follows:
Dataset 1: Minimum Daily Temperatures (MDT). 3650 data of minimum daily temperatures from 1981 to 1990 in the city of Melbourne, Australia. The dataset shows a strong seasonal component with large fluctuations.
Dataset 2: Gree Electric Stock (GE). 4000 stock data of Gree Electric from December 31, 1999 to November 20, 2015.
Dataset 3: JD Stock (JD). 1394 stock data of JD from May 22, 2014 to December 3, 2019.
Dataset 4: Occupancy (OC). 17,600 occupancy data per half hour collected from car parks in Birmingham from October 4, 2016 to December 14, 2016.
Dataset 5: Temperatures in Italy (TI). 8991 hourly temperatures in an Italian city from March 10, 2004 to April 4, 2005.
Dataset 6: Air Quality (AQ). 9358 data (true hourly averaged NO2 concentration in microg/m^3) from an Air Quality Chemical Multisensor Device in Italy.
Dataset 7: Temperatures in Beijing (TB). 35,044 hourly temperatures in Beijing Olympic Sports Center, China from March 1, 2013 to March 20, 2015.
12
K. Zhou et al. Knowledge-Based Systems 289 (2024) 111481

Fig. 14. Performance (RMSE) according to different clusters.

Dataset 8: Appliances Energy Prediction (AEP). 19,735 appliances energy prediction data (windspeed) collected from a low-energy house.
Dataset 9: PM25. 31,892 air quality data with eight features collected to predict the severity of PM2.5 pollution from 2010 to 2015 in Beijing Olympic Sports Center, China.

A. Experiment 1: Minimum Daily Temperatures data
We take Dataset 1, the Minimum Daily Temperatures (MDT) data, as an example to introduce the implementation procedure of the LSTM-FPNN in detail and analyze it. In this experiment, n is set to 21; the other parameters are shown in Table 1.
Data transform: X_MDT = {x_1, x_2, ..., x_3650} is transformed to D_MDT = {d_1, d_2, ..., d_3630} = {(x_1, x_2, ..., x_21), (x_2, x_3, ..., x_22), ..., (x_3630, x_3631, ..., x_3650)}. The first 20 dimensions of each instance (d) are used as input and the last one as output.
First (LSTM) layer: the training data are fed into the LSTM layer, which includes several LSTM neurons. The TPI of the best LSTM is reported as the result of this layer. E_RWS is used to select the optimal neurons from the entire set of candidate neurons. The outputs of the selected neurons are used to construct the FPNs of the second layer.
Second or higher (FPNN) layer: the outputs of the first layer are fed into the FPNN layer, which includes several FPN neurons. The best of the neurons selected through E_RWS gives the result of the second layer, and the outputs of the others are also used as the inputs of the next layer. This process is repeated until the maximum layer is reached.
Evaluate testing data: according to the recorded neurons selected for each layer, redundant neurons are removed to obtain the optimal network. The testing data are fed into this optimal network to obtain the testing performance.

Table 2 provides comparative results of the PNN and the LSTM-FPNN model. The best performance index of testing (EPI) is highlighted in boldface.
The performance of the LSTM is better than that of the conventional PNN, as shown in Table 2. Comparing the first-layer results of PNN and LSTM-FPNN, the LSTM neuron improves the prediction accuracy due to its ability to capture temporal features. The results of the other layers show that the FPN neuron has a better fitting ability than PN, achieved by granulating the data with the help of a clustering algorithm.
Fig. 9 displays the performance of the LSTM-FPNN with the E_RWS method for the "MDT" data.
Fig. 10 shows the performance of the LSTM-FPNN with the E_RWS method, with NE (the number of elitisms) equal to 2, C (the number of clusters) = 3, L (the number of layers) = 6, and K (the number of input variables) = 2. The real output versus the model output is presented. In this case, TPI (RMSE) = 2.3429 and EPI (RMSE) = 2.3759.
Fig. 11 displays the topology generated by E_RWS in the proposed LSTM-FPNN.

B. Experiment 2: other time-series datasets
We have carried out experiments using nine time-series datasets (Datasets 1-9) to verify the superiority of the LSTM-FPNN. Table 3 covers the performance of the proposed model for predicting time series.
Compared with the existing PNN, the proposed LSTM-FPNN improves the prediction performance for most of the time-series datasets. These results indicate that the prediction results of the LSTM-FPNN are considerably improved compared to PNN.
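For reference, the evaluation protocol of Eq. (39) under 5-fold cross-validation can be sketched as follows; the fold assignment and the fit/predict callable interface are our illustrative assumptions, not the authors' code.

import numpy as np

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))        # Eq. (39)

def k_fold_scores(X, y, fit, predict, k=5):
    # Returns a (TPI, EPI) pair per fold for any model given fit/predict.
    idx = np.array_split(np.arange(len(y)), k)
    scores = []
    for f in range(k):
        te = idx[f]
        tr = np.concatenate([idx[j] for j in range(k) if j != f])
        model = fit(X[tr], y[tr])
        scores.append((rmse(y[tr], predict(model, X[tr])),
                       rmse(y[te], predict(model, X[te]))))
    return scores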


Fig. 15. RMSE of the proposed model according to different types of polynomial functions.

Table 5
Comparison of performance in different models (RMSE).
Dataset ARMA [47] MLP [48] SVR [48] RBFNN [48] LSTM [25] LR [48] Proposed LSTM-FPNN

MDT 2.747±0.07 2.680±0.04 2.421±0.03 2.417±0.04 2.384±0.05 2.417±0.03 2.375±0.05


GE 0.587±0.04 0.605±0.16 0.484±0.02 0.616±0.03 0.532±0.04 0.485±0.02 0.508±0.03
JD 0.845±0.03 0.857±0.02 0.781±0.04 0.833±0.05 0.786±0.04 0.779±0.04 0.767±0.03
OC 78.95±1.38 32.34±0.44 42.82±4.15 41.77±2.92 53.34±10.1 42.06±3.49 32.29±3.21
TI 1.592±0.02 1.338±0.29 1.071±0.04 1.091±0.04 1.014±0.05 1.068±0.04 0.984±0.05
AQ 4.384±0.18 3.629±0.46 3.894±0.17 3.827±0.21 3.041±0.08 3.811±0.18 2.980±0.17
TB 1.261±0.01 1.038±0.05 0.977±0.01 0.985±0.01 0.931±0.02 0.967±0.01 0.911±0.02
AEP 0.147±0.01 0.098±0.02 0.084±0.01 0.125±0.01 0.103±0.01 0.828±0.01 0.096±0.01
PM25 645.1±8.25 41.89±3.95 44.14±0.39 41.36±1.30 42.37±1.95 42.84±0.32 37.12±0.67
Average 81.73 9.38 10.74 10.36 11.61 10.58 8.67

This is because the LSTM-FPNN not only can learn the temporal features from time series but can also approximate complexly distributed data well. Therefore, the effectiveness of LSTM-FPNN and E_RWS is demonstrated.
To highlight the distinction between the conventional selection method and E_RWS, we compare the prediction performance of the two methods under the same conditions; the results are summarized in Table 4.
Table 4 shows that the results of LSTM-FPNN with E_RWS are better than those of LSTM-FPNN with elitism selection in most cases, which verifies the effectiveness of LSTM-FPNN with E_RWS. Besides, our method utilizes E_RWS, which further enhances the model's generalization. This is because E_RWS can select the neurons with more predictive potential.
Fig. 12 describes the detailed comparison of the effect of the node selection strategy between the proposed selection (E_RWS) and the existing elitism selection in the LSTM-FPNN, showing the superiority in convergence and output performance.
From Fig. 12, it can be seen that the experimental testing errors of LSTM-FPNN with E_RWS decrease as the number of layers increases. Compared with ES, the E_RWS method not only obviously improves the prediction ability of the model as well as the convergence during the growth of the network layers, but also effectively enhances the stability of the model architecture.

5.2. Computational complexity analysis

(1) Complexity analysis of E_RWS (Algorithm 1)


Fig. 16. The main flowchart of construction for Portland cement dataset.

Table 6
Comparative results of cement dataset with regression methods.
Dataset type Isotonic Regression Linear Regression MLP SVR RBF Regressor LSTM Proposed model

Cement 4.739±0.174 3.658±0.176 2.773±0.530 3.151±0.123 2.993±0.170 2.731±0.147 2.538±0.130

The E_RWS is a hybrid selection strategy that combines the roulette-wheel selection algorithm and elitism selection. The complexity of elitism selection is O(nlogn), while that of roulette-wheel selection is O(n) [41-42], where n is the number of candidate neurons. Therefore, the complexity of E_RWS is O(nlogn + n).
(2) Complexity analysis of the LSTM-FPNN
The LSTM-FPNN mainly consists of the LSTM network and the FPNN. For the LSTM network, according to [43-44], the computational complexity of training one LSTM is O((QH) + (QMcBs) + (HUf) + (McBsUf)) = O(W), where Mc denotes the number of memory cell blocks, Bs is the size of the blocks, Q denotes the number of output units, H is the number of hidden units, and Uf is the number of connected units. For the FPNN, the computational complexity of each FPN is O(tc²mN + m²N), where t is the number of iterations, c is the number of clusters, m denotes the dimension of the input values, and N denotes the number of samples [45-46]. Therefore, the complexity of the LSTM-FPNN is O(W + tc²mN + m²N).

5.3. Discussion of hyperparameters in LSTM-FPNN

This section studies the impact of three hyperparameters on training the LSTM-FPNN-based model: the number of hidden units (HU) in LSTM, the number of clusters (C) in FPN, and the type of polynomial (P). HU in LSTM corresponds to the amount of information remembered between time steps. C is related to the formation of fuzzy rules in FPN. P determines the adaptive ability of FPN in different scenarios.
Fig. 13 displays the performance (RMSE) of the LSTM-FPNN with different numbers of hidden units in LSTM. From Fig. 13(a)-(c) and (e), with an increase in the number of inputs, the RMSE of LSTM-FPNN decreases slowly at first and then levels off once HU exceeds 20 in the training process; the testing process exhibits the same variation trend. From Fig. 13(d) and (f), it can be seen that the RMSE of LSTM-FPNN decreases with an increase in the number of hidden units. While more hidden units improve the performance, they also increase the computational overhead. In particular, when HU is equal to 20, the LSTM-FPNN has the best performance in Fig. 13(b) and (e). Therefore, setting HU to 20 is suggested in the proposed LSTM-FPNN model.
The number of clusters affects the effectiveness of the LSTM-FPNN model. From Fig. 14(a), (c) and (e), the RMSE decreases as the number of clusters increases in the testing process. Fig. 14(b), (d) and (f) exhibit obvious fluctuations as the number of clusters increases. This demonstrates that increasing the number of clusters will not improve the TPI and will actually increase the complexity of the FPN neuron. Accordingly, a number of clusters equal to 3~5 is suggested in the proposed LSTM-FPNN model.
In addition, P is also an important factor in FPN. Fig. 15 displays the RMSE of the proposed LSTM-FPNN with different types of P. From Fig. 15, the RMSE of the LSTM-FPNN is the smallest in most cases when the polynomial function is set to the quadratic polynomial (QP). For the MDT, TI and GE data, the RMSE of the proposed model using the reduced quadratic polynomial (RQ) is lower than in the other two cases. From Fig. 15, we can note that different types of polynomials help the model deal with different time-series forecasting tasks. Therefore, the type of polynomial function is commonly set to QP.

Fig. 17. The process of the activated sludge wastewater treatment system.

Table 7
Comparative results of the sewage dataset with regression methods.

Dataset type        Isotonic Regression   Linear Regression   MLP            SVR            RBF Regressor   LSTM           Proposed model
Sewage treatment    0.0265±0.003          0.0267±0.003        0.0291±0.004   0.0271±0.003   0.0257±0.003    0.0231±0.002   0.0224±0.002

5.4. Comparison with conventional predictive models

In the experiment, we selected several classical forecasting models
for comparisons, including: autoregressive moving average model different types of neurons (viz., LSTM and FPN) and Elitism-driven
(ARMA) [47], multilayer perceptron (MLP) [48], support vector Roulette Wheel Selection (E_RWS) for time-series prediction.
regression (SVR) [48], radial basis functions neuron network (RBFNN) In the first layer, LSTMs are used as neurons that can effectively
[48], LSTM [25] and linear regression (LR) [48]. The significant capture the temporal features from time-series datasets. In the subse­
hyperparameter settings of all the models mentioned above are sum­ quent layers, FPNs are used as basic neurons to reflect the complex
marized as follows. nonlinear structure encountered in the data space and granulate it
In the ARMA, the p and q are set to (1, 1). In the LSTM, the experi­ through the algorithm of FCM clustering. When it comes to the neuron
mental settings are the same as LSTM layer of our proposed model which selection strategy, the E_RWS is utilized to take place the conventional
are mentioned in Table 1. In the MLP, SVR, RBFNN and LR, the default selection strategy of best fitness. The E_RWS is able to increase the di­
values of the WEKA toolkit were used for the parameters required to versity of the chosen neurons while ensuring elite neurons can be
learn each model [48]. All experimental results were obtained by 5-fcv. selected.
Table 5 shows the performance comparison in different models. Through a series of experimental studies, the proposed LSTM-FPNN
From Table 5, the following conclusions can be drawn. (1) The demonstrated superb performance (quantified both in terms of both
performance of the proposed LSTM-FPNN model is preferable to the prediction accuracy and STD values) on time-series datasets. Minimum
other models in most datasets (8/9). The forecasts of proposed model for Daily Temperatures (MDT) data are used as an example to introduce the
the GE are not the most accurate. However, the proposed model is better implementation procedure of the LSTM-FPNN in detail. On the MDT
than other five models. (2) For PM25 data, the RMSE of the proposed dataset, LSTM-FPNN obtained the smallest RMSE compared to the PNN.
LSTM-FPNN drop sharply compared to the other models. It indicated Based on the RMSE, LSTM-FPNN showed a decrease of 4.06 %, 3.25 %,
that the prediction ability of LSTM-FPNN is much better than other 2.52 %, 2.33 %, 1.96 % and 1.93 % in each layer. The experimental
models. (3) Compared to LSTM, the RMSE of the LSTM-FPNN on Dataset results showed that the LSTM neuron improves the prediction accuracy
1–9 are reduced by 3.8 %, 4.5 %, 2.4 %, 39.4 %, 2.9 %, 2.0 %, 2.1 %, 6.7 due to its ability to capture temporal features. The FPN neuron has a
% and 12.3 %, respectively. It proves that the synergistic effect of LSTM better fitting ability compared to PN. For the nine time-series datasets,
and FPN obviously improves the performance of the model. the proposed model reduces the prediction error (8/9) for most of the
Therefore, the present LSTM-FPNN not only improves the prediction time-series datasets. These results indicate that the prediction results of
ability by the deep learning method and fuzzy inference mechanism but the LSTM-FPNN are significantly improved compared to PNN. The re­
also has flexible network structure by the elitism-driven roulette-wheel sults of comparing different selection strategies showed that the E_RWS
selection. outperformed the elitism selection under the same situation. Moreover,
some parameter settings were obtained through experiments. For
5.5. Practical application in other domains

Case study 1: The estimation of cement compressive strength

Cement is one of the most widely utilized building materials worldwide, and cement compressive strength (CCS) is the most crucial mechanical and physical characteristic reflecting cement quality. To verify the prediction performance, we apply the proposed model to estimate the CCS. Fig. 16 shows the main flowchart of the construction of the Portland cement dataset.

A total of 3600 samples with 56 features (6 gray-level histogram (GLH) features and 50 gray-level co-occurrence matrix (GLCM) features), composed of 3D microstructural image features together with the measured CCS, were obtained [49,50]. Table 6 shows the comparative results between the proposed model and other widely used regression methods.
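For readers unfamiliar with GLH descriptors, the sketch below extracts six histogram statistics from a gray-scale microstructural image. The six statistics chosen here (mean, variance, skewness, kurtosis, energy, entropy) are a common convention; the exact descriptors used in [49,50] are not spelled out in this section, so this is an assumption-laden illustration only.

import numpy as np

def glh_features(image, levels=256):
    """image: 2-D uint8 array of gray levels in [0, levels)."""
    hist = np.bincount(image.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()                            # normalized histogram
    g = np.arange(levels)
    mean = (g * p).sum()
    var = ((g - mean) ** 2 * p).sum()
    std = np.sqrt(var) + 1e-12
    skew = ((g - mean) ** 3 * p).sum() / std ** 3
    kurt = ((g - mean) ** 4 * p).sum() / std ** 4
    energy = (p ** 2).sum()
    entropy = -(p[p > 0] * np.log2(p[p > 0])).sum()  # Shannon entropy (bits)
    return np.array([mean, var, skew, kurt, energy, entropy])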
Case study 2: Sewage treatment process

The activated sludge wastewater treatment process is widely used and relied on by many municipalities. This process utilizes a multi-chamber reactor unit consisting of a primary clarifier, an aeration tank, and a secondary clarifier. Fig. 17 shows the activated sludge process in a wastewater treatment system. One important aspect of the activated sludge process is the recirculation of sludge.

The dissolved oxygen (DO) concentration is considered the most important control parameter in the activated sludge process. To verify the prediction performance of the proposed model in the sewage treatment process, we collected the key data affecting DO in order to analyze the control of the dissolved oxygen level in the reactors. In the experiment, six sensor variables that affect DO (Chemical Oxygen Demand (COD), Total Nitrogen (TN), Total Phosphorus (TP), and Suspended Solids (SS) of the effluent, together with Ammonium (NH4) and Mixed Liquor Suspended Solids (MLSS) of the aeration tank) are used as the input variables of the model [51]. The comparative results between the proposed model and other widely used regression methods are shown in Table 7.
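As a sketch of how the inputs for this case study could be assembled, the snippet below builds the input matrix from the six sensor variables named above. The file name and column labels are assumptions for illustration, since the actual data layout of [51] is not given here.

import pandas as pd

SENSORS = ["COD", "TN", "TP", "SS", "NH4", "MLSS"]  # model inputs
TARGET = "DO"                                       # dissolved oxygen

df = pd.read_csv("activated_sludge.csv")            # hypothetical file
X = df[SENSORS].to_numpy()
y = df[TARGET].to_numpy()
# X and y can then be fed to the same 5-fcv protocol sketched earlier,
# e.g. cv_rmse(model, X, y).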
Data availability
6. Conclusion

In this study, we have presented and investigated a novel self-organizing deep method, the LSTM-FPNN, realized with the aid of two different types of neurons (viz., LSTM and FPN) and elitism-driven roulette-wheel selection (E_RWS) for time-series prediction.

In the first layer, LSTMs are used as neurons that effectively capture the temporal features of time-series datasets. In the subsequent layers, FPNs are used as the basic neurons to reflect the complex nonlinear structure encountered in the data space and to granulate it through FCM clustering. As for the neuron selection strategy, the E_RWS is utilized in place of the conventional best-fitness selection strategy; it increases the diversity of the chosen neurons while ensuring that elite neurons are still selected.

Through a series of experimental studies, the proposed LSTM-FPNN demonstrated superb performance (quantified in terms of both prediction accuracy and STD values) on time-series datasets. The Minimum Daily Temperatures (MDT) data were used as an example to introduce the implementation procedure of the LSTM-FPNN in detail. On the MDT dataset, LSTM-FPNN obtained the smallest RMSE compared with the PNN; in terms of RMSE, LSTM-FPNN showed decreases of 4.06 %, 3.25 %, 2.52 %, 2.33 %, 1.96 %, and 1.93 % over the successive layers. The experimental results showed that the LSTM neuron improves prediction accuracy owing to its ability to capture temporal features, while the FPN neuron has better fitting ability than the PN. The proposed model reduces the prediction error on most of the nine time-series datasets (8/9), indicating that its predictions are significantly improved compared with those of the PNN. The comparison of different selection strategies showed that the E_RWS outperformed elitism selection under the same conditions. Moreover, several parameter settings were obtained through experiments: it is suggested that the number of hidden units in the LSTM be set to 20, the number of clusters be between 3 and 5, and the type of polynomial function be set to QP. To further evaluate the performance of the model, several classical forecasting models were used for comparison; the experimental results showed that LSTM-FPNN achieved average improvements of 7.56 % and 16.31 % over the second- and third-best methods (i.e., MLP and RBFNN).

In the task of cement compressive strength estimation, the proposed model showed a decrease of 7.06 % relative to the second-best model. For predicting the key parameter of the activated sludge process, LSTM-FPNN was the best model among the widely used regression methods considered. Overall, the experimental results showed that LSTM-FPNN outperformed other classical prediction models on both the time-series datasets and the two real-world datasets.

As future work, it might be interesting to use temporal convolutional networks as neurons to capture the sequential characteristics of the data and further improve prediction performance.

CRediT authorship contribution statement

Kun Zhou: Methodology, Software, Visualization, Writing – original draft. Sung-Kwun Oh: Formal analysis, Funding acquisition, Methodology, Supervision, Validation, Writing – review & editing. Witold Pedrycz: Validation, Writing – review & editing. Jianlong Qiu: Funding acquisition, Supervision. Kisung Seo: Funding acquisition, Supervision, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.


Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIT) (NRF-2021R1F1A1056102, NRF-2023K2A9A2A06060385, and RS-2023-00244355), by the National Natural Science Foundation of China under Grants No. 61877033 and 61833005, and by the Natural Science Foundation of Shandong Province under Grant No. ZR2019MF021.

References

[1] Z. Lu, Y. Lu, Enhancing the reliability of image classification using the intrinsic features, Knowl. Based Syst. 263 (2023) 110256.
[2] S. Peng, R. Zeng, L. Cao, A. Yang, J. Niu, C. Zong, G. Zhou, Multi-source domain adaptation method for textual emotion classification using deep and broad learning, Knowl. Based Syst. 260 (2023) 110173.
[3] B. Yang, T. Liang, J. Xiong, C. Zhong, Deep reinforcement learning based on transformer and U-Net framework for stock trading, Knowl. Based Syst. 262 (2023) 110211.
[4] S.J. Farlow, Self-Organizing Methods in Modeling: GMDH Type Algorithms, CRC Press, 2020.
[5] A. Ivakhnenko, G. Ivakhnenko, The review of problems solvable by algorithms of the group method of data handling (GMDH), Pattern Recognit. Image Anal. 5 (1995) 527–535 (Raspoznavaniye Obrazov I Analiz Izobrazhenii).
[6] D.H. Lim, S.H. Lee, M.G. Na, Smart soft-sensing for the feedwater flowrate at PWRs using a GMDH algorithm, IEEE Trans. Nucl. Sci. 57 (1) (2010) 340–347.
[7] E.E. Elattar, J.Y. Goulermas, Q.H. Wu, Generalized locally weighted GMDH for short term load forecasting, IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 42 (3) (2011) 345–356.
[8] S.K. Oh, W. Pedrycz, T.C. Ahn, Self-organizing neural networks with fuzzy polynomial neurons, Appl. Soft Comput. 2 (1) (2002) 1–10.
[9] S.K. Oh, W. Pedrycz, H.S. Park, A new approach to the development of genetically optimized multilayer fuzzy polynomial neural networks, IEEE Trans. Ind. Electron. 53 (4) (2006) 1309–1321.
[10] S.K. Oh, W. Pedrycz, H.S. Park, Fuzzy relation-based neural networks and their hybrid identification, IEEE Trans. Instrum. Meas. 56 (6) (2007) 2522–2537.
[11] S.K. Oh, H.S. Park, W.D. Kim, W. Pedrycz, A new approach to radial basis function-based polynomial neural networks: analysis and design, Knowl. Inf. Syst. 36 (1) (2013) 121–151.
[12] W. Huang, S.K. Oh, W. Pedrycz, Fuzzy wavelet polynomial neural networks: analysis and design, IEEE Trans. Fuzzy Syst. 25 (5) (2016) 1329–1341.
[13] W. Huang, S.K. Oh, W. Pedrycz, Hybrid fuzzy wavelet neural networks architecture based on polynomial neural networks and fuzzy set/relation inference-based wavelet neurons, IEEE Trans. Neural Netw. Learn. Syst. 29 (8) (2017) 3452–3462.
[14] C. Zhang, S.K. Oh, Z. Fu, W. Pedrycz, Design of reinforced hybrid fuzzy rule-based neural networks driven to inhomogeneous neurons and tournament selection, IEEE Trans. Fuzzy Syst. 29 (11) (2020) 3293–3307.
[15] S.B. Roh, S.K. Oh, W. Pedrycz, Z. Fu, Dynamically generated hierarchical neural networks designed with the aid of multiple support vector regressors and PNN architecture with probabilistic selection, IEEE Trans. Neural Netw. Learn. Syst. 33 (4) (2022) 1385–1399.
[16] Y.C. Lin, S.J. Lee, C.S. Ouyang, C.H. Wu, Air quality prediction by neuro-fuzzy modeling approach, Appl. Soft Comput. 86 (2020) 105898.
[17] K. Yuan, J. Liu, S. Yang, K. Wu, F. Shen, Time series forecasting based on kernel mapping and high-order fuzzy cognitive maps, Knowl. Based Syst. 206 (2020) 106359.
[18] Y. Xu, L. Jia, W. Yang, Correlation based neuro-fuzzy Wiener type wind power forecasting model by using special separate signals, Energy Convers. Manag. 253 (2022) 115173.
[19] H.A. Mohammadi, S. Ghofrani, A. Nikseresht, Using empirical wavelet transform and high-order fuzzy cognitive maps for time series forecasting, Appl. Soft Comput. 135 (2023) 109990.
[20] E. Bas, E. Egrioglu, A fuzzy regression functions approach based on Gustafson-Kessel clustering algorithm, Inf. Sci. 592 (2022) 206–214.
[21] J. Wang, H. Li, H. Yang, Y. Wang, Intelligent multivariable air-quality forecasting system based on feature selection and modified evolving interval type-2 quantum fuzzy neural network, Environ. Pollut. 274 (2021) 116429.
[22] A. Salimi-Badr, IT2CFNN: an interval type-2 correlation-aware fuzzy neural network to construct non-separable fuzzy rules with uncertain and adaptive shapes for nonlinear function approximation, Appl. Soft Comput. 115 (2022) 108258.
[23] B. Zhang, X. Gong, J. Wang, F. Tang, K. Zhang, W. Wu, Nonstationary fuzzy neural network based on FCMnet clustering and a modified CG method with Armijo-type rule, Inf. Sci. 608 (2022) 313–338.
[24] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444.
[25] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (8) (1997) 1735–1780.
[26] W. Li, X. Wang, H. Han, J. Qiao, A PLS-based pruning algorithm for simplified long short-term memory neural network in time series prediction, Knowl. Based Syst. 254 (2022) 109608.
[27] G.F. Fan, Y.Y. Han, J.J. Wang, H.L. Jia, L.L. Peng, H.P. Huang, W.C. Hong, A new intelligent hybrid forecasting method for power load considering uncertainty, Knowl. Based Syst. 280 (2023) 111034.
[28] D. Qin, Z. Peng, L. Wu, Deep attention fuzzy cognitive maps for interpretable multivariate time series prediction, Knowl. Based Syst. (2023) 110700.
[29] S. Ahmed, R.K. Chakrabortty, D.L. Essam, W. Ding, Poly-linear regression with augmented long short term memory neural network: predicting time series data, Inf. Sci. 606 (2022) 573–600.
[30] Q. Zheng, Y. Zhang, DSTAGCN: dynamic spatial-temporal adjacent graph convolutional network for traffic forecasting, IEEE Trans. Big Data 9 (1) (2023) 241–253.
[31] Y.Q. Tang, F.S. Yu, W. Pedrycz, X.Y. Yang, J.Y. Wang, S.H. Liu, Building trend fuzzy granulation-based LSTM recurrent neural network for long-term time-series forecasting, IEEE Trans. Fuzzy Syst. 30 (6) (2022) 1599–1613.
[32] Y. Dong, J. Wang, R. Wang, H. Jiang, Accurate combination forecasting of wave energy based on multiobjective optimization and fuzzy information granulation, J. Clean. Prod. 386 (2023) 135772.
[33] Y. Li, Z. Tong, S. Tong, D. Westerdahl, A data-driven interval forecasting model for building energy prediction using attention-based LSTM and fuzzy information granulation, Sustain. Cities Soc. 76 (2022) 103481.
[34] S.K. Oh, W. Pedrycz, B.J. Park, Polynomial neural networks architecture: analysis and design, Comput. Electr. Eng. 29 (6) (2003) 703–725.
[35] W. Pedrycz, G. Vukovich, Granular neural networks, Neurocomputing 36 (1–4) (2001) 205–224.
[36] Y. Hu, Y. Zhang, X. Gao, D. Gong, X. Song, Y. Guo, J. Wang, A federated feature selection algorithm based on particle swarm optimization under privacy protection, Knowl. Based Syst. 260 (2023) 110122.
[37] M. Akhavan, S.M.H. Hasheminejad, A two-phase gene selection method using anomaly detection and genetic algorithm for microarray data, Knowl. Based Syst. 262 (2023) 110249.
[38] P.J. Werbos, Backpropagation through time: what it does and how to do it, Proc. IEEE 78 (10) (1990) 1550–1560.
[39] D.P. Kingma, J. Ba, Adam: a method for stochastic optimization, in: 3rd International Conference on Learning Representations (ICLR), 2015.
[40] D. Newman, S. Hettich, C. Blake, C. Merz, UCI Repository of Machine Learning Databases, University of California, Department of Information and Computer Science, Irvine, CA, 1998. See also: http://www.ics.uci.edu/~mlearn/MLRepository.html.
[41] A. Lipowski, D. Lipowska, Roulette-wheel selection via stochastic acceptance, Phys. A 391 (6) (2012) 2193–2196.
[42] D.E. Goldberg, K. Deb, A comparative analysis of selection schemes used in genetic algorithms, in: G.J.E. Rawlins (Ed.), Foundations of Genetic Algorithms, Morgan Kaufmann Publishers, San Mateo, CA, 1991, pp. 69–93.
[43] H. Sak, A. Senior, F. Beaufays, Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition, 2014. [Online]. Available: http://arxiv.org/abs/1402.1128.
[44] Y. Imrana, Y. Xiang, L. Ali, Z. Abdul-Rauf, A bidirectional LSTM deep learning approach for intrusion detection, Expert Syst. Appl. 185 (2021) 115524.
[45] E.H. Kim, S.K. Oh, W. Pedrycz, Z. Fu, Reinforced fuzzy clustering-based ensemble neural networks, IEEE Trans. Fuzzy Syst. 28 (3) (2019) 569–582.
[46] M.W. Mahoney, Randomized algorithms for matrices and data, Found. Trends Mach. Learn. 3 (2) (2011) 123–224.
[47] G. Tiao, M. Grupe, Hidden periodic autoregressive-moving average models in time series data, Biometrika 67 (2) (1980) 365–373.
[48] E. Frank, M.A. Hall, I.H. Witten, The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", 4th ed., Morgan Kaufmann, 2016.
[49] L. Wang, B. Yang, A. Abraham, L. Qi, X. Zhao, Z. Chen, Construction of dynamic three-dimensional microstructure for the hydration of cement using 3D image registration, Pattern Anal. Appl. 17 (3) (2014) 655–665.
[50] L. Zhang, S.K. Oh, W. Pedrycz, B. Yang, Y. Han, Building fuzzy relationships between compressive strength and 3D microstructural image features for cement hydration using Gaussian mixture model-based polynomial radial basis function neural networks, Appl. Soft Comput. 112 (2021) 107766.
[51] E.S. Nahm, A study on fuzzy control method of energy saving for activated sludge process in sewage treatment plant, Trans. Korean Inst. Electr. Eng. 67 (11) (2018) 1477–1485.
