
IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 27, NO. 11, NOVEMBER 2019

PALM: An Incremental Construction of Hyperplanes for Data Stream Regression

Md Meftahul Ferdaus, Student Member, IEEE, Mahardhika Pratama, Member, IEEE, Sreenatha G. Anavatti, and Matthew A. Garratt

Abstract—Data stream has been the underlying challenge in the age of big data because it calls for real-time data processing with the absence of a retraining process and/or an iterative learning approach. In the realm of the fuzzy system community, data streams are handled by algorithmic development of self-adaptive neuro-fuzzy systems (SANFSs), characterized by the single-pass learning mode and the open structure property, which enable effective handling of the fast and rapidly changing nature of data streams. The underlying bottleneck of SANFSs lies in their design principle, which involves a high number of free parameters (rule premise and rule consequent) to be adapted in the training process. This figure can even double in the case of the type-2 fuzzy system. In this paper, a novel SANFS, namely the parsimonious learning machine (PALM), is proposed. PALM features a new type of fuzzy rule based on the concept of hyperplane clustering, which significantly reduces the number of network parameters because it has no rule premise parameters. PALM is proposed in both type-1 and type-2 versions, both of which characterize a fully dynamic rule-based system. That is, it is capable of automatically generating, merging, and tuning the hyperplane-based fuzzy rules in a single-pass manner. Moreover, an extension of PALM, namely recurrent PALM (rPALM), is proposed, which adopts the concept of the teacher-forcing mechanism from the deep learning literature. The efficacy of PALM has been evaluated through numerical study with six real-world and synthetic data streams from public databases and our own real-world project on autonomous vehicles. The proposed model showcases significant improvements in terms of computational complexity and number of required parameters against several renowned SANFSs, while attaining comparable and often better predictive accuracy.

Index Terms—Data stream, fuzzy, hyperplane, incremental, learning machine, parsimonious.

Manuscript received May 10, 2018; revised August 28, 2018 and October 17, 2018; accepted December 18, 2018. Date of publication January 16, 2019; date of current version November 4, 2019. This work was financially supported in part by an NTU start-up grant and in part by MOE Tier 1 Research Grant RG130/17. (Corresponding author: Mahardhika Pratama.)
M. M. Ferdaus, S. G. Anavatti, and M. A. Garratt are with the School of Engineering and Information Technology, University of New South Wales at the Australian Defence Force Academy, Canberra, ACT 2612, Australia (e-mail: [email protected]; [email protected]; [email protected]).
M. Pratama is with the School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798 (e-mail: [email protected]).
This paper has supplementary downloadable material available at http://ieeexplore.ieee.org, provided by the authors. The material is 543 KB in size.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TFUZZ.2019.2893565

I. INTRODUCTION

Advances in both hardware and software technologies have triggered the generation of a large quantity of data in an automated way. Such applications can be exemplified by space, autonomous systems, aircraft, meteorological analysis, stock market analysis, sensor networks, users of the internet, etc., where the generated data are not only massive and possibly unbounded but also produced at a rapid rate under complex environments. Such online data are known as data streams [1], [2]. A data stream can be expressed in a more formal way [3] as S = [x_1, x_2, ..., x_i, ..., x_∞], where S is an enormous and possibly unbounded sequence of data objects. Each data object can be defined by an n-dimensional feature vector x_i = [x_ij], j = 1, ..., n, which may belong to a continuous, categorical, or mixed feature space. In the field of data stream mining, developing a learning algorithm as a universal approximator is challenging due to the following factors.
1) The whole dataset to train the learning algorithm is not readily available, since the data arrive continuously.
2) The size of a data stream is not bounded.
3) The amount of data to be dealt with is huge.
4) The distribution of the incoming unseen data may slide over time slowly, rapidly, abruptly, gradually, locally, globally, cyclically, or otherwise. Such variations in the data distribution of data streams over time are known as concept drift [4], [5].
5) Data are discarded after being processed to suppress memory consumption to a practical level.

To cope with the above-mentioned challenges in data streams, the learning machine should be equipped with the following features:
1) capability of working in the single-pass mode;
2) handling of various concept drifts in data streams;
3) low memory burden and computational complexity to enable real-time deployment under resource-constrained environments.

In the realm of the fuzzy system, such learning aptitude is demonstrated by the self-adaptive neuro-fuzzy system (SANFS) [6]. Until now, existing SANFSs have usually been constructed via hypersphere-based or hyperellipsoid-based clustering techniques (HSBC or HEBC) to automatically partition the input space into a number of fuzzy rules, and they rely on the assumption of a normal distribution due to the use of the Gaussian membership function [7]–[16]. As a result, they are always associated with

1063-6706 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Authorized licensed use limited to: South Asian University. Downloaded on May 10,2024 at 12:38:37 UTC from IEEE Xplore. Restrictions apply.

rule premise parameters, the mean and width of the Gaussian function, which need to be continuously adjusted. This issue complicates their implementation in a complex and deep structure. As a matter of fact, existing neuro-fuzzy systems can be seen as single hidden layer feedforward networks. Other than the HSBC or HEBC, the data cloud based clustering (DCBC) concept is utilized in [17] and [18] to construct the SANFS. Unlike the HSBC and HEBC, the data clouds do not have any specific shape. Therefore, fewer parameters are required in DCBC than in HSBC and HEBC. However, in DCBC, parameters such as the mean and the accumulated distance of a specific point to all other points need to be calculated. In other words, it does not offer a significant reduction in the computational complexity and memory demand of SANFSs. Hyperplane-based clustering (HPBC) provides a promising avenue to overcome this drawback because it bridges the rule premise and the rule consequent by means of the hyperplane construction.

Although the concept of HPBC has existed for the last two decades [19]–[21], all of these methods are characterized by a static structure and are not compatible with data stream analytics due to their offline characteristics. Besides, the majority of these algorithms still use the Gaussian or bell-shaped function [22] to create the rule premise and are not free of rule premise parameters. This problem is solved in [23], where a new function is proposed to accommodate the hyperplanes directly in the rule premise. Nevertheless, their model also exhibits a fixed structure and operates in the batch learning mode. Based on this research gap, a novel SANFS, namely the parsimonious learning machine (PALM), is proposed in this paper. The novelty of this paper can be summarized as follows.
1) PALM is constructed using the HPBC technique, and its fuzzy rule is fully characterized by a hyperplane that underpins both the rule consequent and the rule premise. This strategy reduces the rule base parameters to the level of C × (P + 1), where C and P are, respectively, the number of fuzzy rules and the input dimension.
2) PALM is proposed in both type-1 and type-2 versions, derived from the concepts of type-1 and type-2 fuzzy systems. The type-1 version incurs fewer network parameters and a faster training speed than the type-2 version, whereas the type-2 version expands the degree of freedom of the type-1 version by applying the interval-valued concept, making it more robust against uncertainty than the type-1 version.
3) PALM features a fully open network structure where its rules can be automatically generated, merged, and updated on demand in the one-pass learning fashion. The rule generation process is based on the self-constructing clustering approach [24], [25], which checks the coherence of the input and output space. The rule merging scenario is driven by similarity analysis via the distance and orientation of two hyperplanes. The online hyperplane tuning scenario is executed using the fuzzily weighted generalized RLS (FWGRLS) method.
4) An extension of PALM, namely recurrent PALM (rPALM), is put forward in this paper. rPALM addresses the underlying bottleneck of the HPBC method: dependency on the target variable due to the definition of the point-to-hyperplane distance [26]. This concept is inspired by the teacher-forcing mechanism in the deep learning literature, where the activation degree of a node is calculated with respect to the predictor's previous output. The performance of rPALM has been numerically validated in our supplemental document, where it is slightly inferior to PALM but still highly competitive with the most prominent SANFSs in terms of accuracy.
5) Two real-world problems from our own project, namely the online identification of a quadcopter unmanned aerial vehicle (UAV) and of a helicopter UAV, are presented in this paper and exemplify real-world streaming data problems. The two datasets are collected from indoor flight tests in the UAV lab of the University of New South Wales (UNSW), Canberra campus. These datasets and the PALM and rPALM codes are made publicly available in [27].

The efficacy of both type-1 and type-2 PALM has been numerically evaluated using six real-world and synthetic streaming data problems. Moreover, PALM is also compared against prominent SANFSs in the literature and demonstrates encouraging numerical results, in which it generates a compact and parsimonious network structure while delivering comparable and even better accuracy than other benchmarked algorithms.

The remainder of this paper is structured as follows. Section II discusses a literature survey of closely related works. In Section III, the network architectures of both type-1 and type-2 PALM are elaborated. Section IV describes the online learning policy of type-1 PALM, while Section V presents the online learning mechanism of type-2 PALM. In Section VI, the proposed PALM's efficacy is evaluated through real-world and synthetic data streams. Finally, this paper ends by drawing concluding remarks in Section VII.

II. RELATED WORK AND RESEARCH GAP WITH THE STATE-OF-THE-ART ALGORITHMS

SANFSs can be employed for data stream regression, since they can learn from scratch with no base knowledge and are embedded with the self-organizing property to adapt to changing system dynamics [28]. They work fully in a single-pass learning scenario, which is efficient for online learning under limited computational resources. An early work in this domain is seen in [6], where an SANFS, namely SONFIN, was proposed. The evolving clustering method is implemented in [29] to evolve fuzzy rules. Another pioneering work in this area is the development of the online evolving Takagi-Sugeno (T-S) fuzzy system, namely eTS [7], by Angelov. eTS has been improved in several follow-up works: eTS+ [30], Simpl_eTS [8], and AnYa [17]. However, eTS+ and Simpl_eTS generate axis-parallel ellipsoidal clusters, which cannot deal effectively with nonaxis-parallel data distributions. To deal with nonaxis-parallel data distributions, an evolving multivariable Gaussian function was introduced in the fuzzy system in [31]. Another example of an SANFS exploiting the multivariable Gaussian function is found in [10], where the concept of statistical contribution is implemented to grow and prune the fuzzy rules on the fly. Their work has been extended in [9], where the idea of statistical contribution is used as a basis of input contribution estimation for the online feature selection scenario.
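The single-pass constraint that recurs throughout this survey can be made concrete with a minimal test-then-train loop. This is our own illustration of the protocol, not the authors' code; the running-mean `Model` is a hypothetical stand-in for an SANFS, not PALM itself. Each sample is used once for prediction, once for an update, and then discarded.

```python
# Minimal single-pass (test-then-train) learning loop: predict first,
# then update, then discard the sample -- no storage, no retraining.

class Model:
    """Running-mean predictor used only to illustrate the protocol."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict(self, x):
        return self.mean                          # placeholder logic

    def update(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n     # incremental update

def run_stream(model, stream):
    errors = []
    for x, y in stream:                           # samples arrive one by one
        errors.append(abs(y - model.predict(x)))  # test before training
        model.update(x, y)                        # single-pass update
        # (x, y) goes out of scope here: memory stays bounded
    return errors

errors = run_stream(Model(), [(0, 1.0), (0, 3.0), (0, 2.0)])
```

The prequential error list doubles as an online accuracy estimate, since every prediction is made before the model has seen the corresponding target.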
The idea of SANFS was implemented in a type-2 fuzzy system in [32]. Afterward, the concept was extended into a local recurrent architecture [33] and an interactive recurrent architecture [34]. These works utilize the Karnik–Mendel (KM) type reduction technique [35], which relies on an iterative approach to find the left-most and right-most points. To mitigate this shortcoming, the KM type reduction technique can be replaced with the q design coefficient [36] introduced in [37]. SANFS has also been introduced under the context of the metacognitive learning machine (McLM), which encompasses three fundamental pillars of human learning: what-to-learn, how-to-learn, and when-to-learn. The idea of McLM was introduced in [38]. McLM has been modified with the use of Scaffolding theory, yielding McSLM, which aims to realize the plug-and-play learning fashion [39]. To solve the problems of uncertainty, temporal system dynamics, and unknown system order, McSLM was extended into a recurrent interval-valued metacognitive scaffolding fuzzy neural network [11]. The vast majority of SANFSs are developed using the concept of HSBC and HEBC, which impose considerable memory demand and computational burden because both the rule premise and the rule consequent have to be stored and evolved during the training process.

III. NETWORK ARCHITECTURE OF PALM

In this section, the network architecture of PALM is presented in detail. The T-S fuzzy system is a commonly used technique to approximate complex nonlinear systems due to its universal approximation property. The rule base in the T-S fuzzy model of a multiinput single-output (MISO) system can be expressed in the following IF–THEN rule format:

Rj: IF x1 is B1j and x2 is B2j and ... and xn is Bnj THEN yj = b0j + a1j x1 + ... + anj xn   (1)

where Rj stands for the jth rule, j = 1, 2, ..., R, and R indicates the number of rules; i = 1, 2, ..., n, where n denotes the dimension of the input feature; xn is the nth input feature; a and b are consequent parameters of the submodel belonging to the jth rule; and yj is the output of the jth submodel. The T-S fuzzy model can approximate a nonlinear system with a combination of several piecewise linear systems by partitioning the entire input space into several fuzzy regions. It expresses each input–output space with a linear equation as presented in (1). Approximation using the T-S fuzzy model leads to a nonlinear programming problem and hinders its practical use. A simple solution to this problem is the utilization of various clustering techniques to identify the rule premise parameters. Because of the generation of the linear equation in the consequent part, HPBC can be applied to construct the T-S fuzzy system efficiently. The advantages of using HPBC in the T-S fuzzy model can be seen graphically in Fig. 1.

Fig. 1. Clustering in the T-S fuzzy model using hyperplanes.

Some popular algorithms with HPBC are the fuzzy C-regression model [40], fuzzy C-quadratic shell (FCQS) [41], double FCM [19], and the interval type-2 fuzzy C-regression model [23]. A crucial limitation of these algorithms is their nonincremental nature, which does not suit data stream regression. Moreover, they still deploy the Gaussian function to represent the rule premise of the T-S fuzzy model, which does not exploit the parameter efficiency trait of HPBC. To fill this research gap, a new membership function [23] is proposed to accommodate the use of hyperplanes in the rule premise part of the T-S fuzzy system. It can be expressed as

μB(j) = exp(−Γ dst(j) / max(dst(j)))   (2)

where j = 1, 2, ..., R; R is the number of rules; and Γ is an adjustment parameter, which controls the fuzziness of the membership grades. Based on the observation in [23] and empirical analysis with a variety of data streams in our work, the range of Γ is settled as [1, 100]. dst(j) denotes the distance from the present sample to the jth hyperplane. In our work, dst(j) is defined as in [23] as follows:

dst(j) = |Xt ωj| / |ωj|   (3)

where Xt ∈ R^(1×(n+1)) and ωj ∈ R^((n+1)×1), respectively, stand for the input vector of the tth observation and the output weight vector of the jth rule. This membership function enables the incorporation of HPBC directly into the T-S fuzzy system with the absence of rule premise parameters, except the first-order linear function or hyperplane. Because a point-to-plane distance is not unique, the compatibility measure is executed using the minimum point-to-plane distance. The following discusses the network structure of PALM, encompassing its type-1 and type-2 versions. PALM can be modeled as a four-layered network working in tandem, where the fuzzy rule triggers a hyperplane-shaped cluster and is induced by (3). Since T-S fuzzy rules can be developed solely using a hyperplane, PALM is free from antecedent parameters, which results in a dramatic reduction of network parameters. Furthermore, it operates in the one-pass learning fashion, where it works point by point and a data point is discarded directly once learned.

A. Structure of the Type-1 PALM Network

In the type-1 PALM network architecture, the membership function exposed in (2) is utilized to fit the hyperplane-shaped cluster in identifying the type-1 T-S fuzzy model. To understand the

work flow, let us consider that a single data point xn is fed into PALM at the nth observation. Appertaining to the concept of the type-1 fuzzy system, this crisp data point needs to be transformed into a fuzzy set. This fuzzification is attained using the type-1 hyperplane-shaped membership function, which is framed through the concept of the point-to-plane distance. This hyperplane-shaped type-1 membership function can be expressed as

f¹_T1(j) = μB(j) = exp(−Γ dst(j) / max(dst(j)))   (4)

where dst(j) in (4) denotes the distance between the current sample and the jth hyperplane, as in (3). It is defined as per the definition of a point-to-plane distance [26] and is formally expressed as follows:

dst(j) = |yd − (Σ_{i=1}^{n} aij xi + b0j)| / √(1 + Σ_{i=1}^{n} (aij)²)   (5)

where aij and b0j are consequent parameters of the jth rule; i = 1, 2, ..., n, where n is the number of input dimensions; and yd is the target variable. The exertion of yd is an obstruction for PALM due to the target variable's unavailability in the testing phase. This issue comes into the picture due to the definition of a point-to-hyperplane distance [26]. To eradicate this impediment, an rPALM framework is developed here. We refer curious readers to the supplementary document for details on rPALM. Considering an MISO system, the IF–THEN rule of type-1 PALM can be expressed as follows:

Rj: IF Xn is close to f¹_T1(j) THEN yj = x_e^T ωj   (6)

where x_e is the extended input vector, expressed by inserting the intercept into the original input vector as x_e = [1, x_k1, x_k2, ..., x_kn]; ωj is the weight vector for the jth rule; and yj is the consequent part of the jth rule. Since type-1 PALM has no premise parameters, the antecedent part is simply a hyperplane. It is observed from (6) that the drawback of the HPBC-based T-S fuzzy system lies in the high-level fuzzy inference scheme, which degrades the transparency of the fuzzy rule. The intercept of the extended input vector controls the slope of the hyperplane, which functions to prevent the untypical gradient problem.

The consequent part is akin to the basic T-S fuzzy model's rule consequent (yj = b0j + a1j x1 + ... + anj xn). The consequent part for the jth hyperplane is calculated by weighting the extended input variable (x_e) with its corresponding weight vector as follows:

f²_T1(j) = x_e^T ωj.   (7)

The weight vector (ωj) used in (7) is updated recursively by the FWGRLS method, which ensures a smooth change in the weight values. In the following step, the rule firing strength is normalized and combined with the rule consequent to produce the end output of type-1 PALM. The final crisp output of the type-1 PALM model can be expressed as follows:

f³_T1 = Σ_{j=1}^{R} f¹_T1(j) f²_T1(j) / Σ_{i=1}^{R} f¹_T1(i).   (8)

The normalization term in (8) guarantees the partition of unity, where the sum of the normalized membership degrees is unity. The T-S fuzzy system is functionally equivalent to the radial basis function network if the rule firing strength is directly connected to the output of the consequent layer [42]. It is also evident that the final crisp output is produced by the weighted average defuzzification scheme.

B. Network Structure of the Type-2 PALM

Type-2 PALM differs from the type-1 variant in the use of interval-valued hyperplanes generating type-2 fuzzy rules. Akin to its type-1 version, type-2 PALM starts operating by intaking the crisp input data stream xn to be fuzzified. Here, the fuzzification occurs with the help of an interval-valued hyperplane-based membership function, which can be expressed as

f̃¹_out = exp(−Γ d̃st(j) / max(d̃st(j)))   (9)

where f̃¹_out = [f̲¹_out, f̄¹_out] comprises the lower and upper hyperplane memberships, and d̃st(j) = [d̲st(j), d̄st(j)] is the interval-valued distance, where d̄st(j) is the distance between the present input sample and the jth upper hyperplane, and d̲st(j) is that between the present input sample and the jth lower hyperplane. In the type-2 architecture, the distances between the incoming input data and the upper and lower hyperplanes are calculated as follows:

d̃st(j) = |yd − (Σ_{i=1}^{n} ãij xi + b̃0j)| / √(1 + Σ_{i=1}^{n} (ãij)²)   (10)

where ãij = [a̲ij, āij] and b̃0j = [b̲0j, b̄0j] are the interval-valued coefficients of the rule consequent of type-2 PALM. Like the type-1 variant, type-2 PALM depends on the target value (yd). Therefore, it is also extended into a type-2 recurrent structure, elaborated in the supplementary document. The use of interval-valued coefficients results in an interval-valued firing strength, which forms the footprint of uncertainty (FoU). The FoU is the key component against uncertainty in data streams and sets the degree of tolerance against uncertainty. In an MISO system, the IF–THEN rule of type-2 PALM can be expressed as

Rj: IF Xn is close to f̃¹_out THEN yj = x_e^T ω̃j   (11)

where x_e is the extended input vector, ω̃j is the interval-valued weight vector for the jth rule, and yj is the consequent part of the jth rule, whereas the antecedent part is merely an interval-valued hyperplane. The type-2 fuzzy rule is similar to that of the type-1 variant except for the presence of the interval-valued firing strength and the interval-valued weight vector. In type-2 PALM, the consequent part is calculated by weighting the extended input variable x_e with the interval-valued output weight vectors ω̃j = [ω̲j, ω̄j] as follows:

f̄²_out(j) = x_e^T ω̄j,   f̲²_out(j) = x_e^T ω̲j.   (12)

The lower weight vector ω̲j for the jth lower hyperplane and the upper weight vector ω̄j for the jth upper hyperplane are initialized by allocating a higher value to the upper weight vector than

the lower weight vector. These vectors are updated recursively by the FWGRLS method, which ensures a smooth change in the weight values.

Before performing the defuzzification, the type reduction mechanism is carried out to craft the type-reduced set, i.e., the transformation from the type-2 fuzzy variable to the type-1 fuzzy variable. One of the commonly used type-reduction methods is the KM procedure [35]. However, the KM method involves an iterative process, because the rule consequents must first be reordered in ascending order before the cross-over points are found iteratively, incurring expensive computational cost. Therefore, instead of the KM method, the q design factor [36] is utilized to orchestrate the type reduction process. The final crisp output of the type-2 PALM can be expressed as follows:

f³_out = y_out = (y_l,out + y_r,out) / 2   (13)

where

y_l,out = Σ_{j=1}^{R} ql f̲¹_out(j) f̲²_out(j) / Σ_{i=1}^{R} f̲¹_out(i) + Σ_{j=1}^{R} (1 − ql) f̄¹_out(j) f̄²_out(j) / Σ_{i=1}^{R} f̄¹_out(i)   (14)

y_r,out = Σ_{j=1}^{R} qr f̄¹_out(j) f̄²_out(j) / Σ_{i=1}^{R} f̄¹_out(i) + Σ_{j=1}^{R} (1 − qr) f̲¹_out(j) f̲²_out(j) / Σ_{i=1}^{R} f̲¹_out(i)   (15)

where y_l,out and y_r,out are the left and right outputs resulting from the type reduction mechanism. ql and qr, utilized in (14) and (15), are the design factors, initialized so as to satisfy the condition ql < qr. In our q design factor, ql and qr steer the proportions of the upper and lower rules in the final crisp outputs y_l,out and y_r,out of PALM. The normalization process of the type-2 fuzzy inference scheme [37] was modified in [11] to prevent the generation of an invalid interval. The generation of this invalid interval as a result of the normalization process of [37] was also proved in [11]. Therefore, the normalization process as adopted in [11] is applied and advanced in terms of ql and qr in our work. Besides, in order to improve the performance of the proposed PALM, ql and qr are not left constant but are continuously adapted using the gradient descent technique, as explained in Section IV. Notwithstanding that the type-2 PALM is supposed to handle uncertainty better than its type-1 variant, it incurs a higher number of network parameters, at the level of 2 × R × (n + 1), as a result of the use of upper and lower weight vectors ω̃j = [ω̲j, ω̄j]. In addition, the implementation of the q design factor imposes extra computational cost, because ql and qr call for a tuning procedure with the gradient descent method.

IV. ONLINE LEARNING POLICY IN TYPE-1 PALM

This section describes the online learning policy of our proposed type-1 PALM. PALM is capable of starting its learning process from scratch with an empty rule base. Its fuzzy rules can be automatically generated on the fly using the self-constructive clustering (SCC) method, which checks the input and output coherence. The complexity reduction mechanism is implemented using the hyperplane merging module, which vets the similarity of two hyperplanes using the distance and angle concept. The hyperplane-based fuzzy rules are adjusted using the FWGRLS method in the single-pass learning fashion.

A. Mechanism of Growing Rules

The rule growing mechanism of type-1 PALM is adopted from the SCC method developed in [24] and [25] to adapt the number of rules. This method has been successfully applied to automatically generate interval-valued data clouds in [18], but its use for HPBC deserves an in-depth investigation. In this technique, the rule significance is measured by calculating the input and output coherence. The coherence is measured by analyzing the correlation between the existing data samples and the target concept. Hereby, assuming the input vector Xt ∈ R^n, the target vector Tt ∈ R^n, and the hyperplane of the ith local submodel Hi ∈ R^(1×(n+1)), the input and output coherence between Xt and each Hi are calculated as follows:

Ic(Hi, Xt) = ξ(Hi, Xt)   (16)

Oc(Hi, Xt) = ξ(Xt, Tt) − ξ(Hi, Tt)   (17)

where ξ(·) expresses the correlation function. Various linear and nonlinear correlation methods can be applied here. Among them, the nonlinear methods for measuring the correlation between variables are hard to employ in the online environment, since they commonly use discretization or the Parzen window method. On the other hand, Pearson correlation is a widely used method for measuring the correlation between two variables. However, it suffers from some limitations: it is insensitive to the scaling and translation of variables and sensitive to rotation [43]. To solve these problems, a method namely the maximal information compression index (MCI) is proposed in [43], which has also been utilized in the SCC method to measure the correlation ξ(·) between variables as follows:

ξ(Xt, Tt) = ½ (var(Xt) + var(Tt) − √((var(Xt) + var(Tt))² − 4 var(Xt) var(Tt)(1 − ρ(Xt, Tt)²)))   (18)

ρ(Xt, Tt) = cov(Xt, Tt) / √(var(Xt) var(Tt))   (19)

where var(Xt) and var(Tt) express the variances of Xt and Tt, respectively, cov(Xt, Tt) presents the covariance between the two variables Xt and Tt, and ρ(Xt, Tt) stands for the Pearson correlation index of Xt and Tt. In a similar way, the correlations ξ(Hi, Xt) and ξ(Hi, Tt) can be measured using (18) and (19). In addition, the MCI method measures the compressed information when a newly observed sample is ignored. The properties of the MCI method in our work can be expressed as follows:
1) 0 ≤ ξ(Xt, Tt) ≤ ½ (var(Xt) + var(Tt));
2) the maximum possible correlation corresponds to ξ(Xt, Tt) = 0;
3) it expresses symmetric behavior, ξ(Xt, Tt) = ξ(Tt, Xt);
4) it is invariant to translation of the dataset;
5) it expresses robustness against rotation.
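As a numerical check of (18) and (19), a direct NumPy transcription is given below. This is our own sketch, not the authors' code; the use of population (biased) variance is an assumption. The usage mirrors properties 2) and 3): a perfectly correlated pair yields ξ = 0, and ξ is symmetric in its arguments.

```python
import numpy as np

def mci(u, v):
    """Maximal information compression index, eqs. (18)-(19).
    Smaller values mean stronger correlation; 0 is the maximum."""
    vu, vv = np.var(u), np.var(v)                            # population variances
    rho = np.cov(u, v, bias=True)[0, 1] / np.sqrt(vu * vv)   # eq. (19)
    disc = (vu + vv) ** 2 - 4.0 * vu * vv * (1.0 - rho ** 2)
    return 0.5 * (vu + vv - np.sqrt(max(disc, 0.0)))         # eq. (18)

u = np.array([1.0, 2.0, 3.0, 4.0])
v = 2.0 * u + 1.0                    # linearly dependent on u: rho = 1
w = np.array([4.0, 1.0, 3.0, 2.0])   # weakly correlated with u
```

For linearly dependent u and v, the term under the square root collapses to (var(u) + var(v))², so ξ(u, v) = 0 (property 2); swapping arguments leaves ξ unchanged (property 3); and ξ(u, w) stays within the bound of property 1).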

Fig. 2. Merging of redundant hyperplanes (rules) due to newly incoming training samples.

Ic(Hi, Xt) is projected to explore the similarity between Hi and Xt directly, while Oc(Hi, Xt) is meant to examine the dissimilarity between Hi and Xt indirectly by utilizing the target vector as a reference. Under the present hypothesis, the input and output coherence need to satisfy the following conditions to add a new rule or hyperplane:

Ic(Hi, Xt) > b1 and Oc(Hi, Xt) < b2    (20)

where b1 ∈ [0.01, 0.1] and b2 ∈ [0.01, 0.1] are predetermined thresholds. If the hypothesis satisfies both conditions of (20), a new rule is added with the highest input coherence. Besides, the number of data points accommodated by a rule is updated as Nj* = Nj* + 1, and the correlation measure functions ξ(·) are updated with (18) and (19). Owing to the local learning scenario, each rule is adapted separately; therefore, the covariance matrix is independent for each rule, Cj(k) ∈ R^((n+1)×(n+1)), where n is the number of inputs. When a new hyperplane is added by satisfying (20), the hyperplane parameters and the output covariance matrix of the FWGRLS method are crafted as follows:

πR+1 = πR*,  CR+1 = ΩI.    (21)

Due to the local learning scenario, the consequent of a newly added rule can be assigned as that of the closest rule, since the expected trend in the local region can be portrayed easily from the nearest rule. The value of the correction parameter Ω in (21) is very large (10^5). Initially, the weights ω have a low approximation power and therefore call for a large value of the correction factor Ω in (21); the resulting large covariance matrix Cj helps to obtain a fast convergence of the consequent parameters to the optimal solution in the least-squares sense [44], [45]. This approach opens the possibility of performing incremental training from scratch without generating an initial model. The proof of such a consequent parameter setting is detailed in [45]. In addition, the covariance matrices of the individual rules have no relationship with one another. Thus, when a rule is pruned in the rule merging module, its covariance matrix and consequent parameters are deleted, as this does not affect the convergence characteristics of the C matrices and consequents of the remaining rules.

B. Mechanism of Merging Rules

In SANFSs, the rule evolution mechanism usually generates redundant rules. These unnecessary rules create complexity in the rule base, which hinders desirable features of fuzzy rules: transparency and tractability in their operation. Notably, in handling data streams, two overlapping clusters or rules may easily be obtained when new samples occupy the gap between two existing clusters. Several useful methods have been employed to merge redundant rules or clusters in [9], [18], [30], and [46]. However, all of these techniques are appropriate mainly for HSBC or ellipsoid-based clusters.

In the realm of hyperplane clustering, there is a possibility of generating a higher number of hyperplanes for the same dataset than with spherical or ellipsoidal clusters because of the nature of HPBC, in which each hyperplane represents a specific operating region of the approximation curve. This opens a higher chance of generating redundant rules than with HSBC and HEBC. Therefore, an appropriate merging technique is vital and has to achieve a tradeoff between the diversity of the fuzzy rules and the generalization power of the rule base. For clarity, the merging of two hyperplanes due to newly incoming training samples is illustrated in Fig. 2.

In [47], to merge the hyperplanes, the similarity and dissimilarity between them are obtained by measuring only the angle between the hyperplanes. This strategy is, however, not conclusive for deciding the similarity of two hyperplanes because it solely considers the orientation of a hyperplane without looking at the relationship of the two hyperplanes in the target space.

In our work, to measure the similarity between the hyperplane-shaped fuzzy rules, the angle between them is estimated as follows [9], [48]:

θhp = arccos(|ωR^T ωR+1| / (|ωR| |ωR+1|))    (22)

where θhp ranges between 0 and π radian, ωR = [b1,R, b2,R, ..., bk,R], and ωR+1 = [b1,R+1, b2,R+1, ..., bk,R+1]. The angle between the hyperplanes alone is not sufficient to decide whether the rule merging scenario should take place because it does not inform the closeness of the two hyperplanes in the target space. Therefore, the spatial proximity between the two hyperplanes in the hyperspace is also taken into account.
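The angle test of (22) is straightforward to reproduce. Below is a hedged sketch (our own code, not the released implementation) applying it to the consequent parameter vectors of two rules:

```python
import numpy as np

def hyperplane_angle(w_r, w_r1):
    # Angle between two hyperplanes per (22): arccos of the absolute
    # normalized inner product of their parameter vectors; the absolute
    # value makes anti-parallel hyperplanes count as aligned.
    c = abs(np.dot(w_r, w_r1)) / (np.linalg.norm(w_r) * np.linalg.norm(w_r1))
    return float(np.arccos(np.clip(c, 0.0, 1.0)))  # clip guards rounding
```

A near-zero θhp alone is not conclusive, which is why PALM pairs it with a spatial proximity test before committing a merge.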
If we consider two hyperplanes lR1 = a1 + x b1 and lR2 = a2 + x b2, then the minimum distance between them can be projected as follows:

dR,R+1 = |(a1 − a2) · (b1 × b2) / |b1 × b2||.    (23)

The rule merging condition is formulated as follows:

θhp ≤ c1 and dR,R+1 ≤ c2    (24)

where c1 ∈ [0.01, 0.1] and c2 ∈ [0.001, 0.1] are predefined thresholds. If (24) is satisfied, the fuzzy rules are merged. It is worth noting that this merging technique is only applicable in the local learning context because, in the case of global learning, the orientation and similarity of two hyperplanes have no direct correlation with their relationship.

In our merging mechanism, a dominant rule having higher support is retained, whereas a less dominant hyperplane (rule), resided in by fewer samples, is pruned to carry out the structural simplification of PALM. A dominant rule has a higher influence on the merged cluster because it represents the underlying data distribution. That is, the dominant rule is kept in the rule base so that a good partition of the data space is maintained and even improved. For simplicity, the weighted average strategy is adopted in merging two hyperplanes as follows:

ωacm_new = (ωacm_old Nacm_old + ωacm+1_old Nacm+1_old) / (Nacm_old + Nacm+1_old)    (25)

Nacm_new = Nacm_old + Nacm+1_old    (26)

where ωacm_old is the output weight vector of the acm-th rule, ωacm+1_old is the output weight vector of the (acm+1)-th rule, ωacm_new is the output weight vector of the merged rule, and N is the population of a fuzzy rule. Note that rule acm is more influential than rule acm+1, since Nacm > Nacm+1. The rule merging procedure is committed during a stable period in which no addition of rules occurs. This strategy aims to attain a stable rule evolution and prevents new rules from being merged straightaway after being introduced into the rule base. As an alternative, Yager's participatory learning-inspired merging scenario [46] can be used to merge the two hyperplanes.

C. Adaptation of Hyperplanes

In previous work on hyperplane-based T-S fuzzy systems [49], the recursive least squares (RLS) method is employed to calculate the parameters of the hyperplanes. As an advancement of the RLS method, a term decaying the consequent parameters is added to the cost function of the RLS method in [50], which helps to obtain a solid generalization performance: the generalized RLS (GRLS) approach. However, that approach is formed in the context of global learning. A local learning method has some advantages over its global counterpart: interpretability and robustness against noise. The interpretability is supported by the fact that each hyperplane portrays a specific operating region of the approximation curve. Also, in local learning, the generation or deletion of any rule does not harm the convergence of the consequent parameters of the other rules, which results in a significantly more stable updating process [51].

Due to the desired features of the local learning scenario, the GRLS method is extended in [9] and [11] into the fuzzily weighted GRLS (FWGRLS) method. FWGRLS can also be seen as a variation of the fuzzily weighted RLS (FWRLS) method [7] with the insertion of a weight decay term. The FWGRLS method is formed in the proposed type-1 PALM, where the cost function can be expressed as:

JLnj = (yt − xe πj) Λj (yt − xe πj) + 2βϕ(πj) + (π − πj)(Cj xe)^(−1) (π − πj)    (27)

JLn = Σ_{j=1}^{i} JLnj    (28)

where Λj denotes a diagonal matrix with the diagonal elements of Rj, β represents a regularization parameter, ϕ is a decaying factor, xe is the extended input vector, Cj is the covariance matrix, and πj is the local subsystem of the jth hyperplane. Following a similar approach to [9], the final expression of the FWGRLS approach is formed as follows:

πj(k) = πj(k − 1) − β Cj(k) ∇ϕ(πj(k − 1)) + Υ(k)(yt(k) − xe πj(k)),  j = 1, 2, ..., R    (29)

where

Cj(k) = Cj(k − 1) − Υ(k) xe Cj(k − 1)    (30)

Υ(k) = Cj(k − 1) xe (1/Λj + xe Cj(k − 1) xe^T)^(−1)    (31)

with the initial conditions

π1(1) = 0 and C1(1) = ΩI    (32)

where Υ(k) denotes the Kalman gain, R is the number of rules, and Ω = 10^5 is a large positive constant. In this paper, the regularization parameter β is assigned an extremely small value (β ≈ 10^(−7)). It can be observed that, without the term β πj(k) ∇ϕ(k), the FWGRLS method is similar to the RLS method. This term steers the value of πj(k), updating it even by an insignificant amount, thereby minimizing the impact of inconsequential rules. The quadratic weight decay function chosen in PALM is written as follows:

ϕ(πj(k − 1)) = (1/2)(πj(k − 1))^2.    (33)

Its gradient can be expressed as

∇ϕ(πj(k − 1)) = πj(k − 1).    (34)

By utilizing this function, the adapted weight is shrunk by a factor proportional to its present value, which helps to intensify the generalization capability by keeping the dynamics of the output weights at small values [52].
V. ONLINE LEARNING POLICY IN TYPE-2 PALM

The learning policy of the type-1 PALM is extended in the context of the type-2 fuzzy system, where the q design factors are utilized to carry out the type-reduction scenario. The learning mechanisms are detailed in the following subsections.

A. Mechanism of Growing Rules

In the realm of the type-2 fuzzy system, the SCC method has been extended to the type-2 SCC (T2SCC) in [18]. It has been adopted and extended here in terms of the design factors ql and qr, since the original work in [18] only deals with a single design factor q. In this T2SCC method, the rule significance is measured by calculating the input and output coherence as done in the type-1 system. By assuming Hi = [H̲i, H̄i] ∈ R^(R×(1+n)) as the interval-valued hyperplane of the ith local submodel, the input and output coherence for our proposed type-2 system can be extended as follows:

IcL(Hi, Xt) = (1 − ql) ξ(H̲i, Xt) + ql ξ(H̄i, Xt)    (35)

IcR(Hi, Xt) = (1 − qr) ξ(H̲i, Xt) + qr ξ(H̄i, Xt)    (36)

Ic(Hi, Xt) = (IcL(Hi, Xt) + IcR(Hi, Xt)) / 2    (37)

Oc(Hi, Xt) = ξ(Xt, Tt) − ξ(Hi, Tt)    (38)

where

ξL(Hi, Tt) = (1 − ql) ξ(H̲i, Tt) + ql ξ(H̄i, Tt)    (39)

ξR(Hi, Tt) = (1 − qr) ξ(H̲i, Tt) + qr ξ(H̄i, Tt)    (40)

ξ(Hi, Tt) = (ξL(Hi, Tt) + ξR(Hi, Tt)) / 2.    (41)

Unlike the direct calculation of the input coherence Ic(·) in the type-1 system, in the type-2 system Ic(·) is calculated using (37) from the left IcL(·) and right IcR(·) input coherence. By using the MCI method in the T2SCC rule growing process, the correlation is measured using (18) and (19), where (Xt, Tt) are substituted with (H̲i, Xt), (H̄i, Xt), (H̲i, Tt), and (H̄i, Tt). The conditions for growing rules remain the same as expressed in (20), modified only to fit the type-2 fuzzy system platform. The parameter settings for the predefined thresholds are as in the type-1 fuzzy model.

B. Mechanism of Merging Rules

The merging mechanism of the type-1 PALM is extended to the type-2 fuzzy model. To merge the rules, both the angle and the distance between two interval-valued hyperplanes are measured as follows:

θhp = arccos(|ωR^T ωR+1| / (|ωR| |ωR+1|))    (42)

dR,R+1 = |(a1 − a2) · (b1 × b2) / |b1 × b2||    (43)

where θhp = [θ̲hp, θ̄hp] and dR,R+1 = [d̲R,R+1, d̄R,R+1]. These θhp and dR,R+1 also need to satisfy the condition of (24) to merge the rules, where the same ranges of c1 and c2 are applied in the type-2 PALM. The formula of the merged weight in (25) is extended for the interval-valued merged weight as follows:

ωacm_new = (ωacm_old Nacm_old + ωacm+1_old Nacm+1_old) / (Nacm_old + Nacm+1_old)    (44)

where ωacm = [ω̲acm, ω̄acm]. As with the type-1 PALM, the weighted average strategy is followed in the rule merging procedure of the type-2 PALM.

C. Learning of the Hyperplane Submodel Parameters

The FWGRLS method [9] is extended to adjust the upper and lower hyperplanes of the interval type-2 PALM. The final expression of the FWGRLS method is shown as follows:

πj(k) = πj(k − 1) − β Cj(k) ∇ϕ(πj(k − 1)) + Υ(k)(yt(k) − xe πj(k)),  j = 1, 2, ..., R    (45)

where

Cj(k) = Cj(k − 1) − Υ(k) xe Cj(k − 1)    (46)

Υ(k) = Cj(k − 1) xe (1/Λj + xe Cj(k − 1) xe^T)^(−1)    (47)

and πj = [π̲j, π̄j], Cj = [C̲j, C̄j], Υ = [Υ̲, Ῡ], and Λj = [Λ̲j, Λ̄j]. The quadratic weight decay function of the FWGRLS method is retained in the type-2 PALM to provide the weight decay effect in the rule merging scenario.

D. Adaptation of the q Design Factors

The q design factor as used in [11] is extended into the left ql and right qr design factors to actualize a higher degree of freedom of the type-2 fuzzy model. They are initialized in such a way that the condition qr > ql is maintained. In this adaptation process, the gradients of ql and qr with respect to the error E = (1/2)(yd − yout)^2 can be expressed as follows:

∂E/∂ql = (∂E/∂yout)(∂yout/∂yl,out)(∂yl,out/∂ql) = −(1/2)(yd − yout)(f¹out / Σ_{i=1}^R f¹out − f²out / Σ_{i=1}^R f²out)    (48)

∂E/∂qr = (∂E/∂yout)(∂yout/∂yr,out)(∂yr,out/∂qr) = −(1/2)(yd − yout)(f¹out / Σ_{i=1}^R f¹out − f²out / Σ_{i=1}^R f²out).    (49)
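A hedged sketch of how these gradients drive the design-factor adaptation is shown below; it is our own illustration, with the gradients supplied by the caller, and the swap that preserves the interval ordering is an assumed policy rather than something the text specifies:

```python
def update_q_factors(ql, qr, grad_ql, grad_qr, a=0.1):
    # Descend the error surface E along the left and right design
    # factors; if a step ever crosses the two factors, reorder them
    # so that ql <= qr still holds (assumed policy).
    ql, qr = ql - a * grad_ql, qr - a * grad_qr
    if ql > qr:
        ql, qr = qr, ql
    return ql, qr
```

A fixed step size keeps the sketch simple; as noted for the learning rate below, an adaptive step could shorten convergence.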
After obtaining the gradients ∂E/∂ql and ∂E/∂qr, ql and qr are updated using the following formulas:

ql_new = ql_old − a (∂E/∂ql_old)    (50)

qr_new = qr_old − a (∂E/∂qr_old)    (51)

where a = 0.1 is the learning rate. Note that the learning rate is key to the convergence of ql and qr because it determines the step size of the adjustment. An adaptive strategy as in [39] can be implemented to shorten the convergence time without compromising the stability of the adaptation process.

E. Impediments of the Basic PALM Structure

In PALM, the hyperplane-shaped membership function is formulated by exercising the distance dst(j) exposed in (5). The dst(j) is calculated using the true output value, based on the theory of point-to-hyperplane distance [26]. Therefore, PALM has a dependency on the true output in the deployment phase. Usually, true outputs are not known in deployment mode. To circumvent this structural shortcoming, the so-called "teacher forcing" mechanism [53] is employed in PALM. In the teacher forcing technique, the network has connections from its outputs to its hidden nodes at the next time step. Based on this concept, the output of PALM is connected with the input layer at the next step, which constructs the rPALM architecture. The modified distance formula for the rPALM architecture is provided in the supplementary document. Besides, the code of the proposed rPALM is made available in [54]. Our numerical results demonstrate that rPALM produces a minor decrease of predictive accuracy compared to PALM but is still better than many of the benchmarked SANFSs. The downside of rPALM is that its rules are slightly less transparent because it relies on its predicted output of the previous time instant y(k − 1) rather than the incoming input xk.

VI. EVALUATION

PALM has been evaluated through numerical studies with the use of synthetic and real-world streaming datasets. The code of PALMs and rPALMs, along with these datasets, has been made publicly available in [27] and [54].

A. Experimental Setup

1) Synthetic Streaming Datasets: Three synthetic streaming datasets are utilized in our work to evaluate the adaptive mechanism of PALM:
1) the Box–Jenkins (BJ) time series dataset;
2) the Mackey–Glass (MG) chaotic time series dataset; and
3) a nonlinear system identification dataset.

a) BJ Gas Furnace Time Series Dataset: The BJ gas furnace dataset is a famous benchmark problem in the literature for verifying the performance of SANFSs. The objective of the BJ gas furnace problem is to model the output y(k), i.e., the CO2 concentration, from the time-delayed input methane flow rate u(k − 4) and the previous output y(k − 1). The I/O configuration follows the standard setting in the literature:

y(k) = f(u(k − 4), y(k − 1)).    (52)

The problem of (52) consists of 290 data samples, of which 200 samples are reserved for training while the remaining 90 samples are used to test the model's generalization.

b) MG Chaotic Time Series Dataset: The MG chaotic time series problem, having its root in [57], is a popular benchmark in which the future value of a chaotic differential delay equation is forecast from its past values. Many researchers have used the MG dataset to evaluate their SANFSs' learning and generalization performance. This dataset is characterized by nonlinear and chaotic behaviors whose oscillations replicate most physiological processes. The MG dataset was initially proposed as a control model of the generation of white blood cells. The mathematical model is expressed as

dy(k)/dt = b y(k − δ) / (1 + y^10(k − δ)) − a y(k)    (53)

where b = 0.2, a = 0.1, and δ = 85. The chaotic element is primarily attributed to δ ≥ 17. Data samples are generated through the fourth-order Runge–Kutta method, and our goal is to predict the system output y(k + 85), i.e., 85 steps ahead, using four inputs: y(k), y(k − 6), y(k − 12), and y(k − 18). This series-parallel regression model can be expressed as follows:

y(k + 85) = f(y(k), y(k − 6), y(k − 12), y(k − 18)).    (54)

For training purposes, a total of 3000 samples between k = 201 and k = 3200 is generated with the help of the fourth-order Runge–Kutta method, whereas the predictive model is tested with 500 unseen samples in the range of k = 5001–5500 to assess the generalization capability of PALM.

c) Nonlinear System Identification Dataset: A nonlinear system identification problem is put forward to validate the efficacy of PALM; it has frequently been used by researchers to test their SANFSs. The nonlinear dynamics of the system can be formulated by the following difference equation:

y(k + 1) = y(k) / (1 + y^2(k)) + u^3(k)    (55)

where u(k) = sin(2πk/100). The predicted output of the system y(k + 1) depends on the previous inputs and its own lagged outputs, which can be expressed as follows:

y(k + 1) = f(y(k), y(k − 1), ..., y(k − 10), u(k)).    (56)

The first 50 000 samples are employed to build our predictive model, and another 200 samples are fed to the model to test its generalization.

2) Real-World Streaming Datasets: Three different real-world streaming datasets, from two rotary-wing unmanned aerial vehicle (RUAV) experimental flight tests and a time-varying stock index forecasting problem, are exploited to study the performance of PALM.

a) Quadcopter UAV Streaming Data: A real-world streaming dataset is collected from the experimental flight test of a quadcopter RUAV based on the Pixhawk autopilot framework. All experiments are performed in the indoor UAV laboratory at the UNSW Canberra campus. To record the quadcopter flight data, the Robot Operating System (ROS), running under the Ubuntu 16.04 version of Linux, is used. By using ROS, a well-structured communication layer is introduced into the quadcopter, reducing the burden of having to reinvent necessary software.
During real-time flight testing, accurate vehicle position, velocity, and orientation are the information required to identify the quadcopter online. For system identification, flight data of the quadcopter's altitude containing approximately 9000 samples are recorded, with some noise, from the VICON optical motion capture system. Among them, 60% of the samples are used for training and the remaining 40% for testing. In this paper, our model's output y(k) is estimated from the previous point y(k − 6) and the system input u(k), which is the thrust required by the rotors of the quadcopter. The regression model from the quadcopter data stream can be expressed as follows:

y(k) = f(y(k − 6), u(k)).    (57)

b) Helicopter UAV Streaming Data: The RUAV chosen for gathering the streaming dataset is a Taiwanese-made Align Trex450 Pro Direct Flight Control, flybarless helicopter. The high degree of nonlinearity associated with the Trex450 RUAV vertical dynamics makes it challenging to build a regression model from experimental data streams. All experiments are conducted at the UAV laboratory of the UNSW Canberra campus. The flight data consist of 6000 samples collected in near-hover, heave, and in-ground-effect flight conditions to simulate nonstationary environments. The first 3600 samples are used as training data, and the rest of the data are used to test the model. The nonlinear dependence of the helicopter RUAV is governed by the regression model as follows:

y(k + 1) = f(y(k), u(k))    (58)

where y(k + 1) is the estimated output of the helicopter system one step ahead.

c) Time-Varying Stock Index Forecasting Data: Our proposed PALM has also been evaluated on a time-varying dataset, namely the prediction of the Standard and Poor's 500 [S&P-500 (∧GSPC)] market index [58], [59]. The dataset consists of 60 years of daily index values ranging from January 3, 1950 to March 12, 2009, downloaded from [60]. This problem comprises 14 893 data samples. In our work, the reversed-order data points of the same 60 years of indexes have been amalgamated with the original dataset, forming a new dataset with 29 786 index values. Among them, 14 893 samples are allocated to train the model, and the remaining 14 893 samples are used as validation data. The target variable is the next day's S&P-500 index y(k + 1), predicted using the indexes of the previous five consecutive days: y(k), y(k − 1), y(k − 2), y(k − 3), and y(k − 4). The functional relationship of the predictive model is formalized as follows:

y(k + 1) = f(y(k), y(k − 1), y(k − 2), y(k − 3), y(k − 4)).    (59)

This dataset carries a sudden-drift property that appears around 2008, corresponding to the economic recession in the U.S. due to the housing crisis.

B. Results and Discussion

In this paper, we have developed PALM by implementing the type-1 and type-2 fuzzy concepts, where both of them are simulated under two parameter optimization scenarios:
1) Type-1 PALM (L);
2) Type-1 PALM (G);
3) Type-2 PALM (L);
4) Type-2 PALM (G).
L denotes the local update strategy, while G stands for the global learning mechanism. The basic PALM models are tested with three synthetic and three real-world streaming datasets. Furthermore, the models are compared against eight prominent variants of SANFSs, namely DFNN [42], GDFNN [55], FAOSPFNN [56], eTS [7], simp_eTS [8], GENEFIS [9], PANFIS [10], and pRVFLN [18]. The experiments with real-world and synthetic data streams are repeated with rPALM, and all experimental results using rPALM are purveyed in the supplementary document. The efficacy of the proposed PALMs has been evaluated by measuring the root mean square error (RMSE) and the nondimensional error index (NDEI), written as follows:

MSE = (1/NTs) Σ_{k=1}^{NTs} (yt − yk)^2,  RMSE = √MSE    (60)

NDEI = RMSE / Std(Ts)    (61)

where NTs is the total number of testing samples and Std(Ts) denotes the standard deviation over all actual output values in the testing set. The comparison is produced under the same computational platform: an Intel(R) Xeon(R) E5-1630 v4 CPU with a 3.70-GHz processor and 16.0 GB of installed memory.

1) Results and Discussion on Synthetic Streaming Datasets: Table I sums up the outcomes of the BJ time series for all benchmarked models. Among the various models, our proposed type-2 PALM (G) clearly outperforms the other consolidated algorithms in terms of predictive accuracy. For instance, the measured NDEI is just 0.0562, the lowest among all models. Type-2 PALM (G) generates eleven (11) rules to achieve this accuracy level. Although the number of generated rules is higher than that of the remaining models, its accuracy far exceeds that of its counterparts, whose accuracy hovers around 0.29. A fair comparison is also established by utilizing a very close number of rules in some benchmarked strategies, namely eTS, simp_eTS, PANFIS, and GENEFIS. In doing so, the lowest NDEI observed among the benchmarked variants is 0.29, delivered by GENEFIS, which is substantially higher than the measured NDEI of type-2 PALM (G). The advantage of HPBC is evidenced by the number of PALM's network parameters: with 13 rules and two inputs, PALM evolves only 39 parameters, whereas the number of network parameters of other algorithms, for instance GENEFIS, is 117. PALM requires the fewest parameters among all the other variants of SANFS as well, which positively affects the execution speed of PALM. On the other hand, with only two rules, the NDEI of PALM is also lower than that of the benchmarked variants, as observed for type-2 PALM (L) in Table I, where it requires only 12 network parameters. It is important to note that the rule merging mechanism is active only in the local learning scenario. Here, the numbers of induced rules are 8 and 2, compared with 8 and 11 in the corresponding global learning versions. In both the G and L cases, the NDEI values are very close to each other with a very similar number of rules. In short, PALM constructs a compact regression model of the BJ time series with the least number of network parameters while producing the most reliable prediction.
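The accuracy figures quoted throughout follow (60) and (61); a compact sketch (our own code, assuming the population standard deviation for Std(Ts)):

```python
import math

def rmse_ndei(y_true, y_pred):
    # RMSE and NDEI per (60)-(61): NDEI normalizes RMSE by the
    # standard deviation of the actual outputs of the testing set
    # (population standard deviation assumed here).
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(mse)
    mean = sum(y_true) / n
    std = math.sqrt(sum((t - mean) ** 2 for t in y_true) / n)
    return rmse, rmse / std
```

Normalizing by Std(Ts) makes the NDEI comparable across datasets whose outputs live on very different scales.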
TABLE I
MODELING OF THE BJ TIME SERIES USING VARIOUS SANFSS

TABLE II
MODELING OF THE MG CHAOTIC TIME SERIES USING VARIOUS SANFSS

The prediction of the MG chaotic time series is challenging due to its nonlinear and chaotic behavior. Numerical results on the MG chaotic time series dataset are consolidated in Table II, where 500 unseen samples are used to test all the models. Due to the highly nonlinear behavior, an NDEI lower than 0.2 was obtained only by GENEFIS [9] among the benchmarked algorithms. However, it costs 42 rules and requires a large number (1050) of network parameters. On the contrary, with only 18 rules, 180 network parameters, and faster execution, the type-2 PALM (G) attains an NDEI of 0.0575, and this result is traced within 3.21 s due to the deployment of fewer parameters than its counterparts. The use of the rule merging method in local learning mode reduces the generated rules to three (3) in type-1 PALM (L). A comparable accuracy is obtained from type-1 PALM (L) with only three rules and 15 network parameters. Accomplishing such accuracy with few parameters decreases the computational complexity of predicting the complex nonlinear system, as witnessed for type-1 PALM (L) in Table II. Due to the low computational burden, the lowest execution time of 0.7771 s is achieved by the type-1 PALM (G).

PALM has also been utilized to estimate a high-dimensional nonlinear system with 50 000 training samples. Numerical results on the nonlinear system identification dataset are presented in Table III. This study case depicts a similar trend, where PALM is capable of delivering comparable accuracy but with much less computational complexity and memory demand. The deployment of the rule merging module lessens the number of rules from 9 to 5 in the case of type-1 PALM, and from 7 to 3 in type-2 PALM. The NDEI obtained by the PALMs with such a small number of rules is also similar to that of the other SANFS variants. To sum up, PALM can deal with streaming examples with a low computational burden due to the utilization of few network parameters, while maintaining a comparable or better predictive accuracy.

2) Results and Discussion on Real-World Data Streams: Table IV outlines the results of the online identification of a quadcopter RUAV from experimental flight test data. A total of 9112 samples of the quadcopter's hovering test, with very high noise from the motion capture technique, namely VICON [61], is recorded. Building a SANFS using this noisy streaming dataset is computationally expensive, as seen from the high execution times of the benchmarked SANFSs. In contrast with these standard SANFSs, a quick execution time is observed from the PALMs, which happens due to the requirement of few network parameters. Besides, PALM arrives at encouraging accuracy as well. For instance, the lowest NDEI, at just 0.1538, is elicited by type-2 PALM (G). To put it plainly, by utilizing incremental HPBC, PALM can perform better than its counterpart SANFSs driven by the HSBC and HEBC methods when dealing with noisy datasets.
TABLE III
MODELING OF THE NONLINEAR SYSTEM USING VARIOUS SANFSS

TABLE IV
ONLINE MODELING OF THE QUADCOPTER UTILIZING VARIOUS SANFSS

The online identification of a helicopter RUAV (Trex450 Pro) from experimental flight data in hovering condition is tabulated in Table V. The highest identification accuracy, with an NDEI of only 0.1380, is obtained from the proposed type-2 PALM (G) with 9 rules. As in the previous experiments, the activation of the rule merging scenario reduces the fuzzy rules significantly: from 11 to 6 in type-1 PALM, and from 9 to 6 in type-2 PALM. The highest accuracy is produced by type-2 PALM with only 9 rules, most likely due to the uncertainty handling capacity of the type-2 fuzzy system. PALM's prediction of the helicopter's hovering dynamics and its rule evolution are depicted in Fig. 3. These figures are produced by the type-2 PALM (L). For further clarification, the fuzzy rule extracted by type-1 PALM (L) in the case of modeling the helicopter can be uttered as follows:

R1: IF X is close to [1, x1, x2] × [0.0186, −0.0909, 0.9997]^T, THEN y1 = 0.0186 − 0.0909 x1 + 0.9997 x2.    (62)

In (62), the antecedent part is manifesting the hyperplane. The consequent part is simply y1 = x1e ω, where ω ∈ R^((n+1)×1) and n is the number of input dimensions. The usage of two inputs in the experiment of Table V assembles an extended input vector x1e = [1, x1, x2]. The weight vector is [ω01, ω11, ω21] = [0.3334, 0.3334, 0.3334]. In the case of the type-2 local learning configuration, a rule can be stated as follows:

R1: IF X is close to ([1, x1, x2] × [0.0787, −0.3179, 1.0281]^T, [1, x1, x2] × [0.2587, −0.1767, 1.2042]^T), THEN y1 = [0.0787, 0.2587] + [−0.3179, −0.1767] x1 + [1.0281, 1.2042] x2    (63)

where (63) expresses the first of the 6 rules formed in that experiment under the type-2 PALM's local learning scenario. Since PALM has no premise parameters, the antecedent part simply presents the interval-valued hyperplanes. The consequent part is nothing but y1 = x1e ω, where ω ∈ R^((2(n+1))×1) and n is the number of input dimensions.
TABLE V
ONLINE MODELING OF THE HELICOPTER UTILIZING VARIOUS SANFSS

Fig. 3. (a) Online identification of helicopter (in hovering condition). (b) Rule evolution in that identification using type-2 PALM (L).

Since 2 inputs are availed in the experiment of Table V, the
extended input vector is xe = [1, x1, x2], and the interval-valued
weight vectors are: [ω01, ω̄01] = [0.0787, 0.2587]; [ω11, ω̄11] =
[−0.3179, −0.1767]; [ω21, ω̄21] = [1.0281, 1.2042].

The numerical results on the time-varying Stock Index
Forecasting S&P-500 (^GSPC) problem are organized in Table VI.
The lowest number of network parameters is obtained from the
PALMs, and subsequently the fastest training speed of 2.0326 s
is attained by type-1 PALM (L). All consolidated benchmarked
algorithms generate a comparatively lower level of accuracy,
around 0.015–1.07.

C. Sensitivity Analysis of Predefined Thresholds

In the rule growing scenario, two predefined thresholds (b1
and b2) are utilized in our work. Across various experiments,
it has been observed that the higher the value of b1, the fewer
hyperplanes are added, and vice versa. Unlike the effect of b1,
higher values of b2 cause more hyperplanes to be added, and
vice versa. To further validate this feature, the sensitivity of b1
and b2 is evaluated using the BJ gas furnace dataset. The same
I/O relationship as described in Section VI-A is applied here,
where the model is again trained with the same 200 samples and
the remaining 90 unseen samples are used to test the model.

In the first test, b2 is varied in the range [0.052, 0.053, 0.054,
0.055], while the value of b1 is kept fixed at 0.020. On the other
hand, b1 is varied in the range [0.020, 0.022, 0.024, 0.026], while
b2 is maintained at 0.055. In the second test, the altered range
for b1 is [0.031, 0.033, 0.035, 0.037], and for b2 it is [0.044,
0.046, 0.048, 0.050]. In this test, for a varying b1, the constant
value of b2 is 0.050, whereas b1 is fixed at 0.035 during the
change of b2. To evaluate the sensitivity of these thresholds, the
normalized RMSE (NRMSE), NDEI, running time, and number of
rules are reported in Table VII. The NRMSE formula can be
expressed as

NRMSE = √(MSE / Std(Ts)). (64)

From Table VII, it is observed that in the first test, for different
values of b1 and b2, the NRMSE and NDEI remain stable at 0.023
and 0.059, respectively. The execution time varies in a stable
range of [0.31, 0.35] s, and the number of generated rules is 13.
In the second test, the NRMSE, NDEI, and execution time remain
relatively constant in the ranges [0.046, 0.048], [0.115, 0.121],
and [0.26, 0.31] s, respectively. The value of b1 increases and b2
decreases compared with test 1, and fewer rules are generated
across the different experiments of our work.
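The two error measures used in the sensitivity study can be computed directly from a prediction series and its target series. The sketch below is a minimal, illustrative implementation: NRMSE follows (64), normalizing the MSE by the standard deviation of the test target series Ts under the square root, and NDEI is taken as the RMSE divided by the standard deviation of the observed series, which is how the nondimensional error index is commonly defined; the toy series are made-up numbers, not data from the paper.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def std(xs):
    """Population standard deviation."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def nrmse(y_true, y_pred):
    """NRMSE as in (64): sqrt(MSE / Std(Ts))."""
    return math.sqrt(mse(y_true, y_pred) / std(y_true))

def ndei(y_true, y_pred):
    """Nondimensional error index: RMSE / Std(Ts) (common definition)."""
    return math.sqrt(mse(y_true, y_pred)) / std(y_true)

# Toy target and prediction series (illustration only).
ts = [0.52, 0.61, 0.47, 0.55, 0.66, 0.49]
pred = [0.50, 0.63, 0.45, 0.57, 0.64, 0.50]

print(nrmse(ts, pred), ndei(ts, pred))
```

Reporting both measures across a grid of (b1, b2) settings, as in Table VII, then amounts to retraining the model per setting and evaluating these two functions on the 90 held-out samples.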
TABLE VI
MODELING OF THE TIME-VARYING STOCK INDEX FORECASTING USING VARIOUS SANFSS
TABLE VII
SENSITIVITY ANALYSIS OF RULE GROWING THRESHOLDS
VII. CONCLUSION

A novel SANFS, namely PALM, is proposed in this paper for
data stream regression. PALM is developed with the concept of
HPBC, which incurs very few network parameters. The reduction
of network parameters brings down the execution times because
only the output weight vector calls for the tuning scenario,
without compromising predictive accuracy. PALM possesses a
highly adaptive rule base where its fuzzy rules can be
automatically added when necessary based on the SCC theory. It
implements the rule merging scenario for complexity reduction,
and the concept of distance and angle is introduced to coalesce
similar rules. The efficiency of PALM has been tested in six
real-world and artificial data stream regression problems, where
PALM outperforms recently published works in terms of network
parameters and running time. It also delivers state-of-the-art
accuracies that are comparable to, and often better than, its
counterparts. In the future, PALM will be incorporated under a
deep network structure.

ACKNOWLEDGMENT

The authors would like to thank the UAV laboratory of the UNSW
Canberra campus for supporting this work with the real-world
datasets from the quadcopter and helicopter flight tests, and the
Computational Intelligence Laboratory of Nanyang Technological
University (NTU), Singapore for the computational support.