Neuro-Fuzzy Methods
Robert Fullér
Eötvös Loránd University, Budapest
VACATION SCHOOL
Neuro-Fuzzy Methods for Modelling
&
Fault Diagnosis
Lisbon, August 31 and September 1, 2001
Fuzzy logic and neural networks
Fuzzy sets were introduced by Zadeh in 1965
to represent/manipulate data and informa-
tion possessing nonstatistical uncertainties.
Fuzzy logic provides an inference morphol-
ogy that enables approximate human rea-
soning capabilities to be applied to knowledge-
based systems. The theory of fuzzy logic
provides a mathematical strength to capture
the uncertainties associated with human cog-
nitive processes, such as thinking and rea-
soning.
The conventional approaches to knowledge representation lack the means for representing the meaning of fuzzy concepts. As a consequence, the approaches based on first-order logic and classical probability theory do not provide an appropriate conceptual framework for dealing with the representation of commonsense knowledge, since such knowledge is by its nature both lexically imprecise and noncategorical.
The development of fuzzy logic was motivated in large measure by the need for a conceptual framework which can address the issue of uncertainty and lexical imprecision.
Some of the essential characteristics of fuzzy logic relate to the following (Zadeh, 1992):

• In fuzzy logic, exact reasoning is viewed as a limiting case of approximate reasoning.

• In fuzzy logic, everything is a matter of degree.

• In fuzzy logic, knowledge is interpreted as a collection of elastic or, equivalently, fuzzy constraints on a collection of variables.

• Inference is viewed as a process of propagation of elastic constraints.

• Any logical system can be fuzzified.
There are two main characteristics of fuzzy systems that give them better performance for specific applications:

• Fuzzy systems are suitable for uncertain or approximate reasoning, especially for systems with a mathematical model that is difficult to derive.

• Fuzzy logic allows decision making with estimated values under incomplete or uncertain information.
Artificial neural systems can be considered as simplified mathematical models of brain-like systems and they function as parallel distributed computing networks. However, in contrast to conventional computers, which are programmed to perform specific tasks, most neural networks must be taught, or trained. They can learn new associations, new functional dependencies and new patterns.
Perhaps the most important advantage of neu-
ral networks is their adaptivity. Neural net-
works can automatically adjust their weights
to optimize their behavior as pattern recog-
nizers, decision makers, system controllers,
predictors, etc. Adaptivity allows the neu-
ral network to perform well even when the
environment or the system being controlled
varies over time.
Hybrid systems
Hybrid systems combining fuzzy logic, neu-
ral networks, genetic algorithms, and expert
systems are proving their effectiveness in a
wide variety of real-world problems.
Every intelligent technique has particular computational properties (e.g. the ability to learn, or to explain its decisions) that make it suited for particular problems and not for others.
For example, while neural networks are good
at recognizing patterns, they are not good at
explaining how they reach their decisions.
Fuzzy logic systems, which can reason with
imprecise information, are good at explain-
ing their decisions but they cannot automat-
ically acquire the rules they use to make
those decisions.
These limitations have been a central driv-
ing force behind the creation of intelligent
hybrid systems where two or more techniques
are combined in a manner that overcomes
the limitations of individual techniques.
Hybrid systems are also important when con-
sidering the varied nature of application do-
mains. Many complex domains have many
different component problems, each of which
may require different types of processing.
If there is a complex application which has
two distinct sub-problems, say a signal pro-
cessing task and a serial reasoning task, then
a neural network and an expert system re-
spectively can be used for solving these sep-
arate tasks.
The use of intelligent hybrid systems is growing rapidly with successful applications in many areas including process control, engineering design, financial trading, credit evaluation, medical diagnosis, and cognitive simulation.
While fuzzy logic provides an inference mech-
anism under cognitive uncertainty, compu-
tational neural networks offer exciting ad-
vantages, such as learning, adaptation, fault-
tolerance, parallelism and generalization.
To enable a system to deal with cognitive
uncertainties in a manner more like humans,
one may incorporate the concept of fuzzy
logic into the neural networks. The resulting hybrid system is called a fuzzy neural, neural fuzzy, neuro-fuzzy or fuzzy-neuro network.
Neural networks are used to tune mem-
bership functions of fuzzy systems that
are employed as decision-making systems
for controlling equipment.
Although fuzzy logic can encode expert knowledge directly using rules with linguistic labels, it usually takes a lot of time to design and tune the membership functions which quantitatively define these linguistic labels.
Neural network learning techniques can au-
tomate this process and substantially reduce
development time and cost while improving
performance.
In theory, neural networks and fuzzy systems are equivalent in that they are convertible, yet in practice each has its own advantages and disadvantages. For neural networks, the knowledge is automatically acquired by the backpropagation algorithm, but the learning process is relatively slow and analysis of the trained network is difficult (black box).
Neither is it possible to extract structural
knowledge (rules) from the trained neural
network, nor can we integrate special in-
formation about the problem into the neu-
ral network in order to simplify the learning
procedure.
Fuzzy systems are more favorable in that their behavior can be explained based on fuzzy rules and thus their performance can be adjusted by tuning the rules. But since, in general, knowledge acquisition is difficult and also the universe of discourse of each input variable needs to be divided into several intervals, applications of fuzzy systems are restricted to the fields where expert knowledge is available and the number of input variables is small.
To overcome the problem of knowledge ac-
quisition, neural networks are extended to
automatically extract fuzzy rules from nu-
merical data.
The computational process envisioned for
fuzzy neural systems is as follows. It starts
with the development of a fuzzy neuron
based on the understanding of biological neu-
ronal morphologies, followed by learning
mechanisms. This leads to the following three steps in a fuzzy neural computational process:

• development of fuzzy neural models motivated by biological neurons,

• models of synaptic connections which incorporate fuzziness into neural networks,

• development of learning algorithms (that is, the method of adjusting the synaptic weights).
Two possible models of fuzzy neural systems are:

• In response to linguistic statements, the fuzzy interface block provides an input vector to a multi-layer neural network. The neural network can be adapted (trained) to yield desired command outputs or decisions.
Figure 1: The first model of fuzzy neural system.
• A multi-layered neural network drives the fuzzy inference mechanism.
Figure 2: The second model of fuzzy neural system.
The basic processing elements of neural networks are called artificial neurons, or simply neurons. The signal flow of neuron inputs, x_j, is considered to be unidirectional, as indicated by arrows, as is a neuron's output signal flow.
Consider a simple neural net in Figure 3.
All signals and weights are real numbers.
Figure 3: A simple neural net.
The input neurons do not change the input
signals so their output is the same as their
input.
The signal x_i interacts with the weight w_i to produce the product p_i = w_i x_i, i = 1, . . . , n. The input information p_i is aggregated, by addition, to produce the input

net = p_1 + ... + p_n = w_1 x_1 + ... + w_n x_n

to the neuron. The neuron uses its transfer function f, which could be a sigmoidal function,

f(t) = 1/(1 + exp(−t)),

to compute the output

y = f(net) = f(w_1 x_1 + ... + w_n x_n).
This simple neural net, which employs multiplication, addition, and a sigmoidal f, will be called a regular (or standard) neural net.
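As a quick illustration (a sketch of our own, not from the notes; the function name and the sample weights are assumptions), such a regular neuron can be written in a few lines of Python:

    import math

    # Regular neuron: aggregate the products w_i*x_i by addition,
    # then apply the sigmoidal transfer function f(t) = 1/(1 + exp(-t)).
    def regular_neuron(x, w):
        net = sum(wi * xi for wi, xi in zip(w, x))
        return 1.0 / (1.0 + math.exp(-net))

    # Example with two inputs and illustrative weights: net = 0, so y = 0.5
    print(regular_neuron([1.0, 2.0], [0.5, -0.25]))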
A hybrid neural net is a neural net with crisp signals and weights and a crisp transfer function. However, (i) we can combine x_i and w_i using some other continuous operation; (ii) we can aggregate the p_i's with some other continuous function; (iii) f can be any continuous function from input to output.
We emphasize here that all inputs, outputs and the weights of a hybrid neural net are real numbers taken from the unit interval [0, 1].
A processing element of a hybrid neural net is called a fuzzy neuron.
K. Hirota and W. Pedrycz, OR/AND neuron in modeling fuzzy set connectives, IEEE Transactions on Fuzzy Systems, 2(1994) 151-161.
Figure 4: Regular neural net.
Definition 1 (AND fuzzy neuron). The signals x_i and w_i are combined by the maximum operator to produce

p_i = max{w_i, x_i}, i = 1, 2.

The input information p_i is aggregated by the minimum operator to produce the output

y = min{p_1, p_2} = min{max{w_1, x_1}, max{w_2, x_2}}

of the neuron.
Definition 2 (OR fuzzy neuron). The signals x_i and w_i are combined by the minimum operator

p_i = min{w_i, x_i}, i = 1, 2.

The input information p_i is aggregated by the maximum operator to produce the output

y = max{p_1, p_2} = max{min{w_1, x_1}, min{w_2, x_2}}

of the neuron.
Definition 3 (OR (max-product) fuzzy neuron). The signals x_i and w_i are combined by the product operator

p_i = w_i x_i, i = 1, 2.

The input information p_i is aggregated by the maximum operator to produce the output

y = max{p_1, p_2} = max{w_1 x_1, w_2 x_2}

of the neuron.
Figure 5: Max-product hybrid neural net.
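A minimal sketch of the three fuzzy neurons in Python (our own illustration; inputs and weights are assumed to come from the unit interval):

    # AND fuzzy neuron: p_i = max{w_i, x_i}, aggregated by minimum
    def and_neuron(x, w):
        return min(max(wi, xi) for wi, xi in zip(w, x))

    # OR fuzzy neuron: p_i = min{w_i, x_i}, aggregated by maximum
    def or_neuron(x, w):
        return max(min(wi, xi) for wi, xi in zip(w, x))

    # OR (max-product) fuzzy neuron: p_i = w_i * x_i, aggregated by maximum
    def max_product_neuron(x, w):
        return max(wi * xi for wi, xi in zip(w, x))

    x, w = [0.3, 0.8], [0.5, 0.2]
    print(and_neuron(x, w))          # min{0.5, 0.8} = 0.5
    print(or_neuron(x, w))           # max{0.3, 0.2} = 0.3
    print(max_product_neuron(x, w))  # max{0.15, 0.16} = 0.16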
The AND and OR fuzzy neurons realize pure logic operations on the membership values. The role of the connections is to differentiate between particular levels of impact that the individual inputs might have on the result of aggregation.
It is well known that regular nets are universal approximators, i.e. they can approximate any continuous function on a compact set to arbitrary accuracy. The problem with this result is that it is non-constructive and does not tell us how to build the net.
Hybrid neural nets can be used to imple-
ment fuzzy IF-THEN rules in a constructive
way.
Though hybrid neural nets cannot directly use the standard error backpropagation algorithm for learning, they can be trained by steepest descent methods to learn the parameters of the membership functions representing the linguistic terms in the rules.
Descent methods for minimization
The error correction learning procedure is simple enough in conception. The procedure is as follows: During training an input is put into the network and flows through the network, generating a set of values on the output units.

Then, the actual output is compared with the desired target, and a match is computed. If the output and target match, no change is made to the net. However, if the output differs from the target a change must be made to some of the connections.
Consider a differentiable function f : R → R. A differentiable function is always increasing in the direction of its derivative, and decreasing in the opposite direction. In a descent method for function minimization the next iteration w_{n+1} should satisfy the following property

f(w_{n+1}) < f(w_n),

i.e. the value of f at w_{n+1} is smaller than its previous value at w_n.

In the error correction learning procedure, each iteration of a descent method calculates the downhill direction (opposite of the direction of the derivative) at w_n, which means that for a sufficiently small η > 0 the inequality

f(w_n − η f'(w_n)) < f(w_n)

should hold, and we let w_{n+1} be the vector

w_{n+1} = w_n − η f'(w_n).
Let f : R^n → R be a real-valued function. In a descent method, whatever is the next iteration, w_{n+1}, it should satisfy the property

f(w_{n+1}) < f(w_n),

i.e. the value of f at w_{n+1} is smaller than its value at the previous approximation w_n.

Each iteration of a descent method calculates a downhill direction (opposite of the direction of the gradient) at w_n, which means that for a sufficiently small η > 0 the inequality

f(w_n − η f'(w_n)) < f(w_n)

should hold, and we let w_{n+1} be the vector

w_{n+1} = w_n − η f'(w_n).
Exercise 1. The error function to be minimized is given by

E(w_1, w_2) = 1/2 [(w_2 − w_1)^2 + (1 − w_1)^2].

Find analytically the gradient vector

E'(w) = ( ∂_1 E(w), ∂_2 E(w) )^T.

Find analytically the weight vector w* that minimizes the error function, i.e. such that E'(w*) = 0. Derive the steepest descent method for the minimization of E.
Solution 1. The gradient vector of E is

E'(w) = ( (w_1 − w_2) + (w_1 − 1), (w_2 − w_1) )^T = ( 2w_1 − w_2 − 1, w_2 − w_1 )^T,

and w* = (1, 1)^T is the unique solution to the equation

( 2w_1 − w_2 − 1, w_2 − w_1 )^T = ( 0, 0 )^T.

The steepest descent method for the minimization of E reads

( w_1(t + 1), w_2(t + 1) )^T = ( w_1(t), w_2(t) )^T − η ( 2w_1(t) − w_2(t) − 1, w_2(t) − w_1(t) )^T,

where η > 0 is the learning constant and t indexes the number of iterations. That is,

w_1(t + 1) = w_1(t) − η(2w_1(t) − w_2(t) − 1),
w_2(t + 1) = w_2(t) − η(w_2(t) − w_1(t)).
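The iteration is easy to replay numerically. A small sketch (ours; the learning constant η = 0.1 and the starting point are assumptions):

    eta = 0.1                    # assumed learning constant
    w1, w2 = 0.0, 0.0            # arbitrary starting point
    for t in range(200):
        g1 = 2 * w1 - w2 - 1     # dE/dw1
        g2 = w2 - w1             # dE/dw2
        w1, w2 = w1 - eta * g1, w2 - eta * g2
    print(w1, w2)                # converges towards the minimizer (1, 1)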
Some fuzzy reasoning schemes
Consider a fuzzy rule-based system of the form

ℜ_1: if x is A_1 and y is B_1 then z is C_1
ℜ_2: if x is A_2 and y is B_2 then z is C_2
. . .
ℜ_n: if x is A_n and y is B_n then z is C_n

fact: x = x_0 and y = y_0
consequence: z is C

where A_i and B_i are fuzzy sets, i = 1, . . . , n.
The procedure for obtaining the fuzzy output of such a knowledge base consists of the following three steps:

• Find the firing level of each of the rules.

• Find the output of each of the rules.

• Aggregate the individual rule outputs to obtain the overall system output.
Sugeno and Takagi use the following rules

ℜ_1: if x is A_1 and y is B_1 then z_1 = a_1 x + b_1 y
ℜ_2: if x is A_2 and y is B_2 then z_2 = a_2 x + b_2 y

The firing levels of the rules are computed by

α_1 = A_1(x_0) ∧ B_1(y_0),   α_2 = A_2(x_0) ∧ B_2(y_0),

then the individual rule outputs are derived from the relationships
Figure 6: Sugeno's inference mechanism.
z_1 = a_1 x_0 + b_1 y_0,   z_2 = a_2 x_0 + b_2 y_0,

and the crisp control action is expressed as

o = (α_1 z_1 + α_2 z_2)/(α_1 + α_2) = β_1 z_1 + β_2 z_2,

where β_1 and β_2 are the normalized values of α_1 and α_2 with respect to the sum (α_1 + α_2), i.e.

β_1 = α_1/(α_1 + α_2),   β_2 = α_2/(α_1 + α_2).
Example 1. We illustrate Sugeno's reasoning method by the following simple example

if x is SMALL and y is BIG then o = x − y
if x is BIG and y is SMALL then o = x + y
if x is BIG and y is BIG then o = x + 2y

where the membership functions SMALL and BIG are defined by

SMALL(v) = 1                  if v ≤ 1
           1 − (v − 1)/4      if 1 ≤ v ≤ 5
           0                  otherwise

BIG(u) = 1                    if u ≥ 5
         1 − (5 − u)/4        if 1 ≤ u ≤ 5
         0                    otherwise
Suppose we have the inputs x_0 = 3 and y_0 = 3. What is the output of the system?

The firing level of the first rule is

α_1 = min{SMALL(3), BIG(3)} = min{0.5, 0.5} = 0.5,

the individual output of the first rule is

o_1 = x_0 − y_0 = 3 − 3 = 0.

The firing level of the second rule is

α_2 = min{BIG(3), SMALL(3)} = min{0.5, 0.5} = 0.5,

the individual output of the second rule is

o_2 = x_0 + y_0 = 3 + 3 = 6.

The firing level of the third rule is

α_3 = min{BIG(3), BIG(3)} = min{0.5, 0.5} = 0.5,

the individual output of the third rule is

o_3 = x_0 + 2y_0 = 3 + 6 = 9,

and the system output, o, is computed from the equation

o = (0 × 0.5 + 6 × 0.5 + 9 × 0.5)/1.5 = 5.0.
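The arithmetic of this example can be replayed directly; the following sketch (our own, not part of the notes) reproduces o = 5.0:

    def SMALL(v):
        if v <= 1: return 1.0
        return 1.0 - (v - 1) / 4.0 if v <= 5 else 0.0

    def BIG(u):
        if u >= 5: return 1.0
        return 1.0 - (5 - u) / 4.0 if u >= 1 else 0.0

    x0, y0 = 3.0, 3.0
    # (firing level, individual rule output) for the three rules
    rules = [(min(SMALL(x0), BIG(y0)), x0 - y0),
             (min(BIG(x0), SMALL(y0)), x0 + y0),
             (min(BIG(x0), BIG(y0)), x0 + 2 * y0)]
    o = sum(a * z for a, z in rules) / sum(a for a, _ in rules)
    print(o)   # (0*0.5 + 6*0.5 + 9*0.5)/1.5 = 5.0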
As an example, we show how to construct a hybrid neural net (called an adaptive network by Jang) which is functionally equivalent to Sugeno's inference mechanism.
A hybrid neural net computationally identical to Sugeno-type fuzzy reasoning is shown in Figure 7.
Figure 7: ANFIS architecture for Sugeno's reasoning method.
For simplicity, we have assumed only two
rules, and two linguistic values for each in-
put variable.
• Layer 1: The output of the node is the degree to which the given input satisfies the linguistic label associated to this node. Usually, we choose bell-shaped membership functions

A_i(u) = exp(−1/2 ((u − a_{i1})/b_{i1})^2),

B_i(v) = exp(−1/2 ((v − a_{i2})/b_{i2})^2),

to represent the linguistic terms, where

{a_{i1}, a_{i2}, b_{i1}, b_{i2}}

is the parameter set.

As the values of these parameters change, the bell-shaped functions vary accordingly, thus exhibiting various forms of membership functions on linguistic labels A_i and B_i.

In fact, any continuous membership function, such as trapezoidal or triangular-shaped ones, is also a qualified candidate for the node function in this layer. Parameters in this layer are referred to as premise parameters.
• Layer 2: Each node computes the firing strength of the associated rule. The output of the top neuron is

α_1 = A_1(x_0) × B_1(y_0),

and the output of the bottom neuron is

α_2 = A_2(x_0) × B_2(y_0).

Both nodes in this layer are labeled by T, because we can choose other t-norms for modeling the logical and operator. The nodes of this layer are called rule nodes.
• Layer 3: Every node in this layer is labeled by N to indicate the normalization of the firing levels. The output of the top neuron is the normalized (with respect to the sum of firing levels) firing level of the first rule,

β_1 = α_1/(α_1 + α_2),

and the output of the bottom neuron is the normalized firing level of the second rule,

β_2 = α_2/(α_1 + α_2).
• Layer 4: The output of the top neuron is the product of the normalized firing level and the individual rule output of the first rule,

β_1 z_1 = β_1 (a_1 x_0 + b_1 y_0),

and the output of the bottom neuron is the product of the normalized firing level and the individual rule output of the second rule,

β_2 z_2 = β_2 (a_2 x_0 + b_2 y_0).
• Layer 5: The single node in this layer computes the overall system output as the sum of all incoming signals, i.e.

o = β_1 z_1 + β_2 z_2.
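The five layers can be traced in a few lines of Python. This is only a sketch of the forward pass (the parameter values below are arbitrary assumptions, not taken from the notes):

    import math

    # Layer 1: bell-shaped membership exp(-((u - a)/b)^2 / 2)
    def bell(u, a, b):
        return math.exp(-0.5 * ((u - a) / b) ** 2)

    def anfis_output(x0, y0, prem, cons):
        (a11, b11), (a21, b21), (a12, b12), (a22, b22) = prem
        (c1, d1), (c2, d2) = cons                 # rule i: z_i = c_i*x + d_i*y
        # Layer 2: firing levels with the product t-norm
        alpha1 = bell(x0, a11, b11) * bell(y0, a12, b12)
        alpha2 = bell(x0, a21, b21) * bell(y0, a22, b22)
        # Layer 3: normalization
        beta1 = alpha1 / (alpha1 + alpha2)
        beta2 = alpha2 / (alpha1 + alpha2)
        # Layers 4-5: weighted rule outputs and their sum
        return beta1 * (c1 * x0 + d1 * y0) + beta2 * (c2 * x0 + d2 * y0)

    print(anfis_output(1.0, 2.0,
                       prem=[(0.0, 1.0), (2.0, 1.0), (0.0, 1.0), (2.0, 1.0)],
                       cons=[(1.0, 1.0), (2.0, -1.0)]))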
If a crisp training set

{(x^k, y^k), k = 1, . . . , K}

is given, then the parameters of the hybrid neural net (which determine the shape of the membership functions of the premises) can be learned by descent-type methods.

This architecture and learning procedure is called ANFIS (adaptive-network-based fuzzy inference system) by Jang.
The error function for pattern k can be given by

E_k = 1/2 (y^k − o^k)^2,

where y^k is the desired output and o^k is the computed output by the hybrid neural net.
We have a simplified fuzzy reasoning scheme: if the individual rule outputs are given by crisp numbers, then we can use their weighted sum (where the weights are the firing strengths of the corresponding rules) to obtain the overall system output:

ℜ_1: if x_1 is A_1 and x_2 is B_1 then o = z_1
. . .
ℜ_m: if x_1 is A_m and x_2 is B_m then o = z_m

fact: x_1 = u_1 and x_2 = u_2
cons.: o = z_0

where A_i and B_i are fuzzy sets.
We derive z_0 from the initial content of the data base, {u_1, u_2}, and from the fuzzy rule base by the simplified fuzzy reasoning scheme - as the weighted average of individual rule outputs - by

o = z_0 = (z_1 α_1 + ... + z_m α_m)/(α_1 + ... + α_m),

where the firing level of the i-th rule is computed by

α_i = A_i(u_1) · B_i(u_2).
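In code the whole simplified scheme is a short function (our sketch; the product is used for the and connective, and the triangular memberships in the example are assumptions):

    def simplified_output(rules, u1, u2):
        # rules: list of (A_i, B_i, z_i) with membership functions A_i, B_i
        alphas = [A(u1) * B(u2) for A, B, _ in rules]
        return sum(a * z for a, (_, _, z) in zip(alphas, rules)) / sum(alphas)

    tri = lambda c: (lambda v: max(0.0, 1.0 - abs(v - c)))  # triangle centered at c
    rules = [(tri(0.0), tri(0.0), 1.0), (tri(1.0), tri(1.0), 3.0)]
    print(simplified_output(rules, 0.25, 0.25))             # 1.2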
Tuning fuzzy control parameters by neural nets
Fuzzy inference is applied to various prob-
lems. For the implementation of a fuzzy
controller it is necessary to determine mem-
bership functions representing the linguistic
terms of the linguistic inference rules.
For example, consider the linguistic term 'approximately one'. Obviously, the corresponding fuzzy set should be a unimodal function reaching its maximum at the value one. Neither the shape, which could be triangular or Gaussian, nor the range, i.e. the support of the membership function, is uniquely determined by 'approximately one'.
Generally, a control expert has some idea
about the range of the membership func-
tion, but he would not be able to argue about
small changes of his specified range.
The effectiveness of the fuzzy models representing nonlinear input-output relationships depends on the fuzzy partition of the input space. Therefore, the tuning of membership functions becomes an important issue in fuzzy control. Since this tuning task can be viewed as an optimization problem, neural networks and genetic algorithms offer a possibility to solve it.
A straightforward approach is to assume a
certain shape for the membership functions
which depends on different parameters that
can be learned by a neural network. This
idea was carried out in:
H. Nomura, I. Hayashi and N. Wakami, A learning method of fuzzy inference rules by descent method, in: Proceedings of the IEEE International Conference on Fuzzy Systems, San Diego, 1992, 203-210.
where the membership functions are assumed to be symmetrical triangular functions depending on two parameters, one of them determining where the function reaches its maximum, the other giving the width of the support. Gaussian membership functions were used in
H. Ichihashi, Iterative fuzzy modelling and a hierarchical network, in: R. Lowen and M. Roubens, eds., Proceedings of the Fourth IFSA Congress, Vol. Engineering, Brussels, 1991, 49-52.
Both approaches require a set of training data in the form of correct input-output tuples and a specification of the rules including a preliminary definition of the corresponding membership functions.
We describe now a simple method for learning of membership functions of the antecedent and consequent parts of fuzzy IF-THEN rules.
Suppose the unknown nonlinear mapping to be realized by fuzzy systems can be represented as

y^k = f(x^k) = f(x^k_1, . . . , x^k_n)

for k = 1, . . . , K, i.e. we have the following training set

{(x^1, y^1), . . . , (x^K, y^K)}.
For modeling the unknown mapping f, we employ simplified fuzzy IF-THEN rules of the following type

ℜ_i: if x_1 is A_{i1} and . . . and x_n is A_{in} then o = z_i,

i = 1, . . . , m, where A_{ij} are fuzzy numbers of triangular form and z_i are real numbers.
In this context, the word simplified means that the individual rule outputs are given by crisp numbers, and therefore we can use their weighted sum (where the weights are the firing strengths of the corresponding rules) to obtain the overall system output.

Let o be the output from the fuzzy system corresponding to the input x. Suppose the firing level of the i-th rule, denoted by α_i, is defined by the product operator

α_i = ∏_{j=1}^{n} A_{ij}(x_j),

and the output of the system is computed by

o = (∑_{i=1}^{m} α_i z_i) / (∑_{i=1}^{m} α_i).
We define the measure of error for the k-th training pattern as usual by

E = 1/2 (o − y)^2,

where o is the computed output from the fuzzy system corresponding to the input pattern x, and y is the desired output.
The steepest descent method is used to learn z_i in the consequent part of the fuzzy rule ℜ_i. That is,

z_i(t + 1) = z_i(t) − η ∂E/∂z_i = z_i(t) − η(o − y) α_i/(α_1 + ... + α_m),

for i = 1, . . . , m, where η is the learning constant and t indexes the number of the adjustments of z_i. We illustrate the above tuning process by a simple example.
Consider two fuzzy rules with one input and one output variable

ℜ_1: if x is A_1 then o = z_1,
ℜ_2: if x is A_2 then o = z_2,

where the fuzzy terms A_1 ('small') and A_2 ('big') have sigmoid membership functions defined by

A_1(x) = 1/(1 + exp(b(x − a))),

A_2(x) = 1/(1 + exp(−b(x − a))),

where a and b are the shared parameters of A_1 and A_2.
Figure 8: Symmetrical membership functions.
In this case the equation

A_1(x) + A_2(x) = 1

holds for all x from the domain of A_1 and A_2.
The overall system output is computed by

o = (A_1(x) z_1 + A_2(x) z_2)/(A_1(x) + A_2(x)).
The weight adjustments are defined as follows:

z_1(t + 1) = z_1(t) − η ∂E/∂z_1 = z_1(t) − η(o − y) A_1(x),

z_2(t + 1) = z_2(t) − η ∂E/∂z_2 = z_2(t) − η(o − y) A_2(x),
a(t + 1) = a(t) − η ∂E(a, b)/∂a,

b(t + 1) = b(t) − η ∂E(a, b)/∂b,
where

∂E(a, b)/∂a = (o − y) ∂o/∂a
= (o − y) ∂/∂a [z_1 A_1(x) + z_2 A_2(x)]
= (o − y) ∂/∂a [z_1 A_1(x) + z_2 (1 − A_1(x))]
= (o − y)(z_1 − z_2) ∂A_1(x)/∂a
= (o − y)(z_1 − z_2) b A_1(x)(1 − A_1(x))
= (o − y)(z_1 − z_2) b A_1(x) A_2(x),
and

∂E(a, b)/∂b = (o − y)(z_1 − z_2) ∂A_1(x)/∂b
= −(o − y)(z_1 − z_2)(x − a) A_1(x) A_2(x).
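One full update step of this tuning rule can be written out as follows (a sketch under our own assumptions; the learning constant and the data point are illustrative):

    import math

    def A1(x, a, b):                     # "small"; A2(x) = 1 - A1(x) is "big"
        return 1.0 / (1.0 + math.exp(b * (x - a)))

    def tune_step(x, y, a, b, z1, z2, eta=0.1):
        m1 = A1(x, a, b)
        m2 = 1.0 - m1                    # A1(x) + A2(x) = 1
        o = m1 * z1 + m2 * z2            # overall system output
        err = o - y
        dz1 = err * m1                             # dE/dz1
        dz2 = err * m2                             # dE/dz2
        da = err * (z1 - z2) * b * m1 * m2         # dE/da
        db = -err * (z1 - z2) * (x - a) * m1 * m2  # dE/db
        return a - eta * da, b - eta * db, z1 - eta * dz1, z2 - eta * dz2

    print(tune_step(x=0.3, y=1.0, a=0.5, b=4.0, z1=0.0, z2=2.0))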
In
J.-S. Roger Jang, ANFIS: Adaptive-network-
based fuzzy inference system, IEEE Trans.
Syst., Man, and Cybernetics, 23(1993)
665-685.
Jang showed that fuzzy inference systems with simplified fuzzy IF-THEN rules are universal approximators, i.e. they can approximate any continuous function on a compact set to arbitrary accuracy.
This means that the more fuzzy terms (and consequently more rules) are used in the rule base, the closer the output of the fuzzy system is to the desired values of the function to be approximated.
Exercise 2. Suppose the unknown mapping to be realized by fuzzy systems can be represented as

y = f(x_1, x_2)

and we have the following two input/output training pairs

{(1, 1; 1), (2, 2; 2)}

(i.e. if the input vector is (1, 1) then the desired output is equal to 1, and if the input vector is (2, 2) then the desired output is equal to 2).
For modeling the unknown mapping f, we employ four fuzzy IF-THEN rules

if x_1 is small and x_2 is small then o = ax_1 − bx_2
if x_1 is small and x_2 is big then o = ax_1 + bx_2
if x_1 is big and x_2 is small then o = bx_1 + ax_2
if x_1 is big and x_2 is big then o = bx_1 − ax_2

where the membership functions of the fuzzy numbers small and big are given by

small(v) = 1 − v/2        if 0 ≤ v ≤ 2
           0              otherwise

big(v) = 1 − (2 − v)/2    if 0 ≤ v ≤ 2
         0                otherwise

and a and b are the unknown parameters.
The overall system output is computed by Sugeno's reasoning mechanism. Construct the error functions E_1(a, b), E_2(a, b) for the first and second training pairs!
Solution 2. Let (1, 1) be the input to the fuzzy system. The firing levels of the rules are computed by

α_1 = small(1) ∧ small(1) = 0.5,
α_2 = small(1) ∧ big(1) = 0.5,
α_3 = big(1) ∧ small(1) = 0.5,
α_4 = big(1) ∧ big(1) = 0.5,

and the output of the system is computed by

o_1 = (a + b)/2.

We define the measure of error for the first training pattern as

E_1(a, b) = 1/2 ((a + b)/2 − 1)^2,
and in the case of the second training pattern we get

α_1 = small(2) ∧ small(2) = 0,
α_2 = small(2) ∧ big(2) = 0,
α_3 = big(2) ∧ small(2) = 0,
α_4 = big(2) ∧ big(2) = 1,

and the output of the system is computed by

o_2 = 2b − 2a.

The measure of error for the second training pattern is

E_2(a, b) = 1/2 (2b − 2a − 2)^2.
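These error functions are easy to inspect numerically (our sketch); for instance, a = 0.5, b = 1.5 makes both of them vanish:

    def E1(a, b):
        return 0.5 * ((a + b) / 2.0 - 1.0) ** 2

    def E2(a, b):
        return 0.5 * (2.0 * b - 2.0 * a - 2.0) ** 2

    print(E1(0.5, 1.5), E2(0.5, 1.5))   # 0.0 0.0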
Exercise 3. Suppose the unknown nonlinear mapping to be realized by fuzzy systems can be represented as

y^k = f(x^k) = f(x^k_1, . . . , x^k_n)

for k = 1, . . . , K, i.e. we have the following training set

{(x^1, y^1), . . . , (x^K, y^K)}.
For modeling the unknown mapping f, we employ three simplified fuzzy IF-THEN rules of the following type

if x is small then o = z_1
if x is medium then o = z_2
if x is big then o = z_3
where the linguistic terms A_1 = 'small', A_2 = 'medium' and A_3 = 'big' are of triangular form with membership functions (see Figure 9)

A_1(v) = 1                        if v ≤ c_1
         (c_2 − v)/(c_2 − c_1)    if c_1 ≤ v ≤ c_2
         0                        otherwise

A_2(u) = (u − c_1)/(c_2 − c_1)    if c_1 ≤ u ≤ c_2
         (c_3 − u)/(c_3 − c_2)    if c_2 ≤ u ≤ c_3
         0                        otherwise
Figure 9: Initial fuzzy partition with three linguistic terms.
A_3(u) = 1                        if u ≥ c_3
         (u − c_2)/(c_3 − c_2)    if c_2 ≤ u ≤ c_3
         0                        otherwise
Derive the steepest descent method for tuning the premise parameters {c_1, c_2, c_3} and the consequent parameters {z_1, z_2, z_3}.
Solution 3. Let x be the input to the fuzzy system. The firing levels of the rules are computed by

α_1 = A_1(x),   α_2 = A_2(x),   α_3 = A_3(x),
and the output of the system is computed by

o = (α_1 z_1 + α_2 z_2 + α_3 z_3)/(α_1 + α_2 + α_3)
= (A_1(x) z_1 + A_2(x) z_2 + A_3(x) z_3)/(A_1(x) + A_2(x) + A_3(x))
= A_1(x) z_1 + A_2(x) z_2 + A_3(x) z_3,

where we have used the identity

A_1(x) + A_2(x) + A_3(x) = 1

for all x ∈ [0, 1].
We define the measure of error for the k-th training pattern as usual by

E_k = E_k(c_1, c_2, c_3, z_1, z_2, z_3) = 1/2 (o^k(c_1, c_2, c_3, z_1, z_2, z_3) − y^k)^2,

where o^k is the computed output from the fuzzy system corresponding to the input pattern x^k and y^k is the desired output, k = 1, . . . , K.
The steepest descent method is used to learn z_i in the consequent part of the i-th fuzzy rule. That is,

z_1(t + 1) = z_1(t) − η ∂E_k/∂z_1 = z_1(t) − η(o^k − y^k) A_1(x^k),
z_2(t + 1) = z_2(t) − η ∂E_k/∂z_2 = z_2(t) − η(o^k − y^k) A_2(x^k),

z_3(t + 1) = z_3(t) − η ∂E_k/∂z_3 = z_3(t) − η(o^k − y^k) A_3(x^k),

where x^k is the input to the system, η > 0 is the learning constant and t indexes the number of the adjustments of z_i.
In a similar manner we can tune the centers of A_1, A_2 and A_3:

c_1(t + 1) = c_1(t) − η ∂E_k/∂c_1,

c_2(t + 1) = c_2(t) − η ∂E_k/∂c_2,

c_3(t + 1) = c_3(t) − η ∂E_k/∂c_3,
where η > 0 is the learning constant and t indexes the number of the adjustments of the parameters.

The partial derivative of the error function E_k with respect to c_1 can be written as

∂E_k/∂c_1 = (o^k − y^k) ∂o^k/∂c_1 = (o^k − y^k) (c_2 − x^k)/(c_2 − c_1)^2 (z_1 − z_2)

if c_1 ≤ x^k ≤ c_2, and zero otherwise.
It should be noted that the adjustments of a center cannot be done independently of the other centers, because the inequality

0 ≤ c_1(t + 1) < c_2(t + 1) < c_3(t + 1) ≤ 1

must hold for all t.
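A single tuning step for this rule base might look as follows (our sketch; only the update of c_1 is written out, and c_2, c_3 are handled analogously):

    def memberships(x, c1, c2, c3):
        # triangular partition: A1 (small), A2 (medium), A3 (big)
        if x <= c1: return 1.0, 0.0, 0.0
        if x <= c2:
            m = (x - c1) / (c2 - c1)
            return 1.0 - m, m, 0.0
        if x <= c3:
            m = (x - c2) / (c3 - c2)
            return 0.0, 1.0 - m, m
        return 0.0, 0.0, 1.0

    def tune_step(x, y, c, z, eta=0.05):
        c1, c2, c3 = c
        a1, a2, a3 = memberships(x, c1, c2, c3)
        o = a1 * z[0] + a2 * z[1] + a3 * z[2]
        err = o - y
        # premise gradient for c1, nonzero only when c1 <= x <= c2
        dc1 = err * (c2 - x) / (c2 - c1) ** 2 * (z[0] - z[1]) if c1 <= x <= c2 else 0.0
        # consequent updates: z_i(t+1) = z_i(t) - eta*(o - y)*A_i(x)
        z = [zi - eta * err * ai for zi, ai in zip(z, (a1, a2, a3))]
        return [c1 - eta * dc1, c2, c3], z

    print(tune_step(0.3, 1.0, c=[0.0, 0.5, 1.0], z=[0.0, 1.0, 2.0]))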
Neuro-fuzzy classifiers
Conventional approaches to pattern classification involve clustering training samples and associating clusters to given categories. The complexity and limitations of previous mechanisms are largely due to the lack of an effective way of defining the boundaries among clusters. This problem becomes more intractable when the number of features used for classification increases.
On the contrary, fuzzy classification assumes the boundary between two neighboring classes as a continuous, overlapping area within which an object has partial membership in each class. This viewpoint not only reflects the reality of many applications in which categories have fuzzy boundaries, but also provides a simple representation of the potentially complex partition of the feature space.
In brief, we use fuzzy IF-THEN rules to describe a classifier. Assume that K patterns x_p = (x_{p1}, . . . , x_{pn}), p = 1, . . . , K are given from two classes, where x_p is an n-dimensional crisp vector. Typical fuzzy classification rules for n = 2 are like

If x_{p1} is small and x_{p2} is very large
then x_p = (x_{p1}, x_{p2}) belongs to Class C_1

If x_{p1} is large and x_{p2} is very small
then x_p = (x_{p1}, x_{p2}) belongs to Class C_2

where x_{p1} and x_{p2} are the features of pattern (or object) p, and small and very large are linguistic terms characterized by appropriate membership functions.
The firing level of a rule

ℜ_i: If x_{p1} is A_i and x_{p2} is B_i
then x_p = (x_{p1}, x_{p2}) belongs to Class C_i

with respect to a given object x_p is interpreted as the degree of belongingness of x_p to C_i. This firing level, denoted by α_i, is usually determined as

α_i = A_i(x_{p1}) ∧ B_i(x_{p2}),

where ∧ is a triangular norm modeling the logical connective and.

As such, a fuzzy rule gives a meaningful expression of the qualitative aspects of human recognition. Based on the result of pattern matching between rule antecedents and input signals, a number of fuzzy rules are triggered in parallel with various values of firing strength.
Individually invoked actions are considered together with a combination logic. Furthermore, we want the system to have the learning ability of updating and fine-tuning itself based on newly incoming information.

The task of fuzzy classification is to generate an appropriate fuzzy partition of the feature space. In this context the word appropriate means that the number of misclassified patterns is very small or zero. Then the rule base should be optimized by deleting rules which are not used.
Consider a two-class classification problem shown in Fig. 10. Suppose that the fuzzy partition for each input feature consists of three linguistic terms

{small, medium, big}

which are represented by triangular membership functions.
Figure 10: Initial fuzzy partition with 9 fuzzy subspaces and 2 misclassified patterns. Closed and open circles represent the given patterns from Class 1 and Class 2, respectively.
Both initial fuzzy partitions in Fig. 10 satisfy 0.5-completeness for each input variable, and a pattern x_p is classified into Class j if there exists at least one rule for Class j in the rule base whose firing strength (defined by the minimum t-norm) with respect to x_p is greater than or equal to 0.5.

So a rule is created by finding, for a given input pattern x_p, the combination of fuzzy sets where each yields the highest degree of membership for the respective input feature. If this combination is not identical to the antecedents of an already existing rule then a new rule is created.
However, it can occur that if the fuzzy partition is not set up correctly, or if the number of linguistic terms for the input features is not large enough, then some patterns will be misclassified.
The following 9 rules can be generated from the initial fuzzy partitions shown in Fig. 10:

ℜ_1: If x_1 is small and x_2 is big then x_p = (x_1, x_2) belongs to Class C_1
ℜ_2: If x_1 is small and x_2 is medium then x_p = (x_1, x_2) belongs to Class C_1
ℜ_3: If x_1 is small and x_2 is small then x_p = (x_1, x_2) belongs to Class C_1
ℜ_4: If x_1 is big and x_2 is small then x_p = (x_1, x_2) belongs to Class C_1
ℜ_5: If x_1 is big and x_2 is big then x_p = (x_1, x_2) belongs to Class C_1
ℜ_6: If x_1 is medium and x_2 is small then x_p = (x_1, x_2) belongs to Class C_2
ℜ_7: If x_1 is medium and x_2 is medium then x_p = (x_1, x_2) belongs to Class C_2
ℜ_8: If x_1 is medium and x_2 is big then x_p = (x_1, x_2) belongs to Class C_2
ℜ_9: If x_1 is big and x_2 is medium then x_p = (x_1, x_2) belongs to Class C_2
where we have used the linguistic terms small for A_1 and B_1, medium for A_2 and B_2, and big for A_3 and B_3.
However, the same rate of error can be reached by noticing that if x_1 is medium then the pattern (x_1, x_2) belongs to Class 2, independently of the value of x_2, i.e. the following 7 rules provide the same classification result:

ℜ_1: If x_1 is small and x_2 is big then x_p belongs to Class C_1
ℜ_2: If x_1 is small and x_2 is medium then x_p belongs to Class C_1
ℜ_3: If x_1 is small and x_2 is small then x_p belongs to Class C_1
ℜ_4: If x_1 is big and x_2 is small then x_p belongs to Class C_1
ℜ_5: If x_1 is big and x_2 is big then x_p belongs to Class C_1
ℜ_6: If x_1 is medium then x_p belongs to Class C_2
ℜ_7: If x_1 is big and x_2 is medium then x_p belongs to Class C_2
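The seven rules can be traced in code; a sketch (our own, with triangular memberships on the unit interval chosen as assumptions so that the partition is 0.5-complete):

    def small(x):  return max(0.0, 1.0 - 2.0 * x)             # peak at 0
    def medium(x): return max(0.0, 1.0 - 2.0 * abs(x - 0.5))  # peak at 0.5
    def big(x):    return max(0.0, 2.0 * x - 1.0)             # peak at 1

    def classify(x1, x2):
        # (firing strength by the minimum t-norm, class label)
        rules = [(min(small(x1), big(x2)), 1),
                 (min(small(x1), medium(x2)), 1),
                 (min(small(x1), small(x2)), 1),
                 (min(big(x1), small(x2)), 1),
                 (min(big(x1), big(x2)), 1),
                 (medium(x1), 2),
                 (min(big(x1), medium(x2)), 2)]
        return max(rules)[1]     # class of the strongest rule

    print(classify(0.1, 0.9))    # Class 1
    print(classify(0.5, 0.2))    # Class 2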
Fig. 11 is an example of fuzzy partitions (3 linguistic terms for the first input feature and 5 for the second) which classify the patterns correctly.

Figure 11: Appropriate fuzzy partition with 15 fuzzy subspaces.
Sun and Jang in

C.-T. Sun and J.-S. Jang, A neuro-fuzzy classifier and its applications, in: Proc. IEEE Int. Conference on Neural Networks, San Francisco, 1993, 94-98.

propose an adaptive-network-based fuzzy classifier to solve fuzzy classification problems.

Figure 12: An adaptive-network-based fuzzy classifier.
Fig. 12 demonstrates this classifier architecture with two input variables x_1 and x_2. The training data are categorized by two classes C_1 and C_2. Each input is represented by two linguistic terms, thus we have four rules.
• Layer 1: The output of the node is the degree to which the given input satisfies the linguistic label associated to this node. Usually, we choose bell-shaped membership functions

A_i(u) = exp(−1/2 ((u − a_{i1})/b_{i1})^2),

B_i(v) = exp(−1/2 ((v − a_{i2})/b_{i2})^2),

to represent the linguistic terms, where

{a_{i1}, a_{i2}, b_{i1}, b_{i2}}

is the parameter set. As the values of these parameters change, the bell-shaped functions vary accordingly, thus exhibiting various forms of membership functions on linguistic labels A_i and B_i.
• Layer 2: Each node generates a signal corresponding to the conjunctive combination of individual degrees of match. The output signal is the firing strength of a fuzzy rule with respect to an object to be categorized.

In most pattern classification and query-retrieval systems, the conjunction operator plays an important role and its interpretation is context-dependent. Since there does not exist a single operator that is suitable for all applications, we can use parametrized t-norms to cope with this dynamic property of classifier design.

All nodes in this layer are labeled by T, because we can choose any t-norm for modeling the logical and operator. The nodes of this layer are called rule nodes. Features can be combined in a compensatory way. For instance, we can use the generalized p-mean proposed by Dyckhoff and Pedrycz:
((x^p + y^p)/2)^{1/p},   p ≥ 1

(a short numeric sketch of this operator is given after the list).
• We take the linear combination of the firing strengths of the rules at Layer 3 and apply a sigmoidal function at Layer 4 to calculate the degree of belonging to a certain class.
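The generalized p-mean mentioned in Layer 2 is easy to experiment with; a small sketch (ours):

    def p_mean(x, y, p):
        # generalized p-mean of two membership degrees, p >= 1
        return ((x ** p + y ** p) / 2.0) ** (1.0 / p)

    print(p_mean(0.2, 0.8, 1))    # 0.5, the arithmetic mean
    print(p_mean(0.2, 0.8, 10))   # about 0.75, approaching max{0.2, 0.8}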
If we are given the training set

{(x^k, y^k), k = 1, . . . , K},

where x^k refers to the k-th input pattern and

y^k = (1, 0)^T    if x^k belongs to Class 1
      (0, 1)^T    if x^k belongs to Class 2,

then the parameters of the hybrid neural net (which determine the shape of the membership functions of the premises) can be learned by descent-type methods.
The error function for pattern k can be defined by

E_k = 1/2 ((o^k_1 − y^k_1)^2 + (o^k_2 − y^k_2)^2),

where y^k is the desired output and o^k is the computed output by the hybrid neural net.
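A minimal sketch of the output layers and this error function (ours; the firing strengths and weights below are arbitrary assumptions, not from the notes):

    import math

    def class_scores(alphas, weights):
        # Layers 3-4: linear combination of firing strengths, then a sigmoid
        return [1.0 / (1.0 + math.exp(-sum(w * a for w, a in zip(ws, alphas))))
                for ws in weights]

    def pattern_error(o, y):
        # E_k = 1/2 * ((o1 - y1)^2 + (o2 - y2)^2)
        return 0.5 * ((o[0] - y[0]) ** 2 + (o[1] - y[1]) ** 2)

    alphas = [0.7, 0.1, 0.2, 0.0]        # assumed firing strengths of 4 rules
    weights = [[2.0, 1.0, -1.0, -2.0],   # assumed weights for Class 1
               [-2.0, -1.0, 1.0, 2.0]]   # assumed weights for Class 2
    o = class_scores(alphas, weights)
    print(pattern_error(o, (1.0, 0.0)))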
Literature
1. Robert Fullér, Introduction to Neuro-Fuzzy Systems, Advances in Soft Computing Series, Springer-Verlag, Berlin, 1999. [ISBN 3-7908-1256-0]

2. Christer Carlsson and Robert Fullér, Fuzzy Reasoning in Decision Making and Optimization, Springer-Verlag, Berlin/Heidelberg, 2001.

3. Robert Fullér, Neural Fuzzy Systems, Åbo Akademis tryckeri, Åbo, ESF Series A:443, 1995.