Fuzzy Neural Logic Network and Its Learning Algorithms

Abstract

This paper introduces the basic features of the fuzzy neural logic network. Each fuzzy neural logic network model is trained from a set of knowledge in the form of examples using one of the three learning algorithms introduced. These three learning algorithms are the delta rule controlled learning algorithm and two mathematical construction algorithms, namely, the local learning method and the global learning method. Once the fuzzy neural logic network model is constructed, it is ready to accept any unknown input from the user. With a low percentage of mismatched features, an output solution can be obtained.

Introduction

A fuzzy neural logic network model is made up of three columns or layers. They are the input layer, the hidden layer and the output layer. Each layer is made up of a set of nodes. The nodes in the input layer are defined as input nodes, the nodes in the hidden layer as hidden nodes and the nodes in the output layer as output nodes. There is a directed arc from every node in the input layer to every node in the hidden layer, and from every node in the hidden layer to every node in the output layer. Figure 1 shows a fuzzy neural logic network model with m input nodes I, p hidden nodes H and n output nodes O.
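As a concrete point of reference for the rest of the paper, the sketch below models this three-layer structure in C, the language the authors mention using for their implementation. The type and function names (Pair, FnlNet, fnl_create) and the flat-array layout are our own choices for illustration, not the authors' code.

/* A minimal sketch of the three-column network structure described above.
   All names and the representation are illustrative assumptions. */
#include <stdlib.h>

typedef struct { double a, b; } Pair;   /* an activation (a,b) or a weight (alpha,beta) */

typedef struct {
    int m, p, n;        /* number of input, hidden and output nodes      */
    Pair *input;        /* activations of the m input nodes              */
    Pair *hidden;       /* activations of the p hidden nodes             */
    Pair *output;       /* activations of the n output nodes             */
    Pair *w_ih;         /* m*p weights on the arcs input -> hidden       */
    Pair *w_ho;         /* p*n weights on the arcs hidden -> output      */
} FnlNet;

FnlNet *fnl_create(int m, int p, int n)
{
    FnlNet *net = malloc(sizeof *net);
    net->m = m; net->p = p; net->n = n;
    net->input  = calloc(m, sizeof(Pair));
    net->hidden = calloc(p, sizeof(Pair));
    net->output = calloc(n, sizeof(Pair));
    net->w_ih   = calloc((size_t)m * p, sizeof(Pair));
    net->w_ho   = calloc((size_t)p * n, sizeof(Pair));
    return net;
}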
Fuzzy Activation

Each input, hidden and output node represents a condition. The activation of a node is denoted by an ordered pair of real, non-negative numbers (a,b). These numbers are normalized in such a way that the following constraints are satisfied:

0 ≤ a ≤ 1
0 ≤ b ≤ 1
0 ≤ a + b ≤ 1

The interpretation of these numbers is as follows [4][5][6][7][11]:

1. The quantity "a" denotes the amount of evidence for the condition represented by the node,
2. The quantity "b" denotes the amount of evidence against the condition represented by the node,
3. The quantity c = 1 - a - b expresses the lack of evidence regarding the condition.

Each input node I corresponds to a condition. If an input attribute can take only two possible values, it is represented by one input node. On the other hand, if an input attribute is classified into more than two categories, say n categories, then it is represented by n input nodes where each input node represents the amount of evidence for and against a category.

The activations of these input nodes are assigned by the user. For example, if 100 customers were asked about the price of a product, 60 customers feel that the price is high, 20 customers think that the price is low and 20 customers do not give any comments, then the activation of the input node representing the condition 'price of product is high' is assigned a value of (0.6,0.2).

Each hidden node H is either in an excitatory state, having an activation of (1,0), or in an inhibitory state, i.e., with an activation of (0,1) or (0,0).

Each output node O represents a possible recommendation action. The activation value (a,b) of the output node represents the eagerness with which an action is recommended to be taken (indicated by the quantity "a") or not to be taken (indicated by the quantity "b"). Since (1-a-b) represents the uncertainty, a large (1-a-b) means that the system does not know whether to recommend for or against the action.

There is a connecting arrow from every node in the input layer to every node in the hidden layer and from every node in the hidden layer to every node in the output layer. Each connecting arrow is assigned an ordered pair of real numbers (α,β) as its weight. Unlike the activations, these weights are not normalized and negative values are allowed.
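The customer-survey example above can be reproduced mechanically by dividing the counts by the number of respondents and checking the normalization constraints. A minimal sketch, with function names of our own choosing:

/* Turn survey counts into a fuzzy activation (a,b) and verify
   0 <= a, 0 <= b, a + b <= 1.  Illustrative only. */
#include <stdio.h>

typedef struct { double a, b; } Pair;

static Pair activation_from_counts(int for_cnt, int against_cnt, int total)
{
    Pair act = { (double)for_cnt / total, (double)against_cnt / total };
    return act;
}

static int is_valid_activation(Pair p)
{
    return p.a >= 0.0 && p.b >= 0.0 && p.a + p.b <= 1.0;
}

int main(void)
{
    /* 60 of 100 customers say the price is high, 20 say low, 20 give no comment */
    Pair price_high = activation_from_counts(60, 20, 100);
    printf("(%.2f,%.2f) valid=%d\n", price_high.a, price_high.b,
           is_valid_activation(price_high));   /* prints (0.60,0.20) valid=1 */
    return 0;
}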
Propagation Rule

The network is activated by assigning a value (a,b) to each of the input nodes, where a and b represent the amount of evidence for and against the condition represented by the input node respectively. These activation values then propagate along the network using the rule of propagation defined as follows.

Suppose there are m incoming arrows connecting m nodes Pi, whose activations are (ai,bi), to a node R, where i = 1,2,...,m. Let the connecting weight between each node Pi and R be (αi,βi), where αi and βi are the weights associated with the evidence for (indicated by the quantity "ai") and against (indicated by the quantity "bi") the condition represented by node Pi respectively. The following figure illustrates this portion of the network.

For each value of i, take the inner product of (ai,bi) and (αi,βi) to form the ordered pair (t,f) = (ai·αi, bi·βi).

Make the components of the ordered pair positive:
If both t and f are positive, set (ti,fi) = (t,f).
If both t and f are negative, set (ti,fi) = (-f,-t).
If only t is negative, set (ti,fi) = (0, f-t).
If only f is negative, set (ti,fi) = (t-f, 0).

The net excitatory input at node R is given by T = t1 + t2 + ... + tm.

The net inhibitory input at node R is given by F = f1 + f2 + ... + fm.

Perform a strong threshold calculation. The given threshold for node R is denoted by θr. The strong threshold calculation is:
If T - F ≥ θr, set pr = 1 and qr = 0.
If T - F ≤ -θr, set pr = 0 and qr = 1.
If |T - F| < θr, set pr = 0 and qr = 0.

The activation of node R is given by (pr,qr). The threshold θr is set to 1 in our approach.

The same propagation process is repeated until no further change of value is needed. The neural network has then reached a "stable" state. At this point, the values of the output nodes are the output values of the given input values.
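The propagation steps above translate directly into a routine that computes the activation of a single node R from its incoming activations and weights. The sketch below follows those steps with θr = 1; the type and function names are illustrative assumptions, not from the paper.

/* A sketch of the propagation rule for one node R with m incoming arcs. */
#include <stdio.h>

typedef struct { double a, b; } Pair;

static Pair propagate(const Pair *act, const Pair *w, int m, double theta)
{
    double T = 0.0, F = 0.0;
    for (int i = 0; i < m; i++) {
        double t = act[i].a * w[i].a;          /* a_i * alpha_i */
        double f = act[i].b * w[i].b;          /* b_i * beta_i  */
        double ti, fi;
        if (t >= 0.0 && f >= 0.0)     { ti = t;     fi = f;     }
        else if (t < 0.0 && f < 0.0)  { ti = -f;    fi = -t;    }
        else if (t < 0.0)             { ti = 0.0;   fi = f - t; }
        else                          { ti = t - f; fi = 0.0;   }
        T += ti;                               /* net excitatory input */
        F += fi;                               /* net inhibitory input */
    }
    /* strong threshold calculation; the paper sets theta_r to 1 */
    Pair r = { 0.0, 0.0 };
    if (T - F >= theta)       { r.a = 1.0; r.b = 0.0; }
    else if (T - F <= -theta) { r.a = 0.0; r.b = 1.0; }
    /* otherwise |T - F| < theta and the activation stays (0,0) */
    return r;
}

int main(void)
{
    Pair act[2] = { {0.6, 0.2}, {1.0, 0.0} };  /* example activations */
    Pair w[2]   = { {1.0, 1.0}, {0.5, 0.5} };  /* example weights     */
    Pair r = propagate(act, w, 2, 1.0);
    printf("(%g,%g)\n", r.a, r.b);
    return 0;
}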
Learning

The network structure consists of three columns of nodes, with a directed arc drawn from every node in the first column to every node in the second column, and from every node in the second column to every node in the last column.

Once the network structure is constructed, it is trained using one of the following three techniques:

1. Iterative learning using delta rule
2. Local learning by construction
3. Global learning by construction

In the iterative learning method, weights are allowed to vary while the system is fed with examples. In the construction methods (local and global learning), weights are assigned based on the given set of training examples. However, weights that are assigned using the construction methods are so precise that any deviation in inputs cannot be tolerated. Hence, a refinement algorithm is introduced to solve this problem.

It is assumed that the evidence for and against a condition are weighted equally in these three learning algorithms. In other words, for all the weights (α,β) that connect nodes in the first layer to nodes in the second layer, α = β.
Iterative Learning using Delta Rule

Like any other conventional neural network model, a fuzzy neural logic network model can be trained using an iterative learning algorithm. Being an enhanced perceptron, it uses a controlled learning algorithm modified from the delta rule [11].

During the process of delta rule controlled learning, the network is repeatedly presented with a set of initial stimulus patterns or input vectors, together with a corresponding set of outputs. After the learning process, each hidden node becomes sensitive to one initial stimulus pattern or input vector, and the rest of the hidden nodes are representatives for other stimulus patterns or input vectors.

Let R = {(1,0), (0,1), (0,0)} and S = {(1,0), (0,1)}. Let I1, I2, ..., Ik be k (distinct) members of R^m such that Ii (i = 1,2,...,k) refers to the input vector of the ith training example and m refers to the number of terms in each input vector. Similarly, let O1, O2, ..., Ok be k (not necessarily distinct) members of S^n such that Oi (i = 1,2,...,k) corresponds to the output vector of the ith training example and n refers to the number of terms in each output vector.

Take one node, say the jth node of column 1, and take any node, say the ith node of column 2. Attach the edge joining the jth node of column 1 to the ith node of column 2 by the ordered pair (α,β), where α and β are real numbers. Let (a,b) be the jth term of the input vector Ii; if (a,b) = (0,0), assign (α,β) = (0,0).

Let (α,β) be the weight associated with the edge joining the jth node of column 1 to the ith node of column 2. Let (a,b) be the jth value of Ii, and let (c,d) be the propagated value of the ith node of column 2. Modify the values (α,β) according to the following cases:

CASE 1: If i = r,
α = α + η [1 - (c - d)] (a - b)
β = β + η [1 - (c - d)] (a - b)

CASE 2: If i ≠ r,
α = α + η min(0, -(c - d)) (a - b)
β = β + η min(0, -(c - d)) (a - b)

where η is a small positive constant that determines the learning rate (0.01 < η < 0.1).

For each fixed r, modify the weights in this way for each pair of nodes j = 1,2,...,m; i = 1,2,...,k.
Local Learning by Construction

Learning in this method is performed by means of "learning without teacher". By weighing the input attributes of each training example equally, the learning of each initial stimulus pattern is independent of the rest of the stimulus patterns. In other words, each hidden node identifies a stimulus pattern by weighing each of the features to be of equal importance.

During the process of learning, the network is presented with a set of initial input and output patterns. After the self-learning process, each hidden node becomes very sensitive to one initial stimulus pattern. Each hidden node has the ability to recognize a pattern and has become selectively responsive only to that stimulus pattern. In fact, each initial stimulus pattern elicits one output through a representative hidden node.

The following are the steps for this local learning by construction algorithm:

Step 1: Construction of Network Structure. Construct the fuzzy neural logic network structure using the training examples.

Step 2: Weights Assignment to Edges Joining Nodes in Columns 1 & 2. Let R = {(1,0), (0,1), (0,0)} and S = {(1,0), (0,1)}. Let I1, I2, ..., Ik be k (distinct) members of R^m such that Ii (i = 1,2,...,k) refers to the input vector of the ith training example and m refers to the number of terms in each input vector. Similarly, let O1, O2, ..., Ok be k (not necessarily distinct) members of S^n such that Oi (i = 1,2,...,k) corresponds to the output vector of the ith training example and n refers to the number of terms in each output vector.

Take one node, say the jth node of column 1, and take any node, say the ith node of column 2. Attach the edge joining the jth node of column 1 to the ith node of column 2 by the ordered pair (α,β), where α and β are real numbers.

where c has the same meaning as in Case 1.

CASE 3: If (a,b) = (0,0), assign (α,β) = (0,0).

This operation will be performed on every pair of nodes between column 1 and column 2.

Step 3: Weights Assignment to Edges Joining Nodes in Columns 2 & 3. Take one node, say the ith node of column 2, and one node, say the jth node of column 3. Attach the edge joining these two nodes by the ordered pair (α',β'), where α' and β' are real numbers obtained as follows:

Let (a',b') be the jth term of the output vector Oi.

CASE 1: If (a',b') = (1,0), assign (α',β') = (1,0).

CASE 2: If (a',b') = (0,1), assign (α',β') = (-1,0).

Do this to every pair of nodes between column 2 and column 3.
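Step 3 amounts to a simple case lookup over the terms of the output vectors. A sketch, with illustrative names, assuming (as in the text) that the ith node of column 2 corresponds to the output vector Oi:

/* Assign the hidden-to-output weights from the output vectors O_i.
   O[i*n + j] is the jth term of O_i; w_ho[i*n + j] joins hidden node i
   to output node j.  Names are illustrative. */
typedef struct { double a, b; } Pair;

static Pair output_edge_weight(Pair o_ij)      /* jth term of output vector O_i */
{
    Pair w = { 0.0, 0.0 };
    if (o_ij.a == 1.0 && o_ij.b == 0.0) { w.a =  1.0; w.b = 0.0; }  /* (1,0) -> ( 1,0) */
    if (o_ij.a == 0.0 && o_ij.b == 1.0) { w.a = -1.0; w.b = 0.0; }  /* (0,1) -> (-1,0) */
    return w;
}

void assign_hidden_to_output(Pair *w_ho, const Pair *O, int k, int n)
{
    /* apply the two cases to every pair of nodes between columns 2 and 3 */
    for (int i = 0; i < k; i++)
        for (int j = 0; j < n; j++)
            w_ho[i*n + j] = output_edge_weight(O[i*n + j]);
}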
Step 4: Modification of Weights to Achieve Desired Error Tolerance. Unlike the Hopfield network, a fuzzy neural logic network model constructed using the above construction algorithm can recognize all the initial stimulus patterns. The mapping from the input vector set {I1, I2, ..., Ik} to the output vector set {O1, O2, ..., Ok} can be confirmed by direct verification. However, the weights obtained are so precise that the network cannot tolerate any minor deviation in its inputs. This problem is solved by applying the following refinement algorithm.

Step 5: Relearning using Delta Rule. After the weights are modified, the network model may no longer map the input vector set {I1, I2, ..., Ik} to the output vector set {O1, O2, ..., Ok}. Hence, the delta rule controlled learning algorithm (i.e., steps 4 and 5 of the iterative learning using delta rule method given earlier) is applied to relearn the pattern mappings.

Although the local learning by construction cum refinement method does not identify the relative importance of the attributes, it is an improvement over the iterative learning method because the time taken to train the network is significantly reduced. Furthermore, the amount of computation required for both the construction and refinement algorithms is very much less than that required for iterative learning.

An advantage of using this method is the flexibility in setting the error tolerance requirement. Not only does the network match the given pair of pattern sets {I1, I2, ..., Ik} and {O1, O2, ..., Ok} correctly, it also tolerates a certain degree of error depending on the input vector set {I1, I2, ..., Ik} as well as the desired error tolerance requirement.

Global Learning by Construction

Step 2: Determination of Features' Importance. Let R = {(1,0), (0,1), (0,0)} and S = {(1,0), (0,1)}. Let I1, I2, ..., Ik be k (distinct) members of R^m such that Ii (i = 1,2,...,k) refers to the input vector of the ith training example and m refers to the number of terms in each input vector. Similarly, let O1, O2, ..., Ok be k (not necessarily distinct) members of S^n such that Oi (i = 1,2,...,k) corresponds to the output vector of the ith training example and n refers to the number of terms in each output vector. The following algorithm determines the importance of each feature:

Let N be the size of {O1} ∪ {O2} ∪ ... ∪ {Ok}, i.e., the number of distinct output vectors.
For each term (say the pth term) of the input vector:
    Initialize S(1,0) and S(0,1) to be empty sets.
    For each training example (say the qth training example):
        Let (a,b) be the pth term of the input vector Iq.
        If (a,b) = (1,0), then S(1,0) = S(1,0) ∪ {Oq}.
        If (a,b) = (0,1), then S(0,1) = S(0,1) ∪ {Oq}.
    Assign T(1,0)(p) = N + 1 - size of S(1,0).
    Assign T(0,1)(p) = N + 1 - size of S(0,1).
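The feature-importance computation above can be sketched directly. S(1,0) and S(0,1) are sets of output vectors, so their sizes count distinct vectors; the data layout and the names below are assumptions of ours, not the paper's.

typedef struct { double a, b; } Pair;

/* do two output vectors of length n coincide term by term? */
static int same_output(const Pair *x, const Pair *y, int n)
{
    for (int t = 0; t < n; t++)
        if (x[t].a != y[t].a || x[t].b != y[t].b) return 0;
    return 1;
}

/* number of distinct output vectors among O_{idx[0]}, ..., O_{idx[cnt-1]} */
static int distinct_count(const Pair *O, const int *idx, int cnt, int n)
{
    int d = 0;
    for (int u = 0; u < cnt; u++) {
        int seen = 0;
        for (int v = 0; v < u && !seen; v++)
            seen = same_output(&O[idx[u]*n], &O[idx[v]*n], n);
        if (!seen) d++;
    }
    return d;
}

/* I[q*m+p] is the pth term of input vector I_q; O[q*n+t] the tth term of O_q.
   T10[p] and T01[p] receive T(1,0)(p) and T(0,1)(p). */
void feature_importance(const Pair *I, const Pair *O,
                        int k, int m, int n, int *T10, int *T01)
{
    int idx10[k], idx01[k], all[k];            /* C99 variable-length arrays */
    for (int q = 0; q < k; q++) all[q] = q;
    int N = distinct_count(O, all, k, n);      /* N = |{O_1} U ... U {O_k}| */

    for (int p = 0; p < m; p++) {
        int c10 = 0, c01 = 0;
        for (int q = 0; q < k; q++) {
            Pair t = I[q*m + p];
            if (t.a == 1.0 && t.b == 0.0) idx10[c10++] = q;   /* O_q joins S(1,0) */
            if (t.a == 0.0 && t.b == 1.0) idx01[c01++] = q;   /* O_q joins S(0,1) */
        }
        T10[p] = N + 1 - distinct_count(O, idx10, c10, n);
        T01[p] = N + 1 - distinct_count(O, idx01, c01, n);
    }
}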
The weights on the edges joining nodes in columns 1 and 2 are then assigned from these importance values. For the edge joining the jth node of column 1 to the ith node of column 2, let (a,b) be the jth term of the input vector Ii and assign the weight (α,β) according to the following cases:

CASE 1: If (a,b) = (1,0), the weight is assigned in terms of the importance values T(1,0)(p), p = 1, 2, ..., m.

CASE 2: If (a,b) = (0,1), the weight is assigned in terms of the importance values T(0,1)(p), p = 1, 2, ..., m, with a negative first component.

CASE 3: If (a,b) = (0,0), assign (α,β) = (0,0).

This operation will be performed on every pair of nodes between column 1 and column 2.
Step 5: Weights Assignment to Edges Joining Nodes in Columns 2 & 3. Take one node, say the ith node of column 2, and one node, say the jth node of column 3. Attach the edge joining these two nodes by the ordered pair (α',β'), where α' and β' are real numbers obtained as follows:

Let (a',b') be the jth term of the output vector Oi.

CASE 1: If (a',b') = (1,0), assign (α',β') = (1,0).

CASE 2: If (a',b') = (0,1), assign (α',β') = (-1,0).

Do this to every pair of nodes between column 2 and column 3.

Step 6: Error Tolerance Refinement. Although a fuzzy neural logic network model constructed using the above construction algorithm can recognize all the initial stimulus patterns, the weights obtained are such that even a minor error cannot be tolerated. Hence, the refinement algorithm (i.e., steps 4 and 5 of the local learning by construction method given earlier) is applied to enable the network to tolerate a certain degree of error.
Comparison of the Learning Algorithms

Three learning algorithms are proposed to train the fuzzy neural logic network.

The first method, the delta rule iterative learning method, weighs all input attributes equally. The major drawback of this method is the great amount of time required to train the network.

The second method is named 'local learning by construction' because each training example is learned independently of the others. This is possible because the input attributes of each training example are given equal weights. Since the network is constructed by assignment of weights, it can be set up in a relatively short time. Unlike the iterative learning method, only a few computations are required, if at all, in the refinement process. In addition, the refinement algorithm provides the users with the flexibility to set the error tolerance requirement. Hence, this method is definitely superior to the delta rule controlled learning method.

The third method, the global learning by construction method, is an improvement over the local learning by construction method. This method retains all the strong points and overcomes the weakness of the second method. By weighing input attributes according to their importance, more realistic models are simulated.

To summarize, the construction cum refinement methods are more efficient than the iterative method. The local learning by construction method is appropriate when input attributes are to be weighted equally. The global learning by construction method is used when the relative importance of the attributes is to be taken into consideration.

Knowledge Acquisition

A fuzzy neural logic network model is set up using a set of knowledge in the form of examples. This knowledge can be acquired from many sources, including textbooks, documents, human experts, our own experience and generalized observations of the environment.

A training example indicates the conditions (or attribute values) that satisfy a conclusion. Before a set of training examples is used to set up the model, the examples are converted into binary form with (1,0), (0,1) and (0,0) terms. A training example in its binary form is made up of two components, namely, the input vector and the output vector. The input vector corresponds to the conditions or attribute values to be satisfied and the output vector corresponds to the conclusion. The input vector is made up of (1,0), (0,1) and (0,0) terms and the output vector is made up of (1,0) and (0,1) terms. The number of terms in the input vector is determined as follows:

For each input attribute,
    if it can take only two possible values, it is represented by only one ordered pair term;
    if it can take more than two possible values, say j values, it is represented by j ordered pair terms.

Suppose there are k input attributes in the training set. Let r1, r2, ..., rk be the number of ordered pair terms that are required to represent the 1st, 2nd, ..., kth input attributes respectively.
Hence, the number of terms in the input vector is m = r1 + r2 + ... + rk.

The number of terms in the output vector is determined as follows:

If there are only two possible conclusions, they are represented by only one term in the output vector.
If there are more than two possible conclusions, say n conclusions, they are represented by n terms in the output vector.

The conversion procedure of a training example from its natural language form to its binary form is outlined as follows:

A. Conversion of input attributes' values to binary input vector

B. Conversion of output attributes' values or conclusions to binary output vector

If there are only two possible conclusions,
    then the first conclusion is represented by (1,0) and the second conclusion by (0,1).
If there are more than two possible conclusions, say n conclusions,
    then the qth conclusion is represented by n ordered pair terms such that the qth term is (1,0) and the other terms are (0,1), i.e.,
    the first conclusion is represented by the n terms (1,0)(0,1)(0,1)...,
    the second conclusion is represented by the n terms (0,1)(1,0)(0,1)...,
    and so on.
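The sizing and encoding rules above are easy to mechanize. The sketch below derives the input-vector length from the number of values of each attribute and encodes a conclusion; the names are illustrative, and the figures in main() assume the four attributes and a four-term output vector of the sample set discussed next.

#include <stdio.h>

typedef struct { double a, b; } Pair;

/* number of ordered-pair terms in the input vector:
   one term for a two-valued attribute, j terms for a j-valued one */
int input_vector_length(const int *values, int k)
{
    int m = 0;
    for (int i = 0; i < k; i++)
        m += (values[i] <= 2) ? 1 : values[i];
    return m;
}

/* encode conclusion number q (1-based) among n_concl possible conclusions */
int encode_conclusion(int q, int n_concl, Pair *out)
{
    if (n_concl == 2) {                        /* a single (1,0) or (0,1) term */
        out[0] = (q == 1) ? (Pair){1.0, 0.0} : (Pair){0.0, 1.0};
        return 1;
    }
    for (int t = 1; t <= n_concl; t++)         /* n terms, the qth term is (1,0) */
        out[t-1] = (t == q) ? (Pair){1.0, 0.0} : (Pair){0.0, 1.0};
    return n_concl;
}

int main(void)
{
    /* assumed for illustration: target market (2 values), product involvement (2),
       product decision making (3), brand loyalty (2); four product types */
    int values[4] = { 2, 2, 3, 2 };
    Pair out[4];
    int m = input_vector_length(values, 4);
    int n = encode_conclusion(3, 4, out);      /* the 3rd of 4 product types */
    printf("m = %d, output:", m);
    for (int t = 0; t < n; t++) printf(" (%g,%g)", out[t].a, out[t].b);
    printf("\n");                              /* m = 6, (0,1)(0,1)(1,0)(0,1) */
    return 0;
}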
For instance, one training example concludes that the 'type of product' is 'shopping' without further information on 'target market'. In other words, 'target market' is irrelevant with respect to that training example. On the other hand, if 'target market' is 'general', 'product involvement' is 'low', 'product decision making' is 'limited' and 'brand loyalty' is 'low', then 'type of product' is 'impulsive'. Lastly, 'type of product' is 'convenience' if 'target market' is 'general', 'product involvement' is 'low', 'product decision making' is 'habitual' and 'brand loyalty' is 'high'.

Table 1 illustrates these training examples in tabular form.

Table 1: A Sample Set of Training Examples
(Target market, Product involvement, Product decision making, Brand loyalty --> Product type)

The binary representation of the training examples in Table 1 is then obtained in this way.

A fuzzy neural logic network model for the above set of training examples is constructed using the global learning by construction method with a desired error tolerance of 10%. This fuzzy neural logic network model is illustrated in Figure 3.
Suppose evidence is gathered on a product's target market, product involvement, product decision making and brand loyalty. Given the evidence, the product type is to be determined.

Table 2: Evidence on the Characteristics of a Product

CHARACTERISTICS              EVIDENCE
Target market                0% selective, 100% general, 0% uncertainty
Product involvement          3% high, 95% low, 2% uncertainty
Product decision making      10% complex, 85% limited, 0% habitual, 5% uncertainty
Brand loyalty                0% high, 100% low, 0% uncertainty

Hence, the input nodes take the following values:

(0,1) (0.03,0.95) (0.1,0.85) (0.85,0.1) (0,0.95) (0,1).

The output solution that is produced at the output layer is (0,1)(0,1)(1,0)(0,1). Since this is the binary representation for 'impulsive', the conclusion is 'impulsive product type'.
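The six input pairs quoted above can be reproduced from the percentages in Table 2 under one plausible reading: for each category node, a is the evidence for that category and b is the evidence for the competing categories, with the uncertainty left out. This reading, and all names below, are ours; the output merely matches the pairs quoted in the text.

#include <stdio.h>

typedef struct { double a, b; } Pair;

/* evidence[] holds the fractions for each category of one attribute */
static void attribute_to_pairs(const double *evidence, int ncat, Pair *out, int *len)
{
    if (ncat == 2) {                     /* two-valued attribute: one node */
        out[0].a = evidence[0];
        out[0].b = evidence[1];
        *len = 1;
        return;
    }
    for (int c = 0; c < ncat; c++) {     /* one node per category */
        double against = 0.0;
        for (int d = 0; d < ncat; d++)
            if (d != c) against += evidence[d];
        out[c].a = evidence[c];
        out[c].b = against;
    }
    *len = ncat;
}

int main(void)
{
    double target[2]   = { 0.00, 1.00 };        /* selective, general          */
    double involve[2]  = { 0.03, 0.95 };        /* high, low                   */
    double decision[3] = { 0.10, 0.85, 0.00 };  /* complex, limited, habitual  */
    double loyalty[2]  = { 0.00, 1.00 };        /* high, low                   */

    Pair in[6]; int n = 0, len;
    attribute_to_pairs(target,   2, in + n, &len); n += len;
    attribute_to_pairs(involve,  2, in + n, &len); n += len;
    attribute_to_pairs(decision, 3, in + n, &len); n += len;
    attribute_to_pairs(loyalty,  2, in + n, &len); n += len;

    for (int i = 0; i < n; i++) printf("(%g,%g)", in[i].a, in[i].b);
    printf("\n");   /* (0,1)(0.03,0.95)(0.1,0.85)(0.85,0.1)(0,0.95)(0,1) */
    return 0;
}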
Conclusion

In this paper, a new class of neural networks called the 'Fuzzy Neural Logic Network' has been proposed. It uses fuzzy-valued logic to handle fuzziness, bias and uncertainty.

As the fuzzy neural logic network is massively parallel, decisions can be made at high speed. The network can recognize all the initial stimulus patterns even if these patterns are very similar. Furthermore, the network is able to draw conclusions despite minor variations in its inputs.

Only the basic features of the fuzzy neural logic network are introduced in this paper. In fact, research on fuzzy neural logic networks has progressed beyond what is presented here.

A multi-level fuzzy neural logic network has been designed [9] to solve complex problems. With this design, multi-level decision making can be supported by allowing the output of a lower level model to become the input of a higher level model. This is consistent with the human's structured way of thinking.

Lastly, a decision support system based on the multi-level fuzzy neural logic network has been implemented on the SUN workstation using the 'C' programming language.

References

1. Chan S.C., Loe K.F., Teh H.H. (1987), "Modelling Intelligence Using Neural Logic Networks", Department of Information Systems and Computer Science, National University of Singapore.
2. Chan S.C., Hsu L.S., Brody S., Teh H.H. (1988), "On Neural Logic Networks", Neural Network Journal, 1 (Supplement 1): 428.
3. Chan S.C., Hsu L.S., Brody S., Teh H.H. (1989), "Neural Three-valued-logic Networks", Proceedings of the International Joint Conference on Neural Networks, 2: 594.
4. Hsu L.S., Teh H.H., Chan S.C., Loe K.F. (1989), "Fuzzy Decision Making based on Neural Logic Networks", Proceedings of the Inter-Faculty Seminar on Neuronet Computing, Technical Report TRA6/89, Department of Information Systems and Computer Science, National University of Singapore, June.
5. Hsu L.S., Teh H.H., Chan S.C., Loe K.F. (1989), "NELONET based Decision Making", Department of Information Systems and Computer Science, National University of Singapore.
6. Hsu L.S., Teh H.H., Chan S.C., Loe K.F. (1990), "Imprecise Reasoning using Neural Networks", Proceedings of the Twenty-Third Annual Hawaii International Conference on System Sciences, 4: 363-368.
7. Hsu L.S., Teh H.H., Chan S.C., Loe K.F. (1990), "Fuzzy Logic in Connectionists' Expert Systems", International Joint Conference on Neural Networks, 2: 599-602.
8. Mockler R.J. (1989), "Knowledge-based Systems for Management Decisions", Prentice Hall.
9. Nah F.H. (1990), "Fuzzy Neural Logic Network with Application to Decision Support Systems", Master's thesis, Department of Information Systems and Computer Science, National University of Singapore, in preparation.
10. Teh H.H., Yu W.C.P. (1988), "A Controlled Learning Environment of Enhanced Perceptron", IEEE Proceedings, Future Trends in Distributed Computing Systems, Hong Kong.
11. Teh H.H., Chan S.C., Hsu L.S., Loe K.F. (1989), "Probabilistic Neural-Logic Networks", Proceedings of the International Joint Conference on Neural Networks, 2: 600.
12. Turner R. (1984), "Logics for Artificial Intelligence", Ellis Horwood, Chichester.