Fuzzy Logic and Neural Networks

INDEX

S.NO TOPICS PAGE.NO


Week 1
1 Lecture 1 : Introduction to Fuzzy Sets 3

2 Lecture 2 : Introduction to Fuzzy Sets (Contd.) 15

3 Lecture 3 : Introduction to Fuzzy Sets (Contd.) 29

4 Lecture 4 : Introduction to Fuzzy Sets (Contd.) 46

5 Lecture 5 : Introduction to Fuzzy Sets (Contd.) 61

6 Lecture 6 : Introduction to Fuzzy Sets (Contd.) 80

Week 2
7 Lecture 07: Applications of Fuzzy Sets 97

8 Lecture 08: Applications of Fuzzy Sets (Contd.) 110

9 Lecture 09: Applications of Fuzzy Sets (Contd.) 123

10 Lecture 10: Applications of Fuzzy Sets (Contd.) 136

11 Lecture 11: Applications of Fuzzy Sets (Contd.) 148

Week 3
12 Lecture 12: Applications of Fuzzy Sets (Contd.) 162

13 Lecture 13: Applications of Fuzzy Sets (Contd.) 174

14 Lecture 14: Applications of Fuzzy Sets (Contd.) 186

15 Lecture 15: Applications of Fuzzy Sets (Contd.) 198

16 Lecture 16: Applications of Fuzzy Sets (Contd.) 214

Week 4
17 Lecture 17: Optimization of Fuzzy Reasoning and Clustering Tool 230

18 Lecture 18: Optimization of Fuzzy Reasoning and Clustering Tool (Contd.) 242

19 Lecture 19: Optimization of Fuzzy Reasoning and Clustering Tool (Contd.) 262

20 Lecture 20: Optimization of Fuzzy Reasoning and Clustering Tool (Contd.) 279

Week 5
21 Lecture 21 : Some Examples of Neural Networks 300

22 Lecture 22 : Some Examples of Neural Networks (Contd.) 315

23 Lecture 23 : Some Examples of Neural Networks (Contd.) 330

24 Lecture 24 : Some Examples of Neural Networks (Contd.) 343

25 Lecture 25 : Some Examples of Neural Networks (Contd.) 357

26 Lecture 26 : Some Examples of Neural Networks (Contd.) 375

Week 6
27 Lecture 27 : Some Examples of Neural Networks (Contd.) 384

28 Lecture 28 : Some Examples of Neural Networks (Contd.) 394

29 Lecture 29 : Some Examples of Neural Networks (Contd.) 411

30 Lecture 30 : Some Examples of Neural Networks (Contd.) 421

Week 7
31 Lecture 31 : Optimal Designs of Neural Networks 432

32 Lecture 32 : Optimal Designs of Neural Networks (Contd.) 446

33 Lecture 33 : Neuro-Fuzzy System 461

34 Lecture 34 : Neuro-Fuzzy System (Contd.) 477

35 Lecture 35 : Neuro-Fuzzy System (Contd.) 492

36 Lecture 36 : Neuro-Fuzzy System (Contd.) 504

Week 8
37 Lecture 37 : Concepts of Soft Computing and Expert Systems 518

38 Lecture 38 : Concepts of Soft Computing and Expert Systems (Contd.) 532

39 Lecture 39 : A Few Applications 545

40 Lecture 40 : A Few Applications (Contd.) 561

41 Lecture 41 : A Few Applications (Contd.) 573

42 Lecture 42 : A Few Applications (Contd.) 592

Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 01
Introduction to Fuzzy Sets

I welcome you all to the course on Fuzzy Logic and Neural Networks. Now, in this
particular course, we are trying to model the human brain in an artificial way; in other
words, we are also going to discuss the principles of soft computing in detail. Now, let
us start with the first topic, that is, Introduction to Fuzzy Sets.

(Refer Slide Time: 00:50)

Now, here, these are actually the topics which I am going to cover in this lecture. So,
at first, we will give a brief introduction to the classical set or the crisp set, and after that,
the properties of the classical set or the crisp set will be discussed in detail. Now, before I
start with the concept of fuzzy sets, we will try to explain the reason behind going for
fuzzy sets, and we will also discuss how to represent a fuzzy set.

(Refer Slide Time: 01:32)

Now, introduction to the classical set or the crisp set, which is denoted by A; now, before
I just go for discussing like what do we mean by the fuzzy set, let me discuss, what do
we mean by classical set or the crisp set, first. Now, to define the concept of the crisp set,
what we do is, we try to explain the terms: the universal set or the universe of discourse.
Now, let me take one example, now supposing that we are going to form a set of all
technical universities in this world, now all the technical universities of this
world will constitute one big set and that is nothing, but the universe of discourse or the
universal set, that is denoted by capital X.

Now, let me draw one universal set; now supposing that this is nothing, but the universal
set denoted by capital X. Now, next I am just going to ask another question, like can you
not form a set of technical universities having at least five departments each? Now, if
you just investigate, a number of universities throughout the world will have at least
five departments each, and there is a possibility that inside the universal set, I will be
getting a subset; this particular subset is nothing but the crisp set. So, this is nothing
but the crisp set because this set is having a well-
defined boundary.

So, by definition, the classical set or a crisp set, we mean the set with fixed and well-
defined boundary. So, this is nothing, but the classical set or the crisp set.

(Refer Slide Time: 03:52)

The next, I am just going to discuss, how to represent a crisp set, that is denoted by A.
Now, if you see the literature, the crisp set has been defined in 3 different ways, the first
method. So, A is nothing, but a collection of all the elements like your a_1, a_2 up to
a_n. So, the classical set or the crisp set, this is written as A equals to a collection of a_1,
a_2, up to a_n. So, this is one way of representing the crisp set, the second method of
representing the crisp set is as follows.

So, A is nothing, but a crisp set which is nothing, but the collection of x, such that it has
got the property P(x). So, P(x) is nothing but the property, and this particular symbol
indicates that A is actually a crisp set which is having the property P(x), and this
particular small x, of course, belongs to the universal set or universe of discourse.
Now, the third method of representing the crisp set is as follows, we take the help of
some characteristic function to represent the crisp set.

Now, this mu_A(x) represents the characteristic function of the crisp set A. Now, this
mu_A(x) is equal to 1 if x belongs to A, and if x does not belong to A, then this
mu_A(x) is nothing but 0. Now, this is almost similar to the situation of whether it is a
member or a non-member. So, if it is a member, then the characteristic function, that is,
mu_A(x) is nothing, but is equal to 1, and if it is a non-member, then the characteristic
function is equal to 0.

So, this is either 1 or 0. So, these are the 3 ways actually, we can represent the crisp set
or the classical set.
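
As a small illustration, the characteristic function can be written in a few lines of Python; the set A and the element names below are arbitrary examples.

    # Characteristic function of a crisp set: 1 for a member, 0 for a non-member.
    A = {"a1", "a2", "a3"}                # an example crisp set (names are arbitrary)

    def mu_A(x):
        """Return 1 if x belongs to A, otherwise 0."""
        return 1 if x in A else 0

    print(mu_A("a1"))   # 1 -> member
    print(mu_A("a9"))   # 0 -> non-member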

(Refer Slide Time: 06:26)

Now, let us see some of the properties of this particular crisp set, but before that, let
me try to concentrate on the notations which are generally used in set theory. Now, set
theory you have already studied, maybe during your school days or in the first year
of your under graduation. So, whatever I am discussing related to the crisp set or the
classical set are nothing, but recapitulation for all of you. Now, this particular symbol,
this indicates actually the empty set or the null set, then this symbol x belongs to A. So, x
belongs to A, so this is represented by this particular symbol; then x does not belong to
A is this particular symbol, then comes your A is a subset of B. So, this is the symbol,
which is generally used to represent A is a subset of B. The next is A is a superset of B.

So, this is represented by this particular symbol, and if you write A is equal to B. So, this
is the symbol; that means, your set A is equal to set B, and if set A and set B are not
equal, we use this particular symbol. So, these are the different symbols used and there
are a few other symbols also, which are generally used in the classical set or the crisp set.

(Refer Slide Time: 08:02)

Now, here, this particular symbol, A is a proper subset of B. So, to represent A is a


proper subset of B, we use this particular symbol, then the next symbol indicates A is a
proper superset of B. So, this is nothing, but A is a proper superset of B. Now, I am just
going to concentrate on another symbol. So, this is actually the symbol of the mod sign
or the mod value. Now, this symbol indicates actually the cardinality of a set A and by
cardinality actually, we mean the total number of elements present in a particular set.
Now, let me take a very simple example, now supposing that I have got A, the set, say,
this particular set, that is a crisp set and it has got 3 elements like your a_1, a_2 and a_3.

Now, what is the cardinality of this particular set? It is very simple, the cardinality of this
particular set is nothing but 3, because it has got 3 elements a_1, a_2 and a_3. So, it is so
simple. Now, the next symbol is p(A); this p(A) indicates the power set of A. Now, by
the power set of A, we mean the collection of all subsets that can be constructed
from a particular set. Now, let me try to concentrate on the same crisp set, that is, A is a
collection of 3 elements a_1 comma a_2 comma a_3. Now, let us see, how many such
subsets can be constructed from this particular set A, including the null set. Now, if you see,
if I take one element set, that is called the singleton.

So, from here I can construct the subset {a_1}, next the subset {a_2}, next I can
construct {a_3}, next I can construct {a_1, a_2}, next I can construct {a_2, a_3}, next I can
construct {a_3, a_1}, I can also construct the set itself, that is, {a_1, a_2, a_3}, and I

can also construct the null set. Now, let me count how many subsets, we have
constructed; so 1, 2, 3, 4, 5, 6, 7, 8.

So, 8 such subsets, I have constructed. Now, if I just try to find out, what is the
cardinality of this particular power set of A; that means, what is this cardinality of power
set of A and that is nothing, but 8 because, I am able to construct 8 such subsets. Now,
this particular 8 can be written as 2 raised to the power 3, and that is nothing, but 2 raised
to the power cardinality of A; that means, I can find out this particular relationship, that
is, the cardinality of p(A) is nothing but 2 raised to the power cardinality of A, that is,
|p(A)| = 2^|A|. So, this is the way actually we can define the power set of A and its cardinality.
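
As a quick check of this relationship, the short Python sketch below enumerates all subsets of a three-element set; the element names are arbitrary.

    from itertools import combinations

    A = {"a1", "a2", "a3"}

    # Enumerate every subset of A (including the null set) to build its power set.
    power_set = [set(c) for r in range(len(A) + 1) for c in combinations(A, r)]

    print(len(power_set))   # 8 subsets, as counted above
    print(2 ** len(A))      # 2 raised to the power |A| is also 8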

(Refer Slide Time: 12:05)

Now, I am just going to concentrate on some crisp set operations. Now here, I am just
going to define like, what do we mean by the difference between the two classical sets
that is A − B . Now, supposing that I have got one universal set or the universe of
discourse like this, and I have got two such classical sets or the crisp set 1 is nothing, but
A and another is nothing but B. So, my aim is to find out so, A − B , that is, the difference
between A and B.

Now, this A − B is also known as the relative complement of set B with respect to set A,
and mathematically, A − B is nothing but {x | x ∈ A and x ∉ B}. Now, if we just follow
that, so x belongs to A and x does not belong to B.
So, I will be getting, A − B is nothing, but is this. So, this black portion or the shaded

portion and that is nothing, but the relative complement of B with respect to A. Now, let
me take a very simple example, now supposing that I have got a classical set or a crisp
set that is nothing, but A and it has got a few elements like your a, b, c, d, e, say f. So, I
have got say 6 elements.

Now, I have got another classical set or the crisp set say, it is denoted by B and
supposing that this is nothing, but b, d, f. So, there are 3 elements. Now, if I find out; so,
this A − B , how to find out the A − B ? It is very simple. So, I will be getting here a then
comes your c and then I will be getting e. So, this is nothing, but the difference between
A and B or the relative complement of B with respect to A. Now, there is another
concept that is called the absolute complement. So, this absolute complement is
represented by Ā or A^C; now, here, let us try to understand from here.

So, supposing that we have got, so, this is nothing but the universal set denoted by X,
and I have got a crisp set that is denoted by A here. So, X − A, that is nothing but the
absolute complement of A and this is nothing, but this black region or the shaded region.
So, that is nothing but A^C or the complement of A. Now, this is the way actually
we can define the concept of the difference between two crisp sets.
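
A minimal Python sketch of these two operations is given below; the universe X used here is an assumed seven-element set, added only so that the absolute complement has an element outside A to return.

    X = {"a", "b", "c", "d", "e", "f", "g"}   # an assumed universe of discourse
    A = {"a", "b", "c", "d", "e", "f"}
    B = {"b", "d", "f"}

    print(A - B)    # {'a', 'c', 'e'} -> relative complement of B with respect to A
    print(X - A)    # {'g'}           -> absolute complement of A within X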

(Refer Slide Time: 15:26)

Now, I am just going to discuss another, that is called the intersection between two crisp
sets. Now, let me try to concentrate. So, this is nothing but X, is nothing but my

universal set or universe of discourse denoted by capital X, I have got two crisp sets, one
is called A and another is called B.

So, by definition, the intersection of A and B, that is, A ∩ B = {x | x ∈ A and x ∈ B}; so, this


is a common region between A and B, and if you see, so, this particular black portion or
the shaded portion is common to both A and B, and that indicates actually
A intersection B. Now, if I take the same example, like say the crisp set which is having
the elements a, b, c, d, e, f. So, I am taking the same example, and supposing that
I have got another crisp set and which is nothing but b, d, f.

Now, if I try to find out the common region between A and B and that is nothing, but A
intersection B and that will be nothing, but so, this should be common to both. So, this is
nothing, but is your b, d, f. So, this is what we mean by the intersection and here, let me
mention supposing that I have got two sets like set A and set B and supposing that the
two sets are disjoint; by disjoint we mean there is no common element between these two
classical sets or the crisp sets. Now, if they are disjoint. So, this particular A intersection
B will be nothing but a null set; so, if A and B are disjoint, the intersection between these
two sets is nothing but the null set.

(Refer Slide Time: 17:49)

Now, these are all fundamentals of the classical set and all of us we know, and as I told
this is some sort of recapitulation. Now, then comes the concept of the union. So, once
again, I have got the universal set that is your capital X and I have got two sets, two crisp

sets: one is called the A and another is nothing, but B, and I am trying to find out the
union, that is, A ∪ B = {x | x ∈ A or x ∈ B}; that means, we consider the maximum area,
and this is A ∪ B .

So, this particular shaded portion or the black portion will be your A ∪ B . Now, once
again, let me concentrate on the same example like A is nothing, but your a, b, c, d, e, f
and B is another crisp set, which is nothing, but is your b, d, f. Now, if I try to find out
what should be A ∪ B it is very simple. So, we consider the maximum; that means, your
A ∪ B will be nothing, but a, b, c, d, e, f. So, this is nothing, but is your A ∪ B .
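
The intersection, the union and the disjoint case described above can be reproduced directly with Python's built-in set operators; the extra set C below is an assumed disjoint set used only for illustration.

    A = {"a", "b", "c", "d", "e", "f"}
    B = {"b", "d", "f"}
    C = {"p", "q"}                    # a set disjoint from A, chosen for illustration

    print(A & B)    # {'b', 'd', 'f'}                -> intersection
    print(A | B)    # {'a', 'b', 'c', 'd', 'e', 'f'} -> union
    print(A & C)    # set()                          -> disjoint sets give the null set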

(Refer Slide Time: 19:25)

So, this is the way actually we can define the union of two crisp sets. Now, I am
just going to concentrate on the properties of the crisp set and these are very important
because, the crisp set follows all ten laws. Now, I am just going to state the laws one
after another and as I told this you people have already studied. So, this is some sort of
recapitulation, the first law, now this is known as law of involution. So, it states that the
complement of a complement of a crisp set is nothing, but the original set, ok. So, this is
nothing but the law of involution. The next is the law of commutativity; now, it states that
A ∪ B = B ∪ A; A ∩ B = B ∩ A .

So, this is nothing, but law of commutativity, next comes your law of associativity. It
states that ( A ∪ B ) ∪ C = A ∪ ( B ∪ C );( A ∩ B ) ∩ C = A ∩ ( B ∩ C ) . Now, then comes

your law of distributivity, it states that
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C); A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).

Next is your law of tautology, that is, A ∪ A = A; A ∩ A = A. Next come the laws of


absorption, which state that A ∪ (A ∩ B) = A; A ∩ (A ∪ B) = A; then come the laws of
identity: A ∪ X = X; A ∩ X = A; A ∪ ∅ = A; A ∩ ∅ = ∅. Now, then come the very famous De
Morgan’s laws.

Now, they state that the complement of (A ∩ B) is nothing but the complement of A
union the complement of B, and the other statement is that the complement of (A ∪ B) is
equal to the complement of A intersection the complement of B; that is,
(A ∩ B)^C = A^C ∪ B^C and (A ∪ B)^C = A^C ∩ B^C. So, these are the very famous, as I told, De
Morgan's laws. The next is your law of contradiction; it states that A ∩ A^C = ∅, and
the last law, known as the law of excluded middle, states that A ∪ A^C = X. Now,
here actually all such ten rules are followed by the crisp set or the classical set.
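
These laws can be spot-checked mechanically on small crisp sets; the sketch below verifies a few of them (De Morgan's laws, distributivity, contradiction and excluded middle) for arbitrarily chosen sets inside a small universe.

    X = set(range(10))                       # an arbitrary universe of discourse
    A, B, C = {1, 2, 3}, {2, 3, 4}, {3, 5}   # arbitrary small crisp sets

    def comp(S):
        """Absolute complement within X."""
        return X - S

    assert comp(A & B) == comp(A) | comp(B)       # De Morgan's law
    assert comp(A | B) == comp(A) & comp(B)       # De Morgan's law
    assert A | (B & C) == (A | B) & (A | C)       # law of distributivity
    assert A & comp(A) == set()                   # law of contradiction
    assert A | comp(A) == X                       # law of excluded middle
    print("all checked laws hold")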

(Refer Slide Time: 24:01)

Now, then comes the concept of fuzzy set, now before I start with the concept of fuzzy
set, let me try to spend some time just to understand why we need this particular
concept of fuzzy sets. Now, if you see the real-world problems, real-world problems
are very complex and very difficult, and these are associated with some sort of imprecisions
and uncertainties.

Now, prior to the year 1965, people used to believe that it is the probability theory,
which can tackle the different types of uncertainties in this world, but in the year 1965,
Professor L.A. Zadeh of the University of California, USA, told that there are many
uncertainties in this particular world and the probability theory can handle only one out
of different uncertainties. And, there are a few uncertainties, which cannot be tackled
using the principle of probability theory, which works based on the classical set or the
crisp set, and Professor Zadeh argued that we need something which is at a higher level
compared to the classical set if we want to represent the different types of uncertainties
or the imprecision, which we face in real-world problems.

Now, let me take a very simple example, very practical example, now supposing that one
of your friends is going to the market and you are requesting him to please bring 1 kg of red apples
for me and supposing that your friend has brought that particular apple from the market
and there is a probability associated with the availability of apple and it depends on the
season. Now, there is an uncertainty regarding the availability of the apple, and this
particular uncertainty can be handled using the principle of probability theory and the
probability of getting apple varies from say 0 to 1 supposing that it is 0.6.

So, with the probability of 0.6, your friend has got some apples. Now, my next query,
what is the guarantee that this particular apple is red, how can you define the colour red
and the definition of this particular colour red will vary from person to person. Now, the
problem is how to tackle this particular uncertainty regarding the guarantee that the
colour of the apple is red. So, this particular uncertainty cannot be answered using the
probability theory or using the crisp set and to answer that, we need to have another set
and that is nothing but the fuzzy set.

Now, the same example which I took let me just write it here. Now, the colour of this
particular apple is red, now according to this classical set or the crisp set, there are only
two possibilities: either the apple will be red or it will not be red. So, there are two
answers, one is yes and another is no. So, if the apple is found to be red, its characteristic
value is 1.0, and if it is not, its characteristic value is 0.0, in the classical set or crisp set.

So, there are only two answers, one is the member, another is non-member; that means,
1.0 or 0.0; but the same colour red will be defined in a slightly different way in fuzzy
sets. For example, the colour can be perfectly red; for the same colour, some other
person may say this is almost red, some other person will say this is slightly red, and
there could be a possibility that it may not be called red at all, ok. Now, if it is perfectly
red, then we declare it is red with some membership function value.

So, this defines a term that is called the membership function value and that is nothing,
but the degree of belongingness, and that is nothing but the similarity of an element to a
particular class. Now, this particular membership
function value will vary from 0 to 1, now as I told, if it is perfectly red, then we say that
it is red with membership function value 1.0. And, if it is almost red, then also it is called
red with some membership function value 0.8; if it is slightly red, then also it is called
red with membership function value 0.4; and if it is not red, then it is red with
membership function value 0.0.

So, this is the way actually we define the colour red in the fuzzy sets, and now you can
see that this is actually a very practical way of representing this type of uncertainties, and
that is why actually this fuzzy set has gained popularity.

Thank you.

Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 02
Introduction to Fuzzy Sets (Contd.)

(Refer Slide Time: 00:15)

Now, as I told, Professor L.A. Zadeh of the University of California actually argued that


there are many uncertainties, which cannot be tackled using only the probability theory,
which works based on the classical set or the crisp set. So, there are many uncertainties
and if you want to tackle, if you want to handle those uncertainties, you will have to take
the help of another concept, that is the concept of the fuzzy sets.

Now, if you see the literature, you will find that the concept of fuzzy set, or a similar
concept, was proposed long back in the year 1937 by one American philosopher whose
name is Max Black. So, Max Black, an American philosopher, introduced the concept of
fuzzy sets and, as usual, he was opposed by the traditional mathematicians of the USA
and he stopped; then, some years later, in the year 1965, the concept of fuzzy set was
reintroduced by Professor Zadeh.

Now, to define this particular concept of fuzzy set, let me try to take one example and
this particular example, I have already taken, but I will slightly modify this particular
example. Now, if you remember at the beginning, we talked about the universal set that

is nothing, but the set of all technical universities in this particular world, and next, we
try to find out the set of technical universities having at least 5 departments.

And, now I am just going to make it more complex, I am just going to find out a set of
technical universities in the world having at least 5 good departments. The moment, you
add this particular adjective good, the problem becomes very difficult, because, how to
define this particular adjective “good” and this particular definition will vary from
person to person and that is why, this particular problem is very complex and very
difficult to answer.

Now, if I just draw, in the form of, say, the universal set and this particular fuzzy set,
supposing that the universal set is nothing but this. So, this is capital X, that is, the set of
all technical universities in this particular world, and as I told, the definition of this particular
term that is good will vary from person to person.

That is why, you may not get a very precise subset and we may get a set that is called the
fuzzy set, which is nothing, but sets with imprecise or vague boundaries and that is why,
this particular fuzzy set, we try to draw with the help of dotted line, ok. So, this particular
dotted line, this set is nothing, but the fuzzy set. If you remember, while drawing the
crisp set, we use the solid lines, but for drawing this particular fuzzy set, we use the
dotted line because, this particular definition of the subset will vary from person to
person and there is no well-defined boundary for this particular set and that is why, this
is known as the fuzzy set, that is set with imprecise or the vague boundaries.

Now, these particular sets, the fuzzy sets, are potential tools for handling imprecision and
uncertainties, and we can say that the fuzzy set is a more general concept than this
particular classical set.

(Refer Slide Time: 05:03)

Now, let us try to see, how to represent the particular fuzzy set. Now, this I have already
discussed that to represent the fuzzy set, we take the value of membership and this
membership is nothing, but the degree of belongingness and that is defined by, that is
denoted by this particular µ . So, actually µ denotes this membership function value.
Now the fuzzy set is defined as follows.

So, it is nothing but A = {(x, µ_A(x)) | x ∈ X}, where this particular small x belongs to the
universal set, that is, your capital X; that means, if you want to represent the fuzzy set, we will have
to take the help of this membership function value, which varies from 0 to 1, and if you
remember the probability value that will also vary from 0 to 1, but truly speaking, the
concept of probability and the concept of membership are not exactly the same.

So, by probability, we mean it is the frequency of likelihood that an element is in a class;


that means, your probability is related to the frequency data. On the other hand,
membership is nothing, but the similarity of an element to a particular class. For
example, if I take the probability of getting apple, so that will vary from 0 to 1, and what
is the guarantee that the apple is red? It has got some membership function value lying
between 0 and 1. Now, if I compare these two uncertainties, one is related to the
availability of the apple and another is related to the colour of this particular apple. So,
these two uncertainties are not exactly the same and there is a difference.

Now, the availability of the apple that will be handled by probability, but the guarantee
whether this particular apple is red. So, that will be indicated by the membership function
value for example, with membership function value of 0.9 and some other people will
say that this particular apple is red with membership function value of 0.4, and so on. So,
this is how to represent the fuzzy set.

(Refer Slide Time: 08:06)

Now, if you see the literature. So, we have got two types of fuzzy sets, now one is called
actually the discrete fuzzy set and another is called the continuous fuzzy set. Now, let me
define the concept of this particular discrete fuzzy set first. Now, the discrete fuzzy set is
defined as A(x) = Σ_{i=1}^{n} µ_A(x_i)/x_i. Now, remember that the slash symbol does not
indicate a division, and the Σ symbol does not indicate actually summation in
that sense; it indicates actually the collection of data, that is, the collection of these
membership function values, and here, the small n indicates the number of elements present in that
particular set. Now, I am just going to define the concept of this discrete fuzzy set with
the help of this example. So, in this particular example, this figure is going to indicate
supposing that the temperature of a particular place during the first 15 days of a month.
Now, supposing that I have declared the temperature of city say B during the first 15
days of a month is moderate.

Now, on the first day, it has got a temperature value, now that is called moderate with
this much of membership function value, on the second day it has got another
temperature value and that is also called moderate with this much of membership
function value µ , and µ varies from 0 to 1.

Similarly, on the third day is called the moderate temperature with this much of
membership function value, that is, µ; similarly, for all 15 days, I can represent that the
temperature is moderate with different values of the membership. Now, this can also be
written following this particular rule as follows: like, your A(x) is nothing but the
membership function value of the temperature on first day, supposing that this is 0.15 on
the first day.

So, this is moderate temperature on the first day, the temperature is called moderate with
this much of membership function value on the second day, it is called moderate with
this much of membership function value, say 0.3 slash 3 (written 0.3/3, where the slash,
again, is not a division), and so on. So, on the first day, the temperature is moderate with this
much of membership function value, second day the temperature is moderate with this
much of membership function value, on third day it is moderate with this much of
membership function value, and so on. So, this is the way actually, we can represent the
discrete fuzzy set.

Now, if you see the literature, the same discrete fuzzy set can also be represented in
another form for example, on the first day the temperature is called moderate with this
much of membership function value. On the second day, it is called moderate with this
much of membership function value, and so on. So, this is another way of representing
the discrete fuzzy sets. Now, this is how to represent the discrete fuzzy set, now if I just
see how to represent the continuous fuzzy set. Now, here for this continuous fuzzy set,
this is the way we will have to represent; that means, A(x) is nothing but, so, in place
of summation, we are using the integration.

So, A(x) = ∫_X µ_A(x)/x, and once again, this is actually not the true integration and this is

actually not the true division. So, here actually what you do is, in continuous fuzzy set,
we are going to represent or we are going to fit a curve to represent the fuzzy sets. Now
we are going to look into all such issues in details.
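
One convenient way to hold such a discrete fuzzy set in code is a mapping from elements to membership values; in the Python sketch below, only the 0.15 for the first day comes from the example above, and the remaining values are invented.

    # A discrete fuzzy set "moderate temperature" stored as day -> membership value.
    # Only the 0.15 for day 1 comes from the lecture; the remaining values are invented.
    moderate = {1: 0.15, 2: 0.3, 3: 0.6, 4: 0.9, 5: 1.0}

    # The slash in "sum of mu_A(x_i)/x_i" is just notation for these pairs:
    print(" + ".join(f"{mu}/{day}" for day, mu in moderate.items()))
    # -> 0.15/1 + 0.3/2 + 0.6/3 + 0.9/4 + 1.0/5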

(Refer Slide Time: 13:22)

Now, before I go for that, like how to represent the continuous fuzzy set more clearly.

So, I am just going to concentrate on one concept that is called the concept of convex
versus non-convex membership function distributions. Now, this is very important, it is
important in the sense, supposing that this is the temperature denoted by x and this is the
membership function value µ . Now, the membership function distribution, it could be
something like this and it can also be something like this, like it will increase then
decrease, it will remain constant once again it increases then it decreases, and here, the
membership function value will go on increasing, then it will reach the maximum then it
will go on decreasing.

So, this is one type of membership function value and this is the second type of
membership function value, in both the types, the value of µ is going to vary from 0 to
1, if I take say 1.0 here and if I take another point here, supposing that this is
corresponding to x_1 and this corresponds to your x_2 and there is a µ value,
membership function value. Now, if I want to check, whether it is a convex membership
function distribution or a non-convex. So, this is the rule to be followed, a fuzzy set is
called convex, if this particular condition gets fulfilled, that is,
µ A{λ x1 + (1 − λ ) x2 } ≥ min{µ A ( x1 ), µ A ( x2 )} , now what is x_1 and what is x_2?

So, this is the x_1 a particular value of x or the temperature, this is x_2 another value of
temperature, that is x_2 and its corresponding µ is this much and its corresponding µ is

nothing, but this much. So, this is your µ A ( x1 ) and this is your µ A ( x2 ) . So, I can

compare to find out what is the minimum value, take a particular value of λ lying
between 0 and 1, say you take λ equals to say, 0.6. So, can I now find out what should
be the numerical value corresponding to your left hand side? The answer is yes.

So, I can find out the numerical value of the left hand side, because I know the λ , I
know the value of x_1, x_2. So, I can calculate. So, I will find out a value of x. So,
corresponding to that, I can find out the µ from this distribution and similarly, on the

right hand side, what you can do is I know what is µ A ( x1 ) . So, this is my µ A ( x1 ) and this

is my µ A ( x2 ) , I can compare to find out the minimum.

So, this is nothing, but is your right hand side. Now, if this particular condition holds
good, then we say that this particular membership function distribution is a convex
membership function distribution and if it is of this type for example, say it is increasing,
decreasing, remaining constant, once again increasing and decreasing and if you take one
value here, say this is my x_1, if you take another value here.

So, this is my x_2, I can find out the corresponding µ and if you just check this
particular condition, there is a possibility that this particular condition will not hold good
and that is why, this type of membership is known as non-convex membership function
distribution and this is nothing, but a convex membership function distribution, I hope
the idea behind this particular convex versus non-convex membership function
distribution is clear to all of you.
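
The convexity condition can also be checked numerically; the Python sketch below does this on a grid of sample points, where the two membership functions and the grid itself are assumptions chosen only to show one convex and one non-convex case.

    import math

    def mu_convex(x):
        # a triangular shape: rises to 1 at x = 6, falls back to 0 at x = 10
        return max(min((x - 2) / 4, (10 - x) / 4), 0.0)

    def mu_non_convex(x):
        # two separated bumps, so the membership dips between them
        return max(math.exp(-(x - 3) ** 2), math.exp(-(x - 9) ** 2))

    def is_convex(mu, xs, lambdas=(0.25, 0.5, 0.75)):
        """Check mu(l*x1 + (1 - l)*x2) >= min(mu(x1), mu(x2)) on all sampled pairs."""
        for x1 in xs:
            for x2 in xs:
                for lam in lambdas:
                    x = lam * x1 + (1 - lam) * x2
                    if mu(x) < min(mu(x1), mu(x2)) - 1e-9:
                        return False
        return True

    xs = [i / 2 for i in range(25)]      # sample points in [0, 12]
    print(is_convex(mu_convex, xs))      # True
    print(is_convex(mu_non_convex, xs))  # False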

(Refer Slide Time: 18:19)

Now, then comes here, how to represent the membership function distribution.

Now, you see, the membership function distribution has been represented using both
linear functions as well as non-linear functions. Now, here, I am just going to
concentrate on this particular triangular membership function distribution; that means,
this is nothing, but µ varies from 0 to 1 and this is the variable say temperature or
humidity or whatever may be now here, exactly at a, the membership function value is 0;
at x equal to b, the membership function value is 1.0 and once again, at x equal to c, the
membership function value is actually equal to 0.

Now, mathematically, how to represent it is very simple because this is the equation of
a straight line. So, I can use y = mx + c; there is no problem, I can find out one
expression. Similarly, I can find out another expression for this particular straight
line, and I can find out, corresponding to a particular value of x, what should be the value
for this particular µ ? It is very simple, now if you see the literature.

So, this triangular membership function distribution has been represented using this
particular expression and also, with the help of some max and min operator. For
example, say µ_triangle. So, this is the triangular membership function distribution. So,
µ_triangle = max(min((x − a)/(b − a), (c − x)/(c − b)), 0). Now, if I put x equals to a here, say x equals to a, if I

put, what will happen? So, if I put x equals to a, this will become 0 and the other term
will become non-zero, that is, c minus a divided by c minus b, so this will become greater than 1.
So, 0 comma a value greater than 1 and the minimum will be 0 and the maximum
between 0 and 0 will be 0. So, at x equals to a. So, mu will become equal to 0. Similarly,
you can check what happens at x equals to b. So, at x equals to b. So, this will become b
minus a divided by b minus a. So, this will give rise to 1 and here. So, c minus b divided
by c minus b will give rise to 1.0.

Now, the minimum between 1 and 1 is 1, and the maximum between 1 and 0 is 1. So, at x equals to b, the µ will become equal
to 1 and similarly, you can find out what will happen at x equals to c and once again, you
will be getting µ becomes equal to 0. So, this is the way actually, we can represent the
triangular membership function distribution.
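
A direct translation of this max-min expression into Python might look as follows; the end points a = 2, b = 6 and c = 10 are sample values used only to confirm the behaviour at a, b and c.

    def mu_triangle(x, a, b, c):
        # Triangular membership: 0 at x = a, 1 at x = b, 0 again at x = c.
        return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

    a, b, c = 2.0, 6.0, 10.0          # sample end points for checking the behaviour
    print(mu_triangle(a, a, b, c))    # 0.0 at x = a
    print(mu_triangle(b, a, b, c))    # 1.0 at x = b
    print(mu_triangle(c, a, b, c))    # 0.0 at x = c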

(Refer Slide Time: 21:58)

Now, next comes the trapezoidal membership function distribution, it is very simple. So,
this is x, the variable say temperature or humidity and this is your µ and now at x equals
to a, µ becomes equal to 0, at x equals to b, µ becomes equal to 1 and after that the
value of µ will remain constant up to this, then at x equals to c. Once again, the value
will be 1 and after that it will start decreasing and at x equals to d, the value of the
membership function will become equal to 0, and mathematically actually, this can be
represented using this particular formula.

So, this µ_trapezoidal = max(min((x − a)/(b − a), 1, (d − x)/(d − c)), 0). Let me take one very simple example,
supposing that x is equal to b. So, if I put x equals to b here. So, this will become 1
comma 1 and d minus your b. So, d minus b divided by d minus c. So, this will become
more than 1. So, the minimum among 1 comma 1 comma a value more than 1. So, the
minimum is 1 and maximum between 1 comma 0 is nothing, but your 1. So, µ becomes
equal to 1.

So, at x equals to b µ becomes equal to 1. So, this is actually the trapezoidal


membership function distribution, then comes the Gaussian distribution and you know
so, this particular Gaussian curve is nothing, but a non-linear curve now here. So, this is
actually the membership function distribution µ varies from 0 to 1 and this is the
variable say, temperature or humidity, whatever may be and this is your the Gaussian
distribution and for this particular Gaussian distribution, the mean is here that is denoted
by m and σ is nothing, but is your standard deviation. So, m denotes the mean and σ
denotes the standard deviation.

So, µ_Gaussian = 1 / e^(0.5((x − m)/σ)²). So, we can find out the expression for this particular µ, the
moment you select a particular value of x. So, we can find out. So, knowing the value for
this mean and standard deviation, we can find out what should be the value for this
particular µ and as I told that this is some sort of non-linear distribution for this
membership, ok?

(Refer Slide Time: 25:19)

Now, I am just going to concentrate on another very popular non-linear membership


function distribution and that is known as the Bell shaped membership function
distribution.

Now, the distribution is such so, this is x, this is your µ the membership function value.
So, it will start from here, there will be non-linear distribution and it will go on
increasing and after that it will reach 1 and it will be kept same then comes your it will
go on decreasing something like this, ok. So, this is nothing, but the bell shaped
membership function distribution and this is the mathematical expression for your
membership function distribution. So, µ_Bell-shaped = 1 / (1 + |(x − c)/a|^(2b)). Now, here, c indicates
that this is nothing, but the centre of this particular distribution, which is visible from
here. Now, what does this particular a indicate? Now, a denotes actually the width of this
particular distribution. So, a denotes actually the width of the distribution and this
particular b, that is considered a positive value, and it indicates actually the slope of this
particular distribution.

The slope of the distribution is denoted by b. Now, let us see what happens, if I take a
very high value for this b? Now if I take a high value for this particular b, the distribution
will be stiffer and if I take a low value for this particular b, the distribution will become
flatter one, ok.

So, this is the way actually we control the distribution of this particular function with the
help of a, b and c and this bell shaped membership function distribution is very popular.

(Refer Slide Time: 27:51)

Then comes another very popular membership function distribution that is very
frequently used and that is known as the sigmoid membership function and this is
actually the mathematical expression: µ_sigmoid = 1 / (1 + e^(−a(x − b))).

So, this particular a is nothing, but a positive value and here, I have got a negative sign
before a, but a itself is a positive value and a indicates the slope of this particular
distribution, the higher the value of this particular a, the stiffer will be the curve and
vice-versa.

So, if I put x equals to b in this particular expression. So, this will become 1 divided by 1
plus e raised to the power 0 and that is nothing, but 1. So, I will be getting your 1 divided
by 2, that is 0.5. So, corresponding to this x equals to b. So, I will be getting your µ , that
is equal to 0.5. So, this is actually the membership function distribution, which we are
very frequently using.
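
The remaining distributions discussed above (trapezoidal, Gaussian, bell-shaped and sigmoid) can be sketched in the same way; the parameter values in the print statements are illustrative and simply confirm the special points mentioned in the lecture, namely µ = 1 at x = b, at the mean and at the centre, and µ = 0.5 at x = b for the sigmoid.

    import math

    def mu_trapezoidal(x, a, b, c, d):
        # 0 at a, rises to 1 at b, stays at 1 up to c, falls back to 0 at d
        return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)

    def mu_gaussian(x, m, sigma):
        # m is the mean, sigma the standard deviation
        return 1.0 / math.exp(0.5 * ((x - m) / sigma) ** 2)

    def mu_bell_shaped(x, a, b, c):
        # c: centre of the distribution, a: width, b (positive): controls the slope
        return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

    def mu_sigmoid(x, a, b):
        # a (positive): slope of the curve, b: the point where the membership is 0.5
        return 1.0 / (1.0 + math.exp(-a * (x - b)))

    print(mu_trapezoidal(4.0, 2.0, 4.0, 8.0, 10.0))   # 1.0 at x = b
    print(mu_gaussian(10.0, 10.0, 3.0))               # 1.0 at the mean
    print(mu_bell_shaped(10.0, 2.0, 3.0, 10.0))       # 1.0 at the centre c
    print(mu_sigmoid(6.0, 2.0, 6.0))                  # 0.5 at x = b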

(Refer Slide Time: 29:17)

Now, whatever I have discussed in this particular lecture, you can consult the book
written by me, that is, Soft Computing Fundamentals and Applications.

So, you will be getting all the details in this particular textbook, which is the textbook
for this course.

(Refer Slide Time: 29:38)

Now, let me conclude, whatever I discussed in this particular lecture. So, to start with I
gave a brief introduction to the concept of the classical set or the crisp set. I discussed the
10 properties of the classical set or the crisp set, I tried to find out the reason, why should

we go for the concept of fuzzy set, the concept of fuzzy set has been defined with the
help of suitable examples and we have seen, how to represent the different fuzzy sets; the
fuzzy sets could be either discrete or continuous, but we should be able to
represent those fuzzy sets.

Thank you.

Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 03
Introduction to Fuzzy Sets (Contd.)

We are discussing the grammar of fuzzy sets. So, we will be continuing with the topic
introduction to fuzzy sets.

(Refer Slide Time: 00:27)

Now, at the beginning of this lecture. So, we are going to discuss how to solve some
numerical examples related to determination of membership value for different types of
membership function distribution and after that we are going to define a few terms
related to fuzzy sets, some standard operations used in fuzzy sets will be discussed.

The properties of fuzzy sets will be explained and, at the end, two terms, namely
fuzziness and inaccuracy of fuzzy sets, will be defined, and we will be solving some
numerical examples like how to determine the fuzziness and inaccuracy of fuzzy sets.

(Refer Slide Time: 01:24)

Now, to start with the numerical examples related to determination of the membership
value, that is, µ , let me start here with a triangular membership function distribution and
our aim is to determine the membership value, that is, µ corresponding to a particular
value of the variable say x equals to 8.0.

Now, here, I am just going to consider one triangular membership function distribution.
So, this is nothing, but the triangular membership function distribution and if you see
now, here, I have written a equals to 2, b equals to 6 and c equals to 10. Now, using these
a, b and c, mathematically I can define. So, this particular triangular membership
function distribution, I have already discussed in the last lecture. Now, along this axis, it
is the variation of µ , and µ varies from 0.0 to 1.0.

Now, here let us see how to determine the value of µ corresponding to a particular value
of x and here, we have assumed x equals to 8.0.

(Refer Slide Time: 02:50)

Now, this µtriangle is nothing, but maximum of the minimum between x minus a divided

by b minus a comma c minus x divided by c minus b. So, what you will have to do is,
you will have to find out the minimum between these two, and after that you will have to
compare this particular minimum and 0, and find out the maximum. Now, here, if I just
insert the values for this a, b and c. So, by substituting the values for a, b and c we get the
maximum between the minimum of x minus 2 divided by 6 minus 2, comma 10 minus x
divided by 10 minus 6, comma 0. Here, a equals to 2, b equals to 6 and c equals to 10.

Now, if you simplify. So, this can be written as maximum of the minimum between x
minus 2 divided by 4 comma 10 minus x divided by 4 comma 0. Now, if you put x
equals to 8, then we will be getting µtriangle equals to maximum of the minimum between.

So, x equals to 8 if I put. So, this will become 6 by 4 and that is nothing, but 3 by 2 and
here, if I put x equals to 8, this will become 2 by 4 is nothing, but 1 by 2. So, what you
can do is. So, we can find out the minimum between 3 by 2 comma 1 by 2 and the
minimum is your half and now, we will try to find out the maximum between half and 0
and that is nothing, but half. So, this µtriangle is coming to be equal to 0.5.

So, this is the way corresponding to a particular value of x, we can find out, what should
be the value for the membership for the triangular membership function distribution.

(Refer Slide Time: 05:02)

Now, we are going to discuss what happens in case of trapezoidal membership function
distribution. So, once again our aim is to determine the value of this particular µ , that is
the membership function value corresponding to x equals to 3.5. Now, this is nothing,
but the trapezoidal membership function distribution. So, this is the trapezoidal
membership function distribution. So, here a is kept equal to 2, b is kept equal to 4, c is
kept equal to 8 and d is equal to 10. Now, using this information of a, b, c and d,
mathematically you can express µ_trapezoidal.

(Refer Slide Time: 05:53)

So, if you see that µ_trapezoidal, that particular mathematical expression, this will become

something like this. So, µ_trapezoidal is nothing but the maximum between two terms. Now, the

first is actually, we will have to find out the minimum among your x minus a divided by
b minus a comma 1 comma d minus x divided by d minus c. So, we will have to find out
the minimum among these three and then, will have to compare with 0 and will have to
find out the maximum.

Now, if we substitute the values for these a, b, c and d, we will be getting the maximum
between the minimum among x minus 2 divided by 4 minus 2 comma 1 comma 10 minus
x divided by 10 minus 8 comma 0. Now so, if you simplify this, it will become the
minimum among x minus 2 divided by 2 comma 1, 10 minus x divided by 2 and we
compare the minimum among these three and this 0, and we will try to find out the
maximum and whatever you get. So, let me just try to see.

(Refer Slide Time: 07:18)

If we put x equals to 3.5, we will be getting µ_trapezoidal is nothing but the maximum

between these two, and before that, we will have to find out the minimum among these
three, that is, your 1.5 divided by 2 comma 1 comma 6.5 divided by 2.

So, the minimum among these three is nothing, but 0.75 that is your 1.5 divided by 2 and
now, we will have to compare 0.75 and 0 and will have to find out the maximum and that

is nothing but 0.75. So, you can find out the value for this µ_trapezoidal, and that is becoming

equal to 0.75.

(Refer Slide Time: 08:12)

Now, then we are going to concentrate on the Gaussian membership function distribution
and our aim is to determine membership value. So, this particular µ corresponding to
your x equals to 9.0. Now this is nothing, but the Gaussian membership function
distribution.

So, if you see, this is the Gaussian membership function distribution. So, this is x
direction and this is the µ and we know the mathematical expression for this particular
Gaussian membership function distribution.

(Refer Slide Time: 08:44)

So, the mathematical expression for this is nothing but µ_Gaussian = 1 / e^(0.5((x − m)/σ)²),
where m is nothing but the mean of the Gaussian distribution and σ denotes the standard
deviation. Now we are
going to substitute the values for your m equals to 10.0 and the standard deviation is
equal to 3.0.

So, we will be getting this particular expression that is mu Gaussian is nothing, but 1
divided by e raised to the power half x minus 10 divided by 3 square, and now we put x
equals to 9.0 then µGaussian is nothing, but 1 divided by e raised to the power half. So, x
equals to 9.0 minus 10.0 divided by 3.0 square and if you simplify you will be getting
0.9459. So, we can find out, what should be the value for this membership function
corresponding to the Gaussian distribution.

(Refer Slide Time: 10:02)

So, this is how to tackle the Gaussian distribution. Now, we are going to concentrate on
another membership function distribution, that is called the bell-shaped membership
function distribution, and our aim is to determine the value for µ corresponding to, say, x
equals to 8.0.

Now, let us see how to do it and this is nothing, but is your bell-shaped membership
function distribution. So, this is the distribution and µ varies from 0 to 1. So, let us try
to find out the value for µ corresponding to x equal to 8.0.

(Refer Slide Time: 10:44)

Now, this is the mathematical expression, which I have already discussed, that
µ_Bell-shaped = 1 / (1 + |(x − c)/a|^(2b)), and I have already discussed the meaning of a, b
and c. Now, here, we take c equal to 10.0, and that is actually nothing but the centre of
this distribution; a indicates the spread of this distribution, and let us consider a equal
to, say, 2.0 and b equal to 3.0, we have assumed.

Now, we put c is equal to 10, a is equal to 2, b is equal to 3.0. So, we will be able to find
out. So, this µ Bell − shaped is nothing, but 1 divided by say 1 plus the mod value of 8 minus

10 divided by 2 raised to the power 6. Now 8 minus 10 is minus 2 divided by 2 is equal


to minus 1, the mod value of that is nothing, but plus 1, it is raised to the power of 6. So,
this will become equal to 1. So, 1 plus 1 is 2, and 1 divided by 2 is going to give rise to 0.5.

So, this µ Bell − shaped is equal to 0.5, corresponding to x equals to 8.0. Now, this is the way

actually, we should be able to find out.

(Refer Slide Time: 12:37)

The value for the membership function distribution. Now, I am just going to concentrate
on another very popular membership function distribution that is known as sigmoid
membership function distribution. Now, this sigmoid membership function distribution is

mathematically expressed as follows. So, µ_sigmoid = 1 / (1 + e^(−a(x − b))). Now, our aim is to
determine the value for this µ corresponding to x equal to 8.0 now.

So, this is actually the distribution for this sigmoid membership function. Now,
corresponding to your b equals to 6.0 and a equals to 2, I will be getting the expression
for µ sigmoid , that is nothing, but 1 divided by 1 plus e raised to the power minus a, (a

equals to 2) multiplied by x minus b, b is equal to 6. Now, here, if you substitute the


value for x, that is, x equals to 8.0, you will be getting µ_sigmoid is nothing but 1

divided by 1 plus e raised to the power minus 2 multiplied by 2.0. So, this is nothing but 1
divided by 1 plus e raised to the power minus 4 and that is nothing, but 0.98,
corresponding to x equals to 8.0.

So, we are able to find out the value for this particular µ for the sigmoid membership
function distribution and that is coming to be equal to 0.98. So, this is the way actually
we can determine the membership function value for various distributions used for the
representing the fuzzy sets.

(Refer Slide Time: 14:44)

So, till now actually we have discussed, we have defined, the concept of fuzzy sets and
the concept of the classical sets. Now, to recapitulate the definitions and the difference
between the classical set, that is, the crisp set, and the fuzzy set, let me take another
example; this particular example actually is going to tell us what is the difference
between the fuzzy sets and the crisp sets more clearly.

Now, this particular example is something like this supposing that I am just going to
invite my friends for today’s dinner party and the time has been given as 7 pm for the
dinner. Supposing that, I have invited a large number of friends. Now, some of my
friends will be coming might be at 6.50 pm, some of them may come at 6.55 pm, some
will be coming exactly at 7.00 pm, some may also come at say 5 minutes past 7 or some
other may come say 10 minutes past 7, and so on. Although I have invited them and I
have requested them to come exactly at 7 pm. So, some of them may come before 7 and
some other people may come after 7.

Now, if you see the concept of the classical set or the crisp set, this particular distribution
indicates the crisp set representation for 7 pm. So, this particular thing, once again let me
tell you, this is the crisp set representation for 7.00 pm. Now, the fuzzy set
representation for 7 pm is something like this. So, this is actually the fuzzy set
representation for 7 pm; that means, if some of my friend is coming at 6.50 pm will
assume that he or she or they have come at 7 pm with the membership function value say
0.33. Similarly, if a friend comes at 6.55 pm, it is assumed that he has come for the
dinner which is scheduled at 7 pm with the membership function value of 0.66, the same
situation occurs, like, if one comes at 7.05 pm; that means, 5 minutes past 7.00 pm.

Then, we will assume that he has attended that particular party and he has come at 7 pm
with the membership function value of 0.66 and if he comes at 7.10pm, we will assume
that he has come at 7 pm with the membership function value of 0.33. Now, here, let us
try to understand.

Now, the 7 pm, as I told, has been expressed like this. On the other hand, the 7 pm in
fuzzy set has been expressed by this particular triangle, ok. Now, let me try to find out
the difference between the classical set and the fuzzy set. So, this is nothing, but the crisp
set representation for 7 pm and this is nothing, but the fuzzy set representation for the 7
pm. Now, my query is, which one is more practical? Now, we know it is a bit difficult to
join the party exactly at 7 pm.

So, this particular distribution for 7 pm, that is, the crisp set representation, is a bit difficult
to achieve, whereas the fuzzy set representation for the 7 pm is a little bit easier to

achieve and that is why, the fuzzy set representation for the 7 pm is a more practical way
of representing the time, that is your 7 pm. So, let me conclude that fuzzy set
representation could be more practical compared to the crisp set representation for
the 7 pm. So, I hope, you have understood the difference between the representation of
crisp set and this particular the fuzzy set.
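
One way to reproduce the membership values quoted above is a triangular membership function centred at 7 pm; the 15-minute half-width in the sketch below is an assumption that gives roughly the 0.33 and 0.66 grades used in the lecture.

    def mu_seven_pm(minutes_from_7pm, half_width=15.0):
        # Triangular membership for "around 7 pm"; the 15-minute half-width is an
        # assumed value that roughly reproduces the 0.33 / 0.66 grades quoted above.
        return max(1.0 - abs(minutes_from_7pm) / half_width, 0.0)

    for t in (-10, -5, 0, 5, 10):              # 6:50, 6:55, 7:00, 7:05, 7:10 pm
        print(t, round(mu_seven_pm(t), 2))     # about 0.33, 0.67, 1.0, 0.67, 0.33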

(Refer Slide Time: 19:34)

Now, I am just going to concentrate on a few terms, which are very frequently used in
fuzzy sets and I am just going to define those terms and I am just going to solve a few
numerical examples also. Now, the first term, which I am going to define related to fuzzy
sets is very popular, that is known as the α − cut of a fuzzy set. Now, supposing that I
have got a fuzzy set which is denoted by A(x). Now, its α-cut is represented by this
α µ_A(x). So, this is nothing but the α-cut of the fuzzy set.

Now, the value for this particular alpha will vary from 0.0 to 1.0. Now, if I insert a
particular value for this particular alpha, for example, say alpha equals to say 0.7. So,
truly speaking, I am just going to find out, what should be the 0.7-cut of the fuzzy set.
Now, let us see how to define this α − cut of a fuzzy set, which is represented by
α µ_A(x) = {x | µ_A(x) ≥ α}. So, this is actually the definition for this particular α-cut.

(Refer Slide Time: 21:10)

Now, I am just going to take one example just to tell you, what do we mean by the
definition of alpha cut. Now, let me assume that this particular triangular membership
function distribution is going to represent a fuzzy set and that is nothing, but A(x). So,
A(x) is going to represent a fuzzy set and it is α − cut , I am just going to define. Now,
supposing that the α is here say 0.4 or 0.45 something like this. Now, corresponding to
this particular α , you draw one line here. So, this is the line and it is going to intersect at
these particular points.

Now, corresponding to this particular point, you try to find out the value for this
particular x and according to the definition of this particular α − cut , this µ A ( x) should

be either greater than or equal to α . So, this is the value of α . So, more than α means,
I am here, ok. So, either alpha or more than alpha and if I just follow that, I will be able
to find out the subset of this particular fuzzy set and that is nothing, but this. So, this is
known as the α − cut of this particular the fuzzy set. So, this is the way actually we can
define the α − cut of that particular fuzzy set.

Now, another term I am just going to define here, and that is known as the strong α-cut
of a fuzzy set. By the strong α-cut of a fuzzy set, we mean ^α+ µ_A(x). So, just to
indicate the strong α-cut, we take the help of this particular plus symbol, and by
definition this is nothing but {x | µ_A(x) > α}; that is, the inequality is now strict. So,
this is actually the definition of the strong α-cut of a fuzzy set.
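To make the distinction concrete, here is a minimal Python sketch of my own, using the discrete fuzzy set that appears repeatedly later in this lecture; the choice α = 0.3 is an assumption picked so that the ordinary and strong cuts differ.

A = {'x1': 0.1, 'x2': 0.2, 'x3': 0.3, 'x4': 0.4}   # discrete fuzzy set A(x)
alpha = 0.3

alpha_cut        = sorted(x for x, mu in A.items() if mu >= alpha)   # mu_A(x) >= alpha
strong_alpha_cut = sorted(x for x, mu in A.items() if mu >  alpha)   # mu_A(x) >  alpha

print(alpha_cut)          # ['x3', 'x4']
print(strong_alpha_cut)   # ['x4']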

(Refer Slide Time: 23:31).

Now, we are going to solve one numerical example, just to make it more clear.

Now, the statement of the numerical example is as follows: the membership function
distribution of a fuzzy set is assumed to follow a Gaussian distribution with mean m = 100
and standard deviation σ = 20. Determine the 0.6-cut of this particular distribution. So, I
have got a membership function distribution, which is nothing but Gaussian, and whose
mathematical expression is

µ = 1 / e^{0.5((x − m)/σ)²},

where m is the mean and σ is the standard deviation. So, what we do is, we substitute
the values µ = 0.6, m = 100 and σ = 20. Now, let us try to find out what should be the
values of x for this µ.

(Refer Slide Time: 24:59)

Now, what I do is, substitute the values for the µ , m and σ , and this is nothing, but is
your the Gaussian membership function distribution. So, this is the Gaussian
membership function distribution with mean equals to m.

Now, if you substitute the values, this will become 0.6 = 1 / e^{0.5((x − 100)/20)²}, and
if you simplify, e^{0.5((x − 100)/20)²} = 1/0.6. Now, we can take log (base e) on both
sides, so 0.5((x − 100)/20)² = ln 1.6667, and if you simplify, you will be getting the
values for x as 79.7846 and 120.2153.

Now, if I just plot. So, on this plot, if I just indicate the values, this is nothing, but
79.7846 and this is nothing, but 120.2153 and here. So, you will be getting one range for
this particular x. So, this is nothing, but is your 0.6-cut of this Gaussian distribution. So,
this is the way, actually you will be able to find out the α − cut of that particular fuzzy
set.
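The two boundary values can be checked with a short calculation; this is just a verification sketch of my own in Python, not part of the lecture.

import math

m, sigma, mu = 100.0, 20.0, 0.6
# From mu = 1 / e^{0.5((x - m)/sigma)^2}, solve for x:
delta = sigma * math.sqrt(2.0 * math.log(1.0 / mu))
print(m - delta, m + delta)   # approximately 79.78 and 120.22, matching the values above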

(Refer Slide Time: 26:51)

Now, we are going to define another term, and that is known as the support of a fuzzy set.
Let us see what we mean by the support of a fuzzy set. By definition, the support of a
fuzzy set A(x) is nothing but the set of all x belonging to the universe of discourse, that is,
capital X, such that µ_A(x) is greater than 0. So, this is what we actually mean by the
support of a fuzzy set.

So, mu_A(x) is greater than 0, if I consider. Now, let me try to consider say, this is a
particular fuzzy set, say the triangular membership function distribution, if I consider.
So, this is nothing, but A(x). So, µ is actually along this direction. So, this is 0.0 and
corresponding to this is 1.0. Now, let me try to read the definition once again, support of
a fuzzy set A(x) is nothing, but x belongs to capital X, that is the universe of discourse,
such that µ A ( x) is greater than 0. So, corresponding to this, µ is equal to 0.0 and if I

consider that µ is greater than 0. So, as if I am just going to touch, I am just going to
indicate the limiting value for this particular x and this is going to indicate the support of
a fuzzy set and here, you see x belongs to capital X.

So, this is nothing but x. So, x belongs to capital X, that is universe of discourse and this
particular condition holds good, that is, µ A ( x) is greater than 0. So, this is going to
indicate the support of a fuzzy set. Now, the next is the scalar cardinality of a fuzzy set.
So, by scalar cardinality, we mean the mod value of A(x). So, A(x) is nothing but the
fuzzy set, and its scalar cardinality is denoted by |A(x)|, and that is nothing but the
summation of µ_A(x) over all x belonging to capital X. So, this is nothing but the sum of
all the membership function values.

(Refer Slide Time: 29:26)

Now, I am just going to solve one numerical example, just to find out the scalar
cardinality of a particular fuzzy set.

Now, let me assume that say this is nothing, but a fuzzy sets having say the discrete
values corresponding to the four values of the elements. So, A(x) is nothing, but x_1
comma 0.1, x_2 comma 0.2, x_3 comma 0.3, x_4 comma 0.4. So, this is nothing, but a
discrete fuzzy set. Now, a scalar cardinality is denoted by this particular symbol and that
is nothing, but the sum of all the µ values; that means, we have got 0.1 + 0.2 + 0.3 + 0.4
and this is nothing, but 1.0. So, 1.0 is coming as scalar cardinality of this particular fuzzy
set A(x).
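A tiny Python sketch of my own, restating these two definitions for the same discrete set:

A = {'x1': 0.1, 'x2': 0.2, 'x3': 0.3, 'x4': 0.4}

support     = sorted(x for x, mu in A.items() if mu > 0)   # elements with mu_A(x) > 0
cardinality = sum(A.values())                              # scalar cardinality |A(x)|

print(support)                 # ['x1', 'x2', 'x3', 'x4']
print(round(cardinality, 2))   # 1.0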

Thank you.

Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 04
Introduction to Fuzzy Sets (Contd.)

(Refer Slide Time: 00:14)

Now, we are going to define another term of a fuzzy set and that is known as the core of
a fuzzy set. Now, let us see what we mean by the core of a fuzzy set. The core of a
fuzzy set is defined as its one-cut. Now, let me just try to draw one fuzzy set here,
supposing that this is nothing, but a fuzzy set, that is, your A(x) and the µ is varying; so
this is 0.0 and this is nothing, but 1.0 and this is the x direction.

Now, here, this is nothing, but a one-cut; one-cut means what? According to the
definition. So, µ should be greater than or equal to 1, but it cannot be more than 1. So, it
is exactly equal to 1. So, corresponding to this particular 1.0, that is µ equals to 1, I can
find out what should be the corresponding value for this particular x, ok. So, this is
actually the value for the x for which µ becomes equal to 1.0 and this is going to
indicate the core of a fuzzy set. So, the core of a fuzzy set is nothing, but it is a one-cut.
So, I hope, the meaning for this particular term is clear to you.

Now, I am just going to define another term, that is called the height of a fuzzy set. Now,
how to define the height of a fuzzy set? Corresponding to the different values of the
variable x, I can find out the corresponding membership function values: for one value
of x, what is the value of µ; for another value of x, what is the value of µ; and so on. So,
corresponding to the different values of x, I can find out the membership function values,
and out of all the membership function values, we try to find out the largest one.

So, the height of a fuzzy set is defined as the largest of all the membership function
values corresponding to the different values of this particular element. Now, here, if I
consider. So, here, the maximum value for the membership is nothing, but 1.0. So, height
of this particular fuzzy set is nothing, but is your 1.0. So, corresponding to this particular
fuzzy set, its height is nothing, but 1.0. So, this is the way actually, we define the height
of a fuzzy set.

(Refer Slide Time: 03:09)

Now, I am just going to define another term, that is called the normal fuzzy set. A
fuzzy set is called normal, if its height is found to be equal to 1.0. Now, let me draw one
membership function distribution, and this is nothing but a fuzzy set. So, this A(x) is
the fuzzy set, and µ varies from 0.0 up to 1.0 here. Now, what is the height here? Here,
the height of this particular fuzzy set is nothing but 1.0, and so this is a normal fuzzy set.
Now, there is another concept that is called the sub-normal fuzzy set.

Now, let me take another example, now supposing that I am drawing a fuzzy set here and
this is the µ , say µ is equal to 0.0 and here, I have got 1.0. Now, supposing that I have
got a fuzzy set something like this and if I can say and this is the x. Now, if I try to find
out the height, this is nothing, but the height of this particular fuzzy set, if this is your A
(x), ok. Now, here so this particular height is less than 1.0 because a 1.0 is here and this
particular height could be around say 0.8 or 0.7 something like this, ok. Now, here the
height of the fuzzy set is less than 1 and this is what you mean by the sub-normal fuzzy
set.
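The definitions of core, height and normality can be pulled together in a few lines of Python; the particular membership values below are my own hypothetical example (one element has µ = 1.0 so that the set is normal and the core is non-empty).

A = {'x1': 0.2, 'x2': 0.6, 'x3': 1.0, 'x4': 0.5}   # a hypothetical discrete fuzzy set

height    = max(A.values())                                 # largest membership value
core      = sorted(x for x, mu in A.items() if mu == 1.0)   # the one-cut of the fuzzy set
is_normal = (height == 1.0)

print(height, core, is_normal)   # 1.0 ['x3'] True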

(Refer Slide Time: 04:57)

So, we have defined the normal fuzzy sets and the sub-normal fuzzy sets. So, we have
defined a few terms related to the fuzzy sets and now, we are going to concentrate on
some standard operations, which are very frequently used in fuzzy sets and let me try
with the first operation, that is called the proper subset of a fuzzy set. Now, how to
define the proper subset of a fuzzy set? Supposing that I have got two fuzzy sets say A(x)
and another is your B(x) and these are defined in the universe of discourse or say x
belongs to capital X.

Now, I am just going to compare these two fuzzy sets, and I am just going to show how
to carry out this particular operation, that is, how to declare that one fuzzy set is a proper
subset of another fuzzy set. Now, here, the set A(x) will be called a proper subset of B(x),
if µ_A(x) is less than µ_B(x). So, if this particular condition holds good, we declare that
A(x) is a proper subset of this particular B(x). Now, I am just going to take the help of one
numerical example.

(Refer Slide Time: 06:32)

Now, supposing that I have got two fuzzy sets, two discrete fuzzy sets: one is A(x) is
nothing, but x_1 comma 0.1, x_2 comma 0.2, x_3 comma 0.3, x_4 comma 0.4 and
another that is B(x) x_1 comma 0.5, x_2 comma 0.7, x_3 comma 0.8, and x_4 comma
0.9. Now, I am just going to compare. Now, for all x belonging to capital X, there is the
universe of discourse; now if I compare the element-wise, their µ values for example,
corresponding to x_1. So, if I compare these two corresponding to x_2, if I compare
these two then corresponding to x_3, if I compare these two then corresponding to x_4, if
I compare these two.

So, we can observe that µ A ( x) is less than µ B ( x) because 0.1 is less than 0.5, 0.2 is less
than 0.7, and so on. So, we can declare that this particular A(x) is a proper subset of your
B(x) and because this particular condition holds good. So, this is the way, actually we
can compare and declare whether a particular set is a proper subset of another set or not.

(Refer Slide Time: 08:21)

So, the next operation is actually how to declare that two fuzzy sets are equal. Now, let
me take the same example, say I have got two fuzzy sets A(x) and B(x) defined in the
same universe of discourse. Now, we call like A(x) is equal to B(x), if and only if µ A ( x)

is found to be equal to your µ B ( x) .

(Refer Slide Time: 08:54)

Now, what we will have to do is compare element-wise. So, take the two fuzzy sets, for
example, A(x) with x_1, 0.1; x_2, 0.2; x_3, 0.3; x_4, 0.4 and B(x) defined something like
this, and compare the µ values element-wise. Now, if I compare, then 0.1 and 0.5 are not
equal, 0.2 and 0.7 are not equal, and so on. So, my decision is that µ_A(x) is not equal to
µ_B(x); that means, A(x) is not equal to B(x).
not equal to your µ B ( x) ; that means, your A(x) is not equal to the B(x).

So, the fuzzy set A(x) is not equal to fuzzy set B(x). So, this is the way actually, we can
compare and declare whether the two fuzzy sets are equal or not.
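Both checks (the proper-subset test stated earlier and this equality test) reduce to element-wise comparisons of the µ values; here is a small Python sketch of my own for the two discrete sets used in these examples.

A = {'x1': 0.1, 'x2': 0.2, 'x3': 0.3, 'x4': 0.4}
B = {'x1': 0.5, 'x2': 0.7, 'x3': 0.8, 'x4': 0.9}

is_proper_subset = all(A[x] < B[x] for x in A)    # mu_A(x) < mu_B(x) for every x
are_equal        = all(A[x] == B[x] for x in A)   # mu_A(x) = mu_B(x) for every x

print(is_proper_subset, are_equal)   # True False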

(Refer Slide Time: 09:53)

Now, I am just going to discuss another operation, which is nothing but the complement
of a fuzzy set. Now, how to determine the complement of a fuzzy set? By definition, the
complement of a particular fuzzy set A(x) is denoted by Ā(x), and its membership value
is nothing but 1 minus µ_A(x). So, this is the definition of the complement of a fuzzy
set.

Now, supposing that I have got the fuzzy set like this, that is, A(x). So, this is nothing but
the fuzzy set, and its complement is nothing but this. How to find out the complement? It
is very simple. Now, here, this is nothing but A(x). Corresponding to this value of x, what
is the value of µ? µ is equal to 0, and corresponding to this one also, µ is equal to 0. So,
1 minus 0 is nothing but 1, and up to this point I will be getting this type of complement.
Then, from here to here, this µ is going to increase from 0 to 1; that means, Ā(x), its
complement, is going to decrease starting from 1 down to 0.

Now, you concentrate here, corresponding to this value of x. So, this is the value of µ
corresponding to A(x) and that is nothing, but 1. So, 1 minus 1 is 0. So, this will be the
µ corresponding to its complement and starting from here, the value of µ is decreasing.
So, if I concentrate on its complement, the value for the µ will be increasing from 0 to 1
and after that, the value for the µ is kept equal to 0. So, 1 minus 0 is nothing, but 1. So,
the dotted is going to indicate actually the complement of the fuzzy set.

(Refer Slide Time: 12:17)

Now, I am just going to solve one numerical example just to show you like how to find
out the complement of a fuzzy set.

Now, supposing that I have got a discrete fuzzy set something like this. So, A(x) is
nothing, but this, that is, x_1 0.1 comma x_2 0.2 comma x_3 0.3 comma x_4 0.4 is
nothing but A(x). Now, for its complement Ā(x), element-wise I will have to
find out 1 minus that µ . So, here, µ is 0.1. So, 1 minus 0.1 is nothing, but 0.9. So, for
its complement corresponding to x_1, the µ value will be 0.9, corresponding to x_2 it
will be 1 minus 0.2 and that is nothing, but 0.8. So, I will be getting this then
corresponding to x_3, I will be getting 0.7, corresponding to x_4. So, I will be getting
0.6. So, following this particular principle; we can find out the complement of the fuzzy
set.
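As a one-line check in Python (my own, not from the slides), the complement of the discrete set is obtained element-wise as 1 − µ:

A = {'x1': 0.1, 'x2': 0.2, 'x3': 0.3, 'x4': 0.4}
A_bar = {x: round(1.0 - mu, 2) for x, mu in A.items()}
print(A_bar)   # {'x1': 0.9, 'x2': 0.8, 'x3': 0.7, 'x4': 0.6}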

(Refer Slide Time: 13:37)

Now, I am just going to discuss the concept of intersection of two fuzzy sets, now once
again let me repeat that I have got two fuzzy sets A(x) and B(x) defined in the same
universe of discourse, that is capital X. Now, how to find out the intersection of these
two fuzzy sets, now intersection of these two fuzzy sets is denoted by this particular
symbol. So, A intersection B(x) is nothing, but intersection of the two fuzzy sets A(x)
and B(x). Now, to define this particular intersection, so what we do is, we try to find out
what should be the µ value and we will have to compare, in fact, the µ values. So, mu
A intersection B(x) is nothing, but the minimum between µ A ( x) and µ B ( x) . So, the two

µ values will have to compared and we will have to find out its minimum and that is
going to indicate the intersection of the two fuzzy sets.

(Refer Slide Time: 14:57)

Now, I am just going to take actually one numerical example just to show you like how
to determine the intersection of two fuzzy sets, but before that, let me try to find out this.
Now, as I told that this is nothing, but A(x) one fuzzy set and this is another fuzzy set,
that is B(x) defined on the same universe of discourse and this is actually the direction of
x and this is the µ , now the moment I am here. So, corresponding to this A(x), I have
got µ A ( x) and that value is what? That is your 0 here, but µ B ( x) , the B(x) has not yet
been started. So, I am just going to proceed in this particular direction. The moment, I
come here, so I have got the membership function value corresponding to this particular
A(x). So, this is nothing, but the membership function value corresponding to this A(x)
and corresponding to this B(x) the membership function value is equal to 0.0 and we try
to consider the minimum.

So, the minimum between 0.0 and a particular value, which is very near to say 0.9. So,
the minimum is 0. So, I am just going to consider that µ corresponding to this is
nothing, but 0 its intersection is nothing, but 0, the moment I consider another value for
this particular x. So, corresponding to A(x), I will be getting some µ corresponding to
the B(x). I will be getting another µ you compare and you consider the minimum. So,
ultimately, I will be getting this type of the area which is the common to both the fuzzy
sets.

So, by intersection actually, what we mean is the common area between the two fuzzy
sets. So, this is actually the area common to the two fuzzy sets, and this is nothing but the
intersection of the two fuzzy sets, and this is similar to the logical AND operation. So, by
the AND operation actually, we always try to consider the minimum, and this is also
known as the min operation or the min operator. So, AND is nothing but the min operator,
and that is nothing but the concept of intersection of two fuzzy sets.

(Refer Slide Time: 17:31)

Now, I am just going to solve one numerical example, now supposing that I have got two
fuzzy sets, two discrete fuzzy sets one is A(x) is nothing, but x_1 comma 0.1 comma x_2
comma 0.2 comma x_3 comma 0.3 comma x_4 comma 0.4. So, this is nothing, but the
fuzzy set A(x) and I have got another fuzzy set which is nothing, but the B(x). Now,
what we do is, we concentrate element-wise first you try to concentrate on this particular
the x_1. So, µ( A∩ B ) ( x1 ) is nothing, but the minimum between µ A ( x1 ) and µ B ( x1 ) .

So, what we do is, we compare these two values, that is, the minimum between 0.1 and
0.5, and the minimum is nothing but 0.1. Next, we concentrate on x_2: µ_(A∩B)(x_2) is
nothing but the minimum between 0.2 and 0.7, and we will be getting 0.2. Next, we
concentrate on x_3: µ_(A∩B)(x_3) is nothing but the minimum between 0.3 and 0.8, and
that is nothing but 0.3. Next, we find out µ_(A∩B)(x_4), and that is nothing but the
minimum between 0.4 and 0.9, so you will be getting 0.4. So, this is the way actually we
can find out the intersection.
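In Python (my own sketch), the element-wise min reproduces these numbers:

A = {'x1': 0.1, 'x2': 0.2, 'x3': 0.3, 'x4': 0.4}
B = {'x1': 0.5, 'x2': 0.7, 'x3': 0.8, 'x4': 0.9}
A_and_B = {x: min(A[x], B[x]) for x in A}   # intersection = element-wise minimum
print(A_and_B)   # {'x1': 0.1, 'x2': 0.2, 'x3': 0.3, 'x4': 0.4}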

(Refer Slide Time: 19:36)

Then comes the concept of the union of two fuzzy sets. Now, let me once again consider
the two fuzzy sets A(x) and B(x) defined in the same universe of discourse. Their union
is represented by (A ∪ B)(x), such that its membership function value, that is,
µ_(A∪B)(x), is nothing but the maximum between µ_A(x) and µ_B(x). So, what we do
is, we try to compare the membership function values element-wise and we are going to
consider the maximum.

(Refer Slide Time: 20:27)

So, I am just going to take some example. Now, here you can see that I have got one
membership function distribution and one fuzzy sets like A(x) and another fuzzy sets like
B(x) defined in the same universe of discourse.

Now, I am just varying the value of x, the moment I am here, I have got the µ
corresponding to your say A(x) is nothing, but 0 but corresponding to the B(x), suppose
that this is absent. So, we consider this thing as the maximum, now the moment I am
here, corresponding to this A(x) might be this is the membership function value, say 1.0
and corresponding to B(x) the membership function value is 0.0 and if I just compare the
maximum will be 1.0. So, I will have to consider up to this. Next, the moment whenever
I am here, so corresponding to this particular A(x), I will be getting some µ value and
corresponding to the B(x), I will be getting some µ value, and we will have to consider
the maximum.

Now, if I follow this particular method, then I will be getting this shaded portion as the
union of these two fuzzy sets. So, by union, we mean actually the OR operator, and this
OR operator is nothing but the max operator. So, we try to find out the maximum
between the two µ values, and that will give you the concept of union; that is, the OR
operator is nothing but the max operator, and that is nothing but the union of the two
fuzzy sets.

(Refer Slide Time: 22:42)

Now, I am just going to just solve one numerical example just to give you the concept of
this particular union of two fuzzy sets.

Now, let me consider once again that I have got two fuzzy sets like one is A(x) and
another is B(x) and these particular fuzzy sets are nothing, but the discrete fuzzy sets. So,
element-wise we have got the membership function values, that is, the µ values. Now, if I
want to find out µ_(A∪B)(x_1), that means, corresponding to this particular x_1, I am
just going to compare the two µ values, that is, µ_A(x_1) and µ_B(x_1), and we try to
find out the maximum; that means, we try to find out the maximum between 0.1 and 0.5,
and the maximum value will be 0.5. Similarly, corresponding to this particular x_2, we
are trying to compare the two µ values.

So, µ_(A∪B)(x_2) is nothing but the maximum between 0.2 and 0.7, and the maximum
value is 0.7. Then, corresponding to x_3, µ_(A∪B)(x_3) is nothing but the maximum
between 0.3 and 0.8, so I will be getting 0.8. Then, corresponding to this particular x_4,
µ_(A∪B)(x_4) is nothing but the maximum between 0.4 and 0.9, and this is nothing but
0.9. So, I can find out the union of these two fuzzy sets.
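Again in Python (my own sketch), the union is just the element-wise max:

A = {'x1': 0.1, 'x2': 0.2, 'x3': 0.3, 'x4': 0.4}
B = {'x1': 0.5, 'x2': 0.7, 'x3': 0.8, 'x4': 0.9}
A_or_B = {x: max(A[x], B[x]) for x in A}   # union = element-wise maximum
print(A_or_B)   # {'x1': 0.5, 'x2': 0.7, 'x3': 0.8, 'x4': 0.9}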

(Refer Slide Time: 24:40)

Then, comes the concept of algebraic products of two fuzzy sets, now supposing that I
have got say two fuzzy sets, one is A(x) another is B(x). So, by algebraic product
actually, we mean another set whose membership function values will be nothing, but a
µ A ( x) multiplied by your µ B ( x) . So, this is actually what you mean by algebraic
product of two fuzzy sets, now I am just going to take one numerical example just to
make it clear.

(Refer Slide Time: 25:18)

Now, supposing that I have got two fuzzy sets one is your A(x) another is your B (x). So,
what I do is, element-wise, the µ values we simply multiply.

So, what I do is, corresponding to this particular x_1, we multiply 0.1 by 0.5. So, this is
nothing but 0.05. So, this A(x).B(x) is nothing but x_1 with 0.05.

Similarly, corresponding to this x_2, I will have to multiply 0.2 by 0.7, so this will
become 0.14. So, I will be getting 0.14; then, corresponding to this particular x_3, I
will be getting 0.3 multiplied by 0.8 and this is nothing, but is your 0.24, so you will be
getting 0.24. Then, corresponding to this particular x_4, I will be getting 0.4 multiplied
by 0.9, so I will be getting 0.36. So, by following this particular method, you can find out
the product of two fuzzy sets.
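The same numbers fall out of an element-wise multiplication; a minimal Python sketch of my own:

A = {'x1': 0.1, 'x2': 0.2, 'x3': 0.3, 'x4': 0.4}
B = {'x1': 0.5, 'x2': 0.7, 'x3': 0.8, 'x4': 0.9}
A_dot_B = {x: round(A[x] * B[x], 2) for x in A}   # algebraic product = element-wise multiplication
print(A_dot_B)   # {'x1': 0.05, 'x2': 0.14, 'x3': 0.24, 'x4': 0.36}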

Thank you.

Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 05
Introduction to Fuzzy Sets (Contd.)

We are discussing some standard operations used in fuzzy sets. We have already
discussed a few and now, we will be discussing a few more.

(Refer Slide Time: 00:29)

Now, let me start with the concept of multiplication of a fuzzy set by a crisp number.
Now, here so, what we are going to do is, we are going to multiply a fuzzy set by a crisp
number, as follows:

Say A (x) is a fuzzy set; now this particular fuzzy set, I am just going to multiply by a
crisp number, that is, d. Now, if you see how to write down this? So, d multiplied by A
(x) is nothing, but x such that you have got the membership function value which is
nothing, but d multiplied by µ A ( x) , x belongs to capital X.

Now, here, for the range of this particular d, generally we consider d greater than 0 and
less than or equal to 1.0. Now, my question is, why do we need this? We need this
particular concept because, if we want to optimize the shape and size of the membership
function distribution, we will have to consider this particular concept of multiplication of
a fuzzy set by a crisp number.

Now, let me take one numerical example, the things will be made more clear.

(Refer Slide Time: 01:53)

Now, supposing that I have got a fuzzy set A (x), which is nothing, but x_1 comma 0.1,
x_2 comma 0.2, x_3 comma 0.3, x_4 comma 0.4. So, this is a fuzzy set, a discrete fuzzy
set having 4 elements x_1, x_2, x_3 and x_4, and supposing that the crisp number d is
set equal to 0.2.

Now, if d is equal to 0.2 now, d multiplied by A (x). So, this particular thing will become
equal to x_1 comma, d is multiplied by this particular mu value. So, d is 0.2, multiplied
by 0.1 and this is nothing, but 0.02. So, corresponding to this particular x_1; this will be
d multiplied by µ and this will become equal to 0.02.

Now, following the same procedure I can also find out what will happen to
corresponding to this particular x_2. Now, here corresponding to x_2 the membership
function value is 0.2 multiplied by the d is 0.2. So, this is nothing, but 0.04. So,
corresponding to this x_2 we will have 0.04, similarly corresponding to x_3 we will have
0.3 multiplied by 0.2 and that is nothing, but 0.06.

Then, corresponding to x_4, we will have 0.4 multiplied by 0.2, and this is nothing but
0.08. So, corresponding to x_4, we will be getting 0.08. Now, let us try to understand the
physical significance of this particular multiplication. Now, supposing that I am just
going to draw the discrete fuzzy set, that is, A (x).

(Refer Slide Time: 04:09)

So, let me just draw it here and element-wise, I am just going to write down say the
membership value. So, this is mu and mu varies from say 0 to 1 and here, let me write
down say 0 to only say 0.5 and we have got the values up to 1.0.

So, this is corresponding to say x_1, this is x_2, this is x_3 and this is your x_4. So, this
is the direction of x. Now, if I consider the original fuzzy set, that is, A (x), then
corresponding to x_1 I have got 0.1; so this is 0.1, this is 0.2, this is 0.3 and this is 0.4.
Now, corresponding to x_1, I have got 0.1, so I am here; then, corresponding to x_2, I
have got 0.2, so I am here.

Now, corresponding to x_3 say I have got 0.3 say might be I am here, now this is say 0.3
and corresponding to x_4. So, I have got say 0.4. So, might be I am here; now if this is
the situation. So, this is the distribution of your A (x) that is the fuzzy set A x, now if I
just multiply by d, where d is equals to 0.2. So, corresponding to x_1; so I will be getting
0.02. So, might be I am here, then corresponding to x_2. So, I will be getting 0.04. So,
might be I am here.

Corresponding to x_3, I will be getting 0.06, so maybe I am here, and corresponding to
x_4, I have got 0.08, so I could be here. Now, the original fuzzy set, the discrete fuzzy
set, is something like this. As it is discrete, I am not going to draw a continuous curve,
and if I just multiply by d, I will be getting this type of nature; that means, there is a
scope for variation of the particular distribution.

Now, remember one thing: this particular concept might be required if you want to
optimize the shape and size of this particular membership function distribution. Because,
what we do is, initially we assume some membership function distribution, but we do
not know whether that is the correct one, and with the help of some optimizer and some
training scenarios, we try to optimize the shape of that particular membership function
distribution. And, this is actually the physical significance of multiplication of a fuzzy
set by some crisp number. So, I think this particular idea is clear to you.
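In code (a minimal sketch of my own), scaling the discrete set by d = 0.2 simply shrinks every membership value:

A = {'x1': 0.1, 'x2': 0.2, 'x3': 0.3, 'x4': 0.4}
d = 0.2
d_times_A = {x: round(d * mu, 2) for x, mu in A.items()}
print(d_times_A)   # {'x1': 0.02, 'x2': 0.04, 'x3': 0.06, 'x4': 0.08}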

(Refer Slide Time: 07:13)

Now, I am just going to consider the power of a fuzzy set. Now the power of a fuzzy set
that is defined as your A p ( x) and this indicates actually the p-th power of the fuzzy set
A (x).

Now, the moment we are considering the p-th power of the fuzzy set A (x), it will have
the membership value denoted by µ_(A^p)(x), and µ_(A^p)(x) = {µ_A(x)}^p, where p is
the power and small x belongs to the universe of discourse, that is, capital X. Now, if I
consider a particular value for this p, supposing that p is set equal to 2, then this is known
as the concentration of the fuzzy set. So, if I put p = 2, this is known as the concentration
of a fuzzy set.

For example, say I have got a fuzzy set A (x), now if I say, can you please find out the
concentration of this particular fuzzy set? So, what we will have to do is, you will have
to find out A (x) raise to the power 2. So, this is nothing, but the concentration of a fuzzy
set. On the other hand, if I set p is equal to say half, that is called the dilation of a fuzzy
set. So, for the fuzzy set A (x), if I tell you that can you please find out the dilation of a
fuzzy set and that is nothing, but is your A (x) raise to the power half. So, this is nothing,
but the dilation of a fuzzy set.

Now, I am just going to consider some numerical example just to show, how does it
work.

(Refer Slide Time: 09:09)

And, after that, I will try to find out what should be the physical significance of this
particular concept of power of a fuzzy set. Now, let us consider the same discrete fuzzy
set denoted by A (x) is equal to x_1 comma 0.1, x_2 comma 0.2, x_3 comma 0.3 and x_4
comma 0.4 and let me consider the power is equal to 2. So, this is nothing, but the
concentration of a fuzzy set. Now, this A²(x) is obtained element-wise; corresponding to
x_1, what we will have to do is find out 0.1 raised to the power 2, the square of that, and
that is nothing but 0.01.

So, corresponding to x_1, we will have to write down 0.01, similarly corresponding to
x_2. What we will have to do is, we will have to write 0.2 square that is nothing, but
0.04. Then, corresponding to x_3, it will be 0.3 square that is your 0.09 and
corresponding to x 4 actually what we will have is, 0.4 raise to the power 2 and that is
nothing, but is your 0.16. So, this is the way, actually we can find out the concentration
of the fuzzy set.
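Concentration (p = 2) and dilation (p = 0.5) are both one-liners; this Python sketch is my own and just raises each µ value to the chosen power.

A = {'x1': 0.1, 'x2': 0.2, 'x3': 0.3, 'x4': 0.4}

concentration = {x: round(mu ** 2, 2) for x, mu in A.items()}     # p = 2
dilation      = {x: round(mu ** 0.5, 2) for x, mu in A.items()}   # p = 1/2

print(concentration)   # {'x1': 0.01, 'x2': 0.04, 'x3': 0.09, 'x4': 0.16}
print(dilation)        # {'x1': 0.32, 'x2': 0.45, 'x3': 0.55, 'x4': 0.63}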

Now, I am just going to discuss its physical significance. Now, let us try to find out the
philosophy behind this particular concept.

(Refer Slide Time: 10:51)

Now, once again let me draw a fuzzy set like if this is your µ , and let me consider, this
is 0.0, this is 0.4, say this could be 0.1, 0.2 and this is your 0.3 and along this, we have
got x, and supposing that, this is x_1, this is x_2, this is your x_3 and this is x_4. Now
corresponding to x_1; so, I have got 0.1, similarly corresponding to x_2. So, I have got
0.2 and corresponding to x_3 say I have got 0.3 and corresponding to this x_4 say I have
got 0.4.

So, this is actually the fuzzy set A (x), now we will try to find out what should be its
concentration. Now, concentration corresponding to your x_1 is 0.01. So, very near to 0
corresponding to x_2 it is 0.04. So, I could be here; corresponding to x_3 it is 0.09. So, I
could be here and corresponding to x_4. So, this is nothing, but 0.16; so, this is 0.1. So,
might be I am here.

Now, this is actually the concentration of this particular fuzzy set and once again, the
purpose is actually to find out the optimal distribution of this particular fuzzy set. Now,
once again, we can take the help of one optimizer and with the help of some training
scenarios or the training data, we can find out what should be the optimal distribution of
the membership function of that particular fuzzy set.

So, this is actually the physical significance of this particular concept of the power of a
fuzzy set.

(Refer Slide Time: 12:53)

Now, I am just going to show how to find out the algebraic sum of two fuzzy sets. Say I
have got two fuzzy sets A (x) and B (x) defined in the same universe of discourse. So,
their algebraic sum A(x) + B(x) is nothing but the set of x, such that it has got the
membership value µ_(A+B)(x), where x belongs to capital X and µ_(A+B)(x) is nothing
but µ_A(x) + µ_B(x) − µ_A(x).µ_B(x).

Now, let us see, let us try to solve one numerical example corresponding to this.

(Refer Slide Time: 13:47)

Now, supposing that I have got two fuzzy sets now one is your A (x) is nothing, but this
is a discrete fuzzy set and I have got another discrete fuzzy set which is nothing, but B
(x) and B (x) is nothing, but your x_1 comma 0.5, x_2 comma 0.7, x_3 comma 0.8 and
x_4 comma 0.9 and A (x), I have already use several times the same fuzzy set.

Now, how to find out. So, this A (x) + B (x)? Now, as I told that to find out the
membership value corresponding to x_1. So, what we will have to do is, we will have to
add. So, this particular membership function value is added to this particular membership
function value and then, we will have to subtract these multiplied form.

So, what you can do is compute 0.1 plus 0.5 minus 0.1 multiplied by 0.5; that will be the
µ corresponding to this particular x_1. So, this is nothing but 0.6 minus 0.05, and this is
nothing but 0.55. So, corresponding to x_1, we have got the membership function value
0.55. Similarly, corresponding to this x_2, I have got 0.2 here and 0.7 here. So, 0.2 plus
0.7 minus 0.2 multiplied by 0.7, which is 0.14; so this is nothing but 0.9 minus 0.14, and
this will become equal to 0.76.

So, corresponding to this x_2, I will be getting 0.76; similarly, corresponding to x_3, it
will become 0.3 plus 0.8 minus 0.24. So, this is nothing but 1.1 minus 0.24, and this is
nothing but 0.86; so you will be getting 0.86. Similarly, by following the same method,
corresponding to x_4, I will be getting 0.94. So, this is the way actually we can find
out A (x) + B (x).
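These values follow directly from µ_A + µ_B − µ_A·µ_B; a short Python check of my own:

A = {'x1': 0.1, 'x2': 0.2, 'x3': 0.3, 'x4': 0.4}
B = {'x1': 0.5, 'x2': 0.7, 'x3': 0.8, 'x4': 0.9}
alg_sum = {x: round(A[x] + B[x] - A[x] * B[x], 2) for x in A}
print(alg_sum)   # {'x1': 0.55, 'x2': 0.76, 'x3': 0.86, 'x4': 0.94}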

(Refer Slide Time: 16:23)

Now, the next operation is known as the bounded sum of two fuzzy sets.

So, once again, I have got two fuzzy sets like your A (x) and B (x), they are defined in
the same universe of discourse that x belongs to your capital X and I will have to find out
A x bounded sum. So, this is the symbol for the bounded sum. So, A (x) bounded sum B
(x) and this is nothing, but x such that it has got the membership value, which is nothing,
but µ A⊕ B ( x) and how to determine? So, this particular µ A⊕ B ( x) . So, this is the way
actually we can find out. So, this is nothing, but the minimum between two quantities
one is 1 and another is your µ A ( x) + µ B ( x) . So, what I do is we simply add the two µ
values and we compare with one, and we will be taking the minimum.

Now, let us see how does it work, and let me solve one numerical example here for this
bounded sum.

(Refer Slide Time: 17:35)

Now, supposing that A (x) is a fuzzy set, the discrete fuzzy set like this, and B (x) is
another discrete fuzzy set like this. So, what we will have to do is. So, you will have to
find out the minimum between one and what we will have to do is, you will have to add.
So, this particular µ value and that particular the µ value; so, 0.1 plus 0.5

So, it is 0.6 and we will have to find out the minimum and that is nothing, but 0.6. So,
corresponding to x_1. So, I will have to write down 0.6; similarly corresponding to x_2.
So, it is nothing, but the minimum between 1 and 0.2 plus 0.7 is your 0.9 and the
minimum is your 0.9. So, corresponding to x_2; so will have to write down the
membership value as 0.9. Now, following the same procedure for this particular x_3, this
will become the minimum between 1 and 0.3 plus 0.8, which is 1.1, and the minimum is
nothing but 1.0.

So, here, we will have to write down 1.0. Similarly, corresponding to this x_4, this is
nothing but the minimum between 1 and 0.4 plus 0.9, which is 1.3, and we will have to
find out the minimum, that is, 1.0. So, corresponding to this particular x_4, we are
writing 1.0. So, this is the way actually we can find out the bounded sum of A (x) and
B (x).
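The same result via min(1, µ_A + µ_B), in a Python one-liner of my own:

A = {'x1': 0.1, 'x2': 0.2, 'x3': 0.3, 'x4': 0.4}
B = {'x1': 0.5, 'x2': 0.7, 'x3': 0.8, 'x4': 0.9}
bounded_sum = {x: round(min(1.0, A[x] + B[x]), 2) for x in A}
print(bounded_sum)   # {'x1': 0.6, 'x2': 0.9, 'x3': 1.0, 'x4': 1.0}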

(Refer Slide Time: 19:37)

Now, let us see how to carry out the algebraic difference between the two fuzzy sets.
Now supposing that I have got your the two fuzzy sets like A (x) and B (x) and they are
defined in the same universe of discourse, that is the x belongs to capital X. So, how to
find out the algebraic difference, that is, A (x) minus B (x)? Now, A(x) − B(x) is nothing
but the set of x, such that it has got the membership function value µ_(A−B)(x). Now,
how to define this µ_(A−B)(x)? The µ_(A−B)(x) is defined as µ_(A∩B̄)(x). So, what we
will have to do is this: I have got two fuzzy sets A (x) and B (x). First, we will have to
find out the complement of this particular B, and that is nothing but B̄(x). So, this is the
complement of B (x), and after that we will have to use the concept of intersection, that
is, A ∩ B̄. Now, let us see how it works, and just to show that, I am going to solve one
numerical example.

(Refer Slide Time: 21:07)

Now, supposing that your A (x) is nothing, but a discrete fuzzy sets something like this,
and B (x) is a another discrete fuzzy set something like this.

So, what I do is, we try to find out its complement, that is, B̄(x) is nothing but x_1
comma 1 minus 0.5 that is 0.5. So, I will be getting 0.5 here. Then, x_2 corresponding to
this x_2 and this will be your1 minus 0.7 that is nothing, but is your 0.3, then
corresponding to this particular x_3. So, this will become 1 minus 0.8. So, this is 0.2. So,
I will be getting this as compliment, then corresponding to this x_4. So, this is nothing,
but1 minus 0.9 and this is nothing, but 0.1. So, I will be getting this.

So, this is nothing but the complement of B. Now, what we actually do is, we try to find
out the intersection of this particular A (x) with B̄. So, A (x) intersection B̄ I am just
going to find out element-wise. So, I will have to find out what should be the µ
corresponding to this x_1: I will have to compare this 0.1 with 0.5 and consider the
minimum, and that is nothing but 0.1.

Then, corresponding to x_2, I have got 0.2 and 0.3 here, so the minimum of that is 0.2.
Then, corresponding to x_3, I have got 0.3 here and 0.2 here, and the minimum is 0.2.
Then, corresponding to x_4, I will have to compare 0.4 and 0.1, and the minimum is 0.1.
So, this is the way actually we can find out what should be A (x) minus B (x). So, this is
the way we can find out the difference.
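Equivalently, µ_(A−B)(x) = min(µ_A(x), 1 − µ_B(x)); a small Python check of my own:

A = {'x1': 0.1, 'x2': 0.2, 'x3': 0.3, 'x4': 0.4}
B = {'x1': 0.5, 'x2': 0.7, 'x3': 0.8, 'x4': 0.9}
A_minus_B = {x: round(min(A[x], 1.0 - B[x]), 2) for x in A}   # A intersected with the complement of B
print(A_minus_B)   # {'x1': 0.1, 'x2': 0.2, 'x3': 0.2, 'x4': 0.1}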

(Refer Slide Time: 23:39)

Now, then comes your the bounded difference between the two fuzzy sets. Now let us
see how to find out the bounded difference of two fuzzy sets. Now, the two fuzzy sets are
nothing, but A (x) and B (x) and we try to find out the bounded difference. Now, this
particular symbol is actually the symbol for the bounded difference and this is nothing,
but x, such that it has got the membership function value which is nothing, but a bounded
difference, x belongs to capital X, now how to define this particular membership?

Now, this mu_(A bounded difference B) is nothing, but the maximum between two
quantities one is 0 and another is your µ A ( x) + µ B ( x) − 1 . So, what we will have to do is,

will have to add µ A ( x) and µ B ( x) and subtract 1. So, this particular quantity we will
have to determine and will have to compare with 0 and we will have to find out the
maximum. Now, let us see, how does it work, let us solve one numerical example.

(Refer Slide Time: 25:03)

Now, let us just solve the numerical example. I have got two fuzzy sets here: one is A (x)
and another is B (x), and corresponding to x_1 we will have to find out the maximum
between 0 and 0.1 plus 0.5 minus 1.0.

And, this is nothing, but your the maximum between 0 and here it is 0.6 minus 1. So, it is
your minus 0.4 and the maximum between 0 and minus 0.4 is nothing, but 0. So,
corresponding to this particular x_1; so I will have to write down 0.0, then corresponding
to x_2. So, what we will have to do is; so you will have to find out the maximum
between 0 comma 0.2, plus 0.7 minus 1.0. So, this is nothing, but the maximum between
your 0.9 minus 1.0. So, this is nothing, but minus 0.1 and the maximum value is 0.

So, corresponding to x_2; so there will be 0.0, then corresponding to x_3; so what we
will have is your the maximum between 0 then 0.3 plus 0.8 is a 1.1 minus 1; so, this is
nothing, but 0.1 and the maximum between 0 and 0.1 is nothing, but 0.1 and
corresponding to your x_4. So, this will be your maximum between 0 comma 0.4 plus
0.9 is a 1.3 minus 1.0. So, this is nothing, but 0.3 and the maximum between 0 and 0.3 is
nothing, but is your 0.3.

So, we can find out this particular A (x) bounded difference B (x).
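The max(0, µ_A + µ_B − 1) rule in Python (my own sketch):

A = {'x1': 0.1, 'x2': 0.2, 'x3': 0.3, 'x4': 0.4}
B = {'x1': 0.5, 'x2': 0.7, 'x3': 0.8, 'x4': 0.9}
bounded_diff = {x: round(max(0.0, A[x] + B[x] - 1.0), 2) for x in A}
print(bounded_diff)   # {'x1': 0.0, 'x2': 0.0, 'x3': 0.1, 'x4': 0.3}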

(Refer Slide Time: 27:33)

Now, let us see how to find out the Cartesian product of two fuzzy sets. Now, let us
consider that I have got two fuzzy sets, say A (x) that is defined in the universe of
discourse capital X, and I have got another fuzzy set, that is, B (y), which is defined in
another universe of discourse. Now, this A (x) might be the set of temperature in a month
for the first few days. So, that is nothing, but the A (x) and B (y) could be the values or
the collection of values of say the humidity of a particular place and where y is nothing,
but the collection of all humidity values.

So, I have got two universe of discourses, that is the X, another is Y; one could be
temperature another could be humidity and A (x) is defined in X and B (y) is defined in
Y. So, how to find out the Cartesian product of these two fuzzy sets? The Cartesian
product of these two fuzzy sets is defined as µ A× B ( x, y ) is nothing, but the minimum

between µ A ( x) , µ B ( y ) . So, what we will have to do is. So, I will have to compare. So,

these two µ values and will have to consider the minimum.

(Refer Slide Time: 29:05)

Now, let us see, how does it work. Now, supposing that say I have got two fuzzy sets one
is your A (x) and I have got another fuzzy set in another universe of discourse that is say
B (y). So, how to find out their Cartesian product? So, what we will have to do is. So, we
will have to find out the minimum of µ A ( x1 ), µ B ( y1 ) . So, I will have to compare. So, that

is nothing, but the minimum. So, µ A ( x1 ) is 0.2 and µ B ( y1 ) is 0.8. So, we compare 0.2
and 0.8 and we try to find out the minimum.

Similarly, we will have to compare. So, µ A ( x1 ), µ B ( y2 ) ; that means, I will have to


compare your the 0.2 and 0.6, and their minimum value is nothing, but 0.2.

(Refer Slide Time: 30:17)

And, by following the same procedure, I will have to find out the other elements, that is,
the minimum between µ_A(x_1) and µ_B(y_3), which is the minimum between 0.2 and
0.3, and that is nothing but 0.2; next is the minimum between µ_A(x_2) and µ_B(y_1),
that is, the minimum between 0.3 and 0.8, and that is nothing but 0.3.

Next is the minimum between µ A ( x2 ), µ B ( y2 ) and that is nothing, but the minimum

between 0.3 and 0.6 and that is nothing, but 0.3 the next is the minimum between µ A ( x2 )

and µ B ( y3 ) and that is what the minimum between 0.3 and 0.3 and the minimum is your
the 0.3.

(Refer Slide Time: 31:23)

The next, we try to find out the minimum between µ A ( x3 ) µ B ( y1 ) and that is nothing, but
the minimum between 0.5 and 0.8 and that is 0.5.

Next is the minimum between µ A ( x3 ) µ B ( y2 ) following the same procedure I will be

getting 0.5 then comes your minimum between µ A ( x3 ) and µ B ( y3 ) and that is nothing,
but the minimum between 0.5 and 0.3. So, I will be getting 0.3 here. Next is the
minimum between µ A ( x4 ) and µ B ( y1 ) and that is nothing, but minimum between 0.6
and 0.8 and you will be getting 0.6.

(Refer Slide Time: 32:19)

Now, here, this is the product and to find out the products, we need to find out some
other elements like your the minimum between µ A ( x4 ) µ B ( y2 ) is nothing, but the
minimum between 0.6 and 0.6 it is 0.6.

Then comes your the minimum between µ A ( x4 ) , µ B ( y3 ) that is nothing, but is your the
minimum between 0.6 and 0.3, and we will be getting 0.3. So, all the elements have
already been determined, and I can write them down in the matrix form; it will look like
this, with 4 rows and 3 columns. So, this is nothing but a 4 × 3 matrix, and this is actually
the Cartesian product of A and B.
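Collecting the membership values read out above (µ_A = 0.2, 0.3, 0.5, 0.6 and µ_B = 0.8, 0.6, 0.3), the whole 4 × 3 matrix can be generated at once; this Python sketch is my own.

mu_A = [0.2, 0.3, 0.5, 0.6]   # mu_A(x1..x4)
mu_B = [0.8, 0.6, 0.3]        # mu_B(y1..y3)

cartesian = [[min(a, b) for b in mu_B] for a in mu_A]   # mu_{A x B}(x, y) = min(mu_A(x), mu_B(y))
for row in cartesian:
    print(row)
# [0.2, 0.2, 0.2]
# [0.3, 0.3, 0.3]
# [0.5, 0.5, 0.3]
# [0.6, 0.6, 0.3]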

Thank you.

Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 06
Introduction to Fuzzy Sets (Contd.)

Now, we are going to discuss how to determine the composition of fuzzy relations.

(Refer Slide Time: 00:22)

Now, supposing that I have got two fuzzy relations in the matrix form, for example, A is
nothing but say a_ij and B is nothing but say b_jk. Now, how to find out the composition
of these two fuzzy relations? Now, the composition that is denoted by C is nothing but A
composition B, and this particular symbol indicates actually the composition.

So, C is nothing but A composition B. Now, in the matrix form, this can be written as
follows: say I know a_ij and I know b_jk; then a_ij composition b_jk is nothing but c_ik,
and how to find out this particular c_ik? Now, c_ik is determined as the maximum, taken
over j, of the minimum of a_ij and b_jk, that is, c_ik = max_j [min(a_ij, b_jk)]. Now, let
us see, with the help of one numerical example, how to find out this particular C, that is,
the composition of two fuzzy relations.

So, I am just going to take one numerical example.

(Refer Slide Time: 01:51)

Now, supposing that I have got, one fuzzy relation, that is, A is a_ij in the matrix form
and this is nothing but say in the 2 × 2 matrix like 0.2 0.3 0.5 0.7; the next is your the B
is another fuzzy relation and in the matrix form. So, this is nothing but b_jk and
supposing that I have got a 2 × 3 matrix and the elements are 0.3, 0.6, 0.7, 0.1, 0.8, 0.6.

Now, our aim is to find out the elements of these particular c_ik. Now, let us see how to
find out the elements, that is the c_ik. Now, here, if I just try to find out the element-
wise.

(Refer Slide Time: 02:51)

Now, the first thing is c_11, that is, the first-row, first-column element. Now, if you
remember, the relationship is that c_ik is nothing but a_ij composition b_jk. Here, to
determine c_11, I will have to put i = 1 and k = 1.

Now, here, I can put i = 1 and k = 1, and j will vary from 1 to 2. Now, if j varies from 1
to 2, very easily I will be getting c_11 as the maximum of two minima:
c_11 = max[min(a_11, b_11), min(a_12, b_21)]. So, first we will have to find out the
minimum between a_11 and b_11 and the minimum between a_12 and b_21, and then I
will have to find out the maximum between these two; that will be nothing but c_11.

Now, corresponding to this particular numerical example like a_11 that is a matrix first
row first column and that is nothing but 0.2, then comes your b_11. So, the b matrix first
row first column is 0.3. Next, comes the minimum between a_12 that is your first row
second column and that is nothing but 0.3. Then, comes your b_21 that that is your
second row first column and that is nothing but 0.1.

Now, if I compare 0.2 and 0.3. So, the minimum will be 0.2. Similarly, if I compare 0.3
and 0.1, the minimum will be 0.1 and the maximum between 0.2 and 0.1 is nothing but is
your 0.2. So, I can find out; so this c_11 is nothing but 0.2 and by following the same
procedure, I can also find out the other elements of the matrix.

(Refer Slide Time: 05:48)

For example say c_12 and let me try to derive this particular thing once again. Now, this
c_12 means what?

So, c_ik is obtained from a_ij and b_jk. Now, the moment I am writing c_12, i = 1 and
k = 2, and once again j will vary from 1 to 2. So, this c_12 is nothing but the maximum of
two minima: for j = 1 we take min(a_11, b_12), and for j = 2 we take min(a_12, b_22);
that is, c_12 = max[min(a_11, b_12), min(a_12, b_22)].

So, this is the relationship which will be getting. Now, if you just put the elements the
numerical values like a_11 first row first column is 0.2, b_12 that is your first row
second column 0.6, next a_12 is 0.3 and b_22 is 0.8. So, the minimum between 0.2 and
0.6 is nothing but 0.2, similarly the minimum between 0.3 and 0.8 is nothing but 0.3 and
the maximum between them 0.3.

So, we can find out what should be the numerical value for this particular your C_12.

(Refer Slide Time: 07:56)

Now, by following the same procedure, I can find out c_13; and c_13 is nothing but the
maximum of the minimum between a_11 and b_13 and the minimum between a_12 and
b_23; and if you just substitute the numerical values here, this
will become the maximum between the minimum between 0.2 and 0.7 and the minimum
between 0.3 and 0.6. So, here I will be getting 0.2. I will be getting 0.3 and the maximum
between them is the 0.3. So, c_13 is nothing but is your 0.3.

(Refer Slide Time: 08:46)

Now, following the same procedure. So, I can also find out your c_21 using this
particular expression, and if you insert the numerical values I will be getting this, and I
can find out the maximum between 0.3 and 0.1 and you will be getting 0.3.

(Refer Slide Time: 09:09)

Now, by following the same procedure: so you can also find out the next element that is
your c_22, which is nothing but this. And, we substitute the numerical values will be
getting like this, then maximum between 0.5 and 0.7 and I will be getting your the 0.7.

Now the next is your c_23.

(Refer Slide Time: 09:36)

So, I can find out what is this c_23 this is nothing but this particular expression substitute
the numerical values will be getting this, find out the maximum between 0.5 and 0.6.

So, you will be getting your c_23 as 0.6.

(Refer Slide Time: 09:56)

Then, we can write down the C matrix. So, I can write out all the elements, and it is
nothing but a 2 × 3 matrix whose elements are c_11 = 0.2, c_12 = 0.3, c_13 = 0.3,
c_21 = 0.3, c_22 = 0.7 and c_23 = 0.6. So, you will be getting this particular matrix, and
this is the way actually we can find out the composition of the two fuzzy relations.
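The max–min composition is easy to verify programmatically; the following Python sketch (my own) reproduces the 2 × 3 result for the two relations used above.

A = [[0.2, 0.3],
     [0.5, 0.7]]            # a_ij, a 2 x 2 fuzzy relation
B = [[0.3, 0.6, 0.7],
     [0.1, 0.8, 0.6]]       # b_jk, a 2 x 3 fuzzy relation

# c_ik = max over j of min(a_ij, b_jk)
C = [[max(min(A[i][j], B[j][k]) for j in range(len(B)))
      for k in range(len(B[0]))]
     for i in range(len(A))]

for row in C:
    print(row)
# [0.2, 0.3, 0.3]
# [0.3, 0.7, 0.6]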

(Refer Slide Time: 10:30)

Now, I am just going to start with another topic, that is the properties of fuzzy sets. Now,
if you remember while discussing the concept of crisp set, we have already discussed
about ten properties. Now, these ten properties if you look into. Now, this fuzzy set can
follow the first 8 properties out of the 10, but the last two properties of the crisp set are
not followed by the fuzzy sets and these are nothing but the law of excluded middle; and
then comes law of contradiction.

(Refer Slide Time: 11:16)

Now, let us look at these two laws, taking the law of excluded middle first. As I told,
these two properties are not followed by the fuzzy set: one is the law of excluded middle
and the other is the law of contradiction. Now, in crisp sets, A ∩ Ā, that is, the
intersection of a set A and its complement, is the null set (the law of contradiction), and
A ∪ Ā, the union of A and its complement, is nothing but the universal set (the law of
excluded middle).

But, in fuzzy sets, we will find that A ∪ Ā is not equal to X. So, this particular violation
we are getting in the fuzzy set. Once again, let me repeat: according to the law of
excluded middle in crisp sets, A ∪ Ā is X, that is, the universe of discourse, but in fuzzy
sets A ∪ Ā is not equal to the universe of discourse. Now, let us try to explain.
Supposing that this particular triangle is going to represent the fuzzy set A (x). Now, if
this is A (x), we have already discussed how to find out its complement; supposing that
it is denoted by the dotted line. So, its complement is denoted by this; this I have already
discussed.

Now, if you want to find out the union, what you will have to do is concentrate here,
corresponding to this value of x. According to A (x), the µ is nothing but 0, but
corresponding to Ā(x), that is, the complement, the membership function value is 1.0,
and this is the union, so we will have to consider the maximum; that means,
corresponding to this value of x, I will have to consider this as the µ. Next, the moment I
am here, at this particular value of x, this is actually the µ corresponding to the fuzzy set
A (x) and this is nothing but the µ corresponding to Ā.

So, this is nothing but the µ corresponding to A and this is nothing but the µ
corresponding to Ā. Now, these two µ values we will have to compare, and we will have
to consider the maximum. So, I am here; now, if I follow this principle and if we increase
the value of x, there is a possibility that I will be getting actually this union, which is
nothing but this.

(Refer Slide Time: 14:34)

So, this type of thing I will be getting as the union of the fuzzy set A (x) and its
complement. So, this is the thing actually we will be getting. So, this shaded portion
actually indicates the union of a fuzzy set and its complement, but you see here, I have
got one white portion which has not been covered.

So, we cannot say that in fuzzy sets A ∪ Ā is equal to the whole thing, that is, capital X.
So, in a fuzzy set, A ∪ Ā ≠ X, whereas according to the crisp set this should not be the
case. So, there is a violation by the fuzzy set, and that violation is in terms of the law of
excluded middle; that means, the fuzzy set does not follow the law of excluded middle.
The reason, I am just going to tell you after some time, and let me explain another law,
which is not followed by this particular fuzzy set, and that is known as the law of
contradiction.

Now, according to this law of contradiction, in crisp sets A ∩ Ā is nothing but the null
set, but in fuzzy sets A ∩ Ā is not equal to the null set. So, there is a violation of this law
of contradiction by fuzzy sets. Now, let us try to explain here. Supposing that this is
nothing but the membership function distribution of a fuzzy set A (x), and its
complement is nothing but this dotted line.

So, this is Ā(x). Now, I try to find out their intersection. So, we will start from here:
corresponding to this value of x, say x_1, according to this A (x), I have got the µ value,
which is equal to 0, and corresponding to this value of x_1, the µ value corresponding to
this Ā(x) is nothing but 1, and we will have to consider the minimum. So, I will have to
consider this particular 0. Next, I consider another value of x, say x_2; here, one height
indicates the µ corresponding to A and the other height indicates the µ corresponding to
Ā. Now, we compare µ_A with µ_Ā, and here µ_A is found to be the minimum; so,
corresponding to this x_2, we consider up to this. By following the same principle, if I
just proceed with different values of x, I will be getting one area, and this particular area
is the intersection area. So, this type of intersection area I will be getting, and that is why
A ∩ Ā is not equal to the null set in a fuzzy set.

Now, these two rules are not followed by the fuzzy set, and I am just going to tell you the
reason behind this particular fact. Let us try to understand the reason why these two rules
are not followed by fuzzy sets. To understand that particular reason, let me just draw the
same picture here.

(Refer Slide Time: 18:33)

Now, supposing that I have got a fuzzy set for, say, the red color. So, this is for the red
color, and this axis is nothing but µ, from 0.0 to 1.0, as we have discussed. Corresponding
to this, what I will be getting here is nothing but its complement.

So, this is the complement, say it is not red; not red is actually the complement of red. So,
if red is nothing but A(x), then not red is nothing but Ā(x), and this is x. Now, supposing
that I am considering a particular color; that color is nothing but this, and I am just going
to give a statement whether it is red or not red, or if it is red, with what membership
function value.

Now, what you do is, corresponding to this value of x, that is, the color, we try to find out
the µ. This particular value of µ means that it is red with the membership function value,
say, µ_red; and the same color is not red with another membership function value, and that
particular value is nothing but µ_not red.

So, corresponding to this, I will be getting µ_not red, and corresponding to the red color, I
will be getting µ_red. Now, let us take a very hypothetical situation: supposing that this
µ_red is nothing but 0.3, what will happen to the value of µ_not red, if it is a triangular
distribution? It will become 0.7. Now, let me give the statement: the same color is considered
red with some membership function value, and it is also considered not red with another
membership function value. Let me put it in another way.

That means, an element, that is, the red color, belongs to its fuzzy set as well as to its
complement with different values of membership, and this is actually the reason why a fuzzy
set is not going to follow these two rules. So, let me repeat the reason behind the violation
of these two rules by the fuzzy set: in fuzzy sets, an element belongs to a particular fuzzy
set as well as to its complement with different values of membership, and that is why we are
getting this type of violation of the two laws.
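To see the violation numerically, here is a minimal sketch in Python; the membership value 0.3 is the hypothetical one used above, and the standard max/min operators for union and intersection are assumed.

```python
# A minimal check of the two laws for a single element (hypothetical value).
mu_red = 0.3                     # membership of the colour in the fuzzy set "red"
mu_not_red = 1.0 - mu_red        # standard fuzzy complement -> 0.7

union = max(mu_red, mu_not_red)          # A U A-bar via the max operator -> 0.7, not 1.0
intersection = min(mu_red, mu_not_red)   # A n A-bar via the min operator -> 0.3, not 0.0

print(union, intersection)   # 0.7 0.3: excluded middle and contradiction both fail
```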

(Refer Slide Time: 21:59)

Now, I am just going to define two terms; the first is called the measure of fuzziness. So,
how do we mathematically find out the fuzziness of a fuzzy set? To define the fuzziness of a
fuzzy set, we use a particular term called the entropy. So, the term entropy is used just to
quantify the fuzziness of a fuzzy set. Now, supposing that I have got a discrete fuzzy set,
say A(x), and I know its membership function values element-wise, how do we find out the
entropy of this particular fuzzy set, denoted by capital H(A)? Now,

H(A) = -\frac{1}{n}\sum_{i=1}^{n}\left[\mu_A(x_i)\log\{\mu_A(x_i)\} + \{1-\mu_A(x_i)\}\log\{1-\mu_A(x_i)\}\right].

So, using this particular expression, I am just going to find out what should be the entropy
of a fuzzy set, and that is nothing but the measure of fuzziness of the fuzzy set.

(Refer Slide Time: 23:34)

Now, here, I am just going to take one numerical example. Supposing that I have got a discrete
fuzzy set A(x) like this, its entropy, according to the expression which I have already
discussed, involves the factor minus 1 divided by n; now, n is equal to 4 here, because I have
got four elements x_1, x_2, x_3 and x_4. If you see the formula once again,

H(A) = -\frac{1}{n}\sum_{i=1}^{n}\left[\mu_A(x_i)\log\{\mu_A(x_i)\} + \{1-\mu_A(x_i)\}\log\{1-\mu_A(x_i)\}\right],

then very easily I can find out, corresponding to x_1, the term 0.1 log 0.1 plus (1 − 0.1),
that is, 0.9, times log(1 − 0.1), that is, log 0.9.

So, corresponding to x_1, I can find out this part; then corresponding to x_2, I can find out
its component; corresponding to x_3, we have got this; and corresponding to x_4, we have got
this. Now, if we calculate, we will be getting one numerical value, and that is nothing but
0.2499; that is the entropy of this particular fuzzy set, which is nothing but the fuzziness
of the fuzzy set in numerical terms.

So, this is the way actually we can find out the fuzziness of a fuzzy set using the concept
of entropy.
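A small sketch of this entropy calculation is given below. The base of the logarithm is not stated explicitly in the lecture, so base-10 logarithms are assumed here, and the four membership values are illustrative ones (the slide's values are not fully transcribed) chosen so that the formula reproduces the 0.2499 figure quoted above.

```python
import math

def fuzzy_entropy(mu, base=10.0):
    """H(A) = -(1/n) * sum[ mu*log(mu) + (1 - mu)*log(1 - mu) ] over all elements."""
    def term(m):
        s = 0.0
        for p in (m, 1.0 - m):
            if p > 0.0:          # treat 0*log(0) as 0 for crisp memberships
                s += p * math.log(p, base)
        return s
    return -sum(term(m) for m in mu) / len(mu)

# Assumed membership values of x_1 ... x_4.
print(round(fuzzy_entropy([0.1, 0.3, 0.5, 0.4]), 4))   # -> 0.2499
```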

(Refer Slide Time: 25:29)

Now, we are going to discuss how to measure the inaccuracy of a fuzzy set. Suppose that I have
got two fuzzy sets, A(x) and B(x), defined in the same universe of discourse, and I want to
find out the inaccuracy of the fuzzy set B with respect to the fuzzy set A. Mathematically,
this can be expressed as follows:

I(A; B) = -\frac{1}{n}\sum_{i=1}^{n}\left[\mu_A(x_i)\log\{\mu_B(x_i)\} + \{1-\mu_A(x_i)\}\log\{1-\mu_B(x_i)\}\right].

So, this is the way we can find out the inaccuracy of a fuzzy set with respect to another
fuzzy set.

(Refer Slide Time: 26:37)

Now, here, if you see, I have got two fuzzy sets.

So, A(x) is nothing but this, and B(x) is nothing but this. Now, the inaccuracy of B(x) with
respect to A(x) is determined as I(A; B) = -(1/4) [0.1 log 0.5 + (1 − 0.1), that is, 0.9,
times log(1 − 0.5), that is, log 0.5, + ...]. This is the part corresponding to x_1; similarly,
corresponding to x_2, we have 0.2 log 0.7 plus (1 − 0.2), that is, 0.8, times log(1 − 0.7),
that is, log 0.3. So, corresponding to x_2 we can find out this, and then corresponding to x_3
and x_4 we can find out the remaining parts.

Ultimately, if we calculate, we will be getting 0.4717, and this is nothing but the inaccuracy
of the fuzzy set B with respect to the fuzzy set A. So, this is the way we can calculate the
inaccuracy of a fuzzy set with respect to another fuzzy set.
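The same calculation can be sketched in code. Only the first two terms are spelled out in the lecture (0.1, 0.2 for A and 0.5, 0.7 for B); the remaining membership values below are assumed ones that reproduce the quoted 0.4717, again with base-10 logarithms.

```python
import math

def fuzzy_inaccuracy(mu_a, mu_b, base=10.0):
    """I(A;B) = -(1/n) * sum[ mu_A*log(mu_B) + (1 - mu_A)*log(1 - mu_B) ]."""
    total = sum(ma * math.log(mb, base) + (1.0 - ma) * math.log(1.0 - mb, base)
                for ma, mb in zip(mu_a, mu_b))
    return -total / len(mu_a)

A = [0.1, 0.2, 0.3, 0.4]   # last two values assumed
B = [0.5, 0.7, 0.8, 0.9]   # last two values assumed
print(round(fuzzy_inaccuracy(A, B), 4))   # -> 0.4717
```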

(Refer Slide Time: 28:05)

Now, for the references for this discussion: as the textbook, you can see Soft Computing:
Fundamentals and Applications, written by me and published by Narosa Publishing House; and as
the reference book, you can see Fuzzy Sets and Fuzzy Logic: Theory and Applications, written
by George Klir and others, published in the year 1995.

(Refer Slide Time: 28:38)

Now, just to summarize quickly: till now, we have discussed a few terms related to fuzzy sets,
some standard operations on fuzzy sets have been discussed in detail, and after that we have
discussed the properties of fuzzy sets. We have already mentioned that, out of the ten
properties of the crisp set, the first eight properties are followed by fuzzy sets, but the
last two properties are not. And, at the end, we have defined mathematically how to determine
the fuzziness of a fuzzy set and the inaccuracy of a fuzzy set with respect to another fuzzy
set.

Thank you.

Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 07
Applications to Fuzzy Sets

We are going to discuss how to use the concept of fuzzy sets to solve a variety of real-
world problems. We have already explained the grammar of fuzzy sets and let us see,
how to utilize that fuzzy set to solve a variety of problems.

(Refer Slide Time: 00:35)

Now, as we told, we are going to concentrate on the applications of fuzzy sets. In this
lecture, we are going to concentrate mainly on two applications: one is how to design and
develop a fuzzy reasoning tool in the form of a fuzzy logic controller.

Now, if you see the literature, we have got two very popular approaches: one is called the
Mamdani approach and another is called Takagi and Sugeno's approach. Both the approaches will
be discussed in detail with suitable numerical examples; then we will concentrate on fuzzy
clustering. Clustering is done based on the concept of similarity: two similar points should
belong to the same cluster and two dissimilar points should go to two different clusters.
Let us see how to use the concept of fuzzy sets to design and develop the clustering tools.

(Refer Slide Time: 01:43)

Now, as I have already mentioned, the concept of fuzzy sets has been used to develop fuzzy
reasoning tools like the fuzzy logic controller, and then fuzzy clustering; and if you see the
literature, fuzzy sets have also been used for fuzzy mathematical programming, fuzzy graph
theory, and others.

(Refer Slide Time: 02:05)

Now, here, I am just going to concentrate on how to design and develop a fuzzy logic
controller or fuzzy reasoning tool. The first notable approach developed in this particular
direction is that by Mamdani and Assilian.

Now, Mamdani and Assilian, around 1974-75, developed one fuzzy reasoning tool or fuzzy logic
controller. The purpose was to model the input-output relationships of a process. Now, if you
see the working principle of this Mamdani approach, the performance of the fuzzy logic
controller depends on the knowledge base, and this knowledge base consists of the data base as
well as the rule base.

Now, let me first define the concept of this data base and rule base, now, let me take a
very simple example, very practical example, supposing that for this lecture room, I want
to control the temperature and humidity, and I want to keep the temperature and
humidity within a very reasonable range or very acceptable range. Now, let us see how to
control with the help of one air conditioner, supposing that I have got one AC here in this
particular room, and I want to control the temperature and humidity of this room.

Now, the first thing we will have to do is to identify the design variables. The design
variables could be something like this: for example, the temperature of this particular room,
the humidity of this particular room, the temperature outside the room, the humidity outside
the room, the thermal conductivity of the wall, and the number of people sitting in this
particular room. So, these are the design variables.

Now, supposing that for simplicity let me consider that I will have to develop fuzzy
logic-based expert system, which is going to control the valve opening of this particular
air conditioner so, that the temperature and humidity remain within the comfortable
zone. So, let me consider, for simplicity, that there are two inputs: one is the temperature
T and another is humidity H, and the output is nothing, but the angle of valve opening for
this particular air conditioner.

Now, let us see how to define the data base and the rule base, if I want to design and develop
one fuzzy reasoning tool or fuzzy logic controller. Here, the inputs are the temperature and
humidity inside this particular room, and the output is nothing but the angle of valve opening
for this air conditioner. So, what we do is, we try to define some range for the temperature.
Supposing that this is the range for the temperature, this is the minimum temperature and this
is the maximum temperature; let me consider the minimum temperature is 10 degrees centigrade
and the maximum temperature is 50 degrees centigrade. This whole range of temperature is
expressed with the help of some linguistic terms, and for simplicity, let me consider that I
am using only three linguistic terms.

Now, the linguistic terms are as follows: this is the low temperature, that is, the membership
function distribution or fuzzy set for the low temperature, denoted by L; this is the
membership function distribution or fuzzy set for the medium temperature, M; and this is for
the high temperature, denoted by H′. So, we have got the low temperature denoted by L, the
medium temperature denoted by M and the high temperature denoted by H′. This is nothing but
the membership function distribution for the temperature.

Similarly, for the humidity, we define the minimum and the maximum values in a particular
scale and using some unit. Let me put some numerical values: say, the minimum is 5 and the
maximum is 25, in a certain scale and using some unit, and once again let me use three
linguistic terms.

So, if I use three linguistic terms, one is L, that is, the low value of humidity; another is
the medium value of humidity, denoted by M; and another is the high value of humidity, denoted
by H′. So, the whole range of humidity is expressed with the help of three linguistic terms:
low, medium and high.

So, L, M and H′; this is the humidity. Now, this is what we mean by the data base for the
temperature and this is the data base for the humidity. Similarly, I can also construct one
data base for the output, that is, the angle of valve opening. So, this is the angle of valve
opening, denoted by A; I know the minimum value, I know the maximum value, and once again I
use three linguistic terms: one is small, denoted by S; another is medium, denoted by M; and
another is high, that is, the high or large angle, denoted by LR.

So, this particular range for the angle of valve opening is expressed using three linguistic
terms: small, medium M and large LR. This is what we mean by the data base of the variables,
that is, the inputs and the output. Now, using the concept of this particular data base, I can
define the rules. Here, there are three linguistic terms for the temperature and three
linguistic terms for the humidity; so, 3 multiplied by 3, there will be 9 possible rules; that
means, 9 possible combinations of the input variables.

Now, out of those 9, let me write only one; the rule could be something like this: if the
temperature T is low AND the humidity H is medium, then the angle of valve opening A is small.
So, this is one rule and, similarly, as I told, there could be 3 multiplied by 3, that is, a
maximum of 9 rules; this is what we mean by the rule base.

So, the rule base consists of a maximum of 9 rules, and out of the 9 rules, here, I have just
written a particular one. Let me repeat: if the temperature is low and the humidity is medium,
then the angle of valve opening is small. So, this is the way we define the data base and rule
base of this particular fuzzy reasoning tool, and the knowledge base consists of both the data
base as well as the rule base.
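As a rough sketch of how such a knowledge base can be represented in code, the snippet below encodes triangular fuzzy sets for the two inputs and one entry of the rule base; the exact triangle break-points are assumptions made only for illustration, not values given in the lecture.

```python
def triangle(a, b, c):
    """Triangular fuzzy set with corners a <= b <= c; returns a membership function."""
    def mu(x):
        if x < a or x > c:
            return 0.0
        if x == b:
            return 1.0
        return (x - a) / (b - a) if x < b else (c - x) / (c - b)
    return mu

# Data base (assumed break-points): temperature in 10-50 deg C, humidity in 5-25 units.
temperature = {"L": triangle(10, 10, 30), "M": triangle(10, 30, 50), "H'": triangle(30, 50, 50)}
humidity    = {"L": triangle(5, 5, 15),   "M": triangle(5, 15, 25),  "H'": triangle(15, 25, 25)}

# Rule base: one of the nine possible rules --
# "If temperature is low AND humidity is medium, then valve-opening angle is small."
rules = {("L", "M"): "S"}

print(temperature["L"](20), humidity["M"](10))   # memberships of a sample input pair
```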

Now, here, I am just going to make one comment that the performance of an FLC largely
depends on the rule base and optimizing the database is a fine tuning process. Now, here
the presence or absence of a rule that is going to dictate the output of that particular
fuzzy reasoning tool in a very large way. On the other hand, if I see the database
optimization that is nothing, but the fine tuning process, now let us see, what do you do
during this particular database optimization.

(Refer Slide Time: 11:57)

Now, if I consider that the data base is this type of triangular distribution, then this is
low, this is medium, and this is high. What does it mean? During the optimization, we either
increase or decrease the base of each triangle; that means, I will be getting some sort of
flatter triangle or some sort of steeper triangle.

Now, during the optimization actually we can vary the width or half base-width of this
particular triangle and by doing that, we are simply doing some sort of fine tuning. So,
this is the way actually, we do some sort of fine tuning just to improve the performance
of this fuzzy reasoning tool.

(Refer Slide Time: 12:55)

Now, if you see the literature, this fuzzy reasoning tool or fuzzy logic controller has been
divided into two groups: one is called linguistic fuzzy modeling and another is called precise
fuzzy modeling.

Now, let me try to explain what we mean by linguistic fuzzy modeling. By linguistic fuzzy
modeling, we mean that fuzzy modeling where we have got high interpretability but low
accuracy. Let us try to understand what we mean by the interpretability of a rule. Let me just
write down one hypothetical rule: if I_1, that is, the first input, is low AND I_2, that is,
the second input, is medium, then the output O is high. If this is a rule, the moment I read
this particular rule, I will be able to understand what the control action is.

So, if I read: if I_1 is low AND I_2 is M, then the output is high, immediately some control
action comes to my mind; that means, its interpretability is very good, that is, the
understandability of the meaning of this particular output for a set of inputs is very high,
and that is what we mean by the interpretability of a particular rule of the fuzzy reasoning
tool. Then comes the accuracy; accuracy is nothing but the precision, that is, the accuracy in
prediction of the output for a set of inputs. Now, the example of this linguistic fuzzy
modeling is nothing but the Mamdani approach, which will be discussed in much more detail.

Then comes the precise fuzzy modeling, and here we get low interpretability, but we will be
getting high accuracy or precision. The example is nothing but the Takagi and Sugeno's
approach of fuzzy reasoning tool. In Takagi and Sugeno's approach, the way we write down a
rule is as follows: if I_1 is low and (this is the conjunction 'and', not the logical AND
operator) I_2 is medium, then the output is expressed as a function of the input parameters,
that is, I_1 and I_2.

Now, this particular output could be either the linear function of the input parameters or
it could be non-linear function. And, the coefficients of this particular function will be
determined with the help of some optimizers, with the help of some training scenarios,
and that is why, we can give a guarantee of high accuracy. But, interpretability is low in
the sense, if I just read this particular rule, no control action is coming to my mind
directly.

So, if I read: if I_1 is low and I_2 is medium, then the output is a particular function of
I_1 and I_2, and let me write that as a_1 I_1 + b_1 I_2 + c_1. If I just write the output like
this, no control action directly comes to my mind, and that is why the interpretability of
this type of fuzzy reasoning tool is less.
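A tiny sketch of how such a Takagi and Sugeno consequent is evaluated is given below; the coefficient values are hypothetical, and the weighted-average combination of several rule outputs shown at the end is the commonly used aggregation, stated here as an assumption rather than as part of the lecture.

```python
# One Takagi-Sugeno rule consequent: output = a1*I1 + b1*I2 + c1 (coefficients assumed).
a1, b1, c1 = 0.5, -0.2, 1.0

def rule_output(i1, i2):
    return a1 * i1 + b1 * i2 + c1

def ts_crisp_output(firing_strengths, rule_outputs):
    """Weighted average of rule outputs, weighted by their firing strengths."""
    return (sum(w * o for w, o in zip(firing_strengths, rule_outputs))
            / sum(firing_strengths))

print(ts_crisp_output([0.6, 0.3], [rule_output(2.0, 1.0), 0.8]))   # -> about 1.47
```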

Now, I am just going to discuss this Mamdani approach in much more details.

(Refer Slide Time: 17:09)

Now, this schematic view shows actually the working principle of the Mamdani
approach of fuzzy reasoning tool. Now, if we want to implement the Mamdani approach
of this fuzzy reasoning tool the first thing we will have to do is, we will have to
concentrate on the process to be controlled and we will have to identify what are the
condition variables and what are the action variables; that means, what are the inputs and
what are the outputs.

Now, let me once again take the same example, the temperature and humidity control of this
particular room with the help of one air conditioner. Here, the inputs are, as I mentioned,
the temperature and humidity inside the room, the temperature and humidity outside the room,
then the thermal conductivity of the wall and the number of people sitting inside this
particular room. So, these are all condition variables; and what is the output or the action?
The output is nothing but the angle of valve opening, so that we can keep the temperature and
humidity of this particular room in a very comfortable zone.

So, the first task is, we will have to identify the inputs and the outputs; that means your
condition variables and action variables. Now, these condition variables are also known
as antecedents. So, this is known as antecedents and the action variables are known as
your consequents. So the antecedents and consequents, we will have to identify first and
once we got these particular antecedents, that is, condition variables, we go for the
fuzzification module.

So, by fuzzification module, we mean that corresponding to the set of inputs, we try to find
out what the membership function values should be. Let me take the very simple example of
temperature. If I consider that there are three linguistic terms to represent the temperature,
then this is the low temperature, this is the medium temperature and this is the high
temperature H′.

Now, here, this is the temperature axis; this is 10 degrees centigrade and this is 50 degrees
centigrade. Supposing that the room temperature is around 20 degrees, I might be here. If it
is so, what you can do is find out, corresponding to this particular temperature, what is the
membership function value for the low and what is the membership function value for the
medium.

So, this is the membership function distribution corresponding to the low temperature and this
is the membership function distribution corresponding to the medium temperature. This
particular task of determining the membership function values is nothing but fuzzification.
Once that fuzzification has been done, next we concentrate on the fuzzy inference.

(Refer Slide Time: 20:45)

Now, let us try to understand the utility of this fuzzy inference engine. Let me take the same
example of the fuzzy logic-based expert system to control temperature and humidity. As I told,
we consider a maximum of 9 rules, which are present in the rule base. Now, supposing that for
a set of inputs, at a time, only four rules are fired; out of these 9 rules, which four will
be fired is decided by this particular fuzzy inference engine.

So, depending on the set of inputs, which four out of these maximum 9 rules will be fired is
decided by the fuzzy inference engine. Once you have got those four fired rules, for each of
these fired rules I will be able to find out what should be the output, and that I am going to
discuss in much more detail; and once you have got the output of each of the fired rules, I
will be able to find out one combined output, and that is nothing but the fuzzified output.

So, here, I will be getting the fuzzified output; this particular fuzzified output will be
nothing but an area, which I will be discussing in more detail after some time. This output
is an area; for example, if the membership function is a triangle, there is a possibility that
I will be getting this type of truncated triangle. So, this truncated triangle, shown as the
shaded area, is nothing but the output of a particular rule, but here there is no crisp value.

So, what I need is, a crisp value corresponding to this particular area and how to find out
that particular crisp value? To find out the crisp value from this particular fuzzified area,
we take the help of some defuzzification module. Now, a few defuzzification modules
are available in the literature and we will be discussing all such things in much more
details with the help of some numerical examples. And, once you have got the crisp
output, that is nothing, but the actions to be taken for the same example like here the
output will be the angle of the valve opening; whether it is clockwise or anti clockwise
and by how much, I will have to rotate and what should be the valve opening, whether it
is plus 10 degree or minus 10 degree clockwise or anti-clockwise. So, that I will have to
decide, as the action variable that is nothing, but the output of this fuzzy reasoning tool.

(Refer Slide Time: 23:59)

Now, whatever we discussed, all such things I have written here. The first task, let me
repeat, is to identify the condition and the action variables, that is, the antecedents and
the consequents; and once you have got that, for a set of inputs we will try to find out the
membership values, and that is nothing but the task of fuzzification, carried out by the
fuzzification module.

(Refer Slide Time: 24:41)

And, once you have got that, next we go to the inference engine. I have already discussed the
purpose of the inference engine: out of all the possible rules, depending on the set of
inputs, which rules will get fired is decided by the inference engine.

Once you have got the fired rules, for each of the fired rules we will be getting the output;
how to combine them I am going to discuss in detail. And once you have got that combined
fuzzified output, we determine the crisp value corresponding to it, so that we get some crisp
value for the output and we can control that particular process. Now, let me once again repeat
the purpose of designing or developing this tool.

This fuzzy reasoning tool or fuzzy logic controller is meant to find out the relationship
between the inputs and outputs of an engineering process: the moment we supply a set of
inputs, I should be able to determine the set of outputs, so that we can control that
particular process very accurately. Now, how to design and develop it, I am just going to
discuss in much more detail with the help of some numerical examples.

Thank you.

Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 08
Applications to Fuzzy Sets (Contd.)

Now, we have discussed that for a set of inputs, with the help of fuzzy inference engine,
we will be able to find out, which are the rules to be fired.

(Refer Slide Time: 00:29)

Now, supposing that these two rules are going to be fired; here, we have got the membership
function distributions for the inputs and the output, and we will have to find out the output
of these two rules.

Now, the first rule is nothing but: if S_1 is A_1 AND S_2 is B_1, then f′ is C_1. Let us see
how to represent this in the figure. This indicates the first rule: if S_1 is A_1, so this is
nothing but the membership function distribution for A_1; and S_2 is B_1, so this is the
membership function distribution for B_1; then f′ is C_1, so this is nothing but the
membership function distribution for the output.

Let me repeat: if S_1 is A_1 AND S_2 is B_1, then f′, that is, the output, is nothing but C_1.
So, this indicates the first fired rule; similarly, the second fired rule: if S_1 is A_2, so
this is the membership function distribution for A_2; and S_2 is B_2, so this is the
membership function distribution for B_2; then f′ is C_2, so this is nothing but C_2. The
second rule is represented by this particular part of the figure.

So, this is the first fired rule and this is the second fired rule. Supposing that I am
passing one set of inputs, the inputs are nothing but S_1* and S_2*; so, this is the set of
inputs for this two inputs-one output process. Now, here, I am just going to pass S_1*, that
is, the first input. If I pass this first input, corresponding to the first fired rule I will
be getting one membership function value here, and I will be getting another membership
function value here; and corresponding to S_2*, I will be getting one membership function
value here and another membership function value here. Now, we concentrate on the first rule,
which is as follows: if S_1 is A_1, so here I will be getting some membership function value
corresponding to S_1*; and S_2 is B_1, so here, corresponding to S_2*, I will be getting some
membership function value; then f′ is C_1, that means this is nothing but the membership
function distribution for the output.

Now, once again, if we concentrate on rule 1, the first fired rule, here we have got one AND
operator, and the AND operator is nothing but the minimum operator. So, to find out the output
of the first fired rule, we compare this particular µ value and that particular µ value; this
µ value is corresponding to S_1* and that µ value is corresponding to S_2*, and we compare
these two µ values and try to find out the minimum, because we have got the AND operator.

So, if I compare this µ value and that µ value, this is the minimum; corresponding to that, I
can find out the fuzzified output of the first fired rule. Now, by following the similar
procedure for the second fired rule, which states: if S_1 is A_2 AND S_2 is B_2, then f′ is
C_2; this indicates the second rule. Now, corresponding to S_1*, I have got this particular
membership function value, and corresponding to S_2*,
I have got this particular membership function value and now, I will have to compare
this numerical value of membership and this numerical value of membership. And, if I
compare, this is found to be the smaller and there is AND operator. So, we will have to
consider the minimum value of µ and corresponding to that minimum value of µ . So, I
will be getting some output here and this shaded portion is nothing but the output of the
second fired rule. So, we have got the output of the first fired rule, we have got the
output of the second fired rule.

Now, we will have to combine. Now, to combine these particular outputs, we take the
concept of the OR operator. Now, in the rule base supposing that we have got a large
number of rules say there are nine rules. So, we say that either the first rule has got fired
or the second rule has got fired or the third rule has got fired, and so on. So, there is one
OR operator in between the rules and that is why, if we want to combine these two
outputs, what we are going to do is, we are going to use the OR operator and by OR
operator, we know that this is nothing but the max operator.

So, what we do is, we superimpose this particular shaded portion and that particular shaded
portion and try to find out the maximum. For example, this particular shape I have copied here
and that shaded area I have copied here. Now, if I try to find out the combined control
action, it is decided by this particular output; so, this is the area that indicates the
combined control action considering both the fired rules.

So, this is the way we combine the outputs of the two fired rules using the OR operator. Now,
you see, this particular combined output is nothing but a fuzzified output; it is an area, and
corresponding to this area we will have to find out the crisp value, and that is why we will
have to go for defuzzification.

(Refer Slide Time: 08:05)

So, whatever we discussed, the same thing I have written here; let me just read it out. I am
passing the two inputs S_1* and S_2*. Corresponding to S_1* and S_2*, for the first fired rule
I will be getting the firing strength, and that is nothing but α_1 = min{µ_A1(S_1*),
µ_B1(S_2*)}; this I have already discussed: you will have to find out the minimum of these two
µ values, and that is nothing but the firing strength of the first fired rule. Similarly, the
firing strength of the second fired rule can be found out using the expression
α_2 = min{µ_A2(S_1*), µ_B2(S_2*)}; so, we are going to compare these two µ values and find out
the minimum.

So, this is the way we determine the firing strength of each of the fired rules.
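In code, the firing-strength computation is just a min over the fuzzified inputs; the sketch below assumes triangular membership functions, and the break-points and input values are placeholders, not the ones shown in the figure.

```python
def tri(a, b, c, x):
    """Membership of x on a triangular fuzzy set with corners a <= b <= c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

s1_star, s2_star = 6.0, 7.0           # the pair of crisp inputs (assumed values)
alpha1 = min(tri(0, 5, 10, s1_star), tri(5, 10, 15, s2_star))   # rule 1: A1 AND B1
alpha2 = min(tri(5, 10, 15, s1_star), tri(0, 5, 10, s2_star))   # rule 2: A2 AND B2
print(alpha1, alpha2)                 # -> 0.4 0.2
```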

(Refer Slide Time: 09:20)

And, once you have got the firing strengths, now we are going for combining. So, we try
to find out the combined control action considering both the fired rules. And, as we have
already discussed that we take the help of some sort of max operator and that is nothing
but the OR operator; and by using the concept of max operator, which I have already
discussed, we can find out what should be the output, but this particular output is nothing
but the fuzzified output.

So, we will have to take the help of some sort of defuzzification.

(Refer Slide Time: 10:01)

Now, we are going to discuss the defuzzification methods. If you see the literature, we have
got a number of methods for defuzzification, and I am just going to use three methods which
are very popular. The first method is the center of sums method. According to this particular
method, what you do is as follows: supposing that we have got two fired rules.

For the first fired rule, supposing that I have got this type of output, that is, a truncated
triangle, and for the second fired rule, I have got another truncated triangle; of course,
there are some numerical values here, which I am not writing. Now, for the truncated triangle
corresponding to the first fired rule, what I am going to do is find out the area of this
truncated triangle, or trapezium, and the center of area, denoted by C_1.

Now, for this trapezium, we can very easily find out the area: if I know this particular
dimension, say a, the dimension b and the height h, then A_1 is nothing but (1/2)(a + b) × h.
So, very easily you can find out the area of this particular trapezium; and what should be the
center of area along this particular direction? The center of area I can find out from
symmetry; so, this could be the center of area.

So, along this dimension, this indicates the center of area; truly speaking, the center of
area could be here, and along the direction of b, that is, along the output variable, I can
find out this particular numerical value. So, I know A_1 and C_1, and following the same
procedure, I can also find out A_2 and C_2; and once you have got A_1, C_1, A_2 and C_2, very
easily we can find out the crisp output, denoted by U, which is nothing but

U = (A_1 C_1 + A_2 C_2) / (A_1 + A_2).

So, this is going to give the crisp output. Now, in this mathematical formulation, I have used
slightly different notations; the crisp value is

U'_f = \frac{\sum_{j=1}^{p} A(\alpha_j) f_j}{\sum_{j=1}^{p} A(\alpha_j)}.

Now, this A(α_j) is going to indicate A_1 and A_2, and this f_j is going to indicate C_1 and
C_2. So, using this particular expression, you can find out what should be the crisp output.

(Refer Slide Time: 13:59)

Now, if I just go back to the slide: this is one area, for which I know the area and the
center of area; this is another area, for which I also know the area and the center of area;
and this will be the crisp output. So, we can find out the crisp output, and while controlling
the process we will have to depend on this crisp output. This is the way the center of sums
method works.

(Refer Slide Time: 14:33)

The same thing I have explained here; this particular figure has been used to explain the
center of sums method, which I have already discussed.

Now, I am just going to discuss another technique, and that is called the centroid method.
Here, what we do is the following: the output of the combined control action is divided into a
few standard sub-regions.

So, what we do is, this whole area is divided into a number of sub-regions. So, this is
nothing but the first sub-region, this is nothing but the second sub-region, this particular
triangle is nothing but the third sub-region, this rectangle is the fourth sub-region, this
particular rectangle is the fifth sub-region. And, we have got a triangle that is the sixth
sub-region now, for each of this particular sub-region, very easily you can find out what
should be the area and the center of area, for example, say if I consider this particular
triangle that is denoted by 1 and very easily, I can find out what is the area. So, if I know
this, this particular dimension say a and if I know, so this particular dimension say b.

So, very easily I can find out the area, and this area is nothing but (1/2)(a × b). This is
the area of this particular triangle, and the center of this area can be determined very
easily: if the total dimension is a and I measure from here, this will be two-thirds of a and
this will be one-third of a. So, I can find out the center of area very easily for this
particular right-angled triangle.

Now, if I have got a rectangle, very easily I can find out its area if I know the dimensions,
say a and b; the area is nothing but a multiplied by b, and very easily I can find out the
center of area. If I have got this other type of right-angled triangle and I am measuring from
this particular side, then if this is a and this is b, I can find out the center of area: it
could be here, and this is one-third of a and this is two-thirds of a.

So, you can find out the one-third and two-thirds of a. Either we have this type of triangle,
or that type of triangle, or the rectangle, and for each of these sub-regions we can find out
the area and the center of area. Now, once you have got these areas and centers of area,

(Refer Slide Time: 18:39)

you can use the centroid method to find out what should be the crisp output. Now, the crisp
output is

U'_f = \frac{\sum_{i=1}^{N} A_i f_i}{\sum_{i=1}^{N} A_i},

where A_i is nothing but the area of the i-th small sub-region and f_i is nothing but the
center of area of the i-th small sub-region.

Now, I have already discussed how to determine the area and the center of area for each of
these regular sub-regions; once you have got that information, by using this particular
formula you can find out the crisp output considering the combined output. So, this is the
way, using the centroid method, you can find out the crisp value, and that is nothing but the
control action.
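In practice, rather than decomposing the region by hand into triangles and rectangles, the centroid is often approximated by sampling the aggregated output membership on a fine grid; the sketch below takes that short-cut (the output triangles and firing strengths are the same assumed values as in the earlier centre-of-sums sketch, aggregated with the max operator described above).

```python
def tri(a, b, c, x):
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

alphas = [0.4, 0.2]                            # assumed firing strengths
outs   = [(0.0, 2.0, 4.0), (3.0, 5.0, 7.0)]    # assumed output triangles C1 and C2

def aggregated(y):
    """Max of the clipped rule outputs at a point y of the output axis."""
    return max(min(a, tri(*s, y)) for a, s in zip(alphas, outs))

ys = [i * 0.01 for i in range(801)]            # output axis sampled from 0 to 8
num = sum(aggregated(y) * y for y in ys)
den = sum(aggregated(y) for y in ys)
print(round(num / den, 2))                     # discretised centroid of the shaded area
```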

(Refer Slide Time: 19:52)

Now, I am just going to discuss another method, that is called the mean of maxima
method.

Now, the mean of maxima method I can explain with the help of this figure very easily. Let me
repeat: considering both the fired rules, supposing that I am getting this type of fuzzified
output, which shows the combined output of both the fired rules; our aim is to find out a
crisp value. What we do is, we start from here and move in this particular direction, and we
try to find out the value of µ on this combined control action, that is, the fuzzified output.
If I am here, this is my µ; if I am here, this is my µ; and I just follow this principle,
starting from here and moving along this direction.

(Refer Slide Time: 21:01)

The value of µ is going to vary, and corresponding to this particular value of the variable I
will reach the maximum value of µ; after that, it will remain constant up to this value of the
output variable. That means, starting from here up to this, the value of µ will remain
constant at that maximum value; this is the range for getting the maximum value of µ, that is,
the membership value. After that, µ is going to reduce, then once again it will remain
constant, and then it is going to reduce.

Now, we try to find out the range for the output variable, where the µ reaches the
maximum. Now, here, it reaches the maximum and after that it will remain the same up
to this. That means, from here to here in this particular range, the µ will reach the
maximum and that will remain constant. So, we try to find out a range for the output,
where we get the maximum value for this membership, and once you have got this
particular range for the output variable, for which we get the maximum value for the µ ,
we try to find out the center of this particular range very easily.

So, we try to find out the midpoint of this particular range, and that is nothing but the
crisp output. This is the way we find the crisp value corresponding to the fuzzified output
using the mean of maxima method. So, these three methods are very popularly used to determine
the crisp value corresponding to the fuzzified output; let me repeat, the first method is the
center of sums method, the second method is the centroid method, and the third is the mean of
maxima method.
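For completeness, a sketch of the mean of maxima step on a sampled output: locate the plateau where the aggregated membership is maximum and return its midpoint (the sampled values below are an assumed example).

```python
# Sampled aggregated output membership (assumed values for illustration).
ys  = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
mus = [0.0, 0.1, 0.4, 0.4, 0.4, 0.3, 0.2, 0.1, 0.0]

peak = max(mus)
plateau = [y for y, m in zip(ys, mus) if m == peak]   # range where mu is maximum
crisp = (min(plateau) + max(plateau)) / 2.0           # midpoint of that range
print(crisp)                                          # -> 1.5
```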

Now, if I compare these three methods, the question may arise: out of these three methods,
which one is the best? It is a bit difficult to say which one is the best, but in terms of
computational complexity, the mean of maxima method is computationally the fastest one, and
the center of sums method and the centroid method will be difficult in terms of computation,
particularly if I consider a non-linear membership function distribution. Supposing that I am
just going to consider one Gaussian distribution as the membership function distribution for
the output.

So, this is the Gaussian distribution for the output variable; this is one Gaussian, there
could be another Gaussian here, and let me consider, for simplicity, only one Gaussian. What I
will have to do is find out this particular area; and if you want to find out the area of the
shaded portion, there is no way out but to take the help of integration, because this is a
non-linear one. Integration, you know, is computationally expensive, and that is why the
center of sums method and the centroid method are computationally heavy, and we can rely on
the mean of maxima method.

Now, I am just going to come back to the same query: out of these three, which one is the
best? As I answered, it is a bit difficult to declare which one is the best. The reason I am
just going to tell you with a very practical example. Now, this particular course, I hope,
will be taken by some undergraduate students also.

Now, supposing that after completing your degree you are going to join some industry, and in a
particular organization 10 people have joined at a time; these 10 people are coming from 10
different institutes, say X_1, X_2 up to X_10. The institutes are different, and 10 graduate
engineers have joined a particular organization. Now, your performance is going to decide the
output; whether you are coming from institute X_1 or X_2 or X_10 is not the thing which has to
be considered. The main consideration would be your performance, your output. The same is true
here: whether you are using the center of sums method or the centroid method or the mean of
maxima method, any one is good.
method or centroid method or mean of maxima method, anyone is good.

Because based on that we are going to develop the database and the optimized rule base,
so that this particular fuzzy reasoning tool is going to give you that output for a set of
inputs as accurately as possible. So, based on this particular method of defuzzification,
whether it is center of sums method or centroid method or mean of maxima method, we
are going to optimize. We are going to determine the knowledge base of the fuzzy
reasoning tool.

And, in fact, whether we are following the mean of maxima or center of sums or centroid method
is not the main thing; we will see the performance, that is, whether the fuzzy reasoning tool
or fuzzy logic controller is able to determine the output for a set of inputs accurately or
not. That is actually the performance of this fuzzy reasoning tool, and we are interested in
getting very accurate modeling of the input-output relationships of a process. Now, how to
ensure that, I am going to discuss in the next lecture.

Thank you.

Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 09
Applications to Fuzzy Sets (Contd.)

We have already discussed the working principle of the Mamdani approach of fuzzy reasoning
tool in the form of a fuzzy logic controller. Today, we are going to discuss how to solve one
numerical example using the Mamdani approach.

(Refer Slide Time: 00:31)

Now, the statement of the problem is as follows; now let us suppose a problem scenario
related to navigation of a mobile robot in the presence of four moving obstacles.

(Refer Slide Time: 00:56)

Now, here, we are going to solve one navigation problem of mobile robots, and this particular
problem is known as the dynamic motion planning problem; it is nothing but a navigation
problem of a mobile robot in the presence of some moving obstacles. Let me explain the problem
scenario. For simplicity, the physical robot has been represented by a point; so, this is the
point robot, its starting position is denoted by S, and the goal of this robot is indicated by
the point G. So, G is the goal and S is the starting point.

Now, let us consider one hypothetical situation in which there are no moving obstacles here.
If there is no moving obstacle, this is the starting point and this is the goal, and if I ask
it to find out one collision-free, time-optimal path, the robot will find out this particular
path, and this is actually the optimal path if there are no moving obstacles in the workspace.
But, here, the physical problem is slightly different, in the sense that we have got a few
moving obstacles: for example, we have got obstacle 1, denoted by O_1, then obstacle O_2, then
obstacle O_3 and obstacle O_4.

Now, what is our aim? Our aim is to find out the collision-free and time-optimal path for this
particular robot starting from the point S to reach the point G. To solve this problem, the
first thing we will have to do is find out the most critical obstacle for the robot. Now,
these arrows are the velocity directions of the different obstacles. Here, to determine the
most critical obstacle, I will have to see at least two things: one is the distance between
the present position of the robot and that of the obstacle, and another is the direction of
movement.

Now, if you see, obstacle O_1 is moving in this particular direction, obstacle O_2 is moving
in this particular direction, O_3 in this direction and O_4 in this particular direction. If I
consider both the distance as well as the moving direction, very easily we can find out the
most critical obstacle, and the most critical obstacle is nothing but O_2. That means, to
avoid collision with the most critical obstacle, I will have to make some planning with the
help of the fuzzy reasoning tool.

So, I am just going to develop one fuzzy logic controller or fuzzy reasoning tool so that the
robot can avoid collision with the most critical obstacle. To solve this problem, we take the
help of one fuzzy reasoning tool or fuzzy logic controller; let me write here FLC. For this
FLC, there should be a few inputs and an output. Now, the distance between the robot and the
most critical obstacle, measured up to its boundary, is one input for the fuzzy reasoning
tool, and another input could be the included angle GSO_2.

So, this particular angle could be another input; let me write here that the angle is another
input, denoted by A. And we will have to find out one output of the fuzzy reasoning tool; that
output could be the angle of deviation to avoid the collision with the most critical obstacle.
Supposing that the direction of movement of the robot, just to avoid collision with the most
critical obstacle, is something like this; if this is the scenario and I just put one point
here, say D, then the angle GSD is nothing but the deviation, which I can denote by D′. So, D′
is the angle of deviation to avoid collision with the most critical obstacle. Now, let us see
how to solve this particular problem using the principle of the Mamdani approach of fuzzy
reasoning tool.

(Refer Slide Time: 06:52)

Now, here are the membership function distributions for the distance and the angle inputs; we
have considered something like this, and this is the membership function distribution for the
output, that is, the deviation. If you concentrate on the data base, that is, the membership
function distribution for the distance, the range for the distance has been considered as 0.1
to 2.2 meters, and the whole range is represented using four linguistic terms.

For example, we have got very near, then near, far and very far, and for simplicity the
membership function distributions have been considered to be triangular. I could have
considered some sort of non-linear distribution also, like the Gaussian, but here, for
simplicity, we have considered the triangular membership function distribution. And, you can
see that there is an overlapping; for example, between the near distance and the far distance
there will be some overlapping, and that is according to the definition of fuzzy sets.

Now, the range of the angle input is divided into five linguistic terms. For example, if the
angle is between minus 45 and plus 45 degrees (minus meaning, say, clockwise and plus meaning
anti-clockwise; that means, if I consider my left-hand side as negative and my right-hand side
as positive, that is also possible), then A stands for ahead; so, this angle ahead is defined
in the range of minus 45 degrees to plus 45 degrees.

Similarly, we have got ahead right, ART, defined in the range of 0 to 90 degrees; then we have
got right, RT, from plus 45 to plus 90 degrees; then we have got ahead left, from minus 90 to
0 degrees; and then comes left, denoted by LT, from minus 90 to minus 45 degrees. So, there
are five such linguistic terms to represent the angle input. Similarly, to represent the
deviation, that is, the output of the fuzzy logic controller, once again we are using five
linguistic terms, like ahead right, ahead left and left. So, this is the way we will have to
manually design the data base, that is, the membership function distributions for the two
inputs and one output.

(Refer Slide Time: 10:12)

And, once we have got this particular membership function distribution, we are in a position
to design the rule base, that is, the manually constructed rule base; that means, based on the
designer's knowledge of this particular problem, the designer is going to design this rule
base.

Now, we have seen that for the distance input we have got four linguistic terms and for the
angle input we have got five linguistic terms. So, we have got 4 multiplied by 5, that is, 20
possible combinations of the input variables, and we have got a maximum of 20 rules. This
table shows all such 20 rules. Here, the distance is expressed in terms of the linguistic
terms like very near, near, far and very far, and the angle input is represented as left,
ahead left, ahead, ahead right and right.

Now, the first entry, that is, the first row, first column entry, indicates A; A means ahead,
and this is nothing but the deviation. Corresponding to this particular entry, if I write down
the rule, it looks like this: if the distance D is very near and the angle is left, then the
output, that is, the deviation D′, is ahead. So, this is nothing but a rule, and similarly we
have got 20 such rules here.

Now, this particular rule base has to be manually designed, but it may not be the optimal one,
and we will be discussing one method of optimizing this rule base after some time. Now, in
this numerical example, the statement is not yet complete: what we will have to do is find out
the output, that is, the deviation, for the set of inputs: the distance is 1.04 meters and the
angle GSO_2 is 30 degrees; we will have to use the Mamdani approach, and we are going to use
the different methods of defuzzification. Let us see how to proceed with this particular
numerical example and how to solve it.

(Refer Slide Time: 13:07)

Now, before I just go for that, let me once again try to concentrate here. So, the distance
is 1.04; that means, your, I might be here. So, let me just draw here.

128
(Refer Slide Time: 13:20)

So, this is my 1.04 distance, say it is 1.04 meter and the angle input it is 30. So, might be
might be I am here. So, if I am here, now corresponding to this particular 1.04, this
particular distance can be called near with this much of membership function value, and
this can also be called far with this much of membership function value.

Now, similarly, this angle 30 degree can be called ahead with this much of membership
function value and it can also be called ahead right with this much of membership
function value. Now, that means, if we just write it down here. So, corresponding to this,
this is nothing, but is your µ NR and this is nothing, but is your µ FR and here. So, this is

nothing, but is your µ A and this is your µ ART .

So, this is the way, actually we can find out the membership function distribution for the
set of inputs. Now, with this particular information, let me start with finding the solution.
Now, as I told, that the distance of say 1.04 meter can be called either near or far,
similarly the angle of 30 degree may be called either ahead or ahead right. Now, let us
try to find out your the membership function value.

Now, how to determine the membership function value, it is very simple, now let me
first concentrate on the distance, near. So, if you see the membership function
distribution for the near distance, you will be getting this type of distribution for the near

129
lying in the range of 0.8 to 1.5, and here, the input is your 1.04. Now, corresponding to
1.04, my aim is to find out, what should be this particular µ .

So, this µ actually, I will have to find out. Now, what we do is, we use the principle of
similar triangle. Now, if I consider say this is one triangle, and another triangle, if I
consider something like this, if I consider another triangle is something like this. So, this
is another triangle. So, these two triangles are actually similar. Now, if these two
triangles are similar, then we can say that this angle is actually common to both the
triangles, then this particular angle is equal to that particular angle.

So, I can use the principle of similar triangle. So, similar triangle, if I use, then very
easily you can find out what should be the µ value. Now, here, I have written. So, x
divided by your 1.0. So, x divided by 1.0 and opposite to that is this particular angle and
that is actually common for both the triangles, then you concentrate on this particular
angle.

So, now, opposite to this is nothing but 1.5 minus 1.04, and opposite to
this is nothing but 1.5 minus 0.8; that is, x/1.0 = (1.5 − 1.04)/(1.5 − 0.8), and if you solve it, you will be
getting x equal to 0.6571. So, this is the way, by using the principle of similar triangles,
very easily, I can find out what should be the value of this particular x and that is
nothing, but your µ NR . So, this is nothing, but is your µ NR , ok. So, µ NR is nothing, but

0.6571, now once you have got this particular µ NR . So, very easily you can find out, what

is µ FR .

130
(Refer Slide Time: 17:41)

Now, this µ FR is nothing, but is your 1 minus 0.6571 and that is nothing, but is your

0.3429. So, this 0.3429 is nothing, but is your µ FR .

Now, by following the similar procedure, the angle input of 30 degree can be called
either ahead or ahead right. So, this can be called ahead with the membership function
value of 0.3333 and this can also be called ahead right with the membership function
value of 0.6667, and once you have got these membership function values, we are in a
position just to find out what should be the fired rules. Now, here, we have seen that the
distance input could be either near or it could be far, similarly the angle input, it could be
either the ahead or we have got ahead right.
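Before moving on to the fired rules, the fuzzification step just described can be reproduced with a few lines of Python; the edge end points below are assumptions chosen to be consistent with the triangles described above, and they reproduce the quoted membership values.

```python
# Fuzzification of the crisp inputs (distance = 1.04 m, angle = 30 degrees)
# using the similar-triangle principle, i.e. linear interpolation on the
# triangle edges.  The break points are assumptions consistent with the
# membership values quoted in the lecture.

def edge(x, x0, x1):
    """Membership on a linear edge that is 0 at x0 and 1 at x1 (clamped to [0, 1])."""
    return max(0.0, min(1.0, (x - x0) / (x1 - x0)))

d, theta = 1.04, 30.0

mu_NR  = edge(d, 1.5, 0.8)       # near: falls from 1 at 0.8 m to 0 at 1.5 m
mu_FR  = edge(d, 0.8, 1.5)       # far : rises from 0 at 0.8 m to 1 at 1.5 m
mu_A   = edge(theta, 45.0, 0.0)  # ahead      : falls from 1 at 0 deg to 0 at 45 deg
mu_ART = edge(theta, 0.0, 45.0)  # ahead right: rises from 0 at 0 deg to 1 at 45 deg

print(round(mu_NR, 4), round(mu_FR, 4), round(mu_A, 4), round(mu_ART, 4))
# -> 0.6571 0.3429 0.3333 0.6667
```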

Now, using this particular information, I can write down the four fired rules, there could
be a maximum of four fired rules. Because here we have got two possibilities for the
distance and we have got two possibilities for the angle. So, 2 multiplied by 2, there
could be a maximum of 4 fired rules, but what is the total number of rules we have in the
rule base? That is nothing, but 20, then out of 20, a maximum of four can be fired. Now,
what are the fired rules? If distance is near, the first fired rule is: if distance is NR (now,
as I have already mentioned, this particular AND is actually the AND operator, not the
conjunction, and that is why you have to write it in capital AND) AND angle is A, then
deviation is RT.

131
So, this is actually the first fired rule. So, I am using NR and A combination, now, I will
have to use NR and ART combination; that means, if distance is NR and angle is ART
then deviation is A. So, this is nothing, but the second fired rule, then comes your the
third fired rule. So, I will have to use this particular FR along with A. So, if distance is
far, that is, FR and angle is A, that is, ahead, then deviation is ahead right. So, this is the
third fired rule, then comes the fourth fired rule, if distance is FR and angle is ART then
the deviation is A. So, as I mentioned there could be a maximum of four fired rules and
out of these 20 rules, these 4 rules could be fired.
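A minimal sketch of how these four fired rules could be represented and looked up is given below; only the four combinations quoted above are encoded, not the full 20-rule table.

```python
# A sketch of how the fired rules could be looked up.  Only the four rule
# entries quoted in the lecture are listed; the remaining entries of the
# full 20-rule table are omitted here.

rule_base = {
    ("NR", "A"):   "RT",   # if distance is NR and angle is A   then deviation is RT
    ("NR", "ART"): "A",    # if distance is NR and angle is ART then deviation is A
    ("FR", "A"):   "ART",  # if distance is FR and angle is A   then deviation is ART
    ("FR", "ART"): "A",    # if distance is FR and angle is ART then deviation is A
}

# Linguistic terms fired by the inputs (those with non-zero membership values).
fired_distance = ["NR", "FR"]
fired_angle    = ["A", "ART"]

fired_rules = [(d, a, rule_base[(d, a)]) for d in fired_distance for a in fired_angle]
for d, a, out in fired_rules:
    print(f"IF distance is {d} AND angle is {a} THEN deviation is {out}")
```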

Now, let us see how to determine the output corresponding to each of the fired rules,
now, to determine the output of each of the fired rules the first thing we will have to do
is, we will have to find out the firing strength of each of the fired rules.

(Refer Slide Time: 21:09)

Now, the firing strength for the first rule, that is denoted by α1, is nothing, but the

minimum between µ NR and µ A . Now, minimum between 0.6571, that is your µ NR and

µ A is nothing, but 0.3333 and we will have to find out the minimum.

So, this is nothing, but the minimum. Similarly, the firing strength for the second rule
that is α 2 is a minimum between µ NR and µ ART and that is nothing, but the minimum
between 0.6571 and 0.6667, and the minimum is your 0.6571, then comes your third

132
fired rule. And, its firing strength, that is, α 3 is nothing, but the minimum between µ FR

and µ A, that is, the minimum between 0.3429 and 0.3333, and the minimum is 0.3333.

Now, we will have to concentrate on the firing strength of the fourth rule that is your α 4

and that is nothing, but the minimum between µ FR and µ ART and that is nothing, but the
minimum between 0.3429 and 0.6667 and actually, the answer is 0.3429. So, this is the
way actually, we can determine the firing strength of each of these particular rules. And,
once you have got the firing strength, now we are in a position to determine what should
be the fuzzified output corresponding to actually each of the fired rules.
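The four firing strengths can be checked with a few lines of Python, using the min operator for the AND, as described above.

```python
# Firing strength of each fired rule = minimum of the two antecedent
# memberships (the Mamdani AND is taken as the min operator here).

mu_NR, mu_FR, mu_A, mu_ART = 0.6571, 0.3429, 0.3333, 0.6667

alpha1 = min(mu_NR, mu_A)    # rule 1: NR and A   -> 0.3333
alpha2 = min(mu_NR, mu_ART)  # rule 2: NR and ART -> 0.6571
alpha3 = min(mu_FR, mu_A)    # rule 3: FR and A   -> 0.3333
alpha4 = min(mu_FR, mu_ART)  # rule 4: FR and ART -> 0.3429

print(alpha1, alpha2, alpha3, alpha4)
```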

(Refer Slide Time: 23:11)

Now, here, if you see the first fired rule, that is represented by actually this; this is
nothing, but the first fired rule, and if you see this particular rule, if distance is near and
angle is ahead then the deviation is right. So, this is actually the first fired rule and what
are the inputs? The inputs are: distance is your 1.04 meter and the angle is your 30
degrees. So, these two inputs, we are passing and corresponding to the distance equals to
1.04, I will be getting actually a µ and this is nothing but µ NR. So, this is your µ NR and
similarly, corresponding to the angle of 30 degrees, I will be getting one µ here. So, this particular µ and this µ is

nothing, but is your µ A , now we will compare. So, µ NR and your µ A and there is an

133
AND operator, this is a minimum operator. So, we will try to find out the minimum
between this µ NR and µ A and if you see the minimum is nothing, but your µ A .

So, I will be getting actually the output of the first fired rule is something like this. So,
this is actually the shaded area, is actually the output corresponding to the first fired rule.
Now, by following the similar procedure; I can also find out your what should be the
output corresponding to the second fired rule. So, this is the second fired rule and here
corresponding to the distance equals to 1.04, I will be getting µ NR is nothing, but this.

So, this is your µ NR and corresponding to this particular angle. So, I can also find out the

µ and that is nothing, but is your µ ART .

And, if I just compare these two µ values. So, the minimum will be your µ NR and
corresponding to that, actually I can find out, what could be the fuzzified output. So, this
is nothing, but the fuzzified output and truly speaking, this is actually the membership
function distribution for the ART direction, ok. So, this is this shaded portion is nothing,
but the fuzzified output corresponding to the second fired rule.

Now, by following the similar procedure; so, I can also find out what should be the your
the fuzzified output corresponding to your third fired rule. So, if I just concentrate on the
third fired rule, this is the third fired rule and corresponding to the distance. So, this is
one µ so, I will be getting one µ that and I will be getting another µ here
corresponding to the second input, and if I just compare these two µ s. So, this is the
minimum. So, I will be getting the fuzzified output is nothing, but this shaded area.

Now, corresponding to the fourth fired rule, what we can do is, we can find out what should be the fuzzified output; now, here,
corresponding to the distance input, this is nothing, but your µ and here, corresponding
to the angle, this is nothing, but is your µ and if I compare these two µ s, this is the
minimum. So, the fuzzified output corresponding to the fourth fired rule will be
something like this. So, this is the way actually, we can find out what should be the
fuzzified output for each of the fired rules, and after that actually we will have to use the
OR operator just to combine all such outputs into one, and as I told this OR operator is
the max operator.

134
So, we will have to superimpose all such fuzzified outputs, the truncated area whatever
we have got and then, we will have to use this OR operator or the max operator. So, just
to find out what should be the fuzzified output considering to all four fired rules. Now, if
you see this particular fuzzified output corresponding to your all the fired rules, now you
will be getting actually this type of area.

(Refer Slide Time: 28:04)

Now, if you just see so, corresponding to one of the fired rules, you will be getting these
types of your output. So, this output is corresponding to one fired rule. So, this is the
output. Now, corresponding to another rule the output is something like this. So, this is
corresponding to another rule and then, comes your another fired rule. So, the output is
something like this. So, this is actually another fuzzified output. So, this is the fuzzified
output and corresponding to another, you can find out.

So, this is nothing, but the fuzzified output. So, all such things actually, you will have to
superimpose to find out what should be the fuzzified output considering all the fired
rules. Now, these are all fuzzified outputs, that means, this is actually the fuzzified
output now, which cannot be directly used for controlling a particular process or to take
any decision. So, what you will have to do is, from this fuzzified output actually we will
have to find out the crisp output. So, how to find out the crisp output that I will be
discussing.
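A compact way of writing this clip-and-superimpose step is sketched below: each fired rule's consequent is truncated at its firing strength (the min operator) and the truncated curves are combined point-wise with max (the OR operator). The output-term shapes are my assumptions, consistent with the ranges quoted earlier (ahead over −45 to 45, ahead right over 0 to 90, right over 45 to 90 degrees).

```python
# Aggregated (combined) fuzzified output of the Mamdani controller:
# mu_out(theta) = max over the fired rules of min(alpha_i, mu_consequent_i(theta)).
# The output-term shapes below are assumptions consistent with the stated ranges.

def tri(x, a, b, c):
    """Triangular membership: 0 at a, 1 at b, 0 at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

mu_A   = lambda t: tri(t, -45.0, 0.0, 45.0)               # ahead
mu_ART = lambda t: tri(t, 0.0, 45.0, 90.0)                # ahead right
mu_RT  = lambda t: max(0.0, min(1.0, (t - 45.0) / 45.0))  # right (rising edge, 45 to 90)

# (firing strength, consequent) of the four fired rules found earlier.
fired = [(0.3333, mu_RT), (0.6571, mu_A), (0.3333, mu_ART), (0.3429, mu_A)]

def mu_out(theta):
    """Membership of the combined (superimposed) control action at angle theta."""
    return max(min(alpha, mu(theta)) for alpha, mu in fired)

print(mu_out(0.0), mu_out(45.0), mu_out(80.0))   # e.g. 0.6571, 0.3333, 0.3333
```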

Thank you.

135
Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 10
Applications of Fuzzy Sets (Contd.)

So, corresponding to the four fired rules, this is actually the fuzzified output, the
combined fuzzified output which you have got.

(Refer Slide Time: 00:22)

And, as I told, our aim is to determine what should be the crisp output. And, to determine
the crisp output actually, we will have to take the help of defuzzification. Now, I am just
going to discuss the different methods of defuzzification. Now, let us try to concentrate
on the methods of the defuzzification.

136
(Refer Slide Time: 00:52)

Now, the first method for which I have already discussed the principle, and now I am
just going to apply to solve this numerical example. Now, the first method, this particular
method is your the center of sums method, now let us see according to this, how to
determine the crisp output. Now, if you see corresponding to the different fired rules, we
will be getting the different shaded regions as the fuzzified output; for example, say if I
concentrate on this particular output first.

So, this particular output corresponds to a particular fired rule and this shaded portion, I
have just drawn it here, and what is our aim? Our aim is to find out like what should be
the area of this particular shaded portion, and where should be the center of area. Now,
this is very simple now to find out the area and center of area, you can do it in different
ways. Now, one of the very simplest ways could be something like this, for example, say
I have got this type of say truncated area.

So, this type of truncated area I have got, ok. Now, how to find out the area and center of
area, it is very simple. So, you just divide into two regular regions. So, this is nothing,
but a triangle. So, this is nothing, but a triangle and this is nothing, but one rectangle.
Now, for this particular triangle, very easily, I can find out what should be the area. Now,
if I know, this particular dimension is say a and if I know this particular dimension is say
b. So, the area, this is A_1; so, this is A_1 = (1/2) × a × b, and where will be the center of area?
The center of area is C_1.

137
So, starting from here, it is two-thirds, one-third. So, this is nothing but this particular
dimension starting from here; so, it will be two-thirds of a. So, I can find out that this is
nothing but two-thirds of a. So, if this is the situation,
very easily I can find out what is C_1. Similarly, for the second sub-region,
that is the rectangular one, if I know this particular dimension, say c, very
easily I can find out A_2; so, for this, the area is A_2.

So, A_2 is nothing, but b multiplied by c, and very easily you can find out what is C_2,
that is the center of area from symmetry, I can find out very easily. And, once you have
got this particular thing, you just add what should be the total area. So, the total area is
nothing, but A_1 plus A_2 and if you calculate this will become 12.5 and the center of
area. So, how to find out the center of area? It is very simple that is nothing, but
(A_1 C_1 + A_2 C_2) / (A_1 + A_2).

Now, this will give you the center of area. Now, for this actually, the shaded region if
you find out the center of area so, you will be getting as 71, ok. This is how to find out
actually the area and center of area corresponding to one fired rule and its shaded region
as the fuzzified output. Now, if you follow the same principle for the other fired rules,
for example, say if I concentrated on say another fired rule. So, this particular fired rule
that is your so, this is from minus 45 to 45. So, this is your this particular thing.

(Refer Slide Time: 05:08)

138
So, this particular shaded region, if we consider, very easily you will be able to find out
what should be the area and center of area. Now, how to find out the area and center of
area it is very simple, it is just like a trapezium. So, for this trapezium, if I know this
particular dimension say a and this particular dimension say b and if I know so, this
particular dimension say h. So, very easily I can find out this area is nothing, but
(1/2) × (a + b) × h. So, this is nothing, but the area.

And, if you calculate, you will be getting this as the area and how to find out the center
of area, from actually, it is symmetric, I can find out the center of area; if it is minus 45
and if it is plus 45. So, the center of area should pass through 0. So, this is the way
actually you can find out the area and center of area. The same method you can also
adopt for this, now if you just do here.

(Refer Slide Time: 06:29)

So, what you can do is your for this particular area so, 0 to 90. So, 0 to 90 is nothing, but
this. So, I can use the same principle of how to determine actually your area and center
of area.

139
(Refer Slide Time: 06:43)

So, this shaded region. So, I can find out area and center of area following the same
method and for this, you will be getting the area is nothing, but is your 25 and the center
of area. So, this is 0 and 90 center of area will be your 45. And, next we concentrate on is
another this thing another shaded area and this is nothing, but is your another shaded area
is something like this and that means, I am here. So, this particular shaded area and I can
find out following the same principle, what is the area and what is the center of area. The
area will become equal to this 25.5699 and the center of area will be 0. Now, for each of
the fired rules, I am able to find out what is area and what is center of area.

Now, according to this center of sums method, the crisp output can be
determined as the sum of (area multiplied by the center of area) over all the fired rules,
divided by the sum of all the area values. So, here, we will be getting the crisp output.
So, using the center of sums method, I can find out 19.5809 as the crisp output. Now, I
will see for the same problem actually how to use the other method of defuzzification.
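The whole center of sums calculation can also be reproduced numerically, as sketched below; here the area and the center of area of each clipped output set are obtained by simple sampling instead of the triangle-plus-rectangle decomposition, but with the assumed output-term shapes the crisp value comes out close to the quoted 19.5809.

```python
# Center of sums defuzzification, reproduced by sampling each clipped output
# fuzzy set separately.  Output-term shapes are the same assumptions as before.

def tri(x, a, b, c):
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

mu_A   = lambda t: tri(t, -45.0, 0.0, 45.0)
mu_ART = lambda t: tri(t, 0.0, 45.0, 90.0)
mu_RT  = lambda t: max(0.0, min(1.0, (t - 45.0) / 45.0))

fired = [(0.3333, mu_RT), (0.6571, mu_A), (0.3333, mu_ART), (0.3429, mu_A)]

step = 0.01
thetas = [-45.0 + step * i for i in range(int(135.0 / step) + 1)]

num, den = 0.0, 0.0
for alpha, mu in fired:                       # each clipped area is used separately
    area   = sum(min(alpha, mu(t)) * step for t in thetas)
    moment = sum(t * min(alpha, mu(t)) * step for t in thetas)
    num += moment                             # area_i * centroid_i
    den += area

print(round(num / den, 2))                    # about 19.58
```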

140
(Refer Slide Time: 08:48)

Now, the principle of centroid method actually I have already discussed. Now, here
actually, what we do is, we try to divide the whole region, the whole fuzzified output
after considering all the fired rules. We try to divide it into a number of regular sub-
regions. For example, say if I consider all the four fired rules, the combined output is
nothing, but this.

So, if this is nothing, but the combined output, say this is the combined output, after
considering all four fired rules. So, what we do is, we try to divide this into a number of
regular sub-regions. Now, the sub-region could be something like this, for example, say
one sub-region could be this. So, one is nothing, but a triangle, next we can go for the
second sub-region that is nothing but the rectangle, then here, we have got a triangle. So,
this is nothing, but 3 and we have got actually here, one such your rectangular sort of
thing and this is nothing, but is your 4.

So, this particular shaded portion, the combined control action, is divided into 4 regular
sub-regions. Now, if I concentrate on each of these particular regular sub-
regions for example, say if I concentrate on sub-region 1. So, this is actually the sub-
region 1, that is nothing, but a triangle. So, I know the dimension. So, very easily, I can
find out what is this particular area of the triangle and I can also find out what should be
the center of area that is your two-third one-third principle, and I will be able to find out
the center of this particular area.

141
Now, I concentrate on your the second one, that is nothing, but the rectangle. So, if I
consider so, these particular rectangle here. So, what will be getting is your so, this is the
rectangle. So, very easily I can find out what is the area and the center of area. So, this
will become equal to 0. Next is your, I can concentrate on the third region and that is
nothing, but your the triangle. So, this small triangle, that is this particular small triangle
if I consider; I will be getting this type of triangle here, and I know the dimensions. So,
very easily, I can find out what is area and center of area, then I can concentrate on this
rectangle.

So, your this particular rectangle and I can find out the area and the center of area. So,
this is the rectangle, area is nothing, but this and center of area is nothing, but this. Now,
for each of the sub-regions, I know the area and center of area, now I can find out what
should be the crisp output corresponding to this.

(Refer Slide Time: 12:13)

And, the crisp output corresponding to this is nothing but U; U is nothing but A
divided by B, where A is the sum of (area multiplied by center of area) over the four
regular sub-regions, and B is the sum of all the area values. And, if we just calculate
A divided by B, you will be getting the crisp value, that is nothing but 19.4450.

142
So, this is nothing, but the crisp output corresponding to the fired rules. So, this is the
way actually, you can find out the crisp output using the centroid method. Now, then
comes your another method that is called the mean of maxima.

(Refer Slide Time: 13:09)

Now, the principle of this particular mean of maxima; I have already discussed now let
us see how to use the same principle to find out the crisp output. Now before I discuss
once again the mean of maxima, the first thing we will have to do is. I will have to
indicate actually the combined control action, that is your fuzzified output considering all
the four fired rules.

Now, if I just indicate, I will be getting this is actually the combined control action, the
fuzzified output. So, this is the area, ok. Now, what you do is. So, we start from here and
we increase this particular variable in this particular direction and we try to find out. So,
corresponding to these, we try to find out the µ , corresponding to this we try to find out
µ , here we try to find out µ we try to find out µ . So, we can find out that the µ is
going to vary or it may remain constant or different values for this particular your
deviation angle.

Now, if I just look into this. So, there is a possibility that it will be getting a range for the
deviation, the range is something like this. So, this particular range corresponding to
which actually you will be getting the maximum value for this µ . So, starting from here

143
up to this, the µ is something like this, that will remain constant and we are able to
reach the maximum value for this particular µ . And, if you calculate for this problem,
this will become equal to 0.6571.

Now, here actually, what you can do is, I can identify the range for the deviation
corresponding to which I am getting the maximum value for this particular µ and that
particular range starts from here and it will end here. So, this is nothing, but actually the
range, where we are getting the maximum value for this particular membership that is
µ ; and once you have got this particular range. So, what we do is we try to find out the
mid value of this particular range as the crisp output. For example, say, here we are
getting one value for the range is your minus 15.4305 degrees and the maximum value of
the range for which you have got maximum value of µ is 15.4305 degree and its mid
value is nothing, but 0. So, your the crisp value is nothing, but 0.
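Under the same assumed output shapes, the mean of maxima can be estimated numerically as sketched below: the plateau of maximum membership (0.6571) is located by sampling, and its mid value gives the crisp output of about 0.

```python
# Mean of maxima defuzzification, estimated by sampling the combined
# fuzzified output mu_out(theta) built earlier (same assumed shapes).

def tri(x, a, b, c):
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

mu_A   = lambda t: tri(t, -45.0, 0.0, 45.0)
mu_ART = lambda t: tri(t, 0.0, 45.0, 90.0)
mu_RT  = lambda t: max(0.0, min(1.0, (t - 45.0) / 45.0))
fired  = [(0.3333, mu_RT), (0.6571, mu_A), (0.3333, mu_ART), (0.3429, mu_A)]
mu_out = lambda t: max(min(a, m(t)) for a, m in fired)

step = 0.001
thetas = [-45.0 + step * i for i in range(int(135.0 / step) + 1)]
values = [mu_out(t) for t in thetas]
peak = max(values)

# All angles whose membership is (numerically) at the maximum value.
plateau = [t for t, v in zip(thetas, values) if abs(v - peak) < 1e-6]
print(round(min(plateau), 2), round(max(plateau), 2))   # about -15.43 and 15.43
print(round((min(plateau) + max(plateau)) / 2.0, 2))    # mean of maxima: about 0.0
```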

So, using the mean of maxima, I will be getting your the crisp value and that is equal to
0, now you see the problem. Now, by using the three different methods of your
defuzzification; in the first method we have got the crisp value for the same set of inputs
as 19.5809, and using this particular centroid method, what we have got is the crisp
value of 19.4450. So, this is actually the crisp value corresponding to the
centroid method and then, using this, your the mean of maxima method actually I am
getting this crisp value and that is nothing, but 0.

(Refer Slide Time: 17:12)

144
So, even the inputs are the same. So, by using the three different methods of your
defuzzification, I am getting three different values, now my question is, which one to be
believed and which one is the most accurate. The answer is you can believe any one and
you can carry out the optimization because ultimately, if you want to utilize this fuzzy
reasoning tool or the fuzzy logic controller, the main thing which you will have to do is
you will have to find out, what should be the optimal knowledge base.

Now, how to find out the optimal knowledge base, I have not yet discussed, I will be
discussing in detail. Now, this knowledge base consists of your data base and the rule
base and by using any one of these methods of defuzzification, I will be getting the
output and based on that particular output, I am just going to use some optimizer and
with the help of training scenario, I am just going to develop what should be my optimal
knowledge base, that is the optimal data base and optimal rule base.

Now, once you have got that particular knowledge base, the optimal knowledge base,
your fuzzy reasoning tool is going to perform in a very nice way, in a very good way and
it does not depend on which method of defuzzification you have used. You can either use
the center of sums method or you can use the centroid method or the mean of maxima
method and you can find out, what should be the optimal knowledge base that is the
optimal data base and the rule base, so that this fuzzy reasoning tool can perform in the
optimal sense for any set of input parameters.

Now, here, I just want to compare these three methods of defuzzification once again in
terms of computational complexity. Now, this I have already discussed in my last
lecture, but once again I just want to mention that in terms of computational complexity,
mean of maxima method is the fastest. Because in center of sums method or centroid
method, you will have to calculate area and center of area, which is computationally very
expensive, particularly if you consider the non-linear distribution of this membership.

Now, if we consider the non-linear distribution for example, say if I consider say
Gaussian or if I consider say the bell-shaped membership function distribution, and if I
have to find out what should be the area and center of area. For example, corresponding
to the µ supposing that I am getting so, this is the area. So, for determining this
particular area, we will have to take the help of your integration and you know that
integration is computationally very expensive and that is why, it is better to use the mean

145
of maxima because it does not calculate the area or the center of area, but still it can find
out actually one crisp value. And using this particular crisp value, you can carry out
some sort of training or some sort of optimization for the fuzzy reasoning tool so, that we
can find out the database and the rule base.

And, once you have got this particular the optimal knowledge base for the fuzzy
reasoning tool, now we are in a position to use this fuzzy reasoning tool online because
once the training is obtained, once it is trained, now the computational complexity is not
a problem. And, within a fraction of second using the fuzzy reasoning tool, you will be
getting the output for a set of inputs. So, this is the way actually the Mamdani approach
works, but as I told that this particular approach is having some problem, it has got some
merits and demerits. This I have already discussed, but once again, let me repeat a little
bit now, for this Mamdani approach actually.

(Refer Slide Time: 21:50)

So, it has got very good interpretability, this I have already mentioned now; that means,
if you read this particular rule, very easily you can find out what should be the control
action; that means, it has got very good interpretability, but the problem is actually the
accuracy. Now if we want to get more accuracy in Mamdani approach. So, what you will
have to do is. So, you will have to use large number of linguistic terms, let me take a
very simple example. Now, supposing that I am just going to develop Mamdani approach

146
having two inputs and one output, I_1 and I_2 are the inputs and output is nothing, but
O.

Now, if I use say four linguistic terms for I_1 and four linguistic terms for I_2. I will be
getting 4 multiplied by 4 that is 16 rules. Now, similarly if I consider say 5 linguistic
terms for I_1, 5 for your I_2. So, I will be getting say 25 rules. Now, remember one
thing, if the range is divided into a large number of small-small sub-regions or segments;
that means, if I use more number of linguistic terms to represent a particular variable
there is a possibility I will be getting the better accuracy.

Now, if I use more number of linguistic terms, for example, for I_1 if I use say 10
linguistic terms and for I_2 if I use say 10 linguistic terms, I will have say 100 rules now.
So, if I need more accuracy, the number of rules is going to increase and if the number of
rules increases what will happen to the computational complexity, that is also going to
increase. That means, in Mamdani approach, I may get slightly better accuracy or the
precision, but at the cost of more computation and which is also not desirable.

So, as I told that Mamdani approach has got both merits as well as demerits. Now, in the
next lecture, I will be discussing another method in which you will see that
interpretability may not be so much good, but accuracy could be better that I will be
discussing in the next lecture.

Thank you.

147
Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 11
Applications of Fuzzy Sets (Contd.)

We are going to discuss the working principle of another very popular fuzzy reasoning
tool, that is known as Takagi and Sugeno’s approach.

(Refer Slide Time: 00:27)

Now, here, in Takagi and Sugeno’s approach, what you do is, a particular rule, say i-th
rule is represented as follows. Now, here, the rule is written like this: if x_1 is A_1^i and x_2
is A_2^i and (there are a few terms here) and x_n is A_n^i, then y^i = a_0^i + a_1^i x_1 + ... + a_n^i x_n.
Now, here, I have already mentioned a little bit that in this approach for a particular rule,
the output of a rule is nothing, but the function of the input parameters or the input
variables. Now, here, in this rule I am considering that there are n such variables like
your x_1, x_2 up to x_n. So, this output that is y^i that is represented as a function of the
input parameters. Now, if you see, we have got a few coefficients for example, say
a_0^i, a_1^i, ..., a_n^i; these are all coefficients.

Now, these coefficients are to be predetermined. How to determine that? Now, what I do
is, we take the help of some optimisation tool and with the help of some available data;

148
so, we try to find out what should be the numerical values of the coefficients. Now,
generally, we use an optimization tool that is known as the least squares technique. So,
this least squares technique actually, we generally use, to find out the values for the
coefficients.

Now, you can see this output is a linear function of the input parameters and that is
why so, this approach can be termed as linear approximation of a non-linear system.

(Refer Slide Time: 02:46)

So, this is nothing, but actually the non-linear system’s representation, as a combination
of several linear systems. Now, here, if I know the i-th rule; now what you do is, we try
to find out the strength of the i-th rule or the weight of the i-th rule. Now, this weight of
the i-th rule is represented as w^i = µ_{A_1^i}(x_1) × µ_{A_2^i}(x_2) × ... × µ_{A_n^i}(x_n). That means, the
strength of the i-th rule that is represented as the membership value of A_1
corresponding to the i-th rule, the moment I am passing this x_1 as the input variable
multiplied by the µ A 2 the corresponding to the i-th rule, and the moment I am passing
this x_2 as the input variable.

So, all such µ values we try to find out and the last term is nothing but µ_{A_n^i}
corresponding to your x_n. So, here, as if we are passing all the input parameters or the
input variables like x_1, x_2 and x_n and we try to find out what should be the µ value,

149
what should be the µ value and what should be the µ value and after that we multiply
all the µ values and that will be the weight of the i-th rule.

(Refer Slide Time: 04:31)

And, once I got this particular weight of the i-th rule, now next what we can do is, we
can find out the output very easily using this particular expression. Now, for the
combined control action, this output is y = (w^1 y^1 + w^2 y^2 + ... + w^k y^k) / (w^1 + w^2 + ... + w^k). Now, here, k indicates the number of

the fired rules and what is y^i; y^i is nothing, but the output corresponding to your i-th
rule and this w^i is nothing, but the weight. Now, how to determine so, this particular
output, which is the function of the input parameters.

150
(Refer Slide Time: 05:21)

So, that I am going to discuss after sometime. But, before that let me tell you that, here in
Takagi and Sugeno’s approach, the way we express the output as a function of the input
parameters. Now, if I just read one rule so, no control action will be coming to my mind
and that is why actually here the interpretability is much less. Although we can go for the
better accuracy, because we take the help of some optimizer, some optimisation tool and
if you just optimise with the help of some known data. So, there is a possibility, you will
be getting very accurate coefficients, the values of the coefficients and ultimately, you
will be getting very accurate output.

So, the prediction will be really good and very accurate in this Takagi and Sugeno’s
approach. Now, to explain the working principle of this approach further, so, we are
going to take the help of one numerical example. Now, let me give the statement of this
particular the numerical example which we are going to solve. Now, this statement is as
follows: like your say a fuzzy logic-based expert system is to be developed, that will
work based on Takagi and Sugeno’s approach to predict the output of a process. Now,
the database of the FLC is shown. So, I am just going to show you the data base, that is
the membership function distribution of these particular FLC, particularly for the two
inputs because for the output variable, there is no such membership function distribution.

So, as there are 2 inputs I_1 and I_2 and each input is represented using three linguistic
terms for example, say low, medium and high for I_1 and near, far and very far for I_2.

151
So, there is a maximum of 3 multiplied by 3, that is, 9 feasible rules. The output of
the i-th rule, that is denoted by this y^i, where i varies from 1 up to 9, is expressed as follows.

(Refer Slide Time: 07:43)

So, I am just going to give you that particular expression for the output of the i-th rule.
Now, the output of the i-th rule, that is y^i, is nothing but a function of the two input
variables, that is, y^i = f(I_1, I_2) = a_j^i I_1 + b_k^i I_2. So, what we do is, we consider a linear function of the input variables.

So, output of i-th rule is nothing, but the linear function of the two input variables, your
I_1 and I_2, where j, k are 1, 2 or 3. Now, this a_j^i, for example: if I put j equals to 1, 2 and 3,
I will be getting a_1^i = 1, a_2^i = 2, a_3^i = 3, if I_1 is found to be low, medium and high,
respectively. Now, similarly, this b_k^i: if I put k equals to 1, 2 and 3, then
b_1^i = 1, b_2^i = 2, b_3^i = 3, if I_2 is seen to be near, far and very far, respectively.

Now, we will have to calculate the output of the FLC corresponding to the inputs like I_1
equals to 6.0 and I_2 equals to your like 2.2. So, this is the statement. So, this is a very
simple system having 2 inputs and 1 output. So, I have got this particular I_1 and I_2 and
I will have to find out this particular output and this is Takagi and Sugeno’s fuzzy logic
controller. And, let us see, how to determine the output for a set of inputs.

152
(Refer Slide Time: 09:48)

Now, if you see the membership function distribution for the inputs like this is the
membership function distribution for the first input, that is your I_1, the range for I_1 is
5.0 to 15.0. And, this particular range is expressed using 3 linguistic terms like your low,
medium and high and as I told previously that for simplicity, we have consider the
triangular membership function distribution.

Now, here, we consider one isosceles triangle. Similarly, for this low, we consider some
sort of your the right angle triangle and for a high also, we are going to consider some
sort of say right angle triangle. Now, similarly for this I_2 the second input variable,
what you do is, the whole range starting from 1.0 to 3.0, that is divided into 3 linguistic
terms; that means, 3 linguistic terms are used to represent I_2, one is your this NR is the
near, FR stands for far and VFR is your very far. So, using actually the three linguistic
terms, we can represent the input variables like your I_1 and I_2 and once I got this
particular representation for the inputs that is nothing, but the database.

153
(Refer Slide Time: 11:23)

So, now you are in a position to find out what should be the output for the set of inputs.
Now, let us try to concentrate. So, here I_1 is 6.0 and this should be in fact, your I_2. So,
here there is a small mistake. So, this should be I_2. So, I_2 is equals to 2.2. Now
corresponding to your the 6.0 and 2.2. So, let us try to find out like the membership
function value. So, 6.0; that means, I am here so, 6.0 I could be here, now 6.0 can be
called medium with some membership function value and it can also be called low with
another membership function value.

So, it is called medium with this much of membership function value, say µmedium and
this can be called low with another membership function value and this is nothing, but is
your µlow . Now, similarly like 2.2 so, this value of I_2 is 2.2 and corresponding to this
particular 2.2. So, I can find out what should be the membership function value. So, if it
is very far so this is nothing, but the membership function value for very far and
similarly, this is the membership function value for your the far and once you have
calculated so, these membership function value. So, we can proceed further and how to
calculate this membership function value that I have already discussed in details.

Now, here so, we have got that this I_1 that is equal to 6.0. So, it may be called either
low or medium. Similarly your I_2 that is 2.2 can be called either far or very far. Now,
once you have got this particular µ value, now let us see how to determine the µ value,
how to determine the µ value and that I am going to discuss once again.

154
(Refer Slide Time: 13:40)

Now, here you see so, this is your the membership function distribution for low. So, I am
just going to show. So, this particular right angle triangle, at it starts from 5 to and it ends
at 10 and corresponding to this particular 6. So, what I will have to do is, I will have to
calculate this x. Now, as we have already discussed once again we are going to use the
principal of the similar triangle. So, if I use the principle of similar triangle, I can find
out what should be the value of x for example, say.

So, this particular triangle is similar to your this triangle; that means, this angle is equal
to that particular angle and this angle is the common angle. So, I can write down x
divided by 1.0 is nothing, but 10 minus 6 divided by your 10 minus 5. And, now if I just
find out the value of x, x will come out to be equal to 0.8. So, this membership value that
is your µlow is nothing, but is your 0.8. So, this is the way actually we can determine the
value of the membership.

155
(Refer Slide Time: 15:09)

So, this input I_1 of 6.0 may be called low with membership value 0.8 and the same
input I_1 that is equal to 6.0 may be called medium with your the membership value of
0.2 and that is nothing, but 1 minus 0.8 and that is equal to 0.2. Similarly, the input I_2
of 2.2 may be called far with the membership function value. So, µ far is 0.8. So, this can

be calculated by following the same procedure.

Now, the input I_2 of 2.2 may be called very far with the membership function value 0.2.
So, for each of these inputs I_1 and I_2 and with respect to their linguistic terms, we are
able to find out what should be the membership function values. And, once you have got
this particular membership function values.

156
(Refer Slide Time: 16:21)

So, we are in a position to find out what should be the weight of each of these particular
the fired rules. Now, before I calculate the weight of the fired rules, let me try to identify or
let me try to mention here the sets of fired inputs. Now, the sets of fired inputs are as
follows. So, if I_1 is low and I_2 is far.

So, this is actually the first set of the fired input parameters or input variables, the second
set of fired input parameters is: if I_1 is low and I_2 is very far; then the third set of
fired input parameters: if I_1 is medium and I_2 is far; and the fourth set of fired
inputs is if I_1 is medium and I_2 is very far. Now, corresponding to these sets of fired
input parameters, we should be able to find out the weight. Now, let us see how to find
out these weight values.

157
(Refer Slide Time: 17:36)

Now, to determine the weight values, what you do is. So, we try to find out w_1 that is
nothing, but the weight of the first fired rule and this w_1 is nothing, but µlow multiplied

by µ far . Now, µlow is 0.8 and µ far is once again 0.8 and if you multiply. So, you will be

getting your 0.64. Now, similarly corresponding to the second the fired rule, what you
can do is, we can find out the weight, that is, w_2 is nothing, but µlow multiplied by

µveryfar, that is 0.8 multiplied by 0.2, and that is nothing, but 0.16.

Now, similarly corresponding to the third rule; so, we can find out that is nothing, but
µ M multiplied by µ FR and that is equal to 0.2 multiplied by 0.8. So, we will be getting
0.16 and corresponding to the fourth rule, the weight will be calculated as follows like
your µ M multiplied by µVFR , that is 0.2 multiplied by 0.2 and here, we will be getting
0.04. So, this is the way actually, we will have to calculate the weights of the different
fired rules.

158
(Refer Slide Time: 19:13)

So, once you got this particular the weights now, we are in a position to find out. In fact,
the output of each of the fired rule, and then, I will combine. Now, the output of the first
fired rule that is denoted by your y_1 that is nothing, but your I1 + 2 I 2 . Now, how to find
out this particular coefficient? The values of the coefficients I have already defined, that
this I_1 is represented using three linguistic terms and for each of the linguistic terms,
there is a separate value for the coefficient. For example, if it is low, medium and high, if
it is low the coefficient is 1, so, if it is medium the coefficient is 2 and if it is high the
coefficient of this thing is 3 and something like this; the same is also for I_2, the
coefficient of I_2. So, this y_1 is nothing, but I1 + 2 I 2 and I_1 is what? I_1 is your 6.0
and I_2 is your 2.2. So, if I just calculate, I will be getting the output of the first rule
is your 10.4. Now similarly, we can find out what should be output for the second fired
rule and that is nothing, but I1 + 3I 2 and that is nothing, but 6.0 plus 3 multiplied by 2.2
and this is your 12.6.

Similarly, your y^3 is nothing, but 2 I1 + 2 I 2 and that is nothing, but 2 multiplied by 6.0
plus 2 multiplied by 2.2 and I will be getting 16.4. Then comes your y^4 is nothing, but
2 I1 + 3I 2 ; that means, your 2 multiplied by 6.0 and 3 multiplied by 2.2 and if you just
add them up you will be getting 18.6.

159
So, till now, we have determined the weights of each of the fired rules and the output of
each of the fired rules. Now, I am just going to find out like how to determine the output
considering the combined control action.

(Refer Slide Time: 21:39)

Now, this output is nothing, but is your w^1 multiplied by y^1. So, this is actually the
weight of the first fired rule and this is your the output of the first fired rule; plus w^2
multiplied by y^2 plus w^3 multiplied by y^3 plus w^4 multiplied by y^4 divided by
w^1 plus w^2 plus w^3 plus w^4. And if you just substitute all the numerical values here
for example in place of w^1 I am just going to put 0.64 and y^1 is 10.4, then w^2 is 0.16
and y^2 is 12.6, then w^3 is 0.16 and y^3 is 16.4, then comes w^4 is 0.04 and your y^4 is
18.6 divided by the sum of all w values then I will be able to find out what should be the
output that is nothing, but your 12.04.
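The complete Takagi and Sugeno calculation for this example can be reproduced in a few lines of Python, as sketched below; the membership values and the coefficients are exactly the ones derived above.

```python
# Takagi and Sugeno's approach for the worked example: weighted average of
# the fired-rule outputs, where each weight is the product of memberships.

I1, I2 = 6.0, 2.2

# Memberships of the fired linguistic terms (computed earlier in the lecture).
mu = {"L": 0.8, "M": 0.2, "FR": 0.8, "VFR": 0.2}

# Coefficients from the problem statement: a = 1, 2 for I1 low, medium;
# b = 2, 3 for I2 far, very far.
a = {"L": 1.0, "M": 2.0}
b = {"FR": 2.0, "VFR": 3.0}

num, den = 0.0, 0.0
for t1 in ("L", "M"):
    for t2 in ("FR", "VFR"):
        w = mu[t1] * mu[t2]              # rule weight (product of memberships)
        y = a[t1] * I1 + b[t2] * I2      # rule output, a linear function of inputs
        num += w * y
        den += w

print(round(num / den, 2))               # 12.04
```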

So, using this particular the rule, using this particular the technique, that is Takagi and
Sugeno’s approach, we are able to find out what should be the control action or the
output for a set of inputs. Now, as I told that if I want to find out the output, the first
thing will have to know is we have to know the coefficients and as I have already
mentioned to determine the coefficient we take the help of some optimization tools, and
that is why, this particular approach is able to provide very accurate prediction.

Now, we have already discussed like if we just plot say interpretability versus accuracy.
So, supposing that here I am writing interpretability and here I am writing the accuracy.

160
Now, till now actually we have discussed two very popular approaches of fuzzy
reasoning tool, and one is the Mamdani approach and another is actually Takagi and
Sugeno’s approach. Now, for the Mamdani approach, the interpretability will be high,
but the accuracy is low. So, might be I am here, this could be your the Mamdani
approach. So, this is the Mamdani approach, and the Takagi and Sugeno’s approach the
accuracy is actually high, but interpretability is not good. So, might be that particular
point is here.

So, this is nothing, but Takagi and Sugeno’s approach. So, this is actually the real
situation of these 2 algorithms, but what we want is one algorithm, which
will be able to do the prediction accurately and at the same time, its interpretability
should be good. And, that is why, we try to find out one algorithm and that algorithm
may take the position somewhat here; that is, it will give some sort of
better prediction, but not at the cost of computational complexity. So, it should be
computationally tractable.

It should be interpretable and at the same time the accuracy should be good. So, might be
we are going to search for the algorithm which could be here and that will provide a very
good combination of these particular accuracy and interpretability. That means, my
expected fuzzy reasoning tool should be such that now, in fact, like if I just read the rule,
some control action should be understandable and we should be able to understand the
output of a particular rule and at the same time, the accuracy should be good. And, a lot
of studies have been made how to find out an algorithm for this fuzzy reasoning tool,
which will provide both accuracy as well as interpretability.

Thank you.

161
Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 12
Applications of Fuzzy Sets (Contd.)

Now, we are going to discuss the concept of Hierarchical Fuzzy logic Controller.

(Refer Slide Time: 00:20)

So, in short, this is known as HFLC. So, let us try to understand the reason behind going
for this particular the HFLC. Now, let me assume a very complex real-world problem,
supposing that, the problem is related to say weather forecasting. Now, if you see the
problem of weather forecasting, whether there will be rain or not after say 2 hours or 3
hours.

So, that particular prediction depends on a large number of parameters. Now, supposing
that I am going to use a fuzzy reasoning tool as an expert system just to predict whether
there will be rain or not. Now, if you see that particular process, it depends on a large
number of input parameters and the output, that is a function of a large number of input
parameters like your I_1, I_2 up to your say I_n, a large number of parameters. Say,
might be n is equal to 30 or 40 and so, then how to predict and how to design and
develop the knowledge base for this particular the fuzzy reasoning tool.

162
Now, to design this type of the fuzzy reasoning tool, actually we are going to face a
major difficulty and difficulty in the sense that we will have a large number of rules.
Now, here, if you see say, if I consider like n number of input parameters. So, this is
nothing, but your fuzzy logic controller and it has got n number of inputs like your I_1,
I_2, I_3 up to I_n and supposing that n is equal to 40 and supposing that, I have got only
one output. Now, what should be the number of rules and what is the maximum number
of rules, which I will have to design?

Now, to represent each of these input parameters like I_1, I_2 up to say I_n, supposing
that I am using say small m number of linguistic terms. Now, if I use small m number of
linguistic terms to represent each of the input variables like I_1, I_2 say I_n, and if there
are n such input parameters, then what should be the number of rules?

Now, the maximum number of rules will be how much? So, that is nothing, but your
m × m × ... (n terms), because we have got small n number of input variables. For each
input variable, I have got small m linguistic terms and this will give rise to your m^n, and
this will actually be a very high number.

Now, let us just try to put some numerical value, supposing that n is equals to your say
40. So, small n if I put that is equals to say 40 and supposing that I am considering the
small m is equals to say 5; that means, I am using 5 linguistic terms to represent each of
the variable. So, the total number of rules that will become your m^n, that is nothing but
5^40; now, this 5 raised to the power 40 will give rise to a very large number. So,
many such rules we will have to design and so, many such rules we will have to
consider, while determining the output for a set of input parameters.

So, it is bit difficult in terms of computation and as the number of input variable
increases and as the number of linguistic terms used to represent each of the input
variable increases. So, what will happen to the number of rules? The number of rules is
going to increase like anything and computationally, it will become very complex. Now,
let me summarize whatever I have discussed. Now, supposing that I am just going to
handle a real-world problem having a large number of input variables and if I use say 4
or 5 linguistic terms to represent each of the input variables. So, we need a very large
number of rules and as the number of rules increases the computational complexity of

163
this fuzzy logic controller is going to increase. So, this is not desirable. And, this
particular problem is actually known as the curse of dimensionality.

(Refer Slide Time: 05:42)

Now, if you see the literature, people are using this particular term, that is, curse of
dimensionality. So, the fuzzy reasoning tool or fuzzy logic controller is suffering from
this particular drawback and that is nothing, but the curse of dimensionality. Now, how
to solve this particular problem? To solve this particular problem, the concept of your
HFLC, that is your hierarchical fuzzy logic controller has come into the picture. Now, let
us try to understand the working principle of this particular, say HFLC.

164
(Refer Slide Time: 06:24)

Now, to understand the working principle of this particular figure. Now, here actually
what you do is, we have got say the same problem like I have got small n number of
input parameters and I have got only one output and previously, I was using only one
fuzzy logic controller. So, I have got a large number of inputs here say I_1 and this is say
I_n and I have got one output. So, this is actually the conventional FLC. Now, to replace
that so, what I am doing is, I am just going to use the concept of HFLC or your
hierarchical the fuzzy reasoning tool or fuzzy logic controller. Now, the structure wise it
is very simple. So, what you do is, out of these n input variables so, we try to find out
which are the most important ones.

Now, supposing that I_1 is very important; that means, I_1 is having some significant
contribution towards the output, similarly I_2 is another very important input variables.
So, what you do is; so, this I_1 and I-2 so, these two input variables we consider for the
first fuzzy logic system and whatever output we are getting from the first fuzzy logic
system, that will be used as the input for the second fuzzy logic system and another input
is also coming from here, say I_3 is coming here and these two inputs are entering to the
second fuzzy logic system and it will give rise to an output that is nothing, but O_2 and
this particular O_2 is going to enter the third fuzzy logic system FLS_3 and another input
is coming say I_4 and here, I will be getting the output that is O_3 and the same process
will be continued and the last fuzzy logic system that will be like this.

165
So, here O_{n-2} is going to enter as input and the last input, that is I_n, is also going to enter,
and on the output side, we will be getting O_{n-1} and that is nothing but the final output.
Now, here actually what you do is, the conventional fuzzy logic controller is replaced by
a number of simple fuzzy logic systems and these fuzzy logic systems are put in a
hierarchical fashion, may be in series, and by
doing that, actually we are getting one advantage. Now, let us try to understand, what
type of advantage we are getting. Now, here if you see, say if you concentrate on the first
fuzzy logic system.

So, there are two inputs, supposing that for I_1 say I am using m linguistics terms and to
represent I_2, I am using say m linguistic terms. So, what should be the number of rules?
The number of rules will be nothing but m multiplied by m, that is, m^2, and this particular fuzzy
logic system is going to tackle only m^2 rules; and similarly, how many such fuzzy logic
systems have we got? We have got (n − 1) small fuzzy logic systems. So, what should
be the total number of rules which we are going to consider? That is nothing but
m^2 (n − 1).
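Structurally, the hierarchical controller is just a chain of small two-input fuzzy logic systems, each one feeding its output to the next; a bare-bones sketch of that wiring is given below, where fls() is only a stand-in (an assumption) for a real trained two-input fuzzy logic system.

```python
# A bare-bones sketch of the hierarchical (cascaded) structure: each stage is
# a two-input fuzzy logic system whose output becomes one input of the next.

def fls(x, y):
    """Stand-in for one trained two-input fuzzy logic system.
    A real FLS would fuzzify (x, y), fire its m*m rules and defuzzify;
    here a dummy combination is returned just so the wiring can be run."""
    return 0.5 * (x + y)

def hflc(inputs):
    """Cascade (n-1) two-input systems over n inputs:
    O_1 = FLS_1(I_1, I_2), O_2 = FLS_2(O_1, I_3), ..., O_(n-1) = FLS_(n-1)(O_(n-2), I_n)."""
    out = inputs[0]
    for x in inputs[1:]:
        out = fls(out, x)
    return out

print(hflc([1.0, 2.0, 3.0, 4.0]))   # 4 inputs -> 3 cascaded stages
```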

Now, you see the advantage of this particular HFLC.

(Refer Slide Time: 10:38)

So, previously, in the conventional fuzzy logic system, we had the total number of
rules like your m^n.

166
(Refer Slide Time: 10:50)

And, now actually, in place of that, we have got m^2 (n − 1). So, in the conventional fuzzy
logic controller, we had this number of rules, that is m^n, and now in the HFLC, we
have got only m square multiplied by (n minus 1). Let me take a very simple
example; let me put some numerical values to understand: if I put say m equals to
say 4 and if I consider say n equals to 10.

So, in the conventional FLC, we had the maximum number of rules like 4^10, which is a
very large number. And, in place of that, here I am getting m square, that is
nothing but 16, multiplied by (n minus 1), that is nothing but 9, and this is
equal to 144. So, in place of 4^10, I am actually using only 144 rules.
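The rule-count comparison itself is simple arithmetic, and the short check below reproduces both figures: 5^40 for the single 40-input controller mentioned earlier, and 4^10 versus 144 for the m = 4, n = 10 example.

```python
# Rule-count comparison: a conventional FLC needs m**n rules, while the
# hierarchical FLC needs only m*m rules per stage times (n - 1) stages.

def conventional_rules(m, n):
    return m ** n

def hflc_rules(m, n):
    return (m ** 2) * (n - 1)

print(conventional_rules(5, 40))   # 5**40, the 40-input example
print(conventional_rules(4, 10))   # 1048576
print(hflc_rules(4, 10))           # 16 * 9 = 144
```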

So, this is actually the advantage of using this HFLC, that is, the hierarchical fuzzy logic
controller. Now, to conclude actually, what will have to do is, for a problem having a
large number of input variables, we generally go for this, your the hierarchical fuzzy
logic controller. Now, here so, this is the merit, this is the advantage of using HFLC, but
it has got one demerit also.

Now, that particular demerit is actually as follows: like we may not get the actual level of
accuracy, which we get in the conventional FLC. So, we will be getting less accuracy in
HFLC in comparison with that of your conventional FLC, but computationally, it is
faster HFLC is faster compared to your conventional FLC. So, these are the relative

167
merits and demerits of conventional FLC and HFLC. Now, we are going to concentrate
on another thing that is called your the sensitivity analysis of a fuzzy reasoning tool or
fuzzy logic controller. Now, let us try to understand the meaning of the term: sensitivity.

So, by sensitivity actually what we mean is, actually the change of output to the change
in input. So, sensitivity S is nothing, but the change in output to the change in input; that
means, for unit change of input, what should be the amount of change in output, that is
nothing, but the sensitivity. So, supposing that I have got one FLC, which is used to
model a process having say two inputs I_1 and I_2. So, here I have got one FLC having
two inputs, say I_1 and I_2 and have got only 1 output.

Now, my aim should be how to find out the rate of change of output with respect to
this particular I_1 and the rate of change of output with respect to I_2. That means,
if I make a unit change in I_1, what will happen to the change in output, and if I make a unit
change in I_2, what will happen to the change in output? So, those things actually, we
are going to find out. Now, how to determine this, the change in output to the change in
input? So, what you do is, we vary the input variables by different amounts. For
example, say we increase by 0.1 percent, then 1 percent, and then 10 percent, and we
try to notice what should be the corresponding change in the value of the output.

Now, if I change the input by say 0.1 percent 1 percent and 10 percent and if I can find
out what should be the change in output. So, I can find out the sensitivity of this
particular fuzzy reasoning tool. Now, this is a very crude method, we are doing; now
mathematically also, we can find out this particular sensitivity.

168
(Refer Slide Time: 16:06)

Now, let me concentrate on a fuzzy logic controller, which is going to determine the
output for two inputs, like I_1 and I_2. So, the output is a function of the two input
parameters I_1 and I_2.

Now, what I do is, we just change input I_1 by a small amount, say δI_1. So, what you do
is, we try to find out the change in output, that is, δO. The change in I_1 is nothing but
this amount δI_1. So, we try to find out f(I_1 + δI_1, I_2) − f(I_1, I_2), and this is
nothing but the change in output, that is, δO, and that is equal to (∂f/∂I_1) × δI_1.

So, this is the way actually we can find out, in fact, the change in output, using the mean
value theorem. So, by using the mean value theorem, we can write down and we can find
out this. Now, supposing that the mod (absolute) value of this particular partial derivative of f
with respect to I_1 is coming out to be greater than 1; so, we can write down that the
change in output is greater than the change in input; that means, if I change this
particular input by a small amount, I will be getting more change in this particular
output. So, this I_1 is actually very significant. So, we can carry out the sensitivity
analysis by following this particular method.

169
Now, in place of I_1, I can also carry out the sensitivity analysis for I_2, and if you carry
out the sensitivity analysis for I_2 by following the same procedure, I can also find out
what should be the change in output corresponding to the unit change in the input, that
is, I_2. So, the sensitivity analysis can be carried out both for I_1 and I_2 separately by
following the same procedure, which I have already discussed.
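
This finite-difference idea can be sketched in a few lines of Python; note that the function model() below is only a hypothetical stand-in for the FLC's input-output mapping, since the actual controller is not specified here.

# Minimal sketch of sensitivity analysis by finite differences.
def model(i1: float, i2: float) -> float:
    """Hypothetical process model used only for demonstration."""
    return 2.0 * i1 + 0.1 * i2 ** 2

def sensitivity(f, i1: float, i2: float, delta: float = 1e-4):
    """Approximate dO/dI1 and dO/dI2 at the operating point (i1, i2)."""
    base = f(i1, i2)
    s1 = (f(i1 + delta, i2) - base) / delta   # change in output per unit change in I_1
    s2 = (f(i1, i2 + delta) - base) / delta   # change in output per unit change in I_2
    return s1, s2

if __name__ == "__main__":
    s1, s2 = sensitivity(model, i1=1.0, i2=3.0)
    print(f"dO/dI1 ~ {s1:.4f}, dO/dI2 ~ {s2:.4f}")

If the magnitude of ∂f/∂I_1 comes out greater than 1 at the chosen operating point, I_1 is treated as a significant input, exactly as argued above.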

(Refer Slide Time: 19:08)

Now, this is actually what you mean by the sensitivity analysis of the fuzzy reasoning
tool.

(Refer Slide Time: 19:10)

170
And, we can find out the contributions of the different input parameters or the input
variables towards its output and we can find out the sensitivity analysis.

Now, I am just going to discuss the merits and demerits of this fuzzy reasoning tool or
fuzzy logic controller. The 1st point, which I am going to make here, is that the fuzzy reasoning
tool is a potential tool for dealing with imprecision and uncertainty. So, this I have
already discussed several times. The 2nd merit or advantage of the FLC is that it does not
require an extensive mathematical formulation; that means, if you want to
find out the input-output relationships, we need not go for the differential equation and
its solution. And, if the designer has some information of the process to be controlled or
the process to be modelled, he or she can design the rule base and the database, that is
nothing but the knowledge base of the fuzzy logic controller, and once that particular
knowledge base has been determined, if we just send one set of input parameters,
we will be getting the output. So, it does not require the extensive
mathematical formulation of the problem, and one fact we have already discussed several
times: most of the real-world problems are very complex and those are a bit difficult to
model mathematically. And, that is why this type of fuzzy reasoning tool or fuzzy logic
controller is going to help us a lot, particularly for tackling or solving the complex real-
world problems. Now, I am just going to concentrate on the demerits.

(Refer Slide Time: 21:24)

171
(Refer Slide Time: 21:26)

So, here, to discuss the demerits, what we try to say is that the
performance depends on the knowledge base, and how to determine this particular
optimal knowledge base, that means, how to determine the database, how to determine
the optimal rule base? So, that is a difficult task; the fuzzy reasoning tool, in fact, does
not know anything, it does not have any in-built optimisation tool.

So, what we will have to do is, we will have to optimise or you will have to train this
fuzzy reasoning tool with the help of one optimizer and the known input-output data; that
means, we will have to design and develop the knowledge base of this particular the
fuzzy logic controller or fuzzy reasoning tool. And as I told determining the proper
knowledge base is not an easy task and in fact, we will be discussing like how to design
and develop the optimal knowledge base of a fuzzy reasoning tool or a fuzzy logic
controller, so that it can perform in the optimal sense.

Now, the next point: it may not be suitable for modelling a process involving many
variables. This problem we have already discussed: if there are a large number of
variables, like in the problem of weather forecasting and so on, which is a very difficult
real-world problem, then for this particular problem, there will be a very large number
of input parameters or input variables, and consequently, the number of rules is going
to increase like anything and the computational complexity is going to be very heavy, and
that is why, for a problem having a large number of input parameters or the input

172
variables, our recommendation should be that we should not go for this type of fuzzy
reasoning tool or fuzzy logic controller. And, to solve this type of problem
or to tackle this type of real-world problem, in fact, we have got another very powerful
tool, that is, artificial neural networks. So, those tools and techniques will be
discussed in detail, after some time.

Thank you.

173
Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 13
Applications of Fuzzy Sets (Contd.)

We are going to start with another potential application of fuzzy sets, and that is in the
form of fuzzy clustering.

(Refer Slide Time: 00:24)

Now, this clustering is a powerful tool for data mining, and the purpose of data mining is
to extract useful information from a data set. Now, in clustering, the grouping is done
based on the concept of similarity; that means, two similar points should belong to the
same cluster and two dissimilar points should belong to two different clusters.

Now, if you see the literature, the clustering could be either crisp or fuzzy in nature.
Now, for the crisp clusters, there should be well-defined boundaries, but for the fuzzy
clusters, the boundaries could be vague. Now, let me take one very practical and simple
example. Supposing that you are staying in a hostel and in the hostel there are say 1000
students. Now, if you see, these 1000 students will move in a few clusters; they will
form the clusters based on their similarity, and might be these 1000 students will move
in say 10 or 12 clusters, and you can see that.

174
So, a particular student may belong to more than one cluster, and there is another
possibility: a particular student may leave a cluster and he may join another cluster.
So, these are all examples of fuzzy clusters, and for each of the clusters, there will
be a leader, and that is nothing but the cluster centre. Now, if you see the literature on
fuzzy clustering, we have got a large number of methods; for example, we have got
Fuzzy C-Means clustering, then we have got the potential-based clustering, we have got
entropy-based clustering, and so on.

So, there are a few other methods, but out of all the methods of fuzzy clustering, the
Fuzzy C-Means algorithm is the most popular one, and we are going to discuss, in detail,
the working principle of this Fuzzy C-Means clustering.

(Refer Slide Time: 03:02)

Now, if you see the Fuzzy C-Means clustering. So, this technique was proposed in the
year 1973 by Bezdek and after that, actually a lot of modifications have been
incorporated into the Fuzzy C-Means algorithm. Now, here, the way the clustering is
done is as follows:

Now, supposing that I have got a large number of data points, and I have got a few
predefined clusters. So, a particular data point may belong to different clusters with
different numerical values of membership, and the sum of the membership function values
will become equal to 1.0. Now, here, if I consider a particular data point, how
close is that particular data point with respect to the cluster centre?

175
So, what you do is, we try to measure the Euclidean distance between that particular
data point and the cluster centre, and the more the value of this Euclidean distance, the less
will be the similarity and the more will be the dissimilarity, and our aim is to minimize this
particular dissimilarity, so that the data point can be brought very near to the cluster and
it may belong to the cluster. Now, this particular principle has been used mathematically
just to design and develop the Fuzzy C-Means algorithm.

(Refer Slide Time: 04:58)

Now, I am just going to discuss its principle in detail, and after that, I will be solving
one numerical example, just to make it clearer. Now, here, I am just going to take a
very typical example, and this example is as follows. Supposing that in the higher
dimensional space, say L-dimensional space, I have got a large number of data
points, say capital N number of data points. Now, each data point is denoted by x_i,
where i varies from 1, 2, up to capital N. And, our aim is to form the
fuzzy clusters of these particular data points based on their similarity.

Now, if I consider, a particular data point say x_i, now this is in L dimension. So, to
represent this particular point I need to have the numerical values like x_i1, x_i2 and up
to say x_iL. So, I need to have so, many numerical values, if I want to represent a
particular data point say x i in L dimension. Now, let us see, how does it work, how does
this particular Fuzzy C-Means algorithm work. Now I have already mentioned that to
start with so, we will have to form a few clusters. That means, there should be some

176
predefined clusters. Now, let me take some predefined clusters like this for example, say
one fuzzy cluster could be something like this, another fuzzy clusters could be something
like this and there could be overlapping also, because these are all fuzzy clusters.
Another fuzzy clusters could be say something like this, another fuzzy clusters could be
something like this, now supposing that I have defined say four clusters initially, and I
am just going to take the decision whether a particular data point, say this particular data
point will belong to cluster 1 or cluster 2 or cluster 3 and let me consider the cluster 4 as
a general cluster say j-th cluster, say.

So, this is nothing but the cluster j. So, there could be some other clusters also. Now,
how to take the decision whether this particular data point, say the i-th data point, should
belong to any one of these clusters or not? Now, how to proceed? The way we
actually try to solve the problem is as follows. So, what you do is, as I told, we try to find
out the Euclidean distance between this data point and the cluster centre, and as I
told, the more the distance, the less will be the similarity and the more will be the
dissimilarity, and our aim is to minimize this particular
dissimilarity.

Now, if you see, say for the first cluster, supposing that I have got a cluster centre here,
and for the second cluster, say the cluster centre is here; for the third cluster, the cluster
centre could be here; for the j-th cluster, the cluster centre could be here. And what I do
is, from this i-th point, we try to find out what should be the membership value;
supposing that the membership function value between the i-th data point and the first
cluster centre is nothing but μ_i1, that is, the membership value between the i-th data point
and the first cluster. Similarly, the membership value between the second cluster and the
i-th data point is nothing but μ_i2.

Similarly, between the third cluster and the i-th data point, the membership value is
nothing but μ_i3, and similarly, between the i-th data point and the j-th cluster, the
membership function value is μ_ij. So, μ indicates actually the membership function
value, and it varies from 0.0 to 1.0. So, this is the range for μ, and here, a particular
condition has to be fulfilled, so that the sum of all the μ values becomes equal to 1.0.

177
So, this is actually a functional constraint; let me repeat. So, with respect to this
particular data point, we have got different μ values with the different clusters, and the sum of
all the μ values should be equal to 1.0, and that is nothing but the functional constraint.
And what is our aim? Our aim is to find out the fuzzy clusters. Now, let us see how
we proceed.

(Refer Slide Time: 10:26)

Now to proceed further, what I will have to do is, I will have to formulate as an
optimization problem, and this particular optimization problem has to be solved using
some technique.

Now, what I do is, we try to minimize the dissimilarity, as I mentioned, and this particular
dissimilarity will be expressed in terms of the Euclidean distance value, that is nothing
but d_ij^2; now, what is d_ij? d_ij is nothing but the Euclidean distance between the
i-th data point and the j-th cluster. So, what is our aim? Our aim is to minimize the
dissimilarity; that means, I will have to minimize d_ij^2, and it is multiplied by μ_ij^g.

Now, what is μ_ij? μ_ij is nothing but the membership function value between the i-th data
point and the j-th cluster centre; let me repeat, μ_ij stands for the membership function value
for the i-th data point with respect to the j-th cluster centre, and here, we put μ_ij^g, and this
particular g is nothing but the level of the fuzziness. Now, if you remember that we

178
discussed the power of a fuzzy set. So, this is almost similar to the power of the fuzzy
set. So, this g is nothing but the level of cluster fuzziness and generally we consider g is
greater than 1, ok.

Now, let us see how to write down this objective function. Now, our aim is to minimize
the dissimilarity, and that is F(μ, C); μ is the membership function value, and this
particular C indicates the total number of clusters or the number of clusters to
be made, and C has got a range: C should be greater than or equal to 2
and it should be less than or equal to N. What does it mean? It means that the minimum
number of clusters should be 2 — it cannot be 1; if it is 1, then no clustering is done — and
the maximum number of clusters cannot exceed N, that is, the total number of data points.
So, C is nothing but the number of clusters.

So, this F(μ, C) = Σ_{j=1}^{C} Σ_{i=1}^{N} μ_ij^g d_ij^2. So, this is the objective function, which I will have to
minimize, subject to the condition that Σ_{j=1}^{C} μ_ij = 1.0, and this is nothing but the functional
constraint. Now, let us see how to solve it using the traditional method of optimization,
and to solve this type of problem, we take the help of the Lagrange multiplier. So, what you
do is, this objective function is written in a slightly different form, and that is nothing
but

F(μ, C, λ_1, ..., λ_N) = Σ_{j=1}^{C} Σ_{i=1}^{N} μ_ij^g d_ij^2 + Σ_{i=1}^{N} λ_i (Σ_{j=1}^{C} μ_ij − 1.0).

So, the second term comes from the functional constraint; so, this one you bring to the left-hand side,
and I will be getting this particular expression, and here, this particular λ_i is nothing but the
Lagrange multiplier. Now, I have got only one big expression for this objective function
and I will have to solve it using the method of calculus. Now, what we do is, we try to
find out dF/dμ = 0, then dF/dC = 0, then dF/dλ_1 = 0, then dF/dλ_2 = 0, and the last is
dF/dλ_N = 0. Now, if we put all such things equal to 0, you will be getting a set of equations.
And, those equations, if you solve, then there is a possibility that you will be getting
actually the expressions like the following.

179
(Refer Slide Time: 16:11)

The CC_j is nothing but the cluster centre corresponding to the j-th cluster. So, what
should be the expression for the cluster centre? Now,

CC_j = (Σ_{i=1}^{N} μ_ij^g x_i) / (Σ_{i=1}^{N} μ_ij^g).

So, this is actually going to give the coordinate of the j-th cluster centre. Now, as the
data are in L dimensions, the j-th cluster centre will have L numerical values and
accordingly, I can write down CC_jk, where k varies from 1 to L. So, I can find
out the information for each of the dimensions of this particular cluster centre. And,
another expression you will be getting by solving those equations, that is nothing but

μ_ij = 1 / Σ_{m=1}^{C} (d_ij / d_im)^(2/(g − 1)).

So, if we solve those equations, we will be getting these two

particular expressions. One is actually how to determine the coordinate of the cluster
centre and another is how to determine the membership value of a particular data point
with respect to a cluster say j-th cluster and what is our aim? So, in this algorithm, what
we do is, we try to update the cluster centre and the membership value of the data points
with respect to the cluster centre iteratively. So, this is actually an iterative process. So,
this algorithm is an iterative algorithm.
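
The two update expressions derived above translate almost directly into code; the following is a minimal NumPy sketch, where the variable names and the random demonstration data are my own choices and not part of the derivation.

import numpy as np

def update_centres(X: np.ndarray, mu: np.ndarray, g: float) -> np.ndarray:
    """CC_j = sum_i(mu_ij^g * x_i) / sum_i(mu_ij^g); returns an array of shape (C, L)."""
    w = mu ** g                                  # (N, C)
    return (w.T @ X) / w.sum(axis=0)[:, None]    # (C, L)

def update_memberships(X: np.ndarray, centres: np.ndarray, g: float) -> np.ndarray:
    """mu_ij = 1 / sum_m (d_ij / d_im)^(2/(g-1)), guarding against zero distances."""
    d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)   # (N, C)
    d = np.fmax(d, 1e-12)
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (g - 1.0))      # (N, C, C)
    return 1.0 / ratio.sum(axis=2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((10, 3))                      # 10 placeholder points in 3-D
    mu = rng.random((10, 2))
    mu /= mu.sum(axis=1, keepdims=True)          # rows sum to 1.0
    centres = update_centres(X, mu, g=1.25)
    mu_new = update_memberships(X, centres, g=1.25)
    print(centres)
    print(mu_new.sum(axis=1))                    # each entry should be 1.0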

180
(Refer Slide Time: 18:32)

So, this I have already mentioned.

(Refer Slide Time: 18:35)

Now, I am just going to tell you the steps of this Fuzzy C-means algorithm like the FCM
algorithm. So, one after another, now the step 1: we assume the number of clusters to be
made and supposing that that is denoted by C and as I have already discussed the C is
greater than equals to 2 and less than equals to N. Now, step 2, we select some
appropriate level for cluster fuzziness. And so, this is nothing but g, generally it is
considered g is greater than 1 for example, it could be say 1.25, 1.5 ,and so on.

181
Then, step 3: this μ_ij, that is, the membership value of the i-th data point with the j-th
cluster centre — what we do is, we try to initialize it at random, and then we try to
modify it through a large number of iterations. Now, if there are N such data points, that
is, capital N number of data points, and if we are generating C number of clusters, then
for each data point, I should have C number of numerical values for the membership,
and I have got N number of data points. So, the size of the initial membership matrix, that
is, μ, will have the dimension N × C, where N is the total number of data points and C is
the total number of predefined clusters.

So, I will have to generate initially the membership matrix μ at random, and what is the
size of that particular matrix? That is N × C. Now, this I have already mentioned, that μ
varies from 0 to 1 and the sum of all the μ values corresponding to a particular point is
nothing but 1.0.

So, this is the way actually the algorithm is working.
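
A small sketch of this random initialization, assuming NumPy, could be as follows; each of the N rows gets C random values that are then divided by their sum, so that every row adds up to 1.0.

import numpy as np

def init_membership(N: int, C: int, seed: int = 0) -> np.ndarray:
    """Random N x C membership matrix whose rows sum to 1.0."""
    rng = np.random.default_rng(seed)
    mu = rng.random((N, C))
    return mu / mu.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    mu = init_membership(N=10, C=2)
    print(mu.shape)            # (10, 2)
    print(mu.sum(axis=1))      # all entries equal to 1.0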

(Refer Slide Time: 20:53)

Now, then we go for step 4. Now, here actually what we do is, we try to update the
cluster centre, that is, the k-th dimension of the j-th cluster centre,

182

CC_jk. So,

CC_jk = (Σ_{i=1}^{N} μ_ij^g x_ik) / (Σ_{i=1}^{N} μ_ij^g).

So, this is the way actually we can calculate the k-th dimension of the j-th cluster centre.
Now, in step 5, we calculate the Euclidean distance between the i-th data point and the
j-th cluster centre, and that is nothing but d_ij.

(Refer Slide Time: 22:03)

Then, in step 6, we try to update the membership value of the i-th data point with respect
to the j-th cluster centre, that is, μ_ij. Now,

μ_ij = 1 / Σ_{m=1}^{C} (d_ij / d_im)^(2/(g − 1)), if d_ij > 0.

And, if d_ij is found to be 0, then what happens is that the distance between the two points is 0; that means,
the similarity is the maximum and μ_ij will become equal to 1.0.

Now, step 7: we repeat from step 4 to step 6, until the change in μ values comes out to
be less than some pre-specified value, say denoted by epsilon; supposing that ε is
say 0.001, a very small value. Now, I am running this particular algorithm; after
running this particular algorithm through a large number of iterations, there is a
possibility that the μ values are going to reach some saturated level. And, after that,

183
there may not be any significant change in the values of this particular μ. And if
this particular change is found to be less than or equal to ε, then we say that the
algorithm has reached the optimal solution and we try to terminate the program.

So, this is the way actually we terminate that particular program. So, this shows the steps
of this particular algorithm. Now, here, I just want to make one comment. Now, this
algorithm is an iterative algorithm. Now, as it is an iterative algorithm, there is a
possibility that the quality of the clusters will be improved iteration-wise, and there is
a possibility that, as the algorithm runs, we will be getting more and more compact
clusters.
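
Putting steps 1 to 7 together, a compact, self-contained sketch of the whole iterative procedure might look like the following; it is only an illustrative version built around the same update formulas, with placeholder data.

import numpy as np

def fuzzy_c_means(X, C=2, g=1.25, eps=0.001, max_iter=100, seed=0):
    """Iterate the centre and membership updates until mu changes by less than eps."""
    N = X.shape[0]
    rng = np.random.default_rng(seed)
    mu = rng.random((N, C))
    mu /= mu.sum(axis=1, keepdims=True)              # step 3: random mu, rows sum to 1.0

    for _ in range(max_iter):
        w = mu ** g                                  # step 4: update cluster centres
        centres = (w.T @ X) / w.sum(axis=0)[:, None]

        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)   # step 5
        d = np.fmax(d, 1e-12)

        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (g - 1.0))      # step 6
        mu_new = 1.0 / ratio.sum(axis=2)

        if np.max(np.abs(mu_new - mu)) < eps:        # step 7: termination check
            mu = mu_new
            break
        mu = mu_new
    return centres, mu

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((50, 3))                          # placeholder 3-D data
    centres, mu = fuzzy_c_means(X)
    print(centres)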

Now, I am just going to define, what do you mean by a compact cluster, and there is
another objective, which we very frequently use, that is the distinctness of the clusters.

(Refer Slide Time: 25:00)

Now, here I am just going to define, what do you mean by the compactness of a
particular cluster. So, the compactness of a cluster and another is your distinctness;
distinctness of a particular cluster, now let me try to define, what do you mean by the
compactness of a particular cluster?

Now, supposing that I have got say one fuzzy cluster something like this and say I have
got a cluster centre, supposing that the cluster centre is denoted by c_j. Now, here
surrounding this particular j-th cluster, there will be a large number of members, who are

184
following actually the cluster centre, ok. Now, what we do is, we try to find out the
Euclidean distance between the cluster centre and all the points surrounding it. For
example, say here there are 200 points surrounding the cluster centre. So, what you do is,
we try to find out the Euclidean distance between the j-th cluster centre and each of these
two hundred data points surrounding that cluster centre. So, how many Euclidean
distance values will we be getting? We will be getting 200 Euclidean distance values, and
what you do is, you add them up and you find out the average. Now, if the average
Euclidean distance is less, we say that this particular cluster is a very compact cluster.

So, this is the way actually we define the compactness of a particular cluster. Now, let us
try to define the distinctness. Now, supposing that say I have got a large number of data
points and we have got a few fuzzy clusters; for example, say one cluster could be
something like this, say this is the first cluster and its centre is c_1; I have got
another fuzzy cluster something like this, and supposing that its cluster centre is c_2; then I have
got another one, say here, so I will be getting another cluster centre, say c_3; I have
got another cluster here and its cluster centre is say c_4, and so on.

So, what we do is, we try to find out the Euclidean distance between the cluster centres;
for example, we try to find out the Euclidean distance between c_1 and c_2, then between
c_1 and c_3, then between c_1 and c_4; we also try to find out the Euclidean distance
between c_3 and c_4, c_3 and c_2 (c_3 and c_1 has already been calculated); then we try to
find out what should be the Euclidean distance between c_2 and c_4, and so on.

So, we determine all the Euclidean distance values between every pair of these possible cluster
centres and we try to find out what should be the average value. Now, if the average
Euclidean distance between the cluster centres is found to be more, then we declare
that these are very distinct clusters. Now, if you use this Fuzzy C-Means algorithm, there is
a possibility that you will be getting very compact clusters, but at the cost of
distinctness. So, you may not get very distinct clusters using this particular Fuzzy C-
Means algorithm. But, what is our aim? Our aim is to get more compact as well as more
distinct clusters, and how to get that, I am going to discuss after some time.
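
Both measures can be computed directly once the cluster centres and a hard assignment of the points are available; the sketch below is one simple way of doing it, with the data and centres being placeholders.

import numpy as np

def compactness(X, centres, labels):
    """Average distance from each point to the centre of its own cluster
    (smaller value -> more compact clusters)."""
    d = np.linalg.norm(X - centres[labels], axis=1)
    return d.mean()

def distinctness(centres):
    """Average Euclidean distance between all pairs of cluster centres
    (larger value -> more distinct clusters)."""
    C = centres.shape[0]
    dists = [np.linalg.norm(centres[i] - centres[j])
             for i in range(C) for j in range(i + 1, C)]
    return float(np.mean(dists))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((200, 3))
    centres = np.array([[0.25, 0.25, 0.25], [0.75, 0.75, 0.75]])
    labels = np.argmin(np.linalg.norm(X[:, None, :] - centres[None], axis=2), axis=1)
    print("compactness :", compactness(X, centres, labels))
    print("distinctness:", distinctness(centres))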

So, these are actually the relative merits and demerits of Fuzzy C-Means algorithm.

Thank you.

185
Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 14
Applications of Fuzzy Sets (Contd.)

Now, we are going to discuss how to use the concept of this Fuzzy C-Means algorithm to solve
one numerical example.

(Refer Slide Time: 00:27)

Now, the numerical example, which I have taken here, is a very practical example. The
example is as follows: supposing that we want to carry out some sort of machining
operation to generate a free-form surface. Now, the free-form surface is actually a bit
difficult to machine. Now, a very good example of a free-form surface could be the
surface of this mouse; this is an example.

Now, how to do this type of machining, or how to generate this type of surface? Now,
this is a simple one, but if there are so many such ups and downs, for example, on the
surface which I have considered here, how to carry out the machining? Now, if you see
this particular surface, the nature of the surface is something like this: say
I have got the three dimensions like x, y and z, and the nature of the surface is
something like this. So, this could be actually the nature of the surface, and here, if you
see, we have got this type of undulations.

186
So, there are a large number of undulations here, and we will have to do the machining just
to generate the free-form surface. Now, what you do is, we take the help of some milling
cutter. Now, as there are so many such ups and downs on this particular surface, this
milling cutter will have to be utilized in an optimal sense, and truly speaking, the machining
has to be done cluster-wise.

So, this surface, we try to divide into a number of clusters based on the similarity, and
after that, for a particular cluster, we do the machining in one way and for another cluster, we
will have to do the machining in another way; and to decide that particular strategy of
machining, so that we can get a very accurate free-form surface, we may take the
help of this type of clustering, or the fuzzy clustering.

Now, let us see how to solve this type of problem. Now, here, for simplicity, what I
have done is, I have considered only 10 points lying on the free-form surface. And, I
have tried to show you how to do the clustering, so that we can select the machining
strategy accordingly. Now, let us see how to proceed with this type of clustering.

(Refer Slide Time: 03:17)

Now, as I told that we have considered, for simplicity, only 10 points, so that I can show
you the hand calculations but truly speaking, on the surface you will have to generate a
large number of points like 1000 points, 10000 points something like this. But, for this
numerical example, I have just considered 10 points selected at random and these points
are lying on the free-form surface and let us see like how to do the clustering?

187
For example, for the 1st point, the x dimension is 0.2, y coordinate is 0.4 and z
coordinate is 0.6 and so on. So, for each of the 10 points, we have got x, y and z
coordinates and as I told, these points are, in fact, those lying on the free-form surface.
Now, what is our task? We will have to carry out the clustering using Fuzzy C-means
algorithm and we are going to assume that the level of cluster fuzziness, that is, g is 1.25
and termination criterion, that is, ε , we have considered 0.01. Now, let us see how to
proceed with the clustering.

(Refer Slide Time: 04:40)

Now, here, the number of data points we have considered that is equal to 10, for
simplicity, only 10 points I have considered and each data point is having the three
dimensions like your x, y and z.

The level of cluster fuzziness, we have assumed that g equals to 1.25 and let us assume
that there could be only two clusters because I have considered only 10 points. So, it is
better to go for only two clusters and let us see how does it work, how to explain the
working principle of this Fuzzy C-means algorithm to solve this numerical example.

188
(Refer Slide Time: 05:23)

Now, what you do is, so initially, we assume the membership matrix that is denoted by
µ and what is the size of this particular µ , it is nothing, but 10 × 2 . Now, why 10 × 2 ,
because we have got 10 data points and we have considered only two cluster centres.

Now, corresponding to the first data point, so it has got the membership with the two
clusters. So, this is the membership value corresponding to the first point with respect to
the first cluster centre. So, this is the membership value corresponding to the first data
point with respect to the second cluster centre. So, this is the membership function value
and if you add them up, this will become equal to 1.0 and the same is true for each of
these entries.

So, these are the membership values for the second data point, the membership values for the
third data point, for the fourth data point, and if you add them up, you will be getting 1.0;
and this particular μ matrix of size 10 × 2 is generated at random
initially. Now, let us see how to proceed further and how we can do the clustering.

189
(Refer Slide Time: 06:50)

Now, I am just going to determine what should be the first dimension of the first cluster
centre, and that is denoted by CC_11. If you remember that particular expression
which I used, CC_jk — now, what is that? That is nothing but the k-th dimension of the j-th
cluster centre, and CC_11 is nothing but the first dimension of the first cluster centre.

Now, the first dimension of the first cluster centre, I am just going to find out. Now, how
to do it: by the same formula which I have derived,

CC_11 = A/B = (Σ_{i=1}^{N} μ_i1^g x_i1) / (Σ_{i=1}^{N} μ_i1^g).

Now, let us see how to determine this particular A and B. Now, to calculate A, you concentrate here,
on μ_i1^g, where i varies from 1 to N, and N is the total number of data points. Now, what does it
mean the moment I put i equal to 1? So, this is nothing but μ_11^g. What is μ_11? μ_11 is
nothing but the membership value of the first data point with respect to the first cluster
centre.

So, membership value of the first data point with respect to the first cluster centre. Now,
if you see the previous thing the membership value of the first data point with respect to
the first cluster centre, so, this is actually the numerical value ok. Now, similarly the
membership value of the second point second data point with respect to the first cluster
centre this is nothing but the µ value. Similarly, the membership value of the third data

190
point with respect to the first cluster centre is nothing but this particular the numerical
value. And, corresponding to your the first the data point, if you see, the X dimension is
nothing but 0.2, second data point if you see the X dimension is your 0.4, third data point
the X dimension is 0.8, and so on. Now, I am just going to use all such information here.

Now, A is nothing but this particular expression, and I am just going to put first i equal
to 1. So, it is μ_11^g × x_11. Now, what does x_11 mean? The first data point, first
dimension; so, that is 0.2, and this is multiplied by the membership value raised to the power 1.25. The
next is i equal to 2; that means, μ_21^g, that is, the membership value of the
second data point with respect to the first cluster centre, and this is
multiplied by x_21. What does it mean? It means the second data point, first
dimension, and that is 0.4.

Similarly, when i equals to 3, this is the scenario, then i equals to 4, i equals to 5, i equals
to 6, i equals to 7, 8, 9 and i equals to 10. So, by following this and if you just simplify, I
will be getting one numerical value for A and that is nothing but 1.912120. So, this is
nothing but the numerical value for A. So, I hope this is clear to all of you and now, I
will have to find out what is B.

(Refer Slide Time: 11:30)

Now, to find out this particular B, B is nothing but the sum of μ_i1^g, where i varies from 1 to N.

191
So, I put i equals to 1, next time i equals to 2, i equals to 3, then at the end, i equals to N.
Now, if I do that, then, very easily, I will be able to find out B, that is nothing but
0.680551 raised to the power 1.25 plus 0.495150 raised to the power 1.25 plus similar
terms, where this second term corresponds, in fact, to i equal to 2; similarly, there are
terms for i equal to 3, 4, i equal to 5, i equal to 6, and so on.

So, this is the way actually we can find out B: we consider all the data points and
ultimately, you will be getting B. And, once you have got this particular B, CC_11 —
what is that? That is nothing but the first dimension of the first cluster centre. So, for the
first dimension of the first cluster centre, let me once again write: CC_jk is the k-th
dimension of the j-th cluster centre.

And, CC_11 is the first dimension of the first cluster centre and I will be getting
0.487404 and by following the same procedure, I can find out what is CC_12 and that is
nothing but the second dimension of the first cluster centre. Then comes your CC_13 and
that is nothing but the third dimension of the first cluster centre and that is your 0.543316
and similarly, we can find out CC_21 that is the first dimension of the second cluster
centre, CC_22 second dimension of the second cluster centre, that is 0.459073.

And, by following the same procedure, I can also find out CC_23 that is your the third
dimension of the second cluster centre. So, I will be getting this particular the numerical
value.

(Refer Slide Time: 14:08)

192
And, once you have got these particular numerical values, the coordinates of the
cluster centres are known, and these are determined. So, we have got this C_1, that is, the
first cluster centre, with its first dimension, second dimension and
third dimension, because these are all 3-D data, and for the second cluster centre, the first
dimension, second dimension and third dimension. Now, once you have got it, what
we will have to do is, we will have to update this μ.

That means, I will have to update the membership value of a particular data point with
respect to the clusters. So, how to update that? This formula I have
already derived: μ_11 is nothing but 1 divided by (1 plus (d_11 divided by d_12)
raised to the power 2/(g − 1)). Now, we will have to substitute the value
for this particular g, that is, the level of cluster fuzziness, which is 1.25, and we will have
to find out the Euclidean distances, that is, d_11 and d_12.

So, if you remember the d_ij is the Euclidean distance between the i-th data point and the
j-th cluster centre, this d_11 is nothing but the Euclidean distance between the first data
point and your the first cluster and d_12 is the Euclidean distance between the first data
point and the second cluster.

(Refer Slide Time: 15:58)

So, how to determine that? To determine this actually, mathematically, we can calculate
this d_11, as I told the Euclidean distance between the first data point and your the first
cluster centre, and these are nothing but the dimensions of the first cluster centres. And,

193
corresponding to the first data point this is my x, y and z coordinates and using this, so I
can find out the Euclidean distance d_11 and this will become equal to this. And,
following the same procedure, I can also find out d_12 is nothing but square root the first
data point and second cluster centre.

So, this is actually the dimension of the second cluster centre and this minus 0.2 square,
this minus 0.4 square, this minus 0.6 square and these 0.2, 0.4, 0.6 these are nothing but
actually the dimensions of the first data point. So, out of those ten data points, these are
the dimensions or the coordinates of the first data point. So, very easily, I can calculate
the d_12 and that is nothing but 0.380105056. Now, I can find out what is µ11 ?

Now, this μ_11 — let me once again repeat: μ_ij is the membership value of the i-th data
point with the j-th cluster centre. Similarly, the membership value of the first data point
with the first cluster is

μ_11 = 1 / (1 + (d_11/d_12)^8),

ok; this formula I have already derived, and the exponent 8 is nothing but 2/(g − 1) with
g = 1.25. And, if we just put all the numerical values and if you calculate, you will be
getting this μ_11 is nothing but 0.888564. So, I will be getting that particular membership value.
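
As a quick cross-check of this step, the formula can be evaluated with a few lines of code; d_12 = 0.380105056 is the value computed above, while the d_11 used below is only a placeholder, since its numerical value is read off the slide and not reproduced here, so the printed result will match 0.888564 only when the actual d_11 is supplied.

def membership_two_clusters(d_i1: float, d_i2: float, g: float = 1.25) -> float:
    """mu_i1 = 1 / (1 + (d_i1 / d_i2) ** (2 / (g - 1))); for g = 1.25 the exponent is 8."""
    p = 2.0 / (g - 1.0)
    return 1.0 / (1.0 + (d_i1 / d_i2) ** p)

if __name__ == "__main__":
    d_12 = 0.380105056      # from the hand calculation above
    d_11 = 0.29             # placeholder value; replace with the slide's d_11
    print(membership_two_clusters(d_11, d_12))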

(Refer Slide Time: 18:28)

Now, following the same procedure, I can also find out the other values and if you
calculate, then you will be getting this thing, I have already calculated, that is 0.888564,

194
just now I calculated. Similarly, the other membership values also, you can calculate. So,
these are all updated values. If you remember, to start with the algorithm, initially we
actually assumed some numerical values and after that we are updating those values, so
we have already discussed how to get this particular numerical value follow the same
principle.

So, to get all the numerical values, exactly the same principle we will have to follow. This
completes actually one iteration of this particular algorithm. Now, if you remember,
at the beginning of the first iteration, we started with some initial assumption of the
μ matrix, and now you have got a slightly modified or updated matrix, and in the
second iteration, we are going to start with this particular μ matrix, and once again, we
will repeat the process, and these particular iterations will go on and on.

(Refer Slide Time: 19:59)

And, after a large number of iterations, there is a possibility that we will be getting the
modified cluster centres and the modified values for these memberships. Now, after a
few iterations, there is a possibility that we will be getting the cluster centres something
like this. So, this will be the first cluster centre having x, y and z coordinates, and the second
cluster centre with x, y and z coordinates, and this we will be getting after running this
particular algorithm for a few iterations; so, might be after 20 iterations or 30 iterations,
we will be getting this type of modified cluster centres.

195
(Refer Slide Time: 20:48)

And, corresponding to that, we will also be getting the modified μ values, that is, the
membership function values. Now, let us try to relook at what it means; let me repeat the
same thing. Now, if you see this μ matrix and if we concentrate here, what is the
practical meaning? The meaning is something like this: the first data point belongs to the
first cluster with this much of membership value, and it belongs to the second cluster with
this much of membership value; the sum of their values is equal to 1.0.

Now, this shows the membership value of the second data point with the first cluster
centre, the membership value of the second data point with the second cluster, and so on.
Now, you try to locate, out of these ten sets of values, the data points whose membership
with respect to the first cluster centre is very near to 1. For example, you can see,
corresponding to the first data point, the membership value with respect to the second
cluster centre is very near to 0, and the same is true for a few more data points; that means,
the 1st, the 2nd, the 6th, the 8th and the 9th data points, they are forming a particular group, ok.

And, if you see here, corresponding to this, this is the membership value with respect to
the second cluster and it is very near to 1, here also, it is very near to 1 with respect to the
second cluster, here it is in between, but you just give the opportunity to join the second

196
cluster. So, this is very near to 1 with respect to the second cluster, this is also very near
to 1 with respect to the second cluster, ok.

So, this is one point, this is another point, another point, another point, another point;
that is, the 3rd point, then comes the 4th point, then comes the 5th point, then the 7th
point, and this is the 10th point; so, these will form another group.
Now, in terms of clusters, they will form one fuzzy cluster something like this. So,
this is one fuzzy cluster; there could be another fuzzy cluster, and there could be
overlapping also; and might be here, the 1st, 2nd, 6th, 8th and 9th points are
lying, and here there is a possibility that the 3rd, 4th, 5th, 7th and 10th points are lying. So, I
will be getting one such fuzzy cluster here, and I will be getting another such fuzzy
cluster here.

So, this is the way using the concept of the fuzzy C-means clustering, we will be able to
do the clustering and we will be able to get, say two fuzzy clusters particularly, for this
very simple problem having only 10 numerical data like 10 data points and we have seen
that this algorithm is able to do this clustering very efficiently.
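
Reading off the grouping from the final μ matrix, row by row, can be automated as shown below; the μ values in this sketch are made-up placeholders chosen only to reproduce the two groups mentioned above, not the actual numbers from the slide.

import numpy as np

# Hypothetical final membership matrix (10 points x 2 clusters); rows sum to 1.0.
mu = np.array([
    [0.95, 0.05], [0.91, 0.09], [0.12, 0.88], [0.08, 0.92], [0.10, 0.90],
    [0.89, 0.11], [0.15, 0.85], [0.93, 0.07], [0.96, 0.04], [0.20, 0.80],
])

labels = np.argmax(mu, axis=1)               # cluster index with the highest membership
for c in range(mu.shape[1]):
    members = np.where(labels == c)[0] + 1   # 1-based point numbers
    print(f"cluster {c + 1}: points {members.tolist()}")
# With these placeholder values: cluster 1 -> {1, 2, 6, 8, 9}, cluster 2 -> {3, 4, 5, 7, 10}.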

Thank you.

197
Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 15
Applications to Fuzzy Sets (Contd.)

(Refer Slide Time: 00:14)

We are discussing fuzzy clustering algorithms. Now, we have already explained the
working principle of fuzzy C-means clustering with a suitable numerical example. Now,
we are going to start with another very popular algorithm, which is known as entropy
based fuzzy clustering algorithm. Now, here, we are going to use a term, that is called
the entropy and which is nothing, but an index. Now, this particular index is used just to
identify which one should be the cluster centre.

Now, supposing that we have got a large number of data points in multidimensional form
and our aim is to identify, which should be the cluster centre. Now, what you do is, we
use the concept of this particular entropy, and this entropy value, if you want to
calculate, we will have to take the help of one value that is called the similarity value.
And, similarity is based on the numerical value of Euclidean distance, now supposing
that I have got two points say point i and another point say point j. So, very easily, I can
find out, what is the Euclidean distance and that is denoted by d_ij. And, if I know the

198
Euclidean distance, I can find out the similarity between these two points, that is denoted
by S_ij.

Now, if the distance between the two points is more, the similarity will be less and vice-
versa. And, as I told, once I got this particular Euclidean distance, we are in a position to
calculate the similarity. And, if I know the similarity information, we can find out this
particular entropy, and I have already mentioned that this particular entropy is an index,
which helps us to decide, which one should be the cluster centre.

Now, the point which is having the minimum entropy value is selected as the cluster
centre. Now, once I have got this particular cluster center, supposing that I have got this
is the cluster center. Now, surrounding this particular cluster center, we have got a
number of data points, now we will have to take one decision. So, out of all the data
points, which are going to enter that particular cluster.

Now, what I do is, we try to take the help of similarity, once again and there will be
some threshold value of similarity. Now, the data points surrounding this particular
cluster centre, which are found to have similarity greater than equals to some pre-
specified value will be encouraged to enter this particular cluster. Now, this is the way
actually, we do the clustering and this particular thing, we are going to discuss in much
more details.

(Refer Slide Time: 03:36)

199
Now, here, I am just going to take one typical example, supposing that we have got a
large number of data points. So, capital N number of data points and supposing that this
particular capital N is equal to say 1000. So, we have got 1000 data points and these data
points are in higher dimension say L dimensional space; that means, if I want to
represent a particular point, I need to have capital L number of numerical values.

Now, what we do is, by using this entropy-based fuzzy clustering, as I discuss, the first
thing we do is, we try to identify, which one should be the cluster centre. Now,
supposing that we have calculated entropy for each of the data points, and now how to
calculate I will be discussing in much more details. But, supposing that we have got the
entropy values, the data point which is having the minimum value of entropy will be
declared as a cluster centre. Now, supposing that let me assume that, this particular data
point is having the minimum entropy value. So, this is nothing, but the cluster center.

Now, the moment, we declare that this is the cluster center, and now surrounding this, we
will have to find out actually a few members which will also belong to this particular
cluster. Now, what we do is? So, we try to compare the similarity of the data points with
this cluster centre. So, how to find out the similarity? So, what we do is, we calculate the
Euclidean distance and using the Euclidean distance information, we try to find out the
similarity.

Now, if this particular similarity with respect to the centre of the cluster is found to be
greater than or equal to some threshold value denoted by β , then we allow those points
to lie within this particular the cluster. So, this is the way, actually we do the clustering
using the entropy-based fuzzy clustering. Now, I am just going to discuss in much more
details.

200
(Refer Slide Time: 05:52)

Now, let us consider that we have got capital N number of data points and these data
points are in L dimensional hyperspace and our aim is to actually carry out this particular
clustering; that means, we will have to divide these data points into a few clusters, and these
clusters will be fuzzy clusters. Now, we take the help of a few steps. Step 1: we
arrange the data points in N rows and L columns. So, N rows means I have got capital N
number of data points and each data point is having L dimensions; that means, L
numerical values.

And, we take the help of one matrix and its dimensions are actually N × L . So, there are
N rows and L columns. Now, step 2: we calculate the Euclidean distance between the
two data points i and j, using this particular well-known formula:

d_ij = √( Σ_{k=1}^{L} (x_ik − x_jk)^2 );

it is very simple.

Now, here, k varies from 1 to L, and L is nothing but the total number of dimensions. Now,
here, we consider k equal to 1 to L, and I have got two points, one is i and another
is j; now, dimension-wise, we try to find out the difference, square it, and after
that we add them up and we take the square root of that. So, this is the way we calculate
the Euclidean distance between the two points i and j. Now, once I have got the
Euclidean distance between the two data points.

201
(Refer Slide Time: 07:47)

Now, actually, we will have to find out what is the similarity among them. Now, as I told
that if the Euclidean distance between the two data points is more, their similarity is less
and vice-versa. Now, let us see how to represent this particular relationship in the
mathematical form. Now, in step 3; we try to find out the similarity S_ij between the two
data points, that is, i and j. Now, the way the similarity and this particular distance
relationship has been written is as follows: S_ij = e^(−α d_ij).

Now, if d_ij is more, then this becomes 1 divided by e raised to the power α d_ij. So, if
d_ij is more, this particular expression is going to be reduced; that means, the
similarity is less. So, if the two data points are too far apart, the similarity will be less, and
vice-versa. Now, here, I just want to mention that this particular relationship is actually
not unique.

Now, I can write down this particular relationship between the Euclidean distance and
similarity in a slightly different way also, but this is actually the method, the proposer
−α dij
used, so I am just going to use the same expression, that is, Sij = e , where α is a

constant and the value for this particular constant is to be determined. Now, how to
determine the value of this particular α , that I am going to discuss.

(Refer Slide Time: 09:49)

202
Now, here actually to determine the value of α , we assume a similarity of 0.5, when the
distance between the two data points that is d_ij becomes equal to the mean distance of
all pairs of the data points. Now, let me take a very simple example, supposing that I
have got only 5 points here. So, if I have got only 5 points, what could be the
possibilities of the distance values, the distance could be your d_11 then comes d_12,
d_13, d_14, d_15.

Then, the distance between 2 and 1, 2 and 2, then comes 2 and 3, then d_24 then comes
your d_25 then comes d_31, d_32. So, d_33, d_34 then comes d_35 then d_41, d_42,
d_43, d_44 then comes d_45 then d_51. So, d_52 then comes d_53 then comes d_54 and
then comes your d_55.

Now, here so, d_12 means, what is the distance between 1 and 2 and d_21 is the distance
between 2 and 1. So, we assume that your d_21 is equal to your d_12, similarly d_41 is
equal to your d_14 and so on. Now if you concentrate on the diagonal elements that is
d_11, d_22, d_33, d_44 and d_55 so, these diagonal elements if we concentrate. So, the
distance between 1 and 1 or 2 and 2 so, these are all equal to 0.

So, the distance between 1 and 1, 2 and 2, and so on is equal to 0. And, moreover, we have
already considered that d_21 is nothing but d_12; that means, if I just concentrate on
only one side of this principal diagonal, I will be able to find out the distance values; for
example, if I just concentrate on these distance values, that means, d_12, d_13,

203
d_14, d_15, d_23, d_24, d_25, d_34, d_35 and d_45. So, I will be able to find out the
distance values.

Now, here, we consider 1 2 3 4 5 6 7 8 9 10. So, we consider the 10 distance values.


Now, here, if I just know these particular 10 distance values, my purpose is served, and I
need not calculate all 5 multiplied by 5, that is, 25 distance values. Now, this particular 10 is
actually nothing but 5C2, and 5C2, if we calculate, is 5 factorial divided by
(3 factorial multiplied by 2 factorial). So, this is nothing but 5 multiplied by 4, that is, 20,
divided by 2, and this is nothing but 10.

So, if I know these 10 pieces of information, my purpose will be served. And, what we do is, we
determine d̄, that is, the mean distance. So, we consider

d̄ = (1/NC2) Σ_{i=1}^{N} Σ_{j>i}^{N} d_ij.

So, this NC2 is nothing but 10, because here N is equal to 5, according to this particular
example; that means, we consider only one side of this particular
triangle, and this is the way actually we can find out what should be the average distance
value, that is nothing but d̄.

Now, let me come back. So, we assume a similarity of 0.5, when the distance between
the two points, that is, d_ij, becomes equal to d̄. So, we put d_ij equal to d̄ in this particular
expression. Now, here actually, if I just derive, I can find out what
should be the suitable expression.

204
(Refer Slide Time: 14:43)

For this particular α , now let me try to derive here the similarity. So, if I just see the
expression for the similarity that was nothing, but S_ij. So, e raise to the power minus
alpha d_ij.

So, this is the relationship between the similarity and the Euclidean distance, and our aim
is to derive this particular expression for α. Now, what we do is, we consider a
similarity of 0.5. So, 0.5 is nothing but 1/2, and that is nothing but
e raised to the power −α d̄, because this d_ij is nothing but the mean distance, that is, d̄.

Now, if I take log base e on both sides, I will be getting ln(1/2) = ln(e^(−α d̄)). Now, the
right-hand side can be written as −α d̄ ln e, and the left-hand side can be written as
ln 1 − ln 2. Now, ln e, that is, log e to the base e, is equal to 1, and ln 1 is equal to 0. So, I
can find out −ln 2 = −α d̄, and from here, I can find out that α = ln 2 / d̄, and this is the
expression which I have written.

So, I can find out the numerical value for this particular α, and once I have got the
numerical value for this particular α, very easily, I can find out the relationship between
the similarity and the Euclidean distance values using this particular expression. So,
this is the way actually we will have to calculate the similarity. Now, once I have got this
particular similarity.
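
The mean distance d̄, the constant α and the similarity values can be computed as in the following sketch (NumPy, with made-up points used purely for illustration).

import numpy as np

def similarity_matrix(X: np.ndarray):
    """Pairwise similarities S_ij = exp(-alpha * d_ij) with alpha = ln 2 / d_bar."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # pairwise distances
    iu = np.triu_indices(len(X), k=1)                           # one side of the diagonal
    d_bar = d[iu].mean()                                        # mean of the N C 2 distances
    alpha = np.log(2.0) / d_bar
    S = np.exp(-alpha * d)
    return S, d_bar, alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((5, 2))              # 5 hypothetical points in 2-D
    S, d_bar, alpha = similarity_matrix(X)
    print("d_bar =", d_bar, "alpha =", alpha)
    print(S.round(3))                   # S = 0.5 exactly when d_ij equals d_bar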

205
(Refer Slide Time: 17:03)

Now, here, we will have to find out what should be the entropy. Now, to determine the
entropy, that is, the index, what I do is, we use this particular expression in step 4, that is,
E = −(S log_2 S + (1 − S) log_2(1 − S)), and remember one thing, this particular expression
has been actually designed based on one philosophy. Now, I am just going to discuss the
philosophy first, which is the reason behind defining this relationship between entropy and
similarity. So, this indicates actually the relationship between the entropy E and the
similarity S. Now, let me try to concentrate here: let us see, if I take some suitable value
for the similarity, then what happens to the entropy.

Now, let me assume that S is equal to 0. Now, if I take S equal to 0, that means the
similarity equals 0; that means, the distance between the two points is actually
very high and the two points are too far apart; and let us see what happens to the entropy. So, by
using this particular the expression, now if S is equal to 0, what will happen to the
entropy? So, E is nothing, but minus so, I am just going to put S is equals to 0 and here if
I put S is equal to 0, log 0 is not defined. So, it is undefined, but it is multiplied by 0 so,
its contribution will be actually 0.

Then, comes your minus 1 minus S so, S equals to 0. So, I will be getting 1 here, then log
base 2 S equals to 0. So, log base 2, 1 and log base 2, 1 is once again equal to 0. So, I
will be getting 0. So, this is nothing, but 0. So, if I put S equals to 0. So, I am getting
entropy is equal to 0, now let me put another extreme value for this similarity, let me put

206
S is equal to 1; that means, your the similarity between the two data points is equal to 1;
that means, the two data points are exactly similar and the distance between them is
equals to 0.

Now, if I put S equals to 1 in this particular expression, I will be getting that entropy is
nothing, but so, −1log 2 1 . So, that is equals to 0 then comes your S equals to 1 so, this
will becomes 0. So, once again, I will be getting 0. So, far S equals to 1; that means,
when the similarity between the two data point is equals to 1; that means, your the
Euclidean distance is equal to 0, then also the entropy becomes equal to 0.

So, for the two extreme conditions, when the similarity is equal to 0 and when the
similarity is equal to 1, the entropy becomes equal to 0. Now, let us try to find out what
happens when S is put equal to, say, 0.5, that is, one half, because S is in between now.
So, we calculate the entropy: it is nothing but −(1/2) log₂(1/2) − (1/2) log₂(1/2), because
both S and (1 − S) become one half. So, this can be written as −log₂(1/2), and that is
nothing but −(log₂ 1 − log₂ 2).

Now, log₂ 2 is actually 1 and log₂ 1 = 0. So, I will be getting minus 1 here, and there is a
minus outside; this minus and minus will become equal to plus 1. So, the entropy
becomes equal to 1. So, when the similarity is 0.5, the entropy becomes equal to 1.0. So,
based on this particular philosophy, this relationship has been derived, and once I have
got this particular thing, now you are in a position to find out
the total entropy for each of the data points.
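
To see this philosophy numerically, the following small sketch (my own illustration, with an assumed function name) evaluates E(S) = −S log₂ S − (1 − S) log₂(1 − S), treating 0·log₂ 0 as 0, and checks the three cases discussed above.

import math

def pair_entropy(s):
    """E(S) = -S*log2(S) - (1-S)*log2(1-S), with 0*log2(0) taken as 0."""
    def term(p):
        return 0.0 if p == 0.0 else p * math.log2(p)
    return -(term(s) + term(1.0 - s))

print(pair_entropy(0.0))   # 0.0 : points far apart
print(pair_entropy(1.0))   # 0.0 : identical points
print(pair_entropy(0.5))   # 1.0 : maximum uncertainty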

207
(Refer Slide Time: 22:05)

Now, you have got capital N number of data points, and for each of the data points, I will
be able to find out E_i, where i varies from 1, 2 up to N. So, for all the data points, I will
be getting the entropy values, that is, the index values, and to calculate the total entropy,
this is the expression which I will have to use.

So, E_i = −Σ_{j∈X, j≠i} ( S_ij log₂ S_ij + (1 − S_ij) log₂(1 − S_ij) ). So, this is the way actually, we can find

out what should be the total entropy for each of the data points. And, once I have got this
information of the total entropy for all the data points, I can move on to the clustering steps.
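
Before the steps, a minimal sketch of this summation (my own wording of the formula, assuming the similarity matrix S from the earlier sketch) could look as follows.

import numpy as np

def total_entropies(S):
    """E_i = -sum over j != i of (S_ij log2 S_ij + (1 - S_ij) log2 (1 - S_ij))."""
    n = S.shape[0]
    E = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if j == i:
                continue
            s = S[i, j]
            for p in (s, 1.0 - s):
                if p > 0.0:
                    E[i] -= p * np.log2(p)
    return E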

208
(Refer Slide Time: 23:11)

Now, I am in a position to give the statement of the different steps of this algorithm. So,
we are going to discuss the steps to be followed in this clustering algorithm. Now, step 1:
we calculate the entropy E_i for each data point x_i lying in the hyperspace T. So, as I
told, we have got capital N number of data points in L dimensions; all such data points,
I am just going to represent by the hyperspace T of data points, ok.

Now, what do you do? We try to calculate the total entropy for each of the data points
using the method which we have already discussed. And, once I have got the whole
information of this entropy for all the data points, you are in a position to identify the
particular data point which is having the minimum value of entropy. So, step 2: we
identify the x_i that has got the minimum entropy value, and that is declared as the
cluster centre. So, we declare that particular point, which is having the minimum value
of entropy, as the cluster centre.

209
(Refer Slide Time: 24:39)

And, once I have got that particular cluster centre, in step 3, what I do is, we put x_i
minimum and the data points having similarity with x_i minimum greater than β into a
cluster, where β is nothing but the threshold value of similarity.

So, the user is going to define what should be the threshold value of similarity; for
example, it could be 0.6, it could be 0.4, it could be 0.8, and so on. And, what you do is,
we have got this particular cluster centre, and surrounding it, we are going to define one
cluster. So, the data points which are having similarity with this cluster centre greater
than or equal to the threshold value are allowed to enter this particular cluster.

So, finally, I will be getting this particular cluster. So, let me repeat step 3: we put x_i
minimum and the data points having similarity with x_i minimum greater than β in a
cluster, and we remove them from the hyperspace T. Supposing that initially I had 1000
data points and, in the first cluster, 300 data points are entering; then I have got 1000
minus 300, that is nothing but 700 data points remaining, and with the help of these
remaining 700 data points, I will try to form the second cluster, the third cluster, and
so on.

Now, step 4 checks if T is empty. If it is yes, you terminate the program; that means, all
the data points have been put into some cluster. Else, we go to step 2; that means, we

210
repeat from step 2 to step 4. So, this is the way actually we do the clustering using the
entropy-based fuzzy clustering.
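
Putting steps 1 to 4 together, a compact sketch of this clustering loop might look like the following; it reuses the similarity and entropy helpers sketched earlier and the threshold β, and is only an illustration of the steps under my own assumptions, not the lecture's code.

def entropy_based_clustering(S, E, beta):
    """Steps 1-4: repeatedly pick the minimum-entropy point as a cluster centre,
    gather all points whose similarity with it is >= beta, and remove them."""
    remaining = list(range(S.shape[0]))   # the hyperspace T of unclustered points
    clusters = []
    while remaining:
        # Step 2: point with minimum total entropy among the remaining ones
        centre = min(remaining, key=lambda i: E[i])
        # Step 3: members are the centre plus points similar enough to it
        members = [j for j in remaining if j == centre or S[centre, j] >= beta]
        clusters.append((centre, members))
        remaining = [j for j in remaining if j not in members]  # Step 4: repeat until T is empty
    return clusters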

(Refer Slide Time: 26:48)

Now, here, I just want to tell you that this clustering algorithm is very flexible: if I just
change the value of this β a little bit, there will be a significant change in the obtained
clusters, and that is why this clustering algorithm is said to be very flexible.

Now, here, I am just going to discuss one concept, that is, the concept of an outlier.
Now, supposing that I have got 1000 data points and these 1000 data points have been
divided into a few clusters; for example, this is one cluster, this is another cluster, this is
another cluster, and these are fuzzy clusters, so there could be overlapping also.

So, there could be another cluster, say C_4. Now, what we do is, we try to count the
number of data points present in each of the clusters. So, we try to find out how many
data points are there in the first cluster, that is, C_1, how many in the second cluster
whose centre is C_2, the number of data points in the third cluster, and the number of
data points in the fourth cluster. And, supposing that these 1000 data points have been
clustered into four clusters and we know how many data points are present in each of the
clusters, after that, what we do is, we try to decide whether all the clusters are valid
clusters or whether there could be a few outliers; outliers mean those data points which
do not belong to any of the clusters.

211
So, what I do is, we try to count the number of data points present in each of the clusters,
and if that count is found to be greater than or equal to γ percent of the total number of
data points, then we declare that this particular cluster is a valid cluster. Now, if I take
γ equal to 10 percent, that is, one-tenth, then 10% of 1000 is nothing but 100 points.
That means, to declare a particular cluster a valid one, there must be at least 100 points
in that particular cluster. Supposing that there are only 30 points in a particular so-called
cluster, we define those 30 points as nothing but outliers, and we, in fact, do not consider
that particular cluster having only 30 points as a valid cluster.
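
As a small illustration of this γ check (again, my own sketch with assumed names), a cluster is kept as valid only if it holds at least γ percent of all the data points; the members of the remaining clusters are treated as outliers.

def split_valid_and_outliers(clusters, total_points, gamma_percent=10.0):
    """Keep clusters holding at least gamma percent of all points; the others are outliers."""
    threshold = gamma_percent / 100.0 * total_points
    valid, outliers = [], []
    for centre, members in clusters:
        if len(members) >= threshold:
            valid.append((centre, members))
        else:
            outliers.extend(members)
    return valid, outliers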

Now, this is the way actually we do the clustering, and as I have already mentioned,
while doing this particular clustering, we will have to be very careful so that the clusters
become distinct.

(Refer Slide Time: 30:00)

So, we try to define distinct clusters. So, the clusters have to be distinct; it has already
been discussed that the clusters have to be compact, and at the same time, there should
not be any outliers. So, the number of outliers should be as small as possible. Our aim is
to minimize the number of outliers, and in the ideal condition, the number of outliers
should be equal to 0; our aim is also to maximize the distinctness and to maximize the
compactness. Now, these I have already discussed; to measure the

212
distinctness of the clusters, we consider inter-cluster distances, this I have already
discussed in the last lecture.

So, to measure the distinctness, what I do is, we consider the inter-cluster distances. To
measure the compactness, what I do is, we consider the intra-cluster Euclidean distance
values. So, this is the way actually we try to find out what should be the distinctness and
what should be the compactness, and we can also measure the number of outliers. And,
let me repeat, our aim is to reach the clustering which ensures the maximum distinctness,
the maximum compactness and a minimum number of outliers. Now,
this is the way actually we use this particular entropy-based fuzzy clustering.
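
One simple way to quantify these two measures, sketched below under my own assumptions (cluster members given as index lists, data as a NumPy array), is to use the distances between cluster centres for distinctness and the average member-to-centre distance for compactness.

import numpy as np

def distinctness_and_compactness(data, clusters):
    """Distinctness: mean inter-cluster (centre-to-centre) distance.
    Compactness: mean intra-cluster distance of members to their own centre."""
    centres = [data[members].mean(axis=0) for members in clusters]
    inter = [np.linalg.norm(centres[a] - centres[b])
             for a in range(len(centres)) for b in range(a + 1, len(centres))]
    intra = [np.linalg.norm(data[m] - centres[k])
             for k, members in enumerate(clusters) for m in members]
    return np.mean(inter) if inter else 0.0, np.mean(intra) if intra else 0.0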

Thank you.

213
Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture - 16
Applications of Fuzzy Sets (Contd.)

(Refer Slide Time: 00:14)

Now, we are going to discuss one numerical example to explain the working principle of
this entropy-based fuzzy clustering. Now, I am just going to take the same numerical
example, which I took for the previous algorithm, that is fuzzy C-means algorithm.

Now, once again, let me take the same example. So, this example is nothing but the
example of the free-form surface, and let me repeat what I will have to do: this is the
free-form surface, and this particular free-form surface I will have to generate, that is, I
will have to do the machining to get this type of free-form surface. So, how to get it?
Now, to carry out this particular machining with the help of milling cutters, before that,
what we do is, we do clustering based on similarity.

Now, here we are going to discuss how to use this entropy-based clustering to solve this
clustering problem, or how to achieve the suitable clusters.

214
(Refer Slide Time: 01:32)

Now, let us take the same example, and here, what I am going to do is, we are going to
consider the same set of 10 points, which are lying on that free-form surface, and let me
do this particular clustering using entropy-based clustering. Now, these data points are
nothing but 3D data points; that means, corresponding to the first point, I have got the X
dimension, Y dimension and Z dimension.

Similarly, we have got 10 data points. Now, in fact, if I want to do the optimization for
the practical problem related to the machining of a free-form surface, we will have to
take a very large number of data points, maybe 10000 or 20000 data points, but here, for
simplicity, I am just going to consider only 10 data points. And, for each of the data
points, I have got three dimensions. So, in matrix form, this particular data can be
represented by a 10 × 3 matrix: there are 10 rows and 3 columns.

Now, here, I am just going to carry out the fuzzy clustering based on similarity and
entropy, and we assume that the threshold value for this particular similarity, that is, β,
is nothing but 0.5. And, to determine whether there is any outlier, we consider the
concept of γ, and we assume that γ is equal to 10 percent. Now, here, let us see how to
use this particular concept to solve the clustering problem.

215
(Refer Slide Time: 03:15)

Now, the first thing we do is, we try to find out the Euclidean distance between the
two data points i and j. So, this formula, I have already discussed:

d_ij = √( Σ_{k=1}^{L} (x_ik − x_jk)² ).

And, here, we are going to consider actually the 10 data points, that is, N is equal to 10.

(Refer Slide Time: 03:47)

Now, using this actually, we can find out what should be d̄ and all such things. Because,
here, the number of pairwise combinations, that is, ^N C_2, is actually the total number of

216
distance values which we will have to consider and calculate. And, here, it is nothing
but ^10 C_2, and that is nothing but 10 factorial divided by 8 factorial and 2 factorial.
So, it is 9 multiplied by 10, that is 90, divided by 2; so I have got actually 45 distance
values. And, here, d_ji is nothing but d_ij, and the diagonal elements are all put equal to
0, because d_00, d_11, and so on are nothing but equal to 0.

So, using this particular information, very easily you can find out what your d̄ is.

(Refer Slide Time: 04:45)

That is, the mean distance d̄. And, once you have got this particular d̄, that is, the
summation of d_ij divided by 45, it comes out to be something like 0.518373, and once
you have got this particular d̄, very easily, I can find out this α: α is nothing but ln 2
divided by d̄, and if I calculate it for this particular data set, it will become equal to
1.337160. And, once you have got this particular value for α, now we are in a position to
calculate what should be the similarity, that is, S_ij, and that is nothing but
S_ij = e^(−α d_ij), with α equal to 1.337160.

217
(Refer Slide Time: 05:43)

Now, let me just calculate what should be the Euclidean distance values and what should
be the similarities. Now, there are 10 points, and the way I am marking them, the first
point is marked as 0, the second point is marked as 1, and similarly, the 10th point is
marked as 9. So, what we do is, we first try to determine the Euclidean distance between
0 and 1, that is nothing but d_01; so, this is nothing but d_ij, and this is nothing but the
similarity, that is, S_ij.

So, what we do is, the distance between 0 and 1 we try to calculate the way I have
already discussed, and once I have got that particular distance value and knowing the
value of α, we can also find out what the similarity is. Now, similarly, the Euclidean
distance and similarity for the different data combinations, like 0 and 2, 0 and 3, 0 and 4,
0 and 5, 0 and 6, 0 and 7, 0 and 8 and 0 and 9, we can find out. So, for the different
combinations of these data points starting from 0, I can find out the Euclidean distance
values and I can find out their similarities, exactly the same way as I have already
discussed.

218
(Refer Slide Time: 07:23)

Now, next, I will have to find out the distance and similarity values. So, this is nothing
but d_ij and this is S_ij, and I can find out the distance and similarity between 1 and 2,
1 and 3, 1 and 4, 1 and 5, 1 and 6, 1 and 7, 1 and 8, and 1 and 9. Now, this 1 and 0 I
should not determine, because I have already calculated 0 and 1; the distance between
0 and 1, that is, d_01, is nothing but d_10, and the similarity between 0 and 1 is nothing
but the similarity between 1 and 0. So, starting from 1, that is, 1 and 2, 1 and 3, and so on
up to 1 and 9, I can find out the distance and the similarity values.

(Refer Slide Time: 08:19)

219
Now, by following the same procedure, I can also find out the Euclidean distance, that is,
d_ij, and the similarity, that is, S_ij, between 2 and 3, 2 and 4, 2 and 5, 2 and 6, 2 and 7,
2 and 8, and 2 and 9. Now, here, I should not determine 2 and 0, because I have already
determined 0 and 2, and similarly 2 and 1, because I have already considered 1 and 2.
So, I can find out the Euclidean distance and the similarity; similarly, starting from 3, I
can find out between 3 and 4, 3 and 5, 3 and 6, 3 and 7, 3 and 8 and 3 and 9, using this
particular method.

(Refer Slide Time: 09:04)

Then, between 4 and 5, I can find out the distance and similarity; then 4 and 6, 4 and 7,
4 and 8 and 4 and 9; the other combinations I have already considered. Then, between
5 and 6, 5 and 7, 5 and 8, 5 and 9; then between 6 and 7, 6 and 8, 6 and 9, 7 and 8,
7 and 9 and 8 and 9, I can find out the Euclidean distance and the similarity. Now, for
the different combinations of the data points, we have already calculated their Euclidean
distance values and the similarity values, and we are going to use this particular
information to find out what should be the total entropy for each of the data points.

220
(Refer Slide Time: 09:51)

So, here, we have considered that there are ten points; that means, starting from E_0, I
will have to find out up to E_9. Now, let us see how to find out the first one, that is, E_0.
For E_0, I am just going to use this particular expression with i equal to 0, and j, which
belongs to X and is not equal to i, varies from 1 up to 9.

So, what I will have to do is, I will have to write the term for j equal to 1, that is,
−S_01 log₂ S_01 − (1 − S_01) log₂(1 − S_01); then the term for j equal to 2, that is,
−S_02 log₂ S_02 − (1 − S_02) log₂(1 − S_02); and so on, and the last term will be
−S_09 log₂ S_09 − (1 − S_09) log₂(1 − S_09). All such terms are to be added together,
and if I just calculate, then I will be getting E_0 equal to 8.285456, the way I have
written it here.

So, this is the way actually we can find out what should be the total entropy of a
particular point.

221
Student: Summation of the task (Refer Time: 12:40).

So, these terms are to be added; in fact, there is no equality between the individual
terms. All such terms, I will have to go on adding, or rather go on subtracting because of
the minus sign, and then finally, you will be getting this particular value. So, the equal
sign there should be replaced by a negative sign, because this is a summation.

Now, following this method, I will be getting this particular E_0, and similarly, we can
also find out what E_1 is, what E_2 is, and then E_3. So, using this, I can find out what
should be the entropy values for the different data points.

(Refer Slide Time: 13:25)

Now, as I told, by following the same procedure, I can find out what should be E_4, then
E_5, E_6, E_7, E_8 and E_9. Now, if I compare all the entropy values, that is, E_0, E_1,
E_2, E_3, E_4, E_5, E_6, E_7, E_8 and E_9, the minimum in terms of the numerical
value will be E_0. That means the first point will be selected as the first cluster center;
that means, the first cluster center will be the first point.

222
(Refer Slide Time: 14:13)

So, the 0th point, that is, the 1st point, is nothing but the first cluster center. And, let me
recall that the threshold value for the similarity, that is, β, is equal to 0.5. Now, if we just
go back to the picture of the similarity values, I can find out the first cluster; for example,
the point 0 has been considered as the first cluster center.

(Refer Slide Time: 14:42)

And, now, I will have to find out the similarity between 0 and 1, up to 0 and 9, and I
concentrate on these particular similarity values; β, the threshold value of similarity,
is 0.5.

223
So, I will have to identify those points whose similarity with the cluster center, that is,
point 0, is greater than or equal to 0.5. Now, here, if you see, the similarity with the 1st
point is 0.669551; so, this point is very similar to the first cluster center, and it should be
considered in the first cluster. The next similarity is 0.424773, which is less than 0.5, so
the 2nd point should not be considered; the 3rd point should also not be considered; the
4th should be considered in the first cluster.

The 5th should be considered, but the 6th should not be considered; the 7th should be
considered, the 8th should be considered, but the 9th should not be considered in the
first cluster. And, the same thing actually, I have just put here, so you can see the
composition of the first cluster.

(Refer Slide Time: 16:04)

So, we have considered that the 0th point is nothing but the cluster center, and the other
points in the cluster will be the 1st, 4th, 5th, 7th and 8th points. Now, if the 0th is
counted as the 1st point, these become the 2nd, 5th, 6th, 8th and 9th points, and this is
the way actually we will have to form this particular cluster.

224
(Refer Slide Time: 16:26)

Now, once you have got this particular cluster, out of the ten points, a few points have
been considered, but we have got a few remaining points. What are those remaining
points? The remaining points are nothing but the 2nd, 3rd, 6th, and 9th. And, out of
these, if we compare the E values, that is, E_2, E_3, E_6 and E_9, this E_6 is found to
be the minimum. So, the 6th point will be considered as the second cluster center, and
once you have considered the 6th point as the second cluster center, once again you see
the similarity of the other points, like the 2nd, 3rd and 9th, with the 6th, and those
similarity values, we have already calculated.

So, what we do is, we consider the center, that is nothing but the 6th point, and
surrounding that, we have got the points like the 2nd point, then the 3rd point, then the
9th point in this particular cluster. Now, with whatever we have discussed till now, we
have got two clusters.

225
(Refer Slide Time: 17:38)

And, if I just draw it, this might be my first cluster and this my second cluster, and the
clusters are fuzzy in nature. So, there could be some overlapping region also.

Now, here, the 0th point is the first cluster center, and that is followed by the 1st, 4th,
5th, 7th and 8th points. And here, in the second cluster, the 6th is actually the cluster
center, and the 2nd, 3rd and 9th will be the data points which are going to follow the 6th,
that is, the second cluster center. So, this is the way actually we will be getting two such
fuzzy clusters using the entropy-based clustering.

226
(Refer Slide Time: 18:30)

Now, if you see here, the number and nature, or the quality, of the clusters depends
actually on a number of parameters. In entropy-based fuzzy clustering, we have
considered a few parameters: for example, we have got α, which relates the Euclidean
distance and the similarity; then we have got β, that is, the threshold value of similarity;
then we have got γ, which decides the outliers. So, the performance depends on these
particular α, β and γ, and we have seen that this particular clustering algorithm is very
flexible and can yield distinct clusters, but the compactness will be less.

Now, this algorithm is also very fast, and you will be getting distinct clusters, but, as I
told, we may not get very compact ones. And, that is why we try to combine the merits
of the fuzzy C-means algorithm and the merits of this entropy-based clustering
algorithm, just to develop the entropy-based fuzzy C-means clustering. Actually, this
entropy-based fuzzy C-means clustering has been proposed by us, where we tried to
consider the merits of these two algorithms and tried to eliminate their inherent
demerits.

Now, let me repeat what we need: we need the clusters to be very distinct, we need the
clusters to be very compact, and at the same time, the number of outliers should be as
small as possible. So, we formulated this as an optimization problem, and we solved it
using a nature-inspired optimization tool, that is, the genetic algorithm,

227
which I am not going to discuss in details in this course. With this, we could find very
distinct and very compact clusters, and the number of outliers became minimum. So,
that type of ideal clusters, we could get.

And, moreover, another experience which I am going to share with you people: the
performance of clustering algorithms is found to be data dependent. So, for different
data sets, the performance of the clustering algorithms could be different, and actually, it
depends on the nature of the data sets.

(Refer Slide Time: 21:17)

Now, the references: from the textbook for this particular course, that is, Soft
Computing: Fundamentals and Applications written by me, you will be getting the
material which I discussed; you can also consult the book Fuzzy Sets and Fuzzy Logic:
Theory and Applications by George Klir; and for the combined entropy-based fuzzy
clustering, if you want to have a look, you will have to look into the paper written by us
on genetic algorithm-tuned entropy-based fuzzy C-means clustering for obtaining
distinct and compact clusters.

Now, here, I just want to tell you that in this course, I am not going to discuss the
principle of the nature-inspired optimization tools or the genetic algorithm in details.
Actually, the working principle of the genetic algorithm and other nature-inspired
optimization tools has been discussed in much more detail in another MOOC course,
that is called Traditional and Nontraditional Optimization Tools

228
developed by me, and this is also discussed, in details, in your textbook, that is, Soft
Computing: Fundamentals and Applications by D. K. Pratihar. So, you can have a look at this book.

(Refer Slide Time: 22:43)

Now, let me conclude whatever I discussed. So, here we tried to concentrate on the
various applications of fuzzy sets. Now, as I told, if you see the literature, the fuzzy set
has been used to solve a variety of problems; out of all such problems, two problems I
have discussed in details: how to design and develop the fuzzy reasoning tool in the form
of a fuzzy logic controller, so that we can establish the input-output relationships. So, to
establish the input-output relationships, we can take the help of the fuzzy reasoning tool
or the fuzzy logic controller. And, the principle of the fuzzy reasoning tool, like the
Mamdani approach and then the Takagi and Sugeno approach, we have discussed, in
details, with the help of suitable numerical examples.

Now, after that, we started with fuzzy clustering. Two very popular tools for fuzzy
clustering, one being the fuzzy C-means clustering and another being the entropy-based
fuzzy clustering, have been discussed in details, with the help of some numerical examples.

Thank you.

229
Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 17
Optimization of Fuzzy Reasoning and Clustering Tool

(Refer Slide Time: 00:24)

We are going to discuss the optimal design of fuzzy reasoning and clustering tools. Now,
in this lecture, I am just going to concentrate on these topics. At first, I will give a brief
introduction to the nature-inspired optimization tools; and after that, I will concentrate on
how to optimize the fuzzy reasoning tool; and at the end, we will deal with optimization
related to fuzzy clustering. Now, let me start with this optimization or the nature-inspired
optimization.

230
(Refer Slide Time: 01:00)

Now, before I start with this nature-inspired optimization tools, let me give you a very
brief introduction to the concept of optimization, and why should you go for this nature-
inspired optimization tools. And, these are also known as nontraditional optimization
tools. Now, by the term: optimization actually we try to select the best design out of all
the possibilities. Supposing that we have got a number of feasible designs, and we try to
find out which one is the optimal, that is the task of optimization.

Now, if you see the literature on optimization, a huge literature is available. Now, there
are a large number of classical tools or the traditional tools for optimization, these are
also known as the conventional methods of optimization. And, there are a few
unconventional or nature-inspired optimization tools. Now, if you see the conventional
or traditional optimization tools, these are broadly classified into two groups: one is
called the direct search method, and another is known as the gradient-based method.

Now, in this gradient-based method, we try to find out the search direction by using the
gradient information of the objective function. So, in the gradient-based method, the
algorithm will try to move along the gradient direction or opposite to it, depending on
whether the problem is a maximization or a minimization one. On the other hand, in the
direct search method, the search direction is decided by the value of the

231
objective function. And, here we do not need the gradient information of the objective
function.

Now, for these traditional methods, there are a few demerits. I am just going to mention
those demerits one after another, and then I will try to explain why you should go for
these nature-inspired optimization tools. Now, if you see the various demerits of the
traditional or conventional optimization tools, the first one is that, in a traditional
method, there is a chance that the algorithm is going to get stuck at a local optimum
solution. So, instead of reaching the globally optimal solution, there is a possibility that
we will be getting a locally optimal solution.

Now, if you are going to use the gradient-based method and supposing that I have got
one objective function which is discontinuous, we cannot find out the gradient. So, it is a
bit difficult to apply the gradient-based optimization tool. Then comes the case where we
have got a few integer variables; if there are integer variables, it becomes a bit difficult
for a traditional tool to tackle that type of optimization problem.

The next is parallel computing: if you want to make the search faster, a traditional tool,
as it starts with only one solution selected at random, cannot use the principle of parallel
computing. And, if you see the traditional tools for optimization, to solve different types
of problems, we will have to take the help of various algorithms; a particular algorithm
may not be suitable to solve a variety of problems. So, it may not be so robust.

Now, to overcome all these drawbacks of the traditional tools for optimization, we take
the help of the nature-inspired or non-traditional optimization tools. Now, let us see the
reality: whenever we face some very difficult real-world problem and we are unable to
use the traditional tools for optimization in a very efficient way, the moment we feel
that, we try to see the way our nature has solved similar types of problems, and we try to
copy its principle in an artificial way. And consequently, a large number of
nature-inspired optimization tools have come into the picture.

232
(Refer Slide Time: 06:22)

Now, if you see the literature, a large number of tools are available, the literature is huge.
For example, say it starts with say genetic algorithm, in short this is known as GA, which
was proposed in the year 1965. There are a few other algorithms like your genetic
programming, in short this is known as GP. We have got the evolution strategies, now
this is known as ES; then we have got evolutionary programming, it is known as EP.
Then, comes your particle swarm optimization, in short this is known as PSO.

Then, we have got ant colony optimization, this is known as ACO; then, we have got
artificial immune system, that is AIS. We have got artificial bee colony, in short this is
known as ABC, and others. We have got, in fact, a large number of nature-inspired
optimization tools. Now, this particular course actually, it is not a course on
optimization. So, I will not be able to discuss all such algorithms in details.

Now, what I am going to do is, I am just going to discuss, in brief, the working principle
of at least one algorithm and that is the most popular one, that is your the genetic
algorithm. And, we are going to use the principle of genetic algorithm just to optimize
the fuzzy reasoning tool and clustering algorithms. So, we are going to discuss all such
things, in details.

233
(Refer Slide Time: 08:22)

So, let me start with your the working cycle of a genetic algorithm. Now, as I told that
this is not a course on optimization, and I will be discussing in brief the working
principle of genetic algorithm. Now, if you want to get more information, you will have
to refer to the textbook of this course, that is, Soft Computing: Fundamentals and
Applications, or we have already developed another course on Optimization, that is a
MOOC course, NPTEL course, that is Traditional and Non-traditional Optimization
Tools. So, you will have to attend that particular course.

Now, let me start with the working principle or the working cycle of this particular the
genetic algorithm. Now, as I told, the genetic algorithm was proposed in the year 1965
by Prof. John Holland. Now, he is from the University of Michigan, USA. Now, Prof.
John Holland, he proposed actually the concept of this genetic algorithm. And, genetic
algorithm is actually a population-based search and optimization tool, which works
based on Darwin’s principle of natural selection. So, according to this natural selection,
it is the survival of the fittest. So, in principle, this genetic algorithm can solve the
maximization problem. But, supposing that you have got a minimization problem, now
this minimization problem can be converted into a maximization problem for the purpose
of solving.

Now, here, as I told that it starts with a population of solution, and this population of
solution is generated at random using the random number generator. Now, let me take

234
the very simple example, supposing that I am just going to maximize a very simple
problem say an optimization problem and this is a function of say two variables: x_1 and
x_2. And, of course, x_1 and x_2 are having some range, like the minimum and
maximum values for x_1 and x_2. Now, what we do is, here actually I am just going to
use the binary-coded GA, that is, BCGA. We have got a few other types of GA like real-
coded GAs, then gray-coded GAs, and so on, but here, we are going to
concentrate on the binary-coded genetic algorithm.

Now, I am just going to tell you, in short, how to generate this initial population of
solution. And, its name indicates the BCGA, that is a binary-coded genetic algorithm that
means the variables x_1 and x_2 will be represented with the help of some binary
numbers. For example, this is nothing but a collection of 1s and 0s. Now, to represent
this x_1 and x_2, the first thing we will have to do is, we will have to select how many
bits we are going to assign to represent these x_1 and x_2. Now, if we need the better
precision, we will have to assign more number of bits and vice-versa.

Now, depending on the level of the precision we need, if we want to determine, how
many bits to be assigned, that can be determined very easily using this particular
relationship: l = log₂((x_1max − x_1min)/ε). And, this ε is actually the level of precision we
need. Now, this indicates that if you need more precision or the better precision, so we
will have to assign a large number of bits.

Now, let me assume that say we are going to assign 5-bits to represent each of these x_1
and x_2. Now, if you see this initial population or a particular GA-string. So, these
particular five bits are going to represent x_1; then comes your 1 0 1 0 1, another five
bits, so this is going to represent x_2. So, the GA-string will be actually 10-bits long and
this is population-based approach.

So, what we do is, we try to actually generate the whole population, and the population
size is denoted by capital N. So, this could be equal to say 100 or say 200 or say 50
depending on the complexity of the problem. And, what you do is, this population of
solutions, in fact, we generate at random. So, the whole population of solutions will be
generated at random, and this is the way, actually, by using the random number
generator, we can generate the initial population of your genetic algorithm.

235
Now, as this particular population is generated at random, there is no guarantee that we
will be able to select all such good solutions in the initial population. So, this is nothing
but your initial population of genetic algorithm. And, once you have got, the number of
bits and the bits to represent x_1, so very easily, we can find out the real value
corresponding to this particular x_1, which is determined as follows:

x_1 = x_1min + ((x_1max − x_1min)/(2^l − 1)) × DV.

Now, x_1 minimum, x_1 maximum will be supplied, l is actually the number of bits to
be assigned to represent x_1. Now, we will have to find out this decoded value. Now,
determining the decoded value is not difficult. Now, let us concentrate on these particular
5-bits. Supposing that I have got the bits like here 1 0 1 1 1, the place value for this
particular 1 is 2 raised to the power 0. Here, it is 2 raise to the power 1; the place value
for this is 2 raised to the power 2; this is 2 raised to the power 3, and this is 2 raised to
the power 4.

Now, its decoded value will be nothing but 2 raised to the power 4, which is 16, plus 2
raised to the power 2, which is 4, plus 2 raised to the power 1, which is 2, plus 2 raised to
the power 0, which is 1; so 16 plus 4 is 20, plus 2 is 22, plus 1, that is 23. So, 23 is
nothing but the decoded value. So, very easily, you can find out the real value for this
particular x_1 using this linear mapping rule. Now, this is known as the linear mapping rule.
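
A tiny sketch of this decoding and linear mapping is shown below; the variable range used in the example is an assumption of my own, only for illustration.

def decode_bits(bits):
    """Decoded value of a binary string, e.g. '10111' -> 23."""
    return int(bits, 2)

def linear_map(bits, x_min, x_max):
    """Linear mapping rule: x = x_min + (x_max - x_min) / (2**l - 1) * DV."""
    l = len(bits)
    return x_min + (x_max - x_min) / (2 ** l - 1) * decode_bits(bits)

print(decode_bits("10111"))            # 23
print(linear_map("10111", 0.0, 10.0))  # 0 + 10/31 * 23 ≈ 7.419 (assumed range 0..10)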

Now, by following the same procedure, I can also find out, what should be the value for
your x_2, the real value for this particular x_2. And, once you have got, the real values
for these particular your x_1 and x_2, so using the expression of say objective function,
so I can find out, what should be the numerical value for this particular objective. And,
this is nothing but the fitness of a particular GA-string provided this is a maximization
problem.

Now, once you have got this particular fitness value for the whole population, we are in a
position to use the other operators, so that we can further modify it. Now, supposing that
your so this particular fitness information is something like this. Now, supposing that the
fitness for the first GA-string is your f_1, then comes your second GA-string is f_2, and
your the last GA-string, so its fitness is nothing but f_n. And, once you have got the

236
fitness information for the whole population, we are in a position to use actually the
operator and that is called the reproduction operator.

Now, the purpose of this reproduction operator is to actually select the good solutions
from the initial population just to make the mating pool. Now, the population size for this
mating pool will once again be equal to your N, that is N could be say 100. So, by using
the reproduction scheme actually, we are going to select all such good strings from the
initial population, and we will try to make one population of size capital N, that is
nothing but the mating pool. Now, if you see the literature, so we have got, in fact, the
different types of your the reproduction scheme.

(Refer Slide Time: 18:51)

Now, if you see the literature, we have got reproduction schemes like proportionate
selection. So, in proportionate selection, what we do is, we try to select a GA-string
based on its fitness value. So, the higher the fitness, the higher will be the probability of
being selected in the mating pool, and vice-versa. Now, this proportionate selection can
be implemented with the help of the roulette wheel selection, which is actually a very
old scheme, or we can go for some sort of ranking selection. So, in this roulette wheel
selection or the ranking selection, what we do

237
is, we try to select the good solutions from the initial population and its probability of
being selected is proportional to the fitness.

Next, we have got another very popular reproduction scheme, and that is known as the
tournament selection. Now, in tournament selection, we choose a tournament size. If the
population size, denoted by capital N, is 100, the tournament size we select could be,
say, 3 or 4 or 5, generally not more than 5.

And, depending on this particular tournament size, what we do is, we select a few
solutions at random from the initial population, and we compare them in terms of their
fitness values. And, if it is a maximization problem, we select the GA-string which
corresponds to the maximum fitness, and we put it in the mating pool. And, if the
population size is 100, then 100 times you will have to play this particular tournament,
and each time, we are going to select a particular string.
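
A minimal sketch of this tournament selection (my own illustration; the population and fitness values are assumed to be given, and maximization is assumed) is shown below.

import random

def tournament_selection(population, fitness, tournament_size=3):
    """Build a mating pool of the same size N by repeatedly picking the fittest
    of a few randomly chosen candidates (maximization assumed)."""
    n = len(population)
    mating_pool = []
    for _ in range(n):
        candidates = random.sample(range(n), tournament_size)
        winner = max(candidates, key=lambda i: fitness[i])
        mating_pool.append(population[winner])
    return mating_pool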

Now, supposing that we have got the mating pool. Now, the size of the mating pool once
again will be made equal to your population size, that is nothing but your capital N. And,
now we just go for the next operator and that is nothing but the cross over operator. Now,
in crossover operator actually, there will be an exchange of properties and the two
parents are going to participate in crossover. And, due to the exchange of properties,
some good children solutions will be created.

Now, if you see the literature, we have got different types of crossover schemes. For
example, in the binary-coded genetic algorithm, we have got the single-point crossover,
the two-point crossover, the multi-point crossover, the uniform crossover, and so on. So,
we have got different types of crossover. Now, as the time is short and this is not a
course on optimization, I am just going to discuss only the single-point crossover. Now,
just to see how it works, let me concentrate on the single-point crossover only.

238
(Refer Slide Time: 22:52)

Now, if I see the principle of your single-point crossover, the principle is very simple.
Now, supposing that I have got say the two parents, which are going to participate in
crossover. Now, one parent could be your 1 0 1 1 1 1 0 1 0 1, and another parent could
be 0 1 1 0 1 1 1 1 0 0. Supposing that we have got so these two parents, and these two
parents, so this is parent_1 and this is your parent_2. So, these two parents are going to
participate in say a single-point crossover.

Now, what you do is there are 10 bits. So, there are 9 places for the crossover. Now,
supposing that I select a particular crossover site at random and supposing that the cross
over site is selected here. Now, if this is the cross over site, which is selected at random,
then the bits which are lying on the left hand side will be kept intact, and there will be
swapping of the bits, which are lying on the right side of this particular crossover site.

So, if you just do this particular single-point crossover, we will be getting the children
solutions as follows. The first child will begin with 1 0 1 1 1 1, and the second with 0 1 1 0 1
1. So, this will remain the same. And, there will be swapping here. So, these particular
bits will go up. So, this will become 1 1 0 0, and this will come down. So, this will
become 0 1 0 1. So, this is nothing but child_1 and this is nothing but child_2. So, in
children solution, there is an exchange of properties. Now, if the parents are good, the
children solutions are expected to be good, but there is no guarantee.
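
The same swap can be written down in a few lines; the sketch below reproduces the example above with the crossover site fixed after the sixth bit (in practice the site is chosen at random).

def single_point_crossover(parent1, parent2, site):
    """Keep the bits to the left of the site, swap the bits to its right."""
    child1 = parent1[:site] + parent2[site:]
    child2 = parent2[:site] + parent1[site:]
    return child1, child2

p1, p2 = "1011110101", "0110111100"
c1, c2 = single_point_crossover(p1, p2, site=6)
print(c1)  # 1011111100
print(c2)  # 0110110101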

239
So, using this crossover operator, there is an exchange of properties, and due to this
particular exchange, there is a possibility that some new properties will be added, which
is going to give some sort of diversification to the solutions. Now, once you have got the
children solutions, the size of the children population will once again become equal to
N, where N is the population size. So, I will be getting all such children solutions here,
and the population size is once again kept equal to capital N, where capital N might be
equal to 100.

Next, we go for the mutation operator. Now, before I discuss the principle of this
mutation operator, let me try to explain the reason behind going for this particular
mutation operator. Now, in biology, by mutation, we mean a sudden change of
properties. Now, a particular GA-string can be compared to our chromosome; on the
chromosome, you will find some gene values, and the properties of a human being
depend on the gene values. Similarly, the property of this particular GA-string will
depend on the bit values.

Now, in biological mutation, there is a sudden change of a parameter; the same thing has
been copied here in the artificial mutation. Now, what we do is, we check each of the
bits present in the population, whether it is going to participate in mutation, with a small
probability value, which is denoted by p_m. So, p_m indicates actually the probability of
mutation.

So, what we do is, we try to check at each of the bit positions whether there will be
mutation or not. How to implement it? It is very simple: at each of the bit positions, we
do a coin tossing for the appearance of head with a small probability p_m. So, if head
appears, this is a success, and we go for the mutation; otherwise, the bit will remain the
same, that means, it will remain intact.

So, what you do is, at each of the bit positions, we check with the probability p_m. And,
if there is a success, that is, if head appears in the coin-tossing, then at the particular bit
at which we get that success, if there is a 0, it will be replaced by 1 and vice-versa. And,
by doing this bit-wise mutation, we can bring in some sort of a local change, and this
particular mutation is going to help us to avoid the local minimum problem.
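
A short sketch of this bit-wise mutation (illustrative only; p_m here is a small assumed mutation probability) is given below.

import random

def bitwise_mutation(chromosome, p_m=0.01):
    """Flip each bit independently with a small probability p_m (the coin-tossing)."""
    mutated = []
    for bit in chromosome:
        if random.random() < p_m:          # "head" appears: mutate this bit
            mutated.append("1" if bit == "0" else "0")
        else:
            mutated.append(bit)
    return "".join(mutated)

print(bitwise_mutation("1011110101", p_m=0.1))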

240
Now, let me take a very simple example. So, we will understand the concept of so this
local minima, and how can this particular mutation help to overcome this local minima
problem? Let me take a very simple example, the objective function is a function of only
one variable. So, y is a function of x. And, supposing that I have got a plot like this, and I
am solving a minimization problem. And, this is nothing but the globally minimum
solution, and this is your locally optimal or the locally minimum solution. Supposing
that, all the population solutions are lying in this local basin. So, if you run genetic
algorithm for a large number of iteration, there is no guarantee that we are going to hit
the globally optimal solution.

Now, the function of this mutation is just to push one solution to the global basin. And, if
you can push one solution to the global basin, then iteratively it is going to reach that
globally optimal solution. So, this is the way, actually mutation is going to help. Now,
with the application of this reproduction, crossover and mutation, that completes one
iteration or one generation of the GA.

Now, this particular process will continue for a large number of iterations. The moment
it reaches this termination criterion, we declare that is the end of generation, and we try
to find out the optimal solution. Now, this is the way actually, one genetic algorithm, one
binary-coded genetic algorithm works to determine the optimal solution.

Thank you.

241
Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 18
Optimization of Fuzzy Reasoning and Clustering Tool (Contd.)

(Refer Slide Time: 00:15)

Now, we are going to discuss, how to carry out optimization to achieve one optimized
fuzzy reasoning tool. Now, we have seen that the performance of a fuzzy reasoning tool
or a fuzzy logic controller depends on its knowledge base, which is nothing, but a
collection of its database and rule base. Now, let us see the scheme, how can you
optimize or how can you tune this particular knowledge base of the fuzzy logic controller
or fuzzy reasoning tool, so that we can model the input-output relationships of a process
as accurately as possible.

Now, here what we do is, this knowledge base of the fuzzy logic controller, that is the
database and the rule base, we try to optimize with the help of a nature-inspired
optimization tool. Now, during this training or the tuning, what we will have to do is, we
will have to take the help of some known input-output relationships and that is known as
the training scenarios. Now, this nature-inspired optimization tool is an iterative method
and it takes a huge amount of time just to converge to the optimal solution. And, that is
why, this particular training or the tuning of the parameters is carried out offline. Now,

242
once you have got the optimized knowledge base of the fuzzy logic controller, now we
can pass the set of inputs through the fuzzy logic controller.

So, we will be getting this particular output, online, might be within a fraction of second
for a set of inputs. So, we will be getting that particular the output and that is why, once
it is trained offline and now, we are in a position to use it online. Now, let us see how
does it work.

(Refer Slide Time: 02:35)

Now, to explain this, we are going to take the help of one numerical example. Now,
there are several approaches of optimizing this particular fuzzy reasoning tool or fuzzy
logic controller; the first approach is known as the GA-based tuning of a manually
constructed FLC.

So, what we do is, supposing that I want to determine the input-output relationships of a
particular process. Now, what we do is, we try to design the knowledge base which is
nothing, but a collection of database and rule base manually. So, based on the
information we have we try to design manually first, but that may not be the optimal in
any sense. And, after that, we are going to take the help of an optimizer, it is a genetic
algorithm, so that we can find out the optimal knowledge base for this particular fuzzy
logic controller.

243
Now, as I told that we are going to take the help of one binary-coded genetic algorithm
here. So, a binary-coded genetic algorithm is used to obtain optimal database and rule
base of a fuzzy reasoning tool and let me consider a process having say 2 inputs and 1
output, the two inputs are I_1 and I_2 and we have got one output, that is denoted by O.
The membership function distribution of the inputs: I_1 and I_2 and the output O are
assumed to be triangular in nature, for simplicity.

So, this shows the membership function distribution for the first input, that is I_1 and this
shows the membership function distribution for the second input, that is I_2 and this
shows the membership function distribution for the output, that is O. Now, both the
inputs and the output actually are represented with the help of four linguistic terms each,
for example, say the linguistic terms are low, medium, high and very high. Now, there
are four linguistic terms for I_1 and four linguistic terms for I_2.

So, I will have 4 multiplied by 4, there are 16 possible combinations for the inputs and
we are going to consider the 16 rules and each rule is nothing, but the relationship
between the inputs and the output. Now, as I told that for simplicity, we have considered
that membership function distribution is triangular. So, for this medium and high, we
have considered isosceles triangles, and there is an overlapping region also. And, for the low
and very high, in fact, we are going to consider some sort of right-angled triangles, and
similarly, for this I_2 and O.

So, we are having the membership function distribution for your the low, medium, high
and very high. And, here, once again low, medium high and very high and once you have
got the membership function distribution, as I told, now, we are in a position to design
the rule base.

244
(Refer Slide Time: 06:23)

Now, here, we have got the 16 rules. Now, the 16 rules can be read as follows: For
example, the first rule could be so if I_1 is say low. So, I_1 is low and I_2 is your low.
So, I_2 is low, then the output O is also low. So, this is actually the first rule; similarly
we can design all the 16 rules, now all these are manually constructed 16 rules.

Now, the designer will design this particular rule base based on his or her experience of
that particular problem. Now, as I told that a particular rule is nothing, but the
relationship between the inputs and output. And, let me read once again the first rule and
which is as follows: if I_1 is low and I_2 is low then output O is low and similarly we
have got say 16 such rules.

245
(Refer Slide Time: 07:51)

Now, I am still continuing with the statement of this particular numerical example. So,
here once again let me tell that a binary-coded GA, we will be using to optimize both the
database as well as the rule base of this fuzzy reasoning tool. And, here, we have got the
set of training cases, and a particular training scenario is nothing but the relationship
between the inputs and output.

For example, say the first training scenario is nothing, but if I_1 is 10 with some units
and I_2 is 28 with some other units, then the output is nothing but 3.5, with yet another
unit. So, this is the way actually, we can design and we can collect in fact, the known
input output relationship and that is nothing, but the set of training cases or the set of
training scenario. Now, here we have got say capital T number of training scenarios.

246
(Refer Slide Time: 09:05)

Now, once you got this particular training scenario, now, actually we will have to design
that particular GA-string, the GA string for the binary-coded GA. Now, this particular
GA-string will carry information of all the design variables, now, what are the design
variables, let me have a look first, then I will concentrate here.

(Refer Slide Time: 09:29)

Now, one of the design variables could be your this particular b_1. So, b_1 is going to
represent, whether this particular triangle will be is a stiffer one or it will be a flatter one.
For example, say if it is a right-angled triangle, this indicates the base width of this right

247
angled triangle and if it is isosceles triangle. So, this b_1 indicates actually the half base
width of this particular isosceles triangle.

Similarly, for this particular I_2, b_2 is actually the variable. So, b_2 could be large or it
could be small, similarly for this particular output. So, b_3 could be your the design
variables. Now, we will have to assign some bits to represent; so, this b_1, b_2 and b_3,
let me assume that I am assigning. So, 5 bits to represent each of the variables like your
b_1, b_2 and b_3.

(Refer Slide Time: 10:41)

Now, once you have assigned. So, some bits like 5 bits each to b_1, b_2 and b_3. So,
now, we have got, in fact, your for 3 variables, 5 plus 5 plus 5; so 15 bits to represent
b_1, b_2 and b_3. Now, I have to represent the rule base, how to represent the rule base?
I have got 16 rules, now to represent, in fact, the presence or absence of a rule, we use
either 1 or 0; now if it is 1, it means that that particular rule is present and if it is 0, the
rule is absent.

Now, what you do is, this indicates actually the rule base. So, we concentrate on the left
most top corner. So, we start with this, then we just move in this particular direction,
next we move in this particular direction, next we move, in fact, in this particular
direction, next we move this particular direction to represent that particular the rule and
what you do is. So, if there is 1 here means that particular rule is present, if there is a 0

248
here means that particular output is absent, that means your that particular output is
absent, and so on.

So, there are 16 such rules. So, I need, in fact, 16 bits just to represent, whether the rule
is present or not. So, the GA-string will consist of 15 for this b_1, b_2 and b_3 ( this is
for b_1, b_2 and b_3) plus 16 for the rule base. So, GA-string will be your 31 bits long.
So, the same thing actually, I am just going to represent here.

(Refer Slide Time: 12:43)

Now, if you see the GA-string, this particular population, which are generated at random,
the first 5 bits are going to represent the b_1, the next 5 bits are going to represent in fact,
your b_2, the next 5 bits are going to represent b_3 and the remaining 16 bits are going
to represent the rule base.

So, the rule base is represented by these 16 bits, this is going to represent b_1, this is
going to represent b_2 and this is going to represent b_3, and the GA-string will be your
31 bits long. And, similarly, we have got the whole population of solutions generated at
random. Now, once you have got this, now let us see, how can it optimize, how can GA
optimize that particular database and the rule base.

249
(Refer Slide Time: 13:49)

Now, before we proceed further, let me finish the statement of the problem. Now, in
fact, our aim is to determine the deviation in prediction for the set of training scenarios;
that means, once it is trained, we are going to pass a set of inputs as the test scenario.

And, we will try to find out what should be the output and how much is the deviation.
During the optimization, these are the ranges for b_1, b_2 and b_3; these are all real
variables, and we will have to define the ranges for b_1, b_2 and b_3. So, this completes
actually the statement of the problem. Now, let us see how to find out the solution for
this particular problem.

(Refer Slide Time: 14:49)

Now, this particular method I have already discussed a little bit, so let me discuss it once again. Let me concentrate on the first GA-string, which is 31 bits long. The first five bits are going to represent b_1, b_2 is going to be represented by the next 5 bits, the next 5 bits are going to represent b_3, and the remaining 16 bits are going to represent the rule base. Now, we have already discussed how to find out the decoded value; let me find it out for 1 0 1 1 0.

The bit positions carry the values 2 raised to the power 0, 2 raised to the power 1, 2 raised to the power 2, 2 raised to the power 3 and 2 raised to the power 4, starting from the right. So, the decoded value will be 2 raised to the power 4, which is 16, plus 2 raised to the power 2, which is 4, plus 2 raised to the power 1, which is 2; so, this is nothing but 22, and this is the decoded value.

And, once you have got the decoded value, once again we will have to use that linear mapping rule, which has already been discussed, and using that linear mapping rule, I can find out what should be the real value for this particular b_1, that is, the first design variable. By following a similar procedure, I can also find out what b_2 and b_3 are, and thus we have got the real values for b_1, b_2 and b_3.
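To make the decoding step concrete, here is a minimal Python sketch of the binary decoding and the linear mapping rule just described; the variable range used for b_1 below is only an assumption for illustration, not a value from the lecture.

```python
def decode(bits):
    """Decode a binary substring (most significant bit first) into its integer value."""
    value = 0
    for b in bits:
        value = 2 * value + b
    return value

def linear_map(decoded, lower, upper, n_bits):
    """Linear mapping rule: spread the decoded value linearly over [lower, upper]."""
    return lower + (upper - lower) * decoded / (2 ** n_bits - 1)

bits_b1 = [1, 0, 1, 1, 0]                   # the 5-bit substring discussed above
decoded_b1 = decode(bits_b1)                # 16 + 4 + 2 = 22
b1 = linear_map(decoded_b1, 0.0, 5.0, 5)    # hypothetical range [0.0, 5.0] for b_1
print(decoded_b1, round(b1, 6))
```

The same two helpers give b_2 and b_3 from their own 5-bit substrings, once their ranges are defined.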

(Refer Slide Time: 16:41)

So, now in fact, we are in a position to find out like what should be the modified
membership function distribution. So, the modified membership function distribution
will look like this. So, the starting value for I_1 we keep it fixed, similarly the starting
value for I_2 is kept fixed, starting value of output is kept fixed.

Now, we are going to find out what should be the modified value for this particular b_1, and depending on the values of this b_1, b_2 and b_3 (this is b_2 and this is nothing but b_3), we redraw the modified membership function distribution.
So, the modified membership function distribution for I_1, I_2 and O will look like this,
and once you have got the modified membership function distribution.

(Refer Slide Time: 17:43)

Now, we are going to discuss like how to select the good rules with the help of the GA
string, supposing that these 16 bits lying on the first GA-string are going to represent the
presence and the absence of the rules. Now, this I have already discussed that these
represent actually the output of the rules. So, I_1 has got 4 linguistic terms, I_2 has got 4
linguistic terms.

So, this represents the first rule. Now, if I just read off this particular GA-string here: this is 1, then there are three 0's (0 0 0), so I am here; next is 1 0, so I am here; next is 1 0, so I am here; next is 1 0, so I am here; next is 1 1 1, so I am here; then 0 0 1. A 1 means that particular rule is present, and a 0 means that rule is absent, and so on. So, this is the way we can code the rule base inside the GA-string.

(Refer Slide Time: 19:09)

Now, once we have got this particular thing, what we can do is pass a particular training scenario. The first training scenario is something like this: I_1 is 10 and I_2 is 28, and we will have to find out what should be the output of this fuzzy reasoning tool. Now, for I_1 equal to 10 and I_2 equal to 28, if we see the modified membership function distribution, it is like this.

(Refer Slide Time: 19:53)

So, I_1 equal to 10 means I am here. This I_1 could be either medium with some µ value or high with another µ value; similarly, this I_2 is 28, which means I am here. So, for this 28, it could be low with this much of membership function value, and it could be medium with this much of membership function value.

So, this particular I_1 could be medium or high, and this I_2 could be either low or medium. So, I have got 2 multiplied by 2, that is, a maximum of 4 fired rules. Now, out of these maximum possibilities, we will have to check which of the rules are present. To check it, we will have to come here and find out, out of these 4 possible fired rules, which one, which two, which three or which four rules are present.

Now, for this particular problem, if you see, only 2 rules are actually fired, and the fired rules are as follows. The 1st fired rule: if I_1 is medium AND I_2 is low, then output is low. The 2nd fired rule: if I_1 is high AND I_2 is low, then output is medium. Now, corresponding to these, if I_1 is medium, this is the membership function distribution for medium. So, corresponding to this, I am passing I_1 equal to 10, and I should be able to find out what should be the membership function value, and this is nothing but the membership function value.

Now, this I have already discussed: by using the principle of similar triangles, very easily you can find out what should be the membership function value corresponding to this medium, and if you calculate, you will be getting µ_M, which is nothing but 0.83.

(Refer Slide Time: 22:17)

Now, you follow the same principle to find out what should be the µ value corresponding to I_2 equal to 28. So, corresponding to 28, I can also find out µ_low following the principle of similar triangles, and I can calculate that µ_low is nothing but 0.13.
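The similar-triangle calculation can be written as a small helper, as sketched below; the corner values used in the example are hypothetical placeholders for whatever the optimized data base gives, so the resulting numbers are only illustrative.

```python
def tri_membership(x, a, b, c):
    """Membership of x in a triangular fuzzy set with corners a <= b <= c and peak 1.0 at b;
    each branch is the similar-triangle proportion used in the lecture."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)        # rising edge
    return (c - x) / (c - b)            # falling edge

# Hypothetical corners only; the actual corners follow from the optimized b values.
mu_medium = tri_membership(10.0, 5.0, 11.0, 17.0)   # membership of I_1 = 10 in 'medium'
mu_low    = tri_membership(28.0, 20.0, 20.0, 29.2)  # membership of I_2 = 28 in 'low'
```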

(Refer Slide Time: 22:49)

Now, you can find out the µ values. If I concentrate on the 1st fired rule, that is nothing but: if I_1 is medium AND I_2 is low, then output is low, and this is the membership function distribution for low. Corresponding to this medium, we have already determined µ_M, and corresponding to low, we have determined µ_low. We will have to take the minimum of the two, as we have already discussed, and corresponding to this lower value of µ, this will be the output, the fuzzified output.

(Refer Slide Time: 23:33)

Now, following the similar procedure, I can also find out, what should be the output for
the 2nd fired rule, but before that let me concentrate a little bit more on how to determine
the area and center of area, corresponding to this fuzzified output, which we have got
corresponding to the 1st fired rule. Now, I will have to find out the area and center of
area of this particular the shaded portion. Now, to determine the area and center of area,
so what we do is.

So, this is divided into two parts, I have got one rectangle sort of thing and I have got one
triangle sort of thing, ok. Now, what you can do is, I can find out the area for this
rectangle, that is A_1 and area for this triangle, that is A_2 and I can also find out the
center of area. Now, this I have already discussed in much more detail, so I am just going to skip the detailed discussion here. This area A_1 can be determined like this, and the center of area, that is C_1, can be determined like this. Similarly, the area of the triangle, that is A_2, can be determined like this, and its center of area can be calculated like this.
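As a rough check of this step, the sketch below computes the area and centre of area of a clipped (fuzzified) triangular output numerically; it replaces the rectangle-plus-triangle decomposition shown on the slide with a simple numerical integration, which gives the same A and C values, and the corner values used are again only placeholders.

```python
import numpy as np

def clipped_area_centroid(a, b, c, mu, n=4001):
    """Area and centre of area of a triangular output set (corners a <= b <= c)
    clipped at the firing strength mu."""
    x = np.linspace(a, c, n)
    left = (x - a) / max(b - a, 1e-12)     # rising edge (guarded for right-angled sets)
    right = (c - x) / max(c - b, 1e-12)    # falling edge
    y = np.clip(np.minimum(np.minimum(left, right), mu), 0.0, None)
    dx = (c - a) / (n - 1)
    area = float(np.sum(y) * dx)
    centroid = float(np.sum(x * y) * dx / area)
    return area, centroid

# Placeholder corners for the 'low' output set, clipped at mu = 0.13 (1st fired rule).
A1, C1 = clipped_area_centroid(0.0, 0.0, 4.0, 0.13)
```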

(Refer Slide Time: 25:07)

And, once you have got this area and center of area, I am now in a position to find out, corresponding to the 1st fired rule, what should be the center of the combined area. So, corresponding to the 1st fired rule, I can find out the area and the center of area. This area is nothing but this, and the center of area is nothing but this, corresponding to the 1st fired rule.

(Refer Slide Time: 25:41)

And, once you have got this, we can concentrate on the 2nd fired rule. The 2nd fired rule states: if I_1 is high AND I_2 is low, then output is medium. I can find out what is µ_H, I can also find out what is µ_low, and I can compare them; so, this is the minimum.

So, corresponding to this in fact, I can find out. So, this will be the area, the shaded area
and now, I will have to find out the area and center of area of this particular shaded
portion.

(Refer Slide Time: 26:23)

Now, this area I can find out; this is the area of the shaded portion, and its center of area I can also find out. So, if I know the area and center of area for this particular fuzzified output of the 2nd rule, and knowing the same for the 1st rule, we are now in a position to find out, using the center of sums method, what should be the crisp output.

Now, if you follow the principle of the center of sums method, very easily you can find out that this will be the crisp output, and this crisp output is nothing but the calculated output. Now, if I know the calculated output.
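The center of sums combination itself is just a weighted average of the centres of area, weighted by the areas; a minimal sketch, with the A and C values treated as placeholders:

```python
def center_of_sums(areas, centroids):
    """Center of sums defuzzification: crisp output = sum(A_i * C_i) / sum(A_i)."""
    return sum(a * c for a, c in zip(areas, centroids)) / sum(areas)

# With the area/centre pairs of the two fired rules (A1, C1) and (A2, C2):
# crisp_output = center_of_sums([A1, A2], [C1, C2])
```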

(Refer Slide Time: 27:09)

So, this is the calculated output, we compare with the target output, find out the deviation
and the deviation could be either positive or negative, take the mod value and this is
actually the deviation. Now, this particular deviation is corresponding to the first training
scenario. So, this is the first training scenario, now we will have to follow the same
procedure for all the training scenarios and we will be able to find out d_2, d_3 up to
d_T.

(Refer Slide Time: 27:51)

And, after that, what we do is, we try to find out the total deviation, which is divided by the number of training scenarios, and that is nothing but the average deviation. Now, this particular average deviation is nothing but the fitness of the first GA-string.

So, this is the GA-string for which we have got the fitness, and this is f_1, which is nothing but $\bar{d}$, that is, the mean deviation. Similarly, you can find out the fitness for the second GA-string, the fitness for the N-th GA-string, and this is a minimization problem. Now, if we use the binary-coded GA and want to handle this minimization, we can convert the objective function: to minimize this particular fitness, we will have to maximize, say, 1 divided by the fitness. So, I can maximize 1 divided by the fitness just to minimize the fitness, that is, f.
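Put together, the fitness evaluation of one GA-string can be sketched as below; the small eps added to the denominator is my own safeguard against a zero deviation, not something stated in the lecture.

```python
def fitness(deviations):
    """Mean absolute deviation over all T training scenarios (to be minimized)."""
    return sum(abs(d) for d in deviations) / len(deviations)

def ga_objective(deviations, eps=1e-6):
    """The binary-coded GA maximizes, so minimizing the fitness f is recast as
    maximizing 1/f (eps avoids division by zero for a perfect fit)."""
    return 1.0 / (fitness(deviations) + eps)
```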

Now, you are going to use the operators like reproduction, crossover and mutation. Now,
GA through a large number of iterations will try to find out the optimal knowledge base
of the fuzzy logic controller; that means, your optimal database and rule base for the
fuzzy logic controller so that this particular fuzzy logic controller can predict the output
for a set of inputs as accurately as possible.

Thank you.

Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 19
Optimization of Fuzzy Reasoning and Clustering Tool (Contd.)

Our aim is to design the optimal knowledge base of a fuzzy reasoning tool. Now, in approach 1, which we have already discussed, the designer, based on his or her own experience of the problem to be modeled, designs the knowledge base.

(Refer Slide Time: 00:37)

That is, the rule base and data base of the fuzzy reasoning tool. After that, we use one optimizer, say one nature-inspired optimization tool like the genetic algorithm, to tune its data base and rule base. Now, this genetic algorithm, through a large number of iterations, will try to find out what should be the optimal knowledge base for this fuzzy reasoning tool.

Now, suppose that we are going to model a very complicated process, a real-world problem, and it is a bit difficult for the designer to determine the knowledge base, that is, the rule base and data base, of the fuzzy reasoning tool beforehand. In that case, we cannot go for approach 1, that is, the genetic algorithm-based tuning of the knowledge base of the fuzzy reasoning tool. There, we will have to go for another approach, and that is approach 2, which is nothing but automatic design of the FLC using a genetic algorithm.

Now, here, we do not determine the rule base of the fuzzy reasoning tool or fuzzy logic
controller beforehand, because we do not have sufficient information of the process to be
controlled. And, here, the whole task of designing the rule base is given to the genetic
algorithm. Now, genetic algorithm through a large number of iterations will try to
evolve, what should be the optimal database, and what should be the optimal rule base of
the fuzzy reasoning tool, so that it can make the prediction as accurately as possible.

Now, this approach, that is approach 2, so we are going to discuss with the help of one
numerical example, the same numerical example which I consider for approach 1, so I
am just going to consider it once again. So, our aim is to design and develop the
knowledge base of one fuzzy reasoning tool or fuzzy logic controller, whose aim is to
model a process having two inputs: I_1 and I_2, and it has got only one output that is O.

Now, we have already discussed that four linguistic terms are used to represent I_1, I_2, and the output O. The linguistic terms could be, say, very low, low, medium and high, or they could be low, medium, high and very high. We are going to consider four linguistic terms, that is, low, medium, high and very high. So, I have got four such linguistic terms for representing I_1 and four linguistic terms for representing I_2; thus, I have got 4 multiplied by 4, that is, 16 rules, which means 16 possible combinations of the input parameters.

And, once again, we are going to use four linguistic terms to represent the output, that is, low, medium, high and very high. Now, let us see how we can implement this particular approach, that is, automatic design of the FLC using a genetic algorithm. Here, there are four linguistic terms for the output. So, what we do is, if the output is low, that is represented by 0 0; medium can be represented by 0 1; high can be represented by 1 0; and very high can be represented by 1 1. So, to represent the output of a particular rule, we are going to use 2 bits. And, there are 16 rules, so I will have to use 16 multiplied by 2, that is, 32 bits to represent the outputs of these rules.

Now, here, we have got the variables in the way I discussed in the first approach. So, we have got a variable to represent the shape of the membership function distribution for input one, and that is nothing but b_1, if we remember. For I_2, we have got another variable, that is b_2; and for the output, we have got another variable, that is nothing but b_3. So, we have got b_1, b_2 and b_3, these three real variables. And, then, we have got 16 bits just to represent the presence or absence of the rules.
And, we have got 16 multiplied by 2, that is, 32 bits to represent what should be the
output for the 16 rules.

Now, if you see, we are assigning 5 bits to represent b_1, 5 more bits to represent b_2, and 5 more bits to represent b_3; then there will be 16 bits to represent the presence or absence of the 16 rules, and 2 multiplied by 16, that is, 32 bits to represent the outputs of the 16 rules. So, we have got 5 + 5 + 5 + 16 + 2 × 16 = 63. In total, we have got 63 bits, and the GA-string will be 63 bits long. A particular GA-string will carry the information of the data base and rule base, including the outputs of the rules, for this fuzzy logic controller.
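A sketch of how one such 63-bit string might be split, assuming the bits are stored as a list of 0/1 integers in the order just described; the output code table follows the 0 0 → low, 0 1 → medium, 1 0 → high, 1 1 → very high convention given above.

```python
OUTPUT_CODE = {(0, 0): 'low', (0, 1): 'medium', (1, 0): 'high', (1, 1): 'very high'}

def split_ga_string(bits):
    """Split a 63-bit GA-string into the b_1/b_2/b_3 substrings, the 16 presence
    bits and the 16 two-bit rule outputs."""
    assert len(bits) == 63
    b1, b2, b3 = bits[0:5], bits[5:10], bits[10:15]
    presence = bits[15:31]                          # 1 = rule present, 0 = absent
    out_bits = bits[31:63]                          # 2 bits per rule
    outputs = [OUTPUT_CODE[tuple(out_bits[2 * i:2 * i + 2])] for i in range(16)]
    return b1, b2, b3, presence, outputs
```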

Now, our aim is to pass one set of inputs like I_1 is equal to 10 and I_2 is 28.0. And, we
have got actually the target output, that is 3.5. And, we will try to find out what should
be the calculated output and that particular calculated output will be compared with the
target output just to find out the deviation. And, this particular deviation, we will have to
minimize. So, this is actually the problem description. The same numerical example, we
are going to solve using approach 2.

The only difference here is that in the earlier approach, that is, approach 1, we have a predetermined set of rules; here, we have got the combinations of the input parameters, but their corresponding outputs are not known. So, we are going to use the genetic algorithm to find out what should be the output for each of these 16 rules. This is actually approach 2. Now, I am just going to solve this numerical example in detail.

(Refer Slide Time: 08:24)

Now, if you see, the population of the solution or the GA strings, so a particular GA-
string will look like this. So, as I mentioned that say it has got 63 bits. So, this is nothing
but the first GA-string. Similarly, we have got second GA-string and we have got capital
N number of GA-strings, because the population size is nothing but your capital N. And,
generally we consider N is equal to say 100. So, we have got 100 such GA-strings in the
population. And, these particular GA-strings are generated at random using the random
number generator. Now, if I concentrate on a particular GA-string, for example, the first
GA string here. Now, let us see, how to determine the output corresponding to this
particular the GA-string.

(Refer Slide Time: 09:21)

Now, here this shows, in fact, the first GA-string. So, this is, in fact, the first GA-string.
So, the first 5 bits represent the b_1 that is 1, 2, 3, 4, 5. So, 5 bits represent b_1. The next
5 bits represent b_2; and b_3 is represented by the next 5 bits. Now, 16 bits are going to
represent actually the presence or absence of the rules, and the output of the rules are
represented by these 32-bits, this is 16 multiplied by 2, that is, 32-bits. So, one complete
GA-string is going to represent actually the database and the rule base for this particular
fuzzy logic controller.

Now, let us see how to find out the output for a set of inputs. Corresponding to these particular 5 bits used to represent b_1, we can find out its real value, knowing the lower limit and upper limit for this particular b_1. If you determine the real value for this particular b_1, we will be getting 3.419355; so, this is the real value for b_1. By following the same principle, I can find out the real value for b_2, and that is nothing but 9.193548. Then, corresponding to this, I can find out the real value for this particular b_3, and that is nothing but 1.370968. And, once we have got the real values for b_1, b_2 and b_3, we are now in a position to find out what should be the modified membership function distribution corresponding to these b_1, b_2, and b_3.

(Refer Slide Time: 11:25)

Now, if you see the modified membership function distribution, this is actually the
modified membership function distribution for I_1, this is for I_2 and this is for the
output O. And as we discussed that we have got four linguistic terms: low, medium, high
and very high, for I_1, I_2 and this particular the output. The only thing is, we have
optimized what should be the base width for the right angle triangle used to represent
low or the half base-width for the isosceles triangle used to represent medium and high,
and so on.

And for simplicity, we consider the symmetric triangles. So, this is the modified
membership function distribution for I_1, modified membership function distribution for
I_2, and the modified membership function distribution for the output. Now, what we will have to do is find out the rule base, which is represented by this particular GA-string.

(Refer Slide Time: 12:42)

Now, before that let me just try to say that, if I put I_1 equals to 10, and I_2 equals to 28,
so this will be as follows:

(Refer Slide Time: 12:57)

If I put I_1 equal to 10 here, I_1 equal to 10 means I am here. So, this particular I_1 can be called medium with this much of membership function value, and it can be called high with this much of membership function value. Similarly, if I put I_2 equal to 28 here, then corresponding to 28, we can find out the membership function value corresponding to low and the membership function value corresponding to medium.

(Refer Slide Time: 13:37)

Now, let us see, what happens to the rule base. Now, to find out the rule base
corresponding to this particular substring, I should say, for example, say these 16-bits
will represent the presence and absence of the rule, that means, if I start from here, 1
indicates that the first rule will be present, then second, third, fourth will be absent, then
the fifth rule will be present, and so on. Now, if I just implement, so very easily we can
find out what are the rules to be there in the rule base. Now, if I consider that this first
rule is present and what should be its output, that is decided by these two bits.

Now, this is 0 0, and if it is 0 0, it is nothing but the first option, that is, low; so, the output indicated by this 0 0 is low. Therefore, if the first rule is present, its output will be low. Now, from this particular table, I can find out, according to that particular substring, the rules which are found to be present, and they are as follows.

If I_1 is low and I_2 is low, then the output is low; so, this particular rule is present. But the second rule is absent, the third rule is absent, and the fourth rule is absent. Then, the fifth rule is present, which states: if I_1 is medium and I_2 is low, then the output is medium. The sixth is absent, the seventh is present, the eighth is absent, the ninth is present, the tenth is absent, and so on. Now, as I told, this particular I_1, which is equal to 10.0, can be called either medium or high. Similarly, I_2, which is equal to 28.0, can be called either low or medium. So, there is a maximum of four fired rules, and let us see, out of those four fired rules, which one, which two, which three, or whether all four are present or not.

Now, let me try to find out whether the first combination is present or not. If I_1 is medium, so I am here, and I_2 is low, then the output is medium; so, this particular rule is present. Next, if I_1 is medium and I_2 is medium, the rule is absent here. Next, if I_1 is high, so I am here, and I_2 is low, that means I am here, this particular rule is present, whose output is high. Next, if I_1 is high, so I am here, and I_2 is medium, so I am here, that particular rule is absent.

So, out of the maximum of four rules, only two are found to be present here, and they are as follows: if I_1 is medium and I_2 is low, then the output O is nothing but medium; this is one present fired rule. Another present fired rule is: if I_1 is high and I_2 is low, then the output O is nothing but high. So, out of these four, only these two rules are found to be present here. Now, we will have to find out what should be the output corresponding to these two fired rules, and then I will have to combine them just to find out the combined output or the combined control action.

Now, let us see how to find out.

(Refer Slide Time: 18:15)

Now, corresponding to the first fired rule, that is, if I_1 is medium and I_2 is low, then the output is medium, we can find out what should be the fuzzified output. Corresponding to the second rule, once again, I can find out what should be the fuzzified output, and then we combine them. Then, we will be getting the fuzzified output for the combined control action considering these two rules. Once you have got it, we can use the center of sums method of defuzzification, and if I use it, you will be able to find out what should be the crisp output, and that comes out to be equal to 4.056452.

Now, this is the calculated output corresponding to the inputs I_1 = 10 and I_2 = 28.0. This is actually the calculated output, but the target output is nothing but 3.5. So, there is some deviation, and this particular deviation, that is, 3.5 minus 4.056452, is a negative value, and that is why we use the mod value just to make it positive. That means, out of all the training scenarios which we have, if I pass the first training scenario, I am getting this particular deviation. Now, by following the same procedure, I am just going to pass the second training scenario, the third training scenario, up to the T-th training scenario.

(Refer Slide Time: 20:09)

Now, if I pass all the training scenarios, then I will be getting the different deviation values. Corresponding to the first training scenario, say the deviation is denoted by d_1; corresponding to the second training scenario, suppose the deviation is denoted by d_2; and, corresponding to the T-th training scenario, suppose the deviation is represented by d_T. Now, what we do is, we try to find out the average deviation, and that is nothing but the sum of all the d values divided by the number of training scenarios, that is, T. So, I can find out what should be this average deviation, that is, $\bar{d} = \frac{1}{T}\sum_{t=1}^{T} d_t$.

(Refer Slide Time: 20:58)

And, once you have got this average deviation, this average deviation is nothing but the fitness of the first GA-string. So, I should be able to find out the fitness of the first GA-string, that is, f_1. By following the same procedure, I can find out the fitness of the second GA-string, and similarly the fitness of the other GA-strings; for the N-th GA-string, the fitness is denoted by f_N.

Now, we take the help of the GA operators like reproduction, crossover and mutation, and the GA, through a large number of iterations, will try to find out what should be the optimal data base and what should be the optimal rule base for this fuzzy reasoning tool. Once you have got this optimal data base and rule base, you can use this optimal fuzzy logic controller for online application. That means, we can pass some test scenarios and find out what should be the output for a set of inputs.

(Refer Slide Time: 22:14)

Now, if we optimize or try to evolve the knowledge base, that is, the rule base and data base, particularly the rule base of a fuzzy logic controller, by following this particular method which I have already discussed, there is a possibility that there will be some redundant rules in the rule base. Now, what a redundant rule means, let me try to explain. Suppose that I have started with 16 rules. If I just do this GA-based tuning, that is, the automatic design of the fuzzy logic controller using the genetic algorithm, which I have already discussed, there is a possibility that I will be getting, say, 9 good rules out of these 16. Now, these 9 I will be getting if I just do the GA-based tuning only once.

Now, if I do the same GA-based tuning once again, there is a possibility that, from these 9, it may select only 6 or 7 rules. So, we will be getting a further tuned rule base. Now, if I follow this particular principle, that means there are some redundant rules, and those redundant rules we will have to identify.

Now, to identify the redundant rules, what we do is, we introduce one technique: we calculate what is called the importance factor. This importance factor is denoted by, say, I.F. To find it out, we try to find out the probability of occurrence of the different rules during the training; that means, how many times a particular rule has been fired during the training scenarios. So, we try to find out the probability of occurrence of all the possible rules during the training, and moreover, we try to find out what should be the worth of a particular rule.

Now, a rule decides, for a set of inputs, what the output should be; but out of these 16 rules, the importance of all the 16 rules may not be equally good. So, what we do is, we try to represent the importance (worth) of each particular rule on a scale of, say, 0 to 1. If I get the worth of a particular rule and its probability of occurrence, both of these will lie between 0 and 1, and if you multiply them, I will be getting another value which, once again, will lie between 0 and 1.

Now, that particular value is nothing but the importance factor, and if this importance factor is found to be less than some threshold value, that means that particular rule is not very good, and it can be declared as a redundant rule. So, this is the way we declare the redundant rules. But, if we just go on tuning this particular rule base of the fuzzy logic controller, there is a possibility that it will give rise to a very lean, over-optimized rule base.
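In code form, the redundancy test described here might look like the following sketch; the threshold value is a user-chosen assumption, since the lecture only speaks of "some threshold value".

```python
def importance_factor(prob_occurrence, worth):
    """Both factors lie in [0, 1], so their product (the importance factor) does too."""
    return prob_occurrence * worth

def redundant_rules(rule_stats, threshold=0.1):
    """Return the rules whose importance factor falls below the chosen threshold.
    rule_stats maps a rule id to (probability of occurrence, worth)."""
    return [rule for rule, (p, w) in rule_stats.items()
            if importance_factor(p, w) < threshold]
```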

And, there could be a few test scenarios for which, if I pass them, there is a possibility that not even a single rule is going to be fired, and that is the problem of no-firing. By no-firing, we mean a situation where we are passing one set of inputs, but it does not trigger any of the rules and none of the rules is fired; so, we cannot find out the output for that set of inputs. That particular situation is nothing but the no-firing situation. So, our aim is to reduce the redundant rules, but at the same time we should take care that there should not be any such no-firing or weak firing. This is the way we can optimize the fuzzy reasoning tool.

(Refer Slide Time: 27:09)

And, now, I am just going to discuss, in short, how to carry out optimization for fuzzy clustering. We have already discussed that our aim is to determine clusters which are very distinct, the clusters should be very compact, and at the same time the number of outliers should be as small as possible; in fact, ideally we want that there should not be any such outliers.

Now, if I just formulate this particular problem as a maximization problem, I can formulate it like this: maximize f, where $f = W_1 \times d + W_2 \times C + W_3 \times \frac{1}{O'}$, and O′ indicates the number of outliers. So, our aim is to maximize this particular objective function, and once again, we can take the help of a genetic algorithm. For each candidate solution, we evaluate the value of d, then C, and then 1 divided by O′, that is, the outlier term, and the sum of all the W values should be equal to 1.0.

So, all such values we can actually find out: we can find out what should be the distinctness, what should be the compactness and what should be the number of outliers. These are the things we will have to find out; but what should be the design variables? The performance of a fuzzy clustering tool, that is, fuzzy C-means clustering, depends on the number of clusters to be made, the initial matrix of membership values and the level of cluster fuzziness. So, all such things are the design parameters for FCM, that is, the fuzzy C-means clustering. We can use GA-strings to represent these design variables, and this would be the objective function, and our aim is to maximize this particular objective function.

And, the GA, through a large number of iterations, will try to find out what should be the number of clusters to be made, what should be the initial matrix of membership values and what should be the level of cluster fuzziness, so that this particular condition gets fulfilled, and it will try to find the optimal clustering using the fuzzy C-means clustering.
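The clustering objective just described can be sketched as below; the guard for the outlier-free case is my own assumption, since 1/O′ is undefined when there are no outliers.

```python
def clustering_objective(d, c, n_outliers, w1, w2, w3):
    """f = W1*d + W2*C + W3*(1/O'), with W1 + W2 + W3 = 1.0, where d is the
    distinctness, C the compactness and O' the number of outliers."""
    assert abs(w1 + w2 + w3 - 1.0) < 1e-9
    outlier_term = 1.0 / n_outliers if n_outliers > 0 else 1.0   # assumed cap for O' = 0
    return w1 * d + w2 * c + w3 * outlier_term
```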

(Refer Slide Time: 30:24)

Now, next, we try to see another method of clustering which has been used, that is, entropy-based fuzzy clustering. If I want to carry out a similar type of optimization, where our aim is to maximize the distinctness, to maximize the compactness and to minimize the number of outliers, we can use the same principle for this entropy-based fuzzy clustering also. Keeping the same objective function, here the design variables will be α, β and γ, which have already been discussed. Now, α indicates the relationship between the Euclidean distance and similarity, β represents the threshold value of similarity, and γ indicates the outliers.

Now, if I consider a particular GA-string, something like this, it can represent the value of α, then β, and then γ, and we can generate a population of solutions. Corresponding to these α, β and γ, it will carry out the fuzzy clustering, and we will be getting the quality of the clusters in terms of the distinctness and the compactness, and we can also find out whether there are any such outliers or not. And, the GA, through a large number of iterations, will try to find out the values of α, β and γ corresponding to a particular data set, so that it can ensure the optimal clusters.

(Refer Slide Time: 32:22)

Now, this is the way we can carry out the optimization, if we want to optimize the performance of a fuzzy logic controller and the performance of fuzzy clustering tools. This is the reference based on which we carried out this discussion, that is, Soft Computing: Fundamentals and Applications by D.K. Pratihar. This is the textbook for this course, so you can refer to it.

(Refer Slide Time: 32:51)

And, here, I just want to summarize whatever we have discussed in this particular lecture. At the beginning, we gave a brief introduction to the nature-inspired optimization tools, particularly the genetic algorithm. We spent some time on the optimization of the fuzzy reasoning tool or fuzzy logic controller, that is, how to find out the optimal data base and optimal rule base for the fuzzy logic controller; that we discussed in detail, and we solved some numerical examples also.

Now, next, in fact, we concentrated on how to optimize the clustering algorithms like
fuzzy C-means algorithm or entropy-based algorithm, so that it can ensure the optimal
clusters in terms of compactness, in terms of distinctness, and there should not be any
such outliers.

Thank you.

Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 20
Introduction to Neural Networks

Now, we are going to start with another topic, that is, Introduction to Neural Networks.
So, we are going to enter another region, that is artificial neural networks.

(Refer Slide Time: 00:31)

Now, the purpose of these artificial neural networks is to model the human brain in an artificial way. These are the topics which we are going to discuss. At first, I will give a brief introduction to the working principle of a biological neuron, and I will try to design an artificial neuron based on the working principle of the biological neuron. After that, we are going to discuss how to design one artificial neural network, which consists of a large number of neurons.

Then, we will be discussing the principles of supervised and unsupervised learning, and at the end, we will try to define what is meant by the incremental and batch modes of training. Now, let us start with the biological neuron.

(Refer Slide Time: 01:33)

Now, before that, let me tell you that the concept of neural networks was proposed in the year 1943 by McCulloch and Pitts. Here, what we do is, we try to copy everything from the biological neurons or the biological nervous system. If you see the biological nervous system, it consists of a large number of neurons, and these neurons work in parallel.

Now, the average human brain contains approximately $10^{11}$ neurons. Of course, it varies from person to person, and these neurons work in parallel, and that is why our brain is a highly complex parallel computer.

(Refer Slide Time: 02:33)

Now, let us see the working principle of a particular biological neuron. Now, if you see a
biological neuron, it consists of, in fact, say a bush of thin fibers and those are known as
dendrites.

(Refer Slide Time: 02:43)

So, these are nothing but the dendrites; then, we have got one long cylindrical fiber, and that is called the axon. So, we have got this particular axon, we have got the cell body or the soma, and of course, we have got the synapse.

Now, let me explain the working of each of these components of the neuron. A particular neuron will try to collect information from the neighboring neurons (a neighboring neuron could be here). It will try to collect information from the neighboring neurons with the help of these thin fibers, which are nothing but the dendrites; all such information will be collected in the cell body or the soma, and this collected information will pass through the axon and then go to the junction between this particular neuron and the next neuron, and that particular junction is known as the synapse. So, a biological neuron consists of dendrites, an axon, a cell body or soma, and the synapse; and the junction between the axon and the cell body is known as the axon hillock.

Now, what happens? The information will be collected from the neighboring neurons with the help of the dendrites, it will be collected here, and then it will pass through the axon; then, here in the synapse, there will be some sort of transfer of information. In a biological neuron, the transfer of information takes place through the difference in ion concentrations, for example, the difference in sodium ion concentration and potassium ion concentration. Due to this difference in sodium ion and potassium ion concentrations, some part of the collected information will be passed to the next neuron through the synapse. So, this is the way one biological neuron works.

Now, in an artificial neuron, the same working principle has been copied in the artificial
way.

(Refer Slide Time: 05:21)

Now, let us see, how to copy this particular principle in the artificial way. Now, just like
our biological neuron, it collects information with the help of dendrites. So, these are all
collected information, for example, say I_1, I_2 up to say I_n. So, these are all inputs,
these inputs are nothing but the collected information and these inputs will be multiplied
by the corresponding connecting weights, that is denoted by your the W. So, W is going
to represent actually the connecting weights. So, what we do is, we multiply a particular
input with its corresponding connecting weights.

So, we try to find out I_k multiplied by W_kj and we just sum them up. So, all such
things are summed up. So, here this summed up value is going to enter and we add some
bias value, that is your b_j just a small value. So, that particular bias value is going to be
added here.

So, I will be getting this particular u_j, and this $u_j = \sum_{k=1}^{n} I_k W_{kj} + b_j$; so, this is nothing but u_j. Now, if you remember, this particular u_j, as if it is passing through the axon
of the biological neuron. So, this u_j, let me repeat is u_j is going to pass through the
long cylindrical fiber that is nothing but the axon and then it will go to the synapse and in
the synapse, there will be some transfer function. And, here actually this is going to
represent the synapse of the biological neuron, and that is nothing but here, we have got
one activation function or the transfer function.

So, this particular unit, the input, that is u_j is going to enter through the transfer function
and consequently, I will be getting this particular output and this output is nothing but
the amount of information which is going to enter the second neuron. So, this is the way
actually, we passed the information in biological nervous system from one neuron to the
next neuron and the same thing actually it has been copied here, in the artificial way, in
the artificial neuron.

Now, this indicates actually the summing junction and this is almost similar to your cell
body or soma of the biological neuron. Now, this particular thing, this artificial neural
generally, we try to represent with the help of actually one circle. So, this circle is going
to represent one artificial neuron and this circle has got two compartments, the first part,
we have got u_j, that is nothing but we gave got the summing junction and here, we are
going to add this particular the bias value; that means, this is nothing but u_j. Now, this
u_j is going to pass through the transfer function and the transfer function is here and it
passes through the transfer function, I will be getting this particular output, that is
denoted by O_j.

Now, actually to represent this artificial neuron generally we use this type of circle and
there are some inputs coming. So, these are the all inputs, each of the inputs will be
multiplied by its corresponding connecting weight, the bias value will be added, we will
be getting u_j; u_j will pass through the transfer function and consequently, I will be
getting this particular the output. This is the way actually, we get the output for a
particular neuron from this input.

Now, this is the way actually, one artificial neuron works and it has been copied from the
biological neuron.
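As a small illustration of this computation, here is a sketch of a single artificial neuron in Python; the input, weight and bias values are arbitrary, and the log-sigmoid used here is just one of the transfer functions discussed next.

```python
import math

def neuron_output(inputs, weights, bias, transfer):
    """u_j = sum_k I_k * W_kj + b_j, passed through the transfer (activation) function."""
    u = sum(i * w for i, w in zip(inputs, weights)) + bias
    return transfer(u)

log_sigmoid = lambda u: 1.0 / (1.0 + math.exp(-u))        # one possible transfer function
o_j = neuron_output([0.5, -0.2, 0.8], [0.1, 0.4, -0.3], 0.05, log_sigmoid)
```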

(Refer Slide Time: 09:45)

Now, if you see the literature, we use different types of transfer functions: for example, the hard limit transfer function, the linear transfer function, the sigmoid transfer function and the tan-sigmoid transfer function. Here, I am just going to explain each of these transfer functions.

(Refer Slide Time: 10:11)

So, this indicates actually, the hard limit transfer function. Now, let us try to understand,
what do you mean by this hard limit transfer function.

Now, if you remember the input for the transfer function is nothing but u and output is
nothing but O. Now, the output will become 0.0, if input u is found to be less than 0.0;
that means, if it is negative, if the input u is negative, my output will become equal to 0,
otherwise the output will be actually 1.0. So, for this type of transfer function, there are
two outputs, it is either 0 or it is 1.0.

So, this type of transfer function is known as the hard limit transfer function and this type
of transfer function, we use, in fact, in a special type of neuron, that is called the
perceptron neuron. So, we use this type of hard limit transfer function in perceptron
neuron. So, this is the way actually, this hard limit transfer function works.

(Refer Slide Time: 11:33)

The next is the linear transfer function; here, you can see once again that u indicates the input, O is nothing but the output, and this is the transfer function, y = mx, that is, a straight line. Here, let me consider m equal to 1.0; if m equals 1.0, this becomes y = x, which means this particular angle is 45 degrees. That means the output is nothing but the input, that is, O = u.

So, the output is the same as the input, and this type of transfer function is used in the linear filter. If I use y = mx, I can also find out what should be the optimal value for this particular m: m could be 1, or it could be less than 1. So, during optimization, we can find out what should be the most appropriate value for this particular m.

(Refer Slide Time: 12:53)

Now, then comes the log-sigmoid transfer function. This is the mathematical expression for the log-sigmoid transfer function: $O = \frac{1}{1 + e^{-au}}$, where a is the coefficient which decides what should be the slope of this particular curve. If I take a higher value of a, this particular curve will be very steep, and if I take a low value, I will be getting a flatter distribution for this particular log-sigmoid transfer function. Now, let us see what happens if I consider u equal to 0.

Now, if I put u equal to 0, this will become 1 divided by 1 plus e raised to the power 0, and e raised to the power 0 is nothing but 1; so, this is 1 divided by 2, that is, 0.5. So, corresponding to u equal to 0, I will be getting 0.5. The value of the output varies from 0 to 1, because this is the log-sigmoid, so it cannot be negative. This is a non-linear transfer function, and it is very frequently used in artificial neural networks.

(Refer Slide Time: 14:33)

Now, then comes the tan-sigmoid transfer function. For this particular tan-sigmoid transfer function, the mathematical expression is $O = \frac{e^{au} - e^{-au}}{e^{au} + e^{-au}}$. Once again, a is going to decide what should be the slope of this particular distribution: the higher the value of a, the steeper will be this particular curve, and vice-versa. Supposing that I put u equal to 0, what will the output O become? The numerator becomes e raised to the power 0 minus e raised to the power 0, that is, 1 minus 1, and the denominator becomes 1 plus 1; so, this will become equal to 0.

So, corresponding to this particular u equal to 0, I will be getting the output equal to 0, and the output actually varies between minus 1 and plus 1; this is the tan-sigmoid, so the output varies from -1 to +1. These are some of the very popular transfer functions which are generally used in neural networks.
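For reference, the four transfer functions just described can be written compactly as below (a = 1 unless stated otherwise).

```python
import math

def hard_limit(u):
    """Output is 0.0 for negative input and 1.0 otherwise (used in the perceptron neuron)."""
    return 0.0 if u < 0.0 else 1.0

def linear(u, m=1.0):
    """O = m*u; with m = 1 the output equals the input (used in the linear filter)."""
    return m * u

def log_sigmoid(u, a=1.0):
    """O = 1 / (1 + exp(-a*u)); the output lies between 0 and 1, with O(0) = 0.5."""
    return 1.0 / (1.0 + math.exp(-a * u))

def tan_sigmoid(u, a=1.0):
    """O = (exp(a*u) - exp(-a*u)) / (exp(a*u) + exp(-a*u)); the output lies in (-1, 1)."""
    return math.tanh(a * u)
```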

(Refer Slide Time: 16:07)

And, now, I am just going to consider one layer of neurons. A particular layer consists of a number of neurons, say 1, 2, and so on up to p neurons. If I concentrate on each of the neurons, you can see that it has got two compartments: one is the summing junction, so this is the summing junction, and this is nothing but the transfer function.

Now, what we do is: I have got a large number of inputs, say n inputs, I_1, I_2 up to I_n. These particular inputs will be multiplied by the corresponding connecting weights, that is, W, and those terms will be summed up here; so, I_j multiplied by W_jk, each particular input multiplied by its corresponding connecting weight, and those products are summed up here. Then I am just going to add the bias value, say this particular b, and it will pass through the transfer function, and I will be getting the output.

Now, this shows what happens for a particular neuron, say the k-th neuron; for example, this is the p-th neuron. By following this particular principle, I will be getting what should be the output of the p-th neuron. So, I have got the set of inputs, say n inputs, I know the connecting weights, I know the transfer function, and I know the bias values, so I will be able to find out what should be the output of a particular neuron, for example, the output of the p-th neuron. And, in one layer, you have got a large number of neurons, for example, say 10 neurons or 20 neurons, and so on.

So, we should be able to find out what should be the output of each of the neurons lying in this particular layer. And, as I mentioned, W indicates the connecting weights, and the values of the connecting weights will vary from, say, -1 to +1 in the normalized scale. Now, this shows one layer of neurons.

(Refer Slide Time: 18:47)

And, now I am just going to show you a few layers of neurons and that is nothing but an
artificial neural network, which consists of a number of layers, for example, say here for
this particular simple artificial neural network, which have been considered. So, it has
got three layers one is called the input layer, we have got the hidden layer and we have
got this particular output layer.

Now, for simplicity, we have considered only 2 neurons in the input layer, 3 neurons in the hidden layer and only 1 neuron in the output layer; that is why this is nothing but a 2-3-1 network. That means, there are 2 neurons in the input layer, 3 neurons in the hidden layer and 1 neuron in the output layer. So, this is known as a 2-3-1 network, and this is nothing but one artificial neural network.

Now, as I told, there are two inputs I_1 and I_2, V and W indicate the connecting
weights. So, V indicates the connecting weights between the input and hidden layers and
W matrix indicates the connecting weights between the hidden and the output layer. And,
at each of the neurons, we have got the transfer function, ok. So, I will be able to find out
what should be the output here, I should also be able to calculate, what should be the
input here, I will be able to calculate the output here, then the input here and I can also
find out what should be the final output of this particular network.

All such things we will be discussing in much more detail. Here, what I want to say is that one artificial neural network consists of a number of layers, each layer contains a number of neurons, and there will be some sort of connectivity; that is why, starting from the inputs, I will finally be getting some output here. So, this is the way one artificial neural network looks.
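A minimal forward pass through such a 2-3-1 network is sketched below; the random weights, zero biases and the choice of a log-sigmoid at both layers are assumptions made only for illustration.

```python
import numpy as np

def forward_2_3_1(inputs, V, W, b_hidden, b_out, transfer):
    """Forward pass of a 2-3-1 network: V (2x3) connects input to hidden layer,
    W (3x1) connects hidden to output layer."""
    o_hidden = transfer(inputs @ V + b_hidden)   # outputs of the 3 hidden neurons
    return transfer(o_hidden @ W + b_out)        # output of the single output neuron

sigmoid = lambda u: 1.0 / (1.0 + np.exp(-u))
V = np.random.uniform(-1.0, 1.0, (2, 3))         # connecting weights in [-1, 1]
W = np.random.uniform(-1.0, 1.0, (3, 1))
output = forward_2_3_1(np.array([0.6, 0.3]), V, W, np.zeros(3), np.zeros(1), sigmoid)
```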

(Refer Slide Time: 21:09)

Now, I am just going to explain the concept of static versus dynamic neural networks. A static network means there is no error compensation and no feedback, whereas in a dynamic network there will be error feedback and a chance of further improvement.

Now, this schematic view shows one dynamic network. The performance of a particular neural network depends on its architecture or topology, that is, the number of layers and how many neurons are present in each of the layers; that indicates the architecture or topology of this particular network. The performance of this particular network, of course, also depends on the values of the connecting weights and, moreover, on the coefficients of the transfer functions, the different transfer functions used in the different layers.

So, the performance depends on the architecture, the weights, the coefficients of the transfer functions and all such things. The moment we pass one set of inputs, I will be
getting some output here, now this particular output will be compared with the target
output just to find out what should be this particular error. Now, this error is fed back for the adjustment of these particular parameters, and this process will go on and on. Through a number of iterations, there is a possibility that we will be getting an artificial neural network which will be able to make the prediction for a set of inputs very accurately. Now, this is the way we can develop the dynamic neural networks.

Now, these things actually we will be discussing in much more details in future.

(Refer Slide Time: 23:23)

So, let me discuss a little bit how to design and develop this particular dynamic network. If you want to design and develop the dynamic network, what we will have to do is train the network. We will have to use some optimizer or some sort of learning tool, so that we can design and develop that particular neural network such that it can predict the output for a set of inputs very accurately.

Now, if you see the literature, we have got two types of learning: one is called supervised learning and another is called unsupervised learning. We will just try to find out the difference between supervised learning and unsupervised learning; the supervised learning is also known as learning with a teacher.

Now, if the students make mistakes, the teachers are there to correct them; there will be some sort of feedback and some sort of error compensation, and due to this error compensation or feedback, there will be some sort of supervised learning.

Now, what we do in supervised learning is, for a set of inputs, we calculate the output; this calculated output is compared with the target output to find out what should be the error. This particular error is fed back for the adjustment of the parameters of the network, so that this particular network can predict as accurately as possible. So, in supervised learning, we have got the provision for error compensation and for feedback, and that is why the network which is trained using supervised learning will become more efficient and more accurate, I should say. On the other hand, we have got unsupervised learning. For supervised learning, what I need is some well-defined training scenarios, some known input-output relationships. If we have got some pre-collected input-output relationships, we can carry out supervised learning with the help of that particular training data; but suppose that we do not have the training data. Then how do we find a network of this type, which will be able to predict as accurately as possible?

We are in trouble, and to solve that particular problem, what we do is, we use the principle of unsupervised learning, and that is nothing but learning without a teacher. Here, we use the principle of competition: through the competition, one winner will be selected or declared, there will be cooperation between the winner and its surroundings, and there will be further updating. Through this principle of competition, cooperation and updating, we can implement the unsupervised learning or unsupervised training of the network.

Now, both supervised as well as unsupervised learning will be discussed in much more detail later on in this particular course; for the time being, let me concentrate a little bit on supervised learning.

(Refer Slide Time: 27:37)

So, if you see the literature, we have got two types of training used in supervised learning: one is called incremental training and another is called the batch mode of training. This incremental training is also known as online training.

Now, let us try to understand the principle of incremental training or online training. To understand this, let me go back a few slides and concentrate on this particular slide a little bit.

(Refer Slide Time: 28:17)

Now, here, during the incremental training, what I am doing is, I am passing, say, one set of inputs; depending on the connecting weights and the coefficients or the nature of the transfer functions, I will be getting some output here. So, the output will be calculated. This particular calculated output we will be comparing with the target output, denoted by T, and we will try to find out the error; so, by comparing the target and this particular calculated output, we find this particular error.

Now, based on this particular error, we go for the feedback circuit; that means, this error will be propagated back, so that we can modify the values of the connecting weights, the coefficients of the transfer functions, the bias values and so on. So, I pass one training scenario, calculate the output, determine the error, and this error is fed back for the purpose of updating the network; that is nothing but incremental training, or online training.

Now, let me make it more clear. Suppose I have got 1000 training scenarios, that is, collected known input-output relationships. Out of these 1000 training scenarios, we take the first one (a training scenario means a known input-output relationship). So, we pass its inputs and I will be able to calculate the output. This calculated output will be compared with the target value, and I will be able to find out what the error is;

295
based on this particular error, we update the network. That is nothing but incremental training, or online training.

Next, we go for the second training scenario; once again we repeat the process and modify the network. Then I go for the third training scenario and repeat the process, and this is the principle of incremental training, or online training. Now, if you see the computational complexity of this particular training, it is computationally faster. But there is a possibility that you may not get a very adaptive network; by an adaptive network, I mean a network which will be able to provide adaptive solutions even for some unknown test scenarios.

So, if I follow the incremental training or the online training, we may not get a very
adaptive network, but its computational complexity is much less. Now, I am just going to
explain the principle of another training and that is called the batch mode of training.

(Refer Slide Time: 31:31)

Now, let me take the same example that I have got a set of 1000 training scenarios like
the known input-output training scenarios. Now out of 1000, the first one I pass through
this particular network and I will be getting some output. So, you store that particular
output and this output you compare with the target output of the first training scenario,
find out the error and this particular error actually, you save it.

296
And then you go to the second training scenario. So, once again, I am going to pass the set of inputs corresponding to the second training scenario and I will be getting some output; this output will be compared with the target output and I will be getting some error. Suppose that corresponding to the first training scenario I have got some error, denoted by e_1; corresponding to the second training scenario the error which I am getting is, say, e_2; and I go on passing all the training scenarios. So, suppose I am passing all the 1000 training scenarios.

So, I will be getting these particular errors; you find out the average error, that is, $\bar{e}$: you sum them up and divide by one thousand, and that becomes your average error. Based on this particular average error, you update the network only once. That is the principle of the batch mode of training, or offline training. Once again, let me repeat the batch mode of training.

So, this updating of the network is done after passing all the training scenarios, based on the average effect. And, as you consider the average effect while updating the network, there is a possibility that this particular network will become very adaptive.
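For contrast, here is a sketch of the same assumed toy problem in batch (offline) mode: the average gradient over all scenarios is computed and the weight is updated only once per epoch.

```python
import numpy as np

# The same single-neuron toy problem trained in batch (offline) mode:
# gradients are averaged over ALL scenarios and the weight is updated
# only once per epoch, based on the average effect.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 1000)
t = 0.7 * x
w = rng.uniform(0.0, 1.0)
eta = 0.5

for epoch in range(200):                     # one update per epoch
    y = w * x                                # outputs for all 1000 scenarios
    grad = np.mean(-(t - y) * x)             # average gradient of (1/2) e^2
    w -= eta * grad                          # single update from the average effect

print("learned weight:", w)                  # approaches 0.7
```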

So, this adaptiveness, or adaptability, we can achieve if I go for the batch mode of training, which is nothing but offline training. Now, here, we should take one precaution regarding the number of training scenarios to be used during the batch mode of training; this we will have to follow very meticulously. Suppose I have got, say, capital X number of design parameters in a network. These design parameters include the number of connecting weights, the number of bias values, the coefficients of the transfer functions and so on; so suppose I have got capital X number of design variables corresponding to the network.

So, I will have to select the number of training scenario in such a way that the number of
training scenario becomes at least equal to X or greater than X. So, the number of
training scenarios should be either equal to a capital X or it should be slightly more than
capital X, but it cannot be less than capital X.
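As a rough illustration with an assumed network size (not taken from the lecture): a 3-4-2 network with one bias per hidden and output neuron and one coefficient for each of the two non-linear transfer functions has X = 3 × 4 + 4 × 2 + 4 + 2 + 2 = 28 design parameters, so at least 28 training scenarios should be used in the batch mode.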

Now, if I consider a number of training scenarios less than capital X, there is a chance that there will be some sort of under-training and this particular network may not be able to predict very accurately; that is why, for the batch mode of training,

297
so, we will have to use a large number of training scenarios. Now, the way I mentioned
that after passing all the training scenarios, we do the updating only once and that is
called actually one epoch of this particular training.

So, if I consider the batch mode of training, after passing all the training scenarios, we do
the updating once, that actually concludes one epoch of this particular batch mode of
training. So, this is the way, actually we implement the batch mode of training or the off-
line training.

(Refer Slide Time: 36:05)

Now, regarding the reference, the textbook for this particular course is Soft Computing:
Fundamentals and Applications. So, we will have to see it for more details.

298
(Refer Slide Time: 36:21)

Now, let me conclude whatever I discussed in this particular lecture. We started with the working principle of a biological neuron and tried to design an artificial neuron just by copying the functions, or the working principle, of that biological neuron. Then we introduced the structure of an artificial neural network and discussed the principles of supervised and unsupervised training; these things will be discussed in much more detail in future. Then, we concentrated on the concepts of the incremental and batch modes of training generally used in the supervised training of neural networks.

Thank you.

299
Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 21
Some Examples of Neural Networks

Some Examples of Artificial Neural Networks: Now, we have seen how to design an
artificial neuron by copying the principle of a biological neuron. We have also discussed,
how to form a layer of neurons and we have also seen, how to design a particular
structure of artificial neural network.

(Refer Slide Time: 00:53)

Now, we are going to discuss some of the very important and popular neural networks; we are going to concentrate on the working principles of a few networks. These networks are as follows: we are going to discuss the principle of multi-layer feed-forward neural networks; then, we are going to concentrate on radial basis function networks; after that, we are going to see how to include the feedback circuit along with the feed-forward circuit to develop the most popular one, that is, recurrent neural networks; we will see the principle of the self-organizing map, or Kohonen network; and, at the end, we will look at the working principle of a counter-propagation neural network and solve a few numerical examples.

300
(Refer Slide Time: 01:57)

Now, here actually, let me start with the first one that is your multi-layer feed-forward
network. And, in short, this is known as your MLFFNN, that is Multi-Layer Feed-
Forward Neural Network. Now, its name indicates that this particular network consists of
a number of layers. And, here, we are going to consider three layers, that is the input
layer, then comes hidden layer and we have got the output layer.

Now, here on the input layer, we are going to consider capital M number of neurons.
Similarly, on the hidden layer, we have got capital N number of neurons and on the
output layer, we have got capital P number of neurons. So, this is known as in fact your
M-N-P networks.

Now, here, if you see the input layer, we are using the linear transfer function, something like $y = x$, and we have considered the slope $m$ of this straight line to be equal to 1. So, on the input layer, we are considering the linear transfer function $y = x$. On the hidden layer, we are considering the log sigmoid transfer function.

And, we have already seen the mathematical expression for the log sigmoid transfer function, that is nothing but $y = \frac{1}{1 + e^{-ax}}$. Now, here, this particular $a$ is the coefficient which decides the slope of this particular curve.
Now, as already mentioned, the higher the value of $a$, the steeper will be the

301
curve, and vice-versa. So, this is the log sigmoid transfer function, and its output varies from 0 to 1.

Now, on the output layer, we consider the tan sigmoid transfer function. And, if you see the mathematical expression for the tan sigmoid transfer function, it is something like this: $y = \frac{e^{ax} - e^{-ax}}{e^{ax} + e^{-ax}}$. So, this is nothing but the tan sigmoid transfer function. And, once again, the coefficient $a$ indicates the slope of the curve.

Now, if you see the range for this particular tan sigmoid transfer function, it varies in the
range of – 1 to + 1. So, this is regarding the different types of the transfer function used
in the different layers of this particular network.
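A small Python sketch of these three transfer functions may help; the coefficient values used in the example call are assumptions chosen only to show the effect of the slope parameter.

```python
import numpy as np

# The three transfer functions used on the input, hidden and output layers.
def linear(x, m=1.0):
    return m * x                                # y = x when m = 1

def log_sigmoid(x, a=1.0):
    return 1.0 / (1.0 + np.exp(-a * x))         # output lies in (0, 1)

def tan_sigmoid(x, a=1.0):
    # tanh(a*x) equals (e^{ax} - e^{-ax}) / (e^{ax} + e^{-ax}); output in (-1, 1)
    return np.tanh(a * x)

x = np.linspace(-3.0, 3.0, 7)
print(log_sigmoid(x, a=2.0))   # a larger coefficient a gives a steeper curve
print(tan_sigmoid(x, a=2.0))
```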

Now, the connecting weights between the input layer and the hidden layer are denoted by the V matrix. Similarly, the connecting weights between the hidden layer and the output layer are denoted by the W matrix. And, if I see the individual values of these connecting weights, they may vary in the range of, say, 0 to 1 or in the range of, say, –1 to +1; that means, these are in the normalized scale.

Now, here, if you see the inputs, as I mentioned there are M inputs, and on the input layer we have got M number of neurons. Now, let me concentrate on a particular neuron lying on the input layer, say the ith neuron; a particular neuron lying on the hidden layer, say the jth neuron; and a particular neuron lying on the output layer, say the kth neuron. So, the connecting weight between i and j is denoted by $v_{ij}$, and the connecting weight between j and k is nothing but $w_{jk}$.

Now, I am just going to discuss a little bit how to send the inputs to this particular network. Remember one thing: the inputs to the neural network are sent in the normalized form, because the different inputs may have different ranges.

302
(Refer Slide Time: 07:23)

And, if you do not normalize, there is a possibility that this particular network may not work properly. So, what we will have to do is pass all these inputs in the normalized scale. How to make them normalized, I am just going to discuss in detail; but before that, let me tell you the notations which I am going to use here.

Now, the notation $I_{I1}$ indicates the input of the first neuron lying on the input layer; then, $I_{O1}$ is nothing but the output of the first neuron lying on the input layer. Similarly, $H_{Ij}$ is nothing but the input of the jth neuron lying on the hidden layer, and $H_{Oj}$ is nothing but the output of the jth neuron lying on the hidden layer. And, $O_{Ik}$ is nothing but the input of the kth neuron lying on the output layer, and $O_{Ok}$ is nothing but the output of the kth neuron lying on the output layer.

Now, I am going to discuss how to bring these inputs into the normalized scale. Normalized scale means we will have to put them either in the scale of 0 to 1 or in the scale of –1 to +1. So, let me discuss how to do it for either of these two scales.

Now, let me concentrate on this first one, that is I_I1. Now, to represent in the
normalized scale, that is, 0 to 1 the formula, which is generally used is something like

303
this: $\frac{I_{I1} - I_{I1}^{\min}}{I_{I1}^{\max} - I_{I1}^{\min}}$. Now, if the input $I_{I1}$ is kept equal to $I_{I1}^{\min}$, I will be getting 0; and, if I put $I_{I1} = I_{I1}^{\max}$, I will be getting 1. So, this is the way this particular input can be converted into the scale of 0 to 1.

Similarly, if I want to represent it in the scale of –1 to +1, what I will have to do is use the expression $\frac{2(I_{I1} - I_{I1}^{\min})}{I_{I1}^{\max} - I_{I1}^{\min}} - 1$. If I put $I_{I1} = I_{I1}^{\min}$, the first term becomes 0, which means I will be able to generate –1; and, if I put $I_{I1} = I_{I1}^{\max}$, I will be getting $2 - 1$, which is nothing but +1. So, I can vary this particular input in the scale of –1 to +1.
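These two formulas can be applied column-wise to a data set, as in the sketch below; the three scenarios and the two input ranges used here are assumed purely for illustration.

```python
import numpy as np

# Column-wise normalisation of a small assumed data set
# (three scenarios, two inputs with different ranges).
raw = np.array([[10.0, 0.2],
                [30.0, 0.5],
                [50.0, 0.8]])

x_min = raw.min(axis=0)
x_max = raw.max(axis=0)

scaled_0_1  = (raw - x_min) / (x_max - x_min)                # into [0, 1]
scaled_m1_1 = 2.0 * (raw - x_min) / (x_max - x_min) - 1.0    # into [-1, +1]

print(scaled_0_1)
print(scaled_m1_1)
```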

Now, the point which I am going to make is that these inputs are to be represented in the normalized scale. Once you have got the inputs in the normalized scale, and if I use the linear transfer function on the input layer, I will be getting the output equal to the input; that means, $I_{Oi}$ becomes equal to $I_{Ii}$. So, this is the way we can find out the outputs of the neurons lying on the input layer.

And, once you have got the outputs of the first layer, I can multiply them with the corresponding connecting weights and sum them up, so that I can find out the input for the hidden layer. Once you have got this input for the hidden layer, it will be passed through the log sigmoid transfer function, so I will be getting the outputs $H_{O1}$, ..., $H_{Oj}$, and so on.

And, once you have got these outputs, we multiply them with the corresponding connecting weights denoted by W and sum them up; that will be the input of the kth neuron lying on the output layer. Once again, it will be passed through the transfer function, that is, the tan sigmoid transfer function, and accordingly I will be getting the output of the kth neuron lying on the output layer.

So, this is the way we can carry out the forward calculation, and we can find out what should be the output for a set of inputs. Now, here, before I go for the forward

304
calculation, I am just going to mention two things. One is, we generally use some bias value; but here, for simplicity, I have assumed that the bias is equal to 0, just to make this particular analysis a little bit simpler.

Now, another thing I should mention that initially we generate all such connecting
weight values like the V values and W values at random using the random number
generator. And, then, through a large number of iterations, we try to find out what should
be the updated values for these V and W, so that this particular network can make the
prediction as accurately as possible.

Now, let me repeat once again: for a set of inputs, if I know all the transfer functions and the connecting weights, I will be able to find out this particular output, and this is nothing but the calculated output. Now, for this training scenario, that means for this set of input parameters, there is one known output, and that is nothing but the target output. That means, if I write here that $O_{Ok}$ is nothing but the calculated output of the kth neuron lying on the output layer, I can also write down

So, T_Ok is nothing but the target output of the k-th neuron. Now, if I know the target
output, very easily I can find out the error. How to find out? I am just going to discuss in
details. And, once I have got this particular error by comparing the calculated output
with target output, I will propagate in the backward direction. I can modify or I can
update all the connecting weights, and I can update the coefficients of the transfer
functions. And, through a large number of iterations, I am just going to do this particular
updating. And, ultimately, so this particular network is going to make the prediction as
accurately as possible.

Now, another thing I should mention that the performance of this particular network
depends on a number of parameters. For example, it depends on the connecting weights,
it depends on your the coefficient of the transfer function, whether it is log sigmoid, tan
sigmoid or whether it is the linear transfer function, it depends on the slope of the linear
transfer function, it also depends on the topology or the architecture of this particular
network.

And, to represent the topology or the architecture, we see how many layers are there and
how many neurons are present in each of these particular layers. So, the performance of

305
this particular network not only depends on the architecture or the topology, it also
depends on the connecting weights, the coefficient of transfer function, and all such
things.

(Refer Slide Time: 16:17)

Now, let me see how to carry out the forward calculation for this particular network. Before I go for that, I have already mentioned that V is nothing but the matrix of connecting weights between the input and hidden layers, and it is an $M \times N$ matrix. This particular W matrix is the connecting weight matrix between the hidden and output layers, and it is an $N \times P$ matrix. Now, as I told, initially these are generated at random, and through a large number of iterations these particular connecting weights will be updated.
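A sketch of this random initialization for an assumed 3-4-2 (M-N-P) network:

```python
import numpy as np

# Random initialisation of the connecting-weight matrices in the normalised
# scale, for an assumed 3-4-2 (M-N-P) network.
M, N, P = 3, 4, 2
rng = np.random.default_rng(42)
V = rng.uniform(-1.0, 1.0, size=(M, N))   # input-to-hidden weights, M x N
W = rng.uniform(-1.0, 1.0, size=(N, P))   # hidden-to-output weights, N x P
print(V.shape, W.shape)
```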

306
(Refer Slide Time: 17:05)

Now, let us see how to carry out the forward calculations. In step 1, we try to determine the output of the input layer. As we have considered the linear transfer function of the form $y = x$, the output equals the input. So, we can very easily write down that the output of the ith neuron lying on the input layer is nothing but the input of the ith neuron lying on the input layer, that is, $I_{Oi} = I_{Ii}$; and here, small i varies from 1, 2 up to capital M.

Now, once we have got the output of the input layer, we are in a position to determine the inputs of the hidden layer. So, in step 2, the input of the hidden layer is nothing but $H_{Ij} = V_{1j} I_{O1} + \cdots + V_{ij} I_{Oi} + \cdots + V_{Mj} I_{OM}$. Now, here, j varies from 1, 2 up to N.

And, once you have got the input of the hidden layer, now it will be passed through the
transfer function just to find out, what should be the output of this hidden neuron.

307
(Refer Slide Time: 18:49)

Now, let us see how to find out the output. Step 3 is the determination of the output of the hidden neuron, that is, $H_{Oj} = \frac{1}{1 + e^{-a_1 H_{Ij}}}$. So, $H_{Ij}$ is nothing but the input of the jth neuron lying on the hidden layer, and this particular $a_1$ is the coefficient of the transfer function.

Next is step 4, that means we will have to find out the inputs of the output layer. Now, $O_{Ik}$ is nothing but the input of the kth neuron lying on the output layer, and that is nothing but $O_{Ik} = W_{1k} H_{O1} + \cdots + W_{jk} H_{Oj} + \cdots + W_{Nk} H_{ON}$, where k varies from 1, 2 up to capital P, and P is nothing but the total number of neurons lying on the output layer.

308
(Refer Slide Time: 20:19)

Now, this is the way you can find out the inputs of the output layer. Once you have got the input of the output layer, in step 5 we allow this particular input to pass through the transfer function, and you will be getting $O_{Ok} = \frac{e^{a_2 O_{Ik}} - e^{-a_2 O_{Ik}}}{e^{a_2 O_{Ik}} + e^{-a_2 O_{Ik}}}$, where $a_2$ is nothing but the coefficient of the transfer function. So, this $O_{Ok}$ is nothing but the calculated output of the kth neuron lying on the output layer.
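Putting steps 1 to 5 together, a sketch of the forward calculation for an assumed 3-4-2 network with zero bias might look as follows (the weights and inputs are random or assumed values used only for illustration):

```python
import numpy as np

# Forward calculation (steps 1-5) for an assumed 3-4-2 network, zero bias.
M, N, P = 3, 4, 2
a1, a2 = 1.0, 1.0                      # transfer-function coefficients
rng = np.random.default_rng(0)
V = rng.uniform(-1.0, 1.0, (M, N))     # input-to-hidden weights
W = rng.uniform(-1.0, 1.0, (N, P))     # hidden-to-output weights

I_I = np.array([0.2, 0.5, 0.9])        # normalised inputs
I_O = I_I                              # step 1: linear transfer, output = input
H_I = I_O @ V                          # step 2: inputs of the hidden layer
H_O = 1.0 / (1.0 + np.exp(-a1 * H_I))  # step 3: log-sigmoid outputs
O_I = H_O @ W                          # step 4: inputs of the output layer
O_O = np.tanh(a2 * O_I)                # step 5: tan-sigmoid calculated outputs
print(O_O)
```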

And, once you have got it, we can compare it with the target value, that is, the target output of the kth neuron, denoted by $T_{Ok}$. Now, if the target output is $T_{Ok}$ and the calculated output is $O_{Ok}$, their difference could be either positive or negative. That is why, to make it positive, either we consider the mod value of it, or we consider the square of $T_{Ok} - O_{Ok}$, just to make it positive. And, here you can see, I have added one term, that is, half; so this is multiplied by half. The reason is actually very simple.

Now, in future, I will have to differentiate this particular error of the kth output neuron with respect to the calculated output. If I put a square here and differentiate, I will be getting a 2; so this 2 will be multiplied by 1/2 just to make it 1. For this

309
particular purpose, we use this particular term, that is, 1/2. So, we are able to find out $E_k = \frac{1}{2}(T_{Ok} - O_{Ok})^2$.

(Refer Slide Time: 22:45)

Now, once you have got this particular error of the kth neuron lying on the output layer: in the output layer, in fact, we have got a large number of neurons, say P number of neurons. So, I can find out what the total error of all the neurons lying on the output layer should be, and that is nothing but $E = \sum_{k=1}^{P} \frac{1}{2}(T_{Ok} - O_{Ok})^2$. So, this is nothing but the total error of the output layer considering P neurons.

Now, if I have got this particular total error that corresponds to, in fact, only one training
scenario, that means, only one set of inputs and outputs. And, supposing that we have got
say capital L number of training scenarios, so what we will have to do is, we will have to
pass all the training scenarios one after another and we can find out the total error after
passing all capital L training scenarios, and that is denoted by $E_{total}$. So, $E_{total} = \sum_{l=1}^{L} \sum_{k=1}^{P} \frac{1}{2}(T_{Okl} - O_{Okl})^2$, where capital L indicates the number of training scenarios.

Now, here, this particular term O_Okl represents the output of the k-th neuron lying on
the output neuron layer corresponding to l-th training scenario. Similarly, say this

310
particular T_Okl indicates the target output of the k-th neuron lying on the output layer.
And this particular output is the target output corresponding to the l-th training scenario.

So, this is the way actually, you can find out what should be the expression of the total
error for this output layer neurons after considering all the training scenarios. And, once
you got this particular picture, now, we are in a position like how to propagate it back, so
that we can modify actually your the connecting weights and the coefficient of transfer
function such that this particular network can make the prediction as accurately as
possible.

(Refer Slide Time: 25:13)

Now, here actually, I am just going to discuss now how to minimize this particular error
in prediction. Now, to minimize the error in prediction, what we will have to do is, we
will have to take the help of one optimizer or one optimization algorithm. Now, here, I
have already discussed that the performance of this particular network depends on a
number of parameters. So, if I consider this error, error in prediction of the network, so
that is a function of so many variables.

For example, say it depends on the connecting weights V, it depends on the connecting
weights W, it depends on the coefficient of transfer function say a_1, it depends on the
coefficient of transfer function a_2 for the output layer, it also depends on the slope of
the linear transfer function, say m and there are a few other parameters. So, it depends
on, in fact, a large number of parameters, ok.

311
Now, what we do, for simplicity and for the purpose of explaining how this particular error can be minimized, is assume that this error is a function of only two variables, V and W, and for the time being we just forget the other terms. And, if I express the error in prediction as a function of the two variables V and W, very easily we can prepare the plot; that plot is nothing but the error plot, and we will be getting this particular error surface.

Now, if I plot this particular error along the z direction and say, the connecting weight v
along x direction and the connecting weight w along y direction, I will be getting this
particular error surface in 3 D. Now, we know that we, human beings, can visualize only
up to three dimension and that is why, we have considered that this particular error is a
function of only two variables, so that we can plot this particular error surface in 3 D.

Now, if I consider more, so it will become four or more than four dimensions, which we
cannot visualize. So, let us assume that this error is a function of only two variables, so
on the 3 D plot, I can see, what should be this particular error surface. Now, as I told that
we start with the random values for these particular the V and W and supposing that
initially, the error of the network is here say.

And, what is our aim, our aim is to reach the minimum value of error, which is here. So,
what you do is, we start from here, then through a large number of iterations, actually we
try to move towards the minimum error solution and gradually, during the training, the
network is going to reach this particular state, so that if we send a set of inputs, it will be
able to predict the output as accurately as possible.

Now, as I told that we are going to take the help of some optimization tool for example,
say we are going to use a very popular optimization tool, the traditional tool for
optimization, that is known as the steepest descent method. So, the steepest descent
method actually, we are going to use. Now, this steepest descent method is one of the
most popular traditional tools for optimization. And, here, the search direction is
opposite to the gradient.

So, we try to find out the gradient direction of the objective function and try to move in a
direction opposite to the gradient. The same principle actually has been copied here in
the back-propagation algorithm. Now, here, actually what you do is, we try to find out
the change in V, that is the connecting weight between the input and the hidden layer,

312
that is nothing but minus eta multiplied by the partial derivative of the error E with respect to V: $\Delta V = -\eta \frac{\partial E}{\partial V}$. Now, this particular $\eta$ is known as the learning rate, and it varies in the range of, say, 0 to 1.

Now, if I compare this particular expression with the expression of the steepest descent method, the search direction is decided by $-\frac{\partial E}{\partial V}$; that means, we are moving in a direction opposite to the gradient. And, here, this $\eta$, that is, the learning rate, is going to represent the step length of this particular steepest descent algorithm. Now, let me mention that, for any optimization tool, there are two things which we will have to consider: what should be the search direction, and what should be the step length.

Now, the same thing we have just copied here: the change in V is nothing but eta (the step length) multiplied by minus the partial derivative of E with respect to V, which decides the search direction. Similarly, the change in W is nothing but $\Delta W = -\eta \frac{\partial E}{\partial W}$. Similarly, if I want to find out the updated values for the coefficients $a_1$ and $a_2$, what we will have to do is find out $\Delta a_1 = -\eta \frac{\partial E}{\partial a_1}$, and similarly I can also find out the small change in $a_2$. So, this is the way we can implement this delta rule just to find out the updated values for the connecting weights and the coefficients.
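A sketch of the delta rule on an assumed toy error surface E(v, w) (not the network's actual error) shows the "move opposite to the gradient with step length eta" idea:

```python
import numpy as np

# Steepest descent (delta rule) on an assumed two-variable error surface.
def E(v, w):
    return (v - 0.3) ** 2 + (w - 0.6) ** 2       # minimum at v = 0.3, w = 0.6

def grad_E(v, w):
    return 2.0 * (v - 0.3), 2.0 * (w - 0.6)      # partial derivatives

v, w = np.random.default_rng(1).uniform(0.0, 1.0, 2)   # random starting point
eta = 0.1                                               # learning rate (step length)
for _ in range(200):
    dEdv, dEdw = grad_E(v, w)
    v += -eta * dEdv            # delta rule: change = -eta * gradient
    w += -eta * dEdw
print(v, w, E(v, w))            # approaches the minimum-error point
```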

313
(Refer Slide Time: 32:33)

Now, I will be discussing all such things in detail; mathematically, we will see how to determine these updated values with the help of a particular mode of training, and these things will be discussed in detail.

Thank you.

314
Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 22
Some Examples of Neural Networks (Contd.)

(Refer Slide Time: 00:17)

So, this is the multilayer feed-forward network. Suppose that we have passed one set of training scenarios here; we have discussed how to get the calculated output, and we have seen how to determine the error of the kth output neuron. Now, based on this particular error, this error has to be propagated back; so, I will have to update this particular $w_{jk}$, and here we have got the connecting weight, say $V_{ij}$. Now, let us see how to update them.

Now, to update that actually, what you do is, we try to move in the backward direction
and the moment it reaches here we will stop. Now, here to update this V_ij, so starting
from this particular error. I will propagate it back and the moment it reaches here, I am
going to stop. And, another thing I am just going to tell you that this particular w_jk has
got some contribution on the output of this particular kth neuron, but it has got no
contribution towards the output of the other output neurons. So, w_jk has got
contributions towards the output of this kth output neuron, that means, if I want to update
this w_jk, I will have to consider only the kth output neuron and its error.

315
On the other hand, suppose I want to update this particular $V_{ij}$. Depending on $V_{ij}$, I will be getting some output here; this particular $H_{Oj}$ has got at least some contribution from $V_{ij}$. And, here, this $H_{Oj}$ is connected to all the output neurons, so whatever outputs we are getting at the different neurons of the output layer, this particular $V_{ij}$ has got at least some contribution to them. That means, if I want to update this particular $V_{ij}$, I will have to consider the average error of the output neurons. On the other hand, if I want to update only $w_{jk}$, I will have to consider the error of only the kth output neuron. So, I think I am clear.

Now, with this particular understanding, so let me start with the principle of your the
incremental training and let us see how to use the principle of incremental training.

(Refer Slide Time: 03:05)

I have already discussed its principle in detail; now I am just going to discuss how to update this particular $w_{jk}$. Now, $w_{jk}$ is nothing but the connecting weight between the jth hidden neuron and the kth output neuron. Now, $w_{jk,\mathrm{updated}} = w_{jk,\mathrm{previous}} + \Delta w_{jk}$, where $\Delta w_{jk} = -\eta \frac{\partial E_k}{\partial w_{jk}}$.

Now, let us see how to determine this particular partial derivative, that is, $\frac{\partial E_k}{\partial w_{jk}} = \frac{\partial E_k}{\partial O_{Ok}} \frac{\partial O_{Ok}}{\partial O_{Ik}} \frac{\partial O_{Ik}}{\partial w_{jk}}$. So, we are going to use actually the chain rule of

316
differentiation. So, the chain rule of differentiation we are going to use, just to find out what this particular $\frac{\partial E_k}{\partial w_{jk}}$ should be.

Now here, $E_k = \frac{1}{2}(T_{Ok} - O_{Ok})^2$. Now, if I find out the partial derivative with respect to this particular $O_{Ok}$, I will be getting $\frac{\partial E_k}{\partial O_{Ok}} = 2 \times \frac{1}{2} \times (-1) \times (T_{Ok} - O_{Ok})$; the 2 and the 1/2 cancel, so this partial derivative is nothing but $-(T_{Ok} - O_{Ok})$. So, this is the way the first partial derivative can be found out.

Now, we will have to find out the partial derivative of O_Ok with respect to your O_Ik
and if you remember actually on the output layer, we use actually your the tan sigmoid
transfer function.

(Refer Slide Time: 06:13)

Now, if you use tan sigmoid transfer function, so very easily, you can find out what
should be the derivative. Now, let me concentrate on how to find out the derivative of
this particular the tan sigmoid transfer function.

317
Now, if you just write down the expression $y = \frac{e^{a_2 x} - e^{-a_2 x}}{e^{a_2 x} + e^{-a_2 x}}$ and find out the derivative, that is $\frac{dy}{dx} = a_2 (1 + y)(1 - y)$. So, we can find out this particular derivative; all of us know how to differentiate with respect to x, and if you simplify, you will be getting this particular expression, where y is nothing but the expression above.

The same thing we have copied here, just to find out $\frac{\partial O_{Ok}}{\partial O_{Ik}} = a_2 (1 + O_{Ok})(1 - O_{Ok})$. So,
very easily, you can find out this particular partial derivative. Then comes the partial derivative of $O_{Ik}$ with respect to $w_{jk}$. Now, if you remember, what is this $O_{Ik}$? It is nothing but the input of the kth neuron lying on the output layer; and, if you write out that particular expression for $O_{Ik}$, there will be a few terms before and a few terms after, but somewhere in the middle we will be getting a term that is nothing but $H_{Oj}$ multiplied by $w_{jk}$.

Now, what is this? This is nothing but the output of the jth neuron lying on the hidden layer multiplied by the connecting weight $w_{jk}$. Now, if I find out the partial derivative of $O_{Ik}$ with respect to $w_{jk}$, I will definitely be getting this particular $H_{Oj}$. So, very easily we can find out all the derivatives; and, once you have got all the derivatives, you are in a position to determine what $\frac{\partial E_k}{\partial w_{jk}}$ is. So, we

substitute all the values, all the expressions, and then we will be getting this particular
the final expression.

And, once you have got this particular final expression for the partial derivative, we can find out $\Delta w_{jk} = -\eta$ multiplied by this partial derivative, so I will be getting $\Delta w_{jk} = \eta\, a_2 (T_{Ok} - O_{Ok})(1 + O_{Ok})(1 - O_{Ok}) H_{Oj}$. So, very easily, we can find out this particular change in w.
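A sketch of this output-layer update, with the hidden outputs, weights and target assumed only for illustration:

```python
import numpy as np

# Incremental-mode update of the hidden-to-output weights:
# delta_w_jk = eta * a2 * (T_Ok - O_Ok) * (1 + O_Ok) * (1 - O_Ok) * H_Oj
eta, a2 = 0.3, 1.0
H_O = np.array([0.40, 0.75, 0.60])           # hidden-layer outputs (N = 3)
W = np.array([[0.2], [-0.5], [0.1]])         # hidden-to-output weights (N x 1)
T_O = np.array([0.8])                        # target output (P = 1)

O_I = H_O @ W                                # input of the output neuron
O_O = np.tanh(a2 * O_I)                      # calculated output
delta_W = eta * a2 * (T_O - O_O) * (1 + O_O) * (1 - O_O) * H_O[:, None]
W += delta_W                                 # w_jk,updated = w_jk,previous + delta_w_jk
print(W)
```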

318
(Refer Slide Time: 09:41)

Now, once you have got the change in w, we are going to discuss how to find out the change in the V matrix. Let me concentrate on $V_{ij}$, that is nothing but the connecting weight between the ith input neuron and the jth hidden neuron. So, $V_{ij,\mathrm{updated}} = V_{ij,\mathrm{previous}} + \Delta V_{ij}$; and I have already discussed that, if I want to determine this particular $\Delta V_{ij}$, we will have to consider the average effect (av stands for average). So, $\Delta V_{ij} = -\eta \left\{\frac{\partial E}{\partial V_{ij}}\right\}_{av}$. Now, how to find out this particular expression? It is nothing but the summation over k equals 1 to capital P, that is, $\left\{\frac{\partial E}{\partial V_{ij}}\right\}_{av} = \frac{1}{P}\sum_{k=1}^{P} \frac{\partial E_k}{\partial V_{ij}}$.

319
(Refer Slide Time: 11:11)

Now, once you have got this particular expression, we will see how to determine $\frac{\partial E_k}{\partial V_{ij}} = \frac{\partial E_k}{\partial O_{Ok}} \frac{\partial O_{Ok}}{\partial O_{Ik}} \frac{\partial O_{Ik}}{\partial H_{Oj}} \frac{\partial H_{Oj}}{\partial H_{Ij}} \frac{\partial H_{Ij}}{\partial V_{ij}}$. Now, we have already discussed how to determine the partial derivative of $E_k$ with respect to $O_{Ok}$, which is nothing but this particular expression. We have also seen how to determine the next partial derivative, and that is nothing but this particular expression.

(Refer Slide Time: 12:27)

320
And, now I am just going to discuss how to find out the next one, that is, $\frac{\partial O_{Ik}}{\partial H_{Oj}} = w_{jk}$; that also we have seen. Now, I am just going to find out the partial derivative of $H_{Oj}$ with respect to $H_{Ij}$, that is, of the output of the jth neuron lying on the hidden layer with respect to the input of the jth neuron lying on the hidden layer.

Now, in the hidden layer, we have used the log sigmoid transfer function, which is nothing but $y = \frac{1}{1 + e^{-a_1 x}}$; and, if you find out its derivative, that is $\frac{dy}{dx} = a_1 y (1 - y)$. And, once you have got this particular expression, very easily you can find out $\frac{\partial H_{Oj}}{\partial H_{Ij}} = a_1 H_{Oj} (1 - H_{Oj})$. And, the last term, that is, $\frac{\partial H_{Ij}}{\partial V_{ij}} = I_{Oi} = I_{Ii}$, that is, the output of the ith neuron lying on the input layer, which equals the input of the ith neuron lying on the input layer; so I can find out this particular expression as well.

(Refer Slide Time: 14:25)

And, once we have got it, we can find out the partial derivative of $E_k$ with respect to $V_{ij}$, and this is nothing but this particular big expression. And, once you have got it, we can also find out this particular $\Delta V_{ij}$. Now, $\Delta V_{ij}$ is, in fact, nothing but this particular expression, where $\left\{\frac{\partial E}{\partial V_{ij}}\right\}_{av}$ is nothing but this, and we have already discussed how to determine this

321
particular thing; once I know this, I can find out the average, multiply it by $\eta$ and put a negative sign, and that is nothing but the change in $V_{ij}$.

So, using differential calculus, we are in a position to find out the changes in the connecting weights, or the updated values for the connecting weights, using the incremental mode of training.
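A sketch of the corresponding hidden-layer update, where the gradient is averaged over the P output neurons before the delta rule is applied (all numerical values are assumed):

```python
import numpy as np

# Incremental-mode update of the input-to-hidden weights V, using the
# average of dE_k/dV_ij over the P output neurons.
eta, a1, a2 = 0.3, 1.0, 1.0
I_O = np.array([0.2, 0.9])                       # input-layer outputs (M = 2)
V = np.array([[0.1, -0.4, 0.3],
              [0.5,  0.2, -0.1]])                # M x N
W = np.array([[ 0.2, -0.3],
              [-0.5,  0.4],
              [ 0.1,  0.6]])                     # N x P
T_O = np.array([0.7, -0.2])                      # targets for P = 2 outputs

H_O = 1.0 / (1.0 + np.exp(-a1 * (I_O @ V)))      # hidden outputs
O_O = np.tanh(a2 * (H_O @ W))                    # calculated outputs

dk = -(T_O - O_O) * a2 * (1 + O_O) * (1 - O_O)   # dE_k/dO_Ok * dO_Ok/dO_Ik, shape (P,)
dj = (W @ dk) / W.shape[1] * a1 * H_O * (1 - H_O)  # averaged back-propagated signal, shape (N,)
delta_V = -eta * np.outer(I_O, dj)               # delta rule on the average effect
V += delta_V
print(V)
```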

(Refer Slide Time: 15:25)

Now, I am just going to discuss the batch mode of training. Now, this batch mode of
training and its principle I have already discussed that supposing that I have got a large
number of training scenario say capital L number of training scenarios. Now, if I got
capital L number of training scenarios, what you do is, we pass all the training scenarios
one after another and we try to find out, what is the error corresponding to the each of the
training scenarios, we try to find out how much is the total error, we also try to calculate
what is the average error and based on this particular average error, we update the
network only once.

Now, let us see how to implement this. Let us consider that there are capital L number of training scenarios. So, the mean squared deviation in prediction for the kth output neuron can be written as $E' = \frac{1}{2} \times \frac{1}{L} \sum_{l=1}^{L} (T_{Okl} - O_{Okl})^2$; and here, once again, we will have to update $w_{jk}$ and $v_{ij}$, that is, the connecting

322
weights between the hidden neurons and output neuron, and that between the input and
the hidden neurons.

Now, let us see how to update this particular $w_{jk}$. This $\Delta w_{jk}$ is nothing but $-\eta$ multiplied by the partial derivative of $E'$ with respect to $w_{jk}$. Now, how to determine this particular partial derivative? To determine it, I am once again using the chain rule of differentiation: $\frac{\partial E'}{\partial w_{jk}} = \frac{\partial E'}{\partial E_l} \frac{\partial E_l}{\partial E_k} \frac{\partial E_k}{\partial O_{Ok}} \frac{\partial O_{Ok}}{\partial O_{Ik}} \frac{\partial O_{Ik}}{\partial w_{jk}}$, where $E_l$ is the error corresponding to the lth training scenario and $E_k$ is the error of the kth output neuron.

Now, here, I just want to mention one thing. Regarding the last three terms, we can find out numerical values for them. But the first two terms, that is, the rate of change of $E'$ with respect to the error of the lth training scenario and the rate of change of the error of the lth scenario with respect to $E_k$ (the error of the kth output neuron), are used just to tell us that we will have to sum up all such contributions; you may not get direct numerical values corresponding to the first and second terms. But, from the third up to the fifth term, you will be getting numerical values. Now, this is the way we will have to find out the partial derivative of $E'$ with respect to $w_{jk}$.

323
(Refer Slide Time: 19:33)

Now, then comes how to update $v_{ij}$. This $\Delta v_{ij}$ is nothing but $-\eta \left\{\frac{\partial E'}{\partial v_{ij}}\right\}_{av}$, and this $\left\{\frac{\partial E'}{\partial v_{ij}}\right\}_{av}$ is nothing but the summation over k equals 1 up to P of $\frac{\partial E_k'}{\partial v_{ij}}$, multiplied by one divided by capital P. So, this is the way we can find out the average of this particular partial derivative. And, how to find out this $\frac{\partial E_k'}{\partial v_{ij}}$? So, $\frac{\partial E_k'}{\partial v_{ij}} = \frac{\partial E_k'}{\partial E_{kl}} \frac{\partial E_{kl}}{\partial O_{Ok}} \frac{\partial O_{Ok}}{\partial O_{Ik}} \frac{\partial O_{Ik}}{\partial H_{Oj}} \frac{\partial H_{Oj}}{\partial H_{Ij}} \frac{\partial H_{Ij}}{\partial v_{ij}}$. Now, we have already discussed how to

find out these derivatives, and I can also find out the numerical values corresponding to each of the partial derivatives. But, once again, for the first component, that is, the partial derivative of $E_k'$ with respect to $E_{kl}$, you may not be able to determine a numerical value; it indicates the rate of change of $E_k'$ with respect to $E_{kl}$. Similarly, the next one is nothing but the rate of change of $E_{kl}$ with respect to $O_{Ok}$. These two terms are going to tell us that we should find out this particular expression for each of the training scenarios and sum them up, just to find out the total effect, so that we can find out the average effect for updating the connecting weights.

Now, once you have got this particular derivative using this formula, I can find out this
average and once I have got this particular average. So, I will be able to find out what

324
should be the change in v_ij. So, this is the way actually, we can update the connecting
weights using actually the batch mode of training.
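A sketch of the batch schedule for the hidden-to-output weights: the gradient contributions from all L scenarios are averaged and W is updated only once per epoch (the small network and the data are assumed for illustration):

```python
import numpy as np

# Batch (offline) mode: average the gradient over all L scenarios, then
# update the hidden-to-output weights W once per epoch.
eta, a1, a2 = 0.3, 1.0, 1.0
rng = np.random.default_rng(0)
M, N, P, L = 2, 3, 1, 50
V = rng.uniform(-1, 1, (M, N))
W = rng.uniform(-1, 1, (N, P))
X = rng.uniform(-1, 1, (L, M))                 # L normalised input scenarios
T = np.tanh(X @ rng.uniform(-1, 1, (M, P)))    # assumed target outputs

for epoch in range(100):
    H_O = 1.0 / (1.0 + np.exp(-a1 * (X @ V)))  # hidden outputs, all scenarios
    O_O = np.tanh(a2 * (H_O @ W))              # calculated outputs
    dk = -(T - O_O) * a2 * (1 + O_O) * (1 - O_O)   # L x P
    grad_W = (H_O.T @ dk) / L                  # averaged over the L scenarios
    W += -eta * grad_W                         # single update per epoch

O_O = np.tanh(a2 * (1.0 / (1.0 + np.exp(-a1 * (X @ V))) @ W))
print("mean squared error:", np.mean(0.5 * (T - O_O) ** 2))
```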

(Refer Slide Time: 22:25)

Now, suppose that we know the updated values for the connecting weights. If you see this particular rule, let me just write it here: for example, the change in w is nothing but (let me repeat) $\Delta w = -\eta \frac{\partial E}{\partial w}$. If I follow this particular rule, that is called the delta rule. Now, there is a possibility that this particular partial derivative could be either positive or negative; but this $\eta$ is always positive and lies in the range of 0 to 1. Suppose the partial derivative has become negative; together with the other negative sign here, that will make $\Delta w$ a positive value in some of the iterations.

Now, that means, while updating this particular w, we just go on adding some numerical values; so, after running this particular algorithm for a large number of iterations, there is a possibility that the value of w may come out of its range. For example, the range is 0.0 to 1.0, or another range could be –1.0 to +1.0. If I just go on adding these updated values, what may happen is that this weight goes out of the range and the network loses its stability, or balance. Now, a neural network does not know anything of the physical problem; if it is wrongly trained, it is still going to give some results, but we will have to be careful that the stability of this particular network is maintained, that means, w should not exceed its own range.

Now, suppose that w is found to be greater than 1; in that case, we will have to use some correction, and that correction is something like this: if w is found to be greater than 1, I am just going to put 1 divided by w. If I consider 1 divided by w, this will become less than 1 and, once again, I can train this particular network for more iterations. The moment I put 1 divided by w in place of w, there could be a sudden change in the performance of the network, but after a few iterations it is once again going to reach the balanced region, and this particular network can be believed only if it is working in that balanced region.

Now, to overcome this particular problem, so that w does not become greater than 1, what we do is use the generalized delta rule, and that is nothing but $\Delta w(t) = -\eta \frac{\partial E(t)}{\partial w} + \alpha' \Delta w(t-1)$, where t indicates the tth iteration. So, this particular extra term we are going to add; $\alpha'$ is known as the momentum constant, and the range for the momentum constant is, once again, from 0 to 1. What we do is see what the change of w was in the previous, that is, the (t – 1)th, iteration; that means, we try to see the history of this particular updated weight, that is, what happened to this particular $\Delta w$ at the (t – 1)th iteration, and take that into account.

Now, if I consider this, there is a chance that I am going to provide some sort of damping effect, or cushioning effect, to this particular network, so that w does not exceed its range. But, if I follow this, once again there is no guarantee that the connecting weight will lie within its range; if the network runs for a large number of iterations, it may once again go out of this particular range, and if it goes out of the range, the above is the remedy.

Now, as I told, just to put some sort of damping effect on this particular network, we consider its history, that means what happened in the previous iteration; we want to give some weightage to that change in the connecting weight, just to find out what

326
the updated weight should be. So, this is the way we can update this particular network.
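A sketch of the generalised delta rule with the momentum term and the 1/w correction; the gradient values are drawn at random purely to show the update mechanics:

```python
import numpy as np

# Generalised delta rule: the current change carries a fraction alpha' of the
# previous change (momentum), which damps the updates.  The 1/w correction
# for an out-of-range weight is also shown (here applied whenever |w| > 1,
# an assumption covering both the [0, 1] and [-1, 1] ranges).
rng = np.random.default_rng(3)
eta, alpha = 0.2, 0.8          # learning rate and momentum constant, both in (0, 1)
w, dw_prev = 0.5, 0.0

for t in range(20):
    grad = rng.uniform(-1.0, 1.0)            # stand-in for dE/dw at iteration t
    dw = -eta * grad + alpha * dw_prev       # generalised delta rule
    w += dw
    if abs(w) > 1.0:                         # remedy if w leaves its range
        w = 1.0 / w
    dw_prev = dw
print(w)
```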

(Refer Slide Time: 27:57)

Now, a few notes here I have put, I just want to mention. Now, this multilayer feed
forward network trained with the help of this back-propagation algorithm, that is the
delta rule or the generalized delta rule, there is a possibility that there will be a local
minima problem. The reason is very simple because it works based on the steepest
descent algorithm and that means, it is going to use the information of the gradient.

Now, supposing that say I have a very complicated error function with so many such ups
and downs and undulations. Now, what you do is, while minimizing the error, this
particular algorithm will try to find out the search direction, which is opposite to the
gradient. Now, gradient is a local property. So, there is a chance that it is going to get
stuck at the local minima, and it will not be able to reach that particular your globally
optimal solution and this back propagation algorithm is actually having a chance of local
minima. And, the transfer function has to be differentiable because we are using the
gradient information, and at the point of discontinuity, in fact, we will not be able to find
out the gradient of this particular objective function or the error function.

Now, this back propagation neural network or multilayer feed forward network trained
using the principle of back propagation algorithm, may not be able to capture the
dynamics of a highly complex or highly dynamic process and that is why, actually we

327
will have to go for some feedback circuit, which I will be discussing in details, that
means, will have to go for the recurrent network. Those things will be discussed in much
more details, while discussing the recurrent networks.

Inputs are to be normalized, as I have already mentioned; the range for the connecting weights has also already been mentioned; and we will have to define some convergence criterion for this particular algorithm during the training. Now, if the rate of change of the error in prediction becomes less than or equal to some pre-specified small value, we say that the network has reached the optimal situation, and we consider that particular network to be an optimal network.

Now, a neural network could be either a fully-connected network or a partially-connected network. Very quickly, let me take a very simple example. Suppose I have got one network having this type of structure: 2 input neurons, 3 hidden neurons and 1 output neuron; and if the connectivity is such that all the neurons are connected, this type of network is known as a fully-connected network. On the other hand, suppose I have got a network of this type: say these are the input neurons, these are the hidden neurons and this is the output neuron.

(Refer Slide Time: 31:29)

And, if the connectivity is something like this, then this is nothing but a partially-connected network, because this connectivity is not ensured and this connectivity is not ensured; the dotted lines, that is, the absent connections, are missing. So, this type of network is known as a partially-connected network.
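A sketch of how such a partially-connected layer can be represented with a binary connectivity mask (the particular mask is an assumption for illustration):

```python
import numpy as np

# A partially-connected 2-3-1 layer represented with a binary mask:
# zero entries play the role of the absent (dotted) connections.
rng = np.random.default_rng(7)
V = rng.uniform(-1.0, 1.0, (2, 3))       # full 2 x 3 weight matrix
mask = np.array([[1, 1, 0],              # 1st input not linked to 3rd hidden neuron
                 [0, 1, 1]])             # 2nd input not linked to 1st hidden neuron
V_partial = V * mask                     # missing links carry zero weight
print(V_partial)
```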

Thank you.

329
Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 23
Some Examples of Neural Networks (Contd.)

(Refer Slide Time: 00:15)

We are discussing the working principle of a multilayer feed-forward neural network. Now, suppose that I have got a process whose input-output modeling has to be done; that means, for a known set of inputs, we will have to determine what the set of outputs should be.

Now, the first thing we will have to do is identify the input parameters and the output variables. These input parameters are to be independent and they should be measurable. Now, if you once again see the schematic view of this multilayer feed-forward network, we can see that this is the input layer, through which we pass the set of inputs, and these inputs are to be in the normalized scale, either in the scale of 0 to 1 or from –1 to +1. How to represent them in the scale of 0 to 1 or –1 to +1, I have already discussed. And here, this is nothing but the output layer, and on the output layer, say, we have got P number of neurons.

Now, these outputs could be either independent of each other or dependent on each other. Now, here, you see, the connecting weights between the input and the hidden layer are

330
denoted by V, and those between the hidden and the output layer are denoted by W. And, we have already mentioned that the range for the connecting weights could be either from 0 to 1 or from –1 to +1.

Now, here, for this particular process, the number of inputs, that is, M is kept fixed and
similarly, the number of outputs of this particular process is also known. So, the number
of neurons in the input layer, that is nothing but the number of input parameters and the
number of neurons on the output layer is nothing but your, the number of outputs. So, the
number of neurons in the input layer and the number of neurons in the output layer are
known, these are kept fixed.

Now, the architecture of this particular network depends on the number of hidden layers
and the number of neurons you put in each of the hidden layers. Now, here, you can see,
for simplicity, we have considered only one hidden layer and this hidden layer is having
say capital N number of neurons. Now, how to decide the number of hidden layers or
how to decide this particular architecture? So, that I am going to discuss a little bit.

Now, before we start that, let me mention that there is no guarantee that we can ensure
more accuracy to this particular network by increasing the number of hidden layers. That
means, if I use one hidden layer for modeling input-output process and by using the same
training set, if I develop another network having say two hidden layers, the network
having two hidden layers actually cannot give guarantee that it is going to give more
accurate predictions.

Now, how to decide? Now, to decide this actually, what you do is, the first thing we will
have to decide that what should be the minimum number of neurons in this particular, the
hidden layer. And, let me assume that there is only one hidden layer and the minimum
number of neurons to be put in the hidden layer is equal to 2. The reason is, if you do not
put the minimum number as 2, we may not be able to capture the nonlinearities of this
particular process. So, to capture the nonlinearity of this particular process, the minimum
number of neurons to be put in the hidden layer is kept equal to 2.

Now, what should be the optimal number? What should be the optimal number of this
particular neurons in this hidden layer? Now, to decide that actually, we will have to
carry out one parametric study. Now, before I go for discussing the method of this
parametric study, let me once again mention that the performance of this particular

331
network depends on a number of parameters of course, it depends on the connecting
weights, that is, V and W.

Now, besides these particular V and W matrices, the performance of this particular network depends on the number of neurons which we are going to put in the hidden layer, denoted by capital N, and it depends on the bias value. Now, for simplicity, we have not shown the bias value, but the performance depends on it; if you want to put the bias value, you will have to put it here.

It depends on the coefficient of transfer function, that is, a_1 that means, your this log
sigmoid transfer function. It also depends on the coefficient of transfer function of the
output layer, that is nothing but a_2, and so on. So, these are the parameters for which we
will have to determine the set of optimal values.

Now, to carry out this parametric study, what we do first is fix the range of each of these particular parameters. For example, to fix the range, the minimum number of hidden neurons is kept equal to 2; suppose that I put the range for N as, say, 2 to 20. I can also put the range for this particular b, say from 0.001 to 0.01, something like this; and for a_1 and a_2 I can also put a range, say from 0.5 to 5.0, and so on.

Now, once you have decided these particular ranges for the parameters, now we are in a
position to start the parametric study and the purpose of parametric study is to decide,
what should be the topology of this particular network. And, what should be the
parameters, the other parameters and let me repeat that this connecting weight values,
those are generated in the normalized scale using the random number generator.

Now, let us see, how to carry out this particular the parametric study. Now, the first thing
we will have to do is, we will have to vary the number of the hidden neurons starting
from 2 to 20 say and regarding the other parameters like your b, a_1 and a_2. So, we try
to find out their mid values and these parameters are set to their corresponding mid-
values and I am just going to vary the number of neurons to be present in the hidden
layer.

332
Now, if I carry out this particular study, I will be getting results of the following type.

(Refer Slide Time: 08:39)

So, with the results of this study, what you can do is, you can plot say the error of this
particular network, that is error in prediction, and here, we have got the number of
neurons to be put in the hidden layer and we keep some fixed value for say b, we keep
some fixed values for a_1 and a_2.

Now, what we do is start with two hidden neurons, so this axis is going to indicate the number of hidden neurons and I will be getting some error, say here; then I use three hidden neurons, and the error could be here; then I use four hidden neurons, and the error could be here. So, what we do is try to find out the nature of the performance of this particular network.

And, then, we try to select that number of hidden neurons, which gives actually the
optimal performance in terms of error in prediction. Now, supposing that, say this is the
number of neurons corresponding to which I am getting the optimal performance. Now,
what you do is, the value of N is kept fixed to this particular value and next, we go for
the variation of this particular b.

333
(Refer Slide Time: 10:06)

Now, what we do is, so exactly in the same way, we carry out the study. So, now, here
there will be error and here there will be the bias value and N is kept fixed to the optimal
value, which you have already determined in the previous step. And, a_1 is kept fixed
with mid value, a_2 is kept fixed with mid-value and once again, for different values of
b, so we try to find out the error and we try to locate that value of b corresponding to
which, we are getting the minimum error. So, we select the near-optimal value for this
particular the b.

Next, we carry out the study by varying a_1 and try to find out its near-optimal value. Then, we carry out the study with the variation of a_2 and try to find out the near-optimal value for a_2.

So, by carrying out this particular study, so we can find out what should be the near
optimal value for this particular the number of hidden neurons, what should be near
optimal value for this particular the bias value, then comes what should be the near
optimal value for this a_1 and the near optimal value for this particular a_2. And once
you have got this particular thing, now we are in a position to decide at least
approximately, what should be the topology of this particular network and what should
be the other parameters.

334
Now, here I just want to make a comment, this process of carrying out the parametric
study is an approximate one, because here, the performance of this network depends on
all the parameters simultaneously. But, what we did is, we consider one parameter at a
time that means, this is an approximate way of determining the near optimal parameters
of this particular the network. So, this is the way actually, we can approximately
determine like, what should be the topology and what should be the other parameters for
this particular the network.
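
To make the one-factor-at-a-time procedure concrete, here is a minimal sketch of this parametric study in Python. The ranges (N from 2 to 20, b from 0.001 to 0.01, a_1 and a_2 from 0.5 to 5.0) are the ones mentioned above, but evaluate_network is a hypothetical placeholder that should be replaced by actually training the network with the given setting and returning its error in prediction.

```python
import numpy as np

def evaluate_network(N, b, a1, a2):
    # Hypothetical placeholder: train the network with this setting and return
    # the error in prediction; the dummy expression only makes the sketch runnable.
    return (N - 11) ** 2 * 1e-4 + abs(b - 0.005) + 0.01 * (abs(a1 - 2.0) + abs(a2 - 2.0))

# Ranges taken from the discussion above; each parameter starts at its mid-value.
ranges = {
    "N":  np.arange(2, 21),              # number of hidden neurons
    "b":  np.linspace(0.001, 0.01, 10),  # bias value
    "a1": np.linspace(0.5, 5.0, 10),     # hidden-layer transfer-function coefficient
    "a2": np.linspace(0.5, 5.0, 10),     # output-layer transfer-function coefficient
}
setting = {name: values[len(values) // 2] for name, values in ranges.items()}

# Vary one parameter at a time, keep the value giving the minimum error, then move on.
for name, values in ranges.items():
    errors = [evaluate_network(**{**setting, name: v}) for v in values]
    setting[name] = values[int(np.argmin(errors))]
    print("near-optimal", name, "=", setting[name])
```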

Now, as I discussed in my last lecture, this particular network has to be trained, and if we go for supervised learning, it can be implemented either in the incremental mode of training or in the batch mode of training.

(Refer Slide Time: 12:47)

Now, I am not going to repeat because this I have already discussed in much more details
that the principles of the incremental training, the batch mode of training, their relative
merits and demerits, these things we have already discussed. But, now, I am just going to
concentrate a little bit on the batch mode of training for this supervised learning.

Now, suppose that in the batch mode of training I have got a large number of training scenarios, say capital L of them. As I have mentioned several times, I first pass the first of these L scenarios and, depending on the connecting weights, the coefficients of the transfer functions and so on, I will be getting a set of outputs.

335
Now, this set of calculated outputs will be compared with the respective target values to find out the error. Once we have got the error at each of the output neurons, I can find out the combined error corresponding to that particular training scenario. Then, with the help of the second training scenario, I will be getting another total error.

Then the third one is passed, and so on up to the L-th training scenario; once all these L training scenarios have been passed, I will be able to find out the total error and the average error. Based on this average error, I will have to propagate back, so that I can modify the different variables, and ultimately, through a large number of iterations, I get the optimal or near-optimal network.
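
A minimal sketch of how these scenario-wise errors are accumulated into an average batch error is given below; the 2-3-1 network, the random weights and the random training data are illustrative assumptions, not the exact example of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.uniform(0.0, 1.0, size=(2, 3))   # input-to-hidden weights (normalized scale)
W = rng.uniform(0.0, 1.0, size=(3, 1))   # hidden-to-output weights

def forward(x):
    # Assumed 2-3-1 network: linear input layer, log-sigmoid hidden layer,
    # tan-sigmoid output layer.
    h = 1.0 / (1.0 + np.exp(-(x @ V)))
    return np.tanh(h @ W)

# L training scenarios (inputs and target outputs), random here for illustration.
L = 10
inputs  = rng.uniform(-1.0, 1.0, size=(L, 2))
targets = rng.uniform(-1.0, 1.0, size=(L, 1))

total_error = 0.0
for x, t in zip(inputs, targets):
    o = forward(x)
    total_error += 0.5 * np.sum((t - o) ** 2)   # combined error of one scenario

average_error = total_error / L   # this average error is propagated back in batch mode
print(average_error)
```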

Now, here I just want to make a point: if we go for the batch mode of training, we have to be careful that we use a sufficient number of training scenarios. Suppose this particular network has approximately 100 design variables, that is, 100 parameters which are to be optimized and whose modified values are to be determined. In that case, I will have to pass at least 100 training scenarios, that is, either 100 or slightly more than 100, while carrying out the batch mode of training.

Now, let us see what happens if I pass a smaller number of training scenarios; this I have already mentioned, but let me repeat it once again. Suppose I have got 100 design variables and I pass only, say, 90 training scenarios: that is a case of under-training of the network. Mathematically, it is as if I have got 10 unknowns and only 9 equations; with 10 unknowns and nine equations we cannot solve the system, and the same thing happens here. If I have got a smaller number of training scenarios compared to the number of unknowns, this will be a clear example of under-training.
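
A quick way to check for this under-training risk is to count the adjustable parameters of the network and compare the count with the number of available training scenarios, as in the small sketch below (the bias-per-neuron convention and the example topology are assumptions; adjust them to match the actual architecture).

```python
def n_design_variables(M, N, P, biases=True):
    # Rough count of the unknowns of an M-N-P feed-forward network:
    # connecting weights V (M*N) and W (N*P), plus optional biases.
    count = M * N + N * P
    if biases:
        count += N + P
    return count

L = 90                                    # available training scenarios
unknowns = n_design_variables(10, 8, 2)   # example topology: 10-8-2
if L < unknowns:
    print("under-training risk:", L, "scenarios <", unknowns, "unknowns")
```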

And, if there is under-training and I pass a set of inputs, I will be getting some output, but this output cannot actually be believed, because the network does not know enough about the physical problem. So, for a set of inputs we are going to get some outputs, but this particular input-output relationship may not be reliable.

336
Now, on the other hand, let us see what happens if I use a very large number of training scenarios and train this particular network through a large number of iterations. Suppose I have got only 100 design variables, I use 1000 training scenarios and I train this network for a very large number of iterations, say 5000 iterations. What will happen to this particular network? Will this network give a very good performance? The answer is no.

Now, here there will be some sort of overtraining, and this overtraining is also not good for this particular network; the reason I am just going to discuss in detail. If there is overtraining, this network will try to remember the training scenarios, and the moment we pass the unknown test scenarios, it may not perform in the optimal sense for them, because its adaptability, that is, its generalization capability, will be lost if we go for overtraining.

Now, as I told, this particular network should have a very good generalization capability; this generalization capability is a desired property of the network. If we want to ensure it, we should go neither for under-training nor for overtraining; in other words, we should go for a perfect training of this particular network.

Now, suppose we have selected the number of training scenarios required for carrying out the batch mode of training. While carrying out this batch mode of training, there is a possibility that we will be facing another problem, which I am just going to explain and which I have learnt from my own practical experience.

Now, suppose I am carrying out the batch mode of training for supervised learning, I am using a suitable number of training scenarios, and I am using the generalized delta rule of the back-propagation algorithm; that is, suppose I want to update a particular connecting weight.

337
(Refer Slide Time: 20:33)

So, what I will have to do is apply $\Delta w_{jk}(t) = -\eta \frac{\partial E_k}{\partial w_{jk}} + \alpha' \Delta w_{jk}(t-1)$. This is actually the generalized delta rule, which I have already discussed.

Now, using this generalized delta rule, I will have to update the connecting weights. I have already discussed that $\eta$ is the learning rate, which lies in the range 0 to 1, and $\alpha'$ is the momentum constant, whose purpose I have also already discussed: if we want to provide some sort of damping or cushioning effect to the network, we have to use this momentum term.
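
In code, one application of the generalized delta rule with the momentum term can be sketched as follows; grad stands for the partial derivative of E_k with respect to w_jk, which would come from back-propagation, and the numerical values are only illustrative.

```python
def delta_rule_update(w, grad, prev_delta, eta=0.2, alpha=0.6):
    # Generalized delta rule: delta_w(t) = -eta * dE/dw + alpha * delta_w(t-1).
    delta = -eta * grad + alpha * prev_delta
    return w + delta, delta

w, prev_delta = 0.3, 0.0
for grad in [0.05, 0.04, 0.06]:   # gradients from three successive iterations
    w, prev_delta = delta_rule_update(w, grad, prev_delta)
    print(round(w, 4))
```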

Now, here there is one mathematical reason why we put this momentum term. The way we update the network is $w_{jk}(\text{updated}) = w_{jk}(\text{previous}) + \Delta w_{jk}$. Now, suppose that this $\Delta w_{jk}$ value is found to be positive for a large number of consecutive iterations.

Now, this $\Delta w_{jk}$ could be either positive or negative, but suppose I am getting positive values for a large number of consecutive iterations. Then, what will happen to the updated $w_{jk}$? There is a possibility that this $w_{jk}$ will come out of the range, that means, it will become greater

338
than 1; suppose this $w$ is becoming, say, 1.5, whereas the maximum limit for $w$ is 1. Now, what will happen to this particular network?

Now, as I mention several times that network does not know anything of the physical
problem. Now, even if the W has come out of the range or the V has come out of the
range, if you pass the set of inputs, the network is going to give you the output, but that
particular output may not be reliable, because these particular connecting weight values
have come out of the range. And, if this is the situation, that particular situation is
actually known as actually the stability problem of this particular network. That means,
the network has become unstable, it has lost its balance and so this particular network is
going to behave in a very the erratic way and you may not get a very good prediction for
a set of inputs.

(Refer Slide Time: 24:55)

Now, here we can apply one remedy, which is something like this: if W or V becomes greater than 1, that is, out of range, then in place of W we put 1/W, and in place of V we put 1/V. Since W has become greater than 1, 1/W will be less than 1, and the same is true for V and 1/V. By doing that, we forcefully bring the value back within the range, and the network will once again try to become stable.
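
This remedy can be written as a small helper; the sketch below applies the reciprocal correction element-wise to a weight matrix and, as an assumption, tests the magnitude so that weights drifting below -1 are treated the same way.

```python
import numpy as np

def enforce_range(weights):
    # If a connecting weight has drifted outside the normalized range (|w| > 1),
    # replace it by its reciprocal, which forces it back inside the range.
    w = np.asarray(weights, dtype=float)
    out_of_range = np.abs(w) > 1.0
    w[out_of_range] = 1.0 / w[out_of_range]
    return w

print(enforce_range([[0.4, 1.5], [-2.0, 0.9]]))   # 1.5 -> 0.667, -2.0 -> -0.5
```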

339
But, if we look at the performance, say the error curve, then the moment we apply this correction there will be a sharp change in the performance, after which the network will once again reach the stable region. If V or W goes out of range again and the correction is applied again, it will once again show this type of erratic behavior and then return to the stable zone. Within the erratic-behavior zone we cannot believe the network; if we want to use this network for making some predictions, we have to rely on it only in the stable range.

(Refer Slide Time: 26:35)

So, these are all actually the facts, which we should understand, if we want to design a
very efficient multilayer feed forward network. Now, I am just going to quickly look into
the merits and demerits of this multilayer feed forward network. Now, this particular
multilayer feed forward network can handle a large number of variables.

Now, if you remember, while discussing the fuzzy reasoning tool we have already seen that as the number of input parameters increases, handling or modeling the process using the fuzzy reasoning tool becomes more difficult, because the number of rules is going to increase exponentially. For example, as I have already discussed, suppose a particular process has n input parameters and, to represent each input parameter, we use m linguistic terms in the fuzzy reasoning tool. So,

340
what will happen? The total number of rules becomes $m^n$, which is a very large number, and this is known as the curse of dimensionality.
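
As a quick illustration of this exponential growth, the snippet below compares the $m^n$ rule count of a fuzzy reasoning tool with the roughly linear growth of the weight count of a multilayer network; the 10-neuron hidden layer and single output are arbitrary assumptions.

```python
m, hidden = 3, 10                        # linguistic terms per input; assumed hidden neurons
for n in (2, 5, 10, 15):                 # number of input parameters
    rules   = m ** n                     # fuzzy rule base grows exponentially
    weights = n * hidden + hidden * 1    # network weights grow roughly linearly
    print(n, rules, weights)
```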

Now, here for the same problem if I use the multi-layered feed forward network, so we
will be getting efficient modeling. So, in place of this fuzzy reasoning tool we can use
this multilayered feed forward network. The next is, it may not require so much problem
information, which you need in fuzzy reasoning tool or fuzzy logic controller because
while discussing the fuzzy reasoning tool, we have already discussed that its
performance depends on the database and the rule base.

And, if the designer wants to design the initial database and rule base of the fuzzy reasoning tool, he or she should have at least some initial information about the process to be controlled. But here, in neural network training, we may not need so much problem information. Based on experiments, if we can collect some input-output data, these can be used directly to train the network, and once it is trained, the network will be able to predict the input-output relationships in a very efficient way.

Now, let us look at the disadvantages or demerits of the multilayer feed forward network. The solution of this back propagation neural network, that is, the multilayered feed forward network trained using the back propagation algorithm, may get stuck at a local minimum. This I have already discussed: initially we do not know the nature of the error surface during the training. If the error surface is found to be unimodal, this back propagation algorithm, which is nothing but a steepest descent algorithm, is going to hit the globally minimum solution very easily.

But, suppose the nature of the error surface is such that it has multiple modes, that is, many ups and downs; in that case there is a possibility that the solution of this back propagation neural network, or the multilayered feed forward network trained using the back propagation algorithm, may get stuck at a local minimum and may not be able to reach the globally minimum solution.

Now, if you see the computational complexity, the training complexity of a neural
network and the training complexity of a fuzzy reasoning tool, the training complexity of
the multilayered feed forward network is more compared to the computational

341
complexity of the fuzzy reasoning tool. So, the training of neural network is
computationally more complex compared to that of the fuzzy reasoning tool.

And, there is another drawback, I should say that is your this particular multilayered feed
forward network is nothing but a black box. For example, say we have got a set of
training scenarios and we train a particular the network. Now, for training, what you
need is for a set of training input parameters, what should be the set of output parameters
and if those relationships are known for a large number of training scenarios, we can
train the network, but during the training what is happening inside the network, we, the
designers, do not know. But, ultimately through a large number of iterations, it is going
to give rise to one optimal or the near optimal network. So, this particular multilayered
feed forward network works like a black box.

Thank you.

342
Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 24
Some Examples of Neural Networks (Contd.)

(Refer Slide Time: 00:14)

Now, we are going to discuss how to solve one numerical example related to the multilayered feed forward network.

(Refer Slide Time: 00:36)

343
Now, here I am just going to show a small network. So, before I read this particular
statement, I am just going to show a small network. Now, this network is nothing, but
actually a three layered network, now on the input layer, we have got say 2 neurons, on
the hidden layer, we have got the 3 neurons, and on the output layer actually, we have
got 1 neuron.

So, this is nothing but a 2-3-1 network, and now I am going to state the problem. This is the schematic view of the multilayered feed forward network, and it consists of three layers, namely the input layer, the hidden layer and the output layer. The neurons lying on the input, hidden and output layers have the transfer functions $y = x$ (that is, a linear transfer function), $y = \frac{1}{1+e^{-x}}$ (that is, the log-sigmoid transfer function) and $y = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$ (that is, the tan-sigmoid transfer function), respectively. There are two inputs, I_1 and I_2, and there is only one output, that is, O. The connecting weights between the input and the hidden layers are denoted by V and those between the hidden and output layers are denoted by W.

(Refer Slide Time: 02:37)

Now, the initial values for these particular connecting weights are shown here. Now,
here you can see that is v_11 is nothing, but your 0.2; that means, your the connecting
weights between the first input neuron and the first hidden neuron, that is, v_11 is 0.2.

344
Similarly, v_12 is 0.4, v_13 0.3, v_21 (that is between the second neuron lying on input
layer and the first neuron lying on the your the hidden layer) is nothing, but is your 0.1,
v_22 is 0.6 and v_23 is nothing, but 0.5.

Now, similarly, the connecting weights between the hidden layer and output layer, that is
w_11 (that is the connecting weight between the first hidden neuron and the output
neuron) is 0.1. The connecting weight between the second hidden neuron and output
neuron, that is w_21 is 0.2, similarly, w_31 is equal to 0.1 and here, you have got a large
number of training scenarios and out of all the training scenarios, supposing that say I am
just going to show only one.

Now, the training scenario is something like this, if I_1 is 0.5 and I_2 is minus 0.4, then
the target output is nothing, but 0.15, now we are going to use the incremental mode of
training and using this incremental mode of training, we are going to find out, what
should be the modified value for this V and the modified value for this particular your
the W.

So, our aim is to determine the changes in the values of V and W during this training. We are going to consider the learning rate $\eta = 0.2$ and, for simplicity, the momentum constant $\alpha'$ has been taken equal to 0.0; that means we did not consider the momentum term. Through hand calculations, we are going to show one iteration of this particular network, and let us see how it works. Before I go for the solution, let me once again look into this network; it is a very simple network.

345
(Refer Slide Time: 05:37)

So, here we have got 2 inputs and 1 output, and these are the connecting weights. On the input layer, we have got the transfer function $y = x$; in the hidden layer, I have got the log-sigmoid transfer function, that is, $y = \frac{1}{1+e^{-x}}$; and in the output layer, we have got the tan-sigmoid transfer function, that is, $y = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$.

And, this learning rate value we have assumed and the moment I pass say one set of
training scenario, I will be able to find out, what is the calculated output. Now, this
calculated output will be compared with the target and the error will be determined and
this error will be propagated back for the purpose of updating the connecting weights, so
that this particular network can predict the say the output for a set of inputs more
accurately.

Now, let us see, how to carry out so this particular calculations and how to find out the
change in V and your the change in W values in order to minimize the error in
prediction.

346
(Refer Slide Time: 07:02)

Now, the way it has to be solved I have already discussed; let me repeat. In the input layer, we are using the linear transfer function of the form $y = x$, so the output is nothing but the input. Here, I am going to use the same nomenclature as before; for example, I_O1, the output of the first neuron lying on the input layer, equals I_I1, the input of the first neuron lying on the input layer, which is 0.5; similarly, I_O2 equals I_I2, which is -0.4.

So, these are nothing, but the outputs of this particular input layer, and once you have got
this particular output, now the respective outputs actually we are going to multiply by the
connecting weights and we can find out like what should be the input of the different
neurons lying in the hidden layer. For example, say H_I1 that is input of the first neuron
lying on the hidden layer is nothing, but I_O1 multiplied by your v_11 plus I_O2
multiplied by v_21 and if you calculate, you will be getting 0.06. Now, similarly, this
H_I2 is nothing, but I_O1 multiplied by v_12 plus I_O2 multiplied by v_22 and that is
nothing, but minus 0.04, and similarly, I can find out H_I3 that is nothing, but the input
of the 3rd neuron lying in the hidden layer and that is nothing, but I_O1 multiplied by
v_13 plus I_O2 multiplied by v_23 and that is nothing, but minus 0.05. And, once you
got these particular inputs of the hidden neuron, now very easily, we can find out what
should be the corresponding output.

347
(Refer Slide Time: 09:54)

Here, we have got the log-sigmoid transfer function, that is, $y = \frac{1}{1+e^{-x}}$, where in place of x I have to put the input of the respective hidden neuron. Now, $H_{O1} = \frac{1}{1+e^{-H_{I1}}}$, and if we put in the numerical value and solve, we will be getting the corresponding output (approximately 0.515).

Similarly, the output of the second neuron lying on the hidden layer using this particular
expression I can find out that is your 0.490001 and by following the same, I can also find
out what is H_O3, that is output of the 3rd neuron lying on this particular hidden layer
and this is nothing, but your 0.487503. So, this is the way, actually we can find out what
should be the outputs of your different hidden neurons.

348
And, once we have got these outputs, we can find out the input of the neuron lying on the output layer. So, O_I1 is the input of the first neuron lying on the output layer (we have got only 1 neuron on this output layer), and $O_{I1} = H_{O1} w_{11} + H_{O2} w_{21} + H_{O3} w_{31}$. If we insert the numerical values and calculate, we will be getting the input of the neuron lying on the output layer.

So, if I know this input, I can find out the output of the neuron lying on the output layer, where we have got the tan-sigmoid transfer function. Passing O_I1 through this tan-sigmoid function, I will be getting the calculated output of the neuron lying on the output layer, that is, O_O1. Now, if we know this calculated output, we can very easily find out the error: the error in prediction is $E = \frac{1}{2}(T_{O} - O_{O1})^2$, and if we calculate, we will be getting this error. Based on this particular error, I will have to do the updating.
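
The forward calculations above can be checked with a short script; the given weights, inputs and target are exactly those of the example, and only the intermediate values not quoted in the text are produced by the code itself.

```python
import numpy as np

V = np.array([[0.2, 0.4, 0.3],        # v_11, v_12, v_13
              [0.1, 0.6, 0.5]])       # v_21, v_22, v_23
W = np.array([[0.1], [0.2], [0.1]])   # w_11, w_21, w_31
I = np.array([0.5, -0.4])             # inputs I_1, I_2
T = 0.15                              # target output

H_in  = I @ V                          # [0.06, -0.04, -0.05]
H_out = 1.0 / (1.0 + np.exp(-H_in))    # log-sigmoid hidden outputs
O_in  = H_out @ W                      # input of the output neuron
O_out = np.tanh(O_in)                  # tan-sigmoid output
E     = 0.5 * (T - O_out) ** 2         # error in prediction

print(H_in, H_out, O_out, E)
```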

349
(Refer Slide Time: 13:27)

Now, let us see how to update the connecting weight w_11. I am propagating back this particular error and going to update this connecting weight (Refer Time: 13:55). To update the connecting weights, we are using the back-propagation algorithm, that is, the delta rule, and according to this delta rule, the change in w_11 is $\Delta w_{11} = -\eta \frac{\partial E}{\partial w_{11}}$.

Now, by the chain rule, $\frac{\partial E}{\partial w_{11}} = \frac{\partial E}{\partial O_{O1}} \cdot \frac{\partial O_{O1}}{\partial O_{I1}} \cdot \frac{\partial O_{I1}}{\partial w_{11}}$. We have already discussed how to find out these partial derivatives; for example, the partial derivative of E with respect to O_O1 can be found very easily from the error expression, and then comes the partial derivative of O_O1 with respect to O_I1, where we have got the tan-sigmoid transfer function.

Now, if we write down the expression for the tan-sigmoid, $y = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$, we can very easily find out dy/dx, as I have already discussed, and with a little bit of simplification we get $dy/dx = 1 - y^2$. So, the partial derivative of O_O1 with respect to O_I1 is

350
nothing but $1 - O_{O1}^2$; then, the partial derivative of O_I1 with respect to w_11 is nothing but H_O1, and with that we have got all the terms.

(Refer Slide Time: 16:22)

So, now we can multiply these terms to find out the partial derivative of E with respect to w_11, and we also put in the numerical value of the learning rate. Once we have got this, we can very easily find out the change in w_11, that is, $\Delta w_{11} = -0.004526$, and then w_11 (updated) is nothing but w_11 (previous) plus $\Delta w_{11}$.

Now, this ∆w11 we have already got. So, very easily, you can find out the updated value
for this w_11. Now, the same principle we are going to use for determining, what should
be the updated value or what should be the change in w_21. So, change in w_21 will be
minus 0.004306, then change in w_31 is nothing, but minus 0.004284. So, by using the
same principle, we can find out what should be the change in w values.

351
(Refer Slide Time: 17:53)

Next, we have to find out the changes in the v values. Now, v_11 is the connecting weight between the first neuron of the input layer and the first neuron of the hidden layer, and $\Delta v_{11} = -\eta \frac{\partial E}{\partial v_{11}}$. Using the chain rule of differentiation once again, $\frac{\partial E}{\partial v_{11}} = \frac{\partial E}{\partial O_{O1}} \cdot \frac{\partial O_{O1}}{\partial O_{I1}} \cdot \frac{\partial O_{I1}}{\partial H_{O1}} \cdot \frac{\partial H_{O1}}{\partial H_{I1}} \cdot \frac{\partial H_{I1}}{\partial v_{11}}$. The partial derivative of E with respect to O_O1 we can find out as before, and the partial derivative of O_O1 with respect to O_I1 we have already seen, so these we can find out.

352
(Refer Slide Time: 19:33)

The next term, the partial derivative of O_I1 with respect to H_O1, is nothing but w_11. Then, the partial derivative of H_O1 with respect to H_I1 is $\frac{e^{-H_{I1}}}{(1+e^{-H_{I1}})^2}$; this comes from the log-sigmoid transfer function, which is of the form $y = \frac{1}{1+e^{-x}}$, and we have already discussed that its derivative, with a little bit of simplification, becomes $H_{O1}(1-H_{O1})$. The partial derivative of H_I1 with respect to v_11 is nothing but I_O1. Once we have got all the expressions, we can put them together and write down the expression for the partial derivative of E with respect to v_11, and if we put all the numerical values into that expression, we will be getting $\frac{\partial E}{\partial v_{11}} = 0.000549$.

353
(Refer Slide Time: 21:12)

Once we have got this, we can very easily find out $\Delta v_{11}$, and then the updated value of v_11, because v_11 (updated) is once again nothing but v_11 (previous) plus $\Delta v_{11}$.

By following the same principle, we can find out the change in v_21, the change in v_12, the change in v_22, the change in v_13 and the change in v_23; all such numerical values we can find out.

354
(Refer Slide Time: 22:22)

Now, we are in a position to find out the updated values of the parameters of this network after one iteration. The updated value of v_11 becomes 0.199890; similarly, the updated values of v_12, v_13, v_21, v_22 and v_23, and of the weights w_11, w_21 and w_31, can be found in the same way (they are shown on the slide).
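
The complete back-propagation step of this iteration can be reproduced with the sketch below, which repeats the forward pass and then prints the changes in W and V; up to rounding, it should reproduce the values quoted in the example (for instance, a change in w_11 of -0.004526 and an updated v_11 of 0.199890).

```python
import numpy as np

V = np.array([[0.2, 0.4, 0.3], [0.1, 0.6, 0.5]])
W = np.array([[0.1], [0.2], [0.1]])
I = np.array([0.5, -0.4])
T, eta = 0.15, 0.2

# Forward pass (linear input, log-sigmoid hidden, tan-sigmoid output).
H_out = 1.0 / (1.0 + np.exp(-(I @ V)))
O_out = np.tanh(H_out @ W)

# Back-propagation (momentum constant taken as 0.0, as in the example).
delta_out = -(T - O_out) * (1.0 - O_out ** 2)            # dE/dO_O1 * dO_O1/dO_I1
dW = -eta * np.outer(H_out, delta_out)                   # changes in w_11, w_21, w_31

delta_hidden = (delta_out @ W.T) * H_out * (1.0 - H_out) # term back-propagated to the hidden layer
dV = -eta * np.outer(I, delta_hidden)                    # changes in v_11 ... v_23

print("dW:", dW.ravel())                 # approximately [-0.004526, -0.004306, -0.004284]
print("V updated:", (V + dV).ravel())    # first entry approximately 0.199890
```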

Now, once you have got the particular updated values and I am using say the incremental
mode of training. So, I can find out this updated values and using the updated values
once again, if I pass the same set of training scenario, there is a possibility that I will be
getting a slightly less error in prediction and supposing that I am running for say 10 or 20
iterations by following the same principle before I go back or before I start with the
second training scenario.

So, based on the first training scenario, let me update for 10 times or let me just run this
for say 10 times, 10 iterations, then we go for the second training scenario and repeat the
process. Then, you go for the third training scenario, you repeat the process and all the
training scenarios you pass one after another and at the end of each training passing each
training scenario, you update this particular network.

355
Now, if you follow this particular method, there is a possibility that you will be getting
one network, the optimal network or the near optimal network, after passing the 10-th
training scenario and whatever you got after passing the first training scenario, there
could be a lot of difference. So, these two networks could be different performance-wise
and if you follow this incremental mode of training, there is a possibility that you may
not get a very good generalization capability of this particular network.

The network may not be adaptive in nature, and if it is not adaptive, it may not work well for the unknown test scenarios. The incremental mode of training is computationally very fast compared to the batch mode of training, but the resulting generalization capability may not be sufficient.

Thank you.

356
Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 25
Some Examples of Neural Networks (Contd.)

(Refer Slide Time: 00:15)

We are going to discuss the working principle of another very popular network and that
is known as actually a Radial Basis Function Network, that is RBFN or Radial Basis
Function Neural Network, that is your RBFNN. So, let us see how does it work, and how
can it solve the input-output modeling problem.

That means, if you want to represent the input output relationship of a particular
engineering system or a process, we can also use the radial basis function network in
place of the multilayered feed forward network. Now, let me discuss first the working
principle of this radial basis function network and then, I will make a comparison of this
particular network with the multilayered feed forward network.

Now, to define what we mean by the radial basis function network: a radial basis function is a special type of function, where the response increases or decreases monotonically with its distance from a centre point. I am going to take a few radial basis functions as examples, and these radial basis functions are used

357
as the transfer functions in the radial basis function neural network. Now, let me concentrate first on the radial basis functions themselves.

(Refer Slide Time: 02:11)

Now, if we look at radial basis functions, we have got different types of functions. For example, we have got the thin-plate spline function, which is mathematically $f(x) = x^2 \log x$.

(Refer Slide Time: 02:33)

358
Now, if you see the plot of this particular function, the plot looks like this. So, this is
your x, and this is f (x), that is y. And, here you can see, as x increases, so this particular
function increases monotonically. So, this is actually a very good example of a radial
basis function and this is known as the thin plate spline function.

Now, among the other forms of radial basis function, we have got the well-known Gaussian function, which is used very frequently in the radial basis function network. Its mathematical expression is $y = f(x) = \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right)$, and the figure shows this Gaussian distribution.

(Refer Slide Time: 03:43)

This figure shows a Gaussian distribution, which is used as a radial basis function, with the expression $y = \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right)$. You can see the way the function increases on one side of the centre and decreases on the other. By selecting $\sigma$, I can control the spread of this function: $\mu$ indicates the mean (the centre) and $\sigma$ indicates the spread of the distribution.

359
Now, if I consider a smaller value for $\sigma$, I will be getting a steeper curve; on the other hand, if I consider a higher value for $\sigma$, I will be getting a flatter curve. The next is the multi-quadratic function, that is, $y = f(x) = \sqrt{x^2 + \sigma^2}$.
( x) x2 + σ 2 .

(Refer Slide Time: 05:47)

Now, if you see the plot of this particular function, this is nothing but the plot. Now,
here, you can see that this particular function is decreasing monotonically and then, it is
increasing also, ok. So, this is a very good example of a radial basis function.

And, this is also very frequently used in the radial basis function network. Then comes its inverse, known as the inverse multi-quadratic function, $y = f(x) = \frac{1}{\sqrt{x^2 + \sigma^2}}$, which is also very frequently used in the radial basis function network.
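
For reference, the four radial basis functions mentioned so far can be written down directly, as in the following sketch.

```python
import numpy as np

def thin_plate_spline(x):
    return x ** 2 * np.log(x)                      # f(x) = x^2 log x, for x > 0

def gaussian(x, mu=0.0, sigma=1.0):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def multi_quadratic(x, sigma=1.0):
    return np.sqrt(x ** 2 + sigma ** 2)

def inverse_multi_quadratic(x, sigma=1.0):
    return 1.0 / np.sqrt(x ** 2 + sigma ** 2)

x = np.linspace(0.1, 3.0, 5)
print(thin_plate_spline(x), gaussian(x), multi_quadratic(x), inverse_multi_quadratic(x))
```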

360
(Refer Slide Time: 06:39)

And this is the function plot for the inverse multi-quadratic function. Now, these radial basis functions are used in the radial basis function network.

(Refer Slide Time: 06:55)

Now, if we concentrate on the structure of a radial basis function network, it looks like
this. Now, supposing that I have got a process having say capital M number of inputs and
I have got the capital P number of outputs.

Now, here on the input node actually we pass all the input parameters. So, these are all
input parameters and on the output layer, we represent this output neurons, ok. And, in

361
between the input nodes and the output layer we have got a hidden layer, and the
architecture of the topology of this particular radial basis function network depends on
the number of neurons we put on the hidden layer and generally, for this radial basis
function network, we use only one hidden layer. Now, here, how to decide the number of
neurons to be put on the hidden layer that I am going to discuss in details.

But, before that let me tell you one fact regarding this particular network, then once
again, I will be discussing how to decide the topology or the architecture of this
particular network.

(Refer Slide Time: 08:25)

Now, let me concentrate on the training scenarios first. Suppose we have got capital L training scenarios, and capital X is the collection of all L of them. If I concentrate on a particular training scenario, the l-th one, it is $X_l = (x_{l1}, x_{l2}, \ldots, x_{li}, \ldots, x_{lM})$; that means these are the inputs of the l-th training scenario, and we have got capital M inputs.

So, these inputs we are going to pass to the network, and we will be getting some calculated outputs, which will be compared with the target outputs to determine the error values. Now, let us see how it works and how to decide this particular

362
architecture first, and then how to carry out the forward, that is, the feed-forward, calculations.

Now, as I told, the architecture depends on the number of the neurons you put on the
hidden layers and at each of this particular hidden neurons, we put the radial basis
function as actually your the transfer function. For example, say here, I am using some
sort of the Gaussian distribution as the transfer function.

(Refer Slide Time: 10:37)

So, corresponding to the first hidden neuron, I have got one Gaussian distribution, denoted by its mean $\mu_1$ and standard deviation $\sigma_1$. Similarly, for the j-th one, the mean is $\mu_j$ and the standard deviation is $\sigma_j$, and corresponding to the N-th one, the mean is $\mu_N$ and the standard deviation is $\sigma_N$.

Now, how to decide the value of this particular the capital N that, I am going to discuss.
But, before that, let me tell you one more thing that here in place of input layer, I am
putting input nodes. So, truly speaking for this particular network, there is no input layer
and this is nothing but actually a 2-layer network. That means, here it has got one hidden
layer, which is in between the input nodes and output layer and we have got one output
layer. And, truly speaking there is no input layer here, instead what you have got it is
your input node and if you notice it carefully to represent a neuron, I am using a circle

363
something like this and to represent the node, in fact, it is actually the filled-up small
circle sort of thing, ok.

So, this indicates the node, this is node but not the neuron. So, this is a node not the
neuron. And, the difference between this particular node and the neuron is, we do not use
any transfer function here. Now, whatever is coming as inputs the same input you pass it
here, so the input is passed through this particular node and all such inputs are summed
up here. So, this is nothing but H_I1, that is the input of the first neuron lying on the
hidden layer this is H_Ij that is the input of the j-th neuron lying on the hidden layer and
this is H_In that is nothing but the input of the n-th neuron lying on the hidden layer.

Now, depending on this Gaussian transfer function, I will be getting some output here,
some output here, some output here and to determine the input of the output layer by
following the same principle. So, I will have to multiply this particular output by this
connecting weight, this particular output by that connecting weight, and this particular
output by this connecting weight and you sum them up, so you will be getting the input
of the k-th neuron lying on the output layer and generally, on the output layer, we use the
linear transfer function.

So, output is nothing but the input. So, output of the k-th neuron lying on the output layer
is nothing but the input of the k-th neuron lying on the output layer. So, this way
actually, it works. But, let me tell you one more thing and I have not yet discussed, in
fact, how to decide the topology or how to determine the number of hidden neurons.
Now, if you see the literature you will find that there are different ways, there are
different methods used to decide what should be the number of hidden neurons in this
hidden layer. Now, out of all such methods I am just going to discuss a few very popular
methods.

For example, say this I have already mentioned the minimum number of neurons in the
hidden layer has to be once again 2, but what should be the optimal number? To decide
the optimal number, as I discussed, I can carry out some sort of parametric study, the
way I carried it out for the multilayered feed forward network, and in the parametric
study, what you can do is, you can decide what should be the optimal number of N, what
should be your this particular coefficient of this transfer function.

364
For example, linear transfer function it could be, y = mx , what should be the suitable
value for this particular m or if I use some non-linear transfer function like log sigmoid
or say tan sigmoid, so in place of m, I will have to find out what is a_1 or a_2 and so on.
And, of course, I can add some bias value, so I can put that bias, I can put some bias
value, here, ok; so bias can also be determined. And, exactly the same procedure, which I
discussed, we can follow just to find out what should be the near optimal values for this
N, then comes m, a_1, a_2, b and all such things.

And, once we have got this near-optimal network, we can believe that particular network. So, parametric study is one method; now I am going to mention another, more systematic method, which I have already discussed: we can do some sort of clustering using the principles of fuzzy clustering, for example, fuzzy C-means clustering or fuzzy entropy-based clustering.

Now, if we do clustering based on the similarity of the training data, there is a possibility of determining N. Suppose I have got 1000 training data and, after clustering based on similarity, I am getting ten as the optimal number of clusters; say the first cluster has got 80 data, the second cluster 110 data, the third cluster 90 data, the fourth cluster 120 data, and so on. So, I have got 10 clusters, and 10 clusters mean 10 hidden neurons.

Now, each neuron is going to represent a particular cluster. Say in the first cluster I have got 80 data; based on its leader, that is, the cluster centre, I can find out the mean of this particular Gaussian distribution, and once I know the mean, I can find out the variance of the surrounding data and hence the standard deviation, that is, $\sigma_1$. So, for the Gaussian distribution of each of these hidden neurons, I can find out the mean and standard deviation.
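
A crisp k-means-style clustering is sketched below as a simple stand-in for the fuzzy C-means or fuzzy entropy-based clustering mentioned above: the cluster centres give the means and the spread of the members of each cluster gives the standard deviations. The data, the number of clusters and the reduction of each multi-dimensional centre to a single scalar mean (following the assumption, used later in the lecture, that all dimensions share the same mean) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 4))        # assumed training inputs: 1000 scenarios, 4 inputs
K = 10                                   # number of clusters = number of hidden neurons

# Plain (crisp) k-means, used here only as a stand-in for fuzzy clustering.
centres = data[rng.choice(len(data), K, replace=False)]
for _ in range(20):
    labels = np.argmin(((data[:, None, :] - centres[None]) ** 2).sum(-1), axis=1)
    centres = np.array([data[labels == k].mean(axis=0) if np.any(labels == k) else centres[k]
                        for k in range(K)])

# Gaussian parameters of each hidden neuron: a scalar mean per neuron and the
# standard deviation of the data belonging to its cluster.
mus    = centres.mean(axis=1)
sigmas = np.array([data[labels == k].std() if np.any(labels == k) else 1.0 for k in range(K)])
print(mus.round(3), sigmas.round(3))
```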

And, once you have got the mean and standard deviation for these 10 number of hidden
neurons, I know all such properties and the moment I pass these particular inputs to these
hidden neurons, depending on this your µ and σ , I will be getting the different outputs,
although the inputs for each of these particular hidden neurons are exactly the same
numerically.

365
For example, say what you are doing; this H_I1 is numerically exactly equal to H_Ij and
that is numerically equal to H_IN. So, what I do is, we can find out this particular inputs
and depending on the σ and µ , I will be getting the different output, different output,
different outputs, and then, as I told, the outputs will be multiplied by the corresponding
connecting weight and these are summed up here and then, it will pass through this
particular transfer function just to find out the final output.

So, this is the way actually, this particular radial basis function network works. Now,
whatever I have discussed, the same thing actually I have written it here.

(Refer Slide Time: 19:37)

For example, say I am passing the l-th training scenario, having M numerical values for the M inputs. Once it is passed, I can find out the outputs of the hidden layer: H_Oj, the output of the j-th neuron lying on the hidden layer, can be determined using the Gaussian distribution, $H_{Oj} = \exp\left(-\frac{1}{2}\left(\frac{H_{Ij}-\mu_j}{\sigma_j}\right)^{2}\right)$.

366
(Refer Slide Time: 20:13)

Now, if I have got the outputs of the hidden neurons, I can very easily find out the input of the k-th neuron lying on the output layer, which is $O_{Ik} = \sum_{j=1}^{N} w_{jk} H_{Oj}$. Here, we are using the linear transfer function, so the output is nothing but the input, and I can then find out the error in prediction at the k-th output neuron.
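
Putting these pieces together, a minimal sketch of the forward pass of such a network (Gaussian hidden units, linear output units, and the usual half-squared error at each output neuron) could look like this; all the numerical values are arbitrary illustrations.

```python
import numpy as np

def rbfn_forward(x, mus, sigmas, W):
    # x: one training input (M values); mus, sigmas: Gaussian parameters of the
    # N hidden neurons; W: N x P connecting weights to the output layer.
    H_in  = np.full(len(mus), x.sum())                   # same summed input to every hidden neuron
    H_out = np.exp(-0.5 * ((H_in - mus) / sigmas) ** 2)  # Gaussian transfer function
    return H_out @ W                                     # linear output layer: O_Ok = O_Ik

x      = np.array([0.5, -0.4, 0.3])        # M = 3 inputs
mus    = np.array([0.2, 0.5, 0.9])         # N = 3 hidden neurons
sigmas = np.array([0.5, 0.8, 1.2])
W      = np.array([[0.1, 0.3],             # P = 2 outputs
                   [0.2, 0.1],
                   [0.4, 0.2]])
target = np.array([0.3, 0.2])

output = rbfn_forward(x, mus, sigmas, W)
errors = 0.5 * (target - output) ** 2      # error in prediction at each output neuron
print(output, errors)
```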

(Refer Slide Time: 20:51)

367
And, once you have got this particular output, you can use the incremental mode of training to update the network, following the principle I have already discussed.

So, here $w_{jk}$ (updated) is nothing but $w_{jk}$ (previous) plus $\Delta w_{jk}$, where $\Delta w_{jk}(t) = -\eta \frac{\partial E_k}{\partial w_{jk}}(t) + \alpha' \Delta w_{jk}(t-1)$. So, I am using the generalized delta rule, and this partial derivative can be determined by following the same principle, the chain rule of differentiation: $\frac{\partial E_k}{\partial w_{jk}} = \frac{\partial E_k}{\partial O_{Ok}} \cdot \frac{\partial O_{Ok}}{\partial O_{Ik}} \cdot \frac{\partial O_{Ik}}{\partial w_{jk}}$. By following this, I can find out this particular partial derivative.

(Refer Slide Time: 22:15)

Now, the expression for each of these partial derivatives can be found very easily, as I have discussed several times: the partial derivative of E_k with respect to O_Ok follows from the error expression, the partial derivative of O_Ok with respect to O_Ik is equal to 1 (linear output layer), and the partial derivative of O_Ik with respect to w_jk is nothing but H_Oj.
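
With those three factors, one incremental update of the hidden-to-output weights can be sketched as below; the half-squared error at each output neuron and the numerical values are assumptions for illustration.

```python
import numpy as np

def update_output_weights(W, H_out, output, target, prev_dW, eta=0.2, alpha=0.3):
    # Generalized delta rule for the hidden-to-output weights of an RBFN,
    # with dE_k/dw_jk = -(T_k - O_k) * 1 * H_Oj.
    grad = np.outer(H_out, -(target - output))   # N x P matrix of dE_k/dw_jk
    dW = -eta * grad + alpha * prev_dW
    return W + dW, dW

# Continuing the small illustrative example above.
H_out  = np.array([0.6, 0.5, 0.4])
W      = np.array([[0.1, 0.3], [0.2, 0.1], [0.4, 0.2]])
output = H_out @ W
target = np.array([0.3, 0.2])
W, dW = update_output_weights(W, H_out, output, target, prev_dW=np.zeros_like(W))
print(W.round(4))
```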

368
(Refer Slide Time: 22:49)

So, this is the way you can find out. Now, let us see how to update the mean of this
Gaussian distribution and how to update the standard deviation of this particular
Gaussian distribution, that I am going to discuss. Now, before I go for this, let me once
again concentrate on this particular network.

(Refer Slide Time: 23:11)

So, our aim is to find out the updated value of the mean. Let me concentrate on the Gaussian radial basis function of the j-th hidden neuron, which has got the mean $\mu_j$ and the standard deviation,

369
that is, $\sigma_j$; how to update this mean and standard deviation is what I am going to discuss. Now, if you look at H_Oj, it is connected through one connecting weight to the first output neuron, through another to the next, and so on, the last one being w_jP.

So, this particular radial basis function has got some contribution to each of the output neurons. That means, if I want to update its mean or standard deviation, I will have to consider the average effect of the errors at all the output neurons, and that is why, as I discussed, we consider the average effect. Now, $\mu_j$ (updated) is nothing but $\mu_j$ (previous) plus $\Delta\mu_j$, where $\Delta\mu_j(t) = -\eta \left\{\frac{\partial E}{\partial \mu_j}(t)\right\}_{av} + \alpha' \Delta\mu_j(t-1)$ and $\left\{\frac{\partial E}{\partial \mu_j}\right\}_{av} = \frac{1}{P}\sum_{k=1}^{P}\frac{\partial E_k}{\partial \mu_j}$; that is, the summation over k from 1 to P of the partial derivative of E_k with respect to $\mu_j$, multiplied by 1/P. Now, each of these partial derivatives can be determined using the chain rule of differentiation, by following the same procedure.
can find out.

(Refer Slide Time: 25:49)

So, very easily we can write down this expression: $\frac{\partial E_k}{\partial \mu_j} = \frac{\partial E_k}{\partial O_{Ok}} \cdot \frac{\partial O_{Ok}}{\partial O_{Ik}} \cdot \frac{\partial O_{Ik}}{\partial H_{Oj}} \cdot \frac{\partial H_{Oj}}{\partial \mu_j}$. Now, we will have to find out the partial derivative of

370
H_Oj with respect to $\mu_j$. Let us try to understand how to get this expression.

Now, H_Oj is a Gaussian distribution: $H_{Oj} = \exp\left(-\frac{1}{2}\frac{(H_{Ij}-\mu_j)^2}{\sigma_j^2}\right)$, where the variable x is the input H_Ij. Now, let me find out its derivative with respect to $\mu_j$; it is very simple:

$\frac{\partial H_{Oj}}{\partial \mu_j} = e^{-\frac{1}{2}\frac{(H_{Ij}-\mu_j)^2}{\sigma_j^2}} \times \left(-\frac{1}{2\sigma_j^2}\right) \times 2(H_{Ij}-\mu_j) \times (-1)$.

So, this is the derivative with respect to $\mu_j$.

Let me repeat: the partial derivative of H_Oj with respect to $\mu_j$ is $e^{-\frac{1}{2}\frac{(H_{Ij}-\mu_j)^2}{\sigma_j^2}}$ multiplied by $-\frac{1}{2\sigma_j^2}$, multiplied by $2(H_{Ij}-\mu_j)$, multiplied by $-1$. The exponential term is nothing but H_Oj itself; the 2s cancel and the two minus signs become plus, so $\frac{\partial H_{Oj}}{\partial \mu_j} = H_{Oj}\frac{H_{Ij}-\mu_j}{\sigma_j^2}$. Now, what is H_Ij? If you remember, the input of the j-th neuron lying on the hidden layer is nothing but the summation of all the inputs, $x_{l1} + x_{l2} + \ldots + x_{lM}$, and in the numerator I am putting $-M\mu_j$, because for each of the M dimensions we assume the same mean value $\mu_j$, which is an assumption. So, the numerator becomes $(x_{l1} + x_{l2} + \ldots + x_{lM}) - M\mu_j$, divided by $\sigma_j^2$.
dimensions has got the same the mean value, which is an assumption and I have got M
dimensions. So, this is nothing but minus M µ j divided by σ j square.

371
So, this is the way actually, we can find out this derivative, that means, you can find out
actually, this expression for the partial derivative of H_Oj with respect to mu_j, so this
particular the expression, we can find out very easily.

(Refer Slide Time: 30:59)

And, once you have got this particular thing, now, we are in a position to find out the
change in µ j and once you have got the change in µ j , we can find out the updated

value.

Now, the same principle I am going to use for updating $\sigma_j$. So, $\sigma_j$ (updated) is nothing but $\sigma_j$ (previous) plus $\Delta\sigma_j$, and exactly in the same way I write down the expression of $\Delta\sigma_j$: we have to consider the average effect once again, with $\left\{\frac{\partial E}{\partial \sigma_j}\right\}_{av} = \frac{1}{P}\sum_{k=1}^{P}\frac{\partial E_k}{\partial \sigma_j}$. Now, once again I will have to derive that particular expression.

372
(Refer Slide Time: 31:53)

Now, this partial derivative of E_k with respect to $\sigma_j$ is, according to the chain rule of differentiation, $\frac{\partial E_k}{\partial \sigma_j} = \frac{\partial E_k}{\partial O_{Ok}} \cdot \frac{\partial O_{Ok}}{\partial O_{Ik}} \cdot \frac{\partial O_{Ik}}{\partial H_{Oj}} \cdot \frac{\partial H_{Oj}}{\partial \sigma_j}$. The partial derivative of E_k with respect to O_Ok multiplied by the partial derivative of O_Ok with respect to O_Ik I can find out, and the partial derivative of O_Ik with respect to H_Oj I can also find out.

Now, I need the last partial derivative, that is, the partial derivative of H_Oj with respect to $\sigma_j$, which is a somewhat big expression, but once again we can derive it very easily. H_Oj is the Gaussian distribution $H_{Oj} = \exp\left(-\frac{1}{2}\frac{(H_{Ij}-\mu_j)^2}{\sigma_j^2}\right)$. Now, if I find out its derivative with respect to $\sigma_j$,

$\frac{\partial H_{Oj}}{\partial \sigma_j} = e^{-\frac{1}{2}\frac{(H_{Ij}-\mu_j)^2}{\sigma_j^2}} \times \left(-\frac{1}{2}\right)(H_{Ij}-\mu_j)^2 \times (-2)\sigma_j^{-3}$.

Now, if we simplify, the minus times minus becomes plus and the 2 and the half cancel, so this is nothing but $H_{Oj} \times \frac{(H_{Ij}-\mu_j)^2}{\sigma_j^3}$, exactly the same expression I have written here. Now, what is H_Ij? H_Ij is nothing but $x_{l1} + x_{l2} + \ldots + x_{lM}$, and for each of the dimensions I am considering the same mean value $\mu_j$. So, in terms of the individual inputs, this term is written as $(x_{l1}-\mu_j)^2$

373
plus $(x_{l2}-\mu_j)^2$, and the last term will be $(x_{lM}-\mu_j)^2$, all divided by $\sigma_j^3$. So, this is the way we can find out this partial derivative.

And, once we have got this partial derivative, we are in a position to find out $\Delta\sigma_j(t)$, and once we have got $\Delta\sigma_j(t)$, I can update $\sigma_j$.
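
The averaged updates of the means and standard deviations can be collected into a short sketch, following the structure of the expressions above; momentum terms are omitted for brevity, the network values are arbitrary assumptions, and the derivatives of the Gaussian are taken directly with respect to its scalar input H_Ij (a simplification of the per-dimension expansion discussed above).

```python
import numpy as np

def update_centres_widths(x, mus, sigmas, W, target, eta=0.2):
    # One incremental update of the Gaussian means and standard deviations of an
    # RBFN, using the error effect averaged over the P output neurons.
    H_in  = x.sum()
    H_out = np.exp(-0.5 * ((H_in - mus) / sigmas) ** 2)
    output = H_out @ W                                   # linear output layer

    dE_dOut = -(target - output)                         # dE_k/dO_Ok, one value per output k
    dE_dHout_av = (W * dE_dOut).mean(axis=1)             # (1/P) * sum_k dE_k/dO_Ok * w_jk
    dHout_dmu    = H_out * (H_in - mus) / sigmas ** 2
    dHout_dsigma = H_out * (H_in - mus) ** 2 / sigmas ** 3

    mus_new    = mus    - eta * dE_dHout_av * dHout_dmu
    sigmas_new = sigmas - eta * dE_dHout_av * dHout_dsigma
    return mus_new, sigmas_new

x      = np.array([0.5, -0.4, 0.3])
mus    = np.array([0.2, 0.5, 0.9])
sigmas = np.array([0.5, 0.8, 1.2])
W      = np.array([[0.1, 0.3], [0.2, 0.1], [0.4, 0.2]])
target = np.array([0.3, 0.2])
print(update_centres_widths(x, mus, sigmas, W, target))
```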

So, this is the way, actually the connecting weight, the mean and standard deviation of
the Gaussian distribution used in the radial basis function for this network can be
updated. And, through a large number of iterations, this particular network is going to
give more and more accurate prediction, that is the better prediction, and this is the way,
actually this radial basis function network is working.

Thank you.

374
Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 26
Some Examples of Neural Networks (Contd.)

(Refer Slide Time: 00:15)

Now, we are going to solve one numerical example related to radial basis function
network, and let us see, how can it model the input-output relationship of a process.
Now, here, we are going to consider, in fact, a system having four inputs and one output
only for simplicity. So, I am just going to show one radial basis network having, in fact,
four inputs and one output.

375
(Refer Slide Time: 00:49)

So, this is the network. You can see there are four inputs here, x_I1, x_I2, x_I3 and x_I4, which are passed through input nodes, and we have got only one output. On the output layer we are using the linear transfer function, and on the hidden layer we are using a radial basis function, namely the inverse multi-quadratic function, as the transfer function; we have got three neurons on the hidden layer.

So, H_I1 indicates the input of the first neuron lying on the hidden layer, and H_O1 is the output of the first neuron lying on the hidden layer. Similarly, H_I2 is the input of the second neuron lying on the hidden layer, and H_O2 is the output of the second neuron lying on the hidden layer.

Then, H_I3 is the input of the third neuron lying on the hidden layer, and H_O3 is the output of the third neuron lying on the hidden layer. Now, these outputs are multiplied by the connecting weights and summed up as the input of the output neuron, and using the linear transfer function, we get the output. Let us see how to carry out this analysis and how to solve the numerical example.

376
(Refer Slide Time: 02:43)

Now, let me give the statement. There are three neurons on the hidden layer, which are assumed to have an inverse multi-quadratic function of the form y = f(x) = \frac{1}{\sqrt{x^2 + \sigma^2}}; take σ_1, σ_2 and σ_3 for the first, second and third hidden neurons as 0.2, 3.0 and 4.0, respectively. Assume initial weights w_11 = 0.2, w_21 = 0.4, w_31 = 0.5.

We use the incremental training scheme with the help of one training scenario: x_I1 = 1.5, x_I2 = 2.0, x_I3 = 1.7, and x_I4 = 2.5. The target output is nothing but 0.14. We are going to use the back-propagation algorithm with a learning rate η equal to 0.2, and we will have to update w_11 and σ_1 for one iteration.

377
(Refer Slide Time: 04:15)

Now, let us see how to solve it. Here we have some given values: x_I1 = 1.5, x_I2 = 2.0, x_I3 = 1.7, and x_I4 = 2.5. Now, as I discussed, we try to find out H_I1, that is, the input of the first neuron lying on the hidden layer; it is the same as the input of the second neuron lying on the hidden layer and the same as the input of the third neuron lying on the hidden layer, and it is nothing but 1.5 + 2.0 + 1.7 + 2.5 = 7.7.

Now, H_{O1} = \frac{1}{\sqrt{x^2 + \sigma_1^2}}. So, x is 7.7 and σ_1 is 0.2, and if you just insert these values and calculate, we are getting 0.129. Similarly, H_{O2} = \frac{1}{\sqrt{x^2 + \sigma_2^2}}; if you substitute the values for x and σ_2 and calculate, you will be getting 0.121. So, this is the way actually we can find out H_O1 and H_O2.

378
(Refer Slide Time: 05:49)

And, H_{O3} = \frac{1}{\sqrt{x^2 + \sigma_3^2}}, that is, 1 divided by the square root of 7.7² + 4.0², which is nothing but 0.115. Then, we determine what should be the input of the first neuron lying on the output layer, that is, O_{I1} = H_{O1} \times w_{11} + H_{O2} \times w_{21} + H_{O3} \times w_{31}. And, if you substitute all such numerical values and calculate, you will be getting 0.1317. Here, on the output layer, we are using the linear transfer function, so the output of the first neuron lying on the output layer is nothing but its input, and that is nothing but 0.1317. So, we will be getting this particular output.
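As a cross-check of the numbers above, here is a short Python sketch (assumed variable names, not the lecturer's code) of this forward pass; it reproduces hidden outputs of roughly 0.129, 0.121 and 0.115 and a final output of about 0.132, which matches the 0.1317 obtained in the lecture when the rounded hidden outputs are used.

```python
import math

x = [1.5, 2.0, 1.7, 2.5]
sigma = [0.2, 3.0, 4.0]
w = [0.2, 0.4, 0.5]

H_I = sum(x)                                              # 7.7, the same for every hidden neuron
H_O = [1.0 / math.sqrt(H_I ** 2 + s ** 2) for s in sigma] # inverse multi-quadratic transfer function
O_I1 = sum(h * wi for h, wi in zip(H_O, w))               # input of the output neuron
O_O1 = O_I1                                               # linear transfer function on the output layer

print([round(h, 4) for h in H_O])   # approx. [0.1298, 0.121, 0.1152]
print(round(O_O1, 4))               # approx. 0.132 (0.1317 with the rounded values 0.129, 0.121, 0.115)
```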

379
(Refer Slide Time: 07:07)

Now, based on this particular output, we will have to find out the updated values for the connecting weights and the updated value for this particular σ. By following the similar procedure, the updated value for w_11 is nothing but the previous w_11 plus Δw_11, where \Delta w_{11} = -\eta \frac{\partial E}{\partial w_{11}}. Now, using the chain rule of differentiation, this particular partial derivative can be written as the partial derivative of E with respect to O_O1 multiplied by the partial derivative of O_O1 with respect to O_I1 multiplied by the partial derivative of O_I1 with respect to w_11, ok. And, now we can find out all such terms, that is, this partial derivative, that particular partial derivative and this particular partial derivative.

And, if you just substitute the numerical values, I will be getting the partial derivative of E with respect to w_11, and once you have got this particular value, by multiplying it by −η, I will be getting the change in w_11.

380
(Refer Slide Time: 08:39)

Now, if you carry out this calculation, we will be getting Δw_11 as nothing but minus 0.2 (0.2 is the value of the learning rate) multiplied by minus 0.00107. And, if you just multiply, you will be getting Δw_11 = 0.000214. Now, the updated w_11 is nothing but the previous value plus this change, so I can find out the updated value for w_11. By following the similar procedure, I can find out the updated value for w_21 and the updated value for w_31.

(Refer Slide Time: 09:33)

381
And, once you have got this, we can find out the updated value for w_11 and the other w's. Now, let us see how to determine the updated value for σ_1. The updated value for σ_1 is nothing but the previous value of σ_1 plus Δσ_1, where \Delta\sigma_1 = -\eta \frac{\partial E}{\partial \sigma_1}. Now, the partial derivative of E with respect to σ_1 is nothing but the partial derivative of E with respect to O_O1 multiplied by the partial derivative of O_O1 with respect to O_I1 multiplied by the partial derivative of O_I1 with respect to H_O1 multiplied by the partial derivative of H_O1 with respect to σ_1.

Now, these particular derivatives very easily you can find out, this we have discussed
several times. Now, let me concentrate on the last partial derivative that is partial
derivative of H_O1 with respect to your σ 1 , and how to determine this particular the
partial derivative.

Now, it is very simple. This particular expression, H_O1, is nothing but (x² + σ_1²) raised to the power minus half, and I just try to find out its partial derivative with respect to σ_1.

So, this is nothing but -\frac{1}{2}(x^2 + \sigma_1^2)^{-3/2} \times 2\sigma_1; the 2 and the 2 get cancelled, so I will be getting -(x^2 + \sigma_1^2)^{-3/2} \times \sigma_1, exactly the same thing which I have written here. So, very easily you can find out this particular partial derivative.

And, once you have got all such terms, very easily you can find out the partial derivative of E with respect to σ_1. And, once you have got it, I can find out what should be the change in σ_1, and once you have got the change in σ_1, we can find out the updated σ_1.
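The back-propagation step just described can be sketched in Python as follows; this is only an illustration under the assumptions of a squared-error E = 1/2 (T − O_O1)² and a linear output neuron, with the output value 0.1317 taken from the forward pass above.

```python
import math

eta, T = 0.2, 0.14
x = 1.5 + 2.0 + 1.7 + 2.5             # H_I1 = 7.7
sigma1, w11 = 0.2, 0.2

H_O1 = 1.0 / math.sqrt(x ** 2 + sigma1 ** 2)
O_O1 = 0.1317                          # output of the forward pass computed above

dE_dOO1 = -(T - O_O1)                  # dE/dO_O1 for E = 0.5 * (T - O_O1)^2
dE_dw11 = dE_dOO1 * 1.0 * H_O1         # approx. -0.00107
dE_dsigma1 = dE_dOO1 * 1.0 * w11 * (-(x ** 2 + sigma1 ** 2) ** (-1.5) * sigma1)

w11_new = w11 - eta * dE_dw11          # approx. 0.2 + 0.000214
sigma1_new = sigma1 - eta * dE_dsigma1 # a very small correction in this example
print(w11_new, sigma1_new)
```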

Now, this is the way actually we can update the connecting weights and this particular σ value. And, this process will go on through a large number of iterations, and ultimately, you will be getting a network, and this particular network will be able to make the prediction very accurately. Now, this is actually the working principle of the radial basis function network.

Now, if I compare this particular radial basis function network with the multi-layered feed-forward network in terms of accuracy, both the networks are able to provide almost the same level of accuracy. But if I compare them in terms of computational complexity, the radial basis function network is computationally faster compared to the multi-layered feed-forward network, and that is why this radial basis function network has become very popular. And, it is very frequently used, in fact, to model the input-output relationships of a process having a large number of inputs and outputs.

Thank you.

383
Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture - 27
Some Examples of Neural Networks (Contd.)

(Refer Slide Time: 00:15)

We have discussed the working principles of the multi-layered feed-forward network and the radial basis function network, and we have seen how to model the input-output relationships of a particular process. But suppose that the process is highly dynamic. Now, for a very dynamic process, these networks may not be able to capture its dynamics.

Now, in that case, if you want to make a successful model, we will have to go for another
type of network, which is known as the recurrent network. Now, this recurrent network,
in short, is known as your RNN. Now, we are going to discuss, in details, the working
principle of these particular recurrent neural networks.

Now, before I go for the recurrent neural network, we should try to understand the reason behind going for this type of network in more detail. To explain why we should go for this type of network, let me take one very practical example. Supposing that I am going to model a process, which is something like the straight turning of a cylindrical part. Now, supposing that I have got a cylindrical part something like this; from this particular cylindrical part, I will have to make one turned cylindrical part, and supposing that I will have to make something like this.

Now, for getting this type of product starting from here, I will have to go for straight turning, and straight turning is carried out on a lathe. Now, if I just go for this particular straight turning, I will have to consider a process having three inputs, namely cutting speed, feed and depth of cut; say the cutting speed is denoted by v, the feed is denoted by f and the depth of cut is denoted by d.

And, what are the outputs? The outputs are nothing but the quality of the turned surface, which is measured in terms of, say, surface roughness, denoted by small s, and supposing that I will also have to find out the power consumption, that is denoted by p. Now, let me repeat. This is a process, the process of straight turning, and this particular straight turning has to be carried out on a lathe; here, there are three inputs, namely cutting speed, feed and depth of cut, and the outputs are the surface roughness and power consumption.

Now, these inputs are selected by the user or the operator and we do the turning and
then, as a result, I will be getting some surface roughness and power consumption. Now,
the moment we select these input parameters, something else happens inside this
particular process. Now, if I call, these input parameters are nothing, but the external
input parameters and the moment we select these external input parameters, some of the
internal input parameters will be created.

For example, say, there will be some level of vibration, while doing this particular
machining. Now, this generated vibration, that is called the internal input, that is I_in.
So, this particular internal inputs will have some contributions towards the surface
roughness and power consumption.

Now, if I want to model this particular dynamic process in a very efficient way, we will
have to consider both the external inputs as well as the internal inputs because both of
them are having some contributions towards the output. Now, the network, which have
already been discussed like your multi-layered feed-forward network or the radial basis
function network cannot be used to capture the whole dynamics of this particular
process. Now, if I want to capture the complete dynamics of this particular process, we
will have to go for in fact, the Recurrent Neural Networks, that is your RNN.

385
Now, before we discuss the working principle of this particular RNN, let me concentrate on how to capture the dynamics of a particular layer of neurons. For simplicity, let me consider that on a layer of a neural network, I have got only 4 neurons, say 1, 2, 3 and 4. Now, to find out the output of the first neuron, the inputs should come from the second, third and fourth neurons. So, the inputs are coming from these three, and we will be getting some output here.

Now, this particular output will be fed back; this particular symbol z inverse indicates the feedback. So, this will be fed back to the 2nd neuron, to the 3rd neuron, and it will also go to the 4th neuron. Now, let us concentrate on the 2nd neuron. The inputs of the 2nd neuron should come from the 1st one, the 3rd one and the 4th one, and supposing that this is the output we are getting, this output will be fed back: it will enter the 1st neuron, the 3rd neuron and the 4th neuron.

Now, let us concentrate on the 3rd one. The output which you are getting from the 3rd neuron will be fed back, and it will enter the 1st neuron, the 2nd neuron, and it will also go to the 4th one. Similarly, for the 4th neuron, its output will be fed back and will enter the 1st neuron, the 2nd neuron and the 3rd neuron. Now, this is the way we try to capture the dynamics of this particular layer of the neural network, and this particular layer, for simplicity, we have assumed to consist of only a few neurons.

Now, to summarize let me mention that in this type of recurrent network, what we do is,
we take the help of some sort of feedback. Now, this feed-back is actually going to help
us to capture the dynamics of this particular process and here, there will be a cycle or a
loop in this type of network. So, this is actually, the way we can capture the dynamics of
a single layer of this particular neuron.

386
(Refer Slide Time: 08:46)

Now, we are going to concentrate on the network; that means, it has got three layers, namely the input layer, hidden layer and output layer. If you see the literature on recurrent neural networks, we have got a few very popular models: for example, the Elman model, the Jordan network and the combined Elman and Jordan network. Now, let me concentrate first on the Elman network and see how it works. In this type of network, in fact, we have got two circuits; one is the feed-forward circuit and another is called the feed-back circuit.

So, we have got feed-forward and feed-back circuits. Now, if I concentrate on this
particular input layer, the external inputs, that will be passed through the network
through this input layer. So, this is the input layer, now we have got the connecting
weights between the input layer and the hidden layer and that is denoted by v and on the
hidden layer, these neurons are having some transfer functions and depending on this
particular transfer function, I will be getting some output here; I will be getting some
output here.

Now, here, what we do is as follows. The outputs of the hidden neurons are not directly passed to the output layer; instead, those outputs of the hidden neurons are taken as feedbacks. So, the output of this particular hidden neuron is actually taken back as the feedback, and either we consider 100 percent of that output, or say 50 percent or 30 percent or 40 percent of it, as feedback, and we just keep it here.

Similarly, the output which you are getting on this particular hidden neuron, we take as the feedback; once again, we just copy it and put it here. Now, these are actually the feedbacks, which are nothing but the internal inputs of this particular process, denoted by I_in. So, here, we pass the external inputs, and these internal inputs will be generated inside this particular process. Now, what we do is, we try to pass all such inputs once again to the network: this particular feedback will be allowed to come here, this particular feedback will be coming here, and of course, we are having these particular circuits also, and here we will be getting the combined input for this hidden neuron.

Now, similarly, here also I will be getting. So, this particular feed-back is coming from
here, this particular feed-back is coming from here and of course, this will come, this
will also come. So, here, I will be getting this combined input further on the hidden
neuron. And, these particular combined inputs will be passed through the hidden neurons
and then, I will be getting these particular outputs, here I will be getting this particular
output.

So, whatever output we get here, we multiply by the connecting weights, that is denoted
by your w and here, we will be getting the combined input for the output layer and the
output layer neurons are having some transfer functions. So, I will be getting this
particular final output, that completes actually, one iteration of this particular network.

Now, here, to summarize actually what we do is, we take the feed-back from this
particular hidden layer. And, either we consider 100 percent of this output of the hidden
layer or slightly less than 100 percent as feed-back and this feed-back will be actually
considered as input to the hidden layer once again and then, it is passed and ultimately,
we will be getting this particular output.

Now, if you see, this particular network consists of, in fact, two such components, one is
called the feed-forward component. So, this is nothing but the feed-forward component
and this particular component is nothing but the feed-back component. So, this network
has got both the components and it will be able to capture the dynamics and this shows
the working principle of this Elman network.

388
Now, as I have already mentioned several times that it will be preferred to the multi
layered feed-forward network, if we want to model a highly dynamic process; so, this is
the working principle.
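As an illustration of the Elman idea, here is a minimal numpy sketch of one forward step in which the previous hidden outputs are fed back as the internal inputs. The names, shapes, sigmoid hidden layer, linear output layer and full-strength feedback are assumptions for illustration, not the lecturer's code.

```python
import numpy as np

def elman_step(x_ext, h_prev, V, V_ctx, W):
    """x_ext: external inputs; h_prev: hidden outputs of the previous step (the context);
    V, V_ctx, W: input-to-hidden, context-to-hidden and hidden-to-output weights."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    h_in = V @ x_ext + V_ctx @ h_prev      # combined input of the hidden layer
    h_out = sigmoid(h_in)                  # hidden outputs (also the next context)
    y = W @ h_out                          # linear output layer, for simplicity
    return y, h_out

# Example: 3 external inputs, 4 hidden neurons, 2 outputs, random weights
rng = np.random.default_rng(0)
V, V_ctx, W = rng.random((4, 3)), rng.random((4, 4)), rng.random((2, 4))
h = np.zeros(4)
for x in ([0.1, 0.5, 0.2], [0.3, 0.4, 0.1]):
    y, h = elman_step(np.array(x), h, V, V_ctx, W)
print(y)
```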

(Refer Slide Time: 14:24)

Now, I am going for another recurrent network, which is called the Jordan network. In the Jordan network, what we do is, we take the feedback from the output layer, but we do not take the feedback from the hidden layer. Once again, let me start: I am passing one set of external inputs, denoted by I_ex, and here we have got the connecting weights. So, whatever outputs we are getting here will be multiplied by these connecting weights v and summed up here; so I will be getting some input here, and I will be getting some input here.

And, initially we assume that this particular feed-back circuit is not present and now, I
will be getting the inputs and we will pass it through the neurons of this hidden layer. So,
depending on this particular transfer function, I will be getting some outputs here and
similarly, here also, for this particular hidden neuron I will be getting some outputs.
Now, these outputs will be multiplied by the connecting weights, and these things will be
summed up and this will be considered as input of the output layer. Similarly, this will be
multiplied by this connecting weight and these things are summed up and these are
nothing, but the input for the output neuron.

389
And, depending on the transfer function, we will be getting some output here and some output here, but these outputs are not actually the final outputs. Now, what we do is, these outputs are fed back and used as some sort of feedback to this particular network; similarly, whatever output we are getting here will be fed back and used as the internal inputs. So, here we will be getting some internal inputs, and those internal inputs will be passed through this particular network once again; that means, the network will be getting some feedbacks.

So, this feed-back will come here. So, this particular feed-back will go and I will be
getting the feed-back here; I will be getting the feed-back here. Now, we are in a position
to determine what should be the combined input for these particular the hidden neurons
and we will be able to find out, what should be the combined input for this particular
hidden neuron.

Now, depending on the transfer function, I will be getting the output; and once again, we
use these particular connecting weights and then, we will be getting the combined input
for this particular output neuron and then, using the transfer function, I will be getting the
final output; I will be getting the final output here. So, that completes actually one cycle
or one loop or one iteration of this particular network.

Now, to summarize, in the Jordan network, we take the feedback from the output layer; as I have already mentioned, this part is nothing but the feed-back circuit and this is the feed-forward circuit, and in this type of RNN, we consider both the feed-forward circuit as well as the feed-back circuit. Ultimately, for a set of external inputs, I will be getting the final output, and of course, the system is going to generate this type of internal inputs.

Now, for the purpose of calculating the inputs, the outputs, the feedback and all such things, we can follow exactly the same procedure which we have discussed for the multi-layered feed-forward network (like how to find out the inputs for a particular layer, how to find out the outputs, and so on).

390
(Refer Slide Time: 18:56)

Now, actually, we are going to discuss another network, which is the combination of
this particular Elman network and Jordan network. Now, here in the combined Elman
and Jordan network, we take the feed-back both from the hidden layer as well as the
output layer and let us see, how does it work? Now, let me assume that initially, this
particular feed-back is not there. So, let me assume that this particular feed-back is not
there and let me concentrate on the feed-forward circuit first.

Now, I am passing the set of the inputs that is nothing, but your external inputs exactly
the same way I discussed. So, here, on the input layer, we have got the neurons, they are
having their transfer functions and based on that you will be getting some output here.
And, once you have got this particular output, now these outputs, will be multiplied with
a connecting weights. So, I am here, this will be multiplied with a connecting weight.

So, I am here, similarly this will be multiplied by the connecting weight, this will be
multiplied by the connecting weight. So, I will be getting some input for this hidden
layer I will be getting some input for this particular hidden layer. Now, I will pass these
inputs to the hidden layer. So, I will be getting some outputs here and now, what we do
is. So, this particular output has got two applications; one is, it will be allowed to pass to
the output layer and this particular output will also be allowed to enter; this particular
feed-back circuit.

391
Now, let us see like what happens if it goes to the feed-forward circuit. So, this output
will be multiplied by the connecting weight, this output will be multiplied by connecting
weight. So, I will be getting some combined input here. So, this particular output will be
multiplied by your the connecting weights.

So, this will be multiplied with a connecting weight and those are summed up and these
inputs are going to pass through the output neurons and here, we have got the transfer
function. So, I will be getting some output here. Now, what we do is, this particular
output we consider either 100 percent or less than 100 percent of that as feedback. So,
this will be used as feedback. So, this is the feed-back and it will be stored here.

Now, similarly, this particular output will be considered as feed-back and it will be
stored here and let us concentrate on the output of the hidden layer. Now, if you
concentrate on the output of the hidden layer, you can see that we have got the output of
the hidden layer, which is nothing, but this. So, this will be used as feedback. So, I am
just going to use feed-back and it will be stored here. Similarly, the output of these
particular hidden neuron. So, this will be allowed to pass through like this and we will be
able to collect here in this particular neuron.

So, this is the way actually, we collect the combined feedbacks both from this hidden
layer as well as the output layer and here, we will be getting that particular internal
inputs. Now, these internal inputs will be allowed to pass through these particular hidden
layer. So, I am just going to pass it here, I am just going to pass it here and similarly, this
will be allowed to enter, this particular hidden neuron and this will be allowed to enter
this particular hidden neuron.

So, what happens? Here, I will be getting the combined input for the hidden layer, I will
be getting the combined input for this hidden layer and these combined inputs will be
passed to the transfer function here. So, I will be getting the output here.

Now, the output, which you are getting, that will be multiplied by the connecting weight;
multiplied with a connecting weight, this output will be multiplied with a connecting
weight and so, these things, we are going to collect here. So, these inputs for the output
neuron, we are going to collect and these inputs will pass through the transfer function
and ultimately, we will be getting the final output of these particular the network. And,
this completes actually one cycle of this particular combined Elman and Jordan network.

392
Now, this is the way we capture the dynamics of this particular process, as I mentioned. So, this is the feed-back circuit and this is the feed-forward circuit, both working together, and we will be getting this combined Elman and Jordan network. Now, if you see the performance, the performance of this particular network is found to be very reliable if you want to model a highly complex or highly dynamic process. As it is having this feed-back circuit, there is almost a guarantee that it will be able to capture the dynamics of this highly dynamic process.
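A similar sketch can be written for the combined Elman and Jordan network; again, the shapes, transfer functions and full-strength feedbacks are assumptions for illustration only. Dropping the output feedback recovers the Elman network, and dropping the hidden feedback recovers the Jordan network.

```python
import numpy as np

def combined_step(x_ext, h_prev, y_prev, V, V_h, V_y, W):
    # Both the previous hidden outputs (h_prev) and the previous final outputs
    # (y_prev) are fed back as internal inputs to the hidden layer.
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    h_in = V @ x_ext + V_h @ h_prev + V_y @ y_prev   # external + both internal inputs
    h_out = sigmoid(h_in)
    y = W @ h_out                                    # linear output layer, for simplicity
    return y, h_out

rng = np.random.default_rng(1)
V, V_h, V_y, W = rng.random((4, 3)), rng.random((4, 4)), rng.random((4, 2)), rng.random((2, 4))
h, y = np.zeros(4), np.zeros(2)
for x in ([0.2, 0.1, 0.4], [0.5, 0.3, 0.2]):
    y, h = combined_step(np.array(x), h, y, V, V_h, V_y, W)
print(y)
```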

Now, if you see the computational complexity of these networks, here, as it is having
both feed-back and feed-forward circuits, compared to your multi-layered feed-forward
network, there will be more computations.

So, computationally, it could be a little bit more complex. Moreover, if you compare the computational complexity of the Elman network, the Jordan network and the combined Elman and Jordan network, the computational complexity of the combined Elman and Jordan network obviously becomes more compared to that of only the Elman network or only the Jordan network. But supposing that we have got a very complex, very dynamic process, it is better to go for this type of combined Elman and Jordan network.

So, the working principle of the recurrent network, we have discussed and principle-
wise, it is very simple and it has been reported that this network can perform very well
particularly to capture the dynamics of the highly dynamic process.

Thank you.

393
Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture - 28
Some Examples of Neural Networks (Contd.)

(Refer Slide Time: 00:15)

Now, we are going to discuss the working principle of another neural network, which is known as the Self-Organizing Map, in short, SOM. This network was proposed by Kohonen around 1994/95, and after his name, this network is also known as the Kohonen network. Here, we will see that in this particular network we use the principle of unsupervised learning, and the concept of unsupervised learning I am going to discuss in detail.

Now, before that let us mention that this self-organizing map can be used as a
visualization technique or a dimensionality reduction technique and this is actually a
topology preserving tool. Now, let me discuss a little bit, what do you mean by the
visualization technique or dimensionality reduction technique and what do you mean by
this topology preserving tool? Now, supposing that say I have got a data in the higher
dimension, say in 10 dimensions, 20 dimensions, something like this.

394
Now, this higher dimensional data can be represented something like this. So, this is
actually the representation of the higher dimensional data, a large number of data and
supposing that, so this data set is having say L dimensions, say 10 dimensions.

Now, this data we, human-beings, we cannot visualize the reason is: we can visualize
only up to three dimensions. Now, if the data are in more than 3 dimensions, say 4
dimensions or so, we cannot visualize. So, for the purpose of visualization, this particular
higher dimensional data are to be mapped to the lower dimension, say in 2 dimensions,
say x and y. Now, if I want to map, these higher dimensional data to the lower dimension
for the purpose of visualization, we cannot do actually 1 : 1 linear mapping. So, we will
have to go for some sort of nonlinear mapping.

Now, these higher dimensional data, we are going to map to the lower dimension for the
purpose of visualization; that means, we want to see the relative position of the different
points, which are there in the higher dimension, so these points, we are going to see in
the lower dimension. Now, this technique is known as actually the visualization
technique and or the dimensionality reduction technique.

Now, if you see the literature there exist a large number of techniques for this
dimensionality reduction. Now, these dimensionality reduction or visualization
techniques can be classified into two groups. So, we have got the dimensionality
reduction techniques, say DRTs and these techniques can be classified into two groups;
one is called the distance preserving tools and we have got actually the topology
preserving tool.

Now, let us try to find out the difference between the distance preserving tool and
topology preserving tool. So, by distance preserving tool, actually what do you mean?
So, in the higher dimension, if I consider two points, one point is here, another point is
here, so we can find out the Euclidean distance between them. Supposing that this is
point i and this is your point j and this particular distance is nothing, but say d_ij.

So, in distance preserving technique actually, we consider only the Euclidean distance
between them, but we do not consider the relative position of the jth point with respect to
the ith point. Whether the j-th point is towards the left or towards the right or towards the
top or towards the bottom with respect to this i-th point that is not considered in the
distance preserving tool.

395
But, in topology preserving, actually we are going to maintain the topology; that means,
in the higher dimension, if the j-th point is towards the top with respect to the i-th, in
lower dimension, I am just going to map this point at this particular point. So, the relative
position of these particular points, the two points, those things will be maintained in the
lower dimension and that means, in topology preserving, not only the distance, but we
try to maintain the relative position of a particular point, with respect to another point.

Now, this Self-Organizing Map or SOM, that is actually a very efficient tool for the
topology preserving. Now, actually, I should mention one thing that in human brain, we
use this type of network very frequently and that is why, actually we can remember the
topology. For example, sitting at a particular place, we can say, for example, starting
from here, the road towards another city, so what is the direction, how does it go from
one city to another city?

So, we can imagine that we have got the topology preserving tool that is nothing, but
self-organizing map in our brain. Now, this particular self-organizing map can also be
used as a very efficient clustering tool. Now, in this course, we have already discussed
the working principle of a few clustering tools like Fuzzy C-means clustering, entropy-
based clustering, in detail.

Now, the similar type of problem can also be given to this self-organizing map and this
self-organizing map is actually a very efficient clustering tool. And, as I mentioned that
here, in this particular network, we use the principle of unsupervised and competitive
learning. So, here, we do not use the concept of supervised learning, which we have
already discussed.

Now, let us see, how to use the concept of this unsupervised learning or the competitive
learning in this particular self-organizing map. Now, if I see the main task of this
particular network, that is as follows. Supposing that, I have got some higher
dimensional data, a large number of data point, for example, say I have got say one, this
is the i-th one and this is the n-th one.

396
(Refer Slide Time: 08:12)

So, I have got say capital N number of data points, in the higher dimension, say these are
in L dimensions L-D and these particular data will be passed through the input layer of
this particular network and our aim is to map these higher dimensional data, that is the L
dimensional data to the lower dimensional layer and that is nothing, but the competition
layer for the purpose of visualization. So, what they do is, each of these particular N data
points in the higher dimension, we are going to map it to the lower dimension for the
purpose of visualization.

So, although this input layer is in L dimensions, this particular competition layer or this
is nothing, but the output layer should be either in say 2 dimensions or it could be say in
3 dimensions, so, this could be your the 3 dimensions.

Because we can visualize only up to 2 or, at most, 3 dimensions, these particular data points are to be mapped to the lower dimension, and that is nothing but the output layer or the competition layer. Now, we are going to discuss in detail how to do this type of mapping from the higher dimension to the lower dimension while maintaining the topological information of these particular data points.

397
(Refer Slide Time: 09:48)

Now, the self-organizing map can also be called a nonlinear generalization of principal component analysis. You might have heard about Principal Component Analysis, that is, the PCA algorithm; PCA is actually a very efficient tool for linear, 1 : 1 mapping, but as we are doing the mapping from a higher dimension to a lower dimension, we cannot think of linear mapping, so we will have to go for nonlinear mapping.

In this nonlinear mapping, we will not be getting a 1 : 1 mapping; there will be some inaccuracy, and this inaccuracy we will have to accommodate. Now, the self-organizing map consists of two layers, namely the input layer and the competition layer; this is nothing but a two-layer network. On this particular competition layer, there will be three basic operations, and these operations we are going to discuss in detail.

These are nothing, but the competition, then there will be cooperation and after that,
there will be updating and through this competition cooperation and updating, actually
this particular network is going to do the mapping from the higher dimension to the
lower dimension. Now, let us see how does it work? So, let us first concentrate on the
competition. Now, the purpose of this particular competition is to declare a winner. Now,
let us see, how to declare and how to determine that particular the winner?

398
(Refer Slide Time: 11:44)

Now, this is the network, as I have shown it here. So, this is the input layer and this is the competition layer; I have already drawn this particular competition layer, but for the time being, let me assume that the competition layer, that is, these neurons the way they are shown here, is absent.

Now, what is our aim? Let me repeat: we have got capital N number of data points, starting from 1 up to capital N, on the input layer, and these are in L dimensions, that is, L-D, the higher dimension. Now, these particular data points are to be mapped to the lower dimension. Let us see how we can use the principle of competition; let us see the principle of this competition first.

399
(Refer Slide Time: 12:53)

Now, here, as I told that we are going to map a particular data points, say i-th data point
lying on the input layer. Now, this particular i-th data point, supposing that I am
considering here m dimensional point. So, if it is having m dimensions, so to represent
this particular X_i, so, I will have to use x_i1, x_i2 up to x_im. So, I have got m number
of numerical values, where i varies from 1, 2 up to capital N. So, this N is actually the
total number of data points and each point is having say small m dimensions, now, how
to carry out this particular competition.

Now, to carry out the competition, what we do is, we generate, at random, some connecting weights between the input neuron i and the neuron j lying on the competition layer or output layer. So, we generate some connecting weight or synaptic weight denoted by W_j^i. If you want to generate this particular W_j^i, the first thing you will have to do is to decide its dimension; let me assume that, once again, it is having m dimensions, that is, the same dimension as this input data.

Now, to represent this W_j^i, you will have to use small m number of numerical values, that is nothing but w_j1^i, w_j2^i, up to the last one, w_jm^i, and j varies from 1, 2 up to N, that is nothing but the total number of data points lying on the input layer.

400
Now, once you have got these particular W values, which are generated, very easily I can find out the Euclidean distance between X_i and this particular W_j^i, and through this competition, what we are going to do is to find out that W which is the closest to this particular X_i.

(Refer Slide Time: 15:33)

Now, before I go for that particular calculation, I am just going to show you the schematic view. Let us suppose that this is the i-th data point, which I am going to map to the lower dimension,

(Refer Slide Time: 15:46)

what I do? We actually generate all such W values. So, these are all connecting weights
or the W values and these are actually your say W_j^i and these actually the connecting
weights are generated at random using the random number generator, supposing that in
the normalized scale, between say 0 to 1. Now, if this is the situation, then very easily, I
can find out the Euclidean distance between the i-th data point lying on the input layer
and capital N number of W values, each W is having small m dimension.

Now, how to do this? Let me explain. As I mentioned, we are going to determine the Euclidean distance between X_i and W_j^i, and this particular Euclidean distance is nothing but \sqrt{\sum_{k=1}^{m}(x_{ik} - w_{jk}^{i})^2}. So, this is nothing but the Euclidean distance between X_i and W_j^i. Now, corresponding to this particular i-th data point, how many W values have we generated? Capital N number of W values. So, how many such Euclidean distances are possible? We have got capital N number of Euclidean distance values, and out of these capital N number of Euclidean distance values, what we do is, we try to find out which one is the closest, that is, which one is having the minimum Euclidean distance value, and that particular W which is having the minimum Euclidean distance is declared as the winning neuron or the winning connecting weight. So, mathematically, the winning neuron or the winning connecting weight is denoted by small n, the Euclidean distance between n and X_i is expressed like this, and this n is considered as the winner; that means, it is having the minimum of these Euclidean distance values.

Now, let me repeat, out of these Euclidean distance values, we try to locate that
particular W_j^I, which is the closest to your X_i and which will give rise to the
minimum Euclidean distance value and that particular W_j^i will be declared as the
winner. Now, the purpose of this particular competition is actually to declare the winner
of this competition, now if you just see on your plot.

402
(Refer Slide Time: 18:53)

So, here, once again corresponding to the i-th one, I am trying to find out the winner,
supposing that, this particular connecting weight is nothing, but the winner. So, this
indicates actually the winner neuron and that is denoted by your small n. So, small n is
nothing, but the winning neuron, ok. Now, this is the winner, now surrounding this
particular winner, there will be some other connecting weights.

Now, for the time being, these particular neurons are not drawn, it is simply the
connecting weights. So, these are simply the connecting weights, the neurons are not
drawn now. So, surrounding this particular connecting weight, we have got a number of
other connecting weights in this neighborhood, ok. So, let me repeat, the purpose of this
particular competition is to declare that winner and once that particular winner we have
got, now we can enter the cooperation stage.
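A small Python sketch of this competition stage might look as follows; the array names and sizes are illustrative assumptions. For the i-th data point X_i, the connecting-weight vector with the minimum Euclidean distance is declared the winner.

```python
import numpy as np

def find_winner(X_i, W):
    """X_i: (m,) input data point; W: (N, m) randomly generated connecting-weight vectors."""
    distances = np.sqrt(np.sum((W - X_i) ** 2, axis=1))   # Euclidean distance to each W_j
    return int(np.argmin(distances))                       # index n of the winning weight vector

rng = np.random.default_rng(0)
W = rng.random((6, 4))                     # 6 candidate weight vectors in 4 dimensions
X_i = np.array([0.3, 0.8, 0.5, 0.1])
print(find_winner(X_i, W))
```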

403
(Refer Slide Time: 20:00)

The next stage is actually the cooperation. In this particular stage, we have already decided the winner through competition, and surrounding that particular winner there will be some excited connecting weights or excited neurons. So, let me just draw it here: this particular neighbourhood is expressed with the help of a mathematical expression, which is nothing but the Gaussian distribution.

Now, if I just draw the equation of one Gaussian distribution, it will look like this. So, this is the Gaussian distribution; it is having the standard deviation, that is, σ_t, and it is having the mean, and the mean is nothing but the property of the winner. So, the mean properties are decided by the property of the weight which has been declared the winner in that particular competition, and the nature of the Gaussian distribution will be decided by the standard deviation σ_t, where t means the t-th iteration. Now, if you see h_{j,n(x_i)}, j is actually the excited neuron or the excited connecting weight and small n is nothing but the winner.

So, the neighborhood function between the j-th excited connecting weight and the winner n at the t-th iteration is nothing but

h_{j,n(x_i)}(t) = \exp\left(-\frac{d_{j,n(x_i)}^2}{2\sigma_t^2}\right).

Now, this particular d_{j,n(x_i)} is nothing but the lateral distance between the winning neuron n and the excited neuron j. So, very easily, you can find out this particular distance.

And, once you have got this, let us concentrate on σ_t. This σ, the standard deviation, is not kept fixed; it is actually a variable, and it will vary from iteration to iteration. Here, we have written \sigma_t = \sigma_0 \exp\left(-\frac{t}{\tau}\right), where this particular τ is nothing but a predefined number of maximum iterations; supposing that τ is kept equal to, say, 1000.

So, this is a fixed number. Now, as the iterations proceed, what will happen to σ_t? As the iterations proceed, small t is going to increase, and as small t increases, σ_t is going to be reduced. And, if σ_t reduces, what will happen to the nature of the distribution? If σ_t reduces, I will be getting a steeper Gaussian distribution.

So, as the iterations proceed, there is a possibility that I will be getting this type of steeper distribution, and as the iterations proceed further, I may get an even steeper distribution; so I will be getting this type of steeper distributions. Now, if I take the plan view corresponding to the first Gaussian, this indicates the plan view for the original one; as the iterations proceed, it is going to be reduced something like this, so this particular neighbourhood is going to shrink. And, there will be a lot of interaction between the winner, which carries the mean properties, and the excited connecting weights or the excited neurons, and through this particular interaction, there will be a chance of updating of both.

Now, this is almost similar to the situation where, in one institute, there are a large number of professors, and under each professor a large number of students are working. These students are like the excited neurons or excited connecting weights, and the professor, as if having the mean properties, is the winner. There will be a lot of interaction between the professor and the students, and through this particular interaction, there is a possibility that the students are going to update their knowledge level, and at the same time, the professor is also going to learn a few new things.

(Refer Slide Time: 25:58)

So, through this particular interaction, both the professor as well as the students are going to learn, and there will be some sort of updating. Now, the principle of updating is very simple. What we do is, we update

W_j^i(t+1) = W_j^i(t) + \eta(t)\, h_{j,n(x_i)}(t)\, [X_i - W_j^i(t)].

And, this particular learning rate η(t) is going to vary in the range of 0 to 1, and through this particular interaction, both the excited neurons as well as the winner are going to be updated; their connecting weights are going to be updated through a large number of iterations.
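The cooperation and updating stages can be sketched as follows; the lattice positions stored in grid, used here for the lateral distance d_{j,n}, as well as the values of η, σ_0 and τ, are assumptions made only for illustration.

```python
import numpy as np

def som_update(X_i, W, grid, winner, t, eta=0.5, sigma0=2.0, tau=1000.0):
    # Gaussian neighbourhood around the winner, with a width that shrinks over iterations,
    # scales how strongly each weight vector W_j is pulled towards X_i.
    sigma_t = sigma0 * np.exp(-t / tau)                   # sigma_t = sigma_0 * exp(-t / tau)
    d2 = np.sum((grid - grid[winner]) ** 2, axis=1)       # squared lateral distances to the winner
    h = np.exp(-d2 / (2.0 * sigma_t ** 2))                # neighbourhood function h_{j,n}(t)
    return W + eta * h[:, None] * (X_i - W)               # W_j(t+1) = W_j(t) + eta * h * (X_i - W_j)

rng = np.random.default_rng(0)
W = rng.random((6, 4))
grid = np.array([[i, 0] for i in range(6)], dtype=float)  # a simple lattice of 6 neurons
X_i = np.array([0.3, 0.8, 0.5, 0.1])
print(som_update(X_i, W, grid, winner=2, t=10))
```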

So, corresponding to that particular i-th data point, if you remember, like on the input
side we have got a number of neurons, so this is the first one, this is actually the i-th one
and this is your N-th one. So, corresponding to this particular ith one, so actually, we are
going to get the updated one and ultimately, corresponding to this, I will be getting
finally, one connecting weight something like this, that is the modified winner. Now, you
repeat the same process for the remaining N minus 1 data points lying on the input layer,
so this is the input layer. So, lying on the input layer, you repeat the process inside a for
loop of the computer program.

406
So, corresponding to each of these particular data points in the higher dimension, I will be getting one modified winner W. So, for each of the data points, I will be getting these particular modified connecting weights. Now, once you have got this type of modified connecting weights, we will have to do the mapping. Let us see how to do this particular mapping.

(Refer Slide Time: 28:38)

Now, let us try to understand the situation. On the input side, or on the input layer, we have got capital N number of data points, and for each of the data points in the higher dimension, I have got the modified winner connecting weight. So, how many such W values have I got? Once again, I have got capital N number of W values.

Now, this W is having m dimensions. So, what we can do is, if I consider that m-dimensional space, its origin is nothing but small m number of 0s. We have got small m number of 0s at the origin, and we have got capital N number of W values corresponding to the capital N number of neurons lying on the input layer.

Now, starting from the origin, I can find out the Euclidean distances of all the W values, all capital N number of W values. Then, how many Euclidean distance values are we going to get? We are going to get capital N number of Euclidean distance values. Let me repeat: we start from the origin in m dimensions, that is, 0, 0, ..., 0, small m number of 0s at the origin.

407
So, I can calculate the Euclidean distances of the capital N number of W's; that means, I can find out capital N number of Euclidean distance values, and once I have got these capital N number of Euclidean distance values, we do the sorting in ascending order.

So, the W which is the closest to the origin will be considered first. Once you have got that particular information of the Euclidean distances in ascending order, what you can do is, consider that particular W which is the closest to the origin, take its Euclidean distance as the radius, and draw one circular arc, something like this, denoted by c_1.

And, the next Euclidean distance value I can use as the radius to draw another circular arc; similarly, I am drawing another circular arc, and how many such circular arcs will I have to draw? I will have to draw capital N number of circular arcs. Now, let me concentrate on the first circular arc: take a point at random lying on the first circle, so let us consider this particular point.

And, now, I know the Euclidean distance between these particular Ws, so this
corresponds to one W, this corresponds to another W, the connecting weight. So, I know
the Euclidean distance between this W and that particular W and its numerical value I
can calculate and considering that as the radius, I can draw another circular arc and
supposing that I am drawing this particular arc.

Now, once you have got this, this is another point; considering this as the centre and considering the next W, I can find out the Euclidean distance, take it as the radius, and draw one circular arc here. So, I will be getting another intersection point; following the same procedure, I will be getting another intersection, another intersection, another intersection, and so on. Once you have got all such intersection points, those are nothing but the points in two dimensions, the lower dimension. So, we are now in a position to map these higher dimensional data to the lower dimension, say two dimensions, for the purpose of visualization.

Now, if you just go back to the diagram which we have considered for the self-organizing map, we will be able to explain this: corresponding to this data point, supposing that I have got one winner; similarly, corresponding to another data point there might be another modified winner, corresponding to this might be another, and corresponding to this might be another. So, the capital N number of data points lying on the input layer will be mapped to the lower dimension on the competition layer for the purpose of visualization.

So, this is the way actually, from the higher dimension, we can do the mapping to the
lower dimension for the purpose of visualization. So, now, we will be in a position to
draw all such neurons here, and each neuron indicates a particular point lying on this
input layer. So, I will be getting here capital N number of neurons and each neuron is
going to represent a particular neuron in the higher dimensional input layer. Now, this is
the way, actually it works.

(Refer Slide Time: 34:48)

Now, I can show one example, a very simple example: this is actually one test function, called Schaffer's test function. If you see its mathematical formulation, the expression for Schaffer's test function is

y = 0.5 + \frac{\sin^2\left[\left(\sum_{i=1}^{4} x_i^2\right)^{0.5}\right]}{\left[1.0 + 0.001\left(\sum_{i=1}^{4} x_i^2\right)\right]^2} - \frac{0.5}{\left[1.0 + 0.001\left(\sum_{i=1}^{4} x_i^2\right)\right]^2}.

So, this is the mathematical expression, and here you see i varies from 1 to 4; that means, this is in 5 dimensions.

409
Now, these 5 dimensional data, we cannot visualize because we can visualize only up to
3 dimensions. Now, what we do? We generate 1000 data points at random, lying on the
surface of this particular test function and these 1000 data points are mapped to the 2
dimensions for the purpose of visualization using the self-organizing map.
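For illustration, here is a short sketch of how such a data set could be generated, assuming the compact form of Schaffer's function reconstructed above; the sampling range of [-10, 10] is an assumption, not stated in the lecture.

```python
import numpy as np

def schaffer(x):
    # Compact form of the reconstructed expression: 0.5 + (sin^2(sqrt(s)) - 0.5) / (1 + 0.001*s)^2
    s = np.sum(x ** 2, axis=1)
    return 0.5 + (np.sin(np.sqrt(s)) ** 2 - 0.5) / (1.0 + 0.001 * s) ** 2

rng = np.random.default_rng(0)
X = rng.uniform(-10.0, 10.0, size=(1000, 4))       # 1000 random four-dimensional inputs
data_5d = np.column_stack([X, schaffer(X)])        # 1000 points (x1, x2, x3, x4, y) for the SOM
print(data_5d.shape)                                # (1000, 5)
```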

So, if there are 1000 data points in the higher dimension, we will be getting 1000 data points in the lower dimension also, and here the topological information will be kept unaltered or intact. These particular data points are well distributed, so there is an ease of visualization: very easily, we can visualize these data points and their relative positions, that is, their topology.

Thank you.

410
Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture - 29
Some Examples of Neural Networks (Contd.)

(Refer Slide Time: 00:15)

We are going to discuss the working principle of another very popular network, which is called the counter-propagation neural network, in short, CPNN. Before we start with this particular network, let me tell you the purpose of developing it: the purpose is once again to model the input-output relationships of a process as accurately as possible.

Now, this was proposed in the year 1987 by Robert Hecht-Nielsen. This network consists of, in fact, 3 layers: one is called the input layer, then we have got the unsupervised Kohonen layer, and we have got one teachable output layer, where we follow the principle of Grossberg learning, which is nothing but a supervised learning.

So, in this network, we are going to consider both supervised as well as unsupervised learning. Now, let us see how it works. Construction-wise, this particular network consists of two models: one is called the in-star model and another is called the out-star model. The in-star model consists of the input and the Kohonen layers, and the out-star model consists of the Kohonen and the output layers.

Now, we are going to discuss, in detail, what is happening in the in-star model and what is happening in the out-star model.

(Refer Slide Time: 02:21)

Now, let us see what is happening there, but before that, let me just tell you a few facts. This particular CPNN, that is, the counter-propagation neural network, is found to be faster than the multi-layered feed-forward network, and here we do not use the concept of the back-propagation algorithm; so, the chance of its solution getting trapped into a local minimum is actually nil. If you see the performance of this CPNN, that is, the accuracy in prediction, it could be a little bit inferior compared to the multi-layered feed-forward network, but it is computationally faster than the multi-layered feed-forward network.

Now, if you see the literature, the CPNN could be either a full CPNN or it could be a
forward only CPNN. So, we are going to discuss the working principle of both full
CPNN as well as the forward-only CPNN. Now, let me first concentrate on the full
CPNN and how does it work? Let us see its working principle.

412
(Refer Slide Time: 03:41)

Now, this is the schematic view of the full CPNN. Construction-wise, it has got two input layers: one input layer here and another input layer here. Then we have got two output layers, this one and this one, and we have got a common hidden layer, denoted by this. So, we have got two input layers, two output layers and one hidden layer.

Now, the in-star model consists of, as I told, the input layers and this particular hidden layer; so this is what we mean by the in-star model. The out-star model consists of the hidden layer and the output layers; so this is nothing but the out-star model. Here, we take the help of some connecting weights, namely u, v, w and s, and these connecting weights lie in a normalized scale; generally, we consider 0 to 1 or from −1 to +1.

Now, here, the purpose of this particular model is to establish the relationships between
the inputs and the outputs and supposing that I have got a process and this process, I am
just going to model, and this process is having say m number of inputs x_1, x_i up to
say x_m. So, this is your x_1, then comes your x_i, and then comes your x_m. So, we
have got m number of inputs and this process is having say small n number of outputs
denoted by y_1, y_k, y_n. So, we have got your y_1, then comes your y_k and the last
one is say your y_n; so we have got n number of outputs.

413
So, this particular process, I am just going to model with the help of your full CPNN.
Now, here, actually what will have to do is, as we have already mentioned, that we will
have to implement the Kohonen network fast. And, we will have to find out, who could
be the winner of these particular neurons lying in the hidden layer, and then we will have
to go for some sort of supervised learning, that is your the Grossberg learning and let us
see, how does it work.

(Refer Slide Time: 06:43)

So, let me concentrate first on this particular the in-star model and I have already
mentioned that in the in-star model, we consider two input layers and the hidden layer.
Now, here, so these two input layers, we are drawing in a slightly different fashion, for
example, say this is one input layer, this is another input layer and this is actually your
the hidden layer.

So, from this particular x, I am just moving towards the hidden layer and from this particular
y, once again we are moving towards the hidden layer; you can see the directions of the arrows,
so these are coming from both the sides and this is nothing but the hidden layer. On the
hidden layer, actually we have got small p number of neurons and as I told, our task will be,
out of this small p number of neurons, just to identify who could be the winner.

Now, the connecting weights between the x input layer and the hidden layer are
nothing but u, and the connecting weights between the y input layer and the hidden

414
layer are nothing but v, and this is actually the construction of this in-star
model. Now, let us see what we do in the in-star model.

(Refer Slide Time: 08:19)

So, we generate the connecting weights, that is, u and v, in the range of say 0 to 1, and we
consider some learning rates, for example, α, the learning rate between the x input layer
and the hidden layer, and β, the learning rate between the y input layer and the hidden
layer. Now, this α and β will lie in a range of say 0 to 1. So, they are going to lie in the
range of 0 to 1, and these particular learning rates will vary in this range.

Now, here, as I mentioned, we are going to use the Kohonen self-organizing map
just to find out who could be the winner lying in this hidden layer.

415
(Refer Slide Time: 09:19)

Now, the way this particular winner is declared is as follows: we try to
find out the Euclidean distance, that is, d_j. Now, what is d_j? So, this is nothing but

$d_j = \sqrt{\sum_{i=1}^{m} (x_i - u_{ij})^2 + \sum_{k=1}^{n} (y_k - v_{kj})^2}$.

Now, if we remember, we have got small m number of x inputs and small n number of y inputs;
this u_ij is nothing but the connecting weight between the x input layer and the hidden layer,
v_kj is nothing but the connecting weight between the y input layer and the hidden layer, and
here j varies from 1 to p. So, p is nothing but the total number of neurons lying in the hidden layer.

So, for each of the hidden neurons, we try to find out, what are these d values and then,
we compare and find out, which hidden neuron is going to give the minimum d value,
that will be declared actually as the winner. Now, here, the connecting weights, that is,
u_ij will have to be updated using this particular rule, that is,

$u_{ij}(\text{updated}) = u_{ij}(\text{previous}) + \alpha \, (x_i - u_{ij}(\text{previous}))$.

Now, here, this α is nothing but the learning rate lying in the range of 0 to 1, and this
particular i takes the values 1, 2 up to small m; as I have already mentioned, this small m
is nothing but the number of x inputs.

Now, v_kj, this particular connecting weight, how to update? In the same way,
$v_{kj}(\text{updated}) = v_{kj}(\text{previous}) + \beta \, (y_k - v_{kj}(\text{previous}))$. Now, this
particular β is once again the learning rate. Now, once that particular process is over,

416
now we are in a position to declare who could be the winner lying on the hidden
layer, and that completes actually one iteration of in-star training.
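
Just to make this in-star step concrete, here is a minimal sketch in Python (the array shapes, variable names and the use of NumPy are my own assumptions, not something given in the lecture); it finds the winner and then adjusts only that winner's columns of u and v, which is also what is done in the numerical example solved later:

import numpy as np

def instar_iteration(x, y, u, v, alpha, beta):
    # x: (m,) x-inputs, y: (n,) y-inputs
    # u: (m, p) weights between x-input layer and hidden layer
    # v: (n, p) weights between y-input layer and hidden layer
    d = np.sqrt(((x[:, None] - u) ** 2).sum(axis=0) +
                ((y[:, None] - v) ** 2).sum(axis=0))   # Euclidean distance d_j
    j = int(np.argmin(d))                              # winner: minimum d value
    u[:, j] += alpha * (x - u[:, j])                   # Kohonen-type update of u
    v[:, j] += beta * (y - v[:, j])                    # Kohonen-type update of v
    return j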

(Refer Slide Time: 12:21)

Now actually, we will have to go for some sort of the out-star model. Now, if you see the
out-star model, the out-star model will look like this and as I told that we have already
got the winner, supposing that your z_j is nothing but the winner and this particular
winner is lying on the hidden layer.

Now, if you see the connecting weights, the connecting weight between this winner
lying on the hidden layer and this x* output layer is denoted by w, and the
connecting weight between the winner hidden neuron and the y* output layer is denoted by s.
We can generate the values of these connecting weights initially at random and we can
update them also. Now, how to update them? We are going to discuss that.

417
(Refer Slide Time: 13:23)

Now, here, if you see, as I have already mentioned, z_j is the winner, the connecting
weights lie between 0 and 1, γ is actually the learning rate between z_j and the x* output
layer, and δ is actually the learning rate between z_j and the y* output layer. Now, if you
just see, between z_j and x*, here we have got one learning rate and here we have got
another learning rate; now, to these particular learning rates we will have to assign some
numerical values, and their values lie between 0 and 1.

(Refer Slide Time: 14:23)

418
Now, let us see, how to carry out this particular training of the out-star model and as I
told that we are going to use the Grossberg learning rule for updating the connecting
weights, that is, w and s. Now, according to this Grossberg learning rule,

$w_{ji}(\text{updated}) = w_{ji}(\text{previous}) + \gamma \, (x_i - w_{ji}(\text{previous}))$, and here i varies from 1 up to m,

then $s_{jk}(\text{updated}) = s_{jk}(\text{previous}) + \delta \, (y_k - s_{jk}(\text{previous}))$.

So, by following this actually, we can update the connecting weights, and once you have
updated them, then at the end, we can take the decision that $x_i^{*} = w_{ji}(\text{updated})$ and
$y_k^{*} = s_{jk}(\text{updated})$, and it completes actually one iteration of this out-star training.
And, once you have completed this in-star training and the out-star training, now you are in a
position to find out what should be the outputs for a set of inputs.
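
Again purely as an illustrative sketch (my own shapes and names, continuing the NumPy arrays of the previous sketch), the out-star step for the winning hidden neuron j and the resulting outputs can be written as:

def outstar_iteration(x, y, j, w, s, gamma, delta):
    # w: (p, m) weights between hidden layer and x* output layer
    # s: (p, n) weights between hidden layer and y* output layer
    w[j, :] += gamma * (x - w[j, :])     # Grossberg update of w for the winner j
    s[j, :] += delta * (y - s[j, :])     # Grossberg update of s for the winner j
    x_star = w[j, :].copy()              # x_i* = w_ji (updated)
    y_star = s[j, :].copy()              # y_k* = s_jk (updated)
    return x_star, y_star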

(Refer Slide Time: 16:07)

Now, to see how to implement this, we are going to solve one numerical
example, but before that, let me try to concentrate on another possibility of this CPNN,
that is, the forward-only CPNN. Now, till now, we have discussed the full CPNN, where
we consider that there are two input layers, two output layers and one hidden layer. Now,
this forward-only CPNN is simpler, in fact; here, we consider only 1 input layer and only
1 output layer, and in between, we have got say 1 hidden layer.

419
Now, if you see the construction of this forward-only CPNN, it looks like this: say we
have got 1 input layer consisting of small m number of neurons like x_1, x_i up to x_m,
so this is nothing but the input layer; we have got small n number of neurons on the
output layer, which indicates y_1, y_k and y_n; and this is the hidden layer having p
number of neurons. In between this input layer and the hidden layer, we have got the
in-star model, and in between the hidden and the output layers, we have got this particular
out-star model.

The working principle is exactly the same. So, what you will have to do is, you will
have to pass the set of inputs, and the connecting weights, like u and v, will have to go on
updating. What we can do is, we can use the Kohonen network, that is, the unsupervised
training, just to find out who could be the winner, and after that, we can take the help of
the Grossberg learning. Once again, we can use both supervised as well as unsupervised
learning just to determine what should be the set of outputs corresponding to these
particular inputs.
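
As a rough sketch of how a trained forward-only CPNN can then be used for prediction (this is my own reading of the scheme, not a worked step from the lecture: the winner is found from the x-side distances alone, and the outputs are read off that winner's out-star weights):

import numpy as np

def forward_only_predict(x, u, s):
    # u: (m, p) in-star weights, s: (p, n) out-star (Grossberg) weights
    d = np.sqrt(((x[:, None] - u) ** 2).sum(axis=0))   # distance to each hidden neuron
    j = int(np.argmin(d))                              # winning hidden neuron
    return s[j, :]                                      # predicted outputs y_1 ... y_n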

Now, as we have already mentioned, this particular CPNN can be used to model
input-output relationships of any engineering process. Now, here, if we compare the
performance of this particular CPNN with back propagation neural network or
multilayered feed forward network or radial basis function network, there is a possibility
that we may not get so much accuracy in this particular CPNN.

But, in CPNN actually there is one advantage, as we do not use the back propagation
algorithm or the gradient-based algorithm, so the chance of the solutions for getting
trapped into the local minima is nil. So, there is no such chance of local minima problem
and that is why, actually many people prefer this particular CPNN, even compared to
multilayered feed-forward network.

Thank you.

420
Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture - 30
Some examples of neural networks (Contd.)

(Refer Slide Time: 00:15)

Now, we are going to solve a numerical example related to a full CPNN. Now, if you see
the network, which I have already discussed, actually this is the schematic view of this
full CPNN. And, let me give you the statement of this particular problem first, for this
CPNN actually we are going to pass through 3 inputs and that is your x_1, x_2 and x_3.

Now, x_1, x_2 and x_3 are nothing but 0.3, 0.5 and 0.6 and their corresponding outputs are
0.3 and 0.4. And, let us consider that there are only two neurons in the hidden layer. So,
we are considering two neurons in the hidden layer here and these x_1, x_2 and x_3 are
the inputs and these are nothing but the outputs. And, here, we consider two such
input layers, one is this, another is this, and the connecting weights u, v, w, s, the way I
discussed, and these are the starred quantities, x* for the x and y* for the y.

Now, let us see the connecting weights. For these neurons, let me just put some numbers
here and then it will be easy. So, let me consider say 1 here, 2 here, 3 here. So, this is
nothing but u_11. So, this connecting weight is nothing but u_11, then this is nothing
but u_21, then comes here, so this is nothing but u_22, and so on. Now, similarly here if

421
I just put 1 and 2, so the connecting weight between this and this is nothing but
v_11, then comes here v_12, and so on. Now, similarly, we have got the
connecting weights: w and s here. Now, these numerical values of the connecting
weights are assumed to be as follows:

(Refer Slide Time: 02:39)

Now, here, we have written all the initial connecting weights like u_11 is 0.2, u_12 is
0.3, and so on. So, this is the way actually we can write down the numerical values for
the connecting weights. Then, v is nothing but this, then comes here, s is nothing but this,
and w, the connecting weight matrix, is nothing but this.

422
(Refer Slide Time: 03:15)

Now, if this is the situation, now, this shows actually the connecting weights and we can
assume that the learning rate values are as follows: For example, say α is nothing but
this, then comes β is 0.3, γ is 0.1, δ is 0.4, and our aim is to calculate x_1^star,
x_2^star, x_3^star, then y_1^star and y_2^star at the end of the first iteration.

Now, here, we are going to solve only one iteration and let us see, how does it work.
Now, this is another view of your this In-star model and these are nothing but the x
inputs, these are the y inputs, so x_1 is 0.3, x_2 is 0.5, x_3 is 0.6 and y_1 is 0.3, and y_2
is nothing but 0.4. And, if you see the connecting weights, so these particular connecting
weights are 0.2, 0.3, and so on and here also, we have got the connecting weights.

Now, the first thing, which we will have to do is, you will have to find the Euclidean
distance from these two hidden neurons and we will have to declare a winner out of these
two. Now, let us see, how to proceed with that particular calculation.

423
(Refer Slide Time: 04:43)

That means, we are going to concentrate on the In-star model. Now, here, in this in-star
model, our aim is to find out this particular distance value. Now, this is

$d_1 = \sqrt{\sum_{i=1}^{3} (x_i - u_{i1})^2 + \sum_{k=1}^{2} (y_k - v_{k1})^2}$.

Now, here, you can see that i varies from 1 to 3; that means, I put x_1 minus u_11 square
plus x_2 minus u_21 square plus x_3 minus u_31 square. So, this is the thing. Next, we
put k equals to 1 to 2. So, I put k equals to 1, so y_1 minus v_11 is nothing but this. Then
k equals to 2, so y_2 minus v_21 is nothing but this and square. And, if we calculate, I
will be getting this particular d_1 as 0.51. Now, on the hidden layer in fact, we have got
two such neurons, so I will have to calculate, in fact, another d value. Now, that is your
d_2.

424
(Refer Slide Time: 06:11)

Now, this particular d_2 is nothing but $d_2 = \sqrt{\sum_{i=1}^{3} (x_i - u_{i2})^2 + \sum_{k=1}^{2} (y_k - v_{k2})^2}$.
Now, expanding the first summation, I will be getting (x_1 minus u_12) square plus (x_2 minus u_22) square
plus (x_3 minus u_32) square, and expanding the second summation, (y_1 minus v_12) square plus
(y_2 minus v_22) square, and if you calculate, you will be getting d_2 equal to 0.44.

Now, if you remember d_1 was your 0.51. And, if I compare this particular d_1 and d_2,
so d_2 is found to be less compared to d_1. So, z_2 is actually the winner. So, through
this competition, z_2 has been declared as the winner. Now, once you have got this particular
winner, we are in a position to update the connecting weights. For example,
$u_{12}(\text{updated}) = u_{12}(\text{previous}) + \alpha \, (x_1 - u_{12}(\text{previous}))$. Now, we substitute the numerical
values and we can find out that this particular u_12 (updated) will be 0.3; that means,
there is no change in this particular u_12 value.

425
(Refer Slide Time: 08:14)

Now, here, if you just go for the updating of the other parameters: u_22
(updated) is nothing but u_22 (previous) plus α into (x_2 minus u_22 (previous)). Now, if
you substitute the numerical values and calculate, u_22 (updated) will be 0.58;
then comes u_32 (updated), which is nothing but u_32 (previous) plus α into (x_3 minus
u_32 (previous)). And, if you substitute the numerical values, you will be getting 0.52.
Similarly, we can find out the updated values for your v, that is, v_12 (updated) is
nothing but v_12 (previous) plus β multiplied by (y_1 minus v_12 (previous)). Once
again, if I substitute the numerical values; so I will be getting say 0.58.

Next, we will have to update this v_22. So, v_22 (updated) is nothing but v_22
(previous) plus β into (y_2 minus v_22 (previous)), and if you substitute the numerical
values, then you will be getting that v_22 (updated) is nothing but 0.33. Now, this is the way
actually, we can find out the updated values.

426
(Refer Slide Time: 09:44)

And, once you have got that. Now, we are in a position to carry out the Out-star model.
Now, here in the Out-star model, so we will have to consider the winner hidden neuron
and that is nothing but your z_2. And, if you see, this is one set of connecting weights,
another set of connecting weights and those connecting weights once again we will have
to update.

(Refer Slide Time: 10:14)

Now, if you update those connecting weights like your w_21 (updated) is nothing but
w_21 (previous) plus γ multiplied by (x_1 minus w_21 (previous)). And if you

427
substitute the numerical values and calculate, w_21 (updated) will be 0.21. Similarly,
w_22 (updated) is nothing but w_22 (previous) plus γ multiplied by (x_2 minus w_22
(previous)) and if you substitute the numerical values you will be getting 0.32. Then,
w_23 (updated) is nothing but w_23 (previous) plus γ into (x_3 minus w_23 (previous))
and if you substitute the numerical values you will be getting 0.42. Then, s_21 (updated)
is nothing but s_21 (previous) plus δ into (y_1 minus s_21 (previous)), and if you
substitute the numerical values you will be getting 0.42.

(Refer Slide Time: 11:30)

Now, the same procedure, we will have to use for updating s_22. Now, s_22 (updated) is
nothing but s_22 (previous) plus δ into (y_2 minus s_22 (previous)) and if you
substitute the numerical values, you will be getting that it is equal to 0.58.

Now, what we do is, those updated values we are going to assign here. So, if you see,
x_1^star is nothing but 0.21, x_2^star is 0.32, x_3^star is 0.42, y_1^star is 0.42 and
y_2^star is 0.58. Now, this is the way actually, we can complete one iteration of this
particular CPNN.

428
(Refer Slide Time: 12:34)

Now, we will have to repeat this and then, through a number of iterations, we will be getting
that particular relationship. Now, till now, we have discussed the working principle
of multi-layered feed-forward network and that is very popularly known as the back-
propagation neural network.

We have also discussed the principle of radial basis function network, then comes, we
discussed recurrent neural network, and after that, we concentrated on the self organizing
map, which works on unsupervised learning. And then, we concentrated on the counter
propagation neural network, that is CPNN, which actually uses the concept of both
supervised as well as your unsupervised learning. And, the main purpose of developing
these particular networks is to establish the input-output relationship; so this problem is
related to data mining, which we can solve using the different types of the neural
networks.

Now, for your further study, you can concentrate on the textbook of this particular
course, that is, Soft Computing: Fundamentals and Applications. So, you can see this
particular textbook for more details.

429
(Refer Slide Time: 14:08)

So, let us summarize what we have already discussed. We have discussed the principle of
the multi-layer feed-forward network, the radial basis function network, and then the
recurrent network. So, these three networks work based on supervised learning.

And, then, we concentrated on the self-organizing map, which is a very efficient tool for
dimensionality reduction or visualization. It is also an efficient tool for clustering, and
using the concept of self organizing map, we have discussed how to design and develop
this counter propagation neural network. And, once again, the purpose is to model the
input-output relationships.

Now, here, we have discussed different types of networks and let me repeat once again.
The purpose of actually designing different types of networks is once again, how to
model the human brain in the artificial way. Now, we have already discussed the
principle of fuzzy logic, we have already discussed the principle of neural networks and
we have seen like how to evolve the fuzzy reasoning tool, fuzzy clustering tool, and all
such things.

And, in the next lecture, in fact, we are going to see how to evolve a particular network,
and what is the basic principle, based on which we can evolve very efficient network, so
that we can do this type of input-output modeling in a very efficient way.

430
Thank you.

431
Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture - 31
Optimal Designs of Neural Networks

Now, we are going to discuss on the topic: Optimal Design of Neural Networks. Now,
we have seen that we can use a multi-layered feed-forward network to model input-
output relationships of an engineering system/process, accurately both in the forward as
well as in the backward directions. But, this particular network does not have an inbuilt
optimization tool. Now, what we will have to do, to ensure the accuracy in prediction,
is to use an optimizer along with this particular network in order to train it.

Now, what we can do, we can use some traditional tools for optimization, just to
optimize this particular network, like we can use steepest descent as our optimization
algorithm, in the form of back-propagation algorithm and we have already discussed this
type of network, which is very popularly known as the back-propagation neural network.
Now, on the other hand, we can also use some sort of nature-inspired optimization tools,
like genetic algorithms, particle swarm optimization, and others, just to optimize or to
evolve the neural networks, which will be able to predict the input-output relationships
very accurately, both in the forward as well as the reverse direction.

Now, here, in this topic, I am just going to discuss how to evolve a multi-layered
feed-forward network using the genetic algorithm-based principle of evolution.

432
(Refer Slide Time: 02:25)

So, if you see the topic, which I am going to discuss, let me repeat once again. We are
going to discuss the principle of evolution of neural networks using a nature-inspired
optimization tool, say genetic algorithm, or, in other words, we are going to optimize the
performance of this particular neural network using one nature-inspired optimization
tool, say genetic algorithm.

(Refer Slide Time: 02:56)

Now, if you see the combined genetic algorithm and neural networks. Now, this is very
popularly known as the Genetic-Neural System and in short, this is nothing, but the

433
GNS. Now, the purpose of developing this genetic-neural system is to optimize the
performance of a neural network with the help of a genetic algorithm. Now, let us see,
how to optimize the performance of a network using the principle of evolution of this
genetic algorithm.

(Refer Slide Time: 03:44)

Now, if you see the performance of a network, it depends on actually a number of things.
For example, it depends on the topology of this particular network, it depends on the
connecting weights; that means, the connecting weights between the input and hidden
layer, and that between the hidden and output layers, it depends on the bias values, it
depends on the coefficient of transfer function of the different layers.

Now, supposing that I am just going to use a multi-layered feed-forward network to
model input-output relationships of a process. And, let me take a very simple example,
supposing that the process is having say 4 inputs. So, it is having 4 inputs and there are
say 3 outputs of this particular process, and I want to model, its input-output
relationships with the help of one multi-layered feed-forward network.

Now, if I concentrate on the input layer. So, there will be 4 neurons. So, these are the
neurons on the input layer, say I have got 4 inputs like I_1, I_2, I_3 and I_4, and there are
3 neurons on the output layer corresponding to the 3 outputs, that is nothing but
O_1, O_2, and O_3.

434
Now, this is the input layer. So, this is nothing, but the input layer of this network and
this is nothing, but the output layer of this particular network. Now, the number of
neurons to be present on the input layer and that to be present on the output layer are
kept fixed to the number of inputs and the number of outputs of the process to be
modeled respectively; that means, on the input layer, there will be 4 neurons; on the
output layer, there will be actually 3 neurons.

Now, the topology or the architecture of this particular network depends on how many
hidden layers we put in between the input layer, and the output layer and how many
neurons we are going to put on the hidden layers. Now, actually that is going to decide,
what should be the topology or architecture of this particular network. Now, the
performance of this particular network largely depends on the topology of the network.
Now, here, actually we are going to discuss two different approaches to develop the
genetic-neural system.

Now, I am just going to concentrate for the time being on approach 1, and that is
nothing, but genetic algorithm-based tuning of connecting weights, bias values and other
parameters. That means, in approach 1, we are going to consider a network having a
known or fixed topology or architecture. So, this is actually nothing but a
network of fixed topology or architecture. Now, if you just keep the architecture or the
topology of the network fixed, its performance depends on the connecting weights,
it depends on the bias values, it depends on the coefficient of transfer function, and so
on.

So, our aim in approach 1 is to determine the optimal values of this particular network,
so that it can perform in the optimal sense. Now, before I go for the principle of that
particular approach 1, I just want to discuss one fact from the biological adaptation.
Now, let us see the principle of biological adaptation.

435
(Refer Slide Time: 08:22)

So, in biological adaptation actually, there are two things; one is the principle of
evolution and the other is the principle of learning. Now, this evolution and learning are
two important parameters in biological adaptation. Now, the same thing actually, we are
going to copy it here, in the artificial way. Now, before I copy it, let us discuss, what is
happening in biological adaptation.

Now, if you concentrate on this particular learning. Now, learning takes place during
one’s lifetime, on the other hand, the evolution takes place through a large number of
generations or a large number of iterations. So, this evolution and learning, these two
parameters are working on two different time scales.

Now, if we concentrate on this particular learning: we go on learning so long as
we live in this particular world and we try to collect good information, good knowledge,
now this particular good information and good knowledge, we want to pass it to the next
generation. Now, if we pass it to the next generation, there is a possibility that it is going
to accelerate the rate of evolution, because the next generations are going to get all the
good information. So, there is a chance that the rate of evolution is going to increase.

Now, on the other hand, if you see, the principle of learning. So, during the learning
actually, one spends a lot of time on carrying out the optimization; that means, for this
learning knowingly or unknowingly we use the principle of optimization. And, if you see
the optimization tool particularly the nature inspired optimization tool. So, in these tools

436
actually, we use the principle of evolution. So, this particular evolution is going to help
learning and learning is also going to help the evolution. So, they are helping each other
and through this particular mutual help, there is a chance that the rate of biological
adaptation is going to increase. The same thing has been copied here, in this particular
genetic-neural system.

Now, what we do is, we use some evolution tool, say a genetic algorithm (GA); we use
some learning tool, for example, say a neural network (NN); and in this combined tool,
that is, this particular GA and neural network, what we do is,
we try to optimize the neural network, or we try to improve the performance of the
neural network, and we try to actually design and develop, what is known as the
combined GA-NN technique and that is nothing, but genetic-neural system.

Now, there are many applications of this particular genetic neural system. For example,
say in the field of robotics, there are a lot of applications like how to evolve, the adaptive
controller for an intelligent robot, how to evolve an adaptive controller for a particular
motor used in robots. Now, the principle of evolution has been used in robotics and this
works based on actually the principle of genetic neural system, and a new field of robotic
research has started, that is known as the evolutionary robotics.

Now, in evolutionary robotics, the main aim is to evolve the suitable motion planner or
the adaptive controller instead of going for the direct design of this particular controller
or the motion planner. Now, here actually, we use the principle of evolution instead of
going for the direct design. Now, we actually realize one fact, that through this direct
design many things, we are unable to foresee beforehand. And, that is why, actually we
will have to take the principle of evolution for the development of an efficient system.

Now, let me see, how to use and how to develop this genetic neural system. Now, I have
already mentioned this GA-neural network and I have also discussed the
back-propagation neural network. Now, let me repeat: in the back-propagation neural
network, for optimizing, we use the back-propagation algorithm, that is, the BP algorithm.
And, this BP algorithm works based on the steepest descent algorithm; this we have
already discussed. Now, if I compare the performance of this particular BPNN and the
GA-NN, this BPNN works based on the steepest descent algorithm; so, there is a chance

437
of the algorithm for getting trapped into the local minima problem. And, the chance of
getting trapped in the local minima is much less in case of this particular GA-neural network.

Now, actually if you see the literature on genetic algorithm, we have got the different
versions of genetic algorithm. Now, those things actually, I am not going to discuss, in
details in this particular course, but this is available in the textbook used for this
particular course. Now, if you see the literature, the genetic algorithm the first version of
genetic algorithm is nothing, but the binary coded genetic algorithm, that is, your BCGA.
Now, in BCGA actually, what you do is, all the variables we represent with the help of
some binary numbers and binary is nothing, but a combination of 1’s and 0’s.

Now, here let me take a very simple example, supposing that I am just going to optimize
a neural network. And, let me assume that there are say 20 variables in this particular
network. So, if I want to optimize the performance of this particular neural network,
what do you will have to do is, you will have to find out the optimal values for each of
these particular 20 variables. Now, the variables could be the connecting weights bias
values, the coefficients of transfer functions and all such things. Now, here, if we use the
binary coded GA, to represent each of these particular variables, we will have to use a
number of bits, let me assume that, we are going to use say 10 bits to represent each of
these particular variables.

Now, so, if I just concentrate on a particular GA string, that is, a binary-coded GA
string, it looks like this: say 10 11 dot dot dot and the last term might be say 10.
Now, if there are 20 variables and if we use 10 bits for each variable, we will have 20
multiplied by 10; that means your 200 bits in one GA string. So, here, I have got in fact,
200 bits. Now, out of 200 bits, supposing that the first 10 bits are used to represent a
particular variable, the next 10 bits are used to represent another variable, and so on.

Now, the moment we use 10 bits, we are going to divide the range of this particular
variable into actually $2^{10} - 1 = 1024 - 1 = 1023$ equal divisions. Now, for one variable, we
have got the 1023 equal divisions; that means, we have got say 1024 numerical values on
the range of this particular variable.

Similarly, on the second variable, within the range we have got another 1024 numerical
values, on the third we have got another 1024 numerical values and this will go on up to
your say 20th variable. Now; that means, if we see the total number of combinations of

438
the numerical values, which have to be considered before the GA can decide the optimal
solution is nothing but $1024^{20}$.

So, this is actually a large number. So, 1024 raised to the power 20 is a large number; that
means, the GA will have to carry out some search for many combinations of the input
variables or the design variables, before it can declare, that this is a globally optimal
solution. So, this is a very difficult task for the binary coded GA and this problem, in
genetic algorithm is known as the permutation problem. Now, if there are a large
number of variables, this binary-coded GA actually can suffer from this particular
permutation problem.

Now, the point, which I just want to make it clear, that if I have got a very large number
of variables, it is better not to use this binary coded GA. And, in place of the binary
coded GA, in fact, we can go for some sort of real coded GA. Now, once again, a
detailed discussion on a real coded GA is beyond the scope of this particular course, but
this thing is available in details in the textbook for this course, that is, soft computing:
fundamentals and applications.

So, we are going to use, say, the real coded GA just to overcome this particular
permutation problem, but here, for simplicity, I am just going to discuss the binary-coded
GA only; that means, how to use a binary-coded GA to design or evolve a
suitable neural network, so that it can predict the input-output relationships very accurately.

(Refer Slide Time: 20:24)

439
Now, let us try to concentrate on approach 1, approach 1 we have already discussed and
now, I am just going to discuss some numerical examples also, after some time. And, let
me concentrate a little bit on the principle of approach 2. Now, in approach 2, actually,
what we do is, we generally go for the genetic algorithm based tuning of the neural
network topology. Now, this I have already mentioned that the topology of this particular
network, or the architecture of this particular network, depends on the number of hidden
layers and the number of neurons you put under the hidden layer.

Now, if I give this particular task to this genetic algorithm, the binary-coded GA is
going to face one problem, and the problem is as follows. Now, the number of neurons on the
hidden layers and the number of hidden layers have to be encoded inside that
particular GA-string. That means, if I use the binary-coded GA, this particular GA string
is also going to encode the information related to the number of hidden layers, which we are
going to use, and the number of neurons to be present in each particular layer.

From one GA string to the next GA string, the number of hidden layers and the number
of neurons to be present in the hidden layer are going to vary; that means, we are going
to face a problem that is called the variable string length genetic algorithm. Now, let me
explain: supposing that in the population of GA, we have got a large number of solutions,
whose size is denoted by N, that is, the population size, and we can see here I have got
all such binary strings. So, these are all binary strings here. Now, here, if I consider
this approach, whose aim is to optimize the topology or architecture of the network.

So, there is a possibility that I will be getting one GA string and might be it is having the
length say 100. So, the GA string may look like this. So, there are say 100 bits, here say
100 bits, at the next GA string it may have say 120 bits. So, there could be another GA
string here, and which may have say 120 bits. Now, one GA string is having 100 bits,
another is having 120 bits.

Now, if I just go for the crossover operator, now for this particular binary coded GA, we
are going to face a lot of problems because this particular GA string is having 100 bits
and this particular GA string is having 120 bits. So, for the last 20 bits actually we cannot
easily do the crossover operation. So, we are going to face a lot of problems in crossover,
particularly if we use the binary-coded GA.

440
Now, that is actually a problem related to the variable string length genetic algorithm.
Now, there are some ways to overcome this particular problem and we have got a special
type of GA, that is called the messy GA. The messy GA is another very popular
GA, where we consider the variable string lengths during the crossover operation. Now,
this is once again beyond the scope of this particular course. So, this messy GA actually I
am not going to discuss in details.

(Refer Slide Time: 24:40)

Now, let us try to concentrate on this multi-layered feed forward network. Now, as I
discussed several times, this network is having say 3 layers: the input layer, then the
hidden layer and the output layer. Now, this input layer is having M neurons, the hidden
layer is having N neurons, and the output layer is having P neurons.
And, let me use the linear transfer function on the input layer, the log sigmoid transfer
function on the hidden layer and tan sigmoid transfer function on the output layer.

Now, this particular network I want to optimize or this particular optimal network we
want to evolve with the help of a genetic algorithm. So, what you will have to do is, all
the design parameters, like the connecting weights between the input layer, and the
hidden layer, and the connecting weights between the hidden layer, and the output layer
we will have to optimize. We will have to optimize the coefficient of the log sigmoid
transfer function, the coefficient of tan sigmoid transfer function, and for simplicity I did

441
not consider any bias value here, but if I do consider the bias values, then those bias
values also we will have to optimize.

And, the working principle of this multi layered feed forward network, I have already
discussed in details. So, I am not going for that once again. Now, instead what I am
going to discuss, how to represent this particular network inside one GA string.

(Refer Slide Time: 26:37)

Now, before I discuss, let me recapitulate that the error in prediction of the k-th output neuron is
nothing but this particular expression, that is, $E_k = \frac{1}{2}(T_{ok} - O_{ok})^2$. Then, the total
error in prediction considering all the output neurons is nothing but
$E = \sum_{k=1}^{P} \frac{1}{2}(T_{ok} - O_{ok})^2$. So, by using this, we can find out what is the total error in
prediction.

442
(Refer Slide Time: 27:25)

Now, here, actually our aim is to represent this network with the help of a binary coded
GA string and we are going to evolve or we are going to optimize this particular
network. Now, this is a particular GA string of the binary-coded GA. Now, the first few
bits are going to represent what should be the connecting weight V_11, and so on up to the
last V connecting weight, that is, V_MN; then come the W connecting weights. To represent
each of these connecting weights, I will have to assign a few bits.

Then to represent a_1 that is the coefficient of transfer function for the log sigmoid
transfer function, then a_2 is the coefficient of transfer function for the tan sigmoid
transfer function. So, we will have to assign the bits. Now, this particular GA string is
going to carry information for the whole network and supposing that we know the
architecture. And, this optimization, we are doing for say, the fixed architecture or the
fixed topology. Now, if we concentrate on the GA string, it is going to carry the whole
information of this particular network.
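
To picture how such a string can be decoded back into network parameters, here is a small sketch (the bit counts, the ordering of the variables in the string, the ranges and the helper names are all my own illustrative assumptions):

def decode_substring(bits, lo, hi):
    # linear mapping rule: lo + (hi - lo) / (2^l - 1) * decoded value
    value = int("".join(str(b) for b in bits), 2)
    return lo + (hi - lo) / (2 ** len(bits) - 1) * value

def decode_string(string, bits_per_var, M, N, P):
    # assumed layout: M*N values of V, then N*P values of W, then a_1, then a_2
    chunks = [string[i:i + bits_per_var]
              for i in range(0, len(string), bits_per_var)]
    weights = [decode_substring(c, 0.0, 1.0) for c in chunks[:M * N + N * P]]
    V = [weights[i * N:(i + 1) * N] for i in range(M)]                    # M x N matrix
    W = [weights[M * N + j * P:M * N + (j + 1) * P] for j in range(N)]    # N x P matrix
    a1 = decode_substring(chunks[M * N + N * P], 0.5, 2.0)
    a2 = decode_substring(chunks[M * N + N * P + 1], 0.5, 2.0)
    return V, W, a1, a2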

And, similarly, in the population of genetic algorithm, we have got a large number of
strings; that means a large number of neural networks. Now, if you see so, if this is the
population of GA string, the first GA string could be something like this, second one
could be something like this, and then the last the n-th one could be something like this.
So, each of these particular GA strings is going to carry information of this particular the
network.

443
And, then, as we discussed the principle of genetic algorithm, in short, that we will have
to find out what should be the fitness, for this particular GA string, what should be the
fitness for these particular GA strings, the fitness for the n-th GA string. And, once you
have got this fitness information. Now, you can use the operators like reproduction
crossover and mutation and through a large number of iterations, the GA will try to find
out, that particular network, which will be able to predict actually this input-output
relationship very accurately.

(Refer Slide Time: 30:15)

Now, how to define this particular fitness? To define the fitness, what we do is, we use the
concept of the batch mode of training. Now, the principle of the batch mode of training, we
have already discussed; that means, we pass a large number of training scenarios denoted by
capital L, and we consider all the outputs, that is, the capital P number of outputs, and we
try to find out this average error after passing all capital L training scenarios. And, this is
nothing but the fitness of the GA string, denoted by

$f = \frac{1}{L} \, \frac{1}{P} \sum_{l=1}^{L} \sum_{k=1}^{P} \frac{1}{2} (T_{okl} - O_{okl})^2$.

Now, here, this O_okl is nothing but the output of the k-th neuron lying on the output layer
corresponding to the l-th training scenario. Similarly, this T_okl is nothing but the target
output of the k-th neuron lying on the output layer,
corresponding to the l-th training scenario. Now, this is the way actually, we calculate
the fitness of a particular GA string. And as I told that once you have got the fitness

444
information for the whole population; now we are in a position to discuss, how it can
evolve that optimal network?
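
A compact way of writing this fitness calculation, again only as a sketch under my own assumptions (predict stands for whatever forward pass the decoded GA string defines), is:

def fitness(predict, training_scenarios):
    # training_scenarios: list of (inputs, targets) pairs, l = 1 ... L, each with P targets
    total = 0.0
    for inputs, targets in training_scenarios:
        outputs = predict(inputs)
        total += sum(0.5 * (t - o) ** 2 for t, o in zip(targets, outputs))   # k = 1 ... P
    L = len(training_scenarios)
    P = len(training_scenarios[0][1])
    return total / (L * P)          # average error; the smaller, the better the string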

Now, here, as GA works based on the principle of evolution, now there is a possibility
that the GA will find out multiple optimal solutions for this particular network. Now, if we
get multiple optimal solutions, we can actually use any one of them for this particular
network.

(Refer Slide Time: 32:43)

So, this network will be able to predict the input output relationships very accurately.

Thank you.

445
Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 32
Optimal Designs of Neural Networks (Contd.)

(Refer Slide Time: 00:15)

Now, we have discussed, how to encode the number of hidden layers, number of neurons
to be present in each of the hidden layers, that is, the topology or the architecture of
the network and how to encode the connecting weights, the coefficient of transfer
function, the bias values inside that particular the GA-string. So, ultimately the GA-
string is going to carry the complete information of the neural network; that means, it
will carry the information related to topology, it will carry the information related to the
connecting weights, then the information related to the bias value, the coefficient of
transfer function.

So, this particular GA string is going to carry the whole information of this particular
network and we are going to take the help of the batch mode of training to optimize or to
evolve this particular network. Now, this shows actually one flowchart or the schematic
view like how does it work. Now, let me explain with the help of this particular
flowchart the working principle of this genetic-neural system.

446
Now, as I told, the genetic algorithm is nothing but a population-based approach. So, I
have got a population of size N and this particular population of solutions is generated at
random. Supposing that the first solution is something like this, the second solution is
something like this and the last solution is something like this. Now, if I concentrate on a
particular the GA string that is the first GA string, it will carry the full information or the
whole information of this particular network.

Now, let us see, how to implement the batch mode of training for this type of network.
So, I am just going to discuss the batch mode of training of this particular network. Now,
GA starts with a population of solutions and we create the initial population at random;
we set the generation number equal to 0. Now, here, we have got a check whether the
generation is greater than or equal to the maximum number of generations. Now, if it is
yes, that is the end of the algorithm and if it is no so, we concentrate on the first GA
string; that means, we are going to concentrate on the first GA string and we put GA
string equals to 0.

And, here there is another check, whether the GA string is greater than the population
size. Now, if it is no, then we start with the training case, that is, the first training case;
supposing that we have got some training scenarios and the total number of training
scenarios, let me consider I have got capital L number of training scenarios. Now, a
particular training scenario carries information of the input and the output, similarly, we
have got capital L number of training scenarios. Now, corresponding to the first GA
string, we are going to concentrate on the training cases or the training scenarios.

Now, here, corresponding to the first GA string my neural network is ready and I am
passing all the training scenarios one after another. For example, say here I have got a
check whether the training scenario or the case is greater than the maximum case, that is,
the maximum number of training scenarios. Now, if it is no, then we calculate the output of
the neural network. So, as I told that this particular neural network is indicated by this
particular GA-string.

So, we will be getting the output of this particular network and we use case equals to
case plus 1; that means, I am just going to pass all the training scenarios one after
another. The moment it satisfies this particular condition, so, what we do is, we calculate
the fitness of the GA string; that means, after passing all the training scenarios all capital

447
L training scenarios, we try to consider the fitness, we try to calculate the fitness of this
particular GA string and supposing that the fitness of the GA string is denoted by f_1 and
so, here we have got GA string equals to GA string plus 1; that means, we go for the
second GA string; that means, we are going to concentrate on this particular the second
GA string.

And, once again, for this particular the second GA string, my network is ready. Once
again, we will pass all the training scenarios and we will be getting this particular f_2
and this particular process will go on and go on and the moment it reaches this particular
criterion, that is GA string is greater than the population size; the fitness information for
the whole population is ready for us; that means, we have got the fitness information,
that is f_1, f_2 up to your f_n.

And, once you have got the fitness information for the whole population, now we are
going to modify this particular population of solutions using the operators like
reproduction, crossover and mutation. Now, the principle of reproduction, crossover and
mutation; these things actually, I have discussed in some of the earlier lectures. Now, this
particular process will go on and go on and this will complete actually one iteration or
one generation of this particular GA and the GA through a large number of iterations
will try to find out some optimal design of these particular networks.

And, as I told that there is a possibility that you will be getting multiple solutions; that
means, the multiple optimal neural networks and you can use any one out of these
multiple optimal neural networks for predicting the input-output relationships. Now, this
is the way actually the genetic-neural system works, and with this, the working principle of
the genetic-neural system we have already discussed.
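
The loop structure just described can be sketched roughly as follows; note that tournament selection, one-point crossover and bit-flip mutation are used here only as simple stand-ins for the reproduction, crossover and mutation operators mentioned in the lecture, and evaluate is assumed to return the average prediction error of the network encoded by a string:

import random

def one_point_crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def bit_flip_mutation(s, prob=0.01):
    return [bit ^ 1 if random.random() < prob else bit for bit in s]

def select(population, fitness_values, k=2):
    # smaller fitness (average error) is treated as better here
    i = min(random.sample(range(len(population)), k), key=lambda n: fitness_values[n])
    return population[i]

def run_ga(evaluate, string_length, pop_size=20, max_generations=50):
    population = [[random.randint(0, 1) for _ in range(string_length)]
                  for _ in range(pop_size)]
    for _ in range(max_generations):
        fitness_values = [evaluate(s) for s in population]   # batch mode over all L cases
        children = []
        while len(children) < pop_size:
            p1, p2 = select(population, fitness_values), select(population, fitness_values)
            c1, c2 = one_point_crossover(p1, p2)
            children += [bit_flip_mutation(c1), bit_flip_mutation(c2)]
        population = children[:pop_size]
    return min(population, key=evaluate)                      # best evolved GA-string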

448
(Refer Slide Time: 08:01)

And, now, what we are going to do is, we are going to solve one numerical example just to
make it more clear. Let me give the statement of this numerical example first.

A binary-coded genetic algorithm is used to update the connecting weights and coefficients
of transfer function of a neural network, as shown below. So, this is actually a very
simple network having three layers – input layer, hidden layer and output layer. On the
input layer, there are two neurons; on the hidden layer, there are three neurons; and on the
output layer, there is one neuron. So, this is nothing but a 2-3-1 fully connected network,
and let us see how to optimize this particular network with the help of a binary-coded
genetic algorithm.

449
(Refer Slide Time: 09:14)

Now, the rest of the statement of the problem is as follows. The neurons of the input,
hidden and output layers are assumed to have transfer functions of the form $y = x$,
that is, the linear transfer function for the input layer; $y = \frac{1}{1 + e^{-a_1 x}}$,
that is, the log-sigmoid transfer function for the hidden layer; and
$y = \frac{e^{a_2 x} - e^{-a_2 x}}{e^{a_2 x} + e^{-a_2 x}}$, that is, the tan-sigmoid transfer
function for the output layer.

And, the connecting weights v and w we are going to vary in the range of 0 to 1, the bias
value will vary in the range of 0.001 to 0.01 and the coefficient of transfer function, that
is a_1, a_2 are going to vary in the range of 0.5 to say 2.0. Now, we are going to show
one training scenario and for this particular training scenario actually, we will have to
find out what should be the output and that output we will have to compare with the
target output just to find out the deviation in prediction.

450
(Refer Slide Time: 10:53)

Now, the training scenario, as I told is something like this and the training scenario is
nothing, but the input-output relationship, now there are capital L training scenarios, the
out of that the first one is as follows. If I_1 is 0.6 AND I_2 is 0.7, then output is 0.9. So,
this is nothing, but the input-output relationship.

Now, if you concentrate on the GA string the GA string will carry information of this
particular network; that means, your this 2-3-1 neural network and it is having the fixed
architecture and we are going to use 1 2 3 4 5, 5 bits to represent each of these particular
the design variables. For example, 5 bits are used to represent v_11; the next 5 bits are
used to represent v_12, and so on. So, this particular GA string is going to carry the full
information or the whole information of a fixed architecture, that is, 2-3-1 neural
network. And, as I told that our aim is to determine the deviation in prediction, for this
particular the training scenario.

Now, let us see, how to find out that particular deviation in prediction.

451
(Refer Slide Time: 12:34)

Now, to determine the deviation in prediction, the first thing you will have to do is to
find out the decoded values corresponding to this particular GA string.

(Refer Slide Time: 12:53)

For example, say if I just want to find out the decoded value corresponding to this
particular 5 bits, which are used to represent the v_11. So, this is nothing, but 10110. The
place values are as follows: 2 raised to the power 0, 2 raised to the power 1, 2 raised to
the power 2, 2 raised to the power 3, 2 raised to the power 4 and the decoded value is

452
nothing but $1 \times 2^4 + 1 \times 2^2 + 1 \times 2^1$; that is, 16 plus 4 plus 2. So, the
decoded value is nothing but 22.

So, the decoded value corresponding to these 5 bits used to represent v_11 is nothing, but
is your 22 and once you have got the decoded value, now we can use the linear mapping
rule. We can use a linear mapping rule to find out the real value corresponding to that
binary sub-string. Now, what you do is, now, we have already discussed the decoded
value of that and we know that the range is nothing, but 0.0 to 1.0.

So, using the linear mapping rule, which I have already discussed, we can find out the real
value corresponding to this v_11. Here, v_11 minimum is 0.0, v_11 maximum is 1.0, l is the
number of bits used to represent the variable, that is, 5, and the decoded value is 22.

Now, if I just write down this rule, it is

$v_{11} = v_{11}^{\min} + \frac{v_{11}^{\max} - v_{11}^{\min}}{2^{l} - 1} \times \text{decoded value}$.

So, this is nothing but the linear mapping rule. Now, here, small l is nothing but 5,
because we are using 5 bits to represent v_11. So, very easily, we
can substitute all the numerical values and we can find out what should be the real value
corresponding to your v_11.
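
So, just as an illustration of this rule with the numbers obtained above (this particular value is not quoted in the lecture itself), the decoded value 22 and the range 0.0 to 1.0 give

$v_{11} = 0.0 + \frac{1.0 - 0.0}{2^{5} - 1} \times 22 = \frac{22}{31} \approx 0.7097$.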

453
(Refer Slide Time: 15:59)

Now, the same principle actually we can use to find out the decoded value and the real
value for each of the variables. Now, for this particular v_11, I have already discussed
how to find out the real value. Now, following the same principle, you can find out the real
values for each of these particular connecting weights v_11, v_12, v_13, v_21, v_22, v_23,
w_11, w_21 and w_31, and we can find out their corresponding real values.

Now, consider this particular a_1, that is, the coefficient of transfer function for
the log sigmoid transfer function. Here, the range is 0.5 to 2.0 and once again, by
following the same principle you can find out what should be the real value. Similarly,
corresponding to a_2, I can find out the real value, for b, I can find out the real value and
the range for b is nothing, but 0.001 to 0.01.

So, I can find out the real values for each of these particular variables and once you have
got the real values, my network is ready and once this particular network is made ready, I
can pass actually the training scenarios, that is, your the known input-output
relationships.

454
(Refer Slide Time: 17:38)

For example, if I pass the set of inputs, I will be getting the output; let us see how it
works. It is very simple. So, this I_O1, that is, the output of the first neuron
lying on the input layer, is nothing but I_I1, the input of the first neuron
lying on the input layer, plus b, the bias value, and if you just substitute the numerical
values, you will be getting this I_O1.

Similarly, I will be getting I_O2; that is, the output of the second neuron lying on the
input layer is nothing but the input of the second neuron lying on the input layer plus the
bias value, and if you substitute the numerical values, you will be getting this as I_O2.

455
(Refer Slide Time: 18:51)

And, once you have got this particular thing, now, actually very easily I can find out,
what should be the input of the hidden neurons.

So, consider H_I1, that is, the input of the first neuron lying on the hidden layer, where I am
also adding some bias value b. So, this H_I1 is nothing but I_O1
multiplied by v_11 plus I_O2 multiplied by v_21 plus the bias value, so I will be getting
this particular input. Now, similarly, I can find out the input of the second neuron lying on
the hidden layer plus the bias value; I can also find out the input of the third neuron lying
on the hidden layer plus the bias value. And, once you have got these particular inputs of
the hidden neurons, then by using the transfer function, very easily, I can find out what
should be the output.

456
(Refer Slide Time: 19:57)

Now, so, this H_O1 is nothing, but the output of the first neuron lying on the hidden
layer, then comes here H_O2 is the output of the second neuron lying in the hidden layer
and that is coming out to be equal to 0.791418. Similarly, this H_O3, that is, the output of the
third neuron lying on the hidden layer, is coming out to be equal to 0.650545, and
once you have got the output of the hidden neurons, very easily we can find out what
should be the input of the neuron lying on the output layer.

So, this O_I1 is nothing, but the input of the first neuron lying on the output layer and we
are adding the bias value and this O_I1 is nothing, but H_O1 multiplied by w_11, H_O2
multiplied by w_21, H_O3 multiplied by w_31 plus b. And, if you substitute the
numerical values, we will be getting the numerical value something like this and this is
nothing, but the input of the output layer.

457
(Refer Slide Time: 21:39)

And, on the output layer, we have got some transfer function and by using that so, very
easily we can find out what is your O_O1 that is nothing, but the output of the first
neuron lying on the output layer, and we can use this tan-sigmoid transfer function and we
will be getting the calculated output, which is nothing, but 0.981265.

Now, this calculated output we will have to compare with the target output and this
target output is nothing, but is your 0.9 and we can find out the deviation in prediction,
that is nothing, but 0.9 minus 0.981265, so, this is actually nothing, but the deviation. So,
the deviation is coming to be negative here. So, deviation is minus 0.081265. Now, this
is what happens after passing actually the first training scenario. Similarly, we are going
to pass the second training scenario, the third training scenario, and so on; all L training
scenarios, we are going to pass one after another.
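To make this forward pass concrete, here is a minimal Python sketch of the same sequence of computations; the weight, coefficient and input values are placeholders, and the exact forms assumed for the log-sigmoid and tan-sigmoid transfer functions with coefficients a_1 and a_2 are the standard ones, not values taken from the slide.

```python
import math

def log_sigmoid(x, a):
    # Log-sigmoid transfer function with coefficient a (assumed form).
    return 1.0 / (1.0 + math.exp(-a * x))

def tan_sigmoid(x, a):
    # Tan-sigmoid transfer function with coefficient a (assumed form).
    return math.tanh(a * x)

# Placeholder values; in the lecture these come from the decoded GA string.
I_1, I_2 = 0.5, 0.7           # inputs of one training scenario
T_O1 = 0.9                    # target output of that scenario
v = [[0.2, 0.4, 0.6],         # v[i][j]: (i+1)-th input neuron to (j+1)-th hidden neuron
     [0.3, 0.5, 0.1]]
w = [0.7, 0.2, 0.9]           # w[j]: (j+1)-th hidden neuron to the output neuron
a_1, a_2, b = 1.2, 1.5, 0.005

# Input layer: linear transfer function with the bias value added.
I_O = [I_1 + b, I_2 + b]

# Hidden layer: weighted sum plus bias, then the log-sigmoid transfer function.
H_I = [I_O[0] * v[0][j] + I_O[1] * v[1][j] + b for j in range(3)]
H_O = [log_sigmoid(x, a_1) for x in H_I]

# Output layer: weighted sum plus bias, then the tan-sigmoid transfer function.
O_I1 = sum(H_O[j] * w[j] for j in range(3)) + b
O_O1 = tan_sigmoid(O_I1, a_2)

deviation = T_O1 - O_O1       # deviation in prediction for this scenario
print(O_O1, deviation)
```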

Now, this particular deviation, it could be either positive or negative and that is why,
actually, what we do is, we try to consider the mod value of this particular deviation
and the mod value of this deviation will be positive. So, for each of these particular
training scenarios or the training cases, we will be getting some deviation and then, we
find out what should be the average deviation and that particular average deviation will
be the fitness of the GA and as we discussed, for each of the GA strings, we can
find out the fitness information. Now, we can use the operators like reproduction,
crossover and mutation. And, the GA, through a large number of iterations, will try to
actually evolve that particular network, which can predict the input output relationships
in a very accurate way.
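A compact sketch of this fitness calculation could look as follows; the deviation values are purely illustrative.

```python
def ga_fitness(deviations):
    # Average of the mod (absolute) values of the deviations collected over all
    # L training scenarios; the GA tries to minimize this quantity.
    return sum(abs(d) for d in deviations) / len(deviations)

# Hypothetical deviations, one per training scenario.
print(ga_fitness([-0.081265, 0.042, -0.015, 0.007]))
```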

(Refer Slide Time: 24:08)

Now, if you see the reference like whatever we have discussed on genetic-neural system,
the same thing has been discussed in detail in the textbook of this particular course, Soft
Computing: Fundamentals and Applications, written by me. So, this is the reference for
this genetic-neural system.

(Refer Slide Time: 24:34)

And, now to conclude or to summarize whatever we have discussed, we have discussed
the principle of the genetic neural system, whose main purpose is to evolve a suitable
neural network, which can predict the input-output relationship of a process very
accurately. Now, the working principle, we discussed in detail and after that, we have
solved one numerical example to make it clearer, and I hope that you have understood
the working principle of the genetic-neural system and that, by solving the numerical
example, this particular concept has become clearer.

Thank you.

Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture - 33
Neuro - Fuzzy System

Now, we are going to discuss how to combine Neural Networks with Fuzzy Logic, just to
develop a Neuro-Fuzzy System. Now, this particular neuro-fuzzy system has got a number of
practical applications. Now, here actually what we are going to do is.

(Refer Slide Time: 00:37)

We are going to concentrate on two Neuro-Fuzzy systems; one is based on the Mamdani
approach and another is based on the Takagi and Sugeno’s approach. Now, both Mamdani
approach and Takagi and Sugeno’s approach, we have discussed in details and we have
solved some numerical examples.

Now, let us see, how to develop the neuro-fuzzy system for this Mamdani approach and
Takagi and Sugeno’s approach?

(Refer Slide Time: 01:11)

Now, let us concentrate on this combined neural network and fuzzy logic. Now, as I told that
this Neuro-Fuzzy System, actually, in short, this is known as NFS, here, the purpose is to
improve the performance of this fuzzy reasoning tool and what we do is, the fuzzy
reasoning tool or fuzzy logic controller, we represent using the structure of a neural network
and this network is trained using either a BP algorithm or a genetic algorithm or any other
nature-inspired optimization tool. But, basically the main purpose of developing the neuro-
fuzzy system is to design and develop the fuzzy reasoning tool or fuzzy logic controller.

Now, if you see this literature, neural network and fuzzy logic have been combined, in fact, in
two different ways; one is called the neuro-fuzzy system and another is called the fuzzy
neural network.

(Refer Slide Time: 02:27)

Now, if you see the fuzzy neural network, now here, actually what we do is? We try to
represent the neurons of a neural network using the concept of the fuzzy set theory. And,
in a fuzzy neural network, actually there are 3 different ways through which we can develop it:
we can consider the real inputs and fuzzy weights, this is one way of developing the fuzzy
neural network, then comes we can consider the fuzzy inputs, but real weights and we can
also consider fuzzy inputs and fuzzy weights. Now, let me repeat.

So, this fuzzy neural network can be implemented in three different ways, as I told. But,
unfortunately, this fuzzy neural network could not reach much popularity, on the other hand,
the neuro-fuzzy system, that is your NFS could reach a huge popularity just to solve the
different types of the real world problems and that is why, actually in this course, I am just
going to consider the neuro-fuzzy system.

(Refer Slide Time: 03:57)

Now, this Neuro-Fuzzy system, as I told that this is known as NFS and here, I am just going
to discuss the neuro-fuzzy system based on the Mamdani approach. Now, let us see the
Mamdani approach, a fuzzy reasoning tool or fuzzy logic controller, how to represent using
the structure of a neural network and so, that we can develop the neuro-fuzzy system. But, let
me repeat once again. The main purpose of developing this neuro-fuzzy system is to design
or evolve rather one very efficient fuzzy reasoning tool, which can perform the input output
modeling.

Now, if you remember, in Mamdani approach, we have got a few steps. For example, say we
try to identify the inputs and outputs. Now, once those inputs and outputs are identified, then
we actually go for some sort of fuzzification. And, once we have got the fuzzified values for
the input parameters, then we, in fact, implement the inference, the fuzzy inference. And,
through this particular fuzzy inference, we get some outputs and those outputs are nothing, but
the fuzzified outputs. And, once we have got those fuzzified outputs, we go for the defuzzification,
so that we can find out the crisp values of these particular outputs.

Now, here, this shows actually the schematic view of one neuro-fuzzy system, very simple
neuro-fuzzy system having only 2 inputs and 1 output, and this particular network consists of
5 layers. So, we have got this input layer. So, this is the input layer, the first layer is the input
layer, the second layer is actually the fuzzification layer, third layer is known as the AND
operation layer, then comes the fourth layer, which is nothing, but fuzzy inference, and the fifth layer
is nothing, but your defuzzification. Now, look-wise this is similar to a particular network,
but truly speaking, this is nothing, but the Mamdani approach of fuzzy reasoning tool.

Now, what is the reason? Why do you consider the structure of the neural network to
represent a particular fuzzy reasoning tool? Now, the answer is very simple, we use the
structure of this particular network to implement or to actually represent the fuzzy reasoning
tool or fuzzy logic controller, based on Mamdani approach, so that we can modify this
particular network in order to train the Mamdani approach of fuzzy reasoning tool using the
principle of either the back-propagation algorithm or we can use some sort of nature-inspired
optimization tool like genetic algorithms and others. So, that is the main purpose why we
take the help of this type of structure of the network.

Now, here, if you see we have got two inputs like your I_1 and I_2. So, on the first layer,
here you can see I have written I_I1, that is the input of the first neuron lying on the input
layer and this is I_I2, that is the input of the second neuron lying on the first layer and here, I
have got actually 1_O1 that is the output of the first neuron lying on the first layer and then
1_O2, that is the output of the second neuron lying on the first layer. So, the first layer is
nothing, but the input layer. Now, here, this is actually the second layer, the fuzzification
layer. Now, the purpose of fuzzification is to find out the normalized value corresponding to
these particular inputs.

Now, how to implement? So, that I am going to discuss in details. Now, once you have got
the fuzzified output and so, here you will be getting the outputs of layer 2 is nothing, but the
values of the memberships. Now these membership values for the two inputs: I_1 and I_2, I
will have to use here as input of the layer 3. And, on layer 3, actually we will have to carry
out the AND operation. And, if you remember, in the AND operation, what we do is, we try
to compare the two µ values and we try to find out the minimum. Now, this
minimum of these two µ values will be considered as output of layer 3 or the third layer and
layer 4 is nothing, but the fuzzy inference layer. And, as we have discussed the purpose of
fuzzy inference is to select the set of fired rules corresponding to one set of input parameters.

Now, in the rule base, we have got a large number of rules. So, out of these large number of
rules, only a few will be fired depending on the set of inputs. Now, fuzzy inference is going
to decide which rules are going to be fired out of the maximum number of rules present in
that particular rule base. Now, once we pass through this particular fourth layer or the
layer 4. So, I will be getting actually the output here and these outputs are denoted by actually
4_O1 that is nothing, but the output of the first neuron lying on the fourth layer, and so on,
and here, those things will enter as input to the fifth layer or the defuzzification layer and as
output of this defuzzification layer. So, I will be getting some crisp output.

Now, this is the way actually it works, in short, but I will be discussing in much more details
and after that we will be solving one numerical example also. Now, let us see this principle
in much more details.

(Refer Slide Time: 11:13)

Now, here for simplicity, we have assumed that both the inputs and the output are having
actually the triangular membership function distribution, for example, say, these are nothing,
but the membership function distribution for your I_1. So, the first input is expressed using
three linguistic terms like your low, medium and high. Similarly, 4 other linguistic terms are
used to represent I_2, for example, we have got very near, near, far and very far and the
output is actually represented using 3 other linguistic terms. So, S is nothing, but slow, F is
nothing, but fast and VF is nothing, but is your very fast.

So, for these 2 inputs and 1 output, we used actually the triangular membership function
distribution. Now, let us see how to proceed with this design of the neuro-fuzzy system.

(Refer Slide Time: 12:31)

Now, as there are 3 linguistic terms for this I_1 and there are 4 linguistic terms for this your
I_2. So, we have got 3 multiplied by 4 rules. So, we have got actually 12 rules. The rules are
as follows: if I_1 is low AND I_2 is very near then the output is your S. So, S is nothing, but
the slow. So, this is the way actually we designed this particular the 12 rules. So, these 12
rules are designed something like this.

(Refer Slide Time: 13:17)

Now, I am just going to represent actually each particular neuron lying on each of the layers.
Now, for the purpose of analysis, we are going to consider the j-th neuron lying on the first
layer, the k-th neuron lying on the second layer, the l-th neuron lying on the third layer, the
m-th neuron lying on the fourth layer and the n-th neuron lying on the fifth layer.

Now, the connecting weight between the layer 1 and the layer 2 is denoted by V, the
connecting weight between your layer 4 and the layer 5 is denoted by W matrix; now the
nomenclature let me explain once again. So, 1_Ij is nothing, but the input of the j-th neuron
lying on the first layer then 1_Oj that is the output of the j-th neuron lying on the first layer,
then your 2_Ik that is the input of the k-th neuron lying on the second layer.

Then, 2_Ok that is the output of the k-th neuron lying on the second layer, then 3_Il the input
of the l-th neuron lying on the third layer, the 3_Ol that is the output of the l-th neuron lying
on the third layer, then 4_Im that is the input of the m-th neuron lying on the fourth layer,
then 4_Om that is the output of the m-th neuron lying on the fourth layer, then 5_In, that is,
the input of the n-th neuron lying on the fifth layer, then 5_On that is the output of the n-th
neuron lying on the fifth layer. And, as I mentioned, in the first layer, we use some sort of
linear transfer function, that is, output is nothing, but the input; in layer 2, we carry out
fuzzification; that means, we try to find out the membership function value denoted by
µ , which I have already discussed.

Layer 3 actually indicates all 12 rules and we carry out this particular the AND operation;
that means, we compare two µ values corresponding to the two inputs and we try to find out
the minimum. The layer 4 is nothing, but the fuzzy inference and layer 5 is the
defuzzification so, that we can find out like what should be the crisp output corresponding to
that fuzzified output.

(Refer Slide Time: 16:21)

Now, let us see how does it work? Now, if you once again look into this network.

(Refer Slide Time: 16:27)

So, here, you can see that we have got some connecting weights like your v_11 then comes
v_12 then comes v_13, we have got the connecting weights like your v_24, then v_25, v_26
and v_27. Now, these are all connecting weights between layer 1 and layer 2 and these
particular connecting weights lie in the range of 0 to 1.

Now, here actually, what we do is. So, this connecting weight expressed in the normalized
scale may have some other value in the real scale and that particular value in the real scale is
going to represent, what should be the spread of that particular triangular membership
function distribution; that means, if it is a right angled triangle. So, its base width will be
represented by the connecting weight. If it is isosceles triangle so, its half base width will be
represented by these particular connecting weights. Now, truly speaking, v_11 could be
different from v_12, could be different from v_13. Now, what we do is, we try to find out the
average of these particular three and then we assign that v_11 is nothing, but v_average, v_12
is nothing, but v_average and v_13 is nothing, but is your v_average.

Now, this is done actually just to ensure that we have got the symmetrical triangular
membership function distribution. Now, if we consider a symmetrical membership function
distribution, you need not consider the average and you need not assign the average value to
each of these connecting weights. The same thing actually is done for these connecting
weights also, which are going to represent the base width or the half base width of the
triangular membership function distribution used to represent I_2. Now, let us see how does
it work? Now, if you see the way, it has been implemented is as follows.

So, here, as I told that we try to find out first v_11, v_12, v_13, find out the average. Now,
this particular average you assign to v_11, v_12 and v_13; that means, v_11 equals to v_12
equals to v_13 is nothing, but v_1,average, the same principle we follow here also. Like v_2,
average is nothing, but v_24 plus v_25 plus v_26 plus v_27 divided by 4 and after that, we
assign v_24 equals to v_25 equals to v_26 equals to v_27 and that is nothing, but v_2,
average.

And, this w_average we calculate. In fact, for the connecting weights between the fourth
layer and the fifth layer and these particular connecting weights or the w values are going to
represent, what should be the size of this particular the triangular membership function
distribution representing the output. The same principle, if we follow here, we find out the
average of w_11, w_21, w_31 and then, we assign w_11 equals to w_21 equals to w_31 is
equal to w_average. So, all such things, we do, now actually, the membership function
distributions for the inputs and the outputs are ready and once they are ready, now actually,
we are in a position to pass the set of input parameters.
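The averaging just described can be sketched as below; the function name and the sample weight values are hypothetical.

```python
def enforce_symmetry(weights):
    # Replace each connecting weight of a group by the group average, so that all
    # linguistic terms of one variable share the same base width or half base width.
    avg = sum(weights) / len(weights)
    return [avg] * len(weights)

# Hypothetical normalized connecting weights.
v1 = enforce_symmetry([0.28, 0.32, 0.30])        # v_11, v_12, v_13 -> v_1,average
v2 = enforce_symmetry([0.55, 0.62, 0.58, 0.65])  # v_24 ... v_27   -> v_2,average
w  = enforce_symmetry([0.41, 0.39, 0.40])        # w_11, w_21, w_31 -> w_average
print(v1, v2, w)
```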

(Refer Slide Time: 20:51)

Now, here actually, all the 5 steps, once again, we have written in this particular fashion. So,
in layer 1, we consider actually the linear transfer function and here, the outputs are kept the
same as the corresponding inputs. Then, layer 2 is nothing, but the fuzzification module
and here, in this fuzzification module, actually what we do is? We try to find out, what should
be the membership function value.

(Refer Slide Time: 21:39)

Now, if you concentrate on the membership function distribution of the input parameters, that
is, I_1 and I_2. So, you will be getting actually three types of triangle. So, one is this type of
right angled triangle, you will be getting and you will be getting this type of right angled
triangle also and you will be getting actually, these type of isosceles triangle also.

Now, if you are getting this type of right angled triangle. So, during the optimization or
during the training, we are going to optimize this particular the base width. And, if you are
getting this type of right angled triangle, we are going to optimize this base width and if you
have the isosceles triangle, we are going to optimize the half base width of this particular
triangle. And, the moment we pass actually a particular input, for example, say I am here,
very easily, I can find out what should be the µ value; and because the range for µ is your 0
to 1. So, using the principle of similar triangle, very easily you can find out the µ value. This
we have already discussed in much more details, similarly, corresponding to this, if I pass one
input, I will be able to find out this particular µ value. Then, corresponding to this actually,
if I pass then I will be getting this as the membership function value. Now, the same thing
actually, we follow just to carry out that fuzzification step.
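The similar-triangle rule mentioned here amounts to a piecewise-linear interpolation between the corner points of the triangle; a small sketch is given below, where the corner-point parameterization is my own convention and the sample numbers are hypothetical.

```python
def tri_mu(x, left, peak, right):
    # Membership value on a triangular distribution defined by its corner points;
    # a right-angled triangle is the special case left == peak or peak == right.
    if x < left or x > right:
        return 0.0
    if x == peak:
        return 1.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# A hypothetical isosceles triangle peaking at 5.0 with half base width 2.0.
print(tri_mu(4.0, 3.0, 5.0, 7.0))   # 0.5, by the similar-triangle rule
```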

(Refer Slide Time: 23:21)

That means, corresponding to the set of inputs, you will be getting some µ value and once
you got that particular µ value, then you go for the logical AND operation.

(Refer Slide Time: 23:45)

Now, let me once again go back to the schematic view of the neuro-fuzzy system and you
will understand here actually, what we do is. So, you are passing this I_1 and I_2 So, this
output is nothing, but the input and these particular connecting weights are going to represent
the membership function distribution, the shape and size of the membership function
distribution and the output of the second layer is nothing, but is actually the membership
function value or the µ value.

And, as I told, this particular AND operator or the layer 3 is going to represent the AND
operator. So, we have got 12 rules. So, each of the 12 rules actually we are going to represent
here. For example, if I concentrate on the first neuron of third layer. So, this is nothing, but if
the input I_1 is low AND the input I_2 is very near, then the output is nothing, but so, this is
your S. So, corresponding to this particular the first neuron lying on the third layer, the rule is
if I_1 is low AND I_2 is very near, then the output O is nothing, but is your S. So, output is
nothing, but slow. So, this is the rule which is actually represented by the first neuron lying
on the third layer.

Similarly, we have got 12 such neurons just to represent the 12 rules. The moment I pass one
set of inputs here I_1 and I_2. So, there is a possibility, I will be getting actually your two
such non-zero µ here. So, for example, I may get this as the non-zero µ . Similarly,
corresponding to I_2 actually, I may get this as the non-zero µ , this as the non-zero µ .
Then, if I get the non-zero µ here and the non-zero µ here, and similarly a non-zero µ and a
non-zero µ here, then there is a possibility that out of these 12 rules, only four rules are going to
be fired. And, corresponding to each of the fired rules, we are going to find out actually what
should be the fuzzified output and that is actually the purpose of using your the fuzzy
inference.

So, using the fuzzy inference, we can find out the output the fuzzified output of each of the
four fired rules and after that, we carry out defuzzification here, just to find out what should
be the crisp output corresponding to these set of inputs. Now, this is the way actually this
particular neuro-fuzzy system works, that is, the Mamdani approach works.

Now, here, if you see, till now, we have discussed up to this. Now, in layer 3, we perform
the logical AND operation; thereafter, on layer 4, we carry out the task of fuzzy inference and
there will be defuzzification on layer 5. There are several methods of defuzzification, now here, I am
discussing how to use the principle of the center of sums method. The center of sums method,
in fact, I have discussed in much more details, while discussing the fuzzy reasoning tool
particularly the Mamdani approach.

Now, the crisp output is nothing, but (∑_{i=1}^{r} A_i f_i) / (∑_{i=1}^{r} A_i). So, we can find out actually what should be the
crisp output and once you have got this particular the crisp output. So, this particular crisp
output can be used for the controlling purpose.
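The center of sums calculation can be written down directly; the area and center values below are hypothetical.

```python
def center_of_sums(areas, centers):
    # Crisp output = sum(A_i * f_i) / sum(A_i), where A_i and f_i are the area and
    # the center of area of the i-th fired rule's fuzzified output.
    return sum(A * f for A, f in zip(areas, centers)) / sum(areas)

# Hypothetical areas and centers of area of three fired outputs.
print(center_of_sums([0.4, 0.7, 0.3], [2.0, 5.5, 8.0]))
```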

(Refer Slide Time: 28:11)

Now, let us discuss a little bit, how to tune or how to train this particular the neuro-fuzzy
system. Generally, we consider the batch mode of training to train this type of neuro-fuzzy
system, we can use either back-propagation algorithm or we can use some nature-inspired
optimization tool like your genetic algorithm or particle swarm optimization, and so on. And,
by using this particular tool, actually we can optimize.

(Refer Slide Time: 28:53)

And, once you optimize, now you are in a position actually to train this particular network;
that means, we are basically giving training to the fuzzy reasoning tool, which works based
on this particular Mamdani approach.

Thank you.

Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture - 34
Neuro - Fuzzy System (Contd.)

(Refer Slide Time: 00:15)

Now, we are going to discuss, how to solve a numerical example related to the Neuro-
Fuzzy System based on Mamdani approach. The statement of the problem is as follows.
So, here, we are going to develop one neuro-fuzzy system based on Mamdani approach,
and this particular fuzzy reasoning tool is represented using the structure of a
multilayered network. There are two inputs: I_1, I_2 and there is only one output, that is,
O of the fuzzy logic controller. The neural network consists of 5 layers as we discussed,
and the function of each layer is indicated in the figure, I am going to show you.

The input I_1 has been expressed using the linguistic term like near, far, very far. So,
there are three linguistic terms: near, far and very far. Now, similarly, three other
linguistic terms like here small, medium and large have been utilized to represent the
second input, that is, your I_2. And, the output has been represented using three other
linguistic terms like low, high and very high, that is, LW, H and VH.

(Refer Slide Time: 01:55)

Now, I am just going to show you. So, in this particular network, the membership
function distributions of the inputs and the outputs are assumed to be triangular in nature
and here, the base width of near, very far and half base width of far triangle are kept
equal to v_11, then comes your v_13 and v_12, respectively, for the first input, that is, I_1.
Similarly, for I_2, these v_24, v_26, v_25 are used to represent the base width of small,
large and half base width of the medium triangles. Now, these w_11, w_31 and w_21
indicate the base width of triangles representing low and very high outputs and half base
width of the high triangle.

Now, here, actually the starting values for this I_1, I_2 and this output are assumed to be
equal to 1.0, 10.0 and 5.0. I am just going to show you that in the figure. Now, we will
have to find out the deviation in prediction for the training scenario, which is nothing,
but I_1 equals to 1.6, I_2 equals to 18.0 and the output is nothing, but is your 9.0. So, for
this particular training scenario, I will have to find out, what should be the deviation in
prediction. Now, let us see, how to determine that particular deviation in prediction.

(Refer Slide Time: 03:55)

Now, this shows actually the neuro-fuzzy system. Now, here let me explain. So, I have
got two inputs here: I_1 and I_2 and as usual, on this layer 1, we use the linear transfer
function. Now, on layer 2, actually we try to represent the connecting weights between
the first neuron lying on the input layer and the first neuron lying on the second layer.
So, that is nothing, but is your say v_11 then comes v_12, v_13.

Now, v_11 is going to represent near, v_12 is going to represent half base width for the
far and v_13 is going to represent the base width for very far right angled triangle.
Similarly, the membership function distribution for the linguistic terms used to represent
I_2 are actually denoted by v_24, then comes v_25, then comes your v_26. Now, there
are three linguistic terms for I_1, three other linguistic terms for I_2. So, here I have got
actually the 9 rules and this is nothing, but the layer 3, that is, the AND operation layer
and the layer 4 is nothing, but the fuzzy inference and layer 5 is a defuzzification.

Now, the connecting weights, that is, your w_11, then comes your w_21 and this is
nothing, but w_31 are going to represent either the half base width or the base width of
the triangular membership function distribution used to represent the output variable.
Now, let us see, how to find out the deviation in prediction for this particular
training scenario.

(Refer Slide Time: 06:17)

Now, as we discussed previously. So, here you can find out that v_11 is kept equal to
v_12 and that is kept equal to v_13 just to ensure the symmetrical triangular membership
function distribution for I_1. Then, v_24, v_25 and v_26 are going to represent the
symmetrical membership function distribution to represent I_2.

Then comes your w_11 equals to w_21 equals to w_31 are used to represent the
triangular membership function distribution for the output and we consider the
symmetrical membership function distribution. Now, here, we assume the numerical
values for this particular v, that is, v_11 equals to v_12 equals to v_13 and that is kept
equal to 0.3, then v_24 equals to v_25 equals to v_26 that is kept equal to 0.6, then w_11
is equal to w_21 is equal to w_31, so, that is kept equal to your 0.4. And, you can see that
all such values are in the normalized scale lying between 0 and 1.

Now, here the range for this b_1, b_2 and b_3; now this b_1 is going to represent what
should be the half base width or base width of the triangular membership function
distribution used to represent I_1, b_2 is the half base width or the base width of the
triangular membership function distribution used to represent I_2. And, b_3 is nothing,
but it is going to represent either half base width or base width of the triangular
membership function distribution used to represent the output and we are going to use
actually the center of sums method for defuzzification.

Now, here, one thing I should mention that although these values: v and w values are in
normalized scale. So, we will have to find out the real scale values corresponding to this
particular normalized value, and considering the ranges for the different variables like
your b_1, b_2 and b_3. Now, b_1, b_2, b_3 are having different ranges and we will have to use
these normalized values to find out what should be the actual real values for b_1, b_2
and b_3; this I am going to discuss, in detail, now.

(Refer Slide Time: 09:17)

Now, this is actually the manually constructed membership function distribution for this
I_1, I_2 and the output. As I told that for this particular I_1, there are 3 linguistic terms:
near, far and very far and we are considering the triangular membership function
distribution. Now, this is actually one right angled triangle, here also, we consider one
right angled triangle and this is nothing, but the isosceles triangle and this b_1 is going to
represent the base width of this right angled triangle or the half base width of
this isosceles triangle.

Similarly, to represent I_2, we use 3 other linguistic terms, that is, small, medium and
large and the b_2 is used to represent the half base width of this isosceles triangle or the
base width of this particular right angled triangle. Now, similarly to represent the output
O, we use three linguistic terms, that is, your low, high and very high, and b_3 represents
either the half base width of this isosceles triangle or
the base width of this particular right angled triangle and for simplicity, we are
considering the symmetrical membership function distribution.

(Refer Slide Time: 10:57)

Now, as I told that corresponding to this normalized value of the connecting weight, we
will have to find out the real values, how to find out the real values? Now, to find out the
real values, we use this particular equation, that is, x = n × (x_max − x_min) + x_min. Now, for a
particular variable say b_1, if I know the normalized value that is nothing, but 0.3, if I
know the maximum value for b_1, that is, 1.5 and the minimum value for this particular
your b_1, that is, 0.5. So, very easily, I can find out, what is the real value for this
particular b_1.

So, the real value for this b_1 is found to be equal to 0.8. Similarly, we can find out the
real value for this particular b_2, and b_2 is nothing, but the normalized value, that is 0.6,
multiplied by (b_2 maximum, that is 15.0, minus b_2 minimum, that is 5.0) plus b_2 minimum,
that is 5.0. So, this is nothing, but 11.0. Similarly, for this b_3, we can find out the real value.
So, n is the normalized value 0.4, b_3 maximum is 8.0, b_3 minimum is 2.0 and if you
substitute, we will be getting, that is, 4.4. So, the real value for b_1 is 0.8, b_2 is 11.0 and
b_3 is nothing, but 4.4.
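The same arithmetic can be checked in a few lines, using the normalized values and the ranges stated above.

```python
# x = n * (x_max - x_min) + x_min, with the normalized values and ranges given above.
b_1 = 0.3 * (1.5 - 0.5) + 0.5     # -> 0.8
b_2 = 0.6 * (15.0 - 5.0) + 5.0    # -> 11.0
b_3 = 0.4 * (8.0 - 2.0) + 2.0     # -> 4.4
print(b_1, b_2, b_3)
```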

(Refer Slide Time: 13:03)

Now, using these real values actually we can modify the membership function
distribution now. So, the modified membership function distribution will look like this.
So, the starting value for I_1, that has been kept constant to 1.0, but this particular b_1
has been changed. Now, similarly, I will be getting this modified membership function
distribution for I_1, similarly we get the modified membership function distribution for
I_2, and we also get the modified membership function distribution for the output.

And, once you have got the modified membership function distribution, now if I just
pass a particular value of I_1 input, another value of this particular say I_2. Now,
corresponding to this particular I_1. So, I will be getting actually two µ values, one is
corresponding to the near, another is corresponding to your far. Similarly, corresponding
to this particular value of I_2, I will be getting a particular µ value corresponding to
small, another µ value corresponding to this particular medium.

So, here, there are two µ values. So, 2 multiplied by 2; so, there are a maximum of 4 fired
rules and using those particular fired rules, we will have to find out what should be the
fuzzified output and then, we will have to go for the crisp output using defuzzification.

(Refer Slide Time: 14:39)

Now, this table shows actually those 9 rules, the rules are as follows. So, if I_1 is near
AND I_2 is small then the output is low, and so on. So, here, we have got 9 rules and as
we discussed, out of these 9 rules, only 4 are going to be fired.

(Refer Slide Time: 15:11)

Now, whatever I discussed, I have written it here. So, let us try to see. So, we can see
that on layer 1, we consider the linear transfer function. So, the inputs are actually I_1 is
1.6, I_2 is 18.0. So, 1_I1 is nothing, but I_1 is 1.6, then 1_I2 is nothing, but I_2 is 18.0
and as we consider the linear transfer function on the first layer, the output is equal to the input.

So, 1_O1 is nothing, but 1_I1 is 1.6 then comes here 1_O2 is equal to 1_I2 is nothing,
but 18.0.

(Refer Slide Time: 16:03)

Now, here, once you got this particular output now actually, what we do is, we try to find
out what should be the µ value. Now, I_1 is nothing, but 1.6 and I_2 is 18.0. So,
corresponding to this 1.6, now if I just draw it here. Now if I draw 1.6 here, then we can
see that it could be near or it could be far. So, there are two possibilities. So,
corresponding to this particular I_1. So, I_1 can be called either near or far.

Now, it can be called a near with some membership function value, that is, 0.25 and it
can also be called far with another membership function value 0.75. Now, for this
triangular membership function distribution, how to find out this µ value using the
principle of similar triangle. So, that thing we have discussed in much more details. So, I
am not going for that once again.

(Refer Slide Time: 17:17)

So, corresponding to I_1, the two µ values we can find out; similarly, corresponding to
your I_2, this I_2 can be called either small or medium, and we can find out that µ_small is
0.272727 and µ_medium is nothing, but 0.727272. So, we can find out the µ values,
that is, the membership function values; that means, the fuzzification is over.
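These membership values can be reproduced with the similar-triangle ratios; the sketch below assumes that near and small peak at the starting values (1.0 and 10.0) and fall to zero after one base width, while far and medium rise to their peaks at 1.0 + b_1 and 10.0 + b_2, which gives exactly the numbers quoted above.

```python
I_1, I_2 = 1.6, 18.0
b_1, b_2 = 0.8, 11.0    # real base widths / half base widths found earlier

mu_near   = (1.0 + b_1 - I_1) / b_1   # falling side of the near triangle
mu_far    = (I_1 - 1.0) / b_1         # rising side of the far triangle
mu_small  = (10.0 + b_2 - I_2) / b_2  # falling side of the small triangle
mu_medium = (I_2 - 10.0) / b_2        # rising side of the medium triangle

print(mu_near, mu_far)        # 0.25  0.75
print(mu_small, mu_medium)    # 0.2727...  0.7272...
```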

(Refer Slide Time: 17:51)

Now, once I have done this particular fuzzification, now we go to layer 3. Now, as I told
that on layer 3, we have got 3 multiplied by 3, 9 combinations or the 9 possible rules and
out of 9, in fact, only 4 rules are going to be fired. The four fired rules are as follows: if
I_1 is near and I_2 is small then the output is something, which is not written here. The
second fired rule if I_1 is near and I_2 is medium then the output is something, the third
fired rule if I_1 is far and I_2 is small then the output is something, then if I_1 is far and
I_2 is medium then the output is something. So, there are four fired rules. So, out of 9, 4
rules are going to be fired, now corresponding to these fired rules actually, we will have
to find out what should be the output.

(Refer Slide Time: 18:57)

Now, if you see the inputs of the third layer. So, that is nothing, but the two µ values
corresponding to the first fired rule. So, 3_I1 is nothing, but the two µ values, that
is 0.25 and 0.272727 then 3_I2 is 0.25 and 0.727272 then 3_I4 0.75 and 0.272727 and
3_I5 0.75 and 0.727272, now we compare. So, we try to find out what should be the
output of the AND operation layer.

So, 3_O1 that is the output that is nothing, but the minimum between these two and this
will be the minimum, then 3_O2 is the minimum between these two and this will be the
output and 3_O4 is the minimum between these two. So, this is output and 3_O5 is the
minimum between these two and this is actually the output. So, we try to find out the
minimum of the two µ values. And, once you have got this, now we are in a position to
know the output of the third layer.
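The four minima can be reproduced directly from the membership values found in the fuzzification step.

```python
mu_I1 = {"near": 0.25, "far": 0.75}
mu_I2 = {"small": 0.272727, "medium": 0.727272}

# Output of the AND-operation layer: the minimum of the two membership values
# entering each of the four fired rules.
firing = {(t1, t2): min(mu_I1[t1], mu_I2[t2]) for t1 in mu_I1 for t2 in mu_I2}
print(firing)   # 0.25, 0.25, 0.272727 and 0.727272 for the four fired rules
```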

(Refer Slide Time: 20:33)

Now, we go for the fuzzy inference and that is nothing, but layer 4. Now, in the layer 4,
corresponding to each of these particular input combinations, the fired input
combinations, we know this particular output for example, say the first fired rule is if I_1
is near and I_2 is small then the output is low. Similarly, the second fired rule if I_1 is
near and I_2 is medium then the output is low, similarly you can also read the third rule
and the fourth fired rule.

Now, actually what we will have to do is, we will have to find out the firing strength of each
of these particular rules as the output of layer 3. And, once you know this particular
firing strength, we are in a position to find out like what should be the fuzzified output of
each of the fired rules.

(Refer Slide Time: 21:31)

For example, say if you concentrate on the first fired rule, that is, your NR, SM, LW,
that is, if I_1 is NR and I_2 is SM then the output is LW. So, I will be getting the
fuzzified output is nothing, but so, this particular the shaded portion and its
corresponding area and center of area, we can find out and how to determine those
things, we have discussed in much more details.

The second fired rule: if I_1 is near and I_2 is M then the output is low and this
particular area is nothing, but the fuzzified output, I can find out the area you can find
out the center of area, then the third fired rule: if I_1 is far and I_2 is small then the
output is high. So, I will be getting like this as the fuzzified output the shaded portion, I
can find out its area and center of area. Similarly, corresponding to the fourth fired rule:
if I_1 is far and I_2 is medium then the output is your high. So, we can find
out like what should be this particular fuzzified output.

And, once you have got this particular fuzzified output now, actually we carry out the
OR operation, just to superimpose all the fuzzified outputs, and if you superimpose all the
fuzzified outputs, you will be getting actually this type of combined fuzzified
output. So, this is nothing, but the combined fuzzified output and once you have got this
particular combined fuzzified output, we can use the centre of sums method for
defuzzification and we can find out this particular crisp output.

Now, this is the way actually, we determine actually the crisp output for a set of input
parameters.

(Refer Slide Time: 23:55)

That means on layer 5, we carry out this type of your defuzzification using the centre of
sums method, and this 5_O1 is nothing, but the output of the first neuron lying on the 5th
layer, and that is nothing, but (A_1 f_1 + A_2 f_2 + A_3 f_3 + A_4 f_4) / (A_1 + A_2 + A_3 + A_4);
so, we will be getting this as the
crisp output. Now, if this is the crisp output, this particular crisp output can be used just
to find out the deviation by comparing it with the target value, that is 9.0.

So, we compare this calculated value with the target value and we try to find out this
particular deviation. Now, this deviation could be either the positive value or negative
value, but here fortunately we are getting the positive value. But, it could be negative
also and that is why, it is better to find out actually the mod value of the difference
between your T_O1 and 5_O1. So, we can find out the mod value of this particular deviation.

Now, based on this particular mod value, what we do is, this error will have to be propagated
back for the further modification of the network. So, we can use the back-propagation
algorithm for its training or tuning, but the transfer functions are to be defined in a very
nice way, so that we can carry out the differentiation, at least in the logical sense, and we
can implement this particular BP algorithm, just to modify the
connecting weights or other design variables of this particular network.

Now, we can also use actually a genetic algorithm or any other nature-inspired
optimization algorithm. Now if I use the genetic algorithm, all the design variables we
can keep inside that particular GA string, the GA will try to find out or try to evolve one
optimal neuro-fuzzy system. And, if you use genetic algorithm just to evolve that optimal
neuro-fuzzy system, that will be known as actually your genetic-neuro-fuzzy system. So,
genetic-neuro-fuzzy system; that means, you are using the genetic algorithm just to
evolve the neuro-fuzzy system, that is nothing, but the genetic-neuro-fuzzy system.

So, this is the way actually, we can combine this your fuzzy logic and neural network
particularly this Mamdani approach, and we can design and develop the optimal fuzzy
reasoning tool based on the Mamdani approach. And, this particular fuzzy reasoning tool
has been implemented using the structure of a network and this structure we can utilize
for further tuning/training and ultimately, we are going to train that Mamdani approach
of fuzzy reasoning tool. Now, this is the way actually we can develop the neuro-fuzzy
system.

Thank you.

Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture - 35
Neuro - Fuzzy System (Contd.)

(Refer Slide Time: 00:15)

We have discussed the working principle of Takagi and Sugeno’s approach of fuzzy
reasoning tool. And, this particular approach is nothing, but a precise fuzzy reasoning
tool; that means, we will be able to establish the input-output relationships in a very
accurate way. Now, today, we are going to discuss how to represent this Takagi and
Sugeno’s approach, a fuzzy reasoning tool using the structure of a neural network, so
that we can train, we can optimize the performance of this particular fuzzy reasoning
tool.

Now, the title of today’s lecture is neuro-fuzzy system based on Takagi and Sugeno’s
approach. Now, let us see how to model this Takagi and Sugeno’s approach or fuzzy
reasoning tool and how to develop the neuro-fuzzy system. Now, this particular neuro-
fuzzy system is very popularly known as the Adaptive Neuro-Fuzzy Inference System, that is,
in short, ANFIS. Now, the first proposal came in the year 1993 by Jang and after that,
this particular ANFIS has been modified
in a number of ways.

Now, to explain the working principle, let me consider a system having 2 inputs like I_1
and I_2 and there is one output that is O. So, this is a very simple system having two
inputs and one output. So, we have got I_1 and I_2 and we have got the output O. Now,
let us see how to model using the principle of this ANFIS. Now, the 2 inputs are
represented using triangular membership function distribution, now here, the first input,
that is, I_1 is represented using three linguistic terms like your low, medium and high.

And, for simplicity, we have considered the triangular membership function distribution.
So, this type of triangular membership function distribution we have considered. Now,
truly speaking, actually if you want to model more accurately, maybe we will have to
go for some sort of non-linear distribution also. For example, I can also take some sort of
Gaussian distribution, if I consider the low, medium and high and if I consider the
Gaussian distribution; the Gaussian distribution will look like this and there will be some
overlapping of the regions in this Gaussian distribution. So, if I consider the Gaussian
distribution for this particular I_1 so, this could be low, this could be your medium and
that could be the high.

So, this type of non-linear distribution also we can consider. But, for simplicity, as I told
we have considered the linear membership function distribution that is the triangular
membership function distribution. Now, the size of this particular distribution depends
on this particular d_1.

Now, d_1 indicates the base width for this right angled triangle or half base width of
these particular the isosceles triangle. Now, similarly, to represent the second variable
that is your I_2, we are going to use three linguistic terms like your small, large and very
large. And, once again for small, we are going to consider this type of right angled
triangle, for large we are going to consider this type of isosceles triangle and for very
large we are going to consider this type of the right angled triangle. And, the size of this
particular distribution, that is decided by actually this d_2. So, d_2 is going to represent
the base width of this right angled triangle, and the half base width of this isosceles
triangle.

Now, during the optimization and training actually, what will happen, we will try to vary
the values for these particular d_1 and d_2 and accordingly, we will be getting the
modified membership function distribution for I_1 and I_2. Now, here one thing we will
have to mention that in Takagi and Sugeno’s approach, we do not consider any
membership function distribution for the output. And, here as we discussed, the output is
expressed as the function of the input parameters, that is, O is nothing, but f of your I_1
and I_2, now we can consider the linear function of the input parameters, we can also
consider some sort of non-linear function of these particular input parameters. But, most
of the time, we generally use only the linear function of the input parameters. So, these
are the membership function distribution for two inputs and as I told that for this output,
there is no membership function distribution used.

(Refer Slide Time: 06:17)

And, here as we discussed, while discussing the Takagi and Sugeno’s approach, that
output here is expressed as the function of the input parameters. Now, we can see that the
output of the i-th rule, that is denoted by y_i is nothing, but a_i I_1 plus b_i I_2 plus c_i,
now I_1 and I_2 are the input parameters and this a_i, b_i, c_i are the coefficients. Now,
these values for the coefficients are to be determined using some optimizer; we can use the
least squared error technique to find out what should be the values for these a_i, b_i and c_i,
or we can also use some sort of nature-inspired optimization tools like genetic algorithms and
others to find out what should be the optimal values for these particular a_i, b_i and c_i.
Now, if you remember, for the first input, that is I_1, we
have considered three linguistic terms and for the second input, that is I_2, we have
considered three linguistic terms. So, we have got 3 multiplied by 3, that is, nine possible
combinations of the input parameters and that is why, here, this y^i is a_i I_1 plus b_i
I_2 plus c_i, where i varies from 1, 2 up to n.

So, we have got 9 combinations of the input parameter; that means, we have got 9 rules.
Now, let us see how to determine the output for a set of inputs here.
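As a quick sketch, the output of one such rule can be evaluated as shown below; the coefficient values are hypothetical, since in practice they are found by an optimizer.

```python
def ts_rule_output(I_1, I_2, a_i, b_i, c_i):
    # Output of the i-th Takagi and Sugeno rule: y_i = a_i*I_1 + b_i*I_2 + c_i.
    return a_i * I_1 + b_i * I_2 + c_i

print(ts_rule_output(1.6, 18.0, a_i=0.2, b_i=0.05, c_i=1.0))   # 2.22
```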

(Refer Slide Time: 08:09)

Now, this shows actually the architecture of this ANFIS, now let me explain this
particular architecture. Now, here, as I consider that there are two inputs like your I_1
and I_2 and we have got only one output, that is O. And, this ANFIS architecture consists
of actually 6 layers; now, this layer 1 is known as actually the input layer and
here, we use linear transfer function on the input layer. Then, layer 2 is nothing, but the
fuzzification layer; that means, corresponding to the real values of this I_1 and I_2.

So, we try to find out, what should be the membership function distribution. For
example, say I_1 is represented using three linguistic terms, that is your low, medium
and high and the connecting weights between the neurons lying on the first layer and the
neurons lying on the second layer are going to represent, what should be the base width
of the triangular membership function distribution or half base width of the membership
function distribution. Now, for example, the connecting weights between this particular
neuron and this particular node is nothing, but your say V_11, similarly, we have got the
connecting weight V_12, we have got V_13.

Now, here, these V_11, V_12 and V_13 may have different numerical values and once
again, what we consider is the average of these three, that is nothing, but
(V_11 + V_12 + V_13)/3, and this is nothing, but your say V_1,average, and we assign that V_11
equals to V_12 equals to V_13 is nothing, but your V_1,average. Now, this we do just
to consider the symmetrical membership function distribution for this particular I_1.
Similarly, I_2 is represented using three linguistic terms like your small, large and very
large and we try to find out the connecting weights between this particular neuron and
that particular node, that is nothing, but V_24. Similarly, we have got V_25 and here, we
have got V_26.

And, once again, we follow the same method; so, we try to find out the V_2,average. So,
V_2,average = (V_24 + V_25 + V_26)/3 and after that, we assign that V_24 equals to V_25
equals to V_26, and that is nothing, but V_2,average; and once again, we do this just to maintain the
symmetrical membership function distribution for the different linguistic terms.

Now, if you see the input of the first neuron lying on second layer. So, that is nothing,
but your 2_I1. So, 2_I1 means that is the input of the first neuron or the first node lying
on the second layer and so, these particular inputs are nothing, but the real values of
these particular parameters like I_1 and I_2, and here actually, we will have to carry out
some sort of fuzzification. And, by this particular fuzzification, we can get the µ value.

So, this 2_O1 is nothing, but the output of the first neuron lying on the second layer. So,
here actually, I will be getting some sort of µ value, that is the membership function
value, that is your µ_low. Similarly, here we will be getting the µ_medium and here, you will
be getting the µ_high; and if you see, here also, you will be getting the outputs like µ_small,
then comes your µ_large, and here we will be getting your µ_very large.

So, all such membership function values we will be getting as the output of the second
layer, now let us concentrate on the third layer. Now, on the third layer, we have got all 9
possible combinations of the input parameters. So, all the 9 rules we are going to
consider here and what should be the input of a particular neuron lying on the third
layer? For example, say 3_I1 that is the input of the first neuron lying on the third layer,
so the inputs are nothing, but your say µ_low and we have got another input and that is
nothing, but µ_small. So, µ_low and µ_small will be used as actually the inputs; for example, say one
input is coming from here and another input is coming from here. So, there are two µ
values and these two µ values will be multiplied just to find out the firing strength or the
output of this particular third layer. So, output of this is denoted by π .

So, that can be written here; that is nothing, but your µ_low multiplied by your µ_small.
So, we can find out like what should be the output of this particular third layer. Now, by
following the same principle, I can find out the output of each of these particular neurons
lying on the third layer. So, I will be able to find out the firing strength, that is denoted
by the w. Now, here actually we multiply the two µ values, and unlike the
Mamdani approach, we do not consider the minimum.

So, here, we simply multiply the two µ values, now each of the µ values is going to lie
between 0 and 1. So, its multiplied form or the multiplication is also going to lie from 0
to 1. So, we will be getting the firing strengths, that is, w_1, w_2 up to w_9, and the values
of the firing strength will lie in the range of 0 to 1. Now, we concentrate on this particular
layer 4. Now, if I concentrate on a particular neuron on layer 4, for example, I am going
to concentrate here. Now, if you see the inputs, all such w values are coming from the
different neurons of the previous layer as input to this particular neuron.

So, all such w values are coming as inputs and here, you will be getting all the w values;
that means, all 9 w values, and then we try to find out the normalized value for this
particular w_1 as the output: w_1 bar = w_1 / (w_1 + ... + w_9). So, we try to find out what should
be the normalized value for this particular firing strength.

Now, once you have got it as an output of the fourth layer, now we are proceeding to the
fifth layer. Now, on fifth layer you will find that say we are going to find out, what
should be the output of each of these particular rules. For example, say we are trying to
find out or we are trying to calculate, what should be the output of the first rule. The
output of the first rule is nothing, but a_1 I_1 + b_1 I_2 + c_1.

Now, if I get the values of these a_1, b_1 and c_1 and if I supply the numerical values
for I_1 and I_2. So, very easily, I can calculate what is y^1. Now, similarly, here we try
to find out y^2, y^3, y^4, y^5, y^6, y^7, y^8 and y^9, and once you have got that, we
simply multiply this w_1 bar, that is the normalized value of the firing strength, by this
particular y^1; so, that will be the output of this particular neuron lying on the fifth layer.

So, by following the same procedure, I can find out the output here, I can find out the
output, I can find out for all the nodes lying on the fifth layer. And, once you have got
that particular thing, now, on layer 6, we sum them up just to find out what should be the
final output of this particular network for one set of input parameters, that is, your I_1
and I_2; now whatever I discussed here. So, those things actually are written stepwise in
the next slide.

(Refer Slide Time: 18:33)

Now, before that let me tell you that corresponding to the set of inputs that is I_1^star
and your I_2^star. So, only 4 rules are going to be fired out of 9 and those fired rules are
nothing, but if I_1 is low and I_2 is small then y^1 is nothing, but a_1 I_1 plus b_1 I_2
plus c_1. Now, this particular y^1, I can also write using the notations which I am
following, that is, in this particular format; now here, you can see, I am using one 'and' and
this particular 'and' is actually not the AND operator, which is used in Mamdani. So, in
Mamdani approach, we use this type of AND operator, but in Takagi and Sugeno’s
approach, actually we do not use the AND operator, instead we use the conjunction and
that is nothing, but and (small letters).

So, here, in Takagi and Sugeno's approach, we use this type of 'and', the conjunction and.
Now, similarly, if you see the second fired rule, which states if I_1 is low and I_2 is
large, then y^2 is nothing, but a_2 I_1 plus b_2 I_2 plus c_2 and this is nothing, but y^2.
Now, similarly, we can also find out the output of the fourth rule and that is nothing, but
y^4 and the rule is as follows: if I_1 is medium and I_2 is small then y^4 is nothing, but
this that is a_4 I_1 plus b_4 I_2 plus c_4 and the fifth fired rule is as follows: if I_1 is
medium and I_2 is large, then y^5 is nothing, but is your a_5 I_1 plus b_5 I_2 plus c_5
and this is nothing, but is your y^5. So, these are the outputs of these 4 fired rules and let
us see, how to proceed all such things and how to explain step-wise.

(Refer Slide Time: 20:55)

Now, whatever I discussed, I am just going to write here step-wise, now in layer one, we
use the linear transfer function and that is why, the output is nothing, but the input. So,
this 1_o1 is nothing, but 1_I1 and that is nothing, but I_1^star. Similarly, 1_o2 is
nothing, but 1_I2 and that is equal to your I_2^star, because we have used the linear
transfer function and y equals to x. So, output is nothing, but input.

Now, layer 2 is the fuzzification and as I told, we try to find out the membership function
value, that is, µ. In layer 3, we try to find out the firing strength for each of the rules. Now,
to determine the firing strength for each of the rules, what we do is, the µ values we
multiply. So, corresponding to the first fired rule, the firing strength w_1 is nothing, but
your µ_LW corresponding to I_1^star multiplied by µ_SM corresponding to your I_2^star.
So, we will be getting the firing strength, and by following the similar procedure, we can find
out what is w_2, what is w_4 and what is your w_5.

Now, once we have got the firing strengths, you can find out the normalized firing
strength, that is your w_1 bar = w_1 / (w_1 + w_2 + w_4 + w_5). And, by following the same
procedure, we can find out the normalized firing strengths like your w_2 bar, w_4 bar and
w_5 bar. So, all such normalized
firing strength values we can calculate.

(Refer Slide Time: 23:03)

And, in layer 5, what we do is, we try to find out the output, that is the output of the first
neuron lying on the fifth layer is nothing, but w_1 bar multiplied by y^1. Then 5_o2 that is the output of the
second neuron lying on the fifth layer is nothing, but w_2 bar multiplied by y^2. Then,
comes here 5_o4 that is w_4 bar multiplied by your y^4 then comes here 5_o5 that is the
output of the fifth neuron lying on the fifth layer is nothing, but w_5 bar multiplied by is
your y^5.

So, we can find out these outputs, and once you have got those outputs, we are in a
position to find out the final output or the overall output, and that is nothing but
6_o1 = $\bar{w}_1 y^1 + \bar{w}_2 y^2 + \bar{w}_4 y^4 + \bar{w}_5 y^5$. So, we can find out the
output for this particular set of inputs. Now, here, we should mention that the performance
of this particular ANFIS depends on the coefficients of the transfer functions, that is, a_i,
b_i and c_i, and of course, it depends on the membership function distributions of these
two inputs, that is, I_1 and I_2.
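To make these layer-wise computations concrete, a minimal sketch is given below. It assumes that the layer-2 membership values of the fired rules are already available; the function and variable names are illustrative ones of my own, not notation from the lecture.

```python
# Minimal sketch of layers 3-6 of the Takagi-Sugeno ANFIS described above.
# Assumes the layer-2 membership values of the fired rules are already known;
# all names here are illustrative, not taken from the lecture.

def ts_anfis_output(mu_pairs, coeffs, I1, I2):
    """mu_pairs : list of (mu_for_I1, mu_for_I2), one pair per fired rule
       coeffs   : list of (a_i, b_i, c_i), one triplet per fired rule"""
    w = [m1 * m2 for m1, m2 in mu_pairs]              # layer 3: firing strengths
    w_bar = [wi / sum(w) for wi in w]                 # layer 4: normalized strengths
    y = [a * I1 + b * I2 + c for a, b, c in coeffs]   # rule outputs y^i
    o5 = [wb * yi for wb, yi in zip(w_bar, y)]        # layer 5: weighted outputs
    return sum(o5)                                    # layer 6: overall output
```

For the situation discussed above, mu_pairs would hold the membership pairs of the four fired rules 1, 2, 4 and 5, and coeffs the corresponding (a_i, b_i, c_i) triplets.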

(Refer Slide Time: 25:01)

Now, here, actually, if we just go back to the architecture and look into it once again, you
can see that there are a few neurons which are denoted by a circle; on the other hand, we
have got a few other neurons which are denoted by a square. So, we use two types of
symbols in this particular ANFIS: one is the circle and another is the square. Now,
wherever we use a square, there is a chance of further improvement, there is a chance of
optimization. So, if you see, here, I am putting the square; that means, we can optimize
the values of these particular connecting weights, that is, V_11, V_12 and V_13, and that
is why we have put the square here.

So, there is a chance of improvement, there is a chance of optimization. Now, for the
same reason, actually, we have put a square here; that means, we can also optimize the
membership function distributions for these linguistic terms, that is, small, large and
very large. Now, similarly, if you see here also, we have put the square symbol in place
of the circle; that means, here also, there is a chance of improvement, and that
improvement actually lies in the value of this particular y. Because, if you write down
the expression for this particular y^1, you will see that we have got a_1 I_1 plus
b_1 I_2 plus c_1.

So, these particular a_1, b_1 and c_1 are actually going to control the value of this
particular y^1. So, there is a chance of further improvement of the performance by
selecting the proper values for a_1, b_1 and c_1. Now, as I have already mentioned,
these values of the coefficients, that is, a_i, b_i and c_i, can be determined using some
optimizer.

For example, we can use the traditional tools for optimization; also, we can use some sort
of error minimization algorithm, so that we can find out these coefficients, like a_i, b_i
and c_i, such that the network can predict the output very accurately, that is, the error in
prediction should be as small as possible. That means, there is a chance of improvement of
the performance of this particular network, and that is why we put the square here, in
place of this particular circle.

So, this is the way, actually, this particular network works, and ultimately, for the set of
inputs, we will be getting that particular output. And, as I told, we can improve the
performance using some optimizer: we can use some traditional optimization tool, or we
can also use some sort of nature-inspired optimization tool. Now, if we use the genetic
algorithm to tune this particular ANFIS, what we can do is, on the GA-string, we can put
all such information, like the connecting weights, that is, the V values: V_11, V_12,
V_13, and then we can also put V_24, then comes V_25, then V_26.

All such things we can put there, and we can also encode the values of the coefficients,
like a_i, b_i and c_i; then, the GA will take the responsibility of finding out what should
be the optimal values of these particular parameters, so that this ANFIS works in the
optimal sense, that is, the ANFIS should be able to make the prediction of the output for
a set of inputs as accurately as possible. Now, here, actually, as we have already
discussed, this is nothing but the calculated output, and this calculated output will be
compared with the target output just to find out the deviation for the first training
scenario, that is nothing but d_1; so, we try to find out the deviation between the target
output and this calculated output, and we consider its mod value.

Now, corresponding to a particular GA-string, the way I discussed, we pass all the
training scenarios one after another. So, if I pass the second training scenario, I will be
getting d_2; similarly, if I pass the L-th training scenario, I will be getting d_L. We then
sum them up and find out the average, and that particular average value will be the
fitness of that GA-string. So, if I consider this type of population of the GA, say a
binary-coded GA, for the first string I will be getting f_1; similarly, for the second, I will
be getting f_2 as the fitness, and for the N-th one, I will be getting f_N as the fitness.
And, using the fitness information and with the help of its operators, like reproduction,
crossover and mutation, the GA will try to find out or evolve what should be the optimal
values of these particular parameters, so that this ANFIS can make the prediction as
accurately as possible.
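As a hedged sketch of this fitness evaluation, the computation could look like the following; decode_string and anfis_forward are assumed helper functions (one reading the connecting weights and the a_i, b_i, c_i coefficients off a GA-string, the other carrying out the layer 1-6 computations), not functions defined in the lecture.

```python
# Sketch: fitness of one GA-string = average absolute deviation over all L
# training scenarios. decode_string and anfis_forward are assumed helpers.

def fitness(ga_string, training_data):
    """training_data: list of ((I1, I2), target) pairs."""
    params = decode_string(ga_string)               # V values and a_i, b_i, c_i
    total = 0.0
    for (I1, I2), target in training_data:
        predicted = anfis_forward(params, I1, I2)   # layers 1-6 for this input set
        total += abs(target - predicted)            # d_l = |target - calculated|
    return total / len(training_data)               # average deviation (to be minimized)
```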

Now, if I use the genetic algorithm to optimize or train this particular ANFIS network,
this will be known as a genetic-neuro-fuzzy system and, more specifically, if I use the
genetic algorithm to tune this ANFIS, this is very popularly known as GA-ANFIS. So,
GA-ANFIS means I am just going to optimize the ANFIS parameters, the variables of the
ANFIS, with the help of a genetic algorithm, and the GA-ANFIS will be able to evolve an
optimal and suitable ANFIS network, which will be able to make the prediction of the
input-output relationships as accurately as possible.

Thank you.

Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture - 36
Neuro - Fuzzy System (Contd.)

(Refer Slide Time: 00:15)

Now, we are going to solve a numerical example just to explain the working principle of
this particular ANFIS. Now, here I am just going to take one numerical example for a
system, it is a very simple system having two inputs and one output. So, this is
something like this, we have got I_1, I_2 and I have got only one output. Now, the
statement of this particular numerical example is as follows: So, we will have to model,
we will have to design one ANFIS, and the purpose is to model a process having two
inputs: I_1 and I_2 and one output, that is O.

Now, in this particular ANFIS, there are six layers, and we use two linguistic terms to
represent each of the input parameters. For example, for simplicity, to represent I_1,
we have used low and high. Similarly, to represent I_2, the second input parameter, we
have used two linguistic terms: one is called small, another is large. So, this is a very
simple ANFIS, and each of the 2 inputs is represented using two linguistic terms.

So, we have got 2 multiplied by 2, that is, only 4 possible rules. And, we can see that this
shows, actually, the second layer; now, in the second layer, in fact, we have got 2 neurons
here and we have got two more neurons here, and there are the connecting weights between
the first layer and these particular neurons of the second layer. So, this is nothing but
V_11, this is V_12, and here, the connecting weight is V_23 and this is V_24. It is a very
simple network, and now, let us see the other part of the statement of this particular
problem.

(Refer Slide Time: 02:33)

The connecting weights are lying in the range of 0 to 1, and the membership function
distributions, which we have considered for the two inputs I_1 and I_2, are as follows.
Now, for simplicity, we have considered that the low I_1 is nothing but this, that is, this
type of membership function distribution, a right-angled triangle, and for the high,
another right-angled triangle, and there is some overlapping also.

Now, similarly, for this I_2, there are two linguistic terms: one is small, another is large,
and for simplicity, we have considered this type of right-angled triangle. It is a very
simple representation of the membership function, and here are the starting values of
these particular I_1 and I_2. Now, you can see that the starting value is nothing but I_1
minimum, and here it is I_2 minimum. So, this I_1 minimum we have considered equal to
1.0, and the I_2 minimum we have considered equal to 5.0.

(Refer Slide Time: 03:55)

Now, the rest of the statement of this particular problem is as follows: we use the
connecting weights such that V_11 is equal to V_12, and that is denoted by d_1;
similarly, this particular V_23 is nothing but V_24, and that is going to represent d_2.
Now, these values of d_1 and d_2 should lie in some well-defined ranges and, as I told,
there are 4 possible rules, that is, 2 multiplied by 2, and the output of a particular rule,
that is, the output of the i-th rule, y^i, is nothing but a_i I_1 plus b_i I_2 plus c_i; here,
this particular i is nothing but 1, 2, 3, 4, and a_i, b_i, c_i are the coefficients of the rules.

(Refer Slide Time: 05:03)

Now, here, to solve this numerical example, we are going to consider some numerical
values, the predetermined numerical values for these particular coefficients are as
follows. Now, for the first rule, the values of the coefficients are nothing, but a_i, that is,
a_1 equals to 0.2, b_1 equals to 0.3 and c_1 is 0.10. Similarly, for the second rule, a_2 is
0.2 and this b_2 is 0.4 and c_2 is 0.11. Similarly, for the third rule, a_3 is 0.3, b_3 is 0.3
and c_3 is 0.13 and for the fourth rule, a_4 is 0.3, then b_4 is 0.4 and c_4 is 0.14.

Now, these d_1 and d_2 are varied in this particular range. So, d_1 is varying in the
range of 0.8 to 1.5. Now, similarly, d_2 is going to vary in the range of 4.0 to 6.0. And,
to carry out this numerical example actually, what we do is, we assume the values for the
connecting weights, we assume that V_11 equals to V_12, which is going to represent
the membership function distribution for the first input, that is nothing, but 0.3.
Similarly, V_23 equals to V_24, this is nothing, but 0.5 and these are in the normalized
scales.

So, we will be discussing how, using these normalized values, we will have to find out
the real values lying in the respective ranges of d_1 and d_2. And, we are going to
determine the deviation in prediction for a particular training scenario, where I_1, the
first input, is nothing but 1.1, I_2, the second input, is 6.0, and the output O is nothing
but 2.3. Now, let us see how to determine the output corresponding to this set of inputs,
and how to determine the deviation in prediction corresponding to this particular training
scenario. So, those things, actually, I am just going to calculate.

(Refer Slide Time: 07:45)

Now, let us concentrate on the solution of this numerical example. Now, as I told, the first
thing we will have to do is, corresponding to the normalized values of the connecting
weights, we will have to find out the real values; that means, we will have to find out the
real values of this particular d_1 and d_2. Now, to find out the real value from the
normalized value, which is denoted by n, we are going to use this particular formula. It is
very simple: $x = n \times (x_{max} - x_{min}) + x_{min}$.

Now, if I put n equal to 0, x will become equal to how much? That is nothing but
x_min. On the other hand, if I put n equal to 1, what will happen to the value of x? So,
if I put n equal to 1, this becomes $x_{max} - x_{min} + x_{min}$, so this will become
x_max. That means, this particular n varies in the range of 0 to 1, and accordingly, I can
find out what should be the real values of d_1 and d_2 lying within their respective
ranges.

Now, here, d_1 is nothing but 0.3 multiplied by (1.5 minus 0.8) plus 0.8, because n is
equal to 0.3, the maximum value of d_1 is 1.5 and the minimum value is 0.8, and if we
calculate, we will be getting d_1 equal to 1.01. Now, similarly, we can find out d_2, which
is equal to 0.5, that is, the value of n, multiplied by (6.0 minus 4.0) plus 4.0.

Now, here, d_2 minimum is 4.0, d_2 maximum is 6.0 and the value of small n is equal to
0.5 and if you substitute, we will be getting the value for this d_2 as 5.0. Now, once you
have got the real values for this particular d_1 and d_2.
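As a small illustrative sketch (the helper name is mine, not from the lecture), this de-normalization can be written as follows:

```python
def denormalize(n, x_min, x_max):
    # x = n * (x_max - x_min) + x_min, with n in [0, 1]
    return n * (x_max - x_min) + x_min

d1 = denormalize(0.3, 0.8, 1.5)   # 1.01
d2 = denormalize(0.5, 4.0, 6.0)   # 5.0
```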

(Refer Slide Time: 10:19)

So, now we are in a position to draw the modified membership function distributions for
this particular I_1 and I_2. So, for this particular I_1, this is the modified membership
function distribution, and for this I_2, this is the modified membership function
distribution, where d_1 is nothing but this and d_2 is nothing but this. So, we can find
out the modified membership function distributions for the two inputs.

(Refer Slide Time: 10:53)

And, now, layer-wise, I am just going to carry out the calculations. Now, in the first
layer, as I mentioned, we use the linear transfer function, so the output is nothing but
the input. So, the output of the first neuron lying on the first layer is nothing but the
input of the first neuron lying on the first layer, and that is nothing but 1.1.

Now, similarly, the output of the second neuron lying on the first layer is the same as the
input of the second neuron lying on the first layer, and that is nothing but 6.0. Now, layer
2 is nothing but the fuzzification layer. So, what we do is, these values of the inputs, that
is, 1.1 and 6.0, we are going to pass to the corresponding membership function
distributions just to calculate what should be the membership function values.

(Refer Slide Time: 12:01)

Now, if you calculate the membership function values, you will be getting that,
corresponding to the low I_1, the $\mu_{low}$ corresponding to $I_1^*$ will become
equal to 0.900990.

Then, corresponding to $I_1^*$, the $\mu_H$ will become equal to 0.099010. Similarly,
corresponding to the second input, that is equal to 6.0, we can find out the values:
$\mu_{SM}$ corresponding to $I_2^*$ is nothing but 0.8, then comes $\mu_{large}$
corresponding to $I_2^*$, which is nothing but 0.2; and how to determine those things, we
have discussed several times in detail.
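These values can be reproduced with a short calculation. As a sketch, I assume here (the exact form is not spelt out on the slide) that each falling right-angled triangle equals 1 at the minimum value and drops linearly to 0 over the base length d, with the rising triangle as its complement; this assumption is consistent with the numbers quoted above.

```python
def mu_falling(x, x_min, d):
    # assumed right-angled triangle: 1 at x_min, falling linearly to 0 at x_min + d
    return max(0.0, min(1.0, 1.0 - (x - x_min) / d))

I1_star, I2_star = 1.1, 6.0
mu_low   = mu_falling(I1_star, 1.0, 1.01)   # ~0.900990
mu_high  = 1.0 - mu_low                     # ~0.099010
mu_small = mu_falling(I2_star, 5.0, 5.0)    # 0.8
mu_large = 1.0 - mu_small                   # 0.2
```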

(Refer Slide Time: 12:57)

Then comes layer 3. Now, this layer 3 is actually going to represent all the 4 possible
rules of this particular reasoning tool, and here, you can see that the input of the first
neuron lying on the third layer is nothing but the 2 values of the membership. Similarly,
the input of the second neuron lying on the third layer is nothing but the 2 values of the
membership; then, the input of the third neuron lying on the third layer is nothing but
this, and the input of the fourth neuron lying on the third layer is nothing but this. And,
what we will have to do now, to determine the outputs of the third layer, is to multiply
the µ values. For example, the output of the first neuron lying on the third layer is
nothing but $\mu_{low}$ corresponding to $I_1^*$ multiplied by $\mu_{SM}$ corresponding
to $I_2^*$. And, if you substitute the numerical values and multiply, you will be getting
w_1, which is actually the firing strength of the first fired rule.

(Refer Slide Time: 14:23)

Now, by following the same procedure, we can find out the firing strengths of the other
fired rules. For example, for the second rule, the firing strength can be found out, and
this will become 0.180198 as w_2; similarly, we can find out the firing strength of the
third rule, that is, w_3, and that is nothing but 0.079207. For the fourth rule, we can also
find out the firing strength, and that is nothing but $\mu_H$ corresponding to $I_1^*$
multiplied by $\mu_{LR}$ corresponding to $I_2^*$.

And, if you substitute the numerical values and carry out the multiplication, you will be
getting w_4 as nothing but 0.019802. And, once you have determined all the firing
strength values, we can start with layer 4. In layer 4, actually, we try to find out the
normalized values of these firing strengths. Now, here, $\bar{w}_1$, that is, the normalized
value for the first fired rule, is nothing but w_1 divided by w_1 plus w_2 plus w_3 plus
w_4, and if we substitute all the numerical values, we will be getting $\bar{w}_1$ as
0.720792. Similarly, this $\bar{w}_2$ is nothing but w_2 divided by the sum of all the w
values, and if you substitute the numerical values, you will be getting $\bar{w}_2$ as
nothing but 0.180198.

(Refer Slide Time: 16:25)

Now, by following the same procedure, we can also find out what should be $\bar{w}_3$.
And, $\bar{w}_3$ is nothing but w_3 divided by the sum of all the w values, and you will
be getting 0.079207 as $\bar{w}_3$. Similarly, $\bar{w}_4$ is nothing but w_4 divided by
the sum of all the w values, and after substituting the numerical values, we can calculate
$\bar{w}_4$ as nothing but 0.019802. Now, we have got all the normalized firing strength
values, and we go to layer 5.

Now, in layer 5, actually, what we will have to do is, we will have to find out the output of
each of these particular fired rules. For example, for the first fired rule,
$y^1 = a_1 I_1^* + b_1 I_2^* + c_1$, so you will be getting 2.12. Similarly,
$y^2 = a_2 I_1^* + b_2 I_2^* + c_2$, and this will become equal to 2.73. Then,
$y^3 = a_3 I_1^* + b_3 I_2^* + c_3$, and if you substitute the numerical values, you will be
getting 2.26. Then, $y^4 = a_4 I_1^* + b_4 I_2^* + c_4$; so, if you substitute all the
numerical values, you will be getting 2.87 as y^4.

(Refer Slide Time: 18:23)

Now, once you have completed layer 5; so, in layer 5, actually, we multiply $\bar{w}_1$ by
y^1, and we will be getting this, that is, 1.528079, as the output of the first neuron lying on
the fifth layer. Similarly, the output of the second neuron lying on the fifth layer is nothing
but $\bar{w}_2$ multiplied by y^2, and you will be getting 0.491941. Similarly, the output of
the third neuron lying on the fifth layer is nothing but $\bar{w}_3$ multiplied by y^3, and
that is nothing but 0.179008; and, similarly, the output of the fourth neuron lying on the
fifth layer is nothing but $\bar{w}_4$ y^4, and if you substitute the numerical values, you
will be getting 0.056832.

And, once you have got the outputs of the fifth layer, in the sixth layer, actually, what we
do is, we sum them up just to find out the overall output, that is, 6_o1. So, 6_o1 is nothing
but $\bar{w}_1 y^1 + \bar{w}_2 y^2 + \bar{w}_3 y^3 + \bar{w}_4 y^4$, and if we substitute the
numerical values, we will be getting 2.25586, and this is nothing but the final calculated
output for the set of inputs.

Now, this calculated output will be compared with this particular target output, which is
denoted by T_o1 and that is 2.3, and very easily we can find out the deviation in
prediction; we can also find out the mod value of the deviation in prediction, that is, 2.3
minus 2.25586, which is 0.04414. Now, here, fortunately, we are getting a positive value,
but if we get a negative value, we will have to consider the mod value, the way I
discussed. So, this is the deviation in prediction corresponding to the first training
scenario; then, we go for the second training scenario, the third training scenario, and so
on, up to all the training scenarios, say the capital L-th training scenario, and try to find
out all the deviation values. We add the mod values of all the deviations and find out the
average, and that particular average will be the fitness of that GA-string, as I discussed.
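The whole chain of calculations for this first training scenario, from the membership values down to the deviation, can be checked with a few lines; the variable names are mine, and the numbers are those derived above.

```python
mu_L, mu_H, mu_SM, mu_LR = 0.900990, 0.099010, 0.8, 0.2
I1, I2, target = 1.1, 6.0, 2.3
coeffs = [(0.2, 0.3, 0.10), (0.2, 0.4, 0.11), (0.3, 0.3, 0.13), (0.3, 0.4, 0.14)]

w = [mu_L * mu_SM, mu_L * mu_LR, mu_H * mu_SM, mu_H * mu_LR]   # layer 3: firing strengths
w_bar = [wi / sum(w) for wi in w]                              # layer 4: normalized strengths
y = [a * I1 + b * I2 + c for a, b, c in coeffs]                # 2.12, 2.73, 2.26, 2.87
output = sum(wb * yi for wb, yi in zip(w_bar, y))              # layers 5-6: ~2.25586
deviation = abs(target - output)                               # ~0.04414
print(round(output, 5), round(deviation, 5))
```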

Now, if I use a GA-string, say in a binary-coded GA, for each of the GA-strings I will be
able to find out the fitness, and then we use the GA-operators just to modify the strings.
And, the GA, through a large number of iterations, will try to evolve that particular
ANFIS which will ensure a very good prediction accuracy.

(Refer Slide Time: 21:57)

Now, this is the way, actually, this ANFIS works; and, regarding the reference, you can
see the textbook for this particular course, that is, Soft Computing: Fundamentals and
Applications, if you want to get more details regarding this neuro-fuzzy system.

(Refer Slide Time: 22:15)

Now, to conclude, in this lecture, we discussed 2 neuro-fuzzy systems. The first one is
based on the Mamdani approach and the second one is based on the Takagi and Sugeno's
approach. Now, the basic aim of the neuro-fuzzy system is to design and develop the
fuzzy reasoning tool, and we take the structure of this particular network so that we can
represent the fuzzy reasoning tool and we can carry out the training or the optimization of
this particular network. But, truly speaking, we are going to design and develop a very
accurate fuzzy reasoning tool, and we have discussed both the Mamdani approach and the
Takagi and Sugeno's approach of the fuzzy reasoning tool, and we have seen how to
represent them using the structure of a network and how to optimize them.

So, we have seen how to evolve the optimal fuzzy reasoning tool, based on both the
Mamdani approach as well as Takagi and Sugeno's approach, by taking the help of the
structure of a neural network and with the help of an optimization tool. Now, this is the
way, actually, we can develop the neuro-fuzzy system, and these neuro-fuzzy systems
have got a large number of practical applications. So, these neuro-fuzzy systems have
been used widely to solve a variety of real-world problems.

Thank you.

Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 37
Concepts of Soft Computing and Expert Systems

We are going to discuss another topic, that is, Concepts of Soft Computing and Expert
Systems. Now, here actually what I am going to do, I am just going to define, what do
you mean by soft computing and why should you go for the soft computing? And, then, I
will try to concentrate on the expert system, like how to design and develop a suitable
expert system, so that we can solve the real-world complex problem in a very efficient
way.

Now, as I told that we are going to define the term: soft computing and we are going to
give a brief introduction to the soft computing tools. The tools, we have already
discussed and now, I am just going to put them under a family, that is called the soft
computing family. Then, I will try to explain, what do you mean by an expert system and
a few applications will be discussed, in detail.

(Refer Slide Time: 01:20)

Now, to start with the concept of soft computing: the name soft indicates that it is not
hard; that means, it is not so much precise and it is not so much accurate. So, by the
term soft, we mean it is not precise, not accurate, and if we get some acceptable
solution, we are happy with this particular solution. Now, before we define this particular
term soft computing, let us try to explain what do you mean by hard computing. The
moment we say that there is something called soft computing, then we have something
else, that is called hard computing.

Now, this particular term: hard computing, that was proposed in the year 1996 by
Professor Zadeh of University of California USA; the same Professor Zadeh, who
introduced the concept of the fuzzy sets. Now, to explain or to understand the concept of
hard computing, now let us try to see, what are the steps we follow to solve some
engineering problems. Now, if we want to solve some engineering problems or if you
want to determine the input-output relationship, the first thing we will have to do is, we
will have to identify the inputs and output variables.

Now, these input variables, these are known as actually the condition variables or
antecedents and the output variables are nothing, but the action variables and these are
also known as actually the consequence. Now, let me take a very simple example. Now,
supposing that we want to control the temperature and humidity of this particular studio,
where I am just recording this lecture. Now, if I want to control the temperature and
humidity inside this particular room, what I will have to do is, I will have to find out
what are the inputs, which are having some influence on the outputs: temperature and
humidity.

Now, what we do is, we try to find out the input variables first, for example, say input
variables could be the present temperature inside this particular room, present humidity
inside this particular room, outside temperature, outside humidity, then comes your the
number of people sitting in this particular room, then comes the thermal conductivity of
the walls, and so on. So, these are all input parameters or the input variables and these
are nothing, but the condition variables. Now, these input variables are, in general,
independent. They are not dependent on others.

Now, once you have identified the input variables, we will have to search for what
should be the output variable. Now, supposing that I have got an air conditioner just to
control the temperature and humidity of this particular room. Now, what we will have to
do is, we will have to control the angle of valve opening of this particular air
conditioner. Now, depending on the angle of valve opening, some conditioned air will
enter this particular room, and that is going to influence the temperature and humidity of
this particular room.

Now, if I want to keep the temperature and humidity within the comfortable zone, what I
will have to do is, I will have to control the valve opening; I will have to adjust the angle
of valve opening accordingly. For example, I might have to rotate that valve by, say, 10
degrees or 20 degrees, something like this. So, this could be the output variable. That
means, in the schematic, if we write it down: if this is the process to be controlled or the
process to be modeled, we have got a large number of inputs here, like I_1, I_2, ..., say
I_n, and supposing that we have got only one output here, that is nothing but the angle of
valve opening.

So, this is the way, actually, we will have to identify the input and output variables, and
once we have identified these particular input and output variables, the next thing we do
is as follows.

(Refer Slide Time: 06:25)

So, we try to find out the input-output relationship and we try to see the physics of this
particular process. Now, if the physics is known, then we can find out the mathematical
equations; say, we can find out the differential equations. Now, if we can get the
differential equations, we can solve them either analytically or we can use some
numerical method to solve these particular differential equations, and if we get the
solution of these particular differential equations, that will be nothing but the
input-output relationship, and this output or the control action we can use for controlling
the condition of this particular system.

So, this is the way actually, we try to identify the inputs and outputs of a process, which
we are going to model, which you are going to control and then, we try to find out the
mathematical equation, based on the physics; we solve those equations, try to find out
the solution and that particular solution is nothing, but the input-output relationship.
Now, these steps are nothing, but the concept of the hard computing. Now, in hard
computing actually, we take the help of mathematics. So, there is a possibility that we
will be getting very accurate, very precise solutions. So, this is actually what we mean by
the hard computing.

(Refer Slide Time: 08:04)

Now, let us see, let us just take the example of this particular hard computing. Now,
supposing that here I have taken one example of stress analysis using the finite element
analysis. Now, you know that using this finite element analysis, we try to solve the
differential equation and this is nothing, but a numerical method, one of the numerical
methods to solve the differential equation. Now, let me take a very simple example, very
simple example of a cantilever beam. Now, supposing that I have got a beam something
like this; it is a very simple beam. So, this type of beam I have got, and here, there is one
concentrated load, say P; this is the supported (fixed) end and this is the free end, and
supposing that the length of this particular beam is nothing but L, and this beam is having
a rectangular cross-section, with, say, sides a and b.

Now, very easily, for this particular cantilever beam, we can find out what should be the
maximum deflection and what should be the stress developed. For example, for this type
of beam, we can find out the maximum deflection, which will come here, and this
particular maximum deflection is $\delta_{max} = \frac{PL^3}{3EI}$. Now, you can also
find out the maximum stress developed, that is, $\sigma = \frac{M}{I} \times y$. So, M is
nothing but the maximum bending moment, I is the moment of inertia, and y can be found
out. So, we can find out how much is the stress developed in that particular beam, and
this is how to determine the stress developed in the beam mathematically. Now, supposing
that we have got a beam which is having some sort of complicated cross-section or a
varying cross-section along the length. Now, supposing that the beam looks like this; so,
this type of beam I consider, and supposing that the same concentrated load P is acting
here, this is actually the fixed end and this is the free end. And, if I tell you: can you
please find out what should be the stress developed? It is not so easy, because along this
particular length L, the cross-section is going to vary, so this will become a more
difficult problem; now, how to find out the stress developed? To find out the stress
developed, actually, what we will have to do is, we will have to take the help of finite
element analysis.
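For concreteness, the two closed-form results quoted above can be evaluated with a short sketch. I assume here, purely for illustration, that a is the breadth and b is the depth of the rectangular section, so that I = a b^3/12, the maximum bending moment is M = P L at the fixed end, and y = b/2 is the distance to the outer fibre; these assumptions are mine, not statements from the lecture.

```python
def cantilever_rectangular(P, L, E, a, b):
    I = a * b**3 / 12.0                     # second moment of area (assumed a = breadth, b = depth)
    delta_max = P * L**3 / (3.0 * E * I)    # maximum deflection at the free end
    sigma_max = (P * L) * (b / 2.0) / I     # sigma = (M / I) * y, with M = P*L at the fixed end
    return delta_max, sigma_max

# Example: P = 1 kN, L = 1 m, E = 210 GPa, 20 mm x 40 mm steel section (SI units)
print(cantilever_rectangular(1e3, 1.0, 210e9, 0.020, 0.040))
```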

Now, this FEM analysis we will have to carry out, and for that, this particular beam will
be divided into a large number of small elements; for each of the elements, actually, we
will have to find out what should be the stress developed and what should be the
deflection, and then we will have to combine them to find out what should be the
combined stress developed for this complicated beam having the varying cross-section.

Now, this is the way, actually, we carry out this sort of stress analysis for a beam having
varying cross-sections using the finite element analysis, and this is nothing but an
example of hard computing. Now, let me take another example, like the determination of
the gain values of a PID controller. Now, we know that in robotic joints, we use some DC
motors, and for each of the motors, we generally use one controller; we generally use a
controller that is called the PID controller, and that is nothing but the
proportional-integral-derivative controller.

Now, to implement this PID controller, actually, what we do is, we try to find out the gain
values: the proportional gain value, which is denoted by K_P; then we have got the
integral gain value, that is called K_I; and we have got the derivative gain value, that is,
K_D. Now, what we do is, we try to find out the values of these K_P, K_I and K_D, and
if we get the values of K_P, K_I and K_D, we can control that particular DC motor. Now,
how to determine these particular K_P, K_I and K_D? There are some methods, and there
is a standard method that is called the Ziegler-Nichols method. So, this Ziegler-Nichols
method we can use to find out what should be the gain values, that is, K_P, K_I and K_D;
we solve the corresponding equations to find out what should be the gain values, and this
is another very good example of hard computing.
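For illustration only, a hedged sketch of the classic closed-loop Ziegler-Nichols rule is given below; it assumes the ultimate gain K_u and ultimate period T_u of the loop have already been measured, and the constants used are the standard textbook ones, not values given in this lecture.

```python
def ziegler_nichols_pid(Ku, Tu):
    # classic closed-loop Ziegler-Nichols tuning (standard textbook constants, assumed here)
    Kp = 0.6 * Ku
    Ti, Td = 0.5 * Tu, 0.125 * Tu     # integral and derivative times
    Ki, Kd = Kp / Ti, Kp * Td         # integral and derivative gains
    return Kp, Ki, Kd

print(ziegler_nichols_pid(Ku=10.0, Tu=0.5))   # hypothetical ultimate gain and period
```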

Now, if you see the features of hard computing: it works based on pure mathematics, so
there is a possibility that you will be getting very precise solutions. Now, this particular
concept of hard computing we can use if the problem is simple and if this particular
problem can be modeled mathematically; but for a complex real-world problem, this
particular principle of hard computing we cannot use, and if we cannot use this particular
concept of hard computing, then we will have to go for the concept of soft computing.

(Refer Slide Time: 15:06)

Now, let us see, what we do in soft computing and how to define this particular term:
soft computing. Now, this I have already mentioned that in soft computing, we are not
very much interested to get very precise solution or accurate solution, on the other hand,
if we get some acceptable solution, some approximate solution or the heuristic solutions,
we should be happy and that is why, we use this particular term, that is called the soft.
So, by soft computing we mean that we do not want so much precision and if we get
some acceptable solution, we are happy. Now, this term: soft computing was coined by
Professor Zadeh in the year 1992.

Now, you see that the term: hard computing was introduced in the year 1996, but soft
computing that particular term was introduced before 1996, that is in the year 1992,
although people are using the concept of hard computing for such a long time. Now, to
define soft computing: soft computing is actually a family consisting of some
biologically inspired techniques, like Fuzzy Logic, Neural Networks, Genetic Algorithms
and others, and their combined forms, like GA-Fuzzy Logic, that is, the Genetic-Fuzzy
system; GA-Neural Network, that is, the genetic-neural system; NN-FL, that is, the
Neuro-Fuzzy system; and GA-FL-NN, that is, the Genetic Neuro-Fuzzy system, in which
precision is traded for tractability, robustness, ease of implementation and low solution
cost.

Now, here, as I told that this is a big family and this family consist of a large number of
members, for example, Fuzzy Logic, Neural Networks, different types of your nature-
inspired optimization tools and their combined techniques and in fact, by definition of
the soft computing, we mean the combined techniques like the way, we have already
discussed Genetic-Fuzzy System, that is the combination of genetic algorithm and Fuzzy
Logic system. Genetic-neural system that is the combination of genetic algorithm and
Neural Networks, and then Neuro-Fuzzy, it is a combination of Neural Network and
Fuzzy Logic and Genetic Neuro-Fuzzy, that is, GA-FL and NN.

Now, we try to combine because there is a reason, the reason is as follows, like each of
these particular tools is having its own merits and demerits. So, in combined tools
actually, what we do is, we try to remove their demerits and at the same time, try to
utilize their merits to design some combined techniques, so that we can solve the real-
world problem in a very efficient way. Now, here this particular algorithm has to be
computationally tractable; that means, the computational complexity should not be very
heavy. It should be robust; that means, the same algorithm can be used to solve a variety
of problems. It should be easy to implement and low solution cost means in terms of
computational complexity, it should be computationally faster.

Now, here, in this schematic view, we are going to show what we mean by soft
computing. Now, this particular circle indicates the genetic algorithm; the region of
Fuzzy Logic is denoted by this particular circle, and similarly, this circle is going to
represent the Neural Network. Now, what we do here is, we try to consider the combined
techniques. The combined techniques means we are going to consider the combination of
the GA and Fuzzy Logic; so, this is actually the combination of GA and Fuzzy Logic.

Now, similarly, we are going to consider the combination of GA and Neural Network,
and we are also going to consider the combination of these particular Fuzzy Logic and
Neural Network, and these combined regions are actually the region of soft computing.
This is what we mean by the concept of soft computing.

(Refer Slide Time: 20:21)

Now, if you see the features of soft computing: it does not require an extensive
mathematical formulation of the problem. In fact, for real-world problems, it is a bit
difficult to know the physics, so it is a bit difficult to derive the mathematical equations;
but if we want to get some solution for a complicated real-world problem, we will have
to take the help of soft computing. Now, as I told, we may not be able to reach a very
precise solution like in hard computing.

So, here, if we get some acceptable solutions, we are happy and here, in soft computing,
as I told that we are going to copy the merits of the different tools and we are going to
actually remove the demerits. Now, whenever we combine the tools for example, I am
combining say the fuzzy logic with the neural networks. So, we try to take the
advantages or the merits of both of the techniques and we try to remove their inherent
demerits, but whenever we combine, we do not consider that these two tools are fighting
with each other. So, they are helping each other to develop one combined tool and that is
nothing, but the soft computing tool, so that we can solve the real-world problem in a
very efficient way.

Now, there are some examples: say, we can design and develop an adaptive motion
planner for intelligent robots; we can design and develop a fuzzy logic-based or neural
network-based adaptive controller for the motors used in the robots. So, these are
actually applications of soft computing, and we are going to discuss some more
applications of soft computing in much more detail. Now, here, if you use the principle of
soft computing, there is a possibility that you will be getting some robust and adaptive
solutions to the problem, which is very interesting, because if you get some adaptive
solution, it will be able to cope with the varying situations of the environment. Now, this
is actually what we mean by soft computing.

(Refer Slide Time: 23:03)

Now, then comes the concept of hybrid computing. By definition, hybrid computing is
nothing but a combination of hard computing and soft computing. Now, if you see this
particular schematic view: if this is actually the area of hard computing and this is the
area of soft computing, then the common area between them is nothing but this, and this
is nothing but hybrid computing.

Now, I am just going to take some very practical examples just to understand the utility of
the concept of this particular hybrid computing. Now, let us take some examples.

(Refer Slide Time: 23:57)

Now, here, as we told, a complex real-world problem will be solved using the concepts of
both hard computing as well as soft computing; that means, a part of the complex
problem will be solved using the principle of hard computing and the remaining part will
be solved using the principle of soft computing. The moment we consider the concept of
hybrid computing, the hard computing and the soft computing are not fighting with each
other; instead, they are helping each other. In fact, in hybrid computing, hard computing
and soft computing are complementary in nature.

(Refer Slide Time: 24:49)

Now, I am just going to take some examples of hybrid computing. Supposing that I am
just going to design some machine element using the principle of finite element analysis.
Let me once again consider the same example which I took, that is, this type of beam
having varying cross-sections. Now, along this particular length, the cross-section is
going to vary, and if I take this same problem, I want to find out how much is the stress
developed.

Now, to determine the stress developed, as we told, we will have to take the help of the
finite element analysis. Now, in finite element analysis, the quality of the solution
depends on a number of parameters; for example, it depends on the length of the elements
and on the connectivity of the elements. Depending on the length of the elements and the
connectivity of the elements we consider, we will be getting different types of solutions.

Now, if I change the length of the elements and the connectivity several times while
running this FEM package, there is a possibility that we will be getting different
solutions. Now, my question is: which one to believe, which one to consider? So, there is
fuzziness in the results of this particular finite element analysis. Now, there is another
fuzziness, and that belongs to the material properties. If I consider the material properties
of the steel which I am going to use as the material for the beam, it has got some yield
strength, some ultimate strength, and all such things; the moment it is in working
condition, there is a possibility that these particular yield strength and ultimate strength
values are going to vary a little bit. So, there is fuzziness in the material properties.

Now, if I want to find out a very efficient design for this, what we will have to do is, we
will have to consider the fuzziness of the FEM package and we will have to consider the
fuzziness of these particular material properties. Now, to model the fuzziness of the
material properties and of the finite element analysis, what we can do is, we can take the
help of soft computing, and then we can use the principle of hard computing; that means,
we can use the finite element analysis to carry out this particular analysis in a very
efficient way, so that we can get one efficient, optimal design for this particular
cantilever beam.

So, this is an example of hybrid computing. Now, another example we can take is the
PID controller trained by soft computing. As I told, the gain values, that is, K_P, K_I and
K_D, are determined mathematically, and once those values have been determined, they
are kept constant. Supposing that the motor is working over a particular cycle of time;
during that cycle time, we are not going to vary these particular K_P, K_I and K_D, and
once these values have been determined, those are kept constant. So, this is the
conventional way we use the PID controller to control the DC motor.

Now, there could be a possibility that we may need some adaptive controller, where the
values of K_P, K_I and K_D are going to vary depending on the situation, depending on
the requirements. Now, if we can vary the values of these K_P, K_I and K_D depending
on the requirements or depending on the situation, we will be getting an adaptive PID
controller, where the values of K_P, K_I and K_D are actually going to vary depending
on the requirement. This is another very good example of hybrid computing. Now, the
working principles of the combined tools used in soft computing we have already
discussed; for example, we have discussed the genetic-fuzzy system, the genetic-neural
system and the neuro-fuzzy system or the genetic neuro-fuzzy system, and these
combined tools, the soft computing tools, can be utilized to solve the real-world problems
in a very efficient way.

Now, let me once again discuss a little bit why we should go for these combined tools as
the soft computing tools. This we have already discussed: each of these particular tools is
having its own merits and demerits. For example, if you see the fuzzy reasoning tool, it is
a very efficient and powerful tool for handling imprecision and uncertainty, but it has got
one drawback, in the sense that if you want to implement the fuzzy reasoning tool, or if
you want to find out the knowledge base for the fuzzy reasoning tool, you should have at
least some information about the process which you are going to model; that means, the
physics of this particular process has to be known a little bit, otherwise you cannot design
the database and the rule base initially.

Now, we can design the database and rule base initially, and after that we can optimize
them with the help of some optimizer using some training scenarios; but initially, we will
have to design some database and rule base, and if we want to design these database and
rule base, we should have at least some preliminary information about the physics of the
process. Now, there is another demerit of this fuzzy reasoning tool, which we have
already discussed: as the number of inputs increases, and if we use more linguistic terms
to represent each of the design variables or the input variables, the number of rules is
going to increase, and that is going to increase the computational complexity of this
particular algorithm; a small illustration of this growth is given below.
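As a quick worked example of my own (not from the lecture): with n inputs each described by k linguistic terms, the complete rule base contains k^n rules, so the rule count grows exponentially with the number of inputs.

```python
# Rule-base size grows exponentially with the number of inputs.
for n_inputs in (2, 3, 4):
    for n_terms in (3, 5):
        print(f"{n_inputs} inputs x {n_terms} terms -> {n_terms ** n_inputs} rules")
```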

So, these are the merits and demerits of fuzzy reasoning tool. Similarly, if you see the
neural network, its computational complexity during the training is much more compared
to that of your fuzzy reasoning tool, on the other hand this particular neural network can
handle a large number of design variables or the input variables. So, it has got its own
merits and demerits and as I told in soft computing, we generally go for the combined
tools just to capture the merits of these tools and to remove their demerits and we have
seen that if properly designed, these soft computing tools can handle the real-world
problems in a very efficient way.

Thank you.

Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 38
Concepts of Soft Computing and Expert Systems (Contd.)

Now, we are going to explain the terms: expert systems or the knowledge based system.
Now, let us see, what do you mean by this particular expert system or knowledge-based
system and let us try to understand, why do you need this expert system? what are the
utilities of these expert systems?

(Refer Slide Time: 00:36)

Now, to start with, let me mention that we human beings have got a natural thirst or quest
to know the input-output relationship of a process. Supposing that we have got one
engineering system or a process here, and supposing that we have got a number of inputs,
say m inputs, and n outputs. So, our aim is always to find out what should be the
input-output relationship; for example, if this is the set of inputs, what should be the set
of outputs, and vice-versa: supposing that I want to achieve this set of outputs, how
should I set my input parameters so that I can reach that particular level of outputs?

Now, mathematically, if you see, the output is nothing but a function of the input
parameters. So, in the matrix form, this can be written as: the output matrix is nothing but
the transformation matrix [T] multiplied by the input matrix, that is, $[O] = [T][I]$, and
this is actually what we mean by the forward mapping. So, in forward mapping, we try to
find out the output matrix if we know the input matrix, and supposing that we know this
particular [T] matrix, that is nothing but the transformation matrix.

So, if I know this particular transformation matrix and if I supply the [I] matrix, very
easily I can find out what should be the output matrix, and that is nothing but the forward
mapping. Now, supposing that my requirement is the reverse: I want to reach a level of
output, then how to set my input parameters? That is what we mean by the reverse
mapping. Now, if I just multiply both sides of this mathematical expression by T inverse,
that is, the inverse of this particular transformation matrix, this will become
$[T]^{-1}[O] = [I]$.

Now, here, supposing that I know this particular output matrix, which I am going to
reach, and supposing that the inverse of the transformation matrix is known to us; very
easily, we can find out how to set these particular input parameters. This is what is known
as the reverse mapping. Now, for any such process or system, we should know the
information in both the forward direction as well as the reverse direction; that means, we
should be able to carry out both the forward mapping as well as the reverse mapping.
And, if we can carry out both the forward as well as the reverse mapping, we can
automate that particular process, and our aim is ultimately to automate that particular
engineering system or engineering process.

Now, let us see how to do it. Before I proceed further, let me tell you that if we want to
carry out that particular reverse mapping, we should know the inverse of this particular
transformation matrix; that means, the transformation matrix has to be invertible, that is,
the inverse of this particular transformation matrix should exist, which means this
particular transformation matrix has to be non-singular. And, if that particular condition
is fulfilled, then only we can carry out this type of reverse mapping.

Now, let us see like how to develop this particular expert system and what are the
different components of this particular expert system.

(Refer Slide Time: 04:56)

Now, let me once again go back, how do you solve any such engineering problem. So, as
I told that we first try to see the physics of this particular process and once you have
understood the physics, we try to express mathematically, supposing that we are getting
the differential equation. So, we solve the differential equation and once you have got
the solution, now actually we can use the control action. Now, this is actually one way of
solving the problem. So, this is nothing, but hard computing, which I have already
discussed. Now, there is another way of getting the solution, for example, say supposing
that we do not know the physics of the process 100 percent. So, we are not in a position
to derive the differential equation. So, what you can do is, we can carry out some real
experiments.

Now, to carry out the real experiments, we follow some statistical design of experiments,
that is, DOE, which stands for design of experiments. We can use a full factorial design,
or we can use, say, a half factorial design, or we can use a central composite design, that
is, CCD, or there are some other designs like the Box-Behnken design and others. So, we
just follow a particular design and we try to carry out some real experiments. And, once I
have got the data of the real experiments collected in this particular fashion, say either
full factorial or half factorial or fractional factorial or central composite design, we can
carry out some sort of statistical regression analysis.

Now, by carrying out this statistical regression analysis, what we can do is, we can find
out the output as a function of the input parameters; that means, we will be getting, in the
matrix form, the output as nothing but the transformation matrix multiplied by the input.
Now, this particular transformation matrix has to be invertible if I want to carry out the
inverse mapping or the reverse mapping.

Now, there is no guarantee that every time you will be getting an invertible
transformation matrix; for example, this [T] matrix could be a non-square matrix, or it
could be a singular matrix, and in that case, actually, we cannot carry out the reverse
mapping, that is, $[T]^{-1}[O] = [I]$. So, if we cannot carry out the reverse mapping, then
what is the solution? The solution is, for a complicated problem, we can take the help of
some sort of expert system or knowledge-based system, and using this expert system or
knowledge-based system, we can find out the input-output relationship; here, actually, we
do not need the hard computing, so we will have to take the help of some sort of soft
computing to find out or to establish the input-output relationship.

Now, this is actually one of the advantages, or I should say the plus points, of using this
particular expert system: that, for a complex problem, we can model the input-output
relationship using this expert system or knowledge-based system. Now, take the same
example which I took a few minutes ago, that is, the example of controlling the
temperature and humidity of this particular room. So, what we can do is, we can establish
the input-output relationship and we can design and develop one expert system. Now, this
expert system is actually going to help us to control what should be the angle of valve
opening, depending on the requirement, depending on the temperature, humidity and
load of this particular room.

So, this particular expert system is going to change the angle of valve opening just to
keep the temperature and humidity within a very comfortable zone. So, this could be one
of the applications of this expert system.

(Refer Slide Time: 10:17)

Now, if you see the definition of this expert system, you will see that this expert system
is nothing, but a computer program, which is used just to simulate the human reasoning
to solve a particular problem. Now, let me try to find out the difference between this
particular expert system and an ordinary computer program. Now, let me take the same
example, the same example of controlling the valve opening of the air conditioner just to
keep the temperature and humidity of this particular room in a very comfortable zone.

Now, there are at least two ways of solving this particular problem. The first method is:
you find out the physics, find out the differential equation and solve it and accordingly,
you control the angle of the valve opening. And, another way of solving this particular
problem is as follows: supposing that I have got a human operator. So, this human
operator is going to sense the temperature and humidity inside this particular room and
accordingly, he or she is going to control the angle of valve opening depending on his or
her experience or expertise.

Now, the second example is the example of expert system, but the first example is not
actually an expert system and that is nothing, but the solution using the principle of hard
computing. And, here, in the first solution, what we can do is, we can develop one
computer program: we can solve the differential equation, if any, and we can find out the
solution. But, in the second method, we are not going to solve any differential equation;
we are going to actually copy the way one human operator is going to control that
particular angle of valve opening depending on his observation of the temperature and
humidity.

Now, in expert system or the knowledge based system, we try to copy the behavior of
that particular human-being. And, that is why, we say that all the computer programs
may not be the expert system, but all the expert systems are nothing, but the computer
programs. Now, this is the way actually, we define the knowledge based system or the
expert system.

(Refer Slide Time: 13:00)

Now, here, I am just going to put some notes. These things I have already mentioned: in an expert system, we try to simulate the human reasoning used to solve that particular problem, but we do not simulate the domain of the problem; that means, we do not try to find out the physics or the differential equation of that particular problem.

In an expert system, we take the help of some heuristic or approximate methods; we do not use the principle of hard computing here and, in fact, we do not use any statistical method or purely algorithmic approach either. So, what we do in an expert system is design and develop the knowledge base, so that the system can perform in the optimal sense, as the situation demands. Now, if you see the construction of the knowledge base, it looks like this.

(Refer Slide Time: 14:23)

Now, construction-wise, an expert system consists of a few components. There will be some inputs; we will have to provide the inputs, and our aim is to determine the outputs. Inside, we have got the inference engine and the knowledge base, and this knowledge base consists of the data base and the rule base. Now, the concepts of data base, rule base and knowledge base we have already discussed in much more detail. Now, just to summarize.

So, regarding the concept of the data base, I should say that the data base consists of the numerical values used to represent the physical parameters. While discussing the fuzzy reasoning tool, we have discussed this data base, that is, the membership function distributions, in much more detail. For example, the height could be low, it could be medium, or it could be very high. Now, what should be the range of the height so that we can say it is low, and what should be the range so that we can say it is medium? So, it has got some numerical values.

Now, those values are going to constitute what we mean by the data base for the different variables. Then comes the concept of the rule base and, as I have already discussed, a particular rule is nothing but the relationship between the inputs and the outputs: if this is the set of inputs, what should be the set of outputs, something like this. To design a particular rule, we take the help of the data base. As I told, a rule is nothing but a known input-output relationship, and in a rule base, we have got a large number of rules.

Now, we have got the inference engine, and the function of the inference engine is to determine which part of the knowledge base will be activated just to give a reply to a set of queries. In other words, if you pass a set of inputs and you want to determine what the output should be, which part of the knowledge base is going to be activated to give the reply or answer will be decided by the inference engine. So, that is the function of the inference engine. Now, let me take a very simple example: whenever in a class the students ask questions or queries to the teacher, the teacher uses his or her knowledge base to give a reply to that particular query.

So, a certain part of this knowledge base is activated and the teacher gives the reply or answer to the queries of the student, and in fact, his or her inference engine helps decide which part of the knowledge base should be activated to reply to a set of questions. So, this is the way an inference engine works. By an expert system, we mean it should have the inputs, it should have the inference engine and the knowledge base, which consists of the data base and rule base, and it should have the outputs.
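To make this structure a little more concrete, here is a minimal sketch in Python of a rule-based expert system for the air-conditioner example. The linguistic ranges, the rules and the variable names are purely illustrative assumptions made for this sketch, not the ones used in the lecture; a fuzzy version would replace the crisp ranges with membership functions.

# Minimal sketch of a rule-based expert system (illustrative values only).

# Data base: numerical ranges that define the linguistic terms.
DATA_BASE = {
    "temperature": {"low": (0, 20), "medium": (18, 28), "high": (26, 45)},
    "humidity":    {"low": (0, 40), "medium": (35, 70), "high": (65, 100)},
}

# Rule base: known input-output relationships
# (IF temperature is ... AND humidity is ... THEN valve_angle is ...).
RULE_BASE = [
    ({"temperature": "high",   "humidity": "high"},   {"valve_angle": "large"}),
    ({"temperature": "medium", "humidity": "medium"}, {"valve_angle": "medium"}),
    ({"temperature": "low",    "humidity": "low"},    {"valve_angle": "small"}),
]

def in_range(variable, term, value):
    """Check whether a crisp value falls inside the range of a linguistic term."""
    lo, hi = DATA_BASE[variable][term]
    return lo <= value <= hi

def inference_engine(inputs):
    """Decide which part of the knowledge base is activated for the given inputs."""
    conclusions = []
    for antecedent, consequent in RULE_BASE:
        if all(in_range(var, term, inputs[var]) for var, term in antecedent.items()):
            conclusions.append(consequent)
    return conclusions

# Example query: temperature 30 deg C, relative humidity 80 percent.
print(inference_engine({"temperature": 30, "humidity": 80}))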

Now, how do we design this particular knowledge base, that is, the data base and the rule base? To design this knowledge base, we take the help of the principle of soft computing. We can use some fuzzy logic-based expert system or we can develop some neural network-based expert system; and to gather and retain this knowledge base, we use the structure of neural networks or the structure of some sort of fuzzy reasoning tool. Now, here I just want to take another example, just to understand what we mean by the knowledge base. The participants who are going to take this particular course, as a credit course or audit course or whatever it may be, take it with the purpose of developing their knowledge base in this particular field. So, after attending this course, there is a possibility that some sort of knowledge base will be designed and developed in the head of the participant, and the participants will be able to handle this type of problem in a very efficient way; that is actually the purpose of taking this particular course.

(Refer Slide Time: 20:14)

Now, if you see, why do you need an expert system, this I have already mentioned that if
I have got a complex real-world problem. So, we cannot find out the differential
equation, we cannot use the principle of hard computing and to get some acceptable
solution, we will have to take the help of some sort of expert system and to develop the
expert system or to develop the knowledge base of the expert system, we take the help of
some sort of soft computing.

Now, our experience says that if this particular expert system is developed in a very efficient way, there is a possibility that it is going to give some solutions which are a bit difficult to foresee beforehand. Let me take a very simple example: supposing that we have designed and developed one fuzzy logic-based or neural network-based expert system for an intelligent robot. Now, there could be a possibility that while training that particular robot, a few situations, a few problem scenarios, had not been considered.

But, there is a possibility that this fuzzy logic-based expert system or the neural network-based expert system can give rise to some sort of adaptive solution. That particular adaptive solution is a bit difficult to foresee beforehand, but if this expert system is properly designed, there is a possibility that it can provide some feasible solutions even to slightly unknown situations. So, expert systems are very useful, very effective, and they have got a lot of applications.

(Refer Slide Time: 22:25)

Now, if you see the applications of this particular expert system, we will see that we are using expert systems to solve a variety of problems. I am just going to take a few examples, but before that, let me remind you that, as I have already mentioned, to design and develop the knowledge base of the expert system, we are going to take the help of some sort of soft computing.

And, which we have already discussed, we have already discussed the principle of fuzzy
logic. We have already discussed the working principle of different types of neural
networks. We have also discussed their combined tools and in fact, we have already
learned the principle of soft computing. And, that principle of soft computing can be
used very efficiently to design and develop the expert system.

Now, this particular area, if you see. This is coming under the umbrella of your
knowledge engineering. So, in short this is known as your KE and this is also known as
applied artificial intelligence. So, applied AI and this is also coming under the purview
of computational intelligence, which is very popularly known as CI, the principles of
which we have already discussed, those are nothing, but the principles of computational
intelligence. Those are nothing, but the principles of applied artificial intelligence and
that is also coming under the umbrella of knowledge engineering.

(Refer Slide Time: 24:16)

Now, let us see a few applications of some already developed expert systems or knowledge-based systems. If you see the literature, we have got a large number of expert systems available; for example, you might have heard about the expert system known as DENDRAL.

So, DENDRAL is actually a very popular expert system, which has been used to determine the structure of chemical compounds from their constituent elements. Supposing that we have got a set of constituent elements; if we use that set of constituent elements under different operating conditions, there is a possibility that we may get different types of chemical compounds. This particular DENDRAL is a very popular expert system or knowledge based system, and its aim is to determine the structure of chemical compounds for a set of constituent elements.

Now, using this DENDRAL, people could actually discover a number of unknown structures of chemical compounds. Then comes another very popular expert system or knowledge based system, which is known as MYCIN. MYCIN is a very popular expert system used in the field of medical science. The aim of this expert system is to diagnose infectious blood diseases, and it also recommends a list of therapies to the patients. So, this is also a very popular expert system used in medical science.

Now, another expert system: I hope you have already heard about IBM’s Deeper Blue. This IBM’s Deeper Blue is actually one expert system, which was developed in the year 1997, and this expert system could defeat the world chess champion Garry Kasparov. Now, this is nothing but an expert system. So, IBM’s Deeper Blue is nothing but an expert system, which could even defeat Garry Kasparov.

Now, this particular problem of chess playing is actually a problem of a static environment. If you see the chess board, we have got fixed locations for the different pieces, and we try to play in such a way that we can win. So, the environment is known to the players, and there is a lot of calculation and a lot of thinking so that one player can win that particular game. This particular thinking process has been modeled, and this expert system was developed by IBM, and it could defeat Garry Kasparov.

Now, today we in fact talk about the dynamic environment, and how to design and develop the expert system for a dynamic environment is actually a very difficult task. For example, we might have heard about soccer playing games or soccer playing robots, and that is nothing but the multi-agent system of robotics, that is, MAS. Now, this multi-agent system of robotics could be either centralized or decentralized.

Now, if it is centralized, then there will be one controller and one main computer, which is going to control the movement of the different agents; agents are nothing but intelligent robots. If it is decentralized, there is no centralized control and each of the robots is intelligent; they are able to take their own decisions, they are going to interact, and ultimately they are going to reach their goal. Now, this multi-agent system of robotics, particularly if you consider the decentralized system, is a bit difficult. Soccer playing robots can be considered as something more towards a decentralized multi-agent system of robotics, although it has got some centralized component also; but it is more towards decentralized, and that is actually a very complex problem.

Now, to solve this soccer playing games problem, or the decentralized multi-agent system in robotics, we can take the help of some sort of knowledge based system or expert system. This knowledge based system or expert system can be developed using the principle of fuzzy logic and the principle of neural networks, which we have already discussed in much more detail, and we have solved a number of numerical examples also.

Thank you.

Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 39
A Few Applications

I have explained the principles of fuzzy logic and neural networks with the help of some
numerical examples. Now, we are in a position to discuss a few applications of these
tools and techniques. Now, if you see the literature, a huge literature is available on
various applications of fuzzy logic and neural networks in different areas like the general
science, engineering science, commerce, and so on.

Now, time is short. So, I will not be able to discuss a large number of applications.

(Refer Slide Time: 01:12)

So, what I have decided, I will be concentrating mainly on two applications related to
intelligent and autonomous robots like how to design and develop intelligent and
autonomous robots using the principle of fuzzy logic and neural networks. And, then I
am going to concentrate on intelligent data mining. So, how to extract useful information
from a data set using the principle of fuzzy logic and neural networks. So, let me first
concentrate on the intelligent and autonomous robot like how to design and develop
intelligent and autonomous robots.

Now, before I start with this particular application, let me tell you one thing. You need not worry if you have not taken any course on robotics or if you do not have the fundamental knowledge of robotics; the fundamentals of robotics are not required here. In fact, without discussing the principles of robotics, I am going to discuss how to make a robot intelligent and autonomous using tools and techniques like fuzzy logic and neural networks.

(Refer Slide Time: 02:34)

Now, let us try to see what we mean by an intelligent robot and what an autonomous robot is. By definition, an intelligent robot is a robot which will be able to take decisions as the situation demands. This robot could be either a robot with a fixed base, like a manipulator, or a mobile robot such as a wheeled robot, a multi-legged robot like a six-legged or four-legged robot, or even a biped; there could be tracked vehicles, there could be drones, and so on.

Now, if I want to make it intelligent, I will have to add a few features to this particular robot. These features I am going to discuss, along with how to develop them using the principles of fuzzy logic and neural networks. Before I proceed further, let me try to explain what we mean by an autonomous robot. To start with, let me tell you that there is a basic difference between an intelligent robot and an autonomous robot. Autonomous robots are those intelligent robots which are having the permission to act in an intelligent way; that means, all autonomous robots are intelligent robots, but all intelligent robots may not be autonomous robots. Now, here in this discussion, I am just going to concentrate only on the intelligent robot.

Now, to develop this intelligent robot, what we will have to do is, we will have to design
and develop some sort of adaptive motion planner and then adaptive controller. And, to
develop this adaptive motion planner and controller, the principle of artificial
intelligence, that is, AI or the computational intelligence, that is, CI is to be merged with
robotics. Now, if you see the literature, a huge literature is available on soccer playing
robots. The main purpose of the soccer playing robots is actually how to design and
develop adaptive motion planner, adaptive controller for the intelligent robots.

Now, these soccer playing robots actually constitute one intelligent robotic system, which is called the multi-agent system of robotics; in short, this is known as MAS. This multi-agent system of robotics could be either centralized or decentralized. If it is centralized, then it is having one central computer to control the activities of the different intelligent robots. On the other hand, if it is decentralized, there is no such centralized control; all the robots are intelligent, all are agents, and the collective activities of these particular agents are nothing but the decentralized multi-agent system of robotics.

Now, among these agents, or intelligent robots, there will be competition, there will be cooperation and, of course, they are having one goal, and that particular goal has to be fulfilled. This particular robotic system is a bit difficult, and designing a suitable motion planner and controller is a bit complicated. Now, you might have heard about the RoboCup. The RoboCup is nothing but a competition of soccer playing robots, and the goal of the RoboCup has been set as follows: by the mid twenty-first century, a team of autonomous humanoid robots shall beat the human world champion team under the official regulations of FIFA; that means, a team of intelligent humanoid robots should be able to defeat the world-cup champion team by following the regulations of FIFA. Now, this is not an easy task.

Now, if you want to reach that particular milestone, there should be a lot of activities, particularly on how to design and develop the humanoid robots, which I am not going to discuss. But once you have designed and developed the humanoid robots, how to make them intelligent and, if required, autonomous, so that they can serve the purpose and beat the human world-cup champion football team, is, as I told, a very complicated task. Now, what I am going to do here is take a very simple example, and let us try to see how to tackle that particular problem. If we can tackle that particular problem, then with some modification we can also take up the problem related to the decentralized multi-agent system of robotics.

(Refer Slide Time: 08:19)

Now, let us try to concentrate on a problem which is very simple, I should say, and that is related to how to design and develop the intelligent robots. If we want to make the robot intelligent, we will have to concentrate, for example, on how to develop the adaptive motion planner. Now, the purpose of the motion planner is to take the decision as the situation demands.

Now, if this is the set of inputs, how do we find out the decision, that is, the output decision, so that the robot can tackle that particular situation in a very effective way? The next is the adaptive controller, which I am going to discuss in detail. At each of the robotic joints, we use some motors, called DC motors, and for each of the motors, there will be a controller. I will be discussing this in detail; we generally use a PD controller or a PID controller, and we can find out what should be the adaptive values for the gains, that is, K_P, K_I and K_D.

Then, we will concentrate on robot vision or computer vision and how to use the principles of fuzzy logic and neural networks so that the robot can visualize the different objects lying in the environment. On this principle of robot vision or computer vision, I will be giving a brief introduction, and then I will try to find out the regions where we can use the principles of fuzzy logic and neural networks to tackle this problem in a very efficient way.

Now, if we concentrate on the biped robot, the humanoid robot, then of course, we will
have to concentrate on adaptive gait planning or adaptive gait generation. So, we will be
discussing like how to plan the adaptive gaits depending on the environment or
depending on the requirement. So, I will be concentrating one after another these
particular problems. Now, let me first start with the adaptive motion planner. Now, this I
have already mentioned that the purpose of motion planning is to determine the course of
action, for example, say if I want to find out the collision-free path for a particular robot,
the robots should not collide while moving from a particular location to another location.
And, to ensure that collision-free movement of this particular robot with the moving
obstacles, we will have to make some strategies, we will have to make some planning.

Now, let us see, how to tackle this type of problem. So, we are going to concentrate on
how to design and develop the adaptive motion planner using the principle of fuzzy logic
and neural networks.

(Refer Slide Time: 11:41)

Now, to start with, let me take a very simple example of one mobile robot, and this particular mobile robot is nothing but a two-wheeled robot. So, this is the robot, the two-wheeled robot: here I have got one wheel, I have got the second wheel here, and for support I have got a caster here. A caster is actually one wheel, but it has got no drive unit. And, supposing that to operate these two wheels, I am using two DC motors.

Now, let us see how to make a plan so that it can avoid collision while moving from an initial position to the final position. Supposing that this is the initial position of the robot; let me consider the cg of this particular two-wheeled differential drive robot, and supposing that this is the goal for this particular robot. Now, if you imagine that there is no obstacle in this particular environment, then, starting from here, if you try to find out the collision-free, time-optimal path, very easily it will find out this direction or this path as the collision-free, time-optimal path. So, the robot should be able to reach the goal in minimum time by avoiding any collision; but if there is no obstacle, there is no question of collision.

Now, supposing that in this particular environment, there are a few moving obstacles for
example, say here I have got one moving obstacle say, O_1. Another moving obstacle,
say O_2. Then, we have got O_3, O_4 and O_5. So, here, I am just going to consider
five moving obstacles and these show actually the directions of movement.

Now, if you want to solve this particular problem, the first thing we will have to do is, at a particular time, say t equals t_1, make the plan. So, I will have to find out the predicted positions of these particular moving obstacles, and supposing that I am just going to make the planning for time t equals t_1. So, this is the initial position of the robot.

Now, at this particular position, it will try to find out the most critical obstacle. How to find out the most critical obstacle? To find out the most critical obstacle, what it will have to do is find out the distance between the robot and the moving obstacles. So, we try to find out the distance between the robot and each obstacle, and at the same time, we try to see the direction of movement of these particular obstacles. So, we decide the most critical obstacle by considering the distance between the robot and the obstacle and by considering the direction of movement of this particular obstacle. Now, supposing that this is the most critical obstacle.

So, this is the most critical obstacle, and based on that, I will have to make the planning. How to make the planning? Now, this is the initial position, this is the final position and this is the most critical obstacle. What we do is, we try to find out the distance between the robot and the most critical obstacle, and this distance we consider as one of the inputs for the motion planner. Another input will be the angular information, that is, the angle between the goal, the present position of the robot and this particular obstacle. So, we consider this particular angle as one of the inputs. We have identified, in fact, two inputs for this particular process: one is the distance between the robot and the most critical obstacle, and another is the included angle.

So, the second input is the angle between the goal, the present position of the robot and this particular obstacle. And, what should be the output? The output is actually the angle through which the robot should move to avoid collision with this most critical obstacle.

Now, supposing that this is the angle of deviation: if the robot wants to avoid collision with this particular most critical obstacle, supposing that the robot should deviate by this particular angle. So, one output is nothing but the deviation angle, and there should be another output, that is, the speed of this particular robot; that means, the speed or the velocity with which this particular robot is moving to avoid the collision.

Now, the speed is decided by the acceleration. So, we try to find out the acceleration of
this particular robot as one of the outputs. So, this is nothing, but the motion planner and
there are two inputs like one is the distance, another is the angle and there are two
outputs. One is the deviation, another is nothing, but the acceleration. Now, let us see
how to model this particular problem with the help of fuzzy logic system or how to
model with the help of some neuro-fuzzy system.
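Before moving to the fuzzy model, here is a small, hedged sketch of how the two inputs can be computed from positions in the plane. Choosing the nearest obstacle as the most critical one is a simplification made only for this sketch; as stated above, the actual criterion also considers the obstacle's direction of movement.

import math

def planner_inputs(robot, goal, obstacles):
    """Compute the two motion-planner inputs for the most critical obstacle.

    robot, goal: (x, y) positions; obstacles: list of (x, y) positions.
    The nearest obstacle is taken as the most critical one in this sketch.
    """
    critical = min(obstacles, key=lambda o: math.dist(robot, o))
    distance = math.dist(robot, critical)
    # Included angle at the robot between the goal direction and the obstacle direction.
    ang_goal = math.atan2(goal[1] - robot[1], goal[0] - robot[0])
    ang_obs = math.atan2(critical[1] - robot[1], critical[0] - robot[0])
    angle = abs((ang_obs - ang_goal + math.pi) % (2 * math.pi) - math.pi)
    return distance, math.degrees(angle)

print(planner_inputs(robot=(0.0, 0.0), goal=(10.0, 0.0),
                     obstacles=[(3.0, 1.0), (6.0, -4.0)]))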

(Refer Slide Time: 18:06)

Now, before I go for that, let me mention that whatever I stated can be treated as a minimization problem, that is, an optimization problem, where the aim is to minimize the travelling time. So, our aim is to minimize the travelling time subject to the condition that the path is collision-free, that is, there should not be any collision between the robot and the obstacles, and the kinematic and dynamic constraints are to be satisfied. Now, as I told, this is actually a car-like robot, and if it is a car-like robot, we will have to consider the non-holonomic constraint. I am not going to concentrate on this non-holonomic constraint and all such things related to robotics.

Actually, this course does not permit us to discuss all such things in detail, so I am not going to discuss them. But let me tell you that whenever you are going to make some planning for this particular movement, the kinematic constraints, like the non-holonomic constraint, and the dynamic constraints, like the generated torque or force, should lie within their pre-specified ranges.
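One common way to write the whole planning problem down formally is sketched below. The symbols d_safe, v_max and tau_max are assumed bounds introduced only for this sketch, and phi denotes the heading of the car-like robot; the exact constraint set used in the lecture's experiments is not spelt out here.

\begin{align*}
\text{minimize}\quad & T \;\; (\text{total travelling time})\\
\text{subject to}\quad & d\big(\text{robot}(t),\, O_i(t)\big) \ge d_{\text{safe}} \quad \forall\, i,\ \forall\, t \in [0, T] && \text{(collision-free path)}\\
& \dot{x}\sin\phi - \dot{y}\cos\phi = 0 && \text{(non-holonomic constraint)}\\
& |v(t)| \le v_{\max}, \qquad |\tau(t)| \le \tau_{\max} && \text{(kinematic and dynamic limits)}
\end{align*}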

(Refer Slide Time: 19:38)

Now, after maintaining all such constraints, I will have to make some planning for the movement of this particular robot. This schematic view shows how to tackle this particular problem using the structure of the neuro-fuzzy system based on the Mamdani approach. This I have already discussed in detail while discussing the neuro-fuzzy system, the Mamdani approach. Now, you see, the physical problem which I discussed can be tackled very efficiently by using this particular structure of the neuro-fuzzy system, which, as we have already discussed, consists of five layers: layer 1 is the input layer, layer 2 is the fuzzification layer, layer 3 is the AND operation layer, layer 4 is the inference engine layer and layer 5 is nothing but the output layer.

Now, here I am passing two inputs: one is the distance and another is the angle, and ultimately I will be getting two outputs: one is the angle of deviation and another is the acceleration. Its working principle I have already discussed, and we have solved some numerical examples also, so I am not going to discuss this particular neuro-fuzzy system again. What we will have to do is design and develop it as an adaptive motion planner.

Now, depending on the inputs, like distance and angle, we will try to find out what should be the angle of deviation and what should be the acceleration of the robot. Using the information on acceleration, as I told, we can find out what should be the speed or the velocity. So, this is how to use a fuzzy reasoning tool based on the Mamdani approach using the structure of a neural network. Now, this particular neuro-fuzzy system we can train with the help of some nature-inspired optimization tool, like a genetic algorithm.
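To make the Mamdani-type reasoning concrete, here is a minimal sketch of the rule firing and a simplified (height-style) defuzzification for the two inputs and two outputs. All membership function ranges, rule conclusions and representative output values are assumptions made for this sketch; the trained system discussed in the lecture would have its own data base and rule base, and a full Mamdani implementation would defuzzify the clipped output fuzzy sets rather than weighted representative values.

def tri(x, a, b, c):
    """Triangular membership function with peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Illustrative membership functions for the two inputs (ranges are assumptions).
DIST = {"near": (0, 0, 2), "far": (1, 4, 4)}            # metres
ANGLE = {"small": (0, 0, 60), "large": (30, 90, 90)}    # degrees

# Representative crisp values of the output terms (assumed).
DEV = {"large": 60.0, "medium": 30.0, "small": 5.0}     # deviation, degrees
ACC = {"low": 0.1, "medium": 0.3, "high": 0.6}          # m/s^2

# Rule base: IF distance is ... AND angle is ... THEN deviation is ..., acceleration is ...
RULES = [
    (("near", "small"), ("large", "low")),
    (("near", "large"), ("medium", "medium")),
    (("far", "small"), ("small", "medium")),
    (("far", "large"), ("small", "high")),
]

def motion_planner(distance, angle):
    num_dev = num_acc = den = 0.0
    for (d_term, a_term), (dev_term, acc_term) in RULES:
        w = min(tri(distance, *DIST[d_term]), tri(angle, *ANGLE[a_term]))  # AND = min
        num_dev += w * DEV[dev_term]
        num_acc += w * ACC[acc_term]
        den += w
    if den == 0.0:
        return 0.0, 0.0      # no rule fired
    return num_dev / den, num_acc / den

print(motion_planner(distance=1.2, angle=20.0))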

Now, if you use a genetic algorithm, through a large number of iterations it will try to evolve this particular neuro-fuzzy system. The evolved neuro-fuzzy system can then be implemented online at the different steps of this particular motion planning problem. As I told, this motion planning problem will be solved step-wise: at time t equals t_1, I will try to find out what should be the direction for that step, and once again at time t equals t_2, I will have to re-plan with the help of this trained neuro-fuzzy system.

So, it is possible, and we can tackle this type of problem very easily; we carried out some real experiments also. I will show you how these experiments were carried out, and we will see that we can develop this type of neuro-fuzzy system as the adaptive motion planner.

(Refer Slide Time: 22:47)

Next comes the same problem, tackled with the help of ANFIS. Once again, the ANFIS has got layer 1, layer 2, layer 3, layer 4, layer 5 and layer 6, and the working principle of ANFIS we have already discussed in detail, and we have solved some numerical examples; so, I am not going to discuss the working principle of this particular ANFIS. The thing which I am going to discuss is: if I pass a set of two inputs, like distance and angle, there is a possibility that I will be getting the deviation and acceleration for this particular robot, so that it can avoid collision with these particular moving obstacles.

Now, here, as I told, once again I can use a genetic algorithm; that is, I can develop this GA-ANFIS. The GA will try to evolve this particular neuro-fuzzy system based on Takagi and Sugeno's approach, that is nothing but ANFIS. So, this particular evolved ANFIS will be able to tackle the motion planning problem in a very adaptive and efficient way.
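Here is a hedged skeleton of how such offline GA tuning can be organised. The parameter encoding, the population settings and, in particular, the simulate() fitness evaluation are placeholders invented for this sketch; in a real study the fitness would come from running the Mamdani or ANFIS planner over many training scenarios and measuring travelling time and collisions.

import random

def simulate(params):
    """Placeholder: run the planner with these parameters on training scenarios
    and return (travel_time, number_of_collisions)."""
    return sum(abs(p) for p in params), 0

def fitness(params):
    travel_time, collisions = simulate(params)
    return travel_time + 1000.0 * collisions        # heavy penalty on collisions

def genetic_algorithm(n_params=20, pop_size=30, generations=50):
    pop = [[random.uniform(-1, 1) for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]               # selection (truncation)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_params)      # single-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(n_params)           # mutation of one gene
            child[i] += random.gauss(0.0, 0.1)
            children.append(child)
        pop = parents + children
    return min(pop, key=fitness)

best = genetic_algorithm()
print("best fitness:", fitness(best))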

(Refer Slide Time: 24:17)

So, this shows how to use the fuzzy reasoning tool with the structure of a neural network, which works based on Takagi and Sugeno's approach.

Now, once we developed these particular algorithms, we tried to implement them in a real experiment. Let us see how we carried out the real experiment just to implement this type of adaptive motion planner. Before I proceed, let me tell you that we got one project from the Department of Science and Technology, Government of India, and in this DST project we implemented this real experiment. So, DST, Government of India, funded this particular project.

Now, let us see how to tackle this particular problem. Here you see, I have got a field, and inside this particular field, I have got one robot and one moving obstacle. Let us see how this particular robot can avoid collision with this particular moving obstacle. To tackle this particular problem, the first thing we will have to do is capture information of the environment. Now, how to capture this information of the environment? To capture the information of the environment, we will have to use some camera.

Now, here we have used one overhead camera; similarly, there could be an onboard camera also. With the help of this overhead camera, what we can do is take a snap of this particular environment at a regular interval, so we will be getting some pictures. The picture collected with the help of this CCD camera will pass through the BNC video cable and enter the CPU, that is, the computer. So, here we have got the CPU, and in the CPU we have got one vision board, that is, the hardware to carry out the image analysis. So, inside this particular CPU, we had the vision board; as I told, this is nothing but the hardware to implement the computer vision or the robot vision.

Now, with the help of this vision board, we could carry out the image analysis, and through this image analysis, we can find out the information of this particular environment; that means, we can find out the position of this particular robot and we can also find out the position of this particular obstacle. Once we have got the positions of the robot and the obstacle, we are in a position to find out the distance input, that is, the distance between the robot and the obstacle; and as the robot has got a goal, with respect to the goal we can also find out the angle input. Once we have got those, we are going to use the neuro-fuzzy-based motion planner just to find out what should be the angle of deviation and how much should be the acceleration of this particular robot.

So, we have got this particular information, that is deviation of this robot and another is
nothing, but the acceleration. Now, this particular information, we find out
corresponding to a particular time step or a distance step; that means, I am just going to
do this planning at time t equals t_1 for a small distance, that is nothing but d_1 and once
again, I will have to re-plan at time t equals to t_2 corresponding to another distance
step, say d_2 and so on.

Now, supposing that we have got the deviation and acceleration, we will have to achieve this particular deviation and acceleration. How to achieve them? As I have already mentioned, we carried out this particular experiment on a two-wheeled, one-caster robot, and that is nothing but a two-wheeled differential drive robot.

So, to operate this particular robot, we had two DC motors, which are going to generate the movement of the wheels, and for these DC motors, there should be a controller.

Now, we use a PID controller just to control these particular motors. What we need is to calculate how much torque should be generated by the motors mounted at the two wheels, and how much should be the rpm, so that we can achieve this particular angle of deviation and this particular value of acceleration of the robot; that can be determined mathematically or analytically. Once we have got this information, that is, what should be the torque and what should be the rpm of the two motors mounted at the two wheels, we will have to implement it. How to implement it? This information, that is, what should be the rpm of the right wheel and what should be the rpm of the left wheel, we will have to pass to the controllers of the motors. And how to pass it? We took the help of a radio frequency module, that is, the RF module, which provides wireless communication; through this RF module, the information goes to the controller, and the controller controls the movement of the robot. Ultimately, we will be getting the accurate movement of this particular robot; that means, we will be able to develop the angle of deviation, and we will be able to follow the prescribed values for the acceleration or speed of the robot.
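As a hedged illustration of the step from the planner outputs to the two wheel commands, here is a minimal sketch using the standard differential-drive kinematics. The wheel radius, track width, time step and current speed are assumed numbers, not the parameters of the actual experimental robot.

import math

def wheel_rpm(deviation_deg, acceleration, v_current,
              dt=1.0, wheel_radius=0.05, track_width=0.3):
    """Convert the planner outputs into left/right wheel rpm (differential drive)."""
    v = v_current + acceleration * dt                 # new linear speed, m/s
    omega = math.radians(deviation_deg) / dt          # turning rate, rad/s
    v_left = v - omega * track_width / 2.0            # left wheel speed, m/s
    v_right = v + omega * track_width / 2.0           # right wheel speed, m/s
    to_rpm = 60.0 / (2.0 * math.pi * wheel_radius)    # m/s -> rev/min
    return v_left * to_rpm, v_right * to_rpm

print(wheel_rpm(deviation_deg=15.0, acceleration=0.2, v_current=0.4))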

Now, this is the way we implemented and carried out the real experiment. This is a very simple experiment; to make it more complex, we will have to consider more moving obstacles, or we can consider more mobile robots. If I consider more mobile robots, that is, multiple robots, this once again becomes the multi-agent system of robotics, which I introduced a little earlier. So, ultimately the problem related to the multi-agent system of robotics has to be solved, and that is not an easy problem. Let me tell you one thing here: all such activities are to be done within a fraction of a second, otherwise there is a chance of collision.

So, all such things, all such steps we will have to implement within a fraction of second,
then only we will be able to carry out this particular experiment.

Now, as I told that this particular training of the motion planner is carried out actually
offline, because if you take the help of some optimizer, it will be computationally very
expensive. So, online training could be difficult. So, within a fraction of second, we may
not get that particular the information and that is why, for this particular motion planner,
the adaptive motion planner, the training is provided offline.

Now, if we can implement the online training that will be much more interesting,
interesting in the sense like supposing that this is the field and we have got a large
number of the robots, the planning robots and we have got a few obstacles also, all are
moving. Now, initially actually there could be a few collisions. Now, from these
collisions and the experience, it will gain some information and if you can take the
feedback and implement that type of training scheme, there is a possibility that we can
implement the principle of online training.

So, by online training, we mean the robot will be working and while working actually it
will try to learn from the environment from its success and failure.

(Refer Slide Time: 33:48)

And, that is actually, what do you mean by the online learning of the motion planner.
Now, if we can implement this particular online learning of the motion planner that will
be much more interesting because for this offline training, there could be a few
scenarios, which could be bit difficult to foresee beforehand and there actually this
online training is going to help a lot.

So, the same principles of fuzzy logic and neural networks can be utilized to implement the online learning of this particular motion planner. Next is the decentralized multi-agent system of robotics, which I have already discussed; let me once again discuss it a little bit. This decentralized multi-agent system of robotics is a bit difficult to implement. Now, supposing that I have got two teams of soccer playing robots, and each team, say for simplicity, consists of only six players.

Now, each of the six players is having its own goal, and that is dependent on the main goal of the team. What is the main goal of the team? It is to win that particular match, or to score more goals. To win this particular match or to score more goals, they will have to follow some strategy, and these robots will have to work accordingly.

There will be competition with the opponents and there will be cooperation among the team mates, and through this competition and cooperation, these multiple agents, that is, the intelligent robots, are going to reach that particular goal, that is, to win that particular game. This particular principle of cooperation and principle of competition can be implemented very efficiently using the principles of fuzzy logic and neural networks.

Thank you.

Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 40
A Few Applications (Contd.)

(Refer Slide Time: 00:15)

Now, we are going to discuss how to design and develop adaptive controller. Now, in
robotics, what we do is, at each of the joints, we use some DC motor and to control the
motor we use some controller. Now, I have already mentioned about this PID controller,
that is, your proportional, integral and derivative controller. Now, in this PID controller
actually, what we do is, we try to find out some gain values like K_P, then comes your
K_I and K_D. So, K_P is nothing but the proportional gain, K_I is the integral gain and
K_D is nothing but the derivative gain and I have already mentioned that by using some
principle, mathematically we can find out like what should be the values for these
particular your K_P, K_I and K_D.

Now, before I discuss further, let me tell you how to determine the different components of a robotic joint torque. Once again, if you have not taken any course on robotics, or if you do not have any fundamental knowledge of robotics, you need not worry much. The topic which I am going to discuss is how to use the principles of fuzzy logic and neural networks to design and develop the adaptive controller.

So, you simply try to understand how to use the principle of fuzzy logic and neural
networks just to design and develop the adaptive controller. You need not understand
100 percent of the working principle of a PID controller, but let me spend some time on
that, so that you can get some idea. But, my main emphasis is how to design and develop
the fuzzy logic-based or the neural networks-based adaptive controller.

(Refer Slide Time: 02:26)

Now, if you see the joint torque of a robot, it has got some components:

\tau = D(\theta)\ddot{\theta} + h(\theta, \dot{\theta}) + C(\theta).

Now, if you see this particular expression, this is actually the joint torque: if I want to operate the robotic joint with the help of a DC motor, the motor will have to generate this amount of torque. It has got three components, which are as follows: D(\theta)\ddot{\theta} is nothing but the inertia term, h(\theta, \dot{\theta}) is nothing but the Coriolis and centrifugal term, and C(\theta) is nothing but the gravity term.

Now, I am not going into the derivation of these particular terms; you would get a very big expression, which is discussed in a robotics course but not in this particular course. The thing which I am going to tell you is how to generate this particular \tau with the help of a motor using the principle of the adaptive controller.
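As a small, hedged illustration of what these terms look like, consider a single uniform link of assumed mass and length rotating in a vertical plane about its joint; with only one joint there is no Coriolis or centrifugal coupling, so the expression reduces to an inertia term plus a gravity term. The numbers below are made up for the sketch.

import math

# One-link illustration (assumed values): uniform link of mass m and length l,
# pivoted at the joint, with theta measured from the horizontal.
# D(theta) reduces to the constant inertia I = m*l**2/3, h(.) vanishes
# and C(theta) = m*g*(l/2)*cos(theta).

def joint_torque(theta, theta_ddot, m=2.0, l=0.5, g=9.81):
    inertia = m * l**2 / 3.0
    gravity = m * g * (l / 2.0) * math.cos(theta)
    return inertia * theta_ddot + gravity

print(joint_torque(theta=math.radians(30.0), theta_ddot=1.0))   # torque in N*m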

(Refer Slide Time: 03:50)

Now, generally we use different types of control schemes, but out of all the control schemes, the most popular one is the partitioned control scheme. In the partitioned control scheme, what we do is use the formula

\tau = \alpha \tau' + \beta,

where \tau is the torque to be generated by the motor mounted at the robotic joint. Here, \alpha is nothing but the inertia term, that is, D(\theta), which depends on the cross section of the robotic link and on the moment of inertia of this particular robotic link; I am not going into a detailed discussion of this. And

\beta = h(\theta, \dot{\theta}) + C(\theta) + F(\theta, \dot{\theta}).

Now, let us try to concentrate on how to determine this particular \tau'; that is actually our main aim. So, how do we generate this particular \tau', so that the motor mounted at the robotic joint can generate that particular motion, that is, the joint angle \theta, accurately and can provide the required torque?
(Refer Slide Time: 05:28)

Now, let us try to concentrate on the control scheme, that is, how to implement that particular \tau' using the PD control law, that means, the proportional and derivative control law. Then \tau' is nothing but \ddot{\theta}_d plus K_P multiplied by E, plus K_D multiplied by \dot{E}, where \dot{E} is the rate of change of error:

\tau' = \ddot{\theta}_d + K_P E + K_D \dot{E}.

Now, \ddot{\theta}_d is nothing but the desired angular acceleration. Let us see how this particular \tau' can be generated. If I use the PID controller, that is, proportional, integral and derivative control, then

\tau' = \ddot{\theta}_d + K_P E + K_I \int E \, dt + K_D \dot{E}.

Now, this K_I is nothing but the integral gain, K_P is the proportional gain and K_D is nothing but the derivative gain. Now, let us see how to implement the architecture of this particular controller.

(Refer Slide Time: 07:08)

So, you can see that this is actually the control architecture. Here, our aim is to generate this \ddot{\theta}_d, that is, the desired acceleration, and here we have got one summing junction.

So, we have got a summing junction here, and this \tau' will have to be generated. You can see that we have got this particular D(\theta); this D(\theta) is nothing but \alpha, as I discussed, and \beta = h(\theta, \dot{\theta}) + C(\theta) + F(\theta, \dot{\theta}), and if you remember, \alpha \tau' + \beta is nothing but the torque.

Now, this particular torque, that is, \alpha \tau' + \beta, has to be generated by the motor, and the controller is going to help. If I put one mechanical load here, I will be getting some angular displacement \theta and angular velocity \dot{\theta} at the output of that particular motor. What we will have to do is measure these particular \theta and \dot{\theta}; for example, we can use some optical encoder to find out this \theta and \dot{\theta}. These \theta and \dot{\theta} will be brought to these particular junctions, and \theta will be compared with \theta_d, and we will be getting the error. So, this error is nothing but E = \theta_d - \theta, and similarly, here you can find out \dot{E} = \dot{\theta}_d - \dot{\theta}.

So, this particular \dot{\theta} is brought here for the purpose of comparison, and we will be getting this E and \dot{E}; then \tau' = \ddot{\theta}_d + K_P E + K_D \dot{E}. So, we can find out this \tau', and we use the closed-loop control system, so we will be getting the accurate joint angle \theta and \dot{\theta}.
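To make the loop above concrete, here is a minimal single-joint simulation sketch of the partitioned scheme with a PID servo law. The plant parameters (alpha, beta), the gains and the set-point are placeholder assumptions, and the plant is assumed to match the model exactly, which is of course an idealisation; with an exact model the joint angle simply converges to the desired value.

# Single-joint sketch of the partitioned (computed-torque) scheme with a PID servo law:
# tau_prime = theta_d_ddot + Kp*E + Ki*integral(E) + Kd*E_dot, and tau = alpha*tau_prime + beta.

def simulate(theta_d=1.0, Kp=100.0, Ki=20.0, Kd=20.0, dt=0.001, steps=3000):
    alpha, beta = 0.02, 0.0        # assumed D(theta) and h + C + F for a small horizontal link
    theta, theta_dot, integral = 0.0, 0.0, 0.0
    for _ in range(steps):
        E = theta_d - theta                     # position error (constant set-point)
        E_dot = 0.0 - theta_dot                 # rate of change of error
        integral += E * dt
        tau_prime = 0.0 + Kp * E + Ki * integral + Kd * E_dot   # desired accel. is zero
        tau = alpha * tau_prime + beta          # torque commanded to the motor
        theta_ddot = (tau - beta) / alpha       # plant assumed to match the model exactly
        theta_dot += theta_ddot * dt            # integrate the joint motion (Euler)
        theta += theta_dot * dt
    return theta

print("final joint angle:", simulate())         # settles near theta_d = 1.0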

So, this is the way actually, this control architecture works. Now, my question is how to
use the principle of fuzzy logic and neural networks just to design and develop the
adaptive controller, that I am going to discuss now. Now, how to design that adaptive
controller, so that it can generate the gain values K_P, K_I and K_D in an adaptive way.

(Refer Slide Time: 10:38)

So, what you will have to do is design one fuzzy logic system, or you can utilize, say, one multi-layered neural network. Here, the inputs will be E, that is, the error, and \dot{E}, that is, the rate of change of the error, and the outputs will be the gain values. So, if I use PID, there will be K_P, K_I and K_D. Depending on these particular E and \dot{E}, you can find out what should be the K_P, K_I and K_D. Now, supposing that the robot is starting from here, say point A, and it is going to reach the goal. There could be a possibility that in one time step I am using one set of K_P, K_I and K_D, for another time step another set of K_P, K_I and K_D, and so on, and by doing that, it is going to reach that particular goal. Supposing there could be some obstacle here; just to avoid the obstacle, this might be the path, and to move along this particular path, supposing that we need some sort of adaptive values for these K_P, K_I and K_D.

Now, if I use fuzzy logic-based system or say neural network-based system, there is a
possibility I will be getting the adaptive values for these gains; that means, I will be able
to develop some sort of adaptive controller for this particular motor.
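A hedged sketch of such a fuzzy gain scheduler is given below. The membership functions, the rule table and the gain levels are all illustrative assumptions (a zero-order Sugeno-style constant consequent is used here for brevity); a trained fuzzy or neural system would supply its own values.

# Illustrative fuzzy gain scheduler: error E and its rate E_dot -> (Kp, Ki, Kd).

def mu(x, a, b, c):
    """Triangular membership function with peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

TERMS = {"neg": (-1.5, -1.0, 0.0), "zero": (-1.0, 0.0, 1.0), "pos": (0.0, 1.0, 1.5)}

# Each rule: (term of E, term of E_dot) -> (Kp, Ki, Kd) levels (assumed values).
RULES = {
    ("pos", "pos"): (12.0, 0.5, 1.0), ("pos", "zero"): (10.0, 0.8, 1.5),
    ("pos", "neg"): (8.0, 0.6, 2.5),  ("zero", "pos"): (7.0, 0.8, 2.0),
    ("zero", "zero"): (6.0, 1.0, 2.0), ("zero", "neg"): (7.0, 0.8, 2.0),
    ("neg", "pos"): (8.0, 0.6, 2.5),  ("neg", "zero"): (10.0, 0.8, 1.5),
    ("neg", "neg"): (12.0, 0.5, 1.0),
}

def adaptive_gains(E, E_dot):
    num, den = [0.0, 0.0, 0.0], 0.0
    for (te, ted), gains in RULES.items():
        w = min(mu(E, *TERMS[te]), mu(E_dot, *TERMS[ted]))   # AND = min
        den += w
        for i in range(3):
            num[i] += w * gains[i]
    return tuple(n / den for n in num) if den else (6.0, 1.0, 2.0)

print(adaptive_gains(E=0.4, E_dot=-0.2))   # E and E_dot assumed normalised to [-1, 1]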

(Refer Slide Time: 12:17)

Now, this is the way actually, we can design and develop the adaptive controller for this
particular motor. And, that is the purpose and the traditional method I have already
discussed, that is, the Ziegler Nichols method, but here, we will be getting only a set of
fixed values for the gains.

Now, if you want to go for the adaptive values for these particular gain values, what you
will have to do is, you will have to go for some sort of tuning using the computational
intelligence, that is, CI or you can develop some sort of adaptive controller using either
fuzzy logic or neural networks.

(Refer Slide Time: 12:57)

So, this is the way we can design and develop the adaptive controller for the robot. Now, I am just going to discuss how to use the concepts of fuzzy logic and neural networks in robot vision or computer vision. Once again, let me tell you that if you do not know the fundamentals of robot vision or computer vision, you need not worry. Although I am going to describe the different steps used in robot vision in short, you need not have detailed knowledge of digital image processing or robot vision or computer vision; the thing which I am going to concentrate on is how to use the principles of fuzzy logic and neural networks so that we can design and develop a very efficient computer vision or robot vision system. Before I go for that, let me try to explain the physical problem. The physical problem is as follows: with the help of a camera, the robot will have to collect some information, and it will have to interpret whether it is object 1 or object 2, or whether it is another planning robot. So, that type of information it will have to collect about this particular environment.

So, how do we do it? We follow certain steps for the robot vision or the computer vision. The steps are as follows: we take the help of some CCD camera and we go for image capturing. I will be discussing each of these particular steps in brief. Let me just spend some time on the fundamentals, and then I will try to find out the areas where we can take the help of fuzzy logic and neural networks to develop a more efficient robot vision or computer vision system.

So, the first step is image capturing. We used one CCD camera, that is, a charge-coupled device camera, just to collect information of the environment. Now, the quality of this particular image depends on a number of parameters: for example, it depends on the level of illumination while taking that particular picture or snap, it depends on the calibration of the camera, it depends on the angle of vision, that is, the angle through which I am looking at that particular object, and so on.

So, there are many other factors also which are going to control the quality of the captured image; there is enough fuzziness in the quality of the collected image. So, there is a chance that we can inject the principle of fuzzy sets, so that this fuzziness can be captured. Now, once we have got the image collected with the help of the camera, we go for some sort of analog to digital conversion, because ultimately we will have to write a computer program, and the computer works with numbers, not with the raw picture.

So, we will have to use some numbers. Based on the information collected with the help of the camera, the image is put on the computer screen, and on the computer screen I have got one coordinate system, say X and Y, with the origin at, say, (0, 0).

Supposing that this is the computer screen: the whole computer screen is divided into a large number of small elements, and these elements are known as picture elements, or pixels. And supposing that here I have got one object. Although I am showing it in colour, let me assume that this is a black and white scene: the background is, say, white and this particular object is black.

So, this is nothing but a black and white object on the computer screen. Now, the screen is divided into a large number of pixels, and I do the scanning along the Y direction and the X direction. So, I will be getting some distribution of the light intensity values: if it is the black object, the light intensity will be less, and if it is a white region, the light intensity will be more.

Now, in sampling, or the A/D conversion, what we do is sample the image so that, corresponding to the different pixels, I am able to find out the numerical values; that means, corresponding to these pixels, I am going to find out the light intensity values. So, corresponding to this image, I will be getting one matrix of light intensity values; the values could be, say, 60, 68, 70 and so on, 80, 81, 85 and so on. This type of numerical values we will be getting, and these are nothing but the light intensity values at the different pixels; this is what we mean by frame grabbing.

Now, what is our aim? Our aim is that the robot should be able to identify that this is object_1 and this is object_2 lying in this particular environment. So far, corresponding to this particular image, we have got some numbers in matrix form, and these are all light intensity values. Then we take the help of pre-processing because, as I told several times, the collected images, or the collected light intensity values, will have some imprecision; there will be errors and there could be a lot of noise.

Now, we want to remove these particular errors or this noise, and that is why we take the help of pre-processing. There are many standard methods of pre-processing: one is called the masking operation, then we have got neighborhood averaging, and then median filtering we can use. So, these are all standard methods of pre-processing, and once again, this pre-processing can be tackled using the principle of fuzzy logic.
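As a small, hedged illustration of one of these standard methods, here is a plain 3 x 3 median filter on a grey-level image stored as a list of lists; the tiny sample image and the isolated noise spike in it are made up for this sketch.

# Minimal 3x3 median filter for pre-processing (noise removal).

def median_filter(image):
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]                    # border pixels left unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            window = sorted(image[i + di][j + dj]
                            for di in (-1, 0, 1) for dj in (-1, 0, 1))
            out[i][j] = window[4]                      # median of the 9 neighbours
    return out

noisy = [[60, 62, 61, 63],
         [62, 255, 60, 61],                            # 255 is an isolated noise spike
         [61, 60, 62, 60],
         [63, 61, 60, 62]]
print(median_filter(noisy)[1][1])                      # spike replaced by a typical value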

Then comes the thresholding. The purpose of thresholding is to differentiate, that is, to find out the difference, between the object and the background. So, we try to find out the difference between the object and the background, and this is what we mean by thresholding. There will be some threshold value for the light intensity, and using that particular threshold value, we can find out the difference between the object and the background.

So, on the computer screen, you will be able to see that there are some black spots or a black region, if it is a black and white type of image processing problem. After the thresholding is over, we will be getting, approximately, some sort of object on the computer screen, and that actually brings out the difference between the object and its background. Now, once we have got, approximately, the objects lying on the background, we can go for some sort of edge detection.

Now, in edge detection, we take the help of some derivative operator. We use the derivative operator just to find out the rate of change; by derivative, we mean rate of change, and the same thing we use here to detect the edges of these particular objects, that is, to find out the edge between the object and the background. Once again, there could be a lot of applications: using fuzzy logic and neural networks, we can handle the edge detection problem. And, once we have got the edges, we will have to describe the boundary, so that we can do further processing of the object.
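As a hedged sketch of the thresholding and derivative-operator idea on a toy image, the few lines below first threshold a small grey-level array and then mark the pixels where the binary image changes value, using simple forward differences; the threshold value and the image are assumptions, and a practical system would use operators such as Sobel masks instead.

# Sketch: thresholding followed by a simple derivative-based edge detector.

def threshold(image, t=128):
    """Separate object (dark, 1) from background (bright, 0) for a black object on white."""
    return [[1 if pixel < t else 0 for pixel in row] for row in image]

def edges(binary):
    """Mark pixels where the thresholded image changes value (rate of change != 0)."""
    rows, cols = len(binary), len(binary[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows - 1):
        for j in range(cols - 1):
            gx = binary[i][j + 1] - binary[i][j]        # horizontal derivative
            gy = binary[i + 1][j] - binary[i][j]        # vertical derivative
            out[i][j] = 1 if abs(gx) + abs(gy) > 0 else 0
    return out

image = [[200, 200, 200, 200],
         [200,  40,  40, 200],                           # dark square = the object
         [200,  40,  40, 200],
         [200, 200, 200, 200]]
for row in edges(threshold(image)):
    print(row)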

Now, for carrying out the further processing, we will have to express the particular
boundary and this boundary can be represented using some standard methods like one is
called the chain code method. So, we can use the chain code method or we can use some
sort of signature. So, mathematically you can represent the boundary of these objects.

Now, we can do some sort of further processing and then, we will be able to identify this
is object a, this is object b, which are lying in the environment. This is, in short, the
principle of the robot vision or the computer vision, so if the robot wants to collect
information of the environment, all such steps are to be followed within a fraction of
second, which is bit difficult because each of these particular steps will take some CPU
time and it will be computationally expensive.

Now, what we can do is use the principle of fuzzy logic, or we can train some network, to implement each of these particular modules, so that, once trained, within a fraction of a second this particular robot will be able to identify: this is object a, this is object b, this is object c, and so on. Here I just want to mention that a few studies have been reported in each of these particular areas using the principles of fuzzy logic and neural networks, but there is enough scope for further development. What you need is one computer package, and this particular package will contain the fuzzy logic-based or the neural network-based modeling of each of the steps of the computer vision or robot vision system.

Now, once it is developed using the principles of fuzzy logic and neural networks, and once it is properly trained, there is a possibility that the robot will be able to visualize the environment with the help of this type of tools and techniques within a fraction of a second; and once it can visualize that particular object and environment, it will be in a position to go for the motion planning.

Thank you.

Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 41
A Few Applications (Contd.)

(Refer Slide Time: 00:15)

Now, we are going to discuss how to use the principles of Fuzzy Logic and Neural Networks to design and develop an adaptive gait planner. Once again, let me tell you that you may not understand 100 percent of the robotics principles which I am going to discuss here. You need not worry; the main emphasis of taking this example is to show you how to use the principles of fuzzy logic and neural networks to solve or tackle this type of problem.

So, we are going to discuss how to design and develop an adaptive gait planner. A gait is the sequence of leg movements synchronized with the body movements during the walking of a humanoid robot. We human beings also follow some gait cycle while walking. Now, let us see how to design an adaptive gait depending on the requirement.

Now, during walking, a legged robot or a humanoid robot should be able to design and develop adaptive gaits; otherwise, it will not be able to negotiate the terrain efficiently.

(Refer Slide Time: 01:38)

Now, here, I am just going to take one example to show you the walking cycle of a particular biped robot. This walking cycle consists of two double support phases and two single support phases. Here, you can see a biped robot, that is, the humanoid robot, walking. In a particular cycle, suppose that the left foot is on the ground and the right foot is also on the ground; this is nothing but the double support phase, where both the feet are on the ground.

Now, what do we do while walking? The right foot stays on the ground, but the left foot is taken away from the ground, and it will be in the swing phase. If the left foot is in the swing phase and the right foot is on the ground, this is nothing but the single support phase. Then, the left foot is put on the ground while the right foot is already on the ground, and this constitutes another double support phase. Next, with the left foot placed here, the right foot is taken off the ground and is in the air.

So, the right foot is in the swing phase, and that is nothing but another single support phase; this completes one walking cycle. Thus, one walking cycle consists of two single support phases and two double support phases. Now, while walking, the biped robot or the humanoid robot should be able to maintain its balance during both the single support phases and the double support phases.

(Refer Slide Time: 03:41)

(Refer Slide Time: 03:44)

Now, the main requirements while walking are that the robot should be able to walk by consuming the minimum power and, at the same time, its dynamic balance should be the maximum; we should not lose the balance.

So, how to maintain this? Let us see how to implement it with the help of the fuzzy reasoning tool and the neural networks. Here, let me explain the physical problem; this is a very simple model. This figure shows the single support phase; that means, one foot is on the ground and the other foot is in the air. Suppose that this particular biped robot is negotiating this type of staircase. This shows the staircase, and if you see the biped robot, it is a very simple biped robot having 7 degrees of freedom. Here, this particular foot is on the ground and this particular foot is in the air, that is, the swing foot, and it is going to follow this particular trajectory while walking or negotiating the staircase.

Now, similarly, if you see the limbs of this particular biped robot: this is one foot, then one link here, then another link here, and then the hip joint; then comes another link here, another link here, and here we have got the ankle joint. So, we have got the ankle joint here, the knee joint here, and the hip joint here; in fact, all three points are coinciding at the hip. For simplicity, we have considered each joint to have only 1 degree of freedom. So, 1 degree of freedom plus 1 plus 1, that is, 3; 3 plus 3, that is, 6; and here, you have got another degree of freedom for the trunk, so it is having 7 degrees of freedom.

Now, each of these links is having its own mass and mass center; for example, this foot has some mass and a mass center, and so does every other link. And, suppose that I am considering the movement only in the sagittal plane.

Now, if I consider movement in the sagittal plane, that means the biped robot is moving in this particular direction; this is the direction of movement. If this is the direction of movement, what we have to do is generate the gait, that is, the sequence of leg movements, so that it consumes the minimum amount of energy and maintains the dynamic balance while negotiating the staircase.

Now, if I want to model it using fuzzy logic and neural networks, a few things are important. For example, here you can see that, at the start, I put the foot here and it is taken as the swing foot later on. The initial position of that foot measured from the edge of the staircase is x_1, and the placement of the ground foot with respect to the edge of the staircase is x_2. So, I can consider that x_1 and x_2 are the inputs for my gait planner.

Now, if you compare the way an old person negotiates the staircase with the way a young person negotiates it, there will be a lot of difference, in the sense that the places where they put their feet will be different; that is, x_1 and x_2 for the old person will differ from x_1 and x_2 for the young person. And, for the same person, if the slope of the staircase is large, he or she is going to change x_1 and x_2. Once we decide x_1 and x_2 while negotiating the staircase, a few other things are determined mathematically.

For example, what will be the hip height? If you see an old person, the hip height will be less, but for a young person the hip height could be more. Now, once I know x_1 and x_2, and the joint angles like theta_1, theta_2, theta_3 and so on, I can find out what should be the hip height and the step length.

Now, what should be the step length while walking? For an old person, the step length could be less; for a young person, it could be more. So, what we do is try to find out the height of the hip joint and the step length while negotiating the staircase, and we take the decision regarding x_1 and x_2, that is, where to put the feet. All such things can be modeled using the principle of fuzzy logic, as I am going to discuss.

(Refer Slide Time: 10:11)

Now, in the previous slide, we considered the single support phase, and this is the scenario for the double support phase. In the double support phase, one foot is here on the ground and the other foot is also on the ground, so both the feet are on the ground. This particular link has mass m_1 and length l_1; similarly, the other links have m_2, l_2, m_3, l_3, m_4, l_4, m_5, l_5, m_6, l_6, and here we have got m_7, l_7. For the double support phase also, we will have to find out the adaptive gait, so that the robot can negotiate the staircase by consuming the minimum energy while maintaining the maximum dynamic balance.

(Refer Slide Time: 11:20)

Now, let us see how to implement or model this with the help of fuzzy logic and neural networks. Here, you can see that I am using two multi-layer feed-forward networks: this is a 3-layer network, and here I have got another 3-layer network. The inputs x_1 and x_2 are fed to the first network, and I have already discussed how such a network works, so I am not going to repeat that. Ultimately, I will be getting h_1 and l_1 as the outputs of the first network.

Now, what are h_1 and l_1? h_1 tells you the height of the hip, and l_1 tells you something like the step length, that is, the projection of the hip with respect to the ground foot. Once we have got these h_1 and l_1, we can find out the joint angles mathematically.

So, trigonometrically, we can calculate all the joint angles except theta_4. Once we have got these joint angles, I can find out the change in theta_2 and the change in theta_3, and these will be fed as inputs to the second network.

And, from the second network, we will be getting as outputs the change in theta_1 and the change in theta_4. Now, let me once again go back to the picture.

Now, here, if you see what we are getting as outputs, we are getting the change in theta_4 and the change in theta_1; that is nothing but the change in the joint angles for the swing leg. Whenever the biped robot is negotiating the staircase, its balance largely depends on the angle of the trunk mass, that is, theta_4, and on the angle of the swing foot.

And, these two quantities are the outputs. Once we have got them, we are in a position to find out whether the robot is dynamically stable or not, and we can also mathematically find out the torque values and the power consumption. I am not going into all those details, because this course does not permit that; in a robotics course, we consider all such things and calculate them.

Now, here, the purpose of these networks is to find out the changes in theta_1 and theta_4, so that the balance is maintained for this biped robot. So, this is the way we can implement the gait planning using two multi-layer feed-forward networks.
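
Just to summarize the data flow in a compact form, here is a rough sketch of how the two cascaded networks could be wired; the layer sizes, the sigmoid transfer function and the helper joint_angle_changes_from(h1, l1) are assumptions for illustration, and in practice that helper stands for the trigonometric calculation of the joint angles mentioned above.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(x, W1, W2):
    # One 3-layer (input - hidden - output) network
    return sigmoid(W2 @ sigmoid(W1 @ x))

def gait_planner(x1, x2, net1, net2, joint_angle_changes_from):
    # First network: foot placements (x1, x2) -> (h1, l1),
    # the hip height and the projection of the hip w.r.t. the ground foot
    h1, l1 = feed_forward(np.array([x1, x2]), *net1)
    # Trigonometric step (hypothetical helper): (h1, l1) -> (d_theta2, d_theta3)
    d_theta2, d_theta3 = joint_angle_changes_from(h1, l1)
    # Second network: (d_theta2, d_theta3) -> (d_theta1, d_theta4),
    # the swing-foot and trunk angle changes that keep the robot balanced
    d_theta1, d_theta4 = feed_forward(np.array([d_theta2, d_theta3]), *net2)
    return d_theta1, d_theta4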

(Refer Slide Time: 14:55)

And, the same problem can also be tackled using the fuzzy reasoning tool. You can see that, in exactly the same way, x_1 is one input and x_2 is another input, and we can use some linguistic terms like low, medium, high and very high. For simplicity, we have considered triangular membership function distributions for the two inputs x_1 and x_2, and we will be getting h_1 and l_1 as the outputs of the first fuzzy reasoning tool. Now, using these h_1 and l_1, we will be able to calculate the joint angle values mathematically.

(Refer Slide Time: 15:46)

And, once we have got these joint angle values, in exactly the same way we can find out the change in theta_2 and the change in theta_3. These changes in theta_2 and theta_3 follow some triangular membership function distributions, and as outputs we will be getting the change in theta_1 and the change in theta_4; these are the membership function distributions for theta_1 and theta_4.

So, we can find out the joint angles such that the robot maintains the dynamic balance, and we can also find out how much the power consumption will be. This is the way we can model this biped walking to find out the adaptive gaits, if we want to negotiate different types of terrains or staircases in an optimal sense. So, this is how to use the principles of fuzzy logic and neural networks to find out the adaptive gaits.
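
As a small illustration of how such a fuzzy reasoning tool handles its inputs, the sketch below fuzzifies a crisp value of x_1 using triangular membership functions for the terms low, medium, high and very high; the break-points of the triangles are assumed values, not the ones actually tuned for the gait planner.

def triangular(x, a, b, c):
    # Triangular membership function with corners a <= b <= c
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Assumed linguistic terms for the input x_1 (foot placement)
X1_TERMS = {
    "low":       (0.0, 2.0, 4.0),
    "medium":    (2.0, 4.0, 6.0),
    "high":      (4.0, 6.0, 8.0),
    "very high": (6.0, 8.0, 10.0),
}

def fuzzify(x1):
    # Membership grade of x1 in each linguistic term
    return {term: triangular(x1, *abc) for term, abc in X1_TERMS.items()}

print(fuzzify(5.0))  # partial membership in 'medium' and 'high'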

(Refer Slide Time: 16:59)

In fact, we have solved the problem and this shows the stick diagram during the single
support phase, while ascending that particular staircase.

(Refer Slide Time: 17:17)

And, this shows actually the stick diagram while ascending that staircase during the
double support phase. So, this is the way actually we can develop the adaptive gait.

(Refer Slide Time: 17:32)

Now, I am just going to discuss another interesting area, where we can use the principles of fuzzy logic and neural networks to develop an intelligent data miner. The problem which I am going to discuss is intelligent data mining. The purpose here is to model the input-output relationships of a process in both the forward and reverse directions, to map the data from a higher dimension to a lower dimension for the purpose of visualization, if required, and to do some sort of clustering using the principle of similarity.

Now, here the input-output data are clustered based on similarity, and we do some mapping from a higher dimension to a lower dimension for the purpose of visualization. I have already mentioned that we human beings can visualize only up to 3 dimensions; so, if the data are in higher dimensions, we will have to do the mapping for the purpose of visualization.

In fact, we can use the self-organizing map for carrying out the mapping, and we can also carry out the clustering using the self-organizing map or some fuzzy clustering tools like fuzzy c-means clustering, entropy-based fuzzy clustering, and so on. Once we have got the clusters, let us see how to develop the reasoning tool, which is very interesting.

Now, suppose that I have got some higher-dimensional data; this represents the higher-dimensional data, and I have got a large number of data points here, that is, a large number of input-output pairs. Our aim is to design and develop one expert system, so that we can do this input-output modeling as accurately as possible.

Now, what do we do? We first try to carry out some clustering based on similarity. We have already discussed some tools like fuzzy c-means clustering and entropy-based clustering, and suppose that we have got some clusters here: one cluster here with its center, another cluster here with its center, and so on. For simplicity, let me consider for the time being that the data are such that there are only 4 clusters, ok?

Now, if this is the situation and the data are in higher dimensions, we cannot visualize those clusters. So, what we will have to do is use some dimensionality reduction technique like the self-organizing map to map the higher-dimensional data to a lower dimension, and once the mapping is done, we can see the clusters very distinctly.

Now, once we have got these clusters, we can go for some sort of cluster-wise regression, or we can develop a fuzzy reasoning tool for each cluster, so that we can do this modeling very accurately. Let us see how to do it. Suppose that we have got 4 clusters; each cluster has a center, and the cluster center has got some properties.

So, what I can do is use a fuzzy reasoning tool or a neural network for the first cluster to find out its input-output relationship. Similarly, I can develop another fuzzy reasoning tool or neural network for the second cluster, another for the third, and another for the fourth.

Now, suppose that one new test case has come, and the test case is here. I am supplying the set of inputs, and our aim is to find out the corresponding set of outputs. The moment I pass this test scenario, I have got its set of input parameters. This set of input parameters will be compared with the input part of c_1, the first cluster center, then with the second cluster center, then the third and the fourth cluster centers. And, we try to find out the Euclidean distance values between the test scenario, that is, the unknown point whose outputs are not known, and each of the cluster centers.

So, the distance between this point and c_1 is d_1, the distance between this point and the second center is d_2, and similarly we have d_3 and d_4; from these distances, we try to find out the similarity between the test scenario and each of the cluster centers.

Now, the larger the Euclidean distance, the less will be the similarity, and vice-versa; and the more the similarity, the more that particular cluster will contribute towards the output of the test scenario. Suppose that this particular data point is found to be very close to, say, c_2; then, while determining the output for this test scenario, c_2 will have more contribution.
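
A minimal sketch of this cluster-wise prediction scheme is given below; the inverse-distance weighting and the per-cluster models are illustrative assumptions standing in for the fuzzy reasoning tools or neural networks trained on each cluster.

import numpy as np

def predict(test_input, cluster_centers, cluster_models, eps=1e-9):
    # test_input      : 1-D array of input parameters of the test scenario
    # cluster_centers : list of 1-D arrays (input parts of c_1 ... c_K)
    # cluster_models  : list of callables; cluster_models[k](x) is the output
    #                   predicted by the model trained on cluster k
    # Euclidean distance of the test scenario from each cluster center
    d = np.array([np.linalg.norm(test_input - c) for c in cluster_centers])
    # Smaller distance -> higher similarity -> larger weight
    # (assumed inverse-distance weighting, normalized to sum to 1)
    w = 1.0 / (d + eps)
    w /= w.sum()
    # Weighted combination of the cluster-wise predictions
    return sum(wk * model(test_input) for wk, model in zip(w, cluster_models))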

This is the way we can implement the data miner. And, we can make it intelligent also, in the sense that the quality of each of the clusters depends on a number of parameters: for example, in fuzzy c-means clustering, it depends on the number of clusters which we are going to make and on the level of cluster fuzziness; in entropy-based clustering, if you remember, it depends on the parameters alpha, beta and gamma.

Now, what we can do is link it to one nature-inspired optimization tool, say a genetic algorithm. This genetic algorithm will try to evolve the optimal parameters, like alpha, beta and gamma, depending on the nature of the data set.

If it is the fuzzy c-means algorithm, the GA will try to find out the number of cluster centers and the level of cluster fuzziness, say denoted by g, so that the fuzzy reasoning tool will be able to predict the output as accurately as possible. The GA, through a large number of iterations, will try to find out the optimal parameters of the clustering algorithm, so that it can make the clusters in an optimal sense, and by using these optimal clusters, we will be able to predict the outputs for a set of inputs very accurately; that is the concept of intelligent data mining. Now, this intelligent data mining makes very good practical sense. Let me take a very simple example: say I have got some 3-dimensional data. These 3-dimensional data can be represented very easily; I can just consider the corner of a room with axes x, y and z.

So, the whole volume of the room will be the search space or the design space for this particular problem. Now, real-world problems are generally non-linear, and for such a non-linear problem, the level of non-linearity may not be exactly the same in different regions of the variable space; at some corner, the degree of non-linearity could be more than at other corners. That is why this concept of clustering makes very good sense if we want to determine the input-output relationship.

So, it is better to go for this type of clustering and clustering-based regression; the regression could be fuzzy reasoning-based, or we can also use some neural networks, and after that we can combine the cluster-wise models. This is the way, in fact, we can develop the intelligent data mining tools to determine the input-output relationships in a very efficient way. So, fuzzy logic and neural networks have got a large number of applications.
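
To make the idea of GA-tuned clustering a little more concrete, here is a very rough sketch of the optimization loop; the parameter ranges, the mutation-only reproduction and the fitness function are all assumptions for illustration, and in practice the fitness would be the (negative) prediction error of the cluster-wise reasoning tools built with those parameters.

import random

def genetic_search(fitness, generations=50, pop_size=20):
    # Each candidate is a pair (number of clusters, fuzziness level g)
    pop = [(random.randint(2, 10), random.uniform(1.1, 3.0))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)          # rank by fitness
        parents = pop[: pop_size // 2]
        # Mutation-only reproduction keeps the sketch short
        children = [(max(2, n + random.choice([-1, 0, 1])),
                     max(1.1, g + random.gauss(0.0, 0.1)))
                    for n, g in parents]
        pop = parents + children
    return max(pop, key=fitness)                     # best (n_clusters, g)

# Dummy fitness for demonstration only: prefers about 4 clusters and g near 2
best = genetic_search(lambda p: -((p[0] - 4) ** 2 + (p[1] - 2.0) ** 2))
print(best)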

(Refer Slide Time: 28:21)

Now, here, quickly, I am just going to take one example which we have solved: the screening and prediction of psychosis. We carried out this work a few years ago; we carried out some intelligent data mining using data related to psychosis. For this, we consulted some doctors and collected the data, and there are 24 symptoms related to this mental disease, that is, psychosis.

And, the data were collected with the help of 40 doctors; we carried out the screening using clustering, and the prediction using the fuzzy reasoning tool. Now, let us see how to carry out this screening and prediction for psychosis using the fuzzy reasoning tool.

(Refer Slide Time: 29:28)

Now, as I told, we consider 24 parameters. Here, you can see the first 12 of them (6 plus 6), and in the next slide you will find 12 more.

(Refer Slide Time: 29:45)

So, we have got 24 parameters, which we consider in that particular model.

(Refer Slide Time: 29:58)

And, using those 24 parameters, we will have to classify, or cluster, the data into 7 grades of mental disease or psychosis. The 7 grades of mental disease are as follows: schizophrenia, mania, depression with psychosis, delusional disorder, schizoaffective disorder, organic psychosis and catatonia. So, there are 7 grades of mental disease or psychosis, and we collected a huge amount of data with the help of 40 doctors.

(Refer Slide Time: 30:43)

And, those data were clustered using fuzzy c-means clustering and entropy-based clustering. Here, you can see very distinctly that there are 7 clusters of these diseases. These data involved 24 parameters; that means, they were in 25 dimensions.

So, those data were clustered, and then the mapping was done to 2 dimensions for the purpose of visualization using the self-organizing map, whose principle I have already discussed. After doing the clustering using the fuzzy c-means algorithm and carrying out the self-organizing map-based analysis, we could find the 7 distinct clusters of psychosis, and it is very interesting that we can identify, for example, this as one cluster, this as the second cluster, and similarly all 7 clusters of psychosis.

(Refer Slide Time: 31:49)

Now, this is for the same data using entropy-based clustering. We got 7 clusters, and here, once again, you can identify the 7 different clusters very clearly. Once we have got these clusters, as I discussed, we could develop some fuzzy reasoning-based predictive tools, and by using those tools, we were able to predict the status of the disease, that is, the degree to which a particular patient is suffering from the mental disease. That could be predicted using the fuzzy logic-based screening and prediction system.

(Refer Slide Time: 32:40)

So, this is the way we developed that intelligent data miner for carrying out the screening and prediction of the mental disease. This has a very practical meaning: suppose that one expert on mental disease, a medical doctor, is examining a large number of patients one after another. The moment a particular patient comes in front of him, he generally asks a number of questions, and he tries to find out the similarity of the answers he is getting with those seven well-defined clusters. He then tries to see whether this particular patient fits any of the clusters or does not fit at all. If the patient does not fit, he is a mentally healthy person, and if he fits a particular cluster, the medicines will be prescribed accordingly.

So, this is the way medical diagnosis can be carried out for mental disease using the fuzzy reasoning tool, by utilizing fuzzy clustering, and the data can be mapped for the purpose of visualization.

Thank you.

Fuzzy Logic and Neural Networks
Prof. Dilip Kumar Pratihar
Department of Mechanical Engineering
Indian Institute of Technology, Kharagpur

Lecture – 42
A Few Applications (Contd.)

Now, let me summarize the content of this particular course. We started with the concept of fuzzy sets, that is, a set with weak boundaries (there is no well-defined boundary), and by using these fuzzy sets, we discussed how to model uncertainty and imprecision.

(Refer Slide Time: 00:48).

Now, we started with, in fact, the concept of the classical set or crisp set, and we discussed that the classical or crisp set can handle only one type of uncertainty, using the principles of probability theory. But there are many other uncertainties in the world which cannot be modeled using the classical set or probability theory, and that is why the concept of the fuzzy set came into the picture.

Now, we defined the concept of fuzzy sets and saw the difference between a fuzzy set and a classical set: the classical set is a set with a fixed boundary, whereas the fuzzy set is a set with weak boundaries. We discussed the different properties of classical sets; in fact, we discussed 10 properties followed by the classical set. Out of these 10 properties, the first 8 are also followed by fuzzy sets, but the last two, that is, the law of contradiction and the law of excluded middle, are not followed by fuzzy sets.

Now, in fuzzy sets, we use the concept of membership, and we considered different types of membership function distributions: the triangular, trapezoidal, Gaussian and sigmoidal membership function distributions, and so on. We started with a few definitions related to fuzzy sets; for example, we defined the alpha-cut of a fuzzy set, the core of a fuzzy set, the height of a fuzzy set, and so on. Then, we concentrated on some standard operations used on fuzzy sets, for example, the intersection and union of fuzzy sets, the bounded sum and bounded difference, the algebraic sum and algebraic difference, and so on.
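
As a small reminder of those operations (a sketch with made-up membership grades, not an example taken from the lectures), the following shows the standard operations on two discrete fuzzy sets and why the last two laws fail:

# Two fuzzy sets on the same universe {x1, x2, x3}, with assumed grades
A = {"x1": 0.2, "x2": 0.7, "x3": 1.0}
B = {"x1": 0.5, "x2": 0.4, "x3": 0.0}

union         = {x: max(A[x], B[x]) for x in A}            # A OR B
intersection  = {x: min(A[x], B[x]) for x in A}            # A AND B
complement_A  = {x: 1.0 - A[x] for x in A}                 # NOT A
bounded_sum   = {x: min(1.0, A[x] + B[x]) for x in A}
bounded_diff  = {x: max(0.0, A[x] - B[x]) for x in A}
algebraic_sum = {x: A[x] + B[x] - A[x] * B[x] for x in A}

# Law of excluded middle does not hold: A OR (NOT A) is not the whole universe
print({x: max(A[x], complement_A[x]) for x in A})   # x2 -> 0.7, not 1.0
# Law of contradiction does not hold either: A AND (NOT A) is not empty
print({x: min(A[x], complement_A[x]) for x in A})   # x2 -> 0.3, not 0.0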

So, we discussed the grammar of fuzzy sets in detail. Then, we concentrated on the different applications of fuzzy sets. Fuzzy sets have been used to solve a variety of problems, and out of all the applications, the most popular one is the fuzzy reasoning tool, whose purpose is to establish the input-output relationship. If you see the literature, we have got two types of fuzzy reasoning tools: one is called the precise fuzzy reasoning tool and the other is called the linguistic fuzzy reasoning tool.

Now, in the precise fuzzy reasoning tool, that is, Takagi and Sugeno's approach, we get more accuracy but less interpretability. In the linguistic fuzzy model, that is, the Mamdani approach, we may not get so much accuracy, but its interpretability or readability is high. While discussing the fuzzy reasoning tool, we saw that its complexity depends on the number of linguistic terms used to represent each of the variables and, of course, on the number of variables. For a large number of variables and more linguistic terms, the complexity of the fuzzy reasoning tool is going to increase, and to overcome that, the concept of the hierarchical fuzzy logic controller has come; we discussed its principle, and by using the hierarchical fuzzy logic controller, we can reduce the number of rules.

Then comes the clustering algorithm. We discussed the different clustering algorithms; clustering is done based on similarity, so similar data points should belong to the same cluster and dissimilar data points should go to different clusters.

We used different clustering algorithms, like fuzzy c-means clustering, entropy-based fuzzy clustering, and so on. Now, let us compare fuzzy c-means clustering and entropy-based clustering based on the quality of the clusters obtained.

In fuzzy c-means clustering, there is a possibility that we will be getting very compact clusters, but we may not get very distinct clusters. On the other hand, in entropy-based fuzzy clustering, we may not get very compact clusters, but we will be getting very distinct clusters. What we need are clusters that are both compact and distinct, and that is why sometimes the merits of these two tools are combined to develop entropy-based fuzzy c-means clustering.

Now, after discussing the concepts of fuzzy sets, the fuzzy reasoning tool and fuzzy clustering, we started with the fundamentals of neural networks. As I told, in artificial neural networks, we copy everything from the human brain in an artificial way: a biological neuron is copied in the form of an artificial neuron, and one neural network consists of a number of layers, each having a number of neurons. This network can be developed using the principles of unsupervised learning and supervised learning. By supervised learning, we mean learning with a teacher.

So, we have got some known input-output relationships, and if we do the training with the help of these known input-output relationships, it is called supervised learning, or learning with a teacher. The idea is that if there is any mistake made by the student, the teacher is going to correct it; the same is true in supervised learning, because we have the known input-output relationships, so if there is an error, that error will be rectified.

On the other hand, we also discussed unsupervised learning. In unsupervised learning, we use the concepts of competition, cooperation and updating. Through the competition, we declare the winner; surrounding that winner, there will be some excited neurons, and there will be some interactions. Through these interactions, both the winner and its neighbours are going to learn, that is, to update their information, and ultimately we will be getting one network; that is the principle of unsupervised learning.

Now, if you see the literature, the concept of semi-supervised learning is also there. In semi-supervised learning, we use the concepts of both supervised and unsupervised learning: a part of the problem is solved using supervised learning and the rest using unsupervised learning, or vice-versa. So, we have the concept of semi-supervised learning also.

Now, regarding how to implement this supervised learning, we discussed the very famous and popular back-propagation algorithm. The back-propagation algorithm works based on the steepest descent algorithm, which is nothing but a gradient-based method. In the back-propagation algorithm, there is one problem: the solution may get stuck at a local minimum.
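
As a one-line reminder of the steepest-descent idea behind back-propagation (a generic sketch, not the full derivation given in the lectures), every connecting weight is simply moved a small step against the gradient of the error:

# Steepest-descent (gradient) update used by back-propagation;
# eta is the assumed learning rate and dE_dw is the error gradient.
def update_weight(w, dE_dw, eta=0.1):
    return w - eta * dE_dw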

Now, we have also designed and developed an algorithm, that is, the GA-neural network, where we could replace this back-propagation by the genetic algorithm; the combined genetic algorithm and neural network can be developed just to remove the demerits of the back-propagation algorithm.

Now, in this course, we discussed a number of networks and solved a number of numerical examples. For example, we started with the multi-layer feed-forward network, and we have seen how to carry out the forward calculations, how to compare the output of the forward calculation with the target value to find out the error, and how to propagate this error in the backward direction, so that we can minimize the error in prediction. This multi-layer feed-forward network has been widely used to determine the input-output relationships of different types of processes.

Now, the performance of this multi-layer feed-forward network depends on the connecting weights, the bias values, the coefficients of the transfer functions and, of course, on the architecture or topology of the network. Suppose that I am modeling a process having, say, 5 inputs and 4 outputs; then on the input layer, we are going to use 5 neurons, and on the output layer, we are going to use 4 neurons.

Now, the architecture or topology of this network also depends on how many hidden layers we are going to use and how many neurons we are going to use in each of those hidden layers. Once I have got the architecture, we can update the values of the connecting weights, the coefficients of the transfer functions and the biases, so that this network can predict the input-output relationship in a very accurate way.

Now, here, one merit of this network I should mention is that, using it, we can also carry out the reverse mapping; so, both the forward mapping and the reverse mapping can be carried out with the help of the multi-layer feed-forward network. Then, we studied the working principle of the radial basis function network. The purpose of using the radial basis function network is, once again, to establish the input-output relationship, and here we use a special type of transfer function, namely the radial basis transfer function.

Now, by a radial basis transfer function, we mean a transfer function whose value increases or decreases monotonically with the distance from a central point; for example, this type of Gaussian distribution is an example of a radial basis function, and it is the one we generally use as the transfer function in the radial basis function network. In neural networks, we use some other types of transfer functions also; for example, the hard limit transfer function, the linear transfer function, and some non-linear ones like the log-sigmoid or tan-sigmoid transfer functions.
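
For reference, here is a small sketch of the transfer functions mentioned above; the exact parameterization of the Gaussian (centre c and width sigma) is an assumption for illustration.

import numpy as np

def hard_limit(x):                 # 0/1 step function
    return np.where(x >= 0.0, 1.0, 0.0)

def linear(x):                     # pure linear transfer function
    return x

def log_sigmoid(x):                # output in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tan_sigmoid(x):                # output in (-1, 1)
    return np.tanh(x)

def gaussian_rbf(x, c=0.0, sigma=1.0):
    # Radial basis transfer function: depends only on the distance |x - c|
    # from the centre c and decreases monotonically with that distance
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))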

So, by using this radial basis function network, as we mentioned, we can find out the input-output relationship. Now, suppose that the process is highly dynamic; for a highly dynamic process, the multi-layer feed-forward network may not have the ability to capture the input-output relationship, and there we will have to go for some sort of recurrent neural network. A recurrent neural network has got both feed-forward and feedback circuits, and with the help of these, it is able to capture the non-linearity or the dynamics of that particular process.

So, we discussed, in detail, the working principle of recurrent neural networks, and as we mentioned, the moment we pass some external inputs, some internal inputs will be created by the process, and those things are handled in the recurrent neural network. Then comes the self-organizing map. We have already mentioned that this network works based on unsupervised learning, and we discussed, in detail, the working principle of the self-organizing map, that is, how it can map from a higher dimension to a lower dimension. This is a very accurate mapping, because we use some sort of topology-preserving mapping here, and using the self-organizing map, we can carry out that mapping.

Now, using the concept of the self-organizing map, we also discussed how to develop the counter-propagation network, that is, the CPNN. Using the counter-propagation network, once again, we can model the input-output relationship; here, we take the help of both the unsupervised self-organizing map and supervised Grossberg learning, which we have already discussed in detail. After discussing the different types of networks, we started with some combined tools. The reason behind going for the combined tools is that each of the tools has its own merits and demerits, and in the combined tools, we wanted to keep the merits of those tools and remove their demerits.

So, we discussed the working principles of the genetic-fuzzy system and the genetic-neural system, where the aim was to evolve a fuzzy reasoning tool or a neural network-based tool with the help of the genetic algorithm. We discussed their working principles in detail and solved some numerical examples; then, we concentrated on the neuro-fuzzy system.

Now, this neuro-fuzzy system has been developed in two different ways: based on the Mamdani approach and based on Takagi and Sugeno's approach. We have seen how to develop this neuro-fuzzy system using the principle of the genetic algorithm, and we, in fact, discussed the genetic neuro-fuzzy system.

After discussing these combined tools, we came to the definition of soft computing. As I told, by soft computing we mean the combined tools: the combination of the genetic algorithm and fuzzy logic, that is, the genetic-fuzzy system; the combination of the genetic algorithm and neural networks, that is, the genetic-neural system; and the combination of neural networks and fuzzy logic, that is, the neuro-fuzzy system or the genetic neuro-fuzzy system. These constitute what we mean by soft computing. In soft computing, as I told, we are not so much interested in precision or accuracy, so it is different from hard computing; hard computing works based on mathematics, that is, differential equations and their solutions, but in soft computing, we take the help of some nature-inspired techniques like fuzzy logic, neural networks, genetic algorithms and other nature-inspired optimization tools. Then, we concentrated on the expert system: what we mean by an expert system, why we need an expert system, and, in fact, we gave examples of a few very popular expert systems available.

Then, we concentrated on a few applications of fuzzy logic and neural networks. As I mentioned, a huge literature is available on the applications of fuzzy logic and neural networks, not only in general science, but also in different fields of engineering science and in commerce; there are many such applications. In this particular course, we discussed two applications in detail. One is how to design and develop intelligent and autonomous robots; for that, we discussed how to develop the adaptive motion planner and the adaptive controller, how to carry out the vision analysis, that is, how to solve the problem of computer vision or robot vision, and how to generate the adaptive gaits. After that, we also discussed the principle of intelligent data mining.

(Refer Slide Time: 22:06).

Now, these are the things which we have discussed in this particular course, and we have solved a large number of numerical examples; just to make the idea of each of the algorithms clear, I have taken one numerical example for it. If you want to see the references to get more information regarding the course, you can see the textbook Soft Computing: Fundamentals and Applications, written by me. Regarding the problem related to the intelligent and autonomous robot, the paper written by V Mahendar and me, published in Fuzzy Sets and Systems, carries a lot of information, so you can have a look at that paper. For the medical diagnosis of psychosis, if you want to get a clearer picture, you should read the paper by S Chattopadhyay, D K Pratihar and S C De Sarkar, published in the IEEE Transactions on Systems, Man, and Cybernetics, Part A, where you can find more detailed information regarding the topics which we have discussed here.

(Refer Slide Time: 23:34).

Now, to conclude this particular lecture, that is, lecture 9: I started with the definition of soft computing, I defined the term hard computing, I took the example of hybrid computing, and I tried to find out the reason behind going for the soft computing tools. We also defined what we mean by an expert system, and we have seen that an expert system is not simply a computer program; there is something extra, that is, we try to copy the behaviour of a human being, the way he solves a particular problem, and that is what we model in the expert system or knowledge-based system. A few applications were discussed in detail, and the total content of the course has been summarized. I think you will enjoy this particular course and you will learn a lot through it. I wish you all the best.

Thank you.
