Soft Computing
Introduction to Soft Computing
The term "soft computing" is attributed to Zadeh (1981).
A crisp (classical) set maps every element to {True, False}, i.e. to {1, 0}.
OPERATIONS ON CRISP SETS
UNION: A ∪ B = {x | x ∈ A or x ∈ B}
INTERSECTION: A ∩ B = {x | x ∈ A and x ∈ B}
COMPLEMENT: Ā = {x | x ∉ A, x ∈ X}
DIFFERENCE: A | B = A ∩ B̄
Representation of fuzzy sets
• If the universe of discourse U is discrete and finite, then fuzzy set A can be represented as
  A = μ_A(x1)/x1 + μ_A(x2)/x2 + ... = Σ_{i=1}^{n} μ_A(xi)/xi
• NB: i. The horizontal bar in μ_A(xi)/xi is not a quotient but a delimiter.
• ii. The '+' sign in μ_A(x1)/x1 + μ_A(x2)/x2 does not perform addition but is a function-theoretic union.
• If U is continuous, A = ∫_X μ_A(x)/x,
  where the integral sign denotes a continuous function-theoretic union for continuous variables.
OPERATIONS ON FUZZY SETS
Union/Disjunction: The union of fuzzy sets A and B, denoted A ∪ B, is defined as:
  μ_{A∪B}(x) = max(μ_A(x), μ_B(x)) = μ_A(x) ∨ μ_B(x), ∀x ∈ X
Intersection/Conjunction: The intersection of fuzzy sets A and B is defined as:
  μ_{A∩B}(x) = min(μ_A(x), μ_B(x)) = μ_A(x) ∧ μ_B(x), ∀x ∈ X
Example: Let A = 1/2 + 0.3/4 + 0.5/6 + 0.2/8 and B = 0.5/2 + 0.4/4 + 0.1/6 + 1/8, then
  A ∩ B = min(1, 0.5)/2 + min(0.3, 0.4)/4 + min(0.5, 0.1)/6 + min(0.2, 1)/8
        = 0.5/2 + 0.3/4 + 0.1/6 + 0.2/8
Operations on fuzzy sets contd.
• Complement/negation: when μ_A(x) ∈ [0, 1], the complement of A, denoted Ā, is defined as
  μ_Ā(x) = 1 − μ_A(x), ∀x ∈ X
Fuzzy Complement
• A fuzzy complement operator is a continuous
function N: [0, 1] →[0, 1] which meets the following
axiomatic requirements:
– Boundary: N(0) = 1 and N(1) = 0
– Monotonicity: N(a) ≥ N(b) if a ≤ b
• All functions satisfying these requirements form the
general class of fuzzy complements.
• Optional Requirement
– Involution: N(N(a)) = a.
Sugeno’s Complement: N_s(a) = (1 − a) / (1 + s·a), where s > −1.
[Figure: Sugeno's complement, N(a) vs. a for s = −0.95, −0.7, 0.0, 2, 20; s = 0 gives the classical complement 1 − a.]
Yager’s Complement: N_w(a) = (1 − a^w)^{1/w}, where w > 0.
[Figure: Yager's complement, N(a) vs. a for w = 0.4, 0.7, 1, 1.5, 3; w = 1 gives the classical complement 1 − a.]
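Both parameterized complements can be sketched in a few lines and checked against the axioms; a minimal sketch following the formulas above (both families also satisfy the optional involution N(N(a)) = a):

```python
def sugeno_complement(a, s):
    """N_s(a) = (1 - a) / (1 + s*a), s > -1; s = 0 reduces to 1 - a."""
    return (1 - a) / (1 + s * a)

def yager_complement(a, w):
    """N_w(a) = (1 - a**w) ** (1/w), w > 0; w = 1 reduces to 1 - a."""
    return (1 - a ** w) ** (1.0 / w)
```

Both satisfy the boundary axioms N(0) = 1, N(1) = 0 for any admissible parameter.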
Triangular norm or T-norm
• The intersection of two fuzzy sets A and B is specified in general by a function T: [0, 1] × [0, 1] → [0, 1], which aggregates two membership grades as follows:
  μ_{A∩B}(x) = T(μ_A(x), μ_B(x)) = μ_A(x) ∗̃ μ_B(x)
  where ∗̃ is a binary operator for the function T.
Four commonly used T-norm operators:
i. Minimum: T_min(a, b) = min(a, b) = a ∧ b
ii. Algebraic product: T_ap(a, b) = ab
iii. Bounded product: T_bp(a, b) = 0 ∨ (a + b − 1)
iv. Drastic product: T_dp(a, b) = a, if b = 1; b, if a = 1; 0, if a, b < 1
The relation between the operators is: T_dp(a, b) ≤ T_bp(a, b) ≤ T_ap(a, b) ≤ T_min(a, b).
[Figures: surface plots of T_min, the algebraic product T_ap, the bounded product T_bp, and the drastic product T_dp over (a, b) ∈ [0, 1] × [0, 1].]
T-conorm or S-norm
• The union of two fuzzy sets A and B is specified in general by a function S: [0, 1] × [0, 1] → [0, 1], which aggregates two membership grades as follows:
  μ_{A∪B}(x) = S(μ_A(x), μ_B(x)) = μ_A(x) +̃ μ_B(x)
  where +̃ is a binary operator for the function S.
Basic requirements: 1. Boundary: S(1, a) = 1, S(0, a) = a; 2. Monotonicity: S(a, b) ≤ S(c, d) if a ≤ c and b ≤ d; 3. Commutativity: S(a, b) = S(b, a); 4. Associativity: S(a, S(b, c)) = S(S(a, b), c).
Four commonly used S-norm operators:
1. Maximum: S_max(a, b) = max(a, b) = a ∨ b
2. Algebraic sum: S_as(a, b) = a + b − ab
3. Bounded sum: S_bs(a, b) = 1 ∧ (a + b)
4. Drastic sum: S_ds(a, b) = a, if b = 0; b, if a = 0; 1, if a, b > 0
Example (bounded sum): Let A = 1/2 + 0.3/4 + 0.5/6 + 1/8 and B = 0/2 + 0.4/4 + 0.7/6 + 1/8, then
  A ⊕ B = 1∧(1.0+0.0)/2 + 1∧(0.3+0.4)/4 + 1∧(0.5+0.7)/6 + 1∧(1.0+1.0)/8
        = 1.0/2 + 0.7/4 + 1.0/6 + 1.0/8
[Figures: surface plots of S_max, the algebraic sum S_as, the bounded sum S_bs, and the drastic sum S_ds over (a, b) ∈ [0, 1] × [0, 1].]
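The four T-norms and four S-norms can be sketched as small functions; a minimal sketch following the operator definitions above (the names are local helpers):

```python
# T-norms (intersection operators)
def t_min(a, b): return min(a, b)
def t_ap(a, b):  return a * b                  # algebraic product
def t_bp(a, b):  return max(0.0, a + b - 1.0)  # bounded product
def t_dp(a, b):  # drastic product
    if b == 1.0: return a
    if a == 1.0: return b
    return 0.0

# S-norms (union operators)
def s_max(a, b): return max(a, b)
def s_as(a, b):  return a + b - a * b          # algebraic sum
def s_bs(a, b):  return min(1.0, a + b)        # bounded sum
def s_ds(a, b):  # drastic sum
    if b == 0.0: return a
    if a == 0.0: return b
    return 1.0
```

Each S-norm is the De Morgan dual of the corresponding T-norm under the classical complement, e.g. s_as(a, b) = 1 − t_ap(1 − a, 1 − b).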
Do the Laws of Contradiction and Excluded Middle hold?
• Given a Fuzzy set (A, µA), we have
– The Law of Contradiction: A ∩ Ac = ∅
– The Excluded Middle: A ∪ Ac = X
• However, if A is a non-crisp set, then neither law will hold.
• Indeed, note that for a non-crisp set there exists some x ∈ X such
that µA(x) ∈ (0, 1), i.e. µA(x) ≠ 0, 1.
• Thus, we have
– µA∩Ac (x) = min{µA(x), 1 − µA(x)} ≠ 0
– µA∪Ac (x) = max{µA(x), 1 − µA(x)} ≠ 1
• Hence, neither law holds for a non-crisp set.
PROPERTIES OF FUZZY SETS
Some terminologies
Support: The support of a fuzzy set A is the set of all points x in X such that μ_A(x) > 0, i.e.
  support(A) = {x | μ_A(x) > 0}
Core: The core of a fuzzy set A is the set of all points x in X such that μ_A(x) = 1, i.e.
  core(A) = {x | μ_A(x) = 1}
Crossover point: A crossover point of a fuzzy set A is a point x ∈ X at which μ_A(x) = 0.5, i.e.
  crossover(A) = {x | μ_A(x) = 0.5}
More than one crossover point is possible.
[Figure: membership grades vs. Age (0–100) for a bell function with center c = 45, width a = 15, slope b = 3; the core, support and crossover points are marked.]
• Fuzzy numbers: A fuzzy number A is a fuzzy set in the real
line that satisfies the conditions for normality and convexity.
• Bandwidth of normal and convex fuzzy sets: For
normal and convex fuzzy sets, the width or bandwidth is
defined as the distance between the two unique crossover
points:
  width(A) = |x2 − x1|, where μ_A(x1) = μ_A(x2) = 0.5
Ex. For A = "middle aged person" the crossover points are 30 and 60, so width(A) = 60 − 30 = 30.
• Symmetry: a fuzzy set A is symmetric if its MF is symmetric about a center c:
  μ_A(c + x) = μ_A(c − x), ∀x ∈ X
• Open left, open right, closed: A is open left if lim_{x→−∞} μ_A(x) = 1 and lim_{x→+∞} μ_A(x) = 0; open right if lim_{x→−∞} μ_A(x) = 0 and lim_{x→+∞} μ_A(x) = 1; and closed if lim_{x→−∞} μ_A(x) = lim_{x→+∞} μ_A(x) = 0.
[Figures: a closed fuzzy set (x ∈ [0, 35]) and an open right set (x ∈ [−100, 100]).]
Some Membership functions
• Triangular MF: specified by three parameters (a, b, c) with a < b < c, the x-coordinates of the three corners:
  triangle(x; a, b, c) = 0,                if x ≤ a
                       = (x − a)/(b − a),  if a ≤ x ≤ b
                       = (c − x)/(c − b),  if b ≤ x ≤ c
                       = 0,                if c ≤ x
  OR
  triangle(x; a, b, c) = max(min((x − a)/(b − a), (c − x)/(c − b)), 0)
[Figure: triangular MF over x ∈ [0, 80].]
• Trapezoidal MF: specified by 4 parameters {a, b, c, d}, with a < b ≤ c < d, the x-coordinates of the 4 corners of the underlying trapezoid:
  trapezoid(x; a, b, c, d) = 0,                if x ≤ a
                           = (x − a)/(b − a),  if a ≤ x ≤ b
                           = 1,                if b ≤ x ≤ c
                           = (d − x)/(d − c),  if c ≤ x ≤ d
                           = 0,                if d ≤ x
  OR
  trapezoid(x; a, b, c, d) = max(min((x − a)/(b − a), 1, (d − x)/(d − c)), 0)
[Figure: trapezoid MF over x ∈ [0, 100].]
• Gaussian MF: gaussian(x; c, σ) = e^{−(1/2)((x − c)/σ)²}
  where c = the MF's center and σ = the MF's width.
[Figure: Gaussian MF over x ∈ [0, 100] with c = 50, σ = 12.]
• Generalized bell MF: bell(x; a, b, c) = 1 / (1 + |(x − c)/a|^{2b})
  where a = width, b = slope (a negative b turns the bell upside down), c = center.
[Figures: bell MF with a = 20, b = 4, c = 50; bell MF with a = 20, b = −4, c = 50 (upside down); and parameter sweeps a = 16:4:24 (b = 4, c = 50), b = 3:3:9 (a = 20, c = 50), c = 40:10:60 (a = 20, b = 4).]
Advantages of Gaussian and Bell MF:
- Smooth
- Concise
- Popular
- Gaussian MFs are invariant under multiplication (i.e. the product
of two Gaussians is a Gaussian with a scaling factor).
- Fourier transform of Gaussian is still a Gaussian
- Bell MF has one more parameter than Gaussian MF, so one
more degree of freedom to adjust the steepness at the
crossover point.
Disadvantages: Unable to specify asymmetric MFs.
• Sigmoidal MF: sig(x; a, c) = 1 / (1 + e^{−a(x − c)})
  where 'a' controls the slope at the crossover point x = c.
[Figures: sigmoidal MF over x ∈ [−10, 10] for a = 1, c = 5 and for a = −2, c = 5.]
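The MFs above can be sketched directly from their formulas; a minimal sketch with the slides' parameter conventions (the function names are local helpers, not a library API):

```python
import math

def triangle(x, a, b, c):
    # max(min((x-a)/(b-a), (c-x)/(c-b)), 0), a < b < c
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def trapezoid(x, a, b, c, d):
    # max(min((x-a)/(b-a), 1, (d-x)/(d-c)), 0), a < b <= c < d
    return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)

def gaussian(x, c, sigma):
    # exp(-0.5 * ((x - c) / sigma)^2), center c, width sigma
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)

def bell(x, a, b, c):
    # 1 / (1 + |(x - c)/a|^(2b)); crossover points sit at x = c +/- a
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

def sigmoid(x, a, c):
    # 1 / (1 + exp(-a (x - c))); value 0.5 at the crossover x = c
    return 1.0 / (1.0 + math.exp(-a * (x - c)))
```

For example, `bell(70, 20, 4, 50)` evaluates to 0.5, confirming the crossover at c + a.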
Left-right (L-R) MF
Definition: A left-right MF is specified by three parameters {α, β, c}:
  LR(x; c, α, β) = F_L((c − x)/α), for x ≤ c
                 = F_R((x − c)/β), for x ≥ c
where F_L and F_R are monotonically decreasing functions with F_L(0) = F_R(0) = 1, and α and β are the left and right spreads, respectively.
Example: Projection of a 2-D fuzzy set R.
[Figure: (a) a 2-D fuzzy set R over x ∈ [0, 100] with contour levels 0.7–0.9, and its projection onto X.]
Definition: A linguistic variable is a variable whose values are words or sentences in a natural or artificial language, where X is the universe of discourse.
• Concentration and dilation: CON(A) = A², DIL(A) = A^{0.5}
• Ex: if A is a fuzzy set for old,
  then CON(A) is a fuzzy set for very old and
  DIL(A) is a fuzzy set for more or less old.
[Figure: MFs of old, very old and more or less old over x ∈ [0, 100].]
Composite linguistic terms
• Ex. Let the MFs for the linguistic terms young and old be
  μ_young(x) = bell(x; 20, 2, 0) = 1 / (1 + (x/20)⁴)
  μ_old(x) = bell(x; 30, 3, 100) = 1 / (1 + ((x − 100)/30)⁶)
• More or less old = DIL(old) = old^{0.5} = [1 / (1 + ((x − 100)/30)⁶)]^{0.5}
[Figure: MFs of old and more or less old over x ∈ [0, 100].]
• Given μ_young(x) = 1 / (1 + (x/20)⁴) and μ_old(x) = 1 / (1 + ((x − 100)/30)⁶),
then not young and not old =
  μ_{¬young ∩ ¬old}(x) = min(1 − 1/(1 + (x/20)⁴), 1 − 1/(1 + ((x − 100)/30)⁶))
[Figure: MFs of young, old and "not young and not old" over x ∈ [0, 100].]
• Given the same μ_young(x) and μ_old(x),
then young but not too young = young ∩ ¬(young²) =
  μ(x) = min(1/(1 + (x/20)⁴), 1 − [1/(1 + (x/20)⁴)]²)
[Figure: MFs of young, too young, not too young and "young but not too young" over x ∈ [0, 100].]
• Given the same μ_young(x) and μ_old(x),
then extremely old =
  CON(CON(CON(old))) = old⁸ = [1/(1 + ((x − 100)/30)⁶)]⁸
[Figure: MFs of old, very old, very very old and extremely old over x ∈ [0, 100].]
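The composite terms above can be sketched by composing the hedges CON and DIL with the assumed bell-shaped MFs from the slides:

```python
def mu_young(x):
    # bell(x; 20, 2, 0) = 1 / (1 + (x/20)^4)
    return 1.0 / (1.0 + (x / 20.0) ** 4)

def mu_old(x):
    # bell(x; 30, 3, 100) = 1 / (1 + ((x-100)/30)^6)
    return 1.0 / (1.0 + ((x - 100.0) / 30.0) ** 6)

def con(mu):   # concentration: "very"
    return lambda x: mu(x) ** 2

def dil(mu):   # dilation: "more or less"
    return lambda x: mu(x) ** 0.5

very_old = con(mu_old)
more_or_less_old = dil(mu_old)
extremely_old = con(con(con(mu_old)))     # = old^8

def not_young_and_not_old(x):
    return min(1.0 - mu_young(x), 1.0 - mu_old(x))
```

Since μ_old ≤ 1, dilation always raises the grade (more_or_less_old ≥ old) and concentration lowers it, matching the plotted curves.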
• Contrast intensification: The operation of contrast intensification on a linguistic value A is defined by
  INT(A) = 2A²,        for 0 ≤ μ_A(x) ≤ 0.5
         = ¬2(¬A)²,    for 0.5 ≤ μ_A(x) ≤ 1
  – Increases μ_A, if μ_A ≥ 0.5
  – Decreases μ_A, if μ_A ≤ 0.5
[Figure: contrast intensifier applied to A = triangle(x; 5, 30, 95), showing INT(A) and INT²(A).]
• A finite set of linguistic terms t_1, ..., t_n forms a partition if
  Σ_{i=1}^{n} μ_{t_i}(x) = 1, ∀x ∈ X
• where the t_i's are convex and normal fuzzy sets
CLASSICAL RELATIONS AND FUZZY RELATIONS
RELATIONS
Relations represent mappings between sets and connectives in logic.
Classical Relations: complete relation
• Ex.
            Father  Mother  Child
  Apple   [   1       1       1  ]
  Orange  [   1       1       1  ]  = E_R
  Banana  [   1       1       1  ]
Classical Relations: null relation on X × Z
• Ex.
            Cycle   Car   Bike
  Apple   [   0      0      0  ]
  Orange  [   0      0      0  ]  = R
  Banana  [   0      0      0  ]
OPERATIONS ON CRISP RELATIONS
PROPERTIES OF CRISP RELATIONS
Commutativity,
Associativity,
Distributivity,
Involution,
Idempotency,
DeMorgan’s Law,
Excluded Middle Laws.
COMPOSITION ON CRISP RELATIONS
Let R and S be relations on the Cartesian universes X × Y and Y × Z, respectively.
Ex. with X = {Apple (Ap), Chips (Ch), Sw}, Y = {Ram (Ra), Sita (Si), Ba}, Z = {Cycle, Car}:
        Ra  Si  Ba
  Ap  [  1   1   1 ]
  Ch  [  0   0   1 ]  = R
  Sw  [  1   0   1 ]
Example:
  Green [ 1   0.5   0 ]
  Red   [ 0   0.2   1 ]
Fuzzy Relations Matrices
• Example: Let R be a fuzzy relation between two sets X1 and
X2 where X1 is the set of diseases and X2 is the set of
symptoms.
X1={typhoid, viral fever, common cold}
X2={running nose, high temperature, shivering}
The fuzzy relation may be defined as
  R(x1, x2)   Running nose   High temperature   Shivering
  Typhoid         0.1              0.9              0.8
Fuzzy Relations Matrices
• The elements of two sets are X = {3, 4, 5} and Y = {3, 4, 5, 6, 7}. The MF of the fuzzy relation is defined as
  μ_R(x, y) = (y − x)/(x + y + 2), if y > x
            = 0,                   if y ≤ x
          3      4      5      6      7
  3  [    0   0.111   0.2   0.273  0.333 ]
  4  [    0     0    0.091  0.167  0.231 ]  = R
  5  [    0     0      0    0.077  0.143 ]
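The relation matrix above can be generated directly from its MF; a minimal sketch (the denominator x + y + 2 is read off the tabulated values, e.g. μ_R(3, 4) = 1/9 ≈ 0.111):

```python
X = [3, 4, 5]
Y = [3, 4, 5, 6, 7]

def mu_R(x, y):
    # mu_R(x, y) = (y - x)/(x + y + 2) if y > x, else 0
    return (y - x) / (x + y + 2) if y > x else 0.0

# Relation matrix rounded to 3 decimals, matching the table.
R = [[round(mu_R(x, y), 3) for y in Y] for x in X]
```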
The Real-Life Relation
• x is close to y
– x and y are numbers
• x depends on y
– x and y are events
• x and y look alike
– x and y are persons or objects
• If x is large, then y is small
– x is an observed reading and y is a
corresponding action
Classical to Fuzzy Relations
• A classical relation is a set of tuples
– Binary relation (x,y)
– Ternary relation (x,y,z)
– N-ary relation (x1,…xn)
– Connection with Cross product
– Married couples
– Nuclear family
– Points on the circumference of a circle
– Sides of a right triangle that are all integers
Example (Approximately Equal)
  μ_R(u, v) = 1,    if |u − v| = 0
            = 0.8,  if |u − v| = 1
            = 0.3,  if |u − v| = 2
            = 0,    otherwise
        [ 1    0.8  0.3  0    0   ]
        [ 0.8  1    0.8  0.3  0   ]
  M_R = [ 0.3  0.8  1    0.8  0.3 ]
        [ 0    0.3  0.8  1    0.8 ]
        [ 0    0    0.3  0.8  1   ]
OPERATIONS ON FUZZY RELATION
The basic operations on fuzzy sets also apply to fuzzy relations.
Projection: the projections of R onto Y and onto X are
  μ_{R_Y}(y) = max_x μ_R(x, y),   μ_{R_X}(x) = max_y μ_R(x, y)
PROPERTIES OF FUZZY RELATIONS
The properties of fuzzy sets (given below) hold good for
fuzzy relations as well.
Commutativity,
Associativity,
Distributivity,
Involution,
Idempotency,
DeMorgan’s Law,
Excluded Middle Laws.
COMPOSITION OF FUZZY RELATIONS
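The max-min composition used later in this section, μ_{R∘S}(x, z) = max_y min(μ_R(x, y), μ_S(y, z)), can be sketched on relation matrices (the matrices below are illustrative, not from the slides):

```python
def max_min_composition(R, S):
    """Compose R (on X x Y) with S (on Y x Z): pick, over all y,
    the strongest chain whose weakest link is largest."""
    rows, inner, cols = len(R), len(S), len(S[0])
    return [[max(min(R[i][k], S[k][j]) for k in range(inner))
             for j in range(cols)] for i in range(rows)]

R1 = [[0.7, 0.5],
      [0.8, 0.4]]
S1 = [[0.9, 0.6],
      [0.1, 0.7]]
C1 = max_min_composition(R1, S1)
```

Here C1[0][0] = max(min(0.7, 0.9), min(0.5, 0.1)) = 0.7, and similarly for the other entries.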
Extension Principle
Introduction
• The extension principle states that the image of fuzzy set A under the mapping f(·) can be expressed as a fuzzy set B:
  B = f(A), with μ_B(y_i) = μ_A(x_i), where y_i = f(x_i), i = 1, ..., n
• If f(·) is a many-to-one mapping, then there exist x1, x2 ∈ X, x1 ≠ x2, such that f(x1) = f(x2) = y*, y* ∈ Y. In that case the membership grade at y* is the maximum over the preimages:
  μ_B(y*) = max_{x : f(x) = y*} μ_A(x)
• Ex. Let A = {0.9/−1, 0.4/1} and f(x) = x² − 1. Then f(−1) = f(1) = 0 = y*, so μ_B(0) = max(0.9, 0.4) = 0.9.
[Figure: a discrete fuzzy set on x and its image fuzzy set induced via the extension principle.]
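The extension principle for a discrete fuzzy set can be sketched in a few lines, using the slide's example A = {0.9/−1, 0.4/1} with f(x) = x² − 1:

```python
def extend(A, f):
    """Image fuzzy set B with mu_B(y) = max over {x : f(x) = y} of mu_A(x)."""
    B = {}
    for x, mu in A.items():
        y = f(x)
        B[y] = max(B.get(y, 0.0), mu)   # colliding images take the max
    return B

A = {-1: 0.9, 1: 0.4}
B = extend(A, lambda x: x * x - 1)      # f(-1) = f(1) = 0
```

Both points map to y* = 0, so B keeps the larger grade 0.9.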
Fuzzy Rule
B. B. Misra
Fuzzy if-then rule/ fuzzy rule/ fuzzy implication/
fuzzy conditional statements
A fuzzy if-then rule assumes the form
if x is A then y is B (i.e. A → B)
where A and B are linguistic values defined by fuzzy sets on the
universes of discourse X and Y.
“x is A” is called the antecedent or premise.
“y is B” is called the consequent or conclusion.
Ex.
- If pressure is high, then volume is small.
- If road is slippery, then driving is dangerous.
- If a tomato is red, then it is ripe.
- If the speed is high, then apply the brake a little.
A fuzzy rule A → B is interpreted in two ways:
i. A coupled with B
If A is coupled with B, then
  R = A → B = A × B, with μ_R(x, y) = μ_A(x) ∗̃ μ_B(y)
  where ∗̃ is a T-norm operator and A → B represents the fuzzy relation R.
[Figure: A coupled with B, shown as min(A, B) on X × Y.]
ii. A entails B
If A entails B, it is written as 4 different formulas (A coupled with B):
1. Material implication: R = A → B = ¬A ∪ B
2. Propositional calculus: R = A → B = ¬A ∪ (A ∩ B)
[Fig. 2: A entails B on X × Y.]
Let F be a fuzzy relation on X × Y as in fig. (a).
Let R1 = A1 × B1 → C1 and R2 = A2 × B2 → C2.
Since the max-min composition operator '∘' is distributive over the union operator, it follows that
  C' = (A' × B') ∘ (R1 ∪ R2)
     = [(A' × B') ∘ R1] ∪ [(A' × B') ∘ R2]
     = C'1 ∪ C'2
4 steps of fuzzy reasoning
• Degree of Compatibility: Compare the known facts with the
antecedents of the fuzzy rules to find the degree of compatibility
with respect to each antecedent MF.
• Firing Strength: Combine degrees of compatibility with respect to
antecedent MFs in a rule using fuzzy AND or OR operators to form a
firing strength that indicates the degree to which the antecedent
part of the rule is satisfied.
• Qualified (induced) Consequent MFs: Apply the firing
strength to the consequent MF of a rule to generate a qualified
consequent MF. (The qualified consequent MFs represent how the
firing strength gets propagated and used in a fuzzy implication
statement.)
• Overall Output MF: Aggregate all the qualified consequent MFs
to obtain an overall output MF.
Fuzzy Inference Systems
• Introduction
• Mamdani Fuzzy models
• Sugeno Fuzzy Models
• Tsukamoto Fuzzy models
Introduction
Fuzzy inference is a computing paradigm based on fuzzy set theory, fuzzy if-then rules and fuzzy reasoning.
[Figure: block diagram of a fuzzy inference system. A crisp or fuzzy input x is matched against rules 1..r ("x is A_i then y is B_i", firing strengths w_1..w_r); the fuzzy rule outputs are combined by an aggregator and passed to a defuzzifier, which produces a crisp output. The overall mapping is nonlinear.]
Defuzzification [definition]
• Bisector of area (BOA): z_BOA satisfies
  ∫_α^{z_BOA} μ_A(z) dz = ∫_{z_BOA}^β μ_A(z) dz
• Mean of maximum (MOM):
  z_MOM = ∫_{Z'} z dz / ∫_{Z'} dz,
  where Z' = {z; μ_A(z) = μ*} and μ* is the maximum membership grade.
• By definition: if μ_A(z) has a single maximum at z = z*, then z_MOM = z*.
• However, if μ_A(z) reaches its maximum at z1 and z2, then z_MOM = (z1 + z2)/2.
Mamdani Fuzzy models (cont.)
[Figure: graphical max-min Mamdani reasoning over z ∈ [0, 8].
Premise 3 (rule): if x is A2 and y is B2 then z is C2
-----------------------------------------------------
Consequence (conclusion): z is C'
The firing strengths clip the consequent MFs, which are then aggregated into the output MF C'.]
Centroid of area, Z_COA:
  Z_COA = ∫ μ_C'(z) z dz / ∫ μ_C'(z) dz
For ease of calculation, the total area under C' is divided into segments numbered 1, 2, ..., 9.
  Seg. No. | Area Ai            | Centroid Zi (z value up to seg. + centroid of seg.) | Ai · Zi
  1        | (0.3*1)/2 = 0.15   | 0 + 1*(2/3) = 0.67                                  | 0.1005
  2        | 0.3*(3.6−1) = 0.78 | 1 + (3.6−1)/2 = 2.3                                 | 1.794
  ...      |                    |                                                     |
[Figure: C' over z ∈ [0, 8], divided into segments 1–9 with breakpoints at z = 3.6 and z = 5.5.]
Then the defuzzified value using the centroid of area method is Z_COA = Σ Ai·Zi / Σ Ai.
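On a discretized output universe the centroid reduces to a weighted average, Z_COA ≈ Σ μ(z)·z / Σ μ(z); a minimal numerical sketch (the symmetric triangular output set below is an illustrative choice, not the C' from the slides):

```python
def centroid(zs, mus):
    """Discrete centroid-of-area defuzzification."""
    return sum(m * z for z, m in zip(zs, mus)) / sum(mus)

# Symmetric triangular output set peaking at z = 4 on z in [0, 8]:
# its centroid must sit at the apex.
zs = [i * 0.1 for i in range(81)]
mus = [max(0.0, 1.0 - abs(z - 4.0) / 2.0) for z in zs]
```

`centroid(zs, mus)` returns 4.0 (up to floating-point error), as expected by symmetry.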
Example rule bases:
  R1: (x is s) & (y is s) → w1
  R2: (x is s) & (y is l) → w2
  R3: (x is l) & (y is s) → w3
  R4: (x is l) & (y is l) → w4
  if X is small then Y is C1
  if X is medium then Y is C2
  if X is large then Y is C3
Genetic Algorithm (GA)
• The GA was introduced by Prof. John Holland of the University
of Michigan, Ann Arbor, USA, in 1965, although his seminal book
was published in 1975. This book laid the foundation of
the GAs.
• Genetic Algorithms are the heuristic search and optimization
techniques that mimic the process of natural evolution.
• Principle Of Natural Selection
Evolved species
• Thus genetic algorithms implement the
optimization strategies by simulating evolution of
species through natural selection
Simple Genetic Algorithm
function sga() {
    Initialize population;
    Calculate fitness function;
    while (termination criteria not met) {
        Selection;
        Crossover;
        Mutation;
        Calculate fitness function;
    }
}
GA Operators:
• Problem encoding
• Fitness evaluation
• Crossover
• Mutation
• Selection
• Termination
GA example
• Let's consider the problem:
• Maximize f(x) = x(8 − x)
• This is not an appropriate problem for GA.
• For GA we consider problems for which
solution/mathematical models do not exist or the
time required is very high.
• However to understand GA, let’s consider this
simple problem.
• Maximize f(x) = x(8 − x)
    x  | f(x)
    0  |   0
    1  |   7
    2  |  12
    3  |  15
    4  |  16
    5  |  15
    6  |  12
    7  |   7
    8  |   0
    9  |  −9
   10  | −20
   ... |  ...
   −1  |  −9
   −2  | −20
   ... |  ...
[Figure: graph of f(x) = x(8 − x) for x ∈ [−2, 10].]
• For complex problems taking all possible input in
real space and finding the respective solution is not
possible.
• Let's examine the problem here:
  Maximize f(x) = x(8 − x)
• As we want to maximize, let's ignore negative values of
f(x).
• Then when x=0, f(x)=0 and when x=8, f(x)=0.
• f(x) has higher +ve values between these two
extremes.
• Let’s take 0 ≤ x ≤ 8 as our search space for the
problem.
Encoding
Problem encoding
• We will discuss binary GA here
• Our search space for problem: Maximize f(x)=x(8-x)
is 0 ≤ x ≤ 8.
• We know that with
  Bits | Patterns                               | Values encoded
   1   | 0, 1                                   | 2^1 = 2
   2   | 00, 01, 10, 11                         | 2^2 = 4
   3   | 000, 001, 010, 011, 100, 101, 110, 111 | 2^3 = 8
   n   | ...                                    | 2^n
• Then for our search space 0 ≤ x ≤ 8, i.e. for 9 values,
we need 4 bits (⌈log2(9)⌉ = 4).
Genes in the chromosome (each row is an individual):
  Id   g1 g2 g3 g4    x   f(x)
  #1    0  1  1  0    6    12
  #2    1  0  1  1   11   −33
  #3    1  1  0  1   13   −65
  #4    0  1  0  1    5    15
  #5    1  0  1  0   10   −20
  #6    0  0  1  1    3    15
for gen = 1 to MaxGen
    tempPop = Pop;   % a copy of the population; new offspring/children
                     % born after crossover are added to it
    Perform crossover
    Perform mutation
    Evaluate fitness of tempPop
    Perform selection: the selected individuals are stored in Pop
end
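A runnable sketch of this generational loop for the running example f(x) = x(8 − x), using 4-bit chromosomes, binary tournament selection, one-point crossover and bit-flip mutation (the population size, generation count and mutation rate are illustrative choices, not from the slides):

```python
import random

def f(x):
    return x * (8 - x)

def decode(bits):
    # 4-bit binary string -> integer 0..15
    return int("".join(map(str, bits)), 2)

def tournament(pop):
    a, b = random.sample(pop, 2)
    return a if f(decode(a)) >= f(decode(b)) else b

def crossover(p1, p2):
    cut = random.randint(1, 3)              # one-point crossover
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(bits, pm):
    return [1 - b if random.random() < pm else b for b in bits]

def sga(pop_size=20, max_gen=50, pm=0.05, seed=1):
    random.seed(seed)
    pop = [[random.randint(0, 1) for _ in range(4)] for _ in range(pop_size)]
    best = max(pop, key=lambda ind: f(decode(ind)))
    for _ in range(max_gen):
        nxt = []
        while len(nxt) < pop_size:
            c1, c2 = crossover(tournament(pop), tournament(pop))
            nxt += [mutate(c1, pm), mutate(c2, pm)]
        pop = nxt[:pop_size]
        best = max(pop + [best], key=lambda ind: f(decode(ind)))
    return decode(best), f(decode(best))

best_x, best_f = sga()
```

With these settings the search reliably drives the population toward the optimum x = 4, f(4) = 16.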
Crossover
• We select randomly two individuals from the mating pool; parent P1 has genes P1(1) P1(2) P1(3) P1(4).
• Usually there are only two main components of most
genetic algorithms that are problem dependent
– the problem encoding and
– the evaluation function.
• The problem is viewed as a black box with a series of
control dials represented by the parameters.
• Value returned by the evaluation function is considered
as the black box output.
• Output indicates how well a particular combination of
parameter settings solves the optimization problem.
• The goal is to set the various parameters so as to
optimize some output.
• Generally GA is used to solve nonlinear
problems.
• Each parameter may not be treated as
independent variable.
• Combined effects of the interacting
parameters are considered to optimize the
black box output.
• In the genetic algorithm community, the
interaction between variables is sometimes
referred to as epistasis.
• The first assumption is that the variables representing
parameters can be represented by bit strings.
• That is, the variables are discretized in an a priori fashion,
and the range of the discretization corresponds to some
power of 2.
• For example, with 10 bits per parameter, we obtain a
range with 1024 discrete values.
• When the parameters are continuous the discretization
is not a particular problem.
• The discretization provides enough resolution to make it
possible to adjust the output with the desired level of
precision.
• It also assumes that the discretization is in some sense
representative of the underlying function.
• If some parameter can only take on an exact
finite set of values then the coding issue becomes
more difficult.
• For example, if there are exactly 1200 discrete
values which can be assigned to some variable Xi.
• We need at least 11 bits to cover this range, but
this codes for a total of 2048 discrete values.
• The 848 unnecessary bit patterns may result in no
evaluation.
• Solving such coding problems is usually
considered to be part of the design of the
evaluation function.
Problem of extra search space due to encoding in higher dimension:
  Function                            | Problem search space             | Bits reqd. | GA search space                  | Ratio of search space needed
  ↑f(x)=x(8−x)                        | 0 ≤ x ≤ 8, i.e. 9                | 4          | 0 ≤ x ≤ 15, i.e. 16              | 9/16 ≈ 1/2
  ↑f(x)=x1(8−x1) + x2(8−x2)           | 0 ≤ x1,x2 ≤ 8, i.e. 9²           | 4*2        | 0 ≤ x1,x2 ≤ 15, i.e. 16²         | (9/16)² ≈ (1/2)²
  ↑f(x)=x1(8−x1) + x2(8−x2) + x3(8−x3)| 0 ≤ x1,x2,x3 ≤ 8, i.e. 9³        | 4*3        | 0 ≤ x1,x2,x3 ≤ 15, i.e. 16³      | (9/16)³ ≈ (1/2)³
  ↑f(x)=Σxi(8−xi), 1≤i≤10             | 0 ≤ xi ≤ 8, i.e. 9¹⁰             | 4*10       | 0 ≤ xi ≤ 15, i.e. 16¹⁰           | (9/16)¹⁰ ≈ (1/2)¹⁰ = 9.765×10⁻⁴
  ↑f(x)=Σxi(8−xi), 1≤i≤100            | 0 ≤ xi ≤ 8, i.e. 9¹⁰⁰            | 4*100      | 0 ≤ xi ≤ 15, i.e. 16¹⁰⁰          | (9/16)¹⁰⁰ ≈ (1/2)¹⁰⁰ = 7.888×10⁻³¹
[Figure: number lines 0–14 showing the problem search space (0–8) as a subset of the larger GA search space (0–15), in one and several dimensions.]
key ideas for encoding
Use a data structure as close as possible to the
natural representation
• Write appropriate genetic operators as
needed
• If possible, ensure that all genotypes
correspond to feasible solutions
• If possible, ensure that genetic operators
preserve feasibility
Encoding
• Encoding of chromosomes is one of the first
problems you face when starting to solve a
problem with GA.
• Encoding depends on the problem.
• It is important for problem solution to select
proper encoding.
• Encoding represents transformation of solved
problem to N-dimensional space.
Binary Encoding (1)
• Binary encoding is the most common, mainly
because first works about GA used this type of
encoding.
• In binary encoding every chromosome is a
string of bits, 0 or 1.
• Chromosome A 101100101100101011100101
Chromosome B 111111100000110000011111
Binary Encoding (2)
• Binary encoding gives many possible
chromosomes even with a small number of
alleles.
• On the other hand, this encoding is often not
natural for many problems and sometimes
corrections must be made after crossover
and/or mutation.
Binary Encoding (3)
• Example: Knapsack problem
• The problem: There are certain precious items
with given value and size.
The knapsack has given capacity.
Select items to maximize the value of items in the
knapsack, but do not exceed the knapsack capacity.
• Encoding: Each bit says, if the corresponding
object is in knapsack.
Integer represented as Binary
encoding
• To represent integer values, its binary
equivalent can be taken.
• Example: maximize f(x)=x(8-x)
• Values of x may be represented in binary form
0 1 0 1
Linear mapping
• A linear value x in the range [xl, xu] needs to be
represented in n binary bits,
where xl is the lower and xu is the upper
bound of the value x.
Then the binary string, after conversion to its decimal
value xn, can be mapped to the appropriate
range using
  x = xl + (xu − xl)/(2^n − 1) × xn
Linear mapping contd.
• Ex. Let the minimum mark to pass be 40 and the maximum mark
be 100. To optimize performance of passing
students, the range 100 − 40 = 60 needs 6 bits to encode.
• Let an individual be 0 1 0 0 1 0
• Then xn = 18
• And x = 40 + (100 − 40)/(2⁶ − 1) × 18 = 40 + 60/63 × 18 ≈ 57.14
• The rounded-off integer value 57 may
be considered for the problem.
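The linear mapping can be sketched as a small decoder; a minimal sketch of the formula x = xl + (xu − xl)/(2ⁿ − 1)·xn, using the marks example above:

```python
def decode_linear(bits, xl, xu):
    """Map an n-bit string linearly onto the range [xl, xu]."""
    n = len(bits)
    xn = int("".join(map(str, bits)), 2)   # decimal value of the string
    return xl + (xu - xl) / (2 ** n - 1) * xn

# Individual 0 1 0 0 1 0 -> xn = 18 -> x ~ 57.14 in [40, 100]
x = decode_linear([0, 1, 0, 0, 1, 0], 40, 100)
```

The all-zeros string decodes to xl and the all-ones string to xu, so the mapping covers the whole range.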
Binary string lookup: in such cases the decimal conversion of the binary string is not required. To obtain the actual value, the string is compared with a table of values.
  Code  | Fibre Angle
  0000  |   0
  0001  |  10
  0010  |  20
  0011  |  30
  0100  |  45
  0101  |  60
  0110  | −10
Permutation Encoding 1
• Permutation encoding can be used in ordering
problems, such as travelling salesman problem
or task ordering problem.
• In permutation encoding, every chromosome
is a string of numbers that represents a
position in a sequence.
• Chromosome A 1 5 3 2 6 4 7 9 8
Chromosome B 8 5 6 7 2 3 1 4 9
Permutation Encoding 2
• Permutation encoding is only useful for
ordering problems.
• For some types of crossover and mutation,
corrections must be made to keep the
chromosome consistent.
Travelling salesman problem (TSP)
• Given a list of cities and the distances between each pair of
cities, what is the shortest possible route that visits each city
exactly once and returns to the origin city?
Distance matrix:
        1     2     3     4     5     6
  1     0   863  1987  1407   998  1369
  2   863     0  1124  1012  1049  1083
  3  1987  1124     0  1461  1881  1676
  4  1407  1012  1461     0  2061  2095
  5   998  1049  1881  2061     0   331
  6  1369  1083  1676  2095   331     0
One-point crossover on permutation chromosomes can produce invalid tours (a city repeated, another missing), so correction is needed:
  Parents          Offspring
  4 1 3 2 5 6      4 1 3 1 5 6
  4 3 2 1 5 6      4 3 2 2 5 6
Value Encoding
• Direct value encoding can be used in problems, where
some complicated value, such as real numbers, are
used. Use of binary encoding for this type of problems
would be very difficult.
• In value encoding, every chromosome is a string of
some values. Values can be anything connected to the
problem, from integers, real numbers or chars to
some complicated objects.
• Chromosome A 1.2324 5.3243 0.4556 2.3293 2.4545
Chromosome B ABDJEIFJDHDIERJFDLDFLFEGT
Chromosome C (back), (back), (right), (forward), (left)
Value Encoding
• Value encoding is very good for some special
problems.
• New crossover and mutation specific for the
problem may be required.
• In value encoding more than one gene may have
the same value in a chromosome (which is
not allowed in permutation encoding).
Value Encoding example
• To solve the following problem using GA,
make a value encoding of:
  i. the β values in multiple linear regression.
Tree Encoding
• Tree encoding is good for evolving programs.
The programming language LISP is often used for this,
because programs in it are represented in this
form and can be easily parsed as a tree, so the
crossover and mutation can be done relatively
easily.
Tree Encoding
• Example of Problem: Finding a function from
given values
The problem: Some input and output values
are given. Task is to find a function, which will
give the best (closest to wanted) output to all
inputs.
Encoding: Chromosomes are functions
represented as trees.
• Order of genes on chromosome can be important
• Generally many different coding for the
parameters of a solution are possible
• Good coding is probably the most important
factor for the performance of a GA
• In many cases many possible chromosomes do
not code for feasible solutions
• During coding take care that the evaluation of
function is relatively fast.
Schema
GAs: Why Do They Work?
Notation (schema)
• {0,1,#} is the symbol alphabet, where # is
a special wild card symbol
• A schema is a template consisting of a
string composed of these three symbols
• Example: the schema [01#1#] matches the
strings: [01010], [01011], [01110] and
[01111]
Notation (order)
• The order of the schema S (denoted by o(S)) is
the number of fixed positions (0 or 1)
presented in the schema
• Example: for S1 = [01#1#], o(S1) = 3
• for S2 = [##1#1010], o(S2) = 5
• The order of a schema is useful to calculate the
survival probability of the schema under
mutation
• There are 2^(l−o(S)) different strings that match S, where l is the string length
Notation (defining length)
• The defining length of schema S (denoted by
δ(S)) is the distance between the first and the last
fixed positions in it
• Example: for S1 = [01#1#], δ(S1) = 4 − 1 = 3
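The schema notions above can be sketched directly over strings on the alphabet {0, 1, #}:

```python
def matches(schema, s):
    """A string matches a schema if it agrees on every fixed position."""
    return all(c == "#" or c == b for c, b in zip(schema, s))

def order(schema):
    """o(S): number of fixed (non-#) positions."""
    return sum(1 for c in schema if c != "#")

def defining_length(schema):
    """delta(S): distance between the first and last fixed positions."""
    fixed = [i for i, c in enumerate(schema) if c != "#"]
    return fixed[-1] - fixed[0]
```

For S1 = [01#1#] this gives o(S1) = 3 and δ(S1) = 3, matching the worked examples, and exactly 2^(5−3) = 4 of the 32 length-5 strings match S1.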
Theory of Evolution
• Every organism has unique attributes that can be
transmitted to its offspring
• Offspring are unique and have attributes from each
parent
• Selective breeding can be used to manage changes
from one generation to the next
• Nature applies certain pressures that cause
individuals to evolve over time
10/23/2021 2
Evolutionary Pressures
• Environment
– Creatures must work to survive by finding
resources like food and water
• Competition
– Creatures within the same species compete with
each other on similar tasks (e.g. finding a mate)
• Rivalry
– Different species affect each other by direct
confrontation (e.g. hunting) or indirectly by
fighting for the same resources
Natural Selection
• Creatures that are not good at completing tasks like
hunting or mating have fewer chances of having
offspring
• Creatures that are successful in completing basic
tasks are more likely to transmit their attributes to
the next generation since there will be more
creatures born that can survive and pass on these
attributes
• Purpose: to focus the search in promising regions of
the space
• Inspiration: Darwin’s theory “survival of the fittest”
• Trade-off between exploration and exploitation of
the search space
• Too strong fitness selection bias can lead to sub-
optimal solution
[Figure: roulette wheel giving a fitness-proportionate representation of 6 individuals with fitness values 18, 13, 33, 7, 25, 4.]
• Example: Consider maximization of f(x) = x(8 − x), with fitness F(x) = f(x).
• Some of the F(x) values may be negative, which is a problem for fitness-proportionate selection.
[Figure: roulette wheel with normalized fitness shares for an example population, e.g. 0.1889 → 19% and 0.1703 → 17%.]
Rank selection
But when the fitnesses are close enough, the rank selection method may be biased. In the example, the relative fitness gap between the worst and best individuals is 0.23, yet rank selection allocates about a 5.6 times larger segment to the best in comparison to the worst, which may again lead to improper favour to a candidate.
  Fitness | Rank | Rank × avg. | Pr (%)
   14.5   |  1   |  1 × 4.76   |   5
   15.5   |  2   |  2 × 4.76   |  10
   16     |  3   |  3 × 4.76   |  14
   17     |  4   |  4 × 4.76   |  19
   18     |  5   |  5 × 4.76   |  24
   19     |  6   |  6 × 4.76   |  28
  Total rank = 21; wheel segment per rank unit = 100/21 = 4.76
[Figure: roulette wheel with segments 5, 10, 14, 19, 24, 28 for the fitnesses 14.5–19.]
Linear Rank Selection
In Linear Rank selection, individuals are assigned
subjective fitness based on the rank within the
population:
– sfi = (P-ri)(max-min)/(P-1) + min
– Where ri is the rank of indvidual i,
– P is the population size,
– Max represents the fitness to assign to the best individual,
– Min represents the fitness to assign to the worst
individual.
pri = sfi / Σj sfj. Roulette wheel selection can then be
performed using the subjective fitnesses.
One disadvantage associated with linear rank
selection is that the population must be sorted on
each cycle.
  Fitness | Rank | sf  | Pr
   273    |  1   | 37  | 0.296
    85    |  2   | 31  | 0.248
    47    |  3   | 25  | 0.200
    23    |  4   | 19  | 0.152
     5    |  5   | 13  | 0.104
  Total          | 125 | 1
Linear Rank Selection with population size P = 5, max = 37, min = 13.
[Figure: roulette wheel with shares 30%, 25%, 20%, 15%, 10%.]
Exponential Ranking
[Figure: roulette wheel with shares 23%, 22%, 21%, 20%, 14%.]
• Two important issues of the evolution process are
population diversity and selective pressure
(Whitley 1989).
Population diversity: from the genes of the discovered
individuals, promising new areas of the search space
continue to be explored.
Selective pressure: the degree to which the better
individuals are favoured.
- Higher selective pressure gives better convergence.
- Very high selective pressure causes premature convergence
to a locally optimal solution; the population diversity
being exploited is lost.
- Low selective pressure gives slow convergence.
• Disadvantages of proportionate representation
– Stagnation of search because it lacks selective
pressure.
– Premature convergence as search is narrowed
down quickly.
Tournament
• Binary tournament
– Two individuals are randomly chosen; the better fit of the two
is selected as a parent
• Probabilistic binary tournament
– Two individuals are randomly chosen; with a chance p,
0.5<p<1, the better fit of the two is selected as a parent
• Larger tournaments
– n individuals are randomly chosen; the fittest one is
selected as a parent
– By changing n and/or p, the GA can be adjusted
dynamically
Binary tournament
  id | fitness     Random ids | Comparison of fitness | Selected id
  #1 |   23          #2, #5   |       19 > 17         |    #2
  #2 |   19          #4, #1   |       37 > 23         |    #4
  #3 |   48          #3, #2   |       48 > 19         |    #3
  #4 |   37          #5, #1   |       17 < 23         |    #1
  #5 |   17          #4, #3   |       37 < 48         |    #3
Probabilistic Binary tournament
  id | fitness     Random No. | Random ids | Comparison | Selected id
  #1 |   23           .77     |   #4, #3   |  37 < 48   |    #3
  #2 |   19           .17     |     -      |     -      |     -
  #3 |   48           .83     |   #3, #2   |  48 > 19   |    #3
  #4 |   37           .64     |   #4, #1   |  37 > 23   |    #4
  #5 |   17           .48     |     -      |     -      |     -
                      .37     |     -      |     -      |     -
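Both tournament variants can be sketched over a fitness list; a minimal sketch (indices into the fitness list stand in for the #id labels; the fitness values are from the table above):

```python
import random

def binary_tournament(fitnesses, rng):
    """Pick two random individuals; the fitter one is selected."""
    i, j = rng.sample(range(len(fitnesses)), 2)
    return i if fitnesses[i] >= fitnesses[j] else j

def probabilistic_tournament(fitnesses, p, rng):
    """With probability p (0.5 < p < 1) the fitter contestant wins."""
    i, j = rng.sample(range(len(fitnesses)), 2)
    better, worse = (i, j) if fitnesses[i] >= fitnesses[j] else (j, i)
    return better if rng.random() < p else worse

fit = [23, 19, 48, 37, 17]          # fitnesses of #1..#5
rng = random.Random(0)
parents = [binary_tournament(fit, rng) for _ in range(100)]
```

Under plain binary tournament the worst individual (fitness 17) can never be selected, while the probabilistic variant gives it a small chance.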
Generally reproduction is of two types:
1. Sexual reproduction, in which parts of the
parents are exchanged to produce two
children, is called crossover and produces
children that in general are different from
both parents but still contain large parts of
each.
2. Asexual reproduction, in which a parent
undergoes some form of transformation to
produce a child very much like itself, is called
mutation.
• The main task of mutation is to provide new
solutions that cannot be generated
otherwise.
• It introduces an element of random search,
termed as exploration, where the selection
and crossover processes focus attention on
promising regions of the search space
referred as exploitation.
• The occurrence of mutation operator is
determined by a user-settable parameter
known as the mutation probability.
• This probability is usually much lower than the
crossover probability to prevent too much
random search. Values of 0.001 to 0.05 are
common.
• In the case of binary strings, a mutation may
be the flipping of a randomly chosen bit.
Before mutation 0 1 1 0 0 0 1 1
After mutation 0 1 1 0 1 0 1 1
• In the case of integer or real-coded strings, it
may consist in replacing a number on the
string by a new random value within the
permissible range, or adding a random value
from some distribution to that number.
• In the real-coded strings, care must be taken
to map the new value back into the
permissible range.
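The two mutation styles just described can be sketched as follows (the perturbation step size for the real-coded case is an illustrative choice):

```python
import random

def bitflip_mutation(bits, pm, rng):
    """Flip each bit independently with mutation probability pm."""
    return [1 - b if rng.random() < pm else b for b in bits]

def real_mutation(genes, pm, lo, hi, rng, step=0.1):
    """Add a small uniform perturbation, then clip back into [lo, hi]."""
    out = []
    for g in genes:
        if rng.random() < pm:
            g += rng.uniform(-step, step) * (hi - lo)
        out.append(min(hi, max(lo, g)))
    return out

rng = random.Random(0)
flipped = bitflip_mutation([0, 1, 1, 0, 0, 0, 1, 1], 1.0, rng)  # pm = 1 flips all
```

The clipping step in `real_mutation` is the mapping back into the permissible range mentioned above.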
• The schema theorem places the greatest
emphasis on the role of crossover and
hyperplane sampling in genetic search.
• To maximize the preservation of hyperplane
samples after selection the disruptive effects
of crossover and mutation should be
minimized.
• This suggests that mutation should perhaps
not be used at all or at least used at very low
levels.
• The motivation for using mutation is to prevent the
permanent loss of any particular bit or allele.
• Crossover and selection operations cannot introduce
absent or lost genetic material.

A typical population with missing genetic material
(allele 1 is absent in the third column, allele 0 in the sixth):
0 1 0 0 0 1 1
1 0 0 1 0 1 0
0 1 0 1 1 1 1
0 0 0 1 0 1 0
1 1 0 0 1 1 1
0 1 0 0 0 1 0
1 0 0 1 1 1 0
0 0 0 1 1 1 1
• After several generations it is possible that selection will
drive all the bits in some position to a single value,
either 0 or 1.
• If this happens without mutation, the genetic algorithm
will converge to a suboptimal solution; this is called
premature convergence.

Example: Maximize f(x) = x(8 − x). After a few generations a
typical population may look like the one below; every chromosome
has converged to 0101, i.e. x = 5, f = 15, while the optimum is
x = 4, f = 16:
0 1 0 1
0 1 0 1
0 1 0 1
0 1 0 1
0 1 0 1
0 1 0 1
• Premature convergence is particularly a
problem when one is working with a small
population.
• Without a mutation operator there is no
possibility for reintroducing the missing bit
value.
• If the target function is nonstationary and the
fitness landscape changes over time, which is
certainly the case in real biological systems, then
there needs to be some source of continuing
genetic diversity.
The NK model is a mathematical model, described by its primary inventor Stuart Kauffman, of a
"tunably rugged" fitness landscape. "Tunable ruggedness" captures the intuition that both the overall
size of the landscape and the number of its local "hills and valleys" can be adjusted via changes to
its two parameters, N and K, with N being the length of the evolving string and K determining the
level of landscape ruggedness.
• Mutation can have a significant impact on
convergence and change the number of fixed
points in the space.
• Mutation may introduce invalid values outside
the search region, special care may be required to
avoid this.
• In the search space metaphor, every point in the
space is a genotype.
• Evolutionary variation (such as mutation, sexual
recombination and genetic rearrangements)
identifies the legal moves in this space.
Mutation operators for real-coded GA: non-uniform mutation
• Let C = (c1, …, ci, …, cn) be a chromosome and ci ∈ [ai, bi]
a gene to be mutated at generation t. The mutated gene ci′ is
      ci′ = ci + Δ(t, bi − ci), if τ = 0
      ci′ = ci − Δ(t, ci − ai), if τ = 1
with τ being a random number which may have
a value 0 or 1, and
      Δ(t, y) = y (1 − r^((1 − t/gmax)^b))
where r is a random number in the interval
[0,1], gmax is the maximum number of generations,
and b is a parameter chosen by the user,
which determines the degree of dependency
on the number of iterations.
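A minimal Python sketch of this non-uniform mutation rule, assuming the usual convention that τ is drawn as a random bit for each mutation (function names and the default b = 2 are illustrative):

```python
import random

def delta(t, y, r, g_max, b):
    """Δ(t, y) = y * (1 - r^((1 - t/g_max)^b)); shrinks toward 0 as t -> g_max."""
    return y * (1.0 - r ** ((1.0 - t / g_max) ** b))

def nonuniform_mutation(c, a_i, b_i, t, g_max, b=2.0, rng=random):
    """Mutate gene c in [a_i, b_i]; the perturbation never leaves the range."""
    tau = rng.randint(0, 1)
    r = rng.random()
    if tau == 0:
        return c + delta(t, b_i - c, r, g_max, b)
    return c - delta(t, c - a_i, r, g_max, b)
```

Because Δ(t, y) ≤ y, the mutated gene always stays inside [ai, bi], and the perturbation shrinks as t approaches gmax.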
Random Mutation
• Here, mutated solution is obtained from the original solution using
the rule given below.
Prmutated = Proriginal + (r − 0.5)Δ,
• where
– r is a random number varying in the range of (0.0, 1.0),
– Δ is the maximum value of perturbation defined by the user.
Example:
• Let us assume the original parent solution
Proriginal = 15.6.
• Determine the mutated solution Prmutated, corresponding to
r = 0.7 and
Δ = 2.5.
• The mutated solution is calculated like the following:
Prmutated = 15.6 + (0.7 − 0.5) × 2.5 = 16.1
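The rule is a one-liner; a Python sketch (the function name is illustrative) that reproduces the worked example:

```python
def random_mutation(pr_original, r, delta_max):
    """Pr_mutated = Pr_original + (r - 0.5) * Δ."""
    return pr_original + (r - 0.5) * delta_max

mutated = random_mutation(15.6, r=0.7, delta_max=2.5)   # 16.1, as in the example
```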
Polynomial Mutation
• Deb and Goyal proposed a mutation operator based on polynomial
distribution.
• The following steps are considered to obtain the mutated solution from an
original solution:
– Step 1: Generate a random number r lying between 0.0 and 1.0.
– Step 2: Calculate the perturbation factor δ corresponding to r using the
following equation:
      δ = (2r)^(1/(q+1)) − 1,          if r < 0.5
      δ = 1 − {2(1 − r)}^(1/(q+1)),    if r ≥ 0.5
where q is an exponent (a positive real number).
– Step 3: The mutated solution is then determined from the original
solution as follows:
      Prmutated = Proriginal + δ × δmax,
where δmax is the user-defined maximum perturbation.
Example: for r = 0.7 and q = 2,
      δ = 1 − {2(1 − 0.7)}^(1/3) = 1 − (0.6)^(1/3) = 0.1565
• The mutated solution is then determined from the original solution
like the following:
      Prmutated = Proriginal + δ × δmax = 15.7878
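The two-branch perturbation factor and the update can be sketched in Python (names are illustrative); with r = 0.7 and q = 2 it reproduces δ ≈ 0.1565 from the example:

```python
def poly_delta(r, q):
    """Perturbation factor of polynomial mutation for random number r, exponent q."""
    if r < 0.5:
        return (2.0 * r) ** (1.0 / (q + 1.0)) - 1.0
    return 1.0 - (2.0 * (1.0 - r)) ** (1.0 / (q + 1.0))

def polynomial_mutation(pr_original, r, q, delta_max):
    return pr_original + poly_delta(r, q) * delta_max
```

Note that δ is negative for r < 0.5 and positive for r > 0.5, so the mutated value can move to either side of the original.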
Crossovers in GA
• Crossover is the main genetic operator.
• Crossover is the genetic operator that mixes
two chromosomes together to form new
offspring.
• The intuition behind crossover is exploration
of new solutions by exploitation of the existing
solutions
Crossover allows
the genetic
algorithm to
explore new
areas in the
search space
and gives the GA
the majority of
its searching
power.
[Figure: the sixteen 4-bit chromosomes (0000–1111) arranged on a hypercube; an edge joins two strings at Hamming distance 1.]
Q. With which chromosomes (excepting itself) does 0101 perform a crossover operation that can only generate clones of the parents?
A. All the chromosomes at Hamming distance ≤ 1 from 0101,
i.e. 1101, 0001, 0111, 0100 (in the figure, all are one link away from 0101).
• The new offspring comprise different
segments from each parent and thereby
inherit properties from both parents.
• GAs construct a better solution by mixing
good characteristics of chromosomes together.
• Higher-fitness chromosomes have a greater
opportunity to be selected than lower-fitness
ones, so good solutions survive into the next
generation.
• The occurrence of crossover is determined
probabilistically, by a parameter called the crossover probability.
• When crossover is not applied, offspring are
simply duplicates of the parents, thereby
giving each individual a chance of passing on a
pure copy of its genes into the gene pool.
Possibilities in crossover
• Two parents produce two offspring
• There is a chance that the chromosomes of
the two parents are copied unmodified as
offspring
• There is a chance that the chromosomes of
the two parents are randomly recombined
(crossover) to form offspring
• Generally the chance of crossover is between
0.6 and 1.0
1-Point Crossover
• Choose a random point across the
chromosome length
• Split parents at this crossover point
• Create children by exchanging tails
Crossover with Single Crossover Point
Crossover
Point
↓
• Father 0 0 0 0 0 0 0 0
• Mother 1 1 1 1 1 1 1 1
• Child 1 0 0 0 0 1 1 1 1
• Child 2 1 1 1 1 0 0 0 0
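A sketch of the tail exchange in Python; with the crossover point fixed at 4 it reproduces the Father/Mother example above (the function name is illustrative):

```python
import random

def one_point_crossover(father, mother, point=None):
    """Split both parents at one point and exchange tails."""
    if point is None:
        point = random.randint(1, len(father) - 1)   # random crossover point
    child1 = father[:point] + mother[point:]
    child2 = mother[:point] + father[point:]
    return child1, child2

c1, c2 = one_point_crossover([0] * 8, [1] * 8, point=4)
```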
Multi-point Crossover
• Choose multiple random points across the
chromosome length
• Split parents at these crossover points
• Create children by exchanging bits among
these crossover points.
NB. Ensure that the genes are copied to the
same positions of the chromosomes only.
Crossover with Three Crossover Points
↓ ↓ ↓
• Father 0 0 0 0 0 0 0 0
• Mother 1 1 1 1 1 1 1 1
• Child 1 0 1 1 0 0 0 1 1
• Child 2 1 0 0 1 1 1 0 0
Uniform Crossover
• It is the extreme case of multi-point crossover.
• Each bit/gene has a probability of 0.5 of being
selected from either parent.
• The number of effective crossing points is not
fixed but averages to half of the string length.
Uniform Crossover working procedure
• Generally a vector mask of the size of the
chromosome length is generated randomly with 0
or 1 bit value for each crossover operation.
• To generate each bit for 1st offspring, respective
bit of mask is checked.
• If the bit value of mask is 1, respective gene of
parent1 is copied otherwise from parent2.
• This process is reversed to generate the 2nd
offspring.
Uniform Crossover example
Parent1:    0 0 0 0 0 0 0 0        Parent2:    1 1 1 1 1 1 1 1
Mask:       1 0 1 1 0 0 1 0
Offspring1: 0 1 0 0 1 1 0 1        Offspring2: 1 0 1 1 0 0 1 0
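The mask-driven procedure can be sketched in Python; with the mask from the example it reproduces both offspring (the function name is illustrative):

```python
def uniform_crossover(parent1, parent2, mask):
    """Mask bit 1 -> offspring1 takes the gene from parent1, else from parent2;
    the roles are reversed for offspring2."""
    off1 = [a if m == 1 else b for a, b, m in zip(parent1, parent2, mask)]
    off2 = [b if m == 1 else a for a, b, m in zip(parent1, parent2, mask)]
    return off1, off2

o1, o2 = uniform_crossover([0] * 8, [1] * 8, [1, 0, 1, 1, 0, 0, 1, 0])
```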
Assignment
[Figure: two-point crossover (Point1, Point2) applied to a population of 18-bit parents
of the form 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1, producing
Offspring1 of the form 0 0 0 1 1 1 1 0 0 1 1 1 0 0 0 0 1 1 and
Offspring2 of the form 1 1 1 0 0 0 0 1 1 0 0 0 1 1 1 1 0 0.]
Crossover operator for Real code
Linear Crossover
• It was proposed by Wright in 1991.
• To explain its principle, let us consider that two parents: Pr1 and Pr2
are participating in crossover.
• They produce three solutions as follows:
0.5(Pr1 + Pr2),
(1.5Pr1 − 0.5Pr2), and
(−0.5Pr1 + 1.5Pr2).
• Out of these three solutions, the best two are selected as the children
solutions.
Example:
• Let us assume that the parents are: Pr1 = 15.65, Pr2 = 18.83.
• Using the linear crossover operator, three solutions are found to be
like the following:
0.5(15.65 + 18.83) = 17.24,
1.5 × 15.65 − 0.5 × 18.83 = 14.06,
−0.5 × 15.65 + 1.5 × 18.83 = 20.42
Blend Crossover (BLX - α)
• This operator was developed by Eshelman and Schaffer in 1993.
• Let us consider two parents: Pr1 and Pr2, such that Pr1 < Pr2.
• It creates the children solutions lying in the range of
[{Pr1 − α(Pr2 − Pr1)}, {Pr2 + α(Pr2 − Pr1)}],
• where the constant α is to be selected, so that the children solutions
do not come out of the range.
• Another parameter γ has been defined by utilizing the said α and a
random number r varying in the range of (0.0, 1.0) like the following:
γ = (1 + 2α)r − α.
• The children solutions (Ch1, Ch2) are determined from the parents as
follows:
Ch1 = (1 − γ)Pr1 + γPr2,
Ch2 = (1 − γ)Pr2 + γPr1.
Example for Blend Crossover (BLX - α)
• Example: Let us assume that the parents are:
Pr1 = 15.65,
Pr2 = 18.83.
• Assume: α = 0.5, r = 0.6.
• The parameter γ is calculated like the following
γ = (1 + 2α)r − α = (1 + 2 × 0.5)0.6 − 0.5=0.7
• The children solutions are then determined as follows:
Ch1 = (1 − γ)Pr1 + γPr2 = 17.876
Ch2 = (1 − γ)Pr2 + γPr1 = 16.604
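A Python sketch of BLX-α (the function name is illustrative); with α = 0.5 and r = 0.6 it reproduces the example's children 17.876 and 16.604:

```python
def blend_crossover(pr1, pr2, alpha, r):
    """BLX-alpha: gamma = (1 + 2*alpha)*r - alpha, then blend the parents."""
    gamma = (1.0 + 2.0 * alpha) * r - alpha
    ch1 = (1.0 - gamma) * pr1 + gamma * pr2
    ch2 = (1.0 - gamma) * pr2 + gamma * pr1
    return ch1, ch2

ch1, ch2 = blend_crossover(15.65, 18.83, alpha=0.5, r=0.6)
```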
Simulated Binary Crossover (SBX)
• It was proposed by Deb and Agrawal in 1995.
• Its search power is represented with the help of a probability
distribution of generated children solutions from the given parents.
• A spread factor α has been introduced to represent the spread of the
children solutions with respect to that of the parents:
      α = |Ch1 − Ch2| / |Pr1 − Pr2|
where Pr1, Pr2 represent the parent points and Ch1, Ch2
are the children solutions.
• Three different cases may occur:
– Case 1: Contracting crossover (α < 1), i.e. the spread of the children solutions is
less than that of the parents.
– Case 2: Expanding crossover (α > 1), i.e. the spread of the children solutions is
more than that of the parents.
– Case 3: Stationary crossover (α = 1), i.e. the spread of the children solutions is
exactly the same as that of the parents.
Simulated Binary Crossover (SBX) cntd.
• The probability distributions for creating children solutions from
the parents have been assumed to be polynomial in nature, as in
the figure.
• The probability distributions depend on the exponent q, which is
a non-negative real number.
[Fig.: Probability distributions for creating the children solutions
from the parents vs. spread factor α]
Simulated Binary Crossover (SBX) cntd.
• For the contracting crossover, the probability distribution is given by:
      C(α) = 0.5 (q + 1) α^q,          α ≤ 1
• For the expanding crossover, it is expressed as:
      E(α) = 0.5 (q + 1) / α^(q+2),    α > 1
• For small values of q, the children are far away from the parents.
• For high values of q, the children are close to the parents.
• The figure on the previous page shows the variations of the probability
distributions for different values of q (say 0, 2 and 10).
• The area under the probability distribution curve in the contracting
crossover zone is ∫₀¹ C(α) dα = 0.5,
• and that in the expanding crossover zone is ∫₁^∞ E(α) dα = 0.5.
Simulated Binary Crossover (SBX) cntd.
• The following steps are used to create two children solutions: Ch1 and
Ch2 from the parents, Pr1 and Pr2:
• Step 1: Create a random number r lying between 0.0 and 1.0.
• Step 2: Determine α′ for which the cumulative probability equals r:
      ∫₀^α′ C(α) dα = r,          if r ≤ 0.5, and
      ∫₁^α′ E(α) dα = r − 0.5,    if r > 0.5
• Step 3: Knowing the value of α′, the children solutions are determined
like the following:
      Ch1 = 0.5[(Pr1 + Pr2) − α′(Pr2 − Pr1)],
      Ch2 = 0.5[(Pr1 + Pr2) + α′(Pr2 − Pr1)]
SBX Example
• Let the parents are: Pr1 = 15.65, Pr2 = 18.83.
• Determine children solutions using the SBX.
• Assume exponent q = 2.
• Let the generated random number, r = 0.6.
• As r > 0.5, we calculate α′ such that
      ∫₁^α′ E(α) dα = r − 0.5. Substituting r = 0.6, we have
      ∫₁^α′ 0.5(q + 1) α^−(q+2) dα = 0.1. Substituting q = 2, we have
      ∫₁^α′ 1.5 α^−4 dα = 0.1
      0.5 (1 − α′^−3) = 0.1  ⇒  α′^−3 = 0.8  ⇒  α′ = (0.8)^(−1/3) = 1.0772
• Then the children generated are:
      Ch1 = 0.5[(Pr1 + Pr2) − α′(Pr2 − Pr1)] = 0.5[(15.65 + 18.83) − 1.0772(18.83 − 15.65)] = 15.5273
      Ch2 = 0.5[(Pr1 + Pr2) + α′(Pr2 − Pr1)] = 0.5[(15.65 + 18.83) + 1.0772(18.83 − 15.65)] = 18.9527
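Inverting the two cumulative distributions in Step 2 gives a closed form for α′, so SBX can be sketched compactly in Python (the closed-form inversion is derived from C(α) and E(α) above; function names are illustrative). With r = 0.6 and q = 2 it reproduces α′ ≈ 1.0772 and the children 15.5273 and 18.9527:

```python
def sbx_alpha(r, q):
    """Spread factor α' obtained by inverting the cumulative probability."""
    if r <= 0.5:
        return (2.0 * r) ** (1.0 / (q + 1.0))           # contracting zone
    return (2.0 * (1.0 - r)) ** (-1.0 / (q + 1.0))      # expanding zone

def sbx(pr1, pr2, r, q):
    a = sbx_alpha(r, q)
    ch1 = 0.5 * ((pr1 + pr2) - a * (pr2 - pr1))
    ch2 = 0.5 * ((pr1 + pr2) + a * (pr2 - pr1))
    return ch1, ch2

ch1, ch2 = sbx(15.65, 18.83, r=0.6, q=2)
```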
Crossover is a critical feature of
genetic algorithms
• The fitness of the chromosome drives the
process of GA, which relates the qualitative
‘goodness’ of a candidate solution in
quantitative terms.
• The fitness function encapsulates the
problem-specific knowledge.
• The chromosomes are decoded into their
actual representation, analyzed and given a
scalar fitness value to characterize how close
they are to the ideal solution.
• The cost/fitness/objective function determines
the environment within which the solutions
"live."
• A fitness value reflects how good a
chromosome is.
• An ideal fitness function should correlate
closely to goal of the problem and should be
quickly computable.
• A fitness function quantifies the optimality of
a solution (chromosome) so that a
particular solution may be ranked against all
the other solutions.
Purpose of fitness function
• Parent selection
• Measure for convergence
• For Steady state: Selection of individuals to die
• Should reflect the value of the chromosome in
some “real” way
• The most critical part of GA after encoding.
Fitness scaling
• Fitness values are scaled by subtraction and
division so that worst value is close to 0 and
the best value is close to a certain value,
typically 2
– Chance for the most fit individual is 2 times the
average
– Chance for the least fit individual is close to 0
Fitness scaling
• Problems when the original maximum is very
extreme (super-fit) or when the original
minimum is very extreme (super-unfit)
– Can be solved by defining a minimum and/or a
maximum value for the fitness
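One simple reading of the subtraction-and-division scheme, sketched in Python (mapping the worst fitness to 0 and the best to 2; this reading is an illustrative assumption, other linear scalings are common):

```python
def scale_fitness(fitnesses, best_target=2.0):
    """Shift so the worst value becomes 0, then scale so the best equals best_target."""
    worst, best = min(fitnesses), max(fitnesses)
    if best == worst:                    # flat population: nothing to scale
        return [1.0] * len(fitnesses)
    return [best_target * (f - worst) / (best - worst) for f in fitnesses]

scaled = scale_fitness([23, 19, 48, 37, 17])   # fitness values from the tournament slide
```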
Perceptron
      ∑ᵢ₌₁ⁿ wᵢxᵢ + b = 0    —(1)
• In the figure, a point (x1, x2) lying above the
boundary line is assigned to class 𝒞1, and if
the point is below it, it is assigned to class 𝒞2.
[Figure: Illustration of a hyperplane as decision boundary
for a 2D, 2-class pattern classification problem]
• Example: Considering the decision boundary of the given figure,
classify the points {(2,2), (1, 2), (4,-1), (5,-3), (-1,7), (2,1)}.
We know the equation of a line is y = mx + c. The given points (0,5) and (3,0)
lie on the decision boundary, so we have
      0 = 3m + c and
      5 = 0 + c, i.e. c = 5; then, from the previous eqn., m = −c/3 = −5/3.
Then the equation is y = −(5/3)x + 5.
Here in the neuron, x2 is used instead of y, so the above eqn. can be rewritten as
      x2 = −(5/3)x1 + 5    —(a)
• But the eqn. we use for the neuron is w1x1 + w2x2 + b = 0;
      ⇒ if w2 ≠ 0, x2 = −(w1/w2)x1 − b/w2    —(b)
Comparing (a) and (b), we get −(w1/w2) = −(5/3) and −b/w2 = 5, i.e. −b/w2 = 15/3, so
      w1 = 5, w2 = 3, b = −15.
Then the eqn. for the decision boundary obtained is 5x1 + 3x2 − 15 = 0.
If point (2,2) is considered, we get 5·2 + 3·2 − 15 = 1 > 0, so the point is above the decision boundary and
point (2,2) ∈ 𝒞1.
If point (1,2) is considered, we get 5·1 + 3·2 − 15 = −4 < 0, so the point is below the decision boundary and
point (1,2) ∈ 𝒞2.
…
If point (5,−3) is considered, we get 5·5 − 3·3 − 15 = 1 > 0, so the point is above the decision boundary and
point (5,−3) ∈ 𝒞1.
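The classification rule is just the sign of 5x1 + 3x2 − 15; a short Python sketch that checks all six points from the example:

```python
def classify(point, w1=5.0, w2=3.0, b=-15.0):
    """Class 1 if the point lies above the boundary w1*x1 + w2*x2 + b = 0, else class 2."""
    x1, x2 = point
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 2

points = [(2, 2), (1, 2), (4, -1), (5, -3), (-1, 7), (2, 1)]
results = {p: classify(p) for p in points}
```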
• Bias b shifts the boundary away from the origin.
• Synaptic weights are adapted on an iteration-by-iteration basis.
• This adaptation uses an error-correction rule known as the
perceptron convergence algorithm.
Perceptron convergence theorem
• Let us use the modified signal
flow graph as in the Figure.
• Let the (m+1)-by-1 input vector be
X(n) = [+1, x1(n), …, xm(n)]ᵀ, where n denotes the
iteration step, and the corresponding weight vector be
W(n) = [w0(n), w1(n), …, wm(n)]ᵀ,
where w0(n) = b(n).
• The perceptron functions properly when the classes
𝒞1 and 𝒞2 are linearly separable.
• In Figure (a), 𝒞1 and 𝒞2 are sufficiently
separated to draw a hyperplane as
decision boundary.
• In Figure (b), 𝒞1 and 𝒞2 are too close to
each other and become non-linearly separable
(cannot be classified by a perceptron).
• Suppose the input variables originate from
two linearly separable classes.
• Let 𝒳1 be the subset of training vectors
X1(1), X1(2), … that belong to 𝒞1, and 𝒳2 be the subset of
training vectors X2(1), X2(2), … that belong to 𝒞2.
• The complete training set is the union of 𝒳1
and 𝒳2.
• Given input vectors from 𝒞1 and 𝒞2 to train the classifier, the training
process involves adjustment of the weight vector W such that
      WᵀX > 0 for every input vector X ∈ 𝒞1
      WᵀX ≤ 0 for every input vector X ∈ 𝒞2    —(3)
Rules for adapting synaptic weights of perceptron
1. If x(n) is correctly classified by W(n), no correction is made to W, i.e.
      W(n+1) = W(n), if Wᵀ(n)x(n) > 0 and x(n) ∈ 𝒞1
      W(n+1) = W(n), if Wᵀ(n)x(n) ≤ 0 and x(n) ∈ 𝒞2    —(4)
2. Otherwise, update the weight:
      W(n+1) = W(n) − η(n)x(n), if Wᵀ(n)x(n) > 0 and x(n) ∈ 𝒞2
      W(n+1) = W(n) + η(n)x(n), if Wᵀ(n)x(n) ≤ 0 and x(n) ∈ 𝒞1    —(5)
where the learning-rate parameter η(n) controls the adjustment
applied to the weight vector at iteration n.
Fixed increment adaptation rule
• η(n) = η > 0, where η is a constant learning-rate parameter
independent of the iteration number n.
• η > 0 merely scales the pattern vectors without affecting their separability.
• Proof: let η = 1, W(0) = 0 and Wᵀ(n)x(n) < 0 for n = 1, 2, … and x(n) ∈ 𝒞1,
i.e. the perceptron incorrectly classifies the vectors x(1), x(2), … (the 2nd
condition of eqn. (3) is violated).
• Then, as η = 1, with the 2nd condition of eqn. (5), we have
      W(n+1) = W(n) + x(n), x(n) ∈ 𝒞1    —(6)
• Given the initial condition W(0) = 0, we iteratively solve for W(n+1):
      W(0) = 0
      W(1) = W(0) + ηx(1) = x(1)
      W(2) = W(1) + ηx(2) = x(1) + x(2)
      W(3) = W(2) + ηx(3) = x(1) + x(2) + x(3)
      …
      W(n+1) = x(1) + x(2) + … + x(n)    —(7)
• Since the classes are linearly separable, there exists a solution W0 for which
W0ᵀx > 0 for x(1), x(2), …, x(n) ∈ 𝒳1.
• For a fixed solution W0, let us define a positive number
      α = min over x(n) ∈ 𝒳1 of W0ᵀx(n)    —(8)
• Hence W0ᵀW(n+1) = W0ᵀx(1) + … + W0ᵀx(n) ≥ nα    —(9)
  (there are n terms on the right-hand side of the above equation)
• Given two vectors W0 and W(n+1), the Cauchy–Schwarz inequality
states that
      ‖W0‖² ‖W(n+1)‖² ≥ [W0ᵀW(n+1)]²    —(10)
• Combining (9) and (10),
      ‖W(n+1)‖² ≥ n²α² / ‖W0‖²    —(11)
• Eqn. (6) may be rewritten as
      W(k+1) = W(k) + x(k), for k = 1, 2, …, n, and x(k) ∈ 𝒞1    —(12)
• Taking the squared Euclidean norm of both sides of eqn. (12):
      ‖W(k+1)‖² = ‖W(k)‖² + ‖x(k)‖² + 2Wᵀ(k)x(k)    —(13)
• As the perceptron incorrectly classifies the vector x(k) ∈ 𝒞1, we have
Wᵀ(k)x(k) < 0; then from eqn. (13) we deduce
      ‖W(k+1)‖² ≤ ‖W(k)‖² + ‖x(k)‖², k = 1, …, n    —(14)
  (Ex.: 10 = 9 + 5 + 2(−2) ⇒ 10 ≤ 9 + 5)
• Adding these n inequalities for k = 1, 2, …, n, with W(0) = 0, we have
      ‖W(n+1)‖² ≤ Σₖ₌₁ⁿ ‖x(k)‖² = ‖x(1)‖² + ‖x(2)‖² + … + ‖x(n)‖²
                ≤ β + β + … + β = nβ    —(15)
where β is a positive number defined by
      β = max over x(k) ∈ 𝒳1 of ‖x(k)‖²    —(16)
• For large n, eqn. (15) conflicts with eqn. (11), ‖W(n+1)‖² ≥ n²α²/‖W0‖²,
whose right-hand side grows quadratically in n; hence n cannot grow without
bound, and the perceptron must converge after a finite number of iterations.
[Figure: Madaline network — inputs x1 … xn (units X1 … Xn) connect through
weights w11 … wnm to hidden Adaline units Z1 … Zm (biases b1 … bm); the hidden
outputs connect through weights v1 … vm to the output unit Y (bias b0).]
1. Initialize weights between input and hidden layer. Set the weights between
hidden and output layer. Set the learning rate α.
2. Repeat
3. for each training pattern
4. for each hidden (Adaline) unit
5. Calculate net input, zinj=bj+∑ni=1xiwij, 1 ≤ i ≤ n, 1 ≤ j ≤ m
6. Calculate output of each hidden unit, zj=f(zinj)
7. endfor
8. Find input to the output unit, yin=b0+∑mj=1zjvj, 1 ≤ j ≤ m
9. Calculate output of the net, y=f(yin)
10. Update weights
11. if d ≠ y
12. bj (new)=bj (old) + α (d - zinj)
13. wij (new)=wij (old) + α (d - zinj) xi
14. endif
15. endfor
16. Until stopping condition satisfied (no or minimal wt. change or max. epochs)
Implement the XOR fn. using a Madaline n/w. Use bipolar input and output.

Training data:
  x1   x2    d
   1    1   -1
   1   -1    1
  -1    1    1
  -1   -1   -1

Let the initial weights be
  [w11, w21, b1] = [0.05, 0.2, 0.3]
  [w12, w22, b2] = [0.1, 0.2, 0.15]

[Figure: Madaline network with inputs X1, X2, hidden units Z1, Z2
(weights w11, w21, w12, w22 and biases b1, b2) and output unit Y
(weights v1, v2 and bias b0).]
      ∂vⱼ(n)/∂wⱼᵢ(n) = yᵢ(n)    (10)
• Using eqs. (7)–(10) in eq. (6), we have
      ∂E(n)/∂wⱼᵢ(n) = [∂E(n)/∂eⱼ(n)] [∂eⱼ(n)/∂yⱼ(n)] [∂yⱼ(n)/∂vⱼ(n)] [∂vⱼ(n)/∂wⱼᵢ(n)]
                    = −eⱼ(n) φ′ⱼ(vⱼ(n)) yᵢ(n)    (11)
• Delta rule: the correction Δwⱼᵢ(n) applied to wⱼᵢ(n) is defined by
the delta rule:
      Δwⱼᵢ(n) = −η ∂E(n)/∂wⱼᵢ(n)    (12)
where η is the learning-rate parameter.
• The minus sign in eq. (12) accounts for gradient descent in the weight
space.
Basics of Back-Propagation Algorithm cntd.
• Use of eq. (11) in eq. (12) yields
      Δwⱼᵢ(n) = η δⱼ(n) yᵢ(n)    (13)
where the local gradient δⱼ(n) is defined by
      δⱼ(n) = −∂E(n)/∂vⱼ(n)
            = −[∂E(n)/∂eⱼ(n)] [∂eⱼ(n)/∂yⱼ(n)] [∂yⱼ(n)/∂vⱼ(n)]
            = eⱼ(n) φ′ⱼ(vⱼ(n))    (14)
• For the logistic activation function
      yⱼ(n) = φⱼ(vⱼ(n)) = 1 / (1 + e^(−a vⱼ(n))), a > 0,
differentiating gives
      φ′ⱼ(vⱼ(n)) = a e^(−a vⱼ(n)) / (1 + e^(−a vⱼ(n)))² = a yⱼ(n)[1 − yⱼ(n)]    (27)
For the output layer, yⱼ(n) = oⱼ(n); then
      δⱼ(n) = eⱼ(n) φ′ⱼ(vⱼ(n)) = a [dⱼ(n) − oⱼ(n)] oⱼ(n)[1 − oⱼ(n)]    (28)
For hidden layers,
      δⱼ(n) = φ′ⱼ(vⱼ(n)) Σₖ δₖ(n) wₖⱼ(n) = a yⱼ(n)[1 − yⱼ(n)] Σₖ δₖ(n) wₖⱼ(n)    (29)
Activation functions for MLP cntd.
II. Hyperbolic tangent function
      yⱼ(n) = φⱼ(vⱼ(n)) = a tanh(b vⱼ(n)), (a, b) > 0    (30)
where a, b are constants.
• Then, by differentiating the hyperbolic tangent function, we get
      φ′ⱼ(vⱼ(n)) = ab sech²(b vⱼ(n)) = ab(1 − tanh²(b vⱼ(n)))
                 = ab(1 − tanh(b vⱼ(n)))(1 + tanh(b vⱼ(n)))
                 = (b/a)(a − a tanh(b vⱼ(n)))(a + a tanh(b vⱼ(n)))
                 = (b/a)(a − yⱼ(n))(a + yⱼ(n))    (31)
• The equations for the output and hidden layers using the hyperbolic
tangent fn. can be written as:
i. For the output layer:
      δⱼ(n) = eⱼ(n) φ′ⱼ(vⱼ(n)) = (b/a)[dⱼ(n) − oⱼ(n)](a − yⱼ(n))(a + yⱼ(n))    (32)
[Figure: a two-layer feedforward network — inputs x1, x2 (outputs y⁰₁, y⁰₂ of
layer l = 0) feed hidden layer j1 and output layer j2 through weights w¹ and w²;
the outputs o1 = y²₁, o2 = y²₂ are compared with targets d1, d2 to give errors e1, e2.]
where
  vˡⱼ is the induced local field of neuron j in layer l,
  yˡ⁻¹ᵢ is the output of neuron i in layer l−1,
  wˡⱼᵢ is the weight of neuron j in layer l that is fed from neuron i in layer l−1;
  for i = 0, yˡ⁻¹₀ = +1 and wˡⱼ₀ = bˡⱼ (the bias).
For l > 0, find the output signal yˡⱼ = φⱼ(vˡⱼ); for the first (input) layer, y⁰ⱼ = xⱼ;
and, in the backward pass, the local gradients are
      δˡⱼ = φ′ⱼ(vˡⱼ) Σₖ δˡ⁺¹ₖ wˡ⁺¹ₖⱼ
• Forward pass
      y¹₁ = φ(v¹₁) = 1/(1 + exp(−a v¹₁)) = 1/(1 + exp(−0.2 × 0.72)) = 0.5359
      y¹₂ = φ(v¹₂) = 1/(1 + exp(−a v¹₂)) = 1/(1 + exp(−0.2 × 1.44)) = 0.5715
      v²₁ = w²₁₀ + w²₁₁ y¹₁ + w²₁₂ y¹₂ = 0.4 + 0.7 × 0.5359 + 0.8 × 0.5715 = 1.2323
      o₁ = y²₁ = 1/(1 + exp(−a v²₁)) = 1/(1 + exp(−0.2 × 1.2323)) = 0.5613
      e₁ = d₁ − o₁ = 0.7 − 0.5613 = 0.1387
• Backward pass
      δ²₁ = e₁ φ′(v²₁) = e₁ a y²₁(1 − y²₁) = 0.1387 × 0.2 × 0.5613 × [1 − 0.5613] = 0.0068
      δ¹₁ = φ′(v¹₁) Σₖ δ²ₖ w²ₖ₁ = a y¹₁(1 − y¹₁) δ²₁ w²₁₁
          = 0.2 × 0.5359 × [1 − 0.5359] × 0.0068 × 0.7 = 0.0002367
      δ¹₂ = φ′(v¹₂) Σₖ δ²ₖ w²ₖ₂ = a y¹₂(1 − y¹₂) δ²₁ w²₁₂
          = 0.2 × 0.5715 × [1 − 0.5715] × 0.0068 × 0.8 = 0.0002664
Example on back-propagation learning cntd.
• Given weights and parameters:
      w¹₁₀ = 0.5, w¹₂₀ = 0.6, w²₁₀ = 0.4
      w¹₁₁ = 0.4, w¹₂₁ = 0.3, w²₁₁ = 0.7
      w¹₁₂ = 0.2, w¹₂₂ = 0.9, w²₁₂ = 0.8
      Parameters: α = 0.1, η = 0.5; input: x₁ = 0.1 (= y⁰₁), x₂ = 0.9 (= y⁰₂)
[Figure: the 2-2-1 network with biases b¹₁ = w¹₁₀, b¹₂ = w¹₂₀, b²₁ = w²₁₀ fed
from +1 units, hidden outputs y¹₁, y¹₂, output o₁ = y²₁ and error e₁.]
• Values obtained:
      y¹₁ = 0.5359, y¹₂ = 0.5715, o₁ = y²₁ = 0.5613
      δ¹₁ = 0.0002367, δ¹₂ = 0.0002664, δ²₁ = 0.0068
• Let's update the weights using
      wˡⱼᵢ(next) = wˡⱼᵢ(current) + α Δwˡⱼᵢ(previous) + η δˡⱼ yˡ⁻¹ᵢ
(in this first iteration, the momentum term is taken as α × wˡⱼᵢ(current)):
      b²₁ = w²₁₀: 0.4 + 0.1 × 0.4 + 0.5 × 0.0068 × 1 = 0.4434
      w²₁₁:       0.7 + 0.1 × 0.7 + 0.5 × 0.0068 × 0.5359 = 0.7718
      w²₁₂:       0.8 + 0.1 × 0.8 + 0.5 × 0.0068 × 0.5715 = 0.8819
      b¹₁ = w¹₁₀: 0.5 + 0.1 × 0.5 + 0.5 × 0.0002367 × 1 = 0.5501
      w¹₁₁:       0.4 + 0.1 × 0.4 + 0.5 × 0.0002367 × 0.1 = 0.44
      w¹₁₂:       0.2 + 0.1 × 0.2 + 0.5 × 0.0002367 × 0.9 = 0.2201
      b¹₂ = w¹₂₀: 0.6 + 0.1 × 0.6 + 0.5 × 0.0002664 × 1 = 0.6601
      w¹₂₁:       0.3 + 0.1 × 0.3 + 0.5 × 0.0002664 × 0.1 = 0.33
      w¹₂₂:       0.9 + 0.1 × 0.9 + 0.5 × 0.0002664 × 0.9 = 0.9901
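The worked example's forward pass and local gradients can be checked numerically; a Python sketch assuming the logistic activation with slope a = 0.2 and the initial weights listed above:

```python
import math

a = 0.2                                   # slope of the logistic activation
x1, x2, d1 = 0.1, 0.9, 0.7                # inputs and desired response

def f(v):                                 # logistic activation with slope a
    return 1.0 / (1.0 + math.exp(-a * v))

# Forward pass
v11 = 0.5 + 0.4 * x1 + 0.2 * x2           # hidden neuron 1
v12 = 0.6 + 0.3 * x1 + 0.9 * x2           # hidden neuron 2
y11, y12 = f(v11), f(v12)
v21 = 0.4 + 0.7 * y11 + 0.8 * y12         # output neuron
o1 = f(v21)
e1 = d1 - o1

# Backward pass (logistic derivative a*y*(1-y), eqs. (28)-(29))
delta21 = e1 * a * o1 * (1.0 - o1)
delta11 = a * y11 * (1.0 - y11) * delta21 * 0.7
delta12 = a * y12 * (1.0 - y12) * delta21 * 0.8
```

Running it reproduces the slide's values (0.5359, 0.5715, 0.5613, 0.0068, 0.0002367, 0.0002664) to four decimal places.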
Rate of Learning
• Back propagation approximates the trajectory in weight space.
• When η is small, change to weight is small and weight space trajectory is
smoother and learning is slow.
• When η too large, large change in weight, network unstable (i.e. oscillatory)
• Momentum term included to increase learning rate without instability by
modifying delta rule as:
      Δwⱼᵢ(n) = α Δwⱼᵢ(n − 1) + η δⱼ(n) yᵢ(n)    (34)
• Eq. (34) is also called the Generalized Delta Rule.
• α is the momentum constant, a positive number that controls the
feedback loop around Δwⱼᵢ(n), with 0 ≤ α ≤ 1.

Heuristic on target values (for an antisymmetric activation with limits ±a):
  i. for the limit +a, we set dⱼ = a − ε
  ii. for the limit −a, we set dⱼ = −a + ε
where ε is a positive constant.
Ex.: for a = 1.7159, we set ε = 0.7159; then the desired response dⱼ is ±1.
• If the offset ε is not considered, the BPA drives the free parameters to
infinity and slows down the learning process by driving the hidden
neurons into saturation.
Heuristics for making the BPA perform better cnt.
5. Normalizing the inputs
• Each input's mean value, averaged over the entire training set, should be
close to zero, or small compared to its standard deviation:
      μᵢ = E[yᵢ] ≈ 0    (35)
• The inputs should be scaled so that their variances are equal:
      σ²ᵧ = E[(yᵢ − μᵢ)²] = E[yᵢ²] = 1 for all i (using (35), i.e. μᵢ = 0)    (36)
• Let the inputs be uncorrelated, i.e.
      E[yᵢyₖ] = 1 for k = i, and 0 for k ≠ i    (37)
Heuristics for making the BPA perform better cnt.
6. Initialization cntd.
• Let the weights be drawn from a uniformly distributed set of numbers with zero
mean,
      μ_w = E[wⱼᵢ] = 0 for every (j, i) pair    (38)
and variance σ²_w. Then the mean of the induced local field is
      μ_v = E[vⱼ] = E[Σᵢ₌₁ᵐ wⱼᵢ yᵢ] = Σᵢ₌₁ᵐ E[wⱼᵢ] E[yᵢ] = 0
      (using (38), i.e. E[wⱼᵢ] = 0)    (39)
and its variance is
      σ²_v = E[(vⱼ − μ_v)²] = E[vⱼ²]
           = E[Σᵢ₌₁ᵐ Σₖ₌₁ᵐ wⱼᵢ wⱼₖ yᵢ yₖ]
           = Σᵢ₌₁ᵐ Σₖ₌₁ᵐ E[wⱼᵢ wⱼₖ] E[yᵢ yₖ]
             (using (37), i.e. E[yᵢyₖ] = 1 for k = i, 0 for k ≠ i)
           = Σᵢ₌₁ᵐ E[w²ⱼᵢ] = m σ²_w    (40)
• For the hyperbolic tangent fn., taking a = 1.7159, b = 2/3, and setting σ_v = 1,
from (40) we get
      σ²_v = m σ²_w = 1  ⇒  σ_w = m^(−1/2)    (41)
• Then, for a uniform distribution, the weights should have mean zero and
standard deviation σ_w = m^(−1/2).
2
• Clearly, a one dimensional map will just have a single row (or a single
column) in the computational layer.
Components of Self Organization
The self-organization process involves four major components:
• Initialization: All the connection weights are initialized with small
random values.
• Competition: For each input pattern, the neurons compute their
respective values of a discriminant function which provides the basis
for competition.
The particular neuron with the smallest value of the discriminant
function is declared the winner.
Components of Self Organization cntd.
• Cooperation: The winning neuron determines the spatial location of
a topological neighbourhood of excited neurons, thereby providing
the basis for cooperation among neighbouring neurons.
• Adaptation: The excited neurons decrease their individual values of
the discriminant function in relation to the input pattern through
suitable adjustment of the associated connection weights, such that
the response of the winning neuron to the subsequent application of
a similar input pattern is enhanced.
The Competitive Process
• If the input space is D dimensional (i.e. there are D input units) we
can write the input patterns as x = {xi : i = 1, …, D} and the connection
weights between the input units i and the neurons j in the
computation layer can be written wj = {wji : j = 1, …, N; i = 1, …, D}
where N is the total number of neurons.
• We can then define our discriminant function to be the squared
Euclidean distance between the input vector x and the weight vector
wj for each neuron j:
      dⱼ(x) = Σᵢ₌₁ᴰ (xᵢ − wⱼᵢ)²
The Competitive Process cntd.
• In other words, the neuron whose weight vector comes closest to the
input vector (i.e. is most similar to it) is declared the winner.
• In this way the continuous input space can be mapped to the discrete
output space of neurons by a simple process of competition between
the neurons.
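The competition step is just an arg-min over the squared Euclidean distances; a Python sketch (the weight values and input are made-up illustrative numbers):

```python
def find_winner(x, weights):
    """Index of the neuron whose weight vector is closest to input x
    (smallest squared Euclidean distance, the discriminant function)."""
    sq_dist = lambda w: sum((xi - wi) ** 2 for xi, wi in zip(x, w))
    return min(range(len(weights)), key=lambda j: sq_dist(weights[j]))

w = [[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]]   # three neurons, D = 2
winner = find_winner([0.8, 0.2], w)        # neuron 2 is closest
```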
The Cooperative Process
• In neurobiological studies we find that there is lateral interaction
within a set of excited neurons.
• When one neuron fires, its closest neighbours tend to get excited
more than those further away.
• There is a topological neighbourhood that decays with distance.
• We want to define a similar topological neighbourhood for the
neurons in our SOM.
• If Sᵢⱼ is the lateral distance between neurons i and j on the grid of
neurons, we take the topological neighbourhood to be the Gaussian
Tⱼ,ᵢ = exp(−S²ᵢⱼ / 2σ²), which decays with distance as required.
Simon Haykin, Neural Networks and Learning Machines, 3rd ed., Prentice Hall.
Radial basis function
• The (strict) interpolation problem states: Given a set of N
different points {xᵢ ∈ ℝᵐ | i = 1, 2, …, N} and a
corresponding set of N real numbers {dᵢ ∈ ℝ¹ | i = 1, 2, …, N},
find a function F: ℝᵐ → ℝ¹ that satisfies the
interpolation condition: F(xᵢ) = dᵢ, i = 1, 2, …, N    (5.10)
• For strict interpolation the interpolating surface (i.e., function F)
is constrained to pass through all the training data points.
• The radial-basis-functions (RBF) technique consists of choosing a
function F that has the form
      F(x) = Σᵢ₌₁ᴺ wᵢ φ(‖x − xᵢ‖)    (5.11)
• The interpolation conditions give the linear system Φw = d, where Φ is the
interpolation matrix whose ij-th element is φᵢⱼ = φ(‖xᵢ − xⱼ‖), i, j = 1, 2, …, N,
      d = [d₁, d₂, …, d_N]ᵀ, w = [w₁, w₂, …, w_N]ᵀ,
and the system has a unique solution provided Φ is nonsingular.
• There is a large class of radial-basis functions that is covered by
Micchelli’s theorem.
• It includes the following functions that are of particular interest
in the study of RBF networks.
1. Multiquadrics:
      φ(r) = (r² + c²)^(1/2), for some c > 0 and r ∈ ℝ    (5.17)
2. Inverse multiquadrics:
      φ(r) = 1 / (r² + c²)^(1/2), for some c > 0 and r ∈ ℝ    (5.18)
3. Gaussian functions:
      φ(r) = exp(−r² / 2σ²), for some σ > 0 and r ∈ ℝ    (5.19)
• For the interpolation matrix built from the above radial-basis functions
to be nonsingular, the points {xᵢ}ᵢ₌₁ᴺ must all be different (i.e., distinct).
[Figure: RBF network for XOR — inputs x1, x2 feed hidden basis functions
h1(x), h2(x); weights w1, w2, w3 (w3 on a bias unit) combine them into f(x).]
XOR truth table (bipolar):
  x1   x2    y
   1    1   -1
   1   -1    1
  -1    1    1
  -1   -1   -1
Training XOR using an RBFN, with the weights obtained by least squares (LSE) in
MATLAB:

clear;
clc;
x = [ 1  1
      1 -1
     -1  1
     -1 -1];            % training data
y = [-1 1 1 -1];        % training target
c = [0.1, -0.1];        % centres
sigma = [0.2, 0.4];     % spreads
for i = 1:4             % Gaussian basis-function outputs
    for j = 1:2
        h(i,j) = exp(-((x(i,1)-c(j))^2 + (x(i,2)-c(j))^2) / (2*sigma(j)^2));
    end
end
h = [h, ones(4,1)];     % append bias column
w = inv(h'*h)*h'*y';    % least-squares weights

Weights obtained after training:
  w1 = -1616888291.27575
  w2 = -446.694275863286
  w3 = 1.82756548365902
Adaptive Resonance Theory
(ART)
Motivation
• How can we create a machine that can act and navigate in a world
that is constantly changing?
• Such a machine would have to:
– Learn what objects are and where they are located.
– Determine how the environment has changed if need be.
– Handle unexpected events.
– Be able to learn unsupervised.
• Known as the “Stability – Plasticity Dilemma”: How can a system be
adaptive enough to handle significant events while stable enough to
handle irrelevant events?
Stability – Plasticity Dilemma
• More generally, the Stability – Plasticity Dilemma asks: How can a
system retain its previously learned knowledge while incorporating
new information?
• Real world example:
– Suppose you grew up in New York and moved to California for several
years later in life.
– Upon your return to New York, you find that familiar streets and
avenues have changed due to progress and construction.
– To arrive at your specific destination, you need to incorporate this new
information with your existing (if not outdated) knowledge of how to
navigate throughout New York.
– How would you do this?
Adaptive Resonance Theory
• Gail Carpenter and Stephen Grossberg (Boston University)
developed the Adaptive Resonance learning model to answer this
question.
• Essentially, ART (Adaptive Resonance Theory) models incorporate
new data by checking for similarity between this new data and data
already learned; “memory”.
• If there is a close enough match, the new data is learned.
• Otherwise, this new data is stored as a “new memory”.
• Some models of Adaptive Resonance Theory are:
– ART1 – Discrete input.
– ART2 – Continuous input.
– ARTMAP – Using two input vectors, transforms the unsupervised ART
model into a supervised one.
– Various others: Fuzzy ART, Fuzzy ARTMAP (FARTMAP), etc…
Competitive Learning Models
• ART Models were developed out of Competitive Learning
Models.
• Let: I = 1 … M, J = 1 … N
• XI = normalized input for node “I”
• ZIJ = weight from input node I to LTM category J
Competitive Learning Models cntd.
• Competitive Learning models follow a “winner take all” approach in
that it searches for a LTM node that will determine how ZIJ is
modified.
Competitive Learning Models cntd.
• Once the appropriate LTM node has been chosen the weight vector
is updated based on the memory contained within the “winning”
node.
• One such practice is to replace the existing weight vector with the
difference between itself and the normalized input values. That is, let:
      Z_J = the weight vector for LTM node "J",
      X = the vector representing the normalized input values of the input nodes.
• Then the new weight vector is simply Z_J(new) = X − Z_J(old).
Artificial Neural Network
• Let us solve one classification problem.
• For hand calculation and simplicity of understanding, let us take a
data set with few attributes.
• Let us finalize one neural network architecture to solve it.
• Then we will use GA for hybridization.
Dataset used
• Haberman's Survival Data Set (UCI Machine Learning Repository)
• Data Set Information:
– The dataset contains cases from a study that was conducted between
1958 and 1970 at the University of Chicago's Billings Hospital on the
survival of patients who had undergone surgery for breast cancer.
• Attribute Information:
1. Age of patient at time of operation (numerical)
2. Patient's year of operation (year - 1900, numerical)
3. Number of positive axillary nodes detected (numerical)
4. Survival status (class attribute)
1 = the patient survived 5 years or longer
2 = the patient died within 5 years
– Data Set Characteristics: Multivariate; Number of Instances: 306
– Attribute Characteristics: Integer; Number of Attributes: 3
– Associated Tasks: Classification; Missing Values: No
• Ref.: Haberman, S. J. (1976). Generalized Residuals for Log-Linear Models, Proceedings of the 9th International
Biometrics Conference, Boston, pp. 104-122.
• For implementation in neural network
– Attributes normalized between [0,1]
– Class labels modified as 1-> 0 and 2->1
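The preprocessing steps above can be sketched as follows. The column-wise min–max scheme is an assumption, since the slides say only that attributes are "normalized between [0,1]"; the label remapping (1 → 0, 2 → 1) is as stated.

```python
# Hedged sketch of the preprocessing step: column-wise min-max scaling to
# [0, 1] (an assumption; the slides do not name the normalization used)
# and remapping of class labels 1 -> 0 and 2 -> 1.

def preprocess(rows):
    """rows: list of (age, year, nodes, label) tuples from the raw data.
    Returns (scaled feature rows, remapped labels)."""
    feats = [r[:3] for r in rows]
    labels = [0 if r[3] == 1 else 1 for r in rows]
    lo = [min(c) for c in zip(*feats)]           # per-column minimum
    hi = [max(c) for c in zip(*feats)]           # per-column maximum
    scaled = [[(v - l) / (h - l) if h > l else 0.0
               for v, l, h in zip(f, lo, hi)] for f in feats]
    return scaled, labels
```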
• Let’s consider the following 4 random instances from the 306
instances of Haberman's Survival Data Set for hand calculation and
understanding of the method.

Sample    at1    at2    at3    Class label
sample1   0.23   0.45   0.02   0
sample2   0.43   0.64   0.02   1
sample3   0.23   0.09   0.04   0
sample4   0.57   0.09   0.33   1
[Figure: a 3-2-1 feed-forward network. Inputs x1, x2, x3 connect to hidden
neurons z1 and z2, which connect to output neuron y. Weights w1 and w5 are
the biases of z1 and z2; w2–w4 and w6–w8 connect the inputs to z1 and z2;
w9 is the bias of y; w10 and w11 connect z1 and z2 to y. The logistic
activation function is used in the neurons z1, z2, y.]

Samples from Haberman's Survival Data Set:

Sample    at1    at2    at3    Class label (t)
sample1   0.23   0.45   0.02   0
sample2   0.43   0.64   0.02   1
sample3   0.23   0.09   0.04   0
sample4   0.57   0.09   0.33   1

Forward pass calculations for BPN:
Z1in = w1 + x1*w2 + x2*w3 + x3*w4,   Z1o = 1/(1+exp(-Z1in))
Z2in = w5 + x1*w6 + x2*w7 + x3*w8,   Z2o = 1/(1+exp(-Z2in))
yin = w9 + Z1o*w10 + Z2o*w11,        yo = 1/(1+exp(-yin))

Example of one population using real-coded GA (each individual encodes the
11 weights of the network):

Id     w1   w2   w3   w4   w5   w6   w7   w8   w9   w10  w11
Id1    .27  .61  .17  .91  .82  .45  .37  .63  .72  .47  .29
Id2    .39  .49  .57  .72  .13  .68  .51  .25  .91  .69  .17
...
Idn    .87  .11  .63  .16  .56  .28  .71  .41  .26  .91  .85

Sample forward pass calculation for Id1 (row shown for sample 1;
err = |t - yo|):

Sample  Z1in  Z2in  Z1o   Z2o   Yin   yo    err
1       0.51  1.1   0.62  0.75  1.23  0.77  0.77
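The forward pass for individual Id1 on sample1 can be checked in code; this minimal sketch reproduces the hand-calculated row (Z1in ≈ 0.51, Z2in ≈ 1.1, yo ≈ 0.77):

```python
# Forward pass of the 3-2-1 network for individual Id1 and sample1,
# reproducing the table values (rounded to 2 decimals in the slides).
import math

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(w, x):
    """w: 11 weights [w1..w11]; x: 3 inputs [x1, x2, x3]."""
    z1in = w[0] + x[0]*w[1] + x[1]*w[2] + x[2]*w[3]   # bias w1 + weighted inputs
    z2in = w[4] + x[0]*w[5] + x[1]*w[6] + x[2]*w[7]   # bias w5 + weighted inputs
    z1o, z2o = logistic(z1in), logistic(z2in)
    yin = w[8] + z1o*w[9] + z2o*w[10]                  # bias w9 + hidden outputs
    return z1in, z2in, z1o, z2o, yin, logistic(yin)

id1 = [.27, .61, .17, .91, .82, .45, .37, .63, .72, .47, .29]
z1in, z2in, z1o, z2o, yin, yo = forward(id1, [0.23, 0.45, 0.02])
# err = |t - yo| with target t = 0 for sample1, so err equals yo here
```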
[Figure: convergence plot for Individual 1: absolute error (y-axis,
0.2 to 0.5) versus generations (x-axis, 0 to 120).]
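Putting the pieces together, the GA evaluates each individual (an 11-weight vector) by the network's absolute error over the training samples. The fitness definition below (mean absolute error over the 4 samples) is an assumption: the slides show only the weight encoding and the error-versus-generations curve, not the GA operators.

```python
# Hedged sketch of the GA fitness evaluation: an individual is the 11-weight
# vector of the network, and its fitness is the mean absolute error over the
# 4 samples (lower = fitter). Selection, crossover, and mutation are not
# shown in the slides and are omitted here.
import math

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

def net_output(w, x):
    z1 = logistic(w[0] + x[0]*w[1] + x[1]*w[2] + x[2]*w[3])
    z2 = logistic(w[4] + x[0]*w[5] + x[1]*w[6] + x[2]*w[7])
    return logistic(w[8] + z1*w[9] + z2*w[10])

def fitness(w, samples):
    """samples: list of (inputs, target); mean absolute error."""
    return sum(abs(t - net_output(w, x)) for x, t in samples) / len(samples)

samples = [([0.23, 0.45, 0.02], 0), ([0.43, 0.64, 0.02], 1),
           ([0.23, 0.09, 0.04], 0), ([0.57, 0.09, 0.33], 1)]
id1 = [.27, .61, .17, .91, .82, .45, .37, .63, .72, .47, .29]
```

Over generations, the GA keeps individuals with lower fitness values, which is what the decreasing absolute-error curve for Individual 1 illustrates.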