Lecture 15: MLP Introduction and Backpropagation
Algorithm: Backpropagation
Multilayer Perceptron
[Figure: a multilayer perceptron with an input layer, a hidden layer, and an output layer]
Non-Linear Model: Mathematical Representation of the Sigmoid Activation Function
[Figure: a single neuron with inputs x1, …, xn, weights w1, …, wn, a summation unit, and a sigmoid activation producing the output y]

Net input (weighted sum): a = Σ_{i=1}^{n} w_i x_i
Output (sigmoid activation): y = σ(a) = 1 / (1 + e^(−a))
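As a quick check of these two formulas, here is a minimal sketch of a single sigmoid neuron in Python (the weight and input values are made up for illustration, not from the lecture):

```python
import math

def sigmoid(a):
    # y = 1 / (1 + e^(-a))
    return 1.0 / (1.0 + math.exp(-a))

def neuron_output(x, w):
    # a = sum_i w_i * x_i, then y = sigmoid(a)
    a = sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(a)

# Example with made-up values: two inputs, two weights
print(neuron_output([0.5, -1.0], [0.8, 0.2]))  # sigmoid(0.4 - 0.2) = sigmoid(0.2) ≈ 0.55
```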
Learning with hidden units
• Networks without hidden units are very limited in the input-output mappings they can model.
– More layers of linear units do not help; the composition is still linear (see the numerical check below).
– Fixed output non-linearities are not enough.
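The claim that stacked linear layers remain linear can be verified numerically; a small NumPy sketch with arbitrary random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # first "layer" of linear units
W2 = rng.normal(size=(2, 4))   # second "layer" of linear units
x = rng.normal(size=3)

# Two linear layers applied in sequence...
y_two_layers = W2 @ (W1 @ x)
# ...equal one linear layer with the combined weight matrix W2 @ W1.
y_one_layer = (W2 @ W1) @ x
print(np.allclose(y_two_layers, y_one_layer))  # True
```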
[Figure: forward and backward passes. Forward step: activations propagate from the input units x_i through the hidden units x_k (weights w_ki) to the output units y_j (weights w_jk). Backward step: error terms d_j at the outputs are propagated back to the hidden layer as d_k.]
The idea behind Backpropagation
• We don’t know what the hidden units ought to do, but
we can compute how fast the error changes as we
change a hidden activity.
– Instead of using desired activities to train the hidden units, use
error derivatives w.r.t. hidden activities.
– Each hidden activity can affect many output units and can
therefore have many separate effects on the error. These effects
must be combined.
– We can compute error derivatives for all the hidden units
efficiently.
– Once we have the error derivatives for the hidden activities, it's easy to get the error derivatives for the weights going into a hidden unit.
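A vectorized sketch of the "many separate effects must be combined" point above, assuming a single hidden layer of sigmoid units (NumPy; the function name and shapes are illustrative, not from the lecture): each hidden unit's error derivative sums the contributions it makes through every outgoing weight.

```python
import numpy as np

def hidden_deltas(W_out, delta_out, hidden_act):
    """Backpropagate output error terms to the hidden layer.

    W_out:      (n_out, n_hidden) weights from hidden to output units
    delta_out:  (n_out,) error terms at the output units
    hidden_act: (n_hidden,) sigmoid activations of the hidden units
    """
    # Each hidden unit's derivative combines its effect on every output...
    combined = W_out.T @ delta_out
    # ...scaled by the local sigmoid derivative g'(in) = a * (1 - a).
    return combined * hidden_act * (1.0 - hidden_act)
```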
Formalizing learning in MLP using Backpropagation
[Figure: three-layer network with input units k (activations a_k), hidden units j, and output units i, connected by weights W_{k,j} and W_{j,i}]

The input-to-hidden weights are updated using g', the derivative of the sigmoid (whose output lies in the range 0 to 1):

W_{k,j} ← W_{k,j} + α × a_k × Δ_j    (eq. 2)

Equations 1 and 2 are similar in nature (equation 1 is the corresponding hidden-to-output update, W_{j,i} ← W_{j,i} + α × a_j × Δ_i).

Δ_j = g'(in_j) Σ_i W_{j,i} Δ_i    (the error term at hidden unit j)
Error Computation via the Chain Rule

∂E/∂W_{k,j} = −(Y_i − a_i) ∂a_i/∂W_{k,j}
= −(Y_i − a_i) ∂g(in_i)/∂W_{k,j}
= −(Y_i − a_i) g'(in_i) ∂(in_i)/∂W_{k,j}
= −Δ_i ∂(in_i)/∂W_{k,j}    where Δ_i = (Y_i − a_i) g'(in_i)
= −Δ_i ∂(Σ_j W_{j,i} a_j)/∂W_{k,j}
= −Δ_i W_{j,i} ∂a_j/∂W_{k,j}
= −Δ_i W_{j,i} g'(in_j) ∂(in_j)/∂W_{k,j}
= −Δ_i W_{j,i} g'(in_j) ∂(Σ_k W_{k,j} a_k)/∂W_{k,j}
= −Δ_i W_{j,i} g'(in_j) a_k
= −a_k Δ_j    with Δ_j = g'(in_j) W_{j,i} Δ_i

Change in weight W_{k,j}, as per equation 2:
W_{k,j} ← W_{k,j} + α × a_k × Δ_j
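The derivation translates directly into code. A minimal scalar sketch in Python with one input unit k, one hidden unit j, and one output unit i; the variable names mirror the notation above and the numeric values are hypothetical:

```python
import math

def g(a):                      # sigmoid
    return 1.0 / (1.0 + math.exp(-a))

def g_prime_from_output(y):    # g'(in) written in terms of the activation: y * (1 - y)
    return y * (1.0 - y)

# Hypothetical values for one input unit k, one hidden unit j, one output unit i
a_k, W_kj, W_ji, Y_i, alpha = 1.0, 0.5, -0.4, 1.0, 0.1

# Forward pass
in_j = W_kj * a_k;  a_j = g(in_j)
in_i = W_ji * a_j;  a_i = g(in_i)

# Error terms, as in the derivation
delta_i = (Y_i - a_i) * g_prime_from_output(a_i)        # Δ_i = (Y_i − a_i) g'(in_i)
delta_j = g_prime_from_output(a_j) * W_ji * delta_i     # Δ_j = g'(in_j) W_{j,i} Δ_i

# Weight updates (gradient descent on E, hence the plus sign from −∂E/∂W)
W_ji += alpha * a_j * delta_i                           # eq. 1
W_kj += alpha * a_k * delta_j                           # eq. 2
```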
Back-propagation network (BPN)
Training algorithm
• Step 1: Initialize the network synaptic weights to small random values.
• Step 2: Form the set of training input/output pairs, present an input pattern, and calculate the network response.
• Step 3: Compare the desired network response with the actual output of the network and compute all the local errors.
• Step 4: Update the weights of the network.
• Step 5: Repeat Steps 2 through 4 until the network reaches a predetermined level of accuracy in producing the adequate response for all the training patterns.
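A compact sketch of Steps 1 to 5 as a NumPy training loop for a single-hidden-layer network with sigmoid units and squared error (the layer size, stopping tolerance, and omitted bias terms are assumptions for illustration, not part of the slides):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_bpn(X, Y, n_hidden=4, alpha=0.05, tol=0.01, max_epochs=10000):
    rng = np.random.default_rng(0)
    # Step 1: small random synaptic weights (bias units omitted for brevity)
    W1 = rng.uniform(-0.5, 0.5, size=(X.shape[1], n_hidden))
    W2 = rng.uniform(-0.5, 0.5, size=(n_hidden, Y.shape[1]))
    for _ in range(max_epochs):
        # Step 2: present the input patterns and compute the network response
        H = sigmoid(X @ W1)
        O = sigmoid(H @ W2)
        # Step 3: compare desired and actual outputs, compute local error terms
        delta_o = (Y - O) * O * (1 - O)
        delta_h = (delta_o @ W2.T) * H * (1 - H)
        # Step 4: update the weights
        W2 += alpha * H.T @ delta_o
        W1 += alpha * X.T @ delta_h
        # Step 5: stop once the error falls below a predetermined level
        if np.mean((Y - O) ** 2) < tol:
            break
    return W1, W2
```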
Question
Given:
• Bias units B1, B2, B3 have value 1
• Learning rate α = 0.05
• Activation: y = σ(a) = 1 / (1 + e^(−a))

[Figure: 2-2-1 network. Inputs X1 = 0 and X2 = 1 feed the hidden units Z1 and Z2, which feed the output unit O1. Weights: X1→Z1 = 0.6, X2→Z1 = −0.1, B1→Z1 = 0.3, X1→Z2 = −0.3, X2→Z2 = 0.4, B2→Z2 = 0.5, Z1→O1 = 0.4, Z2→O1 = 0.1, B3→O1 = −0.2]
Steps to solve the problem
• Feed-Forward Phase
– Calculate the net input at Z1 and Z2
– Calculate the net input at O1
– Compute the error at O1
• Back-Prop Phase
– Update the weights between the hidden and output layer
– Compute the error terms at Z1 and Z2 (used to update the input-to-hidden weights)
– Update the weights between the input and hidden layer
– Compute the final weights of the network
Feed-Forward Computation
• Net input at Z1
Z1 = 0 * 0.6 + 1 * -0.1 + 1 * 0.3 = 0.2
az1 = f ( 0.2 ) =0.5498
• Net input at Z2
Z2= -0.3 * 0 + 0.4 * 1 + 1 * 0.5 = 0.9
az2 = f(0.9) = 0.7109
• Net input at O1 (including the bias B3)
– O1 = 0.5498 * 0.4 + 0.7109 * 0.1 + 1 * -0.2 = 0.091
– ao1 = f(0.091) = 0.5227
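The same numbers can be reproduced with a few lines of Python (the weights, inputs, and bias values are the ones given in the question above):

```python
import math

def f(a):
    return 1.0 / (1.0 + math.exp(-a))

x1, x2, bias = 0.0, 1.0, 1.0

z1_in = x1 * 0.6 + x2 * (-0.1) + bias * 0.3        # 0.2
z2_in = x1 * (-0.3) + x2 * 0.4 + bias * 0.5        # 0.9
a_z1, a_z2 = f(z1_in), f(z2_in)                    # 0.5498, 0.7109

o1_in = a_z1 * 0.4 + a_z2 * 0.1 + bias * (-0.2)    # ≈ 0.091
a_o1 = f(o1_in)                                    # ≈ 0.5227

print(round(a_z1, 4), round(a_z2, 4), round(o1_in, 4), round(a_o1, 4))
```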