Lecture 15: MLP (continued)
[Network diagram] The example network: two inputs x1, x2, two hidden units, and two outputs y1, y2.
First-layer weights: v11 = -1, v21 = 0, v12 = 0, v22 = 1; bias weights v10 = 1, v20 = 1.
Second-layer weights: w11 = 1, w21 = -1, w12 = 0, w22 = 1.
[Network diagram] The input vector x1 = 0, x2 = 1 is applied to the network (weights as above).
[Network diagram] First-layer activations (weighted sums): u1 = 1, u2 = 2.
u1 = v11·x1 + v12·x2 + v10 = (-1)(0) + (0)(1) + 1 = 1
u2 = v21·x1 + v22·x2 + v20 = (0)(0) + (1)(1) + 1 = 2
Calculate the first-layer outputs by passing the activations through the activation functions:
[Network diagram] First-layer outputs: z1 = 1, z2 = 2.
z1 = g(u1) = 1
z2 = g(u2) = 2
(The activation g is linear in this example, so g(u) = u and g' = 1.)
Calculate the second-layer outputs (weighted sums passed through the activation functions):
[Network diagram] Second-layer outputs: y1 = 2, y2 = 2.
y1 = a1 = w11·z1 + w12·z2 + w10 = (1)(1) + (0)(2) + 1 = 2
y2 = a2 = w21·z1 + w22·z2 + w20 = (-1)(1) + (1)(2) + 1 = 2
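A minimal NumPy sketch of this forward pass; the array names V, W, v0, w0 are labels chosen here, the hidden and output units are taken as linear, and the output biases w10 = w20 = 1 are read off from the "+1" terms above:

```python
import numpy as np

# Weights of the worked example (first index = unit, second index = input).
x  = np.array([0.0, 1.0])            # input vector (x1, x2)
V  = np.array([[-1.0, 0.0],          # v11, v12
               [ 0.0, 1.0]])         # v21, v22
v0 = np.array([1.0, 1.0])            # hidden biases v10, v20
W  = np.array([[ 1.0, 0.0],          # w11, w12
               [-1.0, 1.0]])         # w21, w22
w0 = np.array([1.0, 1.0])            # output biases (the "+1" terms)

u = V @ x + v0                       # first-layer activations -> [1, 2]
z = u                                # linear activation: z = g(u) = u
a = W @ z + w0                       # second-layer activations -> [2, 2]
y = a                                # linear outputs: y1 = y2 = 2
print(u, z, y)
```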
Backward pass:
wij(t+1) = wij(t) + η·Δi(t)·zj(t)
where Δi(t) = (di(t) - yi(t))·g'(ai(t))
[Network diagram] Output errors: Δ1 = -1, Δ2 = -2 (consistent with targets d1 = 1, d2 = 0 and linear output units, so g' = 1).
[Network diagram] The products Δi·zj (with z1 = 1, z2 = 2): Δ1·z1 = -1, Δ1·z2 = -2, Δ2·z1 = -2, Δ2·z2 = -4.
Weight changes will be:
wij(t+1) = wij(t) + η·Δi(t)·zj(t), with learning rate η = 0.1 (the value implied by the updated weights below)
[Network diagram] Updated second-layer weights: w11 = 0.9, w21 = -1.2, w12 = -0.2, w22 = 0.6.
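A sketch of this output-layer update; η = 0.1 is the value implied by the new weights, not stated explicitly in the notes:

```python
import numpy as np

eta   = 0.1
z     = np.array([1.0, 2.0])          # hidden outputs from the forward pass
Delta = np.array([-1.0, -2.0])        # output errors Delta1, Delta2
W     = np.array([[ 1.0, 0.0],        # old second-layer weights
                  [-1.0, 1.0]])

W_new = W + eta * np.outer(Delta, z)  # w_ij += eta * Delta_i * z_j
print(W_new)                          # [[ 0.9 -0.2]
                                      #  [-1.2  0.6]]
```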
But first we must calculate the hidden-unit δ's:
[Network diagram] The products Δk·wki (with Δ1 = -1, Δ2 = -2): Δ1·w11 = -1, Δ2·w21 = 2, Δ1·w12 = 0, Δ2·w22 = -2.
The Δ's propagate back:
[Network diagram] Hidden-unit deltas: δ1 = 1, δ2 = -2.
δ1 = Δ1·w11 + Δ2·w21 = -1 + 2 = 1
δ2 = Δ1·w12 + Δ2·w22 = 0 - 2 = -2
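A sketch of this backpropagation step; note that the pre-update weights w are the ones used, as in the arithmetic above, and g' = 1 because the hidden units are linear:

```python
import numpy as np

W     = np.array([[ 1.0, 0.0],    # old second-layer weights w11, w12 / w21, w22
                  [-1.0, 1.0]])
Delta = np.array([-1.0, -2.0])    # output errors

delta = W.T @ Delta               # delta_j = sum_i Delta_i * w_ij (times g' = 1)
print(delta)                      # [ 1. -2.]
```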
These δ's are then multiplied by the inputs:
vij(t+1) = vij(t) + η·δi(t)·xj(t)
[Network diagram] The products δi·xj (with x1 = 0, x2 = 1, δ1 = 1, δ2 = -2): δ1·x1 = 0, δ1·x2 = 1, δ2·x1 = 0, δ2·x2 = -2.
Finally change weights:
vij(t+1) = vij(t) + η·δi(t)·xj(t)
[Network diagram] Updated first-layer weights: v11 = -1 and v21 = 0 (unchanged, since x1 = 0), v12 = 0.1, v22 = 0.8; the second-layer weights are as updated above (w11 = 0.9, w21 = -1.2, w12 = -0.2, w22 = 0.6).
Now go forward again (would normally use a new input vector):
[Network diagram] With the updated weights, and the bias weights updated in the same way (v10 = 1.1, v20 = 0.8), the first-layer outputs become z1 = 1.2, z2 = 1.6.
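A sketch of the first-layer update and the repeated forward pass; the bias updates (giving v10 = 1.1, v20 = 0.8) are inferred here because they are needed to reproduce z1 = 1.2 and z2 = 1.6:

```python
import numpy as np

eta   = 0.1
x     = np.array([0.0, 1.0])
delta = np.array([1.0, -2.0])            # hidden-unit deltas from above
V     = np.array([[-1.0, 0.0],           # old first-layer weights
                  [ 0.0, 1.0]])
v0    = np.array([1.0, 1.0])             # old bias weights

V_new  = V + eta * np.outer(delta, x)    # -> [[-1.0, 0.1], [0.0, 0.8]]
v0_new = v0 + eta * delta * 1.0          # bias treated as a weight from a constant input of 1

z_new = V_new @ x + v0_new               # next forward pass with the same input
print(z_new)                             # [1.2 1.6]
```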
Here g'(ai(t)) = dg(a)/da is the derivative of the activation function.
A common choice is the sigmoidal (logistic) function g(a) = 1 / (1 + exp(-k·a)), where k is a positive constant; it gives a value in the range 0 to 1. Alternatively, tanh(k·a) can be used, which has the same shape but a range of -1 to 1.
[Figure: input-output function of a neuron (rate-coding assumption). Note: when net = 0, f = 0.5.]
The derivative of the sigmoidal function is
g'(ai(t)) = k·exp(-k·ai(t)) / [1 + exp(-k·ai(t))]² = k·g(ai(t))·[1 - g(ai(t))]
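A quick numerical check of this identity (k = 1 here; the helper names are illustrative):

```python
import numpy as np

def g(a, k=1.0):
    return 1.0 / (1.0 + np.exp(-k * a))       # logistic sigmoid

def g_prime(a, k=1.0):
    return k * g(a, k) * (1.0 - g(a, k))      # k * g * (1 - g)

a = np.linspace(-4.0, 4.0, 9)
numeric = (g(a + 1e-6) - g(a - 1e-6)) / 2e-6  # central-difference derivative
print(np.allclose(numeric, g_prime(a), atol=1e-6))  # True
print(g(0.0))                                 # 0.5, as noted for net = 0
```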
Sequential mode:
• Less storage for each weighted connection
• Random order of presentation and updating per pattern makes the search of weight space stochastic, reducing the risk of getting stuck in local minima
• Able to take advantage of any redundancy in the training set (i.e. the same pattern occurring more than once), especially for large, difficult training sets
• Simpler to implement
Batch mode:
• Faster learning than sequential mode
• Easier from theoretical viewpoint
• Easier to parallelise
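A minimal sketch contrasting the two modes for one epoch; gradient_fn, the toy patterns, and the learning rate below are illustrative, not from the notes:

```python
import numpy as np

def sequential_epoch(w, patterns, gradient_fn, eta, rng):
    # Update after every pattern, presenting the patterns in random order.
    for idx in rng.permutation(len(patterns)):
        w = w - eta * gradient_fn(w, [patterns[idx]])
    return w

def batch_epoch(w, patterns, gradient_fn, eta):
    # Accumulate the gradient over the whole training set, then update once.
    return w - eta * gradient_fn(w, patterns)

# Toy usage: minimise the sum of (w - p)^2 over the patterns.
def grad(w, pats):
    return np.sum([2.0 * (w - p) for p in pats], axis=0)

patterns = [np.array([1.0]), np.array([3.0]), np.array([2.0])]
rng = np.random.default_rng(0)
print(sequential_epoch(np.zeros(1), patterns, grad, 0.1, rng))
print(batch_epoch(np.zeros(1), patterns, grad, 0.1))
```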
Dynamics of BP learning
E(t) = ½ · Σ_{k=1..p} (dk(t) - Ok(t))²
The idea is to reduce E: gradient descent moves downhill along the valleys of the error surface.
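For the single pattern of the worked example, assuming the targets d = (1, 0) implied by the output errors Δ1 = -1 and Δ2 = -2 above:

```python
import numpy as np

d = np.array([1.0, 0.0])          # targets (inferred from the Deltas above)
O = np.array([2.0, 2.0])          # network outputs y1, y2
E = 0.5 * np.sum((d - O) ** 2)
print(E)                          # 2.5
```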
Selecting initial weight values
E(F) = Es(F) + λ·ER(F)
where Es(F) is the standard (data) error term and ER(F) is a regularization term penalising the complexity of the fit, e.g. a smoothness penalty on the derivatives dy/dx.
[Figure: fit without regularization vs. with regularization.]
Momentum
E = Σ_{i=1..p} Σ_{j=1..M} [dj(n) - yj(n)]i²
(the squared error summed over all p training patterns i and all M output units j)
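A minimal sketch of the usual momentum update, Δw(t) = -η·∇E + α·Δw(t-1), with α the momentum coefficient; this form and the values below are standard assumptions rather than taken from the notes:

```python
import numpy as np

def momentum_update(w, grad, prev_dw, eta=0.1, alpha=0.9):
    # Carry the previous weight change forward, scaled by alpha.
    dw = -eta * grad + alpha * prev_dw
    return w + dw, dw

# Toy usage with a fixed gradient: the step size builds up over iterations.
w, dw = np.zeros(2), np.zeros(2)
g = np.array([1.0, -2.0])
for _ in range(3):
    w, dw = momentum_update(w, g, dw)
print(w)
```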
Hold-out method
The simplest method when data is not scarce.
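A minimal sketch of a hold-out split; the 80/20 fraction, the seed, and the helper name are illustrative choices:

```python
import numpy as np

def hold_out_split(X, d, val_fraction=0.2, seed=0):
    # Reserve a random fraction of the data for validation; never train on it.
    idx = np.random.default_rng(seed).permutation(len(X))
    n_val = int(len(X) * val_fraction)
    val, train = idx[:n_val], idx[n_val:]
    return X[train], d[train], X[val], d[val]

X = np.arange(20, dtype=float).reshape(10, 2)
d = np.arange(10, dtype=float)
X_tr, d_tr, X_val, d_val = hold_out_split(X, d)
print(len(X_tr), len(X_val))   # 8 2
```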
where H(x1, ..., xm) = Σ_{i=1..k} ai · f( Σ_{j=1..m} wij·xj + bi )
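A sketch evaluating this form of H, a single hidden layer of k units with activation f and a linear output; the activation choice and all the numbers below are illustrative:

```python
import numpy as np

def H(x, a, W, b, f=np.tanh):
    # H(x) = sum_i a_i * f( sum_j w_ij * x_j + b_i )
    return a @ f(W @ x + b)

x = np.array([0.5, -1.0])            # m = 2 inputs
W = np.array([[ 1.0, 0.0],           # k = 3 hidden units
              [-1.0, 2.0],
              [ 0.5, 0.5]])
b = np.array([0.0, 1.0, -0.5])
a = np.array([1.0, -2.0, 0.3])
print(H(x, a, W, b))
```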