ANN5

The document describes the backpropagation algorithm for training a neural network. It defines the terms used, such as weights, activations, targets, and derivatives of the error function. The backpropagation algorithm computes the error terms for the output layer first, then propagates these errors back through the hidden layers to update the weights between layers in order to minimize the error. This process of forward propagation of inputs and backward propagation of errors is repeated iteratively to gradually adjust the weights.

Example 2

A single output neuron receives inputs x1 = 0.982 and x2 = 0.5 through weights w1 = 2 and w2 = 4, plus a bias weight w0 = -3.93 on a constant input of 1. The target is d = t = 1 and the learning rate is η = 0.1.

The transfer function is unipolar continuous (logsig):

    o = 1 / (1 + e^-net),    f'(net) = (1 - o) o

Delta rule for the weights:

    Δw = η (d - o)(1 - o) o x = η δ x

Forward pass:
    net = 2(0.982) + 4(0.5) - 3.93(1) = 0.034
    o = 1 / (1 + exp(-0.034)) ≈ 0.51
    Error = 1 - 0.51 = 0.49

Error term:
    δ = (1 - 0.51)(1 - 0.51)(0.51) ≈ 0.1225

Weight updates:
    Δw1 = η δ x1 = 0.1 (0.1225)(0.982) ≈ 0.012      w1 = 2 + 0.012 = 2.012
    Δw2 = η δ x2 = 0.1 (0.1225)(0.5) ≈ 0.0061       w2 = 4 + 0.0061 = 4.0061
    Δw0 = η δ (1) = 0.1 (0.1225)(1) = 0.01225       w0 = -3.93 + 0.01225 ≈ -3.9178

Check with the updated weights:
    net = 2.012(0.982) + 4.0061(0.5) - 3.9178(1) ≈ 0.061
    o = 1 / (1 + exp(-0.061)) ≈ 0.5152
    Error = 1 - 0.5152 = 0.4848
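These numbers can be checked with a few lines of NumPy. The sketch below is an illustration only (NumPy and the variable names are my own, not part of the slides); it reproduces the single-neuron update above to within rounding.

```python
# Single logistic neuron updated with the delta rule, using the Example 2 numbers.
# Illustrative sketch; not code from the original slides.
import numpy as np

def logsig(net):
    return 1.0 / (1.0 + np.exp(-net))

x = np.array([0.982, 0.5, 1.0])   # x1, x2, constant bias input
w = np.array([2.0, 4.0, -3.93])   # w1, w2, bias weight
d, eta = 1.0, 0.1                 # target and learning rate

o = logsig(w @ x)                 # ≈ 0.51 (0.5085 at full precision)
delta = (d - o) * (1 - o) * o     # ≈ 0.1225
w = w + eta * delta * x           # ≈ [2.012, 4.0061, -3.9178]

o_new = logsig(w @ x)             # ≈ 0.5152, so the error drops from ≈0.49 to ≈0.485
print(w, o_new)
```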
1 
x  d  t  y 1   0.1
0 
3
1 2

0
4 -3.93
0 5
1
1

1 -6

x1 2

4 -3.93
x2
1

A two layer network
1 
x  d 1   0.1
0 

Transfer function is unipolar continuous


1
o
1  e net
net3=u3= 3*1+4*0+1*1=4 o3=1/(1+exp(-4))=0.982
net4=u4= 6*1+5*0+-6*1=0 o4=1/(1+exp(0))=0.5
net5=u5=2*0.982+4*0.5-3.93*1=0.034 o5=1/(1+exp(-0.04))
=0.51
f ' ( net )  ( 1  o )o
w   ( d  o )( 1  o )o x  w    x  5  (1  .51)(1  .51)(.51)  .1225
δ
w53   *  5 * 0.982  0.1* .1225 * .982  0.012
w53  w53  .012  2  .012  2.012
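As with Example 2, the forward pass and the output error term can be checked numerically. The sketch below is illustrative (NumPy and the variable names are assumptions, not from the slides):

```python
# Forward pass through the two-layer network and the output error term δ5.
# Illustrative sketch; not code from the original slides.
import numpy as np

def logsig(net):
    return 1.0 / (1.0 + np.exp(-net))

x  = np.array([1.0, 0.0, 1.0])    # x1, x2, constant bias input
w3 = np.array([3.0, 4.0, 1.0])    # weights into hidden neuron 3
w4 = np.array([6.0, 5.0, -6.0])   # weights into hidden neuron 4
w5 = np.array([2.0, 4.0, -3.93])  # weights into output neuron 5 (from o3, o4, bias)

o3 = logsig(w3 @ x)                        # ≈ 0.982
o4 = logsig(w4 @ x)                        # = 0.5
o5 = logsig(w5 @ np.array([o3, o4, 1.0]))  # ≈ 0.51

d, eta = 1.0, 0.1
delta5 = (d - o5) * (1 - o5) * o5          # ≈ 0.1225 (with the slides' rounding of o5 to 0.51)
print(o3, o4, o5, delta5)
```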
Derivation of Backprop

[Figure: a layered network with an input layer, a hidden layer, and an output layer.]

Define:
    ai = activation of neuron i
    wij = synaptic weight from neuron j to neuron i
    ni = excitation of neuron i (the sum of the weighted activations coming into neuron i, before squashing) = net
    di = target output of neuron i (= ti)
    oi = output of neuron i

By definition:
    ni = ∑j wij aj
    oi = 1 / (1 + e^-ni)

Summed, squared error at the output layer:
    E = ½ ∑i (di - oi)²
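A minimal sketch of these definitions in code (illustrative only; the function and array names are assumptions, not from the slides):

```python
# The definitions above for one layer of logistic neurons.
# Illustrative sketch; not code from the original slides.
import numpy as np

def layer_forward(W, a_prev):
    """n_i = sum_j w_ij a_j ;  o_i = 1 / (1 + e^-n_i)"""
    n = W @ a_prev
    o = 1.0 / (1.0 + np.exp(-n))
    return n, o

def sse(d, o):
    """E = 1/2 * sum_i (d_i - o_i)^2"""
    return 0.5 * np.sum((d - o) ** 2)

# e.g. the hidden layer of the earlier two-layer example:
W_hidden = np.array([[3.0, 4.0, 1.0],
                     [6.0, 5.0, -6.0]])
n, o = layer_forward(W_hidden, np.array([1.0, 0.0, 1.0]))
print(n, o)   # n = [4, 0], o ≈ [0.982, 0.5]
```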
Derivation of Backprop

By the chain rule:

    ∂E/∂wij = (∂E/∂oi)(∂oi/∂ni)(∂ni/∂wij)

First term, from E = ½ ∑i (di - oi)²:

    ∂E/∂oi = ½ · 2 (di - oi)(-1) = (oi - di)

Second term:

    ∂oi/∂ni = ∂/∂ni [1 / (1 + e^-ni)]
            = -[1 / (1 + e^-ni)²](-e^-ni)
            = e^-ni / (1 + e^-ni)²
            = [(1 + e^-ni) - 1] / (1 + e^-ni) · [1 / (1 + e^-ni)]
            = [1 - 1 / (1 + e^-ni)] · [1 / (1 + e^-ni)]
            = (1 - oi) oi

Third term, since ni = ∑j wij aj:

    ∂ni/∂wij = aj
Derivation of Backprop

Combining the three terms:

    ∂E/∂wij = (oi - di) (1 - oi) oi aj

where (oi - di) is the raw error term, (1 - oi) oi is due to the sigmoid, and aj is the incoming (pre-synaptic) activation.

Gradient descent on E then gives

    Δwij = -η ∂E/∂wij        (where η is an arbitrary learning rate)

    wij(t+1) = wij(t) + η (di - oi)(1 - oi) oi aj
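A quick way to convince yourself of this result is to compare the analytic gradient (oi - di)(1 - oi) oi aj with a finite-difference estimate of ∂E/∂wij for a single output neuron. The sketch below is illustrative only; the numbers reuse the earlier example and are not part of the derivation.

```python
# Finite-difference check of dE/dw_j = (o - d)(1 - o) o a_j for one logistic output neuron.
# Illustrative sketch; not code from the original slides.
import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

def error(w, a, d):
    o = logsig(w @ a)
    return 0.5 * (d - o) ** 2

a = np.array([0.982, 0.5, 1.0])   # incoming activations (last entry is the bias input)
w = np.array([2.0, 4.0, -3.93])   # weights into the output neuron
d = 1.0                           # target

o = logsig(w @ a)
analytic = (o - d) * (1 - o) * o * a   # gradient from the derivation above

eps = 1e-6
numeric = np.array([
    (error(w + eps * np.eye(3)[j], a, d) - error(w - eps * np.eye(3)[j], a, d)) / (2 * eps)
    for j in range(3)
])

print(analytic)   # the two vectors agree to ~6 decimal places
print(numeric)
```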


Derivation of Backprop
We now need to compute the weight changes in the hidden layer, so, as before,
we write out the equation for the slope of the error function w.r.t. a
particular weight leading into the hidden layer:

    ∂E/∂wij = (∂E/∂ai)(∂ai/∂ni)(∂ni/∂wij)

(where i now corresponds to a unit in the hidden layer and j now corresponds
to a unit in the input or an earlier hidden layer)

From the previous derivation, the last two terms can simply be written down:

    ∂ai/∂ni = (1 - ai) ai

    ∂ni/∂wij = aj
Derivation of Backprop
However, the first term is more difficult to understand for this hidden
layer. It is what Minsky called the credit assignment problem, and it is
what stumped connectionists for two decades. The trick is to realize
that the hidden nodes do not themselves make errors; rather, they
contribute to the errors of the output nodes. So, the derivative of the
total error w.r.t. a hidden neuron's activation is the sum of that hidden
neuron's contributions to the errors in all of the output neurons:

    ∂E/∂ai = ∑k (∂E/∂ok)(∂ok/∂nk)(∂nk/∂ai)        (where k indexes over all output units)

The three factors are: the contribution of each output neuron (∂E/∂ok), the contribution of all the inputs to that output neuron from the hidden layer (∂ok/∂nk), and the contribution of the particular neuron in the hidden layer (∂nk/∂ai).
Derivation of Backprop
From our previous derivations, the first two terms are easy:
    ∂E/∂ok = (ok - dk)

    ∂ok/∂nk = (1 - ok) ok

For the third term, remember that nk = ∑i wki ai, and since only one member of the sum involves ai:

    ∂nk/∂ai = wki
Derivation of Backprop
Combining these terms then yields:

    ∂E/∂ai = -∑k (dk - ok)(1 - ok) ok wki = -∑k δk wki

where δk = (dk - ok)(1 - ok) ok and wki is the weight between the hidden and output layers.

And combining with the previous results yields:

    ∂E/∂wij = -(∑k δk wki)(1 - ai) ai aj = -ei (1 - ai) ai aj

where ei = ∑k δk wki is the error backpropagated to hidden neuron i. The hidden-layer weight update is therefore

    wij(t+1) = wij(t) + η (∑k δk wki)(1 - ai) ai aj = wij(t) + η δi aj,   with δi = (∑k δk wki)(1 - ai) ai
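In vector form this says: backpropagate the output δ's through the hidden-to-output weights, then multiply by the sigmoid derivative at each hidden neuron. A small sketch (illustrative only, reusing the numbers of the two-layer example; not code from the slides) is:

```python
# Hidden-layer error terms δi = (Σk δk wki)(1 - ai) ai, computed from the output δ.
# Illustrative sketch; not code from the original slides.
import numpy as np

a_hidden  = np.array([0.982, 0.5])   # o3, o4
delta_out = np.array([0.1225])       # δ5, the single output neuron's error term
W_out     = np.array([[2.0, 4.0]])   # w53, w54: weights from the hidden neurons to neuron 5

e_hidden = W_out.T @ delta_out                       # Σk δk wki for each hidden neuron
delta_hidden = e_hidden * (1 - a_hidden) * a_hidden  # δ3 and δ4

print(delta_hidden)   # ≈ [0.0043, 0.1225], the values used in the worked updates below
```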
Forward Propagation of Activity
• Forward Direction layer by layer:
– Inputs applied
– Multiplied by weights
– Summed
– ‘Squashed’ by sigmoid activation function
– Output passed to each neuron in next layer
• Repeat above until network output produced

Back-propagation of error

• Compute error (delta or local gradient) for each output unit


• Layer-by-layer, compute error (delta or local gradient) for each
hidden unit by backpropagating errors (as shown previously)

The weights can then be updated using the Generalised Delta Rule (GDR), also
known as the Back-Propagation (BP) algorithm.
For an output neuron:

    wij(t+1) = wij(t) + η (di - oi)(1 - oi) oi aj,        with δi = (di - oi)(1 - oi) oi

For a hidden neuron:

    wij(t+1) = wij(t) + η (∑k δk wki)(1 - ai) ai aj,      with δi = (∑k δk wki)(1 - ai) ai

where δk = (dk - ok)(1 - ok) ok is the error term of output neuron k.
The chain rule does the following: it distributes the error of an output unit o to all
the hidden units that it is connected to, weighted by this connection. Put differently,
a hidden unit h receives a delta from each output unit o equal to the delta of that
output unit weighted with (i.e., multiplied by) the weight of the connection between
those units.
Algorithm (Backpropagation)

Start with random weights
while error is unsatisfactory
    for each input pattern
        compute hidden node input (net)
        compute hidden node output (o)
        compute input to output node (net)
        compute network output (o)
        modify output layer weights:
            wij(t+1) = wij(t) + η (di - oi)(1 - oi) oi aj
        modify hidden layer weights:
            wij(t+1) = wij(t) + η (∑k δk wki)(1 - ai) ai aj,   where δk = (dk - ok)(1 - ok) ok
    end
end
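The loop above can be written out in NumPy for a network with one hidden layer of logistic units. The sketch below is an illustration, not code from the slides: the network size, the XOR training patterns, the learning rate, and the stopping threshold are all assumptions.

```python
# Sketch of the backpropagation loop above for a single hidden layer of logistic units.
# Illustrative only: network size, training data, learning rate and stopping rule are
# assumptions, not taken from the slides.
import numpy as np

rng = np.random.default_rng(0)

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

# Toy training set: XOR patterns; the third input column is a constant bias of 1.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)

eta = 0.5
W_h = rng.uniform(-1, 1, size=(2, 3))   # hidden weights: 2 hidden units x (2 inputs + bias)
W_o = rng.uniform(-1, 1, size=(1, 3))   # output weights: 1 output x (2 hidden units + bias)

for epoch in range(20000):               # "while error is unsatisfactory"
    total_error = 0.0
    for x, d in zip(X, D):               # "for each input pattern"
        a_h = logsig(W_h @ x)            # hidden node net inputs and outputs
        a_hb = np.append(a_h, 1.0)       # append the bias input for the output layer
        o = logsig(W_o @ a_hb)           # network output

        delta_o = (d - o) * (1 - o) * o                        # output error terms
        delta_h = (W_o[:, :2].T @ delta_o) * (1 - a_h) * a_h   # hidden error terms
                                                               # (bias weight not backpropagated)
        W_o += eta * np.outer(delta_o, a_hb)   # modify output layer weights
        W_h += eta * np.outer(delta_h, x)      # modify hidden layer weights

        total_error += 0.5 * np.sum((d - o) ** 2)
    if total_error < 1e-3:               # stopping rule (an assumption)
        break

print(epoch, total_error)   # with an unlucky initialization XOR may need more units/epochs
```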
Applying w_new = w_old + η δ a to the weights of the two-layer network (δ5 ≈ 0.1225 from above; for the hidden neurons, δ4 = (δ5 w54)(1 - o4) o4 = 0.1225 · 4 · 0.25 ≈ 0.1225 and δ3 = (δ5 w53)(1 - o3) o3 = 0.1225 · 2 · 0.0177 ≈ 0.0043):

    Δw50 = η δ5 (1) = 0.1 (0.1225)(1) = 0.01225       w50 = -3.93 + 0.01225 ≈ -3.9178
    Δw53 = η δ5 o3 = 0.1 (0.1225)(0.982) ≈ 0.012      w53 = 2 + 0.012 = 2.012
    Δw41 = η δ4 x1 = 0.1 (0.1225)(1) = 0.01225        w41 = 6 + 0.01225 = 6.01225
    Δw31 = η δ3 x1 = 0.1 (0.0043)(1) = 0.00043        w31 = 3 + 0.00043 = 3.00043
Verification that it works

With the updated weights the new output is o5 ≈ 0.5239, so the new error is 1 - 0.5239 = 0.476: the error has been reduced by 0.014 (from 0.490 to 0.476).
Update the weights of the multi-layer network using the backpropagation algorithm. The transfer functions of the neurons are tansig functions. The target outputs are y2* = 1 and y3* = 0.5. The learning rate is 0.5. Show that with the updated weights there is a reduction in the total error.

Homework: bring the solution tomorrow.


