
AI for Robotics

Backpropagation
DR. ABHISHEK SARKAR
ASSISTANT PROFESSOR
MECHANICAL ENGG., BITS
Backpropagation
• Based on the gradient-descent algorithm

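As a sketch of what this means (assuming the usual squared-error cost, with α denoting the learning rate used later in these slides), each weight is adjusted in the direction opposite to the gradient of the error:

$$\Delta w_{jk} = -\alpha \frac{\partial E}{\partial w_{jk}}, \qquad E = \frac{1}{2}\sum_k (t_k - y_k)^2$$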
Back-Propagation Network

• This learning algorithm is applied to multi-layer feed-forward networks consisting of processing elements with continuous, differentiable activation functions.
• For a given set of training input-output pairs, this algorithm provides a procedure for changing the weights in a BPN (back-propagation network) so that the given input patterns are classified correctly.
• In this method the error at the output is propagated back to the hidden units.
• As the number of hidden layers increases, training the network becomes more complex.
Back-Propagation Network

• At the hidden layers there is no direct information about the error.
• Therefore, other techniques are needed to estimate the error at the hidden layer in a way that minimizes the output error, which is the ultimate goal.
• The training of the BPN is done in three stages:
- the feed-forward of the input training pattern,
- the calculation and back-propagation of the error, and
- updation of weights.

[Figure: Architecture of a back-propagation network. Input units x1, ..., xi, ..., xn connect to hidden units Z1, ..., Zj, ..., Zp through weights vij, with bias weights v0j from a bias unit of value 1; the hidden units connect to output units Y1, ..., Yk, ..., Ym through weights wjk, with bias weights w0k. The outputs y1, ..., ym are compared with the targets t1, ..., tm.]
Architecture

• The activation function can be any function that increases monotonically and is differentiable.
• The net input to hidden unit $z_j$ is $z_{in,j} = v_{0j} + \sum_{i=1}^{n} x_i v_{ij}$, and its output is $z_j = f(z_{in,j})$.
• Similarly, the net input to output unit $y_k$ is $y_{in,k} = w_{0k} + \sum_{j=1}^{p} z_j w_{jk}$, and its output is $y_k = f(y_{in,k})$.
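A minimal NumPy sketch of these two formulas, with a logistic (binary) sigmoid as one possible monotonic, differentiable activation; the layer sizes and input values are illustrative, not taken from the slides:

```python
import numpy as np

def sigmoid(x):
    # binary sigmoid: monotonically increasing and differentiable
    return 1.0 / (1.0 + np.exp(-x))

n, p, m = 3, 4, 2                              # illustrative layer sizes
rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 0.25])                # input pattern x_i
v, v0 = rng.normal(size=(n, p)), np.zeros(p)   # weights v_ij and biases v_0j
w, w0 = rng.normal(size=(p, m)), np.zeros(m)   # weights w_jk and biases w_0k

z_in = v0 + x @ v      # z_in,j = v_0j + sum_i x_i v_ij
z = sigmoid(z_in)      # z_j = f(z_in,j)
y_in = w0 + z @ w      # y_in,k = w_0k + sum_j z_j w_jk
y = sigmoid(y_in)      # y_k = f(y_in,k)
```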
Architecture

• δk = error-correction weight adjustment for wjk, due to an error at output unit yk, which is back-propagated to the hidden units that feed into yk
• δj = error-correction weight adjustment for vij, due to the back-propagation of error to the hidden unit zj

Training Algorithm

• The error back-propagation learning algorithm can be outlined as follows:
• Step 0: Initialize the weights and the learning rate (take some small random values).
• Step 1: Perform Steps 2-9 while the stopping condition is false.
• Step 2: Perform Steps 3-8 for each training pair.
• Step 3: Each input unit (i = 1 to n) receives the input signal xi and sends it to the hidden units.

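A minimal sketch of Step 0 in NumPy; the ±0.5 range and the learning-rate value are illustrative choices, not prescribed by the slides:

```python
import numpy as np

n, p, m = 2, 2, 1                          # illustrative layer sizes
alpha = 0.25                               # learning rate (illustrative value)

rng = np.random.default_rng(0)
v  = rng.uniform(-0.5, 0.5, size=(n, p))   # input-to-hidden weights v_ij
v0 = rng.uniform(-0.5, 0.5, size=p)        # hidden biases v_0j
w  = rng.uniform(-0.5, 0.5, size=(p, m))   # hidden-to-output weights w_jk
w0 = rng.uniform(-0.5, 0.5, size=m)        # output biases w_0k
```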
Feed-forward phase (Phase I)

• Step 4: Each hidden unit zj (j = 1 to p) sums its weighted input signals to calculate the net input:
$$z_{in,j} = v_{0j} + \sum_{i=1}^{n} x_i v_{ij}$$
• Applying its activation function over $z_{in,j}$ gives $z_j = f(z_{in,j})$; the hidden unit then sends this output signal to the inputs of the output-layer units.

Feed-forward phase (Phase I)

• Step 5: For each output unit yk (k = 1 to m), calculate the net input:
$$y_{in,k} = w_{0k} + \sum_{j=1}^{p} z_j w_{jk}$$
• Apply the activation function: $y_k = f(y_{in,k})$

Back-propagation of error (Phase II)

• Step 6: Each output unit yk (k = 1 to m) receives a target pattern corresponding to the input training pattern and computes the error-correction term
$$\delta_k = (t_k - y_k)\, f'(y_{in,k})$$
• The derivative $f'(y_{in,k})$ can be calculated using $f'(x) = \lambda f(x)\{1 - f(x)\}$ for the binary sigmoid function.
• On the basis of the calculated error-correction term, update the change in weights and bias:
$$\Delta w_{jk} = \alpha \delta_k z_j, \qquad \Delta w_{0k} = \alpha \delta_k$$
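A sketch of Step 6 in NumPy, assuming the binary sigmoid with λ = 1 so that f'(y_in) = f(y_in)(1 − f(y_in)); the numeric values are placeholders:

```python
import numpy as np

alpha = 0.25                        # learning rate (illustrative)
t    = np.array([1.0])              # target pattern t_k
y_in = np.array([0.3])              # net input to the output unit (placeholder)
z    = np.array([0.2, -0.4])        # hidden-unit outputs z_j (placeholders)

y = 1.0 / (1.0 + np.exp(-y_in))     # f(y_in) for the binary sigmoid
f_prime = y * (1.0 - y)             # f'(y_in) = f(y_in)(1 - f(y_in)), lambda = 1

delta_k = (t - y) * f_prime         # error-correction term delta_k
dw  = alpha * np.outer(z, delta_k)  # delta w_jk = alpha * delta_k * z_j
dw0 = alpha * delta_k               # delta w_0k = alpha * delta_k
```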
Back-propagation of error (Phase II)

• Step 7: Each hidden unit (zj, j = 1 to p) sums its delta inputs from the output units:
$$\delta_{in,j} = \sum_{k=1}^{m} \delta_k w_{jk}$$
• The term $\delta_{in,j}$ is multiplied by the derivative of $f(z_{in,j})$ to calculate the error term:
$$\delta_j = \delta_{in,j}\, f'(z_{in,j})$$
[The derivative $f'(z_{in,j})$ can be calculated as before.]
• On the basis of the calculated $\delta_j$, update the change in weights and bias:
$$\Delta v_{ij} = \alpha \delta_j x_i, \qquad \Delta v_{0j} = \alpha \delta_j$$
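A matching sketch of Step 7, again assuming the binary sigmoid (λ = 1) and placeholder values:

```python
import numpy as np

alpha   = 0.25                          # learning rate (illustrative)
delta_k = np.array([0.1])               # output error terms from Step 6 (placeholder)
w       = np.array([[0.4], [0.1]])      # hidden-to-output weights w_jk (placeholders)
z_in    = np.array([-0.4, 1.2])         # net inputs to the hidden units (placeholders)
x       = np.array([-1.0, 1.0])         # input pattern (placeholder)

z = 1.0 / (1.0 + np.exp(-z_in))         # f(z_in) for the binary sigmoid
delta_in = w @ delta_k                  # delta_in,j = sum_k delta_k * w_jk
delta_j = delta_in * z * (1.0 - z)      # delta_j = delta_in,j * f'(z_in,j)

dv  = alpha * np.outer(x, delta_j)      # delta v_ij = alpha * delta_j * x_i
dv0 = alpha * delta_j                   # delta v_0j = alpha * delta_j
```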
Weight and bias updation (Phase III)

• Step 8: Each output unit (yk, k = 1 to m) updates its weights and bias:
$$w_{jk}(\text{new}) = w_{jk}(\text{old}) + \Delta w_{jk}, \qquad w_{0k}(\text{new}) = w_{0k}(\text{old}) + \Delta w_{0k}$$
• Each hidden unit (zj, j = 1 to p) updates its weights and bias:
$$v_{ij}(\text{new}) = v_{ij}(\text{old}) + \Delta v_{ij}, \qquad v_{0j}(\text{new}) = v_{0j}(\text{old}) + \Delta v_{0j}$$
• Step 9: Check the stopping condition. The stopping condition may be that a certain number of epochs has been reached or that the actual output equals the target output.
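Step 8 reduces to adding the computed changes to the old values; as a tiny helper (a hypothetical name, following the array conventions of the sketches above):

```python
def apply_updates(w, w0, v, v0, dw, dw0, dv, dv0):
    """Step 8: add the weight and bias changes computed in Steps 6-7."""
    return w + dw, w0 + dw0, v + dv, v0 + dv0
```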
Weight and bias updation (Phase III)

• The above algorithm uses the incremental approach to weight updation, i.e., the weights are changed immediately after each training pattern is presented.
• There is another way of training called batch-mode training, in which the weights are changed only after all the training patterns have been presented.
• The algorithm will converge because it implements gradient descent on the error surface in the weight space; it rolls down the error surface to the nearest error minimum and stops there.
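A small self-contained sketch of the difference between the two modes, using a single-layer stand-in for the per-pattern gradient so the example stays short (all names and values are illustrative):

```python
import numpy as np

def grad_for_pattern(w, x, t):
    # stand-in for the per-pattern Step 4-7 computation (single layer, binary sigmoid)
    y = 1.0 / (1.0 + np.exp(-(x @ w)))
    return np.outer(x, (t - y) * y * (1.0 - y))

alpha = 0.25
patterns = [(np.array([0.0, 1.0]), np.array([1.0])),
            (np.array([1.0, 0.0]), np.array([0.0]))]

# incremental updating: change the weights after every pattern
w = np.zeros((2, 1))
for x, t in patterns:
    w = w + alpha * grad_for_pattern(w, x, t)

# batch-mode updating: accumulate over all patterns, then change once
w = np.zeros((2, 1))
dw = sum(grad_for_pattern(w, x, t) for x, t in patterns)
w = w + alpha * dw
```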
Example

[Figure: Example network with inputs x1 = -1 and x2 = +1, two hidden units Z1 and Z2, one output unit Y1, and target t = +1. Weights into Z1: v11 = 0.6, v21 = -0.1, bias v01 = 0.3. Weights into Z2: v12 = -0.3, v22 = 0.4, bias v02 = 0.5. Weights into Y1: w11 = 0.4, w21 = 0.1, bias w01 = -0.2. The error is formed as t - y.]
Example

• Bipolar sigmoidal activation function is used


$$f(x) = \frac{2}{1+e^{-x}} - 1 = \frac{1-e^{-x}}{1+e^{-x}}$$
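As a quick check of this activation in NumPy (the function names are illustrative):

```python
import numpy as np

def bipolar_sigmoid(x):
    # f(x) = 2 / (1 + e^-x) - 1, with outputs in (-1, 1)
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def bipolar_sigmoid_prime(x):
    # f'(x) = 0.5 * (1 + f(x)) * (1 - f(x))
    f = bipolar_sigmoid(x)
    return 0.5 * (1.0 + f) * (1.0 - f)

print(bipolar_sigmoid(-0.4))   # about -0.1974, matching Z1 on the next slide
print(bipolar_sigmoid(1.2))    # about  0.5370, matching Z2 on the next slide
```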
• The net input for Z1 in the hidden layer: $z_{in,1} = v_{01} + x_1 v_{11} + x_2 v_{21} = ?$
• The net input for Z2 in the hidden layer: $z_{in,2} = v_{02} + x_1 v_{12} + x_2 v_{22} = ?$
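Substituting the figure's values with x1 = -1 and x2 = +1 gives:
$$z_{in,1} = 0.3 + (-1)(0.6) + (1)(-0.1) = -0.4, \qquad z_{in,2} = 0.5 + (-1)(-0.3) + (1)(0.4) = 1.2$$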
Example

• Applying the activation function to calculate the outputs, we obtain
$$z_1 = f(z_{in,1}) = \frac{1-e^{0.4}}{1+e^{0.4}} = -0.1974, \qquad z_2 = f(z_{in,2}) = \frac{1-e^{-1.2}}{1+e^{-1.2}} = 0.537$$

• Calculate the net input entering the output layer: $y_{in} = w_0 + z_1 w_1 + z_2 w_2 = ?$
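Substituting z1 = -0.1974 and z2 = 0.537 together with the figure's weights gives:
$$y_{in} = -0.2 + (-0.1974)(0.4) + (0.537)(0.1) = -0.22526$$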
Example

• Applying the activation function to calculate the output, we obtain
$$y = f(y_{in}) = \frac{1-e^{0.22526}}{1+e^{0.22526}} = -0.1122$$
• Compute the error portion $\delta_k$:
$$\delta_k = (t_k - y_k)\, f'(y_{in,k}) = (t_k - y_k)\, 0.5\, (1 + f(y_{in,k}))(1 - f(y_{in,k}))$$
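With t = +1 and y = -0.1122, this evaluates to:
$$\delta_1 = (1 - (-0.1122)) \times 0.5 \times (1 + (-0.1122))(1 - (-0.1122)) = 1.1122 \times 0.5 \times 0.8878 \times 1.1122 \approx 0.5491$$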
Example

• Find the changes in weights between the hidden and output layer:
$$\Delta w_1 = \alpha \delta_1 z_1, \qquad \Delta w_2 = \alpha \delta_1 z_2, \qquad \Delta w_0 = \alpha \delta_1$$
• Next we calculate the error for the hidden layer and proceed as before...
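Substituting δ1 ≈ 0.5491, z1 = -0.1974 and z2 = 0.537 (the learning rate α is not given on these slides, so the changes are left in terms of α):
$$\Delta w_1 \approx -0.1084\,\alpha, \qquad \Delta w_2 \approx 0.2949\,\alpha, \qquad \Delta w_0 \approx 0.5491\,\alpha$$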
