Backpropagation With Example

The document discusses the capabilities of multilayer networks trained by the backpropagation algorithm, which can express nonlinear decision surfaces, unlike single perceptrons. It outlines the forward pass, where inputs are propagated through the network to generate outputs, and the backward pass, where errors are calculated and weights are updated to minimize those errors. The use of the sigmoid activation function and the calculation of the total error used to optimize the network's performance are also explained.


MULTILAYER NETWORKS AND BACKPROPAGATION ALGORITHM

• A single perceptron can only express linear decision surfaces.
• In contrast, the kind of multilayer networks trained by the backpropagation algorithm can express a rich variety of nonlinear decision surfaces.

A Differentiable Threshold Unit

• Perceptron: not differentiable -> cannot use gradient descent.
• Linear unit: multiple layers of linear units still produce only a linear function.
• Sigmoid unit: a smoothed, differentiable threshold function.

Sigmoid Unit

[Figure: a sigmoid unit with inputs x1, ..., xn weighted by w1, ..., wn and a constant input x0 = 1 weighted by w0, feeding a summing node followed by the sigmoid.]

net = Σ (i = 0 to n) wi * xi
o = σ(net) = 1 / (1 + e^(-net))

σ(x) is the sigmoid function: 1 / (1 + e^(-x))
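
To make the unit concrete, here is a minimal Python sketch of a single sigmoid unit. The sigmoid_unit helper is illustrative only; the sample values happen to be the ones used for hidden neuron h1 later in the example.

import math

def sigmoid_unit(inputs, weights, bias):
    # net = w0 * 1 + sum_i(wi * xi), where the bias weight w0 acts on the constant input x0 = 1
    net = bias + sum(w * x for w, x in zip(weights, inputs))
    # o = sigma(net) = 1 / (1 + e^(-net))
    return 1.0 / (1.0 + math.exp(-net))

# Using the h1 values that appear later in the example:
print(sigmoid_unit([0.05, 0.10], [0.15, 0.20], 0.35))   # ~0.593269992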


Example of Backpropagation

• A neural network with two inputs, two hidden neurons, and two output neurons.
• Additionally, the hidden and output neurons each include a bias.


Initial weights, biases, and training inputs/outputs

[Figure: network diagram showing the initial weights and biases; the values used in the calculations below are w1 = 0.15, w2 = 0.20, b1 = 0.35, w5 = 0.40, w6 = 0.45, b2 = 0.60.]

• In this example, consider a single training example.
• Given inputs: 0.05 and 0.10.
• Desired outputs: 0.01 and 0.99.
• The goal of backpropagation is to optimize the weights so that the neural network learns how to correctly map arbitrary inputs to outputs.

The Forward Pass

• Start from the given initial weights and biases.
• Take the inputs; here there are two: 0.05 and 0.10.
• Feed those inputs forward through the network.
• Compute the total net input to each hidden layer neuron.
• Apply an activation function; in this example, the logistic (sigmoid) function is used.
• Repeat the process for the output layer neurons.


Total net input for the hidden layer neurons and their outputs

Net input for hidden neuron 1 (h1):
net_h1 = w1 * i1 + w2 * i2 + b1 * 1
net_h1 = 0.15 * 0.05 + 0.2 * 0.1 + 0.35 * 1 = 0.3775

Apply the logistic (sigmoid) function to get the output of h1:
out_h1 = 1 / (1 + e^(-net_h1)) = 1 / (1 + e^(-0.3775)) = 0.593269992

Follow the same process for h2:
out_h2 = 0.596884378
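
The hidden-layer computation can be checked with a short Python sketch. Note that w3 and w4 are not quoted above; the values 0.25 and 0.30 are assumptions inferred from the updated values reported at the end of the example.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

i1, i2 = 0.05, 0.10
# w3 and w4 are assumed (consistent with the updated w3 = 0.24975114, w4 = 0.29950229 reported later)
w1, w2, w3, w4, b1 = 0.15, 0.20, 0.25, 0.30, 0.35

net_h1 = w1 * i1 + w2 * i2 + b1 * 1   # 0.3775
out_h1 = sigmoid(net_h1)              # 0.593269992

net_h2 = w3 * i1 + w4 * i2 + b1 * 1   # 0.3925
out_h2 = sigmoid(net_h2)              # 0.596884378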


The outputs of the hidden layer neurons are treated as inputs to the output layer.
Apply the same process for the output layer neurons.

We have:
out_h1 = 0.593269992
out_h2 = 0.596884378

net_o1 = w5 * out_h1 + w6 * out_h2 + b2 * 1
net_o1 = 0.4 * 0.593269992 + 0.45 * 0.596884378 + 0.6 * 1 = 1.105905967
out_o1 = 1 / (1 + e^(-net_o1)) = 1 / (1 + e^(-1.105905967)) = 0.75136507

Carrying out the same process for o2:
out_o2 = 0.772928465
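
Continuing the sketch, the output-layer pass reuses the hidden-layer outputs. Only w5, w6 and b2 appear explicitly above; w7 and w8 are assumptions inferred from their updated values reported later.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

out_h1, out_h2 = 0.593269992, 0.596884378
# w7 and w8 are assumed (consistent with the updated w7 = 0.511301270, w8 = 0.561370121 reported later)
w5, w6, w7, w8, b2 = 0.40, 0.45, 0.50, 0.55, 0.60

net_o1 = w5 * out_h1 + w6 * out_h2 + b2 * 1   # 1.105905967
out_o1 = sigmoid(net_o1)                      # 0.75136507

net_o2 = w7 * out_h1 + w8 * out_h2 + b2 * 1
out_o2 = sigmoid(net_o2)                      # 0.772928465
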
Determine Error

• Apply the squared error function to determine the error of each output neuron.
• Add the output errors to obtain the total error (E_total):

E_total = Σ (i = 1 to n) 1/2 * (target_i - out_i)^2

• In some papers, the target is denoted as the ideal value and the output as the actual value.
• The factor of one half is included so that the exponent cancels when we differentiate later on.
• The result is eventually multiplied by a learning rate anyway, so it does not matter that we introduce a constant here.

Total Error

• Target output for o1: target_o1 = 0.01
• Neural network output for o1: out_o1 = 0.75136507
• Error for o1 (E_o1):
  E_o1 = 1/2 * (target_o1 - out_o1)^2 = 1/2 * (0.01 - 0.75136507)^2 = 0.274811083
• Target output for o2: target_o2 = 0.99
• Neural network output for o2: out_o2 = 0.772928465
• Error for o2 (E_o2): E_o2 = 0.023560026
• Total error: E_total = E_o1 + E_o2 = 0.298371109 (checked in the sketch below)
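
These numbers can be reproduced in a few lines of Python, using only the targets and outputs quoted above:

target_o1, target_o2 = 0.01, 0.99
out_o1, out_o2 = 0.75136507, 0.772928465

E_o1 = 0.5 * (target_o1 - out_o1) ** 2   # 0.274811083
E_o2 = 0.5 * (target_o2 - out_o2) ** 2   # 0.023560026
E_total = E_o1 + E_o2                    # 0.298371109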


The Backward Pass

• Minimize the error for each output neuron and for the network as a whole.
• Bring the actual output closer to the target output.
• To achieve this goal, update each of the weights in the network.
• The errors are propagated back through the layers and used to update the weights.
• Consider updating w5 of the output layer using the total error (E_total) of the network.
• We need the derivative of E_total with respect to w5:
  ∂E_total/∂w5 is the partial derivative of E_total with respect to w5, or the gradient with respect to w5.

Visualization of Backward Pass

[Figure: visualization of the backward pass.]


Determine ∂E_total/∂out_o1

Calculate the change of the total error with respect to the output.

E_total = 1/2 * (target_o1 - out_o1)^2 + 1/2 * (target_o2 - out_o2)^2

∂E_total/∂out_o1 = 2 * 1/2 * (target_o1 - out_o1)^(2-1) * (-1) + 0
∂E_total/∂out_o1 = -(target_o1 - out_o1) = -(0.01 - 0.75136507) = 0.74136507


Determination of ∂E_total/∂w5

The sigmoid function is used, so out_o1 = 1 / (1 + e^(-net_o1)).

∂out_o1/∂net_o1 = out_o1 * (1 - out_o1) = 0.75136507 * (1 - 0.75136507) = 0.186815602

net_o1 = w5 * out_h1 + w6 * out_h2 + b2 * 1
∂net_o1/∂w5 = out_h1 + 0 + 0 = out_h1 = 0.593269992

By the chain rule:
∂E_total/∂w5 = ∂E_total/∂out_o1 * ∂out_o1/∂net_o1 * ∂net_o1/∂w5
∂E_total/∂w5 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1) * out_h1
∂E_total/∂w5 = 0.74136507 * 0.186815602 * 0.593269992 = 0.082167041
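
The three factors of the chain rule map one-to-one onto code. A minimal sketch using the values above (the variable names are ours):

target_o1 = 0.01
out_o1 = 0.75136507     # output of o1 from the forward pass
out_h1 = 0.593269992    # output of h1 from the forward pass

dE_dout_o1 = -(target_o1 - out_o1)        # 0.74136507
dout_o1_dnet_o1 = out_o1 * (1 - out_o1)   # 0.186815602 (sigmoid derivative)
dnet_o1_dw5 = out_h1                      # 0.593269992

# Chain rule: dE_total/dw5 = dE_total/dout_o1 * dout_o1/dnet_o1 * dnet_o1/dw5
dE_dw5 = dE_dout_o1 * dout_o1_dnet_o1 * dnet_o1_dw5   # 0.082167041
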
Update weights

• Assume a learning rate (η) = 0.5.
• Update w5 as follows (see the sketch below):
  w5_updated = w5 - η * ∂E_total/∂w5 = 0.4 - 0.5 * 0.082167041 = 0.35891648
• Follow the same procedure to get the new weights w6, w7, w8:
  w6_updated = 0.408666186
  w7_updated = 0.511301270
  w8_updated = 0.561370121
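
The update itself is a single gradient-descent step; a sketch for w5, with the same one-liner applying to w6, w7 and w8 using their own gradients:

eta = 0.5                # learning rate
w5 = 0.40
dE_dw5 = 0.082167041     # gradient computed on the previous slide

w5_updated = w5 - eta * dE_dw5   # 0.35891648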


Backward Pass for the hidden layer

• Update w1, w2, w3, w4 through the backward pass.
• Follow the same procedure as for the output layer.
• The output of each hidden layer neuron contributes to all outputs.
• out_h1 affects both out_o1 and out_o2.
• In other words, out_h1 affects both E_o1 and E_o2, so (see the sketch after this list):

∂E_total/∂out_h1 = ∂E_o1/∂out_h1 + ∂E_o2/∂out_h1
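
A sketch of how this sum plays out for w1, using quantities already computed. The pre-update w5 is used, and w7 = 0.50 is an assumption inferred from its updated value.

i1 = 0.05
out_h1, out_o1, out_o2 = 0.593269992, 0.75136507, 0.772928465
target_o1, target_o2 = 0.01, 0.99
w1, w5 = 0.15, 0.40
w7 = 0.50   # assumed (consistent with the updated w7 = 0.511301270)

# "Delta" of each output neuron: dE_o/dnet_o = -(target - out) * out * (1 - out)
delta_o1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1)   # 0.138498562
delta_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)   # about -0.038098236

# out_h1 reaches E_total through both output neurons, hence the sum
dE_dout_h1 = delta_o1 * w5 + delta_o2 * w7                 # about 0.036350306

# Chain on through h1's sigmoid and down to w1, then take the descent step
dE_dw1 = dE_dout_h1 * out_h1 * (1 - out_h1) * i1           # about 0.000438568
w1_updated = w1 - 0.5 * dE_dw1                             # 0.14978071
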
Updated weights and results

• Updated (new) weight values: w1 = 0.14978071; w2 = 0.19956143; w3 = 0.24975114; w4 = 0.29950229
• After updating the weights, calculate the total error again.
• The newly calculated total error is 0.291027924.
• The total error in the earlier iteration (step) was 0.298371109.
• The improvement after one iteration is small.
• Continue the process for 10,000 iterations.
• The total error is reduced to 0.0000351085.
• Result: out_o1 = 0.015912196 (where target_o1 = 0.01) and out_o2 = 0.984065734 (where target_o2 = 0.99).
• A sketch reproducing the full run follows.
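
Putting all of the steps together, here is a compact sketch of the whole example, training the 2-2-2 network for 10,000 iterations. The starting values of w3, w4, w7 and w8 (0.25, 0.30, 0.50, 0.55) are assumptions inferred from the updated values quoted above, and the biases are left unchanged throughout, since the slides never update them.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Training example from the slides
i1, i2 = 0.05, 0.10
target_o1, target_o2 = 0.01, 0.99

# Initial weights and biases (w3, w4, w7, w8 are assumed; see the note above)
w1, w2, w3, w4, b1 = 0.15, 0.20, 0.25, 0.30, 0.35
w5, w6, w7, w8, b2 = 0.40, 0.45, 0.50, 0.55, 0.60
eta = 0.5   # learning rate

for _ in range(10000):
    # Forward pass
    out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
    out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
    out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
    out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)

    # Output-layer deltas: dE/dnet = -(target - out) * out * (1 - out)
    d_o1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1)
    d_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)

    # Hidden-layer deltas: each hidden output feeds both output neurons
    d_h1 = (d_o1 * w5 + d_o2 * w7) * out_h1 * (1 - out_h1)
    d_h2 = (d_o1 * w6 + d_o2 * w8) * out_h2 * (1 - out_h2)

    # Gradient-descent updates (biases are not updated, matching the slides)
    w5, w6 = w5 - eta * d_o1 * out_h1, w6 - eta * d_o1 * out_h2
    w7, w8 = w7 - eta * d_o2 * out_h1, w8 - eta * d_o2 * out_h2
    w1, w2 = w1 - eta * d_h1 * i1, w2 - eta * d_h1 * i2
    w3, w4 = w3 - eta * d_h2 * i1, w4 - eta * d_h2 * i2

E_total = 0.5 * (target_o1 - out_o1) ** 2 + 0.5 * (target_o2 - out_o2) ** 2
print(out_o1, out_o2, E_total)   # roughly 0.0159, 0.9841, 3.5e-05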