Backpropagation With Example
MULTILAYER NETWORKS AND BACKPROPAGATION ALGORITHM

• A single perceptron can only express linear decision surfaces.
• In contrast, the kind of multilayer networks trained by the backpropagation algorithm are capable of expressing a rich variety of nonlinear decision surfaces.

A Differentiable Threshold Unit
• Perceptron: not differentiable -> can't use gradient descent.
• Linear unit: multiple layers of linear units -> still produce only a linear function.
[Figure: a sigmoid unit. Inputs $x_0 = 1, x_1, \ldots, x_n$ with weights $w_0, w_1, \ldots, w_n$ feed a summing node $\Sigma$, whose result passes through the sigmoid to produce the output $o$.]

$$net = \sum_{i=0}^{n} w_i x_i, \qquad o = \sigma(net) = \frac{1}{1+e^{-net}}$$

$\sigma(x)$ is the sigmoid function: $1/(1+e^{-x})$.
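As a quick sketch (not part of the original slides), this unit can be written directly in Python; `sigmoid` and `sigmoid_unit` are illustrative names of our own:

```python
import math

def sigmoid(x):
    """Logistic (sigmoid) function: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_unit(weights, inputs):
    """One sigmoid unit: o = sigmoid(sum_i w_i * x_i).

    inputs[0] should be the constant 1 so that weights[0] acts as the bias w0.
    """
    net = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(net)
```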
Backpropagation

• A neural network with two inputs, two hidden neurons, and two output neurons.
• Additionally, the hidden and output neurons will include a bias.
Inputs/outputs

• In this example, consider a single training set.
• Given inputs 0.05 and 0.10.
• Desired outputs 0.01 and 0.99.
• The goal of backpropagation is to optimize the weights so that the neural network can learn to map the given inputs to the desired outputs.
The Forward Pass

• At the beginning, initial values for the weights and biases are assumed.
• Get the inputs. Here, two inputs: 0.05 and 0.10.
• Those inputs are fed forward through the network.
• Figure out the total net input to each hidden layer neuron.
• Apply an activation function to the net input.
• In this example, the logistic function is used.
• Repeat the process with the output layer neurons.
Net input and output of the hidden layer neurons

Net input for hidden neuron 1 ($h_1$):
$$net_{h1} = w_1 \cdot i_1 + w_2 \cdot i_2 + b_1 \cdot 1$$
$$net_{h1} = 0.15 \cdot 0.05 + 0.2 \cdot 0.1 + 0.35 \cdot 1 = 0.3775$$

Apply the logistic function (sigmoid function) to get the output of $h_1$:
$$out_{h1} = \frac{1}{1+e^{-net_{h1}}} = \frac{1}{1+e^{-0.3775}} = 0.593269992$$

Follow the same process for $h_2$:
$$out_{h2} = 0.596884378$$
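These numbers can be checked with the `sigmoid` helper above. The initial values of $w_3$ and $w_4$ are not shown on these slides; 0.25 and 0.30 are our assumption, chosen because they reproduce $out_{h2} = 0.596884378$:

```python
i1, i2 = 0.05, 0.10
w1, w2 = 0.15, 0.20
w3, w4 = 0.25, 0.30   # assumed initial values; not shown on the slides
b1 = 0.35

net_h1 = w1 * i1 + w2 * i2 + b1 * 1   # 0.3775
out_h1 = sigmoid(net_h1)              # 0.593269992...
net_h2 = w3 * i1 + w4 * i2 + b1 * 1
out_h2 = sigmoid(net_h2)              # 0.596884378...
```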
Net input and output of the output layer neurons

We have:
$$out_{h1} = 0.593269992, \qquad out_{h2} = 0.596884378$$
$$net_{o1} = w_5 \cdot out_{h1} + w_6 \cdot out_{h2} + b_2 \cdot 1$$
$$net_{o1} = 0.4 \cdot 0.593269992 + 0.45 \cdot 0.596884378 + 0.6 \cdot 1 = 1.105905967$$
$$out_{o1} = \frac{1}{1+e^{-net_{o1}}} = \frac{1}{1+e^{-1.105905967}} = 0.75136507$$

Carrying out the same process for $o_2$:
$$out_{o2} = 0.772928465$$
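Continuing the sketch; as with $w_3$ and $w_4$, the initial values of $w_7$ and $w_8$ (0.50 and 0.55) are our assumption, consistent with $out_{o2}$ and with the updated weights shown later:

```python
w5, w6 = 0.40, 0.45
w7, w8 = 0.50, 0.55   # assumed initial values; not shown on the slides
b2 = 0.60

net_o1 = w5 * out_h1 + w6 * out_h2 + b2 * 1   # 1.105905967
out_o1 = sigmoid(net_o1)                      # 0.75136507...
net_o2 = w7 * out_h1 + w8 * out_h2 + b2 * 1
out_o2 = sigmoid(net_o2)                      # 0.772928465...
```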
Determine Error

• Apply the squared error function to determine the error for each output neuron.
• Add the output errors to obtain the total error ($E_{total}$):
$$E_{total} = \sum_{i=1}^{n} \frac{1}{2}(target_i - out_i)^2$$
• In some papers, the target is denoted as "ideal".
• The output is mentioned as "actual".
• The half is included so that the exponent is cancelled when we differentiate later on.
• Target output for $o_1$: $target_{o1} = 0.01$
• Neural network output for $o_1$: $out_{o1} = 0.75136507$
• Error for $o_1$:
$$E_{o1} = \frac{1}{2}(target_{o1} - out_{o1})^2 = \frac{1}{2}(0.01 - 0.75136507)^2 = 0.274811083$$
• Target output for $o_2$: $target_{o2} = 0.99$
• Neural network output for $o_2$: $out_{o2} = 0.772928465$
• Error for $o_2$: $E_{o2} = 0.023560026$
• Total error: $E_{total} = E_{o1} + E_{o2} = 0.274811083 + 0.023560026 = 0.298371109$
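In the Python sketch, the error computation is a direct transcription of the formula:

```python
target_o1, target_o2 = 0.01, 0.99

E_o1 = 0.5 * (target_o1 - out_o1) ** 2   # 0.274811083...
E_o2 = 0.5 * (target_o2 - out_o2) ** 2   # 0.023560026...
E_total = E_o1 + E_o2                    # 0.298371109...
```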
e
network as a whole.
rje
• Obtain the actual output to be closer the target output.
• To achieve the goal, update each of the weights in the
ha
network.
The • Errors will be used in the backward layers to update
Ac
Backwards weights.
• Consider 𝑤𝑤5 of the output layer to be updated using the
Pass total error (𝐸𝐸𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡 ) of the network
al
• Need differential of 𝐸𝐸𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡 with respect to 𝑤𝑤5
rim 𝜕𝜕𝐸𝐸𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡
• : the partial derivative of 𝐸𝐸𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡𝑡 with respect to
𝜕𝜕𝑤𝑤5
𝑤𝑤5
Pa
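By the chain rule, this partial derivative factors into three terms, which the next slides evaluate one at a time:

$$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial w_5}$$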
Calculate $\frac{\partial E_{total}}{\partial out_{o1}}$

Calculate the total error change with respect to the output:
$$E_{total} = \frac{1}{2}(target_{o1} - out_{o1})^2 + \frac{1}{2}(target_{o2} - out_{o2})^2$$
$$\frac{\partial E_{total}}{\partial out_{o1}} = 2 \cdot \frac{1}{2}(target_{o1} - out_{o1})^{2-1} \cdot (-1) + 0$$
$$\frac{\partial E_{total}}{\partial out_{o1}} = -(target_{o1} - out_{o1}) = -(0.01 - 0.75136507) = 0.74136507$$
The sigmoid function is used. We have $out_{o1} = \frac{1}{1+e^{-net_{o1}}}$, so
$$\frac{\partial out_{o1}}{\partial net_{o1}} = out_{o1}(1 - out_{o1}) = 0.75136507 \cdot (1 - 0.75136507) = 0.186815602$$
$$net_{o1} = w_5 \cdot out_{h1} + w_6 \cdot out_{h2} + b_2 \cdot 1$$
$$\frac{\partial net_{o1}}{\partial w_5} = out_{h1} + 0 + 0 = out_{h1} = 0.593269992$$

Putting it together:
$$\frac{\partial E_{total}}{\partial w_5} = \frac{\partial E_{total}}{\partial out_{o1}} \cdot \frac{\partial out_{o1}}{\partial net_{o1}} \cdot \frac{\partial net_{o1}}{\partial w_5}$$
$$\frac{\partial E_{total}}{\partial w_5} = -(target_{o1} - out_{o1}) \cdot out_{o1}(1 - out_{o1}) \cdot out_{h1}$$
$$\frac{\partial E_{total}}{\partial w_5} = 0.74136507 \cdot 0.186815602 \cdot 0.593269992 = 0.082167041$$
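Numerically, continuing the Python sketch:

```python
# Chain rule factors for dE_total/dw5
dE_dout_o1 = -(target_o1 - out_o1)      # 0.74136507...
dout_dnet_o1 = out_o1 * (1 - out_o1)    # 0.186815602...
dnet_dw5 = out_h1                       # 0.593269992...

dE_dw5 = dE_dout_o1 * dout_dnet_o1 * dnet_dw5   # 0.082167041...
```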
Update weights

• Assume learning rate $\eta = 0.5$.
• Update $w_5$ as follows:
$$w_5^{updated} = w_5 - \eta \cdot \frac{\partial E_{total}}{\partial w_5} = 0.4 - 0.5 \cdot 0.082167041 = 0.35891648$$
• Follow the same procedure to get the new weights $w_6$, $w_7$, $w_8$:
$$w_6^{updated} = 0.408666186, \qquad w_7^{updated} = 0.511301270, \qquad w_8^{updated} = 0.561370121$$
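The update is one line of Python per weight; $w_6$, $w_7$, $w_8$ follow the same pattern with their own gradients:

```python
eta = 0.5  # learning rate

w5_updated = w5 - eta * dE_dw5   # 0.35891648...
```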
e
rje
• Update 𝑤𝑤1 , 𝑤𝑤2 , 𝑤𝑤3 , 𝑤𝑤4 through
backward pass.
ha
• Follow the same procedure as
like as output layer
Ac
• The output of each hidden layer
neuron contributes to all
outputs.
al
• 𝑜𝑜𝑜𝑜𝑜𝑜ℎ1 affects both 𝑜𝑜𝑜𝑜𝑜𝑜1 and 𝑜𝑜𝑜𝑜𝑜𝑜2
rim
• In other words, 𝑜𝑜𝑜𝑜𝑜𝑜ℎ1 affects
both 𝐸𝐸01 and 𝐸𝐸02 .
Pa
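Concretely, the error derivative with respect to $out_{h1}$ sums the contributions from both output neurons:

$$\frac{\partial E_{total}}{\partial out_{h1}} = \frac{\partial E_{o1}}{\partial out_{h1}} + \frac{\partial E_{o2}}{\partial out_{h1}}$$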
Updated weights and results

• Updated hidden layer weights: $w_3 = 0.24975114$; $w_4 = 0.29950229$.
• After updating the weights, calculate the total error again.
• Now, the calculated total error = 0.291027924.
• The earlier iteration (or step) total error was 0.298371109.
• The improvement after a single step is small.
• Continue the process 10,000 times.
• The total error reduces to 0.0000351085.
• Result: $out_{o1} = 0.015912196$ (where $target_{o1} = 0.01$) and $out_{o2} = 0.984065734$ (where $target_{o2} = 0.99$).
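To tie the whole example together, here is a self-contained, hedged Python sketch of the full network and training loop. The initial values of $w_3$, $w_4$, $w_7$, $w_8$ are our assumptions (chosen to match the slide's numbers, as noted earlier); everything else follows the slides. Biases are left fixed, since the slides only update the weights.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Inputs and targets from the slides
i1, i2 = 0.05, 0.10
t1, t2 = 0.01, 0.99

# Initial weights and biases; w3, w4, w7, w8 are assumed (not shown on the slides)
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
eta = 0.5

for step in range(10000):
    # Forward pass
    out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
    out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
    out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
    out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)

    # Output layer deltas: dE/dnet_o = -(target - out) * out * (1 - out)
    d_o1 = -(t1 - out_o1) * out_o1 * (1 - out_o1)
    d_o2 = -(t2 - out_o2) * out_o2 * (1 - out_o2)

    # Hidden layer deltas: each hidden output feeds both output errors,
    # computed with the output weights *before* they are updated
    d_h1 = (d_o1 * w5 + d_o2 * w7) * out_h1 * (1 - out_h1)
    d_h2 = (d_o1 * w6 + d_o2 * w8) * out_h2 * (1 - out_h2)

    # Gradient descent updates: w <- w - eta * dE/dw
    w5, w6 = w5 - eta * d_o1 * out_h1, w6 - eta * d_o1 * out_h2
    w7, w8 = w7 - eta * d_o2 * out_h1, w8 - eta * d_o2 * out_h2
    w1, w2 = w1 - eta * d_h1 * i1, w2 - eta * d_h1 * i2
    w3, w4 = w3 - eta * d_h2 * i1, w4 - eta * d_h2 * i2

# Final outputs after training; expected to be near the targets 0.01 and 0.99
out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
print(sigmoid(w5 * out_h1 + w6 * out_h2 + b2),
      sigmoid(w7 * out_h1 + w8 * out_h2 + b2))
```

Under these assumed initial weights, one pass of this loop reproduces the single-step numbers worked out above (for example, $w_5$ becomes 0.35891648), and after 10,000 iterations the outputs approach the targets as reported on the final slide.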