Backpropagation Algorithm
How Backpropagation Works
We have this input data and the corresponding desired outputs:

Inputs
Feature 1    Feature 2
0.5          -0.5
0.3          0.4
0.7          0.9

Desired Outputs (targets)
d1     d2
0.9    0.1
0.9    0.9
0.1    0.1
Consider this neural network (example taken from: Neural Networks, A Classroom Approach by Satish Kumar).

[Figure: a 2-2-2 network with an input layer, hidden layer, and output layer.
Inputs: X1 = 0.5, X2 = -0.5. Desired outputs: d1 = 0.9, d2 = 0.1.
Input-to-hidden weights: X1 -> h1 = 0.1, X2 -> h1 = -0.2 (bias 0.01); X1 -> h2 = 0.3, X2 -> h2 = 0.55 (bias -0.02).
Hidden-to-output weights: h1 -> o1 = 0.37, h2 -> o1 = 0.9 (bias 0.31); h1 -> o2 = -0.22, h2 -> o2 = -0.12 (bias 0.27).]
Let's start by moving forward.

The net value is the total input coming into a neuron. For the first hidden neuron:

z1 = x1(0.1) + x2(-0.2) + bias
z1 = 0.5(0.1) + (-0.5)(-0.2) + 0.01
z1 = 0.16
Net value of the second neuron in the hidden layer:

z2 = x1(0.3) + x2(0.55) + bias
z2 = 0.5(0.3) + (-0.5)(0.55) + (-0.02)
z2 = -0.145
Applying the sigmoid activation δ(z) = 1 / (1 + e^(-z)) to the net values:

δ(z1) = 1 / (1 + e^(-0.16)) = 0.5399
δ(z2) = 1 / (1 + e^(0.145)) = 0.4638
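As a quick check of this forward step, here is a minimal Python sketch (the variable and function names are mine, not from the slides) that reproduces the two net values and their activations:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# First training sample and the weights/biases from the diagram
x1, x2 = 0.5, -0.5
z1 = x1 * 0.1 + x2 * (-0.2) + 0.01    # net input of hidden neuron 1 -> 0.16
z2 = x1 * 0.3 + x2 * 0.55 + (-0.02)   # net input of hidden neuron 2 -> -0.145

a1, a2 = sigmoid(z1), sigmoid(z2)     # hidden activations -> ~0.5399, ~0.4638
print(round(z1, 3), round(z2, 3), round(a1, 4), round(a2, 4))
```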
Let's continue with the output neurons. The hidden neurons' outputs now become the inputs to the next layer.

y1 = 0.5399(0.37) + 0.4638(0.9) + 0.31
y1 = 0.9271

Similarly,

y2 = 0.5399(-0.22) + 0.4638(-0.12) + 0.27
y2 = 0.0956

Applying the sigmoid activation to these net values gives the network's outputs:

δ(y1) = 0.7164
δ(y2) = 0.5238
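Extending the sketch through the output layer (again, the variable names are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hidden activations from the previous step
h1, h2 = 0.5399, 0.4638

# Net inputs of the two output neurons
y1 = h1 * 0.37 + h2 * 0.9 + 0.31          # -> ~0.9272
y2 = h1 * (-0.22) + h2 * (-0.12) + 0.27   # -> ~0.0956

out1, out2 = sigmoid(y1), sigmoid(y2)
print(round(out1, 4), round(out2, 4))     # ~0.7165, ~0.5239 (the slides round to 0.7164, 0.5238)
```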
• Gradient descent is the base algorithm used to minimize the error with respect to the weights of the neural network. The learning rate determines the step size of each update on the way to the minimum.
• An epoch is one complete pass through all the samples.

https://www.learnopencv.com/understanding-activation-functions-in-deep-learning/
https://sebastianraschka.com/faq/docs/closed-form-vs-gd.html
The Backpropagation

Remember, our objective is to minimize the error by changing the weights.

Positive slope: when we increase w, the loss increases, so the update -(+) is negative and the weight decreases (moving left).

Weight Update Rule:

w ← w − η (dE/dw)

(old weight, minus the learning rate η times the gradient dE/dw; the negative sign moves the weight against the slope)

https://towardsdatascience.com/gradient-descent-in-a-nutshell-eaf8c18212f0
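A tiny sketch of this rule in Python (the learning-rate value here is only illustrative):

```python
def gradient_descent_step(w, dE_dw, lr=0.1):
    """One gradient-descent update: move the weight against the slope."""
    return w - lr * dE_dw

# Positive slope -> weight decreases; negative slope -> weight increases
print(gradient_descent_step(0.37, +0.5))   # ~0.32
print(gradient_descent_step(0.37, -0.5))   # ~0.42
```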
Local Minimum and Global Minimum

[Figure: a curve f(x) illustrating convex vs. non-convex optimization, with local and global minima.]
For a nested function, the derivative cannot be taken directly with respect to the inner variable; we use the chain rule. For example:

y = z + 2, i.e. y = f(z)
z = w + 4, i.e. z = g(w)

dy/dw = (dy/dz) · (dz/dw)
What should be done is to chain the partial derivatives through the neuron:

dE/dw = (dE/da) · (da/dz) · (dz/dw)

where w is the weight, z the net input, a the activation, and E the error.
More Complex

For a deeper chain x → z1 → a1 → z2 → a2 → E, with weights w1 and w2:

dE/dw1 = (dE/da2) · (da2/dz2) · (dz2/da1) · (da1/dz1) · (dz1/dw1)
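To make this concrete, here is a small, purely illustrative sketch (a 1-1-1 sigmoid chain with made-up numbers, not the example network) that multiplies the local derivatives and checks the result against a finite-difference estimate:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w1, w2, x=0.5, d=0.9):
    """Error E for a tiny chain x -> z1 -> a1 -> z2 -> a2 -> E."""
    a1 = sigmoid(w1 * x)
    a2 = sigmoid(w2 * a1)
    return 0.5 * (d - a2) ** 2

w1, w2, x, d = 0.3, -0.2, 0.5, 0.9

# Forward pass, keeping the intermediate values
z1 = w1 * x;   a1 = sigmoid(z1)
z2 = w2 * a1;  a2 = sigmoid(z2)

# Chain rule: dE/dw1 = dE/da2 * da2/dz2 * dz2/da1 * da1/dz1 * dz1/dw1
dE_dw1 = (-(d - a2)) * (a2 * (1 - a2)) * w2 * (a1 * (1 - a1)) * x

# Finite-difference check: the two numbers agree
eps = 1e-6
numeric = (loss(w1 + eps, w2) - loss(w1 - eps, w2)) / (2 * eps)
print(dE_dw1, numeric)
```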
Consider these neurons to work with.

[Figure: the same 2-2-2 network as above, with inputs X1 = 0.5, X2 = -0.5 and desired outputs d1 = 0.9, d2 = 0.1.]
Adjusting the weight of the output neuron

[Figure: the same network again.] We start with the weight 0.37 connecting the first hidden neuron to the first output neuron.
By the chain rule, the gradient of the error with respect to this weight has three factors.

How much is the error changing with respect to the output? With the squared error (d = expected, δ(y) = actual)

E = ½ [(d1 − δ(y1))² + (d2 − δ(y2))²]

∂E/∂(δ(y1)) = −(d1 − δ(y1)) = −(0.9 − 0.7164) = −0.1836

How much is the output changing with respect to its net input? This is the sigmoid derivative:

∂(δ(y1))/∂y1 = δ(y1)[1 − δ(y1)] = 0.7164(1 − 0.7164) = 0.2031

How much is the net input changing with respect to the weight?

∂y1/∂w = δ(z1) = 0.5399

All together:

∂E/∂w = (−0.1836)(0.2031)(0.5399) = −0.0201
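The same three factors in a short Python sketch (variable names are mine), reproducing the −0.0201 gradient:

```python
# Values from the forward pass above
d1, out1 = 0.9, 0.7164        # target and actual output of output neuron 1
a1 = 0.5399                   # activation of hidden neuron 1, i.e. delta(z1)

dE_dout1  = -(d1 - out1)      # -0.1836  (error w.r.t. the output)
dout1_dy1 = out1 * (1 - out1) #  0.2031  (sigmoid derivative)
dy1_dw    = a1                #  0.5399  (net input w.r.t. the weight)

dE_dw = dE_dout1 * dout1_dy1 * dy1_dw
print(round(dE_dw, 4))        # about -0.0201
```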
Weight update for this output-neuron weight, using the update rule:

w_new = w − η(∂E/∂w) = 0.37 − η(−0.0201)
Now move back one layer, to the weight w1 = 0.1 connecting x1 to the first hidden neuron. Its gradient again has three factors:

∂E/∂w1 = ∂E/∂(δ(z1)) · ∂(δ(z1))/∂z1 · ∂z1/∂w1

How much is the hidden neuron's net input changing with respect to the weight?

∂z1/∂w1 = x1 = 0.5

How much is the hidden output changing with respect to its net input? With δ(z1) = 1 / (1 + e^(−z1)):

∂(δ(z1))/∂z1 = δ(z1)[1 − δ(z1)] = 0.5399(1 − 0.5399) = 0.2484
The remaining factor, ∂E/∂(δ(z1)), is where the sum over output neurons comes in: the hidden output δ(z1) feeds both output neurons, so both contribute to the error. In our case, p = 2 output neurons:

∂E/∂(δ(z1)) = [∂E/∂(δ(y1)) · ∂(δ(y1))/∂y1 · ∂y1/∂(δ(z1))] + [∂E/∂(δ(y2)) · ∂(δ(y2))/∂y2 · ∂y2/∂(δ(z1))]

For the first output neuron we already have ∂E/∂(δ(y1)) = −0.1836 and ∂(δ(y1))/∂y1 = 0.2031. Since

y1 = δ(z1)(0.37) + δ(z2)(0.9) + 0.31 = 0.5399(0.37) + 0.4638(0.9) + 0.31,

the net input y1 changes with δ(z1) by the connecting weight: ∂y1/∂(δ(z1)) = 0.37.

For the second output neuron, from E = ½ [(d1 − δ(y1))² + (d2 − δ(y2))²]:

∂E/∂(δ(y2)) = −(d2 − δ(y2)) = −(0.1 − 0.5238) = 0.4238
∂(δ(y2))/∂y2 = δ(y2)[1 − δ(y2)] = 0.5238(1 − 0.5238) = 0.2494

and since y2 = 0.5399(−0.22) + 0.4638(−0.12) + 0.27, we have ∂y2/∂(δ(z1)) = −0.22.

Putting both contributions together:

∂E/∂(δ(z1)) = (−0.1836)(0.2031)(0.37) + (0.4238)(0.2494)(−0.22) = −0.0370

All three factors together:

∂E/∂w1 = (−0.0370)(0.2484)(0.5) = −0.0045954
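The whole hidden-layer chain as a sketch (my own variable names), reproducing the gradient of about −0.0046:

```python
# Values carried over from the forward pass
x1 = 0.5                          # input feeding the weight being updated
a1 = 0.5399                       # delta(z1), activation of hidden neuron 1
out1, out2 = 0.7164, 0.5238       # actual outputs
d1, d2 = 0.9, 0.1                 # targets
w_h1_o1, w_h1_o2 = 0.37, -0.22    # weights from hidden neuron 1 to the two outputs

# Error signal reaching delta(z1), summed over the p = 2 output neurons
dE_da1 = (-(d1 - out1)) * (out1 * (1 - out1)) * w_h1_o1 \
       + (-(d2 - out2)) * (out2 * (1 - out2)) * w_h1_o2   # about -0.0370

# Local factors at the hidden neuron
da1_dz1 = a1 * (1 - a1)           # about 0.2484
dz1_dw1 = x1                      # 0.5

dE_dw1 = dE_da1 * da1_dz1 * dz1_dw1
print(round(dE_dw1, 5))           # about -0.0046
```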
Weight update for the hidden neuron, with learning rate η = 1.2:

w1_new = w1 − η(∂E/∂w1) = 0.1 − 1.2(−0.0045954) = 0.1055

This is the new weight.
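The same update in code (the learning rate 1.2 is the one used in the example):

```python
lr = 1.2                 # learning rate
w1 = 0.1                 # old weight from x1 to hidden neuron 1
dE_dw1 = -0.0045954      # gradient computed above

w1_new = w1 - lr * dE_dw1
print(round(w1_new, 4))  # 0.1055
```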
A Final Diagram to Wrap it up

[Figure from https://www.jeremyjordan.me/neural-networks-training/]

Weights Update for the network

[Figure from https://www.jeremyjordan.me/neural-networks-training/]
• Continuing: a similar procedure is followed for all the other neurons and their weights. Then move on to the next sample.

Take the third sample (iteration 3):

Inputs
Feature 1    Feature 2
0.5          -0.5
0.3          0.4
0.7          0.9

Targets
d1     d2
0.9    0.1
0.9    0.9
0.1    0.1
• That was ONE EPOCH. An epoch is one complete pass through all the samples. After repeating this for many epochs (e.g., 25), our neural network is expected to reach the minimum error and can be considered trained.

We'll learn about optimization later!
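For completeness, here is a compact sketch of the whole procedure as a training loop over the three samples for several epochs. It uses the same 2-2-2 architecture, squared-error loss, and learning rate 1.2 as the worked example, but the vectorized NumPy form is my own, not a transcription of the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Dataset: three samples with two features each, and their two target values
X = np.array([[0.5, -0.5], [0.3, 0.4], [0.7, 0.9]])
D = np.array([[0.9, 0.1], [0.9, 0.9], [0.1, 0.1]])

# Weights and biases of the 2-2-2 network from the example
W1 = np.array([[0.1, 0.3], [-0.2, 0.55]])     # input -> hidden
b1 = np.array([0.01, -0.02])
W2 = np.array([[0.37, -0.22], [0.9, -0.12]])  # hidden -> output
b2 = np.array([0.31, 0.27])

lr, epochs = 1.2, 25
for epoch in range(epochs):            # one epoch = one pass over all samples
    for x, d in zip(X, D):
        # Forward pass
        a1 = sigmoid(x @ W1 + b1)      # hidden activations
        a2 = sigmoid(a1 @ W2 + b2)     # network outputs
        # Backward pass (chain rule, as derived above)
        delta2 = -(d - a2) * a2 * (1 - a2)        # error signal at the outputs
        delta1 = (delta2 @ W2.T) * a1 * (1 - a1)  # error signal at the hidden layer
        # Gradient-descent updates, one sample at a time
        W2 -= lr * np.outer(a1, delta2); b2 -= lr * delta2
        W1 -= lr * np.outer(x, delta1); b1 -= lr * delta1

print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2))   # network outputs after training
```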