
Feedforward and Backpropagation

© Nisheeth Joshi
Training a Neural Network

• Forward Propagation
• Backpropagation

© Nisheeth Joshi
What is training?

• The model learns the relationship between the input and output data (supervised learning)
• Adjusting the weights and biases so that the error between the predicted output and the actual output is minimised
© Nisheeth Joshi
Sample Dataset

Student | X1: Physics (%) | X2: Chemistry (%) | X3: Hours Studied | Y: Mathematics (%)
   1    |       60        |        80         |         5         |        82
   2    |       70        |        75         |         7         |        94
   3    |       50        |        55         |        10         |        45
   4    |       40        |        56         |         7         |        43

© Nisheeth Joshi
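For reference in the short Python snippets used later in these notes, the dataset might be held like this (purely illustrative):

# Sample dataset: one row per student
# Features: X1 = Physics (%), X2 = Chemistry (%), X3 = Hours Studied
# Target:   Y  = Mathematics (%)
X = [[60, 80, 5],
     [70, 75, 7],
     [50, 55, 10],
     [40, 56, 7]]
Y = [82, 94, 45, 43]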
What kind of a problem is this? Why?

Regression or Classification?

This is a regression problem, because the target (Mathematics %) is a continuous value.


© Nisheeth Joshi
The Network – 1 Hidden Layer

[Diagram: a network with inputs X1, X2, X3, a hidden layer of two neurons (each a summation ∑ followed by an activation ∫, with weights W1–W6 and biases b1, b2), and an output neuron (weights W7, W8 and bias b3) producing y']
© Nisheeth Joshi
Input Layer

[Same network diagram, with the input layer (X1, X2, X3) highlighted]
© Nisheeth Joshi
Edges

[Same network diagram, with the edges (weighted connections) highlighted]
© Nisheeth Joshi
Weights and Biases – Most Important

[Same network diagram, with the weights (W1–W8) and biases (b1, b2, b3) highlighted - these are the parameters learned during training]
© Nisheeth Joshi
Hidden Layer (Contains Neurons)

[Same network diagram, with the hidden-layer neurons highlighted]

Each neuron performs two operations:
1. Summation (linear)
2. Activation (non-linear)
© Nisheeth Joshi
Output Layer

[Same network diagram, with the output neuron (producing y') highlighted]
© Nisheeth Joshi
Forward Propagation

• We feed in the input data
• It is transmitted to the neurons by multiplying the inputs with the weights
• Linear and non-linear operations are performed
  • Non-linear, because we want the network to fit all kinds of interesting relationships between input and output
• The results of the neuron computations are then multiplied by the output-layer weights to generate the output

© Nisheeth Joshi
Sample Dataset
Student | X1: Physics (%) | X2: Chemistry (%) | X3: Hours Studied | Y: Mathematics (%)
   1    |       60        |        80         |         5         |        82
   2    |       70        |        75         |         7         |        94
   3    |       50        |        55         |        10         |        45
   4    |       40        |        56         |         7         |        43

© Nisheeth Joshi
[Same network diagram with student 1's inputs fed in: X1 = 60, X2 = 80, X3 = 5]
© Nisheeth Joshi
[Diagram: the first hidden neuron, receiving the inputs 60, 80 and 5 through its weights with bias b1, producing the activation g1]
© Nisheeth Joshi
Inside the Neuron

• Two operations
  • Linear: summation of the weighted inputs, plus a bias
  • Non-linear operation (activation)
• The central idea is to extract linear combinations of the inputs as derived features, and model the target as a non-linear function of these features.

© Nisheeth Joshi
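The sketch below shows these two operations for a single neuron in Python; it is only an illustration, using the weight and bias values (0.1, 0.1, 0.1 and -15) that the following slides assume for the first hidden neuron.

import math

def neuron(inputs, weights, bias):
    # Linear operation: weighted sum of the inputs plus the bias
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Non-linear operation: sigmoid activation
    g = 1.0 / (1.0 + math.exp(-z))
    return z, g

# First hidden neuron for student 1, with the illustrative values from the next slides
z1, g1 = neuron([60, 80, 5], [0.1, 0.1, 0.1], -15)
print(z1, g1)   # -0.5  0.3775... (shown as 0.37 on the slides)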
Linear Operation

[Diagram: the first hidden neuron computing z1 from the inputs through its weights and bias b1, before the activation that produces g1]
© Nisheeth Joshi
Linear Operation

[Diagram: the first hidden neuron with inputs 60, 80, 5, weights 0.1, 0.1, 0.1 and bias -15]

z1 = w1*X1 + w2*X2 + w3*X3 + b1
   = 0.1*60 + 0.1*80 + 0.1*5 + (-15)
   = -0.5

© Nisheeth Joshi
Non-Linear Operation

[Diagram: the first hidden neuron, with the sigmoid activation applied to z1 to produce g1]

g1 = 1 / (1 + e^(-z1))

© Nisheeth Joshi
Non-Linear Operation

[Diagram: the first hidden neuron, with z1 = -0.5 passed through the sigmoid activation]

g1 = 1 / (1 + e^(-z1))
   = 1 / (1 + e^(-(-0.5)))
   ≈ 0.37

© Nisheeth Joshi
[Diagram: the first hidden neuron with inputs 60, 80, 5, weights 0.1 each and bias -15, producing g1 = 0.37]
© Nisheeth Joshi
[Diagram: the second hidden neuron, receiving the inputs 60, 80 and 5 through its own weights with bias b2, computing z2 and the activation g2]
© Nisheeth Joshi
[Diagram: the second hidden neuron with inputs 60, 80, 5, weights 0.15, 0.05, -0.2 and bias -15]

z2 = 0.15*60 + 0.05*80 + (-0.2)*5 + (-15) = -3
g2 = 1 / (1 + e^(-(-3))) ≈ 0.047
© Nisheeth Joshi
[Diagram: the full network with the hidden-layer activations computed, g1 = 0.37 and g2 = 0.047, ready for the output-layer computation]
© Nisheeth Joshi
Output Layer: Linear Summation

[Diagram: the output neuron ∑, taking g1 = 0.37 and g2 = 0.047 as inputs with bias b3, producing y']
© Nisheeth Joshi
Output Layer: Linear Summation

[Diagram: the output neuron with g1 = 0.37, g2 = 0.047, weights w7 = 12, w8 = 9 and bias b3 = 20]

y' = w7*g1 + w8*g2 + b3
   = 12*0.37 + 9*0.047 + 20
   ≈ 24.95   (using the unrounded activations 0.3775 and 0.0474)
© Nisheeth Joshi
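Putting the whole forward pass for student 1 together, here is a minimal Python sketch; all weight and bias values are the illustrative ones assumed on these slides, not learned values.

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Student 1's inputs
x1, x2, x3 = 60, 80, 5

# Hidden neuron 1 (weights 0.1, 0.1, 0.1, bias -15)
z1 = 0.1 * x1 + 0.1 * x2 + 0.1 * x3 - 15        # = -0.5
g1 = sigmoid(z1)                                 # ≈ 0.3775 (0.37 on the slides)

# Hidden neuron 2 (weights 0.15, 0.05, -0.2, bias -15)
z2 = 0.15 * x1 + 0.05 * x2 - 0.2 * x3 - 15       # = -3
g2 = sigmoid(z2)                                 # ≈ 0.0474 (0.047 on the slides)

# Output neuron (linear summation only)
w7, w8, b3 = 12, 9, 20
y_pred = w7 * g1 + w8 * g2 + b3                  # ≈ 24.96 (24.95 on the slides)
print(y_pred)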
[Diagram: the full network for student 1, producing the prediction y' = 24.95]
© Nisheeth Joshi
Compare the predicted Y (24.95) with the actual Y (82)

There is a huge difference

What to do?

We need a method that can update the weights and biases so that the error between the predicted and actual values is reduced
© Nisheeth Joshi
Solution???

Backpropagation

© Nisheeth Joshi
Backpropagation

• Goal is to update the weights and biases
• So that the predicted value is as close as possible to the actual value
• Essentially, we wish to understand how the output is affected by changes to each weight and bias
• Problem…

© Nisheeth Joshi
The best way
• Apply Gradient Descent
• Move along the negative direction of the slope of an error (cost/loss) function until we find a minimum value
• Calculate the partial derivatives of the cost function with respect to each parameter (weight or bias), then move in the negative direction of these derivatives (slopes)
© Nisheeth Joshi
• Let's consider the partial derivative of the cost function with respect to w7
• We want to understand how sensitive the cost function is to changes in w7

© Nisheeth Joshi
Cost/Loss Function
Cost = (Y_pred - Y_actual)^2

∂cost/∂w7 = (∂cost/∂Y_pred) * (∂Y_pred/∂w7)

Y_pred = w7*g1 + w8*g2 + b3

© Nisheeth Joshi
Cost/Loss Function
Cost = (Y_pred - Y_actual)^2

∂cost/∂w7 = (∂cost/∂Y_pred) * (∂Y_pred/∂w7), where ∂cost/∂Y_pred = 2(Y_pred - Y_actual)

Y_pred = w7*g1 + w8*g2 + b3

© Nisheeth Joshi
Cost/Loss Function
Cost = (Y_pred - Y_actual)^2

∂cost/∂w7 = (∂cost/∂Y_pred) * (∂Y_pred/∂w7) = 2(Y_pred - Y_actual)*g1

∂cost/∂w7 = 2*(24.95 - 82) * 0.37 ≈ -42.2

© Nisheeth Joshi
• Similarly, the partial derivative of the cost function with respect to w8 can be calculated
• So on and so forth…

© Nisheeth Joshi
Cost/Loss Function
Cost = (Y_pred - Y_actual)^2

∂cost/∂w8 = (∂cost/∂Y_pred) * (∂Y_pred/∂w8) = 2(Y_pred - Y_actual)*g2

∂cost/∂w8 = 2*(24.95 - 82) * 0.047 ≈ -5.36

© Nisheeth Joshi
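A small sketch of these output-layer gradients in Python, plugging in the slide values directly; the b3 gradient (just 2(Y_pred - Y_actual)) is the -114.1 used on the next slide.

# Values from the slides for student 1
y_pred, y_actual = 24.95, 82
g1, g2 = 0.37, 0.047

# Cost = (Y_pred - Y_actual)^2, so dcost/dY_pred = 2*(Y_pred - Y_actual)
dcost_dypred = 2 * (y_pred - y_actual)   # ≈ -114.1

# Y_pred = w7*g1 + w8*g2 + b3, so dY_pred/dw7 = g1, dY_pred/dw8 = g2, dY_pred/db3 = 1
dcost_dw7 = dcost_dypred * g1            # ≈ -42.2
dcost_dw8 = dcost_dypred * g2            # ≈ -5.36
dcost_db3 = dcost_dypred                 # ≈ -114.1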
Once Done…

• Update the weights and biases using gradient descent (GD)
• Walk down the cost function in the direction of the negative partial derivatives, by taking small steps of size η, also called the learning rate
• Let us assume η = 0.01

© Nisheeth Joshi
Updating weights and biases

• w7+ = w7 - η*(∂cost/∂w7) = 12 - 0.01*(-42.2) ≈ 12.42
• w8+ = w8 - η*(∂cost/∂w8) = 9 - 0.01*(-5.36) ≈ 9.05
• b3+ = b3 - η*(∂cost/∂b3) = 20 - 0.01*(-114.1) ≈ 21.1

© Nisheeth Joshi
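A sketch of applying these updates in Python; η = 0.01 is the learning rate assumed above, and the gradients are the ones just computed.

eta = 0.01                                                # learning rate
w7, w8, b3 = 12, 9, 20                                    # current parameters
dcost_dw7, dcost_dw8, dcost_db3 = -42.2, -5.36, -114.1    # gradients from the slides

# Gradient-descent step: move each parameter against its gradient
w7_new = w7 - eta * dcost_dw7    # 12 - 0.01*(-42.2)  = 12.42
w8_new = w8 - eta * dcost_dw8    # 9  - 0.01*(-5.36)  ≈ 9.05
b3_new = b3 - eta * dcost_db3    # 20 - 0.01*(-114.1) ≈ 21.14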
Previous Weights

[Diagram: the output neuron with the previous parameters w7 = 12, w8 = 9, b3 = 20]
© Nisheeth Joshi
Updated Weights

[Diagram: the output neuron with the updated parameters w7 ≈ 12.42, w8 ≈ 9.05, b3 ≈ 21.1]
© Nisheeth Joshi
w1+ = w1 - η*(∂cost/∂w1)
w2+ = w2 - η*(∂cost/∂w2)
w3+ = w3 - η*(∂cost/∂w3)
w4+ = w4 - η*(∂cost/∂w4)
w5+ = w5 - η*(∂cost/∂w5)
w6+ = w6 - η*(∂cost/∂w6)
w7+ = w7 - η*(∂cost/∂w7)
w8+ = w8 - η*(∂cost/∂w8)
b1+ = b1 - η*(∂cost/∂b1)
b2+ = b2 - η*(∂cost/∂b2)
b3+ = b3 - η*(∂cost/∂b3)
© Nisheeth Joshi
[Diagram: the full network for student 1, with g1 = 0.37, g2 = 0.047 and prediction 24.95]
© Nisheeth Joshi
Let's consider w1

• w1+ = w1 - η*(∂cost/∂w1)

• Trace back…

• ∂cost/∂w1 = (∂cost/∂Y_pred) * (∂Y_pred/∂g1) * (∂g1/∂w1)
© Nisheeth Joshi
[Diagram: the first hidden neuron again - inputs 60, 80 and 5, weights 0.1 each, bias -15, computing z1 and then g1]
© Nisheeth Joshi
Let's consider w1

• w1+ = w1 - η*(∂cost/∂w1)

• Trace back…

• ∂cost/∂w1 = (∂cost/∂Y_pred) * (∂Y_pred/∂g1) * (∂g1/∂z1) * (∂z1/∂w1)

© Nisheeth Joshi
Let's consider w1

• ∂cost/∂w1 = (∂cost/∂Y_pred) * (∂Y_pred/∂g1) * (∂g1/∂z1) * (∂z1/∂w1)

• ∂cost/∂Y_pred = 2(Y_pred - Y_actual)

• ∂Y_pred/∂g1 = w7

• ∂g1/∂z1 = g1 * (1 - g1) = (1 / (1 + e^(-z1))) * (1 - 1 / (1 + e^(-z1)))

• ∂z1/∂w1 = x1
© Nisheeth Joshi
Let's consider w1

• ∂cost/∂w1 = (∂cost/∂Y_pred) * (∂Y_pred/∂g1) * (∂g1/∂z1) * (∂z1/∂w1)

• ∂cost/∂w1 = 2*(24.95 - 82) * 12 * [(1 / (1 + e^(-(-0.5)))) * (1 - 1 / (1 + e^(-(-0.5))))] * 60

• ≈ -19303

© Nisheeth Joshi
Update w1

• w1+ = w1 - η*(∂cost/∂w1) = 0.1 - 0.01*(-19303) ≈ 193

© Nisheeth Joshi
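A sketch of this hidden-layer gradient and update in Python; the exact figure differs slightly from -19303 because the slides round intermediate values.

import math

# Values from the slides for student 1
y_pred, y_actual = 24.95, 82
w7, x1, w1 = 12, 60, 0.1
eta = 0.01
z1 = -0.5
g1 = 1 / (1 + math.exp(-z1))          # ≈ 0.3775

# Chain rule: dcost/dw1 = dcost/dY_pred * dY_pred/dg1 * dg1/dz1 * dz1/dw1
dcost_dw1 = 2 * (y_pred - y_actual) * w7 * (g1 * (1 - g1)) * x1   # ≈ -19306 (-19303 on the slides)

w1_new = w1 - eta * dcost_dw1         # 0.1 - 0.01*(-19306) ≈ 193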
b1

W1 ∑ ∫ g1 = 0.37 b3
60
W3
W2
W5
80
∑ 24.95
W4
g2 = 0.047

W5 ∑ ∫
5

b2
© Nisheeth Joshi
Repeat

• Repeat this for each weight and bias for student 1
• Perform the same process for the rest of the students (see the sketch below)
• This completes epoch one

© Nisheeth Joshi
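Finally, a rough sketch of the whole procedure for one epoch in Python. It assumes the same tiny 3-2-1 architecture, the squared-error cost, and the illustrative starting parameters from these slides (the second hidden neuron's bias is taken as -15, as in the earlier diagram); it illustrates the loop structure, not a tuned implementation.

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Sample dataset: (X1 Physics, X2 Chemistry, X3 Hours Studied) -> Y Mathematics
data = [([60, 80, 5], 82), ([70, 75, 7], 94), ([50, 55, 10], 45), ([40, 56, 7], 43)]

# Illustrative starting parameters (hidden neurons 1 and 2, then the output neuron)
w_hidden = [[0.1, 0.1, 0.1], [0.15, 0.05, -0.2]]
b_hidden = [-15.0, -15.0]
w_out, b_out = [12.0, 9.0], 20.0
eta = 0.01                                   # learning rate

for epoch in range(1):                       # one pass over all students = one epoch
    for x, y_actual in data:
        # ---- forward propagation ----
        z = [sum(w * xi for w, xi in zip(w_hidden[j], x)) + b_hidden[j] for j in range(2)]
        g = [sigmoid(zj) for zj in z]
        y_pred = sum(wo * gj for wo, gj in zip(w_out, g)) + b_out

        # ---- backpropagation (chain rule for the squared-error cost) ----
        dcost_dy = 2 * (y_pred - y_actual)
        for j in range(2):
            dcost_dg = dcost_dy * w_out[j]           # through the output weight
            dcost_dz = dcost_dg * g[j] * (1 - g[j])  # through the sigmoid
            w_out[j] -= eta * dcost_dy * g[j]        # update output-layer weight
            for i in range(3):
                w_hidden[j][i] -= eta * dcost_dz * x[i]
            b_hidden[j] -= eta * dcost_dz
        b_out -= eta * dcost_dy

With the raw percentage inputs and η = 0.01, single updates can be very large (as the w1 ≈ 193 step showed); in practice the inputs would be scaled and a smaller learning rate used.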
