MLP
[Figure: a fully connected MLP with inputs x_1 and x_2 in layer 0, two hidden layers (1 and 2) of two neurons each, and an output layer 3 producing O_1^3 and O_2^3. Each neuron j in layer k forms a weighted sum θ_j^k of the previous layer's outputs through weights w_ij^k, including bias weights w_0j^k.]
Numerical Problem
[Figure: the network used in the numerical problem — inputs x_1, x_2 in layer 0, a hidden layer 1 with weighted sums θ_1^1, θ_2^1 and outputs O_1^1, O_2^1, and an output layer 2 with weighted sums θ_1^2, θ_2^2 and outputs O_1^2, O_2^2, connected by weights w_ij^k.]
▫ Notation:
  ▪ w_ij^k → weight between neuron i in the (k−1)st layer and neuron j in the kth layer (w → weight, k → kth layer, ij → neuron i in the (k−1)st layer and neuron j in the kth layer)
  ▪ θ_j^k → weighted sum of the jth neuron in layer k
  ▪ O_j^k → output of the jth neuron in layer k
▫ The problem is solved in two passes:
  ▪ Forward Pass – Mean Square Error Computation
  ▪ Backward Propagation – Weight Adjustment
Perceptron Learning – Gradient Descent
1. Initialize the weights and biases randomly
2. For each epoch do:
a) For each input sample do:
▪ Feed the input sample into the network and compute the output
▪ Calculate the error between the output and the desired output
▪ Backpropagate the error through the network, updating the
weights and biases using the gradient descent algorithm
b) Calculate the total error for the epoch
c) If the error is below a specified threshold, stop training and return the
weights and biases
3. Return the trained weights and biases
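A minimal sketch of this training loop in Python/NumPy, assuming a small fully connected network with sigmoid activations and squared error (the toy data, the network size, and all names such as train and threshold are illustrative choices, not prescribed by the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, T, hidden=4, eta=0.1, epochs=10000, threshold=1e-3):
    rng = np.random.default_rng(0)
    n_in, n_out = X.shape[1], T.shape[1]
    # 1. initialize the weights and biases randomly
    W1 = rng.normal(size=(hidden, n_in));  b1 = rng.normal(size=hidden)
    W2 = rng.normal(size=(n_out, hidden)); b2 = rng.normal(size=n_out)
    for epoch in range(epochs):                  # 2. for each epoch
        total_error = 0.0
        for x, t in zip(X, T):                   # 2a. for each input sample
            o1 = sigmoid(W1 @ x + b1)            # feed the sample forward
            o2 = sigmoid(W2 @ o1 + b2)
            total_error += 0.5 * np.sum((o2 - t) ** 2)     # error vs. desired output
            d2 = (o2 - t) * o2 * (1 - o2)        # backpropagate the error
            d1 = o1 * (1 - o1) * (W2.T @ d2)
            W2 -= eta * np.outer(d2, o1); b2 -= eta * d2   # gradient-descent updates
            W1 -= eta * np.outer(d1, x);  b1 -= eta * d1
        if total_error < threshold:              # 2c. stop once the error is small enough
            break
    return W1, b1, W2, b2                        # 3. return the trained parameters

# illustrative usage on XOR-like toy data
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])
W1, b1, W2, b2 = train(X, T)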
Chain Rule
▫ Let's consider a multilayer perceptron with three layers, computing θ_1 = f_1(x), θ_2 = f_2(θ_1), and o = f_3(θ_2)
▫ The derivative of ‘o’ with respect to ‘x’, i.e. ∂o/∂x, tells us what effect a slight perturbation of ‘x’ has on the final output ‘o’
▫ The final output is o = f_3(f_2(f_1(x)))
  ▪ Here, computing the derivative ∂o/∂x directly is difficult, as ‘o’ is quite far away from ‘x’
Chain Rule contd.
▫ However, the derivative can be calculated as follows:
  ∂o/∂x = (∂o/∂θ_2) ∙ (∂θ_2/∂θ_1) ∙ (∂θ_1/∂x)
▫ This is the chain rule of differentiation, and it is used in the backpropagation learning of a multilayer perceptron
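As a quick numeric illustration of the chain rule, assume the concrete (illustrative) choices f_1(x) = x², f_2(θ) = sin θ and f_3(θ) = e^θ; the product of the three local derivatives then agrees with a finite-difference estimate of ∂o/∂x:

import math

def f1(x): return x ** 2          # illustrative first layer
def f2(t): return math.sin(t)     # illustrative second layer
def f3(t): return math.exp(t)     # illustrative output layer

x = 0.7
theta1 = f1(x)
theta2 = f2(theta1)
o = f3(theta2)

# chain rule: do/dx = do/dtheta2 * dtheta2/dtheta1 * dtheta1/dx
do_dtheta2 = math.exp(theta2)     # derivative of f3 at theta2
dtheta2_dtheta1 = math.cos(theta1)
dtheta1_dx = 2 * x
chain = do_dtheta2 * dtheta2_dtheta1 * dtheta1_dx

# finite-difference estimate of do/dx for comparison
h = 1e-6
numeric = (f3(f2(f1(x + h))) - f3(f2(f1(x - h)))) / (2 * h)
print(chain, numeric)             # the two values agree to several decimal places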
Numerical Example contd.
▫ Weights
  Hidden layer (layer 1): w_01^1 = 0.5, w_11^1 = 1.5, w_21^1 = 0.8, w_02^1 = 0.8, w_12^1 = 0.2, w_22^1 = −1.6
  Output layer (layer 2): w_01^2 = 0.9, w_11^2 = −1.7, w_21^2 = 1.6, w_02^2 = 1.2, w_12^2 = 2.1, w_22^2 = −0.2
▫ Input Vector X = [0.7, 1.2]^T
▫ Ground Truth t = [1, 0]^T
[Figure: the two-layer network annotated with these weight values on its connections.]
Feed Forward Pass
▫ Weighted Sum: θ_j^1 = Σ_i w_ij^1 ∙ x_i^0  for 1 ≤ j ≤ 2 and 0 ≤ i ≤ 2
▫ Activation Function: o_j^1 = 1 / (1 + e^(−θ_j^1))  for 1 ≤ j ≤ 2
▫ Hidden layer:
  θ^1 = [[0.5, 1.5, 0.8], [0.8, 0.2, −1.6]] ∙ [1, 0.7, 1.2]^T = [2.51, −0.98]^T  and  o^1 = [0.92, 0.27]^T
▫ Output layer:
  θ^2 = [[0.9, −1.7, 1.6], [1.2, 2.1, −0.2]] ∙ [1, 0.92, 0.27]^T = [−0.232, 3.057]^T  and  o^2 = [0.44, 0.95]^T
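The same feed-forward pass can be checked with a short NumPy sketch; here each weight matrix stores the bias weight w_0j in its first column, and the printed values match the slide's numbers up to the rounding of intermediate results:

import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

W1 = np.array([[0.5, 1.5,  0.8],     # row j=1: w_01, w_11, w_21
               [0.8, 0.2, -1.6]])    # row j=2: w_02, w_12, w_22
W2 = np.array([[0.9, -1.7,  1.6],
               [1.2,  2.1, -0.2]])
x = np.array([1.0, 0.7, 1.2])        # [bias input 1, x1, x2]

theta1 = W1 @ x                      # -> [ 2.51, -0.98]
o1 = sigmoid(theta1)                 # -> approx [0.92, 0.27]
theta2 = W2 @ np.concatenate(([1.0], o1))   # prepend the bias output o_0^1 = 1
o2 = sigmoid(theta2)                 # -> approx [0.44, 0.96] vs. the slide's 0.44, 0.95
print(theta1, o1, theta2, o2, sep="\n")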
Back Propagation – Output Layer
▫ Error E = ½ Σ_{j=1}^{2} (o_j^2 − t_j)²
▫ Let us set δ_j^2 = (o_j^2 − t_j) ∙ o_j^2 ∙ (1 − o_j^2)  ⇒  ∂E/∂W_ij^2 = δ_j^2 ∙ o_i^1
Back Propagation – Output Layer
▫ δ_j^2 = (o_j^2 − t_j) ∙ o_j^2 ∙ (1 − o_j^2)  ⇒  δ_1^2 = −0.138 and δ_2^2 = 0.045
▫ ∂E/∂W_11^2 = δ_1^2 ∙ o_1^1 = −0.126  ⇒  w_11^2′ = w_11^2 − η ∙ (−0.126)
▫ ∂E/∂W_12^2 = δ_2^2 ∙ o_1^1 = 0.04  ⇒  w_12^2′ = w_12^2 − η ∙ (0.04)
Back Propagation – Output Layer
▫ ∂E/∂W_21^2 = δ_1^2 ∙ o_2^1 = −0.037  ⇒  w_21^2′ = w_21^2 − η ∙ (−0.037)
▫ ∂E/∂W_22^2 = δ_2^2 ∙ o_2^1 = 0.012  ⇒  w_22^2′ = w_22^2 − η ∙ (0.012)
▫ ∂E/∂W_01^2 = δ_1^2 ∙ o_0^1 = −0.138  ⇒  w_01^2′ = w_01^2 − η ∙ (−0.138)
▫ ∂E/∂W_02^2 = δ_2^2 ∙ o_0^1 = 0.045  ⇒  w_02^2′ = w_02^2 − η ∙ (0.045)
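Continuing the sketch, the output-layer deltas, gradients and weight updates above can be reproduced as follows (using the rounded activations o^1 ≈ [0.92, 0.27] and o^2 ≈ [0.44, 0.95] from the slides; the learning rate η = 0.1 is an illustrative choice):

import numpy as np

o1 = np.array([0.92, 0.27])          # hidden-layer outputs (rounded, from the slides)
o2 = np.array([0.44, 0.95])          # network outputs (rounded, from the slides)
t  = np.array([1.0, 0.0])            # ground truth

delta2 = (o2 - t) * o2 * (1 - o2)    # -> approx [-0.138, 0.045]
o1_with_bias = np.concatenate(([1.0], o1))   # o_0^1 = 1 for the bias weights

# dE/dW_ij^2 = delta_j^2 * o_i^1 ; rows index i = 0 (bias), 1, 2 and columns index j
grads = np.outer(o1_with_bias, delta2)
# row i=1 gives the slide's -0.126 and 0.04, row i=2 gives -0.037 and 0.012 (up to rounding)

eta = 0.1                            # illustrative learning rate
W2 = np.array([[0.9, -1.7,  1.6],    # rows j = 1, 2; columns i = 0, 1, 2
               [1.2,  2.1, -0.2]])
W2_updated = W2 - eta * grads.T      # w_ij^2' = w_ij^2 - eta * dE/dW_ij^2
print(delta2, grads, W2_updated, sep="\n")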
Back Propagation – Hidden Layer
▫ Error E = ½ Σ_{j=1}^{2} (o_j^2 − t_j)²
▫ For a weight w_pi^1 feeding hidden neuron i, the chain rule gives
  ∂E/∂W_pi^1 = (Σ_{j=1}^{2} δ_j^2 ∙ W_ij^2) ∙ o_i^1 ∙ (1 − o_i^1) ∙ x_p^0
Weight Update Rule – Hidden Layer
Putting δ_i^K = o_i^K ∙ (1 − o_i^K) ∙ Σ_{j=1}^{M} δ_j^{K+1} ∙ W_ij^{K+1}, the gradient for a hidden layer takes the same form as for the output layer: ∂E/∂W_pi^K = δ_i^K ∙ o_p^{K−1}
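A vectorized sketch of this recursion, assuming the layer-(K+1) weight matrix is stored with one row per neuron j and one column per incoming neuron i (bias weights excluded); with the example's layer-2 weights and output deltas it yields δ_1^1 ≈ 0.024:

import numpy as np

def hidden_delta(o_K, delta_next, W_next):
    """delta_i^K = o_i^K * (1 - o_i^K) * sum_j delta_j^{K+1} * w_ij^{K+1}."""
    return o_K * (1 - o_K) * (W_next.T @ delta_next)

# layer-2 weights of the example without the bias column (row j, column i)
W2_no_bias = np.array([[-1.7,  1.6],
                       [ 2.1, -0.2]])
delta2 = np.array([-0.138, 0.045])   # output-layer deltas
o1 = np.array([0.92, 0.27])          # hidden-layer outputs
print(hidden_delta(o1, delta2, W2_no_bias))   # -> approx [ 0.024, -0.045]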
Back Propagation – Hidden Layer
▫ δ_1^1 = o_1^1 ∙ (1 − o_1^1) ∙ (δ_1^2 ∙ w_11^2 + δ_2^2 ∙ w_12^2) = 0.024
▫ δ_2^1 = o_2^1 ∙ (1 − o_2^1) ∙ (δ_1^2 ∙ w_21^2 + δ_2^2 ∙ w_22^2) = −0.02
Back Propagation – Hidden Layer
▫ Weight update rule: w_ij^1′ = w_ij^1 − η ∙ ∂E/∂W_ij^1
▫ ∂E/∂W_11^1 = δ_1^1 ∙ o_1^0 = 0.024 ∗ 0.7 = 0.017
▫ ∂E/∂W_12^1 = δ_2^1 ∙ o_1^0 = −0.02 ∗ 0.7 = −0.014
▫ ∂E/∂W_21^1 = δ_1^1 ∙ o_2^0 = 0.024 ∗ 1.2 = 0.0288
▫ ∂E/∂W_22^1 = δ_2^1 ∙ o_2^0 = −0.02 ∗ 1.2 = −0.024
▫ ∂E/∂W_01^1 = δ_1^1 ∙ o_0^0 = 0.024 ∗ 1 = 0.024
▫ ∂E/∂W_02^1 = δ_2^1 ∙ o_0^0 = −0.02 ∗ 1 = −0.02
Back Propagation – Hidden Layer
▫ ∂E/∂W_11^1 = δ_1^1 ∙ o_1^0 = 0.017  ⇒  w_11^1′ = w_11^1 − η ∙ (0.017)
▫ ∂E/∂W_12^1 = δ_2^1 ∙ o_1^0 = −0.014  ⇒  w_12^1′ = w_12^1 − η ∙ (−0.014)
▫ ∂E/∂W_21^1 = δ_1^1 ∙ o_2^0 = 0.0288  ⇒  w_21^1′ = w_21^1 − η ∙ (0.0288)
▫ ∂E/∂W_22^1 = δ_2^1 ∙ o_2^0 = −0.024  ⇒  w_22^1′ = w_22^1 − η ∙ (−0.024)
▫ ∂E/∂W_01^1 = δ_1^1 ∙ o_0^0 = 0.024  ⇒  w_01^1′ = w_01^1 − η ∙ (0.024)
▫ ∂E/∂W_02^1 = δ_2^1 ∙ o_0^0 = −0.02  ⇒  w_02^1′ = w_02^1 − η ∙ (−0.02)
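A short sketch of this hidden-layer step, taking the deltas δ_1^1 = 0.024 and δ_2^1 = −0.02 from the slides as given and applying w′ = w − η ∙ ∂E/∂W with an illustrative learning rate:

import numpy as np

delta1 = np.array([0.024, -0.02])       # hidden-layer deltas from the slides
x_with_bias = np.array([1.0, 0.7, 1.2]) # o_0^0 = 1 (bias), x1, x2

# dE/dW_ij^1 = delta_j^1 * o_i^0 ; rows index i = 0 (bias), 1, 2 and columns index j
grads = np.outer(x_with_bias, delta1)
# -> [[0.024, -0.02], [0.0168, -0.014], [0.0288, -0.024]] (the slides round 0.0168 to 0.017)

eta = 0.1                               # illustrative learning rate
W1 = np.array([[0.5, 1.5,  0.8],        # rows j = 1, 2; columns i = 0, 1, 2
               [0.8, 0.2, -1.6]])
W1_updated = W1 - eta * grads.T         # w_ij^1' = w_ij^1 - eta * dE/dW_ij^1
print(grads, W1_updated, sep="\n")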
After Updating the Weights
[Figure: the network redrawn with the updated weights w_ij^1′ and w_ij^2′ on its connections — inputs x_1, x_2 (layer 0), hidden sums θ_1^1, θ_2^1 with outputs O_1^1, O_2^1 (layer 1), and output sums θ_1^2, θ_2^2 with outputs O_1^2, O_2^2 (layer 2).]
With the updated weights, the two steps are repeated in the next iteration:
▫ Forward Pass – Mean Square Error Computation
▫ Backward Propagation – Weight Adjustment
Exercise
Implement a flexible Multi-Layer Perceptron (MLP) neural network that can
be configured using a JSON file. This exercise will help you understand the
structure of neural networks, forward and backward propagation, and how
to make your code adaptable to different network architectures.
Task 1 – JSON Configuration Parser:
• Create a function that reads and parses a JSON configuration file
• The JSON file should specify the network architecture (number of layers, neurons per layer, activation functions) and training hyperparameters (learning rate, number of epochs, batch size)

Task 2 – MLP Implementation:
• Implement an MLP class that initializes the network based on the parsed JSON configuration
• Include methods for:
  a) Initializing weights and biases
  b) Implementing various activation functions (e.g., sigmoid, ReLU, tanh)
  c) Forward propagation
  d) Backward propagation
  e) Weight updates using gradient descent
  f) Training the network
  g) Making predictions
JSON Data

{
  "layers": [
    {
      "type": "input",
      "neurons": 4
    },
    {
      "type": "hidden",
      "neurons": 4,
      "activation": "sigmoid"
    },
    {
      "type": "output",
      "neurons": 2,
      "activation": "sigmoid"
    }
  ],
  "learning_rate": 0.1,
  "epochs": 10000,
  "batch_size": 32
}
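A possible starting point for Task 1, assuming the configuration above is saved as config.json (the file name, the function name load_config, and the returned structure are illustrative choices; the MLP class itself is left for Task 2):

import json

def load_config(path="config.json"):
    """Read the JSON file and return layer sizes, activations and hyperparameters."""
    with open(path) as f:
        cfg = json.load(f)
    sizes = [layer["neurons"] for layer in cfg["layers"]]
    activations = [layer.get("activation") for layer in cfg["layers"]]   # None for the input layer
    hyper = {key: cfg[key] for key in ("learning_rate", "epochs", "batch_size")}
    return sizes, activations, hyper

if __name__ == "__main__":
    sizes, activations, hyper = load_config()
    print(sizes)        # [4, 4, 2] for the configuration shown above
    print(activations)  # [None, 'sigmoid', 'sigmoid']
    print(hyper)        # {'learning_rate': 0.1, 'epochs': 10000, 'batch_size': 32}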