
19CSE456 Neural Networks and Deep Learning
Unit 1

Course Instructor: Dr. M. Anbazhagan


Perceptrons
A single-layer neural network that can be used for
binary classification tasks
2
What is Perceptron?
▪ A perceptron is a type of artificial neural network, which is
a computational model inspired by the structure and
function of biological neural networks in the human brain
▪ A perceptron is a linear classifier
▫ The perceptron is only able to work with linearly separable data
points
▪ A perceptron is a binary classifier
▫ There should be only 2 categories for classification
▪ A neural network is an interconnected system of
perceptrons
▫ Perceptrons are the foundation of any neural network
3
Linear Vs. Non-linear Classification

4
History of Perceptron
▪ First implemented by the American
psychologist Frank Rosenblatt in
1957 at the Cornell Aeronautical
Laboratory
▪ Rosenblatt was heavily inspired by
the biological neuron and its ability to
learn
▪ Rosenblatt’s Perceptron: inputs X_1, X_2, ..., X_n feed a single
perceptron unit P, which produces the output Y
▪ It was able to work only with linearly separable data points
5
History of Perceptron contd.
▪ Rosenblatt’s idea was to
create a physical machine
that behaves like a neuron
▪ However, its first implementation was
software that had been
tested on the IBM 704
▪ Eventually, the software was
implemented into custom-
built hardware to use it for
image recognition
6
Biological Neuron
▪ A human brain has billions of neurons
▪ Neurons are interconnected nerve cells in the human brain that are
involved in processing and transmitting electrical signals

Output signals to other Neurons


7
Biological Neural Network at a Glance!

8
What is Artificial Neuron?
▪ An artificial neuron is a mathematical function based on the
model of biological neurons
▪ Each neuron takes inputs, weighs them separately, sums
them up and passes this sum through a nonlinear function to
produce output

9
Perceptron – A Detailed Look
(Diagram: three binary inputs X_1, X_2, X_3, weighted by W_1, W_2, W_3, feed a single neuron.)

▪ X_1, X_2, and X_3 are binary inputs; W_1, W_2, and W_3 are weights
▪ The neuron computes the weighted sum \sum_{i=1}^{3} W_i X_i, adds the bias, and produces the output:

    \hat{y} = \sum_{i=1}^{3} W_i X_i + \text{bias}

10
What if Perceptron’s output is Binary?
Different kinds of activation functions:
▪ Hyperbolic tangent: used to output a number from -1 to 1
▪ Logistic function: used to output a number from 0 to 1

(Diagram: inputs X_1, X_2, X_3 with weights W_1, W_2, W_3 and a bias feed a neuron; its weighted sum is passed through an activation function a() to produce the output Y.)

An activation function is a function that converts the inputs given (the input, in
this case, would be the weighted sum and the bias) into a certain output based
on a set of rules.
11
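For illustration only (not part of the original slides), here is a minimal Python sketch of the activation functions mentioned above; the function names and example values are assumptions:

```python
import numpy as np

def step(z, threshold=0.0):
    """Heaviside step used by the classic perceptron: 1 if z >= threshold, else 0."""
    return np.where(z >= threshold, 1, 0)

def logistic(z):
    """Logistic (sigmoid) function: squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Hyperbolic tangent: squashes z into the range (-1, 1)."""
    return np.tanh(z)

# Example: a weighted sum plus bias passed through each activation
z = np.dot([1, 0, 1], [0.7, 0.6, 0.5]) + (-1.0)   # illustrative inputs, weights, bias
print(step(z), logistic(z), tanh(z))
```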
Perceptron Numerical Example

Suppose the perceptron tries to decide if you should go to a concert

Criteria            Input          Weight
Artist is Good      x1 = 0 or 1    w1 = 0.7
Weather is Good     x2 = 0 or 1    w2 = 0.6
Friend will Come    x3 = 0 or 1    w3 = 0.5
Food is Served      x4 = 0 or 1    w4 = 0.3
Alcohol is Served   x5 = 0 or 1    w5 = 0.4

12
Perceptron Numerical Example contd.
▪ The Perceptron Algorithm
▪ Set a threshold value (1.5)
▪ Multiply each input (1, 0, 1, 0, 1) by its weight (0.7, 0.6, 0.5, 0.3, 0.4)
▪ Sum all the results
▪ Activate the output

Set the threshold to 1.5:
    x1 * w1 = 1 * 0.7 = 0.7
    x2 * w2 = 0 * 0.6 = 0
    x3 * w3 = 1 * 0.5 = 0.5
    x4 * w4 = 0 * 0.3 = 0
    x5 * w5 = 1 * 0.4 = 0.4

Sum = 0.7 + 0 + 0.5 + 0 + 0.4 = 1.6
The output is True if the sum > 1.5, False otherwise; here 1.6 > 1.5, so the
output is True (a code sketch of this computation follows below).
13
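A minimal Python sketch of the concert computation above, using the slide's inputs, weights, and threshold (the variable names are my own):

```python
# Concert decision example from the slide
inputs = [1, 0, 1, 0, 1]                 # artist, weather, friend, food, alcohol
weights = [0.7, 0.6, 0.5, 0.3, 0.4]
threshold = 1.5

weighted_sum = sum(x * w for x, w in zip(inputs, weights))   # 1.6
go_to_concert = weighted_sum > threshold                     # True, since 1.6 > 1.5
print(weighted_sum, go_to_concert)
```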
Threshold as Bias

(Diagram: inputs X_0, X_1, X_2, ..., X_n with weights W_0, W_1, W_2, ..., W_n feed a neuron whose weighted sum z is passed through the step function \varphi to produce Y.)

▪ X_0, X_1, X_2, ..., X_n are binary inputs, and X_0 is always set to 1, so that its weight W_0 plays the role of the bias (the negated threshold)

    z = \sum_{i=0}^{n} W_i X_i

    \varphi(z) = \begin{cases} 0, & \text{if } z < 0 \\ 1, & \text{otherwise} \end{cases}

14
Perceptron and Linear Binary Classification
▪ The idea behind the binary linear classifier can be described as
follows:

    f(x, w, w_0) = \varphi(x \cdot w + w_0)
▪ The 𝜑 function is used to distinguish x as either a positive (+1) or a
negative (-1) label
▪ There is a decision boundary that separates the data with different
labels, which occurs at

    x \cdot w + w_0 = 0
▪ The decision boundary separates the hyperplane into two regions

15
Perceptron Decision Boundary
▪ In general, if we have n inputs, the decision boundary will be
an (n-1)-dimensional object called a hyperplane that separates
our n-dimensional feature space into 2 parts:
▪ one in which the points are classified as positive, and
▪ one in which the points are classified as negative

16
Perceptron Decision Boundary contd.
▪ If all the instances in the given data are linearly separable, there
exists a w and a w₀ such that y⁽ⁱ⁾ (w⋅ x⁽ⁱ⁾ + w₀) > 0 for every ith data
point, where y⁽ⁱ⁾ is the label

(Figure: the decision boundary x \cdot w + w_0 = 0 separates the region
x \cdot w + w_0 > 0 from the region x \cdot w + w_0 < 0; the lines
x \cdot w + w_0 = +1 and x \cdot w + w_0 = -1 lie on either side of it,
at a distance 1/\|w\| from the boundary.)

17
Perceptron Learning (flowchart)

Start
→ Initialize weights and bias to small random numbers or zeros
→ For each training sample:
    → Compute the weighted sum of inputs
    → Apply the activation function
    → If (error != 0): update the weights and bias based on the error
→ Print the predicted output
→ Stop
18
Perceptron Learning Algorithm
Step 1: Initialize the weights and bias to random values.

Step 2: For each training example, compute the predicted output using the
current weights and bias.

Step 3: Update the weights and bias based on the error between the
predicted output and the true output. The update rule is given by:

▪ wi = wi + learning_rate * (y - ypred) * xi

▪ bias = bias + learning_rate * (y - ypred)

Step 4: Repeat steps 2 and 3 until the algorithm converges or a maximum
number of iterations is reached.

19
Perceptron Learning Algorithm
• Initialize the weights w to small random numbers or zeros
• Initialize the bias b to a small random number or zero
• Set the learning rate η
• For each epoch (a complete pass through the training dataset):
  • For each training sample (x, y):
    • Compute the weighted sum (net input) z:
        z = w \cdot x + b
    • Apply the activation function to get the predicted output \hat{y}:
        \hat{y} = \begin{cases} 1, & \text{if } z \ge 0 \\ 0, & \text{if } z < 0 \end{cases}
    • Update the weights and bias based on the error e = y - \hat{y}:
        w = w + \eta \cdot e \cdot x
        b = b + \eta \cdot e
• Repeat the training process until convergence or for a fixed number of
epochs (see the code sketch below)
20
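A minimal NumPy sketch of this learning loop, assuming 0/1 labels and a step activation; the function name, hyperparameters, and the AND-gate example are illustrative, not part of the slides:

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, epochs=20):
    """Perceptron learning rule: w += eta * (y - y_hat) * x, b += eta * (y - y_hat)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):
            z = np.dot(w, xi) + b            # weighted sum (net input)
            y_hat = 1 if z >= 0 else 0       # step activation
            e = target - y_hat               # error
            w += eta * e * xi                # weight update
            b += eta * e                     # bias update
            errors += int(e != 0)
        if errors == 0:                      # convergence: no misclassifications this epoch
            break
    return w, b

# Example: learn the AND gate
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
print(train_perceptron(X, y))
```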
What is convergence?
Convergence is typically achieved when the
weights and bias do not change
significantly between epochs or when the
classification error becomes zero

21
The update rule

    \text{if } y\,(x \cdot w_{old} + \text{bias}) \le 0: \quad w_{new} = w_{old} + (y - y')\,x \quad \text{(with a corresponding update to the bias)}

▪ The update rule increases the weights of the features that
contribute positively to the correct classification of the misclassified
data point x
▪ The update rule decreases the weights of the features that
contribute negatively to the correct classification
▪ The update is proportional to the feature values and the class label
of the misclassified data point

22
Perceptron Learning with Logic Gates
▪ AND Gate

▪ Initialize w1 and w2 as 1, and b as –1.5

23
Perceptron Learning with Logic Gates contd.
▪ OR Gate

▪ Initialize w1 and w2 as 1, and b as –0.5 (see the verification sketch below)

24
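As a quick check (not from the slides), the sketch below verifies that these hand-picked weights reproduce the AND and OR truth tables with a step activation; the helper name is illustrative:

```python
import numpy as np

def perceptron_output(x, w, b):
    """Step-activated perceptron: 1 if w.x + b >= 0, else 0."""
    return 1 if np.dot(w, x) + b >= 0 else 0

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# AND gate: w1 = w2 = 1, b = -1.5
print([perceptron_output(x, np.array([1, 1]), -1.5) for x in inputs])  # [0, 0, 0, 1]

# OR gate: w1 = w2 = 1, b = -0.5
print([perceptron_output(x, np.array([1, 1]), -0.5) for x in inputs])  # [0, 1, 1, 1]
```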
Limitations of Perceptron
▪ Limited to linearly separable problems
▪ If the data is not linearly separable, the perceptron algorithm will not
converge and cannot learn a correct decision boundary

25
Limitations of Perceptron
▪ Binary classification only
▪ It cannot handle multiclass classification problems without
modifications

26
Limitations of Perceptron
▪ Sensitive to input scaling and bias
▫ If the input features are not scaled properly, or the bias term is not
set correctly, the algorithm may not converge or may converge to a
suboptimal solution

27
Limitations of Perceptron
▪ Can get stuck in local optima
▫ This means that it may not find the global optimal solution if the
initial weights are not set properly or if the optimization landscape is
complex

28
MLP
▪ After Rosenblatt's perceptron was developed in the 1950s, there
was a lack of interest in neural networks
▪ In 1986, Dr. Hinton and his colleagues developed the
backpropagation algorithm to train multilayer neural networks

29
Multi-Layer Perceptron (MLP)
▪ A multilayer perceptron is a fully connected class of feedforward
artificial neural network
▪ Multilayer perceptrons are sometimes colloquially referred to as
"vanilla" neural networks, especially when they have a single hidden
layer

30
Common Uses of MLP

MLP
31
MLP contd.
▪ An MLP consists of at least three layers of nodes: an input layer, a
hidden layer, and an output layer
▪ Except for the input nodes, each node is a neuron that uses a
nonlinear activation function
▪ MLP utilizes a supervised learning technique called backpropagation
for training
▪ Its multiple layers and non-linear activation distinguish MLP from a
linear perceptron
▪ It can distinguish data that is not linearly separable

32
Typical MLP Network

33
Perceptron Vs. MLP

34
What is MLP?
▪ It is a neural network where the mapping between inputs and
output is non-linear
▪ A Multilayer Perceptron has input and output layers, and one or more hidden
layers with many neurons stacked together
▪ While in the Perceptron the neuron must use an activation function that
imposes a threshold (a step function), neurons in a Multilayer Perceptron
can use any arbitrary activation function, such as ReLU or sigmoid

35
MLP Topology
▪ Input Layer: one neuron per input value or column in your dataset
▪ Hidden Layer(s): they are called hidden layers because they are not
directly exposed to the input
▪ Output Layer: a regression or binary classification problem may have a
single output neuron, whereas multiclass classification uses multiple
neurons
36
MLP as Feed Forward Algorithm
▪ The inputs are combined with the initial weights in a weighted
sum and subjected to the activation function
▪ Each linear combination is propagated to the next layer
▪ Each layer feeds the next one with the result of its
computation
▪ This goes all the way through the hidden layers to the output layer

▪ If the algorithm
▫ only computed the weighted sums in each neuron, and
▫ propagated the results to the output layer and stopped there,
then it wouldn’t be able to learn the weights that minimize the cost
function.
37
Feed Forward Algorithm
1. Initialize the weights and biases for all layers in the
network
2. Assign the input features to the input layer of the
network
3. For each hidden layer and the output layer:
• Compute the weighted sum of inputs for each neuron in
the layer:

    \theta_j = \sum_i w_{ij} \cdot x_i + b_j

• Apply the activation function to the weighted sum to get the
output (activation) of each neuron:

    O_j = \varphi(\theta_j)
• Compute the final outputs of the network using the
activations of the last hidden layer and the output
layer's weights and biases.

38
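A minimal NumPy sketch of this feed-forward pass for a network with one hidden layer and sigmoid activations; the layer sizes, names, and random initialization are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Feed-forward pass: input layer -> hidden layer -> output layer."""
    theta1 = W1 @ x + b1        # weighted sums of the hidden layer
    o1 = sigmoid(theta1)        # hidden-layer activations
    theta2 = W2 @ o1 + b2       # weighted sums of the output layer
    o2 = sigmoid(theta2)        # final network outputs
    return o1, o2

# Example: 3 inputs, 4 hidden neurons, 2 outputs, random initial weights
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)
_, y_hat = forward(np.array([1.0, 0.5, -0.2]), W1, b1, W2, b2)
print(y_hat)
```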
What is Backpropagation?
▪ Backpropagation, short for “backward propagation of errors,” is a
fundamental concept in the training of artificial neural networks
▪ It’s a method used to calculate the gradient of the loss function with
respect to the weights of the network

39
Loss Function Vs. Cost Function

▪ Loss function: used when we refer to the error for a single training example

    L = (\hat{y}_i - y_i)^2

▪ Cost function: used to refer to an average of the loss functions over the entire training data

    J(W, B) = \frac{1}{2n} \sum_i (\hat{y}_i - y_i)^2
40
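A small illustration of the distinction, assuming the squared-error form shown above (the numbers are made up):

```python
import numpy as np

y_hat = np.array([0.9, 0.2, 0.7])   # predictions for three training examples
y = np.array([1.0, 0.0, 1.0])       # true labels

loss_per_example = (y_hat - y) ** 2                 # L: error of a single training example
cost = loss_per_example.sum() / (2 * len(y))        # J(W, B): scaled average over the dataset
print(loss_per_example, cost)
```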
MLP with Backpropagation
▪ All the weights in the network are updated through backpropagation,
not just those of the layer closest to the output
▪ The updates occur layer by layer, starting from the output layer and
moving towards the input layer

(Diagram: layers L_0, L_1, L_2, ..., L_p, ..., L_{K-1}, L_K with outputs O_0, O_1, O_2, ..., O_p, ..., O_{K-1}, O_K; the gradients \nabla_{W_1}E, \nabla_{W_2}E, ..., \nabla_{W_K}E are computed for every layer's weights, flowing backwards from the output layer.)

41
MLP with Backpropagation

42
MLP Learning Algorithm
1. Initialize the weights and biases randomly
2. For each epoch do:
   a) For each input sample do:
       Feed the input sample into the network and compute the output
       Calculate the error between the output and the desired output
       Backpropagate the error through the network, updating the
      weights and biases using the gradient descent algorithm
   b) Calculate the total error for the epoch
   c) If the error is below a specified threshold, stop training and return the
      weights and biases
3. Return the trained weights and biases

43
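The sketch below is one possible NumPy implementation of this loop for a single-hidden-layer network with sigmoid activations and the squared-error cost used later in these slides; the layer sizes, learning rate, stopping threshold, and the XOR example are assumptions of mine, not the slides' reference implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, T, n_hidden=4, eta=0.5, epochs=5000, tol=1e-3):
    """Train a 1-hidden-layer MLP with backpropagation and squared error."""
    rng = np.random.default_rng(0)
    W1 = rng.normal(scale=0.5, size=(n_hidden, X.shape[1])); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(T.shape[1], n_hidden)); b2 = np.zeros(T.shape[1])

    for _ in range(epochs):
        total_error = 0.0
        for x, t in zip(X, T):
            # forward pass
            o1 = sigmoid(W1 @ x + b1)                 # hidden activations
            o2 = sigmoid(W2 @ o1 + b2)                # network output
            total_error += 0.5 * np.sum((o2 - t) ** 2)

            # backward pass: deltas for the output and hidden layers
            delta2 = (o2 - t) * o2 * (1 - o2)         # output-layer delta
            delta1 = (W2.T @ delta2) * o1 * (1 - o1)  # hidden-layer delta

            # gradient-descent updates
            W2 -= eta * np.outer(delta2, o1); b2 -= eta * delta2
            W1 -= eta * np.outer(delta1, x);  b1 -= eta * delta1

        if total_error < tol:                         # stop once the epoch error is small
            break
    return W1, b1, W2, b2

# Example: learn XOR, which a single perceptron cannot represent
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
W1, b1, W2, b2 = train_mlp(X, T)
for x in X:
    print(x, sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2))
```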
MLP Learning (flowchart)

Start
→ Forward Pass: the network takes the input data and moves it forward
  through the layers to make a prediction
→ Compute Gradients: the network computes the gradient of the loss
  function with respect to each weight in the network
→ Backward Pass: the calculated gradients are then propagated back
  through the network
→ The weights are updated using the gradients and an optimization
  algorithm
→ If (error != 0): repeat from the forward pass; otherwise Stop
44
Perceptron Learning with Linear Transformation
(Diagram: input X_i is combined with weights W to produce the prediction \hat{y}_i = W^T \cdot X_i.)

Loss/Cost/Error Function

    E = \frac{1}{2} \sum_{i=1}^{n} (W^T \cdot X_i - y_i)^2
      = \frac{1}{2} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2, \qquad \text{where } \hat{y}_i = W^T \cdot X_i

Gradient

    \nabla_W E = \sum_{i=1}^{n} (\hat{y}_i - y_i) \cdot X_i

Weight Update Rule

    W = W - \eta \cdot \sum_{i=1}^{n} (\hat{y}_i - y_i) \cdot X_i

45
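A one-step NumPy illustration of this batch update rule; the data, learning rate, and initial weights are made-up values for demonstration:

```python
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])   # n samples, 2 features
y = np.array([1.0, 0.0, 1.0])
W = np.zeros(2)
eta = 0.01

y_hat = X @ W                       # linear predictions W^T . X_i for every sample
grad = X.T @ (y_hat - y)            # sum_i (y_hat_i - y_i) * X_i
W = W - eta * grad                  # gradient-descent weight update
print(W)
```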
Learning rate 𝜂
• The learning rate controls how much the weights of the neural network
are adjusted with respect to the loss gradient
• A smaller learning rate means smaller adjustments, leading to a more
gradual convergence
• Conversely, a larger learning rate results in larger adjustments, which can
speed up convergence but also risks overshooting the optimal solution

46
Perceptron Learning with Non-linear Transformation

(Diagram: the weighted sum W^T \cdot X is passed through the sigmoid \sigma to produce the prediction.)

    \hat{y}_i = \sigma(W^T \cdot X_i)
47
Perceptron Learning with Non-linear Transformation
Loss/Cost/Error Function

    E = \frac{1}{2} \sum_{i=1}^{n} (\sigma(W^T \cdot X_i) - y_i)^2
      = \frac{1}{2} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2, \qquad \text{where } \hat{y}_i = \sigma(W^T \cdot X_i)

Gradient

    \nabla_W E = \sum_{i=1}^{n} \hat{y}_i (1 - \hat{y}_i)(\hat{y}_i - y_i) \cdot X_i

Weight Update Rule

    W = W - \eta \cdot \sum_{i=1}^{n} \hat{y}_i (1 - \hat{y}_i)(\hat{y}_i - y_i) \cdot X_i

48
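The same one-step illustration with the sigmoid non-linearity; the extra factor y_hat(1 - y_hat) is the derivative of the sigmoid (the data values are again made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
y = np.array([1.0, 0.0, 1.0])
W = np.zeros(2)
eta = 0.1

y_hat = sigmoid(X @ W)                               # y_hat_i = sigmoid(W^T . X_i)
grad = X.T @ (y_hat * (1 - y_hat) * (y_hat - y))     # sum_i y_hat(1 - y_hat)(y_hat - y) X_i
W = W - eta * grad                                   # gradient-descent weight update
print(W)
```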
Single Layer Neural Network with Non-linearity
(Diagram: inputs X, weighted by W, feed three output neurons o_1, o_2, o_3.)

Output

    o_j = \frac{1}{1 + e^{-\theta_j}}

Weighted Sum

    \theta_j = \sum_{i=0}^{n} W_{ij} \cdot X_i

Loss/Cost/Error Function

    E = \frac{1}{2} \sum_{j=1}^{m} (o_j - t_j)^2

49
Single Layer Neural Network with Non-linearity
(Diagram: the computation flows W \to \theta \to o \to E.)

Loss/Cost/Error Function

    E = \frac{1}{2} \sum_{j=1}^{m} (o_j - t_j)^2

Gradient (chain rule)

    \frac{\partial E}{\partial W_{ij}} = \frac{\partial E}{\partial o_j} \cdot \frac{\partial o_j}{\partial \theta_j} \cdot \frac{\partial \theta_j}{\partial W_{ij}}
                                       = (o_j - t_j)\, o_j (1 - o_j)\, X_i

Weight Update Rule

    W_{ij} = W_{ij} - \eta \cdot o_j (1 - o_j)(o_j - t_j) \cdot X_i

50
Multiple Hidden Layer & Multiple Output
(Diagram: input X passes through layers 0, 1, 2, ..., p, ..., K-1 to the output layer K, which produces the outputs o_1^K, o_2^K, o_3^K; each layer also has a bias unit fixed at 1.)

51
Error – Output Layer
(Diagram: the activations O_i^{K-1} of the last hidden layer feed, through weights W_{ij}^K, the output neurons O_1^K, O_2^K, O_3^K.)

Output

    O_j^K = \frac{1}{1 + e^{-\theta_j^K}}

Weighted Sum

    \theta_j^K = \sum_{i=0}^{M_{K-1}} W_{ij}^K \cdot O_i^{K-1}

Loss/Cost/Error Function

    E = \frac{1}{2} \sum_{j=1}^{M_K} (O_j^K - t_j)^2

Gradient

    \frac{\partial E}{\partial W_{ij}^K} = \frac{\partial E}{\partial O_j^K} \cdot \frac{\partial O_j^K}{\partial \theta_j^K} \cdot \frac{\partial \theta_j^K}{\partial W_{ij}^K}
                                         = (O_j^K - t_j)\, O_j^K (1 - O_j^K)\, O_i^{K-1}

    \text{Let } \delta_j^K = (O_j^K - t_j)\, O_j^K (1 - O_j^K), \text{ so that } \frac{\partial E}{\partial W_{ij}^K} = \delta_j^K\, O_i^{K-1}

Weight Update Rule

    W_{ij}^K = W_{ij}^K - \eta \cdot \delta_j^K\, O_i^{K-1}

52
Error – Hidden Layer

(Diagram: activations O_p^{K-2} of layer K-2 feed, through weights W_{pi}^{K-1}, the hidden neurons O_1^{K-1}, O_2^{K-1}, O_3^{K-1}, which in turn feed the output neurons O_j^K.)

Output

    O_i^{K-1} = \frac{1}{1 + e^{-\theta_i^{K-1}}}

Weighted Sum

    \theta_i^{K-1} = \sum_{p=0}^{M_{K-2}} W_{pi}^{K-1} \cdot O_p^{K-2}

Loss/Cost/Error Function

    E = \frac{1}{2} \sum_{j=1}^{M_K} (O_j^K - t_j)^2
53
Error – Hidden Layer
Gradient (chain rule through the output layer)

    \frac{\partial E}{\partial W_{pi}^{K-1}} = \frac{\partial E}{\partial O_i^{K-1}} \cdot \frac{\partial O_i^{K-1}}{\partial \theta_i^{K-1}} \cdot \frac{\partial \theta_i^{K-1}}{\partial W_{pi}^{K-1}}

where, using the definitions above,

    \frac{\partial O_i^{K-1}}{\partial \theta_i^{K-1}} \cdot \frac{\partial \theta_i^{K-1}}{\partial W_{pi}^{K-1}} = O_i^{K-1}(1 - O_i^{K-1})\, O_p^{K-2}

    \frac{\partial E}{\partial O_i^{K-1}} = \sum_{j=1}^{M_K} \frac{\partial E}{\partial O_j^K} \cdot \frac{\partial O_j^K}{\partial \theta_j^K} \cdot \frac{\partial \theta_j^K}{\partial O_i^{K-1}}
                                          = \sum_{j=1}^{M_K} (O_j^K - t_j)\, O_j^K (1 - O_j^K)\, W_{ij}^K
                                          = \sum_{j=1}^{M_K} \delta_j^K W_{ij}^K, \qquad \text{with } \delta_j^K = (O_j^K - t_j)\, O_j^K (1 - O_j^K)

54
MLP with Backpropagation
    \frac{\partial E}{\partial W_{pi}^{K-1}} = O_i^{K-1}(1 - O_i^{K-1})\, O_p^{K-2} \sum_{j=1}^{M_K} \delta_j^K W_{ij}^K

    \text{Let } \delta_i^{K-1} = O_i^{K-1}(1 - O_i^{K-1}) \sum_{j=1}^{M_K} \delta_j^K W_{ij}^K,
    \quad \text{so that} \quad \frac{\partial E}{\partial W_{pi}^{K-1}} = \delta_i^{K-1}\, O_p^{K-2}

Weight Update Rule

    W_{pi}^{K-1} = W_{pi}^{K-1} - \eta \cdot \delta_i^{K-1}\, O_p^{K-2}

55
MLP Numerical Example contd.
Weight Change (note the sign convention here: \delta_j uses (t_j - o_j), so the change is added to the weight)

    \Delta W_{ij} = \eta \cdot \delta_j \cdot o_i

Update Rule for Backpropagation – Output layer

    \delta_j = (1 - o_j) \cdot o_j \cdot (t_j - o_j)

Update Rule for Backpropagation – Hidden layer

    \delta_j = (1 - o_j) \cdot o_j \cdot \sum_k \delta_k w_{kj}
56
