
Artificial Neural Network

Back-propagation learning
Supervised Learning

By
Dr. Anupam Ghosh, 10th Oct 2023
Neural Network Intro
Weights

h = σ(W₁x + b₁)
y = σ(W₂h + b₂)

Activation functions

How do we train?

For the 3-4-2 network shown (inputs x, hidden layer h, outputs y):
4 + 2 = 6 neurons (not counting inputs)
[3 × 4] + [4 × 2] = 20 weights
4 + 2 = 6 biases
26 learnable parameters
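Below is a minimal Python/NumPy sketch of this forward pass, h = σ(W₁x + b₁) followed by y = σ(W₂h + b₂), for the 3-4-2 topology above; the random initial values are illustrative, and the 26-parameter count from the figure is checked at the end.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# 3 inputs -> 4 hidden units -> 2 outputs
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # [3 x 4] = 12 weights, 4 biases
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # [4 x 2] = 8 weights, 2 biases

x = np.array([1.0, 0.0, 1.0])
h = sigmoid(W1 @ x + b1)   # h = sigma(W1 x + b1)
y = sigmoid(W2 @ h + b2)   # y = sigma(W2 h + b2)

n_params = W1.size + b1.size + W2.size + b2.size
print(y, n_params)         # 12 + 4 + 8 + 2 = 26 learnable parameters
```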

Demo
Training
Sample labeled data (batch) → Forward it through the network, get predictions → Back-propagate the errors → Update the network weights

Optimize (min. or max.) objective/cost function 𝑱(𝜽)


Generate error signal that measures difference between predictions
and target values

Use error signal to change the weights and get more accurate
predictions
Subtracting a fraction of the gradient moves you towards the
(local) minimum of the cost function
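A minimal sketch of that update (the quadratic cost J(w) = (w − 3)², the starting point, and the learning rate are illustrative assumptions, not from the slides): repeatedly subtracting a fraction of the gradient moves w toward the minimum.

```python
# Gradient descent on a toy cost J(w) = (w - 3)^2, whose gradient is 2*(w - 3).
def grad_J(w):
    return 2.0 * (w - 3.0)

w, eta = 0.0, 0.1              # initial weight and learning rate (illustrative)
for _ in range(100):
    w = w - eta * grad_J(w)    # subtract a fraction of the gradient
print(w)                       # approaches the minimum at w = 3
```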
Gradient Descent: An Illustration
[Figure: gradient descent on a 1-D loss L(w), showing successive iterates w(0), w(1), w(2), w(3) approaching a minimum w*]

Where the gradient is negative (∂L/∂w < 0), move in the positive direction; where the gradient is positive, move in the negative direction.
The learning rate is very important.
Gradient descent can get stuck at a local minimum, so good initialization is very important.
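The figure's point can be reproduced with a toy sketch (the non-convex loss, starting points, and learning rate below are illustrative assumptions, not taken from the slides): the same update rule lands in different minima depending on where the weight is initialized, which is why initialization matters.

```python
def L(w):       # a non-convex 1-D loss with two minima
    return w**4 - 3 * w**2 + w

def dL(w):      # its derivative dL/dw
    return 4 * w**3 - 6 * w + 1

def descend(w, eta=0.05, steps=200):
    for _ in range(steps):
        w -= eta * dL(w)   # negative gradient -> move right, positive -> move left
    return w

print(descend(w=-2.0))   # converges near w = -1.30, the lower (global) minimum
print(descend(w=+2.0))   # gets stuck near w = +1.13, a local minimum
```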
Multilayer Feed-Forward Neural Network
Back propagation Learning

 Initialize the weights: The weights in the network are initialized to small random numbers (e.g., ranging from -1.0 to 1.0, or -0.5 to 0.5). Each unit has a bias associated with it. The biases are similarly initialized to small random numbers.
 Each training tuple, X, is processed by the following steps.
 Propagate the inputs forward: calculate the output of each unit. For a unit j with inputs O_i from the previous layer, weights w_ij, and bias θ_j, the net input is I_j = Σ_i w_ij O_i + θ_j, and the output is computed with the Sigmoid activation function: O_j = 1 / (1 + e^(−I_j)).
 Properties of the Sigmoid function (activation):
 Takes a real-valued number and “squashes” it into the range between 0 and 1.
 Nice interpretation as the firing rate of a neuron:
0 = not firing at all
1 = fully firing
 Sigmoid neurons saturate and kill gradients, so the NN will barely learn when a neuron’s activation is 0 or 1 (saturated):
the gradient in these regions is almost zero,
almost no signal will flow to its weights,
and if the initial weights are too large then most neurons will saturate.
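A small sketch of the sigmoid and its saturation behaviour (the sample inputs are illustrative): the derivative σ(z)(1 − σ(z)) is at most 0.25 and nearly zero when the output is close to 0 or 1, so almost no gradient flows back through a saturated neuron.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
s = sigmoid(z)
grad = s * (1.0 - s)    # derivative of the sigmoid
print(s)                # squashed into the range (0, 1)
print(grad)             # ~0 at the saturated ends, 0.25 at z = 0
```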
Back propagate the error

 The error is propagated backward by updating the weights and biases to reflect the error of the network’s prediction. For a unit j in the output layer, the error is computed by
Err_j = O_j (1 − O_j)(T_j − O_j)
where O_j is the actual output of unit j and T_j is the known target value.
 To compute the error of a hidden layer unit j, the weighted sum of the errors of the units connected to unit j in the next layer is considered. The error of a hidden layer unit j is
Err_j = O_j (1 − O_j) Σ_k Err_k w_jk
where w_jk is the weight of the connection from unit j to a unit k in the next higher layer, and Err_k is the error of unit k.
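A sketch of those two error formulas in NumPy (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def output_error(O, T):
    # Err_j = O_j (1 - O_j) (T_j - O_j) for units in the output layer
    return O * (1.0 - O) * (T - O)

def hidden_error(O_hidden, Err_next, W_next):
    # Err_j = O_j (1 - O_j) * sum_k Err_k * w_jk, where W_next[j, k] is the
    # weight from hidden unit j to unit k in the next higher layer
    return O_hidden * (1.0 - O_hidden) * (W_next @ Err_next)
```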
Updating of weights and biases

 The weights and biases are updated to reflect the propagated errors:
Δw_ij = (l) Err_j O_i,   w_ij = w_ij + Δw_ij
Δθ_j = (l) Err_j,   θ_j = θ_j + Δθ_j
 The variable l is the learning rate, a constant typically having a value between 0.0 and 1.0.

 Back propagation learns using a gradient descent method to search for a set of weights that fits the
training data so as to minimize the mean-squared distance between the network’s class prediction
and the known target value of the tuples
 The learning rate helps avoid getting stuck at a local minimum in decision space (i.e., where the
weights appear to converge, but are not the optimum solution) and encourages finding the global
minimum
 If the learning rate is too small, then learning will occur at a very slow pace. If the learning rate is too large, then oscillation between inadequate solutions may occur. A rule of thumb is to set the learning rate to 1/t, where t is the number of iterations through the training set so far.
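A sketch of the corresponding update step (function and variable names are illustrative), assuming W[i, j] holds the weight from unit i to unit j and O_prev holds the outputs of the previous layer:

```python
import numpy as np

def update(W, theta, err, O_prev, lr):
    # delta_w_ij = (l) Err_j O_i ;  w_ij = w_ij + delta_w_ij
    W += lr * np.outer(O_prev, err)
    # delta_theta_j = (l) Err_j ;  theta_j = theta_j + delta_theta_j
    theta += lr * err
    return W, theta

# Example call with dummy values
W, theta = np.zeros((2, 1)), np.zeros(1)
W, theta = update(W, theta, err=np.array([0.1]), O_prev=np.array([0.5, 0.3]), lr=0.9)
```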
Terminating condition

Training stops when:
 All Δw in the previous epoch are so small as to be below some specified threshold, or
 The percentage of tuples misclassified in the previous epoch is below some threshold, or
 A pre-specified number of epochs has expired.
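A compact, runnable sketch tying the steps together (the logical-OR data, the 2-2-1 topology, and the hyperparameter values are illustrative assumptions, not from the slides), including the three terminating conditions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0.0, 1.0, 1.0, 1.0])   # logical OR, a toy target

# 2-2-1 network; weights and biases initialised to small random numbers in [-0.5, 0.5]
W1 = rng.uniform(-0.5, 0.5, size=(2, 2)); b1 = rng.uniform(-0.5, 0.5, size=2)
W2 = rng.uniform(-0.5, 0.5, size=2);      b2 = rng.uniform(-0.5, 0.5)

lr, max_epochs, w_tol, err_tol = 0.5, 10_000, 1e-6, 0.0
for epoch in range(max_epochs):
    max_dw, misclassified = 0.0, 0
    for x, t in zip(X, T):
        h = sigmoid(W1 @ x + b1)               # propagate the inputs forward
        o = sigmoid(W2 @ h + b2)
        err_o = o * (1 - o) * (t - o)          # output-layer error
        err_h = h * (1 - h) * (W2 * err_o)     # hidden-layer error
        dW2, dW1 = lr * err_o * h, lr * np.outer(err_h, x)
        W2 += dW2; b2 += lr * err_o            # update weights and biases
        W1 += dW1; b1 += lr * err_h
        max_dw = max(max_dw, np.max(np.abs(dW1)), np.max(np.abs(dW2)))
        misclassified += int((o > 0.5) != (t > 0.5))
    # stop on tiny weight changes, low misclassification rate, or epoch limit
    if max_dw < w_tol or misclassified / len(X) <= err_tol:
        break
print(epoch, misclassified)
```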
Demonstration:

 The figure shows a multilayer feed-forward neural network. Let the learning rate be 0.9. The initial weight and bias values of the network are given in Table 9.1, along with the first training tuple, X = (1, 0, 1), with a class label of 1.

Solution:
Weight and bias values after the first iteration.
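A sketch of how that first iteration could be computed, assuming a network with 3 inputs, 2 hidden units, and 1 output; the figure and Table 9.1 are not reproduced in this text, so the initial weight and bias values below are placeholders rather than the Table 9.1 values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Placeholder initial weights and biases (Table 9.1 is not reproduced here)
W1 = np.array([[0.1, -0.2, 0.3],       # weights into the two hidden units
               [-0.1, 0.2, 0.1]])
b1 = np.array([0.1, -0.1])
W2 = np.array([0.2, -0.3])             # weights into the single output unit
b2 = 0.1
lr = 0.9                               # learning rate from the demonstration

x, t = np.array([1.0, 0.0, 1.0]), 1.0  # first training tuple X, class label 1

h = sigmoid(W1 @ x + b1)               # propagate the inputs forward
o = sigmoid(W2 @ h + b2)

err_o = o * (1 - o) * (t - o)          # back-propagate the error
err_h = h * (1 - h) * (W2 * err_o)

W2 = W2 + lr * err_o * h;  b2 = b2 + lr * err_o          # update weights and biases
W1 = W1 + lr * np.outer(err_h, x);  b1 = b1 + lr * err_h
print(o, W1, b1, W2, b2)
```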
Problem Statement

 Network topology: 3-(2-2)-2
 Weight and bias values are initialized to 0
 For the input set {1, 0, 1}, the class label will be {1, 0}

Q: What will be the updated weight and bias values after the 2nd iteration if the learning rate = 1?
