Lect 4
Cont.
• For a target (t) and an actual output (o), the error is given by the following mean square error cost function.
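A standard form of this cost function, stated here as an assumption consistent with the error signals used later in the backpropagation derivation:

E = \frac{1}{2} \sum_{k} \left( t_k - o_k \right)^2

where t_k is the desired target and o_k is the actual output of output neuron k; the factor of one half simplifies the derivatives used in the weight updates.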
Batch Training
• Batch Training: In batch mode the weights and biases of the
network are updated only after the entire training set has been
applied to the network. The gradients calculated at each training
example are added together to determine the change in the weights
and biases.
• Batch Gradient Descent: In the batch steepest descent training function the weights and biases are updated in the direction of the negative gradient of the performance function, as sketched below.
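A minimal sketch of one batch update for a single linear neuron, assuming a mean square error performance function; the names (inputs, targets, weights, bias, eta) are illustrative, not from the slides:

import numpy as np

def batch_update(weights, bias, inputs, targets, eta=0.1):
    # One batch gradient-descent step for a single linear neuron.
    # The gradient contributions of all training examples are accumulated
    # before the weights and bias are changed (batch mode).
    outputs = inputs @ weights + bias      # forward pass over the whole training set
    errors = targets - outputs             # (t - o) for every example
    # move in the direction of the negative gradient of the mean square error
    weights = weights + eta * inputs.T @ errors / len(inputs)
    bias = bias + eta * errors.mean()
    return weights, bias

In batch mode the correction uses the average of the per-example gradients; the incremental LMS rule on the following slides applies the same kind of correction one example at a time.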
Cont.
[Figures: local and global minima of the error surface; the effect of adding momentum]
• The weight update rule: ΔWi = η (t − o) xi
• Also called the Least Mean Square, LMS, method.
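A minimal sketch of this incremental LMS (Widrow-Hoff) update for a single linear, ADALINE-style neuron; the function and variable names are illustrative:

import numpy as np

def lms_step(weights, x, t, eta=0.1):
    # Widrow-Hoff / LMS rule: delta_w_i = eta * (t - o) * x_i
    o = np.dot(weights, x)          # output of the linear neuron
    return weights + eta * (t - o) * x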
Cont.
• An adaptive linear system responds to changes in its environment
as it is operating.
• These networks are often used in error cancellation, signal
processing, and control systems. For example, they are used by
many long distance phone lines for echo cancellation.
• The pioneering work in this field was done by Widrow and Hoff,
who gave the name ADALINE to adaptive linear elements.
Cont.
• Multiple layer ADALINE is called MADALINE.
• The Widrow-Hoff rule can only train single-layer linear networks.
• This is not much of a disadvantage; single-layer linear networks are
just as capable as multilayer linear networks.
• For every multilayer linear network, there is an equivalent single-
layer linear network.
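A one-line justification, assuming layer weight matrices W1 and W2 and no nonlinear activations: the composition of linear maps is itself linear,

o = W_2 (W_1 x) = (W_2 W_1) x = W x, \qquad W = W_2 W_1

so the two-layer linear network computes exactly what a single-layer network with weight matrix W computes.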
BACKPROPAGATION ALGORITHM
BackPropagation Algorithm
• The backpropagation algorithm was made popular by Rumelhart, Hinton and Williams in 1986 ["Learning Internal Representations by Error Propagation", in Rumelhart, David E.; McClelland, James L. (eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, Cambridge: MIT Press, ISBN 0-262-18120-7].
• The researchers used semi-linear neurons with differentiable activation
functions in the hidden neurons (logistic activation functions or
sigmoids).
Cont.
• The error between the target and actual output is calculated at
every iteration and is back propagated through the layers of the
ANN to adapt the weights.
• The weights are adapted such that the error is minimized.
• Once the error has reached a specified minimum value, the training is stopped.
• Among the first applications of the BP algorithm was NETtalk, a speech synthesis system developed by Terence Sejnowski [Sejnowski & Rosenberg, 1987, "Parallel Networks that Learn to Pronounce English Text", Complex Systems 1, 145-168].
Cont.
• The configuration for training a neural network using the BP
algorithm is shown in the figure below.
Cont.
• We need to obtain the following rule to adapt the weights between the output (k) and hidden (j) layers:

ΔWkj(t) = η δk oj + α ΔWkj(t − 1)

• where t is the iteration number and δk is the error signal between the output and hidden layers, given by:

δk = ok (1 − ok)(tk − ok)
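As a brief justification, and assuming the cost E = ½ Σk (tk − ok)² with the logistic activation (c = 1), the chain rule gives

\frac{\partial E}{\partial W_{kj}} = \frac{\partial E}{\partial o_k} \cdot \frac{\partial o_k}{\partial net_k} \cdot \frac{\partial net_k}{\partial W_{kj}} = -(t_k - o_k)\, o_k (1 - o_k)\, o_j

so stepping against this gradient yields ΔWkj = η δk oj with δk = ok(1 − ok)(tk − ok).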
Cont.
• Adaptation of the weights between the input (i) and hidden (j) layers:

ΔWji(t) = η δj xi + α ΔWji(t − 1)

• where δj is the error signal of hidden neuron j, given by:

δj = oj (1 − oj) Σk δk Wkj
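The hidden-layer error signal also follows from the chain rule, since each hidden output o_j feeds every output neuron k:

\delta_j = \frac{\partial o_j}{\partial net_j} \sum_k \delta_k W_{kj} = o_j (1 - o_j) \sum_k \delta_k W_{kj}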
Backpropagation Algorithm
• The following ANN model is used to derive the backpropagation
algorithm:
BP (cont.)
• The backpropagation has two steps,
• Forward propagation, and
• Backward propagation.
• Our ANN model has the following assumptions (a forward-pass sketch follows this list):
• A two-layer multilayer NN model, i.e. one with a single hidden layer of neurons.
• Neurons in layer i are fully connected to layer j and neurons in
layer j are fully connected to layer k.
• Input layer neurons have linear activation functions and hidden
and output layer neurons have logistic activation functions
(sigmoids).
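A minimal sketch of forward propagation under these assumptions (linear input layer, logistic hidden and output layers); the names W_ji, theta_j, W_kj, theta_k are illustrative:

import numpy as np

def sigmoid(net, c=1.0):
    # logistic activation with slope c (c = 1 in these slides)
    return 1.0 / (1.0 + np.exp(-c * net))

def forward(x, W_ji, theta_j, W_kj, theta_k):
    # x       : input vector (linear input neurons simply pass their values on)
    # W_ji    : hidden-layer weights, shape (n_hidden, n_inputs)
    # theta_j : hidden-layer bias weights (bias signal = 1)
    # W_kj    : output-layer weights, shape (n_outputs, n_hidden)
    # theta_k : output-layer bias weights (bias signal = 1)
    o_j = sigmoid(W_ji @ x + theta_j)    # hidden-layer outputs
    o_k = sigmoid(W_kj @ o_j + theta_k)  # output-layer outputs
    return o_j, o_k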
Cont.
• When c is large, the sigmoid behaves like a threshold function, and when c is small, it becomes more like a straight line (linear).
• When c is large, learning is much faster but a lot of information is lost; when c is small, learning is very slow but information is retained.
• Since this function is differentiable, it enables the B.P. algorithm to
adapt the lower layers of weights in a multilayer neural network.
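For reference, the logistic (sigmoid) activation with slope parameter c and its derivative, written here in the standard form these slides assume:

f(net) = \frac{1}{1 + e^{-c \cdot net}}, \qquad f'(net) = c \, f(net) \, \bigl(1 - f(net)\bigr)

With c = 1 the derivative is simply o(1 − o), which is the factor that appears in the error signals δk and δj.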
Cont.
• The sigmoid slope used here is c = 1.
• Bias weights are used with bias signals of 1 for the hidden (j) and output layer (k) neurons.
• In many ANN models, bias weights (θ) with bias signals of 1 are used to speed up the convergence process.
• The learning parameter is given by the symbol η and is usually fixed at a value between 0 and 1; however, in many applications nowadays an adaptive η is used.
• Usually η is set large in the initial stage of learning and reduced to a small value at the final stage of learning.
• A momentum term α is also used in the generalized delta rule (G.D.R.) to avoid local minima.
Steps of BP Algorithm
• Step 1: Obtain a set of training patterns.
• Step 2: Set up neural network model: No. of Input neurons, Hidden
neurons, and Output Neurons.
• Step 3: Set learning rate η and momentum rate α
• Step 4: Initialize all connection Wji , Wkj and bias weights θj θk to
random values.
• Step 5: Set minimum error, Emin
• Step 6: Start training by applying the input patterns one at a time, propagating through the layers, then calculating the total error.
Cont.
• Step 7: Backpropagate error through output and hidden layer and
adapt weights.
• Step 8: Backpropagate error through hidden and input layer and
adapt weights.
• Step 9: Check if Error < Emin
• If not, repeat Steps 6-9; if yes, stop training. (A code sketch of these steps is given below.)
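A compact sketch of Steps 1-9 for the XOR patterns used on the next slides, assuming the logistic activation with c = 1, the update rules reconstructed earlier, and 2 hidden neurons; the random initial weights here are illustrative, not the specific values on the slides:

import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

# Step 1: training patterns (XOR) and their targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 1, 1, 0], dtype=float)

# Step 2: network model - 2 input, 2 hidden, 1 output neuron
n_in, n_hid, n_out = 2, 2, 1

# Step 3: learning rate and momentum rate
eta, alpha = 0.5, 0.8

# Step 4: random initial connection and bias weights
rng = np.random.default_rng(0)
W_ji = rng.uniform(-0.5, 0.5, (n_hid, n_in)); theta_j = rng.uniform(-0.5, 0.5, n_hid)
W_kj = rng.uniform(-0.5, 0.5, (n_out, n_hid)); theta_k = rng.uniform(-0.5, 0.5, n_out)
dW_kj = np.zeros_like(W_kj); dW_ji = np.zeros_like(W_ji)

# Step 5: minimum error
E_min = 0.01

for epoch in range(20000):
    E_total = 0.0
    for x, t in zip(X, T):
        # Step 6: forward propagation and error for this pattern
        o_j = sigmoid(W_ji @ x + theta_j)
        o_k = sigmoid(W_kj @ o_j + theta_k)
        E_total += 0.5 * np.sum((t - o_k) ** 2)

        # error signals (delta_j uses the weights as they are before adaptation)
        delta_k = o_k * (1 - o_k) * (t - o_k)
        delta_j = o_j * (1 - o_j) * (W_kj.T @ delta_k)

        # Step 7: adapt output-layer weights and biases (with momentum)
        dW_kj = eta * np.outer(delta_k, o_j) + alpha * dW_kj
        W_kj += dW_kj
        theta_k += eta * delta_k

        # Step 8: adapt hidden-layer weights and biases (with momentum)
        dW_ji = eta * np.outer(delta_j, x) + alpha * dW_ji
        W_ji += dW_ji
        theta_j += eta * delta_j

    # Step 9: stop training once the total error falls below E_min
    if E_total < E_min:
        break

The bias weights are adapted with the same error signals, using a constant bias signal of 1 as stated earlier.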
Cont.
• The training patterns of this ANN are the XOR example, as given in the following table:

Pattern   x1   x2   Target (t)
   1       0    0       0
   2       0    1       1
   3       1    0       1
   4       1    1       0
Cont.
• The ANN model and its initial weights are shown in the figure.
• Training begins when pattern#1 and its target are provided to the ANN.
• 1st pattern: [0, 0], target: 0
Cont.
• This error is now backpropagated through the layers following the error signal equations given below.
• Between the output (k) and hidden (j) layers:

δk = ok (1 − ok)(tk − ok)

• Between the hidden (j) and input (i) layers:

δj = oj (1 − oj) Σk δk Wkj

• With the initial weights of this example, the latter evaluates to −0.0035.
Cont.
• Now that the error signal between layers (k) and (j) has been calculated, the weight change ΔWkj = η δk oj is computed; with the values of this example it evaluates to −0.0064.
Cont.
• This is the increment of the weight between layers k and j after the first iteration.
• This change in weight is now added to the current weight:

Wkj(t + 1) = Wkj(t) + ΔWkj(t)

• and thus the weight between layers k and j has been adapted.
Cont.
• Similarly, for the weights between layers j and i, the adaptation follows

Wji(t + 1) = Wji(t) + ΔWji(t)

• and this is the adapted weight between layers j and i after pattern#1 has been seen by the ANN in the first iteration.
• The whole calculation is then repeated for the next pattern (pattern#2 = [0, 1]) with tk = 1.
• After all 4 patterns have been presented, the whole process is repeated, starting again from pattern#1.
UNSUPERVISED LEARNING
Unsupervised Learning
• Unsupervised learning is the process of finding structure, patterns
or correlation in the given data.
• Many times this type of learning depends on associative learning
procedures.
• We focus on two main approaches:
• Unsupervised Hebbian learning
  • Principal component analysis
• Unsupervised competitive learning
  • Clustering
Hebbian Learning
• An association principle was proposed by Hebb in 1949 in the
context of biological neurons.
• Hebb’s principle
When a neuron repeatedly excites another neuron, then the
threshold of the latter neuron is decreased, or the synaptic
weight between the neurons is increased, in effect increasing
the likelihood of the second neuron to be excited by the first.
Cont.
• Brilliant idea by Hebb (1949): cells that fire together, wire together.
[Figure: two connected neurons i and j, with input signals feeding the network and output signals leaving it]
Hebbian Learning
• Hebbian learning rule: Δwji = η yj xi
• Consider the update of a single weight w:

w(n + 1) = w(n) + η y(n) x(n)

• For a linear activation function, y(n) = w(n) x(n), so

w(n + 1) = w(n)[1 + η x²(n)]
• Weights increase without bounds. If initial weight is negative, then
it will increase in the negative range. If it is positive, then it will
increase in the positive range.
• Hebbian learning is naturally unstable.
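A tiny sketch illustrating this instability for a single linear neuron with one input; the input value, learning rate and initial weight are arbitrary illustrative choices:

# plain Hebbian updates for a linear neuron y = w * x, i.e. w(n+1) = w(n) * (1 + eta * x^2)
eta, x = 0.1, 1.5
w = 0.01                      # small positive initial weight
for n in range(50):
    y = w * x                 # linear activation
    w = w + eta * y * x       # Hebbian rule: delta_w = eta * y * x
print(w)                      # the weight grows without bound, keeping the sign of its initial value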