EELU ANN ITF309 Lecture 07 Spring 2024

The document discusses backpropagation and multilayer neural networks. It describes the limitations of perceptrons and how backpropagation can be used to train multilayer networks. It also provides examples of using neural networks for pattern classification and function approximation.


ITF309

Artificial Neural Networks


CHAPTER 11
Back-Propagation

Limitations of perceptron

• The perceptron learning rule is guaranteed to converge to a solution in a finite number of steps, so long as a solution exists.
• The two classes must be linearly separable.
Limitations of the perceptron
Linearly non-separable problems

• Try drawing a straight line that separates the two classes of vectors; for a linearly non-separable problem, no such line exists.

Objectives
• A generalization of the LMS algorithm, called backpropagation, can be used to train multilayer networks.
• Backpropagation is an approximate steepest descent algorithm, in which the performance index is the mean square error.
• In order to calculate the derivatives, we need to use the chain rule of calculus.

Motivation
• The perceptron learning rule and the LMS algorithm were designed to train single-layer perceptron-like networks.
• They are only able to solve linearly separable classification problems.
• The multilayer perceptron, trained by the backpropagation algorithm, is currently the most widely used neural network.

Three-Layer Network

Number of neurons in each layer: $R - S^1 - S^2 - S^3$ ($R$ inputs, $S^m$ neurons in layer $m$).


Pattern Classification:
XOR gate
The limitations of the single-layer perceptron were highlighted by the XOR problem (Minsky & Papert, 1969):

$\mathbf{p}_1 = \begin{bmatrix} 0 \\ 0 \end{bmatrix},\; t_1 = 0 \qquad
 \mathbf{p}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix},\; t_2 = 1 \qquad
 \mathbf{p}_3 = \begin{bmatrix} 1 \\ 0 \end{bmatrix},\; t_3 = 1 \qquad
 \mathbf{p}_4 = \begin{bmatrix} 1 \\ 1 \end{bmatrix},\; t_4 = 0$

(Figure: the four input vectors plotted in the input plane; no single line separates the two target classes.)

Two-Layer XOR Network
Two-layer, 2-2-1 network

(Figure: a two-layer, 2-2-1 network for the XOR problem. Each first-layer neuron forms an individual linear decision boundary, and the second-layer neuron combines the two individual decisions with an AND operation. A hand-worked sketch of one such network follows below.)
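The exact weight and bias values in the network above are not legible in this copy, so the following minimal sketch (assuming NumPy and hard-limit perceptron units) uses one hypothetical hand-picked parameter set: hidden neuron 1 acts as an OR gate, hidden neuron 2 as a NAND gate, and the output neuron ANDs the two hidden decisions, which realizes XOR.

```python
import numpy as np

def hardlim(n):
    """Hard-limit transfer function: 1 if n >= 0, else 0."""
    return (np.asarray(n) >= 0).astype(int)

# Hypothetical hand-picked parameters (not the slide's own values):
W1 = np.array([[ 2.0,  2.0],    # hidden neuron 1: OR boundary
               [-2.0, -2.0]])   # hidden neuron 2: NAND boundary
b1 = np.array([-1.0, 3.0])
W2 = np.array([[1.0, 1.0]])     # output neuron: AND of the two decisions
b2 = np.array([-1.5])

for p in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    a1 = hardlim(W1 @ np.array(p, dtype=float) + b1)   # first-layer decisions
    a2 = hardlim(W2 @ a1 + b2)                          # combined decision
    print(p, "->", int(a2[0]))   # prints 0, 1, 1, 0 -- the XOR targets above
```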
Solved Problem P11.1
Design a multilayer network to distinguish
these categories.

$\mathbf{p}_1 = \begin{bmatrix} 1 & 1 & -1 & -1 \end{bmatrix}^T \qquad
 \mathbf{p}_2 = \begin{bmatrix} -1 & -1 & 1 & 1 \end{bmatrix}^T$  (Class I)

$\mathbf{p}_3 = \begin{bmatrix} 1 & -1 & 1 & -1 \end{bmatrix}^T \qquad
 \mathbf{p}_4 = \begin{bmatrix} -1 & 1 & -1 & 1 \end{bmatrix}^T$  (Class II)

A single-layer perceptron would have to satisfy
$\mathbf{W}\mathbf{p}_1 + b > 0$ and $\mathbf{W}\mathbf{p}_2 + b > 0$ (Class I), while
$\mathbf{W}\mathbf{p}_3 + b < 0$ and $\mathbf{W}\mathbf{p}_4 + b < 0$ (Class II).
There is no hyperplane that can separate these two categories.
Solution of Problem P11.1
(Figure: a two-layer network that solves the problem. Each first-layer neuron performs an AND-like test that responds to one of the Class I patterns, and the second-layer neuron ORs the two first-layer decisions together.)
Function Approximation
Two-layer, 1-2-1 network
$f^1(n) = \dfrac{1}{1 + e^{-n}}, \qquad f^2(n) = n$

Nominal parameter values:
$w^1_{1,1} = 10, \quad w^1_{2,1} = 10, \quad b^1_1 = -10, \quad b^1_2 = 10$
$w^2_{1,1} = 1, \quad w^2_{1,2} = 1, \quad b^2 = 0$

(Figure: the network response $a^2$ over $-2 \le p \le 2$ for these nominal values.)
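As a quick check of the nominal parameter values above, this minimal sketch (assuming NumPy, with a log-sigmoid hidden layer and a linear output layer as stated) evaluates the 1-2-1 network response at a few input values.

```python
import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

# Nominal parameter values from the slide above.
W1 = np.array([[10.0], [10.0]])   # first-layer weights  w1_{1,1}, w1_{2,1}
b1 = np.array([-10.0, 10.0])      # first-layer biases   b1_1, b1_2
W2 = np.array([[1.0, 1.0]])       # second-layer weights w2_{1,1}, w2_{1,2}
b2 = np.array([0.0])              # second-layer bias    b2

for p in (-2.0, -1.0, 0.0, 1.0, 2.0):
    a1 = logsig(W1.ravel() * p + b1)   # hidden-layer outputs (log-sigmoid)
    a2 = (W2 @ a1 + b2).item()         # linear output layer
    print(f"p = {p:+.1f}  ->  a2 = {a2:.3f}")
```

The response rises from about 0 to about 2 in two steps centred at p = -1 and p = 1, which is the staircase behaviour discussed on the next slide.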
Function Approximation
The centers of the steps occur where
the net input to a neuron in the first
layer is zero.
$n^1_1 = w^1_{1,1}\, p + b^1_1 = 0 \;\Rightarrow\; p = -\dfrac{b^1_1}{w^1_{1,1}} = -\dfrac{-10}{10} = 1$
$n^1_2 = w^1_{2,1}\, p + b^1_2 = 0 \;\Rightarrow\; p = -\dfrac{b^1_2}{w^1_{2,1}} = -\dfrac{10}{10} = -1$
The steepness of each step can be
adjusted by changing the network
weights.
Effect of Parameter Changes
(Figure: network response as the first-layer bias $b^1_2$ is varied over the values 20, 15, 10, 5, 0, with the other parameters at their nominal values, for $-2 \le p \le 2$.)

Effect of Parameter Changes
(Figure: network response as the second-layer weight $w^2_{1,1}$ is varied over the values 1.0, 0.5, 0.0, -0.5, -1.0, with the other parameters at their nominal values, for $-2 \le p \le 2$.)

Effect of Parameter Changes
(Figure: network response as the second-layer weight $w^2_{1,2}$ is varied over the values 1.0, 0.5, 0.0, -0.5, -1.0, with the other parameters at their nominal values, for $-2 \le p \le 2$.)

Effect of Parameter Changes
(Figure: network response as the output bias $b^2$ is varied over the values 1.0, 0.5, 0.0, -0.5, -1.0, with the other parameters at their nominal values, for $-2 \le p \le 2$.)

Function Approximation
Two-layer networks, with sigmoid transfer functions in the hidden layer and linear transfer functions in the output layer, can approximate virtually any function of interest to any degree of accuracy, provided sufficiently many hidden units are available.

Backpropagation Algorithm
• The backpropagation algorithm is used to train the multilayer perceptron (MLP).
• "MLP" is often used to describe any general feedforward neural network (FNN), i.e. one with no recurrent connections.
• However, we will concentrate on networks with units arranged in layers.
Architecture of BP Nets
• Multi-layer, feed-forward networks have the
following characteristics:
– They must have at least one hidden layer
– Hidden units must be non-linear units (usually with sigmoid
activation functions).
– Fully connected between units in two consecutive layers, but no
connection between units within one layer.
– For a net with only one hidden layer, each hidden unit receives
input from all input units and sends output to all output units
– Number of output units need not equal number of input units
– Number of hidden units per layer can be more or less than
input or output units.

Training a BackPropagation Net
• Feed-forward training of input patterns
– each input node receives a signal, which is broadcast to all of the
hidden units
– each hidden unit computes its activation which is broadcast to all
output nodes
• Back propagation of errors
– each output node compares its activation with the desired output
– based on these differences, the error is propagated back to all previous nodes (the delta rule)
• Adjustment of weights
– the weights of all links are updated simultaneously, based on the errors that were propagated back

Three-layer back-propagation neural
network

(Figure: a three-layer back-propagation network. Input signals $x_1, \dots, x_n$ flow forward from the input layer through the hidden layer (weights $w_{ij}$) to the output layer (weights $w_{jk}$), producing outputs $y_1, \dots, y_l$; error signals are propagated backward from the output layer toward the input layer.)

Generalized delta rule

• The delta rule only works for the output layer.

• Backpropagation, or the generalized delta rule, is a way of creating the equivalent of desired values (error terms) for the hidden layers.

Description of Training BP Net:
Feedforward Stage
1. Initialize weights with small, random values
2. While stopping condition is not true
 for each training pair (input/output):
 each input unit broadcasts its value to all hidden
units
 each hidden unit sums its input signals & applies
activation function to compute its output signal
 each hidden unit sends its signal to the output units
 each output unit sums its input signals & applies its
activation function to compute its output signal

Training BP Net:
Backpropagation stage

3. Each output unit computes its error term, its own weight correction term and its bias (threshold) correction term, and sends these to the layer below.

4. Each hidden unit sums its delta inputs from above


& multiplies by the derivative of its activation
function; it also computes its own weight
correction term and its bias correction term

Training a Back Prop Net:
Adjusting the Weights

5. Each output unit updates its weights and bias

6. Each hidden unit updates its weights and bias


 Each training cycle is called an epoch. The
weights are updated in each cycle
 It is not analytically possible to determine where
the global minimum is. Eventually the algorithm settles at a low point, which may be only a local minimum.

Backpropagation Algorithm
For multilayer networks the outputs of one layer become the inputs to the following layer:
$\mathbf{a}^{m+1} = \mathbf{f}^{m+1}\!\left(\mathbf{W}^{m+1}\mathbf{a}^m + \mathbf{b}^{m+1}\right), \quad m = 0, 1, \dots, M-1$
$\mathbf{a}^0 = \mathbf{p}, \qquad \mathbf{a} = \mathbf{a}^M$

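A minimal sketch of this forward recurrence in Python/NumPy; the helper name `forward` and the 1-2-1 example values are illustrative, not part of the lecture.

```python
import numpy as np

def forward(p, weights, biases, transfer_fns):
    """Propagate an input through an M-layer network:
    a^0 = p,  a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1}),  output a = a^M."""
    a = p
    for W, b, f in zip(weights, biases, transfer_fns):
        a = f(W @ a + b)
    return a

# Illustrative 1-2-1 example (log-sigmoid hidden layer, linear output layer):
logsig = lambda n: 1.0 / (1.0 + np.exp(-n))
purelin = lambda n: n
weights = [np.array([[10.0], [10.0]]), np.array([[1.0, 1.0]])]
biases  = [np.array([-10.0, 10.0]),    np.array([0.0])]
print(forward(np.array([1.0]), weights, biases, [logsig, purelin]))  # ~[1.5]
```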
Performance Index
Training set: $\{\mathbf{p}_1, \mathbf{t}_1\}, \{\mathbf{p}_2, \mathbf{t}_2\}, \dots, \{\mathbf{p}_Q, \mathbf{t}_Q\}$

Mean square error: $F(\mathbf{x}) = E[e^2] = E[(t - a)^2]$

Vector case: $F(\mathbf{x}) = E[\mathbf{e}^T\mathbf{e}] = E[(\mathbf{t} - \mathbf{a})^T(\mathbf{t} - \mathbf{a})]$

Approximate mean square error:
$\hat{F}(\mathbf{x}) = (\mathbf{t}(k) - \mathbf{a}(k))^T(\mathbf{t}(k) - \mathbf{a}(k)) = \mathbf{e}^T(k)\,\mathbf{e}(k)$

Approximate steepest descent algorithm:
$w^m_{i,j}(k+1) = w^m_{i,j}(k) - \alpha \dfrac{\partial \hat{F}}{\partial w^m_{i,j}}, \qquad b^m_i(k+1) = b^m_i(k) - \alpha \dfrac{\partial \hat{F}}{\partial b^m_i}$
Chain Rule
$\dfrac{d f(n(w))}{dw} = \dfrac{d f(n)}{dn} \cdot \dfrac{d n(w)}{dw}$

Example: if $f(n) = e^n$ and $n = 2w$, so that $f(n(w)) = e^{2w}$, then
$\dfrac{d f(n(w))}{dw} = \dfrac{d f(n)}{dn} \cdot \dfrac{d n(w)}{dw} = e^n \cdot 2$

Applying the chain rule to the approximate mean square error
$\hat{F}(\mathbf{x}) = [\mathbf{t}(k) - \mathbf{a}(k)]^T[\mathbf{t}(k) - \mathbf{a}(k)] = \mathbf{e}^T(k)\,\mathbf{e}(k)$:
$w^m_{i,j}(k+1) = w^m_{i,j}(k) - \alpha \dfrac{\partial \hat{F}}{\partial w^m_{i,j}} = w^m_{i,j}(k) - \alpha \dfrac{\partial \hat{F}}{\partial n^m_i}\dfrac{\partial n^m_i}{\partial w^m_{i,j}}$
$b^m_i(k+1) = b^m_i(k) - \alpha \dfrac{\partial \hat{F}}{\partial b^m_i} = b^m_i(k) - \alpha \dfrac{\partial \hat{F}}{\partial n^m_i}\dfrac{\partial n^m_i}{\partial b^m_i}$
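A quick numeric check of the chain-rule example above ($f(n) = e^n$, $n = 2w$), comparing the analytic derivative $e^n \cdot 2$ with a finite-difference estimate; the point $w = 0.3$ is arbitrary.

```python
import numpy as np

w = 0.3                      # arbitrary point at which to check the derivative
n = 2.0 * w                  # n(w) = 2w
analytic = np.exp(n) * 2.0   # df/dn * dn/dw = e^n * 2

# Finite-difference estimate of d/dw exp(2w):
h = 1e-6
numeric = (np.exp(2.0 * (w + h)) - np.exp(2.0 * (w - h))) / (2.0 * h)
print(analytic, numeric)     # the two agree to about six decimal places
```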
Sensitivity & Gradient
The net input to the $i$th neuron of layer $m$:
$n^m_i = \sum_{j=1}^{S^{m-1}} w^m_{i,j}\, a^{m-1}_j + b^m_i, \qquad \dfrac{\partial n^m_i}{\partial w^m_{i,j}} = a^{m-1}_j, \qquad \dfrac{\partial n^m_i}{\partial b^m_i} = 1$

The sensitivity of $\hat{F}$ to changes in the $i$th element of the net input at layer $m$:
$s^m_i \equiv \dfrac{\partial \hat{F}}{\partial n^m_i}$

Gradient:
$\dfrac{\partial \hat{F}}{\partial w^m_{i,j}} = \dfrac{\partial \hat{F}}{\partial n^m_i}\dfrac{\partial n^m_i}{\partial w^m_{i,j}} = s^m_i\, a^{m-1}_j$
$\dfrac{\partial \hat{F}}{\partial b^m_i} = \dfrac{\partial \hat{F}}{\partial n^m_i}\dfrac{\partial n^m_i}{\partial b^m_i} = s^m_i \cdot 1 = s^m_i$
Steepest Descent Algorithm
The steepest descent algorithm for the approximate mean square error:
$w^m_{i,j}(k+1) = w^m_{i,j}(k) - \alpha \dfrac{\partial \hat{F}}{\partial n^m_i}\dfrac{\partial n^m_i}{\partial w^m_{i,j}} = w^m_{i,j}(k) - \alpha\, s^m_i\, a^{m-1}_j$
$b^m_i(k+1) = b^m_i(k) - \alpha \dfrac{\partial \hat{F}}{\partial n^m_i}\dfrac{\partial n^m_i}{\partial b^m_i} = b^m_i(k) - \alpha\, s^m_i$

Matrix form:
$\mathbf{W}^m(k+1) = \mathbf{W}^m(k) - \alpha\, \mathbf{s}^m (\mathbf{a}^{m-1})^T$
$\mathbf{b}^m(k+1) = \mathbf{b}^m(k) - \alpha\, \mathbf{s}^m$

where
$\mathbf{s}^m \equiv \dfrac{\partial \hat{F}}{\partial \mathbf{n}^m} =
\begin{bmatrix} \dfrac{\partial \hat{F}}{\partial n^m_1} \\ \dfrac{\partial \hat{F}}{\partial n^m_2} \\ \vdots \\ \dfrac{\partial \hat{F}}{\partial n^m_{S^m}} \end{bmatrix}$
BP the Sensitivity
Backpropagation: a recurrence relationship in which the sensitivity at layer $m$ is computed from the sensitivity at layer $m+1$.

Jacobian matrix:
$\dfrac{\partial \mathbf{n}^{m+1}}{\partial \mathbf{n}^m} =
\begin{bmatrix}
\dfrac{\partial n^{m+1}_1}{\partial n^m_1} & \dfrac{\partial n^{m+1}_1}{\partial n^m_2} & \cdots & \dfrac{\partial n^{m+1}_1}{\partial n^m_{S^m}} \\
\dfrac{\partial n^{m+1}_2}{\partial n^m_1} & \dfrac{\partial n^{m+1}_2}{\partial n^m_2} & \cdots & \dfrac{\partial n^{m+1}_2}{\partial n^m_{S^m}} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial n^{m+1}_{S^{m+1}}}{\partial n^m_1} & \dfrac{\partial n^{m+1}_{S^{m+1}}}{\partial n^m_2} & \cdots & \dfrac{\partial n^{m+1}_{S^{m+1}}}{\partial n^m_{S^m}}
\end{bmatrix}$
Matrix Representation
The $i,j$ element of the Jacobian matrix:
$\dfrac{\partial n^{m+1}_i}{\partial n^m_j}
= \dfrac{\partial \left( \sum_{l=1}^{S^m} w^{m+1}_{i,l}\, a^m_l + b^{m+1}_i \right)}{\partial n^m_j}
= w^{m+1}_{i,j}\dfrac{\partial a^m_j}{\partial n^m_j}
= w^{m+1}_{i,j}\dfrac{\partial f^m(n^m_j)}{\partial n^m_j}
= w^{m+1}_{i,j}\, \dot{f}^m(n^m_j)$

Therefore the Jacobian can be written as
$\dfrac{\partial \mathbf{n}^{m+1}}{\partial \mathbf{n}^m} = \mathbf{W}^{m+1}\, \dot{\mathbf{F}}^m(\mathbf{n}^m)$, where
$\dot{\mathbf{F}}^m(\mathbf{n}^m) =
\begin{bmatrix}
\dot{f}^m(n^m_1) & 0 & \cdots & 0 \\
0 & \dot{f}^m(n^m_2) & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \dot{f}^m(n^m_{S^m})
\end{bmatrix}$
Recurrence Relation
The recurrence relation for the sensitivity:
$\mathbf{s}^m = \dfrac{\partial \hat{F}}{\partial \mathbf{n}^m}
= \left(\dfrac{\partial \mathbf{n}^{m+1}}{\partial \mathbf{n}^m}\right)^{T} \dfrac{\partial \hat{F}}{\partial \mathbf{n}^{m+1}}
= \dot{\mathbf{F}}^m(\mathbf{n}^m)(\mathbf{W}^{m+1})^T \dfrac{\partial \hat{F}}{\partial \mathbf{n}^{m+1}}
= \dot{\mathbf{F}}^m(\mathbf{n}^m)(\mathbf{W}^{m+1})^T \mathbf{s}^{m+1}$

The sensitivities are propagated backward through the network, from the last layer to the first layer:
$\mathbf{s}^M \rightarrow \mathbf{s}^{M-1} \rightarrow \cdots \rightarrow \mathbf{s}^2 \rightarrow \mathbf{s}^1$

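A sketch of this backward recurrence in NumPy; the function name and list layout are my own illustrative assumptions. The usage values are taken from the worked 1-2-1 example later in this lecture, so the result can be checked against the hand computation there.

```python
import numpy as np

def backprop_sensitivities(n_list, W_list, t, a, dtransfer_fns):
    """Backward pass for the sensitivities:
    s^M = -2 F'^M(n^M)(t - a),   s^m = F'^m(n^m) (W^{m+1})^T s^{m+1}."""
    M = len(n_list)
    s = [None] * M
    s[M - 1] = -2.0 * np.diag(dtransfer_fns[M - 1](n_list[M - 1])) @ (t - a)
    for m in range(M - 2, -1, -1):
        Fdot = np.diag(dtransfer_fns[m](n_list[m]))      # diagonal F'^m(n^m)
        s[m] = Fdot @ W_list[m + 1].T @ s[m + 1]
    return s

# Values from the worked 1-2-1 example shown later in the lecture:
logsig   = lambda n: 1.0 / (1.0 + np.exp(-n))
dlogsig  = lambda n: logsig(n) * (1.0 - logsig(n))
dpurelin = lambda n: np.ones_like(n)
n_list = [np.array([-0.75, -0.54]), np.array([0.446])]
W_list = [np.array([[-0.27], [-0.41]]), np.array([[0.09, -0.17]])]
s = backprop_sensitivities(n_list, W_list,
                           t=np.array([1.707]), a=np.array([0.446]),
                           dtransfer_fns=[dlogsig, dpurelin])
print(s[1], s[0])   # approx. -2.522 and [-0.0495, 0.0997]
```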
Backpropagation Algorithm
At the final layer:
$s^M_i = \dfrac{\partial \hat{F}}{\partial n^M_i}
= \dfrac{\partial (\mathbf{t}-\mathbf{a})^T(\mathbf{t}-\mathbf{a})}{\partial n^M_i}
= \dfrac{\partial \sum_{j=1}^{S^M}(t_j - a_j)^2}{\partial n^M_i}
= -2(t_i - a_i)\dfrac{\partial a_i}{\partial n^M_i}$

Since
$\dfrac{\partial a_i}{\partial n^M_i} = \dfrac{\partial a^M_i}{\partial n^M_i} = \dfrac{\partial f^M(n^M_i)}{\partial n^M_i} = \dot{f}^M(n^M_i)$,

$s^M_i = -2(t_i - a_i)\, \dot{f}^M(n^M_i)$, or in matrix form:
$\mathbf{s}^M = -2\, \dot{\mathbf{F}}^M(\mathbf{n}^M)(\mathbf{t} - \mathbf{a})$
Summary
The first step is to propagate the input forward through the network:
$\mathbf{a}^0 = \mathbf{p}$
$\mathbf{a}^{m+1} = \mathbf{f}^{m+1}\!\left(\mathbf{W}^{m+1}\mathbf{a}^m + \mathbf{b}^{m+1}\right), \quad m = 0, 1, \dots, M-1$
$\mathbf{a} = \mathbf{a}^M$

The second step is to propagate the sensitivities backward through the network:
Output layer: $\mathbf{s}^M = -2\, \dot{\mathbf{F}}^M(\mathbf{n}^M)(\mathbf{t} - \mathbf{a})$
Hidden layers: $\mathbf{s}^m = \dot{\mathbf{F}}^m(\mathbf{n}^m)(\mathbf{W}^{m+1})^T\mathbf{s}^{m+1}, \quad m = M-1, \dots, 2, 1$

The final step is to update the weights and biases:
$\mathbf{W}^m(k+1) = \mathbf{W}^m(k) - \alpha\, \mathbf{s}^m(\mathbf{a}^{m-1})^T$
$\mathbf{b}^m(k+1) = \mathbf{b}^m(k) - \alpha\, \mathbf{s}^m$
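Putting the three steps together, here is a compact incremental (stochastic) backpropagation sketch in NumPy. The function and variable names are my own, the transfer-function derivatives are written in terms of the layer outputs, and the short training loop on $g(p) = 1 + \sin(\pi p / 4)$ with a 1-2-1 network mirrors the example in the following slides; treat it as an illustration under those assumptions, not the lecture's reference implementation.

```python
import numpy as np

logsig   = lambda n: 1.0 / (1.0 + np.exp(-n))
dlogsig  = lambda a: a * (1.0 - a)          # derivative expressed via the output a
purelin  = lambda n: n
dpurelin = lambda a: np.ones_like(a)

def bp_step(p, t, W, b, f, df, alpha):
    """One backpropagation iteration: forward pass, backward sensitivity
    pass, then the steepest-descent weight and bias update."""
    # 1) Forward propagation: a^0 = p, a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1})
    a = [p]
    for Wm, bm, fm in zip(W, b, f):
        a.append(fm(Wm @ a[-1] + bm))
    # 2) Backward propagation of sensitivities
    M = len(W)
    s = [None] * M
    s[-1] = -2.0 * df[-1](a[-1]) * (t - a[-1])            # s^M = -2 F'^M (t - a)
    for m in range(M - 2, -1, -1):
        s[m] = df[m](a[m + 1]) * (W[m + 1].T @ s[m + 1])  # s^m = F'^m (W^{m+1})^T s^{m+1}
    # 3) Update weights and biases
    for m in range(M):
        W[m] -= alpha * np.outer(s[m], a[m])              # W^m <- W^m - alpha s^m (a^{m-1})^T
        b[m] -= alpha * s[m]
    return a[-1]

# Illustrative training loop: 1-2-1 network, g(p) = 1 + sin(pi/4 * p).
rng = np.random.default_rng(0)
W = [rng.uniform(-0.5, 0.5, (2, 1)), rng.uniform(-0.5, 0.5, (1, 2))]
b = [rng.uniform(-0.5, 0.5, 2), rng.uniform(-0.5, 0.5, 1)]
for _ in range(20000):
    p = rng.uniform(-2, 2, 1)
    t = 1 + np.sin(np.pi / 4 * p)
    bp_step(p, t, W, b, [logsig, purelin], [dlogsig, dpurelin], alpha=0.1)
```

After enough iterations the network response should resemble the sine-wave target over $-2 \le p \le 2$.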
BP Neural Network
(Figure: a general multilayer BP network with inputs $p_1, \dots, p_R$, layers 1 through $M$, $S^m$ neurons in layer $m$, and weights $w^m_{i,j}$ connecting neuron $j$ of layer $m-1$ to neuron $i$ of layer $m$; the outputs are $a^M_1, \dots, a^M_{S^M}$.)
Ex: Function Approximation
The network is trained to approximate
$g(p) = 1 + \sin\!\left(\dfrac{\pi}{4}p\right)$

(Figure: the input $p$ is applied both to the function $g$, giving the target $t$, and to the 1-2-1 network, giving the output $a$; the error is $e = t - a$.)

Network Architecture
(Figure: the 1-2-1 network, with input $p$ and output $a^2$; the hidden layer uses log-sigmoid transfer functions and the output layer is linear.)

Initial Values
W  0  = – 0.27 b  0  = – 0.48
1 1 2 2
W  0  = 0.09 – 0.17 b 0  = 0.48
– 0.41 – 0.13

3
Network Response
Initial Network Sine Wave

Response: 2

a2 1

-1
-2 -1 0 1 2
EELU ITF309 Neural Network p
Forward Propagation
Initial input: $\mathbf{a}^0 = \mathbf{p} = 1$

Output of the 1st layer:
$\mathbf{a}^1 = \mathbf{f}^1(\mathbf{W}^1\mathbf{a}^0 + \mathbf{b}^1)
= \mathrm{logsig}\!\left(\begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix}(1) + \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix}\right)
= \mathrm{logsig}\!\left(\begin{bmatrix} -0.75 \\ -0.54 \end{bmatrix}\right)$
$\mathbf{a}^1 = \begin{bmatrix} \dfrac{1}{1 + e^{0.75}} \\[2ex] \dfrac{1}{1 + e^{0.54}} \end{bmatrix}
= \begin{bmatrix} 0.321 \\ 0.368 \end{bmatrix}$

Output of the 2nd layer:
$a^2 = f^2(\mathbf{W}^2\mathbf{a}^1 + b^2)
= \mathrm{purelin}\!\left(\begin{bmatrix} 0.09 & -0.17 \end{bmatrix}\begin{bmatrix} 0.321 \\ 0.368 \end{bmatrix} + 0.48\right) = 0.446$

Error:
$e = t - a = \left(1 + \sin\!\left(\dfrac{\pi}{4}p\right)\right) - a^2
= \left(1 + \sin\!\left(\dfrac{\pi}{4}(1)\right)\right) - 0.446 = 1.261$
Transfer Func. Derivatives
$\dot{f}^1(n) = \dfrac{d}{dn}\left(\dfrac{1}{1 + e^{-n}}\right)
= \dfrac{e^{-n}}{(1 + e^{-n})^2}
= \left(1 - \dfrac{1}{1 + e^{-n}}\right)\left(\dfrac{1}{1 + e^{-n}}\right)
= (1 - a^1)(a^1)$

$\dot{f}^2(n) = \dfrac{d}{dn}(n) = 1$

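The useful point here is that the log-sigmoid derivative can be computed directly from the layer output $a^1$, without re-evaluating the exponential. A tiny numeric check (assuming NumPy):

```python
import numpy as np

n = np.linspace(-3, 3, 7)
a1 = 1.0 / (1.0 + np.exp(-n))           # log-sigmoid output
deriv_from_output = (1.0 - a1) * a1     # f'^1(n) = (1 - a^1) a^1
deriv_direct = np.exp(-n) / (1.0 + np.exp(-n)) ** 2
print(np.allclose(deriv_from_output, deriv_direct))   # True
```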
Backpropagation
The second-layer sensitivity:
$s^2 = -2\,\dot{\mathbf{F}}^2(\mathbf{n}^2)(\mathbf{t} - \mathbf{a}) = -2\,[\dot{f}^2(n^2)]\,e = -2 \cdot 1 \cdot 1.261 = -2.522$

The first-layer sensitivity:
$\mathbf{s}^1 = \dot{\mathbf{F}}^1(\mathbf{n}^1)(\mathbf{W}^2)^T s^2
= \begin{bmatrix} (1 - a^1_1)(a^1_1) & 0 \\ 0 & (1 - a^1_2)(a^1_2) \end{bmatrix}
\begin{bmatrix} w^2_{1,1} \\ w^2_{1,2} \end{bmatrix} s^2$
$= \begin{bmatrix} (1 - 0.321)(0.321) & 0 \\ 0 & (1 - 0.368)(0.368) \end{bmatrix}
\begin{bmatrix} 0.09 \\ -0.17 \end{bmatrix}(-2.522)
= \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix}$
Weight Update
Learning rate: $\alpha = 0.1$

$\mathbf{W}^2(1) = \mathbf{W}^2(0) - \alpha\, s^2(\mathbf{a}^1)^T
= \begin{bmatrix} 0.09 & -0.17 \end{bmatrix} - 0.1(-2.522)\begin{bmatrix} 0.321 & 0.368 \end{bmatrix}
= \begin{bmatrix} 0.171 & -0.0772 \end{bmatrix}$

$b^2(1) = b^2(0) - \alpha\, s^2 = 0.48 - 0.1(-2.522) = 0.732$

$\mathbf{W}^1(1) = \mathbf{W}^1(0) - \alpha\, \mathbf{s}^1(\mathbf{a}^0)^T
= \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix} - 0.1\begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix}(1)
= \begin{bmatrix} -0.265 \\ -0.420 \end{bmatrix}$

$\mathbf{b}^1(1) = \mathbf{b}^1(0) - \alpha\, \mathbf{s}^1
= \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix} - 0.1\begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix}
= \begin{bmatrix} -0.475 \\ -0.140 \end{bmatrix}$
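The whole iteration above can be reproduced in a few lines of NumPy; the printed values should match the hand-computed results up to rounding.

```python
import numpy as np

logsig = lambda n: 1.0 / (1.0 + np.exp(-n))

# Initial weights, biases, input and target from the worked example.
W1 = np.array([[-0.27], [-0.41]]); b1 = np.array([-0.48, -0.13])
W2 = np.array([[0.09, -0.17]]);    b2 = np.array([0.48])
p = np.array([1.0]); t = 1.0 + np.sin(np.pi / 4.0 * p); alpha = 0.1

# Forward pass
a1 = logsig(W1 @ p + b1)            # approx. [0.321, 0.368]
a2 = W2 @ a1 + b2                   # approx. 0.446 (linear output layer)
e = t - a2                          # approx. 1.261

# Sensitivities
s2 = -2.0 * 1.0 * e                 # f'^2(n) = 1, so s2 approx. -2.522
s1 = (1.0 - a1) * a1 * (W2.T @ s2)  # approx. [-0.0495, 0.0997]

# Parameter updates with learning rate alpha = 0.1
W2_new = W2 - alpha * np.outer(s2, a1)   # approx. [0.171, -0.0772]
b2_new = b2 - alpha * s2                 # approx. 0.732
W1_new = W1 - alpha * np.outer(s1, p)    # approx. [-0.265, -0.420]
b1_new = b1 - alpha * s1                 # approx. [-0.475, -0.140]
print(W1_new.ravel(), b1_new, W2_new.ravel(), b2_new)
```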
Choice of Network Structure
Multilayer networks can be used to
approximate almost any function, if we
have enough neurons in the hidden
layers.
We cannot say, in general, how many
layers or how many neurons are
necessary for adequate performance.

Illustrated Example 1
A 1-3-1 network is trained to approximate
$g(p) = 1 + \sin\!\left(\dfrac{i\pi}{4}p\right), \quad -2 \le p \le 2$

(Figure: the 1-3-1 network response for $i = 1, 2, 4$ and $8$. As $i$ increases and the function becomes more complex, the three hidden neurons are no longer sufficient to give a good fit.)
Illustrated Example 2
Networks of increasing size are trained to approximate
$g(p) = 1 + \sin\!\left(\dfrac{6\pi}{4}p\right), \quad -2 \le p \le 2$

(Figure: responses of 1-2-1, 1-3-1, 1-4-1 and 1-5-1 networks; the approximation improves as the number of hidden neurons increases.)
Convergence
g p = 1 + sinp  2  p  2
3 3

2 5 2

1
5
1 3 1 3
2 4
4 2
0 0 0

0
1
-1 -1
-2 -1 0 1 2 -2 -1 0 1 2

Convergence to Global Min. Convergence to Local Min.


The numbers to each curve indicate the sequence of iterations.
Generalization
In most cases the multilayer network is
trained with a finite number of examples of
proper network behavior: $\{\mathbf{p}_1, \mathbf{t}_1\}, \{\mathbf{p}_2, \mathbf{t}_2\}, \dots, \{\mathbf{p}_Q, \mathbf{t}_Q\}$
This training set is normally representative of
a much larger class of possible input/output
pairs.
Can the network successfully generalize what it
has learned to the total population?

Generalization Example

g  p  = 1 + sin --- p  p = –2 –1.6 –1.2   1.6 2
4
3 3

1-2-1 1-9-1
2 2

1 1

0 0

-1
-2
Generalize well
-1 0 1
Not generalize well
2
-1
-2 -1 0 1 2

For a network to be able to generalize, it should have fewer


parameters than there are data points in the training set.
