
Multi Layer Perceptron

This document discusses the multi-layer perceptron model and the backpropagation algorithm. It covers the MLP model structure with input, hidden and output layers. It then describes the backpropagation algorithm in detail, including calculating the error signal, defining the cost function, deriving the gradient descent learning rule, and distinguishing the calculations for output versus hidden neurons. The goal is to optimize network weights using gradient descent and backpropagation of error signals from the output to hidden layers.

WK3 – Multi Layer Perceptron

CS 476: Networks of Neural Computation, CSD, UOC, 2009


Feature Detection

(Figure slide; the graphic is not preserved in this extraction.)


Contents

•MLP model details
•Back-propagation algorithm
•XOR Example
•Heuristics for Back-propagation
•Heuristics for learning rate
•Approximation of functions
•Generalisation
•Model selection through cross-validation
•Conjugate-Gradient method for BP


Contents II

•Advantages and disadvantages of BP
•Types of problems for applying BP
•Conclusions


Multi Layer Perceptron

•“Neurons” are positioned in layers. There are Input, Hidden and Output Layers.


Multi Layer Perceptron Output

•The output y_j of neuron j is calculated by:

y_j(n) = φ_j(v_j(n)) = φ_j( Σ_{i=0}^{m} w_ji(n) y_i(n) )

where w_j0(n) is the bias (the corresponding input is y_0(n) = +1).

•The function φ_j(•) is a sigmoid function. Typical examples are:


Transfer Functions

•The logistic sigmoid:

y = 1 / (1 + exp(-x))


Transfer Functions II

•The hyperbolic tangent sigmoid:

y = tanh(x) = sinh(x) / cosh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
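
As a concrete illustration of the output equation and the two transfer functions above, here is a minimal NumPy sketch; the weight values and layer size in the example are arbitrary assumptions, not taken from the slides:

    import numpy as np

    def logistic(x):
        # y = 1 / (1 + exp(-x))
        return 1.0 / (1.0 + np.exp(-x))

    def tanh_sigmoid(x):
        # y = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
        return np.tanh(x)

    def neuron_output(w, y_prev, phi=logistic):
        # v_j = sum_{i=0..m} w_ji * y_i, with y_0 = +1 so that w_j0 acts as the bias
        y = np.concatenate(([1.0], y_prev))
        v = np.dot(w, y)
        return phi(v)

    # example neuron: bias 0.5 and two incoming weights (made-up numbers)
    w = np.array([0.5, -1.0, 2.0])
    print(neuron_output(w, np.array([0.3, 0.7])),
          neuron_output(w, np.array([0.3, 0.7]), phi=tanh_sigmoid))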


Learning Algorithm

•Assume that a set of examples T = {x(n), d(n)}, n = 1, …, N is given, where x(n) is the input vector of dimension m_0 and d(n) is the desired response vector of dimension M.
•Thus an error signal, e_j(n) = d_j(n) - y_j(n), can be defined for the output neuron j.
•We can derive a learning algorithm for an MLP by assuming an optimisation approach which is based on the steepest descent direction, i.e.

Δw(n) = -η g(n)

where g(n) is the gradient vector of the cost function and η is the learning rate.


Learning Algorithm II

•The algorithm that is derived from the steepest descent direction is called back-propagation.
•Assume that we define a SSE instantaneous cost function (i.e. per example) as follows:

E(n) = (1/2) Σ_{j∈C} e_j²(n)

where C is the set of all output neurons.
•If we assume that there are N examples in the set T, then the average squared error is:

E_av = (1/N) Σ_{n=1}^{N} E(n)


Learning Algorithm III

•We need to calculate the gradient wrt E_av or wrt E(n). In the first case we calculate the gradient per epoch (i.e. over all N patterns), while in the second the gradient is calculated per pattern.
•In the case of E_av we have the Batch mode of the algorithm. In the case of E(n) we have the Online or Stochastic mode of the algorithm.
•Assume that we use the online mode for the rest of the calculation. The gradient is defined as:

g(n) = ∂E(n) / ∂w_ji(n)


Learning Algorithm IV

•Using the chain rule of calculus we can write:

∂E(n)/∂w_ji(n) = [∂E(n)/∂e_j(n)] [∂e_j(n)/∂y_j(n)] [∂y_j(n)/∂v_j(n)] [∂v_j(n)/∂w_ji(n)]

•We calculate the different partial derivatives as follows:

∂E(n)/∂e_j(n) = e_j(n)

∂e_j(n)/∂y_j(n) = -1


Learning Algorithm V

•And,

∂y_j(n)/∂v_j(n) = φ_j'(v_j(n))

∂v_j(n)/∂w_ji(n) = y_i(n)

•Combining all the previous equations we finally get:

Δw_ji(n) = -η ∂E(n)/∂w_ji(n) = η e_j(n) φ_j'(v_j(n)) y_i(n)


Learning Algorithm VI

•The equation regarding the weight corrections can be written as:

Δw_ji(n) = η δ_j(n) y_i(n)

where δ_j(n) is defined as the local gradient and is given by:

δ_j(n) = -∂E(n)/∂v_j(n) = -[∂E(n)/∂e_j(n)] [∂e_j(n)/∂y_j(n)] [∂y_j(n)/∂v_j(n)] = e_j(n) φ_j'(v_j(n))

•We need to distinguish two cases:
• j is an output neuron
• j is a hidden neuron


Learning Algorithm VII

•Thus the Back-Propagation algorithm is an error-correction algorithm for supervised learning.

•If j is an output neuron, we already have a definition of e_j(n), so δ_j(n) is defined (after substitution) as:

δ_j(n) = (d_j(n) - y_j(n)) φ_j'(v_j(n))

•If j is a hidden neuron then δ_j(n) is defined as:

δ_j(n) = -[∂E(n)/∂y_j(n)] [∂y_j(n)/∂v_j(n)] = -[∂E(n)/∂y_j(n)] φ_j'(v_j(n))


Learning Algorithm VIII

•To calculate the partial derivative of E(n) wrt y_j(n), we recall the definition of E(n) and change the index for the output neurons to k, i.e.

E(n) = (1/2) Σ_{k∈C} e_k²(n)

•Then we have:

∂E(n)/∂y_j(n) = Σ_{k∈C} e_k(n) ∂e_k(n)/∂y_j(n)


Learning Algorithm IX

•We use the chain rule of differentiation again to get the partial derivative of e_k(n) wrt y_j(n):

∂E(n)/∂y_j(n) = Σ_{k∈C} e_k(n) [∂e_k(n)/∂v_k(n)] [∂v_k(n)/∂y_j(n)]

•Remembering the definition of e_k(n) we have:

e_k(n) = d_k(n) - y_k(n) = d_k(n) - φ_k(v_k(n))

•Hence:

∂e_k(n)/∂v_k(n) = -φ_k'(v_k(n))


Learning Algorithm X

•The local field v_k(n) is defined as:

v_k(n) = Σ_{j=0}^{m} w_kj(n) y_j(n)

where m is the number of neurons (from the previous layer) which connect to neuron k. Thus we get:

∂v_k(n)/∂y_j(n) = w_kj(n)

•Hence:

∂E(n)/∂y_j(n) = -Σ_{k∈C} e_k(n) φ_k'(v_k(n)) w_kj(n) = -Σ_{k∈C} δ_k(n) w_kj(n)


Learning Algorithm XI

•Putting it all together, we find for the local gradient of a hidden neuron j the following formula:

δ_j(n) = φ_j'(v_j(n)) Σ_{k∈C} δ_k(n) w_kj(n)

•It is useful to remember the special form of the derivatives of the logistic and hyperbolic tangent sigmoids:
• φ_j'(v_j(n)) = y_j(n) [1 - y_j(n)]  (Logistic)
• φ_j'(v_j(n)) = [1 - y_j(n)] [1 + y_j(n)]  (Hyp. Tangent)
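
A small sketch of the two local-gradient cases for the logistic sigmoid, using the derivative shortcut φ'(v_j) = y_j(1 - y_j) from the slide above; the array shapes and values are assumptions for illustration:

    import numpy as np

    def delta_output(d, y):
        # delta_j = (d_j - y_j) * y_j * (1 - y_j)   (output neurons, logistic units)
        return (d - y) * y * (1.0 - y)

    def delta_hidden(y_hidden, delta_next, w_next):
        # delta_j = y_j * (1 - y_j) * sum_k delta_k * w_kj   (hidden neurons)
        # w_next[k, j] is the weight from hidden neuron j to next-layer neuron k
        return y_hidden * (1.0 - y_hidden) * (w_next.T @ delta_next)

    d_out = delta_output(np.array([1.0]), np.array([0.7]))
    print(delta_hidden(np.array([0.4, 0.9]), d_out, np.array([[0.2, -0.5]])))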


Summary of BP Algorithm

1. Initialisation: Assuming that no prior information is available, pick the synaptic weights and thresholds from a uniform distribution whose mean is zero and whose variance is chosen to make the std of the local fields of the neurons lie at the transition between the linear and saturated parts of the sigmoid function.
2. Presentation of training examples: Present the network with an epoch of training examples. For each example in the set, perform the sequence of forward and backward computations described in points 3 & 4 below.


Summary of BP Algorithm II

3. Forward Computation:
• Let the training example in the epoch be denoted by (x(n), d(n)), where x is the input vector and d is the desired vector.
• Compute the local fields by proceeding forward through the network layer by layer. The local field for neuron j at layer l is defined as:

v_j^(l)(n) = Σ_{i=0}^{m} w_ji^(l)(n) y_i^(l-1)(n)

where m is the number of neurons which connect to j, y_i^(l-1)(n) is the activation of neuron i at layer (l-1), and w_ji^(l)(n) is the weight


Summary of BP Algorithm III

which connects the neurons j and i.
• For i = 0, we have y_0^(l-1)(n) = +1 and w_j0^(l)(n) = b_j^(l)(n), the bias of neuron j.
• Assuming a sigmoid function, the output signal of neuron j is:

y_j^(l)(n) = φ_j(v_j^(l)(n))

• If j is in the input layer we simply set:

y_j^(0)(n) = x_j(n)

where x_j(n) is the jth component of the input vector x.


Summary of BP Algorithm IV

• If j is in the output layer we have:

y_j^(L)(n) = o_j(n)

where o_j(n) is the jth component of the output vector o and L is the total number of layers in the network.
• Compute the error signal:

e_j(n) = d_j(n) - o_j(n)

where d_j(n) is the desired response for the jth element.


Summary of BP Algorithm V

4. Backward Computation:
• Compute the δs of the network, defined by:

δ_j^(L)(n) = e_j(n) φ_j'(v_j^(L)(n))                           for neuron j in output layer L
δ_j^(l)(n) = φ_j'(v_j^(l)(n)) Σ_k δ_k^(l+1)(n) w_kj^(l+1)(n)   for neuron j in hidden layer l

where φ_j'(•) is the derivative of the function φ_j wrt its argument.
• Adjust the weights using the generalised delta rule:

w_ji^(l)(n+1) = w_ji^(l)(n) + α Δw_ji^(l)(n-1) + η δ_j^(l)(n) y_i^(l-1)(n)

where α is the momentum constant, η is the learning rate and Δw_ji^(l)(n-1) is the weight correction of the previous iteration.


Summary of BP Algorithm VI

5. Iteration: Iterate the forward and backward computations of steps 3 & 4 by presenting new epochs of training examples until the stopping criterion is met.

• The order of presentation of examples should be randomised from epoch to epoch.
• The momentum and learning rate parameters are typically changed (usually decreased) as the number of training iterations increases.
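
The five steps above can be condensed into a short online-mode sketch for a single-hidden-layer MLP with logistic units and momentum. The layer sizes, learning rate, momentum constant and toy data are illustrative assumptions only, not values from the slides:

    import numpy as np

    rng = np.random.default_rng(0)

    def logistic(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Step 1: initialisation (small zero-mean weights; first column holds the bias)
    m0, m1, M = 2, 4, 1                      # input, hidden, output sizes (assumed)
    W1 = rng.uniform(-0.5, 0.5, (m1, m0 + 1))
    W2 = rng.uniform(-0.5, 0.5, (M, m1 + 1))
    dW1_prev = np.zeros_like(W1)
    dW2_prev = np.zeros_like(W2)
    eta, alpha = 0.5, 0.9                    # learning rate and momentum (assumed)

    X = rng.uniform(0, 1, (20, m0))          # toy data (assumed)
    D = (X.sum(axis=1, keepdims=True) > 1).astype(float)

    for epoch in range(100):                 # Step 5: iterate over epochs
        for x, d in zip(X, D):               # Step 2: present each example
            # Step 3: forward computation (y_0 = +1 feeds the bias weight)
            y0 = np.concatenate(([1.0], x))
            y1 = logistic(W1 @ y0)
            y1b = np.concatenate(([1.0], y1))
            o = logistic(W2 @ y1b)
            e = d - o
            # Step 4: backward computation (deltas, then generalised delta rule)
            delta2 = e * o * (1 - o)
            delta1 = y1 * (1 - y1) * (W2[:, 1:].T @ delta2)
            dW2 = alpha * dW2_prev + eta * np.outer(delta2, y1b)
            dW1 = alpha * dW1_prev + eta * np.outer(delta1, y0)
            W2 += dW2
            W1 += dW1
            dW2_prev, dW1_prev = dW2, dW1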


Stopping Criteria

• The BP algorithm is considered to have converged when the Euclidean norm of the gradient vector reaches a sufficiently small gradient threshold.
• The BP algorithm is considered to have converged when the absolute value of the change in the average squared error per epoch is sufficiently small.


XOR Example

• The XOR problem is defined by the following truth table:

x1  x2 | x1 XOR x2
 0   0 |     0
 0   1 |     1
 1   0 |     1
 1   1 |     0

• The following network solves the problem; a single-layer perceptron could not do this. (We use the Sgn function.)

(The network diagram from the original slide is not preserved in this extraction.)
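
Since the network diagram is missing from this extraction, here is one possible 2-2-1 threshold network that solves XOR. The specific weights and thresholds are a standard textbook choice, and a hard-limiting step unit stands in for the slide's Sgn function; they are assumptions, not necessarily the values on the original slide:

    import numpy as np

    def step(v):
        # hard-limiting unit (used here in place of the slide's Sgn function)
        return (v >= 0).astype(float)

    def xor_net(x1, x2):
        x = np.array([1.0, x1, x2])                          # leading 1 feeds the bias weights
        h1 = step(np.dot([-0.5, 1.0, 1.0], x))               # OR-like hidden unit
        h2 = step(np.dot([-1.5, 1.0, 1.0], x))               # AND-like hidden unit
        y = step(np.dot([-0.5, 1.0, -1.0], [1.0, h1, h2]))   # output: h1 AND NOT h2
        return int(y)

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_net(a, b))                       # reproduces the XOR truth table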


Heuristics for Back-Propagation

• To speed up the convergence of the back-propagation algorithm the following heuristics are applied:
• H1: Use sequential (online) rather than batch updates
• H2: Maximise information content
  • Use examples that produce the largest error
  • Use examples which are very different from all the previous ones
• H3: Use an antisymmetric activation function, such as the hyperbolic tangent. Antisymmetric means:

φ(-x) = -φ(x)


Heuristics for Back-Propagation II

• H4: Use target values inside a smaller range, offset from the asymptotic values of the sigmoid

• H5: Normalise the inputs:
  • Create zero-mean variables
  • Decorrelate the variables
  • Scale the variables to have approximately equal covariances

• H6: Initialise the weights properly. Use a zero-mean distribution whose standard deviation is

σ_w = m^(-1/2)


Heuristics for Back-Propagation III

where m is the number of connections arriving at a neuron.
• H7: Learn from hints
• H8: Adapt the learning rates appropriately (see next section)
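
A sketch of heuristics H5 and H6: zero-mean, decorrelated, roughly equal-scale inputs, and zero-mean weights with standard deviation m^(-1/2). The whitening recipe and array shapes are assumptions for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    def normalise_inputs(X):
        # H5: zero-mean, decorrelated, roughly equal-variance inputs (PCA whitening sketch)
        Xc = X - X.mean(axis=0)
        cov = np.cov(Xc, rowvar=False)
        eigval, eigvec = np.linalg.eigh(cov)
        return Xc @ eigvec / np.sqrt(eigval + 1e-12)

    def init_weights(n_out, n_in):
        # H6: zero-mean weights with std sigma_w = n_in ** -0.5 (bias column included)
        sigma = n_in ** -0.5
        return rng.normal(0.0, sigma, size=(n_out, n_in + 1))

    X = rng.uniform(0, 10, size=(100, 3))    # toy inputs (assumed)
    Xn = normalise_inputs(X)
    W1 = init_weights(5, Xn.shape[1])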


Heuristics for Learning Rate

• R1: Every adjustable parameter should have its own learning rate
• R2: Every learning rate should be allowed to adjust from one iteration to the next
• R3: When the derivative of the cost function wrt a weight has the same algebraic sign for several consecutive iterations of the algorithm, the learning rate for that particular weight should be increased.
• R4: When the algebraic sign of the derivative above alternates for several consecutive iterations of the algorithm, the learning rate should be decreased.
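
A minimal sketch of R1-R4 in the spirit of a delta-bar-delta style rule: each weight keeps its own learning rate, which grows while the gradient sign stays the same and shrinks when it alternates. The increase and decrease factors are arbitrary assumptions:

    import numpy as np

    def adapt_learning_rates(etas, grad, grad_prev, up=1.05, down=0.7):
        # R3: same sign of dE/dw as before -> increase that weight's learning rate
        # R4: sign alternates -> decrease it
        same_sign = np.sign(grad) == np.sign(grad_prev)
        return np.where(same_sign, etas * up, etas * down)

    # usage inside a training loop (grad = dE/dw for the current step):
    #   etas = adapt_learning_rates(etas, grad, grad_prev)
    #   w -= etas * grad
    #   grad_prev = grad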


Approximation of Functions

•Q: What is the minimum number of hidden layers in an MLP that provides an approximate realisation of any continuous mapping?

•A: Universal Approximation Theorem
Let φ(•) be a nonconstant, bounded, and monotone increasing continuous function. Let I_m0 denote the m0-dimensional unit hypercube [0,1]^m0. The space of continuous functions on I_m0 is denoted by C(I_m0). Then, given any function f ∈ C(I_m0) and ε > 0, there exists an integer m1 and sets of real constants a_i, b_i and w_ij, where i = 1, …, m1 and j = 1, …, m0, such that we may


Approximation of Functions II

define:

F(x_1, …, x_m0) = Σ_{i=1}^{m1} a_i φ( Σ_{j=1}^{m0} w_ij x_j + b_i )

as an approximate realisation of the function f(•); that is:

|F(x_1, …, x_m0) - f(x_1, …, x_m0)| < ε

for all x_1, …, x_m0 that lie in the input space.
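
The approximating function F of the theorem is exactly a single-hidden-layer MLP with a linear output layer; a sketch, with the constants a_i, b_i, w_ij drawn at random only so the example runs:

    import numpy as np

    rng = np.random.default_rng(0)

    def logistic(x):
        return 1.0 / (1.0 + np.exp(-x))

    def F(x, a, W, b, phi=logistic):
        # F(x) = sum_i a_i * phi( sum_j w_ij * x_j + b_i )
        return a @ phi(W @ x + b)

    m0, m1 = 3, 10                    # input dimension and number of hidden units (assumed)
    a, b = rng.normal(size=m1), rng.normal(size=m1)
    W = rng.normal(size=(m1, m0))
    print(F(rng.uniform(0, 1, m0), a, W, b))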


Approximation of Functions III

•The Universal Approximation Theorem is directly applicable to MLPs. Specifically:
• The sigmoid functions cover the requirements for the function φ
• The network has m0 input nodes and a single hidden layer consisting of m1 neurons; the inputs are denoted by x_1, …, x_m0
• Hidden neuron i has synaptic weights w_i1, …, w_im0 and bias b_i
• The network output is a linear combination of the outputs of the hidden neurons, with a_1, …, a_m1 defining the synaptic weights of the output layer

Approximation of Functions IV

•The theorem is an existence theorem: it does not tell us exactly what the number m1 is; it just says that it exists!
•The theorem states that a single hidden layer is sufficient for an MLP to compute a uniform ε approximation to a given training set represented by the set of inputs x_1, …, x_m0 and a desired output f(x_1, …, x_m0).
•The theorem does not say, however, that a single hidden layer is optimum in the sense of learning time, ease of implementation or generalisation.


Approximation of Functions V

•Empirical knowledge shows that the number of data pairs needed in order to achieve a given error level ε is:

N = O(W / ε)

where W is the total number of adjustable parameters of the model. There is mathematical support for this observation (but we will not analyse it further). For example, a model with W = 1,000 adjustable parameters and a target error level of ε = 0.1 would need on the order of N ≈ 10,000 examples.
•There is a “curse of dimensionality” when approximating functions in high-dimensional spaces.
•It is theoretically justified to use two hidden layers.


Generalisation

Def: A network generalises well when the input-output mapping computed by the network is correct (or nearly so) for test data never used in creating or training the network. It is assumed that the test data are drawn from the same population used to generate the training data.

•To achieve generalisation, we should try to approximate the true mechanism that generates the data, not the specific structure of the data. If we learn the specific structure of the data we have overfitting or overtraining.


Generalisation II

(Figure slide; the graphic is not preserved in this extraction.)


Generalisation III

•To achieve good generalisation we need:
• To have good data (see previous slides)
• To impose smoothness constraints on the function
• To add knowledge we have about the mechanism
• To reduce / constrain the model parameters:
  • Through cross-validation
  • Through regularisation (pruning, AIC, BIC, etc.)


Cross Validation

•In the cross-validation method for model selection we split the training data into two sets:
• Estimation set
• Validation set
•We train our model on the estimation set.
•We evaluate the performance on the validation set.
•We select the model which performs “best” on the validation set.


Cross Validation II

•There are variations of the method depending on the partition of the validation set. Typical variants are:
• Method of early stopping
• Leave k-out


Method of Early Stopping

•Apply the method of early stopping when the number of data pairs, N, is less than 30W (N < 30W), where W is the number of free parameters in the network.
•Assume that r is the fraction of the training set allocated to the estimation subset (the remaining fraction 1 - r is used for validation). It can be shown that the optimal value of this parameter is given by:

r_opt = 1 - (√(2W - 1) - 1) / (2(W - 1))

•The method works as follows:
• Train the network in the usual way using the data in the estimation set


Method of Early Stopping II

• After a period of estimation, the weights and bias levels of the MLP are all fixed and the network is operated in its forward mode only. The validation error is measured for each example in the validation subset.
• When the validation phase is completed, the estimation is resumed for another period (e.g. 10 epochs) and the process is repeated.
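
A sketch of the early-stopping schedule described above, together with the r_opt formula. The train_epochs and validation_error callables, the patience rule and the 10-epoch period are assumptions supplied for illustration:

    import math

    def optimal_estimation_fraction(W):
        # r_opt = 1 - (sqrt(2W - 1) - 1) / (2 (W - 1)): fraction kept for estimation
        return 1.0 - (math.sqrt(2 * W - 1) - 1) / (2 * (W - 1))

    def train_with_early_stopping(train_epochs, validation_error,
                                  max_epochs=1000, period=10, patience=5):
        best_err, best_state, bad_checks = float("inf"), None, 0
        for _ in range(0, max_epochs, period):
            state = train_epochs(period)       # estimation phase for one period
            err = validation_error(state)      # forward-mode pass over the validation set
            if err < best_err:
                best_err, best_state, bad_checks = err, state, 0
            else:
                bad_checks += 1
                if bad_checks >= patience:     # validation error keeps rising: stop
                    break
        return best_state, best_err

    print(optimal_estimation_fraction(100))    # roughly 0.93 for W = 100 free parameters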


Leave k-out Validation

•We divide the set of available examples into K subsets.
•The model is trained on all the subsets except for one, and the validation error is measured by testing it on the subset left out.
•The procedure is repeated for a total of K trials, each time using a different subset for validation.
•The performance of the model is assessed by averaging the squared error under validation over all the trials of the experiment.
•There is a limiting case, K = N, in which case the method is called leave-one-out.
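
A short sketch of the K-fold (leave k-out) procedure just described; train_model and squared_error are placeholders for whatever estimator and error measure are being validated:

    import numpy as np

    def k_fold_validation(X, D, K, train_model, squared_error):
        # Split the N available examples into K subsets and rotate the one left out.
        folds = np.array_split(np.random.permutation(len(X)), K)
        errors = []
        for k in range(K):
            val_idx = folds[k]
            train_idx = np.concatenate([folds[i] for i in range(K) if i != k])
            model = train_model(X[train_idx], D[train_idx])
            errors.append(squared_error(model, X[val_idx], D[val_idx]))
        return np.mean(errors)   # average validation error over the K trials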


Leave k-out Validation II

•An example with K = 4 was shown on this slide; the partition diagram is not preserved in this extraction.


Network Pruning

•To solve real-world problems we need to reduce the free parameters of the model. We can achieve this objective in one of two ways:
• Network growing: in this case we start with a small MLP and then add a new neuron or a layer of hidden neurons only when we are unable to achieve the performance level we want
• Network pruning: in this case we start with a large MLP with adequate performance for the problem at hand, and then we prune it by weakening or eliminating certain weights in a principled manner


Network Pruning II

•Pruning can be implemented as a form of regularisation.


Regularisation

•In model selection we need to balance two needs:
• To achieve good performance, which usually leads to a complex model
• To keep the complexity of the model manageable, due to practical estimation difficulties and the overfitting phenomenon
•A principled approach to counterbalancing both needs is given by regularisation theory.
•In this theory we assume that the estimation of the model takes place using the usual cost function plus a second term which is called the complexity penalty:


Regularisation II

R(w) = E_s(w) + λ E_c(w)

where R is the total cost function, E_s is the standard performance measure, E_c is the complexity penalty and λ > 0 is a regularisation parameter.

•Typically one imposes smoothness constraints as the complexity term, i.e. we want to co-minimise the smoothing integral of the kth order:

E_c(w, k) = (1/2) ∫ || ∂^k F(x, w) / ∂x^k ||² μ(x) dx

where F(x, w) is the function performed by the model and μ(x) is some weighting function which determines


Regularisation III

the region of the input space where the function F(x, w) is required to be smooth.


Regularisation IV

•Other complexity penalty options include:
• Weight Decay:

E_c(w) = ||w||² = Σ_{i=1}^{W} w_i²

where W is the total number of all free parameters in the model
• Weight Elimination:

E_c(w) = Σ_{i=1}^{W} (w_i / w_0)² / (1 + (w_i / w_0)²)

where w_0 is a pre-assigned parameter
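
Both penalties, and the gradient terms they add to the weight update, are easy to write down; a sketch with an arbitrarily chosen w_0:

    import numpy as np

    def weight_decay_penalty(w):
        # E_c(w) = sum_i w_i^2 ; gradient 2 w_i
        return np.sum(w ** 2), 2.0 * w

    def weight_elimination_penalty(w, w0=1.0):
        # E_c(w) = sum_i (w_i/w0)^2 / (1 + (w_i/w0)^2)
        r2 = (w / w0) ** 2
        grad = (2.0 * w / w0 ** 2) / (1.0 + r2) ** 2
        return np.sum(r2 / (1.0 + r2)), grad

    # The regularised cost is R(w) = E_s(w) + lambda * E_c(w), so during training the
    # complexity gradient (scaled by lambda) is simply added to the performance gradient.
    w = np.array([0.1, -2.0, 0.5])
    print(weight_decay_penalty(w)[0], weight_elimination_penalty(w)[0])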


Regularisation V

•There are other methods which base the decision on which weights to eliminate on the Hessian, H.
•For example:
• The optimal brain damage procedure (OBD)
• The optimal brain surgeon procedure (OBS)
• In these procedures a weight w_i is eliminated when its saliency is small compared with the average squared error, i.e. when:

S_i ≪ E_av

where the saliency S_i is defined as:

S_i = w_i² / (2 [H^(-1)]_{i,i})
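
A sketch of the saliency computation used by OBS/OBD-style pruning; the Hessian is passed in directly here, whereas in practice it (or an approximation of its inverse) has to be estimated from the trained network:

    import numpy as np

    def obs_saliencies(w, H):
        # S_i = w_i^2 / (2 [H^-1]_{i,i})
        H_inv = np.linalg.inv(H)
        return w ** 2 / (2.0 * np.diag(H_inv))

    def prune_candidates(w, H, E_av):
        # candidate weights whose removal is expected to cost less than the current error
        S = obs_saliencies(w, H)
        return np.where(S < E_av)[0]          # indices of weights that may be eliminated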


Conjugate-Gradient Method

•The conjugate-gradient method is a 2nd-order optimisation method, i.e. we assume that we can approximate the cost function up to second degree by its Taylor series:

f(x) = (1/2) x^T A x - b^T x + c

where A and b are an appropriate matrix and vector and x is a W-by-1 vector.

•We can find the minimum point by solving the equation:

x* = A^(-1) b


Conjugate-Gradient Method II

•Given the matrix A, we say that a set of nonzero vectors s(0), …, s(W-1) is A-conjugate if the following condition holds:

s^T(n) A s(j) = 0,  for all n and j with n ≠ j

•If A is the identity matrix, conjugacy is the same as orthogonality.

•A-conjugate vectors are linearly independent.


Summary of the Conjugate-Gradient Method

1. Initialisation: Unless prior knowledge on the weight vector w is available, choose the initial value w(0) using a procedure similar to the ones used for the BP algorithm.
2. Computation:
  1. For w(0), use BP to compute the gradient vector g(0).
  2. Set s(0) = r(0) = -g(0).
  3. At time step n, use a line search to find η(n) that sufficiently minimises E_av(η), i.e. the cost function E_av expressed as a function of η for fixed values of w and s.


Summary of the Conjugate-Gradient Method II

  4. Test to determine whether the Euclidean norm of the residual r(n) has fallen below a specified value, that is, a small fraction of the initial value ||r(0)||.
  5. Update the weight vector:

     w(n+1) = w(n) + η(n) s(n)

  6. For w(n+1), use BP to compute the updated gradient vector g(n+1).
  7. Set r(n+1) = -g(n+1).
  8. Use the Polak-Ribiere formula to calculate β(n+1):

     β(n+1) = max{ r^T(n+1) [r(n+1) - r(n)] / (r^T(n) r(n)), 0 }


Summary of the Conjugate-Gradient Method III

  9. Update the direction vector:

     s(n+1) = r(n+1) + β(n+1) s(n)

  10. Set n = n+1 and go to step 3.
3. Stopping Criterion: Terminate the algorithm when the following condition is satisfied:

   ||r(n)|| ≤ ε ||r(0)||

where ε is a prescribed small number.
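
The procedure above, collected into a sketch for a generic cost function. The grid-based line search is a crude stand-in for the line minimisation of step 3, and the quadratic at the end is an assumed toy example, not something from the slides:

    import numpy as np

    def line_search(cost, w, s, etas=2.0 ** -np.arange(20)):
        # crude substitute for the line minimisation of E_av(eta): best value on a grid
        vals = [cost(w + eta * s) for eta in etas]
        best = int(np.argmin(vals))
        return etas[best] if vals[best] < cost(w) else 0.0

    def conjugate_gradient(cost, grad, w0, eps=1e-3, max_iter=500):
        w = np.asarray(w0, dtype=float)
        r = -grad(w)                                   # steps 2.1-2.2: s(0) = r(0) = -g(0)
        s = r.copy()
        r0_norm = np.linalg.norm(r)
        for _ in range(max_iter):
            eta = line_search(cost, w, s)              # step 2.3: line search for eta(n)
            w = w + eta * s                            # step 2.5: weight update
            r_new = -grad(w)                           # steps 2.6-2.7: new residual
            if np.linalg.norm(r_new) <= eps * r0_norm: # steps 2.4 / 3: stopping criterion
                return w
            beta = max(r_new @ (r_new - r) / (r @ r), 0.0)   # step 2.8: Polak-Ribiere
            s = r_new + beta * s                       # step 2.9: new direction
            r = r_new
        return w

    # usage on the quadratic f(x) = 0.5 x^T A x - b^T x (an assumed toy example)
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    b = np.array([1.0, 1.0])
    x_min = conjugate_gradient(lambda x: 0.5 * x @ A @ x - b @ x,
                               lambda x: A @ x - b,
                               np.zeros(2))
    print(x_min, np.linalg.solve(A, b))                # both should be close to A^-1 b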


Advantages & Disadvantages

•MLP and BP is used in Cognitive and Computational


Contents Neuroscience modelling but still the algorithm does not
MLP Model have real neuro-physiological support
•The algorithm can be used to make encoding /
BP Algorithm
decoding and compression systems. Useful for data
Approxim. pre-processing operations
Model Selec. •The MLP with the BP algorithm is a universal
approximator of functions
BP & Opt.
•The algorithm is computationally efficient as it has
Conclusions O(W) complexity to the model parameters
•The algorithm has “local” robustness
•The convergence of the BP can be very slow,
especially in large problems, depending on the method

CS 476: Networks of Neural Computation, CSD, UOC, 2009


Advantages & Disadvantages II

•The BP algorithm suffers from the problem of local


Contents minima
MLP Model

BP Algorithm
Approxim.

Model Selec.
BP & Opt.

Conclusions

CS 476: Networks of Neural Computation, CSD, UOC, 2009


Types of problems

•The BP algorithm is used in a great variety of problems:
• Time series prediction
• Credit risk assessment
• Pattern recognition
• Speech processing
• Cognitive modelling
• Image processing
• Control
• Etc.
•BP is the standard algorithm against which all other NN algorithms are compared!
