
Unit 3: Adaptive Networks

 Introduction (8.1)
 Architecture (8.2)
 Backpropagation for Feedforward Networks (8.3)

Jyh-Shing Roger Jang et al., Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence, First Edition, Prentice Hall, 1997

Introduction (8.1)

 Almost all neural network paradigms with supervised learning capabilities are unified through the framework of adaptive networks

 Nodes are processing units; causal relationships between the connected nodes are expressed by links

 All or some of the nodes are adaptive, which means that the outputs of these nodes are driven by modifiable parameters pertaining to these nodes
Introduction (8.1) (cont.)

 A learning rule specifies how these parameters (or weights) should be updated to minimize a predefined error measure

 The error measure computes the discrepancy between the network's actual output and a desired output

 The steepest descent method is used as the basic learning rule; it is also called backpropagation

Architecture (8.2)

 Definition: An adaptive network is a network structure whose overall input-output behavior is determined by a collection of modifiable parameters

 The configuration of an adaptive network is composed of a set of nodes connected by directed links

 Each node performs a static node function on its incoming signals to generate a single node output

 Each link specifies the direction of signal flow from one node to another

A feedforward adaptive network in layered representation

Example 1: Parameter sharing in an adaptive network

A single node

Parameter sharing problem

Architecture (8.2) (cont.)

 There are 2 types of adaptive networks

– Feedforward (acyclic directed graph, such as in Fig. 8.1)

– Recurrent (contains at least one directed cycle)

A recurrent adaptive network

Architecture (8.2) (cont.)

 Static Mapping

– A feedforward adaptive network is a static mapping between its input and output spaces

– This mapping may be linear or highly nonlinear

– Our goal is to construct a network for achieving a desired nonlinear mapping
Architecture (8.2) (cont.)

 Static Mapping (cont.)

– This nonlinear mapping is regulated by a data set consisting of desired input-output pairs of a target system to be modeled; this data set is called the training data set

– The procedures that adjust the parameters to improve the network's performance are called learning rules
Architecture (8.2) (cont.)

 Static Mapping (cont.)

– Example 2: An adaptive network with a single linear node

$$x_3 = f_3(x_1, x_2; a_1, a_2, a_3) = a_1 x_1 + a_2 x_2 + a_3$$

where x1, x2 are the inputs and a1, a2, a3 are modifiable parameters

A linear single-node adaptive network
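As a quick illustration of Example 2, here is a minimal Python sketch of the single linear node; the function name, parameter values, and inputs are illustrative assumptions, not from the slides:

def f3(x1, x2, a1, a2, a3):
    """Single linear adaptive node: x3 = a1*x1 + a2*x2 + a3."""
    return a1 * x1 + a2 * x2 + a3

# With parameters a1 = 2, a2 = -1, a3 = 0.5 and inputs x1 = 1, x2 = 3,
# the node output is 2*1 + (-1)*3 + 0.5 = -0.5.
print(f3(1, 3, a1=2, a2=-1, a3=0.5))  # -0.5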

Architecture (8.2) (cont.)

 Static Mapping (cont.)

– Identification of the parameters can be performed through the linear least-squares estimation method of chapter 5

– Example 3: Perceptron network

$$x_3 = f_3(x_1, x_2; a_1, a_2, a_3) = a_1 x_1 + a_2 x_2 + a_3$$

and

$$x_4 = f_4(x_3) = \begin{cases} 1 & \text{if } x_3 \ge 0 \\ 0 & \text{if } x_3 < 0 \end{cases}$$

(f4 is called a step function)

A nonlinear single-node adaptive network
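A minimal Python sketch of this perceptron node from Example 3; the parameter values below are an illustrative assumption (they happen to implement a logical AND of binary inputs):

def perceptron(x1, x2, a1, a2, a3):
    """Linear node followed by a step function: x4 = 1 if x3 >= 0 else 0."""
    x3 = a1 * x1 + a2 * x2 + a3      # linear node of Example 2
    return 1 if x3 >= 0 else 0       # step function f4

print(perceptron(1, 1, 1, 1, -1.5))  # 1
print(perceptron(1, 0, 1, 1, -1.5))  # 0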

Architecture (8.2) (cont.)

 The step function is discontinuous at one point (the origin) and flat at all other points, so it is not suitable for derivative-based learning procedures ⇒ use of a sigmoidal function

 The sigmoidal function takes values between 0 and 1 and is expressed as:

$$x_4 = f_4(x_3) = \frac{1}{1 + e^{-x_3}}$$

 The sigmoidal function is the building block of the multilayer perceptron
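For concreteness, a one-line Python version of this sigmoidal node function:

import math

def sigmoid(x3):
    """Sigmoidal node function: values in (0, 1), differentiable everywhere."""
    return 1.0 / (1.0 + math.exp(-x3))

print(sigmoid(0.0))  # 0.5
print(sigmoid(5.0))  # about 0.993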
Architecture (8.2) (cont.)

 Example 4: A multilayer perceptron

$$x_7 = \frac{1}{1 + \exp\left[-\left(w_{4,7}\, x_4 + w_{5,7}\, x_5 + w_{6,7}\, x_6 + t_7\right)\right]}$$

where x4, x5 and x6 are the outputs of nodes 4, 5 and 6 respectively, and the set of parameters of node 7 is {w4,7, w5,7, w6,7, t7}

A 3-3-2 neural network
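A short Python sketch of the output of node 7 in this 3-3-2 network (Example 4); the weight and input values are illustrative assumptions:

import math

def node7_output(x4, x5, x6, w47, w57, w67, t7):
    """Sigmoidal node 7 with parameter set {w4,7, w5,7, w6,7, t7}."""
    s = w47 * x4 + w57 * x5 + w67 * x6 + t7   # weighted sum of incoming signals
    return 1.0 / (1.0 + math.exp(-s))         # sigmoidal node function

print(node7_output(0.2, 0.7, 0.1, w47=0.5, w57=-1.0, w67=2.0, t7=0.3))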

Backpropagation for Feedforward Networks (8.3)

 Basic learning rule for adaptive networks

 It is a steepest descent-based method discussed in chapter 6

 It is a recursive computation of the gradient vector, in which each element is the derivative of an error measure with respect to a parameter

 The procedure of finding a gradient vector in a network structure is referred to as backpropagation because the gradient vector is calculated in the direction opposite to the flow of the output of each node
Backpropagation for Feedforward Networks (8.3) (cont.)

 Once the gradient is computed, regression techniques are used to update the parameters (the weights on the links)

 Notation:

Assume the layers are labeled l = 0, 1, …, L, where layer L is the output layer

– N(l) represents the number of nodes in layer l
– xl,i represents the output of node i in layer l (i = 1, …, N(l))
– fl,i represents the function of node i in layer l
Backpropagation for Feedforward Networks (8.3) (cont.)

 Principle

– Since the output of a node depends on the incoming signals and the parameter set of the node, we can write:

$$x_{l,i} = f_{l,i}\left(x_{l-1,1},\, x_{l-1,2},\, \ldots,\, x_{l-1,N(l-1)},\, \alpha,\, \beta,\, \gamma,\, \ldots\right)$$

where α, β, γ, … are the parameters of this node.

A layered representation
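To make the layered representation and the node-output relation above concrete, here is a minimal Python sketch of a forward pass in which every node applies its function to all outputs of the previous layer; the data structures and parameter values are illustrative assumptions, not from the slides:

def forward_pass(inputs, layers):
    """Propagate signals through a layered adaptive network.

    inputs: list of input values (the layer-0 outputs).
    layers: list of layers; each layer is a list of node functions, each
            taking the full list of previous-layer outputs.
    """
    x_prev = list(inputs)                     # outputs of layer l-1
    for layer in layers:                      # layers 1 .. L
        x_prev = [f(x_prev) for f in layer]   # x_{l,i} = f_{l,i}(x_{l-1,1}, ..., x_{l-1,N(l-1)})
    return x_prev                             # outputs of layer L

# Example: two linear nodes feeding one linear output node (parameters fixed inline).
layer1 = [lambda x: 0.5 * x[0] + 0.2 * x[1], lambda x: -0.3 * x[0] + 0.9 * x[1]]
layer2 = [lambda x: x[0] + 2.0 * x[1]]
print(forward_pass([1.0, 2.0], [layer1, layer2]))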

Backpropagation for Feedforward Networks (8.3) (cont.)

 Principle (cont.)

– Let us assume that the training set has P patterns; we therefore define an error measure for the pth pattern as:

$$E_p = \sum_{k=1}^{N(L)} \left(d_k - x_{L,k}\right)^2$$

where dk is the k-th component of the pth desired output vector and xL,k is the k-th component of the actual output vector produced by presenting the pth input vector to the network

– The task is to minimize an overall error measure defined as:

$$E = \sum_{p=1}^{P} E_p$$
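A minimal Python sketch of these two error measures, assuming the desired and actual output vectors are given as plain lists (the function names are illustrative):

def pattern_error(desired, actual):
    """E_p: sum over the output nodes of (d_k - x_{L,k})^2."""
    return sum((d - x) ** 2 for d, x in zip(desired, actual))

def overall_error(desired_set, actual_set):
    """E: sum of E_p over the P training patterns."""
    return sum(pattern_error(d, x) for d, x in zip(desired_set, actual_set))

# Example with P = 2 patterns and N(L) = 2 output nodes:
print(pattern_error([1.0, 0.0], [0.8, 0.1]))                    # 0.05
print(overall_error([[1.0, 0.0], [0.0, 1.0]],
                    [[0.8, 0.1], [0.2, 0.7]]))                  # 0.05 + 0.13 = 0.18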
Backpropagation for Feedforward Networks (8.3) (cont.)

 Principle (cont.)

– To use steepest descent to minimize E, we have to compute ∇E (the gradient vector)

– Causal relationships:

change in a parameter α ⇒ change in the outputs of the nodes containing α ⇒ change in the network's output ⇒ change in the error measure
Backpropagation for Feedforward Networks (8.3) (cont.)

 Principle (cont.)

– Therefore, the basic concept in calculating the gradient vector is to pass a form of derivative information starting from the output layer and going backward layer by layer until the input layer is reached

– Let us define the error signal at node i in layer l as the ordered derivative

$$\varepsilon_{l,i} = \frac{\partial^{+} E_p}{\partial x_{l,i}}$$
Backpropagation for Feedforward Networks (8.3) (cont.)

 Principle (cont.)

– Example: Ordered derivatives and ordinary partial derivatives

Consider the following adaptive network

Ordered derivatives & ordinary partial derivatives

 z  g( x, y )

y  f ( x )
Dr. Djamel Bouchaffra CSE 513 Soft Computing, Ch8: Adaptive Networks
27
Backpropagation for Feedforward Networks (8.3) (cont.)

– For the ordinary partial derivative

$$\frac{\partial z}{\partial x} = \frac{\partial g(x, y)}{\partial x}$$

x and y are assumed independent, without paying attention to the fact that y = f(x)

– For the ordered derivative, we take the indirect causal relationship into account:

$$\frac{\partial^{+} z}{\partial x} = \frac{\partial g(x, f(x))}{\partial x} = \left.\frac{\partial g(x, y)}{\partial x}\right|_{y=f(x)} + \left.\frac{\partial g(x, y)}{\partial y}\right|_{y=f(x)} \cdot \frac{\partial f}{\partial x}$$

– The gradient vector is defined as the derivative of the error measure with respect to the parameter variables. If α is a parameter of the i-th node at layer l, we can write:

$$\frac{\partial^{+} E_p}{\partial \alpha} = \frac{\partial^{+} E_p}{\partial x_{l,i}} \cdot \frac{\partial f_{l,i}}{\partial \alpha} = \varepsilon_{l,i}\,\frac{\partial f_{l,i}}{\partial \alpha}$$

where εl,i is computed through the indirect causal relationship
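As a concrete illustration (a worked example not in the original slides, with functions chosen only for simplicity), take g(x, y) = x·y and y = f(x) = x²:

$$\frac{\partial z}{\partial x} = y \qquad \text{(ordinary partial derivative, treating } y \text{ as independent)}$$

$$\frac{\partial^{+} z}{\partial x} = \left.\frac{\partial g}{\partial x}\right|_{y=x^2} + \left.\frac{\partial g}{\partial y}\right|_{y=x^2} \cdot \frac{\partial f}{\partial x} = x^2 + x \cdot 2x = 3x^2$$

which agrees with differentiating z = x·x² = x³ directly.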
Backpropagation for Feedforward Networks (8.3) (cont.)

– The derivative of the overall error measure E with respect to α is:

$$\frac{\partial^{+} E}{\partial \alpha} = \sum_{p=1}^{P} \frac{\partial^{+} E_p}{\partial \alpha}$$

– Using a steepest descent scheme, the update for α is:

$$\alpha_{\text{next}} = \alpha_{\text{now}} - \eta\,\frac{\partial^{+} E}{\partial \alpha}$$

where η is the learning rate
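A minimal Python sketch of this steepest descent update, assuming the gradient of E with respect to each parameter has already been accumulated over all P patterns (the function name and values are illustrative):

def steepest_descent_update(params, gradients, eta=0.1):
    """alpha_next = alpha_now - eta * dE/dalpha, applied to every parameter."""
    return [alpha - eta * grad for alpha, grad in zip(params, gradients)]

# Example: two parameters and their accumulated gradients.
print(steepest_descent_update([0.5, -1.2], [0.8, -0.4], eta=0.1))  # [0.42, -1.16]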
Backpropagation for Feedforward Networks (8.3) (cont.)

– Example 8.6: An adaptive network and its error-propagation model

In order to calculate the error signals at internal nodes, an error-propagation network is built

Error propagation model

Backpropagation for Feedforward Networks (8.3) (cont.)

$$\varepsilon_9 = \frac{\partial^{+} E_p}{\partial x_9} = \frac{\partial E_p}{\partial x_9} = -2\,(d_9 - x_9)$$

$$\varepsilon_8 = \frac{\partial^{+} E_p}{\partial x_8} = \frac{\partial E_p}{\partial x_8} = -2\,(d_8 - x_8)$$

$$\varepsilon_7 = \frac{\partial^{+} E_p}{\partial x_7} = \frac{\partial^{+} E_p}{\partial x_8}\,\frac{\partial f_8}{\partial x_7} + \frac{\partial^{+} E_p}{\partial x_9}\,\frac{\partial f_9}{\partial x_7} = \varepsilon_8\,\frac{\partial f_8}{\partial x_7} + \varepsilon_9\,\frac{\partial f_9}{\partial x_7}$$

(in the error-propagation model, the factors ∂f8/∂x7 and ∂f9/∂x7 act as the link weights labeled w8,7 and w9,7)
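A minimal Python sketch of Example 8.6's backward computation of error signals for output nodes 8 and 9 and internal node 7, assuming the partial derivatives ∂f8/∂x7 and ∂f9/∂x7 are supplied; all names and values are illustrative:

def error_signals(d8, d9, x8, x9, df8_dx7, df9_dx7):
    """Backward pass of Example 8.6: epsilon_9 and epsilon_8, then epsilon_7."""
    eps9 = -2.0 * (d9 - x9)                 # error signal at output node 9
    eps8 = -2.0 * (d8 - x8)                 # error signal at output node 8
    eps7 = eps8 * df8_dx7 + eps9 * df9_dx7  # internal node 7, via the chain rule
    return eps7, eps8, eps9

print(error_signals(d8=1.0, d9=0.0, x8=0.7, x9=0.2, df8_dx7=0.5, df9_dx7=-0.3))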
Backpropagation for Feedforward Networks (8.3) (cont.)

– There are 2 types of learning algorithms:

• Batch learning (off-line learning): the update for α takes place only after the whole training data set has been presented (one epoch)

• On-line learning (pattern-by-pattern learning): the parameters α are updated immediately after each input-output pair has been presented
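The difference between the two modes is only where the update is applied, as in this Python sketch; gradient_for_pattern stands in for the backpropagation computations above, and all names are illustrative assumptions:

def train_batch(params, patterns, gradient_for_pattern, eta=0.1):
    """Batch (off-line) learning: accumulate gradients over one epoch, then update."""
    total_grad = [0.0] * len(params)
    for pattern in patterns:
        g = gradient_for_pattern(params, pattern)
        total_grad = [t + gi for t, gi in zip(total_grad, g)]
    return [a - eta * gi for a, gi in zip(params, total_grad)]

def train_online(params, patterns, gradient_for_pattern, eta=0.1):
    """On-line (pattern-by-pattern) learning: update after every pattern."""
    for pattern in patterns:
        g = gradient_for_pattern(params, pattern)
        params = [a - eta * gi for a, gi in zip(params, g)]
    return params

# Illustrative usage with a dummy single-parameter gradient of (p0 - pattern)^2:
dummy_grad = lambda p, pat: [2 * (p[0] - pat)]
print(train_batch([0.0], [1.0, 2.0, 3.0], dummy_grad))
print(train_online([0.0], [1.0, 2.0, 3.0], dummy_grad))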
