MAI Lecture 07: RBFN
Beakal Gizachew
Function Approximation
(Figure: scattered sample data points (*), with a "?" marking the value to be approximated by the fitted function.)
Function Approximation vs. Classification
(Figure: for classification, the input vector x = [x1, ..., xd] is fed to a classifier that outputs one of c classes, e.g. Class 1, Class 2, Class 3, with Class 3 encoded as the target vector [0 0 1].)
(Figure: RBFN architecture with an input layer, a nonlinear transformation layer that generates local receptive fields, and a linear output layer.)
The receptive fields nonlinearly transform (map) the input feature space, where the input patterns are not linearly separable, to the hidden-unit space, where the mapped inputs may be linearly separable.
• The hidden-unit space often needs to be of a higher dimensionality than the input space.
The (you guessed it right) XOR Problem
Consider the nonlinear functions that map the input vector x = [x1 x2]^T to the (φ1, φ2) space:

  φ1(x) = exp( -||x - t1||² ),   t1 = [1 1]^T
  φ2(x) = exp( -||x - t2||² ),   t2 = [0 0]^T

  Input x    φ1(x)     φ2(x)
  (1,1)      1         0.1353
  (0,1)      0.3678    0.3678
  (1,0)      0.3678    0.3678
  (0,0)      0.1353    1

(Figure: the four XOR points plotted in the (φ1, φ2) plane; (0,1) and (1,0) map to the same point, and the two classes can now be separated by a straight line.)
The nonlinear φ functions transformed a nonlinearly separable problem into a linearly separable one!
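A minimal numerical check of this transformation, sketched in Python/NumPy (the function and variable names are illustrative, not from the lecture):

```python
import numpy as np

# Gaussian receptive-field centers from the slide
t1 = np.array([1.0, 1.0])
t2 = np.array([0.0, 0.0])

def phi(x, t):
    """Gaussian RBF with unit spread: exp(-||x - t||^2)."""
    return np.exp(-np.sum((np.asarray(x, dtype=float) - t) ** 2))

# The four XOR inputs
for x in [(1, 1), (0, 1), (1, 0), (0, 0)]:
    print(x, round(phi(x, t1), 4), round(phi(x, t2), 4))

# Reproduces the table above (up to rounding): (0,1) and (1,0) coincide
# at (0.3679, 0.3679), so a single line in the (phi1, phi2) plane
# separates {(1,1), (0,0)} from {(0,1), (1,0)}.
```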
Initial Assessment
(Figure: RBFN with d input nodes x1, ..., xd, H hidden-layer RBFs φ1, ..., φH (receptive fields), and c output nodes z1, ..., zc; u_Ji are the input-to-hidden weights and w_kj the hidden-to-output weights.)

Hidden layer (Jth receptive field):
  net_J = ||x - u_J||
  y_J = φ(net_J) = exp( -( ||x - u_J|| / σ )² ),   σ: spread constant

Output layer (linear activation function):
  z_k = f(net_k) = f( Σ_{j=1}^{H} w_kj y_j ) = Σ_{j=1}^{H} w_kj y_j

First-layer weights: U = X^T (the centers u_J are set from the input patterns).
Principle of Operation
Euclidean norm: net_J = ||x - u_J||
  y_J = φ(net_J) = exp( -( ||x - u_J|| / σ )² ),   σ: spread constant
  z_k = f(net_k) = f( Σ_{j=1}^{H} w_kj y_j ) = Σ_{j=1}^{H} w_kj y_j

• w_J: relative weight of the Jth RBF
• φ_J: the Jth RBF function
• σ_J: spread of the Jth RBF
• u_J: center of the Jth RBF
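A minimal forward-pass sketch of this network in Python/NumPy (the centers, spread, and weight values below are illustrative assumptions, not the lecture's code):

```python
import numpy as np

def rbf_forward(x, centers, sigma, W):
    """Forward pass of an RBF network.

    x       : (d,) input vector
    centers : (H, d) matrix of RBF centers u_J
    sigma   : scalar spread constant
    W       : (c, H) output weights w_kj
    returns : (c,) linear outputs z_k = sum_j w_kj * y_j
    """
    # Hidden layer: Gaussian of the Euclidean distance to each center
    net = np.linalg.norm(x - centers, axis=1)     # net_J = ||x - u_J||
    y = np.exp(-(net / sigma) ** 2)               # y_J = phi(net_J)
    # Output layer: linear activation
    return W @ y

# Example: 2 inputs, 3 receptive fields, 1 output
centers = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
W = np.array([[0.2, -0.4, 1.0]])
print(rbf_forward(np.array([0.3, 0.7]), centers, sigma=0.5, W=W))
```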
How to Train?
The second-layer weights can be obtained by solving the linear system Φ w = d, where the interpolation matrix is
  Φ = { φ_ij | i, j = 1, 2, ..., N },   φ_ij = φ( ||x_i - x_j|| ).
Is this matrix always invertible?
Approach 1 (Cont.)
• Micchelli's Theorem (1986)
  • If {x_i}, i = 1, ..., N, is a set of distinct points in d-dimensional space, then the N-by-N interpolation matrix Φ, with elements φ_ij = φ( ||x_i - x_j|| ) obtained from radial basis functions, is nonsingular and hence can be inverted!
  • Note that the theorem is valid regardless of the value of N, the choice of the RBF (as long as it is an RBF), or what the data points may be, as long as they are distinct!
  • A large number of RBFs can be used, e.g.
      Multiquadrics: φ(r) = ( r² + c² )^{1/2},   r = ||x - x_j||
Approach 1 (Cont.)
• The Gaussian is the most commonly used RBF (why…?).
• Note that as r → ∞, φ(r) → 0.
  • Gaussian RBFs are localized functions, unlike the sigmoids used by MLPs.
(Figure: decision regions using Gaussian radial basis functions vs. using sigmoidal radial basis functions.)
Exact RBF Properties
• Using localized functions typically makes RBF networks more suitable for function approximation problems. Why?
• Since the first-layer weights are set to the input patterns, the second-layer weights are obtained by solving linear equations, and the spread is computed from the data, no iterative training is involved!
• Guaranteed to correctly classify all training data points!
• However, since we are using as many receptive fields as there are data points, the fit has as many free parameters as samples; if the underlying physical process does not have that many degrees of freedom, the result is overfitting!
• The importance of σ: too small a spread also causes overfitting, while too large a spread fails to characterize rapid changes in the signal.
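A sketch of this non-iterative training step, assuming Gaussian RBFs and Python/NumPy (function and variable names are illustrative, not from the lecture):

```python
import numpy as np

def train_exact_rbf(X, d, sigma):
    """Exact RBF interpolation: one Gaussian receptive field per training point.

    X     : (N, dim) training inputs (these also serve as the centers)
    d     : (N,) desired outputs
    sigma : spread constant
    returns output weights w solving Phi @ w = d
    """
    # Interpolation matrix: Phi[i, j] = phi(||x_i - x_j||)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    Phi = np.exp(-(dists / sigma) ** 2)
    # By Micchelli's theorem Phi is nonsingular for distinct points,
    # so the linear system has a unique solution.
    return np.linalg.solve(Phi, d)

def predict(x, X, w, sigma):
    """Evaluate the trained exact-RBF network at a new input x."""
    y = np.exp(-(np.linalg.norm(x - X, axis=1) / sigma) ** 2)
    return w @ y
```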
Too many Receptive Fields?
• Use fewer receptive fields than training points and choose their centers by clustering the data (e.g., k-means): assign points to their nearest center and update the centers, repeating until there is no change in the cluster centers from one iteration to the next.
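A compact sketch of that center-selection loop, assuming standard k-means (illustrative Python/NumPy, not the lecture's code):

```python
import numpy as np

def kmeans_centers(X, H, n_iter=100, seed=None):
    """Pick H RBF centers by k-means: iterate until centers stop changing."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=H, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center
        labels = np.argmin(
            np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1), axis=1)
        # Recompute each center as the mean of its assigned points
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(H)])
        if np.allclose(new_centers, centers):   # no change -> converged
            break
        centers = new_centers
    return centers
```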
When the centers are also adapted, the error for training pattern j is
  e_j = d_j - Σ_{i=1}^{M} w_i G( ||x_j - t_i||_C ),   where G( ||x_j - t_i||_C ) = φ( ||x_j - t_i|| )
and the free parameters (weights w_i, centers t_i, and spreads) are adjusted by gradient descent; G' denotes the first derivative of G with respect to its argument.
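A minimal sketch of a gradient-descent step on E = ½ Σ_j e_j² with respect to the output weights only (the center and spread updates, which require G', are omitted; all names are illustrative Python/NumPy assumptions):

```python
import numpy as np

def gradient_step(X, d, centers, sigma, w, lr=0.01):
    """One gradient-descent update of the output weights.

    e_j      = d_j - sum_i w_i * phi(||x_j - t_i||)
    dE/dw_i  = -sum_j e_j * phi(||x_j - t_i||)
    """
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
    Phi = np.exp(-(dists / sigma) ** 2)   # Phi[j, i] = phi(||x_j - t_i||)
    e = d - Phi @ w                       # per-pattern errors e_j
    grad_w = -Phi.T @ e                   # gradient w.r.t. each w_i
    return w - lr * grad_w
```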
RBF Example
Assignment: RBF Implementation