MAI Lecture 07 RBFN

The document discusses radial basis function networks, which use radial basis functions as activation functions in hidden layers to map nonlinearly separable inputs to a higher-dimensional space where they become linearly separable, allowing complex functions to be approximated by weighted combinations of simple basis functions; it covers the principles, parameters, and common training approaches of RBF networks.

Radial Basis Function

Beakal Gizachew
Function Approximation

[Figure: scattered sample points (*) with one query point marked "?", whose value must be estimated by the approximating function]
Function Approx. Vs. Classification

- Classification can be thought of as a special case of function approximation.
  - For a three-class problem, a classifier maps the d-dimensional input x = [x1, ..., xd] to a c-dimensional one-of-c output: Class 1 -> [1 0 0], Class 2 -> [0 1 0], Class 3 -> [0 0 1].
  - Equivalently, the classifier can map the same input to a single label output: 1 -> Class 1, 2 -> Class 2, 3 -> Class 3.
  - In either case, a d-dimensional input is mapped to a 1- or c-dimensional output.
- Function approximation in general maps an input x to an output y = f(x).
Radial Basis Function Networks

[Diagram: input layer -> nonlinear transformation layer (generates local receptive fields) -> linear output layer]

The receptive fields nonlinearly transform (map) the input feature space, where the input patterns are not linearly separable, to the hidden unit space, where the mapped inputs may be linearly separable.
- The hidden unit space often needs to be of a higher dimensionality.
The (you guessed it right) XOR Problem
Consider the nonlinear functions that map the input vector x = [x_1\ x_2]^T to the φ1-φ2 space:

  \varphi_1(x) = e^{-\|x - t_1\|^2}, \quad t_1 = [1\ 1]^T
  \varphi_2(x) = e^{-\|x - t_2\|^2}, \quad t_2 = [0\ 0]^T

  Input x   φ1(x)     φ2(x)
  (1,1)     1         0.1353
  (0,1)     0.3678    0.3678
  (1,0)     0.3678    0.3678
  (0,0)     0.1353    1

[Plot: the four transformed points in the φ1-φ2 plane; (0,1) and (1,0) map to the same point, and the two classes become linearly separable]

The nonlinear φ functions transformed a nonlinearly separable problem into a linearly separable one!
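As a quick sanity check, here is a minimal NumPy sketch (not part of the original slides; names are illustrative) that reproduces the table above by evaluating the two Gaussian basis functions at the four XOR inputs; the slide truncates 0.3679 to 0.3678.

```python
import numpy as np

# Centers from the slide: t1 = [1 1]^T, t2 = [0 0]^T
t1 = np.array([1.0, 1.0])
t2 = np.array([0.0, 0.0])

def phi(x, t):
    # Gaussian basis function with unit spread: exp(-||x - t||^2)
    return np.exp(-np.sum((np.asarray(x, dtype=float) - t) ** 2))

for x in [(1, 1), (0, 1), (1, 0), (0, 0)]:
    print(x, round(phi(x, t1), 4), round(phi(x, t2), 4))
# (1, 1) -> 1.0     0.1353
# (0, 1) -> 0.3679  0.3679
# (1, 0) -> 0.3679  0.3679
# (0, 0) -> 0.1353  1.0
```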
Initial Assessment

- Using nonlinear functions, we can convert a nonlinearly separable problem into a linearly separable one.
- From a function approximation perspective, this is equivalent to implementing a complex function (corresponding to the nonlinearly separable decision boundary) using simple functions (corresponding to linearly separable decision boundaries).
- Implementing this procedure with a network architecture yields RBF networks, if the nonlinear mapping functions are radial basis functions.
- Radial basis functions:
  - Radial: symmetric around its center.
  - Basis functions: a set of functions whose linear combination can generate an arbitrary function in a given function space.
RBF Networks

[Diagram: d input nodes x_1 ... x_d, H hidden-layer RBF units φ_1 ... φ_H (receptive fields) connected by first-layer weights u_ji, and c output nodes z_1 ... z_c connected by second-layer weights w_kj]

Hidden layer (σ: spread constant):

  y_j = \varphi(net_j) = e^{-\left(\frac{\|x - u_j\|}{\sigma}\right)^2}, \qquad net_j = \|x - u_j\|

Output layer (linear activation function):

  z_k = f(net_k) = f\!\left(\sum_{j=1}^{H} w_{kj}\, y_j\right) = \sum_{j=1}^{H} w_{kj}\, y_j
Principle of Operation

Hidden layer (Euclidean norm, σ: spread constant):

  y_J = \varphi(net_J) = e^{-\left(\frac{\|x - u_J\|}{\sigma}\right)^2}, \qquad net_J = \|x - u_J\|

Output layer:

  z_k = f(net_k) = f\!\left(\sum_{j=1}^{H} w_{kj}\, y_j\right) = \sum_{j=1}^{H} w_{kj}\, y_j

Unknowns: u_ji, w_kj, σ
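For concreteness, a minimal NumPy sketch of this forward pass (illustrative only; it assumes the centers, output weights, and spread are already known):

```python
import numpy as np

def rbf_forward(x, U, W, sigma):
    """Forward pass of an RBF network.
    x: (d,) input vector; U: (H, d) matrix of RBF centers u_j;
    W: (c, H) output weights w_kj; sigma: spread constant."""
    # Hidden layer: y_j = exp(-(||x - u_j|| / sigma)^2)
    y = np.exp(-(np.linalg.norm(U - x, axis=1) / sigma) ** 2)
    # Output layer with linear activation: z_k = sum_j w_kj * y_j
    return W @ y
```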


Principle of Operation

- What do these parameters represent? Physical meanings:
  - φ: the radial basis function for the hidden layer. This is a simple nonlinear mapping function (typically Gaussian) that transforms the d-dimensional input patterns to a (typically higher) H-dimensional space. The complex decision boundary is constructed from linear combinations (weighted sums) of these simple building blocks.
  - u_ji: the weights joining the input layer to the hidden layer. These weights constitute the center points of the radial basis functions.
  - σ: the spread constant(s). These values determine the spread (extent) of each radial basis function.
  - w_kj: the weights joining the hidden and output layers. These are the weights used in obtaining the linear combination of the radial basis functions; they determine the relative amplitudes of the RBFs when they are combined to form the complex function.
RBFN Principle of Operation

[Figure: a complex function built as a weighted sum of localized bumps. For the Jth RBF: φ_J is the Jth RBF function, u_J its center, σ_J its spread, and w_J its relative weight in the sum]
How to Train?

- There are various approaches for training RBF networks.
  - Approach 1: Exact RBF. Guarantees correct classification of all training data instances. Requires N hidden-layer nodes, one for each training instance. No iterative training is involved. The RBF centers (u) are fixed as the training data points, the spread is set from the variance of the data, and the weights w are obtained by solving a set of linear equations.
  - Approach 2: Fixed centers selected at random. Uses H < N hidden-layer nodes. No iterative training is involved. The spread is based on the Euclidean distances between the chosen centers, and w are obtained by solving a set of linear equations.
  - Approach 3: Centers are obtained from unsupervised learning (clustering). Spreads are obtained as the variances of the clusters, and w are obtained through the LMS algorithm. Clustering (k-means) and LMS are iterative. This is the most commonly used procedure and typically provides good results.
  - Approach 4: All unknowns are obtained through supervised learning.
Approach 1
- Exact RBF
  - The first-layer weights u are set to the training data: U = X^T. That is, the Gaussians are centered at the training data instances.
  - The spread is chosen as \sigma = \frac{d_{max}}{\sqrt{2N}}, where d_max is the maximum Euclidean distance between any two centers and N is the number of training data points. Note that H = N in this case.
  - The output of the kth RBF output neuron is then

      z_k = \sum_{j=1}^{N} w_{kj}\, \varphi\!\left(\|x - u_j\|\right)   (multiple outputs)
      z = \sum_{j=1}^{N} w_j\, \varphi\!\left(\|x - u_j\|\right)        (single output)

  - During training, we want the outputs to be equal to our desired targets. Without loss of generality, assume that we are approximating a single-dimensional function, and let the unknown true function be f(x). The desired output for each input is then d_i = f(x_i), i = 1, 2, ..., N (not to be confused with the input dimensionality d).
Approach 1 (Cont.)

- We then have a set of linear equations, which can be represented in matrix form:

    z = \sum_{j=1}^{N} w_j\, \varphi\!\left(\|x - u_j\|\right)

    \begin{bmatrix} \varphi_{11} & \varphi_{12} & \cdots & \varphi_{1N} \\ \varphi_{21} & \varphi_{22} & \cdots & \varphi_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \varphi_{N1} & \varphi_{N2} & \cdots & \varphi_{NN} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_N \end{bmatrix} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_N \end{bmatrix}

    \varphi_{ij} = \varphi\!\left(\|x_i - x_j\|\right), \quad i, j = 1, 2, ..., N

- Define:

    d = [d_1, d_2, ..., d_N]^T, \quad w = [w_1, w_2, ..., w_N]^T, \quad \Phi = \{\varphi_{ij} \mid i, j = 1, 2, ..., N\}

    \Phi \cdot w = d \quad \Rightarrow \quad w = \Phi^{-1} d

- Is this matrix always invertible?
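A small NumPy sketch of this exact-interpolation step, assuming Gaussian RBFs with a given spread (function and variable names are mine, not from the slides):

```python
import numpy as np

def exact_rbf_fit(X, d, sigma):
    """X: (N, dim) training inputs, d: (N,) desired outputs.
    Builds the N x N interpolation matrix Phi and solves Phi @ w = d."""
    diff = X[:, None, :] - X[None, :, :]                        # pairwise x_i - x_j
    Phi = np.exp(-(np.linalg.norm(diff, axis=2) / sigma) ** 2)  # phi_ij
    # Solving the system is numerically safer than forming Phi^{-1} explicitly
    return np.linalg.solve(Phi, d)

def exact_rbf_predict(x, X, w, sigma):
    """Evaluate z = sum_j w_j * phi(||x - x_j||) at a new input x."""
    y = np.exp(-(np.linalg.norm(X - x, axis=1) / sigma) ** 2)
    return w @ y
```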
Approach 1 (Cont.)

- Micchelli's Theorem (1986)
  - If \{x_i\}_{i=1}^{N} are a distinct set of points in the d-dimensional space, then the N-by-N interpolation matrix \Phi with elements \varphi_{ij} = \varphi\!\left(\|x_i - x_j\|\right) obtained from radial basis functions is nonsingular, and hence can be inverted!
  - Note that the theorem is valid regardless of the value of N, the choice of the RBF (as long as it is an RBF), or what the data points may be, as long as they are distinct!
- A large number of RBFs can be used (with r = \|x - x_j\|):
  - Multiquadrics: \varphi(r) = (r^2 + c^2)^{1/2}, for some c > 0, r \in \mathbb{R}
  - Inverse multiquadrics: \varphi(r) = \frac{1}{(r^2 + c^2)^{1/2}}
  - Gaussian functions: \varphi(r) = e^{-\frac{r^2}{2\sigma^2}}, for some \sigma > 0, r \in \mathbb{R}
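The three kernels listed above, written as plain NumPy functions of r = ||x - x_j|| (a sketch; c and sigma are free parameters):

```python
import numpy as np

def multiquadric(r, c=1.0):
    return np.sqrt(r ** 2 + c ** 2)            # phi(r) = (r^2 + c^2)^(1/2)

def inverse_multiquadric(r, c=1.0):
    return 1.0 / np.sqrt(r ** 2 + c ** 2)      # phi(r) = 1 / (r^2 + c^2)^(1/2)

def gaussian(r, sigma=1.0):
    return np.exp(-r ** 2 / (2 * sigma ** 2))  # phi(r) = exp(-r^2 / (2 sigma^2))
```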
Approach 1 (Cont.)

- The Gaussian is the most commonly used RBF (why...?).
- Note that as r \to \infty, \varphi(r) \to 0.
  - Gaussian RBFs are localized functions, unlike the sigmoids used by MLPs.

[Figure: approximation using Gaussian (localized) basis functions vs. using sigmoidal (non-localized) basis functions]
Exact RBF Properties
- Using localized functions typically makes RBF networks more suitable for function approximation problems. Why?
- Since the first-layer weights are set to the input patterns, the second-layer weights are obtained by solving linear equations, and the spread is computed from the data, no iterative training is involved!
- Guaranteed to correctly classify all training data points!
- However, since we are using as many receptive fields as there are data points, the solution is overdetermined if the underlying physical process does not have as many degrees of freedom => Overfitting!
- The importance of σ: too small a spread will also cause overfitting; too large a spread will fail to characterize rapid changes in the signal.
Too many Receptive Fields?

- In order to reduce the artificial complexity of the RBF network, we need to use a smaller number of receptive fields.
- How about using a subset of the training data, say M < N of them?
- These M data points will then constitute the M receptive field centers.
- How to choose these M points?
  - At random => Approach 2 (a code sketch follows this slide).

      y_j = \varphi_{ij} = \varphi\!\left(\|x_i - x_j\|\right) = e^{-\frac{M}{d_{max}^2}\|x_i - x_j\|^2}, \quad i = 1, ..., N, \quad j = 1, ..., M, \quad \sigma = \frac{d_{max}}{\sqrt{2M}}

    The output-layer weights are then determined as they were in Approach 1, by solving a set of M linear equations!
  - Unsupervised training (k-means) => Approach 3. The centers are selected through self-organization of clusters, where the data is more densely populated. Determining M is usually heuristic.
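A hedged NumPy sketch of Approach 2 following the formulas above (when N > M the system is overdetermined, so a least-squares solve is used here; names and defaults are illustrative):

```python
import numpy as np

def fit_random_centers(X, d, M, seed=0):
    """Approach 2: pick M training points at random as centers, set the spread
    from d_max, and obtain the output weights from the resulting linear system."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=M, replace=False)]
    d_max = max(np.linalg.norm(a - b) for a in centers for b in centers)
    # y_ij = exp(-M * ||x_i - c_j||^2 / d_max^2)
    diff = X[:, None, :] - centers[None, :, :]
    Y = np.exp(-M * np.sum(diff ** 2, axis=2) / d_max ** 2)
    w, *_ = np.linalg.lstsq(Y, d, rcond=None)
    return centers, d_max, w
```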
Approach 3: K-Means (Unsupervised Clustering) Algorithm

- Choose the number of clusters, M.
- Initialize the M cluster centers to the first M training data points: t_k = x_k, k = 1, 2, ..., M.
- Repeat
  - At iteration n, assign each pattern x(n) to the cluster whose center is closest (t_k(n) is the center of the kth RBF at iteration n):

      C(x) = \arg\min_k \|x(n) - t_k(n)\|, \quad k = 1, 2, ..., M

  - Compute the centers of all clusters after the regrouping (M_k is the number of instances grouped in the kth cluster):

      t_k = \frac{1}{M_k} \sum_{x_j \in \text{cluster } k} x_j

- Until there is no change in the cluster centers from one iteration to the next.

An alternate k-means algorithm is given in Haykin (p. 301).
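A minimal NumPy sketch of the k-means loop described above (a straightforward reading of the slide, not the Haykin variant):

```python
import numpy as np

def kmeans(X, M, max_iter=100):
    """X: (N, dim) training data. Centers are initialized to the first M points,
    as on the slide. Returns the M centers and the cluster label of each point."""
    centers = X[:M].copy()
    for _ in range(max_iter):
        # Assign every pattern to the cluster whose center is closest
        labels = np.argmin(
            np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), axis=1)
        # Recompute each center as the mean of the instances grouped in that cluster
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(M)])
        if np.allclose(new_centers, centers):   # stop when centers no longer change
            break
        centers = new_centers
    return centers, labels
```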


Approach 3: Determining the Output Weights with the LMS Algorithm

- The LMS algorithm is used to minimize the cost function E(w) = \frac{1}{2} e^2(n), where e(n) is the error at iteration n: e(n) = d(n) - y^T(n)\, w(n).

    \frac{\partial E(w)}{\partial w(n)} = e(n) \frac{\partial e(n)}{\partial w}, \quad \frac{\partial e(n)}{\partial w} = -y(n) \quad \Rightarrow \quad \frac{\partial E(w)}{\partial w(n)} = -y(n)\, e(n)

  - Using the steepest (gradient) descent method: w(n+1) = w(n) + \eta\, y(n)\, e(n)

- Instance-based LMS algorithm pseudocode (for a single output):

    Initialize the weights w_j to small random values, j = 1, 2, ..., M
    Repeat
        Choose the next training pair (x, d)
        Compute the network output at iteration n:  z(n) = \sum_{j=1}^{M} w_j\, \varphi\!\left(\|x - x_j\|\right) = w^T y
        Compute the error:  e(n) = d(n) - z(n)
        Update the weights:  w(n+1) = w(n) + \eta\, e(n)\, y(n)
    Until the weights converge to a steady set of values
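The same pseudocode as a NumPy sketch, assuming the hidden-layer activations y(n) have already been computed for each training pattern (the learning rate and epoch count are illustrative assumptions):

```python
import numpy as np

def lms_train(Y, d, eta=0.05, epochs=50, seed=0):
    """Y: (N, M) hidden-layer outputs y(n) for each training pattern,
    d: (N,) desired outputs. Returns the trained output weights w."""
    rng = np.random.default_rng(seed)
    w = 0.01 * rng.standard_normal(Y.shape[1])   # small random initial weights
    for _ in range(epochs):
        for y_n, d_n in zip(Y, d):
            e_n = d_n - w @ y_n                  # e(n) = d(n) - z(n)
            w = w + eta * e_n * y_n              # w(n+1) = w(n) + eta * e(n) * y(n)
    return w
```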
Approach 4: Supervised RBF Training

- This is the most general form.
- All parameters (receptive field centers, i.e. the first-layer weights, the output-layer weights, and the spread constants) are learned through iterative supervised training using the LMS / gradient descent algorithm:

    E = \frac{1}{2} \sum_{j=1}^{N} e_j^2, \quad e_j = d_j - \sum_{i=1}^{M} w_i\, G\!\left(\|x_j - t_i\|_{C}\right), \quad G\!\left(\|x_j - t_i\|_{C}\right) = \varphi\!\left(\|x_j - t_i\|\right)

  where G' represents the first derivative of the function with respect to its argument (it appears in the gradient-descent update rules for the centers and spreads).
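A sketch of one batch gradient-descent step for Approach 4, worked out for Gaussian RBFs φ = exp(-||x_j - t_i||^2 / σ_i^2) so that the derivatives can be written explicitly (the learning rate, function name, and array layout are assumptions, not from the slides):

```python
import numpy as np

def supervised_rbf_step(X, d, t, w, sigma, eta=0.01):
    """One gradient-descent step on E = 1/2 * sum_j e_j^2 for the model
    z_j = sum_i w_i * exp(-||x_j - t_i||^2 / sigma_i^2).
    X: (N, dim) inputs, d: (N,) targets, t: (M, dim) centers,
    w: (M,) output weights, sigma: (M,) spreads."""
    diff = X[:, None, :] - t[None, :, :]          # (N, M, dim): x_j - t_i
    sq = np.sum(diff ** 2, axis=2)                # (N, M): ||x_j - t_i||^2
    phi = np.exp(-sq / sigma ** 2)                # (N, M): phi_ij
    e = d - phi @ w                               # (N,): e_j
    grad_w = -phi.T @ e                           # dE/dw_i = -sum_j e_j * phi_ij
    grad_t = -np.sum(e[:, None, None] * w[None, :, None] * phi[:, :, None]
                     * 2 * diff / sigma[None, :, None] ** 2, axis=0)
    grad_s = -np.sum(e[:, None] * w[None, :] * phi * 2 * sq / sigma ** 3, axis=0)
    # Simultaneous update of all three sets of unknowns
    return w - eta * grad_w, t - eta * grad_t, sigma - eta * grad_s
```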
RBF Example
Assignment: RBF Implementation

1. Implement Approach 3. Write your own k-means and Least Mean Squares (LMS) algorithms. Compare your results to those of Python built-in (library) functions, both for function approximation and classification problems.
2. Apply your algorithms to the EEG Eye State Data Set (available at https://archive.ics.uci.edu/ml/datasets/EEG+Eye+State from the UCI ML repository).
Questions?
