Radial Basis Function (RBF) Networks
1. They are two-layer feed-forward networks.
2. The hidden nodes implement a set of radial
basis functions (e.g. Gaussian functions).
3. The output nodes implement linear
summation functions as in an MLP.
4. The network training is divided into two
stages: first the weights from the input to
hidden layer are determined, and then the
weights from the hidden to output layer.
5. The training/learning is very fast.
6. The networks are very good at interpolation.
There is considerable evidence that neurons in the visual
cortex are tuned to local regions of the retina. They are
maximally sensitive to some specific stimulus, and their
output falls off as the presented stimulus moves away from
this “best” stimulus.
Gaussian basis functions
• Each RBF node in the hidden layer responds to input
only in some subspace of the input space. When the input
is far from the node’s own centre µ (many radii, i.e.
standard deviations σ, away), the output of that unit is
small enough to be ignored, as illustrated in the sketch below
• Each RBF node therefore has a receptive field, that is, a region
of the input space to which it responds
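A minimal sketch of this fall-off for a single Gaussian RBF node; the centre, width and sample inputs are made-up values for illustration:
x = linspace(-5, 5, 101);                  % sample inputs (1-D for simplicity)
mu = 0;                                    % assumed centre of the node
sigma = 1;                                 % assumed width (standard deviation)
phi = exp(-(x - mu).^2 / (2*sigma^2));     % Gaussian activation of the node
% phi is close to 1 near the centre and falls towards 0 once the input is
% several sigma away from mu, i.e. outside the node's receptive field.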
BIOLOGICAL PLAUSIBILITY:
• RBF networks are more BIOLOGICALLY PLAUSIBLE, since
many sensory neurons respond only to some small
subspace of the input space, and are silent in response
to all other inputs.
Implementing XOR
When mapped into the feature space (z1, z2), the two
classes become linearly separable (see the sketch below).
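A small sketch of this mapping, assuming (as in the classic example) two Gaussian hidden units with centres at (1,1) and (0,0):
X = [0 0; 0 1; 1 0; 1 1];            % the four XOR input patterns
t1 = [1 1];  t2 = [0 0];             % assumed centres of the two RBF units
z1 = exp(-sum((X - t1).^2, 2));      % output of unit 1 for each pattern
z2 = exp(-sum((X - t2).^2, 2));      % output of unit 2 for each pattern
% In (z1, z2) space, (0,1) and (1,0) both map to about (0.37, 0.37),
% while (0,0) and (1,1) map near (0.14, 1) and (1, 0.14): a single
% straight line such as z1 + z2 = 1 now separates the two classes.
A single linear output unit with weights on z1 and z2 plus a bias can then implement XOR.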
Training RBF nets
Typically, the weights of the two layers are determined
separately: first find the RBF (hidden-layer) parameters, and
then find the output-layer weights
Hidden layer
– estimate the parameters of each hidden unit k (whose output depends on the
distance between the input and a stored prototype)
– e.g. for a Gaussian activation function, estimate the parameters µk and σk²
– this stage involves an Unsupervised training process (no targets needed)
Output layer
– set the weights (including the bias weights)
– this is the same as training a single-layer perceptron: each unit’s output depends
on a weighted sum of its inputs
– using, for example, the gradient descent rule
– this stage involves a Supervised training process (see the sketch after this list)
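A compact sketch of the two-stage procedure on a 1-D toy problem. The data, centres, width and learning rate are assumptions for illustration; stage 1 simply fixes the hidden-unit parameters (they could instead come from clustering), and stage 2 trains the linear output weights by gradient descent:
% toy data (assumed)
x = linspace(-1, 1, 50)';        % inputs
d = sin(pi*x);                   % targets
% Stage 1 (unsupervised): choose hidden-unit parameters mu_k, sigma_k
mu = linspace(-1, 1, 5);         % centres (assumed; could come from k-means)
sigma = 0.5;                     % common width (assumed)
Phi = exp(-(x - mu).^2 / (2*sigma^2));   % hidden-layer outputs, 50 x 5
Phi = [Phi, ones(size(x))];      % append a bias unit
% Stage 2 (supervised): gradient descent on the linear output weights
w = zeros(size(Phi,2), 1);
eta = 0.05;                      % learning rate (assumed)
for epoch = 1:2000
    y = Phi*w;                              % network output
    w = w + eta * Phi'*(d - y) / numel(x);  % delta rule on the squared error
end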
Clustering
K-Means Approach
1. Select k multidimensional points to be the
“seeds” or initial centroids for the k clusters
to be formed. Seeds usually selected at
random
2. Assign each observation to the cluster with
the nearest seed.
3. Update cluster centroids once all
observations have been assigned.
4. Repeat steps 2 and 3 until the changes in the
cluster centroids are small.
5. Repeat steps 1-4 with new starting seeds,
typically 3 to 5 times, and keep the best
clustering (see the sketch below).
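A sketch of steps 1-4 for 2-D data; the data, the value of k and the stopping tolerance are made up for illustration:
X = randn(100, 2);                         % observations (assumed toy data)
k = 3;
C = X(randperm(size(X,1), k), :);          % step 1: random seeds
for iter = 1:100
    D = zeros(size(X,1), k);
    for j = 1:k
        D(:,j) = sum((X - C(j,:)).^2, 2);  % squared distance to each centroid
    end
    [~, idx] = min(D, [], 2);              % step 2: assign to nearest centroid
    Cnew = C;
    for j = 1:k
        members = X(idx == j, :);
        if ~isempty(members)
            Cnew(j,:) = mean(members, 1);  % step 3: update the centroids
        end
    end
    if max(abs(Cnew(:) - C(:))) < 1e-6     % step 4: stop when changes are small
        break
    end
    C = Cnew;
end
Step 5 would wrap this loop in a few restarts with different seeds and keep the clustering with the smallest total within-cluster distance.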
K-Means Illustration – two dimensions
Fine Tuning
Computing the Output Weights
We want W (a weight matrix) such that
T = W X, where T holds the targets.
Thus W = T X^{-1}.
If the inverse exists, the system can be solved exactly and the error minimized.
If no inverse exists, the pseudo-inverse gives the
minimum-error solution
(the ‘minimum-norm solution to a linear system’).
The pseudo-inverse solution is
W = T X^{+}, where X^{+} = (X^T X)^{-1} X^T
(see the sketch below).
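A sketch of this computation, assuming X holds the hidden-layer outputs (one column per training pattern, plus a bias row) and T holds the corresponding targets so that T ≈ W X; the numbers are made up:
X = [0.1 0.9 0.4 0.7;
     0.8 0.2 0.5 0.3;
     1   1   1   1 ];    % hidden-unit outputs plus a bias row (made-up values)
T = [0 1 1 0];           % desired outputs, one per pattern (made up)
W = T * pinv(X);         % output weights via the pseudo-inverse
E = T - W*X;             % residual error (zero only when an exact solution exists)
MATLAB's pinv handles square, over-determined and under-determined cases alike, so it is a convenient stand-in for the explicit formula above.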
XOR Problem
The relationship between the input and the
output of the network can be given by
Σ_k w_k φ(||x_j − µ_k||) = d_j
where x_j is an input vector and d_j is the associated value of the desired
output.
RBF Performance
An MLP performs a global mapping (every input produces an
output), while an RBF network performs a local mapping (only
inputs falling near a receptive field produce a significant
activation).
Width parameter σ: this is often set equal to a multiple of the
average distance between the centres.
Function Approximation Example:
(function approximation for differently chosen
width parameters)
Target function: a fixed nonlinear function of x (shown in the figure below)
Type of Activation Function: Gaussian
Input Range: x = [-10:10]
Centers = [-8 -5 -2 0 2 5 8]
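A sketch of how the hidden layer of this example could be set up; the width follows the heuristic above (a multiple of the average spacing between the centres), and the multiplier is an assumption:
x = (-10:0.5:10)';                         % training inputs over the stated range
centres = [-8 -5 -2 0 2 5 8];              % the centres listed above
spacing = mean(diff(centres));             % average distance between adjacent centres
sigma = 2*spacing;                         % width: an assumed multiple of that spacing
Phi = exp(-(x - centres).^2 / (2*sigma^2));    % Gaussian hidden-layer outputs (41 x 7)
% With a very small sigma the columns of Phi barely overlap (Case 2 below),
% and with a very large sigma every column is close to 1 for all inputs (Case 3).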
Figure: Function to be Approximated (Output plotted against Input for x in [-10, 10])
Case 1: the width is chosen to equal 6. This way
the receptive fields overlap, but no single neuron
covers the entire input space. For proper overlap,
the width parameter needs to be at least equal to
the distance between neighbouring centres.
Figure: Testing the RBF Network, w=6 (Output plotted against Input for x in [-15, 15])
Case 2: the width is chosen to equal 0.2
(too small). The receptive fields barely overlap,
which causes poor generalization even inside the
training range.
Figure: Testing the RBF Network, w=0.2 (Output plotted against Input for x in [-15, 15])
Case 3: the width is chosen to be 200 (too large).
Each radial basis function now covers the entire
input space and is activated by every input value,
so the network cannot properly learn the desired
mapping.
Figure: Testing the RBF Network, w=200 (Output plotted against Input for x in [-15, 15])
Comparison of RBF and MLP networks
• An RBF network will usually have only one hidden layer, but an MLP
will usually have more than one
• Usually the hidden and output neurons of an MLP share the same
neuronal model, but this is not true of RBF networks
• The hidden layer of an RBF network is nonlinear and the output layer
is linear; in an MLP both layers are nonlinear
• The argument of the activation function of a hidden neuron in an RBF
network is the Euclidean norm (distance) between the input vector and
the centre of the unit; in an MLP the activation function takes
the dot product (inner product) of the input vector and the synaptic
weight vector (see the sketch after this list)
• MLPs construct global approximations to a nonlinear input-output
mapping, but RBF networks produce local approximations (when
using an exponentially decaying function such as a Gaussian)
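A small sketch of the difference in the hidden-unit argument mentioned above; the vectors x, c, w, the bias b and the width sigma = 1 are made-up values:
x = [1; 2; 3];                      % input vector (assumed)
c = [0; 1; 1];                      % RBF centre (assumed)
w = [0.5; -0.2; 0.1];  b = 0.3;     % MLP weight vector and bias (assumed)
rbf_arg = norm(x - c);              % Euclidean distance between input and centre
rbf_out = exp(-rbf_arg^2/(2*1^2));  % Gaussian activation, sigma = 1 (assumed)
mlp_arg = w'*x + b;                 % inner product of input and weight vector
mlp_out = 1/(1 + exp(-mlp_arg));    % sigmoid activation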
Linear Algebraic Equations
Under-determined systems
Ax = b
where A is an m × n matrix, x is an n × 1 vector, and b is an m × 1 vector
Minimum Norm Solution
Minimize
J = x_1^2 + x_2^2 + ... + x_n^2 = x^T x
subject to the constraint
f = Ax − b = 0.
Form the augmented cost
J_a = J + (λ_1 f_1 + λ_2 f_2 + ... + λ_{m−1} f_{m−1} + λ_m f_m) = J + λ^T f.
Setting the derivatives to zero:
∂J_a/∂x = 0  ⇒  2x + A^T λ = 0,
∂J_a/∂λ = 0  ⇒  Ax − b = 0.
Solving these two conditions gives
x = A^{+} b,  where  A^{+} = A^T (A A^T)^{-1}.
Example: the single under-determined equation 2x1 + 3x2 = 8.
A = [2 3]; b = 8;
xa = A\b                % basic solution:         xa = [0; 2.6667]
xb = lsqminnorm(A,b)    % minimum-norm solution:  xb = [1.2308; 1.8462]
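A quick check, on the same example, that the formula A^{+} = A^T (A A^T)^{-1} reproduces the minimum-norm answer:
A = [2 3]; b = 8;
xc = A'*((A*A')\b)      % = [1.2308; 1.8462], matching lsqminnorm(A,b)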
Least Squares Solutions (minimum-error solution)
Over-determined system: more equations than unknowns.
The least squares solution is the solution that minimizes
the squared norm (size) of the error:
J = e^T e = (Ax − b)^T (Ax − b).
Premultiplying Ax = b by A^T gives the normal equations A^T A x = A^T b,
so the least-squares solution is x = (A^T A)^{-1} A^T b.
In MATLAB this can also be computed with the iterative solver x = lsqr(A,b).
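A sketch of an over-determined example (more equations than unknowns; the numbers are made up), comparing the normal-equations solution with MATLAB's backslash and lsqr:
A = [1 1; 1 2; 1 3; 1 4];        % 4 equations, 2 unknowns (assumed data)
b = [2.1; 2.9; 4.2; 4.8];        % right-hand side (assumed)
x1 = (A'*A) \ (A'*b);            % normal equations (A^T A) x = A^T b
x2 = A \ b;                      % QR-based least-squares solve
x3 = lsqr(A, b);                 % iterative LSQR solver, same solution here
% All three minimise the squared error norm (A*x - b)'*(A*x - b).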