Introduction To Radial Basis Function Networks
Typical Applications of NN
● Pattern Classification
● Function Approximation
● Time-Series Forecasting
Function Approximation
[Figure: an unknown function f and an approximator that produces the estimate f̂]
Supervised Learning
Unknown
Function
+
+−
Neural
Network
Neural Networks as
Universal Approximators
● Feedforward neural networks with a single hidden layer of
sigmoidal units are capable of approximating uniformly any
continuous multivariate function, to any desired degree of
accuracy.
– Hornik, K., Stinchcombe, M., and White, H. (1989). "Multilayer
Feedforward Networks are Universal Approximators," Neural Networks,
2(5), 359-366.
● Like feedforward neural networks with a single hidden layer of
sigmoidal units, RBF networks can be shown to be universal
approximators.
– Park, J. and Sandberg, I. W. (1991). "Universal Approximation Using
Radial-Basis-Function Networks," Neural Computation, 3(2), 246-257.
– Park, J. and Sandberg, I. W. (1993). "Approximation and
Radial-Basis-Function Networks," Neural Computation, 5(2), 305-316.
A feedforward neural network with one hidden layer using sigmoidal activation functions (like
the logistic sigmoid or tanh) can approximate any continuous function with arbitrary accuracy,
given enough neurons in the hidden layer.
This means that if we have a function f(x) that maps inputs to outputs, we can construct a
neural network with a single hidden layer that closely approximates f(x).
The key insight is that neural networks can learn complex relationships even with a single
hidden layer, as long as they have enough neurons.
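As a concrete (if minimal) illustration of this idea, the sketch below fits a single hidden layer of tanh units to sin(x). The hidden-layer weights are simply drawn at random and only the output weights are solved for, which is already enough to get a close fit; the target function, sizes, and random weights are illustrative choices, not something prescribed by the theorem.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x).ravel()                          # target function to approximate

m = 50                                         # number of hidden units
W = rng.normal(scale=2.0, size=(1, m))         # fixed random input weights
b = rng.uniform(-np.pi, np.pi, size=m)         # fixed random biases
H = np.tanh(x @ W + b)                         # hidden-layer activations

w_out, *_ = np.linalg.lstsq(H, y, rcond=None)  # fit output weights by least squares
print("max abs error:", np.max(np.abs(H @ w_out - y)))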
Why does this matter?
It proves that neural networks have great flexibility and can model almost any real-world
function.
This result laid the foundation for why deep learning is so powerful—because neural networks
can represent almost any function.
The Model of Function Approximator
[Figure: the function approximator model, annotated with fixed basis functions and weights]
Linear Models
[Figure: output y formed as a linearly weighted sum of the hidden-unit outputs]
● Output unit: linearly weighted output, with weights w1, w2, …, wm
● Hidden units φ1, φ2, …, φm: decomposition, feature extraction, transformation
● Fourier Series
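In this notation (a standard way of writing the linear model sketched above; the symbols match the figure, the equation itself is not printed on the slide), the output is a weighted sum of fixed basis functions:
    y(x) = w1·φ1(x) + w2·φ2(x) + … + wm·φm(x)
A Fourier series has exactly this form, with the φi chosen as fixed sines and cosines and only the coefficients wi fitted to the data.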
Single-Layer Perceptrons as Universal Approximators
[Figure: network with output y, weights w1, w2, …, wm, hidden units φ1, φ2, …, φm, and input x = (x1, x2, …, xn)]
With a sufficient number of sigmoidal units, it can be a universal approximator.
Radial Basis Function Networks as Universal Approximators
[Figure: RBF network with output y and input x = (x1, x2, …, xn)]
[Figure: the same model, with the weights adjusted by the learning process]
Non-Linear Models
The Radial Basis
Function Networks
Radial Basis Function Networks
● Hidden units perform a projection; output units perform interpolation.
● For classification: hidden units capture subclasses; output units capture classes.
Network Parameters
● wi: The weights joining the hidden and output layers. These are the
weights used in obtaining the linear combination of the radial basis
functions; they determine the relative amplitudes of the RBFs when they
are combined to form the complex function.
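For concreteness (this formula is not printed on the slide, but it is the standard RBFN output in the notation above), with Gaussian basis functions centered at ci with spread σi the network output is
    y(x) = Σ_{i=1..m} wi·φi(x) = Σ_{i=1..m} wi·exp(−‖x − ci‖² / (2σi²))
so the wi set the relative amplitudes of the bumps placed at the centers ci.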
● Gaussian
● Hardy Multiquadric
● Inverse Multiquadric
[Figure: Gaussian basis function for σ = 0.5, 1.0, 1.5; inverse multiquadric for c = 1, 2, 3, 4, 5]
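The families above can all be written as functions of the radial distance r = ‖x − c‖. A small sketch follows; the parameter values are illustrative, mirroring the σ and c sweeps shown in the plots.

import numpy as np

def gaussian(r, sigma=1.0):
    # localized: decays to 0 as r grows
    return np.exp(-r**2 / (2 * sigma**2))

def multiquadric(r, c=1.0):
    # Hardy multiquadric: grows without bound as r grows
    return np.sqrt(r**2 + c**2)

def inverse_multiquadric(r, c=1.0):
    # localized like the Gaussian, but with heavier tails
    return 1.0 / np.sqrt(r**2 + c**2)

r = np.linspace(0.0, 5.0, 6)
print(gaussian(r, sigma=1.0))
print(inverse_multiquadric(r, c=1.0))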
RBFNs for Function Approximation
The idea
[Figure sequence: an unknown function to approximate, sampled by training data; basis functions (kernels) placed along the x-axis; the function learned from the weighted kernels; the learned function evaluated at non-training samples]
Radial Basis Function Networks as Universal Approximators
[Figure: RBF network with weights w1, w2, …, wm, input x = (x1, x2, …, xn), and a training set]
Learn the Optimal Weight Vector
[Figure: the same network and training set; the weights w1, w2, …, wm are to be learned]
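One common way to learn the optimal weight vector (a standard least-squares derivation, not spelled out on these slides): stack the hidden-unit responses on the training set into the matrix Φ, with Φji = φi(xj), and the targets into the vector y. The weights that minimize the squared error are
    w = (ΦᵀΦ)⁻¹Φᵀy = Φ⁺y,
the pseudoinverse solution; when Φ is square and nonsingular this reduces to w = Φ⁻¹y, which is exactly the "Exact RBF" case described below.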
Learning the Kernels
How to Train?
Exact RBF
The first-layer weights u are set to the training data, U = Xᵀ; that is, the
Gaussians are centered at the training data instances.
The spread is chosen as σ = dmax / √(2N), where dmax is the maximum Euclidean
distance between any two centers and N is the number of training data
points. Note that H = N for this case.
The output of the kth RBF output neuron is then
    yk(x) = Σ_{i=1..N} wki · φ(‖x − xi‖)
Define the N-by-N interpolation matrix Φ with elements Φji = φ(‖xj − xi‖).
If {xi}, i = 1, …, N, are a distinct set of points in d-dimensional space, then the
N-by-N interpolation matrix Φ with elements obtained from radial basis
functions is nonsingular, and hence can be inverted!
Note that the theorem is valid regardless of the value of N, the choice of the
RBF (as long as it is an RBF), or what the data points may be, as long as
they are distinct!
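A minimal sketch of the exact-interpolation recipe just described, assuming Gaussian kernels, centers at the training points (H = N), and the spread heuristic σ = dmax/√(2N); the helper names and the sin(x) example are illustrative, not taken from the slides.

import numpy as np

def fit_exact_rbf(X, y):
    # pairwise distances between training points (the centers)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    sigma = D.max() / np.sqrt(2 * len(X))        # spread from d_max and N
    Phi = np.exp(-D**2 / (2 * sigma**2))         # N x N interpolation matrix
    w = np.linalg.solve(Phi, y)                  # nonsingular, so solvable exactly
    return w, sigma

def rbf_predict(X_train, w, sigma, X_new):
    D = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=-1)
    return np.exp(-D**2 / (2 * sigma**2)) @ w

# usage: interpolate a 1-D sample of sin(x)
X = np.linspace(0, 2 * np.pi, 10).reshape(-1, 1)
y = np.sin(X).ravel()
w, sigma = fit_exact_rbf(X, y)
print(np.allclose(rbf_predict(X, w, sigma, X), y))   # True: every training point is fit exactly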
Approach 1
● Gaussian RBFs are localized functions, unlike the sigmoids used by MLPs.
● Using localized functions typically makes RBF networks more suitable for
function approximation problems.
● Since the first-layer weights are set to the input patterns, the second-layer
weights are obtained from solving linear equations, and the spread is computed
from the data, no iterative training is involved!
● Guaranteed to correctly classify all training data points!
● However, since we are using as many receptive fields as there are data points,
the solution is overdetermined if the underlying physical process does not have
as many degrees of freedom → overfitting!
● The importance of σ: too small a spread will also cause overfitting, while too
large a spread will fail to characterize rapid changes in the signal.
Too many Receptive Fields?