Technical Report 2007-001: Radial Basis Functions Response Surfaces


Technical Report 2007-001

Radial Basis Functions Response Surfaces


Author: Enrico Rigoni
Date: April 2, 2007

Abstract

Radial Basis Functions (RBF) are a powerful tool for multivariate scattered data
interpolation. Despite their simple formulation, RBF conceal a sound theoretical
framework. In modeFRONTIER five different radial functions are available. Furthermore,
a fully automatic scaling policy is implemented, based on the minimization of the
mean leave-one-out error.

Key Words: Radial Basis Functions, Response Surfaces



1 Introduction
Radial Basis Functions (RBF) are a powerful tool for multivariate scattered data
interpolation. Scattered data means that the training points do not need to be sampled
on a regular grid: RBF is indeed a proper meshless method. Since RBF are interpolant
response surfaces, they pass exactly through the training points.
There exists a vast body of literature on both the theoretical and the computational
features of RBF: for example refer to [1, 2, 3] for a detailed treatment of the subject.

2 Radial Basis Functions basics


Given a training set of n points sampled from a function f(x) : \mathbb{R}^d \to \mathbb{R},

    f(x_i) = f_i ,  i = 1, \ldots, n ,    (1)

an RBF interpolant has the form

    s(x) = \sum_{j=1}^{n} c_j \, \phi(\|x - x_j\| / \delta) ,    (2)

where \|\cdot\| is the Euclidean norm in the d-dimensional space, and δ is a fixed scaling
parameter. The radial function (or kernel) φ(r) : [0, +∞) → R is a suitable fixed function
chosen out of a given list. The RBF interpolant s is thus simply a linear combination
of identical spherically symmetric functions, centered at the n distinct training-point
sites.
The coefficients cj represent the free parameters of the RBF model. Their values
are obtained by imposing the interpolation equations:

    s(x_i) = f(x_i) = f_i , \quad \forall i = 1, \ldots, n .    (3)

By defining the symmetric matrix A (termed the collocation matrix of the RBF) as

    A_{ij} = \phi(\|x_i - x_j\| / \delta) , \quad i, j = 1, \ldots, n ,    (4)

the interpolation equations can be expressed as

    s(x_i) = \sum_{j=1}^{n} A_{ij} c_j = f_i , \quad \forall i = 1, \ldots, n ,    (5)

or, in matrix form, as

    A \cdot c = f .    (6)

If the matrix A is nonsingular, the unknown coefficient vector is obtained by solving the
linear system of equations:

    c = A^{-1} \cdot f .    (7)
It is immediately clear that RBF have a very plain formulation, but their apparent
simplicity hides a sound theoretical framework.
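
As an illustration of eqs. (2), (4), and (7), a minimal sketch in Python/NumPy is given below. The function names are ours for illustration and are not part of modeFRONTIER; a Gaussian kernel is assumed as the default.

    import numpy as np

    def rbf_fit(X, f, delta=1.0, phi=lambda r: np.exp(-r ** 2)):
        """Build the collocation matrix A of eq. (4) and solve A c = f (eqs. 6-7)."""
        # pairwise Euclidean distances between training points, scaled by delta
        R = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1) / delta
        A = phi(R)                       # collocation matrix
        return np.linalg.solve(A, f)     # coefficient vector c

    def rbf_eval(x, X, c, delta=1.0, phi=lambda r: np.exp(-r ** 2)):
        """Evaluate s(x) = sum_j c_j phi(||x - x_j|| / delta), eq. (2)."""
        r = np.linalg.norm(x - X, axis=-1) / delta
        return phi(r) @ c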


3 The polynomial term


A key point for obtaining a unique solution is the nonsingularity of the matrix A: this
depends only on the choice of the radial function φ.
For so-called positive definite (PD) radial functions, the matrix A is positive definite
for every choice of (distinct) training points, and the linear system (6) has a unique
solution.
In the case of so-called conditionally positive definite (CPD) radial functions, the form
of the RBF interpolant has to be changed: in order to guarantee a unique solution, an
additional polynomial term has to be introduced in eq. (2):
    s(x) = \sum_{j=1}^{n} c_j \, \phi(\|x - x_j\| / \delta) + p_m(x) .    (8)

Here m represents the degree of the polynomial, and it depends only on the choice of
φ. The polynomial term has the form
    p_m(x) = \sum_{j=1}^{q} b_j \, \pi_j(x) \in P_m^d ,    (9)

where \{\pi_j(x)\} is a basis of the linear space P_m^d containing all real-valued polynomials
in d variables of degree at most m, and q is the dimension of the polynomial space P_m^d,
equal to

    q = \binom{m + d}{d} .    (10)
An example will clarify the form of the polynomial term. With three variables
(d = 3), a second-order polynomial (m = 2) involves a 10-dimensional polynomial
space (q = 10). A suitable basis \{\pi_j(x)\} of this space is the following set of monomials:

    \{1,\, x_1,\, x_2,\, x_3,\, x_1^2,\, x_1 x_2,\, x_2^2,\, x_1 x_3,\, x_2 x_3,\, x_3^2\} .
Now, by defining the matrix P as

    P_{ij} = \pi_j(x_i) , \quad i = 1, \ldots, n , \; j = 1, \ldots, q ,    (11)

the interpolation equations (6) are replaced by

    A \cdot c + P \cdot b = f ,    (12)

and are coupled with q additional equations, the moment conditions:

    P^T \cdot c = 0 .    (13)

In this way the interpolation equations and the moment conditions can be arranged
together to form an augmented system of dimension n + q:

    \begin{pmatrix} A & P \\ P^T & 0 \end{pmatrix} \cdot \begin{pmatrix} c \\ b \end{pmatrix} = \begin{pmatrix} f \\ 0 \end{pmatrix} .    (14)
Since the radial function is CPD, this linear system can be inverted, and there exists
a unique solution for the unknown coefficient vectors c and b.
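
The assembly of the augmented system (14), including the construction of the matrix P from a monomial basis, can be sketched as follows (Python/NumPy; poly_basis and rbf_fit_cpd are illustrative names, not modeFRONTIER routines):

    import numpy as np
    from itertools import combinations_with_replacement

    def poly_basis(X, m):
        """Monomial basis of P_m^d evaluated at the training points (eqs. 9 and 11)."""
        n, d = X.shape
        cols = []
        for deg in range(m + 1):
            for idx in combinations_with_replacement(range(d), deg):
                cols.append(np.prod(X[:, list(idx)], axis=1) if idx else np.ones(n))
        return np.column_stack(cols)                 # n x q matrix P, q = binom(m+d, d)

    def rbf_fit_cpd(X, f, phi, delta=1.0, m=1):
        """Assemble and solve the augmented system of eq. (14) for a CPD kernel."""
        R = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1) / delta
        A = phi(R)
        P = poly_basis(X, m)
        n, q = P.shape
        M = np.block([[A, P], [P.T, np.zeros((q, q))]])
        rhs = np.concatenate([f, np.zeros(q)])
        sol = np.linalg.solve(M, rhs)
        return sol[:n], sol[n:]                      # coefficient vectors c and b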


    G     φ(r) = e^{-r^2}                                      PD
    PS    φ(r) = r^3 (d odd);  r^2 \log(r) (d even)            CPD:  m = (d+1)/2 (d odd),  m = d/2 (d even)
    MQ    φ(r) = (1 + r^2)^{1/2}                               CPD:  m = 0
    IMQ   φ(r) = (1 + r^2)^{-1/2}                              PD
    W2    φ(r) = (1 - r)_+^3 (3r + 1)  (d = 1)                 PD
          φ(r) = (1 - r)_+^4 (4r + 1)  (d = 2, 3)
          φ(r) = (1 - r)_+^5 (5r + 1)  (d = 4, 5)

Table 1: Available radial functions φ(r). They are either PD or CPD; in the latter case the degree
m of the required polynomial is specified. The symbol (\cdot)_+^k denotes the truncated power function:
(x)_+^k = x^k for x > 0, and (x)_+^k = 0 for x ≤ 0.

4 Radial functions
In modeFRONTIER five different radial functions are available: Gaussians (G), Duchon’s
Polyharmonic Splines (PS), Hardy’s MultiQuadrics (MQ), Inverse MultiQuadrics (IMQ),
and Wendland’s Compactly Supported C^2 (W2). This list covers the state-of-the-art,
widely used radial functions found in the literature.
The analytical expressions of the different functions are given in table 1, while
figure 1 shows their plots. G, IMQ, and W2 are PD; on the contrary, PS and MQ are
CPD, and so they require the additional polynomial term.
W2 have different expressions according to the space dimensionality; in any case they
cannot be used in more than five dimensions.
PS have two expressions, according to whether the space dimensionality is odd or even.
PS include Thin-Plate Splines (for d = 2) and the usual Natural Cubic Splines
(for d = 1) as particular cases.
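
For reference, a sketch of these kernels as Python callables is given below (an illustrative helper of ours; W2 is shown only in its d = 2, 3 form, and PS in the two forms of Table 1):

    import numpy as np

    def radial_function(kind, d):
        """Radial functions of Table 1 as callables (W2 given in its d = 2, 3 form)."""
        if kind == "G":
            return lambda r: np.exp(-r ** 2)
        if kind == "PS":                     # polyharmonic splines: r^3 (d odd), r^2 log r (d even)
            if d % 2 == 1:
                return lambda r: r ** 3
            return lambda r: r ** 2 * np.log(np.where(r > 0, r, 1.0))
        if kind == "MQ":
            return lambda r: np.sqrt(1.0 + r ** 2)
        if kind == "IMQ":
            return lambda r: 1.0 / np.sqrt(1.0 + r ** 2)
        if kind == "W2":                     # Wendland C2, d = 2, 3 case of Table 1
            return lambda r: np.maximum(1.0 - r, 0.0) ** 4 * (4.0 * r + 1.0)
        raise ValueError(f"unknown radial function: {kind}")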

5 Singular Value Decomposition algorithm


The Singular Value Decomposition (SVD) algorithm is a very powerful technique for
solving a linear system of equations, even when dealing with numerically difficult sit-
uations (refer to [4]).
The problem is to solve, for the unknown vector c, the linear system

    A \cdot c = f ,    (15)

given the matrix A and the vector f.


Figure 1: Plot of the different radial functions (for d = 2).

The SVD algorithm decomposes the (square) matrix A in this way:

    A = U \cdot W \cdot V^T ,    (16)

where U and V are two orthogonal matrices (i.e. U^T \cdot U = 1 and V^T \cdot V = 1),
and W is a diagonal matrix whose positive diagonal elements are the singular
values w_j. It is therefore simple to compute the inverse of A:

    A^{-1} = V \cdot [\mathrm{diag}(1/w_j)] \cdot U^T ,    (17)

where [\mathrm{diag}(1/w_j)] represents the diagonal matrix whose elements are the reciprocals
of the elements w_j of the matrix W.
The condition number of the matrix A is defined as the ratio of the greatest singular
value to the smallest one. The greater the condition number, the more difficult
the problem from the numerical point of view. When the condition number is comparable
to the inverse of the machine precision (for double-precision floating point
the machine precision is approximately 10^{-16}), the matrix is ill-conditioned and the
roundoff errors become important. The SVD method is able, to some extent, to face
this situation: in fact it can limit the contribution of the terms badly affected by
roundoff error.
The SVD algorithm is herein used to solve either of the linear systems (6) or (14).
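
A minimal sketch of such an SVD-based solve, reporting the condition number and damping the smallest singular values, might look like this (Python/NumPy; the rcond threshold is our illustrative choice):

    import numpy as np

    def svd_solve(A, f, rcond=1e-13):
        """Solve A c = f via the decomposition of eqs. (16)-(17), guarding against
        ill-conditioning by dropping the smallest singular values."""
        U, w, Vt = np.linalg.svd(A)                           # A = U diag(w) V^T
        cond = w.max() / w.min()                              # condition number of A
        w_inv = np.where(w > rcond * w.max(), 1.0 / w, 0.0)   # limit roundoff-dominated terms
        c = Vt.T @ (w_inv * (U.T @ f))                        # c = V diag(1/w) U^T f
        return c, cond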

6 Leave-one-out error
One method for checking the goodness of an interpolant response surface is the leave-
one-out methodology. In turn, each point belonging to the training set is excluded from
the training procedure. The value predicted at the excluded point by the surface so
created is then compared to the known value. The leading idea is that the smaller this
difference on average, the better the surface trained on the whole dataset.
A severe drawback of this technique is its huge computational demand: n different
surfaces have to be created, each using n − 1 training points, where n is the size of the
original training set. Very often this fact prevents the method from being used.
We will show in this section that this is not the case in the RBF framework: in fact
there is a convenient way of computing the root-mean-square (rms) leave-one-out error
(see [5]).

Let’s consider again the interpolation equations (6),

    A \cdot c = f .    (18)

Note that even the augmented system (14) can be considered to have the same form,

    \tilde{A} \cdot \tilde{c} = \tilde{f} ,    (19)

simply denoting with the symbol \tilde{\;\cdot\;} the relevant augmented quantities. Thus the results
presented in this section can be regarded as generic.
Let d^{(k)} be the solution of

    A \cdot d^{(k)} = e^{(k)} , \quad k = 1, \ldots, n ,    (20)

where e^{(k)} is the k-th column of the n × n identity matrix.


Before proceeding, it is better to rewrite the previous linear systems in their algebraic
form:

    \sum_{j=1}^{n} A_{ij} c_j = f_i , \quad \forall i = 1, \ldots, n ,    (21)

and

    \sum_{j=1}^{n} A_{ij} d_j^{(k)} = \delta_{ik} , \quad \forall i = 1, \ldots, n ,    (22)

respectively. The symbol \delta_{ik} denotes the Kronecker delta:

    \delta_{ik} = \begin{cases} 0 & \text{for } i \neq k \\ 1 & \text{for } i = k \end{cases}    (23)

Let c^{(k)} be the solution of the linear system obtained from (21) by removing from
A the k-th row and the k-th column:

    \sum_{j=1, j \neq k}^{n} A_{ij} c_j^{(k)} = f_i , \quad \forall i = 1, \ldots, k-1, k+1, \ldots, n .    (24)


We will show shortly that the solution is

    c_j^{(k)} = c_j - \frac{c_k}{d_k^{(k)}} \, d_j^{(k)} , \quad j = 1, \ldots, k-1, k+1, \ldots, n .    (25)

Demonstration: using the definition (25), and equations (21) and (22), it follows that

    \sum_{j=1, j \neq k}^{n} A_{ij} c_j^{(k)}
      = \sum_{j=1, j \neq k}^{n} A_{ij} c_j - \frac{c_k}{d_k^{(k)}} \sum_{j=1, j \neq k}^{n} A_{ij} d_j^{(k)}
      = (f_i - A_{ik} c_k) - \frac{c_k}{d_k^{(k)}} \left( \delta_{ik} - A_{ik} d_k^{(k)} \right)
      = f_i - \frac{c_k}{d_k^{(k)}} \delta_{ik} .    (26)

Since we have to consider only i \neq k, the term with \delta_{ik} vanishes, and so the right-hand
side is equal to that of equation (24).

The RBF interpolant s^{(k)}(x) obtained by excluding the k-th point from the training
set has the form (compare with equation (2)):

    s^{(k)}(x) = \sum_{j=1, j \neq k}^{n} c_j^{(k)} \, \phi(\|x - x_j\| / \delta) ,    (27)

where the coefficients c^{(k)} are exactly those of equation (24), which indeed represents
the interpolation equations for the reduced training set.
Evaluating it at the point x_k excluded from the training, we obtain

    s^{(k)}(x_k) = \sum_{j=1, j \neq k}^{n} c_j^{(k)} \, \phi(\|x_k - x_j\| / \delta) = \sum_{j=1, j \neq k}^{n} A_{kj} c_j^{(k)} ;    (28)

by virtue of the result obtained in equation (26), it follows that

    s^{(k)}(x_k) = f_k - \frac{c_k}{d_k^{(k)}} .    (29)

The leave-one-out error relative to the k-th term is then

    E_k = f_k - s^{(k)}(x_k) = \frac{c_k}{d_k^{(k)}} .    (30)
The key point is that, once the surface has been trained on the whole dataset, we
hold the inverse of the matrix A (obtained by means of the SVD algorithm): so it
is trivial to solve the set of systems (22) and get the needed d_k^{(k)} values. To be more
explicit, using equation (17), we get:

    d_k^{(k)} = \sum_{j=1}^{n} A^{-1}_{kj} \delta_{jk} = A^{-1}_{kk} = \sum_{i=1}^{n} \frac{V_{ki} U_{ki}}{w_i} .    (31)


Figure 2: Plot of the G radial function φ(r/δ) for different values of the shape parameter δ
(legend values: 0.5, 1.0, 2.0).

The rms leave-one-out error is

    E = \sqrt{\frac{1}{n} \sum_{k=1}^{n} E_k^2} .    (32)

Its evaluation comes as a fringe benefit, since it is readily computable once the
linear system of interpolation equations has been solved (which is the most demanding
part from the computational point of view). We will see later on how this value is
useful for setting the scaling parameter automatically.
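
A sketch of this shortcut, computing the rms leave-one-out error from a single SVD factorization of the collocation matrix, is given below (Python/NumPy; loo_rms_error is an illustrative name):

    import numpy as np

    def loo_rms_error(A, f):
        """rms leave-one-out error of eqs. (30)-(32) from one factorization of A."""
        U, w, Vt = np.linalg.svd(A)
        A_inv = Vt.T @ np.diag(1.0 / w) @ U.T    # eq. (17)
        c = A_inv @ f                            # full-data coefficients, eq. (7)
        d_kk = np.diag(A_inv)                    # d_k^(k) = (A^-1)_kk, eq. (31)
        E = c / d_kk                             # per-point errors E_k, eq. (30)
        return np.sqrt(np.mean(E ** 2))          # eq. (32)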

7 Scaling parameter
The scaling parameter determines the shape of the radial function: figure 2 shows an
example. Its value has to be set according to the specific problem one is facing: in
general it can be related to the spatial density of the data.

7.1 Test problem


In order to study the role of the scaling parameter it is convenient to introduce a test
problem, just to fix ideas. Let’s consider the sin_k2n2 problem (belonging to the wider
set of “sinusoid family problems”). The dimensionality of the problem is d = 2. The
input variable ranges are

    x_i \in [0, 1] , \quad i = 1, 2 ,


Figure 3: Plot of the response function of the sin_k2n2 problem.

and the response function is

    f(x) = \frac{1}{2} \left[ \sin(k \pi x_1) - \sin(k \pi x_2) \right] ,

where the parameter k (the number of half-periods in the sinusoidal functions) is fixed
to k = 2. Figure 3 shows the plot of the response function. We take n = 40 randomly
chosen points as the training set.
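
For reference, such a training set can be generated along the following lines (the random seed is arbitrary; the report does not specify one):

    import numpy as np

    rng = np.random.default_rng(0)      # seed chosen arbitrarily for reproducibility
    k, n, d = 2, 40, 2
    X = rng.random((n, d))              # n = 40 random points in [0, 1]^2
    f = 0.5 * (np.sin(k * np.pi * X[:, 0]) - np.sin(k * np.pi * X[:, 1]))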

7.2 Effects of the scaling parameter


The effect of changing the value of the scaling parameter is analyzed here in detail.
The RBF-G (RBF with radial function of type G) is studied on the test problem,
varying the scaling parameter over a wide range of values.
Figure 4 shows the plot of the condition number of the collocation matrix vs. the
scaling parameter. The curve grows rapidly as δ increases; then it settles on a plateau,
oscillating randomly around a value of about 10^{18}. Clearly in this limit the problem
is ill-conditioned (well beyond the 10^{16} threshold), and evidently the roundoff error
prevents this value from growing further.
The plot of the training performance (i.e. the rms error computed on the training
data) vs. the scaling parameter is shown in figure 5. The performance worsens as δ
increases. This behavior is correlated with the previous graph: the higher δ, the higher
the condition number. Since the problem becomes more difficult from the computational
point of view, the interpolation through the training points becomes harder (the roundoff
error becomes greater).


Figure 4: Condition number vs. scaling parameter for RBF-G on the sin_k2n2 problem.

Figure 5: Training performance (rms error on training data) vs. scaling parameter for RBF-G on
the sin_k2n2 problem.


Figure 6: Validation performance (rms error on validation data) vs. scaling parameter for RBF-G on
the sin_k2n2 problem.

Figure 7: rms leave-one-out error vs. scaling parameter for RBF-G on the sin_k2n2 problem.


Figure 8: Too low scaling parameter (δ = 0.05).

Figure 6 shows the plot of the validation performance (i.e. the rms error on the
validation data) vs. the scaling parameter. A brand new set of 500 points, the validation
dataset, has been generated in order to study the goodness of the trained RBF. Clearly
this is the function we would always like to minimize in order to get the optimum value
of δ: but obviously in real applications this plot is unknown.
The plot of the rms leave-one-out error vs. the scaling parameter is shown in figure 7.
The shape of the curve resembles very closely that of figure 6; more importantly,
the location of the minimum is the same. This fact makes it possible to derive a practical
method for automatically setting the scaling parameter to an optimal value. We will
analyze this aspect later on (see section 7.5).
RBF have been benchmarked over many test problems, obtaining similar results.
The kinds of RBF other than G have also been tested, studying their characteristic
behavior. A common result is that the rms leave-one-out error curve always closely
resembles the validation-performance curve. Herein only one test case has been reported,
for illustrative purposes.

7.3 Different settings of the scaling parameter


It is possible to identify three different characteristic behaviors of the RBF-G on the
test problem, according to the value of the scaling parameter:
• too low scaling parameter (e.g. δ = 0.05): low condition number. Bad results:
narrow peaks centered at the training points. See figure 8.
• proper scaling parameter (e.g. δ = 1.0): medium-high condition number. Good
result: proper fitting. See figure 9.


Figure 9: Proper scaling parameter (δ = 1.0).

Figure 10: Too high scaling parameter (δ = 100.0).


Figure 11: Fill distance h and separation distance q.

• too high scaling parameter (e.g. δ = 100.0): ill-conditioned problem. Bad result:
too “smooth” a surface, not even able to interpolate the training points. See
figure 10.

7.4 Uncertainty principle


The set of scattered training points (or interpolation points) is characterized by two
quantities: the fill distance h and the separation distance q. These quantities, defined
below, are shown in figure 11.
The fill distance h is defined as the radius of the largest inner empty disk:

    h = \max_{x \in \Omega} \min_{1 \le j \le n} \|x - x_j\| ,    (33)

where \Omega \subset \mathbb{R}^d is the domain of definition of f(x). In order to achieve better approximation
quality one should minimize the fill distance: min h.
The separation distance q is defined as the minimum distance between two training
points:

    q = \min_{i \neq j} \|x_i - x_j\| .    (34)

In order to improve the numerical stability of the problem one should maximize the
separation distance: max q.
Therefore, in order to improve both approximation quality and numerical stability,
one should maximize the ratio q/h. Clearly this objective is achieved for a well
distributed, almost uniform, set of training points. But in general, for scattered data,
one deals with q ≪ h. In the case of a uniform distribution of data, there is no way of
further improving both objectives: there is a trade-off between min h and max q.
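
A minimal sketch of these two quantities (Python with SciPy's cdist; the fill distance is approximated by evaluating over a dense sample of Ω rather than by an exact maximization):

    import numpy as np
    from scipy.spatial.distance import cdist

    def separation_distance(X):
        """q = min over i != j of ||x_i - x_j||, eq. (34)."""
        D = cdist(X, X)
        return D[~np.eye(len(X), dtype=bool)].min()

    def fill_distance(X, domain_samples):
        """Approximate h of eq. (33): the maximum, over a dense sample of Omega,
        of the distance to the nearest training point."""
        return cdist(domain_samples, X).min(axis=1).max()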


A similar trade-off behavior is found in the uncertainty principle of RBF interpolation
(see [6]): one cannot achieve both good accuracy (or error) and good stability
(or sensitivity). That is, by varying the scale parameter, one cannot improve the
approximation quality and the numerical stability at the same time. This is clear from
figures 4 and 6: by increasing the scale parameter value we improve the performance
on the validation data (the accuracy gets better), but at the same time the condition
number of the collocation matrix increases (the stability worsens).

7.5 Automatic setting of scaling parameter


As seen in section 7.2, the minimization of the rms leave-one-out error is a suitable method
for finding the optimum value of the scaling parameter. Therefore this method, proposed
in [5], has been implemented as an automatic procedure for determining the
proper setting of the scaling parameter. A simple one-dimensional minimizer can do
the job: in general 10 steps are sufficient to find the minimum.
In general the plot of the rms leave-one-out error for W2 functions shows a steep
initial decrease followed by a slowly descending, nearly flat behavior: in
this case it is sufficient for the optimizer to stop immediately after the steep decrease.
Since PS are nearly insensitive to the scaling parameter, they do not need this method:
in this case setting δ = 1.0 is enough.
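
A sketch of this automatic policy, reusing the loo_rms_error helper sketched in section 6 together with a bounded one-dimensional minimizer from SciPy, is shown below; the search range and the logarithmic parameterization of δ are our assumptions, not modeFRONTIER's actual settings.

    import numpy as np
    from scipy.optimize import minimize_scalar

    def auto_scale(X, f, phi, bounds=(1e-2, 1e2)):
        """Choose delta by minimizing the rms leave-one-out error (section 7.5)."""
        def loo_of_log_delta(log_delta):
            delta = 10.0 ** log_delta
            R = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1) / delta
            return loo_rms_error(phi(R), f)          # helper sketched in section 6
        res = minimize_scalar(loo_of_log_delta,
                              bounds=(np.log10(bounds[0]), np.log10(bounds[1])),
                              method="bounded",
                              options={"maxiter": 10})  # ~10 steps, as in the text
        return 10.0 ** res.x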

8 RBF vs. NN
Sometimes in the literature one finds the terminology “Radial Basis Function Networks”:
these are particular Neural Networks (NN) which use radial functions as transfer functions.
Often RBF Networks are simply the RBF described herein, but interpreted from
the point of view of NN: in our opinion this can be misleading, since RBF and
NN are quite different algorithms for building response surfaces, each with its own
characteristics (see e.g. [7] for a review of NN theory).
Below is a list of differences between RBF and NN. Furthermore, some features of
RBF Networks are presented which seem hard to fit into the NN framework.

• RBF are interpolant while NN are approximants (i.e. they do not pass exactly
through training points).

• In RBF Networks the number of neurons in the hidden layers is equal to the size
of the training set. In general this is not the case for a NN. Furthermore in RBF
Networks each neuron is strongly coupled with its relevant training point (as
explained in the next point), while in usual NN there is no direct link between
neurons and training points: in general each neuron is able to map different
regions of space.

• In NN each neuron in the hidden layer performs a linear combination of the
input values, where the coefficients are the weights (the free parameters of the
network). This net input is then transformed by the non-linear transfer function.
On the contrary, in RBF Networks each neuron combines the inputs by
evaluating the Euclidean distance of the input vector from one training point:
so the combination of the input values is non-linear and, instead of having free
parameters, one deals with the coordinates of one given training point. The
transfer function is then a radial function (usually only of type G).

• In NN the backpropagation training algorithm is an iterative method: it proceeds
by means of successive approximations. In RBF the training is a one-step
process: an exact solver works out the linear system of interpolation equations.

9 Parameters settings
Only a few parameters must be defined by the user for the RBF algorithm in mode-
FRONTIER:

Training Set: the set of designs used for Radial Basis Functions training. It is possible to
choose between the All Designs database and Only Marked Designs.

Radial Functions: five different radial functions are available: Gaussians (G),
Duchon’s Polyharmonic Splines (PS), Hardy’s MultiQuadrics (MQ), Inverse Mul-
tiQuadrics (IMQ), and Wendland’s Compactly Supported C^2 (W2).

Scaling Policy: the Automatic selection lets the algorithm choose the proper
scaling parameter value by minimizing the rms leave-one-out error.
On the contrary, if the User Defined choice is selected, the user has to define
the value of the scaling parameter manually.

Scaling Parameter: this field is significant if and only if the User Defined choice is
selected, and defines the value of the scaling parameter δ.

Variables Normalization: if Enabled (the suggested default choice), input and
output variables are normalized within the range [0, 1]. In this way the algorithm
can also deal with differently scaled variables. The Disabled option should be
reserved only for cases in which the variables have the same nature and the same
scale.

It is always possible to stop the run by clicking on the Stop RSM button. However,
since the interpolation equations are solved by the exact one-step SVD algorithm, a
premature stop will result in no solution.
It is always useful to look at the RSM log during and after the training process.
Some information on the distribution of the points is shown, such as the minimum,
maximum, and mean mutual distances. When the automatic scaling policy is enabled,
the scaling parameter and the mean leave-one-out error values are shown at each step.
Finally, the condition number of the problem is shown. Note that when variable
normalization is enabled, all other reported values are normalized accordingly.


References
[1] Wendland, Holger, 2004, Scattered Data Approximation, Cambridge University
Press

[2] Iske, Armin, 2004, Multiresolution Methods in Scattered Data Modelling, Springer

[3] Buhmann, Martin D., 2003, Radial Basis Functions: Theory and Implementations,
Cambridge University Press

[4] Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P., 1992, Numer-
ical Recipes in C. The Art of Scientific Computing, 2nd ed., Cambridge University
Press

[5] Rippa, Shmuel, 1999, An algorithm for selecting a good value for the parameter
c in radial basis function interpolation, Adv. in Comp. Math., 11, 193-210

[6] Schaback, Robert, 1995, Error estimates and condition numbers for radial basis
function interpolation, Adv. in Comp. Math., 3, 251-264.

[7] Rigoni, Enrico and Lovison, Alberto, 2006, Neural Networks Response Surfaces,
Esteco Technical Report 2006-001.

