Chap 2
then the given data correspond to $x_i = S_0/K_i$ and $u_i = C_i/K_i$. If we interpolate (or approximate) these data with a curve $u(x)$ we obtain an approximation to
$$C(S, K_0) = K_0\, u(S/K_0).$$
What makes this problem challenging is the fact that for delta and gamma hedging we need not only $C(S, K_0)$ but also $\partial C/\partial S$ and $\partial^2 C/\partial S^2$, so that the usual piecewise linear approximation of the data is in general not useful. We need an approximation in terms of a once or twice differentiable function (unless we want to entrust ourselves to the numerical differentiation of discrete data and its numerical instability as discussed in Chapter X).
We begin by discussing the one-dimensional interpolation problem: Find a function $Pu(x)$ which interpolates the data $\{(x_i, u_i)\}_{i=0}^N$. It will always be assumed that the data $\{u_i\}$ represent measured or computed values of an underlying function $u(x)$ which, of course, is not known to us. Thus $Pu$ can always be considered an approximation to $u$. The notation $Pu$ is chosen to suggest that the interpolant is a projection of $u$ in the sense that if we take the interpolant $P(Pu)$ of $Pu$ we end up with $Pu$ itself.
Polynomial interpolation: There are infinitely many functions which pass through all the given data points. Conceptually a very simple function is a polynomial of degree $N$
$$P_N(x) = \sum_{j=0}^{N} a_j x^j$$
subject to the interpolation conditions
$$P_N(x_i) = u_i, \quad i = 0, \ldots, N.$$
These conditions are equivalent to the linear system
$$N a = u$$
where
$$N_{ij} = x_i^j, \quad i, j = 0, \ldots, N$$
and
$$u = (u_0, \ldots, u_N).$$
The matrix $N$ is known as a Vandermonde matrix and can be shown to be non-singular for distinct interpolation points $\{x_i\}$, so that the interpolating polynomial exists and is unique.
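For readers who wish to experiment, the following short Python sketch (an addition for illustration; NumPy is assumed, and the data values are arbitrary) sets up the Vandermonde system and solves it for the coefficients $a_j$.

```python
# A minimal sketch: set up the Vandermonde system N a = u and solve it
# for the polynomial coefficients a_0, ..., a_N.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])   # distinct interpolation points
u = np.array([1.0, 2.0, 0.0, 5.0])   # data values (arbitrary here)
V = np.vander(x, increasing=True)    # V[i, j] = x_i**j
a = np.linalg.solve(V, u)            # coefficients of P_N

# Verify the interpolation conditions P_N(x_i) = u_i.
assert np.allclose(np.polyval(a[::-1], x), u)
```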
It is, in fact, not even necessary to set up the matrix system. The interpolating polynomial
can be written in terms of the so-called Lagrange coefficients
$$\ell_i(x) = \frac{(x - x_0)(x - x_1) \cdots (x - x_{i-1})(x - x_{i+1}) \cdots (x - x_N)}{(x_i - x_0) \cdots (x_i - x_{i-1})(x_i - x_{i+1}) \cdots (x_i - x_N)}$$
or in short
$$\ell_i(x) = \frac{\displaystyle\prod_{\substack{j=0 \\ j \neq i}}^{N} (x - x_j)}{\displaystyle\prod_{\substack{j=0 \\ j \neq i}}^{N} (x_i - x_j)}.$$
Thus the interpolating polynomial for $\{x_i, u_i\}$ is
$$P_N(x) = \sum_{j=0}^{N} u_j\, \ell_j(x).$$
For example, for three equally spaced points $x_0$, $x_1 = x_0 + \Delta x$, $x_2 = x_0 + 2\Delta x$, differentiating the interpolating polynomial $P_2$ yields the derivative approximations
$$u'(x_0) = \frac{-3u_0 + 4u_1 - u_2}{2\Delta x},$$
and
$$u''(x_0) = \frac{u_0 - 2u_1 + u_2}{\Delta x^2}$$
(which, as we shall see later, is a much better approximation to $u''(x_1)$ than to $u''(x_0)$).
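These formulas are easy to test numerically. A minimal Python sketch, assuming the arbitrary test function $u(x) = e^x$, shows the expected behavior: the error of the first formula decays like $\Delta x^2$, while at $x_0$ the second decays only like $\Delta x$.

```python
# A quick numerical check of the two difference formulas, using
# u(x) = exp(x) at x0 = 0, so that u'(0) = u''(0) = 1.
import numpy as np

for dx in (0.1, 0.05, 0.025):
    u0, u1, u2 = np.exp([0.0, dx, 2.0 * dx])
    d1 = (-3.0 * u0 + 4.0 * u1 - u2) / (2.0 * dx)  # error ~ dx^2 at x0
    d2 = (u0 - 2.0 * u1 + u2) / dx**2              # error ~ dx   at x0
    print(dx, abs(d1 - 1.0), abs(d2 - 1.0))
```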
If the data come from an $N+1$ times continuously differentiable function $u$ then it is natural to ask how close the polynomial interpolant comes to reproducing $u$. This question is answered by the following theorem:
Let $\{x_j\}_{j=0}^N$ be distinct real numbers and let $u$ be a given real valued function with $N+1$ continuous derivatives on an interval containing the points $\{x_j\}$ and $t$. Then there exists a point $\zeta$ in this interval such that
$$u(t) - Pu(t) = \frac{(t - x_0) \cdots (t - x_N)}{(N+1)!}\, u^{(N+1)}(\zeta).$$
Polynomial interpolation is quite adequate for approximating a few data points, with $N = 3$, $4$, or $5$. It becomes problematical for
large N because the interpolant can oscillate wildly. A classic example is the interpolation
of the following nice function
$$u(x) = \frac{1}{1 + x^2}.$$
Let us interpolate at the equally spaced points
$$x_i = -5 + 10i/N, \quad u_i = u(x_i), \quad i = 0, \ldots, N,$$
and plot $u$ and $Pu$ as functions of $x$ when we interpolate for $N = 10$, i.e., when we interpolate at the integers $-5, -4, \ldots, 4, 5$.
It is apparent from the plot that the 10th order polynomial P u reproduces u very poorly
near the endpoints of the interval. The interpolant is useless for N = 10 and becomes worse
as N increases.
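The experiment is easy to reproduce. A minimal Python sketch (SciPy's `lagrange` helper and Matplotlib are assumed for convenience; they are not part of the text):

```python
# Reproducing the Runge example: degree-10 interpolation of 1/(1+x^2)
# at the integers -5,...,5, plotted against the function itself.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import lagrange

u = lambda x: 1.0 / (1.0 + x**2)
N = 10
xi = -5.0 + 10.0 * np.arange(N + 1) / N   # equally spaced nodes
Pu = lagrange(xi, u(xi))                  # interpolating polynomial

xx = np.linspace(-5.0, 5.0, 501)
plt.plot(xx, u(xx), label="u")
plt.plot(xx, Pu(xx), label="Pu, N = 10")
plt.plot(xi, u(xi), "o")
plt.legend()
plt.show()   # large oscillations appear near the endpoints
```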
Piecewise cubic spline interpolation: In order to avoid the instabilities which
can occur in high order polynomial interpolation but still be able to provide derivative
information the theory of spline interpolation has been developed in the past 30 years.
Splines have become the dominant tool for interpolating data which are thought to come
from a smooth function of a single variable. We shall discuss here the piecewise cubic spline.
We again suppose that we have data $\{x_i, u_i\}$ for $i = 0, \ldots, N$. The basic idea is to define over each subinterval $[x_{i-1}, x_i]$ a smooth function $P_i(x)$ which interpolates $u_{i-1}$ at $x_{i-1}$ and $u_i$ at $x_i$ so that
$$P_i(x_{i-1}) = u_{i-1}, \quad P_i(x_i) = u_i, \quad i = 1, \ldots, N,$$
and so that consecutive pieces join with matching first and second derivatives,
$$P_i'(x_i) = P_{i+1}'(x_i), \quad P_i''(x_i) = P_{i+1}''(x_i), \quad i = 1, \ldots, N-1.$$
By construction $Pu$ is twice continuously differentiable on $[x_0, x_N]$. It turns out that the simplest function which can satisfy these conditions is a cubic polynomial over each subinterval. It is straightforward to compute if we write
$$P_i''(x) = M_i\, \frac{x - x_{i-1}}{x_i - x_{i-1}} + M_{i-1}\, \frac{x_i - x}{x_i - x_{i-1}}, \quad i = 1, \ldots, N,$$
for as yet unknown constants $\{M_i\}$. By inspection $P_i''(x_i) = M_i$ and $P_{i+1}''(x_i) = M_i$ so that $Pu$ will have a continuous second derivative on $[x_0, x_N]$ regardless of how we pick $\{M_i\}$. We now find $P_i(x)$ by integrating twice so that, with $\Delta x_i = x_i - x_{i-1}$,
$$P_i(x) = M_i\, \frac{(x - x_{i-1})^3}{6\,\Delta x_i} + M_{i-1}\, \frac{(x_i - x)^3}{6\,\Delta x_i} + C\,(x - x_{i-1}) + D\,(x_i - x).$$
The two constants of integration are determined such that $P_i(x)$ interpolates $u$ at $x_{i-1}$ and $x_i$. It is straightforward to verify that the interpolant is
$$P_i(x) = M_i\, \frac{(x - x_{i-1})^3}{6\,\Delta x_i} + M_{i-1}\, \frac{(x_i - x)^3}{6\,\Delta x_i} + \left(u_i - \frac{M_i\, \Delta x_i^2}{6}\right) \frac{x - x_{i-1}}{\Delta x_i} + \left(u_{i-1} - \frac{M_{i-1}\, \Delta x_i^2}{6}\right) \frac{x_i - x}{\Delta x_i}.$$
The as yet unknown coefficients $\{M_i\}$ are still available to enforce continuity of the derivative. A straightforward calculation shows that
$$P_i'(x_i) = P_{i+1}'(x_i)$$
leads to
$$\frac{1}{6} M_{i-1}\, \Delta x_i + \frac{1}{3} M_i\, \Delta x_i + \frac{1}{3} M_i\, \Delta x_{i+1} + \frac{1}{6} M_{i+1}\, \Delta x_{i+1} = \frac{u_{i+1} - u_i}{\Delta x_{i+1}} - \frac{u_i - u_{i-1}}{\Delta x_i}, \quad i = 1, \ldots, N-1,$$
so that we have $N-1$ equations for the $N+1$ unknowns $M_0, \ldots, M_N$, and two additional conditions must be imposed. We can, for example, require that $Pu$ reduce to a straight line beyond the endpoints (the so-called natural spline) so that
$$M_0 = M_N = 0,$$
or we can fit derivative information for $u'$ so that
$$(Pu)'(x_0) = u'(x_0), \quad (Pu)'(x_N) = u'(x_N).$$
(If $u'$ is not available one can approximate $u$ with a low order Lagrange polynomial at the end points and take its derivative as an approximation to $u'$.) These conditions lead to the two equations
$$\frac{1}{3} M_0\, \Delta x_1 + \frac{1}{6} M_1\, \Delta x_1 = \frac{u_1 - u_0}{\Delta x_1} - u'(x_0),$$
$$\frac{1}{6} M_{N-1}\, \Delta x_N + \frac{1}{3} M_N\, \Delta x_N = -\frac{u_N - u_{N-1}}{\Delta x_N} + u'(x_N).$$
In both cases we end up with a linear system
$$N M = b.$$
For the natural spline the matrix $N$ has dimension $(N-1) \times (N-1)$ and the unknowns are $\{M_j\}_{j=1}^{N-1}$. In the second case the matrix $N$ has dimension $(N+1) \times (N+1)$. In both cases the matrix $N$ satisfies
$$|N_{ii}| > \sum_{\substack{j=0 \\ j \neq i}}^{N} |N_{ij}|.$$
Such a matrix is called strictly diagonally dominant, which guarantees the existence of a unique solution of the linear system and makes its numerical solution on a computer straightforward.
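As an illustration, here is a minimal Python sketch (not from the text) which assembles and solves the natural spline system for the moments; a banded solver would normally be preferred, but a dense solve keeps the sketch short.

```python
# Assemble and solve the natural-spline system for the moments
# M_1, ..., M_{N-1}, with M_0 = M_N = 0.
import numpy as np

def natural_spline_moments(x, u):
    N = len(x) - 1
    dx = np.diff(x)                       # dx[i-1] = x_i - x_{i-1}
    A = np.zeros((N - 1, N - 1))
    b = np.zeros(N - 1)
    for i in range(1, N):
        A[i - 1, i - 1] = (dx[i - 1] + dx[i]) / 3.0
        if i > 1:
            A[i - 1, i - 2] = dx[i - 1] / 6.0
        if i < N - 1:
            A[i - 1, i] = dx[i] / 6.0
        b[i - 1] = (u[i + 1] - u[i]) / dx[i] - (u[i] - u[i - 1]) / dx[i - 1]
    M = np.zeros(N + 1)
    M[1:N] = np.linalg.solve(A, b)        # strictly diagonally dominant system
    return M
```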
We can again ask how well the cubic spline will approximate a smooth function u by
interpolating its values. One now has the following theorem:
Let $u$ be four times continuously differentiable on the interval $[a, b]$. Let $a = x_0 < x_1 < \cdots < x_N = b$ and set
$$h = \max_i \Delta x_i.$$
Let $Pu$ be the piecewise cubic spline which interpolates $u$ at the points $\{x_j\}$ and $u'(a)$ and $u'(b)$. Then
$$\max_x |u^{(n)}(x) - (Pu)^{(n)}(x)| < c_n\, h^{4-n} \max_x |u^{(4)}(x)|, \quad n = 0, 1, 2,$$
where
$$c_0 = 5/384, \quad c_1 = 1/24, \quad c_2 = 3/8.$$
This theorem shows that the piecewise cubic spline is a high order approximation of the function $u$ and its first two derivatives. For example, if $h = 10^{-1}$ then the error in approximating $u$ is of order $10^{-4}$.
For a proof see again [].
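The $h^4$ behavior is easy to observe numerically. A sketch assuming SciPy's `CubicSpline` with clamped (first-derivative) boundary conditions, applied to $u = \sin$ on $[0, \pi]$:

```python
# Observing the O(h^4) bound: the maximum error drops by a factor of
# about 16 each time h is halved.
import numpy as np
from scipy.interpolate import CubicSpline

for N in (10, 20, 40):
    x = np.linspace(0.0, np.pi, N + 1)
    s = CubicSpline(x, np.sin(x),
                    bc_type=((1, np.cos(x[0])), (1, np.cos(x[-1]))))
    xx = np.linspace(0.0, np.pi, 2001)
    print(N, np.max(np.abs(np.sin(xx) - s(xx))))
```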
The theory of splines goes much beyond the case of simple piecewise cubic interpolants. Matlab provides spline tools up to order 9 (i.e., piecewise ninth order polynomials) for a smooth interpolation of data. In addition, interpolation in terms of functions other than polynomials can be applied, as in the so-called theory of “splines under tension,” which is useful when cubic splines show oscillations.
Here we shall pursue interpolation in the opposite direction and drop the order of the spline to 1; we are now talking about piecewise linear interpolation (which is routinely employed when plotting data). In analogy to the polynomial interpolation based on Lagrange coefficients we can write the linear interpolant through $\{x_i, u_i\}$ in the form
$$Pu = \sum_{j=0}^{N} u_j\, \phi_j(x)$$
with
$$\phi_i(x) = \begin{cases} \dfrac{x - x_{i-1}}{x_i - x_{i-1}} & x \in [x_{i-1}, x_i] \\[6pt] \dfrac{x_{i+1} - x}{x_{i+1} - x_i} & x \in [x_i, x_{i+1}] \\[6pt] 0 & \text{otherwise.} \end{cases}$$
We see that $\phi_i(x)$ is piecewise linear between any two interpolation points and that
$$\phi_i(x_j) = \begin{cases} 1 & i = j \\ 0 & i \neq j. \end{cases}$$
$\phi_i(x)$ is often referred to as a “hat” function. Piecewise linear interpolation requires little work compared to cubic spline interpolation. On the other hand, the approximation error is notably increased.
Theorem: Let $u$ be twice continuously differentiable on the interval $[a, b]$. Let $a = x_0 < \cdots < x_N = b$ and set
$$h = \max_i \Delta x_i.$$
Let $Pu$ be the piecewise linear function which interpolates $u$ at the points $\{x_i\}$. Then
$$\max_x |u^{(n)}(x) - (Pu)^{(n)}(x)| \le c_n\, h^{2-n} \max_x |u''(x)|, \quad n = 0, 1,$$
where $c_0 = 1/8$ and $c_1 = 1/2$. Hence the error in approximating $u$ is of order $h^2$ compared to $h^4$ for the piecewise cubic spline.
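Again the error order can be checked numerically; a minimal sketch assuming NumPy's `interp` (an added convenience) for the same test function as above:

```python
# Checking the O(h^2) bound for piecewise linear interpolation:
# halving h cuts the maximum error by roughly a factor of 4.
import numpy as np

for N in (10, 20, 40):
    x = np.linspace(0.0, np.pi, N + 1)
    xx = np.linspace(0.0, np.pi, 2001)
    err = np.max(np.abs(np.sin(xx) - np.interp(xx, x, np.sin(x))))
    print(N, err)
```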
Interpolation in the plane: Let us suppose now that we need to find a surface $Pu(x, y)$ which interpolates given points $\{(x_i, y_j), u_{ij}\}$ for $i = 0, \ldots, M$ and $j = 0, \ldots, N$. Because the data are given on a regular grid one can write down immediately interpolating functions, such as
$$Pu(x, y) = \sum_{i=0}^{M} \sum_{j=0}^{N} u_{ij}\, \ell_i(x)\, \ell_j(y)$$
or
$$Pu(x, y) = \sum_{i=0}^{M} \sum_{j=0}^{N} u_{ij}\, \phi_i(x)\, \phi_j(y)$$
where $\ell_i(x)$ is the $M$th order Lagrange coefficient in $x$ and $\ell_j(y)$ is the $N$th order Lagrange coefficient in $y$. Similarly, $\phi_i(x)$ and $\phi_j(y)$ are the one-dimensional piecewise linear hat functions.
Note that when Lagrange coefficients are used an infinitely differentiable interpolant results. As in the one-dimensional case, artificial oscillations rule out polynomial interpolation for more than a few points in each direction. We also observe that in piecewise
linear interpolation the function
$$\phi_i(x)\, \phi_j(y)$$
vanishes outside the rectangle $[x_{i-1}, x_{i+1}] \times [y_{j-1}, y_{j+1}]$, while inside it is piecewise linear in $x$ for fixed $y$, piecewise linear in $y$ for fixed $x$, and piecewise quadratic in $x$ along any line $y = mx + b$ crossing the rectangle. The function looks like a tent or hat with height 1 at the center and zero on and outside the boundary of the rectangle.
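Such tensor-product piecewise linear (bilinear) interpolation is available in standard libraries. A minimal sketch, assuming SciPy's `RegularGridInterpolator` as a stand-in for the hat function product above, with synthetic grid data:

```python
# Bilinear interpolation on a rectangular grid, i.e. the product form
# P u(x,y) = sum_ij u_ij phi_i(x) phi_j(y).
import numpy as np
from scipy.interpolate import RegularGridInterpolator

x = np.linspace(0.0, 1.0, 6)
y = np.linspace(0.0, 2.0, 9)
X, Y = np.meshgrid(x, y, indexing="ij")
U = np.sin(X) * np.cos(Y)                 # grid data u_ij (synthetic)

Pu = RegularGridInterpolator((x, y), U, method="linear")
print(Pu([[0.37, 1.41]]))                 # interpolated value at (0.37, 1.41)
```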
The essential assumption for our interpolation schemes is the requirement that the data
are given on a rectangular grid so that the interpolant can be expressed in product form.
Unfortunately, data may well be scattered. For example, if option prices C(S, t) are to be
interpolated from recorded prices at different times {tj } then the nodes {xi } change from
time level to time level since the value of the underlying asset will change with time. Hence
in general all we may assume is that data $\{(x_i, y_i, u_i)\}_{i=1}^N$ are given where each $(x_i, y_i)$
simply is a point with known coordinates. If all we need is a surface which interpolates the
data but which need not have continuous derivatives then the basic idea underlying the hat
function can be generalized.
First we “triangulate,” i.e., break up into triangles, the region containing the points $\{(x_i, y_i)\}$. Computer subroutines are available which generate triangles in the $x$-$y$ plane such that each point $(x_i, y_i)$ is the vertex of one or more triangles. Note that in general these
points are scattered in the plane. We cannot assume in general that (xk , yk ) and (xk+1 , yk+1 )
belong to the same triangle or adjacent triangles. Then for each vertex $(x_n, y_n)$ we are going to find the analog of the hat function which was defined above over the rectangular grid. This function is defined over all triangles. We shall denote it by $\phi_n(x, y)$. It has the shape of a plane over each triangle. This plane is uniquely determined by its height at the three vertices of the triangle. The plane will have height one at the chosen vertex $(x_n, y_n)$ and height zero at all other vertices. We note that if none of the three vertices of a triangle corresponds to the chosen vertex $(x_n, y_n)$ then $\phi_n(x, y) = 0$ over the triangle. If we were to plot $\phi_n$ we would see a tent of unit height at $(x_n, y_n)$ which slopes linearly to zero at adjacent
vertices and is continued as zero over the remaining triangles. The interpolant of a function
u(x, y) then is
$$Pu(x, y) = \sum_{n=1}^{N} u(x_n, y_n)\, \phi_n(x, y)$$
which is a continuous piecewise linear function defined over the whole domain. The actual
shape of each φn (x, y) over a given triangle is easily found. Suppose we have the unit triangle
with vertices 1, 2 and 3 given by (0, 0), (1, 0) and (0, 1), respectively. Then by inspection
we see that over this triangle in the $X$-$Y$ plane
$$\phi_1(X, Y) = 1 - X - Y, \quad \phi_2(X, Y) = X, \quad \phi_3(X, Y) = Y.$$
Now let $(x_0, y_0)$, $(x_1, y_1)$ and $(x_2, y_2)$ be the coordinates of the three vertices of a triangle $T$. We can map them to the vertices of the unit triangle with an affine transformation of the form
$$\begin{pmatrix} X \\ Y \end{pmatrix} = A \begin{pmatrix} x \\ y \end{pmatrix} + b$$
where the $2 \times 2$ matrix $A$ and the vector $b$ are chosen such that
$$\begin{pmatrix} 0 \\ 0 \end{pmatrix} = A \begin{pmatrix} x_0 \\ y_0 \end{pmatrix} + b, \quad \begin{pmatrix} 1 \\ 0 \end{pmatrix} = A \begin{pmatrix} x_1 \\ y_1 \end{pmatrix} + b, \quad \begin{pmatrix} 0 \\ 1 \end{pmatrix} = A \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} + b.$$
The interpolant over $T$ is then
$$Pu(x, y) = u(x_0, y_0)\, \phi_0(x, y) + u(x_1, y_1)\, \phi_1(x, y) + u(x_2, y_2)\, \phi_2(x, y)$$
where
$$\phi_0(x, y) = 1 - [a_{11} x + a_{12} y + b_1] - [a_{21} x + a_{22} y + b_2],$$
$$\phi_1(x, y) = a_{11} x + a_{12} y + b_1, \quad \phi_2(x, y) = a_{21} x + a_{22} y + b_2.$$
Building blocks for a linear (or higher order) interpolation on triangulated domains may be found in “finite element” program libraries which exist for the numerical integration of partial differential equations.
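SciPy bundles exactly this construction, a Delaunay triangulation of the scattered points followed by linear interpolation over each triangle. A minimal sketch with synthetic scattered data:

```python
# Scattered-data interpolation as described above: triangulate the
# points, then interpolate linearly over each triangle.
import numpy as np
from scipy.interpolate import LinearNDInterpolator

rng = np.random.default_rng(0)
pts = rng.random((50, 2))                 # scattered points (x_i, y_i)
vals = np.sin(pts[:, 0]) * pts[:, 1]      # synthetic data u_i

Pu = LinearNDInterpolator(pts, vals)      # piecewise linear over triangles
print(Pu(0.5, 0.5))                       # NaN for points outside the convex hull
```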
Approximation
Interpolation is not a suitable representation of data if the data are noisy due to measurement errors, if the mathematical model requires that a specific form of function with few degrees of freedom reproduce the measurements as well as possible, or if the data must be extrapolated well beyond the domain for which measurements are available. In such cases the data are approximated by an a priori chosen function in some average manner.
A common method for approximating data is the least squares method. To illustrate this approach consider the following problem. Suppose we have $M$ measurements $\{(x_i, u_i)\}_{i=1}^M$, with $M \gg 1$, which we expect to be normally distributed. However, we know neither the mean $\alpha_1$ nor the standard deviation $\alpha_2$. A least squares approach would require the calculation of $\alpha_1$ and $\alpha_2$ such that
$$F(\alpha_1, \alpha_2) = \sum_{j=1}^{M} w_j \left( u_j - N(x_j, \alpha_1, \alpha_2) \right)^2$$
is minimized. Here
$$N(x, \alpha_1, \alpha_2) = \frac{1}{\sqrt{2\pi}\, \alpha_2} \int_{-\infty}^{x} e^{-\frac{1}{2} \left( \frac{t - \alpha_1}{\alpha_2} \right)^2} dt,$$
while the number $w_i$ represents the weight one wishes to assign to the $i$th measurement. In a sense the function $N(x, \alpha_1, \alpha_2)$ is the mathematical model one wishes to fit to the data. In general, the model is either determined on theoretical grounds (e.g., by the laws of physics) or represents the experimenter's choice. Once a least squares formulation has been chosen the problem becomes purely mathematical. The above model has two unknowns $\alpha_1$ and $\alpha_2$,
i.e., two degrees of freedom, which are to be pinned down to minimize a weighted sum of
the errors between the measurements and the values predicted by the model. As is often the
case, the problem is compounded by constraints imposed on the parameters. For example,
while α1 may take on any value the standard deviation α2 is required to be positive. This
means that we have a so-called constrained minimization problem
$$\text{minimize } F(\alpha_1, \alpha_2)$$
over the set
$$-\infty < \alpha_1 < \infty, \quad \alpha_2 > 0.$$
If the minimum occurs in the interior of this set it can be found by solving the equations
$$\frac{\partial F}{\partial \alpha_1} = 0, \quad \frac{\partial F}{\partial \alpha_2} = 0$$
with Newton’s method or its many variants. However, non-linear least squares is not for the
faint of heart and requires good mathematical insight and/or a powerful program library.
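For completeness, here is a sketch of the nonlinear fit using `scipy.optimize.curve_fit`, one of the library routines alluded to above; the data are synthetic and the parameter bounds enforce the constraint $\alpha_2 > 0$.

```python
# Fitting the mean alpha1 and standard deviation alpha2 of the normal
# distribution function N(x, alpha1, alpha2) to noisy synthetic data.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def model(x, a1, a2):
    return norm.cdf(x, loc=a1, scale=a2)  # N(x, alpha1, alpha2)

rng = np.random.default_rng(1)
x = np.linspace(-3.0, 5.0, 40)
u = model(x, 1.0, 2.0) + 0.01 * rng.standard_normal(x.size)

(a1, a2), _ = curve_fit(model, x, u, p0=(0.0, 1.0),
                        bounds=([-np.inf, 1e-8], [np.inf, np.inf]))
print(a1, a2)                             # close to (1.0, 2.0)
```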
The situation is much improved if the mathematical model is linear in $\vec{\alpha}$ so that it has the general form
$$L(x)\vec{\alpha} = \sum_{j=1}^{N} g_j(x)\, \alpha_j$$
where the $\{g_j(x)\}$ are given functions and the $\{\alpha_j\}$ represent $N$ unknown parameters which have to be found such that $L(x_i)\vec{\alpha}$ approximates $u_i$ for all $i$. Typically $M \gg N$, i.e., we have many more observations than degrees of freedom. The weighted least squares minimization problem is then
$$\text{minimize } F(\vec{\alpha}) = \sum_{i=1}^{M} w_i \left( u_i - L(x_i)\vec{\alpha} \right)^2.$$
If we introduce the residuals
$$r_i(\vec{\alpha}) = u_i - L(x_i)\vec{\alpha}$$
this problem can be written as
$$\text{minimize } \langle W r, r \rangle$$
where $W$ is the diagonal matrix $W = \operatorname{diag}\{w_1, w_2, \ldots, w_M\}$ and $r = (r_1, \ldots, r_M)$. Since
$$L(x_i)\vec{\alpha} = \sum_{j=1}^{N} g_j(x_i)\, \alpha_j$$
we can write
$$r = \vec{u} - A\vec{\alpha}, \quad A_{ij} = g_j(x_i).$$
Hence we need to
$$\text{minimize } F(\vec{\alpha}) = \langle W(\vec{u} - A\vec{\alpha}),\, \vec{u} - A\vec{\alpha} \rangle.$$
At the minimum we must have
$$\frac{\partial F(\vec{\alpha})}{\partial \alpha_j} = 0 \quad \text{for all } j.$$
We compute
$$\frac{\partial F(\vec{\alpha})}{\partial \alpha_j} = \langle -W A_j,\, \vec{u} - A\vec{\alpha} \rangle + \langle W(\vec{u} - A\vec{\alpha}),\, -A_j \rangle, \quad j = 1, \ldots, N,$$
where $A_j$ denotes the $j$th column of $A$. Setting these derivatives to zero gives
$$(WA)^T (\vec{u} - A\vec{\alpha}) + A^T W (\vec{u} - A\vec{\alpha}) = 0$$
which is identical to
$$A^T W A\, \vec{\alpha} = A^T W \vec{u}.$$
13
This is a square system which has to be solved for α
~ . These N equations are sometimes
referred to as the normal equations of the linear least squares method. Let us illustrate the
linear least squares method in the following setting. Suppose at times {t i } a portfolio has
observed values {Mi } for i = 1, . . . , M .
It is our belief that $\mathcal{M}(t)$ should grow exponentially according to
$$\mathcal{M}(t) = \alpha_1 e^{\alpha_2 t}.$$
Taking logarithms gives $\log \mathcal{M}(t) = \log \alpha_1 + \alpha_2 t$, which is linear in the new parameters. We therefore fit a linear model to the data
$$u_i = \log \mathcal{M}(t_i), \quad i = 1, \ldots, M,$$
where
$$\beta_1 = \log \alpha_1, \quad \beta_2 = \alpha_2, \quad A_{i1} = 1, \quad A_{i2} = t_i.$$
The choice of weights will reflect our judgment as to which of the data points are more significant. If all share equally then we would set $W$ equal to the identity. The normal equations for $W = I$ can now be written in the form of a $2 \times 2$ system
$$\begin{pmatrix} \langle A_1, A_1 \rangle & \langle A_1, A_2 \rangle \\ \langle A_1, A_2 \rangle & \langle A_2, A_2 \rangle \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} = \begin{pmatrix} \langle A_1, u \rangle \\ \langle A_2, u \rangle \end{pmatrix}$$
where $A_i$ is the $i$th column of $A$ and $\langle\,,\,\rangle$ again denotes the dot product of two vectors. We solve for $(\beta_1, \beta_2)$ and then find $(\alpha_1, \alpha_2)$.
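The whole procedure fits in a few lines of Python. A sketch with synthetic data generated from $\alpha_1 = 2$, $\alpha_2 = 0.3$:

```python
# The exponential-growth fit: take logarithms, solve the normal
# equations with W = I, then recover alpha1 and alpha2.
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
noise = 1.0 + 0.01 * np.random.default_rng(2).standard_normal(t.size)
Mt = 2.0 * np.exp(0.3 * t) * noise        # synthetic portfolio values
u = np.log(Mt)

A = np.column_stack([np.ones_like(t), t]) # A_{i1} = 1, A_{i2} = t_i
beta = np.linalg.solve(A.T @ A, A.T @ u)  # normal equations, W = I
print(np.exp(beta[0]), beta[1])           # close to (2.0, 0.3)
```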
Closely related to the least squares fitting of data is the approximation of a given
function defined on an interval [a, b] in terms of a set of other functions.
For example, a financial record may be observed to be periodic in time with a period of, say, one month. The question may now be asked whether the overall record is the superposition of processes which are periodic with shorter periods, say half a month, a week, a day; in short, with periods $T/n$ for $n = 1, 2, \ldots, N$, where $T$ is the basic one month period. If $f(t)$ is the observed record with $t$ measured in days, then the mathematical problem can be stated loosely as: Can $f$ be approximated in terms of the set of functions $\{\sin 2\pi n t/30, \cos 2\pi n t/30\}$ for $n = 1, 2, \ldots, N$? If so then, for example, the terms involving $\sin 4\pi t/30$ and $\cos 4\pi t/30$ would roughly be the contribution to the total record of the processes whose period is half a month, since both the sine and cosine function go through two complete cycles in every 30 day time span.
In general, the mathematical problem is: Given $f$ defined on $[a, b]$ and functions $\{\phi_n(t)\}_{n=1}^N$, find an approximation $Pf$ to $f$ of the form
$$Pf(t) = \sum_{n=1}^{N} \alpha_n \phi_n(t).$$
Here
$$E(t, \vec{\alpha}) = f(t) - Pf(t)$$
is the error of the approximation. One may now attempt to determine $\hat{\alpha}$ such that
$$\max_{t \in [a,b]} |E(t, \hat{\alpha})| \le \max_{t \in [a,b]} |E(t, \vec{\alpha})| \quad \text{for all } \vec{\alpha}.$$
This means the maximum error on $[a, b]$ is minimized, which would seem a good criterion for determining $\{\alpha_1, \ldots, \alpha_N\}$. Unfortunately, this $Pf$ is not easy to calculate because the standard tools of calculus do not apply since the absolute value function is not differentiable.
The approximation problem becomes tractable if we choose to minimize the error in the mean square sense. This means we wish to find $\hat{\alpha}$ such that
$$\int_a^b E(t, \hat{\alpha})^2\, w(t)\, dt \le \int_a^b E(t, \vec{\alpha})^2\, w(t)\, dt \quad \text{for all } \vec{\alpha},$$
where we have included a weight function $w$ with the property that $w$ is continuous and that $w(t) > 0$ on $[a, b]$ except at finitely many points where its value may be zero. This problem is solvable because
$$F(\vec{\alpha}) = \int_a^b E(t, \vec{\alpha})^2\, w(t)\, dt$$
is a differentiable function of the coefficients $\{\alpha_i\}$, so that the minimizer can be found from
$$\frac{\partial F}{\partial \alpha_i} = 0 \quad \text{for } i = 1, \ldots, N.$$
From
$$F(\vec{\alpha}) = \int_a^b \left( f(t) - \sum_{n=1}^{N} \alpha_n \phi_n(t) \right)^2 w(t)\, dt$$
we find
$$\frac{\partial F}{\partial \alpha_i} = -2 \int_a^b \left( f(t) - \sum_{n=1}^{N} \alpha_n \phi_n(t) \right) \phi_i(t)\, w(t)\, dt = 0.$$
These $N$ equations constitute the linear system
$$A\vec{\alpha} = b$$
with
$$A_{in} = \int_a^b \phi_n(t)\, \phi_i(t)\, w(t)\, dt, \quad b_i = \int_a^b f(t)\, \phi_i(t)\, w(t)\, dt.$$
Under the natural condition that the functions {φn (t)} are linearly independent, i.e., that no
function is a linear combination of the remaining functions, the matrix A will be invertible
and the linear system is uniquely solvable. Since the error can be made large by taking very
large coefficients it is clear that the critical point where ∂F/∂αi = 0 is in fact a minimizer
for the mean square error.
As an illustration consider the following problem: Find the polynomial of degree $\le 2$ which best approximates, in the mean square sense (with weight function $w(t) \equiv 1$), the function $f(t) = t^3$ over the interval $[-1, 1]$. In this case
$$Pf(t) = \alpha_0 \cdot 1 + \alpha_1 t + \alpha_2 t^2$$
where $\phi_0(t) \equiv 1$, $\phi_1(t) \equiv t$ and $\phi_2(t) \equiv t^2$. The system $A\vec{\alpha} = b$ (with indices running from 0 to 2) has the following entries:
$$A_{00} = \int_{-1}^{1} 1 \cdot 1\, dt = 2, \quad A_{01} = \int_{-1}^{1} 1 \cdot t\, dt = 0, \quad A_{02} = \int_{-1}^{1} 1 \cdot t^2\, dt = 2/3,$$
$$A_{10} = \int_{-1}^{1} 1 \cdot t\, dt = 0, \quad A_{11} = \int_{-1}^{1} t \cdot t\, dt = 2/3, \quad A_{12} = \int_{-1}^{1} t \cdot t^2\, dt = 0,$$
$$A_{20} = \int_{-1}^{1} 1 \cdot t^2\, dt = 2/3, \quad A_{21} = \int_{-1}^{1} t \cdot t^2\, dt = 0, \quad A_{22} = \int_{-1}^{1} t^2 \cdot t^2\, dt = 2/5,$$
$$b_0 = \int_{-1}^{1} t^3 \cdot 1\, dt = 0, \quad b_1 = \int_{-1}^{1} t^3 \cdot t\, dt = 2/5, \quad b_2 = \int_{-1}^{1} t^3 \cdot t^2\, dt = 0.$$
The equation
$$\begin{pmatrix} 2 & 0 & 2/3 \\ 0 & 2/3 & 0 \\ 2/3 & 0 & 2/5 \end{pmatrix} \vec{\alpha} = \begin{pmatrix} 0 \\ 2/5 \\ 0 \end{pmatrix}$$
has the unique solution
$$\vec{\alpha} = \begin{pmatrix} 0 \\ 3/5 \\ 0 \end{pmatrix}$$
so that
$$Pf(t) = \tfrac{3}{5}\, t.$$
The computation simplifies considerably if the basis functions are chosen such that
$$A_{ij} = 0 \quad \text{for } i \neq j.$$
The functions $\{\phi_n\}$ are then said to be orthogonal with respect to the weight function $w$. In this case it follows that
$$\alpha_i = \frac{\int_a^b f(t)\, \phi_i(t)\, w(t)\, dt}{\int_a^b \phi_i(t)\, \phi_i(t)\, w(t)\, dt}.$$
In this setting $\alpha_i$ is known as a Fourier coefficient. This situation arises routinely when $f$ is approximated by a Fourier series where the functions $\{\phi_n\}$ are trigonometric functions as outlined above.
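The worked example above is easily verified numerically. A short sketch assuming `scipy.integrate.quad` (an added convenience) to assemble the Gram system:

```python
# Verifying the worked example: the best mean-square quadratic fit to
# f(t) = t^3 on [-1, 1] with weight w = 1 is P f(t) = 3t/5.
import numpy as np
from scipy.integrate import quad

phi = [lambda t: 1.0, lambda t: t, lambda t: t**2]
f = lambda t: t**3

A = np.array([[quad(lambda t: phi[i](t) * phi[j](t), -1.0, 1.0)[0]
               for j in range(3)] for i in range(3)])
b = np.array([quad(lambda t: f(t) * phi[i](t), -1.0, 1.0)[0] for i in range(3)])
print(np.linalg.solve(A, b))              # approximately [0, 0.6, 0]
```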