
Numerical Solution of Parabolic Equations

Ole Østerby
Department of Computer Science, Aarhus University

Monograph
ISBN 978-87-7507-313-9
DOI 10.7146/aul.5.5
April 30, 2014
Preface

These lecture notes are designed for a one semester course (10 ECTS). They
deal with a rather specific topic: finite difference methods for parabolic partial
differential equations as they occur in many areas such as Physics (diffusion, heat
conduction), Geology (ground water flow), and Economics (finance theory).
The reader is supposed to have a basic knowledge of calculus (Taylor’s formula),
complex analysis (exp(it)), linear algebra (eigenvalues and eigenvectors for matri-
ces), and some familiarity with programming. A deeper knowledge in the theory
of partial differential equations is, however, not required.
The text is divided into 16 chapters and 3 appendices, some long, some short,
and some very short – which does not necessarily reflect their importance. If one is
planning a shorter (5 ECTS) course one possibility is to omit Chapters 6, 7, and
8 and stop at (or after) Chapter 12. The interconnection between the various
chapters and appendices is illustrated in the reading pattern:

[Reading pattern diagram: the main line runs through Chapters 1-2-3-4-5, then 9-10-11-12-13, with Chapters 6, 7, 8, 14, 15, 16 and Appendices A, B, C branching off this line.]

Chapter 1 contains the basic introduction to parabolic equations (existence,
uniqueness, well-posedness) and to the finite difference schemes which the book
is all about. Two sections deal with the solution of (almost) tridiagonal lin-
ear systems of equations, and the Lax-Richtmyer equivalence theorem is briefly
mentioned.
In Chapter 2 we discuss stability using the Fourier (or von Neumann) method, and
in Chapter 3 we define the local truncation error as a means to discuss accuracy of
the finite difference schemes. In Chapter 4 we look closely at boundary conditions
which, if they contain derivatives, deserve special attention.

In Chapter 5 we study equations with a significant first order (convection) term.
These equations are somewhat related to the one-way wave equation which is
discussed in Appendix A. The next three Chapters can be omitted if time is short.
Chapter 6 studies an alternate approach to stability analysis using matrix eigen-
values. Chapter 7 explains why we seldom use two-step methods, and Chapter 8
gives various suggestions on how to fight the adverse effects of discontinuities in
the initial and/or boundary conditions.
A very important – and difficult – topic is to estimate the global error, i.e. the
difference between the true and the computed solution. Chapter 9 provides one
method to study this global error including the effect of boundary conditions
(with a derivative).
The method of Chapter 9 gives some answers but is not ideal in practice. In
Chapter 10 we introduce a practical way of estimating the error which works in
many cases – and issues warning signals when the results may not be trustworthy.
Having a reliable error estimate enables us to choose a reasonable set of step sizes,
small enough to meet a given error tolerance, but not excessively small such that
we waste computer time. A reliable error estimate also opens possibilities for
extrapolation, thereby gaining extra accuracy at minimal effort. This chapter is
probably the most important one, containing material which is not readily found
elsewhere.
In Chapter 11 we take up problems with two space variables. We want to take
advantage of the good stability properties of implicit methods but not pay the
price incurred by solving large systems of equations. The answer is ADI-methods
where we solve for one space variable at a time using tridiagonal systems. ADI-
methods cannot be used directly on equations with a mixed derivative term. We
show in Chapter 12 how to modify our methods to take care of this without
sacrificing stability or efficiency. Chapter 13 is devoted to two examples from
Finance Theory involving two-factor models.
One should stay away from ill-posed problems. What may happen if we don’t is
illustrated in Chapter 14.
Chapter 15 treats the Stefan problem, an example of a moving boundary problem
where the solution region is not known beforehand, but a boundary curve must be
found together with the solution. For this particular problem we can avoid (some
of the) interpolation problems by varying the time step size. Another example
of a moving boundary problem is the American option of Chapter 16 where the
exercise boundary is not known in advance but must be determined together with
the solution.
In Appendix A we evaluate a number of difference schemes for the one-way
wave equation which is related to the convection-diffusion equation of Chapter 5.

Appendix B introduces two classes of test problems with two space variables, and
in Appendix C we discuss some side effects of interpolation which are of importance
for our methods for moving boundary problems.
Among the special features in this book we can mention

• An efficient way of estimating the order and accuracy of a method on a
  particular problem.

• The possibility of extrapolating to get higher order and better accuracy.

• An efficient way of determining (near-)optimal step sizes to meet a prescribed
  error tolerance (Chapter 10).

• A systematic way of incorporating inhomogeneous terms and boundary
  conditions in ADI-methods (Chapter 11).

• A systematic way of incorporating mixed derivative terms in ADI-methods
  (Chapter 12).

I should like to express my thanks to colleagues at the Danish Technical University
and to several students at Aarhus University in Computer Science, Geology, and
especially Finance Theory who have been exposed to more or less preliminary
versions of these lecture notes. The questions that arose in various discussions
have been the inspiration for many of the topics I have taken up, topics for which
answers are not readily found in common text books.

Contents

1 Basics 1
1.1 Differential equations . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Main types of PDEs . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Separation of variables . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Side conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.5 Well-posed and ill-posed problems . . . . . . . . . . . . . . . . 4
1.6 Two test problems . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.7 Difference operators . . . . . . . . . . . . . . . . . . . . . . . . 5
1.8 Difference schemes . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.8.1 The explicit method . . . . . . . . . . . . . . . . . . . . 7
1.8.2 The implicit method . . . . . . . . . . . . . . . . . . . . 8
1.8.3 Crank-Nicolson . . . . . . . . . . . . . . . . . . . . . . . 8
1.8.4 The general θ-method . . . . . . . . . . . . . . . . . . . 9
1.8.5 The operators P, P_{k,h} and R_{k,h} . . . . . . . . . . . . 9
1.9 Two-step schemes . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.10 Error norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.11 Tridiagonal systems of equations . . . . . . . . . . . . . . . . . 11
1.12 Almost tridiagonal systems . . . . . . . . . . . . . . . . . . . . 13
1.13 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.14 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2 Stability 19
2.1 Fourier analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2 Two examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3 Fourier analysis and differential equations . . . . . . . . . . . . 22
2.4 Von Neumann analysis . . . . . . . . . . . . . . . . . . . . . . . 23
2.5 Implicit methods . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.6 Two kinds of stability . . . . . . . . . . . . . . . . . . . . . . . 27
2.7 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

3 Accuracy 29
3.1 Taylor expansions . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2 Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.3 Symbols of operators . . . . . . . . . . . . . . . . . . . . . . . . 32
3.4 The local error . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4 Boundary Conditions 35
4.1 A Dirichlet condition . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2 Derivative boundary conditions . . . . . . . . . . . . . . . . . . 36
4.3 A third test problem . . . . . . . . . . . . . . . . . . . . . . . . 36
4.4 The explicit method . . . . . . . . . . . . . . . . . . . . . . . . 37
4.4.1 First order approximation . . . . . . . . . . . . . . . . . 37
4.4.2 Asymmetric second order . . . . . . . . . . . . . . . . . 37
4.4.3 Symmetric second order . . . . . . . . . . . . . . . . . . 38
4.5 The implicit method . . . . . . . . . . . . . . . . . . . . . . . . 38
4.5.1 First order approximation . . . . . . . . . . . . . . . . . 38
4.5.2 Asymmetric second order . . . . . . . . . . . . . . . . . 38
4.5.3 Symmetric second order . . . . . . . . . . . . . . . . . . 39
4.6 The θ-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

5 The Convection-Diffusion Equation 41


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.2 Maximum principle . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.3 The explicit method . . . . . . . . . . . . . . . . . . . . . . . . 42
5.4 The upwind scheme . . . . . . . . . . . . . . . . . . . . . . . . 44
5.5 The implicit method . . . . . . . . . . . . . . . . . . . . . . . . 47
5.6 Crank-Nicolson . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.7 Comparing the methods . . . . . . . . . . . . . . . . . . . . . . 48
5.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

6 The Matrix Method 51


6.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.2 The explicit method . . . . . . . . . . . . . . . . . . . . . . . . 51
6.3 The implicit method . . . . . . . . . . . . . . . . . . . . . . . . 53
6.4 The θ-method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.5 Stability by the matrix method . . . . . . . . . . . . . . . . . . 54
6.6 Eigenvalues of tridiagonal matrices . . . . . . . . . . . . . . . . 55
6.7 The influence of boundary values . . . . . . . . . . . . . . . . . 58
6.8 A derivative boundary condition . . . . . . . . . . . . . . . . . 59
6.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

7 Two-step Methods 61
7.1 The central-time central-space scheme . . . . . . . . . . . . . . 61
7.2 The DuFort-Frankel scheme . . . . . . . . . . . . . . . . . . . . 62
7.3 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

8 Discontinuities 65

8.1 Stability and damping . . . . . . . . . . . . . . . . . . . . . . . 65
8.2 The growth factor . . . . . . . . . . . . . . . . . . . . . . . . . 66
8.3 Reducing the Crank-Nicolson oscillations . . . . . . . . . . . . . 67
8.3.1 AV – the moving average . . . . . . . . . . . . . . . . . 67
8.3.2 IM1 – One step with IM . . . . . . . . . . . . . . . . . . 68
8.3.3 SM – Small steps at the beginning . . . . . . . . . . . . 68
8.3.4 Pearson . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
8.3.5 EI – Exponentially increasing steps . . . . . . . . . . . . 68
8.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
8.5 A discontinuous corner . . . . . . . . . . . . . . . . . . . . . . . 69
8.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

9 The Global Error – Theoretical Aspects 71


9.1 The local error . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
9.2 The global error . . . . . . . . . . . . . . . . . . . . . . . . . . 71
9.3 The explicit method . . . . . . . . . . . . . . . . . . . . . . . . 72
9.4 The initial condition . . . . . . . . . . . . . . . . . . . . . . . . 73
9.5 Dirichlet boundary conditions . . . . . . . . . . . . . . . . . . . 74
9.6 The error for the explicit method . . . . . . . . . . . . . . . . . 74
9.7 The implicit method . . . . . . . . . . . . . . . . . . . . . . . . 75
9.7.1 An example . . . . . . . . . . . . . . . . . . . . . . . . . 76
9.8 Crank-Nicolson . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
9.8.1 Example continued . . . . . . . . . . . . . . . . . . . . . 78
9.9 Upwind schemes . . . . . . . . . . . . . . . . . . . . . . . . . . 78
9.10 Boundary conditions with a derivative . . . . . . . . . . . . . . 79
9.10.1 First order approximation . . . . . . . . . . . . . . . . . 80
9.10.2 Asymmetric second order . . . . . . . . . . . . . . . . . 80

9.10.3 Symmetric second order . . . . . . . . . . . . . . . . . . 81
9.10.4 Test problem 3 revisited . . . . . . . . . . . . . . . . . . 82
9.11 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

10 Estimating the Global Error and Order 85


10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
10.2 The global error . . . . . . . . . . . . . . . . . . . . . . . . . . 85
10.3 Can we trust these results? . . . . . . . . . . . . . . . . . . . . 88
10.4 Further improvements of the error estimate . . . . . . . . . . . 90
10.5 Two independent variables . . . . . . . . . . . . . . . . . . . . . 92
10.6 Limitations of the technique. . . . . . . . . . . . . . . . . . . . 94
10.7 Test problem 3 – once again . . . . . . . . . . . . . . . . . . . . 96
10.8 Which method to choose . . . . . . . . . . . . . . . . . . . . . . 99
10.9 Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
10.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

11 Two Space Dimensions 103


11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
11.2 The explicit method . . . . . . . . . . . . . . . . . . . . . . . . 104
11.3 Implicit methods . . . . . . . . . . . . . . . . . . . . . . . . . . 105
11.4 ADI methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
11.5 The Peaceman-Rachford method . . . . . . . . . . . . . . . . . 107
11.6 Practical considerations . . . . . . . . . . . . . . . . . . . . . . 108
11.7 Stability of Peaceman-Rachford . . . . . . . . . . . . . . . . . . 110
11.8 D’Yakonov . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
11.9 Douglas-Rachford . . . . . . . . . . . . . . . . . . . . . . . . . . 111
11.10 Stability of Douglas-Rachford . . . . . . . . . . . . . . . . . . . 113
11.11 The local truncation error . . . . . . . . . . . . . . . . . . . . . 113

11.12 The global error . . . . . . . . . . . . . . . . . . . . . . . . . . 116
11.13 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

12 Equations with Mixed Derivative Terms 121


12.1 Practical considerations . . . . . . . . . . . . . . . . . . . . . . 122
12.2 Stability with mixed derivative . . . . . . . . . . . . . . . . . . 123
12.3 Stability of ADI-methods . . . . . . . . . . . . . . . . . . . . . 126
12.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
12.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

13 Two-Factor Models – two examples 133


13.1 The Brennan-Schwartz model . . . . . . . . . . . . . . . . . . . 133
13.2 Practicalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
13.3 A Traditional Douglas-Rachford step . . . . . . . . . . . . . . . 139
13.4 The Peaceman-Rachford method . . . . . . . . . . . . . . . . . 140
13.5 Fine points on efficiency . . . . . . . . . . . . . . . . . . . . . . 141
13.6 Convertible bonds . . . . . . . . . . . . . . . . . . . . . . . . . 142

14 Ill-Posed Problems 143


14.1 Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
14.2 Practice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
14.3 Variable coefficients – An example . . . . . . . . . . . . . . . . 148

15 A Free Boundary Problem 149


15.1 The Stefan problem . . . . . . . . . . . . . . . . . . . . . . . . 149
15.2 The Douglas-Gallie method . . . . . . . . . . . . . . . . . . . . 150
15.3 The global error . . . . . . . . . . . . . . . . . . . . . . . . . . 153
15.4 Estimating the global error . . . . . . . . . . . . . . . . . . . . 157

16 The American Option 161
16.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
16.2 The mathematical model . . . . . . . . . . . . . . . . . . . . . 161
16.3 The boundary condition at infinity . . . . . . . . . . . . . . . . 164
16.4 Finite difference schemes . . . . . . . . . . . . . . . . . . . . . . 168
16.5 Varying the time steps . . . . . . . . . . . . . . . . . . . . . . . 171
16.6 The implicit method . . . . . . . . . . . . . . . . . . . . . . . . 172
16.7 Crank-Nicolson . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
16.8 Determining the order . . . . . . . . . . . . . . . . . . . . . . . 182
16.9 Efficiency of the methods . . . . . . . . . . . . . . . . . . . . . 185

A The One-Way Wave Equation 187


A.1 Forward-time forward-space . . . . . . . . . . . . . . . . . . . . 187
A.2 Forward-time backward-space . . . . . . . . . . . . . . . . . . . 188
A.3 Forward-time central-space . . . . . . . . . . . . . . . . . . . . 189
A.4 Central-time central-space or leap-frog . . . . . . . . . . . . . . 189
A.5 Lax-Friedrichs . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
A.6 Backward-time forward-space . . . . . . . . . . . . . . . . . . . 191
A.7 Backward-time backward-space . . . . . . . . . . . . . . . . . . 191
A.8 Backward-time central-space . . . . . . . . . . . . . . . . . . . 192
A.9 Lax-Wendroff . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
A.10 Crank-Nicolson . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
A.11 Overview of the methods . . . . . . . . . . . . . . . . . . . . . 194

B A Class of Test Problems 197

C Interpolation and the Order Ratio 199


C.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199

C.2 Linear interpolation . . . . . . . . . . . . . . . . . . . . . . . . 199
C.3 Three-point interpolation . . . . . . . . . . . . . . . . . . . . . 201

Bibliography 203

Chapter 1

Basics

1.1 Differential equations

An ordinary differential equation (ODE) is a relation between a function, y, of
one independent variable, x, and its derivative, and possibly derivatives of higher
order, e.g.

d³y/dx³ + a d²y/dx² + b dy/dx = f(x).
A solution to the differential equation is a differentiable function y(x) which
satisfies this equation.
A partial differential equation (PDE) is the generalization to functions of two or
more independent variables, e.g.

∂u/∂t − b ∂²u/∂x² + a ∂u/∂x − κu = ν(t, x).
A solution is a differentiable function u(t, x) which satisfies the equation.
Notation. We shall in the following use subscripts to denote partial derivatives,
e.g.

u_t = ∂u/∂t,   u_x = ∂u/∂x,   u_xx = ∂²u/∂x²,   etc.
The partial differential equation above now reads

u_t − b u_xx + a u_x − κu = ν(t, x).

The coefficients a, b, κ may be functions of t and x but in most of our theoretical
investigations we shall treat them as constants.

1.2 Main types of PDEs

The most commonly occurring PDEs are divided into three groups:

Type         Simple example
Elliptic     u_xx + u_yy = 0    Laplace’s equation
Parabolic    u_t − u_xx = 0     Heat equation
Hyperbolic   u_tt − u_xx = 0    Wave equation

Parabolic and hyperbolic equations describe phenomena which evolve in time
whereas elliptic equations describe steady state situations. In physics the elliptic
equations model an electrostatic field, the parabolic equations model diffusion or
heat conduction problems, and the hyperbolic equations model wave motion. In
recent years parabolic equations have been used increasingly to model systems in
mathematical economy and finance theory. The remainder of this book will be
aimed exclusively at methods for solving parabolic equations.

1.3 Separation of variables

Some of the simpler PDEs such as the simple heat equation

u_t = u_xx                                                   (1.1)

can be solved using separation of variables, i.e. we assume the solution u(t, x)
can be written as a product of a function of t and a function of x:

u(t, x) = T(t)X(x).                                          (1.2)

Inserting in the differential equation we get

T′(t)X(x) = T(t)X′′(x).

If X(x) ≡ 0 then u(t, x) ≡ 0 which is a (trivial) solution to (1.1). Since we are
interested in non-trivial solutions we can assume that there is an x_1 such that
X(x_1) ≠ 0 and therefore X(x) ≠ 0 in a neighbourhood around x_1. Similarly we
can find a t_1 such that T(t) ≠ 0 in a neighbourhood around t_1. In a neighbourhood
around (t_1, x_1) we can therefore divide by T(t)X(x) and get

T′(t)/T(t) = X′′(x)/X(x),

but since the left-hand-side is only a function of t and the right-hand-side is only
a function of x the result must be a constant which could be either positive or
negative. Taking the latter case first and setting the constant equal to −ω² where
ω is a real number we end up with two ODEs, one for T and one for X. The
general solutions are

T(t) = c_t exp(−ω²t),   X(x) = c_x cos(ωx + ϕ).

Combining these we find that

u(t, x) = c exp(−ω²t) cos(ωx + ϕ)                            (1.3)

is a general solution to the simple heat equation. As this equation is linear and
homogeneous any linear combination of two or more solutions (e.g. with different
values of ω) is also a solution. We need extra information to determine the values
of c, ω, and ϕ. If for instance the initial value at t = 0 is a cosine or a linear
combination of cosines then we get a solution by multiplying each cosine by the
appropriate exponential factor. We shall return to this in sections 1.4 and 1.6.
If the constant above is positive we write it in the form +ω² and we are in a similar
manner led to a general solution of (1.1) in the form

u(t, x) = c exp(ω²t) cosh(ωx + ϕ).                           (1.4)

Functions of type (1.4) grow without bound with increasing x and t. If we are
interested in bounded solutions, functions of this type must have zero weight.
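
Computer algebra gives a quick check of (1.3). The following sketch (our own
illustration, assuming the sympy package is available) verifies symbolically that
u_t − u_xx vanishes for all c, ω, and ϕ:

    import sympy as sp

    t, x, c, w, phi = sp.symbols('t x c omega phi', real=True)
    u = c * sp.exp(-w**2 * t) * sp.cos(w * x + phi)   # the general solution (1.3)

    # u_t - u_xx should simplify to 0 identically
    print(sp.simplify(sp.diff(u, t) - sp.diff(u, x, 2)))   # prints 0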

1.4 Side conditions

As illustrated by (1.3) solutions to differential equations are not unique unless we
impose extra conditions. For parabolic equations it is customary to specify an
initial condition, i.e. to specify the solution at some initial time, usually t = 0:

u(0, x) = u_0(x)                                             (1.5)

where u_0(x) is a given function. If this is supposed to hold for −∞ < x < ∞ then
we speak of an initial value problem (IVP). More typical in practical situations,
however, is to specify the initial value on a finite interval X_1 ≤ x ≤ X_2 in which
case we must also specify boundary conditions for x = X_1 and x = X_2. On the
left boundary such a boundary condition may be of the general form

α_1 u(t, X_1) − β_1 u_x(t, X_1) = γ_1,   t > 0               (1.6)

where α_1, β_1, and γ_1 may depend on t. If β_1 = 0 we speak of a Dirichlet condition
(J.P.G.L. Dirichlet, 1805 – 1859). If α_1 = 0 we speak of a Neumann condition
(Carl Neumann, 1832 – 1925). If both α_1 and β_1 are different from 0 we speak
of a boundary condition of the third kind or a Robin condition (Gustave Robin,
1855 – 1897). On the right boundary we have a similar condition:

α_2 u(t, X_2) + β_2 u_x(t, X_2) = γ_2,   t > 0.              (1.7)

In this case we speak of an initial-boundary value problem (IBVP).
The choice of signs in (1.6) and (1.7) may seem a bit strange at first sight. If β_1
and β_2 are non-zero then we can assume without loss of generality that they are
equal to 1. In this case positive values for α_1 and α_2 will ensure that the solutions
of the differential equation are not exponentially growing. We refer the reader
to [7], [18], and [25] for a further discussion of the effect of derivative boundary
conditions on the qualitative nature of the solutions, but here it shall be our
general assumption that α_1, α_2, β_1, and β_2 are all non-negative.

1.5 Well-posed and ill-posed problems

If in an IVP we change the initial function by a small amount we hope and
expect that the effect on the solution function at some later time will also be
small. This is indeed the case for the simple heat equation and is related to
the fact that a maximum principle (cf. section 5.2) exists for this equation. We
say that the problem is well-posed and we shall discuss this property in more
detail in Chapter 2. If on the other hand we would attempt to solve the heat
equation in the opposite time direction – or what amounts to the same – would
consider the equation u_t = −u_xx, then arbitrarily small changes in the initial
condition (e.g. with large values of ω) would imply large deviations in the solution
at later times. We say that this problem is ill-posed. Such problems are not
well suited as mathematical models and attempts to solve them numerically are
doomed to disaster. The reason for this is that when we attempt to solve a
differential equation numerically we invariably introduce small (rounding) errors,
typically with large frequencies (ω) already in the first time step. In an ill-posed
problem these perturbations will be amplified in the following time steps thereby
distorting the solution function beyond recognition. We shall discuss this further
in Chapters 2 and 14.

1.6 Two test problems

The following two test problems will be used extensively in examples and exercises
to demonstrate the behaviour of various techniques and methods.

Problem 1

u_t = u_xx,                          −1 ≤ x ≤ 1,  t > 0,
u(0, x) = u_0(x) = cos x,            −1 ≤ x ≤ 1,
u(t, −1) = u(t, 1) = e^{−t} cos 1,   t > 0.

This is an IBVP with boundary conditions of Dirichlet type. It is easily seen
from formula (1.3) that the true solution is u(t, x) = e^{−t} cos x.

Problem 2

u_t = u_xx,   −1 ≤ x ≤ 1,  t > 0,
u(0, x) = u_0(x) =   1  for |x| < 1/2,
                     0  for |x| = 1/2,
                    −1  for |x| > 1/2.

We now have a discontinuous initial condition. The solution will, however, be
continuous and infinitely often differentiable for t > 0. The true solution can be
found by taking the Fourier cosine series for the initial function and appending
the corresponding exponential factors as given by formula (1.3):

u(t, x) = (4/π) Σ_{j=0}^{∞} (−1)^j [cos((2j+1)πx)/(2j+1)] e^{−(2j+1)²π²t}.   (1.8)

If we take the boundary values at x = −1 and x = 1 from this series we obtain a
Dirichlet problem.
Note that we only need rather few terms in the infinite sum in order to achieve
any specific finite accuracy when t > 0 (cf. Exercise 2).
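
The exponential factors in (1.8) decay very rapidly for t > 0, so the series can
be summed numerically with a simple truncation rule. A minimal sketch
(assuming numpy; the tolerance, term limit, and stopping criterion are our own
choices), which may also serve as a starting point for Exercise 2:

    import numpy as np

    def u_series(t, x, tol=1e-12, max_terms=500):
        # Evaluate the Fourier series (1.8) for Problem 2, stopping once the
        # exponential factor makes all further terms negligible.
        s = 0.0
        for j in range(max_terms):
            n = 2 * j + 1
            s += (-1)**j * np.cos(n * np.pi * x) / n * np.exp(-n**2 * np.pi**2 * t)
            if np.exp(-n**2 * np.pi**2 * t) < tol:   # crude remainder bound
                break
        return 4.0 / np.pi * s

    print(u_series(1.0 / 200, 0.25))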

1.7 Difference operators

In most practically occurring cases an analytical solution to an IBVP for a
parabolic equation cannot be obtained and we must resort to numerical tech-
niques.
We choose a step size in the t-direction, k, and a step size in the x-direction, h,
usually chosen as h = (X_2 − X_1)/M for some integer M, and we wish to find
approximations

v_m^n = v(nk, X_1 + mh)

to the true solution u(t, x) at all grid points (cf. Fig. 1.1)

(t, x) = (nk, X_1 + mh),   m = 0, 1, . . . , M,   n = 1, 2, . . . , N

where T = Nk is the maximum time. We do this by approximating the partial
derivatives with difference quotients.

[Figure 1.1: The (t, x) grid. The x-axis runs from X_1 to X_2 in steps of h.]

We first introduce the shift operator in the x-direction, E:

E v_m^n = v_{m+1}^n

and the mean value operator, µ̃:

µ̃ = (E^{1/2} + E^{−1/2})/2.

Remark. In the literature the mean value operator usually appears without the
tilde. We have introduced the tilde here to avoid confusion with another µ to be
introduced in the next section. ✷
We now have three different approximations to D, the operator which denotes
the partial derivative w.r.t. x:

Forward difference:   ∆ = (E − 1)/h,
Backward difference:  ∇ = (1 − E^{−1})/h,
Central difference:   δ = (E^{1/2} − E^{−1/2})/h.

δ and µ̃ refer to half-way points and are most useful in combinations such as:

µ̃δ = (E − E^{−1})/(2h),
δ² = (E − 2 + E^{−1})/h².

The former is another approximation to D, the latter is an approximation to D²,
the second partial derivative w.r.t. x.
To approximate D_t = ∂/∂t we have a choice between

∆_t = (E_t − 1)/k,
∇_t = (1 − E_t^{−1})/k,

and

µ̃_t δ_t = (E_t − E_t^{−1})/(2k)

where the shift operator in the t-direction is

E_t v_m^n = v_m^{n+1}.
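
On a grid function stored as an array, these difference operators are one-liners.
A small sketch (our own illustration, assuming numpy) applies them to
u(x) = cos x and confirms that µ̃δ and δ² approximate the first and second
derivative to second order:

    import numpy as np

    h = 0.1
    x = np.arange(-1.0, 1.0 + h / 2, h)
    u = np.cos(x)                                      # a smooth grid function

    forward  = (u[2:] - u[1:-1]) / h                   # Delta u_m
    backward = (u[1:-1] - u[:-2]) / h                  # Nabla u_m
    central  = (u[2:] - u[:-2]) / (2 * h)              # (mu~ delta) u_m
    second   = (u[2:] - 2 * u[1:-1] + u[:-2]) / h**2   # delta^2 u_m

    print(np.max(np.abs(central - (-np.sin(x[1:-1])))))   # O(h^2) error
    print(np.max(np.abs(second  - (-np.cos(x[1:-1])))))   # O(h^2) error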

1.8 Difference schemes

1.8.1 The explicit method

To approximate the heat equation

u_t − b u_xx = 0                                             (1.9)

we can thus suggest

∆_t v_m^n − b δ² v_m^n = 0                                   (1.10)

or written out

(v_m^{n+1} − v_m^n)/k = b (v_{m+1}^n − 2v_m^n + v_{m−1}^n)/h².   (1.11)

Introducing the step ratio

µ = k/h²                                                     (1.12)

this equation can be rewritten as

v_m^{n+1} = v_m^n + bµ(v_{m+1}^n − 2v_m^n + v_{m−1}^n)       (1.13)
and thus provides a value for v_m^{n+1} explicitly from values at the previous time
level. The method is therefore called the explicit method. In Fig. 1.1 we have
marked the stencil for the explicit method, i.e. the geometric pattern describing
the points which enter into the basic formula (1.11). If values for the solution
function are given at the initial time level, v_m^0, m = 0, 1, . . . , M, then (1.13)
can be used to provide values for v_m^1 for m = 1, 2, . . . , M − 1, i.e. for all
internal points at time level 1. If Dirichlet boundary values are given for x = X_1
and x = X_2 then we have values for v_m^1 also for m = 0 and m = M. We now
have a complete set of values at time level 1 and can proceed from here to time
level 2, 3, etc.
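
A complete time-stepping loop for the explicit method takes only a few lines.
The following sketch (a minimal illustration, assuming numpy and b = 1) applies
(1.13) to Problem 1 and prints the max-norm error at t = 0.5; the choice µ = 0.5
respects the stability bound 2bµ ≤ 1 derived in Chapter 2:

    import numpy as np

    b, h, mu = 1.0, 0.1, 0.5            # mu = k/h^2; need 2*b*mu <= 1 (Ch. 2)
    k = mu * h**2
    x = np.arange(-1.0, 1.0 + h / 2, h)
    v = np.cos(x)                        # initial condition u_0(x) = cos x

    n_steps = round(0.5 / k)
    for n in range(1, n_steps + 1):
        t = n * k
        v[1:-1] += b * mu * (v[2:] - 2 * v[1:-1] + v[:-2])
        v[0] = v[-1] = np.exp(-t) * np.cos(1.0)   # Dirichlet boundary values

    print(np.max(np.abs(v - np.exp(-t) * np.cos(x))))   # max-norm error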

1.8.2 The implicit method

Another numerical formula for (1.9) is

∇_t v_m^{n+1} − b δ² v_m^{n+1} = 0                           (1.14)

or written out

(v_m^{n+1} − v_m^n)/k − b (v_{m+1}^{n+1} − 2v_m^{n+1} + v_{m−1}^{n+1})/h² = 0   (1.15)

or

v_m^{n+1} − bµ(v_{m+1}^{n+1} − 2v_m^{n+1} + v_{m−1}^{n+1}) = v_m^n.   (1.16)

This formula expresses an implicit relation between three neighbouring function
values at the advanced time level and is therefore generally known as the implicit
method [19] although strictly speaking it is not a method until we have specified
a technique for solving the resulting tridiagonal set of linear equations. We shall
do this in the next section.

1.8.3 Crank-Nicolson

A third important formula is Crank-Nicolson [8]:

∆_t v_m^n − (b/2)(δ² v_m^n + δ² v_m^{n+1}) = 0.              (1.17)

Remark. Another way of writing Crank-Nicolson which better expresses the
symmetric nature of the formula is

(δ_t − b µ̃_t δ²) v_m^{n+1/2} = 0. ✷

1.8.4 The general θ-method

The three formulae above are special cases of the general θ-method:

∆_t v_m^n − b((1 − θ)δ² v_m^n + θ δ² v_m^{n+1}) = 0          (1.18)

corresponding to θ = 0, 1, and 1/2, respectively. The linear equations in the
general case look like

v_m^{n+1} − θbµ(v_{m+1}^{n+1} − 2v_m^{n+1} + v_{m−1}^{n+1}) =            (1.19)
      v_m^n + (1 − θ)bµ(v_{m+1}^n − 2v_m^n + v_{m−1}^n).

Table 1.1: Stencils for the methods

Method            Stencil
Explicit            •
                  • • •
Implicit          • • •
                    •
Crank-Nicolson    • • •
                  • • •
Richardson          •
                  • • •
                    •
DuFort-Frankel      •
                  •   •
                    •

In Table 1.1 we have given the stencils for the methods we have just defined
together with two more which we shall discuss in Chapter 7.

1.8.5 The operators P, P_{k,h} and R_{k,h}

For the general parabolic equation

P u ≡ u_t − b u_xx + a u_x − κu = ν

the θ-method approximates the differential operator P by the difference operator
P_{k,h}:

P_{k,h} v_m^n = ∆_t v_m^n − ((1 − θ)I + θE_t)(b δ² − a µ̃δ + κ) v_m^n

where I denotes the identity, and for the right-hand-side we suggest

R_{k,h} ν_m^n = ((1 − θ)I + θE_t) ν_m^n.

So the differential equation

P u = ν                                                      (1.20)
is approximated by the difference scheme

P_{k,h} v = R_{k,h} ν.                                       (1.21)

We mention in passing that there are other options for approximating the u_x-
term. We shall return to these in Chapter 5.
Since the operator R_{k,h} is an approximation to the identity it has a well-defined
inverse R_{k,h}^{−1} and if we apply this to (1.21) we get

R_{k,h}^{−1} P_{k,h} v = ν.                                  (1.22)

Comparing (1.20) and (1.22) we see that

R_{k,h}^{−1} P_{k,h} ≈ P

or, since we really don’t want to work with the inverse:

P_{k,h} − R_{k,h} P ≈ 0.

We shall return to this operator in section 1.13 and Chapter 3.

1.9 Two-step schemes

All the above-mentioned difference schemes are examples of one-step schemes in
the sense that they span one time step. They take data from one time level (n)
in order to compute values for the succeeding time level (n + 1). As an example
of a two-step scheme for (1.9) we mention

µ̃_t δ_t v_m^n = b δ² v_m^n

or

(v_m^{n+1} − v_m^{n−1})/(2k) = b (v_{m+1}^n − 2v_m^n + v_{m−1}^n)/h².   (1.23)

This scheme spans two time steps taking data from both time level n and n − 1
in order to compute values at time level n + 1. We shall return to this scheme in
Chapter 7 and here just mention that two-step methods require a special starting
procedure since initial values are usually only specified at one time level (n = 0).

1.10 Error norms
The difference between the true solution u(t, x) and the numerical solution v_m^n
(t = nk, x = X_1 + mh) is the error. It is only defined at the points where we
have a numerical solution, i.e. at the grid points as defined by the step sizes k
and h, although we shall find ways to extend the error function as a differentiable
function between the grid points in Chapter 9.
To study the behaviour of the error as a function of the step sizes k and h (and
the time t) we shall use various norms. It may be important for the user that
the error at any specific (grid-)point does not exceed a given tolerance. The user
will therefore be interested in the max-norm (or sup-norm or ∞-norm) at time
t = nk:

e_∞(t) = ||u^n − v^n||_∞ = max_{0≤m≤M} |u(nk, X_1 + mh) − v_m^n|.   (1.24)

The max-norm is rather difficult to analyze mathematically and we shall therefore
often study the 2-norm:

e_2(t) = ||u^n − v^n||_2 = [h Σ_{m=0}^{M} (u(nk, X_1 + mh) − v_m^n)²]^{1/2}.   (1.25)

Remark. Notice the difference from the usual vector 2-norm in that we in (1.25)
have a factor h = (X_2 − X_1)/M in front of the summation. This factor originates
from the Fourier transform (cf. Chapter 2) but comes in handy because we shall
wish to compare the errors corresponding to different values of h, and therefore
the norms from vector spaces of different dimensions. ✷
Remark. The max-norm and the 2-norm of the error will usually exhibit the
same behaviour when the solution is a smooth function in the closed region
{0 ≤ t ≤ T, X_1 ≤ x ≤ X_2} but they will often differ considerably when there is
a discontinuity in the initial function. ✷
Remark. When we have Dirichlet boundary conditions the errors for m = 0 and
m = M are 0, and the max and the summation only apply to the internal grid
points. ✷
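
Both norms are immediate to compute from a grid function; a minimal sketch
(assuming numpy; the function name is our own):

    import numpy as np

    def error_norms(u_exact, v, h):
        # Max-norm (1.24) and discrete 2-norm (1.25) at one time level.
        e = u_exact - v
        e_max = np.max(np.abs(e))
        e_2 = np.sqrt(h * np.sum(e**2))   # note the factor h in front of the sum
        return e_max, e_2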

1.11 Tridiagonal systems of equations

When θ > 0 the formula (1.18) is implicit. For each internal point (i.e. m = 1, 2,
. . . , M − 1) we have a linear expression involving the unknowns v_{m−1}^{n+1},
v_m^{n+1}, and v_{m+1}^{n+1}. We have thus M − 1 equations in the M + 1
unknowns v_m^{n+1}, m = 0, 1, . . . , M. If Dirichlet boundary values are specified
then v_0^{n+1} and v_M^{n+1} are given and we are left with M − 1 equations in
M − 1 unknowns (or M + 1 equations where the first and the last are trivial).
Systems of linear equations are often solved using Gaussian elimination. For a
general set of M − 1 equations in M − 1 unknowns the computational cost is
about M³/3 additions and multiplications, but the coefficient matrix of systems
resulting from using the θ-method is tridiagonal, i.e. the only non-zero coefficients
appear in the diagonal and the immediate neighbours above and below, and this
implies a considerable reduction in computing time.
The general equation is written

a_m v_{m−1}^{n+1} + b_m v_m^{n+1} + c_m v_{m+1}^{n+1} = d_m,   m = 1, 2, . . . , M − 1.   (1.26)

Example. For the implicit method on u_t = b u_xx we have

a_m = c_m = −bµ,   b_m = 1 + 2bµ,   d_m = v_m^n. ✷

In the first equation (for m = 1) the value of v_0^{n+1} on the left-hand-side is known
from the Dirichlet boundary condition. We can therefore move the corresponding
term to the right-hand-side:

d′_1 = d_1 − a_1 v_0^{n+1}

and similarly for v_M^{n+1} in the last equation.
The system of equations now looks like

    [ b_1  c_1                        ] [ v_1^{n+1}     ]   [ d′_1     ]
    [ a_2  b_2  c_2                   ] [ v_2^{n+1}     ]   [ d_2      ]
    [      .    .    .                ] [ ...           ] = [ ...      ]   (1.27)
    [      a_{M−2}  b_{M−2}  c_{M−2}  ] [ v_{M−2}^{n+1} ]   [ d_{M−2}  ]
    [               a_{M−1}  b_{M−1}  ] [ v_{M−1}^{n+1} ]   [ d′_{M−1} ]

Using Gaussian elimination we zero out the a_m thereby modifying the b_m and
the d_m:

z = a_m/b′_{m−1};   b′_m = b_m − z c_{m−1};   d′_m = d_m − z d′_{m−1}.   (1.28)

Starting with b′_1 = b_1 and executing (1.28) for m = 2, 3, . . . , M − 1 we end up
with a triangular set of equations which can be solved from the bottom up (the
back substitution):

v_{M−1}^{n+1} = d′_{M−1}/b′_{M−1},   v_m^{n+1} = (d′_m − c_m v_{m+1}^{n+1})/b′_m,   m = M − 2, . . . , 1.   (1.29)

The process will break down if any of the calculated b′_m become equal to 0 and
will be numerically unstable if they come close to 0, but this cannot happen for
the systems we consider here.
Example. For the implicit method on u_t = b u_xx we have

b′_2 = b_2 − (a_2/b_1)c_1 = 1 + 2bµ − (bµ)²/(1 + 2bµ) ≥ 1 + 2bµ − (bµ)²/(2bµ) = 1 + (3/2)bµ ≥ 1 + bµ,

b′_3 = b_3 − (a_3/b′_2)c_2 = 1 + 2bµ − (bµ)²/b′_2 ≥ 1 + 2bµ − (bµ)²/(1 + bµ) ≥ 1 + bµ.

By induction we can show that b′_m ≥ 1 + bµ for m > 3.
The result for the general θ-method is the topic of Exercise 13. ✷
Remark. The process will break down on u_t = b u_xx + κu if κk = 1 + 2bµ, so we
must be careful when κ > 0. ✷
Altogether the computational cost of solving the system is roughly
3M additions, 3M multiplications, and 2M divisions
for a total of 8M simple arithmetic operations (SAO).
The computational cost of advancing the solution one time step with the implicit
formula is therefore linear in the number of unknowns and actually comparable
to the cost of using the explicit method which is roughly 5M SAO for a trivial
set of equations with a complicated right-hand-side.
For the general θ-method, 0 < θ < 1, (and in particular θ = 1/2) we have both
a system to solve and a complicated right-hand-side so the cost amounts to 13M
SAO per time step. We shall see later that Crank-Nicolson is well worth this
extra cost.
The main observation is that the computational cost for all these schemes,
whether explicit or implicit, is linear in the number of grid points.
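
The elimination (1.28) and back substitution (1.29) translate directly into code.
Below is a sketch (assuming numpy; the function name and array layout are our
own choices), together with its use for one implicit time step (1.16) where the
boundary contributions are assumed already moved into the right-hand side:

    import numpy as np

    def solve_tridiagonal(a, b, c, d):
        # Solve the tridiagonal system (1.27) by elimination (1.28) and back
        # substitution (1.29). All arrays have the same length; a[0] and c[-1]
        # are not referenced. O(M) operations.
        n = len(b)
        bp, dp = b.copy(), d.copy()
        for m in range(1, n):
            z = a[m] / bp[m - 1]
            bp[m] = b[m] - z * c[m - 1]
            dp[m] = dp[m] - z * dp[m - 1]
        v = np.empty(n)
        v[-1] = dp[-1] / bp[-1]
        for m in range(n - 2, -1, -1):
            v[m] = (dp[m] - c[m] * v[m + 1]) / bp[m]
        return v

    # One implicit step on the M-1 internal points (cf. the Example above):
    b_coef, mu, M = 1.0, 0.5, 20
    lower = np.full(M - 1, -b_coef * mu)
    upper = np.full(M - 1, -b_coef * mu)
    diag  = np.full(M - 1, 1 + 2 * b_coef * mu)
    v_old = np.cos(np.linspace(-0.9, 0.9, M - 1))   # previous time level
    v_new = solve_tridiagonal(lower, diag, upper, v_old)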

1.12 Almost tridiagonal systems

In some cases we encounter systems of equations which are not completely tridi-
agonal but still easy to solve. This happens for example in connection with
derivative boundary conditions to be discussed in Chapter 4.

A typical system may look like

    [ b_0  c_0  e_0                   ] [ v_0^{n+1}     ]   [ d_0     ]
    [ a_1  b_1  c_1                   ] [ v_1^{n+1}     ]   [ d_1     ]
    [      .    .    .                ] [ ...           ] = [ ...     ]   (1.30)
    [      a_{M−1}  b_{M−1}  c_{M−1}  ] [ v_{M−1}^{n+1} ]   [ d_{M−1} ]
    [      f_M      a_M      b_M      ] [ v_M^{n+1}     ]   [ d_M     ]

Using Gaussian elimination we notice that when zeroing out a_1 we must also
change c_1 using

c′_1 = c_1 − z e_0   where   z = a_1/b_0.

Likewise before zeroing out a_M we must eliminate f_M using equation M − 2
which causes a change in a_M (and d_M):

a′_M = a_M − z c_{M−2}   where   z = f_M/b′_{M−2}.

The back substitution is performed as in the tridiagonal case except for the last
step (equation 0) where an extra term involving e_0 appears:

v_0^{n+1} = (d_0 − c_0 v_1^{n+1} − e_0 v_2^{n+1})/b_0.

These extra calculations will complicate the programming slightly; but they do
not affect the overall computational cost which is still 8M SAO for the implicit
method and 13M for the general θ-method.
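
Spelled out in code, the extra calculations wrap around the ordinary tridiagonal
sweep. A sketch under the same assumptions as the solver above (numpy, our
own naming, M ≥ 3):

    import numpy as np

    def solve_almost_tridiagonal(a, b, c, d, e0, fM):
        # Solve system (1.30) with M+1 unknowns v_0 .. v_M. The arrays have
        # length M+1; a[0] and c[M] are not referenced. Requires M >= 3.
        M = len(b) - 1
        bp = np.array(b, dtype=float)
        cp = np.array(c, dtype=float)
        dp = np.array(d, dtype=float)
        # Zeroing a_1 with row 0 also changes c_1 (through e_0).
        z = a[1] / bp[0]
        bp[1] -= z * cp[0]
        cp[1] -= z * e0
        dp[1] -= z * dp[0]
        # Rows 2 .. M-1: the ordinary tridiagonal sweep (1.28).
        for m in range(2, M):
            z = a[m] / bp[m - 1]
            bp[m] -= z * cp[m - 1]
            dp[m] -= z * dp[m - 1]
        # Row M: eliminate f_M with row M-2, then the modified a_M with row M-1.
        z = fM / bp[M - 2]
        aM = a[M] - z * cp[M - 2]
        dp[M] -= z * dp[M - 2]
        z = aM / bp[M - 1]
        bp[M] -= z * cp[M - 1]
        dp[M] -= z * dp[M - 1]
        # Back substitution; the last step (row 0) has the extra e_0 term.
        v = np.empty(M + 1)
        v[M] = dp[M] / bp[M]
        for m in range(M - 1, 0, -1):
            v[m] = (dp[m] - cp[m] * v[m + 1]) / bp[m]
        v[0] = (dp[0] - cp[0] * v[1] - e0 * v[2]) / bp[0]
        return v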

1.13 Convergence
The calculated values v_m^n are supposed to be approximations to the true solution
function u(t, x) at t = nk and x = X_1 + mh. As we make the step sizes k and
h smaller we should like these approximations to become better and better. But
this will not always be the case.
Using Taylor series it is fairly easy to verify that our various difference expres-
sions become better and better approximations to the partial derivatives as the
step sizes become smaller. But at each time level we base our computations on
previously computed values with inherent errors. And as h and k become smaller
we must take more steps to get to a specific point (t, x).
And it is not obvious – and indeed not always the case – that the net effect is a
better approximation. But for a numerical method to be useful this must be the
case. This property is captured in the definition of convergence:

Definition. A difference scheme is called convergent if when applied to a well-
posed IBVP it produces approximations v_{h,k}(t, x) which converge to the true
solution u(t, x) for any point (t, x) in the region when h → 0, k → 0. ✷
We shall of course assume that the side conditions are applied correctly. It is
not required that v_{h,k}(0, x) coincides exactly with u(0, x) for all h and k but
only that lim v_{h,k}(0, x) = u(0, x) where the limit is for h → 0, k → 0 (and
similarly for boundary conditions).
It is often a difficult task to prove that a numerical scheme converges for any
well-posed IBVP. Luckily it is not necessary either. An important theorem by
Peter Lax [21], [31] breaks the task into two much easier ones:

• to show that the numerical scheme is consistent with the differential equa-
tion and
• to show that the numerical scheme is stable i.e. that errors are not amplified
(too much).

Definition. A difference scheme P_{k,h} v = R_{k,h} ν is consistent with a differential
equation P u = ν iff

P_{k,h} ψ − R_{k,h} P ψ → 0   as   h → 0, k → 0

for all smooth functions ψ. ✷
Since the solution to a parabolic equation is infinitely often differentiable for t > 0
it is not unreasonable to invoke smooth functions here. Since we prefer our results
to have wide applicability we shall require the above convergence to 0 to happen
for any smooth function ψ and not just for particular solution functions u(t, x)
where cancellations may occur and secure a convergence which is not generally
available. We shall return to discuss the concept of consistency in Chapter 3.
Definition. A one-step difference scheme P_{k,h} v = 0 is stable iff

∀T ∃C_T :   ||v^n||_2 ≤ C_T ||v^0||_2,   nk ≤ T              (1.31)

for h ≤ h_0 and k ≤ k_0. ✷
We note that the inhomogeneous term does not play any role for the question of
stability, and neither does the differential operator. Since the difference between
two solutions to the difference scheme also satisfies the homogeneous scheme, the
stability condition implies that errors do not grow without bound.
Remark. If (1.31) is satisfied for all h ≤ h_0 and k ≤ k_0 then we talk of uncondi-
tional stability. In many cases (1.31) is only satisfied in a subset, e.g. character-
ized by k ≤ h²/2. For a method to be useful there must be a path in this subset
leading all the way from (h_0, k_0) to (0, 0). In such a case we talk of conditional
stability.

1.14 Exercises
1. Solve problem 1 on page 5 with the explicit method (1.13) from t = 0 to
   t = 0.5 with h = 1/10, 1/20, and 1/40, and with µ (= k/h²) = 0.5.
   Compute the max-norm and the 2-norm of the error for
   t = 0.1, 0.2, 0.3, 0.4, 0.5.

2. How many terms are needed in the sum (1.8) of problem 2 to make the
   remainder less than 10⁻⁶ when t = 1/200?
   And when t = 1/3200?

3. Solve problem 2 with the explicit method from t = 0 to t = 0.5 with
   h = 1/10 and k = 1/200.
   Draw the true solution and the numerical solution as functions of x for
   t = 0.005, 0.01, 0.015, 0.02, 0.025.
   Draw the error as a function of x for t = 0.005, 0.01, . . . , 0.025.
   Draw the error as a function of t for x = 0, 0.1, 0.2, 0.3, 0.4, 0.5.

4. Solve problem 2 with the explicit method from t = 0 to t = 0.5 with
   h = 1/10, 1/20, and 1/40, and with µ (= k/h²) = 0.5.
   Compute the max-norm and the 2-norm of the error for
   t = 0.1, 0.2, 0.3, 0.4, 0.5.

5. Solve problem 1 with the implicit method (1.15) from t = 0 to t = 0.5 with
   h = 1/10, 1/20, and 1/40, and with µ (= k/h²) = 0.5.
   Compute the max-norm and the 2-norm of the error for
   t = 0.1, 0.2, 0.3, 0.4, 0.5.

6. Solve problem 2 with the implicit method from t = 0 to t = 0.5 with
   h = 1/10 and k = 1/200.
   Draw the error as a function of x for t = 0.005, 0.01, . . . , 0.025.
   Draw the error as a function of t for x = 0, 0.1, 0.2, 0.3, 0.4, 0.5.

7. Solve problem 2 with the implicit method from t = 0 to t = 0.5 with
   h = 1/10, 1/20, and 1/40, and with µ (= k/h²) = 0.5.
   Compute the max-norm and the 2-norm of the error for
   t = 0.1, 0.2, 0.3, 0.4, 0.5.

8. Solve problem 1 with Crank-Nicolson (1.17) from t = 0 to t = 0.5 with
   h = 1/10, 1/20, and 1/40, and with µ (= k/h²) = 0.5.
   Compute the max-norm and the 2-norm of the error for
   t = 0.1, 0.2, 0.3, 0.4, 0.5.

9. Solve problem 2 with Crank-Nicolson from t = 0 to t = 0.5 with h = 1/10
   and k = 1/200.
   Draw the error as a function of x for t = 0.005, 0.01, . . . , 0.025.
   Draw the error as a function of t for x = 0, 0.1, 0.2, 0.3, 0.4, 0.5.

10. Solve problem 2 with Crank-Nicolson from t = 0 to t = 0.5 with
    h = 1/10, 1/20, and 1/40, and with µ (= k/h²) = 0.5.
    Compute the max-norm and the 2-norm of the error for
    t = 0.1, 0.2, 0.3, 0.4, 0.5.

11. Solve problem 1 with the explicit method from t = 0 to t = 0.5 with
    h = k = 1/10.
    Draw the error as a function of x for t = 0.1, 0.2, 0.3, 0.4, 0.5.
    Draw the error as a function of t for x = 0, 0.1, 0.2, 0.3, 0.4, 0.5.

12. Solve problems 1 and 2 with Crank-Nicolson and the implicit method from
    t = 0 to t = 0.5 with h = k = 1/10, 1/20, and 1/40.
    Compute the max-norm and the 2-norm of the error for
    t = 0.1, 0.2, 0.3, 0.4, 0.5.

13. Show that b′_m ≥ 1 in the tridiagonal system of equations (cf. section 1.11)
    which arises when we use the general θ-method (1.18) on u_t = b u_xx.

Chapter 2

Stability

2.1 Fourier analysis

A very important tool in the study of stability of difference schemes as well
as the behaviour of differential equations is Fourier analysis. We begin with the
continuous case. If u(x) is a real function defined on the real line then the Fourier
transform of u is

û(ω) = (1/√(2π)) ∫_{−∞}^{∞} e^{−iωx} u(x) dx.                (2.1)

It is possible to recreate u(x) from û(ω) by the inversion formula

u(x) = (1/√(2π)) ∫_{−∞}^{∞} e^{iωx} û(ω) dω.                 (2.2)

û(ω) is a function of a real variable ω but it may assume complex values. From
the inversion formula we may deduce that û(ω) is uniquely determined by u(x)
and vice versa. û(ω) is just an alternate representation of u(x) just like a Fourier
series is an alternate representation of a periodic function.
If v is a grid function, i.e. v_m is defined for all integers m, then we define the
discrete transform

v̂(ξ) = (1/√(2π)) Σ_{m=−∞}^{∞} e^{−imξ} v_m

where ξ ∈ [−π, π]. The inversion formula reads

v_m = (1/√(2π)) ∫_{−π}^{π} e^{imξ} v̂(ξ) dξ.

The more useful case is where the distance between grid points is h. We change
variable and define

v̂(ξ) = (1/√(2π)) Σ_{m=−∞}^{∞} e^{−imhξ} v_m h               (2.3)

where hξ ∈ [−π, π]. The inversion formula now reads

v_m = (1/√(2π)) ∫_{−π/h}^{π/h} e^{imhξ} v̂(ξ) dξ.             (2.4)

The L_2-norm of u(x) is defined by

||u||_2 = [∫_{−∞}^{∞} |u(x)|² dx]^{1/2}                      (2.5)

and is equal to the L_2-norm of û:

||û||_2 = [∫_{−∞}^{∞} |û(ω)|² dω]^{1/2}.                     (2.6)

This relation which is named after Parseval (Marc-Antoine Parseval des Chênes,
1755 – 1836) is proved by the following calculations, where conj(·) denotes
complex conjugation:

||u||_2² = ∫_{−∞}^{∞} u(x) conj(u(x)) dx
         = ∫_{−∞}^{∞} u(x) conj[(1/√(2π)) ∫_{−∞}^{∞} e^{iωx} û(ω) dω] dx
         = ∫_{−∞}^{∞} u(x) (1/√(2π)) ∫_{−∞}^{∞} e^{−iωx} conj(û(ω)) dω dx
         = ∫_{−∞}^{∞} conj(û(ω)) (1/√(2π)) ∫_{−∞}^{∞} e^{−iωx} u(x) dx dω
         = ∫_{−∞}^{∞} conj(û(ω)) û(ω) dω = ||û||_2².

The crucial point of this derivation is the interchange of the order of integration.
This is allowed if and only if u (and û) are L2 -functions, and this is also the
condition for the Fourier transform to be well-defined.
The 2-norm of v_m was defined in section 1.10:

||v||_2 = [h Σ_{m=−∞}^{∞} |v_m|²]^{1/2}                      (2.7)

and also here we have a Parseval relation which states that

||v||_2 = ||v̂||_2.                                           (2.8)
Remark. Comparing (2.7) with the usual definition of the 2-norm we note an
extra factor h. This is introduced because we shall wish to compare grid functions
with different values of h with each other and with the continuous function which
they are supposed to be approximations to. ✷

2.2 Two examples

Example 1: The square. We should like to compute the Fourier transform of
the grid function defined by

v_m =  1    if |m| < M,
       1/2  if |m| = M,      where Mh = 1.
       0    if |m| > M,

For ξ ≠ 0 we have

v̂(ξ) = (1/√(2π)) Σ_{m=−∞}^{∞} e^{−imhξ} v_m h
      = (h/√(2π)) [ (e^{−iξ} + e^{iξ})/2 + Σ_{m=−M+1}^{M−1} e^{−imhξ} ]
      = (h/√(2π)) [ cos ξ + e^{i(M−1)hξ} (1 − e^{−i(2M−1)hξ})/(1 − e^{−ihξ}) ]
      = (h/√(2π)) [ cos ξ + sin(ξ − hξ/2)/sin(hξ/2) ]
      = (h/√(2π)) sin(ξ) cot(hξ/2).

For ξ = 0 we have

v̂(0) = (h/√(2π)) (1 + 2M − 1) = 2/√(2π)

which is also the limit of the first expression as ξ → 0.
In Fig. 2.1 we have to the left shown the Fourier transform of the square for
M = 11. The abscissa is hξ and we have only shown the interval 0 ≤ hξ ≤ π as
the function is symmetric around 0. We notice a substantial weight near 0. ✷


Figure 2.1: Fourier transform of square (left) and oscillation (right).

Example 2. The oscillation. We next consider an oscillating grid function

v_m = (−1)^m  if |m| < M,      where Mh = 1.
      0       if |m| ≥ M,

We now have

v̂(ξ) = (h/√(2π)) Σ_{m=−M+1}^{M−1} e^{−imhξ} e^{−imπ} = (h/√(2π)) Σ_{m=−M+1}^{M−1} e^{−im(π+hξ)}
      = (h/√(2π)) sin((M − 1/2)(π + hξ))/sin((π + hξ)/2)
      = ((−1)^{M−1} h/√(2π)) cos(ξ − hξ/2)/cos(hξ/2)
      = ((−1)^{M−1} h/√(2π)) (cos ξ + sin ξ tan(hξ/2)).

In Fig. 2.1 we have to the right shown the Fourier transform of the oscillation for
M = 11. The abscissa is hξ and we have only shown the interval 0 ≤ hξ ≤ π as
the function is symmetric around 0. We notice a substantial weight near π. ✷
From Example 1 we see that the Fourier transform of the discrete function which
is equal to 1 in an interval around 0 has a significant contribution for hξ close
to 0. The remaining wiggles originate from the sudden drop to 0 which must
happen somewhere for the function to have a finite norm.
From Example 2 we see that the Fourier transform of the oscillation has a signif-
icant contribution near hξ = π (and −π). We therefore talk about values around
hξ = π as corresponding to high-frequency components and values around hξ = 0
as corresponding to low-frequency components of the function in question.
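
The closed forms above are easy to confirm numerically. A small sketch
(assuming numpy; the values of M and ξ are arbitrary) compares the sum (2.3)
for the square with the expression (h/√(2π)) sin(ξ) cot(hξ/2) from Example 1:

    import numpy as np

    M = 11
    h = 1.0 / M                                # Mh = 1
    m = np.arange(-M, M + 1)
    v = np.where(np.abs(m) < M, 1.0, 0.5)      # 1 inside, 1/2 at |m| = M

    xi = 2.0                                   # any xi with 0 < h*xi <= pi
    vhat = h / np.sqrt(2 * np.pi) * np.sum(np.exp(-1j * m * h * xi) * v)
    closed = h / np.sqrt(2 * np.pi) * np.sin(xi) / np.tan(h * xi / 2)
    print(abs(vhat - closed))                  # agrees to rounding error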

2.3 Fourier analysis and differential equations

If we differentiate (2.2) w.r.t. x we get

du/dx = (1/√(2π)) ∫_{−∞}^{∞} e^{iωx} iω û(ω) dω.             (2.9)

From this and the uniqueness of the Fourier transform it follows that the Fourier
transform of the derivative is obtained by multiplication with iω:

(du/dx)ˆ(ω) = iω û(ω).                                       (2.10)

If we now consider functions, u(t, x), of two variables we can still perform the
Fourier transform in the x-variable. Applying this to the IVP for the simple heat
equation

u_t = b u_xx,   u(0, x) = u_0(x)                             (2.11)

we obtain

û_t(t, ω) = b(iω)² û(t, ω) = −bω² û(t, ω).                   (2.12)

Using the fact that the Fourier transform of the time derivative is the same as the
time derivative of the Fourier transform we have obtained an ordinary differential
equation in t for the Fourier transform of u with initial condition û(0, ω) = û_0(ω).
Since the solution to the IVP

y′ = −bω² y,   y(0) = y_0

is

y(t) = y_0 e^{−bω²t}

we deduce that

û(t, ω) = e^{−bω²t} û_0(ω).                                  (2.13)

Using Parseval we can now get a bound for the L_2-norm of u(t, ·) at an arbitrary
time t > 0:

||u(t, ·)||_2² = ∫_{−∞}^{∞} |u(t, x)|² dx = ∫_{−∞}^{∞} |û(t, ω)|² dω        (2.14)
              = ∫_{−∞}^{∞} e^{−2bω²t} |û_0(ω)|² dω ≤ ∫_{−∞}^{∞} |û_0(ω)|² dω = ||u_0||_2²

provided b > 0.
We note that for positive b we have a bound on the norm of the solution for
t > 0: the IVP is well-posed. On the other hand, if b < 0 (or t < 0) the problem
is ill-posed and we may expect unbounded growth. When û_0(ω) ≠ 0 for some
value of ω ≠ 0 then these components will be amplified with the factor e^{−bω²t}.
The higher the frequency ω the higher the amplification factor.

2.4 Von Neumann analysis

Consider the explicit method (1.11) on the simple heat equation u_t = b u_xx.
Solving for v_m^{n+1} we get

v_m^{n+1} = (1 − 2bµ)v_m^n + bµ(v_{m+1}^n + v_{m−1}^n).      (2.15)

We now take the Fourier inversion formula

v_m^n = (1/√(2π)) ∫_{−π/h}^{π/h} e^{imhξ} v̂^n(ξ) dξ          (2.16)

and apply it on the right-hand-side of (2.15) for m, m + 1, and m − 1 to get

v_m^{n+1} = (1/√(2π)) ∫_{−π/h}^{π/h} e^{imhξ} [1 − 2bµ + bµ(e^{ihξ} + e^{−ihξ})] v̂^n(ξ) dξ.   (2.17)

Using the uniqueness of the Fourier transform we deduce

v̂^{n+1}(ξ) = g(hξ) v̂^n(ξ)                                    (2.18)

where the amplification factor or growth factor, g(hξ), is given by the square
bracket above:

g(hξ) = 1 − 2bµ + bµ(e^{ihξ} + e^{−ihξ}) = 1 − 2bµ + 2bµ cos(hξ)
      = 1 − 4bµ sin²(hξ/2).                                  (2.19)

So when we advance the numerical solution from one time step to the next the
Fourier transform of the solution is multiplied by g(hξ). By induction we then
have

v̂^n(ξ) = (g(hξ))^n v̂^0(ξ).                                   (2.20)

Looking at the norms we have

||v^n||_2² = h Σ_m |v_m^n|² = ∫_{−π/h}^{π/h} |v̂^n(ξ)|² dξ
           = ∫_{−π/h}^{π/h} |g(hξ)|^{2n} |v̂^0(ξ)|² dξ.       (2.21)

For stability we would like ||v^n||_2 to be bounded in relation to ||v^0||_2. We see
that we can achieve this if |g(hξ)|^{2n} is suitably bounded. In the considerations
to follow, the variable, ξ, will usually appear together with the step size, h, so to
simplify the notation we introduce ϕ = hξ.
For the explicit method on u_t = b u_xx the growth factor now reads

g(ϕ) = 1 − 4bµ sin²(ϕ/2),   −π ≤ ϕ ≤ π.                      (2.22)

We note that g(ϕ) is always real and ≤ 1. If 2bµ ≤ 1 then 4bµ sin²(ϕ/2) ≤ 2 and
g(ϕ) ≥ −1. In this case |g(ϕ)| ≤ 1 and therefore |g(ϕ)|^{2n} ≤ 1 showing that we
have stability with C_T = 1 for all T:

||v^n||_2 ≤ ||v^0||_2.

If 2bµ > 1 then g(ϕ) < −1 in an interval around π. Therefore |g(ϕ)|^{2n} will grow
without bound in this interval. We can provide no bound for ||v^n||_2 and if v̂^0 ≠ 0
somewhere in this interval the corresponding components of the solution will be
magnified: we have instability.
We conclude that the explicit method (2.15) for the heat equation u_t = b u_xx is
stable iff 2bµ ≤ 1 or k ≤ h²/(2b). We also note that instability shows up first in
the high frequency components of the solution.
Remark. This technique for analyzing the stability of finite difference schemes
is named after John von Neumann (or Neumann János Lajos, 1903 – 1957). ✷
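
The stability boundary 2bµ = 1 is easy to see numerically. A brief sketch
(assuming numpy) evaluates (2.22) on a fine grid of ϕ-values for bµ below, at,
and above 1/2:

    import numpy as np

    phi = np.linspace(-np.pi, np.pi, 2001)
    for bmu in (0.4, 0.5, 0.6):
        g = 1 - 4 * bmu * np.sin(phi / 2)**2
        print(bmu, np.max(np.abs(g)))   # 1.0, 1.0, 1.4: unstable for 0.6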
It is not necessary to invoke the Fourier transform and equate integral expres-
sions every time we want to investigate the stability properties of a difference
scheme for a differential equation. Looking at equations (2.17) and (2.20) we
conclude that the essential features are captured if we replace v_m^n by g^n e^{imϕ}
in the difference scheme and this is also the way the technique was first presented
in [5]. This is sometimes interpreted in the way that we are seeking solutions to
the difference scheme on the form v_m^n = g^n e^{imϕ} with obvious parallels to the
solution by separation of variables for PDEs as discussed in section 1.3. This is
also fine for memorizing as long as we keep in mind that through the Fourier
transform we have the sound mathematical basis for our arguments.
Example. Consider now the explicit method on the equation u_t = b u_xx + κu.
The difference equation is

(v_m^{n+1} − v_m^n)/k = b (v_{m+1}^n − 2v_m^n + v_{m−1}^n)/h² + κ v_m^n.   (2.23)

Replacing v_m^n by g^n e^{imϕ} and dividing by g^n e^{imϕ} gives

(g − 1)/k = −(4b/h²) sin²(ϕ/2) + κ,   −π ≤ ϕ ≤ π

or

g(ϕ) = 1 − 4bµ sin²(ϕ/2) + κk,   −π ≤ ϕ ≤ π.                 (2.24)

If κ > 0 then we have g(ϕ) > 1 for small ϕ and so it seems that we may have
problems with stability for κ > 0 (and stricter bounds on bµ for κ < 0). But note
the following three points:

• u(t, x) = (α + βx)e^{κt} is a solution to the differential equation and if κ > 0
  then this solution exhibits exponential growth. We should expect the same
  from a good numerical solution.

• As k → 0 (which it does when we consider convergence) g will approach
  the usual value from when κu was not present in the equation.

• The condition |g(ϕ)| ≤ 1 actually gave more than we demanded. We wanted
  bounded growth and we got C_T = 1 for all T.

We may allow |g(ϕ)| > 1 under certain conditions as the following theorem shows:
Theorem. A one-step difference scheme is stable if

∃K, h0 , k0 such that |g(ϕ, h, k)| ≤ 1 + Kk, ∀ϕ, 0 < k ≤ k0 , 0 < h ≤ h0 .

Proof:

|g(ϕ, h, k)|2n ≤ (1 + Kk)2n < (eKk )2n = e2Knk = e2Kt ≤ e2KT = CT2

so for nk = t ≤ T the norm of v n is bounded by CT = eKT times the norm of the


initial function. ✷
Example. Now consider the explicit method on the equation ut = buxx − aux .
The difference equation is
n+1
vm n
− vm v n − 2vm
n n
+ vm−1 n
vm+1 n
− vm−1
= b m+1 − a . (2.25)
k h2 2h
n
Replacing vm by g n eimϕ and dividing by g n eimϕ gives

g−1 4b ϕ eiϕ − e−iϕ


= − 2 sin2 − a , −π ≤ ϕ ≤ π
k h 2 2h
or
ϕ k
g(ϕ) = 1 − 4bµ sin2 − ia sin ϕ, −π ≤ ϕ ≤ π. (2.26)
2 h
Now the growth factor is a complex number and we find the square of the absolute
value
ϕ 2 k2
|g(ϕ)|2 = (1 − 4bµ sin2 ) + a2 2 sin2 ϕ. (2.27)
2 h
If 2bµ ≤ 1 then the first parenthesis is ≤ 1 and we have

|g(ϕ)|2 ≤ 1 + a2 µk, −π ≤ ϕ ≤ π

and therefore
1
|g(ϕ)| ≤ 1 + a2 µk + O(k 2), −π ≤ ϕ ≤ π.
2
Once again the stability condition is the same as for the simple heat equation
(2bµ ≤ 1) irrespective of the lower order terms. ✷

26
2.5 Implicit methods

The implicit method for ut = buxx − aux + κu is


n+1
vm n
− vm v n+1 − 2vm
n+1 n+1
+ vm−1 n+1
vm+1 n+1
− vm−1
= b m+1 − a n+1
+ κvm . (2.28)
k h2 2h
n
Replacing vm by g n eimϕ and dividing by g n eimϕ gives

g−1 4b ϕ eiϕ − e−iϕ


= −g 2 sin2 − ag + κg
k h 2 2h
or
1
g(ϕ) = . (2.29)
1+ 4bµ sin2 ϕ2 + ia hk sin ϕ − κk

If κ ≤ 0 then already the real part of the denominator is ≥ 1 and therefore


|g(ϕ)| ≤ 1 irrespective of bµ and a. If κ > 0 then |g(ϕ)| may be greater than
1 for small values of ϕ but by no more than O(k) so we can conclude that the
implicit method is unconditionally stable. If a = κ = 0 then 0 ≤ g(ϕ) ≤ 1
and the smallest values are attained when ϕ ≈ π implying that high frequency
components are damped most.
For the Crank-Nicolson method the growth factor becomes

1 − 2bµ sin2 ϕ2 − ia 2h
k
sin ϕ + 21 κk
g(ϕ) = . (2.30)
1 + 2bµ sin2 ϕ2 + ia 2h
k
sin ϕ − 12 κk

Once again we have unconditional stability with |g(ϕ)| ≤ 1 in all cases except
possibly when κ > 0 in which case we still have |g(ϕ)| ≤ 1 + O(k). If a = κ = 0
then −1 ≤ g(ϕ) ≤ 1 and values close to −1 are attained when ϕ ≈ π implying
that high frequency components are damped little when bµ ≫ 1. So a solution
or error component which oscillates in the x-direction will also oscillate in the
t-direction with a slowly diminishing amplitude.

2.6 Two kinds of stability

As indicated in the theorem in section 2.4 a certain growth is allowed for a stable
difference scheme. This is in accordance with the definition of stability which
allows a growth in the norm with a factor CT times the norm of the initial
function. This kind of stability is what is needed in the Lax equivalence theorem
where stability is needed to ensure convergence as the step sizes tend to 0. This
kind of stability is sometimes called numerical stability [35] or pointwise stability

27
[25] or with a term borrowed from ODEs 0-stability. In practice we shall often be
concerned with problems with decaying solutions and we shall use a numerical
scheme with fixed positive step sizes and for many steps. In this case any kind of
growth is unwanted. We shall in these cases want the condition |g(ϕ)| ≤ 1 and
we speak of dynamic stability [35] or stepwise stability [25] or with a term from
ODEs absolute stability.

2.7 Exercise
1. Prove formula (2.30).

28
Chapter 3

Accuracy

In Chapter 1 we introduced the concept of consistency which together with sta-


bility implies convergence of the numerical solution to the true solution as the
step sizes tend to 0. In this chapter we shall also be concerned with how fast
this convergence is as a means to compare various difference schemes. In other
words we shall study the local truncation error: Pk,h ψ − Rk,h P ψ and how fast it
converges to 0 as the step sizes h and k tend to 0. We shall use Taylor expansions
in this study of the difference schemes.

3.1 Taylor expansions

If f (x) is a smooth function of x then it can be expanded in a Taylor series


1 1
f (x + h) = f (x) + hf ′ (x) + h2 f ′′ (x) + · · · + hp f (p) (x) + O(hp+1). (3.1)
2 p!
That f is smooth will in this case mean that f is p times continuously differen-
tiable in an interval around x. If ψ(t, x) is a smooth function of t and x, meaning
that it possesses continuous partial derivatives of a suitable high order, then it
likewise can be expanded in a Taylor series with two variables. We shall usually
refrain from doing so because less complicated expressions are produced, and
cancellations easier detected, when we expand first in one coordinate and later
in the other. The end result will be the same but the risk of committing errors
is greatly reduced this way. The basic expansions are the following

n+1 n 1 1
ψm = ψm + kψt + k 2 ψtt + · · · + k p ψpt + O(k p+1 ), (3.2)
2 p!
n n 1 1
ψm+1 = ψm + hψx + h2 ψxx + · · · + hq ψqx + O(hq+1 ). (3.3)
2 q!

29
The common expansion point for the function and all derivatives is indicated in
the leading term on the right-hand-side but otherwise omitted. The expansion
n
for ψm−1 is easily obtained from (3.3) by changing sign for all the odd terms. For
symmetric expressions we can exploit cancellations such as

n n n 2 2
ψm+1 + ψm−1 = 2ψm + h2 ψxx + · · · + hq ψqx + O(hq+2 ), (3.4)
2 q!
n n 2 2
ψm+1 − ψm−1 = 2hψx + h3 ψxxx + · · · + hq ψqx + O(hq+2 ), (3.5)
6 q!
where q is even and odd, respectively.

3.2 Order

We shall illustrate the use of Taylor expansions with the explicit method on the
simple heat equation ut − buxx = ν. The differential operator is P = Dt − bD2 ,
the difference operator is Pk,h = ∆t − bδ 2 , and the right-hand-side operator Rk,h
is the identity. Using Taylor expansions we get

n 1
∆t ψm = ψt + kψtt + O(k 2),
2
1
δ 2 ψm
n
= ψxx + h2 ψxxxx + O(h4 ),
12
and
1 1
Pk,h ψ − Rk,h P ψ = kψtt − bh2 ψxxxx + O(k 2 + h4 ) (3.6)
2 12
As the step sizes tend to 0 the whole expression on the right-hand-side tends to
0, so we immediately conclude that the explicit method is consistent. But the
expression also reveals the rate of convergence.
Definition. A difference scheme Pk,h v = Rk,h ν which is consistent with the
equation P u = ν is accurate of order p in time and order q in space iff

Pk,h ψ − Rk,h P ψ = O(k p ) + O(hq ) (3.7)

for smooth functions ψ. We say that the scheme is accurate of order (p, q). ✷
Remark. If p = q we say that the method is of order p. ✷
Using this definition we say that the explicit method on the simple heat equation
is accurate of order (1, 2) meaning that it is first order in time and second order
in space.

30
We have already seen that the explicit method requires 2bµ ≤ 1 or equivalently
k ≤ h2 /(2b) to be stable. It is therefore customary with the explicit method to
use a step size k which is proportional to h2 . Therefore the following
Definition. A difference scheme Pk,h v = Rk,h ν with k = Λ(h) is accurate of
order r iff

Pk,h ψ − Rk,h P ψ = O(hr ) (3.8)

for smooth functions ψ. ✷


If we use the explicit method with k = h2 /(2b) then it is accurate of order 2.
Example. For the implicit method on the simple heat equation we have Pk,h =
∇t − bδ 2 , where the evaluation point now is at the advanced time level, and

n 1
∇t ψm = ψt − kψtt + O(k 2 ) (3.9)
2
and therefore
1 1
Pk,h ψ − Rk,h P ψ = − kψtt − bh2 ψxxxx + O(k 2 + h4 ). (3.10)
2 12

We note that the implicit method is accurate of order (1, 2) just like the explicit
method. Since the implicit method is unconditionally stable we are free to choose
k proportional to h as far as stability is concerned. For reasons of accuracy this
might not be such a great idea since the first order term in k will probably
dominate the error.
Note also that we in (3.9) and (3.10) implicitly have assumed that we evaluate
the inhomogeneous term at the advanced time level. This will probably not cause
any problems since the right-hand-side function is supposed to be known, but if
for some reason we want to evaluate ν(t, x) at the time level where we know the
solution function, i.e. ν(t − k, x) then the right-hand-side operator Rk,h is no
longer the identity but an inverse time shift: Rk,h = Et−1 such that

n
Rk,h P ψm = Et−1 (ψt − bψxx )
= ψt − bψxx − kψtt + bkψxxt + O(k 2 )

and therefore
1 1
Pk,hψ − Rk,h P ψ = kψtt − bkψxxt − bh2 ψxxxx + O(k 2 + h4 ).
2 12

The order is still (1, 2). ✷

31
3.3 Symbols of operators

The result of using Taylor expansions on the various (combinations of) differen-
tial and difference operators are polynomials or power series in k and h whose
coefficients in addition to numerical constants contain partial derivatives w.r.t. t
and x of the smooth function ψ. Any smooth function will do as long as none
of its partial derivatives vanish. We can simplify our investigations by a suitable
choice of ψ, a choice inspired by our considerations in section 1.3 and the test
functions we introduced for stability in section 2.4 where we used the product
of an exponential function in time and a trigonometric (or complex exponential)
function in space. So we choose

ψ(t, x) = est eiξx = esnk eiξmh (3.11)

the first expression to be used with differential operators, the second one with
difference operators.
Example.
ψt = sψ, ψx = iξψ, ψxx = −ξ 2 ψ
1
Eψ = esnk eiξ(m+1)h = eiξh ψ = (1 + iξh − ξ 2 h2 + · · ·)ψ
2
1 1
∆ψ = (E −1)ψ = (iξ − ξ 2 h+· · ·)ψ ✷
h 2

When we apply a differential operator P on ψ the result is a polynomial p(s, ξ) in


s and ξ times ψ. This polynomial is called the symbol of the differential operator.
Similarly when we apply a difference operator Pk,h on ψ the result is a power
series pk,h (s, ξ) times ψ. pk,h (s, ξ) is called the symbol of the difference operator
Pk,h .
Example. Since ψt − bψxx = (s + bξ 2 )ψ the symbol of the differential operator
P = Dt − bD2 for the simple heat equation is p(s, ξ) = s + bξ 2 . ✷
The definitions of order can now be reformulated in terms of the symbols of the
operators:
Theorem. A difference scheme Pk,h v = Rk,h ν which is consistent with the
equation P u = ν is accurate of order p in time and order q in space iff

pk,h (s, ξ) − rk,h (s, ξ)p(s, ξ) = O(k p ) + O(hq ) (3.12)

where p(s, ξ), pk,h (s, ξ), and rk,h (s, ξ) are the symbols of the operators P , Pk,h ,
and Rk,h , respectively. ✷

32
Theorem. A difference scheme Pk,h v = Rk,h ν with k = Λ(h) is accurate of order
r iff
pk,h (s, ξ) − rk,h (s, ξ)p(s, ξ) = O(hr ). (3.13)

Example. Crank-Nicolson’s method on the simple heat equation ut − buxx = ν
can be written
n+ 1 n+ 1
(δt − bµ̃t δ 2 )vm 2 = µ̃t νm 2
with the evaluation point located midway between the present and the advanced
time level. The symbol of the differential operator is p(s, ξ) = s+bξ 2 as mentioned
above. The symbol of the left-hand-side difference operator Pk,h = δt − bµ̃t δ 2 is
1 1 sk 1 b 1 1
pk,h (s, ξ) = (e 2 − e− 2 sk ) − 2 (e 2 sk + e− 2 sk )(eiξh − 2 + e−iξh )
k 2h
1 1 1
= s + s3 k 2 + O(k 4 ) − b(1 + s2 k 2 + O(k 4 ))(−ξ 2 + ξ 4 h2 + O(h4))
24 8 12
1 b b
= s + bξ 2 + s3 k 2 + s2 ξ 2 k 2 − ξ 4 h2 + O(h4 + h2 k 2 + k 4 )
24 8 12
and the symbol for the right-hand-side operator is
1 1 1 1
rk,h(s, ξ) = (e 2 sk + e− 2 sk ) = 1 + s2 k 2 + O(k 4 )
2 8
such that
1 b
rk,h (s, ξ)p(s, ξ) = s + bξ 2 + s3 k 2 + s2 ξ 2 k 2 + O(k 4)
8 8
and
1 3 2 b
s k − ξ 4 h2 + O(h4 + h2 k 2 + k 4 )
pk,h(s, ξ) − rk,h (s, ξ)p(s, ξ) = −
12 12
showing that Crank-Nicolson is second order accurate in both time and space. ✷
Remark. The contribution of the right-hand-side operator Rk,h is important
in producing second order accuracy. If we only evaluate the right-hand-side func-
tion ν at time t = nk or t = (n + 1)k the method will be first order accurate in
time. ✷
Remark. If the right-hand-side function is known for intermediate values of t
then it is O.K. to evaluate it for t = (n + 21 )k. In this case rk,h (s, ξ) = 1, since
this is the evaluation point, and the local truncation error becomes
1 3 2 b 2 2 2 b
s k + s ξ k − ξ 4 h2 + O(h4 + h2 k 2 + k 4 )
pk,h (s, ξ) − rk,h(s, ξ)p(s, ξ) =
24 8 12
and the method retains second order accuracy. ✷

33
3.4 The local error

The local truncation error was defined earlier in this chapter as Pk,h ψ − Rk,h P ψ.
Another important concept is the local error which is defined as the difference
between the true solution u(t, x) and the computed solution v(t, x), calculated
from correct initial values at time t−k and boundary values at (t, X1 ) and (t, X2 ).
If we consider the explicit method on ut = buxx we have
n+1
vm = unm + bµ(unm−1 − 2unm + unm+1 )
1
= unm + bkuxx + bkh2 uxxxx + · · ·
12
1
= unm + kut + kh2 utt + · · · (3.14)
12b
1
un+1
m = unm + kut + k 2 utt + · · · (3.15)
2
such that the
1 1
local error = ( k 2 − kh2 )utt + · · · = O(k 2 + kh2 ). (3.16)
2 12b
Comparing with (3.6) we notice that the local error contains an extra factor k
on each term compared to the local truncation error. This is a general result
although trickier to show for implicit methods.

34
Chapter 4

Boundary Conditions

As mentioned in section 1.4 boundary conditions are necessary in order to specify


a unique solution to a parabolic differential equation on a finite space interval.
They also come in handy in supplying the extra equations needed to solve for the
numerical solution. In the sections below we shall specify these extra equations
for the various difference schemes we have introduced. Dirichlet conditions are
easy to accomodate whereas conditions involving a (normal) derivative present
new challenges. We shall often just treat conditions at the left end point, x = X1 ,
since the considerations for x = X2 are quite similar.

4.1 A Dirichlet condition

A Dirichlet boundary conditon

u(t, X1 ) = γ(t), t > 0. (4.1)

is straightforward to apply.
For the explicit method the formulae (1.13) specify the values for the numerical
n+1
solution, vm , at all internal points, m = 1, 2, . . . , M − 1 at time level n + 1. The
boundary condition (4.1) is then used to supply the boundary value, v0n+1 , and
n+1
the boundary condition at x = X2 is used in a similar fashion to provide vM ,
such that we have determined the solution at all points at time level n + 1.
For the general θ-method the formulae (1.20) provide a set of M − 1 equations
in the M + 1 unknowns v0n+1 , v1n+1 , . . . , vM
n+1
. From the Dirichlet conditions we
n+1 n+1
have values for v0 and vM and we are ready to solve for the remaining M − 1
unknowns.

35
4.2 Derivative boundary conditions

If one of the boundary conditions involves a derivative then the discretization of


this has an effect on the accuracy and stability of the numerical solution as well
as on the solution process. Assume that the condition on the left boundary is

αu(t, X1 ) − βux (t, X1 ) = γ, t>0 (4.2)

where α, β and γ may depend on t. A similar condition might be imposed on


the other boundary (cf. section 1.4) and the considerations would be completely
similar so we shall just consider a derivative condition on one boundary. We shall
in turn study three different discretizations of the derivative in (4.2):
v1n − v0n
(first order) (4.3)
h
−v2n + 4v1n − 3v0n
(second order, asymmetric) (4.4)
2h
v1n − v−1
n
(second order, symmetric) (4.5)
2h
We have similar expressions for ux on the right boundary. (4.3) and (4.5) are
easily adapted while (4.4) is slightly more tricky and is therefore given here:
n n n
vM −2 − 4vM −1 + 3vM
2h

These approximations and their respective orders are easily determined using
Taylor series. The effects of the discretization on the overall accuracy of the
computed solution will be considered in Chapter 9. The effects on the stability
of the overall method will be treated in Chapter 6. In the subsequent sections
we shall focus on the practical considerations around the sets of linear equations
to be solved at each time step.

4.3 A third test problem

As an example of a problem with a derivative boundary condition at one of the


boundaries we consider the following which is closely related to test problem 1
on page 5:

Problem 3
ut = uxx , 0 ≤ x ≤ 1, t > 0,

36
u(0, x) = u0 (x) = cos x, 0 ≤ x ≤ 1,
u(t, 1) = e−t cos 1, t > 0,
ux (t, 0) = 0, t > 0.

It is easily seen that the true solution is the same as for test problem 1:
u(t, x) = e−t cos x.

4.4 The explicit method

4.4.1 First order approximation

Using the first order approximation (4.3) to ux in (4.2) results in


v1n+1 − v0n+1
αv0n+1 − β = γ
h
or
(hα + β)v0n+1 = hγ + βv1n+1 (4.6)
or
hγ + βv1n+1
v0n+1 = (4.7)
hα + β
which is used to compute v0n+1 . Since we have assumed that α and β have the
same sign there are no problems with a zero denominator.

4.4.2 Asymmetric second order

The approximation (4.3) is only first order accurate and this will have an adverse
effect on the overall accuracy of the method as we shall see in Chapter 9. Using
the second order approximation (4.4) in (4.2) we get
−v2n+1 + 4v1n+1 − 3v0n+1
αv0n+1 − β = γ
2h
or
(2hα + 3β)v0n+1 = 2hγ − βv2n+1 + 4βv1n+1 (4.8)
or
2hγ − βv2n+1 + 4βv1n+1
v0n+1 = (4.9)
2hα + 3β
which is used to compute v0n+1 . Once again a zero denominator is not possible.

37
4.4.3 Symmetric second order

Symmetric difference approximations are often more accurate than asymmetric


ones so we should like to investigate the merits of such a formula. The symmet-
ric second order approximation (4.5) refers to a point outside the region where
the differential equation is defined. We call this point a fictitious point and no
physical significance should be attached to the value assigned to it. It is merely
a computational quantity. The basic assumption is that the solution function
can be extended slightly beyond the boundary as a smooth function obeying the
same differential equation as in the interior. We then apply the difference scheme
also at the boundary. In order to calculate v0n+1 we need information on v1n , v0n ,
n
and the fictitious value v−1 . This is obtained by applying (4.5) to (4.2):
v1n − v−1
n
αv0n − β = γ
2h
or
n
β(v−1 − v1n ) = 2hγ − 2hαv0n . (4.10)

If β = 0 we have a Dirichlet condition, v0n is defined from (4.10), and there is no


n
reason to incorporate v−1 in the first place. If β 6= 0 we get

n 2h
v−1 = v1n + (γ − αv0n ). (4.11)
β

4.5 The implicit method

4.5.1 First order approximation

The system of linear equations for v n+1 contains M − 1 equations in M + 1


unknowns. One extra equation at the beginning is supplied by (4.6):

(hα + β)v0n+1 − βv1n+1 = hγ (4.12)

and a similar one is supplied at the other end from the boundary condition at
X2 .

4.5.2 Asymmetric second order

The extra equation is now supplied by (4.8):

(2hα + 3β)v0n+1 − 4βv1n+1 + βv2n+1 = 2hγ. (4.13)

38
The resulting system of equations is no longer tridiagonal because of the extra
coefficient in the first (and possibly the last) equation but a Gaussian elimination
can still be done without introducing new non-zero values in the coefficient matrix
and without affecting the linear complexity of the solution (cf. section 1.12).

4.5.3 Symmetric second order

We apply the difference scheme at the boundary point arriving at a linear equation
n+1
involving v−1 , v0n+1 , and v1n+1 . The extra equation is obtained from (4.10):
n+1
βv−1 + 2hαv0n+1 − βv1n+1 = 2hγ. (4.14)

Once again the system of equations is almost tridiagonal in the terminology of


section 1.12, and a Gaussian elimination can be performed without any real
difficulties.

4.6 The θ-method

For the general θ-method and in particular for Crank-Nicolson we use the formulas
of the preceding sections to give the extra equations needed.

4.7 Exercises
1. Solve problem 3 with the implicit method from t = 0 to t = 21 with
1 1 1
h = k = 10 , 20 , and 40 .
Use each of the three approximations (4.3) – (4.5) to approximate the
derivative at the boundary.
Compute the max-norm and the 2-norm of the error for
t = 0.1, 0.2, 0.3, 0.4, 0.5.

2. Solve problem 3 with Crank-Nicolson from t = 0 to t = 21 with


1 1 1
h = k = 10 , 20 , and 40 .
Use each of the three approximations (4.3) – (4.5) to approximate the
derivative at the boundary.
Compute the max-norm and the 2-norm of the error for
t = 0.1, 0.2, 0.3, 0.4, 0.5.

39
40
Chapter 5

The Convection-Diffusion
Equation

5.1 Introduction

The simplest convection-diffusion equation is


ut = buxx − aux (5.1)
or as we sometimes prefer
ut + aux = buxx . (5.2)
If we begin with
ut + aux = 0 (5.3)
we can easily see that the solution can be written
u(t, x) = u0 (x − at) (5.4)
where u0 (x) is the initial value at t = 0. (5.3) is called the one-way wave equation
and according to (5.4) describes transport in the x-direction with velocity a.
Inspired by (5.4) we introduce
w(t, y) = u(t, y + at) = u(t, x) (5.5)
and find that
wt = ut + aux = buxx = bwyy . (5.6)
So w(t, y) is the solution to the simple heat equation and
u(t, x) = w(t, x − at). (5.7)
The equation (5.1) thus describes simultaneous transport and diffusion.

41
5.2 Maximum principle

The solutions to the simple heat equation

ut = buxx (5.8)

satisfy a maximum principle:

max |u(t, x)| ≤ max |u(t, x)| (5.9)


Ω ∂Ω

where Ω is the open region Ω = {(t, x)| 0 < t < T, X1 < x < X2 } and ∂Ω is the
parabolic boundary of Ω consisting of the three straight lines

{t = 0, X1 ≤ x ≤ X2 }, {0 ≤ t ≤ T, x = X1 }, {0 ≤ t ≤ T, x = X2 }.

To see this assume that u(t0 , x0 ) is a local maximum for some (t0 , x0 ) satisfying
0 < t0 ≤ T, X1 < x0 < X2 . Since it is a maximum in the x-direction we must
have ux = 0, uxx < 0 and since it is a maximum in the t-direction (possibly at
T ) we must have ut ≥ 0 which is incompatible with (5.8).
Remark. We might have uxx = 0 in which case ut = 0 and some higher order
even derivative, e.g. uxxxx , must be negative and all lower order derivatives w.r.t.
x must be 0. By differentiating (5.8) we get utt = (buxx )t = butxx = b2 uxxxx and
a similar contradiction arises. ✷
Since the solutions to (5.8) satisfy a maximum principle so do the solutions to
(5.1) by the relations (5.6) and (5.7).
In Appendix A we study a number of difference schemes which can be proposed for
the solution of the one-way wave equation. Here we continue with the convection-
diffusion equation

5.3 The explicit method

For the convection-diffusion equation we would prefer to use a central difference


approximation to ux such that our method can remain of order 2 in x:
n+1
vm n
− vm v n − 2vm
n n
+ vm−1 n
vm+1 n
− vm−1
− b m+1 + a = 0 (5.10)
k h2 2h
or
n+1 1 n n 1 n
vm = (bµ + aλ)vm−1 + (1 − 2bµ)vm + (bµ − aλ)vm+1
2 2
n n n
= bµ(1 + α)vm−1 + (1 − 2bµ)vm + bµ(1 − α)vm+1 (5.11)

42
where we have introduced
k k aλ ah
λ = , µ = 2 , α = = . (5.12)
h h 2bµ 2b
As we noticed in section 2.4 the low order term has no influence on 0-stability so
we still have the well-known condition

2bµ ≤ 1. (5.13)

For this problem it might be relevant to ask for absolute stability, i.e. |g(ϕ)| ≤ 1.
From section 2.4 we have
ϕ
g(ϕ) = 1 − 4bµ sin2 − iaλ sin ϕ, −π ≤ ϕ ≤ π. (5.14)
2
and
ϕ k2
|g(ϕ)|2 = (1 − 4bµ sin2 )2 + a2 2 sin2 ϕ (5.15)
2 h
ϕ ϕ ϕ ϕ
= 1 − 8bµ sin2 + 16b2 µ2 sin4 + 4a2 kµ sin2 cos2 .
2 2 2 2

|g(ϕ)| ≤ 1
ϕ ϕ
⇔ a2 k cos2 + 4b2 µ sin2 ≤ 2b, −π ≤ ϕ ≤ π
2 2
2b
⇔ k ≤ = k0 and 2bµ ≤ 1. (5.16)
a2
We see that for absolute stability we have in addition to the usual condition
(5.13) an upper bound k0 on the allowable time step, a bound which might be
rather strict if we have convection dominated diffusion (a > b).
Since the true solution obeys a maximum principle we might consider a similar re-
quirement on the numerical solution. The condition for this is that all coefficients
in (5.11) be non-negative, i.e.

α ≤ 1 and 2bµ ≤ 1. (5.17)


n+1
That the conditions are sufficient follows from the fact that vm is a weighted
average of values from time step n.
That α ≤ 1 is necessary is seen by assuming α > 1 and taking
0 0
vm = 1 for m ≤ 0 and vm = 0 for m > 0. Then

v01 = bµ(1 + α) + 1 − 2bµ = 1 + (α − 1)bµ > 1.

The condition α ≤ 1 is equivalent to


2b
h ≤ = h0 (5.18)
a
43
so for a discrete maximum principle we have a bound on the maximum allowable
x-step. If we choose h = h0 then we must have
h20 2b
k ≤ = 2 = k0
2b a
where k0 is the same as in the condition for absolute stability.

Upwind
0-stability
ko

Absolute stability
max. pr.
h
ho

Figure 5.1: Regions for maximum principle, absolute and 0-stability


for the explicit and upwind schemes

We have illustrated the three conditions in an h-k-diagram in Fig. 5.1 for the
case b = 0.1 and a = 10. In this case h0 = 0.02 and k0 = 0.002. For 0-stability
k must be below the parabola h2 /2b. For absolute stability k must also be smaller
than k0 , and for a maximum principle to hold h must be smaller than h0 . If h is
larger then the solution will display (bounded) oscillations.
Remark. If we choose h = h0 (i.e. α = 1) and k = k0 (i.e. 2bµ = 1) then (5.11)
degenerates into
n+1 n
vm = vm−1 .
In this case we represent the transport part perfectly (because ak0 = h0 ) and
neglect the diffusion part completely. ✷
The ratio a/b is called the Reynolds number in fluid dynamics literature and the
Peclet number in heat conduction literature. If this number is large it imposes
strict limits on the step size h to avoid oscillations with the explicit method. One
way to circumvent this problem is to use a first order approximation to ux .

5.4 The upwind scheme

When a > 0 then it is possible to approximate ux with a backward difference


leading to
n+1 n n n n n n
vm − vm vm+1 − 2vm + vm−1 vm − vm−1
−b +a = 0 (5.19)
k h2 h
44
or
n+1 n n n
vm = (bµ + aλ)vm−1 + (1 − 2bµ − aλ)vm + bµvm+1
n n n
= bµ(1 + 2α)vm−1 + (1 − 2bµ(1 + α))vm + bµvm+1 . (5.20)

The approximation will only be first order accurate in x and may thus require
small values of h, but this was necessary anyway to avoid oscillations, so this
scheme may be worth a try.
For the growth factor we find
ϕ
g(ϕ) = 1 − 4bµ sin2 − aλ(1 − e−iϕ ) (5.21)
2
ϕ
= 1 − 2(2bµ + aλ) sin2 − iaλ sin ϕ, −π ≤ ϕ ≤ π.
2
The condition for 0-stability is now

2bµ + aλ ≤ 1. (5.22)

For absolute stability we consider


ϕ ϕ
|g(ϕ)|2 = 1 − 8bµ sin2 + 16b2 µ2 sin4 (5.23)
2 2
ϕ ϕ ϕ
−4aλ sin2 (1 − 4bµ sin2 − aλ sin2 ) + a2 λ2 sin2 ϕ
2 2 2
and

|g(ϕ)| ≤ 1 ⇔
ϕ ϕ
2bµ(1 − (2bµ + aλ) sin2 ) + aλ(1 − 2bµ sin2 − aλ) ≥ 0. (5.24)
2 2
This inequality must hold for ϕ = π so we must have

(2bµ + aλ)(1 − 2bµ − aλ) ≥ 0

so (5.22) is a necessary condition. That it is also sufficient is easily seen.


From (5.20) we see that when 2bµ + aλ ≤ 1 then all coefficients are non-negative
n+1
and vm is a weighted average of values at time n, so the condition (5.22) also
guarantees a discrete maximum principle.
So how does condition (5.22) compare to our previous requirements for the explicit
method. First we observe that there is no upper limit on h. Once we have decided
on the step size h then (5.22) puts a limit on k:

1 h2 h20 α2 α2
k ≤ 2b a = = = k0 . (5.25)
h2
+ h
2b(1 + α) 2b(1 + α) 1+α

45
Table 5.1: Step size limits for the upwind scheme

α 2bµ h/h0 k/k0 assessment


1 1/2 1 1/2 worse
3 1/4 3 9/4 better
9 1/10 9 81/10 ‘very good’

This limit is shown with the dashed curve in Fig. 5.1. For a given value of h
the bound on k is stricter, but on the other hand we have a numerical maximum
principle without restrictions on h. In Table 5.1 we show the upper limit for 2bµ
and k for various choices of α = h/h0 for the special case b = 0.1 and a = 10
where h0 = 0.02 and k0 = 0.002.
Formula (5.19) can be rewritten as

n+1 n n n n n n
vm − vm ah vm+1 − 2vm + vm−1 vm+1 − vm−1
− (b + ) +a = 0 (5.26)
k 2 h2 2h

showing that the upwind scheme is an O(h2 ) approximation to a convection-


diffusion equation with diffusion coefficient

ah
b+ = b(1 + α) (5.27)
2

The upwind difference scheme (5.19) is consistent with equation (5.1) and also
with the modified convection-diffusion equation where the diffusion coefficient b
is replaced by b(1 + α). It is a first order approximation to (5.1) but a second
order (in x) approximation to the modified equation. In the limit when h (and
k) tend to 0, α will tend to 0 and the two equations become equal.
When using the upwind scheme we introduce numerical diffusion or artificial
viscosity into the system, and even with α = 1 this extra contribution is of the
same magnitude as the original. The effect is that the upwind scheme will tend to
smoothe everything too much and the assessment very good in Table 5.1 should
be taken with a fair amount of irony. This issue has been addressed by Gresho
and Lee in a paper entitled: ‘Don’t suppress the wiggles. They are telling you
something’ [13].

46
5.5 The implicit method

In order to avoid the stability limitations of the explicit method we might instead
consider the implicit method
n+1
vm n
− vm v n+1 − 2vm
n+1 n+1
+ vm−1 n+1
vm+1 n+1
− vm−1
− b m+1 + a = 0 (5.28)
k h2 2h
or
n+1 n+1 n+1 n
−bµ(1 + α)vm−1 + (1 + 2bµ)vm − bµ(1 − α)vm+1 = vm . (5.29)
For the growth factor we now get
ϕ
g − 1 + 4gbµ sin2 + igaλ sin ϕ = 0, −π ≤ ϕ ≤ π (5.30)
2
or
1
g(ϕ) = 2 ϕ , −π ≤ ϕ ≤ π (5.31)
1 + 4bµ sin 2 + iaλ sin ϕ
from which we immediately deduce that |g(ϕ)| ≤ 1, i.e. we have absolute and
unconditional stability.
If α = 1, i.e. h = h0 , then (5.29) reduces to
n+1 n+1 n
−2bµvm−1 + (1 + 2bµ)vm = vm (5.32)
which can be solved from left to right:
n n+1
n+1 vm + 2bµvm−1
vm = (5.33)
1 + 2bµ
n+1 n n+1
showing that vm is a weighted average of vm and vm−1 and therefore no larger
than the largest of these, thus proving that a maximum principle holds for the
numerical solution.
For other values of α we must solve a tridiagonal system of equations with
(cf. section 1.11)
am = −bµ(1 + α), bm = 1 + 2bµ, cm = −bµ(1 − α).
This system can be solved using the procedure of section 1.11 with no fear of
numerical instability since
a2 bµ(1 + α) bµ
b′2 = b2 − c1 = 1 + 2bµ − bµ(1 − α) = 1 + 2bµ − bµ(1 − α2 ).
b1 1 + 2bµ 1 + 2bµ
If α ≥ 1 then b′2 ≥ b2 and in general b′m ≥ bm .
If α < 1 the b′2 > 1 + bµ and in general b′m > 1 + bµ.

47
5.6 Crank-Nicolson

The Crank-Nicolson method can be written


n+1 n n+1 n+1 n+1 n+1 n+1
vm − vm vm+1 − 2vm + vm−1 vm+1 − vm−1
−b +a = (5.34)
k 2h2 4h
n n n n n
vm+1 − 2vm + vm−1 vm+1 − vm−1
b −a
2h2 4h
or
1 n+1 n+1 1 n+1
− bµ(1 + α)vm−1 + (1 + bµ)vm − bµ(1 − α)vm+1 = (5.35)
2 2
1 n+1 n+1 1 n+1
bµ(1 + α)vm−1 + (1 − bµ)vm − bµ(1 − α)vm+1 .
2 2
For the growth factor we now get
ϕ
1 − 2bµ sin2 2
− 12 iaλ sin ϕ
g(ϕ) = ϕ , −π ≤ ϕ ≤ π (5.36)
1 + 2bµ sin2 2
+ 12 iaλ sin ϕ

and it is easily seen that |g(ϕ)| ≤ 1, proving absolute and unconditional stability.
If α = 1, i.e. h = h0 , then (5.35) reduces to
n+1 n+1 n n
−bµvm−1 + (1 + bµ)vm = bµvm−1 + (1 − bµ)vm (5.37)

which can be solved from left to right:


n n n+1
n+1 bµvm−1 + (1 − bµ)vm + bµvm−1
vm = (5.38)
1 + bµ
showing that a maximum principle holds for the numerical solution if and only if
bµ ≤ 1 or k ≤ 2k0 .
For other values of α we must again solve a tridiagonal system of equations and
also here there is no fear of numerical instability (cf. exercise 1).

5.7 Comparing the methods

We have compared the various methods on two test examples based on the equa-
tion ut −buxx + aux = 0 with b = 0.1 and a ≥ 0 and with an initial function either
a sawtooth which increases linearly from 0 to 1 on [−1, 0] and decreases linearly
to 0 on [0, 1] or a smooth bump defined as (1 + cos(πx))/2 on [−1, 1]. Outside
[−1, 1] the initial function is set to 0 and the interval is chosen large enough that
we can use 0 as boundary values. The time interval was chosen to [0, 0.5].

48
As time passes the amount of matter represented by the bump will diffuse and be
transported to the right with velocity a, but nothing will disappear so the area
P
under the curve will remain constant. Numerically the area is defined as h m vm
and it is easy to show that all methods considered here will conserve the area
(up to rounding errors and as long as the bump does not reach the boundaries,
cf. exercise 3).
Because of the diffusion term the maximum value of the bump will become smaller
and the half-width (measured as the width of the bump at half the maximum
value) will widen. Because of the transport term the top of the bump will move
with a velocity close to a.
We have used x-steps around h0 and time steps equal to k0 , 5k0 , and 25k0. a = 0
was chosen to get a reference value for the maximum and the half-width and
a = 9 for the real computation. With the smaller values of k Crank-Nicolson
meets these values quite well. The implicit method tends to smoothe too much,
and the explicit method too little. Time steps of 25k0 cannot be recommended
as the implicit method gives a very wide and low maximum and Crank-Nicolson
tends to produce waves with negative function values upstream. In no case does
the upwind scheme produce acceptable results.

5.8 Exercises
1. Show that we have numerical stability when solving the linear equations
for Crank-Nicolson’s method by showing that b′m > 1 + 21 bµ.

2. Take a single step with the explicit method, the implicit method, and
Crank-Nicolson with h = h0 and k = k0 , 2k0 and 3k0 from an initial func-
tion which is equal to 1 when x ≤ 0 and equal to 0 when x > 0. Compare
the results.

3. Show that the explicit and the implicit methods preserve the area under
the curves in the two test examples in section 5.7.

49
50
Chapter 6

The Matrix Method

6.1 Notation

The process of advancing the solution from one time step to the next can be
formulated in linear algebra terms. We arrange the function values at time step
n, {v0n , v1n , . . . , vM
n n n
−1 , vM } as an (M + 1)-dimensional column vector, v , and the
internal function values {v1n , . . . , vM n
−1 } as an (M −1)-dimensional column vector,
n
v . We shall also use this underline convention for matrices such that a matrix
name with no underline shall refer to an (M − 1) × (M − 1) matrix such as
the one representing the operator which takes v n into v n+1 , whereas a matrix
name with an underline refers to an (M + 1) × (M + 1) matrix which also takes
boundary values into account. A double underline shall signify a rectangular
(M − 1) × (M + 1) matrix. We shall return to the precise definition of these
matrices in the following sections.
It should be stressed here that we never in practice construct the matrices which
we are about to introduce. They are used in the analysis of the computational
process and serve merely as a means to study and understand the general be-
haviour of our numerical solutions.

6.2 The explicit method

A time step with the explicit method on the simple heat equation ut = buxx can
be written

v n+1 = A vn (6.1)

51
where
 



c d e 



 


 c d e 

A =  . . . (6.2)
 


 c d e 


 

 c d e 

with c = e = bµ and d = 1 − 2bµ. We shall also write A = I − bµ T introducing


   



0 1 0 





–1 2 –1 



 
 
 


 0 1 0 
 
 –1 2 –1 

I= . . . and T = . . . .(6.3)

 
 
 


 0 1 0 
 
 –1 2 –1 


 
 
 

 0 1 0   –1 2 –1 

We prefer to work with square matrices, so we usually treat the boundary values
separately thus removing the first and last column from A to form A = I − bµ T
where
   



1 0 





2 –1 



 
 
 

 0 1
 0 
  –1
 2 –1 

I = . . . and T = . . . (6.4)

   


 0 1 0 






 –1 2 –1 




   
0 1   –1 2 

and the explicit time step now reads

v n+1 = Av n + q n (6.5)

where q n is the (M − 1)-dimensional vector q n = bµ{v0n , 0, . . . , 0, vM


n T
} .
Remark. q n is really (M − 1)-dimensional. Component number 1 is bµv0n and
n
component number M − 1 is bµvM . ✷
Using relation (6.5) repeatedly from n = 0 we get

v n = An v 0 + An−1 q 0 + · · · + q n−1 . (6.6)

Remark. The superscripts on the vectors v and q are indices referring to the
step number whereas the superscripts on A indicate powers of A. ✷
The behaviour of the solution at time n is thus to a large extent governed by the
behaviour of the powers of matrix A. We shall return to this in section 6.6.

52
6.3 The implicit method

For the implicit method on ut = buxx we can in a similar way express the step
from time n to time n + 1 as

B v n+1 = v n (6.7)

where B = I + b µ T .
In case of Dirichlet boundary conditions the known values of v0n+1 and vM
n+1
can
be inserted and we arrive at

Bv n+1 = v n + q n (6.8)

where B = I + b µ T and q n is the (M − 1)-dimensional vector

q n = bµ{v0n+1, 0, . . . , 0, vM
n+1 T
} .

Equation (6.8) can be reformulated as

v n+1 = Av n + Aq n (6.9)

where A = B −1 .

6.4 The θ-method

The general θ-method can be formulated as

B v n+1 = C v n (6.10)

where B = I + θ b µ T and C = I − (1 − θ) b µ T .
Taking the (Dirichlet) boundary values separately we can remove the first and
the last columns of B and C and arrive at

Bv n+1 = Cv n + q n (6.11)

where B = I + θ b µ T and C = I − (1 − θ) b µ T and q n is the (M − 1)-dimensional


vector
q n = bµ{θv0n+1 + (1 − θ)v0n , 0, . . . , 0, θvM
n+1 n T
+ (1 − θ)vM }
or

v n+1 = Av n + B −1 q n (6.12)

where A = B −1 C.

53
6.5 Stability by the matrix method

If we have homogeneous boundary conditions we have q n = 0 and the transfor-


mation from time step n to n + 1 is in all cases

v n+1 = Av n (6.13)

and from the beginning to time step n

v n = An v 0 (6.14)

where the superscript on v indicates the step number and the superscript on A
indicates a power.
Introducing a vector norm, ||.||, and a compatible matrix norm (for which we
shall use the same symbol) we then have

||v n+1|| ≤ ||A|| ||v n || (6.15)

and

||v n || ≤ ||A||n ||v 0 ||. (6.16)

We note that ||A|| ≤ 1 implies absolute stability in the given vector norm.
Example. If we choose the ∞-norm for vectors

||v||∞ = max |vm |


m

a compatible matrix norm is the maximum row sum


X
||A||∞ = max |aij |.
i
j

For the explicit method we have

|c| + |d| + |e| = 2bµ + |1 − 2bµ| = 1 if 2bµ ≤ 1.

We therefore conclude that the explicit method on ut = buxx is absolutely stable


in the ∞-norm provided 2bµ ≤ 1. ✷
There are many different norms to choose from. What happens if one matrix
norm measures ||A|| less than 1 and another greater than 1? Since any two
vector norms in a finite-dimensional space are equivalent in the sense that there
exist constants γ and δ such that

γ||v||α ≤ ||v||β ≤ δ||v||α (6.17)

54
it follows that if a scheme is 0-stable in one norm it is 0-stable in any other norm.
Remark. But it is not necessarily true that a scheme is absolutely stable in one
norm if it is absolutely stable in another since the constants γ and δ allow for a
(limited) growth (cf. Exercise 1). ✷
So it becomes interesting to look for matrix norms which produce small values
when applied to matrix A. And to search for the smallest possible value of ||A||.
We have the following two results from matrix theory:
1. ||A|| ≥ ρ(A)
where ||.|| is any matrix norm and ρ(A) is the spectral radius of A, i.e. the maxi-
mum absolute value of the eigenvalues of A. ✷
2. For any matrix A and any ε > 0 there is a matrix norm such that

||A|| ≤ ρ(A) + ε

For a proof of 2 see [34, p. 284]. ✷


As a consequence we have the following stability results:
A. ρ(A) < 1 =⇒ 0-stability.
B. ρ(A) > 1 =⇒ instability.
C. ρ(A) ≤ 1 and A symmetric =⇒ absolute stability in the 2-norm.
Remark. If A is not symmetric and ρ(A) = 1 the situation is undecided. ✷
Remark. When ρ(A) < 1 we might still encounter considerable (although
bounded) error growth when measured in one of the norms we should like to
use such as the 2-norm or the ∞-norm. ✷

6.6 Eigenvalues of tridiagonal matrices

In order to investigate the stability of numerical solutions of ut = buxx it is thus


important to study the eigenvalues and eigenvectors of Toeplitz matrices of the
form A = I + αT of dimension M − 1. If w is an eigenvector with correspond-
ing eigenvalue τ for T then w is also an eigenvector for A with corresponding
eigenvalue λ = 1 + ατ .
For a vector w = {w1 , w2 , . . . , wM −1} to be an eigenvector for T with correspond-
ing eigenvalue τ we must have

−wm−1 + 2wm − wm+1 = τ wm , m = 2, . . . , M − 2. (6.18)

55
It is usually not easy to find eigenvalues and eigenvectors for a matrix, but if we
have a candidate then it is very easy to check whether it fits. A good suggestion
for w is to take

wm = sin(mϕ), m = 1, . . . , M − 1. (6.19)

Since

wm−1 + wm+1 = sin(m − 1)ϕ + sin(m + 1)ϕ (6.20)


= 2 sin(mϕ) cos ϕ = 2wm cos ϕ

(6.18) now gives


ϕ
τ = 2 − 2 cos ϕ = 4 sin2 . (6.21)
2
In addition to the M − 3 equations (6.18) we must also have the similar relations
for m = 1 and m = M − 1:

2w1 − w2 = τ w1 ,
−wM −2 + 2wM −1 = τ wM −1 .

These are fulfilled automatically if we can manage to have w0 = wM = 0.


w0 = 0 comes naturally out of (6.19). For wM we must require

wM = sin(Mϕ) = 0

or

Mϕ = pπ, p = 1, 2, . . . (6.22)

We therefore define

ϕp = , p = 1, 2, . . . , M −1 (6.23)
M
and with these M −1 values of ϕ we have a set of M −1 orthogonal eigenvectors
and corresponding eigenvalues for T :

τp = 4 sin2 , p = 1, 2, . . . , M −1. (6.24)
2M
For the explicit method on ut = buxx stability is governed by the eigenvalues of
matrix A = I − b µ T such that
ϕp
λp = 1 − 4bµ sin2 , p = 1, 2, . . . , M −1. (6.25)
2
and the condition for stability is that all eigenvalues are ≤ 1 in absolute magni-
tude.

56
0 0

0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10

Figure 6.1: Eigenvectors corresponding to p = 1 and p = M −1 = 9

Notice the close similarity between (6.25) and (2.22). The extra information we
get out of (6.25) is that with a given choice of h (or M) only a discrete and
finite set of frequencies, ϕp , are applicable. In Fig. 6.1 we show the components
of the eigenvectors corresponding to the lowest (p = 1) and the highest (p = 9)
frequency for the case M = 10.
Remark. We have absolute stability iff |λp | ≤ 1, p = 1, 2, . . . , M−1. If we choose
M = 10, h = 0.1 then ϕ9 = 9π/10 and sin2 (ϕ9 /2) ≈ 0.97553. If b = 1 we can
actually have absolute stability with k = 0.005125, even though bµ = 0.5125 >
0.5.
But for the explicit method to be absolutely stable for arbitrary h (or M) we
must still require bµ ≤ 0.5. ✷
For the implicit method on ut = buxx the eigenvalues of A = B −1 are the recip-
rocals of the eigenvalues of matrix B = I + b µ T . We therefore have

1
λp = ϕp , p = 1, 2, . . . , M −1. (6.26)
1 + 4bµ sin2 2

Once again there is a close similarity between (6.26) and (2.29) (with a = κ = 0)
with the former emphasizing the discrete nature of a finite-dimensional problem.
For the general θ-method we use the fact that the eigenvectors for matrices B
and C in (6.11) are the same, such that the eigenvalues of A = B −1 C in (6.12)
are the ratios of corresponding eigenvalues from C and B.
If we apply the explicit method on ut = buxx −aux then matrix A has components
c = bµ + 12 ak/h, e = bµ − 12 ak/h, and d = 1 − 2bµ (cf. (6.2)). The matrix is no
longer symmetric, but as long as e > 0, i.e. a < 2b
h
then c and e have the same
sign and A has real eigenvalues which are given by:

λp = 1 − 2bµ + 2 ce cos ϕp . (6.27)

57
The condition on a is equivalent to (5.18) which together with (5.13) secures a
discrete maximum principle in accordance with the fact that λp in (6.27) are less
than 1 in absolute magnitude.
Remark. But the analogy with the von Neumann results is not complete since
the eigenvalues of A remain real for 0 < a < 2b/h where the growth factors are
complex. ✷

6.7 The influence of boundary values

The stability considerations so far have disregarded the boundary conditions (or
assumed them to be homogeneous) and can therefore in certain cases be slightly
deceiving. One might for instance get the impression that all numerical solutions
to the simple heat equation are decreasing in time if the scheme is stable.
To study the effect of Dirichlet boundary conditions on the explicit scheme for the
simple heat equation we first look at equation (6.6) which can be interpreted to
state that the effect of the initial function will diminish in time. If the boundary
function is increasing with time then the effect of the latter terms will tend to
dominate the expression for v n .
To put it in matrix terms we add two rows to the rectangular matrix A to form
the quadratic matrix
 


 an00 



 




c d e 



 


 c d e 

A =  . . . (6.28)
 


 c d e 


 


 c d e 


 


 

anM M

where c, d, and e are as in (6.2) and an00 = v0n+1 /v0n and anM M = vM
n+1 n
/vM . The
step from time n to n + 1 now reads

vn+1 = A vn (6.29)

Remark. Since the first and last component of A depend on n there really ought
to be a superscript, n, on A. We have chosen to omit it because superscripts on
matrices indicate powers of the matrix. ✷
Remark. If we have Dirichlet boundary conditions then we know the values of
v0 and vM . If these values happen to be 0 at one or more points this analysis
must be modified. ✷

58
Since we know the eigenvalues and eigenvectors of matrix A it is easy to find
the eigenvalues and eigenvectors of A. If we augment the previous eigenvectors
with w0 = wM = 0 we have a set of M −1 eigenvectors and corresponding eigen-
values for A. The remaining two eigenvalues for A are a00 and aM M and the
corresponding two eigenvectors are shown graphically in Fig. 6.2 for the case
M = 10. The main observation here is that when v0n is increasing with n the
eigenvalue a00 is greater than 1, meaning that we have an increasing solution com-
ponent, and from Fig. 6.2 we see that the effect at the internal points diminishes
as we get further into the region.

0 0

0 2 4 6 8 10 0 2 4 6 8 10

Figure 6.2: Eigenvectors corresponding to a00 and aM M for M = 10.

Remark. The above analysis holds only for exponentially increasing boundary
functions such that a00 is constant throughout the computation. But it gives an
idea of the kind of linear transformation which takes the numerical solution from
one time step to the next. ✷

6.8 A derivative boundary condition

When the boundary condition involves a derivative we use one of the formulas
from Chapter 4 to give the necessary extra equation for v0n+1 and/or vM
n+1
.
As a simple example we consider the Neumann condition ux = 0 at the left
boundary and assume that we want to use the implicit method. If we use the
first order approximation to the derivative we get v0n+1 = v1n+1 which can be used
in the first equation which then reads

(1 + bµ)v1n+1 − bµv2n+1 = v1n

We see that that matrix B is changed to B ′ = I − b µ T ′ where T ′ is obtained


from T by changing the first diagonal element from 2 to 1. T ′ is still symmetric

59
and therefore has real eigenvalues which by Gerschgorin’s theorem still lie in the
interval (0, 4).
n+1
If we instead use the symmetric second order approximation we get v−1 = v1n+1
n+1
which can be used to eliminate v−1 from the equation at m = 0 which now reads

(1 + 2bµ)v0n+1 − 2bµv1n+1 = v0n

Now the transformation matrix is B ′′ = I − b µ T ′′ where T ′′ (which now has


dimension M) is obtained from T by changing the second element in the first
row from −1 to −2. T ′′ is no longer symmetric but since it is √similar to a
symmetric matrix (by the similarity transformation using D = diag{ 2, 1, 1, . . .})
the eigenvalues are still real, and by Gershgorin’s theorem they are also still in
the interval (0, 4).
For a further analysis of problems involving the general θ-method and general
boundary conditions we refer to [25] and [18].

6.9 Exercises
1. Prove that if a scheme is 0-stable in the α-norm then it is 0-stable in the
β-norm (cf. (6.17)).
If the scheme is absolutely stable in the α-norm how much can ||v n ||β grow.

2. Prove that ||A|| ≥ ρ(A) for any matrix norm.

3. Prove that 2bµ ≤ 1 and a < 2b/h imply |λp | ≤ 1 for λp in (6.27).

4. Show that the (skewsymmetric) matrix with 1 in the upper bidiagonal, −1


in the lower bidiagonal, and 0 everywhere else, has an eigenvalue

λp = 2 i cos ϕp

corresponding to the eigenvector with components

wk = ik+1 sin(kϕp )

with ϕp given by (6.23).

60
Chapter 7

Two-step Methods

All the methods we have considered so far have been one-step methods in the
sense that they take information from one time step in order to produce values
for the succeeding time step. Many of these methods are second order accurate
in space but only first order accurate in time.

7.1 The central-time central-space scheme

In order to balance things better we might consider the scheme


n
µ̃δt vm − b δ 2 vm
n
= 0 (7.1)
or written out
n+1 n−1
vm − vm v n − 2vm
n n
+ vm−1
= b m+1 (7.2)
2k h2
or
n+1 n−1 n n n
vm − vm − 2bµ(vm+1 − 2vm + vm−1 ) = 0. (7.3)
This scheme which is also known as leap-frog has been used by Richardson [29]
and is expected to be second order accurate in both space and time (cf. Exercise
1). It will require a special starting procedure since we only have information
available at one beginning time level.
We shall begin with an analysis of the stability properties of the scheme. We
proceed as in section 2.4 using the Fourier inversion formula (2.16) on (7.3) ob-
taining
Z π/h h i
eimhξ v̂ n+1 (ξ) − v̂ n−1 (ξ) − 2bµ(eihξ − 2 + e−ihξ )v̂ n (ξ) dξ = 0. (7.4)
−π/h

61
By uniqueness of the Fourier transform the integrand must be 0 leading to

hξ n
v̂ n+1 (ξ) − v̂ n−1 (ξ) + 8bµ sin2 v̂ (ξ) = 0. (7.5)
2
This reminds us very much of a second order difference equation and we are
therefore lead to suggest a solution of the form

v̂ n (ξ) = g n (7.6)

where the right-hand-side is the n-th power of the growth factor g which is sup-
posed to be a function of ϕ = hξ. Inserting in (7.5) and dividing by g n−1 we
arrive at the quadratic equation.
ϕ
g 2 + 8bµ sin2 g − 1 = 0. (7.7)
2
We note that the suggestion on page 25 on how to avoid invoking the Fourier
n
transform by expressing vm as g n eimϕ also applies in the two-step case and leads
us directly to (7.7).
The main difference is that for a two-step scheme we have two growth factors
and they must both be ≤ 1 or 1 + O(k) for the method to be (0-)stable. The two
growth factors are the two roots of the quadratic (7.7):
r
ϕ 2 ϕ
g = −4bµ sin ± 1 + 16b2 µ2 sin4 . (7.8)
2 2
The two values for g in (7.8) are real and their product is −1. Therefore if one
root is less than 1 in absolute magnitude the other must be larger than 1. The
only exception is for ϕ = 0 where the roots are +1 and −1. We conclude that
the difference scheme is always unstable and therefore not useful in practice.
Remark. Richardson used the method with success in [29] but only for a very
small number of time steps. Since the high frequency components are spawned
from rounding errors they are very small in the beginning and it takes a number
of steps for them to build up to an appreciable size. ✷

7.2 The DuFort-Frankel scheme

In order to remedy this lack of stability, DuFort and Frankel [11] suggested re-
n
placing vm in (7.2) by an average leading to
n+1 n−1 n n+1 n−1 n
vm − vm vm+1 − vm − vm + vm−1
= b (7.9)
2k h2
62
or
n+1 n n n−1
(1 + 2bµ)vm − 2bµ(vm+1 + vm−1 ) − (1 − 2bµ)vm = 0. (7.10)
n
Replacing vm by g n eimϕ and dividing by g n−1 eimϕ we get
(1 + 2bµ)g 2 − 4bµ cos ϕ g − (1 − 2bµ) = 0 (7.11)
with roots
q
4bµ cos ϕ ± 16b2 µ2 cos2 ϕ + 4(1 − 4b2 µ2 )
g = (7.12)
2(1 + 2bµ)
q
2bµ cos ϕ ± 1 − 4b2 µ2 sin2 ϕ
= .
1 + 2bµ
If the discriminant is negative the product of the (absolute values of the complex
conjugate) roots is (1 − 2bµ)/(1 + 2bµ) which is less than 1 in absolute magnitude
implying the same for the roots. If the discriminant is positive the roots are real
and satisfy
2bµ| cos ϕ| + 1 1 + 2bµ
|g| ≤ ≤ =1
1 + 2bµ 1 + 2bµ
so we conclude that the DuFort-Frankel scheme is unconditionally stable. This
is unusual for an explicit scheme so something else must be wrong.
In order to check the accuracy of the scheme we determine the symbol of the
difference operator. So we write esnk eimhξ for vm
n
in (7.9) and get after division
esk − e−sk b
pk,h(s, ξ) = − 2 (eiξh + e−iξh − (esk + e−sk ))
2k h
1 b 1 1
= s + s3 k 2 + O(k 4 ) − 2 (2 −ξ 2h2 + ξ 4 h4 + O(h6 ) −(2 + s2k 2 + s4 k 4 + O(k 6)))
6 h 12 12
2
1 1 k k4
= s + bξ 2 + s3 k 2 − bξ 4 h2 + bs2 2 + O(h4 + k 4 + 2 ).
6 12 h h
We recognize the symbol of ut − buxx in the first two terms. For the scheme to
be consistent the remaining terms must tend to 0, but for this to happen k must
tend to 0 faster than h. In particular, if k = αh2 then the scheme is second
order accurate but then we are back to restrictions on the time step which are
comparable to those for the explicit method.

7.3 Exercise
1. Show that the central-time central-space scheme is second order accurate
in both time and space.

63
64
Chapter 8

Discontinuities

The solutions to the simple heat equation and related parabolic equations are
smooth, i.e. infinitely often differentiable in the interior of the region where the
equation applies. But it happens frequently in the mathematical model that there
is a jump discontinuity between the initial function and a boundary function at
a corner point such as (0, X1 ) or (0, X2 ), or that there is a discontinuity in the
initial function or in one of its derivatives (cf. problem 2 on page 5). If we
exclude a small neighbourhood around the singular point(s) the solution will still
be smooth in the remaining region. But our numerical methods may respond in
various ways to such discontinuities.

8.1 Stability and damping

When we study absolute stability our principal object is that the growth factor
satisfies |g| ≤ 1 and we are not particularly concerned about whether g is positive
or negative or close to +1 or −1. This is fine for smooth initial conditions where
our main concern is that the (small) errors we commit, be they rounding or
truncation errors, stay small. Rounding errors often have considerable high-
frequency parts but since their absolute magnitude is small we can even allow
a certain growth as long as it is bounded. The situation is different with a
discontinuous initial function where significant high-frequency components are
present from the beginning. In the continuous problem these components are
damped effectively. The higher the frequency the more effective the damping.
Not necessarily so for the numerical schemes which we shall study in the following
sections when applied to the simple heat equation ut = buxx .

65
8.2 The growth factor

Recall that the growth factor (damping factor might be a more appropriate name
here) for the explicit method is
ϕ
g(ϕ) = 1 − 4bµ sin2 , −π ≤ ϕ ≤ π. (8.1)
2
The explicit method is absolutely stable when 2bµ ≤ 1 because we then have
|g(ϕ)| ≤ 1 for all ϕ. But for high frequency components ϕ is close to π, and if
2bµ = 1 then g will be negative and close to −1. These components will give
rise to slowly damped oscillations in time. Because of the discrete nature of the
problem for a given choice of step sizes there is a limit to how close ϕ can get
to π (cf. section 6.5) and there is a guaranteed minimum damping per time step.
And since there is a strict bound on the time step, the oscillations will usually
not be a serious problem.
The growth factor for the implicit method (IM) is
1
g(ϕ) = , −π ≤ ϕ ≤ π (8.2)
1 + 4bµ sin2 ϕ2

and we immediately notice that 0 ≤ g ≤ 1 independently of bµ and ϕ. We see


furthermore that g becomes smaller when ϕ approaches π so we are in the ideal
situation where the numerical method mimics the continuous case rather closely.
But the implicit method is only first order accurate in time and although the
solution looks good and smooth it might be rather inaccurate.
For Crank-Nicolson (CN) the growth factor is

1 − 2bµ sin2 ϕ2
g(ϕ) = , −π ≤ ϕ ≤ π (8.3)
1 + 2bµ sin2 ϕ2

and |g| ≤ 1 for all values of bµ and ϕ. If k ≈ h then bµ may be rather large
and when ϕ ≈ π, g can be very close to −1. The consequence is that high
frequency components are damped very slowly and we observe oscillations in
time at certain points in space. Crank-Nicolson produces solutions which are
quite accurate when measured in the 2-norm, but these oscillations which occur
near the points of discontinuity can be rather annoying as also noted by Wood
and Lewis [38]. We shall in the next sections discuss ways of damping these
oscillations.

66
8.3 Reducing the Crank-Nicolson oscillations

8.3.1 AV – the moving average

A device for coping with damped oscillations known from physics and used by
Lindberg [23] for a system of ordinary differential equations is the moving average.
n
If the numerical solution, vm is oscillating in time then we might instead use
n+1 n n−1
n vm + 2vm + vm
wm = . (8.4)
4
as an approximation to the solution at (t, x) = (nk, mh). A main difference
between our approach and Lindberg’s is that he proposes to continue the calcu-
lations based on the average. If (8.4) is used in connection with Crank-Nicolson
the growth factor becomes

g + 2 + g −1 1
gav = = ϕ (8.5)
4 1 − 4b2 µ2 sin4 2

indicating that this method must never be used when bµ is small but that it
will have good performance for oscillatory components when bµ is large. It is
also seen that we may encounter difficulties with small (or even zero) values in
the denominator for small values of ϕ. These unfavourable growth phenomena
for slowly varying components are our reason for not continuing the calculations
with the average value. Instead we propose to compute with a straight Crank-
Nicolson to the end (and one step beyond) and perform the average only at the
points where a solution value shall be recorded.

Table 8.1: Growth factors for IM, CN, and AV at ϕ = π and various bµ.

bµ IM CN AV
0.1 0.7143 0.6667 1.0417
0.5 0.3333 0.0000 –
1 0.2000 −0.3333 −0.3333
10 0.0244 −0.9048 −0.0025
100 0.0025 −0.9900 −0.000025

In Table 8.1 we have given the growth factors for the implicit method, Crank-
Nicolson, and the moving average method for ϕ = π and for various values of bµ
illustrating the good damping effect of AV for large values of bµ and the possible
problems for small values of bµ and/or small values of ϕ.
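These numbers are easy to reproduce directly from (8.2), (8.3), and (8.5). The
following small Python sketch (our own; the function names are not from the
text) evaluates the three growth factors at ϕ = π:

    import numpy as np

    def g_im(bmu, phi):      # implicit method, eq. (8.2)
        return 1.0 / (1.0 + 4.0 * bmu * np.sin(phi / 2) ** 2)

    def g_cn(bmu, phi):      # Crank-Nicolson, eq. (8.3)
        s = 2.0 * bmu * np.sin(phi / 2) ** 2
        return (1.0 - s) / (1.0 + s)

    def g_av(bmu, phi):      # moving average applied to CN, eq. (8.5)
        return 1.0 / (1.0 - 4.0 * bmu ** 2 * np.sin(phi / 2) ** 4)

    phi = np.pi
    for bmu in (0.1, 0.5, 1.0, 10.0, 100.0):
        av = float('inf') if bmu == 0.5 else g_av(bmu, phi)  # zero denominator at bmu = 0.5
        print(f"{bmu:6.1f}  IM {g_im(bmu, phi):8.4f}  CN {g_cn(bmu, phi):8.4f}  AV {av:12.6f}")

The entry marked '–' in Table 8.1 corresponds to the zero denominator of (8.5)
at bµ = 0.5.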

8.3.2 IM1 – One step with IM

As bµ gets bigger the high-frequency components receive less and less damp-
ing from CN but more and more from IM as seen in Table 8.1. At bµ = 100
the damping is only 1% per Crank-Nicolson step, but one step with the implicit
method will reduce the amplitude of the high frequency component by a factor of 0.0025.
Furthermore since the local error of the implicit method is second order in time
(cf. section 3.4) a single step with IM (one ping only) will not affect the second
order accuracy of CN (cf. section 9.8). It will affect the magnitude of the global
truncation error so it is a matter of balancing the effects.

8.3.3 SM – Small steps at the beginning

As seen in Table 8.1 Crank-Nicolson itself can eliminate high frequency compo-
nents if bµ is small enough. So we propose an initial time step k1 such that the
corresponding bµ1 becomes equal to 0.5 and the high frequency component is
annihilated altogether. In practice we should not expect a dramatic effect since
there are also other solution components corresponding to values of ϕ smaller
than π and these will not be reduced to zero. We might therefore consider taking
more than one small step, say s small steps where s could be 5 or 10 or 20. In
this way other solution components will be reduced by the appropriate growth
factor raised to the s-th power. In order to get back to the ’normal’ step size, k,
we may have to take an extra step of length k − sk1 .
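A minimal sketch of this bookkeeping (the function name and the assertion are
ours): with bµ1 = b·k1/h² = 0.5 the small step is k1 = h²/(2b), and an extra
step fills the gap back to the regular time grid.

    def sm_substeps(k, h, b, s):
        # s small steps with b*k1/h**2 = 0.5, then one step back to the regular grid
        k1 = h * h / (2.0 * b)
        rest = k - s * k1
        assert rest >= 0.0, "s small steps already exceed one regular step"
        return [k1] * s + ([rest] if rest > 0.0 else [])

    # example: b = 1, h = 0.1, regular step k = 0.1, s = 5
    print(sm_substeps(0.1, 0.1, 1.0, 5))   # five steps of 0.005 and one of 0.075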

8.3.4 Pearson

It is not necessary to aim at a complete annihilation of the oscillations in one step.
If the first step is subdivided into s equal steps of length k1 = k/s, as suggested
by Pearson [28], then the cumulative damping will be g* = g^s, and a larger value
of bµ1 such as 2 or 5 is acceptable.

8.3.5 EI – Exponentially increasing steps

One problem with both SM and Pearson is that the change in time step from
k1 to k may itself produce unwanted high frequency effects. We might therefore
suggest another way of subdividing the first interval, namely by exponentially
increasing subintervals (cf. [6]) where the subintervals are given by k_i = βk_{i−1},
i = 2, . . . , s, for some β > 1 and with ∑_{i=1}^{s} k_i = k. This gives a smoother transition
from the subintervals to the regular intervals, especially when β is large, i.e. near
2. The Pearson method can be viewed as a special case when β = 1.
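The subdivision itself is a small exercise in geometric series (our own sketch; the
closed form k1 = k(β − 1)/(β^s − 1) follows from summing k1(1 + β + · · · + β^{s−1}) = k):

    def ei_substeps(k, s, beta):
        # substeps k_1, ..., k_s with k_i = beta * k_{i-1} and sum equal to k;
        # beta = 1 reproduces Pearson's s equal steps
        if beta == 1.0:
            return [k / s] * s
        k1 = k * (beta - 1.0) / (beta ** s - 1.0)
        return [k1 * beta ** i for i in range(s)]

    steps = ei_substeps(0.1, 5, 2.0)
    print(steps, sum(steps))   # five steps doubling in size, summing to 0.1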

8.4 Discussion

We refer the reader to [41] for a more detailed treatment and comparison of the
five different proposals. Here we shall just summarize the results by mentioning
that AV and IM1 are very economical but also limited in how large a reduc-
tion of the oscillations they can achieve. Furthermore they perform worse for
other than the highest frequency component and therefore show results which
are poorer than what theory predicts. If the reduction achieved with AV or
IM1 is not sufficient we must resort to one of the other three methods where
any amount of reduction is theoretically possible but at a higher computational
expense. The latter two methods perform better for lower frequency components
and are therefore better than what theory predicts.
The implicit method avoids the problem with oscillations completely but was
ruled out because it is only first order accurate in time. Using the extrapolation
techniques of Chapter 10 we can raise the order of the implicit method and thus
get the best of both worlds, as reported in [20] and [12].

8.5 A discontinuous corner

Consider the following example


ut = uxx , 0 ≤ x ≤ 1, t > 0,
u(0, x) = 1, 0 ≤ x ≤ 1,
u(t, 0) = 0, t > 0,
u(t, 1) = 1, t > 0.

There is a jump discontinuity between the initial value and the boundary value
at (0,0), and we are faced with a decision about which value to choose for v_0^0, a
decision which depends on which numerical method we have chosen.
If we use the explicit method the corner point enters in a second difference of
initial value points, and it would seem natural to use the initial value also at the
corner point, but in fact this only delays the effect one time step.
If we use the implicit method the corner point is not used at all and the problem
disappears. This makes the implicit method seem like an ideal choice despite the
fact that it is only first order accurate in time.

The behaviour of Crank-Nicolson is studied in exercise 1. With the considerations
of the previous sections in mind the best option might be to begin with one
implicit step and then switch to Crank-Nicolson in order to keep the overall
second order accuracy.

8.6 Exercises
1. Solve the problem in the previous example with the implicit method and
Crank-Nicolson (with various choices of v_0^0) with h = k = 0.1 up to t = 0.5.
2. Solve problem 2 with h = k = 1/10, 1/20, and 1/40 using Crank-Nicolson and AV
and IM1 (cf. section 8.3).
Compute the max-norm and the 2-norm of the error for
t = 0.1, 0.2, 0.3, 0.4, 0.5.

Chapter 9

The Global Error – Theoretical Aspects

9.1 The local error

Information about the error of a finite difference scheme for solving a partial
differential equation is often stated in terms of the local error which is the error
committed in one step given correct starting values, or more frequently as the
local truncation error expressed in terms of a Taylor expansion, again for a single
step and with presumed correct starting values. Rather than giving numerical
values one often resorts to giving the order of the scheme in terms of the step size,
such as O(h) or O(h²). The interesting issues, however, are the magnitude of the
global error, i.e. the difference between the true solution and the computed value
at a specified point (in a sense the accumulated value of all the local errors up to
this point), and the order of this error in terms of the step size used.

9.2 The global error

We study the linear, parabolic equation


ut = buxx − aux + κu + ν (9.1)
or as we prefer to write it here
P u = ut − buxx + aux − κu = ν (9.2)
using the partial differential operator P . The coefficients b, a, κ, and ν may
depend on t and x. We produce a numerical solution v(t, x) and our basic as-
sumption is that the global error can be expressed in terms of a series expansion
in the step sizes k and h

v(t, x) = u(t, x) − hc − kd − hke − h²f − k²g − · · ·   (9.3)

The auxiliary functions c, d, e, f, and g are functions of t and x but do not
depend on the step sizes h and k. They need not all be present in any particular
situation. Often we shall observe that c or e or d are identically zero such that
the numerical solution is second order accurate in one or both of the step sizes.
Strictly speaking v(t, x) is only defined on a discrete set of grid points but it is
possible to extend it in a differentiable manner to the whole region. Actually
this can be done in many ways. The same considerations apply to the auxiliary
functions and we shall in the following see a concrete way of extending these.
The formula (9.3) expresses an assumption or a hypothesis and as such can not be
proved, but it leads to predictions which can be verified computationally, thereby
identifying those situations where the hypothesis can be assumed to hold. The
hypothesis expresses the notion that the computed solution contains information,
not only about the true solution, but also about the truncation error.
We can get information on the auxiliary functions by studying the difference
equations and by using Taylor expansions. We first look at the explicit scheme.

9.3 The explicit method

We use the difference operators from section 1.7

δ²v_m^n = (v_{m+1}^n − 2v_m^n + v_{m−1}^n) / h²,   (9.4)

µ̃δv_m^n = (v_{m+1}^n − v_{m−1}^n) / (2h).   (9.5)
2h
The explicit scheme for (9.2) can now be written as

(v_m^{n+1} − v_m^n)/k − b_m^n δ²v_m^n + a_m^n µ̃δv_m^n − κ_m^n v_m^n = ν_m^n.   (9.6)
We apply (9.3) and Taylor expand around (nk, X1 + mh):

(v_m^{n+1} − v_m^n)/k = u_t + (1/2)k u_tt + (1/6)k² u_ttt − h c_t − (1/2)hk c_tt − k d_t − (1/2)k² d_tt
        − hk e_t − h² f_t − k² g_t + O(k³ + k²h + kh² + h³)   (9.7)

δ²v_m^n = u_xx + (1/12)h² u_4x − h c_xx − k d_xx − hk e_xx
        − h² f_xx − k² g_xx + O(· · ·)   (9.8)

µ̃δv_m^n = u_x + (1/6)h² u_xxx − h c_x − k d_x − hk e_x
        − h² f_x − k² g_x + O(· · ·)   (9.9)

v_m^n = u − hc − kd − hke − h²f − k²g + O(· · ·)   (9.10)

We insert (9.7) – (9.10) in (9.6) and equate terms with the same powers of h and
k:

1:   Pu = ν   (9.11)
h:   Pc = 0   (9.12)
k:   Pd = (1/2)u_tt   (9.13)
hk:  Pe = −(1/2)c_tt   (9.14)
h²:  Pf = −(1/12)b u_4x + (1/6)a u_xxx   (9.15)
k²:  Pg = (1/6)u_ttt − (1/2)d_tt   (9.16)

The first thing we notice is that in (9.11) we recover the original equation for u,
indicating that the difference scheme (and our assumption (9.3)) is consistent.
The auxiliary functions are actually only defined on the grid points but inspired
by (9.12) – (9.16) it seems natural to extend them between the gridpoints such
that these differential equations are satisfied at all points in the region. We note
that each of the auxiliary functions should satisfy a differential equation very
similar to the original one, the only difference lying in the inhomogeneous terms.

9.4 The initial condition

In order to secure a unique solution to (9.1) we must impose some side conditions.
One of these is an initial condition, typically of the form

u(0, x) = u0(x),   X1 ≤ x ≤ X2,   (9.17)

where u0(x) is a given function of x. It is natural to expect that we start our
numerical solution as accurately as possible, i.e. we set v_m^0 = v(0, X1 + mh) =
u0(X1 + mh) for all grid points between X1 and X2. But we would like to
extend v between grid points as well, and the natural thing would be to set
v(0, x) = u0(x), X1 ≤ x ≤ X2. With this assumption we see from (9.3) that

c(0, x) = d(0, x) = e(0, x) = f (0, x) = g(0, x) = · · · = 0, X1 ≤ x ≤ X2 . (9.18)

In section 8.3.2 we discussed the positive effects on discontinuities which we can
achieve by using the implicit method for one step and then switching to Crank-
Nicolson. To study the effects on accuracy we note that we shall be solving
with Crank-Nicolson for t > k with an initial condition at t = k given by the
values from the implicit method. Since the local error for the implicit method is
O(k² + kh²) (cf. section 3.4) we have
c(k, x) = d(k, x) = e(k, x) = f (k, x) = 0, X1 ≤ x ≤ X2 (9.19)
in addition to a nonzero value for g(k, x). As we shall see in section 9.8 this has
no effect on the order of the global error for Crank-Nicolson since the differential
equation (9.41) for g is inhomogeneous anyway. It might have an effect on the
magnitude of the global error though.

9.5 Dirichlet boundary conditions

In order to secure uniqueness we must in addition to the initial condition impose
two boundary conditions which could look like
u(t, X1) = u1 (t), u(t, X2 ) = u2 (t), t > 0, (9.20)
where u1 (t) and u2 (t) are two given functions of t. Just like for the initial condition
it is natural to require v(t, x) to satisfy these conditions not only at the grid points
on the boundary but on the whole boundary and as a consequence the auxiliary
functions will all assume the value 0 on the boundary:
c(t, X1 ) = d(t, X1 ) = e(t, X1 ) = f (t, X1 ) = g(t, X1 ) = · · · = 0, t > 0, (9.21)
c(t, X2 ) = d(t, X2 ) = e(t, X2 ) = f (t, X2 ) = g(t, X2 ) = · · · = 0, t > 0. (9.22)

9.6 The error for the explicit method

If we have an initial-boundary value problem for (9.1) with Dirichlet boundary
conditions, and if we use the explicit method for the numerical solution then we
have the following results for the auxiliary functions:
The differential equation (9.12) for c(t, x) is homogeneous and so are the side
conditions according to (9.18), (9.21), and (9.22). c(t, x) ≡ 0 is a solution, and
by uniqueness the only one. It follows that c(t, x) ≡ 0 and therefore that there
is no h-contribution to the global error in (9.3).
The differential equation (9.14) for e(t, x) is apparently inhomogeneous, but since
c(t, x) ≡ 0 so is ctt and the equation is homogeneous after all. So are the side
conditions and we can conclude that e(t, x) ≡ 0.

The global error expression (9.3) for the explicit method therefore takes the form

v(t, x) = u(t, x) − kd − h²f − k²g − · · ·   (9.23)

and we deduce that the explicit method is indeed first order in time and second
order in space.
For d we have from (9.13) that Pd = (1/2)u_tt so we must require the problem to
be such that u is twice differentiable w.r.t. t. This is usually no problem except
possibly in small neighbourhoods around isolated points on the boundary.

9.7 The implicit method

We write the implicit method as

(v_m^n − v_m^{n−1})/k − b_m^n δ²v_m^n + a_m^n µ̃δv_m^n − κ_m^n v_m^n = ν_m^n.   (9.24)
where time step n − 1 now contains the known values and time step n the values
we are about to calculate. Equations (9.8) – (9.10) still hold while equation (9.7)
is replaced by:
(v_m^n − v_m^{n−1})/k = u_t − (1/2)k u_tt + (1/6)k² u_ttt − h c_t + (1/2)hk c_tt − k d_t + (1/2)k² d_tt
        − hk e_t − h² f_t − k² g_t + O(k³ + k²h + kh² + h³)   (9.25)

Equating terms as before we get a set of equations rather similar to (9.11) –
(9.16). (9.11) and (9.12) are unchanged, there is a single sign change in (9.14),
and we can still conclude that c(t, x) ≡ e(t, x) ≡ 0. The remaining equations are

k:   Pd = −(1/2)u_tt   (9.26)
h²:  Pf = −(1/12)b u_4x + (1/6)a u_xxx   (9.27)
k²:  Pg = (1/6)u_ttt + (1/2)d_tt   (9.28)
and the error expansion for the implicit method has the same form as (9.23).
Since there is a sign change in (9.26) as compared to (9.13) we can conclude that
d_Im(t, x) = −d_Ex(t, x). The right-hand side of (9.27) is the same as in (9.15)
and the sign change in the right-hand side of (9.28) is compensated by d being
of opposite sign. We therefore have that f (t, x) and g(t, x) are the same for the
explicit and the implicit method.

9.7.1 An example

Consider test problem 1:


ut = uxx , −1 ≤ x ≤ 1, t > 0,
u(0, x) = u0 (x) = cos x, −1 ≤ x ≤ 1,
u(t, −1) = u(t, 1) = e−t cos 1, t > 0.

with the true solution u(t, x) = e−t cos x.


For the explicit method we have

Pd = d_t − d_xx = (1/2)u_tt.

For f we have similarly

Pf = f_t − f_xx = −(1/12)u_4x.

Since u_t = u_xx we have u_tt = u_txx = u_4x such that d(t, x) = −6f(t, x).
For the explicit method we must have k = µh² with µ ≤ 1/2. Keeping µ fixed the
leading terms in the error expansion are

kd + h²f = −6µh²f + h²f = (1 − 6µ)h²f.

If we choose µ = 1/2 as is common we get the leading term of the error to be
−2h²f(t, x). There is an obvious advantage in choosing µ = 1/6 in which case we
obtain fourth order accuracy in h.
If we use the implicit method f stays the same and d changes sign and the leading
terms of the error expansion become

6µh²f + h²f = (1 + 6µ)h²f.

With µ = 1/2 the error becomes 4h²f(t, x), i.e. twice as big (and of opposite sign)
as for the explicit method (cf. the solutions to exercises 1 and 5 of Chapter 1).
There is no value of µ that will secure higher order accuracy.
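This prediction is easy to test numerically. The sketch below (our own, using
numpy) runs the explicit scheme on the test problem up to t = 0.5 and prints
the max-norm error for µ = 1/2 and µ = 1/6 at two values of h; the errors for
µ = 1/6 should be dramatically smaller and decrease roughly as h⁴:

    import numpy as np

    def explicit_heat(h, mu, T=0.5):
        # explicit scheme for u_t = u_xx on [-1,1], u(0,x) = cos x,
        # u(t,±1) = exp(-t) cos 1; returns the max error at t = T
        k = mu * h * h
        x = np.linspace(-1.0, 1.0, int(round(2.0 / h)) + 1)
        v = np.cos(x)
        n_steps = int(round(T / k))
        for n in range(1, n_steps + 1):
            v[1:-1] += mu * (v[2:] - 2.0 * v[1:-1] + v[:-2])
            v[0] = v[-1] = np.exp(-n * k) * np.cos(1.0)
        return np.max(np.abs(np.exp(-n_steps * k) * np.cos(x) - v))

    for mu in (0.5, 1.0 / 6.0):
        print(f"mu = {mu:.4f}:", [explicit_heat(h, mu) for h in (0.1, 0.05)])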

9.8 Crank-Nicolson

The Crank-Nicolson method can be written as

(v_m^{n+1} − v_m^n)/k − (1/2)b_m^{n+1} δ²v_m^{n+1} − (1/2)b_m^n δ²v_m^n
    + (1/2)a_m^{n+1} µ̃δv_m^{n+1} + (1/2)a_m^n µ̃δv_m^n
    − (1/2)κ_m^{n+1} v_m^{n+1} − (1/2)κ_m^n v_m^n = (1/2)(ν_m^{n+1} + ν_m^n).   (9.29)
The optimal expansion point is now ((n + 1/2)k, mh). To take full advantage of
the even/odd cancellations we split the expansion in two stages. First we do the
expansions (9.8) and (9.9) for time step n and n + 1 and then we combine the
results using the formula

(1/2)u^{n+1} + (1/2)u^n = u^{n+1/2} + (1/8)k² u_tt + O(k⁴)   (9.30)
on all the individual terms. The resulting equations are

(v_m^{n+1} − v_m^n)/k = u_t + (1/24)k² u_ttt − h c_t − k d_t − hk e_t
        − h² f_t − k² g_t + O(k³ + k²h + kh² + h³)   (9.31)

(1/2)(b_m^{n+1} δ²v_m^{n+1} + b_m^n δ²v_m^n) = b_m^{n+1/2} {u_xx + (1/12)h² u_4x − h c_xx − k d_xx
        − hk e_xx − h² f_xx − k² g_xx} + (1/8)k² (b u_xx)_tt + O(· · ·)   (9.32)

(1/2)(a_m^{n+1} µ̃δv_m^{n+1} + a_m^n µ̃δv_m^n) = a_m^{n+1/2} {u_x + (1/6)h² u_xxx − h c_x − k d_x − hk e_x
        − h² f_x − k² g_x} + (1/8)k² (a u_x)_tt + O(· · ·)   (9.33)

(1/2)(κ_m^{n+1} v_m^{n+1} + κ_m^n v_m^n) = κ_m^{n+1/2} {u − hc − kd − hke − h²f − k²g}
        + (1/8)k² (κu)_tt + O(· · ·)   (9.34)

(1/2)(ν_m^{n+1} + ν_m^n) = ν + (1/8)k² ν_tt + O(· · ·)   (9.35)
We insert (9.31) – (9.35) in (9.29) and equate terms with the same powers of h
and k:

1:   Pu = ν   (9.36)
h:   Pc = 0   (9.37)
k:   Pd = 0   (9.38)
hk:  Pe = 0   (9.39)
h²:  Pf = −(1/12)b u_4x + (1/6)a u_xxx   (9.40)
k²:  Pg = (1/24)u_ttt − (1/8)(b u_xx)_tt + (1/8)(a u_x)_tt − (1/8)(κu)_tt − (1/8)ν_tt   (9.41)
The right-hand side in (9.41) looks rather complicated, but if the solution to (9.1)
is smooth enough that we can differentiate (9.1) twice w.r.t. t then we can
combine the last four terms in (9.41) to −(1/8)u_ttt and the equation becomes

k²:  Pg = −(1/12)u_ttt   (9.42)
If the inhomogeneous term ν(t, x) in the equation (9.1) can be evaluated at the
mid-points ((n + 1/2)k, mh) then it is tempting to use ν_m^{n+1/2} instead of (1/2)(ν_m^{n+1} + ν_m^n)
in (9.29). We shall then miss the term with (1/8)ν_tt in (9.41) and therefore not have
complete advantage of the reduction leading to (9.42). Instead we shall have

k²:  Pg = −(1/12)u_ttt + (1/8)ν_tt.
It is impossible to say in general which is better, but certainly (9.42) is simpler.
Looking at equations (9.36) – (9.42) we again recognize the original equation for
u in (9.36), and from (9.37) – (9.39) we may conclude that c(t, x) ≡ d(t, x) ≡
e(t, x) ≡ 0 showing that Crank-Nicolson is indeed second order in both k and h.
We also note from (9.40) that f (t, x) for Crank-Nicolson is the same function as
for the explicit and the implicit method.

9.8.1 Example continued

For our example we have

Pg = g_t − g_xx = −(1/12)u_ttt.

For this particular problem we have u_ttt = −u_tt = −u_4x such that
f(t, x) = −g(t, x) and the leading terms of the error are

h²f + k²g = (h² − k²)f.

There is a distinct advantage in choosing k = h in which case the second order
terms in the error expansion will cancel (cf. the answer to exercise 12 of Chapter
1), but we must stress that this holds for this particular example and is not a
general result for Crank-Nicolson.
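The cancellation is easy to observe in practice. The following sketch (ours; it
uses dense linear algebra purely for simplicity) runs Crank-Nicolson on test
problem 1 and compares k = h with k = h/2; with equal step sizes the max error
at t = 0.5 drops far below the O(h²) level seen otherwise:

    import numpy as np

    def crank_nicolson(h, k, T=0.5):
        # CN for u_t = u_xx on [-1,1], u(0,x) = cos x, u(t,±1) = exp(-t) cos 1
        x = np.linspace(-1.0, 1.0, int(round(2.0 / h)) + 1)
        m = len(x) - 2                      # number of interior unknowns
        r = k / (2.0 * h * h)
        A = (np.diag((1 + 2 * r) * np.ones(m)) + np.diag(-r * np.ones(m - 1), 1)
             + np.diag(-r * np.ones(m - 1), -1))
        B = (np.diag((1 - 2 * r) * np.ones(m)) + np.diag(r * np.ones(m - 1), 1)
             + np.diag(r * np.ones(m - 1), -1))
        v = np.cos(x)
        for n in range(int(round(T / k))):
            bc_old = np.exp(-n * k) * np.cos(1.0)
            bc_new = np.exp(-(n + 1) * k) * np.cos(1.0)
            rhs = B @ v[1:-1]
            rhs[0] += r * (bc_old + bc_new)    # boundary contributions
            rhs[-1] += r * (bc_old + bc_new)
            v[1:-1] = np.linalg.solve(A, rhs)
            v[0] = v[-1] = bc_new
        return np.max(np.abs(np.exp(-T) * np.cos(x) - v))

    print(crank_nicolson(0.1, 0.1), crank_nicolson(0.1, 0.05))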

9.9 Upwind schemes

When |a| is large compared to b we occasionally observe oscillations in the nu-
merical solution. One remedy is to reduce the step sizes but this costs computer
time. Another option is to use an upwind scheme such as in the explicit case (for
a > 0):

(v_m^{n+1} − v_m^n)/k − b_m^n δ²v_m^n + a_m^n (v_m^n − v_{m−1}^n)/h − κ_m^n v_m^n = ν_m^n.   (9.43)
To analyze the effect on the error we use

(v_m^n − v_{m−1}^n)/h = u_x − (1/2)h u_xx + (1/6)h² u_xxx − h c_x + (1/2)h² c_xx
        − k d_x + (1/2)hk d_xx − hk e_x − h² f_x − k² g_x + O(· · ·)   (9.44)

together with (9.7), (9.8), and (9.10). Equating terms with the same powers of
h and k now gives

1:   Pu = ν   (9.45)
h:   Pc = −(1/2)a u_xx   (9.46)
k:   Pd = (1/2)u_tt   (9.47)
hk:  Pe = −(1/2)c_tt + (1/2)a d_xx   (9.48)

From (9.46) and (9.48) we conclude that c(t, x) and e(t, x) are no longer identically
zero and that the method is now first order in both k and h. A similar result
holds for the implicit scheme. For Crank-Nicolson the order in h is also reduced
to 1 but we keep second order accuracy in k.

9.10 Boundary conditions with a derivative

If one of the boundary conditions involves a derivative then the discretization of
this has an effect on the global error of the numerical solution. Assume that the
condition on the left boundary is

αu(t, X1) − βu_x(t, X1) = γ,   t > 0   (9.49)

where α, β and γ may depend on t. A similar condition might be imposed on the
other boundary and the considerations would be completely similar, so we shall
just consider a derivative condition on one boundary. We shall in turn study
three different discretizations of the derivative in (9.49):

(v_1^n − v_0^n)/h   (first order)   (9.50)

(−v_2^n + 4v_1^n − 3v_0^n)/(2h)   (second order, asymmetric)   (9.51)

(v_1^n − v_{−1}^n)/(2h)   (second order, symmetric)   (9.52)
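The orders of the three approximations are easy to confirm on a smooth test
function (our own sketch; we use u = exp with u′(0) = 1 so that the symmetric
formula gains nothing from accidental symmetry):

    import numpy as np

    u, du0 = np.exp, 1.0   # test function and its exact derivative at x = 0

    for h in (0.1, 0.05, 0.025):
        e1  = (u(h) - u(0.0)) / h - du0                    # (9.50): error ~ (h/2) u''
        e2a = (-u(2*h) + 4*u(h) - 3*u(0.0)) / (2*h) - du0  # (9.51): error ~ -(h^2/3) u'''
        e2s = (u(h) - u(-h)) / (2*h) - du0                 # (9.52): error ~ (h^2/6) u'''
        print(f"h={h:5.3f}  first={e1:+.2e}  asym={e2a:+.2e}  sym={e2s:+.2e}")

Halving h should roughly halve the first error and quarter the other two.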

9.10.1 First order approximation

We first use the approximation (9.50) in (9.49). If the coefficients α, β and γ
depend on t they should be evaluated at t = nk:

αv_0^n − β (v_1^n − v_0^n)/h = γ,   t > 0.   (9.53)

We now use the assumption (9.3) and Taylor-expand v_1^n around (nk, X1):

α{u − hc − kd − hke − h²f − k²g} − β{u_x + (1/2)h u_xx + (1/6)h² u_xxx − h c_x
− (1/2)h² c_xx − k d_x − (1/2)hk d_xx − hk e_x − h² f_x − k² g_x} − γ = O(· · ·)   (9.54)
Collecting terms with 1, h, k, hk, h², and k² as before we get

1:   αu − βu_x = γ   (9.55)
h:   αc − βc_x = −(1/2)βu_xx   (9.56)
k:   αd − βd_x = 0   (9.57)
hk:  αe − βe_x = (1/2)βd_xx   (9.58)
h²:  αf − βf_x = −(1/6)βu_xxx + (1/2)βc_xx   (9.59)
k²:  αg − βg_x = 0   (9.60)

We recognize the condition (9.49) for u in (9.55). As for c the boundary condition
(9.56) is no longer homogeneous and we shall expect c to be nonzero. This holds
independently of which method is used for the discretization of the equation (9.1).
So if we use a first order boundary approximation we get a global error which is
first order in h.

9.10.2 Asymmetric second order

We now apply the approximation (9.51) in (9.49):

αv_0^n − β (−v_2^n + 4v_1^n − 3v_0^n)/(2h) = γ,   t > 0.   (9.61)

We again use the assumption (9.3) and Taylor-expand v_1^n and v_2^n around (nk, X1):

α{u − hc − kd − hke − h²f − k²g} − β{u_x − (1/3)h² u_xxx − h c_x
− k d_x − hk e_x − h² f_x − k² g_x} − γ = O(h³ + h²k + hk² + k³)   (9.62)

Figure 9.1: c(t, x) (contour plot over 0 ≤ t ≤ 1, 0 ≤ x ≤ 1).

Collecting terms with 1, h, k, hk, h², and k² as before we get

1:   αu − βu_x = γ   (9.63)
h:   αc − βc_x = 0   (9.64)
k:   αd − βd_x = 0   (9.65)
hk:  αe − βe_x = 0   (9.66)
h²:  αf − βf_x = (1/3)βu_xxx   (9.67)
k²:  αg − βg_x = 0   (9.68)
We recognize the condition (9.49) for u in (9.63). We now have a homogeneous
condition (9.64) for c and this will assure that c(t, x) ≡ 0 when we combine
(9.61) with the explicit, the implicit, or the Crank-Nicolson method. We also
have e(t, x) ≡ 0, but in order to have d(t, x) ≡ 0 we must use the Crank-Nicolson
method. One disadvantage with this asymmetric approximation which does not
show in equations (9.63) – (9.68) is that the next h-term is third order and
therefore can be expected to interfere more than the fourth order term which is
present in the symmetric case below.

9.10.3 Symmetric second order

We finally apply the symmetric approximation (9.52) in (9.49):

αv_0^n − β (v_1^n − v_{−1}^n)/(2h) = γ,   t > 0.   (9.69)
Figure 9.2: g(t, x) (contour plot over 0 ≤ t ≤ 1, 0 ≤ x ≤ 1).

We again use the assumption (9.3) and Taylor-expand v_1^n and v_{−1}^n around (nk, X1):

α{u − hc − kd − hke − h²f − k²g} − β{u_x + (1/6)h² u_xxx − h c_x
− k d_x − hk e_x − h² f_x − k² g_x} − γ = O(h³ + h²k + hk² + k³)   (9.70)

Collecting terms with 1, h, k, hk, h², and k² as before we get

1:   αu − βu_x = γ   (9.71)
h:   αc − βc_x = 0   (9.72)
k:   αd − βd_x = 0   (9.73)
hk:  αe − βe_x = 0   (9.74)
h²:  αf − βf_x = −(1/6)βu_xxx   (9.75)
k²:  αg − βg_x = 0   (9.76)
All the same conclusions as for the asymmetric case will hold also in this sym-
metric case.

9.10.4 Test problem 3 revisited

To illustrate the above analyses let us look at test problem 3 on page 36:
ut = uxx , 0 ≤ x ≤ 1, t > 0,

u(0, x) = u0 (x) = cos x, 0 ≤ x ≤ 1,
u(t, 1) = e−t cos 1, t > 0,
ux (t, 0) = 0, t > 0.

with the true solution u(t, x) = e−t cos x.


We wish to solve numerically using Crank-Nicolson and want to study the be-
haviour of the global error using various discretizations of the derivative boundary
condition.
Using the first order boundary approximation we have d = e = 0 and the global
error will be of the form

hc + h²f + k²g + · · ·
We have solved the initial-boundary value problems for the functions c(t, x) and
g(t, x) (using Crank-Nicolson and h = k = 0.025) and show the results graphically
in Fig. 9.1 and Fig. 9.2.
It is clear from the figures that the first order contribution to the error is consid-
erable. The values of c(t, x) lie between 0 and 0.28 and those of g(t, x) between 0
and 0.022 and from this we could estimate the truncation error for given values
of the step sizes h and k. Or we could suggest step sizes in order to make the
truncation error smaller than a given tolerance.
With the second order boundary approximations c(t, x) is expected to be identi-
cally 0 and the accuracy correspondingly better. The derivative boundary con-
dition for f reduces to fx (t, 0) = 0 for this particular case since uxxx(t, 0) =
e−t sin 0 = 0. We can therefore again conclude that f (t, x) = −g(t, x).
In more general situations it is not so easy to gain information about the auxiliary
functions in this way. In the next chapter we shall see how we can let the computer
do the work and verify our assumptions about the order of the error and at the
same time gain information about the magnitude of the error.

9.11 Exercise
1. Solve problem 1 on page 5 with the explicit method, the implicit method
and Crank-Nicolson from t = 0 to t = 0.5 with h = 1/10, 1/20, and 1/40, and with
µ (= k/h²) = 1/6.
Compute the max-norm and the 2-norm of the error for each method for
t = 0.1, 0.2, 0.3, 0.4, 0.5.

Chapter 10

Estimating the Global Error and Order

10.1 Introduction

In the previous chapter we introduced the basic hypothesis that the global error
could be expressed as a power series in h and k and with the auxiliary functions
c, d, . . . , and we found differential equations defining these functions. In this
chapter we shall see how we can get the computer to help us acquire information
on the order of the method and the magnitude of the auxiliary functions at the
grid points.

10.2 The global error

We shall begin our analysis in one dimension and later extend it to functions of
two or more variables. We shall first define what we mean by the global error
being of order say O(h). Let u(x) be the true solution, and let v(x) be the
computed solution. Our basic hypothesis is (as in Chapter 9) that the computed
solution can be written as

v(x) = u(x) − hc(x) − h²d(x) − h³f(x) − h⁴g(x) − · · ·   (10.1)

where c(x), d(x), f (x), and g(x) are differentiable functions of x alone, the de-
pendence of v on the step size h being expressed through the power series in h.
This is a hypothesis and as such can not be proved, but it leads to predictions
which can be verified computationally, thereby identifying those situations where
the hypothesis can be assumed to hold.

If the function c(x) happens to be identically 0 then the method is (at least) of
second order, otherwise it is of first order. Even if c(x) is not identically 0 it
might very well have isolated zeroes. At such places our analysis might give results
which are difficult to interpret correctly. Therefore the analysis should always be
performed for a substantial set of grid points in order to give trustworthy results.
In the following we shall show how, by performing calculations with various
values of the step size h, we can extract information not only about the true solution
but also about the order and magnitude of the error.
A calculation with step size h will yield

v1 = u − hc − h²d − h³f − h⁴g − · · ·   (10.2)

A second calculation with twice as large a step size gives

v2 = u − 2hc − 4h²d − 8h³f − 16h⁴g − · · ·   (10.3)

We can now eliminate u by subtraction:

v1 − v2 = hc + 3h²d + 7h³f + 15h⁴g + · · ·   (10.4)

A third calculation with 4h is necessary to retrieve information about the order

v3 = u − 4hc − 16h²d − 64h³f − 256h⁴g − · · ·   (10.5)

whence

v2 − v3 = 2hc + 12h²d + 56h³f + 240h⁴g + · · ·   (10.6)

and a division gives the order-ratio:

q = (v2 − v3)/(v1 − v2) = 2 · (c + 6hd + 28h²f + 120h³g + · · ·)/(c + 3hd + 7h²f + 15h³g + · · ·).   (10.7)

This ratio can be computed in all those points where we have information from all
three calculations, i.e. all grid points corresponding to the last calculation with
step size 4h.
If c ≠ 0 and h is suitably small we shall observe numbers in the neighbourhood of
2 in all points, and this would indicate that the method is of first order. If c = 0
and d ≠ 0, then the quotient will assume values close to 4 and if this happens for
many points and not just at isolated spots then we can deduce that c is identically
0 and that the method is of second order. The smaller h, the smaller the influence
of the next terms in the numerator and the denominator, and the picture should
become clearer.

The error in the first calculation, v1, is given by

e1 = u − v1 = hc + h²d + h³f + h⁴g + · · ·   (10.8)

If we observe many values of the order-ratio (10.7) in the neighbourhood of 2,
indicating that |c| is substantially larger than h|d|, and that the method therefore
is of first order, then e1 is represented reasonably well by v1 − v2:

e1 = v1 − v2 − 2h²d − 6h³f − · · ·   (10.9)

and v1 − v2 can be used as an estimate of the error in v1.
One could choose to add v1 − v2 to v1 and thereby get more accurate results.
This process is called Richardson extrapolation and can be done for all grid points
involved in the calculation of v2:

v1′ = v1 + (v1 − v2) = u + 2h²d + 6h³f + 14h⁴g + · · ·   (10.10)

If the error (estimate) behaves nicely we might even consider interpolating to the
intermediate points and thus get extrapolated values with spacing h. Interpola-
tion or not, we cannot at the same time, i.e. without doing some extra work, get
a realistic estimate of the error in this improved value. The old estimate can of
course still be used but it is expected to be rather pessimistic.
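The whole procedure is easy to mimic with synthetic data obeying the hypothesis
(10.1) (our own sketch; the functions u, c, and d are invented for the
demonstration, with c kept away from zero):

    import numpy as np

    u = lambda x: np.sin(x)          # 'true solution' (invented for the demo)
    c = lambda x: np.cos(x) + 2.0    # first order error function, nonzero on the grid
    d = lambda x: x * x              # second order error function

    def v(x, h):                     # 'computed solution' with step size h
        return u(x) - h * c(x) - h ** 2 * d(x)

    x = np.linspace(0.2, 1.0, 5)     # grid points of the coarsest calculation
    h = 0.01
    v1, v2, v3 = v(x, h), v(x, 2 * h), v(x, 4 * h)

    q = (v2 - v3) / (v1 - v2)        # order-ratio (10.7): values near 2
    print("order ratio        :", q)
    print("estimate v1 - v2   :", v1 - v2)
    print("true error u - v1  :", u(x) - v1)
    print("after extrapolation:", u(x) - (v1 + (v1 - v2)))   # cf. (10.10)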
If in contrast we observe many numbers in the neighbourhood of 4 then |c| is
substantially less than h|d| and is probably 0. At the same time |d| will be larger
than h|f |, and the method would be of second order with an error

e2 = u − v1 = h²d + h³f + h⁴g + · · ·   (10.11)

This error will be estimated nicely by (v1 − v2)/3:

e2 = (1/3)(v1 − v2) − (4/3)h³f − 4h⁴g − · · ·   (10.12)
It is thus important to check the order before calculating an estimate of the error
and certainly before making any corrections using this estimate. If in doubt it
is usually safer to estimate the order on the low side. If the order is 2 and the
correct error estimate therefore (v1 − v2 )/3, then misjudging the order to be 1
and using v1 − v2 for the error estimate would not be terribly bad, and actually
on the safe side. But if we want to attempt Richardson extrapolation it is very
important to have the right order.
If our task is to compute function values with a prescribed error tolerance then the
error estimates can also be used to predict a suitable step size which would satisfy
this requirement and in the second round to check that the ensuing calculations
are satisfactory.

How expensive are these extra calculations which are needed to gain information
on the error? We shall compare with the computational work for v1 under the
assumption that the work is proportional to the number of grid points. Therefore
v2 costs half as much as v1 , and v3 costs one fourth. The work involved in
calculating v1 − v2 , v2 − v3 and their quotient which is done for 1/4 of the grid
points will not be considered since it is assumed to be considerably less than the
fundamental difference calculations.
The work involved in finding v1 , v2 and v3 is therefore 1.75, i.e. an extra cost
of 75%, and that is actually very inexpensive for an error estimate. Getting
information on the magnitude of the error enables us to choose a realistic step
size and thus meet the requirements without performing too many unnecessary
calculations. If the numbers allow an extrapolation then the result of this is
expected to be much better than a calculation with half the step size and we are
certainly better off. If the computational work increases faster than the number
of grid points then the result is even more in favour of the present method.

Figure 10.1: The function w(y) = 2(1 + 2y)/(1 + y).

10.3 Can we trust these results?

Yes, if we really observe values of the order-ratio (10.7) between say 1.8 and 2.2
for all relevant grid points then the method is of first order and the first term in
the remainder series dominates the rest. Discrepancies from this pattern in small
areas are also allowed. They may be due to the fact that c(x) has an isolated

zero. This can be checked by observing the values of v1 − v2 in a neighbourhood.
These numbers which are usually dominated by the term hc will then become
smaller and display a change of sign indicating that c(x) has a zero somewhere in
the neighbourhood. The zero of c(x) and that of v1 − v2 will usually not coincide,
since the latter will correspond to c(x) ≈ −3hd(x). The global error itself will also
display a sign change and thus be small and pass through zero somewhere close.
We don’t know precisely where, and the exact location is also of academic interest
only, since we only have information on the computed solution at a discrete set
of points. In a small neighbourhood around this zero the error estimate, v1 − v2
may not even reproduce the sign of the error correctly, but as long as the absolute
value is small this is of lesser significance. The important thing is that the error
estimate is reliable in sign and magnitude when the error is large, and this will
be the case as long as the order ratio stays close to 2.
If a method is of first order and we choose to exploit the error estimate to adjust
the calculated value (i.e. to perform Richardson extrapolation) then it might be
reasonable to assume that the resulting method is of second order as indicated in
(10.10). This of course can be tested by repeating the above process. We shall
need a fourth calculation v4 (with step size 8h), such that we can compute three
extrapolated values, vq′ = vq + (vq − vq+1 ), q = 1, 2, 3, on the basis of which
we can get information about the (new) order. We of course expect the order
to be at least 2, but it is important to have this extra assurance that our basic
hypothesis is valid. If the results do not confirm this then it might be an idea to
review the calculations.
What will actually happen if we perform a Richardson extrapolation based on
a wrong assumption about the order? Usually not too much. If we attempt to
eliminate a second order term in a first order calculation then the result will still
be of first order; and if we attempt to eliminate a first order term in a second
order process then the absolute value of the error will double but the result will
retain its high order.
If we want to understand in detail what might happen to the order-ratio (10.7) in
the strange areas, i.e. how the ratio might vary when h|d| is not small compared
to |c|, then we can consider the behaviour of the function

w(y) = 2(1 + 2y)/(1 + y)   (10.13)

where y = 3h·d/c (see Fig. 10.1).


If y is positive, then 2 < w(y) < 4, and w(y) → 4, when y → ∞.
This corresponds to c and d having the same sign.
If y is small then w(y) ≈ 2.
If y is large and negative then w(y) > 4, and w(y) → 4 when y → −∞.

The situation y → ±∞ corresponds to c = 0, i.e. that the method is of second
order.
The picture becomes rather blurred when y is close to −1, i.e. when c and d have
opposite sign and c ≈ −3hd:

y ↑ −1 ⇒ w → +∞
y ↓ −1 ⇒ w → −∞
−1 < y < −1/2 ⇒ w < 0
But in these cases we are far away from |c| ≫ h|d|.
Reducing the step size by one half corresponds to reducing y by one half.
If 0 < w(y) < 4 then w(y/2) will be closer to 2.
If w(y) < 0 then 0 < w(y/2) < 2.
If 6 < w(y) then w(y/2) < 0.
If 4 < w(y) < 6 then w(y/2) > w(y).
If c and d have opposite sign and c is not dominant, the picture will be rather
chaotic, but a suitable reduction of h will result in a clearer picture if the funda-
mental assumptions are valid.
We have been rather detailed in our analysis of first order methods with a non-
vanishing second order term. Quite similar analyses can be made for second and
third order, or for second and fourth order or for higher orders. If the ratio (10.7)
is close to 2^p then our method is of order p.

10.4 Further improvements of the error estimate

The error estimates we compute are just estimates and not upper bounds on the
magnitude of the error. They will often be very realistic, but they may sometimes
underestimate the error, and it would be useful to identify those situations. The
following analysis will show that this happens when the order ratio is consistently
smaller than 2p . On the other hand, if the order ratio is larger than 2p then the
error estimate is usually a (slight) overestimate.
If the method is first order (c ≠ 0) we expect the next term in the error expansion
to be second order (d ≠ 0) and we have

q = (v2 − v3)/(v1 − v2) ≈ 2(1 + 2y)/(1 + y) ≈ 2(1 + y)   with y = 3h·d/c   (10.14)

If we observe values q = 2(1 + ε) then we have y ≈ ε = (q − 2)/2.

From (10.4) and (10.9) we have

v1 − v2 = hc(1 + 3h·d/c + · · ·) = hc(1 + y + · · ·)   (10.15)

and

e1 = (v1 − v2)(1 − (2h·d/c + · · ·)/(1 + y + · · ·)) ≈ (v1 − v2)(1 − (2/3)y).   (10.16)

If ε > 0 (i.e. q > 2) then the error is smaller than the estimate v1 − v2 , and if
ε < 0 (i.e. q < 2) then the error is larger than the estimate.
Since y ≈ ε = (q − 2)/2 we can even compensate for the effect taking as our
improved error estimate the value
est12 = (v1 − v2)(1 − (2/3)ε) = (v1 − v2)(1 − (q − 2)/3).   (10.17)
A direct calculation reveals that

est12 = e1 − 8h³f − · · ·   (10.18)

showing that this improved estimate takes both the first and the second order
term into account.
We must of course be careful with these calculations. They should only be used
when ε is small and varies slowly over the region in question.
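In code the compensated estimate is a one-line helper (our own; it works
elementwise on numpy arrays as well as on scalars):

    def improved_estimate_first_order(v1, v2, v3):
        # eq. (10.17): est12 = (v1 - v2)(1 - (q - 2)/3), q the order-ratio (10.7)
        q = (v2 - v3) / (v1 - v2)
        return (v1 - v2) * (1.0 - (q - 2.0) / 3.0)

As in the text it should only be applied where q stays close to 2 and varies slowly.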
If the method is second order (c = 0, d ≠ 0) then the next term in the error
expansion might be third order (f ≠ 0) or fourth order (f = 0, g ≠ 0). In any
case

q = (v2 − v3)/(v1 − v2) ≈ 4(1 + 2y + 4z)/(1 + y + z)   (10.19)

with

y = (7/3)h·f/d,   z = 5h²·g/d   (10.20)

From (10.4) and (10.12) we have

v1 − v2 = 3h²d(1 + (7/3)h·f/d + 5h²·g/d + · · ·) = 3h²d(1 + y + z + · · ·)   (10.21)

and

e2 = ((v1 − v2)/3)(1 − (4h³f + 12h⁴g + · · ·)/(3h²d + · · ·)) ≈ ((v1 − v2)/3)(1 − (4/7)y − (4/5)z).   (10.22)

If we have a second and a third order term then y will probably dominate z and

q ≈ 4(1 + 2y)/(1 + y) ≈ 4(1 + y)

and if we observe values q = 4(1 + ε) then we have ε ≈ y and the error estimate
should be

est23 = ((v1 − v2)/3)(1 − (4/7)ε) = ((v1 − v2)/3)(1 − (4/7)·(q − 4)/4).   (10.23)
If the next term is fourth order then y = 0 and

q ≈ 4(1 + 4z)/(1 + z) ≈ 4(1 + 3z)

and if we observe values q = 4(1 + ε) then we have ε ≈ 3z and the error estimate
should be

est24 = ((v1 − v2)/3)(1 − (4/15)ε) = ((v1 − v2)/3)(1 − (4/15)·(q − 4)/4).   (10.24)
Although we might have our suspicions it is not easy to know whether the next
term is third or fourth order and this is important in order to decide which
correction to apply. We can therefore not recommend using (10.23) or (10.24)
directly for error estimation or extrapolation. To be on the safe side we instead
suggest the following guidelines for error estimation:
If ε > 0 (i.e. q > 4) then (v1 − v2 )/3 is probably larger than the error and can
safely be used as an error estimate.
If ε < 0 (i.e. q < 4) then the error is larger than (v1 − v2 )/3 and we recommend
using (1 − (4/7)ε)(v1 − v2)/3 as the error estimate. If the next term is fourth order
we shall be on the safe side; if it is third order the estimate will probably be more
realistic, but it might be a slight underestimate.

10.5 Two independent variables

If u is a function of two or more variables then we can perform similar analyses
taking one variable at a time. If say u(t, x) is a function of two variables, t and
x, and v is a numerical approximation based on step sizes k and h then our basic
hypothesis would be

v1 = u − hc − kd − hke − h²f − k²g − · · ·   (10.25)

A calculation with 2h and k gives

v2 = u − 2hc − kd − 2hke − 4h²f − k²g − · · ·

such that

v1 − v2 = hc + hke + 3h²f + · · ·   (10.26)

To check the order (in h) we need a third calculation with step sizes 4h and k:

v3 = u − 4hc − kd − 4hke − 16h²f − k²g − · · ·

and we have

v2 − v3 = 2hc + 2hke + 12h²f + · · ·

and the order-ratio

(v2 − v3)/(v1 − v2) = 2 · (c + ke + 6hf + · · ·)/(c + ke + 3hf + · · ·).   (10.27)

For the k-dependence we compute with h and 2k:

v4 = u − hc − 2kd − 2hke − h²f − 4k²g − · · ·

and with h and 4k:

v5 = u − hc − 4kd − 4hke − h²f − 16k²g − · · ·

such that

v1 − v4 = kd + hke + 3k²g + · · ·   (10.28)

and

(v4 − v5)/(v1 − v4) = 2 · (d + he + 6kg + · · ·)/(d + he + 3kg + · · ·).   (10.29)
Using (10.27) and (10.29) we can check the order in h and k of our approximation
and through (10.26) and (10.28) we can get information on the leading error
terms.
If the method is first order in h we can estimate the h-component of the error by
v1 − v2 and if the method is first order in k we can estimate the k-component of
the error by v1 − v4 . We can use this information to reduce either or both the
step sizes in order to meet specific error tolerances or we can use Richardson-
extrapolation in order to get higher order and (hopefully) more accurate results.
More specifically,

v1 + (v1 − v2) + (v1 − v4) = u + hke + 2h²f + 2k²g + · · ·   (10.30)

If the method is first order in k and second order in h as is typical for the implicit
method then the h-component of the error is estimated by (v1 − v2)/3 and the
k-component by v1 − v4. The latter will often be dominant and it will be natural
to perform Richardson-extrapolation in the k-direction only, arriving at

v1 + (v1 − v4) = u − h²f + 2k²g + · · ·   (10.31)

In order to check that these extrapolations give the expected results, it is again
necessary to supplement with further calculations (and with a more advanced
numbering system for these v's).
When u is a function of two variables with two independent step sizes then the
cost of the five necessary calculations is 2.5 times the cost of v1 . This is still a
reasonable price to pay. Knowing the magnitude of the error and its dependence
on the step sizes enables us to choose near-optimal combinations of these and
thus avoid redundant calculations, and a possible extrapolation might improve
the results considerably more than halving the step sizes and quadrupling the
work.
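A compact way to organize this bookkeeping (our own sketch; the five runs are
assumed restricted to the common coarse grid, e.g. as numpy arrays):

    def order_ratios_2d(v1, v2, v3, v4, v5):
        # v1=(h,k), v2=(2h,k), v3=(4h,k), v4=(h,2k), v5=(h,4k)
        qh = (v2 - v3) / (v1 - v2)   # (10.27): near 2 -> first order in h, near 4 -> second
        qk = (v4 - v5) / (v1 - v4)   # (10.29): same interpretation for k
        return qh, qk

    def extrapolate_2d_first_order(v1, v2, v4):
        # Richardson extrapolation in both h and k, eq. (10.30)
        return v1 + (v1 - v2) + (v1 - v4)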

Table 10.1: h-ratio for first order boundary condition.

t\x 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90
0.1 2.05 2.05 2.06 2.07 2.07 2.08 2.08 2.08 2.09 2.09
0.2 2.03 2.03 2.04 2.04 2.04 2.04 2.05 2.05 2.05 2.05
0.3 2.02 2.03 2.03 2.03 2.03 2.03 2.03 2.03 2.03 2.03
0.4 2.02 2.02 2.02 2.02 2.02 2.02 2.02 2.02 2.02 2.02
0.5 2.01 2.01 2.01 2.02 2.02 2.02 2.02 2.02 2.02 2.02
0.6 2.01 2.01 2.01 2.01 2.01 2.01 2.01 2.01 2.01 2.01
0.7 2.00 2.00 2.01 2.01 2.01 2.01 2.01 2.01 2.01 2.01
0.8 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00
0.9 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00
1.0 1.99 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00

10.6 Limitations of the technique

It is essential for the technique to give satisfactory results that the leading term
in the error expansion is the dominant one. This will always be the case when
the step size is small, but how can we know that the step size is small enough?
This will be revealed by a study of the order-ratio and how it behaves in the
region in question. A picture like the one seen in Table 10.1 is a clear witness of
a first order process where the first order term clearly dominates the rest. The

Table 10.2: k-ratio for first order boundary condition.

t\x 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90
0.1 1.89 14.69 3.87 1.20 4.30 5.25 4.52 3.42 2.81 2.76
0.2 1.82 17.13 2.65 4.11 4.16 3.99 3.96 4.00 4.03 4.02
0.3 1.68 6.15 3.63 4.13 4.00 3.98 4.01 4.01 4.00 4.00
0.4 1.51 4.19 3.98 4.05 3.98 4.00 4.00 4.00 4.00 4.00
0.5 1.33 3.65 4.06 4.01 3.99 4.00 4.00 4.00 4.00 4.00
0.6 1.15 3.50 4.07 3.99 4.00 4.00 4.00 4.00 4.00 4.00
0.7 1.00 3.49 4.06 3.99 4.00 4.00 4.00 4.00 4.00 4.00
0.8 0.88 3.54 4.05 3.99 4.00 4.00 4.00 4.00 4.00 4.00
0.9 0.82 3.60 4.03 3.99 4.00 4.00 4.00 4.00 4.00 4.00
1.0 0.84 3.67 4.02 4.00 4.00 4.00 4.00 4.00 4.00 4.00

error estimate will be very reliable (and a slight overestimate) and we can expect
good results from an extrapolation.
A behaviour like in Table 10.2 is more difficult to interpret. For x > 0.1 and
t > 0.1 the method is clearly second order (in k) and we should be able to
trust the estimate of the corresponding error component. For small values of t
and especially x our basic hypothesis (10.25) does not seem to quite capture the
situation. A reduction of the step size, k, might help, but the problems may
partly be due to the fact that the contribution to the error from k is so much
smaller than the contribution from h. A comparison of the differences (10.26)
and (10.28) will shed light on this.
Isolated deviations from the pattern such as seen in Table 16.3 at (t, x) =
(1.2, 175) and (0.8, 150) and (0.4, 130) are allowed and can be explained by re-
ferring to Fig. 10.1. The second order term in (10.26) has opposite sign and
the same absolute magnitude as the next in line on a curve in (t, x)-space, and
small values, negative values, and very large values of the order ratio will occur
depending on how close the grid points lie to this curve. A study of the dif-
ferences (10.26) will reveal very small numbers because of this cancellation. An
extrapolation will have little effect because of these small differences. An error
estimate based on these small differences cannot be trusted. It is safer to assume
that the error is about the same as in surrounding points where the order ratio
warrants a better determination. (The error is probably very small somewhere
in the neighbourhood, but we don’t know exactly where.)
The oscillations which often occur when using Crank-Nicolson (cf. Chapter 8) can
also confuse the picture. These oscillations typically have a period of 2 times the
step size such that for example the values at odd steps are large and the values

Table 10.3: h-ratio for asymmetric second order.

t\x 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90
0.1 3.24 3.43 3.60 3.71 3.80 3.86 3.90 3.93 3.95 3.96
0.2 3.44 3.54 3.63 3.69 3.75 3.79 3.82 3.85 3.87 3.88
0.3 3.51 3.58 3.64 3.69 3.73 3.76 3.78 3.80 3.82 3.83
0.4 3.54 3.60 3.65 3.68 3.72 3.74 3.76 3.78 3.80 3.81
0.5 3.56 3.61 3.65 3.68 3.71 3.73 3.75 3.77 3.78 3.79
0.6 3.57 3.62 3.65 3.68 3.71 3.73 3.74 3.76 3.77 3.78
0.7 3.58 3.62 3.65 3.68 3.70 3.72 3.74 3.75 3.76 3.77
0.8 3.59 3.63 3.66 3.68 3.70 3.72 3.73 3.74 3.76 3.76
0.9 3.59 3.63 3.66 3.68 3.70 3.72 3.73 3.74 3.75 3.76
1.0 3.60 3.63 3.66 3.68 3.70 3.71 3.73 3.74 3.75 3.76

at even steps are small. If we use all values with step size 4k they are alternately
large and small. At step size 2k and k we only use values at even step numbers,
i.e. small values, and the order ratios will tend to oscillate. It may be a good idea
to use every other value to get a smoother picture.
If the order ratios do not show any easily explainable pattern (cf. Table 16.3 for
x ≤ 100), then a reduction of the step size(s) may solve the problem. If not the
necessary conclusion is that our basic hypothesis (10.25) does not hold in this
region for this problem.
How much can we expect to gain from extrapolations? If we assume that the
auxiliary functions have roughly the same magnitude then going from a first
order to a second order result may almost double the number of correct decimals.
A similar gain can be expected going from second to fourth order (when there is
no third order term present). Going from second to third order will ‘only’ give
50 % more and only if the corresponding auxiliary function is well-behaved. So
the main area of application is to first (and second) order methods, but of course
here the need is also the greatest.

10.7 Test problem 3 – once again

We shall illustrate the techniques on test problem 3:


ut = uxx , 0 ≤ x ≤ 1, t > 0,
u(0, x) = u0 (x) = cos x, 0 ≤ x ≤ 1,
u(t, 1) = e−t cos 1, t > 0,

Table 10.4: k-ratio for asymmetric second order.

t\x 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90
0.1 3.99 3.99 3.99 3.99 4.00 4.02 4.07 4.18 4.40 4.86
0.2 4.01 4.01 4.02 4.03 4.04 4.05 4.04 4.01 3.94 3.85
0.3 4.02 4.02 4.02 4.02 4.02 4.01 4.01 4.01 4.04 4.17
0.4 4.02 4.02 4.02 4.02 4.02 4.02 4.02 4.02 4.00 3.94
0.5 4.02 4.02 4.02 4.02 4.02 4.01 4.01 4.01 4.02 4.08
0.6 4.01 4.01 4.01 4.01 4.01 4.01 4.02 4.02 4.01 3.96
0.7 4.01 4.01 4.01 4.01 4.01 4.01 4.01 4.01 4.01 4.05
0.8 4.01 4.01 4.01 4.01 4.01 4.01 4.01 4.01 4.01 3.98
0.9 4.01 4.01 4.01 4.01 4.01 4.01 4.01 4.01 4.00 4.03
1.0 4.01 4.01 4.01 4.01 4.01 4.01 4.01 4.01 4.01 3.98

Table 10.5: h-ratio for symmetric second order with h = k.


t\x 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90
0.1 15.94 15.94 15.94 15.95 15.98 16.05 16.21 16.55 17.22 18.58
0.2 16.02 16.03 16.05 16.07 16.11 16.13 16.12 16.03 15.84 15.53
0.3 16.07 16.06 16.06 16.05 16.04 16.03 16.02 16.03 16.12 16.47
0.4 16.05 16.05 16.05 16.05 16.05 16.06 16.07 16.06 16.00 15.82
0.5 16.05 16.05 16.05 16.05 16.05 16.04 16.03 16.03 16.05 16.22
0.6 16.04 16.04 16.04 16.04 16.04 16.04 16.05 16.06 16.03 15.91
0.7 16.04 16.04 16.04 16.04 16.04 16.04 16.03 16.03 16.03 16.14
0.8 16.04 16.04 16.04 16.04 16.04 16.04 16.04 16.05 16.04 15.95
0.9 16.03 16.03 16.03 16.03 16.03 16.03 16.03 16.02 16.02 16.10
1.0 16.03 16.03 16.03 16.03 16.03 16.03 16.03 16.04 16.04 15.96

ux (t, 0) = 0, t > 0.

We solve the problem numerically using Crank-Nicolson and begin with the first
order boundary approximation. Using formulas (10.27) and (10.29) we check the
order of the method calculating the ratios on a 10 × 10 grid using step sizes that
are 16 times smaller. The results are shown in Table 10.1 and Table 10.2 for h
and k respectively.
The method is clearly first order in h with only few values deviating slightly from
2.0. We deduce that the first order contribution to the error clearly dominates
the other h-terms and we conclude that using a first order boundary approxima-
tion degrades the performance of Crank-Nicolson. The picture is more confusing
for k where the second order is only convincing for larger values of t or x. The

Table 10.6: h-ratio for asymmetric second order with h = k.

t\x 0.00 0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90
0.1 9.91 9.10 7.55 6.97 6.86 7.00 7.30 7.67 8.04 8.30
0.2 7.90 7.68 8.28 8.54 8.45 8.19 7.89 7.63 7.44 7.32
0.3 8.41 8.12 7.83 7.91 8.03 8.13 8.20 8.24 8.25 8.25
0.4 7.95 7.86 8.17 8.16 8.06 8.01 8.00 8.02 8.05 8.07
0.5 8.19 8.03 7.91 7.98 8.04 8.07 8.07 8.05 8.03 8.02
0.6 7.97 7.91 8.11 8.05 8.00 7.99 8.01 8.03 8.04 8.05
0.7 8.10 8.00 7.94 7.99 8.03 8.03 8.02 8.01 8.00 8.00
0.8 7.97 7.93 8.07 8.01 7.98 7.99 8.00 8.01 8.02 8.02
0.9 8.06 7.99 7.95 7.99 8.01 8.01 8.00 7.99 7.99 7.99
1.0 7.97 7.94 8.05 7.99 7.98 7.99 8.00 8.00 8.00 8.00

h-component of the error estimate is two orders of magnitude larger than the
k-component and thus dominates the error. A comparison with the actual er-
ror shows that the estimate is less than 3 % over the actual error which has a
maximum value of about 0.002.
Table 10.1 indicates that Richardson extrapolation is possible and the dominance
of the h-component of the error estimate promises that it will be useful. The
maximum error after extrapolation is 0.00003.
The k-component of the error estimate exhibits a strange behaviour for small x
where also the order ratio indicated trouble. With a maximum value of 0.000005
this is of no real concern, but it still indicates that we are doing something wrong.
This something is probably the first order boundary approximation which not
only reduces the h-order to 1 but also introduces a discontinuity in cx (t, x) at
(0, 0) since c(0, x) = 0 according to (9.18) and c_x(t, 0) = −(1/2)e^{−t} according to
(9.56) for our example. This will have an effect on some of the higher auxiliary
functions such that v1 − v4 in (10.28) is not dominated by kd when x is close to
0.
The values for c(t, x) as determined by v1 − v2 (see 10.26) with h = k = 0.00625
agree within 7 % with those obtained from solving the differential equation for
c (which we did in Chapter 9) and a better agreement can be obtained using
smaller step sizes. The corresponding determination of g(t, x) is reasonably good
when t and x are not too close to 0. In the regions where we have difficulty
determining the order (see Table 10.2) we can of course have little trust in an
application of formula (10.28) but in regions where the ratio (10.29) is between
3.0 and 5.0 the agreement is within 8 % with the step sizes chosen.

For the asymmetric second order boundary approximation (9.61) the second order
in k is clearly detectable on the 10 × 10 grid using h = 0.00625 and k = 0.025 (see
Table 10.4) and reasonably so for h (see Table 10.3). The values obtained here
for g(t, x) agree within 1 % with those previously obtained whereas the values for
f (t, x) are 10 - 20 % too small in agreement with the h-ratio being determined
consistently 10 - 20 % too small. The presence of an interfering h3 -term in the
error expansion is clearly noticeable here. The error estimate is dominated by
the k-term (because of the larger step size) and the maximum error is now less
than 0.000013. The number of grid points and thus the number of arithmetic
operations is four times smaller than before and yet the error is much smaller
(even after extrapolation of the former result).
For the symmetric second order approximation (9.69) we also expect second order
accuracy. The order ratio (10.27) for h gives values between 3.99 and 4.01 and for
k between 3.8 and 4.8 when using h = k = 0.025. These good results are due to
the fact that the next terms in the error expansion (10.25) are h4 and k 4 because
of the symmetry and therefore interfere little with the second order terms. The
number of grid points is again reduced by a factor 4 and the two components
of the error estimate are both smaller than 0.000013. We know that f = −g in
theory. In practice they agree within 2 % with each other and with the values
obtained from the independent solution of the differential equation.
Since the two error components thus almost cancel out the results are better
than the estimates tell us. To check this we try equal step sizes with the second
order boundary approximations. Using h = k = 0.025 we can confirm that
the symmetric approximation now leads to a method which is fourth order (see
Table 10.5) and that the asymmetric approximation (9.61) leads to a method
which is third order in the common step size (see Table 10.6). The error estimate
(and the error) is less than 2.2 × 10⁻⁶ for the asymmetric approximation and
0.6 × 10⁻⁹ for the symmetric one.

10.8 Which method to choose

This is a difficult one to answer because there is no method which is best in all
situations; but we can issue some general guidelines. Usually the choice is be-
tween the explicit method, the implicit method, and Crank-Nicolson. The explicit
method is the easiest one to program, but the stability condition 2bµ ≤ 1 will
often imply such restrictions on the time step as to render this option
impractical. Crank-Nicolson seems optimal, being unconditionally stable
and second order accurate. But in some cases it gives rise to oscillations which
are damped very slowly. In such cases the implicit method becomes a viable al-
ternative. It is, however, only first order accurate in t and in many cases it tends

to produce solutions that are too smooth. Using the extrapolation techniques in
this chapter we can suggest the following alternative:
1. Use the implicit method.
2. Check the order in t, i.e. compute the order ratio.
3. If the results are first order then estimate the error.
4. Extrapolate to second order in t.
5. Check the orders in t and x.
6. Estimate the two error components.
In this way we can take advantage of the very stable behaviour of the implicit
method and still get a second order result, provided the order ratio allows it.
As for boundary conditions with a derivative the theory in Chapter 9 as well as
the previous example shows that we should prefer second order approximations
to first order and symmetric approximations to asymmetric ones whenever pos-
sible. The work as measured by the number of arithmetic operations only grows
marginally and the extra programming effort is well rewarded in better accuracy.

10.9 Literature

The idea to perform extrapolation and thus achieve a higher order goes back to
the British mathematician Lewis F. Richardson who introduced it for the special
case of p = 2 [30]. An application to parabolic equations is reported by Hartree
and Womersley [16]. Extrapolation has also been used in [20] and [12] but they
propose to extrapolate step by step and continue the difference scheme with the
extrapolated values. The formula for determining or checking the order is given
in lecture notes from the Technical University of Denmark [2] but the idea is
probably older. The history of extrapolation processes in numerical analysis is
given in the survey paper [17] which also contains an extensive bibliography.

10.10 Exercises
1. Solve problem 1 with the implicit method from t = 0 to t = 0.5 with
h = k = 1/10, 1/20, 1/40.
For each point in the grid corresponding to the largest step size compute
the order-ratio (10.7) and produce a table similar to Table 10.1.
Deduce the order of the method.
Estimate the leading term of the error for each point in the above grid and
compare to the actual error.

2. Same as exercise 1 but with Crank-Nicolson’s method.

3. Solve problem 1 with the implicit method from t = 0 to t = 0.5 with


h = k = 1/40.
Using calculations with (2k, h), (4k, h), (k, 2h) and (k, 4h) you should gain
information about the order in k and h as well as about the first terms in
the expression for the error in all grid points corresponding to 4k and 4h.
What can we do to obtain an error which in every grid point is less than
10−6 ?

4. Same as exercise 3 but with Crank-Nicolson’s method.

5. Same as exercise 3 but with problem 2.

6. Same as exercise 5 but with Crank-Nicolson’s method.

7. Using the results from exercise 3 perform Richardson extrapolation in k.


Under the assumption that the order in time is now 2, what is the k-
contribution to the error?
How small should k be to reduce this error term to less than (1/2) · 10−6 in every
grid point?

8. Same as exercise 7 but with problem 2.

Chapter 11

Two Space Dimensions

11.1 Introduction

A general, linear, parabolic equation in two space dimensions:

ut = b1 uxx + 2b12 uxy + b2 uyy − a1 ux − a2 uy + κu + ν (11.1)

is well-posed (cf. page 23) if

b1 > 0 and b1 b2 > b12^2 . (11.2)

The solution u(t, x, y) is a function of t, x, and y, and the coefficients may also
depend on t, x, and y.
To ensure a unique solution (11.1) must be supplemented by an initial condition

u(0, x, y) = u0 (x, y), X1 ≤ x ≤ X2 , Y1 ≤ y ≤ Y2 (11.3)

and boundary conditions which might be of Dirichlet type such as

u(t, X1 , y) = u11 (t, y), t>0 (11.4)

or involving a normal derivative such as

αu(t, X1 , y) − βux (t, X1 , y) = γ, t>0 (11.5)

and similar relations for x = X2 , y = Y1 , and y = Y2 .


We shall begin treating the case where there is no mixed derivative term in (11.1),
i.e. that b12 = 0. We can then write the equation as

u t = P1 u + P2 u + ν (11.6)

where

P1 u = b1 uxx − a1 ux + θκu, (11.7)


P2 u = b2 uyy − a2 uy + (1 − θ)κu, (11.8)

and 0 ≤ θ ≤ 1. While symmetry considerations might speak for an even distri-
bution of the κu-term (θ = 1/2) it is computationally simpler to use θ = 0 or θ = 1
i.e. to include the κu-term fully in one of the two operators.
For the numerical solution of (11.1) we choose a step size k in the t-direction and
step sizes h1 = (X2 − X1 )/L and h2 = (Y2 − Y1 )/M in the x- and the y-direction,
respectively, and seek the numerical solution v_lm^n on the discrete set of points
(nk, X1 + lh1 , Y1 + mh2 ), (l = 0, 1, . . . , L; m = 0, 1, . . . , M; n = 1, 2, . . . , N).

11.2 The explicit method

In the simplest case P1 u = b1 uxx and P2 u = b2 uyy such that

ut = b1 uxx + b2 uyy . (11.9)

The explicit method for (11.9) looks like


(v_lm^{n+1} − v_lm^n)/k = b1 δx^2 v_lm^n + b2 δy^2 v_lm^n . (11.10)
To study the stability of (11.10) we take v_lm^n on the form
v_lm^n = g^n e^{iξ1 lh1} e^{iξ2 mh2} = g^n e^{ilϕ1} e^{imϕ2} (11.11)

and insert in (11.10):

g = 1 + b1 µ1 (e^{iϕ1} − 2 + e^{−iϕ1}) + b2 µ2 (e^{iϕ2} − 2 + e^{−iϕ2})
= 1 − 4b1 µ1 sin^2(ϕ1/2) − 4b2 µ2 sin^2(ϕ2/2) (11.12)
where µ1 = k/h1^2 and µ2 = k/h2^2. The stability requirement is |g| ≤ 1 and since
g is real and clearly less than 1 the critical condition is g ≥ −1 or
2b1 µ1 sin^2(ϕ1/2) + 2b2 µ2 sin^2(ϕ2/2) ≤ 1. (11.13)
Since this must hold for all ϕ1 and ϕ2 the requirement for stability is
b1 µ1 + b2 µ2 ≤ 1/2. (11.14)
2

11.3 Implicit methods

Formula (11.14) puts severe restrictions on the step size k and it is very tempting
to study generalizations of the implicit or the Crank-Nicolson methods to two
space dimensions. A similar analysis will show that they are both unconditionally
stable, i.e. they are stable for any choice of k, h1 , and h2 (cf. exercise 1).
The use of an implicit method requires the solution of a set of linear equations
for each time step. In one space dimension the coefficient matrix for these equa-
tions is tridiagonal, and the equations can be solved with a number of simple
arithmetic operations (SAO) proportional to the number of internal grid points.
The situation is less favourable in two or more space dimensions.
If we have L − 1 internal points in the x-direction and M − 1 internal points in
the y-direction then we have (L − 1)(M − 1) grid points and the same number
of equations per time step. There are at most 5 non-zero coefficients in each
equation. If the grid points are ordered lexicographically then the coefficient
matrix will have a band structure such as shown to the left in Fig. 11.1 where
the non-zero coefficients are marked with squares. During a Gaussian elimination
the region between the outer bands will fill in as shown to the right in Fig. 11.1
resulting in a number of non-zeroes which is approximately L2 · M and a number
of SAO proportional to L3 · M.

Figure 11.1: A coefficient matrix before and after elimination.

Example.
Consider ut = uxx + uyy on the unit square with h1 = h2 = 0.01, L = M = 100.
If we want to integrate to t = 1 using the explicit method then stability requires
k ≤ 1/(4 · 100^2) and a number of time steps at least K = 4 · 100^2. The number
of SAO is proportional to the number of grid points which is approximately
K · L · M = 4L^4 = 4 · 10^8.
If we want to use an implicit method we may choose k = h1 = h2 , K = L = M so
the number of grid points is equal to L^3 but the number of SAO is proportional
to K · L^3 · M = L^5 = 10^10.
We can conclude that there is no advantage in using implicit methods directly

since the time involved in solving the linear equations outweighs the advantage of
using larger step sizes in time. There are more elaborate ways to order the equa-
tions and to perform the solution process, but none that will make a significant
difference in favour of implicit methods.

11.4 ADI methods

When the differential operator can be split as in (11.6) – (11.8) there are ways to
avoid the L3 -factor. Such methods are called time-splitting methods or Locally
One-Dimensional (LOD) or Alternating Direction Implicit (ADI) and the general
idea is to split a time step in two and to take one operator or one space coordinate
at a time.
Taking our inspiration from the Crank-Nicolson method we begin discretizing
(11.6) in the time-direction:

ut((n + 1/2)k, x, y) = (u^{n+1} − u^n)/k + O(k^2), (11.15)
P1 u + P2 u + ν = (1/2)P1 (u^{n+1} + u^n) + (1/2)P2 (u^{n+1} + u^n) (11.16)
+ (1/2)(ν^{n+1} + ν^n) + O(k^2).
Insert in (11.6), multiply by k, and rearrange:
(I − (k/2)P1 − (k/2)P2)u^{n+1} = (I + (k/2)P1 + (k/2)P2)u^n (11.17)
+ (k/2)(ν^{n+1} + ν^n) + O(k^3).
If we add (1/4)k^2 P1 P2 u^{n+1} on the left side and (1/4)k^2 P1 P2 u^n on the right side then we
commit an error which is O(k^3) and therefore can be included in that term:
(I − (k/2)P1)(I − (k/2)P2)u^{n+1} = (I + (k/2)P1)(I + (k/2)P2)u^n (11.18)
+ (k/2)(ν^{n+1} + ν^n) + O(k^3).
We now discretize in the space coordinates replacing P1 by P1h , P2 by P2h , and
u by v:
(I − (k/2)P1h^{n+1})(I − (k/2)P2h^{n+1})v^{n+1} = (I + (k/2)P1h^n)(I + (k/2)P2h^n)v^n
+ (k/2)(ν^{n+1} + ν^n) (11.19)
and this gives rise to the Peaceman-Rachford method [27]:
(I − (k/2)P1h^{n+1})ṽ = (I + (k/2)P2h^n)v^n + α, (11.20)
(I − (k/2)P2h^{n+1})v^{n+1} = (I + (k/2)P1h^n)ṽ + β. (11.21)
The operators P1h and P2h involve the respective coefficients of the differential
equation and may therefore depend on time. This dependence is indicated by
superscripts n and n + 1. The intermediate value, ṽ, has no special relation to
any particular intermediate time value. It has been introduced for reasons of
computational efficiency and has no particular significance otherwise.
We have introduced the values α and β to take into account the inhomogeneous
term ν because it is not evident how this term should be split. We shall attend
to this matter shortly.

11.5 The Peaceman-Rachford method

In order to check whether the solution v n+1 to (11.20) and (11.21) is also the
solution to (11.19) we start with v n+1 from (11.21) and apply the difference
operators from the left side of (11.19):
(I − (k/2)P1h^{n+1})(I − (k/2)P2h^{n+1})v^{n+1} = (I − (k/2)P1h^{n+1}){(I + (k/2)P1h^n)ṽ + β}
= (I + (k/2)P1h^n)(I − (k/2)P1h^{n+1})ṽ + (I − (k/2)P1h^{n+1})β
= (I + (k/2)P1h^n)(I + (k/2)P2h^n)v^n + (I + (k/2)P1h^n)α (11.22)
+ (I − (k/2)P1h^{n+1})β
The first equal sign follows from (11.21), the third one from (11.20), and the
second one requires that the operators P1h^{n+1} and P1h^n commute.
This is not always the case when the coefficients depend on t and x. A closer
analysis reveals that we have commutativity if the coefficients b1 , a1 , and κ are
either constant or only depend on t and y or only depend on x and y. If κ depends
on all of t, x, and y we may incorporate it in the operator P2 . If a1 and κ are 0
then b1 may depend on both t and x (and y) if it can be written as a product of
a function of t (and y) and a function of x (and y).
The operators P1 and P2 do not enter in a symmetric fashion. Therefore it may
happen that we do not have commutativity for one but we do for the other. In
this case we may switch freely between the x- and y-coordinates.

The main consequence of non-commutativity is that the ADI method (11.20) –
(11.21) becomes first order in time instead of the expected second order.
Once commutativity is established we take a closer look at the inhomogeneous
term. From (11.19) and (11.22) we have the requirement that
(I + (k/2)P1h^n)α + (I − (k/2)P1h^{n+1})β = (k/2)(ν^{n+1} + ν^n) (11.23)
where a discrepancy of order O(k 3) may be allowed with reference to a similar
term in (11.18). There are three possible choices for α and β that will satisfy
this:
α = (k/2)ν^n , β = (k/2)ν^{n+1} , (11.24)
α = β = (k/4)(ν^{n+1} + ν^n), (11.25)
α = β = (k/2)ν^{n+1/2} . (11.26)

11.6 Practical considerations

The system of equations (11.20) for ṽ contains one equation for each interior grid
point in the xy-region. The operator P1h refers to neighbouring points in the
x-direction and the resulting coefficient matrix becomes tridiagonal and we can
therefore solve the system with a number of SAO proportional to the number
of interior grid points. Similarly the system of equations (11.21) for v n+1 is
effectively tridiagonal and can be solved at a similar cost irrespective of whether
we reorder the grid points or not.
We shall now take a detailed look at how we set up and solve the two systems
(11.20) and (11.21).
1. To compute the right-hand-side of (11.20) we need the values of v n at all
interior grid points and at all interior grid points on the boundaries y = Y1 and
y = Y2 . If we have Dirichlet conditions on these boundaries we know these values
directly. If the boundary conditions involve the y-derivative on one or both of
these boundary line segments then we use a (preferably second order) difference
approximation to the derivative.
Cost: ∼ 5LM SAO.
2. To complete the system (11.20) we need information on ṽ for interior grid
points on the boundary line segments x = X1 and x = X2 . If P1h does not
depend on time then rearranging (11.21) and adding to (11.20) gives
2ṽ = (I + (k/2)P2h^n)v^n + (I − (k/2)P2h^{n+1})v^{n+1} + α − β. (11.27)
If we have Dirichlet boundary conditions on x = X1 and x = X2 then we have
information on v n and v n+1 here and we can apply P2h .
If the boundary conditions on one or both these lines involve the x-derivative
but those on y = Y1 and y = Y2 are of Dirichlet type, then we might consider
interchanging P1 and P2 .
If the boundary conditions, however, are on the form ux (t, Xj , y) = fj (t),
(j = 1 and/or 2) where fj does not depend on y then we can differentiate (11.27)
w.r.t. x, and since the terms with P2h vanish we end up with
ṽx = (1/2)(f(t_n) + f(t_{n+1})) (11.28)
plus possible α- and β-terms.
Remark. Since ṽ is an intermediate value it is sometimes suggested to use
ṽ = (1/2)(v^n + v^{n+1}) (11.29)
on the boundary. We cannot in general recommend this since ṽ has no particular
relation to the intermediate time level n + 1/2, but we note that it is after all
an O(k^2) approximation to (11.27) and although this is not quite good enough
(11.29) might still come in handy in special cases. ✷
3. Solve system (11.20).
Cost: ∼ 8LM SAO.
4. To compute the right-hand-side of (11.21) we need the same values of ṽ for
x = X1 and x = X2 as we discussed in 2.
Cost: ∼ 5LM SAO.
5. To complete the system (11.21) we need information on v n+1 for y = Y1 and
y = Y2 . As in 1. if we have Dirichlet boundary conditions we know these values
directly. Otherwise we include equations involving difference approximations to
the derivatives.
6. Solve system (11.21).
Cost: ∼ 8LM SAO.
Total cost: ∼ 26LM SAO per time step.

(Figure: six panels, labelled 1. – 6., visualizing the six subtasks.)
In the figure above we have visualized the considerations. The horizontal or
vertical lines indicate which operator we are concerned with (P1h or P2h ) and the

bullets indicate which function values we are considering. In 1. we are computing
right-hand-side values based on v n . In 2. and 3. we are computing ṽ-values and
in 4. right-hand-side values based on these. Finally in 5. and 6. we compute
values for v n+1.
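To make the six subtasks concrete, here is a minimal sketch (Python with numpy/scipy) of one Peaceman-Rachford step for the model problem ut = b1 uxx + b2 uyy with constant coefficients and homogeneous Dirichlet data, so that by (11.27) the intermediate value ṽ vanishes on the x-boundaries. For general boundary data the adjustments described in 1. – 5. must be added.

import numpy as np
from scipy.linalg import solve_banded

def pr_step(v, k, h1, h2, b1, b2):
    # v is an (L+1) x (M+1) array, v[l, m] ~ v(X1 + l*h1, Y1 + m*h2);
    # all boundary values are assumed to be zero
    L, M = v.shape[0] - 1, v.shape[1] - 1
    r1, r2 = 0.5 * k * b1 / h1**2, 0.5 * k * b2 / h2**2

    def banded(n, r):
        # banded storage of the tridiagonal matrix I - (k/2)P: -r, 1+2r, -r
        ab = np.zeros((3, n))
        ab[0, 1:] = -r
        ab[1, :] = 1 + 2 * r
        ab[2, :-1] = -r
        return ab

    vt = np.zeros_like(v)               # intermediate value, zero on the boundary
    ab1 = banded(L - 1, r1)
    for m in range(1, M):               # (11.20): implicit in x, explicit in y
        rhs = v[1:L, m] + r2 * (v[1:L, m+1] - 2*v[1:L, m] + v[1:L, m-1])
        vt[1:L, m] = solve_banded((1, 1), ab1, rhs)

    w = np.zeros_like(v)
    ab2 = banded(M - 1, r2)
    for l in range(1, L):               # (11.21): implicit in y, explicit in x
        rhs = vt[l, 1:M] + r1 * (vt[l+1, 1:M] - 2*vt[l, 1:M] + vt[l-1, 1:M])
        w[l, 1:M] = solve_banded((1, 1), ab2, rhs)
    return w

Each sweep is a family of independent tridiagonal solves, in agreement with the operation counts above.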

11.7 Stability of Peaceman-Rachford

We have derived the Peaceman-Rachford method on the basis of ideas from


Crank-Nicolson so we expect unconditional stability, but we have also made a
few minor alterations along the way so it is probably a good idea to perform an
independent check. We shall do this for the special case ut = b1 uxx + b2 uyy with
constant coefficients b1 and b2 . Inserting
v_lm^n = g^n e^{ilϕ1} e^{imϕ2} and ṽ_lm = g̃ v_lm^n (11.30)

in (11.20) and (11.21) gives


(1 + 2b1 µ1 sin^2(ϕ1/2))g̃ = 1 − 2b2 µ2 sin^2(ϕ2/2), (11.31)
(1 + 2b2 µ2 sin^2(ϕ2/2))g = (1 − 2b1 µ1 sin^2(ϕ1/2))g̃, (11.32)
such that
g = (1 − 2b1 µ1 sin^2(ϕ1/2))/(1 + 2b1 µ1 sin^2(ϕ1/2)) · (1 − 2b2 µ2 sin^2(ϕ2/2))/(1 + 2b2 µ2 sin^2(ϕ2/2)). (11.33)
For simplicity we introduce
x1 = b1 µ1 sin^2(ϕ1/2), x2 = b2 µ2 sin^2(ϕ2/2) (11.34)
and the formula for the growth factor now takes the simpler form
g = (1 − 2x1)(1 − 2x2) / ((1 + 2x1)(1 + 2x2)). (11.35)
Since x1 ≥ 0 and x2 ≥ 0 it is easily seen that −1 ≤ g ≤ 1 such that we indeed
have unconditional stability. We also note that components with high frequency
in both directions ϕ1 ∼ π, ϕ2 ∼ π will have g ∼ 1 (if b1 µ1 and b2 µ2 are large)
so these components will not be damped very much and they will not alternate
from one time step to the next. The growth factor might still take values close
to −1 due to components with high frequency in one direction and low frequency
in the other, and the well-known Crank-Nicolson oscillations will be observed
when Peaceman-Rachford is used on problems with discontinuities in the initial
condition.

11.8 D’Yakonov

There are other ways of splitting equation (11.19). D’Yakonov [39] has suggested
(I − (k/2)P1h^{n+1})ṽ = (I + (k/2)P1h^n)(I + (k/2)P2h^n)v^n + α, (11.36)
(I − (k/2)P2h^{n+1})v^{n+1} = ṽ + β. (11.37)
To check the equivalence we take the solution v n+1 from (11.37) and apply the
difference operators from the left side of (11.19):
(I − (k/2)P1h^{n+1})(I − (k/2)P2h^{n+1})v^{n+1} = (I − (k/2)P1h^{n+1})(ṽ + β) (11.38)
= (I + (k/2)P1h^n)(I + (k/2)P2h^n)v^n + α + (I − (k/2)P1h^{n+1})β
In this case we have no problem with commutativity of the operators. As for the
inhomogeneous term an obvious choice is
β = 0, α = (k/2)(ν^{n+1} + ν^n). (11.39)
When calculating the right-hand-side of (11.36) we must know v n on all grid
points including the boundaries and the corners. In addition we have in general
a sum of 9 terms for each equation possibly with different coefficients so the cost
for step 1. is ∼ 17LM SAO.
Setting up system (11.36) requires ṽ on the interior points on the boundary
segments x = X1 and x = X2 . These values can be found by solving (11.37) from
right to left if we have Dirichlet conditions on these boundary segments.
Solving equations (11.36) now costs ∼ 8LM SAO.
The right-hand-side of (11.37) is easy and so are the necessary boundary values
of v n+1 on y = Y1 and y = Y2 . The solution of (11.37) then costs another ∼ 8LM
SAO, and the total cost of a time step with D’Yakonov amounts to ∼ 33LM
SAO making D’Yakonov slightly more expensive than Peaceman-Rachford.

11.9 Douglas-Rachford

Other ADI methods can be derived from other basic schemes. If we for example
take our inspiration from the implicit method and discretize in time we get
(u^{n+1} − u^n)/k = P1 u^{n+1} + P2 u^{n+1} + ν^{n+1} + O(k) (11.40)
or

(I − kP1 − kP2)u^{n+1} = u^n + kν^{n+1} + O(k^2) (11.41)

or

(I − kP1)(I − kP2)u^{n+1} = (I + k^2 P1 P2)u^n + kν^{n+1} + O(k^2) (11.42)

where we in the last equation have incorporated k^2 P1 P2 (u^{n+1} − u^n) in the O(k^2)-
term. (11.42) is now discretized in the space directions to
(I − kP1h^{n+1})(I − kP2h^{n+1})v^{n+1} = (I + k^2 P1h^n P2h^n)v^n + kν^{n+1} (11.43)

and this formula can be split into the following two which are known as the
Douglas-Rachford method [10]:
(I − kP1h^{n+1})ṽ = (I + kP2h^n)v^n + α (11.44)
(I − kP2h^{n+1})v^{n+1} = ṽ − kP2h^n v^n + β (11.45)

To check that v n+1 in (11.45) is also the solution to (11.43) we take v n+1 from
(11.45) and apply the difference operators from (11.43):
(I − kP1h^{n+1})(I − kP2h^{n+1})v^{n+1} = (I − kP1h^{n+1}){ṽ − kP2h^n v^n + β}
= (I + kP2h^n)v^n + α − (I − kP1h^{n+1})(kP2h^n v^n − β)
= (I + k^2 P1h^{n+1} P2h^n)v^n + α + (I − kP1h^{n+1})β. (11.46)

The term with v^n on the right-hand-side is not exactly what it should be if P1
depends on time, but the difference is O(k^3) which is allowed.
In order to match the inhomogeneous term in (11.43) a natural choice for α and
β would be α = kν^{n+1} , β = 0.
One could question the relevance of the first order terms on the right-hand-side of
(11.44) and (11.45). Actually the k 2 -term in (11.42) could easily be incorporated
in the O(k 2)-term and the result would be a simpler version of formula (11.43)
which could be split into
n+1
(I − kP1h )ṽ = vn + α (11.47)
n+1 n+1
(I − kP2h )v = ṽ + β (11.48)

where we again would suggest α = kν^{n+1} , β = 0 in order to match a possible


inhomogeneous term.
The practical considerations are dealt with as for Peaceman-Rachford or D’Yako-
nov. We just summarize the results for the computational work which is similar
to Peaceman-Rachford for (11.44) – (11.45) and slightly less (∼ 18LM SAO) for
the simpler scheme (11.47) – (11.48).

11.10 Stability of Douglas-Rachford

Again we expect to inherit the unconditional stability of the implicit method


but we had better check it directly. We again look at the special case ut =
b1 uxx + b2 uyy , and we use (11.30) and (11.34). From (11.44) – (11.45) we get

(1 + 4x1 )g̃ = 1 − 4x2 , (1 + 4x2 )g = g̃ + 4x2 , (11.49)

such that
g = (1 − 4x2 + (1 + 4x1)·4x2) / ((1 + 4x1)(1 + 4x2)) = (1 + 16x1 x2) / (1 + 4x1 + 4x2 + 16x1 x2). (11.50)

Since x1 > 0 and x2 > 0 we have 0 < g < 1 just like we hoped. For the simpler
scheme (11.47) – (11.48) the result is
g = 1 / ((1 + 4x1)(1 + 4x2)) (11.51)

which also ensures 0 < g < 1. We mention in passing that the original implicit
method would have given
g = 1 / (1 + 4x1 + 4x2) (11.52)
For large values of x1 and x2 , i.e. large values of bi µi and ϕi ≈ π, formula
(11.50) gives g ≈ 1 corresponding to weak damping whereas (11.51) gives g ≈ 0
corresponding to strong damping and even stronger than with the implicit scheme
(11.52). This may speak in favour of the simpler scheme (11.47) – (11.48).
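These damping claims are easy to check numerically. A small sketch (Python/numpy) evaluating the growth factors (11.35), (11.50), (11.51), and (11.52) over a grid of frequencies for given values of b1 µ1 and b2 µ2:

import numpy as np

def growth_factors(b1mu1, b2mu2, n=101):
    phi = np.linspace(1e-3, np.pi, n)
    x1 = b1mu1 * np.sin(phi / 2)[:, None]**2
    x2 = b2mu2 * np.sin(phi / 2)[None, :]**2
    g = {"PR   (11.35)": (1 - 2*x1)*(1 - 2*x2) / ((1 + 2*x1)*(1 + 2*x2)),
         "TDR  (11.50)": (1 + 16*x1*x2) / (1 + 4*x1 + 4*x2 + 16*x1*x2),
         "SDR  (11.51)": 1 / ((1 + 4*x1)*(1 + 4*x2)),
         "impl (11.52)": 1 / (1 + 4*x1 + 4*x2)}
    for name, gi in g.items():
        print(name, "min =", round(gi.min(), 4), "max =", round(gi.max(), 4))

growth_factors(10.0, 10.0)   # TDR: max near 1 (weak damping); SDR: strong damping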

11.11 The local truncation error

In order to check the local truncation error we use the symbols of the differential
and difference operators (cf. section 3.3). We consider the simple equation

ut − b1 uxx − b2 uyy = ν (11.53)

with constant coefficients and use the test functions


v_lm^n = e^{snk} e^{ilh1 ξ1} e^{imh2 ξ2} = e^{st} e^{ixξ1} e^{iyξ2} , ṽ_lm = g̃ v_lm^n . (11.54)

The symbol for the differential operator in (11.53) is

p(s, ξ1, ξ2) = s + b1 ξ1^2 + b2 ξ2^2. (11.55)

We first look at the simple scheme (11.47) – (11.48) where we get
(1 + 4b1 (k/h1^2) sin^2(h1 ξ1 /2))g̃ = 1 + kν^{n+1}, (11.56)
(1 + 4b2 (k/h2^2) sin^2(h2 ξ2 /2))e^{sk} = g̃ (11.57)
or
(1 + 4b1 (k/h1^2) sin^2(h1 ξ1 /2))(1 + 4b2 (k/h2^2) sin^2(h2 ξ2 /2))e^{sk} − 1 = kν^{n+1}. (11.58)
The left-hand-side contains the terms which originate from the operator Pk,h and
the right-hand-side refers to Rk,h . Since Douglas-Rachford is derived from the
implicit method the natural expansion point is at t = (n + 1)k. With this choice
the right-hand-side operator Rk,h becomes the identity and the corresponding
symbol
rk,h (s, ξ1 , ξ2) = 1. (11.59)
This also means that we should divide the left-hand-side of (11.58) by e^{sk} before
Taylor expansion which then gives
(1 + kb1 ξ1^2 + O(kh1^2))(1 + kb2 ξ2^2 + O(kh2^2)) − (1 − sk + (1/2)s^2 k^2 + O(k^3)). (11.60)
Before we begin checking orders we should remember that we have multiplied by
k in order to get formula (11.41) and the following formulae. Therefore we must
divide (11.60) by k in order to get back to the standard form and we now get
pk,h(s, ξ1, ξ2) = b1 ξ1^2 + b2 ξ2^2 + kb1 b2 ξ1^2 ξ2^2 + s − (1/2)s^2 k + O(k^2 + h1^2 + h2^2). (11.61)
We now combine (11.55), (11.59) and (11.61) in
pk,h − rk,h p = (1/2)k(b1 b2 ξ1^2 ξ2^2 − s^2) + O(k^2 + h1^2 + h2^2). (11.62)
Formula (11.62) shows that the Simple Douglas-Rachford (SDR) scheme (11.47)
– (11.48) is indeed first order in time and second order in the space variables
as we would expect for a scheme derived from the implicit method. We note
that w.r.t. the order of the local truncation error it is not important whether we
compute the inhomogeneous term at t = (n + 1)k or at t = nk. It might have an
effect on the size of the error, though.
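The k-term in (11.62) can be verified with a small symbolic computation (Python/sympy); here we let h1, h2 → 0 first, replacing each 4/h^2 sin^2(hξ/2) by ξ^2, which suffices for checking the order in k:

import sympy as sp

s, k, b1, b2, xi1, xi2 = sp.symbols('s k b1 b2 xi1 xi2')
# left-hand side of (11.58) with the sine factors replaced by xi^2,
# divided by e^{sk} and by k as described above
lhs = ((1 + k*b1*xi1**2)*(1 + k*b2*xi2**2)*sp.exp(s*k) - 1) / k
pkh = sp.series(lhs * sp.exp(-s*k), k, 0, 2).removeO()
diff = sp.expand(pkh - (s + b1*xi1**2 + b2*xi2**2))
print(diff)   # k*b1*b2*xi1**2*xi2**2 - k*s**2/2, i.e. (1/2)k(b1 b2 xi1^2 xi2^2 - s^2)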
For the Traditional Douglas-Rachford (TDR) scheme we have instead of (11.58)
(1 + 4b1 (k/h1^2) sin^2(h1 ξ1 /2))(1 + 4b2 (k/h2^2) sin^2(h2 ξ2 /2))e^{sk} − (1 − 4b2 (k/h2^2) sin^2(h2 ξ2 /2))
− (1 + 4b1 (k/h1^2) sin^2(h1 ξ1 /2)) · 4b2 (k/h2^2) sin^2(h2 ξ2 /2) = kν^{n+1}. (11.63)

With the expansion point at t = (n+1)k we again get rk,h = 1 and using a Taylor
expansion of the left hand side:

(1 + kb1 ξ1^2 + O(kh1^2))(1 + kb2 ξ2^2 + O(kh2^2)) (11.64)
− (1 + (kb1 ξ1^2 + O(kh1^2))kb2 ξ2^2 + O(kh2^2))(1 − sk + (1/2)s^2 k^2 + O(k^3)).
The symbol of the difference operator becomes
pk,h(s, ξ1, ξ2) = s + b1 ξ1^2 + b2 ξ2^2 + k(b1 b2 ξ1^2 ξ2^2 − (1/2)s^2 − b1 b2 ξ1^2 ξ2^2) (11.65)
+ O(k^2 + h1^2 + h2^2)

and the local truncation error is


pk,h − rk,h p = −(1/2)ks^2 + O(k^2 + h1^2 + h2^2). (11.66)
This result looks more elegant than (11.62) but whether the error becomes smaller
is quite a different matter.
For Peaceman-Rachford (PR) and D’Yakonov the formula corresponding to
(11.60) and (11.64) is
(1 + (1/2)kb1 ξ1^2 + O(kh1^2))(1 + (1/2)kb2 ξ2^2 + O(kh2^2))(1 + sk + (1/2)s^2 k^2 + O(k^3))
− (1 − (1/2)kb1 ξ1^2 + O(kh1^2))(1 − (1/2)kb2 ξ2^2 + O(kh2^2)) (11.67)
such that
pk,h(s, ξ1, ξ2) = s + (1/2)b1 ξ1^2 + (1/2)b2 ξ2^2 + (1/2)b1 ξ1^2 + (1/2)b2 ξ2^2 (11.68)
+ k((1/2)b1 sξ1^2 + (1/2)b2 sξ2^2 + (1/4)b1 b2 ξ1^2 ξ2^2 + (1/2)s^2 − (1/4)b1 b2 ξ1^2 ξ2^2)
+ O(k^2 + h1^2 + h2^2).

The exact expression for rk,h (s, ξ1 , ξ2 ) depends on which one of the choices (11.24)
– (11.26) we select, but up to O(k^2), and O(h1^2) in case of (11.24), we get
rk,h(s, ξ1, ξ2) = 1 + (1/2)sk + O(k^2 + h1^2). (11.69)
We now combine (11.55), (11.68), and (11.69):
pk,h − rk,h p = s + b1 ξ1^2 + b2 ξ2^2 + (1/2)k(b1 sξ1^2 + b2 sξ2^2 + s^2)
− (1 + (1/2)sk)(s + b1 ξ1^2 + b2 ξ2^2) + O(k^2 + h1^2 + h2^2) (11.70)
= O(k^2 + h1^2 + h2^2)

showing that Peaceman-Rachford is indeed second order accurate at least for
the simple equation (11.53) with constant coefficients. Extending the result to
lower order terms presents no problem but if the coefficients are allowed to vary
with space and time we have a more complicated picture as mentioned in the
discussion on page 107.

11.12 The global error

In order to study the global error we introduce a set of auxiliary functions which
may depend on (t, x, y) but not on (k, h1 , h2 ), and we assume that the numerical
solution can be written as

v = u − h1 c1 − h2 c2 − kd − h1 ke1 − h2 ke2 − h1^2 f1 − h2^2 f2 − k^2 g − · · · (11.71)

We need a similar assumption for the intermediate values

ṽ = ũ − h1 c̃1 − h2 c̃2 − kd̃ − h1 kẽ1 − h2 kẽ2 − h1^2 f̃1 − h2^2 f̃2 − k^2 g̃ − · · · (11.72)

and we shall seek information on these auxiliary functions. We shall assume


Dirichlet boundary conditions and therefore have homogeneous side conditions
for all the auxiliary functions. Beginning with the simple version of Douglas-
Rachford (11.47) – (11.48) we have
(I − kP1h^{n+1})ṽ = v^n + kν^{n+1} (11.73)
(I − kP2h^{n+1})v^{n+1} = ṽ (11.74)

where P1h and P2h are discretized versions of P1 and P2 from (11.7) – (11.8).
Using Taylor expansion we have

P2h u = (b2 δy^2 − a2 µ̃δy + (1 − θ)κ)u
= b2 uyy − a2 uy + (1 − θ)κu + (1/12)b2 h2^2 u4y − (1/6)a2 h2^2 uyyy + O(h2^4)
= P2 u + (1/12)b2 h2^2 u4y − (1/6)a2 h2^2 uyyy + O(h2^4) (11.75)
and similarly for the auxiliary functions and for P1 .
Inserting (11.71) on the left-hand-side of (11.74) and applying (11.75) we have
with t = (n + 1)k as expansion point

(I − kP2h)v^{n+1} = (I − kP2h)(u − h1 c1 − · · · − k^2 g) + O(· · ·)
= (I − kP2)(u − h1 c1 − · · · − k^2 g) (11.76)
− (1/12)b2 kh2^2 u4y + (1/6)a2 kh2^2 uyyy + O(· · ·).
O(· · ·) shall here and in the following indicate third order terms in k, h1 , and h2
and will therefore include the two terms with u4y and uyyy .
Inserting (11.72) on the right-hand-side of (11.74) and equating terms we get

ũ = u, c̃1 = c1 , c̃2 = c2 , f̃1 = f1 , f̃2 = f2 (11.77)

together with

d˜ = d + P2 u, (11.78)
ẽ1 = e1 − P2 c1 , (11.79)
ẽ2 = e2 − P2 c2 , (11.80)
g̃ = g − P2 d. (11.81)

Next we insert (11.72) on the left-hand-side of (11.73)

(I − kP1h)ṽ = (I − kP1h)(ũ − h1 c̃1 − · · · − k^2 g̃) + O(· · ·)
= (I − kP1)(ũ − h1 c̃1 − · · · − k^2 g̃) (11.82)
− (1/12)b1 kh1^2 u4x + (1/6)a1 kh1^2 uxxx + O(· · ·)
For the right-hand-side of (11.73) we must remember that the expansion point is
at t = (n + 1)k and that v n therefore is one time step earlier:

v^n + kν^{n+1} = (u − h1 c1 − · · · − k^2 g) + kν (11.83)
− kut + h1 kc1t + h2 kc2t + k^2 dt + (1/2)k^2 utt + O(· · ·)
Equating terms in (11.82) and (11.83) confirms (11.77) and adds

d̃ + P1 ũ = d + ut − ν, (11.84)
ẽ1 − P1 c̃1 = e1 − c1t , (11.85)
ẽ2 − P1 c̃2 = e2 − c2t , (11.86)
g̃ − P1 d̃ = g − dt − (1/2)utt . (11.87)
Comparing (11.78) and (11.84) and remembering that ũ = u we have

d + P2 u = d + ut − P1 u − ν
or
ut − P1 u − P2 u = ν (11.88)

confirming the consistency of our assumptions.


(11.79) and (11.85) together with (11.77) gives

e1 − P2 c1 = e1 − c1t + P1 c1

or

c1t − P1 c1 − P2 c1 = 0. (11.89)

From (11.80) and (11.86) we get a similar equation for c2 and since the side
conditions are also homogeneous we may conclude that c1 ≡ c2 ≡ 0.
From (11.81) and (11.87) we get using (11.78)
g − P2 d = g − dt + P1 d̃ − (1/2)utt = g − dt + P1 d − (1/2)utt + P1 P2 u
or
dt − P1 d − P2 d = −(1/2)utt + P1 P2 u (11.90)
showing that d(t, x, y) is not identically 0 and that the error therefore is first
order in k. If we continue in this way we shall see that f1 and f2 are also different
from 0 and that the error therefore is O(k + h1^2 + h2^2) as we would expect.
We might note here that we have multiplied by k in order to get to equations
(11.73) – (11.74) and this is the reason why we only get information about the
auxiliary functions corresponding to the first order terms in (11.71) even though
we compare terms up to and including second order.
For the traditional Douglas-Rachford scheme (11.44) – (11.45) the equations are

(I − kP1h )ṽ = (I + kP2h )v n + kν n+1 , (11.91)


(I − kP2h )v n+1 = ṽ − kP2h v n . (11.92)

The extra term which has been added in (11.91) and subtracted in (11.92) is

kP2h v^n = kP2h(u^n − h1 c1 − h2 c2 − kd) + O(· · ·)
= kP2(u^n − h1 c1 − h2 c2 − kd) + O(· · ·)
= kP2(u^{n+1} − h1 c1 − h2 c2 − kd) − k^2 P2 ut + O(· · ·) (11.93)

where the last term is due to the expansion point being at t = (n + 1)k.
Equating terms in (11.92) gives the equalities (11.77) together with

d̃ = d, ẽ1 = e1 , ẽ2 = e2 , g̃ = g + P2 ut (11.94)

and from (11.91) we get

d̃ + P1 ũ = d + ut − P2 u − ν, (11.95)
ẽ1 − P1 c̃1 = e1 − c1t + P2 c1 , (11.96)
ẽ2 − P1 c̃2 = e2 − c2t + P2 c2 , (11.97)
g̃ − P1 d̃ = g − dt + P2 d − (1/2)utt + P2 ut . (11.98)
The first three imply (11.88), (11.89), and its analogue such that we again can
conclude that c1 ≡ c2 ≡ 0.
From (11.94) and (11.98) we finally deduce that
dt − P1 d − P2 d = −(1/2)utt (11.99)
in accordance with our expectations that Douglas-Rachford is first order in time.
We note that ṽ now is a rather good approximation to v n+1 , but whether v n+1
now is a better or worse approximation to u is impossible to decide.
A similar analysis can be performed for Peaceman-Rachford and D’Yakonov to
show that these methods are indeed second order in all three step sizes.
In practice we can check the orders and estimate the various contributions to the
error using the methods from Chapter 10 taking each step size separately.

11.13 Exercises
1. Calculate the growth factor, g, for the implicit method and Crank-Nicolson
on (11.9) and show that both methods are unconditionally stable.

2. Investigate the stability of D’Yakonov’s method (11.36) – (11.37) when


applied to (11.9).

3. Solve (11.9) with b1 = b2 = 1 on the unit square 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 and


for 0 ≤ t ≤ 0.5 using the Simple Douglas-Rachford (SDR) method (11.47)
– (11.48) with h1 = h2 = k = 1/10, 1/20, and 1/40.
Initial and boundary values are taken from the true solution
u(t, x, y) = e^{−4t} sin(x − y) cos(x + y).
Compute the max-norm and the 2-norm of the error for
t = 0.1, 0.2, 0.3, 0.4, 0.5.

4. Solve the problem from the previous exercise using the Traditional
Douglas-Rachford (TDR) method (11.44) – (11.45).

5. Solve the problem from the previous exercise using the Peaceman-Rachford
(PR) method.
6. Solve the problem from exercise 3 with SDR and with h1 = h2 = k = 1/40.
Using calculations with 2h1 , 4h1 , resp. 2h2 and 4h2 , resp. 2k, 4k you should
gain information about the order in h1 , h2 , and k as well as about the first
terms in the expression for the error in all grid points corresponding to 4h1 ,
4h2 , 4k at t = 1.

7. Same as above with TDR.

8. Same as above with PR.

9. Using the results from exercise 6 perform Richardson extrapolation in k.


Under the assumption that the order in time is now 2, what is the
k-contribution to the error?

10. Same as above but with TDR.

Chapter 12

Equations with Mixed Derivative


Terms

We now return to the general equation (11.1). As a difference approximation to


uxy we shall use

2 vl,m+1 − vl,m−1
δxy vlm = µ̃x δx (µ̃y δy vlm ) = µ̃x δx ( )
2h2
1
= (vl+1,m+1 − vl−1,m+1 − vl+1,m−1 + vl−1,m−1 ) (12.1)
4h1 h2
2
= µ̃y δy (µ̃x δx vlm ) = δyx vlm .
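A minimal sketch (Python/numpy) of (12.1) applied at all interior points of an (L+1) × (M+1) grid array v:

import numpy as np

def delta_xy(v, h1, h2):
    # returns the (L-1) x (M-1) array of cross differences at the interior points
    return (v[2:, 2:] - v[:-2, 2:] - v[2:, :-2] + v[:-2, :-2]) / (4 * h1 * h2)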

There is no obvious way of splitting the mixed derivative or difference operator


between the two operators P1 and P2 in (11.6) so we shall instead treat the mixed
derivative term in a way analogous to what we did for the inhomogeneous term.
The first scheme we consider is the Simple Douglas-Rachford scheme (11.47) –
(11.48) where α and β now should be chosen to take care of the mixed derivative
term (in addition to a possible inhomogeneous term which we shall disregard
here).
Following the analysis on page 112 we shall select α and β such that
α + (I − kP1h^{n+1})β = 2kb12 δxy^2 v^{n+1} + O(k^2). (12.2)

There are three straightforward choices for α and β which will satisfy (12.2):
α = β = kb12 δxy^2 v^n , (12.3)
α = kb12 δxy^2 v^n , β = kb12 δxy^2 ṽ, (12.4)
α = 2kb12 δxy^2 v^n , β = 0. (12.5)

For the Traditional Douglas-Rachford scheme the condition is the same so the
same three possibilities for α and β apply.
For the Peaceman-Rachford scheme (11.20) – (11.21) we would aim at
(I + (k/2)P1h^n)α + (I − (k/2)P1h^{n+1})β = kb12 δxy^2 (v^{n+1} + v^n) + O(k^3). (12.6)
This is a bit more difficult to achieve. Two obvious suggestions are (12.3) and
(12.4) but they are not quite accurate enough and the resulting method becomes
only first order in time.
Formula (12.3) would be good enough if we could replace v^n by an intermediate
value v^{n+1/2}. An approximation to a value at time t = (n + 1/2)k can be obtained
by extrapolation from values at t = (n − 1)k and t = nk:
v̂^{n+1/2} = v^n + (1/2)(v^n − v^{n−1}) (12.7)
and a good suggestion for α and β is now
α = β = kb12 δxy^2 v̂^{n+1/2} . (12.8)

In the same manner (12.4) would be good if we could replace ṽ by v^{n+1}. An
extrapolated value would here be v^n + (v^n − v^{n−1}) such that we have
α = kb12 δxy^2 v^n , β = kb12 δxy^2 (2v^n − v^{n−1}) (12.9)

and both (12.8) and (12.9) would lead to schemes which are second order in time.

12.1 Practical considerations

We now have about a dozen different combinations but they are not all equally
good. The first point to consider is how to get boundary values at x = X1 and
x = X2 for ṽ. For the Simple Douglas-Rachford scheme (11.47) – (11.48) we solve
(11.48) to get
ṽ = (I − kP2h^{n+1})v^{n+1} − β (12.10)

If we choose formula (12.5) then β = 0 and (12.10) can be used as it stands.


If we choose formula (12.3) then the β-term involves δxy^2 v^n which cannot be
calculated at the boundary points. There are two ways around this.
1. Take the difference at the nearest neighbour point, e.g.
δxy^2 v_{0m}^n := δxy^2 v_{1m}^n

This will introduce an O(h1 )-error and that is not ideal.
2. Use a linear extrapolation in the x-direction, e.g.
δxy^2 v_{0m}^n := 2δxy^2 v_{1m}^n − δxy^2 v_{2m}^n .

Formula (12.4) presents even bigger problems since we cannot calculate δxy^2 ṽ at
the neighbouring points to any of the boundaries. We suggest either to use
extrapolation as above or to use (11.29) which is accurate enough here.
For the Traditional Douglas-Rachford scheme (11.44) – (11.45) the formula for ṽ
at the x-boundaries is
ṽ = (I − kP2h^{n+1})v^{n+1} + kP2h^n v^n − β (12.11)

and the same considerations apply.


For the Peaceman-Rachford scheme (11.20)-(11.21) the formula for ṽ is (11.27)
and this clearly favours the case where α = β so that (12.3) and (12.8) are ideal
choices. (12.4), (12.5), and (12.9) can be tackled using the suggestions above.

12.2 Stability with mixed derivative

We shall study stability requirements in relation to the differential equation

ut = b1 uxx + 2b12 uxy + b2 uyy (12.12)

with the discretization


(v_lm^{n+1} − v_lm^n)/k = (1 − θ)(b1 δx^2 + 2b12 δxy^2 + b2 δy^2)v_lm^n (12.13)
+ θ(b1 δx^2 + 2b12 δxy^2 + b2 δy^2)v_lm^{n+1}

where θ = 0, 0.5, and 1 corresponds to the explicit, the Crank-Nicolson, and the
implicit method, respectively. We put the discretized solution on the form
v_lm^n = e^{snk} e^{iξ1 lh1} e^{iξ2 mh2} = g^n e^{ilϕ1} e^{imϕ2} (12.14)

and use the abbreviations


x1 = b1 µ1 sin^2(ϕ1/2), (12.15)
x2 = b2 µ2 sin^2(ϕ2/2), (12.16)
x12 = b12 µ12 sin ϕ1 sin ϕ2 . (12.17)

Remember that the condition for (12.12) to be well-posed is

b12^2 < b1 b2 (12.18)

together with b1 > 0 and b2 > 0. It follows that

b12^2 µ12^2 = b12^2 k^2/(h1^2 h2^2) < b1 b2 k^2/(h1^2 h2^2) = b1 µ1 b2 µ2 (12.19)
or
|b12| µ12 < √(b1 µ1 b2 µ2). (12.20)

We also have
q q q
0 ≤ ( b1 µ1 − b2 µ2 )2 = b1 µ1 + b2 µ2 − 2 b1 µ1 b2 µ2 . (12.21)

Combining (12.20) and (12.21) we get


2|b12| µ12 < 2√(b1 µ1 b2 µ2) ≤ b1 µ1 + b2 µ2 . (12.22)

Similarly we have
0 ≤ ((√b1/h1) sin(ϕ1/2) − (√b2/h2) sin(ϕ2/2))^2 (12.23)
= (b1/h1^2) sin^2(ϕ1/2) + (b2/h2^2) sin^2(ϕ2/2) − 2(√(b1 b2)/(h1 h2)) sin(ϕ1/2) sin(ϕ2/2).

Use (12.18), multiply by k and rearrange


2|b12| µ12 |sin(ϕ1/2) sin(ϕ2/2)| ≤ b1 µ1 sin^2(ϕ1/2) + b2 µ2 sin^2(ϕ2/2). (12.24)
Now multiply by 4|cos(ϕ1/2) cos(ϕ2/2)| ≤ 4 and get
2|b12| µ12 |sin ϕ1 sin ϕ2| ≤ 4b1 µ1 sin^2(ϕ1/2) + 4b2 µ2 sin^2(ϕ2/2) (12.25)
or

2|x12 | ≤ 4x1 + 4x2 . (12.26)

Furthermore we have

x12^2 = b12^2 µ12^2 sin^2 ϕ1 sin^2 ϕ2
≤ 16 b1 b2 µ1 µ2 sin^2(ϕ1/2) cos^2(ϕ1/2) sin^2(ϕ2/2) cos^2(ϕ2/2)
= 16 x1 x2 cos^2(ϕ1/2) cos^2(ϕ2/2) ≤ 16 x1 x2 . (12.27)
For the explicit scheme the growth factor becomes

g = 1 − 4x1 − 4x2 − 2x12 (12.28)

and for stability we require −1 ≤ g ≤ 1. g ≤ 1 is equivalent to (12.26) and is


therefore satisfied for a well-posed problem. g ≥ −1 is equivalent to

2x1 + 2x2 + x12 ≤ 1 (12.29)

This inequality must be fulfilled for all ϕ1 and ϕ2 , in particular for ϕ1 = ϕ2 = π


where it reduces to

2b1 µ1 + 2b2 µ2 ≤ 1 (12.30)

so this relation which we recognize from the equation without the mixed term is a
necessary condition for stability. The mixed term is significant for ϕ1 = ϕ2 = π/2
where (12.29) reduces to

b1 µ1 + b2 µ2 + b12 µ12 ≤ 1 (12.31)

This inequality follows from (12.30) and (12.22) (cf. exercise 1).
If we put ϕ1 = π − ε1 and ϕ2 = π − ε2 then sin ϕ1 = sin ε1 and
sin^2(ϕ1/2) = cos^2(ε1/2) = 1 − sin^2(ε1/2)
and similarly with index 2. We therefore have
2x1 + 2x2 + x12 = 2b1 µ1 + 2b2 µ2 − 2b1 µ1 sin^2(ε1/2) − 2b2 µ2 sin^2(ε2/2) + b12 µ12 sin ε1 sin ε2 .
Since by (12.30) the sum of the first two terms on the right-hand-side is ≤ 1 we
just need to show that the sum of the last three terms is non-positive, which is
seen by the following
0 ≤ (2√(b1 µ1) sin(ε1/2) − 2√(b2 µ2) sin(ε2/2))^2
= 4b1 µ1 sin^2(ε1/2) + 4b2 µ2 sin^2(ε2/2) − 8√(b1 µ1 b2 µ2) sin(ε1/2) sin(ε2/2)
≤ 4b1 µ1 sin^2(ε1/2) + 4b2 µ2 sin^2(ε2/2) − 2b12 µ12 sin ε1 sin ε2 .
For the last inequality we have used (12.19) and
sin ε1 = 2 sin(ε1/2) cos(ε1/2) ≤ 2 sin(ε1/2).
We conclude that (12.30) is also a sufficient condition for stability of the explicit
method applied to (12.12).

For the implicit scheme the growth factor is
g = 1/(1 + 4x1 + 4x2 + 2x12) (12.32)
and because of (12.26) we always have 0 ≤ g ≤ 1.
For Crank-Nicolson we have
g = (1 − 2x1 − 2x2 − x12)/(1 + 2x1 + 2x2 + x12) (12.33)
and because of (12.26) we always have −1 ≤ g ≤ 1.
Altogether the mixed derivative term does not alter the basic stability properties
of the explicit, Crank-Nicolson, or the implicit method. But in practice we do
not wish to use any of these. We would rather prefer an ADI-method.

12.3 Stability of ADI-methods

We first look at the Simple Douglas-Rachford scheme (11.47)-(11.48) together


with α = β as given by (12.3). The equations for the growth factor are
(1 + 4x1)g̃ = 1 − x12 , (12.34)
(1 + 4x2)g = g̃ − x12 , (12.35)
g = (1 − x12)/((1 + 4x1)(1 + 4x2)) − x12/(1 + 4x2) = (1 − 2x12 − 4x1 x12)/((1 + 4x1)(1 + 4x2)). (12.36)
A necessary condition for stability is g ≤ 1 or
−x12 (1 + 2x1) ≤ 2x1 + 2x2 + 8x1 x2
or
−x12 ≤ 2x2 + 2x1 (1 + 2x2)/(1 + 2x1).
Comparing with (12.26) we suspect that we may be in trouble when x2 < x1
and µ is large. Actually the inequality is violated when b1 = b2 = 1, b12 = 0.9,
h1 = h2 , ϕ1 = π/2, µ > 10, and ϕ2 is small and negative.
If we combine the Simple Douglas-Rachford scheme with (12.4) the equations for
the growth factor are
(1 + 4x1 )g̃ = 1 − x12 ,
(1 + 4x2 )g = g̃(1 − x12 ),
g = (1 − x12)/(1 + 4x2) · (1 − x12)/(1 + 4x1). (12.37)

We notice immediately that g ≥ 0 and the condition g ≤ 1 is equivalent to

2|x12 | + x212 ≤ 4x1 + 4x2 + 16x1 x2

which follows from (12.26) and (12.27). We conclude that this combination is
unconditionally stable.
If we combine the Simple Douglas-Rachford scheme with (12.5) the equations for
the growth factor are

(1 + 4x1 )g̃ = 1 − 2x12 ,


(1 + 4x2 )g = g̃,
g = (1 − 2x12)/((1 + 4x1)(1 + 4x2)). (12.38)

From (12.26) it follows readily that −1 ≤ g ≤ 1 and that we therefore have


unconditional stability which makes this scheme very interesting indeed.
If we combine the Traditional Douglas-Rachford scheme (11.44) – (11.45) with
(12.3) the equations for the growth factor are

(1 + 4x1 )g̃ = 1 − 4x2 − x12 ,


(1 + 4x2 )g = g̃ + 4x2 − x12 ,
g = (1 − 4x2 − x12)/((1 + 4x1)(1 + 4x2)) + (4x2 − x12)/(1 + 4x2)
= (1 − 2x12 − 4x1 x12 + 16x1 x2)/((1 + 4x1)(1 + 4x2)). (12.39)

Comparing with Simple Douglas-Rachford it is apparent that we have even


greater problems with the stability condition g ≤ 1.
If we combine the Traditional Douglas-Rachford scheme with (12.4) the equations
for the growth factor are

(1 + 4x1 )g̃ = 1 − 4x2 − x12 ,


(1 + 4x2 )g = g̃(1 − x12 ) + 4x2 ,
g = (1 − x12)/(1 + 4x2) · (1 − x12 − 4x2)/(1 + 4x1) + 4x2/(1 + 4x2)
= ((1 − x12)^2 + 4x2 x12 + 16x1 x2)/((1 + 4x1)(1 + 4x2)). (12.40)

If we supplement our earlier counterexample with ϕ2 = π/2, and µ > 10 then we


have g > 1 violating the stability requirement.

If we combine the Traditional Douglas-Rachford scheme with (12.5) the equations
for the growth factor are

(1 + 4x1 )g̃ = 1 − 4x2 − 2x12 ,


(1 + 4x2 )g = g̃ + 4x2 ,
g = (1 − 4x2 − 2x12)/((1 + 4x1)(1 + 4x2)) + 4x2/(1 + 4x2)
= (1 + 16x1 x2 − 2x12)/((1 + 4x1)(1 + 4x2)). (12.41)
From (12.26) it follows readily that −1 ≤ g ≤ 1 and once again we have a useful
combination.
If we combine the Peaceman-Rachford scheme (11.20)-(11.21) with (12.3) the
equations for the growth factor are

(1 + 2x1 )g̃ = 1 − 2x2 − x12 ,


(1 + 2x2 )g = g̃(1 − 2x1 ) − x12 ,
g = (1 − 2x1)(1 − 2x2 − x12)/((1 + 2x1)(1 + 2x2)) − x12/(1 + 2x2)
= ((1 − 2x1)(1 − 2x2) − 2x12)/((1 + 2x1)(1 + 2x2)). (12.42)
g ≤ 1 follows directly from (12.26), and g ≥ −1 is equivalent to

1 + 4x1 x2 − 2x12 ≥ −1 − 4x1 x2

or
1 + 4x1 x2 − x12 ≥ 0.
If ϕ1 and ϕ2 have different signs or if b12 < 0 then x12 < 0 and we are done. We
can therefore assume 0 < ϕ1 , ϕ2 < π and b12 > 0.
0 ≤ (1 − 2√(b1 b2)(k/(h1 h2)) sin(ϕ1/2) sin(ϕ2/2))^2
= 1 + 4b1 µ1 b2 µ2 sin^2(ϕ1/2) sin^2(ϕ2/2) − 4√(b1 b2) µ12 sin(ϕ1/2) sin(ϕ2/2)
≤ 1 + 4x1 x2 − 4b12 µ12 sin(ϕ1/2) sin(ϕ2/2) cos(ϕ1/2) cos(ϕ2/2)
= 1 + 4x1 x2 − x12

thus proving stability.


If we combine the Peaceman-Rachford scheme with (12.4) the equations for the
growth factor are

(1 + 2x1 )g̃ = 1 − 2x2 − x12 ,

(1 + 2x2)g = g̃(1 − 2x1 − x12),
g = (1 − 2x1 − x12)(1 − 2x2 − x12)/((1 + 2x1)(1 + 2x2)). (12.43)
If x12 > 2 which can easily happen when µ is large, then it is clear that the
numerator is greater than the denominator, implying g > 1 and thus instability.
A specific example is b1 = b2 = 1, b12 = 0.5,
k = h1 = h2 = 0.1 ⇒ µ1 = µ2 = µ12 = 10, ϕ1 = ϕ2 = π/2
leading to x1 = x2 = x12 = 5 and g = (14/11)^2 > 1.
If we combine the Peaceman-Rachford scheme with (12.5) the equations for the
growth factor are
(1 + 2x1 )g̃ = 1 − 2x2 − 2x12 ,
(1 + 2x2 )g = g̃(1 − 2x1 ),
g = (1 − 2x1)(1 − 2x2 − 2x12)/((1 + 2x1)(1 + 2x2)). (12.44)
Taking the same example as above we get g = (9 · 19)/11^2 > 1 proving instability.
If we combine the Peaceman-Rachford scheme with (12.8) the equations for the
growth factor are
(1 + 2x1)g̃ = 1 − 2x2 − x12 (3/2 − (1/2)g^{−1}),
(1 + 2x2)g = g̃(1 − 2x1) − x12 (3/2 − (1/2)g^{−1}),
(1 + 2x1)(1 + 2x2)g = (1 − 2x1)(1 − 2x2 − x12 (3/2 − (1/2)g^{−1}))
− x12 (3/2 − (1/2)g^{−1})(1 + 2x1)
= (1 − 2x1)(1 − 2x2) − x12 (3 − g^{−1}).
We thus have a quadratic equation for g:
(1 + 2x1)(1 + 2x2)g^2 − ((1 − 2x1)(1 − 2x2) − 3x12)g − x12 = 0. (12.45)
With b1 = b2 = 1, b12 = 0.9, µ1 = µ2 = µ12 = 10, ϕ1 = ϕ2 = π/5 one of the
roots of this equation is −1.29 and this scheme is therefore only conditionally
stable. When b12 ≤ 0.5 it appears that the roots are always ≤ 1 (independent of
µ) in absolute value, and we propose that this scheme can be used whenever the
mixed term has a small weight.
If we combine the Peaceman-Rachford scheme with (12.9) the equations for the
growth factor are
(1 + 2x1 )g̃ = 1 − 2x2 − x12 ,

(1 + 2x2)g = g̃(1 − 2x1) − x12 (2 − g^{−1}),

(1 + 2x1)(1 + 2x2)g^2 − ((1 − 2x1)(1 − 2x2) − x12 (3 + 2x1))g − (1 + 2x1)x12 = 0.

The product of the two roots is


x12/(1 + 2x2) = b12 µ12 sin ϕ1 sin ϕ2 / (1 + 2b2 µ2 sin^2(ϕ2/2)).

With b2 = 1, b12 = 0.5, µ2 = µ12 = 10, ϕ1 = π/2, ϕ2 = 0.5 we get


|g1 · g2 | ≈ 1.486 > 1. If the product of the roots is greater than one then at least
one of the roots must be greater than one in absolute magnitude thus implying
instability.

12.4 Summary

Table 12.1: A comparison of methods with mixed derivative

         SDR   TDR   PR            PR
12.3      −     −    ++     12.8   (+)
12.4      +     −     −     12.9    −
12.5     ++    ++     −

An assessment of the various combinations of ADI-schemes with suggestions for


treating the mixed derivative term is given in Table 12.1. A − indicates con-
ditional stability, a + indicates unconditional stability, and a ++ indicates a
recommended combination where the practicalities also can be solved. Formula
(12.5) is recommended with either the simple or the traditional Douglas-Rachford
method. Experiments indicate that SDR together with (12.4) is stable provided
(11.29) is used, but not together with extrapolation. The Peaceman-Rachford
method plays well together with (12.3) although the result will only be first or-
der in t. For a second order method we recommend (12.8) when the mixed term
is suitably small.
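The entries of Table 12.1 can be reproduced by scanning the growth factors over the frequencies. A small sketch (Python/numpy) comparing (12.36) (SDR with (12.3)) and (12.38) (SDR with (12.5)) for b1 = b2 = 1, b12 = 0.9, and equal step sizes:

import numpy as np

def max_abs_g(mu, b12=0.9, n=401):
    phi = np.linspace(-np.pi, np.pi, n)
    p1, p2 = np.meshgrid(phi, phi)
    x1 = mu * np.sin(p1 / 2)**2
    x2 = mu * np.sin(p2 / 2)**2
    x12 = b12 * mu * np.sin(p1) * np.sin(p2)
    g_123 = (1 - 2*x12 - 4*x1*x12) / ((1 + 4*x1) * (1 + 4*x2))   # (12.36)
    g_125 = (1 - 2*x12) / ((1 + 4*x1) * (1 + 4*x2))              # (12.38)
    return np.abs(g_123).max(), np.abs(g_125).max()

for mu in (1, 10, 100):
    print(mu, max_abs_g(mu))

For large µ the first combination produces |g| well above 1 while the second stays within [−1, 1], as predicted.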

12.5 Exercises
1. Show that (12.30) and (12.22) imply (12.31).

2. Equation (12.12) with b1 = 1, b2 = 1, b12 = 0.5 has the solution


(cf. Appendix B)

u(t, x, y) = e^{−t} sin(2x − 2y) cosh(x + y).

Solve the equation for 0 < t, x, y < 1 with SDR and one or more of (12.3)
– (12.9). Initial and boundary conditions are taken from the true solution.
Use h1 = h2 = k = 1/10, 1/20, 1/40, and 1/80 and compute the 2-norm of the error
at t = 1.

3. Same as above with TDR.

4. Same as above with PR.

Chapter 13

Two-Factor Models – two


examples

13.1 The Brennan-Schwartz model

A model for the determination of prices for bonds suggested by Brennan and


Schwartz [3] can be formulated in the following way:

ut = (µr − λβr)ur + (βl^2/l + (l − r)l)ul − ru + (1/2)βr^2 urr + ρβr βl url + (1/2)βl^2 ull
where
u is the price of the bond, r is the short-term interest rate, l is the long-term interest rate, and
µr = ar + br (l − r), βr = rσr , and βl = lσl .
The coefficients have been estimated to
ar = −0.00622, br = 0.2676, σr = 0.10281, σl = 0.02001, ρ = −0.0022, λ = −0.9.
We transform the r-interval [0, ∞) to (0, 1] using
x = 1/(1 + πr r), r = (1 − x)/(πr x),
and similarly for l:
y = 1/(1 + πl l), l = (1 − y)/(πl y),
where the transformation coefficients πr and πl are chosen properly, often between
10 and 13. An interest interval from 10% to 1% will with π = 10 be transformed
into [0.5, 0.91] and with π = 13 into [0.43, 0.88].
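A quick check (Python) of the quoted interval mapping under the transformation x = 1/(1 + πr r):

def x_of(r, pi_r):
    return 1.0 / (1.0 + pi_r * r)

print(x_of(0.10, 10), x_of(0.01, 10))   # 0.5 and 0.909: the interval [0.5, 0.91]
print(x_of(0.10, 13), x_of(0.01, 13))   # 0.435 and 0.885: the interval [0.43, 0.88]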

We now have
∂u/∂r = (∂u/∂x)(dx/dr) = (−πr/(1 + πr r)^2) ∂u/∂x = −πr x^2 ux
∂u/∂l = −πl y^2 uy
∂^2u/∂r^2 = −(∂/∂x)(πr x^2 ux)(dx/dr) = 2πr^2 x^3 ux + πr^2 x^4 uxx
∂^2u/∂l^2 = 2πl^2 y^3 uy + πl^2 y^4 uyy
∂^2u/∂r∂l = −(∂/∂x)(πl y^2 uy)(dx/dr) = πr πl x^2 y^2 uxy
and the differential equation becomes

ut = b1 uxx + 2b12 uxy + b2 uyy − a1 ux − a2 uy + κu,

where
b1 = b1(x) = (1/2)βr^2 πr^2 x^4 = (1/2)σr^2 (1 − x)^2 x^2
b12 = b12(x, y) = (1/2)ρβr βl πr πl x^2 y^2 = (1/2)ρσr σl (1 − x)(1 − y)xy
b2 = b2(y) = (1/2)βl^2 πl^2 y^4 = (1/2)σl^2 (1 − y)^2 y^2
a1 = a1(x, y) = −βr^2 πr^2 x^3 + (µr − λβr)πr x^2
= −x((1 − x)(σr^2 (1 − x) + br + λσr) − ar πr x − br (πr x/(πl y))(1 − y))
a2 = a2(x, y) = (σl^2 + l − r)l πl y^2 − σl^2 (1 − y)^2 y
= y(1 − y)(σl^2 y + (1 − y)/(πl y) − (1 − x)/(πr x))
κ = κ(x) = −r = −(1 − x)/(πr x)
We note in passing that the differential equation is well-posed since
b1 b2 − b12^2 = (1/4)σr^2 σl^2 (1 − x)^2 (1 − y)^2 x^2 y^2 (1 − ρ^2) > 0.

The initial condition at t = 0 is the value of the bond at expiry, i.e.

u(0, x, y) = 1

The boundary condition at x = 0 is found by multiplying the differential equation


by x and then let x → 0, which gives
0 = (1 − y)(y/πr) uy − (1/πr) u
or
(1 − y)y du/dy = u
or
(1/u) du = 1/((1 − y)y) dy = (1/(1 − y) + 1/y) dy.
Integration gives

ln u − ln u0 = ln y − ln y0 − ln(1 − y) + ln(1 − y0 )

or
u = u0 (y/(1 − y)) · ((1 − y0)/y0).
We wish u to be bounded, also when y → 1, and therefore we must have u0 = 0,
and thus
u(t, 0, y) = 0.
This is a so-called natural boundary condition, i.e. a condition which follows
naturally from the equation (when we are interested in bounded solutions). It
also fits well with our intuitive understanding that if r → ∞ then the value of
the bond will not be particularly high.
The boundary condition at y = 0 is found in a similar way by multiplying with
y and then letting y → 0. We then find
0 = −br (πr/πl) x^2 ux
i.e. u must be constant, and since u(t, 0, 0) = 0 according to the first boundary
condition we must have
u(t, x, 0) = 0;
but this is also in agreement with our intuition about the case l → ∞.

The boundary condition at y = 1 is found by inserting y = 1 in the differential
equation. We then have

b12 (x, 1) = b2 (1) = a2 (x, 1) = 0

and
ut = (1/2)σr^2 (1 − x)^2 x^2 uxx + ((1 − x)(σr^2 (1 − x) + br + λσr) − ar πr x)x ux − ((1 − x)/(πr x))u.

This is a parabolic equation in one space dimension which can be solved before-
hand, or concurrently with the solution in the interior. This equation has the
initial condition u(0, x, 1) = 1 from the general initial condition and the boundary
conditions u(t, 0, 1) = 0 from the boundary condition for x = 0, and u(t, 1, 1) = 1
from the argument that when both the interests are 0, then the bond will retain
its value.
In order to find a boundary condition at x = 1 we likewise put x = 1 in the
differential equation and find

b1 (1) = b12 (1, y) = κ(1) = 0

and
1 πr 1 − y 1−y
ut = σl2 (1 − y)2y 2 uyy − (ar πr + br )ux − (1 − y)(σl2 y 2 + )uy
2 πl y πl
This is in principle a parabolic differential equation in t and y with an extra term
involving ux and therefore referring to u-values in the interior. This equation
cannot be solved beforehand but must be solved concurrently with the solution
in the interior.
The initial condition is as before u(0, 1, y) = 1, and the boundary conditions are
u(t, 1, 0) = 0 and u(t, 1, 1) = 1.

13.2 Practicalities

We should like to implement an ADI method for the solution of this problem.
One small detail in the practical considerations is that we need values for ṽ on
two of the boundaries of the region. Because of the difficulties mentioned above
getting boundary values at x = 1 it seems convenient to reverse the order of the
operators P1 and P2 from the usual order in Chapter 11. Thus we wish to solve
for ṽ in the y-direction and therefore we shall need information on ṽ at y = 0 and
y = 1 where information is more readily available.

We select a step size, h1 , in the x-direction, or rather we select an integer, L,
and set h1 = 1/L. Similarly the step size in the y-direction is given through the
integer, M, by h2 = 1/M, and the time step by k = 1/N. Including boundary
nodes we thus have (L + 1)(M + 1)N function values to compute. With small
step sizes we might not have storage space for all these numbers at the same
time, but then again we don’t need to. At any particular time step we only need
information corresponding to two consecutive time levels (or three for Peaceman-
Rachford) and we can therefore make do with two (or three) (L + 1)(M + 1)
arrays. If solution values are needed at intermediate times these can be recorded
along the way. Such values are usually only required at coarser intervals, h̄1 > h1 ,
h̄2 > h2 , and k̄ > k, and therefore require smaller arrays.
Because of the discontinuity between the initial values and the boundary values
at x = 0 and y = 0 it may be convenient to use or at least begin with the
Douglas-Rachford method. Using the simple version (11.47 – 11.48) and (12.5)
for the mixed derivative term the equations become

(I − kP2h)ṽ = v^n + 2kb12(x, y)δxy^2 v^n (13.1)
(I − kP1h)v^{n+1} = ṽ (13.2)

The time step from nk to (n + 1)k is now divided into a number of subtasks
numbered like in section 11.6 except that we have added one subtask at the
beginning.
0. Advance the solution on y = 1 using

ut = b1 (x)uxx − a1 (x, 1)ux + κ(x)u (13.3)

discretized using the implicit method

v_{l,M}^{n+1} − v_{l,M}^n = b1 µ1 (v_{l+1,M}^{n+1} − 2v_{l,M}^{n+1} + v_{l−1,M}^{n+1}) − (a1/2)λ1 (v_{l+1,M}^{n+1} − v_{l−1,M}^{n+1}) + κk v_{l,M}^{n+1}
or
−(b1 µ1 + (1/2)a1 λ1)v_{l−1,M}^{n+1} + (1 + 2b1 µ1 − κk)v_{l,M}^{n+1} (13.4)
− (b1 µ1 − (1/2)a1 λ1)v_{l+1,M}^{n+1} = v_{l,M}^n
where we have introduced λ1 = k/h1 .
This tridiagonal system of equations supplemented with the boundary conditions
v_{0,M}^{n+1} = 0 and v_{L,M}^{n+1} = 1 can now be solved using Gaussian elimination.
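A minimal sketch (Python with numpy/scipy) of this subtask: assembling and solving the tridiagonal system (13.4) on the line y = 1. The coefficient functions b1f, a1f, kappaf are assumed supplied by the user following the formulas in section 13.1; v_old holds the previous solution on y = 1 including the boundary values.

import numpy as np
from scipy.linalg import solve_banded

def boundary_step_y1(v_old, k, h1, b1f, a1f, kappaf):
    L = len(v_old) - 1
    x = np.arange(1, L) * h1                  # interior x-points
    mu1, lam1 = k / h1**2, k / h1
    b1, a1, kap = b1f(x), a1f(x, 1.0), kappaf(x)
    ab = np.zeros((3, L - 1))                 # banded storage of (13.4)
    ab[0, 1:] = -(b1[:-1] * mu1 - 0.5 * a1[:-1] * lam1)   # superdiagonal
    ab[1, :]  = 1 + 2 * b1 * mu1 - kap * k                # diagonal
    ab[2, :-1] = -(b1[1:] * mu1 + 0.5 * a1[1:] * lam1)    # subdiagonal
    rhs = v_old[1:L].copy()
    rhs[-1] += (b1[-1] * mu1 - 0.5 * a1[-1] * lam1) * 1.0  # v(t,1,1) = 1
    v_new = v_old.copy()                       # keeps v[0] = 0 and v[L] = 1
    v_new[1:L] = solve_banded((1, 1), ab, rhs)
    return v_new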
1. The right-hand-side of (13.1) requires v^n at all interior points which is no
problem and δxy^2 v^n at the same points which means v^n at all points including

those on the boundary. The only problem arises at the first time step because
of the discontinuity between the initial condition and the boundary conditions
at x = 0 and y = 0. We recommend using the initial value throughout and
thereby avoid divided differences of the order 1/(h1 h2). In the present case the
b12 -coefficient is rather small because of the small numerical value of ρ so the
effect of a different choice is minimal.
2. We next compute ṽ for y = 0 and y = 1 using (13.2) and get ṽl,0 = 0 and
ṽ_{l,M} = (I − kP1h)v_{l,M}^{n+1} . (13.5)

Comparing (13.5) with (13.3) and (13.4) we note that there might be an advantage
in including the κu-term in the P1 -operator because then (13.5) takes the simpler
form of
ṽ_{l,M} = v_{l,M}^n . (13.6)

In the general case with θκu in P1 and (1 − θ)κu in P2 the formula for ṽ becomes
ṽ_{l,M} = v_{l,M}^n + (1 − θ)κk v_{l,M}^{n+1} (13.7)

3. The system of equations (13.1) can now be solved for ṽ at all interior points.
The system consists of L − 1 tridiagonal systems of M −1 unknowns each and
they can be solved independently of each other.
4. The right-hand-side of system (13.2) consists of ṽ at all interior points which
we have just computed in 3.
5. On each horizontal line (13.2) gives rise to one equation for each internal node,
i.e. a total of L − 1 equations in L + 1 unknowns, the extra unknowns being the
values of v n+1 at x = 0 and x = 1. The former is equal to 0, and for the latter
we must resort to the boundary equation

ut = b2 (y)uyy − a2 (1, y)uy − a1 (1, y)ux (13.8)

An implicit discretization of (13.8) could be

v_{L,m}^{n+1} − v_{L,m}^n = b2 µ2 (v_{L,m+1}^{n+1} − 2v_{L,m}^{n+1} + v_{L,m−1}^{n+1}) − (1/2)a2 λ2 (v_{L,m+1}^{n+1} − v_{L,m−1}^{n+1})
− (1/2)a1 λ1 (v_{L−2,m}^{n+1} − 4v_{L−1,m}^{n+1} + 3v_{L,m}^{n+1}) (13.9)
where we have used the asymmetric second order difference approximation for
ux on the boundary. A simpler formula would result from replacing the last
parenthesis in (13.9) by (−2v_{L−1,m}^{n+1} + 2v_{L,m}^{n+1}) but since this is only a first order
approximation of ux the resulting v^{n+1} would be only first order correct in x.

Figure 13.1: The coefficient matrix corresponding to L = M = 5.

Equation (13.9) supplies the extra information we need about v_{L,m}^{n+1}, but now the
various rows are no longer independent of each other.
6. The total system which is outlined in Fig. 13.1 in the case L = M = 5 is
tridiagonal with three exceptions all due to equation (13.9): In each block there
is an element two places left of the diagonal in the last row (the coefficient of
v_{L−2,m}^{n+1}). In each block but the first there is an element L places left of the diagonal
in the last row (the coefficient of v_{L,m−1}^{n+1}). In each block but the last there is an
element L places right of the diagonal in the last row (the coefficient of v_{L,m+1}^{n+1}).
On the right-hand-side of the system we must remember the effect of the boundary
value at (1,1) in the last equation. The other boundary value at (1,0) is 0 so no
correction is needed here.
Although the system of linear equations is not tridiagonal it still can be solved
using Gaussian elimination without introducing new non-zero elements, and the
solution process requires a number of simple arithmetic operations which is linear
in the number of unknowns and only marginally larger than that of a tridiagonal
system.
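In practice one need not hand-code this elimination; a sparse LU factorization will discover and exploit the structure. A minimal sketch (Python/scipy.sparse), assuming the nonzeroes of the system have been collected as (row, column, value) triples from (13.2) and (13.9):

import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

def solve_sparse(rows, cols, vals, rhs):
    n = len(rhs)
    A = csc_matrix((vals, (rows, cols)), shape=(n, n))
    lu = splu(A)     # factor once; reuse when the coefficients are constant in time
    return lu.solve(np.asarray(rhs))

Since the coefficients do not depend on time the factorization can be computed once and reused in every time step.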

13.3 A Traditional Douglas-Rachford step

If one prefers to use the Traditional Douglas-Rachford method then the equations
to be solved instead of (13.1) and (13.2) are

(I − kP2h)ṽ = (I + kP1h)v^n + 2kb12(x, y)δxy^2 v^n (13.10)
(I − kP1h)v^{n+1} = ṽ − kP1h v^n (13.11)

Most of the considerations of the preceding section are still applicable so we shall
just focus our attention on the differences which occur in 1. and 4.
1. The right-hand-side of (13.10) now also includes P1h v n which means that it
requires knowledge of v n not only at all interior points but also for x = 0 and

x = 1. The only difficulty lies at the very first time step where we still prefer to
settle the discontinuity issue by adopting the initial value throughout.
4. Similar considerations apply for the P1h -term on the right-hand-side of (13.11).

13.4 The Peaceman-Rachford method

The Douglas-Rachford method is only first order in time and therefore we might
prefer to use Peaceman-Rachford, possibly after an initial DR step.
The Peaceman-Rachford equations, augmented with (12.8) are
(I − ½kP2h)ṽ = (I + ½kP1h)v^n + k b12(x, y) δ²_{xy} v̂^{n+1/2}    (13.12)
(I − ½kP1h)v^{n+1} = (I + ½kP2h)ṽ + k b12(x, y) δ²_{xy} v̂^{n+1/2}    (13.13)

where

v̂^{n+1/2} = v^n + ½(v^n − v^{n−1}),   n ≥ 1.    (13.14)

Formula (13.14) cannot be used in the first step but here it is OK to replace it
by

v̂^{n+1/2} = v^n.    (13.15)

Since b12 (x, y) in this example is so small there is actually little difference between
the results obtained with (13.14) and with (13.15).
Once again we divide the time step from nk to (n + 1)k into subtasks with the
same numbering as before.
0. On the boundary y = 1 it is now appropriate to discretize (13.3) using Crank-
Nicolson:
−(½b1µ1 + ¼a1λ1)v^{n+1}_{l−1,M} + (1 + b1µ1 − ½κk)v^{n+1}_{l,M} − (½b1µ1 − ¼a1λ1)v^{n+1}_{l+1,M} =
(½b1µ1 + ¼a1λ1)v^n_{l−1,M} + (1 − b1µ1 + ½κk)v^n_{l,M} + (½b1µ1 − ¼a1λ1)v^n_{l+1,M}.
This is a system of the same structure as (13.4) although with a more complicated
right-hand-side where previous considerations concerning the jump between the
initial and boundary values at (t, x) = (0, 0) apply at the first step.
1. The right-hand-side of (13.12) is very similar to that of (13.10) and previous
comments apply.

2. ṽ for y = 0 and y = 1 are now given by (11.27) which gives ṽ_{l,0} = 0 and

ṽ_{l,M} = ½(I + ½kP1h)v^n_{l,M} + ½(I − ½kP1h)v^{n+1}_{l,M}.    (13.16)

Again there is a computational advantage in including the κu-term in the P1-
operator in which case (13.16) reduces to

ṽ_{l,M} = (I + ½kP1h)v^n_{l,M}.    (13.17)

In the general case with θκu in P1 and (1 − θ)κu in P2 the formula for ṽ becomes

ṽ_{l,M} = (I + ½kP1h)v^n_{l,M} + ¼(1 − θ)κk(v^{n+1}_{l,M} − v^n_{l,M}).    (13.18)

3. The system of equations (13.12) can now be solved for ṽ at all interior points.
The system consists of L − 1 tridiagonal systems of M −1 unknowns each and
they can be solved independently of each other.
4. The right-hand-side of system (13.13) requires knowledge of ṽ at all interior
points in addition to the boundary values from 2. The values needed for v̂ are
the same as in 1.
5. Equation (13.13) now gives rise to a set of tridiagonal equations which must
be supplemented by the Crank-Nicolson equivalent of (13.9) to form a system of
equations with the same pattern of nonzeroes as before.
6. This system is now solved for v n+1 at all interior grid points as well as at all
interior points on the boundary line x = 1.

13.5 Fine points on efficiency

Efficiency often amounts to a trade-off between storage space and computation


time. Readability of the program can also tip the scale in favour of one partic-
ular strategy. Since the coefficients do not depend on time many things can be
computed once and reused in each time step. Also the Gaussian elimination can
be performed once and the components of the LU factors stored for later use.
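A minimal Python sketch of the factor-once, solve-often strategy for a tridiagonal system (the function and variable names are ours, and the coefficient arrays are assumed time-independent as in the text):

    import numpy as np

    def factor_tridiag(a, b, c):
        """LU factorization of a tridiagonal matrix (sub-diagonal a, main
        diagonal b, super-diagonal c); a[0] and c[-1] are not used.
        Performed once since the coefficients do not depend on time."""
        n = len(b)
        l = np.empty(n); u = np.empty(n)
        u[0] = b[0]
        for i in range(1, n):
            l[i] = a[i] / u[i-1]          # multiplier, stored for reuse
            u[i] = b[i] - l[i] * c[i-1]
        return l, u

    def solve_tridiag(l, u, c, d):
        """Forward and back substitution with the stored factors;
        this is all that is repeated in each time step."""
        n = len(d)
        y = np.empty(n); y[0] = d[0]
        for i in range(1, n):
            y[i] = d[i] - l[i] * y[i-1]
        x = np.empty(n)
        x[-1] = y[-1] / u[-1]
        for i in range(n - 2, -1, -1):
            x[i] = (y[i] - c[i] * x[i+1]) / u[i]
        return x

Each time step then costs O(n) simple operations and no refactorization.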
The coefficient functions (b1 , . . . , κ) may be supplied as subroutines or they may
be computed ahead of time at all grid points and stored in arrays. b12 (x, y),
a1 (x, y), and a2 (x, y) require two-dimensional (L + 1) · (M + 1) arrays, b1 (x),
κ(x), and b2 (y) need one-dimensional vectors with L + 1, resp. M + 1 elements.

13.6 Convertible bonds

In a model by Longstaff and Schwartz [24] the two independent variables are the
interest, r, and the volatility, V . The differential equation can be written as

ut = ½V urr + ((α + β)V − αβr)urV + ½((α² + αβ + β²)V − αβ(α + β)r)uVV
    + (αγ + βη + ((ξα − δβ)/(β − α))r + ((δ − ξ)/(β − α))V )ur
    + (α²γ + β²η + ((αβ(ξ − δ))/(β − α))r + ((αδ − βξ)/(β − α))V )uV − ru,
where the parameters have been estimated to
α = 0.001149, β = 0.1325, γ = 3.0493, δ = 0.05658, η = 0.1582, ξ = 3.998.
The conditions for this problem to be well-posed are

V > 0

and

V ((α² + αβ + β²)V − αβ(α + β)r) − ((α + β)V − αβr)² > 0.

The last condition can be rewritten to

αβV² + α²β²r² − αβ(α + β)rV < 0

or

V² − (α + β)rV + αβr² < 0

or

αr < V < βr

since α < β in our case.
The equation is therefore only well-posed in part of the region (r > 0, V > 0)
and ill-posed in the rest. If an equation is ill-posed, then the norm of the solution
at any particular time is not guaranteed to be bounded in terms of the initial
condition (cf. section 2.3). Or, small changes in the initial condition may produce
large changes in the solution at a later time. In principle the solution can become
very large in a very short time although in practice we may be able to retain a
limited accuracy for small time intervals. And it doesn’t help much that the
problem is well-posed in part of the region. The disturbances which originate in
the ill-posed part will quickly spread to the rest (cf. section 14.3).
The main advice for ill-posed problems is not to touch them. It is far better to
search for another model with reasonable mathematical properties. If an ill-posed
problem must be solved then approach it very carefully. And be prepared that
our numerical methods may deceive us when they are used outside their usual
area of application. A more detailed analysis is given in the next chapter.

Chapter 14

Ill-Posed Problems

14.1 Theory

For the simple parabolic problem

ut = buxx (14.1)

it is essential that b is positive.


From Fourier analysis (cf. Chapter 2) we know that

ût(t, ω) = −bω²û(t, ω)    (14.2)

and therefore that the Fourier transform of u can be written

û(t, ω) = e^{−bω²t} û(0, ω)    (14.3)

and by Parseval’s theorem we have that

∫ |u(t, x)|² dx = ∫ |û(t, ω)|² dω = ∫ e^{−2bω²t} |û(0, ω)|² dω.    (14.4)

When b > 0 we know that the exponential factor is ≤ 1 for t > 0 and therefore
that the norm of the solution at any time t is bounded by the norm of the initial
value function. It also follows that small changes in the initial value will produce
small changes in the solution at later times.
If b < 0, or if we try to solve the heat equation backwards in time, the situation
is quite different. Since −2bω 2 t > 0 we shall now observe a magnification of the
various components of the solution, and the higher the frequency, ω, the higher
the magnification.

If the initial condition is smooth, consisting only of low frequency components
then the effect of the magnification is limited for small values of t. But if the
initial function contains high frequency components, or equivalently that û(0, ω)
is different from 0 for large values of ω, then the corresponding components of
the solution will exhibit a large magnification. The solution will be extremely
sensitive to small variations in the initial value if these variations have high
frequency. In mathematical terms the solution will not depend continuously on
the initial data.
The main advice is: Stay away from such problems.
Even if the initial function is smooth, the unavoidable rounding errors connected
with numerical computations will introduce high frequency perturbations and
although small at the beginning they will by nature of the equation be magnified.

14.2 Practice

But it is difficult to restrain our curiosity. How will our difference schemes react
if we try to solve such a problem numerically with a finite difference method?
The components of the numerical solution are governed by the growth factor
which for the general θ-method is (cf. section 2.4)

g(ϕ) = (1 − 4(1 − θ)bµ sin²(ϕ/2)) / (1 + 4θbµ sin²(ϕ/2)).    (14.5)

For a given step size, h = (X2 − X1)/M, not all frequencies, ϕ, will occur.
Because of the finite and discrete nature of the problem, only a finite number of
frequencies are possible, given by (6.23):

ϕp = pπ/M,   p = 1, 2, . . . , M − 1.    (14.6)

For the explicit method we have

g(ϕ) = 1 − 4bµ sin²(ϕ/2).    (14.7)
2
When bµ < 0 we notice immediately that g(ϕ) > 1 for all ϕ. All components will
be magnified, and the high frequency components will be magnified most. This
is fine for it reflects the behaviour of the true solution, at least qualitatively.
The largest magnification at time t = nk is

(1 + 4|bµ|)^n = (1 + 4|b|k/h²)^n ≈ e^{4|b|nk/h²} = e^{4|b|t/h²},
so with a given step size h there is a limit to the magnification at a given time, t,
independent of the time step k. This is also in accordance with the mathematical
properties of the solution since the value of h defines an upper limit on the possible
frequencies.
For the Crank-Nicolson method the growth factor is

g(ϕ) = (1 − 2bµ sin²(ϕ/2)) / (1 + 2bµ sin²(ϕ/2)).    (14.8)

We may experience infinite magnification at the finite frequency ϕ given by

sin²(ϕ/2) = −1/(2bµ),

a situation which is possible if bµ < −1/2. We may not observe infinite magni-
fication in practice if the corresponding frequency is not among those given by
(14.6).
The largest magnification is given by the value of ϕp which maximizes (14.8). On
the other hand it is easily seen from (14.8) that all components are magnified,
just as for the explicit method, but the magnification becomes rather small for
high frequency components when |bµ| is large.
For the implicit method the growth factor is

g(ϕ) = 1 / (1 + 4bµ sin²(ϕ/2)).    (14.9)

Again we may experience infinite magnification when bµ < −1/4 but we may not
observe it in practice because of the discrete set of applicable frequencies. If |bµ|
is large we observe from (14.9) that high frequency components of the numerical
solution will be damped. This may result in a pleasant-looking solution, but it
is a deception. Since all components of the true solution are magnified, a damping
of some is really an unwanted effect.
A further complication associated with negative values of bµ is that we may
encounter zeroes in the diagonal in the course of the Gaussian elimination, even in
cases where the tridiagonal matrix is non-singular. It may therefore be necessary
to introduce pivoting.

Figure 14.1: The growth factors for EX, CN, and IM as functions of x = 2bµ sin²(ϕ/2).

The behaviour of the explicit method (EX), Crank-Nicolson (CN), and the im-
plicit method (IM) is visualized in Fig. 14.1 using the graphs of 1 − 2x for EX,
(1 − x)/(1 + x) for CN, and 1/(1 + 2x) for IM, where x = 2bµ sin²(ϕ/2). We have
stability (damping) when the respective functions lie in the strip between −1 and
1, so for positive bµ, (positive x), we have unconditional stability with CN and
IM, and we require 2bµ ≤ 1, (x ≤ 1) with EX. For negative bµ we always have
instability with EX and CN, but we note that the magnification is rather small
for large |bµ| with CN. For IM we have stability for large |bµ| (and ϕ). These
observations should be compared with the fact that the true solution exhibits
large growth for large values of ω.
When bµ < −1/2 the numerical results from CN and IM will not be influenced
most by the high frequency components. Larger growth factors will appear due
to the singularity of g(ϕ) for intermediate values of ϕp = pπ/M. The dominant
factor will occur for the value of p which makes 2bµ sin2 (ϕp /2) closest to −1,
respectively −1/2, and the behaviour will be somewhat erratic when we vary the
step sizes.
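These theoretical growth factors are easy to tabulate. The sketch below (our own; θ = 0, 1/2, 1 gives EX, CN, and IM) evaluates (14.5) at the admissible frequencies (14.6) and reports the dominant one:

    import numpy as np

    def dominant_growth(bmu, M, theta):
        """Largest |g(phi_p)| from (14.5) over the frequencies (14.6);
        theta = 0, 1/2, 1 corresponds to EX, CN, and IM."""
        p = np.arange(1, M)
        s = np.sin(p * np.pi / (2 * M))**2        # sin^2(phi_p / 2)
        g = (1 - 4*(1 - theta)*bmu*s) / (1 + 4*theta*bmu*s)
        i = np.argmax(np.abs(g))
        return g[i], p[i]

    for theta, name in ((0.0, "EX"), (0.5, "CN"), (1.0, "IM")):
        print(name, dominant_growth(-10.0, 20, theta))

For bµ = −10 and M = 20 this gives g ≈ 40 at p = 19 for EX and g ≈ −23 at p = 3 for CN, in line with the first lines of Table 14.1.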

Table 14.1: Error growth for the negative heat equation.

k bµ n t g p
EX 0.1 −10 4 0.4 40 19
0.01 −1 11 0.11 5 19
0.005 −0.5 16 0.08 3 19
0.0025 −0.25 27 0.0675 2 19
0.00125 −0.125 47 0.05875 1.5 19
CN 0.1 −10 7 0.7 −23 3
0.01 −1 2 0.02 ∞ 10
0.005 −0.5 4 0.02 300 19
0.0025 −0.25 19 0.0475 3 19
0.00125 −0.125 39 0.04875 1.7 19
IM 0.1 −10 10 1.0 45 2
0.01 −1 3 0.03 −10 7
0.005 −0.5 2 0.01 ∞ 10
0.0025 −0.25 5 0.0125 150 19
0.00125 −0.125 31 0.03875 2 19

Example. The equation ut = −uxx has the solution u(t, x) = e^t cos x and
this is used to define initial and boundary values. We have solved the equation
numerically using EX, CN, and IM on x ∈ [−1, 1] with M = 20 corresponding
to h = 1/10 and a range of time steps from k = 1/10 to k = 1/800 giving values
of bµ from −10 to −1/8. We have continued the numerical solution until the
error exceeded 5 and have recorded the number of time steps, the final value of
t, and the observed growth which in most cases was in good agreement with the
theoretical value from (14.5) and (14.6).
The results are given in Table 14.1. For small values of bµ the growth factor
approaches 1 and the worst growth is always associated with the highest frequency
component. As the reduction in g is coupled with a reduction in the time step we
notice that the time interval of integration becomes smaller as the time step is
reduced. As bµ gets larger (in absolute value) we may observe significant growth
with CN and IM for low frequency components because of the singularity in the
expression for g. ✷

14.3 Variable coefficients – An example

What happens when an equation is ill-posed in part of its domain? Can we trust
the solution in the rest of the domain where the equation is well-posed? This
question can be illustrated by the following
Example. Consider the equation
ut = ½xuxx.    (14.10)
The coefficient of uxx is negative when x < 0 and the equation is therefore ill-
posed here. A solution to (14.10) is

u(t, x) = tx + x²

and this is used to define initial and boundary values. If we solve (14.10) on
an interval such that 0 becomes a grid point then the system of equations will
decouple in two, because of zeroes in the side diagonal, and bad vibrations from
the negative part will have no influence on the positive side. To avoid this decou-
pling we therefore choose to solve in the interval x ∈ [−0.983, 1.017]. We have
solved the equation numerically using CN and IM with M = 20 corresponding
to h = 1/10, and a range of time steps from k = 1/10 to k = 1/200. We have
continued the numerical solution until the error exceeded 5 and have recorded
the number of time steps and the final value of t and give the results in Table
14.2.
In all cases we observed severe error growth originating from the negative part
of the interval and eventually spreading to the whole interval. ✷

Table 14.2: Range of integration until error exceeds 5.

k n t
CN 1/10 7 0.7
1/40 14 0.35
1/100 25 0.25
1/200 56 0.28
IM 1/10 8 0.8
1/40 22 0.55
1/100 10 0.1
1/200 34 0.17

Chapter 15

A Free Boundary Problem

15.1 The Stefan problem

Free boundary problems arise in the mathematical modelling of systems involving


heat conduction together with a phase change such as the freezing of a liquid or
the melting of a solid. The original paper by J. Stefan [33] was a study of the
thickness of ice in arctic waters, and since then these problems have often been
called Stefan problems.
To illustrate the one-dimensional one-phase Stefan problem consider the following
system: A horizontal rod of ice, enclosed in an insulated tube, is kept initially at
the freezing point, 0◦ C. We now supply heat to one end of the rod. The problem
is to determine the position of the ice-water interface as a function of time and to
find the temperature distribution in the water as a function of time and distance
from the heat source. Mathematically this can be formulated as
ut − uxx = 0, t > 0, 0 < x < y(t), (15.1)
ux (t, 0) = −1, t > 0, (15.2)
u(t, y(t)) = 0, t ≥ 0, (15.3)
ux (t, y(t)) = −y ′ (t), t > 0, (15.4)
y(0) = 0. (15.5)
y(t) denotes the position of the ice-water interface at time t, and u(t, x) is the
temperature of the water at time t and distance x. At time t = 0 there is no water,
(15.5), the supply of heat is constant in time, (15.2), the temperature of the water
at the interface is 0◦ C (15.3), and the rate of melting of ice is proportional to the
heat flux at the interface (15.4). The temperature of the water is governed by the
simple heat equation (15.1), and by suitable linear transformations all physical
constants are set equal to 1.

If all the heat supplied was used to melt ice, the interface would be at y(t) = t.
But some of it is used to heat the water, so at time t we have
y(t) = t − ∫_0^{y(t)} u(t, x) dx,   t > 0    (15.6)

a relation which is equivalent to (15.4), given (15.1) – (15.3) and often used
instead of (15.4) in the numerical calculations.
The first question one should be concerned with is that of existence and unique-
ness of a solution to (15.1) – (15.5). We shall not address this question here but
be content with the fact that our numerical schemes seem to converge as the step
sizes become smaller, and this might be used as a basis of an existence proof.
It is intuitively clear that as time passes more and more ice will melt, and the
temperature of the water at any particular point will increase, and also that
the temperature of the water at any particular time will decrease with x. We
shall therefore expect to find that y ′ (t) > 0 for t > 0 and that ut (t, x) > 0 and
ux (t, x) < 0 for t > 0 and 0 < x < y(t).
We shall first note some immediate consequences of equations (15.1) – (15.6).
From (15.2), (15.4), and the continuity of y(t) and y ′(t) at t = 0 we get

y ′(0) = 1. (15.7)

From (15.3) and (15.2) and the maximum principle for (15.1) we deduce that

u(t, x) > 0, t > 0, 0 < x < y(t), (15.8)

and

ux (t, y(t)) < 0, t>0 (15.9)

and from (15.4) we then have

y ′ (t) ≥ 0, t>0 (15.10)

and from (15.6)

y(t) < t, t > 0. (15.11)

15.2 The Douglas-Gallie method

Among the various numerical methods proposed for the Stefan problem we have
chosen the difference scheme of Douglas and Gallie [9]. They choose a fixed step

Figure 15.1: The Douglas-Gallie grid.

size, h, in the x-direction and a variable step size in the t-direction with steps k1 ,
k2 , . . . determined such that the computed boundary curve y(t) passes through
grid points and such that there is precisely one extra grid point when going from
time tn−1 to tn . In Fig. 15.1 we have shown how the grid might look.
We shall use the following notation

xm = mh;   tn = Σ_{i=1}^{n} ki;   v^n_m = v(tn, xm);   m = 0, 1, . . . , n;   n = 0, 1, . . .

v(tn, xm) is the numerical approximation to the temperature distribution u(t, x)
and xn is the numerical approximation to y(tn).
Since we have one extra grid point at time tn compared to the previous time tn−1,
the implicit formula looks like an ideal choice. Douglas and Gallie propose the
following equations at time n:

(v^n_m − v^{n−1}_m)/kn = (v^n_{m+1} − 2v^n_m + v^n_{m−1})/h²,   m = 1, 2, . . . , n − 1,    (15.12)

v^n_0 − v^n_1 = h,    (15.13)

v^n_n = 0,    (15.14)

kn = h Σ_{m=1}^{n−1} v^n_m + nh − tn−1.    (15.15)

These equations are straightforward discretizations of (15.1), (15.2), (15.3), and
(15.6). They comprise n+2 equations in the n+2 unknowns, v^n_m, (m = 0, 1, . . . , n),
plus kn. However, the equations are non-linear so the solution process is not
completely straightforward.

Remark. A simpler alternative to (15.15) is to discretize (15.4) to

kn = h²/v^n_{n−1}    (15.16)

which then can be used to correct the time step kn. ✷


We shall first see how to get the process started. For n = 0 we get from (15.14)
that

v^0_0 = 0    (15.17)

and this is the only value for n = 0. For n = 1 we have two values. From (15.14),
(15.13), and (15.15) (or (15.16)) we get

v^1_1 = 0,   v^1_0 = h,   k1 = h.    (15.18)

For n = 2 (15.14), (15.13), and (15.12) reduce to

v^2_1/k2 = (h − v^2_1)/h²

and (15.15) gives together with (15.18)

k2 = h v^2_1 + h.

Combined we have a quadratic equation for v^2_1 where the positive root is

v^2_1 = −1/2 + √(1/4 + h)   ⇒   k2 = h(1/2 + √(1/4 + h)).    (15.19)

Remark. The same solution is obtained if we use (15.16) instead of (15.15). ✷


Remark. The superscript ‘2’ on v indicates the time step. The same superscript
on h indicates the second power. ✷
For n ≥ 3 we solve the equations (15.12) – (15.15) iteratively, guessing a starting
value kn^{(0)}, solving (15.12) – (15.14) for v^{n(0)}_m, m = 0, 1, . . . , n, and then using
(15.15) (or possibly (15.16)) to produce an improved value kn^{(1)}. The process is
then repeated until kn^{(r)} − kn^{(r−1)} is smaller than some predetermined tolerance, ε.
Several questions can be raised at this point:
Is there a (useful) solution to (15.12) – (15.15)?
Does the iteration converge to this solution?
Does it converge fast enough to be useful?
How do we get good starting values for kn ?
And more specifically:
Should we include v0n in the sum in (15.15)?

Or maybe with weight 0.5 (the trapezoidal rule)?
Should we use second order approximations instead of (15.13) (and (15.16))?
Possibly symmetric ones with fictitious points?
Is it possible to use Crank-Nicolson?
We observe convergence in practice and this assures us that the equations have
a solution. The convergence (in r) appears to be linear and can therefore be
accelerated using Aitken’s device [1]. A good starting value for kn is the previous
time step, kn−1, and an even better value is obtained by extrapolation from the
previous two time steps: kn^{(0)} = 2kn−1 − kn−2. The iterations using (15.16) appear
to have slower convergence than those using (15.15) but after one or two Aitken
extrapolations the difference is minimal. The limit value appears to be the same.
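For completeness, Aitken's device applied to three successive iterates reads (a minimal sketch; the function name is ours):

    def aitken(k0, k1, k2):
        """Aitken's delta-squared extrapolation of a linearly convergent
        sequence, e.g. the time step iterates k_n^(r)."""
        d1, d2 = k1 - k0, k2 - k1
        return k2 - d2 * d2 / (d2 - d1)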
We have only one independent stepsize (h), since the time steps are determined
from h. The implicit method is first order in k and we shall therefore expect the
overall method to be first order in h. Therefore there seems to be no immediate
demand for second order approximations to the derivative boundary conditions
(15.13) and (15.16), or the integral (15.15).
The boundary curve y(t) is an increasing function of t and therefore has an inverse
function which we shall call t(x). The computed value of t(x) for x = n · h is the
sum of the first n time steps. It is therefore straightforward to experimentally
determine the order of the method w.r.t. the determination of t(x).
Computer experiments confirm our assumptions. They also seem to indicate that
use of the trapezoidal rule instead of (15.15) (i.e. adding v^n_0/2 to the sum) gives
more, and use of (15.16) less, accurate results, although still first order. After two
Richardson extrapolations the results differ by less than 0.0000005. The results
in Section 15.4 were computed using (15.16).
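The whole procedure can be summarized in a short program. The sketch below (our own Python, using the simpler correction (15.16), the extrapolated starting guess, and a plain tridiagonal elimination) marches the scheme and returns the computed times tn, i.e. the first order approximations to t(nh):

    import numpy as np

    def thomas(a, b, c, d):
        """Solve a tridiagonal system; a = sub-, b = main, c = super-diagonal.
        a[0] and c[-1] are not used."""
        n = len(d)
        cp = np.empty(n); dp = np.empty(n)
        cp[0] = c[0] / b[0]; dp[0] = d[0] / b[0]
        for i in range(1, n):
            den = b[i] - a[i] * cp[i-1]
            cp[i] = c[i] / den if i < n - 1 else 0.0
            dp[i] = (d[i] - a[i] * dp[i-1]) / den
        x = np.empty(n)
        x[-1] = dp[-1]
        for i in range(n - 2, -1, -1):
            x[i] = dp[i] - cp[i] * x[i+1]
        return x

    def douglas_gallie(h, N, tol=1e-10):
        """March (15.12)-(15.14), correcting the time step with (15.16).
        Returns the times t_n, i.e. approximations to t(n*h)."""
        v = np.array([h, 0.0])            # v^1 from (15.18)
        t = [0.0, h]                      # k_1 = h
        k_prev, k_prev2 = h, h
        for n in range(2, N + 1):
            k = 2 * k_prev - k_prev2      # extrapolated starting value
            while True:
                mu = k / h**2
                a = np.full(n - 1, -mu)   # unknowns v_1 .. v_{n-1}
                b = np.full(n - 1, 1 + 2 * mu)
                b[0] = 1 + mu             # v_0 = v_1 + h eliminated (15.13)
                c = np.full(n - 1, -mu)   # v_n = 0 eliminated (15.14)
                d = v[1:].copy()          # v^{n-1}_m, m = 1..n-1
                d[0] += mu * h
                vi = thomas(a, b, c, d)
                k_new = h**2 / vi[-1]     # the correction (15.16)
                if abs(k_new - k) < tol:
                    break
                k = k_new
            v = np.concatenate(([vi[0] + h], vi, [0.0]))
            k_prev2, k_prev = k_prev, k
            t.append(t[-1] + k)
        return np.array(t)

    print(douglas_gallie(1/80, 8)[8])   # first order approximation to t(0.1)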

15.3 The global error

Along the lines of Chapter 9 we shall assume that the computed solution v(t, x)
can be expressed in a power series in h:
v(t, x) = u(t, x) − hc − h²d − h³f − · · ·    (15.20)
where u(t, x) is the true solution and c, d, and f are auxiliary functions of t and
x. Likewise we shall assume that the computed boundary function Y (t) can be
expressed as
Y(t) = y(t) − hγ − h²δ − h³ϕ − · · ·    (15.21)
where y(t) is the true boundary function and γ, δ, and ϕ are auxiliary functions
of t.

The functions, v(t, x) and Y (t), are actually known only at grid points t = tn ,
x = xm , but we shall assume (as we have done previously) that they can be
extended in a differentiable manner to all t > 0 and x ∈ (0, Y(t)).
The computed step sizes, kn , can be viewed as instances of a step size function
k(t) which also can be written as a power series

k(t) = hκ + h²λ + h³µ + · · ·    (15.22)

where κ, λ, and µ are auxiliary functions of t. These functions are closely related
to the functions y, γ, δ, and ϕ above since

h = Yn − Yn−1 = Y(tn) − Y(tn − kn)
  = kn Y′(tn) − ½kn²Y″ + (1/6)kn³Y‴ − · · ·
  = (hκ + h²λ)(y′ − hγ′ − h²δ′) − ½h²κ²y″ + · · ·
  = hκy′ + h²(λy′ − κγ′ − ½κ²y″) + · · ·

using (15.21) and (15.22). Equating terms with equal powers of h we get

1 = κy′
0 = λy′ − κγ′ − ½κ²y″
When we know y(t) and have established that y′ > 0 we have

κ(t) = 1/y′(t)    (15.23)

and when we also know γ(t) then we find

λ(t) = (κγ′ + ½κ²y″)/y′ = γ′/(y′)² + ½y″/(y′)³.    (15.24)

From the left-hand-side of (15.12) we get using (15.20) and (15.22)

(v^n_m − v^{n−1}_m)/kn = ut − hct − h²dt − ½kn utt + ½kn h ctt + (1/6)kn² uttt + · · ·
  = ut − hct − h²dt − ½(hκ + h²λ)(utt − hctt) + (1/6)h²κ²uttt + · · ·
  = ut − h(ct + ½κutt) − h²(dt + ½λutt − ½κctt − (1/6)κ²uttt) + · · ·
From the right-hand-side we get

uxx − hcxx − h²dxx + (1/12)h²u4x + · · ·
Equating terms with equal powers of h we get the differential equations to be
satisfied by the auxiliary functions:

ut − uxx = 0,    (15.25)
ct − cxx = −½κutt,    (15.26)
dt − dxx = −½λutt + ½κctt + (1/6)κ²uttt − (1/12)u4x.    (15.27)
From (15.13) we get

h = v^n_0 − v^n_1 = −hux − ½h²uxx − (1/6)h³uxxx + h²cx + ½h³cxx + h³dx + · · ·

and equating terms we get the following boundary conditions at x = 0 for the
auxiliary functions

ux(t, 0) = −1,    (15.28)
cx(t, 0) = ½uxx(t, 0),    (15.29)
dx(t, 0) = (1/6)uxxx − ½cxx.    (15.30)
6 2
The boundary condition (15.14) involves v(tn, nh) whereas we have information
about u at (tn, y(tn)). The difference in the x-coordinate is

y(tn) − nh = y(tn) − Y(tn) = hγ + h²δ + h³ϕ + · · ·

and we therefore have

0 = v^n_n = v(tn, y(tn)) + (nh − y(tn))vx + ½(nh − y(tn))²vxx + · · ·
  = u(tn, y(tn)) − hc − h²d − (hγ + h²δ)(ux − hcx) + ½h²γ²uxx + · · ·

leading to

u(tn, y(tn)) = 0,    (15.31)
c(tn, y(tn)) = −γux = γ(t)y′(t),    (15.32)
d(tn, y(tn)) = −δux + γcx + ½γ²uxx.    (15.33)
As the second condition at y(t) we take (15.16) which we rewrite to

−h = −kn(v^n_{n−1} − v^n_n)/h = kn[vx(tn, nh) − ½h vxx + (1/6)h² vxxx + · · ·]
  = (hκ + h²λ)[ux(tn, y(tn)) − hcx − h²dx − (hγ + h²δ)(uxx − hcxx)
      + ½h²γ²uxxx − ½h(uxx − hcxx − hγuxxx) + (1/6)h²uxxx] + · · ·
  = hκux + h²(λux − κ(cx + (γ + ½)uxx)) + · · ·

Equating powers of h we get

κ(t)ux(t, y(t)) = −1,    (15.34)
λ(t)ux(t, y(t)) = κ(cx + γuxx + ½uxx).    (15.35)
At t = 0 we have y(0) = 0 and u(0, 0) = 0 and since we begin with Y (0) = 0 and
v(0, 0) = 0 we have the initial values

γ(0) = δ(0) = c(0, 0) = d(0, 0) = 0. (15.36)

From (15.25), (15.28), (15.31), (15.23), and (15.34) we recover the original Stefan
problem indicating that our difference scheme is consistent and that our basic
assumptions (15.20), (15.21), and (15.22) are not completely unrealistic. We now
assume u(t, x) and y(t) known such that the upcoming problems are with a fixed
boundary. In equation (15.35) we have κ, λ, and γ appearing and they must first
be eliminated using (15.23), (15.24), and (15.32). Differentiating (15.32) we get

ct(t, y(t)) + cx(t, y(t))y′(t) = γ′y′ + γy″    (15.37)

and using this in (15.35) we end up with

2y′(t)cx(t, y(t)) + ct(t, y(t)) + (y′(t)² − y″(t)/y′(t))c(t, y(t)) = −½y″(t) − ½y′(t)².    (15.38)

(15.38), together with (15.26), (15.29), and (15.36), defines a boundary value
problem for the auxiliary function c(t, x). The boundary condition (15.38) is
unusual since it involves ct in addition to c and cx and standard existence and
uniqueness theorems do not apply. Uniqueness of solutions is not difficult to
prove, and convergence of a difference scheme can be used to establish existence
of a solution function c(t, x). Once c(t, x) is known, (15.37) supplies an ordinary
differential equation for the determination of γ(t).
The differential equation for c(t, x) is inhomogeneous and so are the boundary
conditions, so we expect c to be different from 0 and our difference approximation
accordingly to be first order in h.

15.4 Estimating the global error

In practice we do not wish to solve extra differential equations in order to gain


information on the order and discretization error of our difference schemes. In-
stead we would use the techniques of Chapter 10 and let the computer do the
work.
We only have one independent step size, h, so we perform calculations with three
values, h, 2h, and 4h and compare results.
The inverse function t(x) to the boundary function y(t) is the easier one. The
calculated values are:
t(xn) = t(nh) = tn = Σ_{i=1}^{n} ki.

Based on h-values of 1/80, 1/40, 1/20, and 1/10 we have calculated values for
x = 0.1, 0.2, . . . , 1.0 . In Table 15.1 we supply in column 2 and 3 the order
ratios corresponding to the three small step sizes and the three large step sizes,
respectively. It is clearly seen that the results are first order and furthermore that
they follow the scheme of 2 + ε and 2 + 2ε as formula (10.14) would prescribe.
Richardson extrapolation can be performed and the order ratios in column 4
indicate, that the extrapolated results are indeed second order. In columns 5 and
6 we give extrapolated values for t(x) to second and third order, respectively.
The error estimate on the values in column 6 indicate that the error is positive
and at most 2 units in the last figure.
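The order ratios and the extrapolation are formed mechanically from results at step sizes h, 2h, and 4h; in Python (helper names ours):

    def order_ratio(v_h, v_2h, v_4h):
        """(v_4h - v_2h)/(v_2h - v_h); close to 2 for a first order method,
        close to 4 after one Richardson extrapolation."""
        return (v_4h - v_2h) / (v_2h - v_h)

    def richardson_first_order(v_h, v_2h):
        """Eliminate the O(h) error term from two first order results."""
        return 2 * v_h - v_2h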

Table 15.1: Order ratios and extrapolated values for t(x).

x      order ratios (small h / large h / after extrapolation)      extrapolated t(x) (second / third order)


0.1 2.016 2.035 4.442 0.1047219 0.1047188
0.2 2.011 2.023 4.268 0.2180340 0.2180301
0.3 2.008 2.017 4.193 0.3390453 0.3390411
0.4 2.006 2.013 4.150 0.4671460 0.4671419
0.5 2.005 2.010 4.122 0.6018826 0.6018785
0.6 2.004 2.008 4.102 0.7428988 0.7428949
0.7 2.003 2.007 4.086 0.8899047 0.8899009
0.8 2.003 2.006 4.074 1.0426579 1.0426543
0.9 2.003 2.005 4.064 1.2009517 1.2009482
1.0 2.002 2.004 4.056 1.3646068 1.3646035

Because of the variable step sizes in the t-direction we have little control over
the t-values where we calculate approximations to u(t, x) and y(t). In order to

get function values for specific values of t which we must for order and error
estimation, we resort to interpolation.
Since our basic approximations are first order it would seem that linear interpola-
tion would be adequate. For reasons to be elaborated in Appendix C we prefer to
go one step further and use 3-point interpolation in order to minimize the effect
of the erratic error components introduced by interpolation.
We first look at the boundary function, y(t). Like before we calculate with four
h-values from 1/80 and up to 1/10 and supply in Table 15.2 in column 2 and 3
the order ratios corresponding to the three small step sizes and the three large
step sizes, respectively. It is clearly seen that the results are first order but the
(2 + ε)-effect is sometimes drowned by the interference of the interpolation error.
Richardson extrapolation can be performed and the order ratios in column 4 indi-
cate, that the extrapolated results are probably second order although the effect
of the interpolation error blurs the picture. In column 5 we give extrapolated
values for y(t) and based on the error estimates we claim that the error is at
most one unit in the last figure.

Table 15.2: Order ratios and extrapolated values for y(t).

t      order ratios (small h / large h / after extrapolation)      y(t)


0.1 2.045 2.103 4.697 0.09567
0.2 2.018 2.076 8.538 0.18454
0.3 2.042 2.021 1.025 0.26840
0.4 2.027 2.028 2.049 0.34825
0.5 2.019 2.073 7.905 0.42483
0.6 2.022 2.047 4.281 0.49864
0.7 2.017 2.039 4.680 0.57003
0.8 2.019 2.028 3.050 0.63932
0.9 2.018 2.038 4.246 0.70673
1.0 2.014 2.033 4.587 0.77244

For u(t, x) the picture is similar. We refer to Appendix C for sample values of
the order ratio and to Table 15.3 for extrapolated values which according to the
error estimate are correct up to one unit in the last figure.

Table 15.3: Extrapolated values of u(t, x) using 3-point interpolation.

u(t, x)
t\x 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7
0.1 0.0918
0.2 0.1717 0.0755
0.3 0.2437 0.1472 0.0575
0.4 0.3099 0.2131 0.1226 0.0384
0.5 0.3715 0.2745 0.1834 0.0982 0.0188
0.6 0.4294 0.3322 0.2407 0.1547 0.0741
0.7 0.4843 0.3869 0.2950 0.2083 0.1268 0.0505
0.8 0.5365 0.4391 0.3467 0.2594 0.1771 0.0998 0.0272
0.9 0.5865 0.4889 0.3963 0.3085 0.2254 0.1472 0.0736 0.0045
1.0 0.6345 0.5368 0.4439 0.3556 0.2720 0.1929 0.1183 0.0481

Chapter 16

The American Option

16.1 Introduction

Another free boundary problem arises when modeling the price of an American
option. Because of the early exercise property we do not know beforehand the
extent of the region where the differential equation must be solved. The position
of the boundary must be calculated along with the solution. The following is
joint work with Asbjørn Trolle Hansen as presented in [15] which is also part of
Asbjørn’s Ph.D.-thesis [14].

16.2 The mathematical model

The differential equation for the price function for an American put option is

ut = ½σ²x²uxx + rxux − ru,   t > 0,  x > y(t)    (16.1)
where t is the time to expiry and x is the price of the underlying risky asset.
The initial condition is

u(0, x) = 0,   x ≥ K    (16.2)

and the boundary conditions are

lim_{x→∞} u(t, x) = 0,   t > 0,    (16.3)
u(t, y(t)) = K − y(t),   t > 0,    (16.4)
ux(t, y(t)) = −1,   t > 0.    (16.5)
The function y(t) is called the exercise boundary and this function is not known
beforehand except for the information that

lim_{t→0} y(t) = K    (16.6)

where K is the exercise price. The region

C = {(t, x) | t > 0, x > y(t)} (16.7)

is called the continuation region. It is in this region we seek the solution of (16.1),
and it is characterized by the condition that u(t, x) > K − x. The region

S = {(t, x) | t > 0, x < y(t)} (16.8)

is called the stopping region. We can assign the price

u(t, x) = K − x, (t, x) ∈ S (16.9)

but we must emphasize that this is not a solution to (16.1).

Figure 16.1: The continuation region and the stopping region.

The initial condition is only given for x ≥ K because the American option will
never expire in the money due to the early exercise feature. If one wishes, one
can extend (16.9) to t = 0.
The boundary condition (16.3) expresses that the value of the option approaches
0 as the price of the underlying asset approaches infinity.
The boundary condition (16.4) expresses that we are at the exercise boundary,
and (16.5) is the smooth fit condition expressing that the partial derivative of u

with respect to x is continuous across the boundary if we assume the price from
(16.9) in the stopping region.
In [26] the above problem has been studied extensively, and the existence, unique-
ness, and differentiability of u(t, x) and y(t) have been shown. We shall now study
the behaviour near the boundary further.
Theorem.
a. y(t) < K for t > 0.
b. y ′(t) < 0 for t > 0.
c. ut is continuous across the boundary. More specifically

lim_{s↓t} ut(s, y(t)) = lim_{s↑t} ut(s, y(t)) = 0.    (16.10)

d. uxx is discontinuous across the boundary. More specifically

lim_{s↓t} uxx(s, y(t)) = 2rK/(σy(t))²    (16.11)

whereas

lim_{s↑t} uxx(s, y(t)) = 0.

Proof. If y(t) > K for some t > 0 then u(t, y(t)) < 0 by (16.4) which is
counterintuitive.
If y(t) = K for some t > 0 then u(t, y(t)) = 0 and because of (16.5) we would
have u(t, y(t) + ε) < 0 for small, positive ε which again is counterintuitive.
Now consider for k > 0

u(t + k, y(t)) − u(t, y(t))
  = u(t + k, y(t + k)) − u(t, y(t)) + u(t + k, y(t)) − u(t + k, y(t + k))
  = K − y(t + k) − K + y(t) − (y(t + k) − y(t))ux(t + k, z)

for some z between y(t) and y(t + k). If we also use the mean value theorem on
the very first expression then there is a θ ∈ (0, 1) such that

k ut(t + θk, y(t)) = (y(t) − y(t + k))(1 + ux(t + k, z)).

From the definition of the continuation region we know that u(t, x) > K − x for
x > y(t) and it follows that ux (t, x) > −1 for x − y(t) small and positive. It
also follows that ut (t, x) > 0 for x − y(t) small and positive. We therefore must
have y(t) > y(t + k) and therefore y ′(t) < 0 for t > 0. Applying the mean value
theorem to y, dividing by k, and letting k → 0 gives

lim_{s↓t} ut(s, y(t)) = 0.

The limit from the other side is also 0 since u(t, x) is independent of t in the
stopping region and we have established (16.10).
We therefore have
lim_{s↓t} {½σ²x²uxx(s, y(t)) + rxux(s, y(t)) − ru(s, y(t))} = 0.

By (16.4) and (16.5) and the continuity of u and ux in C (16.11) follows. That
the limit from the other side is 0 follows from the fact that u is a linear function
in S. ✷
We conclude from (16.1) and (16.2) that

lim_{t↓0} ut(t, x) = 0,   x > K

but we note that lim_{t→0} ut(t, K) is undefined. For reasons of monotonicity we
expect to have

ut(t, x) > 0,   ux(t, x) < 0,   uxx(t, x) > 0,   for (t, x) ∈ C.

16.3 The boundary condition at infinity

A boundary condition at infinity is impractical when implementing a finite dif-


ference method. A commonly used technique to avoid it is to pick some large
L > K and replace (16.3) by

u(t, L) = 0, t > 0. (16.12)

Since the solution u(t, x) is usually very small for large x the error in using (16.12)
instead of (16.3) should be small. But how small is it, and what is the effect on
the boundary curve?
Let us first look at the steady-state solution to the original problem, i.e. u(x) =
lim_{t→∞} u(t, x). Since lim_{t→∞} ut(t, x) = 0 we have the following ordinary differen-
tial equation problem for u(x)

½σ²x²u″ + rxu′ − ru = 0,   y ≤ x < ∞,    (16.13)
lim_{x→∞} u(x) = 0,    (16.14)
u(y) = K − y,    (16.15)
u′(y) = −1,    (16.16)

where y = lim_{t→∞} y(t).
To find the general solution to (16.13) we try with the power function u(x) = x^z

and get the characteristic equation

½σ²z(z − 1) + rz − r = 0

or

½σ²z² + (r − ½σ²)z − r = 0.

The discriminant of this quadratic is

disc = (r − ½σ²)² + 2rσ² = (r + ½σ²)²

such that

z = (−(r − ½σ²) ± (r + ½σ²))/σ²

and the two roots are z = 1 and z = −γ = −2r/σ². The general solution can
therefore be written

u(x) = A(x/K) + B(x/K)^{−γ}.    (16.17)
The boundary condition at infinity gives A = 0, and from (16.15) and (16.16) we
get

u(y) = B(y/K)^{−γ} = K − y   ⇒   B = (K − y)(y/K)^γ,

u′(y) = −γ(B/K)(y/K)^{−γ−1} = −γ((K − y)/K)(K/y) = −1

⇒ −γK + γy = −y   ⇒   y = γK/(γ + 1)

⇒ B = K(1/(γ + 1))(γ/(γ + 1))^γ.    (16.18)

If we replace the upper boundary condition (16.14) with u(L) = 0 the general
solution is still (16.17) but the determination of A and B becomes a bit more
complicated.

u(L) = 0   ⇒   A(L/K) + B(L/K)^{−γ} = 0   ⇒   B = −A(L/K)^{γ+1}.

The boundary conditions (16.15) and (16.16) now give

A(y/K) − A(L/K)^{γ+1}(y/K)^{−γ} = K − y,
A/K + γ(A/K)(L/K)^{γ+1}(y/K)^{−γ−1} = −1.

The second equation gives

(y/L)^{−γ−1} = (K + A)/(−γA)   ⇒   y = L(−γA/(K + A))^{1/(γ+1)}

and the first equation now gives

(A/K)(1 + A/K)^γ = −(1/γ)(γ/(γ + 1))^{γ+1}(K/L)^{γ+1} = −α.    (16.19)

We cannot give a closed form solution for A from (16.19) but when α is small,
A/K will also be small, and (1 + A/K)^γ will be close to 1, and an approximate
value is A^{(1)} = −αK. A better value can be obtained by Newton iteration where
the next iterate will be

A^{(2)} = −(α + β)K

with

β = α(1 − α)((1 − α)^{−γ} − 1)/(1 − α(γ + 1)).

Table 16.1: Corresponding values of L, α, β, A, B, and y.

L α β A B y
120 0.022 431 0.003 044 −2.5527 7.6224 85.516
140 0.008 896 0.000 426 −0.9322 7.0191 84.117
200 0.001 047 0.000 006 −0.1052 6.7333 83.421
∞ 6.6980 83.333

In Table 16.1 we supply values for α, β, A, B, and y for various values of L and
corresponding to K = 100, σ = 0.2, r = 0.1 and therefore γ = 5.
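The entries of Table 16.1 follow directly from (16.19) and the formulas above, as the following sketch shows (the function name is ours; the defaults are the parameter values used here):

    def finite_L_constants(K=100.0, sigma=0.2, r=0.1, L=200.0):
        """alpha and beta from (16.19), then A, B and the limit boundary y."""
        g = 2 * r / sigma**2                    # gamma
        alpha = (1/g) * (g/(g + 1))**(g + 1) * (K/L)**(g + 1)
        beta = alpha * (1 - alpha) * ((1 - alpha)**(-g) - 1) / (1 - alpha*(g + 1))
        A = -(alpha + beta) * K                 # one Newton step beyond -alpha*K
        B = -A * (L/K)**(g + 1)
        y = L * (-g * A / (K + A))**(1/(g + 1))
        return alpha, beta, A, B, y

    print(finite_L_constants(L=200.0))
    # ~ (0.001047, 0.000006, -0.1052, 6.7333, 83.421), the L = 200 line of Table 16.1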
Denote the solution function and the boundary function corresponding to a finite
value of L by uL (t, x) and yL (t), respectively. We note that the limit value
limt→∞ yL (t) moves upwards as the value for L decreases and we conclude that
this holds for the whole boundary curve since otherwise the boundary curves for
two different values of L would intersect.
We want to estimate the error in u and y when we use a finite value, L, so we
define the error function

w(t, x) = u(t, x) − uL (t, x).

Figure 16.2: y(t) and yL(t).

It is defined in the same region as uL since yL(t) > y(t) and here it satisfies the
same differential equation:

wt = ½σ²x²wxx + rxwx − rw,   t ≥ 0,  yL(t) ≤ x ≤ L.    (16.20)

The initial condition is

w(0, x) = 0,   K ≤ x ≤ L,    (16.21)

and the boundary conditions are

w(t, L) = u(t, L),   t ≥ 0,    (16.22)
w(t, yL(t)) = u(t, yL(t)) − K + yL(t),   t > 0,    (16.23)
wx(t, yL(t)) = ux(t, yL(t)) + 1,   t > 0.    (16.24)

Using Taylor expansion we get

u(t, yL(t)) = u(t, y(t)) + (yL(t) − y(t))ux + ½(yL(t) − y(t))²uxx + · · ·
  = K − y(t) − yL(t) + y(t) + ½(yL(t) − y(t))² · 2rK/(σ²y(t)²) + · · ·
  = K − yL(t) + (yL(t)/y(t) − 1)² rK/σ² + · · · ,

ux(t, yL(t)) = −1 + (yL(t)/y(t) − 1) · 2rK/(σ²y(t)) + · · ·

so the conditions (16.23) and (16.24) read

w(t, yL(t)) = (yL(t)/y(t) − 1)² rK/σ² + · · · ,   t > 0,
wx(t, yL(t)) = (yL(t)/y(t) − 1) · 2rK/(σ²y(t)) + · · · ,   t > 0.

We note that w(t, L) > 0, w(t, yL(t)) > 0, wx(t, yL(t)) > 0, wt(t, yL(t)) > 0, and
wt(t, L) > 0. This suggests that w(t, x) > 0 and is increasing with t and x. We
therefore have

0 ≤ w(t, x) < lim_{t→∞} w(t, x) = u(x) − uL(x) ≤ u(L) = B(K/L)^γ

with B given by (16.18). With the above parameter values K = 100, σ = 0.2,
r = 0.1, and L = 200, the error for at-the-money options is bounded by u(K) −
uL(K) = 0.07, and the maximum error for any x is bounded by u(L) = 0.21.

Figure 16.3: The boundary curve calculated with Brennan-Schwartz.

16.4 Finite difference schemes

If we introduce a traditional grid with fixed step sizes k and h then we face
the problem that the boundary curve, y(t), typically passes between grid points.
There are various ways to deal with this difficulty.

Table 16.2: The price function calculated with Brennan-Schwartz.
x\t 0.4 0.8 1.2 1.6 2.0 2.4 2.8 3.2 4.0
195 0.000 0.000 0.001 0.002 0.005 0.008 0.011 0.014 0.019
190 0.000 0.000 0.002 0.005 0.011 0.018 0.024 0.031 0.042
185 0.000 0.000 0.003 0.009 0.019 0.029 0.040 0.050 0.068
180 0.000 0.001 0.005 0.014 0.028 0.044 0.059 0.074 0.099
175 0.000 0.001 0.008 0.022 0.041 0.062 0.083 0.103 0.136
170 0.000 0.002 0.012 0.032 0.058 0.086 0.113 0.139 0.181
165 0.000 0.004 0.019 0.047 0.081 0.117 0.152 0.184 0.237
160 0.000 0.007 0.031 0.069 0.113 0.159 0.202 0.241 0.305
155 0.000 0.012 0.048 0.100 0.158 0.214 0.267 0.313 0.390
150 0.001 0.022 0.076 0.146 0.219 0.288 0.351 0.407 0.497
145 0.002 0.041 0.119 0.211 0.303 0.387 0.462 0.528 0.632
140 0.006 0.073 0.185 0.306 0.419 0.521 0.609 0.685 0.805
135 0.016 0.129 0.287 0.442 0.580 0.700 0.803 0.890 1.025
130 0.040 0.226 0.442 0.636 0.803 0.943 1.060 1.158 1.308
125 0.097 0.390 0.676 0.915 1.111 1.271 1.402 1.511 1.675
120 0.223 0.661 1.026 1.311 1.535 1.714 1.859 1.977 2.154
115 0.488 1.101 1.545 1.872 2.122 2.316 2.471 2.596 2.781
110 1.009 1.795 2.306 2.665 2.931 3.135 3.295 3.423 3.611
105 1.962 2.868 3.411 3.780 4.048 4.252 4.410 4.535 4.718
100 3.519 4.456 4.988 5.340 5.591 5.780 5.926 6.040 6.207
95 6.142 6.836 7.249 7.530 7.731 7.885 8.004 8.096 8.232
90 10.222 10.430 10.589 10.708 10.803 10.877 10.934 11.022

At a given time level we can artificially move the boundary curve to the nearest
grid point. We hereby introduce an error in the x-direction of order h and this
is undesirable.
We can also introduce difference approximations to ux and uxx based on uneven
steps. This is a viable approach when the boundary curve is known beforehand,
but things become complicated when y(t) has to be determined along with u(t, x).
A third method was proposed by Brennan & Schwartz in 1977 [4]. They start
with the initial values from (16.2) augmented with values from (16.9) for t = 0
and x < K. They choose a step size k in the time direction and a step size h
in the x-direction such that K/h is an integer, and a value L = Mh where M is
another integer. They then perform a Crank-Nicolson step to t = k computing a
set of auxiliary values ν^1_m, m = 0, 1, . . . , M = L/h, with boundary values ν^1_0 = K
and ν^1_M = 0. The solution values are now determined as v^1_m = max(K − mh, ν^1_m)
and the position of the exercise boundary is given by ȳL(k) = h · max{m | ν^1_m ≤
K − mh}. Once v^1 has been determined we can move on to v^2, v^3, etc. This
Table 16.3: Order ratios corresponding to h.

x \ t 0.4 0.8 1.2 1.6 2.0 2.4 2.8 3.2 3.6 4.0
195 4.7 2.8 3.5 3.7 3.7 3.8 3.9 3.8
190 4.1 4.9 2.9 3.5 3.7 3.8 3.8 3.9 3.8
185 4.1 5.1 3.1 3.5 3.7 3.8 3.8 3.9 3.8
180 4.1 6.0 3.2 3.6 3.7 3.8 3.8 3.9 3.8
175 4.2 12.7 3.3 3.6 3.7 3.8 3.8 3.9 3.8
170 4.2 0.5 3.4 3.6 3.7 3.8 3.9 3.8 3.8
165 4.0 4.4 2.5 3.5 3.6 3.7 3.8 3.9 3.8 3.8
160 4.0 4.6 3.1 3.5 3.7 3.7 3.8 3.9 3.8 3.8
155 4.0 5.5 3.3 3.6 3.7 3.8 3.8 3.9 3.8 3.8
150 4.0 -2.4 3.4 3.6 3.7 3.8 3.9 3.9 3.8 3.8
145 4.0 2.8 3.5 3.7 3.7 3.8 3.9 3.9 3.8 3.8
140 4.1 3.3 3.6 3.7 3.8 3.8 3.9 3.8 3.8 3.8
135 4.3 3.5 3.6 3.7 3.8 3.8 3.9 3.8 3.8 3.9
130 -4.0 3.6 3.7 3.8 3.8 3.9 3.9 3.8 3.8 3.9
125 3.4 3.6 3.7 3.8 3.8 4.0 3.9 3.7 3.8 4.0
120 3.6 3.7 3.7 3.8 3.8 4.1 3.8 3.7 3.8 4.1
115 3.6 3.7 3.7 3.9 3.8 4.2 3.7 3.7 3.9 4.2
110 3.6 3.7 3.7 3.8 3.9 4.3 3.4 3.7 3.9 4.5
105 3.6 3.7 3.7 3.6 4.1 4.4 3.2 3.9 4.0 4.9
100 2.7 2.0 1.4 1.1 0.8 0.6 0.5 0.4 0.3 0.2
95 -2.5 -30.2 11.4 5.2 4.3 2.4 2.3 2.4 2.6 3.0
90 -1.1 -1.1 -0.8 -1.1 -0.5 -0.7 -1.3 -1.9 -1.4

may seem like a rather harsh treatment of the problem, but the results seem
reasonable at first glance.
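A single Brennan-Schwartz step can be sketched as follows (our own condensation in Python, using SciPy's banded solver; boundary values ν0 = K and νM = 0 as described above, and the initial values would be v = max(K − mh, 0)):

    import numpy as np
    from scipy.linalg import solve_banded

    def brennan_schwartz_step(v, K, sigma, r, k, h):
        """One step: Crank-Nicolson for (16.1) on the grid x = m*h,
        m = 0..M, followed by the maximum with the payoff K - x."""
        M = len(v) - 1
        m = np.arange(1, M)
        a = 0.5 * sigma**2 * m**2        # second difference coefficient
        b = 0.5 * r * m                  # first difference coefficient
        lo = -0.5 * k * (a - b)          # sub-diagonal of I - (k/2)P_h
        di = 1 + 0.5 * k * (2*a + r)     # main diagonal
        up = -0.5 * k * (a + b)          # super-diagonal
        rhs = (0.5*k*(a - b)*v[:-2] + (1 - 0.5*k*(2*a + r))*v[1:-1]
               + 0.5*k*(a + b)*v[2:])    # (I + (k/2)P_h) v^old
        rhs[0] -= lo[0] * K              # new boundary value nu_0 = K
        ab = np.zeros((3, M - 1))
        ab[0, 1:] = up[:-1]; ab[1] = di; ab[2, :-1] = lo[1:]
        nu = np.empty(M + 1); nu[0], nu[-1] = K, 0.0
        nu[1:-1] = solve_banded((1, 1), ab, rhs)
        x = np.arange(M + 1) * h
        return np.maximum(K - x, nu)     # the Brennan-Schwartz projection

The exercise boundary at the new level is then read off as h · max{m | νm ≤ K − mh}.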
In Fig. 16.3 we show a plot of the computed boundary curve for the parameter
values K = 100, σ = 0.2, r = 0.1, and L = 200, and calculated with h = 0.25
and k = 0.0625. In Table 16.2 we give values for the price function for a selection
of points in the continuation region.
In order to estimate the error we apply the techniques of chapter 10. In Table
16.3 we supply the order ratios which should be close to 4.0 if the method is
second order in h. This looks fairly reasonable for x > K. Occasional isolated
deviations correspond to small values of the associated error estimate which is
shown in Table 16.4. For x ≤ K the order determination is far from reliable and
the error estimate due to the x-discretisation is unfortunately also much larger
here, close to the exercise boundary.

Table 16.4: Error estimate *1000 corresponding to h.

x\t 0.4 0.8 1.2 1.6 2.0 2.4 2.8 3.2 3.6 4.0
195 -0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
190 -0.00 -0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
185 -0.00 -0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
180 -0.00 -0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
175 -0.00 -0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.01
170 -0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.01 0.01
165 -0.00 -0.00 0.00 0.00 0.00 0.01 0.01 0.01 0.01 0.01
160 -0.00 -0.00 0.00 0.00 0.01 0.01 0.01 0.01 0.01 0.01
155 -0.00 -0.00 0.00 0.01 0.01 0.01 0.01 0.01 0.01 0.01
150 -0.00 0.00 0.00 0.01 0.01 0.01 0.02 0.02 0.02 0.02
145 -0.00 0.00 0.01 0.01 0.02 0.02 0.02 0.02 0.02 0.02
140 -0.00 0.00 0.01 0.02 0.02 0.02 0.02 0.03 0.03 0.03
135 -0.00 0.01 0.02 0.03 0.03 0.03 0.03 0.03 0.03 0.03
130 0.00 0.02 0.03 0.03 0.04 0.04 0.04 0.04 0.04 0.04
125 0.01 0.03 0.04 0.05 0.05 0.05 0.05 0.05 0.05 0.05
120 0.03 0.06 0.06 0.06 0.06 0.06 0.06 0.06 0.06 0.06
115 0.07 0.08 0.08 0.08 0.08 0.07 0.08 0.07 0.07 0.08
110 0.13 0.12 0.11 0.10 0.10 0.09 0.09 0.08 0.09 0.09
105 0.15 0.17 0.15 0.13 0.12 0.10 0.11 0.09 0.12 0.11
100 -6.74 -8.53 -9.31 -9.57 -9.52 -9.30 -8.98 -8.59 -8.12 -7.71
95 -0.50 -0.06 0.18 0.30 0.42 0.50 0.45 0.47 0.51 0.40
90 -1.90 -1.72 -1.57 -1.43 -1.07 -1.17 -0.91 -0.81 -0.91

In Table 16.5 we supply the similar ratios corresponding to the time discretisa-
tion. Values close to 2.0 indicate that the method is first order in k contrary
to our expectations of a method based on Crank-Nicolson. Also here the order
determination leaves a lot to be desired for x ≤ K, and the error estimate due to
the time discretisation is also much larger here as seen in Table 16.6. We must
conclude that the Brennan-Schwartz approach is not ideal.

16.5 Varying the time steps

For the American option problem where the boundary curve is known to be mono-
tonic we can suggest an alternate approach. Since the finite difference methods
which we are usually considering for parabolic problems are one-step methods
there is no need to keep the step size in time constant throughout the calcu-

171
Table 16.5: Order ratios corresponding to k.

x\t 0.4 0.8 1.2 1.6 2.0 2.4 2.8 3.2 3.6 4.0
195 5.8 3.6 0.7 2.2 2.1 2.0 1.9 1.9 1.9 1.9
190 5.5 3.7 1.0 2.2 2.1 2.0 1.9 1.9 1.9 1.9
185 4.9 3.9 1.3 2.2 2.1 2.0 1.9 1.9 1.9 1.9
180 4.4 4.3 1.7 2.2 2.1 2.0 1.9 1.9 1.9 1.9
175 4.0 5.1 1.9 2.2 2.0 2.0 1.9 1.9 1.9 1.9
170 3.6 7.7 2.1 2.1 2.0 1.9 1.9 1.9 1.9 1.9
165 3.3 -25.1 2.2 2.1 2.0 1.9 1.9 1.9 1.9 1.9
160 3.0 -0.3 2.2 2.1 2.0 1.9 1.9 1.9 1.9 1.9
155 2.9 1.5 2.2 2.0 1.9 1.9 1.9 1.9 1.9 1.9
150 3.0 2.2 2.2 2.0 1.9 1.9 1.9 1.9 1.9 1.9
145 3.5 2.6 2.1 1.9 1.9 1.9 1.9 1.9 1.9 1.9
140 6.0 2.7 2.0 1.9 1.9 1.9 1.9 1.9 1.9 1.9
135 -2.7 2.5 2.0 1.9 1.8 1.8 1.9 1.9 1.9 1.9
130 1.0 2.2 1.9 1.8 1.8 1.8 1.8 1.9 1.9 1.9
125 2.4 1.8 1.8 1.8 1.8 1.8 1.9 1.9 1.9 1.9
120 4.3 1.6 1.7 1.8 1.8 1.9 1.9 1.9 1.9 1.9
115 5.5 2.2 1.7 1.7 1.8 1.8 1.9 1.9 1.9 1.9
110 -0.1 2.4 2.3 2.0 1.9 1.9 1.8 1.8 1.9 1.9
105 -11.0 -0.6 0.6 1.3 1.7 2.0 2.1 2.1 2.1 2.1
100 1.8 1.9 2.0 2.0 1.9 1.9 1.9 1.9 1.8 1.8
95 -5.4 -1.5 -0.5 -0.0 0.3 0.5 0.6 0.7 0.8 0.9
90 1.2 1.2 1.2 1.3 1.4 1.4 1.4 1.5

Instead we propose to choose the step sizes kn such that the boundary
curve will pass exactly through grid points. This idea was proposed for the orig-
inal Stefan Problem by Douglas & Gallie [9]. In our case the boundary curve is
decreasing so we shall choose kn such that there is precisely one extra grid point
at the next time level, see Fig. 16.4. This will imply that the initial time steps
will be very small and then keep increasing, but as we shall see that is not such
a bad idea from other points of view as well.

16.6 The implicit method

Since there is exactly one extra grid point on the next time level the situation
is ideally suited for the implicit method which will never have to refer to points
outside the continuation region.

Table 16.6: Error estimate *100 corresponding to k.

x \ t 0.4 0.8 1.2 1.6 2.0 2.4 2.8 3.2 4.0


195 -0.00 -0.00 0.00 0.00 0.01 0.01 0.01 0.01 0.02
190 -0.00 -0.00 0.00 0.01 0.01 0.02 0.02 0.03 0.04
185 -0.00 -0.00 0.00 0.01 0.02 0.03 0.04 0.05 0.06
180 -0.00 -0.00 0.01 0.02 0.03 0.05 0.06 0.07 0.09
175 -0.00 -0.00 0.01 0.03 0.05 0.07 0.08 0.10 0.12
170 -0.00 -0.00 0.02 0.04 0.07 0.09 0.11 0.13 0.15
165 -0.00 0.00 0.03 0.06 0.09 0.12 0.15 0.17 0.20
160 -0.01 0.01 0.05 0.09 0.12 0.16 0.19 0.22 0.26
155 -0.01 0.02 0.07 0.12 0.17 0.21 0.25 0.28 0.33
150 -0.02 0.04 0.11 0.17 0.23 0.28 0.32 0.36 0.41
145 -0.02 0.08 0.16 0.24 0.30 0.36 0.42 0.46 0.52
140 -0.02 0.13 0.23 0.33 0.41 0.48 0.54 0.59 0.66
135 0.03 0.22 0.34 0.45 0.55 0.63 0.70 0.75 0.83
130 0.15 0.34 0.49 0.63 0.74 0.83 0.91 0.96 1.05
125 0.36 0.51 0.71 0.87 1.00 1.10 1.18 1.24 1.33
120 0.54 0.77 1.01 1.21 1.35 1.46 1.54 1.60 1.69
115 0.66 1.15 1.48 1.68 1.83 1.94 2.02 2.08 2.16
110 1.67 1.65 2.06 2.33 2.49 2.59 2.66 2.70 2.77
105 0.60 3.36 3.56 3.48 3.43 3.43 3.45 3.48 3.54
100 19.86 15.82 13.94 12.90 12.22 11.72 11.32 10.99 10.46
95 4.33 7.09 7.09 6.94 6.86 6.82 6.81 6.80 6.77
90 9.50 9.30 9.07 8.85 8.63 8.50 8.45 8.36

Like Brennan and Schwartz we choose a step size in the x-direction, h = K/M0,
where M0 is a positive integer, and such that M = L/h is also an integer. The
step size h is kept fixed during the computation. The number of grid points above
the exercise boundary is thus M − M0 at time t = 0 which corresponds to the
expiration time. For each time step we add one extra grid point in the x-direction
such that at the end of the n-th time step we have M − Mn grid points above the
boundary where Mn = M0 − n. The time steps will be denoted k1, k2, . . . and we
define tn = Σ_{i=1}^{n} ki. We now compute a grid function

v^n_m = v(tn, mh) ≈ u(tn, mh)

satisfying

(v^n_m − v^{n−1}_m)/kn = ½σ²m²(v^n_{m+1} − 2v^n_m + v^n_{m−1}) + ½rm(v^n_{m+1} − v^n_{m−1}) − rv^n_m,
      m = Mn + 1, . . . , M − 1,    (16.25)
Figure 16.4: The first few grid lines.

v^0_m = 0,   m = M0, . . . , M,    (16.26)

v^n_M = 0,    (16.27)

v^n_{Mn} = K − Mn h = nh,    (16.28)

−1 = (−v^n_{m+2} + 4v^n_{m+1} − 3v^n_m)/(2h),   m = Mn,    (16.29)

n = 1, 2, . . .. The first order approximation to the boundary derivative v^n_{m+1} −
v^n_m = −h is abandoned in favour of (16.29) for reasons of accuracy and because
of problems with the initial steps.
The equations (16.25) – (16.29) are non-linear so we propose an iterative ap-
proach. We guess a value for kn , solve the linear tridiagonal system (16.25),
(16.27), and (16.28), and use formula (16.29) to correct the time step. Alter-
natively we could solve the (almost) tridiagonal system determined by (16.25),
(16.27), and (16.29), and use formula (16.28) to correct the time step. Numerical
experiments, however, suggest that the former method is computationally more
efficient in that it requires fewer iterations when searching for the size of the next
time step.

Getting Started

For n = 1 equations (16.28) and (16.29) give

−v^1_{m+2} + 4v^1_{m+1} = h,   (m = M1 = M0 − 1).

When h is small we expect v^1_{m+2} to be very small. We put v^1_{m+2} = εh and get

v^1_{m+1} = (h/4)(1 + ε),   m = M1.

Applying (16.25) with m = M1 + 1 = M0 = K/h we then get

h(1 + ε)/(4k1) = (σ²K²/(2h²))(ε − (1 + ε)/2 + 1)h + (rK/(2h))(ε − 1)h − r(h/4)(1 + ε)

⇒ 1/k1 = σ²K²/h² − (2rK/h)·(1 − ε)/(1 + ε) − r

⇒ k1 = h²/(σ²K² − 2rKh(1 − ε)/(1 + ε) − rh²) ≈ h²/(σ²K² − 2rKh − rh²)    (16.30)
and we have a good initial guess for the size of the first time step. Note further
that

k1 ≈ h²/(σ²K²)

so it appears that the boundary curve y(t) starts out from (0, K) with a vertical
tangent and with a shape much like a parabola with its apex at (0, K).
For n > 1 we could as a starting value for kn use kn−1, but for n > 2 it turns out
to be more efficient to use

kn^1 = 2kn−1 − kn−2    (16.31)
based on the assumption that the second derivative of yL(t) is small. Given the
initial value kn^1, a good second value can be obtained as follows. First we solve
equations (16.25) – (16.28) and obtain tentative values for v^n_m, m = Mn, . . . , M.
We then evaluate the accuracy of kn^1 by calculating the error term

e = −1 − (−v^n_{Mn+2} + 4v^n_{Mn+1} − 3v^n_{Mn})/(2h).    (16.32)
If e = 0 then kn^1 is the right size of the time step. Otherwise we would like to
change kn such that e = 0. First notice that

Δe = −(4Δv^n_{Mn+1} − Δv^n_{Mn+2})/(2h) ≈ −3Δv^n_{Mn+1}/(2h) ≈ −(3/(2h))(∂v^n_{Mn+1}/∂t)Δt.
An approximation to the time derivative at grid point (n, Mn+1) can be obtained
by

∂v^n_{Mn+1}/∂t ≈ (v^n_{Mn+1} − v^{n−1}_{Mn+1})/kn^1

and since we want Δe = −e our second value for kn becomes

kn^2 = kn^1 + (2/3) h kn^1 e / (v^n_{Mn+1} − v^{n−1}_{Mn+1}).    (16.33)

Iterating kn

With a proposed value for kn we can rewrite equation (16.25) as

v^n_m − αm kn(v^n_{m+1} − 2v^n_m + v^n_{m−1}) − βm kn(v^n_{m+1} − v^n_{m−1}) + rkn v^n_m = v^{n−1}_m

with αm = ½σ²m² and βm = ½rm. Collecting terms we get

(βm − αm)kn v^n_{m−1} + (1 + (2αm + r)kn)v^n_m − (βm + αm)kn v^n_{m+1} = v^{n−1}_m,    (16.34)

m = Mn + 1, . . . , M − 1. With the two boundary conditions (16.27) and (16.28)


we have as many equations as unknowns and we can solve the resulting system of
equations. The calculated values are then checked with equation (16.29). If they
do not fit, and they seldom do the first time around, the value for kn is adjusted,
and the equations are solved again until a satisfactory agreement with (16.29) is
achieved.
The first and the second value for kn have been discussed above. The general
way of calculating kn^{i+1} from kn^{i−1} and kn^i for i > 1 is by using the secant method
on formula (16.32):

kn^{i+1} = kn^i − e_i (kn^i − kn^{i−1})/(e_i − e_{i−1})    (16.35)

where e_i is calculated from (16.32) with v-values calculated with kn^i. The iteration
is continued until two successive values of kn^i differ by less than a predetermined
tolerance.
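The secant loop itself is short. In the sketch below (ours) the function e is assumed to solve the tridiagonal system (16.34) with (16.27) and (16.28) for a trial step k and return the defect (16.32):

    def find_time_step(e, k1, k2, tol=1e-10):
        """Secant iteration (16.35) on the boundary defect e(k);
        k1, k2 are the two starting guesses discussed in the text."""
        e1, e2 = e(k1), e(k2)
        while abs(k2 - k1) >= tol:
            k1, k2, e1 = k2, k2 - e2 * (k2 - k1) / (e2 - e1), e2
            e2 = e(k2)
        return k2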
Table 16.7 illustrates the time step iterations corresponding to the usual set of
parameter values K = 100, σ = 0.2, r = 0.1, and L = 200, and calculated with
h = 1. We note that the initial guess for k1 from (16.30) is reasonably good, and
so is the second guess from (16.33). k1 is a reasonable initial guess for k2 which
turns out to be only slightly larger than k1 , thus not supporting the parabola
hypothesis. For the subsequent steps (16.31) is a reasonable first guess although
it always undershoots. In all cases the subsequent secant method displays rapid
convergence. We have chosen a tolerance of 10^{−10} because even when only a
limited accuracy is demanded in the final results it is important that the time
steps are correct. And since the secant method has superlinear convergence the
last decimals are inexpensive (cf. Table 16.14).
Because the boundary curve has a horizontal asymptote the time steps must
eventually increase without bound as we approach the magical value of yL . We
stop the calculations when the proposed time step exceeds a predetermined large
value (or becomes negative).

Table 16.7: Time step iterations with the implicit method.

n 1 2 3 4 5 6
1 0.00263227 0.00262001 0.00261359 0.00261361 0.00261361
2 0.00261361 0.00290627 0.00305600 0.00306050 0.00306055 0.00306055
3 0.00350749 0.00387369 0.00404826 0.00404955 0.00404955
4 0.00503856 0.00595513 0.00624391 0.00624057 0.00624058
5 0.00843160 0.01026408 0.01041317 0.01041053 0.01041053
6 0.01458048 0.01695680 0.01683730 0.01683827 0.01683827
7 0.02326601 0.02603689 0.02574487 0.02574540
8 0.03465253 0.03849420 0.03797056 0.03796966 0.03796966
9 0.05019391 0.05598995 0.05510357 0.05509914 0.05509915
10 0.07222864 0.08125947 0.07982226 0.07980969 0.07980971
11 0.10452027 0.11917606 0.11686413 0.11683260 0.11683268
12 0.15385565 0.17916186 0.17537148 0.17529225 0.17529254
13 0.23375241 0.28168447 0.27523543 0.27502091 0.27502202
14 0.37475150 0.47960268 0.46820027 0.46753680 0.46754150 0.46754149
15 0.66006096 0.95513540 0.93700802 0.93475183 0.93476999 0.93476997
16 1.40199846 2.89947139 2.99960311 3.04665061 3.04805724 3.04807080
16.7 Crank-Nicolson

When attempting to use the Crank-Nicolson method on this problem there is a
minor complication. Since the boundary curve is decreasing, the first interior grid
point at a particular time level corresponds to a boundary point at the previous
time level. If we want to use a Crank-Nicolson approximation here we shall refer
to a point outside the continuation region at the previous time level. Such a point
is usually called a fictitious point and it must be treated separately. There are
(at least) three suggestions:
1. Use the value K − x at the fictitious point. This is not correct but reasonably
close.
2. Extrapolate from the boundary point and its neighbour using the central
difference approximation to ux which we know is −1.
3. Replace the difference approximations to ux and uxx on the boundary by the
‘exact’ values of −1 and 2rK/(σ²y(t)²) from (16.5) and (16.11).
In practice these three approaches perform equally well. We shall therefore con-
centrate on the method described in 3. and note that the details in implementing
1. and 2. can be filled in similarly. We do make one exception from the method-
ology described in 3. At the initial step neither ux (0, K) nor uxx (0, K) is defined
so we need to deal with the initial step differently. Instead we look at suggestion
1. and approximate
    ux(0, K) by (u(0, K + h) − u(0, K − h))/(2h) = −1/2

and

    uxx(0, K) by (u(0, K + h) − 2u(0, K) + u(0, K − h))/h² = 1/h.
With notation as in the previous section we compute a grid function

    v^n_m = v(t_n, mh) ≈ u(t_n, mh)

satisfying

    (v^1_m − v^0_m)/k_n = (σ²/4) m² (v^1_{m+1} − 2v^1_m + v^1_{m−1} + h)    (16.36)
                        + (r/4) m (v^1_{m+1} − v^1_{m−1} − h) − (r/2) v^1_m,    m = M_0, n = 1,

    (v^n_m − v^{n−1}_m)/k_n = (σ²/4) m² (v^n_{m+1} − 2v^n_m + v^n_{m−1}) + rK/2    (16.37)
                            + (r/4) m (v^n_{m+1} − v^n_{m−1} − 2h) − (r/2)(v^n_m + v^{n−1}_m),    m = M_{n−1}, n > 1,

    (v^n_m − v^{n−1}_m)/k_n = (σ²/4) m² (v^n_{m+1} − 2v^n_m + v^n_{m−1} + v^{n−1}_{m+1} − 2v^{n−1}_m + v^{n−1}_{m−1})    (16.38)
                            + (r/4) m (v^n_{m+1} − v^n_{m−1} + v^{n−1}_{m+1} − v^{n−1}_{m−1}) − (r/2)(v^n_m + v^{n−1}_m),
                            m = M_n + 2, . . . , M − 1,

    v^0_m = 0,    m = M_0, . . . , M,    (16.39)

    v^n_M = 0,    (16.40)

    v^n_{M_n} = K − M_n h = nh,    (16.41)

    −1 = (−v^n_{m+2} + 4v^n_{m+1} − 3v^n_m)/(2h),    m = M_n,    (16.42)
n = 1, 2, . . .. Just as for the implicit method we guess a value for kn , solve the
tridiagonal system of equations given by (16.36) – (16.41), and use (16.42) to
correct the time step. Alternatively (16.42) could be incorporated in the system
of equations and (16.41) be used for the correction. We prefer the former since
it appears to lead to fewer iterations.
Getting Started

For n = 1 equations (16.41) and (16.42) give

    −v^1_{m+2} + 4v^1_{m+1} = h,    (m = M_1 = M_0 − 1).
When h is small we expect v^1_{m+2} to be very small. We put v^1_{m+2} = εh and get

    v^1_{m+1} = (h/4)(1 + ε),    m = M_1.
Applying (16.36) with m = M_1 + 1 = M_0 = K/h we then get

    h(1 + ε)/(4k_1) = (σ²m²/4)(ε − (1 + ε)/2 + 2)h + (rm/4)(ε − 2)h − (rh/8)(1 + ε)

    ⇒ 1/k_1 = σ²m²(1/2 + 1/(1 + ε)) − rm(2 − ε)/(1 + ε) − (1/2)r

    ⇒ k_1 = h² / (σ²K²(1/2 + 1/(1 + ε)) − rKh(2 − ε)/(1 + ε) − (1/2)rh²)
and we have a good initial guess for the size of the first time step. Note further
that

    k_1 ≈ h²/((3/2)σ²K² − 2rKh − (1/2)rh²) ≈ h²/((3/2)σ²K²).    (16.43)
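As a check, for the parameter values used below (K = 100, σ = 0.2, r = 0.1) and
h = 1, putting ε = 0 in the formula above gives k_1 = 1/(600 − 20 − 0.05) ≈ 0.0017243,
which is precisely the first guess in the n = 1 row of Table 16.8, while the
simplified estimate (16.43) gives 1/600 ≈ 0.0016667; the converged value is
0.00176116.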
Table 16.8: Time step iterations with Crank-Nicolson.
n \ i 1 2 3 4 5 6
1 0.00172429 0.00181050 0.00176141 0.00176115 0.00176116
2 0.00528347 0.00534501 0.00575238 0.00575488 0.00575490
3 0.00974864 0.00803102 0.00654763 0.00659399 0.00659352 0.00659352
4 0.00743214 0.00779111 0.00804438 0.00804408 0.00804408
5 0.00949463 0.01017074 0.01067041 0.01066699 0.01066699
6 0.01328991 0.01456567 0.01543735 0.01542461 0.01542468
7 0.02018236 0.02222535 0.02337563 0.02335643 0.02335653
8 0.03128838 0.03377432 0.03489219 0.03488019 0.03488023
9 0.04640393 0.04936500 0.05056048 0.05055332 0.05055333
10 0.06622644 0.07076255 0.07257623 0.07256957 0.07256957
11 0.09458581 0.10195371 0.10499982 0.10499678
12 0.13742398 0.14955464 0.15495471 0.15496947 0.15496949
13 0.20494220 0.22635535 0.23713380 0.23722496 0.23722525
14 0.31948101 0.36112641 0.38639870 0.38687650 0.38688015
15 0.53653504 0.63033291 0.70622891 0.70946632 0.70953139 0.70953144
16 1.03218273 1.30246815 1.65738437 1.70025643 1.70343860 1.70346436
Table 16.9: Order ratios for the implicit method.

x\t 0.4 0.8 1.2 1.6 2.0 2.4 2.8 3.2 3.6 4.0
195 6.9 1.7 2.0 1.9 1.5 - 5.7 2.0 1.7 1.6
190 6.4 1.7 2.0 1.9 1.5 - 5.6 2.0 1.7 1.6
185 5.8 1.7 2.0 1.9 1.5 6.6 5.3 2.0 1.7 1.7
180 5.2 1.7 2.0 1.9 1.5 4.5 5.1 2.0 1.7 1.7
175 4.5 1.6 2.0 1.9 1.5 3.4 4.8 2.0 1.7 1.7
170 4.0 1.6 2.0 1.9 1.3 2.7 4.5 2.0 1.7 1.7
165 3.5 1.6 1.9 1.8 0.9 2.3 4.3 2.1 1.8 1.7
160 3.1 1.5 1.9 1.7 7.1 2.1 4.1 2.1 1.8 1.7
155 2.8 1.5 1.9 1.3 2.5 1.9 3.9 2.1 1.8 1.8
150 2.5 1.5 1.9 -2.4 2.2 1.8 3.7 2.1 1.8 1.8
145 2.3 1.5 1.9 3.7 2.1 1.7 3.6 2.2 1.9 1.9
140 2.1 1.5 1.8 2.9 2.1 1.7 3.5 2.2 1.9 1.9
135 2.0 1.5 2.2 2.7 2.1 1.6 3.5 2.2 2.0 1.9
130 1.9 1.6 2.1 2.7 2.1 1.6 3.5 2.3 2.0 2.0
125 1.8 1.5 2.2 2.7 2.2 1.6 3.4 2.3 2.1 2.0
120 1.6 1.6 2.2 2.7 2.2 1.6 3.4 2.4 2.1 2.1
115 2.3 1.7 2.3 2.7 2.3 1.6 3.4 2.4 2.1 2.1
110 2.1 1.7 2.3 2.7 2.3 1.6 3.4 2.5 2.2 2.2
105 2.1 1.8 2.3 2.8 2.3 1.6 3.4 2.5 2.2 2.2
100 2.2 1.8 2.4 2.8 2.3 1.6 3.4 2.5 2.2 2.2
95 2.1 1.8 2.3 2.7 2.3 1.7 3.4 2.5 2.2 2.2
90 1.6 1.7 2.1 2.5 2.2 1.7 3.3 2.4 2.2 2.2
85 8.0 0.6 0.7 1.0 1.1
y 2.1 1.6 2.3 2.8 2.3 1.5 3.5 2.5 2.2 2.2
so just like for the implicit method it appears that the boundary curve y(t) starts
out from (0, K) with a vertical tangent and with a shape much like a parabola
with its apex at (0, K), although this time a slightly different parabola.
For n = 2 practical experience shows that it is efficient to exploit the fact that the
boundary curve at the beginning looks like a parabola with its apex at (0, K).
When f (x) = αx2 then f (2h)/f (h) = 4 so that f (2h) − f (h) = 3f (h). This
indicates that it might be a good idea to put k_2^1 = 3k_1. This is in contrast to
the implicit method where the second step turns out to be of the same order
of magnitude as the first. Thus it looks like the parabola conjecture fits better
to Crank-Nicolson than to the implicit method, at least for the first two steps.
The third step with Crank-Nicolson is, however, not large enough to fit the same
pattern.
For n > 2 we proceed like with the implicit method and put

    k_n^1 = 2k_{n−1} − k_{n−2}.    (16.44)
For the second guess we use (16.33) and for subsequent values the secant method
(16.35) is used just as with the implicit method producing better and better
values for the time step until the tolerance is met.
Table 16.8 illustrates the time step iterations corresponding to the usual set of
parameter values K = 100, σ = 0.2, r = 0.1, and L = 200, and calculated with
h = 1. Most comments on the previous table carry over verbatim to this
one. The main differences are that k2 now is close to 3k1 whereas k3 is close
to k2 . The initial guess for k3 therefore overshoots. So much for the parabola
hypothesis.
Table 16.10: Error estimate *10 for the implicit method.

x\t    0.4    0.8    1.2    1.6    2.0    2.4    2.8    3.2    4.0
195 -0.000 -0.001 -0.002 -0.003 -0.002  0.000  0.002  0.008  0.013
190 -0.000 -0.002 -0.005 -0.006 -0.004  0.001  0.005  0.017  0.027
185 -0.000 -0.003 -0.007 -0.009 -0.006  0.002  0.008  0.028  0.044
180 -0.000 -0.005 -0.011 -0.013 -0.008  0.004  0.013  0.041  0.064
175 -0.000 -0.007 -0.014 -0.016 -0.009  0.008  0.019  0.057  0.087
170 -0.001 -0.010 -0.018 -0.018 -0.008  0.015  0.028  0.077  0.115
165 -0.001 -0.014 -0.022 -0.019 -0.005  0.026  0.040  0.102  0.147
160 -0.002 -0.018 -0.025 -0.018  0.001  0.041  0.056  0.132  0.185
155 -0.004 -0.024 -0.027 -0.013  0.012  0.063  0.078  0.170  0.230
150 -0.007 -0.030 -0.027 -0.003  0.029  0.093  0.106  0.215  0.283
145 -0.012 -0.035 -0.021  0.014  0.055  0.134  0.141  0.269  0.343
140 -0.020 -0.037 -0.009  0.041  0.090  0.185  0.184  0.333  0.411
135 -0.031 -0.031  0.015  0.079  0.136  0.249  0.236  0.406  0.486
130 -0.042 -0.011  0.052  0.132  0.195  0.326  0.296  0.486  0.566
125 -0.044  0.028  0.107  0.199  0.264  0.412  0.361  0.571  0.648
120 -0.024  0.094  0.179  0.277  0.341  0.502  0.428  0.654  0.726
115  0.040  0.185  0.261  0.359  0.417  0.588  0.489  0.726  0.790
110  0.152  0.287  0.342  0.432  0.481  0.655  0.535  0.776  0.830
105  0.279  0.370  0.399  0.477  0.514  0.683  0.551  0.785  0.830
100  0.333  0.391  0.403  0.468  0.498  0.651  0.523  0.735  0.770
 95  0.242  0.312  0.330  0.386  0.411  0.535  0.432  0.604  0.633
 90  0.043  0.134  0.174  0.218  0.242  0.319  0.267  0.376  0.399
 85  0.002  0.024  0.040  0.057
  y -0.579 -0.654 -0.642 -0.734 -0.765 -1.003 -0.772 -1.092 -1.127
16.8 Determining the order

As usual we should like to determine the order of the methods and to estimate
the error. There is only one independent step size, h, and we can easily perform
calculations with h, 2h, and 4h, but now we are faced with the same problem as
in Chapter 15. The step sizes in the t-direction are determined in the course of
the calculations, and there is no way of ensuring that we have comparable grid
function values at the same point in time corresponding to two different values
of h.
Table 16.11: Order ratios for Crank-Nicolson.

x\t 0.4 0.8 1.2 1.6 2.0 2.4 2.8 3.2 3.6 4.0
195 - 4.0 - 3.8 5.1 5.5 7.1 4.2 6.8 3.1
190 - 4.0 - 3.8 5.1 5.5 7.1 4.2 7.0 3.2
185 - 4.0 -0.2 3.8 5.2 5.5 7.1 4.2 7.3 3.4
180 - 4.1 3.5 3.9 5.2 5.5 7.2 4.2 8.0 3.5
175 3.9 4.1 4.4 3.9 5.2 5.5 7.2 4.2 - 3.7
170 3.9 4.2 4.8 3.9 5.2 5.6 7.2 4.2 - 3.8
165 3.9 4.5 5.0 4.0 5.2 5.6 7.3 4.3 -5.7 3.9
160 4.0 - 5.1 4.0 5.2 5.6 7.5 4.4 1.1 3.9
155 4.0 3.5 5.2 4.0 5.2 5.7 7.7 7.0 2.6 4.0
150 4.1 3.8 5.3 4.1 5.2 5.8 8.8 3.8 3.3 4.0
145 4.3 3.9 5.3 4.1 5.2 6.2 1.6 4.0 3.7 4.0
140 5.9 3.9 5.3 4.1 5.2 - 6.0 4.0 3.9 4.1
135 3.5 4.0 5.2 4.2 4.4 5.0 6.5 4.1 4.1 4.1
130 3.7 4.0 5.0 4.2 5.5 5.3 6.7 4.1 4.2 4.1
125 3.8 3.9 4.2 4.1 5.4 5.4 6.8 4.1 4.3 4.1
120 3.8 3.8 - 4.1 5.4 5.5 6.8 4.1 4.3 4.1
115 3.7 2.0 6.6 4.2 5.4 5.5 6.9 4.1 4.4 4.1
110 3.4 4.8 6.2 4.2 5.4 5.6 6.9 4.1 4.4 4.1
105 2.4 4.5 6.0 4.2 5.4 5.6 6.9 4.1 4.4 4.1
100 - 4.4 5.9 4.2 5.3 5.5 6.8 4.1 4.4 4.1
95 -0.2 4.3 5.7 4.1 5.3 5.4 6.7 4.1 4.4 4.1
90 3.8 4.2 4.7 4.1 4.9 5.1 6.2 4.1 4.3 4.1
85 1.5 - -3.3 3.7 2.2 3.9
y 7.5 4.2 5.3 4.1 5.0 5.2 6.2 4.1 4.4 4.1
The solution is again to interpolate between the time values that we actually
compute. In Table 16.9 we give the order ratios and in Table 16.10 the error
function multiplied by 10 for a computation with the parameter values K = 100,
Table 16.12: Error function *1000 for Crank-Nicolson.
x\t 0.4 0.8 1.2 1.6 2.0 2.4 2.8 3.2 4.0
195 - -0.000 - 0.001 0.002 0.003 0.002 0.003 -0.001
190 - -0.000 - 0.002 0.004 0.006 0.005 0.005 -0.002
185 - -0.001 0.000 0.003 0.006 0.009 0.008 0.008 -0.004
180 - -0.001 0.000 0.005 0.009 0.012 0.010 0.010 -0.006
175 -0.000 -0.001 0.001 0.007 0.011 0.015 0.013 0.012 -0.011
170 -0.000 -0.001 0.002 0.009 0.014 0.017 0.014 0.012 -0.017
165 -0.000 -0.001 0.003 0.012 0.017 0.019 0.015 0.011 -0.025
160 -0.000 -0.000 0.005 0.014 0.018 0.020 0.014 0.008 -0.037
155 -0.001 0.001 0.008 0.017 0.019 0.019 0.011 0.001 -0.052
150 -0.001 0.003 0.011 0.019 0.019 0.016 0.006 -0.009 -0.072
145 -0.001 0.007 0.014 0.020 0.016 0.010 -0.002 -0.024 -0.096
140 -0.000 0.011 0.016 0.018 0.011 0.000 -0.014 -0.044 -0.125
135  0.002  0.015  0.016  0.013  0.002 -0.013 -0.029 -0.068 -0.160
130  0.007  0.019  0.014  0.004 -0.011 -0.031 -0.048 -0.097 -0.198
125  0.014  0.019  0.008 -0.010 -0.028 -0.052 -0.069 -0.129 -0.238
120  0.022  0.014 -0.003 -0.028 -0.048 -0.076 -0.092 -0.162 -0.278
115  0.026  0.003 -0.017 -0.049 -0.070 -0.100 -0.115 -0.193 -0.313
110  0.021 -0.013 -0.033 -0.069 -0.089 -0.120 -0.133 -0.217 -0.337
105  0.009 -0.028 -0.046 -0.084 -0.102 -0.132 -0.142 -0.228 -0.343
100  0.001 -0.034 -0.050 -0.086 -0.102 -0.129 -0.138 -0.218 -0.323
 95  0.003 -0.027 -0.041 -0.070 -0.085 -0.108 -0.116 -0.181 -0.267
90 0.005 -0.008 -0.021 -0.037 -0.049-0.064 -0.072 -0.111 -0.168
85 -0.001 -0.009 -0.008 -0.022
y 0.007 0.121 0.159 0.231 0.256 0.302 0.311 0.443 0.598
σ = 0.2, r = 0.1, and L = 200, and calculated with h = 0.25, 0.5, and 1.0
and using linear interpolation. The last line in each table refers to the boundary
curve. We expect the implicit method to be of first order and have therefore
elected to use linear interpolation. Furthermore 3-point interpolation produces
bad results for large values of t because of the large time steps here. The first
order is clearly demonstrated by the order ratios. which vary slowly with x (with
occasional deviations which can be explained by small values of the auxiliary
function). But the effect of the interpolation error is evident in the t-dependence
which displays a somewhat erratic behaviour. As discussed in Chapter 15 and
Appendix C we can still have confidence in the error estimates.
In contrast to Brennan-Schwartz the order ratios behave nicely around x = K
and the error function does not have an excessive maximum there. The behaviour
is much smoother with a ‘soft’ maximum attained near x = K.
Table 16.13: The price function computed with Crank-Nicolson.
x\t 0.4 0.8 1.2 1.6 2.0 2.4 2.8 3.2 4.0
195 0.000 0.000 0.001 0.002 0.005 0.008 0.011 0.014 0.020
190 0.000 0.000 0.002 0.005 0.011 0.018 0.025 0.031 0.042
185 0.000 0.000 0.003 0.009 0.019 0.030 0.041 0.051 0.069
180 0.000 0.001 0.005 0.015 0.029 0.044 0.060 0.075 0.100
175 0.000 0.001 0.008 0.022 0.041 0.063 0.084 0.104 0.138
170 0.000 0.002 0.012 0.033 0.059 0.087 0.115 0.140 0.183
165 0.000 0.004 0.020 0.048 0.082 0.119 0.154 0.186 0.239
160 0.000 0.007 0.031 0.070 0.115 0.161 0.204 0.243 0.308
155 0.000 0.012 0.049 0.102 0.160 0.217 0.269 0.317 0.394
150 0.001 0.023 0.077 0.148 0.221 0.291 0.355 0.411 0.502
145 0.002 0.041 0.121 0.214 0.306 0.391 0.467 0.533 0.638
140 0.006 0.074 0.188 0.310 0.424 0.526 0.615 0.692 0.812
135 0.016 0.132 0.291 0.447 0.587 0.708 0.811 0.898 1.034
130 0.041 0.230 0.448 0.644 0.812 0.953 1.070 1.169 1.320
125 0.100 0.396 0.684 0.925 1.122 1.283 1.416 1.525 1.690
120 0.229 0.671 1.038 1.325 1.551 1.731 1.877 1.995 2.173
115 0.498 1.116 1.563 1.892 2.143 2.339 2.494 2.620 2.806
110 1.026 1.818 2.332 2.693 2.960 3.164 3.325 3.454 3.642
105 1.992 2.901 3.447 3.818 4.087 4.290 4.449 4.574 4.758
100 3.636 4.528 5.047 5.394 5.644 5.831 5.977 6.091 6.258
95 6.234 6.911 7.320 7.599 7.800 7.952 8.070 8.163 8.299
90 10.068 10.320 10.526 10.681 10.799 10.892 10.965 11.023 11.110
85 15.000 15.006 15.014 15.022 15.038
y 88.528 86.792 85.875 85.294 84.891 84.597 84.374 84.201 83.953
Crank-Nicolson is expected to be of second order and we therefore switch to 3-
point interpolation. Table 16.11 gives the order ratios and Table 16.12 gives the
error function multiplied by 1000 for a computation with the same parameter
values as before but now with step sizes h = 0.0625, 0.125, and 0.25. The last
line in each table refers to the boundary curve. The general picture is that of a
second order method but it is clear that 3-point interpolation is not really good
enough for the order ratios. On the positive side we notice that the behaviour
on the line x = K is pretty much like in the rest of the region and that the error
function is very small, and again with a ‘soft’ maximum in the neighbourhood of
x = K. That the singularity at (0, K) has no significant effect on the numbers
is explained by the fact that the time steps in the beginning are very small,
leading to small values of bµ and therefore efficient damping of high frequency
components by the Crank-Nicolson method.
We conclude this section by giving in Table 16.13 the values of the price function
and the boundary curve as calculated by Crank-Nicolson with h = 0.0625.
Table 16.14: Average number of iterations per time step, and total number (N)
of time steps.

h \ Tol 10^-3 10^-4 10^-5 10^-6 10^-7 10^-8 10^-9 10^-10 10^-11 10^-12 N
1/1 3.06 3.19 3.75 4.19 4.56 4.81 5.06 5.19 5.56 5.69 16
1/2 1.94 3.12 3.21 3.58 4.12 4.36 4.58 4.82 5.12 5.27 33
1/4 1.44 2.08 2.79 3.20 3.36 3.62 4.20 4.30 4.36 4.61 66
1/8 1.33 1.50 2.21 2.58 3.16 3.23 3.36 3.69 4.17 4.21 132
1/16 1.26 1.34 1.51 2.18 2.48 3.11 3.15 3.23 3.39 3.94 265
1/32 - 1.27 1.36 1.48 2.14 2.38 3.07 3.09 3.15 3.34 530
1/64 - 1.27 1.30 1.35 1.55 2.12 2.31 3.05 3.07 - 1061
1/128 - 1.30 1.31 1.32 1.36 1.57 2.10 2.24 2.88 - 2122
16.9 Efficiency of the methods

Comparing Brennan-Schwartz to Crank-Nicolson with variable time steps we note
that the latter is a second order method with error estimates that can be trusted,
also in the interesting region where x ≤ K. The price we have to pay to achieve
this is very small time steps in the beginning and several iterations per time
step. To take the last point first we supply in Table 16.14 the average number
of iterations per time step as a function of h and the tolerance as well as the
total number (N) of time steps for a given value of h. As expected the number
of iterations increase with decreasing values of the tolerance but not very much.
On the other hand the number of iterations decrease with decreasing values of h.
Typically we can expect 3-4 iterations per time step for small h. For comparison
we should remember that Brennan-Schwartz computes all the way down to 0
which amounts to 1.7-2 times the number of valid grid points.
When discussing the time step variations it is essential to point out that we
use very small time steps for t close to 0 (time close to expiry) where most of
the ‘action’ is, as measured by large values of the time derivatives of u near
x = K. One property to strive for is a constant value of kn · ut (tn , K) as tn
n n−1
increases. In Table 16.15 we have given values of vm − vm which can be taken as
approximations of kn ·ut (tn , mh) for values of x = mh around K and for all the 16
time steps we take when h = 1. It is seen that for constant x, kn ·ut (tn , x) displays
Table 16.15: Values of kn · ut near x = K.
n\x 93 94 95 96 97 98 99 100 101 102 103 104
1 0.26 0.03 0.00 0.00 0.00
2 0.16 0.36 0.22 0.08 0.03 0.01
3 0.10 0.24 0.21 0.21 0.14 0.07 0.03
4 0.07 0.17 0.18 0.22 0.19 0.16 0.11 0.07
5 0.05 0.13 0.16 0.22 0.22 0.21 0.18 0.14 0.10
6 0.04 0.11 0.15 0.22 0.24 0.26 0.25 0.23 0.19 0.15
7 0.03 0.10 0.15 0.22 0.25 0.29 0.30 0.30 0.28 0.25 0.22
8 0.03 0.10 0.15 0.22 0.26 0.31 0.33 0.35 0.35 0.34 0.31 0.28
9 0.09 0.15 0.21 0.26 0.31 0.34 0.37 0.38 0.39 0.38 0.36 0.34
10 0.15 0.21 0.26 0.31 0.34 0.38 0.40 0.42 0.42 0.42 0.41 0.39
11 0.21 0.26 0.31 0.35 0.39 0.41 0.43 0.44 0.45 0.45 0.45 0.44
12 0.26 0.31 0.35 0.39 0.42 0.45 0.46 0.48 0.49 0.49 0.49 0.49
13 0.31 0.35 0.40 0.43 0.46 0.48 0.50 0.51 0.53 0.53 0.53 0.53
14 0.36 0.40 0.43 0.47 0.49 0.52 0.54 0.56 0.57 0.58 0.58 0.58
15 0.41 0.44 0.48 0.51 0.54 0.56 0.58 0.60 0.61 0.62 0.63 0.64
16 0.46 0.49 0.52 0.56 0.58 0.61 0.63 0.65 0.66 0.68 0.69 0.70
a slow increase, possibly after some initial fluctuations. With this behaviour in
mind we can of course modify Brennan-Schwartz and incorporate varying time
steps, possibly making this method more competitive.
Appendix A
The One-Way Wave Equation

In this appendix we shall analyze various difference schemes for the one-way wave
equation

ut + aux = 0 (A.1)

using the tools developed in Chapters 2 and 3.
To ensure uniqueness we must impose an initial condition at t = 0, and a bound-
ary condition at one end of the interval in question. Which end depends on the
sign of the coefficient a. If a is positive, such that the movement is from left
to right, the boundary information should be supplied at the left end. If a is
negative the boundary condition should be given at the right end. Some of the
numerical schemes below come in pairs, such as A.1 and A.2, or A.6 and A.7,
where one can be used for negative a, the other for positive a.
In order to check the accuracy (and other properties) of the difference schemes
we note that the symbol of the differential operator in (A.1) is
p(s, ξ) = s + iaξ. (A.2)
A.1 Forward-time forward-space
As suggested by the name this method can be written

(∆t + a∆)v = ν

or

    (v^{n+1}_m − v^n_m)/k + a(v^n_{m+1} − v^n_m)/h = ν^n_m    (A.3)

or

    v^{n+1}_m = v^n_m − aλ(v^n_{m+1} − v^n_m) + kν^n_m    (A.4)
using the step ratio λ = k/h.
The growth factor is
    g(ϕ) = 1 − aλ(e^{iϕ} − 1)    (A.5)
         = 1 + aλ(1 − cos ϕ) − iaλ sin ϕ.

The condition for stability is −1 ≤ aλ ≤ 0, so this method should only be used
with negative a.
The symbol (cf. section 3.3) for the difference operator is
    p_{k,h}(s, ξ) = (e^{sk} − 1)/k + a(e^{iξh} − 1)/h
                  = s + (1/2)s²k + iaξ − (1/2)aξ²h + · · ·    (A.6)
showing that this method is first order accurate in both k and h.

A.2 Forward-time backward-space
This method can be written

(∆t + a∇)v = ν

or

    (v^n_m{}^{+1} − v^n_m)/k + a(v^n_m − v^n_{m−1})/h = ν^n_m    (A.7)

or

    v^{n+1}_m = v^n_m − aλ(v^n_m − v^n_{m−1}) + kν^n_m.    (A.8)
The growth factor is

    g(ϕ) = 1 − aλ(1 − e^{−iϕ})    (A.9)
         = 1 − aλ(1 − cos ϕ) − iaλ sin ϕ.
The condition for stability is 0 ≤ aλ ≤ 1 and this condition also guarantees a
maximum principle.
The symbol for the difference operator is
    p_{k,h}(s, ξ) = (e^{sk} − 1)/k + a(1 − e^{−iξh})/h
                  = s + (1/2)s²k + iaξ + (1/2)aξ²h + · · ·    (A.10)
showing that this method is first order accurate in both k and h.
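As an illustration, one time step of (A.8) with ν = 0 can be coded in a single
vectorised line; the function name and the boundary handling below are our own.

    import numpy as np

    def ftbs_step(v, a, lam, left_bc):
        """One step of forward-time backward-space (A.8) with nu = 0.

        Assumes a > 0; stable (and satisfying a maximum principle)
        for 0 <= a*lam <= 1.
        """
        v_new = np.empty_like(v)
        v_new[1:] = v[1:] - a * lam * (v[1:] - v[:-1])
        v_new[0] = left_bc            # boundary condition at the left end
        return v_new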
Better accuracy is obtained using

A.3 Forward-time central-space

This method can be written
(∆t + aµ̃δ)v = ν

or

    (v^{n+1}_m − v^n_m)/k + a(v^n_{m+1} − v^n_{m−1})/(2h) = ν^n_m    (A.11)

or

    v^{n+1}_m = v^n_m − (1/2)aλ(v^n_{m+1} − v^n_{m−1}) + kν^n_m.    (A.12)
The growth factor is

    g(ϕ) = 1 − (1/2)aλ(e^{iϕ} − e^{−iϕ})
         = 1 − iaλ sin ϕ.    (A.13)
The method is 0-stable if k = O(h²) but it is never absolutely stable. We shall
therefore disregard it in favour of

A.4 Central-time central-space or leap-frog

This method can be written
(µ̃t δt + aµ̃δ)v = ν

or

    (v^{n+1}_m − v^{n−1}_m)/(2k) + a(v^n_{m+1} − v^n_{m−1})/(2h) = ν^n_m    (A.14)

or

    v^{n+1}_m = v^{n−1}_m − aλ(v^n_{m+1} − v^n_{m−1}) + 2kν^n_m.    (A.15)
For the growth factor we have
    g² + 2iaλ sin ϕ · g − 1 = 0    (A.16)
and the condition for stability is |aλ| ≤ 1.
The symbol for the difference operator is
    p_{k,h}(s, ξ) = (e^{sk} − e^{−sk})/(2k) + a(e^{iξh} − e^{−iξh})/(2h)
                  = s + iaξ + (1/6)s³k² − (1/6)iaξ³h² + · · ·    (A.17)
showing that this method is second order accurate in both k and h. The main
drawback is that it is a two-step method. The following modification results in a
similar one-step method:

A.5 Lax-Friedrichs

This method can be written
    (v^{n+1}_m − (1/2)(v^n_{m+1} + v^n_{m−1}))/k + a(v^n_{m+1} − v^n_{m−1})/(2h) = ν^n_m    (A.18)

or

    v^{n+1}_m = (1/2)(v^n_{m+1} + v^n_{m−1}) − (1/2)aλ(v^n_{m+1} − v^n_{m−1}) + kν^n_m.    (A.19)
For the growth factor we have
g(ϕ) = cos ϕ − iaλ sin ϕ (A.20)
and the condition for stability is again |aλ| ≤ 1 and this condition also guarantees
a maximum principle.
The symbol for the difference operator is
    p_{k,h}(s, ξ) = (e^{sk} − cos ξh)/k + a(e^{iξh} − e^{−iξh})/(2h)
                  = s + iaξ + (1/2)s²k − (1/6)iaξ³h² + (1/2)ξ²h²/k + · · ·    (A.21)
If µ = k/h2 is constant then Lax-Friedrichs is not consistent with (A.1), but if
λ = k/h is constant which is more customary for this problem and in line with
the stability condition above, then the method is first order accurate.
A.6 Backward-time forward-space

This method can be written
    (v^n_m − v^{n−1}_m)/k + a(v^n_{m+1} − v^n_m)/h = ν^n_m    (A.22)

or

    v^n_m (1 − aλ) = v^{n−1}_m − aλ v^n_{m+1} + kν^n_m.    (A.23)

The growth factor is

    g(ϕ) = 1/(1 − aλ(1 − e^{iϕ}))    (A.24)

showing that this method is stable for aλ ≤ 0 and thus can be used when a is
negative.
The method is implicit but since a is negative we have a boundary condition on
the right boundary and we can arrange the calculations in an explicit manner as
shown in (A.23), since we always know the value to the right at the advanced
time step (which is here step n).
Remark. We also have |g(ϕ)| ≤ 1 when aλ ≥ 1 but this is less useful. ✷
The symbol for the difference operator is
    p_{k,h}(s, ξ) = (1 − e^{−sk})/k + a(e^{iξh} − 1)/h
                  = s − (1/2)s²k + iaξ − (1/2)aξ²h + · · ·    (A.25)
showing that this method is first order accurate in both k and h.

A.7 Backward-time backward-space

This method can be written
    (v^n_m − v^{n−1}_m)/k + a(v^n_m − v^n_{m−1})/h = ν^n_m    (A.26)

or

    v^n_m (1 + aλ) = v^{n−1}_m + aλ v^n_{m−1} + kν^n_m.    (A.27)
The growth factor is

    g(ϕ) = 1/(1 + aλ(1 − e^{−iϕ}))    (A.28)
showing that this method is stable for aλ > 0 and this condition also guarantees
a maximum principle.
This method is also implicit but again we can arrange the calculations in an
explicit manner (as shown) since we always know the value to the left at the
advanced time step.
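A sketch of this sweep (our own naming, with ν = 0):

    def btbs_step(v_old, left_bc, a, lam):
        """One step of backward-time backward-space via (A.27), nu = 0.

        Each equation couples v^n_m only to the already computed v^n_{m-1},
        so the implicit system is solved by a left-to-right sweep.
        """
        v_new = [left_bc]             # boundary value at the new time level
        for m in range(1, len(v_old)):
            v_new.append((v_old[m] + a * lam * v_new[m - 1]) / (1.0 + a * lam))
        return v_new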
The symbol for the difference operator is
    p_{k,h}(s, ξ) = (1 − e^{−sk})/k + a(1 − e^{−iξh})/h
                  = s − (1/2)s²k + iaξ + (1/2)aξ²h + · · ·    (A.29)
showing that this method is first order accurate in both k and h.

A.8 Backward-time central-space

This method can be written
    (v^n_m − v^{n−1}_m)/k + a(v^n_{m+1} − v^n_{m−1})/(2h) = ν^n_m.    (A.30)
This method is truly implicit and we must solve the tridiagonal system of equa-
tions
    −(1/2)aλ v^n_{m−1} + v^n_m + (1/2)aλ v^n_{m+1} = v^{n−1}_m + kν^n_m.    (A.31)
This may present some problems since we usually only have one boundary con-
dition in the one-way wave equation.
The growth factor is

    g(ϕ) = 1/(1 + iaλ sin ϕ)    (A.32)
showing that this method is unconditionally stable.
The symbol for the difference operator is
    p_{k,h}(s, ξ) = s − (1/2)s²k + iaξ − (1/6)iaξ³h² + · · ·    (A.33)
showing that this method is first order accurate in k and second order accurate
in h.
A.9 Lax-Wendroff

This method can be viewed as a modification of forward-time central-space
    (v^{n+1}_m − v^n_m)/k + a(v^n_{m+1} − v^n_{m−1})/(2h) − (a²k/2)(v^n_{m+1} − 2v^n_m + v^n_{m−1})/h²    (A.34)
        = (1/2)(ν^{n+1}_m + ν^n_m) − (ak/4h)(ν^n_{m+1} − ν^n_{m−1})

or

    v^{n+1}_m = v^n_m − (1/2)aλ(v^n_{m+1} − v^n_{m−1}) + (1/2)a²λ²(v^n_{m+1} − 2v^n_m + v^n_{m−1})    (A.35)
        + (k/2)(ν^{n+1}_m + ν^n_m) − (ak²/4h)(ν^n_{m+1} − ν^n_{m−1}).
The growth factor is

    g(ϕ) = 1 − iaλ sin ϕ − 2a²λ² sin²(ϕ/2)    (A.36)
and the method is therefore stable if |aλ| ≤ 1, but it does not satisfy a maximum
principle.
The symbol for the difference operator is
    p_{k,h}(s, ξ) = (e^{sk} − 1)/k + ia (sin ξh)/h + 2a²(k/h²) sin²(ξh/2)
                  = s + iaξ + (1/2)s²k + (1/6)s³k² − (1/6)iaξ³h² + (1/2)a²ξ²k + · · ·    (A.37)
and for the right-hand-side operator
    r_{k,h}(s, ξ) = (1/2)(e^{sk} + 1) − (1/2)iak (sin ξh)/h
                  = 1 + (1/2)sk + (1/4)s²k² − (1/2)iaξk + · · ·    (A.38)
2 4 2
such that
1 3 2 1 1
pk,h − rk,h p = − s k − iaξs2 k 2 − iaξ 3 h2 + · · · (A.39)
12 4 6
showing that this method is second order accurate in both k and h.

A.10 Crank-Nicolson

This method can be written
    (v^{n+1}_m − v^n_m)/k + a(v^{n+1}_{m+1} − v^{n+1}_{m−1} + v^n_{m+1} − v^n_{m−1})/(4h) = (ν^{n+1}_m + ν^n_m)/2.    (A.40)
Crank-Nicolson is also implicit and the tridiagonal system of equations now looks
like
    −(1/4)aλ v^{n+1}_{m−1} + v^{n+1}_m + (1/4)aλ v^{n+1}_{m+1}    (A.41)
        = (1/4)aλ v^n_{m−1} + v^n_m − (1/4)aλ v^n_{m+1} + k(ν^{n+1}_m + ν^n_m)/2.
The growth factor is
    g(ϕ) = (1 − (1/2)iaλ sin ϕ)/(1 + (1/2)iaλ sin ϕ)    (A.42)
showing that the method is unconditionally stable.
The symbol for the difference operator is
    p_{k,h}(s, ξ) = (e^{sk} − 1)/k + ia ((e^{sk} + 1)/2) (sin ξh)/h    (A.43)
and for the right-hand-side operator
    r_{k,h}(s, ξ) = (1/2)(e^{sk} + 1)    (A.44)
such that
    p_{k,h} − r_{k,h} p = −(1/12)s³k² − (1/6)iaξ³h² + · · ·    (A.45)
showing that this method is second order accurate in both k and h.

A.11 Overview of the methods
Method                               Order  Stability      Comments
 1  forward-time forward-space       1,1    −1 ≤ aλ ≤ 0    a < 0
 2  forward-time backward-space      1,1    0 ≤ aλ ≤ 1     a > 0
 3  forward-time central-space       1,2    −
 4  leap-frog                        2,2    −1 ≤ aλ ≤ 1    2-step
 5  Lax-Friedrichs                   1,1    −1 ≤ aλ ≤ 1
 6  backward-time forward-space      1,1    aλ ≤ 0         a < 0
 7  backward-time backward-space     1,1    aλ ≥ 0         a > 0
 8  backward-time central-space      1,2    +              extra bdry. cond.
 9  Lax-Wendroff                     2,2    −1 ≤ aλ ≤ 1    no max. principle
10  Crank-Nicolson                   2,2    +              extra bdry. cond.

In the Stability column ‘+’ means unconditionally stable and ‘−’ never absolutely
stable.
Method A.3 is never stable and is disregarded in the following. Method A.4 is a
two-step method and methods A.8 and A.10 require an extra boundary condition
in order to solve the system of linear equations. Of the remaining methods, only
A.9 is second order, but it does not satisfy a maximum principle. If we use a first
order method together with Richardson extrapolation we can often get second
order results anyway.
Remark. A stability condition of the form |aλ| ≤ 1 requires k ≤ h/|a| but this
is not unreasonable for a method which has the same order in t and x, where we
for reasons of accuracy probably should choose k proportional to h anyway. ✷
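The stability conditions above are easy to check numerically by sampling |g(ϕ)|.
A small sketch (our own; the growth factors are copied from (A.9), (A.20), and
(A.36)):

    import numpy as np

    phi = np.linspace(0.0, 2.0 * np.pi, 721)

    def max_abs_g(g, al):
        return np.max(np.abs(g(phi, al)))

    g_ftbs = lambda p, al: 1 - al * (1 - np.exp(-1j * p))                        # (A.9)
    g_lf = lambda p, al: np.cos(p) - 1j * al * np.sin(p)                         # (A.20)
    g_lw = lambda p, al: 1 - 1j * al * np.sin(p) - 2 * al**2 * np.sin(p / 2)**2  # (A.36)

    for al in (0.5, 1.0, 1.1):
        print(al, max_abs_g(g_ftbs, al), max_abs_g(g_lf, al), max_abs_g(g_lw, al))

For aλ = 0.5 and 1.0 the maximum of |g| is 1 for all three methods, while for
aλ = 1.1 it exceeds 1, in agreement with the table.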
It should be mentioned here that if a is constant and if we choose k = h/a, i.e.
aλ = 1, then methods A.2, A.4, A.5, A.6, and A.9 are all exact in the sense that
they all reduce to v^{n+1}_m = v^n_{m−1} and thus reproduce the transport with no error. In
this special case the methods do not exhibit their more general behaviour which
usually includes a certain amount of dissipation.
We have compared the various methods on the two test examples from section
5.7, now with b = 0 and a = 9. It is again easy to show that all methods consid-
ered will conserve the area. The true solution is just a translation of the initial
function with velocity a. The numerical solution will exhibit various degrees of
dissipation, which means that certain components will be damped (|g(ϕ)| < 1),
and dispersion, which means that some components will travel with a velocity
different from a. As a result sharp corners will be rounded, the maximum will be
lowered, and the half-width will be wider.
Since a > 0 we do not test methods A.1 and A.6. Method A.3 is disregarded
because of instability and method A.4 because it is a 2-step method.
For the remaining six methods we have checked the height of the maximum, the
position of the maximum (using 3-point interpolation around the maximum v-
value) and the half-width of the bump. We have used h = 0.02 together with
k = 0.002, and h = 0.1 together with k = 0.01 and k = 0.05. The two former
combinations give aλ = 0.9, the latter aλ = 0.45. The general conclusion from
this limited test is that method A.7 always shows a large amount of dissipa-
tion/dispersion (low max, large half-width), closely followed by A.8. Methods
A.5 and A.2 perform reasonably well for aλ = 0.9 but not for the smaller value.
Methods A.9 and A.10 always perform well except that A.10 sometimes has prob-
lems with the right speed. The tests also show that methods A.8, A.9, and A.10
produce ‘waves’ with occasional negative function values upstream.

Appendix B

A Class of Test Problems

When testing algorithms it is useful to have a number of test problems for
which we know the true solution. Two such sets are introduced here for the
two-dimensional equation

ut = b1 uxx + 2b12 uxy + b2 uyy . (B.1)

This equation has solutions of the form

u(t, x, y) = e^{αt} sin(βx − γy) cosh(δx + ǫy)    (B.2)

provided the constants α, β, γ, δ, and ǫ satisfy some simple relations. Differen-
tiating (B.2) we get

    uxx = (δ² − β²)u + 2βδ e^{αt} cos(βx − γy) sinh(δx + ǫy),    (B.3)
    uyy = (ǫ² − γ²)u − 2γǫ e^{αt} cos(βx − γy) sinh(δx + ǫy),    (B.4)
    uxy = (βγ + δǫ)u + (βǫ − δγ) e^{αt} cos(βx − γy) sinh(δx + ǫy).    (B.5)

If we choose β, γ, δ, and ǫ such that

b1 βδ − b2 γǫ + b12 (βǫ − δγ) = 0 (B.6)

then the last terms on the right-hand-side of (B.3) - (B.5) will cancel, and if we
furthermore define

α = b1(δ² − β²) + b2(ǫ² − γ²) + 2b12(βγ + δǫ)    (B.7)

then (B.2) is a solution to (B.1).
We should avoid combinations with βγ + δǫ = 0 and βǫ − δγ = 0 because this
leads to uxy = 0 and we shall not see the effect of the mixed term.
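As an illustration, the following sketch constructs one member of this family
(the numerical values of b1, b2, b12, β, γ, and δ are our own choices) and checks
(B.1) by difference quotients:

    import numpy as np

    b1, b2, b12 = 1.0, 2.0, 0.5          # coefficients in (B.1); our choice
    beta, gamma, delta = 1.0, 1.5, 0.7   # free constants; our choice
    # solve the linear constraint (B.6) for epsilon:
    eps = delta * (b12 * gamma - b1 * beta) / (b12 * beta - b2 * gamma)
    alpha = (b1 * (delta**2 - beta**2) + b2 * (eps**2 - gamma**2)
             + 2 * b12 * (beta * gamma + delta * eps))          # (B.7)

    def u(t, x, y):
        """Exact solution (B.2) of ut = b1 uxx + 2 b12 uxy + b2 uyy."""
        return (np.exp(alpha * t) * np.sin(beta * x - gamma * y)
                * np.cosh(delta * x + eps * y))

    # sanity check: compare ut with the right-hand side of (B.1)
    t0, x0, y0, d = 0.3, 0.4, 0.6, 1e-4
    ut = (u(t0 + d, x0, y0) - u(t0 - d, x0, y0)) / (2 * d)
    uxx = (u(t0, x0 + d, y0) - 2 * u(t0, x0, y0) + u(t0, x0 - d, y0)) / d**2
    uyy = (u(t0, x0, y0 + d) - 2 * u(t0, x0, y0) + u(t0, x0, y0 - d)) / d**2
    uxy = (u(t0, x0 + d, y0 + d) - u(t0, x0 + d, y0 - d)
           - u(t0, x0 - d, y0 + d) + u(t0, x0 - d, y0 - d)) / (4 * d**2)
    print(ut - (b1 * uxx + 2 * b12 * uxy + b2 * uyy))  # small; only the
                                                       # difference-quotient error

With these values both βγ + δǫ and βǫ − δγ are non-zero, so the mixed term is
active.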
If we try solutions of the form

u(t, x, y) = e^{αt} sin(βx − γy) cos(δx + ǫy)    (B.8)

then we get

    uxx = −(δ² + β²)u − 2βδ e^{αt} cos(βx − γy) sin(δx + ǫy),    (B.9)
    uyy = −(ǫ² + γ²)u + 2γǫ e^{αt} cos(βx − γy) sin(δx + ǫy),    (B.10)
    uxy = (βγ − δǫ)u − (βǫ − δγ) e^{αt} cos(βx − γy) sin(δx + ǫy).    (B.11)

If (B.6) is satisfied then (B.8) is a solution to (B.1) when

α = −b1(δ² + β²) − b2(ǫ² + γ²) + 2b12(βγ − δǫ).    (B.12)

Here we should avoid βγ − δǫ = 0 and βǫ − δγ = 0 if we want to see the effect of
the mixed term.

Appendix C

Interpolation and the Order Ratio
C.1 Introduction

When we use variable step sizes in the t-direction as we do in the moving boundary
problems of Chapters 15 and 16 it becomes necessary to interpolate to get values
of the computed solution function at specified time values. When choosing an
interpolation formula we should have in mind that the interpolation error does
not interfere too much with the order ratios and the error estimation. If we use
the implicit method we expect to have first order results and a linear interpolation
should suffice, but as we shall see in the following section this might not be quite
enough.

C.2 Linear interpolation

If we want a function value v(t) at a specific value t, and tn < t < tn+1 then we
shall interpolate between tn and tn+1 and the result is a function value
    w(t) = ((t_{n+1} − t)/(t_{n+1} − t_n)) v(t_n) + ((t − t_n)/(t_{n+1} − t_n)) v(t_{n+1})
         = ((t_{n+1} − t)/(t_{n+1} − t_n)) v(t_n)
           + ((t − t_n)/(t_{n+1} − t_n)) (v(t_n) + (t_{n+1} − t_n)v′ + (1/2)(t_{n+1} − t_n)²v′′ + · · ·)
         = v(t_n) + (t − t_n)v′ + (1/2)(t_{n+1} − t_n)(t − t_n)v′′ + · · ·
         = v(t) + (1/2)(t − t_n)(t_{n+1} − t)v′′ + · · ·
         = v(t) + (1/2)α(1 − α)k_{n+1}² v′′ + · · ·    (C.1)

with

    k_{n+1} = t_{n+1} − t_n    and    α = (t − t_n)/(t_{n+1} − t_n).    (C.2)

Since kn = κh + · · · we see that the difference between w(t) and v(t) is of second
order, and linear interpolation is accurate enough because the interpolation error
is of higher order than the truncation error.

Table C.1: Order ratios for y(t) and u(t, x) using linear interpolation.

y(t) u(t, x)
t\x 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7
0.1 2.174 2.287
0.2 2.131 2.193 2.192
0.3 2.048 2.096 2.077 2.062
0.4 1.862 1.917 1.870 1.825 1.787
0.5 1.991 2.032 2.016 2.001 1.989 1.982
0.6 2.013 2.046 2.038 2.030 2.023 2.018 2.016
0.7 2.072 2.094 2.094 2.095 2.095 2.096 2.096
0.8 1.921 1.957 1.941 1.927 1.914 1.903 1.897 1.897
0.9 2.011 2.035 2.030 2.026 2.021 2.018 2.015 2.014 2.014
1.0 2.066 2.082 2.083 2.084 2.084 2.085 2.085 2.084 2.083

But there is another consideration. The truncation error and the various terms
in the error expansion (15.20) tend to vary smoothly with the independent vari-
ables. The interpolation error on the other hand depends on the position of the
interpolation point relative to the calculated tn -values and will therefore exhibit
an erratic behaviour as t varies. The effect is seen most clearly in the order ratios.
Since they are ratios of differences of nearly equal quantities they are especially
sensitive to erratic changes. In Table C.1 we show the order ratios for y(t) (the
second column) and u(t, x) (the triangular scheme) when linear interpolation has
been used in connection with numerical solution of the Stefan problem (cf. section
15.4). There is no doubt about the first order behaviour but we notice that as t
varies the ratio is sometimes bigger, sometimes smaller than 2. The variation in
x for a fixed value of t is much smoother because we use the same interpolation
coefficients for all x.
In our basic assumptions (15.20) and (15.21) about the truncation error it is
understood that the auxiliary functions depend only on t and x but not on the

step size. The interpolation error, however, depends on the step size, not only
through the k 2 -factor in (C.1) but implicitly through α which most likely takes
on different values as h is varied. We use calculations with three different values
of h in order to determine the order and it is necessary to keep the interpolation
error small relative to the relevant components of the truncation error in order
to get a reliable error determination.

Table C.2: Error estimates*1000 for y(t) and u(t, x) using linear interpolation.

y(t) u(t, x)
t\x 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7
0.1 -1.048 -1.946
0.2 -1.706 -2.917 -2.007
0.3 -2.235 -3.606 -2.770 -2.114
0.4 -2.834 -4.343 -3.565 -2.923 -2.418
0.5 -3.154 -4.662 -3.931 -3.316 -2.815 -2.425
0.6 -3.592 -5.128 -4.436 -3.839 -3.337 -2.928 -2.608
0.7 -3.827 -5.323 -4.665 -4.091 -3.600 -3.191 -2.862
0.8 -4.240 -5.734 -5.105 -4.545 -4.057 -3.640 -3.291 -3.007
0.9 -4.421 -5.861 -5.258 -4.718 -4.242 -3.830 -3.480 -3.191 -2.957
1.0 -4.632 -6.025 -5.444 -4.920 -4.454 -4.046 -3.695 -3.399 -3.154

The interpolation error will on the average increase by a factor 4 when we double
the step size, although the actual value also depends on α. Therefore the interpo-
lation error will tend to be largest when the step size is 4h (and it increases with
a higher rate than the truncation error). Since results with all three step sizes
enter in the calculation of the order ratio, this result is very sensitive. The dif-
ference between function values corresponding to h and 2h which we use as error
estimate or correction term in the extrapolation is less sensitive as demonstrated
in Table C.2.

C.3 Three-point interpolation

We now propose a more accurate interpolation formula. For a given value t we
use three consecutive time-values t_{n−1}, t_n, and t_{n+1} with t_n being the closest to t:

    |t − t_n| < min{t − t_{n−1}, t_{n+1} − t}.

Repeating the calculations of the previous section we arrive at

    w(t) = v(t_n) + (t − t_n)v′ + (1/2)(t − t_n)²v′′ + O(k_n³)v′′′ = v(t) + O(h³).
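A small sketch (our own naming) of this 3-point interpolation on a non-uniform
grid:

    import numpy as np

    def interp3(t, tgrid, v):
        """Quadratic interpolation of v at t; tgrid is increasing.

        Uses the three consecutive points t_{n-1}, t_n, t_{n+1} with t_n
        closest to t, as described above.
        """
        n = int(np.argmin(np.abs(tgrid - t)))
        n = min(max(n, 1), len(tgrid) - 2)   # keep the stencil inside the grid
        ts, vs = tgrid[n - 1 : n + 2], v[n - 1 : n + 2]
        w = 0.0
        for i in range(3):                   # Lagrange form of the parabola
            L = 1.0
            for j in range(3):
                if j != i:
                    L *= (t - ts[j]) / (ts[i] - ts[j])
            w += L * vs[i]
        return w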
Table C.3: Order ratios for y(t) and u(t, x) using 3-point interpolation.

y(t) u(t, x)
t\x 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7
0.1 2.103 2.217
0.2 2.076 2.144 2.124
0.3 2.021 2.062 2.038 2.019
0.4 2.028 2.064 2.050 2.038 2.031
0.5 2.073 2.112 2.112 2.111 2.110 2.046
0.6 2.047 2.077 2.073 2.070 2.067 2.065 2.051
0.7 2.039 2.065 2.062 2.058 2.056 2.053 2.052
0.8 2.028 2.051 2.047 2.043 2.040 2.038 2.037 2.037
0.9 2.038 2.059 2.057 2.055 2.053 2.052 2.050 2.049 2.039
1.0 2.033 2.053 2.051 2.049 2.047 2.045 2.044 2.043 2.042

Table C.4: Error estimates*1000 for y(t) and u(t, x) using 3-point interpolation.

y(t) u(t, x)
t\x 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7
0.1 -1.062 -1.973
0.2 -1.801 -3.067 -2.156
0.3 -2.410 -3.871 -3.034 -2.367
0.4 -2.878 -4.405 -3.626 -2.982 -2.473
0.5 -3.277 -4.824 -4.093 -3.473 -2.965 -2.564
0.6 -3.642 -5.191 -4.500 -3.901 -3.396 -2.983 -2.653
0.7 -3.949 -5.473 -4.815 -4.238 -3.741 -3.325 -2.986
0.8 -4.237 -5.731 -5.102 -4.543 -4.055 -3.638 -3.289 -3.005
0.9 -4.497 -5.952 -5.348 -4.806 -4.327 -3.912 -3.557 -3.262 -3.016
1.0 -4.726 -6.132 -5.550 -5.025 -4.556 -4.144 -3.788 -3.485 -3.234

The interpolation will still introduce some disturbance but smaller and in a higher
order term and therefore hardly visible. The order ratios of Table C.3 exhibit a
much smoother variation than those of Table C.1 although the erratic components
in the t-direction can be seen faintly. The error estimates given in Table C.4 agree
reasonably well with those from linear interpolation indicating that these were
quite adequate.
Bibliography

[1] A. C. Aitken, On Bernoulli’s numerical solution of algebraic equations,
    Proc. Roy. Soc. Edinburgh, 46 (1926), pp. 289–305.

[2] V. A. Barker, Extrapolation, Hæfte 31, Numerisk Institut, DtH, Lyngby, 1974.

[3] M. J. Brennan and E. S. Schwartz, A continuous time approach to the pricing
    of bonds, Journal of Banking and Finance, 3 (1979), pp. 133–155.

[4] M. J. Brennan and E. S. Schwartz, The valuation of American put options,
    Journal of Finance, 32 (1977), pp. 449–462.

[5] G. G. O’Brien, M. A. Hyman, and S. Kaplan, A Study of the Numerical Solution
    of Partial Differential Equations, J. Math. Phys., 29 (1951), pp. 223–251.

[6] D. Britz and O. Østerby, Some numerical investigations of the stability of
    electrochemical digital simulation, particularly as affected by first-order
    homogeneous reactions, J. Electroanal. Chem., 368 (1994), pp. 143–147.

[7] E. T. Copson and P. Keast, On a boundary-value problem for the equation of
    heat, J. Inst. Maths. Applics, 2 (1966), pp. 358–363.

[8] J. Crank and P. Nicolson, A practical method for numerical evaluation of
    solutions of partial differential equations of the heat-conduction type,
    Proc. Cambridge Philos. Soc., 43 (1947), pp. 50–67.
    Reprinted in Adv. Comput. Math., 6 (1996), pp. 207–226.

[9] J. Douglas and T. M. Gallie, On the numerical integration of a parabolic
    differential equation subject to a moving boundary condition,
    Duke Math. J., 22 (1955), pp. 557–571.

[10] J. Douglas and H. H. Rachford, On the numerical solution of heat conduction
     problems in two and three space variables,
     Trans. Amer. Math. Soc., 82 (1956), pp. 421–439.

[11] E. C. Du Fort and S. P. Frankel, Stability Conditions in the Numerical
     Treatment of Parabolic Differential Equations,
     Math. Tables Aid Comput., 7 (1953), pp. 135–152.

[12] A. R. Gourlay and J. Ll. Morris, The extrapolation of first order methods
     for parabolic partial differential equations II,
     SIAM J. Numer. Anal., 17 (1980), pp. 641–655.

[13] P. M. Gresho and R. L. Lee, Don’t suppress the wiggles – they’re telling you
     something, Computers and Fluids, 9 (1981), pp. 223–253.

[14] Asbjørn Trolle Hansen, Martingale Methods in Contingent Claim Pricing and
     Asymmetric Financial Markets,
     Ph.D. thesis, Dept. Oper. Research, Aarhus University, 1998.

[15] Asbjørn Trolle Hansen and Ole Østerby, Accelerating the Crank-Nicolson
     method in American option pricing,
     Dept. Oper. Research, Aarhus University, 1998.

[16] D. R. Hartree and J. R. Womersley, A Method for the Numerical or Mechanical
     Solution of Certain Types of Partial Differential Equations,
     Proc. Royal Soc. London, Ser. A, 161 (1937), pp. 353–366.

[17] D. C. Joyce, Survey of extrapolation processes in numerical analysis,
     SIAM Review, 13 (1971), pp. 435–490.

[18] P. Keast and A. R. Mitchell, Finite difference solution of the third
     boundary problem in elliptic and parabolic equations,
     Numer. Math., 10 (1967), pp. 67–75.

[19] P. Laasonen, Über eine Methode zur Lösung der Wärmeleitungsgleichung,
     Acta Math., 81 (1949), pp. 309–317.

[20] J. D. Lawson and J. Ll. Morris, The extrapolation of first order methods
     for parabolic partial differential equations I,
     SIAM J. Numer. Anal., 15 (1978), pp. 1212–1224.

[21] P. D. Lax and R. D. Richtmyer, Survey of the Stability of Linear Finite
     Difference Equations, Comm. Pure Appl. Math., 9 (1956), pp. 267–293.

[22] P. D. Lax and B. Wendroff, Systems of conservation laws,
     Comm. Pure Appl. Math., 13 (1960), pp. 217–237.

[23] B. Lindberg, On smoothing and extrapolation for the trapezoidal rule,
     BIT, 11 (1971), pp. 29–52.

[24] F. A. Longstaff and E. S. Schwartz, Interest rate volatility and the term
     structure: A two-factor general equilibrium model,
     J. Finance, 47 (1992), pp. 1259–1282.
     See also Journal of Fixed Income (1993), pp. 7–14.

[25] A. R. Mitchell and D. F. Griffiths, The Finite Difference Method in Partial
     Differential Equations, John Wiley, Chichester, 1980.

[26] P. L. J. van Moerbeke, On optimal stopping and free boundary problems,
     Arch. Rational Mech. Anal., 60 (1976), pp. 101–148.

[27] D. W. Peaceman and H. H. Rachford, The numerical solution of parabolic and
     elliptic differential equations, J. SIAM, 3 (1955), pp. 28–41.

[28] C. E. Pearson, Impulsive end condition for diffusion equation,
     Math. Comp., 19 (1965), pp. 570–576.

[29] Lewis F. Richardson, The Approximate Numerical Solution by Finite
     Differences of Physical Problems Involving Differential Equations,
     Phil. Trans. Roy. Soc. London, Series A, 210 (1910), pp. 307–357.

[30] Lewis F. Richardson and J. Arthur Gaunt, The Deferred Approach to the
     Limit I–II, Phil. Trans. Roy. Soc. London, Series A, 226 (1927), pp. 299–361.

[31] Robert D. Richtmyer, Difference Methods for Initial-Value Problems,
     Interscience, New York, 1957.

[32] Werner Romberg, Vereinfachte Numerische Integration,
     Norske Vid. Selsk. Forh., Trondheim, 28 (1955), pp. 30–36.

[33] J. Stefan, Über die Theorie der Eisbildung, insbesondere über die Eisbildung
     im Polarmeere, Akad. Wiss. Wien, Mat. Nat. Classe, Sitzungsberichte, 98
     (1889), pp. 965–983.

[34] G. W. Stewart, Introduction to Matrix Computations,
     Academic Press, New York, 1973.

[35] J. C. Strikwerda, Finite Difference Schemes and Partial Differential
     Equations, Wadsworth and Brooks/Cole, Pacific Grove, CA, 1989.

[36] Øystein Tødenes, On the numerical solution of the diffusion equation,
     Math. Comp., 24 (1970), pp. 621–627.

[37] J. H. Wilkinson, Rounding Errors in Algebraic Processes,
     HMSO, London, 1963.

[38] W. L. Wood and R. W. Lewis, A comparison of time marching schemes for the
     transient heat conduction equation,
     Int. J. Num. Meth. Engrg., 9 (1975), pp. 679–689.

[39] Y. G. D’Yakonov, On the application of disintegrating difference operators,
     USSR Comp. Math., 3 (1963), pp. 511–515.
     See also vol. 2 (1962), pp. 55–77 and pp. 581–607.

[40] O. Østerby, Stability of finite difference formulas for linear parabolic
     equations, 2nd Int. Coll. on Numerical Analysis, Plovdiv, D. Bainov and
     V. Covachev (eds.), VSP Press, Utrecht (1994), pp. 165–176.

[41] O. Østerby, Five ways of reducing the Crank-Nicolson oscillations,
     BIT, 43 (2003), pp. 811–822.