
Numerical Programming 2

(MA 3306)
Summer Term 2020

Rainer Callies
Department of Mathematics M2
Technical University of Munich

Please email error messages and proposed corrections to

[email protected]

These notes follow Prof. Callies’s ”Numerical Programming 2” course in the Summer term 2020. His course is constructed using many different resources like books, other professors’ lecture material or codes, none of which are cited here as these are my personal notes.

Contents

1 Repetition 3
1.1 Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Fixed-Point Iterations in Banach Spaces . . . . . . . . . . . . . 4
1.3 Error Propagation – Basics . . . . . . . . . . . . . . . . . . . . . 7

2 Iterative Solution of Linear Systems 10


2.1 Linear Iterative Methods – Stationary Methods . . . . . . . . . . 11
2.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.2 Classical linear iterative methods . . . . . . . . . . . . . . 12
2.2 Methods Based on Minimization – Krylov Subspace Methods . . 23
2.2.1 Fundamental Idea . . . . . . . . . . . . . . . . . . . . . . 23
2.2.2 Simplest realization: Gradient method . . . . . . . . . . . 23
2.2.3 Scalar Product . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.4 CG Method (Conjugate Gradient Method) . . . . . . . . . 26

3 Numerical Solution of Ordinary Differential Equations 30


3.1 Basic Definitions and Transformations . . . . . . . . . . . . . . . 30
3.2 Summary of Important Theorems . . . . . . . . . . . . . . . . . . 31
3.3 Numerical Methods: Basic Idea and Notation . . . . . . . . . . . 34
3.4 Consistency and Convergence of One-Step Methods . . . . . . . 35
3.5 Construction of One-Step Methods . . . . . . . . . . . . . . . . . 37
3.5.1 Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.5.2 Explicit Runge-Kutta Methods . . . . . . . . . . . . . . . . 39
3.6 Stepsize Control for One-Step Methods . . . . . . . . . . . . . . 43
3.6.1 Basic problem and Solution Strategy . . . . . . . . . . . . 43
3.6.2 One Method, Two Different Stepsizes . . . . . . . . . . . 44
3.6.3 Two Different Methods, One Stepsize . . . . . . . . . . . 45
3.7 Relation Between Convergence and Consistency . . . . . . . . . 47
3.8 Stiff ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4 Finite Differences 54
4.1 One-Dimensional Model Problem . . . . . . . . . . . . . . . . . . 54
4.1.1 Model problem . . . . . . . . . . . . . . . . . . . . . . . . 54
4.1.2 Numerical Approximation by Finite Differences . . . . . . 56
4.1.3 Convergence of the Finite Difference Method . . . . . . . 57
4.2 Quasilinear PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.3 Poisson Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.3.1 Derivation of the Poisson Equation . . . . . . . . . . . . . 62
4.3.2 Poisson Equation and Properties of its Solution . . . . . . 64
4.3.3 Grid, Difference Operators and Boundary Conditions . . . 65
4.3.4 Formulation of the Sparse Linear System . . . . . . . . . 70
4.3.5 Analysis of the Finite Difference Discretization . . . . . . 71
4.4 1D Linear Advection Equation . . . . . . . . . . . . . . . . . . . . 75
4.4.1 Formulation of the PDE Problem . . . . . . . . . . . . . . 75
4.4.2 Explicit Schemes . . . . . . . . . . . . . . . . . . . . . . . 77
4.4.3 Repetition: Discrete Fourier Transform (DFT) . . . . . . . 80
4.4.4 Von Neumann Stability Analysis . . . . . . . . . . . . . . 82
4.4.5 Difference Equations . . . . . . . . . . . . . . . . . . . . . 87
4.4.6 Von Neumann Stability Analysis Extended . . . . . . . . . . 95
4.4.7 Implicit Schemes – Crank-Nicolson Scheme . . . . . . . . 97
4.4.8 Matrix Stability Analysis . . . . . . . . . . . . . . . . . . . 99
4.5 Multigrid Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 101

1 Repetition
1.1 Norms
Definition (special vector norms)
For ⃗x ∈ C n we define
∥⃗x∥1 := |x1| + |x2| + . . . + |xn|

∥⃗x∥2 := √( Σ_{i=1}^{n} |xi|² )

∥⃗x∥∞ := max {|x1|, |x2|, . . . , |xn|}

Definition (vector norm)


A vector norm p is a mapping ∥ · ∥p : C n → IR+_0 such that

∥⃗x∥p > 0 ∀⃗x ̸= ⃗0 ∧ ∥⃗x∥p = 0 ⇔ ⃗x = ⃗0


∥a⃗x∥p = |a| · ∥⃗x∥p ∀ a ∈ C , ∀⃗x
∥⃗x + ⃗y ∥p ≤ ∥⃗x∥p + ∥⃗y ∥p ∀⃗x,⃗y

Definition (matrix norm)


A matrix norm is a mapping ∥ · ∥ : IRn×m → IR+_0 such that

∥A∥ > 0 ∀ A ̸= 0 ∧ (∥A∥ = 0 ⇔ A = 0)

∥αA∥ = |α|∥A∥ (homogeneity)
∥A + B∥ ≤ ∥A∥ + ∥B∥ (triangle inequality)

for all A, B ∈ IRn×m , α ∈ IR. Generalization to A, B ∈ C n×m , α ∈ C possible!


A matrix norm ∥ · ∥ is called sub-multiplicative, if

∥C · D∥ ≤ ∥C∥ · ∥D∥ ∀ C ∈ IRn×m , D ∈ IRm×q

A matrix norm ∥ · ∥ is called compatible or consistent with the vector norm ∥ · ∥p, if

∥A⃗x∥p ≤ ∥A∥∥⃗x∥p ∀ A ∈ IRn×m , ⃗x ∈ IRm

Definition
Let A ∈ IRn×m; then the vector norm ∥ · ∥p can be used to define the following matrix norm

∥A∥p := sup_{⃗x ̸= 0} ∥A⃗x∥p / ∥⃗x∥p = sup_{∥⃗x∥p = 1} ∥A⃗x∥p

This matrix norm is called induced matrix norm. 


Induced matrix norms are compatible. Among all compatible matrix norms, the
induced matrix norm is the smallest one

∥A⃗x∥p ≤ ∥A∥∥⃗x∥p ∀⃗x ∧ ∥A∥ ≥ ∥A∥p


 Examples: induced matrix norms

∥⃗x∥1 = |x1| + . . . + |xn|   ⇒   ∥A∥1 = max_{j=1...n} Σ_{i=1}^{m} |aij|

∥⃗x∥2 = √( Σ_{i=1}^{n} |xi|² )   ⇒   ∥A∥2 = √( λmax(A^H A) ) ,   λmax(A^H A) = max_{⃗x ̸= 0} ⃗x^H A^H A ⃗x / (⃗x^H ⃗x)

∥⃗x∥∞ = max {|x1|, . . . , |xn|}   ⇒   ∥A∥∞ = max_{i=1...m} Σ_{j=1}^{n} |aij|

∥A∥∞ is the maximum absolute row sum of the matrix and is called row sum
norm, ∥A∥1 is the maximum absolute column sum of the matrix and is called
column sum norm, ∥A∥2 is the spectral norm. 

 Example
All induced norms are sub-multiplicative, the matrix norm ∥A∥ := maxi,j |aij | is
not. The Frobenius-norm is a sub-multiplicative norm compatible with – but not
induced by – the vector norm ∥ · ∥2
∥A∥F := √( Σ_{i=1}^{n} Σ_{j=1}^{m} |aij|² )
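A quick numerical cross-check, as a sketch assuming NumPy is available: it evaluates the three induced norms from the explicit formulas above, compares them with numpy.linalg.norm, and verifies the compatibility estimate ∥A⃗x∥p ≤ ∥A∥p ∥⃗x∥p for a small test matrix (the matrix and vector are arbitrary illustration data).

```python
import numpy as np

A = np.array([[1.0, -2.0, 3.0],
              [0.5,  4.0, -1.0]])
x = np.array([1.0, -1.0, 2.0])

# Induced norms from the explicit formulas above
norm1   = np.abs(A).sum(axis=0).max()        # maximum absolute column sum
norminf = np.abs(A).sum(axis=1).max()        # maximum absolute row sum
norm2   = np.sqrt(np.linalg.eigvalsh(A.conj().T @ A).max())  # spectral norm

# Compare with NumPy's built-in induced norms
assert np.isclose(norm1,   np.linalg.norm(A, 1))
assert np.isclose(norminf, np.linalg.norm(A, np.inf))
assert np.isclose(norm2,   np.linalg.norm(A, 2))

# Compatibility: ||A x||_p <= ||A||_p * ||x||_p
for p in (1, 2, np.inf):
    assert np.linalg.norm(A @ x, p) <= np.linalg.norm(A, p) * np.linalg.norm(x, p) + 1e-12
```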

1.2 Fixed-Point Iterations in Banach Spaces


When we talked about the solution of nonlinear equations in the last semester,
we mainly focused on the scalar case: For f ∈ C p ([a, b], IR) with p ∈ IN we
wanted to calculate x∗ such that

f (x∗ ) = 0 .

Mostly it is not possible to calculate an analytical solution. Therefore, we dis-


cussed efficient iterative methods, in each of which a sequence {xk } of ap-
proximations of x∗ is generated such that

lim xk = x∗ .
k→∞

The convergence properties of some of those methods have been analyzed in


the framework of fixed point methods. For that we restricted our analysis to
one-step methods xk+1 = ϕ(xk ) with f, ϕ ∈ C p (IR, IR), p sufficiently large. Then

lim xk = x∗ ⇒ ϕ(x∗ ) = x∗ .
k→∞


 Example
What is the meaning of a fixed-point in Newton’s method?
xk+1 = ϕ(xk) = xk − f(xk)/f′(xk)   ⟹ (setting xk = xk+1 = x∗)   0 = − f(x∗)/f′(x∗)   ⇒   f(x∗) = 0

Most of the statements derived for fixed-point iterations in the scalar case (e.g.
Newton’s method applied to the one-dimensional case) can be generalized to
the n-dimensional case (and more general also to Banach spaces) with only
minor changes: Instead of the absolute value the vector norm is used.
By this, we can reuse the knowledge obtained here for the iterative solution of
large linear systems.

Definition (Banach space = complete and normed vector space)


A Banach space B is a vector space with a metric (norm) that allows the com-
putation of vector length and distance between vectors and it is complete in the
sense that a Cauchy sequence of vectors always converges to a well defined
limit vector that is within the space. 

 Examples of Banach spaces


(IRn , ∥ · ∥1 ) , (IRn , ∥ · ∥2 ) , (IRn , ∥ · ∥∞ )

Remark
In IRn all norms are equivalent, i.e. let denote ∥ · ∥a , ∥ · ∥b two different norms
(e.g. ∥ · ∥2 , ∥ · ∥∞ , . . .), then there exist two constants α > 0, β > 0

∋ α∥⃗x∥a ≤ ∥⃗x∥b ≤ β∥⃗x∥a ∀⃗x ∈ IRn .

Statements using one special norm are valid for all norms, only the values of
constants (e.g. Lipschitz constant L) change, if the norm changes. 

Definition (Operators)
Let be X, Y normed spaces over IR. A function T : D → Y, D ⊆ X, is called
operator. Often we abbreviate T x for T (x) even if T is not linear.

A standard iteration scheme xk+1 = ϕ(xk ) for X = IR can thus be written as


xk+1 = T xk with T := ϕ.

The operator T : D → Y is linear, if D is a linear subspace of X and

T (x + αy) = T (x) + αT (y) , ∀ x, y ∈ D, α ∈ IR

The operator T : D → Y is continuous in x0 ∈ D if, for every sequence {xn} ⊂ D with xn → x0, it follows that T xn → T x0.
ε − δ-formulation as usual: ∥x − x0∥ < δ ⇒ ∥T x − T x0∥ < ε. 


Definition (Lipschitz condition)
The operator T : D → Y is Lipschitz continuous in D with Lipschitz constant L, if
∥T x − T y∥ ≤ L∥x − y∥ ∀ x, y ∈ D

T is contractive, if L < 1 is possible.

If T is linear, then we can restrict our investigation to the case y = 0 and obtain
∥T x∥ ≤ L∥x∥ ∀ x ∈ D
In this case the smallest Lipschitz constant possible is called operator norm
∥T ∥ of T
∥T∥ := sup_{x∈D} ∥T x∥ / ∥x∥


Iteration methods in Banach spaces

Many problems in analysis – e.g. the iterative solution of nonlinear equations,


the iterative solution of large and sparse linear systems – can be written as
operator equations in a properly selected Banach space B
x = Tx with T : D ⊆ B → B

A solution x∗ ∈ B of this equation is called fixed-point of T .


To calculate x∗ often iteration methods are used
x1 = T x0 , x2 = T x1 , x3 = T x2 , . . . , xn+1 = T xn , . . .

 Important question: Do we get convergence xn → x∗ ?

Sufficient condition (cf. Banach’s fixed-point theorem below):

The iteration method converges, if the corresponding operator T is contractive.

Banach’s fixed-point theorem

 Let be B a Banach space, D ⊆ B closed;


 let the operator T : D → B be a contractive mapping with Lipschitz constant
L < 1 and
 let T map D into D: T (D) ⊆ D.

Then the equation x = T x admits a unique fixed-point x∗ ∈ D.


If the sequence {xn } is iteratively generated by xn+1 = T xn , then it converges
to x∗ . The following estimate holds
∥xn − x∗∥ ≤ ( 1/(1 − L) ) · ∥xn+1 − xn∥ ≤ ( L^n/(1 − L) ) · ∥x1 − x0∥
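A minimal sketch of such a fixed-point iteration, assuming NumPy and using T(x) = cos(x) on D = [0, 1] as a hypothetical test operator; the stopping test and the a priori bound are exactly the two estimates from the theorem.

```python
import numpy as np

# Fixed-point iteration x_{n+1} = T(x_n) for T(x) = cos(x) on D = [0, 1]:
# |T'(x)| = |sin x| <= sin(1) =: L < 1 and T(D) lies in D, so the theorem applies.
T = np.cos
L = np.sin(1.0)

x, x_next = 0.5, np.cos(0.5)
n = 1
while abs(x_next - x) / (1.0 - L) > 1e-12:   # a posteriori: ||x_n - x*|| <= ||x_{n+1} - x_n|| / (1-L)
    x, x_next = x_next, T(x_next)
    n += 1

print(f"fixed point ~ {x_next:.12f} reached after {n} applications of T")
# a priori estimate from the theorem: ||x_n - x*|| <= L**n / (1-L) * ||x_1 - x_0||
print("a priori bound:", L**n / (1.0 - L) * abs(np.cos(0.5) - 0.5))
```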


1.3 Error Propagation – Basics
The errors investigated in this context are unavoidable and problem-induced,
even if a perfect numerical algorithm is chosen. To reduce the errors one has
to reformulate the underlying mathematical problem.
Only input errors are considered, other error sources like rounding errors are
neglected here.
Input errors may be measurement errors, but also all errors (e.g. rounding
and discretization errors) made in previous iteration steps or in the solution of
preceding subproblems.

Definition of the problem


Let be (X, ∥ · ∥X ) and (Y, ∥ · ∥Y ) normed vector spaces and consider a mapping
f : X → Y (often called ”problem” or ”subproblem”).
For x, δx ∈ X we define
δf := f (x + δx) − f (x)

Important question
How large is the perturbation δf of the solution compared to the perturbation
δx of the input data?

Definition: absolute and relative condition for the norm error


The absolute condition number of f in x ∈ X is

κabs(f, x) := lim_{δ→0} sup_{∥δx∥X < δ} ( ∥δf∥Y / ∥δx∥X )

Using the norm allows an approach from different directions!

The relative condition number of f in x ∈ X is

κrel(f, x) := lim_{δ→0} sup_{∥δx∥X < δ} ( ( ∥δf∥Y / ∥f∥Y ) / ( ∥δx∥X / ∥x∥X ) )

A problem is ill-conditioned, if κabs (f, x) ≫ 1 or κrel (f, x) ≫ 1. 

Remark
If the condition number is large, a small perturbation in the input data causes
large perturbations in the final result. The condition numbers compress the
information about error amplification into one scalar. 

Remark
Introducing condition numbers can be seen as a linearization of the original
problem. This leads to an equivalent definition of the condition numbers. 


 Example

Consider f⃗ ∈ C 1 (D, IRm ), D ⊆ IRn . Multidimensional Taylor expansion with trun-


cation after the linear term yields
δf⃗ ≈ Df(⃗x) · δ⃗x ,   Df(⃗x) Jacobian

For the calculation of the condition numbers for this problem we use matrix norms and obtain

κabs(f⃗, ⃗x) = ∥Df(⃗x)∥ ,   κrel(f⃗, ⃗x) := ∥Df(⃗x)∥ / ( ∥f⃗(⃗x)∥ / ∥⃗x∥ )


 Example (perturbed linear system)
Definition of possible perturbations:
Let us consider the perturbation

A⃗x = ⃗b → (A + δA)(⃗x + δ⃗x) = ⃗b + δ⃗b , A ∈ IRn×n , ⃗x,⃗b ∈ IRn

Does a unique solution of the perturbed system exist?


We assume that the matrix A is non-singular and that the perturbation δA is small enough such that also det(A + δA) ̸= 0. How small is small enough?
To answer that question, we analyze the kernel of A + δA. We know that the
singular case det(A + δA) = 0 is equivalent to the existence of at least one ⃗x ̸= ⃗0
such that (A + δA)⃗x = ⃗0. In that case

(A + δA)⃗x = ⃗0  ⇒  ⃗x = −A^{−1} δA ⃗x  ⇒  ∥⃗x∥ ≤ ∥A^{−1}∥ · ∥δA∥ · ∥⃗x∥

⇒  (1 − ∥A^{−1}∥ · ∥δA∥) · ∥⃗x∥ ≤ 0  ⇒  ∥δA∥ ≥ 1/∥A^{−1}∥

If ∥δA∥ < 1/∥A−1 ∥, the norm estimate in the first line can be valid only for ⃗x = ⃗0.
Thus ⃗x = ⃗0 is the only solution of (A + δA)⃗x = ⃗0 and therefore A + δA is non-
singular.

Calculation of the condition numbers:


Consider the vector function f⃗ : IRn → IRn with f⃗(⃗b) := A−1 b ; using the definition
of the total derivative we get

κabs (f⃗,⃗b) = ∥f⃗ ′ (b)∥ = ∥A−1 ∥


κrel(f⃗, ⃗b) = ∥A^{−1}∥ / ( ∥A^{−1}⃗b∥ / ∥⃗b∥ ) = ∥A^{−1}∥ · ∥A⃗x∥ / ∥⃗x∥ ≤ ∥A^{−1}∥ ∥A∥

Analogously we get for ⃗g : {A ∈ IRn×n | det A ̸= 0} → IRn with ⃗g (A) := A−1⃗b

κrel (⃗g , A) ≤ ∥A−1 ∥∥A∥


Therefore this expression is defined as the ”condition of the linear system” and
denoted by κrel (A) := ∥A−1 ∥∥A∥ (rough estimate!).

For an induced matrix norm, we may further rewrite this as

∥A∥p ∥A^{−1}∥p = max_{∥⃗x∥p = 1} ∥A⃗x∥p / min_{∥⃗z∥p = 1} ∥A⃗z∥p

Properties of κrel (A):

• κrel (A) ≥ 1.

• κrel (A) = κrel (αA) ∀ α ∈ IR, α ̸= 0.

• κrel (A) = ∞ ⇔ det(A) = 0

In contrast to the determinant, the condition of the linear system is invariant


under scaling. 
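To make the notion concrete, here is a small illustration as a sketch (assuming NumPy; the Hilbert matrix is just a convenient ill-conditioned test case) that evaluates κrel(A) = ∥A∥∥A^{−1}∥ and shows the predicted error amplification.

```python
import numpy as np

# Condition of a linear system: kappa_rel(A) = ||A|| * ||A^-1||,
# illustrated on the (notoriously ill-conditioned) 6x6 Hilbert matrix.
n = 6
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
kappa = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
print(f"kappa_2(A) = {kappa:.3e}")           # same value as np.linalg.cond(A, 2)

# A tiny perturbation of b can change the solution dramatically:
x_true = np.ones(n)
b = A @ x_true
db = 1e-10 * np.random.default_rng(0).standard_normal(n)
x_pert = np.linalg.solve(A, b + db)
print("relative input error :", np.linalg.norm(db) / np.linalg.norm(b))
print("relative output error:", np.linalg.norm(x_pert - x_true) / np.linalg.norm(x_true))
```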


2 Iterative Solution of Linear Systems

Problem
We want to solve the (large) linear system
Ax = b , A ∈ C n×n , b ∈ C n
with n2 > available storage and aik = 0 for almost all i, k (”sparse matrix”).
A special structure of A (e.g. banded matrix with aik = 0 for |i − k| > const) is
not necessary.
The direct methods for the solution of linear systems investigated up to now are
not suited. Even in case of sufficient storage, fill-in occurs when performing e.g.
Gaussian elimination on a sparse matrix: Many entries of the matrix change
from zero to a non-zero value in the execution of the algorithm.

 Example

• Finite Elements, Finite Differences for the solution of PDEs

• Control theory, stationary state: ẋ = Ax + Bu ⇒ Ax + B(F x) = 0

 Example (discretization of Laplace’s equation)


Consider Laplace’s equation (an elliptic PDE)
uxx + uyy = 0 ∀ (x, y) ∈ [0, 1] × [0, 1]
with the boundary conditions u(0, y) = 0, u(1, y) = 1, u(x, 0) = 0, u(x, 1) = 0.

Figure 1: Grid for finite difference approach to Laplace’s equation. Uniform grid with h = 1/4 on [0, 1] × [0, 1]; u(x, y) is given on the boundary and unknown at the 9 interior grid points, which are numbered 1–9 row by row.

In the simplest case the differential quotient is approximated by the difference


quotient (→ five-point stencil of a point in the grid made up of the point itself
together with its four ”neighbors”)
uxx = ( u(x + h, y) − 2u(x, y) + u(x − h, y) ) / h² ,   uyy = . . .

We make use of the uniform grid xi = i · h, yj = j · h and define uij := u(xi , yj )
to obtain a linear system

ui+1,j + ui−1,j + ui,j+1 + ui,j−1 − 4ui,j = 0

The result is a sparse linear system for the unknowns uij , i = 1, . . . n, j =


1, . . . , m. These uij are sorted in the z-vector of the system Az = b; the sorting
strategy affects the structure of A.
For the example system sketched in the last figure we obtain using row-wise
numbering (3 rows with 3 variables each)
         ( T  B  0 )            (  4  −1   0 )
   A  =  ( B  T  B ) ,    T  =  ( −1   4  −1 ) ,    B = diag(−1, −1, −1) ∈ IR^{3×3}
         ( 0  B  T )            (  0  −1   4 )

and

   A z = A ( u11, u21, u31, u12, . . . , u33 )^T = ( u10 + u01 , u20 , u30 + u41 , u02 , . . . , u34 + u43 )^T = b

The five-point stencil at the grid point with the (green) number k leads to the
k-th row of the linear system.

If system size increases, the block structure remains unchanged, but the size
and the number of the blocks increase accordingly and the ratio of the zero
elements increases too. 
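For experimentation, the sparse system of this example can be assembled, e.g., with Kronecker products; the following sketch assumes SciPy is available and reproduces the 9 × 9 matrix above for N = 3 (the function name laplace_matrix is just a convention of this sketch).

```python
import numpy as np
import scipy.sparse as sp

# Assemble the 2D five-point-stencil matrix for the N x N interior grid
# (row-wise numbering, as in the example above) via Kronecker products.
def laplace_matrix(N: int) -> sp.csr_matrix:
    T1 = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(N, N))    # 1D second-difference matrix
    I = sp.identity(N)
    return (sp.kron(I, T1) + sp.kron(T1, I)).tocsr()        # diagonal entries 4, off-diagonals -1

A = laplace_matrix(3)                  # the 9 x 9 matrix of the example
print(A.toarray())
print("non-zeros:", A.nnz, "of", 9 * 9)
```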

Solution strategy: iterative methods


The objective of the algorithms is to limit fill-in and storage requirements.
The memory requirement per non-zero element is larger than in case of di-
rect methods, because not only the value of aik has to be stored, but also its
address i, k. Fortunately, the number of non-zero elements is (relatively) low. 

2.1 Linear Iterative Methods – Stationary Methods

2.1.1 Introduction
Definition
Let us consider the problem Ax = b , A ∈ C n×n , b ∈ C n .


An iterative method is called convergent, if for each initial guess x(0) ∈ C n (or
IRn ) a sequence {x(m) } of approximations is generated that converges to the
solution x∗ of the linear system Ax = b

lim x(m) = x∗
m→∞

The iterative method is called consistent, if x∗ is a fixed-point of the iteration.


The iterative method is called linear, if x(m+1) depends linearly on x(m) and b

x(m+1) = M x(m) + N b

with the square matrices M, N ∈ C n×n properly chosen. 

 Example
The modified Richardson iteration is
( )
x(m+1) = x(m) + ω b − Ax(m)

where ω is a scalar parameter that has to be chosen such that the sequence
{x(m) } converges. It is easy to see that the method has the correct fixed point
x∗ and thus is consistent.
In our notation, N = ωI and M = I − ωA.
If there are both positive and negative eigenvalues of A, the method will di-
verge for any ω if the initial error x(0) − x∗ has nonzero components in the
corresponding eigenvectors.
We observe that in each iteration step the main effort is one matrix-vector
multiplication only corresponding to O(N 2 ) operations for general matrices and
O(N ) operations for sparse matrices. 
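A minimal NumPy sketch of the modified Richardson iteration; the choice ω = 2/(λmin + λmax) used below is the standard optimal value for symmetric positive definite A and is added here only for illustration.

```python
import numpy as np

def richardson(A, b, omega, x0=None, tol=1e-10, maxit=10_000):
    """Modified Richardson iteration x_{m+1} = x_m + omega*(b - A x_m)."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    for m in range(maxit):
        r = b - A @ x                      # one matrix-vector product per cycle
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            return x, m
        x += omega * r
    return x, maxit

# Small SPD test problem; for SPD A the method converges for 0 < omega < 2/lambda_max,
# and omega = 2/(lambda_min + lambda_max) minimizes the spectral radius of M = I - omega*A.
A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
lam = np.linalg.eigvalsh(A)
x, iters = richardson(A, b, omega=2.0 / (lam[0] + lam[-1]))
print(iters, np.linalg.norm(A @ x - b))
```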

2.1.2 Classical linear iterative methods


Let denote D the diagonal part of A ∈ IRn×n , E its strictly lower triangular part
and F its strictly upper triangular part. We get the additive decomposition

A = E +D+F

 Example
        (  1   2   3   4 )
A  :=   (  5   6   7   8 )   ⇒
        ( 11  12  13  14 )
        ( 15  16  17  18 )

        (  0   0   0   0 )         ( 1  0   0   0 )         ( 0  2  3   4 )
E  =    (  5   0   0   0 ) ,  D =  ( 0  6   0   0 ) ,  F =  ( 0  0  7   8 )
        ( 11  12   0   0 )         ( 0  0  13   0 )         ( 0  0  0  14 )
        ( 15  16  17   0 )         ( 0  0   0  18 )         ( 0  0  0   0 )


The following iterations again are all fixed-point iterations and thus consistent.

Iteration schemes for different classical methods


Choose an initial guess x(0) to the solution x∗ of the linear system Ax = b.
for m = 0 to p do
A1 x(m+1) + A2 x(m) = b ⇒ A1 x(m+1) = b − A2 x(m)

x^(m) denotes the result after the m-th iteration cycle. Comparing the above decomposition with the general definition, we get M = −A1^{−1} A2 and N = A1^{−1}.

Now we choose for the


Jacobi method (J) : A1 = D , A2 = E + F
Gauß-Seidel method (GS) : A1 = D + E , A2 = F
Successive over-relaxation (SOR) : A1 = D/ω + E , A2 = (1 − 1/ω)D + F
We always decompose such that A1 + A2 = A (additive decomposition).
Motivation for the choice of those A1 :
The resulting linear system A1 x(m+1) = . . . can be efficiently solved.
The initial guess x(0) of course should be as good as possible, but in principle
there is no restriction for the choice.
Iteration stops as soon as ∥x(p+1) − x∗ ∥ < tol. 

What basic idea do these methods have in common?


We start with the residual vector r^(m) := b − Ax^(m) into the (m + 1)-th cycle.
In the i-th substep (i = 1, . . . , n) we try to make one (!) component ri^(m) precisely zero: For that we choose an index pair (i, k) with aik ̸= 0 and modify xk^(m) → xk^(m) + δxk^(m) =: xk^(m+1) such that

ri^(m) − aik δxk^(m) = bi − Σ_{j=1}^{n} aij xj^(m) − aik δxk^(m) = 0   ⇒   δxk^(m) = . . .   (∗)

From the linear equation (∗) the correction δxk^(m) can be calculated. After that
iteration step the i-th equation is exactly fulfilled for one moment, but already in
the next step that property is destroyed again and another equation is exactly
fulfilled (i.e. another component of the residual is precisely zero).
We have to assure that each row – and thus each component of the residual –
is reached.

The different methods mainly differ in their strategy to choose the sequence of
index pairs (i, k). We hope, that for a proper choice of the index pairs (i, k) the
iteration converges: x(m) → x∗ . That remains to be studied in detail.


In the Gauss-Seidel method (GS) we cyclically choose i = k = 1, 2, . . . , n and
thus carry out m times complete cycles of n steps to obtain x(m) . Permutations
may be necessary to achieve aii ̸= 0.

Successive Over-Relaxation (SOR) refines that process by choosing a relax-


ation factor ω ̸= 0 (mostly ω ∈ ]1, 2[). Therefore, the older method (GS) is a
special case of (SOR).

Algorithm 1: Method of successive over-relaxation (SOR)

 Start: x(0) arbitrary initial guess

for m = 1 to p do
Choose relaxation factor ω = ω(m) ̸= 0
for i = 1 to n do
xi := xi + ω ( bi − Σ_{j=1}^{n} aij xj ) / aii

The Jacobi method (J) modifies the inner loop:


...
for i = 1 to ndo   

n ∑
n
x′i := bi − aij xj  /aii = xi + bi − aij xj  /aii
j=1,j̸=i j=1

for i = 1 to n do
xi := x′i
...
In step i = 1, . . . , n of the (m + 1)-th cycle the i-th component xi^(m) is modified such that ri^(m) = 0; the modification is not immediately applied, but only stored. After the complete cycle has been finished, all stored modifications are applied simultaneously (xi^(m) → xi^(m+1) for i = 1, . . . , n). This method nowadays is not used very often. 
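The following sketch (assuming NumPy and dense storage, purely for simplicity) implements one possible realization of these classical iterations along the component formulas above; sor with ω = 1 is the Gauss-Seidel method.

```python
import numpy as np

def sor(A, b, omega=1.0, x0=None, tol=1e-10, maxcycles=10_000):
    """SOR sweep as in Algorithm 1; omega = 1 gives Gauss-Seidel."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for m in range(maxcycles):
        for i in range(n):
            x[i] += omega * (b[i] - A[i, :] @ x) / A[i, i]   # uses already updated components
        if np.linalg.norm(b - A @ x) < tol * np.linalg.norm(b):
            return x, m + 1
    return x, maxcycles

def jacobi(A, b, x0=None, tol=1e-10, maxcycles=10_000):
    """Jacobi cycle: all corrections computed first, then applied simultaneously."""
    D = np.diag(A)
    x = np.zeros_like(b, dtype=float) if x0 is None else x0.astype(float)
    for m in range(maxcycles):
        x = x + (b - A @ x) / D
        if np.linalg.norm(b - A @ x) < tol * np.linalg.norm(b):
            return x, m + 1
    return x, maxcycles

A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = np.array([2.0, 4.0, 10.0])
for name, (x, it) in [("Jacobi", jacobi(A, b)), ("GS", sor(A, b, 1.0)), ("SOR", sor(A, b, 1.2))]:
    print(f"{name:6s}: {it:3d} cycles, residual {np.linalg.norm(b - A @ x):.1e}")
```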

Remark
For sparse A, the iteration steps in (GS), (SOR) and (J) are cheap!

We prove: The (J) algorithm is compatible with the matrix formulation.


By construction of the (J) algorithm we get
x(m+1) = x(m) + D−1 (b − Ax(m) ) = D−1 (b − Ex(m) − F x(m) )
Then we multiply the complete equation by the matrix D. 


We prove: The (SOR) algorithm is compatible with the matrix formulation.
Let us analyze step i of the inner loop in the (m + 1)-th cycle.
(m+1) (m+1)
In that case the components x1 , . . . , xi−1 in the inner loop already have
been updated. For i ∈ {1, . . . , n} step i can be written in detail
xi^(m+1) = xi^(m) + (ω/aii) ( bi − Σ_{j=1}^{i−1} aij xj^(m+1) − Σ_{j=i}^{n} aij xj^(m) )
         = xi^(m) + ω ( D^{−1} ( b − Ex^(m+1) − Dx^(m) − F x^(m) ) )_{i-th component}

Multiplication by D/ω from the left yields the desired result. 

Convergence Theorem 1
If {x(m) } converges at all, then it converges to x∗ .

Proof:
If x∗ = limm→∞ x(m) exists, then we get by insertion into the matrix formulation
A1 x∗ + A2 x∗ = b ⇒ x∗ = (A1 + A2 )−1 b = A−1 b
and x∗ is uniquely determined for det(A) ̸= 0. 

Definition
Let ε^(m) := x^(m) − x∗ denote the error after the m-th iteration cycle and ϱ := max_{1≤i≤n} |λi(A1^{−1} A2)| the spectral radius (= largest absolute value of the eigenvalues) of the matrix A1^{−1} A2. The matrix (−A1^{−1} A2) is called iteration matrix. 

Convergence Theorem 2
It is ε^(m+1) = −A1^{−1} A2 · ε^(m) and consequently ε^(m) = (−A1^{−1} A2)^m ε^(0).

Proof:
A1 x(m+1) + A2 x(m) = b = A1 x∗ + A2 x∗ ⇒ A1 ε(m+1) + A2 ε(m) = 0
The second formula can be proven by induction. 

Convergence Theorem 3

lim_{m→∞} x^(m) = x∗ = A^{−1} b   ∀ x^(0)   ⇔   ϱ < 1

Moreover, convergence is ”linear”: ε^(m) = O(ϱ^m)


Proof: (only ”⇒”, by contradiction)
If ϱ ≥ 1, then there exists at least one EV vmax for the maximum EW λmax with |λmax| ≥ 1. If x^(0) = x∗ + vmax, then by definition ε^(0) = vmax.
From convergence theorem 2 we get (using the EW/EV definition)

ε^(m) = (−A1^{−1} A2)^m vmax = (−1)^m λmax^m vmax

Taking the norm gives ∥ε^(m)∥ = |λmax|^m ∥vmax∥ ≥ ∥vmax∥ and thus no convergence. 

Remark
Per decimal digit of precision therefore −1/ log10 (ϱ) cycles have to be per-
formed. 

Convergence Theorem 4

 A ∈ C n×n :
(SOR) converges – if at all – only for ω ∈ ]0, 2[ .
 A ∈ C n×n positive definite:
(GS) converges, (SOR) converges for ω ∈ ]0, 2[ fixed,
convergence of (J) not granted

 A ∈ C n×n strictly diagonally dominant, i.e. |aii| > Σ_{j=1, j̸=i}^{n} |aij| , i = 1, . . . , n :
(J) and (GS) converge.

Convergence rates of the methods (GS), (SOR) and (J) are investigated later.

 Example
Jacobi method and 5-point stencil for Laplace’s equation

A = blocktridiag(B, T, B) ∈ IR^{N²×N²}   with   T = tridiag(−1, 4, −1) ∈ IR^{N×N} ,  B = −I ∈ IR^{N×N} ,

i.e. A carries the entry 4 on the diagonal and −1 at the four neighbor positions of the five-point stencil (cf. the 9 × 9 example matrix above).


−A1^{−1} A2 = −D^{−1} (E + F) = −(1/4) I (A − 4I) = I − (1/4) A

= (1/4) · ( the N² × N² matrix with 0 on the diagonal and the entry 1 at the four neighbor positions of the five-point stencil )

Claim:
−A1^{−1} A2 has the N² eigenvectors z^(k,l), k, l = 1, . . . , N, with the components

z^(k,l)_{(i−1)·N + j} := sin( kπi/(N+1) ) · sin( lπj/(N+1) )

and the corresponding eigenvalues

λ^(k,l) := (1/2) ( cos( kπ/(N+1) ) + cos( lπ/(N+1) ) )
The index i refers to the i-th block row and the index j to the number of the
single row within this block row.

Proof:
For e.g. the 1st component we get using trigonometric angle sum identities

4 (−A1^{−1} A2 z^(k,l))_1
  = z^(k,l)_2 + z^(k,l)_{N+1} = z^(k,l)_{i=1, j=2} + z^(k,l)_{i=2, j=1}
  = sin( kπ/(N+1) ) sin( 2lπ/(N+1) ) + sin( 2kπ/(N+1) ) sin( lπ/(N+1) )
  = sin( kπ/(N+1) ) · 2 sin( lπ/(N+1) ) cos( lπ/(N+1) ) + 2 sin( kπ/(N+1) ) cos( kπ/(N+1) ) · sin( lπ/(N+1) )
  = 4 λ^(k,l) z^(k,l)_1

Conclusion: For the convergence rate we get

ϱ(−A1^{−1} A2) = max_{k,l} |λ^(k,l)| = cos( π/(N+1) )

This is a typical behavior often observed when iteration methods are applied to
linear systems resulting from the discretization of PDEs by multi-point stencils:
If system size increases, not only the effort per iteration cycle increases, but
the number of necessary cycles increases too! 


Definition (consistently ordered matrix)
Given A = D + E + F = D(I + L + U ) ∈ C n×n with L := D−1 E and U := D−1 F .
We define J(α) := −(αL + α−1 U ) , α ∈ C \ {0}.
A is consistently ordered, if the EWs of the matrix J(α) are independent of α.


Remark
By reordering the variables x1 , . . . xn of a linear system Ax = b, the resulting
and new matrix can be consistently ordered (that is the reason for the notion).


 Example (tridiagonal matrices)

      ( 2  1  0 )                         ( 0    1/α    0  )
A  =  ( 1  2  1 )   ⇒   J(α) = − (1/2) ·  ( α     0    1/α )
      ( 0  1  2 )                         ( 0     α     0  )

                                          ( 2λ   1/α    0  )
                    ⇒   J(α) − λI = −(1/2)( α    2λ    1/α )
                                          ( 0     α    2λ  )

Expanding the determinant det(J(α) − λI) = 0 along a column yields

2λ ( (2λ)² − α · (1/α) ) − α · (2λ/α) = 0
and thus the characteristic polynomial – and with it the EWs – are independent
of α. 

Convergence Theorem 5 (for consistently ordered matrices)

Assumption: A consistently ordered


Claim: ϱGauß−Seidel = (ϱJacobi )2
Remark: (J) needs approximately twice as many iterations as (GS).

Convergence Theorem 6 (for consistently ordered matrices)

Assumption: A consistently ordered


EWs of J = J(α) are real-valued
ϱJ := ϱJacobi < 1
H(ω) := −A1^{−1} A2 = −(D/ω + E)^{−1} ((1 − 1/ω)D + F) defines the iteration matrix of the SOR method


Claim: For the optimal relaxation parameter ωb we get

ωb := arg min_{ω ∈ ]0,2[} ϱ(H(ω)) = 2 / ( 1 + √(1 − ϱJ²) ) ,   ϱ(H(ωb)) = ωb − 1

and in general

ϱ(H(ω)) = ω − 1                                               for ω ∈ [ωb, 2] ,
ϱ(H(ω)) = 1 − ω + ω² ϱJ²/2 + ω ϱJ √( 1 − ω + ω² ϱJ²/4 )       for ω ∈ ]0, ωb] .

Remark: If only a coarse estimate of ωb is known, it is better to choose ω a little bit larger than this estimate.

Figure 2: Spectral radius ϱ(H(ω)) of SOR as a function of the relaxation parameter ω ∈ ]0, 2[, shown for ϱJ = 0.3, 0.7, 0.9.

 Example (Laplace’s eq., 5-point stencil, example continued)

For this example we have proven ϱJ = cos( π/(N+1) ). With that information we get for (SOR) and for (GS) – as a special case of (SOR) –

ϱGS = ϱ(H(1)) = ϱJ² = cos²( π/(N+1) )

ωb = 2 / ( 1 + sin( π/(N+1) ) )

ϱ(H(ωb)) = ωb − 1 = ( ϱJ / ( 1 + √(1 − ϱJ²) ) )² = cos²( π/(N+1) ) / ( 1 + sin( π/(N+1) ) )²


Question:
How many cycles of (J) do we need instead of one optimal (SOR) cycle?
Answer:

ϱJ^k = ϱ(H(ωb))   ⇒   k = ln ϱ(H(ωb)) / ln ϱJ

We want to estimate that expression using several Taylor expansions up to O(N^{−3}) to get an idea of the order of magnitude of that effect:

ln ϱ(H(ωb)) = 2 ln ϱJ − 2 ln( 1 + sin( π/(N+1) ) )

cos( π/(N+1) ) = 1 − (1/2) ( π/(N+1) )² + O(N^{−4})   for N ≫ 1

ln(1 + z) = z + O(z²) = z − z²/2 + O(z³)   ⇒

ln ϱJ = ln( 1 − (1/2) ( π/(N+1) )² + O(N^{−4}) ) = −(1/2) ( π/(N+1) )² + O(N^{−4})

1 + sin( π/(N+1) ) = 1 + π/(N+1) + O(N^{−3})

ln( 1 + sin( π/(N+1) ) ) = π/(N+1) + O(N^{−3}) − (1/2) ( π/(N+1) )² + O(N^{−6})

ln ϱ(H(ωb)) = 2 ln ϱJ − 2 ln( 1 + sin( π/(N+1) ) ) = −2π/(N+1) + O(N^{−3})

⇒   k(N) ≈ (4/π) (N + 1)
In our example, the optimal SOR method is more than N times faster than (J) !!
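These formulas are easy to check numerically; the sketch below (assuming NumPy and a small N, so that the spectral radii of the iteration matrices can be computed directly) compares ϱJ, ωb and ϱ(H(ωb)) with the predictions above.

```python
import numpy as np

# Check rho_J = cos(pi/(N+1)), omega_b and rho(H(omega_b)) for the 2D five-point stencil.
N = 10
T1 = np.diag(2.0 * np.ones(N)) - np.diag(np.ones(N - 1), 1) - np.diag(np.ones(N - 1), -1)
A = np.kron(np.eye(N), T1) + np.kron(T1, np.eye(N))        # N^2 x N^2, diagonal entries 4

D = np.diag(np.diag(A)); E = np.tril(A, -1); F = np.triu(A, 1)

def spectral_radius(M):
    return np.abs(np.linalg.eigvals(M)).max()

rho_J = spectral_radius(-np.linalg.inv(D) @ (E + F))
omega_b = 2.0 / (1.0 + np.sqrt(1.0 - rho_J**2))
A1 = D / omega_b + E
rho_SOR = spectral_radius(-np.linalg.inv(A1) @ ((1.0 - 1.0 / omega_b) * D + F))

print("rho_J      :", rho_J,   "  predicted:", np.cos(np.pi / (N + 1)))
print("omega_b    :", omega_b, "  predicted:", 2.0 / (1.0 + np.sin(np.pi / (N + 1))))
print("rho(H(w_b)):", rho_SOR, "  predicted:", omega_b - 1.0)
print("Jacobi cycles per optimal SOR cycle ~", np.log(rho_SOR) / np.log(rho_J))
```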


Block Iteration Methods


Block iteration schemes are generalizations of the ”point” iteration schemes
described above. They update a whole set of components at each time, typ-
ically a subvector of the solution vector, instead of only one component. The
matrix A and the right-hand side and solution vectors of Ax = b are partitioned
as follows:
        ( A11  · · ·  A1M )          ( x1 )          ( b1 )
  A  →  (  ..   ..    ..  ) ,   x =  (  .. ) ,   b =  (  .. )
        ( AM1  · · ·  AMM )          ( xM )          ( bM )

in which the partitionings of b and x into subvectors bi and xi are identical and
compatible with the partitioning of A.


We assume that the Aij are square matrices with det(Aii ) ̸= 0.
Now we define, similarly to the scalar case, the splitting A = D + E + F with
D = blockdiag( A11, A22, . . . , AMM ) ,
E = strictly lower block triangular part of A (blocks Aij with i > j) ,
F = strictly upper block triangular part of A (blocks Aij with i < j).

With these definitions, it is easy to generalize the iterative methods defined earlier – e.g. Jacobi, Gauss-Seidel, and SOR –, which made one scalar component ri^(m) of the residual equal to zero in each substep.
For example, the block Jacobi iteration is now defined as a technique in which the new subvectors xj^(m+1) are all calculated according to

Dx^(m+1) + (A − D)x^(m) = b   ⇒   Ajj xj^(m+1) = bj − Σ_{k=1, k̸=j}^{M} Ajk xk^(m) ,   j = 1, . . . , M

The iterative method simply is applied blockwise instead of componentwise. The (much smaller) linear subsystems Ajj xj^(m+1) = . . . can each be solved by a direct method (LU, QR, . . . ).

 Example
With finite difference approximations of PDEs, it is standard to block the vari-
ables and the matrix by partitioning along whole lines of the mesh. More gen-
eral, a block can also correspond to the unknowns associated with a few con-
secutive lines in the plane. One such blocking is illustrated for a 6 × 6 grid:

Figure 3: Blocking example for a 6 × 6 mesh: the 36 grid points, numbered 1–36 row by row, are partitioned into three subdomains of two grid lines (12 points) each.

The corresponding matrix has the following block structure:


Figure 4: Block structure of the matrix A associated with that mesh


The advantage of block iterations is the smaller number of iteration cycles (in
our benchmark example of the 5-point stencil the number of iterations depends
on N and in case of block iterations on the much smaller number M ≪ N ). So
the number of cycles required to achieve convergence often decreases rapidly
as the block-size increases.
The disadvantage is, that the effort per cycle significantly increases, because
the subproblems – linear systems with Aii – have to be solved directly. More-
over fill-in may occur in the subproblems.
Finally, block techniques can be defined in more general terms. First, by using
blocks that allow us to update arbitrary groups of components, and second,
by allowing the blocks to overlap. This is a form of the domain-decomposition
method.

Theorem
Let A be a matrix in block-tridiagonal form

        ( D1    A12                            )
        ( A21   D2    A23                      )
  A  =  (       A32    ..      ..              )
        (               ..     ..     AM−1,M   )
        (                   AM,M−1      DM     )

and let the Aii = Di be square diagonal matrices.


Then A is called T -matrix and A is consistently ordered.


2.2 Methods Based on Minimization – Krylov Subspace Methods
2.2.1 Fundamental Idea
Let us solve Ax = b , A ∈ C n×n ∧ A positive definite.
This problem is substituted by the minimization problem
min_{x} f(x)   with   f(x) := (1/2) x^T A x − b^T x

2.2.2 Simplest realization: Gradient method


The negative gradient is the direction of steepest descent; naively one might think that this qualifies the method to find the minimum most quickly.

Algorithm 2: Gradient method

 Start: x(0) arbitrary initial guess

for k = 0, 1, 2 . . . do
(A) Determine the search direction (gradient)

dk := −∇f (x(k) ) = b − Ax(k)

(B) One-dimensional minimization along the search direction

αk := arg min_{t≥0} { f(x^(k) + t · dk) }

We can explicitly determine the argument αk from

0 =! (d/dt) f(x^(k) + t · dk)
  = (d/dt) [ (1/2) (x^(k) + t · dk)^T A (x^(k) + t · dk) − b^T (x^(k) + t · dk) ]
  = t dk^T A dk + (x^(k))^T A dk − b^T dk = t dk^T A dk − dk^T dk

⇒   t = dk^T dk / ( dk^T A dk ) =: αk

(C) Update: x(k+1) := x(k) + αk dk

Remarks
 The method converges to the solution x∗ , if A is positive definite.
 The method is converging locally for α sufficiently small and ∇f (x(k) ) ̸= 0:
f(x^(k) + α dk) = f(x^(k)) + ∇f(x^(k))^T ( −α ∇f(x^(k)) ) + O(α² ∥dk∥²) < f(x^(k))


 Caution necessary if ∇f (x(k) ) is determined numerically e.g. from finite
difference approximation (error in search direction!). 
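A compact NumPy sketch of Algorithm 2, applied to the ill-conditioned quadratic f(x, y) = (x² + a y²)/2 that is analyzed in the example further below; the stopping criterion ∥dk∥ < tol and the iteration limit are conventions of this sketch.

```python
import numpy as np

def gradient_method(A, b, x0=None, tol=1e-10, maxit=50_000):
    """Algorithm 2: steepest descent for f(x) = 0.5 x^T A x - b^T x, A positive definite."""
    x = np.zeros_like(b) if x0 is None else x0.astype(float)
    for k in range(maxit):
        d = b - A @ x                     # (A) search direction = negative gradient = residual
        if np.linalg.norm(d) < tol:
            return x, k
        alpha = (d @ d) / (d @ (A @ d))   # (B) exact line search along d
        x = x + alpha * d                 # (C) update
    return x, maxit

# Ill-conditioned 2x2 example f(x, y) = (x^2 + a*y^2)/2, a >> 1
a = 100.0
A = np.diag([1.0, a])
b = np.zeros(2)
x, iters = gradient_method(A, b, x0=np.array([a, 1.0]))
print("iterations:", iters)               # slow: rate ((kappa-1)/(kappa+1))**k with kappa = a
```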

Rate of convergence
Let A ∈ IRn×n be positive definite with EWs 0 < λ1 ≤ λ2 ≤ . . . ≤ λn; let us define κ := cond2 A = λn/λ1 and f(x) := (1/2) x^T A x − b^T x.
Minimization of the quadratic function f(x) by the Gradient method produces a sequence { x^(k) }_{k∈IN0} with

∥x^(k) − x∗∥A ≤ ( (κ − 1)/(κ + 1) )^k · ∥x^(0) − x∗∥A

Here ∥x∥A := √( x^T A x ) denotes the so-called ”energy norm” and x∗ the exact solution of Ax = b.

Remark
If the linear system is ill-conditioned, then we get for the rate of convergence
because of κ ≫ 1:
(κ − 1)/(κ + 1) ≈ 1

After a few iteration steps the iteration almost stagnates, i.e. hardly any further progress is made per step.
Linear systems which result from the discretization of elliptic PDEs are often
ill-conditioned. 

 Example
Let us apply the Gradient method to the function f(x, y) := (1/2)(x² + a y²), a ≫ 1, with initial values x^(0) = (x0, y0) = (a, 1).

⇒   A = diag(1, a) ,   κ = a

d0 = −a (1, 1)^T ,   α0 = 2/(1 + a)   ⇒   ( x^(1), y^(1) )^T = ϱ (a, −1)^T ,   ϱ := (a − 1)/(a + 1) ≈ 1

d1 = −ϱ a (1, −1)^T ,   α1 = 2/(1 + a)   ⇒   ( x^(2), y^(2) )^T = ϱ² (a, 1)^T = ϱ² ( x^(0), y^(0) )^T

. . .

by induction we prove:   ( x^(k), y^(k) )^T = ϱ^k ( a, (−1)^k )^T

gradient:   dk = −( x^(k), a y^(k) )^T ,   dk+1 = −ϱ ( x^(k), −a y^(k) )^T

⇒   dk ⊥ dk+1 for this special case.


We observe that always after two iterations dk is parallel to dk+2 : We are
searching for a better approximation of the solution in a direction, in which
we already have searched two steps before. That is not very efficient! 

Figure 5: Gradient method applied to f(x, y) := x²/2 + y², x^(0) = (4.5, 3). Iterates shown in a contour picture with contour lines at 19.125, 12, 6, 1.976, 0.204, 0.021, 0.0022.

2.2.3 Scalar Product


An inner product space is a vector space with an additional structure called
an inner product (= scalar product). This additional structure associates each
pair of vectors in the space with a scalar quantity. Inner products allow the
generalized definition of orthogonality.

Definition (scalar product, Hilbert space)


Consider the vector space X = Kn (K = IR or C ). A mapping

⟨·, ·⟩X : X × X → K

is called scalar product, if ∀ x, y, z ∈ X, ∀ α ∈ K we get

(1) ⟨x, x⟩X ≥ 0 ∧ (x = 0 ⇔ ⟨x, x⟩X = 0)

(2) ⟨x + αy, z⟩X = ⟨x, z⟩X + α⟨y, z⟩X

(3) ⟨x, y⟩X = conj( ⟨y, x⟩X )   (complex conjugation; for K = IR this is simply ⟨x, y⟩X = ⟨y, x⟩X )

X with the scalar product ⟨·, ·⟩X is called pre-Hilbert space.


An inner product naturally induces an associated norm by the definition ∥x∥X := √( ⟨x, x⟩X ), thus an inner product space is also a normed vector space.
If X is complete with reference to this special norm (Banach space), then X is
a Hilbert space.


The lower index in ⟨·, ·⟩X reminds us, that the scalar product has to be specified
exactly . 

 Example
X := IRn with

⟨x, y⟩2 := Σ_{i=1}^{n} xi yi = x^T y

is a Hilbert space, if ∥x∥2 := √( ⟨x, x⟩2 ). 

Definition (orthogonality)
Let X be Hilbert space, then
x, y ∈ X orthogonal (or x ⊥ y) :⇔ ⟨x, y⟩X = 0


2.2.4 CG Method (Conjugate Gradient Method)

Fundamental idea
In contrast to the gradient method we want to avoid searching in the same
direction several times!

What are the consequences for a mathematical algorithm?


If x(k) is optimal with respect to the search direction p ̸= 0 , then the search
direction q in the next iteration step x(k) → x(k+1) is chosen such that also
x(k+1) is optimal in the direction p.

How can we realize this idea?


”x(k) optimal in the direction p ̸= 0” means, that (at least locally) in the direction
p no further improvement is possible (→ definition of ”contour line”)
∇f (x(k) )T · p = 0
The same property should be valid also for x(k+1)
0 =! ∇f(x^(k+1))^T · p = ∇f(x^(k) + αq)^T · p ,   α ̸= 0
⇒ 0 = (Ax^(k+1) − b)^T p = (A(x^(k) + αq) − b)^T p
     = (Ax^(k) − b)^T p + α (Aq)^T p = ∇f(x^(k))^T · p + α (Aq)^T p   (the first summand vanishes)
⇒ 0 = (Aq)^T p ,   because α ̸= 0

Definition (conjugate vectors)


Let be A positive definite.
Vectors with the property q T Ap = 0 are called conjugate with respect to A. In
the special scalar product defined by ⟨p, q⟩A := pT Aq the conjugate vectors are
perpendicular (= orthogonal). 


Remark (linear independence)
Let {p1, . . . , pk} ⊂ IRn with k ≤ n be pairwise conjugate vectors with respect to A (A positive definite) with pj ̸= 0.
Then the pj are linearly independent, orthogonal in the special scalar product ⟨·, ·⟩A and span a k-dimensional (sub-)space, because

Σ_{j=1}^{k} αj pj = 0   ⇒   0 = ( Σ_{j=1}^{k} αj pj )^T A pi = αi (pi^T A pi)   ⇒   αi = 0, i = 1, . . . , k

The last step holds because pTi Api ̸= 0 for A positive definite and pi ̸= 0.

Algorithm 3: CG method (core algorithm)

 Given: A ∈ IRn×n positive definite, b ∈ IRn


{p0 , . . . , pn−1 } ⊂ IRn pairwise conjugate w.r.t. A, pj ̸= 0

 Start: x(0) ̸= 0 arbitrary initial guess

for k = 0, 1, 2, . . . , n − 1 do
(A) Calculate the original search direction: dk := b − Ax(k)
(B) Determine αk from

αk = dk^T pk / ( pk^T A pk ) = ⟨dk, pk⟩2 / ⟨pk, pk⟩A

(C) Update with modified search direction: x(k+1) := x(k) + αk pk

Detailed analysis of step (B)


Let x∗ := A^{−1} b. Because the {pj} are a basis of IRn w.r.t. ⟨·, ·⟩A, we get

x∗ − x^(0) = Σ_{j=0}^{n−1} αj pj   ⇐⇒   x∗ = x^(0) + Σ_{j=0}^{n−1} αj pj .

From the step x(k+1) := x(k) + αk pk in part (C) we see that in the (k + 1)th
iteration step the correct component αk pk in the direction of one basis vector
pk is added (recursive scheme) to get the improved approximation x(k+1) of the
true solution x∗ . The direction of the basis vector pk is only used once.
How do we obtain the αk in this algorithm? We investigate ⟨pk , x∗ − x(0) ⟩A :
pk^T A (x∗ − x^(0)) = pk^T (b − Ax^(0)) = pk^T ( b − Ax^(0) − Σ_{i=0}^{k−1} αi A pi ) = pk^T (b − Ax^(k)) = pk^T dk


We have subtracted the sum in the bracket, because in the steps before the
components of x∗ − x(0) in the direction of p0 , . . . , pk−1 already have been added
and the scalar product is not changed by that operation (the {pj } are conju-
gate). On the other hand we can use the basis representation and get
pk^T A (x∗ − x^(0)) = pk^T ( Σ_{j=0}^{n−1} αj A pj ) = αk · (pk^T A pk) = αk · ⟨pk, pk⟩A

From these two equations we can calculate the unknown αk . 

Important remarks and properties

 The CG method strongly resembles the Gradient method.


Here we have directly derived the core algorithm via the basis represen-
tation. It can be shown that this algorithm minimizes the residuum in a
properly chosen subspace too (see below).
 In an exact calculation without numerical errors (!!) after n steps all αj
have been determined and the true solution x(n) = A−1 b = x∗ has been
calculated.
 The algorithm is an iterative algorithm because of the numerical errors that
accumulate for n ≫ 1. The iteration is continued until ∥dk ∥ ≤ tol.
Often we do not calculate all pk , but already obtain a good approximation to
x∗ after applying a few iteration steps: pk , k = 1, . . . , N < n. This allows us
to approximately solve systems where n is so large that the direct method
would take too much time.
 The CG method is numerically stable even in presence of rounding errors.
 Using very tricky programming techniques only one of the expensive matrix-vector multiplications is needed per iteration step (→ Hestenes/Stiefel)
 In the core algorithm the basis was assumed to be given. That is no realistic
situation.
In a real algorithm, the basis {pj } is constructed simultaneously with the
main iteration. For that, we add at the end of algorithm 3 an additional step
(D) which – here without proof – can be written as

p0 = d0   and   pk+1 = dk+1 + ( dk+1^T dk+1 / dk^T dk ) · pk .

Of course, then dk+1 already has to be calculated before. 
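Putting the core algorithm and the additional step (D) together gives the usual practical CG loop; the following is a sketch of one such implementation (assuming NumPy, one matrix-vector product per iteration), not the exact code of Hestenes/Stiefel.

```python
import numpy as np

def cg(A, b, x0=None, tol=1e-10, maxit=None):
    """Conjugate gradient method for A positive definite (one A @ p per iteration)."""
    n = len(b)
    maxit = n if maxit is None else maxit
    x = np.zeros(n) if x0 is None else x0.astype(float)
    d = b - A @ x                          # residual = original search direction
    p = d.copy()                           # first conjugate direction, p0 = d0
    delta = d @ d
    for k in range(maxit):
        if np.sqrt(delta) < tol:
            return x, k
        Ap = A @ p
        alpha = delta / (p @ Ap)           # step (B): alpha_k = <d,p>_2 / <p,p>_A
        x += alpha * p                     # step (C)
        d -= alpha * Ap                    # new residual d_{k+1}
        delta_new = d @ d
        p = d + (delta_new / delta) * p    # step (D): next conjugate direction
        delta = delta_new
    return x, maxit

# In exact arithmetic CG terminates after at most n steps:
rng = np.random.default_rng(1)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)              # positive definite test matrix
b = rng.standard_normal(50)
x, its = cg(A, b)
print(its, np.linalg.norm(A @ x - b))
```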


Theorem (minimum property of the iterate)
Let A ∈ IRn×n be positive definite, Vk := span{p0, . . . , pk−1}, and apply the CG method from algorithm 3.
Then the approximation x^(k) of x∗ minimizes the function f(x) := (1/2) x^T A x − b^T x not only along the line {x^(k−1) + α pk−1 , α ∈ [0, 1]} (analogously to the Gradient method), but in the total subspace x^(0) + Vk. 

Theorem (rate of convergence of the CG method)


Assumptions analogously to the Gradient method. Let κ = cond2 A.
For the CG method we obtain

∥x^(k) − x∗∥A ≤ 2 ( (√κ − 1)/(√κ + 1) )^k · ∥x^(0) − x∗∥A


Remarks

 Comparison with Gradient method ⇒ instead of κ now √κ
 Generalization to arbitrary matrices possible (GMRES → generalized mini-
mum residuum, MINRES)
 Increasing efficiency by preconditioning → κ is changed
Idea of preconditioning:
Given Ax = b with A ∈ IRn×n positive definite. Choose B ∈ IRn×n positive
definite too and solve instead of the original problem

Ãx̃ = b̃ with à := BAB, x̃ := B −1 x, b̃ := Bb


or Ãx = b̃ with à := BA, b̃ := Bb
or Ãx̃ = b with à := AB, x̃ := B −1 x

Choose B such that κ(Ã) ≪ κ(A) and that B can be cheaply applied. A
good preconditioner concentrates the EWs.


 Example
A simple matrix B for preconditioning is a diagonal matrix, the diagonal ele-
ments of which are the inverse of the roots of the diagonal elements of the
original matrix.
This idea is motivated by the following theorem:
For a positive definite matrix the minimum EW is smaller than or equal to the minimum diagonal element, and the maximum EW is greater than or equal to the maximum diagonal element. All EWs are positive. 
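A small illustration of this diagonal scaling as a sketch, applied symmetrically as Ã = BAB with B = diag(1/√aii) (assuming NumPy; the test matrix is arbitrary). Such scaling often, though not always, reduces the condition number noticeably.

```python
import numpy as np

# Symmetric diagonal (Jacobi) preconditioning: B = diag(1/sqrt(a_ii)), A_tilde = B A B.
rng = np.random.default_rng(0)
M = rng.standard_normal((30, 30))
A = M @ M.T + np.diag(np.linspace(1.0, 1e4, 30))    # SPD with widely spread diagonal
B = np.diag(1.0 / np.sqrt(np.diag(A)))
A_tilde = B @ A @ B                                  # unit diagonal after scaling

print("cond2(A)     :", np.linalg.cond(A, 2))
print("cond2(B A B) :", np.linalg.cond(A_tilde, 2))
```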


3 Numerical Solution of Ordinary Differential Equations
3.1 Basic Definitions and Transformations
Let be U ⊆ IR × IRn a domain (i.e. an open and connected subset), f : U → IRn
sufficiently often differentiable (theory says: at least continuous) and the initial
value (t0 , x0 ) ∈ U .
We want to determine a function x ∈ C 1 (I, IRn ) on an open and connected
interval I = ]t0 , tf [ ⊂ IR such that

x′ (t) = f (t, x(t)) ∧ t0 ∈ I ∧ x(t0 ) = x0 ∧ (t, x(t)) ∈ U ∀ t ∈ I

Analytically such a solution often does not exist. Thus we want to calculate
numerically the solution of the above described initial value problem (IVP) of
an ordinary differential equation (ODE).

Remarks

 The t-argument can be formally removed by introducing an additional vari-


able xn+1 (t) := t together with fn+1 (t, x) := 1: The ODE then is called
autonomous .
 The IVP is equivalent to the following integral equation
x(t) = x0 + ∫_{t0}^{t} f(ξ, x(ξ)) dξ

This formula we will use for the construction of numerical methods.


 The interval I can always be transformed to ]0, 1[.
 We only solve explicit ODEs.
 Any ODE of higher order can be transformed to a system of ODEs of first
order, e.g.

x′′′ = f(t, x, x′, x′′) ,   f ∈ C²(I × IRn × IRn × IRn, IRn)

z1(t) := x(t) ,  z2(t) := x′(t) ,  z3(t) := x′′(t)   ⇒   z′ = ( z1′, z2′, z3′ )^T = ( z2 , z3 , f(t, z1, z2, z3) )^T

 Example
( y1′(x) , y2′′(x) )^T = ( x · y2²(x) , y2′(x) + x · y1(x) )^T = f(x, y(x), y′(x))

y(4) = (1, 2)^T ,   y2′(4) = 7 ,   I = ]4, 13[


Transformation of the ODE into an autonomous system of first order with the
new independent variable ξ instead of x leads to
z1 = y1 ,  z2 = y2 ,  z3 = y2′ ,  z4 = x   ⇒   z′(ξ) = ( z1′, z2′, z3′, z4′ )^T = ( z4 · z2² , z3 , z3 + z4 · z1 , 1 )^T = f̃(z(ξ)) ,   ξ ∈ ]4, 13[

After that we transform to the standard ”time” interval ]0, 1[ by

ξ → t := (ξ − 4)/(13 − 4)   ⇒   t ∈ ]0, 1[ ,   d/dt = (13 − 4) · d/dξ   ↔   d/dξ = (1/9) · d/dt

So we obtain the equivalent and final system in standard form

z′(t) = (d/dt) z(t) = ( 9 z4(t) · z2(t)² ,  9 z3(t) ,  9 ( z3(t) + z4(t) · z1(t) ) ,  9 )^T ,   z(0) = ( 1, 2, 7, 4 )^T ,   t ∈ ]0, 1[

Prior to the numerical solution always transform a problem into this standard
form! 
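In code, such a standard form is simply a function returning the right-hand side; the sketch below (NumPy assumed; the helper name rhs and its (t, z) signature are conventions of this sketch) encodes the transformed system and evaluates the first slope that any one-step method of Section 3.5 would use.

```python
import numpy as np

# Right-hand side of the final standard-form system z'(t) = 9 * f_tilde(z(t)), t in ]0, 1[.
# Every IVP is brought into exactly this shape (autonomous, first order, t in ]0, 1[)
# before one of the one-step methods of Section 3.5 is applied to it.
def rhs(t, z):
    z1, z2, z3, z4 = z
    return 9.0 * np.array([z4 * z2**2, z3, z3 + z4 * z1, 1.0])

z0 = np.array([1.0, 2.0, 7.0, 4.0])        # (y1(4), y2(4), y2'(4), x = 4)
print(rhs(0.0, z0))                         # first slope used by any one-step method
```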

3.2 Summary of Important Theorems

Existence theorem of Peano


Let be U ⊆ IR × IRn domain, f ∈ C 0 (U, IRn ) and (t0 , x0 ) ∈ U .
Then the IVP (not the ODE alone!)

x ′ (t) = f (t, x(t)) , x(t0 ) = x0

has at least one (no uniqueness!) solution, which can be extended to the
boundary of U in both directions (i.e. t < t0 and t > t0 ).

Remark
”To extend to the boundary” means to come as close as we want to the bound-
ary of U : either x(t) contains the respective boundary point or ∥x(t)∥ is un-
bounded at the boundary.
To extend to the boundary does not mean that a solution exists on the total
interval [a, b]. The solution might leave U before reaching a or b. 

Definition (Lipschitz condition)


Let be U ⊆ IRn+1 and f ∈ C 0 (U, IRn ).


f (t, x) satisfies a (global) Lipschitz condition on U with respect to x with Lip-
schitz constant L , if

∃ L > 0 ∋ ∥f (t, x1 ) − f (t, x2 )∥ ≤ L∥x1 − x2 ∥ ∀ (t, x1 ), (t, x2 ) ∈ U

f (t, x) satisfies a local Lipschitz condition on U w.r.t. x, if for every (t1 , x1 ) ∈


U there exists δ1 = δ1 (t1 , x1 ) > 0 such that f (t, x) satisfies a global Lipschitz
condition on
Uδ1 ( (t1 , x1 ) ) ∩ U
with Lipschitz constant L = L(δ1 , t1 , x1 ) (the constant may differ at different
points). Uδ1 denotes the open ball with radius δ1 around the point (t1 , x1 ). 

Theorem (global Lipschitz condition, sufficient condition)


Let U ⊆ IRn+1 be a convex domain, f ∈ C¹(U, IRn). If

| ∂fi/∂xj | ≤ K ,   i, j = 1, . . . , n ,   ∀ (t, x) ∈ U

– i.e. all partial derivatives are (continuous and) bounded –, then f satisfies a
global Lipschitz condition in U (with L ≤ K · n, if for the norm we have chosen
∥ · ∥ = ∥ · ∥∞ ).

Theorem (local Lipschitz condition, sufficient condition)


Let U ⊆ IRn+1 be a domain, f ∈ C⁰(U, IRn), and let the Jacobian matrix ∂f/∂x ∈ C⁰(U, IRn×n).
Then f satisfies a local Lipschitz condition in U.

Remark: no boundedness, no convexity, less smoothness (w.r.t. t) necessary

Theorem (global existence and uniqueness)


Let be U = [a, b] × IRn domain and f ∈ C 0 (U, IRn ); let f satisfy a global Lipschitz
condition in U w.r.t. x.

For every (t0 , x0 ) ∈ U there exists exactly one solution x(t) of the IVP

x ′ (t) = f (t, x(t)) , x(t0 ) = x0

defined on the full interval a ≤ t ≤ b.


Remark
In this case it cannot happen that |x(t)| → ∞ for t → t1 ∈ ] t0 , b [ ; the solution
exists and is uniquely defined on the full interval [a, b].
Sufficient condition: f ∈ C 1 and U = Q (Q cuboid) sufficiently large. 

Theorem (local existence and uniqueness)


Let be U = [a, b] × IRn domain and f ∈ C 0 (U, IRn ); let f satisfy a local Lipschitz
condition in U w.r.t. x.

For every (t0 , x0 ) ∈ U there exists exactly one solution x(t) of the IVP

x ′ (t) = f (t, x(t)) , x(t0 ) = x0 .

This solution can be extended to the boundaries of U in both directions.

Remark
It may happen that |x(t)| → ∞ for t → t1 ∈ ] t0 , b [ . The solution exists and is
unique on an interval ]c, d [ ⊆ ]a, b[ with t0 ∈ ]c, d [ .
Sufficient condition: Jacobian w.r.t. ⃗x is continuous. 

Remark
The exact calculation of the Lipschitz constant is mostly impossible, we can
only estimate it. 

Theorem (continuous dependency of a solution)


Assumptions:

 I ⊂ IR interval, U ⊆ I × IRn
 f ∈ C 0 (U, IRn ) satisfies global Lipschitz condition in U w.r.t. x with Lipschitz
constant L.
 x ∈ C 1 (I, IRn ) is solution of IVP x(t)′ = f (t, x(t)), x(t0 ) = x0 , t0 ∈ I (∗)
 Let z ∈ C 1 (I, IRn ) denote an approximation to the solution of the IVP (∗) with

∥z(t0 ) − x(t0 )∥ ≤ γ ∧ ∥z ′ (t) − f (t, z(t))∥ ≤ δ , γ, δ > 0 const.

 graph(x(t)) ⊂ U , graph(z(t)) ⊂ U

Claim:
∥x(t) − z(t)∥ ≤ γ e^{L|t−t0|} + (δ/L) ( e^{L|t−t0|} − 1 )


3.3 Numerical Methods: Basic Idea and Notation

Definition
Numerical methods always use a discretization, i.e. we subdivide the integra-
tion interval I = [t0 , tf ]

t0 < t1 < t2 < . . . < tN = tf

The ti are the grid points, hm := tm+1 − tm is the stepsize and Ih :=


{t0 , t1 , . . . , tN } is the mesh of nodes. Grid points are also called discretization
nodes. If hm = h ∀ m, then Ih is called equidistant.
Let denote xi := x(ti ) the exact solution of the IVP x′ = f (t, x), x(t0 ) = x0 at
the point ti .
Let denote ηi := η(ti ) the approximation of the solution obtained by a numer-
ical method at the same grid point; therefore, η is only defined at t = ti , i =
0, 1, . . . , N . 

Basic idea to obtain a numerical method: formal integration

x′(t) = f(t, x(t))   ⇒   ( x(t + h) − x(t) ) / h = (1/h) ∫_{t}^{t+h} f(ξ, x(ξ)) dξ =: ∆(t, x, h) ≈ ϕ(t, x, h)

Here ϕ(t, x, h) denotes the numerical approximation to ∆(t, x, h).


ϕ(t, x, h) is the increment function; the notation is a formal one only.
For the exact solution x(t),

∆(t, x, h) = ( x(t + h) − x(t) ) / h   for h ̸= 0 ,      ∆(t, x, 0) = f(t, x)

is the exact relative increment. 

Definition (discretization method, grid function)


A discretization method for the approximation of the solution x(t) of the IVP
x′ = f (t, x), x(t0 ) = x0 , is a numerical rule that tells us how to assign a grid
function η : Ih → IRn to a mesh of nodes Ih . 

 Example
Explicit Euler:

ηm+1 = ηm + hm f (tm , ηm ) ⇒ ϕ(t, x, h) = f (t, x)


Implicit Euler:

ηm+1 = ηm + hm f (tm+1 , ηm+1 ) ⇒ ϕ(t, x, h) = f (t + h, x(t + h))

More complicated formulae are possible, e.g. in case of an equidistant grid

ηm+2 = ηm + h/2 · ( f(tm, ηm) + 2 f(tm+1, ηm+1) + f(tm+2, ηm+2) )
⇒ ϕ(t, x, h) = (1/4) ( f(t, x(t)) + 2 f(t + h/2, x(t + h/2)) + f(t + h, x(t + h)) )
or the explicit mid-point rule

ηm+2 = ηm + 2hf (tm+1 , ηm+1 ) ⇒ ϕ(t, x, h) = f (t + h/2, x(t + h/2))


Definition (one-step method)
A one-step method is a discretization method which for the calculation of ηm+1
only uses ηm , but not e.g. ηm−1 , ηm−2 , . . .
Therefore, a one-step method can be written as

 Initial value: η0 = x0
for i = 0 to N − 1 do
    ηi+1 = ηi + hi ϕ(ti, ηi, hi)
    ti+1 = ti + hi

Remark
With this definition we also can write the implicit Euler as a one-step method

ϕ(tm , ηm , hm ) := f (tm + hm , ηm + hm ϕ(tm , ηm , hm ))


Remark
If not otherwise stated, we will restrict ourselves to one-step methods! 
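A generic one-step loop and the explicit Euler increment function, as a sketch (assuming NumPy; the test problem x′ = −2x and the function signatures are choices of this sketch, not part of the notes). The printed global errors roughly halve when h is halved, matching order p = 1 discussed in the next section.

```python
import numpy as np

def one_step_solve(phi, f, t0, tf, x0, N):
    """Generic one-step method eta_{i+1} = eta_i + h*phi(t_i, eta_i, h) on an equidistant grid."""
    h = (tf - t0) / N
    t, eta = t0, np.asarray(x0, dtype=float)
    ts, etas = [t], [eta]
    for i in range(N):
        eta = eta + h * phi(t, eta, h, f)
        t = t + h
        ts.append(t); etas.append(eta)
    return np.array(ts), np.array(etas)

def euler_increment(t, x, h, f):            # explicit Euler: phi(t, x, h) = f(t, x)
    return f(t, x)

f = lambda t, x: -2.0 * x                   # test IVP x' = -2x, x(0) = 1, exact solution exp(-2t)
for N in (10, 20, 40, 80):
    ts, etas = one_step_solve(euler_increment, f, 0.0, 1.0, 1.0, N)
    err = abs(etas[-1] - np.exp(-2.0))
    print(f"N = {N:3d}   global error at t = 1: {err:.3e}")   # roughly halves with h (order 1)
```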

3.4 Consistency and Convergence of One-Step Methods

Consider the IVP x′ = f (t, x), x(t0 ) = x0 from chap. 3.1 on the closed interval
I = [t0 , tf ]. Let ϕ be a one-step method which we want to analyze.

Definition (local discretization error)


Let be η̃m+1 the result of a single (!) step of the one-step method with the exact
initial value ηm = x(tm ), i.e.

η̃m+1 = x(tm ) + hm ϕ(tm , x(tm ), hm )


Then T (tm , x(tm ), hm ) := x(tm+1 ) − η̃m+1 is called the local discretization error
of the one-step method at tm+1 .
This really is the error of the method after one step only! 

Definition (consistent method)


A method is called consistent, if the local discretization error per unit step
T (t, x, h)/h converges uniformly to zero ∀ t, x for h → 0

∥T(t, x, h)∥ / h ≤ σ(h)   ∧   lim_{h→0} σ(h) = 0   ∀ t ∈ I, ∀ x

A method is consistent of order p, if

∥T (t, x, h)∥ ≤ C · |h|p+1 =: O(hp+1 ) ∀ t ∈ I, ∀ x, ∀ h ∈ ] 0, hmax ]


The order of consistency describes the quality of the approximation and allows
to compare different discretization methods.

Theorem
ϕ consistent   ⇐⇒   lim_{h→0} ϕ(t, x, h) = f(t, x)

uniformly, i.e. ∀ t ∈ I, ∀ x and ∀ f ∈ C 1 (I × IRn , IRn ).

 Example

Because of ϕ(t, x, h) = f (t, x) the explicit Euler is consistent.


Because η̃m+1 = xm + h f(tm, xm) with (in general) f(tm, xm) ̸= 0, it is consistent of order p = 1:

T(tm, x(tm), hm) = x(tm+1) − η̃m+1 = x(tm + h) − x(tm) − h f(tm, x(tm))
  =(Taylor)  (h²/2) ( ft(tm, xm) + fx(tm, xm) · f(tm, xm) ) + O(h³)

⇒   ∥T∥/h ≤ h¹ · const = σ(h) → 0 for h → 0   ⇒   p = 1

In the autonomous case the discretization method simplifies to

ηm+1 = ηm + h f(ηm)


c Rainer Callies TUM 2020 36
Definition (global discretization error)
W.l.o.g. we simplify the situation and use a constant stepsize h, i.e. tm = t0 + m · h.
The global discretization error

e(h, t) := η(t) − x(t) for t := tm = t0 + m · h

directly describes the difference between the true solution and its numerical
approximation. Because of the use of η it is only defined at discrete values
(grid points).

Definition (convergent method)


A method is convergent, if the global discretization error e(h, t) uniformly con-
verges to zero ∀ t ∈ I for h → 0.
A method is convergent of order p if

∥e(h, t)∥ ≤ s(t) · |h|p ∀t ∈ I

and s : I → IR is a bounded function. 

Remark
In contrast to consistency it is very difficult to analyze convergence directly. 

3.5 Construction of One-Step Methods


3.5.1 Strategy
Consider the IVP from chap. 3.1 with n = 1. We want to construct a one-step
method with maximum order of consistency for a given number of evaluations
of the right-hand side f (t, x(t)).
We analyze a single integration step and carry out the Taylor expansion using
x′ = f (t, x)

x(t + h) = x(t) + h x′(t) + (h²/2) x′′(t) + (h³/6) x′′′(t) + . . . = x(t) + h · ∆(t, x, h)

∆(t, x, h) = f(t, x(t)) + (h/2) (d/dt) f(t, x(t)) + (h²/6) (d²/dt²) f(t, x(t)) + O(h³)
           = f(t, x(t)) + (h/2) ( ft(t, x(t)) + fx(t, x(t)) f(t, x(t)) )
             + (h²/6) ( ftt + 2 ftx · f + fxx · f² + fx · (ft + fx f) ) + O(h³)

If we were able to calculate the necessary derivatives of f(t, x(t)) and x(t), then we would immediately obtain a method which is consistent of order p.


As an example, by

ϕ(t, x, h) := x′(t) + (h/2) x′′(t) = f(t, x) + (h/2) ( ft(t, x) + fx(t, x) f(t, x) )

we would construct a one-step method of order p = 2, for

x(t + h) − η̃(t + h) = ( x(t) + h x′(t) + (h²/2) x′′(t) + (h³/6) x′′′(t) + . . . ) − ( x(t) + h x′(t) + (h²/2) x′′(t) ) = O(h³)

Unfortunately the explicit calculation of derivatives for real problems either is


impossible or too expensive.
We make a general ansatz for the new method instead, e.g.

ϕ(t, x, h) = α1 f (t, x) + α2 f (t + β1 h, x + β2 hf (t, x))

with the free parameters α1 , α2 , β1 , β2 .

The free parameters α1 , α2 , β1 , β2 are chosen such that we obtain a method of


maximum order of consistency. Because for the calculation of ϕ solely (t, x(t))
is needed, this ansatz again leads to a one-step method.

We now want to construct a method which is consistent of order p = 2. For that


we again carry out a two-dimensional Taylor expansion of ϕ(t, x, h) and obtain

f (t + β1 h, x + β2 h f (t, x))
   = f (t, x) + ( β1 h ∂/∂t + β2 h f (t, x) ∂/∂x ) f (t, x)
   + (1/2!) ( β1 h ∂/∂t + β2 h f (t, x) ∂/∂x )² f (t, x) + . . .
   = f + ( β1 h ft + β2 h fx f ) + (1/2) ( β1 h ∂/∂t + β2 h f ∂/∂x ) ( β1 h ft + β2 h fx f ) + . . .

With f := f (t, x) we get

ϕ(t, x, h) = α1 f + α2 f + α2 h ( β1 ft + β2 fx · f )
           + (α2 /2) h² ( β1² ftt + 2β1 β2 f ftx + β1 β2 fx ft + β2² f ² fxx + β2² f fx² ) + . . .

Now we choose the free parameters such that as many h-terms as possible
from the expansion of this ansatz ϕ(t, x, h) match those from ∆(t, x, h).
We get a method of order p = 2 if

1 = α1 + α2
1/2 = α2 β1
1/2 = α2 β2


The solution of the nonlinear system is not unique.
For the choice α1 = α2 = 1/2, β1 = 1, β2 = 1 we get the ”method of Heun”; for
α1 = 0, α2 = 1, β1 = 1/2, β2 = 1/2
we get the ”modified Euler”. Both are consistent of order p = 2.
In a similar way a method of order p = 3, . . . can be constructed, if the ansatz
contains a sufficient number of free parameters. 
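
As an illustration (an added sketch, not part of the derivation above), the two parameter choices can be implemented and their order observed numerically; the test IVP x′ = x, x(0) = 1 is an arbitrary choice, and halving h should reduce the error roughly by a factor of 4:

```python
import numpy as np

def step_heun(f, t, x, h):              # alpha1 = alpha2 = 1/2, beta1 = beta2 = 1
    k1 = f(t, x)
    k2 = f(t + h, x + h * k1)
    return x + 0.5 * h * (k1 + k2)

def step_mod_euler(f, t, x, h):         # alpha1 = 0, alpha2 = 1, beta1 = beta2 = 1/2
    return x + h * f(t + 0.5 * h, x + 0.5 * h * f(t, x))

def integrate(step, f, t0, x0, tf, n):
    h = (tf - t0) / n
    t, x = t0, x0
    for _ in range(n):
        x = step(f, t, x, h)
        t += h
    return x

f = lambda t, x: x                      # x' = x, exact solution exp(t)
for n in (10, 20, 40):
    print(n, abs(integrate(step_heun, f, 0.0, 1.0, 1.0, n) - np.e),
             abs(integrate(step_mod_euler, f, 0.0, 1.0, 1.0, n) - np.e))
```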

3.5.2 Explicit Runge-Kutta Methods


We obtain an important class of methods by the Runge-Kutta ansatz
 

ϕ(t, x, h) = ∑_{k=1}^{s} bk · fk (t, x, h) ,    fk (t, x, h) := f( t + ck h, x + h ∑_{j=1}^{k−1} αkj fj (t, x, h) )

with the free parameters bk , αkj , ck . 

The definition specifies an explicit Runge-Kutta method with s stages. For e.g.
s = 4 this method is called RK4 method. The method is called explicit, because
fk can be calculated using f1 , . . . , fk−1 only; these values have been calculated
before and thus are already known.

Algorithm
In the RK method per integration step t → t + h (or tm → tm+1 = tm + hm ) the
following algorithm is executed
 

fk (t, η(t), h) := f( t + ck h, η(t) + h ∑_{j=1}^{k−1} αkj fj (t, η(t), h) ) ,    k = 1, . . . , s ,

η(t + h) = η(t) + h ∑_{k=1}^{s} bk · fk (t, η(t), h)

In a compact way the parameter set for an s-stage explicit Runge-Kutta method
can be arranged in a Butcher tableau

0 0
c2 α21 0
c3 α31 α32 0
.. .. .. . . ..
. . . . .
cs αs1 αs2 · · · αs,s−1 0
b1 b2 ··· bs−1 bs


with the nodes (0, c2 , . . . , cs )T , the Runge-Kutta matrix A := (αkj ) and the
weights (b1 , b2 , . . . , bs )T .
These s(s + 1)/2 parameters mostly are determined such that the resulting ex-
plicit RK method has maximum order of consistency.

Remark
A nice insight into the basic ideas of RK methods is obtained if we use the
following equivalent reformulation of the classical RK method (i.e. a special
RK4 method):
ηm+1 = ηm + (h/6) (k1 + 2k2 + 2k3 + k4 ) ,    tm+1 = tm + h
with
k1 = f (tm , ηm ),
k2 = f (tm + h/2, ηm + (h/2) k1 ),
k3 = f (tm + h/2, ηm + (h/2) k2 ),
k4 = f (tm + h, ηm + h k3 ) .
and the Butcher tableau
0
1/2 1/2
1/2 0 1/2
1 0 0 1
1/6 1/3 1/3 1/6

Here a sequence of four Euler steps with stepsizes ci h is performed, all starting
at ηm . After the i-th Euler step (i = 1, 2, 3, 4), the (t, x)-values of the resulting
point are used to calculate an updated slope ki specified by the right-hand side
f (t, x) of the differential equation. This slope is used for the next Euler step in
case of i = 1, 2, 3.
At the end, a final Euler step is performed with stepsize h and a slope which is
the weighted average of the four slopes ki calculated before.
In averaging the four increments, greater weight is given to the increments at
the midpoint. If f is independent of x, then the differential equation is equivalent
to a simple integral and the classical RK4 method reduces to Simpson’s rule
(Keplersche Fassregel). 
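
The algorithm of this chapter can be coded directly from the Butcher tableau. The following Python sketch of one general explicit RK step (an added illustration; the right-hand side below is an arbitrary test case) uses the coefficients of the classical RK4 method:

```python
import numpy as np

def explicit_rk_step(f, t, x, h, A, b, c):
    """One step of an explicit s-stage RK method (A strictly lower triangular)."""
    s = len(b)
    k = [None] * s
    for i in range(s):
        xi = x + h * sum(A[i][j] * k[j] for j in range(i))
        k[i] = f(t + c[i] * h, xi)
    return x + h * sum(b[i] * k[i] for i in range(s))

# Butcher tableau of the classical RK4 method
A = [[0, 0, 0, 0], [1/2, 0, 0, 0], [0, 1/2, 0, 0], [0, 0, 1, 0]]
b = [1/6, 1/3, 1/3, 1/6]
c = [0, 1/2, 1/2, 1]

x1 = explicit_rk_step(lambda t, x: -x, 0.0, np.array([1.0]), 0.1, A, b, c)
print(x1, np.exp(-0.1))    # the two values agree up to O(h^5) after one step
```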

Order conditions
After performing the Taylor expansion the coefficients of the Taylor series of
ϕ(t, x, h) and ∆(t, x, h) are compared. The goal is to choose the free para-
meters such that as many h-terms as possible from the expansion of ϕ(t, x, h)
match those from ∆(t, x, h). An example was given in chap. 3.5.1. For the RK
methods, this approach leads to the following order conditions


Figure 6: Classical RK4 method generating a sequence of slopes.

order p     order conditions

  1         ∑_i bi = 1
  2         ∑_i bi ci = 1/2
  3         ∑_i bi ci² = 1/3
            ∑_{i,j} bi αij cj = 1/6
  4         ∑_i bi ci³ = 1/4
            ∑_{i,j} bi ci αij cj = 1/8
            ∑_{i,j} bi αij cj² = 1/12
            ∑_{i,j,k} bi αij αjk ck = 1/24
  ...       ...

Remark
The first condition guarantees that the method is consistent at all: Because of
ϕ(t, x, h) = ∑_{k=1}^{s} bk · fk (t, x, h) we get

h → 0   ⇒   η(t + h) → η(t)   ⇒   ( ϕ(t, η(t), h) = f (t, η(t)) ∀ f   ⇐⇒   ∑_{i=1}^{s} bi = 1 )


Remark
In addition we often want that the node condition is satisfied:

ci = ∑_{j=1}^{i−1} αij

This condition guarantees that we get the same numerical results no matter
whether the RK method is applied to a non-autonomous IVP or to the same problem
after transformation into autonomous form. 

The numerical effort – an overview


The following table lists the minimum number of stages smin (and with it in most
cases the number of function evaluations) necessary to construct a RK method
of order p. N denotes the number of order conditions that have to be fulfilled.

p 1 2 3 4 5 6 7 8
N 1 2 4 8 17 37 85 200
smin 1 2 3 4 6 7 9 11

RK methods of order p = 2, 4, 5, 8 are often used in practical applications.

 Example
3/8-rule of Kutta: RK method of order p = 4

0
1/3 1/3
2/3 −1/3 1
1 1 −1 1
1/8 3/8 3/8 1/8

Classical RK method: RK method of order p = 4

0
1/2 1/2
1/2 0 1/2
1 0 0 1
1/6 1/3 1/3 1/6

The method of Heun – which we have already discussed – is an RK method of
order p = 2. 


3.6 Stepsize Control for One-Step Methods
3.6.1 Basic problem and Solution Strategy
Basic problem

A one-step method calculates in one integration step the solution η(tm+1 ) at


t = tm+1 using (tm , η(tm )) only

tm → tm+1 = tm + hm , tm , tm+1 ∈ ]t0 , tf [

The selection of the correct stepsize hm plays a crucial role.

If hm is too small, then

– the discretization error (→ truncated Taylor expansion) ∼ O(h^{p+1}) is very small,
– the computational effort for the interval [t0 , tf ] is high because of many steps,
– the rounding error is high because of many steps.

If hm is too large, then the inverse statements are true.
If hm is chosen automatically, a compromise is required between the total error
of the result and the computational effort.

We have almost no access to the rounding errors. We do not know the total
discretization error that describes the difference between the true solution and
its numerical approximation (convergence!) at the grid points ti .
We only can get an estimate of the local discretization error (= error per step,
consistency). That is not much, but unfortunately that mostly is all we have!

So we choose a rough estimate for the local discretization error (our tolerance
tol); subject to that constraint we try to maximize hm (→ rounding error and
computational effort decrease). ”Rough estimate” is meant literally: Often the
required tolerance tol is only poorly approximated.
Instead of reaching ∥T (tm , x(tm ), hm )∥ < tol we alternatively try to control the
local discretization error per step length

∥T (tm , x(tm ), hm )∥ / hm < tol2

Solution strategy

To obtain the local discretization error we have to compare after each step
tm → tm+1 the numerical result η(tm+1 ) with the exact solution of the IVP for
the same initial value η(tm ). The exact solution unfortunately is unknown.
To overcome this difficulty, the following workaround is used: We calculate the
step tm → tm+1 twice with different accuracy and then we use the numerical
approximation of higher precision as a substitute for the exact but unknown
solution.


There are two standard ways to calculate these two approximations with differ-
ent accuracy:
Either to take one method only and calculate the solution with the two stepsizes
hm (1 integration step) and hm /2 (two succeeding integration steps)
or to choose two different methods with different orders of consistency and to
perform one integration step each with the same stepsize hm .

The mathematical justification of these strategies is given by the theorem on


the asymptotic expansion of the global discretization error for one-step meth-
ods (Gragg’s theorem).

Gragg’s theorem
Assumption:
 Let be f ∈ C^{N+1} ([a, b] × IRn , IRn ), t0 ∈ [a, b], x′ = f (t, x), x(t0 ) = x0 .
 Let denote ϕ the increment function for a one-step method of order p.
 The stepsize is assumed to be constant: h = hm ∀ m ⇒ h = (tm − t0 )/m
Claim:

η(tm , h) = x(tm ) + ∑_{i=p}^{N} h^i ei (tm ) + h^{N+1} EN +1 (tm , h)    ∀ m

and we get in addition:


ei (t0 ) = 0 , i = p, . . . , N , the residual term EN +1 (tm , h) is bounded ∀ h ≤ H with
H properly chosen and the ei (tm ) are independent of h!

Remark
The most important feature is: ei (tm ) is independent of h!!
Gragg’s theorem guarantees the existence of an asymptotic expansion of the
global discretization error (→ convergence). 

3.6.2 One Method, Two Different Stepsizes


The local discretization error (error per step) should be smaller than a given
tolerance tol. We investigate one step; instead of t0 we might also write tm .
We apply Gragg’s theorem to a method which is convergent of order p; instead
of h we write hold , because here we insert our (old) estimate for the proper
stepsize h (and want to find out, whether this estimate is good enough or not):
η(t0 + hold , hold )     ≐  x(t0 + hold ) + hold^p · ep (t0 + hold )
η(t0 + hold , hold /2)  ≐  x(t0 + hold ) + (hold /2)^p · ep (t0 + hold )


We subtract the two equations, make a Taylor expansion and use ep (t0 ) = 0;
then the following approximation of e′p (t0 ) can be calculated numerically
( η(t0 + hold , hold ) − η(t0 + hold , hold /2) ) / ( hold^p (1 − 1/2^p ) ) ≐ ep (t0 + hold )
     = ep (t0 ) + e′p (t0 ) · hold = e′p (t0 ) · hold        (∗)    (using ep (t0 ) = 0)

The term e′p (t0 ) is independent of h and hold respectively!
Now we apply the Taylor expansion once more and directly to Gragg’s theorem
with the new and improved stepsize hnew :

η(t0 + hnew , hnew ) = x(t0 + hnew ) + hnew^p ep (t0 + hnew ) + hnew^{p+1} ep+1 (t0 + hnew ) + O(hnew^{p+2})
          (Taylor)   = x(t0 + hnew ) + hnew^p ( ep (t0 ) + hnew e′p (t0 ) ) + hnew^{p+1} ep+1 (t0 ) + O(hnew^{p+2})
                     ≐ x(t0 + hnew ) + hnew^{p+1} e′p (t0 )

For the local discretization error we then obtain the requirement

∥η(t0 + hnew , hnew ) − x(t0 + hnew )∥ ≐ hnew^{p+1} ∥e′p (t0 )∥ ≤ tol        (∗∗)
Algorithm 4 (next page) gives a stepsize control based on (∗), (∗∗). 

Remark

 Important feature: e′p (t0 ) is independent of hold and hnew ; that is the only
reason why a relation between (∗) and (∗∗) can be established.
 The stepsize control with two stepsizes can be easily understood, but it is
not often used: too many function evaluations. 

3.6.3 Two Different Methods, One Stepsize


We use two methods of different orders of convergence p, p + 1 and one com-
mon stepsize h. The mathematical derivation is similar to the case above and
also uses Taylor expansions. This method can be efficiently programmed: often
only one additional function evaluation is necessary.
And here are the details:
η(t0 + h) = x(t0 + h) + h^p ep (t0 + h) + h^{p+1} ep+1 (t0 + h) + O(h^{p+2})
η̂(t0 + h) = x(t0 + h) + h^{p+1} êp+1 (t0 + h) + O(h^{p+2})

For the difference we get

∥η(t0 + h) − η̂(t0 + h)∥ ≐ h^p ∥ep (t0 + h)∥ + . . . ≐ h^p · h · ∥e′p (t0 )∥ ≤ tol

Setting an ”=”-sign instead of the ”≤”-sign we finally obtain

hnew = α · hold · ( tol / ∥η(t0 + hold ) − η̂(t0 + hold )∥ )^{1/(p+1)}

For the safety factor we choose e.g. α = 0.9. 


Algorithm 4: Stepsize control with two different stepsizes

 Start: Choose h = hold , e.g. from the preceding step, i.e.


tm−1 → tm = tm−1 + hold
for m = 0, 1, . . . N − 1 do
Stepsize control for the step: tm → tm+1 > tm
(A) Step tm → tm+1 := tm + hold : calculate η(tm+1 , hold ) and
η(tm+1 , hold /2) from η(tm )
(B) Calculate approximation for e′p (tm ) as in (∗)
(C) Insert result into (∗∗) and calculate hnew :

hnew^{p+1} ≤ tol / ∥e′p (tm )∥ ≐ tol · hold^{p+1} (1 − 1/2^p ) / ∥η(tm + hold , hold ) − η(tm + hold , hold /2)∥

We calculate the (p + 1)-th root and get hnew . Now we can check a
posteriori whether the original stepsize selection was correct or not.

if hold > 2hnew then


Stepsize estimate was wrong:
Define hold := 1.5 · hnew , goto (A), repeat integration step

else
tm+1 := tm + hold
η(tm+1 ) := η (tm + hold , hold /2)
hold := min {hnew , tf − tm+1 }

Remark
This type of stepsize control (2 methods, 1 stepsize) is often used in RK meth-
ods (idea of Fehlberg). The methods are then denoted e.g. by RKF 4(5) and
RKF 8(7). The method RKF 4(5) is consistent of order 4 with an embedded
error estimator of order 5 (→ order of the error estimator written in the bracket).

0
c2 α21
c3 α31 α32
.. .. .. . .
. . . .
cs αs1 αs2 · · · αs,s−1
b1 b2 · · · bs−1 bs
b̂1 b̂2 ··· b̂s−1 b̂s

Both methods are constructed simultaneously by an extended Butcher tableau.


To obtain a sufficient number of free parameters for both methods, often one
additional stage in the tableau is enough. 
Remark
In deviation from the strict theory, often the result of the better method is used
as the initial value for the next integration step (e.g. RKF 8(7)). By this a small
gain in precision is obtained without additional effort. 
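
A minimal Python sketch of this idea (an added illustration, not the RKF 4(5) coefficients): the explicit Euler of order 1 is embedded in Heun's method of order 2, the two share the evaluation k1, and the better result is used to continue, as in the remark above; tolerance and initial stepsize are arbitrary choices.

```python
import numpy as np

def heun_euler_step(f, t, x, h):
    """Embedded pair: Heun (order 2) and explicit Euler (order 1), sharing k1."""
    k1 = f(t, x)
    k2 = f(t + h, x + h * k1)
    return x + 0.5 * h * (k1 + k2), x + h * k1        # (higher, lower) order result

def integrate_adaptive(f, t0, x0, tf, tol=1e-6, h=0.1, alpha=0.9, p=1):
    t, x = t0, np.asarray(x0, dtype=float)
    while t < tf:
        h = min(h, tf - t)
        x_high, x_low = heun_euler_step(f, t, x, h)
        err = np.linalg.norm(x_high - x_low)          # estimate of the local error
        h_new = alpha * h * (tol / max(err, 1e-16)) ** (1.0 / (p + 1))
        if err <= tol:                                # accept the step, continue with x_high
            t, x = t + h, x_high
        h = h_new                                     # new stepsize in either case
    return x

print(integrate_adaptive(lambda t, x: -x, 0.0, [1.0], 2.0), np.exp(-2.0))
```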

 Example
Three-body problem: Simulation of the motion of a rocket in the gravitational
fields of Earth and Moon. Start from Earth orbit, flight to Moon, one revolution
around the Moon, flight back to Earth and arrival in Earth orbit again (Apollo
13 type mission). The final accuracy describes the deviation at the end point.

Method No. of steps Final accuracy


Euler 24000 > 100
RK4 6000 ≈ 3 · 10−1
RKF 5(4) 98 10−3
RKF 8(5,3) 102 10−6

3.7 Relation Between Convergence and Consistency


For explicit one-step methods the following property holds:
order of consistency = order of convergence
Let us investigate that in detail. The direction ”order of convergence p ⇒ order
of consistency p” is directly contained in the theorem of Gragg. The opposite
direction is still missing.

Theorem (consistency ⇒ convergence)


Assumptions:
Let x(t) solve the IVP x′ = f (t, x), x(t0 ) = x0 .
For the numerical solution let us use the general following one-step method
η0 := x0
ηk := ηk−1 + hk−1 ϕ(tk−1 , ηk−1 , hk−1 , f )
tk := tk−1 + hk−1 , k = 1, . . . , N
Furthermore, let f fulfill a global Lipschitz condition in x
∥f (t, x) − f (t, z)∥ ≤ L∥x − z∥ ∀ x, z ∈ IRn , ∀ t ∈ I
Claim:
∥T (t, x, h)∥ / h ≤ σ(h)   ∀ x, ∀ t ∈ I, ∀ h ∈ ] 0, H[    ∧    lim_{h→0} σ(h) = 0

=⇒   ∥x(tk ) − ηk ∥ ≤ σ(hmax ) · ( e^{L|tk −t0 |} − 1 ) / L      for k = 1, . . . , N


Proof: We assume w.l.o.g. t0 < tk , hk > 0 ∀ k and consider exact solutions
zk (t) for the IVP with modified initial values

zk (tk ) = ηk , zk′ = f (t, zk (t)) ⇒ z0 (t) = x(t)

Using the triangle inequality and the theorem on the continuous dependency
of a solution from the initial values we get

∥x(tn ) − ηn ∥ = ∥z0 (tn ) − zn (tn )∥ ≤ ∑_{k=1}^{n} ∥zk−1 (tn ) − zk (tn )∥

   ≤ ∑_{k=1}^{n} ∥zk−1 (tk ) − zk (tk )∥ e^{L|tn −tk |}          (note zk (tk ) = ηk )

   = ∑_{k=1}^{n} ∥zk−1 (tk ) − ( ηk−1 + hk−1 ϕ(tk−1 , ηk−1 , hk−1 , f ) )∥ e^{L|tn −tk |}

   = ∑_{k=1}^{n} hk−1 · ( ∥zk−1 (tk ) − zk−1 (tk−1 ) − hk−1 ϕ(tk−1 , ηk−1 , hk−1 , f )∥ / hk−1 ) · e^{L|tn −tk |}

   ≤ ∑_{k=1}^{n} hk−1 σ(hk−1 ) e^{L|tn −tk |} ≤ σ(hmax ) ∑_{k=1}^{n} ∫_{tk−1}^{tk} e^{L(tn −ξ)} dξ

   = σ(hmax ) ∫_{t0}^{tn} e^{L(tn −ξ)} dξ    ⇒    claim

We estimated hk−1 eL|tn −tk | by the integral, for eL|tn −tk | ≤ eL|tn −ξ| ∀ ξ ∈ [tk−1 , tk ].


3.8 Stiff ODEs


 Introductory Example
 
( x′1 )      (  (λ1 + λ2 )/2   (λ1 − λ2 )/2  )  ( x1 )
(     )  =   (                               )  (    )  ,    f (t, x) := A x ,    λ1 , λ2 < 0
( x′2 )      (  (λ1 − λ2 )/2   (λ1 + λ2 )/2  )  ( x2 )
   =: x′                  =: A                    =: x

General analytic solution:

x1 (t) = c1 eλ1 t + c2 eλ2 t


x2 (t) = c1 eλ1 t − c2 eλ2 t

First we choose e.g. the explicit Euler as our solution method and obtain

ηi+1 = ηi + hi f (ti , ηi ) = ηi + hi Aηi = (I + hi A)ηi ,    ηi ∈ IR2

and obtain in case of h = hi ∀i the numerical solution (proof by induction)

η1,i = c1 (1 + hλ1 )^i + c2 (1 + hλ2 )^i
η2,i = c1 (1 + hλ1 )^i − c2 (1 + hλ2 )^i


The numerical approximation converges to 0 for i → ∞ only if |1 + hλ1 | < 1 ∧
|1+hλ2 | < 1; only in this case the numerical approximation matches the analytic
solution in the limit.
We will see later on: The analytic solution is asymptotically stable, therefore the
ODE problem is called stiff. The numerical solution ”explodes”, if the stepsize
is too large.

Let us now choose λ2 ≪ λ1 (e.g. λ1 = −1 and λ2 = −1000). Then for e.g.
t ≥ 0.1 the component e^{λ2 t} does ”not” contribute to the numerical solution
(e^{−100} ≈ 3 · 10^{−44}), but nevertheless it determines and reduces the stepsize
of the integrator:
|1 − 1000h| < 1 ⇒ h < 0.002

If we use the implicit Euler instead

ηi+1 = ηi + hi f (ti+1 , ηi+1 ) = ηi + hi Aηi+1 ,

then for h = hi ∀i we get the numerical solution (proof by induction)


η1,i = c1 / (1 − hλ1 )^i + c2 / (1 − hλ2 )^i
η2,i = c1 / (1 − hλ1 )^i − c2 / (1 − hλ2 )^i
We get ηi → 0 for i → ∞ ∀ h > 0, since |1 − λi h| > 1 always holds because of
the assumption λi < 0, i = 1, 2. There is no stepsize restriction for the implicit
method. 
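
The different behaviour of the two methods is easy to reproduce; a small Python sketch (an added illustration with c1 = c2 = 1, i.e. x0 = (2, 0)^T, and λ1 = −1, λ2 = −1000):

```python
import numpy as np

lam1, lam2 = -1.0, -1000.0
A = 0.5 * np.array([[lam1 + lam2, lam1 - lam2],
                    [lam1 - lam2, lam1 + lam2]])
x0 = np.array([2.0, 0.0])                  # c1 = c2 = 1
I = np.eye(2)

def euler_explicit(h, n):
    x = x0.copy()
    for _ in range(n):
        x = (I + h * A) @ x
    return x

def euler_implicit(h, n):
    x = x0.copy()
    M = np.linalg.inv(I - h * A)           # constant matrix, invert once
    for _ in range(n):
        x = M @ x
    return x

for h in (0.0021, 0.0019):                 # just above / below the bound h < 0.002
    print("explicit Euler, h =", h, euler_explicit(h, int(1.0 / h)))
print("implicit Euler, h = 0.1:", euler_implicit(0.1, 10))
```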

 Repetition: stability of linear ODEs


Consider the linear system with constant coefficients

x′ (t) = Ax(t) , x(t0 ) = x0 , A ∈ IRn×n

The system is stable, if for all EWs λi of A: Re(λi ) ≤ 0 and in case of Re(λi ) = 0
the algebraic and the geometric multiplicity of λi are equal (i.e. the EW λi has
multiplicity ki and ki linearly independent EVs).
The system is exponentially and thus also asymptotically stable, if Re(λi ) < 0
is true for all EWs λi of A. 

 How to characterize stiff ODEs?


Stiff ODE systems contain at least one asymptotically stable component. Here
perturbations in the initial conditions are rapidly damped.
From the theorem on the continuous dependency of a solution from the initial
values we get the (slightly simplified) criterion

(tf − t0 ) ∥fx (t, x)∥ ≤ L (tf − t0 )   with   L (tf − t0 ) ≫ 1



Figure 7: Examples of asymptotically stable (le., with large deviations in the initial val-
ues) and instable (ri., after small perturbations from the nominal trajectory) behaviour
of the solutions of an ODE, nominal solution is marked in blue.

 How to test solution methods for stiff ODEs?


Starting with the above characterization of stiff ODEs, we want to develop very
simple ODEs which may serve us to decide whether a numerical method is
suited for the solution of stiff ODEs.

Let be x′ = f (x) an (autonomous) ODE system with the exact solution x(t) and
the initial value x(t0 ) = x0 . Let be v(t) another solution of the same ODE, but
for slightly modified initial values. By Taylor expansion we get
v ′(t) = f (v(t)) ≐ f (x(t)) + fx (x(t)) · (v(t) − x(t))

Simplifying assumption 1:
fx (x(t)) is only changing slowly , i.e. fx (x(t)) ≈ const ≈ J. For e(t) := v(t) − x(t)
we get the new ODE e′ (t) = Je(t). We investigate the difference e(t) to obtain
the asymptotic behaviour sketched in Fig. 7.
⇒ 1st test ODE: x′ (t) = Ax(t) , x(0) = x0 , A ∈ IRn×n

Simplifying assumption 2:
By a similarity transformation, in special cases J can be transformed to diago-
nal form: ∃ Q ∋ Q−1 JQ = diag(λ1 , . . . , λn ). We define p(t) by p(t) := Q−1 e(t).
Then the 1st test ODE decomposes into the following scalar ODEs

Q−1 e′ (t) = Q−1 JQ · Q−1 e(t) ⇒ p′i (t) = λi pi (t) , i = 1, . . . , n

If J can be transformed to diagonal form, then the λi ∈ C are the EWs. Because
a stiff system is characterized by asymptotic stability, we choose Re(λ) < 0
⇒ 2nd test ODE: x′ (t) = λx(t) , x(0) = x0 , Re(λ) < 0 (Dahlquist 1963)

The two simplifying assumptions preserve the fundamental properties of the


ODE system!


There might exist stiff ODEs that do not satisfy the simplifying assumptions.
A method which has been tested to work for Dahlquist’s ODE not necessarily
works well for these ODEs.

Requirements for a good numerical method:


The exact solution of the scalar system is x(ti + h) = e^{hλ} x(ti ).
The numerical solution should coincide as well as possible – but at least qual-
itatively – with the true and exact solution. The minimum requirement is

|x(ti + h)| ≤ |x(ti )| ∀ h     ∧     lim_{h·Re(λ)→−∞} x(ti + h) = 0

Definition (stability function R(z))


The stability function is the function that allows an equivalent formulation of the
numerical one-step method under consideration applied to the 2nd test ODE

ηi+1 = R(hλ)ηi

Here we use the special argument z = hλ in R(z).


Thus the stability function R(z) is defined via the numerical solution after one
step for the 2nd test ODE

x′ (t) = λx(t) , x(0) = ηi , z = hλ

Alternatively, we can use the 1st test ODE and analogously define R(z) by

ηi+1 = R(hA)ηi

 Example

Explicit Euler: R(z) = 1 + z

ηi+1 = ηi + hf (t, ηi ) = ηi + hAηi = (I + hA)ηi

We substitute hA → z, interpret the identity matrix as ”1” and obtain R(z) = 1 + z.
We would obtain the same formula if we directly insert the 2nd test ODE.
Implicit Euler: R(z) = 1/(1 − z)
For the implicit Euler applied to the 1st test ODE we obtain

ηi+1 = ηi + hAηi+1 ⇔ (I − hA)ηi+1 = ηi ⇔ ηi+1 = (I − hA)−1 ηi


Trapezoidal rule: R(z) = (2 + z)/(2 − z)

From the original definition of the trapezoidal rule applied to the 1st test ODE,
we obtain the stability function R(z) analogously to the implicit Euler

ηi+1 = ηi + (h/2) ( f (ti , ηi ) + f (ti+1 , ηi+1 ) ) = ηi + (h/2) ( Aηi + Aηi+1 )
⇒ ( I − (h/2) A ) ηi+1 = ( I + (h/2) A ) ηi
⇒ ηi+1 = (2I − hA)^{−1} (2I + hA) ηi


 Observation
For (almost) all methods we get:
If the method is explicit, then R(z) is a polynomial;
if the method is implicit, then R(z) is a rational function. 
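
With the stability functions above, the stability domains (cf. fig. 8) can be explored numerically by sampling |R(z)| on a grid in the complex plane; a short Python sketch (an added illustration):

```python
import numpy as np

R = {
    "explicit Euler": lambda z: 1 + z,
    "classical RK4":  lambda z: 1 + z + z**2 / 2 + z**3 / 6 + z**4 / 24,
    "implicit Euler": lambda z: 1 / (1 - z),
    "trapezoidal":    lambda z: (2 + z) / (2 - z),
}

# grid in the complex plane (600 points per direction avoids hitting poles exactly)
re, im = np.meshgrid(np.linspace(-4, 2, 600), np.linspace(-3, 3, 600))
z = re + 1j * im
for name, Rz in R.items():
    stable = np.abs(Rz(z)) < 1              # domain of absolute stability
    frac_left = stable[re < 0].mean()       # fraction of the sampled left half-plane
    print(name, round(float(frac_left), 3)) # 1.0 for the two implicit (A-stable) methods
```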

Theorem
Let be A ∈ IRn×n diagonalizable: Q−1 AQ = D = diag(λ1 , . . . , λn ). Let us define a
numerical method by ηi+1 = R(hA)ηi with R rational function and assume that
Re(λi ) < 0 ∀ i, i.e. for all EWs of A.

Then ξi := Q−1 ηi satisfies the recursion ξi+1 = R(hD)ξi ; in addition, for h > 0
the so-defined numerical method converges as required

ηj → 0 for j → ∞ ⇐⇒ |R(hλi )| < 1 ∀ i = 1 . . . n

Definition (stability)
A numerical method defined by ξi+1 = R(hA)ξi is called

absolutely stable :⇔ |R(z)| < 1 ∀ z with Re(z) < 0
A-stable :⇔ |R(z)| ≤ 1 ∀ z with Re(z) ≤ 0
L-stable :⇔ A-stable and in addition lim_{Re(z)→−∞} R(z) = 0

The set SR := {z ∈ C | |R(z)| ≤ 1} is called stability domain and


MR := {z ∈ C | |R(z)| < 1} is called domain of absolute stability of the method.

Remark
The larger the set MR ∩ C − , the better a method is suited for the treatment of
stiff ODEs.
For MR ⊇ C − = {z ∈ C | Re(z) < 0} the method is absolutely stable.
If |R(z)| < 1 for z = hλ, then if the neg. real part of λ increases, the stepsize h
has to decrease to obtain the same value of the stability function R(z). 


 Example (stability domains)

Explicit Euler: M_{1+z} = {z ∈ C | |1 + z| < 1}

Implicit Euler: M_{1/(1−z)} = {z ∈ C | |1 − z| > 1}, i.e. the implicit Euler is absolutely
stable.

Classical explicit Runge-Kutta method RK4: R(z) = 1 + z + z²/2 + z³/6 + z⁴/24

[Sketch in the complex plane (Re(z), Im(z)): nested stability domains of explicit RK methods with p = s = 1 (explicit Euler), p = s = 2, 3, 4.]

Figure 8: Stability domains of explicit RK methods of order p with s stages. z = hλ ⇒
on the boundary of the stability domain: if |λ| increases, then h has to decrease.

Implicit s-stage Runge-Kutta methods (IRK)


 

ϕ(t, x, h) = ∑_{k=1}^{s} bk · fk (t, x, h) ,    fk (t, x, h) := f( t + ck h, x + h ∑_{j=1}^{s} αkj fj (t, x, h) )

These methods have excellent stability properties, but they are rather expen-
sive numerically: In each integration step a system of nonlinear equations of
dimension (n · s) has to be solved → O(n3 s3 ) operations!
Example: A Radau-IIA method of order p = 2s − 1 is L-stable, e.g.

1/3 5/12 −1/12


1 3/4 1/4
3/4 1/4

 How to detect stiffness in an IVP for ODEs


Either analyze the local stability of the system after linearization or use a good
explicit integrator first. If the stepsize h → 0, switch to a stiff integrator. 


4 Finite Differences
4.1 One-Dimensional Model Problem
4.1.1 Model problem
In an experiment elevation data along a mountain path are measured by GPS.
Let the data be superimposed by heavy noise due to low signal strength. How
to get a ”reasonable” altitude profile of the terrain structure?


Figure 9: Measured (red) and true elevation profile of the path.

 Simple solution idea


The measured elevation data (red) are interpolated by a spline f : [0, L] → IR.

We want to determine a smoothing curve u : [0, L] → IR, which is close to f


(i.e. |u(x) − f (x)| is small ∀ x ∈ [0, L]) but not noisy (i.e. |u′ (x)| small).

This leads us to the following objective function for a minimization problem


I(u) = ∫_0^L [ (u(x) − f (x))² + β (u′(x))² ] dx  =  min!

with β > 0 constant and chosen properly. This is not a finite-dimensional problem of
nonlinear optimization, because u(x) has to be determined at infinitely many
x-values. For sake of simplicity let the altitude at the initial and the final point
(u(0) = f (0) and u(L) = f (L)) be exactly known/measured.

 Mathematical realization of that idea


For the solution we use the method of Lagrange: For an arbitrarily chosen
function η ∈ C 1 ([0, L], IR) with η(0) = 0 = η(L) (∗) we embed the optimal solution
u into and compare it with the one-dimensional set of functions v := u + εη for
ε ∈ [−ε0 , ε0 ]. We have chosen (∗) because the values at the endpoints are
prescribed and that has to be true also for all possible solution candidates.


If u is an optimum, then the following necessary condition holds
dJ(ε)/dε |_{ε=0} = 0    with    J(ε) := I(u + εη)

We differentiate I(v) with respect to ε and get (using chain rule ”chain” and
integration by parts ”p.I.” on η ′ )

0 = dJ(ε)/dε |_{ε=0} = dI(u + εη)/dε |_{ε=0}

  = d/dε ( ∫_0^L (v(x) − f (x))² + β(v′(x))² dx ) |_{ε=0}

  (chain) = ∫_0^L ( 2(u(x) − f (x)) · η(x) + 2β u′(x) · η′(x) ) dx

  (p.I.)  = 2 ∫_0^L ( (u(x) − f (x)) · η(x) − β u′′(x) · η(x) ) dx + 2β u′(x)η(x) |_{x=0}^{x=L}      (∗)

The second term in (∗) vanishes, because η(0) = 0 = η(L). We apply the Fun-
damental lemma (see below) to the integral and obtain, that a necessary con-
dition for an optimum is that u solves the following boundary value problem
(BVP)
Problem (P ):
−βu′′ (x) + u(x) = f (x) , x ∈ ]0, L[
u(0) = f (0)
u(L) = f (L)

The Fundamental lemma could be applied because the integral has to be zero
for every choice of such a test function η:

Fundamental lemma
Let be G ∈ C 0 ([a, b], IR), η ∈ C 1 ([a, b], IR) with η(a) = η(b) = 0.
If ∀ η :  ∫_a^b η(x)G(x) dx = 0, then it follows: G(x) ≡ 0 .

Remark
The boundary conditions in our example are u(0) = f (0) and u(L) = f (L), the
function values are prescribed. That type of boundary condition is called Dirich-
let condition.
If e.g. u(L) = f (L) is omitted, then the new and special boundary condition
u′ (L) = 0 is necessary to fulfill (∗). If the derivative with respect to the exterior
normal to the boundary (here in 1 D i.e. the ordinary derivative) is given, that
type of boundary condition is called Neumann condition. 


Remark
From problem (P ) we see that solutions u have to be at least in C 2 ([0, L], IR)!
For f a continuous approximation by a polygon is sufficient. 

4.1.2 Numerical Approximation by Finite Differences


On [0, L] we define a mesh with N gridpoints and a mesh size h := L/(N − 1)

Ωh := {xi | xi = (i − 1) · h, i = 1, . . . , N }

and want to approximate the exact values u(xi ) by Ui ≈ u(xi ).

To obtain the Ui , the derivative u′′ (x) is approximated by the difference quotient
(Taylor expansion!)
u′′(x) = ( u(x + h) − 2u(x) + u(x − h) ) / h² + O(h²)
This Taylor expansion is possible only for u ∈ C 4 ([0, L], IR)!

Insertion into the BVP (P ) gives the new discretized problem


Problem (Ph ):
Consider the function f ∈ C 0 ([0, L], IR) and choose β > 0. Let be N ∈ IN and
h := L/(N − 1).
Determine Ui , i = 1, . . . N, such that
− (β/h²) (Ui+1 − 2Ui + Ui−1 ) + Ui = f (xi ) ,    i = 2, . . . , N − 1,
U1 = f (x1 )
UN = f (xN )


In matrix notation (Ph ) can be stated as a sparse linear system

     ( −(β/h²) · T + I ) U = F ,    U = (U1 , . . . , UN )^T ,   F = ( f (x1 ), . . . , f (xN ) )^T ,

with the identity I ∈ IR^{N×N} and the matrix T ∈ IR^{N×N} that carries the stencil
(1, −2, 1) centered on the diagonal in the rows i = 2, . . . , N − 1, while its first and
its last row are zero (these two rows of the system only enforce U1 = f (x1 ) and
UN = f (xN )).

Remark (again)
We have obtained the difference approximation of u′′ (x) using Taylor’s theorem.
From the first nonvanishing term of the error we see that the approximation
quality O(h2 ) stated there is valid only for u ∈ C 4 ([0, L], IR)!!
Therefore implicit smoothness assumptions are used (C 2 from optimization, C 4
or C 3 from finite differences) which are not part of the original problem. 
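
A compact Python sketch (an added illustration; dense matrices are used only for brevity, in practice the tridiagonal sparsity would be exploited, and the noisy data f are made up) assembles and solves (Ph ):

```python
import numpy as np

def solve_smoothing_1d(f_vals, L, beta):
    """Problem (P_h): -(beta/h^2)(U_{i+1} - 2U_i + U_{i-1}) + U_i = f(x_i),
    with U_1 = f(x_1) and U_N = f(x_N)."""
    N = len(f_vals)
    h = L / (N - 1)
    A = np.zeros((N, N))
    A[0, 0] = A[-1, -1] = 1.0                  # Dirichlet rows
    for i in range(1, N - 1):
        A[i, i - 1] = A[i, i + 1] = -beta / h**2
        A[i, i] = 2.0 * beta / h**2 + 1.0
    return np.linalg.solve(A, np.asarray(f_vals, dtype=float))

L, N = 10.0, 201
x = np.linspace(0.0, L, N)
f = np.sin(x) + 0.3 * np.random.default_rng(0).standard_normal(N)   # noisy data
U = solve_smoothing_1d(f, L, beta=0.05)
print(float(np.max(np.abs(U - np.sin(x)))))    # deviation of the smoothed profile from sin(x)
```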


4.1.3 Convergence of the Finite Difference Method
 Introduction
The discretized problem (Ph ) leads to a linear system. Two questions arise:

• Is the linear system uniquely solvable?

• Does the solution of (Ph ) converge to the solution of (P ) for N → ∞ (or


h → 0)? If yes, with which convergence rate?

In chap. 4.1.3 we analyze these questions only for the one-dimensional model
problem discussed in the previous chap. 4.1.1! 

Lemma (discrete maximum principle, special version)


Let {Ui , i = 1, . . . , N } solve the problem (Ph ). Then

max_{i=1,...,N} Ui ≤ max_{i=1,...,N} f (xi )
min_{i=1,...,N} Ui ≥ min_{i=1,...,N} f (xi )

Proof:
We only show the first property. Let Uj be the maximum of {Ui , i = 1, . . . , N }.
If j = 1 (analogously for j = N ), then

max_{i=1,...,N} Ui = U1 = f (x1 ) ≤ max_{i=1,...,N} f (xi )

If j ∈ {2, . . . , N − 1}, then because of the maximum property

Uj ≥ Uj−1 ∧ Uj ≥ Uj+1

We insert this into the finite difference expression and obtain

(Uj+1 − 2Uj + Uj−1 ) ≤ 0  ⇒  −(β/h²) (Uj+1 − 2Uj + Uj−1 ) ≥ 0

=⇒ (by (Ph ))  f (xj ) − Uj ≥ 0  ⇒  Uj ≤ f (xj ) ≤ max_{i=1,...,N} f (xi )

 Uniqueness 
We now address the first question.
The linear system is uniquely solvable, if the matrix is regular. For that we
have to show that either the determinant of the matrix is non-zero – e.g. with
the minor expansion formula (= Determinantenentwicklungssatz) – or that for
the homogeneous system the only solution is U := (U1 , . . . , UN ) = 0.
In our case the latter approach is simple if we use the discrete maximum prin-
ciple:


Consider the homogeneous system, i.e. f (xi ) = 0 ∀ i ∈ {1, . . . , N }.
Then mini=1,...,N f (xi ) = 0 = maxi=1,...,N f (xi ).
From the discrete maximum principle we get

0 ≤ min_{i=1,...,N} Ui ≤ max_{i=1,...,N} Ui ≤ 0 ⇒ Ui = 0 ∀ i .

Therefore the linear system is uniquely solvable.

 Consistency, Stability, Convergence

Definition
Let denote Ω = ]0, L[ the domain, h = L/(N − 1) the discretization mesh size,
Ωh := {xi = (i − 1) · h, i = 1, . . . , N } the mesh of N ∈ IN gridpoints; u ∈ C 4 (Ω, IR).
Then the differential operator Lβ is defined by

Lβ := −β d²/dx² + 1    ⇒    Lβ u(x) = −βu′′(x) + u(x)

The discrete differential operator Lβh u : Ωh → IR is defined by

( Lβh u )(xi ) := −(β/h²) ( u(xi+1 ) − 2u(xi ) + u(xi−1 ) ) + u(xi )

An operator always may be considered as a rule that tells us what to do e.g.


with a function or a point. 

These definitions allow us a more compact and clear formulation of the follow-
ing theorems.

Theorem (consistency, model problem)


For u ∈ C 4 (Ω, IR) and our model problem we get
max_{i=2,...,N−1} | Lβ u(xi ) − ( Lβh u )(xi ) | ≤ C h²

The constant C does not depend on h. The order of consistency is 2 because


of the exponent in h2 .

Proof:
Lβ u(xi ) − ( Lβh u )(xi ) = −βu′′(xi ) + u(xi ) − ( Lβh u )(xi ) = O(h²)

because of the approximation of u′′ by the Taylor-based finite difference formula


in chap. 4.1.2. 


Remark
As in case of the one-step methods for ODEs, consistency is a local charac-
terization.
We insert the exact solution u at the grid points xi into the homogeneous part
of the exact differential equation and into its finite difference approximation and
measure the maximum difference.
For a consistent method, the difference vanishes for h → 0.
Attention: We do not compare the results Ui of the numerical solution of the
ODE with the exact solution u(xi ) here! 

Theorem (stability, continuous dependence on f , model problem)


Let be U ∈ IRN the solution of problem (Ph ). Then

max_{i=1,...,N} |Ui | ≤ C̃ · max_{i=1,...,N} |f (xi )|

The constant C̃ does not depend on h. Such a method is called stable.

Proof:
We again apply the discrete maximum principle and directly obtain

max_{i=1,...,N} |Ui | ≤ max_{i=1,...,N} |f (xi )|  ⇒  C̃ = 1

Here we used that min Ui ≥ min f (xi ) ⇒ − min Ui ≤ − min f (xi ) and in case of
min f (xi ) < 0 we get − min f (xi ) = max(−f (xi )). 

Theorem (convergence, model problem)


Let be u the solution of (P ) and u ∈ C 4 (Ω, IR), let U ∈ IRN be solution of (Ph ).
Then
max_{i=1,...,N} |u(xi ) − Ui | ≤ Ĉ h²

Such a method is called convergent of 2nd order.

Proof:
We investigate the error e : Ωh → IR, e(xi ) := Ui − u(xi ) and prove that it solves
problem (Ph ) with a new right hand side r(xi ) for i = 2, . . . , N :
( Lβh e )(xi ) = −(β/h²) (Ui+1 − 2Ui + Ui−1 ) + Ui − ( Lβh u )(xi ) = f (xi ) − ( Lβh u )(xi )
             = −βu′′(xi ) + u(xi ) − ( Lβh u )(xi ) = Lβ u(xi ) − ( Lβh u )(xi ) =: r(xi )


We now define r(x1 ) := 0, r(xN ) := 0; then e solves the problem (Ph ) with the
new right hand side r instead of f .
Because we have proven that our method is stable:

max_{i=1,...,N} |e(xi )| ≤ C̃ max_{i=1,...,N} |r(xi )|

Because we have proven that our method is consistent:

|r(xi )| ≤ Ch2 for i = 2, . . . , N − 1 ∧ r(x1 ) = r(xN ) = 0

In total we get
max_{i=1,...,N} |e(xi )| ≤ Ĉ h²    with    Ĉ = C · C̃

and Ĉ is independent of h. 

4.2 Quasilinear PDEs


Definition
A partial differential equation (PDE) for the scalar function u(x1 , . . . , xn ) with the
n independent variables (x1 , . . . , xn ) ∈ D ⊆ IRn is an equation of the form
F( x1 , . . . , xn , u, ∂u/∂x1 , . . . , ∂u/∂xn , ∂²u/(∂x1 ∂x1 ), . . . , ∂²u/(∂x1 ∂xn ), . . . ) = 0.
If F is a linear function of u and its derivatives, then the PDE is called linear.
A PDE is called quasilinear, if F is at least linear in the highest order derivatives
of u.
The order of the PDE is the highest order partial derivative of u in F . 

Notation
In PDEs, it is common to denote partial derivatives using subscripts. So e.g.
for u = u(x, y) we write:
ux = ∂u/∂x ,    uxx = ∂²u/∂x² ,    uxy = ∂²u/(∂y ∂x) = ∂/∂y ( ∂u/∂x ) .

Especially in physics, nabla (∇) is often used to denote spatial derivatives, and
u̇, ü for time derivatives. For example, the wave equation can be written as

ü = c2 ∇2 u = c2 ∆u

where ∆ is the Laplace operator. 


 Example
General scalar linear PDE of 2nd order with 2 independent variables x, y:

a(x, y)uxx + b(x, y)uxy + c(x, y)uyy + d(x, y)ux + e(x, y)uy + g(x, y)u = f (x, y)

with a, b, c, d, e, f, g ∈ C 0 (Ω, IR), Ω ⊂ IR2 bounded domain and |a| + |b| + |c| >
0 ∀ (x, y) ∈ Ω.
General scalar quasilinear PDE of 2nd order with 2 independent variables x, y:

a(x, y, u, ux , uy )uxx + b(x, y, u, ux , uy )uxy + c(x, y, u, ux , uy )uyy


= f (x, y, u, ux , uy )

with a, b, c, f ∈ C 0 (D, IR), D ⊂ IR5 and |a| + |b| + |c| > 0.


If f ≡ 0, then the PDE is homogeneous, otherwise inhomogeneous. 

Classification of quasilinear and linear PDEs of second order


Consider the general quasilinear PDE with n independent variables x ∈ Ω and
Ω ⊆ IRn bounded domain (open, connected, bounded)
∑_{i=1}^{n} ∑_{j=1}^{n} aij (x, u, p) ∂²u/(∂xi ∂xj ) (x) = f (x, u, p)
A = (aij ) symmetric, aij , f ∈ C 0 (Q, IR), Q ⊂ Ω × IR × IRn
x := (x1 , . . . , xn ), p := (p1 , . . . , pn ), pi := ∂u/∂xi = uxi , u(x) ∈ IR .
In operator notation we write

Lu := ∑_{i,k=1}^{n} aik (x, u, p) ∂²u/(∂xi ∂xk ) ,    Lu(x) = f (x, u, p)

The PDE is in (x, u, p)


elliptic , if all EWs of A have the same sign (all are positive or negative)
parabolic , if exactly one EW is equal to zero and
all the other EWs of A have the same sign (are pos. or neg.)
hyperbolic , if there is only one negative EW and all the rest are positive, or
there is only one positive EW and all the rest are negative.
A PDE is elliptic/parabolic/hyperbolic, if this property holds at every point of the
domain.
Analogously we classify linear PDEs of second order


n
L := − ∑_{i,k=1}^{n} aik (x) ∂²/(∂xi ∂xk ) + ∑_{i=1}^{n} bi (x) ∂/∂xi + c(x) ,    Lu(x) = f (x)

Well-posed PDE problems (Hadamard)
The mathematical term ”well-posed problem” stems from a definition given by
Hadamard. He believed that mathematical models of physical phenomena
should have the properties that:
(1) a solution exists,
(2) the solution is unique,
(3) the solution’s behavior changes continuously with the data (stability).
Examples of well-posed problems include the Dirichlet problem for Laplace’s
equation, and the heat equation with specified initial conditions.
Especially important in PDE applications is the correct determination of the
initial data and the boundary values. Otherwise it might happen that no solution
exists or that the solution changes dramatically even for a very small change
in the data. 

4.3 Poisson Equation


We use the Poisson equation −∆u = f as an example of an elliptic PDE. To
obtain a numerical solution by Finite Difference methods, we proceed step by
step as in the treatment of the one-dimensional model problem. We will use
the Poisson equation again when we discuss the Finite Element method.

4.3.1 Derivation of the Poisson Equation


 Problem
Let us describe the vertical displacement of a thin membrane (e.g. of a drum)
caused by an external load f .
The resulting shape of the membrane can be obtained as the solution u of a
variational problem from physics.

 Notation
Let be Ω ∈ IR2 a bounded domain and f ∈ C 0 (Ω, IR) a given function. Let denote
u : Ω → IR, (x, y) 7→ u(x, y), the function that describes the vertical displacement
of the membrane at every (x, y) ∈ Ω.
Let the boundary ∂Ω of Ω consist of two parts ΓD and ΓN with
ΓD ∪ ΓN = ∂Ω ∧ ΓD ∩ ΓN = ∅ .
As boundary condition on ΓD we assume a Dirichlet condition (function values
prescribed)
u(x, y) := G(x, y) for (x, y) ∈ ΓD
In addition, let n(x, y) denote the exterior normal, i.e. the outward pointing
unit normal vector on ∂Ω; the directional derivative ∂u/∂n is calculated via the
scalar product
∂u/∂n (x, y) = ⟨n(x, y), ∇u(x, y)⟩2


 Physical model
From physics we get (without proof) for the potential energy of the deformed
membrane
I(u) := (1/2) ∫_Ω ⟨∇u, ∇u⟩2 dx dy − ∫_Ω u f dx dy − ∫_{ΓN} u H ds

The potential energy I(u) consists of the strain energy (first integral, Verzer-
rungsenergie) minus the energy resulting from the external forces acting on
the surface Ω and on the boundary ΓN .
A physical system in equilibrium takes the state of minimum energy and there-
fore we get for u
I(u) → min! ∧ u = G on ΓD

 Variational approach
We assume that a ”classical solution” u ∈ C 2 (Ω, IR) ∩ C 1 (Ω̄, IR) with Ω̄ := Ω ∪
∂Ω exists and will obtain the Poisson equation as a necessary condition for a
minimum.

For the solution we again (similar to chap. 4.1.1) use the method of Lagrange:
For an arbitrarily chosen function η ∈ C²(Ω, IR) ∩ C⁰(Ω̄, IR) with η|ΓD = 0 we em-
bed the optimal solution u into and compare it with the one-dimensional set of
functions v := u + εη for ε ∈ [−ε0 , ε0 ]. We have chosen that embedding because
the values on the part ΓD of the boundary are prescribed and that has to be
true also for all possible solution candidates.

If u is an optimum, then the following necessary condition holds


dJ(ε)/dε |_{ε=0} = 0    with    J(ε) := I(u + εη)

Insertion yields

J(ε) = (1/2) ∫_Ω ⟨∇u + ε∇η, ∇u + ε∇η⟩2 dx dy − ∫_Ω (u + εη) f dx dy − ∫_{ΓN} (u + εη) H ds

⇒  0 = dJ(ε)/dε |_{ε=0} = ∫_Ω ⟨∇u, ∇η⟩2 dx dy − ∫_Ω η f dx dy − ∫_{ΓN} η H ds      (∗)

Now we need something like ”generalized integration by parts” in several


dimensions. We use the following trick: We define the new vector field
F (x, y) := η · ∇u = (η · ux , η · uy ) ∈ IR2 and apply Gauss’s divergence theorem
(Gaußscher Integralsatz)
∫_Ω div F dx dy = ∫_{∂Ω} ⟨F, n⟩2 ds

with div F = ∂F1 /∂x + ∂F2 /∂y = uxx η + ux ηx + uyy η + uy ηy = η∆u + ⟨∇u, ∇η⟩2 .


Using this expression for the generalized integration by parts we get
∫_Ω ⟨∇u, ∇η⟩2 dx dy = − ∫_Ω η ∆u dx dy + ∫_{∂Ω} η ⟨∇u, n⟩2 ds

We insert the last expression into (∗), use that η|ΓD = 0 and obtain

0 = ∫_Ω η f dx dy + ∫_Ω η ∆u dx dy − ∫_{∂Ω = ΓN ∪ ΓD} η ⟨∇u, n⟩2 ds + ∫_{ΓN} η H ds
  = ∫_Ω η (∆u + f ) dx dy + ∫_{ΓN} η (H − ⟨∇u, n⟩2 ) ds

This expression has to be valid for all η ∈ C²(Ω, IR) ∩ C⁰(Ω̄, IR) with η|ΓD = 0.
Using the Fundamental lemma (in its generalized form) again, we get the fol-
lowing necessary condition for a minimum: u has to solve the
”Poisson equation” (PDE problem)

−∆u = −( ∂²u/∂x² (x, y) + ∂²u/∂y² (x, y) ) = f (x, y)    on Ω
u(x, y) = G(x, y)    on ΓD
∂u/∂n (x, y) = ⟨n(x, y), ∇u(x, y)⟩2 = H(x, y)    on ΓN
u ∈ C²(Ω, IR) ∩ C¹(Ω̄, IR)

On ΓN a Neumann boundary condition has to be fulfilled (derivatives of the


solution u prescribed). The Neumann boundary condition cannot be chosen
freely in our example, but it is determined by the variational problem!

Remark
On ΓD we do not need the C 1 -property of u, here C 0 is sufficient. 

4.3.2 Poisson Equation and Properties of its Solution

Theorem (maximum principle)


Let be u ∈ C 2 (Ω, IR) ∩ C 0 (Ω̄, IR).
Maximum principle:
If −∆u = f ≤ 0 in Ω, then u has its maximum on the boundary ∂Ω.
Minimum principle:
If −∆u = f ≥ 0 in Ω, then u has its minimum on the boundary ∂Ω.
Comparison:
Let be v ∈ C 2 (Ω, IR) ∩ C 0 (Ω̄, IR) another function with −∆u = f ≤ −∆v = f˜ in Ω
and u ≤ v on ∂Ω, then u ≤ v in Ω.


Remark
A consequence of the maximum principle is that the solution changes continu-
ously with the data on the boundary (in case of Dirichlet condition):
Consider −∆u1 = f and −∆u2 = f with ui (x) = Gi (x) ∀ x ∈ ∂Ω, i = 1, 2. We get
−∆w = 0 for w := u1 − u2 .
From the maximum principle we conclude

w(x) ≤ sup_{z∈∂Ω} w(z) ≤ sup_{z∈∂Ω} |w(z)| ,    w(x) ≥ inf_{z∈∂Ω} w(z) ≥ − sup_{z∈∂Ω} |w(z)|

and with this

sup_{x∈Ω} |u1 (x) − u2 (x)| ≤ sup_{z∈∂Ω} |u1 (z) − u2 (z)| = sup_{z∈∂Ω} |G1 (z) − G2 (z)|

From that we see that the Poisson equation with Dirichlet boundary conditions
is well-posed in the sense of Hadamard (effect of changes in f not analyzed
here). 

Remark

With the definition ∆u(x) := ∑_{i=1}^{n} uxi xi (x) for u ∈ C 2 (Ω, IR) ∩ C 0 (Ω̄, IR) with
Ω ∈ IRn the Poisson equation can be generalized to IRn . 

4.3.3 Grid, Difference Operators and Boundary Conditions


To use a finite difference method to approximate the solution to a problem, one
must first discretize the problem’s domain. Note that this means that finite-
difference methods produce discrete numerical approximations to the deriva-
tives.
A first introduction to that topic was given in the example ”discretization of
Laplace’s equation” in chap. 2.
In this subchapter we consider the Poisson equation defined on a rectangular
domain Ω ∈ IR2 :
∂²u/∂x² + ∂²u/∂y² = f (x, y)    for (x, y) ∈ Ω ⊂ IR2
with Ω := {(x, y) | x ∈ ]0, (N + 1)h[, y ∈ ]0, (M + 1)h[} and define a uniform grid
with meshsize h (refinement possible!).
The set of the gridpoints in the interior is defined by

Ωh := {(x, y) ∈ Ω | x = ih, y = jh for i = 1, . . . N, j = 1, . . . M }

and the set of the gridpoints on the boundary is defined by

∂Ωh := {(x, y) | x = ih, y = 0 ∨ y = (M + 1)h for i = 0, 1, . . . N, N + 1}


∪ {(x, y) | x = 0 ∨ x = (N + 1)h, y = jh for j = 1, . . . M }

We do not want to have the same grid point twice in the definition.


Figure 10: Rectangular domain Ω with N = 4 and M = 3, interior gridpoints
(= elements of Ωh ) marked by circles, together with one five-point stencil (red).

We make use of the uniform grid xi = i · h, yj = j · h and define uij := u(xi , yj ).


Our goal is not to calculate u(x, y) ∀ x, y ∈ Ω̄, but only to get approximations Ui,j
of the exact solution ui,j := u(i · h, j · h) at the gridpoints.

Derivatives at the grid points are approximated by discrete difference operators


which are derived by Taylor expansions.

 Derivatives at the interior grid points (xi , yj ) ∈ Ωh


First order partial derivatives:

u(x + h, y) = u(x, y) + ux (x, y) · h + uxx (x, y) · h²/2 + O(h³)
u(x, y + h) = u(x, y) + uy (x, y) · h + uyy (x, y) · h²/2 + O(h³)

Backward difference:   ux |i,j = ( ui,j − ui−1,j ) / h + O(h)
                       uy |i,j = ( ui,j − ui,j−1 ) / h + O(h)
Forward difference:    ux |i,j = ( ui+1,j − ui,j ) / h + O(h)
                       uy |i,j = ( ui,j+1 − ui,j ) / h + O(h)
Centered difference:   ux |i,j = ( ui+1,j − ui−1,j ) / (2h) + O(h²)
                       uy |i,j = ( ui,j+1 − ui,j−1 ) / (2h) + O(h²)


Figure 11: Computational molecules for backward, forward, centered difference ap-
prox. of ux |i,j and centered difference approx. of uy |i,j (from le. to ri.).

Some second order partial derivatives:


uxx |i,j = ( u(x − h, y) − 2u(x, y) + u(x + h, y) ) / h² |i,j + O(h²)
         = ( ui−1,j − 2ui,j + ui+1,j ) / h² + O(h²)

uxx |i,j = −(1/(12h²)) ( u(x − 2h, y) − 16u(x − h, y) + 30u(x, y)
                         − 16u(x + h, y) + u(x + 2h, y) ) |i,j + O(h⁴)

and analogous formulae for uyy .

Based on those two formulae the Laplace operator ∆u can be approximated


by the 5-point stencil at (xi , yj )

∆u|i,j = ∆h u|i,j + O(h²) ,    ∆h u|i,j := (1/h²) ( ui+1,j + ui−1,j + ui,j+1 + ui,j−1 − 4ui,j )

or the non-compact 9-point stencil: ∆u|i,j = ∆h^(9) u|i,j + O(h⁴) (see fig. 12).
i,j i,j

Figure 12: Computational molecules for the 5-point stencil ∆h (le.) and the non-
compact 9-point stencil ∆h^(9) (ri.).

The approximation of the Laplace operator by the 5-point stencil at (xi , yj )


leads to the equation for that gridpoint

Ui−1,j + Ui+1,j + Ui,j−1 + Ui,j+1 − 4Ui,j = h2 f (xi , yj ) (1)


Remark
The approximation of the Laplace operator by finite differences is possible only
if u is sufficiently smooth (because of Taylor!). A much higher smoothness is
necessary than for the analytical solution: u ∈ C 4 (Ω, IR) in case of the 5-point
stencil and u ∈ C 6 (Ω, IR) in case of the non-compact 9-point stencil.
The discretization with the non-compact 9-point stencil includes values at
points that are not closest neighbors. This leads to increased difficulties at
points close to the boundary.
Because of these two drawbacks, discretizations of higher order are often not
used. 

 Boundary conditions at grid points (xi , yj ) ∈ ∂Ωh


We consider the grid geometry in fig. 13 (section of the rectangular domain
Ω with the set of gridpoints Ωh ∪ ∂Ω) and analyze how to treat the boundary
conditions at (w.l.o.g) (x0 , yj ) = (0, j · h).
For the approximation of the Laplace operator ∆ we have chosen the 5-point
stencil ∆h . The special stencil centered at (x1 , yj ) is also marked in fig. 13:
This is the only stencil at an interior point in contact with our special grid point
(x0 , yj ) on the discretized boundary ∂Ωh .
Figure 13: Grid geometry with boundary conditions at point (x0 , yj ) (blue/red): 5-point
stencil centered at (x1 , yj ) (black) for Dirichlet boundary condition (le.) and extrapola-
tion centered at (x0 , yj ) (blue/red) for Neumann boundary condition (ri.).

Dirichlet boundary condition:


U0,j is prescribed: U0,j = G(x0 , yj ). For the 5-point stencil this results in the
following equation at (x1 , yj ):

U2,j + U1,j+1 + U1,j−1 − 4U1,j = h2 f (x1 , yj ) − U0,j = h2 f (x1 , yj ) − G(x0 , yj ) (2)

Neumann boundary condition:


The discretized Neumann condition
∂u/∂n (x, y) = H(x, y)    ∀ (x, y) ∈ ΓN,h ⊆ ∂Ωh

at (x0 , yj ) in case of backward difference approximation of the derivative leads
to
− ∂u(x, y)/∂x |0,j = H(x0 , yj )   ⇒   ( u0,j − u1,j ) / h = H(x0 , yj ) + O(h)
Unfortunately this poor approximation results in an additional local error
(= consistency error) of order O(h), whereas the 5-point stencil has a con-
sistency error of only O(h2 ).
The approximation can be improved e.g. by extrapolation techniques. We
demonstrate that for an arbitrary function g ∈ C ∞ (IR, IR)

g(x + h) = g(x) + h g′(x) + (h²/2) g′′(x) + (h³/6) g′′′(x) + . . .   ⇒

( g(x + h) − g(x) ) / h      = g′(x) + (h/2) g′′(x) + (h²/6) g′′′(x) + . . .
( g(x + 2h) − g(x) ) / (2h)  = g′(x) + h g′′(x) + (2h²/3) g′′′(x) + . . .

By linear combination we get

2 · ( g(x + h) − g(x) ) / h − ( g(x + 2h) − g(x) ) / (2h) = g′(x) + O(h²)
We choose g(x) = u(x, y) and obtain the following equation at (x0 , yj ):

4u1,j − 4u0,j − u2,j + u0,j = −2h · H(x0 , yj ) + O(h2 )


⇒ 4U1,j − 3U0,j − U2,j = −2h · H(x0 , yj ) (3)

with a consistency error of O(h2 ).
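
The gain from O(h) to O(h²) can be verified numerically with any smooth function; a tiny Python sketch (an added illustration with g = sin and an arbitrary evaluation point):

```python
import numpy as np

g, dg = np.sin, np.cos                 # test function and its exact derivative
x = 0.7
for h in (0.1, 0.05, 0.025):
    d1 = (g(x + h) - g(x)) / h                                        # one-sided, O(h)
    d2 = 2 * (g(x + h) - g(x)) / h - (g(x + 2 * h) - g(x)) / (2 * h)  # extrapolated, O(h^2)
    print(h, abs(d1 - dg(x)), abs(d2 - dg(x)))
# the first error is roughly halved with h, the second drops by a factor of about 4
```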

 Curvilinear boundary
If the domain Ω has a more complicated geometry, a modification of the dis-
cretization of the Laplace operator is necessary.
We consider the example in fig. 14.
On the intersections of the curvilinear boundary with the mesh we define addi-
tional points (red). The point A has the coordinates (xA , yA ) = (i · h − hA , j · h)
and the point B has the coordinates (xB , yB ) = (i · h, j · h + hB ) with hA , hB > 0.
We modifiy the 5-point stencil centered at (xi , yj ).
Using Taylor expansion again we get
uxx |i,j = 2 ( ui+1,j / (h(h + hA )) − ui,j / (h · hA ) + u(xA , yA ) / (hA (h + hA )) ) + O(h)

uyy |i,j = 2 ( ui,j−1 / (h(h + hB )) − ui,j / (h · hB ) + u(xB , yB ) / (hB (h + hB )) ) + O(h)

Addition completes the modified 5-point stencil:

uxx |i,j + uyy |i,j = ∆h^(m) u |i,j + O(h)


Figure 14: Curvilinear boundary and modified 5-point stencil.

For the difference equation at (xi , yj ) we get

αi,j Ui,j + αi+1,j Ui+1,j + αi,j−1 Ui,j−1 + αA UA + αB UB = h2 f (xi , yj ) (4)

with αi,j , . . . , αB chosen according to the equations above. UA , UB are the ap-
proximations at the additional points A, B.
This scheme is called Shortley-Weller scheme. The order of consistency is
only linear. In case of h = hA = hB we get the usual 5-point stencil which is
consistent of order 2.

Remark
This example shows that Finite Difference methods run into difficulties in case
of more complicated geometries of Ω. 

4.3.4 Formulation of the Sparse Linear System


Each gridpoint (xi , yj ) at which an approximation Uij of the exact solution uij =
u(xi , yj ) is required contributes one equation (see eq. (1)-(4) in the previous
chap. 4.3.3) to the final linear system

Ah U = f˜h .

We approximate the exact solution at all interior points (i.e. Ωh ) and at all
boundary points with Neumann condition (i.e. ΓN,h ).

As an example consider Q̄ := {(x, y) | x ∈ [0, 5h], y ∈ [0, 4h]} together with a grid
with uniform mesh size h as in fig. 10. Let us assume Dirichlet boundary
conditions with rij := G(xi , yj ) = u(xi , yj ) on the boundary ∂Ωh and let denote
fij = f (xi , yj ). Then after ordering the unknowns Uij into the vector U ∈ IR12 in
a proper way we obtain the following sparse linear system


With the 12 unknowns ordered row by row, U = (U11 , U21 , U31 , U41 , U12 , . . . , U43 )^T,
the system Ah U = f˜h is block tridiagonal: Ah = tridiag(−I, T, −I) with 3 × 3 blocks,
where T = tridiag(−1, 4, −1) ∈ IR^{4×4} and I is the 4 × 4 identity. The right-hand
side f˜h collects −h² f (xi , yj ) together with the known boundary values rij = G(xi , yj )
of neighbouring boundary points; e.g. the first equation reads

4 U11 − U21 − U12 = −( h² f11 − r10 − r01 ) = −h² f11 + r10 + r01 .

Figure 15: Sparse linear system Ah U = f˜h for the discretized Poisson problem with
Dirichlet boundary conditions and the domain and grid as in fig. 10.

The structure of the matrix Ah depends on the chosen numbering of the grid
points as can be seen in fig. 16 for a square domain with 25 interior points and
Dirichlet boundary conditions. A number is assigned to each grid point; the
number corresponds to the row of Ah that contains one of the equations (1)-(4)
belonging to this grid point.

[Top: three different numberings of the 25 interior grid points; bottom: the corresponding sparsity patterns (spy plots) of Ah , each with nz = 105 nonzero entries.]
Figure 16: Structure of Ah depending on the numbering of the grid points. In all cases
Ah ∈ IR25×25 contains nz = 105 nonzero elements.
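
For the row-wise numbering the matrix Ah has the block structure of fig. 15 and can be assembled with Kronecker products. The following Python sketch (an added illustration; it uses the convention −∆u = f with homogeneous Dirichlet data and dense matrices for brevity) also exhibits the O(h²) convergence discussed in the next section:

```python
import numpy as np

def poisson_dirichlet_2d(f, N, M, h):
    """5-point discretization of -Laplace(u) = f, u = 0 on the boundary,
    N x M interior points ordered row by row as in fig. 15."""
    K = lambda n: 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # tridiag(-1,2,-1)
    Ah = np.kron(np.eye(M), K(N)) + np.kron(K(M), np.eye(N))         # 4 on the diagonal
    X, Y = np.meshgrid(np.arange(1, N + 1) * h, np.arange(1, M + 1) * h)
    U = np.linalg.solve(Ah, h**2 * f(X, Y).ravel())
    return U.reshape(M, N), X, Y

# check with the exact solution u = sin(pi x) sin(pi y) on the unit square
f = lambda X, Y: 2 * np.pi**2 * np.sin(np.pi * X) * np.sin(np.pi * Y)
for n in (9, 19, 39):                       # interior points per direction
    h = 1.0 / (n + 1)
    U, X, Y = poisson_dirichlet_2d(f, n, n, h)
    print(n, np.max(np.abs(U - np.sin(np.pi * X) * np.sin(np.pi * Y))))
# the maximum error decreases roughly like O(h^2)
```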

4.3.5 Analysis of the Finite Difference Discretization


We analyze the discretized Poisson problem in a similar way as the one-
dimensional model problem in chap. 4.1.3.
We assume that the mesh size h is sufficiently small such that Ωh is connected
if Ω is connected.



Figure 17: Mesh size h too large: Ω connected, Ωh not connected.

Again a discretized maximum principle can be formulated:

Lemma (discrete maximum principle)


Consider the Poisson equation −∆u = f with f ≤ 0 in Ω and Dirichlet bound-
ary conditions u(x, y) = G(x, y) on ∂Ω. Let denote ∆h the discretization on a
uniform grid using a 5-point stencil. Let us assume curvilinear boundaries (4).
We obtain a sparse linear system Ah U = f˜h .
If
max_{(xi ,yj )∈Ωh} Uij ≥ max_{(xi ,yj )∈∂Ωh} G(xi , yj )

then Uij = const for all (xi , yj ) ∈ Ωh .


Otherwise the discrete maximum is taken on the boundary ∂Ωh

max_{(xi ,yj )∈Ωh} Uij ≤ max_{(xi ,yj )∈∂Ωh} G(xi , yj )

Remark
Analogously to the (non-discretized) Poisson equation in chap. 4.3.2 a discrete
minimum principle and a discrete comparison principle can be formulated. The
proof is similar to chap. 4.1.3 for the one-dimensional model problem. 

Theorem (uniqueness)
Consider the Poisson equation −∆u = f and Dirichlet boundary conditions
u(x, y) = G(x, y) on ∂Ω. Let denote ∆h the discretization on a uniform grid
using a 5-point stencil. Let us assume curvilinear boundaries (4). We obtain a
sparse linear system Ah U = f˜h .
Then Ah is non-singular and the sparse linear system is uniquely solvable.

Proof:
Similar to chap. 4.1.3 using the discrete maximum principle. 


Question
How accurate is the Laplace operator approximated by the 5-point stencil?
That is a local property.

Definition (consistency)
The difference scheme ∆h is consistent with the Laplace operator ∆, if

∥∆h v(xi , yj ) − ∆v(xi , yj )∥ ≤ γ(h)    ∧    lim_{h→0} γ(h) = 0    ∀ (xi , yj ) ∈ Ωh

for all functions v ∈ C 2 (Ω̄, IR).

The scheme is consistent of order k, if for v ∈ C 2+k (Ω̄, IR)

∥∆h v − ∆v∥ = O(hk ) ∀ (xi , yj ) ∈ Ωh and h → 0




Lemma
The 5-point stencil is consistent of order k = 2.

Proof: We directly get that from the Taylor expansion in case of constant h. 

Remark
Again, from consistency we cannot conclude convergence. We need stability
in addition. 

Definition (global error)


The global error of the difference method ∆h at (xi , yj ) ∈ Ωh is defined as the
difference of the true and the approximated result

e(xi , yj ) := Uij − u(xi , yj ) for (xi , yj ) ∈ Ωh


Remark
We investigate the global error and thus the convergence only on Ωh (i.e. in
the interior of the domain), not on the boundaries. For Dirichlet conditions, that
is sufficient. 

Theorem
Consider the Poisson equation −∆u = f and Dirichlet boundary conditions
u(x, y) = G(x, y) on ∂Ω. Let denote ∆h the discretization on a uniform grid with
mesh size h using a 5-point stencil.

Consistency and stability ⇒ convergence


Proof and explanation:
• We define the (local) consistency error r(xi , yj ) := ∆h u(xi , yj ) − ∆u(xi , yj ) for
(xi , yj ) ∈ Ωh and obtain

∆h e(xi , yj ) = ∆h Uij − ∆h u(xi , yj ) = −f (xi , yj ) − ∆h u(xi , yj )


= ∆u(xi , yj ) − ∆h u(xi , yj ) = −r(xi , yj )

Therefore, we get a new and discrete boundary value problem for the error

−∆h e = r ∀ (xi , yj ) ∈ Ωh ∧ e(xi , yj ) = 0 ∀ (xi , yj ) ∈ ∂Ωh

This is another discretized Poisson equation, which again can be written as a


sparse linear system
Ah E = R    with    E = ( e(xi , yj ) ) , R = ( r(xi , yj ) ) , (xi , yj ) ∈ Ωh

Ah is the same matrix as defined in chap. 4.3.4, E and R are vectors with the
components e(xi , yj ), r(xi , yj ) ordered in the same way as the components of
U and f˜h in chap. 4.3.4. The boundary conditions for e already have been
inserted.

• In the next step we show the stability of the system Ah E = R using the dis-
crete maximum principle.

We scale the errors


ẽ := e / γ(h) ,    r̃ := r / γ(h)    with    γ(h) := max_{(xi ,yj )∈Ωh} |r(xi , yj )|

and consider the scaled system

−∆h ẽ = r̃ ∧ |r̃(xi , yj )| ≤ 1 ∀ (xi , yj ) ∈ Ωh ∧ ẽ(xi , yj ) = 0 ∀ (xi , yj ) ∈ ∂Ωh

If Ω is bounded we can define a ball (= circle) Bϱ (0) := {(x, y) ∈ IR2 | x2 +y 2 < ϱ2 }


that completely contains Ω: Ω ⊆ Bϱ (0).

The newly defined function

w(x, y) := (1/4) ( ϱ² − x² − y² )
has the following properties

−∆w = −∆h w = 1 ∀ (xi , yj ) ∈ Ωh ∧ w(xi , yj ) ≥ 0 ∀ (xi , yj ) ∈ ∂Ωh

From that we get

−∆h ẽ = r̃ ≤ 1 = −∆h w ∀ (xi , yj ) ∈ Ωh ∧ 0 = ẽ ≤ w ∀ (xi , yj ) ∈ ∂Ωh

Now we can use the discrete comparison principle to obtain


   ẽ ≤ w ≤ max(w) = ϱ2/4   ∀ (xi , yj ) ∈ Ωh


Analogously we perform the steps above for w̃ := −w and get ẽ ≥ −ϱ2 /4; we
combine these two results and obtain (after multiplication with γ(h)) the stability
condition
   max_{(xi ,yj )∈Ωh} |ẽ(xi , yj )| ≤ ϱ2/4   ⇒   max_{(xi ,yj )∈Ωh} |e(xi , yj )| ≤ (ϱ2/4) · max_{(xi ,yj )∈Ωh} |r(xi , yj )| = (ϱ2/4) · γ(h)

• Because the consistency condition limh→0 γ(h) = 0 holds, the total error e
shows the same behaviour. This is convergence. 

Let us summarize our convergence results from the proof above:

Theorem (convergence of discretized Poisson)


Consider the Poisson equation −∆u = f and Dirichlet boundary conditions
u(x, y) = G(x, y) on ∂Ω. Let ∆h denote the discretization using a 5-point stencil.

If u ∈ C 3 (Ω̄, IR), then the difference scheme converges to the exact solution and

   max_{(xi ,yj )∈Ωh} |Uij − u(xi , yj )| = O(h)   for h → 0

For u ∈ C 4 (Ω̄, IR) and a uniform grid with mesh size h, the difference scheme
converges to the exact solution and

   max_{(xi ,yj )∈Ωh} |Uij − u(xi , yj )| = O(h2 )   for h → 0

Remarks
 For a uniform grid the discretization error γ(h) = O(h2 ).
 With a refined numerical analysis we can show that for O(h2 )-convergence
a uniform grid is not necessary. 

4.4 1D Linear Advection Equation


4.4.1 Formulation of the PDE Problem
The linear advection equation (= transport equation) may be used as a model
of various phenomena like the movement of a pollutant in a river. In its simplest
version and in one spatial dimension the linear advection equation is

ut + vux = 0 , u = u(t, x)

with time t > 0 and space coordinate x. To complete the PDE problem let the
initial condition for u be
u(0, x) = f (x)
For the moment we will ignore any boundary condition.


Remark
If we interpret the above defined PDE as a (partial) model of the transport of
a soluble pollutant by a 1D river, then u(t, x) is the pollutant concentration at time t
and position x along the river and v is the (constant) velocity of the river. 

Exact solution of the linear advection equation


It can be easily shown that the exact solution is

u(t, x) = f (x − vt)

This means that u(t, x) is just the initial concentration profile, f (x), translated
by vt along the x-axis. For v > 0, the translation is to the right and for v < 0, the
translation is to the left. In either case the pollution moves downstream at the
speed of the river. This model is unrealistic (because it e.g. neglects diffusion)
but is useful for learning purposes.

Figure 18: Concentration profile at initial time t = 0 (red) and after time t (blue), v > 0.

Example (simple explicit FD scheme applied to the 1D advection equation)


Consider an initial condition profile as in fig. 19 and the computational spatial
domain [0, 100] uniformly discretized with x0 = 0, . . . , x100 = 100 and v = 0.5 > 0.
We use first order forward differences in both space and time to obtain the finite
difference formulation.


Figure 19: Comparison of numerical (+) and exact solutions (o) to the 1D linear
advection equation using ∆t = 0.3, 10 time steps (le.) and 25 time steps (ri.).

The numerical concentration peak has moved to the right place but is higher
than the exact solution. More critically, there is some noticeable divergence from


the exact solution in the numerical solution around x = 15 and x = 67. Some-
thing is wrong with this scheme! 
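A minimal Python sketch of this naive scheme (forward differences in both space and
time); the triangular initial profile and all parameter values are assumptions chosen
only to mimic fig. 19, they are not taken from the lecture code:

   import numpy as np

   v, dt = 0.5, 0.3
   x = np.linspace(0.0, 100.0, 101)              # x_0 = 0, ..., x_100 = 100
   dx = x[1] - x[0]
   c = v * dt / dx

   # assumed triangular initial concentration profile around x = 15
   U = np.maximum(0.0, 1.0 - np.abs(x - 15.0) / 5.0)

   for n in range(25):
       Unew = U.copy()
       # forward time / forward space: U_{n+1,i} = U_{n,i} - c (U_{n,i+1} - U_{n,i})
       Unew[:-1] = U[:-1] - c * (U[1:] - U[:-1])
       U = Unew

   exact = np.maximum(0.0, 1.0 - np.abs(x - 15.0 - v * 25 * dt) / 5.0)
   print(np.max(np.abs(U - exact)))              # noticeably deviates from the exact profile, cf. fig. 19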

Question: Can we design other FD schemes to get more accurate results?

4.4.2 Explicit Schemes


In a first approach, we construct some explicit schemes, i.e. the data at the
next time level is obtained from an explicit formula involving data from previous
time levels only. By that, the solution is obtained row by row in the x-t-mesh.

To obtain such schemes we reformulate the PDE using


   utt = ∂/∂t ( ∂u/∂t ) = ∂/∂t ( −v ∂u/∂x ) = −v uxt = −v utx = −v ∂/∂x ( ∂u/∂t ) = v 2 uxx

For fixed x, the Taylor expansion of u(t + ∆t, x) to order 3 gives

   u(t + ∆t, x) = u(t, x) + ∆t · ut (t, x) + (∆t2/2!) · utt (t, x) + O(∆t3 )
                = u(t, x) − v∆t · ux (t, x) + (∆t2/2!) · v 2 uxx (t, x) + O(∆t3 )
Now we only need information from the n-th time step to compute the new ap-
proximations for the (n + 1)-th step, because only spatial derivatives appear. Using
the equivalent operator notation we get

   Lx (∆t) := 1 − v∆t ∂/∂x + v 2 (∆t2/2!) ∂ 2/∂x2 ,   u(t + ∆t, x) = Lx (∆t) u(t, x) + O(∆t3 )

To design FD schemes we simply redefine Lx (∆t) by replacing each continu-


ous partial derivative by a finite difference approximation (denoted by δx , δxx ).


Figure 20: Mesh on a semi-infinite strip used for solution to the 1D linear advection
equation. Solid blue squares indicate the location of the (known) initial values. Open
squares indicate the location of the (known) boundary values. Open circles indicate
the position of the interior points where the FD approximation is computed.


Consistent with the notation in the previous subchapters, Un,i is the approxi-
mation to the exact solution u(tn , xi ) at the n-th time step and the i-th spatial
grid point. Some examples of FD schemes are now given.

 Forward Time Central Space (FTCS) Scheme

   δx Un,i = (Un,i+1 − Un,i−1 ) / (2∆x) ,   δxx Un,i = 0

   ⇒   Un+1,i = Un,i − ( v∆t / (2∆x) ) · (Un,i+1 − Un,i−1 )
The scheme is first order in time and second order in space, i.e. it has a
truncation error of O(∆t) + O(∆x2 ). Ghost values are required at both left and
right ends of the computational domain.

 First Order Upwind (FOU) Scheme (for v > 0)

   δx Un,i = (Un,i − Un,i−1 ) / ∆x ,   δxx Un,i = 0

   ⇒   Un+1,i = Un,i − ( v∆t / ∆x ) · (Un,i − Un,i−1 )
The scheme has a truncation error of O(∆t)+O(∆x). A ghost value is required
at the left end of the computational domain.


Figure 21: Stencil for the FTCS Scheme (le.) and the FOU Scheme (ri.).

 Lax-Wendroff Scheme

   δx Un,i = (Un,i+1 − Un,i−1 ) / (2∆x) ,   δxx Un,i = (Un,i+1 − 2Un,i + Un,i−1 ) / ∆x2

   ⇒   Un+1,i = Un,i − ( v∆t / (2∆x) ) · (Un,i+1 − Un,i−1 ) + ( v 2 ∆t2 / (2∆x2 ) ) · (Un,i+1 − 2Un,i + Un,i−1 )
The scheme is second order in time and second order in space, i.e. the
scheme has a truncation error of O(∆t2 ) + O(∆x2 ). Ghost values are required
at both left and right ends of the computational domain.


 Lax-Friedrich Scheme (almost FTCS, Un,i replaced by mean value)

   δx Un,i = (Un,i+1 − Un,i−1 ) / (2∆x) ,   δxx Un,i = 0

   ⇒   Un+1,i = (Un,i+1 + Un,i−1 ) / 2 − ( v∆t / (2∆x) ) · (Un,i+1 − Un,i−1 )
2 2∆x
The scheme has a truncation error of O(∆t) + O(∆x). Ghost values are re-
quired at both left and right ends of the computational domain. Although this
scheme appears to be quite similar to the FTCS scheme its performance is
very different.


Figure 22: Stencil for the Lax-Wendroff (le.) and the Lax-Friedrich Scheme (ri.).
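A compact Python sketch of the four explicit one-step updates above; the periodic
boundary treatment via np.roll is an assumption made here so that no ghost values
have to be supplied:

   import numpy as np

   def ftcs(U, c):
       # U_{n+1,i} = U_{n,i} - c/2 (U_{n,i+1} - U_{n,i-1}),  c = v*dt/dx
       return U - 0.5 * c * (np.roll(U, -1) - np.roll(U, 1))

   def fou(U, c):
       # first order upwind for v > 0: U_{n+1,i} = U_{n,i} - c (U_{n,i} - U_{n,i-1})
       return U - c * (U - np.roll(U, 1))

   def lax_wendroff(U, c):
       return (U - 0.5 * c * (np.roll(U, -1) - np.roll(U, 1))
                 + 0.5 * c**2 * (np.roll(U, -1) - 2.0 * U + np.roll(U, 1)))

   def lax_friedrichs(U, c):
       return 0.5 * (np.roll(U, -1) + np.roll(U, 1)) - 0.5 * c * (np.roll(U, -1) - np.roll(U, 1))

Repeated application of one of these functions (with c = v∆t/∆x) advances the
discrete solution from time level n to n + 1.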

 Multi-Level Schemes in General


So far all our schemes have been based on using data at the current time level
(n) to advance to the next time level (n + 1). This approach can be extended to
multi-level schemes by performing Taylor approximations at t − ∆t in addition:

u(t + ∆t, x) − u(t − ∆t, x) = 2∆tut (t, x) + O(∆t3 ) = −2v∆tux (t, x) + O(∆t3 )

Dropping the error term, replacing the differential operator by the difference
operator and using the usual discrete notation gives the general FD scheme in
operator notation
Un+1,i = Un−1,i − 2v∆t · δx Un,i

 Leap-Frog Scheme (a special multi-level scheme)

   δx Un,i = (Un,i+1 − Un,i−1 ) / (2∆x)

   ⇒   Un+1,i = Un−1,i − ( v∆t / ∆x ) · (Un,i+1 − Un,i−1 )
The scheme has a truncation error of O(∆t2 ) + O(∆x2 ). Ghost values are re-
quired at both left and right ends of the computational domain. Initial conditions
are required at two time levels!



Figure 23: Stencil for the Leap-Frog Scheme.

4.4.3 Repetition: Discrete Fourier Transform (DFT)


We start the discussion of von Neumann stability analysis with a summary of
the DFT (see last semester). It is an interpolation scheme well-suited to inter-
polate a periodic function f with some set of periodic basic functions instead
of polynomials.

 Example
Consider a 2π-periodic function f with f (x) = f (x + 2π), x ∈ IR. Other periodici-
ties can easily be transformed to 2π-periodicity. We want to interpolate f at the
n = 5 equidistant nodes xk = 2πk/n for k = 0, . . . , 4 by an interpolant T (x) which is
a linear combination of trigonometric basis functions:


   T (x) = Σ_{j=−2}^{2} γj e^{ijx} = Σ_{j=−2}^{2} γj ( e^{ix} )^j   ∧   γj determined from T (xk ) = f (xk ), k = 0, . . . , 4

Because of the special structure of the equidistant nodes

   xk = 2πk/5   ⇒   ( e^{ixk} )^l = ( e^{i2πk/5} )^l = ω^{kl} ,   ω := e^{i2π/5}
the interpolation condition T (xk ) = f (xk ) leads to the linear system

            ( ω0    ω0    ω0   ω0   ω0 )   ( γ−2 )   ( f (x0 ) )
            ( ω−2   ω−1   ω0   ω1   ω2 )   ( γ−1 )   ( f (x1 ) )
   F γ⃗ :=   ( ω−4   ω−2   ω0   ω2   ω4 ) · ( γ0  ) = ( f (x2 ) )  =: f⃗
            ( ω−6   ω−3   ω0   ω3   ω6 )   ( γ1  )   ( f (x3 ) )
            ( ω−8   ω−4   ω0   ω4   ω8 )   ( γ2  )   ( f (x4 ) )

We have seen that

   F H · F = n · In   ⇒   F −1 = (1/n) F H ,   n = 5

and thus the matrix (1/√n) · F is a unitary matrix. We have computed the condition
number of F : ∥F ∥2 · ∥F −1 ∥2 = 1, the matrix is perfectly conditioned! 
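A small numpy check of these properties (a sketch; n = 5 is the value from the example,
the test data f is an arbitrary choice):

   import numpy as np

   n = 5
   k = np.arange(n)
   l = np.arange(-(n // 2), n // 2 + 1)
   omega = np.exp(2j * np.pi / n)
   F = omega ** np.outer(k, l)                        # F[k, :] = (omega^(k*l))_l

   print(np.allclose(F.conj().T @ F, n * np.eye(n)))  # F^H F = n I_n
   print(np.linalg.cond(F))                           # condition number 1 up to round-off

   x = 2.0 * np.pi * k / n
   f = np.sin(x) + 0.3 * np.cos(2.0 * x)              # arbitrary periodic test data
   gamma = np.linalg.solve(F, f)                      # discrete Fourier coefficients
   print(np.allclose(F @ gamma, f))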


Remark
The solution parameters γ−(n−1)/2 , . . . , γ+(n−1)/2 (here: n = 5) are called dis-
crete Fourier coefficients of the data stored in f⃗. 

Definition
Let f : [0, 2π] → C be piecewise continuous (finite number of jumps of finite size
in the real or imaginary part). Then the Fourier series of f is defined by

   Sf (x) := Σ_{k=−∞}^{∞} ck e^{ikx}   with   ck := (1/(2π)) ∫_0^{2π} f (x) e^{−ikx} dx ,   k ∈ ZZ .

Theorem
Let f ∈ Cc1 ([0, 2π], C ) (function continuous everywhere, its first derivative
piecewise continuous).
Then Sf converges uniformly to f . 

Important property!
Notice that γk is an approximation to this ck or – after renaming the index – γl
approximates cl :
   F γ⃗ = f⃗   ⇒   γl = (F −1 f⃗)l = ( (1/n) F H f⃗ )l
                      = (1/n) Σ_{k=0}^{n−1} f (xk ) ω^{−lk} = (1/n) Σ_{k=0}^{n−1} f (xk ) e^{−ilxk}

With the periodicity condition f (x0 ) e^{−ilx0} = f (xn ) e^{−ilxn} we rewrite the last sum

   2πγl = (2π/n) · ( (1/2) f (x0 ) e^{−ilx0} + Σ_{k=1}^{n−1} f (xk ) e^{−ilxk} + (1/2) f (xn ) e^{−ilxn} )

        ≈ ∫_0^{2π} f (x) e^{−ilx} dx = 2πcl

The last sum can be viewed as a composite trapezoid rule approximation of


the integral. 

Conclusion: The trigonometric interpolant

   Tn (x) := Σ_{k=−(n−1)/2}^{(n−1)/2} γk e^{ikx}   (n = 5 in our example)

is an approximation to the Fourier series obtained by (1) truncating the series,


and (2) replacing the integral ck with its approximation γk . 


4.4.4 Von Neumann Stability Analysis
The idea of a FD scheme is that Un,i approximates u(tn , xi ) and the approxima-
tion becomes better and better as ∆x and ∆t become smaller. With increas-
ing mesh refinement round-off errors play an increasing role in the difference
equation. On the other hand discretization errors are reduced.
Let the pointwise error (also called the ’global error’) be

   en,i := Un,i − u(tn , xi ) .

en,i contains the accumulated (discretization and rounding) errors resulting


from all previous steps and from the possible perturbation in the initial data.

Remark
Without perturbations on time level 0 the values u(0, xi ) are known at all grid
points and U0,i is taken to be u(0, xi ) so e0,i = 0 at all grid points. As iterations
of the FD scheme introduce additional errors, in general en,i ̸= 0. It may be that
as iterations continue errors are compounded and amplified so that en,i grows
unboundedly making the FD scheme useless. 

Remark
Consistency is a condition on the structure of the formulation of the numerical
algorithm.
The discretized PDE is compared with the true PDE and for finer and finer
mesh the discretized problem (not the solution!) comes closer and closer to
the true problem.
Stability is a condition on the solution of the numerical scheme.
Here the real numerical solution of the FD scheme is investigated and error
propagation and amplification are analyzed.
Convergence is a condition on the solution of the numerical scheme.
The real numerical solution is compared to the exact solution of the PDE. 

Basic strategy in stability analysis


We assume that on time level n all errors en,i are known. Let us investigate
the next time step n → n + 1. The new errors made in this single step are
neglected. Instead we investigate what influence the accumulated errors
en,i obtained so far have on the results on the next time level: they propagate
and cause errors – denoted by ẽn+1,i – on time level (n + 1).
A FD scheme is stable if and only if these propagated errors do not grow un-
boundedly with time, i.e.

|ẽn+1,i | ≤ |en,i | (∗)

The analysis of stability due to von Neumann is based on that property (∗).



Figure 24: Neumann’s stability analysis related to error propagation for 1 timestep.

Realization of that strategy


For a linear and consistent FD scheme and neglecting the (vanishing) trunca-
tion errors the exact solution u(tn , xi ) of the PDE satisfies the same difference
scheme as the Un,i and so does the error en,i . Hence the errors do evolve over
time in the same way as the numerical solution Un,i does.

 Example (FOU scheme)

Un+1,i = Un,i − c(Un,i − Un,i−1 )


⇒ [un+1,i + ẽn+1,i ] = [un,i + en,i ] − c([un,i + en,i ] − [un,i−1 + en,i−1 ])
⇒ ẽn+1,i = en,i − c(en,i − en,i−1 ) + O(∆t) + O(∆x)

We neglect the errors in the time step n → n + 1 and therefore cancel especially
the discretization error. We get

ẽn+1,i = en,i − c(en,i − en,i−1 )

Next we will always assume that the boundary conditions are periodic.
The problem of stability for a linear problem with constant coefficients is well
understood when the influence of boundaries can be neglected or removed.
This is the case either for an infinite domain or for periodic boundary conditions
on a finite domain.
In the latter case we consider that the computational domain on the x-axis of
length L is repeated periodically and the non-periodic solution u(t, x) on the
finite interval [0, L] is transformed into a periodic one.
In case of non-periodic Dirichlet boundary conditions the approach is also pos-
sible, because the error values are then zero at the boundaries (for details see
below) and thus the errors are periodic even if the solution is not.


 Example
In our advection example periodicity is no restriction because at the beginning
and at the end of a sufficiently long river (the spatial domain) the concentration
u(t, x) of the pollutant equals zero! 

We next interpolate the discrete error values en,i at xi by trigonometric interpo-


lation to obtain a continuous error function en (x) on time level n.

For that we assume a uniform spatial grid x0 , . . . x2N +1 (even number of 2N + 2


nodes, h = xj+1 − xj ) with Un,0 = Un,2N +1 and un,0 = un,2N +1 and hence en,0 =
en,2N +1 because of periodicity.

Rescaling the spatial interval [x0 , x2N +1 ] to [0, 2π] and applying the DFT, we
may write,

   en (x) = Σ_{k=−N}^{N} γn,k e^{jkx}   ⇒   en,i = en (xi ) = Σ_{k=−N}^{N} γn,k e^{jkxi} ,   j := √−1

and |γn,k | is an approximation of the amplitude of the k-th Fourier component.


en (x) can be regarded as the sum of 2N + 1 individual harmonic modes.

Analogously we apply the DFT to

   ẽn+1,i = Σ_{k=−N}^{N} γ̃n+1,k e^{jkxi} .

We insert the sums into the linear FD scheme for the errors, rearrange the
coefficients and obtain


N ( )
ejkxi · . . . = 0 , i = 0, . . . , 2N
k=−N

The resulting homogeneous linear system Ax = 0 has a nonsingular matrix


A = (ejkxi ). Therefore its solution is unique: The discretized equation (= FD
scheme), which is satisfied by the error, must also be satisfied by each indi-
vidual γn,k and γ̃n+1,k respectively. Therefore, an arbitrary harmonic can be
singled out and, when introduced into the scheme, stability requires that no
harmonic mode should be allowed to increase in time without bound.
So we replace en,i by γn,k in (∗) to get the von Neumann condition for stability

   G := max_{−N ≤k≤N} | γ̃n+1,k / γn,k | ≤ 1 .

G is called the amplification factor.


Remark
For a better understanding, let us suppose that a special example set of errors
exists such that only one harmonic mode with index k interpolates all errors at
the xi on time level n

en (x) = γn,k ejkx ⇐⇒ en,i = γn,k ejkxi , i = 0, 1, . . . , 2N

For a linear FD scheme the errors satisfy the same scheme (as we have seen),
thus e.g. for the FOU scheme

   γn,k e^{jkxi} − c ( γn,k e^{jkxi} − γn,k e^{jk(xi −∆x)} ) = ( 1 − c + c e^{−jk∆x} ) · γn,k e^{jkxi}
                                                            =: λ · γn,k e^{jkxi} = Σ_{l=−N}^{N} γ̃n+1,l e^{jlxi} ,   i = 0, 1, . . . , 2N

Therefore we have 2N + 1 conditions for the 2N + 1 unknowns γ̃n+1,l and


uniquely obtain γ̃n+1,k = λ · γn,k and γ̃n+1,l = 0 for l ̸= k (proof by insertion).
The effect of a single step of the numerical scheme in this special case is to
multiply each error γn,k ejkxi by the so-called magnification factor λ. In other
words, γn,k ejkx assumes the role of an eigenfunction, with the magnification
factor λ being the corresponding eigenvalue, of the linear operator governing
each step of the numerical scheme. Continuing, we find that the effect of m
further iterations of the scheme is to multiply the exponential by the mth power
of the magnification factor:

en+m (x) = λm en (x) for en (x) = γn,k ejkx

Thus, the stability of the scheme will be governed by the size of the magnifica-
tion factor and it is necessary that |λ| ≤ 1. 

Remarks
Von Neumann stability analysis is applicable without further modifications only
for linear PDEs with constant coefficients.
To apply this simple approach to multi-level schemes, additional considerations
are necessary.
Stability analysis hasn’t been worked out for most FD non-linear schemes,
because it heavily relies on the theory of linear difference equations.
If G ≤ 1 is always satisfied, the scheme is stable. That often occurs for implicit
schemes.
If G ≤ 1 can never be satisfied for ∆t > 0 the scheme is unconditionally unsta-
ble.
Mostly for explicit schemes, G ≤ 1 establishes a relation between the mesh
sizes ∆t and ∆x. In that case the scheme is called conditionally stable. 


 Example
We apply von Neumann stability analysis to the FOU scheme
v∆t
Un+1,i = Un,i − c · (Un,i − Un,i−1 ) = c · Un,i−1 + (1 − c) · Un,i , c :=
∆x

Step 1: Replace each instance of Un,i in the FD scheme by its corresponding


single DFT component.

γ̃n+1,k ejkxi = c · γn,k ejkxi−1 + (1 − c)γn,k ejkxi , xi−1 = xi − ∆x

Step 2: Rearrange to get G.


Dividing through by γn,k ejkxi gives,

   G = γ̃n+1,k / γn,k = (1 − c) + c · e^{−jk∆x}

Step 3: Use the constraint |G| ≤ 1 to obtain the condition for ∆t (this step could
be algebraically tricky).
Using the triangle inequality we estimate

   | (1 − c) + c · e^{−jk∆x} | ≤ |1 − c| + | c · e^{−jk∆x} | = |1 − c| + |c|

When c ∈ [0, 1] then |1 − c| = 1 − c and |c| = c, therefore we get

   | (1 − c) + c · e^{−jk∆x} | ≤ (1 − c) + c = 1 .

Hence the FOU scheme for the 1D linear advection equation is stable when
0 ≤ c ≤ 1 which means that

   ∆t ≤ ∆x / v .

The FOU scheme is said to be conditionally stable. 
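A quick numerical check of this result (a sketch; the sampling of the wavenumbers is
an arbitrary choice):

   import numpy as np

   def max_amplification_fou(c, n_modes=400):
       # G(k) = (1 - c) + c * exp(-j k dx), sampled over k*dx in [0, 2*pi)
       theta = np.linspace(0.0, 2.0 * np.pi, n_modes, endpoint=False)
       return np.max(np.abs((1.0 - c) + c * np.exp(-1j * theta)))

   print(max_amplification_fou(0.8))   # <= 1: Courant number inside the stable range
   print(max_amplification_fou(1.3))   # > 1: Courant number outside the stable range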

Remark
In a similar way we can prove (see (A41)) that for the 1D linear advection
equation the FTCS scheme is unconditionally unstable and therefore useless
even though it is consistent!
The Lax-Friedrich scheme for the 1D advection equation is conditionally sta-
ble (see (A42)): The stability condition is fulfilled if the Courant number
C := v∆t/∆x satisfies |C| < 1 (Courant-Friedrichs-Lewy or CFL condition). 

Comment on the CFL condition


It is a fundamental stability condition of most explicit schemes for wave and
convection equations and it expresses that the distance covered during the
time interval ∆t, by the disturbances propagating with speed v, should be lower
than the minimum distance ∆x between two mesh points. 


 Example
The one-dimensional heat equation ut = auxx defined on the spatial interval
[0, L] can be discretized by the FTCS scheme as

   Un+1,j = Un,j + r (Un,j+1 − 2Un,j + Un,j−1 ) ,   r = a ∆t / (∆x)2

Von Neumann stability analysis shows that

   r = a∆t / (∆x)2 ≤ 1/2

is the stability requirement for the FTCS scheme as applied to the one-
dimensional heat equation. In contrast to the advection equation, the FTCS
scheme is applicable here!
A numerical example:
For copper, a = 117 · 10−6 m2 /s. If we choose a thin rod of length 1 m with a
spatial resolution of 1 cm, then ∆x = 10−2 [m]. The stability restriction gives

   ∆t ≤ ∆x2 / (2a) = 10−4 /(2 · 117 · 10−6 ) ≈ 0.43 [s] .

For ∆x = 10−3 [m] (i.e. 1 mm), we get ∆t ≈ 0.0043 [s]. 
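The corresponding check in Python (values as in the copper example above):

   a = 117e-6                         # thermal diffusivity of copper in m^2/s
   for dx in (1e-2, 1e-3):            # 1 cm and 1 mm spatial resolution
       print(dx, dx**2 / (2 * a))     # maximum stable FTCS time step in seconds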

4.4.5 Difference Equations


To extend the von Neumann analysis to multi-level schemes, we need some
basic knowledge on difference equations. This chapter summarizes P. Henrici,
Elemente der numerischen Analysis 1, chap. 6, BI-HTB Nr. 551, 1964.

Definition
A linear difference equation of order m with constant coefficients is defined by

xn + a1 xn−1 + a2 xn−2 + . . . + am xn−m = bn , ai ∈ C , am ̸= 0

with {bn } ⊂ C a given sequence; the unknown sequence {xn } ⊂ C has to be


calculated. If bn = 0 ∀ n, the difference equation is called homogeneous. A
difference equation often is called recursion.
We further define X := {xn } and thus (X)n = xn and b := {bn }.
We introduce the additional sequence LX by the componentwise definition

(LX)n := xn + a1 xn−1 + a2 xn−2 + . . . + am xn−m


Remark
X can be regarded as a generalization of a vector with countable, but infinitely
many components. Multiplication by a scalar (aX)n = a(X)n and vector addi-
tion (X + Y )n = (X)n + (Y )n are defined componentwise.


Then the operator L is linear:

L(aX + bY ) = aLX + bLY


(A) Special solutions of homogeneous difference equations of order 2
We start the discussion with the simple case m = 2:

xn + a1 xn−1 + a2 xn−2 = 0 (∗)

This corresponds to homogeneous linear ODEs x′′ + a1 x′ + a2 x = 0, which al-


ways has (special) solutions exp(rt) with r properly chosen.
Motivated by this similarity and replacing the continuous variable t by the dis-
crete variable n, we try to find special solutions of the difference equation with
the structure xn = er·n = z n , z := er .
Insertion yields

(LX)n = z n + a1 z n−1 + a2 z n−2 = z n−2 (z 2 + a1 z + a2 ) = 0

For this we get either the trivial solution z = 0 or z is a root of the so-called
characteristic polynomial p(z) := z 2 + a1 z + a2 .

Theorem
Let z1 ̸= z2 be two different (complex) roots of the characteristic polynomial
p(z), then the two sequences

(X (1) )n = z1n and (X (2) )n = z2n

are solutions of LX = ⃗0.


If z1 = z2 , then
(X (1) )n = z1n and (X (2) )n = n · z1n−1
are solutions of LX = ⃗0.

Proof:
Case z1 ̸= z2 is clear. For z1 = z2 we know that not only p(z1 ) = 0, but also
p′ (z1 ) = 0. Differentiation of (LX)n = z n−2 p(z) yields

nz n−1 + a1 (n − 1)z n−2 + a2 (n − 2)z n−3 = (n − 2)z n−3 p(z) + z n−2 p′ (z)

For z1 the right hand side is zero and thus nz1n−1 is a solution of the difference
equation (∗): With the substitution xn = nz1n−1 into the left side we get again

xn + a1 xn−1 + a2 xn−2 = 0



Example

xn − 2xn−1 + xn−2 = 0 ⇒ p(z) = z 2 − 2z + 1 ⇒ z1 = z2 = 1

We get the solutions (X (1) )n = 1n = 1 and (X (2) )n = n · 1n−1 = n; insertion


proves that they are solutions. 

Lemma
Let X (1) , X (2) be two solutions of LX = ⃗0 and c1 , c2 ∈ C arbitrary constants.
Then also X := c1 X (1) + c2 X (2) is a solution (proof via linearity).

Remark
Situation very similar to linear ODEs. Let e.g. z1 and z2 = z̄1 be complex conjugate
roots of p(z). Then (X (1) )n = z1^n and (X (2) )n = (z̄1 )^n are complex solutions of LX = ⃗0.
With the corollary we get the real-valued solutions

   (Y (1) )n = (1/2) ( z1^n + (z̄1 )^n ) = Re z1^n ,   (Y (2) )n = Im z1^n

By this e.g. from (X (1) )n = e^{inφ} we get (Y (1) )n = cos(nφ) and (Y (2) )n = sin(nφ).

Lemma
Let bn be defined for n ≥ n0 and N ≥ n0 + 1.
Then the difference equation LX = b has exactly one solution, which takes
preset values (vorgegebene Werte) for xN −1 and xN .

Proof: (by contradiction)


Assume that X (1) and X (2) are 2 different solutions with (X (1) )N −1 = (X (2) )N −1
and (X (1) )N = (X (2) )N .
Then the sequence D = {dn } := X (1) − X (2) solves LX = ⃗0 and dN −1 = dN = 0.
X (1) ̸= X (2) , so there exists (w.l.o.g.) a minimum index n∗ > N ∋ dn∗ ̸= 0 (in
case n∗ < N − 1 we proceed analogously).
D is a solution, i.e. dn∗ + a1 dn∗ −1 + a2 dn∗ −2 = 0 with dn∗ −1 = dn∗ −2 = 0, hence dn∗ = 0 – a contradiction to dn∗ ̸= 0. 

Remark
From the lemma above we get: If X is solution of LX = ⃗0 with xN −1 = xN = 0
(there exist two successive elements which are zero), then X = ⃗0. 


Theorem
Let X (1) and X (2) be two solutions of LX = ⃗0.
Then every solution X of LX = ⃗0 can be uniquely written as X = c1 X (1) + c2 X (2)
⇐⇒ the Wronski determinant

   wn :=  | xn(1)     xn(2)   |
          | xn−1(1)   xn−1(2) |

is non-zero for at least one n ∈ ZZ.

Proof:
X = c1 X (1) + c2 X (2) ⇒ the system
   c1 xn(1) + c2 xn(2) = xn
   c1 xn−1(1) + c2 xn−1(2) = xn−1

for the determination of (c1 , c2 ) is uniquely solvable ⇔ wn ̸= 0.


”⇐”: Lemma 1 → X and c1 X (1) + c2 X (2) are solutions; Lemma 2 → because
of the condition in brackets both solutions are identical. 

Remark
We are now able to solve initial value problems for the difference equation
LX = ⃗0. We have to find two special solutions X (1) , X (2) with non-vanishing
Wronski determinant. c1 , c2 are determined by the initial condition.
Possible initial conditions are x−1 = 1 ∧ x0 = 1 (two components of the solution
sequence X are given). 

Example
Let the characteristic polynomial p of LX = ⃗0 have two different roots z1 , z2 .
We determine the Wronski determinant of the corresponding special solutions
   xn(1) = z1^n   and   xn(2) = z2^n

   wn :=  | z1^n       z2^n     |  = (z1 z2 )^{n−1} (z1 − z2 ) ̸= 0   ∀ n
          | z1^{n−1}   z2^{n−1} |

⇒ the general solution of the difference equation is given by c1 z1n + c2 z2n . 

Definition
Two sequences X (1) , X (2) – which are not necessarily solutions of LX = ⃗0 –
are called linearly dependent, if ∃ (c1 , c2 ) ̸= (0, 0) ∋ c1 X (1) + c2 X (2) = ⃗0.
Otherwise they are linearly independent.


Theorem
Let X (1) and X (2) be two solutions of LX = ⃗0 and W = {wn } the sequence of
the Wronski determinants.

(1) If X (1) and X (2) linearly dependent, then W = ⃗0, i.e. wn = 0 ∀ n.


(2) If wN = 0 for at least one N , then X (1) and X (2) are linearly dependent.
(3) If wN ̸= 0 for one N , then wn ̸= 0 ∀ n.

(B) Inhomogeneous difference equations of order 2


We consider the system LX = b or

xn + a1 xn−1 + a2 xn−2 = bn (∗∗)

As in the homogeneous case, the results are similar to those for linear ODEs.

Theorem
Let X (1) and X (2) be two linearly independent solutions of LX = ⃗0 and Y a
special (”partikuläre”) solution of LY = b.

Then every solution X of LX = b can be written as

X = Y + c1 X (1) + c2 X (2)

with c1 , c2 ∈ C properly chosen.

Proof:
For Y = {yn } we get by assumption: yn + a1 yn−1 + a2 yn−2 = bn .
Let be X an arbitrary solution of (∗∗) and let us define the difference D :=
X − Y . For that we obtain

dn + a1 dn−1 + a2 dn−2 = 0

and thus D solves the homogeneous difference equation. Because of the lin-
ear independence, D can be written as D = c1 X (1) + c2 X (2) . With X = Y + D
we get the claim. 

Remark
The inhomogeneous problem is reduced to the determination of a special so-
lution Y .


As in the linear ODE case, we can either find a proper ansatz heuristically
(e.g. if the bn are polynomials, try for xn polynomials of the same degree and
determine the free constants) or use the general method of the variation of
parameters (”Variation der Konstanten”). 

Theorem (variation of parameters)


Let X (1) and X (2) be two linearly independent solutions of LX = ⃗0 and W =
{wn } the sequence of their Wronski determinants.

Then a special solution of LX = b is obtained from

          n   | xn(1)     xn(2)   |    bi
   xn =   Σ   |                   |  · ——
         i=0  | xi−1(1)   xi−1(2) |    wi

for n ≥ 0.

Proof: by insertion. 

Example

Determine a special solution of

xn − 2xn−1 + xn−2 = 1

We already treated the homogeneous problem in a previous example:

xn − 2xn−1 + xn−2 = 0 ⇒ p(z) = z 2 − 2z + 1 ⇒ z1 = z2 = 1

and obtained the linear independent solutions (X (1) )n = 1n = 1 and (X (2) )n =


n · 1n−1 = n for the double root.
For the Wronski determinants in case of a double root we get

   wn =  | z1^n       n z1^{n−1}        |  = −z1^{2n−2} = −1   (since z1 = 1)
         | z1^{n−1}   (n − 1) z1^{n−2}  |

Using the formula from the theorem on the variation of parameters and insert-
ing bn = 1 (n = 0, 1, 2, . . .) we obtain
   xn = Σ_{i=0}^{n} det( 1  n ; 1  i−1 ) · bi /wi = Σ_{i=0}^{n} (n + 1 − i)  =  Σ_{k=1}^{n+1} k  =  (n + 1)(n + 2) / 2      (substituting k := n + 1 − i)
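A quick sanity check of this special solution in Python (a sketch, not part of the
lecture material):

   def x(n):
       # special solution x_n = (n + 1)(n + 2) / 2 of x_n - 2 x_{n-1} + x_{n-2} = 1
       return (n + 1) * (n + 2) // 2

   print(all(x(n) - 2 * x(n - 1) + x(n - 2) == 1 for n in range(2, 50)))   # True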


(C) Generalization to difference equations of order m

 Structure: (LX)n = xn + a1 xn−1 + . . . + am xn−m


 Every linear combination of two solutions of LX = ⃗0 again is a solution.
 Let be bn defined for n ≥ n0 and N ≥ n0 + m. The difference equa-
tion LX = b has exactly one solution X which takes preset values for
xN , xN −1 , . . . , xN −m+1 .
Again X ≡ ⃗0, if xn = 0 for m successive values.
 Let X (1) , . . . , X (m) be solutions of LX = ⃗0. Then the Wronski determinant
wn is defined by

          | xn(1)        xn(2)        · · ·   xn(m)      |
          | xn−1(1)      xn−1(2)      · · ·   xn−1(m)    |
   wn :=  |   ...          ...                  ...      |
          | xn−m+1(1)    xn−m+1(2)    · · ·   xn−m+1(m)  |

 Let X (1) , . . . , X (m) be solutions of LX = ⃗0 and wn the Wronski determinant.

If the X (1) , . . . , X (m) are linearly dependent, then W = (wn ) = ⃗0.
If one element of W equals zero, then the X (1) , . . . , X (m) are linearly de-
pendent.
 Let X (1) , . . . , X (m) be linearly independent solutions of LX = ⃗0 and Y a spe-
cial (”partikuläre”) solution of LY = b.

Then every solution X of LX = b can be written as

X = Y + c1 X (1) + . . . + cm X (m)

with c1 , . . . , cm ∈ C properly chosen.


 A special solution Y again can be obtained via the strategy ”variation of
parameters”.
Let X (1) , . . . , X (m) be linearly independent solutions of LX = ⃗0 and W =
{wn } the sequence of their Wronski determinants. Let B = (bn ) be well-
defined for n ≥ 0.

Then a special solution of LX = b is obtained from

          n   | xn(1)        xn(2)        · · ·   xn(m)      |
   xn =   Σ   | xi−1(1)      xi−1(2)      · · ·   xi−1(m)    |    bi
         i=0  |   ...          ...                  ...      |  · —— ,   n ≥ 0
              | xi−m+1(1)    xi−m+1(2)    · · ·   xi−m+1(m)  |    wi

The final question now is: How to obtain a set {X (1) , . . . , X (m) } of linearly inde-
pendent solutions? The approach is similar to the case m = 2 and leads to the
following theorem.


Theorem
Given is the linear homogeneous difference equation of order m

   xn + a1 xn−1 + . . . + am xn−m = 0

Let p(z) = z^m + a1 z^{m−1} + . . . + am−1 z + am be the characteristic polynomial of
LX = ⃗0 and let z1 , . . . , zk (k ≤ m) be the k different roots of p(z), the i-th root zi
having multiplicity li + 1 (so that Σ li = m − k).
Then the m sequences X (j) with

   xn(j) (= xn ) = zi^n                                     for p = 0 ,
   xn(j) (= xn ) = n(n − 1) · · · (n − p + 1) zi^{n−p}      for p = 1, . . . , li ,      i = 1, 2, . . . , k ,

form a full set of m linearly independent solutions – i.e. a basis – of LX = ⃗0.

Alternatively a second set of m linearly independent sequences can be ob-


tained by linear combinations of the elements of the just defined set. The new
and simpler set reads as
   xn = n^p zi^n ,   p = 0, 1, . . . , li ,   i = 1, 2, . . . , k .

Example
Consider the difference equation
   xn − 2xn−2 + xn−4 = 0   ⇒   p(z) = z 4 − 2z 2 + 1 = (z + 1)2 (z − 1)2
with two double roots z1 = 1 and z2 = −1. From the theorem we get
(1) (2) (3) (4)
xn = 1 , xn = n , xn = (−1)n , xn = n · (−1)n

Example
Assume that z is a fourfold root of p(z). We want to prove that in this case
X = {n3 z n } is a solution of the difference equation (this illustrates the existence
of an alternative set in the above theorem).
For that we try to represent X as a linear combination of the original solutions:
n(n − 1)(n − 2) = n3 − 3n2 + 2n
n(n − 1) = n2 − n
n = n
and n3 = n(n − 1)(n − 2) + 3n(n − 1) + n. From that we get
n3 z n = z 3 [n(n − 1)(n − 2)z n−3 ] + 3z 2 [n(n − 1)z n−2 ] + z[nz n−1 ]
This is the required linear combination of solutions of the first set in the above
theorem to obtain a solution of the second set. 


4.4.6 Von Neumann Stability Analysis Extendend

Test example
Let us again consider the (explicit) FOU scheme

   Un+1,i = c · Un,i−1 + (1 − c) · Un,i ,   µ := e^{jk∆x}

   γ̃n+1,k = c (1/µ) γn,k + (1 − c) γn,k   ⇒   0 = γ̃n+1,k − ( c/µ + (1 − c) ) γn,k

This leads to the difference equation

   xn+1 − ( c/µ + (1 − c) ) xn = 0

For the characteristic polynomial and its root we obtain

   p(z) = z − ( c/µ + (1 − c) )   ⇒   |z1 | = | c/µ + (1 − c) | ≤ 1   for c ∈ [0, 1] ,

because for those c we get |1 − c| = 1 − c and |c| = c.


We obtain the basis solution X (1) with

(X (1) )n = xn = z1n

For a unique solution of our problem we still have to add the initial condition:

   xn+p = α z1^{n+p} ,   α ∈ IR
   p = 0 :   xn = γn,k = α z1^n
   ⇒   xn+p = γ̃n+p,k = γn,k · z1^p

Starting with xn = γn,k we get the solution xn+p = γn,k · z1^p and thus the error
component induced by γn,k is damped for increasing p only for |z1 | < 1, which
is the case for

   0 < ∆t < ∆x / v .

Example (wave equation and implicit multi-level scheme)
Let us consider the wave equation utt = a2 uxx on the (normalized) spatial in-
terval [0, 2π] with periodic boundary conditions.
We define a uniform spatial grid 0 =: x0 , . . . x2N +1 := 2π (even number of 2N + 2
nodes, ∆x = xj+1 − xj ) with un,0 = un,2N +1 (because of periodicity) and uniform
stepsize ∆t in time.
For the implicit difference scheme we choose

   ( Un+1,i − 2Un,i + Un−1,i ) / ∆t2 = a2 · ( Un+1,i+1 − 2Un+1,i + Un+1,i−1 ) / ∆x2

and assume that Un,0 = Un,2N +1 too. Hence for the error terms en,i = Un,i − un,i
we obtain the same difference scheme because of linearity and we get en,0 =
en,2N +1 .
We apply the DFT to interpolate the error values on the n-th level and write

   en (x) = Σ_{k=−N}^{N} γn,k e^{jkx}   ⇒   en,i = en (xi ) = Σ_{k=−N}^{N} γn,k e^{jkxi} ,   j := √−1

|γn,k | is an approximation of the amplitude of the k-th Fourier component. en (x)


can be regarded as the sum of 2N + 1 individual harmonic modes.
We are now interested in the influence of errors on the levels n − 1 and n on the
error on level n + 1 (error propagation). Due to the error interpolation by DFT
all existing errors on the levels n − 1 and n can affect the new error. Additional
errors produced in the current step are neglected (as usual). It is required that
all the amplitudes |γn,k | of the single error modes are damped in the following
time steps. We will write γn+1,k instead of more precisely γ̃n+1,k .
Insertion of the error terms into the difference scheme with c := |a| · ∆t/∆x and
µk := ejk∆x yields


   Σ_{k=−N}^{N} e^{jkxi} · ( [ γn+1,k − 2γn,k + γn−1,k ] − c2 [ γn+1,k · µk − 2γn+1,k + γn+1,k /µk ] ) = 0

for i = 0, . . . , 2N . This again is a homogeneous linear system with zero as the


unique solution; therefore the content in the brackets has to vanish and we get
the recursion
   γn+1,k · ( 1 − c2 µk + 2c2 − c2 /µk ) − 2γn,k + 1 · γn−1,k = 0

With

   µk − 2 + 1/µk = (−4) · ( ( e^{jk∆x/2} − e^{−jk∆x/2} ) / (2j) )^2 = −4 sin^2 ( k∆x/2 )

and sk := sin( k∆x/2 ) the characteristic polynomial associated to the difference
equation reads

   ( 1 + 4c2 sk^2 ) z 2 − 2z + 1 = 0

For the roots we get, using |z|2 = z · z̄,

   z1,2 (k) = ( 2 ± √( 4 − 4(1 + 4c2 sk^2) ) ) / ( 2 (1 + 4c2 sk^2) )
            = ( 1 ± √( −4c2 sk^2 ) ) / ( 1 + 4c2 sk^2 ) = ( 1 ± j · 2c · |sk | ) / ( 1 + 4c2 sk^2 )

   ⇒   | z1,2 (k) | = 1 / √( 1 + 4c2 sk^2 ) < 1   ∀ c (and all k with sk ̸= 0)


Because the basic solution is xn = ( zi (k) )^n for single roots, in that case the
scheme is stable.
Double roots only exist for sk = 0, which is impossible for k ̸= 0, because

   ∆x = 2π/(2N + 1)   and   k ∈ {−N, . . . , N }   and   sk := sin( k∆x/2 ) .

For stability considerations, we can stop here. If in addition we are interested in
the unique solution compatible with the initial conditions, we make the ansatz
of a linear combination of the basis solutions and insert the initial conditions

   xn+p = α ( z1 (k) )^{n+p} + β ( z2 (k) )^{n+p}
   p = −1 :   γn−1,k = xn−1 = α ( z1 (k) )^{n−1} + β ( z2 (k) )^{n−1}
   p = 0 :    γn,k = xn = α ( z1 (k) )^{n} + β ( z2 (k) )^{n}

From these two equations we determine the unknown constants α and β. 
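A small numerical confirmation of the root moduli (a sketch; the values of c and k∆x
are arbitrary samples):

   import numpy as np

   for c in (0.5, 2.0, 10.0):
       for theta in (0.1, 1.0, 3.0):          # theta = k * dx
           s = np.sin(theta / 2.0)
           roots = np.roots([1.0 + 4.0 * c**2 * s**2, -2.0, 1.0])
           # both roots have modulus 1/sqrt(1 + 4 c^2 s_k^2) < 1
           print(np.abs(roots), 1.0 / np.sqrt(1.0 + 4.0 * c**2 * s**2))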

Summary
Von Neumann stability analysis can be extended to multi-level schemes, as can
be seen in the example(s) above. Here difference equations for the amplitudes
γn,k of the error modes are formulated and the zeros zi (k) of the associated
characteristic polynomials are calculated. The linearly independent sequences
that solve these difference equations should be damped.

In case of single roots, the finite difference scheme is stable ⇔ max_{i,k} | zi (k) | ≤ 1.
In case of ”< 1”, all error modes are damped.

A sufficient condition for instability is max_{i,k} | zi (k) | > 1.

Because the approach is based on the theory of linear difference equations,


nonlinear finite difference schemes cannot be analyzed by the von Neumann
approach.

4.4.7 Implicit Schemes – Crank-Nicolson Scheme


FD schemes are called explicit if data at the next time level is obtained from
an explicit formula involving data from previous time levels only. This normally
leads to a (stability) restriction on the maximum allowable time step, ∆t.
In implicit schemes data from the next time level occurs on both sides of the dif-
ference scheme that necessitates solving a system of linear equations. There
is no stability restriction on the maximum time step ∆t which may be much
larger than in an explicit scheme for the same problem, as we have seen for
the above example of the wave equation.


We go back to the advection equation and choose

   δx Un,i = α (Un,i+1 − Un,i−1 ) / (2∆x) + (1 − α) (Un+1,i+1 − Un+1,i−1 ) / (2∆x) ,   α ∈ [0, 1] ,   δxx Un,i = 0

This is a weighted average of central difference approximations to spatial
derivatives at time levels n and n + 1.

For α = 1/2 we obtain the famous Crank-Nicolson scheme:


Figure 25: Stencil for the Crank-Nicolson Scheme.

   Un+1,i = Un,i − (c/2) · ( (Un,i+1 − Un,i−1 )/2 + (Un+1,i+1 − Un+1,i−1 )/2 ) ,   c = v∆t/∆x
The scheme has a truncation error of O(∆t) + O(∆x2 ). Ghost values are re-
quired at both left and right ends of the computational domain.
The scheme is implicit so values at time level n + 1 are found by solving a
tridiagonal system of linear equations: Rearranging so that data from the same
time level is on the same side gives,
−cUn+1,i−1 + 4Un+1,i + cUn+1,i+1 = cUn,i−1 + 4Un,i − cUn,i+1 =: dn,i
The definition of dn,i reflects that the data at time level n is assumed known.
Un+1,0 and Un+1,N +1 on the left hand side are ghost values which may be
known directly (or can be calculated in terms of neighbouring values depending
on the type of boundary condition given in the problem).
This system is expressed as the matrix equation

                     (  4   c                  )   ( Un+1,1   )
                     ( −c   4   c              )   ( Un+1,2   )
   Ah U (n+1)   =    (     −c   4   c          ) · (   ...    )   =   d(n)
                     (          ..  ..   ..    )   (          )
                     (              −c   4   c )   ( Un+1,N −1)
                     (                  −c   4 )   ( Un+1,N   )

This tridiagonal linear system is solved at each time step and the solution up-
dated iteratively.
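A sketch of one Crank-Nicolson time step in Python; setting the ghost values to zero
is an assumption made here purely for illustration (in general they follow from the
boundary conditions of the problem):

   import numpy as np
   from scipy.linalg import solve_banded

   def crank_nicolson_step(U, c):
       # U holds the interior values U_{n,1..N}; ghost values U_{n,0} = U_{n,N+1} = 0 assumed
       N = U.shape[0]
       ab = np.zeros((3, N))          # banded storage of A_h for solve_banded
       ab[0, 1:] = c                  # super-diagonal  +c
       ab[1, :] = 4.0                 # main diagonal    4
       ab[2, :-1] = -c                # sub-diagonal    -c
       d = 4.0 * U.copy()             # d_{n,i} = c U_{n,i-1} + 4 U_{n,i} - c U_{n,i+1}
       d[1:] += c * U[:-1]
       d[:-1] -= c * U[1:]
       return solve_banded((1, 1), ab, d)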



Figure 26: Comparison of numerical (+) and exact solutions (o) to the 1D linear ad-
vection equation using Crank-Nicolson scheme with v = 0.5, c = 2.0, 15 time steps.

Comment on advection and convection


Convection: a flow that combines diffusion and advection.
Diffusion: non-directional molecular transport of mass, heat, or momentum.
Advection: directional bulk transport of mass, heat, or momentum. 

 Example
The advection-diffusion equation belongs to the class of parabolic PDEs. In 1
spatial dimension it is,

ut + vux = Kx uxx , u = u(t, x) .

Kx is called the diffusion coefficient (in the x direction). If Kx = 0 then we get


again the linear advection equation which we studied so far.
Using our previous interpretation of the linear advection equation in which u =
u(t, x) is a river pollutant concentration and v is the speed of the flow, we now
get a more realistic description of pollutant transport. Not only does the initial
pollutant move downstream with velocity v, the pollutant also diffuses into the
surrounding water at rate Kx (the presence of second order spatial derivatives
often indicates a diffusive process).

4.4.8 Matrix Stability Analysis


The linear FD scheme can be rewritten as a linear difference equation

AU (n+1) = BU (n) with A, B ∈ IRN ×N , U (n) = (Un,1 , . . . , Un,N )T

For a consistent scheme and neglecting the (vanishing) truncation error the
exact solution of the PDE satisfies the same scheme and so does the error
vector

Au(n+1) = Bu(n) ⇒ Ae(n+1) = Be(n) , e(n) := U (n) − u(n)



Figure 27: Time evolution of the exact solutions for pure advection (le.) and
advection-diffusion (ri.). Each of the plots contains the initial profile (red) and two
later solutions (blue and green).

From that we get

   e(n+1) = A−1 B e(n)   ⇒   ∥e(n+1) ∥ = ∥A−1 B e(n) ∥ ≤ ∥A−1 B∥ · ∥e(n) ∥

Hence the FD scheme is stable if the error is not increasing (same idea as in
von Neumann stability analysis) and this is true if

   ∥A−1 B∥ ≤ 1

The matrix norm used here is induced by the vector norm. Often the Euclidean
norm is used. Hence the stability of a (linear) FD scheme can be investigated
by finding the norm of the matrix A−1 B.
This is the matrix method for stability and may be quite difficult to implement.
It should be noted that there are many definitions of norms and a FD scheme
may be stable in one norm but not in another.
The sharpest statements we obtain using the spectral radius, but this is pos-
sible only if the 2-norm and the spectral radius coincide for the matrix under
investigation. 
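As an illustration of the matrix method (a sketch, not the lecture example), one can
assemble the Crank-Nicolson matrices from chap. 4.4.7 and evaluate the relevant
quantities numerically:

   import numpy as np

   def cn_matrices(N, c):
       # A: diag 4, sub -c, super +c;  B: diag 4, sub +c, super -c (cf. chap. 4.4.7)
       A = 4.0 * np.eye(N) + c * np.diag(np.ones(N - 1), 1) - c * np.diag(np.ones(N - 1), -1)
       B = 4.0 * np.eye(N) - c * np.diag(np.ones(N - 1), 1) + c * np.diag(np.ones(N - 1), -1)
       return A, B

   A, B = cn_matrices(50, 2.0)
   M = np.linalg.solve(A, B)                        # M = A^{-1} B
   print(np.linalg.norm(M, 2))                      # induced 2-norm
   print(np.max(np.abs(np.linalg.eigvals(M))))      # spectral radius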

 Example
...


4.5 Multigrid Methods
To define and analyze basic properties of multigrid methods we use the FD
approximation of a 1D Dirichlet boundary value problem as a simple example
problem.

 Example 1 (cf. (A30) – smoothen the error)


We consider the BVP
−uxx = f for Ω = ]0, 1[
u(0) = a, u(1) = b on ∂Ω
On [0, 1] we define a uniform mesh Ωh with N + 2 gridpoints and a mesh size
h := 1/(N + 1)
Ωh := {xj | xj = j · h, j = 0, . . . , N + 1}
For the O(h2 )-approximation of uxx we choose

   uxx (x) = ( u(x + h) − 2u(x) + u(x − h) ) / h2 + O(h2 )

and we need u ∈ C 4 ([0, 1], IR) at minimum! With fj := f (xj ) and the numerical
approximation Uj,h of u(xj ) we obtain the following tridiagonal system Ah Uh =
fh of N equations for the calculation of the approximate solutions U1,h , . . . UN,h

        ( −2   1                )   ( U1,h )         ( f1 + a/h2 )
        (  1  −2   1            )   ( U2,h )         ( f2        )
    −   (      1  −2   ..       ) · ( U3,h )  = h2 · ( f3        )
        (          ..  ..   1   )   (  ...  )        (  ...      )
        (               1  −2   )   ( UN,h )         ( fN + b/h2 )

where Ah denotes the matrix on the left (including the leading minus sign), Uh the
vector of unknowns and fh the complete right hand side.

We denote the sparse matrix by Ah , because even though h does not appear directly
in Ah , its size is affected by the choice of h!

We want to solve this problem iteratively by the damped Jacobi method

   Uh^(ν+1) = MJ (ω, h) Uh^(ν) + ω Dh^{−1} fh   with   MJ (ω, h) := ( I − ω D^{−1} Ah ) ,   ω ∈ ]0, 1]

Consider the eigenvalue problem for the damped Jacobi iteration

   MJ (ω, h) vh^(k) = λh^(k) vh^(k)

For the EWs λh^(k)(ω) and the EVs vh^(k) we get

   λh^(k)(ω) = 1 − ω · ( 1 − cos(kπh) ) ,

   vh^(k) = ( sin(kπhj) )_{j=1,...,N} = ( sin(kπ · 1 · h), . . . , sin(kπ · N · h) )^T ,   k = 1, . . . , N .

vh^(k) can be regarded as the vector of function values obtained by evaluating a
smooth function gk (x) := sin(kπ · x) at the interior grid points j · h, j = 1, . . . , N .


Does that iterative solver converge to a fixed-point, i.e. Uh^(ν) → Uh∗ for ν → ∞?

For the spectral radius, we have that ϱ(MJ (ω, h)) < 1 ∀ ω ∈ ]0, 1], because

   ϱ(MJ (ω, h)) = max_{k=1,...,N} | λh^(k)(ω) |

and thus convergence is guaranteed.


We see: Convergence depends on the method chosen (here: damped Jacobi)
as well as – because of the structure and EWs of the matrix Ah – on the
problem this method is applied to (here: 1D Laplace).
Please mind that due to the approximation of the derivatives by FD in the dif-
ferential equation, Uh∗ is not equal to the exact solution u at the grid points!

Which spectral radius do we get e.g. for ω = 1 (classical Jacobi method)?

For ω = 1 the spectral radius is for h → 0

   ϱ(MJ (1, h)) = max_{k=1,...,N} | 1 − 1 + cos(kπh) | = cos(πh) = 1 − (1/2) π 2 h2 + O(h4 )

The convergence of our iterative solver deteriorates as h → 0, because ϱ → 1.
The finer the grid is, the worse are the guaranteed convergence properties.

What do we obtain from a detailed analysis of the iteration error?

Let εh^(ν) := Uh^(ν) − Uh∗ denote the error after the ν-th iteration cycle. The EVs
form a complete basis of the IRN , therefore we can decompose the initial error
at ν = 0

   εh^(0) = Uh^(0) − Uh∗ = Σ_{k=1}^{N} ek,h^(0) · vh^(k)
After one iteration cycle we get using the EW-/EV-property

   εh^(1) = MJ (ω, h) εh^(0) = Σ_{k=1}^{N} λh^(k)(ω) · ek,h^(0) · vh^(k)

and after m iteration cycles we get (see p.15)

   εh^(m) = MJ (ω, h)^m εh^(0) = Σ_{k=1}^{N} ( λh^(k)(ω) )^m · ek,h^(0) · vh^(k)

We see: The smaller the k-th EW λh^(k)(ω) is in absolute value, the faster the k-th
component ek,h^(0) of the error εh^(0) is damped. Damping is only weak for EWs close to 1.

How to damp at least one half of the error components efficiently for fine grids?

Let us divide the EWs into two groups: Group 1 contains the λh^(k) with 1 ≤ k <
N/2 and group 2 contains the λh^(k) with N/2 ≤ k ≤ N .


We want to choose the ω such that the error components which belong to
N/2 ≤ k ≤ N are damped as well as possible:

   µ := max { | λh^(k) | , N/2 ≤ k ≤ N } < max { 1 − ω , | 1 − ω(1 − cos π) | } = max { 1 − ω , | 1 − 2ω | }

Using this result we find that the optimal µ∗ = 1/3 is obtained using ω ∗ = 2/3.
Each gk (x) that belongs to group 2 is more rapidly oscillating than each gk (x)
that belongs to group 1. So for our example system by the choice of ω ∗ = 2/3
we try to damp the so-called high-frequency components belonging to EWs
from group 2 as good as possible, whereas the so-called low-frequency com-
ponents belonging to EWs from group 1 are not included into the optimization
procedure.
We also observe that the worst-case situation is obtained for k = 1 which
belongs to the g1 (x) with the lowest frequency: For N ≫ 1 we get λh^(1)(ω) ≈ 1
and extremely low damping of the respective error component e1,h^(0).
On the other hand after a few cycles m we get

   | ek,h^(m) | < (1/3)^m | ek,h^(0) | ≪ | ek,h^(0) |

for all high-frequency components (group 2 with N/2 ≤ k ≤ N ). For this reason,
for all high-frequency components (group 2 with N/2 ≤ k ≤ N ). For this reason,
although the global error decreases slowly per iteration step, it is smoothed out
very quickly – i.e. components which belong to EVs/gk (x) with high oscillations
are damped – and this process does not depend on h! 
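A small numpy experiment illustrating this smoothing property (a sketch; N, ω and
the number of sweeps are arbitrary choices):

   import numpy as np

   N = 63
   h = 1.0 / (N + 1)
   omega = 2.0 / 3.0
   k = np.arange(1, N + 1)

   lam = 1.0 - omega * (1.0 - np.cos(k * np.pi * h))   # EWs of M_J(omega, h)
   print(np.max(np.abs(lam)))                          # close to 1: slow overall convergence
   print(np.max(np.abs(lam[k >= N // 2])))             # about 1/3: high frequencies damped fast

   m = 5                                               # damping factors after m sweeps
   print(np.abs(lam**m)[:3], np.abs(lam**m)[-3:])      # low-frequency vs. high-frequency modes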

 Example 2 (two-grid method)


We consider again the BVP
−uxx = f for Ω = ]0, 1[
u(0) = a, u(1) = b on ∂Ω
and define two uniform meshes Ωh (fine grid, N + 1 gridpoints with N even)
and ΩH (coarse grid) with e.g. H = 2h. Using the same FD discretization as in
the last example we get two discretized systems
Ah Uh = fh and AH UH = fH
of different dimension (N − 1 and N/2 − 1 in our example), but with the same
(tridiagonal) structure of Ah and AH .

Basic idea
The two-grid strategy combines two complementary schemes. The high-fre-
quency components of the error are reduced by applying iterative methods like
Jacobi or Gauss-Seidel schemes. For this reason these methods are called
smoothers.
On the other hand, the low-frequency error components are effectively reduced
by a coarse-grid correction procedure.


Realization of the two-grid idea in 7 steps

(1) Presmoothing steps on the fine grid:
Start with Uh^(0) and try to solve the system Ah Uh = fh iteratively (e.g. by
damped Jacobi), stop after m steps

   Uh^(0) → Uh^(1) → . . . → Uh^(m)

The high-frequency components of the initial error εh^(0) := Uh^(0) − Uh∗ are
efficiently damped in εh^(m) := Uh^(m) − Uh∗ , i.e. only the smooth(er) functions
gk (x) (see last example) contribute to the error εh^(m) significantly.

(2) Calculation of the residual
The exact error εh^(m) is the solution of the following equation which is
equivalent to the original problem

   Ah Uh∗ = fh   ⇔   Ah ( Uh^(m) − εh^(m) ) = fh   ⇔   Ah εh^(m) = Ah Uh^(m) − fh =: rh^(m)

The residual rh^(m) can be simply calculated.
To analyze the residual, we also calculate the EWs µh^(k) and EVs zh^(k) of
Ah . For that we use that D = 2 · I for our special matrix Ah in the damped
Jacobi iteration:

   MJ (ω, h) vh^(k) = λh^(k) vh^(k)   ⇔   D^{−1} Ah vh^(k) = ( (1 − λh^(k)) / ω ) vh^(k)
                                      ⇔   Ah vh^(k) = 2 ( (1 − λh^(k)) / ω ) vh^(k) = 2 ( 1 − cos(kπh) ) · vh^(k)

Therefore the EVs zh^(k) = vh^(k) are the same as those of MJ (ω, h), only the
EWs have to be transformed: µh^(k) = 2 ( 1 − cos(kπh) ).

Insertion into the residual gives

   rh^(m) = Ah Uh^(m) − fh = Ah Uh∗ + Ah εh^(m) − fh
          = Σ_{k=1}^{N−1} ( λh^(k)(ω) )^m · ek,h^(0) · Ah vh^(k)
          = Σ_{k=1}^{N−1} ( λh^(k)(ω) )^m · µh^(k) · ek,h^(0) · vh^(k)

Also the residual is smooth(er) with a similar damping of the high-
frequency error components (0 < µh^(k) < 4); by the µh^(k) also the low-
frequency components are damped.


(3) Restriction of the residual
If we inspect

   Ah εh^(m) = rh^(m)     (∗)

this again can be seen as a linear system that results from the FD ap-
proximation of the Poisson equation; here we know in addition that the
new right hand side rh^(m) and the unknown solution εh^(m) are relatively
smooth, i.e. varying not so rapidly.
That motivates the strategy to solve the Poisson equation for the new
right hand side rh^(m) on a coarser grid: That is more efficient and possibly
the accuracy is sufficient in that case. Later we have to prove (!) that our
idea was good.
In the simplest approach, we cancel every second equation in (∗) and by
that restrict our residual to the coarse grid with a mesh size H = 2h.
For this simple approach the restriction operator IhH is

   IhH =  ( 0  1  0                )
          (       0  1  0          )   ∈ IR^{(N/2−1)×(N −1)}
          (            · · ·       )
          (                0  1  0 )

and we obtain the (N/2 − 1)-dimensional linear system

   AH εH^(m) = IhH rh^(m) =: rH^(m) ,   H = 2h

The restriction is not an invertible operation (IhH is not a non-singular square
matrix) and so we cannot avoid losing information.
Here we use a better restriction based on the averaging operator

   IhH = (1/4) ·  ( 1  2  1                )
                  (       1  2  1          )   ∈ IR^{(N/2−1)×(N −1)}
                  (            · · ·       )
                  (                1  2  1 )

Let gh ∈ IR^{N −1} be a vector – e.g. a discrete function on Ωh –, then we
obtain componentwise for the restricted vector gH ∈ IR^{N/2−1} – e.g. the
(restricted) discrete function on ΩH – from gH = IhH · gh

   (gH )j = (1/4) (gh )2j−1 + (2/4) (gh )2j + (1/4) (gh )2j+1

(4) Solution of the coarse grid problem
We solve (directly or by another iterative scheme)

   AH εH^(m) = rH^(m)   ⇒   εH^(m) = . . .

For that solution we again can use a nested two-grid cycle!


(5) Coarse-grid correction
Because of smoothness one expects that εH^(m) is an approximation to εh^(m)
on all grid points that Ωh and ΩH have in common, i.e. on all grid points
xj ∈ Ωh ∩ ΩH .
To obtain an approximation of εh^(m) for all the other grid points of the fine
grid which are not grid points of the coarse grid too, we use interpolation.
With the prolongation operator IHh we get an improved approximation on
the fine grid

   Uh^(m+1) = Uh^(m) − IHh · εH^(m)

As an example we use linear interpolation with the prolongation operator

   IHh := (1/2) ·  ( 1            )
                   ( 2            )
                   ( 1   1        )
                   (     2        )
                   (     1   ..   )
                   (          ..  )
                   (          1   )
                   (          2   )
                   (          1   )

If again gh ∈ IR^{N −1} is a discrete function on Ωh (= vector) and gH ∈ IR^{N/2−1}
on ΩH , then we obtain componentwise from gh = IHh · gH

   (gh )j = (gH )j/2                                 if j even
   (gh )j = (1/2) ( (gH )(j−1)/2 + (gH )(j+1)/2 )    if j odd

(6) Postsmoothing steps on the fine grid:
Start with Uh^(m+1) and try to solve the system Ah Uh = fh iteratively (e.g.
by damped Jacobi), stop after m̃ steps

   Uh^(m+1) → Uh^(m+2) → . . . → Uh^(m+1+m̃)

(7) Loop
Continue with step (1) of the algorithm, if necessary. 
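A compact Python sketch of one such two-grid cycle for the 1D model problem with
a = b = 0 (the weighted Jacobi smoother, the full-weighting restriction and the linear
interpolation are the choices described above; the direct coarse solve and all concrete
parameter values are implementation assumptions):

   import numpy as np

   def laplace_1d(n):
       # A_h = (1/h^2) tridiag(-1, 2, -1) for n interior points, h = 1/(n+1)
       h = 1.0 / (n + 1)
       return (2.0 * np.eye(n) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)) / h**2

   def jacobi(A, U, f, omega=2.0/3.0, sweeps=2):
       D = np.diag(A)
       for _ in range(sweeps):
           U = U + omega * (f - A @ U) / D
       return U

   def restrict(r):                          # full weighting: coarse j from fine 2j-1, 2j, 2j+1
       return 0.25 * r[0:-2:2] + 0.5 * r[1:-1:2] + 0.25 * r[2::2]

   def prolong(eH, n_fine):                  # linear interpolation back to the fine grid
       e = np.zeros(n_fine)
       e[1::2] = eH
       e[0::2] = 0.5 * (np.concatenate(([0.0], eH)) + np.concatenate((eH, [0.0])))
       return e

   def two_grid_cycle(U, f, A_fine, A_coarse):
       U = jacobi(A_fine, U, f)                          # (1) presmoothing
       r = A_fine @ U - f                                # (2) residual
       rH = restrict(r)                                  # (3) restriction
       eH = np.linalg.solve(A_coarse, rH)                # (4) coarse-grid solve
       U = U - prolong(eH, U.shape[0])                   # (5) coarse-grid correction
       return jacobi(A_fine, U, f)                       # (6) postsmoothing

   N = 64                                                # fine grid: N+1 points, N-1 unknowns
   A_f, A_c = laplace_1d(N - 1), laplace_1d(N // 2 - 1)
   x = np.linspace(0.0, 1.0, N + 1)[1:-1]
   f = np.pi**2 * np.sin(np.pi * x)                      # -u'' = f with u(x) = sin(pi x)
   U = np.zeros(N - 1)
   for _ in range(10):
       U = two_grid_cycle(U, f, A_f, A_c)
   print(np.max(np.abs(U - np.linalg.solve(A_f, f))))    # algebraic error after 10 cycles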

The fine-coarse-fine loop defined in steps (1)-(6) is called v-cycle, because


from a fine grid we go down to a coarse grid and back again.

From an eigenvector analysis of the errors (see (A46)) we get the following
essential result for the v-cycle in our example problem 2:


Theorem
Consider the BVP problem

−uxx = f for Ω = ]0, 1[


u(0) = a, u(1) = b on ∂Ω

Define the two-grid method (v-cycle) exactly as in example 2 and use the same
notation. Choose m = 2 and ω = 2/3. Then after the steps (1)-(5) (i.e. without
postsmoothing) of one v-cycle we get

   ∥εh^(m+1) ∥2 = ∥Uh^(m+1) − Uh∗ ∥2 ≤ 0.782 · ∥εh^(0) ∥2

Remarks
Each v-cycle reduces the error at least by a constant factor, and this is true
also for h → 0! That is an excellent result.

From example 2 we see that a perfect smoother followed by an exact solution


at step 4 would leave no error. In reality, this will not happen.
Fortunately, a careful (but unfortunately not so simple) analysis shows that
a v-cycle with good smoothing (better than by the damped Jacobi!) always
reduces the error by a constant factor ϱ that is independent of h: A typical and
good value is ϱ = 0.1; compare this e.g. with ϱ = .99 for Jacobi alone. We
achieve a convergence factor ϱ that does not move up to 1 as h → 0 and thus
achieve a given relative accuracy in a fixed number of cycles. Since each step
of each v-cycle requires only O(N ) operations on sparse problems of size N ,
one loop (1-6) of the two-grid method is an O(N ) algorithm. This does not
change in higher dimensions. 

V-Cycles, W-Cycles and Full Multigrid


Clearly multigrid need not stop at two grids. Because of size, a direct solution
of the problem in step (4) of example 2 often is not possible or efficient. And
for an iterative solution, the lowest frequency is still low on the H = 2h grid, and
that part of the error does not decay quickly until we move to 4h or 8h or . . . (or
a very coarse 512h).
The two-grid v-cycle extends in a natural way to more grids. It can go down to
coarser grids (e.g. 2h, 4h, 8h) and back up again to (4h, 2h, h). This nested
sequence of v-cycles is a V-cycle (capital V, see Fig. 28/le.). Because coarse
grid iterations are much faster than fine grid iterations, a detailed mathematical
analysis shows that time is well spent on the coarse grids. So the W-cycle that
stays coarse longer is generally superior to a V-cycle.
The full multigrid cycle is asymptotically better than V or W. Fig. 28/ri. de-
scribes a typical multigrid scheme.



Figure 28: V-cycles, W-cycles and Full Multigrid use several grids several times.

Full multigrid starts on the coarsest grid. The solution on the 8h grid is inter-
polated to provide a good initial vector U4h^(0) on the 4h grid. A v-cycle between
4h and 8h improves it. Then interpolation predicts the solution on the 2h grid,
and a deeper V-cycle makes it better (using 2h, 4h, 8h). Interpolation of that
improved solution onto the finest grid gives an excellent start to the last and
deepest V-cycle.

The operation counts for a deep V-cycle and for full multigrid are certainly
greater than for a two-grid v-cycle, but only by a constant factor. That is be-
cause the count is divided by a power of 2 every time we move to a coarser
grid. For a differential equation in d space dimensions, we divide by 2d . The
cost of a V-cycle (as deep as we want) is less than a fixed multiple of the v-cycle
cost:
   V-cycle cost < ( 1 + 1/2^d + (1/2^d)^2 + . . . ) · v-cycle cost = ( 2^d / (2^d − 1) ) · v-cycle cost

Full multigrid is nothing else than a series of inverted V-cycles, beginning on
a very coarse mesh. Because of this and using the estimate just obtained we
get

   full multigrid cost < ( 2^d / (2^d − 1) ) · V-cycle cost < ( 2^d / (2^d − 1) )^2 · v-cycle cost

The method works excellently in practice, if carefully programmed!


