NumProg2 - 2020-07-12
NumProg2 - 2020-07-12
(MA 3306)
Summer Term 2020
Rainer Callies
Department of Mathematics M2
Technical University of Munich
I
Contents
1 Repetition 3
1.1 Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Fixed-Point Interations in Banach Spaces . . . . . . . . . . . . . 4
1.3 Error Propagation – Basics . . . . . . . . . . . . . . . . . . . . . 7
II
4 Finite Differences 54
4.1 One-Dimensional Model Problem . . . . . . . . . . . . . . . . . . 54
4.1.1 Model problem . . . . . . . . . . . . . . . . . . . . . . . . 54
4.1.2 Numerical Approximation by Finite Differences . . . . . . 56
4.1.3 Convergence of the Finite Difference Method . . . . . . . 57
4.2 Quasilinear PDEs . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.3 Poisson Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.3.1 Derivation of the Poisson Equation . . . . . . . . . . . . . 62
4.3.2 Poisson Equation and Properties of its Solution . . . . . . 64
4.3.3 Grid, Difference Operators and Boundary Conditions . . . 65
4.3.4 Formulation of the Sparse Linear System . . . . . . . . . 70
4.3.5 Analysis of the Finite Difference Discretization . . . . . . 71
4.4 1D Linear Advection Equation . . . . . . . . . . . . . . . . . . . . 75
4.4.1 Formulation of the PDE Problem . . . . . . . . . . . . . . 75
4.4.2 Explicit Schemes . . . . . . . . . . . . . . . . . . . . . . . 77
4.4.3 Repetition: Discrete Fourier Transform (DFT) . . . . . . . 80
4.4.4 Von Neumann Stability Analysis . . . . . . . . . . . . . . 82
4.4.5 Difference Equations . . . . . . . . . . . . . . . . . . . . . 87
4.4.6 Von Neumann Stability Analysis Extendend . . . . . . . . 95
4.4.7 Implicit Schemes – Crank-Nicolson Scheme . . . . . . . . 97
4.4.8 Matrix Stability Analysis . . . . . . . . . . . . . . . . . . . 99
4.5 Multigrid Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 101
III
1 Repetition
1.1 Norms
Definition (special vector norms)
For ⃗x ∈ C n we define
∥⃗x∥1 |x1 | + |x2 | + . . . + |xn |
:= v
u∑
u n
∥⃗x∥2 := t |xi |2
i=1
∥⃗x∥∞ := max {|x1 |, |x2 |, . . . , |xn |}
Definition
Let be A ∈ IRn×m , then the vector norm ∥· ∥p can be used to define the following
matrix norm
∥A⃗x∥p
∥A∥p := sup = sup ∥A⃗x∥p
x̸=0 ∥⃗
⃗ x∥p ∥⃗x∥p =1
⃝
c Rainer Callies TUM 2020 3
Examples: induced matrix norms
( )
∑
m
∥⃗x∥1 = |x1 | + . . . + |xn | ⇒ ∥A∥1 = max |aij |
j=1...n
v i=1
u∑ √
u n
∥⃗x∥2 = t |xi |2 ⇒ ∥A∥2 = λmax (AH A)
i=1
√
= max ⃗xH AH A⃗x/(⃗xH ⃗x)
x̸=0
⃗
∑n
∥⃗x∥∞ = max {|x1 |, . . . , |xn |} ⇒ ∥A∥∞ = max |aij |
i=1...m
j=1
∥A∥∞ is the maximum absolute row sum of the matrix and is called row sum
norm, ∥A∥1 is the maximum absolute column sum of the matrix and is called
column sum norm, ∥A∥2 is the spectral norm.
Example
All induced norms are sub-multiplicative, the matrix norm ∥A∥ := maxi,j |aij | is
not. The Frobenius-norm is a sub-multiplicative norm compatible with – but not
induced by – the vector norm ∥ · ∥2
v
u∑
u n ∑ m
∥A∥F := t |aij |2
i=1 j=1
f (x∗ ) = 0 .
lim xk = x∗ .
k→∞
lim xk = x∗ ⇒ ϕ(x∗ ) = x∗ .
k→∞
⃝
c Rainer Callies TUM 2020 4
Example
What is the meaning of a fixed-point in Newton’s method?
f (xk ) xk =xk+1 =x∗ f (x∗ )
xk+1 = ϕ(xk ) = xk − ========⇒ 0=− ⇒ f (x∗ ) = 0
f ′ (xk ) f ′ (x∗ )
Most of the statements derived for fixed-point iterations in the scalar case (e.g.
Newton’s method applied to the one-dimensonal case) can be generalized to
the n-dimensional case (and more general also to Banach spaces) with only
minor changes: Instead of the absolute value the vector norm is used.
By this, we can reuse the knowledge obtained here for the iterative solution of
large linear systems.
Remark
In IRn all norms are equivalent, i.e. let denote ∥ · ∥a , ∥ · ∥b two different norms
(e.g. ∥ · ∥2 , ∥ · ∥∞ , . . .), then there exist two constants α > 0, β > 0
Statements using one special norm are valid for all norms, only the values of
constants (e.g. Lipschitz constant L) change, if the norm changes.
Definition (Operators)
Let be X, Y normed spaces over IR. A function T : D → Y, D ⊆ X, is called
operator. Often we abbreviate T x for T (x) even if T is not linear.
⃝
c Rainer Callies TUM 2020 5
Definition (Lipschitz condition)
Operator T : D → Y is in D Lipschitz continuous with Lipschitz constant L, if
∥T x − T y∥ ≤ L∥x − y∥ ∀ x, y ∈ D
If T is linear, then we can restrict our investigation to the case y = 0 and obtain
∥T x∥ ≤ L∥x∥ ∀ x ∈ D
In this case the smallest Lipschitz constant possible is called operator norm
∥T ∥ of T
∥T x∥
∥T ∥ := sup
x∈D ∥x∥
⃝
c Rainer Callies TUM 2020 6
1.3 Error Propagation – Basics
The errors investigated in this context are unavoidable and problem-induced,
even if a perfect numerical algorithm is chosen. To reduce the errors one has
to reformulate the underlying mathematical problem.
Only input errors are considered, other error sources like rounding errors are
neglected here.
Input errors may be measurement errors, but also all errors (e.g. rounding
and discretization errors) made in previous iteration steps or in the solution of
preceeding subproblems.
Important question
How large is the perturbation δf of the solution compared to the perturbation
δx of the input data?
Remark
If the condition number is large, a small perturbation in the input data causes
large perturbations in the final result. The condition numbers compress the
information about error amplification into one scalar.
Remark
Introducing condition numbers can be seen as a linearization of the original
problem. This leads to an equivalent definition of the condition numbers.
⃝
c Rainer Callies TUM 2020 7
Example
For the calculation of the condition numbers for this problem we use matrix
norms and obtain
∥Df (⃗x)∥
κabs (f⃗,⃗x) = ∥Df (⃗x)∥ , κrel (f⃗,⃗x) :=
∥f⃗(⃗x)∥/∥⃗x∥
Example (perturbed linear system)
Definition of possible perturbations:
Let us consider the perturbation
If ∥δA∥ < 1/∥A−1 ∥, the norm estimate in the first line can be valid only for ⃗x = ⃗0.
Thus ⃗x = ⃗0 is the only solution of (A + δA)⃗x = ⃗0 and therefore A + δA is non-
singular.
⃝
c Rainer Callies TUM 2020 8
Therefore this expression is defined as the ”condition of the linear system” and
denoted by κrel (A) := ∥A−1 ∥∥A∥ (rough estimate!).
max∥⃗x∥p =1 ∥A⃗x∥p
∥A∥p ∥A−1 ∥p =
min∥⃗z∥p =1 ∥A⃗z∥p
• κrel (A) ≥ 1.
⃝
c Rainer Callies TUM 2020 9
2 Iterative Solution of Linear Systems
Problem
We want to solve the (large) linear system
Ax = b , A ∈ C n×n , b ∈ C n
with n2 > available storage and aik = 0 for almost all i, k (”sparse matrix”).
A special structure of A (e.g. banded matrix with aik = 0 for |i − k| > const) is
not necessary.
The direct methods for the solution of linear systems investigated up to now are
not suited. Even in case of sufficient storage, fill-in occurs when performing e.g.
Gaussian elimination on a sparse matrix: Many entries of the matrix change
from zero to a non-zero value in the execution of the algorithm.
Example
y, j h=1/4
}
y=1- 4
u(x,y) unknown
u(x,y) given on boundary
3
2
7
4
8
5}
9
6
} h=1/4
1
1 2 3
y=0 - 0
0 1 2 3 4 x, i
x=0 x=1
and
u11 u10 + u01
u21 u20
u31 u30 + u41
Az = A
u12
=
u02
=b
.. ..
. .
u33 u34 + u4,3
The five-point stencil at the grid point with the (green) number k leads to the
k-th row of the linear system.
If system size increases, the block structure remains unchanged, but the size
and the number of the blocks increase accordingly and the ratio of the zero
elements increases too.
2.1.1 Introduction
Definition
Let us consider the problem Ax = b , A ∈ C n×n , b ∈ C n .
⃝
c Rainer Callies TUM 2020 11
An iterative method is called convergent, if for each initial guess x(0) ∈ C n (or
IRn ) a sequence {x(m) } of approximations is generated that converges to the
solution x∗ of the linear system Ax = b
lim x(m) = x∗
m→∞
x(m+1) = M x(m) + N b
Example
The modified Richardson iteration is
( )
x(m+1) = x(m) + ω b − Ax(m)
where ω is a scalar parameter that has to be chosen such that the sequence
{x(m) } converges. It is easy to see that the method has the correct fixed point
x∗ and thus is consistent.
In our notation, N = ωI and M = I − ωA.
If there are both positive and negative eigenvalues of A, the method will di-
verge for any ω if the initial error x(0) − x∗ has nonzero components in the
corresponding eigenvectors.
We observe that in each iteration step the main effort is one matrix-vector
multiplication only corresponding to O(N 2 ) operations for general matrices and
O(N ) operations for sparse matrices.
A = E +D+F
Example
1 2 3 4
5 6 7 8
A := ⇒
11 12 13 14
15 16 17 18
0 0 0 0 1 0 0 0 0 2 3 4
5 0 0 0 0 6 0 0 0 0 7 8
E = , D = , F =
11 12 0 0 0 0 13 0 0 0 0 14
15 16 17 0 0 0 0 18 0 0 0 0
⃝
c Rainer Callies TUM 2020 12
The following iterations again are all fixed-point iterations and thus consistent.
x(m) denotes the result after the m-th iteration cycle. Comparing the above
decomposition with the general definition, we get M = −A−1 −1
1 A2 and N = A1 .
(m) (m)
∑
n
(m) (m) (m)
ri − aik δxk = bi − aij xj − aik δxk = 0 ⇒ δxk = ... (∗)
j=1
(m)
From the linear equation (∗) the correction δxk can be calculated. After that
iteration step the i-th equation is exactly fulfilled for one moment, but already in
the next step that property is destroyed again and another equation is exactly
fulfilled (i.e. another component of the residual is precisely zero).
We have to assure that each row – and thus each component of the residual –
is reached.
The different methods mainly differ in their strategy to choose the sequence of
index pairs (i, k). We hope, that for a proper choice of the index pairs (i, k) the
iteration converges: x(m) → x∗ . That remains to be studied in detail.
⃝
c Rainer Callies TUM 2020 13
In the Gauss-Seidel method (GS) we cyclically choose i = k = 1, 2, . . . , n and
thus carry out m times complete cycles of n steps to obtain x(m) . Permutations
may be necessary to achieve aii ̸= 0.
for m = 1 to p do
Choose relaxation factor ω = ω(m) ̸= 0
for i = 1 to n do
∑
n
xi := xi + ω bi − aij xj /aii
j=1
for i = 1 to n do
xi := x′i
...
(m)
In step i = 1, . . . n of the (m + 1)-th cycle the i-th component xi is modified
(m)
such that ri = 0; the modification is not immediately applied, but only stored.
After the complete cycle has been finished all stored modifications are applied
(m) (m+1)
simultaneously (xi → xi for i = 1, . . . , n). This method nowadays is not
used very often.
Remark
For sparse A, the iteration steps in (GS), (SOR) and (J) are cheap!
⃝
c Rainer Callies TUM 2020 14
We prove: The (SOR) algorithm is compatible with the matrix formulation.
Let us analyze step i of the inner loop in the (m + 1)-th cycle.
(m+1) (m+1)
In that case the components x1 , . . . , xi−1 in the inner loop already have
been updated. For i ∈ {1, . . . , n} step i can be written in detail
ω ∑
i−1 ∑n
aij xj
(m+1) (m) (m+1) (m)
xi = xi + bi − aij xj −
aii
j=1 j=i
( )
= xi + ω D−1 (b − Ex(m+1) − Dx(m) − F x(m)
(m)
i−th component
Convergence Theorem 1
If {x(m) } converges at all, then it converges to x∗ .
Proof:
If x∗ = limm→∞ x(m) exists, then we get by insertion into the matrix formulation
A1 x∗ + A2 x∗ = b ⇒ x∗ = (A1 + A2 )−1 b = A−1 b
and x∗ is uniquely determined for det(A) ̸= 0.
Definition
Let denote ε(m) := x(m) − x∗ the error after the m-th iteration cycle
and ϱ := max |λi (A−1
1 A2 )| the spectral radius (= largest absolute value of the
1≤i≤n
eigenvalues) of the matrix A−1 −1
1 A2 . (−A1 A2 ) is called iteration matrix.
Convergence Theorem 2
It is ε(m+1) = −A−1
1 A2 · ε
(m) and consequently ε(m) = (−A−1 A )m ε(0) .
1 2
Proof:
A1 x(m+1) + A2 x(m) = b = A1 x∗ + A2 x∗ ⇒ A1 ε(m+1) + A2 ε(m) = 0
The second formula can be proven by induction.
Convergence Theorem 3
⃝
c Rainer Callies TUM 2020 15
Proof: (only ”⇒”, by contradiction)
If ϱ ≥ 1, then there exists at least one EV vmax for the maximum EW λmax with
|λmax | ≥ 1. If x(0) = x∗ − vmax , then by definition ε(0) = vmax .
From convergence theorem 2 we get (using the EW/EV definition)
ε(m) = (−A−1 m EV m m
1 A2 ) vmax = (−1) λmax vmax
Taking the norm gives ∥ε(m) ∥ = |λmax |m ∥vmax ∥ ≥ ∥vmax ∥ and thus no conver-
gence .
Remark
Per decimal digit of precision therefore −1/ log10 (ϱ) cycles have to be per-
formed.
Convergence Theorem 4
A ∈ C n×n :
(SOR) converges – if at all – only for ω ∈ ]0, 2[ .
A ∈ C n×n positive definite:
(GS) converges, (SOR) converges for ω ∈ ]0, 2[ fixed,
convergence of (J) not granted
∑
A ∈ C n×n strictly diagonal-dominant, i.e. |aii | > nj=1,̸=i |aij | , i = 1, . . . , n :
(J) and (GS) converge.
Convergence rates of the methods (GS), (SOR) and (J) are investigated later.
Example
Jacobi method and 5-point stencil for Laplace’s equation
−1 4 0 −1 0
−1 . . . . . . ..
.
.. .. ..
. . −1 .
0 −1 4 −1
0
−1 0 4 −1 0
A=
.. . .
. −1 . . . .
.. .. ..
. . . −1
0 −1 0 −1 4
N 2 ×N 2
⃝
c Rainer Callies TUM 2020 16
1 1
−A−1 −1
1 A2 = −D (E + F ) = − I(A − 4I) = I − A
4 4
0 1 0 1 0
1 ... ... ..
.
.. .. ..
. . 1 .
0
1 0 0 1
1
1 0 0 1 0
=
4
..
.
.
1 .. ..
.
.. .. ..
. . . 1
0 1 0 1 0
N 2 ×N 2
Claim:
−A−1 2
1 A2 has the N eigenvectors z
(k,l) , k, l = 1, . . . , N, with the components
( ) ( )
(k,l) kπi lπj
z(i−1)·N +j := sin sin
N +1 N +1
and the correspondig eigenvalues
( )
(k,l) 1 kπ lπ
λ := cos + cos
2 N +1 N +1
The index i refers to the i-th block row and the index j to the number of the
single row within this block row.
Proof:
For e.g. the 1st component we get using trigonometric angle sum identities
4(−A−1
1 A2 z
(k,l)
)1
(k,l) (k,l) (k,l) (k,l)
= z2 + zN +1 = zi=1,j=2 + zi=2,j=1
kπ 2lπ 2kπ lπ
= sin sin + sin sin
N +1 N +1 N +1 N +1
kπ lπ lπ kπ kπ lπ
= sin · 2 sin cos + 2 sin cos · sin
N +1 N +1 N +1 N +1 N +1 N +1
(k,l)
= 4λ(k,l) z1
This is a typical behavior often observed when iteration methods are applied to
linear systems resulting from the discretization of PDEs by mult-point stencils:
If system size increases, not only the effort per iteration cycle increases, but
the number of necessary cycles increases too!
⃝
c Rainer Callies TUM 2020 17
Definition (consistently ordered matrix)
Given A = D + E + F = D(I + L + U ) ∈ C n×n with L := D−1 E and U := D−1 F .
We define J(α) := −(αL + α−1 U ) , α ∈ C \ {0}.
A is consistently ordered, if the EWs of the matrix J(α) are independent of α.
Remark
By reordering the variables x1 , . . . xn of a linear system Ax = b, the resulting
and new matrix can be consistently ordered (that is the reason for the notion).
2 1 0 0 1/α 0
1 2 1 1
⇒ J(α) = − α 0 1/α
2
0 1 2 0 α 0
2λ 1/α 0
1
⇒ J(α) − λI = − α 2λ 1/α
2
0 α 2λ
⃝
c Rainer Callies TUM 2020 18
Claim: For the optimal relaxation parameter ωb we get
2
ωb := arg min ϱ(H(ω)) = √ , ϱ(H(ωb )) = ωb − 1
ω∈ ]0,2[
1 + 1 − ϱJ
2
and in general
{
ω−1 √ , ω ∈ [ωb , 2]
ϱ(H(ω)) =
1 − ω + ω 2 ϱ2J /2 + ωϱj 1 − ω + ω 2 ϱ2J /4 , ω ∈ ]0, ωb ]
0.8
0.6
ρ(H(ω))
0.4
ρ = 0.3
J
0.2 ρJ = 0.7
ρ = 0.9
J
0
0 0.5 1 1.5 2
ω
⃝
c Rainer Callies TUM 2020 19
Question:
How man cycles of (J) do we need instead of one optimal (SOR) cycle?
Answer:
ln ϱ(H(ωb ))
ϱkJ = ϱ(H(ωb )) ⇒ k =
ln ϱJ
We want to estimate that expression using several Taylor expansions up to
O(N −3 ) to get an idea of the order of magnitude of that effect:
( ( ))
π
ln ϱ(H(ωb )) = 2 ln ϱJ − ln 1 + sin
N +1
( ) ( )2
π 1 π
cos = 1− + O(N −4 ) for N ≫ 1
N +1 2 N +1
ln(1 + z) = z + O(z 2 ) = z − z 2 /2 + O(z 3 ) ⇒
( )2
1 π
ln ϱJ = ln
1 − 2 + O(N −4 )
N +1
| {z }
=:z
( )2
1 π
= − + O(N −4 )
2 N +1
)(
π π
1 + sin = 1+ + O(N −3 )
N +1 N +1
( ( )) ( )2
π π −3 1 π
ln 1 + sin = + O(N ) − + O(N −6 )
N +1 N +1 2 N +1
( ( ))
π 2π
ln ϱ(H(ωb )) = 2 ln ϱJ − ln 1 + sin =− + O(N −3 )
N +1 N +1
4
⇒ k(N ) ≈ (N + 1)
π
In our example, the optimal SOR method is more than N times faster than (J) !!
in which the partitionings of b and x into subvectors bi and xi are identical and
compatible with the partitioning of A.
⃝
c Rainer Callies TUM 2020 20
We assume that the Aij are square matrices with det(Aii ) ̸= 0.
Now we define, similarly to the scalar case, the splitting A = D + E + F with
A11 0
A22 A12 0
D= ..
, E= .
. .. ..
, F = ...
. . . .
AM M AM 1 AM 2 · · · 0
Dx(m+1) + (A − D)x(m) = b ⇒
(m+1)
∑
M
(m)
Ajj xj = bj − Ajk xk , j = 1 . . . M
k=1,k̸=j
Example
With finite difference approximations of PDEs, it is standard to block the vari-
ables and the matrix by partitioning along whole lines of the mesh. More gen-
eral, a block can also correspond to the unknowns associated with a few con-
secutive lines in the plane. One such blocking is illustrated for a 6 × 6 grid:
31 32 33 34 35 36
25 26 27 28 29 30
19 20 21 22 23 24
13 14 15 16 17 18
7 8 9 10 11 12
1 2 3 4 5 6
⃝
c Rainer Callies TUM 2020 21
Figure 4: Block structure of the matrix A associated with that mesh
The advantage of block iterations is the smaller number of iteration cycles (in
our benchmark example of the 5-point stencil the number of iterations depends
on N and in case of block iterations on the much smaller number M ≪ N ). So
the number of cycles required to achieve convergence often decreases rapidly
as the block-size increases.
The disadvantage is, that the effort per cycle significantly increases, because
the subproblems – linear systems with Aii – have to be solved directly. More-
over fill-in may occur in the subproblems.
Finally, block techniques can be defined in more general terms. First, by using
blocks that allow us to update arbitrary groups of components, and second,
by allowing the blocks to overlap. This is a form of the domain-decomposition
method.
Theorem
Let be A a matrix in block-tridiagonal form
D1 A12
A
21 D2 A23
.. ..
A= A32 . .
.. ..
. . AM −1,M
AM,M −1 DM
⃝
c Rainer Callies TUM 2020 22
2.2 Methods Based on Minimization – Krylov Subspace Methods
2.2.1 Fundamental Idea
Let us solve Ax = b , A ∈ C n×n ∧ A positive definite.
This problem is substituted by the minimization problem
1
min f (x) with f (x) := xT Ax − bT x
x̸=0 2
for k = 0, 1, 2 . . . do
(A) Determine the search direction (gradient)
! d
0 = f (x(k) + t · dk )
dt [ ]
d 1 (k)
= (x + t · dk ) A(x + t · dk ) − b (x + t · dk )
T (k) T (k)
dt 2
( )
(k) T
T
= tdk Adk + x Adk − b dk = tdTk Adk − dTk dk
T
dTk dk
⇒ t = =: αk
dTk Adk
Remarks
The method converges to the solution x∗ , if A is positive definite.
The method is converging locally for α sufficiently small and ∇f (x(k) ) ̸= 0:
( )
f (x + αdk ) = f (x ) + ∇f (x ) −α∇f (x ) + O(α2 d2k ) < f (x(k) )
(k) (k) (k) T (k)
⃝
c Rainer Callies TUM 2020 23
Caution necessary if ∇f (x(k) ) is determined numerically e.g. from finite
difference approximation (error in search direction!).
Rate of convergence
Let be A ∈ IRn×n positive definite with EWs 0 < λ1 ≤ λ2 ≤ . . . ≤ λn ; let us define
κ := cond2 A = λn /λ1 and f (x) := 12 xT Ax − bT x.
Minimization
{ of } the quadratic function f (x) by the Gradient method produces a
(k)
sequence x with
k∈IN0
( )k
(k) ∗
κ−1
(0)
x − x
≤
x − x∗
A κ+1 A
√
Here ∥x∥A := xT Ax denotes the so-called ”energy norm” and x∗ the exact
solution of Ax = b.
Remark
If the linear system is ill-conditioned, then we get for the rate of convergence
because of κ ≫ 1:
κ−1
≈1
κ+1
After a few iteration steps the iteration almost terminates.
Linear systems which result from the discretization of elliptic PDEs are often
ill-conditioned.
Example
1
Let us apply the Gradient method to the function f (x, y) := (x2 + ay 2 ) , a ≫ 1,
2
with initial values x(0) = (x0 , y0 ) = (a, 1).
( )
1 0
⇒ A= , κ=a
0 a
( ) ( (1) ) ( )
1 2 x a a−1
d0 = −a , α0 = ⇒ =ϱ , ϱ := ≈1
1 1+a y (1) −1 a+1
( ) ( (2) ) ( ) ( (0) )
1 2 x a x
d1 = −ϱa , α0 = ⇒ =ϱ 2
=ϱ 2
−1 1+a y (2) 1 y (0)
...
( ) ) (
x(k) a k
by induction we prove: =ϱ
y (k) (−1)k
( (k) ) ( )
x x(k)
gradient: dk = − , dk+1 = −ϱ
ay (k) −ay (k)
⇒ dk ⊥ dk+1 for this special case.
⃝
c Rainer Callies TUM 2020 24
We observe that always after two iterations dk is parallel to dk+2 : We are
searching for a better approximation of the solution in a direction, in which
we already have searched two steps before. That is not very efficient!
–1 0 1 2 3 4 5 x
–1
x2
Figure 5: Gradient method applied to f (x, y) := + y 2 , x(0) = (4.5, 3). Iterates shown
2
in a contour picture with contour lines at 19.125, 12, 6, 1.976, 0.204, 0.021, 0.0022.
⟨·, ·⟩X : X × X → K
⃝
c Rainer Callies TUM 2020 25
The lower index in ⟨·, ·⟩X reminds us, that the scalar product has to be specified
exactly .
Example
X := IRn with
∑
n
⟨x, y⟩2 := xi yi = xT y
i=1
√
is a Hilbert space, if ∥x∥2 := ⟨x, x⟩2 .
Definition (orthogonality)
Let X be Hilbert space, then
x, y ∈ X orthogonal (or x ⊥ y) :⇔ ⟨x, y⟩X = 0
Fundamental idea
In contrast to the gradient method we want to avoid searching in the same
direction several times!
⃝
c Rainer Callies TUM 2020 26
Remark (linear independence)
Let be {p1 , . . . , pk } ⊂ IRn mit k ≤ n pairwise conjugate vectors with respect to A
(A positive definite) with pj ̸= 0.
Then the pj are linearly independent, orthogonal in the special scalar product
⟨·, ·⟩A and span a k-dimensional (sub-)space, because
T
∑ k ∑k
αj pj = 0 ⇒ 0 = αj pj Api = αi (pTi Api ) ⇒ αi = 0, i = 1, . . . , k
j=1 j=1
The last step holds because pTi Api ̸= 0 for A positive definite and pi ̸= 0.
for k = 0, 1, 2, . . . , n − 1 do
(A) Calculate the original search direction: dk := b − Ax(k)
(B) Determine αk from
dTk pk ⟨dk , pk ⟩2
αk = =
pk Apk ⟨pk , pk ⟩A
T
From the step x(k+1) := x(k) + αk pk in part (C) we see that in the (k + 1)th
iteration step the correct component αk pk in the direction of one basis vector
pk is added (recursive scheme) to get the improved approximation x(k+1) of the
true solution x∗ . The direction of the basis vector pk is only used once.
How do we obtain the αk in this algorithm? We investigate ⟨pk , x∗ − x(0) ⟩A :
( )
∑
k−1
pTk A(x∗ − x(0) ) = pTk (b − Ax(0) ) = pTk b − Ax(0) − αi Api
i=0
= pTk (b − Ax(k) ) = pTk dk
⃝
c Rainer Callies TUM 2020 27
We have subtracted the sum in the bracket, because in the steps before the
components of x∗ − x(0) in the direction of p0 , . . . , pk−1 already have been added
and the scalar product is not changed by that operation (the {pj } are conju-
gate). On the other hand we can use the basis representation and get
∑
n−1
pTk A(x∗ − x(0) ) = pTk αj Apj = αk · (pTk Apk ) = αk · ⟨pk , pk ⟩A
j=0
dTk+1 dk+1
p0 = d0 and pk+1 = dk+1 + pk .
dTk dk
⃝
c Rainer Callies TUM 2020 28
Theorem (minimum property of the iterate)
Let be A ∈ IRn×n positive definite, Vk := span{p0 , . . . , pk−1 } and apply the CG
method from algorithm 3.
Then the approximation x(k) of x∗ minimizes the function f (x) := 21 xT Ax + bx
not only along the line {x(k−1) + αpk−1 , α ∈ [0, 1]} (analogously to the Gradient
method), but in the total subspace x(0) + Vk .
Remarks
√
Comparison with Gradient method ⇒ instead of κ now κ
Generalization to arbitrary matrices possible (GMRES → generalized mini-
mum residuum, MINRES)
Increasing efficiency by preconditioning → κ is changed
Idea of preconditioning:
Given Ax = b with A ∈ IRn×n positive definite. Choose B ∈ IRn×n positive
definite too and solve instead of the original problem
Choose B such that κ(Ã) ≪ κ(A) and that B can be cheaply applied. A
good preconditioner concentrates the EWs.
Example
A simple matrix B for preconditioning is a diagonal matrix, the diagonal ele-
ments of which are the inverse of the roots of the diagonal elements of the
original matrix.
This idea is motivated by the following theorem:
For a positive definite matrix the minimum EW is smaller or equal the minimum
diagonal element, the maximum EW is greater or equal the maximum diagonal
element. All EWs are positive.
⃝
c Rainer Callies TUM 2020 29
3 Numerical Solution of Ordinary Differential Equations
3.1 Basic Definitions and Transformations
Let be U ⊆ IR × IRn a domain (i.e. an open and connected subset), f : U → IRn
sufficiently often differentiable (theory says: at least continuous) and the initial
value (t0 , x0 ) ∈ U .
We want to determine a function x ∈ C 1 (I, IRn ) on an open and connected
interval I = ]t0 , tf [ ⊂ IR such that
Analytically such a solution often does not exist. Thus we want to calculate
numerically the solution of the above described initial value problem (IVP) of
an ordinary differential equation (ODE).
Remarks
Example
( ) ( )
y1′ (x) x · y22 (x)
= = f (x, y(x), y(x)′ )
y2′′ (x) y2′ (x) + x · y1 (x)
( )
1
y(4) = , y2′ (4) = 7 , I = ]4, 13[
2
⃝
c Rainer Callies TUM 2020 30
Transformation of the ODE into an autonomous system of first order with the
new independent variable ξ instead of x leads to
z1 = y 1 z1′ z4 · z22
′
z2 = y 2 ′ z2 z3 ˜
⇒ z (ξ) = = = f (z(ξ)), ξ ∈ ]4, 13[
z3 = y2′ z3′ z3 + z4 · z1
z4 = x z4′ 1
ξ −4 d d d 1 d
ξ → t := ⇒ t ∈ ]0, 1[ , = (13 − 4) ↔ = ·
13 − 4 dt dξ dξ 9 dt
Prior to the numerical solution always transform a problem into this standard
form!
has at least one (no uniqueness!) solution, which can be extended to the
boundary of U in both directions (i.e. t < t0 and t > t0 ).
Remark
”To extend to the boundary” means to come as close as we want to the bound-
ary of U : either x(t) contains the respective boundary point or ∥x(t)∥ is un-
bounded at the boundary.
To extend to the boundary does not mean that a solution exists on the total
interval [a, b]. The solution might leave U before reaching a or b.
⃝
c Rainer Callies TUM 2020 31
f (t, x) satisfies a (global) Lipschitz condition on U with respect to x with Lip-
schitz constant L , if
– i.e. all partial derivatives are (continuous and) bounded –, then f satisfies a
global Lipschitz condition in U (with L ≤ K · n, if for the norm we have chosen
∥ · ∥ = ∥ · ∥∞ ).
For every (t0 , x0 ) ∈ U there exists exactly one solution x(t) of the IVP
⃝
c Rainer Callies TUM 2020 32
Remark
In this case it cannot happen that |x(t)| → ∞ for t → t1 ∈ ] t0 , b [ ; the solution
exists and is uniquely defined on the full interval [a, b].
Sufficient condition: f ∈ C 1 and U = Q (Q cuboid) sufficiently large.
For every (t0 , x0 ) ∈ U there exists exactly one solution x(t) of the IVP
Remark
In may happen that |x(t)| → ∞ for t → t1 ∈ ] t0 , b [ . The solution exists and is
unique on an interval ]c, d [ ⊆ ]a, b[ with t0 ∈ ]c, d [ .
Sufficient condition: Jacobian w.r.t. ⃗x is continuous.
Remark
The exact calculation of the Lipschitz constant is mostly impossible, we can
only estimate it.
I ⊂ IR interval, U ⊆ I × IRn
f ∈ C 0 (U, IRn ) satisfies global Lipschitz condition in U w.r.t. x with Lipschitz
constant L.
x ∈ C 1 (I, IRn ) is solution of IVP x(t)′ = f (t, x(t)), x(t0 ) = x0 , t0 ∈ I (∗)
Let z ∈ C 1 (I, IRn ) denote an approximation to the solution of the IVP (∗) with
graph(x(t)) ⊂ U , graph(z(t)) ⊂ U
Claim:
L|t−t0 | δ ( L|t−t0 | )
∥x(t) − z(t)∥ ≤ γe + e −1
L
⃝
c Rainer Callies TUM 2020 33
3.3 Numerical Methods: Basic Idea and Notation
Definition
Numerical methods always use a discretization, i.e. we subdivide the integra-
tion interval I = [t0 , tf ]
∫
x(t + h) − x(t) 1 t+h
!
= f (ξ, x(ξ)) dξ =: ∆(t, x, h) ≈ ϕ(t, x, h)
h h t
Example
Explicit Euler:
⃝
c Rainer Callies TUM 2020 34
Implicit Euler:
Definition (one-step method)
A one-step method is a discretization method which for the calculation of ηm+1
only uses ηm , but not e.g. ηm−1 , ηm−2 , . . .
Therefore, a one-step method can be written as
Initial value: η0 = y0
for i = 1 to N − 1 do
ηi+1 = ηi + hi ϕ(ti , ηi , hi )
ti+1 = ti + hi
Remark
With this definition we also can write the implicit Euler as a one-step method
Remark
If not otherwise stated, we will restrict ourselves to one-step methods!
Consider the IVP x′ = f (t, x), x(t0 ) = x0 from chap. 3.1 on the closed interval
I = [t0 , tf ]. Let ϕ be a one-step method which we want to analyze.
⃝
c Rainer Callies TUM 2020 35
Then T (tm , x(tm ), hm ) := x(tm+1 ) − η̃m+1 is called the local discretization error
of the one-step method at tm+1 .
This really is the error of the method after one step only!
∥T (t, x, h)∥
≤ σ(h) ∧ lim σ(h) = 0 ∀ t ∈ I, ∀ x
h h→0
The order of consistency describes the quality of the approximation and allows
to compare different discretization methods.
Theorem
ϕ consistent ⇐⇒ lim ϕ(t, x, h) = f (t, x)
h→0
Example
ηm+1 = ηm + hf (xm )
⃝
c Rainer Callies TUM 2020 36
Definition (global discretization error)
W.l.o.g. we simply the situation and use a constant stepsize h, i.e. tm = t0 +
m · h.
The global discretization error
directly describes the difference between the true solution and its numerical
approximation. Because of the use of η it is only defined at discrete values
(grid points).
Remark
In contrast to consistency it is very difficult to analyze convergence directly.
h2 ′′ h3
x(t + h) = x(t) + hx′ (t) + x (t) + x′′′ (t) + . . . = x(t) + h · ∆(t, x, h)
2 6
hd h2 d2
∆(t, x, h) = f (t, x(t)) + f (t, x(t)) + f (t, x(t)) + O(h3 )
2 dt 6 dt2
h( )
= f (t, x(t)) + ft (t, x(t)) + fx (t, x(t))f (t, x(t))
2
h 2 ( )
+ ftt + 2ftx · f + fxx · f + fx · (ft + fx f ) + O(h3 )
2
6
If we would be able to calculate the necessary derivatives of f (t, x(t)) and x(t),
then we immediately would obtain a method which is consistent of order p.
⃝
c Rainer Callies TUM 2020 37
As an example, by
h h( )
ϕ(t, x, h) := x′ (t) + x′′ (t) = f (t, x) + ft (t, x) + fx (t, x)f (t, x)
2 2
we would construct a one-step method of order p = 2, for
x(t + h) − η̃(t + h)
( ) ( )
′ h2 ′′ h3 ′′′ ′ h2 ′′
= x(t) + hx (t) + x (t) + x (t) + . . . − x(t) + hx (t) + x (t)
2 6 2
= O(h3 )
f (t + β1 h, x + β2 hf (t, x))
( )
∂ ∂
= f (t, x) + β1 h + β2 hf (t, x) f (t, x)
∂t ∂x
( )2
1 ∂ ∂
+ β1 h + β2 hf (t, x) f (t, x) + . . .
2! ∂t ∂x
( )
1 ∂ ∂
= f + (β1 hft + β2 hfx f ) + β1 h + β2 hf (β1 hft + β2 hfx f ) + . . .
2 ∂t ∂x
ϕ(t, x, h) = α1 f + α2 f + α2 h(β1 ft + β2 fx · f )
α2 2 ( 2 )
+ h β1 ftt + 2β1 β2 f ftx + β1 β2 fx ft + β22 f fxx f + β22 f fx2 + . . .
2
Now we choose the free parameters such that as many h-terms as possible
from the expansion of this ansatz ϕ(t, x, h) match those from ∆(t, x, h).
We get a method of order p = 2 if
1 = α1 + α2
1/2 = α2 β1
1/2 = α2 β2
⃝
c Rainer Callies TUM 2020 38
The solution of the nonlinear system is not unique.
1
For the choice α1 = α2 = , β1 = 1, β2 = 1 we get the ”method of Heun”, for
2
1 1
α1 = 0, α2 = 1, β1 = , β2 =
2 2
we get the ”modified Euler”. Both are consistent of order p = 2.
In a similar way a method of order p = 3, . . . can be constructed, if the ansatz
contains a sufficient number of free parameters.
The definition specifies an explicit Runge-Kutta method with s stages. For e.g.
s = 4 this method is called RK4 method. The method is called explicit, because
fk can be calculated using f1 , . . . , fk−1 only; these values have been calculated
before and thus are already known.
Algorithm
In the RK method per integration step t → t + h (or tm → tm+1 = tm + hm ) the
following algorithm is executed
∑
k−1
fk (t, η(t), h) := f t + ck h, η(t) + h αkj fj (t, η(t), h) , k = 1, . . . s ,
j=1
∑
s
η(t + h) = η(t) + h bk · fk (t, η(t), h)
k=1
In a compact way the parameter set for an s-stage explicit Runge-Kutta method
can be arranged in a Butcher tableau
0 0
c2 α21 0
c3 α31 α32 0
.. .. .. . . ..
. . . . .
cs αs1 αs2 · · · αs,s−1 0
b1 b2 ··· bs−1 bs
⃝
c Rainer Callies TUM 2020 39
with the nodes (0, c2 , . . . , cs )T , the Runge-Kutta matrix A := (αkj ) and the
weights (b1 , b2 , . . . , bs )T .
s(s + 1)
These parameters mostly are determined such that the resulting ex-
2
plicit RK method has maximum order of consistency.
Remark
A nice insight into the basic ideas of RK methods is obtained if we use the
following equivalent reformulation of the classical RK method (i.e. a special
RK4 method):
h
ηm+1 = ηm + (k1 + 2k2 + 2k3 + k4 ) , tm+1 = tm + h
6
with
k1 = f (tm , ηm ),
( )
h h
k2 = f tm + , ηm + k1 ,
2 2
( )
h h
k3 = f tm + , ηm + k2 ,
2 2
k4 = f (tm + h, ηm + hk3 ) .
and the Butcher tableau
0
1/2 1/2
1/2 0 1/2
1 0 0 1
1/6 1/3 1/3 1/6
Here a sequence of four Euler steps with stepsizes ci h is performed, all starting
at ηm . After the i-th Euler step (i = 1, 2, 3, 4), the (t, x)-values of the resulting
point are used to calculate an updated slope ki specified by the right-hand side
f (t, x) of the differential equation. This slope is used for the next Euler step in
case of i = 1, 2, 3.
At the end, a final Euler step is performed with stepsize h and a slope which is
the weighted average of the four slopes ki calculated before.
In averaging the four increments, greater weight is given to the increments at
the midpoint. If f is independent of x, then the differential equation is equivalent
to a simple integral and the classical RK4 method reduces to Kepler’s rule.
Order conditions
After performing the Taylor expansion the coefficients of the Taylor series of
ϕ(t, x, h) and ∆(t, x, h) are compared. The goal is to choose the free para-
meters such that as many h-terms as possible from the expansion of ϕ(t, x, h)
match those from ∆(t, x, h). An example was given in chap. 3.5.1. For the RK
methods, this approach leads to the following order conditions
⃝
c Rainer Callies TUM 2020 40
x x(t)
(tm+1,ηm+1)
ηm + h
ηm + h
ηm + h
ηm
... ...
Remark
The first condition guarantees that the method is consistent at all: Because of
∑
s
ϕ(t, x, h) = bk · fk (t, x, h) we get
k=1
( )
∑
s
h → 0 ⇒ η(t + h) → η(x) ⇒ ϕ(t, η(t), h) = f (t, η(t)) ∀ f ⇐⇒ bi = 1
i=1
⃝
c Rainer Callies TUM 2020 41
Remark
In addition we often want that the node condition is satisfied:
∑
i−1
ci = αij
j=1
This condition guarantees, that we get the same numerical results no matter
if the RK method is applied to a non-autonomous IVP or to the same problem
after transformation into autonomous form.
p 1 2 3 4 5 6 7 8
N 1 2 4 8 17 37 85 200
smin 1 2 3 4 6 7 9 11
Example
3/8-rule of Kutta: RK method of order p = 4
0
1/3 1/3
2/3 −1/3 1
1 1 −1 1
1/8 3/8 3/8 1/8
0
1/2 1/2
1/2 0 1/2
1 0 0 1
1/6 1/3 1/3 1/6
⃝
c Rainer Callies TUM 2020 42
3.6 Stepsize Control for One-Step Methods
3.6.1 Basic problem and Solution Strategy
Basic problem
We have almost no access to the rounding errors. We do not know the total
discretization error that describes the difference between the true solution and
its numerical approximation (convergence!) at the grid points ti .
We only can get an estimate of the local discretization error (= error per step,
consistency). That is not much, but unfortunately that mostly is all we have!
So we choose a rough estimate for the local discretization error (our tolerance
tol); subject to that constraint we try to maximize hm (→ rounding error and
computational effort decrease). ”Rough estimate” is meant literally: Often the
required tolerance tol is only poorly approximated.
Instead of reaching ∥T (tm , x(tm ), hm )∥ < tol we alternatively try to control the
local discretization error per step length
∥T (tm , x(tm ), hm )∥
< tol2
hm
Solution strategy
To obtain the local discretization error we have to compare after each step
tm → tm+1 the numerical result η(tm+1 ) with the exact solution of the IVP for
the same initial value η(tm ). The exact solution unfortunately is unknown.
To overcome this difficulty, the following workaround is used: We calculate the
step tm → tm+1 twice with different accuracy and then we use the numerical
approximation of higher precision as a substitute for the exact but unknown
solution.
⃝
c Rainer Callies TUM 2020 43
There are two standard ways to calculate these two approximations with differ-
ent accuracy:
Either to take one method only and calculate the solution with the two stepsizes
hm (1 integration step) and hm /2 (two succeeding integration steps)
or to choose two different methods with different orders of consistency and to
perform one integration step each with the same stepsize hm .
Gragg’s theorem
Assumption:
Let be f ∈ C N +1 ([a, b] × IRn , IRn ), t0 ∈ [a, b], x′ = f (t, x), x(t0 ) = x0 .
Let denote ϕ the increment function for a one-step method of order p.
tm − t0
The stepsize is assumed to be constant: h = hm ∀ m ⇒ h =
m
Claim:
∑
N
η(tm , h) = x(tm ) + hi ei (tm ) + hN +1 EN +1 (tm , h) ∀ m
i=p
Remark
The most important feature is: ei (xm ) is independent of h!!
Gragg’s theorem guarantees the existence of an asymptotic expansion of the
global discretization error (→ convergence).
⃝
c Rainer Callies TUM 2020 44
We subtract the two equations, make a Taylor expansion and use ep (t0 ) = 0;
then the following approximation of e′p (t0 ) can be calculated numerically
η(t0 + hold , hold ) − η(t0 + hold , hold /2) ·
( ) = ep (t0 + hold ) = ep (t0 ) +e′p (t0 ) · hold (∗)
1 | {z }
hold p 1 − p =0
2
The term e′p (t0 ) is independent of h and hold respectively!
Now we apply the Taylor expansion once more and directly to Gragg’s theorem
with the new and improved stepsize hnew :
η(t0 + hnew , hnew ) = x(t0 + hnew ) + hnew p ep (t0 + hnew )
+ hnew p+1 ep+1 (t0 + hnew ) + O(hnew p+2 )
T aylor
= x(t0 + hnew ) + hnew p (ep (t0 ) + hnew e′p (t0 ))
+ hnew p+1 ep+1 (t0 ) + O(hnew p+2 )
·
= x(t0 + hnew ) + hnew p+1 e′p (t0 )
Remark
Important feature: e′p (t0 ) is independent of hold and hnew ; that is the only
reason why a relation between (∗) and (∗∗) can be established.
The stepsize control with two stepsizes can be easily understood, but it is
not often used: too many function evaluations.
⃝
c Rainer Callies TUM 2020 45
Algorithm 4: Stepsize control with two different stepsizes
We calculate the (p + 1)-th root and get hnew . Now we can check a
posteriori whether the original stepsize selection was correct or not.
else
tm+1 := tm + hold
η(tm+1 ) := η (tm + hold , hold /2)
hold := min {hnew , tf − tm+1 }
Remark
This type of stepsize control (2 methods, 1 stepsize) is often used in RK meth-
ods (idea of Fehlberg). The methods are then denoted e.g. by RKF 4(5) and
RKF 8(7). The method RKF 4(5) is consistent of order 4 with an embedded
error estimator of order 5 (→ order of the error estimator written in the bracket).
0
c2 α21
c3 α31 α32
.. .. .. . .
. . . .
cs αs1 αs2 · · · αs,s−1
b1 b2 · · · bs−1 bs
b̂1 b̂2 ··· b̂s−1 b̂s
⃝
c Rainer Callies TUM 2020 46
To obtain a sufficient number of free parameters for both methods, often one
additional stage in the tableau is enough.
Remark
In deviation from the strict theory, often the result of the better method is used
as the initial value for the next integration step (e.g. RKF 8(7)). By this a small
gain in precision is obtained without additional effort.
Example
Three-body problem: Simulation of the motion of a rocket in the gravitational
fields of Earth and Moon. Start from Earth orbit, flight to Moon, one revolution
around the Moon, flight back to Earth and arrival in Earth orbit again (Apollo
13 type mission). The final accuracy describes the deviation at the end point.
⃝
c Rainer Callies TUM 2020 47
Proof: We assume w.l.o.g. t0 < tk , hk > 0 ∀ k and consider exact solutions
zk (t) for the IVP with modified initial values
Using the triangle inequality and the theorem on the continuous dependency
of a solution from the initial values we get
∑
n
∥x(tn ) − ηn ∥ = ∥z0 (tn ) − zn (tn )∥ ≤ ∥zk−1 (tn ) − zk (tn )∥
k=1
∑
n
≤ ∥zk−1 (tk ) − zk (tk ) ∥eL|tn −tk |
| {z }
k=1 =ηk
∑
n
= ∥zk−1 (tk ) − (ηk−1 + hk−1 ϕ(tk−1 , ηk−1 , hk−1 , f ))∥ eL|tn −tk |
k=1
∑n
z (t ) − z (t ) − h ϕ(t , η , h , f )
L|t −t |
=
hk−1 k−1 k k−1 k−1 k−1 k−1 k−1 k−1
e n k
h
k=1 k−1
∑n n ∫
∑ tk
L|tn −tk |
≤ hk−1 σ(hk−1 )e ≤ σ(hmax ) eL(tn −ξ) dξ
k=1 k=1 tk−1
∫ tn
= σ(hmax ) eL(tn −ξ) dξ ⇒ claim
t0
We estimated hk−1 eL|tn −tk | by the integral, for eL|tn −tk | ≤ eL|tn −ξ| ∀ ξ ∈ [tk−1 , tk ].
First we choose e.g. the explicit Euler as our solution method and obtain
⃝
c Rainer Callies TUM 2020 48
The numerical approximation converges to 0 for i → ∞ only if |1 + hλ1 | < 1 ∧
|1+hλ2 | < 1; only in this case the numerical approximation matches the analytic
solution in the limit.
We will see later on: The analytic solution is asymptotically stable, therefore the
ODE problem is called stiff. The numerical solution ”explodes”, if the stepsize
is too large.
The system is stable, if for all EWs λi of A: Re(λi ) ≤ 0 and in case of Re(λi ) = 0
the algebraic and the geometric multiplicity of λi are equal (i.e. the EW λi has
multiplicity ki and ki linearly independent EVs).
The system is exponentially and thus also asymptotically stable, if Re(λi ) < 0
is true for all EWs λi of A.
⃝
c Rainer Callies TUM 2020 49
x x
t t
Figure 7: Examples of asymptotically stable (le., with large deviations in the initial val-
ues) and instable (ri., after small perturbations from the nominal trajectory) behaviour
of the solutions of an ODE, nominal solution is marked in blue.
Let be x′ = f (x) an (autonomous) ODE system with the exact solution x(t) and
the initial value x(t0 ) = x0 . Let be v(t) another solution of the same ODE, but
for slightly modified initial values. By Taylor expansion we get
·
v ′ (t) = f (v(t)) = f (x(t)) + fx (x(t)) · (v(t) − x(t))
Simplifying assumption 1:
fx (x(t)) is only changing slowly , i.e. fx (x(t)) ≈ const ≈ J. For e(t) := v(t) − x(t)
we get the new ODE e′ (t) = Je(t). We investigate the difference e(t) to obtain
the asymptotic behaviour sketched in Fig. 7.
⇒ 1st test ODE: x′ (t) = Ax(t) , x(0) = x0 , A ∈ IRn×n
Simplifying assumption 2:
By a similarity transformation, in special cases J can be transformed to diago-
nal form: ∃ Q ∋ Q−1 JQ = diag(λ1 , . . . , λn ). We define p(t) by p(t) := Q−1 e(t).
Then the 1st test ODE decomposes into the following scalar ODEs
If J can be transformed to diagonal form, then the λi ∈ C are the EWs. Because
a stiff system is characterized by asymptotic stability, we choose Re(λ) < 0
⇒ 2nd test ODE: x′ (t) = λx(t) , x(0) = x0 , Re(λ) < 0 (Dahlquist 1963)
⃝
c Rainer Callies TUM 2020 50
There might exist stiff ODEs that do not satisfy the simplifying assumptions.
A method which has been tested to work for Dahlquist’s ODE not necessarily
works well for these ODEs.
ηi+1 = R(hλ)ηi
Alternatively, we can use the 1st test ODE and analogously define R(z) by
ηi+1 = R(hA)ηi
Example
⃝
c Rainer Callies TUM 2020 51
2+z
Trapezoidal rule: R(z) =
2−z
From the original definition of the trapezoidal rule applied to the 1st test ODE,
we obtain the stability function R(z) analogously to the implicit Euler
h h
ηi+1 = ηi + (f (ti , ηi ) + f (ti+1 , ηi+1 )) = ηi + (Aηi + Aηi+1 )
( 2 ) ( ) 2
h h
⇒ 1 − A ηi+1 = 1 + A ηi
2 2
⇒ ηi+1 = (2 − hA)−1 (2 + hA) ηi
Observation
For (almost) all methods we get:
If the method is explicit, then R(z) is a polynomial;
if the method is implicit, then R(z) is a rational function.
Theorem
Let be A ∈ IRn×n diagonalizable: Q−1 AQ = D = diag(λ1 , . . . , λn ). Let us define a
numerical method by ηi+1 = R(hA)ηi with R rational function and assume that
Re(λi ) < 0 ∀ i, i.e. for all EWs of A.
Then ξi := Q−1 ηi satisfies the recursion ξi+1 = R(hD)ξi ; in addition, for h > 0
the so-defined numerical method converges as required
Definition (stability)
A numerical method defined by ξi+1 = R(hA)ξi is called
Remark
The larger the set MR ∩ C − , the better a method is suited for the treatment of
stiff ODEs.
For MR ⊇ C − = {z ∈ C | Re(z) < 0} the method is absolutly stable.
If |R(z)| < 1 for z = hλ, then if the neg. real part of λ increases, the stepsize h
has to decrease to obtain the same value of the stability function R(z).
⃝
c Rainer Callies TUM 2020 52
Example (stability domains)
Implicit Euler: M1/(1+z) = {z ∈ C | |1 − z| > 1}, i.e. the implicit Euler is absolute
stable.
z2 z3 z4
Classical explicit Runge-Kutta method RK 4: R(z) = 1 + z + + +
2 6 24
Im (z)
p=s=3 2
p=s=2
1
p=s=1
3 2 1 Re (z)
explicit Euler
1
p=s=4
These methods have excellent stability properties, but they are rather expen-
sive numerically: In each integration step a system of nonlinear equations of
dimension (n · s) has to be solved → O(n3 s3 ) operations!
Example: A Radau-IIA method of order p = 2s − 1 is L-stable, e.g.
⃝
c Rainer Callies TUM 2020 53
4 Finite Differences
4.1 One-Dimensional Model Problem
4.1.1 Model problem
In an experiment elevation data along a mountain path are measured by GPS.
Let the data be superimposed by heavy noise due to low signal strength. How
to get a ”reasonable” altitude profile of the terrain structure?
altitude
L
projection of the path
⃝
c Rainer Callies TUM 2020 54
If u is an optimum, then the following necessary condition holds
dJ(ε)
= 0 with J(ε) := I(u + εη)
dε ε=0
We differentiate I(v) with respect to ε and get (using chain rule ”chain” and
integration by parts ”p.I.” on η ′ )
dJ(ε) dI(u + εη)
0 = =
dε ε=0 dε
(∫ ε=0 )
L
d ′
= (v(x) − f (x)) + β(v (x)) dx
2 2
dε 0
ε=0
∫ L
( )
2(u(x) − f (x)) · η(x) + 2βu′ (x) · η ′ (x) dx
chain
=
0
∫ L x=L
p.I. ( ′′
) ′
= 2 (u(x) − f (x)) · η(x) − βu (x) · η(x) dx + u (x)η(x) (∗)
0 x=0
The second term in (∗) vanishes, because η(0) = 0 = η(L). We apply the Fun-
damental lemma (see below) to the integral and obtain, that a necessary con-
dition for an optimum is that u solves the following boundary value problem
(BVP)
Problem (P ):
−βu′′ (x) + u(x) = f (x) , x ∈ ]0, L[
u(0) = f (0)
u(L) = f (L)
The Fundamental lemma could be applied because the integral has to be zero
for every choice of such a test function η:
Fundamental lemma
Let be G ∈ C 0 ([a, b], IR), η ∈ C 1 ([a, b], IR) with η(a) = η(b) = 0.
∫ b
If ∀ η : η(x)G(x)dx = 0, then it follows: G(x) ≡ 0 .
a
Remark
The boundary conditions in our example are u(0) = f (0) and u(L) = f (L), the
function values are prescribed. That type of boundary condition is called Dirich-
let condition.
If e.g. u(L) = f (L) is omitted, then the new and special boundary condition
u′ (L) = 0 is necessary to fulfill (∗). If the derivative with respect to the exterior
normal to the boundary (here in 1 D i.e. the ordinary derivative) is given, that
type of boundary condition is called Neumann condition.
⃝
c Rainer Callies TUM 2020 55
Remark
From problem (P ) we see that solutions u have to be at least in C 2 ([0, L], IR)!
For f a continuous approximation by a polygon is sufficient.
Ωh := {xi | xi = (i − 1) · h, i = 1, . . . , N }
To obtain the Ui , the derivative u′′ (x) is approximated by the difference quotient
(Taylor expansion!)
u(x + h) − 2u(x) + u(x − h)
u′′ (x) = + O(h2 )
h2
This Taylor expansion is possible only for u ∈ C 4 ([0, L], IR)!
In matrix notation (Ph ) can be stated as a sparse linear system:
0 0 1 0 U1 f (x1 )
1 −2 1
U2 f (x2 )
1
β ..
..
− .. .. .. + . . . = .
h2 . . . .
.
1 −2 1 1 ..
..
.
0 0 0 1 UN f (xN )
Remark (again)
We have obtained the difference approximation of u′′ (x) using Taylor’s theorem.
From the first nonvanishing term of the error we see that the approximation
quality O(h2 ) stated there is valid only for u ∈ C 4 ([0, L], IR)!!
Therefore implicit smoothness assumptions are used (C 2 from optimization, C 4
or C 3 from finite differences) which are not part of the original problem.
⃝
c Rainer Callies TUM 2020 56
4.1.3 Convergence of the Finite Difference Method
Introduction
The discretized problem (Ph ) leads to a linear system. Two questions arise:
In chap. 4.1.3 we analyze these questions only for the one-dimensional model
problem discussed in the previous chap. 4.1.1!
Proof:
We only show the first property. Let Uj be the maximum of {Ui , i = 1, . . . , n}.
If j = 1 (analogously for j = N ), then
Uj ≥ Uj−1 ∧ Uj ≥ Uj+1
Uniqueness
We now address the first question.
The linear system is uniquely solvable, if the matrix is regular. For that we
have to show that either the determinant of the matrix is non-zero – e.g. with
the minor expansion formula (= Determinantenentwicklungssatz) – or that for
the homogeneous system the only solution is U := (U1 , . . . , UN ) = 0.
In our case the latter approach is simple if we use the discrete maximum prin-
ciple:
⃝
c Rainer Callies TUM 2020 57
Consider the homogeneous system, i.e. f (xi ) = 0 ∀ i ∈ {1, . . . , N }.
Then mini=1,...,N f (xi ) = 0 = maxi=1,...,N f (xi ).
From the discrete maximum principle we get
0 ≤ min Ui ≤ max Ui ≤ 0 ⇒ Ui = 0 ∀ i .
i=1,...,N i=1,...,N
Definition
Let denote Ω = ]0, L[ the domain, h = L/(N − 1) the discretization mesh size,
Ωh := {xi = (i − 1) · h, i = 1, . . . , N } the mesh of N ∈ IN gridpoints; u ∈ C 4 (Ω, IR).
Then the differential operator Lβ is defined by
d2
Lβ := −β +1 ⇒ Lβ u(x) = −βu′′ (x) + u(x)
dx2
These definitions allow us a more compact and clear formulation of the follow-
ing theorems.
Proof:
( ) ( )
β
L u(xi ) − Lh u (xi ) = −βu′′ (xi ) + u(xi ) − Lh u (xi ) = O(h2 )
β β
⃝
c Rainer Callies TUM 2020 58
Remark
As in case of the one-step methods for ODEs, consistency is a local charac-
terization.
We insert the exact solution u at the grid points xi into the homogeneous part
of the exact differential equation and into its finite difference approximation and
measure the maximum difference.
For a consistent method, the difference vanishes for h → 0.
Attention: We do not compare the results Ui of the numerical solution of the
ODE with the exact solution u(xi ) here!
Proof:
We again apply the discrete maximum principle and directly obtain
Here we used that min Ui ≥ min f (xi ) ⇒ − min Ui ≤ − min f (xi ) and in case of
min f (xi ) < 0 we get − min f (xi ) = max(−f (xi )).
Proof:
We investigate the error e : Ωh → IR, e(xi ) := Ui − u(xi ) and prove that it solves
problem (Ph ) with a new right hand side r(xi ) for i = 2, . . . , N :
( ) β ( ) ( )
β β β
Lh e (xi ) = − 2 (Ui+1 − 2Ui + Ui−1 ) − Lh u (xi ) = f (xi ) − Lh u (xi )
h ( ) ( )
= −βu′′ (xi ) + u(xi ) − Lβh u (xi ) = Lβ u(xi ) − Lβh u (xi )=: r(xi )
⃝
c Rainer Callies TUM 2020 59
We now define r(x1 ) := 0, r(xN ) := 0; then e solves the problem (Ph ) with the
new right hand side r instead of f .
Because we have proven that our method is stable:
In total we get
max |e(xi )| ≤ Ĉ with Ĉ = C · C̃
i=1,...,N
and Ĉ is independent on h.
Notation
In PDEs, it is common to denote partial derivatives using subscripts. So e.g.
for u = u(x, y) we write:
( )
∂u ∂ 2u ∂2u ∂ ∂u
ux = , uxx = 2 , uxy = = .
∂x ∂x ∂y ∂x ∂y ∂x
Especially in physics, nabla (∇) is often used to denote spatial derivatives, and
u̇, ü for time derivatives. For example, the wave equation can be written as
ü = c2 ∇2 u = c2 ∆u
⃝
c Rainer Callies TUM 2020 60
Example
General scalar linear PDE of 2nd order with 2 independent variables x, y:
a(x, y)uxx + b(x, y)uxy + c(x, y)uyy + d(x, y)ux + e(x, y)uy + g(x, y)u = f (x, y)
with a, b, c, d, e, f, g ∈ C 0 (Ω, IR), Ω ⊂ IR2 bounded domain and |a| + |b| + |c| >
0 ∀ (x, y) ∈ Ω.
General scalar quasilinear PDE of 2nd order with 2 independent variables x, y:
∑
n
∂2 ∑ n
∂
L := − aik (x) + bi (x) + c(x) , Lu(x) = f (x)
∂xi ∂xj ∂xi
i,k=1 i=1
⃝
c Rainer Callies TUM 2020 61
Well-posed PDE problems (Hadamard)
The mathematical term ”well-posed problem” stems from a definition given by
Hadamard. He believed that mathematical models of physical phenomena
should have the properties that:
(1) a solution exists,
(2) the solution is unique,
(3) the solution’s behavior changes continuously with the data (stability).
Examples of well-posed problems include the Dirichlet problem for Laplace’s
equation, and the heat equation with specified initial conditions.
Especially important in PDE applications is the correct determination of the
initial data and the boundary values. Otherwise it might happen that no solution
exists or that the solution changes dramatically even for a very small change
in the data.
Notation
Let be Ω ∈ IR2 a bounded domain and f ∈ C 0 (Ω, IR) a given function. Let denote
u : Ω → IR, (x, y) 7→ u(x, y), the function that describes the vertical displacement
of the membrane at every (x, y) ∈ Ω.
Let the boundary ∂Ω of Ω consist of two parts ΓD and ΓN with
ΓD ∪ ΓN = ∂Ω ∧ ΓD ∩ ΓN = ∅ .
As boundary condition on ΓD we assume a Dirichlet condition (function values
prescribed)
u(x, y) := G(x, y) for (x, y) ∈ ΓD
In addition, let n(x, y) denote the exterior normal, i.e. the outward pointing
unit normal vector on ∂Ω; the directional derivative ∂u/∂n is calculated via the
scalar product
∂u
(x, y) = ⟨n(x, y), ∇u(x, y)⟩2
∂n
⃝
c Rainer Callies TUM 2020 62
Physical model
From physics we get (without proof) for the potential energy of the deformed
membrane
∫ ∫ ∫
1
I(u) := ⟨∇u, ∇u⟩2 dx dy − uf dx dy − uH ds
2 Ω Ω ΓN
The potential energy I(u) consists of the strain energy (first integral, Verzer-
rungsenergie) minus the energy resulting from the external forces acting on
the surface Ω and on the boundary ΓN .
A physical system in equilibrium takes the state of minimum energy and there-
fore we get for u
I(u) → min! ∧ u = G on ΓD
Variational approach
We assume that a ”classical solution” u ∈ C 2 (Ω, IR) ∩ C 1 (Ω̄, IR) with Ω̄ := Ω ∪
∂Ω exists and will obtain the Poisson equation as a necessary condition for a
minimum.
For the solution we again (similar to chap. 4.1.1) use the method of Lagrange:
For an arbitrarily chosen function η ∈ C (Ω, IR) ∩ C (Ω̄, IR) with η Γ = 0 we em-
2 0
D
bed the optimal solution u into and compare it with the one-dimensional set of
functions v := u + εη for ε ∈ [−ε0 , ε0 ]. We have chosen that embedding because
the values on the part ΓD of the boundary are prescribed and that has to be
true also for all possible solution candidates.
with div F = ∂F1 /∂x + ∂F2 /∂y = uxx η + ux ηx + uyy η + uy ηy = η∆u + ⟨∇u, ∇η⟩2 .
⃝
c Rainer Callies TUM 2020 63
Using this expression for the generalized integration by parts we get
∫ ∫ ∫
⟨∇u, ∇η⟩2 dx dy = − η∆u dx dy + η⟨∇u, n⟩2 ds
Ω Ω ∂Ω
We insert the last expression into (∗), use that η ΓD
= 0 and obtain
∫ ∫ ∫ ∫
0 = ηf dx dy −
η∆u dx dy + η⟨∇u, n⟩2 ds + ηH ds
Ω Ω ∂Ω=ΓN +ΓD ΓN
∫ ∫
= η(∆u + f ) dx dy + η(H − ⟨∇u, n⟩2 ) ds
Ω ΓN
This expression has to be valid for all η ∈ C 2 (Ω, IR) ∩ C 0 (Ω̄, IR) with η Γ = 0.
D
Using the Fundamental lemma (in its generalized form) again, we get the fol-
lowing necessary condition for a minimum: u has to solve the
”Poisson equation” (PDE problem)
( 2 )
∂ ∂2
−∆u = − u(x, y) + 2 u(x, y) = f (x, y) on Ω
∂x2 ∂y
u(x, y) = G(x, y) on ΓD
∂u
(x, y) = ⟨n(x, y), ∇u(x, y)⟩2 = H(x, y) on ΓN
∂n
u ∈ C 2 (Ω, IR) ∩ C 1 (Ω̄, IR)
Remark
On ΓD we do not need the C 1 -property of u, here C 0 is sufficient.
⃝
c Rainer Callies TUM 2020 64
Remark
A consequence of the maximum principle is that the solution changes continu-
ously with the data on the boundary (in case of Dirichlet condition):
Consider −∆u1 = f and −∆u2 = f with ui (x) = Gi (x) ∀ x ∈ ∂Ω, i = 1, 2. We get
−∆w = 0 for w := u1 − u2 .
From the maximum principle we conclude
w(x) ≤ sup w(z) ≤ sup |w(z)| , w(x) ≥ inf w(z) ≥ − sup |w(z)|
z∈∂Ω z∈∂Ω z∈∂Ω z∈∂Ω
sup |u1 (x) − u2 (x)| ≤ sup |u1 (z) − u2 (z)| = sup |G1 (z) − G2 (z)|
x∈Ω z∈∂Ω z∈∂Ω
From that we see that the Poisson equation with Dirichlet boundary conditions
is well-posed in the sense of Hadamard (effect of changes in f not analyzed
here).
Remark
∑
With the definition ∆u(x) := ni=1 uxi xi (x) for u ∈ C 2 (Ω, IR) ∩ C 0 (Ω̄, IR) with
Ω ∈ IRn the Poisson equation can be generalized to IRn .
We do not want to have the same grid point twice in the definition.
⃝
c Rainer Callies TUM 2020 65
y-axis
h
4h
j
3h
2h
1h
0 1h 2h 3h 4h 5h x-axis
i
h2
u(x + h, y) = u(x, y) + ux (x, y) · h + uxx (x, y) + O(h3 )
2
h2
u(x, y + h) = u(x, y) + uy (x, y) · h + uyy (x, y) + O(h3 )
2
1( )
Backward difference: ux |i,j = ui,j − ui−1,j + O(h)
h
1( )
uy |i,j = ui,j − ui,j−1 + O(h)
h
1( )
Forward difference: ux |i,j = ui+1,j − ui,j + O(h)
h
1( )
uy |i,j = ui,j+1 − ui,j + O(h)
h
1( )
Centered difference: ux |i,j = ui+1,j − ui−1,j + O(h2 )
2h
1( )
uy |i,j = ui,j+1 − ui,j−1 + O(h2 )
2h
⃝
c Rainer Callies TUM 2020 66
1
1
-
h [ -1 1 0 [ , -1h [ 0 -1 1
1
[ ,-
2h [
-1 0 1
1
[, -
2h
0
-1
Figure 11: Computational molecules for backward, forward, centered difference ap-
prox. of ux |i,j and centered difference approx. of uy |i,j (from le. to ri.).
1 -16
1 1
-
h2
1 -4 1 , -
12h2
1 -16 60 -16 1
1 -16
Figure 12: Computational molecules for the 5-point stencil ∆h (le.) and the non-
(9)
compact 9-point stencil ∆h (ri.).
⃝
c Rainer Callies TUM 2020 67
Remark
The approximation of the Laplace operator by finite differences is possible only
if u is sufficiently smooth (because of Taylor!). A much higher smoothness is
necessary than for the analytical solution: u ∈ C 4 (Ω, IR) in case of the 5-point
stencil and u ∈ C 6 (Ω, IR) in case of the non-compact 9-point stencil.
The discretization with the non-compact 9-point stencil includes values at
points that are not closest neighbors. This leads to increased difficulties at
points close to the boundary.
Because of these two drawbacks, discretizations of higher order are often not
used.
j+1 j+1
j j
j-1 j-1
i=0 1 2 3 i=0 1 2 3
Figure 13: Grid geometry with boundary conditions at point (x0 , yj ) (blue/red): 5-point
stencil centered at (x1 , yj ) (black) for Dirichlet boundary condition (le.) and extrapola-
tion centered at (x0 , yj ) (blue/red) for Neumann boundary condition (ri.).
h2 ′′ h3
g(x + h) = g(x) + hg ′ (x) + g (x) + g ′′′ (x) + . . . ⇒
2 6
g(x + h) − g(x) h h 2
= g ′ (x) + g ′′ (x) + g ′′′ (x) + . . .
h 2 6
g(x + 2h) − g(x) 2h 2
= g ′ (x) + hg ′′ (x) + g ′′′ (x) + . . .
2h 3
By linear combination we get
Curvilinear boundary
If the domain Ω has a more complicated geometry, a modification of the dis-
cretization of the Laplace operator is necessary.
We consider the example in fig. 14.
On the intersections of the curvilinear boundary with the mesh we define addi-
tional points (red). The point A has the coordinates (xA , yA ) = (i · h − hA , j · h)
and the point B has the coordinates (xB , yB ) = (i · h, j · h + hB ) with hA , hB > 0.
We modify the 5-point stencil centered at (x_i, y_j).
Using Taylor expansion again we get
u_xx|_{i,j} = 2 ( u_{i+1,j} / (h(h + h_A)) − u_{i,j} / (h·h_A) + u(x_A, y_A) / (h_A(h + h_A)) ) + O(h)
u_yy|_{i,j} = 2 ( u_{i,j−1} / (h(h + h_B)) − u_{i,j} / (h·h_B) + u(x_B, y_B) / (h_B(h + h_B)) ) + O(h)
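The modified u_xx formula can be tested on a smooth function; a small sketch (function names are ours, not from the notes):

```python
import numpy as np

def uxx_shortley_weller(u, x, h, hA):
    """Shortley-Weller approximation of u''(x): regular neighbor at x + h,
    boundary point A at x - hA with 0 < hA < h."""
    return 2.0 * (u(x + h) / (h * (h + hA))
                  - u(x) / (h * hA)
                  + u(x - hA) / (hA * (h + hA)))

u = np.cos            # test function with u'' = -cos
x = 0.3
for h in [1e-1, 1e-2, 1e-3]:
    err = abs(uxx_shortley_weller(u, x, h, hA=0.4 * h) + np.cos(x))
    print(f"h={h:.0e}  error: {err:.2e}")   # decreases like O(h)
```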
[Figure 14: section of the grid near a curvilinear boundary; the boundary intersects the mesh lines at the additional points A = (x_i − h_A, y_j) and B = (x_i, y_j + h_B).]
with αi,j , . . . , αB chosen according to the equations above. UA , UB are the ap-
proximations at the additional points A, B.
This scheme is called Shortley-Weller scheme. The order of consistency is
only linear. In case of h = hA = hB we get the usual 5-point stencil which is
consistent of order 2.
Remark
This example shows that Finite Difference methods run into difficulties in case
of more complicated geometries of Ω.
Ah U = f˜h .
We approximate the exact solution at all interior points (i.e. on Ω_h) and at all boundary points with Neumann condition (i.e. on Γ_{N,h}).
As an example consider Q̄ := {(x, y) | x ∈ [0, 5h], y ∈ [0, 4h]} together with a grid with uniform mesh size h as in fig. 11. Let us assume Dirichlet boundary conditions with r_ij := G(x_i, y_j) = u(x_i, y_j) on the boundary ∂Ω_h and let f_ij := f(x_i, y_j). Then, after ordering the unknowns U_ij into the vector U ∈ IR¹² in a proper way, we obtain the following sparse linear system:
A_h U = f̃_h with the 12 × 12 block-tridiagonal matrix

        [ T  −I   0 ]            [  4 −1  0  0 ]
A_h  =  [ −I  T  −I ] ,    T  =  [ −1  4 −1  0 ] ,    I = 4 × 4 identity matrix,
        [ 0  −I   T ]            [  0 −1  4 −1 ]
                                 [  0  0 −1  4 ]

U = (U11, U21, U31, U41, U12, …, U42, U13, …, U43)ᵀ and right-hand side

f̃_h = ( h²f11 + r10 + r01 ,  h²f21 + r20 ,  h²f31 + r30 ,  h²f41 + r40 + r51 ,
        h²f12 + r02 ,  h²f22 ,  h²f32 ,  h²f42 + r52 ,
        h²f13 + r03 + r14 ,  h²f23 + r24 ,  h²f33 + r34 ,  h²f43 + r44 + r53 )ᵀ .
Figure 15: Sparse linear system Ah U = f˜h for the discretized Poisson problem with
Dirichlet boundary conditions and the domain and grid as in fig. 11.
The structure of the matrix Ah depends on the chosen numbering of the grid
points as can be seen in fig. 16 for a square domain with 25 interior points and
Dirichlet boundary conditions. A number is assigned to each grid point; the
number corresponds to the row of Ah that contains one of the equations (1)-(4)
belonging to this grid point.
Row-wise (lexicographic), diagonal and red-black numbering of the 25 interior grid points lead to visibly different sparsity patterns:
[Figure: the three numbering schemes of the 5 × 5 interior grid and the corresponding sparsity (spy) plots of A_h; each pattern contains nz = 105 nonzero elements.]
Figure 16: Structure of Ah depending on the numbering of the grid points. In all cases
Ah ∈ IR25×25 contains nz = 105 nonzero elements.
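For the row-wise (lexicographic) numbering the matrix can be assembled compactly with Kronecker products; a sketch using scipy (our own construction, equivalent to the stencil equations up to the factor h²):

```python
import scipy.sparse as sp

n = 5                                    # interior points per direction
T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
I = sp.identity(n)
Ah = sp.kron(I, T) + sp.kron(T, I)       # h^2 * (-Laplacian), row-wise numbering
print(Ah.shape, Ah.nnz)                  # (25, 25) 105  -- matches fig. 16
```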
Remark
Analogously to the (non-discretized) Poisson equation in chap. 4.3.2 a discrete
minimum principle and a discrete comparison principle can be formulated. The
proof is similar to chap. 4.1.3 for the one-dimensional model problem.
Theorem (uniqueness)
Consider the Poisson equation −∆u = f and Dirichlet boundary conditions u(x, y) = G(x, y) on ∂Ω. Let ∆_h denote the discretization on a uniform grid using a 5-point stencil; curvilinear boundaries as in (4) are admitted. We obtain a sparse linear system A_h U = f̃_h.
Then A_h is non-singular and the sparse linear system is uniquely solvable.
Proof:
Similar to chap. 4.1.3 using the discrete maximum principle.
Question
How accurately does the 5-point stencil approximate the Laplace operator? That is a local property.
Definition (consistency)
The difference scheme ∆_h is consistent with the Laplace operator ∆ if, for all sufficiently smooth u, the consistency error γ(h) := max_{(x_i,y_j)∈Ω_h} |∆_h u(x_i, y_j) − ∆u(x_i, y_j)| tends to 0 for h → 0; it is consistent of order k if γ(h) = O(h^k).
Lemma
The 5-point stencil is consistent of order k = 2.
Proof: We directly get that from the Taylor expansion in case of constant h.
Remark
Again, from consistency we cannot conclude convergence. We need stability
in addition.
Remark
We investigate the global error and thus the convergence only on Ωh (i.e. in
the interior of the domain), not on the boundaries. For Dirichlet conditions, that
is sufficient.
Theorem
Consider the Poisson equation −∆u = f and Dirichlet boundary conditions u(x, y) = G(x, y) on ∂Ω. Let ∆_h denote the discretization on a uniform grid with mesh size h using a 5-point stencil.
Proof and explanation:
• We define the (local) consistency error r(xi , yj ) := ∆h u(xi , yj ) − ∆u(xi , yj ) for
(xi , yj ) ∈ Ωh and obtain
Therefore, we get a new and discrete boundary value problem for the error
Ah is the same matrix as defined in chap. 4.3.4, E and R are vectors with the
components e(xi , yj ), r(xi , yj ) ordered in the same way as the components of
U and f˜h in chap. 4.3.4. The boundary conditions for e already have been
inserted.
• In the next step we show the stability of the system Ah E = R using the dis-
crete maximum principle.
Analogously we perform the steps above for w̃ := −w and get ẽ ≥ −ϱ2 /4; we
combine these two results and obtain (after multiplication with γ(h)) the stability
condition
max_{(x_i,y_j)∈Ω_h} |ẽ(x_i, y_j)| ≤ ϱ²/4   ⇒   max_{(x_i,y_j)∈Ω_h} |e(x_i, y_j)| ≤ (ϱ²/4) · max_{(x_i,y_j)∈Ω_h} |r(x_i, y_j)| = (ϱ²/4) · γ(h)
• Because the consistency condition limh→0 γ(h) = 0 holds, the total error e
shows the same behaviour. This is convergence.
If u ∈ C³(Ω̄, IR), then the difference scheme converges to the exact solution with γ(h) = O(h).
For u ∈ C⁴(Ω̄, IR) and a uniform grid with mesh size h, the difference scheme converges to the exact solution with γ(h) = O(h²).
Remarks
For a uniform grid the discretization error γ(h) = O(h2 ).
With a refined numerical analysis we can show that for O(h2 )-convergence
a uniform grid is not necessary.
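A grid-refinement study shows the O(h²) behaviour directly; a sketch (our code, using the manufactured solution u = sin(πx)·sin(πy) on the unit square, which is not an example from the notes):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# -Laplace(u) = f with u = sin(pi x) sin(pi y), hence f = 2 pi^2 u and u = 0
# on the boundary of the unit square; measure the max error on refined grids.
for n in [15, 31, 63]:                       # h is halved in each step
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)
    X, Y = np.meshgrid(x, x, indexing="ij")
    u_exact = np.sin(np.pi * X) * np.sin(np.pi * Y)
    f = 2.0 * np.pi**2 * u_exact
    T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
    Ah = (sp.kron(sp.identity(n), T) + sp.kron(T, sp.identity(n))) / h**2
    U = spla.spsolve(Ah.tocsr(), f.ravel())
    print(f"h={h:.4f}  max error: {np.abs(U - u_exact.ravel()).max():.2e}")
```

The printed errors drop by roughly a factor of 4 whenever h is halved, as expected for O(h²) convergence.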
4.4 1D Linear Advection Equation
4.4.1 Formulation of the PDE Problem
We consider the 1D linear advection equation
u_t + v·u_x = 0 ,   u = u(t, x)
with time t > 0 and space coordinate x. To complete the PDE problem let the initial condition for u be
u(0, x) = f (x)
For the moment we will ignore any boundary condition.
Remark
If we interpret the above defined PDE as a (partial) model of the transport of
a soluble pollutant by a 1D river then u(t, x) is pollutant concentration at time t
and position x along the river and v is the (constant) velocity of the river.
The exact solution is u(t, x) = f(x − vt).
This means that u(t, x) is just the initial concentration profile f(x), translated by vt along the x-axis. For v > 0 the translation is to the right, and for v < 0 the translation is to the left. In either case the pollution moves downstream at the speed of the river. This model is unrealistic (because it e.g. neglects diffusion).
Figure 18: Concentration profile at initial time t = 0 (red) and after time t (blue), v > 0.
Figure 19: Comparison of numerical (+) and exact solutions (o) to the 1D linear
advection equation using ∆t = 0.3, 10 time steps (le.) and 25 time steps (ri.).
The numerical concentration peak has moved to the right place but is higher than the exact solution. More critically, there is some noticeable divergence from the exact solution in the numerical solution around x = 15 and x = 67. Something is wrong with this scheme!
u(t + ∆t, x) = u(t, x) + ∆t·u_t(t, x) + (∆t²/2!)·u_tt(t, x) + O(∆t³)
             = u(t, x) − v∆t·u_x(t, x) + (v²∆t²/2!)·u_xx(t, x) + O(∆t³)
Now we only need information from the n-th time step to compute the new approximations for the (n + 1)-th step, because only spatial derivatives remain. Using the equivalent operator notation we get
L_x(∆t) := 1 − v∆t·∂/∂x + (v²∆t²/2!)·∂²/∂x² ,   u(t + ∆t, x) = L_x(∆t)·u(t, x) + O(∆t³)
Figure 20: Mesh on a semi-infinite strip used for solution to the 1D linear advection
equation. Solid blue squares indicate the location of the (known) initial values. Open
squares indicate the location of the (known) boundary values. Open circles indicate
the position of the interior points where the FD approximation is computed.
Consistent with the notation in the previous subchapters, Un,i is the approxi-
mation to the exact solution u(tn , xi ) at the n-th time step and the i-th spatial
grid point. Some examples of FD schemes are now given.
FTCS Scheme (Forward Time, Centered Space)
δ_x U_{n,i} = (U_{n,i+1} − U_{n,i−1}) / (2∆x) ,   δ_xx U_{n,i} = 0
⇒   U_{n+1,i} = U_{n,i} − (v∆t/(2∆x)) · (U_{n,i+1} − U_{n,i−1})
The scheme is first order in time and second order in space, i.e. it has a
truncation error of O(∆t) + O(∆x2 ). Ghost values are required at both left and
right ends of the computational domain.
FOU Scheme (First-Order Upwind)
δ_x U_{n,i} = (U_{n,i} − U_{n,i−1}) / ∆x ,   δ_xx U_{n,i} = 0
⇒   U_{n+1,i} = U_{n,i} − (v∆t/∆x) · (U_{n,i} − U_{n,i−1})
The scheme has a truncation error of O(∆t)+O(∆x). A ghost value is required
at the left end of the computational domain.
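A minimal implementation sketch of the FOU update (our own code; the left ghost value is simply held at zero, corresponding to clean inflow):

```python
import numpy as np

def fou_advection(u0, v, dx, dt, nsteps):
    """First-order upwind scheme for u_t + v u_x = 0 with v > 0;
    the ghost value left of the domain is taken as 0."""
    c = v * dt / dx            # Courant number
    u = u0.copy()
    for _ in range(nsteps):
        u[1:] -= c * (u[1:] - u[:-1])
        u[0] = 0.0             # inflow boundary (ghost value 0)
    return u

x = np.linspace(0.0, 100.0, 201)
u0 = np.where((x > 1) & (x < 5), 1.0, 0.0)   # initial concentration profile
u25 = fou_advection(u0, v=0.5, dx=x[1] - x[0], dt=0.3, nsteps=25)
```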
Figure 21: Stencil for the FTCS Scheme (le.) and the FOU Scheme (ri.).
Lax-Wendroff Scheme (from the Taylor/operator expansion above, with centered differences for both δ_x and δ_xx)
δ_x U_{n,i} = (U_{n,i+1} − U_{n,i−1}) / (2∆x) ,   δ_xx U_{n,i} = (U_{n,i+1} − 2U_{n,i} + U_{n,i−1}) / ∆x²
⇒   U_{n+1,i} = U_{n,i} − (c/2)·(U_{n,i+1} − U_{n,i−1}) + (c²/2)·(U_{n,i+1} − 2U_{n,i} + U_{n,i−1}) ,   c = v∆t/∆x
Lax-Friedrichs Scheme (almost FTCS, U_{n,i} replaced by the mean value)
δ_x U_{n,i} = (U_{n,i+1} − U_{n,i−1}) / (2∆x) ,   δ_xx U_{n,i} = 0
⇒   U_{n+1,i} = (U_{n,i+1} + U_{n,i−1}) / 2 − (v∆t/(2∆x)) · (U_{n,i+1} − U_{n,i−1})
The scheme has a truncation error of O(∆t) + O(∆x). Ghost values are re-
quired at both left and right ends of the computational domain. Although this
scheme appears to be quite similar to the FTCS scheme its performance is
very different.
Figure 22: Stencil for the Lax-Wendroff (le.) and the Lax-Friedrichs Scheme (ri.).
u(t + ∆t, x) − u(t − ∆t, x) = 2∆tut (t, x) + O(∆t3 ) = −2v∆tux (t, x) + O(∆t3 )
Dropping the error term, replacing the differential operator by the difference
operator and using the usual discrete notation gives the general FD scheme in
operator notation
Un+1,i = Un−1,i − 2v∆t · δx Un,i
δ_x U_{n,i} = (U_{n,i+1} − U_{n,i−1}) / (2∆x)
⇒   U_{n+1,i} = U_{n−1,i} − (v∆t/∆x) · (U_{n,i+1} − U_{n,i−1})
The scheme has a truncation error of O(∆t2 ) + O(∆x2 ). Ghost values are re-
quired at both left and right ends of the computational domain. Initial conditions
are required at two time levels!
[Figure: stencil of the scheme above, using time levels n − 1, n, n + 1 and spatial points i − 1, i, i + 1.]
4.4.3 Repetition: Discrete Fourier Transform (DFT)
Example
Consider a 2π-periodic function f with f(x) = f(x + 2π), x ∈ IR. Other periodicities can easily be transformed to 2π-periodicity. We want to interpolate f at the equidistant nodes x_k = 2πk/5 for k = 0, …, 4 (by periodicity, x₅ = 2π is identified with x₀) by an interpolant T(x) which is a linear combination of trigonometric basis functions:
T(x) = Σ_{j=−2}^{2} γ_j e^{ijx} = Σ_{j=−2}^{2} γ_j (e^{ix})^j ,   the γ_k determined from T(x_k) = f(x_k), k = 0, …, 4
With
x_k = 2πk/5   ⇒   (e^{ix_k})^l = (e^{i2πk/5})^l = ω^{kl} ,   ω := e^{i2π/5} ,
the interpolation condition T(x_k) = f(x_k) leads to the linear system
         [ ω⁰   ω⁰   ω⁰   ω⁰   ω⁰ ]   [ γ₋₂ ]   [ f(x₀) ]
         [ ω⁻²  ω⁻¹  ω⁰   ω¹   ω² ]   [ γ₋₁ ]   [ f(x₁) ]
F γ⃗ :=  [ ω⁻⁴  ω⁻²  ω⁰   ω²   ω⁴ ] · [ γ₀  ] = [ f(x₂) ] =: f⃗
         [ ω⁻⁶  ω⁻³  ω⁰   ω³   ω⁶ ]   [ γ₁  ]   [ f(x₃) ]
         [ ω⁻⁸  ω⁻⁴  ω⁰   ω⁴   ω⁸ ]   [ γ₂  ]   [ f(x₄) ]
Remark
The solution parameters γ−(n−1)/2 , . . . , γ+(n−1)/2 (here: n = 5) are called dis-
crete Fourier coefficients of the data stored in f⃗.
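Setting up and solving F γ⃗ = f⃗ numerically is straightforward; a sketch (our code) that also cross-checks the result against numpy's FFT:

```python
import numpy as np

n = 5
xk = 2.0 * np.pi * np.arange(n) / n        # nodes x_k = 2 pi k / 5
f = np.exp(np.sin(xk))                     # some 2pi-periodic sample data
j = np.arange(-2, 3)                       # frequencies -2, ..., 2
F = np.exp(1j * np.outer(xk, j))           # F[k, l] = e^{i l x_k} = omega^{k l}
gamma = np.linalg.solve(F, f)
# gamma_l = (1/n) sum_k f(x_k) e^{-i l x_k} is the FFT up to an index shift:
gamma_fft = np.fft.fft(f)[j % n] / n
print(np.allclose(gamma, gamma_fft))       # True
```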
Definition
Let f : [0, 2π] → C be piecewise continuous (finite number of jumps of finite size in the real or imaginary part). Then the Fourier series of f is defined by
Sf(x) := Σ_{k=−∞}^{∞} c_k e^{ikx}   with   c_k := (1/(2π)) ∫₀^{2π} f(x) e^{−ikx} dx ,   k ∈ ZZ .
Theorem
Let f ∈ C_c¹([0, 2π], C) (function continuous everywhere, its first derivative piecewise continuous).
Then Sf converges uniformly to f .
Important property!
Notice that γk is an approximation to this ck or – after renaming the index – γl
approximates cl :
γ_l = (F⁻¹ f⃗)_l = ( (1/n) F^H f⃗ )_l = (1/n) Σ_{k=0}^{n−1} f(x_k) ω^{−lk} = (1/n) Σ_{k=0}^{n−1} f(x_k) e^{−ilx_k}
With the periodicity condition f (x0 )e−ilx0 = f (xn )e−ilxn we rewrite the last sum
2πγ_l = (2π/n) ( (1/2) f(x₀)e^{−ilx₀} + Σ_{k=1}^{n−1} f(x_k)e^{−ilx_k} + (1/2) f(x_n)e^{−ilx_n} )
      ≈ ∫₀^{2π} f(x)e^{−ilx} dx = 2πc_l
(the middle expression is exactly the trapezoidal rule for the integral).
T_n(x) := Σ_{k=−(n−1)/2}^{(n−1)/2} γ_k e^{ikx}   (n = 5 in our example)
4.4.4 Von Neumann Stability Analysis
The idea of a FD scheme is that Un,i approximates u(tn , xi ) and the approxima-
tion becomes better and better as ∆x and ∆t become smaller. With increas-
ing mesh refinement round-off errors play an increasing role in the difference
equation. On the other hand discretization errors are reduced.
Let the pointwise error (also called the 'global error') be e_{n,i} := U_{n,i} − u(t_n, x_i).
Remark
Without perturbations on time level 0 the values u(0, xi ) are known at all grid
points and U0,i is taken to be u(0, xi ) so e0,i = 0 at all grid points. As iterations
of the FD scheme introduce additional errors, in general en,i ̸= 0. It may be that
as iterations continue errors are compounded and amplified so that en,i grows
unboundedly making the FD scheme useless.
Remark
Consistency is a condition on the structure of the formulation of the numerical
algorithm.
The discretized PDE is compared with the true PDE and for finer and finer
mesh the discretized problem (not the solution!) comes closer and closer to
the true problem.
Stability is a condition on the solution of the numerical scheme.
Here the real numerical solution of the FD scheme is investigated and error
propagation and amplification are analyzed.
Convergence is a condition on the solution of the numerical scheme.
The real numerical solution is compared to the exact solution of the PDE.
The analysis of stability due to von Neumann is based on that property (∗).
[Figure 24: von Neumann's stability analysis related to error propagation for one time step: the output data (with output error) at time level n + 1 result from the input errors at the points i − 1, i, i + 1 of time level n only.]
We neglect the errors committed within the time step n → n + 1 itself, in particular the discretization error. We get
Next we will always assume that the boundary conditions are periodic.
The problem of stability for a linear problem with constant coefficients is well
understood when the influence of boundaries can be neglected or removed.
This is the case either for an infinite domain or for periodic boundary conditions
on a finite domain.
In the latter case we consider that the computational domain on the x-axis of
length L is repeated periodically and the non-periodic solution u(t, x) on the
finite interval [0, L] is transformed into a periodic one.
In case of non-periodic Dirichlet boundary conditions the approach is also pos-
sible, because the error values are then zero at the boundaries (for details see
below) and thus the errors are periodic even if the solution is not.
Example
In our advection example periodicity is no restriction because at the beginning
and at the end of a sufficiently long river (the spatial domain) the concentration
u(t, x) of the pollutant equals zero!
Rescaling the spatial interval [x0 , x2N +1 ] to [0, 2π] and applying the DFT, we
may write,
e_n(x) = Σ_{k=−N}^{N} γ_{n,k} e^{jkx}   ⇒   e_{n,i} = e_n(x_i) = Σ_{k=−N}^{N} γ_{n,k} e^{jkx_i} ,   j := √(−1)
ẽ_{n+1,i} = Σ_{k=−N}^{N} γ̃_{n+1,k} e^{jkx_i} .
We insert the sums into the linear FD scheme for the errors, rearrange the
coefficients and obtain
Σ_{k=−N}^{N} e^{jkx_i} · ( … ) = 0 ,   i = 0, …, 2N
Remark
For a better understanding, let us suppose that a special example set of errors
exists such that only one harmonic mode with index k interpolates all errors at
the xi on time level n
For a linear FD scheme the errors satisfy the same scheme (as we have seen),
thus e.g. for the FOU scheme
γ_{n,k} e^{jkx_i} − c ( γ_{n,k} e^{jkx_i} − γ_{n,k} e^{jk(x_i−∆x)} ) = (1 − c + c e^{−jk∆x}) · γ_{n,k} e^{jkx_i} =: λ · γ_{n,k} e^{jkx_i}
= Σ_{l=−N}^{N} γ̃_{n+1,l} e^{jlx_i} ,   i = 0, 1, …, 2N
Thus the stability of the scheme is governed by the size of the magnification factor, and it is necessary that |λ| ≤ 1.
Remarks
Von Neumann stability analysis is applicable without further modifications only
for linear PDEs with constant coefficients.
To apply this simple approach to multi-level schemes, additional considerations
are necessary.
Stability analysis hasn’t been worked out for most FD non-linear schemes,
because it heavily relies on the theory of linear difference equations.
If G ≤ 1 is always satisfied, the scheme is stable. That often occurs for implicit
schemes.
If G ≤ 1 can never be satisfied for ∆t > 0 the scheme is unconditionally unsta-
ble.
Mostly for explicit schemes, G ≤ 1 establishes a relation between the mesh
sizes ∆t and ∆x. In that case the scheme is called conditionally stable.
Example
We apply von Neumann stability analysis to the FOU scheme
U_{n+1,i} = U_{n,i} − c · (U_{n,i} − U_{n,i−1}) = c · U_{n,i−1} + (1 − c) · U_{n,i} ,   c := v∆t/∆x
Step 3: Use the constraint |G| ≤ 1 to obtain the condition for ∆t (this step can be algebraically tricky).
Using the triangle inequality we estimate
|(1 − c) + c · e^{−jk∆x}| ≤ |1 − c| + |c · e^{−jk∆x}| = |1 − c| + |c| .
Hence the FOU scheme for the 1D linear advection equation is stable when
0 ≤ c ≤ 1 which means that
∆t ≤ ∆x / v .
The FOU scheme is said to be conditionally stable.
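The magnification factor can also be inspected numerically; a small sketch (our code) evaluating |λ| = |1 − c + c·e^{−jk∆x}| over all wave numbers:

```python
import numpy as np

def fou_max_amplification(c, n_modes=256):
    """Maximum modulus of the FOU magnification factor over k*dx in [-pi, pi]."""
    theta = np.linspace(-np.pi, np.pi, n_modes)
    return np.abs(1.0 - c + c * np.exp(-1j * theta)).max()

for c in [0.5, 1.0, 1.2]:
    print(f"c = {c}:  max|lambda| = {fou_max_amplification(c):.3f}")
# 0 <= c <= 1 gives max|lambda| <= 1 (stable); c = 1.2 gives 1.4 (unstable)
```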
Remark
In a similar way we can prove (see (A41)) that for the 1D linear advection
equation the FTCS scheme is unconditionally unstable and therefore useless
even though it is consistent!
The Lax-Friedrichs scheme for the 1D advection equation is conditionally stable (see (A42)): The stability condition is fulfilled if the Courant number C := v∆t/∆x satisfies |C| < 1 (Courant-Friedrichs-Lewy or CFL condition).
Example
The one-dimensional heat equation ut = auxx defined on the spatial interval
[0, L] can be discretized by the FTCS scheme as
U_{n+1,j} = U_{n,j} + r (U_{n,j+1} − 2U_{n,j} + U_{n,j−1}) ,   r = a∆t/(∆x)²
Von Neumann stability analysis shows that
r = a∆t/(∆x)² ≤ 1/2
is the stability requirement for the FTCS scheme as applied to the one-
dimensional heat equation. In contrast to the advection equation, the FTCS
scheme is applicable here!
A numerical example:
For copper, a = 117 · 10⁻⁶ m²/s. If we choose a thin rod of length 1 m with a spatial resolution of 1 cm, then ∆x = 10⁻² [m]. The stability restriction gives
∆t ≤ ∆x²/(2a) = 10⁻⁴/(2 · 117 · 10⁻⁶) ≈ 0.5 [s] .
For ∆x = 10−3 [m] (i.e. 1 mm), we get ∆t ≈ 0.005 [s].
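A sketch of the FTCS heat solver with the stability bound built in (our code; the end temperatures are simply held fixed as Dirichlet values):

```python
import numpy as np

def ftcs_heat(u0, a, dx, t_end):
    """FTCS for u_t = a u_xx with fixed end values; the time step is
    chosen safely below the stability limit r = a*dt/dx^2 <= 1/2."""
    dt = 0.25 * dx**2 / a                  # gives r = 0.25 < 1/2
    u = u0.copy()
    for _ in range(int(t_end / dt)):
        u[1:-1] += 0.25 * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return u

x = np.linspace(0.0, 1.0, 101)             # copper rod, dx = 1 cm
u0 = np.zeros_like(x); u0[0] = 100.0       # one end held at 100 degrees
u = ftcs_heat(u0, a=117e-6, dx=x[1] - x[0], t_end=60.0)
```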
4.4.5 Difference Equations
Definition
A linear difference equation of order m with constant coefficients is defined by
(LX)_n := x_n + a₁x_{n−1} + … + a_m x_{n−m} = b_n   (a_m ≠ 0) .
Remark
X can be regarded as a generalization of a vector with countable, but infinitely
many components. Multiplication by a scalar (aX)n = a(X)n and vector addi-
tion (X + Y )n = (X)n + (Y )n are defined componentwise.
Then the operator L is linear: L(aX + bY) = a·LX + b·LY .
(A) Special solutions of homogeneous difference equations of order 2
We start the discussion with the simple case m = 2: inserting the ansatz x_n = zⁿ into x_n + a₁x_{n−1} + a₂x_{n−2} = 0 yields (LX)_n = z^{n−2}(z² + a₁z + a₂) = 0.
For this we get either the trivial solution z = 0 or z is a root of the so-called characteristic polynomial p(z) := z² + a₁z + a₂.
Theorem
Let z₁, z₂ be the two (complex) roots of the characteristic polynomial p(z). Then the two sequences
(X^(1))_n = z₁ⁿ ,  (X^(2))_n = z₂ⁿ   (for z₁ ≠ z₂)   resp.   (X^(1))_n = z₁ⁿ ,  (X^(2))_n = n·z₁ⁿ⁻¹   (for z₁ = z₂)
are solutions of LX = 0⃗.
Proof:
Case z1 ̸= z2 is clear. For z1 = z2 we know that not only p(z1 ) = 0, but also
p′ (z1 ) = 0. Differentiation of (LX)n = z n−2 p(z) yields
nz n−1 + a1 (n − 1)z n−2 + a2 (n − 2)z n−3 = (n − 2)z n−3 p(z) + z n−2 p′ (z)
For z1 the right hand side is zero and thus nz1n−1 is a solution of the difference
equation (∗): With the substitution xn = nz1n−1 into the left side we get again
xn + a1 xn−1 + a2 xn−2 = 0
Example
Lemma
Let X^(1), X^(2) be two solutions of LX = 0⃗ and c₁, c₂ ∈ C arbitrary constants.
Then X := c₁X^(1) + c₂X^(2) is a solution, too (proof via linearity).
Remark
The situation is very similar to linear ODEs. Let e.g. z₁ and z₂ = z̄₁ be complex conjugate roots of p(z). Then (X^(1))_n = z₁ⁿ and (X^(2))_n = (z̄₁)ⁿ are complex solutions of LX = 0⃗. With the corollary we get the real-valued solutions
(Y^(1))_n = (1/2)(z₁ⁿ + (z̄₁)ⁿ) = Re z₁ⁿ ,   (Y^(2))_n = Im z₁ⁿ .
By this, e.g. from (X^(1))_n = e^{inφ} we get (Y^(1))_n = cos(nφ) and (Y^(2))_n = sin(nφ).
Lemma
Let b_n be defined for n ≥ n₀ and let N ≥ n₀ + 1.
Then the difference equation LX = b has exactly one solution which takes prescribed values for x_{N−1} and x_N.
Theorem
Let X^(1) and X^(2) be two solutions of LX = 0⃗.
Then every solution X of LX = 0⃗ can be uniquely written as X = c₁X^(1) + c₂X^(2)
⇐⇒ the Wronski determinant
w_n := det ( x_n^(1)   x_n^(2) ; x_{n−1}^(1)   x_{n−1}^(2) )
does not vanish.
Proof:
X = c₁X^(1) + c₂X^(2) ⇒ the system
c₁x_n^(1) + c₂x_n^(2) = x_n
c₁x_{n−1}^(1) + c₂x_{n−1}^(2) = x_{n−1}
has a unique solution (c₁, c₂) if and only if its determinant w_n does not vanish.
Remark
We are now able to solve initial value problems for the difference equation LX = 0⃗. We have to find two special solutions X^(1), X^(2) with non-vanishing Wronski determinant; c₁, c₂ are then determined by the initial condition.
Possible initial conditions are x₋₁ = 1 ∧ x₀ = 1 (two components of the solution sequence X are given).
Example
Let the characteristic polynomial p of LX = ⃗0 have two different roots z1 , z2 .
We determine the Wronski determinant of the corresponding special solutions
(1) (2)
xn = z1n and xn = z2n
w_n := det ( z₁ⁿ   z₂ⁿ ; z₁ⁿ⁻¹   z₂ⁿ⁻¹ ) = (z₁z₂)ⁿ⁻¹ (z₁ − z₂) ≠ 0   ∀ n
Definition
Two sequences X^(1), X^(2) – which are not necessarily solutions of LX = 0⃗ – are called linearly dependent if there exist c₁, c₂ ∈ C, not both zero, with c₁X^(1) + c₂X^(2) = 0⃗.
Otherwise they are linearly independent.
Theorem
Let X^(1) and X^(2) be two solutions of LX = 0⃗ and W = {w_n} the sequence of the Wronski determinants. Then either w_n = 0 for all n or w_n ≠ 0 for all n.
(B) Inhomogeneous difference equations of order 2
As in the homogeneous case, the results are similar to those for linear ODEs.
Theorem
Let X^(1) and X^(2) be two linearly independent solutions of LX = 0⃗ and Y a special ("partikuläre") solution of LY = b. Then every solution X of LX = b can be written as
X = Y + c₁X^(1) + c₂X^(2)
Proof:
For Y = {yn } we get by assumption: yn + a1 yn−1 + a2 yn−2 = bn .
Let X be an arbitrary solution of (∗∗) and define the difference D := X − Y. For it we obtain
dn + a1 dn−1 + a2 dn−2 = 0
and thus D solves the homogeneous difference equation. Because of the lin-
ear independence, D can be written as D = c1 X (1) + c2 X (2) . With X = Y + D
we get the claim.
Remark
The inhomogeneous problem is reduced to the determination of a special so-
lution Y .
As in the linear ODE case, we can either find a proper ansatz heuristically
(e.g. if the bn are polynomials, try for xn polynomials of the same degree and
determine the free constants) or use the general method of the variation of
parameters (”Variation der Konstanten”).
for n ≥ 0.
Proof: by insertion.
Example
xn − 2xn−1 + xn−2 = 1
Using the formula from the theorem on the variation of parameters and insert-
ing bn = 1 (n = 0, 1, 2, . . .) we obtain
x_n = − Σ_{i=0}^{n} det ( 1   n ; 1   i−1 ) = Σ_{i=0}^{n} (n + 1 − i)   =^{k := n+1−i}   Σ_{k=1}^{n+1} k = (n + 1)(n + 2) / 2
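The closed form is easily verified against direct iteration of the recurrence; a tiny sketch (our code; the initial values x₋₂ = x₋₁ = 0 are those matching the formula):

```python
# x_n - 2 x_{n-1} + x_{n-2} = 1 with x_{-2} = x_{-1} = 0
# should reproduce x_n = (n+1)(n+2)/2 for n >= 0.
x_prev2, x_prev1 = 0, 0
for n in range(10):
    x_n = 2 * x_prev1 - x_prev2 + 1
    assert x_n == (n + 1) * (n + 2) // 2
    x_prev2, x_prev1 = x_prev1, x_n
print("closed form confirmed for n = 0..9")
```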
(C) Generalization to difference equations of order m
Every solution X of LX = b can be written as X = Y + c₁X^(1) + … + c_mX^(m) with a special solution Y of the inhomogeneous equation and m linearly independent solutions X^(1), …, X^(m) of LX = 0⃗.
The final question now is: How to obtain a set {X (1) , . . . , X (m) } of linearly inde-
pendent solutions? The approach is similar to the case m = 2 and leads to the
following theorem.
Theorem
Given the linear homogeneous difference equation of order m
x_n + a₁x_{n−1} + … + a_m x_{n−m} = 0 .
Let p(z) = z^m + a₁z^{m−1} + … + a_{m−1}z + a_m be the characteristic polynomial of LX = 0⃗ and let z₁, …, z_k (k ≤ m) be the k different roots of p(z), with multiplicity l_i + 1 (Σ l_i = m − k) of the i-th root z_i.
Then the m sequences X^(j) with
x_n^(j) = z_iⁿ   (p = 0)   resp.   x_n^(j) = n(n − 1) ⋯ (n − p + 1) · z_iⁿ⁻ᵖ   (p = 1, …, l_i) ,   i = 1, 2, …, k ,
are linearly independent solutions of LX = 0⃗.
Example
Consider the difference equation
x_n − 2x_{n−2} + x_{n−4} = 0   ⇒   p(z) = z⁴ − 2z² + 1 = (z + 1)²(z − 1)²
with two double roots z₁ = 1 and z₂ = −1. From the theorem we get
x_n^(1) = 1 ,   x_n^(2) = n ,   x_n^(3) = (−1)ⁿ ,   x_n^(4) = n · (−1)ⁿ
Example
Assume that z is a fourfold root of p(z). We want to prove that in this case
X = {n3 z n } is a solution of the difference equation (this illustrates the existence
of an alternative set in the above theorem).
For that we try to represent X as a linear combination of the original solutions:
n(n − 1)(n − 2) = n3 − 3n2 + 2n
n(n − 1) = n2 − n
n = n
and n3 = n(n − 1)(n − 2) + 3n(n − 1) + n. From that we get
n3 z n = z 3 [n(n − 1)(n − 2)z n−3 ] + 3z 2 [n(n − 1)z n−2 ] + z[nz n−1 ]
This is the required linear combination of solutions of the first set in the above
theorem to obtain a solution of the second set.
4.4.6 Von Neumann Stability Analysis Extended
Test example
Let us again consider the (explicit) FOU scheme. The amplitudes γ_{n,k} of the error modes satisfy a linear difference equation of order 1 whose characteristic polynomial has the single root z₁ = λ = 1 − c + c·e^{−jk∆x}, so the basic solution is
(X^(1))_n = x_n = z₁ⁿ .
For a unique solution of our problem we still have to add the initial condition:
x_{n+p} = α·z₁ⁿ⁺ᵖ ,   α ∈ IR
p = 0 :   x_n = γ_{n,k} = α·z₁ⁿ   ⇒   x_{n+p} = γ̃_{n+p,k} = γ_{n,k} · z₁ᵖ
Starting with xn = γn,k we get the solution xn+p = γn,k · z1p and thus the error
component induced by γn,k is damped for increasing p only for |z1 | < 1, which
is the case for
0 < ∆t < ∆x / v .
Example (wave equation and implicit multi-level scheme)
Let us consider the wave equation utt = a2 uxx on the (normalized) spatial in-
terval [0, 2π] with periodic boundary conditions.
We define a uniform spatial grid 0 =: x0 , . . . x2N +1 := 2π (even number of 2N + 2
nodes, ∆x = xj+1 − xj ) with un,0 = un,2N +1 (because of periodicity) and uniform
stepsize ∆t in time.
For the implicit difference scheme we choose
(U_{n+1,i} − 2U_{n,i} + U_{n−1,i}) / ∆t² = a² · (U_{n+1,i+1} − 2U_{n+1,i} + U_{n+1,i−1}) / ∆x²
and assume that Un,0 = Un,2N +1 too. Hence for the error terms en,i = Un,i − un,i
we obtain the same difference scheme because of linearity and we get en,0 =
en,2N +1 .
We apply the DFT to interpolate the error values on the n-th level and write
e_n(x) = Σ_{k=−N}^{N} γ_{n,k} e^{jkx}   ⇒   e_{n,i} = e_n(x_i) = Σ_{k=−N}^{N} γ_{n,k} e^{jkx_i} ,   j := √(−1)
Inserting this into the scheme yields, with µ_k := e^{jk∆x} and the Courant number c := a∆t/∆x,
Σ_{k=−N}^{N} e^{jkx_i} · ( [ γ_{n+1,k} − 2γ_{n,k} + γ_{n−1,k} ] − c² [ γ_{n+1,k}·µ_k − 2γ_{n+1,k} + γ_{n+1,k}/µ_k ] ) = 0
With
µ_k − 2 + 1/µ_k = (−4) · ( (e^{jk∆x/2} − e^{−jk∆x/2}) / (2j) )² = −4 sin²(k∆x/2)
and s_k := sin(k∆x/2) the characteristic polynomial associated to the difference equation reads
(1 + 4c²s_k²) z² − 2z + 1 = 0
For the roots we get, with |z| = √(z·z̄),
z^(k)_{1,2} = ( 2 ± √(4 − 4(1 + 4c²s_k²)) ) / ( 2(1 + 4c²s_k²) ) = ( 1 ± √(−4c²s_k²) ) / ( 1 + 4c²s_k² ) = ( 1 ± j·2c·|s_k| ) / ( 1 + 4c²s_k² )
⇒   |z^(k)_{1,2}| = 1 / √(1 + 4c²s_k²) < 1   ∀ c
Because the basic solution is x_n = (z_i^(k))ⁿ for single roots, in that case the scheme is stable.
Double roots only exist for s_k = 0, which is impossible for k ≠ 0, because
∆x = 2π/(2N + 1) ,   k ∈ {−N, …, N}   and   s_k := sin(k∆x/2) .
Summary
Von Neumann stability analysis can be extended to multi-level schemes, as can be seen in the example(s) above. Here difference equations for the amplitudes γ_{n,k} of the error modes are formulated and the zeros z_i^(k) of the associated characteristic polynomials are calculated. The linearly independent sequences that solve these difference equations should be damped.
In case of single roots, the finite difference scheme is stable ⇔ max_{i,k} |z_i^(k)| ≤ 1. In case of "< 1", all error modes are damped.
A sufficient condition for instability is max_{i,k} |z_i^(k)| > 1.
4.4.7 Implicit Schemes – Crank-Nicolson Scheme
We go back to the advection equation and choose
δ_x U_{n,i} = α · (U_{n,i+1} − U_{n,i−1}) / (2∆x) + (1 − α) · (U_{n+1,i+1} − U_{n+1,i−1}) / (2∆x) ,   α ∈ [0, 1] ,   δ_xx U_{n,i} = 0 .
This is a weighted average of central difference approximations to the spatial derivative at time levels n and n + 1.
[Figure: stencil of the weighted (Crank-Nicolson type) scheme, using time levels n and n + 1 and spatial points i − 1, i, i + 1.]
For α = 1/2 we get
U_{n+1,i} = U_{n,i} − (c/2) · ( (U_{n,i+1} − U_{n,i−1})/2 + (U_{n+1,i+1} − U_{n+1,i−1})/2 ) ,   c = v∆t/∆x
The scheme has a truncation error of O(∆t) + O(∆x2 ). Ghost values are re-
quired at both left and right ends of the computational domain.
The scheme is implicit so values at time level n + 1 are found by solving a
tridiagonal system of linear equations: Rearranging so that data from the same
time level is on the same side gives,
−cUn+1,i−1 + 4Un+1,i + cUn+1,i+1 = cUn,i−1 + 4Un,i − cUn,i+1 =: dn,i
The definition of dn,i reflects that the data at time level n is assumed known.
Un+1,0 and Un+1,N +1 on the left hand side are ghost values which may be
known directly (or can be calculated in terms of neighbouring values depending
on the type of boundary condition given in the problem).
This system is expressed as the matrix equation A_h U^(n+1) = d^(n) with the tridiagonal matrix

        [  4   c   0   ⋯   0 ]                 [ U_{n+1,1}   ]
        [ −c   4   c       ⋮ ]                 [ U_{n+1,2}   ]
A_h  =  [  0  −c   4   ⋱   0 ] ,    U^(n+1) =  [     ⋮       ] ,    d^(n) = (d_{n,1}, …, d_{n,N})ᵀ .
        [  ⋮       ⋱   ⋱   c ]                 [ U_{n+1,N−1} ]
        [  0   ⋯   0  −c   4 ]                 [ U_{n+1,N}   ]
This tridiagonal linear system is solved at each time step and the solution up-
dated iteratively.
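One implicit time step is then a single banded solve; a sketch using scipy (our code; both ghost values are taken as zero):

```python
import numpy as np
from scipy.linalg import solve_banded

def crank_nicolson_step(u, c):
    """Solve -c*U[i-1] + 4*U[i] + c*U[i+1] = c*u[i-1] + 4*u[i] - c*u[i+1]
    for the new time level; ghost values outside the domain are 0."""
    N = len(u)
    d = 4.0 * u.copy()
    d[1:] += c * u[:-1]                 # + c * U_{n,i-1}
    d[:-1] -= c * u[1:]                 # - c * U_{n,i+1}
    ab = np.zeros((3, N))               # banded storage for solve_banded
    ab[0, 1:] = c                       # superdiagonal
    ab[1, :] = 4.0                      # main diagonal
    ab[2, :-1] = -c                     # subdiagonal
    return solve_banded((1, 1), ab, d)

x = np.linspace(0.0, 100.0, 201)
u = np.where((x > 1) & (x < 5), 1.0, 0.0)
for _ in range(15):
    u = crank_nicolson_step(u, c=2.0)   # stable even for c = 2 (cf. fig. 26)
```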
Figure 26: Comparison of numerical (+) and exact solutions (o) to the 1D linear ad-
vection equation using Crank-Nicolson scheme with v = 0.5, c = 2.0, 15 time steps.
4.4.8 Matrix Stability Analysis
Example
The advection-diffusion equation belongs to the class of parabolic PDEs. In one spatial dimension it reads u_t + v·u_x = K·u_xx with a diffusion coefficient K > 0.
For a consistent scheme and neglecting the (vanishing) truncation error the
exact solution of the PDE satisfies the same scheme and so does the error
vector
Figure 27: Time evolution of the exact solutions for pure advection (le.) and
advection-diffusion (ri.). Each of the plots contains the initial profile (red) and two
later solutions (blue and green).
Hence the FD scheme is stable if the error is not increasing (same idea as in von Neumann stability analysis), and this is true if
∥A⁻¹B∥ ≤ 1 .
The matrix norm used here is induced by a vector norm; often the Euclidean norm is used. Hence the stability of a (linear) FD scheme can be investigated by finding the norm of the matrix A⁻¹B.
This is the matrix method for stability and may be quite difficult to implement.
It should be noted that there are many definitions of norms and a FD scheme
may be stable in one norm but not in another.
We obtain the sharpest statements using the spectral radius, but this is possible only if the 2-norm and the spectral radius coincide for the matrix under investigation.
Example
...
4.5 Multigrid Methods
To define and analyze basic properties of multigrid methods we use the FD approximation of a 1D Dirichlet boundary value problem as a simple example problem.
We denote the sparse matrix by A_h: even if h does not appear directly in A_h, its size is affected by the choice of h!
Does that iterative solver converge to a fixed-point, i.e. U_h^(ν) → U_h∗ for ν → ∞?
For the spectral radius we have ϱ(M_J(ω, h)) < 1 ∀ ω ∈ ]0, 1], because
ϱ(M_J(ω, h)) = max_k |λ_h^(k)(ω)| .
The eigenvectors v_h^(k) form a complete basis of the IR^N; therefore we can decompose the initial error at ν = 0:
ε_h^(0) = U_h^(0) − U_h∗ = Σ_{k=1}^{N} e_{k,h}^(0) · v_h^(k)
After one iteration cycle we get, using the EW-/EV-property,
ε_h^(1) = M_J(ω, h) ε_h^(0) = Σ_{k=1}^{N} λ_h^(k)(ω) · e_{k,h}^(0) · v_h^(k)
We see: The smaller the k-th EW λ_h^(k)(ω) is in modulus, the faster the k-th component e_{k,h}^(0) of the error ε_h^(0) is damped. Damping is only weak for EWs close to 1.
How to damp at least one half of the error components efficiently for fine grids?
Let us divide the EWs into two groups: Group 1 contains the λ_h^(k) with 1 ≤ k < N/2 and group 2 contains the λ_h^(k) with N/2 ≤ k ≤ N.
We want to choose ω such that the error components which belong to N/2 ≤ k ≤ N are damped as well as possible:
µ := max { |λ_h^(k)| , N/2 ≤ k ≤ N } < max { 1 − ω , |1 − ω(1 − cos(π))| } = max { 1 − ω , |1 − 2ω| }
Using this result we find that the optimal µ∗ = 1/3 is obtained using ω ∗ = 2/3.
Each g_k(x) that belongs to group 2 is more rapidly oscillating than each g_k(x) that belongs to group 1. So for our example system, by the choice of ω∗ = 2/3 we try to damp the so-called high-frequency components belonging to EWs from group 2 as well as possible, whereas the so-called low-frequency components belonging to EWs from group 1 are not included in the optimization procedure.
We also observe that the worst-case situation is obtained for k = 1, which belongs to the g₁(x) with the lowest frequency: For N ≫ 1 we get λ_h^(1)(ω) ≈ 1 and extremely low damping of the respective error component e_{1,h}^(0).
On the other hand, after a few cycles m we get
|e_{k,h}^(m)| < (1/3)^m · |e_{k,h}^(0)| ≪ |e_{k,h}^(0)|
for all high-frequency components (group 2 with N/2 ≤ k ≤ N ). For this reason,
although the global error decreases slowly per iteration step, it is smoothed out
very quickly – i.e. components which belong to EVs/gk (x) with high oscillations
are damped – and this process does not depend on h!
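The smoothing effect is easy to demonstrate; a sketch (our code) applying the damped Jacobi iteration matrix with ω = 2/3 to single error modes g_k:

```python
import numpy as np

N, omega = 63, 2.0 / 3.0
h = 1.0 / (N + 1)
A = (np.diag(2.0 * np.ones(N)) - np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1))      # unscaled 1D model matrix, D = 2I
M = np.eye(N) - (omega / 2.0) * A        # damped Jacobi iteration matrix

grid = np.arange(1, N + 1) * h
for k in [1, N // 2, N]:                 # low, medium, high frequency
    v = np.sin(k * np.pi * grid)         # eigenvector g_k of A and of M
    damping = np.linalg.norm(M @ v) / np.linalg.norm(v)   # equals |lambda_k|
    print(f"k = {k:2d}:  damping factor per sweep = {damping:.3f}")
# k = 1 is barely damped (~0.999); k >= N/2 is damped by about 1/3 per sweep
```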
Basic idea
The two-grid strategy combines two complementary schemes. The high-fre-
quency components of the error are reduced by applying iterative methods like
Jacobi or Gauss-Seidel schemes. For this reason these methods are called
smoothers.
On the other hand, the low-frequency error components are effectively reduced
by a coarse-grid correction procedure.
Realization of the two-grid idea in 7 steps
The residual r_h^(m) can be simply calculated.
To analyze the residual, we also calculate the EWs µ_h^(k) and EVs z_h^(k) of A_h. For that we use that D = 2 · I for our special matrix A_h in the damped Jacobi iteration:
M_J(ω, h) v_h^(k) = λ_h^(k) v_h^(k)   ⇔   D⁻¹ A_h v_h^(k) = ( (1 − λ_h^(k)) / ω ) · v_h^(k)
⇔   A_h v_h^(k) = 2 · ( (1 − λ_h^(k)) / ω ) · v_h^(k) = 2 (1 − cos(kπh)) · v_h^(k)
Therefore the EVs z_h^(k) = v_h^(k) are the same as those of M_J(ω, h); only the EWs have to be transformed: µ_h^(k) = 2 (1 − cos(kπh)).
r_h^(m) = A_h U_h^(m) − f_h = A_h U_h∗ + A_h ε_h^(m) − f_h = A_h ε_h^(m)
        = Σ_{k=1}^{N−1} ( λ_h^(k)(ω) )^m · e_{k,h}^(0) · A_h v_h^(k)
        = Σ_{k=1}^{N−1} ( λ_h^(k)(ω) )^m · µ_h^(k) · e_{k,h}^(0) · v_h^(k)
(3) Restriction of the residual
If we inspect
A_h ε_h^(m) = r_h^(m)   (∗)
this again can be seen as a linear system that results from the FD approximation of the Poisson equation; here we know in addition that the new right hand side r_h^(m) and the unknown solution ε_h^(m) are relatively smooth, i.e. not varying rapidly.
That motivates the strategy to solve the Poisson equation for the new right hand side r_h^(m) on a coarser grid: That is more efficient, and possibly the accuracy is sufficient in that case. Later we have to prove (!) that our idea was good.
In the simplest approach, we cancel every second equation in (∗) and by that restrict our residual to the coarse grid with mesh size H = 2h.
For this simple approach the restriction operator I_h^H is
         [ 0 1 0               ]
I_h^H =  [     0 1 0           ]  ∈ IR^{(N/2−1)×(N−1)}
         [          ⋱          ]
         [              0 1 0  ]
(the j-th row picks out the fine-grid equation at the common grid point x_{2j}).
(5) Coarse-grid correction
Because of smoothness one expects that ε_H^(m) is an approximation to ε_h^(m) on all grid points that Ω_H and Ω_h have in common, i.e. on all grid points x_j ∈ Ω_h ∩ Ω_H.
To obtain an approximation of ε_h^(m) for all the other grid points of the fine grid, which are not grid points of the coarse grid too, we use interpolation. With the prolongation operator I_H^h we get an improved approximation on the fine grid.
(7) Loop
Continue with step (1) of the algorithm, if necessary.
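A compact version of the whole cycle for the 1D model problem might look as follows (a sketch under our own naming: damped Jacobi as smoother, injection as restriction, linear interpolation as prolongation; the number N of interior unknowns must be odd):

```python
import numpy as np

def model_matrix(N, h):
    """h^-2 * tridiag(-1, 2, -1): FD matrix of -u'' with Dirichlet BCs."""
    return (np.diag(2.0 * np.ones(N)) - np.diag(np.ones(N - 1), 1)
            - np.diag(np.ones(N - 1), -1)) / h**2

def two_grid_cycle(U, f, h, m=2, omega=2.0 / 3.0):
    N = len(U)
    A = model_matrix(N, h)
    for _ in range(m):                       # (1) pre-smoothing (damped Jacobi)
        U = U + omega * (h**2 / 2.0) * (f - A @ U)
    r = f - A @ U                            # (2) residual
    rH = r[1::2]                             # (3) restriction by injection
    eH = np.linalg.solve(model_matrix(len(rH), 2.0 * h), rH)  # (4) coarse solve
    e = np.zeros(N)                          # (5) prolongation + correction
    e[1::2] = eH                             # common grid points
    e[0::2] = 0.5 * (np.concatenate(([0.0], eH))
                     + np.concatenate((eH, [0.0])))           # linear interp.
    return U + e
```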
From an eigenvector analysis of the errors (see (A46)) we get the following
essential result for the v-cycle in our example problem 2:
Theorem
Consider the boundary value problem of example 2. Define the two-grid method (v-cycle) exactly as in example 2 and use the same notation. Choose m = 2 and ω = 2/3. Then after the steps (1)-(5) (i.e. without post-smoothing) of one v-cycle we get
Remarks
Each v-cycle reduces the error at least by a constant factor, and this is true
also for h → 0! That is an excellent result.
Figure 28: V-cycles, W-cycles and Full Multigrid use several grids several times.
Full multigrid starts on the coarsest grid. The solution on the 8h grid is interpolated to provide a good initial vector U_{4h}^(0) on the 4h grid. A v-cycle between 4h and 8h improves it. Then interpolation predicts the solution on the 2h grid, and a deeper V-cycle makes it better (using 2h, 4h, 8h). Interpolation of that improved solution onto the finest grid gives an excellent start to the last and deepest V-cycle.
The operation counts for a deep V-cycle and for full multigrid are certainly
greater than for a two-grid v-cycle, but only by a constant factor. That is be-
cause the count is divided by a power of 2 every time we move to a coarser
grid. For a differential equation in d space dimensions, we divide by 2d . The
cost of a V-cycle (as deep as we want) is less than a fixed multiple of the v-cycle
cost:
V-cycle cost < ( 1 + 1/2^d + (1/2^d)² + … ) · v-cycle cost = ( 2^d / (2^d − 1) ) · v-cycle cost